Data 310 project summaries
(This isn’t my computer but it definitely sounded like this might happen when I was running my CNN)
About the data:
For project 3, we were provided with 10,000 images of Korle Gonno, a neighborhood of Accra, Ghana. Each image is 480x480 pixels and uses 3 bytes per pixel (one per RGB channel). Accra covers ~225.7 km^2, so each of the 10,000 images represents ~0.02257 km^2 of the total area. The images, known as orthophotos, were taken from an aircraft and are high resolution (~60 cm). For each photo, the approximate population residing within the pictured area was recorded in a csv file, where each entry corresponds to a photo's assigned file name (for example, the population of ‘1.jpeg’ is the first record in the csv file).
The process: Using the above dataset, I built a model that predicts the population within a given image. I went about this process in 3 ways with varying levels of success and run times (hence, “all aboard the struggle bus” became my personal mantra for the weekend). Before I began, I downloaded the labels (population sizes) csv and unpacked the zip files containing the images. After moving them into a folder at my desired file path, I began my first attempt.
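For reference, a minimal sketch of that setup step is below. The folder and file names (`IMAGE_DIR`, `LABELS_CSV`) and the single-column, header-less csv layout are assumptions on my part; only the 480x480x3 images and the ‘<number>.jpeg’ naming come from the dataset description above.

```python
# Minimal setup sketch: load the population labels and a subset of the images.
# IMAGE_DIR, LABELS_CSV, and the csv layout are assumed names/formats.
import os
import numpy as np
import pandas as pd
from PIL import Image

IMAGE_DIR = "korle_gonno_images"   # folder the zip files were extracted into (assumed name)
LABELS_CSV = "populations.csv"     # downloaded labels file (assumed name)

# One population value per image, in the same order as the image file names.
labels = pd.read_csv(LABELS_CSV, header=None).iloc[:, 0].to_numpy()

def load_images(start, stop):
    """Load images start..stop (inclusive) as a (n, 480, 480, 3) float array scaled to [0, 1]."""
    images = []
    for i in range(start, stop + 1):
        img = Image.open(os.path.join(IMAGE_DIR, f"{i}.jpeg"))
        images.append(np.asarray(img, dtype=np.float32) / 255.0)
    return np.stack(images)

# e.g. the Run 1 split: a 500-image training subset and a 50-image testing subset
train_images, train_labels = load_images(1, 500), labels[:500]
test_images, test_labels = load_images(501, 550), labels[500:550]
```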
Results: Something I quickly realized was that ‘accuracy’ was not a helpful metric for evaluating the success of my model. This number was often so low that it read ‘0000000 e0’ (in hindsight this makes sense: accuracy counts exact matches, which essentially never happen when predicting a continuous value like population), so I decided to focus on MSE and MAE. MSE is an acronym for mean squared error, a measure of the average squared distance of outputs/predictions from the true values. MAE stands for mean absolute error, the average distance between predictions and the actual values (source). Both of these measures are used to gauge a model’s performance: higher values generally indicate a poorer model. Note that distance is >= 0. Next, I will detail four of my more notable CNN runs.
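As a quick illustration of how the two metrics relate, here is a small numpy sketch; the numbers are rounded versions of the example predictions near the end of this post, not model output.

```python
# Quick illustration of MSE and MAE on a handful of made-up numbers.
import numpy as np

y_true = np.array([36.2, 24.5, 35.9])   # actual populations
y_pred = np.array([20.1, 18.1, 18.4])   # hypothetical predictions

mae = np.mean(np.abs(y_true - y_pred))  # average distance from the truth
mse = np.mean((y_true - y_pred) ** 2)   # average *squared* distance

print(f"MAE = {mae:.2f}, MSE = {mse:.2f}")  # MSE >> MAE here because every error is > 1
```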
Run 1: Before this run, I tried using smaller subsets of the data, fewer epochs, and smaller batch sizes. When the MAE and MSE remained high, I opted to increase my parameters. In this run, I used a training set of 500 images and a testing set of 50 images. I fit my model using epochs=10, steps_per_epoch = 10, and batch_size = 5. After just a few minutes, I was able to call model.evaluate, which returned loss (MSE): 62.9714, MAE: 7.9354. Although these results are not indicative of overall fit, this seemed promising given I started at over a million for loss. Determined to do better, I made some more changes and tried again.
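Roughly, a run 1-style fit looks like the sketch below. The single convolution/pooling pair and the layer sizes are placeholders rather than my exact architecture; the loss (MSE), metric (MAE), input shape, and fitting parameters are the ones described above, and the data arrays come from the loading sketch earlier.

```python
# Run 1-style fit sketch (placeholder architecture; parameters as described above).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu", input_shape=(480, 480, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),              # single continuous output: population
])

# MSE as the loss, MAE as an extra metric, matching the numbers reported in this post.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

history = model.fit(train_images, train_labels,
                    epochs=10, steps_per_epoch=10, batch_size=5)

test_loss, test_mae = model.evaluate(test_images, test_labels)
```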
As evident in the graphs, MAE and MSE are essentially scaled versions of each other, so their graphs will generally have the same shape. In these cases, MSE takes larger values (in the y direction) because the distance between the true values and the predicted values was greater than 1 (if the difference d is > 1, the squared difference d^2 > d; if d < 1, then d^2 < d). In these first graphs, there is a sharp increase in error at 4 epochs. This may suggest overfitting after this point; however, I think it is more likely a result of this run using such a small data subset and batch size.
Run 2: Next, I increased my training set to 1,000 images and my testing set to 100 images. I kept epochs, steps_per_epoch, and batch_size the same and had results in under an hour. This time, test loss increased to 877.9531 and MAE to 25.5498. Chalking the decrease in performance up to the small batch size, I ran the model a third time with a larger batch size.
These graphs show more irregularity than any of the others. I think this may be a product of providing a larger training dataset while keeping batch size and other parameters small, so the model couldn’t get a cohesive view of the data (it’s like trying to describe what a collection of things looks like after seeing only a few of them through a straw).
Run 3: For this run, I reused the 1,000-image training set and 100-image testing set and set epochs=10, steps_per_epoch = 10, and batch_size = 100. I kept these numbers relatively low out of fear that a miscalculation would leave insufficient data to complete the fitting process. After 8 hours, the following was returned: loss (MSE): 878.0463, MAE: 25.5515.
After accounting for the increase in training set size in my model fitting parameters (with batch_size = 100 and steps_per_epoch = 10, each epoch now covered roughly the full 1,000-image training set), the results looked more promising. Although my overall test loss and MAE were higher than in previous runs, these graphs show that the model steadily improved throughout training; the lack of irregular behavior in the graphs is a promising sign that the model fit is improving too.
Run 4: Lastly, I increased the number of convolution and pooling layers to 3 each. I fit this model twice: for the first fit, I used a batch size of 10, and for the second, a batch size of 50. Other than these changes, I kept my CNN and other fitting parameters the same.
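A sketch of the run 4 setup is below. Only the three convolution + pooling pairs, the MSE loss, and the MAE metric come from the description above; the filter counts, activations, and dense layer sizes are placeholders.

```python
# Run 4 architecture sketch: three convolution + pooling pairs (filter counts are placeholders).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu", input_shape=(480, 480, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Run 4(a): batch_size=10; Run 4(b): batch_size=50 (everything else unchanged)
history = model.fit(train_images, train_labels,
                    epochs=10, steps_per_epoch=10, batch_size=10)
```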
Note: After increasing the number of epochs for each model, my results changed insignificantly and my accuracy remained at 0.
Run 4(a): Based on my results for run 2, I worried that fitting my model with such small parameters and a larger quantity of data would yield poor results; however, the additional convolution and pooling layers seemed to help avoid the irregularity seen in the run 2 results. The testing loss (MSE) was just 106.3372 and the MAE was 7.5652. As in run 3, MSE and MAE steadily decreased as training progressed. Additionally, the maximum and minimum values for MSE and MAE were considerably lower than in the last run.
Run 4(b): For this run, I increased the batch size to 50. Although it took ~3 hours to complete fitting, I was left with a testing loss of 89.6446 and an MAE of 5.9938. Comparatively, these graphs reflect a greater amount of detail (because we are working with smaller numbers, a visibly smaller change is more significant). An interesting part of this graph is the increase in error at 8 epochs. This increase may be temporary, and we could see a further decrease in error beyond 10 epochs; however, it may be that the model becomes overfit or inaccurate after this point.
Applications: Although my model was highly inaccurate, similar models could be useful in the following ways:
Ways to improve: My PyCharm will not show matplotlib plots, even with simple examples (it has worked in the past but, for lack of better words, it seems insulted by my numerous long model runs), so I produced the above graphs in Google Colab. This definitely limited what plots I was able to make. If I were able, I would have produced plots of validation MSE, MAE, and accuracy, as sketched below. I would also have produced a plot showing the convolutions for an image, similar to what we did for the MNIST dataset in the class exercise from the 14th.
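Here is a rough sketch of the validation plots I had in mind. It assumes the model and data arrays from the earlier sketches and passes the testing subset as validation data so that the training history also records val_loss (MSE) and val_mae.

```python
# Sketch of validation-metric plots, assuming `model`, train_*, and test_* from the sketches above.
import matplotlib.pyplot as plt

history = model.fit(train_images, train_labels,
                    validation_data=(test_images, test_labels),  # so val metrics are recorded
                    epochs=10, steps_per_epoch=10, batch_size=50)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(history.history["loss"], label="training MSE")
axes[0].plot(history.history["val_loss"], label="validation MSE")
axes[0].set_xlabel("epoch")
axes[0].legend()

axes[1].plot(history.history["mae"], label="training MAE")
axes[1].plot(history.history["val_mae"], label="validation MAE")
axes[1].set_xlabel("epoch")
axes[1].legend()
plt.show()
```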
In the future, I would like to add more convolution and pooling layers, change the activations, and change the number of neurons per layer to see how accuracy would change; this would make another interesting graph too (how the scores change as convolutions are added). Although my model didn’t turn out very well, I learned a lot in the process and would like to try it on images of other areas of the world and different terrains to see how it performs.
Conclusion: After giving my computer a brief reprieve from fitting, I ran the following images through the model (see the prediction sketch after the examples below). Unfortunately, the model’s terrible predictive power shone through.
Model prediction: 20.085924
True value: 36.1680641174316
Model prediction: 18.063229
True value: 24.4707736968994
Model prediction: 18.373161
True value: 35.9370880126953
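For reference, a rough sketch of how a single prediction like the ones above can be produced; the image index here is a hypothetical held-out image, and load_images, labels, and model come from the earlier sketches.

```python
# Predict the population for one held-out image (index 551 is a hypothetical example).
img = load_images(551, 551)            # shape (1, 480, 480, 3)
predicted = model.predict(img)[0, 0]   # model outputs shape (1, 1)
actual = labels[550]

print(f"Model prediction: {predicted:.6f}")
print(f"True value: {actual}")
```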
It is possible that the model is overfit. This is supported by my relatively smooth graphs above and by the fact that the model seemingly only predicts values in a small window (~18-20, based on my limited examples above). It may also be that the model is underfit because of the small training set, small batch sizes, and low number of epochs. As I was unable to graph validation error or other scores, I am unable to tell which of these is the case.
Beyond challenges with fitting the model, the data itself also presents some limitations, including: