Data 310 project summaries
Reflecting on this summer semester, I am filled with a mixture of gratitude, relief, and an overwhelming desire to take a nap. In all seriousness, although this program was challenging at times, I learned SO much. I don’t think I fully knew what I signed up for when the program started, but after getting familiar with the data science lingo, things began to feel less intimidating.
One of the practical skills I refined while completing this project was the art of just winging it, seriously! I had no idea where to start. In fact, I spent two full weeks attempting to clean up a dataset (by COVID-Net), only to realize that it was so poorly organized that I couldn’t separate distinct patient records; it was a mess. After struggling to use a George Washington University dataset, I started looking up prestigious institutions with public COVID data and, by some miracle, found one by the University of Oxford that seemed just right. Although it was missing values for many of the 37 parameters for most countries, I was able to identify six that were available for every country (though only in the most recently added rows).
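To give a rough idea of that last step, something along these lines would pull each country’s most recent row and keep only the fully populated columns (this isn’t my exact code; the file name and the ‘location’/‘date’ column names are just placeholders):

```python
import pandas as pd

# Rough sketch, not my exact code: the file name and the "location"/"date"
# column names are placeholders for however the dataset is actually labeled.
df = pd.read_csv("oxford-covid-data.csv", parse_dates=["date"])

# Take the most recently added row for every country.
latest = df.sort_values("date").groupby("location").tail(1)

# Keep only the parameters that have a value for every country in that snapshot.
complete_cols = latest.dropna(axis=1, how="any").columns
print(complete_cols.tolist())
```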
After a lot of trial and error with what type of model to use and how to adjust the different layers and arguments, it finally started to run. At first, my results were pretty abysmal, but as I made small changes, I got a better understanding of what each adjustment did. I eventually settled on the model I detail on my poster. Considering my MAE started at over 10,000, I’m pretty happy with the decrease to ~540 by the end. With the US data, I was also able to make a correlation matrix heat map, which is available in my slides for the showcase. In my in-class presentation, I shared some additional info about how the optimizer and loss functions work, which I’ve included below (more info on MAE is on the poster):
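(Quick aside before that: the heat map came from something like the snippet below, where the file name, the ‘location’ column, and the feature set are placeholders rather than my exact columns.)

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Rough sketch of the correlation heat map step; names are placeholders.
df = pd.read_csv("oxford-covid-data.csv")
us = df[df["location"] == "United States"]

corr = us.select_dtypes("number").corr()  # pairwise correlations of the numeric columns
sns.heatmap(corr, cmap="coolwarm")
plt.title("Correlation matrix, US data")
plt.tight_layout()
plt.show()
```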
NOTE: I just reran the model and got the same MAE with just 75 epochs (instead of 1,000) when using Adam instead of RMSprop, so that’s a correction to my poster.
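For context, the overall setup looked something like the sketch below; the layer sizes and the six input features are placeholders rather than the exact architecture on my poster, but it shows how swapping RMSprop for Adam is a one-line change in compile():

```python
import tensorflow as tf

# Sketch with placeholder layer sizes, not the exact model from the poster:
# a small dense network trained to minimize mean absolute error (MAE).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(6,)),            # the 6 parameters available for every country
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),              # single predicted value
])

# Using Adam here instead of "rmsprop" reached the same MAE in far fewer epochs.
model.compile(optimizer="adam", loss="mae")
# model.fit(X_train, y_train, epochs=75, validation_split=0.2)
```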
When we train the model, it makes guesses about the data, the loss function judges how accurate those guesses are, and the optimizer adjusts the weights to improve them. Gradient descent is this ‘guess and check’ process. Depending on the weights assigned within the model, its predictions will have a different measure of error, and the goal of gradient descent is to reduce that error. Picture an empty pond: the ground slopes up and down, there are several local minima, and there is one global minimum (the lowest point). This surface represents the multidimensional graph that Adam analyzes by ‘walking around’, guessing and checking whether it has reached the lowest possible error. At each step it produces a gradient vector, which points in the direction of the maximum rate of change of the surface, and different optimizers work better depending on the shape of this graph.

Adam can also take a specified learning rate argument, which controls how quickly the model’s weights are updated during gradient descent. In other words, the learning rate defines how large a step the optimizer takes as it walks around the graph. By setting a low learning rate, we can keep the model from training too quickly, which helps it settle closer to the minimum instead of overshooting it (especially since I’m working with a fairly large dataset).
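To make the ‘walking around’ idea concrete, here’s a toy example (not from my project) of gradient descent on a simple one-dimensional bowl, where the learning rate sets the step size:

```python
# Toy illustration, not project code: minimize f(w) = (w - 3)^2. Its gradient,
# 2 * (w - 3), points uphill, so each step moves a learning-rate-sized amount
# in the opposite direction.
def grad(w):
    return 2 * (w - 3)

w = 10.0             # initial guess
learning_rate = 0.1  # step size: how far to walk each iteration

for _ in range(50):
    w -= learning_rate * grad(w)

print(round(w, 4))   # approaches the global minimum at w = 3
```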
If I had more time, I would have liked to generate more data to better train the model, and to build a model that specifically predicted values for the United States or another single country rather than trying to generalize across the whole world.