Data-310-Public-Raposo

Data 310 project summaries


7.28.20 Class Exercise:

Word Embeddings:

word embedding accuracy

word embedding loss

As the graphs above show, training accuracy and loss smoothly approach 1 and 0, respectively; however, the validation scores do not follow suit. Validation accuracy fluctuates erratically and declines overall, which suggests overfitting. Similarly, validation loss changes erratically and increases significantly over the epochs. This overfitting is likely because of the large vocabulary size, since the model spends as much time learning uncommon words as it does the more relevant, common ones.
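
As a reference for how these curves were produced, below is a minimal sketch of an embedding-based sentiment classifier in the style of the TensorFlow word-embeddings tutorial. The vocabulary size, embedding dimension, and layer sizes here are assumptions, not the exact values used in the project; capping `VOCAB_SIZE` is one way to limit time spent on rare words.

```python
import tensorflow as tf

# Assumed hyperparameters (not the exact project values):
VOCAB_SIZE = 10000     # capping the vocabulary limits time spent on rare words
EMBEDDING_DIM = 16

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),  # one learned vector per word index
    tf.keras.layers.GlobalAveragePooling1D(),               # average the word vectors in each review
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)                                # single logit for binary sentiment
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Training on the prepared train/validation datasets would yield the
# accuracy and loss curves plotted above, e.g.:
# history = model.fit(train_ds, validation_data=val_ds, epochs=15)
# history.history['accuracy'], history.history['val_accuracy'], etc.
```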

word embedding pca and tsne
The above visualization displays word relations in the embeddings, using PCA and t-SNE dimensionality-reduction techniques so that data points originally in hundreds of dimensions can be plotted. Points that are close to each other represent words with similar meanings (their embedding vectors are similar). In this case, since some words are split into fragments, not all of the clustering makes sense. These plots are a way to visualize the relationships between words that the model assigns through embedding.
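
A sketch of how such a plot can be made is shown below. In the project the weight matrix would come from the trained model's Embedding layer (e.g. `model.layers[0].get_weights()[0]`); random values stand in here so the snippet runs on its own, and the figure styling is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for the trained embedding weights, shape (vocab_size, embedding_dim).
# In practice: weights = model.layers[0].get_weights()[0]
weights = np.random.normal(size=(1000, 16))

# Reduce the high-dimensional embedding vectors to 2D for plotting.
pca_2d = PCA(n_components=2).fit_transform(weights)
tsne_2d = TSNE(n_components=2, init='pca', random_state=0).fit_transform(weights)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(pca_2d[:, 0], pca_2d[:, 1], s=2)    # nearby points = similar embedding vectors
ax1.set_title('PCA')
ax2.scatter(tsne_2d[:, 0], tsne_2d[:, 1], s=2)
ax2.set_title('t-SNE')
plt.show()
```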

Text Classification with an RNN:

Without LSTM:
RNN text classification model without LSTM layers
With LSTM:
RNN text classification model with LSTM layers
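
For reference, below is a minimal sketch contrasting the two architectures compared above, in the style of the TensorFlow RNN text-classification tutorial. The text encoding step is omitted and the vocabulary size and layer widths are assumptions rather than the project's exact settings.

```python
import tensorflow as tf

VOCAB_SIZE = 10000  # assumed vocabulary size

def build_model(use_lstm):
    layers = [tf.keras.layers.Embedding(VOCAB_SIZE, 64, mask_zero=True)]
    if use_lstm:
        # Bidirectional LSTM reads each review forward and backward,
        # so word order contributes to the prediction.
        layers.append(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)))
    else:
        # Without the LSTM, the word vectors are simply averaged,
        # discarding word order.
        layers.append(tf.keras.layers.GlobalAveragePooling1D())
    layers += [tf.keras.layers.Dense(64, activation='relu'),
               tf.keras.layers.Dense(1)]
    model = tf.keras.Sequential(layers)
    model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  optimizer='adam', metrics=['accuracy'])
    return model

model_without_lstm = build_model(use_lstm=False)
model_with_lstm = build_model(use_lstm=True)
```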