Data 310 project summaries
A. Premade estimators:
How did you split the labels from the training set? What was the name of the labels dataset?
The training and testing data was read in from 2 seperate csv files. In the training data, we then split out the ‘species’ label becuase this is what we want our model to predict. The data contains features and labels. The features are different measures (in this case about flowers) of a datapoint and the labels are the datapoint’s true classification. The labels datasets are named train_y and test_y, which contain the species classifications of each datapoint in both the training and testing sets.
List 5 different estimators from tf.estimator and include the base command as you would write it in a script (for example this script used the tf.estimator.DNNClassifier() function from the API).
Estimators are functions that estimate the results of a model based on training data. Different estimators look for different patterns in the data. Below are 5 examples of estimators:
DNNClassifier
classifier = tf.estimator.DNNClassifier(
hidden_units, feature_columns, model_dir=None, n_classes=2, weight_column=None,
label_vocabulary=None, optimizer=’Adagrad’, activation_fn=tf.nn.relu,
dropout=None, config=None, warm_start_from=None,
loss_reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE, batch_norm=False)
DNNLinearCombinedClassifier
classifier = tf.estimator.DNNLinearCombinedClassifier(
model_dir=None, linear_feature_columns=None, linear_optimizer=’Ftrl’,
dnn_feature_columns=None, dnn_optimizer=’Adagrad’, dnn_hidden_units=None,
dnn_activation_fn=tf.nn.relu, dnn_dropout=None, n_classes=2, weight_column=None,
label_vocabulary=None, config=None, warm_start_from=None,
loss_reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE, batch_norm=False,
linear_sparse_combiner=’sum’)
LinearClassifier
classifier = tf.estimator.LinearClassifier(
feature_columns, model_dir=None, n_classes=2, weight_column=None,
label_vocabulary=None, optimizer=’Ftrl’, config=None, warm_start_from=None,
loss_reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
sparse_combiner=’sum’)
LinearRegressor
regressor = tf.estimator.LinearRegressor(
feature_columns, model_dir=None, label_dimension=1, weight_column=None,
optimizer=’Ftrl’, config=None, warm_start_from=None,
loss_reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
sparse_combiner=’sum’)
train_and_evaluate
trainer = tf.estimator.train_and_evaluate(
estimator, train_spec, eval_spec)
What are the purposes input functions and defining feature columns?
Input functions allocate data for training, testing, and predicting by creating small datasets. These datasets contains two elements- features and a label. Features is a python dictionary that contains the data’s feature names (as keys). Each key is mapped to array that contains the corresponding values of each feature. The label is an array containing the values of the label for every example contained in the features dictionary. Feature columns is an object that describes how the estimator model should use the raw data passed into it. In other words, feature columns give meaning to raw data by mapping the data to specific features, which the estimator can interpret the significance of. Here is an image I found helpful in interpreting what this means.
Image source
Describe the command classifier.train() in detail. What is the classifier and how did you define it? Which nested function (and how have you defined it) are you applying to the training and test detests?
classifier.train() trains the estimator model. It has 2 imputs- input_fn and steps. input_fn = a function we defined earlier in the code, which batches our data so in this step, we call that function so the classifier is trained on batches that the function creates. Steps specifies how long (how many stops) the model should train for.
What is the difference between a categorial column and a dense feature?
Categorical columns have non-numeric inputs (or numeric inputs with no numerically significant value) that represent categories. In the Titanic data, for example, gender is a categorical column (‘female’ and ‘male’ are the categories). These column categorical entries can be transformed into numberic values using indicator digits (for example, use 0 = male, 1 = female to transform the column). Dense features are numeric columns with significance beyond being an indicator digit. Age, for example, represents a dense feature in this dataset.