You are given the MNIST dataset and a cross-entropy loss function. The interviewer asks the following questions:
If the accuracy of the classifier is 1, what are the lower and upper bounds on the loss function for a single training example? (Each answer should be a single scalar value.)
If the accuracy of the classifier is now assumed to be zero, what is the lower/upper bound on the loss function for a single training example?
Then, derive answers to the same question as we consider not just a single observation but rather an entire dataset.
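One way to reason about these bounds is sketched below. It is not the interviewer's official answer, and it rests on a few assumptions: the loss is the natural-log cross entropy -log(p_true), the predicted class is the argmax of the probabilities, and there are 10 classes as in MNIST. The function name `single_example_loss_bounds` is made up for illustration. If the example is classified correctly, the true class holds the largest probability, so p_true is at least 1/10 and the loss lies between 0 and log(10). If it is misclassified, some other class holds at least as much probability, so p_true is at most 1/2 and the loss is at least log(2) with no finite upper bound. The endpoints are limits (reached only with p_true = 1 or with exact ties). For a whole dataset, the mean loss obeys the same bounds, while a summed loss scales them by the number of examples.

```python
import math
from typing import Tuple

def single_example_loss_bounds(num_classes: int, correct: bool) -> Tuple[float, float]:
    """Return (lower, upper) bounds on -log(p_true) for one example.

    Assumes argmax prediction and natural-log cross entropy (illustrative only).
    """
    if correct:
        # The true class has the largest probability, so p_true >= 1 / num_classes.
        return 0.0, math.log(num_classes)
    # Another class has at least as much probability mass, so p_true <= 1/2.
    return math.log(2), math.inf

print(single_example_loss_bounds(10, correct=True))
print(single_example_loss_bounds(10, correct=False))
```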
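Then, you are asked to describe the expected shape of a train/validation error curve. The classic textbook answer applies: training error keeps decreasing, while validation error decreases at first and then rises once the model starts to overfit.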
The interviewer will then ask why, based on the log loss curve, we are entering overfitting territory as the number of epochs grows large (note that the accuracy curve does not show the same phenomenon).
The reason is that log loss depends on the predicted probability of each class. As the model becomes overconfident in its predictions, which tends to happen with overtraining, the log loss gets worse even if the predicted labels, and therefore the accuracy, stay the same.
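A small sketch of this effect, using made-up logits rather than anything from the interview: multiplying every logit by a constant greater than 1 makes the softmax more confident without changing the argmax, so accuracy is untouched while the loss on any misclassified example grows. The helper `log_loss_and_accuracy` and the toy data are assumptions for illustration.

```python
import math

def log_loss_and_accuracy(logits, labels):
    """Mean softmax cross entropy and argmax accuracy for lists of logit rows."""
    total_loss, num_correct = 0.0, 0
    for row, label in zip(logits, labels):
        top = max(row)
        # Numerically stable log-sum-exp; -log softmax = logsumexp - true logit.
        log_norm = top + math.log(sum(math.exp(z - top) for z in row))
        total_loss += log_norm - row[label]
        num_correct += int(row.index(top) == label)
    n = len(labels)
    return total_loss / n, num_correct / n

# Toy data: two correct predictions and one misclassified example (the third).
logits = [[2.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
labels = [0, 0, 1]

# Scaling the logits mimics a model that has grown more confident.
overconfident = [[5.0 * z for z in row] for row in logits]

loss_before, acc_before = log_loss_and_accuracy(logits, labels)
loss_after, acc_after = log_loss_and_accuracy(overconfident, labels)
print(loss_before, acc_before)
print(loss_after, acc_after)
```

In this toy case the accuracy is identical for both sets of logits, but the mean loss is higher for the overconfident ones because the one wrong prediction is now wrong with much higher confidence.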
The interviewer will then ask, based on the initially posed questions about bounds on log loss, whether the increase in loss is most likely coming from many small errors or one large error.
It's more likely that a single misclassified observation with a large loss is driving the increase than many correctly classified observations that each contribute a small loss. (This partly answers the question: how can the log loss increase even when accuracy is nearly 1?)
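To make the arithmetic concrete, here is a hypothetical example with invented numbers: 999 correctly classified examples that each assign probability 0.99 to the true class, and one misclassified example that assigns 1e-6 to it. The loss on a correct example is small and bounded, whereas the loss on a wrong one is unbounded, so a single confident mistake can outweigh everything else.

```python
import math

# Probability assigned to the true class (illustrative values, not real data).
confident_correct = [0.99] * 999
confident_wrong = [1e-6]

small_losses = [-math.log(p) for p in confident_correct]
large_losses = [-math.log(p) for p in confident_wrong]

total_small = sum(small_losses)
total_large = sum(large_losses)
mean_loss = (total_small + total_large) / (len(small_losses) + len(large_losses))

print("combined loss of 999 correct examples:", total_small)
print("loss of the single wrong example:     ", total_large)
print("mean loss over all 1000 examples:     ", mean_loss)
```

Here the one wrong example contributes more to the total loss than the other 999 combined, even though accuracy is 99.9%.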
Afterwards, you are given a coding section. The task is to implement an Average Calibration Error metric: bucket the predictions by magnitude, then compute the calibration error within each bucket (defined as the average absolute difference between the predictions and the labels in that bucket).
The solution is about 12 lines long. You need a total variable, a for loop, and to calculate the bounds of each bin. It's dead simple.
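A minimal sketch of what such a solution might look like follows. It is not the interview's reference solution. It assumes equal-width bins over [0, 1], takes the per-bin error to be the gap between the mean predicted confidence and the empirical accuracy (one common reading; the description above could also be read as the mean of per-example absolute differences), and skips empty bins. The function name and signature are made up for illustration.

```python
def average_calibration_error(confidences, correct, num_bins=10):
    """Unweighted mean over non-empty bins of |mean confidence - accuracy|.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: 1 if the prediction was right, 0 otherwise (same length).
    """
    total, used_bins = 0.0, 0
    for b in range(num_bins):
        # Bounds of the current bin; the last bin also includes 1.0.
        lower, upper = b / num_bins, (b + 1) / num_bins
        in_bin = [i for i, c in enumerate(confidences)
                  if lower <= c < upper or (b == num_bins - 1 and c == 1.0)]
        if not in_bin:
            continue  # assumption: empty bins do not count toward the average
        mean_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        accuracy = sum(correct[i] for i in in_bin) / len(in_bin)
        total += abs(mean_conf - accuracy)
        used_bins += 1
    return total / used_bins if used_bins else 0.0

print(average_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0]))
```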
At the end, the interviewer asks why the plot of Average Calibration Error as a function of the number of epochs is noisy. The reason is that the metric uses bins that may contain only a small number of predictions. Weighting each bin by the number of predictions it contains, instead of taking an unweighted average across bins, would reduce the noise.
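The difference between the two averages can be sketched with invented per-bin numbers: two well-populated bins with small calibration gaps and one bin holding only two predictions with a large gap. The helper `combine_bin_gaps` is hypothetical. The weighted form is the one used by the expected calibration error (ECE), which weights each bin by its share of the samples.

```python
def combine_bin_gaps(gaps, counts, weighted):
    """Combine per-bin calibration gaps into a single number."""
    if weighted:
        # Each bin contributes in proportion to how many predictions it holds.
        return sum(g * n for g, n in zip(gaps, counts)) / sum(counts)
    # Every non-empty bin counts equally, however few predictions it holds.
    return sum(gaps) / len(gaps)

# Illustrative values only: the last bin has 2 predictions and a large gap.
gaps = [0.02, 0.03, 0.60]
counts = [500, 480, 2]

unweighted = combine_bin_gaps(gaps, counts, weighted=False)
weighted = combine_bin_gaps(gaps, counts, weighted=True)
print(unweighted, weighted)
```

With the unweighted average, the two predictions in the sparse bin move the metric as much as the 500 in the first bin, so the value can jump from epoch to epoch as a handful of predictions shift between bins.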