In a regression task, the model learns to predict numeric scores. An example is predicting the price of a stock on future days given past price history and other information about the company and the market.
This section covers two ways of measuring regression performance: RMSE and Max-Error.
The most commonly used metric for regression tasks is RMSE (Root Mean Square Error). This is defined as the square root of the average squared distance between the actual score and the predicted score:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

Here, $y_i$ denotes the true score for the i-th data point, and $\hat{y}_i$ denotes the predicted value. One intuitive way to understand this formula is that it is the Euclidean distance between the vector of the true scores and the vector of the predicted scores, divided by $\sqrt{n}$, where $n$ is the number of data points.
    import graphlab as gl

    # True scores and predicted scores for four data points
    y = gl.SArray([3.1, 2.4, 7.6, 1.9])
    yhat = gl.SArray([4.1, 2.3, 7.4, 1.7])

    print(gl.evaluation.rmse(y, yhat))
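To make the definition concrete, the following is a minimal sketch in plain Python that computes RMSE directly from the formula and checks it against the Euclidean-distance interpretation. The helper names `rmse_from_definition` and `euclidean_distance` are illustrative assumptions and are not part of the GraphLab API.

```python
import math


def rmse_from_definition(targets, predictions):
    """Square root of the mean squared difference (hypothetical helper)."""
    if len(targets) != len(predictions) or len(targets) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(targets)
    squared_errors = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared_errors) / n)


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors (hypothetical helper)."""
    return math.sqrt(sum((x - z) ** 2 for x, z in zip(a, b)))


# Same toy data as in the example above
y = [3.1, 2.4, 7.6, 1.9]
yhat = [4.1, 2.3, 7.4, 1.7]

direct = rmse_from_definition(y, yhat)
# Euclidean distance between the two vectors, scaled by 1 / sqrt(n)
via_distance = euclidean_distance(y, yhat) / math.sqrt(len(y))

print(direct)
print(via_distance)
```

Both values agree up to floating-point rounding, at roughly 0.52 for this data. Most of that comes from the first data point, whose error of 1.0 dominates the sum of squares.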
While RMSE is the most common metric, it can be hard to interpret. One alternative is to look at quantiles of the distribution of the absolute percentage errors. The Max-Error metric is the worst-case error between the predicted value and the true value, that is, the largest absolute difference over all data points.
    import graphlab as gl

    # True scores and predicted scores for four data points
    y = gl.SArray([3.1, 2.4, 7.6, 1.9])
    yhat = gl.SArray([4.1, 2.3, 7.4, 1.7])

    print(gl.evaluation.max_error(y, yhat))
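The quantile-based alternative mentioned above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a GraphLab feature: the percentage error is taken relative to the true value (so no true value may be zero), quantiles use linear interpolation between sorted values, and the helper names are hypothetical.

```python
import math


def absolute_percentage_errors(targets, predictions):
    """Absolute error as a percentage of the true value (assumes no zero targets)."""
    return [abs(t - p) / abs(t) * 100.0 for t, p in zip(targets, predictions)]


def quantile(values, q):
    """Quantile for 0 <= q <= 1, using linear interpolation (an assumed convention)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def max_abs_error(targets, predictions):
    """Largest absolute difference between true and predicted values."""
    return max(abs(t - p) for t, p in zip(targets, predictions))


# Same toy data as in the example above
y = [3.1, 2.4, 7.6, 1.9]
yhat = [4.1, 2.3, 7.4, 1.7]

apes = absolute_percentage_errors(y, yhat)
print(quantile(apes, 0.5))     # median absolute percentage error
print(quantile(apes, 0.9))     # 90th percentile
print(max_abs_error(y, yhat))  # worst-case absolute error
```

For this data the median absolute percentage error is about 7.3%, while the worst-case absolute error is about 1.0 (from the first data point). Reporting a median or a high percentile alongside the maximum shows whether a large error is typical or a single outlier.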