Classification is the problem of predicting a categorical target from labeled training data. It differs from regression in the type of target: in regression the target is continuous, while in classification it is categorical.
Creating classification models is easy with GraphLab Create! Currently, the following models are supported for classification:
- Logistic regression
- Nearest neighbor classifier
- Support vector machines (SVM)
- Boosted Decision Trees
- Random Forests
- Decision Tree
- Neural network classifier (deep learning)
These algorithms differ in how they make predictions, but they all conform to the same API. With every model, call create() to train the model, predict() to make flexible predictions with the returned model, classify() to get the statistics needed to classify data (such as the predicted class and its probability), and evaluate() to measure the quality of the predictions. Models can incorporate:
- Numeric features
- Categorical variables
- Dictionary features (i.e., sparse features)
- List features (i.e., dense arrays)
- Text data
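To make the feature types above concrete, here is a minimal pure-Python sketch of how one mixed-type row could be flattened into a sparse set of named numeric columns. It is an illustration only: the encoding rules (one-hot for categorical values, one column per dictionary key or list position, word counts for text) are assumptions, not a description of how GraphLab Create encodes features internally. The helper `flatten_features`, its `text_columns` parameter, and every column name except `user_avg_stars` are hypothetical.

```python
from collections import Counter


def flatten_features(row, text_columns=()):
    """Flatten one mixed-type row into a sparse {column_name: float} dict.

    Assumed encoding, for illustration only (not GraphLab Create internals).
    """
    flat = {}
    for name, value in row.items():
        if name in text_columns:
            # Text data: bag-of-words counts, one column per distinct word.
            for word, count in Counter(value.lower().split()).items():
                flat["%s[%s]" % (name, word)] = float(count)
        elif isinstance(value, str):
            # Categorical variable: one-hot column for the observed category.
            flat["%s=%s" % (name, value)] = 1.0
        elif isinstance(value, dict):
            # Dictionary feature: each key becomes its own sparse column.
            for key, item in value.items():
                flat["%s[%s]" % (name, key)] = float(item)
        elif isinstance(value, (list, tuple)):
            # List feature: each position becomes a dense column.
            for index, item in enumerate(value):
                flat["%s[%d]" % (name, index)] = float(item)
        else:
            # Numeric feature: used as-is.
            flat[name] = float(value)
    return flat


# One hypothetical row containing all five feature types.
row = {
    "user_avg_stars": 3.8,                        # numeric
    "city": "Phoenix",                            # categorical
    "category_counts": {"pizza": 2, "bar": 1},    # dictionary (sparse)
    "embedding": [0.1, -0.4],                     # list (dense array)
    "review": "great pizza great service",        # text
}
encoded = flatten_features(row, text_columns=("review",))
```

Because string columns can be either categorical or free text, the sketch asks the caller to name the text columns explicitly rather than guessing from the values.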
It isn't always clear which model is best suited to a given task. GraphLab Create's model selector automatically picks a suitable model for you based on statistics collected from the data set.
```python
import graphlab as gl

# Load the data
data = gl.SFrame('https://static.turi.com/datasets/regression/yelp-data.csv')

# Restaurants with rating >= 3 are good
data['is_good'] = data['stars'] >= 3

# Make a train-test split
train_data, test_data = data.random_split(0.8)

# Automatically picks the right model based on your data.
model = gl.classifier.create(train_data, target='is_good',
                             features=['user_avg_stars',
                                       'business_avg_stars',
                                       'user_review_count',
                                       'business_review_count'])

# Generate predictions (class/probabilities etc.), contained in an SFrame.
predictions = model.classify(test_data)

# Evaluate the model, with the results stored in a dictionary
results = model.evaluate(test_data)
```
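As a rough illustration of what an evaluation step measures, the sketch below computes accuracy and confusion counts from true and predicted labels in plain Python. It does not call GraphLab Create, and the function name `evaluate_predictions` and the dictionary keys it returns are assumptions made for this example; the exact contents of the dictionary returned by evaluate() are not listed in this section. The labels are made up, with 1 standing for a restaurant where `is_good` is true.

```python
from collections import Counter


def evaluate_predictions(actual, predicted):
    """Return accuracy and confusion counts for two equal-length label lists."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    # Count every (actual, predicted) pair; pairs with equal labels are hits.
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return {
        "accuracy": correct / float(len(actual)),
        "confusion_matrix": dict(confusion),
    }


# Made-up labels for six test examples (1 = good, 0 = not good).
actual = [1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
results = evaluate_predictions(actual, predicted)
```

Here four of the six predictions match, so the accuracy is 4/6. The confusion counts show where the two mistakes fall: one good restaurant predicted as not good, and one the other way around.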
GraphLab Create's implementations are built to scale to billions of examples and millions of features.