mxnet

MXNet is an open source deep learning framework designed for efficiency and flexibility. GraphLab Create integrates MXNet for creating advanced deep learning models.

MXNet makes it easy to create state-of-the-art network architectures, including deep convolutional neural networks (CNNs) and recurrent neural networks (RNNs). MXNet supports multiple CPUs and GPUs out of the box: the computation is represented as a symbolic graph and automatically parallelized across multiple devices. Recent benchmarks showed MXNet performing as fast as or faster than other frameworks such as TensorFlow, Torch, or Caffe.

GraphLab Create embraces MXNet as its own module and adds features such as SFrame integration to further accelerate creating or fine-tuning your own advanced deep learning models with numerical, text, or image data.

For documentation on the MXNet design, please visit http://mxnet.readthedocs.io/en/latest/

Quick start

Linear Regression

import graphlab as gl
from graphlab import mxnet as mx

# Define the network symbol, equivalent to linear regression
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=1)
net = mx.symbol.LinearRegressionOutput(data=net, name='lr')

# Load data into SFrame and normalize features
sf = gl.SFrame.read_csv('https://static.turi.com/datasets/regression/houses.csv')
features = ['tax', 'bedroom', 'bath', 'size', 'lot']
for f in features:
    sf[f] = sf[f] - sf[f].mean()
    sf[f] = sf[f] / sf[f].std()

# Prepare the input iterator from SFrame
# `data_name` must match the first layer's name of the network.
# `label_name` must match the last layer's name plus "_label".
dataiter = mx.io.SFrameIter(sf, data_field=features, label_field='price',
                            data_name='data', label_name='lr_label',
                            batch_size=1)

# Train the network
model = mx.model.FeedForward.create(symbol=net, X=dataiter, num_epoch=20,
                                    learning_rate=1e-2,
                                    eval_metric='rmse')

# Make prediction
model.predict(dataiter)
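The 'rmse' metric reported during training is root-mean-square error between predictions and labels. A quick NumPy illustration of the formula (independent of MXNet):

```python
import numpy as np

def rmse(label, pred):
    # Root-mean-square error: sqrt of the mean squared difference
    # between true and predicted values.
    label = np.asarray(label, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sqrt(np.mean((label - pred) ** 2))

err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
# squared diffs are [0, 0, 4], so err = sqrt(4/3)
```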

Logistic Regression

import graphlab as gl
import numpy as np
from graphlab import mxnet as mx

# Define the network symbol, equivalent to logistic regression
net = mx.symbol.Variable('data')
net = mx.symbol.FullyConnected(data=net, name='fc1', num_hidden=1)
net = mx.symbol.LogisticRegressionOutput(data=net, name='lr')

# Load data into SFrame and normalize features
sf = gl.SFrame.read_csv('https://static.turi.com/datasets/regression/houses.csv')
sf['expensive'] = sf['price'] > 100000
features = ['tax', 'bedroom', 'bath', 'size', 'lot']
for f in features:
    sf[f] = sf[f] - sf[f].mean()
    sf[f] = sf[f] / sf[f].std()

# Prepare the input iterator from SFrame
# `data_name` must match the first layer's name of the network.
# `label_name` must match the last layer's name plus "_label".
dataiter = mx.io.SFrameIter(sf, data_field=features, label_field='expensive',
                            data_name='data', label_name='lr_label',
                            batch_size=1)

# Define the custom evaluation function for binary accuracy
def binary_acc(label, pred):
    return int(label[0]) == int(pred[0] >= 0.5)

model = mx.model.FeedForward.create(symbol=net, X=dataiter, num_epoch=20,
                                    learning_rate=1e-2,
                                    eval_metric=mx.metric.np(binary_acc))

# Make prediction
model.predict(dataiter)

Train your own Convolutional Neural Network (CNN) for Image Classification

import graphlab as gl
from graphlab import mxnet as mx
import numpy as np

# Define the network symbol
data = mx.symbol.Variable('data')
conv1 = mx.symbol.Convolution(data=data, name='conv1', num_filter=32, kernel=(3,3), stride=(2,2))
bn1 = mx.symbol.BatchNorm(data=conv1, name='bn1')
act1 = mx.symbol.Activation(data=bn1, name='relu1', act_type='relu')
mp1 = mx.symbol.Pooling(data=act1, name='mp1', kernel=(2,2), stride=(2,2), pool_type='max')

conv2 = mx.symbol.Convolution(data=mp1, name='conv2', num_filter=32, kernel=(3,3), stride=(2,2))
bn2 = mx.symbol.BatchNorm(data=conv2, name='bn2')
act2 = mx.symbol.Activation(data=bn2, name='relu2', act_type='relu')
mp2 = mx.symbol.Pooling(data=act2, name='mp2', kernel=(2,2), stride=(2,2), pool_type='max')

fl = mx.symbol.Flatten(data=mp2, name='flatten')
fc2 = mx.symbol.FullyConnected(data=fl, name='fc2', num_hidden=10)
softmax = mx.symbol.SoftmaxOutput(data=fc2, name='sm')

# Load MNIST image data into SFrame
sf = gl.SFrame('https://static.turi.com/datasets/mnist/sframe/train')

batch_size = 100
num_epoch = 1

# Prepare the input iterator from SFrame
# `data_name` must match the first layer's name of the network.
# `label_name` must match the last layer's name plus "_label".
dataiter = mx.io.SFrameImageIter(sf, data_field=['image'],
                            label_field='label',
                            data_name='data',
                            label_name='sm_label', batch_size=batch_size)

# Train the network
model = mx.model.FeedForward.create(softmax, X=dataiter,
                                    num_epoch=num_epoch,
                                    learning_rate=0.1, wd=0.0001,
                                    momentum=0.9,
                                    eval_metric=mx.metric.Accuracy())

# Make prediction
model.predict(dataiter)
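For a softmax network like the one above, prediction returns one row of class probabilities per example; to turn them into class labels, take the argmax of each row. A plain NumPy illustration (independent of MXNet):

```python
import numpy as np

# Each row holds the predicted probabilities of the classes
# (here 3 classes for brevity; MNIST would have 10).
probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])

# The predicted label is the index of the most probable class.
labels = probs.argmax(axis=1)
# labels -> array([1, 0])
```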

Model Creation

Model Training

As the examples above show, model.FeedForward.create() is the high-level API for training all kinds of neural networks. The main parameters are symbol and X, which bind to the network architecture and the data iterator respectively. Additional parameters such as num_epoch and optimizer control the optimization procedure. The default optimizer is optimizer.SGD. For convenience, model.FeedForward.create() also accepts optimization-related parameters as keyword arguments, e.g. learning_rate and momentum. For instance:

model = mx.model.FeedForward.create(symbol=net, X=dataiter, learning_rate=0.01)

is equivalent to

sgd = mx.optimizer.SGD(learning_rate=0.01, rescale_grad=1.0/batch_size)
model = mx.model.FeedForward.create(symbol=net, X=dataiter, optimizer=sgd)
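For intuition, the update rule applied by an SGD optimizer with momentum and weight decay (the learning_rate, momentum, and wd parameters above) can be sketched in plain NumPy. This is a simplified illustration of the math, not MXNet's actual implementation:

```python
import numpy as np

def sgd_update(weight, grad, mom, lr=0.01, momentum=0.9, wd=0.0001):
    """One SGD step with momentum and weight decay (illustrative only)."""
    # Weight decay (wd) adds an L2 regularization term to the gradient.
    grad = grad + wd * weight
    # Momentum accumulates an exponentially decayed sum of past updates.
    mom[:] = momentum * mom - lr * grad
    return weight + mom

w = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])
m = np.zeros(2)
w = sgd_update(w, g, m)
```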

model.FeedForward Model class of MXNet for training and predicting feedforward nets.
model.FeedForward.create Functional style to create a model.
model.FeedForward.load Load a model checkpoint from file.
model.FeedForward.save Save a model checkpoint to file.
model.FeedForward.predict Run prediction; always uses only one device.
optimizer.SGD A simple SGD optimizer with momentum and weight regularization.
optimizer.ccSGD A simple SGD optimizer with momentum and weight regularization.
optimizer.RMSProp RMSProp optimizer of Tieleman & Hinton, 2012.
optimizer.Adam Adam optimizer as described in [King2014].

Callback functions can be used to print progress or checkpoint the model during training. There are two types of callbacks: epoch_end_callback and batch_end_callback. Both are arguments to the model.FeedForward.create() function. For instance, the following example checkpoints the model every epoch and prints progress every 10 batches.

model = mx.model.FeedForward.create(symbol=net, X=dataiter,
                                    batch_end_callback=mx.callback.Speedometer(batch_size=batch_size, frequent=10),
                                    epoch_end_callback=mx.callback.do_checkpoint(prefix='model_checkpoint'))

epoch_end_callback is called at the end of each epoch and batch_end_callback at the end of each batch. The following lists callbacks for model checkpointing, metric logging, and progress printing.

Epoch end callbacks:

callback.do_checkpoint Callback to checkpoint the model to prefix every epoch.
callback.log_train_metric Callback to log the training evaluation result every period.

Batch end callbacks:

callback.Speedometer Calculate and log training speed every frequent batches.
callback.ProgressBar Show a progress bar.

Multiple GPU support

MXNet supports using multiple GPUs for model training and prediction. GPU support is available for Linux operating systems that have

  1. Nvidia CUDA 7.0 capable GPU(s)
  2. CUDA toolkit v7.0
  3. Minimum driver version of 346.xx

By default, the CUDA toolkit library is installed at /usr/local/cuda. If the CUDA toolkit is installed at a non-default location, please set the environment variable LD_LIBRARY_PATH to include the CUDA toolkit location. The following shows the log message printed when MXNet GPU support is activated:

>>> from graphlab import mxnet as mx
2016-04-15 11:37:22,580 [INFO] graphlab.mxnet.base, 42: CUDA GPU support is activated
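If the toolkit lives elsewhere, one way to expose it is to extend LD_LIBRARY_PATH before importing the module (the path below is an example; adjust it to your installation):

```shell
# Example only: point LD_LIBRARY_PATH at your CUDA toolkit's library directory.
export LD_LIBRARY_PATH=/opt/cuda-7.0/lib64:$LD_LIBRARY_PATH
```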

By default, model training and prediction use the CPU. When an Nvidia CUDA-enabled GPU is available and drivers are properly installed, you can specify a single GPU or multiple GPUs to speed up computation.

Devices in MXNet are called contexts. For example, the following code uses two GPUs for training a FeedForward network.

gpus = [mx.context.gpu(0), mx.context.gpu(1)]
model = mx.model.FeedForward.create(symbol=net, X=data, ctx=gpus, ...)
model2 = mx.model.FeedForward.load(..., ctx=gpus)

Note: Passing a list of CPU devices has no effect; computation on CPU is parallelized by the BLAS implementation.

context.gpu Return a GPU context.
context.cpu Return a CPU context.

Data Input from SFrame Iterator

SFrame is a scalable data frame object that can hold numerical, text, and image data. Training deep neural networks requires large amounts of data, usually far more than memory can hold. SFrame makes it easy to transform a large dataset and feed it to MXNet for training, without writing to disk in a custom file format or using a database.

MXNet integrates with SFrame via SFrameIter, which implements the DataIter interface. SFrameIter supports SFrames with either a single image-typed column or general tabular data with multiple numerical columns, which may have different dimensions.

SFrameImageIter is a specialized iterator for image-typed data. It supports image augmentation operations such as subtracting mean pixel values or rescaling pixel values.

io.SFrameIter DataIter interface for SFrame, a highly scalable columnar DataFrame.
io.SFrameImageIter Image data iterator from SFrame, with options to normalize and augment image data.
io.DataIter DataIter object in mxnet.
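Conceptually, a DataIter walks the dataset in fixed-size batches and yields data together with labels. The batching contract can be sketched in pure Python; this is an illustration of the interface idea, not SFrameIter's actual implementation:

```python
# Minimal sketch of the DataIter batching contract:
# yield (data_batch, label_batch) pairs of at most batch_size rows.
def iter_batches(data, labels, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size], labels[i:i + batch_size]

data = [[0.1], [0.2], [0.3], [0.4], [0.5]]
labels = [0, 1, 0, 1, 0]
batches = list(iter_batches(data, labels, batch_size=2))
# 5 rows with batch_size=2 -> 3 batches, the last one smaller.
```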

Builtin Networks

The following shows the built-in state-of-the-art network architectures for image classification.

builtin_symbols.symbol_alexnet.get_symbol Return the “AlexNet” architecture for image classification
builtin_symbols.symbol_googlenet.get_symbol Return the “GoogLeNet” architecture for image classification
builtin_symbols.symbol_vgg.get_symbol Return the “VGG” architecture for image classification
builtin_symbols.symbol_inception_v3.get_symbol Return the “Inception-v3” architecture for image classification
builtin_symbols.symbol_inception_bn.get_symbol Return the “BN-Inception” architecture for image classification
builtin_symbols.symbol_inception_bn_28_small.get_symbol Return a simplified version of “BN-Inception” architecture for image classification
builtin_symbols.symbol_inception_bn_full.get_symbol Return a variant of “BN-Inception” architecture for image classification

Task Oriented Pretrained Models for Image Classification

Pretrained models are deep neural networks trained on large datasets for specific tasks. For general purpose tasks such as image classification or object detection, the quickest way to get value from deep learning is to directly apply the pretrained models.

MXNet in GraphLab Create streamlines the process of using pretrained models in the following ways:

  • Provides task oriented API designed to simplify the common use cases
  • Integrates with SFrame for scalable data loading and transformation
  • Allows model download and management via a simple API

The following example shows the end to end process of downloading a pretrained image classifier and classifying thousands of images.

import graphlab as gl
from graphlab import mxnet as mx

mx.pretrained_model.download_model('https://static.turi.com/models/mxnet_models/release/image_classifier/imagenet1k_inception_bn-1.0.tar.gz')

mx.pretrained_model.list_models()

image_classifier = mx.pretrained_model.load_model('imagenet1k_inception_bn', ctx=mx.gpu(0))

# Load image data into SFrame
sf = gl.SFrame('https://static.turi.com/datasets/cats_dogs_sf')

# Predict using the pretrained image classifier
prediction = image_classifier.predict_topk(sf['image'], k=1)

pretrained_model.list_models Return the list of pretrained model names.
pretrained_model.download_model Download the model to the local filesystem.
pretrained_model.load_model Load a pretrained model by name.
pretrained_model.load_path Load a pretrained model by path.
pretrained_model.ImageClassifier.predict_topk Predict the top-k classes for the given data.
pretrained_model.ImageClassifier.extract_features Extract features from the second-to-last layer in the network.

Task Oriented Pretrained Models for Object Detection

Turi also provides a pre-trained object detector. Object detection is the task of identifying objects in an image, and also providing bounding boxes for them. This is generally a challenging task, but neural networks have proven to be quite effective.

import graphlab as gl
from graphlab import mxnet as mx

mx.pretrained_model.download_model('https://static.turi.com/models/mxnet_models/release/image_detector/coco_vgg_16-1.0.tar.gz')

mx.pretrained_model.list_models()

image_detector = mx.pretrained_model.load_model('coco_vgg_16', ctx=mx.gpu(0))

# Load image data into SFrame
sf = gl.SFrame('https://static.turi.com/datasets/cats_dogs_sf')

# Detect objects using the pretrained object detector
prediction = image_detector.detect(sf['image'][0], k=1)

image_detector.visualize_detection(sf['image'][0], prediction)

pretrained_model.list_models Return the list of pretrained model names.
pretrained_model.download_model Download the model to the local filesystem.
pretrained_model.load_model Load a pretrained model by name.
pretrained_model.load_path Load a pretrained model by path.
pretrained_model.ImageDetector.detect Detect objects in the given data.
pretrained_model.ImageDetector.extract_features Detect objects and extract features of the objects in the given data.

Symbols (Layers)

symbol.Symbol is the building block of a neural network. Every symbol can be viewed as a functional object with forward and backward operations. Individual symbols can be composed into more complex symbols, which form a neural network.

For a more detailed tutorial on the Symbolic API, please see: http://mxnet.readthedocs.io/en/latest/packages/python/symbol.html

Fully Connected and Convolution Layers

symbol.FullyConnected Apply matrix multiplication to input then add a bias.
symbol.Convolution Apply convolution to input then add a bias.
symbol.Deconvolution Apply deconvolution to input then add a bias.
symbol.Pooling Perform spatial pooling on inputs.

Activation Layers

symbol.Activation Apply an activation function to input. Softmax activation is only available with CuDNN on GPU and is computed at each location across channels if the input is 4D.
symbol.SoftmaxActivation Apply softmax activation to input.

Output Layers

symbol.LinearRegressionOutput Use linear regression for the final output of a net.
symbol.LogisticRegressionOutput Use logistic regression for the final output of a net.
symbol.MAERegressionOutput Use mean absolute error regression for the final output of a net.
symbol.SoftmaxOutput Perform a softmax transformation on input, backprop with logloss.
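For reference, the softmax transformation applied by symbol.SoftmaxOutput maps raw scores to a probability distribution. A NumPy illustration of the formula (not MXNet code):

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged
    # because softmax is invariant to shifting all inputs by a constant.
    e = np.exp(x - np.max(x))
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
# probs sums to 1 and preserves the ordering of the inputs.
```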

Reshape Layers

symbol.Cast Cast array to a different data type.
symbol.Concat Concatenate features along the channel dimension (default is 1). This function supports a variable number of positional inputs.
symbol.Crop Crop the 2nd and 3rd dimensions of the input data to the size given by h_w, or to the width and height of a second input symbol. This function supports a variable number of positional inputs.
symbol.ElementWiseSum Perform an elementwise sum over all the inputs.
symbol.Flatten Flatten input.
symbol.Group Create a symbol that groups symbols together.
symbol.SliceChannel Slice input equally along a specified axis.
symbol.UpSampling Perform nearest-neighbor/bilinear upsampling of inputs. This function supports a variable number of positional inputs.

Regularization Layers

symbol.Dropout Apply dropout to input
symbol.BatchNorm Apply batch normalization to input.

Debugging and Monitoring

Monitor is used to track outputs, weights and gradients during training. Create a Monitor object and pass to the model.FeedForward.create() function to enable monitoring.

monitor.Monitor Monitor outputs, weights, and gradients for debugging.
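A monitor reduces each tracked array (outputs, weights, gradients) to a scalar statistic so that exploding or vanishing values stand out in logs. One common choice of statistic, the mean absolute value, can be sketched in NumPy (an illustration of the idea, not the Monitor API):

```python
import numpy as np

# Reduce an array to one scalar summary, e.g. its mean absolute value.
# Tracking this per layer over training reveals exploding or vanishing
# activations and gradients.
def mean_abs(x):
    return float(np.mean(np.abs(x)))

stat = mean_abs(np.array([-2.0, 0.0, 2.0]))
# stat = (2 + 0 + 2) / 3
```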

Extract Features from a Model

The following shows how to extract features from a trained model. model.extract_features() extracts features for data from a given model. The main parameters are model and data: the model.FeedForward model and a data iterator. The optional parameter top_layer indicates the layer from which features will be extracted; if top_layer is not set, the second-to-last layer is used.

Note

The default `top_layer` is only correct for single-output networks.

# net is a FeedForward model
# This extracts features from the second-to-last layer (the last layer is the classifier)
fea = mx.model.extract_features(model=net, data=dataiter)

or

# net is a FeedForward model
# This extracts features from the layer `fc1_output`
fea = mx.model.extract_features(model=net, data=dataiter, top_layer='fc1_output')

If a wrong top_layer name is given, the exception will list the valid candidates for top_layer.

model.extract_features Extract features from a model.

Finetune a Model

model.get_feature_symbol() is a helper function for finetuning. It generates a feature symbol from a given FeedForward model. As with feature extraction, the optional parameter top_layer sets the layer up to which the network is kept. If it is not set, the layer before the linear classifier is used as the feature layer.

Note

The default `top_layer` is only correct for single-output networks.

# net is a FeedForward model
# This gets the symbol of the second-to-last layer and below (the last layer is the classifier)
feature_symbol = mx.model.get_feature_symbol(net)

or

# net is a FeedForward model
# This will get symbol of `fc1_output` and below
feature_symbol = mx.model.get_feature_symbol(net, top_layer='fc1_output')

After we get the feature symbol, we can build a new symbol for a different task:

# net is a FeedForward model
# This builds a classifier symbol on top of the feature symbol
feature_symbol = mx.model.get_feature_symbol(net)
classifier = mx.sym.FullyConnected(data=feature_symbol, num_hidden=18, name='new_classifier')
classifier = mx.sym.SoftmaxOutput(data=classifier, name='softmax')

or

# net is a FeedForward model
# This builds a regressor symbol on top of the feature symbol
feature_symbol = mx.model.get_feature_symbol(net)
regressor = mx.sym.FullyConnected(data=feature_symbol, num_hidden=1, name='new_regressor')
regressor = mx.sym.LinearRegressionOutput(data=regressor, name='lr')

The API model.finetune is then used to finetune a model. The main parameters are symbol and model: the new task's network symbol and the model.FeedForward model to finetune from. The remaining parameters are the same as for model.FeedForward.create(), specifying training data, validation data, and optimization.

Note

Usually, a smaller learning rate is used for finetuning.

# classifier is the new symbol
# net is a FeedForward model
new_model = mx.model.finetune(symbol=classifier, model=net, num_epoch=2, learning_rate=1e-3,
                              X=train, eval_data=val,
                              batch_end_callback=mx.callback.Speedometer(100))

model.finetune Get a FeedForward model for finetuning.
model.get_feature_symbol Get a feature symbol from a model.