Distributing model parameter search

For all model parameter search methods, as well as cross_val_score, you can choose to run the jobs either locally or remotely.
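Conceptually, a parameter search trains one model per combination of candidate parameter values and keeps the combination that scores best on a validation set. A minimal sketch in plain Python (not the GraphLab API; the grid and the scoring function below are made up for illustration):

```python
# Illustrative sketch of a grid parameter search: enumerate every
# combination of candidate values and keep the best-scoring one.
from itertools import product

def parameter_search(score, param_grid):
    """Return (best_params, best_score) over all grid combinations."""
    names = sorted(param_grid)
    best_params, best_score = None, float('-inf')
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)  # stands in for "train a model, score on validation"
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Toy grid and toy score function (peaks at max_depth=4, small step_size)
grid = {'max_depth': [2, 4, 6], 'step_size': [0.1, 0.3]}
best, best_score = parameter_search(
    lambda p: -(p['max_depth'] - 4) ** 2 - p['step_size'], grid)
print(best)  # -> {'max_depth': 4, 'step_size': 0.1}
```

The real search additionally handles training the models, collecting results, and scheduling the work on whichever environment you choose.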

By default, jobs are scheduled to run locally and asynchronously; this is called a LocalAsync environment.
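The asynchronous part means the call returns a job handle immediately rather than blocking until the search finishes. A rough analogy in plain Python using the standard library's concurrent.futures (GraphLab's own LocalAsync machinery is not shown here):

```python
# Sketch of local asynchronous execution: submit work to a background
# worker, get a handle back immediately, and collect the result later.
from concurrent.futures import ThreadPoolExecutor

def slow_search(candidates):
    # Stand-in for a model parameter search over `candidates`.
    return max(candidates)

executor = ThreadPoolExecutor(max_workers=1)
job = executor.submit(slow_search, [1, 5, 3])  # returns immediately
# ... do other work while the job runs in the background ...
result = job.result()                          # block until it finishes
print(result)  # -> 5
executor.shutdown()
```

The job object returned by model_parameter_search plays the same role as the future here: you can keep working and query it for results when the search completes.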

You may also run jobs on an EC2 cluster or a Hadoop cluster. This is especially useful for larger-scale parameter searches.

To launch a job on an EC2 cluster, you first create an EC2 environment and pass it into the environment argument:

# Note: the s3_path value below is an illustrative placeholder.
ec2config = graphlab.deploy.Ec2Config()
ec2 = graphlab.deploy.ec2_cluster.create(name='mps',
                                         s3_path='s3://my-bucket/mps',
                                         ec2_config=ec2config)

j = graphlab.model_parameter_search.create((train, valid),
                                           my_model, my_params,
                                           environment=ec2)

For launching jobs on a Hadoop cluster, you instead create a Hadoop environment and pass this object into the environment argument:

hd = graphlab.deploy.hadoop_cluster.create(name='hadoop-cluster',
                                           turi_dist_path=<path to installation>)

j = graphlab.model_parameter_search.create((train, valid),
                                           my_model, my_params,
                                           environment=hd)

For more details on creating EC2- and Hadoop-based environments, check out the API docs or the Deployment chapter of the user guide.

When getting started, it is useful to keep perform_trial_run=True, so that a quick trial run flags model-creation problems before the full search is launched.
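The idea behind a trial run can be sketched in plain Python (a hypothetical helper, not GraphLab's implementation): fit one parameter combination end to end first, so that setup mistakes such as bad column names or invalid parameter values surface cheaply before the full sweep runs.

```python
# Sketch of the trial-run idea: evaluate a single parameter combination
# before sweeping the whole grid, so setup errors fail fast.
def trial_run(score, param_grid):
    """Evaluate one combination; raises early if the setup is broken."""
    trial = {name: values[0] for name, values in param_grid.items()}
    return trial, score(trial)

grid = {'max_depth': [2, 4], 'step_size': [0.1, 0.3]}
trial_params, trial_score = trial_run(
    lambda p: p['max_depth'] * p['step_size'], grid)
print(trial_params)  # -> {'max_depth': 2, 'step_size': 0.1}
```

If the trial combination trains and scores without error, the full search over all combinations is much more likely to succeed.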