Turi Distributed Introduction

In this chapter, we demonstrate how to use Turi Distributed, a distributed and asynchronous execution framework that makes it easy to take your prototypes to production. Turi Distributed provides a lightweight framework for creating environments for distributed execution (for example, in EC2 or Hadoop) and for submitting jobs to those environments. It keeps the client and the execution environment loosely coupled, and it includes management tools to support asynchronous execution.

To work with a Turi Distributed deployment (either on an on-premises cluster or in the cloud), you use a GraphLab Create client. The APIs under graphlab.deploy provide the functionality needed to create, use, and administer Turi Distributed environments. In addition, some of the runtime information about distributed job execution can be visualized in GraphLab Canvas.

Jobs Dashboard in GraphLab Canvas

The chapters that follow cover these aspects of remote and asynchronous job execution in more detail:

Asynchronous Jobs describes how you can execute jobs asynchronously while still running them on your local machine. Note that this functionality does not depend on Turi Distributed.

Installing on Hadoop explains how to install Turi Distributed on your local Hadoop environment.

Clusters provides a walk-through of submitting jobs to EC2 as well as Hadoop.

An end-to-end example demonstrates how to implement a recommender and run it as a remote job.

Distributed Machine Learning introduces the concept of executing Turi toolkits in a distributed environment transparently.

Monitoring Jobs outlines how to gain insight into the status and health of previously submitted jobs.

Session Management contains information about how to maintain local references to jobs and environments.

Dependencies explains how to include external packages required by your use case in job deployment and execution.