graph_analytics

The graph analytics toolkit contains methods for analyzing a SGraph. Each method takes an input graph and returns a model object, which contains the training time, an SFrame with the desired output for each vertex, and a new graph whose vertices contain the output as an attribute. Note that all of the methods in the graph analytics toolkit are available from the top level graphlab import.

For this example we download a dataset of James Bond characters to an SFrame, then build the SGraph.

>>> from graphlab import SFrame, SGraph
>>> url = 'https://static.turi.com/datasets/bond/bond_edges.csv'
>>> data = SFrame.read_csv(url)
>>> g = SGraph().add_edges(data, src_field='src', dst_field='dst')

The degree_counting.create() method computes the inbound, outbound, and total degree for each vertex.

>>> from graphlab import degree_counting
>>> deg = degree_counting.create(g)
>>> deg_graph = deg['graph'] # a new SGraph with degree data attached to each vertex
>>> in_degree = deg_graph.vertices[['__id', 'in_degree']]
>>> out_degree = deg_graph.vertices[['__id', 'out_degree']]

The connected_components.create() method finds all weakly connected components in the graph and returns a ConnectedComponentsModel. A connected component is a group of vertices such that there is a path between each vertex in the component and all other vertices in the group. If two vertices are in different connected components there is no path between them.

>>> from graphlab import connected_components
>>> cc = connected_components.create(g)
>>> print cc.summary()
>>> print cc.list_fields()

>>> cc_ids = cc.get('component_id')  # an SFrame
>>> cc_ids = cc['component_id']      # equivalent to the above line
>>> cc_graph = cc['graph']

The shortest_path.create() method finds the shortest directed path distance from a single source vertex to all other vertices and returns a ShortestPathModel. The output model contains this information and a method to retrieve the actual path to a particular vertex.

>>> from graphlab import shortest_path
>>> sp = shortest_path.create(g, source_vid=123)
>>> sp_sframe = sp['distance']
>>> sp_graph = sp['graph']
>>> path = sp.get_path('98')

The triangle_counting.create() counts the number of triangles in the graph and for each vertex and returns a TriangleCountingModel. A graph triangle is a complete subgraph of three vertices. The number of triangles to which a vertex belongs is an indication of the connectivity of the graph around that vertex.

>>> from graphlab import triangle_counting
>>> tc = triangle_counting.create(g)
>>> tc_out = tc['triangle_count']

The pagerank.create() method computes the pagerank for each vertex and returns a PagerankModel. The pagerank value indicates the centrality of each node in the graph.

>>> from graphlab import pagerank
>>> pr = pagerank.create(g)
>>> pr_out = pr['pagerank']

The label_propagation.create() method computes the label probability for the vertices with unobserved labels by propagating information from the vertices with observed labels along the edges. The method returns a LabelPropagationModel, which contains the probability that a vertex belongs to each of the label classes.

>>> from graphlab import label_propagation
>>> import random
>>> def init_label(vid):
...     x = random.random()
...     if x > 0.9:
...         return 0
...     elif x < 0.1:
...         return 1
...     else:
...         return None
>>> g.vertices['labels'] = g.vertices['__id'].apply(init_label, int)
>>> m = label_propagation.create(g)
>>> labels = m['labels']

The K-core decomposition recursively removes vertices from the graph with degree less than k. The value of k where a vertex is removed is its core ID; the kcore.create() method returns a KcoreModel which contains the core ID for every vertex in the graph.

>>> from graphlab import kcore
>>> kc = kcore.create(g)
>>> kcore_id = kc['core_id']

Graph coloring assigns each vertex in the graph to a group in such a way that no two adjacent vertices share the same group. graph_coloring.create() method returns a GraphColoringModel which contains the color group ID for every vertex in the graph.

>>> from graphlab import graph_coloring
>>> color = graph_coloring.create(g)
>>> color_id = color['color_id']
>>> num_colors = color['num_colors']

For more information on the models in the graph analytics toolkit, plus extended examples, please see the model definitions and create methods in the API documentation, as well as the data science Gallery, the How-Tos, and the graph analytics chapter of the User Guide.

connected components

connected_components.create Compute the number of weakly connected components in the graph.
connected_components.ConnectedComponentsModel A ConnectedComponentsModel object contains the component ID for each vertex and the total number of weakly connected components in the graph.
connected_components.get_default_options Get the default options for graphlab.connected_components.create().

degree counting

degree_counting.create Compute the in degree, out degree and total degree of each vertex.
degree_counting.DegreeCountingModel Model object containing the in degree, out degree and total degree for each vertex,
degree_counting.get_default_options Get the default options for graphlab.degree_counting.create().

graph coloring

graph_coloring.create Compute the graph coloring.
graph_coloring.GraphColoringModel A GraphColoringModel object contains color ID assignments for each vertex and the total number of colors used in coloring the entire graph.
graph_coloring.get_default_options Get the default options for graphlab.graph_coloring.create().

k-core

kcore.create Compute the K-core decomposition of the graph.
kcore.KcoreModel A KcoreModel object contains a core ID for each vertex, and the total number of cores in the graph.
kcore.get_default_options Get the default options for graphlab.kcore.create().

label propagation

label_propagation.create Given a weighted graph with observed class labels of a subset of vertices, infer the label probability for the unobserved vertices using the “label propagation” algorithm.
label_propagation.LabelPropagationModel A LabelPropagationModel computes the probability of each class label for each unlabeled vertex.
label_propagation.get_default_options Get the default options for graphlab.label_propagation.create().

pagerank

pagerank.create Compute the PageRank for each vertex in the graph.
pagerank.PagerankModel A PageRankModel object contains the pagerank value for each vertex.
pagerank.get_default_options Get the default options for graphlab.pagerank.create().

shortest path

shortest_path.create Compute the single source shortest path distance from the source vertex to all vertices in the graph.
shortest_path.ShortestPathModel Model object containing the distance for each vertex in the graph to a single source vertex, which is specified during graphlab.shortest_path.create().
shortest_path.get_default_options Get the default options for graphlab.shortest_path.create().

triangle counting

triangle_counting.create Compute the number of triangles each vertex belongs to, ignoring edge directions.
triangle_counting.TriangleCountingModel Model object containing the traingle count for each vertex, and the total number of triangles.
triangle_counting.get_default_options Get the default options for graphlab.triangle_counting.create().