graphlab.recommender.item_similarity_recommender.create

graphlab.recommender.item_similarity_recommender.create(observation_data, user_id='user_id', item_id='item_id', target=None, user_data=None, item_data=None, nearest_items=None, similarity_type='jaccard', threshold=0.001, only_top_k=64, verbose=True, target_memory_usage=8589934592, **kwargs)

Create a recommender that uses item-item similarities based on users in common.

Parameters:

observation_data : SFrame

The dataset to use for training the model. It must contain a column of user ids and a column of item ids. Each row represents an observed interaction between the user and the item. The (user, item) pairs are stored with the model so that they can later be excluded from recommendations if desired. It can optionally contain a target ratings column. All other columns are interpreted by the underlying model as side features for the observations.

The user id and item id columns must be of type ‘int’ or ‘str’. The target column must be of type ‘int’ or ‘float’.

user_id : string, optional

The name of the column in observation_data that corresponds to the user id.

item_id : string, optional

The name of the column in observation_data that corresponds to the item id.

target : string, optional

The observation_data can optionally contain a column of scores representing ratings given by the users. If present, the name of this column may be specified with the target parameter.

user_data : SFrame, optional

Side information for the users. This SFrame must have a column with the same name as what is specified by the user_id input parameter. user_data can provide any amount of additional user-specific information. (NB: This argument is currently ignored by this model.)

item_data : SFrame, optional

Side information for the items. This SFrame must have a column with the same name as what is specified by the item_id input parameter. item_data can provide any amount of additional item-specific information. (NB: This argument is currently ignored by this model.)

similarity_type : {‘jaccard’, ‘cosine’, ‘pearson’}, optional

Similarity metric to use. See ItemSimilarityRecommender for details. Default: ‘jaccard’.

threshold : float, optional

Predictions ignore items whose similarity is below this value. Default: 0.001.

only_top_k : int, optional

Number of similar items to store for each item. Default value is 64. Decreasing this decreases the amount of memory required for the model, but may also decrease the accuracy.

nearest_items : SFrame, optional

A set of each item’s nearest items. When provided, this overrides the similarity computed above. See Notes in the documentation for ItemSimilarityRecommender. Default: None.

target_memory_usage : int, optional

The target memory usage, in bytes, for the processing buffers and lookup tables. The actual memory usage may be higher or lower than this. Decreasing this value reduces memory usage at the expense of training time, and increasing it can dramatically speed up training. Default is 8GB = 8589934592.

seed_item_set_size : int, optional

For users that have not yet rated any items, or have only rated uniquely occurring items with no similar item info, the model seeds the user’s item set with the average ratings of the seed_item_set_size most popular items when making predictions and recommendations. If set to 0, then recommendations based on either popularity (no target present) or average item score (target present) are made in this case.

training_method : (advanced), optional.

The internal processing is done with a combination of nearest neighbor searching, dense tables for tracking item-item similarities, and sparse item-item tables. If ‘auto’ is chosen (default), then the computation time is estimated for each method, and the computation is balanced between the methods in order to minimize training time given the target memory usage. This parameter allows the user to force the use of one of these methods. All should give equivalent results; the only difference would be training time. Possible values are {‘auto’, ‘dense’, ‘sparse’, ‘nn’, ‘nn:dense’, ‘nn:sparse’}. ‘dense’ uses a dense matrix to store item-item interactions as a lookup, and may do multiple passes to control memory requirements. ‘sparse’ does the same but with a sparse lookup table; this is better if the data has many infrequent items. ‘nn’ uses a brute-force nearest neighbors search. ‘nn:dense’ and ‘nn:sparse’ use nearest neighbors for the most frequent items (see nearest_neighbors_interaction_proportion_threshold below), and dense or sparse matrices, respectively, for the remainder. ‘auto’ chooses the method predicted to be the fastest based on the properties of the data.

nearest_neighbors_interaction_proportion_threshold : (advanced) float

Any item that was rated by more than this proportion of users is processed using a nearest neighbors search. For frequent items, this is almost always faster, but it is slower for infrequent items. Furthermore, decreasing this causes more items to be processed using the nearest neighbor path, which may decrease memory requirements.

degree_approximation_threshold : (advanced) int, optional

Users with more than this many item interactions may be approximated. The approximation is done by a combination of sampling and choosing the interactions likely to have the most impact on the model. Increasing this can increase the training time and may or may not increase the quality of the model. Default = 4096.

max_data_passes : (advanced) int, optional

The maximum number of passes through the data allowed in building the similarity lookup tables. If it is not possible to build the recommender in this many passes (calculated before that stage of training), then additional approximations are applied; namely decreasing degree_approximation_threshold. If this is not possible, an error is raised. To decrease the number of passes required, increase target_memory_usage or decrease nearest_neighbors_interaction_proportion_threshold. Default = 1024.
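
As a rough illustration of how the threshold and only_top_k parameters described above could interact, the following plain-Python sketch prunes one item's list of scored neighbors. It does not call GraphLab. The helper name prune_neighbors, the sample scores, and the choice to keep scores greater than or equal to the threshold are assumptions made for this example, not the library's actual implementation.

```python
def prune_neighbors(scored_items, threshold=0.001, only_top_k=64):
    """Keep at most only_top_k neighbors whose score is at least threshold.

    scored_items is a list of (item, score) pairs for a single item.
    This is a hypothetical helper, not part of GraphLab.
    """
    # Drop neighbors whose similarity falls below the threshold.
    kept = [(item, score) for item, score in scored_items if score >= threshold]
    # Order the survivors from most to least similar.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    # Store only the k most similar neighbors.
    return kept[:only_top_k]


# Made-up similarity scores between one item and four candidate neighbors.
candidates = [("b", 2.0 / 3), ("c", 1.0 / 3), ("d", 0.0005), ("e", 0.5)]

top_two = prune_neighbors(candidates, threshold=0.001, only_top_k=2)
all_kept = prune_neighbors(candidates)
```

With these values, "d" is dropped by the threshold in both calls, and only_top_k=2 further limits the stored neighbors to "b" and "e".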

Notes

Currently, ItemSimilarityRecommender does not use the side features in user_data and item_data.

Incorporating pre-defined similar items

For item similarity models, one may choose to provide a user-specified nearest neighbors graph using the keyword argument nearest_items. This is an SFrame containing, for each item, the nearest items and the similarity score between them. If provided, these item similarity scores are used for recommendations. The SFrame must contain (at least) three columns:

  • ‘item_id’: a column with the same name as that provided to the item_id argument (which defaults to the string “item_id”).
  • ‘similar’: a column containing the nearest items for the given item id. This should have the same type as the item_id column.
  • ‘score’: a numeric score measuring how similar these two items are.

For example, suppose you first create an ItemSimilarityRecommender and use get_similar_items:

>>> sf = graphlab.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],
...                       'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"]})
>>> m = graphlab.item_similarity_recommender.create(sf)
>>> nn = m.get_similar_items()
>>> m2 = graphlab.item_similarity_recommender.create(sf, nearest_items=nn)

With the above code, the item similarities computed for model m are used to create a new recommender object, m2. Note that nn could instead have been created by some other means; either way, m2 can then be used to make recommendations via m2.recommend().
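
To make the "some other means" case concrete, here is a minimal sketch that computes Jaccard similarities by hand for the same toy data and arranges them as rows with the three required columns. It is plain Python and does not call GraphLab. The names users_by_item, jaccard, and nearest_rows are illustrative, and the scores are the textbook Jaccard index, which may differ in detail from what the library computes internally.

```python
from itertools import combinations

# Toy observations matching the example above: (user_id, item_id) pairs.
observations = [("0", "a"), ("0", "b"), ("0", "c"),
                ("1", "a"), ("1", "b"),
                ("2", "b"), ("2", "c"), ("2", "d")]

# Collect the set of users who interacted with each item.
users_by_item = {}
for user, item in observations:
    users_by_item.setdefault(item, set()).add(user)


def jaccard(users_a, users_b):
    """Textbook Jaccard index: size of intersection over size of union."""
    union = users_a | users_b
    return len(users_a & users_b) / float(len(union)) if union else 0.0


# One row per ordered (item, similar item) pair with a nonzero score,
# using the column names required for nearest_items.
nearest_rows = []
for a, b in combinations(sorted(users_by_item), 2):
    score = jaccard(users_by_item[a], users_by_item[b])
    if score > 0:
        nearest_rows.append({"item_id": a, "similar": b, "score": score})
        nearest_rows.append({"item_id": b, "similar": a, "score": score})
```

Rows like these would still need to be loaded into an SFrame with columns ‘item_id’, ‘similar’, and ‘score’ before being passed as nearest_items; that step is not shown here.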

Examples

Given basic user-item observation data, an ItemSimilarityRecommender is created:

>>> sf = graphlab.SFrame({'user_id': ['0', '0', '0', '1', '1', '2', '2', '2'],
...                       'item_id': ['a', 'b', 'c', 'a', 'b', 'b', 'c', 'd']})
>>> m = graphlab.item_similarity_recommender.create(sf)
>>> recs = m.recommend()

When a target is available, it can be passed along with the desired similarity type. For example, we may choose cosine similarity and use the resulting model to make predictions or recommendations.

>>> sf2 = graphlab.SFrame({'user_id': ['0', '0', '0', '1', '1', '2', '2', '2'],
...                        'item_id': ['a', 'b', 'c', 'a', 'b', 'b', 'c', 'd'],
...                        'rating': [1, 3, 2, 5, 4, 1, 4, 3]})
>>> m2 = graphlab.item_similarity_recommender.create(sf2, target="rating",
...                                                  similarity_type='cosine')
>>> m2.predict(sf)
>>> m2.recommend()
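
For intuition about what a cosine similarity between two items means on this rating data, the sketch below computes the standard cosine between item rating vectors in plain Python. It does not call GraphLab, the names vectors and cosine are illustrative, and it is an assumption that the library's ‘cosine’ option follows this same textbook definition.

```python
import math

# Same toy ratings as sf2 above: (user_id, item_id, rating).
ratings = [("0", "a", 1), ("0", "b", 3), ("0", "c", 2),
           ("1", "a", 5), ("1", "b", 4),
           ("2", "b", 1), ("2", "c", 4), ("2", "d", 3)]

# Represent each item as a sparse {user: rating} vector.
vectors = {}
for user, item, rating in ratings:
    vectors.setdefault(item, {})[user] = rating


def cosine(vec_a, vec_b):
    """Standard cosine similarity between two sparse rating vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[u] * vec_b[u] for u in shared)
    norm_a = math.sqrt(sum(r * r for r in vec_a.values()))
    norm_b = math.sqrt(sum(r * r for r in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


sim_ab = cosine(vectors["a"], vectors["b"])  # 23 / 26, about 0.885
sim_ad = cosine(vectors["a"], vectors["d"])  # no users in common, so 0.0
```

Items "a" and "b" share users "0" and "1", giving a dot product of 1*3 + 5*4 = 23 and norms of sqrt(26) each, while "a" and "d" share no users and so have zero similarity.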