graphlab.SFrame.unpack

SFrame.unpack(unpack_column, column_name_prefix=None, column_types=None, na_value=None, limit=None)

Expand one column of this SFrame to multiple columns with each value in a separate column. Returns a new SFrame with the unpacked column replaced with a list of new columns. The column must be of list/array/dict type.

For more details on name generation, missing-value handling, and other behavior, refer to the SArray version of unpack().

Parameters:

unpack_column : str

Name of the column to unpack.

column_name_prefix : str, optional

If provided, the names of the unpacked columns start with the given prefix. If not provided, the prefix defaults to the name of the column being unpacked.

column_types : [type], optional

Column types for the unpacked columns. If not provided, column types are inferred automatically from the first 100 rows. For array type, the default column type is float. If provided, column_types also restricts how many columns are unpacked.

na_value : flexible_type, optional

If provided, all values equal to “na_value” are converted to missing values (None).

limit : list[str] | list[int], optional

Restricts unpacking to a subset of each list/array/dict value. For a dictionary column, limit is a list of the dictionary keys to unpack. For a list/array column, limit is a list of integer indexes into the list/array value.

Returns:

out : SFrame

A new SFrame that contains the remaining columns of the original SFrame, with the given column replaced by the collection of unpacked columns.

Examples

>>> import graphlab
>>> sf = graphlab.SFrame({'id': [1,2,3],
...                       'wc': [{'a': 1}, {'b': 2}, {'a': 1, 'b': 2}]})
>>> sf
+----+------------------+
| id |        wc        |
+----+------------------+
| 1  |     {'a': 1}     |
| 2  |     {'b': 2}     |
| 3  | {'a': 1, 'b': 2} |
+----+------------------+
[3 rows x 2 columns]
>>> sf.unpack('wc')
+----+------+------+
| id | wc.a | wc.b |
+----+------+------+
| 1  |  1   | None |
| 2  | None |  2   |
| 3  |  1   |  2   |
+----+------+------+
[3 rows x 3 columns]

To omit the prefix from the generated column names:

>>> sf.unpack('wc', column_name_prefix="")
+----+------+------+
| id |  a   |  b   |
+----+------+------+
| 1  |  1   | None |
| 2  | None |  2   |
| 3  |  1   |  2   |
+----+------+------+
[3 rows x 3 columns]

To unpack only a subset of the keys:

>>> sf.unpack('wc', limit=['b'])
+----+------+
| id | wc.b |
+----+------+
| 1  | None |
| 2  |  2   |
| 3  |  2   |
+----+------+
[3 rows x 2 columns]
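
The three dictionary examples above can be mimicked in plain Python to make the naming and missing-value rules explicit. This is only a rough sketch of the documented behavior, not GraphLab code: the helper unpack_dict_column and its row-as-dict representation are hypothetical, and the column order (keys in order of first appearance) is an assumption.

```python
def unpack_dict_column(rows, column, prefix=None, limit=None, na_value=None):
    """Emulate unpacking a dict-valued column on a list of row dicts.

    Hypothetical helper for illustration only; it does not call graphlab.
    """
    # The prefix defaults to the name of the column being unpacked.
    prefix = column if prefix is None else prefix

    # Collect keys in order of first appearance (assumed ordering),
    # keeping only the keys listed in `limit` when it is given.
    keys = []
    for row in rows:
        for key in row[column]:
            if key not in keys and (limit is None or key in limit):
                keys.append(key)

    out = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for key in keys:
            # An empty prefix yields the bare key as the column name.
            name = "%s.%s" % (prefix, key) if prefix else str(key)
            value = row[column].get(key)  # absent key -> None
            new_row[name] = None if value == na_value else value
        out.append(new_row)
    return out


rows = [{'id': 1, 'wc': {'a': 1}},
        {'id': 2, 'wc': {'b': 2}},
        {'id': 3, 'wc': {'a': 1, 'b': 2}}]

for r in unpack_dict_column(rows, 'wc', limit=['b']):
    print(r)
```

Passing prefix="" reproduces the unprefixed column names, and passing a value for na_value turns every matching entry into None, in the same way the parameter description states.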

To unpack an array column:

>>> import array
>>> sf = graphlab.SFrame({'id': [1,2,3],
...                       'friends': [array.array('d', [1.0, 2.0, 3.0]),
...                                   array.array('d', [2.0, 3.0, 4.0]),
...                                   array.array('d', [3.0, 4.0, 5.0])]})
>>> sf
+----+-----------------------------+
| id |            friends          |
+----+-----------------------------+
| 1  | array('d', [1.0, 2.0, 3.0]) |
| 2  | array('d', [2.0, 3.0, 4.0]) |
| 3  | array('d', [3.0, 4.0, 5.0]) |
+----+-----------------------------+
[3 rows x 2 columns]
>>> sf.unpack('friends')
+----+-----------+-----------+-----------+
| id | friends.0 | friends.1 | friends.2 |
+----+-----------+-----------+-----------+
| 1  |    1.0    |    2.0    |    3.0    |
| 2  |    2.0    |    3.0    |    4.0    |
| 3  |    3.0    |    4.0    |    5.0    |
+----+-----------+-----------+-----------+
[3 rows x 4 columns]
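
For list/array columns, limit takes integer indexes rather than keys. The plain-Python sketch below illustrates that rule together with na_value. It is not GraphLab code: unpack_list_column is a hypothetical helper, and naming each new column after its original index (for example friends.2) when limit is given is an assumption based on the naming shown above.

```python
import array


def unpack_list_column(rows, column, prefix=None, limit=None, na_value=None):
    """Emulate unpacking a list/array-valued column on a list of row dicts.

    Hypothetical helper for illustration only; it does not call graphlab.
    """
    prefix = column if prefix is None else prefix

    # Without `limit`, unpack every position up to the longest value
    # (an assumption about how the width is chosen).
    width = max(len(row[column]) for row in rows)
    indexes = list(range(width)) if limit is None else list(limit)

    out = []
    for row in rows:
        values = row[column]
        new_row = {k: v for k, v in row.items() if k != column}
        for i in indexes:
            name = "%s.%d" % (prefix, i) if prefix else str(i)
            # Positions past the end of a shorter value become None.
            value = values[i] if i < len(values) else None
            new_row[name] = None if value == na_value else value
        out.append(new_row)
    return out


rows = [{'id': 1, 'friends': array.array('d', [1.0, 2.0, 3.0])},
        {'id': 2, 'friends': array.array('d', [2.0, 3.0, 4.0])},
        {'id': 3, 'friends': array.array('d', [3.0, 4.0, 5.0])}]

for r in unpack_list_column(rows, 'friends', limit=[0, 2]):
    print(r)
```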