graphlab.TimeSeries

class graphlab.TimeSeries(data=None, index=None, **kwargs)

The TimeSeries object is the fundamental data structure for multivariate time series data. TimeSeries objects are backed by a single SFrame, but include extra metadata.

TimeSeries data is laid out as follows, with the time index T in the first column and one value column V_i per series:

T    V_0   V_1   ...  V_n
t_0  v_00  v_10  ...  v_n0
t_1  v_01  v_11  ...  v_n1
t_2  v_02  v_12  ...  v_n2
...  ...   ...   ...  ...
t_k  v_0k  v_1k  ...  v_nk

Each column in the table is a univariate time series, and the index is shared across all of the series.
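
To make the layout concrete, the following is a minimal pure-Python sketch of one shared index plus named value columns. It is not GraphLab's implementation; the names index, columns, and univariate are hypothetical and the sample values are made up for illustration.

```python
import datetime as dt

# Shared time index t_0 .. t_k (hypothetical sample data).
t0 = dt.datetime(2013, 5, 7, 10, 4, 10)
index = [t0 + dt.timedelta(days=5 * i) for i in range(3)]

# Value columns V_0 .. V_n; every column holds one value per index entry.
columns = {
    "V_0": [1.1, 2.1, 3.1],
    "V_1": [10, 20, 30],
}

def univariate(name):
    """Pair one value column with the shared index to get a single series."""
    return list(zip(index, columns[name]))

series_v0 = univariate("V_0")
print(series_v0[0])
```

Each call to univariate() reuses the same index list, which mirrors the statement that the index is shared across all of the series.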

Parameters:

data : SFrame | str

data is either the SFrame that holds the content of the TimeSeries object or a string. A string is interpreted as a filename. Files can be read from the local file system or from a URL (local://, hdfs://, s3://, http://).

index : str

The name of the column containing the index of the time series in the SFrame referred to by data. The column must be of type datetime.datetime. data is sorted by the index column if it is not already in that order. If data is a filename, this parameter is ignored and may be omitted; otherwise it is required.

**kwargs : optional

Keyword parameters passed to the TimeSeries constructor.

  • is_sorted : bool, optional
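
The sorting rule described for the index parameter can be sketched in plain Python as follows. This is an illustration only, assuming rows are represented as dictionaries; the helper sort_by_index is hypothetical and is not part of the GraphLab API.

```python
import datetime as dt

def sort_by_index(rows, index):
    """Return rows ordered by the datetime column named by `index`."""
    for row in rows:
        # The index column must hold datetime.datetime values.
        if not isinstance(row[index], dt.datetime):
            raise TypeError("index column must hold datetime.datetime values")
    return sorted(rows, key=lambda row: row[index])

t0 = dt.datetime(2013, 5, 7, 10, 4, 10)
rows = [
    {"a": 3.1, "b": t0 + dt.timedelta(days=10)},
    {"a": 1.1, "b": t0},
    {"a": 2.1, "b": t0 + dt.timedelta(days=5)},
]
ordered = sort_by_index(rows, index="b")
print([row["a"] for row in ordered])
```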

Examples

Construction

>>> import graphlab as gl
>>> import datetime as dt
>>> t0 = dt.datetime(2013, 5, 7, 10, 4, 10)
>>> sf = gl.SFrame({'a': [1.1, 2.1, 3.1],
...                 'b': [t0, t0 + dt.timedelta(days=5),
...                       t0 + dt.timedelta(days=10)]})
>>> ts = gl.TimeSeries(sf, index='b')
>>> print ts
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-07 10:04:10 | 1.1 |
| 2013-05-12 10:04:10 | 2.1 |
| 2013-05-17 10:04:10 | 3.1 |
+---------------------+-----+
[3 rows x 2 columns]
The index column of the TimeSeries is: b

Save and Load

>>> ts.save("my_series")
>>> ts_new = gl.TimeSeries("my_series")
>>> print ts_new
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-07 10:04:10 | 1.1 |
| 2013-05-12 10:04:10 | 2.1 |
| 2013-05-17 10:04:10 | 3.1 |
+---------------------+-----+
[3 rows x 2 columns]
The index column of the TimeSeries is: b

Element Accessing in TimeSeries

>>> ts.index_col_name
'b'
>>> ts.value_col_names
['a']
>>> ts['a']
dtype: float
Rows: 3
[1.1, 2.1, 3.1]
>>> ts[0]
{'a': 1.1, 'b': datetime.datetime(2013, 5, 7, 10, 4, 10)}

Resampling TimeSeries

>>> t_resample = ts.resample(dt.timedelta(days=1),
...                          downsample_method='sum',
...                          upsample_method='nearest')
>>> print t_resample
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-07 00:00:00 | 1.1 |
| 2013-05-08 00:00:00 | 1.1 |
| 2013-05-09 00:00:00 | 1.1 |
| 2013-05-10 00:00:00 | 2.1 |
| 2013-05-11 00:00:00 | 2.1 |
| 2013-05-12 00:00:00 | 2.1 |
| 2013-05-13 00:00:00 | 2.1 |
| 2013-05-14 00:00:00 | 2.1 |
| 2013-05-15 00:00:00 | 3.1 |
| 2013-05-16 00:00:00 | 3.1 |
+---------------------+-----+
[11 rows x 2 columns]
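
One way to read the result above is sketched below in plain Python: values are summed per calendar day (the downsample step), and days with no data take the value of the nearest day that has data (the upsample step). This is an assumption about the semantics that happens to reproduce the rows shown, not GraphLab's implementation; the helper resample_daily and its tie-breaking (the earlier day wins) are hypothetical.

```python
import datetime as dt

def resample_daily(points):
    """Sum values per day, then fill empty days from the nearest non-empty day."""
    day = dt.timedelta(days=1)
    # Downsample: sum all values whose timestamp falls in the same day bucket.
    sums = {}
    for stamp, value in points:
        bucket = dt.datetime(stamp.year, stamp.month, stamp.day)
        sums[bucket] = sums.get(bucket, 0.0) + value
    filled = sorted(sums)
    # Upsample: visit every day between the first and last non-empty bucket.
    result = []
    current = filled[0]
    while current <= filled[-1]:
        # min() keeps the earliest bucket on a tie (an arbitrary choice here).
        nearest = min(filled, key=lambda b: abs(b - current))
        result.append((current, sums[nearest]))
        current += day
    return result

t0 = dt.datetime(2013, 5, 7, 10, 4, 10)
points = [(t0, 1.1),
          (t0 + dt.timedelta(days=5), 2.1),
          (t0 + dt.timedelta(days=10), 3.1)]
resampled = resample_daily(points)
print(len(resampled))
```

Under this sketch the series has 11 daily rows, from 2013-05-07 through 2013-05-17, which agrees with the row count in the footer of the printed table even though only ten rows are displayed.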

Shifting Index Column

>>> interval = dt.timedelta(days=5)
>>> ts_tshifted = ts.tshift(interval)
>>> print ts_tshifted
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-12 10:04:10 | 1.1 |
| 2013-05-17 10:04:10 | 2.1 |
| 2013-05-22 10:04:10 | 3.1 |
+---------------------+-----+
[3 rows x 2 columns]
The index column of the TimeSeries is: b
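
The effect shown above amounts to adding a fixed time interval to every index entry while leaving the values alone. A minimal plain-Python sketch, with a hypothetical helper tshift_points operating on (timestamp, value) pairs rather than on a TimeSeries:

```python
import datetime as dt

def tshift_points(points, delta):
    """Add `delta` to every timestamp; values are left untouched."""
    return [(stamp + delta, value) for stamp, value in points]

t0 = dt.datetime(2013, 5, 7, 10, 4, 10)
points = [(t0, 1.1),
          (t0 + dt.timedelta(days=5), 2.1),
          (t0 + dt.timedelta(days=10), 3.1)]
shifted_points = tshift_points(points, dt.timedelta(days=5))
print(shifted_points[0][0])
```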

Shifting Value Columns

>>> ts_shifted = ts.shift(steps=2)
>>> print ts_shifted
+---------------------+------+
|          b          |  a   |
+---------------------+------+
| 2013-05-07 10:04:10 | None |
| 2013-05-12 10:04:10 | None |
| 2013-05-17 10:04:10 | 1.1  |
+---------------------+------+
[3 rows x 2 columns]
The index column of the TimeSeries is: b

>>> ts_shifted = ts.shift(steps=-1)
>>> print ts_shifted
+---------------------+------+
|          b          |  a   |
+---------------------+------+
| 2013-05-07 10:04:10 | 2.1  |
| 2013-05-12 10:04:10 | 3.1  |
| 2013-05-17 10:04:10 | None |
+---------------------+------+
[3 rows x 2 columns]
The index column of the TimeSeries is: b
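
The two results above follow a simple rule: the index stays fixed, values move down by steps rows (or up for negative steps), and vacated rows become None. A plain-Python sketch of that rule for a single value column, using a hypothetical helper shift_values that is not part of the GraphLab API:

```python
def shift_values(values, steps):
    """Move values by `steps` rows, padding the vacated rows with None."""
    n = len(values)
    if steps == 0:
        return list(values)
    if abs(steps) >= n:
        # Everything is shifted out of range.
        return [None] * n
    if steps > 0:
        # Positive steps: values move toward later rows.
        return [None] * steps + list(values[:n - steps])
    # Negative steps: values move toward earlier rows.
    return list(values[-steps:]) + [None] * (-steps)

column_a = [1.1, 2.1, 3.1]
print(shift_values(column_a, 2))
print(shift_values(column_a, -1))
```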

Join Two TimeSeries on Index Columns

>>> import graphlab as gl
>>> import datetime as dt
>>> t0 = dt.datetime(2013, 5, 7, 10, 4, 10)
>>> sf = gl.SFrame({'a': [1.1, 2.1, 3.1],
...                 'b': [t0, t0 + dt.timedelta(days=1),
...                       t0 + dt.timedelta(days=2)]})
>>> ts = gl.TimeSeries(sf, index='b')
>>> print ts
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-07 10:04:10 | 1.1 |
| 2013-05-08 10:04:10 | 2.1 |
| 2013-05-09 10:04:10 | 3.1 |
+---------------------+-----+
[3 rows x 2 columns]
The index column of the TimeSeries is: b

>>> sf2 = gl.SFrame({'a': [1.1, 2.1, 3.1],
...                  'b': [t0 + dt.timedelta(days=1),
...                        t0 + dt.timedelta(days=2),
...                        t0 + dt.timedelta(days=3)]})
>>> ts2 = gl.TimeSeries(sf2, index='b')
>>> print ts2
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-08 10:04:10 | 1.1 |
| 2013-05-09 10:04:10 | 2.1 |
| 2013-05-10 10:04:10 | 3.1 |
+---------------------+-----+
[3 rows x 2 columns]
The index column of the TimeSeries is: b

>>> ts_join = ts.index_join(ts2, how='inner')
>>> print ts_join
+---------------------+-----+-----+
|          b          |  a  | a.1 |
+---------------------+-----+-----+
| 2013-05-08 10:04:10 | 2.1 | 1.1 |
| 2013-05-09 10:04:10 | 3.1 | 2.1 |
+---------------------+-----+-----+
[2 rows x 3 columns]
The index column of the TimeSeries is: b
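
An inner join on the index keeps only the timestamps present in both series and places the values side by side. The following plain-Python sketch reproduces the two rows above; the helper inner_index_join is hypothetical and works on (timestamp, value) pairs, not on TimeSeries objects.

```python
import datetime as dt

def inner_index_join(left, right):
    """Keep only timestamps present in both series; pair up their values."""
    right_by_time = dict(right)
    return [(stamp, value, right_by_time[stamp])
            for stamp, value in left if stamp in right_by_time]

t0 = dt.datetime(2013, 5, 7, 10, 4, 10)
day = dt.timedelta(days=1)
left = [(t0, 1.1), (t0 + day, 2.1), (t0 + 2 * day, 3.1)]
right = [(t0 + day, 1.1), (t0 + 2 * day, 2.1), (t0 + 3 * day, 3.1)]
joined = inner_index_join(left, right)
print(len(joined))
```

The third element of each tuple plays the role of the a.1 column in the printed output.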

Slicing TimeSeries

>>> # From here on, ts is the 5-day-spaced series from the Construction example.
>>> sliced_ts = ts.slice(t0, t0 + dt.timedelta(days=3),
...                      closed="left")
>>> print sliced_ts
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-07 10:04:10 | 1.1 |
+---------------------+-----+
[1 rows x 2 columns]
The index column of the TimeSeries is: b

>>> sliced_ts = ts[dt.date(2013, 5, 7)]
>>> print sliced_ts
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-07 10:04:10 | 1.1 |
+---------------------+-----+
[1 rows x 2 columns]
The index column of the TimeSeries is: b

>>> ts[dt.datetime(2013, 5, 7):dt.datetime(2013, 5, 13)]
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-07 10:04:10 | 1.1 |
| 2013-05-12 10:04:10 | 2.1 |
+---------------------+-----+
[2 rows x 2 columns]
The index column of the TimeSeries is: b
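
The slicing results above are consistent with a half-open interval: with closed="left", start_time is included and end_time is excluded. The sketch below shows that filter in plain Python. The helper slice_points is hypothetical, and the other closed values it accepts ("right", "both", "neither") are assumptions made by analogy, not confirmed by this page.

```python
import datetime as dt

def slice_points(points, start, end, closed="left"):
    """Select (timestamp, value) pairs between start and end."""
    include_start = closed in ("left", "both")
    include_end = closed in ("right", "both")
    selected = []
    for stamp, value in points:
        after_start = stamp > start or (include_start and stamp == start)
        before_end = stamp < end or (include_end and stamp == end)
        if after_start and before_end:
            selected.append((stamp, value))
    return selected

t0 = dt.datetime(2013, 5, 7, 10, 4, 10)
points = [(t0, 1.1),
          (t0 + dt.timedelta(days=5), 2.1),
          (t0 + dt.timedelta(days=10), 3.1)]
left_closed = slice_points(points, t0, t0 + dt.timedelta(days=3))
print(left_closed)
```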

Add/Remove TimeSeries Columns

>>> ts.add_column(gl.SArray([1, 2, 3]), "new_value")
>>> print ts
+---------------------+-----+-----------+
|          b          |  a  | new_value |
+---------------------+-----+-----------+
| 2013-05-07 10:04:10 | 1.1 |     1     |
| 2013-05-12 10:04:10 | 2.1 |     2     |
| 2013-05-17 10:04:10 | 3.1 |     3     |
+---------------------+-----+-----------+
[3 rows x 3 columns]
The index column of the TimeSeries is: b

>>> ts.remove_column("new_value")
>>> print ts
+---------------------+-----+
|          b          |  a  |
+---------------------+-----+
| 2013-05-07 10:04:10 | 1.1 |
| 2013-05-12 10:04:10 | 2.1 |
| 2013-05-17 10:04:10 | 3.1 |
+---------------------+-----+
[3 rows x 2 columns]
The index column of the TimeSeries is: b

Methods

TimeSeries.add_column(data[, name]) Add a column to this TimeSeries object.
TimeSeries.apply(fn[, dtype, seed]) See SFrame.apply() for documentation.
TimeSeries.argmax(agg_column) Return index of the row with the maximum value from agg_column.
TimeSeries.argmin(agg_column) Return index of the row with the minimum value from agg_column.
TimeSeries.column_names() See SFrame.column_names() for documentation.
TimeSeries.column_types() See SFrame.column_types() for documentation.
TimeSeries.copy() Returns a shallow copy of the TimeSeries.
TimeSeries.dropna([columns, how]) See SFrame.dropna() for documentation.
TimeSeries.dropna_split([columns, how]) See SFrame.dropna_split() for documentation.
TimeSeries.filter_by(values, column_name[, ...]) Filter a TimeSeries by values inside an iterable object.
TimeSeries.group(key_columns) Separate a TimeSeries by the distinct values in one or more columns.
TimeSeries.head([n]) The first n rows of the TimeSeries.
TimeSeries.index_join(other[, how, ...]) Join this TimeSeries with another TimeSeries on their index columns, using the join method given by how.
TimeSeries.join(other[, on, how]) Join the current (left) TimeSeries with the given (right) SFrame using a SQL-style equi-join on the given columns, producing a new SFrame as output.
TimeSeries.print_rows([num_rows, ...]) See SFrame.print_rows() for documentation.
TimeSeries.remove_column(name) Remove a column from this TimeSeries.
TimeSeries.rename(names) Rename the given columns.
TimeSeries.resample(period[, ...]) Resample or bucketize the input TimeSeries based on the given period.
TimeSeries.save(location) Save the TimeSeries object to the given location.
TimeSeries.shift(steps) Shift the non-index columns in the TimeSeries object by the specified number of steps.
TimeSeries.slice(start_time, end_time[, closed]) Returns a new TimeSeries with the range specified by start_time and end_time.
TimeSeries.swap_columns(column_1, column_2) Swap the columns with the given names.
TimeSeries.tail([n]) The last n rows of the TimeSeries.
TimeSeries.to_sframe([include_index]) Convert the TimeSeries to an SFrame.
TimeSeries.tshift(delta) Shift the index column of the TimeSeries object by the time interval delta.
TimeSeries.union(other) Union the TimeSeries object with the other TimeSeries object.

Attributes

TimeSeries.max_time The maximum value of the time index.
TimeSeries.min_time The minimum value of the time index.
TimeSeries.range The minimum and maximum values of the time index.