Python Time Series Forecast on Bitcoin Data (Part II)


A Time Series is essentially tabular data with the special feature of having a time index. The common forecasting task is ‘knowing the past (and sometimes the present), predict the future’. This task, taken as a principle, reveals itself in several ways: in how you interpret your problem, in feature engineering, and in which forecast strategy to take.
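As a minimal illustration with toy data (not the article's Bitcoin dataset), a pandas Series indexed by dates is already a time series, and ‘knowing the past’ can be as simple as shifting yesterday's value into a feature column:

```python
import pandas as pd

# Toy example: a time series is tabular data with a time index
idx = pd.date_range('2024-01-01', periods=5, freq='D')
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx, name='value')

# 'Knowing the past, predict the future': yesterday's value becomes a feature
frame = pd.DataFrame({'lag_1': series.shift(1), 'target': series})
print(frame)
```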

This is the second article in our series. In the first article, we discussed how to create features out of a time series using lags and trends. Today we go in the opposite direction, highlighting trends as something you want deducted from your model.

The reason is that Machine Learning models work in different ways: some handle this subtraction on their own, others do not.

For example, for any feature you include in a Linear Regression, the model will automatically detect whether to deduct it from the actual data or not. A Tree Regressor (and its variants) will not behave in the same way and will usually ignore a trend in the data.
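A quick synthetic experiment (made-up data, not the article's dataset) makes the difference visible: a linear model extrapolates a trend beyond the training range, while a decision tree flattens out at the last values it has seen:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic upward trend: y ~ 2 * t plus noise
rng = np.random.default_rng(0)
t = np.arange(100).reshape(-1, 1)
y = 2.0 * t.ravel() + rng.normal(0, 1, 100)

linear = LinearRegression().fit(t, y)
tree = DecisionTreeRegressor(random_state=0).fit(t, y)

# Extrapolate beyond the training range
t_future = np.array([[150]])
print(linear.predict(t_future))  # close to 300: the trend continues
print(tree.predict(t_future))    # near 198: stuck at the last training value
```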

Therefore, whenever using the latter type of model, one usually calls for a hybrid model, meaning, we use a Linear(ish) first model to detect global trend and periodic patterns and then apply a second Machine Learning model to infer more sophisticated behavior.
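The two-step idea can be sketched roughly as follows; the data and the `phase` feature here are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

# Made-up data: linear trend plus a periodic, nonlinear component
rng = np.random.default_rng(42)
t = np.arange(200)
y = pd.Series(0.5 * t + 10 * np.sin(t / 5) + rng.normal(0, 1, 200))

# Step 1: a linear model learns the global trend
X_trend = pd.DataFrame({'trend': t})
linear = LinearRegression().fit(X_trend, y)
residuals = y - linear.predict(X_trend)

# Step 2: a second model learns what is left, from other features
# ('phase' is a hypothetical feature roughly matching the cycle length)
X_other = pd.DataFrame({'phase': t % 31})
forest = RandomForestRegressor(random_state=0).fit(X_other, residuals)

# Final prediction = trend part + residual part
y_hat = linear.predict(X_trend) + forest.predict(X_other)
```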

We use the Bitcoin Sentiment Analysis data we captured in the last article as a proof of concept.

The hybrid model part of this article is heavily based on Kaggle’s Time Series Crash Course , however, we intend to automate the process and discuss more in-depth the DeterministicProcess class.

Trends, as something you don’t want to have

(Or that you want deducted from your model)

A streamlined way to deal with trends and seasonality is to use, respectively, DeterministicProcess and CalendarFourier from statsmodels. Let us start with the former.

DeterministicProcess aims at creating features to be used in a Regression model to determine trend and periodicity. It takes your DatetimeIndex and a few other parameters and returns a DataFrame full of features for your ML model.

A typical instance of the class reads like the one below. We use the sentic_mean column to illustrate.

from statsmodels.tsa.deterministic import DeterministicProcess

y = dataset['sentic_mean'].copy()

dp = DeterministicProcess(
    index=y.index,    # dates from the training data
    constant=True,    # adds a 'const' (bias) column
    order=2,          # adds 'trend' and 'trend_squared' columns
    drop=True,        # drops terms that are collinear
)

X = dp.in_sample()


We can use X and y as features and target to train a LinearRegression model. The LinearRegression will then learn whatever characteristics of y can be inferred (in our case) solely from:

- the number of elapsed time intervals (the trend column);
- that number squared (trend_squared); and
- a bias term (const).

Check out the result:

import pandas as pd
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X, y)

predictions = pd.DataFrame(
    model.predict(X),
    index=X.index,
    columns=['Deterministic Curve'],
)

Comparing predictions and actual values gives:

import matplotlib.pyplot as plt

ax = plt.subplot()
y.plot(ax=ax, legend=True)
predictions.plot(ax=ax, color='red', legend=True)
plt.show()

Even the quadratic term seems negligible here. The DeterministicProcess class also helps us with future predictions, since it carries a method that provides the appropriate future form of the chosen features.

Specifically, the out_of_sample method of dp takes the number of time intervals we want to predict as input and generates the needed features for you.

We use 60 days below as an example:

X_out = dp.out_of_sample(60)

predictions_out = pd.DataFrame(
    model.predict(X_out),
    index=X_out.index,
    columns=['Future Predictions'],
)

ax = plt.subplot()
y.plot(ax=ax, legend=True)
predictions_out.plot(ax=ax, color='red', legend=True)
plt.show()
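For the seasonality side mentioned earlier, CalendarFourier terms can be handed to DeterministicProcess through its additional_terms parameter. A minimal sketch on a hypothetical daily index (the dates below are invented):

```python
import pandas as pd
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

# Hypothetical daily index standing in for the dataset's dates
idx = pd.date_range('2023-01-01', periods=120, freq='D')

# Two sine/cosine pairs on a weekly calendar cycle
fourier = CalendarFourier(freq='W', order=2)

dp_seasonal = DeterministicProcess(
    index=idx,
    constant=True,               # 'const' column
    order=1,                     # 'trend' column
    additional_terms=[fourier],  # four Fourier columns
    drop=True,
)
X_seasonal = dp_seasonal.in_sample()
print(X_seasonal.columns.tolist())
```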

Let us repeat the process with sentic_count to get a feel for a higher-order trend.
