Bitcoin price prediction using ARIMA and LSTM

The goal of this paper is to compare the accuracy of Bitcoin price prediction (in USD) using two different models: a Long Short-Term Memory (LSTM) network and an ARIMA model. Real-time price data is collected with PycURL from Bitfinex. The LSTM model is implemented with Keras and TensorFlow. The ARIMA model is used mainly as a classical baseline for time series forecasting; as expected, it predicts efficiently only over short time intervals, and its outcome depends on the time period. The LSTM achieves better performance, at the cost of additional, unavoidable training time, especially on a CPU.


Introduction
Finance has long been regarded as a promising field for machine learning, since the prices of financial assets are non-linear, dynamic and chaotic, and therefore difficult to predict [1]. Many well-known organizations, including American Accounting Associates (AAA) and EMERJ, have developed their own research areas. Models including RNN and LSTM have proven effective in predicting future trends in stocks, shares and currency flows.
Bitcoin is a type of virtual currency that is widely used in online trading systems. Over the past decade, its price has gone through a series of fluctuations. At the time of writing, the average price of Bitcoin is about 7000 USD per BTC. Its relatively young age and resulting volatility make it an ideal platform for testing machine learning models as well as traditional time series prediction [2]. ARIMA is one of the most commonly used algorithms for time-series prediction. It has been applied to price forecasting with satisfactory results [3]. Compared to ARMA, it is more precise and takes less time to compute. In addition, LSTM has proven to be an efficient tool for price prediction and risk recognition, owing to the temporal nature of Bitcoin data.

Dataset
Bitfinex provides an API that gives users access to real-time Bitcoin pricing information; data can be collected automatically from this API, and the results are also shown on the Bitfinex homepage. PycURL is a commonly used Python library for collecting online data; it handles requests between client and server and is suitable for receiving data from the Bitfinex website. To obtain a trainable price dataset of reasonable size, data is collected from the API, the information is extracted from the JSON response, and the result is converted into a DataFrame, giving a dataset suitable for the different analysis scenarios that follow. Using this method with trading data sampled at 5-second intervals from Bitfinex, 10000 price records are collected, including prices for BTC/USD, ETH/BTC, etc. The data is then split into a training set and a testing set, containing the first 8000 and the last 2000 values respectively, and the price at each 5-second step is used for training the model. Table 1 shows some fundamental attributes of the BTC/USD data.
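The extraction and split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the payloads are generated locally instead of being fetched, and the field names "timestamp" and "last_price" are assumptions about the shape of a ticker response.

```python
import json

# Hypothetical ticker payloads standing in for the collected API responses.
# The field names ("timestamp", "last_price") are assumptions for illustration.
raw_responses = [
    json.dumps({"timestamp": str(1544745600 + 5 * i),
                "last_price": str(3300.0 + 0.1 * i)})
    for i in range(10000)
]

def parse_price(payload):
    """Extract the last traded price from one JSON ticker response."""
    return float(json.loads(payload)["last_price"])

def chronological_split(values, n_train):
    """Split in time order: the first n_train values train, the rest test."""
    return values[:n_train], values[n_train:]

prices = [parse_price(r) for r in raw_responses]
train, test = chronological_split(prices, 8000)
print(len(train), len(test))  # 8000 2000
```

The split is chronological rather than shuffled, so the testing set contains only prices observed after every training price.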

ARIMA model
In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. The term 'integrated' refers to the number of times a series must be differenced in order to achieve stationarity, which is required for ARMA models to be valid. In other words, an ARMA model is equivalent to an ARIMA model with the same AR and MA orders and no differencing (d = 0) [4]. In this section, the proposed ARIMA model and the general statistical methodology are presented as follows:

Data Prerequisites
An ARIMA model is used to forecast a time series that can be made 'stationary', so this property must be checked once the dataset has been obtained. In most competitive electricity markets, for example, the price series presents high frequency, non-constant mean and variance, and multiple seasonality [4]. Therefore, the stationarity and seasonality of the price data should be checked, differencing applied if necessary, and a model specification ARIMA(p,d,q) chosen.
Considering the tiny gap between two neighboring prices, an average price over a time period can be adopted. The plot is constructed from the averages of every 12 consecutive prices, which give the average price for each minute (12 samples at 5-second intervals); the corresponding autocorrelation plot is also presented to show the stationarity of the dataset.
A stationary series roams around a defined mean and its ACF drops to zero fairly quickly. For the original price series, the ACF decays slowly and the first bar of the PACF is close to 1, both implying that the price data is non-stationary. Because the initial time series is non-stationary, it must be transformed to make it stationary. The most common way is to difference it; the right order of differencing is the minimum required to obtain a near-stationary series, which avoids over-differencing. After lag-1 differencing, the result (Fig. 2) and the corresponding ACF show fast decay, which supports the stationarity of the differenced data. At the same time, both the ACF and the PACF eventually decay exponentially to zero, showing that the refined data can be used to construct an ARIMA model. To rule out a purely random series, a Lagrange multiplier test for heteroscedasticity is applied. The p-value is 5.8137286e-14, much less than 0.05, which rules out the possibility of white noise.
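The one-minute averaging can be sketched as below. The function name and the choice to drop a trailing incomplete block are assumptions made for this illustration; the paper does not say how leftover samples are handled.

```python
def block_average(values, block_size=12):
    """Average consecutive, non-overlapping blocks of block_size values.

    A trailing block with fewer than block_size values is dropped
    (an assumption of this sketch).
    """
    n_blocks = len(values) // block_size
    return [
        sum(values[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]

# Placeholder series standing in for the 10000 five-second prices.
prices_5s = [float(i) for i in range(10000)]
minute_avg = block_average(prices_5s, 12)
print(len(minute_avg))  # 833
```

With 10000 five-second samples this yields 833 one-minute averages, which matches the 733 training points plus 100 testing points used in the forecasting step.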
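The effect of lag-1 differencing on the ACF can be illustrated with a short sketch. It uses a synthetic random walk rather than the paper's price data, and the helper names are chosen here for illustration.

```python
import random

def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

# Synthetic random walk (not the paper's data): non-stationary by construction.
random.seed(0)
walk = [0.0]
for _ in range(999):
    walk.append(walk[-1] + random.gauss(0, 1))

print([round(r, 3) for r in acf(walk, 5)])              # slow decay
print([round(r, 3) for r in acf(difference(walk), 5)])  # near zero after lag 0
```

The raw walk keeps autocorrelations close to 1 over the first lags, while the differenced series drops to roughly zero after lag 0. This is the same contrast the text describes between the original and the differenced price series.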

Forecasting
After data pre-processing, the refined dataset contains 733 points in the training set and 100 points in the testing set. The series is autocorrelated, so ARIMA(1,1,0), a differenced first-order autoregressive model, is suitable for forecasting: the first difference of Y (in this case the Bitcoin price) is regressed on itself lagged by one period. The model yields the following prediction equation:

$$\hat{Y}_t - Y_{t-1} = \mu + \phi_1 \left( Y_{t-1} - Y_{t-2} \right)$$
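A minimal sketch of an ARIMA(1,1,0) one-step forecast follows. The paper does not state how its model was estimated, so the ordinary least squares fit used here is a simplification chosen for illustration, and the data is synthetic.

```python
import random

def fit_arima_110(series):
    """Estimate mu and phi in d_t = mu + phi * d_(t-1) by least squares,
    where d_t = Y_t - Y_(t-1) is the first difference of the series."""
    d = [series[i] - series[i - 1] for i in range(1, len(series))]
    x, y = d[:-1], d[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    mu = mean_y - phi * mean_x
    return mu, phi

def forecast_next(series, mu, phi):
    """One-step forecast: last value plus the predicted change."""
    return series[-1] + mu + phi * (series[-1] - series[-2])

# Synthetic demonstration data (not the paper's prices): the first difference
# follows an AR(1) process with mu = 0.1 and phi = 0.5.
random.seed(1)
diffs = [0.0]
for _ in range(5000):
    diffs.append(0.1 + 0.5 * diffs[-1] + random.gauss(0, 1))
series = [100.0]
for step in diffs:
    series.append(series[-1] + step)

mu_hat, phi_hat = fit_arima_110(series)
print(mu_hat, phi_hat, forecast_next(series, mu_hat, phi_hat))
```

Each forecast depends only on the last two observed values. One plausible reason for the short-horizon behaviour described above is that multi-step forecasts feed predicted values back in place of observations.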

LSTM model
Long short-term memory (LSTM) is developed from the recurrent neural network (RNN) model to solve the vanishing gradient problem. Compared to the traditional feedforward neural network (FNN), the RNN adds a self-connecting edge to every node in the network, thereby allowing the network to use the result from the previous step when making a time-series prediction [5]. This characteristic allows it to be applied to a variety of tasks, including handwriting recognition, speech recognition and anomaly detection. LSTM can be regarded as an improved version of the RNN: it adds an input gate, an output gate and a forget gate to the existing cell units. These additional gates allow the cell unit to discard useless information and to memorize important information during training. They also help to handle the exploding and vanishing gradient problems. Today, LSTM is one of the most widely used tools for classifying, processing and making predictions based on time series data. The forget gate takes the previous hidden state h and the current input x as input, and outputs a value between 0 and 1 to the other parts of the cell: 0 represents "forget all", whilst 1 represents "keep all". In the input gate, the sigmoid function decides which information to update. Finally, the output gate combines the information from the different parts and decides which information to keep and which to output.
The model is trained for 100 epochs with 10 consecutive data points in each round. The loss plot (Fig. 7) recorded at each epoch shows that the loss is initially large and then converges quickly. The loss is close to 0.02 at first; after 5 epochs, it falls to around 3 10 and its rate of decay slows. With different input shapes, that is, different numbers of previous time steps used to predict the next value, the loss grows as the input window lengthens. The losses for the different input strategies are also presented in the figure. In the training process, the loss is relatively high at the beginning but decreases sharply as the number of epochs increases. After 3 to 4 epochs, the loss is already very close to zero.
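The paper does not write out the gate equations. In the commonly used formulation, they take the form below. The notation is assumed here: $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, $c_t$ is the cell state, and the weight matrices $W$ and biases $b$ are learned parameters.

$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here $f_t$, $i_t$ and $o_t$ are the forget, input and output gates. Each lies between 0 and 1, so $f_t = 0$ discards the previous cell state entirely and $f_t = 1$ keeps all of it.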
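The different input shapes (1, 5 or 10 previous values) correspond to different sliding-window lengths over the price series. The following sketch shows how such windows might be built; the function name is hypothetical, and the paper does not describe its own preprocessing code.

```python
def make_windows(series, window):
    """Build supervised pairs from a series.

    Each input holds `window` consecutive values, and its target is the
    value that immediately follows the window.
    """
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets

# Placeholder series; the three window lengths mirror those tried in the paper.
series = [float(i) for i in range(20)]
for window in (1, 5, 10):
    X, y = make_windows(series, window)
    print(window, len(X), X[0], y[0])
```

A longer window gives the network more context per sample but yields fewer training pairs from the same series.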
Figure: (a) prediction based on 1 previous data point; (b) error distribution.
The trained LSTM model gives a satisfactory prediction on the testing set. The average error rate is 0.4765938, with a standard deviation of 2.092208. This is negligible since the original values are quite large. Models that use 5 and 10 previous data points were also tried on the testing set. Compared to using a single previous point, considering 5 or 10 points actually has a negative effect: although the fluctuation of the data is captured, the predicted result is not as precise as expected.
Figure: (a) prediction based on 5 previous data points; (b) prediction based on 10 previous data points.
On December 14th, another test set, containing 3752 records captured from Bitfinex, was also tried with the model. The prediction result shows that the LSTM also performs well on datasets collected from different time periods.

Conclusion
Although both ARIMA and LSTM perform well in predicting the Bitcoin price, the LSTM requires additional time to train the neural network: about 42 minutes per epoch on a 4-core CPU, or 1 minute 12 seconds on a 2-core GPU. After training, however, the LSTM makes predictions more efficiently, and its precision is also higher. In general, using fewer previous data points for prediction in the LSTM leads to better results. ARIMA is quite efficient at making predictions over a short time span, but its precision decreases as the time span grows.