Studies on the influencing factors and prediction of product star change in the process of e-commerce transaction based on BP neural network and VAR models

Based on review and rating data for pacifiers sold on Amazon from February 2011 to August 2015, this paper uses the LDA topic model to extract the emotional words in the review text and the degree to which each review deviates from the topic. Combining these with review length, a VAR model is used to analyze how review length, emotional words and topic deviation affect the volatility of the overall market star rating. The study then compares the star-rating predictions of the VAR model and the BP neural network model, and finally puts forward a more stable prediction model.


Introduction
Consumers can select commodities based on the information displayed by merchants, and merchants can adjust their sales strategies over time based on consumer comments on commodities [1]. During an e-commerce transaction, all transaction behavior can be retained in the form of data. The text of e-commerce reviews is subjective and often difficult for sellers to use directly. Finding the link between the review text and the star rating therefore helps sellers analyze the product market.
Many scholars have studied the extraction and analysis of e-commerce review information. The LDA model is an important model for extracting text information. Using the LDA topic model, [2] and [3] analyze e-commerce reviews along multiple dimensions and provide intuitive information to their users. Many researchers believe that emotional orientation contributes significantly to the overall evaluation of products. [4] and [5] argue that the emotional tags of text content strongly affect how quickly people grasp product characteristics.

Data source and processing
The data used in this study consist of ratings and comments from customers of pacifiers sold on Amazon from June 2004 to August 2015.

Data cleaning
We deleted the records in which "vine" and "verified purchase" are both N or both Y. In the first case the merchant does not participate in the Amazon Vine review program and the customer did not purchase the product on Amazon; in the second case the merchant participates in the Amazon Vine review program and the customer purchased the product on Amazon.
Ratings and comments on e-commerce platforms are often influenced by paid posters (the so-called "Internet water army"). Based on the criteria for identifying such posters [6] and on the actual data, we propose the following criteria: a) the same user commented on the same product many times; b) there is a rating, but the comment title is inconsistent with the comment content; c) a single comment differs too much from the other comments. We manually filtered and deleted comments with these characteristics.
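The flag-based filter described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the records and the field names `review_id`, `vine` and `verified_purchase` are assumptions made for the example.

```python
# Hypothetical records; field names and values are assumptions for illustration.
reviews = [
    {"review_id": "r1", "vine": "N", "verified_purchase": "Y"},
    {"review_id": "r2", "vine": "N", "verified_purchase": "N"},
    {"review_id": "r3", "vine": "Y", "verified_purchase": "Y"},
    {"review_id": "r4", "vine": "Y", "verified_purchase": "N"},
]

def keep_review(review):
    """Keep a record only when the two flags differ (not both N, not both Y)."""
    return review["vine"] != review["verified_purchase"]

cleaned = [r for r in reviews if keep_review(r)]
kept_ids = [r["review_id"] for r in cleaned]
```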

Topic deviation and emotional vocabulary
In this paper, the LDA topic model is used to predict the star level of products based on text classification. The LDA model is essentially a Bayesian network with a clear logical hierarchy of three layers: words, topics and documents. It estimates the topic distribution of documents, giving the topics of each document in the document set as a probability distribution. The LDA model can also be used to calculate the degree of deviation between each comment and the topic. The greater the deviation, the more likely it is that the comment was written by a paid poster. The deviation between a comment and the topic is calculated in two steps.
First, the standard topic is generated by averaging. For each product, the topic probabilities of all its comments are averaged to produce the document-topic distribution of a standard comment.
Second, for each comment, we calculate its degree of deviation from the standard comment. We use the cosine similarity between the two topic distributions to obtain M1.
In addition, after analyzing the documents and extracting their topic distributions, we cluster or classify the text according to the emotional tendency of the comments and obtain M2 and M3.
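The two steps can be sketched with the standard definition of cosine similarity. The topic distributions below are invented, and the sketch does not settle whether M1 is the similarity itself or a transform of it such as one minus the similarity.

```python
import math

def mean_topic_distribution(topic_dists):
    """Average the per-comment topic distributions to get the 'standard comment'."""
    n = len(topic_dists)
    k = len(topic_dists[0])
    return [sum(d[j] for d in topic_dists) / n for j in range(k)]

def cosine_similarity(u, v):
    """Standard cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical topic distributions of three comments on one product (3 topics).
comment_topics = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],  # concentrates on a different topic than the others
]
standard = mean_topic_distribution(comment_topics)
similarities = [cosine_similarity(c, standard) for c in comment_topics]
least_similar = similarities.index(min(similarities))
```

In this toy input the third comment is the least similar to the standard comment, so it would be the one with the largest topic deviation.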

Measure words
To quantify the amount of information in each comment, we add M4, which measures the number of words. We divide the amount of information in each comment into ten levels. The higher the level, the more information the comment contains.
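The text does not state how the ten levels are formed. The sketch below assumes equal-width bins between the shortest and longest comment, which is only one possible choice; the function names are hypothetical.

```python
def word_count(text):
    """Count whitespace-separated words in a comment."""
    return len(text.split())

def information_level(count, min_count, max_count, levels=10):
    """Map a word count to a level in 1..levels using equal-width bins (assumption)."""
    if max_count == min_count:
        return 1
    width = (max_count - min_count) / levels
    level = int((count - min_count) / width) + 1
    return min(level, levels)  # the maximum count falls into the top level

# Hypothetical comments.
comments = ["Great product", "My baby loves this pacifier and it is easy to clean"]
counts = [word_count(c) for c in comments]
m4 = [information_level(c, min(counts), max(counts)) for c in counts]
```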

Other indicators
The remaining indicators are taken directly from the Amazon data.

BP neural network
A BP neural network consists of an input layer, a hidden layer and an output layer, with the nodes of adjacent layers connected to each other. The model continuously adjusts the weights and thresholds of the network through error back propagation to minimize the sum of squared network errors. After training, the network stores a large number of input-output mappings, from which the prediction results are obtained.
The neural network architecture adopted in this paper has one input layer, one output layer and two hidden layers with 10 nodes, whose activation functions are tansig and purelin respectively. Training uses traingdx, that is, gradient descent with momentum back propagation and a dynamic adaptive learning rate. We divided the 53 samples into 47 for training and 6 for testing, and established a star prediction model based on the BP neural network.
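The training idea can be illustrated with a small pure-Python sketch. It is not the MATLAB model used here: it assumes a single tanh hidden layer of 10 nodes with a linear output, uses a fixed learning rate instead of the adaptive rate of traingdx, and is trained on synthetic data rather than the review data.

```python
import math
import random

def train_bp(X, y, hidden=10, lr=0.05, momentum=0.9, epochs=300, seed=0):
    """Full-batch gradient descent with momentum on a tanh-hidden, linear-output net.
    Returns (predict function, per-epoch sum of squared errors)."""
    rng = random.Random(seed)
    n, n_in = len(X), len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    vW1 = [[0.0] * n_in for _ in range(hidden)]
    vb1 = [0.0] * hidden
    vW2 = [0.0] * hidden
    vb2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
             for j in range(hidden)]
        return h, sum(W2[j] * h[j] for j in range(hidden)) + b2

    sse_history = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        sse = 0.0
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target
            sse += err * err
            gb2 += err
            for j in range(hidden):
                gW2[j] += err * h[j]
                delta = err * W2[j] * (1.0 - h[j] ** 2)  # error propagated back to node j
                gb1[j] += delta
                for i in range(n_in):
                    gW1[j][i] += delta * x[i]
        sse_history.append(sse)
        # Momentum update: velocity = momentum * velocity - lr * mean gradient.
        for j in range(hidden):
            vW2[j] = momentum * vW2[j] - lr * gW2[j] / n
            W2[j] += vW2[j]
            vb1[j] = momentum * vb1[j] - lr * gb1[j] / n
            b1[j] += vb1[j]
            for i in range(n_in):
                vW1[j][i] = momentum * vW1[j][i] - lr * gW1[j][i] / n
                W1[j][i] += vW1[j][i]
        vb2 = momentum * vb2 - lr * gb2 / n
        b2 += vb2

    return (lambda x: forward(x)[1]), sse_history

# Synthetic stand-in for the 53 samples (NOT the data used in this paper).
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(53)]
y = [4.0 + 0.3 * a - 0.2 * b for a, b in X]
X_train, y_train, X_test, y_test = X[:47], y[:47], X[47:], y[47:]

predict, sse_history = train_bp(X_train, y_train)
test_predictions = [predict(x) for x in X_test]
```

The 47/6 split mirrors the one described above. Everything else (inputs, targets, hyperparameters) is illustrative.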

VAR model
The VAR model treats every endogenous variable in the system as a function of the lagged values of all endogenous variables in the system, which avoids the requirements of a structural model. It is an effective prediction model for systems of interconnected time series variables. The vector autoregression model is also frequently used to analyze the dynamic influence of different types of random errors on the system variables.
A VAR model describes n endogenous variables over the same sample period as linear functions of their past values. A VAR(p) model can be written as:
$$y_t = C + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + e_t \qquad (2)$$
where $C$ is an n × 1 constant vector, each $A_i$ ($i = 1, 2, \dots, p$) is an n × n coefficient matrix, and $e_t$ is an n × 1 error vector.
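A minimal sketch of how Equation (2) produces forecasts once the coefficients are known: the error term is set to its expected value of zero and each forecast is fed back as a lagged value. The coefficient values and the function name `var_forecast` are invented for illustration.

```python
def var_forecast(C, A_list, history, steps):
    """Iterate y_t = C + A_1 y_{t-1} + ... + A_p y_{t-p} with the error set to zero.
    history: past observations, oldest first, with at least p entries."""
    p = len(A_list)
    n = len(C)
    past = [list(v) for v in history[-p:]]
    forecasts = []
    for _ in range(steps):
        y = list(C)
        for lag, A in enumerate(A_list, start=1):
            prev = past[-lag]  # y_{t-lag}
            for i in range(n):
                y[i] += sum(A[i][j] * prev[j] for j in range(n))
        forecasts.append(y)
        past.append(y)  # feed the forecast back in for the next step
    return forecasts

# Hypothetical two-variable VAR(1).
C = [1.0, 0.0]
A1 = [[0.5, 0.0],
      [0.0, 0.5]]
forecasts = var_forecast(C, [A1], history=[[2.0, 4.0]], steps=2)
```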
In this study, the VAR star prediction model is established based on the topic deviation and comment sentiment of pacifier reviews, and is compared with the neural network model to find the best star prediction model for the e-commerce market.

Prediction accuracy of the model
This paper uses the percentage of prediction error to measure the prediction accuracy of the model. The calculation formula is as follows:
$$p_e = \frac{a_i - e_i}{e_i} \qquad (3)$$
where $p_e$ is the percentage of prediction error, $a_i$ is the predicted value of the test sample, and $e_i$ is the actual value of the test sample.
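Equation (3) can be written as a one-line function. Multiplying by 100 to express the ratio in percent is an assumption about how the reported percentages were obtained, and the sample values are invented.

```python
def prediction_error_pct(predicted, actual):
    """Relative prediction error (a_i - e_i) / e_i, expressed in percent."""
    return (predicted - actual) / actual * 100.0

# Hypothetical predicted and actual star ratings for three test months.
predicted = [4.10, 3.98, 4.25]
actual = [4.00, 4.00, 4.25]
errors = [prediction_error_pct(a, e) for a, e in zip(predicted, actual)]
```

A positive value means the model over-predicts the rating and a negative value means it under-predicts.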

Fitting degree of the model
This paper uses the AIC information criterion to measure the goodness of fit of the statistical model. The calculation formula is as follows:
$$\mathrm{AIC} = \ln\left(\frac{\mathrm{RSS}}{n}\right) + \frac{k}{n} \qquad (4)$$
where RSS is the residual sum of squares, n is the number of training samples and k is the number of explanatory variables.
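The sketch below implements Equation (4) as printed. A widely used variant of the criterion has a penalty term of 2k/n, so the `penalty` argument is left adjustable; both the function name and that argument are assumptions made for illustration.

```python
import math

def aic(rss, n, k, penalty=1.0):
    """AIC = ln(RSS / n) + penalty * k / n.
    penalty=1.0 follows Equation (4) as printed; penalty=2.0 gives the 2k/n variant."""
    return math.log(rss / n) + penalty * k / n

# Hypothetical values: RSS of 1.0 over 10 samples with 2 explanatory variables.
value_as_printed = aic(rss=1.0, n=10, k=2)
value_2k_variant = aic(rss=1.0, n=10, k=2, penalty=2.0)
```

Lower values indicate a better trade-off between fit and model size under either variant.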

Unit root inspection
A unit root test is used to check whether a time series has a unit root. If a unit root exists, the series is non-stationary.

Characteristic root test
The eigenvalues are used to judge whether the VAR model is stable. If all the roots of the model lie inside the unit circle, the VAR model is stable and its results are meaningful.
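For a VAR(1) model the roots in question are the eigenvalues of the coefficient matrix $A_1$. For a VAR(p) model they are the eigenvalues of its companion matrix. The sketch below checks the condition for a hypothetical 2 × 2 matrix using the closed-form eigenvalues, so it needs no numerical library. A model with more variables would need a general eigenvalue routine.

```python
import cmath

def eigenvalues_2x2(A):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + disc) / 2, (trace - disc) / 2]

def is_stable(A):
    """A VAR(1) is stable when every eigenvalue has modulus below 1."""
    return all(abs(lam) < 1 for lam in eigenvalues_2x2(A))

# Hypothetical coefficient matrices.
stable_A = [[0.5, 0.1], [0.2, 0.3]]
unstable_A = [[1.1, 0.0], [0.0, 0.5]]
results = (is_stable(stable_A), is_stable(unstable_A))
```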

Fig1. Prediction results of BP neural network
The model was run in MATLAB. It predicts the star series well, with prediction errors between -0.05% and 1.1%. The RSS of the model is 0.0248 and the AIC is -8.6618.

Stationary test of variables
The results of the ADF test are as follows: all six variables reject the null hypothesis at the 5% significance level, that is, all variables are stationary series.

Model results and tests
In econometrics, information criteria such as SC, AIC and HQ help to determine the optimal lag order of the model. As shown in the table below, the SC and HQ results support a lag of one period. Therefore, this paper establishes a VAR(1) model.

Stability test of VAR model
As shown in the figure, all eigenvalues lie inside the unit circle, so the model is stable.

b) Impulse response
Figure 4 shows the response of AS to a shock of one standard deviation in each variable. The response of M5 to its own shock is 0.0267 in the current period and then gradually declines to a stable level. In response to shocks in M2, M3 and M4, M5 does not move in the current period, reaches its peak in the second period, and then declines as the number of periods increases, eventually stabilizing.
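The shape of such a response (zero on impact, a peak one period later, then decay) can be reproduced with a small sketch. It assumes a VAR(1) with an invented coefficient matrix and a unit, non-orthogonalized shock, which is simpler than a one-standard-deviation shock with orthogonalized errors.

```python
def mat_vec(A, v):
    """Multiply matrix A by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def impulse_response(A, shock, horizons):
    """Responses of a VAR(1): the shock at horizon 0, then A applied repeatedly."""
    responses = [list(shock)]
    for _ in range(horizons):
        responses.append(mat_vec(A, responses[-1]))
    return responses

# Hypothetical VAR(1) coefficients; variable 0 depends on lagged variable 1.
A1 = [[0.5, 0.2],
      [0.0, 0.4]]
responses = impulse_response(A1, shock=[0.0, 1.0], horizons=3)
response_of_var0 = [r[0] for r in responses]
```

Here `response_of_var0` starts at zero, is largest one period after the shock, and then shrinks, because the shock reaches variable 0 only through the lagged value of variable 1.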

d) Prediction effect
The VAR model is used to forecast AS for the six months 2015m03 to 2015m08. The prediction error ranges from 0.07% to 0.27%, which indicates good predictive ability.

Results and analysis
We use the LDA topic model to extract the content of e-commerce text reviews, and analyze the potential factors influencing pacifier review stars on Amazon from three aspects: emotional vocabulary, number of review words, and topic deviation. The results of the VAR model show that the overall market stars, positive emotion words and negative emotion words of the previous period have a significant impact on the current market stars. It is worth noting that, in the variance decomposition, negative emotion words contribute nearly twice as much to star-level fluctuation as positive emotion words. This indicates that in the pacifier market, consumers are more likely to be affected by negative emotion words when commenting on goods. Finally, we use the BP neural network model and the VAR model for out-of-sample prediction. The results show that both models have excellent predictive ability.