Study on the Spatial and Temporal Characteristics of Weibo Users’ Online Shopping Festival Concerns: A Case Study of “Double 11 Shopping Carnival”

Online shopping festivals have become a powerful means for major e-commerce platforms to conduct promotions. Exploring the patterns and hidden information in the changes of users’ attention to online shopping festivals can provide a scientific and effective marketing reference and promote the sustainable development of these festivals. This paper takes the “Double 11” online shopping festival as the research object. Based on an LDA + LSA + NMF integrated theme extraction model, we adopt kernel density analysis, grouping analysis, and cluster and outlier analysis in ArcGIS to explore the evolution pattern of microblog users’ attention to the online shopping festival in the spatial and temporal dimensions. In the temporal dimension, the number of posts showed a stepwise increase, and concerns showed annual topic evolution and daily three-stage differential evolution. In the spatial dimension, the release volume showed a gradient decreasing trend from coastal to inland and from east to west, and concerns showed certain regional differences. The main contribution of this paper is to propose a funnel theme extraction model with multi-model integration, and to provide a new perspective for the spatial research of online shopping festivals based on ArcGIS spatial analysis technology.


Introduction
With the rapid development of the Internet, major e-commerce platforms and merchants have created a series of online shopping festivals, such as "Double 11" and "618", to attract consumers to buy goods. During online shopping festivals, the focus of consumers keeps changing with consumer fashion. Capturing and analyzing these changes in a timely manner is very important, because it helps platforms and merchants understand consumer behavior and is directly related to the strategic sales choices of the major online shopping platforms. Therefore, it is necessary to study these major online shopping festivals in terms of consumer focus and its changes.
Sina Weibo is the largest real-time interactive social media sharing platform in China. Consumers' posts on Weibo carry both time and location tags, so they reflect consumers' concerns in a timely and accurate manner. Therefore, this paper explores the changes of consumers' focus on online shopping festivals in the time and space dimensions based on Weibo data. We mainly studied the number of microblogs published by consumers during online shopping carnivals over the years, as well as the temporal and spatial variation of their focus on online shopping festivals. Finally, we analyzed the patterns behind these changes.
At present, research on microblog attention analysis at home and abroad is mainly based on the TF-IDF algorithm, the LDA algorithm and the word2vec model. By combining these with corresponding influencing factors, such as microblog hashtags [1], retweet relationships [2] and local relationships [3], a more robust model can be built to extract more accurate microblog attention topics. As for the online shopping festival, which is an important part of current online marketing, related research is mainly about sales prediction for the festival [4] and marketing methods [5]. However, there is a lack of academic research on online shopping festivals from the perspective of consumers. This paper took the "Double 11" online shopping festival as an example and collected relevant microblog data. We built an integrated topic extraction model, LDA+LSA+NMF, through natural language processing techniques, which obtained better results in the empirical study. In addition, we used ArcGIS for visual spatio-temporal exploration of the evolution pattern of microblog users' concerns about online shopping festivals. On the one hand, this provides insight into the changes of consumers' attitudes toward online shopping festivals in multiple dimensions. On the other hand, it offers practical guidance for e-commerce platforms and merchants to improve their marketing strategies for online shopping festivals.
The main contributions of this paper are as follows: (1) We propose a multi-model integrated funnel topic extraction model. By comparing the LDA model, LSA model and NMF model, we created a coarse-to-fine topic extraction model. It can improve the accuracy of topic extraction, increase the differences among individual topics, and make the topic extraction results more interpretable. (2) We provide a new perspective for the spatial study of online shopping festivals. Based on ArcGIS spatial analysis technology, we analyzed the attention of network users in different regions to the online shopping festival. This makes up for the lack of spatial exploration in the field of online shopping and provides a theoretical basis for the spatial data analysis of online shopping festivals.

Literature Review
A review of the domestic and foreign literature of recent years shows that studies of online shopping festivals can be grouped along two dimensions: research perspective and analysis method.
The research perspectives on online shopping festivals mainly include the exploration of key influencing factors, promotional strategies and consumer behavior. Regarding influencing factors, Sumeru et al. [6] investigated the impact of price sensitivity on the online shopping atmosphere to determine whether incentives would affect consumers' purchase intention. Based on the theory of shopping motivation, the theory of social influence and the theory of conformity, Liu et al. [7] established a research model of the factors influencing consumers' shopping behavior during the online shopping festival. Regarding promotion strategies, Chen et al. [8] explored the effectiveness of the "Double 11" promotion stimulus on the willingness to participate of consumers with different personal characteristics, and the differences among them. Dewi et al. [9] explored the effect of online shopping festival promotion strategies on consumers' willingness to participate, and also examined the moderating effect of atmosphere promotion and control variables on this willingness. In recent years, some researchers have also begun to investigate the dynamic effects of consumer emotions on purchase intentions. Wang et al. [10] empirically analyzed the influence of the online shopping festival on the online shopping sentiment of Chinese consumers and the online sales price of commodities. Xu et al. [11] used carnival theory and herd behavior to explore how information incentives and social influences affect consumer behavior during Singles' Day.
In terms of analysis methods, the case study is the main research method, and domestic and foreign cases mainly focus on "Double 11" and "Black Friday". Wang [12] conducted an in-depth discussion of the causes, characteristics and scripts of the "Double Eleven" event through literature analysis, case study and comparative study. Srivas [13] conducted a questionnaire survey to understand the impact of "Black Friday" sales on consumer behavior in Delhi. In recent years, marketing-oriented big data analysis methods have become a trend. Ye et al. [14] analyzed the patterns and trends of Black Friday in the U.S. based on a Twitter dataset, using the DEA method to analyze tweet patterns in time and space. Lee et al. [15] analyzed descriptive features of perceptions and motivations, and developed and tested a shopping mechanism model to examine Korean consumers' experiences and perceptions of the Black Friday shopping festival.
The related research shows that studies of online shopping festivals have gradually begun to focus on the role of consumers and the impact on consumers. Among these topics, consumer behavior and emotion are the main focus of research and the future trend. Meanwhile, existing research methods are mainly based on case analysis and questionnaire analysis, which are prone to problems such as insufficiently representative survey samples.
Social media such as Weibo are favored by many researchers because of their data characteristics and topicality, and because their posts carry time tags and geographic location tags. Therefore, we use the spatio-temporal data characteristics of social media and build an LDA+LSA+NMF integrated topic extraction model. Then, based on the analysis of spatial and temporal characteristics, we explore in depth the evolution of users' concerns about "Double 11" from the perspective of time trends and geographical location trends. Finally, we uncover the patterns of consumer behavior during the online shopping festival to provide new ideas for future research.

Research Ideas
In this paper, we first collected the Weibo data of the "Double 11" online shopping festival and extracted Weibo users' points of attention with the proposed LDA+NMF+LSA integrated theme extraction model. Then, based on the extracted topic content and the number of microblog posts, we used ArcGIS for visual spatio-temporal exploration. We analyzed how Weibo users' focus on the online shopping festival evolves by region and by time segment. Finally, we put forward corresponding marketing suggestions for enterprises. The overall research framework is shown in figure 1.

Topic Extraction Model
(1) LDA probabilistic topic model
The Latent Dirichlet Allocation (LDA) model is a three-layer probabilistic topic generation model containing words, topics and documents. It has been widely applied in natural language processing research [16][17][18]. The process by which the LDA model generates document topic probabilities is shown in figure 2.
First, a document d is selected according to the prior probability P(d). Second, the topic distribution $\theta_d$ of document d is sampled from a Dirichlet distribution with hyperparameter α. Third, the topic $z_{d,n}$ of the nth word in document d is sampled from the multinomial distribution with parameter $\theta_d$. Fourth, the word distribution $\phi_k$ of each topic k is sampled from a Dirichlet distribution with hyperparameter β. Finally, the word $w_{d,n}$ is sampled from the multinomial word distribution $\phi_{z_{d,n}}$ of the topic $z_{d,n}$.

(2) Non-negative Matrix Factorization model
The Non-negative Matrix Factorization (NMF) model is an efficient data dimensionality reduction model [19]. Its decomposition idea is shown in figure 3. The document matrix $V_{M \times N}$ covers M words and N texts, where $V_{mn}$ is the feature value of the mth word in the nth text; the normalized TF-IDF value obtained after pre-processing is usually used as this value. The NMF model decomposes the document matrix into a topic matrix $W_{M \times K}$ and a topic weight matrix $H_{K \times N}$, so that $V \approx WH$. Here K is the number of generated topics, $W_{mk}$ denotes the probability correlation between the mth word and the kth topic, and $H_{kn}$ denotes the probability correlation between the nth text and the kth topic.
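As an illustration of the LDA generative process in (1), the following minimal sketch samples one toy document. The function names, the symmetric hyperparameter values and the six-word vocabulary are assumptions made for the example; it shows only the forward sampling steps, not the inference used to fit LDA to real microblog data.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw one index from a categorical distribution (a one-trial multinomial)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_topic_word_distributions(n_topics, vocab_size, beta, rng):
    """phi_k ~ Dirichlet(beta): one word distribution per topic."""
    return [sample_dirichlet([beta] * vocab_size, rng) for _ in range(n_topics)]

def generate_document(n_words, vocab, phi, alpha, rng):
    """Generate one document given the topic-word distributions phi."""
    # theta_d ~ Dirichlet(alpha): topic distribution of this document
    theta = sample_dirichlet([alpha] * len(phi), rng)
    topics, words = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)    # topic z_{d,n} of the nth word
        w = sample_categorical(phi[z], rng)   # word w_{d,n} drawn from phi_z
        topics.append(z)
        words.append(vocab[w])
    return theta, topics, words

# Toy vocabulary and hyperparameters (illustrative assumptions only).
vocab = ["buy", "cart", "taobao", "express", "delivery", "discount"]
rng = random.Random(0)
phi = sample_topic_word_distributions(n_topics=2, vocab_size=len(vocab), beta=0.5, rng=rng)
theta, topics, words = generate_document(8, vocab, phi, alpha=0.5, rng=rng)
print(theta, topics, words)
```

Fitting LDA reverses this process: given only the observed words, it estimates the topic distributions θ and the word distributions ϕ.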

(3) Latent Semantic Analysis model
The Latent Semantic Analysis (LSA) model is a practical topic extraction model based on the Singular Value Decomposition (SVD) method [20]. Given a matrix $A_{M \times N}$, its SVD is defined as:

$$A_{M \times N} = U_{M \times M} \, \Sigma_{M \times N} \, V_{N \times N}^{T}$$

where U and V are orthogonal matrices and Σ is a diagonal matrix whose non-negative entries are the singular values of A. The SVD method can be illustrated graphically, as shown in figure 4.
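A minimal sketch of how SVD yields LSA topics, assuming a small term-document count matrix and a truncation to the k largest singular values. The function name `lsa_topics`, the toy matrix and the choice of ranking terms by absolute loading are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def lsa_topics(A, terms, k, top_n=3):
    """Truncated SVD of a term-document matrix A (M terms x N documents).

    Returns the rank-k factors and, for each of the k latent topics,
    the terms with the largest absolute loadings.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    topics = []
    for j in range(k):
        order = np.argsort(-np.abs(U_k[:, j]))[:top_n]
        topics.append([terms[i] for i in order])
    return U_k, s_k, Vt_k, topics

# Toy term-document counts (rows: terms, columns: documents); illustrative only.
terms = ["buy", "taobao", "cart", "express", "delivery"]
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
U_k, s_k, Vt_k, topics = lsa_topics(A, terms, k=2)
print(s_k, topics)
```

Keeping only the k largest singular values gives the best rank-k approximation of the matrix in the least-squares sense, which is why the leading columns of U can be read as latent topics.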

ArcGIS Spatial Exploration
We use ArcGIS to conduct in-depth mining of the spatio-temporal data. The design is divided into the following three parts: kernel density analysis, grouping analysis, and clustering and outlier analysis.

(1) Kernel density analysis
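The kernel density analysis tool in ArcGIS calculates the density of elements in the neighborhood around each location. The kernel density at a location (x, y) is calculated as follows:

$$\mathrm{Density} = \frac{1}{\mathit{radius}^2} \sum_{i=1}^{n} \left[ \frac{3}{\pi} \cdot \mathit{pop}_i \left( 1 - \left( \frac{\mathit{dist}_i}{\mathit{radius}} \right)^2 \right)^2 \right], \quad \mathit{dist}_i < \mathit{radius}$$

where i = 1, ..., n are the input points, $\mathit{pop}_i$ is the population field value of point i, $\mathit{dist}_i$ is the distance between point i and the (x, y) position, and radius is the search radius. Only points within the search radius of (x, y) are included in the sum.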
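A minimal sketch of such a density estimate, assuming the quartic kernel form documented for the ArcGIS Kernel Density tool. The function name, the search radius parameter `radius` and the toy points are assumptions made for illustration; the actual tool also handles cell size, projections and default bandwidth selection, which are omitted here.

```python
import math

def kernel_density(x, y, points, radius):
    """Quartic-kernel density at (x, y).

    points: iterable of (px, py, pop) tuples, where pop is the population
            field value (for example, the number of posts at that point).
    radius: search radius; points at distance >= radius contribute nothing.
    """
    total = 0.0
    for px, py, pop in points:
        dist = math.hypot(px - x, py - y)
        if dist < radius:
            total += (3.0 / math.pi) * pop * (1.0 - (dist / radius) ** 2) ** 2
    return total / radius ** 2

# Toy input: three points with hypothetical post counts.
points = [(0.0, 0.0, 10), (1.0, 0.0, 5), (5.0, 5.0, 20)]
print(kernel_density(0.5, 0.0, points, radius=2.0))
```

The density is highest near points with large population values and falls smoothly to zero at the search radius, so a larger radius produces a smoother surface.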
(2) Grouping analysis
The grouping analysis tool in ArcGIS groups data naturally through the unsupervised K-means machine learning algorithm, so as to minimize the differences between elements within each group. This study used the grouping analysis tool in the Mapping Clusters toolset of the ArcGIS Spatial Statistics toolbox. We divided the provinces of China into spatial clusters according to the volume of data posted by Weibo users in each province, thereby achieving dimension reduction on regional features.

(3) Clustering and outlier analysis
The clustering and outlier analysis tool in ArcGIS applies the Local Moran's I statistic to a given set of weighted elements to identify statistically significant high-value clusters, low-value clusters and spatial outliers. The Local Moran's I statistic of spatial association is:

$$I_i = \frac{x_i - \bar{X}}{S_i^2} \sum_{j=1, j \neq i}^{n} w_{i,j} \left( x_j - \bar{X} \right)$$

where $x_i$ is the attribute of element i, $\bar{X}$ is the mean of the corresponding attribute, n is the total number of elements, $w_{i,j}$ is the spatial weight between elements i and j, and:

$$S_i^2 = \frac{\sum_{j=1, j \neq i}^{n} \left( x_j - \bar{X} \right)^2}{n - 1}$$

A positive value of I indicates that the element has neighboring elements with similarly high or similarly low attribute values, so the element is part of a cluster. A negative value of I indicates that the element has neighboring elements with dissimilar values, so the element is an outlier. When the p-value is small enough, the cluster or outlier is statistically significant.
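The Local Moran's I statistic described above can be sketched in a few lines. This is a minimal illustration assuming the standard form in which each element's deviation from the mean is multiplied by the weighted sum of its neighbors' deviations and scaled by a leave-one-out variance. The function name, the binary adjacency weights and the five toy values are hypothetical, and the permutation-based p-values that ArcGIS reports are omitted.

```python
def local_morans_i(values, weights):
    """Local Moran's I for each element.

    values:  list of attribute values x_i (e.g. posts per province).
    weights: n x n nested list, weights[i][j] = spatial weight between i and j.
    """
    n = len(values)
    mean = sum(values) / n
    result = []
    for i in range(n):
        # Variance term S_i^2 computed over all elements except i.
        s2 = sum((values[j] - mean) ** 2 for j in range(n) if j != i) / (n - 1)
        # Spatially weighted sum of the neighbors' deviations from the mean.
        lag = sum(weights[i][j] * (values[j] - mean) for j in range(n) if j != i)
        result.append((values[i] - mean) / s2 * lag)
    return result

# Five regions arranged in a chain: each region neighbors the next one.
values = [8, 9, 1, 9, 8]
weights = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
print(local_morans_i(values, weights))
```

In this toy example the middle region has a low value surrounded by high values, so its statistic is negative (an outlier), while the first region sits next to a similarly high neighbor and gets a positive value.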

Data Acquisition and Pre-processing
This paper collected all original microblog posts on the theme of "Double 11" published between November 1 and November 20 of each year from 2010 to 2019, obtaining 4,177,783 original posts. We then used Excel to remove redundant and duplicate data, and used Python's regular expression module to delete "@username" mentions, "#topic content#" hashtags, web links, meaningless words and extra spaces. Finally, we built a custom word segmentation function based on the jieba library in Python to perform Chinese word segmentation. A total of 3,741,936 posts remained after processing.
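The cleaning steps above can be sketched with Python's standard `re` module. The patterns, the function names and the `noise_words` parameter are illustrative assumptions rather than the exact rules used in the study, and the word segmentation step with jieba is left out so that the example runs without third-party packages.

```python
import re

# Illustrative patterns (assumptions, not the study's exact rules).
URL = re.compile(r"https?://\S+")      # web links
HASHTAG = re.compile(r"#[^#]+#")       # Weibo-style "#topic content#"
MENTION = re.compile(r"@[\w\-]+")      # "@username"
SPACES = re.compile(r"\s+")            # runs of whitespace

def clean_post(text, noise_words=()):
    """Remove links, hashtags, mentions, noise words and extra spaces."""
    text = URL.sub(" ", text)
    text = HASHTAG.sub(" ", text)
    text = MENTION.sub(" ", text)
    for word in noise_words:           # hypothetical list of meaningless words
        text = text.replace(word, " ")
    return SPACES.sub(" ", text).strip()

def deduplicate(posts):
    """Drop exact duplicate posts while keeping the original order."""
    return list(dict.fromkeys(posts))

raw = [
    "@shop_a #Double 11# cart is full http://t.cn/abc   buy buy",
    "@shop_a #Double 11# cart is full http://t.cn/abc   buy buy",
    "转发微博 双11 买买买",
]
cleaned = [clean_post(p, noise_words=("转发微博",)) for p in deduplicate(raw)]
print(cleaned)
```

Hashtags are removed before mentions so that an "@" inside a hashtag does not leave a partial topic behind.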

Comparison and Selection of Topic Extraction Models
In view of the advantages and limitations of the LDA, NMF and LSA models in topic extraction, we further tested the models on part of the collected microblog data. As an example, we took the 109,699 posts collected during the "Double 11" warm-up period in 2013. We used the three models to extract topics, set the number of topics to 15, and extracted the 25 most important words for each topic. Due to space constraints, we present only the top 6 words of the first five topics, where the numbers in parentheses are the probability correlations between words and topics, as shown in Table 1.
The extraction results in the above table showed the following. In terms of processing time, the NMF model was the fastest, the LSA model took a moderate amount of time, and the LDA model was the slowest. In terms of topic extraction results, the LDA model performed better because it takes prior knowledge into account, while the results of the NMF and LSA models were affected by local hot topics, which led to some topic deviation. Each model produced some similar themes, and subject words overlapped among the themes. The three models therefore each have their advantages and disadvantages.

Table 1
Topic   | LDA                     | NMF                        | LSA
Topic 1 | "Buy" (0.104)           | "Remember" (2.254)         | "Remember"
        | "Good" (0.056)          | "Wife" (1.675)             | "Wife"
        | "Stuff" (0.040)         | "Paypal" (1.632)           | "Paypal"
        | "Taobao" (0.033)        | "Code" (1.629)             | "Code"
        | "Shopping cart" (0.021) | "Internet Banking" (1.581) | "Internet Banking"
        | "Clothes" (0.019)       | "Error" (1.559)            | "Error"
Topic 2 | "Activity" (0.065)      | "Activity" (7.755)         | "Activity"
        | "Special Price" (0.028) | "Participation" (0.561)    | "Taobao"
        | "WeChat" (0.018)        | "Time" (0.505)             | "Discount"
        | "Mask" (0.017)          | "Address" (0.322)          | "TMC"
        | "Dear friends" (0.014)  | "WeChat" (0.315)           | "Share"
        | "Products" (0.014)      | "Participation" (0.299)    | "Shopping"

The LDA+NMF+LSA integrated topic model uses a funnel fusion approach to extract topics from coarse to fine. The specific steps are as follows. First, we use the LDA, NMF and LSA models separately to extract 15 rough topics each, with every rough topic containing its 30 most important keywords. Second, we combine the results of the three models according to the dimension of data processing. Third, we apply the efficient NMF model again to the combined results to extract the topics a second time. The final result includes 10 detailed topics, each of which includes 10 subject terms.
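The second, fine-grained stage of the funnel can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not specify how the three result sets are combined, so the sketch treats every rough topic as a pseudo-document of keywords, builds a term by pseudo-document matrix, and factorizes it with NMF using multiplicative updates. The names `nmf` and `funnel_topics` and the toy keyword lists are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize a non-negative matrix V (M x N) as W (M x k) @ H (k x N)
    using multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def funnel_topics(coarse_topics, k_fine, top_n):
    """Second funnel stage: re-extract k_fine topics from rough topics.

    coarse_topics: list of keyword lists, pooled from the LDA, NMF and LSA
                   runs (assumed pooling; each list is one pseudo-document).
    """
    vocab = sorted({w for topic in coarse_topics for w in topic})
    index = {w: i for i, w in enumerate(vocab)}
    V = np.zeros((len(vocab), len(coarse_topics)))
    for col, topic in enumerate(coarse_topics):
        for w in topic:
            V[index[w], col] += 1.0    # presence of keyword in rough topic
    W, _ = nmf(V, k_fine)
    # For each fine topic, keep the top_n terms with the largest loadings.
    return [[vocab[i] for i in np.argsort(-W[:, j])[:top_n]]
            for j in range(k_fine)]

# Toy rough topics standing in for the 3 x 15 topics of the first stage.
coarse = [
    ["buy", "cart", "taobao"], ["buy", "taobao", "discount"],
    ["express", "delivery", "parcel"], ["delivery", "parcel", "express"],
    ["cart", "buy", "discount"], ["parcel", "express", "logistics"],
]
print(funnel_topics(coarse, k_fine=2, top_n=3))
```

With the settings reported in the text, `coarse_topics` would hold 45 rough topics of 30 keywords each, and the call would use `k_fine=10` and `top_n=10`.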
Following the idea of the integrated topic model, we processed the data of the 2013 "Double 11" warm-up period again and obtained the final topics shown in Table 2.
The results of the integrated topic model show that, first, the characteristics of each theme are more distinct and the similarities between themes are significantly reduced. Second, the results of the integrated topic model are more interpretable. In conclusion, the LDA+NMF+LSA integrated topic model proposed in this study produces better topic extraction results and lays a solid foundation for mining the points of attention of microblog users on "Double 11".

Time Series Analysis
(1) Analysis of the variation of publishing volume over time
We analyzed the temporal evolution of microblog release volume and found the following patterns. Across the overall "Double 11" Weibo data from 2010 to 2019, the number of posts shows a "stepwise rise", with obvious growth in 2011, 2014 and 2018, as shown in figure 5. The curve of daily microblog release volume from November 1 to November 20 shows an obvious "inverted V" shape in every year, as shown in figure 6. According to the shape of the peak, each year's "Double 11" is divided into a warm-up period, a peak period and an afterheat period. The number of daily Weibo posts increased slowly up to November 9, which we call the warm-up period. The number of daily posts began to surge on November 10, reached its peak on November 11, and then fell rapidly on November 12; we call these three days the peak period. The release volume began to decrease slowly on November 13 and fell back to the warm-up level from November 16; we call this period the afterheat period.
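The three-stage division above translates directly into a date rule. The sketch below assigns each day from November 1 to November 20 to a stage and sums posts per stage; the function names and the daily counts in the example are hypothetical and only illustrate the bookkeeping.

```python
from datetime import date

def festival_phase(d):
    """Map a date to the "Double 11" stage defined in the text.

    Nov 1-9: warm-up, Nov 10-12: peak, Nov 13-20: afterheat.
    Dates outside November 1-20 return None.
    """
    if d.month != 11 or not 1 <= d.day <= 20:
        return None
    if d.day <= 9:
        return "warm-up"
    if d.day <= 12:
        return "peak"
    return "afterheat"

def volume_by_phase(daily_counts):
    """Sum daily post counts (dict: date -> count) per stage."""
    totals = {"warm-up": 0, "peak": 0, "afterheat": 0}
    for d, count in daily_counts.items():
        phase = festival_phase(d)
        if phase is not None:
            totals[phase] += count
    return totals

# Hypothetical daily counts, not data from the study.
counts = {date(2013, 11, 1): 5, date(2013, 11, 11): 100, date(2013, 11, 15): 7}
print(volume_by_phase(counts))
```

Grouping posts this way before topic extraction is what allows the themes of the three stages to be compared year by year.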
(2) Analysis of the time evolution of hot topics
We used the LDA+NMF+LSA integrated theme model to extract the themes of the "Double 11" microblog data. Through theme mining, we then analyzed the evolution of users' concerns in the three phases of the "Double 11" shopping festival: the warm-up period, the peak period and the afterheat period. In the results, "promotion" denotes marketing and advertising themes, "positive/negative feedback" denotes feedback or opinions shared by users during online shopping, and "logistics" refers to information about express delivery posted by users after the "Double 11" online shopping, as shown in Table 3. Based on the above table, we can summarize the thematic evolution for each year and stage as follows.
• "Double 11" warm-up period: promotion is the main content, and users' focus changes slightly
In general, "promotion" is the main theme of the warm-up period in every year. This shows that platforms and merchants try to attract consumers to make purchases before Singles' Day. Looking at individual years, the theme of "anticipation" gradually emerged from 2012 to 2014, "platform wars" and "negative feedback" began to appear in 2015, "negative feedback" and "TaoQi value swap" appeared in 2018, and the main themes in 2019 were "promotion" and "building". This shows that users' focus on "Double 11" online shopping has changed slightly over time.
• "Double 11" peak period: users actively participate in the discussion, and feedback is more positive
On the whole, users mainly focus on "positive feedback", "holiday singles/wishes" and "turnover", which indicates that users are more likely to participate actively in discussions during the peak period. Looking at individual years, "negative feedback" was mainly concentrated in 2012 and 2013, and "Tmall Gala" has been a focus of Weibo users since 2016. This indicates that many users gave negative feedback in the early years of "Double 11", when the activities were not yet well developed. Later, the appearance of the "Tmall Gala" made "Double 11" more diversified, and users' feedback on online shopping became more positive.
• "Double 11" afterheat period: users are more concerned about logistics and express delivery
In general, "logistics" and "positive/negative feedback" are the main topics of concern for Weibo users, indicating that many users share their express delivery information and shopping feedback after the "Double 11" online shopping. Looking at individual years, "promotion" still appeared in the afterheat periods of 2011 and 2012, indicating that merchants felt the afterglow effect of "Double 11" more strongly in those years. The theme of "poor evaluation of logistics" appeared for the first time in 2019, reflecting users' dissatisfaction with express delivery in that year.
In a word, the focus of users changes from stage to stage as the festival progresses, indicating that users' themes differ considerably across the stages of "Double 11".

Spatial Analysis
(1) Analysis of the variation of publishing volume over space
• The release volume in the geographical area shows a gradient decreasing trend from coast to inland and from east to west
The grouping analysis in ArcGIS shows the following. First, provinces in the high release area, such as Beijing and Guangdong, hold leading economic positions in China, while provinces in the medium and low release areas rank slightly lower, indicating that the geographical division based on grouping by microblog volume is feasible. Second, the gradients are distributed geographically so that release volume rises from west to east and from inland to coast.
• The eastern coastal provinces are in the high release aggregation area and the western inland provinces are in the low release aggregation area
The clustering and outlier analysis of the number of microblog posts by geographic area was carried out in ArcGIS. The clustering results for the provinces show that the eastern coastal provinces are in the high posting volume aggregation area and the western inland provinces are in the low posting volume aggregation area, which agrees with the results of the grouping analysis. Sichuan Province is an outlier with specially high posting volume, and Jiangxi Province is an outlier with specially low posting volume. The areas surrounding Guangdong and Beijing lack high-value provinces and are in the low-density area, so Guangdong and Beijing are not classified as part of the high release volume aggregation area.
(2) Analysis of the spatial evolution of the hot topic
We extracted the themes of "Double 11" for each year and each region, and summarized the themes of each year as shown in Table 4.
According to the above table, the thematic evolution of each region can be summarized as follows.
• Active participation and positive user feedback in the high release area
In the high release area, the themes that users focus on in general are "promotion" and "holiday blessing", indicating that Weibo users in the high release area are more active. Looking at the change of themes over the years, "holiday blessing" appeared from 2011 to 2015, and "promotion" decreased after 2011. This indicates that the marketing and promotion methods of platforms and merchants may have changed.
• Obvious promotional themes and positive user feedback in the medium release area
In the medium release area, the overall themes that users focus on are "promotion", "holiday blessing" and "transaction amount". The theme of "transaction amount" appeared from 2012 to 2015, indicating that microblog users in this area are more concerned about the "Double 11" results. The number of "promotion" and "positive feedback" topics in this area is higher than in the high release area, indicating that merchants are more active in marketing and users give more positive feedback.
• Prominent campus theme and coexisting positive and negative feedback in the low release area
In the low release area, the overall themes that users focus on are "promotion" and "logistics". "Negative feedback" appears more often than in the other two areas, indicating that negative sentiment among users in this area is more pronounced. "Campus feedback" appeared continuously in this area from 2016 to 2019, indicating that students have become the main participants in the discussion there.

Marketing Suggestions
Based on the evolution of Weibo users' focus on "Double 11" in terms of temporal and spatial characteristics, we can put forward the following marketing suggestions.
(1) Create significant momentum in the warm-up period and use a low-price strategy to attract audiences
"Promotion" is the main theme of the "Double 11" warm-up period in every year, indicating that users pay more attention to promotional goods in the early stage. Therefore, businesses should adopt a low-price discount strategy in the warm-up period, using half-price and buy-one-get-one-free offers to give consumers the sense that "buying is earning".
(2) Shape the ritual of the "Double 11" shopping festival in the peak period
During the peak period, platforms and merchants can interact with users through various social media to drive their enthusiasm for participation. In addition, the "Tmall Gala" can be held to send "holiday blessings", create a grand holiday atmosphere and further boost participation.
(3) Guide users toward positive feedback and strengthen logistics services in the afterheat period
"Logistics" and "positive/negative feedback" are the main concerns of Weibo users during the afterheat period. Therefore, platforms and merchants should guide users to give positive feedback and strengthen the logistics system, for example by offering free freight insurance and on-time delivery.

(4) Selection of geographical areas for targeted marketing
It is recommended that merchants target their marketing mainly at the high release area, where users have more diverse concerns and participate more actively. Big data recommendation algorithms can be used to push student-related products precisely to the low release area. Sichuan is a good target for merchants who need to market in the southwest region.

Conclusion
To study the discussion hotspots of microblog users on the "Double 11" online shopping festival, this study crawls the Weibo content about "Double 11" from Nov. 1, 2010 to Nov. 20, 2019. Then we propose and construct an LDA+LSA+NMF integrated topic extraction model, and combine ArcGIS technology to analyze the spatio-temporal evolution of Weibo users' focus on online shopping festival. The results show that: in the temporal dimension, the annual topic-based evolution and the daily three-stage differential evolution are presented. In the spatial dimension, certain regional differences are presented, with more diversified attention in the high release area and more direct attention in the middle release area and low release area. In addition, it can provide feasible suggestions for enterprises' precise marketing in time and space. However, there are still limitations in this study, as the theme mining in the spatial and temporal dimensions is relatively rough. In the future, according to the actual needs, we can carry out more detailed theme mining. This work is partly supported by