Investigating How the Content Semantic Features Influence the Social Media Rumor Refutation Effectiveness

Due to the widespread use of the internet and social media, rumors can quickly spread to every corner of the world. Therefore, it is important to refute rumors precisely. To investigate how content semantic features affect the effectiveness of rumor refutation, this paper crawls 55,847 reposts and 77,080 comments on 251 rumor refutation microblogs for empirical analysis. The refutation effectiveness index (REI) is chosen as the response variable. Independent variables are defined from two aspects: content factors and semantic factors. Creator factors are set as control variables to exclude interference from other variables. The results show that some independent variables (the number of “!”, the presence of an obvious title, and hot topics) have a significantly positive relationship with REI. Moreover, the enhancing effect of hot topics on REI differs across topics. Finally, some suggestions for rumor management are proposed.


Introduction
With the development of online communication, rumors spread more rapidly and with greater impact. For example, with the outbreak of COVID-19, the term "information epidemic" was coined. It describes a situation in which too much information (some right, some wrong) makes it difficult for people to find out what they can trust, and which may even be harmful to their health [1]. Because of personalized service technology, the information pushed to users is based on a cumulative portrait of each user. This makes Internet rumor management especially difficult, but also highlights its urgency and importance. Many users produce information with obvious irrational emotions and profit-seeking purposes. They do not carefully check the authenticity of information when sharing it, which leads to the rapid spread of a large amount of false information. People are then prone to misjudge it, creating a vicious circle. Therefore, the precise management of rumors has practical significance and application value.
Existing research on rumor governance falls mainly into three areas: modelling, influencing factors and effectiveness evaluation. (1) Modelling. Kumari et al. proposed a large-scale pre-trained model for text novelty detection and sentiment prediction tasks [2]. Luo et al. proposed a framework for identifying rumor positions that combines network topology and online social media comments in order to identify rumors accurately [3]. (2) Influencing factors. Lv et al. analyzed the temporal evolution characteristics and spatial-temporal correlation characteristics of rumors, and revealed the characteristics and driving factors of social rumors during COVID-19 [4]. Lan et al. explored which key factors determine the influence of online rumors on online public opinion, and evaluated the strength of the influence of each parameter under different public opinion patterns [5]. (3) Evaluation of effectiveness. Li et al. proposed a rumor refutation effectiveness index (REI) and made recommendations for decision-making on disinformation platforms [6]. Wang et al. developed a Debunking Effectiveness Index (DEI) to help governments develop effective strategies to debunk rumors and curb their spread on social media [7].
Overall, no existing literature specifically addresses the impact of content semantic features on the effectiveness of rumor refutation, even though the specific content of a microblog is a key factor influencing users' judgement. The contributions of this paper are therefore as follows: (1) The influence of content semantic features on the effectiveness of rumor refutation is systematically considered. (2) The topic of the microblog is explored as a mediating variable affecting the effectiveness of rumor refutation, modelled through an interaction term.
The rest of this paper is organized as follows. Section 2 states the key problems, including the definitions of the refutation effectiveness index and the content semantic factors. Section 3 describes data preparation and variable setting. Section 4 analyzes the empirical data and explains the results. Section 5 presents conclusions and suggestions, as well as prospects for future research.

Key Problems Statement
In this section, the key definitions are stated.

Definition of REI
To quantify the effectiveness of rumor refutation, Li et al. proposed the rumor refutation effectiveness index (REI), which takes into account both the number of refutation posts and the impact of dissemination [6]. The index is a ratio built from the following quantities. r indicates the number of reposts of the rumor refutation microblog. Among the reposters there is also a group of influential users, namely the big-V users of microblog, whose followers can be potential reposters. The number of potential reposts is represented by r * k, with k indicating the proportion of big-V users among the reposters. p indicates the number of positive comments on a rumor refutation microblog. The 1 in the denominator represents the creator of the rumor refutation microblog, who necessarily agrees with it.

Definition of Content Semantic Factors
Existing research shows that the creator, specific content, structure and topic of a rumor refutation microblog all have an impact on its dissemination process. The theoretical framework of this paper is shown in figure 1.
(1) Content Factors 1) General text features [8]. This paper defines general text features as: text length, the number of sentences, and the numbers of "!" and "?".
2) Richness of content. The proportion of multimedia elements and external links has a positive effect on the online interactions of government departments [9]. The presence of figures, videos, titles, mentions, tags, and emotions also affects the number of users' comments [10]. Therefore, richness of content is measured by the numbers of figures, tags and "@", and by the presence of links, videos and titles.
3) Richness in parts of speech. The numbers of adjectives and adverbs have a significant effect on the propagation of government microblogs [11]. Therefore, richness in parts of speech is measured by the numbers of nouns, verbs, adjectives and adverbs.
2) Related topics. It has been shown that under different topics, positive words have significantly different influences on information transmission [11].
3) Hot topics. Du et al. showed that hot topics can attract people's continuous attention and discussion over a long period of time [13]. Therefore, if a rumor refutation microblog is related to a hot topic, it is expected to have a better rumor refutation effect.
(3) Creator Factors 1) Number of fans. The number of a user's followers plays an important role in the dissemination of information. The more fans a user has, the more widely that user is able to spread information to others [14].
2) Number of daily posts. The number of a user's daily posts can be a measure of the user's activity. However, when a user posts too much, the attention that any single post receives can be reduced [15].

Data Collection and Processing
This section states the process of data preparation, as well as variable setting.

Table 1. Variable setting.
Variable        Description
content_length  The total length of the microblog in Chinese characters
num_sentence    The number of sentences in a rumor refutation microblog
num_!           The number of "!" in a rumor refutation microblog
num_?           The number of "?" in a rumor refutation microblog
num_figure      The number of figures in a rumor refutation microblog
link            Whether the microblog contains a link (True=1, False=0)
video           Whether the microblog contains a video (True=1, False=0)
title           Whether the microblog contains a clear title (True=1, False=0)
num_tag         The number of tags in a rumor refutation microblog
num_@           The number of "@" in a rumor refutation microblog
num_n           The number of nouns in a rumor refutation microblog
num_v           The number of verbs in a rumor refutation microblog
num_adj         The number of adjectives in a rumor refutation microblog
num_adv         The number of adverbs in a rumor refutation microblog
sentiment       The emotional tendency of a rumor refutation microblog (Positive=1, Negative=0)
topics          The topic of a rumor refutation microblog (Political=1, Social=2, Economic=3)
hot_topic       Whether the microblog contains a hot topic
followers       The number of followers of the rumor refutation microblog creator
daily_posts     The number of daily posts of the rumor refutation microblog creator

Data Collection
This paper crawls 251 rumor refutation microblogs, together with their 55,847 reposts and 77,080 comments, from the official rumor refutation platform of microblog through the microblog API and a web crawler. The rumor refutation microblogs mainly cover three topics: political, social and economic.

Data Processing
Unlike English text, Chinese text does not use spaces as a natural separator between words, so the words must be split before the microblogs are processed further. Jieba is chosen as the word segmentation tool, and its precise mode is used. After segmentation, the sequence of words is saved in the database with spaces as dividers. To identify the sentiment tendency of the texts and count the number of positive comments, this paper uses Baidu's AirNLP to perform sentiment analysis. It returns the probability that a text is positive. When the probability is greater than 0.5, the text is judged to be positive; otherwise it is judged to be negative.
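The two post-processing steps described above (storing segmented words with spaces, and thresholding the positive probability at 0.5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the token list is assumed to come from a segmenter such as Jieba, and the probabilities are assumed to come from the sentiment service.

```python
def join_tokens(tokens):
    """Store a segmented text as one string with spaces as dividers."""
    return " ".join(tokens)


def label_sentiment(positive_prob, threshold=0.5):
    """Return 1 (positive) when the probability exceeds the threshold, else 0."""
    return 1 if positive_prob > threshold else 0


def count_positive(positive_probs, threshold=0.5):
    """Count the comments judged positive, i.e. the quantity p used by REI."""
    return sum(label_sentiment(prob, threshold) for prob in positive_probs)


# Hypothetical inputs: tokens as a segmenter might return them
# (assumption: e.g. jieba.lcut(text, cut_all=False)), and made-up
# positive probabilities for four comments.
stored = join_tokens(["网传", "消息", "不", "属实"])
num_positive = count_positive([0.91, 0.50, 0.20, 0.51])
```

A probability of exactly 0.5 is treated as not positive here, which follows the "greater than 0.5" wording above.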

Variable Setting
According to the theoretical framework, the variable setting is shown in table 1.
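Several of the content factors in table 1 can be counted directly from the raw text. The sketch below is illustrative only and rests on assumptions that the paper does not state: full-width punctuation ("！", "？") is counted together with the ASCII forms, a tag is taken to be text enclosed in a pair of "#", and a title is taken to be a leading span enclosed in "【】". The function name and the example text are hypothetical.

```python
import re


def content_features(text):
    """Count simple content factors in one microblog text."""
    return {
        # Assumption: full-width and ASCII marks are counted together.
        "num_!": text.count("!") + text.count("！"),
        "num_?": text.count("?") + text.count("？"),
        "num_@": text.count("@"),
        # Assumption: a tag is written as #topic#.
        "num_tag": len(re.findall(r"#[^#]+#", text)),
        "link": int(re.search(r"https?://", text) is not None),
        # Assumption: a clear title is a leading span in 【】 brackets.
        "title": int(re.match(r"\s*【[^】]+】", text) is not None),
    }


example = "【辟谣】网传消息不实！请勿转发！#辟谣# @某用户 详情见 http://example.com/a"
features = content_features(example)
```

Part-of-speech counts (num_n, num_v, num_adj, num_adv) would need a tagger and are left out of this sketch.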

Empirical Analysis and Results
This section carries out descriptive and statistical analysis. A multiple linear regression model will be established. The results of data analysis will also be interpreted.

Descriptive Statistics
For continuous numerical variables, scatterplots between REI and the independent variables are shown in figure 2. It is found that only the number of "!" has a positive relationship with REI. The specific relationships between the variables are described in detail in the statistical analysis section.
For the categorical variables, this paper computes descriptive statistics of REI under the different categories. The specific results are shown in table 2. From a thematic perspective, social rumor refutation microblogs have a higher REI and are more effective when they are hot topics. From the perspective of sentiment, negative rumor refutation microblogs are more numerous, but their average REI is lower than that of positive ones. From the perspective of the content factors, the presence of links or videos is not significantly related to the average REI. The significant difference in average REI between microblogs with and without explicit titles may be due to the uneven number of microblogs in the two categories.

Statistical Analysis
There exist various approaches to the statistical analysis of high-dimensional data in the framework of regression models [16][17][18], and different methods for estimating the parameters of the model and the elements of the covariance matrix. Due to the large number of independent variables, this paper first uses the variance inflation factor (VIF) to diagnose multicollinearity. A VIF greater than 10 is taken to indicate multicollinearity in the corresponding variable. Only the VIF of content_length is found to be greater than 10. Therefore, content_length is removed and the multicollinearity test is re-run. The results, shown in table 3, demonstrate that no multicollinearity remains.
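The VIF screening described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The function names and the toy columns are hypothetical.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical toy data: the third column is almost the sum of the first two.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
x3 = [3.1, 2.9, 7.0, 7.1, 9.9]
vif_values = vif([x1, x2, x3])
flagged = [j for j, value in enumerate(vif_values) if value > 10]
```

In the paper's procedure a flagged variable (content_length) is removed and the test is repeated on the remaining variables; the same function can simply be called again on the reduced column list.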
To further explore the effect of content factors and semantic factors on the effectiveness of rumor refutation, this paper conducts a partial correlation analysis using the creator factors as control variables. Part of the results of the partial correlation analysis are shown in table 4. For content factors, only num_! and title are significantly positively correlated with REI, with correlation coefficients of 0.253 and 0.160, respectively. For semantic factors, hot_topic is significantly positively correlated with REI, with a correlation coefficient of 0.246. While sentiment and topics are not significantly correlated with REI, they are significantly correlated with hot_topic. This paper therefore further explores the interaction of hot_topic with sentiment and with topics, shown in table 5. The results show that the interaction term topics*hot_topic is significantly and positively correlated with REI. This suggests that hot topics can enhance the effectiveness of rumor refutation, and that the enhancement effect varies across topics.
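A partial correlation removes the linear influence of the control variables before correlating two variables. The sketch below is a minimal illustration, not the authors' code: it applies the standard recursive formula, eliminating one control at a time. The function names and the toy data are hypothetical, and significance testing is not shown.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, controls):
    """Correlation of x and y after controlling for every column in controls."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: x and y are related only through the control.
control = [1, 1, -1, -1, 1, 1, -1, -1]
noise_x = [1, -1, 1, -1, 1, -1, 1, -1]
noise_y = [1, 1, 1, 1, -1, -1, -1, -1]
x = [c + e for c, e in zip(control, noise_x)]
y = [c + e for c, e in zip(control, noise_y)]

raw = pearson(x, y)                       # correlation before controlling
adjusted = partial_corr(x, y, [control])  # correlation after controlling
```

In the toy data the raw correlation is driven entirely by the shared control, so it vanishes once the control is accounted for. In the paper's setting the controls would be followers and daily_posts.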

Results of Multiple Linear Regression Model
Based on the previous descriptive and statistical analysis, a multiple linear regression model is developed, with the results shown in table 7 and figure 3. The variables significantly associated with REI are used as independent variables. Model 0 is the basic model, including only the dependent variable REI and the control variables. Model 1 introduces num_! and title (content factors). Model 2 introduces hot_topic (semantic factors) on the basis of Model 1. Model 3 introduces the interaction term between hot_topic and topics. The adjusted R-squared and the Akaike information criterion (AIC) are chosen as the criteria for judging the goodness of fit of the model.
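The comparison of nested models by adjusted R-squared and AIC can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' code or data: ordinary least squares is solved through the normal equations, AIC is computed in its Gaussian form n*ln(RSS/n) + 2k (which omits an additive constant, so values from statistical packages may differ), and the topic is simplified to a hypothetical 0/1 dummy for "social". All names and numbers are made up for illustration.

```python
from math import log


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def fit_ols(predictors, y):
    """Fit y on an intercept plus the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(design[0])  # number of coefficients, intercept included
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)]
           for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    aic = n * log(rss / n) + 2 * k  # Gaussian AIC up to a constant
    return {"beta": beta, "adj_r2": adj_r2, "aic": aic}


# Hypothetical toy data: the hot-topic effect is stronger for social posts.
hot = [0, 0, 0, 0, 1, 1, 1, 1]
social = [0, 1, 0, 1, 0, 1, 0, 1]
interaction = [h * s for h, s in zip(hot, social)]
noise = [0.01, -0.01, -0.01, 0.01, 0.01, -0.01, -0.01, 0.01]
rei = [0.1 + 0.2 * h + 0.5 * i + e
       for h, i, e in zip(hot, interaction, noise)]

without_interaction = fit_ols([hot], rei)
with_interaction = fit_ols([hot, interaction], rei)
```

A higher adjusted R-squared together with a lower AIC for the model with the interaction term is the pattern the text uses to prefer the richer model.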
As for content factors, the number of "!" in a microblog and whether the microblog has an obvious title are positively correlated with REI. "!" is usually found alongside degree words such as "very" and "especially", which tend to attract attention. When people pay more attention to a rumor refutation microblog, they are more likely to repost it. Meanwhile, a clear title lets people find the topics they are interested in, so they can easily become potential reposters and commenters of the rumor refutation microblog as well.
As for semantic factors, both hot_topic and the interaction term are significantly related to REI. If a rumor refutation microblog belongs to a hot topic, it's more likely to spark a
In terms of model goodness of fit, the adjusted R-squared gradually increases and AIC gradually decreases from Model 0 to Model 3. This further supports the proposed model.

Discussions
This paper explored the impact of content semantic features on the effectiveness of rumor refutation. The main conclusion is that content semantic features do have an impact on the effectiveness of rumor refutation. (1) The number of "!" and the presence of a title, which belong to the content factors, have a significantly positive relationship with REI. (2) The presence of hot topics plays an important role in improving rumor refutation effectiveness. Moreover, the enhancing effect differs across topics. Compared with the political and economic categories, social rumor refutation microblogs containing a hot topic are more effective.
Based on these conclusions, some suggestions about rumor refutation are proposed. (1) Rumor refutation microblogs should include an obvious title to attract people who are interested in the topic. (2) Warning identifiers such as "!" may be used in rumor refutation microblogs, since they can be more successful in highlighting what the microblog wants to say. (3) Rumor refutation microblogs that contain hot topics spread and are accepted much more easily. Therefore, the government should pay more attention to the management of rumors that do not involve a hot topic, especially economic and political rumors.
In future research, more data should be collected to perform a more accurate analysis. Methods such as machine learning and structural equation modelling can be applied to further explore the factors affecting the effectiveness of rumor refutation.