Combining Big Data Analysis to Study the Relationship between the Tone of CSR Reports and Information Asymmetry

Big data mining and analytics help uncover hidden patterns and correlations in business. They are well suited to interpreting the behavior of companies in specific environments. Built on a large amount of data obtained from various sources, this paper examines the relationship between the tone of corporate social responsibility (CSR) reports and the degree of information asymmetry between investors and managers. Python is used for data collection, text analysis, and word frequency statistics. The results show that the tone of CSR reports reduces the degree of information asymmetry, indicating that the tone has an incremental information effect. Further analysis shows that the tone of CSR reports significantly reduces information asymmetry in companies with optimistic forecasts and high media attention.


Introduction
Disclosure of corporate social responsibility (CSR) activities has increased significantly over the years. CSR reports have received extensive attention from stakeholders and play an important role in the capital market [1]. A CSR report is a key tool for communicating social and environmental activities to stakeholders [2,3]. Social responsibility information disclosure is believed to significantly reduce information asymmetry in the capital market [4]. However, because CSR reports have low standardization, weak supervision, and no unified reporting framework [5], enterprises have considerable autonomy and discretion in social responsibility disclosure. Because social responsibility reports rely on narrative description, tone has a significant impact on their information content and communication effect.
The existing literature on the impact of social responsibility tone on the capital market takes two points of view. On the one hand, managers may use tone to provide external investors with timely, reliable, and hard-to-quantify incremental information, thus reducing the degree of information asymmetry. On the other hand, to cater to market stakeholders, managers may use the tone of the text to exaggerate corporate social responsibility and development prospects, thereby widening the information asymmetry between managers and investors [6]. Whether the tone of social responsibility reporting is a means of information increment or of impression management therefore deserves further study. Exploring the impact of the tone of CSR reports on the information asymmetry between managers and investors is of great significance for strengthening the supervision of social responsibility information disclosure and improving the efficiency of capital market resource allocation.
Using a dataset of 5074 CSR reports issued by China's Shanghai and Shenzhen A-share listed companies from 2014 to 2021, we document an association between the tone of CSR reports and information asymmetry. The results show a significant negative relationship between the net positive tone of social responsibility reporting and the degree of information asymmetry, which is stronger among companies with optimistic forecasts by analysts and high media attention.
Our study makes the following contributions. First, this paper expands the study of the economic consequences of tone in CSR reporting. It links CSR report tone with information asymmetry and clarifies the direct impact of CSR report tone on the market information environment, which provides an important reference for formulating regulatory policies on the tone of CSR report disclosure. Second, our results add to the growing body of studies on the quantitative dimension of text information and contribute to research on the information content of text. Research has covered the tone of annual reports [7], the tone of management discussion and analysis [8], and the tone of quarterly earnings announcements [9], but little has been done on the tone of CSR reporting. Finally, this study enriches the research on the factors that influence information asymmetry. At present, scholars mainly explore the impact of social responsibility information disclosure on information asymmetry [4] and pay less attention to the possible impact of the information content of the social responsibility report text itself. This paper explores the value of nonperformance information, such as the tone of CSR reports, to the information environment of the capital market.

Analysis Based on Information Increment Theory
The true disclosure hypothesis [10] holds that managers use positive and negative words to convey hard-to-quantify information that signals their expectations for the future performance of the enterprise.
On the one hand, studies have shown that corporate engagement in socially responsible activities can ease financing constraints, including borrowing costs [11] and equity capital costs [12], and can enhance company reputation [13] and performance [14], increase company value [15], and trigger a major stock market reaction. How to convey corporate social responsibility information truthfully, effectively, and accurately is therefore of great significance to enterprises and investors. Unlike financial information, CSR information relies heavily on qualitative textual descriptions. Managers may need to use a richer tone in reports to convey information that is difficult to quantify.
Numerous studies on the net tone of disclosure have shown that optimism in tone is effective in predicting a company's future operating performance and affects the company's value. Specifically, the net tone in different disclosure formats, such as annual reports [7], management discussion and analysis [8], earnings presentations [9], and company news [16], is significantly and positively associated with a company's future earnings. Moreover, when the information implied in the tone of the text is interpreted, the market usually responds in line with the emotional direction of the text; that is, the net tone of the text is significantly positively correlated with the company's future excess stock return and trading volume [16]. Therefore, the incremental information provided by tone can significantly reduce the information asymmetry between investors and company managers.
On the other hand, CSR information is an important reference for investors' decisions. In addition to the social responsibility report issued by the company, stakeholders can obtain corporate social responsibility information provided by independent third parties, such as the Rankins CSR Ratings (RKS), the KLD index, and the social responsibility rating of Hexun.com. The involvement of these third-party information intermediaries limits managers' opportunistic disclosure motives.
In summary, we believe that the tone of CSR reports has incremental information value, which can provide more enterprise-level information for China's capital market, thereby reducing the information asymmetry between managers and investors. This paper proposes the following hypothesis.
Hypothesis 1a. The tone of CSR reports is negatively associated with information asymmetry between managers and external stakeholders.
Impression management refers to managers manipulating the information disclosed by the enterprise out of self-interested motives [17]. By selectively processing disclosed information, managers can modify stakeholders' impression of the company [18], thereby misleading the judgment of information users.
First, good social responsibility performance brings many advantages to enterprises. Fulfilling corporate social responsibility can provide a positive image and reputation for a company [13] and moderate stakeholders' negative judgments about the company. As a result, managers with opportunistic and self-interested motives may engage in impression management of CSR reporting, which in turn interferes with stakeholders' judgments about the company's image, performance, and even management's reputation [19]. Previous literature has shown that "greenwashing" and impression management are common in CSR disclosure [20]. In narrative social responsibility reports, management's biased choice of tone is an important expression of impression management [20]. Cho et al. [20] found that companies with low environmental performance used biased words and tone (i.e., more optimistic and less certain) to paint a more favorable picture of their performance. Zhang et al. [21] found that "optimism" in social responsibility reports was negatively correlated with social responsibility fulfillment. Managers' use of tone to manage impressions further widens the information asymmetry between managers and investors.
Second, social responsibility reporting is weakly regulated and lacks a uniform reporting framework [22,23], and there is no mandatory system of independent third-party validation, making it less risky for companies to manage the tone of social responsibility reports. Enterprises disclose on a discretionary basis, and current social responsibility reports tend to be impressive in presentation but short on substance, lengthy but incomplete, and hard to compare [24]. CSR reports are mainly descriptive text and lack quantitative information. This heavy reliance on textual presentation gives managers greater scope and opportunity to manipulate disclosures through impression management.
To sum up, we believe that managers have a clear need and motivation to use tone to manage the impression of social responsibility report information, and the characteristics of social responsibility reporting provide more room for management to manipulate the tone of social responsibility reporting, which will aggravate the degree of information asymmetry between managers and investors.
Hypothesis 1b. The tone of CSR reports is positively associated with information asymmetry between managers and external stakeholders.

The Sample
The research sample consists of Shanghai and Shenzhen A-share listed companies that disclosed social responsibility reports from 2014 to 2021, and it is used to explore the relationship between the tone of corporate social responsibility reports and information asymmetry. First, Python is used to crawl the social responsibility reports of listed companies, and word frequency statistics are then computed with Python's "Jieba" Chinese word segmentation module to obtain the tone data. Media attention data come from the CN-RDS database, and all other data come from the CSMAR database. To prevent the empirical results from being affected by extreme values, samples with special treatment status, such as ST, PT, and delisted companies, are excluded. Finally, observations in the financial sector and observations with missing data for required variables are deleted. The final full sample includes 5074 observations.
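The screening steps described above can be sketched as a simple filter. This is a minimal illustration rather than the authors' code: the record layout and the field names (`status`, `industry`, `tone`, `ferror`, `fdisp`) are hypothetical, since the text does not describe the data schema.

```python
# Hypothetical field names and labels; the source does not specify a schema.
EXCLUDED_STATUSES = {"ST", "PT", "DELISTED"}
REQUIRED_FIELDS = ("tone", "ferror", "fdisp")


def select_sample(records):
    """Apply the three exclusion rules to a list of firm-year dicts."""
    kept = []
    for rec in records:
        # Rule 1: drop firms under special treatment or delisted.
        if rec.get("status") in EXCLUDED_STATUSES:
            continue
        # Rule 2: drop observations in the financial sector.
        if rec.get("industry") == "Financial":
            continue
        # Rule 3: drop observations missing any required variable.
        if any(rec.get(field) is None for field in REQUIRED_FIELDS):
            continue
        kept.append(rec)
    return kept
```

In practice the list of required variables would also include the control variables of the regression model.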

Variables
(1) Dependent variable: Information asymmetry
Following Cui et al. [4], we measure information asymmetry using the deviation and dispersion of analysts' forecasts. Analysts' disagreement over a company's future earnings may serve as a proxy for the degree of information asymmetry among investors [25]. The analyst forecast deviation (FERROR) is the average deviation of analysts' earnings forecasts from the actual earnings value, while the analyst forecast dispersion (FDISP) is the standard deviation of each analyst's most recent earnings forecast. Following the measurement method of Ye Yingying et al. [26], this article first excludes forecasts announced after the annual report announcement date (that is, later than April 30 of the following year). If the same analyst publishes multiple forecasts for the same company in a year, only that analyst's last forecast in the year is retained. Observations with missing actual earnings per share or forecast earnings per share are then excluded. Finally, formula (1) is used to measure the analyst forecast deviation, and formula (2) is used to measure the analyst forecast dispersion.
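Written out from the symbol definitions in this section, the two measures plausibly take the following form. The use of the mean forecast in (1) and the scaling by the absolute value of actual earnings in both formulas are assumptions based on common practice in this literature, not expressions confirmed by the text.

$$
FERROR_{i,t} = \frac{\mathrm{Abs}\left(\mathrm{Mean}(FEPS_{i,t}) - AEPS_{i,t}\right)}{\mathrm{Abs}(AEPS_{i,t})} \qquad (1)
$$

$$
FDISP_{i,t} = \frac{\mathrm{Std}(FEPS_{i,t})}{\mathrm{Abs}(AEPS_{i,t})} \qquad (2)
$$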
$AEPS_{i,t}$ is the company's actual earnings per share, and $FEPS_{i,t}$ is the analyst's forecast earnings per share. $Std(FEPS_{i,t})$ is the standard deviation of the analysts' earnings forecasts, and $Abs(AEPS_{i,t})$ is the absolute value of actual earnings.
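The forecast screening and the two measures can be illustrated for a single firm-year as follows. This is a sketch under stated assumptions: the tuple layout of a forecast is hypothetical, the error measure is taken as the absolute gap between the mean forecast and actual earnings scaled by absolute actual earnings, and the sample standard deviation (`statistics.stdev`) is used because the text does not say which estimator was applied.

```python
from datetime import date
from statistics import mean, stdev


def latest_forecasts(forecasts, cutoff):
    """Keep each analyst's last forecast made on or before the cutoff date.

    forecasts: iterable of (analyst_id, forecast_date, feps) for one firm-year
    (hypothetical layout). Forecasts with a missing value are dropped.
    """
    latest = {}
    for analyst, day, feps in forecasts:
        if feps is None or day > cutoff:
            continue
        if analyst not in latest or day > latest[analyst][0]:
            latest[analyst] = (day, feps)
    return [feps for _, feps in latest.values()]


def forecast_error(feps_list, aeps):
    """Assumed form: Abs(Mean(FEPS) - AEPS) / Abs(AEPS)."""
    return abs(mean(feps_list) - aeps) / abs(aeps)


def forecast_dispersion(feps_list, aeps):
    """Assumed form: Std(FEPS) / Abs(AEPS); needs at least two forecasts."""
    return stdev(feps_list) / abs(aeps)
```

For a fiscal year 2020 observation, the cutoff would be `date(2021, 4, 30)`, matching the April 30 rule in the text. Firm-years with zero actual earnings would need separate handling, since both ratios are undefined there.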
(2) Independent variable: Tone of CSR reports
The explanatory variable is the tone used by managers in social responsibility report disclosure. First, a web crawler written in Python 3.7 is used to batch download the social responsibility reports disclosed by listed companies. Then, we use Python to convert each report from PDF to plain text for easier processing, converting manually when the automatic conversion fails.
The key to constructing tone indicators is the emotion lexicon. Among foreign studies, Loughran and McDonald (2011) proposed a dictionary for the tone analysis of English annual reports of listed companies based on manual screening, which is widely used and highly authoritative [27]. Xie and Lin [28] manually screened and translated Loughran and McDonald's dictionary to suit Chinese word usage conventions and contexts. In addition, there are some general dictionaries for Chinese text analysis, such as HowNet, the Chinese Emotional Polarity Dictionary of National Taiwan University (NTUSD), and the Praise and Derogatory Dictionary of Tsinghua University. Their usefulness and accuracy for financial text are still in doubt. Acknowledging these limitations of the above dictionaries in the financial field, Bian et al. [29] constructed the Chinese Financial Text Sentiment Dictionary (CFSD) for the field of accounting and finance. Based on the lexicon construction method of Loughran and McDonald [7], they selected more than 20,000 annual reports, IPO prospectuses, online roadshow transcripts, and earnings conference call transcripts of Chinese companies as the base corpus. Combining the existing Chinese sentiment dictionaries (HowNet, DLUTSD, NTUSD) and the LM dictionary, they constructed a financial text sentiment analysis dictionary containing 1489 negative words and 1108 positive words. Similarly, Yao et al. [30] constructed a Chinese sentiment dictionary in the financial domain (CSDF) applicable to both formal and informal texts through dictionary restructuring and deep learning algorithms. Since the subject of our analysis is corporate social responsibility reports, the formal text dictionary is chosen.
Based on the above two Chinese sentiment dictionaries [29,30] applicable to the field of accounting and finance, Python (the Jieba package) is used to determine the frequency of positive and negative words in CSR reports and to calculate the tone variables ($Tone\_y_{i}$ and $Tone\_b_{i}$).
Following Price et al. [31] and Xie et al. [28], positive emotions are assigned +1 and negative emotions are assigned -1. We then calculate the net tone as shown in Eq. (3), where Positive is the number of words expressing an optimistic tone in the textual disclosure and Negative is the number of words expressing a negative tone. The larger the value of Tone, the more positive the vocabulary used by management and the overall tone of the social responsibility report. The variables are defined in Table 1.
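The word counting and net tone calculation can be sketched as follows. The formula used here, (Positive - Negative) / (Positive + Negative), is an assumption: it is the common net tone measure in this literature and is consistent with the reported mean values below 1, but the text does not reproduce Eq. (3). The function takes a list of already segmented words (for example, the output of `jieba.lcut` on the report text), so the sketch itself has no third-party dependency. The word lists in the usage example are toy values, not entries from the CFSD or CSDF dictionaries.

```python
from collections import Counter


def net_tone(tokens, positive_words, negative_words):
    """Assumed net tone: (Positive - Negative) / (Positive + Negative).

    tokens: list of segmented words from one CSR report.
    positive_words, negative_words: sets of dictionary entries.
    Returns 0.0 when the report contains no dictionary words.
    """
    counts = Counter(tokens)
    positive = sum(counts[word] for word in positive_words)
    negative = sum(counts[word] for word in negative_words)
    total = positive + negative
    if total == 0:
        return 0.0
    return (positive - negative) / total


# Toy example with illustrative word lists.
tokens = ["增长", "增长", "改善", "风险", "公司"]
tone = net_tone(tokens, {"增长", "改善"}, {"风险"})
```

Running the same function once with each dictionary would yield the two tone variables used in the analysis.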

Models
The objective of this study is to examine the effect of the net positive tone of CSR reporting on information asymmetry between managers and investors. The error and dispersion of analysts' forecasts are used as measures of information asymmetry. The following model uses the one-period lag of the tone variable so that it coincides with the timing of analysts' forecasting decisions.

$$
FDISP_{i,t} = \beta_0 + \beta_1 Tone_{i,t-1} + \beta_2 Size_{i,t} + \beta_3 Lev_{i,t} + \beta_4 BM_{i,t} + \beta_5 Board_{i,t} + \beta_6 Indep_{i,t} + \beta_7 Dual_{i,t} + \beta_8 Top5_{i,t} + \beta_9 Inst_{i,t} + \beta_{10} Soe_{i,t} + \text{Industry and Year fixed effects} + \varepsilon_{i,t}
$$

Table 1 (excerpt): Variable definitions
Dual: Equals 1 if the CEO and the chairman are the same person and 0 otherwise
Top5: The shareholding ratio of the top five shareholders
Inst: The shareholding ratio of the institutional investors
Soe: Equals 1 if a firm is a state-controlled enterprise and 0 otherwise

Table 2 reports the descriptive statistics of our main variables for the full sample. The mean value of FERROR, which measures information asymmetry, is 2.229, the standard deviation is 7.601, and the maximum value is 219.9, indicating a large deviation in analysts' earnings forecasts and reflecting a high degree of information asymmetry between management and investors. The average value of FDISP is 1.428 and the maximum value is 265.6, indicating that analysts' earnings forecasts diverge considerably. Among the explanatory variables, the average tone of CSR reports constructed using the "Chinese Financial Text Sentiment Dictionary" (CFSD) [29] is 0.848 and the median is 0.860, indicating that the tone of CSR reports of Chinese listed companies is generally optimistic, which is consistent with previous studies [6]. The mean tone of CSR reports constructed using "The Construction of Chinese sentiment dictionary in finance" (CSDF) [30], applicable to formal texts, is 0.877 and the median is 0.886, which is close to the first indicator.
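The lag term of the tone variable in the model can be built by pairing each firm-year with the same firm's tone from the previous year. The following is a minimal sketch; the dictionary layout keyed by `(firm_id, year)` is a hypothetical choice for illustration.

```python
def lag_tone(tone_by_firm_year):
    """Map each (firm_id, year) to the same firm's tone in year - 1.

    tone_by_firm_year: dict keyed by (firm_id, year) with tone values
    (hypothetical layout). Firm-years without a prior-year report are
    omitted, so they drop out of the regression sample.
    """
    lagged = {}
    for (firm, year) in tone_by_firm_year:
        previous = (firm, year - 1)
        if previous in tone_by_firm_year:
            lagged[(firm, year)] = tone_by_firm_year[previous]
    return lagged
```

Under this construction a firm's first reporting year has no lagged value.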

Multivariate Analysis
(1) The tone of CSR reports and information asymmetry
Table 3 presents the results of regressing the error and dispersion of analyst forecasts on the tone of CSR reports. The regression coefficients of the net positive tone of CSR reports (Tone_b and Tone_y) on the analyst forecast error (FERROR) are significantly negative at the 1% level. The coefficients of the net positive tone on the analyst forecast dispersion (FDISP) are significantly negative at the 5% and 10% levels, respectively. This finding supports H1a, which predicts that the net positive tone of CSR reporting reduces analyst forecast bias and divergence. That is, the tone of the social responsibility report reduces information asymmetry between management and external stakeholders, showing that the tone of CSR reporting has incremental informational value. Management uses tone to convey hard-to-quantify information and thereby provides more enterprise-level information for China's capital markets.
(2) Distinguishing the direction of analyst forecast deviation
A positive tone is a positive assessment of the company's level of social responsibility fulfillment and its future impact. If the positive tone is objective and true incremental information that accurately expresses good social responsibility performance and its positive impact on future corporate development, then it provides accurate information on corporate social responsibility and thereby reduces the degree of information asymmetry. In terms of the measurement indicators, the more accurately analysts can judge the company's future performance, the more objective their optimistic forecasts are and the smaller the deviation of those forecasts. In this paper, optimistic bias samples and pessimistic bias samples are distinguished according to the direction of the deviation between analysts' forecast earnings and actual earnings, and a group regression test is performed to examine whether the influence of CSR reporting tone differs between them. The test results are shown in Table 4. The regression coefficient of the net positive tone of the CSR report on analyst forecast bias is significantly negative only in the optimistic bias group, suggesting that tone is more likely to reduce information asymmetry and provide incremental information to investors when companies perform well.
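The grouping by direction of forecast deviation can be sketched as follows. This assumes that a firm-year is classed as optimistic when the mean forecast exceeds actual earnings and pessimistic when it falls below; the field names `mean_feps` and `aeps` are hypothetical, and dropping exact ties is a choice made here, not one stated in the text.

```python
def split_by_bias(observations):
    """Split firm-years into optimistic and pessimistic forecast groups.

    observations: list of dicts with hypothetical keys 'mean_feps'
    (mean analyst forecast EPS) and 'aeps' (actual EPS).
    """
    optimistic, pessimistic = [], []
    for obs in observations:
        if obs["mean_feps"] > obs["aeps"]:
            optimistic.append(obs)   # forecasts above actual earnings
        elif obs["mean_feps"] < obs["aeps"]:
            pessimistic.append(obs)  # forecasts below actual earnings
        # Exact ties are left out of both groups (assumption).
    return optimistic, pessimistic
```

The baseline regression would then be estimated separately on each group.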
(3) Grouping according to media attention
In the current regulatory environment of CSR reporting, the legal mechanisms for inhibiting impression management in CSR reports are relatively weak. The role of alternative non-legal mechanisms, such as media coverage, therefore deserves attention. Media coverage puts companies under government and public scrutiny [32]. The media plays an important role in the capital market through information dissemination and information manufacturing. As a result, the information environment of the capital market improves, and the information asymmetry between management and external stakeholders is reduced [33]. If the content disclosed by the media is inconsistent with the social responsibility information shaped by impression management, the regulatory authorities may pay attention to it or even intervene administratively. Managers' opportunistic impression management of CSR reports is thus effectively constrained. Under this market surveillance hypothesis, enterprises with high media attention are more inclined to disclose their true situation, and the tone of social responsibility reports can provide incremental information that reduces the degree of information asymmetry. Table 5 shows the group regression results: the regression coefficients of the net positive tone (Tone) of CSR reporting on analyst forecast dispersion (FDISP) and analyst forecast error (FERROR) are significantly negative only in the sample with high media attention.
Note: *, **, and *** indicate statistical significance at the 10%, 5%, and 1% levels, respectively.
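The grouping by media attention can be sketched as a median split. The text does not state the cutoff used to define high media attention, so the sample median is an assumption here, and the field name `media` is hypothetical.

```python
from statistics import median


def split_by_media_attention(observations):
    """Split firm-years at the sample median of media attention.

    observations: list of dicts with a hypothetical numeric key 'media'.
    Values strictly above the median form the high-attention group.
    """
    cutoff = median(obs["media"] for obs in observations)
    high = [obs for obs in observations if obs["media"] > cutoff]
    low = [obs for obs in observations if obs["media"] <= cutoff]
    return high, low
```

A by-year or by-industry median would be a reasonable alternative if media coverage trends over time or differs across sectors.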

Conclusion
Based on the theories of information increment and impression management, this paper discusses the influence of CSR report tone on the degree of information asymmetry between managers and external stakeholders. Taking A-share listed companies that issued CSR reports from 2014 to 2021 as the research sample, we find that the net positive tone of CSR reports reduces the degree of information asymmetry. This suggests that management may use tone to convey hard-to-quantify information, helping analysts and investors to better predict the value of CSR performance. Further analysis shows that the net positive tone has a significant negative correlation with information asymmetry in companies with optimistic forecast bias. This indicates that, when the enterprise performs well, a positive tone can accurately express good corporate social responsibility performance and its positive impact on the future development of the enterprise. Stakeholders and analysts can then judge the company's future performance more accurately, which reduces the degree of information asymmetry. In terms of media attention, companies with high media attention are subject to more regulatory pressure. The tone of their social responsibility reports is more inclined to convey real information and reduce the degree of information asymmetry between managers and stakeholders.
This paper provides practical value for external stakeholders, companies, and market regulators. First, investors need to enhance their ability to identify the textual information in CSR reports. The tone of social responsibility disclosures provides incremental information that is important to investors in making decisions. However, investors should not rely entirely on the information provided in CSR reports. They need to improve their ability to interpret textual information and make comprehensive decisions with reference to the company's internal governance, its business operations, and the characteristics of its industry. Second, companies can make proper use of their report disclosure to improve the information environment in the capital market. Finally, the policies related to social responsibility information disclosure are not yet mature. Regulators should introduce corresponding policies and regulate the information disclosure of CSR reports as soon as possible. In particular, they should pay attention to language features such as the textual tone of non-financial information and incorporate the tone of CSR reports into the regulatory framework of information disclosure.
We are grateful for the support of the National Natural Science Foundation of China (71902128), the Innovation spark project of Sichuan University (2018hhf-49) and the Social Science Research Fund