Research on Credit Evaluation Model of Online Store Based on SnowNLP

An online store's credit rating reflects the seller's integrity and the quality of its products, and the rating directly affects buyers' willingness to purchase. Two important factors affecting the credit rating are the data and the model. The innovation of this research is that the collected data come from second (additional) reviews, the credit evaluation model is improved based on the SnowNLP tool, and a filter for malicious review brushing is added. Compared with the credit evaluation systems commonly used by online stores today, the evaluation results of this paper are more accurate, detailed and intuitive, and may effectively reduce fake brushed reviews and reviews given under threat.


Introduction
Credit is the foundation of an online store's survival. For this reason, many online shopping platforms provide credit evaluation systems for both buyers and sellers. When a transaction is completed, each party can give the other a positive, neutral or negative review according to how satisfied it is with the transaction. This paper finds that the first review after a transaction is often written casually or before the buyer is familiar with the goods, whereas the second review is written after the buyer has actually used the goods. If the experience is especially good or especially poor, the second review is more representative of the store's actual credit rating than the first. Therefore, this paper uses sentiment analysis to analyze the text of second reviews, filters malicious brushed review content with TF-IDF, and builds the online store credit evaluation model with the SnowNLP sentiment analysis tool. The comparison results show that the credit evaluation model of this study is more accurate, detailed and intuitive than the credit evaluation systems commonly used by online stores, and that it can effectively reduce fake brushed reviews and reviews given under threat.

Text sentiment analysis
Text sentiment analysis, also known as opinion mining, refers to the process of analyzing, processing, summarizing and reasoning about subjective texts that carry emotional color.

SnowNLP
SnowNLP is a Python library that can perform Chinese word segmentation, part-of-speech tagging, sentiment analysis, text classification, pinyin conversion, traditional-to-simplified Chinese conversion, keyword extraction, summary extraction, sentence segmentation, and text similarity calculation. This paper uses SnowNLP to perform text sentiment analysis on the content of second reviews.
The procedure is: word segmentation, feature extraction, feature selection, classification model, recognition result.
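
The pipeline above can be illustrated with a minimal sketch. It assumes a naive Bayes classifier (which, to our understanding, is the kind of model behind SnowNLP's sentiment scores), uses whitespace splitting as a stand-in for Chinese word segmentation, skips feature selection by keeping every word as a feature, and trains on a tiny made-up data set. None of the names or data below come from the paper.

```python
import math
from collections import Counter

# Toy, made-up training data (assumption, not the paper's corpus).
TRAIN = [
    ("soft fabric fast delivery", "pos"),
    ("good quality soft", "pos"),
    ("color faded bad smell", "neg"),
    ("bad quality thin fabric", "neg"),
]

def tokenize(text):
    # Stand-in for Chinese word segmentation: split on whitespace.
    return text.lower().split()

def train(samples):
    # Feature extraction: per-class word counts and per-class document counts.
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab

def positive_probability(text, model):
    # Classification: naive Bayes with add-one smoothing, computed in log space.
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        log_scores[label] = score
    # Recognition result: normalize the two class scores to a probability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["pos"] / (weights["pos"] + weights["neg"])

model = train(TRAIN)
print(positive_probability("soft good fabric", model))
print(positive_probability("bad smell", model))
```

The returned value lies in [0, 1] and plays the same role as the positive probability that SnowNLP reports.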

TF-IDF
TF-IDF is a statistical method used to determine the importance of a word in a text set. The traditional method uses TF alone. TF (term frequency) assumes that the more often a word appears, the more important it is. In reviews, however, words such as "you", "I" and "he" appear often but carry little meaning, so judging importance by word frequency alone is inaccurate. This paper therefore introduces IDF (inverse document frequency): the fewer texts contain a word, the larger its IDF and the more important the word. Combining TF and IDF determines the importance of a word to the text set more accurately.
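
As a small worked example of this idea, the sketch below computes TF-IDF by hand. It assumes the plain variant tf = count / document length and idf = log(N / df); the paper does not state which variant it uses, and the three toy reviews are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

# Made-up reviews, already split into words.
docs = [
    "i like the soft quilt".split(),
    "i think the color faded".split(),
    "i got the quilt late".split(),
]
scores = tf_idf(docs)
print(scores[0])
```

Words such as "i" and "the" occur in every review, so their IDF is log(3/3) = 0 and they drop out, while "soft", which occurs in only one review, scores higher than "quilt", which occurs in two.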

Octopus data collector
The Octopus data collector can not only capture the text data of a page, but can also be set to turn pages automatically and capture text data from multiple pages. This paper uses the Octopus data collector to collect the text data of second reviews.

Emotional factor calculation formula
The negation factor Neg_i is -1 when a negative word is present and +1 when no negative word is present.
Degree adverbs are classified by degree, giving a positive or negative modifier Mod_i.
The emotional factor is denoted Q_adv_i.
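
The following is a hedged illustration only. The formula that combines the two terms is not reproduced in this text, so the sketch assumes the emotional factor is the product of the negation factor and the degree modifier. The word lists and degree weights are hypothetical, not the paper's lexicon.

```python
# Hypothetical lexicons; the paper's actual word lists are not given here.
NEGATION_WORDS = {"not", "never", "no"}
DEGREE_WEIGHTS = {"slightly": 0.5, "very": 1.5, "extremely": 2.0}

def neg_factor(tokens):
    # -1 when a negation word is present, +1 otherwise (as defined in the text).
    return -1 if any(t in NEGATION_WORDS for t in tokens) else 1

def mod_factor(tokens):
    # Weight of the first degree adverb found; 1.0 if there is none (assumption).
    for t in tokens:
        if t in DEGREE_WEIGHTS:
            return DEGREE_WEIGHTS[t]
    return 1.0

def emotional_factor(tokens):
    # ASSUMPTION: the factor is the product Neg * Mod.
    return neg_factor(tokens) * mod_factor(tokens)

print(emotional_factor("not very soft".split()))
print(emotional_factor("extremely soft".split()))
```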

Emotional value polarity formula
f_ti denotes an attribute mentioned in the comment.

Positive and negative formulas of emotional value intensity
In Formula 3, c_p represents a sentence whose text attribute, obtained after filtering, has positive polarity. In Formula 4, c_n represents a sentence whose text attribute, obtained after filtering, has negative polarity. W represents the weight of the positive or negative polarity in the text analysis: W_j is the positive weight and W_k is the negative weight.

Keyword extraction construction formula
(1) Segment the given text set T into sentences S, that is, T = [S1, S2, S3, ..., Sm].
(3) Build the candidate keyword graph G = (V, E), where V is the node set consisting of the candidate keywords selected in step (2). Edges are constructed from co-occurrence: an edge exists between two nodes only when the corresponding words co-occur within a window of length K, that is, within a span of at most K words.
(4) Then, according to the weight formula, calculate the weight of each node iteratively until convergence.
(5) Sort the node weights in descending order to get the N most important words. The collected keywords are marked in the original text set, and adjacent keywords are merged into multi-word keywords.
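
The graph construction, iteration and ranking steps above can be sketched as follows. This is a minimal TextRank-style illustration under stated assumptions: the input is already segmented and filtered to candidate words, edges are unweighted, the damping factor is the customary 0.85, and the final merging of adjacent keywords is omitted. The function name, parameters and toy data are hypothetical.

```python
from collections import defaultdict

def textrank_keywords(sentences, window=2, top_n=3, damping=0.85,
                      max_iter=100, tol=1e-6):
    # Link two words when they co-occur within `window` consecutive words
    # of the same sentence (undirected, unweighted edges).
    neighbors = defaultdict(set)
    for words in sentences:
        for i, word in enumerate(words):
            for other in words[i + 1:i + window]:
                if other != word:
                    neighbors[word].add(other)
                    neighbors[other].add(word)
    if not neighbors:
        return []
    # Iterate the node weights until they stop changing.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(max_iter):
        new_scores = {}
        for w in neighbors:
            rank = sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            new_scores[w] = (1 - damping) + damping * rank
        converged = max(abs(new_scores[w] - scores[w]) for w in scores) < tol
        scores = new_scores
        if converged:
            break
    # Sort by weight, highest first, and keep the top N words.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Made-up candidate words per sentence.
sentences = [
    ["quilt", "soft", "fabric"],
    ["fabric", "soft", "color"],
    ["delivery", "fast"],
    ["fabric", "thin"],
]
print(textrank_keywords(sentences, window=2, top_n=2))
```

Words that co-occur with many other candidates, here "soft" and "fabric", end up with the largest weights.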

Collecting data
The Octopus Data Collector software was used to collect the product reviews of a Taobao home textile shop.
(1) Select the web address and set the page-turning loop in the process interface.
(2) Select all reviews that have an additional review. For each review, select 5 fields: user name, user review, review time, whether the review is valid, and additional review. This includes cases where the buyer did not review in time, so the system defaulted to praise, but the buyer later added an additional review.
(3) Import the filtered data into an Excel table.
For example:

print(s1.sentiments)
The score result is 0.6105560234212057. The result of sentiment analysis is a value in the interval [0, 1]: the closer to 1, the more positive the emotion, and the closer to 0, the more negative. It can also be understood as the probability that the text is positive.
s.sentiments  # probability of positive: 0.8463107097139686
(5) Obtain the graph of emotionally positive results and the graph of emotionally negative results.
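
The scoring and its interpretation can be wrapped in a small helper, sketched here. `SnowNLP(text).sentiments` is the library call; the function names and the 0.5 cut-off between positive and negative are assumptions made for illustration, not values from the paper.

```python
def sentiment_score(text):
    # Imported lazily so that polarity() below can be used without the
    # third-party snownlp package installed (pip install snownlp).
    from snownlp import SnowNLP
    return SnowNLP(text).sentiments  # probability that the text is positive

def polarity(score, threshold=0.5):
    # ASSUMPTION: scores at or above 0.5 count as positive.
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "positive" if score >= threshold else "negative"

# The two scores quoted in the text above.
print(polarity(0.6105560234212057))
print(polarity(0.8463107097139686))
```

Splitting the reviews by this label gives the two groups needed for the positive and negative result graphs.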

Effective keyword extraction research
Crawling the review data shows that malicious review brushing exists, so reviews with the following characteristics need to be filtered out:
(1) A certain number of similar or identical reviews appear.
(2) Reviews at the top of the comment area are long and comprehensive.
(3) The text is short but contains too many similar words.
(4) A large number of anonymous ratings appear within a short period of time, especially from higher-level Taobao users.
(5) The length is moderate, but the text content is almost identical.
(6) The text contains high-frequency words common in review brushing.
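
The first characteristic, similar or identical reviews, can be checked with a short sketch. It uses character-level similarity from Python's standard `difflib` module rather than the TF-IDF filtering described in this paper, and the 0.9 threshold, function name and sample reviews are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def flag_suspicious(reviews, threshold=0.9, min_matches=1):
    """Return one boolean per review: True if it has near-duplicates."""
    flags = []
    for i, review in enumerate(reviews):
        # Count the other reviews whose text is almost the same as this one.
        matches = sum(
            1 for j, other in enumerate(reviews)
            if i != j
            and SequenceMatcher(None, review, other).ratio() >= threshold
        )
        flags.append(matches >= min_matches)
    return flags

# Made-up reviews: the first two differ only by one character.
reviews = [
    "great quilt, very soft and warm",
    "great quilt, very soft and warm!",
    "the color faded after one wash",
]
print(flag_suspicious(reviews))
```

Flagged reviews would then be removed or inspected before sentiment scoring. The pairwise comparison is quadratic in the number of reviews, which is acceptable for a single store's review set.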

Conclusions and significance
The evaluation system in wide use today is compared with the evaluation system of this paper:
(1) Comparing the result pictures, we can clearly see that the scores of the analyzed comments are more detailed. The original interface displays only some key short phrases and does not give consumers an intuitive visual form. After the improvement, specific words are presented and the keyword labels are more complete. This addresses the case of users who, because they were threatened into giving praise or because of their own review habits, still give praise even though they are not satisfied with the product quality. The content of such reviews may express dissatisfaction rather than praise, and this is difficult to notice through the original praise-based evaluation system alone.
(2) Society is developing rapidly and people's time is increasingly limited. Many consumers may not have enough time to browse all the reviews of a product. After extraction, they can quickly find the product's main review characteristics. With this more detailed guidance, consumers see not only the keywords of all comments but also their weights, so they can understand other consumers' feelings about the product more accurately and succinctly, which increases overall user stickiness.
(3) From the weights of the keywords in consumer reviews, merchants can also identify the advantages and disadvantages of their own products. They can further strengthen the advantages and highlight them in promotion so that product descriptions are more focused, and they can improve on the disadvantages.