Construction and Application of University Patent Evaluation Model based on Machine Learning

As the frontier of scientific and technological innovation, universities produce a large number of patents by virtue of their advantages in talent, technology and resources. Evaluating the value of university patents in a more scientific and efficient manner is of great significance for improving the scientific research and innovation capability of universities and for promoting the transfer and transformation of university patents. First, drawing on the characteristics of universities and the definition of “high-value patents”, we constructed a scientific evaluation index system for university patent value. Second, we used machine learning algorithms to build patent value evaluation models. Finally, we conducted an empirical study with invention patent data from 134 universities in Sichuan Province and compared the performance of six evaluation models. The XGB and GBDT models were found to have higher accuracy and reliability than the other models. In addition, the number of IPC classifications, patent family citations and independent claims are of higher importance in patent value evaluation, whereas university characteristics are less important to the value of university patents.


Introduction
Intellectual Property (IP) is of great significance to a country's innovative development and sustainable economic growth. The report of the 20th National Congress of the Communist Party of China emphasized that we must adhere to the principle that science and technology are the primary productive force, talent is the primary resource, and innovation is the primary driving force. It also called for improving the scientific and technological innovation system, accelerating the implementation of the innovation-driven development strategy, and enhancing the capacity for independent innovation. As an important form and core component of IP, patents have greatly promoted scientific and technological progress and socioeconomic development. Since the implementation of the innovation-driven development strategy, the number of invention patent applications accepted by the China National Intellectual Property Administration (CNIPA) has grown rapidly year after year, ranking first in the world for consecutive years. However, compared with developed countries in Europe and the United States, China is a country with a large volume of patent applications rather than high patent quality, and the quality of its patents still needs to be improved.
As an important part of China's innovation system, colleges and universities (hereinafter "universities") have diverse academic backgrounds, solid research foundations, well-structured talent echelons and rich research resources. Relying on scientific research projects and the National Science Fund, universities produce a large number of scientific and technological innovation achievements every year, and patents are among the most important of these. In 2020, the Chinese government issued the "Some Opinions on Improving the Quality of University Patents and Promoting the Transfer and Application", which clearly stated that the quality of university patents should be comprehensively improved and that the creation, application and management of high-value patents should be strengthened 1 . According to the "2021 Annual Report of Intellectual Property Statistics", by 2021 invention patents granted to universities accounted for 23.61% of the national total; in 2021, universities accounted for 18.26% of domestic invention patent applications, and their invention patent grants reached 24.99% of the total 2 . However, while the overall industrialization rate of China's invention patents was 35.4% in 2021, that of university patents was only 3% 3 , which shows that universities still hold a large number of "sleeping patents" that have not been put into production and application. Therefore, constructing a scientifically sound and operable evaluation method is of great practical significance for improving the patent quality and industrialization rate of universities, enhancing their independent innovation ability, and achieving high-level self-reliance in science and technology for China.
Our literature review shows that although scholars in China and abroad have conducted in-depth research on patent value evaluation, only a few have studied the value assessment of university patents, and no unified and effective index system or model for this purpose has yet been formed. We therefore innovatively incorporate the characteristics of the university holding the patent right and China's definition of "high-value patents" 4 into the evaluation indicators, and construct a scientific and efficient evaluation index system for university patents. We then use various machine learning methods to build university patent value evaluation models, and finally conduct an empirical study on the patents of 134 universities in Sichuan Province, testing the performance of each model in order to select the one with the best evaluation effect.
The rest of this paper is organized as follows: Section 2 reviews and analyzes existing studies; Section 3 constructs the value evaluation index system for university patents; Section 4 introduces the data, builds the evaluation models, and conducts performance tests; the last section summarizes the main findings and limitations of the study.

Literature Review
Current research on patent value evaluation mainly covers two aspects: constructing a patent value evaluation index system by analyzing influencing factors, and choosing an appropriate method to build a patent value evaluation model. In addition, some scholars have begun to focus on evaluating the value of patents for transfer and transformation.
A patent is an intangible commodity with extremely high use value and exchange value, and many factors affect its value. Existing studies mainly analyze patent value from three aspects: technical value, legal value, and market value [1]. In terms of technical value, Bakker J's study shows that the number of patent citations has a log-linear relationship with patent value [2]. Some scholars measured patent value directly by the number of patent citations and used it as the dependent variable in a regression analysis [3]. Squicciarini M and others used the number of IPC classes to measure the technical scope of patents and found that it was positively related to patent value [4]. Callaert J's research found a correlation between the number of non-patent references cited in patents and the value of the patented technology [5]. Mark investigated the number and quality of joint patents of American universities and found that the quality of university patents differs depending on the partner [6]. The number of overseas family patents has been used to identify high-value patents 5 . For patent families in general, studies by Schettino F [7] and Squicciarini M [4] verified that patent family size has a positive effect on patent value. In terms of legal and market value, Marco A C and others studied the positive impact of the length and number of patent claims on patent value [8]. Research by Rassenfosse D G and Jaffe A B shows that raising patent fees can effectively weed out low-quality patents; since the fees paid grow the longer a patent is maintained, a longer maintenance period signals a higher patent value [9]. Boeing explored the use of the number of independent claims in patent value evaluation [10]. Some researchers found a significant positive correlation between patent value and its economic value [11].
In addition, the number of emerging industrial fields 6 and the number of patent document pages [12] are also considered to be related to patent value.
Second, the methods used to evaluate patent value in relevant studies can be divided into three main categories. (1) Economic methods, such as the expenditure approach, the income approach and real options. Mark Russell used the income approach and the expenditure approach to value pharmaceutical intangible assets, and found that the discounted cash flow valuation of pharmaceutical patents has value relevance [13]; Wu M C explored the factors influencing patent value within a real options framework, and found that companies could increase the value of their patents by reducing costs, increasing the number of patents and improving innovation efficiency [14]. (2) Comprehensive evaluation methods, such as the Analytic Hierarchy Process (AHP), the Entropy Method and the Fuzzy Comprehensive Evaluation Method. Ko et al. used Principal Component Analysis (PCA) to select 21 value evaluation indicators covering both the internal and external characteristics of patents [15]; Han and S. Y. Sohn used text mining to identify elements related to patent value, focusing on their life spans, which helps to improve the accuracy of patent value evaluation [16]; Choi et al. combined social network analysis and multiple regression analysis to construct a model for evaluating the value of university patents [17]; Mukundan and others advanced the patent quality (PQ) literature by identifying a fourth generation of strategic PQ indicators and proposed a hybrid multi-criteria model, based on AHP-TOPSIS, for patent portfolio measurement [18]; Barbazza and other scholars proposed a multi-expert system for ranking patents using TOPSIS and the Analytic Hierarchy Process [19]; Song Kai, based on the Entropy-TOPSIS model and the Gradient Boosting Decision Tree (GBDT) algorithm, evaluated university patents with transfer/licensing value and those at risk of invalidation [20]. (3) Machine learning methods.
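Several of the comprehensive evaluation methods cited above (the Entropy Method, TOPSIS, and the Entropy-TOPSIS combination used by Song Kai [20]) share a simple core. The sketch below is a generic textbook version in Python, not the exact formulation of any cited study. It assumes strictly positive "benefit" indicators (larger is better) and at least two patents that are not all identical; the function names are hypothetical.

```python
import math

def entropy_weights(matrix):
    """Entropy weight method: indicators that vary more across patents
    (lower entropy) get larger weights. Rows = patents, columns = indicators."""
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)  # scales entropy into [0, 1]; requires m >= 2
    divergence = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [x / total for x in column]
        # 0 * ln(0) is treated as 0, so zero shares are skipped
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        divergence.append(1.0 - entropy)
    total_div = sum(divergence)
    return [d / total_div for d in divergence]

def topsis_closeness(matrix, weights):
    """TOPSIS for benefit indicators: relative closeness to the ideal, in [0, 1]."""
    n = len(matrix[0])
    # vector-normalize each indicator column, then apply the weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    ideal = [max(row[j] for row in weighted) for j in range(n)]
    anti_ideal = [min(row[j] for row in weighted) for j in range(n)]
    closeness = []
    for row in weighted:
        d_plus = math.dist(row, ideal)        # distance to the best point
        d_minus = math.dist(row, anti_ideal)  # distance to the worst point
        closeness.append(d_minus / (d_plus + d_minus))
    return closeness

example = [[1, 2], [3, 4], [5, 6]]  # three hypothetical patents, two indicators
w = entropy_weights(example)
print(w, topsis_closeness(example, w))
```

In this toy matrix the third patent is best on every indicator, so its closeness is 1 and the first patent's is 0; the first indicator is more dispersed across patents and therefore receives the larger entropy weight.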
As machine learning theory and technology have matured, they have been used more widely in patent value evaluation. Kim and Geum used the number of citations as a proxy variable for patent value, selected indicators such as the number of similar patents and the number of historical citations of rights holders, and then used Random Forest, Logistic Regression and other methods to construct a patent value evaluation model [21]; Trappey et al. were the first to use a Back Propagation Neural Network (BPNN) to evaluate patent value [22]; Lee and others used the AdaBoost algorithm to identify patent value in the Korean artificial intelligence field [23]; Chung P and Sohn S. Y. used deep learning to evaluate patent levels [24]; Li Jianlin and Li Lanqi treated patent value evaluation as a classification problem and used eight machine learning algorithms to recognize transformable university patents [25]; based on Bayesian theory and a combination weighting method, Han Meng and other scholars proposed a method for identifying transferable patents in universities [12].
To sum up, most existing studies build patent value evaluation indicators from patents in industrial fields or enterprises, and only a few study the value of university patents. At the same time, indicator selection mostly considers the technical, legal and market dimensions, which is relatively general and ignores the possible influence of the entity holding the patent right. Moreover, in terms of evaluation methods, economic methods and comprehensive evaluation methods are highly subjective, and their efficiency drops when the data volume is large. Machine learning methods, by contrast, offer a high degree of objectivity, operational efficiency and accuracy. Therefore, we take university patents as our research object and comprehensively consider the characteristics of universities to build an evaluation index system. We then adopt various machine learning algorithms and select the best as the final evaluation model, with a view to providing a scientific and efficient method for universities to evaluate the value of their patents, and a decision reference for enterprises seeking high-value patents from universities.

Evaluation Index System of University Patent Value
In 2021, the CNIPA clearly defined a valid invention patent as high-value if it meets any one of the following five conditions: (1) it belongs to a strategic emerging industry; (2) it has patent rights of the same family overseas; (3) its maintenance period exceeds 10 years; (4) it has obtained a relatively high amount of pledge financing; or (5) it has received the National Science and Technology Award or the China Patent Award 7 .
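The five conditions form a simple disjunctive rule: a valid invention patent needs to satisfy only one of them. The Python sketch below makes that logic explicit. The record fields and the `pledge_threshold` parameter are hypothetical; the definition as quoted gives no numeric threshold for "a relatively high amount of pledge financing", so the caller must supply one.

```python
from dataclasses import dataclass

@dataclass
class PatentRecord:
    # Hypothetical fields; the CNIPA definition does not prescribe a data schema.
    is_valid_invention: bool
    in_strategic_emerging_industry: bool
    has_overseas_family: bool
    maintenance_years: float
    pledge_amount: float
    has_national_award: bool  # National S&T Award or China Patent Award

def is_high_value(p: PatentRecord, pledge_threshold: float) -> bool:
    """True if a valid invention patent meets ANY of the five conditions.
    `pledge_threshold` is an assumed cut-off for condition (4)."""
    if not p.is_valid_invention:
        return False
    return (
        p.in_strategic_emerging_industry    # (1)
        or p.has_overseas_family            # (2)
        or p.maintenance_years > 10         # (3) "exceeds 10 years"
        or p.pledge_amount >= pledge_threshold  # (4)
        or p.has_national_award             # (5)
    )
```

For example, a valid patent maintained for 11 years qualifies regardless of the other fields, while one maintained for exactly 10 years with no other qualifying attribute does not.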
As the main force of national scientific and technological innovation, universities have a high level of patent creation, and their patents are usually technologically advanced. Compared with their patent output, however, their level of patent industrialization is relatively low. Thus, following the principles of systematicness, scientific rigor and operability, we comprehensively consider the characteristics of university patents and the definition of high-value patents, and select indicators of the factors influencing patent value from four aspects: technical characteristics, legal characteristics, characteristics of the university holding the patent right, and industrialization characteristics.

Technical Characteristics
Technical characteristics measure patent value from various perspectives, such as the R&D and application of the patented technology, its innovativeness and its importance. Based on previous studies, we select 9 indicators to reflect the technical characteristics of university patents. Among them, the numbers of patent applicants, inventors and IPC classifications reflect the status of technical development and application, while the numbers of citations, times cited, family citations, family times cited, cited scientific and technical literature references, and simple family members reflect the innovativeness and importance of the patented technology. According to the definition of "high-value patents", the number of overseas family members is also selected as a technical characteristic indicator.
7 http://www.gov.cn/zhengce/content/2021-10/28/content_5647274.htm

Legal Characteristics
The value of a patent is closely related not only to its technical characteristics but also to its legal characteristics. The legal characteristics of a patent are reflected in its legal status, and the quality of the patent application documents also affects its legal value. Based on existing research, we select 5 indicators to characterize the legal characteristics of university patents: patent maintenance years, legal status, the number of claims, the number of words in the first claim, and the number of document pages.

University Characteristics
Universities have their own distinctive characteristics as patent holders, so we add two indicators representing university characteristics: university type and discipline type. University type takes the values 1, 2, 3 and 4, for junior colleges, undergraduate colleges, universities with first-class disciplines and first-class universities, respectively; discipline type takes the values 1 and 0, for first-class and non-first-class disciplines, respectively.
In addition, some existing studies have examined the relationship between cooperative application and patent quality [26,27]. We believe that multiple applicants have more information about knowledge application scenarios than a single applicant [28], so collaboration with other organizations (including enterprises, scientific research institutes, and state organs) can help academics improve patent value. Therefore, we also choose the cooperation type of universities as an indicator of university characteristics, taking the values 1, 2, 3 and 4: 1 represents patents applied for by universities alone; 2 represents universities applying jointly with enterprises, or with research institutes and organizations; 3 represents universities applying jointly with research institutes or state organs; and 4 represents universities and enterprises applying jointly with research institutes or state organs.
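The three university-characteristic indicators are coded as small integers. The Python sketch below records that coding and, because the modelling step one-hot encodes categorical variables, also shows how a code can be expanded into 0/1 columns. The dictionary and function names are illustrative, and the text does not say whether these particular codes enter the models as ordinal values or as one-hot categories, so both forms are shown.

```python
# Integer codes as defined in the text; the names of these dicts are illustrative.
UNIVERSITY_TYPE = {
    "junior college": 1,
    "undergraduate college": 2,
    "university with first-class disciplines": 3,
    "first-class university": 4,
}
DISCIPLINE_TYPE = {"non-first-class discipline": 0, "first-class discipline": 1}
COOPERATION_TYPE = {
    "university alone": 1,
    "with enterprises, or with research institutes and organizations": 2,
    "with research institutes or state organs": 3,
    "university and enterprises with research institutes or state organs": 4,
}

def encode(university_type, discipline_type, cooperation_type):
    """Return the three integer codes used as university-characteristic indicators."""
    return (UNIVERSITY_TYPE[university_type],
            DISCIPLINE_TYPE[discipline_type],
            COOPERATION_TYPE[cooperation_type])

def one_hot(code, coding):
    """Expand an integer code into 0/1 columns, one column per code in `coding`."""
    return [int(code == c) for c in sorted(coding.values())]

codes = encode("first-class university", "first-class discipline", "university alone")
print(codes, one_hot(codes[2], COOPERATION_TYPE))
```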

Industrialization Characteristics
University patents usually arise from research projects involving advanced technologies, but they often face more obstacles in practical operation. Practice shows that patents that have been transferred or transformed are usually more practically feasible and better aligned with market demand, and can thus generate more revenue. Considering the accessibility and scientific soundness of the indicators, we use industrialization characteristics to indicate the potential for operating a patent, taking the numbers of transfers, licenses and pledges of a patent as industrialization indicators. In addition, according to the definition of "high-value patents", the number of emerging industries is also selected as an industrialization indicator.
In summary, we select 22 indicators from 4 aspects to build our university patent value evaluation system. The meaning of each indicator is shown in Table 1.
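As a compact check on the structure of the index system, the indicators can be listed by dimension and flattened into model input columns. The snake_case names below are hypothetical identifiers chosen for illustration; only the grouping and the counts (10 technical, 5 legal, 3 university, 4 industrialization) follow the text.

```python
# Hypothetical column names; grouping and counts follow the index system above.
INDICATOR_SYSTEM = {
    "technical": [
        "applicants", "inventors", "ipc_classifications", "citations",
        "times_cited", "family_citations", "family_times_cited",
        "scientific_literature_cited", "simple_family_members",
        "overseas_family_members",
    ],
    "legal": [
        "maintenance_years", "legal_status", "claims",
        "first_claim_words", "document_pages",
    ],
    "university": ["university_type", "discipline_type", "cooperation_type"],
    "industrialization": ["transfers", "licenses", "pledges", "emerging_industries"],
}

def feature_columns(system):
    """Flatten the index system into an ordered list of model input columns."""
    return [name for names in system.values() for name in names]

columns = feature_columns(INDICATOR_SYSTEM)
print(len(columns), {dim: len(names) for dim, names in INDICATOR_SYSTEM.items()})
```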

Data
We collected the bibliographic data of invention patents applied for by 134 universities in Sichuan Province between 2012 and 2021 from Incopat (an important business partner of Clarivate in China; https://www.incopat.com/) on April 11, 2022. After excluding data that did not meet the research objectives, a total of 43,816 valid records were obtained. Data for the 22 evaluation indicators were obtained from the original database records and through manual indexing. Descriptive statistics for each indicator are shown in Table 2.

Table 1. Meaning of each indicator
Dimension | Indicator | Code | Meaning
Technical Characteristics | applicants | T1 | Reflects the degree of cooperation; the more applicants, the higher the patent value.
Technical Characteristics | inventors | T2 | Reflects the size of the research team; the larger the team, the higher the technical complexity and the higher the value.
Technical Characteristics | IPC classification | T3 | Reflects the technical fields involved in the patent; the more classifications, the more comprehensive the technology.
Technical Characteristics | citations | T4 | Reflects the technical foundation; the more citations, the more solid the foundation and the higher the value.
Technical Characteristics | cited | T5 | Reflects the influence of the patented technology; the more times it is cited, the greater the influence.
Technical Characteristics | family citations | T6 | Reflects the technical foundation; the more citations, the higher the value.
Technical Characteristics | family cited | T7 | Reflects the influence of the patent in different countries; the more it is cited, the greater the influence.
Technical Characteristics | scientific literature | T8 | Reflects the relevance between the patent and the scientific frontier; the more scientific literature is cited, the closer the patent is to the frontier.
Technical Characteristics | simple family | T9 | Reflects the layout of the patent; the larger the number, the more comprehensive the layout and the higher the value.
Technical Characteristics | overseas family | T10 | Reflects the international layout of the patent; the larger the number, the wider the geographical coverage.
Legal Characteristics | maintenance years | L1 | Reflects how long the patent has been maintained; the longer the maintenance, the higher the maintenance cost and the higher the value.
Legal Characteristics | legal status | L2 | Categorical variable reflecting the current legal status of the patent.
Legal Characteristics | claims | L3 | Reflects the scope of the patent's technological innovation; the broader the scope, the more valuable the patent.
Legal Characteristics | first claim | L4 | Reflects how detailed the description of the technical protection is; the more detailed, the more valuable the patent.
Legal Characteristics | pages | L5 | Reflects the level of detail of the patent document; the more detailed the content, the more valuable the patent.

Model Building
We chose 6 machine learning algorithms: (1) Logistic Regression (LR): LR is a generalized linear model; although its name contains the word "regression", it is a classification model that runs fast and is easy to implement. (2) Naive Bayes (NB): NB is a classification method based on Bayes' theorem and the conditional independence assumption; it is less sensitive to missing data and has relatively stable classification efficiency. (3) K-Nearest Neighbor (KNN): KNN is a classical supervised learning algorithm with a mature theoretical basis; it is relatively tolerant of outliers and noise and can achieve high accuracy. (4) Random Forest (RF): RF combines multiple decision trees through the bagging idea of ensemble learning; it can directly handle high-dimensional features and data with missing feature values and is not prone to overfitting. (5) Gradient Boosting Decision Tree (GBDT): GBDT is a very practical boosting algorithm in ensemble learning. It uses the negative gradient of the loss function as an approximation of the residual and iteratively fits regression trees to these residuals, so that each new weak learner corrects the errors of the current ensemble; its prediction accuracy is high. (6) eXtreme Gradient Boosting (XGB): XGB is an improved boosting algorithm based on GBDT; compared with GBDT, it has stronger generalization ability, supports parallel computation and achieves higher accuracy.

We then trained each evaluation model separately in Python. First, the categorical variables were one-hot encoded, and the data were normalized for the KNN and GBDT models. Second, the data were divided into a training set and a testing set at a ratio of 8:2 for model training. Finally, the parameters of each model were optimized, and the final evaluation model was constructed with the optimal parameters. The results are as follows.
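The training procedure just described (one-hot encoding, normalization for KNN and GBDT only, an 8:2 train/test split, then parameter tuning) could be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the authors' code: the data are synthetic placeholders from `make_classification`, "normalization" is taken to mean min-max scaling, the tuning grid covers only KNN's `n_neighbors`, and macro-averaging of the per-grade metrics is one possible choice. XGB is omitted so that the sketch needs only scikit-learn and NumPy.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

N = 600
NUM_COLS, CAT_COLS = list(range(8)), [8, 9]

# Synthetic placeholder data (NOT the paper's data): 8 numeric indicators,
# 2 coded categorical indicators with values 1-4, and value grades 1-10.
X_num, y = make_classification(n_samples=N, n_features=8, n_informative=6,
                               n_classes=10, n_clusters_per_class=1,
                               random_state=0)
X_cat = np.random.default_rng(0).integers(1, 5, size=(N, 2))
X = np.hstack([X_num, X_cat])
y = y + 1  # shift labels 0-9 to grades 1-10

def preprocess(normalize):
    """One-hot encode categorical columns; min-max scale numeric ones if asked."""
    numeric = MinMaxScaler() if normalize else "passthrough"
    return ColumnTransformer(
        [("num", numeric, NUM_COLS),
         ("cat", OneHotEncoder(handle_unknown="ignore"), CAT_COLS)],
        sparse_threshold=0,  # always dense output: GaussianNB rejects sparse input
    )

# name -> (estimator, normalize?); normalization only for KNN and GBDT, as in
# the text. An XGB model could be added via the separate xgboost package
# (xgboost.XGBClassifier), which may require labels 0-9, i.e. y - 1.
MODELS = {
    "LR": (LogisticRegression(max_iter=2000), False),
    "NB": (GaussianNB(), False),
    "KNN": (KNeighborsClassifier(), True),
    "RF": (RandomForestClassifier(random_state=0), False),
    "GBDT": (GradientBoostingClassifier(random_state=0), True),
}

# 8:2 split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

scores = {}
for name, (estimator, normalize) in MODELS.items():
    model = Pipeline([("prep", preprocess(normalize)), ("clf", estimator)])
    if name == "KNN":  # illustrative tuning grid only
        model = GridSearchCV(model, {"clf__n_neighbors": [3, 5, 9]}, cv=3)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, average="macro", zero_division=0),
        "recall": recall_score(y_test, pred, average="macro", zero_division=0),
        "f1": f1_score(y_test, pred, average="macro", zero_division=0),
    }

for name, s in scores.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

Wrapping the preprocessing inside each `Pipeline` means the scaler and encoder are fitted on the training folds only, which keeps the tuning step from leaking information from the held-out data.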
Patent value is divided into 10 grades, from 1 to 10. Figure 1 compares the evaluated values (blue line) with the true values (red line) for each model; for ease of visual inspection, only the first 1,000 records are shown for each model. The evaluation results of the XGB model are almost identical to the true values, those of the GBDT and RF models are largely consistent with them, and those of the other three models deviate considerably.
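The core GBDT step described in the model list, fitting each new tree to the negative gradient of the loss, is easiest to see with squared-error loss, where the negative gradient is exactly the residual y - F(x). The toy Python sketch below uses one-dimensional inputs and single-split regression stumps; it only illustrates the mechanism and is not the multi-class GBDT or XGB configuration used for the 10 value grades. The function names and parameters are hypothetical, and at least two distinct input values are assumed.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump (one split) on 1-D inputs."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    f0 = sum(ys) / len(ys)  # initial constant model
    stumps = []

    def predict(x):
        return f0 + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        # negative gradient of 0.5 * (y - F)^2 with respect to F is y - F
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

xs, ys = [1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5]
model = gradient_boost(xs, ys)
print([round(model(x), 3) for x in xs])
```

In this toy example each round removes 10% of the remaining residual, so predictions approach the targets geometrically; shrinking each step by a learning rate is the usual way boosting trades more rounds for less overfitting.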

Model Performance Test
To further test the performance of each evaluation model and select the best one, we used 4 metrics, namely accuracy, recall, F1 score and precision, to evaluate the classification performance of the models. The calculations are given in formulas (1)-(4).
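For reference, the standard definitions of these metrics are written below for a single grade treated as the positive class, where TP, FP, TN and FN are the numbers of true positives, false positives, true negatives and false negatives. How the per-grade values are combined across the 10 grades is not stated here; the macro average in the last line (with K = 10 grades) is one common choice and is an assumption.

```latex
\begin{align*}
\mathrm{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\mathrm{Precision} &= \frac{TP}{TP + FP} \\
\mathrm{Recall}    &= \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \\
\mathrm{Precision}_{\text{macro}} &= \frac{1}{K} \sum_{k=1}^{K} \mathrm{Precision}_k, \qquad K = 10
\end{align*}
```

Recall and F1 can be macro-averaged in the same way. For the multi-class problem as a whole, accuracy reduces to the share of patents whose predicted grade equals the true grade.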

Importance of Patent Value Evaluation Indicator
To better judge the role of each indicator in the evaluation process, we calculated the importance of each indicator in the XGB and GBDT models and took the mean of the two. The result is shown in Figure 2.
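The averaging step can be sketched in Python as below. Two details are assumptions not stated in the paper: that each model's importances are normalized to sum to 1 before averaging (scikit-learn's `GradientBoostingClassifier.feature_importances_` is normalized this way), and that the importances of one-hot columns created from a categorical indicator such as legal status are summed back to that indicator. The function names and all numbers in the example are illustrative, not the paper's results.

```python
def to_indicator_level(column_importance, column_to_indicator):
    """Sum column-level importances (e.g. several one-hot columns) per indicator.
    Columns missing from the mapping are treated as indicators themselves."""
    totals = {}
    for column, value in column_importance.items():
        indicator = column_to_indicator.get(column, column)
        totals[indicator] = totals.get(indicator, 0.0) + value
    return totals

def mean_importance(model_a, model_b):
    """Average two models' indicator importances and rank them, highest first."""
    indicators = set(model_a) | set(model_b)
    mean = {k: (model_a.get(k, 0.0) + model_b.get(k, 0.0)) / 2 for k in indicators}
    return sorted(mean.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative numbers only; each model's importances sum to 1.
one_hot_map = {"legal_status=valid": "legal_status",
               "legal_status=invalid": "legal_status"}
gbdt = to_indicator_level({"ipc": 0.5, "legal_status=valid": 0.2,
                           "legal_status=invalid": 0.1, "claims": 0.2}, one_hot_map)
xgb = to_indicator_level({"ipc": 0.3, "legal_status=valid": 0.3,
                          "legal_status=invalid": 0.1, "claims": 0.3}, one_hot_map)
print(mean_importance(gbdt, xgb))
```

Summing gives an indicator-level score comparable with the other indicators, while the column-level values still show which category (for example, one cooperation type) contributes most.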
As Figure 2 shows, the number of IPC classifications is the most important indicator in evaluating university patent value, with an importance of 0.19, followed by the number of family citations, the number of independent claims and the legal status of the patent. By contrast, the three indicators reflecting university characteristics are relatively unimportant; among them, cooperation type has the greatest impact on university patent value, especially for patents filed jointly with enterprises or with research institutes and organizations.

Summary
This study addresses the value evaluation of university patents. Combining the characteristics of universities and the definition of high-value patents, we selected 22 evaluation indicators from 4 dimensions: technology, law, university, and industrialization. Using the invention patents applied for by 134 universities in Sichuan Province over the past ten years as sample data, we built patent value evaluation models with 6 machine learning algorithms and selected the optimal one, so as to evaluate the value of university patents more scientifically and efficiently. This is conducive to improving the patent quality of universities and promoting the transfer and transformation of their scientific and technological achievements.
The results show that: (1) the accuracy and F1 score of the evaluation models based on the XGB and GBDT algorithms exceeded 95%, so these models can evaluate university patent value more accurately; (2) in terms of indicator importance, the number of IPC classifications, the number of family citations, the number of independent claims and the legal status of the patent show strong explanatory power for patent value, while the three indicators reflecting university characteristics are of low importance; (3) among these three indicators, cooperation type has the greatest impact on university patent value, especially for patents filed jointly with enterprises or with research institutes and organizations.
Based on our research, we offer some suggestions on patent production and cooperation management for universities. First, since universities already have the advantages of interdisciplinary integration and talent reserves, it is both feasible and necessary for university researchers to produce multidisciplinary patents, continuously improving their capacity for integrated innovation. Second, applying for patents jointly with external organizations, such as enterprises or research institutes, may help improve the quality of university patents. This research may also offer patent examination offices some points of reference when examining university patents, such as paying more attention to the number of IPC classifications and to patents filed jointly by universities and external organizations.
Finally, this study has some limitations. For example, it only uses patents from universities in Sichuan Province as sample data, so the results may be affected by regional and other factors; follow-up research could consider universities nationwide. In addition, the university characteristic indicators carry relatively low weight, so future research could refine their classification criteria and scoring proportions or select other university indicators.