Development of a Predictive System Model Using Big Data in the Transport Sector

This article discusses the possibility of creating a predictive system model that uses smart, innovative technologies and big data analysis to generate requests for spare parts in the automotive industry, a task relevant to many transport, logistics and repair companies. The authors seek to generalize and systematize research in this area, with the possibility of integrating practical knowledge and big data to solve urgent management and logistics tasks that support economic growth. In addition, the relevance of timely diagnostics and rapid elimination of vehicle defects is explored in light of current market conditions. Using data from the domestic market, a hypothesis is proposed about the specific parts and assemblies that may fail for a given car model. The top-selling models on the Russian second-hand market, comprising 9 brands and 27 models from foreign manufacturers, were selected. An algorithm for analyzing the selected data set was developed and implemented in the Python programming language in Jupyter Notebook. Following a comprehensive analysis of vehicle defects, stacked diagrams were created to show the structure of defects for the 27 most popular models, and the significance of defects was determined according to the country of origin. A model for predicting the demand for used-car spare parts has been developed based on the analysis of the structure of defects. The model combines data from recall campaigns, feedback from car owners, maintenance services and component suppliers. The article also establishes directions for future research in this area.


Introduction
car is on average 15-20 years (Mitrokhin, Pavlov, 2015). According to Blagoveshchensky, Kozlovsky, and Vasin (2021), the majority of defects related to manufacturing processes or design features show up during vehicle operation. Because of the current geopolitical situation, challenges arose in the first half of 2022 when some automakers left the market (Boiko, Lyapuntsova, 2022). According to Russian experts, this is a unique opportunity to develop the domestic auto industry. However, increasing domestic production capacity with the required model range will not be possible until 2030 (Izvekova, 2022). According to a study by Autostat, the cost of the 10 most popular cars in Russia has increased by an average of 28% since January 2022 (Loboda, 2022). The inflation rate in 2022 was the highest since 2015 and amounted to 11.94% (Fedotova, 2023). At the same time, sales of passenger cars decreased significantly: in May 2022, 86% fewer cars were sold than in May 2021 (Chuprov, 2022).
These factors indicate that car consumption will increase in the coming years. As auto parts become scarcer and more expensive, there is a growing need for preventive measures aimed at the early detection and prevention of potential car breakdowns.
This work is devoted to determining the elements of cars that most often fail for a particular brand and model. This will allow us to form a data set for further research in the development of a recommender system, which is planned to be universal not only for the automotive industry, but also for mechanical engineering in general.
Hypothesis: each model of a specific car brand has its own characteristic causes of failure.

Materials and Methods
In international practice, timely defect detection and vehicle quality improvement are supported by legislation such as the U.S. National Traffic and Motor Vehicle Safety Act (Bae, Benitez-Silva, 2011). Under this law, the U.S. Department of Transportation has the right to require automakers to conduct vehicle recall campaigns if a defect is discovered. In total, more than 390 million cars, trucks, buses, recreational vehicles, motorcycles and mopeds, 46 million tires, 66 million pieces of automotive equipment and 42 million child seats have been recalled since 1967. Because data on recall campaigns are open, many scientists have used them to study safety on public roads. Nichols and Fournier studied the impact of recall campaigns on manufacturers' reputations (Nichols, Fournier, 1999). Rupp and Taylor investigated who initiates recalls (Rupp, Taylor, 2002). They found that the government is more likely to initiate recalls related to defects that affect a large number of cars, whereas automakers are more likely to conduct recall campaigns that eliminate inexpensive defects. Bae and Benitez-Silva found that recalling a dangerous model could reduce the number of road accidents by 20% (Bae, Benitez-Silva, 2011).
These studies are of interest for the further formation of the concept and recommendations in the predictive system being developed, which is planned to be a universal tool.
The purpose of this study is to develop a predictive system model for generating a request for a stock of automotive parts at service points.
To achieve this goal, it is necessary to solve the following tasks:
• choice of data set;
• development of data analysis algorithm;
• determination of the structure of defects in vehicle models;
• development of a predictive system model.
The study uses a dataset prepared by the National Highway Traffic Safety Administration (NHTSA) and hosted on the Kaggle data science competition platform (NHTSA, 2017). In addition, information from the auto.ru website is used, as this resource contains classified ads for the sale of used cars.
Because the data set contains about 124 thousand entries, the Python programming language and the Jupyter Notebook environment are used as processing and analysis tools. The versatility of this programming language allows further development of no-code and low-code systems, which is very important for specialists who do not have programming skills. This concept is currently actively used both in business and in production.
The following scientific methods were used in the work: filtering, grouping, counting results and visualization.
To conduct the study, a special algorithm was developed (Figure 1). At the first stage of the analysis, the manufacturer is determined; at the second stage, the model is determined; and at the third stage, the data are combined and transposed.
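The three stages can be illustrated with a minimal sketch that uses only the Python standard library. The record layout, the component names and the function `defect_structure` are assumptions made for illustration; they are not taken from the NHTSA file or from the authors' notebook.

```python
from collections import Counter, defaultdict

# Hypothetical recall records: (manufacturer, model, defective component).
# Layout and values are illustrative, not taken from the NHTSA data set.
RECALLS = [
    ("BMW", "X5", "AIR BAGS"),
    ("BMW", "X5", "AIR BAGS"),
    ("BMW", "X5", "ENGINE"),
    ("BMW", "5 SERIES", "ENGINE"),
    ("BMW", "5 SERIES", "PARKING BRAKE"),
    ("FORD", "FOCUS", "LATCHES/LOCKS"),
]


def defect_structure(records, manufacturer, models):
    """Stage 1: filter by manufacturer; stage 2: filter by model and count
    components; stage 3: combine the counts and transpose the table."""
    # Stage 1: keep only the records of the chosen manufacturer.
    by_make = [r for r in records if r[0] == manufacturer]
    # Stage 2: count defective components separately for each model.
    per_model = {
        m: Counter(comp for _, model, comp in by_make if model == m)
        for m in models
    }
    # Stage 3: transpose model -> component into component -> model.
    transposed = defaultdict(dict)
    for model, counts in per_model.items():
        for component, n in counts.items():
            transposed[component][model] = n
    return dict(transposed)


table = defect_structure(RECALLS, "BMW", ["X5", "5 SERIES"])
for component, row in table.items():
    print(component, row)
```

In a notebook, the same steps would more likely be written with a data-frame library; the plain-Python form is used here only to keep the sketch self-contained.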
Because the data set was compiled by a foreign agency, it must be adapted to the domestic automotive market. Data from the popular Russian website Auto.ru were used for this purpose. It was assumed that the number of ads posted on the site reflects the popularity of a brand and model. As a result, the nine most popular car brands in the Russian secondary market were identified (see Table 1). The values in the Auto.ru column are sorted in descending order.
It should be noted that the following automakers were not included in the set: Lada, a domestic automobile brand, and Skoda, a Czech automobile brand, because these brands are not currently of interest on the international market.
Moreover, the table reflects national consumer preferences: Ford is the most heavily represented manufacturer in the data set.
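The brand selection described above can be sketched as follows. The ad counts are invented placeholders (the real values are those of Table 1), and `top_brands` is a hypothetical helper name.

```python
# Hypothetical ad counts per brand; placeholders, not the Table 1 values.
AD_COUNTS = {"Lada": 900, "Toyota": 700, "Ford": 400, "Skoda": 350, "BMW": 500}
# Brands left out of the set, as stated in the text.
EXCLUDED = {"Lada", "Skoda"}


def top_brands(ad_counts, excluded, n):
    """Return the n brands with the most ads, skipping excluded brands."""
    eligible = {b: c for b, c in ad_counts.items() if b not in excluded}
    # Sort brand names by their ad count in descending order.
    return sorted(eligible, key=eligible.get, reverse=True)[:n]


ranking = top_brands(AD_COUNTS, EXCLUDED, n=2)
print(ranking)
```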

Results
As a result of the study, 27 car models were analyzed and the structure of defects in each model was determined. Stacked diagrams were constructed for clarity; some of the results are presented in this paper in the form of such diagrams.
Figure 2 shows the result of the analysis of BMW models. In the domestic market, BMW is popular for its business class cars (5 series), middle class cars (3 series) and a mid-size crossover (X5). These models have several key components that account for more than 50% of recall campaigns: the engine, airbags and parking brake. In the business class, a greater number of defects are associated with the engine and fewer with airbags, while in the middle class and the crossover, the largest number of defects is associated with airbags. It can be assumed that airbags in the business class are more reliable than in the middle class. In addition, two elements appear in 2 out of 3 models: the speed sensor and external lighting.
Figure 3 shows the result of the analysis of Volkswagen models. According to the results obtained, the compact (Tiguan) and mid-size (Touareg) crossovers, as well as the mid-size Passat, are the most popular Volkswagen models in the Russian market. Although the Tiguan and Touareg are both crossovers, the elements that fail most often differ between these models. Let us consider each model in more detail.
Owners of the compact crossover (Tiguan) are most likely to encounter problems with airbags (33%), the fuel supply system (25%) and the electronic system (12%). The Passat has the following defective elements: airbags (16%), fuel supply system (28%) and engine (13%). In the mid-size Touareg crossover, problems can be found in the parking brake (26%), electric propulsion system (22%), seat belts (13%) and body structure (13%). At the same time, airbag defects are not among the components that account for 80% of all defects in this model.
It should be noted that the Volkswagen Polo, which is popular in Russia, is not of interest on the international market, so it was excluded from the analysis.
Figure 4 shows the result of the analysis of Ford models. Compact (Focus) and mid-size (Mondeo) cars, as well as a full-size crossover (Explorer), are popular on the Russian market. It should be noted that the Ford Mondeo is called the Ford Contour in the US market.
As a result of the information analysis, it is possible to conclude that each model has its own set of components that can lead to recall campaigns. For the Focus, recall campaigns most often concern locks (27%), electronics (17%) and bodywork (12%); for the Mondeo, the parking brake (18%) and the ignition system (13%); for the Explorer, the speed sensor (23%), suspension (19%) and seats (11%).
The analysis of other models is presented in Table 2.
Among the considered automakers, four countries of origin can be distinguished. Let us analyze how widespread certain defects are in these countries. Table 3 presents the calculation of the significance of a particular defect for a particular country. The 10 most common defects were selected: (1) airbags, (2) speed sensor, (3) electronic components, (4) engine, (5) seat belts, (6) parking brake, (7) bodywork, (8) exterior lighting, (9) locks, (10) tires.
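The percentage structure of defects, and the set of components that together cover a given share of all defects, can be computed as in the sketch below. The counts are invented for illustration and do not reproduce the figures reported for any model; `defect_shares` and `components_covering` are hypothetical helper names.

```python
def defect_shares(counts):
    """Convert raw recall counts per component into percentage shares,
    sorted from the most to the least frequent component."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, 100 * n / total) for name, n in ranked]


def components_covering(shares, threshold):
    """Return the most frequent components whose cumulative share
    first reaches the threshold (in percent)."""
    selected, cumulative = [], 0.0
    for name, share in shares:
        if cumulative >= threshold:
            break
        selected.append(name)
        cumulative += share
    return selected


# Hypothetical recall counts for one model (not taken from the data set).
counts = {
    "parking brake": 30,
    "electrical system": 25,
    "seat belts": 15,
    "body structure": 12,
    "engine": 10,
    "air bags": 8,
}
shares = defect_shares(counts)
core = components_covering(shares, threshold=80)
print(core)
```

With these invented counts, the first four components reach the 80% threshold, so air bags fall outside the core set; this is the kind of statement made above about the Touareg.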
The scores were computed as follows. Because each manufacturer is represented by three models, a specific defect may occur in any of them, so the maximum number of points a single manufacturer can receive for a specific defect is three. A defect may also occur in none of the models, or in only one or two of them. The "sum" column is the total number of points awarded to the manufacturers of a country for each defect. The "points" column is the ratio of this sum to the number of automobile manufacturers in the country; this normalization is needed because the number of automakers varies across countries. Thus, we can assume that airbags are the most likely component to fail in Japanese-made cars, the parking brake and external lighting in South Korean-made cars, airbags in German-made cars, and the speed sensor in American-made cars.
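The scoring rule can be written down as a short sketch. The manufacturers, countries and defect sets below are invented placeholders, and `country_scores` is a hypothetical function name; only the rule itself (one point per model with the defect, sum per country, division by the number of manufacturers) follows the description above.

```python
# Hypothetical input: manufacturer -> (country, defects found in each of
# its three models). All names and values are illustrative placeholders.
MANUFACTURERS = {
    "Maker A": ("Japan", [{"airbags", "engine"}, {"airbags"}, {"airbags", "tires"}]),
    "Maker B": ("Japan", [{"airbags"}, {"locks"}, {"engine"}]),
    "Maker C": ("Germany", [{"airbags"}, {"parking brake"}, {"engine"}]),
}


def country_scores(manufacturers, defect):
    """Return {country: (sum, points)} for one defect.

    A manufacturer earns one point per model in which the defect occurs
    (0 to 3). "sum" adds these points over a country's manufacturers;
    "points" divides the sum by the number of manufacturers there.
    """
    sums, makers = {}, {}
    for country, models in manufacturers.values():
        earned = sum(defect in model_defects for model_defects in models)
        sums[country] = sums.get(country, 0) + earned
        makers[country] = makers.get(country, 0) + 1
    return {c: (sums[c], sums[c] / makers[c]) for c in sums}


scores = country_scores(MANUFACTURERS, "airbags")
print(scores)
```

In this toy input, Japan has a sum of 4 over two manufacturers (2.0 points) and Germany a sum of 1 over one manufacturer (1.0 point), which shows why the sum alone would favor countries with more manufacturers.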

Discussion
The data obtained can be used to predict the need for spare parts at repair and maintenance points. Figure 5 depicts the predictive system model. The model comprises three major blocks: data collection, data analysis, and decision making based on the analyzed data. It is proposed that data be gathered from two sources: information about recall campaigns and consumer reviews. Unstructured data enters the analytical center, where it is pre-processed and the best predictive model is chosen and built. The information provided by the analytical center can then be used to plan and refine the maintenance schedule. For example, if a car owner arrives at a service center with problem "A", but statistical evidence shows that the specific model frequently has problems of type "B" at a given mileage, it is prudent to plan not only the elimination of problem "A" but also the diagnosis of a type "B" problem. Thus, with the owner's permission, it is possible to predict scheduled car repairs and order the necessary components in advance.
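The decision-making step in the example with problems "A" and "B" can be sketched as a simple lookup. The statistics table, the mileage ranges and the function `plan_visit` are assumptions made for illustration; the article does not specify how the analytical center stores or exposes its results.

```python
# Hypothetical output of the analytical center:
# (model, defect) -> mileage range in km where the defect is frequent.
FREQUENT_DEFECTS = {
    ("Model X", "parking brake"): (60_000, 100_000),
    ("Model X", "airbags"): (0, 40_000),
    ("Model Y", "locks"): (30_000, 80_000),
}


def plan_visit(model, mileage_km, reported_problem):
    """Return the reported problem plus the defects worth diagnosing
    for this model at the given mileage."""
    extra = sorted(
        defect
        for (m, defect), (low, high) in FREQUENT_DEFECTS.items()
        if m == model
        and low <= mileage_km <= high
        and defect != reported_problem
    )
    return {"repair": reported_problem, "diagnose": extra}


plan = plan_visit("Model X", 75_000, "engine")
print(plan)
```

The "diagnose" list is what a service point could use, with the owner's permission, to order components ahead of the scheduled repair.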

Conclusion
As a result of the work carried out, the goal of the study was achieved: a model of a predictive system was developed for the formation of a stock of automotive parts at service points. This will make it possible to develop more versatile software in the future that can analyze data and make recommendations to large businesses, logistics companies, and repair organizations.