Clustering Analysis of Traffic Accident in Semarang City

Traffic accidents are a global issue that requires serious handling. Accidents occur in different places and under different circumstances, which makes it difficult to determine which areas have a high rate of traffic accidents. Information about accident-prone areas is needed by the community and by law enforcement; the police in particular can take it into consideration for supervision and preventive action. In this study, clusters were formed to analyze the accident-prone areas in the city of Semarang. The method used is cluster analysis, in which the grouping determines the vulnerability of an area. The results indicate that traffic accident vulnerability occurs mostly in Semarang - Semarang regency passing through Semarang regency. In addition, vulnerability in the city of Semarang occurs on weekdays. From the validation results, the hazardous-area model that was formed has the following characteristics: accidents are more likely on weekdays (Monday, Thursday, Friday and Sunday); at an average kilometer of 19.75, direction B; during the afternoon and evening; with small and large vehicle types; in cloudy, drizzle and rain conditions.


Introduction
The number of residents of the city of Semarang (1,602,717 people in 2016) increases every year, and with it the need for transportation. Population growth in Semarang City reached 0.47% per year in 2016 [1]. An increase in population indirectly increases the risk of growing transportation problems. According to Tamin [2], transportation problems are not limited to the limited number of modes of transportation. The rapid development of transportation also indirectly increases the risk of growing traffic problems (traffic accidents). According to Law of the Republic of Indonesia No. 22 of 2009, a traffic accident is an unexpected and unintentional incident on a highway involving a vehicle, with or without other road users, that results in human casualties and/or property losses. Data from the Semarang City Resort Police (Polrestabes-Semarang) state that there were 936 traffic accidents in Semarang in 2017 [3].
This clearly needs attention and effective handling because it relates to the losses suffered by the community in an accident. Polrestabes-Semarang recapitulates the number of accidents, the number of victims, and the total material losses in an area and presents them in the form of descriptive statistical analysis.
Generally, traffic accident data analysis is presented as descriptive statistics [4][5][6], which include: (a) frequency distribution, (b) periodic data (data arranged in a time sequence), (c) weighting (the value used to calculate the accident index based on the characteristics of each accident), (d) the z-score technique (analysis based on standard scores), (e) the cumulative summary technique (a procedure used to identify accident locations), and (f) stick diagram analysis (used to classify similar types of accidents).
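As an illustration of item (d), the z-score technique can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used by Polrestabes-Semarang: the function name `accident_z_scores`, the segment names, the accident counts, and the threshold of 1.0 are all hypothetical, and the population standard deviation is one possible choice.

```python
from statistics import mean, pstdev


def accident_z_scores(counts):
    """Return {location: z-score} for a mapping of location -> accident count."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # All locations have the same count, so none stands out.
        return {loc: 0.0 for loc in counts}
    return {loc: (n - mu) / sigma for loc, n in counts.items()}


# Hypothetical yearly accident counts per road segment (illustrative only).
counts = {"segment_1": 4, "segment_2": 6, "segment_3": 5, "segment_4": 21}

z = accident_z_scores(counts)

# Hypothetical rule: flag segments more than one standard deviation above the mean.
flagged = [loc for loc, score in z.items() if score > 1.0]
print(flagged)
```

With these illustrative counts only `segment_4` lies more than one standard deviation above the mean. Such a score ranks locations by count alone, which is the kind of limitation of descriptive statistics that motivates clustering on several parameters.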
Data processing with descriptive statistical techniques reveals only a small portion of the information hidden in the database. Important information that supports decision making to reduce and prevent traffic accidents, such as patterns in the causes of accidents and trends that develop from accidents, has not yet been presented [7]. Another limitation of descriptive statistics is that they neither show causality between parameters nor recognize similarities in phenomena that may be hidden in the data [8].
Data mining is known as a technique for summarizing data by finding unexpected relationships and patterns that are understandable and useful to the data owner (Larose, 2005). Several studies have used data mining to process traffic accident datasets in Indonesia, for example the Apriori [9] and Naïve Bayes [10] techniques to predict traffic accidents, and association techniques [11] for accident prediction based on the event rules contained in the dataset. However, research on data clustering as a basis for identifying areas prone to traffic accidents is still scarce, so further research on data clustering is needed. Every month, Polrestabes-Semarang recapitulates the number of accidents, the number of victims who died, serious injuries, minor injuries, and total material losses. The results of this recapitulation are one of the main sources of information for identifying areas prone to traffic accidents. According to Wedasana [4], the determination of accident-prone areas ideally considers historical data, so Polrestabes-Semarang usually also refers to the number of accidents in recent years.
An accident-prone area is a location where the number of accidents is high and accidents recur in the same place within a relatively similar time span, owing to a particular cause. Identifying accident-prone areas involves two stages [4]: a. Study the accident history of all study areas and then choose the locations that are considered accident-prone. b. Study the selected locations in detail to find the treatment that can be applied.
The search for knowledge in data, also known as Knowledge Discovery in Databases (KDD) [12], is defined as the extraction from data of potentially valuable information that is implicit and not previously recognized.

Fig. 1. Steps in Knowledge Data Process (Pal and Jain, 2005)
There are a number of stages in the KDD process; however, there are basically three main stages [13], as shown in Figure 1, namely:
a. Pre-processing, related to data collection and retrieval, data cleaning, and data selection and transformation.
From traffic accident data, a number of key items of information are selected as parameters or criteria for clustering: time, location of the incident, type of vehicle involved, and condition of the victim due to the accident. This information is generally stored in descriptive form:
- Day
- Location (km)
- Direction (A or B)
- Vehicle Type (Small, Medium, Large)
- Weather (Bright, Cloudy, Drizzle, Rain, Heavy Rain)
- Victims (Light Injuries, Serious Injuries, Death)
For computing needs, descriptive (nominal) information needs to be converted into ordinal, interval, or ratio data types. In this case, a simplification and generalization of the data is applied to obtain information that can be processed further.
b. Data mining, related to the process of data exploration to find patterns or rules that have not been identified before, can be interpreted, and have the potential to be exploited [14]. There are a number of models or techniques that can be used to find these patterns, such as: anomaly
c. Post-processing, related to evaluating the results and visualizing them in a form that is understood by the user. After the clustering process is complete, the third stage, post-processing, is carried out in the form of analysis and visualization of the results.

Conclusion
Based on the mapping of the traffic accident data clustering results, several conclusions can be drawn. The traffic accident data clustering system with the Hierarchical Clustering method can be used to classify road objects based on similar characteristics in the number of victims, the type of vehicle involved, and the number of accidents that occur within a certain time span. The system requires a traffic expert to analyze the clustering results and determine the classification of the highway vulnerability level. Moreover, by reducing the number of accidents, the traffic jams that accidents cause can be greatly reduced. A reduced number of traffic accidents will also reduce carbon emissions from engine combustion, so that a low carbon society can be achieved.
https://doi.org/10.1051/e3sconf/201873,
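The pre-processing and Hierarchical Clustering steps described in this paper can be sketched as follows, using four of the listed parameters (location in km, direction, vehicle type, weather). This is a minimal sketch under stated assumptions, not the authors' implementation: the ordinal codes, the min-max scaling, the Euclidean distance, the average linkage, the function names, and the sample records are all hypothetical choices made for illustration, since the paper does not specify them.

```python
from itertools import combinations
from math import dist

# Assumed ordinal codes for the descriptive fields (not given in the paper).
VEHICLE = {"Small": 1, "Medium": 2, "Large": 3}
WEATHER = {"Bright": 0, "Cloudy": 1, "Drizzle": 2, "Rain": 3, "Heavy Rain": 4}
DIRECTION = {"A": 0, "B": 1}


def encode(record):
    """Turn one accident record (a dict) into a numeric feature vector."""
    return (
        record["km"],
        DIRECTION[record["direction"]],
        VEHICLE[record["vehicle"]],
        WEATHER[record["weather"]],
    )


def min_max_scale(vectors):
    """Rescale every column to [0, 1] so no single field dominates the distance."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]  # avoid division by zero
    return [
        tuple((v - lo) / span for v, lo, span in zip(vec, lows, spans))
        for vec in vectors
    ]


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the two closest clusters (a < b, so deleting b is safe).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical accident records (illustrative values only).
records = [
    {"km": 19.5, "direction": "B", "vehicle": "Small", "weather": "Rain"},
    {"km": 20.0, "direction": "B", "vehicle": "Large", "weather": "Drizzle"},
    {"km": 3.0, "direction": "A", "vehicle": "Medium", "weather": "Bright"},
    {"km": 4.0, "direction": "A", "vehicle": "Small", "weather": "Bright"},
]

points = min_max_scale([encode(r) for r in records])
clusters = agglomerative(points, n_clusters=2)
print(clusters)
```

With these four hypothetical records, the two accidents near kilometer 19.5 to 20 in direction B fall into one cluster and the two near kilometer 3 to 4 in direction A into the other. Interpreting such clusters as vulnerability levels would still require the traffic expert mentioned in the conclusion.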