Research on Supermarket Product Layout Method Based on Comprehensive Correlation Degree

Abstract: The commodity layout of a large supermarket has an important impact on its sales and profit, because the quality of the layout directly affects consumers' psychological experience and purchasing behavior while shopping. Previous research shows that traditional commodity layout methods lack data support, are unstable, and leave practical factors such as consumers' impulse purchases and the purchase patterns of commodities out of scope, so their optimization effect is not obvious. To address these deficiencies, this paper proposes a commodity layout method based on comprehensive correlation degree. First, the concepts of adjacency effect value and placement freedom degree are defined. On this basis, transaction data are used to mine association rules among the commodities in customers' shopping baskets, and the influence of a commodity on its surrounding commodities under different placement positions is analyzed. An analytic model for commodity layout optimization is then established with the objective of maximizing the comprehensive correlation value, and the optimal commodity layout is obtained with a genetic algorithm. Taking the commodity layout of B supermarket as an example, the optimized layout shows a closer overall correlation between adjacent commodities, which verifies the feasibility and superiority of the commodity layout method based on comprehensive correlation and suggests that it can improve the market competitiveness of the supermarket.


Introduction
Consumers' impulse buying is essentially a stimulus response: an external stimulus arouses the consumer's inner desire to purchase and thus triggers a purchase. When consumers shop, in addition to the commodities they planned to buy in advance, they also buy some commodities on impulse because of external stimuli. As an external stimulus, commodity layout is therefore an important way to promote consumption. Relevant data show that a scientific and professional commodity layout that adapts to consumer psychology and consumer demand can drive sales growth of 30%-40%, far greater than the sales increase brought about by promotion [1]. The distribution of commodities in large supermarkets has a direct impact on consumers' psychological feelings and purchasing behavior during shopping. A reasonable layout leaves a good impression on consumers and stimulates their desire to purchase. Therefore, studying how the layout stimulates consumers' purchasing behavior and optimizing the distribution of supermarket commodities is a way to attract and retain consumers and to improve supermarket sales.
The main purposes of research on supermarket commodity layout are to improve supermarket operation efficiency, increase consumer satisfaction, and increase sales. Scholars have approached the problem from different perspectives. From the perspective of improving operation efficiency, the most common method is to use SLP (Systematic Layout Planning) to plan and design the supermarket layout. For example, Zhao Feng and Wang Ze [2] aimed at reducing congestion and saving time; treating consumer traffic as the logistics factor, they used a dynamic SLP method to lay out the interior of the supermarket and obtained a clear regional layout. Liu Zhihai and Zhang Dandan [3] aimed at more concise zoning and stronger inter-regional relationships; refining the non-logistics factors of the traditional SLP method, they added dynamic analysis and feedback to the layout planning process and obtained corresponding optimization solutions. The SLP method can effectively avoid congestion, but its way of determining the correlation degree between regions is too subjective, and it yields multiple layout schemes, so the optimal scheme cannot be determined.
From the perspective of improving consumer satisfaction, some scholars have used data mining technology [4][5][6] together with service operation theory, queuing theory and other methods to lay out supermarket commodities so as to reduce consumers' shopping time and checkout time. For example, Li Xiao et al. [7] took the shortest shopping path as the goal and, based on consumer shopping data and spatial modeling, used a heuristic search algorithm to generate an approximately optimal layout that lets consumers complete their shopping quickly. Wang Tingting et al. [8] took maximum consumer satisfaction as the main goal, described the problem with a triple representation, and used PGSA (plant growth simulation algorithm) to optimize the scheduling of supermarket cashiers. Such methods effectively reduce the consumer's picking time and improve the shopping experience, but they do not consider the proportion of impulse purchases and ignore consumers' impulse purchases in response to external stimuli, which leads to a loss of potential sales. Impulse purchases account for a significant portion of product sales [9]; this ratio has risen from a low of 27% to more than 80% [10] and has shown a clear upward trend in recent years.
From the perspective of increasing turnover, the main approach is magnet point design, which leads customers through as many areas as possible while they shop in the store, so that they see more commodities and are stimulated to purchase. For example, Jia Tongling [11] aimed to increase the sales volume of the supermarket and, based on magnet point theory and customer flow lines, obtained a feasible layout plan for the supermarket; Gu Guangsheng [12] also aimed to increase sales and, based on magnet point theory, used shelf display as the main factor to lay out supermarket stores. Magnet point theory is currently widely used in supermarkets, but it has shortcomings, and its guiding effect on consumers is poor. Research by Masao ohtah and Yoshiyuki Higuchi [13] found that the customer loss rates at the second and third magnet points are 56% and 59%, respectively.
The above analysis shows that traditional supermarket commodity layout methods determine correlation degrees subjectively, lack data support, produce multiple candidate results, and guide consumers poorly. The existing models are also very similar to one another: improvements have been made to the detailed operation of the models, but not in response to changes in the social environment. Therefore, this article plans the layout of supermarket commodities with the aim of strengthening the relationships between commodities so as to stimulate consumer purchases. First, data mining technology is used to analyze historical consumption data in depth. Second, the adjacency effect value and the placement degree of freedom are used as the main observation indicators to establish a commodity layout model that maximizes the comprehensive correlation value and improves the overall correlation of the layout. Then, an optimization algorithm is used to determine the final supermarket layout. Finally, experimental results show that the method in this paper can design a reasonable supermarket commodity layout and that the layout design algorithm is simple and efficient.

Association rule
(1) Association rules reflect the interdependence and relevance between one thing and other things. Suppose $I$ is a set of items and $D$ is a transaction database in which each transaction $T$ is a non-empty subset of $I$ and has a unique identifier TID. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$, that is, $X$ and $Y$ are two disjoint sets of items.

(2) Support and confidence describe the strength of an association rule in terms of probability. The support is the probability that $X \cup Y$ appears in the transaction set $D$, that is, the probability that $X$ and $Y$ are purchased at the same time:

$$\mathrm{Support}(X \Rightarrow Y) = P(X \cup Y) \tag{1}$$

The confidence is the probability that $Y$ also appears given that $X$ appears in the data set:

$$\mathrm{Confidence}(X \Rightarrow Y) = P(Y \mid X) = \frac{P(X \cup Y)}{P(X)} \tag{2}$$

where $P(X)$ represents the probability that $X$ occurs in the transaction set $D$.

However, support and confidence alone cannot reliably check the effectiveness of a rule; their ability to express rules is limited and cannot guarantee the effectiveness of decisions. To strengthen the guiding significance of the rules, the lift (lifting degree) is introduced. Intuitively, it is the ratio of the probability of containing $Y$ given that $X$ is contained to the overall probability of containing $Y$, and it describes how much is gained by using the rule compared with not using it:

$$L(X \Rightarrow Y) = \frac{P(Y \mid X)}{P(Y)} = \frac{P(X \cup Y)}{P(X)\,P(Y)} \tag{3}$$

where $P(Y)$ is the overall probability that $Y$ occurs in $D$. When $L(X \Rightarrow Y) < 1$, the appearance of $X$ suppresses the appearance of $Y$, and they are negatively correlated. When $L(X \Rightarrow Y) = 1$, the appearance of $X$ has nothing to do with $Y$, and they are independent. When $L(X \Rightarrow Y) > 1$, the appearance of $X$ increases the appearance of $Y$, and they are positively correlated.
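The three measures above can be illustrated on a handful of shopping baskets. The following is a minimal Python sketch, not the procedure used in this paper: the function names and the toy baskets are assumptions introduced only for illustration.

```python
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]


def support(baskets: List[Basket], items: Iterable[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for b in baskets if wanted <= b) / len(baskets)


def confidence(baskets: List[Basket], x: Iterable[str], y: Iterable[str]) -> float:
    """P(Y | X): support of X and Y together divided by support of X."""
    x, y = frozenset(x), frozenset(y)
    return support(baskets, x | y) / support(baskets, x)


def lift(baskets: List[Basket], x: Iterable[str], y: Iterable[str]) -> float:
    """P(Y | X) / P(Y): above 1 means X raises the chance of Y."""
    return confidence(baskets, x, y) / support(baskets, y)


# Hypothetical toy data: four baskets (illustrative only).
baskets = [
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "diapers", "milk"}),
    frozenset({"milk"}),
    frozenset({"beer"}),
]

print(support(baskets, {"beer", "diapers"}))       # 2 of 4 baskets
print(confidence(baskets, {"beer"}, {"diapers"}))  # 2 of the 3 beer baskets
print(lift(baskets, {"beer"}, {"diapers"}))        # greater than 1 here
```

In this toy data the lift of beer to diapers is above 1, which corresponds to the positively correlated case described above.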
Association rule mining is an important data mining technique. It is used to mine valuable association rules between data items from large amounts of data and has many applications in daily life, for example in the recommendation systems of physical stores and online e-commerce; the "beer and diapers" case is a typical application. The aim is to obtain rules that can guide decision-making. The Apriori algorithm [11] proposed by Agrawal, Imielinski and Swami is the most commonly used algorithm in association rule mining research. Mining association rules with this algorithm proceeds in two steps:
(1) Generation of frequent itemsets: all itemsets that meet the minimum support threshold are found; these are called frequent itemsets.
(2) Generation of association rules: all rules that meet the minimum confidence threshold are extracted from the frequent itemsets found; these are called strong association rules.
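The two steps above can be sketched in a few lines of Python. This is a simplified, level-wise illustration of the Apriori idea under assumed names (`frequent_itemsets`, `strong_rules`) and assumed toy data; it is not the Clementine implementation used later in the case analysis and makes no attempt at efficiency.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Step 1: level-wise search for itemsets with support >= min_support."""
    n = len(baskets)
    items = sorted({item for b in baskets for item in b})
    current = [frozenset([item]) for item in items]  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        level = {}
        for cand in current:
            sup = sum(1 for b in baskets if cand <= b) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates and keep only
        # those whose (k-1)-item subsets are all frequent (Apriori pruning).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent, min_confidence):
    """Step 2: rules X => Y from frequent itemsets with enough confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                y = itemset - x
                conf = sup / frequent[x]
                if conf >= min_confidence:
                    rules.append((x, y, sup, conf, conf / frequent[y]))
    return rules


# Hypothetical toy baskets (illustrative only).
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"bread", "eggs"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]
frequent = frequent_itemsets(baskets, min_support=0.3)
for x, y, sup, conf, lift_value in strong_rules(frequent, min_confidence=0.7):
    print(sorted(x), "=>", sorted(y), round(sup, 2), round(conf, 2), round(lift_value, 2))
```

Each rule is returned with its support, confidence and lift, so the lift values can be reused directly when the commodity relationships are assembled for the layout model.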
As an external stimulus, supermarket commodity layout is a very important means of promoting consumption. Mining association rules from shopping data yields effective commodity association rules, which provide decision support for the commodity layout and can further improve sales and revenue. By mining association rules from the customers' purchase records, this article therefore aims to discover customers' inherent purchasing habits, for example that the purchase of commodity A increases the probability of purchasing commodity B, and to design a reasonable commodity layout from the mining results so as to increase sales. The specific method of this article is introduced below.

Formal description of supermarket space
In order to facilitate the theoretical analysis, this article abstracts the three-dimensional supermarket space according to the supermarket environment and gives a more intuitive formal description of it. Assume that the distance between adjacent shelves is one unit. Consumers can move between adjacent shelves in one step; non-adjacent shelves cannot be reached directly, and consumers must pass through other areas. Each commodity shelf is represented by a rectangle, and the distance between adjacent shelves is standardized as above. The formal description of the supermarket's plane topology is shown in Figure 1.

The lift values between all pairs of commodities are collected in a lift matrix (equation (4)). When the antecedent and the consequent are the same commodity, the commodity has no lifting effect on itself. The lift matrix reflects the mutual relationships of the commodities and provides the basic data for subsequent calculations. Commodity relationships are commonly described with association rules, using an implication to express the association between two kinds of commodities, but this cannot fully reflect the impact of a commodity on the multiple commodities around it. In order to describe this relationship better, this paper introduces the concepts of the placement degree of freedom and the adjacency effect value, based on the position of a shelf in the space and the number of shelves around it.
Definition 1: For a given shelf, the number of shelves adjacent to it is called the placement degree of freedom of this shelf. This article denotes it by $k$.

Definition 2: The adjacency effect value is the average promotion (lift) effect of the commodity at a location on the surrounding commodities. Suppose $a$ represents a layout and $i$ represents the shelf number; then

$$E(a_i) = \frac{1}{k} \sum_{j=1}^{k} L(a_i \Rightarrow a_{ij}) \tag{5}$$

where $a_i$ indicates the product placed on shelf $i$ under layout $a$, $k$ indicates the placement degree of freedom of shelf $i$, and $a_{ij}$ indicates the product placed on the $j$-th shelf around shelf $i$.
In actual supermarket product layouts, the space has a variety of shapes, such as fan-shaped, circular, rectangular and irregular, but every shelf has adjacent shelves, and their number depends on the position of the shelf. In general $k$ can take many values; in a rectangular space there are three cases. These three cases are explained below.

Case 1: The shelf is placed in a corner, and the placement degree of freedom is $k = 3$. Taking shelf 1 as an example, it is close to shelves 2, 5 and 6, so these shelves are correlated and interact with it. Therefore, when $k = 3$, the influence of the three surrounding shelves must be considered. This situation is represented by the shelf position diagram in Figure 2.

The comprehensive correlation value reflects the lifting effect of the entire layout. Therefore, based on the adjacency effect value, this paper establishes a mathematical model for the comprehensive correlation value of the product layout. According to the lift matrix of equation (4) and the adjacency effect value of equation (5), the quantitative expression is:

$$\max_{a \in M} \; \sum_{i=1}^{n} \frac{1}{k} \sum_{j=1}^{k} L(a_i \Rightarrow a_{ij}) \tag{6}$$

where $n$ indicates the number of shelves, $k$ indicates the placement degree of freedom of shelf $i$, $a_{ij}$ indicates the product on the $j$-th shelf adjacent to shelf $i$, and $M$ indicates the set of all possible layouts.
This article studies the distribution of supermarket commodities with the purpose of maximizing the comprehensive correlation value. In practical applications, this commodity layout model has the following characteristics:
(1) The model uses the placement degree of freedom and the commodity correlation degree as observation indices to calculate the adjacency effect value of the goods at different positions. It yields specific, quantifiable results, avoids the bias in evaluation parameters caused by strong subjectivity, and improves rigor.
(2) The model describes the relationship between locations from four directions and introduces the adjacency effect value. The association rules obtained by data mining are used to select the goods that maximize the adjacency effect value of each location, and commodities are then matched with positions to obtain the optimal commodity layout. As a result, the relationships between goods are closer: commodities that are adjacent are also related, and the whole layout is more systematic.
However, the layout method in this paper involves a large number of shelves, each with its own $k$ value, so the amount of calculation is large and difficult to complete by hand. Therefore, a genetic algorithm is used to solve the model.
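To make the objective concrete, the sketch below computes the adjacency effect value of each shelf and the comprehensive correlation value of a layout on a rectangular grid of shelves. It is a sketch under stated assumptions rather than the paper's program: shelves are numbered row by row starting from 0, adjacency includes diagonal neighbours (inferred from Case 1, where corner shelf 1 is adjacent to shelves 2, 5 and 6), and the function names and the small lift matrix are hypothetical.

```python
def neighbours(i, rows, cols):
    """Indices of the shelves adjacent to shelf i (0-based, row-major).

    Assumption: diagonal shelves count as adjacent, so a corner shelf has
    3 neighbours, as in Case 1 of the text.
    """
    r, c = divmod(i, cols)
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                result.append(rr * cols + cc)
    return result


def adjacency_effect(layout, i, lift, rows, cols):
    """Average lift from the product on shelf i to the products around it."""
    around = neighbours(i, rows, cols)  # len(around) is the degree of freedom k
    return sum(lift[layout[i]][layout[j]] for j in around) / len(around)


def comprehensive_correlation(layout, lift, rows, cols):
    """Sum of the adjacency effect values of all shelves."""
    return sum(adjacency_effect(layout, i, lift, rows, cols)
               for i in range(len(layout)))


# Corner shelf of a 4 x 4 grid: 0-based shelves 1, 4, 5 are shelves 2, 5, 6.
print(neighbours(0, 4, 4))

# Hypothetical lift matrix for 4 products; products 0 and 1 promote each
# other strongly. The diagonal is never read because layouts are permutations.
lift = [
    [0.0, 2.0, 1.0, 1.0],
    [2.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]
# Four shelves in a single row: products 0 and 1 adjacent vs. separated.
print(comprehensive_correlation([0, 1, 2, 3], lift, 1, 4))  # 5.5
print(comprehensive_correlation([0, 2, 1, 3], lift, 1, 4))  # 4.0
```

Placing the two strongly related products side by side gives the larger value, which is the behaviour the model rewards.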

Solving the product layout model of maximum comprehensive correlation value with a genetic algorithm
The genetic algorithm is a computational model that simulates the natural selection and genetic mechanisms of Darwin's theory of biological evolution. It searches for the optimal solution by simulating natural evolution. The genetic algorithm operates directly on structural objects and adopts a probabilistic optimization method, so it can adaptively adjust the search direction, is inherently parallel, and has good global optimization capability. Because the commodity layout model of maximum comprehensive correlation value has a large number of parameters, its computational complexity is high and manual calculation is difficult, so the genetic algorithm is used to solve it. The solution process is as follows:

Encoding
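Integer permutation coding is adopted: a chromosome is a permutation of the commodity numbers, for example a permutation of the integers 1 to 16 when there are 16 commodities.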

Population initialization
After the chromosome encoding is completed, an initial population must be generated as the starting solution, so the size of the initial population has to be determined first. The population size is generally chosen from experience and depends on the number of commodity types; in the case studied in this paper, the value is 30.

Fitness Function
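The fitness of a chromosome is evaluated with the comprehensive correlation value of the layout it encodes. The larger the fitness function value, the better the chromosome, and vice versa.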

Selection operation
The selection operation selects individuals from the existing population into a new population with a certain probability. The probability that an individual is selected depends on its fitness value: the greater the fitness value, the greater the probability of being selected. This article uses the roulette wheel selection method.
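Roulette wheel selection can be sketched as follows. This is a generic illustration with assumed names (`roulette_select`, `rng`) and assumes non-negative fitness values with a positive sum; it is not taken from the paper's program.

```python
import random


def roulette_select(population, fitness, n, rng):
    """Draw n individuals with probability proportional to their fitness.

    Assumption: all fitness values are non-negative and their sum is positive.
    """
    total = sum(fitness)
    picks = []
    for _ in range(n):
        r = rng.uniform(0, total)  # where the wheel stops
        cumulative = 0.0
        for individual, f in zip(population, fitness):
            cumulative += f
            if r <= cumulative:
                picks.append(individual)
                break
        else:
            # Guard against floating-point round-off at the upper end.
            picks.append(population[-1])
    return picks


# Hypothetical example: three layouts with fitness 1, 0 and 9.
rng = random.Random(0)
chosen = roulette_select(["layout_a", "layout_b", "layout_c"], [1.0, 0.0, 9.0], 1000, rng)
print(chosen.count("layout_a"), chosen.count("layout_b"), chosen.count("layout_c"))
```

An individual with zero fitness is never drawn, and fitter individuals are drawn more often, which matches the rule stated above.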

Crossover operation
Partially mapped crossover (PMX) is used. The parents for the crossover operation are determined and divided into two groups, and the following process is repeated for each group (assuming the number of products is 16): 1) Generate two random integers $q_1$ and $q_2$ in the interval [1,16]
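The remaining steps of the crossover are not shown here, so the following is only a sketch of the standard textbook form of partially mapped crossover, not necessarily the exact variant used in this paper. The function name `pmx` and the 0-based, half-open cut points are assumptions; the text itself draws $q_1$ and $q_2$ from the 1-based interval [1,16].

```python
def pmx(parent1, parent2, q1, q2):
    """Partially mapped crossover producing one child permutation.

    The child copies parent2[q1:q2] and fills the other positions from
    parent1, following the segment mapping whenever a gene would repeat.
    Cut points are 0-based and half-open (an assumption of this sketch).
    """
    size = len(parent1)
    child = [None] * size
    child[q1:q2] = parent2[q1:q2]
    # Maps each gene of parent2's segment to parent1's gene at that position.
    mapping = {parent2[i]: parent1[i] for i in range(q1, q2)}
    for i in list(range(q1)) + list(range(q2, size)):
        gene = parent1[i]
        while gene in mapping:  # gene already used inside the segment
            gene = mapping[gene]
        child[i] = gene
    return child


p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [3, 7, 5, 1, 6, 8, 2, 4]
print(pmx(p1, p2, 3, 6))  # [4, 2, 3, 1, 6, 8, 7, 5]
print(pmx(p2, p1, 3, 6))  # the second child, with the parents' roles swapped
```

Because conflicts are resolved through the mapping, each child is again a valid permutation, so every commodity still appears on exactly one shelf.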

Evolution reversal operation
In order to improve the local search ability of the genetic algorithm, multiple consecutive evolution reversal operations are introduced after selection, crossover and mutation to accelerate evolution. "Evolution" here refers to the unidirectionality of the reversal operator: a reversal is accepted only if the fitness value improves after it; otherwise the reversal is invalid. Generate two random integers $q_1$ and $q_2$ in the interval [1,16] to determine two positions, and reverse the gene segment they delimit. For example, with $q_1 = 4$ and $q_2 = 9$:
1 15 7 |2 3 9 10 11| 12 5 6 12 8 14 13 4
After the reversal it becomes:
1 15 7 |11 10 9 3 2| 12 5 6 12 8 14 13 4
Perform crossover and mutation on each individual, and then substitute it into the fitness function for evaluation. Select the individuals with large fitness values for the next generation of crossover, mutation and evolution reversal operations, and repeat the loop. At each generation, check whether the preset maximum number of generations MAXGEN has been reached; if not, return to the calculation of the fitness values, otherwise end the genetic operation.
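The reversal step and its one-directional acceptance rule can be sketched as below. The helper names, the number of attempts and the toy fitness function are assumptions made for illustration; the cut points are 0-based and half-open, so the example with $q_1 = 4$ and $q_2 = 9$ in the text corresponds to `reverse_segment(chrom, 3, 8)`.

```python
import random


def reverse_segment(chrom, q1, q2):
    """Return a copy of chrom with the genes between the two cut points reversed."""
    lo, hi = sorted((q1, q2))
    return chrom[:lo] + chrom[lo:hi][::-1] + chrom[hi:]


def evolution_reversal(chrom, fitness, rng, attempts=10):
    """Try several random reversals and keep one only if fitness improves."""
    best, best_fit = list(chrom), fitness(chrom)
    for _ in range(attempts):
        q1, q2 = rng.sample(range(len(best) + 1), 2)
        candidate = reverse_segment(best, q1, q2)
        candidate_fit = fitness(candidate)
        if candidate_fit > best_fit:  # otherwise the reversal is discarded
            best, best_fit = candidate, candidate_fit
    return best


# The chromosome from the text, copied verbatim.
chrom = [1, 15, 7, 2, 3, 9, 10, 11, 12, 5, 6, 12, 8, 14, 13, 4]
print(reverse_segment(chrom, 3, 8))
# [1, 15, 7, 11, 10, 9, 3, 2, 12, 5, 6, 12, 8, 14, 13, 4]


# Hypothetical fitness for a quick demonstration: genes already in place.
def toy_fitness(c):
    return sum(1 for position, gene in enumerate(c) if gene == position + 1)


improved = evolution_reversal([3, 2, 1, 4, 5], toy_fitness, random.Random(0), attempts=50)
print(improved, toy_fitness(improved))
```

In the layout model, `toy_fitness` would be replaced by the comprehensive correlation value of the layout encoded by the chromosome.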

Case analysis
In order to meet people's growing demand, the scale and number of supermarkets have increased significantly in recent years, and industry competition has become increasingly fierce. However, the operating models and promotional methods of supermarkets differ little from one another, which puts supermarket sales under pressure. In order to develop well, B supermarket urgently needs to change its business strategy and improve its competitiveness and turnover. A scientific and reasonable product layout can affect consumers' consumption behavior and thereby increase supermarket sales. Therefore, this article takes the impact on customers' consumption psychology into account in the layout of products, changes the traditional product layout, and proposes a new product layout method that stimulates consumers to change their initial consumption behavior, thereby helping the supermarket increase sales. In addition, optimizing the product layout costs relatively little, is relatively easy to implement, and takes effect relatively quickly, so it is feasible.
B Supermarket focuses on food and daily necessities. It is located at 188 Zhongshan East Road, northeast of the intersection of Ping'an North Street and Yuhua East Road, in the core business district. It is adjacent to the civic center and the first block of finance and commerce, and transportation is convenient. Nearby there are many high-end communities such as Huicui Garden, Chang'an Flower Garden, Zhongji Liyu and Yinhong Flower Garden, which provide a large and very stable consumer group, so the purchasing power and purchasing demand in the region are assured. In this paper, we randomly collected part of the shopping data of the past month from B, a medium-sized chain supermarket in Shijiazhuang City, Hebei Province. The data contain 58 attributes (customer number, shopping time, shopping amount and 55 kinds of commodities) and a total of 2877 transaction records. The attribute value "0" indicates that the customer did not purchase the product in this transaction, and the value "1" indicates that the product was purchased. After the necessary processing of the original data, the product categories are integrated into 30 types. Each record is one customer purchase, that is, one shopping basket, and the number of product types purchased by a customer at one time ranges from 1 to 30. The Apriori algorithm in the mining software Clementine 12.0 was used to mine association rules. With the rule support set to be greater than 10% and the confidence greater than 20%, 988 association rules were obtained.

Formal description of supermarket space
Supermarket B has a total area of 3255 square meters and a business area of 1955 square meters. Its assortment covers protective products, cosmetics and other products, about 15,000 varieties in total. Although the supermarket is a three-dimensional structure, shoppers in fact move mainly on a two-dimensional plane. The standardized formal description of the supermarket is shown in Figure 5.

Figure 5. Standardization plan of B supermarket

Goods location optimization process
1) Classification of goods: Based on the analysis of the historical consumption data of the supermarket, this paper carries out the calculation and analysis on major commodity categories. The commodities are classified according to their function and use, which avoids the overly fine granularity and high computational complexity of analyzing individual products. All commodities are divided into 30 categories. The initial data are integrated by category, and the products are numbered. The product classification is shown in Table 1.

2) Organize the association matrix: Based on the historical consumption data, data mining is performed to analyze the correlation between products. Clementine 12.0 is a general-purpose data mining software package that helps users build a complete data mining process and provides a series of functions for each mining step. For the processed shopping data, Clementine 12.0 is used to build an Apriori model to find the association rules between products. The process mainly includes creating the data, creating a stream, importing the data, setting the parameters, exporting the results, and analyzing them. According to the lift of each commodity, the association matrix is sorted out, as shown in Figure 6.

3) Layout of goods shelves: Set the layout of the shelf space in the program, substitute the correlation matrix arranged in numbering order into the program, and run the program. A random set of initial solutions is generated first, which gives a set of unoptimized random genes. The comprehensive correlation value of the unoptimized genes, that is, the sum of all adjacency effect values, is 30.9144.
In the experiment, each specific supermarket product layout is generated randomly and is not repeated. Continuing to run the program gives the convergence graph of the genetic algorithm, as shown in Figure 7.

Figure 7. Convergence diagram of genetic algorithm optimization

For each supermarket product layout, the adjacency effect values are calculated from the historical consumption data of the supermarket, and the comprehensive correlation value of the layout is obtained. By comparison, the layout with the largest comprehensive correlation value is taken as the current approximately optimal layout. Figure 9 shows the convergence process of the model. When the number of iterations is 256, the optimal fitness value tends to be stable, and the comprehensive correlation value is 35.14. When the number of iterations reaches 470, the optimal fitness value no longer increases and reaches its maximum; the comprehensive correlation value is 35.21, which is taken as the optimal solution to the problem. At this point, a group of genes optimized by the genetic algorithm is obtained. The comprehensive correlation value of the optimized genes is 35.2138, the best value found by the algorithm. At the end of the program, the optimized product layout is obtained. The darker the color, the greater the adjacency effect value, and vice versa. The optimal product layout of the supermarket is obtained by matching the products represented by the numbers with those in the result chart one by one. The optimal product layout is shown in Figure 8.

Figure 8. Optimal layout results of genetic algorithm

4) Conclusion: As can be seen from Figure 9, optimizing the layout of supermarket products based on the comprehensive correlation value of the products is relatively successful. Under the initial randomly generated layout, the comprehensive correlation value of the product layout is 30.9144, while the relatively optimal product layout has a comprehensive correlation value of 35.2138. The difference between the two is 4.2994, an improvement of about 14% over the initial layout, which shows that the optimization is effective: the method optimizes the supermarket layout and yields a relatively optimal supermarket product layout.
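The improvement figures quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the two values are the comprehensive correlation values reported in the text.

```python
initial_value = 30.9144    # comprehensive correlation value of the random layout
optimized_value = 35.2138  # value after genetic algorithm optimization

gain = optimized_value - initial_value
relative_gain = gain / initial_value  # improvement relative to the initial layout

print(f"absolute gain: {gain:.4f}")
print(f"relative gain: {relative_gain:.1%}")
```

The relative gain is computed against the initial layout, which is how the figure of about 14% is obtained.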

Conclusions and research prospects
The purpose of this article is to provide a constructive plan for the layout of supermarket products. A reasonable product layout affects consumers' consumption behavior and thereby increases supermarket sales. This article first analyzes the impact of a commodity on the surrounding commodities at different locations and puts forward the concept of the placement degree of freedom. Because the impact of a commodity on other commodities is considered in multiple directions, the overall layout is more closely related. A commodity network with adjacency relations is constructed to make suggestions for the commodity layout. By understanding the purchasing habits of customers, the mutual influence of product sales is expanded, and customers are guided to continue purchasing products, which drives the sales of products that sell less. Then, according to the different placement degrees of freedom, the model of maximizing the comprehensive correlation value is established, the adjacency effect value of each product is calculated, and a genetic algorithm is used to solve the model and obtain the comprehensive correlation values of the candidate supermarket product layouts. Through comparative analysis, the layout with the largest comprehensive correlation value is obtained. Once the layout of the products is determined, it is mapped to the corresponding space to form the final supermarket product layout. In this way, when customers choose products by themselves in the supermarket, they go to the products they most want to buy and find the related products around them. This gives customers implicit guidance and arouses their interest in purchasing other products, thus promoting sales and playing the role of commodity recommendation. At the same time, the quantitative method avoids the deviation of evaluation parameters caused by strong subjectivity and is therefore more rigorous.
The experimental results show that the method in this paper can design a reasonable supermarket layout and that the layout design algorithm is simple and efficient. At the same time, related products drive each other's sales, and the increased sales bring economic benefits to the enterprise. The data analyzed in this article come from the consumer shopping information of B supermarket. Because the amount of data extracted is small and involves a certain degree of randomness, the results obtained may not be fully accurate. However, the mining principle is the same, and the results have reference and guidance value.