Predicting the client’s purchasing intention using Machine Learning models

In this paper, we introduce a prediction algorithm that estimates the likelihood that a client will make a purchase on a website. This system is part of a global e-commerce solution intended to give clients the best possible experience. The paper presents an overview of the e-commerce system's components and an activity diagram showing the steps the platform can perform, which together give a general idea of the system's workflow.


Introduction
Today, most domains are focused on anticipating and acting on future events. E-commerce platforms are very popular among consumers, thanks in part to advanced technologies such as artificial intelligence. One of the key reasons Data Mining and Machine Learning have become an integral part of Customer Relationship Management is the aim of replicating the close relationship that emerges between small businesses and their purchasers. The goal is to find out what sets each customer apart and to build trust between the two parties [1].
This paper is the continuation of [1], where we gave a general overview of the state of the art in Machine Learning and Data Mining in the Digital Marketing domain. Here we predict the likelihood that a client will pay for a product before leaving the platform; the dataset used for this problem was introduced by [2].
The paper is structured as follows. The first part explains the e-commerce platform's logical components, how they contribute to the platform's overall operations and how they help clients get the best possible experience. The second part tackles the prediction of the likelihood that a user will purchase from the platform. This module can be used in a variety of e-commerce platforms. It can predict the best products in the platform, or it can introduce new features and concepts related to the consumer decision journey. This paper offers a global view that can be used in different domains and not only in a company context.

Logical components of an e-commerce platform
In order to develop a system that can meet the demands of 2.0 clients, the system we propose is based on a package diagram developed in [3], which describes the various components of a real-world e-commerce system.
The authors focused on the various aspects of designing an e-commerce platform able to handle various types of data. These included data cleansing, data migration, and data customization [1]. The UASP package is responsible for the management of the user accounts and the customization of the platform's user experience. The authors treat the various modules as interrelated. The diagram aims to lay out the foundations for an e-commerce system. Nevertheless, these packages can be developed further, added to or split into other modules.
This paper presents an approach in which each package has only one simple function and interacts with the others according to the user's needs. Visitor manager handles the temporary visit of a client who does not have an account. It sends the visitor to the appropriate page in case of purchase. The visitor sees the data supplied by the Ads manager for the best-rated products.
User account manager is used for creating a new user account, logging in, and storing history. It also manages the purchase history. This module communicates directly with the Best Next Action manager and the Shopping cart manager.
Best Next Action manager uses the data collected from the user account to provide the best possible user experience. It sends the most relevant products and services to the user based on his usual activities and behavior.
Product manager depends on the Ads manager and on the Best Next Action manager in order to show the products that will interest the user.
Ads manager sends data to the Best Next Action module in order to show the ads that interest the user most.
Shopping cart manager is responsible for the products a user adds to their cart.
Payment manager is responsible for the entire payment process after the user validates his cart.
Delivery manager is a tool that tracks the progress of the delivery process. It sends notifications when the user has finished their payment and at every step of the shipping.
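The dependencies between the modules described above can be written down as a small directed graph. The following is only a sketch: the module names come from the text, but reading each sentence as a one-way data flow, and the names `DATA_FLOWS` and `upstream`, are assumptions made for illustration.

```python
from collections import defaultdict

# Directed edges "source -> consumer", transcribed from the module descriptions.
# Treating each description as a one-way data flow is an assumption of this sketch.
DATA_FLOWS = [
    ("User account manager", "Best Next Action manager"),
    ("User account manager", "Shopping cart manager"),
    ("Ads manager", "Best Next Action manager"),
    ("Ads manager", "Product manager"),
    ("Ads manager", "Visitor manager"),
    ("Best Next Action manager", "Product manager"),
    ("Shopping cart manager", "Payment manager"),
    ("Payment manager", "Delivery manager"),
]


def upstream(module, flows=DATA_FLOWS):
    """Return every module whose data reaches `module`, directly or indirectly."""
    sources = defaultdict(set)
    for src, dst in flows:
        sources[dst].add(src)

    seen = set()
    stack = [module]
    while stack:
        current = stack.pop()
        for src in sources.get(current, ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen
```

Calling `upstream("Product manager")` lists the modules that the product display ultimately relies on, which is one way to check that each package keeps a single responsibility.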

Activity diagram of the proposed system
This section shows how the various modules of the platform interact with each other to provide a better and more complete user experience.
In general, the activity diagrams of e-commerce platforms follow a similar generic idea (e.g., [4]). In the Best Next Action module (BNMA), the data collected by the account manager is used to create the best possible user experience for each individual user, applying a Machine Learning algorithm to past behaviors. The suggestions and ads are also handled by this module. The operation of the Best Next Action module was detailed in the previous section. The user can customize the items he wants to buy from the cart and then proceed with the payment. After the purchase, the user can add the billing address and Visa card data and track his order.

Dataset description
We worked on the dataset proposed in [2]. In that paper, the authors formulated a model that predicts the likelihood of abandonment and the purchasing intention of a user/session.
The dataset is composed of data gathered from 12,330 sessions; each session belongs to a new user to avoid any influence on the model. The target variable is a binary value describing whether the platform had any revenue from the session. The dataset has ten numerical features and eight categorical attributes. Some features are derived from the URL routes of the pages visited and are updated whenever an action takes place. The dataset also includes operating system, browser, region, traffic type, visitor type (returning or new visitor), a Boolean value indicating whether the date of the visit falls on a weekend, and month of the year [2].
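As a minimal sketch of how this structure could be checked before modelling, the snippet below declares the expected columns and computes the share of sessions that ended with revenue. The column names are an assumption (they follow the publicly released version of the dataset and are not listed in this paper), as is counting the target among the eight categorical attributes; `check_schema` and `purchase_rate` are hypothetical helpers.

```python
# Column names are assumed from the public release of the dataset, not from this paper.
NUMERICAL = [
    "Administrative", "Administrative_Duration",
    "Informational", "Informational_Duration",
    "ProductRelated", "ProductRelated_Duration",
    "BounceRates", "ExitRates", "PageValues", "SpecialDay",
]
CATEGORICAL = [
    "Month", "OperatingSystems", "Browser", "Region",
    "TrafficType", "VisitorType", "Weekend",
]
TARGET = "Revenue"  # binary label: did the session generate revenue?


def check_schema(header):
    """Return the expected columns that are missing from `header` (empty if complete)."""
    expected = NUMERICAL + CATEGORICAL + [TARGET]
    return [name for name in expected if name not in header]


def purchase_rate(rows):
    """Share of sessions with revenue; `rows` are dicts such as csv.DictReader yields."""
    rows = list(rows)
    if not rows:
        raise ValueError("no sessions given")
    positives = sum(str(row[TARGET]).strip().lower() in {"true", "1"} for row in rows)
    return positives / len(rows)
```

With a CSV export of the sessions, `check_schema(reader.fieldnames)` and `purchase_rate(reader)` on a `csv.DictReader` would give a quick view of missing columns and of the class balance of the target.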

Fig. 4. Correlation Heatmaps with Seaborn & Matplotlib
As an exploratory data analysis tool, this correlation matrix contains some important visual information. It shows that the features are not strongly correlated, and no potential multicollinearity problem was detected.
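The check behind such a heatmap can be sketched without any plotting library: compute the Pearson correlation for every pair of numerical features and list the pairs whose absolute correlation exceeds a threshold. The 0.8 threshold and the function names are assumptions made for illustration; the paper does not state the criterion it used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """List (name_a, name_b, r) for feature pairs with |r| >= threshold.

    `columns` maps a feature name to its list of values. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:  # NaN never satisfies this comparison
                flagged.append((a, b, r))
    return flagged
```

An empty result from `correlated_pairs` corresponds to the situation described above, where no pair of features raises a multicollinearity concern.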

Prediction
In this study, we used the Python library PyCaret [5], which provides a complete set of features for creating and manipulating machine learning models [6].
After getting the data and setting up the PyCaret environment, the library compiles the data and allows the user to compare different supervised or unsupervised models. The best algorithms are chosen based on various metrics such as Accuracy and Recall and presented in a table. The output is a score grid that shows the average Accuracy, AUC, Recall, Precision, F1, Kappa and MCC across the folds (10 folds by default), along with the training times. Before being used with PyCaret, the data was cleaned and split into train and test sets. The following results were obtained on the dataset described above using the compare_models() method. In a Random Forest (RF), the class that receives the most votes from the trees is taken as the model's prediction [9]. RF can handle high-dimensional data and use a large number of trees in the ensemble [10].
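To make the columns of the score grid concrete, the sketch below computes the threshold-based metrics from the four cells of a binary confusion matrix and averages them over folds. It follows the standard definitions and is not PyCaret's own code; AUC is left out because it needs predicted scores rather than counts, and the function names are hypothetical.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, Recall, Precision, F1, Cohen's Kappa and MCC from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: agreement beyond what the class marginals alone would produce.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision,
            "F1": f1, "Kappa": kappa, "MCC": mcc}


def mean_over_folds(fold_metrics):
    """Average each metric over a non-empty list of per-fold metric dictionaries."""
    keys = fold_metrics[0].keys()
    return {k: sum(m[k] for m in fold_metrics) / len(fold_metrics) for k in keys}
```

Because purchasing sessions are typically the minority class, Kappa and MCC are useful complements to Accuracy: both stay near zero for a model that always predicts the majority class.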

Fine-tuned prediction
PyCaret allows users to train individual models and make predictions with them. A model can also be tuned to get better results: PyCaret can automatically tune its hyper-parameters using Random Grid Search.

    rf = create_model('rf')
    tuned_rf = tune_model(rf)

The models trained with PyCaret on the dataset discussed above give satisfying results. The three top models identified in the first step have close scores. The Random Forest model scored the best results, with a 91% Accuracy score before tuning the hyper-parameters, when used with the sessions dataset.
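The idea of a random search over a hyper-parameter grid can be illustrated in a few lines. This is a generic sketch, not PyCaret's implementation: the `score` callback stands in for a cross-validated metric, and the function name, the default number of iterations and the seed are assumptions.

```python
import random


def random_search(space, score, n_iter=10, seed=0):
    """Sample `n_iter` random configurations from `space` and keep the best one.

    `space` maps a hyper-parameter name to a list of candidate values and
    `score` maps a configuration (dict) to a number where higher is better.
    """
    rng = random.Random(seed)  # fixed seed keeps the search reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score
```

Unlike an exhaustive grid search, the cost is fixed by `n_iter` rather than by the size of the grid, at the price of possibly missing the best combination.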
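For context, the two calls above only work after a PyCaret experiment has been set up. A fuller sketch of the workflow is given below, under several assumptions: the sessions are available as a CSV file at a path supplied by the caller, the target column is named `Revenue`, and the seed value is arbitrary. The wrapper function is hypothetical, and the exact settings used in this study are not reproduced here.

```python
def run_purchase_intent_experiment(csv_path, target="Revenue", seed=42):
    """Compare models, then train and tune a Random Forest with PyCaret.

    `csv_path`, `target` and `seed` are illustrative assumptions. Imports are
    kept inside the function so it can be defined without PyCaret installed.
    """
    import pandas as pd
    from pycaret.classification import (
        compare_models, create_model, predict_model, pull, setup, tune_model,
    )

    data = pd.read_csv(csv_path)

    # Initialise the experiment; PyCaret holds out part of the data for testing.
    setup(data=data, target=target, session_id=seed)

    # Cross-validate the available classifiers and keep the three best.
    top_models = compare_models(n_select=3)
    comparison_grid = pull()  # score grid produced by the last call

    # Train a Random Forest, then tune its hyper-parameters.
    rf = create_model("rf")
    tuned_rf = tune_model(rf)

    # Score the tuned model on the hold-out set.
    holdout_predictions = predict_model(tuned_rf)

    return {
        "top_models": top_models,
        "comparison_grid": comparison_grid,
        "tuned_rf": tuned_rf,
        "holdout_predictions": holdout_predictions,
    }
```

Comparing the score grid of `create_model` with that of `tune_model` shows whether tuning actually improves on the default Random Forest, which is worth checking given that the default model already scores well.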

Conclusion
Due to the convenience of shopping online, many people prefer to do business online. This is also beneficial for retailers, as it allows them to reach a wider customer base and offer various products at lower prices [11].
This paper aims to provide a global view of the various components that a platform can use to reinforce its consumer decision journey. It shows how the various components can be utilized to enhance the platform's effectiveness.
The prediction module can be used in various aspects of a website, such as the prediction of the user's future purchase. The goal of this paper is to develop a platform that will allow customers to act on their needs instantly.

Fig. 3. The activity diagram of the predicting system based on the client's purchasing.
The activity diagram is shown in Fig. 3. The flow is initiated once a potential client browses the website and completes a purchase or registration. It is usually triggered by a visit to the website's homepage.

Fig. 5. Results of PyCaret's model comparison method.
The best models based on PyCaret with the best metrics are:
• Light Gradient Boosting Machine - LightGBM is a gradient boosting framework based on decision trees; it increases the efficiency of the model and reduces memory usage [7].
• Gradient Boosting Classifier - GBC combines many weak learning models to create a strong predictive model. Decision trees are usually used when doing gradient boosting [8].
• Random Forest Classifier - RF consists of a large number of individual decision trees that operate as an ensemble. Each tree outputs a class prediction, and the class with the most votes becomes the model's prediction.

Fig. 6. Representation of the results from the three top algorithms based on PyCaret.

Table 1. Feature names and types.