A survey of cache placement algorithms in content delivery networks

Abstract. This paper analyzes cache placement algorithms in content delivery networks (CDNs), which are important for increasing content delivery speed, reducing network traffic, and ensuring scalability. A CDN is a geographically distributed network infrastructure deployed on top of the Internet that aims to deliver content to users reliably while reducing traffic on the networks of communication operators. Content delivery networks are an essential element of today's Internet infrastructure for the fast and secure delivery of content around the world. A CDN uses local cache servers to keep copies of content from a central origin server closer to end users, which reduces latency and increases content delivery speed. Cache placement algorithms determine on which cache server content is stored and where copies are placed. As the demand for high-quality content continues to grow, caching algorithms play an increasingly important role in efficiently delivering content to users around the world. This article analyzes caching algorithms in CDNs, including their advantages, disadvantages, and application areas.


Introduction
A content delivery network is a geographically distributed network infrastructure deployed on top of the Internet that aims to deliver content to users reliably while reducing traffic in the networks of communication operators (CDN operators) (Fig. 1) [1]. A content delivery network is an essential element of today's Internet infrastructure for the fast and secure delivery of content around the world. When a content delivery network is used, data from a central origin server is copied to regional cache servers.
Each cache server stores a fully or partially updated copy of frequently requested data. A router located next to the cache server communicates with the provider's local networks and distributes content to end users over the shortest network route, taking server availability into account. The current state of CDN development can be characterized by the following trends: growth in the number of content delivery network nodes worldwide, the increasing number of mobile device users, the development of CDN technologies and algorithms, security issues, and data distribution models.
CDN technologies have continued to evolve over the years, and innovations in content delivery continue to emerge today, including improved caching algorithms, increased bandwidth, and more efficient management of network infrastructure.
Demand for global CDNs is also expected to grow, especially in developing regions with increasing Internet penetration and user numbers. CDN operators and content providers continue to work together to offer the efficiency and quality of service that form the core mission of the CDN industry.
The Internet was created as a means of communication between two nodes and is now evolving from a communication network into a global business, entertainment, and media platform. As billions of different devices go online every day, the amount of data stored on the network has already reached hundreds of exabytes. Since the Internet was not originally designed to provide users with reliable and fast access to a wide variety of digital information, many solutions have been needed in recent decades to adapt it to modern conditions [2].
One such solution is caching mechanisms [3]. The essence of these solutions is to create a relatively small buffer with fast access that holds the information in highest demand. CDNs are considered here as an example of a caching network.
A CDN is a network consisting of a collection of strategically located servers that provide content replication and distribution, reliability, and good performance [4,5]. The first CDN, Akamai [6], was founded in 1999 at the Massachusetts Institute of Technology (MIT) as part of a research project to solve the problem of server downtime due to traffic spikes. CDNs from large organizations such as Akamai and Amazon CloudFront [7] charge content providers for content delivery and give them complete control over the content caching process. Typically, content is copied from a content provider and distributed to CDN servers, and users access the content from the CDN. A CDN is an efficient solution for content providers because it takes over responsibility for content hosting and distribution. A large number (thousands) of geographically distributed content servers belonging to a CDN provide high availability, ease of use, and low latency for users [8].
In a conventional network, a user's request is directed to a domain name server (DNS), which translates a website's name into its IP address, and the request is then directed to an origin server, which responds to the user's request. In the CDN architecture, the CDN is responsible for distributing the content of the origin server among its content servers (Fig. 2).

Fig. 2. CDN architecture
When a user request reaches the DNS, it is directed to the Request Routing System (RRS), which is responsible for directing users to the appropriate CDN content server [9]. The choice of content server depends on content availability, the distance between the user and the content server, delivery cost, and load balancing. To optimize content delivery, the CDN performs network measurements to update content location and network status information.
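To make these selection criteria concrete, the following minimal Python sketch scores candidate servers by a weighted combination of distance, delivery cost, and load. The weights, server names, and fields are illustrative assumptions, not part of any specific RRS implementation:

```python
from dataclasses import dataclass

@dataclass
class ContentServer:
    name: str
    has_content: bool   # content availability
    distance_km: float  # network distance to the user
    cost_per_gb: float  # delivery cost
    load: float         # current load, 0.0 (idle) to 1.0 (saturated)

def select_server(servers, w_dist=0.5, w_cost=0.2, w_load=0.3):
    """Return the best candidate by a weighted score (lower is better)."""
    candidates = [s for s in servers if s.has_content and s.load < 1.0]
    if not candidates:
        return None  # fall back to the origin server
    return min(
        candidates,
        key=lambda s: (w_dist * s.distance_km / 1000
                       + w_cost * s.cost_per_gb
                       + w_load * s.load),
    )

servers = [
    ContentServer("edge-eu", True, 120, 0.02, 0.7),
    ContentServer("edge-us", True, 7600, 0.01, 0.2),
]
print(select_server(servers).name)  # "edge-eu": proximity outweighs its load
```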
In CDN networks, cache placement and cache replacement processes are critical to ensuring the speed and reliability of content delivery to users around the world. Cache deployment is the distribution of CDN caches across geographic regions to increase content availability and reach for users. Cache placement algorithms are critical to a CDN because they determine on which cache server content is stored and where copies are placed. These algorithms aim to improve content delivery speed, reduce network traffic, and ensure scalability.

Materials and methods
Caching technologies are used in content delivery networks to distribute video services in network segments as close as possible to users. Since cache capacity is limited compared to server capacity, caches need to store the most requested content. Caching only the most highly requested content significantly reduces access latency, server overload, the amount of data that must be cached, and the resource costs required for caching.
Figure 3 shows the steps of delivering content through video service caches. In the typical case of a video service, a user requests a video from a server; if the requested video is available in the cache serving that user, the user retrieves the video from the cache. If the video is not available in the cache, the user's request is redirected to the remote server. The first case is called a cache hit and the second a cache miss. The cache server updates its contents by replacing existing cached videos with newly requested ones, a process controlled by the cache update policy.

Fig. 3. Content delivery process
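This hit/miss flow can be sketched in a few lines of Python. The LRU eviction used here is one possible cache update policy, chosen only for illustration; the class and function names are hypothetical:

```python
from collections import OrderedDict

class VideoCache:
    """Fixed-capacity cache with an LRU update policy (one possible policy)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # video_id -> video bytes

    def request(self, video_id, fetch_from_origin):
        if video_id in self.store:              # cache hit
            self.store.move_to_end(video_id)    # mark as most recently used
            return self.store[video_id]
        video = fetch_from_origin(video_id)     # cache miss: go to remote server
        self.store[video_id] = video
        if len(self.store) > self.capacity:     # evict least recently used
            self.store.popitem(last=False)
        return video

cache = VideoCache(capacity=2)
origin = lambda vid: f"<bytes of {vid}>"
for vid in ["a", "b", "a", "c", "b"]:
    cache.request(vid, origin)
print(list(cache.store))  # ['c', 'b'] -- 'a' was evicted
```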
The content delivery process described above has a simple structure and is designed for simple cache servers. If the content of the requested service is not available in the local cache server, the user's request is directed straight to the main server (for example, in a video service the request goes to the video server; such a service may require large resources depending on the video format), and serving such a request consumes the resources of the network and other related elements. In caching with a more complex structure, the content of the provided service is cached hierarchically, which maximizes caching priorities. An example of hierarchical caching is shown in Fig. 4a, where caches are arranged as a multi-level tree. In the distributed caching shown in Fig. 4b, caches are placed at the network boundary; in such an architecture, consisting of several networks, cache servers serve each other, i.e., if the content requested by a user is not available in the cache server of that network, it is provided through the cache servers of neighboring networks. Such an architecture requires additional communication overhead and the ability to deliver up-to-date information about content updates to each cache server [10].
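A hierarchical (tree) lookup of the kind shown in Fig. 4a can be illustrated as follows. This is a minimal sketch assuming each cache simply forwards misses to its parent; all names are hypothetical:

```python
class CacheNode:
    """A node in a hierarchical (tree) cache: leaf caches have parents."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.store = set()

    def get(self, item):
        node = self
        while node is not None:
            if item in node.store:   # hit at this level
                return node.name
            node = node.parent       # miss: try the next level up
        return "origin"              # miss everywhere: go to the origin server

root = CacheNode("regional")
leaf = CacheNode("edge", parent=root)
root.store.add("video-42")
print(leaf.get("video-42"))  # "regional": served one level above the edge
print(leaf.get("video-99"))  # "origin"
```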

Results and discussion
Content placement algorithms determine at which routers along the path from the actual content source to the user copies are stored [11, 12]. On-path cache placement algorithms are important for Internet service providers (ISPs) to reduce traffic and achieve good CDN performance [13, 14]. The Leave Copy Everywhere (LCE), Leave Copy Down (LCD), and Prob algorithms [11, 15-17] are discussed below with respect to delivery time, and the operation of these algorithms is shown in Fig. 5.

Fig. 5. LCE, LCD and Prob Algorithm Implementation Process
Leave Copy Everywhere (LCE). LCE is a widely used caching algorithm in CDNs. Under LCE, every router stores every data packet that passes through it, as shown in Fig. 5, so the algorithm requires no additional caching parameters. However, LCE duplicates cached content on every router; as the content population grows, the required cache capacity grows with it [18]. With LCE, cache behavior is stable, and the algorithm copes well with cache overflow events.
On the other hand, the LCE algorithm has shortcomings in processing resources across multiple platforms. Besides duplicating content at every router along the route, it uses the entire content storage capacity, invokes many cache replacement operations, and expends considerable effort to complete a single request. This consumes more time than other caching algorithms, increasing delivery time; algorithms better suited to caching than LCE exist.
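A minimal sketch of the LCE rule, where every router along the delivery path keeps a copy of the object (capacity limits and eviction are omitted for brevity, and the names are illustrative):

```python
def deliver_lce(path, caches, item):
    """LCE: after a request, every router on the delivery path keeps a copy.

    `path` lists routers from origin to user; `caches` maps each router
    name to its set of stored items.
    """
    for router in path:
        caches[router].add(item)  # duplicate the object at every hop

caches = {"r1": set(), "r2": set(), "r3": set()}
deliver_lce(["r1", "r2", "r3"], caches, "obj")
print(caches)  # 'obj' is now replicated on r1, r2 and r3
```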
Leave Copy Down (LCD). The LCD cache placement algorithm stores a new copy of the requested object at only one router: the one immediately below the cache level where the object was found, as shown in Fig. 5. Across multiple caching levels, LCD scales well and avoids unnecessary duplication of the same objects at several levels. LCD also performs better under many interconnection topologies and workloads, especially when the caching behavior of multiple levels is taken into account [13]. However, latency may grow as the cache size increases.
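A sketch of the LCD rule just described: on each request, a copy is placed only at the router one level below the point where the object was found, so repeated requests pull the object toward the user one hop at a time. The path layout and names are illustrative:

```python
def deliver_lcd(path, caches, item):
    """LCD: copy the object only to the router one level below the hit point.

    `path` lists routers from the user up toward the origin; the copy is
    placed just below the first node that already holds the item.
    """
    for i, router in enumerate(path):
        if item in caches[router] or router == path[-1]:  # hit (or origin)
            if i > 0:
                caches[path[i - 1]].add(item)  # move one hop toward the user
            return router

caches = {"edge": set(), "mid": set(), "origin": {"obj"}}
for _ in range(2):  # each request moves the copy one router closer
    hit_at = deliver_lcd(["edge", "mid", "origin"], caches, "obj")
print(caches["edge"])  # {'obj'} after two requests
```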
Comparing LCD with LCE, LCD can be considered the more refined algorithm, because an object needs several requests to reach the lowest-level (leaf) cache: with each request, a new copy of the object moves one router closer to the user [15]. A particular advantage of the algorithm is the simplicity with which it reduces the amount of stored content, keeping the cache size suitable. As a result, LCD has less redundancy in routing, less work per request, greater simplicity, a suitable cache size, and more moderate resource use, which yields shorter delivery times than LCE.
Probabilistic Cache (Prob). Prob caches data packets at each router with a constant probability [11]. In this caching algorithm, the requested content is stored with a fixed probability p at each network node along the delivery path from the source to the client. When p = 1, the algorithm reduces to LCE (i.e., it places a copy of the content at every network node) [13, 14]. Prob is moderately complex and tries to minimize redundancy between network caches, as shown in Fig. 5, so the content is more likely to be found along the path for subsequent requests. It is the most stable algorithm in distributed networks. With Prob, the hop count between the user's router and the publisher is lower [11, 19]; a hop here is the distance to the nearest node holding a copy of the content object [20]. The result is a shorter distance between user and server, a smaller inventory of copies, and lower delivery time. The hop count falls because Prob frequently places content on nodes near the user(s).
Some studies note that cache duplication localized in router caches is a serious problem for publishing real-time data [21]. Researchers have found that the Prob cache placement algorithm can reduce cache redundancy on server hits by about 20% while reducing the total number of hops required to reach cached data by about 8%. It handles high packet loads on multiple platforms, supports many nodes, and is stable in most network settings.
Thus, the Prob algorithm is effective at significantly reducing redundant network traffic, which can result in minimal latency compared to the other placement algorithms.
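The Prob rule can be sketched directly from the description above: each on-path node caches the object with a fixed probability p, and p = 1 reproduces LCE. Names and the demo seed are illustrative:

```python
import random

def deliver_prob(path, caches, item, p=0.5, rng=random.random):
    """Prob: each on-path router caches the object with fixed probability p.

    With p = 1.0 this degenerates to LCE (a copy at every node).
    """
    for router in path:
        if rng() < p:
            caches[router].add(item)

random.seed(7)  # deterministic demo
caches = {"r1": set(), "r2": set(), "r3": set()}
deliver_prob(["r1", "r2", "r3"], caches, "obj", p=0.5)
print({r: sorted(c) for r, c in caches.items()})
```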
PProb. With PProb, the probability that a router caches content depends on its distance from the source server and on the total storage capacity of the route [21, 22]. Content should be cached closer to the destination, leaving caching space at the core of the network for shorter content flows. This algorithm makes efficient use of network resources, reduces cache redundancy, and therefore reduces network traffic overhead, as shown in Fig. 5 above [23]. The algorithm accepts a cache miss when an element is first requested; when an item within a particular flow is re-requested, PProb does not re-cache the data for the requested item and does not terminate any processes until the link is confirmed [22].
Thus, the advantage of this approach is that it reduces the number of misses that occur when objects are requested, and it shortens the time needed to retrieve objects on future requests. It supports multiple platforms, and the reference sequence for each node remains constant with the packet. Its disadvantage, however, is that it may require additional space to store the sequence of elements at different points in the cache [24].
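The cited sources do not give an exact formula here, so the following sketch only mirrors the qualitative behavior described above: the caching probability grows with a node's distance from the source (i.e., its proximity to the user) and with its share of the route's total cache capacity. The formula and parameter names are assumptions for illustration only:

```python
def pprob_cache_probability(hops_from_source, path_length,
                            route_capacity, node_capacity):
    """Illustrative PProb-style weight: favor nodes closer to the user
    (larger hops_from_source) and in proportion to this node's share of
    the route's total cache capacity. The exact formula in the cited
    papers may differ; this only mirrors the described behavior."""
    distance_factor = hops_from_source / path_length   # grows toward the user
    capacity_factor = node_capacity / route_capacity   # node's capacity share
    return min(1.0, distance_factor * capacity_factor * path_length)

# A 4-hop route where every node holds an equal share of the capacity:
for hop in range(1, 5):
    p = pprob_cache_probability(hop, 4, route_capacity=400, node_capacity=100)
    print(f"hop {hop}: cache with probability {p:.2f}")  # 0.25 ... 1.00
```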
At large cache sizes, PProb uses efficient queue management, and packets are pre-processed based on the number and priority of nodes [18, 25]. Some studies have concluded that PProb remains an adequate algorithm in terms of latency at a 10 GB cache size, and that latency increases when processing multimedia sources with larger cache sizes [15].
A comparison of these algorithms is presented in Table 1 below.

Conclusions
Cache placement algorithms are critical to a CDN because they determine on which cache server content is stored and where copies are placed. Cache deployment algorithms are used in a variety of applications, including video-on-demand, e-commerce, and gaming. As the demand for high-quality content continues to grow, caching algorithms play an increasingly important role in delivering content efficiently and quickly to users around the world. Caching algorithms for CDNs have been reviewed and explored in detail above, covering the widely used cache placement algorithms LCE, LCD, Rand, Prob, and PProb. Research shows that CDNs still need to improve by identifying the best caching algorithm in terms of latency, stability, multi-node processing, multi-platform implementation, and complexity.

Fig. 4. Caching structures: a) hierarchical caching and b) distributed caching

Table 1. Comparison of cache placement algorithms