Development of the principles of parallelization of the symmetric AES encryption algorithm

This article delves into the potential of modern computational systems and technological advancements in refining encryption algorithms, with a particular focus on the Advanced Encryption Standard. The paper underscores the significance of understanding the intricate mechanics and primary processing stages of AES to harness its full potential for parallelization and acceleration. The research emphasizes the pivotal role of Graphics Processing Units in this optimization journey. Given their multi-threaded architecture, GPUs, when integrated with state-of-the-art parallel computing platforms like CUDA and OpenCL, can process vast data volumes with heightened efficiency. This integration not only speeds up the encryption process but also enhances its robustness. The article concludes with a forward-looking perspective, anticipating a deeper convergence between the realms of cryptography and parallel computing. Such a fusion is projected to usher in revolutionary enhancements in both the security and performance facets of encryption systems.


Introduction
The Advanced Encryption Standard (AES) algorithm was introduced following meticulous research and selection conducted by the U.S. National Institute of Standards and Technology. The primary objective behind the development of this new standard was to establish an algorithm capable of replacing the increasingly obsolete and less secure Data Encryption Standard (DES). Adopted in the 1970s, DES served as a reliable encryption standard for many years. However, with the advancement of computational capabilities and cryptanalysis methods, it became evident that DES could be broken by brute-force key attacks within feasible time frames. This realization spurred the creation of a newer, more robust algorithm.
Key characteristics of AES:
-AES operates as a symmetric cipher, implying that the same key is employed for both encryption and decryption processes. This differentiates AES from asymmetric ciphers, where distinct keys are used for encryption and decryption.
-Instead of encrypting data character by character, AES encrypts in blocks. This means it takes an input data block of a fixed length (128 bits) and transforms it into an encrypted block of the same length. If the input data's size is not a multiple of the block size, it is padded to the required length.
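The padding just mentioned can be sketched in Python. PKCS#7 is one widely used padding scheme (AES itself does not prescribe a particular one); this minimal illustration assumes the 128-bit (16-byte) AES block size:

```python
BLOCK = 16  # AES block size: 128 bits


def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Append n copies of the byte n, where n is the number of bytes
    needed to reach a multiple of the block size (always 1..block)."""
    n = block - len(data) % block
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes) -> bytes:
    """Strip the padding by reading its length from the final byte."""
    return padded[:-padded[-1]]
```

Note that input whose length is already a multiple of 16 bytes still receives a full extra block of padding, which keeps unpadding unambiguous.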
Today, AES stands as one of the most widely adopted and trusted encryption standards. Its symmetric and block-based structure ensures efficient and secure data encryption. When used correctly and with adherence to key size and operation mode recommendations, AES can provide a high level of security in contemporary information systems.

AES processing stages and their parallelization potential
The AES algorithm stands as a cornerstone in contemporary cryptographic systems. To grasp its mechanics and potential optimization avenues, it's imperative to delve into the primary data processing stages within AES and discern which of these can be parallelized [1].

Core processing stages of AES:
-SubBytes: at this juncture, each byte in the data block is substituted with a corresponding byte from a specialized table known as the S-box. This substitution operates on the principle of nonlinear transformation.
-ShiftRows: this stage embodies a permutation operation in which the state rows are cyclically shifted by a number of positions equal to their row number.
-MixColumns: during the MixColumns phase, each state column is multiplied by a fixed polynomial over the Galois field GF(2^8), ensuring additional data intermixing.
-AddRoundKey: throughout this stage, the state block is added element-wise (modulo 2, i.e. XORed) with the current round key.
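The four stages above can be sketched compactly in Python, with the state held as four rows of four bytes as in FIPS-197. ShiftRows, MixColumns, and AddRoundKey follow the standard; the S-box here is a simple invertible placeholder, since the real 256-entry AES S-box table is omitted for brevity, so `aes_round` illustrates the structure of a round rather than producing real AES output:

```python
# Placeholder S-box: the real AES S-box is a fixed 256-entry table (FIPS-197);
# a simple invertible byte map stands in here to keep the sketch short.
SBOX = [(b * 7 + 1) % 256 for b in range(256)]  # NOT the real AES S-box


def xtime(a):
    """Multiply by x (i.e. by 2) in GF(2^8) with the AES reduction polynomial."""
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a


def sub_bytes(state):
    """Substitute every byte independently through the S-box."""
    return [[SBOX[b] for b in row] for row in state]


def shift_rows(state):
    """Rotate row r left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def mix_single_column(col):
    """Multiply one column by the fixed MixColumns matrix [2 3 1 1; 1 2 3 1; ...]."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3,   # 2*a0 + 3*a1 + a2 + a3
        a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3,   # a0 + 2*a1 + 3*a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3,   # a0 + a1 + 2*a2 + 3*a3
        xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3),   # 3*a0 + a1 + a2 + 2*a3
    ]


def mix_columns(state):
    """Apply mix_single_column to each of the four columns independently."""
    cols = [mix_single_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


def add_round_key(state, round_key):
    """XOR the state with the round key, byte by byte."""
    return [[b ^ k for b, k in zip(srow, krow)]
            for srow, krow in zip(state, round_key)]


def aes_round(state, round_key):
    """One main round: SubBytes, ShiftRows, MixColumns, AddRoundKey."""
    return add_round_key(mix_columns(shift_rows(sub_bytes(state))), round_key)
```

The MixColumns test vector below is the worked example from FIPS-197: the column (db, 13, 53, 45) mixes to (8e, 4d, a1, bc).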
Parallelization potential across stages:
-SubBytes: given that each byte is processed independently from the others, this stage is perfectly suited for parallelization. Vectorization and SIMD (Single Instruction, Multiple Data) operations can be employed for simultaneous processing of multiple bytes.
-ShiftRows: although the ShiftRows operation is a straightforward permutation, it can also be optimized by processing each row of the data block in parallel.
-MixColumns: as each block column is processed independently, this stage can be executed in parallel on multiprocessor systems.
-AddRoundKey: since each state block byte is independently added to the corresponding round key byte, this stage too can be parallelized.
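Beyond the intra-block parallelism listed above, whole blocks can be processed concurrently when the mode of operation makes them independent (ECB, or keystream generation in CTR). A minimal Python sketch of that block-level dispatch, using a stand-in XOR transform rather than real AES (the function names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # AES block size in bytes


def encrypt_block(block: bytes, key: bytes) -> bytes:
    """Stand-in for a real per-block AES encryption (a plain XOR here)."""
    return bytes(b ^ k for b, k in zip(block, key))


def encrypt_blocks_parallel(data: bytes, key: bytes, workers: int = 4) -> bytes:
    """Split data into 16-byte blocks and encrypt them concurrently.
    Valid only for modes in which blocks are independent (e.g. ECB/CTR);
    chained modes such as CBC cannot be encrypted this way."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lambda blk: encrypt_block(blk, key), blocks))
```

In CPython, a thread pool pays off mainly when the per-block work releases the GIL (as a native crypto library would); for CPU-bound pure-Python work, process pools or GPU offload serve the same dispatch pattern.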
To achieve peak performance when deploying the AES algorithm on modern multiprocessor and multithreaded systems, it's critically essential to comprehend its operational mechanics and parallelization opportunities at various processing stages. Such understanding will facilitate effective workload distribution, leading to a significant acceleration in the encryption process [2,3].

Leveraging graphics processors for AES transformations
Graphics Processing Units (GPUs) have long transcended their original role as mere hardware components for graphics rendering. With their burgeoning computational capabilities, they have found applications in diverse domains, ranging from scientific research to cryptography. One notable arena where GPUs have been harnessed is in accelerating encryption algorithms, such as AES [4,5].
Modern GPUs boast a plethora of cores, each capable of executing instructions independently of the others. This broadly SIMD-style architecture (termed SIMT in NVIDIA's terminology) empowers GPUs to process a vast number of data streams concurrently [6,7]. This makes them particularly apt for tasks demanding simultaneous processing of substantial data volumes, such as encryption.
In the realm of parallel computing, NVIDIA's CUDA (Compute Unified Device Architecture) stands out as a revolutionary platform and programming model. Parallel computing has long been a focal point for researchers aiming at higher computational speeds and efficiency, and with the advent of powerful Graphics Processing Units the potential for parallel processing has expanded exponentially [8,9]. NVIDIA, a global leader in GPU production, recognized the untapped potential of GPUs for tasks beyond graphics rendering; this led to the development of CUDA, a platform designed specifically to leverage the parallel processing capabilities of NVIDIA GPUs. While several parallel computing platforms are available, CUDA is uniquely optimized for NVIDIA hardware, ensuring that developers can extract maximum performance from NVIDIA GPUs when using it. One of CUDA's standout features is the direct access it grants applications to both the physical and virtual resources of the GPU [10,11]: memory allocation, data transfer, and computational tasks can be managed and optimized at a granular level, leading to enhanced performance and efficiency. CUDA also provides a versatile programming model, allowing developers to write parallel code in familiar languages such as C, C++, and Fortran, so a broad community of developers can harness the power of parallel computing without a steep learning curve. By offering direct access to GPU resources and being intricately tailored for NVIDIA products, CUDA ensures that the immense computational power of modern GPUs can be fully harnessed for a wide range of applications.
In the ever-evolving landscape of parallel computing, the need for a universal standard that can cater to various hardware platforms is paramount. OpenCL (Open Computing Language) emerges as a solution, offering a platform-agnostic approach to parallel programming. Recognizing the need for a unified programming interface that bridges different hardware platforms, the OpenCL standard was introduced with the objective of providing developers a toolkit that remains consistent irrespective of the underlying hardware. OpenCL stands out for its ability to operate across a multitude of platforms: Graphics Processing Units, Central Processing Units, or even Field-Programmable Gate Arrays. Its strength lies in widespread acceptance; numerous hardware manufacturers endorse and support OpenCL, so developers can rely on it for consistent behavior across devices. One of its most significant advantages is that it facilitates the development of universal code: developers can write programs without being tethered to a specific hardware architecture, and the same codebase can be executed on various hardware setups with minimal modifications. OpenCL thus remains a preferred choice for developers aiming at cross-platform compatibility and performance [12,13].
The Advanced Encryption Standard stands as a pivotal encryption algorithm in modern cryptography. Its block-based structure inherently suggests potential avenues for parallel processing, and with the advent of powerful GPUs there is an opportunity to harness this parallelism for improved encryption speeds. AES operates on data blocks, applying a series of transformations to each block. This block-wise operation, coupled with the need to process multiple blocks, especially in large datasets, makes it a candidate for parallel processing [14,15]. The algorithm comprises several stages (SubBytes, ShiftRows, MixColumns, and AddRoundKey), each applying a specific transformation that contributes to the overall encryption of the data block. Modern GPUs are designed with a multitude of cores capable of executing tasks concurrently, an architecture particularly beneficial for tasks that can be broken down into parallel operations, such as the transformations in AES.
Parallelizing AES stages on GPUs:
-SubBytes: given its byte-level operation, this transformation can be distributed across multiple GPU cores for simultaneous processing.
-ShiftRows and MixColumns: these involve permutations and mixing operations, respectively. Both can be efficiently parallelized on a GPU, processing multiple rows or columns concurrently.
-AddRoundKey: this bitwise operation can also be parallelized, with each byte of the data block being processed independently.
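The per-byte independence of SubBytes can be mimicked on a CPU with a single bulk table lookup, which is essentially what each GPU thread does for its own byte. A Python sketch using `bytes.translate` with a placeholder table (not the real AES S-box):

```python
# Placeholder 256-entry substitution table; the real AES S-box is a fixed
# table defined in FIPS-197 and would drop in here unchanged.
TABLE = bytes((b * 7 + 1) % 256 for b in range(256))


def sub_bytes_bulk(data: bytes) -> bytes:
    """Substitute every byte through the table in one bulk pass; the CPU
    analogue of one GPU thread per byte indexing the S-box."""
    return data.translate(TABLE)
```

`bytes.translate` runs the whole lookup in native code, so the substitution cost per byte is far lower than a Python-level loop, mirroring how a GPU amortizes the same lookup across thousands of threads.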
Implementing AES on GPUs can lead to significant acceleration in the encryption process. Preliminary tests indicate that, especially for large datasets, GPU-accelerated AES can achieve substantial speed-ups compared to traditional CPU-based implementations. The block-based structure of AES, combined with the parallel processing capabilities of GPUs, offers promising avenues for enhancing encryption speeds. As data volumes continue to grow, leveraging GPUs for AES encryption will become increasingly crucial for real-time secure data transmission.
The incorporation of GPUs in the realm of cryptography, especially in AES encryption, heralds new frontiers in performance and processing speed [16,17]. Platforms like CUDA and OpenCL enable developers to tap into the full computational prowess of GPUs, rendering encryption swifter and more efficient [18] than ever before.

Conclusion
The contemporary advancements in technology and computational systems have ushered in a plethora of opportunities for refining and optimizing encryption algorithms, notably the Advanced Encryption Standard. A deep dive into the mechanics of AES and its primary processing stages underscores the criticality of comprehending the algorithm's internal structure. Such understanding is pivotal for its effective parallelization and acceleration.
In this context, Graphics Processing Units emerge as instrumental assets. Their multithreaded architecture, when synergized with cutting-edge parallel computing platforms like CUDA and OpenCL, facilitates the efficient processing of vast data volumes. This, in turn, expedites and fortifies the encryption process.

Fig. Principles of AES parallelization.