Parallel Computing for Numerical Analysis of a Fan Assembly Subjected to a SPH Bird

Smoothed Particle Hydrodynamics (SPH) is widely adopted to predict bird-strike events. To improve the parallel efficiency of the SPH approach, distributed computing was applied to the simulation of a bird striking a fan assembly. Because the decomposition algorithm inherently produces cube-shaped domains aligned with the coordinate axes, which may result in low computational efficiency, the effect of customized data partitioning on efficiency is investigated. The results show that customized decompositions can minimize communication between processors and maintain load balance during the simulation. In addition, distributed computing with domain decomposition gives reasonable predictions of soft-impact damage, with results within 7% of the reference data derived from shared-memory computing.


Introduction
Bird-strike events present a clear hazard to the safety of aircraft. Aviation authorities require that a certain level of bird-strike resistance be substantiated in certification tests of aircraft components [1]. However, bird-strike experiments are expensive, time-consuming and challenging to conduct. In addition, the experimental parameters are difficult to record because of the very high speeds and energies involved. With the development of advanced numerical techniques and the advent of high-performance computers, numerical techniques have been used more widely since the 1980s to capture the soft-impact damage of aircraft components [2]. SPH is well suited to industrial problems involving large deformations [3]. The SPH method is recommended for simulating the bird-strike process because of its high stability, low cost and good correlation with experimental observations in terms of scattering particles [1]. However, SPH simulations carry a considerably higher computational cost [4]. Reducing the computational time without a significant loss of accuracy is therefore critical for the SPH technique [4]. This paper aims to improve the efficiency of the SPH approach. Since the data partitioning algorithm plays a key role in the efficiency of the simulation, this study focuses on determining the most suitable decomposition. Customized decompositions are proposed to improve the efficiency of distributed computing. Furthermore, the effect of distributed computing with domain decomposition on the consistency of the numerical results is discussed.

Parallel computations
Parallel computing breaks a large problem down into smaller pieces and assigns one to each processor. The speed of distributed computing depends not only on the high-performance computer but also on the domain decomposition approach.

Parallel computing environment
The numerical analysis was run on the high-performance computer Magic Cubic-Ⅱ of Shanghai Supercomputer Centre. Figure 1 illustrates the cluster architecture of the high-performance computer system. Under the Linux operating system, a dual 12-core Xeon® E5-2680 v3 at 2.5 GHz with 128 GB of memory was used for the parallel computation of the bird-strike simulation.

Domain decomposition method
In a distributed-computing simulation, the first step is to divide the given problem into subdomains according to the number of compute nodes assigned. The Domain Decomposition Method (DDM) divides a large-scale problem into such subdomains. The decomposed pieces of the model are assigned to, and initialized by, their respective processors. Data are transmitted and exchanged between processors during each time step. Figure 2 shows the computational process of DDM. Each partition consists of internal and boundary nodes, and communication between processors is achieved by exchanging the information of the boundary nodes. In this study, Recursive Coordinate Bisection (RCB) is used for partitioning. First, the largest dimension of the model along the coordinate directions is determined. Then, to weight the division by element count, the number of elements is counted along this longest dimension. Finally, the model is cut at the resulting dividing plane into two subdomains with roughly equal numbers of elements. The resulting portions are then split recursively along their own largest dimensions until each processor is allocated a domain with roughly the same number of elements [3].
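The three RCB steps described above can be sketched in a few lines. This is a minimal illustration, not the partitioner used in the study: the function name `rcb`, the use of element centroids, and the lattice test data are all assumptions made for the example, and every element is given the same weight.

```python
import numpy as np

def rcb(coords, ids, n_parts):
    """Recursively bisect elements into n_parts groups of near-equal count.

    coords: (N, 3) array of element centroids (assumed input format).
    ids:    integer array of the element indices in the current subdomain.
    """
    if n_parts == 1:
        return [ids]
    pts = coords[ids]
    # Step 1: find the longest extent of this subdomain along x, y or z.
    axis = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
    # Step 2: order the elements along that axis (equal weight per element).
    order = ids[np.argsort(pts[:, axis], kind="stable")]
    # Step 3: cut so each side gets a share proportional to its processor count.
    left_parts = n_parts // 2
    cut = len(order) * left_parts // n_parts
    return (rcb(coords, order[:cut], left_parts)
            + rcb(coords, order[cut:], n_parts - left_parts))

# Hypothetical element centroids: an 8 x 4 x 2 lattice of unit cells.
grid = np.array([[x, y, z] for x in range(8) for y in range(4) for z in range(2)],
                dtype=float)
parts = rcb(grid, np.arange(len(grid)), 4)
sizes = [len(p) for p in parts]
print(sizes)
```

Because every element carries the same weight here, the sketch balances element count only; it does not distinguish expensive elements from cheap ones.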

Load balancing
Because it weights the division only by element count, halving the model with the default RCB algorithm cannot ensure load balance in distributed computing when the resulting portions contain different numbers of SPH elements. On the one hand, SPH elements have a higher computational cost than other element types [3], so they need to be distributed evenly across the compute nodes. To illustrate this, consider a simple model consisting of 4 SPH elements and 12 solid elements (see figure 3) run on 4 CPUs. The decomposition with evenly distributed SPH elements gives higher parallel performance. On the other hand, the amount of communication needed between processors must also be considered.
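The 4-CPU example can be made concrete with a small cost model. The sketch below is illustrative only: the relative cost of 10 for an SPH element is an assumed value (the source states only that SPH elements cost more), and the two partition layouts are hypothetical stand-ins for the two cases, not a transcription of figure 3.

```python
SPH_COST = 10.0    # assumed relative cost of one SPH element (hypothetical value)
SOLID_COST = 1.0   # reference cost of one solid element

def balance(partitions):
    """Return (time per step, parallel efficiency) for a list of
    (n_sph, n_solid) element counts, one tuple per processor."""
    loads = [n_sph * SPH_COST + n_solid * SOLID_COST for n_sph, n_solid in partitions]
    slowest = max(loads)  # every processor waits for the slowest one
    return slowest, sum(loads) / (len(loads) * slowest)

# Equal element count per processor, but all SPH elements on one processor.
by_count = [(4, 0), (0, 4), (0, 4), (0, 4)]
# SPH elements spread evenly: 1 SPH + 3 solid elements per processor.
by_sph = [(1, 3)] * 4

count_step, count_eff = balance(by_count)
sph_step, sph_eff = balance(by_sph)
print(count_step, round(count_eff, 3))
print(sph_step, sph_eff)
```

Both layouts give each processor four elements, yet under the assumed cost ratio the first leaves three processors mostly idle while the second keeps all four equally busy. Communication cost is deliberately left out of this model.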

Numerical model
A model of an SPH bird striking a rotating fan assembly meshed with solid elements was established. The bird was aimed at 70% of the blade height, with an impact speed of 103 m/s. The fan assembly had an initial rotational velocity of 542 rad/s. Figure 4 shows the bird-strike simulation model. The bird was represented by a hemispherical-ended cylinder with a length-to-diameter ratio of 2:1 and a null material constitutive model. The Mie-Grüneisen equation of state (EOS) was used to model the compressibility of the bird material. The total mass of the bird was 1.85 kg, with a length of 260 mm. The bird was modelled using the SPH method, with an inter-particle spacing of 2 mm. The number of SPH elements was 243,512. Twenty-four evenly spaced blades were attached to a fan disc. Each blade was meshed with 4650 solid elements. The total number of fan-assembly solid elements was 165,200. The fully bladed fan rotor is made of titanium alloy Ti-6Al-4V, described by the empirical Johnson-Cook material model. The simulation was run on 16 CPUs with LS-DYNA MPP. Communication was performed with Platform-MPI (Message Passing Interface). The termination time of the bird-strike simulation was 4 ms.

Decompositions of numerical model
Figure 5a shows the standard decomposition of the bird-strike simulation model obtained with the RCB method. The number of SPH particles assigned to each processor clearly varies over a wide range. Since SPH elements have a significant impact on the calculation cost, the efficiency is directly linked to the SPH computation [3]. The standard decomposition used for finite elements therefore results in poor load balance. To achieve better load balancing, the SPH elements should at least be distributed evenly across all compute nodes. In the "SPHdist" decomposition shown in figure 5b, the SPH part is decomposed separately and its elements are allocated across all processors, resulting in regular cube-shaped domains of the SPH model. Given the weight of the SPH element cost, "SPHdist" may seem a good choice. However, the RCB method inherently tends to generate cube-shaped domains aligned with the coordinate directions, which is often not the desired behaviour, and the resulting domains may give poor speedup. To minimize communication between processors and ensure load balancing, it is desirable to define customized decompositions.
As shown in figure 6, the SPH part was extracted from the simulation model and three new decompositions of the bird model were developed, named "SPH Z-slice" (see figure 6a), "SPH XY-slice" (see figure 6b) and "SPH YZ-slice" (see figure 6c). A set of coordinate transformation functions is applied to scale the initial dimensions, so that the SPH model is decomposed along the intended direction. With these customized decompositions of the SPH model, the fan-assembly model is decomposed by default (the fan-assembly decomposition in figure 5b is used). For the fan assembly, the model is converted into a cylindrical coordinate system before decomposition. The resulting domains are "cubes" in cylindrical coordinates, so cylindrical shells are generated as domains. With these customized decompositions of the fan assembly, the "SPHdist" decomposition is used for the SPH model.
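The idea of transforming coordinates before partitioning can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names `scale`, `to_cylindrical` and `slab_partition` are hypothetical, the rotation axis is assumed to be z, the point sets are synthetic, and the partitioner cuts equal-count slabs along the longest transformed axis instead of performing a full recursive bisection.

```python
import numpy as np

def scale(sx, sy, sz):
    """Return a transform that stretches coordinates before partitioning."""
    factors = np.array([sx, sy, sz], dtype=float)
    return lambda xyz: xyz * factors

def to_cylindrical(xyz):
    """Map (x, y, z) to (r, theta, z), assuming the rotation axis is z."""
    r = np.hypot(xyz[:, 0], xyz[:, 1])
    theta = np.arctan2(xyz[:, 1], xyz[:, 0])
    return np.column_stack([r, theta, xyz[:, 2]])

def slab_partition(xyz, n_parts, transform):
    """Cut into n_parts equal-count slabs along the longest transformed axis."""
    t = transform(xyz)
    axis = int(np.argmax(t.max(axis=0) - t.min(axis=0)))
    order = np.argsort(t[:, axis], kind="stable")
    owner = np.empty(len(xyz), dtype=int)
    for rank, chunk in enumerate(np.array_split(order, n_parts)):
        owner[chunk] = rank
    return axis, owner

# Hypothetical bird particles: an 8 x 4 x 4 lattice, longest along x.
bird = np.array([[x, y, z] for x in range(8) for y in range(4) for z in range(4)],
                dtype=float)
default_axis, _ = slab_partition(bird, 4, scale(1, 1, 1))     # cuts along x
z_axis, z_owner = slab_partition(bird, 4, scale(1, 1, 100))   # forced to cut along z

# Hypothetical disc nodes: 8 radii x 12 angles in the plane z = 0.
radii = np.repeat(np.arange(1.0, 9.0), 12)
angles = np.tile(np.arange(12) * 2 * np.pi / 12, 8)
disc = np.column_stack([radii * np.cos(angles), radii * np.sin(angles), np.zeros(96)])
shell_axis, shell_owner = slab_partition(disc, 4, to_cylindrical)  # radial shells
```

Stretching one axis changes which direction the partitioner regards as longest, so the cuts follow the chosen direction without changing the model itself. In the cylindrical case the same slab logic produces shells of constant radial thickness in element count, because the cut is made in (r, theta, z) space.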

Results and discussion
First, the speedup of the customized decompositions was studied. Second, the consistency of the results from the developed decompositions was assessed by comparison with the result derived from shared-memory computing (SMP LS-DYNA).

Speed up
Taking the calculation cost of the "SPHdist" decomposition as the baseline, figure 8 shows the speedup of the optimized decompositions. "SPH XY-slice" produces the best speedup, with the calculation time reduced by 25% compared with "SPHdist". Among the modified decompositions of the fan assembly modelled with solid elements, the "Concentric" decomposition provides the highest efficiency, with a speedup of 83% compared with the default decomposition (see figure 5b). A properly defined domain decomposition can therefore effectively improve the calculation efficiency of the numerical model.
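A reduction in calculation time and a speedup ratio are related but not interchangeable quantities. The sketch below converts between them, assuming the common definition of speedup as baseline time divided by new time; the function names are hypothetical and the source does not state which definition underlies its reported percentages.

```python
def speedup_from_time_reduction(reduction):
    """Speedup ratio T_base / T_new for a fractional time reduction in [0, 1)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return 1.0 / (1.0 - reduction)

def time_reduction_from_speedup(ratio):
    """Fractional time reduction 1 - T_new / T_base for a speedup ratio >= 1."""
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1")
    return 1.0 - 1.0 / ratio

# The 25% time reduction reported for "SPH XY-slice" relative to "SPHdist".
xy_slice_ratio = speedup_from_time_reduction(0.25)
print(round(xy_slice_ratio, 3))
```

Under this definition, a 25% shorter run time corresponds to a speedup ratio of about 1.33.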

Consistency of results
The impact force of the bird and the effective plastic strains of the blades are adopted to evaluate the soft-impact damage during the bird-strike process. To assess the consistency of the numerical results, the reference data are derived from shared-memory computing, in which the simulation model is not decomposed into subdomains assigned to cluster nodes. Figure 9 shows the numerical results of the simulation. Distributed computing with customized decompositions achieves a consistent impact force, with the peak impact force within 2% of the reference data obtained by shared-memory computing. Distributed computing has a slightly larger effect on the effective plastic strains of the fan blades than on the impact force of the bird, although the strains remain within 7% of the reference data. Notably, the modified decompositions "SPH XY-slice" and "Concentric", which have the higher calculation efficiency, show larger deviations in effective plastic strain, of 6.32% and 5.76%, respectively.

Conclusions
In this study, to improve the speed of distributed computing, the response of a fan assembly during a bird strike was simulated using modified decompositions. Specifically modified decompositions of both the SPH elements and the solid elements significantly improve the computational efficiency. For the SPH part of the bird, customized decompositions obtained by scaling the initial dimensions, which reduce the boundaries between domains, are more efficient. For the fan assembly modelled with solid elements, domains generated in cylindrical coordinates show better computational efficiency than cube-shaped domains aligned with the coordinate axes.
Moreover, compared with the reference data, distributed computing with modified domains achieves results within 7%. It is worth noting that the customized decompositions with higher efficiency deviate more from the reference data. The consistency of the numerical results should therefore be checked when defining modified decompositions to improve efficiency and ensure better load balance.