Quality indicators in methodology for designing RAM units as part of microprocessor systems in digital devices used in environmental engineering

The article is devoted to the design of RAM blocks as part of microprocessor systems and to methods for ensuring their fault tolerance. The structural diagram of RAM and the effect of heavy charged particles (HCP) on a memory integrated circuit (IC) are considered. Particular attention is paid to the influence of the bipolar effect on the fault tolerance of IC elements, as well as to the resulting multibit events. The article analyzes the phases of RAM operation and the reactions of memory circuit elements to failures caused by environmental effects in digital devices used in environmental engineering.


Introduction
A typical block of static RAM consists of [1]: a) an array of storage cells; b) a decoding block consisting of 1) a row decoder and 2) a column decoder; c) an input/output block; d) control logic. A generalized block diagram of static RAM is shown in Fig. 1.
An array of storage cells usually occupies the bulk of the RAM area and can be structured into various levels of hierarchy, depending on the implementation features of a particular memory [2].
In simple cases, applicable to small blocks, the array has one-dimensional addressing, and a word is selected by row number only. Register files are typically built this way. From a fault-tolerance point of view, an array with one-dimensional addressing has the disadvantage that the cells of one word are adjacent to each other, which increases the likelihood of a double failure that the parity check will not detect. RAM blocks of several kbits are better implemented on the basis of arrays with two-dimensional addressing. A column selection is added to the row selection, so that several words are stored in one row with their bits interleaved. Although two-dimensional addressing does not provide a gain in area, it brings the block shape closer to a square and reduces the likelihood of double failures by dispersing the bits of the same word in space.
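The bit interleaving described above can be illustrated with a minimal sketch. The mapping below (bit index times words per row, plus the position of the word in the row) is one common interleaving scheme and is an assumption here, not necessarily the layout used in the blocks discussed; the function names and the value of `WORDS_PER_ROW` are hypothetical.

```python
def locate(address: int, bit: int, words_per_row: int) -> tuple:
    """Map (word address, bit index) to (row, physical column).

    Assumed scheme: bits with the same index from all words in a row are
    grouped together, so neighbouring columns hold bits of different words.
    """
    row, word_in_row = divmod(address, words_per_row)
    column = bit * words_per_row + word_in_row
    return row, column


def owner(column: int, words_per_row: int) -> tuple:
    """Inverse mapping: (word position in the row, bit index) of a column."""
    bit, word_in_row = divmod(column, words_per_row)
    return word_in_row, bit


WORDS_PER_ROW = 4  # hypothetical column multiplexing factor

# Two physically adjacent columns never belong to the same word.
for col in range(0, 7):
    left_word, _ = owner(col, WORDS_PER_ROW)
    right_word, _ = owner(col + 1, WORDS_PER_ROW)
    print(col, col + 1, "same word" if left_word == right_word else "different words")

# Neighbouring bits of one word are WORDS_PER_ROW columns apart.
_, c0 = locate(5, 0, WORDS_PER_ROW)
_, c1 = locate(5, 1, WORDS_PER_ROW)
print("column distance between bit 0 and bit 1 of word 5:", c1 - c0)
```

With this arrangement a strike that upsets two adjacent cells produces two single-bit errors in different words, each of which a per-word parity check can detect.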
In the case of large memory blocks, three-dimensional addressing and partitioning into banks are used. The circuit that combines the banks of RAM into a single whole can be either a custom block or synthesized from Verilog.
The decoding unit provides access to the desired word in memory. Address decoding is typically carried out in two stages, which reduces buffering costs and increases the speed of the circuit. The column decoder selects one of the vertical columns within the block belonging to one bit; the row decoder, accordingly, selects a memory row. In synchronous RAM blocks, decoders may contain latches that hold the selected address. The latches can be placed either at the input of the memory block, before the decoding path, or after it.
The I/O block reads and writes the selected memory bit and, accordingly, consists of read and write paths. The read path contains a precharge circuit, a sense amplifier, latches for storing the read bit, and an output buffer. The write path consists of an input buffer, a signal-controlled write enable circuit, and a write amplifier. In synchronous RAM blocks, latches may be present in the write path, allowing the received data to be held for some time. I/O blocks are connected to the array of memory cells through column multiplexers, which form the second stage of column decoding. Each I/O block is responsible for one bit of the stored word.
The RAM is controlled by a block of control logic, which forms the correct timing diagram of the block as a whole. Figure 2 shows the internal read timing diagram of a typical memory block. The cycle begins with the reception of the address and its subsequent decoding. Simultaneously with the decoding of the row and column, a precharge is turned on, clearing the read path of residual potentials and history. After decoding is completed, the row is selected, followed by reading, amplification and buffering of the data.
Figure 3 shows the internal write timing diagram of a typical memory block. The cycle begins with the reception of the address and its subsequent decoding. Simultaneously with the decoding of the row and column, a precharge is turned on, clearing the write path of residual potentials and history. After decoding is completed, the addressed cells are selected, followed by writing of the input data.
The problem of ensuring fault tolerance of RAM is not only that the constituent blocks of RAM have different sensitivity to various influences, but also that the block as a whole has different sensitivity depending on the phase of its operation [3][4][5]. A RAM block can be in one of three main operating modes: write, read, and storage. The storage and read phases are the most susceptible to special external factors. When analyzing the fault tolerance of RAM blocks, it is also necessary to take into account how these phases are distributed in time, which is determined by the purpose of the specific RAM block, the frequency and type of accesses to it from its environment, and other parameters of the specific implementation and use of the block.

Materials and methods
When a heavy charged particle (HCP) strikes a bulk MOSFET, charge separation occurs in the strong field of the drain p-n junction, which means that the sensitive region is the drain of the transistor and its immediate surroundings.
In an SOI MOSFET, the drain p-n junction has no bottom part; therefore, even if a particle strikes the drain directly, charge separation may not occur. The sensitive region in this case is a small area of the drain and body of the transistor. However, another mechanism plays an important role in SOI: the parasitic bipolar effect. When a heavy charged particle enters the body of the transistor, electron-hole pairs are formed, which can be separated in the strong electric field of the drain p-n junction. As a result, majority carriers can accumulate in the body of the transistor, which raises the body potential, opens the source p-n junction and turns on the parasitic bipolar structure [6][7][8]. Thus, in contrast to a bulk MOSFET, in SOI the main volume sensitive to HCPs is the drain p-n junction only on the side of the transistor body; the bottom part of the drain p-n junction is blocked by the buried oxide.
When a heavy charged particle (HCP) enters a reverse-biased p-n junction (usually the drain p-n junction of a closed MOS transistor), electron-hole pairs are generated along the track of the particle, that is, charge is injected into the p-n junction [9][10][11]. The current pulse corresponding to the injected charge consists of two main parts: a fast surge with a characteristic rise time of about 0.1 ns and a somewhat longer decay. The fast surge is associated with charge collection due to carrier drift in the field of the p-n junction; the decay is determined mainly by the diffusion of charge carriers from the substrate. Thus, all other things being equal, SOI technology has an advantage over bulk technology due to its shorter charge collection length.
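A pulse of this shape is often approximated in circuit simulation by a double-exponential current source. The sketch below uses that approximation as an assumption (the article does not state which pulse model it relies on). The rise time constant follows the 0.1 ns figure from the text, while the 0.5 ns decay constant and the 100 fC injected charge are purely illustrative values.

```python
import math


def pulse_current(t_ns: float, charge_fc: float,
                  tau_rise_ns: float = 0.1, tau_fall_ns: float = 0.5) -> float:
    """Double-exponential current pulse in fC/ns (numerically equal to uA).

    The scale factor is chosen so that the pulse integrates to charge_fc.
    """
    if t_ns < 0:
        return 0.0
    scale = charge_fc / (tau_fall_ns - tau_rise_ns)
    return scale * (math.exp(-t_ns / tau_fall_ns) - math.exp(-t_ns / tau_rise_ns))


def collected_charge(charge_fc: float, t_end_ns: float = 10.0,
                     dt_ns: float = 0.001) -> float:
    """Trapezoidal integration of the pulse from 0 to t_end_ns."""
    steps = int(round(t_end_ns / dt_ns))
    total = 0.0
    previous = pulse_current(0.0, charge_fc)
    for i in range(1, steps + 1):
        current = pulse_current(i * dt_ns, charge_fc)
        total += 0.5 * (previous + current) * dt_ns
        previous = current
    return total


# Time of the current peak for the assumed time constants.
TAU_R, TAU_F = 0.1, 0.5
t_peak = math.log(TAU_F / TAU_R) * TAU_R * TAU_F / (TAU_F - TAU_R)
print(f"peak at {t_peak:.2f} ns, peak current {pulse_current(t_peak, 100.0):.1f} uA")
print(f"collected charge: {collected_charge(100.0):.2f} fC")
```

Shortening the decay constant in this model corresponds to the shorter charge collection length of SOI: for the same peak current, less total charge is collected.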
The entry of a charged particle into a reverse-biased p-n junction is shown in Fig. 4.
If the current induced by a charged particle is large enough, it can cause switching and loss of information in a memory cell, and in a combinational circuit, the temporary appearance of an incorrect signal.

Fig. 4. A charged particle enters a reverse biased p-n junction
The minimum charge leading to switching (hereinafter, the critical charge) is a characteristic of a particular circuit and depends significantly on many of its parameters. By varying the circuit parameters it is therefore possible to increase the critical charge and, accordingly, to achieve greater resistance to the effects of heavy charged particles [12][13][14].
Failures are caused mainly by primary high-energy protons and heavy charged particles (HCP) of solar and galactic origin (O, Si, Fe, Mg, etc.) with high values of linear energy transfer (LET), that is, of the energy loss dE/dx per unit mass thickness. The LET range relevant for failures is 1-30 MeV/(mg/cm²). The spectrum and density of the spectral components of high-energy particles in outer space are specified by regulatory documents. Since the values of the critical charge for different functional units of the same die can differ significantly, the effect of an HCP spectrum depends strongly on the LET values of the particles in it. Not every HCP causes a failure. For HCPs with lower LET values, the number of failures is significantly smaller than the number of incident HCPs. It should also be borne in mind that the spectral flux density of HCPs decreases at high LET values.
Direct ionization by protons is insignificant (for example, protons with an energy of 60 MeV have an LET of 0.008 MeV/(mg/cm²) in silicon), and therefore failures from protons occur due to the formation of secondary particles (recoil nuclei and products of nuclear reactions) [15][16][17]. The energy release during nuclear reactions is high enough that in modern submicron circuits a nuclear reaction in a sensitive volume almost always leads to failure.
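The relation between LET and the charge deposited along a track in silicon can be estimated with a short calculation. The sketch below assumes the commonly quoted silicon constants (density 2.33 g/cm³, about 3.6 eV per electron-hole pair); the 1 µm collection length and the 5 fC critical charge are hypothetical values chosen only for illustration.

```python
SI_DENSITY_MG_PER_CM3 = 2330.0   # silicon density, 2.33 g/cm^3
PAIR_ENERGY_EV = 3.6             # mean energy per electron-hole pair in Si
ELECTRON_CHARGE_C = 1.602e-19


def charge_per_micron_fc(let_mev_cm2_per_mg: float) -> float:
    """Charge generated per micron of track in silicon, in fC."""
    mev_per_cm = let_mev_cm2_per_mg * SI_DENSITY_MG_PER_CM3
    ev_per_um = mev_per_cm * 1e6 * 1e-4          # MeV -> eV, cm -> um
    pairs_per_um = ev_per_um / PAIR_ENERGY_EV
    return pairs_per_um * ELECTRON_CHARGE_C * 1e15


def deposited_charge_fc(let_mev_cm2_per_mg: float, path_um: float) -> float:
    """Charge deposited over an assumed collection length."""
    return charge_per_micron_fc(let_mev_cm2_per_mg) * path_um


CRITICAL_CHARGE_FC = 5.0   # hypothetical critical charge of a cell
PATH_UM = 1.0              # hypothetical charge collection length

for label, let in [("60 MeV proton", 0.008), ("HCP, LET 1", 1.0), ("HCP, LET 30", 30.0)]:
    q = deposited_charge_fc(let, PATH_UM)
    verdict = "upset possible" if q >= CRITICAL_CHARGE_FC else "below critical charge"
    print(f"{label}: {q:.3f} fC -> {verdict}")
```

Under these assumptions an LET of 1 MeV/(mg/cm²) corresponds to roughly 10 fC per micron, while direct ionization by a 60 MeV proton gives less than 0.1 fC per micron, which is consistent with the statement that proton-induced failures arise from secondary particles.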
It is traditionally believed that the main contribution to the total number of single failures in VLSI such as microprocessors and systems on a chip is made by failures in the built-in memory [18].
Memory is one of the basic elements of any microelectronic system, and one of the most vulnerable, due to the large volume of stored data and the large area it occupies on the chip. A comparison of the failure rate in the register file of the commercial PowerPC 7455 processor at different operating frequencies shows that when the operating frequency is tripled, the failure rate increases by almost an order of magnitude. This is illustrated in Fig. 5. Research in the field of memory protection has been carried out for several decades, and its results show the superiority of static memory over dynamic memory [19]. The main problem of dynamic memory (and of dynamic logic built on the same principle) is that information is stored on a capacitor in the form of a charge, which can easily be changed by charge collected from a passing single particle. Static memory has positive feedback, which stabilizes the potentials of the circuit nodes and largely prevents false switching.
The main directions for increasing the fault tolerance of static memory are topological and circuit modifications of the standard six-transistor static memory cell, and the development of fundamentally different storage cells that have greater fault tolerance (usually due to internal redundancy and some reduction in functional characteristics). One option for hardening a standard memory cell is to introduce passive or active RC circuits into the cell feedback paths. This approach reduces the speed of the cell, but makes it possible to filter quite effectively the fast current pulses induced by passing single particles. The main disadvantage of the method is the decrease in performance, as a result of which it fell out of widespread use quite a long time ago. Developments in the field of original static memory cells use, in one way or another, the idea of internal redundancy and cross-coupled feedback in the storage element.
The use of such memory cells makes it possible to increase the failure threshold from a few to several tens of MeV/(mg/cm²), but leads to a significant increase in the area occupied on the die, an increase in energy consumption and a slight decrease in performance. Because of these disadvantages, such memory cells are used only in critical applications, where reliability is more important than losses in functionality.
Static memory built on SOI transistors with a "floating" body has a very low threshold LET of failures, regardless of the design rules [20].
When a heavy charged particle enters the "floating" body of the transistor, electron-hole pairs are formed, which can be separated in the strong electric field of the drain p-n junction. As a result, majority carriers can accumulate in the body of the transistor, which raises the body potential, opens the source p-n junction and turns on the parasitic bipolar structure.
The base current gain β of the parasitic bipolar structure depends, as in a conventional bipolar transistor, on many factors, the main ones being the base width (that is, the MOSFET channel length), the body doping profile and the density of surface states near the Si/SiO2 interface.
The parasitic bipolar effect is one of the main problems in ensuring the fault tolerance of commercial SOI technologies that use partially depleted transistors with a "floating" body. This solution saves from 20 to 50 percent of the area on the chip, but leads to a sharp decrease in noise immunity and a decrease in the breakdown voltage of the gate dielectric due to the influence of the uncontrolled potential of the "floating" body of the transistor.
The parasitic bipolar effect multiplies the radiation-induced charge and, as a consequence, lowers the minimum radiation-induced charge leading to failure. The threshold LET of failures in SOI circuits built on transistors with a floating body is reduced to a level of 1-2 MeV/(mg/cm²), as shown in Fig. 6. The minimum radiation-induced charge leading to a failure in the memory element is called the critical charge and is a characteristic of the circuit (although it also depends on the parameters of the radiation-induced current pulse). In fact, the critical charge characterizes the noise immunity of the circuit against fast current interference.
Even with the small saturation cross section for single failures that is characteristic of SOI, a low failure threshold leads to a higher failure rate in real conditions. The characteristic LET of cosmic particles lies in the range of up to 30 MeV/(mg/cm²), which makes it necessary to increase the threshold LET of failures by circuit methods.
The parameters of a parasitic bipolar transistor depend on the topology and physical structure of the affected IC element, in particular, on the channel length of the affected transistor (which is the width of the base of the parasitic BT), on the lifetime of charge carriers and some other factors.
One of the main parameters of a bipolar transistor is the width of the base, which corresponds to the channel length of the main MOSFET. The BT gain is inversely proportional to the square of the base width, that is, it grows rapidly as design rules shrink, which makes the fight against the bipolar effect especially relevant for deep submicron and nanoscale circuits.
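The inverse-square dependence stated above can be turned into a quick scaling estimate. The sketch below computes only the relative change in gain between two channel lengths under that proportionality; the channel lengths used are illustrative, and absolute gain values are not modeled.

```python
def relative_gain(base_width_nm: float, reference_width_nm: float) -> float:
    """Gain relative to a reference device, assuming gain ~ 1 / width^2."""
    return (reference_width_nm / base_width_nm) ** 2


# Illustrative channel lengths; 180 nm is taken as the reference.
REFERENCE_NM = 180.0
for width_nm in (180.0, 90.0, 45.0):
    factor = relative_gain(width_nm, REFERENCE_NM)
    print(f"{width_nm:.0f} nm channel: parasitic gain x{factor:.0f} vs {REFERENCE_NM:.0f} nm")
```

Halving the channel length quadruples the parasitic gain in this model, so two generations of scaling increase it sixteenfold if nothing else changes.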
Another important parameter is the resistance between the body and the source. In the case of a "floating" body, the effective resistance is determined by recombination on surface states and amounts to hundreds of kOhms. In the case of a contact to the body, the effective resistance is directly proportional to the resistivity of the body and the distance to the contact, which entails the need to create a large number of body contacts or to reduce the width of the transistor.
With a large distance between the contacts, an HCP strike can locally turn on the parasitic bipolar transistor.
The main method of reducing the influence of the parasitic bipolar effect on the fault tolerance of an SOI transistor is reliable grounding of the entire body of the transistor, preventing the opening of the source p-n junction. In this case, it is important to determine the optimal distance between the body contacts, which provides good grounding with an acceptable reduction in the effective width of the transistor.

Research and results
Failure of several bits in memory can occur if a charged particle passes through the sensitive regions of transistors located in different storage cells. In relatively old technologies, where the distance between sensitive volumes is quite large, the charge generated by a passing charged particle affects, as a rule, one sensitive volume and, accordingly, one transistor. With newer technologies, and when particles pass at an angle other than a right angle, as shown in Fig. 7, several sensitive volumes and, accordingly, several transistors can be affected. If the sensitive volumes correspond to transistors located in different storage cells, a multi-bit failure may occur, that is, a failure of several cells located next to each other. The number of bits affected depends on the angle at which the charged particle entered the substrate, the size of the sensitive regions, and the distance between them. In the limiting case of a particle passing parallel to the IC surface, the number of affected bits is maximal. As sensitive regions move closer together or grow in volume, the range of incidence angles relative to the normal at which a charged particle can cause a multi-bit event widens. Accordingly, the transition to more modern technologies sharply increases the likelihood of multibit events.
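The dependence of the number of affected cells on the angle of incidence and the cell pitch can be sketched with simple geometry. The model below is a deliberately crude assumption: a straight track entering at the edge of a cell, a sensitive layer of uniform depth, and one sensitive volume per cell pitch. All dimensions are hypothetical.

```python
import math


def cells_crossed(angle_deg: float, depth_um: float,
                  pitch_um: float, cells_in_row: int) -> int:
    """Estimate how many cells a straight track crosses.

    angle_deg is measured from the normal to the IC surface; at 90 degrees
    the track runs parallel to the surface and crosses the whole row.
    """
    if angle_deg >= 90.0:
        return cells_in_row
    lateral_um = depth_um * math.tan(math.radians(angle_deg))
    return min(cells_in_row, 1 + int(lateral_um // pitch_um))


DEPTH_UM = 1.0     # hypothetical depth of the sensitive layer
ROW_CELLS = 64     # hypothetical number of cells in a row

for pitch_um in (0.5, 0.25):
    for angle in (0.0, 30.0, 60.0, 80.0, 90.0):
        n = cells_crossed(angle, DEPTH_UM, pitch_um, ROW_CELLS)
        print(f"pitch {pitch_um} um, angle {angle:4.0f} deg: {n} cell(s)")
```

In this model, halving the cell pitch roughly doubles the number of cells crossed at a given angle and lowers the angle at which a second cell is first reached, which matches the qualitative trend described above.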
Another possibility for the occurrence of multiple failures is associated not with the passage of a single particle directly through the sensitive volume, but with the capture of charge carriers from the particle track by the strong field of a nearby reverse-biased p-n junction.
That is, several transistors located in neighboring cells can be affected even if the particle track does not intersect their sensitive volumes. In this case, a multibit event can be caused even by a particle striking at a right angle to the surface of the IC, as shown in Fig. 8. Even fault-tolerant cells such as DICE or TMR can be susceptible to multibit events, especially if their layout is designed incorrectly. If fault-tolerant memory is required, a layout technique should be used in which the sensitive volumes of nearby cells alternate with each other, thereby reducing the likelihood of upsetting each individual cell. One option is to place the redundant modules at some distance from each other. Another option is a memory implementation in which error-correcting codes are used along with cells that are resistant to single failures. It is also necessary to periodically rewrite the entire memory to avoid the accumulation of errors. The time between updates must be selected on a case-by-case basis, as it depends on the technology used, the radiation intensity and other factors.
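The combination of a correction code with periodic rewriting can be illustrated on a toy scale. The sketch below uses a Hamming(7,4) code as a stand-in for whatever code a real memory would use (an assumption; the article does not name a specific code), and `scrub` is a hypothetical helper that rewrites every stored word with its corrected value.

```python
def encode(nibble: int) -> list:
    """Hamming(7,4): parity bits at positions 1, 2, 4; data at 3, 5, 6, 7."""
    code = [0] * 8                      # index 0 unused, positions 1..7
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def correct(codeword: list) -> list:
    """Fix at most one flipped bit; the syndrome is the position of the error."""
    code = [0] + list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    if syndrome:
        code[syndrome] ^= 1
    return code[1:]


def decode(codeword: list) -> int:
    code = [0] + correct(codeword)
    data_bits = (code[3], code[5], code[6], code[7])
    return sum(bit << i for i, bit in enumerate(data_bits))


def scrub(memory: list) -> None:
    """Rewrite every word with its corrected value (periodic refresh)."""
    for address, word in enumerate(memory):
        memory[address] = correct(word)


memory = [encode(value) for value in (0x3, 0xA, 0xF)]
memory[1][2] ^= 1                       # first upset in word 1
scrub(memory)                           # scrubbing removes it in time
memory[1][5] ^= 1                       # second upset arrives after the scrub
print("with scrubbing:", hex(decode(memory[1])))

unscrubbed = encode(0xA)
unscrubbed[2] ^= 1
unscrubbed[5] ^= 1                      # two upsets accumulate in one word
print("without scrubbing:", hex(decode(unscrubbed)))
```

A single-error-correcting code recovers the word only while at most one bit is wrong, so the scrubbing interval has to be short enough that a second upset in the same word is unlikely to arrive before the first one is rewritten.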
Figs. 9 and 10 show schematic plans of a "bad" and a "good" layout of a triplicated D flip-flop built on logic gates. Corresponding elements of the different flip-flops, whose simultaneous upset leads to a failure at the output of the voting circuit, are marked with the same color. It can be seen that in the first layout the risk of simultaneous failure, especially with submicron element sizes, is very high, which can negate the results of using circuit methods for increasing fault tolerance.
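Why the simultaneous upset of corresponding elements defeats triplication can be seen from the voting function itself. The sketch below is a minimal bitwise majority voter; the stored value and the upset patterns are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


stored = 1

# One copy upset: the voter masks the error.
single_upset = majority(stored ^ 1, stored, stored)
print("one copy upset  ->", single_upset)

# Two copies upset by the same particle (adjacent placement): the error wins.
double_upset = majority(stored ^ 1, stored ^ 1, stored)
print("two copies upset ->", double_upset)
```

The voter tolerates any single upset but not two upsets in the same bit, which is why the layout has to keep corresponding elements of the three flip-flops far enough apart that one particle is unlikely to hit two of them.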
Exposure to the total absorbed dose can seriously change the parameters of microelectronic devices, including off-state leakage currents and threshold voltage. As a result, dose degradation can affect the resistance of IC elements to the effects of single ionizing particles. A study of the effect of the total dose on the failure rate in several types of static memory chips shows that, in the general case, the cross section of single failures increases under the influence of the total dose. The reason may be a decrease in transistor transconductance due to a drop in mobility, or a change in the noise immunity of memory cells due to a shift in the threshold voltages of transistors.
A decrease in the transconductance of the transistors of a memory cell leads to a proportional drop in the critical charge and an increase in the failure cross section due to lower-energy particles and particles with a shorter effective track length.

Conclusion
The operation of any RAM can be divided into three main phases: write, storage, and read. Fault tolerance is determined not only by the location of the HCP hit in a particular block, but also by the operating phase of the memory block. Below is an analysis of the operation of various elements of the memory circuit and their reaction to failures caused by heavy charged particle strikes.
Let us consider the operation of the memory unit during a data write and the possible places where failures caused by heavy charged particle strikes may occur. The array of storage cells is the most susceptible to failures; however, if a particle hits a cell that is being rewritten, its effect is negligible, since the erroneous information will either be immediately overwritten with the correct data or be smoothed out by the inertia of the path from the write amplifier to the target cell. An exception may be a cell failure at the end of the write phase, at the moment the word line is closed, but the probability of such an event is low. In addition to the memory array, failures can occur in the control logic, as well as in the input buffers responsible for receiving the data to be written. Let us move on to their analysis.
Consider an HCP strike in the input stages of the memory. Where input buffers based on combinational logic are used, writing is carried out by a full-swing differential signal. First, the data being written goes to the input inverter, which can be prone to failures, but this is followed by a chain of more powerful inverters, ending with tri-state buffers (the write amplifier). Even if a failure occurs at the input, it is unlikely to reach the cell, since powerful buffers do not pass narrow spike pulses.
Figure 11 shows a fragment of the circuit responsible for writing in a fairly simple memory block. First, the data goes to an input buffer containing an input inverter and a small delay line that optimizes the timing diagram. The "WRT_CONDRV" block, based on inverters, generates the control signals ("WRT_CON2") for the tri-state buffers. The "DIO_WDV2" block is the write amplifier: two powerful buffers that can efficiently drive heavy loads. The logic that controls the write process is also prone to failure. In the simplest cases it can be combinational, in which case a short-term failure will be suppressed by the output buffer. Some versions of available memory blocks use flip-flops in the control logic to store the current state (idle/write). A failure in such a flip-flop will most likely cause a false write to a random address or, less likely, prematurely abort a scheduled write. The problem can be solved by using DICE circuitry.
Address decoding, the process that precedes read and write operations, can also be subject to failure. In simple memory circuits, decoders contain only combinational elements ending in powerful buffers. Short-term failures in such configurations will be suppressed. In more complex versions, synchronized by a global clock signal, registers can be used. Depending on the decoder implementation, the address can be latched either at the input or at the output. Solutions with multiple latches are possible, for example, when read and write operations have to be distributed over different parts of the clock cycle. In general, as the number of flip-flops increases, fault tolerance deteriorates, since the probability of an HCP hitting a flip-flop increases. Thus, the version of the decoder with a register at the input can be considered the best solution, since it requires the smallest number of flip-flops. A failure in a synchronous decoder can cause two or more addresses to be selected, or the address to be decoded incorrectly. The consequences of such a failure are severe: data may be corrupted at several addresses at once that have nothing to do with the address being

Fig. 5. Intensity of failures in the register file of the PowerPC 7455 processor at different operating frequencies

Fig. 6. The value of the threshold LET of failures in SOI for memory on transistors with a "floating" body and with contacts to the body

Fig. 7. Passage of a charged particle at an angle to the substrate

Fig. 8. Multi-bit event caused by a heavy charged particle hitting the IC surface at a right angle

Fig. 10. Correct placement of triplicated elements in the topology

Fig. 11. An example of a fragment of an I/O block circuit responsible for writing
More complex memory blocks may use a write circuit synchronized by a global clock signal, usually involving a register to store input data. Failure is more likely in this configuration because additional flip-flops are present. Fault tolerance can be increased by using DICE redundancy.