Formation of a management strategy for innovation and investment activities of an enterprise

This paper considers the influence of single failures on the operation of digital devices and gives a classification of single events. A generalized function for ensuring fault tolerance in the design of integrated circuits is introduced. The implementation of fault-tolerance methods is shown using a microprocessor chip as an example.


Introduction
In terms of functionality and a number of operational parameters, industrial automation tools (microcomputers, controllers, network devices) are suitable for many tasks as part of the on-board equipment (BA) of spacecraft (SC). Their use is attractive because it can significantly reduce the cost of the BA. The obstacle is the special operational requirements for spacecraft equipment, above all the requirement of radiation resistance [1].
This problem has existed since the beginning of human space activity, when it became necessary to ensure the radiation resistance of a whole range of microelectronic products. Despite the large experience accumulated in this area, the task of ensuring resistance remains just as relevant. The reasons are general technical progress, the introduction of new technological processes and the improvement of existing ones, and the revision of requirements for the composition and parameters of ionizing radiation (IR) as the real radiation environment is clarified and operating conditions change.
Recently, the development of microelectronics has led to a sharp reduction in design rules, an increase in the degree of integration, and the introduction of advanced design methods based on macrofragments called complex functional blocks (SF blocks). As a result, it became possible to implement several SF blocks on one chip, which led to the creation of specialized very-large-scale integrated circuits (VLSI) of the "system on a chip" (SoC) type.
Given the small size of the active regions in these products, so-called single events began to manifest themselves to a greater extent. These are radiation effects caused by the interaction of a single nuclear particle with the active region of the device. They belong to a new class of microdosimetric radiation effects in electronic devices and are probabilistic in nature [2].
The main method of providing resistance to these effects is redundancy: in the number of elements, in software, and so on [3]. However, the specific implementation of these methods in particular classes of microcircuits is a rather difficult task.
This publication is devoted specifically to the implementation of these methods for specific types of microcircuits.

Materials and methods
Single events (OS) are radiation effects caused by the interaction of a separate (single) nuclear particle with the active region of the device.
All OS can be classified as reversible or irreversible (catastrophic). The first group includes such effects as SEU, SEFI, and SET. The second group includes SEL, SEH, SEB, SEGR, and SES [4].
Irreversible events can lead to catastrophic failures of the VLSI, and a power-down and re-initialization is usually required to restore a working state. Reversible failures do not directly lead to catastrophic failures, and the power does not have to be turned off to restore normal functioning of the VLSI. However, such failures can seriously hinder the normal functioning of the equipment. This article discusses only reversible effects.
The main methods of protection against these effects, which are used both in equipment and in VLSI, are: circuit solutions for the protection of VLSI elements; the use of special codes that correct errors; methods of structural, temporal and software redundancy.
Circuit protection methods include special solutions for VLSI elements that either avoid failures entirely or reduce their number.
Structural redundancy is an increase in the number of elements that perform a redundant function. The most common method is triple modular redundancy (TMR), based on creating duplicates of critical circuit nodes. The output value is chosen by a voting circuit based on the outputs of these elements. Thus, radiation exposure will change the output state only if several nodes are upset at once. The greater the redundancy, the more die area is used and the less likely an SEU is to corrupt the output. The disadvantage of this approach is the increase in the number of transistors needed to perform the same function. TMR does not correct the corrupted copy; it only delivers the correct value at the output. The TMR scheme can be extended to an N-modular redundant scheme: for odd N ≥ 3, the output signal is determined by the value on which the majority of the modules agree.
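The voting principle described above can be illustrated with a minimal software sketch. It is not the circuit used in the chip: the function names `majority_vote` and `flip_bit` and the stored value `0xA5` are assumptions made only for this example. The sketch shows that a single upset in one copy is masked, while a second upset in the same bit of another copy defeats the voter.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a stored word."""
    return word ^ (1 << bit)


stored = 0xA5  # illustrative value, not taken from the paper
copies = [stored, stored, stored]

# One upset in one copy: the voter still delivers the original value.
copies[1] = flip_bit(copies[1], 3)
single_upset_result = majority_vote(*copies)

# A second upset in the same bit of another copy: the voter is outvoted.
copies[2] = flip_bit(copies[2], 3)
double_upset_result = majority_vote(*copies)

print(hex(single_upset_result), hex(double_upset_result))
```

Note that the voter never repairs `copies[1]`; the corrupted copy stays in place until it is rewritten, which is why accumulated upsets matter.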
The main idea behind temporal redundancy is to increase the time allotted to an operation, which is usually achieved by lowering the clock frequency. This increases the likelihood that the upset block will recover during the operation and produce the correct signal. Temporal redundancy can also be implemented by repeating a calculation two or more times at different moments. The results of all calculations are stored in registers and then compared by a selection circuit, which determines the output signal. If the selection circuit cannot reach a decision by majority of matches, the calculation can be repeated.
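The repeat-and-compare variant of temporal redundancy can be sketched in software as follows. This is only an illustration under stated assumptions: the helper names (`vote`, `run_with_temporal_redundancy`, `make_faulty_adder`), the number of repetitions, and the retry limit are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Callable, List, Optional, Set


def vote(results: List[int]) -> Optional[int]:
    """Return the value held by a strict majority of results, or None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


def run_with_temporal_redundancy(
    op: Callable[[], int], repeats: int = 3, max_rounds: int = 4
) -> int:
    """Repeat op at different times; redo the round if no majority forms."""
    for _ in range(max_rounds):
        results = [op() for _ in range(repeats)]  # results kept "in registers"
        decided = vote(results)
        if decided is not None:
            return decided
    raise RuntimeError("no majority reached within the retry limit")


def make_faulty_adder(a: int, b: int, faulty_calls: Set[int]) -> Callable[[], int]:
    """Hypothetical operation whose result is corrupted on selected calls."""
    calls = 0

    def op() -> int:
        nonlocal calls
        calls += 1
        result = a + b
        if calls in faulty_calls:
            result ^= 1 << (calls % 8)  # a transient flips one result bit
        return result

    return op


# One corrupted repetition out of three is outvoted.
print(run_with_temporal_redundancy(make_faulty_adder(2, 3, {2})))
# Two corrupted repetitions give no majority, so the round is repeated.
print(run_with_temporal_redundancy(make_faulty_adder(2, 3, {1, 2})))
```

The cost of this approach is time rather than area: each protected operation takes at least `repeats` times longer.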
The idea of software redundancy is to ensure the reliability of the most important information management and processing decisions.It consists in comparing the results of processing the same initial data by different circuit blocks and eliminating the distortion of the results due to the impact of single failures.
The main difficulty in applying these methods lies in their optimization: developing the dominant solution for each specific block. Consider this problem.
One of the ways to ensure the fault tolerance of integrated circuits is to balance the indicators of structural, informational, software, and operational efficiency [5]. Here we assume that circuit engineering methods for protecting elements can also be regarded as a form of structural redundancy, since they ultimately increase the die area. It should be noted that some indicators must be maximized, while others must be minimized under certain restrictions [6].
To find such generalized quality indicators, one can use the minimax norm, which is a special case of the generalized norm. The generalized minimax criterion of resistance to single failures is formulated in terms of two groups of parameters, $x_1, \dots, x_n$ and $y_1, \dots, y_n$. In our case, only the fault tolerance index of the integrated circuit is used as the parameters $x_1, \dots, x_n$. The overall dimensions of the circuit, the execution time of the main operations, the current consumption, and the number of repeated operations are used as the parameters $y_1, \dots, y_n$.
In the case of BA used in spacecraft, current consumption does not play a significant role, since all spacecraft are equipped with powerful power sources that ensure stable operation of microcircuits [7].
Thus, the minimax criterion of resistance to single failures for an integrated circuit can be written as expression (4), where $Z_{com}$ is the single-failure tolerance criterion for the integrated circuit, $N$ is the number of redundancy methods used, and $Z_i$ are the redundancy methods used. Expression (4) can be rewritten in the form (5), where $Z_1$ is structural redundancy, $Z_2$ is temporal redundancy, and $Z_3$ is software redundancy. Taking into account relation (3), the criterion (5) can finally be represented as expression (6), where $x$ is the integrated circuit reliability index, $y_1$ is the integrated circuit area, $y_2$ is the calculation time of the main operations, and $y_3$ is the number of repetitions of basic operations.
In microcircuits and FPGAs, structural redundancy is considered the best choice for protecting RAM and registers from single failures. Its use leads to virtually failure-free operation of RAM elements, which removes the need for other redundancy methods for RAM. Similarly, temporal redundancy techniques are used to protect combinational logic, and the Hamming code is used to protect the ROM.
Thus, expression (6) can be rewritten in terms of three functions: $Z_1(x, y_1)$, the minimax function describing structural redundancy in RAM blocks; $Z_2(x, y_2)$, the minimax function describing temporal redundancy in combinational logic blocks; and $Z_3(x, y_3)$, the minimax function describing software redundancy in registers and ROM. The limitations of this method are the maximum possible area and the minimum possible frequency [8,9].

Research and results
The first step in applying failure protection methods in the design of complex VLSI is to identify the functional blocks. The available protection measures must then be evaluated against the constraints, which must not be exceeded [10].
The RAM cells are one of the first areas to focus on. They are the most critical with respect to single failures because of their relatively large area and the "failure severity", that is, the loss of information. First, the RAM area is estimated without cell protection; then with "special" cells, i.e. using circuit methods; and finally the increase in area is estimated for redundant cells, which in the general case can be either ordinary or "special". After that, the method to be used is chosen.
To organize VLSI RAM, standard, unprotected memory blocks were used.The circuit consists of three RAM blocks and an error detection block.
When data is read by a microinstruction of the microcontroller core, information is read from all three blocks simultaneously. Combinational elements in the selection circuit determine the output value by two matches. The disadvantage of this organization is that when a memory cell in one of the RAM blocks fails and the error is detected, the corrupted data is not corrected. If the memory cell is not overwritten for a long time (which depends on how the program is organized), a failure may occur in a second block, and the selection circuit will then output an incorrect value. This protection mechanism was used in the microprocessor.
This method can be modified by adding a monitoring block. At times when microinstructions do not access the RAM, this block sequentially reads and rewrites the memory data. If a failure occurs in one of the RAM blocks, all RAM blocks are overwritten with the correct value. With this organization, data cannot remain unrewritten for a long time. The switching unit selects between the input data coming to the RAM from the microprocessor core and the data coming from the monitoring unit. The monitoring unit consists of a control register for switching between operating modes, a register that stores the value of a RAM cell, a cell address counter that holds the address of the cell to be read, a data bus idle detection unit, and a generator of internal control signals for the RAM SF blocks. While the data bus is idle, the address counter steps through the entire RAM address space. The values read from the three RAM blocks are compared with each other, and the selection circuit makes a decision by two matches. The correct value is stored in the register. If the value of one of the three blocks differs from the other two, all three blocks are overwritten with the value stored in the register.
When triple-redundant RAM is used in microprocessor circuits, the die area grows by up to 20% of the original area.
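The behavior of the triple RAM with a monitoring block can be modeled with a short simulation. This is a behavioral sketch only, not the hardware design: the class `ScrubbedTmrRam`, its method names, and the 4-cell memory size are assumptions chosen for illustration. One call to `scrub_step` stands for one idle bus cycle in which the address counter checks a single cell.

```python
class ScrubbedTmrRam:
    """Behavioral model: three RAM blocks, a 2-of-3 voter, and a scrubber."""

    def __init__(self, size: int) -> None:
        self.blocks = [[0] * size for _ in range(3)]
        self.scrub_addr = 0  # cell address counter of the monitoring block

    def write(self, addr: int, value: int) -> None:
        for block in self.blocks:
            block[addr] = value

    def read(self, addr: int) -> int:
        a, b, c = (block[addr] for block in self.blocks)
        return (a & b) | (a & c) | (b & c)  # decision by two matches

    def inject_upset(self, block: int, addr: int, bit: int) -> None:
        """Flip one bit in one block to model a single failure."""
        self.blocks[block][addr] ^= 1 << bit

    def scrub_step(self) -> bool:
        """One idle-bus step: vote on a cell, rewrite all blocks if they differ."""
        addr = self.scrub_addr
        voted = self.read(addr)
        repaired = any(block[addr] != voted for block in self.blocks)
        if repaired:
            self.write(addr, voted)
        self.scrub_addr = (addr + 1) % len(self.blocks[0])
        return repaired


ram = ScrubbedTmrRam(size=4)
ram.write(2, 0x3C)
ram.inject_upset(block=0, addr=2, bit=0)          # first upset
repairs = [ram.scrub_step() for _ in range(4)]    # one full pass while idle
ram.inject_upset(block=1, addr=2, bit=0)          # second upset, other block
print(repairs, hex(ram.read(2)))
```

Without the scrubbing pass between the two upsets, two blocks would hold the same wrong bit and the voter would return the corrupted value, which is the failure scenario the monitoring block is meant to prevent.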
The next area to consider may be the ROM.
Since the amount of ROM in modern circuits plays a decisive role in shaping the consumer properties of VLSI, it is not advisable to use triple redundancy (TMR) to protect it against failures. The best option is to use error-correcting Hamming codes. When the ROM of the microprocessor circuit is organized in blocks of 1024 words of 16 bits (1024x16), each block requires an additional 1024 words of 8 bits (1024x8) in order to correct single errors and detect double errors in a data word. The ROM area can store non-real-time data (correction factors, device states, etc.) and user programs. A program failure can lead to uncontrolled consequences, which is undesirable in real-time systems (it takes time for the watchdog timer to reset a misbehaving program). The VLSI microprocessor used EEPROM SF blocks with built-in Hamming code data protection.
The ROM occupies 42% of the total area of all elements on the chip; of this, the check information accounts for 13% of the total area of all elements.
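The single-error-correcting, double-error-detecting behavior mentioned above can be illustrated with a textbook extended Hamming code for 16-bit words. This sketch is an assumption, not the code used in the EEPROM SF blocks: it needs only 5 check bits plus one overall parity bit (22 bits in total), whereas the source stores 8 check bits per word and does not describe their layout. The function names `encode` and `decode` are hypothetical.

```python
from typing import Optional, Tuple

PARITY_POSITIONS = (1, 2, 4, 8, 16)   # Hamming check bits
CODE_BITS = 21                        # Hamming positions 1..21; bit 0 is overall parity
DATA_POSITIONS = [p for p in range(1, CODE_BITS + 1) if p not in PARITY_POSITIONS]


def encode(data: int) -> int:
    """Encode a 16-bit word into a 22-bit extended Hamming codeword."""
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            code |= 1 << pos
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, CODE_BITS + 1):
            if pos & p and (code >> pos) & 1:
                parity ^= 1
        if parity:
            code |= 1 << p            # make each parity group even
    if bin(code).count("1") % 2:
        code |= 1                     # overall parity bit makes the total even
    return code


def decode(code: int) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'double_error'."""
    syndrome = 0
    for pos in range(1, CODE_BITS + 1):
        if (code >> pos) & 1:
            syndrome ^= pos           # XOR of set positions points at the error
    overall_ok = bin(code).count("1") % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        code ^= 1 << syndrome         # single error (syndrome 0: parity bit itself)
        status = "corrected"
    else:
        return None, "double_error"   # even parity but non-zero syndrome
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (code >> pos) & 1:
            data |= 1 << i
    return data, status


word = 0xBEEF                          # illustrative value
codeword = encode(word)
print(decode(codeword))                # no error
print(decode(codeword ^ (1 << 9)))     # one flipped bit is corrected
print(decode(codeword ^ 0b110))        # two flipped bits are only detected
```

Detection is guaranteed only for up to two flipped bits per word; three or more upsets in one word may be miscorrected.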
Next come the registers, which are collections of flip-flops united by a common function. To protect against failures, some developers use error-detecting or error-correcting coding (a parity bit or the Hamming code) [11]. The disadvantages are that this protection must be written into the HDL code and that it does not cover all flip-flops: failures are still possible in individual flip-flops that are not grouped into registers. The way out is to use special library flip-flops with protection against single failures. If the library contains no such elements, a control program (script) can be created that automatically replaces library flip-flops with a system of flip-flops protected from single failures. This technique was used in design centers in the development of the VLSI microprocessor. The script was launched after the gate netlist was loaded into the layout synthesis tool. The resulting circuit consisted of three equivalent flip-flops, a selection circuit, and inverters that spread the write process over time (temporal redundancy). The area of the composite cell is 4-5 times the area of a single flip-flop. Since the total area of all flip-flops (12616 pieces) in the VLSI microprocessor is 12% of the total area of the elements, the use of this method increased the total occupied area by only about 4-6%. The advantages of this method are that all flip-flops in the VLSI are protected from failures and that the technique is applicable to various technologies [12][13][14].
Finally, consider the methods implemented for protecting combinational logic. When a heavy charged particle (HCP) hits elements of combinational logic, a transient pulse (a "needle") may appear at the output. Since the input signals (the outputs of the corresponding flip-flops) do not change as a result of the failure, the combinational logic output returns to the correct value some time after the failure. Structural redundancy such as TMR is not always effective for such elements, since it requires a lot of area. Therefore, to protect combinational logic from failures it is better to use temporal redundancy, namely, reducing the clock frequency of the device [15][16][17].
The maximum clock frequency of the IC is determined by the execution time of the longest operation: the frequency limit is imposed by the longest (in time) combinational path in the circuit. If a heavy charged particle hits the longest path and a needle appears there, the combinational logic has no time margin to restore the correct value at its output. In a well-designed IC there are many such long paths. The solution is to leave a margin of 30-50% below the maximum clock frequency of the IC [18][19][20].
Figure 4 shows the appearance of a needle when the device operates at its maximum frequency, and Figure 5 shows the case when the device operates at half the maximum frequency. The figures show that in the second case the error may be corrected, whereas in the first case it cannot.
The probability of correction depends on the time when the particle hits and on the location of the hit.
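The effect of a frequency margin on the worst case can be estimated with simple arithmetic. The sketch below is a simplified model under explicit assumptions: the longest combinational path is taken to use the whole clock period at the maximum frequency, the upper operating frequency of 33 MHz is used as that maximum, and the transient is assumed to delay the correct output by its own duration. The 1 ns transient duration and all function names are hypothetical.

```python
def critical_path_ns(f_max_mhz: float) -> float:
    """Longest path delay, assumed equal to the clock period at f_max."""
    return 1e3 / f_max_mhz


def slack_ns(f_max_mhz: float, margin: float) -> float:
    """Time left on the longest path when running at f_max * (1 - margin)."""
    period_ns = 1e3 / (f_max_mhz * (1.0 - margin))
    return period_ns - critical_path_ns(f_max_mhz)


def worst_case_transient_masked(
    f_max_mhz: float, margin: float, transient_ns: float
) -> bool:
    """True if a transient on the longest path settles before the clock edge."""
    return transient_ns <= slack_ns(f_max_mhz, margin)


F_MAX_MHZ = 33.0      # upper operating frequency, used here as f_max (assumption)
TRANSIENT_NS = 1.0    # hypothetical transient duration

for margin in (0.0, 0.3, 0.5):
    print(
        f"margin {margin:.0%}: slack {slack_ns(F_MAX_MHZ, margin):.1f} ns, "
        f"masked: {worst_case_transient_masked(F_MAX_MHZ, margin, TRANSIENT_NS)}"
    )
```

Under these assumptions there is no slack at the maximum frequency, and at half the maximum frequency the slack equals the full delay of the longest path. For hits at other times or on shorter paths the outcome is more favorable, so the model is a lower bound rather than a probability estimate.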

Conclusion
The tests were carried out on three types of microprocessor microcircuits: a device made in 0.35 µm CMOS technology without RAM or ROM protection; a device made in 0.35 µm CMOS technology with RAM redundancy (three blocks of 512 bytes), RAM "regeneration" (continuous reading, with rewriting when an error is detected), redundancy of all flip-flops, and Hamming code protection of the data memory and instruction memory; and a microprocessor made in 0.5 µm CMOS/SOI technology with RAM redundancy (three blocks of 256 bytes each).
Experimental studies were carried out under irradiation with ions having LET(Si) of 40 MeV·cm²/mg and 60 MeV·cm²/mg. For the radiation research, a specialized test board was developed and manufactured. It allows functional control of a series of 8-bit microcontrollers with the MCS-51 architecture and includes the following main functional units: a 64K × 8 instruction ROM based on the AT29C512 electrically erasable programmable flash memory; a 32K × 8 data RAM (UM62256); an address latch (eight-bit register 74AC373SMT); a signal conditioner (four 2-input AND gates, 74AC08, and a 3-to-8 decoder/demultiplexer, 74HC138); a reset device based on the TLC77331P supervisor; and a MAX3232 transceiver that matches RS-232 levels to digital logic levels.
The internal generator of the IC under study is used as the clock source, with a 24 MHz quartz resonator and two matching capacitors connected to the XTAL1 and XTAL2 pins.
In the study of OS and FC, the IC and the board were powered from separate power sources, and the current consumption of the IC was measured with an automated PC-based measuring system. In the event of a TE, the supply current is automatically limited; the limit was set at 100 mA. In addition, the power source is briefly disconnected in the event of a TE.
The studied ICs were installed in the UK48-4S contacting device.The board has drivers and interface connectors for COM1 and COM2 ports for bidirectional information exchange (to communicate the IC with a computer via its RS-232 serial port).
During irradiation, functional control and measurement of the current consumption of the ICs under study use four computers: two control computers (for the power supplies and the test board) located in the irradiated area and two remote PCs located in the measurement room, connected via an Ethernet (crossover) cable.
To check the functioning of the IC under heavy charged particle (HCP) exposure, a program was developed that sequentially writes the numbers FFh, 00h, 55h, AAh (RAM control). First, FFh is written to each RAM cell and the RAM is read back, then the same is done with 00h, and so on in the above sequence.
When examining the ROM, the numbers FFh, 00h, 55h, AAh are written sequentially. First the ROM is erased and FFh is written to each ROM cell, then 00h, and so on in the above sequence.
When all four numbers have been written and read, the whole cycle repeats. The program counts the number of upsets and failures of the microcircuit under study under HCP exposure.
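The write-and-read-back cycle described above can be sketched as a host-side simulation. It does not reproduce the actual MCS-51 test program: the function `run_pattern_cycle`, the `upset_hook` callback that models particle hits between writing and reading, and the 512-cell memory are assumptions for illustration only.

```python
from typing import Callable, Optional

PATTERNS = (0xFF, 0x00, 0x55, 0xAA)  # test numbers from the procedure above


def run_pattern_cycle(
    memory: bytearray,
    upset_hook: Optional[Callable[[bytearray, int], None]] = None,
) -> int:
    """Write each pattern to every cell, read it back, count corrupted cells."""
    errors = 0
    for pattern in PATTERNS:
        for addr in range(len(memory)):
            memory[addr] = pattern
        if upset_hook is not None:
            upset_hook(memory, pattern)  # models hits between write and read
        for addr in range(len(memory)):
            if memory[addr] != pattern:
                errors += 1
    return errors


def example_upsets(memory: bytearray, pattern: int) -> None:
    """Hypothetical upsets: one during the FFh pass, one during the 55h pass."""
    if pattern == 0xFF:
        memory[100] ^= 0x10
    elif pattern == 0x55:
        memory[7] ^= 0x01


ram = bytearray(512)
print(run_pattern_cycle(ram))                  # no irradiation: no errors
print(run_pattern_cycle(ram, example_upsets))  # two corrupted cells counted
```

The complementary pairs FFh/00h and 55h/AAh ensure that every bit is tested in both states and that neighboring bits hold opposite values in at least one pass.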
Test results show improved fault tolerance compared to traditional design methods.

Fig. 1. Application of TMR to protect the RAM in the microprocessor chip

Fig. 2. The use of a monitoring unit to prevent multiple errors in the RAM of the microprocessor chip

Fig. 3.

Fig. 4. The occurrence of a failure in combinational elements when operating at maximum frequency

If the needle occurs in combinational logic elements that are closer to the logic output, restoring the correct value at the output takes less time than when the needle occurs in elements located close to the input. The microprocessor chip is a fast, economical 8-bit CMOS microcontroller manufactured using 0.35 µm CMOS technology. The microcircuit operates at frequencies from 1.25 MHz to 33 MHz and supports two software-selectable power saving modes. In this way, it provides sufficiently high fault tolerance through temporal redundancy.

Fig. 5. The occurrence of a failure in combinational elements when operating at a frequency lower than the maximum