Modern electronic circuits are highly complex systems and, as such, are
susceptible to occasional errors or failures. In addition to permanent
hardware failures, electronic components are subject to random transient
errors which originate from various electronic noise sources. In digital
electronics, errors which are not caused by permanent damage to the
circuits are referred to as soft errors, soft fails, or single-event
upsets.
IBM scientists have long known that electronic noise capable of causing
soft errors in electronic components could also be created by energetic
nuclear particles originating from either of two sources:
extraterrestrial cosmic ray particles, which constantly bombard the
earth, and the decay of radioactive atoms. The soft errors generated in
digital electronics by particle bombardment are produced at localized
sites and involve single memory bits or single Boolean logic steps.
Although digital circuits in computers are constantly exposed to these
particles, their effects do not necessarily translate into operational
mistakes. For example, a changed memory bit may be overwritten before it
is read. There are two common approaches for combating soft errors.
Chips may be selected or designed with components that have reduced
sensitivity to cosmic ray particles. Alternatively, methods such as
parity checking and error correction codes, in use since the 1960s, may
be employed to prevent soft fails from causing system errors. Error
propagation and correction techniques in digital circuits have been
reviewed extensively elsewhere and are not treated in this issue.
This issue of the IBM Journal of Research and Development focuses on
studies of soft errors in computer chips caused by cosmic rays at
terrestrial altitudes. Soft errors caused by radioactive contaminants
are also considered, but emphasis is placed on the experimental,
theoretical, and modeling aspects of cosmic-ray-induced soft errors in
computer chips. Details of procedures used at IBM to improve chip
reliability by minimizing chip sensitivity to cosmic radiation are
outside the scope of this issue.
The issue is divided into two sections: experimental and theoretical. In
the first paper, Ziegler et al. trace IBM's experimental studies of
cosmic-ray-induced chip errors. Ziegler's second paper reviews the
physics of terrestrial cosmic ray flux. O'Gorman et al. then describe
field-testing measurements of soft errors in computer chips caused by
naturally occurring terrestrial cosmic rays at various altitudes.
A review by Ziegler et al. discusses accelerated testing of computer
chips with particle beams, which provides a rapid and accurate
evaluation of the sensitivity of newly fabricated chips to cosmic rays.
It is followed by a short paper, also by Ziegler et al., describing the
design of a portable nonvacuum Faraday cup used in accelerated-testing
studies.
In the first theoretical paper, Srinivasan presents an overview of the
basic physical principles and methodologies incorporated into a software
tool called SEMM (Soft Error Monte Carlo Modeling). SEMM is used
extensively at IBM during computer chip design to meet reliability goals
by predicting soft-error rates. Next, Tang reviews the theory of
high-energy particle bombardment of microelectronic devices; this theory
is used in the nuclear spallation model that generates the database
required by SEMM. The following paper, by Murley and Srinivasan,
describes how SEMM uses information on circuit design and layout, on
processing design details, and on circuit critical charge (i.e., the
collected charge necessary to switch a signal in an integrated circuit)
to predict soft-error rates.
Freeman makes an important contribution to soft-error modeling in the
final paper, in which he examines the concept and calculation of
critical charge. Although critical charge is typically introduced into
soft-error simulation models as a single-valued parameter, Freeman shows
that normal manufacturing and operational tolerances can cause
significant variation in critical charge and thus give rise to a range
of simulated soft-error values.
IBM scientists have been actively engaged in research on cosmic ray
particle bombardment of computer chips for nearly two decades. Their
purpose has been to understand and to quantify the probability of chip
soft errors and to provide means to contain these errors within specified
standards of system reliability. The papers presented in this issue of
the IBM Journal of Research and Development describe their achievements
and illustrate IBM's continuing efforts to enhance the reliability of its
products.
J. F. Ziegler
G. R. Srinivasan
Guest Editors