Introduction
The 4Mb DRAM generation saw a revolutionary change in
technology at IBM, with the introduction of CMOS, trench capacitor
storage, and other new processes and structures. Although rapid
progress continues, the basic cell structures and many of the processes
developed then are being used in the 64Mb and 256Mb DRAMs being
developed today. In addition, much of the technology developed for the
4Mb and 16Mb DRAMs is now used in CMOS logic technology. This paper
describes the DRAM cell used by IBM beginning with the 4Mb generation,
and traces its evolution to the 256Mb cell being developed today. We
then describe the development of some key technology elements, and
explain how key DRAM device problems were solved.
Dynamic random access memory has been a good vehicle for technology
development, because there is a predictable demand for a large number
of chips of standard design. The density of the array, a
well-understood benchmark which determines cost, is a very effective
driver of technology development. The addressability and repetitive
character of the array make it possible to find and solve technology
problems in the product. The high volume allows employment of the team
of experts required to do a thorough development job. Thus, DRAM is the
product that has driven the state of the art of silicon device
technology up to the present day.
The one-device DRAM cell [1], invented at IBM by R. Dennard, consists
of a cell transistor with the drain connected to one node of the cell
storage capacitor, the source connected to a bit line, and the gate
connected to the word line, which runs orthogonal to the bit line
(Figure 1). The
requirement to have a large capacitor in a small space with low
leakage is the main driver of DRAM technology. A brief description of
the cell operation will help to explain why. To write, the bit line is
driven to a high or low logic level with the cell transistor turned
on, and then the cell transistor is shut off, leaving the capacitor
charged high or low. Since charge leaks off the capacitor, a maximum
refresh interval is specified. To read, or refresh the data in the
cell, the bit line is left floating when the cell transistor is turned
on, and the small change in bit-line potential is sensed and
amplified to a full logic level. The ratio of cell capacitance to
bit-line capacitance, called the transfer ratio, which ranges from
about 0.1 to 0.2, determines the magnitude of the change in bit-line
potential. A large cell capacitance is needed to deliver an adequate
signal to the sense amplifier.
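The dependence of the read signal on the transfer ratio can be illustrated with a short charge-sharing calculation. This is a minimal sketch, not a description of the IBM sensing scheme: the 3.3-V array supply, the half-supply bit-line precharge, and the function name `bitline_signal` are assumptions introduced only for illustration.

```python
def bitline_signal(v_cell, v_precharge, transfer_ratio):
    """Change in bit-line voltage (V) after the cell shares charge with the bit line.

    transfer_ratio is C_cell / C_bitline, as defined in the text.
    Charge conservation gives dV = (v_cell - v_precharge) * r / (1 + r).
    """
    r = transfer_ratio
    return (v_cell - v_precharge) * r / (1.0 + r)


# Hypothetical operating point (not taken from the paper).
VDD = 3.3            # assumed array supply, volts
V_PRE = VDD / 2.0    # assumed bit-line precharge level, volts

for r in (0.1, 0.2):  # range of transfer ratios quoted in the text
    high = bitline_signal(VDD, V_PRE, r)
    low = bitline_signal(0.0, V_PRE, r)
    print(f"r = {r:.1f}: stored high {high * 1e3:+.1f} mV, stored low {low * 1e3:+.1f} mV")
```

Under these assumptions the signal is only a few hundred millivolts at best, which is why a larger cell capacitance (a larger transfer ratio) translates directly into sensing margin.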
Figure 1
The evolution of the technology has followed several overall trends.
DRAM cell size has decreased from 11.3 µm² for the first 4Mb cell to
0.6 µm² for the first 256Mb cell.
Improvements in lithography were responsible for much but not all of
the size reduction. New process features were also necessary to shrink
the cell and to improve array performance. As a result, there has been
a trend toward increased process complexity, as reflected in the number
of masking steps used in the process, which has increased from 13 in
the 4Mb generation to 25 in the 256Mb generation.
The increase in the complexity of DRAM technology has driven up the
cost of DRAM development, resulting in the formation of alliances
between companies to reduce the expense to individual companies. The
IBM 64Mb DRAM is being developed by an alliance between IBM and
Siemens, and the 256Mb by a triple alliance including IBM, Toshiba, and
Siemens.
DRAM external power supplies follow industry standards. Because the
chip power is low enough, DRAM can use on-chip power supply regulation
to reduce the internal circuit power supply swings. IBM has led the
industry in reduction of power supply voltages for CMOS logic and
memory. DRAM technology resists power supply scaling more than
logic technology because of the need for storage of charge.
Table 1 illustrates this trend.
Power supply voltage reduction will come rapidly, since the market for
battery-operated equipment is growing faster than previously
anticipated. Also, performance competition in microprocessors demands
ever-shorter channel lengths, which in turn require reduction in power
because of device scaling. DRAM chips and technology will be similarly
forced to operate at lower power supply voltages in the near future.
The market for battery-operated equipment also creates a need for
longer DRAM retention times, to reduce the power associated with
refreshing the data. The data retention time specification is currently
64-256 ms, making very low leakage current a requirement, along with a
large cell capacitance.
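To see why retention times of this order demand very low leakage, the allowable leakage current can be estimated from I = C ΔV / t. The sketch below assumes a constant leakage current; the 40-fF cell capacitance and the 0.5-V tolerable loss of stored level are hypothetical values chosen for illustration, not figures from this paper.

```python
def max_leakage_current(cell_capacitance_f, allowed_droop_v, retention_time_s):
    """Largest constant leakage current (A) that keeps the stored level
    within the allowed droop over one retention interval: I = C * dV / t."""
    return cell_capacitance_f * allowed_droop_v / retention_time_s


C_CELL = 40e-15   # hypothetical cell capacitance, farads
DROOP = 0.5       # hypothetical tolerable loss of stored level, volts

for t_ms in (64, 256):  # retention-time range quoted in the text
    i_max = max_leakage_current(C_CELL, DROOP, t_ms * 1e-3)
    print(f"retention {t_ms} ms: leakage budget about {i_max * 1e15:.1f} fA per cell")
```

With these assumed values the budget is in the range of tens to hundreds of femtoamperes per cell, and it shrinks in proportion as the retention time is lengthened.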
The cell capacitor was a simple planar structure through the 1Mb
generation. At and beyond the 4Mb generation, as the cell size
decreased, the effective surface area of the capacitor was maintained
by placing the capacitor on the sides of a narrow trench etched
into the silicon, or by putting the capacitor on top of the other
elements of the cell.
The next section begins by explaining the development of the IBM 4Mb
substrate plate trench (SPT) DRAM cell, at a cell size of 11.3 µm². We
next show how important features were added for succeeding generations
to reduce the cell size to 0.6 µm², where it now stands
for the 256Mb chip. Succeeding sections trace in more detail the
development of certain technology elements essential to DRAM. We start
with the strap connection between the storage trench polysilicon and
the node diffusion, a unique SPT DRAM requirement, which is a challenge
for process integration. Then we discuss device isolation, retrograde
n-well, salicidation, lithography, and metallization. Finally,
solutions for various cell device design and retention time problems
encountered during DRAM development are described. Included are
gate-induced drain leakage (GIDL), three-dimensional device effects,
dislocation-related leakage, and the variable retention time phenomenon.
DRAM cell structure evolution
The folded bit-line cell array configuration (Figure
2) has been used universally in the industry since the
1Mb time frame. In the folded bit-line configuration, a cell is crossed
by two word lines and one bit line. One of the word lines (WL1 in
Figure 2) is the "active word line" for the cell, and forms the
gate of the cell device. The second word line (WL2), the "passing
word line," is the gate of the cell device on the adjacent cell.
Thus, the bit line (BL) and reference bit line ($\overline{\mathrm{BL}}$) can be
adjacent, leading to better matching and noise rejection, as well as
providing a wider pitch for the layout of the sense amplifier. Although
the cell now contains two word lines (active and passing), this does
not require more cell area than an open bit-line cell, since the
additional area is also generally the same area used for the storage
capacitor.
Figure 2
IBM adopted CMOS technology for DRAM at the 4Mb generation. Previously,
DRAM had been implemented in simple n-MOS technology because the latter
was relatively inexpensive. However, logic applications, which were
sensitive to active power, had already migrated to CMOS. Integration of
a DRAM cell structure into a CMOS technology brought with it some
fundamental issues which had to be resolved before tackling the cell
structure in detail. For an integrated DRAM technology, the doping
types and profiles from the starting substrate up to the device gates
had to be chosen and optimized to the best trade-off of cost,
reliability, function, and speed. The most fundamental issue,
however, was the one of choice between n-well CMOS on a p-type
substrate and p-well on an n-type substrate. Two important conditions
were set which the technology had to meet, and which still obtain for
current and future generations:
- The array must be isolated from the substrate by building it
within a well of opposite doping to take full advantage of CMOS. This
benefits cell retention time by eliminating all leakage current sources
associated with the substrate wafer. It reduces the incidence of soft
errors due to ionizing radiation by confining the effective minority
carrier collection length within the well and sending many of the
generated minority carriers to the substrate, where they do not
affect the storage node diffusion.
- The well potential must be stable despite the impact ionization
that accompanies FET operation. This ionization is largest for an
n-channel device. Therefore, n-MOS devices should not be positioned in
a well whose conductivity is reduced by light doping or constrained
depth. This constraint is satisfied by an n-well CMOS technology on
a heavily doped p-type substrate.
A p-MOS DRAM array built in an n-well CMOS technology meets these
conditions and was chosen for the 4Mb generation. The cell choice was
then made within that framework.
The criteria for cell choice are density, process simplicity, adequate
storage capacitance for detectable signal, and low parasitic
capacitances for performance and minimization of noise. Each generation
of DRAM must compete with prior generations by providing an ultimate
lower cost per bit. This is accomplished by decreasing cell size with
each generation, while minimizing the increase in processing cost. The
industry trend [2]
is to reduce cell size by a factor of 0.33 for
each generation. The industry trend in lithography is to reduce the
minimum image size by a factor of 0.7 for each generation, so the use
of lithography alone would reduce the cell area by 50% for each
generation. Figure 3 shows the cell size vs.
generation plotted together with the square of the minimum lithographic
image. This shows that technological innovation, involving a change in
cell structure, is needed in addition to lithographic scaling to reduce
the cell size by a factor of one third for each generation. Also,
technical advances are required to implement dimensional scaling
(reduced heat cycles, film thicknesses, defect levels, etc.) and
to mitigate electrical limitations arising from such scaling.
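The scaling argument above can be checked with a few lines of arithmetic. The sketch uses only numbers quoted in this paper (a 0.7 linear shrink per generation, a one-third area target, and the 11.3-µm² and 0.6-µm² cell sizes from the Introduction); the function names are illustrative.

```python
LITHO_LINEAR_SHRINK = 0.7        # minimum image size factor per generation (from the text)
TARGET_AREA_FACTOR = 1.0 / 3.0   # industry cell-area trend per generation (from the text)


def litho_only_area_factor(linear_shrink):
    """Area factor per generation if the cell shrinks by lithography alone."""
    return linear_shrink ** 2


def required_structural_factor(target_area_factor, linear_shrink):
    """Extra area factor that cell-structure changes must supply per generation."""
    return target_area_factor / litho_only_area_factor(linear_shrink)


def observed_area_factor(first_area, last_area, generations):
    """Geometric-mean area factor per generation between two cell sizes."""
    return (last_area / first_area) ** (1.0 / generations)


litho = litho_only_area_factor(LITHO_LINEAR_SHRINK)
extra = required_structural_factor(TARGET_AREA_FACTOR, LITHO_LINEAR_SHRINK)
# 4Mb -> 16Mb -> 64Mb -> 256Mb is three generation steps.
observed = observed_area_factor(11.3, 0.6, 3)

print(f"lithography alone: x{litho:.2f} per generation")
print(f"structural innovation needed: x{extra:.2f} per generation")
print(f"observed 4Mb to 256Mb trend: x{observed:.2f} per generation")
print(f"256Mb cell by lithography alone: {11.3 * litho ** 3:.2f} um^2 (actual 0.6 um^2)")
```

Lithography alone accounts for an area factor of about 0.49 per generation, so the remaining factor of roughly 0.68 per generation has to come from changes in cell structure and layout.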
Figure 3
At the transition from 1Mb to 4Mb [3],
planar capacitors did
not provide enough cell capacitance, and were replaced by
three-dimensional capacitors throughout the industry. These took the
form of either trench capacitors buried within etched holes in the
silicon [4-7]
or stacked capacitors built above the silicon
[8-12] in the region of the interconnect-level films.
The planar capacitor in the 1Mb and prior generations in IBM used an
oxide/nitride/oxide (ONO) storage insulator consisting of a sandwich of
thermally grown oxide, followed by deposited silicon nitride, which is
subjected to oxidation to seal any weak spots in the nitride. Early
experiments with deep-trench capacitors produced excellent results
using the same ONO storage node insulator used in the 1Mb generation.
Since the defect levels per unit area were much lower than predicted by
experience with planar capacitors, trench capacitors were chosen
for the 4Mb DRAM generation.
The 4Mb generation
The cross section of the IBM 4Mb cell is shown in
Figure 4. The capacitor consists of the
polysilicon storage node electrode which fills the trench, the ONO node
dielectric on the trench walls, and the p+ substrate which forms
the storage plate. Thus, there is no need for the separate plate wiring
layer found in other cell types. The trench polysilicon node is
connected to the array device diffusion pocket by a selective silicon
epitaxy surface strap, which bridges the thin oxide separating the
active area and the top surface of the storage node. This cell
structure is referred to as the substrate plate trench (SPT) cell
[13]. This
type of cell differs from the standard industry trench
cells, which either form the storage node in the silicon substrate
outside the trench, or stack two polysilicon electrodes separated by
the insulator inside the trench.
Figure 4
Active device areas are formed in a p-epitaxial layer grown on the p+
substrate. As shown in the layout of Figure
5(a), the active regions are separated by conventional
isolation. Because the cell is in a well, a vertical parasitic p-FET is
formed between the p+ storage node diffusion and the p+ substrate, with
the trench polysilicon as the gate. This parasitic device is never
turned on because the gate is tied to the p+ storage node diffusion,
which is always the source of the p-FET, and the array n-well is
back-biased at about 1 V above the power supply voltage.
Figure 5
The 16Mb generation
In the 4Mb cell, a localized oxidation of silicon (LOCOS)
isolation region must separate a trench from an adjacent active device
area to avoid parasitic sidewall currents gated by the storage node
polysilicon, and the automatic strapping of all adjacent nodes and
trenches which would otherwise occur. In the 16Mb cell, this limitation
was overcome by a modification of the trench structure.
Figure 6 is a cross section of the 16Mb cell,
showing that the insulator lining the trench now contains a thick
(approximately 100-nm) SiO2 collar which extends from the
silicon surface to a point below the n-well. The thick SiO2
collar prevents unwanted bridging of exposed node silicon and diffusion
surfaces. It also has the function of isolating the node trench
polysilicon from the cell device edge under the word line, which was
the role of the LOCOS isolation in the 4Mb cell. To further isolate the
storage trench polysilicon from the abutting cell device region and the
word line, the top of the trench polysilicon must also be recessed
below the active device area wafer surface and covered by a thick
oxide. The storage trench can now be placed in the space between cell
devices, as shown in Figure 5(b). This increases the
efficiency of the cell layout by decreasing the area devoted to thick
oxide isolation and increasing the area available for storage
capacitance.
Figure 6
Electrical connection between the trench polysilicon node and the array
device across the thick collar is made by a deposited polysilicon
surface strap using a novel process to be described in a subsequent
section of this paper. This strap is borderless to the dielectric-encapsulated word line. This reduces the active-to-passing word-line
space, which was determined by the overlay tolerance of the trench,
isolation, and word-line layers in the 4Mb cell. The 16Mb cell is
referred to as the merged isolation and node trench (MINT) SPT cell
[14].
The 64Mb generation
Along with the density increases, improvements in performance were
also realized as a consequence of scaling. During the 4Mb and 16Mb
generations, the lower performance of a p-MOS cell device relative to
n-MOS was not a problem. With the 64Mb generation, the time required to
move data in and out of cells could be significant. Therefore, an n-MOS
array was desired. The simplest structural change to achieve this would
be simply to interchange n-material for p-material relative to the 4Mb
and 16Mb generations. Thus, the starting material would be n-type, with
implanted p-wells in which the cell arrays would be formed. However,
this structure forfeited the noise immunity advantages of n-well
technology as argued for the 4Mb and 16Mb generations. The benefits of
an n-well CMOS technology on a p-type substrate could be retained
at the cost of some increased process complexity. The array p-well and
the substrate would have to be electrically isolated. This allowed the
array well to be reverse-biased (-1 V) for low leakage, low parasitic
capacitance, and maximum signal, while the substrate was at ground
for low noise and best performance.
Figure 7 shows the cell configuration which
achieves this for the 64Mb generation. The array p-well is
isolated from the substrate by an underlying n-type layer which is
formed by outdiffusion from a source deposited within the trenches. In
a dense array, the trenches are close enough together that diffused
regions form a continuous n-type layer. Since the n-type region
extends to the bottom of the trenches, it also serves as a capacitor
plate. Connection of this n-type plate to the top surface is formed by
an n-well ring which surrounds the array. This cell configuration is
called the buried plate trench (BPT) cell [15].
Figure 7
The overall cell layout is similar to that of the 16Mb generation, as
shown by Figure 5(c), with the addition of a "borderless
contact." This feature reduces the cell size by eliminating the
diffusion border required between the bit-line contact and the
adjacent word line. This requires a special contact structure made
by imposing a film underneath the interlevel oxide, which can act as an
etch stop during formation of the hole for the contact stud. The
word line must be insulator-encapsulated prior to the deposition of the
etch-stop film, so that the contact hole can be opened without exposing
any portion of the word line which may be within the contact image.
The 256Mb generation
Scaling of the 64Mb surface strap to 256Mb dimensions presented
formidable challenges, because the strap-to-trench overlay is critical to the width of the cell, and the strap must
be built in the narrow opening between the active and passing word
lines. For these reasons, a new strap structure, the "buried
strap," and a different cell layout were used as shown in
Figure 8 and
Figure 5(d). The buried
strap is fabricated early in the process and has a diffused connection
formed by creating a sidewall contact on one edge of the trench
capacitor. It saves the cost of the strap mask and avoids the
high-aspect-ratio processing in the active-to-passing word-line space.
Unfortunately, this cell layout produces a relatively smaller trench
than its predecessor, but this is compensated for by scaling the
node dielectric thickness to increase the cell capacitance and biasing
the plate at Vdd/2 to reduce maximum field in the dielectric.
Otherwise, the well/substrate configuration is as described for
the 64Mb generation. This cell is referred to as the buried strap
trench (BEST) cell [16].
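The trade-off between a thinner node dielectric and a mid-level plate bias can be sketched numerically. The example assumes that the storage node swings between 0 V and the array supply and that the plate sits at half the supply; the 3.3-V supply and the 5-nm scaled thickness are hypothetical, while the 10-nm starting thickness matches the ONO layer quoted for the 4Mb cell.

```python
def max_dielectric_field(vdd, plate_bias, thickness_nm):
    """Worst-case field (MV/cm) across the node dielectric when the storage
    node swings between 0 V and vdd against a fixed plate potential."""
    v_max = max(abs(vdd - plate_bias), abs(0.0 - plate_bias))
    thickness_cm = thickness_nm * 1e-7
    return v_max / thickness_cm / 1e6


def relative_capacitance(thickness_nm, reference_nm):
    """Capacitance per unit area relative to a reference thickness (C ~ 1/t)."""
    return reference_nm / thickness_nm


VDD = 3.3  # hypothetical array supply, volts

grounded = max_dielectric_field(VDD, 0.0, 10.0)        # plate at ground, 10-nm dielectric
half_bias = max_dielectric_field(VDD, VDD / 2.0, 5.0)  # plate at half supply, 5-nm dielectric

print(f"grounded plate, 10 nm: {grounded:.2f} MV/cm")
print(f"half-supply plate, 5 nm: {half_bias:.2f} MV/cm, "
      f"capacitance x{relative_capacitance(5.0, 10.0):.1f}")
```

In this idealized picture, halving the maximum voltage across the dielectric allows the thickness to be halved, and the capacitance per unit area doubled, at the same peak field.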
Figure 8
Table 2 summarizes the process sequence as it
evolved from the 4Mb generation through the 256Mb generation. It
shows that a large number of processes were kept from generation to
generation, while additions were also made. The strap connection
between the node trench polysilicon fill and the node diffusion was
changed significantly for each generation. We now discuss the
technology elements in more detail.
Strap process development
The "strap" which connects the drain of the array transfer
device to the storage trench polysilicon is an essential part of the
SPT cell. This strap adds little cost and requires little additional
area; it should not degrade the retention time of the cell. It is a
special DRAM-oriented process that demands the utmost in
inventiveness to be successful.
The strap process for the 4Mb DRAM relies on the fact that the
diffusion and polysilicon in the storage trench are coplanar and are
separated only by the 10-nm ONO layer on the trench sidewall
(Figure
4). After the spacers on the gate conductor are formed and the
junctions implanted, a thin (70-nm) layer of intrinsic selective
silicon is grown. This bridges the ONO insulating layer
[17]. The
next step is salicide formation, which consumes the selective silicon
and forms a low-resistance strap. Both selective silicon deposition and
salicidation are needed to form such a strap. This process uses no
extra masks for the strap, but increases the word pitch because the
passing word line cannot pass over the strap contact.
In the 16Mb cell, the strap must bridge the 100-nm-wide oxide collar
and a 160-nm step from the trench top to the node diffusion
(Figure 6).
Selective silicon was not used, because the thickness required would
cause spurious nucleation on insulators and bridging of the storage
trench to the "wrong" diffusion.
To produce a manufacturable strap, a novel process, called "boron
out-diffused surface strap," or BOSS, was developed. After
source-drain implantation, a thin layer of silicon nitride is
deposited on the chip. A contact hole is etched through the silicon
nitride and trench top oxide in each cell, exposing the boron-doped
trench polysilicon and p+ diffusions that are to be connected. A thick
SiO2 cap and sidewall spacer are required on the gate
electrode to avoid exposure of the gate electrode surface during this
etch. A blanket layer of intrinsic polysilicon is deposited, and the
wafer is annealed to diffuse boron up into the intrinsic polysilicon
from the trench and diffusion tops. The result is a boron-doped
polysilicon layer bridging the trench and diffusion within each hole.
The remaining intrinsic polysilicon is then removed by a selective wet
etch, isolating the cells from one another. Finally, an oxide is grown
over the strap polysilicon, and the blanket nitride is removed to
prepare for salicidation of diffusion. Figure 9
shows a cross section of the strap contact at two points in the
process. For the BOSS process, the critical parameters are selectivity
of the silicon wet-etch processes to boron doping, and the diffusion of
boron in an undoped polysilicon film. The wet etch must remove undoped
polysilicon to avoid strap-strap shorts, while the boron-doped polysilicon that bridges the trench to the node remains.
Figure 9
The 64Mb cell has an n-type node diffusion and polysilicon trench fill.
An equivalent BOSS process was not available because a suitable
doping-sensitive wet etchant was not available for n-type material.
Therefore, the strap technology was modified to use a thin
in-situ-doped
n-type polysilicon film deposited in the strap contact holes of
each cell. An oxide fill and planarization of the strap contacts with
short etch to remove polysilicon from the contact hole sidewalls
complete the strap and also maintain a planar surface.
The 256Mb "buried strap" is fabricated after the recess etch of
the second trench polysilicon fill, as described in the process
sequence of Table 2.
The top part of the oxide collar is removed so
that the third n+ polysilicon trench fill that is subsequently
deposited can contact the silicon wafer just below the surface. The
trench polysilicon is then recessed again. As a result of the cell
layout, the isolation trench etch, which is done next, leaves only the
trench sidewall next to the storage node diffusion connected to the
storage node trench polysilicon. The n+ outdiffusion from the node
polysilicon merges with the node source-drain diffusion to complete
the contact. To prevent the buried-strap diffusion from affecting the
cell device characteristics, arsenic, which diffuses slowly, is
used as the n+-type dopant in the trench polysilicon.
Evolution of isolation with DRAM generations
The IBM 4Mb DRAM contained many innovative process features, but it
retained the conventional LOCOS process for device isolation.
LOCOS has the lowest cost and is a well-understood "industry
standard" process, but it has two major disadvantages. One is area
loss to the "bird's-beak" phenomenon at the isolation boundary, which remains at
approximately 0.10-0.15 µm per edge and thus becomes an
ever-increasing fraction of the total lithographically limited
isolation pitch. A second drawback is isolation oxide thinning in very
narrow isolation areas due to multidimensional oxidation effects
[18].
The IBM 16Mb generation had a new cell structure that affected the
choice of isolation technology
(Figure 6). As explained in the section
on cell structure evolution, the top of the 16Mb deep trench must be
recessed below the wafer surface and capped with an insulator to make
the MINT cell work. While some oxide grows on the (polysilicon) trench
fill during gate growth, it is not thick enough to provide adequate
planarity for the word line. Therefore, the recess must be filled with
an insulator (oxide) and planarized to create this cell.
Shallow-trench isolation (STI), an alternative to LOCOS, offers the
possibility of true lithographically limited pitch and feature size,
low thermal cycle, and improved surface planarity. It is accomplished
by etching isolation trenches into the silicon wafer, depositing oxide
to fill them, and planarizing the surface. Its major drawback is the
increased cost and complexity of planarizing the blanket-deposited fill
oxide.
Since deposition and planarization of the deep-trench capping oxide
must be performed in order to fabricate MINT memory cells, the
challenge of STI had to be addressed even if LOCOS was employed in the
support area. As a result, STI became the complete answer for both
storage-trench capping and standard device isolation beginning with the
16Mb generation.
The fill-oxide planarization method chosen for 16Mb processing employed
a planarization block mask (PBM) immediately following oxide
deposition. This leaves photoresist in larger-sized isolation regions
to help with planarization. This masking step is followed by the
application of planarization polymer. Reactive ion etch (RIE) of the
oxide-polymer stack and chemical-mechanical polishing (CMP) to remove
deposited oxide from the active areas complete the planarization
process [19].
Figures 10 and
11 physically compare isolation for
4Mb and 16Mb chips.
Figure 10
Figure 11
Devices made using STI show unique electrical behavior not found in
LOCOS-isolated devices. Inverse narrow-channel effect (decreasing device threshold voltage as active width
narrows) [20],
local dielectric thinning [21], and the existence of
a lower-threshold n-channel parasitic device, parallel to the main
device channel
[20, 22],
all result from the abrupt surface
topography (corner) where isolation abuts a device. While angled
implantation or solid-source diffusion has been shown to modify and/or
eliminate the n-channel parasitic device and inverse narrow-channel
effect
[23, 24],
device behavior could be controlled within
acceptable bounds without adding these process steps.
Table 3 compares LOCOS and STI for their last and
first IBM DRAM implementations, respectively. Note that the minimum
active device width is no longer determined by isolation restrictions,
but is limited by salicidation process boundaries. Minimum isolation
width, as well, is not an inherent technology issue, but simply set to
provide manufacturability in photolithography.
The extendability of the basic STI process has been demonstrated in the
64Mb and 256Mb DRAM processes being developed by IBM and its alliance
partners. While modifications to the planarization process have been
exercised to widen the process window
[25], the basic process
advantages offered by STI have been found to be extendable over several
DRAM generations with minor alterations.
CMOS well doping to prevent latch-up
During the development of the 4Mb DRAM, CMOS latch-up was an important concern. Latch-up can be prevented by lowering
parasitic bipolar gains and reducing the parasitic substrate and n-well
resistances to prevent voltage drops from forward-biasing the
base-to-emitter junctions.
Most CMOS technologies being developed at the time used
p-substrates and diffused n-wells, which are susceptible to latch-up.
To prevent latch-up, the layout ground rules ensured that regions which
form bipolar bases had large lateral dimensions, thereby reducing the
bipolar gains.
The IBM DRAM designs used a different strategy, driven by the SPT cell
design for the 4Mb generation, as well as by the desire to avoid
latch-up. It was based on technology demonstrated by IBM research
[26]. Since
the grounded p-type substrate formed the plate electrode
of the 4Mb and 16Mb SPT DRAM cell
(Figure 4), it had to be heavily
doped (N = 10^… cm^-3)
to avoid n-type reduction in cell capacitance due to depletion when
positive bias was applied to the trench fill. Heavily doped substrates
provide excellent latch-up protection, since the substrate, which would
be the base of the parasitic npn, has a low-resistance connection to ground. Earlier CMOS n-wells were
diffused from the top surface of the silicon (Figure
12), and the donor concentration decreased with
depth. A retrograded n-well was used to increase the threshold voltage
of the vertical parasitic FET present in the SPT cell. (See the 4Mb
cell structure description above.) The retrograde profile
(Figure
13) was produced by high-energy implantation of
phosphorus using MeV ion implanters. This retrograde well was lower in
resistance, allowing for larger spaces between n-well contacts, and
provided higher base doping, which lowered the gain of the parasitic
pnp transistor.
Figure 12
Figure 13
The p+ substrate and the retrograde n-well raised the latch-up holding
voltage above the maximum application voltage at the minimum spacing
allowed between n+ and p+ diffusions. These features were also used for
the 16Mb DRAM and for logic generations created in the same time frame.
In the 64Mb and 256Mb DRAMs, the base technology was altered because
trade-offs in the cell design provided less latch-up immunity. While
the highly doped substrate was eliminated, the retrograde n-wells
remained.
Use of advanced lithography for DRAM
To a first order, the minimum printable image size for a
lithography tool is given by
$R = k\lambda/\mathrm{NA}$,
where $R$ is the resolution limit, $k$ is the
Rayleigh k-factor, which determines the image contrast and
is approximately 0.7 for manufacturing purposes, $\lambda$ is the
exposure wavelength, and NA is the numerical aperture of the
lens system.
DRAM development has been the test bed for new lithographic equipment
and photoresists in IBM. The 1Mb generation was the last in IBM to use
G-line (mercury-arc light source; wavelength 436 nm) lithography with
1.0-µm ground rules. Historically, the resolution capability of
lithographic systems has been scaled by reducing the exposure
wavelength and increasing the optical numerical aperture of the
phototool. Figures 14 and
15 illustrate this trend for I-line and deep-UV
(DUV) processes.
Figure 14
Figure 15
I-line (mercury-arc; wavelength 365 nm) systems were introduced in the
mid-1980s, with a numerical aperture of 0.28 and a resolution
capability of 0.80 µm. Early I-line tool prototypes were used in the
development of the first IBM 4Mb DRAM products in 1986. This helped to
drive the development of mid-UV photoresist systems. Over the last nine
years, the numerical aperture of these systems has been increased to
0.60, with a resolution capability of 0.50 µm.
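The Rayleigh relation R = kλ/NA can be inverted to see what k-factor the quoted I-line figures imply. This is a rough sketch: the function names are illustrative, and the quoted "resolution capability" values are treated as the resolution limit R, which is an assumption.

```python
def rayleigh_resolution(k, wavelength_nm, numerical_aperture):
    """Resolution limit R = k * lambda / NA, in nanometers."""
    return k * wavelength_nm / numerical_aperture


def implied_k(resolution_nm, wavelength_nm, numerical_aperture):
    """k-factor implied by a quoted resolution: k = R * NA / lambda."""
    return resolution_nm * numerical_aperture / wavelength_nm


I_LINE_NM = 365.0  # I-line wavelength from the text

# (label, quoted resolution in nm, numerical aperture), both from the text
i_line_tools = [
    ("mid-1980s I-line", 800.0, 0.28),
    ("current I-line", 500.0, 0.60),
]

for label, resolution_nm, na in i_line_tools:
    k = implied_k(resolution_nm, I_LINE_NM, na)
    print(f"{label}: NA = {na:.2f}, R = {resolution_nm:.0f} nm, implied k = {k:.2f}")
```

Both implied values fall in the range of roughly 0.6 to 0.8, consistent with a manufacturing k-factor below unity.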
High-resolution (mercury-arc or excimer laser; wavelength 248 nm) DUV
lithographic systems became available in late 1987. Early prototypes of
DUV machines were used by IBM for the development and manufacture of
16Mb DRAMs at 0.5-µm ground rules, while other semiconductor
manufacturers have chosen to extend I-line equipment for this
application. This early use drove the development of deep-UV
photoresist systems in IBM. Deep-UV machine numerical apertures range
from 0.36 to 0.60. DUV lithography has generally provided a three-year
performance advantage relative to I-line (see
Figure 14), although it
entails higher chemical and equipment costs. IBM is using DUV
lithography for the manufacture of CMOS devices with image-size ground
rules from 0.40 to 0.60 µm. While both conventional steppers and
step-and-scan lithographic tools have been used, the ease of attaining large
field sizes with the step-and-scan approach has favored its
implementation.
Recent advances in resolution-enhancement techniques such as
phase-shifting masks and off-axis illumination have provided the
capability to widen the available process window that can be obtained
from a given lithographic toolset. These approaches are particularly
attractive for the 256Mb DRAM, where stringent demands are
placed on current DUV systems for generating 0.25-µm images.
Simulations predict that through the judicious application of
attenuated phase-shift masking, off-axis illumination, and
feature-dependent biasing, acceptable process windows can be achieved
for all critical levels within the cell [27].
Initial experimental
evaluations have confirmed the predicted benefits at 0.25-µm
dimensions [28].
With efforts to establish manufacturable
solutions well underway, application of these resolution-enhancement
techniques to I-line lithography will also offer a cost-effective alternative to DUV lithography for the 64Mb DRAM at dimensions
of 0.35-0.4 µm.
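The pressure on the process window can be illustrated with the Rayleigh scaling relation, resolution = k1 x wavelength / NA. This is a standard rule of thumb, not a formula taken from this paper. The sketch below solves it for k1. The 248-nm DUV wavelength, the 0.28 and 0.60 numerical apertures, and the image sizes are quoted in the text; the 365-nm wavelength is that of the mercury I-line, and pairing NA = 0.60 with the 0.25-µm and 0.35-µm targets is an assumption made only for illustration.

```python
# Rayleigh scaling: resolution = k1 * wavelength / NA, solved here for k1.
# A lower k1 means the image is closer to the diffraction limit.

def k1_factor(resolution_um, wavelength_um, numerical_aperture):
    """Return the Rayleigh k1 factor implied by a resolution target."""
    return resolution_um * numerical_aperture / wavelength_um

# (label, resolution in um, wavelength in um, NA)
# The NA pairings for the last two rows are illustrative assumptions.
cases = [
    ("early I-line, 0.80 um", 0.80, 0.365, 0.28),
    ("DUV, 0.25 um (256Mb)", 0.25, 0.248, 0.60),
    ("I-line, 0.35 um (64Mb)", 0.35, 0.365, 0.60),
]

for label, resolution, wavelength, na in cases:
    print(f"{label}: k1 = {k1_factor(resolution, wavelength, na):.2f}")
```

Under these assumptions the 0.35-µm I-line case requires a smaller k1 than the early I-line tools did, which is the regime in which phase-shifting masks and off-axis illumination are expected to help.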
Silicide technology for diffusions and gate conductors
Driven by requirements of reduced resistance for gate conductors
and diffusions, titanium disilicide metallization was introduced for
the 4Mb DRAM generation. The self-aligned silicide (salicide) technique, which forms silicide on both the
gate conductors and the diffusions simultaneously with no additional
photolithographic steps, was used. This process requires an insulating
spacer on the sidewalls of the gate conductor to avoid shorting the
gates and diffusions.
The process sequence is as follows: deposition of titanium metal,
formation annealing to react the Ti with exposed Si (gate conductors
and diffusions) to form TiSi2, selective etching to
remove unreacted Ti, and transformation annealing to form the
low-resistance phase of TiSi2.
The integration issues which
must be traded off are as follows:
- First, the maximum titanium silicide thickness is limited by the
junction depth of the technology.
- Second, filaments of residual titanium silicide shorting the gate
conductor to the diffusions (G-D shorts) limit the choices available
for selective etch and annealing temperatures.
- Third, the tendency of the silicide film to agglomerate limits the
maximum thermal cycle to which the films can be exposed.
A two-step anneal is needed because titanium disilicide exists in
two phases, a high-resistance phase (C49) of approximately 60-70 µΩ-cm
and a low-resistance phase (C54) of approximately 15-20 µΩ-cm. Circuit
requirements dictate conversion to the
low-resistance phase, which needs a high-temperature anneal, while G-D
shorts limit the maximum allowed formation-annealing temperature. The
selective etch is designed to remove TiN, a by-product of the formation anneal, and unreacted Ti, while maintaining
selectivity to TiSi2.
In the case of the 4Mb DRAM generation, the junction depth and the
deposition of selective silicon for the strap after junction formation
allowed the deposition of a Ti layer more than 600 Å
in thickness. A
conventional tube furnace was used for both anneals. The
high-temperature reflow annealing process, used to planarize the BPSG
(boron-phosphosilicate glass) first passivation insulator in the
early version of the 4Mb DRAM process, caused agglomeration, and was
eventually eliminated in favor of chemical-mechanical polishing (CMP),
as described below in the section on wiring and insulation technology.
The 16Mb generation required a gate electrode encapsulated with oxide
in order to fabricate the strap (see above), so TiSi2
salicidation, which requires an exposed gate electrode, could not be
used for the gate. A WSi2 (tungsten polycide)-polysilicon
sandwich, which was capped with oxide before gate patterning, was used
to reduce the gate electrode resistance. A high-temperature anneal was
required after patterning to reduce the resistance of the
WSi2. The diffusion metallization remained TiSi2. The occurrence
of G-D shorts decreased and yield
improved because the gate was encapsulated in SiO2.
However, this benefit was obtained at the cost of increased gate stack
height, which complicated the gate etch process.
The shallower diffusions for the 16Mb DRAM generation reduced the
maximum allowable Ti deposition for the salicide to less than 480 Å.
The reduced TiSi2 thickness made agglomeration a major
consideration in process development. Polish planarization was employed
to reduce the back-end-of-line process temperature. Furthermore, it
became difficult to obtain the low-resistance C54 phase on narrow
diffusions. In fact, it was found that the onset of agglomeration
preceded the transformation of narrow TiSi2 lines to the
C54 phase using conventional furnaces for the second anneal. Rapid
thermal annealing (RTA) can bring the TiSi2 to a
temperature high enough to convert the narrow lines, for a time short
enough to avoid agglomeration, and is now used instead of a
conventional furnace.
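The effect of the thinner silicide and of the phase transformation on conductor resistance can be estimated from the sheet resistance, Rs = resistivity / thickness. The following is a rough sketch, not the authors' calculation. It assumes that the deposited Ti is fully converted to a uniform film and that the silicide is about 2.5 times as thick as the deposited Ti (a commonly quoted approximation, not a value from this paper). The resistivities are midpoints of the C49 and C54 ranges given above, and the 600-Å and 480-Å Ti thicknesses are the limits quoted for the 4Mb and 16Mb generations.

```python
# Sheet resistance estimate for TiSi2 formed from a deposited Ti layer.
# ASSUMPTION: silicide thickness ~ 2.5 x deposited Ti thickness.
SILICIDE_PER_TI = 2.5

def sheet_resistance(resistivity_uohm_cm, ti_thickness_angstrom):
    """Return sheet resistance in ohms per square."""
    silicide_cm = ti_thickness_angstrom * SILICIDE_PER_TI * 1e-8  # 1 A = 1e-8 cm
    return resistivity_uohm_cm * 1e-6 / silicide_cm

C49_UOHM_CM = 65.0   # midpoint of the 60-70 range quoted in the text
C54_UOHM_CM = 17.5   # midpoint of the 15-20 range quoted in the text

for ti in (600, 480):
    rs_c49 = sheet_resistance(C49_UOHM_CM, ti)
    rs_c54 = sheet_resistance(C54_UOHM_CM, ti)
    print(f"Ti {ti} A: C49 {rs_c49:.2f} ohm/sq, C54 {rs_c54:.2f} ohm/sq")
```

With these assumed values, a line left in the C49 phase has several times the sheet resistance of a converted line, which is why incomplete transformation on narrow diffusions matters.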
Before RTA could be used in manufacturing, temperature control had to
be improved. Conventional RTA temperature control relied on pyrometric
temperature measurement of the wafer back side for feedback. Wafer
temperature was not well controlled with this system, because the
thickness variation of oxide and nitride films left on the wafer back
side led to variations in emissivity. As a result, a power control
strategy (open loop on temperature) was developed to make RTA a
practical manufacturing tool
[29].
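The sensitivity of pyrometric feedback to back-side emissivity can be sketched with the Wien approximation for a single-wavelength pyrometer. This is a textbook relation offered for illustration, not the control analysis of [29]. The 0.95-µm pyrometer wavelength, the 800°C wafer temperature, and the emissivity values are hypothetical.

```python
import math

C2 = 1.4388e-2  # second radiation constant, m*K

def apparent_temperature(true_temp_k, actual_emissivity, assumed_emissivity,
                         wavelength_m=0.95e-6):
    """Temperature reported by a single-wavelength pyrometer (Wien approximation)
    when the wafer emissivity differs from the value the instrument assumes.
    The default wavelength is a hypothetical choice."""
    inverse = (1.0 / true_temp_k
               - (wavelength_m / C2) * math.log(actual_emissivity / assumed_emissivity))
    return 1.0 / inverse

true_temp = 1073.0  # K, hypothetical anneal temperature (800 C)
for emissivity in (0.6, 0.7, 0.8):
    error = apparent_temperature(true_temp, emissivity, 0.7) - true_temp
    print(f"actual emissivity {emissivity}: reading error {error:+.1f} K")
```

In this sketch, an emissivity shift of 0.1 moves the reading by roughly 10 K, and a closed loop would drive the wafer off target by a similar amount. This is consistent with the motivation given above for controlling lamp power rather than measured temperature.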
RTA was also used for high-temperature annealing (above 1050°C) of
the WSi2 polycide gate stack. This reduced the resistance
of the WSi2 and relaxed stress in the thermal oxides,
while the short annealing time minimized perturbation of the
well-doping profiles. These objectives could not be accomplished by
furnace annealing.
Salicidation is an essential feature of present and future
high-performance logic technologies. The lessons learned in DRAM
development have been useful in understanding the problems of scaling
line widths and junction depths and refining the silicide manufacturing
processes. Although salicided diffusions are not planned for 64Mb and
256Mb generations, the use of WSi2 polycide is expected to
continue in DRAM generations beyond 16 Mb.
Metal wiring and insulation technology
The 4Mb DRAM introduced innovations in metal wiring and insulation
technology [often referred to as the back-end-of-line (BEOL) technology]. These processes include
chemical-mechanical polishing (CMP) of insulators, chemical vapor
deposition (CVD) of tungsten, and deposition/etching of interlevel
dielectrics. The introduction of the borderless contacts in the 64Mb
and 256Mb generations (as described in the section on cell evolution)
presents significant technological challenges requiring new processes
and equipment.
Prior to the 4Mb DRAM generation, the BEOL had tapered contacts and
sputtered Ti-Al-Cu-Si wiring. The contact sidewalls were tapered
because the sputtered Al-based wiring had poor step coverage when walls were vertical. However,
the shallow step angle required for reliability affected cell size
as the technology was scaled.
It was necessary for the contact hole sidewalls from the bit lines
to the diffusions below to be nearly vertical in order to achieve the
desired cell size. CVD tungsten, because of its superior step coverage,
was selected for the bit-line metallurgy. A single conformal
deposition of tungsten was introduced, both to fill contacts and to
provide the first layer of wiring. This layer was initially patterned
using RIE. The challenge of filling the spaces between tungsten lines
with nearly vertical sidewalls was met using a process with a repeating
sequence of plasma-enhanced chemical vapor deposition (PECVD) of silicon dioxide followed
by argon sputtering, which provided void-free insulator fill.
This interlevel dielectric (ILD) was planarized using CMP,
which resulted in a very flat surface for the second level of
metal, avoiding lithography problems due to steps. Vias between the two
metal levels were also nearly vertical. CVD tungsten was deposited
again to fill nearly vertical vias. The superior conductivity of
aluminum was needed for the second level of metal, so tungsten was
removed from the insulator surface with an RIE etchback, leaving just
the vias filled.
Several improvements have been made to the initial 4Mb process. RIE
patterning of tungsten bit lines was replaced with a dual damascene
approach which reduced defect levels associated with tungsten
etching and deposition using 4Mb generation equipment
[30]. This
process begins with etching both contacts and troughs for the wiring
into a thick, planarized SiO2 layer. Tungsten is deposited
and then removed from the surface using CMP. This leaves the wiring
inlaid in the insulator. Because the surface is now flat, there is no
need for an expensive deposition/etch insulator or CMP between the
first and second layer of wiring, so this is replaced with a single
deposition. Vertical vias are still filled with tungsten, but it is
removed from the surface with CMP instead of RIE, eliminating a
difficult etching depth control problem and related RIE-induced
defects, and dramatically reducing the sensitivity to tungsten
deposition defects.
Technology elements developed for the 4Mb generation were also used in
the 16Mb process, but were integrated differently. Phosphosilicate
glass (PSG), used for first-level passivation, is deposited using a
sequence of PECVD deposition and argon sputtering like the ILD process
developed for the 4Mb generation
[31]. The lower processing
temperature with this approach is necessary to prevent agglomeration of
the thin TiSi2 used on junctions. The PSG is CMP
planarized, and vertical contacts are filled with tungsten using the
4Mb via fill process. The 16Mb generation is again a metal bit-line
design, but aluminum-based metal is used instead of tungsten, because
superior conductivity is needed for support circuit performance. The
insulator between the first and second levels of metal is deposited
using a deposition/etch sequence optimized to fill tight spaces between
lines and to provide smoothing
[32]. This is important,
because no additional planarization is used. More traditional tapered
vias without tungsten fill are used. The second-level metal is used
only for support circuit wiring in our 16Mb designs, so the design
density penalty with this simpler process is small (0.5%). The
manufacturing costs for the ILD and contacts are 40% less with the
nonplanar, tapered-via approach.
Recent work on wiring technology for 256Mb DRAMs has demonstrated that
vertical and horizontal scaling works effectively down to the
0.25-µm level. Figure 16 summarizes the
evolution of metal levels in the IBM DRAM generations. Heights and
spacings are reduced together to keep the aspect ratio around 1.0. This
prevents void formation during deposition of interlevel dielectrics,
and keeps the metal line-to-line capacitance under control, which is
especially important in the case of the bit line. The
first-level-metal wiring layer, which forms the bit line in the 256Mb
DRAM, is made by the tungsten damascene process scaled to 0.20-µm
thickness. The second wiring layer in the 256Mb generation is a
TiN/Al-0.5% Cu/Ti sandwich with a total thickness of 0.3 µm.
Figure 16
The production of the borderless contact
(Figures 7 and 8) required
for the 64Mb and 256Mb generations (see the section on DRAM
structure evolution) presents a formidable challenge. The deposited
layer which will form the gate electrode must include an insulating cap
layer, which remains after gate formation. Subsequently, insulating
sidewall spacers complete the encapsulation of the gate. A special
etch-stop layer is deposited just underneath the passivation. This
layer must withstand the process of etching the diffusion contact hole
through the glass passivation so that the insulating cap layer covering
the adjacent gate remains intact. The etch chemistry is then altered to
remove the etch-stop layer. The etch-stop thickness is limited by
aspect ratio and contact resistance requirements to less than 700 Å,
so etch selectivities greater than 25:1 are required on topographical
as well as planar surfaces.
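The link between etch-stop thickness and required selectivity can be made concrete with a simple budget: the stop layer lost equals the oxide-equivalent etch it is exposed to divided by the selectivity. The sketch below is illustrative only. The 700-Å limit and the 25:1 and 40:1 selectivities appear in this section; the 10:1 case and the 10 000-Å oxide-equivalent exposure (etch depth plus overetch) are hypothetical.

```python
# Etch-stop budget: stop-layer loss = oxide-equivalent exposure / selectivity,
# where selectivity = (passivation etch rate) / (etch-stop etch rate).

STOP_BUDGET_ANGSTROM = 700.0  # maximum etch-stop thickness quoted in the text

def stop_layer_loss(oxide_equivalent_etch_angstrom, selectivity):
    """Thickness of etch-stop consumed, in angstroms."""
    return oxide_equivalent_etch_angstrom / selectivity

exposure = 10_000.0  # hypothetical oxide-equivalent exposure of the stop layer, A
for selectivity in (10, 25, 40):
    loss = stop_layer_loss(exposure, selectivity)
    verdict = "within" if loss < STOP_BUDGET_ANGSTROM else "exceeds"
    print(f"{selectivity}:1 -> {loss:.0f} A lost ({verdict} the 700 A budget)")
```

Because sputtering lowers the effective selectivity at the gate corner, the margin on topography is smaller than this planar estimate suggests.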
One etch stop that has been used is a silicon nitride/thin polysilicon
sandwich. The polysilicon is an excellent barrier to
SiO2 glass passivation etching using conventional dry
etching processes [33].
The polysilicon is oxidized after the glass
passivation has been etched. The thin silicon nitride layer underneath
the polysilicon provides an oxidation barrier. This process, however,
has undesirable side effects. The oxidation creates stress and leaves
unoxidized polysilicon filaments; and the polysilicon film increases
the contact aspect ratio, making it difficult to fill the contact
hole with tungsten without voids.
Recent advances in etch technology have made it possible to use silicon
nitride for the barrier layer, eliminating the need for oxidation
[34].
Selectivity greater than 40:1 between SiO2 and Si3N4
has been obtained on planar surfaces.
Although selectivity at the gate corner is reduced by sputtering, it
can be improved by controlling the ratio of polymer deposition to ion
bombardment during the etch. Continued development of this process is
expected to result in an effective borderless contact process for the
256Mb generation.
The wiring technology developed in DRAMs has also been useful in
logic products. The tungsten-stud/aluminum-based metal
combination is the basis for the dense back-end-of-line wiring in
the IBM CMOS IV and CMOS V logic technologies. The low-cost nonplanar
tapered-via approach is used in CMOS V logic technologies for the final
metal layer, in cases when high-density wiring is not required.
Meeting DRAM-specific leakage requirements
Low parasitic leakage currents are important to the DRAM product
because of the cell retention time requirements, which have increased
from 16 ms in the early 4Mb generation to as much as 256 ms for current
low-power products. The use of trench technology in DRAMs resulted
in some unique leakage problems due to device structure and
processing. Their solution required simulation and extensive
experimentation. This section first describes gate-induced drain
leakage (GIDL), which caused array n-well-to-substrate leakage problems
in 4Mb and 16Mb chips. Next we describe how the three-dimensional
geometry of the cell device increased GIDL leakage between the cell
node diffusion and the n-well in the 16Mb cell, and what was done
to solve the problem. Finally, we show how retention time (RT) problems
due to dislocations present during the early production of DRAMs with
trenches were solved.
Well-to-substrate GIDL
Gate-induced drain leakage (GIDL) refers to the diffusion-to-substrate leakage generated in the gate-to-diffusion overlap region
of a gated diode structure or FET device. The leakage increases as the
gate is biased to deplete the surface of the diffusion. GIDL
influenced technology design through its effect on
array well-to-substrate leakage and cell retention time.
During the development of the 4Mb DRAM, it was found that as the
trench-fill-to-substrate bias was increased, a significant current was
observed between the n-well and the substrate terminals
(Figure
17). This is due to the generation of electrons in the
depletion region formed along the trench dielectric-silicon substrate
interface, and their collection by the n-well. The polysilicon fill in
the trench plays the role of the gate in a gated diode configuration.
This current had the effect of overloading the n-well generator at the
high voltage and temperature used for burn-in.
Figure 17
GIDL is generated by more than one mechanism. The graph of
Figure
18 shows trench-fill-gated diode leakage versus
trench-fill-to-substrate voltage at various temperatures, for the 4Mb
DRAM technology. This graph indicates the existence of at least two
mechanisms. Above 4 V, a highly voltage-dependent component of the
current with weak but noticeable temperature dependence appears. At
lower voltages, the current has a much stronger temperature dependence
and a reduced voltage dependence
[35, 36].
The same low-voltage
thermal generation mechanism was also observed on large-perimeter n- and p-channel MOSFET test structures, where it was shown to
be a function of dielectric thickness and gate-to-drain overlap
[37].
Concurrently, researchers in universities were pursuing
parallel research efforts
[38-40]. Research efforts outside IBM
focused on MOSFET GIDL in high-field band-to-band tunneling and the
avalanche breakdown regime; within IBM, the effort was more focused on
the low-voltage thermal regime.
Figure 18
A model for GIDL was added to the IBM finite-element device analysis
simulator FIELDAY II
[41, 42].
It includes band-to-band
tunneling, as well as electric-field-dependent thermal generation
processes, such as Frenkel-Poole generation and trap-to-band
tunneling models.
The simulator results were used to design the 16Mb cell to eliminate
the trench-fill bias-induced n-well current described above. The
solution was to extend the thick insulating collar below the n-well to
create a potential barrier isolating the n-well from the heavily doped
p-type bulk where the electrons were generated (Figure
19). The simulator helped to determine the minimum
collar depth necessary to eliminate the leakage
[43].
Figure 19
DRAM cell device GIDL
The use of trenches for isolation, begun with the 16Mb DRAM,
creates a unique high-field region, three-dimensional in nature, at the
intersection of the gate, the drain, and the isolation
(Figure
20). The isolation of the DRAM cell device is actually
formed by the storage trench collar. GIDL in this structure was
simulated using FIELDAY II
[42, 44].
This simulator allows complex
geometries to be simulated, and new physical models to be added
easily, because it is highly modular, separating a problem into
generalized geometric definition, equation setup, and solution
phases. Figure 21 shows a plot of the simulated
electric field at the device surface, which is enhanced at the
"corner" where the gate and isolation intersect. This high
electric field increases GIDL.
Figure 20
Figure 21
The existence of the high GIDL current at the corner is shown
experimentally by plotting the measured MOSFET GIDL current for
trench-isolated devices of differing width versus device width,
together with the MOSFET GIDL current measured from a very wide device
scaled for width (Figure 22).
As devices become
narrower, the measured GIDL current tends to be independent of width
because of the current generated at the high-field region described
above. For wide devices, the current approaches the wide-device current scaled for width.
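The width dependence described here is what a two-component model would predict: an edge term proportional to device width plus a width-independent corner term. The sketch below illustrates that reading and is not the FIELDAY II model. The per-micron edge current and the corner current are hypothetical values chosen only to show the trend.

```python
# Two-component GIDL model: I(W) = j_edge * W + I_corner.
# Both parameter values below are hypothetical.

def gidl_current(width_um, edge_current_per_um, corner_current):
    """Total GIDL current: width-scaled edge term plus fixed corner term."""
    return edge_current_per_um * width_um + corner_current

J_EDGE = 1.0    # fA per um of device width (hypothetical)
I_CORNER = 5.0  # fA from the gate/isolation corners (hypothetical)

for width in (0.5, 1.0, 10.0, 100.0):
    total = gidl_current(width, J_EDGE, I_CORNER)
    scaled_wide = J_EDGE * width  # wide-device current scaled for width
    print(f"W = {width:6.1f} um: total {total:6.1f} fA, "
          f"scaled wide-device {scaled_wide:6.1f} fA")
```

For narrow devices the corner term dominates and the total changes little with width, while for wide devices the total approaches the scaled wide-device current.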
Figure 22
The GIDL current was simulated with variations in structural and device
design parameters which influence GIDL. They include trench corner
shape, junction grading, gate oxide thickness, and gate wrap-around of
the trench sidewall. Model parameter calibration was accomplished by
fitting the minority carrier lifetime and band-to-band tunneling coefficients of a 2D model to the measured current from a wide gated
diode without corners. The 2D and 3D model geometries were
obtained from SEM micrographs. Some of the simulation results are
shown in Figure 23, a graph of the 3D GIDL
corner current vs. gate-to-drain voltage difference for different
junction profiles [45].
A GIDL current of less than 2 fA per cell was
achieved by using 3D device simulation and experimentation to guide
optimization of the drain doping profile and controlling critical
process steps.
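The significance of a 2-fA limit can be estimated from the voltage lost on the storage node over one retention interval, dV = I x t / C. The sketch below is a back-of-the-envelope illustration. The 2-fA current and the 256-ms retention time appear in this section; the 40-fF storage capacitance is a hypothetical value, and other leakage paths are ignored.

```python
# Storage-node voltage droop from a constant leakage current: dV = I * t / C.

def storage_node_droop(leakage_a, retention_s, cell_capacitance_f):
    """Voltage lost on the storage capacitor over one retention interval."""
    return leakage_a * retention_s / cell_capacitance_f

GIDL_A = 2e-15        # 2 fA per cell, from the text
RETENTION_S = 0.256   # 256 ms, from the text
CELL_CAP_F = 40e-15   # hypothetical storage capacitance

droop = storage_node_droop(GIDL_A, RETENTION_S, CELL_CAP_F)
print(f"droop over one retention interval: {droop * 1e3:.1f} mV")
```

Under these assumptions the GIDL contribution is on the order of ten millivolts per retention interval, which leaves most of the leakage budget for other mechanisms.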
Figure 23
Solution of dislocation-related
retention problems
In the development of the 4Mb cell, leakage due to dislocations
and variable retention time (VRT) proved to be the most important
retention time problems in the early stages of development. (VRT is a
phenomenon in which retention time fluctuates with time.) The solutions
to these problems were used in subsequent generations.
Stress, combined with nucleation sites in the silicon, can create
dislocations in some cells. When these dislocations cut through the
storage node diffusion of the cell, leakage is generated along this
defect, reducing retention time. Dislocations were generated in the 4Mb
DRAM cell at the intersection of the deep-trench storage node,
LOCOS isolation, and the ion-implanted junction, where stress and
nucleation sites coincide [Figure 24(a)]. The
greatest stress around the trench is created by the oxidation of the
trench sidewall during LOCOS and subsequent processes, when a
vertical bird's beak pushes out from the trench [Figure
24(b)]. A variety of modeling tools were used to
understand how to improve the cell design from the standpoint of
minimizing stress. FEDSS (Finite Element Diffusion Simulator System)
was used with simple analytic approximations to evaluate the stresses
due to oxidation around the SPT cell [46].
Figure 24
TEM pictures showed that dislocations usually begin at nucleation sites
in the source-drain implant area in front of the deep trench
(Figure 25).
Chips with low retention times had
many dislocations, and chips with long retention times had few.
Although perfect correlation between dislocations and individual
retention time failures was not found, no dislocations were found in
chips having RTs of 750 ms and greater.
Figure 25
A number of process modifications were made to eliminate dislocations.
A high-temperature RTA, inserted just before source-drain implantation
to relieve stress by allowing oxide to flow, resulted in a drastic
reduction in the number of dislocations. Some improvement in the number
of point nucleation defects created by the source-drain ion implant
was made by implanting directly into the silicon instead of through a
screen oxide. By this change, incidental free oxygen from the screen
oxide was eliminated as a postimplant interstitial.
Heavy-metal (Fe, Ni, Cu, Mo) contamination, which electrically
activates and increases the density of nucleation point defects, has
been reduced by nearly two orders of magnitude since the beginning of
4Mb DRAM manufacturing. Metal contamination monitoring has become
an integral part of tool and wet-process monitoring.
The layout of the trench, which can influence stress, was modified a
few times during the 4Mb development. The first time, an experimental
approach was taken, with test masks and fabrication experiments; it took
a year to produce a design. Later in the program, when photo exposure
changed and a new trench layout was needed, a simulation program,
Boundary Element Design System (BEASY)
[47], was used to design
it. A simple model (an expanding plug in a hole) calculated the
stress on the trench face versus shape and guided a mask change that
successfully reduced the stress
[48]. This took less than three
months and resulted in a significant increase in yield
[49].
Researchers at IBM discovered VRT on DRAM cells shortly after Yaney et
al. reported the phenomenon in 1987
[50]. A VRT cell may cause a
field failure or may be unimportant, depending on the application
and the minimum retention time of the VRT cell. Extensive testing on
DRAMs from 64Kb to 16Mb from many vendors and technology types has
shown that VRT cells exist on all of them. Fortunately, VRT instability
is usually seen only at high application temperatures and at very long
retention times compared to the application specification.
Two types of VRT cells were found: two-state VRT, where the retention
time of the cell falls randomly into only two stable exponential
probability distributions, and multistate VRT, where there is a
continuous range of retention times and the probability distribution
looks Gaussian on a logarithmic scale of retention time. Thermal
activation energies for the leakage current of high and low states in
the two-state VRT cell have been measured at about 1.0 eV for the high
state and 0.9 eV for the low RT state. The frequency of switching
between high and low states is also thermally activated
[51].
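The measured activation energies indicate how strongly the leakage in each state depends on temperature. Assuming simple Arrhenius behavior, I proportional to exp(-Ea/kT), the sketch below computes the factor by which leakage rises between two temperatures. The 1.0-eV and 0.9-eV values are those quoted above; the 25°C and 85°C temperatures are hypothetical.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant, eV/K

def leakage_ratio(activation_energy_ev, temp_cold_c, temp_hot_c):
    """Factor by which Arrhenius-activated leakage rises from cold to hot."""
    t_cold = temp_cold_c + 273.15
    t_hot = temp_hot_c + 273.15
    return math.exp(activation_energy_ev / K_B * (1.0 / t_cold - 1.0 / t_hot))

for label, ea in (("high RT state", 1.0), ("low RT state", 0.9)):
    ratio = leakage_ratio(ea, 25.0, 85.0)
    print(f"{label} (Ea = {ea} eV): leakage rises about {ratio:.0f}x from 25 C to 85 C")
```

If retention time is taken as inversely proportional to leakage, it falls by the same factor. The two states therefore converge somewhat as temperature rises, since the state with the higher activation energy degrades faster.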
Physical failure analysis of VRT cells has shown dislocations and
silicon crystal defects only on cells that spend large amounts of time
below 300 ms. For VRT cells with minimum RT greater than 300 ms, no
defects are found.
Chips with low minimum VRT typically have many weak cells for ordinary
RT as well (Figure 26). This provides a method
of screening product to prevent customers from getting product with low
minimum VRT. Product with a 16-ms retention time requirement was
screened for retention times many times longer. A high-temperature module test used to screen for VRT during burn-in provided
data on the effectiveness of wafer-level screens.
Figure 26
As the retention time specifications increased to 32 ms and greater, it
was found that the wafer-level RT guardband could be reduced without
degrading VRT performance. VRT problems were eliminated as a result of
drastic reduction in the occurrence of dislocations and the improvement
in retention time distributions.
Conclusions
The substrate plate trench (SPT) cell used for the 4Mb DRAM has
been the basic concept for three successive DRAM generations at IBM, as
the cell area has been reduced from
11.3 µm² to 0.6 µm².
The outstanding advantages of the SPT cell are more
planar topography and reduced junction leakage and SER.
To obtain the cell size reductions required by competition, the minimum
lithographic image size has been reduced by 0.7×
at each generation,
and new features have been added to the cell technology. The
increase in process complexity with each generation, and the
resultant increase in development cost, have necessitated the use of
alliance partnerships to reduce the investment required by one
company alone, beginning with the 64Mb generation.
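A short calculation shows why lithography alone does not account for the cell-size reduction. A 0.7× linear shrink reduces area by 0.49× per generation. The sketch below applies that factor to the 11.3-µm² starting cell area quoted above; treating 4Mb to 256Mb as three generational steps is an assumption.

```python
# Area reduction from lithographic shrink alone: area scales as (linear shrink)^2.

def area_from_lithography(area_start_um2, linear_shrink, generations):
    """Cell area after the given number of generations of pure linear scaling."""
    return area_start_um2 * (linear_shrink ** 2) ** generations

AREA_4MB_UM2 = 11.3    # from the text
AREA_256MB_UM2 = 0.6   # from the text
STEPS = 3              # 4Mb -> 16Mb -> 64Mb -> 256Mb (assumed count)

litho_only = area_from_lithography(AREA_4MB_UM2, 0.7, STEPS)
print(f"lithography alone: {litho_only:.2f} um^2")
print(f"remaining factor from cell design: {litho_only / AREA_256MB_UM2:.1f}x")
```

On this estimate, roughly a further factor of two in area has to come from the new cell features rather than from the image-size reduction.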
The first generations of I-line and DUV lithography were developed
along with 4Mb and 16Mb DRAM, respectively. Additional lithography
improvements are being developed as 64Mb and 256Mb development
proceeds.
The important cell features which enable shrinkage of the SPT cell are
the addition of a thick SiO2
collar around the top part of
the trench, the borderless contact, and new methods of forming the
strap contact between storage trench fill and node diffusion.
The 4Mb and 16Mb DRAMs have p-MOS arrays in retrograde n-wells
implanted in a p-epitaxial layer on a p+ substrate, which serves as
the plate of the storage capacitors. With the 64Mb generation, an n-MOS
array is used to improve performance, and the buried n+ layer that
forms the plate is diffused from the bottom part of the trench into a
p-substrate.
Many technology features first introduced in 4Mb and 16Mb DRAMs are
also important to advanced logic, including shallow-trench isolation,
polish-planarization techniques, retrograde n-wells, and planarized
wiring technology. The wiring technology developed for the 0.7-µm
4Mb DRAM technology is found to be scalable to 0.25-µm, 256Mb
technology.
The need for DRAM retention time improvements drives the study of
important leakage phenomena. GIDL was studied in connection with a
storage-trench-gated n-well-to-substrate leakage that affected n-well bias generation at burn-in conditions in the 4Mb and 16Mb chips, and with a cell node leakage
mechanism affected by 3D geometry in the 16Mb cell. Dislocation
defects, which appeared in the early stages of trench process
development, were found to cause retention time loss and gave rise to
the variable retention time phenomenon. These problems were solved
after simulation suggested techniques which eliminated the
dislocations.
Use of the same basic cell concept for four generations has encouraged
the transfer of knowledge and process techniques from one generation to
the next. At this point, the cell design to be used for the 1Gb DRAM
chip has not yet emerged.
Acknowledgments
This paper would not be complete without acknowledgment that the
developments described in this paper are the work of IBM Research, the
IBM DRAM development teams in Essex Junction, Vermont, and the Advanced
Semiconductor Technology Center in East Fishkill, New York, including
the Siemens and Toshiba alliance partners. The references cited in this
publication are only a small part of what was contributed by the
workers from these laboratories. There are too many people to name
individuals. The principal author would also like to thank the
anonymous referees and editors of the IBM Journal of Research and
Development for their constructive suggestions, and his manager
for support during the writing of this paper.
References
Received July 25, 1994; accepted for publication October 14, 1994
|