Journal of Research and Development  
Volume 39, Numbers 1/2, 1995
IBM CMOS Technology
DOI: 10.1147/rd.391.0167

The evolution of IBM CMOS DRAM technology

by E. Adler, J. K. DeBrosse, S. F. Geissler, S. J. Holmes, M. D. Jaffe, J. B. Johnson, C. W. Koburger III, J. B. Lasky, B. Lloyd, G. L. Miles, J. S. Nakos, W. P. Noble, Jr., S. H. Voldman, M. Armacost, and R. Ferguson
The development of DRAM at IBM produced many novel processes and sophisticated analysis methods. Improvements in lithography and innovative process features reduced the cell size by a factor of 18.8 in the time between the 4Mb and 256Mb generations. The original substrate plate trench cell used in the 4Mb chip is still the basis of the 256Mb technology being developed today. This paper describes some of the more important and interesting innovations introduced in IBM CMOS DRAMs. Among them, shallow-trench isolation, I-line and deep-UV (DUV) lithography, titanium salicidation, tungsten stud contacts, retrograde n-well, and planarized back-end-of-line (BEOL) technology are core elements of current state-of-the-art logic technology described in other papers in this issue. The DRAM-specific features described are borderless contacts, the trench capacitor, trench-isolated cell devices, and the "strap." Finally, the methods for study and control of leakage mechanisms which degrade DRAM retention time are described.
Introduction

The 4Mb DRAM generation saw a revolutionary change in technology at IBM, with the introduction of CMOS, trench capacitor storage, and other new processes and structures. Although rapid progress continues, the basic cell structures and many of the processes developed then are being used in the 64Mb and 256Mb DRAMs being developed today. In addition, much of the technology developed for the 4Mb and 16Mb DRAMs is now used in CMOS logic technology. This paper describes the DRAM cell used by IBM beginning with the 4Mb generation, and traces its evolution to the 256Mb cell being developed today. We then describe the development of some key technology elements, and explain how key DRAM device problems were solved.

Dynamic random access memory has been a good vehicle for technology development, because there is a predictable demand for a large number of chips of standard design. The density of the array, a well-understood benchmark which determines cost, is a very effective driver of technology development. The addressability and repetitive character of the array make it possible to find and solve technology problems in the product. The high volume allows employment of the team of experts required to do a thorough development job. Thus, DRAM is the product that has driven the state of the art of silicon device technology up to the present day.

The one-device DRAM cell [1], invented at IBM by R. Dennard, consists of a cell transistor with the drain connected to one node of the cell storage capacitor, the source connected to a bit line, and the gate connected to the word line, which runs orthogonal to the bit line (Figure 1). The requirement to have a large capacitor in a small space with low leakage is the main driver of DRAM technology. A brief description of the cell operation will help to explain why. To write, the bit line is driven to a high or low logic level with the cell transistor turned on, and then the cell transistor is shut off, leaving the capacitor charged high or low. Since charge leaks off the capacitor, a maximum refresh interval is specified. To read or refresh the data in the cell, the bit line is left floating while the cell transistor is turned on, and the small change in bit-line potential is sensed and amplified to a full logic level. The ratio of cell capacitance to bit-line capacitance, called the transfer ratio, which ranges from about 0.1 to 0.2, determines the magnitude of the change in bit-line potential. A large cell capacitance is needed to deliver an adequate signal to the sense amplifier.
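
The charge-sharing arithmetic behind the transfer ratio can be sketched in a few lines. This is a minimal illustration, not the authors' design method: it assumes the bit line is precharged to half the supply voltage and uses a hypothetical 3.3-V supply, and the function name `bitline_signal` is introduced here only for illustration.

```python
def bitline_signal(v_cell: float, v_precharge: float, transfer_ratio: float) -> float:
    """Change in bit-line voltage after the cell transistor turns on.

    transfer_ratio is C_cell / C_bitline, as defined in the text.
    Charge conservation: C_s*V_s + C_b*V_b = (C_s + C_b)*V_final, so
    V_final - V_b = (V_s - V_b) * C_s / (C_s + C_b) = (V_s - V_b) * r / (1 + r).
    """
    r = transfer_ratio
    return (v_cell - v_precharge) * r / (1.0 + r)


if __name__ == "__main__":
    VDD = 3.3        # hypothetical supply voltage (V)
    V_PRE = VDD / 2  # assumed half-VDD bit-line precharge
    for r in (0.1, 0.2):  # transfer-ratio range quoted in the text
        dv = bitline_signal(VDD, V_PRE, r)  # cell storing a "1" at VDD
        print(f"transfer ratio {r:.1f}: signal {dv * 1000:.0f} mV")
```

Under these assumptions the sense amplifier sees only about 150 mV at a transfer ratio of 0.1 and about 275 mV at 0.2, which is why the cell capacitance cannot be allowed to shrink along with the cell area.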

Figure 1

The evolution of DRAM technology has followed several overall trends.

DRAM cell size has decreased from 11.3 µm² for the first 4Mb cell to 0.6 µm² for the first 256Mb cell. Improvements in lithography were responsible for much but not all of the size reduction. New process features were also necessary to shrink the cell and to improve array performance. As a result, there has been a trend toward increased process complexity, as reflected in the number of masking steps used in the process, which has increased from 13 in the 4Mb generation to 25 in the 256Mb generation.

The increase in the complexity of DRAM technology has driven up the cost of DRAM development, resulting in the formation of alliances between companies to reduce the expense to individual companies. The IBM 64Mb DRAM is being developed by an alliance between IBM and Siemens, and the 256Mb by a triple alliance including IBM, Toshiba, and Siemens.

DRAM external power supplies follow industry standards. Because the chip power is low enough, DRAM can use on-chip power supply regulation to reduce the internal circuit power supply swings. IBM has led the industry in reduction of power supply voltages for CMOS logic and memory. DRAM technology resists power supply scaling more than logic technology because of the need for storage of charge. Table 1 illustrates this trend.

Power supply voltage reduction will come rapidly, since the market for battery-operated equipment is growing faster than previously anticipated. Also, performance competition in microprocessors demands ever-shorter channel lengths, which in turn require lower power supply voltages as devices are scaled. DRAM chips and technology will be similarly forced to operate at lower power supply voltages in the near future.

The market for battery-operated equipment also creates a need for longer DRAM retention times, to reduce the power associated with refreshing the data. The data retention time specification is currently 64-256 ms, making very low leakage current a requirement, along with a large cell capacitance.
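
To see why the retention specification forces very low leakage, the allowable average leakage current can be estimated from I = C·ΔV/t. This is a rough sketch under hypothetical assumptions: the 30-fF cell capacitance and the 0.5-V tolerable loss of stored voltage are illustrative values, not figures from this paper, and the helper name `max_leakage_current` is introduced here.

```python
def max_leakage_current(c_cell_farads: float, allowed_droop_volts: float,
                        retention_seconds: float) -> float:
    """Largest average leakage current (A) that keeps the stored-voltage loss
    within allowed_droop_volts over one retention interval: I = C * dV / t."""
    return c_cell_farads * allowed_droop_volts / retention_seconds


if __name__ == "__main__":
    C_CELL = 30e-15  # hypothetical cell capacitance (F)
    DROOP = 0.5      # hypothetical tolerable loss of stored voltage (V)
    for t_ms in (64, 256):  # retention specification range from the text
        i = max_leakage_current(C_CELL, DROOP, t_ms / 1000)
        print(f"{t_ms} ms retention: leakage must stay below {i * 1e15:.0f} fA")
```

With these numbers the budget is only a few hundred femtoamperes at 64 ms and tens of femtoamperes at 256 ms, so longer retention targets tighten the leakage requirement in direct proportion.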

The cell capacitor was a simple planar structure through the 1Mb generation. At and beyond the 4Mb generation, as the cell size decreased, the effective surface area of the capacitor was maintained by placing the capacitor on the sides of a narrow trench etched into the silicon, or by putting the capacitor on top of the other elements of the cell.

The next section begins by explaining the development of the IBM 4Mb substrate plate trench (SPT) DRAM cell, at a cell size of 11.3 µm². We next show how important features were added for succeeding generations to reduce the cell size to 0.6 µm², where it now stands for the 256Mb chip. Succeeding sections trace in more detail the development of certain technology elements essential to DRAM. We start with the strap connection between the storage trench polysilicon and the node diffusion, a unique SPT DRAM requirement, which is a challenge for process integration. Then we discuss device isolation, retrograde n-well, salicidation, lithography, and metallization. Finally, solutions for various cell device design and retention time problems encountered during DRAM development are described. Included are gate-induced drain leakage (GIDL), three-dimensional device effects, dislocation-related leakage, and the variable retention time phenomenon.

DRAM cell structure evolution

The folded bit-line cell array configuration (Figure 2) has been used universally in the industry since the 1Mb time frame. In the folded bit-line configuration, a cell is crossed by two word lines and one bit line. One of the word lines (WL1 in Figure 2) is the "active word line" for the cell, and forms the gate of the cell device. The second word line (WL2), the "passing word line," is the gate of the cell device in the adjacent cell. Thus, the bit line (BL) and the reference bit line (BL̄) can be adjacent, leading to better matching and noise rejection, as well as providing a wider pitch for the layout of the sense amplifier. Although the cell now contains two word lines (active and passing), it does not require more area than an open bit-line cell, since the area under the passing word line is generally also used for the storage capacitor.

Figure 2

IBM adopted CMOS technology for DRAM at the 4Mb generation. Previously, DRAM had been implemented in simple n-MOS technology because the latter was relatively inexpensive. However, logic applications, which were sensitive to active power, had already migrated to CMOS. Integration of a DRAM cell structure into a CMOS technology brought with it some fundamental issues which had to be resolved before tackling the cell structure in detail. For an integrated DRAM technology, the doping types and profiles from the starting substrate up to the device gates had to be chosen and optimized to the best trade-off of cost, reliability, function, and speed. The most fundamental issue, however, was the one of choice between n-well CMOS on a p-type substrate and p-well on an n-type substrate. Two important conditions were set which the technology had to meet, and which still obtain for current and future generations:

  1. The array must be isolated from the substrate by building it within a well of opposite doping to take full advantage of CMOS. This benefits cell retention time by eliminating all leakage current sources associated with the substrate wafer. It reduces the incidence of soft errors due to ionizing radiation by confining the effective minority carrier collection length within the well and sending many of the generated minority carriers to the substrate, where they do not affect the storage node diffusion.
  2. The well potential must be stable despite the impact ionization that accompanies FET operation. This ionization is largest for an n-channel device. Therefore, n-MOS devices should not be positioned in a well whose conductivity is reduced by light doping or constrained depth. This constraint is satisfied by an n-well CMOS technology on a heavily doped p-type substrate.

A p-MOS DRAM array built in an n-well CMOS technology meets these conditions and was chosen for the 4Mb generation. The cell choice was then made within that framework.

The criteria for cell choice are density, process simplicity, adequate storage capacitance for a detectable signal, and low parasitic capacitances for performance and minimization of noise. Each generation of DRAM must compete with prior generations by providing an ultimately lower cost per bit. This is accomplished by decreasing cell size with each generation, while minimizing the increase in processing cost. The industry trend [2] is to reduce cell size to about one third (a factor of 0.33) with each generation. The industry trend in lithography is to reduce the minimum image size by a factor of 0.7 for each generation, so lithography alone would reduce the cell area by about 50% (0.7² ≈ 0.49) for each generation. Figure 3 shows the cell size vs. generation plotted together with the square of the minimum lithographic image. This shows that technological innovation, involving a change in cell structure, is needed in addition to lithographic scaling to reduce the cell size to one third with each generation. Also, technical advances are required to implement dimensional scaling (reduced heat cycles, film thicknesses, defect levels, etc.) and to mitigate electrical limitations arising from such scaling.
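
The scaling argument can be checked numerically. The sketch below uses only the factors and cell sizes quoted in this paper; treating the 4Mb-to-256Mb span as three equal generational steps and taking a geometric mean is an assumption made for the comparison.

```python
LITHO_SHRINK = 0.7  # minimum-image shrink per generation (from the text)
CELL_TREND = 0.33   # industry cell-area shrink per generation (from the text)

# Lithography alone scales both cell dimensions, so area scales by the square.
litho_area_factor = LITHO_SHRINK ** 2  # 0.49, i.e. about 50%
# Remaining per-generation shrink that must come from cell-structure innovation.
innovation_factor = CELL_TREND / litho_area_factor

# Cross-check with the IBM cell sizes quoted in the text.
CELL_4MB_UM2, CELL_256MB_UM2 = 11.3, 0.6
GENERATIONS = 3  # 4Mb -> 16Mb -> 64Mb -> 256Mb
overall_shrink = CELL_4MB_UM2 / CELL_256MB_UM2
per_generation = (CELL_256MB_UM2 / CELL_4MB_UM2) ** (1 / GENERATIONS)

if __name__ == "__main__":
    print(f"lithography-only area factor: {litho_area_factor:.2f}")
    print(f"required innovation factor:   {innovation_factor:.2f}")
    print(f"IBM 4Mb->256Mb shrink: {overall_shrink:.1f}x, "
          f"{per_generation:.3f} per generation (geometric mean)")
```

The geometric-mean shrink of roughly 0.38 per generation lies well below the lithography-only factor of 0.49, so cell-structure changes had to supply the difference.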

Figure 3

At the transition from 1Mb to 4Mb [3], planar capacitors did not provide enough cell capacitance, and were replaced by three-dimensional capacitors throughout the industry. These took the form of either trench capacitors buried within etched holes in the silicon [4-7] or stacked capacitors built above the silicon [8-12] in the region of the interconnect-level films.

The planar capacitor in the 1Mb and prior generations in IBM used an oxide/nitride/oxide (ONO) storage insulator consisting of a sandwich of thermally grown oxide, followed by deposited silicon nitride, which is subjected to oxidation to seal any weak spots in the nitride. Early experiments with deep-trench capacitors produced excellent results using the same ONO storage node insulator used in the 1Mb generation. Since the defect levels per unit area were much lower than predicted by experience with planar capacitors, trench capacitors were chosen for the 4Mb DRAM generation.

The 4Mb generation

The cross section of the IBM 4Mb cell is shown in Figure 4. The capacitor consists of the polysilicon storage node electrode which fills the trench, the ONO node dielectric on the trench walls, and the p+ substrate which forms the storage plate. Thus, there is no need for the separate plate wiring layer found in other cell types. The trench polysilicon node is connected to the array device diffusion pocket by a selective silicon epitaxy surface strap, which bridges the thin oxide separating the active area and the top surface of the storage node. This cell structure is referred to as the substrate plate trench (SPT) cell [13]. This type of cell differs from the standard industry trench cells, which either form the storage node in the silicon substrate outside the trench, or stack two polysilicon electrodes separated by the insulator inside the trench.

Figure 4

Active device areas are formed in a p-epitaxial layer grown on the p+ substrate. As shown in the layout of Figure 5(a), the active regions are separated by conventional isolation. Because the cell is in a well, a vertical parasitic p-FET is formed between the p+ storage node diffusion and the p+ substrate, with the trench polysilicon as the gate. This parasitic device is never turned on because the gate is tied to the p+ storage node diffusion, which is always the source of the p-FET, and the array n-well is back-biased at about 1 V above the power supply voltage.

Figure 5

The 16Mb generation

In the 4Mb cell, a localized oxidation of silicon (LOCOS) isolation region must separate a trench from an adjacent active device area to avoid parasitic sidewall currents gated by the storage node polysilicon, and the automatic strapping of all adjacent nodes and trenches which would otherwise occur. In the 16Mb cell, this limitation was overcome by a modification of the trench structure. Figure 6 is a cross section of the 16Mb cell, showing that the insulator lining the trench now contains a thick (approximately 100-nm) SiO2 collar which extends from the silicon surface to a point below the n-well. The thick SiO2 collar prevents unwanted bridging of exposed node silicon and diffusion surfaces. It also has the function of isolating the node trench polysilicon from the cell device edge under the word line, which was the role of the LOCOS isolation in the 4Mb cell. To further isolate the storage trench polysilicon from the abutting cell device region and the word line, the top of the trench polysilicon must also be recessed below the active device area wafer surface and covered by a thick oxide. The storage trench can now be placed in the space between cell devices, as shown in Figure 5(b). This increases the efficiency of the cell layout by decreasing the area devoted to thick oxide isolation and increasing the area available for storage capacitance.

Figure 6

Electrical connection between the trench polysilicon node and the array device across the thick collar is made by a deposited polysilicon surface strap using a novel process to be described in a subsequent section of this paper. This strap is borderless to the dielectric-encapsulated word line. This reduces the active-to-passing word-line space, which was determined by the overlay tolerance of the trench, isolation, and word-line layers in the 4Mb cell. The 16Mb cell is referred to as the merged isolation and node trench (MINT) SPT cell [14].

The 64Mb generation

Along with the density increases, improvements in performance were also realized as a consequence of scaling. During the 4Mb and 16Mb generations, the lower performance of a p-MOS cell device relative to n-MOS was not a problem. With the 64Mb generation, the time required to move data in and out of cells could be significant. Therefore, an n-MOS array was desired. The simplest structural change to achieve this would be to interchange n-type and p-type material relative to the 4Mb and 16Mb generations. Thus, the starting material would be n-type, with implanted p-wells in which the cell arrays would be formed. However, this structure forfeited the noise immunity advantages of n-well technology as argued for the 4Mb and 16Mb generations. The benefits of an n-well CMOS technology on a p-type substrate could be retained at the cost of some increased process complexity. The array p-well and the substrate would have to be electrically isolated. This allowed the array well to be reverse-biased (-1 V) for low leakage, low parasitic capacitance, and maximum signal, while the substrate was at ground for low noise and best performance.

Figure 7 shows the cell configuration which achieves this for the 64Mb generation. The array p-well is isolated from the substrate by an underlying n-type layer which is formed by outdiffusion from a source deposited within the trenches. In a dense array, the trenches are close enough together that diffused regions form a continuous n-type layer. Since the n-type region extends to the bottom of the trenches, it also serves as a capacitor plate. Connection of this n-type plate to the top surface is formed by an n-well ring which surrounds the array. This cell configuration is called the buried plate trench (BPT) cell [15].

Figure 7

The overall cell layout is similar to that of the 16Mb generation, as shown by Figure 5(c), with the addition of a "borderless contact." This feature reduces the cell size by eliminating the diffusion border required between the bit-line contact and the adjacent word line. This requires a special contact structure made by interposing a film underneath the interlevel oxide, which can act as an etch stop during formation of the hole for the contact stud. The word line must be insulator-encapsulated prior to the deposition of the etch-stop film, so that the contact hole can be opened without exposing any portion of the word line which may be within the contact image.

The 256Mb generation

Scaling of the 64Mb surface strap to 256Mb dimensions presented formidable challenges, because the strap-to-trench overlay is critical to the width of the cell, and the strap must be built in the narrow opening between the active and passing word lines. For these reasons, a new strap structure, the "buried strap," and a different cell layout were used as shown in Figure 8 and Figure 5(d). The buried strap is fabricated early in the process and has a diffused connection formed by creating a sidewall contact on one edge of the trench capacitor. It saves the cost of the strap mask and avoids the high-aspect-ratio processing in the active-to-passing word-line space. Unfortunately, this cell layout produces a relatively smaller trench than its predecessor, but this is compensated for by scaling the node dielectric thickness to increase the cell capacitance and by biasing the plate at half the supply voltage (VDD/2) to reduce the maximum field in the dielectric. Otherwise, the well/substrate configuration is as described for the 64Mb generation. This cell is referred to as the buried strap trench (BEST) cell [16].
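
Why a half-supply plate bias permits a thinner node dielectric can be shown with a worst-case field estimate. This is a simplified sketch, not the authors' analysis: it treats the dielectric as a single film of hypothetical 5-nm effective thickness, assumes a hypothetical 2.5-V supply, and compares against a plate tied to a supply rail.

```python
def max_dielectric_field(v_dd: float, t_nm: float, plate_at_half_vdd: bool) -> float:
    """Worst-case field (MV/cm) across the node dielectric.

    The storage node swings between 0 V and v_dd. With the plate tied to a
    rail, the dielectric can see the full v_dd; with the plate at v_dd/2 the
    worst case is v_dd/2 for either stored value.
    """
    v_max = v_dd / 2 if plate_at_half_vdd else v_dd
    t_cm = t_nm * 1e-7  # 1 nm = 1e-7 cm
    return v_max / t_cm / 1e6  # V/cm -> MV/cm


if __name__ == "__main__":
    VDD, T_NM = 2.5, 5.0  # hypothetical supply (V) and effective thickness (nm)
    print(max_dielectric_field(VDD, T_NM, plate_at_half_vdd=False))  # rail plate
    print(max_dielectric_field(VDD, T_NM, plate_at_half_vdd=True))   # VDD/2 plate
```

Halving the worst-case voltage means the dielectric can be made roughly half as thick at the same peak field, and since parallel-plate capacitance scales as 1/t, that roughly doubles the capacitance per unit area available to offset the smaller trench.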

Figure 8

Table 2 summarizes the process sequence as it evolved from the 4Mb generation through the 256Mb generation. It shows that a large number of processes were kept from generation to generation, while additions were also made. The strap connection between the node trench polysilicon fill and the node diffusion was changed significantly for each generation. We now discuss the technology elements in more detail.

Strap process development

The "strap" which connects the drain of the array transfer device to the storage trench polysilicon is an essential part of the SPT cell. The strap must add little cost and require little additional area, and it must not degrade the retention time of the cell. It is a special DRAM-oriented process that demands the utmost in inventiveness to be successful.

The strap process for the 4Mb DRAM relies on the fact that the diffusion and polysilicon in the storage trench are coplanar and are separated only by the 10-nm ONO layer on the trench sidewall (Figure 4). After the spacers on the gate conductor are formed and the junctions implanted, a thin (70-nm) layer of intrinsic selective silicon is grown. This bridges the ONO insulating layer [17]. The next step is salicide formation, which consumes the selective silicon and forms a low-resistance strap. Both selective silicon deposition and salicidation are needed to form such a strap. This process uses no extra masks for the strap, but increases the word-line pitch because the passing word line cannot pass over the strap contact.

In the 16Mb cell, the strap must bridge the 160-nm-wide oxide collar and a 160-nm step from the trench top to the node diffusion (Figure 6). Selective silicon was not used, because the thickness required would cause spurious nucleation on insulators and bridging of the storage trench to the "wrong" diffusion.

To produce a manufacturable strap, a novel process, called "boron out-diffused surface strap," or BOSS, was developed. After source-drain implantation, a thin layer of silicon nitride is deposited on the chip. A contact hole is etched through the silicon nitride and trench top oxide in each cell, exposing the boron-doped trench polysilicon and p+ diffusions that are to be connected. A thick SiO2 cap and sidewall spacer are required on the gate electrode to avoid exposure of the gate electrode surface during this etch. A blanket layer of intrinsic polysilicon is deposited, and the wafer is annealed to diffuse boron up into the intrinsic polysilicon from the trench and diffusion tops. The result is a boron-doped polysilicon layer bridging the trench and diffusion within each hole. The remaining intrinsic polysilicon is then removed by a selective wet etch, isolating the cells from one another. Finally, an oxide is grown over the strap polysilicon, and the blanket nitride is removed to prepare for salicidation of the diffusions. Figure 9 shows a cross section of the strap contact at two points in the process. For the BOSS process, the critical parameters are the selectivity of the silicon wet-etch processes to boron doping and the diffusion of boron in an undoped polysilicon film. The wet etch must remove undoped polysilicon to avoid strap-to-strap shorts, while the boron-doped polysilicon that bridges the trench to the node remains.

Figure 9

The 64Mb cell has an n-type node diffusion and polysilicon trench fill. An equivalent BOSS process was not possible because no suitable doping-sensitive wet etchant was available for n-type material. Therefore, the strap technology was modified to use a thin in-situ-doped n-type polysilicon film deposited in the strap contact hole of each cell. An oxide fill and planarization of the strap contacts, together with a short etch to remove polysilicon from the contact-hole sidewalls, complete the strap and also maintain a planar surface.

The 256Mb "buried strap" is fabricated after the recess etch of the second trench polysilicon fill, as described in the process sequence of Table 2. The top part of the oxide collar is removed so that the third n+ polysilicon trench fill that is subsequently deposited can contact the silicon wafer just below the surface. The trench polysilicon is then recessed again. As a result of the cell layout, the isolation trench etch, which is done next, leaves only the trench sidewall next to the storage node diffusion connected to the storage node trench polysilicon. The n+ outdiffusion from the node polysilicon merges with the node source-drain diffusion to complete the contact. To prevent the buried-strap diffusion from affecting the cell device characteristics, arsenic, which diffuses slowly, is used as the n+-type dopant in the trench polysilicon.

Evolution of isolation with DRAM generations

The IBM 4Mb DRAM contained many innovative process features, but elected to employ the conventional LOCOS process for device isolation. It has the lowest cost and is a well-understood "industry standard" process, but it has two major disadvantages. One is area loss to the "bird's-beak" phenomenon at the isolation boundary, which remains at approximately 0.10-0.15 µm per edge and thus becomes an ever-increasing fraction of the total lithographically limited isolation pitch. A second drawback is isolation oxide thinning in very narrow isolation areas due to multidimensional oxidation effects [18].
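
The growing cost of a fixed bird's-beak encroachment can be illustrated with simple geometry. This sketch rests on stated assumptions that are not from the paper: the isolation pitch is taken as one active width plus one isolation width (2F for minimum feature F), encroachment is counted at both edges of each isolation region, and the ground rules in the loop are illustrative.

```python
BIRDS_BEAK_PER_EDGE_UM = 0.125  # midpoint of the 0.10-0.15 um range in the text


def birds_beak_fraction(min_feature_um: float) -> float:
    """Fraction of the isolation pitch lost to bird's-beak encroachment.

    Assumes a pitch of one active width plus one isolation width (2 * F)
    and encroachment at both edges of each isolation region.
    """
    pitch_um = 2 * min_feature_um
    loss_um = 2 * BIRDS_BEAK_PER_EDGE_UM
    return loss_um / pitch_um


if __name__ == "__main__":
    for f in (1.0, 0.5, 0.25):  # illustrative ground rules (um)
        print(f"F = {f:.2f} um: {birds_beak_fraction(f):.1%} of pitch lost")
```

Because the encroachment does not shrink with the ground rule, the lost fraction doubles with each halving of F, from an eighth of the pitch at 1.0 µm to half of it at 0.25 µm in this model.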

The IBM 16Mb generation had a new cell structure that affected the choice of isolation technology (Figure 6). As explained in the section on cell structure evolution, the top of the 16Mb deep trench must be recessed below the wafer surface and capped with an insulator to make the MINT cell work. While some oxide grows on the (polysilicon) trench fill during gate growth, it is not thick enough to provide adequate planarity for the word line. Therefore, the recess must be filled with an insulator (oxide) and planarized to create this cell.

Shallow-trench isolation (STI), an alternative to LOCOS, offers the possibility of true lithographically limited pitch and feature size, low thermal cycle, and improved surface planarity. It is accomplished by etching isolation trenches into the silicon wafer, depositing oxide to fill them, and planarizing the surface. Its major drawback is the increased cost and complexity of planarizing the blanket-deposited fill oxide.

Since deposition and planarization of the deep-trench capping oxide must be performed in order to fabricate MINT memory cells, the challenge of STI had to be addressed even if LOCOS was employed in the support area. As a result, STI became the complete answer for both storage-trench capping and standard device isolation beginning with the 16Mb generation.

The fill-oxide planarization method chosen for 16Mb processing employed a planarization block mask (PBM) immediately following oxide deposition. This leaves photoresist in larger-sized isolation regions to help with planarization. This masking step is followed by the application of planarization polymer. Reactive ion etch (RIE) of the oxide-polymer stack and chemical-mechanical polishing (CMP) to remove deposited oxide from the active areas complete the planarization process [19]. Figures 10 and 11 physically compare isolation for 4Mb and 16Mb chips.

Figure 10 Figure 11

Devices made using STI show unique electrical behavior not found in LOCOS-isolated devices. Inverse narrow-channel effect (decreasing device threshold voltage as active width narrows) [20], local dielectric thinning [21], and the existence of a lower-threshold n-channel parasitic device, parallel to the main device channel [20, 22], all result from the abrupt surface topography (corner) where isolation abuts a device. While angled implantation or solid-source diffusion has been shown to modify and/or eliminate the n-channel parasitic device and inverse narrow-channel effect [23, 24], device behavior could be controlled within acceptable bounds without adding these process steps.

Table 3 compares LOCOS and STI for their last and first IBM DRAM implementations, respectively. Note that the minimum active device width is no longer determined by isolation restrictions, but is limited by salicidation process boundaries. Minimum isolation width, as well, is not an inherent technology issue, but simply set to provide manufacturability in photolithography.

The extendability of the basic STI process has been demonstrated in the 64Mb and 256Mb DRAM processes being developed by IBM and its alliance partners. While modifications to the planarization process have been exercised to widen the process window [25], the basic process advantages offered by STI have been found to be extendable over several DRAM generations with minor alterations.

CMOS well doping to prevent latch-up

During the development of the 4Mb DRAM, CMOS latch-up was an important concern. Latch-up can be prevented by lowering parasitic bipolar gains and reducing the parasitic substrate and n-well resistances to prevent voltage drops from forward-biasing the base-to-emitter junctions.
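
The two levers named above, bipolar gain and parasitic resistance, map onto a common textbook first-order latch-up screen: regeneration requires a loop gain of at least 1, and triggering requires an IR drop large enough to forward-bias an emitter junction. The sketch below is that simplified screen under assumed values (a 0.7-V junction turn-on voltage and hypothetical currents, resistances, and gains), not a holding-voltage analysis; the function names are introduced here for illustration.

```python
V_BE_ON = 0.7  # assumed turn-on voltage of a silicon emitter-base junction (V)


def forward_biases_junction(current_a: float, resistance_ohm: float) -> bool:
    """True if the IR drop along a well or substrate path reaches V_BE_ON."""
    return current_a * resistance_ohm >= V_BE_ON


def latchup_possible(beta_npn: float, beta_pnp: float,
                     i_well_a: float, r_well_ohm: float,
                     i_sub_a: float, r_sub_ohm: float) -> bool:
    """Crude first-order screen for PNPN latch-up.

    Regeneration needs a loop gain beta_npn * beta_pnp of at least 1, and
    triggering needs an IR drop large enough to forward-bias an emitter
    junction. Lowering the gains or the well/substrate resistances
    defeats one condition or the other.
    """
    regenerative = beta_npn * beta_pnp >= 1.0
    triggered = (forward_biases_junction(i_well_a, r_well_ohm)
                 or forward_biases_junction(i_sub_a, r_sub_ohm))
    return regenerative and triggered


if __name__ == "__main__":
    # Hypothetical numbers: 1 mA of injected current in each path.
    print(latchup_possible(10, 5, 1e-3, 1000, 1e-3, 10))  # resistive n-well
    print(latchup_possible(10, 5, 1e-3, 100, 1e-3, 10))   # low-resistance paths
```

In this toy model the only difference between the two calls is the n-well resistance, which is the same lever a low-resistance retrograde well and a heavily doped substrate pull in practice.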

Most CMOS technologies being developed at the time used p-substrates and diffused n-wells, which are susceptible to latch-up. To prevent latch-up, the layout ground rules ensured that regions which form bipolar bases had large lateral dimensions, thereby reducing the bipolar gains.

The IBM DRAM designs used a different strategy, driven by the SPT cell design for the 4Mb generation, as well as by the desire to avoid latch-up. It was based on technology demonstrated by IBM Research [26]. Since the grounded p-type substrate formed the plate electrode of the 4Mb and 16Mb SPT DRAM cell (Figure 4), it had to be heavily doped (N_A = 10¹⁹ cm⁻³) to avoid a reduction in cell capacitance due to depletion when positive bias was applied to the trench fill. Heavily doped substrates provide excellent latch-up protection, since the substrate, which would be the base of the parasitic npn, has a low-resistance connection to ground. Earlier CMOS n-wells were diffused from the top surface of the silicon (Figure 12), and the donor concentration decreased with depth. A retrograde n-well was used to increase the threshold voltage of the vertical parasitic FET present in the SPT cell. (See the 4Mb cell structure description above.) The retrograde profile (Figure 13) was produced by high-energy implantation of phosphorus using MeV ion implanters. This retrograde well was lower in resistance, allowing for larger spaces between n-well contacts, and provided higher base doping, which lowered the gain of the parasitic pnp transistor.
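
The need for a heavily doped substrate plate follows from the depletion width, which adds series "dielectric" thickness to the capacitor and shrinks as 1/√N_A. The sketch below uses the standard one-sided abrupt-junction depletion approximation; it ignores the built-in potential and the onset of inversion, and the 1-V bias and the lightly doped 10¹⁶ cm⁻³ comparison case are hypothetical choices for illustration.

```python
import math

Q = 1.602176634e-19      # elementary charge (C)
EPS0 = 8.8541878128e-14  # vacuum permittivity (F/cm)
EPS_SI = 11.7 * EPS0     # silicon permittivity (F/cm)


def depletion_width_nm(n_a_cm3: float, v_volts: float) -> float:
    """One-sided abrupt depletion width W = sqrt(2*eps_si*V / (q*N_A)).

    Ignores built-in potential and the onset of inversion; first-order only.
    """
    w_cm = math.sqrt(2 * EPS_SI * v_volts / (Q * n_a_cm3))
    return w_cm * 1e7  # 1 cm = 1e7 nm


if __name__ == "__main__":
    for n_a in (1e16, 1e19):  # lightly doped vs. the 10^19 cm^-3 substrate
        w = depletion_width_nm(n_a, 1.0)
        print(f"N_A = {n_a:.0e} cm^-3: W = {w:.0f} nm at 1 V")
```

At 1 V the heavily doped plate depletes only about 11 nm, comparable to the 10-nm node dielectric, whereas a 10¹⁶ cm⁻³ substrate would deplete roughly 30 times deeper and badly degrade the series capacitance.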

Figure 12 Figure 13

The p+ substrate and the retrograde n-well raised the latch-up holding voltage above the maximum application voltage at the minimum spacing allowed between n+ and p+ diffusions. These features were also used for the 16Mb DRAM and for logic generations created in the same time frame.

In the 64Mb and 256Mb DRAMs, the base technology was altered because trade-offs in the cell design provided less latch-up immunity. While the highly doped substrate was eliminated, the retrograde n-wells remained.

Use of advanced lithography for DRAM

To first order, the minimum printable image size for a lithography tool is given by

$$R = \frac{k\lambda}{\mathrm{NA}},$$

where R is the resolution limit, k is the Rayleigh k-factor, which determines the image contrast and is approximately 0.7 for manufacturing purposes, λ is the exposure wavelength, and NA is the numerical aperture of the lens system.
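
Inverting the resolution formula on the tool figures quoted in the following paragraphs gives the k-factor each implies. This is a consistency check, not the authors' calculation; pairing the 0.25-µm DUV images for the 256Mb DRAM with an NA of 0.60 is an assumption, and the helper names are introduced here for illustration.

```python
def resolution_um(k: float, wavelength_um: float, na: float) -> float:
    """Rayleigh resolution R = k * lambda / NA."""
    return k * wavelength_um / na


def implied_k(r_um: float, wavelength_um: float, na: float) -> float:
    """k-factor implied by a quoted resolution: k = R * NA / lambda."""
    return r_um * na / wavelength_um


if __name__ == "__main__":
    # (label, wavelength in um, NA, quoted resolution in um)
    cases = [
        ("I-line, early", 0.365, 0.28, 0.80),
        ("I-line, current", 0.365, 0.60, 0.50),
        ("DUV, 256Mb images", 0.248, 0.60, 0.25),  # NA pairing assumed
    ]
    for label, wl, na, r in cases:
        print(f"{label}: k = {implied_k(r, wl, na):.2f}")
```

The implied values fall between roughly 0.6 and 0.8, bracketing a manufacturing k-factor near 0.7; the lower values reflect the tighter process windows that motivate the resolution-enhancement techniques discussed below.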

DRAM development has been the test bed for new lithographic equipment and photoresists in IBM. The 1Mb generation was the last in IBM to use G-line (mercury-arc light source; wavelength 436 nm) lithography with 1.0-µm ground rules. Historically, the resolution capability of lithographic systems has been scaled by reducing the exposure wavelength and increasing the optical numerical aperture of the phototool. Figures 14 and 15 illustrate this trend for I-line and deep-UV (DUV) processes.

Figure 14 Figure 15

I-line (mercury-arc; wavelength 365 nm) systems were introduced in the mid-1980s, with a numerical aperture of 0.28 and a resolution capability of 0.80 µm. Early I-line tool prototypes were used in the development of the first IBM 4Mb DRAM products in 1986. This helped to drive the development of mid-UV photoresist systems. Over the last nine years, the numerical aperture of these systems has been increased to 0.60, with a resolution capability of 0.50 µm.

High-resolution (mercury-arc or excimer laser; wavelength 248 nm) DUV lithographic systems became available in late 1987. Early prototypes of DUV machines were used by IBM for the development and manufacture of 16Mb DRAMs at 0.5-µm ground rules, while other semiconductor manufacturers have chosen to extend I-line equipment for this application. This early use drove the development of deep-UV photoresist systems in IBM. Deep-UV machine numerical apertures range from 0.36 to 0.60. DUV lithography has generally provided a three-year performance advantage relative to I-line (see Figure 14), although it entails higher chemical and equipment costs. IBM is using DUV lithography for the manufacture of CMOS devices with image-size ground rules from 0.40 to 0.60 µm. While both conventional steppers and step-and-scan lithographic tools have been used, the ease of attaining large field sizes with the step-and-scan approach has favored its implementation.

Recent advances in resolution-enhancement techniques such as phase-shifting masks and off-axis illumination have made it possible to widen the process window obtainable from a given lithographic toolset. These approaches are particularly attractive for the 256Mb DRAM, where generating 0.25-µm images places stringent demands on current DUV systems. Simulations predict that through the judicious application of attenuated phase-shift masking, off-axis illumination, and feature-dependent biasing, acceptable process windows can be achieved for all critical levels within the cell [27]. Initial experimental evaluations have confirmed the predicted benefits at 0.25-µm dimensions [28]. With efforts to establish manufacturable solutions well underway, applying these resolution-enhancement techniques to I-line lithography will also offer a cost-effective alternative to DUV lithography for the 64Mb DRAM at dimensions of 0.35-0.4 µm.

Silicide technology for diffusions and gate conductors

Driven by requirements of reduced resistance for gate conductors and diffusions, titanium disilicide metallization was introduced for the 4Mb DRAM generation. The self-aligned silicide (salicide) technique, which forms silicide on both the gate conductors and the diffusions simultaneously with no additional photolithographic steps, was used. This process requires an insulating spacer on the sidewalls of the gate conductor to avoid shorting the gates and diffusions.

The process sequence is as follows: deposition of titanium metal, formation annealing to react the Ti with exposed Si (gate conductors and diffusions) to form TiSi2, selective etching to remove unreacted Ti, and transformation annealing to form the low-resistance phase of TiSi2. The integration issues which must be traded off are as follows:

  • First, the maximum titanium silicide thickness is limited by the junction depth of the technology.
  • Second, filaments of residual titanium silicide shorting the gate conductor to the diffusions (G-D shorts) limit the choices available for selective etch and annealing temperatures.
  • Third, the tendency of the silicide film to agglomerate limits the maximum thermal cycle to which the films can be exposed.

A two-step anneal is needed because titanium disilicide exists in two phases: a high-resistance phase (C49) of approximately 60-70 µΩ·cm and a low-resistance phase (C54) of approximately 15-20 µΩ·cm. Circuit requirements dictate conversion to the low-resistance phase, which needs a high-temperature anneal, while G-D shorts limit the maximum allowed formation-annealing temperature. The selective etch is designed to remove TiN, a by-product of the formation anneal, and unreacted Ti, while maintaining selectivity to TiSi2.
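The practical cost of staying in the C49 phase can be seen from the sheet resistance of a uniform film, R_s = ρ/t. The sketch below evaluates it for both phases using the resistivity ranges quoted above; the 1000 Å TiSi2 thickness in the demo is a hypothetical example, not a process value from the text.

```python
# Sheet resistance R_s = rho / t for the two TiSi2 phases.
# Resistivity ranges are the values quoted in the text; the film thickness
# used in the demo is a hypothetical example.

PHASE_RESISTIVITY_UOHM_CM = {
    "C49": (60.0, 70.0),  # high-resistance phase after the formation anneal
    "C54": (15.0, 20.0),  # low-resistance phase after the transformation anneal
}

def sheet_resistance_ohm_per_sq(resistivity_uohm_cm: float, thickness_angstrom: float) -> float:
    """Return sheet resistance in ohms per square for a uniform film."""
    if thickness_angstrom <= 0:
        raise ValueError("film thickness must be positive")
    rho_ohm_cm = resistivity_uohm_cm * 1e-6  # micro-ohm-cm -> ohm-cm
    t_cm = thickness_angstrom * 1e-8         # angstrom -> cm
    return rho_ohm_cm / t_cm

if __name__ == "__main__":
    thickness = 1000.0  # hypothetical TiSi2 thickness in angstroms
    for phase, (lo, hi) in PHASE_RESISTIVITY_UOHM_CM.items():
        r_lo = sheet_resistance_ohm_per_sq(lo, thickness)
        r_hi = sheet_resistance_ohm_per_sq(hi, thickness)
        print(f"{phase}: {r_lo:.1f}-{r_hi:.1f} ohm/sq at {thickness:.0f} A")
```

At this assumed thickness the C49 film sits around 6-7 Ω/sq versus 1.5-2 Ω/sq for C54, a factor of three to four that follows directly from the resistivity ratio; thinner films raise both values in proportion.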

In the case of the 4Mb DRAM generation, the junction depth and the deposition of selective silicon for the strap after junction formation allowed the deposition of a Ti layer more than 600 Å in thickness. A conventional tube furnace was used for both anneals. The high-temperature reflow annealing process, used to planarize the BPSG (boron-phosphosilicate glass) first passivation insulator in the early version of the 4Mb DRAM process, caused agglomeration, and was eventually eliminated in favor of chemical-mechanical polishing (CMP), as described below in the section on wiring and insulation technology.

The 16Mb generation required a gate electrode encapsulated with oxide in order to fabricate the strap (see above), so TiSi2 salicidation, which requires an exposed gate electrode, could not be used for the gate. A WSi2 (tungsten polycide)-polysilicon sandwich, which was capped with oxide before gate patterning, was used to reduce the gate electrode resistance. A high-temperature anneal was required after patterning to reduce the resistance of the WSi2. The diffusion metallization remained TiSi2. The occurrence of G-D shorts decreased and yield improved because the gate was encapsulated in SiO2. However, this benefit was obtained at the cost of increased gate stack height, which complicated the gate etch process.

The shallower diffusions for the 16Mb DRAM generation reduced the maximum allowable Ti deposition for the salicide to less than 480 Å. The reduced TiSi2 thickness made agglomeration a major consideration in process development. Polish planarization was employed to reduce the back-end-of-line process temperature. Furthermore, it became difficult to obtain the low-resistance C54 phase on narrow diffusions. In fact, it was found that when conventional furnaces were used for the second anneal, agglomeration began before narrow TiSi2 lines had transformed to the C54 phase. Rapid thermal annealing (RTA) can bring the TiSi2 to a temperature high enough to convert the narrow lines, for a time short enough to avoid agglomeration, and is now used instead of a conventional furnace.
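The link between Ti thickness and junction depth can be made concrete with the commonly quoted stoichiometric ratios for TiSi2 formation: each angstrom of Ti consumes about 2.27 Å of silicon and yields about 2.51 Å of TiSi2. The sketch below applies those ratios to the 600 Å and 480 Å Ti thicknesses mentioned in this section, assuming as an upper bound that all deposited Ti converts to silicide (in practice some Ti is lost to TiN during the formation anneal). The junction depth and margin in the demo, and the helper names, are hypothetical.

```python
# Approximate thickness ratios for Ti + 2Si -> TiSi2, per unit thickness of Ti.
SI_CONSUMED_PER_TI = 2.27
SILICIDE_PER_TI = 2.51

def silicidation(ti_angstrom: float) -> tuple[float, float]:
    """Return (silicon consumed, TiSi2 formed) in angstroms, assuming full conversion."""
    return ti_angstrom * SI_CONSUMED_PER_TI, ti_angstrom * SILICIDE_PER_TI

def max_ti_for_junction(junction_depth_angstrom: float, margin_angstrom: float) -> float:
    """Largest Ti thickness whose silicon consumption still leaves margin_angstrom
    of diffusion between the silicide bottom and the metallurgical junction."""
    usable = junction_depth_angstrom - margin_angstrom
    if usable <= 0:
        raise ValueError("margin exceeds junction depth")
    return usable / SI_CONSUMED_PER_TI

if __name__ == "__main__":
    for ti in (600.0, 480.0):
        si, sil = silicidation(ti)
        print(f"Ti {ti:.0f} A -> consumes {si:.0f} A Si, forms {sil:.0f} A TiSi2")
    # Hypothetical shallow junction: 2000 A deep, 800 A of margin required
    print(f"max Ti: {max_ti_for_junction(2000.0, 800.0):.0f} A")
```

Because the silicide grows partly below the original silicon surface, it is the silicon consumption, not the silicide thickness, that must be compared with the junction depth.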

Before RTA could be used in manufacturing, temperature control had to be improved. Conventional RTA temperature control relied on pyrometric temperature measurement of the wafer back side for feedback. Wafer temperature was not well controlled with this system, because the thickness variation of oxide and nitride films left on the wafer back side led to variations in emissivity. As a result, a power control strategy (open loop on temperature) was developed to make RTA a practical manufacturing tool [29].
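Why backside film variations upset closed-loop pyrometry can be seen from the Wien approximation to thermal emission, in which the detected signal scales as ε·exp(-c2/(λT)). A pyrometer calibrated for one emissivity reports a different temperature when the actual emissivity changes. The sketch below computes that indicated temperature; the 2.5-µm pyrometer wavelength, the emissivity values, and the function name are assumptions for illustration, not parameters of the IBM tools.

```python
import math

C2_M_K = 1.4388e-2  # second radiation constant, m*K

def indicated_temperature_k(true_temp_k: float, eps_true: float, eps_assumed: float,
                            wavelength_m: float = 2.5e-6) -> float:
    """Temperature a single-wavelength pyrometer reports (Wien approximation)
    when calibrated for eps_assumed but the wafer emits with eps_true."""
    # Equal detected signal:
    #   eps_assumed * exp(-c2/(lam*T_ind)) = eps_true * exp(-c2/(lam*T_true))
    inv_t = 1.0 / true_temp_k + (wavelength_m / C2_M_K) * math.log(eps_assumed / eps_true)
    return 1.0 / inv_t

if __name__ == "__main__":
    t_true = 1123.0  # 850 C, hypothetical anneal temperature
    for eps in (0.60, 0.65, 0.70, 0.75):
        t_ind = indicated_temperature_k(t_true, eps, eps_assumed=0.70)
        print(f"eps_true={eps:.2f}: pyrometer reads {t_ind - 273.15:.1f} C "
              f"(true {t_true - 273.15:.1f} C)")
```

Under these assumptions an emissivity drop from 0.70 to 0.60 makes the pyrometer read about 30 K low at 850 °C; in closed-loop control the lamps would then drive the wafer hotter than intended, a problem that controlling lamp power instead of measured temperature sidesteps.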

RTA was also used for high-temperature annealing (above 1050 °C) of the WSi2 polycide gate stack. This reduced the resistance of the WSi2 and relaxed stress in the thermal oxides, while the short annealing time minimized perturbation of the well-doping profiles. These objectives could not be accomplished by furnace annealing.

Salicidation is an essential feature of present and future high-performance logic technologies. The lessons learned in DRAM development have been useful in understanding the problems of scaling line widths and junction depths and in refining the silicide manufacturing processes. Although salicided diffusions are not planned for the 64Mb and 256Mb generations, the use of WSi2 polycide is expected to continue in DRAM generations beyond 16Mb.

Metal wiring and insulation technology

The 4Mb DRAM introduced innovations in metal wiring and insulation technology [often referred to as the back-end-of-line (BEOL) technology]. These processes include chemical-mechanical polishing (CMP) of insulators, chemical vapor deposition (CVD) of tungsten, and deposition/etching of interlevel dielectrics. The introduction of borderless contacts in the 64Mb and 256Mb generations (as described in the section on cell evolution) presents significant technological challenges requiring new processes and equipment.

Prior to the 4Mb DRAM generation, the BEOL had tapered contacts and sputtered Ti-Al-Cu-Si wiring. The contact sidewalls were tapered because the sputtered Al-based wiring had poor step coverage on vertical walls. However, the shallow sidewall angle required for reliability consumed area, which limited cell-size reduction as the technology was scaled.

It was necessary for the contact hole sidewalls from the bit lines to the diffusions below to be nearly vertical in order to achieve the desired cell size. CVD tungsten, because of its superior step coverage, was selected for the bit-line metallurgy. A single conformal deposition of tungsten was introduced, both to fill contacts and to provide the first layer of wiring. This layer was initially patterned using RIE. The challenge of filling the spaces between tungsten lines with nearly vertical sidewalls was met using a process with a repeating sequence of plasma-enhanced chemical vapor deposition (PECVD) of silicon dioxide followed by argon sputtering, which provided void-free insulator fill. This interlevel dielectric (ILD) was planarized using CMP, which resulted in a very flat surface for the second level of metal, avoiding lithography problems due to steps. Vias between the two metal levels were also nearly vertical. CVD tungsten was deposited again to fill nearly vertical vias. The superior conductivity of aluminum was needed for the second level of metal, so tungsten was removed from the insulator surface with an RIE etchback, leaving just the vias filled.

Several improvements have been made to the initial 4Mb process. RIE patterning of tungsten bit lines was replaced with a dual damascene approach, which reduced defect levels associated with tungsten etching and deposition using 4Mb generation equipment [30]. This process begins with etching both contacts and troughs for the wiring into a thick, planarized SiO2 layer. Tungsten is deposited and then removed from the surface using CMP. This leaves the wiring inlaid in the insulator. Because the surface is now flat, there is no need for an expensive deposition/etch insulator process or CMP between the first and second layers of wiring, so these are replaced with a single deposition. Vertical vias are still filled with tungsten, but it is removed from the surface with CMP instead of RIE, eliminating a difficult etching depth control problem and related RIE-induced defects, and dramatically reducing the sensitivity to tungsten deposition defects.

Technology elements developed for the 4Mb generation were also used in the 16Mb process, but were integrated differently. Phosphosilicate glass (PSG), used for first-level passivation, is deposited using a sequence of PECVD deposition and argon sputtering like the ILD process developed for the 4Mb generation [31]. The lower processing temperature with this approach is necessary to prevent agglomeration of the thin TiSi2 used on junctions. The PSG is CMP planarized, and vertical contacts are filled with tungsten using the 4Mb via fill process. The 16Mb generation is again a metal bit-line design, but aluminum-based metal is used instead of tungsten, because superior conductivity is needed for support circuit performance. The insulator between the first and second levels of metal is deposited using a deposition/etch sequence optimized to fill tight spaces between lines and to provide smoothing [32]. This is important, because no additional planarization is used. More traditional tapered vias without tungsten fill are used. The second-level metal is used only for support circuit wiring in our 16Mb designs, so the design density penalty with this simpler process is small (0.5%). The manufacturing costs for the ILD and contacts are 40% less with the nonplanar, tapered-via approach.

Recent work on wiring technology for 256Mb DRAMs has demonstrated that vertical and horizontal scaling works effectively down to the 0.25-µm level. Figure 16 summarizes the evolution of metal levels in the IBM DRAM generations. Heights and spacings are reduced together to keep the aspect ratio around 1.0. This prevents void formation during deposition of interlevel dielectrics, and keeps the metal line-to-line capacitance under control, which is especially important in the case of the bit line. The first-level-metal wiring layer, which forms the bit line in the 256Mb DRAM, is made by the tungsten damascene process scaled to 0.20-µm thickness. The second wiring layer in the 256Mb generation is a TiN/Al-0.5% Cu/Ti sandwich with a total thickness of 0.3 µm.
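Why a fixed aspect ratio keeps line-to-line capacitance in check can be seen from the simplest parallel-plate estimate, C/L ≈ ε0·εr·h/s, for two neighboring lines of height h separated by spacing s. The sketch below evaluates it; the height/spacing pairs are hypothetical (only the 0.20-µm bit-line thickness appears in the text), and SiO2 with εr = 3.9 is assumed as the dielectric.

```python
EPS0_F_PER_M = 8.854e-12  # vacuum permittivity
EPS_R_SIO2 = 3.9          # relative permittivity of SiO2 (assumed dielectric)

def coupling_cap_af_per_um(height_um: float, spacing_um: float,
                           eps_r: float = EPS_R_SIO2) -> float:
    """Parallel-plate estimate of line-to-line capacitance per unit length, in aF/um.
    Fringing fields and capacitance to the levels above and below are ignored."""
    if spacing_um <= 0:
        raise ValueError("spacing must be positive")
    c_f_per_m = EPS0_F_PER_M * eps_r * height_um / spacing_um
    return c_f_per_m * 1e12  # 1 F/m = 1e18 aF per 1e6 um = 1e12 aF/um

if __name__ == "__main__":
    cases = {  # hypothetical (height, spacing) pairs in um
        "AR 1.0, 0.70 um": (0.70, 0.70),
        "AR 1.0, 0.50 um": (0.50, 0.50),
        "AR 1.0, 0.20 um": (0.20, 0.20),
        "spacing only, 0.50/0.25 um": (0.50, 0.25),
    }
    for label, (h, s) in cases.items():
        print(f"{label:28s} {coupling_cap_af_per_um(h, s):6.1f} aF/um")
```

Scaling h and s together leaves the estimate unchanged at about 34.5 aF/µm, whereas halving only the spacing doubles it. Fringing and vertical coupling, ignored here, become relatively more important as lines narrow.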

Figure 16

The production of the borderless contact (Figures 7 and 8) required for the 64Mb and 256Mb generations (see the section on DRAM structure evolution) presents a formidable challenge. The deposited layer which will form the gate electrode must include an insulating cap layer, which remains after gate formation. Subsequently, insulating sidewall spacers complete the encapsulation of the gate. A special etch-stop layer is deposited just underneath the passivation. This layer must withstand the process of etching the diffusion contact hole through the glass passivation so that the insulating cap layer covering the adjacent gate remains intact. The etch chemistry is then altered to remove the etch-stop layer. The etch-stop thickness is limited by aspect ratio and contact resistance requirements to less than 700 Å, so etch selectivities greater than 25:1 are required on topographical as well as planar surfaces.
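The selectivity requirement follows from a simple budget: once the glass over the tallest feature (the gate cap) clears, the etch stop there is exposed for the rest of the etch, which continues until the thickest glass (over the diffusion) clears, plus an overetch. The sketch below estimates that budget. The glass thicknesses, the 30% overetch, the 280 Å allowable loss, and the function names are hypothetical, chosen only to show how a requirement near 25:1 can arise; the model also ignores the reduced selectivity at corners.

```python
def stop_layer_exposure_a(deepest_oxide_a: float, shallowest_oxide_a: float,
                          overetch_fraction: float) -> float:
    """Oxide-equivalent etch depth seen by the etch stop where it is uncovered first
    (over the gate cap) when the etch is timed to clear the deepest oxide plus overetch."""
    if shallowest_oxide_a > deepest_oxide_a:
        raise ValueError("shallowest oxide cannot exceed deepest oxide")
    return deepest_oxide_a * (1.0 + overetch_fraction) - shallowest_oxide_a

def required_selectivity(exposure_a: float, allowed_stop_loss_a: float) -> float:
    """Minimum oxide:stop-layer selectivity that keeps stop-layer loss within budget."""
    return exposure_a / allowed_stop_loss_a

def stop_loss_a(exposure_a: float, selectivity: float) -> float:
    """Stop-layer thickness consumed at a given oxide:stop-layer selectivity."""
    return exposure_a / selectivity

if __name__ == "__main__":
    # Hypothetical geometry: 10000 A of glass over the diffusion, 6000 A over the gate cap
    exposure = stop_layer_exposure_a(10000.0, 6000.0, overetch_fraction=0.30)
    allowed = 0.4 * 700.0  # hypothetical: lose at most 40% of a 700 A etch stop
    print(f"exposure = {exposure:.0f} A, "
          f"required selectivity = {required_selectivity(exposure, allowed):.1f}:1")
    for s in (25.0, 40.0):
        print(f"selectivity {s:.0f}:1 -> stop-layer loss {stop_loss_a(exposure, s):.0f} A")
```

With a planar selectivity of 40:1 the same hypothetical etch would consume about 175 Å of the stop layer; any loss of selectivity at the gate corner eats directly into that margin.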

One etch stop that has been used is a silicon nitride/thin polysilicon sandwich. The polysilicon is an excellent barrier to SiO2 glass passivation etching using conventional dry etching processes [33]. The polysilicon is oxidized after the glass passivation has been etched. The thin silicon nitride layer underneath the polysilicon provides an oxidation barrier. This process, however, has undesirable side effects. The oxidation creates stress and leaves unoxidized polysilicon filaments, and the polysilicon film increases the contact aspect ratio, making it difficult to fill the contact hole with tungsten without voids.

Recent advances in etch technology have made it possible to use silicon nitride for the barrier layer, eliminating the need for oxidation [34]. Selectivity greater than 40:1 between SiO2 and Si3N4 has been obtained on planar surfaces. Although selectivity at the gate corner is reduced by sputtering, it can be improved by controlling the ratio of polymer deposition to ion bombardment during the etch. Continued development of this process is expected to result in an effective borderless contact process for the 256Mb generation.

The wiring technology developed in DRAMs has also been useful in logic products. The tungsten-stud/aluminum-based metal combination is the basis for the dense back-end-of-line wiring in the IBM CMOS IV and CMOS V logic technologies. The low-cost nonplanar tapered-via approach is used in CMOS V logic technologies for the final metal layer where high-density wiring is not required.

Meeting DRAM-specific leakage requirements

Low parasitic leakage currents are important to the DRAM product because of the cell retention time requirements, which have increased from 16 ms in the early 4Mb generation to as much as 256 ms for current low-power products. The use of trench technology in DRAMs resulted in some unique leakage problems due to device structure and processing. Their solution required simulation and extensive experimentation. This section first describes gate-induced drain leakage (GIDL), which caused array n-well-to-substrate leakage problems in 4Mb and 16Mb chips. Next we describe how the three-dimensional geometry of the cell device increased GIDL leakage between the cell node diffusion and the n-well in the 16Mb cell, and what was done to solve the problem. Finally, we show how retention time (RT) problems due to dislocations present during the early production of DRAMs with trenches were solved.

Well-to-substrate GIDL

Gate-induced drain leakage (GIDL) refers to the diffusion-to-substrate leakage generated in the gate-to-diffusion overlap region of a gated diode structure or FET device. The leakage increases as the gate is biased to deplete the surface of the diffusion. GIDL affected technology design through its impact on array well-to-substrate leakage and on cell retention time.

During the development of the 4Mb DRAM, it was found that as the trench-fill-to-substrate bias was increased, a significant current was observed between the n-well and the substrate terminals (Figure 17). This is due to the generation of electrons in the depletion region formed along the trench dielectric-silicon substrate interface, and their collection by the n-well. The polysilicon fill in the trench plays the role of the gate in a gated diode configuration. This current had the effect of overloading the n-well generator at the high voltage and temperature used for burn-in.

Figure 17

GIDL is generated by more than one mechanism. The graph of Figure 18 shows trench-fill-gated diode leakage versus trench-fill-to-substrate voltage at various temperatures, for the 4Mb DRAM technology. This graph indicates the existence of at least two mechanisms. Above 4 V, a highly voltage-dependent component of the current with weak but noticeable temperature dependence appears. At lower voltages, the current has a much stronger temperature dependence and a reduced voltage dependence [35, 36]. The same low-voltage thermal generation mechanism was also observed on large-perimeter n- and p-channel MOSFET test structures, where it was shown to be a function of dielectric thickness and gate-to-drain overlap [37]. Concurrently, researchers in universities were pursuing parallel research efforts [38-40]. Research efforts outside IBM focused on MOSFET GIDL in high-field band-to-band tunneling and the avalanche breakdown regime; within IBM, the effort was more focused on the low-voltage thermal regime.
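The two regimes in Figure 18 are distinguished by how strongly the current depends on temperature. A common way to quantify this is an Arrhenius fit: if I = A·exp(-Ea/kT), then ln I is linear in 1/kT with slope -Ea. The sketch below performs that fit in plain Python; the synthetic currents and activation energies are hypothetical and are not the measured 4Mb data. A large apparent Ea points to thermal generation, while a small one is more typical of field-driven tunneling.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def fit_activation_energy_ev(temps_k, currents_a):
    """Least-squares slope of ln(I) versus 1/(kT); returns Ea in eV
    for a single thermally activated component I = A*exp(-Ea/kT)."""
    if len(temps_k) != len(currents_a) or len(temps_k) < 2:
        raise ValueError("need at least two matching (T, I) points")
    xs = [1.0 / (K_B_EV_PER_K * t) for t in temps_k]
    ys = [math.log(i) for i in currents_a]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("temperatures must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx

if __name__ == "__main__":
    temps = [300.0, 325.0, 350.0, 375.0, 400.0]
    # Hypothetical synthetic data: a strongly activated (thermal-generation-like)
    # component and a weakly activated (tunneling-like) component.
    thermal = [1e-3 * math.exp(-0.6 / (K_B_EV_PER_K * t)) for t in temps]
    tunneling = [1e-12 * math.exp(-0.05 / (K_B_EV_PER_K * t)) for t in temps]
    print(f"thermal-like Ea   = {fit_activation_energy_ev(temps, thermal):.3f} eV")
    print(f"tunneling-like Ea = {fit_activation_energy_ev(temps, tunneling):.3f} eV")
```

On real data where both mechanisms contribute, the fitted value depends on which regime dominates over the chosen voltage and temperature range, which is why the curves in Figure 18 are examined separately above and below about 4 V.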

Figure 18

A model for GIDL was added to the IBM finite-element device analysis simulator FIELDAY II [41, 42]. It includes band-to-band tunneling, as well as electric-field-dependent thermal generation processes, such as Frenkel-Poole generation and trap-to-band tunneling models.

The simulator results were used to design the 16Mb cell to eliminate the trench-fill bias-induced n-well current described above. The solution was to extend the thick insulating collar below the n-well to create a potential barrier isolating the n-well from the heavily doped p-type bulk where the electrons were generated (Figure 19). The simulator helped to determine the minimum collar depth necessary to eliminate the leakage [43].

Figure 19

DRAM cell device GIDL

The use of trenches for isolation, begun with the 16Mb DRAM, creates a unique high-field region at the intersection of the gate, the drain, and the isolation, which is three-dimensional in nature (Figure 20). The isolation of the DRAM cell device is actually formed by the storage trench collar. GIDL in this structure was simulated using FIELDAY II [42, 44]. This simulator allows complex geometries to be simulated, and new physical models to be added easily, because it is highly modular, separating a problem into generalized geometric definition, equation setup, and solution phases. Figure 21 shows a plot of the simulated electric field at the device surface, which is enhanced at the "corner" where the gate and isolation intersect. This high electric field increases GIDL.

Figure 20 Figure 21

The existence of the high GIDL current at the corner is shown experimentally by plotting the measured MOSFET GIDL current for trench-isolated devices of differing width versus device width, together with the MOSFET GIDL current measured from a very wide device scaled for width (Figure 22). As devices become narrower, the measured GIDL current tends to be independent of width because of the current generated at the high-field region described above. For wide devices, the current approaches the wide-device current scaled for width.
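The width dependence in Figure 22 is what one expects if the device current is the sum of a fixed corner contribution and a component proportional to width. The sketch below writes that two-term model and shows how the two terms could be separated by a straight-line fit of current versus width; the model form is a simplification assumed here for illustration, and the corner and per-width currents in the demo are hypothetical.

```python
def gidl_current_fa(width_um: float, corner_fa: float, per_um_fa: float) -> float:
    """Total GIDL (fA): width-independent corner term plus a term proportional to width."""
    return corner_fa + per_um_fa * width_um

def split_corner_and_width(widths_um, currents_fa):
    """Least-squares fit of I = I_corner + j*W; returns (I_corner in fA, j in fA/um)."""
    n = len(widths_um)
    if n < 2 or n != len(currents_fa):
        raise ValueError("need at least two matching (W, I) points")
    mw = sum(widths_um) / n
    mi = sum(currents_fa) / n
    sww = sum((w - mw) ** 2 for w in widths_um)
    if sww == 0:
        raise ValueError("widths must not all be equal")
    j = sum((w - mw) * (i - mi) for w, i in zip(widths_um, currents_fa)) / sww
    return mi - j * mw, j

if __name__ == "__main__":
    corner, per_um = 2.0, 0.3  # hypothetical values in fA and fA/um
    widths = [0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
    currents = [gidl_current_fa(w, corner, per_um) for w in widths]
    for w, i in zip(widths, currents):
        # Current per unit width approaches per_um only when W is large
        print(f"W={w:5.2f} um  I={i:5.2f} fA  I/W={i / w:5.2f} fA/um")
    print("fitted (corner, per-um):", split_corner_and_width(widths, currents))
```

For narrow devices the corner term dominates and the total current barely changes with width; for wide devices the current per unit width settles at the edge-free value, matching the trend described above.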

Figure 22

The GIDL current was simulated while varying the structural and device design parameters that influence GIDL: trench corner shape, junction grading, gate oxide thickness, and gate wrap-around of the trench sidewall. The model was calibrated by fitting the minority-carrier lifetime and band-to-band tunneling coefficients of a 2D model to the current measured on a wide gated diode without corners. The 2D and 3D model geometries were obtained from SEM micrographs. Some of the simulation results are shown in Figure 23, a graph of the 3D GIDL corner current vs. gate-to-drain voltage difference for different junction profiles [45]. A GIDL current of less than 2 fA per cell was achieved by using 3D device simulation and experimentation to guide optimization of the drain doping profile, and by controlling critical process steps.
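To put the 2 fA figure in context, a constant leakage I drains a stored charge CΔV in a time t = CΔV/I. The sketch below applies that first-order relation; the 30 fF cell capacitance and 0.5 V allowable signal loss are hypothetical round numbers, not values given for these cells, and real retention is set by the leakiest cells in the array rather than by a single mechanism.

```python
def retention_time_s(cell_cap_f: float, allowed_droop_v: float, leakage_a: float) -> float:
    """Time for a constant leakage current to remove allowed_droop_v of stored signal (t = C*dV/I)."""
    if leakage_a <= 0:
        raise ValueError("leakage must be positive")
    return cell_cap_f * allowed_droop_v / leakage_a

def max_leakage_a(cell_cap_f: float, allowed_droop_v: float, retention_spec_s: float) -> float:
    """Largest total leakage that still meets a retention specification (I = C*dV/t)."""
    return cell_cap_f * allowed_droop_v / retention_spec_s

if __name__ == "__main__":
    cap = 30e-15  # hypothetical storage capacitance, 30 fF
    droop = 0.5   # hypothetical allowable signal loss, volts
    print(f"2 fA alone -> {retention_time_s(cap, droop, 2e-15):.1f} s")
    for spec in (0.016, 0.032, 0.256):  # retention specifications mentioned in the text
        budget_fa = max_leakage_a(cap, droop, spec) * 1e15
        print(f"{spec * 1e3:.0f} ms spec -> leakage budget {budget_fa:.0f} fA")
```

Under these assumptions a cell whose only leakage is 2 fA would hold its data for several seconds, and the 256 ms specification leaves a total leakage budget of roughly 60 fA, so the GIDL contribution is a small fraction of it.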

Figure 23

Solution of dislocation-related retention problems

In the development of the 4Mb cell, leakage due to dislocations and variable retention time (VRT) proved to be the most important retention time problems in the early stages of development. (VRT is a phenomenon in which retention time fluctuates with time.) The solutions to these problems were used in subsequent generations.

Stress, combined with nucleation sites in the silicon, can create dislocations in some cells. When these dislocations cut through the storage node diffusion of the cell, leakage is generated along this defect, reducing retention time. Dislocations were generated in the 4Mb DRAM cell at the intersection of the deep-trench storage node, LOCOS isolation, and the ion-implanted junction, where stress and nucleation sites coincide [Figure 24(a)]. The greatest stress around the trench is created by the oxidation of the trench sidewall during LOCOS and subsequent processes, when a vertical bird's beak pushes out from the trench [Figure 24(b)]. A variety of modeling tools were used to understand how to improve the cell design from the standpoint of minimizing stress. FEDSS (Finite Element Diffusion Simulator System) was used with simple analytic approximations to evaluate the stresses due to oxidation around the SPT cell [46].

Figure 24

TEM pictures showed that dislocations usually begin at nucleation sites in the source-drain implant area in front of the deep trench (Figure 25). Chips with low retention times had many dislocations, and chips with long retention times had few. Although perfect correlation between dislocations and individual retention time failures was not found, no dislocations were found in chips having RTs of 750 ms and greater.

Figure 25

A number of process modifications were made to eliminate dislocations. A high-temperature RTA, inserted just before source-drain implantation to relieve stress by allowing the oxide to flow, drastically reduced the number of dislocations. The number of point nucleation defects created by the source-drain ion implant was reduced somewhat by implanting directly into the silicon instead of through a screen oxide. This change eliminated stray oxygen introduced from the screen oxide, which would otherwise have remained as a postimplant interstitial.

Heavy-metal (Fe, Ni, Cu, Mo) contamination, which electrically activates and increases the density of nucleation point defects, has been reduced by nearly two orders of magnitude since the beginning of 4Mb DRAM manufacturing. Metal contamination monitoring has become an integral part of tool and wet-process monitoring.

The layout of the trench, which can influence stress, was modified a few times during the 4Mb development. The first time, an experimental approach was taken, with test masks and fabrication experiments; it took a year to produce a design. Later in the program, when photo exposure changed and a new trench layout was needed, a simulation program, Boundary Element Design System (BEASY) [47], was used to design it. A simple model (an expanding plug in a hole) calculated the stress on the trench face versus shape and guided a mask change that successfully reduced the stress [48]. This took less than three months and resulted in a significant increase in yield [49].

Researchers at IBM discovered VRT on DRAM cells shortly after Yaney et al. reported the phenomenon in 1987 [50]. A VRT cell may cause a field failure or may be unimportant, depending on the application and the minimum retention time of the VRT cell. Extensive testing on DRAMs from 64Kb to 16Mb from many vendors and technology types has shown that VRT cells exist on all of them. Fortunately, VRT instability is usually seen only at high application temperatures and at very long retention times compared to the application specification.

Two types of VRT cells were found: two-state VRT, in which the retention time of the cell falls randomly into only two stable exponential probability distributions, and multistate VRT, in which there is a continuous range of retention times and the probability distribution looks Gaussian on a logarithmic scale of retention time. Thermal activation energies for the leakage current of the two states of a two-state VRT cell have been measured at about 1.0 eV for the high-RT state and 0.9 eV for the low-RT state. The frequency of switching between high and low states is also thermally activated [51].
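The measured activation energies indicate how strongly each VRT state depends on temperature. Assuming retention time is inversely proportional to an Arrhenius-activated leakage current, the sketch below scales a retention time from one temperature to another using the 1.0 eV and 0.9 eV values quoted above; the reference retention times at 85 °C and the function name are hypothetical. Because the two activation energies differ, the gap between the states also changes with temperature.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def scale_retention_s(rt_ref_s: float, t_ref_k: float, t_k: float, ea_ev: float) -> float:
    """Retention time at t_k given rt_ref_s at t_ref_k, assuming RT is inversely
    proportional to a leakage current with Arrhenius activation energy ea_ev."""
    return rt_ref_s * math.exp(ea_ev / K_B_EV_PER_K * (1.0 / t_k - 1.0 / t_ref_k))

if __name__ == "__main__":
    t_hot, t_room = 358.15, 298.15  # 85 C and 25 C
    # Activation energies from the text; the 85 C retention times are hypothetical.
    states = {"high-RT state": (1.0, 2.0), "low-RT state": (0.9, 0.5)}  # (Ea eV, RT s)
    for name, (ea, rt_hot) in states.items():
        rt_room = scale_retention_s(rt_hot, t_hot, t_room, ea)
        print(f"{name}: {rt_hot:.2f} s at 85 C -> {rt_room:.0f} s at 25 C")
```

With these activation energies the ratio between the two states' retention times grows by roughly a factor of two from 85 °C to 25 °C, and both states lengthen by more than two orders of magnitude at room temperature, consistent with VRT mattering mainly at high application temperatures.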

Physical failure analysis of VRT cells has shown dislocations and silicon crystal defects only in cells that spend large amounts of time with retention times below 300 ms. No defects are found in VRT cells whose minimum RT is greater than 300 ms.

Chips with low minimum VRT typically also have many weak cells for ordinary RT (Figure 26). This correlation provides a way to screen product so that parts with low minimum VRT do not reach customers. Product with a 16-ms retention time requirement was screened at retention times many times longer. A high-temperature module test used to screen for VRT during burn-in provided data on the effectiveness of the wafer-level screens.

Figure 26

As the retention time specifications increased to 32 ms and greater, it was found that the wafer-level RT guardband could be reduced without degrading VRT performance. VRT problems were eliminated as a result of drastic reduction in the occurrence of dislocations and the improvement in retention time distributions.

Conclusions

The substrate plate trench (SPT) cell used for the 4Mb DRAM has been the basic concept for three successive DRAM generations at IBM, as the cell area has been reduced from 11.3 µm² to 0.6 µm². The outstanding advantages of the SPT cell are a more planar topography, reduced junction leakage, and a reduced soft-error rate (SER).

To obtain the cell size reductions required by competition, the minimum lithographic image size has been reduced by a factor of 0.7 at each generation, and new features have been added to the cell technology. The increase in process complexity with each generation, and the resultant increase in development cost, have necessitated alliance partnerships, beginning with the 64Mb generation, to reduce the investment required of any one company.
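The arithmetic behind this conclusion can be checked quickly. Starting from the 0.7-µm 4Mb ground rules and applying a nominal 0.7× shrink per generation gives the image sizes printed by the sketch, and comparing the area reduction from lithography alone (0.7² per generation) with the overall cell-area reduction from 11.3 µm² to 0.6 µm² shows how much came from new cell features. The uniform 0.7 factor is an idealization, and the helper names are illustrative; actual ground rules for each generation differ somewhat.

```python
def image_sizes_um(start_um: float, shrink: float = 0.7, generations: int = 4) -> list[float]:
    """Minimum image size per generation under a uniform shrink factor."""
    return [start_um * shrink ** i for i in range(generations)]

def cell_area_breakdown(area_start_um2: float, area_end_um2: float,
                        shrink: float, generations_elapsed: int):
    """Split the total cell-area reduction into a lithographic part (shrink**2 per
    generation) and the remaining factor attributed to cell-design changes."""
    total = area_start_um2 / area_end_um2
    litho = 1.0 / (shrink ** 2) ** generations_elapsed
    return total, litho, total / litho

if __name__ == "__main__":
    for gen, size in zip(["4Mb", "16Mb", "64Mb", "256Mb"], image_sizes_um(0.7)):
        print(f"{gen:>6s}: {size:.2f} um")
    total, litho, extra = cell_area_breakdown(11.3, 0.6, 0.7, generations_elapsed=3)
    print(f"total {total:.1f}x = lithography {litho:.1f}x * cell design {extra:.1f}x")
```

Lithographic scaling alone accounts for roughly an 8.5× area reduction over three generations; the remaining factor of about 2.2 reflects cell-level changes such as the collar, the borderless contact, and new strap-formation methods.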

The first generations of I-line and DUV lithography were developed along with the 4Mb and 16Mb DRAMs, respectively. Additional lithography improvements are being developed as 64Mb and 256Mb development proceeds.

The important cell features which enable shrinkage of the SPT cell are the addition of a thick SiO2 collar around the top part of the trench, the borderless contact, and new methods of forming the strap contact between storage trench fill and node diffusion.

The 4Mb and 16Mb DRAMs have p-MOS arrays in retrograde n-wells implanted in a p-epitaxial layer on a p+ substrate, which serves as the plate of the storage capacitors. With the 64Mb generation, an n-MOS array is used to improve performance, and the buried n+ layer that forms the plate is diffused from the bottom part of the trench into a p-substrate.

Many technology features first introduced in 4Mb and 16Mb DRAMs are also important to advanced logic, including shallow-trench isolation, polish-planarization techniques, retrograde n-wells, and planarized wiring technology. The wiring technology developed for the 0.7-µm 4Mb DRAM technology is found to be scalable to 0.25-µm, 256Mb technology.

The need for DRAM retention time improvements drives the study of important leakage phenomena. GIDL was studied in connection with a storage-trench-gated n-well-to-substrate leakage impacting n-well bias generation at burn-in conditions in the 4Mb and 16Mb chips and a cell node leakage mechanism affected by 3D geometry in the 16Mb cell. Dislocation defects, which appeared in the early stages of trench process development, were found to cause retention time loss and gave rise to the variable retention time phenomenon. These problems were solved after simulation suggested techniques which eliminated the dislocations.

Use of the same basic cell concept for four generations has encouraged the transfer of knowledge and process techniques from one generation to the next. At this point, the cell design to be used for the 1Gb DRAM chip has not yet emerged.

Acknowledgments

This paper would not be complete without acknowledgment that the developments described in this paper are the work of IBM Research, the IBM DRAM development teams in Essex Junction, Vermont, and the Advanced Semiconductor Technology Center in East Fishkill, New York, including the Siemens and Toshiba alliance partners. The references cited in this publication are only a small part of what was contributed by the workers from these laboratories. There are too many people to name individuals. The principal author would also like to thank the anonymous referees and editors of the IBM Journal of Research and Development for their constructive suggestions, and his manager for support during the writing of this paper.

References

Received July 25, 1994; accepted for publication October 14, 1994