

### Low-Voltage and High-Speed Interface in NAND Flash

### (1.2V and 800 Mbps IO)

### Changhyuk Lee

Email: <a href="mailto:changhyuk.lee@skhynix.com">changhyuk.lee@skhynix.com</a>

SKHYNIX



## Outline

- Trend of High Performance IO
- Technology Breakthrough (Process and Design)
- Prospect of System Application
- Conclusion






## **Interface Roadmap**

#### Enhanced Performance & Expandable Lanes





### Page Size vs. IO Bandwidth (NAND)







# Bandwidth Requirement for Mobile DRAM



Source: skhynix DRAM marketing team








# **Technical Breakthrough**

- NAND flash IO performance has reached its limit (~400Mbps) because of the peripheral transistor.
- This limit can be overcome by adding a slim gate-oxide transistor equivalent to that of mobile DRAM.
- The additional process cost is about 10% (2D floating gate) and 5% (3D NAND).
- SK hynix has both NAND flash and DRAM technology, so adopting the technology and circuits is easier.
- Ref: A 64Gb NAND Flash Memory with 800MB/s Synchronous DDR Interface, Hwang Huh, 2012 4th IEEE International Memory Workshop



# Addition of Slim Transistor

- A slim Tr. with Tox ≈ 3 nm is added alongside the conventional flash Tr.
- A transmitter/receiver made of thin Tr. cannot support 800Mb/s/pin at VCCQ = 1.2V.
- A simple solution would be to reduce the Tox of the thin Tr.
- However, the thin Tr. Tox is tied to the flash cell tunnel oxide thickness, so reducing it would degrade cell reliability.

|                                            | Conventional NAND | HS-IO NAND | Remark                                                                        |
|--------------------------------------------|-------------------|------------|-------------------------------------------------------------------------------|
| Thick Transistor (Tox = ~50 nm)            | O                 | O          | For high voltage generation and transfer                                      |
| Thin (Medium) Transistor (Tox = ~10 nm)    | O                 | O          | For peripheral circuitry; Tox is limited by flash cell tunnel oxide thickness |
| Slim Transistor (Tox = ~3 nm)              | X                 | O          | For IO-speed related circuitry; operation voltage = 1.2V                      |



# **Block diagram**



- Power domain split (HV and LV)
- Keep the source power of the HV pump at 3.3V
- Change the power of the data in/out path and other peripheral areas to 1.2V
  - ✓ HV to LV interface circuitry
  - ✓ 1.2V analog and digital circuitry
- Local IO sense amp



# **Split Power Page buffer**

A workaround that avoids the slim Tr. breakdown voltage issue:

→ A thin Tr. is added, and its gate bias is properly controlled.

→ An adequate precharge/discharge sequence is proposed.





# LSA Data In/Out Architecture

The LSA (local IO sense amp) data in/out architecture keeps the sensing time below 5 ns.

→ The IO/IOb line is divided into segments, and an LSA is added to each segment.





# **Current Reduction**

1.2V 800MB/s operation reduces the IO and data path current remarkably.

→ IO current in read operation is reduced because of the lower Tx junction capacitance and the lower IO swing voltage.

→ The average data path current is similar to the conventional case, but the data transfer time is halved, so the total charge is reduced.

→ System power consumption falls because both the current and the operation voltage go down (P = IV), which allows more chips to be interleaved.

Estimated average current [mA]:

| Operation | Supply          | Planes  | Conventional (3.3V/1.8V, 400MB/s) | This work (1.2V/1.2V/3.3V, 800MB/s) |
|-----------|-----------------|---------|-----------------------------------|-------------------------------------|
| Read      | VccQ (IO)       |         | 70mA                              | 55mA                                |
| Read      | Vcc (Data path) |         | 80mA                              | 82mA                                |
| Read      | Vcc (tR)        | 1 plane | 24mA                              | Same as conventional                |
| Read      | Vcc (tR)        | 2 plane | 31mA                              | Same as conventional                |
| Program   | Vcc (Data path) |         | 85mA                              | 87mA                                |
| Program   | Vcc (tPROG)     | 1 plane | 27mA                              | Same as conventional                |
| Program   | Vcc (tPROG)     | 2 plane | 34mA                              | Same as conventional                |
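As a rough check on the P = IV argument, the sketch below turns the VccQ (IO) read currents from the table into IO power and energy per transferred byte. It assumes that the IO supply is 1.8V in the conventional case and 1.2V in this work, which is one reading of the voltage labels in the table header. The function names are illustrative only.

```python
# Rough IO power estimate from the table's VccQ (IO) read currents.
# ASSUMPTION: VccQ is 1.8 V (conventional) and 1.2 V (this work); the table
# header lists the voltages but does not label which rail is which.


def io_power_mw(current_ma: float, vccq_v: float) -> float:
    """P = I * V, with mA * V giving mW."""
    return current_ma * vccq_v


def energy_per_byte_nj(power_mw: float, rate_mbytes_per_s: float) -> float:
    """mW divided by MB/s gives nJ per byte (1e-3 W / 1e6 B/s = 1e-9 J/B)."""
    return power_mw / rate_mbytes_per_s


conventional_mw = io_power_mw(70.0, 1.8)   # about 126 mW at 400 MB/s
this_work_mw = io_power_mw(55.0, 1.2)      # about 66 mW at 800 MB/s

print(f"IO power: {conventional_mw:.0f} mW -> {this_work_mw:.0f} mW")
print(f"Energy per byte: {energy_per_byte_nj(conventional_mw, 400.0):.3f} nJ -> "
      f"{energy_per_byte_nj(this_work_mw, 800.0):.3f} nJ")
```

Under these assumptions the IO power roughly halves while the data rate doubles, so the energy spent per byte on the IO rail drops by close to a factor of four.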



## **Characteristics of Thin/Slim Transistors**

A transmitter composed of slim Tr. (slim Tr. Tx) can support 800Mb/s/pin.

- → The on-resistance of slim Tr. is lower than that of thin Tr.
- → Slim Tr. Tx shows lower pin capacitance than thin Tr. Tx because less Tr. width is needed for a given driver strength (e.g. 40 ohm).
- → It meets the data valid window (~0.75 ns) for 800Mb/s/pin.
- → Adding slim Tr. can raise the IO speed beyond 800Mb/s/pin (e.g. 1066 Mb/s/pin).

| Char.                                      | Thin Tr.         | Slim Tr.       |
|--------------------------------------------|------------------|----------------|
| Gate Oxide Thickness                       | ~100Å            | ~30Å           |
| Gate Type                                  | Single Poly Gate | Dual Poly Gate |
| Operation Voltage                          | 2.3V             | 1.2V           |
| Nominal Tr. Length                         | 0.3um            | 0.1um          |
| Saturation Current                         | ~250uA           | ~250uA         |
| Breakdown Voltage<br>(Gate Oxide/Junction) | ~5V/~5V          | ~1.5V/~2.3V    |
| Capacitance                                | High             | Low            |

Simulated output waveform: 1 UI = 0.75 ns, target data valid window: 0.5 UI



### **Chip Architecture & Structural Features**



| Feature             | Value                                             |
|---------------------|---------------------------------------------------|
| Bit per Cell        | 2                                                 |
| Density             | 64Gb                                              |
| Technology          | 1X nm F.G                                         |
| Organization        | 2-plane × 16KB × 128 pages × 4K blocks × 8-I/O    |
| Program Performance | 20MB/s                                            |
| I/O Bandwidth       | 800MB/s                                           |
| Erase Time          | 5ms                                               |








## **Sequential Read**

Sequential read performance is maximized when the IO bus is fully occupied by data transfer.

In sequential read operation, interleaving more chips improves read performance. The bus stays full, and read performance reaches its maximum, when

$tR < (N-1) \times tData$ (N = number of interleaved chips)

Therefore, raising the NAND IO data rate directly improves controller-to-NAND interface performance, given an appropriate number of interleaved chips (2 dies at 400Mbps => 4 dies at 800Mbps in this example).
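The interleave condition can be sketched as a small calculation. The snippet below finds the smallest N that satisfies tR < (N-1) × tData for a given page size and IO rate. The 50us read time is a hypothetical input chosen only for illustration, the 16KB page is borrowed from the chip organization, and the function names are not from the slides.

```python
import math

# Interleave condition from the slide: the bus stays full when
# tR < (N - 1) * tData. The read time below is HYPOTHETICAL.


def transfer_time_us(page_bytes: int, rate_mbytes_per_s: float) -> float:
    """Bytes divided by MB/s gives microseconds."""
    return page_bytes / rate_mbytes_per_s


def min_interleave(t_busy_us: float, t_data_us: float) -> int:
    """Smallest N that satisfies t_busy < (N - 1) * t_data."""
    return math.floor(t_busy_us / t_data_us) + 2


PAGE_BYTES = 16 * 1024   # 16KB page, as in the chip organization
T_R_US = 50.0            # hypothetical array read time (assumption)

for rate in (400.0, 800.0):
    t_data = transfer_time_us(PAGE_BYTES, rate)
    n = min_interleave(T_R_US, t_data)
    print(f"{rate:.0f} MB/s: tData = {t_data:.2f} us, minimum N = {n}")
```

With these made-up inputs the faster bus needs more dies to stay saturated, because each transfer hides less of tR. The slide's own 2-die and 4-die example depends on timing values that are not given in the text.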





## **Sequential Write**

- ✓ Sequential write performance follows the same logic as sequential read performance.
- ✓ The criterion for maximum write performance is $tPROG < (N-1) \times tData$ (N = number of interleaved chips).
- ✓ Ideally, 16-die interleave is possible at 800Mbps in this case (8 dies at 400Mbps), but the power drop caused by many interleaved dies imposes another limit.
- ✓ Therefore, at state-of-the-art performance (400Mbps), the practical maximum seems to be 4-die interleave.
- ✓ 1.2V IO operation may help overcome the power-drop limit (by a simple calculation, it allows about 25% more interleaved dies).



# Random Read Operation (1)

Before comparing the 400Mbps and 800Mbps data rates, let's make a few simple approximations.

- Random IOPS = Data size / (Read Time + tECC)
  - Read Time = Map Read + Data Read
- Consider SLC buffer techniques, because they are widely used.
  - Map read is executed from the SLC buffer (when RAM is small) or from RAM (when RAM is large).
  - Data read is executed from the SLC buffer (hot data) or from MLC (cold data).
- When command queuing is used, high-frequency IO is more effective.





A higher hot-data hit rate, sufficient RAM, more interleaved dies, and a flash-aligned workload and file system $\Rightarrow$ high-frequency data IO becomes more effective.

| Cases                     | R/R IOPS (400 Mbps) | R/R IOPS (800 Mbps) | Difference | Simple Assumption of Operation       |
|---------------------------|---------------------|---------------------|------------|--------------------------------------|
| cold data                 | 29.6K               | 34.8K               | 17%        | Map read (SLC) + Data read (MLC)     |
| hot data                  | 42.1K               | 53.3K               | 27%        | Map read (SLC) + Data read (SLC)     |
| hot data (RAM)            | 57.1K               | 80.0K               | 40%        | Map read (RAM) + Data read (SLC)     |
| hot data (RAM), 2 x tDATA | 36.4K               | 57.1K               | 57%        |                                      |
| hot data (RAM), 4 x tDATA | 21.1K               | 36.4K               | 73%        | N x tDATA means data transfer/die is |
| hot data (RAM), 8 x tDATA | 11.4K               | 21.1K               | 84%        |                                      |

Assumptions:

- SLC read: 25us
- MLC read: 60us
- Data transfer per die: 4kB
- tECC: 10us
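To see why the gain grows with the amount of data moved per die, the sketch below models IOPS as 1 / (fixed overhead + N × tDATA). The transfer time takes 4kB as 4000 bytes, which gives 10us at 400MB/s and 5us at 800MB/s. The 7.5us fixed overhead is an assumption: it is a value fitted to the "hot data (RAM)" rows of the table, not a figure stated in the slides, and the function name is hypothetical.

```python
# IOPS model: 1 / (fixed overhead + N * tDATA).
# ASSUMPTIONS: 4kB = 4000 bytes; FIXED_US is fitted to the table's
# "hot data (RAM)" rows and is not stated in the slides.

FIXED_US = 7.5
TRANSFER_BYTES = 4000


def random_read_kiops(n_transfers: int, rate_mbytes_per_s: float) -> float:
    """Thousands of reads per second when each read moves n_transfers * tDATA."""
    t_data_us = TRANSFER_BYTES / rate_mbytes_per_s   # bytes / (MB/s) = us
    total_us = FIXED_US + n_transfers * t_data_us
    return 1e3 / total_us                            # (1e6 / us) / 1e3


for n in (1, 2, 4, 8):
    slow = random_read_kiops(n, 400.0)
    fast = random_read_kiops(n, 800.0)
    gain = (fast / slow - 1.0) * 100.0
    print(f"{n} x tDATA: {slow:.1f}K -> {fast:.1f}K ({gain:.0f}% gain)")
```

As N grows, the fixed term matters less, so the gain approaches the 2x ratio of the two bus rates.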




# **Random Write Performance**

In random write, the NAND data transfer time (~80us) is very small compared with tPROG (~1500us).

Random write operation is more complex than random read.

The high-frequency effect is negligible at present. As in the random read case, however, once a high hot-data hit rate, sufficient RAM, more interleaved dies, and a flash-aligned workload and file system are in place, high-frequency IO may become more effective.
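A quick calculation with the figures above shows how small the effect is for a single die. The sketch assumes that a write cycle is simply transfer time plus tPROG with no interleaving or overlap, and that doubling the IO rate halves the transfer time from 80us to 40us. Both are simplifying assumptions, and the function name is illustrative.

```python
# Single-die write cycle: data transfer followed by programming.
# ASSUMPTIONS: no interleaving or overlap, and tData halves (80us -> 40us)
# when the IO rate doubles.

T_PROG_US = 1500.0


def write_cycle_us(t_data_us: float, t_prog_us: float = T_PROG_US) -> float:
    """Time to transfer one unit of data and then program it."""
    return t_data_us + t_prog_us


cycle_400 = write_cycle_us(80.0)   # 1580 us
cycle_800 = write_cycle_us(40.0)   # 1540 us
gain_pct = (cycle_400 / cycle_800 - 1.0) * 100.0

print(f"Write cycle: {cycle_400:.0f} us -> {cycle_800:.0f} us ({gain_pct:.1f}% faster)")
```

The gain is under 3% in this simple model, which is consistent with the statement that the high-frequency effect on random write is negligible at present.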






## Conclusion

- As the interface performance demanded by NAND flash memory users is rapidly increasing, a technical breakthrough in NAND flash IO is required.
- We presented a 1.2V 800MB/s IO process and design for the first time worldwide.
  - ✓ A slim Tr. with a ~30 Å (~3 nm) gate oxide was added to the conventional flash Tr.
  - ✓ Split power page buffer
  - ✓ LSA data in/out architecture, etc.
  - ✓ IO and data path current was reduced.
- System performance compared with that of 400MB/s
  - ✓ Sequential read performance can be doubled with more die interleave, and random read performance can be improved with die interleave and innovation in system-level control.
  - ✓ Write performance can also be improved effectively if the power drop due to multi-die interleave can be managed.