

# ECC/DSP System Architecture for Enabling Reliability Scaling in Sub-20nm NAND

Eran Sharon, Idan Alrod, Avi Klein, Alon Eyal, Ofer Shapira

Intelligent Memory Systems, Memory Division, SanDisk Corp.

August 2013



During our presentation today we may make forward-looking statements.

Any statement that refers to expectations, projections or other characterizations of future events or circumstances is a forward-looking statement, including those relating to market growth, industry trends, future memory technology, technology transitions and future products. This presentation contains information from third parties which reflect their projections as of the date of issuance.

Actual results may differ materially from those expressed in these forward-looking statements due to factors detailed under the caption "Risk Factors" and elsewhere in the documents we file from time to time with the SEC, including our annual and quarterly reports.

We undertake no obligation to update these forward-looking statements, which speak only as of the date hereof.

Disclaimer: This tutorial provides an overview of various techniques and concepts, some or all of which may not necessarily reflect what SanDisk is actually using in their products.



- Gap Between Product Requirements and Technology Capability
  - Applications Requirements: Endurance, Performance, Power
  - Reliability Challenges with Scaling
- ECC/DSP solutions
  - Tier 0: Adaptive NAND Parameters Optimization
  - Tier 1: Noise Reduction
  - Tier 2: Advanced Error Correction Coding (ECC)
  - Tier 3: Second level Error Correction (RAID)
  - Tier 4: Flash Management Algorithms
  - Tier 5: Host Data Manipulation
- Summary




# **Increasing Product Requirements**





# Gap Between Raw Memory Capability and Applications Requirements










## Optimized Endurance for enhanced video download & application caching







## Optimized Performance for superior gaming experience

# Better performance







## Optimized Power Consumption for longer web browsing







# **Reliability Challenges with Scaling**

As an example, we will describe the phenomenon of Read Disturb.







# 1. BL Pre-Charge





#### Threshold Voltage numbers are nominal



- 1. BL Pre-Charge
- 2. Gate Voltages
  - Opens "select gate" transistors
  - Opens unselected cells ("Victims")
  - Senses state of selected cell ("Target")



# Read Operation Target Cell is Erased – Read "1"



- 1. BL Pre-Charge
- 2. Gate Voltages
- 3. Sensing












# Read Operation Erased Cell – Read "1"



- 1. BL Pre-Charge
- 2. Gate Voltages
- 3. Sensing

Current is Sensed → Cell is Erased! Read "1"





# Read Operation Target cell is Programmed – Read "0"



- 1. BL Pre-Charge
- 2. Gate Voltages
- 3. Sensing

Current is NOT Sensed → Cell is Programmed! Read "0"
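The three read steps above can be sketched as a toy model of a single NAND string. This is only an illustration: the voltages `V_READ` and `V_PASS`, the threshold values, and the function names are assumptions chosen for the example, not values from the slides.

```python
# Toy model of reading one cell in a NAND string (illustrative voltages only).
V_READ = 0.0   # assumed read voltage applied to the selected (target) word line
V_PASS = 6.0   # assumed pass voltage applied to unselected (victim) word lines


def cell_conducts(gate_voltage, threshold_voltage):
    """A cell conducts when its gate voltage exceeds its threshold voltage."""
    return gate_voltage > threshold_voltage


def read_bit(string_vth, target):
    """Return 1 (erased) if current flows through the whole string, else 0."""
    for index, vth in enumerate(string_vth):
        gate = V_READ if index == target else V_PASS
        if not cell_conducts(gate, vth):
            return 0  # one non-conducting cell blocks the bit-line current
    return 1


# Assumed thresholds: erased cells are negative, programmed cells positive.
string_vth = [-2.0, 1.5, -2.0, 1.5]
print(read_bit(string_vth, target=0))  # erased target: current is sensed
print(read_bit(string_vth, target=1))  # programmed target: no current
```

The pass voltage is deliberately higher than any programmed threshold so that only the target cell decides whether current flows.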









- P/E cycles lead to Tunnel Oxide (Tox) degradation that creates traps
- "Weak Programming" in unselected cells due to unintentional tunneling of electrons to the floating gate (FG)

1. BL Pre-Charge
2. Gate Voltages
3. Sensing

Current is NOT Sensed → Cell is Programmed!
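The weak-programming effect can be illustrated with a deliberately simple model in which every read adds a fixed threshold shift to an erased victim cell. The linear shift, the step sizes, and the idea that a worn cell sees a larger step are assumptions made for this sketch; real disturb behaviour is not linear.

```python
# Toy read-disturb model: each read nudges an erased victim's threshold upward.
V_READ = 0.0          # assumed read reference between erased and programmed
DISTURB_STEP = 0.001  # assumed threshold shift (V) per read for a victim cell


def reads_until_flip(erased_vth, step=DISTURB_STEP, v_read=V_READ):
    """Count reads of the block until an erased victim would read as "0"."""
    vth = erased_vth
    reads = 0
    while vth < v_read:   # still senses as erased ("1")
        vth += step       # weak programming caused by the pass voltage
        reads += 1
    return reads


fresh = reads_until_flip(erased_vth=-2.0, step=0.001)
worn = reads_until_flip(erased_vth=-2.0, step=0.004)  # assumed: more traps, larger step
print(fresh, worn)
```

Under these assumptions the worn cell flips after far fewer reads, which is why read counts are usually tracked per block.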



# ECC/DSP Methods: from NAND to System







# Error Handling System Solutions Early Technologies



#### Basic ECC sufficient to meet application requirements






# Error Handling System Solutions Advanced (Sub-20nm)



Sophisticated ECC and DSP techniques are applied to mitigate the natural drift in reliability and to meet the more demanding requirements of embedded applications.



# Tier 0: Adaptive NAND Parameters Optimization

- Adaptive NAND parameters optimization along the memory lifetime.
- Parameter setting ("trimming") of the Program, Erase and Read parameters.
- System level feedback adapts the parameters to:
  - Memory wearing and error rates along the lifetime
  - Die to die, block to block, WL to WL variations within the memory
  - Host data patterns
- Once NAND level optimization has been exhausted, the residual noises and errors need to be handled at system level





# Adaptive Read Thresholds – Example

#### Problem:

- Cell Voltage Distribution is not fixed:
  - Changes along the memory lifetime with W/E cycling and time (data retention, DR)
  - Varies within a die from block to block, WL to WL, ...
- Using a fixed set of default thresholds results in high BER and decoding failures




#### Solution:

Adaptive read thresholds
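A minimal sketch of why an adapted threshold helps, assuming each cell state is Gaussian and both states are equally likely. The distribution parameters, the default threshold, and the grid search are all assumptions for illustration; a real controller would estimate the distributions from actual reads.

```python
import math


def q_function(x):
    """Tail probability of a standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def bit_error_rate(threshold, low, high):
    """BER for two equally likely Gaussian states given as (mean, sigma)."""
    low_mean, low_sigma = low
    high_mean, high_sigma = high
    low_read_high = q_function((threshold - low_mean) / low_sigma)
    high_read_low = q_function((high_mean - threshold) / high_sigma)
    return 0.5 * (low_read_high + high_read_low)


def best_threshold(low, high, steps=1000):
    """Grid-search the threshold between the two means that minimizes BER."""
    lo, hi = low[0], high[0]
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda t: bit_error_rate(t, low, high))


DEFAULT_THRESHOLD = 1.0              # assumed default, tuned for fresh memory
aged = ((0.1, 0.35), (1.6, 0.35))    # assumed shift and widening after wear

adapted = best_threshold(*aged)
print(bit_error_rate(DEFAULT_THRESHOLD, *aged))  # BER with the fixed default
print(bit_error_rate(adapted, *aged))            # BER with the adapted threshold
```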



# Tier 1: Noise Reduction

- System level residual NAND "noise" reduction via DSP and coding techniques, aimed at reducing error rates to a bare minimum level
  - Tier 1 countermeasures may reduce raw NAND error rates from a ~1E-1 error level to ~1E-2 error level
  - Tier 1 countermeasures are aimed at:
    - Ensuring that the next Tier 2 Error Correction Coding (ECC) is cost effective (i.e. less redundancy)
    - Maximizing performance and reducing power consumption
  - Tier 1 countermeasures deal with non intrinsic "noises", which can be cancelled out, mitigated or compensated for:
    - Data dependent noises such as cross-coupling induced widening, back pattern effects, Program and Read Disturbs, Over programming errors, etc.





# NAND Scaling Challenges – Interferences





Source: Semiconductor Insights

![](_page_27_Picture_6.jpeg)

![](_page_28_Picture_0.jpeg)

![](_page_28_Figure_1.jpeg)


![](_page_29_Picture_0.jpeg)

![](_page_29_Figure_1.jpeg)

With technology scaling, CCC increases dramatically

Air Gap (AG) technology makes the 19nm (AG) CCC equivalent to that of 24nm (no AG): a ~27% reduction.

![](_page_30_Picture_0.jpeg)

Digitally mitigating cross coupling and other data-dependent noise during read by taking into account the read state of the neighboring cells

![](_page_30_Figure_2.jpeg)
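One way to picture neighbor-aware reading is to shift the read threshold of a cell according to the state read from its neighbor. The sketch below assumes a single-bit cell, a single neighbor, and a fixed coupling shift; `BASE_THRESHOLD` and `COUPLING_SHIFT` are illustrative values, not figures from the slides.

```python
# Toy cross-coupling compensation: pick the read threshold per cell
# according to the state read from its neighbouring cell.
BASE_THRESHOLD = 1.0   # assumed threshold when the neighbour is erased
COUPLING_SHIFT = 0.3   # assumed Vth shift induced by a programmed neighbour


def read_plain(cell_vth):
    """Read with the fixed threshold, ignoring neighbours (0 = programmed)."""
    return [0 if vth > BASE_THRESHOLD else 1 for vth in cell_vth]


def read_with_compensation(cell_vth, neighbour_bits):
    """Read each cell with a threshold adjusted for its neighbour's state."""
    bits = []
    for vth, neighbour in zip(cell_vth, neighbour_bits):
        # A programmed neighbour (bit 0) is assumed to have pushed this
        # cell's Vth upward, so the threshold is raised by the same amount.
        shift = COUPLING_SHIFT if neighbour == 0 else 0.0
        bits.append(0 if vth > BASE_THRESHOLD + shift else 1)
    return bits


# First cell: erased, but coupled up to 1.1 V by a programmed neighbour.
cell_vth = [1.1, 0.4, 2.0]
neighbour_bits = [0, 1, 0]
print(read_plain(cell_vth))                              # [0, 1, 0]
print(read_with_compensation(cell_vth, neighbour_bits))  # [1, 1, 0]
```

The fixed-threshold read misreads the first cell as programmed, while the compensated read recovers it as erased.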

![](_page_31_Picture_0.jpeg)

# Tier 2: Advanced Error Correction Coding (ECC)

- Advanced Error Correction Coding (ECC) is required in order to handle the residual errors of tier 1
  - Tier 2 ECC can reduce the ~1E-2 residual error levels of tier 1 to ~1E-16 error level
  - State of the art iterative coding techniques, such as LDPC, are replacing algebraic coding techniques, such as BCH codes
  - Advanced ECC techniques are essential for achieving an optimal cost, endurance and performance tradeoff, as they allow operation near the theoretical limit (Shannon limit), providing maximal correction capability for a given amount of overprovisioning ("ECC redundancy")
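To get a feel for how redundancy turns a raw error rate into a very low failure rate, the sketch below computes the probability that a codeword contains more errors than a code can correct. It assumes a bounded-distance decoder that corrects up to `t` bit errors (a BCH-like model, not LDPC) and independent bit errors; the codeword length and the values of `t` are illustrative only.

```python
import math


def log_binomial_pmf(n, k, p):
    """Natural log of P(exactly k errors in n bits), for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def decode_failure_prob(n, t, p):
    """P(more than t bit errors in an n-bit codeword), i.i.d. errors at rate p."""
    return sum(math.exp(log_binomial_pmf(n, k, p)) for k in range(t + 1, n + 1))


# Assumed 8192-bit codeword at a raw bit error rate of 1E-2.
for t in (100, 150, 200):
    print(t, decode_failure_prob(8192, t, 1e-2))
```

The failure probability drops steeply as `t` grows, at the price of more parity bits per codeword.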

![](_page_31_Picture_6.jpeg)

![](_page_32_Picture_0.jpeg)

How can we compute the Flash capacity? Information Theory (Shannon 1948)

The computation is based on knowing the probability of reading a voltage level Y given that a voltage level X was programmed

![](_page_32_Figure_3.jpeg)
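As a simplified illustration of this idea, the sketch below computes the mutual information between the programmed level X and the read level Y from a table of P(Y | X). It assumes equally likely programmed levels, so the result is a lower bound on the true capacity (which maximizes over the input distribution); the 4-level transition table is invented for the example.

```python
import math


def mutual_information(transition):
    """I(X;Y) in bits for uniform X, with transition[x][y] = P(Y=y | X=x)."""
    num_inputs = len(transition)
    num_outputs = len(transition[0])
    p_x = 1.0 / num_inputs
    p_y = [sum(p_x * row[y] for row in transition) for y in range(num_outputs)]
    info = 0.0
    for row in transition:
        for y, p_y_given_x in enumerate(row):
            if p_y_given_x > 0.0:
                info += p_x * p_y_given_x * math.log2(p_y_given_x / p_y[y])
    return info


# Four programmed levels; each is misread as an adjacent level with an
# assumed probability of 2% per neighbour.
four_level = [
    [0.98, 0.02, 0.00, 0.00],
    [0.02, 0.96, 0.02, 0.00],
    [0.00, 0.02, 0.96, 0.02],
    [0.00, 0.00, 0.02, 0.98],
]
print(mutual_information(four_level))  # bits per cell, below the ideal 2.0
```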

Actual computations are more complicated. They depend on:

- Verify and read voltage levels
- Data retention
- P/E cycles
- Temperature
- Tuning voltage ambiguity
- Cross coupling
- Back pattern
- Program/Read disturb

![](_page_33_Picture_0.jpeg)

# Approaching the Shannon Limit

![](_page_33_Figure_2.jpeg)

Source: Forward Insights

![](_page_33_Picture_4.jpeg)

![](_page_34_Picture_0.jpeg)

# Tier 3: Second level Error Correction (RAID)

- For enhanced reliability, especially in SSD applications, a second level of error correction is required to deal with complete NAND failures that result in colossal errors. RAID-like techniques are used for that purpose.
  - Tier 3 level protection is used for both:
    - Reducing tier 2 error rates from ~1E-16 to ~1E-24 or lower
    - Reducing dPPM levels due to gross NAND failure, such as WL breaks, WL shorts, etc.
  - Tier 3 protection may require extra overprovisioning, or may only maintain the overprovisioning temporarily in the controller until verifying data integrity.
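A minimal sketch of a RAID-like scheme, assuming the simplest case of one XOR parity page protecting a stripe of equally sized pages. The stripe layout (one page per die) and the function names are hypothetical; the slides do not specify the scheme used.

```python
from functools import reduce


def xor_pages(pages):
    """Bytewise XOR of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def recover_page(surviving_pages, parity):
    """Rebuild the single missing page from the survivors and the parity page."""
    return xor_pages(surviving_pages + [parity])


# One page per die (hypothetical layout), plus one XOR parity page.
stripe = [b"die0 data", b"die1 data", b"die2 data"]
parity = xor_pages(stripe)

lost = 1  # pretend die 1 suffered a gross failure such as a WL short
survivors = [page for i, page in enumerate(stripe) if i != lost]
print(recover_page(survivors, parity) == stripe[lost])  # True
```

A single parity page recovers only one lost page per stripe; tolerating more failures requires more redundancy.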

![](_page_34_Picture_7.jpeg)

![](_page_35_Picture_0.jpeg)

![](_page_35_Picture_1.jpeg)

![](_page_35_Picture_2.jpeg)

![](_page_35_Picture_3.jpeg)

![](_page_36_Picture_0.jpeg)

# Tier 4: Flash Management Algorithms

- Back End Flash Management algorithms manage how logical data is stored at the physical NAND level, in a way that provides the best performance (sequential, random, or any combined use case) and the best endurance
- Examples of Flash management functions are:
  - Logical to physical address mapping
  - Wear leveling
  - Garbage collection
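The three functions listed above can be sketched together in a tiny page-mapped translation layer. Everything here (the class name `TinyFTL`, the least-erased-block policy, the choice of garbage collection victim) is a hypothetical simplification for illustration, not a description of any product.

```python
class TinyFTL:
    """Page-mapped flash translation layer sketch (hypothetical design)."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.erase_count = [0] * num_blocks
        self.write_pointer = [0] * num_blocks  # next free page in each block
        self.l2p = {}    # logical page number -> (block, page)
        self.valid = {}  # (block, page) -> logical page number

    def _pick_block(self):
        """Wear leveling: least-erased block that still has a free page."""
        free = [b for b, ptr in enumerate(self.write_pointer)
                if ptr < self.pages_per_block]
        if not free:
            raise RuntimeError("no free pages: garbage collection needed")
        return min(free, key=lambda b: (self.erase_count[b], b))

    def write(self, lpn):
        """Map a logical page to a fresh physical page; invalidate the old copy."""
        old = self.l2p.get(lpn)
        if old is not None:
            del self.valid[old]
        block = self._pick_block()
        location = (block, self.write_pointer[block])
        self.write_pointer[block] += 1
        self.l2p[lpn] = location
        self.valid[location] = lpn
        return location

    def garbage_collect(self):
        """Relocate live pages out of the emptiest full block, then erase it."""
        full = [b for b, ptr in enumerate(self.write_pointer)
                if ptr == self.pages_per_block]
        if not full:
            return None

        def live_pages(block):
            return [lpn for (blk, _), lpn in self.valid.items() if blk == block]

        victim = min(full, key=lambda b: len(live_pages(b)))
        for lpn in live_pages(victim):
            self.write(lpn)  # copy still-valid data to another block
        self.write_pointer[victim] = 0
        self.erase_count[victim] += 1
        return victim


ftl = TinyFTL(num_blocks=2, pages_per_block=2)
ftl.write(0)
ftl.write(1)
ftl.write(0)           # rewrite: the old copy in block 0 becomes invalid
ftl.garbage_collect()  # moves logical page 1 out of block 0, then erases it
print(ftl.l2p, ftl.erase_count)
```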

![](_page_36_Picture_7.jpeg)

![](_page_37_Picture_0.jpeg)

# Tier 5: Host Data Manipulation

- Host data manipulation, leveraging the inherent "redundancy" in the host data for improving endurance, performance and power
  - Examination of host data produced by users or arising from various operating and file systems shows that a significant fraction of the data is of low entropy, having many repetitive data patterns
  - Low entropy data from the host can be manipulated by the controller in various ways:

![](_page_37_Figure_4.jpeg)
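As one possible illustration of exploiting low-entropy host data, the sketch below measures the empirical byte entropy of a buffer and compresses it when the entropy is low. Compression is an assumption made for this example (the specific manipulations in the figure are not recoverable from the text), and the `entropy_limit` value and function names are hypothetical. A real controller would also need to record whether a page was stored compressed.

```python
import math
import zlib
from collections import Counter


def byte_entropy(data):
    """Empirical entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def bytes_to_program(data, entropy_limit=6.0):
    """Compress low-entropy host data before writing; pass the rest through."""
    if byte_entropy(data) < entropy_limit:
        packed = zlib.compress(data)
        if len(packed) < len(data):
            return packed
    return data


page = b"\x00" * 4096                     # highly repetitive host data
print(byte_entropy(page))                 # 0.0
print(len(bytes_to_program(page)) < len(page))  # True
```

Fewer bytes programmed means fewer cells stressed, which is how this kind of manipulation can help endurance and power.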

![](_page_38_Figure_0.jpeg)

- Tier 0: Adaptive Parameters
- Tier 1: Noise Reduction
- Tier 2: Advanced ECC
- Tier 3: Second Level Error Correction
- Tier 4: Flash Management Algorithms
- Tier 5: Host Data Manipulation

**Optimization:** Optimization for different tradeoffs

![](_page_39_Picture_0.jpeg)

# Thank you!

© 2013 SanDisk Corporation. All rights reserved. SanDisk is a trademark of SanDisk Corporation, registered in the United States and other countries. iNAND and Adaptive Flash Management are trademarks of SanDisk Corporation. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective

![](_page_39_Picture_3.jpeg)