# Coding for Unreliable Flash Memory Cells

#### Ryan Gabrys, Lara Dolecek

Laboratory for Robust Information Systems (LORIS) Department of Electrical Engineering, UCLA





- 2 Empirical Data
- 3 Data Analysis
- 4 Error-Correction Model
- 5 Performance Results

#### 6 Conclusion

#### Flash Memory Basics

• Flash memory is comprised of a set of floating gate cells.



ション ふゆ メ キャ キャ マ ちょうく

# Flash Memory Basics

- Flash memory is comprised of a set of floating gate cells.
- Information is stored by controlling the number of electrons stored within each cell.



# Flash Memory Basics

- Flash memory is comprised of a set of floating gate cells.
- Information is stored by controlling the number of electrons stored within each cell.
- Density Per Cell

# Flash Memory Basics

- Flash memory is comprised of a set of floating gate cells.
- Information is stored by controlling the number of electrons stored within each cell.
- Density Per Cell
  - Single-Level-Cell (SLC) 1 bit per cell.



# Flash Memory Basics

- Flash memory is comprised of a set of floating gate cells.
- Information is stored by controlling the number of electrons stored within each cell.
- Density Per Cell
  - Single-Level-Cell (SLC) 1 bit per cell.
  - Multiple-Level-Cell (MLC) 2 bits per cell.



# Flash Memory Basics

- Flash memory is comprised of a set of floating gate cells.
- Information is stored by controlling the number of electrons stored within each cell.
- Density Per Cell
  - Single-Level-Cell (SLC) 1 bit per cell.
  - Multiple-Level-Cell (MLC) 2 bits per cell.
  - Triple-Level-Cell (TLC) 3 bits of information per cell.



# Flash Memory Basics

- Flash memory is comprised of a set of floating gate cells.
- Information is stored by controlling the number of electrons stored within each cell.
- Density Per Cell
  - Single-Level-Cell (SLC) 1 bit per cell.
  - Multiple-Level-Cell (MLC) 2 bits per cell.
  - Triple-Level-Cell (TLC) 3 bits of information per cell.



# Mapping Information to Voltage Levels in TLC



- Center Significant Bit CSB
- Least Significant Bit LSB



Low Voltage



# Data Collection

• On the first of every 100 P/E cycles the following was performed:



▲□▶ ▲□▶ ▲□▶ ▲□▶ ▲□ ● ● ●



- On the first of every 100 P/E cycles the following was performed:
  - Erase the block. (block= $2^{20}$  cells)



ション ふゆ メ キャ キャ マ ちょうく

# Data Collection

- On the first of every 100 P/E cycles the following was performed:
  - Erase the block. (block= $2^{20}$  cells)
  - 2 Read back the errors.



# Data Collection

- On the first of every 100 P/E cycles the following was performed:
  - Erase the block. (block= $2^{20}$  cells)
  - 2 Read back the errors.
  - Write random data.

# Data Collection

- On the first of every 100 P/E cycles the following was performed:
  - Erase the block. (block= $2^{20}$  cells)
  - 2 Read back the errors.
  - Write random data.
  - Read back the errors.



# Data Collection

- On the first of every 100 P/E cycles the following was performed:
  - Erase the block. (block= $2^{20}$  cells)
  - 2 Read back the errors.
  - Write random data.
  - ④ Read back the errors.
- On the other 99 cycles, the block was erased and all-zeros were written.



#### Raw Error Rate



#### Observation 1: Error Patterns Within a Symbol

| Number of bits in symbol that err | Percentage of errors |
|-----------------------------------|----------------------|
| 1                                 | 0.9617               |
| 2                                 | 0.0314               |
| 3                                 | 0.0069               |



▲□▶ ▲□▶ ▲□▶ ▲□▶ ▲□ ● ● ●

#### **Observation 2: Unreliable Flash Cells**

• We organized the Flash cells into two categories:



## **Observation 2: Unreliable Flash Cells**

- We organized the Flash cells into two categories:
  - **1** reliable Flash cells,



▲□▶ ▲□▶ ▲□▶ ▲□▶ ▲□ ● ● ●

#### **Observation 2: Unreliable Flash Cells**

- We organized the Flash cells into two categories:
  - reliable Flash cells,
  - ② unreliable Flash cells.



Observation 2: Unreliable Flash Cells

- We organized the Flash cells into two categories:
  - reliable Flash cells,
  - ② unreliable Flash cells.
- An unreliable Flash cell is a cell that experienced  $\geq$  50 errors across the lifetime of the device.



Observation 2: Unreliable Flash Cells

- We organized the Flash cells into two categories:
  - reliable Flash cells,
  - 2 unreliable Flash cells.
- An unreliable Flash cell is a cell that experienced  $\geq$  50 errors across the lifetime of the device.
- We identified 62659 (of the 134217728 total cells tested) unreliable Flash cells.



Observation 2: Unreliable Flash Cells

- We organized the Flash cells into two categories:
  - reliable Flash cells,
  - 2 unreliable Flash cells.
- An unreliable Flash cell is a cell that experienced  $\geq$  50 errors across the lifetime of the device.
- We identified 62659 (of the 134217728 total cells tested) unreliable Flash cells.
- These cells accounted for over 10% of the total errors measured.



# Observation 3: Unreliable Flash Cells Have Prominent Error Patterns

| Programmed state   | Percentage of Errors |
|--------------------|----------------------|
| 000                | 0.4745               |
| 000                | 0.1630               |
| 0 <mark>1</mark> 0 | 0.0711               |
| 000                | 0.0676               |
| 001                | 0.0558               |
| 00 <mark>1</mark>  | 0.0525               |
| 01 <mark>1</mark>  | 0.0186               |





Low Voltage

(□) (č

ヨー の ٩



# Main Idea

• Design error-correction scheme that takes into account Observations 1, 2, and 3.



▲□▶ ▲□▶ ▲□▶ ▲□▶ ▲□ ● ● ●

# Main Idea

- Design error-correction scheme that takes into account Observations 1, 2, and 3.
  - Code corrects errors where most of the errors affect only a single bit within each Flash cell (Observation 1).



# Main Idea

- Design error-correction scheme that takes into account Observations 1, 2, and 3.
  - Code corrects errors where most of the errors affect only a single bit within each Flash cell (Observation 1).
  - Code allows unreliable Flash cells to be programmed at low voltage levels (Observations 2,3).



# Code Properties

• Codes are over alphabet of size  $q = 2^m$ , where *m* is some positive integer and each symbol represents a Flash cell.



# Code Properties

- Codes are over alphabet of size  $q = 2^m$ , where *m* is some positive integer and each symbol represents a Flash cell.
- A symbol is a binary length-*m* vector.



# Code Properties

- Codes are over alphabet of size  $q = 2^m$ , where *m* is some positive integer and each symbol represents a Flash cell.
- A symbol is a binary length-*m* vector.
- A codeword is *n* binary length-*m* vectors so the result is a length-*nm* vector.

# Code Properties

- Codes are over alphabet of size  $q = 2^m$ , where *m* is some positive integer and each symbol represents a Flash cell.
- A symbol is a binary length-*m* vector.
- A codeword is *n* binary length-*m* vectors so the result is a length-*nm* vector.
- Example over alphabet of size 8: (45702) -> (100 101 111 000 010)



## Error Vectors

#### Definition (Graded Bit-Error Vector)

The length-*nm* vector  $\mathbf{e} = (\mathbf{e}_0, \mathbf{e}_1, \dots, \mathbf{e}_{n-1})$ , where each *m*-bit vector  $\mathbf{e}_i$  represents a symbol of size  $2^m$ , is a  $[\mathbf{t}_1, \mathbf{t}_2; \ell_1, \ell_2]$ -bit-error-vector if



# Error Vectors

#### Definition (Graded Bit-Error Vector)

The length-*nm* vector  $e = (e_0, e_1, \dots, e_{n-1})$ , where each *m*-bit vector  $e_i$  represents a symbol of size  $2^m$ , is a  $[t_1, t_2; \ell_1, \ell_2]$ -bit-error-vector if  $|\{i : e_i \neq 0\}| \le t_1 + t_2$ .



# Error Vectors

#### Definition (Graded Bit-Error Vector)

The length-*nm* vector  $e = (e_0, e_1, \dots, e_{n-1})$ , where each *m*-bit vector  $e_i$  represents a symbol of size  $2^m$ , is a  $[t_1, t_2; \ell_1, \ell_2]$ -bit-error-vector if

**1** 
$$|\{i : e_i \neq 0\}| \leq t_1 + t_2$$

$$2 \quad \forall i, wt(\boldsymbol{e}_i) \leq \ell_2.$$



# Error Vectors

#### Definition (Graded Bit-Error Vector)

The length-*nm* vector  $\mathbf{e} = (\mathbf{e}_0, \mathbf{e}_1, \dots, \mathbf{e}_{n-1})$ , where each *m*-bit vector  $\mathbf{e}_i$  represents a symbol of size  $2^m$ , is a  $[\mathbf{t}_1, \mathbf{t}_2; \ell_1, \ell_2]$ -bit-error-vector if

**1** 
$$|\{i : e_i \neq 0\}| \leq t_1 + t_2$$

2 
$$\forall i, wt(e_i) \leq \ell_2$$

**3** 
$$|\{i : wt(e_i) > \ell_1\}| \le t_2$$
.



## Graded Bit-Error Example

- Suppose the vector *x* of length 6 with 3-bit symbols shown below was transmitted.
  - $x = (000 \ 110 \ 010 \ 101 \ 000 \ 111)$



# Graded Bit-Error Example

- Suppose the vector x of length 6 with 3-bit symbols shown below was transmitted.
  - $x = (000 \ 110 \ 010 \ 101 \ 000 \ 111)$
- The vector **y** was received:

 $y = (101 \ 110 \ 000 \ 101 \ 000 \ 011).$ 



# Graded Bit-Error Example

- Suppose the vector x of length 6 with 3-bit symbols shown below was transmitted.
  - $x = (000 \ 110 \ 010 \ 101 \ 000 \ 111)$
- The vector y was received:

 $y = (101 \ 110 \ 000 \ 101 \ 000 \ 011).$ 

• Then [2,1;1,2]-bit-errors occurred.



# Graded Bit-Error Example

- Suppose the vector x of length 6 with 3-bit symbols shown below was transmitted.
  - $x = (000 \ 110 \ 010 \ 101 \ 000 \ 111)$
- The vector y was received:

 $y = (101 \ 110 \ 000 \ 101 \ 000 \ 011).$ 

- Then [2, 1; 1, 2]-bit-errors occurred.
  - There are 2 + 1 symbols in error.



▲□▶ ▲□▶ ▲□▶ ▲□▶ = 三 のへで

# Graded Bit-Error Example

- Suppose the vector x of length 6 with 3-bit symbols shown below was transmitted.
  - $x = (000 \ 110 \ 010 \ 101 \ 000 \ 111)$
- The vector y was received:

 $y = (101 \ 110 \ 000 \ 101 \ 000 \ 011).$ 

- Then [2, 1; 1, 2]-bit-errors occurred.
  - There are 2 + 1 symbols in error.
  - At most 2 bits are in error for each symbol.



▲□▶ ▲□▶ ▲□▶ ▲□▶ = 三 のへで

# Graded Bit-Error Example

- Suppose the vector x of length 6 with 3-bit symbols shown below was transmitted.
  - $x = (000 \ 110 \ 010 \ 101 \ 000 \ 111)$
- The vector y was received:

 $y = (101 \ 110 \ 000 \ 101 \ 000 \ 011).$ 

- Then [2, 1; 1, 2]-bit-errors occurred.
  - There are 2 + 1 symbols in error.
  - At most 2 bits are in error for each symbol.
  - There is 1 symbol that has more than 1 bit in error.



#### Dynamic Bit-Error-Correcting Codes

• Let C be a  $[t_1, t_2; \ell_1, \ell_2]_{2^m}$  code.



Dynamic Bit-Error-Correcting Codes

- Let  $\mathcal{C}$  be a  $[t_1, t_2; \ell_1, \ell_2]_{2^m}$  code.
- The encoder chooses a message w ∈ Z<sub>M</sub> for some positive integer M.



Dynamic Bit-Error-Correcting Codes

- Let C be a  $[t_1, t_2; \ell_1, \ell_2]_{2^m}$  code.
- The encoder chooses a message w ∈ Z<sub>M</sub> for some positive integer M.
- For any collection of indices S where  $|S| \leq s_1$ , there exists a codeword c that can be used to represent w such that:



Dynamic Bit-Error-Correcting Codes

- Let C be a  $[t_1, t_2; \ell_1, \ell_2]_{2^m}$  code.
- The encoder chooses a message w ∈ Z<sub>M</sub> for some positive integer M.
- For any collection of indices S where  $|S| \leq s_1$ , there exists a codeword c that can be used to represent w such that:
  - For any  $j \in S$ , the voltage level of cell  $c_j$  is at most  $s_2$ .



Dynamic Bit-Error-Correcting Codes

- Let  $\mathcal{C}$  be a  $[t_1, t_2; \ell_1, \ell_2]_{2^m}$  code.
- The encoder chooses a message w ∈ Z<sub>M</sub> for some positive integer M.
- For any collection of indices S where  $|S| \leq s_1$ , there exists a codeword c that can be used to represent w such that:
  - For any  $j \in S$ , the voltage level of cell  $c_j$  is at most  $s_2$ .
- The code C is referred to as an  $[t_1, t_2; \ell_1, \ell_2; s_1, s_2]_{2^m}$ dynamic-bit-error-correcting code.





• We compared a dynamic graded-bit-error-correcting code against the following classes of codes:



# Evaluation

- We compared a dynamic graded-bit-error-correcting code against the following classes of codes:
  - **1** A non-binary code over GF(8).



# Evaluation

- We compared a dynamic graded-bit-error-correcting code against the following classes of codes:
  - **1** A non-binary code over GF(8).
  - A binary BCH code applied to MSB/CSB/LSB in parallel.



# Evaluation

- We compared a dynamic graded-bit-error-correcting code against the following classes of codes:
  - A non-binary code over GF(8).
  - 2 A binary BCH code applied to MSB/CSB/LSB in parallel.
  - Three different binary BCH codes each one applied to one of the MSB/CSB/LSB.



# Evaluation

- We compared a dynamic graded-bit-error-correcting code against the following classes of codes:
  - **1** A non-binary code over GF(8).
  - **2** A binary BCH code applied to MSB/CSB/LSB in parallel.
  - Three different binary BCH codes each one applied to one of the MSB/CSB/LSB.
  - A  $[t_1, t_2; \ell_1, \ell_2]_{2^3}$ -bit-error-correcting code.



イロト (理) (ヨ) (ヨ) (ヨ) の(の)

# Evaluation

- We compared a dynamic graded-bit-error-correcting code against the following classes of codes:
  - **1** A non-binary code over GF(8).
  - **2** A binary BCH code applied to MSB/CSB/LSB in parallel.
  - Three different binary BCH codes each one applied to one of the MSB/CSB/LSB.
  - A  $[t_1, t_2; \ell_1, \ell_2]_{2^3}$ -bit-error-correcting code.
- For each graph, the codes compared have the same length and the same rate.



▲ロト ▲冊ト ▲ヨト ▲ヨト ヨー のなべ

# Results for block length 4096





# Results for block length 8192

Error Rates of Codes Applied to TLC Flash





# Results for block length 16384







• Newer generations of Flash memory continue to demand more efficient error-correction schemes.



・ロト ・ 日 ・ モ ト ・ 日 ・ うらぐ



- Newer generations of Flash memory continue to demand more efficient error-correction schemes.
- Errors that occur within these newer Flash devices tend to follow certain patterns.





- Newer generations of Flash memory continue to demand more efficient error-correction schemes.
- Errors that occur within these newer Flash devices tend to follow certain patterns.
- By taking into account the dominant error patterns observed on a TLC Flash cell, we designed more efficient error correction codes.





New center on Coding for Storage at UCLA: http://www.loris.ee.ucla.edu/codess

Kick-off day on 9/19/2013!

Registration is free. Register early, space is limited.



ション ふゆ メ キャ キャ マ ちょうく