

## FPGAs in Flash Controller Applications

David McIntyre DSMcIntyreConsulting@gmail.com



#### **FPGA** Then...







## **Processing Options**

#### Technology scaling favors programmability and parallelism





## Altera FPGA Technology – Hardware Programming



# FPGA Utilization across Data Centers



#### **Point and SOC Solutions**

- Application Acceleration
- Embedded Processing
- I/O Protocol Support
- Memory Control
- Compression
- Security
- Port Aggregation & Provisioning



#### Hybrid RAID System

#### - Persistent DRAM and Flash Caches





#### Hybrid RAID System - PCIe Switch Centric



## Flash Cache Challenges & Evolution

Ongoing Challenges


- Error correction costs increasing
- Limited endurance (lifetime writes)
- Slow write speed
- SATA/SAS SSD interfaces are slow
- Storage over PCIe
  - Faster bandwidth projections
  - SATA Express
  - NVM Express
  - SCSI Express
- NVMe over Fabrics
- Emerging non-volatile memory technologies
  - MRAM (magnetoresistive)
  - PCM (phase change)
  - RRAM (resistive)
  - NRAM (carbon nanotube)...






## **Memory Categories**

Figure 1. Categories of Memory (Charge Versus Resistivity)



Key: DRAM = dynamic RAM; EEPROM = electrically erasable programmable ROM; EPROM = erasable programmable ROM; FeRAM = ferroelectric RAM; MRAM = magnetoresistive RAM; PRAM = phase-change RAM; PSRAM = pseudostatic RAM; SRAM = static RAM




## A Cost-Effective Bridge between DRAM and NAND?

- Intel/Micron 3D XPoint (NV memory)
  - Vastly improved endurance and performance vs. NAND
- Intel/Micron 3D NAND
  - Vertical stacking of floating-gate cells
  - 256GB 32-tier 3D TLC



Source: AnandTech.com

- SanDisk/Toshiba
  - 256GB 48-layer 3D NAND (TLC)

## **Migration Timeline: Cost**

Figure 6. Migration Timeline for Emerging Memory Technologies (memory type versus process geometry in nm)



Source: Gartner



### **5MB in Flight!**





#### Flash Controller Design Considerations



- Uncertainty Favors PLDs for Flash Control Solutions
- Flash Challenges Continue
  - Data loss, slow writes, wear leveling, write amplification, RAID
- Many Performance Options
  - Write-back cache, queuing, interleaving, striping, overprovisioning
- Many Flash Cache Opportunities
  - Server, blade and appliance
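
Wear leveling, listed above among the continuing flash challenges, spreads program/erase cycles across blocks so that no block wears out early. A minimal sketch of dynamic wear leveling, assuming every block starts erased and free; `WearLevelingAllocator` is a hypothetical name, not part of any vendor IP:

```python
import heapq


class WearLevelingAllocator:
    """Hypothetical dynamic wear-leveling allocator.

    Always hands out the free erase block with the lowest erase count so
    program/erase wear spreads evenly. A real FTL also handles bad blocks,
    static (cold) data migration and page-level mapping.
    """

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        # Min-heap of (erase_count, block_id) for blocks ready to be written.
        self.free = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)

    def allocate(self):
        """Take the least-worn free block for new writes."""
        _, block = heapq.heappop(self.free)
        return block

    def erase(self, block):
        """Erase a block reclaimed by garbage collection and return it to the pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


alloc = WearLevelingAllocator(4)
for _ in range(100):          # fill a block, then garbage-collect it
    alloc.erase(alloc.allocate())
print(alloc.erase_counts)     # [25, 25, 25, 25]
```

Ties are broken by block ID, so in this simple loop the blocks are reused round-robin and wear stays perfectly even.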



#### Emerging memory types

- ONFI 4.0, Toggle Mode 2.x
- PCM, MRAM
- DDR4

#### Controller Performance Options

- Write-back cache, queuing, interleaving, striping
- ECC levels
  - BCH, LDPC, hybrid
- FTL location: host or companion
- Data transfer interface support
  - PCI Express, SAS/SATA, FC, IB
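
Striping and interleaving spread consecutive logical pages across channels and dies so their transfers overlap. A minimal sketch of a static address split, assuming 4 channels with 2 dies each; real FTLs remap pages dynamically, and `stripe` and `PhysicalLocation` are hypothetical names:

```python
from typing import NamedTuple


class PhysicalLocation(NamedTuple):
    channel: int
    die: int
    page: int  # page index within that die's share of the stripe


def stripe(lpn: int, channels: int, dies_per_channel: int) -> PhysicalLocation:
    """Hypothetical static channel-first striping of a logical page number (LPN).

    Consecutive LPNs land on different channels (striping); once every channel
    has a page, the next round moves to the next die on each channel
    (interleaving), so channels * dies_per_channel pages can be in flight.
    """
    channel = lpn % channels
    die = (lpn // channels) % dies_per_channel
    page = lpn // (channels * dies_per_channel)
    return PhysicalLocation(channel, die, page)


for lpn in range(8):
    print(lpn, stripe(lpn, channels=4, dies_per_channel=2))
```

With 4 channels and 2 dies, the first eight logical pages touch all eight channel/die pairs before any die receives a second page.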





## Flash Controller Support

| IP              | I/O        | Speed    | Logic Density | Comments                                              |
|-----------------|------------|----------|---------------|-------------------------------------------------------|
| ONFI 3.x        | 40 pins/ch | 400 MT/s | 5K LE/ch      | NAND flash control, wear leveling, garbage collection |
| Toggle Mode 2.x | 40 pins/ch | 400 MT/s | 5K LE/ch      | Same as ONFI 3.x                                      |
| DDR3            | 72 bit     | 1066 MHz | 10K LE        | Flash control modes available for NVDIMM              |
| PCM             |            |          | 5K LE         | PCM: pending production                               |
| MRAM            |            |          | 5K LE         | MRAM: persistent memory controller                    |
| BCH             |            |          | <10K LE       | Reference design                                      |
| PCIe            | Gen 3 x8   | 64 Gbps  | Hard IP (HIP) | Flash cache                                           |

# Flash Cache Controller Examples

- Multi-channel controller
  - Scales from a single flash channel to multiple channels
  - Basic NAND development platform
  - Provides high-speed ONFI and Toggle NAND PHY
  - ECC options with 8 or 15 bits of error correction
- Single-channel controller







## **Error Correction Overview**

## Driving Factors for New ECC

- Increasing bit error rates in NAND flash
- Soft error occurrences
- Decreasing endurance (fewer program/erase cycles)
- RS and BCH overhead for data and spare area
- Increased use of metadata in file systems
- Correction overhead
- Gate count
- Requirement for no data loss

#### **Comparing ECC Solutions**

| Feature           | BCH  | LDPC   |
|-------------------|------|--------|
| Gate count        | High | Mid    |
| Latency           | Low  | Medium |
| Tunability        | Low  | High   |
| Soft-decision data| No   | Yes    |
| Data overhead     | High | Low    |





- **Hamming block ECC**
  - Variant of the ECC used for DRAM
  - Applied to flash page-sized blocks
  - Smaller blocks used as error rates increased
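
Hamming block codes correct one flipped bit per block; with r parity bits they protect up to 2^r - 1 bit positions, so the same construction scales to larger blocks. A minimal Hamming(7,4) sketch, assuming a 4-bit data block for readability; the function names are hypothetical:

```python
def hamming74_encode(d):
    """Encode data bits [d1, d2, d3, d4] as codeword positions 1..7:
    p1 p2 d1 p3 d2 d3 d4 (parity bits sit at power-of-two positions)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # even parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # even parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # even parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit, then return the four data bits."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos  # XOR of the positions that hold a 1
    if syndrome:
        c[syndrome - 1] ^= 1  # a nonzero syndrome names the erroneous position
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])  # [0, 1, 1, 0, 0, 1, 1]
word[4] ^= 1                           # flip position 5 (d2)
print(hamming74_decode(word))          # [1, 0, 1, 1]
```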

#### Reed-Solomon

- Basis of CD-ROM error correction; stronger than Hamming
- Correction blocks split into 9-bit symbols
- Good for clustered (burst) errors

#### BCH

- Better suited to MLC, correcting more than 8 bits per correction block
- BCH ECC overhead increases with correction block size
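
To make the overhead trend concrete: a binary BCH code over GF(2^m) that corrects t bit errors needs at most m·t parity bits, and m must be large enough that data plus parity fit in a codeword of 2^m - 1 bits. A sketch, assuming that m·t bound is met with equality (typical for the small t values used in flash); `bch_parity_bits` is a hypothetical helper:

```python
def bch_parity_bits(data_bits, t):
    """Estimate (m, parity_bits) for a binary BCH code correcting t bit errors.

    Assumes the usual m*t parity estimate (an upper bound that is typically
    met for small t) and picks the smallest field GF(2**m) whose codeword
    length 2**m - 1 can hold data plus parity.
    """
    m = 1
    while data_bits + m * t > 2**m - 1:
        m += 1
    return m, m * t


# 512-byte sector with 8-bit correction vs. 1 KiB sector with 40-bit correction.
print(bch_parity_bits(512 * 8, 8))    # (13, 104) -> 13 parity bytes
print(bch_parity_bits(1024 * 8, 40))  # (14, 560) -> 70 parity bytes
```

Doubling the block size pushes m from 13 to 14, so every additional corrected bit also costs one more parity bit.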



#### LDPC

- Addresses higher BER as process geometries shrink
- Good for TLC
- FPGA parallelism lets all rows of the parity-check matrix be evaluated at once, speeding up the algorithm

Example parity-check matrix $H$ for a 12-bit codeword $\mathbf{c} = (c_1, c_2, \ldots, c_{12})$; rows are parity checks and columns correspond to $c_1$ through $c_{12}$:

$$
H = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0
\end{bmatrix}
$$

A word is a valid codeword when $H\mathbf{c}^{T} = \mathbf{0}$ over GF(2), i.e. when all nine parity checks hold:

$$
\begin{aligned}
c_3 \oplus c_6 \oplus c_7 \oplus c_8 &= 0, & c_1 \oplus c_2 \oplus c_5 \oplus c_{12} &= 0, & c_4 \oplus c_9 \oplus c_{10} \oplus c_{11} &= 0, \\
c_2 \oplus c_6 \oplus c_7 \oplus c_{10} &= 0, & c_1 \oplus c_3 \oplus c_8 \oplus c_{11} &= 0, & c_4 \oplus c_5 \oplus c_9 \oplus c_{12} &= 0, \\
c_1 \oplus c_4 \oplus c_5 \oplus c_7 &= 0, & c_6 \oplus c_8 \oplus c_{11} \oplus c_{12} &= 0, & c_2 \oplus c_3 \oplus c_9 \oplus c_{10} &= 0.
\end{aligned}
$$
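
The nine parity checks above can be exercised in software. This sketch uses Gallager-style hard-decision bit flipping, a simpler decoder than the soft-decision decoding that gives LDPC its advantage in the comparison table; `CHECKS`, `syndrome` and `bit_flip_decode` are hypothetical names. An FPGA would evaluate all nine checks as parallel XOR trees in a single cycle, whereas this loop evaluates them one after another:

```python
# Hypothetical sketch: hard-decision bit-flipping decoding for the 12-bit code
# defined by the nine parity checks (1-based bit indices c1..c12).
CHECKS = [
    (3, 6, 7, 8), (1, 2, 5, 12), (4, 9, 10, 11),
    (2, 6, 7, 10), (1, 3, 8, 11), (4, 5, 9, 12),
    (1, 4, 5, 7), (6, 8, 11, 12), (2, 3, 9, 10),
]
N_BITS = 12


def syndrome(word):
    """One bit per check: 0 when the XOR of that check's bits is 0 (satisfied)."""
    return [sum(word[i - 1] for i in check) % 2 for check in CHECKS]


def bit_flip_decode(word, max_iters=10):
    """Flip the bit(s) involved in the most failing checks and repeat until
    every check is satisfied or the iteration limit is reached.
    Returns (decoded_word, success_flag)."""
    word = list(word)
    for _ in range(max_iters):
        s = syndrome(word)
        if not any(s):
            return word, True
        # Count how many failing checks each bit participates in.
        votes = [0] * N_BITS
        for failed, check in zip(s, CHECKS):
            if failed:
                for i in check:
                    votes[i - 1] += 1
        most = max(votes)
        for j in range(N_BITS):
            if votes[j] == most:
                word[j] ^= 1
    return word, not any(syndrome(word))


# Each check covers four bits, so the all-ones word is a valid codeword.
codeword = [1] * N_BITS
received = list(codeword)
received[4] ^= 1                      # corrupt c5
decoded, ok = bit_flip_decode(received)
print(ok, decoded == codeword)
```

A single error is always fixed in one pass for this matrix: the corrupted bit sits in three failing checks, while no other bit shares more than two checks with it.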





Target Application: Enterprise Tier-1 Storage: Databases and Virtualization



| Function      | Solution Requirements                                                          | IP Requirements                                                                                                                           |
|---------------|--------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| Flash Control | - ONFI 2.x/3.0<br>- Toggle Mode 2.0<br>- Multiple flash loads/ch<br>- 40 GPIO/ch | <ul> <li>Flash controller<br/>(bad block management and<br/>wear leveling)</li> <li>Metadata &amp; caching</li> <li>BCH ECC core</li> </ul> |
| RAID Control  | PCIe Gen 3                                                                     | - Flash-specific RAID<br>- Switching and aggregation                                                                                      |
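
One common way to build flash-specific RAID is RAID-5-style XOR parity striped across flash channels or dies, so that a failed die or an uncorrectable page can be rebuilt from the others. A minimal sketch, assuming equal-length pages and a single parity page per stripe; the helper names are hypothetical and this is not necessarily the scheme the IP uses:

```python
from functools import reduce


def xor_pages(pages):
    """Byte-wise XOR of equal-length pages (one page per channel or die)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def rebuild_missing(surviving_pages, parity):
    """Recover one lost data page: XOR the parity page with every survivor."""
    return xor_pages(list(surviving_pages) + [parity])


# Hypothetical 3+1 stripe: three data pages plus one parity page.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_pages(data)
recovered = rebuild_missing([data[0], data[2]], parity)  # page 1 lost
print(recovered == data[1])
```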



<u>Target Application:</u> Embedded PCIe storage for flash cache and scale-out computing



PCIe: Gen 3 x8

FPGA controller provides flexibility to integrate multiple complex functions and adapt to changing interfaces & APIs.

| Function      | Solution Requirements                                                                                                   | IP Requirements                                                                                                                                                                                         |
|---------------|-------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Flash Control | - ONFI 2.x/3.0<br>- Toggle Mode 2.0<br>- Multiple flash loads/ch<br>- 40 GPIO/ch<br>- PCIe Gen 3 x8<br>- Low power and cooling | <ul> <li>Flash controller (bad block management and wear leveling)</li> <li>Flash RAID</li> <li>Cache controller</li> <li>BCH core</li> <li>PCIe configuration &lt; 100 ms</li> <li>Host interface/APIs</li> </ul> |



### **System I/O Considerations**



#### System Application Requirements

- Performance: bandwidth
- I/O network
- Memory
- Latency





| PCIe Mode | Transfer Rate<br>(GT/s per<br>lane) | Production |
|-----------|-------------------------------|------------|
| Gen 2     | 5.0                           | Now        |
| Gen 3     | 8.0                           | Now        |
| Gen 4     | 16.0                          | 2016       |
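
The rates in the table are raw transfer rates. Usable bandwidth also depends on line encoding: 8b/10b for Gen 2 and 128b/130b for Gen 3 and Gen 4. A sketch computing per-direction bandwidth after encoding only (packet and flow-control overhead are ignored, so real payload throughput is lower); `pcie_bandwidth_gbytes` is a hypothetical helper:

```python
# Raw per-lane rates from the table and each generation's line encoding
# (8b/10b for Gen 2, 128b/130b for Gen 3 and Gen 4).
RATE_GT_S = {2: 5.0, 3: 8.0, 4: 16.0}
ENCODING = {2: (8, 10), 3: (128, 130), 4: (128, 130)}


def pcie_bandwidth_gbytes(gen, lanes):
    """Per-direction link bandwidth in GB/s after line encoding.

    Ignores packet (TLP/DLLP) and flow-control overhead, so real payload
    throughput is somewhat lower.
    """
    payload_bits, line_bits = ENCODING[gen]
    gbits = RATE_GT_S[gen] * lanes * payload_bits / line_bits
    return gbits / 8


for gen in (2, 3, 4):
    print(f"Gen {gen} x8: {pcie_bandwidth_gbytes(gen, 8):.2f} GB/s")
```

For an x8 link this gives 4.00, 7.88 and 15.75 GB/s. The "64 Gbps" quoted for Gen 3 x8 is the raw 8 GT/s × 8 lanes figure; after 128b/130b encoding it is about 63 Gbit/s.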

#### Note:

1. LMI: Local Management Interface

2. DPRIO: Dynamic Partial Reconfigurable Input/Output

#### Hardened IP (HIP) Advantages

- Resource savings of 8K to 30K logic elements (LEs) per hard IP instance, depending on the initial core configuration mode
- Embedded memory buffers included in the hard IP
- Pre-verified, protocol-compliant complex IP
- Shorter design and compile times with a timing-closed block
- Substantial power savings relative to a soft IP core with equivalent functionality



#### NVM Express (NVMe)

- Scalable host controller interface for PCIe-based solid-state drives
- Optimized command issue and completion path
- Benefits
  - Software driver standardization
  - Direct access to flash
  - Higher IOPS and MB/s
  - Lower latency
  - Reduced power consumption
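
Part of the optimized completion path is that the host detects new completions by polling host memory rather than reading a device register: the controller tags each completion entry with a phase bit that it inverts every time it wraps the ring. A simplified model of one completion queue, assuming a single producer and consumer; `CompletionQueue` is a hypothetical class, not a driver API, and doorbell writes are only noted in comments:

```python
class CompletionQueue:
    """Simplified model of an NVMe-style completion queue ring.

    Hypothetical illustration only: the controller writes each entry with the
    current phase tag and inverts the tag every time it wraps, so the host can
    spot new entries by polling memory instead of reading a device register.
    """

    def __init__(self, size):
        self.size = size
        self.entries = [(None, 0)] * size      # zeroed memory: phase 0 = never posted
        self.tail, self.producer_phase = 0, 1  # controller-side state
        self.head, self.consumer_phase = 0, 1  # host-side state

    def post(self, command_id):
        """Controller side: append a completion tagged with the current phase."""
        # One slot stays empty so a full ring is distinguishable from an empty one.
        # (In this model the controller reads self.head directly; in practice it
        # learns the head from the host's CQ head doorbell writes.)
        if (self.tail + 1) % self.size == self.head:
            raise RuntimeError("completion queue full")
        self.entries[self.tail] = (command_id, self.producer_phase)
        self.tail += 1
        if self.tail == self.size:             # wrap: invert the phase tag
            self.tail, self.producer_phase = 0, self.producer_phase ^ 1

    def reap(self):
        """Host side: consume entries whose phase tag matches the expected phase."""
        done = []
        while True:
            command_id, phase = self.entries[self.head]
            if phase != self.consumer_phase:
                break                          # stale entry from the previous pass
            done.append(command_id)
            self.head += 1
            if self.head == self.size:
                self.head, self.consumer_phase = 0, self.consumer_phase ^ 1
        # A real host would now write self.head to the CQ head doorbell.
        return done


cq = CompletionQueue(4)
for cid in (1, 2, 3):
    cq.post(cid)
print(cq.reap())   # [1, 2, 3]
cq.post(4)         # last slot, then the controller wraps and flips its phase
cq.post(5)
print(cq.reap())   # [4, 5]
```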



## Memory DRAM Cache Backup

- Data Center server power outages continue
- Read/Write Consequences
  - Data Loss
  - Undetected errors in host application
- NVDIMM designs protect system integrity but...

| Battery Limitation    | Issue                      |
|-----------------------|----------------------------|
| Shelf life            | One year max or 500 cycles |
| Disposal and handling | Hazardous waste management |
| Data retention        | Up to 72 hours             |
| Downtime              | Charge time up to 6 hours  |
| Replacement cost      | Field time and materials   |



### **The Perfect Storm**



#### Technology Enablers

- Super Capacitors are production worthy
- Flash memory costs continue to decline
- **FPGA** technology meeting power/performance/cost
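
With a supercapacitor replacing the battery, the NVDIMM controller must copy the DRAM image to flash before the capacitor drains. A back-of-envelope sizing sketch, assuming constant load power and a usable voltage window from V_max down to V_min, so the usable energy is ½C(V_max² - V_min²); converter efficiency and aging margin are ignored, and all numbers and function names are hypothetical:

```python
def backup_time_s(dram_bytes, flash_write_bytes_per_s):
    """Time to save the whole DRAM image to flash."""
    return dram_bytes / flash_write_bytes_per_s


def min_capacitance_f(power_w, hold_up_s, v_max, v_min):
    """Smallest capacitance whose usable energy 0.5*C*(v_max**2 - v_min**2)
    covers power_w for hold_up_s seconds."""
    return 2 * power_w * hold_up_s / (v_max**2 - v_min**2)


# Hypothetical example: 8 GB DRAM, 400 MB/s flash write path, 10 W module,
# capacitor usable from 5 V down to 3 V.
t = backup_time_s(8e9, 400e6)             # 20.0 s
c = min_capacitance_f(10.0, t, 5.0, 3.0)  # 25.0 F
print(t, c)
```

Faster flash writes shorten the hold-up time, and the required capacitance falls in direct proportion.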





### **NVDIMM Controller Architecture**





## **Flashing Forward**

- FPGAs are a great technology option for Data Centers
  - Networking: Port aggregation
  - Compute: Application Acceleration
  - Storage: Persistent Memory Control
- All development phases supported
  - Prototyping
  - Production
  - Test Validation
  - Upgrades