Configuring FPGAs with high-speed NOR flash

NOR flash memory is widely deployed as a configuration device for FPGAs (Field Programmable Gate Arrays). Its low latency and high data throughput have helped make FPGAs common in industrial, communications, and automotive ADAS (Advanced Driver Assistance Systems) applications. The fast start-up requirement of an automotive camera system is a good example: how quickly the rear-view image appears on the dashboard display after the vehicle is started is the most prominent design challenge.

After power-up, the FPGA immediately loads the configuration bitstream stored in the NOR device. Once the transfer is complete, the FPGA transitions to the active (configured) state. FPGAs offer a number of configuration interface options, typically including a parallel NOR bus and a Serial Peripheral Interface (SPI) bus. Memories that support these buses have long been subject to minor incompatibilities between products from different vendors, which makes it harder to source memory devices from multiple suppliers.

The recently released JEDEC xSPI specification was developed jointly by the major NOR flash vendors. The new standard ends a decades-long situation in which each NOR flash vendor developed its products independently and went its own way. Although differences in detail remain, the core JEDEC xSPI functionality of each vendor's products is now identical. The JEDEC xSPI specification standardizes bus transactions, commands, and a number of internal functions. Combined with high throughput, these next-generation flash memories enable new applications and features. For example, the Cypress Semper NOR Flash family is compliant with the JEDEC xSPI specification and provides a sustained transfer rate of 400 MB/s, making it well suited as FPGA configuration memory. Specifically, at a data rate of 400 MB/s, the entire contents of a 128 MB (1 Gb) device can be transferred in 320 ms.
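
The 320 ms figure follows directly from dividing the device size by the sustained rate. The short sketch below reproduces that arithmetic; the function name is only illustrative, and it assumes decimal megabytes (1 MB = 1,000,000 bytes), which is what the round numbers in the text imply.

```python
MB = 1_000_000  # decimal megabyte (assumption: matches the article's round figures)


def transfer_time_ms(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Time in milliseconds to stream size_bytes at a sustained rate."""
    return size_bytes * 1000.0 / rate_bytes_per_s


# 128 MB device read at 400 MB/s
print(transfer_time_ms(128 * MB, 400 * MB))  # 320.0
```

If the 1 Gb device is counted in binary units instead (134,217,728 bytes), the same calculation gives roughly 336 ms.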

A brief history of FPGA configuration

When FPGAs first appeared, the configuration memory of choice was a parallel EPROM or parallel EEPROM. Over time, NOR flash technology emerged and was widely adopted because of its in-system reprogrammability and cost-effectiveness. In a second major shift, the SPI memory interface replaced the parallel NOR interface in most applications. Today's SPI memory products feature high density, small package size, high read throughput and, most importantly, an efficient low-pin-count interface.


Figure 1 - Gigabit Quad SPI (6-pin) and Parallel NOR (45-pin) Interface

Figure 1 compares the pin assignments of a gigabit SPI device with those of a gigabit parallel NOR device. At gigabit density, a quad serial peripheral interface (QSPI) device has a six-pin interface, while a parallel NOR device requires 45 pins. This large difference in pin count has led to the widespread adoption of QSPI as the preferred configuration interface. The QSPI interface also allows density to be changed without changing the footprint of the device.

FPGA configuration speed

As process nodes shrink, FPGAs continue to add programmable logic blocks, which drives the need for denser and faster configuration memory. Modern FPGAs may need to load up to 128 MB of data during configuration. These large configuration bitstreams take longer to transfer from the NOR flash device to the FPGA. The configuration interface therefore needs to be optimized not only for read throughput, but also for interoperability between NOR flash devices from different manufacturers.

SPI read throughput

Over the past few years, SPI read throughput has grown significantly, from the original SPI interface running in x1 mode to modern QSPI products running in x4 DDR mode. As Table 1 shows, next-generation flash devices can deliver another step up in SPI bus performance.

Table 1 - Flash Device SPI Read Throughput Options

Modern SPI devices can be permanently configured for a fixed bus width and transfer type, so that they run in that mode immediately after power-up. The FPGA must support this permanent configuration in order to begin the configuration process immediately after power-up.

Alternatively, the SPI memory can come out of power-on reset in x1 mode, allowing the host system (the FPGA) to query the device's characteristics from the Serial Flash Discoverable Parameters (SFDP) table stored in the memory. This x1 mode has become a standard feature supported by multiple memory vendors, and it allows the FPGA to retrieve critical information about device functionality. Once the device characteristics have been retrieved, the FPGA memory controller and the SPI memory device can be quickly reconfigured for maximum read performance.
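
As a rough illustration of the first step of that query, the sketch below decodes the 8-byte header at the start of an SFDP table. The layout used here (a 4-byte "SFDP" signature, then minor revision, major revision and a zero-based count of parameter headers) reflects the JEDEC JESD216 header as generally documented and should be checked against the specification before use. The function and type names are hypothetical, and the example bytes are made up rather than read from a real device.

```python
import struct
from typing import NamedTuple

SFDP_SIGNATURE = b"SFDP"  # expected first four bytes of the table


class SfdpHeader(NamedTuple):
    minor_rev: int
    major_rev: int
    num_param_headers: int


def parse_sfdp_header(raw: bytes) -> SfdpHeader:
    """Decode the first 8 bytes of an SFDP table read in x1 mode."""
    if len(raw) < 8:
        raise ValueError("SFDP header needs at least 8 bytes")
    signature, minor, major, nph, _last = struct.unpack("<4sBBBB", raw[:8])
    if signature != SFDP_SIGNATURE:
        raise ValueError("no SFDP signature; device may not support SFDP")
    # The count field is zero-based: 0 means one parameter header follows.
    return SfdpHeader(minor, major, nph + 1)


# Made-up example bytes, not taken from any real device.
example = b"SFDP" + bytes([0x08, 0x01, 0x01, 0xFF])
print(parse_sfdp_header(example))
```

A real controller would go on to read the parameter headers and tables that follow in order to learn the supported bus widths, transfer types and latency settings.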


Figure 2 - Using the Serial Flash Discoverable Parameters (SFDP) Table to Configure the SPI Bus Function When Powering Up

Using an integrated SFDP table to retrieve critical device information is especially important when choosing a next-generation flash device that can operate with x1, x4, or x8 bus widths and SDR or DDR transfer types. The bus width and transfer type chosen must be consistent with the bus interface infrastructure implemented on the FPGA.
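
The selection described above can be sketched as picking the fastest mode that both sides support. The code below is a minimal illustration under simplifying assumptions: all names are hypothetical, peak throughput is taken as clock x bus width x (2 for DDR, 1 for SDR) / 8, and a single clock frequency is applied to every mode, whereas real devices usually specify different maximum clocks per mode. The 200 MHz value is chosen only because x8 DDR then works out to the 400 MB/s quoted in the text.

```python
from typing import Iterable, NamedTuple, Optional


class BusMode(NamedTuple):
    width: int  # number of data lines: 1, 4 or 8
    ddr: bool   # True for DDR, False for SDR

    def throughput_mb_s(self, clock_mhz: float) -> float:
        """Peak read throughput in MB/s at the given clock."""
        transfers_per_clock = 2 if self.ddr else 1
        return clock_mhz * self.width * transfers_per_clock / 8


def best_common_mode(flash_modes: Iterable[BusMode],
                     fpga_modes: Iterable[BusMode],
                     clock_mhz: float) -> Optional[BusMode]:
    """Return the fastest mode supported by both flash and FPGA, if any."""
    common = set(flash_modes) & set(fpga_modes)
    if not common:
        return None
    # Highest throughput wins; on a tie, prefer the narrower bus.
    return max(common, key=lambda m: (m.throughput_mb_s(clock_mhz), -m.width))


CLOCK_MHZ = 200.0  # illustrative assumption, see lead-in
flash = [BusMode(1, False), BusMode(4, False), BusMode(4, True),
         BusMode(8, False), BusMode(8, True)]
fpga = [BusMode(1, False), BusMode(4, False), BusMode(4, True)]

choice = best_common_mode(flash, fpga, CLOCK_MHZ)
print(choice, choice.throughput_mb_s(CLOCK_MHZ))    # x4 DDR, 200.0 MB/s
print(BusMode(8, True).throughput_mb_s(CLOCK_MHZ))  # 400.0
```

In this example the flash could run x8 DDR, but the FPGA controller only implements up to x4 DDR, so the pair settles on x4 DDR.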

Dual QSPI configuration interface

To reduce FPGA configuration time, many modern FPGAs allow the configuration bitstream to be partitioned across two QSPI devices (Figure 3). The two QSPI devices are connected in parallel, with the lower nibble of each bitstream byte stored in the "primary" QSPI device (QSPI_P) and the upper nibble stored in the "secondary" QSPI device (QSPI_S). The two devices operate in parallel while the bitstream is loaded, effectively doubling the read data transfer rate.
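
The partitioning can be illustrated with a small sketch that splits a bitstream into one image for QSPI_P and one for QSPI_S, and then merges them again as the FPGA would while reading both devices together. The function names are hypothetical, and the way two nibbles are packed into each stored byte (earlier nibble in the high half) is an assumption made for illustration; the actual packing is defined by the FPGA vendor's programming tools.

```python
from typing import Tuple


def split_bitstream(bitstream: bytes) -> Tuple[bytes, bytes]:
    """Return (qspi_p_image, qspi_s_image) for a dual QSPI pair."""
    if len(bitstream) % 2:
        raise ValueError("pad the bitstream to an even number of bytes first")
    qspi_p = bytearray()
    qspi_s = bytearray()
    for i in range(0, len(bitstream), 2):
        first, second = bitstream[i], bitstream[i + 1]
        # Assumed packing: the nibble needed first sits in the high half.
        qspi_p.append(((first & 0x0F) << 4) | (second & 0x0F))  # lower nibbles
        qspi_s.append((first & 0xF0) | (second >> 4))           # upper nibbles
    return bytes(qspi_p), bytes(qspi_s)


def merge_bitstream(qspi_p: bytes, qspi_s: bytes) -> bytes:
    """Rebuild the original bitstream from the two device images."""
    if len(qspi_p) != len(qspi_s):
        raise ValueError("device images must be the same length")
    out = bytearray()
    for p, s in zip(qspi_p, qspi_s):
        out.append((s & 0xF0) | (p >> 4))
        out.append(((s & 0x0F) << 4) | (p & 0x0F))
    return bytes(out)


original = bytes([0x12, 0x34, 0xAB, 0xCD])
p_image, s_image = split_bitstream(original)
print(p_image.hex(), s_image.hex())  # 24bd 13ac
print(merge_bitstream(p_image, s_image) == original)  # True
```

Each device holds half of the data, so every clock edge delivers a full byte (one nibble from each device) instead of a single nibble.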

Note that, apart from the shared SCK (serial clock) line, the two devices have essentially independent interfaces. SCK is shared to minimize timing skew when the devices are read in parallel (i.e., simultaneously). The devices can be accessed one at a time, or both can be accessed simultaneously when the same operation is performed at the same target address.


Figure 3 - The dual QSPI interface (11 pins) allows the configuration bitstream to be partitioned between two QSPI devices, effectively doubling the read data transfer rate.

This 11-pin dual QSPI configuration is a great advantage when a large FPGA needs to load a large (i.e., high-density) configuration bitstream as quickly as possible.

Flash configuration

Next-generation flash memory operates with a x1 (primarily for SFDP access), x4 or x8 IO bus width, supports SDR or DDR data transfers, and uses a new Data Strobe signal to enable high-speed transfers. One example is the Cypress Semper NOR flash in its octal configuration, which has an 11-pin interface (Figure 4).


Figure 4 - The low pin count interface supports data transfers over a x1, x4 or x8 IO bus in SDR or DDR format. The figure shows a Cypress Semper NOR flash octal configuration with an 11-pin interface.

This new data strobe must be incorporated into the FPGA configuration interface to take advantage of the high read throughput of next-generation flash devices. The data strobe is edge-aligned with the output read data, in the same manner as the strobe on a low-power DDR DRAM device (Figure 5). The data strobe frames the data eye and allows the FPGA to capture data reliably at high clock frequencies.


Figure 5 - In an x8 DDR read transaction, the data strobe is edge-aligned with the output read data, enabling the FPGA to capture data reliably at high clock frequencies.

Support for continuous read operations is one of the flash features that is well suited to FPGA configuration. A continuous read begins when the host (an MCU (microcontroller) or FPGA) asserts CS# (the chip select pin) and then issues a read command followed by the target address. After a number of latency cycles, the memory outputs data from the target address. If the host keeps toggling the clock, the memory responds by outputting data from the next sequential address. As long as the clock keeps toggling, the memory continues to output data from successive addresses. This sequential read capability allows the FPGA to be configured in a single read transaction.
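
The behaviour described above can be captured in a toy model. The sketch below is not cycle-accurate and does not represent any particular device: the class and method names are hypothetical, one byte is returned per clock for simplicity (real devices transfer 1, 4 or 8 bits per clock edge depending on the bus mode), and wrapping at the end of the array is a modelling choice.

```python
from typing import Optional


class ContinuousReadModel:
    """Toy behavioural model of a flash continuous read."""

    def __init__(self, contents: bytes, latency_cycles: int):
        if not contents:
            raise ValueError("contents must not be empty")
        self.contents = contents
        self.latency_cycles = latency_cycles
        self._addr: Optional[int] = None  # None means CS# is deasserted
        self._wait = 0

    def start_read(self, address: int) -> None:
        """CS# asserted, read command and target address issued."""
        self._addr = address
        self._wait = self.latency_cycles

    def clock(self) -> Optional[int]:
        """One host clock: None during the delay cycles, then one byte."""
        if self._addr is None:
            raise RuntimeError("CS# is not asserted")
        if self._wait > 0:
            self._wait -= 1
            return None
        value = self.contents[self._addr % len(self.contents)]
        self._addr += 1  # next clock returns the next sequential address
        return value

    def end_read(self) -> None:
        """CS# deasserted; the transaction ends."""
        self._addr = None


def load_in_one_transaction(flash: ContinuousReadModel,
                            start: int, length: int) -> bytes:
    """Fetch length bytes with a single command/address phase."""
    flash.start_read(start)
    data = bytearray()
    while len(data) < length:
        value = flash.clock()
        if value is not None:
            data.append(value)
    flash.end_read()
    return bytes(data)


flash = ContinuousReadModel(bytes(range(16)), latency_cycles=3)
print(load_in_one_transaction(flash, start=2, length=4).hex())  # 02030405
```

The command, address and delay cycles are paid only once, so for a large bitstream the overhead per byte becomes negligible.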

AutoBoot is another feature that helps with FPGA configuration. AutoBoot performs an automatic read from a pre-configured target address during power-on reset and then outputs the data as soon as CS# is first asserted (Figure 6). This feature is also useful for ASIC (application-specific integrated circuit) devices that require a simple configuration mechanism. Once CS# is deasserted, the memory returns to its standby state and subsequent operations are processed in the normal way.


Figure 6 - AutoBoot Read Operation (with 3 delay cycles)

The write transaction for the NOR flash device (Figure 7) is almost identical to a standard SPI operation, with two exceptions. First, the new data strobe signal must be driven LOW for the entire transaction. Second, when the device is configured for DDR operation, data is written as 16-bit words rather than with the byte programming granularity of traditional SPI products.
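
One practical consequence of the 16-bit write granularity is that host software has to present an even number of bytes. The helper below is a minimal sketch of one way to do that; the name is hypothetical, and padding with 0xFF assumes the usual NOR behaviour in which erased cells read as 1s and programming a 1 leaves a cell unchanged. Devices with ECC or other programming restrictions may impose further rules, and the start address also has to be word-aligned, so the datasheet should be checked.

```python
ERASED_BYTE = 0xFF  # assumption: erased NOR cells read back as all 1s


def pad_to_words(data: bytes, word_bytes: int = 2) -> bytes:
    """Pad data with erased-state bytes up to a whole number of words."""
    remainder = len(data) % word_bytes
    if remainder == 0:
        return data
    return data + bytes([ERASED_BYTE]) * (word_bytes - remainder)


print(pad_to_words(b"\x01\x02\x03").hex())  # 010203ff
print(pad_to_words(b"\x01\x02").hex())      # 0102
```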


Figure 7 - The write transaction for the NOR flash requires the data strobe signal to be driven LOW for the entire transaction, and data is written as 16-bit words when the device is configured for DDR operation.

Next-generation NOR flash devices offer the high throughput that large-scale FPGA applications need to meet their high-density and instant-on requirements. The major NOR flash manufacturers took part in developing the JEDEC xSPI specification, which gives OEMs a wide range of sourcing options. The JEDEC xSPI specification covers both the octal SPI interface described above and the HyperBus interface, each of which provides 400 MB/s of read throughput, far more than traditional SPI products. To take advantage of this high-speed infrastructure, the FPGA SPI controller needs to be modified. New features to consider include DDR data rates, the new data strobe pin for data capture, and the extended x8 bus interface. In addition, some NOR flash devices (such as the Cypress Semper NOR family) make it possible to eliminate one of the two devices that a dual QSPI configuration architecture would otherwise require. For applications that need fast FPGA configuration or that perform real-time reconfiguration, the performance offered by next-generation flash is a strong advantage.

About the author: Cliff Zitlaw has 36 years of experience in semiconductor memory development, with a focus on bus interfaces that optimize memory performance within the constraints of a range of applications. Cliff is the inventor of the Xicor Microprocessor Serial Memory Interface (EEPROM), Micron's CellularRAM Interface (PSRAM) and Cypress's HyperBus Interface (NOR and PSRAM). Cliff is the author or co-author of 49 patents related to memory functions and usage. In his spare time, Cliff likes to barbecue, watch TV, or take a nap on Saturday.