The new DDR4 standard represents a substantial upgrade to JEDEC’s dynamic random access memory (DRAM) standard, with numerous changes designed to lower power consumption while delivering higher density and bandwidth within the memory subsystem.
DDR developers are targeting this new technology at a range of applications, from high-density blade servers to high-performance workstations to power-conscious mobile devices. Deploying general-purpose memory in systems with specialized power and performance requirements means the designer must evaluate the costs and benefits of these new DDR4 features within the context of the target application. New techniques for analyzing and testing DDR operation in a live system will be essential to gain this visibility. Balancing the promise of faster memory I/O with the goal of lower power consumption at the system level will require tuning of features, timing, and design.
DDR4 is expected to deliver significantly higher performance via faster data transfer rates, reaching at least 3200 MT/s over time. In addition, the new specification introduces a number of enhancements that improve both power efficiency and reliability. These features can add significant verification effort for system designers, firmware developers and software designers. Engineers can expect to march through the natural progression of technology validation: signal integrity, timing analysis and specification compliance, performance tuning, and power management modeling.
This article explores methods to verify initial design and compliance with the new DDR4 JEDEC specifications, along with techniques used to take advantage of DDR4 features to maximize system performance. While many instruments can be used, a new generation of dedicated DDR bus analyzers now provides comprehensive timing and protocol analysis, making them an important tool for accelerating DDR4 system validation and design. Substantially lower in cost than a logic analyzer, these systems can be used to qualify different memory DIMM components, and can help sustaining engineering groups verify system operation over the entire product life cycle.
DDR4 Technical Overview
Table 1 provides a brief comparison between DDR4 and DDR3 memory technology. DDR4, initially targeted at the server market, adopts a number of enhancements intended to deliver better performance, power savings and RAS (reliability, availability and serviceability) versus DDR3. These enhancements present unique and significant opportunities for performance improvement and power reduction. Special care must be taken when setting DDR4 power-saving parameters so that suitable performance levels are still achieved.
DDR4’s new memory interface employs “pseudo-open-drain” (POD) signaling, in which a line driven to logical 1 draws no static termination current. POD relies on switchable on-die termination rather than a separate pull-up resistor. Because the receiver at the far end is parallel-terminated to the supply rail, the interface only consumes termination power when the driver pulls the signal line low to represent a logical zero.
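The power argument can be made concrete with a rough DC estimate. The sketch below is illustrative only: the function name is hypothetical, the 1.2 V rail is the nominal DDR4 I/O supply, and the 48 ohm termination and 34 ohm driver values are assumptions chosen for the example, not datasheet figures.

```python
def pod_dc_current_ma(bit, vddq=1.2, r_term=48.0, r_driver=34.0):
    """Rough DC current (mA) through a receiver terminated to the supply rail.

    Resistor values are illustrative assumptions, not datasheet figures.
    """
    if bit == 1:
        # Line is driven to the rail: both ends of the termination sit at
        # the same voltage, so no DC current flows.
        return 0.0
    # Line is pulled toward ground: current flows from the rail through
    # the termination and the driver's pull-down resistance.
    return 1000.0 * vddq / (r_term + r_driver)


for bit in (1, 0):
    print(f"bit={bit}: {pod_dc_current_ma(bit):.1f} mA")
```

Under these assumptions a driven 0 costs roughly 15 mA per line while a driven 1 costs nothing, which is why the mix of ones and zeros on the bus affects interface power.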
The anticipated higher transfer rates in DDR4 leave tighter timing margins to absorb normal variation across memory DIMMs. DDR4 also offers programmable Command Address Latency (CAL) that can be used to improve system power efficiency. The expanded role of mode register set (MRS) commands and the introduction of bank groups make memory controller designs more complex. These factors are expected to drive changes in memory controller designs and associated IP in order to support DDR4.
Data transfer rates for DDR4 and DDR3 should overlap for the foreseeable future, with DDR4 delivering a longer performance runway. It is quite conceivable for a DDR4 platform to deliver moderate power savings versus a comparable DDR3 design, but potentially at the expense of memory bandwidth under certain DDR4 operating parameters. System designers need to design highly tuned, balanced platforms that leverage the power-saving and RAS enhancements of DDR4.
Managing DDR4 JEDEC Specifications
The JEDEC specification defines specific timings for DDR4 memory controllers and their associated DRAMs. Most of these are specified as minimums: the shortest time that must elapse before a subsequent event is allowed. One of the primary objectives of the JEDEC specification is to avoid memory collisions caused by overlapping commands. Memory controllers and DRAMs must therefore be designed and tested for adherence to the JEDEC specifications across process, voltage and temperature variation during functional testing with Automated Test Equipment (ATE). Additional variables introduced at the system level, such as DIMM design, socket, and motherboard design and layout, can contribute to timing violations and must be taken into consideration.
Teledyne LeCroy Introduction to DDR4 Design and Test
DDR4 introduces the concept of Bank Groups that allows the system designer to build interleaved memory arrays down to the individual device level. For smaller systems which may have only a single memory device, this Bank Group feature can offer substantial benefits.
For example, one bank group can receive a series of pipelined commands for its upcoming data transfers. Once the first Bank Group starts its actual data transfers, another Bank Group can be initialized with its separate set of pipelined commands. After the first Bank Group completes its data transfers, the second Bank Group can initiate its data transfers, since it has already received its set of pipelined commands. In many cases, this Bank Group command pipelining can significantly reduce the impact of memory device delays, such as shown below.
A new generation of dedicated DDR4 protocol analyzers, such as the Teledyne LeCroy Kibra 480 system, provides automatic detection of JEDEC timing violations by monitoring memory I/O on a live system. Furthermore, validation engineers can leverage the flexibility of a protocol analyzer’s trigger state machine and its deeper recording memory to set up more complex triggering scenarios, or optionally use the analyzer in conjunction with an externally triggered oscilloscope for deeper signal integrity evaluation and analysis.
Figure 1 illustrates a DDR4 timing violation captured from a series of sequential accesses to the same DRAM bank group. The JEDEC tRRD_L specification requires a minimum delay of 6 clock periods between successive ACTIVATE commands to the same bank group. In this case, the tRRD_L specification has been violated, since the captured trace shows only 5 clock intervals between activates. The memory controller needs to be adjusted so that the tRRD_L specification is properly met. Beyond the specification violation itself, this condition may also cause bus contention on the data lines.
The lower pane in Figure 1 (Traffic Summary) shows the total number of timing violations in the captured traffic. Note that the timing violation tool tip in the waveform pane highlights the expected timing interval (i.e. the timing specification) versus the actual measured timing. Some of the timing violations can be attributed to design issues in the memory controller, while others could be caused by signal integrity marginalities introduced at the board level or by sequence- or pattern-specific failures. As these violations are flagged, they point out problem areas for the memory system designer to investigate.
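The tRRD_L check described above amounts to comparing the spacing of ACTIVATE commands within each bank group against a minimum. The following is a minimal sketch of that idea, not the analyzer's implementation: the `Activate` record and the function name are hypothetical, and the 6-clock default is simply the value used in the example above (the real limit depends on the speed grade).

```python
from dataclasses import dataclass


@dataclass
class Activate:
    clock: int        # clock cycle at which the ACTIVATE was sampled
    bank_group: int
    bank: int


def find_trrd_l_violations(activates, trrd_l=6):
    """Return (previous, current, gap) for ACTIVATE pairs in the same
    bank group that are spaced closer than trrd_l clocks.

    Simplification: same-bank pairs are checked too, although those are
    really governed by the much longer row cycle time.
    """
    last_by_group = {}
    violations = []
    for act in sorted(activates, key=lambda a: a.clock):
        prev = last_by_group.get(act.bank_group)
        if prev is not None and act.clock - prev.clock < trrd_l:
            violations.append((prev, act, act.clock - prev.clock))
        last_by_group[act.bank_group] = act
    return violations


trace = [
    Activate(100, bank_group=0, bank=0),
    Activate(104, bank_group=1, bank=0),  # other group: not a tRRD_L pair
    Activate(105, bank_group=0, bank=1),  # only 5 clocks after clock 100
    Activate(112, bank_group=0, bank=2),  # 7 clocks after clock 105
]
for prev, cur, gap in find_trrd_l_violations(trace):
    print(f"tRRD_L violation at clock {cur.clock}: measured {gap}, expected >= 6")
```

Reporting the measured gap next to the expected minimum mirrors the expected-versus-actual comparison shown in the analyzer's tool tip.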
DDR4 Configuration Starts with MRS Commands
DDR4 adds three mode registers (MR4 through MR6) to the four carried over from DDR3, with new MRS commands to help support new features. Many of these new features are optional, allowing system integrators to turn them on and off based on the application. Making the system more configurable means verifying that specific functions are enabled by viewing the MRS commands.
A critical feature in DDR4 is DQ Training with the Multi-Purpose Register (MPR), which is initiated via MR3 immediately after power-on. This provides pre-defined registers that can be used to choose fixed or custom training patterns that will be read in a controller-specified order. Most memory designers will concede that DDR4 will not reach the expected performance without receiver calibration. The MR3 payload (see Figure 2) shows that address line A2 is used to enable MPR training and that address lines A12 and A11 identify the format of the MPR pattern.
Using the DDR4 bus analyzer, it’s possible to trigger on, capture, and decode the MR3 command. “MPR Page Selection:0” specifies that the DIMM should use the default page 0 patterns of the Multi-Purpose Register for training and transmit them in serial format. Enabling the “Dataflow from MPR” option programs the DRAM to respond to the next READ command with the specified pattern instead of data from the memory array. Back-to-back reads are then used to “tune” the receivers to operate with the highest signal integrity.
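The MR3 decode can be sketched as simple bit-field extraction. This is an illustration, not the analyzer's decoder: the function name is hypothetical, the A2 and A12:A11 positions come from the description above, and the A1:A0 position for page selection and the read-format encoding table are assumptions that should be checked against the JEDEC mode register definition.

```python
# Assumed A12:A11 encoding; verify against the JEDEC MR3 definition.
MPR_READ_FORMATS = {0b00: "serial", 0b01: "parallel",
                    0b10: "staggered", 0b11: "reserved"}


def decode_mr3_mpr_fields(address):
    """Extract the MPR-related fields from the address bits of an MR3 MRS command."""
    return {
        "mpr_page": address & 0b11,                      # A1:A0 (assumed position)
        "dataflow_from_mpr": bool((address >> 2) & 1),   # A2
        "mpr_read_format": MPR_READ_FORMATS[(address >> 11) & 0b11],  # A12:A11
    }


# A2 set, everything else zero: page 0, dataflow from MPR, serial format.
print(decode_mr3_mpr_fields(0b0_0000_0000_0100))
```

With only A2 set, the decode reports page 0, dataflow from MPR enabled, and serial format, which corresponds to the training setup described above.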
Teledyne LeCroy’s Kibra 480 analyzer provides developers with the unique ability to “Follow MRS Commands” on the fly. Enabling this option allows the analyzer to adjust the JEDEC timing intervals in real time. When the memory controller sends MRS commands that change specific parameters (for example, MRS commands that toggle DQ Training mode or change the burst length), the Follow MRS Commands option prevents the Kibra from detecting and reporting false errors.
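To see why following MRS commands avoids false errors, consider a checker that flags a READ to a bank with no open row. During MPR training the READ is served from the register, so that rule must be suspended. The sketch below is a hypothetical illustration of the idea only; the class, command names, and rule are assumptions and do not describe how the Kibra 480 works internally.

```python
class MrsFollowingChecker:
    """Toy protocol checker that tracks MR3 bit A2 (dataflow from MPR)."""

    def __init__(self):
        self.mpr_mode = False
        self.open_banks = set()
        self.errors = []

    def on_command(self, clock, kind, bank=None, mr=None, address=0):
        if kind == "MRS" and mr == 3:
            # Follow the MRS command: update the checker's view of the DRAM.
            self.mpr_mode = bool((address >> 2) & 1)
        elif kind == "ACT":
            self.open_banks.add(bank)
        elif kind == "PRE":
            self.open_banks.discard(bank)
        elif kind == "RD":
            # In MPR mode the READ returns the training pattern, so an
            # open row is not required and no error is reported.
            if not self.mpr_mode and bank not in self.open_banks:
                self.errors.append((clock, "READ to a bank with no open row"))


checker = MrsFollowingChecker()
checker.on_command(10, "MRS", mr=3, address=0b100)  # enter MPR mode
checker.on_command(20, "RD", bank=0)                # training read: no error
checker.on_command(30, "MRS", mr=3, address=0b000)  # leave MPR mode
checker.on_command(40, "RD", bank=0)                # genuine error
print(checker.errors)
```

A checker that ignored the MR3 commands would report both reads, and the first report would be a false error.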
The timing analysis methods discussed above allow designers to quickly identify timing violations on an individual system basis. However, robust system designs should be able to accommodate platform, component and DIMM variations. This requires a deeper characterization of critical timing specifications to ensure sufficient system design tolerances. As memory systems increase in speed and complexity, many controller and DIMM combinations may perform better than the JEDEC specification requires. Memory system designers need visibility into the system configuration and performance, as opposed to simple specification compliance.
Dedicated bus analyzers like the Teledyne LeCroy Kibra 480 system offer excellent flexibility to selectively sweep and measure critical timing parameters, helping measure the actual system design timing margins, as seen in Figure 2. Of unique value, timing analysis can be performed on previously captured traffic by simply modifying timing parameters and re-running the analysis software on a personal computer. This frees up the protocol analyzer for more critical debug activities.
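Sweeping a timing parameter over an already captured trace can be sketched as re-counting violations for each candidate limit. This is a simplified illustration with hypothetical function names and made-up clock values, not the analyzer's software; it assumes the captured events are the ACTIVATE clocks for one bank group.

```python
def gaps(clocks):
    """Spacing, in clocks, between consecutive captured events."""
    ordered = sorted(clocks)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def sweep_limit(clocks, limits):
    """Count how many gaps would violate each candidate minimum."""
    observed = gaps(clocks)
    return {limit: sum(g < limit for g in observed) for limit in limits}


def margin(clocks, spec_min):
    """Clocks of headroom between the tightest observed gap and the spec."""
    return min(gaps(clocks)) - spec_min


captured = [0, 8, 15, 24, 31]             # illustrative capture, gaps 8, 7, 9, 7
print(sweep_limit(captured, range(6, 10)))
print("margin vs. a 6-clock minimum:", margin(captured, 6))
```

The first limit at which violations appear marks the tightest spacing the controller actually produces, and its distance from the specified minimum is the design margin.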
Maximizing DDR4 System Performance
As mentioned earlier, DDR4 is designed to provide significant power and performance advantages over DDR3. In the interim, however, their transfer rates are likely to overlap, giving the more mature, highly tuned DDR3 designs a performance advantage for the time being. Given the anticipated near-term cost disadvantage of DDR4 memory subsystems, early adopters should pay special attention to tuning their platform in order to maximize performance.
DDR4 architecture introduces the concept of two or four selectable bank groups, a unique feature that can be used to boost the performance of DDR4 platforms. This allows for separate activation, access and refresh of each unique bank group, improving overall memory efficiency and bandwidth.
DDR4 platforms achieve their highest throughput only when consecutive reads and writes target different bank groups, which allows the shorter command-to-command delay (tCCD_S) and faster burst access. New memory controller designs taking advantage of the DDR4 bank groups must be thoroughly validated, since interleaving data between bank groups increases the risk of collisions as DDR4 systems pivot between tCCD_S and tCCD_L. This effort requires a high degree of visibility into the traffic patterns across the banks and bank groups.
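The throughput effect of bank group targeting can be estimated with simple arithmetic. The sketch below is illustrative: the function name is hypothetical, and the values of 4 clocks for tCCD_S and 6 clocks for tCCD_L are assumptions for the example, since the actual tCCD_L depends on the speed grade.

```python
def read_stream_clocks(bank_groups, tccd_s=4, tccd_l=6):
    """Clocks from the first to the last READ command for a sequence of
    target bank groups, applying tCCD_L within a group and tCCD_S across."""
    total = 0
    for prev, cur in zip(bank_groups, bank_groups[1:]):
        total += tccd_l if prev == cur else tccd_s
    return total


same_group = [0] * 8           # eight reads to one bank group
interleaved = [0, 1] * 4       # eight reads alternating between two groups
print("same group: ", read_stream_clocks(same_group), "clocks")
print("interleaved:", read_stream_clocks(interleaved), "clocks")
```

With an 8-beat burst occupying 4 clocks of data bus time, the interleaved stream keeps the bus continuously busy under these assumptions, while the same-group stream leaves gaps between bursts.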
In examining various instruments, we found that the Teledyne LeCroy analyzer provides a unique way to easily view and analyze traffic patterns across bank groups, one not currently available on other validation instruments.
Figure 4 illustrates a DDR4 platform without DDR4 bank group tuning. Memory accesses are sparsely distributed, with long periods of inactivity during which the banks are left open. The Kibra 480 protocol analyzer “Bank State View” can be very useful for analyzing traffic distribution across banks and identifying inefficient memory accesses in order to better tune the platform.
Figure 5 shows the same platform during highly tuned memory access that exploits bank groups, with demonstrably quicker memory access.
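The kind of inefficiency visible in such a view can also be approximated numerically from a captured command stream, by comparing how long each bank stays open with how many accesses it serves. The sketch below is a hypothetical illustration with made-up trace data and names; it is not a description of the Bank State View itself.

```python
def bank_open_stats(commands, end_clock):
    """commands: iterable of (clock, kind, bank), kind in ACT/RD/WR/PRE.

    Returns {bank: (clocks_open, accesses)}. Banks still open at the end
    of the capture are counted up to end_clock.
    """
    opened_at, open_clocks, accesses = {}, {}, {}
    for clock, kind, bank in sorted(commands, key=lambda c: c[0]):
        if kind == "ACT":
            opened_at[bank] = clock
        elif kind in ("RD", "WR"):
            accesses[bank] = accesses.get(bank, 0) + 1
        elif kind == "PRE" and bank in opened_at:
            open_clocks[bank] = open_clocks.get(bank, 0) + clock - opened_at.pop(bank)
    for bank, start in opened_at.items():
        open_clocks[bank] = open_clocks.get(bank, 0) + end_clock - start
    return {b: (open_clocks[b], accesses.get(b, 0)) for b in open_clocks}


trace = [
    (0, "ACT", 0), (10, "RD", 0), (500, "PRE", 0),   # open 500 clocks, 1 access
    (20, "ACT", 1), (30, "RD", 1), (34, "RD", 1), (60, "PRE", 1),
]
for bank, (open_time, count) in sorted(bank_open_stats(trace, 600).items()):
    print(f"bank {bank}: open {open_time} clocks, {count} accesses")
```

A bank that stays open for hundreds of clocks while serving a single access, as bank 0 does in this made-up trace, is the sparse pattern that tuning aims to eliminate.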
This article highlights several key DDR4 features that will be critical for delivering higher performance and power efficiency within the memory subsystem. Using a dedicated DDR4 protocol analyzer such as the Teledyne LeCroy Kibra 480 allows faster analysis, verification and tuning of key system operating parameters.