By Douglas Malech and Sameer Kuppahalli, IDT, and Ryan Baxter and Eric Caward, Micron Technology, Inc.
Traditionally, LRDIMMs and RDIMMs have provided complementary solutions for enterprise data center servers – LRDIMMs for applications requiring deeper memory and RDIMMs for applications requiring higher data bandwidth. With the introduction of 8 gigabit (Gb) DRAMs, however, this white paper shows that LRDIMMs provide a superior solution for both deeper memory and higher data bandwidth. In particular, 32 GB 2RX4 LRDIMMs based on 8 Gb DDR4 DRAMs are shown to outperform 32 GB 2RX4 RDIMMs in both server memory capacity and bandwidth.
Deeper memories and more bandwidth
A growing number of Internet applications, such as in-memory databases (IMDBs), benefit from both deeper memories and higher data bandwidth. Larger IMDBs mean that more data can reside in high-speed DDR4 DRAM, reducing data exchanges with slower storage media during processing and thereby allowing memory-intensive applications to run faster. IMDB application examples include data analytics, financial algorithms, gaming and search algorithms. Figure 1 shows some examples.
The LRDIMM advantage
Prior to 8 Gb DRAMs, 32 GB LRDIMMs were constructed using more expensive DDP-packaged 4 Gb DRAMs, as shown in Figure 2. In 32 GB 4RX4 LRDIMMs based on 4 Gb DRAMs, two DRAM data loads from the frontside DDP and two from the backside DDP share the same data bus. The LRDIMM's unique data buffering architecture further reduces these four DRAM data loads to a single data buffer load.
Because of this data load reduction technique, only three loads are present when three LRDIMMs populate a memory channel. In addition, the nine data buffers are physically located very close to the edge connector, reducing data transmission stub length. Reducing transmission stub lengths and stub counts improves signal integrity. Figure 3 shows a memory controller channel (MCH) configured with three LRDIMMs (3 DPC). Improved signal integrity adds signal margin, thereby improving server system reliability when thousands of memory modules are populated into thousands of servers at a typical data center.
In contrast to 32 GB 4RX4 LRDIMMs, 32 GB 4RX4 RDIMMs were never developed because, in the absence of load-reducing data buffers, all four DRAM data loads would be visible to the MCH channel, presenting twelve loads in a three-RDIMM-per-channel configuration (4 DRAM loads x 3 RDIMMs). In addition, without data buffers, the signal distance from the DRAMs to the edge connector is increased. Longer transmission stubs and higher stub counts mean poorer signal integrity. This is why RDIMMs based on 4 Gb DRAMs stop at 16 GB 2RX4 memory capacity while LRDIMMs go up to 32 GB 4RX4 memory capacity.
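The load arithmetic above can be expressed as a short calculation (a minimal illustrative sketch; the per-DIMM load counts come from the text, and the function name is our own):

```python
def channel_loads(loads_per_dimm: int, dimms_per_channel: int) -> int:
    """Electrical data loads the memory controller sees on one channel."""
    return loads_per_dimm * dimms_per_channel

# 4RX4 RDIMM: all four DRAM data loads are visible on the bus.
rdimm_loads = channel_loads(loads_per_dimm=4, dimms_per_channel=3)   # 12 loads
# 4RX4 LRDIMM: data buffers reduce four DRAM loads to one buffer load.
lrdimm_loads = channel_loads(loads_per_dimm=1, dimms_per_channel=3)  # 3 loads
print(rdimm_loads, lrdimm_loads)
```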
As applications continue to benefit from increased memory capacity, 8 Gb DRAMs enable the RDIMM "sweet spot" to increase from 16 GB memory modules to 32 GB. A 16 GB RDIMM is constructed using 4 Gb DRAMs in a 2RX4 configuration. It follows that a 32 GB RDIMM can be constructed from 8 Gb DRAMs using the same 2RX4 configuration because each DRAM contributes twice as much memory. Likewise, a 32 GB LRDIMM can be constructed using 8 Gb DRAMs in a 2RX4 configuration instead of using more expensive DDPs in a 4RX4 configuration. With 8 Gb DRAMs doubling RDIMM memory capacity from 16 GB to 32 GB and simultaneously replacing the more expensive DDPs previously used to construct 32 GB LRDIMMs, which is the better choice for fully populated systems – 32 GB LRDIMMs or 32 GB RDIMMs? Our lab measurements show that 32 GB 2RX4 LRDIMMs have a clear advantage over 32 GB 2RX4 RDIMMs in that you can benefit from the additional memory at a higher bandwidth.
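The capacity arithmetic above can be checked with a brief sketch (assumptions: a standard 64-bit DDR4 data bus, so a x4 module has 16 data DRAMs per rank, and ECC DRAMs are excluded from the advertised capacity; the helper function is our own):

```python
def module_capacity_gb(density_gb: int, ranks: int, dram_width: int) -> int:
    """Advertised module capacity in GB from per-DRAM density (Gb),
    rank count, and per-DRAM data width (e.g. 4 for an 'X4' module)."""
    data_drams_per_rank = 64 // dram_width  # 64-bit data bus, ECC excluded
    total_gbits = density_gb * ranks * data_drams_per_rank
    return total_gbits // 8  # 8 bits per byte

print(module_capacity_gb(4, ranks=2, dram_width=4))  # 16 GB RDIMM (4 Gb, 2RX4)
print(module_capacity_gb(8, ranks=2, dram_width=4))  # 32 GB (8 Gb, 2RX4)
print(module_capacity_gb(4, ranks=4, dram_width=4))  # 32 GB LRDIMM (4 Gb, 4RX4)
```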
Comparing 32 GB LRDIMMs and 32 GB RDIMMs
A typical enterprise-class server can have up to 24 memory modules, as shown in Figure 4. A server with 24 memory modules, each with 32 GB of memory, will have 768 GB of memory (24 x 32 GB).
IDT wanted to determine which 32 GB memory module – LRDIMM or RDIMM – provided the superior solution for both total server memory and data bandwidth. IDT made this determination as follows:
- Determine the module’s signal integrity on the MCH channel for the fully populated 3 DPC system configuration
- Choose the highest speed possible for LRDIMM and RDIMM with acceptable signal integrity
- Compare bandwidths at these highest speeds to determine whether LRDIMM or RDIMM gives higher memory bandwidth
Data signal integrity was measured in two places, shown in Figure 5 as V+ and V-. More positive V+ and more negative V- voltage measurements indicate better signal integrity. In each case, the measurement captures how much voltage margin is available between the actual Data Eye Signal and the Data Eye Mask. The Data Eye Signal must never dip into the area within the Data Eye Mask for any combination of data signal patterns, DIMMs, server motherboards and microprocessors. If the Data Eye Signal dips into the Data Eye Mask region, a data value of "1" might be interpreted as a "0" and vice versa.
The four measurements taken in the receive and transmit directions show that 3DPC LRDIMMs operating at 2400 MT/s have better signal integrity than 3DPC RDIMMs at either 2400 MT/s or 2133 MT/s. The measured signal integrity data is shown in Figure 6, with the 3DPC RDIMM measurements in amber and the 3DPC LRDIMM measurements in green. LRDIMM at 2400 MT/s has more positive V+ and more negative V-, indicating overall better signal integrity.
Since 3DPC RDIMMs at 2400 MT/s had much lower voltage margins, IDT concluded that this combination of speed and density would not be a viable candidate for server applications. While 3DPC RDIMMs at 2133 MT/s also showed lower voltage margins, IDT chose this 2133 MT/s configuration, in the absence of the 2400 MT/s option, to compare bandwidths against 32 GB LRDIMMs operating at 2400 MT/s.
IDT used Membw, a public-domain memory bandwidth testing software package, to compare bandwidths. Membw stresses the memory modules with reads and writes across all memory channels. The server configuration used in this benchmarking exercise has two Intel multi-core microprocessors, each with 4 memory channels and 3 DPC, for a total of 24 memory modules. The Membw benchmark measurements showed that 3DPC LRDIMM bandwidth at 2400 MT/s is 8% higher than 3DPC RDIMM bandwidth at 2133 MT/s.
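For context, theoretical peak channel bandwidth scales with the transfer rate, which bounds the measured gain (a hedged sketch; the 8-byte bus width follows from the standard 64-bit DDR4 data bus, and the 8% measured figure comes from the text, not from this calculation):

```python
def peak_bw_gbps(transfer_rate_mts: int, bus_bytes: int = 8) -> float:
    """Theoretical peak channel bandwidth in GB/s for a 64-bit DDR4 bus."""
    return transfer_rate_mts * bus_bytes / 1000

lrdimm_peak = peak_bw_gbps(2400)  # 19.2 GB/s per channel
rdimm_peak = peak_bw_gbps(2133)   # ~17.1 GB/s per channel
headroom = lrdimm_peak / rdimm_peak - 1
print(f"theoretical headroom: {headroom:.1%}")  # ~12.5%
```

The measured 8% gain falls below this ~12.5% theoretical headroom, as expected once real-world access patterns and controller overheads are included.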
Evolving enterprise server applications will benefit from both higher bandwidth and more memory module capacity. IDT compared 32 GB 2RX4 LRDIMM and 32 GB 2RX4 RDIMM performance in a 3DPC configuration for both signal integrity and read/write bandwidth. A fully populated server with twenty-four 32 GB LRDIMMs operating at 2400 MT/s showed better signal integrity than the same configuration using twenty-four 32 GB RDIMMs operating at 2133 MT/s. The LRDIMMs operating at 2400 MT/s also delivered 8% higher bandwidth than the RDIMMs operating at 2133 MT/s.
DDR4 LRDIMMs let you achieve higher memory bandwidth than RDIMMs at even mainstream module densities.
Integrated Device Technology, Inc.