Nov 9, 2008

Core i7 Nehalem's Review

Core i7 Genesis
The Core i7 is Intel's first new CPU architecture since the original Core 2 shipped back in July, 2006. It's hard to believe that the first Core 2 processors shipped over two years ago.

Since then, Intel has shipped incremental updates to the product line.

Quad-core Core 2 CPUs arrived in November 2006, in the form of the QX6700. AMD was quick to point out that Intel's quad-core solutions weren't "true" quad-core processors, but consisted of two Core 2 Duo dies in a single package. Despite that purist objection, Intel's quad-core solutions proved highly successful in the market.

The original Core 2 line was built on a 65nm manufacturing process. In late 2007, Intel began shipping 45nm CPUs, code-named Penryn. Intel's 45nm processors offered a few incremental feature updates, but were basically continuations of the Core 2 line.

In the past year, details about Nehalem began dribbling out, culminating with full disclosure of the Core i7 architecture at the August, 2008 Intel Developer Forum. If you want more details about Nehalem's architecture, that article is well worth a read. However, we'll touch on a few highlights now.

Speeds and Feeds

Intel will be releasing three Core i7 CPUs later this month (November, 2008). Here's a rundown of the specs of the three processors:


Specification           Core i7 965 Extreme   Core i7 940           Core i7 920
Clock Frequency         3.20GHz               2.93GHz               2.66GHz
QPI Data Rate           6.4 GT/sec*           4.8 GT/sec            4.8 GT/sec
Max. Non-Turbo Ratio    NA                    NA                    20
Thermal Design Power    130W                  130W                  130W
Transistor Count        731M                  731M                  731M
Die Size                263mm^2               263mm^2               263mm^2
Price                   $999 (qty. 1,000)     $562 (qty. 1,000)     $285 (qty. 1,000)

*GT/sec = gigatransfers per second

Cache and Memory
The initial Core i7 CPUs will offer a three-tiered cache structure. Each individual core contains two caches: a 64K L1 cache (split into a 32K instruction cache and a 32K data cache), plus a 256K unified L2 cache. An 8MB L3 cache is shared among the four cores. That 256K L2 cache is interesting, because it's built with an 8-T (eight transistors per cell) SRAM structure. This facilitates running at lower voltages, but also takes up more die space. That's one reason the core-specific L2 cache is smaller than you might otherwise expect.

Like AMD's current CPU line, Nehalem uses an integrated, on-die memory controller. Intel has finally moved the memory controller out of the north bridge. The current memory controller supports only DDR3 memory. The new controller also supports three channels of DDR3 per socket, with up to three DIMMs per channel supported. Earlier, MCH-style memory controllers only supported two channels of DRAM.

The use of triple-channel memory mitigates the relatively low officially supported DDR3 clock rate of 1066MHz (effective). In conversations, various Intel representatives were quick to point out that three channels of DDR3-1066 equate to roughly 25.6GB/sec of peak memory bandwidth.
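As a back-of-the-envelope check on the triple-channel figure, here is a minimal sketch of the peak-bandwidth arithmetic. It assumes each DDR3 channel is 64 bits (8 bytes) wide, which the article does not state, and it counts 1GB as 10^9 bytes. The function name is ours, for illustration only.

```python
# Peak theoretical DRAM bandwidth for a multi-channel DDR3 setup.
# Assumption (not stated in the article): each channel is 64 bits (8 bytes) wide.

BYTES_PER_TRANSFER = 8  # assumed 64-bit channel width


def peak_memory_bandwidth_gb(channels: int, megatransfers_per_sec: float) -> float:
    """Return peak bandwidth in GB/sec, with 1 GB taken as 10**9 bytes."""
    bytes_per_sec = channels * megatransfers_per_sec * 1e6 * BYTES_PER_TRANSFER
    return bytes_per_sec / 1e9


if __name__ == "__main__":
    # Older MCH-style controllers topped out at two channels; Core i7 has three.
    dual = peak_memory_bandwidth_gb(channels=2, megatransfers_per_sec=1066)
    triple = peak_memory_bandwidth_gb(channels=3, megatransfers_per_sec=1066)
    print(f"Dual-channel DDR3-1066:   {dual:.1f} GB/sec")
    print(f"Triple-channel DDR3-1066: {triple:.1f} GB/sec")
```

These are theoretical peaks; sustained bandwidth in real applications will be lower.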

The integrated memory controller also clocks higher than one built into a north bridge chip, although not necessarily at the full processor clock speed. This higher clock, plus the fact that memory requests no longer have to cross a north bridge link, substantially improves memory latency.

To complement the integrated memory controller, Intel developed a new point-to-point system interconnect, similar in concept to AMD's HyperTransport. Known as QuickPath Interconnect, or QPI for short, the new interconnect can move data at peak rates of 25.6GB/sec (at a base rate of 6.4 gigatransfers per second). Note that not all Nehalem processors will support the full theoretical bandwidth. The Core i7 940 and 920 CPUs support the 4.8 gigatransfer per second base rate, with a maximum throughput of 19.2GB/sec per link. That's still more than enough bandwidth for three DDR3-1066 memory channels.
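The QPI throughput numbers follow from the transfer rate with simple arithmetic. The sketch below assumes each QPI link carries 2 bytes of data per transfer in each direction and that the quoted throughput sums both directions. Neither assumption is spelled out in the article, but together they reproduce its 19.2GB/sec figure for the 4.8 GT/sec parts.

```python
# Rough QPI link throughput from the transfer rate.
# Assumptions (not stated in the article):
#   - each link moves 2 bytes of data per transfer in each direction
#   - the quoted throughput adds both directions together

QPI_PAYLOAD_BYTES = 2  # assumed data bytes per transfer, per direction
QPI_DIRECTIONS = 2     # assumed: send and receive are summed


def qpi_throughput_gb(gigatransfers_per_sec: float) -> float:
    """Return combined two-way link throughput in GB/sec."""
    return gigatransfers_per_sec * QPI_PAYLOAD_BYTES * QPI_DIRECTIONS


if __name__ == "__main__":
    parts = (
        ("Core i7 965 Extreme", 6.4),
        ("Core i7 940", 4.8),
        ("Core i7 920", 4.8),
    )
    for name, rate in parts:
        print(f"{name}: {rate} GT/sec -> {qpi_throughput_gb(rate):.1f} GB/sec")
```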

Improvements to the Base Core Architecture

Core i7 boasts a substantial set of enhancements over the original Core 2 architecture, some of which are more subtle than others.
Let's run down some of the more significant enhancements, in no particular order.

  • The Return of Hyper-Threading: Core i7 now implements Hyper-Threading, Intel's version of simultaneous multithreading (SMT). Each processor core can handle two simultaneous execution threads. Intel added processor resources, including deeper buffers, to enable robust SMT support. Load buffers have been increased from 32 (Core 2) to 48 (Core i7), while the number of store buffers went from 20 to 32.
  • New SSE4.2 instructions: Intel enhanced SSE once again, this time by adding instructions that speed up string and text processing (such as XML parsing), along with CRC32 checksum calculation.
  • Fast, unaligned cache access: Before Nehalem, data needed to be aligned on cache line boundaries for maximum performance. That's no longer true with Nehalem. This will help newer applications written for Nehalem more than older ones, simply because compilers and the authors of older applications often took great care to align data along cache line boundaries.
  • Advanced Power Management: The Core i7 actually contains another processor core, much tinier than the main cores. This is the power management unit, a dedicated microcontroller on the Nehalem die that's not accessible from the outside world. Its sole purpose is to manage the power envelope of Nehalem. Sensors built into the main cores monitor thermals, power and current, and the unit optimizes power delivery as needed. Nehalem is also engineered to minimize idle power. For example, Core i7 implements a per-core C6 sleep state.
  • Turbo Mode: One interesting aspect of Core i7's power management is Turbo Mode (not to be confused with Turbo Cache). Turbo Mode is a sort of automatic overclocking feature, in which individual cores can be driven to higher clock frequencies as needed. Turbo Mode is treated as another performance state by the power management unit, and operates transparently to the OS and the user.
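Two of the items above lend themselves to quick arithmetic. The sketch below counts the logical processors the OS would see with Hyper-Threading, then estimates a Turbo Mode clock by raising the multiplier. The base clock is derived from the spec table (2.66GHz at a non-turbo ratio of 20). The class name and the `extra_bins` parameter are hypothetical, and the article does not say how many multiplier steps Turbo Mode actually adds, so the one-step example is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreI7Model:
    """Hypothetical container for figures quoted in the article."""
    name: str
    clock_ghz: float
    cores: int = 4             # the article describes four cores sharing the L3
    threads_per_core: int = 2  # Hyper-Threading: two threads per core

    @property
    def logical_processors(self) -> int:
        # Number of processors the OS sees with Hyper-Threading enabled.
        return self.cores * self.threads_per_core


def turbo_clock_ghz(base_clock_ghz: float, base_ratio: int, extra_bins: int) -> float:
    """Estimate the clock after raising the multiplier by `extra_bins` steps.

    `extra_bins` is an illustrative parameter; the real number of turbo
    steps is not given in the article.
    """
    bus_clock_ghz = base_clock_ghz / base_ratio  # e.g. 2.66 / 20 = 0.133 GHz
    return bus_clock_ghz * (base_ratio + extra_bins)


if __name__ == "__main__":
    i7_920 = CoreI7Model(name="Core i7 920", clock_ghz=2.66)
    print(f"{i7_920.name}: {i7_920.logical_processors} logical processors")
    boosted = turbo_clock_ghz(i7_920.clock_ghz, base_ratio=20, extra_bins=1)
    print(f"One multiplier step above base: {boosted:.2f} GHz")
```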

That's a quick rundown of the key Nehalem architectural features. But do these additional features come together to enable better performance? Is Nehalem really more efficient than the Core 2 processors? We ran a ton of performance tests to find out. Before we dive into those tests, let's take a look at the test system.