Core i7 Nehalem's Review  

Posted by Mohammad Talha

Core i7 Genesis

The Core i7 is Intel's first new CPU architecture since the original Core 2 shipped back in July, 2006. It's hard to believe that the first Core 2 processors shipped over two years ago.

Since then, Intel has shipped incremental updates to the product line.

Quad-core Core 2 CPUs arrived in November 2006, in the form of the QX6700. AMD was quick to point out that Intel's quad-core solutions weren't "true" quad-core processors, but consisted of two Core 2 Duo dies in a single package. Despite that purist objection, Intel's quad-core solutions proved highly successful in the market.

The original Core 2 line was built on a 65nm manufacturing process. In late 2007, Intel began shipping 45nm CPUs, code-named Penryn. Intel's 45nm processors offered a few incremental feature updates, but were basically continuations of the Core 2 line.

In the past year, details about Nehalem began dribbling out, culminating with full disclosure of the Core i7 architecture at the August, 2008 Intel Developer Forum. If you want more details about Nehalem's architecture, that article is well worth a read. However, we'll touch on a few highlights now.

Speeds and Feeds

Intel will be releasing three Core i7 CPUs later this month (November, 2008). Here's a rundown of the specs of the three processors:

                       Core i7 965 Extreme   Core i7 940         Core i7 920
Clock Frequency        3.20GHz               2.93GHz             2.66GHz
QPI Data Rate          6.4 GT/sec*           4.8 GT/sec          4.8 GT/sec
Max. Non-Turbo Ratio   NA                    NA                  20
Thermal Design Power   130W                  130W                130W
Transistor Count       731M                  731M                731M
Die Size               263mm^2               263mm^2             263mm^2
Price                  $999 (qty. 1,000)     $562 (qty. 1,000)   $285 (qty. 1,000)

*GT/sec = gigatransfers per second

Cache and Memory
The initial Core i7 CPUs will offer a three-tiered cache structure. Each individual core contains two caches: a 64K L1 cache (split into a 32K instruction cache and a 32K data cache), plus a 256K unified L2 cache. An 8MB L3 cache is shared among the four cores. That 256K L2 cache is interesting, because it's built with an 8-T (eight transistors per cell) SRAM structure. This facilitates running at lower voltages, but also takes up more die space. That's one reason the core-specific L2 cache is smaller than you might otherwise expect.

Like AMD's current CPU line, Nehalem uses an integrated, on-die memory controller. Intel has finally moved the memory controller out of the north bridge. The current memory controller supports only DDR3 memory. The new controller also supports three channels of DDR3 per socket, with up to three DIMMs per channel supported. Earlier, MCH-style memory controllers only supported two channels of DRAM.

The use of triple-channel memory mitigates the relatively low, officially supported DDR3 clock rate of 1066MHz (effective). Various Intel representatives we spoke with were quick to point out that three channels of DDR3-1066 equate to 30GB/sec of memory bandwidth.
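
As a rough check on that figure, here is a minimal sketch of the peak-bandwidth arithmetic. It assumes a standard 64-bit (8-byte) DDR3 channel and treats DDR3-1066 as 1066 million transfers per second; the function name and parameters are made up for illustration. Under those assumptions the theoretical peak comes out at about 25.6GB/sec, so the 30GB/sec quoted above is best read as a round number.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits=64, channels=3):
    """Theoretical peak memory bandwidth in GB/sec (1 GB = 10**9 bytes).

    bus_width_bits=64 is an assumption: the usual width of one DDR3 channel.
    """
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels
    return bytes_per_sec / 1e9


# Three channels of DDR3-1066, as on the initial Core i7 parts
print(f"Triple-channel DDR3-1066: {peak_bandwidth_gb_s(1066):.1f} GB/sec")

# The same memory in a dual-channel layout, for comparison
print(f"Dual-channel DDR3-1066:   {peak_bandwidth_gb_s(1066, channels=2):.1f} GB/sec")
```

Real-world throughput will be lower than either number, since this ignores refresh cycles, command overhead, and access patterns.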

The integrated memory controller also clocks higher than one built into a north bridge chip, although not necessarily at the full processor clock speed. This higher clock, plus not having to communicate over a north bridge link, substantially improves memory latency.

To facilitate the integrated memory controller, Intel developed a new, point-to-point system interconnect, similar in concept to AMD's HyperTransport. Known as QuickPath Interconnect, or QPI for short, the new interconnect can move data at peak rates of 25GB/sec (at a base rate of 6.4 gigatransfers per second). Note that not all Nehalem processors will support the full theoretical bandwidth. The Core i7 940 and 920 CPUs support the 4.8 gigatransfers per second base rate, with a maximum throughput of 19.2GB/sec per link. That's still more than enough bandwidth for three DDR3-1066 memory channels.
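
The QPI throughput numbers follow from simple arithmetic, sketched below. The sketch assumes each link carries 2 bytes of data per transfer in each direction and that the quoted figure counts both directions together; those widths are not stated in this article, and the function name is hypothetical.

```python
def qpi_bandwidth_gb_s(giga_transfers_per_sec, bytes_per_transfer=2, directions=2):
    """Peak QPI link bandwidth in GB/sec.

    bytes_per_transfer=2 and directions=2 are assumptions: a 16-bit data
    payload per transfer, counted in both directions of the link.
    """
    return giga_transfers_per_sec * bytes_per_transfer * directions


for cpu, rate in (("Core i7 965 Extreme", 6.4), ("Core i7 940 / 920", 4.8)):
    print(f"{cpu}: {rate} GT/sec -> {qpi_bandwidth_gb_s(rate):.1f} GB/sec")
```

Under those assumptions the 6.4 GT/sec part works out to 25.6GB/sec and the 4.8 GT/sec parts to 19.2GB/sec, in line with the figures above.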

Improvements to the Base Core Architecture

Core i7 boasts a substantial set of enhancements over the original Core 2 architecture, some of which are more subtle than others.
Let's run down some of the more significant enhancements, in no particular order.

  • The Return of Hyper-Threading: Core i7 now implements Hyper-Threading, Intel's version of simultaneous multithreading (SMT). Each processor core can handle two simultaneous execution threads. Intel added processor resources, including deeper buffers, to enable robust SMT support. Load buffers have been increased from 32 (Core 2) to 48 (Core i7), while the number of store buffers went from 20 to 32.
  • New SSE4.2 instructions: Intel enhanced SSE once again, adding seven instructions aimed mainly at string and text processing, plus CRC32 and population-count (POPCNT) operations.
  • Fast, unaligned cache access: Before Nehalem, data needed to be aligned on cache line boundaries for maximum performance. That's no longer true with Nehalem. This mostly benefits newer applications written for Nehalem rather than older ones, because compilers and application authors already took great care to align data along cache line boundaries.
  • Advanced Power Management: The Core i7 actually contains another processor core, much tinier than the main cores. This is the power management unit, a dedicated microcontroller on the Nehalem die that's not accessible from the outside world. Its sole purpose is to manage the power envelope of Nehalem. Sensors built into the main cores monitor thermals, power and current, optimizing power delivery as needed. Nehalem is also engineered to minimize idle power. For example, Core i7 implements a per-core C6 sleep state.
  • Turbo Mode: One interesting aspect of Core i7's power management is Turbo Mode (not to be confused with Turbo Cache). Turbo Mode is a sort of automatic overclocking feature, in which individual cores can be driven to higher clock frequencies as needed. Turbo Mode is treated as another power state by the power management unit, and operates transparently to the OS and the user.

That's a quick rundown of the key Nehalem architectural features. But do these additional features come together to enable better performance? Is Nehalem really more efficient than the Core 2 processors? We ran a ton of performance tests to find out. Before we dive into those tests, let's take a look at the test system.

Intel Core i7 Review: Nehalem Gets Real  

Posted by Mohammad Talha

Nehalem is here.

Anticipation for Intel's latest CPU architecture rivals the excitement that greeted the original Core 2 Duo.
It's not just that Nehalem is a new CPU architecture. Intel's new CPU line also brings along with it a new system bus, new chipsets, and a new socket format.
Today, we're mainly focusing on the Core i7 CPU and its performance compared to Intel's Core 2 quad-core CPUs. There's a ton of data to sift through just on CPU performance. We'll have ample opportunity to dive into the platform, and its tweaks, in future articles.
Intel will be launching three new Core i7 products in the next couple of weeks, at 2.66GHz, 2.93GHz, and 3.20GHz, at prices ranging from $285 to $999 (qty. 1,000). That's right: You'll be able to pick up a Core i7 CPU for around $300 fairly soon. Of course, that's not the whole story: You'll need a new motherboard and very likely, new memory, since the integrated memory controller only supports DDR3.
In the past several weeks, we've been locked in the basement lab, running a seemingly endless series of benchmarks on six different CPUs. Now it's time to talk results. While we'll be presenting our usual stream of charts and numbers, we'll try to put them in context, including discussions of how and when it might be best to upgrade.
Let's get started with a peek under the hood.

XP Users: Speed Up Your 802.11n Connection  

Posted by Mohammad Talha

When you think about boosting the performance of your wireless network, 5-GHz wireless 802.11n routers are probably what come to mind. But the higher frequency used in the draft-n routers is only half the story. The "n" specification also allows wireless devices to use multiple antennas and MIMO (multiple input–multiple output) technology, also referred to as spatial multiplexing. With spatial multiplexing, operating systems have to coordinate higher volumes of TCP data streams at constant round-trip times between the router or access point and laptop. For years our connection and network bandwidth limited our download speeds. With 802.11n networks, it might just be our local operating systems that throttle the pipe.

Because the throughput of 802.11n is higher, more bits can flow between your wireless notebook and the router, but that means a legacy operating system like Windows XP needs more time to assemble and process the data. An option called Receive Window Scaling can optimize Windows XP's performance here, but it is turned off by default in the Registry.
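
To see why the receive window can become the bottleneck, here is a rough sketch of the standard window-over-round-trip-time bound on a single TCP stream. The 5 ms round-trip time is a made-up illustrative value and the function name is hypothetical. The 65,535-byte figure is the largest window TCP can advertise without window scaling, because the window field in the TCP header is only 16 bits wide.

```python
def max_throughput_mbit_s(window_bytes, rtt_ms):
    """Upper bound on one TCP stream's throughput in Mbit/sec.

    A sender can have at most one receive window of unacknowledged data
    in flight per round trip, so throughput <= window / RTT.
    """
    rtt_seconds = rtt_ms / 1000
    return window_bytes * 8 / rtt_seconds / 1e6


RTT_MS = 5  # hypothetical round-trip time, for illustration only

unscaled = max_throughput_mbit_s(65535, RTT_MS)    # 16-bit window limit
scaled = max_throughput_mbit_s(0x40000, RTT_MS)    # 262,144-byte window

print(f"Without window scaling: {unscaled:.0f} Mbit/sec at most")
print(f"With a 0x40000 window:  {scaled:.0f} Mbit/sec at most")
```

The shorter the round trip, the less the window size matters; the longer it is, the sooner a small window caps the connection below what the wireless link can carry.
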
For this tweak to work, your laptop's wireless network adapter must use either an internal "n" wireless chipset or an "n" PC card. Just add a few entries in the Registry:

1. Open the Windows XP Registry (Click Start and then Run, type regedit, and press Enter)

2. Back up the current Registry (File | Export…)

3. Navigate to the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. Right-click in the right-hand pane, choose New | DWORD Value, and name the new entry Tcp1323Opts. Give it a hexadecimal value of 3.

4. In the same key, add a DWORD entry with the name TcpWindowSize and give it a hexadecimal value of 40000.

5. Reboot your system so that the changes can take effect.

The tweak tells the Windows TCP/IP stack to turn on the Window Scaling option (a Tcp1323Opts value of 3 also enables TCP timestamps). Window Scaling lets the stack keep more data in flight per round trip, so your laptop will perform faster with 802.11n routers in 5-GHz mode. There isn't a nice, neat dialog box you can call up to tell you how much your throughput has improved, but you should see a real difference in download times.
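
If you'd rather not click through regedit by hand, the same two values can be expressed as a .reg file. The sketch below only builds and prints the file's text; it does not touch the Registry. The function and variable names are hypothetical, and the key path and values are the ones given in the steps above.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Values from the steps above: both are DWORDs, given in hexadecimal
DWORD_VALUES = {
    "Tcp1323Opts": 0x3,        # enable the RFC 1323 options (window scaling, timestamps)
    "TcpWindowSize": 0x40000,  # 262,144-byte receive window
}


def build_reg_file(key_path, dword_values):
    """Return the text of a .reg file that sets the given DWORD values."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, value in dword_values.items():
        # .reg files write DWORDs as eight hexadecimal digits
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


print(build_reg_file(KEY_PATH, DWORD_VALUES))
```

Save the printed text to a file with a .reg extension and double-click it on the XP machine to merge it. Back up the Registry first, as in step 2, and reboot afterwards.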