AMD's K10 Architecture in Detail
First off, before we get into the Phenom processor itself, we want to take a look at the K10 architecture. AMD is hoping it will provide an answer to Intel's Core architecture, which has taken over as the top-performing CPU and severely hampered AMD's sales.
AMD's K10 architecture is essentially a revamp of the original K8 architecture, or the AMD64 architecture as it's commonly known. The tradition of 64-bit OS support and an internal Northbridge with an integrated memory controller continues, with a few additional features to help speed things up.
AMD Wide Floating Point Accelerator
One of the new features of K10-based CPUs is AMD's Wide Floating Point Accelerator. What this does is let the CPU process a full SIMD instruction in one clock cycle. K8 and Netburst both had a 64-bit SIMD execution path, which meant a SIMD instruction took two clock cycles to execute, as SIMD instructions are 128 bits wide. Core 2 and K10 both have a 128-bit SIMD engine that processes a full SIMD instruction in one clock cycle, reducing the time it takes to move on to the next instruction and, in effect, halving encode and decode times in the best case.
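As a rough illustration of the arithmetic above, the sketch below counts how many passes a 128-bit SIMD operation needs through an execution unit of a given width. The function name and the one-pass-per-clock model are our own simplifying assumptions, not AMD's description of the pipeline.

```python
import math

SIMD_WIDTH_BITS = 128  # width of a full SIMD operand, as stated in the text


def passes_per_simd_op(unit_width_bits: int) -> int:
    """Passes needed to push one 128-bit SIMD operation through a unit.

    Hypothetical model: one pass costs one clock cycle and nothing else stalls.
    """
    return math.ceil(SIMD_WIDTH_BITS / unit_width_bits)


k8_passes = passes_per_simd_op(64)    # K8 / Netburst style 64-bit path
k10_passes = passes_per_simd_op(128)  # K10 / Core 2 style 128-bit engine
print(k8_passes, k10_passes)
```

Real throughput also depends on decode, scheduling and memory, so treat the two-to-one ratio as an upper bound rather than a measured result.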
In order to keep up with the Core 2 processors, AMD widened the fetch engine to pull 32 bytes per clock cycle from the L1 cache, double that of the K8 and Netburst-based CPUs. This lets the CPU retrieve more from the L1 cache in a single clock cycle, so fewer cycles are wasted waiting on fetches.
AMD has put real work into keeping as much data on the CPU as it possibly can. While system memory can store quite a bit of data, it's nowhere near as fast as the cache memory on the CPU. System memory access results in delays, as the system has to wait for the memory to cycle through its in-and-out stages to get a command in and data out. CPU cache allows the processor to keep frequently accessed data in its own on-die memory, but because of the speeds these memories run at, sizes are limited.
AMD has introduced a new cache layout on its Phenom processors. First off, each core has its standard L1 instruction and data caches, which remain at 64KB for instructions and 64KB for data, a total of 128KB per core. So a dual-core processor has in effect 256KB of L1 cache combined, and a quad-core has 512KB combined. The level 2 cache on Phenom processors is 512KB per core; again, for a dual-core that's 1MB combined, for a tri-core 1.5MB combined and for a quad-core 2MB combined.
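The combined figures quoted above are simple per-core multiples, and a few lines of Python make it easy to check them. The constant and function names here are our own, chosen purely for illustration.

```python
L1_PER_CORE_KB = 64 + 64  # 64KB instruction cache + 64KB data cache per core
L2_PER_CORE_KB = 512      # L2 cache per core


def cache_totals_kb(cores: int) -> tuple[int, int]:
    """Return (combined L1, combined L2) in KB for a given core count."""
    return cores * L1_PER_CORE_KB, cores * L2_PER_CORE_KB


for cores in (2, 3, 4):
    l1_kb, l2_kb = cache_totals_kb(cores)
    print(f"{cores} cores: L1 {l1_kb}KB combined, L2 {l2_kb / 1024}MB combined")
```

Keep in mind that "combined" is only a bookkeeping total: each core can use only its own L1 and L2, so the per-core figures are the ones that matter to a single thread.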
Now, just when you think things end, AMD has followed what Intel did with Core 2 by creating a unified cache system, in this case by adding a third level of cache, or "L3" cache. This is a common cache that is accessible by all of the cores on the CPU, allowing the cores to share instructions and data without having to go through any system request buses or FSBs. The L3 cache is actually part of the Integrated Northbridge on the K10 processor and runs at a constant 2GHz, no matter what speed the CPU cores are running at, and it measures 2MB in size.
Dual Independent Memory Controller
This has to be one of the biggest upgrades AMD has made with the K10 architecture, which sees not only an increase in speeds but also a new way of moving data over the memory bus. AMD has kept the same memory technology on the first series of K10 processors, that being DDR2 memory. Since AMD has an on-CPU memory controller, a set number of pins on the bottom of the CPU are dedicated to the memory modules. If AMD wants to change memory technologies, new boards, RAM and CPUs are all needed, making the switch harder than with Intel's off-die memory controller. However, where Intel gets a multitude of upgrade options, AMD's CPUs see far better memory bandwidth.
Now we get to the really complicated bit. Dual channel memory controllers usually combine two identically sized modules into a single larger module (not unlike a RAID array), widening the bus from 64-bit to 128-bit, which in theory increases memory bandwidth. However, this becomes inefficient when accessing data. For instance, if two different fetches are required on a traditional dual channel system, each has to be performed on its own, one after the other. AMD has taken a different approach.
By having two 64-bit channels that can be accessed at the same time, AMD creates a dual channel memory interface while still allowing separate access to each module. This gives K10 the same bandwidth as traditional dual channel memory, but with a faster fetch time thanks to two memory operations happening at once. Thanks to this new way of accessing memory, AMD is now using divider ratios like Intel to control DDR memory speeds, rather than the older method AMD used, which gave inconsistent memory speeds.
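To make the difference between the two layouts a little more concrete, here is a toy timing model. It is only a sketch under our own assumptions: the fixed access latency, the cycle counts and the function names are hypothetical, and a real memory controller juggles far more than this.

```python
import math


def ganged_time(requests, latency):
    """One 128-bit channel (16 bytes per transfer); requests run one after another."""
    return sum(latency + math.ceil(size / 16) for size in requests)


def unganged_time(requests, latency):
    """Two independent 64-bit channels (8 bytes per transfer each).

    Requests are dealt out alternately to the two channels. Each channel works
    through its own queue, and both channels run in parallel.
    """
    channels = [0, 0]
    for i, size in enumerate(requests):
        channels[i % 2] += latency + math.ceil(size / 8)
    return max(channels)


LATENCY = 10        # hypothetical fixed cost per access, in arbitrary cycles
fetches = [64, 64]  # two unrelated 64-byte fetches

print("ganged:  ", ganged_time(fetches, LATENCY))
print("unganged:", unganged_time(fetches, LATENCY))
```

In this model the two independent channels finish a pair of unrelated fetches sooner because the access latencies overlap, while a single fetch on its own is still quicker over the wide 128-bit bus. Peak bandwidth is 16 bytes per transfer either way, which matches the claim that bandwidth stays the same.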