SOC Design

Wednesday, April 20, 2005

Live Law, Dying Corollaries

Yesterday marked the 40th birthday of Moore’s Law. Electronics magazine published Moore’s first article on device scaling in semiconductors on April 19, 1965. In the ensuing 40 years, the number of components we can fabricate on a production integrated circuit jumped by a factor of more than one million. On average, that’s better than a 30x increase in device count every decade, and we can expect at least another 10x before hitting hard limits set by the size of silicon atoms.
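The growth rate behind those figures is worth making explicit. Here is a minimal sketch that treats "more than one million" as exactly 10^6 over 40 years, so the results are lower bounds on the average rate. It computes the average per-decade factor and the implied doubling period:

```python
import math

TOTAL_GROWTH = 1_000_000  # "more than one million"; treated as exactly 10**6 (a lower bound)
YEARS = 40

decades = YEARS / 10
growth_per_decade = TOTAL_GROWTH ** (1 / decades)  # geometric-mean growth factor per decade
doublings = math.log2(TOTAL_GROWTH)                # number of 2x steps in a 10**6 increase
years_per_doubling = YEARS / doublings             # average time for device count to double

print(f"~{growth_per_decade:.1f}x per decade, one doubling every ~{years_per_doubling:.1f} years")
```

That works out to roughly 31.6x per decade, or about one doubling every two years averaged over the full span.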

However, just because Moore’s Law seems alive and well doesn’t mean that everything is fine in the land of the ever-shrinking transistor. Two corollaries of Moore’s Law—often mistaken for the real thing by the press—are clearly dying. Those corollaries relate to device speed and power dissipation. For nearly 40 years, smaller transistors also meant faster transistors that ran on less power, at least individually. Those corollaries started to break somewhere around the turn of the century.

Nowhere is this effect more apparent than in the 25-year fight for the fastest clock rate in the world of PC processors. Intel and AMD have been locked in a PC-processor death match for more than 20 years and, for most of that time, processor clock rate largely determined which company was “winning” at any given time. (Actually, Intel always wins when it comes to sales, but the technology lead has swung back and forth.)

In the early 1980s, Intel signed a cross-license agreement with AMD for manufacturing x86 processors starting with the 8086 and 8088. Intel then introduced the 80286 at 12.5MHz. AMD, being the second source, sought a sales advantage over prime source Intel and found one by introducing a faster, 16MHz version of the 80286. Intel fought back with its 80386, which ran at 33MHz, and refused to hand over the design for that processor to official second source AMD. This naturally led to a lawsuit.

Meanwhile, AMD fought back on the technological front by introducing a reverse-engineered 80386 running at 40MHz. The race went on for years. In 1997, AMD’s K6 processor hit 266MHz and Intel countered by introducing the Pentium II processor running at 266MHz just three weeks later. Three years later, AMD’s Athlon processor was the first x86 processor to hit a 1GHz clock rate.

Finally, Intel really got the message: clock rate was clearly king in the processor wars. As a result, Intel redesigned the Pentium microarchitecture to emphasize clock rate (though not necessarily real performance) by giving it a very deep pipeline. The resulting Pentium 4 processor put Intel substantially ahead in the clock-rate war.
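The gap between clock rate and real performance can be illustrated with a deliberately simple pipeline model. The model is not from the post, and its numbers are hypothetical rather than Pentium 4 data. Splitting a fixed amount of logic across more stages shortens the cycle, but every stage adds latch overhead, and a deeper pipeline throws away more cycles on each branch mispredict:

```python
# Hypothetical first-order pipeline model; all constants are illustrative assumptions.
LOGIC_DELAY = 20.0            # total logic delay per instruction, arbitrary time units (assumed)
LATCH_OVERHEAD = 1.0          # per-stage latch and clock-skew overhead, same units (assumed)
MISPREDICTS_PER_INSTR = 0.01  # e.g. 20% branches x 5% mispredicted (assumed)


def clock_rate(depth: int) -> float:
    """Clock frequency when the logic is split evenly across `depth` stages."""
    cycle_time = LOGIC_DELAY / depth + LATCH_OVERHEAD
    return 1.0 / cycle_time


def cycles_per_instruction(depth: int) -> float:
    """Ideal CPI of 1 plus a refill penalty of roughly `depth` cycles per mispredict."""
    return 1.0 + MISPREDICTS_PER_INSTR * depth


def performance(depth: int) -> float:
    """Instructions completed per unit time."""
    return clock_rate(depth) / cycles_per_instruction(depth)


for depth in (10, 20, 30):
    print(f"depth={depth:2d}  clock={clock_rate(depth):.3f}  perf={performance(depth):.3f}")
```

Under these assumptions, going from 10 to 30 stages raises the clock rate 1.8x but raises throughput only about 1.5x. The exact numbers mean little. What matters is that clock rate and delivered performance diverge as the pipeline deepens.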

All of these clock-rate escalations relied on Moore’s-Law scaling to achieve the added speed. Faster clock rates automatically accompanied smaller transistor designs through the 1970s, 1980s, and 1990s. That is no longer true. Intel and AMD are no longer trying to win these clock fights because that war is essentially over.

Additional Moore’s-Law transistor scaling produces smaller transistors, so more of them fit on a chip. But these shrunken transistors don’t necessarily run any faster, and they don’t necessarily run at lower power either. The reasons include supply voltages that can no longer drop in step with transistor dimensions and leakage currents that grow as transistors shrink. Process enhancements such as strained silicon and silicon-on-insulator (SOI) can still raise clock speeds, but those gains no longer come as an automatic benefit of Moore’s Law.
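The power side of this can be sketched with the standard first-order expression for CMOS switching power and the classic (Dennard) constant-field scaling rules. The post does not name either one. In this simplified sketch, α is the switching-activity factor, C the switched capacitance, V_dd the supply voltage, f the clock rate, A the area per transistor, and κ > 1 the linear shrink factor. Leakage is ignored, even though rising leakage is part of why the corollaries broke.

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f

% Classic scaling by \kappa: C \to C/\kappa,\; V_{dd} \to V_{dd}/\kappa,\; f \to \kappa f,\; A \to A/\kappa^{2}
P_{\mathrm{dyn}}' = \alpha \, \frac{C}{\kappa} \left( \frac{V_{dd}}{\kappa} \right)^{2} \kappa f = \frac{P_{\mathrm{dyn}}}{\kappa^{2}},
\qquad \frac{P_{\mathrm{dyn}}'}{A/\kappa^{2}} = \frac{P_{\mathrm{dyn}}}{A}

% Supply voltage no longer scales: C \to C/\kappa,\; V_{dd} \to V_{dd},\; f \to \kappa f,\; A \to A/\kappa^{2}
P_{\mathrm{dyn}}'' = \alpha \, \frac{C}{\kappa} \, V_{dd}^{2} \, \kappa f = P_{\mathrm{dyn}},
\qquad \frac{P_{\mathrm{dyn}}''}{A/\kappa^{2}} = \kappa^{2} \, \frac{P_{\mathrm{dyn}}}{A}
```

Once the supply voltage stops shrinking, each transistor's switching power stops falling while transistor density keeps rising by κ². Power per unit area then climbs unless the clock rate is held back.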

There’s been another casualty of the clock-rate war: power dissipation. The original IBM PC ran its 8088 microprocessor at 4.77MHz and required no heat sink. The 80286, 80386, and early ’486 processors also ran without heat sinks. Around 100MHz, PC processors started requiring heat sinks. Eventually, they required heat sinks with integrated fans. Some high-end PCs now come equipped with active liquid-cooling systems for the processor. This isn’t progress because fans add noise and have reliability issues of their own. However, these processor fans are essential because it’s become very difficult to extract the rapidly growing amount of waste heat from these processors as the clock rate has climbed.

Consequently, Intel and AMD are moving the PC-processor battlefront from clock speed to parallelism: getting more work done per clock. Multiple processors per chip is now the name of the game. Coincidentally, this week, during the 40th-anniversary celebrations of Moore’s Law, both Intel and AMD are announcing dual-core versions of their top-end PC processors. The companies have concluded that boosting performance through clock-rate escalation alone is a played-out tune. However, Moore’s Law is still delivering more transistors every year, so Intel and AMD can put those numerous, smaller transistors to work by fabricating two processors on one semiconductor die.

None of this is new in the world of SOC design because SOC designers have never been able to avail themselves of the multi-GHz clock rates achieved by processor powerhouses Intel and AMD. SOC design teams don’t have hundreds of engineers to hand-design the critical-path circuits, which is the price for achieving these extremely high clock rates. However, all SOC designers can avail themselves of the millions of transistors per chip provided by Moore’s Law in the 21st century. As a result, many companies have been developing SOC designs with multiple processors for several years.

The International Technology Roadmap for Semiconductors Design Technical Working Group (ITRS Design TWG) recently met in Munich to discuss changes to the next official ITRS. Part of that discussion involved a forecast of the increase in the number of processing engines used in the average SOC from 2004 to 2016. The current forecast starts with 18 processing engines per SOC in 2004 and rises to 359 processing engines per SOC in 2016. (Today, Tensilica’s customers incorporate an average of 6 processors per SOC, and one customer, Cisco, has developed a networking chip with 188 active processors and four spares.)

I don’t think the ITRS Design TWG knows exactly what those 359 processing engines will be doing in the year 2016, but it estimates that these engines will consume about 30% of the SOC’s die area—roughly the same amount of area (as a percentage of the total die area) that’s consumed by last year’s 18 processing engines. In another decade, SOC designers will clearly get a lot more processing power for the silicon expended. Harnessing all that processing power is a topic for another blog post.
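The ITRS figures imply a steep compound growth rate. The quick calculation below assumes smooth exponential growth between the two forecast endpoints, which the forecast itself need not follow. It also treats the "about 30%" and "roughly the same" area shares as exactly 30% in both years. With those assumptions, it shows how much smaller each engine's slice of the die must become:

```python
ENGINES_2004 = 18
ENGINES_2016 = 359
YEARS = 2016 - 2004
PROCESSOR_AREA_SHARE = 0.30  # assumed exactly 30% of die area in both years

growth = ENGINES_2016 / ENGINES_2004       # total growth over the forecast window
annual_growth = growth ** (1 / YEARS) - 1  # compound annual growth rate, assuming smooth growth

share_per_engine_2004 = PROCESSOR_AREA_SHARE / ENGINES_2004
share_per_engine_2016 = PROCESSOR_AREA_SHARE / ENGINES_2016
shrink = share_per_engine_2004 / share_per_engine_2016  # reduction in each engine's slice of the die

print(f"{growth:.1f}x more engines, ~{annual_growth:.0%} per year")
print(f"each engine's share of the die shrinks ~{shrink:.0f}x")
```

That comes to roughly 28% more processing engines per year, with each engine occupying about one-twentieth of the die fraction it did in 2004.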

Parallelism is clearly the path to performance in the 21st century. Exploiting parallelism takes advantage of the true Moore’s Law, which is still very much alive, and veers away from the dying corollary of ever-higher clock rates. Boosting parallelism, which is inherent in a large number of end applications, lowers the required clock rate, which in turn permits a lower supply voltage, and therefore lowers power dissipation. Given the thrust of processor and SOC design over the last decade, dropping the clock rate seems counterintuitive. Nevertheless, the physics demand it.
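The power argument can be made concrete with the first-order switching-power relation P = αCV²f. The sketch below compares one core at full clock with two cores at half clock each. It assumes the workload splits perfectly across the two cores, so throughput is unchanged. The 0.8 voltage factor is a hypothetical value chosen for illustration, not a figure for any real process:

```python
def dynamic_power(c: float, v: float, f: float, activity: float = 1.0) -> float:
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * c * v * v * f


C, V, F = 1.0, 1.0, 1.0       # normalized capacitance, supply voltage, and clock of one core
VOLTAGE_AT_HALF_CLOCK = 0.8   # hypothetical: assumes half the clock permits ~80% of the supply voltage

single_core = dynamic_power(C, V, F)

# Two cores, each at half the clock: same nominal throughput if the work splits perfectly.
dual_same_voltage = 2 * dynamic_power(C, V, F / 2)
dual_lower_voltage = 2 * dynamic_power(C, VOLTAGE_AT_HALF_CLOCK * V, F / 2)

same_voltage_ratio = dual_same_voltage / single_core
lower_voltage_ratio = dual_lower_voltage / single_core
print(f"two cores, same voltage:  {same_voltage_ratio:.2f}x the power of one fast core")
print(f"two cores, lower voltage: {lower_voltage_ratio:.2f}x the power of one fast core")
```

With the supply voltage unchanged, the two half-speed cores draw the same power as the single fast core. The saving, to about 64% in this example, comes from the lower voltage that the reduced clock rate allows.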

Moore’s Law continues to benefit all IC designers, even after 40 years, although its very handy corollaries seem to be dying out. With work and just a bit of luck, Moore’s Law will continue to benefit the industry for at least another decade.

(Thanks to Kevin Krewell, Editor-in-Chief of The Microprocessor Report, for nailing down the PC processor clock-rate facts so neatly in his recent editorial titled "AMD vs. Intel, Round IX.")
