SOC Design

Wednesday, April 27, 2005

RAW deal—duel of the digicam file formats

As the “digitization of everything” continues at breakneck speed, one of the real success stories is the rapid replacement of film cameras with digital versions. At the top of the digicam pyramid reside the digital SLRs (“dSLRs”) made by Canon, Nikon, Olympus, Pentax, etc. Although almost all digicams produce images in standard file formats (JPEG and/or TIFF), the dSLRs can also record images in an uncompressed format known as “RAW.” Photo professionals and top-end photo enthusiasts prefer the RAW format because it can produce pictures with the greatest resolution and color depth and it avoids all compression artifacts. However, every digicam vendor’s RAW format is proprietary and different.

Consequently, every digicam vendor that makes cameras with RAW file capability must offer a proprietary RAW converter program so that their RAW images can be converted to the standard file formats at some point. In addition, vendors of image-editing and image-manipulation programs such as Adobe, Bibble Labs, and DxO offer their own homegrown RAW converters, but not for every camera.

As dSLRs evolve, the problem of proprietary RAW formats grows, and the professionals and enthusiasts in the dSLR market are increasingly up in arms about it. Most recently, Nikon has marketed itself straight into the crosshairs of what may well turn out to be a shooting war. Nikon has drawn fire because it is encrypting the white-balance data in the RAW files (called Nikon Electronic Format or NEF files) produced by its D2X and D2Hs dSLRs (as reported by dpreview.com and the Engadget blog). This encryption effectively cripples third-party image-processing programs and forces users of those two cameras to employ Nikon’s own Nikon Capture program (sold separately). Nikon has an SDK that converts NEF files into standard image files that can be processed by third-party programs. Nikon says it will license the SDK to what the company calls “bona fide” software developers, but the encryption has already been cracked by at least two developers, although such actions may open those third-party developers to prosecution under the Digital Millennium Copyright Act (DMCA).
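To see why hiding white-balance data matters, consider what a converter actually does with it: the camera records per-channel gain multipliers, and the converter scales the red, green, and blue values by those gains early in the processing pipeline. Here’s a rough sketch in C of just that one step; the structure, field names, and multiplier values are mine for illustration, not any vendor’s actual format. Without the real multipliers, a third-party converter is left guessing at this step.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical white-balance step in a RAW conversion pipeline.
 * The per-channel gains normally come from camera metadata; if that
 * metadata is encrypted, a third-party converter must estimate them. */
typedef struct {
    double r_mul, g_mul, b_mul;   /* per-channel gains recorded by the camera */
} white_balance_t;

static uint16_t clamp16(double v)
{
    if (v < 0.0)     return 0;
    if (v > 65535.0) return 65535;
    return (uint16_t)(v + 0.5);
}

/* Scale interleaved 16-bit RGB pixels (already demosaiced) in place. */
static void apply_white_balance(uint16_t *rgb, size_t pixel_count,
                                const white_balance_t *wb)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        rgb[3*i + 0] = clamp16(rgb[3*i + 0] * wb->r_mul);
        rgb[3*i + 1] = clamp16(rgb[3*i + 1] * wb->g_mul);
        rgb[3*i + 2] = clamp16(rgb[3*i + 2] * wb->b_mul);
    }
}

int main(void)
{
    uint16_t pixels[6] = { 1000, 1000, 1000, 2000, 2000, 2000 };  /* two gray pixels */
    white_balance_t wb = { 2.0, 1.0, 1.4 };   /* made-up daylight-ish multipliers */
    apply_white_balance(pixels, 2, &wb);
    printf("first pixel after WB: %u %u %u\n",
           (unsigned)pixels[0], (unsigned)pixels[1], (unsigned)pixels[2]);
    return 0;
}
```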

There are some signs of sanity on the horizon. One of the first is the openRAW project, an attempt to get digicam manufacturers to openly document their RAW formats for the benefit of the entire photographic industry. Adobe is taking a different tack, trying to drum up support for a universal RAW format that it introduced last year called the Digital Negative (DNG) format, which is based on the TIFF 6.0 standard. Here’s hoping the imaging mavens wake up sooner rather than later.

Tuesday, April 26, 2005

Shoe Biz Redux

Last month, I wrote about German shoe maker Adidas' new, $250, computerized Adidas 1 running shoes that automatically and continuously adjust the shoes' cushioning based on the runner's gait. Each shoe has a 5-MIPS processor in it (presumably there's a left processor and a right processor). Each processor accepts feedback data from a magnetic impact sensor, computes the "optimal" amount of cushioning using a "proprietary" algorithm, and drives a small electric motor that adjusts the shoe accordingly. It turns out that a real runner can feel the difference. For a positive, hands-on (feet-on?) review of the Adidas 1 shoes, see Frank Bajak's MSNBC article online.

For a nice exploded diagram and technical explanation of the shoe, click here.
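The control loop inside each shoe is conceptually simple: read the impact sensor, compute a cushioning setting, nudge the motor. Here’s a minimal C sketch of that sense-compute-actuate cycle; the simulated sensor readings, the mapping from impact to cushioning, and the one-step-per-stride rule are all my stand-ins, since Adidas’ actual algorithm is proprietary.

```c
#include <stdio.h>

static int read_impact_sensor(void)
{
    static int fake[] = { 10, 30, 55, 80, 60, 25 };   /* simulated readings */
    static int i = 0;
    return fake[i++ % 6];
}

static void drive_cushion_motor(int setting)
{
    printf("motor -> cushioning level %d\n", setting);
}

static int compute_cushion_setting(int impact, int current)
{
    /* Placeholder heuristic: firmer for harder impacts, moving one step per
     * stride so the shoe never changes abruptly underfoot. Not Adidas' math. */
    int target = impact / 20;
    if (target > current) return current + 1;
    if (target < current) return current - 1;
    return current;
}

int main(void)
{
    int setting = 0;
    for (int stride = 0; stride < 12; ++stride) {   /* the real shoe loops forever */
        int impact = read_impact_sensor();
        setting = compute_cushion_setting(impact, setting);
        drive_cushion_motor(setting);
    }
    return 0;
}
```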

To a terahertz and beyond

This month, two researchers working at the Micro and Nanotechnology Laboratory at the University of Illinois at Urbana-Champaign announced that they have pushed transistor operating frequency more than halfway toward a terahertz. Developed by physicists Walid Hafez and Milton Feng, the pseudomorphic HBT (heterojunction bipolar transistor) has a maximum operating frequency of 604GHz. It’s fabricated from indium phosphide and indium gallium arsenide.

The researchers have been developing increasingly fast transistors; two years ago, they broke the 500GHz barrier. Before this latest development, the team’s fastest transistor ran at 550GHz while operating at 176 degrees C. As always, heat is a big issue with high-frequency transistor operation. According to Hafez, "Projections from our earlier high-frequency devices indicated that in order to create a transistor with a cutoff frequency of 1 terahertz, the devices would have to operate above 10,000 degrees C. By introducing grading into the layer structure of the device, we have been able to lower the potential operating temperature for a terahertz transistor to within an acceptable range."

The 604GHz transistor surpasses what was previously the world’s fastest transistor, a 562GHz HEMT (high electron mobility transistor) FET developed in 2002 by Akira Endoh at Fujitsu Laboratories. Endoh and his colleagues are also shooting for a terahertz and beyond.

Wednesday, April 20, 2005

Live Law, Dying Corollaries

Yesterday marked the 40th birthday of Moore’s Law. Electronics magazine published Moore’s first article on the topic of device scaling in semiconductors on April 19, 1965. In the ensuing 40 years, the number of components we can fabricate on a production integrated circuit jumped by a factor of more than one million. On average, that’s better than a 10x increase in device count every decade and we can expect at least another 10x before hitting hard limits set by the size of silicon atoms.

However, just because Moore’s Law seems alive and well doesn’t mean that everything is fine in the land of the ever-shrinking transistor. Two corollaries of Moore’s Law—often mistaken for the real thing by the press—are clearly dying. Those corollaries relate to device speed and power dissipation. For nearly 40 years, smaller transistors also meant faster transistors that ran on less power, at least individually. Those corollaries started to break somewhere around the turn of the century.

Nowhere is this effect more apparent than in the 25-year fight for the fastest clock rate in the world of PC processors. Intel and AMD have been locked in a PC-processor death match for more than 20 years and, for most of that time, processor clock rate largely determined which company was “winning” at any given time. (Actually, Intel’s always winning when it comes to sales but the competition has wavered back and forth with respect to technology.)

In the early 1980s, Intel signed a cross-license agreement with AMD for manufacturing x86 processors starting with the 8086 and 8088. Intel then introduced the 80286 at 12.5MHz. AMD, being the second source, sought a sales advantage over prime source Intel and found one by introducing a faster, 16MHz version of the 80286. Intel fought back with its 80386, which ran at 33MHz, and refused to hand over the design for that processor to official second source AMD. This naturally led to a lawsuit.

Meanwhile, AMD fought back on the technological front by introducing a reverse-engineered 80386 running at 40MHz. The race went on for years. In 1997, AMD’s K6 processor hit 266MHz and Intel countered by introducing the Pentium II processor running at 266MHz just three weeks later. Three years later, AMD’s Athlon processor was the first x86 processor to hit a 1GHz clock rate.

Finally, Intel really got the message about clock rate. Clock rate was clearly king in the processor wars. As a result, Intel re-architected the Pentium’s microarchitecture to emphasize clock rate (though not necessarily real performance) by creating a really deep pipeline and the resulting Pentium 4 processor put Intel substantially ahead in the clock-rate war.

All of these clock-rate escalations relied on Moore’s-Law scaling to achieve the added speed. Faster clock rates automatically accompanied smaller transistor designs through the 1970s, 1980s, and 1990s. However, this isn’t true any more. Intel and AMD are no longer trying to win these clock fights because that war is essentially over.

Additional Moore’s-Law transistor scaling produces smaller transistors, so that more of them fit on a chip. But these shrunken transistors don’t necessarily run any faster, because interconnect delay and other parasitics now dominate gate delay, and they no longer run at proportionally lower power, because leakage current grows as gate oxides and channel lengths shrink. There are some additional processing tricks such as strained silicon and SOI that can achieve higher clock speeds, but they no longer come as an automatic benefit of Moore’s Law.

There’s been another casualty of the clock-rate war: power dissipation. The original IBM PC ran its 8088 microprocessor at 4.77MHz and required no heat sink. The 80286, 80386, and early ‘486 processors also ran without heat sinks. Around 100MHz, PC processors started requiring heat sinks. Eventually, they required heat sinks with integrated fans. Some high-end PCs now come equipped with active liquid-cooling systems for the processor. This isn’t progress because fans add noise and have reliability issues of their own. However, these processor fans are essential because it’s become very difficult to extract the rapidly growing amount of waste heat from these processors as the clock rate has climbed.

Consequently, Intel and AMD are moving the PC-processor battlefront from clock speed to parallelism: getting more work done per clock. Multiple processors per chip is now the name of the game. Coincidentally, this week, during the 40th-anniversary celebrations of Moore’s Law, both Intel and AMD are announcing dual-core versions of their top-end PC processors. The companies have concluded that faster performance through mere clock rate escalation is a played-out tune. However, Moore’s Law is still delivering more transistors every year so Intel and AMD can put those numerous, smaller transistors to work by fabricating two processors on one semiconductor die.

None of this is new in the world of SOC design because SOC designers have never been able to avail themselves of the multi-GHz clock rates achieved by processor powerhouses Intel and AMD. SOC design teams don’t have hundreds of engineers to hand-design the critical-path circuits, which is the price for achieving these extremely high clock rates. However, all SOC designers can avail themselves of the millions of transistors per chip provided by Moore’s Law in the 21st century. As a result, many companies have been developing SOC designs with multiple processors for several years.

The International Technology Roadmap for Semiconductors Design Technical Working Group (ITRS Design TWG) recently met in Munich to discuss changes to the next official ITRS. Part of that discussion involved a forecast of the increase in the number of processing engines used in the average SOC from 2004 to 2016. The current forecast starts with 18 processing engines on an SOC in 2004, which jumps to 359 processing engines per SOC in 2016. (Today, Tensilica’s customers incorporate an average of 6 processors per SOC and one customer, Cisco, has developed a networking chip with 188 active processors and four spares.)

I don’t think the ITRS Design TWG knows exactly what those 359 processing engines will be doing in the year 2016, but it estimates that these engines will consume about 30% of the SOC’s die area—roughly the same amount of area (as a percentage of the total die area) that’s consumed by last year’s 18 processing engines. In another decade, SOC designers will clearly get a lot more processing power for the silicon expended. Harnessing all that processing power is a topic for another blog post.

Parallelism is clearly the path to performance in the 21st century. Exploiting parallelism rides the true Moore’s Law, which is still very much alive, and veers away from the dying corollary of ever-higher clock rates. Exploiting the parallelism inherent in a large number of end applications lowers the required clock rate and therefore lowers power dissipation. Given the thrust of processor and SOC design over the last decade, dropping the clock rate seems counterintuitive. Nevertheless, the physics demands it.
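The arithmetic behind that claim is the familiar dynamic-power relation: switching power scales roughly with capacitance times voltage squared times frequency (C·V²·f), and a lower clock rate usually permits a lower supply voltage, which enters as the square. Here’s a back-of-the-envelope C sketch comparing one core at full clock against two cores at half the clock; the capacitance, voltage, and frequency numbers are purely illustrative, not tied to any particular process.

```c
#include <stdio.h>

/* Back-of-the-envelope dynamic power: P ~ C * V^2 * f (leakage ignored). */
static double dynamic_power(double cap_farads, double volts, double freq_hz)
{
    return cap_farads * volts * volts * freq_hz;
}

int main(void)
{
    double C = 1e-9;   /* effective switched capacitance, illustrative only */
    double single = dynamic_power(C, 1.4, 3.0e9);        /* one core, full clock */
    double dual   = 2.0 * dynamic_power(C, 1.1, 1.5e9);  /* two cores, half clock,
                                                            lower supply voltage */
    printf("one core  @ 3.0 GHz, 1.4 V: %.2f W\n", single);
    printf("two cores @ 1.5 GHz, 1.1 V: %.2f W\n", dual);
    return 0;
}
```

Two cores at half the clock deliver the same aggregate cycles per second at well under the single core’s power, which is exactly the trade Intel and AMD are now making.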

Moore’s Law continues to benefit all IC designers, even after 40 years, although its very handy corollaries seem to be dying out. With work and just a bit of luck, Moore’s Law will continue to benefit the industry for at least another decade.

(Thanks to Kevin Krewell, Editor-in-Chief of The Microprocessor Report, for nailing down the PC processor clock-rate facts so neatly in his recent editorial titled "AMD vs. Intel, Round IX.")

Tuesday, April 19, 2005

More on Moore

Yesterday, I wrote about Moore’s Law and its 40th birthday. Well, today’s the actual day of the anniversary of Moore’s article in Electronics magazine and many people, including Gordon Moore, thought things might not get as far as they have. As recently as 1995, 30 years after writing that first article predicting exponential device growth in semiconductors, Moore said:

“As we go below 0.2 micron, the SIA road map says 0.18 micron line widths is the right number, we must use radiation of a wavelength that is absorbed by almost everything. Assuming more or less conventional optics, problems with depth of field, surface planarity and resist technology are formidable, to say nothing of the requirement that overlay accuracy must improve as fast as resolution if we are to really take maximum advantage of the finer lines.”

Moore delivered these words in 1995 to the members of SPIE, the International Society for Optical Engineering. Not very optimistic for the creator of Moore’s Law—and Moore went even further in describing his chart for lithographic progress:

“My plot goes only to the 0.18 micron generation, because I have no faith that simple extrapolation beyond that relates to reality.”

Getting even more serious, Moore said:

“Beyond this is really terra incognita, taking the term from the old maps. I have no idea what will happen beyond 0.18 microns.

In fact, I still have trouble believing we are going to be comfortable at 0.18 microns using conventional optical systems. Beyond this level, I do not see any way that conventional optics carries us any further. Of course, some of us said this about the one micron level. This time however, I think there are fundamental materials issues that will force a different direction.”

It didn’t happen the way Moore expected. Today, 180nm lithography is middle-of-the-road stuff and no one considers it miraculous, while quarter-micron (250nm) lithography is trailing-edge and 0.35-micron technology—still in production—is absolutely Neolithic. Actually, even 130nm lithography is quite manufacturable today although it took a lot of engineering magic and ingenuity to “make it so.”

Even 90nm integrated circuits are already in production. In fact, the FPGA industry currently uses 90nm fabrication as its top-of-the-line technology foundation, and 65nm chips are being fabricated, although no one would say that 65nm chips are yet in volume production. However, 65nm production in volume is clearly coming; of that there is no doubt. A couple of weeks ago, an article written by David Lammers in EE Times quoted Ted Vucurevich, senior vice president of advanced R&D at Cadence, as saying that the jump from 90nm to 65nm design rules might be quick indeed because the transition requires no change to the materials flow used in the current 90nm fabrication process. Designers therefore get the immense transistor bounty of the next design node without much of the pain normally associated with such a change.

Writing as a Regional Editor for EDN Magazine—way, way back in 1988—I described what was then the world’s smallest transistor: IBM had fabricated a transistor using 70nm design rules, which was an amazing feat at the time. That 70nm transistor ran on 1V when nearly all digital ICs ran on 5V. Although the 70nm transistor required liquid-nitrogen cooling to combat thermal noise, the IBM researchers saw no fundamental reason why such tiny FETs couldn’t run at room temperature. Today we know they can. Quite well in fact. Back in 1989, leading-edge IC production technology used 0.7- or 0.8-micron (700 to 800nm) design rules. That was 10x what IBM had achieved in 1988 with its 70nm design rules, and we are just starting to put such small transistors into production, some 17 years later.

So what is the smallest transistor made today? Just how far has Moore’s Law been stretched this early in the 21st century? In late 2003, NEC announced that it had built an FET with a 5nm gate length. Intel and AMD have publicly discussed fabrication of transistors with 10nm gate lengths and IBM built one with a 6nm gate length in 2002. (Note: For you purists, I know I’m mixing gate lengths and drawn geometries here for brevity’s sake. At the IEDM conference in 2002, AMD also discussed a flash memory that it built in conjunction with Stanford University using 5nm drawn geometries.)

These geometries are approximately 10x smaller than what’s used in today’s most advanced production devices. So, we already know that transistors will continue to work at geometries an order of magnitude smaller than what’s broadly manufacturable today and we already know ways to make such transistors. The ability to make such small transistors in production volumes will undoubtedly follow because the economic incentives to make things smaller and cheaper remain.

Even after 40 years, it looks like Moore’s Law still has a few birthdays left.


PS: How Moore’s Law got its name

Gordon Moore isn’t really the sort of guy to name a law after himself. He had help. Rick Merritt and Patrick Mannion told the story in an article in this week’s EE Times. It seems that it took another huge name in semiconductor lore, Carver Mead, to bestow the name. Mead started consulting for Moore back in Moore’s days at Fairchild and continued after Moore co-founded Intel. Mead named the law in an interview with a journalist in 1971. Mead was discussing his work on electron tunneling and the associated limits on device scalability. He spontaneously coined the catchy phrase during the interview. The rest is well-trodden history.

Monday, April 18, 2005

Happy birthday to Moore’s Law

On its 40th birthday (tomorrow), Gordon Moore’s prediction about the increasing number of components that can be put on an integrated circuit is very much alive and well. However, its often-cited corollaries relating to device speed and power dissipation, which are frequently confused with the law itself, are in big trouble. But I’ll leave that discussion for another time.

The April 19, 1965 issue of Electronics magazine published the original article written by Gordon Moore, who was Director of Fairchild Semiconductor’s R&D laboratories at the time. Moore’s article, titled “Cramming more components onto integrated circuits,” predicted an exponential growth in the number of “components” (transistors, diodes, resistors, and capacitors) that would be built on an integrated circuit as semiconductor fabrication expertise grew. His initial observation was that the component count doubled roughly every year; he later revised the doubling time to two years.

Incredibly, Moore synthesized this remarkably accurate, long-lived truism from very few data points. The integrated circuit had only been invented in late 1959, a little more than five years before Moore’s article appeared. By 1965, integrated-circuit fabrication technology had progressed only to the point where it could put 50 or 60 components on a chip. Moore conjured his doubling-trick projection using the five data points he had available. His law put the industry on a breakneck course that has lasted 40 years and will surely continue for at least another 10-15 years. However, the doubling time, now tracked by the International Technology Roadmap for Semiconductors (ITRS), may be stretching out to three years as we approach the fundamental atomic limits of the materials. (Trace widths on today's most advanced ICs are now only a few hundred atoms wide and gate-oxide thicknesses are less than 10 atoms thick.)
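For the curious, the projection itself is just compound doubling: the component count doubles every fixed period, so count = start * 2^(years / doubling period). A quick C sketch, starting from the roughly 60 components Moore had to work with in 1965; the doubling periods are the ones discussed above, not a forecast of mine.

```c
#include <math.h>
#include <stdio.h>

/* Project component count under Moore's-Law doubling:
 * count(t) = count0 * 2^(years / doubling_period). */
static double moore_projection(double count0, double years, double doubling_years)
{
    return count0 * pow(2.0, years / doubling_years);
}

int main(void)
{
    /* Starting from roughly 60 components in 1965 (per the post). */
    printf("2005, 2-year doubling: %.0f components\n",
           moore_projection(60, 40, 2.0));   /* roughly 63 million */
    printf("2005, 3-year doubling: %.0f components\n",
           moore_projection(60, 40, 3.0));   /* roughly 600 thousand */
    return 0;
}
```

The gap between a two-year and a three-year doubling period compounds dramatically over four decades, which is why a stretch-out in the roadmap matters so much.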

Ten years after his first article appeared, Moore spoke about his law at the IEEE’s 1975 International Electron Devices Meeting (IEDM) and said that he saw “no present reason to expect a change in this trend.” However, over the years, there have been frequent predictions of the law’s demise.

For example, the industry was facing the transition from 3- and 2-micron lithographies to 1-micron technology in 1984. At that time, the fear was not that semiconductor fabrication technology would fail to keep pace. Rather, the issue was the IC designers’ ability to design such complex chips, which was falling behind the capabilities of the manufacturing processes. Bob Kirk and Tom Daspit of American Microsystems wrote these paragraphs in their article “Making the Design Transition,” published in a 1984 issue of Semiconductor International magazine:

“The ability to design complex ‘one-micron design rule’ ICs is, in fact, a key in the semiconductor industry’s transition to using such processes effectively. … The capabilities of one micron technology will not be fully exploited until sufficiently powerful tools in these areas emerge.”

Remarkably similar to the situation the industry faces today with nanometer design rules, no? The 1984 article continues:

“Design engineers are, in fact, now beginning the transition from pencil-and-paper logic design to interactive design using engineering workstations. …

It should be noted that wider use of automated design aids is contingent on the acceptance of some reduction in the use of circuit area—known as area penalty. …

Automated tools generally waste some silicon area; they are not as area efficient as human designers.

While industry pressure exists to reduce area penalty, the sheer complexity of one-micron circuits dictates that some area be traded off in favor of completing designs in a timely way with automated tools. The pressure to reduce area penalty can be softened by trading off total design cost against the total manufacturing costs.”

So, the issues of moving IC hardware designers to higher abstraction levels and trading off silicon area for design complexity and cycle time have been with us for at least 20 years. Even 20 years ago, Moore’s-Law scaling was providing more components than could be accommodated by the design tools and methodologies in use. The same is true today even though designers have jumped several levels of abstraction from hand-drawn transistors, through schematic-drafting CAD systems, to HDLs and logic synthesis. Moore’s Law continues to keep the raw silicon capabilities ahead of our design abilities and will do so for at least another decade. After that, we will need to find another medium to work in because silicon will be worked out.

Thursday, April 14, 2005

If it’s broke, fix it

Bad news for SOC designers, as reported in EE Times last week in an article written by David Lammers quoting Cadence’s senior VP of R&D Ted Vucurevich. While the bulk of the article covered the industry trend toward adoption of 65nm design rules sooner rather than later (more on that topic in a later blog), the article’s last paragraph discussed the dismal state of SOC design success today:

"The industry has been weighed down by relatively poor first-time design success rates, he said, quoting data from analysis firm Collett and Associates. In 2003, only about one-third of the 130-micron designs achieved first-time success. After the third iteration, only 60 percent of the designs worked, he said, attributing hard-to-detect in the designs for the low rate of improvement. After three failures, many designs afflicted with "really hard problems" are declared disasters and abandoned altogether, he said."

A 66% initial failure rate is troubling and says something about the EDA industry’s current inability to support designs of deep-submicron complexity. Not to sound like a broken record here, but the current approach to system design, which is based on techniques developed more than 10 years ago, is now well and truly broken. The statistics prove it! This trend will only worsen as 90nm and then 65nm design rules become more common.

I believe that the solution to this design problem is to engineer systems at much higher abstraction levels. That means that engineers need to spend much less time hacking RTL, far less time verifying new blocks of custom logic, and much more time thinking about, tinkering with, and simulating systems at the block-diagram level. To do this, design teams must use building blocks larger than gates, flip-flops, registers, and ALUs. They must also cease and desist from manually translating algorithms from high-level languages into hardware-description languages.

Moore’s Law isn’t dead. The International Technology Roadmap for Semiconductors (ITRS) has codified this law and ensures that we will have more transistors per chip every year. Our system-design styles must now use those transistors far more effectively to overcome the barriers to complex system design and to substitute what’s in surplus (transistors) for what’s scarce (engineering time and project cycle time).

I work for a configurable microprocessor core vendor so it’s no secret that I think processor cores are part of the solution. They are pre-verified, correct-by-construction blocks of RTL that need relatively little verification. Processor cores can run software directly, eliminating manual translation of C or C++ to RTL. Configurable cores can run HLL programs at speeds approaching those of hand-built RTL blocks, and they’re far easier to design into a system, even in large numbers. Processor cores and memories are clearly part of the solution to the high failure rate of today’s SOC designs.

Wednesday, April 13, 2005

Don’t reinvent the wheel if you only need to add a few spokes

Bob Colwell once worked for Intel as chief IA32 architect through the Pentium II, III, and 4 microprocessors. The guy knows a lot about traditional microprocessor design. These days, he’s a consultant and a regular columnist in the IEEE Computer Society’s publication Computer. The April issue of Computer just arrived at my house and Colwell’s column was the first thing I turned to, as usual, because I love reading what he has to say.

This month, Colwell’s theme is the “point of highest leverage.” By that, he means putting your efforts on producing results that will have the greatest impact on your project. Some of his advice is excellent for any team contemplating the design of a new system.

Colwell starts by discussing a common sense approach to developing software:

“If you have written computer programs, you have probably wrestled with computer performance analysis. Naïve programmers may just link dozens of off-the-shelf data structures and algorithms together, while more experienced coders design their program with an eye toward the resulting speed. But either way, you end up running the program and wishing it was faster.”

This paragraph succinctly sums up the experience of all computer programmers for the last 60 years, ever since the switch was first thrown to power up ENIAC. Colwell continues:

“The first thing you do is get a run-time histogram of your code, which reveals that of the top 25 sections, one of them accounts for 72 percent of the overall runtime, while the rest are in the single digits… Do you a) notice that one of the single-digit routines is something you’d previously worried about and set out to rewrite it, or b) put everything else aside and figure out what to do about that 72 percent routine?”

Developers who choose alternative b) obviously understand the principle of the “point of highest leverage.” Previously, software developers stuck with fixed-ISA processors (like Colwell’s Pentiums) would have to heavily rework the troublesome code, perhaps dropping into assembly language to truly maximize the processor’s performance. Today’s SOC developers have another choice: extend a processor core’s instruction set specifically for the target code to achieve a project’s performance goals. This approach is a natural evolution in how microprocessor technology is harnessed, now that tool automation can generate both the RTL for the extended microprocessor and all of the required software-development tools.
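The arithmetic behind choosing alternative b) is Amdahl’s law: the overall speedup from accelerating a fraction f of the runtime by a factor s is 1/((1 - f) + f/s). Here’s a quick C sketch using Colwell’s 72-percent figure; the 2x, 10x, and 5-percent numbers are mine, chosen only to make the contrast visible.

```c
#include <stdio.h>

/* Amdahl's law: overall speedup when a fraction f of runtime is sped up by s. */
static double amdahl(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void)
{
    /* Colwell's hypothetical: one routine accounts for 72% of runtime. */
    printf("72%% routine, 2x faster:  %.2fx overall\n", amdahl(0.72, 2.0));
    printf("72%% routine, 10x faster: %.2fx overall\n", amdahl(0.72, 10.0));
    /* Versus heroically speeding up a routine that is only 5% of runtime: */
    printf("5%%  routine, 10x faster: %.2fx overall\n", amdahl(0.05, 10.0));
    return 0;
}
```

Even an infinite speedup of a 5-percent routine buys barely 5 percent overall, while a modest 2x on the 72-percent routine buys more than 50 percent. That’s the point of highest leverage, in numbers.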

But things (at least processor-related things) are no longer the way Colwell describes them later in this column:

“It is, however, crucial to identify exactly what should be at the top of your worry list. Important changes (read: risks) such as new process technologies automatically go on that list because if trouble arises there, you have few viable alternatives. If you’re contemplating a new microarchitecture, that goes at the top of the list. After all, your team hasn’t conjured up the new microarchitecture yet—you’re only asserting that you need one. The gap between the two facts may turn out to be insurmountable.”

These words were sure and true in the day when processors and software tools were developed by hand. Developing a new Pentium architecture is surely a years-long endeavor requiring hundreds of engineers. However, one engineer can now add new registers and instructions to a base processor in a few days using automated design tools. The resulting processor hardware, generated automatically by the design tools, is correct by construction.

There are two key elements that are essential to the ability to rapidly create such extended processors. The first is a small, fast base processor architecture that can execute any program because it is a complete processor. The base processor may not execute the target code at the desired speed, but it can at least execute that code. That’s a significantly advantageous starting point.

It’s also a very logical starting point. There’s no need to reinvent a way to add two 32-bit integers. It’s been done before. However, there are very real, performance-related reasons for adding new instructions that streamline code. For example, specific registers sized to an application’s data elements (such as 48-bit, 2-element audio vectors) and instructions that explicitly manipulate those data elements (such as direct codebook lookups and customized MAC instructions) can significantly boost code performance well beyond the limits of traditional assembly-language coding while adding very few gates to the processor’s hardware design.
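To make that concrete, here’s what such an extension looks like from the programmer’s side: a stereo multiply-accumulate loop written in plain C, and the same loop rewritten around a two-element accumulator and a dual MAC operation. Everything in the “extended” version (the audio_acc_t type, the dual_mac() helper) is my own emulation in plain C, not Tensilica’s actual extension syntax; on a real extended core, the compiler would map that helper to a single custom instruction and the accumulator type to a custom register file.

```c
#include <stdint.h>
#include <stdio.h>

/* Plain C: multiply-accumulate over a 2-channel (stereo) sample stream. */
static void mac_plain(const int16_t *x, const int16_t *coef, int n,
                      int32_t *acc_l, int32_t *acc_r)
{
    for (int i = 0; i < n; ++i) {
        *acc_l += (int32_t)x[2*i + 0] * (int32_t)coef[2*i + 0];
        *acc_r += (int32_t)x[2*i + 1] * (int32_t)coef[2*i + 1];
    }
}

/* Hypothetical extension: a two-element accumulator register and a dual-MAC
 * instruction that handles both channels in one cycle. Emulated here in
 * plain C so the sketch compiles; on an extended core, dual_mac() would be
 * one custom instruction and audio_acc_t a custom register. */
typedef struct { int32_t l, r; } audio_acc_t;

static inline audio_acc_t dual_mac(audio_acc_t a, const int16_t *x, const int16_t *c)
{
    a.l += (int32_t)x[0] * (int32_t)c[0];
    a.r += (int32_t)x[1] * (int32_t)c[1];
    return a;
}

static void mac_extended(const int16_t *x, const int16_t *coef, int n,
                         int32_t *acc_l, int32_t *acc_r)
{
    audio_acc_t acc = { *acc_l, *acc_r };
    for (int i = 0; i < n; ++i)
        acc = dual_mac(acc, &x[2*i], &coef[2*i]);   /* one "instruction" per sample pair */
    *acc_l = acc.l;
    *acc_r = acc.r;
}

int main(void)
{
    int16_t x[8]    = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int16_t coef[8] = { 1, 1, 2, 2, 3, 3, 4, 4 };
    int32_t l = 0, r = 0, l2 = 0, r2 = 0;
    mac_plain(x, coef, 4, &l, &r);
    mac_extended(x, coef, 4, &l2, &r2);
    printf("plain: %d %d  extended: %d %d\n", (int)l, (int)r, (int)l2, (int)r2);
    return 0;
}
```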

The second essential element is the automatic generation of the associated software-development tools. The task of manually writing compilers, assemblers, debuggers, and profilers for a new processor architecture is as time consuming, and just as important, as developing the new processor itself. The processor is useless if software developers cannot easily write and debug programs for it.

Colwell’s column wavers perilously close to the edge of reality when he writes:

“Start with the givens. Experience gives you a set of things you can take for granted: techniques, know-how, who is good at what, tools that have proven themselves, validation plans and repositories, how to work within corporate planning processes. If you’ve accumulated enough experience, you’ve learned never to take anything for granted, but some things don’t need to appear at the top of your worry list.”

Most design teams do not have processor customization at the top of their worry list because they don’t realize that it’s now possible to directly attack processor performance by designing a better processor. These people already “know” that they’re not processor designers and that it “would be foolish” for them to even consider developing a processor with instructions specifically for a task on an SOC.

Conventional wisdom says that when a fixed-ISA processor cannot handle a job, you need to design hardware by writing some Verilog or VHDL. That conventional wisdom is based on nearly 35 years of design experience with microprocessors. When the microprocessor cannot do the job, it needs supplemental hardware.

That conventional wisdom is now plainly wrong. The “techniques” and “know-how” that Colwell takes for granted because of experience have now been superseded because of the march of technology. In the 1960s, the conventional wisdom rejected integrated circuits entirely. Here’s a quote from a speech Gordon Moore gave to SPIE in 1995:

“In 1965 the integrated circuit was only a few years old and in many cases was not well accepted. There was still a large contingent in the user community who wanted to design their own circuits and who considered the job of the semiconductor industry to be to supply them with transistors and diodes so they could get on with their jobs.”

Things were no different a few years later when Intel introduced the first microprocessor. The Intel 4004, which appeared in 1971, did not take the system-design world by storm. Design engineers knew how to wire up hundreds or thousands of TTL gates packaged a few at a time in 7400-series logic packages. They did not know how to write and debug software. Further, early microprocessors cost one or two hundred dollars, far more than the few TTL packages they replaced. As a result, it took about a decade for microprocessors to become well established as essential elements in system design.

Things were again no different in the late 1980s as the IC-design industry was facing a complete breakdown in design methodology. The schematic-capture methods of the day were proving to be completely inadequate to the task of describing the complexity of the chips that could be built. Here’s a quote from an article on VHDL written by EDA editor Michael C Markowitz in the March 30, 1989 issue of EDN Magazine:

“The reluctance of designers to embrace new techniques over their well-worn, time-proven methods will impede VHDL’s rate of acceptance… But once the benefits become clearer and the reluctance to write code rather than draw or capture a design dissipates, VHDL will gather steam as a design language.”

Markowitz was dead on regarding the onset of hardware-description languages, although it was Verilog, not VHDL, that established a hold on designers in the United States. European designers did adopt VHDL.

Kurt Keutzer, then with AT&T Bell Labs and now a professor at UC Berkeley, summarized the situation quite well in his paper at the 1989 Design Automation Conference:

“One of the biggest obstacles to the acceptance of synthesis for ASIC design is the lack of education. Designing a circuit using a synthesis system is radically different from designing a circuit using most current design systems. The ability to hand optimize transistor or gate-level networks is of little use in synthesis systems, while an entirely new class of skills are demanded. The acceptance of synthesis procedures requires a significant re-education of designers currently in industry, as well as a broadened academic curriculum for the upcoming generation of designers.”

Today, there are many SOC designers who believe that things are as they always have been. They weren’t around 15 years ago to see logic synthesis take over the industry. They believe that people have been writing RTL since the dawn of time and will continue to do so until the universe expires of thermodynamic heat death. These people share much in common with the 1960s designers who wanted their discrete diodes and transistors, the 1970s designers who refused to learn how to program microprocessors, and the 1980s designers who clung to their schematics rather than embracing Verilog and VHDL. There are too many transistors on today’s SOCs to design even most of them using hardware-description languages. Once again, IC fabrication technology has outstripped our “popular” design methods and new methods are required to keep pace.

Colwell is right about a lot of things but he’s wrong about experience giving you “a set of things you can take for granted.” Thanks to the pace of technological development, you must always question your assumptions about the things you can take for granted. The industry changes. Design changes. And the companies that adapt quickly survive. The rest don’t.

Wednesday, April 06, 2005

Billions and Billions of Processors

Last night, I attended an open house in Menlo Park at the new digs of the Foresight Institute, a nanotech think tank founded in 1986 to further the cause of nanotechnology. The Foresight Institute was created by K. Eric Drexler, who brought nanotechnology into the public light with his book Engines of Creation, published in 1986. I remember reading Drexler’s book during a plane flight in the late 1980s. I also remember being stunned by the raw power and promise of nanotech. Today, nanotech still offers a lot of promise, and a few real products.

However, the image appearing on the Foresight Institute’s home page stopped me cold yesterday. It shows an artist’s conception of a desktop nanofactory building a white block about the size of a Rubik’s cube. The caption for the image mentions that the white block could contain, for example, one billion processors.

At Tensilica today, (and in our book Engineering the Complex SOC) we’re concerned with harnessing tens or hundreds of processors on an SOC. The average Tensilica customer uses 6 processors per chip and we have one client, Cisco, which has put close to 200 processors on one chip. That’s today.

A billion processors sort of approximate the processing power of all the computers currently attached to the Internet. In the volume of a baseball. That’s tomorrow.

We don’t yet know how to master the chaotic energy of a billion processors. However, the dreams of the people working in nanotechnology give us fair warning that we need to start thinking seriously about how to harness such complexity.