The explosion of AI applications is obvious – many of us now use them on a daily basis, e.g., when we talk to our phones (Hello, Siri!) or to our homes (Alexa, what is the weather like?). And before you know it, I predict we’ll even be talking to our cars and letting them drive us. What is less obvious, though, is how this progress in AI is being driven by the amazing evolution of computer hardware over the past half century.

Computer hardware has made enormous strides in price and performance over the last 50 years. In fact, if the price/performance of cars had changed as much as the price/performance of computer hardware (computer memory in particular), we would be buying thousands of Tesla Model S's for just a penny, and they would have a top speed of 60,000 miles an hour! We're talking about a factor of a trillion in price/performance improvement. And these improvements happened just in time for AI to become practical in everyday life.

Computers are, at a very basic level, still not very smart. Training them to run the AI applications we now use every day can require many trillions of arithmetic operations. To put this in context, the computers we used in the 1970s would have needed years to do calculations that we now do routinely in a matter of seconds.

Computer hardware will continue to improve for at least the next decade. Before the year 2000, these hardware improvements tended to show up as more speed from each processor. Now, however, not so much. Instead, we’ve seen an improvement in total processing power. For example, we can now place thousands of processors on a chip that previously would have contained just one.

A factor of a trillion seems almost inconceivable and indeed, we’ve had a lot of trouble conceiving it. For example, computer programs are still mostly written using concepts from the 1980s. And while “innovations” like multicore processors allow you to get 8 or 16 processor cores onto a single chip, each core is still programmed using old-style tools. In addition, the chips must contain elaborate hardware to ensure that each core can "see" the same memory and coordinate with other chips.

However, what the new bounty of processing power is giving us is not just 8 or 16 cores on a chip, but hundreds of thousands of processors in a system that are available to do our bidding - provided we can find new ways to program them.

We have two approaches to exploiting these processors. First, some problems can be naturally - and easily - handled by many parallel machines. For example, preparing invoices or showing web pages can be done by thousands of processors simultaneously, because most invoices or web pages don't need to examine other invoices or web pages to be processed. This independence of operations is called data parallelism. Many mathematical operations are data parallel, and the computations can be run quickly by sharing the work among swarms of processors.
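To make the invoice example concrete, here is a minimal Python sketch of data parallelism. The order records, the `prepare_invoice` function, and the worker count are all made up for illustration. A thread pool stands in for the "swarm of processors"; on real hardware the same pattern would be spread across many physical processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical order records; the fields are invented for this sketch.
orders = [
    {"id": 1, "quantity": 3, "unit_price": 20.0},
    {"id": 2, "quantity": 1, "unit_price": 99.5},
    {"id": 3, "quantity": 10, "unit_price": 4.25},
]

def prepare_invoice(order):
    """Build one invoice using only that order's own data."""
    total = order["quantity"] * order["unit_price"]
    return {"order_id": order["id"], "total": total}

def prepare_all(orders, workers=4):
    # No invoice needs to look at any other invoice, so the orders can be
    # handed to the workers in any split. map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(prepare_invoice, orders))

invoices = prepare_all(orders)
```

Because `prepare_invoice` never reads another order, adding workers changes how fast the batch finishes but not the result.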

The other kind of parallelism is operational parallelism. It applies to programs that process their data in a series of distinct operations, passing data from one operation to the next. These programs can benefit from a process called pipelining. The first piece of data is processed by the first processor, and the result is sent to the second processor. While the second processor is working on that data, the first processor processes another piece of data, and so on. Soon all the processors are busy working, dramatically increasing the output.
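The pipelining idea described above can be sketched in a few lines of Python. This is only an illustration: each "processor" is modeled as a thread, the hand-off between processors is a queue, and the three operations are arbitrary placeholders we chose for the example.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the data stream

def stage(func, inbox, outbox):
    """Run one pipeline stage: take an item, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # tell the next stage to shut down too
            return
        outbox.put(func(item))

def run_pipeline(funcs, items):
    # One queue in front of every stage, plus one for the final output.
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()
    # Feed the first stage; later stages start as soon as results arrive.
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is DONE:
            break
        results.append(out)
    for w in workers:
        w.join()
    return results

# Three placeholder operations applied in sequence to each piece of data.
results = run_pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], range(5))
```

While the second stage is doubling the first value, the first stage is already incrementing the next one, which is the overlap that keeps every stage busy.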

The most familiar example of operational parallelism is an automobile assembly line. All workers are busy all the time, and the cars appear much faster than they would if built one at a time. Assembly lines made automobiles so inexpensive that every person could benefit, and we believe operational parallelism will have a similar effect on AI.

Most AI programs can benefit from both data parallelism and operational parallelism. For large problems, there is almost no limit to the speedup the combination of these techniques can bring, provided the processing power is present and the hardware and software architecture are designed to support both kinds of parallelism.

Wave systems are based on this type of architecture, with thousands of processors that will offer dramatic increases in speed and reductions in cost over legacy AI systems. These dataflow-based systems are designed to exploit both data parallelism and operational parallelism at the same time, accelerating the time-to-market for AI applications while remaining cost effective. To learn more about Wave’s underlying dataflow technology, check out our white papers.