Chip giants battle for the cloud: NVIDIA dominates, while Intel and AMD rise.
Zhidx (WeChat official account: zhidxcom)
By | Xinyuan
In 2019, the new battlefield of cloud AI chips is heating up.
In the past few years, artificial intelligence (AI) has exploded from a neglected academic niche to the forefront of commercialization, setting off a hurricane of intelligent upgrades and the Internet of Everything across fields closely tied to daily life, such as security, finance, education, manufacturing, the home and entertainment.
The direct promoters of this unprecedented technological revolution are Internet giants, Google, Microsoft and Facebook abroad and BAT at home, together with a batch of new AI startups; and the backbone that lets these companies break ground quickly in AI is the hardware providers supplying a steady stream of high-density computing power.
AI hardware application scenarios are usually divided into the cloud and the terminal. The cloud mainly means large-scale data centers and servers; terminals cover a rich range of scenarios such as mobile phones, vehicles, security cameras and robots.
Whether it is online translation, voice assistants, personalized recommendation or the many AI development platforms that lower the threshold for developers, wherever AI technology is used, cloud AI chips must provide data centers with strong computing support around the clock.
According to data disclosed by NVIDIA in 2017, the global market for cloud AI chips will exceed $20 billion by 2020, and this huge market has become a prize the chip giants are eyeing.
NVIDIA's general-purpose graphics processing unit (GPGPU) has soared on the tailwind of deep learning. Its share price, still around $20 in 2015, rocketed to $292 by October 2018; its market value surpassed those of KFC and McDonald's, and it became the flagship stock of the AI field, its market value running to many billions of dollars, enjoying boundless limelight.
Its rocket-like rise awakened a crowd of potential competitors, and a storm appeared on the horizon. Semiconductor giants such as Intel and AMD are racing to catch up; Google, Amazon, Baidu and Huawei have crossed over into self-developed chips; and dozens of chip startups have sprung up, all intent on breaking through the performance ceiling of cloud AI chips and reshaping the market with self-developed architectures.
This article takes a panoramic view of the cloud AI chip battle, surveying the five semiconductor giants, seven Chinese and American technology giants and twenty-odd domestic and foreign chip companies that have joined the fray. Can NVIDIA, which once created myths, preserve its legendary empire? Can the new computing architectures now shipping or in development adapt to the algorithms of the future? Which companies are most likely to survive a field this crowded with strong hands?
Whoever dominates this cloud AI chip war will win a greater say in the coming battles over cloud computing and the AI market.
It all began by chance, yet it was no accident.
More than a decade ago, NVIDIA and AMD became the two dominant players in the field of graphics cards after fierce fighting with dozens of rivals. At that time, most NVIDIA employees did not know what artificial intelligence (AI) was.
At that time NVIDIA's total revenue was about $3 billion, yet founder and CEO Jensen Huang made a risky decision: spend $500 million a year on the CUDA project, transforming the GPU through a series of changes and software development into a more general computing tool, at a cumulative cost of nearly $10 billion.
It was a remarkably forward-looking decision. In 2006 CUDA, the world's first general-purpose computing solution on the GPU, came into being. The technology gave programmers an ever more convenient on-ramp and gradually built NVIDIA's GPUs a strong, stable developer ecosystem.
Then, in 2012, NVIDIA met the rising fervor of deep learning.
That year Geoffrey Hinton, a professor at the University of Toronto in Canada, a master of machine learning and the father of neural networks, led a research group that trained AlexNet, a convolutional neural network (CNN), and won the ImageNet image-recognition competition at a stroke, pushing AI to a historic turning point in academic attention.
The GPU was not born for deep learning, but its parallel computing ability happens to fit the logic of deep learning algorithms. Each GPU has thousands of cores working in parallel, and those cores typically perform large numbers of low-level, repetitive mathematical operations, which suits deep learning algorithms very well.
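Intuitively, the fit comes from the fact that most deep learning computation is dense matrix math, where every output element is an independent multiply-accumulate. A minimal pure-Python sketch of the idea (illustrative only; real GPU code would run each output cell on a separate core):

```python
# Deep learning spends most of its time in operations like this:
# every cell of a matrix multiply is an independent multiply-accumulate,
# so thousands of GPU cores can each compute one cell at the same time.

def matmul(a, b):
    """Naive matrix multiply: rows of a times columns of b."""
    n, k = len(a), len(b[0])
    # Each (i, j) cell below depends on no other cell --
    # exactly the kind of work a GPU spreads across its cores.
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(k)]
            for i in range(n)]

# A tiny "layer": 2 inputs -> 2 outputs
weights = [[1.0, 2.0],
           [3.0, 4.0]]
inputs  = [[1.0, 0.0],
           [0.0, 1.0]]   # identity batch, for easy checking

print(matmul(weights, inputs))  # -> [[1.0, 2.0], [3.0, 4.0]]
```

A real network just repeats this pattern at enormous scale, which is why a chip with thousands of simple parallel cores beats a few complex ones.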
After that, the ever stronger "CUDA + GPU" combination, with unmatched processing speed and multitasking ability, quickly won over a large body of researchers and soon became an essential component of major data centers and cloud-service infrastructure around the world.
The battle of the giants’ cloud AI chips quietly kicked off.
With its early start and stable ecosystem, NVIDIA soon became the leader of the cloud AI chip market.
NVIDIA keeps marching toward ever greater strength, unveiling impressive technologies such as Tensor Core and NVSwitch one after another and continually setting new performance benchmarks. It has also built a GPU cloud that lets developers download the latest deep-learning-optimized software stack containers at any time, greatly lowering the threshold for AI R&D and deployment.
In this way NVIDIA, through accumulated time, talent and technology, built a wall that is hard to breach: anyone who wants into the city must play by NVIDIA's rules. Today NVIDIA employs more than 10,000 engineers, and its GPU + CUDA computing platform is by far the most mature AI training solution, swallowing most of the training market.
Functionally, cloud AI chips mainly do two things: training and inference.
Training means feeding massive data into the machine and repeatedly adjusting the AI algorithm until it masters a specific capability. The process demands extremely high computing performance, precision and generality.
Inference means applying the trained model. Its parameters are already fixed and no massive data is needed, so the demands on performance, precision and generality are lower than in training.
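The split can be sketched in a few lines of plain Python: training repeatedly nudges a parameter with gradient updates over many examples, while inference just runs the frozen parameter forward once. (A toy one-parameter model, purely illustrative.)

```python
# Toy model: y = w * x. Training fits w; inference only applies it.

def train(samples, lr=0.1, epochs=200):
    """Training: repeatedly adjust w to shrink the squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = w * x - y
            w -= lr * err * x   # gradient step: heavy, repeated work
    return w

def infer(w, x):
    """Inference: parameters are frozen, just one forward pass."""
    return w * x

data = [(1.0, 3.0), (2.0, 6.0)]   # true relation: y = 3x
w = train(data)                    # the expensive phase
print(infer(w, 4.0))               # the cheap phase -> 12.0
```

The asymmetry is why training chips chase peak throughput and precision, while inference chips compete mainly on latency and power.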
The GPU is a hard mountain to climb in the training market, but its advantages are less pronounced in the inference market, where power-consumption constraints are tighter.
And that is precisely where the semiconductor giants who entered the game late have gathered.

▲ Incomplete statistics of major cloud AI chip products of chip giants
Chips are a winner-take-all market, and cloud AI chips are no exception. NVIDIA's high-, mid- and low-end general-purpose GPUs for accelerating data-center applications have long been the performance benchmark for every player.
NVIDIA poured billions of dollars and thousands of engineers into the effort, launching its first Pascal GPU optimized for deep learning in 2016. In 2017 it launched Volta, a new GPU architecture five times faster than Pascal, alongside TensorRT 3, a neural-network inference accelerator.
In its latest quarterly report, NVIDIA's data-center revenue grew 58% year on year to $792 million, nearly 25% of total revenue and $2.86 billion over the past four quarters. If that growth holds, data-center revenue should reach roughly $4.5 billion in 2019.
AMD, NVIDIA's long-time rival in GPUs, is also actively pushing R&D in AI-accelerated computing. In December 2016 AMD announced Radeon Instinct, an accelerator-card program focused on AI and deep learning.

It must be said that AMD's start in deep learning owes much to China. Baidu was the first Chinese company to adopt AMD Radeon Instinct GPUs in its data centers, and Alibaba later signed on with AMD as well.
For now AMD's GPUs still trail NVIDIA's Tesla V100. But before NVIDIA's next move landed, AMD struck first at its Next Horizon conference, announcing the world's first 7nm GPU, the Radeon Instinct MI60, with memory bandwidth of 1 TB/s. AMD also claims that, thanks to AMD Infinity Fabric Link and other technologies, its 7nm GPU is the fastest double-precision accelerator in the world, delivering floating-point performance of up to 7.4 TFLOPS.

Beyond the GPU chips themselves, AMD is building a more powerful open-source machine-learning ecosystem through its ROCm open software platform.
Although its GPUs cannot yet hold off NVIDIA, AMD has a unique advantage: it makes both GPUs and CPUs, and Infinity Fabric lets the two connect seamlessly, a link that an Intel Xeon processor paired with an NVIDIA GPU can hardly match.
Imagination Technologies is also camped in the GPU market, though it has long specialized in mobile GPUs. Across 2017 and 2018 Imagination announced three new PowerVR graphics processors (GPUs) aimed at the terminal-side AI market.
At the end of last year, Imagination executives revealed in an interview that the company may announce a GPU for AI training.
In AI inference applications, the FPGA offers flexibility and programmability that ASICs lack: it can be reconfigured in real time for specific tasks, and it consumes less power than a GPU.

▲ Flexibility and performance difference of processors
The perennial number one and number two in FPGAs are Xilinx and Intel's Altera. Facing the emerging AI market, their innovative instincts are itching for action.
Xilinx's trump card is called Versal, the industry's first adaptive compute acceleration platform (ACAP). Built on TSMC's 7nm process, it integrates AI and DSP engines, and both its hardware and software can be programmed and optimized by developers.
This weapon took four years to polish, and the AI inference performance of Versal AI Core is said to be eight times that of today's leading GPUs. According to Xilinx, Versal will ship this year.
Some industry insiders believe the Versal series could change the AI inference market.

If NVIDIA opened the door to AI through its native genes, Intel vaulted into the front ranks of cloud AI chips by the shortcut of "buy, buy, buy." A semiconductor overlord for decades, Intel's first goal is to become a generalist.
As everyone knows, Intel's invincible trump card is the Xeon processor. The Xeon is like a brilliant strategist, masterminding from the command tent and able to handle all manner of tasks; but ask him to forge weapons and his efficiency falls far short of a simple-minded warrior with brute strength.
So in the face of AI workloads full of repetitive, simple operations, having Xeon processors handle them is both overkill and inefficient. Intel's approach is to pair the Xeon processor with an accelerator.
No in-house technology for an AI accelerator? With a stroke of the pen, Intel simply buys one.
In December 2015 Intel spent $16.7 billion to acquire Altera, then the number two in programmable logic devices (FPGAs). With the "Xeon + Altera FPGA" heterogeneous combination, Intel has since accelerated some data-center tasks more than tenfold.
Over the past year in particular, Intel's doubling down on FPGAs has been plain to see. Two years ago Intel rolled out the Stratix 10 series, billed as the fastest FPGA chips in history, and the series won Microsoft's favor.
Microsoft launched Project Brainwave, a cloud solution based on Intel Stratix 10 FPGAs, claiming it runs at 39.5 TFLOPS with latency under 1 ms.
Beyond the Stratix 10, Intel set up shop in Chongqing last December, and then in April this year unveiled a new weapon quietly polished for several years: Agilex, an FPGA with a new architecture integrating Intel's most advanced 10nm process, 3D packaging, second-generation HyperFlex and other innovations.

Intel's FPGAs have gained a foothold in the server market, while another major acquisition is still in its incubation period.
In August 2016 Intel spent a reported $300-400 million to buy Nervana, a California startup dedicated to deep-learning hardware. Shortly after the deal, Nervana's former CEO was promoted to general manager of Intel's AI business unit. The first dedicated deep-learning chip, Lake Crest, built on TSMC's 28nm process, went into production in 2018, with claimed performance ten times that of the fastest GPU of the day.
In May 2018 Intel's new cloud AI chip, the Nervana Neural Network Processor (NNP) Spring Crest, was officially unveiled. Its power consumption is said to be under 210 watts, with training performance 3-4 times that of Lake Crest, and it will open to users in the second half of 2019.
On the inference side, Intel revealed at CES in Las Vegas that it is working closely with Facebook on NNP-I, the inference version of the Nervana neural network processor. NNP-I will be a system-on-chip (SoC) built with Intel's 10nm transistors and will include Ice Lake x86 cores.

Comparing against Google's TPU, Carey Kloss, vice president of the Intel Artificial Intelligence Products Group (AIPG) and a core member of the Nervana team, likens TPU 2.0 to Lake Crest and TPU 3.0 to Spring Crest.
Qualcomm, flourishing in mobile chips, has only just laid its first stepping stone into cloud computing and supercomputing.
In April this year Qualcomm announced the Cloud AI 100 accelerator, which extends Qualcomm's technology into the data center; samples are expected to reach customers in the second half of 2019.
The accelerator reportedly builds on Qualcomm's accumulated strength in signal processing and power efficiency and is purpose-built for the fast-growing demand for AI inference in the cloud, letting distributed intelligence spread from the cloud to edge devices and every node in between.

Keith Kressin, Qualcomm's senior vice president of product management, said: "The Qualcomm Cloud AI 100 accelerator will set a new benchmark for AI inference processors in today's data-center industry, no matter what combination of CPUs, GPUs and/or FPGAs is used to build them."
He added that Qualcomm is well positioned to support complete AI solutions from cloud to edge, all of which can connect over 5G with its high speed and low latency.
Compared with the ambitious chip giants facing the cloud and data-center market, the motives of the cross-border players below are relatively "simple."
The goal of these Chinese and American Internet giants is not to compete head-on with NVIDIA, Intel or AMD, but to give their own cloud customers more powerful computing while reducing dependence on traditional chipmakers.
Their self-developed chip routes differ too: Google, Amazon and others chose ASICs, while Microsoft and others bet on field-programmable gate arrays (FPGAs).

▲ Incomplete statistics of major cloud AI chip products of cross-border technology giants
As one of the earliest technology companies to invest in AI research, Google is also the pioneer of dedicated AI chips, and the first to prove that an ASIC could replace the GPU in deep learning.
In 2016 Google launched its own AI chip, the Tensor Processing Unit (TPU), now in its third generation, which powers Google AI applications such as the Google Assistant, Google Maps and Google Translate. The TPU was originally designed for the inference stage of deep learning; newer versions can also handle AI training.
Google claims that training one of its machine-translation systems takes a full day on 32 of the best commercial GPUs, while eight connected TPUs handle the same workload in six hours.
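Taken at face value, those numbers imply a large per-device gap. A quick back-of-the-envelope check, using only the figures quoted above:

```python
# Google's quoted benchmark: 32 GPUs x 24 h vs 8 TPUs x 6 h
gpu_device_hours = 32 * 24   # 768 GPU-hours
tpu_device_hours = 8 * 6     # 48 TPU-hours

# Same workload, so the per-device throughput ratio is:
speedup = gpu_device_hours / tpu_device_hours
print(speedup)  # -> 16.0, i.e. each TPU did the work of ~16 GPUs here
```

The caveat, of course, is that such vendor benchmarks pick a workload (here, TensorFlow-based translation) that favors the home team.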
Google at first ran the device only in its own data centers and did not sell it externally; more recently, however, Google has let other companies buy TPU computing through its cloud service.
The TPU's external market is nonetheless limited: it runs only with Google's TensorFlow AI framework. Users cannot use it to train or run models built with Apache MXNet or Facebook's PyTorch, nor in the non-AI HPC applications where the GPU reigns supreme.
But Google is content with that, because it treats TPU plus TensorFlow as a package in its overall AI leadership strategy: hardware optimized for its software, and software optimized for its hardware, together form a powerful and durable platform.
The new development this year is that Google has set up a chip team called gChips in Bangalore, recruiting at least 16 veterans from traditional chipmakers such as Intel, Qualcomm, Broadcom and NVIDIA.
In May last year Microsoft's Brainwave went into open cloud preview. Microsoft said the FPGAs behind the Project Brainwave computing platform were designed for real-time AI and were five times faster than the TPU chips Google uses. Jason Zander, executive vice president for Microsoft Azure, has also said that Azure has in fact designed many of its own chips for its data centers.

It has to be said that when it comes to naming chips, the domestic technology giants are a full cut above their foreign counterparts in literary refinement.
The "Kunlun" named by Baidu for the cloud AI chip is the first mountain in China. According to legend, the ancestor of this mountain was honored as "the ancestor of thousands of mountains" and "the ancestor of Long Mai" by the ancients, and well-known myths and legends such as the Goddess Chang’e flying to the moon, Journey to the West and Legend of the White Snake are all related to this mountain.
The "rise" of Huawei’s cloud AI chip takes the meaning of transcending the world, rising and imposing, and is quite popular among literati.
Baidu and Huawei are both domestic technology companies that crossed into chips early. Back in August 2017, Baidu unveiled a 256-core FPGA-based cloud computing acceleration chip at the Hot Chips conference in California, with Xilinx as its partner. Huawei started even earlier, founding the semiconductor company HiSilicon in 2004, though HiSilicon long focused on chip solutions for terminal devices.
In the second half of 2018, a new wave of chipmaking led by these two sounded the charge for China's cloud AI chips.
Baidu took to the field early among China's tech giants. It began using FPGAs for AI architecture R&D as early as 2010, started small-scale deployment in 2011, passed the thousand-chip mark in 2015, and by 2017 had deployed more than 10,000 FPGAs, used at scale across Baidu's data centers and autonomous-driving systems.
In August 2017 Baidu released the 256-core, FPGA-based XPU chip, developed in cooperation with Xilinx. Its cores are small, with no cache or operating system, and its efficiency is comparable to a CPU's.

Then, at the Baidu AI Developer Conference in July 2018, Baidu announced Kunlun, the most powerful AI chip in the industry at the time.
On paper, the Kunlun chip is fabricated by Samsung on a 14nm process, with memory bandwidth of 512 GB/s and tens of thousands of cores, delivering 260 TOPS of compute at a power consumption of just over 100 W.
Compare NVIDIA's latest Turing-architecture T4 GPU: the T4's maximum power draw is 70 W, and its peak compute is likewise 260 TOPS. That GPU, however, was released two months after Kunlun and was not initially sold in China. Ouyang Jian, Baidu's chief architect, revealed at this year's AI Chip Innovation Summit that Kunlun will see large-scale use inside Baidu this year.
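On the figures quoted here, raw TOPS are identical, so the comparison hinges on energy efficiency (TOPS per watt). A quick sketch using the article's numbers; note that Kunlun's power is given only as "over 100 W", so treating it as exactly 100 W yields an optimistic upper bound:

```python
# Efficiency comparison on the numbers quoted above.
# Kunlun's power is stated only as "over 100 W"; using 100 W gives
# an upper bound on its real TOPS/W.
kunlun_tops, kunlun_watts = 260, 100
t4_tops, t4_watts = 260, 70

print(kunlun_tops / kunlun_watts)    # -> 2.6 TOPS/W (upper bound)
print(round(t4_tops / t4_watts, 2))  # -> 3.71 TOPS/W
```

So on these vendor figures the T4 holds an efficiency edge even before Kunlun's extra power draw above 100 W is counted.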

Huawei's cloud AI chip, the Ascend 910, went head-to-head with NVIDIA and Google right at its launch event. The Ascend 910 uses the most advanced 7nm process and Huawei's self-developed Da Vinci architecture, with a maximum power draw of 350 W. Huawei billed it as "the chip with the highest compute density on a single die" as of its announcement: its half-precision (FP16) performance reaches 256 TFLOPS, double the 125 TFLOPS of NVIDIA's V100.
Xu Zhijun went further: assemble 1,024 Ascend 910s and you get "the world's largest AI compute cluster to date, with 256P of performance; no matter how complex the model, it can be trained with ease." This large-scale distributed training system is called Ascend Cluster.
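The "256P" figure is easy to sanity-check against the per-chip number; it is consistent if "P" is read in the 1 P = 1024 T convention:

```python
# 1024 Ascend 910 chips at 256 TFLOPS (FP16) each:
total_tflops = 1024 * 256             # 262,144 TFLOPS in total
pflops_binary = total_tflops / 1024   # reading "P" as 1024 T
print(pflops_binary)  # -> 256.0, matching the quoted "256P"
```

(In the decimal convention of 1 P = 1000 T the same cluster works out to about 262 PFLOPS; either way the headline claim follows directly from linear scaling, which real distributed training rarely achieves perfectly.)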

On deployment, Baidu says Kunlun will be widely used in its data centers this year. Huawei's Ascend 910 was originally slated to ship in Q2 this year; amid the trade war, it remains to be seen whether that slips.
As the leaders of the cloud computing markets in China and the United States, Alibaba and Amazon arrived a little late, but neither would be absent.
Both companies' R&D goals are clear-cut: solve the AI inference workloads of commercial scenarios such as image and video recognition and cloud computing, raise efficiency and cut costs.
Alibaba's DAMO Academy announced last April that the performance of its Ali-NPU would be ten times that of mainstream CPU- and GPU-architecture AI chips on the market, at half the manufacturing cost and power consumption, for a price-performance gain of over 40x. The same month, Alibaba wholly acquired Zhongtianwei (C-SKY), mainland China's only independent embedded-CPU IP core company.
The next step came in September, when Alibaba merged Zhongtianwei with DAMO Academy's in-house chip business to form a chip company, Pingtouge. The task of developing the Ali-NPU now rests with Pingtouge. Its first AI chips are expected in the second half of 2019 and will serve cloud scenarios such as Alibaba's data centers, City Brain and autonomous driving, later opening to outside users through Alibaba Cloud.
In simulation tests, the chip's prototype cut the hardware cost of running Alibaba's City Brain by 35%. Since then, however, Alibaba has said little about further progress.
Amazon announced its cloud AI chip, Inferentia, at the re:Invent conference in Las Vegas last November.
The chip's technical roots trace back to Annapurna Labs, an Israeli chip company Amazon acquired for $350 million in early 2015. Officially, each Inferentia chip delivers hundreds of TOPS, and multiple AWS Inferentia chips can be combined for thousands of TOPS. The chip is still in development and is forecast to reach market at the end of 2019.

Facebook's chipmaking plans surfaced early, yet it is the player that has revealed the least.
Besides buying relatively mature chip companies, hiring is the other standing option. Facebook's chip plan first showed itself last April, when its official site posted job ads for ASIC & FPGA design engineers to build a chip team. Three months later Bloomberg reported that Facebook had poached Shahriar Rabii, a senior Google engineer, as vice president and head of silicon.
Yann LeCun, Facebook's chief AI scientist and a winner of the latest Turing Award, revealed in an interview that the chip effort is mainly aimed at the site's future needs for real-time video monitoring.
By January this year, Intel said at CES that it was working with Facebook on a new AI chip to speed up inference, aiming to finish it in the second half of the year.
So far, though, the outside world knows nothing about the Facebook AI chip's performance.
The revival of AI has upended the stable order of an industry long held by top chip companies such as Intel, AMD and Qualcomm, creating openings for a new generation of chip entrepreneurs.
Some startups want to build a new platform from scratch, all the way down to the hardware, optimized end to end for AI workloads, hoping to beat the GPU on speed, power consumption and even the physical size of the chip.

▲ Incomplete statistics of major cloud AI chip products of domestic startups
Start with the domestic cloud AI chip startups, the most dazzling of which are Bitmain and Cambricon.
Bitmain made its name as the leader of the mining-chip industry, but in the past year's bitcoin ebb it was the first dragged into the whirlpool of public opinion, and its listing plan failed to materialize on schedule.
Founded in 2013, the company entered the AI chip business in 2015. After launching its first-generation 28nm cloud AI chip, the BM1680, in 2017, it released the second-generation BM1682 in the first quarter of 2018, an iteration gap of only nine months.
According to the roadmap Bitmain announced last year, the 12nm cloud chip BM1684 was due at the end of 2018 and the BM1686, probably on a 7nm process, in 2019; both chips are late.

Bitmain's fellow AI chip unicorn is Cambricon.
Cambricon's neural network processor (NPU), embedded in the Kirin 970, Huawei's first mobile AI chip, made it the hottest property among AI chip companies at home and abroad. After two rounds of financing, its valuation reached about $2.5 billion (roughly 17 billion yuan).
In May 2018 Cambricon officially released its first-generation cloud AI chip, the MLU100, said to outperform NVIDIA's V100 at lower power. Its customer iFlytek once disclosed test results showing the MLU100's energy efficiency in intelligent speech processing leading international rivals' cloud GPU solutions by more than five times.
A year later, the second-generation cloud AI chip, the Siyuan 270, ran hot before release, with some specs leaked by netizens on Zhihu: peak performance and power consumption roughly on par with NVIDIA's Tesla T4. Industry rumor holds that Cambricon may break through in low-precision training. Barring surprises, the chip will be released in the near future.

The startups benchmarking NVIDIA and Google don't stop there.
A slightly surprising player is Yitu Technology, one of the "four little dragons" of domestic computer vision (CV). In May this year Yitu released questcore, its first cloud AI chip, co-developed with AI chipmaker ThinkForce.
ThinkForce (Yizhi Electronics) is a low-key but formidable AI chip startup in Shanghai. In 2017 it raised a 450-million-yuan Series A from Yitu Technology, Yunfeng Fund, Sequoia Capital and Gaochun Capital. Its core members come from semiconductor giants such as IBM, AMD, Intel, Broadcom and Cadence, each with more than a decade in the chip industry.
This SoC, custom-built for cloud deep-learning inference, uses a 16nm process and a ManyCore architecture with independent intellectual property. It is said to deliver up to 15 TOPS of visual inference performance, accelerating only INT8 data (8-bit integers), with a maximum power draw of just 20 W, less than an ordinary light bulb.
According to Yitu, the chip was developed not to chase NVIDIA-style hundreds of TOPS but to prize high compute density.
Like the cross-border tech giants above, Yitu's first step in commercialization is to sell the chip bundled with its own software, hardware and solutions rather than separately. Second- and third-generation products are already in the works.

Shanghai's other hot new chipmaking force is Suiyuan Technology, arguably the youngest AI chip startup in China. Founded in March 2018, it took 340 million yuan of Pre-A funding from Tencent, focusing its R&D on cloud AI acceleration chips and the surrounding software ecosystem. It was Tencent's first investment in a domestic AI chip venture.
Suiyuan's founding team came mainly from AMD; founder Zhao Lidong previously worked at AMD China and then served as president of RDA (since merged with Spreadtrum to form Ziguang Zhanrui).
On June 6, 2019 Suiyuan announced a new 300-million-yuan round led by Redpoint China Ventures with participation from Haisong Capital and Tencent. The mystery of its high-end deep learning chip has yet to be unveiled.
Unlike the players above, Tianzhixin and Denglin Technology chose the general-purpose GPU route, benchmarking NVIDIA directly.
China has no GPGPU company capable of taking on NVIDIA, which makes it an opening worth attacking for entrepreneurs.
Both companies field seasoned chipmaking lineups: Tianzhixin's hardware team is built around AMD's GPU teams in Shanghai and Silicon Valley, and Denglin's founders are likewise long-time GPU industry veterans.
Tianzhixin's high-, mid- and low-end GPGPU products are under development, with its high-end Big Island chip set to support both cloud inference and training. Denglin's GPGPU processor has passed FPGA verification; the design of its first-generation product, Goldwasser, is complete, and it is slated to be available for customer testing before the end of this year.
There is also a startup called Longjiazhi, founded in July 2017 with backing led by Zhixin Capital and Yiling Capital, committed to developing TPU chips.
To meet requirements for low latency, high reliability and data security, Longjiazhi introduced a new chip category, the Mission-Critical AI Processor. Its first-generation chip, named Dino-TPU, targets cloud data centers first: its compute is claimed to exceed every GPU short of NVIDIA's latest Volta, with latency only one-tenth that of the Volta V100, power consumption of 75 W, and unique redundancy backup and data-security features.
By Longjiazhi's roadmap, the company planned to complete the tape-out of its first chip by the end of 2018.
Across the ocean, many American AI chip startups have likewise set their sights on the cloud and data-center market.
One company with a strong presence last year was Wave Computing. The startup acquired veteran chip-IP supplier MIPS last year and launched the MIPS Open initiative; its cumulative funding has reached $117 million.
Its core product is called Data Stream Processor Unit (DPU), which adopts CGRA (Coarsegrain Reconfigurable Array/Accelerator) technology, and is suitable for large-scale asynchronous parallel computing problems.
Its main advantage is to make hardware more flexible to adapt to software, achieve a good comprehensive balance in programmability (or universality) and performance, lower the threshold of AI chip development, and will not be affected by memory bottlenecks in accelerators such as GPU.
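The dataflow idea behind such chips can be sketched in a few lines: instead of a global program counter, each operation fires as soon as all of its inputs have arrived. The toy Python executor below is purely illustrative of that model; none of these names are Wave Computing APIs, and real CGRA hardware schedules such firings in parallel silicon rather than a software queue.

```python
# Toy dataflow-graph executor: a node "fires" once every input token has
# arrived, with no global program counter. This is a drastically
# simplified software model of asynchronous dataflow execution; all
# names are illustrative, not any vendor's API.
from collections import deque

class Node:
    def __init__(self, name, fn, n_inputs):
        self.name, self.fn, self.n_inputs = name, fn, n_inputs
        self.inputs = {}          # slot index -> arrived value
        self.consumers = []       # (node, slot) pairs fed by this node

    def ready(self):
        return len(self.inputs) == self.n_inputs

def run(sources):
    """Push source values through the graph; return {node name: result}."""
    results, queue = {}, deque()
    for node, value in sources:           # seed the graph with inputs
        node.inputs[0] = value
        queue.append(node)
    while queue:
        node = queue.popleft()
        if not node.ready():
            continue
        out = node.fn(*(node.inputs[i] for i in range(node.n_inputs)))
        results[node.name] = out
        for consumer, slot in node.consumers:
            consumer.inputs[slot] = out   # token arrives; may enable firing
            if consumer.ready():
                queue.append(consumer)
    return results

# Build y = (a + b) * (a + b); independent nodes could fire in parallel.
a = Node("a", lambda v: v, 1)
b = Node("b", lambda v: v, 1)
add = Node("add", lambda x, y: x + y, 2)
mul = Node("mul", lambda x, y: x * y, 2)
a.consumers = [(add, 0)]
b.consumers = [(add, 1)]
add.consumers = [(mul, 0), (mul, 1)]

print(run([(a, 3), (b, 4)])["mul"])   # (3 + 4) * (3 + 4) = 49
```

The point of the sketch is the scheduling discipline: execution order is determined by data availability alone, which is what lets dataflow hardware sidestep the fetch-decode bottlenecks of a conventional instruction stream.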
Wave’s first generation DPU adopts 16nm process technology and runs at a speed above 6 GHz, which has been put into commercial use. According CTO Chris Nicol, its senior vice president and CTO, the new generation of 7nm DPU will introduce MIPS technology and adopt HBM(High Band Memory), which is expected to be released next year.

There is also a rather mysterious startup, Cerebras Systems, founded in California in 2016. Although it has yet to release any product, that has not stopped it from being frequently compared with the chip giants.
Cerebras's founding team comes largely from chip giant AMD. Co-founder and CEO Andrew Feldman previously founded SeaMicro, a low-power server maker that AMD acquired for $334 million in 2012; Feldman then spent two and a half years rising to a vice-president post at AMD.
Cerebras has raised $112 million across three rounds, and its valuation has soared to as high as $860 million. Today Cerebras remains in stealth mode; according to sources, its hardware will be tailored for training deep-learning algorithms.

▲A Cerebras patent for a deep-learning accelerator for neural-network training and inference
Groq, founded in April 2017, has an even more eye-catching team: eight of its core members came from the Google TPU team. The startup announced grand ambitions from day one, with its official website claiming chip computing power of 400 TOPS.
SambaNova Systems was founded seven months after Groq and is headquartered in Palo Alto, California. Its founders include two Stanford professors, Kunle Olukotun and Chris Ré, along with a chip-industry veteran who was formerly Sun's senior vice president of development.
Its Series A was led by GV (Google Ventures), the venture arm of Google's parent company Alphabet; it was the first time GV had invested in an AI chip company. In April this year, Intel Capital announced $117 million of new investment across 14 technology startups, and SambaNova Systems was on the list.
In addition to China and the United States, AI chip startups in other regions are also gaining momentum.
The most favored is Graphcore, a well-funded British unicorn founded in 2016, now valued at $1.7 billion with $312 million in cumulative financing. The startup boasts a formidable investor lineup, including Sequoia Capital, BMW, Microsoft, Bosch and Dell Technologies.
The company has built an Intelligence Processing Unit (IPU) designed specifically for machine-intelligence workloads, supporting on-chip interconnect and on-chip memory, with products ranging from edge devices up to the "Colossus" dual-chip package for data-center training and inference.
Graphcore states on its official website: our IPU systems aim to lower the cost of accelerating AI applications in cloud and enterprise data centers, and to improve training and inference performance by as much as 100x over the fastest systems available today.
At NeurIPS at the end of last year, Graphcore showed an example configuration, the RackScale IPU-Pod: 32 1U IPU-Machines, each containing four Colossus GC2 IPU processors and providing 500 TFLOPS of mixed-precision compute, more than 1.2GB of in-processor memory and over 200TB/s of memory bandwidth.

▲Graphcore's rack-scale IPU-Pod system
Habana Labs, another startup founded in 2016, this one Israeli, announced at last September's AI Hardware Summit that it was ready to launch Goya, its first AI chip, aimed at inference. Goya demonstrated throughput of 15,000 images per second on the ResNet-50 image-classification benchmark, roughly 50% higher than NVIDIA's T4, with latency of 1.3ms and power consumption of only 100W.
Its latest financing, a $75 million Series B in December 2018, was led by Intel Capital; part of the funds will go toward developing its second chip, Gaudi, aimed at the training market, whose training performance is said to scale linearly to more than 1,000 processors.
India's AlphaICs, also founded in 2016, is designing AI chips for what it calls AI 2.0, hoping to enable the next generation of AI with this product line.
One of AlphaICs' co-founders is Vinod Dham, known as the "Father of the Pentium", who has teamed up with a group of young chip designers to create an agent-based AI co-processor called the RAP chip.
Dham says AlphaICs' chips hold an edge over competitors in processing speed, and that while most of what we see today is weak AI, theirs can be called "strong AI".
According to Dham, the RAP chip was expected to launch in mid-2019, "hoping to create a big bang for real AI".
Tenstorrent is a startup based in Toronto, Canada, founded by two former AMD engineers, Ljubisa Bajic and Milos Trajkovic, with a core team drawn largely from NVIDIA and AMD. It is developing high-performance processors designed for deep learning and smart hardware.
Early last year the company received a seed investment from Real Ventures, but it remains in stealth mode.
Among the hardware forces targeting the cloud and data center, one distinctive camp has won favor from technology giants at home and abroad: photonic AI chips.
Unlike conventional chips, these chips transmit signals with photonic rather than electronic circuits, offering higher transmission speed, lower latency and greater throughput.
In 2016, an MIT research team built the first optical computing system; the work was published as a cover article in the top journal Nature Photonics in 2017. It is this paper that inspired people around the world to invest in photonic AI chip R&D.
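The mathematical trick behind that MIT scheme is that any weight matrix factors by singular value decomposition into two unitary matrices and a diagonal: the unitaries can be realized as lossless meshes of Mach-Zehnder interferometers and the diagonal as per-channel attenuation or gain, so a matrix-vector multiply happens as light propagates. The NumPy sketch below shows only the linear algebra, not the optics, and is an illustration of the principle rather than any vendor's design:

```python
import numpy as np

# Any real weight matrix factors as W = U @ diag(s) @ Vh (SVD). In the
# photonic scheme, the unitaries U and Vh map onto interferometer meshes
# and diag(s) onto per-channel attenuation/gain, so W @ x is computed as
# light passes through the three stages.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))   # a layer's weight matrix
x = rng.standard_normal(4)        # input vector (optical amplitudes)

U, s, Vh = np.linalg.svd(W)

# "Optical" pipeline: unitary mesh -> diagonal stage -> unitary mesh.
y_optical = U @ (np.diag(s) @ (Vh @ x))

assert np.allclose(y_optical, W @ x)   # matches the electronic result
```

Because the unitary stages are in principle passive and lossless, the energy cost of the multiply is largely decoupled from the matrix size, which is the efficiency argument these startups make.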
This one MIT team alone incubated two American companies, Lightelligence and LightMatter, both founded in 2017.
In February 2018, Lightelligence received a $10 million seed round from Baidu Ventures and US semiconductor industry executives. In February 2019, LightMatter closed a $22 million Series B led by GV, the venture arm of Google's parent company Alphabet.
Lightelligence claims its photonic circuits can serve not only as CPU co-processors accelerating deep-learning training and inference in the cloud, but also in network edge devices that demand high efficiency and low power consumption.
In April this year, Lightelligence announced that it had completed the world's first photonic chip prototype board, and that its photonic chips are already in discussions with customers of the caliber of Google, Facebook, AWS and BAT.
LightMatter likewise focuses on large cloud data centers and high-performance computing clusters. It has built two early chips, one of which contains more than a billion transistors.
Inspired by the MIT paper, China's first photonic AI chip company was founded in 2017 by PhD students from ten universities, including Tsinghua University, Peking University and Beijing Jiaotong University.
The company received angel-round financing in September 2018. Its photonic chip is said to deliver 1,000 times the performance of an electronic chip at only 1% of the power consumption.
Just this month, Bill Gates made his first AI chip investment, backing Luminous, another company developing silicon photonics. Other investors include 10100, the fund of Uber co-founder Travis Kalanick, and current Uber CEO Dara Khosrowshahi.
Luminous currently has only seven people, but its appetite is anything but small: its goal is a single chip that replaces 3,000 boards of Google's latest Tensor Processing Unit AI chips. Its approach draws on co-founder Mitchell Nahmias's early work on neuromorphic photonics at Princeton University.
The common problem for these startups is that no one knows how long the first mass-produced photonic AI chips will take, nor whether in practical use they can truly displace electronic chips.
Dozens of players have now entered the cloud AI chip market. Yet the overall pattern of the software, hardware and services market, dominated by NVIDIA with the remainder split among semiconductor giants, is still fairly stable, and shifting it will not be easy.
For the chip industry, sufficient production capacity is crucial.
Semiconductor giants can scale production capacity by 10x or 100x, something a startup can hardly do in its early days. Today's startups are mostly fabless IC designers; becoming a "self-sufficient" company like Intel or Samsung could cost billions of dollars.
After the wave of semiconductor industry consolidation in 2015-2016, semiconductor M&A has gradually cooled over the past two years, and large companies will be more cautious about investing in or acquiring chip startups.
The core competitiveness of cloud AI chips lies in talents.
Judging from the cloud AI chip companies drawing the most attention today, their R&D teams are mostly industry veterans with a decade or more at chip giants, often having led the development of successful products.
Both the semiconductor giants and the technology giants crossing over into chip-making essentially follow two paths: investing in or acquiring mature chip companies, and poaching chip executives from other big firms.
Song Jiqiang, president of Intel Labs China, once told Zhidx that the future of AI chips is bound to be diversified, with different products meeting different power, size and price requirements. AI is a marathon, and the race has only just begun.
At this stage, the vast majority of giants and startups entering the cloud AI chip field are flying the banner of innovation, spanning novel architectures, memory technologies and silicon photonics.
Because deep learning has driven a surge in demand for new computing resources, many see this as a rare opportunity for startups to win funding from giants and investment institutions alike.
Although the players keep multiplying and the banners grow more varied, the innovative hardware actually in mass production remains limited. Cloud AI chips still face many hurdles, including the slowing of Moore's Law, a problem across computer architecture, and bottlenecks in semiconductor devices.
Chip development can take years, and most of this hardware is still under development or in early testing, so it is hard to predict which companies will deliver their promised performance.
Broadly, the cloud AI chip market is splitting into three forces: semiconductor giants represented by NVIDIA and Intel; Chinese and American technology giants represented by Google and Huawei; and chip startups represented by Cambricon and Groq. Of these, the semiconductor giants and chip startups focus on general-purpose chips for sale, while the technology giants building chips for their own use will not, for now, sell them externally.
By application, although GPUs' high energy consumption draws growing criticism from the industry, their unmatched parallel computing capability means no player can yet challenge NVIDIA GPUs in cloud AI training. The challengers here are mainly traditional chip giants and startups; cross-industry technology giants include Google, Baidu and Huawei, with general-purpose GPUs and ASICs as the main architectures adopted.
In cloud AI inference, which puts more weight on energy consumption, latency, cost and cost-performance, there are relatively more entrants, and FPGAs and ASICs hold a relative edge over GPUs. Intel, with its comprehensive AI chip portfolio, is gathering momentum, and other players are close behind. Most of the major Chinese and American Internet giants have joined the battle, though the chip-development progress of some remains unknown.
As for strengthening chip-building muscle, most semiconductor and technology giants have chosen the shortcut of investment, M&A and talent poaching, gaining direct support from mature chip teams and quickly filling gaps in talent and business. For startups, two factors basically win over investors: an experienced founding team and products with innovative technology. Judging by the pace of commercialization, China's chip startups rank among the fastest in the world.
Today the vast majority of AI applications still rely on training and inference in the cloud. In training, NVIDIA's solid ecosystem remains an unshakable mountain; in inference, competition is fiercer still. As AI spreads across industries, the cloud AI chip market will gain more room to grow, but it may not accommodate this many players. Capital, device bottlenecks, architectural innovation, keeping pace with fast-changing AI algorithms and building an ecosystem are all hard problems facing these companies. And there is still no consensus on what the ideal AI chip for cloud training and inference looks like.