What deep learning hardware is there in the field of artificial intelligence?
Deep learning has been everywhere recently: from image classification and speech recognition to image captioning, visual scene understanding, video summarization, speech translation, painting, and even the generation of images, speech, and music. As our homes become more and more intelligent, you will find that many devices need deep learning to continuously collect and process data.
- Graphics processors (GPU)
- Field-programmable logic devices (FPGA, field-programmable gate arrays)
- Custom chips: application-specific integrated circuits (ASIC)
- Digital signal processors (DSP)
- And, of course, the technology of the future: alien artifacts and new laws of physics
GPU
GPUs were first designed to render computer graphics from polygon meshes. In recent years, driven by the demands and growing complexity of computer games and graphics engines, GPUs have accumulated formidable compute power. NVIDIA is the category leader; its GPUs contain thousands of cores whose design aims at near-100% utilization. In practice, these processors are also very well suited to running neural networks and matrix multiplications. Note that matrix-vector multiplication is considered "embarrassingly parallel", because it can be parallelized by simple algorithmic unrolling (its inner loops are short and nearly branch-free, which avoids cache misses).
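To see why matrix-vector multiplication is embarrassingly parallel, note that each output element depends on only one row of the matrix, so every row's dot product could run on its own core with no coordination. A minimal sketch (plain Python, not GPU code):

```python
# Sketch: y = A @ x, where each output element y[i] is an independent
# dot product of row A[i] with x -- all rows can run in parallel.

def matvec(A, x):
    # Each row's dot product could be assigned to a separate GPU core.
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 2],
     [3, 4]]
x = [10, 1]
print(matvec(A, x))  # [12, 34]
```

A GPU effectively performs all of these row computations simultaneously, which is why the operation maps so well onto thousands of simple cores.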
Because GPUs have so many cores (~3,500, versus 16 for an Intel Xeon or 32 for a Xeon Phi), competition between Intel's CPUs and NVIDIA's GPUs has pushed GPU throughput well ahead, even though GPU cores run at clock frequencies two to three times lower than CPU cores. A GPU core is a streamlined version of the more complex CPU core (which adds branch prediction and deep pipelining), but the GPU carries far more of them and supports a much higher degree of parallelism, so it delivers better performance on these workloads.
These GPUs are very good at training deep learning systems, whether convolutional or recurrent neural networks. They can process a batch of 128 or 256 images in just a few milliseconds. But they also consume about 250 watts and require a host computer, which draws roughly another 150 watts; a high-performance GPU system therefore consumes at least 400 watts.
That does not work for augmented-reality glasses, drones, mobile phones, other mobile devices, or small robots. Nor is it acceptable for future consumer-grade self-driving cars.
NVIDIA is trying to develop more efficient devices: for example, the Tegra TX1/TX2 (which can sustain roughly 100 G-ops/s of deep-neural-network compute at about 12 watts of power; the TX2 is more powerful still) and the Drive PX (250 watts, about the same power draw as a Titan X).
It is also important to note that self-driving vehicles and smart cameras must handle live video, so batching images is not an option: video must be processed frame by frame, in real time, to respond in time.
The efficiency of an ordinary GPU is about 5 G-flops/s per watt. If we want to deploy deep learning in mobile systems, we need a better approach!
GPU and CPU
The workload a GPU handles looks like this: an enormous amount of computation, little control complexity, and a great deal of repetition.
CPUs and GPUs differ because they were originally designed for different tasks.
A general-purpose CPU must handle many data types and complex logic, which introduces many branches and interrupts; all of this makes the CPU's internal structure complex. A GPU, by contrast, faces a highly uniform, mutually independent, interrupt-free pure computing workload.
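A tiny illustration of that contrast: the same computation (here ReLU, a common neural-network operation) written CPU-style, with a data-dependent branch per element, and GPU-style, as one uniform expression applied identically to every element. Both sketches are plain Python, shown only to contrast the two shapes of code:

```python
# CPU-style: a data-dependent branch for each element.
def relu_branchy(xs):
    out = []
    for x in xs:
        if x > 0:        # branch taken or not depending on the data
            out.append(x)
        else:
            out.append(0)
    return out

# GPU-style: the same result as one uniform operation applied
# identically to every element ("lane"), with no per-element branching
# in the programmer-visible code.
def relu_uniform(xs):
    return [max(x, 0) for x in xs]

print(relu_branchy([-2, 3, -1, 5]))  # [0, 3, 0, 5]
print(relu_uniform([-2, 3, -1, 5]))  # [0, 3, 0, 5]
```

The branch-free form is what maps cleanly onto thousands of simple GPU cores executing in lockstep.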
FPGA
The modern FPGA devices from Xilinx and other companies are the Lego bricks of electronic components: their circuits can be used as building blocks to construct entire custom microprocessors and complex heterogeneous systems. In recent years, FPGAs have begun to include more and more hard computing blocks. These DSP blocks, as their name suggests, perform multiplication operations and can be chained together to carry out large-scale parallel computation.
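The basic operation of those DSP blocks is the multiply-accumulate (MAC): each block computes `acc + a * b`, and many blocks run in parallel on independent data. A toy software model of an array of such blocks (the function name and shapes are illustrative, not any vendor's API):

```python
# Toy model of an array of FPGA DSP blocks: each (acc, a, b) triple
# models one block performing a multiply-accumulate, acc + a * b,
# with all blocks operating in parallel on independent data.

def mac_array(accs, a_vals, b_vals):
    return [acc + a * b for acc, a, b in zip(accs, a_vals, b_vals)]

accs = [0, 0, 0]
step1 = mac_array(accs, [1, 2, 3], [4, 5, 6])   # [4, 10, 18]
step2 = mac_array(step1, [1, 1, 1], [1, 1, 1])  # [5, 11, 19]
print(step2)
```

Chaining MAC steps like this is exactly how dot products, and hence matrix multiplications, are accumulated across many DSP blocks.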
Custom SoC
AMD, ARM, Intel, Qualcomm, and NVIDIA are all integrating custom accelerator blocks into their existing processing solutions, and Nervana and Movidius (both now part of Intel) have shipped or are developing integrated solutions of their own. On the same process node, the performance of an SoC is roughly 10 times that of an FPGA system, and higher still with a specialized architecture. As the power required by SoC compute units drops lower and lower, the difference will come from the efficient use of new integrated memory systems and of external memory bandwidth. In this category, 3D memory integrated into a system-on-a-package (SoP) can save at least 10x the power.
DSP
DSPs have existed for a long time; they were originally designed to run matrix arithmetic. But so far, DSPs have not delivered truly compelling performance, nor a device that can match the GPU. Why? The main reason is core count. DSPs are mainly used in telecommunication systems, which do not need 16 or 32 cores; the workload simply does not call for it. GPU workloads, by contrast, have kept growing over the past 10 to 15 years.