
The latest MLPerf list: Nvidia faces a serious challenge

Date: 06-07-2022

MLPerf, the authoritative benchmark for chip performance in the field of artificial intelligence, has published its latest results. This round of updates covers mainly MLPerf Edge Inference and MLPerf Tiny, and it reflects the latest achievements of new and established chip companies (such as Nvidia, Qualcomm, and Alibaba) in these areas.

Background of MLPerf

With the development of AI, the computational power a chip can deliver on AI workloads has become an important performance indicator. When AI chips first emerged, each company tended to publish whichever performance result was most favorable to it, which made numbers from different vendors hard to compare. For example, it is not easy to directly compare company A's latency on a 1-bit quantized ResNet with company B's results on a 32-bit floating-point VGGNet.

To solve this problem, MLCommons, an industry consortium for machine learning hardware, launched the MLPerf benchmark platform. MLCommons selects representative models for a range of mainstream tasks and defines the details these models must satisfy (e.g., prediction accuracy, quantization methods). Companies run these models on their own chips and submit results, which are verified by MLCommons and officially published on the MLPerf list.
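
The official measurement harness is MLCommons' LoadGen, but the core idea behind, say, the single-stream scenario is simple: issue one query at a time and report a latency percentile. Below is a minimal Python sketch of that measurement; the model and sample set here are placeholders, not part of any real submission.

    import time
    import numpy as np

    def single_stream_latencies(predict, samples, warmup=10):
        """Time one inference per query, back to back, as in the single-stream scenario."""
        for s in samples[:warmup]:          # warm up before timing
            predict(s)
        latencies = []
        for s in samples:
            t0 = time.perf_counter()
            predict(s)                      # exactly one query in flight
            latencies.append(time.perf_counter() - t0)
        return np.array(latencies)

    # Placeholder model and data; a real run would use an MLPerf-defined model.
    lat = single_stream_latencies(lambda x: x * 2, [np.zeros((224, 224, 3))] * 100)
    print(f"90th-percentile latency: {np.percentile(lat, 90) * 1e3:.3f} ms")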

 

The MLPerf list is divided into several sub-lists: data center training, data center inference, edge inference, mobile device inference, and MLPerf Tiny, which targets IoT applications with low-power AI computation (consuming far less power than mobile devices).

The New Landscape of Edge Inference

Edge inference applications have received much attention in the last two years. Edge computing was originally aimed at robotics, 5G, and other applications that require large computing power at low latency. The main reason for the current attention is that smart driving, which relies on edge computing, is moving into the mainstream; edge computing for smart driving is therefore expected to become the next major market for semiconductor chips, attracting the attention of major companies in the industry.

In terms of the competitive landscape, Nvidia has been building for the edge AI computing scenario the longest: it has been launching and iterating relevant chips since roughly 2016 to maintain its leading position, and in the early MLPerf rounds for edge computing, Nvidia was usually in front. Since last year, however, Qualcomm has become a heavyweight competitor in this space with its Cloud AI series of compute acceleration cards for edge computing scenarios.

In the latest MLPerf results, Nvidia and Qualcomm again post the biggest names in edge computing. First, Nvidia released results for its latest Orin family of SoCs, showing a three- to five-fold performance improvement over its predecessor, Xavier. The Orin SoC is a design tailored by Nvidia for scenarios such as robotics and autonomous driving; it integrates a CPU, a GPU, a DLA (deep learning accelerator), and other vision-related accelerator IP so that the SoC can efficiently handle a variety of related tasks. In the published MLPerf scores, Nvidia also noted that these models run on both the SoC's GPU and its DLA, not just on a single IP, which allows higher performance.
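
Nvidia has not detailed its exact harness here, but its public TensorRT API exposes this kind of GPU-plus-DLA placement. The sketch below is one plausible way to build such an engine, assuming TensorRT 8.x on a Jetson/Orin device and a placeholder model.onnx file.

    import tensorrt as trt

    # Build an engine that prefers the DLA and falls back to the GPU for
    # layers the DLA cannot run; "model.onnx" is a placeholder file name.
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open("model.onnx", "rb") as f:
        assert parser.parse(f.read()), "failed to parse the ONNX model"

    config = builder.create_builder_config()
    config.default_device_type = trt.DeviceType.DLA  # place layers on the DLA
    config.DLA_core = 0                              # Orin exposes multiple DLA cores
    config.set_flag(trt.BuilderFlag.FP16)            # DLA runs FP16/INT8, not FP32
    config.set_flag(trt.BuilderFlag.GPU_FALLBACK)    # unsupported layers go to the GPU

    engine = builder.build_serialized_network(network, config)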

 

The highlight for Qualcomm is that more and more companies are using its Cloud AI 100 accelerator card. Since Qualcomm first uploaded Cloud AI 100 results last September, companies beyond Qualcomm itself, such as Gigabyte, Alibaba, and KRAI, have submitted scores using the card, showing that it is gradually gaining acceptance from major system vendors.

So how does the Nvidia Orin compare to the Qualcomm Cloud AI 100? Both chips target edge computing scenarios: Orin's overall system power consumption is 15-40 W, while Qualcomm's accelerator card consumes 15-20 W (note that Orin is an SoC whose figure includes the processor, while Qualcomm's accelerator card must additionally work alongside a host processor), so overall power consumption is very close. The latest MLPerf scores also show the two in the same ballpark.

Qualcomm's performance and power efficiency on the image classification task are strong: running single-stream ResNet at an overall system power of 24 W, it achieves a latency of 0.84 ms, while Orin needs more power (42 W) to reach a similar latency (0.92 ms). Qualcomm's Cloud AI 100 is also ahead in batch throughput, processing 5,849 images per second at 24 W of system power and 9,780 images per second at 36 W; in contrast, Nvidia's Orin processes 4,750 ResNet image classifications per second at 42 W.
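
Dividing the quoted throughputs by the quoted system power makes the efficiency gap explicit; this is a back-of-the-envelope calculation from the figures above, not an official MLPerf metric.

    # Images per second per watt, from the throughput and power figures above.
    results = {
        "Cloud AI 100 @ 24 W": (5849, 24),
        "Cloud AI 100 @ 36 W": (9780, 36),
        "Orin @ 42 W":         (4750, 42),
    }
    for name, (imgs_per_sec, watts) in results.items():
        print(f"{name}: {imgs_per_sec / watts:.0f} images/s per watt")
    # -> roughly 244, 272, and 113 images/s per watt, respectively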

 

However, Nvidia outperforms Qualcomm on the object detection task: Orin runs object detection with the SSD model at a latency of 0.59 ms, compared with 1.7 ms for the Cloud AI 100.

We believe this difference in performance comes from the different architectures of Nvidia and Qualcomm. Nvidia's Orin is an SoC that includes a GPU, which can handle different operators efficiently and flexibly. Qualcomm's Cloud AI 100, by contrast, is an accelerator card dedicated to neural networks: operators it cannot handle must be transferred to the host processor over the PCIe interface for computation, which introduces latency. For tasks such as image classification, where almost all operations happen inside the neural network, the Cloud AI 100 has the advantage and achieves a very high power efficiency ratio; in object detection, however, there are non-standard neural network operators, and here Nvidia's Orin, which handles all kinds of operators more flexibly, has the greater advantage in latency.
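
To make the fallback cost concrete, here is a toy latency model of a partitioned graph. Every number in it is illustrative, chosen only to show the mechanism; none is a measurement of either chip.

    PCIE_HOP_S = 5e-5  # assumed one-way PCIe transfer cost, purely illustrative

    # Each operator: (name, supported on accelerator, accel time s, host time s)
    CLASSIFIER = [("conv", True, 1e-5, 2e-4)] * 50          # fully supported graph
    DETECTOR = CLASSIFIER + [("nms", False, None, 3e-4)]    # one host-fallback op

    def end_to_end_latency(graph):
        total = 0.0
        for _name, on_accel, t_accel, t_host in graph:
            if on_accel:
                total += t_accel
            else:
                # copy activations out over PCIe, compute on host, copy back
                total += 2 * PCIE_HOP_S + t_host
        return total

    print(f"classification-like graph: {end_to_end_latency(CLASSIFIER) * 1e3:.2f} ms")
    print(f"detection-like graph:      {end_to_end_latency(DETECTOR) * 1e3:.2f} ms")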

 

Nvidia and Qualcomm thus represent two different design philosophies: an SoC that flexibly supports all kinds of operators versus an accelerator card that pushes optimization to the limit for one large class of operators. We believe Nvidia will continue along this path, with its strong software ecosystem and broad network compatibility as its main selling points. And as smart driving and other latency-sensitive applications become more important, whether Qualcomm will also launch SoCs specifically for such applications is well worth watching.

Qualcomm does not lack the technical capability here; what it needs is the determination to invest in SoCs specifically for this market (the current Cloud AI 100 is not designed specifically for edge computing but as an acceleration card for both cloud and edge). If Qualcomm does commit to such SoCs, we think it will mark the next stage of the competitive landscape in the edge computing market, with multiple large companies fully committed to investing significant resources in this area. This would in turn drive the development of the edge computing space, since chip computing power in this area can be expected to evolve faster in a more competitive landscape.

Low-power inference: Alibaba shows AI strength

Beyond edge computing, the low-power inference segment (MLPerf Tiny) was another highlight of the latest MLPerf results, because Alibaba submitted results using its self-developed Xuantie processor. According to those results, Alibaba's score is significantly ahead of other companies', which reflects Alibaba's ability to combine software and hardware in the AI space.

Alibaba's submitted scores are based on its Xuantie processor (which uses the RISC-V instruction set architecture). The processor contains no dedicated AI accelerator IP; the models run directly on the CPU. Moreover, Alibaba's submissions do not run each task's standard model directly but use models that Alibaba optimized for the Xuantie processor (while meeting the same prediction accuracy as the standard model in each task), so we believe such a large lead is the result of software-hardware co-optimization. In low-power scenarios, we believe software-hardware co-optimization is the only way to achieve the largest gains in energy efficiency, and with its strong AI R&D capabilities, Alibaba has a big advantage here. At the same time, since Alibaba owns all the design details of the Xuantie CPU, it can design the neural network so that both the operators and the data storage in the model achieve maximum efficiency.
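
As one illustration of what "optimize the model while holding accuracy" can involve, the sketch below quantizes a weight matrix to INT8 (a common step when targeting a CPU without dedicated AI IP) and checks the numerical error this introduces. It is a generic example, not Alibaba's actual toolchain.

    import numpy as np

    def quantize_int8(w):
        """Symmetric per-tensor INT8 quantization."""
        scale = np.abs(w).max() / 127.0
        return np.round(w / scale).astype(np.int8), scale

    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 64)).astype(np.float32)  # toy weight matrix
    x = rng.standard_normal((8, 64)).astype(np.float32)   # toy input batch

    q, scale = quantize_int8(w)
    ref = x @ w.T                                  # float32 reference output
    opt = x @ (q.astype(np.float32) * scale).T     # dequantized INT8 output
    rel_err = np.abs(opt - ref).max() / np.abs(ref).max()
    print(f"max relative error after INT8 quantization: {rel_err:.4f}")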
