The recently released DeepSeek V3.1 model has caught the attention of industry watchers for its use of the FP8 precision format (UE8M0 FP8). According to an official announcement from Huawei Computing, KunLun Technology Co., Ltd., a company focused on AI and computing solutions, has developed a soft FP8 solution built on the Ascend C operator programming language for the Ascend AI platform.
Adopting FP8 precision significantly reduces model memory requirements compared with the traditional FP16 or BF16 formats. This eases pressure on server hardware while delivering higher inference accuracy than standard INT8 quantization, striking a balance between cost efficiency and output quality.
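The memory saving is straightforward to estimate: FP8 stores one byte per weight versus two for FP16/BF16. The sketch below illustrates this for a model on the scale of DeepSeek V3.1's publicly reported ~671B total parameters; the function name and the round parameter count are illustrative, not from the announcement.

```python
# Illustrative estimate of weight-storage footprint at different precisions.
# The parameter count is a public round figure for DeepSeek V3.1; overheads
# such as activations and KV cache are ignored.

def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Return weight storage in GiB for a given bytes-per-parameter."""
    return num_params * bytes_per_param / (1024 ** 3)

params = 671_000_000_000          # ~671B total parameters

bf16_gib = weight_memory_gib(params, 2)   # BF16/FP16: 2 bytes per weight
fp8_gib = weight_memory_gib(params, 1)    # FP8: 1 byte per weight

print(f"BF16 weights: {bf16_gib:.0f} GiB")
print(f"FP8 weights:  {fp8_gib:.0f} GiB")
```

At this scale, halving the bytes per weight saves on the order of hundreds of GiB of weight storage, which is what makes single-machine deployment of a full-size model feasible.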
Key Breakthroughs
The newly introduced soft FP8 solution offers two critical advancements:
- Precision Retention: FP8 weight models are loaded onto Ascend hardware and converted to BF16 through dequantization operators, preserving computational accuracy while preparing the platform for upcoming native FP8 models.
- Scalability Across Devices: A single KunLun G8600 machine can run the full version of DeepSeek V3.1 smoothly, while less powerful systems such as the KunLun G5500V2 or G5580 can host models with twice as many parameters as before and serve more concurrent requests.
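Conceptually, the dequantization step multiplies each block of low-precision weights by a shared scale before compute proceeds at higher precision. The sketch below is an assumption about the general mechanism, not Ascend's implementation; it uses an E8M0-style scale (8 exponent bits, no mantissa, bias 127), the encoding behind the UE8M0 FP8 scheme, and hypothetical block values.

```python
# Sketch of blockwise dequantization with a power-of-two block scale in the
# E8M0 style. Not vendor code: function names and data are illustrative.

def e8m0_scale(code: int) -> float:
    """Decode an 8-bit E8M0 exponent code into a power-of-two scale."""
    if not 0 <= code <= 254:      # code 255 is reserved (NaN) in the OCP MX spec
        raise ValueError("invalid E8M0 code")
    return 2.0 ** (code - 127)

def dequantize_block(q_values, scale_code):
    """Multiply a block of low-precision values by its decoded shared scale."""
    s = e8m0_scale(scale_code)
    return [q * s for q in q_values]

# Hypothetical block of values already decoded from FP8 storage,
# with a shared scale code of 125, i.e. 2^(125-127) = 0.25.
block = [1.5, -2.0, 0.5, 3.0]
print(dequantize_block(block, 125))
```

Restricting scales to powers of two keeps the rescaling exact (a pure exponent shift), which is one reason a dequantization path can recover BF16 values without accumulating rounding error from the scale itself.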
Technical Details
KunLun Technology’s solution is built upon three core technologies:
- Custom FP8 Dequantization Operators: Cutting down both memory and bandwidth demands.
- Operator Full Graph Dispatching: Boosting inference efficiency by 32%.
- Seamless Compatibility with Mainstream Models: Ensuring easy support for a range of FP8 models.
This advance in computing infrastructure marks another step forward for AI inference on the Ascend platform. KunLun AI Space's FP8 solution is now fully compatible with DeepSeek V3.1 as well as other leading FP8 models such as DeepSeek-V3/R1 and Qwen3, providing a foundation for future FP8 deployments.
As AI technologies evolve rapidly, this development points toward more efficient resource utilization without compromising performance, opening new possibilities for businesses that want to deploy advanced AI at lower cost.
Source: ithome.com