Nvidia's KV Cache Transform Coding (KVTC) compresses the LLM key-value (KV) cache by 20x without model changes, cutting GPU memory costs and reducing time-to-first-token by up to 8x for multi-turn AI applications.
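The core idea behind transform coding is generic: move data into a basis where its energy concentrates in a few coefficients, then drop or coarsely quantize the rest. The sketch below illustrates that idea with an orthonormal DCT on a toy 1-D signal; it is an assumption-laden illustration of transform coding in general, not KVTC's actual pipeline (the function names and the choice of DCT are mine, not Nvidia's).

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0] /= np.sqrt(2.0)  # rescale DC row so C @ C.T == I
    return c

def transform_code(x, keep):
    """Transform x, keep only the `keep` largest-magnitude coefficients, invert."""
    C = dct_matrix(x.shape[-1])
    coeff = x @ C.T                          # forward transform
    thresh = np.sort(np.abs(coeff))[-keep]   # magnitude of the keep-th largest
    coeff = np.where(np.abs(coeff) >= thresh, coeff, 0.0)
    return coeff @ C                         # inverse transform

# A smooth signal concentrates its energy in few DCT coefficients,
# so most of them can be discarded with little reconstruction error.
x = np.linspace(0.0, 1.0, 64)
x_hat = transform_code(x, keep=8)
err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

Here 8 of 64 coefficients suffice for a small relative error on a smooth input; cache entries with similar redundancy are what make aggressive compression ratios plausible.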
New "Nota AI MoE Quantization" approach preserves model performance while significantly improving memory efficiency. SEOUL, South Korea, March 5, 2026 ...
This leap is made possible by near-lossless accuracy under 4-bit weight and KV cache quantization, allowing developers to process massive datasets without server-grade infrastructure.
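To make "4-bit weight quantization" concrete, here is a minimal sketch of symmetric per-channel INT4 quantization with NumPy. This is a generic textbook scheme under my own assumptions (per-row max scaling, values in [-8, 7]), not the specific method any of the announcements above uses.

```python
import numpy as np

def quantize_int4_per_channel(w):
    """Symmetric per-output-channel INT4 quantization (codes in [-8, 7])."""
    # Scale each row so its largest magnitude maps to the INT4 limit 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0.0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT4 codes back to floating point."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_int4_per_channel(w)
w_hat = dequantize(q, s)  # reconstruction; per-element error is at most scale/2
```

Storing `q` (4 bits per value) plus one scale per row is what yields the roughly 4x memory reduction over FP16 weights.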
When attempting to quantize Qwen3-Next-80B-A3B-Instruct using the HF PTQ example with INT4 AWQ quantization, the calibration process appears to complete successfully ...
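For context on what AWQ-style calibration is doing, the sketch below shows the activation-aware scaling idea in isolation: scale salient input channels up before rounding and fold the inverse scale into the activations, so the matmul is mathematically unchanged before quantization. The scale heuristic (`|x|^0.5`, mean-normalized) and all names are my own illustrative assumptions, not the HF PTQ example's implementation.

```python
import numpy as np

def quantize_int4(w):
    """Fake-quantize rows to symmetric INT4 and dequantize back."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    return np.clip(np.round(w / scale), -8, 7) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal((8, 32))                        # (out_features, in_features)
x = rng.standard_normal((128, 32)) * np.linspace(0.1, 5.0, 32)  # uneven channels

# AWQ-style idea: boost input channels with large activations before rounding,
# and divide the activations by the same factor so the product is preserved.
s = np.abs(x).mean(axis=0) ** 0.5
s = s / s.mean()

y_ref = x @ w.T
y_plain = x @ quantize_int4(w).T           # quantize weights directly
y_awq = (x / s) @ quantize_int4(w * s).T   # scale-then-quantize
```

Without rounding, the rescaling is an exact identity; with rounding, it shifts quantization error away from the channels whose activations matter most.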
NVIDIA introduces the NVFP4 KV cache, reducing memory footprint and compute cost to improve inference performance on Blackwell GPUs with minimal accuracy loss. In a significant development ...
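A 4-bit floating-point format like FP4 (E2M1: 1 sign, 2 exponent, 1 mantissa bit) represents only eight magnitudes, so quantization amounts to snapping each value onto a scaled copy of that grid. The sketch below does this with a simple per-block max scale; block size and scaling rule are my assumptions and this is not NVIDIA's NVFP4 implementation.

```python
import numpy as np

# Magnitudes representable by an E2M1 4-bit float.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x, block=16):
    """Round each block of values to the nearest point of a scaled FP4 grid."""
    shape = x.shape
    xb = x.reshape(-1, block)
    # One scale per block, chosen so the block max lands on the grid max (6.0).
    scale = np.abs(xb).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale = np.where(scale == 0.0, 1.0, scale)
    idx = np.abs(np.abs(xb / scale)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(xb) * FP4_GRID[idx] * scale).reshape(shape)

x = np.random.randn(256).astype(np.float32)
x_hat = quantize_fp4(x)
```

Note the grid is non-uniform: spacing grows with magnitude, so small values are represented more finely than large ones, which suits the long-tailed distributions typical of KV-cache tensors.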
There is a chance we can all get on the same page as to what really is going on in a process’s dynamic response. There is a lot of confusion that can be resolved if we have a fundamental understanding ...
Mathematical reasoning forms the backbone of artificial intelligence and is central to arithmetic, geometric, and competition-level problems. Recently, LLMs have emerged as very useful ...
Abstract: Directly affecting both error performance and complexity, quantization is critical for MMSE MIMO detection. However, naively pruning quantization levels is ...
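The trade-off the abstract alludes to, that fewer quantization levels cut complexity but raise error, can be seen with a plain uniform quantizer on Gaussian samples. This is a generic illustration of the level-count/distortion trade-off, not the paper's MMSE MIMO detection method.

```python
import numpy as np

def uniform_quantize(x, levels, lo=-4.0, hi=4.0):
    """Uniform quantizer: `levels` equal cells over [lo, hi], inputs clipped to range."""
    step = (hi - lo) / levels
    idx = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (idx + 0.5) * step  # reconstruct at each cell's midpoint

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# Mean squared distortion falls as the number of levels grows; pruning
# levels therefore trades error performance against complexity.
mse = {L: np.mean((x - uniform_quantize(x, L)) ** 2) for L in (2, 4, 8, 16, 32)}
```

For a fixed range, halving the step size roughly quarters the in-range distortion (the familiar `step**2 / 12` rule), which is why naive pruning of levels degrades detection performance quickly.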