Nvidia's KV Cache Transform Coding (KVTC) compresses the LLM key-value (KV) cache by 20x without model changes, cutting GPU memory ...
DirectStorage 1.4 brings key upgrades to the API, including support for Zstandard compression and CreatorID for ...