Lowering the cost of inference is typically a combination of hardware and software. A new analysis released Thursday by Nvidia details how four leading inference providers are reporting 4x to 10x ...
Modal Labs, a startup specializing in AI inference infrastructure, is talking to VCs about a new round at a valuation of about $2.5 billion, according to four people with knowledge of the deal. Should ...
The focus of this new AI accelerator is inference: the production deployment of AI models in applications. Its architecture combines high compute performance with a newly designed memory system and a ...
The creators of the open source project vLLM have announced that they transitioned the popular tool into a VC-backed startup, Inferact, raising $150 million in seed funding at an $800 million ...
Nvidia (NVDA) has backed Baseten, a startup focused on providing inference for artificial intelligence applications, in its latest funding round, according to the Wall Street Journal. Baseten recently ...
Google researchers have warned that large language model (LLM) inference is hitting a wall amid fundamental problems with memory and networking, not compute. In a paper authored by ...
In recent years, the big money has flowed toward LLMs and training; but this year, the emphasis is shifting toward AI inference. LAS VEGAS — Not so long ago — last year, let’s say — tech industry ...
AI storage firm Vast Data has launched a native integration of its operating system on Nvidia BlueField-4 DPUs in a bid to service inference sessions in the agentic era. Leveraging those DPUs ...
In a pivotal move that could reshape the AI hardware landscape, Nvidia has reportedly secured approximately 90% of the workforce from AI chipmaker Groq, including its CEO and the renowned inventor of ...
The option to reserve instances and GPUs for inference endpoints may help enterprises address scaling bottlenecks for AI workloads, analysts say. AWS has launched Flexible Training Plans (FTPs) for ...
Chipmaker Nvidia has reported revenue of $57bn for its third-quarter 2026 filing, with its datacentre business contributing the most to the company’s bottom line, posting revenue of $51bn – a 66% ...