A new pooling system called "Ahegao" achieves the same results as older systems while using about 82% less Nvidia GPU compute time.
https://www.tomshardware.com/tech-industry/semiconductors/alibaba-says-new-pooling-system-cut-nvidia-gpu-use-by-82-percent
The work is detailed in a research paper (linked at Tom's site) presented at the 2025 ACM Symposium on Operating Systems Principles (SOSP) in Seoul.
Unlike training-time breakthroughs that chase model quality or speed, Ahegao is an inference-time scheduler designed to maximize GPU utilization across many models with bursty or unpredictable demand. Instead of pinning one accelerator to one model, it virtualizes GPU access at the token level, scheduling tiny slices of work across a shared pool. One H20 can therefore serve several different models simultaneously, with system-wide "goodput" (a measure of effective output) rising by as much as nine times compared to older serverless systems.
The system was tested in production over several months, according to the paper. During that window, the number of GPUs needed to serve dozens of different LLMs, ranging in size up to 72 billion parameters, fell from 1,192 to just 213. The paper does not break down which models contributed most to the savings; the GPUs were not all the same model either, since Alibaba had bought whatever it could get from a pool of Nvidia parts from 2020 onward.
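The core idea is simple enough to sketch. This is a toy illustration of token-level time-slicing, not the actual system from the paper: a scheduler interleaves single-token decode steps from many models' request queues on one shared device, instead of parking one GPU per model. All names and numbers below are made up for the example.

```python
# Toy sketch of token-level GPU time-slicing (NOT the real Ahegao
# implementation): one "GPU" is shared by interleaving one-token
# decode steps from requests belonging to different models.
from collections import deque

class Request:
    def __init__(self, model, tokens_needed):
        self.model = model          # which LLM this request targets
        self.remaining = tokens_needed

def schedule(requests):
    """Round-robin one decode step (one token) per request per slice,
    so bursty models share the same accelerator instead of each
    idling on a dedicated one. Returns which model held the GPU
    at each time slice."""
    queue = deque(requests)
    timeline = []
    while queue:
        req = queue.popleft()
        timeline.append(req.model)  # this slice of GPU time goes to req's model
        req.remaining -= 1          # pretend we decoded one token
        if req.remaining > 0:
            queue.append(req)       # not done yet, rejoin the queue
    return timeline

# Three different models share one GPU; no model blocks the others.
reqs = [Request("qwen-7b", 3), Request("llama-13b", 2), Request("qwen-72b", 1)]
print(schedule(reqs))
# → ['qwen-7b', 'llama-13b', 'qwen-72b', 'qwen-7b', 'llama-13b', 'qwen-7b']
```

The point of the interleaving: if one model's traffic goes quiet, its slices simply go to the others, which is where the utilization win over one-GPU-per-model deployments comes from.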
>>519440646
So? How does this affect me?
Wait why are my google results all hentai?
Promote peace and love and unity, and it's nice to see Japanese East Asian men fuck blonde white girls, brunette white girls, ebony black girls, ginger white girls, Asian girls, brown girls, Latina girls, mixed race girls, etc.
>>519440712
Don't you want to use LLMs, ChatGPT, whatnot?
>>519440712
You vill own nothing a lot sooner than expected.
>>519441147no?
>>519440646
So is this good or bad?
>>519443053
>>519442935
See, the thing is you can in fact run those large language chatbot models with less hardware than previously thought. It means one of two things:
1) build even bigger models on the same hardware
2) or deploy the same old models in places where it wasn't possible before due to compute restrictions