/pol/ - Politically Incorrect


Thread archived.




File: 1760908160774953.jpg (143 KB, 735x920)
A new pooling system called "Ahegao" achieves the same results as the older setup while using 82% less Nvidia GPU time

https://www.tomshardware.com/tech-industry/semiconductors/alibaba-says-new-pooling-system-cut-nvidia-gpu-use-by-82-percent

The advancements are detailed in a research paper (linked at Tom's site) presented at the 2025 ACM Symposium on Operating Systems Principles (SOSP) in Seoul.

Unlike training-time breakthroughs that chase model quality or speed, Ahegao is an inference-time scheduler designed to maximize GPU utilization across many models with bursty or unpredictable demand. Instead of pinning one accelerator to one model, Ahegao virtualizes GPU access at the token level, allowing it to schedule tiny slices of work across a shared pool.
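The token-level pooling described above can be roughly sketched as a scheduler that interleaves single-token generation steps from many models' request queues onto one shared GPU, instead of pinning one model per accelerator. This is a minimal illustrative sketch only — all class and model names are hypothetical, and the real system's GPU memory and KV-cache management is not shown:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    model: str         # which model this request belongs to
    tokens_left: int   # tokens still to generate


class TokenLevelScheduler:
    """Hypothetical sketch: time-slice one GPU across requests for
    many different models, one token-sized unit of work at a time."""

    def __init__(self):
        self.queue = deque()

    def submit(self, req: Request):
        self.queue.append(req)

    def run(self):
        trace = []
        while self.queue:
            req = self.queue.popleft()
            trace.append(req.model)     # one token-sized slice of GPU work
            req.tokens_left -= 1
            if req.tokens_left > 0:
                self.queue.append(req)  # yield the GPU back to the pool
        return trace


sched = TokenLevelScheduler()
sched.submit(Request("qwen-72b", 2))
sched.submit(Request("small-chat", 3))
order = sched.run()
print(order)
```

The point is the interleaving: no model monopolizes the accelerator between tokens, so bursty, low-traffic models can share hardware that would otherwise sit idle.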

This means one H20 could serve several different models simultaneously, with system-wide "goodput" -- a measure of effective output -- rising by as much as nine times compared to older serverless systems.

The system was tested in production over several months, according to the paper. During that window, the number of GPUs needed to support dozens of different LLMs -- ranging in size up to 72 billion parameters -- fell from 1,192 to just 213.

The paper does not break down which models contributed most to the savings (the Nvidia GPUs were not all the same model either; they had purchased whatever they could get from a pool of Nvidia cards from 2020 onwards).
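The headline percentage follows directly from the GPU counts reported in the paper:

```python
# GPU counts before and after deploying the pooling system, per the paper
before, after = 1192, 213

saving = 1 - after / before
print(f"{saving:.1%}")  # roughly 82% fewer GPUs for the same workload
```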
>>
>>519440646
So? How does this affect me?
>>
Wait why are my google results all hentai?
>>
Promote peace and love and unity and it's nice to see japanese east asian men fuck blonde white girls, brunette white girls, ebony black girls, ginger white girls, asian girls, brown girls, latina girls, mixed race girls, and so on
>>
>>519440712
don't you want to use LLMs, ChatGPT, whatnot?
>>
>>519440712
You vill own nothing a lot sooner than expected.
>>
>>519441147
no?
>>
File: 1712558773905578.gif (1017 KB, 498x345)
>>519440646
So is this good or bad
>>
>>519443053
>>519442935
see, the thing is you can in fact run those large language chatbot models with less hardware than previously thought

it means two things:
1) you can create even bigger models
2) or run the same old models in places where it wasn't possible before due to computing power restrictions




