/g/ - LLMs at 45x (Nvidia wants to ban it) - Technology

File: Screenshot from 2026-05-2(...).png (49 KB, 733x325)

LLMs at 45x (Nvidia wants to ban it) Anonymous 05/27/26(Wed)16:01:16 No.108920736 Archived

So what's the catch? The model has to be baked into silicon.

https://www.reddit.com/r/singularity/comments/1r9frzk/taalas_llms_baked_into_hardware_no_hbm_weights/

Link to Taalas
https://taalas.com/

Their demo, the agent "Jimmy":
https://chatjimmy.ai/

Anyway, this is why NVIDIA wants the government to mandate constantly updooted "ethics". Obviously, if it's baked in, that will make it a lot harder. So, they want to use the law to make Taalas illegal.

NVIDIA's way is SLOW.

Anonymous 05/27/26(Wed)16:15:09 No.108920828

>>108920736
buy an ad (for your indian slop company)

Anonymous 05/27/26(Wed)16:18:58 No.108920855

>>108920828
You're a shill of nvidia. Nvidia is the enemy of fast llms.

Anonymous 05/27/26(Wed)16:21:21 No.108920876

Imagine if this were true of any game.

>45x faster performance in Cyberpunk 2077
>using special card
>and it only works with the one game on it

It would be big news, but somehow it's an llm and everyone ignores it?

I think that most yt and press are just bribed by NVIDIA, unironically.

45x.

That means nvidia is obsolete - for LLM. It's already ogre.

Anonymous 05/27/26(Wed)16:22:21 No.108920885

>>108920876
(bearing in mind you have to train llm using gpus - but not for running them! not anymore, anyway)

Anonymous 05/27/26(Wed)16:23:07 No.108920891

it's a neat proof of concept but i don't think it'll scale
8b retard model is one thing, burning a multitrillion param monster on silicon is something else entirely
the sram/big chip companies like cerebras, groq, and matx are far likelier to eat up a lot of the inference market - which is why nvidia threw 20b at groq

also you sound a bit retarded, op

Anonymous 05/27/26(Wed)16:23:37 No.108920895

>>108920736
>So what's the catch?
Can I buy that card?

sage 05/27/26(Wed)16:24:53 No.108920903

>>108920876
>lets just bake it into silicon and have a two-year rollout to ship a model on a card that will be two years out of date by the time the cards are on the shelves
sage goes in all fields

Anonymous 05/27/26(Wed)16:25:30 No.108920908

>>108920736
fuck llms and nvidia, this bubble should just pop and get over it

/thread

Anonymous 05/27/26(Wed)16:26:25 No.108920913

>>108920908
2 more weeks

Anonymous 05/27/26(Wed)16:28:49 No.108920930

>>108920891
>it's a neat proof of concept but i don't think it'll scale
but
>Taalas HC1
>Fabricated by TSMC using its 6nm process node

Come on man, they can scale it.

Anonymous 05/27/26(Wed)16:29:50 No.108920938

>>108920895
>Can I buy that card?
I wish. I want one too. It's so fast you could put it in front of rendering a frame and still get 30fps.

Anonymous 05/27/26(Wed)16:31:24 No.108920957

>>108920908
Your response doesn't reflect an understanding of the capacities of llm, and as such it comes off as about 2 years out of date.

Anonymous 05/27/26(Wed)16:32:53 No.108920973

bizzzzz links
https://www.datacenterdynamics.com/en/news/ai-chip-startup-taalas-raises-169m-unveils-hc1-processor-optimized-for-llama-31-8b/

https://www.forbes.com/sites/karlfreund/2026/02/19/taalas-launches-hardcore-chip-with-insane-ai-inference-performance/

Anonymous 05/27/26(Wed)16:34:09 No.108920982

>>108920930
how are you going to fit llms that are 200x the size onto a chip, bro.
and also >>108920903
maybe when we have actual ASI it'll be worth burning an ASI chip, but until then no one's paying for faster inference on a 2 year old model
the sram chips are fast enough for now, have room to push further, and you can throw new models on to them without losing too much

Anonymous 05/27/26(Wed)16:36:11 No.108921001

File: 20260527223537002398.jpg (89 KB, 1052x1300)

>>108920736
seems pretty useless

Anonymous 05/27/26(Wed)16:41:29 No.108921049

>>108920982
>how are you going to fit llms that are 200x the size onto a chip, bro.
You can't, but we don't need 200x, not now.

google gemini:
>A hypothetical "HC-Next" could cram an estimated 25 GB to 30 GB model directly onto a single, maximum-reticle monolithic N2 chip.

well ok what might fit...

>gemma-4-26B-A4B-it-Q8_0.gguf
>26.9 GB

wow.

My dream of gemma 4 31B bf16 isn't coming soon, but this is still a huge "wow" situation.
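
For a rough sanity check of the sizes quoted above, a weights-only footprint is just parameter count times bits per weight. The sketch below is a back-of-envelope estimate, not anything Taalas has published: the function names are made up for illustration, the 25 to 30 GB budget is the Gemini guess quoted in the post, and real quant formats such as Q8_0 carry some per-block overhead, so actual file sizes can differ from the flat 8-bit figure.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only size in decimal GB: params * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8


def fits_on_chip(params_billion: float, bits_per_weight: float, budget_gb: float) -> bool:
    """True if the weights alone fit in the assumed on-chip budget."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    budget_gb = 30.0  # upper end of the quoted 25-30 GB guess (an assumption)
    for name, params_b, bits in [
        ("26B at 8-bit", 26, 8),
        ("31B at bf16", 31, 16),
    ]:
        size = weight_footprint_gb(params_b, bits)
        print(f"{name}: {size:.1f} GB, fits={fits_on_chip(params_b, bits, budget_gb)}")
```

Under those assumptions the 26B model at 8 bits lands around 26 GB and squeezes in, while 31B at bf16 needs about 62 GB, roughly double the guessed budget.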

Anonymous 05/27/26(Wed)16:42:29 No.108921051

>>108921001
>not using a jailbreak
why?

Anonymous 05/27/26(Wed)16:44:15 No.108921066

File: youre what the french cal(...).jpg (147 KB, 1440x1081)

>it's a 100m model

Anonymous 05/27/26(Wed)16:44:58 No.108921076

>>108920982
>but until then no one's paying for faster inference on a 2 year old model

I would. The ability to run an llm as an interpreter for each frame of a game is just one idea.
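
To put the per-frame idea in numbers: at a fixed frame rate, the token budget per frame is just throughput divided by fps. This is a minimal sketch; the throughput figure in the example is a made-up placeholder, not a measured HC1 number, and it ignores prompt processing and any latency outside generation.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def tokens_per_frame(tokens_per_second: float, fps: float) -> float:
    """How many tokens can be generated within one frame at a given throughput."""
    return tokens_per_second / fps


if __name__ == "__main__":
    assumed_tps = 15_000  # hypothetical throughput, purely for illustration
    for fps in (30, 60):
        print(
            f"{fps} fps: {frame_budget_ms(fps):.1f} ms per frame, "
            f"{tokens_per_frame(assumed_tps, fps):.0f} tokens per frame"
        )
```

Whether a few hundred tokens per frame is enough depends entirely on what the model is asked to do each frame, and the whole frame budget would also have to cover rendering.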

Anonymous 05/27/26(Wed)16:45:58 No.108921085

>>108921066
pig hands

Anonymous 05/27/26(Wed)16:46:59 No.108921091

I tried it and it couldn't do a simple Lorentz transformation. It just started answering randomly, giving 5 different answers 5 different times

Anonymous 05/27/26(Wed)16:47:31 No.108921099

>>108920736
This is impressive and will probably have some useful niche applications but it's not scalable to replace regular GPUs. Anthropic is renting SpaceX's colossus datacentre, which began construction in September of 2024, for over a billion dollars a month. How much would they be willing to pay for it if all it could run was a 45x faster version of Claude 3 Opus? Probably not much. Let alone if they had to use a model from when the H100 GPUs were released (2022), let alone when those GPUs started development. In the same way OpenAI's current cheapskate model (GPT-5.4 mini, $4.50 / 1M output tokens) is smarter than their top-tier model from 2 years ago (gpt-4o, $15.00 / 1M output tokens), since the world of atoms progresses so much slower than the world of bits, it's not worth it to get better hardware at the expense of exponentially outdated software

Anonymous 05/27/26(Wed)16:47:45 No.108921103

>>108921049
there's no market for gemma 4 chips in 2027/28.
there isn't a market big enough to mass produce gemma chips today and sell them at a cost that's affordable for consumers.
you know what there is a market for? kimi 2.6 (1tn params) running at ~1k toks/s. that's why cerebras ipo'd at 60B mcap.

Anonymous 05/27/26(Wed)16:59:16 No.108921173

File: Screenshot from 2026-05-2(...).png (142 KB, 738x868)

>>108921103
There is from me. I would like a Taalas HC1.

I'm literally dreaming about it.

Older/dumber llms are highly useful.

Here's an example.

Anonymous 05/27/26(Wed)17:02:48 No.108921197

>>108921099
>but it's not scalable to replace regular GPUs
>>108921049

:^)

Anonymous 05/27/26(Wed)17:03:06 No.108921203

>>108921173
and how much would you be willing to pay for this? 30k? 300k?

The Fool !OFoXTHUGNs 05/27/26(Wed)17:07:20 No.108921231

Rate my dotfiles?

https://github.com/foolish-dev/niri-dotfiles

Anonymous 05/27/26(Wed)17:08:46 No.108921238

>>108921203
So, for me I'm in at $2,000, basically what the 5090 should have cost. But the fires/meltdown meant I wasn't going to go that way.

Anonymous 05/27/26(Wed)17:09:47 No.108921240

>>108921203
>>108921238
google gemini (free edition):
>The estimated manufacturing and Bill of Materials (BOM) cost for the Taalas HC1 chip is approximately $400 to $600 per card. By physically hardwiring the model's weights into transistors, the HC1 eliminates the need for expensive High Bandwidth Memory (HBM) and complex packaging.

Anonymous 05/27/26(Wed)17:14:03 No.108921265

>>108921197
I'm not sure people like you are a big enough consumer base that this company is even going to do anything other than B2B sales in large batches. Even if they did, you're neglecting the fact that these would likely cost several times the amount as a regular GPU. So your choice would be between buying a GPU to run Gemma 4 26b Q8 incredibly quickly, or buying a similarly priced GPU that can run some future 100 billion parameter model at regular speed. Again, I think this is cool, but come on.

Anonymous 05/27/26(Wed)17:14:23 No.108921269

llms being baked into chips only makes sense when a single frontier model isnt replaced in 6 months by something better

eventually all datacenters will use those to reduce compute and electricity footprint, but only when new model development slows down

Anonymous 05/27/26(Wed)17:15:00 No.108921271

>>108921240
>chip is approximately $400 to $600 per card
at what run size, bro lol

Anonymous 05/27/26(Wed)17:17:24 No.108921294

>>108920736
Damn it is really fast

Anonymous 05/27/26(Wed)17:22:01 No.108921341

all inferences are llms being baked into the chip, are you guys retarded?

Anonymous 05/27/26(Wed)17:22:41 No.108921345

>>108921269
this, shit will be obsolete in a couple of months

Anonymous 05/27/26(Wed)17:24:12 No.108921360

>>108921345
the foremost advantage would be something like a mobile AI model you can bring with you anywhere at low wattage

some good enough local model baked in at 200b parameters or something

Anonymous 05/27/26(Wed)17:25:03 No.108921368

>>108920736
>Taalas
moar liek Saarlas

Anonymous 05/27/26(Wed)17:27:09 No.108921380

>>108921345
I wouldn't mind paying like $200 every few months so I can run the latest chatgpt model locally at these speeds

Anonymous 05/27/26(Wed)17:29:07 No.108921393

>>108921269
>but only when new model development slows down
anon, I... you know this has already happened, right? we're done.. there are no more breakthroughs
words could only get us so far...

Anonymous 05/27/26(Wed)17:31:19 No.108921407

>>108921393
>we're done.. there are no more breakthroughs
t. everyone in history ever when talking about technological innovations
and yet here we are

Anonymous 05/27/26(Wed)17:32:38 No.108921415

File: Screenshot from 2026-05-2(...).png (387 KB, 748x938)

>>108921368
>Ljubisa Bajic
idk, here he is.

Anonymous 05/27/26(Wed)17:37:06 No.108921435

File: 1748902990637125.png (956 KB, 1080x762)

If this technology is so bad then why is NVIDIA lobbying Drumpf to ban it?

Anonymous 05/27/26(Wed)18:03:35 No.108921572

>>108921380
This also applies to games. Paying for a "game card" that's actually for a specific game makes sense, if it's a good game.

I would buy a "skyrim card".
>but it's not the latest version

well, I mean, there are down sides but man, I have heard people complaining while on a cruise ship. You're not eating out of a garbage bin.

Anonymous 05/27/26(Wed)18:14:41 No.108921638

File: 1758877815398314.png (160 KB, 806x1194)

>>108920736
Whatever model they baked into this is easy to fool

Anonymous 05/27/26(Wed)18:17:10 No.108921654

>>108921638
Llama 3.1 8B

This is exactly how nvidia is attempting to block them. If they make it against the law to have "unsafe" replies, then they can keep out competitors.

Obviously, the obligation shouldn't be entirely on the provider. The user and their family matter.

Anonymous 05/27/26(Wed)18:35:21 No.108921762

File: Screenshot_20260527_173316-1.png (206 KB, 1232x235)

My jewgle friend had this to say about it

Anonymous 05/27/26(Wed)18:42:06 No.108921806

>>108921762
That sounds inaccurate. "gemini" isn't a specific model.

Anonymous 05/27/26(Wed)18:43:56 No.108921820

>>108921762
So while literally it's an "asic" - it's not like typical asics, anyway:

>The chip uses a "Mask ROM recall fabric" where the 8 billion parameters of the Llama 3.1 8B model are represented by the physical configuration of transistors and metal interconnects. There is no "loading" of weights; the data flow is the computation.

Anonymous 05/27/26(Wed)18:53:22 No.108921879

>>108921638
>any special indigenous population that lives anywhere on the planet is called a thing : that thing is a slur now t. llm
cool

Anonymous 05/27/26(Wed)18:59:59 No.108921916

>>108921820
so basically its a silicon brain with 0 neuroplasticity like some old grandma with dementia who cant remember what happened yesterday
i guess Grandma learned to code

Anonymous 05/27/26(Wed)19:12:29 No.108921993

>>108921916
promptlets think

Anonymous 05/27/26(Wed)19:14:22 No.108921999

>>108920876
>It would be big news,
well where is the card that makes games run 45x faster?

no i don't care about llm garbage

Anonymous 05/27/26(Wed)19:16:19 No.108922013

>>108921999
It's a paradigm shift. It's beyond the imagination of the gaming industry rn.

Anonymous 05/27/26(Wed)19:17:56 No.108922021

>>108920876
You say that but hasn't this always been the case? With things like PhysX, CUDA, et al.?
Only works with their card, and only on things that choose to support it.
I'm not defending this or saying it's good, I'd rather open general solutions, but I also don't think this is at all new for them.
Vendor lock-in exclusivity bullshit.

Anonymous 05/27/26(Wed)19:18:04 No.108922023

>>108920876
>insane fps
>devs forced to remove all bugs before the game releases because they cant just patch it out later
BUT
>games cost $700 each
>you can't pirate the game anymore

Anonymous 05/27/26(Wed)19:19:37 No.108922025

>>108920736
for the nth fucking time: TRAINING+INFERENCE is far more resource-intensive than INFERENCE. what big tech companies are doing is TRAINING+INFERENCE. why? because they are also training the models with your own inputs...
what this hardware (and the company using it) probably does is INFERENCE ONLY.
please learn how this shit works.
and yes, it is possible that this is a thing.

Anonymous 05/27/26(Wed)19:20:25 No.108922033

>>108920876
A card like this would be expensive enough that we'd unironically need to go back to an arcade-style model for it to make sense.
I don't think any game company would go that far anyway, realistically we could already get a big boost from going back to making every game in assembly, but all the people who know enough x86 assembly to pull this shit off already have higher paying jobs than gamedev doing other shit; Now think about how there's even less people capable of designing complex silicon than that

Anonymous 05/27/26(Wed)19:20:32 No.108922034

>>108922023
>you can't pirate the game anymore
Seems like llms make everything piratable.

Anonymous 05/27/26(Wed)19:22:39 No.108922042

>>108922023
The game won’t need to cost as much as a 4070 if it runs 45x faster since you would need a 4070 to run it

Anonymous 05/27/26(Wed)19:23:25 No.108922045

>>108920876
It would be 'big news' because everyone would be calling it retarded and a waste of money

Anonymous 05/27/26(Wed)19:23:48 No.108922049

>>108922033
Yeah, companies often resist change. I would be shocked if any of them survive the transition.

>>108922025
I don't really care. Like fast thing. simple as.

Anonymous 05/27/26(Wed)19:25:31 No.108922059

>>108922025
also, let us know when they make a similar chip for deepseek r2

>>108922049
>I don't really care. Like fast thing. simple as.
ok, good for you. go buy one of these devices then, should work just fine.

Anonymous 05/27/26(Wed)19:27:58 No.108922074

>>108922045
I vibe asked gemini free how much people pay for ms improvements. it said this re the jump from premium 1000hz to ultra premium 8000hz mice:
>Value Metric: $34.28 spent per 1 ms saved.

Gamers will be willing to pay for custom cards for each game they play.
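
The quoted figure can be reverse-engineered from polling intervals. A 1000 Hz mouse reports every 1 ms and an 8000 Hz mouse every 0.125 ms, so the saving is 0.875 ms. The sketch below backs out the price gap that the $34.28 per ms number implies; the helper names are illustrative and the price gap is inferred from the arithmetic, not something stated in the post.

```python
def polling_interval_ms(rate_hz: float) -> float:
    """Time between reports for a device polled at rate_hz."""
    return 1000.0 / rate_hz


def implied_price_gap(dollars_per_ms: float, slow_hz: float, fast_hz: float) -> float:
    """Price difference implied by a dollars-per-ms-saved figure."""
    saved_ms = polling_interval_ms(slow_hz) - polling_interval_ms(fast_hz)
    return dollars_per_ms * saved_ms


if __name__ == "__main__":
    gap = implied_price_gap(34.28, slow_hz=1000, fast_hz=8000)
    print(f"implied price gap: about ${gap:.2f}")
```

That works out to a premium of roughly $30 for under a millisecond of polling latency, which is the kind of spend the post is extrapolating from.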

Anonymous 05/27/26(Wed)19:28:59 No.108922080

>>108922059
>because you can't buy it at microcenter it's not real technology

Anonymous 05/27/26(Wed)19:30:46 No.108922090

>>108920876
those were called arcade boards, specially designed to run one game well, then computers got fast so 90's arcades were just desktop PCs with a monitor in a sawdust and Elmer's glue box

Anonymous 05/27/26(Wed)19:31:58 No.108922093

>>108922090
mmhm

Anonymous 05/27/26(Wed)19:33:07 No.108922098

>>108922080
learn basic logic before claiming anything retarded or just making shit up, RETARD

Anonymous 05/27/26(Wed)19:34:18 No.108922106

I like ENTJ people, sparingly. It's especially un-nice when their iq isn't quite as high as your own, because they never know it.

Anonymous 05/27/26(Wed)19:38:14 No.108922138

>>108922106
same with INTPs but theyre even more diehard about thinking their half-baked ideas are correct. they literally dopamine goon repetitive slop all day and think that somehow theyre still genius

Anonymous 05/27/26(Wed)19:39:30 No.108922143

>>108920891
Yeah just like gpt-1 was a proof of concept? Did you try this shit? The model is retarded for sure but so were all of them, and it’s fast. Like, it’s not even fast, it’s instantaneous

Anonymous 05/27/26(Wed)19:42:15 No.108922157

I need to bench llama-31-8b on my machine.

But if we consider what's coming, ram is expensive, and it could take a long time for consumer hardware to actually advance in terms of ram, and other components have skyrocketed.

I think this is the perfect storm for Taalas.

Anonymous 05/27/26(Wed)21:12:38 No.108922549

>>108920736
Why the fuck are they using shit outdated model to demonstrate it
Use Qwen 3.6 or Gemma 4 holy shit

Anonymous 05/27/26(Wed)21:17:16 No.108922570

>>108920736
Nvidia will just buy them out again just like Groq, won't they?

Anonymous 05/27/26(Wed)21:33:10 No.108922620

>>108921049
The largest models almost never fit in a single GPU's vram; instead they're scaled horizontally across systems like nvidia's NVL servers, which generalize the nvlink technology.

You can load a 1.5T model using about 16 GPUs with 96GB of vram each (at roughly 8 bits per weight), so their chip could theoretically be used to run these types of models. Also, once you have the base model, you can use loras and set up agents. For agents, model speed is crucial, because a lot of tasks are sequential and can't be parallelized.
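
The 16-GPU figure checks out if the weights are stored at about one byte each. A rough sketch, counting weights only (no KV cache, activations, or framework overhead, all of which would push the count up); the function name is made up for illustration.

```python
import math


def gpus_needed(params_trillion: float, bits_per_weight: float, vram_gb_per_gpu: float) -> int:
    """Minimum number of GPUs whose combined vram holds the weights alone."""
    weights_gb = params_trillion * 1000 * bits_per_weight / 8  # decimal GB
    return math.ceil(weights_gb / vram_gb_per_gpu)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"1.5T params at {bits}-bit: {gpus_needed(1.5, bits, 96)} x 96GB GPUs")
```

At 8 bits the weights come to 1500 GB against 16 x 96 = 1536 GB of vram, so the fit is tight; at 16 bits the same model would need twice as many cards.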

Anonymous 05/28/26(Thu)01:20:05 No.108923515

File: 1760555085157639.png (22 KB, 702x453)

>8B
It's a freaking 1B model

Anonymous 05/28/26(Thu)01:47:47 No.108923643

>>108923515
even blacked well only has 100B tranny sisters in it. obviously that's all they're going to fit in it.

Anonymous 05/28/26(Thu)02:03:40 No.108923725

File: 1779948110735-019e6d2c-4a(...).jpg (167 KB, 1024x1024)

>Artificial permanent retrograde amnesia.
They should have called it "Leonard", not "Jimmy".