So what's the catch? The model has to be baked into silicon.
https://www.reddit.com/r/singularity/comments/1r9frzk/taalas_llms_baked_into_hardware_no_hbm_weights/
Link to Taalas
https://taalas.com/
Their demo, the agent "Jimmy":
https://chatjimmy.ai/
Anyway, this is why NVIDIA wants the government to mandate constantly updooted "ethics". Obviously, if it's baked in, that will make it a lot harder. So, they want to use the law to make Taalas illegal.
NVIDIA's way is SLOW.

>>108920736
buy an ad (for your indian slop company)

>>108920828
You're a shill of nvidia. Nvidia is the enemy of fast llms.

Imagine if this were true of any game.
>45x faster performance in Cyberpunk 2077
>using special card
>and it only works with the one game on it
It would be big news, but somehow it's an llm and everyone ignores it?
I think that most yt and press are just bribed by NVIDIA, unironically.
45x.
That means nvidia is obsolete - for LLM. It's already ogre.

>>108920876
(bearing in mind you have to train llm using gpus - but not for running them! not anymore, anyway)

it's a neat proof of concept but i don't think it'll scale
8b retard model is one thing, burning a multitrillion param monster on silicon is something else entirely
the sram/big chip companies like cerebras, groq, and matx are far likelier to eat up a lot of the inference market - which is why nvidia threw 20b at groq
also you sound a bit retarded, op

>>108920736
>So what's the catch?
Can I buy that card?

>>108920876
>lets just bake it into silicon and have a two-year rollout to ship a model on a card that will be two years out of date by the time the cards are on the shelves
sage goes in all fields

>>108920736
fuck llms and nvidia, this bubble should just pop and get over it
/thread

>>108920908
2 more weeks

>>108920891
>it's a neat proof of concept but i don't think it'll scale
but
>Taalas HC1
>Fabricated by TSMC using its 6nm process node
Come on man, they can scale it.

>>108920895
>Can I buy that card?
I wish. I want one too. It's so fast you could put it in front of rendering a frame and still get 30fps.

>>108920908
Your response doesn't reflect an understanding of the capacities of llm, and as such it comes off as about 2 years out of date.

bizzzzz links
https://www.datacenterdynamics.com/en/news/ai-chip-startup-taalas-raises-169m-unveils-hc1-processor-optimized-for-llama-31-8b/
https://www.forbes.com/sites/karlfreund/2026/02/19/taalas-launches-hardcore-chip-with-insane-ai-inference-performance/

>>108920930
how are you going to fit llms that are 200x the size onto a chip, bro.
and also >>108920903
maybe when we have actual ASI it'll be worth burning an ASI chip, but until then no one's paying for faster inference on a 2 year old model
the sram chips are fast enough for now, have room to push further, and you can throw new models on to them without losing too much

>>108920736
seems pretty useless

>>108920982
>how are you going to fit llms that are 200x the size onto a chip, bro.
You can't, but we don't need 200x, not now.
google gemini:
>A hypothetical "HC-Next" could cram an estimated 25 GB to 30 GB model directly onto a single, maximum-reticle monolithic N2 chip.
well ok what might fit...
>gemma-4-26B-A4B-it-Q8_0.gguf
>26.9 GB
wow.
My dream of gemma 4 31B bf16 isn't coming soon, but this is still a huge "wow" situation.

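A rough sketch of the size arithmetic in the post above, in Python. The 25 GB to 30 GB on-chip capacity is the Gemini estimate quoted in the post, not a Taalas spec; the 26.9 GB file size is taken from the post; the helper names `weights_gb` and `fit_status` and the use of decimal gigabytes are assumptions made for illustration.

```python
# Back-of-envelope check: which model sizes would fit a hypothetical
# 25-30 GB on-chip budget (the estimate quoted in the post, not a spec).

CAPACITY_LOW_GB = 25.0   # lower end of the quoted estimate
CAPACITY_HIGH_GB = 30.0  # upper end of the quoted estimate


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB, ignoring metadata and padding."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fit_status(size_gb: float) -> str:
    """Classify a model size against the quoted capacity range."""
    if size_gb <= CAPACITY_LOW_GB:
        return "fits"
    if size_gb <= CAPACITY_HIGH_GB:
        return "fits only near the top of the estimate"
    return "does not fit"


gemma_26b_q8_file_gb = 26.9             # file size quoted in the post
gemma_31b_bf16_gb = weights_gb(31, 16)  # bf16 stores 16 bits per weight

print(fit_status(gemma_26b_q8_file_gb))
print(f"{gemma_31b_bf16_gb:.0f} GB: {fit_status(gemma_31b_bf16_gb)}")
```

Under these assumptions the 26.9 GB Q8_0 file only fits if the real capacity lands in the upper part of the range, and 31B at bf16 needs about 62 GB of weights, roughly double the estimate.
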
>>108921001
>not using a jailbreak
why?

>it's a 100m model

>>108920982
>>108920982
>but until then no one's paying for faster inference on a 2 year old model
I would. The ability to run an llm as an interpreter for each frame of a game is just one idea.

>>108921066
pig hands

I tried it and it couldn't do a simple Lorentz transformation. It just started answering randomly, giving 5 different answers 5 different times

>>108920736
This is impressive and will probably have some useful niche applications but it's not scalable to replace regular GPUs. Anthropic is renting SpaceX's colossus datacentre, which began construction in September of 2024, for over a billion dollars a month. How much would they be willing to pay for it if all it could run was a 45x faster version of Claude 3 Opus? Probably not much. Let alone if they had to use a model from when the H100 GPUs were released (2022), let alone when those GPUs started development. In the same way OpenAI's current cheapskate model (GPT-5.4 mini, $4.50 / 1M output tokens) is smarter than their top-tier model from 2 years ago (gpt-4o, $15.00 / 1M output tokens), since the world of atoms progresses so much slower than the world of bits, it's not worth it to get better hardware at the expense of exponentially outdated software

>>108921049
there's no market for gemma 4 chips in 2027/28.
there isn't a market big enough to mass produce gemma chips today and sell them at a cost that's affordable for consumers.
you know what there is a market for? kimi 2.6 (1tn params) running at ~1k toks/s. that's why cerebras ipo'd at 60B mcap.

>>108921103
There is from me. I would like a Taalas HC1.
I'm literally dreaming about it.
Older/dumber llms are highly useful.
Here's an example.

>>108921099
>but it's not scalable to replace regular GPUs
>>108921049
:^)

>>108921173
and how much would you be willing to pay for this? 30k? 300k?

Rate my dotfiles?
https://github.com/foolish-dev/niri-dotfiles

>>108921203
So, for me I'm in at $2,000, basically what the 5090 should have cost. But the fires/meltdown meant I wasn't going to go that way.

>>108921203
>>108921238
google gemini (free edition):
>The estimated manufacturing and Bill of Materials (BOM) cost for the Taalas HC1 chip is approximately $400 to $600 per card. By physically hardwiring the model's weights into transistors, the HC1 eliminates the need for expensive High Bandwidth Memory (HBM) and complex packaging.

>>108921197
I'm not sure people like you are a big enough consumer base that this company is even going to do anything other than B2B sales in large batches. Even if they did, you're neglecting the fact that these would likely cost several times as much as a regular GPU. So your choice would be between buying a card to run Gemma 4 26b Q8 incredibly quickly, or buying a similarly priced GPU that can run some future 100 billion parameter model at regular speed. Again, I think this is cool, but come on.

llms being baked into chips only makes sense when a single frontier model isnt replaced in 6 months by something better
eventually all datacenters will use those to reduce compute and electricity footprint, but only when new model development slows down

>>108921240
>chip is approximately $400 to $600 per card
at what run size, bro lol

>>108920736
Damn it is really fast

all inferences are llms being baked into the chip, are you guys retarded?

>>108921269
this, shit will be obsolete in a couple of months

>>108921345
the foremost advantage would be something like a mobile AI model you can bring with you anywhere at low wattage
some good enough local model being baked in at 200b parameters or something

>>108920736
>Taalas
moar liek Saarlas

>>108921345
I wouldn't mind paying like $200 every few months so I can run the latest chatgpt model locally at these speeds

>>108921269
>but only when new model development slows down
anon, I... you know this has already happened, right? we're done.. there are no more breakthroughs
words could only get us so far...

>>108921393
>we're done.. there are no more breakthroughs
t. everyone in history ever when talking about technological innovations
and yet here we are

>>108921368
>Ljubisa Bajic
idk, here he is.

If this technology is so bad then why is NVIDIA lobbying Drumpf to ban it?

>>108921380
This also applies to games. Paying for a "game card" that's actually for a specific game makes sense, if it's a good game.
I would buy a "skyrim card".
>but it's not the latest version
well, I mean, there are down sides but man, I have heard people complaining while on a cruise ship. You're not eating out of a garbage bin.

>>108920736
Whatever model they baked into this is easy to fool

>>108921638
Llama 3.1 8B
This is exactly how nvidia is attempting to block them. If they make it against the law to have "unsafe" replies, then they can keep out competitors.
Obviously, the obligation shouldn't be entirely on the provider. The user and their family matter.

My jewgle friend had this to say about it

>>108921762
That sounds inaccurate. "gemini" isn't a specific model.

>>108921762
So while literally it's an "asic" - it's not like typical asics, anyway:
>The chip uses a "Mask ROM recall fabric" where the 8 billion parameters of the Llama 3.1 8B model are represented by the physical configuration of transistors and metal interconnects. There is no "loading" of weights; the data flow is the computation.

>>108921638
>any special indigenous population that lives anywhere on the planet is called a thing : that thing is a slur now
t. llm
cool

>>108921820
so basically its a silicon brain with 0 neuroplasticity like some old grandma with dementia who cant remember what happened yesterday
i guess Grandma learned to code

>>108921916
promptlets think

>>108920876
>It would be big news,
well where is the card that makes games run 45x faster? no i don't care about llm garbage

>>108921999
It's a paradigm shift. It's beyond the imagination of the gaming industry rn.

>>108920876
You say that but hasn't this always been the case? With things like PhysX, CUDA, et al.?
Only works with their card, and only on things that choose to support it.
I'm not defending this or saying it's good, I'd rather have open general solutions, but I also don't think this is at all new for them.
Vendor lock-in exclusivity bullshit.

>>108920876
>insane fps
>devs forced to remove all bugs before the game releases because they cant just patch it out later
BUT
>games cost $700 each
>you can't pirate the game anymore

>>108920736
for the nth fucking time: TRAINING+INFERENCE is far more resource-intensive than INFERENCE. what big tech companies are doing is TRAINING+INFERENCE. why? because they are also training the models with your own inputs...
what this hardware (and the company using it) probably does is INFERENCE ONLY.
please learn how this shit works.
and yes, it is possible that this is a thing.

>>108920876
A card like this would be expensive enough that we'd unironically need to go back to an arcade-style model for it to make sense. I don't think any game company would go that far anyway, realistically we could already get a big boost from going back to making every game in assembly, but all the people who know enough x86 assembly to pull this shit off already have higher paying jobs than gamedev doing other shit; Now think about how there's even less people capable of designing complex silicon than that

>>108922023
>you can't pirate the game anymore
Seems like llms make everything piratable.

>>108922023
The game won’t need to cost as much as a 4070 if it runs 45x faster since you would need a 4070 to run it

>>108920876
It would be 'big news' because everyone would be calling it retarded and a waste of money

>>108922033
Yeah, companies often resist change. I would be shocked if any of them survive the transition.
>>108922025
I don't really care. Like fast thing. simple as.

>>108922025
also, let us know when they make a similar chip for deepseek r2
>>108922049
>I don't really care. Like fast thing. simple as.
ok, good for you. go buy one of these devices then, should work just fine.

>>108922045
I vibe asked gemini free how much people pay for ms improvements. it said this re the jump from premium 1000hz to ultra premium 8000hz mice:
>Value Metric: $34.28 spent per 1 ms saved.
Gamers will be willing to pay for custom cards for each game they play.

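One way to read the $34.28 figure quoted above, sketched in Python. This assumes "ms saved" means the difference in polling interval between a 1000 Hz and an 8000 Hz mouse; the post does not say how Gemini derived the number, and the implied price premium is back-calculated here, not a quoted price.

```python
# Polling interval is the reciprocal of the polling rate.
def polling_interval_ms(rate_hz: float) -> float:
    return 1000.0 / rate_hz


# 1.0 ms at 1000 Hz minus 0.125 ms at 8000 Hz
ms_saved = polling_interval_ms(1000) - polling_interval_ms(8000)

dollars_per_ms = 34.28  # figure quoted in the post
implied_premium = dollars_per_ms * ms_saved  # back-calculated, an assumption

print(f"saved per poll: {ms_saved} ms")
print(f"implied price premium: ${implied_premium:.2f}")
```

On that reading the whole premium buys 0.875 ms per poll, which works out to a price gap of about $30 between the two mice.
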
>>108922059
>because you can't buy it at microcenter it's not real technology

>>108920876
that was called arcade boards, specially designed to run one game well, then computers got fast so 90's arcades were just desktop PC's with a monitor in a sawdust and Elmer's glue box

>>108922090
mmhm

>>108922080
learn basic logic before claiming anything retarded or just making shit up, RETARD

I like ENTJ people, sparingly. It's especially un-nice when their iq isn't quite as high as your own, because they never know it.

>>108922106
same with INTPs but theyre even more diehard about thinking their half-baked ideas are correct. they literally dopamine goon repetitive slop all day and think that somehow theyre still genius

>>108920891
Yeah just like gpt-1 was a proof of concept? Did you try this shit? The model is retarded for sure but so were all of them, and it’s fast. Like, it’s not even fast, it’s instantaneous

I need to bench llama-31-8b on my machine.
But if we consider what's coming, ram is expensive, and it could take a long time for consumer hardware to actually advance in terms of ram, and other components have skyrocketed.
I think this is the perfect storm for Taalas.

>>108920736
Why the fuck are they using shit outdated model to demonstrate it
Use Qwen 3.6 or Gemma 4 holy shit

>>108920736
Nvidia will just buy them out again just like Groq, won't they?

>>108921049
The largest models almost never fit in a single GPU's vram, instead they use horizontal scalers, like nvidia nvl-servers, that are a generalization of the nvlink technology.
You can load a 1.5T model using about 16 GPUs with 96GB of vram each, so their chip could theoretically be used to run these types of models. Also once you have the base model, you can use loras and set up agents. For agents the model speed is crucial, because a lot of tasks are sequential and can't be parallelized.

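A quick sanity check on the "1.5T model on about 16 GPUs" figure, as a Python sketch. It counts weight storage only and assumes 8-bit weights to match the post's number; KV cache, activations, and per-device overhead are ignored, and `gpus_needed` is a hypothetical helper, not part of any NVIDIA or Taalas tooling.

```python
import math


def gpus_needed(params: float, bytes_per_weight: float, vram_gb: float) -> int:
    """Minimum device count to hold the weights alone (decimal GB)."""
    weight_bytes = params * bytes_per_weight
    return math.ceil(weight_bytes / (vram_gb * 1e9))


PARAMS = 1.5e12  # 1.5T parameters, from the post
VRAM_GB = 96     # per-GPU memory, from the post

print(gpus_needed(PARAMS, 1, VRAM_GB))  # 8-bit weights
print(gpus_needed(PARAMS, 2, VRAM_GB))  # bf16 weights
```

The post's figure of about 16 GPUs matches 8-bit weights; at bf16 the same model needs about 32 devices, before any memory is set aside for the KV cache.
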

>8B
It's a freaking 1B model

>>108923515
even blacked well only has 100B tranny sisters in it. obviously that's all they're going to fit in it.

>Artificial permanent retrograde amnesia.
They should have called it "Leonard", not "Jimmy".