Mythos can speed up training code ~52x (compared to ~4x for a skilled human in 4-8 hours)
The footnote reads: «How large the speedup gets depends heavily on how much room for improvement the starting code leaves, and it should not be read as a real-world training speedup. So the absolute multiple is not the figure to anchor on here. What is more informative is the like-for-like comparison that this experimental setup makes possible, both across models (~3x to ~52x over the past year) and against a skilled human (~4x in four to eight hours on the same task).»
>>108982265
aitoddlers btfod
>>108982253
Great news. Maybe mathfags at Anthropic will be able to produce code that's only 30 orders of magnitude slower than the norm now!
>>108982253
Is this why 4.7 and 4.8 were worse than 4.5 and 4.6 and why claude code remains a buggy pile of steaming shit?
AI will be effectively useless to all but brown people until it can be run cheaply and efficiently in a local setup.
>goythos
>>108982253
why are you marketing a product which will never be sold to the public?
>look our model that we can't show you did something amazing on some vague metric that we invented!
What's the point of these shill threads?
Also works with CPU design
https://github.com/FeSens/auto-arch-tournament/blob/main/docs/auto-arch-tournament-blog-post.md
LLMs will save the computing world
>>108983247
Move Slow, Snailcat!
Over
>>108982253
We don't need faster, we need smarter. Let us know when it can count the number of days in a week with the letter 'a' without having to make a tool call, or at least make it so that it knows when it's guessing and when it's sure instead of pretending it's sure every time.
>>108983632
Ironically I wrote a simple prototype that hooked up an LLM with a wikidata database and tools for fetching triplets and storing triplets in an overriding 'user database', and that alone was enough to fix virtually everything about it. I'm also working on using this mechanism to add quality/confidence of source to the triplets, which is a bit more challenging, but should allow the LLM to choose not to believe schizobabble over primary sources.
So far it's proven extremely useful in overriding bad assumptions made by the models. You can define new facts and words and it will get them right all the time. Whereas without this hookup, it'll just hallucinate meanings and go off the rails. If it can't find predicates or entities, it always informs the user it doesn't know instead of making it up, which is great. If it finds multiple potentially matching entities, it asks the user which option makes most sense instead of guessing, unless the context is enough to disambiguate (for example, ask gpt 5.5 who is albert einstein and it will immediately go for the scientist; my system will ask if you mean the scientist or the actor now known as Albert Brooks, but if you ask what albert einstein contributed to physics, it will disambiguate and not consider the actor as a potential match).
Also, it was able to bring the answer quality of qwen 3.5 up beyond that of gpt 5.5 xhigh. This is good evidence that the thinking layer can be improved and that improving it has direct, profound impact on downstream quality.
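For anyone who wants to picture the lookup flow described above, here is a minimal sketch. It is not the poster's code: the in-memory `TripleStore`, the `lookup` function, the status strings, and the word-overlap disambiguation are all assumptions made up for illustration, and a real setup would sit on top of an actual Wikidata dump or endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str           # hypothetical identifier, not a real Wikidata QID
    label: str
    description: str


class TripleStore:
    """In-memory stand-in for a triplet database (assumption for the sketch)."""

    def __init__(self):
        self.entities = {}   # entity id -> Entity
        self.triples = {}    # (subject id, predicate) -> list of objects

    def add_entity(self, entity):
        self.entities[entity.id] = entity

    def add_triple(self, subject_id, predicate, obj):
        self.triples.setdefault((subject_id, predicate), []).append(obj)

    def find(self, label):
        wanted = label.casefold()
        return [e for e in self.entities.values() if e.label.casefold() == wanted]

    def objects(self, subject_id, predicate):
        return list(self.triples.get((subject_id, predicate), []))


def lookup(base, user, label, predicate, context=""):
    """Resolve (label, predicate) against a base store and an overriding user store."""
    # Collect candidates from both layers; on an id clash the user layer wins.
    candidates = {e.id: e for e in base.find(label)}
    candidates.update({e.id: e for e in user.find(label)})
    matches = list(candidates.values())
    if not matches:
        return {"status": "unknown_entity"}

    # Naive disambiguation: keep candidates whose description shares a word
    # with the context. A real system would do something much smarter.
    if len(matches) > 1 and context:
        words = set(context.casefold().split())
        narrowed = [e for e in matches
                    if words & set(e.description.casefold().split())]
        if narrowed:
            matches = narrowed

    if len(matches) > 1:
        # Hand the choice back to the user instead of guessing.
        return {"status": "ambiguous",
                "options": sorted(e.description for e in matches)}

    entity = matches[0]
    # User-defined triplets override whatever the base store says.
    values = user.objects(entity.id, predicate) or base.objects(entity.id, predicate)
    if not values:
        return {"status": "unknown_predicate", "entity": entity.id}
    return {"status": "ok", "entity": entity.id, "values": values}
```

The point of returning a status instead of a bare answer is that the model gets an explicit "unknown" or "ambiguous" signal it can relay to the user, rather than an empty result it might paper over.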
>>108983632
>count the number of days in a week with the letter 'a' without having to make a tool call
it can't, because that's not what an LLM/token predictor does. the fact that it can make a tool call is all that matters
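To make the tool-call point concrete, here is a rough sketch of what such a tool could look like on the host side. The `count_days_with_letter` function, the `TOOLS` registry, and the `{"name": ..., "arguments": ...}` call shape are assumptions for illustration only, not any vendor's actual tool-calling API.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def count_days_with_letter(letter):
    """Count English day names that contain `letter` (case-insensitive)."""
    needle = letter.casefold()
    return sum(1 for day in DAYS if needle in day.casefold())


# Hypothetical registry mapping tool names to plain Python functions.
TOOLS = {"count_days_with_letter": count_days_with_letter}


def dispatch(call):
    """Run a tool call shaped like {"name": ..., "arguments": {...}}."""
    return TOOLS[call["name"]](**call["arguments"])
```

The model only has to emit the call; the character-level work happens in ordinary code, where it is exact.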
>>108983705
Yes, this is correct. The whole end-to-end shtick is old deep learning ideas that only mattered in the lab. It was the high water mark of what these things can do, but it has never been an actual thing in production, where these systems were always hooked up with tools like the ones we are now rediscovering and pretending are new. Same with agents, we always had these in practice. Classical answering systems had a router entrypoint that would dispatch to deep learning goal-oriented chatbots. And those systems, within their goal-oriented niches, performed as well as early gpt-4 did, though they were nearly never available to the public. The fact gpt-4 worked about as well despite being general was really cool, though.
>>108982253
>It's hecking AGI i swear.
strawb...
>NOO!!! Not like that!!!
>>108983247
Move slow, Snailcat!