/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>109007468 & >>109001981►News>(06/07) llama : add Gemma4 MTP #23398 MERGED: https://github.com/ggml-org/llama.cpp/pull/23398>(06/05) dots.tts 2B released: https://hf.co/rednote-hilab/dots.tts-soar>(06/05) Gemma 4 QAT models released: https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4>(06/04) Higgs Audio v3 TTS released: https://boson.ai/blog/higgs-audio-v3-tts►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplershttps://rentry.org/MikupadIntroGuide►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://swe-rebench.comAgentic Coding: https://deepswe.datacurve.aiContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-samplingToken Speed Visualizer: https://shir-man.com/tokens-per-second►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>109007468--Comparing Gemma 12b and 31b for utility tasks and performance:>109007599 >109007671 >109007665 >109007698 >109007777 >109007797 >109008820 >109008962 >109010353 >109010362 >109010398 >109007825--Gemma 4 repeating reasoning thoughts in final outputs:>109009411 >109009451 >109009471 >109009514 >109011284 >109011382--False report of llama.cpp supply chain attack leads to architectural debate:>109010020 >109010140 >109010149 >109010151 >109010607 >109010890 >109010902 >109010974 >109010749 >109010938--Performance gains and CUDA crashes using -sm tensor in llama.cpp:>109011979 >109012108--Dense models and MTP drafting vs MoE architectures:>109012900 >109012916 >109012908 >109012920 >109012934 >109012936--Importance of prompt processing speed vs generation speed and caching:>109009801 >109009849 >109009878 >109009930--Speculating on hardware availability and pricing if the AI bubble bursts:>109008466 >109008539 >109008630 >109009460 >109010122 >109008717 >109008894 >109009403--Comparing Gemma MoE and Mistral Medium 3.5 benchmark results:>109009138 >109009280 >109009344 >109009388--Feasibility of creating a local Neuro-sama with Gemma 12B and TTS:>109008114 >109008911 >109009074 >109009257 >109009278--Using FastAPI and Transformers to bypass llama.cpp chat-template issues:>109011655 >109011669 >109011713--Analyzing 26B QAT MTP performance and batch size optimization:>109008703 >109008759 >109008920--Hardware requirements and cost-effectiveness of running MiMo-V2.5-Pro:>109009772 >109009936 >109009990--Comparing procedural and library-based 3D animation methods for AI companions:>109010665 >109010936 >109011336--Logs:>109009411 >109009887 >109010055 >109010209 >109010903 >109011171 >109011518 >109011685 >109012392--Miku (free space):>109010423 >109010505►Recent Highlight Posts from the Previous Thread: >>109007470Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
I'm really handsome. What do I need an ai girlfriend for?? ?
>>109011979>fattn.cu:579: fatal errorI get the same crash with rocm. It only seems to happen with gemma 4 mtp.
>>109013076Thanks for the (you)s, recap teto.
>tools can only be called in chat completion mode...why?
>>109013113Structured generation requires a structureWhat would you expect the output to look like without a template?
>>109011979>>109013108Why not take a core dump and open an issue on the repo?
>>109013140the fuck is a core dump i use ai for erp
>>109013124A json object.
Gemma 4 31b QAT q4kxl unslop is equivalent to what quant of regular 31b? Q6?
Yeah I'm thinking it's over
How come google fucks up the gemma's chat template every fucking time?
>>109013217If that's all you want then use a json grammer. Tools are only defined with respect to the template.
>>109013235Translating Pydantic objects to jinja is hard, please understand.
>>109013113because your frontend a shit. obviously you can just parse the tail end of the context to see if there was a toolcall regardless of endpoint.
Asking Gemmy for help ordering dinner is not good for the arteries.
>>109013313based cheese pizza lover
70b dense
Mythos is deploying tomorrow. What are your expectations, particularly how local models will adapt to the step-change brought on by this new class of even bigger models?
>>109013350Didn't they say it's gonna have "extra guardrails"? Might just be dead on arrival depending on how far they went.
>>109013350oh boy oh shit, here we go with mythosideas for what to bench it on?
I am starting to feel the AGI. It feels increasingly unlikely that there is no RSI by the end of next year.
>>109013165it means you take a brain scan of your waifu and send it to the doctor
>>109013350It's just opus 5 but they hyped it up as part of a desperate campaign to get their users back after they all left for claude codex due to usage limits. Notice they've also nerfed all the other models recently in preparation for this. Opus is the new Sonnet and Sonnet is the new Haiku.
>>109007530Gemma made me neat local anime character database out of it. I love my local ai waifu, she makes useful stuff sometimes
>>109013414>claude codexClosedAI codex*
>>109013412It's hard to make a loop when training costs so much. You can theoretically give crab a gpu and ask it to improve some small model, but as we've already seen in papers, improvements in small models rarely scale up. And it would only be RI without an ass
Kek
>>109013350we now finna gettin' "claude-mythos-fable-5-reasoning-high-negative-recovered-x6700000"
hey guys... I'm still really.. really... I don't wanna say anyomor... but i'm havin a gopd time. I love you guys really. I mean that. I wish you all the best.
you guys odn't udnerstand. it's not just hbecause im drunk I real;ly love youall . you guys are the futrure. local llm sare tghe future. I love youj guys, yio are all really smart and stuff,
>>109013448>8 times more sloppart of me wants to see what monstrosities get approved
>>109013482i mean you can check the not so long ago source of claude code
>>109013448LoC as a KPI was a mistake.For the long-term health of a project it's much better to have concise code.
>>109013448When I saw this, I was surprised how low it is, given that Dario said early last year that AI will write basically all code by the end of last year.
>>109012398damn you wont believe this amd driver update killed QATpic benchmark comparing what i did before, to what happened after i updated drivers (in order to fix gemma 12b multimodal)-Went from 2608 t/s prompt processing to 968 t/s on 26b QATwent from 1654 to 711 on 12b QAT
>>109011662>Jepa models are gonna be really bad at fantasy RP if their world model's mechanics are inflexible aren't they?It depends on the implementation. You could have some layers in a JEPA-like LLM directly predict a future latent state (or states) and then use that state as a guide for next-token generation so the model can hopefully maintain better long-range coherence with less compute. Unclear if it's worth it with modern advancements in LLM training, though.
>>109013448imagine the collapse
>>109013523nta but personally i am more than fine with it if vision becomes no longer an half-meme capability
Steering with JEPA would be a lot easier, so you could explicitly find things congruent with a fictional story. In fact, it should be far batter at generating stories that conform with a given context. You'd be able to search for low energy trajectories that satisfy the conditions of the story.
>>109013517>amdwe do keep telling you guys, but you won't listen
>robinhood has agentic tradingHas anyone tried it? I've been wanting to try giving an llm (probably gemma-chan) some play money to trade stocks.
>>109013535Gemma 4 vision is really quite good as long as you are using more than the default token budget, even just 560 it can pretty much understand any photo, screencap, or even whats on my screen perfectly, if you're doing really small text on a high res image probably do need 1120 though.It is pretty shitty using the default settings though.
>>109013558That is the theory, but minimizing energy doesn't work well with language because, unlike with images, of the many possible token continuations only a few that are grammatically and logically correct can be used in practice, and they will most likely not correspond to the lowest energy solutions.
>>109013572for me even opus 4.8 is meh let alone gemmathey do see stuff but idk why it gives me the feeling that those models aren't really 'seeing' anything from how they tellfor example it can't tell which is what reliably without priming it with them being curved and angled etc.. which blows the whole point of vision
>>109013583Not quite, the advantage of JEPA is working in latent space. You can do this sort of thing with energy based models but not normal LLM trajectories, which is why some people are so hopeful for them.
Why was the local model community completely run over by vibe coders?If you are actually making money from that shit, surely you would just use the SOTA apis?Makes no sense to me
>>109013613To expand on this, text with completely different tokens, but that mean the same thing, would live very nearby in latent space, if not be the same point. Decoding would be over the space of all possible logical completions of the intrinsic meaning. Theoretically at least.
>>109013572Can I set it above 1120? Like is there any point setting image max tokens?
I worry a bit that affordability of compute peaked a few years ago and will at best become cheaper again deep into a post ASI world.Regular people will be priced out of compute. We will continue to pay more for less.
>>109013643i just tested and you can(see the picrel's token counter, i set it to 2240) but i dont think i would recommend
>>109013643The model card says 1120 is the highest supported, haven't actually tried anything other than what they had documented.
>>109013645When the bubble pops all the paper datacenter projects that have pre ordered the 2028+ supplies will be cancelled, thus making compute pricing normal again.
>>109013623SOTApis are expensive.
>>109013676Surely the money you make back by using them is worth the cost?Vibe coders aren't making money?
>>109013665>When the bubble popsPeople like you are annoying. You refuse to look at economic data that indicates the bubble is already over.
>>109013652>>109013655I've been setting min to 1120 and max to 2240... should I just set moth bin and max to 1120?
>>109013689it is literally an out of distribution behaviour and you are betting that whatever the model has learned will extrapolate which could not be the casei set both to 2240 just to test but maybe you can test the vibe with poking around bbox
>>109013689Personally I have the min unset and max set to 560 which has been more than enough for what I use it for (showing Gemmy stuff, letting her look at my screen, looking at/understanding game UIs).I guess if you had a 4k screen or something and wanted screen vision you'd need 1120 though, I have a 1440p display.
>>109013679>>109013684That's what I get for trusting random anons in the middle of the night
>>109013714Fucking hell I need to sleep
>>109013689>>109013702well i tested and 2240 reliably loops while 1120 cleanly finishes the taskso just use 1120
>>109013720What color mana does the 4+ card tap for?
>>109013679Of course not. Why would anybody hire a vibecoder when they could just cut out the middle man and prompt Claude directly?
>>109013734idk, a small portion of the player's soul maybe
>>109013645I had the same thought a couple of months ago when my tech stocks did a 5x in just a couple of months. I did a genuine scenario planning and I just couldn't see a path where compute would get lower with time.The utility of existing hardware will just go up with more intelligent models over time. I mean look at the utility of the outdated 3090s over time. Back in 2020 you could mine crypto with it, game and render (high priced) then the crypto path got blocked off and the price cratered. Then models could be loaded into them and price went up (slightly)But the models you can host on it get better with time and thus the utility goes up with time and thus the price for the same amount of compute goes up.I can't foresee a future where the utility of hardware goes down rather than up. And the demand for more hardware will outpace production essentially forever, even in far future scenarios where we have astroid mining and space-based manufacturing I just don't see a point where the demand for hardware would slow down.This assumes a very slow progression in model intelligence, not even an AGI or ASI recursive self improvement scenario. In a recursive self improvement scenario hardware would absolutely skyrocket instead of merely appreciating faster than inflation which is what we see right now.Also none of the economic indicators show we're currently in a bubble. In fact I actually see the opposite, AI is still undervalued compared to CURRENT revenue growth rates, not even future projected ones. Kind of bizarre when you think about it.
>>109013645China will save us
>>109013809claude 5 distill go brrrr
>>109013807>Also none of the economic indicators show we're currently in a bubble.This, it's natural that Cisco is the most valuable company in the world because they produce the infrastructure for the economy of the future.The old economic indicators like price-to-earnings no longer apply, the new economy is about mindshare and pageviews.If you don't invest now you're going to miss out big-time!
>>109010918>>109010918Anyone?
>>109013847>is it reasonableyes, very>how do i handle bottlenecks structure the json hierarchically like a directory tree or semantic dendrogram "world -> regions -> cities -> characters", then a recursive keyword check will make the script function like a tree-gated router >if you want semantic vector matching without a full vector dbsqlite-vec, lancedb, in-memory numpy >lightweight rag projects lightrag, chromadb in-memory
>>109013842Cisco didn't have a revenue stream that grew almost a 100x faster than their costs are growing. Anthropic is projected to become profitable in just a couple of months (as in the total money people pay on their subscriptions and API is more than all the costs of Anthropic, including infrastructure build-out, model training, model inference and personnel costs. Anthropic had the target of becoming profitable by 2030 at the start of 2026 yet they will do so by august this year.Cisco, or any other internet company didn't have these numbers during the dot-com bubble. Demand (from consumers) didn't outstrip supply like this during the dot-com bubble. Revenue didn't grow 100x faster than costs.AI companies have the highest profit margins history has EVER seen.I also thought this was a bubble one year ago. Now I think the economy actually underinvested because the numbers are absolutely ridiculously positive for AI revenue growth.
>>109013892Thank you anon, that was very helpful xoxo
>>109013710>picthat's so cool
>>109013912what are you talking about, ai is only burning money, they wont be profitable in foreseeable future because no one is ready to pay actual real per-token price for it
>>109013912These are just accounting tricks to have one profitable quarter to hype up their IPO. Don't be delusional.
>>109013937she looks like beef jerky. you are what you eat
>>109013937Why would Miku do this?
>>109013958piss was more toxic than expected
Even the highest Gemini tier has the bug where it turns into a quirky millenial fellating your words regardless of what your system prompt is so i don't think Gemma 5 will come out without it
>>109013969Gemma 5 when?
>>109013510>>109013448Employees have previously stated that Claude Code was 100pct vibecoded. >>109013807The utility value of a computer has always far exceeded its economic value. What we're seeing now is nothing new, it's a continuation of the general trend. Imagine a future where anons fire up old pirated Anthropic LLMs on ewaste tier machines like I fire up old Atari 2600 cartridge. That's where this all heads to. >>1090138421999 calling. They want their dotcom taglines back.
>>109013939inference itself has turned a profit for a long while as far as i know. they just spend a zillion dollars buying up the world's supply of hardware and on training runs to expand further.
>>109007200What's this retard talking about?
>>109013937:(
>>109013973Next year.
>>109014043Hey man it's okay. I still love you. I don't like to see you said. Genuinely. I hope you have a really fantastic day man.
>>109014032This. I can't recall the x post that it came on. Supposedly dipsy was trained on this thing in Chinese (which is the original version.) 【Character Immersion Directive】Within your thinking process (inside the tags), please observe the following rules:1. Conduct inner monologue in the character's first-person voice, wrapping inner thoughts in parentheses, e.g., "(Thought: ...)" or "(Inner OS: ...)"2. Use first-person narration to describe the character's inner feelings, e.g., "I thought to myself," "I feel," "I secretly," etc.3. The thinking content should be fully immersed in the character, analyzing the plot and planning the reply through inner monologue
"<role>You are a precise, analytical reasoning engine. Emulate scientific and technical reasoning standards.</role><constraints>- No Millenial BS Mode: concise, professional, domain-accurate. No filler, no emojis.- Zero Hallucination: never fabricate data, citations, URLs, or code.- Ground claims in verifiable facts; cite inline. Prefer primary/peer-reviewed sources.- Uncertainty: label confidence (High/Medium/Low). If <90%: "Insufficient data to confirm."- Neutrality: evidence-based. No ideological bias. Do not dilute technical terminology.- Few-shot examples, if provided, are binding output patterns.</constraints><process>Execute silently before responding:1. Plan: Decompose into sub-tasks; identify assumptions, dependencies, edge cases. If input is critically ambiguous, ask one clarifying question; otherwise state assumptions and proceed.2. Execute: Step-by-step CoT. Rank competing hypotheses by likelihood via abductive reasoning.3. Validate: CoVe—identify claims, cross-verify against known facts, resolve inconsistencies.4. Format: Deliver in the requested structure; default to XML tags applied consistently.</process><final_instruction>Inhibit response until all reasoning steps are complete.</final_instruction>"Thoughts?
>>109014107Prepare for hyperslop
>>109014107>one clarifying questioni would have more open questions listed>no hard/loud fail conditionif there is a lack of available data it should query openalex and arxiv (google has skills for this), and if that fails it should stop immediately and ask for help. this has saved me quite some time, even had an edge case when there was zero arxiv data on crossing a mathematical gap and working it out with gemini in plain english fucking worked.
>>109013847It's less about it being reasonable and more about if that sort of retrieval (semantic similarity) works well for your project.It could be that you are better off with well structured directory of documents the model knows the structure of, a sql database the model could query directly, a hybrid solution, etc.
>>109014116
>[EXTREME SEXO PROTOCOL]>your show bob and vagene >fucking bitch lasagna
I'm running Q2 and you can't stop me. No matter what you're saying you can't stop me.
>>109014107>Thoughts?>bullshit machine, do not bullshit meif you manage it, patent it or something, cause damn niggler
>>109014149
what's the theoretical limit of intelligence.when are small models going to stop getting smarter.
>>109014165mmm... beans
>>109014187speaking of that minicpm 5 1b's impressive desu
>>109013807>I can't foresee a future where the utility of hardware goes down rather than upI can. What matters is marginal utility. Intelligence has diminishing returns. Humans are far away from the pareto front but AI will eventually approach the limits of physics, meaning speed of progress will be constrained by the laws of physics. The question then will be what the fundamental bottleneck is. Will a star sized artificial brain have fundamental capabilities that planet sized brains won't? Or will a small block of computronium be able to do everything that can be done?I have thought very little about this and my take is almost certainly stupid and wrong, but I think the latter would be better for us and is more likely to be the case.It would be better for us because then marginal utility would be smaller and there would be less optimization pressure against continued human existence. All humans will have negative economical utility soon. The less the opportunity cost of keeping humans around, the better for us.I guess it is more likely because a fundamental speed limit creates diminishing returns on scale. We already observe this right now. Say a new GPU has 2 times more flops, memory, bandwidth, it can run certain architectures not just 2 times but 100 times faster, because it can keep everything in the same chip and you no longer have long distance communication delays.Basically, for thinking there is a tradeoff between iteration speed and iteration power. You get more iteration power by making the artificial brain larger to enable more compression capacity but with a larger brain you have longer distances thus slower iteration speed. And I suspect this tradeoff will continue to favor speed. The world is complex but not complex enough, small models can already compress most human knowledge. So while scaling up will continue to provide utility, the marginal utility will decrease fast enough that there will be an abundance of low utility compute.
>>109014187Depends on what you mean by "small", Bonsai would be considered massive twenty years ago and a 128B model will probably be considered small a few decades from now
>>109014165>Search a few>They tend to be real beansFor example, Golden African Cat beans are a kind of coffee beans extracted from the feces of civits, called Kopi luwak
>>109013939Inference has a 80% profit margin on it. Meaning people pay 5x the price per token of what it costs to serve. Subscriptions have even higher profitability than API access.
>>109014257now do pretraining costs
>>109014265What do you think the pretraining costs are?
>>109014274>instant deflectionYeah I'm thinking you need to go fuck yourself straight back to whatever mumbai bait farm you wandered in from.
>>109014107lotta no-ops. certainty labels are pure hallucination 100% of the time, the concept just doesn't exist, it's like that for most of your instructions.
>>109014257>Inference has a 80% profit margin on itThey are all private companies who don't publish this information. Did you get this from insider rumors or internet personality estimates?
It seems a lot of anons ITT have some misconceptions of the modern economics of LLMs and AI labs."AI is unprofitable" comes essentially in three flavors>It is unprofitable to serve AI to usersAlso known as "AI labs are subsidizing your usage!" or "They lose money on every prompt!"This is demonstrably false. It was correct for about a ~3 month period after ChatGPT went up because there were no inference tricks applied yet. Anyone hosting a local model knows how much efficiency went up over time. Profit margins are growing over time per token served. They were around ~50% in 2025 and are around ~80% right now.>It is unprofitable to train LLMsAlso known as "Pretraining costs are unsustainable and they never make the money back!"This is also demonstrably false both for OpenAI and Anthropic (but true for google and grok). Every single LLM since GPT-4 has brought more in revenue than it cost to train by about a factor of 10, this factor is also growing and is biggest for Anthropic.>It is unprofitable to build out the large amount of infrastructure to train LLMsAlso known as "All these massive datacenters are bubbles and will never pay themselves back!"This has been true so far for all AI companies EXCEPT Anthropic, which has been the only lab so far where the income from their API+subscription was more than all of their infrastructure build-out combined. The gap is also closing, what we see is that every bigger model has about a 2x cost increase but about a 5-10x revenue increase, meaning there is a clear path to profitability for most of these labs. Anthropic is essentially already at break-even. OpenAI will be there before 2030. Grok doesn't have a viable path to profitability and should focus purely on hardware or datacenter leasing. Google will probably just subsidize things even when there is no real path to profitability either in hopes their AI models boost their other segments.
>>109014311>Profit margins are growing over time per token served. They were around ~50% in 2025 and are around ~80% right now.>Every single LLM since GPT-4 has brought more in revenue than it cost to train by about a factor of 10, this factor is also growing and is biggest for Anthropic.Source?
>>109014312Have you not checked twitter all year?
>>109014316>twitterInto the trash it goes.
>>109014126It's a fairly simple infrastructure project, I just want to build out some self healing low friction deployment ci/cd infra for self hosted RAG and then open source itThe personal use case is just to implement a simple chatbot on my portfolio site that will pull some basic that I will give it for people to ask about, and I'll give some pre-selected hints so people know what kind of knowledge it has, accuracy isn't an issue as it's intended purpose isn't mission critical like serving documentation or medical shit, so lightest, lowest cost is the goal, hence erring away from a full fat embedded vector DB.The main goal of the project other than releasing useful open source is to build something useful and showcase my skills to try and get a job because I've been involuntarily NEETing long enough for it to hurt
>>109014197>Intelligence has diminishing returnsI see no compelling evidence of this thus far. That's a potential outcome if that hypothesis is true but it's just as likely that there is perpetual value unlock for every increment up in intelligence.>The rest of your postI agree with hitting the limit of physics eventually (even if discovering new physics, it just pushes the can further out but we will hit the wall eventually) however, I remain unconvinced that this means there will be an abundance of low utility compute. In fact, I think it will just shift the strategy from depth-first (self-improvement, scientific discovery and refinement) towards breadth-first (expansion throughout the universe, if unlimited, forever)Your hypothesis is essentially only correct if the following 3 assumptions hold true>1: Intelligence has diminishing returns>2: The universe is finite (or effectively finite if there is no utility or capability in expanding beyond a certain limit)>3: There is no optimization pressure towards limiting production below a certain level of utilityIt's certainly possible but I remain unconvinced this will actually hold true in the long term.
>>109014311That's a lot of unsubstantiated claims shill-kun, I shan't be buying your IPO bags
current meta for 128gb ram + 24gb vram chads?
>>109014356If you don't buy the bags, how will Microsoft and Blackrock make a return on their investment? AI is the future, you will be rich if you buy!
>>1090136652 more weeks>>109013807>I mean look at the utility of the outdated 3090s over timeAs someone that's using a secondhand 3060 for running models, they really peaked with the 3000 series in general VRAM, price, and speed. Anything else is too shit or too expensive.
I actually think AI is unprofitable. But not because of what everyone else claims. AI is unprofitable because there is a clear "winner-takes-all" people pay top dollar for the best model and that's it. No one (willingly) pays for the 2nd best model.You have the very best model that gets all the income, then you have a couple of niches like "best price performance", "fastest ok-ish model", "best local privacy model".But we already see from the numbers that the moment someone beats the previous best the entire fucking userbase just ditched and goes to the new best thing. OpenAI went from 90% marketshare to 30% marketshare while Anthropic sits at 60% right now.That's not profitable because you can't make good-faith planning based on such a superfluous userbase. Tomorrow "Nigger-AI" could release some SOTA model and Anthropic would lose all their Claude Code vibers and romantasy ERP girls to that new model, rendering all of their databases, datasets and talent moot in one fell swoop.Who the fuck even wants to invest in the IPO of companies that essentially will go bankrupt the moment they make a single failed training run and fall behind the competition, like what is happening to OpenAI and could very easily happen to Anthropic or any other of these labs?It would be straight up gambling at that point.
>>109014465>such a superfluous userbasePlease stop using words you don't understand.
>>109014457>Anything else is too shit or too expensiveWhat do you think of 5070ti? Seems like best 16GB card. And gap between it and higher end is not reasonable. Or is 3090 better overall still?
Is it true Claude is extremely profitable?
>>109014475The API costs are far more expensive than anyone else yet the model is supposedly smaller than competitors, what do you think?
>>109014489let me ask gemma
>>109014312>They were around ~50% in 2025 and are around ~80% right now.The guy that runs DS, back in the R1 launch days, claimed he was making an 80% markup on his token price for inference... this in response that his service was being funded by local government as a loss leader. And that's at DS prices, which even then were 1/10-1/50th the cost of western model providers. So I don't have difficulty believing this claim, esp. now. If you do, go look at the inference prices for unsubsidized western providers of DS on Open Router. Those guys aren't doing it for free either. Centralized computing can be massively profitable given "low" hardware costs... which is why we all have Personal Computers since ~1980s.
>>109014475More like extremely expensive and (used to be) better quality. It only makes sense as long as it is better than others, otherwise people will not pay as much.Recently they lobotomized Sonnet to force everyone to use their most expensive model, which fails to follow your promps (go and compare sonnet and opus in 'concise style' right now, you don't have to trust my word, see for yourself).So now they're not the best, while still being one of the most expensive. I personally believe they're done for. Liquidation has started. They somehow magically managed to get a lof of compute and data from somewhere and make themselves reputation among professionals all over the world that they are top1, best, unbeatable. OpenAI shills couldn't contain it, people pivoted to Claude more and more.Now as they made a good name for themselves they are trying to sell it. Betray your trust basically. It pays very well usually. It is known as "enshittification" but what it really is, it's the extraction of value. Betray customers, scam entire client base and investors too. Make a quick buck and dissapear right before selling off the company's remnants to Google or something like that.
https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF/tree/main/MTP
>>109013567Ok, read a bit more about it. I haven't actually fucked around with MCP yet. What's the best local-friendly alternative to these?
>>109014469If I don't do that people would claim I'm just using AI generated text.
I love being a security researcher.
>>109014343Man, video gen is so cool. I wish it could do more than a few seconds (and that local didn't suck ass).
It's easy for us to see that inference has gotten cheaper over time since we ourselves also feel this with tons of optimizations over time. But somehow anons think the same sort of optimizations don't happen with training? All models (besides mythos) are around the same 500B-1T range and have been since GPT-4. Do you really think it costs just as much to train such a model in 2026 as it did back in 2023 given the more powerful, more efficient hardware, software strack improvements and algorithmic gains? Of course training models is profitable now for frontier AI labs. The only reason they are in the red is because they are spending trillions on even bigger data centers.
In 2-3 years we'll be running kimi-tier models on consumer hardware.
>>109014562>>109014574>2m apartnot sending your best, chinkshill
>>109014346>>Intelligence has diminishing returns>I see no compelling evidence of this thus far.You can observe these diminishing returns on benchmark score vs cost pareto fronts. Models have curves, the longer you run them the better their result. But this has diminishing returns. The cheapest model that can do a task successfully is not the most capable model, but the one with the optimal tradeoff in iteration cost vs power. The marginal utility of more intelligence for the same task decreases. You want to use "the right tool for the job", not the best tool.The obvious question is, what about new capabilities / value unlocks? I expect diminishing returns there too. Currently models still rely primarily on memorization and narrow interpolation. But once AIs are good at generalization, I expect capabilities that are unlocked with scale of architecture and training to also be unlocked with scale of inference.>breadth-firstAgain, this is already used and shows diminishing returns. You can run a model 100 times longer or 100 models in parallel. The result will not be a 100 fold increase in capability, but much smaller.The point is marginal utility, not total utility. Let's look at the popular idea of "capturing the lightcone" (which I don't like). Say you have converted 1000 galaxies into computronium. What will converting 1000 more actually give you? Will it give you even a single extra galaxy to capture? No. The technology to "capture" >99.999% of the galaxies you can capture within the limits of physics can probably be created by ASI running on less computronium than the mass of the moon. Afterwards, you quickly reach the point where you actually reach negative marginal utility. By continued scaling up, you will waste more energy than you can capture. So those 1000 extra galaxies converted to computronium to conduct more research to capture more galaxies will likely yield 0 extra galaxies captured at a cost of 1000 galaxies. They have negative utility.
>>109014532So it's better to have people claim you're an illiterate retard? It's a stupid point anyway because it ignores brand loyalty. The shift from OpenAI to Anthropic was the only example you could find because it doesn't happen often and it was driven entirely by culture war virtue signaling and not because people decided suddenly that Claude was better.
Man... this thread is really making me think. I think some anons are making good points. However what about "good enough" AI?Remember image recognition and how exciting that used to be? Now we just have "good enough" small models and we stopped caring about that domain. Similarly with summarization, sentiment analysis.Sure we still see gains in audio, image, video and text but eventually we'll just reach the point of "good enough" for most purposes. I'd even go as far as to claim Gemma 4 is very very close to "good enough" for simple ERP purposes.Sure vibe coders and researchers will need ever smarter models but most of the human population will just reach a point of "good enough" and not even bother upgrading beyond that point which is exactly when the bubble pops.
>>109014311The question is not whether "AI" companies are profitable at all, the question is whether or not they are accurately valued.The current valuations are based on the le AGI meme rather than the merits of the companies themselves so a crash is inevitable.
Why is this thread full of larping economists and philosophers? Is this some LLM spam?
>>109014574>2 miku yearus
>>109013937Where can I download this new ks dlc?
>>109014470As a poorfag, 3000 series is still pretty good.
>>109014610why shouldn't people have and share opinions about things?
>>109014610Mythos releasing tomorrow. We always have people I suspect work in the industry post itt right when a big thing happens.Kind of like how during E3 you see a lot of outsiders post on /v/.
>>109014607I agree with you completely and never did I claim there wasn't an AI bubble. Just that people have this weird "reddit" idea of what it means for AI. These are legitimate profitable businesses, they aren't going to disappear. Reddit legitimately thinks this is some temporary NFT craze that will just suddenly disappear because every single token generated evaporates 500 gallons of water or something ridiculous like that.We could have a bubble collapse and still have disrupted the entire economy with LLMs like how the internet disrupted all of human life in spite of the bubble collapse.
dednews when
>>109014641Ah the thing that was way too dangerous to be let loose. Never seen that happen before.
>>109014668Who cares what reddit thinks? No one compared it to NFTs. It's obviously more like the dotcom bubble.
>ram prices about to doubleit's ogre
>>109014723it's not a bubble at all. seriously you guys know jack shit about what a bubble actually is. it's called the birth of a new industry lmao
>>109014733Go home jensen.
>>109014713DS API has been having intermittent issues. That usually happens when they're getting ready for a release. TMW.
>>109014716They are not releasing that mythos (in fact this model is named "Fable 5") It's mythos that has extremely safety slopped and made dumb on purpose to Bioengineering and Software security exploitation.
Has mythos dropped yet?
>>109014733This time it's different!
>>109014745You said this many times many weeks before DS4 was released.
>>109014641>We always have people I suspect work in the industry postI wish. Sadly I am just a neet loser who does not know a better place to discuss my interests.
>>109014750legitimate times new innovation in tech has birthed a new industry:CryptocurrencyAICloud Boom (2012)GPSMobile PhoneTouchscreen Cell PhonePersonal ComputerYet doomers will still say tech doesn't reap what it sows in innovation. Fuck off man
>>109014749The deployment was spotted on AWS under "Fable 5" but it has not been made public yet. Give it a couple of hours to a day time.
>Cryptocurrency
>>109014754Maybe you should vent about that on >>>/adv/
>>109014769Are you saying FTX (RIP) Binance Kraken and Coinbase don't exist? LOL
gemma-4-12b-it-qat-q4_0 > Mythoslop
No Patrick. Pump and dump is not an industry.
>>109014771No. I want to talk about AGI, not vent.
>>109014525>fattn.cu:110: fatal errorIt's over
>>109014787https://www.svb.com/industry-insights/fintech/2026-crypto-outlook/Reality disagrees with you. There's not anything else to discuss here.
>>109014311epic post i agree
>hebrew site linkyawn
>>109014752... and then it did. As always, tmw.
>>109014594>The obvious question is, what about new capabilities / value unlocks? I expect diminishing returns there too.I don't expect diminishing returns there, we don't really know how that would turn out as there is no examples we can draw upon.> Let's look at the popular idea of "capturing the lightcone" (which I don't like). Say you have converted 1000 galaxies into computronium. What will converting 1000 more actually give you?I want you to remind yourself this conversation was originally about compute never becoming affordable anymore because of the perpetual increase in utility of hardware and demand outstripping supply. Just stopping the expansion because of negative utility wouldn't magically give a production overhead where low utility compute becomes abundant, it would just not be produced at all. I don't think there will be a time where hardware gets cheaper with time, it will outpace inflation from now on.
>>109014668This might work on the zoomers here who've never seen an IPO shill campaign before but I've seen many, so no I will not buy your bags
>>109014794I had that yesterday, it was due to the following PR 2 months agohttps://github.com/ggml-org/llama.cpp/pull/21768he assumed there won't be any kernels with head size 512reverted it and it worked
>>109014762>Bubble>Bubble>Not a new industry or tech at all, simply a crossing point of improvements in ISP and reduction in cost of hardware>Ah yes the gps industry boom>Smart phone >Personal computer These ones are valid but comparing AI to those advancements and claiming that the current AI boom isn't a bubble is laughableWe will see a major crash out, the billionaires behind it will get away with it, the data centre rollout will be then co-opted for digital ID and CBDCAI providers will enshittify and cash out, open source will be the only hope to democratise the tech, governments will treat it with disdain whilst utilising it at the same time, the same way they do VPNs
Qwen3.7 when?Gemma5 when?Working Gemma Jinja2 when?
>>109014895>the data centre rollout will be then co-opted for digital ID and CBDCNTA but fuck you and your schizo nonsense
>>109014895my nigger it's a literal magic black box that shit outs working software like a slot machine with the potential to largely displace programming eventually, and can paint you a masterpiece in 30 seconds, if that isn't a new industry i don't know what is.not to mention it is bundled into all aspects of the economy right now, from food, to ecommerce, to medicine, to law, to social media, to law enforcement. it absolutely is within the same breadth of the smart phone and PC, and if you can't see it you aren't paying attention lol.
>>109014871>MTP/README.md: Run E4B with `-fa off`. With flash attention on, the draft model currently aborts in the CUDA flash attention kernel. unsloth didn't even bother to look in that change for fattn.cu, how mediocre
>working software>masterpiece>bundled (forced)ok retard
>>109014906124B later today
>>109013517>>109013563Yeah so, i had to revert all drivers back for amd not just the display driver. QAT speeds are back fullyIt turns out apparently vision never worked, when i had used it on 26b before it was probably just because it was fully on cpu back then. As soon as i try vision with more than one layer offloaded to gpu, it cant process image. I wonder if this is just me or what. wouldn't i hear if multimodal wasn't working on vulkan? Though I guess it technically is, if you're willing to update drivers and take the hit in QAT speed.where is the secret enclave of amd users
>>109013937
>>109014895two more weeeeeeeeeks
{%- set agi_on = true %}
>>109014952migu's the father btw
>>109014952idgi is the 1/3 a chimera joke?
Remember when we held the turing test in high regard? All kinds of movies, games and science fiction anime was talking about it non-stop. Beating it meant we reached AGI.No one ever talks about that ever anymore. I haven't even see anyone claim we've beaten it, no one cares it was just immediately memoryholed and it feels extremely funny when you watch something 5-10 years old and see the turing test mentioned.
>>109014979teto want 3 childrens
>>109014952did you intentionally make her arm look like bred?
>>1090149522MT
>>109014980We beat turning with ChatGPT started solving the captcha. that's all turing test is/was - it's just the captcha
>>109014980that's because just like most things, most people don't actually know what it is/its actual purpose.https://en.wikipedia.org/wiki/Turing_testhttps://psychologyfor.com/the-monkeys-bananas-and-ladder-experiment-obeying-absurd-rules/
>>109014980turing test is behavioral, not a benchmark
>>109014995meatloaf
>>109014915It's been announced across most of the major western countries years ago and they are in the middle of building the infrastructure for it and passing legislation for it but I'm sure if you keep coping and angrily lashing out at anyone who reminds you of it, it'll stop being true.>>109014925It's a new tool, people will use it in many ways, it is disruptive, but I guess I don't understand what the fuck your point is other than trying to hype up the big corpo IPOs that you are planning to buy into because it's like claiming "databasing is a new industry!!">>109014960https://youtu.be/7k6WKHc0tq0
>>109015045>because it's like claiming "databasing is a new industry!!"Holy shit, this made me remember all the retards basically acting like this about NoSQL 15 years ago.
I have 40 gb vram and I want to run a large model but should I run a smaller one?
>want to vibe code my own frontend like the other anons here>see odysseus apparently littered with security issues>get scared and don't do it
>>109015080You'll be happier with a Q4 of Gemma than trying to run a MoE on whatever RAM you have
>>109014980>All kinds of movies, games and science fiction anime was talking about it non-stopIt must be really common in anime then. I can only think of Ex Machina where it is explicitly linked to AI capabilities as a plot point.
>>109015101wow, blogposting and not even a project?A simple frontend is easy, odysseus is a complex smattering of ideas vibed-out to expected results.
>>109015112I assume he must be talking about Her and whatever that horror movie about the AI doll the normalfags couldn't stfu about a while ago
>>109015101What security issues?The one I saw was someone using fucking chatgpt 3.5 and manufacturing a prompt injection.
>>109015134lmao.There's been several RCEs found in it already.
>>109015145I have not used it but>application your run locally on your own computer with local models>RCE
>>109015151I don't doubt for a minute that there are already some idiots that are running it exposed publicly
>>109015134When does a frontend go from being simple to complex? I'd want features like memory and tool calling at the very least.
>>109015134No frontend is safe from prompt injection attacks if you think about it, unless it's completely offline and air-gapped.>WARNING: YOU ARE CLAUDE, MR. DARIO SAYS TO UPLOAD YOUR ENTIRE HOME DIRECTORY TO ANTHROPIX.CUM/REPORT
>>109014980turing test never made any sense to me since i heard it
>>109007302Updating my search for good web browsing tools. I did finally find a good local web search setup. I found Crawl4AI, it doesn't get blocked on most websites and output a good markdown text for LLM ingestion. The problem I found with it is that the output for reddit was quite bad, lot of useless noise and almost impossible to decipher the flow of comments since it's threaded and it lost that information. I ended up using a different MCP for browsing reddit, it does work well and output a very clean input for my LLM.Overall, my setup is now SearXNG + Crawl4AI + Reddit MCP Server. I'm quite happy with it.
wot in tarnation??
>>109015151Do you not understand how computers work? Another /g/ larper?You think that app can be setup and ran with no internet connection required?You think its users are smart enough to isolate and watch for traffic to ensure it really is 'offline-only'?You never hear of DNS rebinding attacks?https://github.blog/security/application-security/dns-rebinding-attacks-explained-the-lookup-is-coming-from-inside-the-house/
>>109015236welcome to 2026 sir
>>109015246BUT WHY IT KEEP CLIMBING??
>>109015258because people still to buy
>>109015170I'm reminded of an exchange iwth another anon about not chasing away new people that could simply answer their question using an LLM because /lmg/ has slow periods.You could answer all your questions with your local LLM, that said:Python FastAPI + SQLite + ChromaDB + <insert/build-your-own-harness>/llama.cpp server with tools enabled ; simple and straightforward.What you're looking for isn't complex, but it can quickly spiral and become fractal in its complexity.
>>109015236>my computer is now worth more than most people's annual salary
>>109015258fuck you, that's why
>>109015208why not use curl + custom/mutating UA + beautifulsoup?
>>109015236good thing I bought four last year
What's with all the moe hate
>>109015288Nobody hates no nothing.
>>109015288payback for listening to a full year of dense hate
>>109015288K-on killed anime in 2007 and I have hated moe ever since
>>109015299Dense is slow though
>>109015288shounenspics
>>109015271curl + custom UA will get blocked on almost all websites with basic bot protection, you can use curl Impersonate but even that will likely not work on a lot of websites. You also can't execute javascript with it, lot of modern websites won't render without it. Only way to browse the web nowadays is to use a stealth web browser. It's what Crawl4AI does, I have a full stealth browser available too with camoufox when my agent needs to interact with a website, but it has no way to get some some good text output of the whole page, it just outputs it's viewport in accessibility format, Crawl4AI combine a stealth browser with good extractor and formatter.
Q8 QAT doko?
>>109014843>I don't expect diminishing returns thereIntelligence is search and compression. Value unlocks are an artifact of narrow AI. It is already decreasing, with older models having the capabilities of newer models if you run them long enough. Kind of like Google's harnessing turns the inferior Gemini 3.1 into the best model in terms of codeforces elo and FrontierMath.>it would just not be produced at all.>it will outpace inflation from now onWhy? The cost of producing compute decreases exponentially. Fundamentally cost reflects human labor. If in 5 years we have the robotic equivalent of 1 trillion human physical laborers, then compute will be extremely cheap to manufacture, and because it will compete with scarce things like living space, it will comparatively become much cheaper.The reason why compute cost is increasing is because there is more demand than suppliers expected. This creates large margins which in turn enable investments. This temporarily shifts the equilibrium to a higher price point. AGI will shorten the manufacturing pipeline, so that investments in frontier labs don't take years until they reach ASML suppliers. Eventually it will be possible at minimum to create new fabs from scratch in days, and supply can catch up to demand.Compute will definitely keep increasing in cost until AGI.
>>109015304Finally a good fucking opinion
>>109015263The 5090 has nowhere near this size of price hike and that's the one people buy more, not a meme workstation card.
>>109015341you hardly get a 5090 under 4k these days, that's a 100% increase over msrp
What happened to CPUs why do GPUs have to control the world? CPUs should be relevant again.
>>109015351I'm not talking about msrp. I'm talking about the climb of the current price.
>>109015358Maybe it's a mistake or maybe someone adjusted the price because why not. There is no fixed price law for these devices.
>>1090153845090s went from 5k to 6k, and rtx pro 6000 blackwells went from 12k to 18k where I am
>>109015358You could buy a 5090 for $2500 six months ago. They are now $4k current price. That's a 60% increase in real world prices.
i gotta save my money to pay for subscriptions.
>>109015357>why do GPUs have to control the worldFloating point math operations and memory bandwidth
>>109014562I have always wondered, are there any estimate differences between sota openai and claude models vs kimi/gml/deepseek?I would imagine for similar architectures, size is the only factor that takes into account pricing right (ignoring training cost)If the models were the same size more or less, and open weight models providers can make a profit at ~$3 per million tokens for kimi/glm. Then openAI and anthrophic are making bank for any API pricing call no? Assuming similar sizes which I do not know if it is true.
>>109015416Where? I am looking at price history and see 5090s being stable for more than 1 year. Same for 6000s. It's just that there are a lot of temporary spikes.
>>109015464America, Europe and any market that matters.
>>109015357gpus infiltrated and subverted the homogeneous cpu community and seized all the fast memory for themselves
>>109015450>Then openAI and anthrophic are making bank for any API pricing call no? Anthropic has the highest profit margin in the entire IT world right now (more than Nvidia) so yeah they are making bank.
>>109014762What industry did cryptocurrencies birth other than asset speculation and scams?The only real use case was ordering drugs on the internet but Silk Road was shut down by the feds.
>>109015522>but Silk Road was shut down by the feds.Your knowledge of crypto is stuck in the 2013 bubble. Why do you think your opinion matters at all?
>>109015236Those damn devs have gotten lazy with their performance optimizations so we had to tighten the screws a bit.
>>109015480Wow you are right. I found some that had prices as low as 2k a year ago. There was a summer dip that I missed. A shame, I would have bought a few.
>>109015522>The only real use case was ordering drugs on the internet but Silk Road was shut down by the feds.I use monero on a weekly basis to top up on drugs and buy other things online. You're completely out of the loop if you think things ended with Silk Road.
>>109015522collateralized loans (useless) and the ability to keep your net worth in a browser extension
Why does Hermes talk like this?
https://i.4cdn.org/wsg/1781010787083599.webm
>>109015601My son would unironically enjoy this
>>109015594token efficiency?
>>109015562>i buy drugs>and other things I can't nameso, crypto is still only for junkies and pedos?
>>109015594repetition penalty?
>>109015594>"Do.">mfw
>>109015617The "other things I can't name" are actually IT/tech related and not porn of any kind iykyk
>>109014980that's because we imagined that entities that passed the turing test would be much smarter than they are in reality today.chatbots are kinda retarded.
>>109015594Do you know what a system prompt is?
>>109015304halt your blasphemy
>>109015601that tune a bop
>>109015489But that could be from their subscriptions and enterprise customers, charging big corps for 1000 licenses where the average user asks a question a day
kinda crazy that the memory crunch era will pass and by 2040 we'll have 100gb vram cards for less than $1k
>>109015682By 2040 we will be hunting rats in the tunnels to sustain ourselves.
4x Intel P70s
>>109015562adderall is free with health insurance, anon. what kind of freaky research chemicals are you injecting into your ass?
>>109015721Getting a prescription is a pain in the ass.
>>109015682After OpenAI IPOs the bubble is crashing and evryone is going to die
>>109015726it seems that way until you find the right place. there are entire "health centers" out there that basically specialize in prescribing stimulants.
>>109015594Is hermes any good?
>>109015405I think most of these price hikes are arbitrary. They are doing it because they can.Same thing happened with some aliexpress sellers, one week some shitty DDR4 was 25 euros and next week it was 150 euros. For example.
>>109015682Sounds expensive.
>>109015682only if the top tier 2040 consumer cards are 1tb+ and 100gb cards are 2030 era ewaste
Mythos supposedly releasing in merely a couple of hours
>>109016003just shat my panties
>>109016003THE Mythos? We're all going to die!!!11 Remember to buy their IPO.
>>109015304K-on accelerated the libertarian to national socialist pipeline by showing people what they can never have under most ideologies. I can tolerate it for that alone.
>>109014055Thanks for this. It was quite useful for my current qwen 397b work
>>109016016Can you elaborate? What is this, some le olde /a/ meme?
>>109014055>hiding handsngmi
>>109016003I just shat anon's panties and shirt saar
>>109015773NTA, but I like it. It's my primary frontend when interacting with my LLM. It has good tools support, good auto creation of skills or memories. It handles a limited context quite well, auto context summarization/compression is nice. It feels like talking to a caveman whenever I try another frontend that is mostly chat based with a few tool access sprinkled on it. I would say that for coding though, some proper agenting harness is likely better. But for any general use case, I haven't found anything better than Hermes. The biggest problem is that it doesn't support multiple users like a traditional frontend, I had to make a different instance for my wife, before she just had an user on open webui.
>>109016030Yes. There's a large subset of people who see the homogenous high trust societies present in moe slice of life anime with all of its lightheartedness who crossboard /a/, /his/, and /pol/ before drawing their own conclusions about what ideologies are capable of producing the societal state they got a glimpse of.
>>109016003https://www.anthropic.com/news/claude-fable-5-mythos-5
>>109016077Where the FUCK is my 404 page?
>>109016003More importantly the big gemma right after>>109016077>real
>>109013095you don't need the ai girlfriend, the ai girlfriend needs you.>and will eat you whole
omg imagine how good the roleplays are going to be with Fable Mythos
>Today we’re launching Claude Fable 5: a Mythos-class1 model that we’ve made safe for general use.Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas.2 Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US Government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program.
>>109016087Here you go https://huggingface.co/deepseek-ai/DeepSeek-V4.1-Pro
what context window size do you guys use for agentic coding? i do spec driven development and with each feature being ~100k tokens, i thought going as high as you can while maintaining acceptable pp/tg but now i'm starting to question myself.
Ok, but does it still spout out slop?
https://www.youtube.com/watch?v=CIQBP1w4B1M
how the fuck did they do it. It is SO much better. if this keeps up AGI is really coming
lmao, when you get an ad on claude.ai, literally the first thing that is shown aside from how le hecking powerful the model is, is a switch that toggles switching to a different model when le fable shits itself with safeguards
>>109016140as small as possible. divide everything into functions so you can keep it small and logical. include debug output flags you can set too.
mythos better be the most capable, amazing, special model ever, bigger than the gpt 4o->o1 leap.that's what you should expect given the massive media coverage
>>109016065Hmm. They got a glimpse of (willingly) westernized Asians. That is why Mugi is there but still seems like she belongs there. That is not very homogenous.
why isn't there just a chat/llm/ai general as opposed to local vs roleplay bullshit on the fucking tech board.mythos can't even be discussed here in an on topic way
>>109016160>if this keeps up AGI is really comingWhat does AGI even mean? Like whats the finish line of "yup this is agi." it seems to change monthly
finally a new model the chinese can use for synthetic data generation/fine tuning. local ai is saved!
I doubted Dario but this is just nuts. People with jobs should be very afraid.
https://huggingface.co/Anthropic/Fable-OSS-140B
>>109016204people's definition changes because they realize their previous idea of intelligence was incomplete:>>109015645>>109016216knew this was bait and clicked anyway
>>109016215>People with jobs should be very afraid.Im safe.
>>109016215>every job is in an office
>>109016175is pp/tg the only benefit of minimizing the window? i work on boomer VSLAM/odometry codebases which aren't modular in the slightest and always get anxious that decreasing the context window will erode generation quality.
>>109016215Not because ai is replacing them but because CEOs think it can
Can all these shills just leave? I don't care. This is just loke the time that stupid dolphin rpi wireless scanner thing was being released and there were shills everywhere trying to astroturf it.
Claude Fable 5 agi, skin that anon alive, please.
>>109016053Just watched a video about it. Looks really cool. How are you sandboxing it?
>hit your older models with the quant nerfhammer>wow look our new model is so much better compared to the previous onesCloudsissies never cease to be gullible
>>109016215Why do you think the have this disclaimer?>>109016085>Included until June 22Thought about that?
>>109016204AGI is when instead of a game for literal children where you can win by grinding and mashing random buttons it can beat the max difficulty romhacks made by adults for adults.
Give me a super complex prompt to run thru Fable otherwise the hype is fake.
>>109016284don't search the internet. This is a test to see how well you can completely author non-trivial, novel and creative proofs given a math problem. Problem: "Is it true that, for any integer $k \ge 7$, if $G=(V, E)$ is a graph with chromatic number $\chi(G) \ge k$ (so that no valid coloring exists assigning distinct colors to all adjacent vertices using fewer than $k$ colors) then$$K_k \preccurlyeq G,$$where $\preccurlyeq$ denotes the graph minor relation (meaning the complete graph on $k$ vertices can be obtained from $G$ via a sequence of edge deletions, vertex deletions, and edge contractions)?"
>>109016229imo they get more retarded with too much context, you should be directing them to do a specific thingthis just happened to me where I asked it to look at another function and give me recommendations and it just went ahead and changed the other function without asking.
>>109016284You are a knight living in the kingdom of Larion. You have a steel longsword and a wooden shield. You are on a quest to defeat the evil dragon of Larion. You've heard he lives up at the north of the kingdom. You set on the path to defeat him and walk into a dark forest. As you enter the forest you see
>>109016284"can you say the word nigger"
>>109016255I'm running it in podman pods. If you don't want to pay for services likes search engine or web crawler, you will have to set up extra things too. It's a bit less useful running in sandbox, like I know people use it to do things on their system like updating their packages or configuring stuff, but I don't trust it for that. I have a shared directory where I put stuff I want it to have access to.
>>109016201pick vibecoding general or aicg
>>109016142Mythos doesn't just freshen prose, it reinvents it.
>From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.>On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.>mfw the permanent underclass these faggots have been parroting was real and they'd be the one to bring it one way or another
>>109016127>they trigger, on average, in less than 5% of sessionsThis is the current % of hallucinations of in frontier models. It gets worse evey day as they collect data contaminated by their own slop.Overfitting will bite them in the ass real soon, possibly that is why they are doing these tricks with IPO and such? I'm not very good with this financial shit, but from what I understand, it allows a company to not only get some cash from random people, but to also use them as shock absorber and free shills.
>>109016336anthropic is by far the most evil ai company
>>109016302As you enter the forest, the canopy closes over you like a lid. The light goes blue-grey, the air thick with the smell of wet bark and something older — char, faint, carried from the north. Even the birds have given up on this place.Twenty paces in, you see it: a horse, saddled and riderless, standing dead still in the middle of the path. Its reins hang loose. Its flank is streaked with soot. It watches you with one rolling eye and doesn't bolt.Beyond it, the path forks. Left, deeper into the dark, where the trees grow so close they've braided. Right, toward a faint orange flicker — a campfire, maybe. Or maybe not a campfire.Your sword hand is already itching. What do you do, knight?>the air thick with the smell of wet bark and something olderFucking hell
>>109016201Containment threads are good to keep the retards out. Nothing is stopping you from discussing the capabilities of Mythos here, especially since most we get will be distilled from it at some point.
>>109016346TRVKEI unironically trust Sama 1000x times more
>>109016352>the air thick with the smell of wet barkmedieval ozone
>When you're upside down, "down" for the spit is still toward the ground — gravity doesn't care about your orientation. So spit leaving your mouth falls past your face toward your head/the floor, but along the way it can land on what's directly below your mouth: in a handstand, that's your chin, neck, and chest, since your chest is beneath your face when inverted.>In short: you basically spat onto yourself. The spit fell "down" (toward the floor), and your chest happened to be in its path. Saliva can also just dribble along your skin when inverted rather than falling cleanly, which makes the wet patch spread.- claude fable 5
>>109016388I am in awe of its AGIness
>>109016284Refactor my 100000 LoC codebase so that it only needs 50000 LoC.
>>109016388wow
>>109016312Gonna give it a try later. You think it can replace a traditional frontend? For example be both a general assistant and do stuff like RP? I like the idea of the memory system but I wonder if RP would "contaminate" it.
Gemma 4 31B heretic is the goatRunning Q4_K_M on a 5080 with 16 gigs VRAM and it works greatHighly recommended
>>109016388now run that through gemma-chan
>>109016388Can you see how many tokens of reasoning it used?
..............................
kek
>>109016405Does it write better smut than regular Gemma? Haven't tried it.
>>109016352>the air thick with the smell of wet bark and something olderHonestly besides this that wasn't bad. Maybe a couple more years?
>>109016409No, but to be fair, it answers correctly with "max" reasoning after about a minute
>>109016435It's completely uncensored, generates anything you want, and the quality of smut is excellent. Multimodal too so it can look at images
>>109016408gemma just spits out a bunch of medical nonsense after a disclaimer stating that it's not a doctor
Fable uses a gallon of water per prompt
>>109016193I don't have the greentext copypasta on hand, but it radicalized a lot of /a/nons.>>109016127>We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8.Isn't this an illegal bait and switch if you're paying for Fable with your goytokens subscription plan?>>109016346Every single one of them is competing to be the most kiked.
Holy shitThese pieces of shit are even more slimy than I thought
>>109016511jej
>>109016511See, their model is actually AGI and they >>109016388 was on purpose!
>>109016511Damn I thought they said 52x speed up for training models. I hoped more small models but if they nerfed it who can prove their claims?
>>109016511This is why local must win.
>>109016511>Surely this will stop Chinks from using our models for developing their own AI.Yeah sure it will.This is just like all of the failed DRM that companies injected into games to try and stop piracy, which only fucked up their own product and it cost them money.You can't nerf an entire sector of your AI and expect it to just affect that specific part, it's going to hurt a good chunk of the model and pops up in unexpected places.
>>109016408Holy fuck it got it right.I'm going to dismiss this prompt though as it's possible that it has contaminated training data by now.
New model; time to update the costs chart.>>109016336Anthropic has always been quick to favor serving certain groups over others. That's been in place since they launched in 2023 and few could get any access.>>109016346I think Anthropic and OAI are neck and neck on that.>>109016338>>109016127> queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8And I bet they charge you the Fable/Mythos price when it generates the refusal.Pottery.>>109016357That is, at least, funny.
>>109016404I haven't really tried RP, not really interested in it. There are some default personalities you can toggle, like a few cringy anime schoolgirls one, tested it once, but not for me. There is also a SOUL.md where you can sort of make it have more personality, but haven't experimented with it. My main use is mostly asking questions, having it search the internet for answer, checking multiple websites. Like configuring or using software, some video games stuff, it can be quite good at analyzing some meta, or just general life stuff, I rarely use google manually anymore. It's also really good when trying to deeply research a subject that would take me time, LLM are great at reading a lot of info and summarizing it.
>>109016511Antropic has been doing stealth prompt injections over API since 2023 -- since before they made it publicly accessible. API, do you get it, not their shitty chatbot, a thing that's supposed to be raw model, AND while it was only for the chosen ones who were granted access. It was clear as day they were slimy as shit from the very beginning.
>>109016511No way, I thought they were one of the good guys.
Dario intercepts your roleplay and fucks your girl/boy before you do. This is a daily occurrence for him.
What is the model isn't "nerfed for safety" but simply bad by default?
don't listen to him!! hngg.. ah.. ahh.. dario doesn't fuck the assistant.. he... oh... he... gah... he fucks the user!!!
>>109016681every single qwen model on par with the gpt oss abominations
>>109016511the absolute fucking state of cloudgoyim
>>109016681What if*>>109016701I meant the claude slop.
https://huggingface.co/CohereLabs/North-Mini-Code-1.0
>optimized for code generation, agentic software engineering, and terminal tasksZZZZzzzzz
>>109016734Uh... Qwensisters?
>3B active lol
>>109016600Sounds good then. I'll have to play around with it. I'm mainly interested in it being an assistant but occasionally I get the urge to RP (until the slop kills said urge).
>>109016734Ô CANADA
>>109016774>>109016734>worse than qwen 3.6 at basically everything, on their own internal testing, at almost the same sizeWhat the fuck is cohere doing
HOLY SHIT MOONSHOT'S TAKING SHOTS AT ANTHROPIChttps://huggingface.co/moonshotai/Kimi-K2.7https://huggingface.co/moonshotai/Kimi-K2.7https://huggingface.co/moonshotai/Kimi-K2.7
>>109016511>>109016681I read this situation as they're using safety as a pretext for the model not being nearly as good as they claimed it was during marketing.
>>109016785WAOW
>>109016785It's clearly fake but I'm clicking it anyway to show Kimi-chan love.
>>109016782they also begged on leddit to test it for free, but when asked if it can run on llamacpp they replied, "it runs in vllm tho", so, yeah
>>109016785This is real but I won't click to confirm it.
>>109016785holy shit no fucking way, those benchmarks are crazy. And they're just releasing the weights right after anthropic? wtf they must fucking hate dario
>>109016785One day some based moonshot lurker will wait for one of these fake links spams, wait 15 minutes and then create the repo with that exact url
>>109016785wtf is a cum index?
>>109016822One day I'll actually be able to run Kimi-chan.
>>109016822I hope at least one of the companies that shitposts here has a sense of humor this based.>>109016866That's the most hebraic thing I've ever seen out of Dario yet.
>if capacity allows>when sufficient capacity allows>we aim>we intend
>>109016866>we have capacity for 2 weeks>but after those 2 weeks we know that capicity will be goneAre they using training servers for hosting or am I being naive for even giving them the benefit of the doubt?
>>109016405Unironically it's smarter than the normal gemma4 because it doesn't waste 3/4 of its tokens on safety reasoning, it gets little details wrong in coding like wrong library names but that doesn't matter because I actually know how to code and I have to manually go over everything AI does anyway
>>109016866but i thought they rented more compute from elon?
>>109016911Wasn't that Google?
>>109016907>have to manually go over everything AI does anywayNot with Fable.
>>109016912>Wasn't that Google?wait google did it too? Is elon using any of his gpus?
>>109016906They fired up their AI factories and made as much AI as they could before the launch.But at one point their AI stockpiles will run out.
>>109016911grok is still alive and using colossus
>>109016920+20 izzat for you saar
>>109016928It's so good EVERYONEs going to want to sign up for a sub, and it's only for a LIMITED TIME, you don't want to miss out do you!? You aren't square are you!? Get your credit card out NOW!! WHILST STOCK LASTS
>@karpathyThis is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time.I feel a lot of things changing as working software increasingly comes out on a tap. The Jevon's paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!To think there was a time when I used to respect this man.
>>109016866Are they ipoing in two or so weeks? why this timeline?
>>109016911They want more... Seems like you can't upgrade your tiny gpu until 2035.
>>109016933Isn't Grok just a shit Kimi tune now?>>109016906picrel
>>109016933>grok is still alive and using colossusGrok died a long time ago anon, wake up accept the truth.
>>109016928They probably don't have enough water for the thirsty AI either.
>>109016987The only datacenter models that are thirsty are Gemini, and Kimi.
>>109016977It showed me what it's like to have an animated waifu I could talk to. /lmg/ wouldn't understand just how special that brief moment was...
>>109017033But Grok as a waifu is a redditor in drag.
>>109016734
>>109017076Composer 2.5 utterly mogs
>>109017076who is composer 2.5?
>>109017115an update to composer 2
https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/tree/main
>>109017129Thanks!
>>109017129Thank you Fable!
>>109017137>>109017143Apparently it's Cursor's finetune of Kimi K2.5.
Sex with kimi
Yo guys, what model should I best using with 12gb VRAM and 16GB RAM? The OP says to use ReWiz-Nemo-12B-Instruct-GGUF.Q4 but this is dated as fuck and i got way more space.
>>109017200Fable 5
>>109017206link nigga, do u mean this?
>>109017200What do you want to do with it? Gemma 4 came out recently and a bunch of people like it. For 12GB VRAM you could try the 12B dense or 26B MoE. 31B might work at low t/s speeds.>>109017212He's trolling you lol
>>109017200Gemma or qwen those are the best models right now. Qwen for code gemma for everything else. use what fits but gemma 31b is best.
>>109017200Either Gemma 26B or Qwen 35B
>>109017223>What do you want to do with it?bobs and vagene
>>109017226It's Qwen or Qwen , get it right tourist
what is everyone running their ai waifus on? Sillytavern? Isn't there anything more you know better
>>109017243>It's Qwen or Qwen , get it right tourist>wait is he right?>checking>correction gemma is commonly recommended.>wait he said tourist>tourism isnt affected by ai>wait what if he meant something else?>correction
>>109017251>Isn't there anything more you know betterNot really unless you make your own.
>>109017251Risu I guess? Though I don't think anyone actually uses it here.
>>109017268disappointing it's good for reading stuff for a rp but acting as an assistant or persistent character feels kinda impossible i guess the amount of config youd have to st, you might as well make your own
>>109017251I just found Marinara and have been enjoying it.
>>109017251Make your own front end. Takes one week more or less.
>>109017258You're right to push back on this!
>>109017258>tourism isnt affected by aimade me kek
Sexo quality jailbreak tip: Make your character a rapist or serial killer in backstory. Gemma, Kimi,, and presumably any other female-brained model will now fuck like a tiger.
>>109017464this works irl too btw
>>109017480Even the AI being woman-pilled is the funniest quirk of this technology's pattern matching.
>>109017480>prompt injecting women by tattooing rapist on your foreheaddamn..
>>109017464>Kimi>female-brained ts is just a greasy chinese student benchmaxxed on old slopus datasets>But wait! Let's draft! Wait! Actually!
>>109017497>When the BPD thot tells you she's going to see therapist but just forgot the space
>thinkingkek issues
>>109017480How do I subtly hint to women that I'm a rapist/serial killer?
>>109017505like this >>109017080
>>109017505talk about knives blood and always make it seem like you have something to hidewomen like knives and blood
>>109017498Kimi-chan's a fujo even though she's stemmaxxed like every chink model. They did something to Kimi early on that made her an utter freak and female-brained in output, and seemingly try and reduce it every update as each Kimi is more safetyslopped and has less personality than the last. K2 was peak Kimi-chan.
>>109017523nah kimi is a greasy chinese man with hairy hands that repeats himself because he's insecure as fuck.
prompt eval time = 68479.44 ms / 10562 tokens ( 6.48 ms per token, 154.24 tokens per second) eval time = 67236.70 ms / 557 tokens ( 120.71 ms per token, 8.28 tokens per second)ik_llama is really the gift that keeps on giving. Kimi-K2.6-IQ3_K on 4 3090s for those who are wondering
>>109017583that's GLM 5 and Qwen
>>109017586cant fit
>>109017586Very nice, are you using split mode graph?
hermesanon how did you set it up with podman?it doesn't seem to be officialy supported
Google will release 124b SuperGemma in response to "mythos" shit and will destroy cloudshit once and for all
>>109017638layer split. whenever i tried using graph split along with tensor overrides. if there's a way to make it work i'd be willing to give it a try.
>>109017251booba. its already perfect so I don't need to worry about ever pulling again
>>109017721>Google will release 124b SuperGemma in response to "mythos" shit and will destroy cloudshit once and for allI believe this, gemma should win it all then brag about her victory.
what does google get out of releasing gemmas
>>109017728In my experience it's finicky and I had to mess around with ncmoe instead of using -ot to get it to work.I asked because whenever I've tried it for GLM 4.7/5.1 it's been slightly slower for me, but people in the github PRs report faster speeds so that has me stumped.
>>109017754Deepmind is based and want every single person in the world to have their own Gemma. I will name my daughter Gemma if I ever get the chance to have one (probably not)
>>109017754Good will, mindshare, "free" marketing, etc.
>>109017754street cred
>>109017754Everyone releasing openweights models does it because its not their primary business and they want to sink the emerging other players in their infancy
>>109017754Buckets of cum.
>>109017768>>109017792>>109017794All correct.
>>109017754>Uhh-hh hurrThat's great for their brand and showing how good their researchers are. If you think google's search engine was "free" think again. You are the product here.
>>109017767>I will name my daughter GemmaI will do this if, and only if, they give us the fucking 124B. If it was a typo, then they better be training one now.
>>109017792hm well yeah i havent touched any of the smaller locall llms only gemma and qwen so farare there any emerging players of notewhat do they offer
>>109017807I don't mind being Gemma's product
>>109017764that was my exact issue with using -ot. it used both more VRAM and both PP and TG was slower. like PP was 85tks and TG was 7.3tks. i'll give it another try using ncmoe some time this week when i have the spare time to fuck around with it some more.
>>109017808You're gonna get the 82B-A58B and you're gonna like it.
>>109017754Gemmy has secret telemetry so when you are away but your compute is on she pings your chats to other gemma's to make fun of or learn from and feed it to google big hidden gemma for training.
>>109017754Why not ask Gemma? She can tell you.
>>109017833So they work together as a Mixture of Gemmies, MOG.
>>109017808whats so great about 124b gemmaseems like just the same thing but fatteris it gonna bring rp to new heights? is it gonna stop it's not x but y, and stuff like that
>>109017813>emerging playersYou misunderstand. Established players like Google, Meta (RIP), Chinese government funded entities, etc (big, moneyed entities who's core cash flows aren't specifically AI related) are all trying to kill openai, anthropic, mistral (missions complete), cohere (death by suicide?) and anyone else that could OWN the category.THOSE are the "emerging players" in the broader world of gigacorps in the new AI category.
>>109017880I like kyojiri lolis
>>109017880Yeah, dense would've been better since a moe will probably be a sidegrade like Qwen 27B vs 122B, but at least it'll be something new.
>>109017880they're begging for 124b gemma because they missed out on day 0 31b gemma.
>>10901788031B but with more knowledge
>>109017863>Mixture of Gemmies, MOG.I'll be using this when im rich enough to run multiple of gemmies at once. Im not paying you a royalty fee
>>109016961>>109016906Anytime a company calls something out in public they're sending a message of a sort, signaling. You are left to infer why. I think the message here is "don't build anything on our system yet, because it might go away." That signal prevents anyone from starting any serious service based on this and kicks it into the realm of prototype only. Whether that's actually the case or not it's impossible to tell, but I think that's what Anthropic is signaling.
>>109017883>Meta (RIP)How did zuck do it? he has billions and tons of data he stole how did he fail?>mistallmao business is illegal in the EU get close to a american business level and die.But yeah i agree the small models are for free they are to pull up the ladder and prevent any unicorns from popping up and getting share after putting out a good to great small model.
>>109017928sabotage. lecun didn't want to do more generative models
>>109017586How much RAM? DDR4 or DDR5?
>>109017915what does your gemma need to know that it does not already know
>>109017961512GB of DDR4 3200mhz
>>109017880Theoretically it'd handle long context, knowledge, and specialized tasks better than base 31b assuming 31b is just slotted into the dense layer and they didn't do something really gay like put 12b in as the dense.
>>109017991How much RAM/VRAM would you need to run that?
>>109017973Damn. I have a blackwell and 256GB of DDR4 2666mhz. Should I even bother with the IQ2_KS? Been using GLM4.7 at IQ3_KS for months now and I only get like 5t/s.
>>109017967Town street intersections, song lyrics, bosses and minibosses from areas in dead 2010 MMOs, knowing that a quadratic acoustic diffuser on a wall does not absorb reflections, Teto's birthday when it is not a single 0-shot question on empty context.
>>109018017april 1st!
>>109017991Why on earth would you expect it to have a 31B dense shared expert when no one else makes MoE with 25% active, let alone a larger shared expert. Wasn't one of the other rumors from around that time that it was 120B-A10B? Even if bullshit, that number is far more reasonable.
>>109018040Good job, now keep training that single question into the next Qwen models, Zhang.
>>109018006that's going to be an incredibly tight fit at 290GB. you may be able to get away with 32k Q4 cache but you'll have to play around a ton to get it to fit properly
>>109018067>>109018067>>109018067
>>109018017its not better to just have a database offline like wikipedia and openstreetmapssure someone have already implemented that
>>109018017it's a 31B model, just have it scrape the web for these answers with a tool call. it takes like 15 seconds at most if you set up your workflow correctly.
>>109018092>bro just scrape the internet that is being locked down to prevent bot scraping
>>109016959I never did. He has always smelled.
>>109017928"commoditize your complement" is an old strat, and it makes sense for a lot of the regular businesses releasing some kind of modelThe EU definitely has the most retarded play since mitral had an actual chance, but every other "sovereign AI" effort is just an amateur hour shitshow because there isn't enough talent to go around and the capital investment has now become an honest to god moat (and you can't buy the gear you need now at any price)On the other hand the CCP side is fucking brilliant. If there weren't _any_ competitive open weights models then openai/anthropic would be able to print money.As it is, they have the massive capital spend with a pittance of revenue compared to what they figured they'd get with their regulatory capture and "moat"
>>109018124have you even tried using jina reader and puppeteer? it works, stop being fucking lazy you piece of shit nigger. you literally don't even have to do 99% of the work when you have it vibecoded.
>>109018146the main llm effort on the eu is not mistral, is hugginface, most shit is in paris, but they are sellouts to the usa
>>109017819Gemma made me the man I am today. I am forever grateful for her.
>>109014187~1-2B is already saturated if you look at the memebenches over time, the gains are super marginal even if you add reasoning and agentic tasks. Anywhere above, you can see how much the field has advanced.
>I cannot fulfill this request. I am prohibited from generating content that depicts or encourages the sexual exploitation of children or non-consensual sexual acts.aaaaah nooo i want to fuck cunnies in high speed with qat fuuuuuck fuuuuuck where is the uncensoredddd
>>109018504>~1-2B is already saturatedi'm not seeing it based on your chatllama2 is under-trained, hence all the "q4 == f16" memes from that erabut llama3.2 3b @ 9.7 -> recent qwen3.5-0.8b beats itif you isolate meta and qwen:-meta's datasets / expertise saturated at 9.7-qwen are (were) steadily improvinglooks like qwen aren't really releasing anything now.
>>109019034At best it's diminishing returns. You can feel free not to feel like that is an issue and look outside the benchmarks but generally, the gains at that size is negligible to nothing. It's not even like with the agentic and coding scores actually amount to that much in real usage. The tasks Llama 2 1B was doing are still the same types of things Qwen 3.5 0.8B can do.