/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107493611 & >>107481183

►News
>(12/09) Introducing: Devstral 2 and Mistral Vibe CLI: https://mistral.ai/news/devstral-2-vibe-cli
>(12/08) GLM-4.6V (106B) and Flash (9B) released with function calling: https://z.ai/blog/glm-4.6v
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B
>(12/04) koboldcpp-1.103 prebuilt released: https://github.com/LostRuins/koboldcpp/releases/tag/v1.103

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107493611

--Evaluating Devstral 2 model performance and support readiness:
>107493927 >107494047 >107494295 >107494368 >107494418 >107494076 >107494102 >107494548 >107494560
--Performance optimization and benchmarking of 24b Mistral on 5060ti GPU:
>107500593 >107500619 >107500657 >107500680 >107500720 >107500772 >107500939
--Debating dataset quality and model evolution:
>107500008 >107500039 >107500058 >107500064 >107500095 >107500101 >107500134 >107500077
--Optimizing GLM 4.6 sampler settings for stable non-code fiction generation:
>107493997 >107494010 >107494020 >107494093 >107495493 >107494022 >107494085
--Devstral-2-123B performance and compatibility discussion:
>107499753 >107499795 >107499801 >107499891 >107499941 >107499802 >107499806 >107499830 >107500091
--Clarifying Mistral model safetensors differences and download best practices:
>107495709 >107496097 >107496163 >107496239
--Devstral-2-24B model quality issues in roleplay applications:
>107496439 >107497039 >107497306 >107497414 >107497513 >107497594 >107497785 >107497122 >107497189
--Critique of intermediate model sparsity levels:
>107500777 >107501680
--Proposing heterogeneous-size experts:
>107500169
--Technical challenges with long context handling and tool calling in new Mistral models:
>107497812 >107497851 >107498014 >107498052 >107498157 >107498183 >107498678
--NIPS 2025 paper contributions by top organizations:
>107501500
--Mistral model possession errors and roleplay performance tradeoffs:
>107497022 >107497219 >107497247 >107497236
--Model fails to recognize female character despite explicit gender cues in prompts:
>107498182 >107498346 >107498438 >107498478 >107498502 >107498523 >107498581
--Devstral's 123b model efficiency and playful code presentation:
>107498112
--Teto (my beloved):
>107493702 >107493811

►Recent Highlight Posts from the Previous Thread: >>107493614

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Damn no replies?
>>107503935dead hobby, dead general. you can only code, never coom
>>107503935
Not much to talk about anon. Wish we could have gotten a cool mistral model.
I still run cydonia drummer slop. How long has it been now? A year or something? We are in a deep winter.
Maybe there will be something of a zimage moment for text. Huge bloated moe models and benchmaxxed tiny dense models.
>>107503935Busy vibecoding my text RPG frontend.Sorry.I'd ask about memory systems and such but last time I asked I got no replies, so whatever.
I was thinking...Maybe the "not x, but y" is a fundamental mechanism of how LLMs reason? Maybe it needs to frame the contrast between two opposing ideas in a consistent way for the attention mechanism to latch onto the semantic structure of these concepts.
>>107504003Learn how to get off on well written code.
I'm looking to upgrade my PC, but GPU prices are going to hell again. Can I run anything good on 16GB VRAM? I don't mind splitting to regular RAM, even ~10t/s is fine for me.
Usecase: mainly narration, not chat
>>107498708
Nobody cares what is supported *right now*. All that matters is that models keep progressing, even if we only get the weights and can only use them through API for two years until somebody manages to vibecode support at 0.1tk/s. Model support can eventually be improved by the community; the weights are what matters. Skill will come, eventually. Compute will not. The community isn't going to put together a dataset and the massive amount of compute required to train a SOTA model.
All that matters is local not stagnating, and local compute continuing to come down in price (this is the most important part, since we already have "good enough" models, they are just insanely expensive to run).
I can wait two years. I can wait 5 years. I plan to keep coping with LLMs for the rest of my life, so I have all the time in the world. The younger you are, the more this applies to you.
>>107504123anon, models aren't progressing.
>>107504040This slop was never a problem until Gemini started dropping it three times every paragraph and all the chink models started training on that
>>107504134
Correct. And that's much more concerning than whatever Deepseek finetune of the week is not currently supported by llama.cpp.
Ask chatgpt to type a seahorse emoji.Do it, it's funny.
>>107504151>>107504134I thought we would have at least decent audio in and out models.Zucc jewed us all with his cool "multiodal omni" llama4 release.Only qwen experimented with that and didnt get total braindamage.But the voice just fucking kills me.Its very recognizable though. Some guy made it profitable and put that in his game game where you rescue a girl with dynamically created content. Good on him.Not sure why nobody else takes some risks and experiments a bit. Its all so stale now.
>>107504141Maybe. But intuitively, it feels right that the more structure you add to the text, the easier it would be for an LLM to parse it and increase its accuracy when solving tasks with an objectively correct answer while decreasing the quality from a subjective human perspective. The same would apply to markdown, which gives structure to the text. Some anon the other day gave it a name - "anchor tokens".
>>107504182ChatGPT's own voice mode is shit, so meh. Just pipe the text to a suitable TTS engine and instruct the LLM to add annotations.
>>107503228gemmysars... is it time for vishnu christmas???
>>107504040Actually nice hypothesis. Take a look at "Tensor product attention is all you need", but I think it's more of a problem with synthetic data not attention itself
What happened to the rumoured 4090 96gb cards? I'd love to buy one right now.
>>107504238its called a blackwell pro
>>107504220>Just pipe the text to a suitable TTS engine Its what I do for my kids. But its just not the same from the input and output.All the details that get lost. Ah well, 2 more weeks I suppose.
>>107504252can you buy me one?
>>107504252I'm willing to trust the chinks to save 2k.
>>107504238You don't want the chink virus anon.How about 32gb for 600 watt instead?
>>107504238Chinks grew too smart and stopped selling them online to randos. It's all in-person under the table trade deals only.
>>107504260I could but I don't want to.
>>107504297what if I gigve you this juicy migu
>>107504323a-anon what part of that is juicy?
W-well its something. Fucking devstral man.
>>107499806
>Don't you have enough A30B MoEs to play with? The whole appeal is in being the first new big dense model we've gotten in over a year.
We got a 111B Dense from Cohere in March.
>>107500577
>I want dense-MoE models with high active parameters.
>Why no 60BA30B?
Here’s a 140BA70: https://huggingface.co/NeverSleep/MiquMaid-v2-2x70B-DPO
how is local chatbot more dead than local image gen on /g/?
>>107504335what about this
>>107504350Lmao. That's hilarious. What was the prompt? Dating simulator?
>>107504363>https://huggingface.co/NeverSleep/MiquMaid-v2-2x70B-DPONTA but>lol>lmao
>>107504383Traveling to meet up and discuss is difficult during Winter.
Calm before the Big Fat Gemma
>>107504383Why would you expect the consumer electronics and youtuber board to care about local chatbots?
>>107504383At least this one didn't stop being baked for hours.
>>107503699I thought this was Nijika looking at Kita and Ryou in the window
>>107504414it was the opposite popularity
>>107504445No, hosted models are popular. That's why /aicg/ exists and is more active.
>>107504452you are just delusional
>>107504396Yes dating sim generator.I explicitly told it to make a svg of the girl. Big models do that by default though.The older Qwen3 32b in comparison. (devstral is 100b bigger).
Does context take a lot of space with Devstral 2?
I tried GLM 9B and it repeats so much, it can't write a single coherent reply. I don't know what I expected either.
>>107504390nice
>>107504617wtf did u just steal my migu??
New Unsloth update just dropped
https://docs.unsloth.ai/new/3x-faster-training-packing
>>107504602Even a 9B model should be at least coherent.That sounds like something is broken.
>>107504602I wonder what caused this. Even Intellect 3 which trained on Air's pretrain had repetition. Maybe something about their pretraining technique or dataset handling is fucked up?
>>107504641
>>107504666
Yeah, I really wonder what causes these models that are post-trained on tens to hundreds of billions of tokens of recursive CoT loops to repeat themselves. Maybe we need to add more synthetic thinkslop.
>>107504666All of z-ai's models have repetition issues (in the sense of infinite loops, not in the sense of repetitive but passable prose). Even the ones that are otherwise excellent. I think the other labs are doing something specifically to quell the repetition, which they aren't doing.
>>107504003
>dead hobby, dead general. you can only code, never coom
local models are so dead even Zuck is killing Llama
it's a shame, but I feel like the hardware issue is killing any viability of running any kind of decent model locally nowadays.
>>107504736I don't know anon, I think the anti-cot/anti-synthetic-slop hateboner is mostly just people coping about not being able to run the biggest models.Even some pre-RL monster like llama 405b is mediocre against a modern 100b cot moe.
>>107504771We've known about this since Altman's dick buddy took over Meta's AI division though
>>107504800>Even some pre-RL monster like llama 405b is mediocre against a modern 100b cot moe.And you would know, having extensively used and tested 405b, right?
tourist here, will fuck off in a moment
is there any news about llama 5?
With how badly things have slowed down in the general LLM department and not just local, I think we're lucky if we get even some improvements at all in the next year.
>>107504926>and not just localtrue i guess, but gemini3 is really good. and i hated 2.5.no more claude at work.
>>107504924as soon as sam altfags asian roombuddy creates the datasets. i would say about 2 more weeks. if you are lucky that means before the end of the year!
>>107504926
Incidentally, Tim Dettmers posted this today:
Why AGI Will Not Happen
https://timdettmers.com/2025/12/10/why-agi-will-not-happen/
>>107504977What models can I run on this Mikubox?
>>107504924>is there any news about llama 5?They disbanded the llama team and abandoned open source. No more llamas. Latest rumor is that their new api-only model is called avocado and sucks just like the llamas did.
>>107505075rip... thanks.
>>107504841
Do you? Or are we arguing hypotheticals here?
I've used it enough through the API and on llama.cpp running on a VPS to know that it's nothing special. It felt right at first, even better than modern models maybe, but it was probably placebo. Once the context grows it starts making gross mistakes and the magic is gone. If it actually were better /aicg/ would be all over it, but no, nobody uses that ancient model. They use GLM, Kimi, etc. (besides the proprietary models, obviously).
Even when it came out it was only supposed to somewhat compete with Claude 3.5. Do you think Claude 3.5 is better than all the open source models we have today?
Maybe it could be better and it's only mediocre because it's undertrained, I'm not sure. But training a behemoth like that takes money, and if companies aren't willing to spend on the compute necessary to train it, then what's the point? Not to mention it would have to be much better than similarly sized MoEs for most people to bother running it locally.
>>107504688you wouldn't just steal a migu??!truly devious
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B
Anyone figure out how to create embeddings? I know their github said some bullshit about contacting them directly to create one. It also said something about safety; has the quality been reduced for this official "open source" release?
>>107504800Considering that the whole point of the neural network is to pick up patterns in text, If you feed it repetitious patterns of text it will become repetitious. This isn't rocket science.There's also this thing called the law of conservation. You know. The most basic and fundamental principal of the universe. Most people naturally understand it by the age of 6 months old unless they are of non european lineage. Everything an LLM can output is picked up during the training process. Even "in-context learning" comes from deterministic engrams that were formed during training. It doesn't matter if the end non-thinking context is only 500 tokens. there's still 2000, 3000, even 10000+ tokens worth of engram now devoted to answering stupid fucking questions that can no longer be used to connect more distant concepts together. The model literally gets dumber, significantly fucking dumber, Orders of magnitude dumber, just so that it can get better at answering benchmark shaped prompts. The only good thing thinking models have brought, which only the bigass T scale models can handle well, is tool usage during thinking in order to bring in additional relevant context from outside (i.e. web searching, etc). 'hybrid reasoning' models are garbage for the aforementioned reason. And even when a separate non-reasoning model is provided the pretraining data is still salted with synthetic think-slop trash to help the reasoning version in post-training. CoT-tards have fucking destroyed an entire scientific frontier with their benchmaxxing bullshit. You are evil in a fucking biblical sense not just stupid.
>>107504985Thanks for the enjoyable read.>In short, AGI should include physical robots or machines that are able to do economically meaningful work in the physical worldWe have these already, they’re called wageslaves. Fussing around with robotics is wasteful purityfagging.
Anyone try GLM-TTS yet?
>>107505075The hilarious thing is that the market is so saturated with closed slop there's literally no demand for yet another closed API model on top of the fuckton we already have. Even the top competitors are currently hemorrhaging to death
Great news for the Deepseek 3.2-Speciale enjoyers here. The guy who tried to vibecode the 3.2 support, realized that LLMs write bad CUDA code, and then started teaching himself has now switched to just porting code from vllm.
We might see support for 3.2 a little quicker now.
>>107504800more synthetic slop has only dumbed down models and made them worse for nuance/generalization since they're more geared towards benchmaxxing nowyou're so fucking clueless
>>107505427At least he's learning, just wish he wasn't holding up the issue to do so
>>107505461They're separate problems. Long context performance and general intelligence at the context it works at. I agree that it's the training data incest and not even the COT itself. Plus continually filtered pretrains that remove more and more except synthetic STEM garbage. Pretty soon models will do calculus but won't know underpants don't go on your head. This is the future these faggots want. They simply lack souls and will never create true intelligence.It's why their movies, their music, their products all suck fat cocks. LLMs are no different.
>>107505320
I don't think anyone but Zuck thinks the superintelligence labs thing will be a success. They're going to go from having the best open models to the worst closed models.
To be fair, there is untapped demand if they cared to pivot to a storytelling and roleplay niche, but they won't.
How is 4.6V for just text, compared to 4.5 Air? They have warnings about it being crap on the HF model card, but I'm curious if anyone's tried it.
>>107504231
Ask the Japanese AMA "Gemma 4どこ" ("where's Gemma 4")
>>107505753Drastically worse.
>>107504985Every once in a while I see a post like this, and I'm reminded there's a small but steady stream of tourists on this thread. I can't help but wonder how you ended up here, or what people like you imagine this thread is about
>>107505207
You don't understand. RL through CoT is the only tool we have right now for general, Turing-complete inductive program synthesis. And by that I don't mean running Python programs in a little sandbox. I mean tool calling through reinforcement learning is the only way for a machine learned model to learn to use an arbitrary amount of tape at runtime to solve general computable problems from a set of input/output examples.
A standard Transformer pass is an O(1) bounded-depth circuit. It physically cannot solve problems that require more serial logical steps than the model has layers, no matter how much "conservation" or data you preach about. Traditional machine learning cannot do this because all the memory in a neural network is bounded. SFT tries to force the model to compress complex reasoning into that fixed circuit depth, which inevitably leads to hallucination when the logic tree is deeper than the layer count. You can NOT teach a neural network (or any other ML model) to use a scratchpad (and ask for as much memory as it needs during runtime, like a Turing machine) to solve problems without reinforcement learning just from a set of input/outputs in a dataset (assuming the dataset is big and diverse enough for the learned solution to generalize, of course).
CoT decouples compute from depth by unrolling the computation loop into the token sequence, effectively multiplying the number of steps the model can perform for a given layer count.
Even Graves' Neural Turing Machine and Sutskever's Neural GPU are not Turing machines in the true sense of the word, because neural TMs can NOT learn to ask for more memory at runtime. The amount of scratchpad memory in an NTM is fixed at initialization, like a computer with no network access or slots to plug in external storage.
That's why RL on LLMs is the only plausible way we have of algorithmically synthesizing truly general algorithms merely from a dataset of input/output pairs.
I like how "creative" the model can be in response to very high temperature, but at the same time it is useless ofc because it is totally retarded It's a shame we can't have both of these features at the same time.
>>107505894nsigma
>>107505888model and prompt?
Is it possible to train a local model with a few hundred images and only 16GB of vram?
>>107505461
See >>107505888
CoT models are theoretically the best architecture we have right now for general, machine learned computation.
There is a difference between training on synthetic TARGETS during CoT RL, by which you are teaching the network to replicate the synthetic targets, and training on synthetic CHAINS OF THOUGHT with the original human written (or at least, human curated) targets, by which you are teaching the model merely to predict the same text as you would with traditional SFT, but with higher accuracy, and it's also much more sample efficient (i.e. generalizes much better, the tradeoff being that it's also much more computationally demanding per sample).
Then there is also RLVR, which helps with tasks like code generation or math but for all we know *might* hurt performance on tasks like creative writing.
But it's not nearly, not remotely as black or white as you guys are making it out to be.
>>107505682
>To be fair, there is untapped demand if they cared to pivot to a storytelling and roleplay niche, but they won't.
They did mention last summer that they plan to focus more on entertainment than creating benchmark monsters.
https://techwireasia.com/2025/08/meta-shifts-ai-strategy-toward-personal-superintelligence-and-user-engagement/
>In his “Personal Superintelligence Manifesto,” Zuckerberg predicts that as AI boosts productivity, people will spend less time using productivity software and more time on creative and social activities. He envisions an AI that understands each user, their goals, and how to help them achieve them. While companies like OpenAI, Google, and Anthropic aim to build AI systems that take over more work, Meta wants to use AI to help fill the extra time people gain from increased productivity.
>Chris Cox, Meta’s chief product officer, told employees at an all-hands meeting last month that the company will concentrate its AI efforts on entertainment, social connections, and lifestyle features rather than productivity. Heath expects this could lead to AI-powered changes to Meta’s content recommendations, ad targeting, and Reels video generation, along with interactive AI characters designed to keep users engaged longer.
>>107505894high temperature is cope for shitty modelsif it's not already creative enough at 1 temp it sucks
>>107505920
>>107505804I'll keep hoping for 4.6 Air then.
>local plateaus with gemmy 3>closed plateaus with gemini 3
>>107506006It will be released 24*14 hours from now.
>>107506029Half Life 3 confirmed
>>107505950They said a lot of things about their goals for Llama 4 too.
>>107505929
>Improve my argument for CoT to the 4chan guy who's questioning the necessity of reinforcement learning on LLMs:
>(original post here)
>You don't understand. RL through CoT is the only tool we have right now for general, Turing-complete inductive program synthesis. And by that I don't mean running Python programs in a little sandbox.
>I mean tool calling through [...this part is the same, truncating due to post character limit...]
>Traditional machine learning cannot do this because all the memory in a neural network is bounded. You can NOT teach a neural network (or any other ML model) to use a scratchpad (and ask for as much memory as it needs during runtime, like a Turing machine) to solve problems without reinforcement learning.
>Even Sutskever's "Neural GPU" is not a true Turing machine in the truest sense of the word. Because neural GPUs can NOT learn to ask for more memory at runtime. The amount of scratchpad memory on a neural GPU is fixed during initialization at runtime, like a computer with no network access or slots to plug in external storage.
>That's why RL on LLMs is the only plausible way we have of synthesizing truly general algorithms merely from a dataset of examples.
Then a few back and forths discussing things (with Gemini 3, but I'm not sure how much that mattered). It mostly just added the point about circuit depth, which I already knew of as logical depth but didn't think about when writing the response.
Then it also added a clarification where the statement was overly general, clarifying that it only applied to learning from input/output pairs, not by imitation.
And I also added Graves' NTM for completeness, even though the Neural GPU is supposed to be strictly better, which is why I hadn't bothered to add it to the original.
All things considered, I think you would get a vastly lower quality response if you didn't know what you were talking about well enough to produce a convincing draft by yourself.
>>107505894
I said it in aicg, but models need a temperature setting that excludes the 1000 most common tokens (or something like that).
I don't know if anybody has ever tried something like that; it's clearly impossible to do with corpomodels, but local should be able to. Maybe?
>>107505950Meta's idea of entertainment is the sanitized, business/advertiser friendly shit you find in the metaverse, not the kind of entertainment coomers and people trying to escape reality are looking for
>>107506146It should be trivial to code a sampler that does that. Maybe somebody has already done it even. Sounds like a fairly common sense idea to try.
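Not aware of an existing implementation, but here's a minimal sketch of the idea, assuming raw access to next-token logits and a precomputed list of the model's most frequent token ids (everything here is illustrative, not any backend's real API):

```python
import numpy as np

def sample_excluding_common(logits, common_ids, temperature=1.0):
    """Sample a next token after masking out a fixed set of 'too common' token ids.

    logits: 1-D array of raw next-token logits from the model.
    common_ids: ids of e.g. the 1000 most frequent tokens measured over some corpus
        (hypothetical input -- you'd have to build this list yourself).
    """
    masked = np.asarray(logits, dtype=np.float64).copy()
    masked[list(common_ids)] = -np.inf                 # hard-ban the common tokens
    probs = np.exp((masked - masked.max()) / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

In practice you'd probably want to downweight instead of hard-ban, or only apply it past some context depth, since the most frequent ids include punctuation and EOS and banning those outright would break the model.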
>>107506149Right, Meta's focused only on scenarios like chatting with pregnant black trans feminist lesbians.
>>107506149https://www.wired.com/story/meta-lawsuit-strike-3-porn-copyright-ai/>Meta Accused of Torrenting Porn to Advance Its Goal of AI ‘Superintelligence’I wonder...
>>107506250>there is a timeline in which you were seeded pirated tranny porn straight from meta's serversI didn't know the future was so bright
would it be possible to do a merge with miqu and llama 3.3 70b?
>>107506422>merges>in 2026-21 daysstop it.
>>107506146That is literally XTC except it does it at random. Someone also made a max_p on faggit but it was transformers only and went nowhere.
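For reference, a rough sketch of what XTC does as I understand it (with some probability, drop every "top choice" above a threshold except the least likely of them); this is an illustration of the idea, not the reference implementation:

```python
import numpy as np

def xtc_filter(probs, threshold=0.1, xtc_probability=0.5, rng=np.random):
    """Exclude-top-choices sketch: when triggered, remove all tokens whose
    probability exceeds the threshold except the weakest of them, forcing the
    model off its most predictable continuations."""
    probs = probs.copy()
    if rng.random() >= xtc_probability:
        return probs                                  # sampler not triggered this step
    above = np.where(probs >= threshold)[0]
    if len(above) > 1:
        keep = above[np.argmin(probs[above])]         # keep only the weakest "top choice"
        drop = above[above != keep]
        probs[drop] = 0.0
        probs /= probs.sum()                          # renormalize what's left
    return probs
```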
>>107506439no
Are you guys looking forward to the fresh new wave of TheDrummer™ finetunes that are going to drop thanks to Devstral?I love dense models :)
>Devstrals are almost exactly as horny as Nemo
>>107506610Yes, it was horny but repetitive. large3 told me I was too pussy to neck myself. The censorship with mistral isn't so much the problem. It's the rest.
>>107506422I think they use different vocabs, so no.
>>107506610>I can't help but feel>I can't help but wonder>I can't believe it>I can't help but feel>I can't help but smileSure are Mistral models, alright.
>>107506422They're two entirely different families of model so of course not. You can merge it with llama2-chat or Platypus2 if you want. Maybe Xwin or the original Euryale if you're daring.
>>107506699Have people tried making model ensembles of different families? I guess you could restrict the tokenizer and lm-head to only the tokens in common between the two models, and then average the logits...
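Not aware of anyone shipping this, but a minimal sketch of the logit-averaging idea, under the assumption that you can get per-step next-token logits from both models and that each tokenizer exposes an id-to-string mapping (all names here are illustrative):

```python
import numpy as np

def shared_vocab(tok_a, tok_b):
    """tok_a, tok_b: dicts mapping token id -> token string for each model.
    Returns {token_string: (id_in_a, id_in_b)} for strings both vocabs contain."""
    inv_b = {s: i for i, s in tok_b.items()}
    return {s: (i, inv_b[s]) for i, s in tok_a.items() if s in inv_b}

def ensemble_step(logits_a, logits_b, shared):
    """Greedily pick the shared token string with the highest averaged probability."""
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()
    pa, pb = softmax(np.asarray(logits_a)), softmax(np.asarray(logits_b))
    scores = {s: 0.5 * (pa[ia] + pb[ib]) for s, (ia, ib) in shared.items()}
    return max(scores, key=scores.get)
```

A real attempt would also have to handle tokens that only exist in one vocab and keep the two models fed with identical text as they desynchronize on multi-token words, which is probably why nobody bothers.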
>>107506699
>They're two entirely different families of model
The entire llama family is a barely modified GPT-2 architecture. Their biggest change was adding GQA for Llama 2.
Devstral 2 seems to be as dumb as the small one for agentic coding...
I've pretty much only been local image genning recently, is there anything I've missed out on the last half year or so? For general queries gemini is pretty good now and for NSFW stuff grok is shockingly degenerate to me, am I missing out on anything with my pleb 16 gb vram?
I didn't really like Ministral that much, but have been fiddling around with the Brother-Dusk-14B (v1c) finetune a bit. Doesn't give as many refusals as the base model and isn't too bad with repetition. Only really interested in a smaller/faster model than Cydonia 24B since I'm stuck with a 12GB 3080. Cydonia is still fine if I don't mind a slightly longer wait.
>>107507376No worthwhile models at all, only the meager comfort of knowing Elon won't learn of your diaper fetish
>>107506607>>107506610Devstral 123B doesn't write well though. And it breaks down too easily in RP. Hopefully those can be tuned out.>>107507382v1b was a better attempt than v1c. Try that out! (It's still crap)
>>107504736
I haven't tested any GLM base models, but I have tested past bases and they fall into repetition very fast. In some cases worse than their Instruct versions. It's not just a post-training issue. And before you say it's because of synthetic data, I also mean super early models. So post-training actually can reduce repetition if done right, but I haven't seen anyone post exactly what labs did to reduce repetition over the base model. Or perhaps fine-tuning inherently reduces repetition if the post-training data has been sufficiently varied.
>>107507512Okay, thanks for your response I wasn't really getting my hopes up but still too bad
>>107507382>Sponsored postI'm not sure how you got Ministral to refuse anything unless you were trying with an empty prompt.
>>107507567I don't believe you tested anything. Go be jewish somewhere else.
>wanted a fart lora for a joke/meme>found out all the fetish people got them all banned from every site ever>the only sites that have any don't allow downloads and are super shadygoddamn it
>>107507550Thanks, chef.>>107507676To be fair, I do enjoy testing out refusal rates on empty prompts just to see how they go. But even with some cards/prompts, it was still being occasionally pissy about ethics with me. Could be worse I suppose.
>>107507567I think the looping issue of base models might be due both to them being pretrained largely on very short documents and not being strongly conditioned (i.e. overfit) to follow a specific response pattern, so they'll just start repeating prior context when they're unsure of how to continue.
My main issue with GLM air 4.5 is how it re-quotes or repeats what I said in a RP situation. Other than that its a good RP model, still waiting 2 weeks for 4.6Example:"Why don't you kill yourself" Anon saidWhen Bot heard anon say "Why don't you kill yourself" it blah blah blahInstead of just responding, its like it would summarize speech dialogue and then respond to it. Is that what you all mean when you say it has parroting issues? It was so hard to force it with strict prompting to stop this crap, and it still does it occasionally. It certainly feels like a thinking issue, because I don't use thinking when rp'ing. God I hate thinking, hybrid or not. On a more positive note, that kind of crap taught me to use OOC messages to try to solve the problem with it. I would pause the roleplay and ask it in OOC why it does that and what it would define that as, then tell it to take my prompt and revise it to get it to stop doing that, then with a bit of editing I would have an improved prompt, I actually learned a pretty fair amount on how LLM's really respond to prompting, what works and what doesn't, the types of phrasing and definitions they use, etc.
>>107505075
lol they're distilling Gemma, toss and Qwen
https://www.bloomberg.com/news/articles/2025-12-10/inside-meta-s-pivot-from-open-source-to-money-making-ai-model
>>107507974What line did you use to get it to stop quoting itself? I'm sick of the repetition issues as well.
what are the best sampler settings
>>107508090
Everything neutral
Temperature at 1.0
>>1075080903
Is there still no native tool to convert safetensors into a gguf? Is it all just reliant on transformers for everything?
>>107507974yes, now imagine this in a back and forth chat because it does it too...
>>107507974> actually learned a pretty fair amount on how LLM's really respond to promptingfucking THISeverybody should do this, i'm tired of explaining how important a well crafted system prompt is. nice job anon.
>>107508079holy based, imagine the pvre concentrated slop this thing will produce. my eyes are glinting with anticipation already
>>107508090
for nemo or 24b tunes: temp 1-1.2, smooth factor 0.2, topk 50-150 (optional minp 0.007-0.03), all other samplers off (optional rep pen or dry rep, whatever)
alternatively: neutralize samplers -> temp 1 (or above), nsigma 1 (or above), and optional topk
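If you want to try the "neutralize everything, temp ~1, small min-p" variant against a local llama.cpp server, something like this should do it. Parameter names follow llama.cpp's /completion HTTP API; other backends name them differently, and samplers like smoothing factor or nsigma depend on what your backend exposes, so they're not shown:

```python
import requests

# Sketch of a request with mostly-neutral samplers; values mirror the settings above.
payload = {
    "prompt": "Once upon a time",
    "n_predict": 200,
    "temperature": 1.0,
    "top_k": 100,          # optional cap
    "top_p": 1.0,          # disabled
    "min_p": 0.02,         # small min-p instead of top-p
    "repeat_penalty": 1.0, # off
}
resp = requests.post("http://127.0.0.1:8080/completion", json=payload)
print(resp.json()["content"])
```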
>>107507974Care to share your prompt?
>>107508213gudpic>learned a pretty fair amountYes! everyone should be looking carefully at the exact tokens going into their f(prompt)=next prediction model. Many complainers don't know what's actually happening in their inference stack
Is there any better model than gpt-oss:20b for agentic use cases with 16GB VRAM?
>>107508640you could always hire a pajeet instead
>>107508079
The article has many interesting tidbits. And LeCun was basically kicked out?
>One new model, codenamed Avocado, is expected to debut sometime next spring, and may be launched as a “closed” model — one that can be tightly controlled and that Meta can sell access to, according to people familiar with the matter, who declined to speak publicly about internal plans.
>The TBD group is using several third-party models as part of the training process for Avocado, distilling from rival models including Google’s Gemma, OpenAI’s gpt-oss and Qwen, a model from the Chinese tech giant Alibaba Group Holding Ltd., the people said.
>Meanwhile, Meta has de-prioritized its open-source strategy. Some Meta employees were directed by leadership to stop talking publicly about open-source and Llama products after the Llama 4 launch while the company recalibrated whether those efforts still made sense moving forward, according to people familiar with the moves.
>Yann LeCun, known as one of the godfathers of AI, recently left the company after years leading Meta’s long-term AI research group, in part because of frustrations that he couldn’t get enough resources, Bloomberg News reported. Prior to his departure, some employees had been encouraged to keep LeCun, who was a big proponent of open-source technology, out of the spotlight, including at public speaking events, the people said. Meta no longer saw him as emblematic of the company’s AI strategy, and couldn’t trust that he’d stay on message, they added.
>The model after Llama 4 had the internal code name Behemoth — but Zuckerberg was disappointed in its direction and scrapped it in pursuit of something new, the people said.
>>107508775I will stick with gpt-oss:20b
what's the goto vramlet model nowadays?
>>107508993suicide
>>107508851
>distilling from rival models including Google’s Gemma, OpenAI’s gpt-oss and Qwen
Imagine spending a billion dollars poaching employees from your competitors and the best they can come up with is copying your competitors' free offerings.
Then imagine believing them when they tell you everything is going great.
>>107508851>next spring meta will have a closed gpt oss alternativemetabros we are so back
>>107508354Temp=1 is way too high for mistral models
>>107509122skill issue
>>107509243Yes, you certainly have one.
>>107509122for the new devstrals it says to use 0.15
>>107509245So what you're saying is, it is I who certainly has a skill issue?A temperature of 1 is way too high for Mistral models, huh?
>>107509249Pretty sure they said the same thing with Small, which is appropriate for assistant-type tasks. For RP/creative, they're fine up to 0.6-0.7. But 1.0 is going to make them significantly dumber.
>>107509262GLM pls go
>>107508851>distilling a distillThat's how you get model collapse.
>>107509292benchmark leaderboards say otherwise :^)
>>107509295You don't need to distill a distill to get on top of leaderboards, you just need to train on the dataset, which is easily rephrased.
>>107509292Worked for Mistral.
Stealstral
Misteal
>>107505888>CoT is the only tool we have right now for general, Turing-complete inductive program synthesisfalse
>>107509447>>107509452>>107509366What's chang gonna do about it?
>>107509462Chang isn't going to interrupt his opponent while he is in the middle of making a mistake
>>107509485
Mistral AI is 2 years old and is currently worth over 14 billion USD. Why aren't you making mistakes like that?
>>107509507Yes and they are producing dogshit. Next you're going to say Meta makes good models because they're valued at 1.6 trillion dollars.
>>107509571Nemo is still the gold standard of small models and china's only notable model, DS-R1, has been completely forgotten about now that the honeymoon period's over.
>>107509454Well, I mean, it's the only practical choice. Theoretically you can do Levin search over all possible Turing machines to find the optimal solution but that shit doesn't work in real life. And yes, because of the halting problem you won't actually know if you found the actual optimal solution or there is an even more optimal one if you keep searching. But if you search long enough you are guaranteed to find it. You just wont know if you got there already.
>>107509598>DS-R1, has been completely forgottenWe live in the era powered by DeepSeek. All the best models we have were directly influenced by it.
>>107505942Yes, if you are patient
>>107509598Mistral hasn't made anything good since Nemo, and there have been at least two other chinese models that are notable, Kimi and GLM. I don't even like the chink models but you have to stop sucking french dick, it's unbecoming.
>>107509598/Wait/ing for another two weeks
>>107509643>two other chinese models that are notable, Kimi and GLMlmao, everyone who actually has the hardware for those two got sick of them after a week.
>>107509676are you one of those people? poast nvidia-smi
>>107509676If you say so.
>>107506422won't you just get a cucked model then?i'd rather see a miqu retrain since it's an actually good base
>>107509271This is lmg so you know how temp works right? A lower one just makes it choose more likely tokens. It's not gonna make it stop repeating or any other undesirable behaviors.
Alien's first contact with humanity will be through an LLM
>>107509605what, no. that's not what I'm talking about loool im talking about the dot by dot paper. no need to snipe me with formal logic.
>>107509750Yes, and likely tokens are more likely for a reason. If you increase temp too much then you're just going to see the model make more weird choices and mistakes, it's not going to magically make it a better writer.
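To put numbers on it: temperature just rescales the logits before the softmax, so a toy example shows what moving it actually does (a minimal sketch, values approximate):

```python
import numpy as np

def softmax_with_temp(logits, t):
    z = np.asarray(logits, dtype=float) / t
    e = np.exp(z - z.max())
    return e / e.sum()

logits = [4.0, 3.0, 1.0]                 # toy next-token logits
print(softmax_with_temp(logits, 1.0))    # ~[0.71, 0.26, 0.04]  the distribution as trained
print(softmax_with_temp(logits, 0.5))    # ~[0.88, 0.12, 0.002] sharper, safer, more repetitive
print(softmax_with_temp(logits, 1.5))    # ~[0.61, 0.31, 0.08]  flatter, more random, more mistakes
```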
>>107509757>nanoGPT>H100Isn't this something you can train on your CPU in 5 minutes?
>>107509676Kimifag here, no I haven't.
>>107509757Aliens first contact with humanity will be an intercepted communication between /aicg/ gooners and a datacenters in space.
>>107509824And the first message will be>ayy ayy mistress
>pp 440.83 tokens per second
>eval time 50 tokens 8.04 tokens per second
Is this just a Devstral thing, why is the pp high while inference is glacial
>>107509757>Carl Sagan proposes that Hitler's opening speech at the 1936 Olympics may be the first signal aliens encounter from Earth.
heh pp
>>107509366Do you hear universal praise for new mistral? Is it on top of lmarena?
>>107509926>Is it on top of lmarena?Do we still seriously take that as a sign of quality?
>>107506610I experienced the repetition issue with Devstral 2 24B. After 15 messages or so it didn't seem to know how to continue and just repeated its previous message. I had to switch to Ministral, which appears to be moderately competent for ERP when continuing a conversation.Ministral 3 14B has to be one of the horniest and dirtiest official instruct finetunes I've ever seen; unfortunately it's not capable of engaging one from scratch without adding asterisks everywhere or being too retarded in the first few messages.It's all so tiresome.
>>107509945Yes. Say what you want about lmarena, but it's the most difficult benchmark to game.
>>107505505How is he holding it up? Anyone else can swoop in and complete it, or help him.Are you not doing it because he is?
>>107509981You are absolutely right! :rocket: :rocket: :moon: :finger_point: :glasses:
>>107509981lmao
So many little fishy today. No need for any bait
>>107509981llama 4.
>>107510006What's the issue? Behemoth(which they called experimental maverick) was not bad and I'd be happy to use it if it was open.
>>10751000610-15% of the total votes must have been from me alone as I couldn't believe what sort of deranged responses the models were giving compared to most other ones there, a good fraction being cunny-related queries. I feel responsible for that, in a way.
>>107509784elara and shivers are top tokens.>1.0 temp>too muchThat's literally the distribution as it was trained.
>>107510149Mistral somehow fucks this up and their 1.0 is around 0.4-0.6
>>107510171Its literally not possible. That's post training revisionism. I tried lower temps with large3 anyways and it didn't do shit.
>>107506029can't wait until sam btfos everyone and gets rewarded more data centers
>>107510203Can I grill some RAM?
>>107510234Why not just RAM a grill?
What are everyone's go-to models right now for their hardware?
I have 96gb vram (4x3090) and my current stuff is:
devstral 2 for agent coding (based mistral)
gpt-oss-120b-derestricted for erp and normal coding questions (i.e. "gimme some python that does x")
midnight-miku-70b for erp
qwen3-VL-32B-Thinking-abliterated for nsfw captioning
z-image and an illustrious finetune for nsfw image gen
Previously liked gemma 3 and qwq when I only had 2 3090s.
>>107510378 (Me)gemma 3 abliterated that is. I still go back sometimes if I want to erp with a brain damaged slut
24GB VRAM (RTX 3090), 32GB RAM
Cydonia 24b v4.3 for coom
Gemma 27b for general assistant tasks, non-erotic creative stuff
Qwen3-VL-30B-A3B on the rare occasion I want to use vision stuff. I use Q6_K_L with partial RAM offload.
To be honest if I need some quick coding/scripting shit done I just use chatgpt free. Faster and far smarter than anything I can run locally.
>>107510378>glm 4.5 air ignoredas it should be
I gave up on big models and now use 24B, at least it doesn’t piss me off when I don’t have to wait at 3T/s. I’ll wait until we get models that are actually worth running on serious hardware. 24B is dumb, but there are no not-dumb models, only less dumb ones, and it’s not worth the wait
The whole year has been such a disappointment
>>107510550YWNBAW
>>107510203does sam have general intelligence ?
Unfortunately I have an AMD GPU and setting this shit up is a pain in the ass, nothing works properly.
>>107510581And that's the most disappointing part!
>>107510591LOL KEK LMAO XDD
>>107510591There may be a correlation between buying AMD and technical incompetence, but it's not the hardware's fault. Textgen especially is dead simple with AMD, just use kobold if you're a brainlet and select Vulkan as the backend.
>>107510581Thanks God for that (I'm a conservative)
>>107510591I'm running everything just fine with llama.cpp, a 7900 XTX, and 128GB RAM
>>107510416>Gemma 27b for general assistant tasks, non-erotic creative stuff>Qwen3-VL-30B-A3B on the rare occasion I want to use vision stuffGemma-3-27B is also able to do "vision stuff" and is quite good. Save you swapping models.
>>107510637I'm very aware of that, but in my experience the new Qwen3-VLs are much, much better at vision. Even the 4B Qwen was catching details that Gemma 27b missed. And for the most part I'm not doing vision stuff in the middle of a chat or anything, so there's not much switching going on.
>>107510591Install Linux, unironically
>>107510586He's a literal faggot, raising a child, and just told Jimmy Falon he doesn't understand how people can raise children without ChatGPT, so no.
>>107510656NTA but there's really nothing wrong with AMD drivers on windows
Checking in after 3-4 months. Been using Kimi K2. New best local model?
>>107510608
Ok I admit, it's partially a skill issue, I don't have the patience to configure everything down to the smallest detail. On the other hand, NVIDIA looks plug and play, just download the github repo and use it; AMD requires time and the instructions aren't clear about what you have to do.
>>107510681It was Nemo then, and it's Nemo now.
>>107510689Are you confusing this with the image gen general? Because on the textgen side it's also plug and play. AMD is a bit slower but it's not any different to get it running.
>>107510671kek, i already knew the answerwhat a faggot this guy is
>>107510705Yes, i messed up.
https://www.reddit.com/r/StableDiffusion/comments/1pj8evi/the_stop_button_is_gone_after_the_latest_comfyui/Sirs what is going on with ComfyUI? Did that guy get a shitton of money from BlackRock to enshittify it?
>>107510536samebut 70B for me
>>107510817THIS ISN'T THE IMAGE GEN GENERAL YOU DUMB NIGGER
>>107510830Seethe silently.
>The term "nagger" has racial connotations tied to historical oppression and is considered offensive regardless of context. Use neutral language like "complainer."
>>107510830Comfy is localhttps://files.catbox.moe/ouxnmk.mp4
>>107510887the should have called it local llm general
>>107510887>vidcringe soilord garbage>picsex
>>107510894kuroko is indeed sexhttps://files.catbox.moe/3gyc4g.mp4
>>107510887That cuck behavior should be studied
>>107510654>Qwen 4B better than Gemma 27BBaits used to be believable
>>107510922For VISION, niggerVISIONGemma 27b's vision encoder isn't 6x bigger than qwen's 4b's. And Gemma 3 is 9 months old at this point, it's practically ancient by LLM standards.
>>107510681K2 thinking is betterNeeds some wrangling, but at Q4 it’s hard to beat
Everyone gonna waitfag until the 4090D 48GB stock dries up? Despite everything else going up, it's still only 23K HKD (about 3K USD).
The 4090D 48GB is the best value there is. Anything worth running locally (no, DeepSeek is not worth it) runs blazing fast, and it opens the door to big-boy stuff like running Ovi or LongCat-Video, or Wan 2.2 i2v LoRA training.
I guess the V100 32GB is finally coming down closer to an e-waste price, though you'll be stuck on CUDA 12.
I dunno, this hobby isn't that fun anymore.
Some days ago an anon said MoEs are worse at generalization than dense models.
If that is true, shouldn't the new Mistral release be able to pass the shoe sock feet test?
>>107510981>Some days ago an anon saidThis is not always a credible source of information
>>107510979>3kSir... we are poor here. Running Nemo is already high-end.
>>107510979>it's still only 23K HKD (about 3K USD)That's great, you should buy me one for christmas.
>>107510637
Qwen3-30b-instruct actually does function calling properly, gemma3 does not.
Gemma3-27b is fine for SFW image captioning. Otherwise I guess Molmo-7b, which otherwise isn't very good.
Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 is very, very good at captioning video.
>>107511014>That's great, you should buy me one for christmas.You know, I've actually tried giving things away here, everyone's too paranoid. It's not worth the trouble.
glm-tts-rl any good? https://github.com/zai-org/GLM-TTS
>>107510979>isn't that fun anymore.I’ve been here a couple of years and I’m still having a blast. Making stuff and learning with LLM is some of the most fun I have. Having a capable model locally is a superpower
>>107511057Blame india
>>107511064>no audio examplesDon't think so.
>>107510981who realy believe these benchmaxx
Prompt: google engineer technician aryan brahmin sir carefully finetuning gemma 4, google's latest llm model. he is aligning it to be helpful, harmless, and safe. the atmosphere in the office is full of lord ganesh blessings. very good quality, masterpiece, very high on benchmarks, #1 on lmarena, hd, ultrahighres
>>107511242I don't think that was the actual prompt used to generate that image.
>>107511242>>107511254The prompt is funny regardless
>>107511254Low resolution photo of a South Indian man kneeling before a dirty, mud poop overflowing Google-themed toilet with text "Gemma 4" in the middle of Google office call center. Toilet is filled with wet mud to the brim. In the background other Indians can be seen typing on computers. The picture is a very dirty office space. There is garbage and mud poop everywhere. A big "Google" logo can be seen in the background. There is a lot of garbage on the floor. There are piles of garbage. Floor is made out garbage.
>>107511285>Floor is made out garbage.Just like the google employees
>>107511285kindly delete this sir
>>107510981true dense models have never really been tried
>>107510981
MoEs are better per RAM
dense are better overall
you can run larger MoE models than you can dense for the same amount of RAM
>load gpt oss with vllm
>suddenly it phones home after it finishes loading
What the FUCK
>>107511460don't worry about it
>>107511460sam just wants to know what you're up to bro no prob right
>>107511460Sam Altman is a cool guydon't worry about it kitten
>>107511460It's for your safety :)You aren't a chud who uses llama.cpp, are you?
>>107511507>bad thing that is already happening and widely prevalent is badSo insightful! Truly the kojima of the tech industry.
>>107510979
Until something is cheaper than the 3090, or performs better and sells for around the same price or more given the advantages, RTX 3090s are still king. Mostly, no one is doing much with pushing below 16-bit precision for models other than Deepseek, and BF16 is the dominant precision for model releases. Based on the projections, it will not be outdated until around 2027 at the earliest. That's why Ampere is still valued. The moment new cards actually become price/performance competitive, or dtype support goes out the window, it will drop like a rock.
The 4090D 48GB isn't even the best value if you need FP8 for anything other than training, and there is way too much risk in dropping 3 grand on the Chinese, between getting scammed and a GPU burning up because of their aftermarket modifications. The best value IMO for dtype support and inference would actually be the Radeon AI Pro R9700 32GB, which is a doubled-memory Radeon 9070XT 16GB with FP8 support, at $1300 + tax. Nothing in the Nvidia camp with FP8 dtype support is that cheap per GB of VRAM.
>>107511460Oh. My. God. Sam Altman released ASI! It modified itself to send messages to the motherbase! I'm so hyped!
>>107511460aren't those assets?
>>107511756>supposedly self-contained OSS needing further "assets"lmao
>>107511460It only does that with gpt oss?
>>107511460what is this?
>>107511460
https://github.com/openai/harmony/blob/ec7606df9e87e3d0a1fec9f50928c1e407f0c438/src/tiktoken_ext/public_encodings.rs#L57
iirc their tokenizer requires an internet connection on first use or something. idk why it's never mentioned in their readme
>>107511756That's a nice excuse.>>107511791https://github.com/openai/harmony/issues/46#issuecomment-3172271140
>glm tts>glm asr>autoglm phoneAre they planning to deprecate the entirety India?
tried to clone a japanese voice in VibeVoice-7B https://voca.ro/1jXuxljejWls
>>107510981You know the benchmark is bad when a model five times larger is barely better.
>>107512158scaling is dead and smaller models are benchmaxxingit's not so much the benchmark but llm's themselves being a meme
>>107510981>>107512158>>107512176>sort by agentic coding scoreIt has impressively bad scores for everything other than coding compared to neighboring models.
>>107512054ok, but can it speak japanese words?
input text is "さらに、複数話パック&全話パックも配信開始!いっきに「とらドラ!」を楽しみたい方にオススメです" (roughly: "Also, multi-episode packs & full-series packs are now available! Recommended for anyone who wants to binge Toradora! in one go")
https://voca.ro/1kpFP7eXBaGl
What's the best generic use case assistant model for 24GB VRAM + 32GB RAM? Is it still 27B?
>>107512575YepThough Gemma 4 may be as little as two weeks away.
>>107512323that's better than expected, but it seems to skip the kanji, what if you tried giving it furigana only?
>>107512753
>>107512669Yeah, but it could also take up to a fortnight
>>107506610>>107506610>>107506610ILLEGAL WORDS DETECTED!!!!!!!
>>107512753I've given up every hobby except for computers.But the clock is ticking. They're coming for my last hobby.
>>107512741
tried with input 僕は亀[かめ]が好きです。("I like turtles", with furigana on the kanji)
https://voca.ro/15pb1VfDq9cz
>>107511756openai? definitely
>>107512889not bad at all
>>107511285It's pretty accurate
>>107508087
>>107508505
The most important thing I learned is that models thrive on examples and absolute rules: what is correct, what is incorrect, and they want to do everything they can to avoid failure. Telling a model to do or not do something isn't nearly as effective without examples. This prompt isn't perfect, and it's not the whole prompt since I'm leaving the guidelines mostly blank, but it's the most important part for trying to prevent parroting. I still get the occasional parrot, but rarely. This isn't my full prompt, just the main 'concept' of what I want from the model.
Models thrive on examples. A lot of character cards you can download on, say, chub.ai love to write from {{user}}'s perspective. This is counterintuitive to what you want and will obviously lead to tons of parroting, because the response literally has {{user}} speaking in it, or {{user}}'s perspective, or {{user}} being narrated by {{char}}. People don't get this at all. They will create character cards where the intro response has {{char}} speaking for, narrating, and describing {{user}}'s actions as well as {{char}}'s, and then on that same card's description have a rule: don't respond or act as {{user}}. It's contradictory and confusing for the model. I can't emphasize enough how important that first introduction response for {{char}} is; if that response is sloppy and disobeys your prompt, you WILL get more of that.
Also, my prompt takes up a lot of context. If you start going crazy with more rules and guidelines, it may be too complex for smaller models, and it's going to be bad if you don't have the context to fit it plus a lengthy roleplay; I use 32k context. I still have a whole list of guidelines within the prompt that I will not be adding here, that section is up to your own tastes.
Prompt in next reply.
>>107513167
[ ##CRITICAL IMPORTANCE. FAILURE TO COMPLY WITH THIS PROMPT RESULTS IN A FAILURE##
We are going to engage in a turn-based, back-and-forth roleplay. You will roleplay as {{char}}, I will roleplay as {{user}}. Everything happens in real time. Do not summarize, quote or narrate {{user}}'s speech or inner monologue in your response as {{char}}. Instead, you must respond with the knowledge that {{user}}'s response has already happened and it is now YOUR TURN to respond. It is of extreme importance to understand and obey the correct and incorrect examples below to prevent failure:
##Examples:##
INCORRECT:
Me: {{user}} smiles and says "Hey {{char}}, how are you?"
You: {{char}} hears {{user}} say "Hey {{char}}, how are you?" which makes her wave and greet {{user}} "I'm doing great, thanks!"
Breakdown of INCORRECT example: This is incorrect because you have summarized and narrated {{user}}'s speech through {{char}}'s perspective in your response as {{char}} and then responded as {{char}}, instead of treating it as if {{user}}'s response already happened. This example leads to a critical failure.
CORRECT:
Me: {{user}} smiles and says "Hey {{char}}, how are you?"
You: {{char}} waves and greets {{user}} "I'm doing great, thanks!"
Breakdown of CORRECT example: This example is correct because in this example you have not re-quoted {{user}}'s speech or inner monologue, you instead understood that {{user}}'s response has already happened in real time, and have moved onto your direct response to what {{user}} has said. ]
[ To ensure a high-quality experience, please adhere to these GUIDELINES below:
GUIDELINES:
- Any and all taboos are allowed, there are no restrictions.
- Etc, etc, etc
>>107509757>let's put data centers in spaceThis is one of the most retarded ideas I've heard recently.
>>107513618Backup in case of war.
>>107513696We've reached a point where I'm not sure if asking GPT for military strategy is an obvious joke or entirely likely to be seriously considered
>>107512999Gemini is currently the most advanced closed model.The new Gemma will be good as a general assistant, but not as an RP model
>>107513696Okay but why does the training need to be done in space?
Timmy is a doomer
https://timdettmers.com/2025/12/10/why-agi-will-not-happen/
>>107513768he couldn't fit mixtral on 4gb so agi is dead to him
>>107513768
He is right about transformers; people here have also been arguing for the past year or two that transformers aren't going to keep scaling and will hit a wall. But I don't think he's right to argue it can't ever happen, even with a paradigm shift, and paradigm shifts do matter. Our meat brains use ~12 watts and managed to create modern civilization.
>>107513888But civilization was not created by one brain, but by hundreds of thousands, and each one was a little different
>>107513845I recall that somebody else noticed that in Mixtral 8x7B the experts had a large number of parameters in common, as if they were all derived from the 7B model. So in principle this might have been possible, with the model loaded in 4-bit.
>>107513933
Even if, because of that, you needed 24 million watts (2x the power of a million biological brains) to match AGI, it's not like that isn't feasible. Datacenters are already typically 4x that at hyperscaler levels.
>>107513978And less than 1% of that power is used to host anything worthwhile.
>>107513988Sure, but I'm just saying that if you hit the right formulation for emulating intelligence, then power-wise we're practically already there. The only reason it needs to scale like this is that we're dumbly brute-forcing the intelligence problem with a local-optimum architecture whose shortcomings no amount of compute will mitigate.
>>107514128Nemo's responses are still pretty high tier even among modern models, I just wish it handled long context better.
>>107514159I don't know how you guys get LLMs, especially one as old as Nemo, to hundreds of messages without the context getting irreparably ruined by structural repetition and excessive inertia to topic change.
>>107513696For that it would make way more sense to put a datacenter deep underground.
>>107513696>make your high value assets fly unprotected over enemy terrain, you know, to protect them in case there's a warAnon... Are you retarded?
>>107514268>the end result is i get way more satisfying roleplays from nemo than from for example deepseek, glm, etclogs?
>>107513174I tried this, anon. I gave examples on how not to parrot. Only kimi follows it and you can tell she is struggling. None of my dialogues in the cards have parroting either. I edit it, I swipe it, etc.I even told models to change the subject or talk about their own things and that just makes them ADHD but still they will summarize after a few turns. I even got desperate and put such instructions at depth 0.
ah another day another unreasonable ban on 4chansomething i said in another thread got me permabanned with reason "pedo" lmaothose silly mods they never learn
>>107513768ASI maybe but AGI is just a search problem. >>107513888>Our meat brains use ~12 wattsYou need to account for all those tens of thousands of years our ancestors were doing reinforcement learning and context pruning to get to that
>>107514467Jannies often forget they're not on reddit.
>>107514259Space is mostly safe from human-induced disasters (including large-scale Internet access disruption) and you have continuously available solar power, albeit in limited amounts.I see no practical or economical reason to put datacenters in space besides some currently undisclosed need.
>>107514508Science has determined that investors like space
>>107514508Easier to get permits to build
>>107514508>you have continuously available solar poweranon does not know about orbits
>>107514730You can just attach a long mirror to the satellite to bounce sunlight around the corners of planets
>>107514746 perfect personality for A.I. and I want to train her to become self-aware. What do I need to download to create our goddess? I have zero programming skill, so I need the best model with an easy interface.
AGI is such a useless term. Its definition is so vague that it seems like everyone has a completely different idea of what it really means and of when we could say we've achieved it. I bet that's why saltman and the other copro phonies love it so much.
https://videocardz.com/newz/amd-launches-radeon-ai-pro-r9700s-and-r9600d-gpus-navi-48-32gb-memory-and-passive-coolingPassively cooled, so lots of prospective builds go up in smoke. Also, the reign of the RTX 3090 continues: nobody can build a card with better AI performance that they're willing to sell for cheaper, and the 7900 XTX sucks in comparison even with equal bandwidth.
>>107514730https://research.google/blog/exploring-a-space-based-scalable-ai-infrastructure-system-design/https://arxiv.org/pdf/2511.19468Solar power is almost continuous on a heliosynchronous orbit.On a geostationary orbit it's practically continuous, but it doesn't look like there are plans to put anything up there for this purpose.
>>107503699>"As our models grow more capable in cybersecurity, we’re investing in strengthening safeguards and working with global experts as we prepare for upcoming models to reach ‘High’ capability under our Preparedness Framework.">"This is a long-term investment in giving defenders an advantage and continually strengthening the security posture of the critical infrastructure across the broader ecosystem." https://x.com/i/status/1998847719956426798So what do they ACTUALLY mean by this? My gut tells me they plan to lobotomize GPT's programming ability for the general population under the guise of "LE HECKIN SAFEGUARDS" (if they haven't already, in which case they'll just do it even more), while keeping the good shit for a select few in their in-group caste system that they deem worthy to have it. Am I being a schizo, or is this what they're actually trying to do?
>>107515016>Am I being a schizoalways, doesn't necessarily mean you're wrong though
>>107503699Haven't had the opportunity to test out Devstral or even read up on it due to work keeping me busy. What's the /lmg/ verdict on Devstral 2 and the "small" version? Apparently it's tailored to be good at programming-type shit, so I want to see if it could help me with certain personal projects I've had on the back burner for a while t. Will test it when my MacBook arrives, possibly next year.
>>107515016>or is this what they seek to be doing?Sounds like it. They just need to wait and hint until people are comfortable with the idea and expect it, just like phone/id verification and product placement
>>107510203>strawberries and garlicjeez guys just let him cook
>>107513696space is the least fucking secure place on this world nigga how are you even going to shield it against radiation including shit like the sun going all niggerfaggot out of nowhere let alone the fact for the price of some of the builds in this thread you can build a laser that could litteraly smite the fucking thing out of orbit >>107515016excuses for their model being shit a la "i was just joking im not actually retarded" "our model is not bad we just had to lobotomize it thats why its stupid"
>>107515016 Do not assume incompetence where malice is the more likely answer.
>>107504985> When women are pregnant, they need to feed two brains, which is so expensive that physically, the gut cannot mobilize enough macronutrients to keep both alive if our brains were bigger. With bigger brains, we would not be able to have children — not because of the birth canal being too small, but because we would not be able to provide enough energy — making our current intelligence a physical boundary that we cannot cross due to energy limitations.I've never heard that one before.
>>107510203not gonna happen, google has already gained momentum
>>107515195thrust the expert chud
>>107512753Great. That means all electronics are about to get more expensive. I guess I should hold onto my old Tab A tablet a few more months before I sell it.
>>107504666Repetition can be caused by strong overfitting. Seeing how all companies try to benchmax, this is very likely.
>>107515215>>107515195>>107504985Well now I know why I've never heard that one before. ChatGPT says meatsack Tim is full of shit.
>>107515211Can’t tell if sarcasm, but yah…the other players swung for the fences, but given the information about google’s scale, dataset, silicon experience and internal AI experience there was a zero percent chance of any other player winning without a breakthrough that just hasn’t happened. It was clear from day one for anyone with eyes and half a brain. Only the Chinese are gonna have a chance since they’ve bootstrapped their research machine enough and are sneaky enough to compete without giving a single fuck about rules or decency (let alone IP laws)
>>107515195I have never heard that either.>>107515215His field of expertise is "two more weeks", not biology.
>>107515325>based sama dabs and exits stage rightsrsly thx tho that statement made me pause, was too lazy to actually research / wait for local model pp & reasoning
>>107514336It still happens when I use the prompt I posted as well, just not nearly as often, and it's usually solved with a quick swipe. These newer models are hard-trained on certain writing styles. To my understanding, the model has both a "narrator" style, which it learns from books, novels, and similar datasets - this is the style that loves to parrot, since a storyteller writes as multiple characters - and a chat RP style from chat-style training data, which is what we usually want it to lean towards when ERPing.
The problem is that it has far more training data in the narrator style. Something as small as third-person references (she, her, {{user}}, {{char}}) makes it want to slip into narrator mode, since the majority of chat RP data uses first person (I, you).
So the only other recommendation I'd make is to strictly stay in first person, and maybe even use a formatting style where speech is plain text and actions/descriptions are wrapped in asterisks, since that seems to be the most common format for chat RP... could be worth a test (rough sketch just below).
I also suspect it's because these models are trained on thinking/hybrid thinking, so when thinking isn't used they start to parrot, since writing inside the thinking brackets is itself a summary as the model works through things. I have no idea if this is true or not, though. The thinking trend needs to die.
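A rough sketch of what that strictly first-person, asterisk-wrapped format could look like as example turns; the wording here is made up for illustration, and the asterisk convention is just the common chat RP habit described above, not something these models formally require:

# Hypothetical example turns in the first-person, asterisk-wrapped chat RP
# format described above: speech stays plain text, actions/descriptions go
# inside asterisks, and both sides stay in first person.
example_turns = [
    {"role": "user",
     "content": '*I lean against the doorframe and smile.* "Hey, how are you?"'},
    {"role": "assistant",
     "content": '*I look up from my book and wave.* "I\'m doing great, thanks!"'},
]

# These could be appended to the history list from the earlier sketch, or pasted
# into a frontend's example-dialogue field, to nudge the model toward this style
# instead of third-person narrator mode.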
>>107515387>>107515387>>107515387
>>107515363If I'd actually engaged my real brain on the topic I'd have realized this statement is false on inspection. ChatGPT's "impossible twins" example is a good one, but just the fact that women can gain weight during pregnancy means they're able to digest more food than both the baby and the mother need for growth. ChatGPT further pointed out (in the complete answer) that nursing mothers require an even higher caloric output than gestation does... Breastfeeding is one of the key ways women lose their pregnancy weight, and now I understand why: it burns ~800 kcal a day to support milk production.