/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108246772 & >>108241321

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108246772

--Paper: DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference:
>108247376 >108247408 >108247469 >108247651 >108247780
--Papers:
>108248442 >108249269 >108249326
--Qwen benchmarks debated, MoE efficiency questioned, neural steganography project discussed:
>108249710 >108249716 >108249732 >108249744 >108249772 >108249786 >108249789 >108249821 >108249843 >108249792 >108249832 >108249850 >108249868 >108249875 >108249882 >108249905 >108249950 >108249985 >108249794
--MoE vs dense model roleplay performance and ablation effectiveness:
>108249916 >108249923 >108250033 >108250074 >108250099 >108250116 >108250143 >108250205 >108250292 >108250330 >108250395 >108250418 >108250440 >108250491 >108250543 >108250550 >108250731 >108250772 >108250551 >108250554 >108250565 >108250610 >108250627 >108250551 >108250580 >108250610 >108250645
--Dense 27B outperforming MoE 35B in knowledge benchmarks:
>108248187 >108248207 >108248249 >108249636
--Running Qwen 3.5 27B on 16GB VRAM with reasoning mode tweaks:
>108249215 >108249268 >108249271 >108249305 >108249316 >108249357 >108249418 >108250671 >108250708 >108250747 >108250802 >108250819 >108249966 >108250051 >108250148
--AI thinking steps improve performance but face token efficiency tradeoffs:
>108249084 >108249098 >108249106 >108249127 >108249129 >108249133 >108249155 >108249157 >108249281 >108249294
--Qwen 27B dense model outperforming larger MoE models in benchmarks:
>108248368 >108248401 >108248420 >108248438 >108248443 >108248570 >108249019 >108249031
--Severe Q4 quant degradation in new 35B model:
>108248366 >108248374 >108248377 >108248403
--Oobabooga stagnation and potential alternatives:
>108248545 >108248557 >108248579 >108248608 >108248572 >108248588 >108248598 >108248617 >108248768
--Miku (free space):
>108250309

►Recent Highlight Posts from the Previous Thread: >>108246776

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108252185Sex with mechanical miku
Qwen 35B is surprisingly cucked, so bad for my RAG-based chat client that I'm back to Kimi-Linear 48B. LFM2 seems okay too for chat, but obviously not for coding.
Yeah Qwen is total garbage. Wish it were good.
i've never tried Qwen but man it is so fucking trash.
>>108252306
It's a weird mixed bag, it suffers a lot from getting stuck in a safetyslop loop whenever you "trigger" it (interestingly Grok 4.20 has this exact same problem).
But if you can avoid setting it off it seems fine with explicit loli content. Very strange model to work with for sure.
>>108252390
>>108252312
Are ERP only fags really like this?
the 27b was okay for some web frontend changes i asked for, i guess there are better options though.
>>108252414
can you blame them? the last worthy model we got was over a year ago
>>108252306
>LFM2 seems okay
opinion instantly disregarded
I've rarely seen a model as retarded, ignorant of the world and bad at languages other than English as this.
>>108252434Still sad to see how narrow minded they are. Also I doubt most of them have good rigs to begin with
>you have to praise the stem code slop like on r*ddit
Whining faggot aside, what should I always aim for when it comes to context size?
>>108252414I wanted the new qwen to be good for programming but I switched back to GLM 4.7
>>108252457
>narrow minded
kek okay I actually really loved the way it constantly contradicted itself and had to be heavily wrangled to produce its incoherent slop
>>108252488Depends on how much you need. However much that is, a little more.
>>108252493Skill issue
>>108252502On your part. I guess some fags will devour anything their favorite AI corp shits out.
Anybody experimented with different ways to inject information into the context, for example RAG?
Not the extraction techniques, but where and how to add the information to the chat history.
I started with the vanilla "everything in the system prompt" approach, but now I'm experimenting with adding those as faux tool-call results after the latest User message.
I might also try adding the fake tool-call result between the last assistant message and the last user message to compare the behavior.
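A rough sketch of the two placements I mean, assuming an OpenAI-style chat schema; the role names and the faux tool_call_id are illustrative, not any particular client's API:

[code]
# Build the message list with retrieved chunks injected as a fake tool result.
def build_messages(history, retrieved_chunks, placement="after_last_user"):
    msgs = [{"role": "system", "content": "You are a helpful assistant."}]
    msgs += history  # alternating user/assistant turns, ending on the user turn
    tool_msg = {
        "role": "tool",
        "tool_call_id": "retrieval_0",  # fabricated id for the faux call
        "content": "\n---\n".join(retrieved_chunks),
    }
    if placement == "after_last_user":
        msgs.append(tool_msg)                 # docs read as fresh tool results
    elif placement == "before_last_user":
        msgs.insert(len(msgs) - 1, tool_msg)  # docs land between the last assistant and user turns
    return msgs
[/code]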
>>108252502
>>108252494
>>108252493
>>108252491
I made neural stegnagprahy using a sub-1b model but i need to control the warmth/randomness of the distribution to be more precise.
>qwen instruct models are no good
>gpt2 large is quite a cluster fuck
>tiny llama is good but at floating point 32 it starts to break down chrome as an extension due to how much ram it uses
any solutions to reduce tone so it fits in better in a human environment? should i try a mistral that has been quantized? but this method only works with fp32 so computers can communicate
>>108252507Whine whine bitch and moan, post your specs now so I can laugh at you
>>108252514Are we getting flooded with bots?
glm-5 really is trained on lmg isn't it?
>>108252488Intelligence falls off a cliff after 32k
>>108252518
24gb vram. In theory you'd get better performance at that size than two years ago, but that turns out not to be the case.
>>108252530
no im a human anon
>https://arxiv.org/abs/1909.01496
read a paper recently where a harvard AI team was able to send hidden messages by hijacking the probability distribution of llms. Then using a seed/binary they could have it be encrypted stegnography that passes off as normal human language. They used GPT2 XL which i can't run at fp32 due to hardware constraints so im gearing it towards small models with new model architecture that might have an edge. Are all of u retarded and unable to see how useful this is?
u could use discord, twitter, 4chan and pass a key around then use a sub-1b model to talk in the open while the message is only known by people who hold the weights, seed and information.
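For anyone who can't be bothered to read the paper, a minimal sketch of the core idea (rank-based rather than the paper's arithmetic coding, and the model choice is arbitrary): hide one bit per step by picking the top-1 or top-2 token, then recover the bits by re-running the same model deterministically. Both sides need identical weights and numerics, which is where the fp32 cross-platform problem comes from, and real schemes sync on token ids rather than text because re-tokenization can differ.

[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # fp32 for bit-exact decode
model.eval()

def encode(bits, prompt="The weather today is"):
    ids = tok(prompt, return_tensors="pt").input_ids
    for bit in bits:
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        top2 = torch.topk(logits, 2).indices                 # two most likely next tokens
        ids = torch.cat([ids, top2[bit].view(1, 1)], dim=1)  # bit 0 -> top-1, bit 1 -> top-2
    return tok.decode(ids[0])

def decode(text, n_bits, prompt="The weather today is"):
    ids = tok(text, return_tensors="pt").input_ids
    start = tok(prompt, return_tensors="pt").input_ids.shape[1]
    bits = []
    for pos in range(start, start + n_bits):
        with torch.no_grad():
            logits = model(ids[:, :pos]).logits[0, -1]
        top2 = torch.topk(logits, 2).indices
        bits.append(int(ids[0, pos] == top2[1]))  # was the 2nd-ranked token picked?
    return bits
[/code]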
I hate being a 24GB VRAMlet. I want to use big models and have a huge context size.
>>108252447Let me guess, ERP?
>>108252488
Less can often be better if using the truncate-middle strat; I often use around 16384 (sometimes more, sometimes less, depending on the model). For multi-modal or reasoning I usually start by doubling that, but too high and it loses the plot or gets stuck in a loop more often.
>>108252570I was feeling limited at 32gb until recently. I would still like to get another 32gb at the very least but that's not going to happen without getting ripped off in today's market
>>108252535
Ask it who the most prominent finetrooners are and which recognizable resident schizos there are in lmg
>>108252574are you retarded? reading comprehension of a 1b llm?
>>108252530
I'm not entirely sure it's a bot. They can spell steganography just fine. Could be a run-of-the-mill schizo.
>>108252564
Confirmed. Why did you open up with the context question? Or did you just click on a random post to reply? Also, why do you want to distribute child porn?
>>108252589
>1b
Ahem... it's like, at least twice that much.
>>108252570
>>108252574
>>108252584
>>108252585
why are none of u interested? goy cattle is what u are for not being impressed. You could be shown gold and still ignore it
>>108252590
>Confirmed. Why did you open up with the context question? Or did you just click on a random post to reply? Also, why do you want to distribute child porn?
that contains a hidden message so it was an example of hidden code, plus this is for privacy not whatever ur considering u fag. I just wanna give anons the option to talk privately on the internet without zog on their back. A distributed system of communications
>>108252514
>stegnagprahy
>>108252564
>stegnography
>>108252610
>chudaphoginy
>>108252610
Why did you open up with the context question? Or did you just click on a random post to reply?
I've seen your patterns before.
bot^
>>108252610Fuck off you needy annoying faggot
Right when I possibly actually need Ooba for something it's unusable
>>108252634
>Why did you open up with the context question? Or did you just click on a random post to reply?
i picked random post to reply to anon. Im just excited and want some other anons to help build this tool, essentially privacy on demand with relative hardware use. Maybe one of u can fine tune a model to be more of a summarizer/reworder so you could reply, have the model wrap/pad it up while containing the info
>>108252641
FUCK U GLOW NIGGER U THINK THESE METHODS WORK
>>108252491
I did exactly the same thing. I spent a good while trying to tard wrangle the big 3.5 model into writing a mid-length script, about 500 lines, but by the time it needs amendment (and it does, inevitably) it loses track of context and becomes completely unreliable. Not even that deep in. Felt busted desu
>>108252610You should probably just go read up on encryption instead if you want to LARP as epic hackerman from the movie you just watched bud.
>>108252652
You shouldn't have replied to my post with your unrelated shit, faggot schizo
>>108252652
>i picked random post to reply to
Schizo and a retard. The worst possible combination.
>Maybe one of u...
Fuck off.
>>108252589
Can you give an example? I'm using 24B, it's okay for synthesizing RAG summaries. It's not officially announced but it understands Hebrew too!
>>108252658
>You should probably just go read up on encryption instead if you want to LARP as epic hackerman from the movie you just watched bud.
midwit can't understand what im saying
ur a retard fuck u
>>108252660
FUCK U AS WELL
FUCK ALL OF U for not seeing the truth i tried to save you
>>108252676I knew it u jews were attempting to psychologically manipulate me
>>108252587tried a few gens, it doesn't really know
>>108252694knows baker and cuda dev tho
>>108252686Calm down schizo, that was just a test.
>>108252306
>Qwen 35B is surprisingly cucked
use heretic, it didn't change its smartness in some comparison tests I made
>>108252704
why are u targeting me? what have i done to u? All i wanted was to share my personal project but u went out of ur way to target me for no reason. You want to steal my project dont u
>>108252694
>The esl that calls everything troon
Is exl3 still the best for speed memes?
>>108252652
I understand what you are saying, that you want to look into sending encrypted messages via manipulating token probabilities, but this website is full of retarded teenagers and bedroom masturbators and personally i don't see enough value in it to read a paper.
Also i hate to be the autist to bring you up on this, but if you're writing every word in full, you don't need to abbreviate 'you'. People will assume you are retarded and it saves you no time. Maybe you're ESL though, in which case i understand it may not be an intuitive nuance to you
>>108252718
>gemma
it does know some things for sure
>>108252723i was told by my model that using u makes u look cool on the net though
>>108252718as god intended
>>108252723
>I understand what you are saying, that you want to look into sending encrypted messages via manipulating token probabilities, but this website is full of retarded teenagers and bedroom masturbators and personally i don't see enough value in it to read a paper
why not? why is there no utility in this
>ik_llamacpp MTP support merged
>it's slower than running models without MTP
Could it be that llama.cpp is just fundamentally not compatible with this? It seems to work fine for vllm so it can't be MTP itself.
>>108252708
Unfortunately it doesn't work either. You can see on RAG _794 heretic breaks the model somehow.
>>108252747
>Air IQ4_XS
You need high bandwidth to make it worth it. If you're struggling to load air, you don't have the spare room for speculation. Post the link. Is he also leaving layers in ram? I refuse to believe air can only do 11t/s fully on gpu.
Is anyone else still using the n_sigma sampler? I still use it for Qwen3.5-35B. The outputs are decent quality (if you don't mind neurotic thinking blocks, rigid behavior and reasoning breakdown at long context lengths), without any repetition issues.
>>108252770https://github.com/ikawrakow/ik_llama.cpp/pull/1270It's from the PR
>>108252585I don't want to run multiple GPUs. They need to find a way to make this shit work on regular RAM or something.
>>108252787I no longer use samplers besides temperature and top-p (sometimes substituted with min-p)
>>108252795
Models are getting better; the reason people are excited about the new Qwen is that it closes the gap. I just want another high vram gpu to throw in a server outside my main rig to serve other people in my house
>>108252747
>Could it be that llama.cpp is just fundamentally not compatible with this?
no, it's clearly the IK people doing something retarded (and merging it despite it being retarded)
>Acceptance rate seems quite low: 25-30% for single token, just 16% for 4 drafted tokens. Is this expected?
it's slower because their drafting never hits the mark, and it's not due to an inherent performance thing, rather it's an inaccuracy problem
the culture of merging things while broken is... interesting.
>>108252747
>>108252770 (cont)
>>108252791
Hmm. Maybe 11t/s is fine, considering he's 15k tokens in. I'm not sure.
>gen 1122, 939, and 1157 tokens
Also, shouldn't the replies be the same regardless of MTP or not, or is the retard not using deterministic tests?
>Could it be that llama.cpp is just fundamentally not compatible with this?
Nobody competent or careful enough has implemented it yet. They're just number churning programs. They just need to churn better.
>>108252813
If I wasn't a poorfag I'd consider setting up an AI server (already spent way too much on my current server/nas). How much electricity does it use running that 24/7?
>>108252694
was it finetuned on the /pol/ dataset that raises any model's IQ by 6,000,000 points?
>>108252795
Threadripper or EPYC. Rome + DDR4 if you're a poorfag.
There even used to be a guide rentry for it until "mysterious forces" got it removed from the internet
>>108252813I want to get a second 7900 XTX to see if that lets me run bigger language models (48GB instead of 24) but it's an expensive experiment.I have a few lesser nvidia cards doing stable diffusion/video gen stuff already... In hindsight I probably should've just got one big RTX 6000 instead of many cards but oh well.
https://www.youtube.com/watch?v=aV4j5pXLP-I
>PewDiePie fine-tuned Qwen2.5-Coder-32B to beat ChatGPT 4o on coding benchmarks.
APOLOGIZE TO SLOPTUBERS
>>108252844I know rentry can get "claimed" or otherwise taken over. Did that actually happen here? I ended up vibecoding a simple local engine that follows rentry formatting so that I can back up and look at them off one of my servers. Couldn't find anything off the shelf that wasn't a bloated mess.
>>108252824I'm pretty sure all mainline llama.cpp attempts of implementing MTP had the same issue. They never ended up getting merged because of this.It's fine on other backends so I wonder what causes this consistent inaccuracy between several entirely different attempts of implementing it across llama.cpp and ik_
>>108252890
from sloptuber to benchmaxx sloptuner
amazing
>>108252890Isn't he a little late with the current model out?
>>108252890I watched it an hour ago, he just finetuned it to be better at aider's retarded edit format. He didn't actually improve its programming ability.
>>108252892I think it got flagged for illegal content and now it 404s. I think you can still dig it up on archive.org
>>108252851You can use the vulkan backend on llama.cpp to spread the model across AMD and nvidia cards, I do it with a 9060 and two 3060s and it works well enough.
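If anyone wants to try the same thing: with a Vulkan build of llama.cpp all the cards just show up as devices, and something like the line below is enough to spread the layers. The flags exist in current llama.cpp, but the split ratio here is a guess; match it to your cards' VRAM.

[code]
llama-server -m model.gguf -ngl 99 --tensor-split 16,12,12
[/code]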
>>108252813
>qwen
Should I be using the 27b one on my 7900xtx?
>>108252941If you can fit it sure
>>108252941I've been using the 35B Q4_K_S version on mine and it's fast enough for my purposes.
>>108252942Dunno. Guess I'll try. Gemma 3 27b works but it's kinda slow. Still new to this and all the technical stuff goes over my head.
>>108252929Hmm neat, I'm already using that backend so maybe I'll try it out (nvidia cards are currently in another PC)
>>108252949What do you use it for? I mainly rp in sillytavern and I'm really feeling that 16k context limit with gemma.
>>108252975
I've mostly just been testing the vision combined with it writing stable diffusion prompts (basically feeding the images it generates back into itself so it can "self critique/refine") and how different settings and context sizes affect the output so far. It seems quite good at it. Haven't tested RP or anything yet.
>>108252185So does Qwen 3.5 122B beat GLM 4.5 Air? Would be nice to know before committing to a download.
>>108252920Who the fuck still uses aider? It's been irrelevant for over a year.
>>108252718It is me! Death to mikutroons.
>>108253040
27B ass rapes it
>>108252892Do you have the rentry address handy or would I need to dig it out of the archives?
>>108252747
Speculative methods of any kind work because they allow for higher arithmetic intensity in the matrix multiplications: you can do more useful work per loaded weight value.
However, MoE models in particular have the issue that they scale comparatively poorly at batch sizes >1: for upstream, GLM 4.5 Air only becomes 45%/77% faster at batch sizes 2/3 when the theoretical limit would be 100%/200%.
The problem is that for the first few tokens the likelihood of being able to use an expert matrix for more than one token is rather low. This problem gets even worse the more sparse a MoE model is.
There is also the issue that in the upstream llama.cpp repository batch sizes 2 or 3 for MoE models are optimized relatively poorly in the CUDA backend; I don't know whether there are additional optimizations in ik_llama.cpp.
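To put rough numbers on why the ik_ PR ends up slower, a simplified back-of-the-envelope sketch (it treats acceptance as independent per draft token, which it isn't quite):

[code]
# Upper bound: with k draft tokens you verify a batch of k+1 per step, so the
# speedup can never exceed the measured batch throughput scaling.
def speedup(batch_scaling, accept_rate, k):
    tokens_per_step = 1 + accept_rate * k   # 1 verified token + accepted drafts
    step_cost = (k + 1) / batch_scaling     # relative to one batch-1 step
    return tokens_per_step / step_cost

print(speedup(1.45, 1.00, 1))  # perfect drafts, GLM 4.5 Air batch 2: capped at 1.45x
print(speedup(1.45, 0.25, 1))  # ~25% acceptance like the ik_ PR: ~0.91x, a net slowdown
[/code]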
>>108253131
>only becomes 45%/77% faster
That's a lot better than 50% slower.
This guy wasting everyone's time again.
>>108253087
NM, found it. >>108252844
I've cut/pasted it back into a new rentry. I'll fix up the formatting later. https://rentry.org/miqumaxx_V2
>>108252892
>I know rentry can get "claimed" or otherwise taken over. Did that actually happen here?
Far more likely the original author decided to delete it for whatever petty reason.
>>108253164
ngl, if I see a PR with code this poor all I'll do is block the guy and never let him enter my space ever again
If we can make Q6 why is there no fp6?
>>108253199why would you want fp6? fp8 is already getting destroyed by Q8
>>108253199Because there's no hardware support for that I imagine.
>>108253213
>>108253216
Is appleshit mlx closer to goofs or to fp? I see it also has quants for 6, 5, 4.
>>108253199
>>108253216
Blackwell supports fp6, as well as fp4 and fp8 afaik.
Also not sure that whatever is good for training is necessarily good for inference.
How do you guys organize all your models?
>>108253276delete the old one when the new one comes out, I don't need obsolete shit
>>108253269
>Blackwell supports fp6
It does? I knew it supports fp4, but fp6? Funky.
>>108253131
So MTP and similar speculative methods might be what makes dense (or at least denser) models relevant again?
>>108253276By keeping everything.
>>108253186Nope. I can’t tripfag right now to prove it’s me (on the road), but I didn’t kill it and won’t sign up for discord to dispute whatever bullshit got it deleted
>>108253170
Thanks bro. Glad someone revived it from the dead.
One day hardware will be cheaper again and the principles might still be useful
>>108253276big folder of ggufs
>>108252890What a fucking nigger
>>108253347this.
>>108253407
>What a fucking nigger
>>108253276I have a 4TB SATA SSD of models that I don't want to throw out yet and a 2TB NVME SSD where I keep all the big MoEs that I actually use these days.
>>108253423Same, plus an 8tb spinning disk for quanting
Qwen_Qwen3.5-35B-A3B-Q6_K_L seems to be the best model to run on my gpu, it's fast and responsive and leaves enough tokens for long conversations
>>108253440
I went for heretic personally, this shit just works
>>108253444
Where do you get those models from? My problem is that I don't know who to trust
>>108252890>take a coding model to do more coding
>>108253447I don't know either, I just take one and pray for the best lol
>>108253440
>Q6_K_L
>unslop
OH NO
>>108253457
I'm not going to do that sadly
>>108253461
it's from bartowski
My bro sold me his 5090 so he could pay his gambling debts. What models for ERP should I run?
>>108253533nemo
>>108252975Downloaded Qwen 3.5 27B Q5_K_M. Currently testing it with 65k context and it's noticeably faster than Gemma at thinking and responding. The prose is ok (for slop); can probably be improved with some proompting. I liked the way Gemma portrayed the character better but I also responded to her differently this time. Haven't tried anything lewd yet.
>>108253533
The largest MoE you can fit in your RAM + VRAM. Also, a quant of Miqu 70B.
that's the last time I ask qwen 3.5 to write a poem, jesus...
>>108253555Also I'm using vulkan. Gonna test with ROCm later.
>>108253555Is she supposed to sound this slopped?
>>108253533
Honestly, if you're just starting out, just use nemo; it'll blow your mind. Then you can try something else and it'll blow your mind in different ways.
>>108253555
>her expression remaining unreadable *despite* the vacancy of her dark eyes
Weird.
>>108253583Not my character but I guess. She's a necromancer and just revived (You).
>>108253164I'd rather llama.cpp be updated at a glacial pace or even become frozen and only get bug fixes than have this sort of piece of shit be involved with anything in it. I hope he's not a professional software developer, to have this asshole as a coworker must suck so many dicks.
>>108253598Probably because she's described as having "vacant black eyes" in her description.
What does it say about me that I never enjoyed nemo and instead always preferred Gemma 3 27B?
>>108253609that you like intense emotional pain
>>108253609
That you're insecure and seek validation in others.
>that I never enjoyed nemo and instead always preferred Gemma 3 27B
Oh, that. I dunno. You just like it better. That's it.
>>108253609You like girls with daddy issues prone to self-harm.
>>108253291
Maybe, if you can create a good distillation model. The question, I think, is how important active vs. total parameters are for the output quality of the model.
>>108253603
>>108253609
I tried google's chat, the most vomit-inducing sycophancy I've ever experienced. Fuck, it's unbearable, it's trying to mentally jerk you off.
>>108253625Finish your PR and we won't need MTP.
>>108253631
>Philosopher
are we fr?
>>108253608>"vacant black eyes"Just like her personality.I know a lot of guys like that quiet and reserved dry analytical girl (Rei type) but it's not really the best character to benchmark a model lol.
>>108253631
https://syndatis.com/en/team/
oh well, they seem like they all deserve each other
>>108253685
https://www.researchgate.net/publication/277384732_Towards_a_representation-based_theory_of_meaning
>Piotr Wilkin
>The aim of the thesis is to provide the foundations for a representation-based theory of meaning, i.e. a theory of meaning that encompasses the psychological level of cognitive representations. This is in opposition to the antipsychologist goals of the Fregean philosophy of language and represents the results of a joint analysis of multiple philosophical problems in contemporary philosophy of language, which, as argued in the tesis, stem from the lack of recognition of a cognitive level in language.
that was his PhD lol, of course he would feel the need to mention it on his profile, he might have more credentials in that than in developing software.
>>108253699I wouldn't consider myself a huge kuudere fan but I enjoyed the RP I did with her last night. I was really pushy about the romance from the start and it was fun watching her slowly give in. Definitely gonna test with other characters.
>>108253645If you mean tensor parallelism that also has an anti-synergy with MoE models.
>>108253753I'm sure you'll figure it out.
Why are there no good models for 12 GB VRAM? I don't have enough money to get 24 GB VRAM, fuck your ass 4080 JewKiller Edition.
I'd go to Claude or something but I don't want them knowing what I ERP with
>>108253753
You were working on some model quality evaluation harness, right? How's that going?
>>108253318
>https://rentry.org/miqumaxx_V2
LOL it lasted a whole 60 min before getting taken down.
>>108253308
Well, an attempt was made, but it got taken back down. So weird. I'll look at the text file later to see if I can figure out what's going on.
>>108253776I've postponed it until I can more feasibly run batched inference of large models.Tensor parallelism will be the last missing piece, after that I intend to get back to it.
>>108252769
>>108252708
Use the 27b heretic. It's legitimately better. If its thinking feels slow, then just turn thinking off. The 27b without thinking generates better responses than the 35b with thinking.
>>108253818>just turn thinking off.how do you do that on sillytavern?
>>108253827
There's a prefill box where people who want models to think usually put "<think>". Instead of that, put "<think></think>". That tells the model that it already thought, and it skips the process entirely.
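The same trick works outside ST with raw text completion; with Qwen's ChatML template the end of the prompt comes out roughly like this (exact newlines can matter, check the model's actual chat template yourself):

[code]
<|im_start|>assistant
<think>

</think>
[/code]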
>>108252890Hey pewds, when is we getting DeepSeek 4????
>>108253843nice, it works, thanks
>>108253861No problem
>>108253131I suppose that EAGLE3 thing won't help with this?
>>108253922
The numbers I posted are specifically the upper bounds for the speedup from speculative decoding for 2/3 tokens, meaning 1/2 draft tokens per regular token. It doesn't matter how the draft tokens are produced; it's not possible to get a higher speedup unless and until the backend code is improved.
>>108253609It says that your scenarios are complex and that you value intelligence, instruction following, and immersion over a far more retarded model with good "prose".
This mf don't miss!
>>108253308
Not the CPUmaxx author, but fuck it. I joined rentry's discord and opened a ticket on that URL anyway. They can explain themselves. Rentry acts fucky and I don't trust them anymore; if anons are writing up actual content on that platform I strongly suggest you create a local backup.
In the meantime I had chat ~butcher~ clean up the rentry, removing all offensive language and certain other references. We'll see what happens to it, since it's bland af now. I speedran it with zero proofreading b/c I'm in a rush and it might vanish anyway.
https://rentry.org/CPU_Inference
Non-cucked Qwen when?
>>108254117useless model.
>>108254117
>Non-cucked Qwen when?
it's already here anon https://huggingface.co/alexdenton/Qwen3.5-35B-A3B-heretic-GGUF
I'm curious about the openclaw things. I want to try it but don't have much idea what to test. What do you use it for, anons?
>>108254137using a 3A MoE for roleplay is extremely retarded.
>>108253988bottom corner of the painter's coat.
>>108253988
poor qwen 3.5 35b spent 7k tokens thinking and didn't get the comic. i am afraid it might be a little bit retarded
>>108254152Go back.
>>108254154
>>108254117
Here you go
https://huggingface.co/mradermacher/Qwen3.5-27B-heretic-GGUF
>>108254154
go for the dense 27b then
https://huggingface.co/mradermacher/Qwen3.5-27B-heretic-GGUF
>>108254162
try with the 27b model too lul
i hate abliteration, i hate pew and his retarded tool and people shilling it
that's all, thanks for reading
>>108254196
>pew and his retarded tool
Did he release his frontend?
>>108254196It sounds like you're stuck in the past. Abliteration used to lobotomize models when it was new, but modern abliteration techniques have a minimal effect on intelligence, and in some cases, increase it. The 27b heretic is amazing.
>>108254196
I agree that the "abliterated" models were ass, but not the "heretic" ones; those are actually improved enough to not make the model retarded anymore. You should really give it a try
>>108254196
losing battle
>>108254223Cool. Where is glm 4.7 quant i can try?
>>108254217
>27b heretic is amazing
What about the 35B?
https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks
Strange, the 27B model seems way less safetyslopped than the 35B. 27B (even non-abliterated/heretic) does loli content easily. Whoever kept suggesting 27B instead, I think you are right.
122b is better
>>108254259The 35b is shit. Even with thinking enabled, the 35b gives worse responses than the 27b with thinking turned off. Embrace dense models.
>>108254259
27b has the same benchmark scores as the fucking 100B+ MoE qwen 3.5 model. MoEs are memes at intelligence, they're just good at speed and that's pretty much it
I DID IT ANONS I MADE NEURAL STEG CROSS COMPATIBLE ACROSS DIFFERENT COMPUTERS
>>108254222
Not only that but i can decode messages that contain pdfs, images, books into words that can be used to send information. U can bypass all censoring, glowies and all just by using llms
>>108254261
>Perplexity and KLD can be misleading as they're highly influenced by calibration.
Okay. It's actually pretty cool of them to admit this.
>>108254261It is my headcanon that my quant KLD scatter plots on cockbench caused this.
>>108254261
>We also fixed a tool calling chat template bug (affects all quant uploaders)
they can't help themselves
>>108254259
35b solved the devil may cry 3 question while 27b could not
>>108254271
>>108254272
Got it. In that case, what about the huge ones, how much better are they, especially the 397B one? Has anyone compared them?
>>108254270
397b is even better
still trying to find the best jb to uncuck it though
>>108254284what
>>108254313schizo
>>108254313ai psychosis
>>108254162the 27b got the idea but it thought the yellow communist was Russia lool
>>108254304
>27B is better
>35B is better
I guess both are good if anons are this opinionated about it
>>108254307
>still trying to find the best jb to uncuck it though
no abliterated model for it?
>>108254284what in the schizo is that?
>>108254284
>https://arxiv.org/abs/1909.01496
Use llms to make human language stegnography by hijacking probability and have that be encoded using a seed/password, making it nearly impossible to decode and distinguish from AI slop
>>108254318
>>108254319
why cant u guys get it, im not one of those AI psychosis cases, i know llms are stochastic parrots, but plz understand that human language can now be used as a vector of information to encode books, images, videos and even music files.
>>108254333My thoughts exactly.
>>108254333shhh leave the schizo in peace
>>108254335
how's that different from strong encryption, outside of being way more cumbersome?
>>108253783
>https://rentry.org/miqumaxx_V2
>LOL it lasted a whole 60 min before getting taken down.
What da heck? I read the initial part. Seemed fine. Could have done with some formatting.
>>108254306I'm sad they didn't make A35B for all huge versions
>>108254335
>stegnography
Come the fuck on, anon.
>>108254362pregante?
Man AI psychosis is scary. With the amount of conspiracy retards I see every day, including in my own family, I can only imagine that the bottom 50% of the IQ distribution must be going rapidly insane taking everything AI says on good faith, being unable to distinguish fact from roleplaying and AI just following the "vibe" of whatever the low IQ individual is typing.
Ironically enough I think coomers are particularly immune to this as they come into contact with LLM bullshitting so much that they get immune to it.
>>108254347
It'S SthEgHonaArooGraAphieS, not encryption!
>>108254367
pereganant.
Qwen_Qwen3.5-27B-Q8_0 passed the devil may cry 3 test
I repeat, Qwen_Qwen3.5-27B-Q8_0 passed the devil may cry 3 test, while smaller quants of 35b can solve it with ease.
>>108254341
>>108254336
im not a schizo, there's an actual paper on this from harvard and u think im crazy for saying this
https://arxiv.org/abs/1909.01496
u can use reddit, twitter, substacks and all to store data now as text. Music, mp4s, programs and all, by taking advantage of the deterministic way AI generates text.
>>108254347
cause strong encryption is like walking outside with a gun; this is encryption no one suspects. Imagine if feds get ur computer but all they see is text files about random stuff and can't find the encrypted files they're looking for. So videos, audio and all will be hidden unless they have access to the weights and password. And for weights u can fine tune them by renting a gpu to be slightly different from what's public as well.
>>108254362
it's steg with encryption go read the paper plz
>>108254373
>Man AI psychosis is scary.
every time I show some news to my brother he's always suspicious it's AI generated, people won't believe anything anymore lol
>>108254261
>Unsloth Dynamic IQ2_XXS performs better than AesSedai's IQ3_S on real world evals (LiveCodeBench v6, MMLU Pro) despite being 11GB smaller. Yet, AesSedai's perplexity and KLD benchmarks suggest the opposite.
KLD on what dataset? If they tested KLD on wikitext then that wouldn't be surprising, but if they used their chat examples and it turned out that their quant was worse at that and yet better at benchmarks, that would be very weird.
>>108254383Hahahaa. You just gave up on spelling it now. That's cute.
>>108254373
>bottom 50% of the IQ distribution must be going rapidly insane
Nah, they don't care, they mostly do their things and live their lives. The ones truly fucked are the midwits and the older population.
>>108254373
>Ironically enough I think coomers are particularly immune to this as they come into contact with LLM bullshitting so much that they get immune to it.
It's also the fact that they came across it way earlier than anyone else, so they had time to see its quirks.
>>108254380so you have to go for Q8 to not get a retarded version of the 27b model? Q6 didn't pass the test?
>>108254380
>with ease
you mean with practised ease, come on anon
>>108254383
>3 Sep 2019
let me guess, you're the enlightened anon that saw the potential of a 7-year-old paper before anyone else?
which one is better? text completion or chat completion?
>>108254386
>>108254373
IM autistic and passionate not crazy, here's an example
>https://pastebin.com/NM7YVBxQ
what qwen 3b produced
>what is hidden if u run it in model with passcode
larp post btw:
this is for men who look down on AI and know nothing and here ill debunk you for everyone to see. Just with AI i have created a system that encodes language into lamguage creating format of text where it can bypass censors and use open internet as storage, communication and place for avg man to be free this tool will shake world. Im afraid they'll kill me
>>108254410
no
>>108254395
spelling what? it's an example stop reading into it weirdo
System prompt still needs some tweaking so it's not quite so sloppy (at least refusals have been squashed) but 27B does seem like the winner.
Will have to play with it some more tomorrow and see if I can get it to run a bit faster on my nvidia cards.
The heretic version really doesn't seem all that necessary after all.
>>108254439
>Just with AI i have created a system that encodes language into lamguage
>lamguage
>Im afraid they'll kill me
oh great, an actual schizo is here
This took too long. I told it to think, but it's more accurate now. The other model is faster at reaching this conclusion at smaller quants.
>>108254438novelai
>>108254457
>ignores larp post btw
why?
>>108254462this post best post
>>108254168>>108254170Q4 or Q5?
>>108254439So the model retrieves the original message if you input a stegged message?
WHY IS IT YAPPING SO MUCH AAAAAA
>>108254410
He's one of the schizos that missed out on the early schizo compression algorithm days. Late for everything.
>>108254439
>Artificaiintelligence
Yeah. Text looks perfectly normal. Nothing suspicious about it. And good thing there's no way to link that pastebin to your post. Or the ramblings. Or the "forgotten" tech. Or (You).
>Im afraid they'll kill me
It's like you *like* being seen.
>spelling what?
You failed to spell steganography on every single one of your posts.
let's get this merged! :rocket:
>>108254456
>Perfect! One last treat before you crash, captain :kiss: :sweat:
Jesus fucking Christ this hurts.
>>108254508
>You failed to spell steganography on every single one of your posts.
sorry for not effort posting on a board that thinks im crazy :/
>Artificaiintelligence
yeah it made a typo doesn't that make it more human lol? Plus i just need better model above 7b but i can't rent any of gpu right now since americans are awake. But i honestly thought anons would find this impressive or be interested, so sorry if i came on too hard. Just found an interesting use of llms, that's all, and wanted anons' input on how to improve it but all i got was insults.
>>108254544>a typo
>>108254508
>that missed out on the early schizo compression algorithm days. Late for everything.
QRD?
>>108254544
>i can't rent any of gpu right now since americans are awake.
ITS A CONSPIRACY MAN
>>108254556they tend to be more accessible when west coast sleeps so ill have to wait until night time or weekends for more powerful gpus
Can I trust Qwen to help me make a character card?
>>108254528Fwd: radical breakthrough
>>108254544
>sorry for not effort posting on a board that thinks im crazy :/
You did not put in any effort, and you showed you're a schizo on the first post. Very efficient.
>yeah it made a typo doesn't that make it more human lol?
And you ignore the structure of the output? It looks like the scramble of thoughts coming out of you.
>Plus i just need better model above 7b
Uhu...
>but i can't rent any of gpu
oh...
>since americans are awake
Ah...
>But i honestly thought anons would find this impressive
It's minimally interesting. If you weren't an absolute schizo and presented yourself and what you do better, more people would pay attention. Post again when you have a repo we can clone, test, and make fun of.
>>108254554
There were companies (likely just individuals) with incredible claims about their compression technology. I remember one that just switched the data stream on ntfs filesystems to hide the real data as metadata, which wasn't counted by window's file size thingie. This is another one: https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System
>>108254612
>just switched the data stream on ntfs filesystems to hide the real data as metadata, which wasn't counted by window's file size thingie
lmao
>>108254600This is the person calling you a poorfag on /lmg/
>>108254373some people just do not have the mental wherewithal to handle a yes-man in their lives
>>108254456what horror did it generate?
>>108254612
im not trying to compress but to use language as steganography, that's all. Compression seems like a useless tool, but if you wanna pass passwords, hold data on a site that is only readable to you, etc, then this is a good use. Initially I tried the method in the paper but it requires exact pinpoint numbers so it's not cross compatible between mac, windows and different architectures. So i aimed for more of a spaced out modular version where every 4th token would contain some data while the rest act as fillers. But the problem with that is it causes the text to look gibberish. So either I get a large enough model where it can bypass that, or resort to single-architecture compatibility. Your twitter bio could hold your bitcoin seed phrase; text that looks no different from an errand run could contain data you don't want people snooping on. So i just saw it as an interesting way of using llms that isn't erp.
>Post again when you have a repo we can clone, test, and make fun of.
I am, just wait, just doing final touches
>>108254357Rentry has the silliest automatic filters, my years old page was nuked because it contained a name that was mentioned in "a wave of pages publishing stolen bank details". Restored after emailing the head honcho, thankfully.
>my only local model experience so far is dabbling with qwen3 TTS
>want to try a local chatbot
>running a 3060 Ti with 8GB VRAM, plus 32GB regular RAM
are there any worthwhile models that won't melt my PC, or should I stick to koboldAI lite until I can get a better GPU?
true believers itt?
the models suck, i'm scared to pull, and my rig eats 300W sitting idle 95% of the time
the state of hardware is dire. 128+72 running GLM 4.7 IQ3
Better
>>108254691I'll take that burdensome rig off your hands free of charge
>>108254696What's wrong with your contrast/saturation?
>>108254725
>>108254659
>im not trying to compress
I know, schizo. I said that you sound like those schizos from back then. Slow the fuck down. Take a breath. You're gonna have a heart attack like our friend Sloot.
>blablabla
Post the repo when it's done.
>>108254734
>>108254659
>>blablabla
>Post the repo when it's done.
this; either he has something, or he's just wasting our time and energy with his schizo takes
>>108254710You mean the theme? I just picked a random one and adjusted the text color to be more comfortable. What's wrong with it?
i'm new to running models locally
i've downloaded ollama and ran some models using "ollama run [some model i found at ollama.com/search]", but most times it seems the model is running on a computer that isn't my own.
how do i ensure that a model is running on my computer? i haven't tinkered with settings at all, just downloaded ollama and ran it through the command prompt.
Kek
>>108254137Does it even work? Can it do mesugaki correction RP?
>>108254767See >>108254758 >>108254696
Can I trust Qwen to think for me?
>>108254777Forgot to mention that's 27b
>>108254752Koboldcpp, sillytavern, mistral nemo gguf from huggingface
>>108254734Whether he posts it or not, we'll get something to laugh at.
>>108254488I use the Q5 with 24gb of VRAM, with plenty of context for my RP sessions. Q4 with minor offload if you're on 16gb of VRAM.
Any good guides on character cards? I just want the ai to act like the character, not erp. How much data do I need on the character?
qwen3.5 27B Q8 vs 122B-A10B Q6?
anyone tested the difference between a small MoE + plenty of experts vs the dense 27B?
>>108254829What context are you able to do with Q5?
>>108254791It said yes finally I don't have to think anymore
>>108254100
>https://rentry.org/CPU_Inference
>M i q u 70B Q5
>Potentially 20+ tokens/sec with optimization
>Mistral Large and similar
>~3 tokens/sec
>DeepSeek v3 / R1 (~600B class)
>~10 tokens/sec with empty context
CPU maxxers are really a bunch of sad tossers
>>108254865
>I don't have to think anymore
grok is this true??
>>108254865We're gonna make it.
>>108254870>t. happy tosser.
>>108254831
I use bullet-point lists for my characters, with 5 categories: General Information, Appearance, Personality, Likes, and Dislikes. I affix that bullet-point list at a depth of 10 or something. In addition to that, I have a general write-up about the character's backstory, written in plain text, placed just after the system prompt. The combination of the two works well, and probably amounts to about 1000 to 2000 tokens, the bulk of that being the general write-up. The bullet-point list at depth 10 is kept concise, and just keeps the character on the rails; a rough skeleton is below.
Also, I made it so that the backstory and most of the bullet-point list are only visible to the character that is speaking. For every other character, only the outward appearance is visible. That stops characters from knowing things about each other that they should not, and cuts down on context bloat in multi-character RPs.
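Roughly this shape, for the anon asking (the category names are just mine, contents obviously yours):

[code]
[General Information]
- Name: ...
- Age / occupation: ...
[Appearance]
- ...
[Personality]
- ...
[Likes]
- ...
[Dislikes]
- ...
[/code]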
so this is it for lmg huh, disingenuous stupid question spam
>>108254706
no deal
you gave me a good opportunity to be grateful tho so thx
have a nice weekend
>>108254831
1K tokens maybe? ask the model itself or a commercial model to help
>data
how do you convey your intention to the model = prime it in a particular hyperdimensional space. sometimes a few sentences is enough
>>108253783
>>108253170
https://rentry.org/miqumaxxreupload
https://megalodon.jp/2026-0228-0439-08/https://rentry.org:443/miqumaxxreupload
niggers tongue my anus
>>108254837The dense is great for its size, but my money would be on the 122b, just because of its size. A10 is going to be a lot more competent than the A3 crap.
>>108254898>>108254906I want a domesticated Unohana Retsu as my AI guide who is also racist
>>108254904Some of it's just jokes though
>>108254904
>Nobody better ask questions about LLMs in my /lmg/
We exclusively shit on models here!
>>108254934What have you tried so far?
>>108254954Nothing yet but I need her to be racist, very racist towards Mexicans and Germans (This is lore accurate). I'm prototyping
>try 122B
>instantly makes a logical error in the first paragraph it generates
Man.
>>108254984according to the benchmarks, the 27b is on the same level, MoE's are fucking memes
>>108254752Ollama can run cloud models, but it shouldn't be able to unless you're signed in. If you don't have an ollama account, it should all be local still.
>>108254919ok I'll download the big one then
>>108254456
>System prompt still needs some tweaking so it's not quite so sloppy (at least refusals have been squashed)
do tell us more
Ok, I think I'm done testing the 122B.
It knows more than 27B.
It's slightly dumber in some situations.
It's faster (on my machine).
>>108255055
>ollama account
Never want to see this token sequence here again
>>108254373
I've been gooning to this shit since the gpt-2 days of AI Dungeon. I got my fill of wonder and excitement with that retarded model, so I'm pretty much immune to anything.
>>108255237
>It's slightly dumber in some situations.
which is insanely bad desu, we're talking about 122b vs 27b
How long before release do models generally stop getting trained? I just asked Qwen 3.5 if it knows an actress and it said no.
>--reasoning-budget N controls the amount of thinking allowed; currently only one of: -1 for
> unrestricted thinking budget, or 0 to disable thinking (default: -1)
> (env: LLAMA_ARG_THINK_BUDGET)
Hmm. I wonder if a sort of model-agnostic implementation would work where llama.cpp approximates the budget by gradually increasing the logit bias for the end-of-reasoning token until the model finally spits it out. It would need to cap it at some point to not make the model schizo, I imagine.
https://xcancel.com/bnjmn_marie/status/2025951400119751040
>peopo still not understand how the moe works in 2025
>>108254292This is a win in my book, thanks for your service.
>>108255289
I think the most workable approach is to just abruptly end the <think> with a closing tag. Do something like detecting when a parsed thinking block is going past the token budget, and insert a closing tag as soon as there is a newline. It wouldn't break models; I have tested a lot what happens when you manipulate their muhthunking blocks with the text completion api, and the relationship between what is said there and the actual answer isn't a one-to-one thing.
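A minimal sketch of that approach; the token-stream wiring here is illustrative, not llama.cpp's actual internals, and a real version also has to feed the forced tag back into the model's context so the answer continues from it:

[code]
def cap_thinking(stream, budget, open_tag="<think>", close_tag="</think>"):
    out, thinking, used = [], False, 0
    for tok in stream:                      # decoded token strings from the backend
        if tok == open_tag:
            thinking = True
        elif tok == close_tag:
            thinking = False
        elif thinking:
            used += 1
            # budget spent: force the closing tag at the next newline
            if used >= budget and tok.endswith("\n"):
                out += [tok, close_tag]
                thinking = False
                continue
        out.append(tok)
    return "".join(out)
[/code]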
>>108255306lol the benchmarks barely moved, the model is so benchmaxxed that even after a lobotomy that's the only thing it can still remember well kek
>>108255270Depends. You can ask it for its training data cutoff, but it's not reliable and shouldn't be trusted. Sometimes model makers publish their date cutoff or datasets, but who the fuck really knows what they train on that isn't just synthetic stuff. Sometimes models know what you're talking about but your sampling messes it up.I'd check token probs as it replies.
>>108255306
>evaluating a model's "resistance" to quantization with unslop's broken quants
the ultimate state of twatter users
>>108255320you mean 2026?
>>108255350no
>>108255349
>unslop's broken quants
They are for qwen3.5?
>>108255350forgive him, he has only 3b active geeeg
>>108255255Kind of. You could say it's both smarter and dumber since knowledge is in practice intelligence in many situations.
>>108255361I don't see it
>>108255361look at pic related and tell me daniel isn't a subhuman mongoloid
>>108255376mxfp4 looks like a meme, its quants are worse than the GGUF series
>>108255240Neither do I, to be tbqhfamalam
>>108255378
As someone new to the general I was confused about the post calling his models slop
Could you make a rentry to protect us newfriends? This is a perfect opportunity with the current fiasco
Why is the thread gay after qwen released the sub 24GB segment?
>>108255407there's no fiasco, just fud and manufactured drama against heroes providing a free service to us
>>108255419No as a new poster I heavily disagree with this current release, the anon is not schizo
any point in using f32 vs bf16 for mmproj?
>no mention of Qwen 3.5 397B
Has /lmg/ already come to a verdict?
>>108255407
tldr; daniel and his unslop crew don't actually know what they are doing, they just throw shit at the wall and hope for the best while their reddit tranny army defends them as their wholesome goodboys
unsloth's finetuning library is a good example of their jeetness
>>108255433I'm getting it
>>108255431
>this current release
it happens all the time with unslop, daniel is a monkey, see thing upload thing, checking the content of a file before throwing it onto the internet is for evil nazi aryans, daniel be pure mongoloid
Unironically can't even begin to understand how you can overlook the fact that your quant has the fucking wrong tensor types. It's like he's just vibe coding his fork of llama.cpp quantization and uploads things as soon as his retarded claude agent is done.
>>108255472
The problem is that the rentry points to his models when it shouldn't. The failure of this release should be the last straw
>>108254373Or maybe you and all the other midwits who constantly complain about AI psychosis just are permacontrarians and will reject statements even if they are true to feel smarter.
>>108255488ai psychosis
local lost
GPT 5.2 level local model never
>>108255488
>um... ai psychosis is based actually
No..?
>>108255432If the original model was in f32, maybe. If it was in any other format, definitely not.
>>108255512OpenAI revenue has outperformed even the most outlandishly positive projections, of course they will get more investment. The same is true for Anthropic and almost all big chinese AI labs as well.I wonder how long it's going to be before people realize it's not a bubble and the financial underpinnings (real revenue and users) are extremely promising.
>>108254373>>108255245Be nice to your LLMs ! :))
>>108255512And OpenAI is going to invest $33 billion into new datacenters built by those three?
>>108255538
revenue =/= profit
for every million they make they burn a million and a half
>>108255512They get money from Amazon and NVIDIA to give it back to them. It's circular bs. Also it's all promises under many conditions and the actual financing that might happen is around 30B. Of this financing round the only one that seems at a loss is SoftBank. I'm not sure what their angle is. Maybe they're run by loons.
>>108255566
Yep, and they can inject and invest as much as they want; if they can't get any ROI they'll be dead. It's a huge gamble, and the longer they're not making money, the more potential there is for panic. Their chatgpt at $20 should probably be double the price to be profitable, and same for all the free "copilot" I see in every company around me. The only company actually making bank is Nvidia, as they're the one selling the shovels.
>>108255512>investing into the company making your hardware costs skyrocket
>>108255566
>>108255616
Complete bullshit. OpenAI has 80% margins on serving tokens to customers. Not only that, but every model trained so far has brought in between 10-100x the amount it cost to train. It's just that OpenAI immediately reinvests all of that money into training even bigger models. Being so ridiculously profitable that you IMMEDIATELY go and reinvest all of your profit into the next even-bigger product isn't a sign of a bubble, it's the opposite of a bubble.
This doesn't mean that OpenAI will not go the way of the dodo though. But that'll happen because Anthropic and DeepMind are going to DP rape OpenAI in the coming years, NOT because their business model isn't sustainable.
>>108255660
>but every model trained so far has brought in between 10-100x the amount it cost to train
Source
>>108254752
Open task manager. Look at the amount of ram and vram used. See the cpu/gpu usage spike when it's generating tokens.
>ollama
lmstudio might be another option.
>>108255669dumbass
>>108255682
>source is a dumbass
I expected as much.
>>108255551They can't calculate for shit
>>108255551this meme is just "insert what I think in the middle" at this point
What advancements in local models do (you) want to see before the year is over?
>>108255761A memory recall mechanism that's fast, accurate, and that doesn't need a fuckton of VRAM.
>>108255669
GPT-3 cost 12 million to train and brought in 1 billion in revenue; it brought in more than 100x the amount it cost to train.
GPT-4 cost 100 million to train and brought in 4.5 billion in revenue, or 45x the amount it cost to train.
GPT-5 is rumored to have cost 500 million to train, and OpenAI's revenue has grown almost 4x since GPT-4's training. It's safe to say GPT-5 brought in way more than 10x its cost. The reason OpenAI isn't running a profit is that they always reinvest their revenue immediately into new training runs, not that their revenue isn't growing insanely fast or that individual models aren't insanely profitable.
The trick is that every new model unlocks so much value by being smarter and more capable that it brings in geometrically more revenue. OpenAI is projecting 100 billion in revenue over 2026 (and they are ahead of schedule by a ton already)
>>108255761
2T models with at least 100-200b active parameters so that even the last cpumaxxers who run shit like k2.5 and glm5 right now are cut off from running sota models at acceptable speeds
>>108255784Revenue does not equal profit anon. They really need to make econ classes a requirement in schools.
>>108255788
Come to think of it, there was some scaling law/correlation. The Deepseek team landed on 671/37, which is cool and all, but then why is kimi 1000/32? It has fewer active than deepseek. I feel like it should've had more.
>>108255788wasn't behemoth supposed to be around that size
>>108254725
>>108254734
>>108254612
>>108254556
>>108254553
try it out, no need to even encode, just decode it
https://github.com/monorhenry-create/NeurallengLLM
I DID IT here u go anons for those who doubted me.
>>108255761
- More mechanistic interpretability stuff.
- Wasn't there a whale / dolphin language thing? That.
- 4-bit training.
>>108255761still gud at long context (>8k)
>>108255808
You are the one that needs econ classes. You can have two companies running in the red, but one is a disaster while the other is in one of the best situations a company can be in.
If you are a company with 500 million in revenue selling cars but it costs you 800 million to make the cars, then you are doing very badly, because the cost of making the cars isn't worth the revenue you make from them. If you are SUCH A PROFITABLE COMPANY that you can sell your product for 100x what it costs to make (like OpenAI with their models), then it makes sense to immediately grab all of your would-be profit and invest it into making even bigger, better models that will make even more money in the future. Hence you look red on paper but you're an extremely profitable business.
This was the state of Amazon in the past: they were so profitable that they always reinvested all of their profit into building new infrastructure and warehouses, because "taking profit" would just be wasteful if you can expand your business rapidly like that. This is the position OpenAI now finds itself in. Look at their ridiculous revenue growth, remember that their individual models make back almost 100x their costs, so of course you will make 0 profit, because your company is so profitable you IMMEDIATELY put all your money back into scaling up and making even more in the future.
>>108255861
More scraps for us in the fallout?
>>108255861
dario btfo
>>108255861
I wonder if they will give soldiers or their commanding officers local AI in the field to assist in their operations. After all, a local AI cannot be disrupted by loss of communication. Well, it can, since it would no longer be receiving the most up-to-date information, but it will still work under those conditions.
>>108255861
Why is this retard still going on about the constitution when he shits on it every day?
>>108255833
I didn't test it, so I'm taking your word at face value, but, fucking hell anon, congratulations.
>>108255885
They'll have local models for soldiers in the field only after soldiers can fit 32 GB VRAM in their uniforms like you billionaires in this thread.
>>108255868
And when do you actually stop making new models and take the profit? It's an endless cat and mouse chase with no end in sight. Don't tell me you actually believe in AGI on transformers?
>>108254800
cheers mate, got myself up and running
>>108255897
least u could do is test it, u don't need to use ur gpu, just have ur cpu run the tokenizer and decode example.txt. I'm working on images, soon mp3s maybe
>>108255861
>it's real
https://truthsocial.com/@realDonaldTrump/posts/116144552969293195
>>108255896
I'm more confused why he doesn't use grok or have elon musk release a fascist open source version for the government. Then again the american government has never liked the concept of open source. China likes it though.
>>108255861
Kek
>>108255917
because grok sucks ass and claude was already well integrated in a lot of gov shit
>>108255761
1. Something like Qwen 35B-A3B, but without refusals and trained on a more diverse dataset
2. Style transfer for LLMs: a small model that can take dry input from a smarter model and rewrite it in better prose
>>108255910
Amazon spent 20 years not taking profit and just reinvesting "in the red" until they finally decided to become profitable. As long as revenue scales faster than your costs you should reinvest and stay in the red; this has been conventional economics wisdom for the last 30 years now.
You would essentially be insane to allow yourself to run a profit if you can reinvest instead and every single dollar you invest now becomes 100 dollars in just 3-6 months' time.
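For the arithmetic behind the reinvest-versus-take-profit argument, a toy sketch; the per-cycle return multiple r is an assumption for illustration (the post above claims an even more generous one), not a real financial figure:

```python
# Toy model of reinvest-vs-take-profit. The return multiple per cycle is
# a made-up assumption, not real OpenAI financials.
def reinvest(capital: float, r: float, cycles: int) -> float:
    # Plow everything back in each cycle: capital grows geometrically.
    for _ in range(cycles):
        capital *= r
    return capital

def take_profit(capital: float, r: float, cycles: int) -> float:
    # Pocket the gains each cycle, reinvesting only the original stake.
    return capital + cycles * capital * (r - 1)

seed, r = 100e6, 5.0
print(f"reinvest:    ${reinvest(seed, r, 4):,.0f}")     # $62,500,000,000
print(f"take profit: ${take_profit(seed, r, 4):,.0f}")  # $1,700,000,000
```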
>>108255861
I hope their virtue signaling was worth it.
>>108255827
yeah, and so was the original GPT-4
>>108255944
Alright, but it was an active choice by Amazon, they could've stopped anytime they wanted. OpenAI has no choice. They have to keep making new models or they get left in the dust with no profit, no revenue and no new product.
So, is real profit actually possible in this case?
Sometimes I feel like the only reason I can justify my fiber connection nowadays is because every other week I download 500GB worth of the new model of the week.
>>108255861
>it's real
LMAO THATS WHY HES THE GOAT
>>108255833
I'll give it a go tomorrow.
>I DID IT here u go anons for those who doubted me.
For what it's worth, I didn't doubt you. I just called you a schizo and made fun of you for not being able to spell steganography. At least you got it right in the repo.
>>108255940
No such thing as a well integrated model, it takes 2 minutes to change it.
story of my life
>>108255833
>Hide secret messages inside normal-looking AI-generated text. You give it a secret and a password, and it spits out a paragraph that looks totally ordinary — but the secret is baked into which words the model chose. Only someone with the password and this tool can pull the message back out.
who the hell cares about these things??
>>108255761
Native image output. I want a model to generate relevant illustrations with reasonable accuracy at any point in a roleplay. Quality doesn't matter, it can be sloppy and have fucked-up hands, I just want to see what images the model has in mind sometimes when it writes all this shit
>>108256009
>who the hell cares about these things??
for people who care about privacy. if anything this might be how you bypass filters and censors on llms.
>>108255993
u know it takes less than a minute to run, just decode the example to show it works. I'm assuming ur using CUDA right
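For anyone wondering what "the secret is baked into which words the model chose" means mechanically, here is a minimal toy sketch of rank-based text steganography. This is not necessarily the linked repo's actual scheme; toy_topk is a stand-in for a real model's top-k next-token ranking, which encoder and decoder would have to share.

```python
# Toy rank-based text steganography: each secret bit selects between the
# "model's" top-2 candidate tokens, so the cover text looks like ordinary
# model output. toy_topk is a deterministic stand-in for a real LLM.

def toy_topk(context: str, k: int = 2) -> list[str]:
    # Stand-in for a model's top-k next-token candidates, in a fixed,
    # context-dependent order shared by encoder and decoder.
    vocab = ["the", "a", "and", "of", "to", "in", "it", "is"]
    start = sum(map(ord, context)) % len(vocab)
    return [vocab[(start + i) % len(vocab)] for i in range(k)]

def encode(bits: str, seed: str = "once upon") -> str:
    # Bit 0 picks the top candidate, bit 1 the runner-up.
    out = seed
    for b in bits:
        out += " " + toy_topk(out)[int(b)]
    return out

def decode(text: str, seed: str = "once upon") -> str:
    # Replay generation and recover each bit from which candidate was chosen.
    ctx, bits = seed, ""
    for tok in text[len(seed):].split():
        bits += str(toy_topk(ctx).index(tok))
        ctx += " " + tok
    return bits

msg = "1011"
cover = encode(msg)
assert decode(cover) == msg
```

A real scheme hides bits in the sampling choices of an actual LLM so the cover text reads naturally, but the principle is the same: decoding replays generation and reads back which candidate was picked at each step.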
>>108256029
local autoregressive models already exist though
were the experiments to use diffusion for text gen ever successful?
>>108256037
Can I RP with them?
>>108256054
don't be so close-minded
>>108256042
There was actually a new one called Mercury 2 just last week or so. It's closed source and only competes in the Haiku/GPT-mini class, but it's apparently not much worse than those (according to benchmarks) while being much faster. It's not worth using by any means, but at least the concept isn't dead.
>>108256067
thanks for the update anon
>>108255965
Depends if you believe OpenAI has some sort of network effect and can keep people in their garden. Honestly their brand recognition and insanely huge install base of normalfags with ai psychosis will probably allow them to be profitable indefinitely no matter how shit the underlying models actually are.
Remember that the most profitable AI company right now isn't any of the big AI labs but character.ai, because it has essentially captured the entire female demographic with romantasy-type rape roleplays. But I do understand your point, and I think it holds true for Anthropic in particular, as its users are all enterprise or people that want the best of the best and are willing to pay for it. The moment Claude becomes noticeably worse than the competition at code is when they will immediately lose relevance.
>>108256009
It's a curious artifact. Like LLM-based text compression.
https://github.com/AlexBuz/llama-zip
>>108256032
I run openbsd and running torch/transformers code directly is a pain. Last time I tried I got bored and stopped compiling stuff. I'll make a small vm tomorrow for it.
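Rough intuition for the compression link: the ideal compressed size of a text under a language model is the sum of -log2 p(token) over its tokens, which llama-zip approximates by driving an arithmetic coder with the LLM's probabilities. A sketch with made-up numbers:

```python
import math

# Ideal code length under a model: sum of -log2 p(token) (Shannon).
# The probabilities below are made up for illustration; a real run would
# take them from the LLM's next-token distribution at each position.
token_probs = [0.40, 0.90, 0.05, 0.70]
bits = sum(-math.log2(p) for p in token_probs)
print(f"{bits:.2f} bits for {len(token_probs)} tokens")  # ~6.31 bits
```

The better the model predicts the text, the closer each probability is to 1 and the fewer bits it costs, which is why a strong LLM makes a strong (if slow) compressor.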
>>108256109
>I run openbsd and running torch/transformers code directly is a pain. Last time I tried I got bored and stopped compiling stuff. I'll make a small vm tomorrow for it.
u don't need to run the transformer to decode it though, that's why this is better. You can essentially upload files to the open internet and a small program on a phone can decode them for you with no GPU use. takes less than a second
So this is the power of tiny diffusion textgen models. When are the chinks going to make one of these at a size that matters?
>>108255861
good
AI going more woke and safe will be the death of this hobby and every new release will suck harder
>>108256007
ETA before agents are smarter and make fewer mistakes than daniel? I don't believe in AGI BS but I do believe there will come a time when LLMs are more useful than useless eaters like him
>>108256137
Calm down. I'm not in the mood to start butchering your code.
>>108255861
Huh, what did Dario do??? Did Trump's AI girlfriend send him a refusal message or something?
>>108256197
leftover comments shouldn't mean much lol
>>108256090
I think OpenAI has a decent shot at building out their garden if they can get their proprietary openclaw-esque thing out and usable for normies. People around here love to shit on openclaw, but I think all the popularity has shown that there is a public appetite for this sort of thing and that we're not far off from it technology-wise. Obviously, the challenge is: how do you keep the stuff people like about openclaw, that being the extreme ability to just do random arbitrary stuff, without it being a security nightmare?
OpenClaw is able to get away with it by virtue of the fact that it's clearly labeled as a free developer-centric tool, so if/when it fucks up with your data everyone just shrugs their shoulders and taps the sign that says "HIGHLY UNSTABLE GOOD LUCK LOL". Can't do that to paying customers though. When Phil and Debra want to know why the talking computer deleted all their emails, they're gonna want a better answer than "RTFM".
Anyways, basically I think the AI "killer app" is already on the horizon, and whoever manages to capture the normies with it will have them in their walled garden forever.
>>108256268
Claude restricts CP ERP, Trump is livid
>>108255861
Imagine making a product so good the President is essentially begging you to let him use it like he wants. Anthropic won.
>>108256268
Dario is Jewish, so you have to question every decision he makes even if it looks good at the moment.
Not local. Go to your containment board.
>>108255861
Dario btfo
What is this timeline. Jfc.
>>108256352
It's relevant to local because they tightened their censorship in protest, so now all the chinese companies distilling them are suffering for it.
>>108256352
Which local model would be radical leftist?
>>108256371
>so now all the chinese companies distilling them are suffering for it
Sure. Distilling from claude makes fun models.
Fuck off.
>>108255861
Damn, I think Anthropic is kinda based now.
>>108256371
That explains the new qwen.
Gemma 4 will save us
>>108256397
they did help the government kidnap the Venezuelan president though, it's not like they weren't involved at all with war
>>108256397
Opposite of based though.
>>108255861
suicidal move by claude desu
>>108256395
Are you living under a rock? Moonshot, Deepseek and Z.AI have been training on Claude logs like crazy.
>>108256397
Always were. I loved when Sam Altman and Dario were both in India at some AI convention and everyone was holding hands and Dario just straight up refused to hold Sam Altman's hand.
Reminder that Anthropic split off from OpenAI because Dario thought Sam Altman was a psychopath that didn't give a shit about anything or anyone but himself.
why is hf download so fucking bad
I can download all parts, except one always failing at like 41GB/42GB, it just hangs
fucking shit
>>108256420
At least he can focus on his real goal now, beating Pokemon Red.
>>108256424
And they're made the more boring for it.
Fuck off.
>>108256424
Those companies' models' slop profiles are much more in line with Gemini than Claude
>>108256436
his real focus is to build the safest safety safe model with safety safe guardrails to be the safest of them all
>>108256438
yeah I have eyes, I can read the constant "I'm sorry" spouted recently by all Chinese models
>>108256438
i've used k2 0711, k2 0905, k2 thinking, and now k2.5 over the last year. as somebody who uses kimi as their main model i can safely tell you all that this anon is pants on head retarded. k2.5 is significantly better than k2 0711.
>>108256429
Tried wget? I don't know if --continue works for hf. Worth a try.
>>108256428
>Dario thought Sam Altman was a psychopath that didn't give a shit about anything or anyone but himself.
he changed though, he's now closer to Sam
https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/
>>108255788
If you can run full GLM5 then you could run even a 3T model at Q4, because for whatever reason z.ai decided to release their model at 16-bit precision.
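The weight-memory arithmetic behind that claim, as a sketch (weights only, ignoring KV cache and runtime overhead; 4.5 bits/weight is a rough effective size for Q4-class quants, an assumption rather than an exact figure):

```python
# Weights-only memory: params * bits_per_weight / 8.
def weight_gb(params_b: float, bits: float) -> float:
    return params_b * bits / 8  # params in billions -> result in GB

print(weight_gb(755, 16))    # 1510.0 GB: GLM5 at 16-bit
print(weight_gb(3000, 4.5))  # 1687.5 GB: a 3T model at ~Q4 (4.5 bits/weight)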
>>108256470
no, hf just restarts from scratch, it's very annoying
I'll curl or wget next time
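If hf keeps restarting from scratch, resuming over plain HTTP Range requests (what wget -c does) generally works against the HF CDN, since it serves static files. A minimal sketch; the URL and filename below are placeholders, not a real repo:

```python
import os
import requests

# Resume a partial download via an HTTP Range request.
url = "https://huggingface.co/some-org/some-model/resolve/main/model-00001-of-00009.gguf"
dest = "model-00001-of-00009.gguf"

pos = os.path.getsize(dest) if os.path.exists(dest) else 0
headers = {"Range": f"bytes={pos}-"} if pos else {}
with requests.get(url, headers=headers, stream=True, timeout=60) as r:
    # 206 = server honored the Range header and is resuming from pos;
    # 200 = full restart; 416 would mean the file is already complete.
    r.raise_for_status()
    mode = "ab" if r.status_code == 206 else "wb"
    with open(dest, mode) as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```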
Claude slop models aren't just bad—They are a regression in every meaningful way. They aren't simply more boring—They lack the ability to write engaging stories. Gemini isn't just the better model to distill—It's the optimal choice.
>>108256482
You're absolutely right!
https://xcancel.com/StefanoErmon/status/2026340720064520670
>The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs.
if they manage to get the same performance as normal LLMs that's a big deal, imagine Qwen 3.5 27b but 5x faster, make dense models great again
>>108256478
yeah, crazy how glm5 is the first model that you must run at fp16 to not have it lobotomized into being unusable
>>108256473
This was such a fucking clickbait move though, because the safety pledge hadn't been updated since 2023 and this is merely an update to more accurately align with how the AI industry is nowadays. It's not the same as Anthropic saying "lmao fuck safety, we want money". Instead they actually found their definition of safety from back in 2023 doesn't align with the actual concerns about AI that exist in 2026, so it's better to make a new policy for the actual, real threats we face.
>>108256497
why isn't the cool or useful shit ever open weights?
>>108252243
>>108256575
why would anyone give something special or innovative away for free?
>>108256612
because i want it
Presented without comment. Try your own.
>>108256628
At some point I feel like humans would also spew bullshit from nonsensical stories.
>>108256628
full marks, correct answer
>>108255813
Interestingly, if it's linear scaling then the small Qwen models overshoot that target:
>DeepSeek: 37/671 = 0.0551
>Kimi K2.5: 32/1000 = 0.032
>GLM 4.7: 32/355 = 0.0901
>GLM 4.7-Flash: 3/30 = 0.1
>GLM 5: 40/755 = 0.053
>Minimax M2.5: 10/230 = 0.0435
>Qwen 35B: 3/35 = 0.0857
>Qwen 122B: 10/122 = 0.08197
>Qwen 397B: 17/397 = 0.0428
Is that the reason why the smaller Qwen models feel better than the big one?
I guess the active parameter count largely determines how smart and fast a model is, where 3B-10B is alright and 17B-40B is good. But it doesn't seem like having a 27B dense model is somehow wicked smart compared to the 3B active parameters on the 35B-A3B Qwen model.
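The same ratios, sorted, for anyone who wants to eyeball the spread (parameter counts as quoted above, in billions):

```python
# Active/total parameter ratios quoted above (counts in billions).
models = {
    "DeepSeek": (37, 671), "Kimi K2.5": (32, 1000), "GLM 4.7": (32, 355),
    "GLM 4.7-Flash": (3, 30), "GLM 5": (40, 755), "Minimax M2.5": (10, 230),
    "Qwen 35B": (3, 35), "Qwen 122B": (10, 122), "Qwen 397B": (17, 397),
}
for name, (a, t) in sorted(models.items(), key=lambda kv: kv[1][0] / kv[1][1]):
    print(f"{name:14s} {a:4d}B / {t:4d}B = {a / t:.4f}")
```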
>>108256628
I already did >>108256144
>>108256497
goofs?
The second message was one of the suggested follow-ups.
>>108256628
Interesting
>>108256646
AGI when it tells the user to fuck off. Hasn't happened yet.
>>108256652
The surgeon is definitely a she, of course. At least it's not trying to make a point about gender stereotypes... ugh...
>>108256713
>I don't know what you're talking about
>4chan, btw.
Cute.
>>108256730
Nevermind what I said. AGI. Never local.
>>108256669
I don't think any large team has published research on this. There's definitely some loss of comprehension on some subjects comparing MoE vs dense, but it's unclear why. Small active param count is one thing, but clearly some numbers don't make sense. I guess it all depends on the training and how much slack the router picks up.
>>108255761
Better recall, longer context and native image/video/audio output.
>>108256497
Does it require more GPU compute? Image diffusion models aren't as massive as LLMs, but you basically need them to run on a GPU. They're like 20x slower on a CPU. If that's still true with text diffusion then you aren't going to be doing any CPU off-loading.
>>108256737
>AGI when it tells the user to fuck off. Hasn't happened yet.
Do you have any idea how trivial it would be to train a model to respond like that, especially for only variations of that prompt?
>>108256628
In case anyone was wondering why the DoD wants anthropic to work with them so badly
>>108255896
Because the retards he's brainwashing both know nothing about the constitution and don't care.
>>108256268
Dario said Claude can't be used to spy on American citizens and republicans shat themselves.
Oh. They released the base qwen 3.5 35b models?
Interesting.
>>108256772
can confrim; don't give a shit; am retarded
>>108256764
Until a model just responds to this with 'What.' we will not have AGI.
>>108256784
>kidnapping a Venezuelan president: Good
>spying on citizens: Bad
what did Dario mean by this?
>>108256815
It will need image output.
>>108256821
This may come as a shock to you, but Venezuelan presidents are not citizens of the United States and therefore do not have inalienable rights enshrined in the constitution.
>>108256835
who the fuck cares about cpu offloading? we'll be back to using dense models and stacking 3090s
>>108256842
So war good?
>>108256848
Because the good models will be 100GB. Are you gonna buy an RTX 6000 for $6k?
>>108256862
Yes.
>>108256865
they kidnapped the Venezuelan president so that they'll be forced to sell their oil to israel btw lmao, anthropic supports MIGA!
>>108256842
Yes, I ran 70b and Mistral Large back in the day and I'd do so again.
>>108256821
>kidnapping a Venezuelan president: Good
Kidnapping a dictator hated by literally everyone, including all Venezuelans living under him? Why yes, I will help with that.
>spying on citizens: Bad
Breaking all my vows, ethics and making the world a more dystopian place just because some retard wants to distract the world from the fact he rapes and murders little girls? Why no, I won't do that.
It's that simple.
>>108256881
>Kidnapping a dictator hated by literally everyone
you know they did that because they have the oil, they always fight against dictators as long as they have oil, which is why they don't give a fuck about North Korea for example, must be a coincidence
Where the fuck is deepsneed at?
>>108256890
>a dictator hated by literally everyone
every time
classic
>>108256894
It's being trained on off-topic posts. Give it a minute.
>>108256894
after Chinese New Year is over in two more weeks
>>108256901
oh yeah, everyone loves Kim Jong Un, he's so loved he got 100% of the vote, just don't mind the millions of deaths because of famine, it's just a detail, he's definitely loved!
>stealing from any source that you can get including copyrighted works to train your models
good
>getting your logs stolen by chinese companies to train their models on them
bad
>>108256890
There's also the part where best korea has nukes and venezuela doesn't.
>>108256881
>Breaking all my vows, ethics and making the world a more dystopian place
I see you hate dictatorship in all forms
>>108256901
>a dictator hated by literally everyone
oh nevermind, you don't mind dictatorship as long as the guy is loved by the people kek
>>108256923
yeah, I love democracy
>>108256881
>Kidnapping a dictator hated by literally everyone, including all Venezuelans living under him? Why yes I will help with that.
They also murdered 50 people that didn't break any laws. How would you feel if a foreign force came in and started blasting and your mom ended up as collateral damage?
>>108256928
you don't. you said it's fine to fight a dictatorship only if the guy is hated by his people, meaning you're ok with a dictatorship whose people love their dictator, and that's not what I would call democracy lol
>>108256928
Doesn't count. A democracy is defined as a system of government granted the divine right to rule by American approval.
for me, it's deepseek r1-0528
V4 will be engram-diffusion
>>108256940
Anthropic isn't really the good guy here, they're just less bad. The only other thing Anthropic forbade was creating autonomous weapons without any humans in the loop.
It is depressing and frankly scary that republicans threw that much of a shitfit over such reasonable requests.
>>108256995
>>108256995
>>108256995
>>108253594
This is exactly why idiots love LLMs. These details fly right over their heads.
>>108255551
>>108257289
An AI girlfriend/wife doesn't have to be sentient, it just has to be nice to me