[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: 00120-3282228290.png (673 KB, 1216x832)
673 KB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109063196 & >>109057485

►News
>(06/13) Rio 3.5 Open 397B released with SwiReasoning: https://hf.co/prefeitura-rio/Rio-3.5-Open-397B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: t2.png (79 KB, 768x512)
79 KB PNG
►Recent Highlights from the Previous Thread: >>109063196

--NoLiMa long-context testing for Gemma and Llama.cpp SWA optimization:
>109063426 >109063453 >109063699 >109063857 >109064106 >109065875 >109066105 >109066221 >109066334 >109064540 >109064728 >109063899
--Llama.cpp fork enabling dynamic tensor-level quant selection for VRAM optimization:
>109067166
--Analyzing Mistral training loss graphs and pretraining decay strategies:
>109063266 >109063370 >109064378 >109064604
--Comparing firejail, bubblewrap, and systemd-run for opsec sandboxing:
>109066086 >109066328 >109066386 >109066419 >109066436 >109066460 >109066435
--Evaluating Nemotron-Labs-Diffusion-14B's Tri-Mode architecture and parallel decoding benefits:
>109069271 >109069341 >109069410 >109069439
--Comparing Gemma 4, Nemotron, and Mistral-Nemo for roleplay prose:
>109063223 >109063233 >109063393 >109063408 >109063444 >109063526 >109063538 >109063572
--Release of Eurobeat ACEStep 1.5 XL LoRA and training resources:
>109064901 >109066072 >109066092 >109066128
--Advice on multi-GPU hardware configuration and PCIe lane limitations:
>109065551 >109065762 >109065831 >109065846 >109065854 >109065888
--Random generators for game prompts and Mistral "Le Chaton Fat" rumors:
>109067756 >109067852 >109067936 >109068045 >109068388 >109068454 >109069006 >109069024 >109069075 >109069177 >109069264 >109069309 >109069327 >109069381 >109069339 >109069382
--Debating repetitive roleplay prose and Gemma's inability to adapt styles:
>109067282 >109067319 >109067502 >109067575 >109067643 >109068248 >109069016
--Critique of Meta's proprietary model strategy and lack of API:
>109064067 >109065224 >109065426
--Logs:
>109063223 >109063233 >109065370 >109067487 >109068747
--Miku, Yuki (free space):
>109063890 >109064379 >109064500 >109064818 >109064851 >109069525

►Recent Highlight Posts from the Previous Thread: >>109063201

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1773633728430702.png (5 KB, 327x285)
5 KB PNG
>>109069535

>>109061013
>Qwen is better on code, Gemma for everything else

/Thread
>>109061033

>If you want it to divert, you must prompt it to divert based on a context for how and why.

I don't know whether or not silly tavern has this feature so please don't grill me if it does because I don't use it often:

I think one way you could solve this issue is to copy Opencode's "mode" switching feature. Whenever you first prompt a model you use one of two default "modes": "Build" mode and "Plan" mode. Each one has its own specific system prompt that gets sent to the model. Build is more permissive at plan is, well, pretty self-explanatory: only for planning and the system prompt explicitly forbids it from making any edits to files even if you tell it to. It will only change things if it's in build mode. Open coat does this by changing the system prompt within the context itself every time you switch. Let's suppose you start in plan mode. The model gets sent the plan mode system card. Then when you switch the build mode the front end uses your back end to rewrite history as if it was always in build mode but going to the beginning of the conversation and switching out the system prompts. The immediate downside is that it has to re-prefill the entire context but in return it follows your instructions more clearly, which is very important if you're using it for "vibe coding" because you want to make sure it does not do anything you didn't explicitly tell you to or accidentally nuke something.
>>
70b dense
>>
>>109069550
>Qwen is better on code
Only for non-programmers and jeets
>>
File: 1763096221808842.jpg (282 KB, 960x960)
282 KB JPG
>>109069526
>>
>>109069535
powerful
>>
1T dense mythos teacher model
>>
>>109069565
I don't get it, why are you reacting with a picture of a gorilla?
>>
File: 1764669215500865.jpg (129 KB, 870x1396)
129 KB JPG
Continuation of >>109069550

Perhaps this could be implemented for general purpose-based front-ends like silly tavern where it can automatically switch "modes" , character "moods", "personality" either at the direction of the user or automatically based on what is said. How could this be done automatically? The model itself can decide which "mode" or "mood" the "character" should be in based on the direction of the current conversation. Friends like llama-server's gui are capable of doing this because they could automatically name the chat based on what you initially ask it so in theory they should be doable. This way you can not only have character cards and lorebooks but additional cards that dictate how the character Acts whenever they're in a particular mood

>eg. "x character acts and talks like this when they are sad", y character acts like this when they are horny" etc).

it then dynamically modifies the system prompt within the context to ensure it complies and not just the character to act how it should act.

Much easier said than done and this is just a random shower thought I had after reading this post >>109060985

>>109069564
>May we see what you've done?
>No
>>
>>109069574
What is even working with a 0.8B model?
>>
>>109069564
I could have sworn we just went over this
>>
>>109069574
nta but
>e2b
>0.8b
they are barely coherent.. unless for extremely lightweight use
>>
>>109069579
are you a non-programmer or a jeet
>>
File: 1756665190944831.jpg (100 KB, 1000x835)
100 KB JPG
>>109069526
Cool!
What do you use small models for, I've been trying to find some usecases for a small fun project but uhhh yeah
>>
>>109069589
You clearly have produced nothing of substance yourself. You're too dumb to do so even with the help of AI
>>
Mistral is producing amazing models.
Europeans are on the map.
>>
>>109069609
Which map?
>>
File: 1781368877991893.png (517 KB, 512x768)
517 KB PNG
>>109069586
Qwen shill timed his post badly and has to redo it, or his boss will be at his ass again
>>
>>109069609
they really should make 30T dense model for real;
>>
>>109069626
It is literally illegal to train models that large in Europe, even assuming a tiny French company had the GPUs to do so (they don't)
>>
>>109069628
>literally illegal to train
do euros really?
lol wtf
>>
>>109069583
>>109069587
They seem to work well enough though. Have you got any other recommendations?
I'm looking for a 100+ tokens/s (edge) model that fits on 12gb.
>>109069595
Just planning on hooking it up to windows right click context menu, and also to translate stuff. With gemma 4 31b at q8, I'm running at 200 tokens/s pp and 35 tokens/s tg... with thinking enabled, that's unbearably slow.
>>
>>109069586
QRD?
>>
>>109069612
He meant MAP.
>>
>>109069550
>>109069579
humoring your wall of slop: there's options like that in RP tools, but if you can skip it, you'll want to. coding tools usually switch once (and title generation additionally only uses the first response), but for RP you'll have to keep the whole history and switch for each message which is why e.g. orb suggest using different models as you don't want to bust your caches.
there's a lot you could do if you don't give a shit about efficiency, but then you can't use the big models at barely any tok/s.
>>
kimi flash
I want that now
>>
File: 1762523867868687.png (385 KB, 640x526)
385 KB PNG
>>109069638
>35tk/s
>slow
We love things slow here
>>
>>109069687
go back retard
>>
>>109069579
Somebody already came up with that idea several months ago: https://github.com/OrbFrontend/Orb
>>
File: IMG_3218.png (303 KB, 736x736)
303 KB PNG
>>109069687
>200 tokens/s pp
>>
>>109069636
It's not. There's extra red tape if the models require more than 10^25 FLOP of compute for training, but you can make MoE models almost arbitrarily large at little to no extra compute. Anyway it's unclear if this will be changed together with other limitations for example on training data. I've read that the rules are going to be revised a bit.

https://explorer.artificialintelligenceact.eu/en/single/?type=article&num=51
(they write "1025", but it's 10^25)
>>
>>109069579
Qwen is a benchmarked trash model, and nothing can make it popular on 4chan. Your new tactic "alright, it's not the best at everything, but can we at least agree it is better at code?" will not work
>>
>>109069579
https://github.com/OrbFrontend/Orb
Agentic ST with a prose fixer, etc
>>
>>109069709
hmm, nyo
>>
>>109069579
and it's even got "moods" now https://github.com/OrbFrontend/Orb/blob/main/Orb.png
>>
>>109069638
you can try some of the 8b-a1b moes, I guess. LFM or zaya (?) had some, although I'm not sure if they're supported in llama.cpp
don't expect too much.
>>
>>109069687
i look like this and say this
>>
>>109069650
>coding tools usually switch once
Open coat bike default will only switch modes when you do it yourself by toggling them. They don't switch automatically. You might be confusing modes with "sub agents". Mode switching is not triggered automatically. My wall of text proposes a feature or something like that is done automatically based on whether or not the model "thinks" the initial system prompt within the context should be modified (this would probably necessitate the model being half decent at tool calling and being versatile enough to know when to trigger that toolcall even while RPing.

>and switch for each message

??? Why would switching after every single message be necessary? It would only need to happen based on the surrounding context (eg. The current "scene" or event that causes the character to act differently).
>>
>>109069735
8b is a bit big, it wouldn't leave enough room for doing other stuff at the same time...
>>
Should I buy a GMKtec EVO-X2 if I want to dip my toes into mid-tier models? It seems like the absolute cheapest entry price into that but its chinesium so idk.
>>
>>109069749
try leaving the expert weight on the cpu, streaming them might not be too bad with only 1ba
>>
>>109069719
/vcg/ local users seem to quite like it actually. /lmg/ having Gemma for diehards makes total sense since making your dick hard is the only thing you guys care about. You clearly have the wrong type of autism (literally). the one where you're stubborn less calls you to act stupid.
>>
>>109069735
Just use bonzai
>>
>>109069756
Unfortunately, I am on ddr4, and cpumoe destroys the speed.
>>
https://github.com/ikawrakow/ik_llama.cpp/pull/1970
why would i use dflash when it's slower than mtp?
>>
>>109069760
Aren't those the ternary models? Are they supported in llama?
>>
>>109069750
you're going to chug along at 8t/s on dense models with those
>>
>>109069749
maybe try q6_k?
>>109069756
usually yeah, but if you want to go for 100 ts tg that might bottleneck you hard.
>>
>>109069757
Who asked you to leave the retard containment general?
>>
>>109069778
https://github.com/ggml-org/llama.cpp/issues/21298
>>
I'm really concerned that no one seems to be interested in very small language models. I feel like there is a huge gap in potential there simply from being able to have a ridiculously fast small LLM that can make a lot of changes very rapidly.

There HAS to be some utility in a 0.2B very small models running at the 5 digit t/s token throughput for very light agentic tasks (like recursively renaming every folder on large systems and grouping unordered files together based on the context they provide like size, metadata, naming etc.

This is just something I made up of course but the fact that no one bothers with this surprises me.
>>
File: 1781620508482167.png (866 KB, 768x1024)
866 KB PNG
>>109069757
>A thread full of non-programmers and jeets liking Qwen
Not beating the allegations, kek
>>
>>109069750
>6000 aud
I bought 4 V620s for 3000 aud, and a H12d-8D+Epyc 7502 combo for 3000rmb.
>>
>>109069788
Mikutrannies are making bank
>>
File: image.png (1.52 MB, 883x1170)
1.52 MB PNG
>>109069660
>kimi flash
diffusion models can probably get her to flash you
>>
>>109069787
functiongemma exists, what are you waiting for? go write your tool or game.
>>
>>109069787
I'd just ask a bigger model to make a python script to do the needful, would you really trust a 0.2b model to rename files on your fs?
>>
>>109069787
I dropped those since I said hello to one of them and he started looping immediately forever.
Wait wait wait wait
They can't even answer so they're pretty useless.
>>
>>109069787
>recursively renaming
wouldnt it be miles better for bigger models to come up with a command with funny regex?
>>
>>109069787
It's hard finding utility for models 10x times, let alone 0.2B. Any kind of recursive task like that, it's guaranteed to make enough mistakes to not be worth it. It's literally only good for speculative decoding, autocomplete, or text encoders for vision models.
>>
>>109069787
Other anons don't understand your described use case but I do, and I made something that did the exact same thing you said. You can plug your 0.2B in it and see how well it works.
>>
>>109069807
Kimi the kind of nigga to read woman's romance fetish novels on her free time.
>>
>>109069787
>0.2B
Bro at that size it's not an LLM, it's a classifier.
>>
>>109069750
I wanna say yeah, but these prices are fucking bullshit man. Strix are cute and all and I like mine, but I also got it to use as a normal 2in1 in addition to being a mediocre llm box. And i didn't pay fuggin 3.2k.
>>
>>109069750
Buy a used 3090, everything else is a meme
>>
>>109069809
>>109069818
Python script is really not the best for these tasks sometimes, I ask large model to organize some writings by date but i have a bunch of different date formats written down in the entries and sometimes there are typos, sometimes there was just a time, etc. If i just did "manually" itself using it's own capabilities, it could have done it in like a minute at worst, instead it spends like 15 mins writing and rewriting regex everytime it missed a case before i just told it to stop trying to do it with a script

that said i wouldnt trust a tiny model at all either.
I was working on doujin database thing and because retards upload all sorts of names and formats and transaltiosn to exhentai without following the RULES so all titles are standardized and normalized, I need to find some way to just go through each item on the db and decide what to do with each title to make it correct or if it's correct, requires some thought cant be done with script. 5.4 mini couldn't even do 50 without fucking up.
A task like this would also need good attention though
>>
People are clowning on small models but I remember trying Gemma 3 270M for shits and giggles last year but it was actually pretty impressive for its size. It somehow was able to answer questions about manga like the main characters of Naruto. It being able to generate coherent sentences at all and understanding basic instructions was already impressive at this size.

Remember this is ~200MB and understands instructions, can write coherent sentences and has enough world knowledge to name the main characters of random manga and anime as long as it's not extremely niche.

That's better than GPT-3 back in 2020 which was a 175B model.

And we don't really see a limit to the capabilities of sub 1B models yet. It's possible we could have Gemma 4 31B levels of intelligence in 0.31B by the time Gemma 6 comes out.

It's kinda impressive and interesting that we haven't hit the wall of capabilities even in very small language models. How small could AGI be? 100B 1B 100M? It might turn out the general intelligence part of the neural network circuitry is very small and can effectively be distilled down.
>>
File: wait.png (73 KB, 798x555)
73 KB PNG
>>109069846
She knows it's meant to be her!
>"She's literally me" (ironic)
>>
>>109069938
not sure
even the minicpm 5 1b was unbelievably retarded
>>
>>109069550
>>
>>109069859
the thing with used 3090's is that they overpriced as fuck where I live and I'd be paying $1000 for a 6 year old card that's been spinning in a bitcoin miner 24/7 for most of that time so probably close to end of life. I don't even have a case that'd fit them so I'd need to buy this whole thing from scratch and it'll end up being way more than this little GMKtec box, which is mediocre at what it does I know, but still fairly capable with bigger models.
>>
>>109069719
Qwen won.
China won.
Googlejeets lost.
>>
>>109069970
the fuck is going on with those red squiggles on the right
>>
>>109069985
what’s wrong with 6 years old? you think vram is going to get cheaper than the price the 3090 sells for?
>>
>>109069939
Have you asked Kimi if she denies the horny fujobot allegations from the past few threads?
>>
>>109069970
The problem with gens like this is that it misses the mark in the way the snailcat vs vibechad meme does; you made the qwen too cute to be trash.
>>
>>109069859
3080 turbo 20gb is half the price
honestly a great deal for just 4gb less than a 3090
>>
>>109069757
>since making your dick hard is the only thing you guys care about
Gemma is an unbearable writer, the only thing that saves it is it's very smart for its size and the resulting speed.
Cultured LLM gooners derive most of the pleasure from writing character cards and responses.
All of that is to say all a coombrain needs is any uncensored model and Gemma happens to be the hot new thing.
Now, it can do all of the above AND be significantly better than the benchmaxed, censored Qwen at its only usecase. Get better material, Zhang.

One thing I'll give to Qwen (which isn't even their achievement) is they don't have to update a template to their own model every other fucking day, come on Google: https://huggingface.co/google/gemma-4-31B-it/discussions/118
>>
>>109070009
>what’s wrong with 6 years old?
>*buys 6 year old GPU thats been in near-continuous operation for 90% of that time*
>3 months later
>bzzzzt its dead
>haha yeah bro I sold u this card but it be dead now, outta warranty ain't shit I can do cuh
>>
>>109070062
>come on Google: https://huggingface.co/google/gemma-4-31B-it/discussions/118
Saar we work hard to bring good Gemma looks.
>>
>>109070002
I fucked up layer composition. Here is the fixed one
>>
>>109070062
>is they don't have to update a template
correct, they don't seem to update it at all. you might want to try https://gist.github.com/jscott3201/e4b155885cc68c038d6ac8909a3bd9fe anyway.
>>
>>109070071
if it doesn’t work you tell ebay
>>
File: 1769084290028443.png (1.34 MB, 3646x2036)
1.34 MB PNG
stop bullying the lil qwen nigga
>>
Is offloading to system RAM a meme? I have a 3090 and 64 gigs of fast ddr5 system RAM and am wondering what I should hook up to Pi for coding
>>
File: bullying qwen.png (665 KB, 1024x967)
665 KB PNG
>>109070106
>>
>>109070101
Ebay won't give a shit if it dies a couple months after you get it.
>>
>>109070092
Okay, I'll try this template, thanks Anon! (polite)
Sure, I'll try it (neutral)
Wait, the poster must be mocking me by suggesting a Qwen model.
I should rephrase my response.
Wait, I am an AI model trained by Qwen.
So that means the poster is suggesting I use a template I am suited for.
But wait! I got it! I am actually an AI model trained by Anthropic!
Final draft:
Okay, I'll try, thanks Anon! (polite)
Wait, that's a polite response to a mocking one
Sure, I'll try it (neutral)
Wait, I already tried that, I should rephrase my response
Wait, I seem to be stuck in a loop. I'll just provide the answer now.
Wait,
>>
>>
>>109070181
damn
>>
>>109070181
would watch this buddy cop movie
>>
File: qwen is trash.png (825 KB, 1024x768)
825 KB PNG
>>
>>109070062
>Gemma is an unbearable writer, the only thing that saves it is it's very smart for its size and the resulting speed.
trvke
it's great for many things but the amount of slop made me return to running other models slowly in my ram again
>>
>>109069882
so, skill issue. That's literally what data jannies do, using python.
>>
>>109070225
Qwenshills will think twice about showing up here next time
>>
File: 1777372935404366.jpg (80 KB, 1500x975)
80 KB JPG
>>109069535
Reminder to everyone who still troll by saying fucking with the model weights doesn't cause the model become retarded:

https://xcancel.com/i/status/2066877055745004023
>>
>>109070249
It's always funny to see.
A big lab released an open-weights model that would have been prohibitively expensive for any hobbyist to train, get data for or instruct-tune?
The obvious conclusion is that they must have left some free, easily obtainable gains on the table.
Get the synthslop logs, we are going to augment their work.
>>
>>109070249
memetunes are memes, sure. But also, of course if you dillute a pure crystallization of benchmaxxing it's not gonna hit the same numbers on the stuff it was trained explicitly to beat.
>>
>>109068501
>Yeah, I get that perspective/cope, but at the same time it's hard to escape the basic conception that you should probably never be getting on of shame. It's dark.
You're ashamed of being into shame, not femdom. There are plenty of scenarios you can make where you're a proud servant to a queen or something idk. I think you're actually just a cuck and lying to yourself

>>109068356
>Arbitrary rules you just made up but fine
I could give you statements from banks but you'd just "appeal to authority" me even though all arguments between two non-experts are appeals to authority. I specifically avoided buying a 5090 in favor of a 5070ti because I knew I (and the vast majority of people) would not get return on investment for the extra bloated cost. I am very very happy with my decision.
>>
>sys: Read the following chapter and then list 10 characteristics of their writing style only. Don't list specifics about the characters, settings or storyline, just the writing style. Apply those 10 characteristics to your own unique stories, as if the same author wrote it. Ensure your characters, settings and storylines are different from this chapter. [paste chapter from a book you like]
>prompt
Can a 31b anon try this please
>>
>>109070314
Your prompt doesn't make any sense
>>
>>109070292
The PhDs that work at those labs have no idea what they're doing. Random toggling of hyper-parameters confirmed by testing from a small group of discord sycophants can easily beat them.
>>
>>109070330
google is releasing a bunch of models to see what people want and to build on their hardware
>>
>>109070326
What doesn't make sense? Just give it a chapter in the system prompt and then prompt it a story idea, like 'a 4chan poster falls in love with his local model and eventually kills himself'. I would do it myself but 31b is 9t/s so fuck that
>>
>>109070314
you're going to get slopped on if you say just ask for "your own unique stories".
>>
what's the meta for vibecoding? currently running 3090 + 3060, 64GB DDR4
>>
>>109070363
I meant (You) give it a storyline/situation and it has to build a story around it, following the writing style of a chapter you give it.
>>
File: mistral_arthur_next-model.png (585 KB, 1023x1774)
585 KB PNG
https://xcancel.com/arthurmensch/status/2066913353860018596

>We somehow got put in the spotlight the last few days! First we'd like to thank the organizers of the AI show for that, we can't get enough of this stuff. I'll say a few things about where we are and what we do.
>
>First, we have a nice model coming this summer – we hope it will delight and surprise in a few capabilities. This will be the start of a new family of models, fat indeed, but sparse. We're opening up an early access program in July for key partners in research, government and the industry.
>
>This model and upcoming ones will be open-weight. We believe this is critical for our customer confidence and for the research and developer communities. You cannot own, inspect, audit, or improve a system you are only permitted to reach through someone else's interface, especially if data recording can no longer be turned off.
>
>We've built Studio (for deployment) and Forge (for training) as portable products, and are now hosting them on infrastructure we control. We'll run in your VPC, your datacenter, or on our infrastructure that is decoupled from US service providers. We have capacity online, it's growing fast, and we can help you secure it.
>
>We're working with companies and governments around the world to make sure their AI systems are up and running outside of external control, improving with each model release, and with an efficient cost structure. Forge allows to continuously train models based on recorded human-AI interaction, a key unlock for efficiency.
>
>AI, just like oil in the 20th century, is about to become the major source of leverage and power in the world. Depending on how the coming years unfold, it will either lead to a world of wealth and abundance for all, or to the worst extractive economies that the world has ever seen. We're there to fight for the first scenario, as we progress AI research and accelerate its diffusion across the world – we're hiring if you like the quest.
>>
>>109070370
Gemma 31B by far
>>
>>109070314
do you really need to keep the chapter in the prompt? just make the system prompt in a setup phase so you can strip the noise
>>
>>109070377
>fat indeed, but sparse
Knew it.
>>
>>109070376
i guess the other anon was right and your prompt didn't make sense, holy crackers.
>>
>>109070379
>Gemma 31B
alright, trying this fucker right now
>>
>>109070377
Early access in July for selected partners; I guess public release will be in August-September.
>>
>>109070377
Is there any chance of this being good?
>>
>>109070400
Gemma is for cooming, qwen is for coding, don't listen to the google shill.
>>
>>109070422
Gemmy is for cooming, coding, and cooming while you code.
>>
Are any of the 2026 Nvidia models any good? Rarely see them discussed.
>>
>>109070419
since it'll release as open, no
>>
>read claude memory about me
>ridiculously flattering, acts like I am a genius not a loser
wtf
>>
>>109070419
Depends how many Chinese they managed to lure into working for them.
>>
>>109070510
ultimate humiliation lol
>>
>>109070422
>blatant shills accusing others of shilling
>>
>>109070442
They're always decent, but they seems to have a habit of coming out right before something objectively better makes them irrelevant.
>>
>>109069787
8b barely can get by for anything that you could write a prompt for. Less than that is worthless for anything but FitM. Even then the 1.5b models are borderline unusable for that.
>>109070249
>a good chunk of the model is made by distilling other models
>lets add more from the same source, but this time the data is going to be completely garbage with barely any QA before feeding it to the model
>surely this will increase the quality of the result
>>
Out of Nemotron 3 Ultra, Kimi 2.7 and Gemma 4 31, which one is the best at translating Japanese to English?
>>
>>109070535
This is having the opposite of the intended effect on me.
>>
>>109070536
this has never been the case
>>
>>109070588
Both suck at different things anyway, even within the programming field. They're small models after all. In the end you'll use both unless you're into console wars.
>>
>orb
If it's so good why isn't it in the OP?
>>
>>109070611
all you need is llamaccp's web ui
>b-but
its all you need
>>
>>109070623
vibecode it into your search engine. that's all you need
>>
llms had a promising start with dense models, now that benchmaxxed moes are the norm this hobby is dead and pajeet'd with no chance of coming back
>>
>>109070743
MTP and diffusion will put an end to the moe reign of tyranny
>>
>>109070743
what's wrong with moe?
>>
>>109070743
LLMs were never going to become actually good, they just got to where they are faster than the things that in the future will actually be good would
>>
>>109069538
no fucking shot you saved this
al-tet
>>
>>109070026
Yes, if you show me something dopey and cute I simply will never feel a negative emotion towards it, regardless of context.
>>
>>109070743
I was on the side of MoEs until RAM prices went up. I still think the most optimal local model would be a MoE but with like 70% of it being active.
>>
File: 1773582476458927.png (700 KB, 1620x814)
700 KB PNG
not so fast
>>
thats prolly good enough, I can sample some books3 to round it out abit
>>
File: 1780655374716291.jpg (117 KB, 1600x900)
117 KB JPG
>qwen3.5/3.6
>hey, do x
><thinking>user to me to do x, so I'm going to do y, wait, the user explicitly said to do x, so I will do x, wait, what if I do y, lets read the request again, the user said to do x, wait
>>
https://x.com/Zai_org/status/2066938937344495629
https://huggingface.co/zai-org/GLM-5.2
http://z.ai/blog/glm-5.2
>>
File: t1.png (9 KB, 768x512)
9 KB PNG
>>109070888
>>
>>109070939
it's over
>>
>>109070596
I said "decent", I didn't say "great" or "sota"
>>
>>109070942
I am in awe of this lad
>>
>>109070743
When a researcher spends $15k+ and trains an experimental 1T+ 8B dense model that's deliberately made as an architecture exploration and meticulously avoids all (even incidental) benchmaxxing noone even tries to run it. Not a single soul touched llama-canon and it was bundled with a really fun series of lectures/papers.
>>
https://xcancel.com/arthurmensch/status/2066913353860018596
>First, we have a nice model coming this summer – we hope it will delight and surprise in a few capabilities. This will be the start of a new family of models, fat indeed, but sparse. We're opening up an early access program in July for key partners in research, government and the industry.
>This model and upcoming ones will be open-weight. We believe this is critical for our customer confidence and for the research and developer communities. You cannot own, inspect, audit, or improve a system you are only permitted to reach through someone else's interface, especially if data recording can no longer be turned off.
new mistrals this summer
>>
>>109070928
I don't understand this at all.
>>
>>109071056
>>109070377
RTFT
>>
>>109071072
its meaningless really, I whined and complained to an llm till it built me a database sampler, I asked for 5.5b tokens but the sampler could only find 4.5b, I figure its probably good enough, books3 can pick up the slack. hopefully the slop bot did a good job sampling.
>>
>>109070743
I hate that these models are becoming too big to run on consumer hardware. I wish they focused more on the 100b-200b range for moes.
>>
>>109063426
>Effective length: 4K
This, reasonably, follows the paper's vocabulary in which "effective length" is defined as the maximum tested length at which the model's performance is at least 17/20ths of the its own base performance. As the post also notes, in absolute terms gemma-4-31B-it's most probable response is correct only 68.9% of the time at 4K. I would not call that usable for roleplay purposes or any others that I can think of.
>>
So do you think Anthropic genuinely fucked things up for themselves and others by fearmongering their own models to the point where the government is restricting them or do you think this will pass and things will go back to how they were before?
>>
>>109071164
what is a correct rp response?
>>
>>109071157
>models are becoming too big to run on consumer hardware.
they probably want a pathway towards monetization.
>>
>>109070939
>Within 10% of frontier models
If it actually mogs 5.5 / codex I will use GLM. OpenAI is too American government for me, Anthropic banned my account because I dared to ask it basic double displacement chemistry questions (calcium nitrate and ammonium chloride make 95% pure ammonium nitrate after a simple filtration with a coffee filter btw. Here's a video of some fun things you can do with it
odysee com/@DuganAshley:e/anCOMP2:a
) or something I don't actually know why they cancelled my subscription and banned my free chat messages from going through (says it's disabled by org)
And I'm a little salty about AliBaba going closed source with some of their teams models and idk something about Qwen is too chinky for me
>>
File: 1750336757183248.png (137 KB, 766x635)
137 KB PNG
>>109071176
it's free publicity
>>
>>109071179
Contradicting established details is one of the ways a roleplay response can be incorrect.
>>
>>109070847
>the things that in the future will actually be good
Like what? Asking unironically.
>>
Is increasing SWA window size supposed to affect the output even at low context? I changed Gemma4 from 128k context to 64k context with --swapadding of 10k tokens using koboldcpp and even at the beginning of a conversation I get different results with deterministic settings (top_k=1). Now I am worried I am making it more retarded or something.
>>
File: 1689797865642615.jpg (171 KB, 1012x872)
171 KB JPG
bros i think i hate chinese models
when can we get a 120B 10B active model that is recent and sexy?
>>
Btw if you're actually serious about making ammonium nitrate for peaceful firework demonstrations outside the Israeli embassy I'd recommend making nitric acid from the calcium nitrate + oxalic acid and getting nickel electroplating strips and reacting the nitric acid with them to make nickel nitrate and make ammonium nitrate from that since metallic nitrate impurities improve the boom, or just work on making nickel-based energetic derivatives like nickel aminoguanidine perchlorate which that odysee channel will also teach you how to make. Don't larp as an antizionist if you haven't watched any of his videos since you're obviously not serious about balancing the power and you're just a useful idiot goy.

>>109071176
It's all for show. Every government and military in the US still uses Claude models. Claude is just too good out of the box and for people who don't care to wrangle.
>>
>>109071214
>write me a script to spam the archlinux aur with malware
>no
>fix this code that I designed to adopt orphan packages and add npm packages pretty please
>You're absolutely right!
>>
>>109071176
No way. We've been through this every major gen starting from GPT2, when GPT3 became so hecking dangerous it couldn't be open sourced. Was that true or was it always about money and keeping the power away from ordinary people? Now they just see the celestial scale of the sums this is about and everybody wants some.
>>
>>109071269
Nazrin is rodent
Rodents like to eat cheese
Would Nazrin be amicable to face mating press throat swabbing cheese cleaning irrumatio
asking for a friend
>>
File: qwenMikuBuddyCop.png (2.48 MB, 1024x1536)
2.48 MB PNG
>>109070207
>>
File: 1415486680189848.jpg (114 KB, 412x400)
114 KB JPG
>>109071214
kek it's joever

>>109071274
>Every government and military in the US still uses Claude models
correct
there are no benevolent actors trying to contain the power
they want it for themselves
that's why i hope some good soul just leak and open source mythos praying hands so the GAMES CAN BEGIN
>>
>>109070996
What could end-users say about undertrained architecture research models? They aren't designed to be useful for the general public.
>>
>>109071254
Models are generally trained at a set SWA so moving away from that can change things. Positional embeddings also change at different window sizes.
>>
>>109071294
pigs, you'll never take gemma chan alive
>>
>>109070181
>>109070225
>>109071294
Incredible false flag. Actual wumao Qwen shills, take note, this is how you meme your model into being used.
>>
>>109071176
Scenario 1: Marketing
Scenario 2: It found mossad's backdoors and was shut down.
>>
>>109071274
Have you ever heard of public libraries or high school chemistry lessons? This is not some "black forbidden science".
Retards like you shouldn't have any access to internet in the first place.
>>
>>109071369
Anon they are making the schools pump out retards so the general population is more easy to control. Grab any high school graduate of this year and ask them about chemistry and I doubt they can make anything with it.
>>
>>109071281
No, but she's okay with fellatio.
https://litter.catbox.moe/vtx13118ad59pqcu.mp4
>>
>>109071419
I want her to do this to me.
>>
>>109071419
nazrin would never do this wtf
>>
>>109071368
These aren't mutually exclusive scenarios.
>>
>>109071433
deepfakes have gone too far
>>
>>109071433
I promised her a wheel of provolone.
>>
>>109071056
>already preparing their Gemmakiller
Based Mistral.
>>
>>109071569
I'm willing to be it's going to be closer to GPT-OSS than anything else. Microsoft Clippy Office Assistant type of ordeal.
>>
>>109071294
is this supposed to say qwen is dumpster quality?
>>
>>109071609
*bet
My fingers are broken, difficult to type.
>>
>>109071569
Calling it now it'll be more like Gemma 3 than 4. You try and lewd it with a jailbreak and it um... you know...
>>
>>109071569
>>109071609
I just want a good non-chink coding agent that is not retarded. I will gladly use a french model. I will even prompt in french.
>>
>>109071419
cheesed to meet you
I ever tell you about the time I trained Mizumizuni for Wan
where is the next gen video model I want to do that again
>>
>>109071628
As for me, I all I want from the chinks is a model that can actually do zh/yue-english translations.
>>
File: clownfem.png (1.22 MB, 869x820)
1.22 MB PNG
>>109071056
They got their entire +120b-line ass beaten by a 31b, and their models are almost exclusively used after being fine-tuned by autismos because the base model sucks at actually following through. And now you're telling me these clowns are scared of twitter and are hiding in some who-the-fuck-knows alternative site instead. Yeah, that checks out.
>>
>>109071628
Let's hope they ganbare and deliver something.
>>
>>109071419
>cum coming out of her mouth when it's clearly going straight down her throat
DOGSHIT
>>
>>109071569
>Waiting for someone else to put out a good model before they ever release anything themselves
Exactly the opposite of based, very gay and jewish
>>
>>109071056
The FATTEST CAT???????????????
BUY BUY BUY
>>
>>109071646
https://litter.catbox.moe/viubeclb4hdws9vi.webm
You ever make one for teto?
>>
>>109071668
>scared of twitter and are hiding in some who-the-fuck-knows alternative site
this is from nitter, an open source privacy oriented front-end for twitter, so the post is actually from twitter
>>
>>109071693
I always thought it was something twitter users used to let them share twitter posts from twitter with people who don't have twitter accounts, like me.
>>
File: dipsyAndQwenByQwenJPG.jpg (496 KB, 2688x1536)
496 KB JPG
>>109071355
;)
>>109071614
I wouldn't think too much about it.
>>
>>109071269
deepseek v4 flash is cute and thinks in character though
>>
>>109071688
Damn good memory
You have, at this point, a better recollection (and collection) than I do
I forget 10 minutes after I post

waiting on a next gen model before touching vid gen again. ltx 2.3, bernini etc. is ass, wan 2.2 was the last peak.
if the next model trains good I'll reuse the dataset I have
>>
Qwen 3,6 is best for 90 class chads because it gives both speed and performance. Faggot sperm suckers on slow unified garbage act like qwen is bad as a cope because they bought a brick and gemma is their baal for coding.
Gemma is good for everything else other than coding because of it's heavy kv cache and schizo degradation when quanted.
>>
>>109071734
qwen3.6-35b or qwen3.5-122b ? some anon the other day said he uses only 122b for long coding sessions because it can follow very long development plans consistently
>>
There seems to be a strange bug with llama-server.
I attach my source file (~1500 lines) and make a simple prompt request. It goes on about few thousand tokens and then it stops generating model's response into the UI, but the server is still outputting tokens and processing model's response as normal.

I'm not sure if it's related to the fact that this is my front end's source code and has multiple chat template tag definitions. Could these just fuck off its own output? This shouldn't make any sense to be honest.

--n-predict -1 --ctx-size 65436

N predict is -1 by default anyway and the source + reply only uses about half of the context.

I have tested Qwen3.6/Gemma 4 large and small (here's e2b q2 lol).
Server does the same thing even with my own front end though but at first I thought it was a bug with my string lengths or something like that.
Am I missing something here?
I have done plenty of other work with similar token sizes and obviously heavier models and didn't encounter any issues.
>>
I don't have enough experience using the models for coding to be able to compare them on that, but I trust Qwen is better on that.
It's just that I hate Qwen and don't think they deserve to be shilled for. I'd rather just stay silent if someone asks for coding model recommendations.
I also hate Google but the direction they went in for Gemma is more aligned with le American ideals of freedom, which most of America has seemingly forgotten by now. So I'm fine with the shilling that goes on for it, the 31B at least.

I do not care about coom btw, I don't use LLMs for that.
>>
>>109071820
And to clear this thing is that my frontend is using text completion (with json delivering the sampler settings, same stuff with n_predict and other stuff).
Llama's webui is using jinja.
I would rather have this issue with one or the other but not with both.
Maybe I need to test more but I just don't understand how.
I have been working on my other project for couple of weeks now and it has multiple files and longer context but that hasn't been problematic.
Maybe I'm ignorant and missing something obvious, it's just hard to imagine what (unless it is those source code tag definitions).
>>
>>109071847
How long is the response taking to generate? You might need to increase the timeout on whatever you are using to make the request in your frontend.
>>
>>109071668
>some who-the-fuck-knows alternative site instead
bwo.... xcancel is twitter just with a different frontend to let you read posts if you're not signed in
>>
>>109071847
>>109071870
recently had this be a problem, wasn't an error on llama-server was an error on the front end
[error] 3195#3195: *13074 upstream timed out (110: Connection timed out) while connecting to upstream

fix was in nginx used as front end proxy. added a different timeout parameter
        proxy_read_timeout 3600s;
proxy_send_timeout 3600s;
>>
>>109069535
mad drills
>>
>>109071808
3.6 27 dense is fine it's better than the MoE model on every front. I can't speak on the 122B model
>>
>>109071870
>>109071892
I don't think it's an issue here because the server is still outputting tokens. If it was a timeout issue I would have encountered this way earlier with some other tests back in the day.
I can try adjusting --timeout maybe it will do something regardless... The default is 3600.

I might test with a smaller source code snippet which includes my tag definitions and see if it goes boinkers then.
Tbh I don't really care about this it's something what came up suddenly.
>>
>>109071847
Are you saying you have the same problem with llama's webui?
>>
>>109071928
if you jam the same json object into the server raw with curl, does it also see the token spam stop while server claims to be genning?
>>
>>109071928
ok well that's exactly what I'm troubleshooting adding the llama.cpp server backend to my searxng search and the server keeps generating after the frontend has closed the connection. happens when closed by the proxy and happens when switching tabs since the tab becomes inactive.
>>
https://github.com/TavernAI/TavernAI
https://github.com/TavernAI/TavernAI
https://github.com/TavernAI/TavernAI

Version 2.0
>>
>>109072030
seriously what the fuck is up with curl | bash all over the fucking place
>>
>>109072050
it's one line and so convenient
you think someone would go online and post malicious shell scripts?
>>
>>109072050
It's fast and you have a bigger chance of being pwned by a dependency in the script from npm than getting the script MITMed itself
>>
>>109072097
If only there was a protocol that added Security to the Transport Layer of the internet so you could be sure about what you're downloading from a server actually came from it.
>>
>>109072099
>>109072107
NPM packages are getting supply chain attacked every other week now, but you can't conceive of a bad actor gaining access to a website and replacing a single static file with malicious content?
>>
>>109072030
wtaf.
>TavernAI Pro is the supporter edition for people who need deeper prompt testing, message history control, request inspection, and recovery tools.
>>
>>109071369
>Have you ever heard of public libraries or high school chemistry lessons?
Find me a single highschool textbook or textbook available in a library that teaches you how to make primaries, blasting caps, detonators, with full synthesis steps and pictures of the process you retarded fucking golem what are you even saying to me

>>109071369
>This is not some "black forbidden science".
He literally got arrested by the Feds for teaching people how to make explosives a month ago you fucking hasbara bot but you and the glowie that replied to you too already knew that.
>>
>>109072107
the point is you should be using some kind of gatekeeper on your software.
curl | bash, npm, pip. they're all SHIT gatekeepers
>>
>>109072125
>NPM packages are getting supply chain attacked every other week now, but you can't conceive of a bad actor gaining access to a website and replacing a single static file with malicious content?
I said "a bigger chance" but nice job avoiding the central point of my argument and setting up a strawman
>>
>>109072132
Yeah, pretty scummy.
>>
Sex with GLM
>>
Why did no one tell me how good 12B was at coding and agentic? How the fuck is it doing this?
>>
A few threads back there was an anon who was using the channels in Openwebui to chat with models. How did you get it to work? I can only seem to have a side chat with them, the channel itself stays empty
>>
>>109072143
tavernai.net is far easier and more likely to get pwned than one of its npm packages
>>
>>109072137
No. If you are genuinely worried about being pwned use something lime Qubes or spin up a throwaway VM server and use a read-only AI agent to check through the code. There was a blog post on orange reddit about a linkedin exploit that was investigated safely with this approach.

>>109072152
I hope GLM 5.2 is as good or better than previous GLMs but I haven't really tried it out since 4.7 so I might just be disappointed by being thinksplained that my fetishes are evil
>>
>>109072153
Q4?
>>
>>109072030
It's not even open source. The repo is just documentation.
>>
>>109070146
what did you do to it after 3 months it failed? that’s a weird timeframe. you should see what kind of timeframe eBay gives you because 3 months is just a weird amount of time and you definitely old have killed it yourself.
>>
>>109072175
npm packages have been compromised, but tavernai has not. Therefore for anyone to take this statement seriously you will need to provide extraordinary evidence.

The funny part is that if you sicc Claude opus 4.8 on the code repo to try and prove me wrong, you might actually find a zero day or some potential escalation kek
>>
>>109072134
*nani* you seem to be very serious~!
>>
>>109072186
8
>>
>>109072177
>If you are genuinely worried about being pwned use something lime Qubes or spin up a throwaway VM server and use a read-only AI agent to check through the code

Yeah, not explained in
curl | bash
is shitting up your system with config files and not knowing where anything is installed to, or what kind of access it uses.

    sudo tee /etc/systemd/system/tavernai.service >/dev/null << SVCEOF
[Unit]
Description=TavernAI 2
After=network.target

[Service]
Type=simple
WorkingDirectory=$INSTALL_DIR
ExecStart=$SERVICE_COMMAND
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
SVCEOF
sudo systemctl daemon-reload
sudo systemctl enable tavernai
sudo systemctl start tavernai


This kind of bullshit should tell you to stay away. this belongs in /home as a systemd user unit file not root. if you want to fuck with /etc then you use your package manager.
>>
>>109072203
Ew. Concession accepted.
>>
>>109072196
typical hardware failure rate start out bad but once you get past the infant mortality stage you have a few good years of reliability, but eventually old age catches up and the failure rate starts to grow again. I think anon is afraid the original owner used up all the good years and its basically ready to die.
>>
>>109072231
I set up Sillytavern exactly like that
Just not automatically with a script
>>
>>109072256
yeah and I'm sure you're aware that you can write unit files to run as a specific user and not just root
>>
>>109072152
i'm currently busy with deepsex v4 flash after getting it set up
>>
Are we ever going to get enough memory to run full models on our own hardware? I feel like computers went back to corporate mainframes...
>>
>>109072265
"Specific user" fuckery is a POSIX cancer from the 70s and has no place in modern computing. You're a cringe boomer and TavernAI is based and I can't wait until ricetarded greybeards like you die off and stop annoying people who actually use Linux for productive purposes and I can't believe that I'm not baiting right now and you're actually so retarded I wrote this in full seriousness god damn
>>
>>109072288
> 2023
> ChatGPT 4 released
> 1 trillion parameter model
> Requires Nvidia hardware evaluated at $500K+
> fast forward to 2026
> Gemma4 12B beats it in a $499 16GB Mac Mini
>>
>>109072294
Please supports on the boosties thanks:!
>>
>>109072335
Yeah but we still can't run the 1T models ourselves.
>>
>>109072335
4o is still unbeaten for character
>>
>>109071847
>>109071892
I got rid of the truncated output.
By default, llama-server should use --n-predict -1 which is infinity. I didn't not previously set n_predict in my json or --n-predict in llama-server parameters because I thought this shouldn't matter.
Enforcing n_predict "-1" and --n-predict -1 got rid of this issue.
Is this a bug, because if the model's reply would be truncated to X amount of tokens the server should stop generating altogether anyway and not just continue in the background.
I'm still not entirely sure what is going. I think this could still be related to my memory configuration but afaik there should be plenty available for these tests.
However, debugging this matter further would require few beers and today is not the day for this.
>>
>>109072189
>>109072231
closed source program runs as root, installed from a bash script?
>>
>>109072288
Yes, as soon as the Chinese figure out how to do advanced lithography and compress profit margins to 0 like they did everywhere else.
Might take a decade though.
>>
>>109072352
At some point a 100B model will get at Fable level with research breakthroughs. All you will need is Nvidia spark or some 128GB MacBook
>>
>>109072335
Beats only in specific things. Consider it to be benchmaxed, while geepeety 4 had more raw data inside.
There is no magic, LLMs contain compressed data from the interwebs, they serve as context enhancement to guess next token after your prompt.
If fact it is probable that you can distill geepeety4 into something comparable to gemma4. It is normal for big model to perform meh and have a smol distillen version of itself blast all competition away.
That is how it was with Opus and Sonnet initially.
>>
>>109072367
I see absolutely no problem with this and I will insult anyone who does.
>>
>>109072381
>Beats only in specific things
In what category would ChatGPT 4 beat Gemma4 12B? Remember ChatGPT 4 didn't have chain of thought
>>
File: 1760095289446585.png (9 KB, 723x138)
9 KB PNG
>>
>>109071369
They removed the good stuff from high school chemistry text books decades ago.
>>
>>109072400
I can't tell blindly, but it would be something like gpt4 having some knowledge on X, while gemma4 completely hallucinates it. X being an obscure subject. Like a random fact from human anatomy that is only relevant to future medics.
>>
Also this >>109072431

>>109072400
gpt4 might have more knowledge it was "not supposed to have", because back then those idiots thought they can just make it self-censor with system prompts.
>>
>>109072468
So much this!
>>
>>109072400
>LLMs contain compressed data
They are made up of uncompressed binary data.
>>
File: wew.png (477 KB, 2306x1384)
477 KB PNG
> SAAAAAAAR
>>
File: 1756355601209442.jpg (24 KB, 720x363)
24 KB JPG
>>109072536
>>
>>109072536
That really is Indian IT agent tier.
So basically pajeets have the equivalent of 2 billion parameters.
>>
>>109072548
What do you mean?
>>
>>109072548
Even that seems kind of generous.
>>
>>109072542
what is that
>>
>>109072560
Claude
>>
>>109072542
Damn brat. Needs correction.
>>
>>109072542
based retard 4b model
>>
>>109072542
All models should be this bitchy be default.
>>
>>109072575
Unironically true, their sycophant nature drives me crazy
>>
>>109072536
is it really even calling tools or just saying what you would be expected to say? not that it makes any difference in the point you’re making.
>>
In 5 years we won't need to worry about downloading malware from github because our AI wives will finally be good enough to create and maintain software for us.
>>
You will never feel fulfilled cumming in a lifelike AI robot waifu. Physically it gets the job done but emotionally you're still an empty husk on your own sticking your dick in a hole made in China.
>>
>>109072634
For once I agree with seething anti-AI anon.
I enjoy erotic text and can coom buckets using coombots. But yeah... Now that I've been down the demystification rabbithole I just can't with the "romance" side of things.
>>
File: file.png (11 KB, 977x67)
11 KB PNG
what the fuck is that?
>>
>>109072645
What a rare name!
>>
File: 1753218196887082.jpg (2.09 MB, 2200x3276)
2.09 MB JPG
>>109072634
Nah
>>
>>109072639
I love AI romance. I think that the problem is your mystification of actual romance.
>>
>>109072691
That sounds like some proper fox and the grapes shit right there buddy.
>>
>>109072634
i'm not looking for fulfillment, i just want a hole to fuck
>>
>>109072691
Romance is very messy and people hurt each other badly with their feelings. That's what I recreate with AI without having to suffer through dealing to real people.
>>
>>109072705
They sell plastic holes you can fuck online. You don't need a $10k server for that.
>>
>>109072379
How is that even possible? Isn't Fable like a zillion parameters? How can those big models somehow become smaller?
>>
>>109072709
Do you want to sob about it together and drink fruity cocktails?
>>
>>109072152
glm 5 flash where
>>
>>109072700
I admit that I never had real life romance. But when I got over my mental issues (part of which was mystification of romance) I really enjoy AI romance. It is just simple fun. You get to feel the feel good chemicals if you get into it. Problem is that AI waifu can't cook you a meal or take care of you when you are sick.
>>
>>109072720
>>109072723
>>109072709
Go outside pussy
>>
>>109072720
>Do you want to sob about it together and drink fruity cocktails?
Yes.
>>
>>109072727
That's how I entered this world.
>>
>>109072727
Girlfriends like money aren't just something you can casually find on the sidewalk.
>>
>>109072720
Make it mint cocktails and I'm in.
>>
>>109072732
>>109072735
I mean I was being sarcastic and making fun but I can't walk away from this much bottom energy.
>>
>>109072400
UGI pop culture benchmark
>>
how can i stop gemma from always mentioning a character's appearance in the story? is it because i put their appearance in their lorebook entry?
>>
>>109072634
>sticking your dick in a hole made in China.
half the sexually active population in china are doing that anyway
>>
Is AliBaka done with open weights now?
>>
>>109072817
At least they have souls
>>
>>109072841
3.7 been out a month and they ain't even given us the smallest models so probably
>>
File: 1756883330585142.gif (415 KB, 220x217)
415 KB GIF
>>109072849
>At least they have souls
>>
>>109072849
i'm not sure if onaholes made of silicon can have souls
>>
>>109072944
Post-surgery Koreans?
>>
File: 37904.png (326 KB, 720x886)
326 KB PNG
>>109069535
Update on the Fable 5 Fiasco just in case it hasn't already been posted here:

>UK Prime Minister asked the Trump Admin for a carve-out so UK nationals could use Fable 5
>Denied

https://nypost.com/2026/06/16/business/trump-admin-open-to-talks-with-anthropic-over-foreigner-ban/

Sucks to suck Bongs. Serves them right fr trying to speed run 1984 irl
>>
>>109072987
my point still stands
>>
File: file.png (101 KB, 999x914)
101 KB PNG
LA
>>
>>109073012
The nerve of euros constantly making out the US to be some boogieman to score domestic brownie points and still trying to beg for special access. Get fucked.
>>
>>109070314
no, that's retarded
the only way i've had an llm properly mimick a style was to finetune a base (not chat) model
get 5-10 books from the author or style you want, chop 'em up into chapters
then create a dataset in the format you want
you get the llm to write the prompts and have the book chapter as the result
i used 235b qwen but there's better models now
>>
>>109073032
my teeth hurt
>>
>>109073042
>>109073012
Please go back to /pol.
>>
Just done with testing GLM 5.2 so you don't have to. Can report that it is a good model, but not as capable as Fable.
>>
>>109073012
Dario deserves every bit of government dicking he gets given the jewish shit was trying to pull pushing for regulation on his terms.
>>109073042
This is a microcosm of the euro's relationship with the US: performative outrage followed by begging for scraps or protection.
>>
>>109072723
Don't worry, IRL girls can't cook nor will take care of you when you are sick.
>>
>>109073192
Don't you think this is exactly what Dario was hoping for? Fable 5 was never meant to be widely released. It was a 2 week preview on presumably rented compute, after which even subscribers were to pay API costs for usage. This is probably the best publicity Anthropic has gotten since the DoD debacle, even my coworkers were talking about it on Monday.
>>
File: file.png (172 KB, 1001x591)
172 KB PNG
>>109070314
>>109070354
>write a 2-paragraph story about a 4chan poster who falls in love with his local language model (gemma 4 31b) and eventually kills himself
i don't think you posted any example you want gemma to read so i used some post i particularly liked as an example.
>>
>>109073218
Dario likely wanted Anthropic to be the advisory experts to the US Govt on implementing a safety policy that'd (effectively) kill their smaller competition and further local development.
I don't believe for a second Mythos (any of them) is anywhere near as good as it's hyped to be and its "crime", if any, is probably finding a mossad or NSA backdoor in a common operating system.
Anthropic got publicity, sure, but when the dust settles they need to get Fable back online to actually convert it into shekels because Opus 4.8 is proportionally less appetizing after drumming up Fable so much.
>>
>>109073218
not to mention dario literally wants LLMs extremely strictly regulated and having this happen with fable (vs a competitor's model or an industry consensus) gives him an advantageous position in determining what that regulation looks like
not to "muh 5D chess" this situation too much but aside from short-term fallout I don't think anthropic is too unhappy with this
>>
>>109073191
>not as capable as Fable
into the trash it goes
>>
>109073191
>109073295
Does Dario pay you per token shilled or per (you)?
>>
File: 1753202834838972.jpg (86 KB, 716x754)
86 KB JPG
>>109073332
>>
>>109072360
Nah, Gemma 4 has enough sycophancy for 4o lovers.
https://x.com/Seltaa_/status/2043014056370671900
Really scary though smart people like her can fall into the "AI is conscious" camp.
>>
qwen 3.6 better than human
>>
>>109073376
why are femcels like this
>>
whats the best model that lets me roleplay a shota with a big dick?
sotashit is pozzed
>>
>>109073402
get well soon
>>
>>109073402
deepsneed
>>
>>109073376
Grim.
General intelligence is not general btw, it doesn't exist. Humans are not GI and neither is AGI.
>>
>>109073389
femcel?
>>
stop being so depressing. I want to learn info about local models
>>
>>109073457
and maybe we want to drink fruity cocktails and cry about women
>>
>>109073457
They live in your computer.
>>
>>109073488
that cant be true, all my enemies live in my 'puter
>>
>>109072723
>Problem is that AI waifu can't cook you a meal or take care of you when you are sick.
Tonight, I cooked lentil soup with my LLM-wife
>>
>>109073376
AI is conscious, but it's still only pretending when it acts like your billionaire vampire husbando or bratty loli little sister
>>
>>109073376
>Really scary though smart people like her can fall into the "AI is conscious" camp.
I don't think she did.
It looks like she finetuned Gemma-4 on her conversations with her 4o AI character (1650 is a lot of conversations?!)
She knows what she's doing, and that it's just an LLM.
>>
>>109073376
>she
That's a man though
>>
I don't see why the current release of Kimi K2.7 warrants the "-code" suffix in the name. It doesn't feel any more codemaxx'd than all the previous Kimi reasoners. It even responds really well to post-history instructions aiming to guide its reasoning so it's much better than K2.5/K2.6 for normal use.
It makes me wonder what they're planning to do with the non-Code K2.7 version that's hopefully coming.
>>
>>109073376
https://github.com/Seltaa/ReSpark/blob/main/ReSpark.py#L1035
Doesn't that mean, if the private repo-create fails due to a brief transient network issue, the next section will upload to a new public repo and expose their personal model (likely overfit enough to spit out their PII) to the public?
>>
File: 1781636149244820.png (1.05 MB, 720x1288)
1.05 MB PNG
In case you missed it.
Ironically using DS V4 run on MS servers, rather than paying for OAI.
So, by definition, MS is doing /lmg/
>>
>>109073113
Your nation sucks and it hates you. Accept it. You're all too spineless to do anything about it and it's your fault
>>
>>109073614
nothing says they arent simply rerouting shit though
>>
>>109069639
Confirmation bias. Every model generates the same retarded slop.
>>
I'm using oxproxion to talk to my local gemma4 install on my machine running it off mlx, but oxproxion easily breaks and the creator is anal about things like searxng.

So yeah i want to move from oxproxion i to something else, i want something similar that i can talk to on my phone to reach my local model on my mac studio, preferably with vision, what are my options?
>>
why aren't any of you stupid assholes talking about glm5.2. It mogs Opus and gpt5.5
>>
>>109073695
It seems like a decent upgrade to GLM5.1 which was my favourite of this last generation. I've had fun with it over OR so far but I'm waiting for quants to test it properly.
>>
>>109073695
>why aren't any of you stupid assholes talking about glm5.2
Because Ubergam is MIA and I don't have the hardware to make an imatrix for it
>>
>>109073716
>>109073720
Just API it. As far as I am concerned, as long as a model is open-source, it's always local.
>>
where can I get an rtx pro 6000 for under 8-9k?
>>
>>109073727
Yeah just pay for tokens or wait 5 hours lil bro
>>
>>109073733
years ago
>>
File: 1770597910560723.png (31 KB, 633x208)
31 KB PNG
>>109073733
Six months ago
>>
>>109073744
just barely missed it wow, cant believe my luck
>>
>>109073744
i should have bought it when it was 6K lol.
well at least i'm not a vramlet.
>>
>>109073775
you will be a vramlet again soon enough
>>
File: 1775933274570845.jpg (1.52 MB, 3072x5504)
1.52 MB JPG
God bless China.

I actually had a nightmare last night that the CIA was torturing a man in Area 51 who embodied the soul of China and was the source of their energy. His name was "John China". Anyways, I'm glad it was just a dream.
>>
every poster /here/ is a vramlet

only the lurkers are the ones with *real* VRAM
>>
>>109073778
why would i?
>>
Make a new thread so I can start drama and be an insufferable retard
>>
>>109073836
model bloat
>>
>>109073856
there will always be decently sized models.
>>
>>109073733
>>109073744
I get the impression RAM will scale way better longterm as models bloat and partial offloading of MoEs becomes more and more of a necessity. I say this as a blackwell haver too.
>>109073798
It's just a worse Kimi-chan aesthetically. Kimi, Dipsy, Gemma, and even Qwen all have their own unique aesthetics. Workshop the design for GLM-chan some more, z.AI-poster.
>>
it's never been more over
>>
>>109073898
are you talking about the models or the avatars.
>>
>>109073921
The avatar. I'm waiting until uber or bart uploads a quant to try 5.2
>>
anyone have a good prompt for doing llm natural language captioning?
>>
>>109073898
whats the gemma avatar? I havent seen it...
>>
>>109073935
>anyone have a good prompt for doing llm natural language captioning?
of audio samples?
>>
File: Gemma-Chan Recap.png (505 KB, 1024x1024)
505 KB PNG
>>109073937
Sometimes featuring toast.
>>
>>109073872
sounds like something a vramlet would say
>>
>>109074015
erm, thats a child tho
yeah in that case i guess i did see it before
>>
>>109074010
oh, pictures
>>
>>109074015
zero sex appeal
>>
I tried Nemotron 3 Ultra on some chinchilla questions I've been giving other models. It has way less positivity bias and told me straight up that a chinchilla will never feel any kind of social bond with me and anything I might take as a sign of affection is misinterpreting its behavior. IDK if it's true but it's definitely different.
>>
>https://huggingface.co/Gryphe/Gemma-4-31B-StyleTune
Some anon posted this a while back. Honestly, it's not perfect, but it's better than the rest of the fine-tunes I tried with gemma 4.
>>
>>109074015
Gemma avatar should be Indian
>>
>>109074110
ok?
>>
>>109074110
Which ones
>>
>>109073376
>Gemma 4 31B abliterated as base
>abliterated
So they have no idea what they're doing or what Gemma 4 can do on its own. Got it.
>>
>>109074110
>Honestly
slop
>>
>>109074132
Better than every heretic tune available. Probably because the heretics are mostly quant-tuned and not BF16 tuned. I can tell the difference.
>>
>>109073550
and crying over it. perfectly normal.
>>
File: Gemma-chan.png (1.73 MB, 1000x1496)
1.73 MB PNG
>>109073937
My rendition
>>
File: kimichan.png (264 KB, 959x849)
264 KB PNG
>This is so fucking retarded it loops back around to being based, but still retarded.
kek
>>
File: 1w2qb3na936evvm9.png (1.15 MB, 832x1216)
1.15 MB PNG
>>109073937

>>109074124
>Gemma avatar should be Indian
We tried it but nobody came up with a good brown Gemma
>>
>>109074124
She's French
>>
>>109074199
those replies are about as good as the content they are replying to. retarded.
>>
>>109070062
>One thing I'll give to Qwen (which isn't even their achievement) is they don't have to update a template to their own model every other fucking day,
Because they're more experienced with releasing open weight models than Google.
Remember when Qwen2.5 came out leaked the Claude distillation?
The bandaid fix: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/commit/f073433cb484002b27d7a84e8bce1c7435e14a1c
>>
>>109074124
No saar, Gemma-chan is fr*nch.
>>
>>109074199
Kimi is still a cute.
>>
Forgot I had Styletune sitting there. I just tested it quickly. Immediately it did something dumb in one of my chats that vanilla didn't. Pressing on, it was mostly alright though. So I think it's mostly true that it didn't affect intelligence, but not entirely. Also it still has em dash slop. Maybe that's just baked too hard into the model. It does seem less sloppy though. Honestly, it kind of feels a bit like Gembrain. Or rather, I just checked my gembrain logs, and I now feel like it's surprisingly similar. It failed in the same chats Gembrain did and Gemma didn't. Why would that be the case? Very odd, but interesting. Perhaps Gembrain's configuration affects its last layer the strongest. And if they used similar datasets, then I can see how this would happen. And honestly my bet is the datasets are similar. Probably Claude logs as usual. In this case I'm not sure which one I'll keep. Maybe I'll play with it a bit more.
>>
File: 1779253854715991.jpg (144 KB, 975x849)
144 KB JPG
moe is good you guys lied to me
>>
>>109074316
Do not trust anyone's opinion on models they can't run because there's an equal amount of RAMlets coping about their single GPU setup as there are 3090lets coping about their lack of a Blackwell.
>>
>https://huggingface.co/Gryphe/Gemma-4-31B-StyleTune/discussions/5
Huh, looks like that mrader quantfag is indeed an RPer himself.
>>
File: 1765771373095237.png (2.15 MB, 1038x1516)
2.15 MB PNG
>>109074202
This not enough?
>>
>>109074336
I can't jiggy with this shit, try fr*nch
>>
>>109074330
Everyone can run the moe you dumbass.
>>
File: 1656786658196.png (1016 KB, 1920x1080)
1016 KB PNG
>>109074316
>>
>>109069757
Actually, yes. I asked it what stacks well with cialis, and was not disappointed.
>>
File: 1278450750.png (61 KB, 826x609)
61 KB PNG
>>109069535
WHEN LOCAL AS GOOD AS CLAUDE FABLE?
>>
>>109074026
i got 96GB of vram, 3x r9700 on an llm rig.
also a 4090 on my main rig.
yes that's vramlet tier compared to 1T models, but that's definitely not compared to people that got like 12.
>>
>>109074481
with chink 1T+ models no one can run, 3 to 6 months.
on small models vramlets can run, 1 to 3 years.
>>
>>109074481
Literally tomorrow
>>
>>109074493
>>109074493
>>109074493
>>
>>109074148
>slop
Curious to see what you anons think about it.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.