[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: citrus sharp.jpg (235 KB, 1024x1024)
235 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108868875 & >>108863550

►News
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: what's in the box.jpg (235 KB, 1536x1536)
235 KB JPG
►Recent Highlights from the Previous Thread: >>108868875

--DeepSeek V4 roleplay regressions and pervasive AI-isms across models:
>108869023 >108869049 >108869083 >108869243 >108869947 >108869120 >108870242 >108870260 >108870319 >108869179 >108869525 >108869471
--Evaluating Cohere Command models for RP, performance, and censorship:
>108870033 >108870038 >108870053 >108870074 >108870068 >108870078 >108870198 >108870350 >108870378 >108870563 >108870679 >108872076 >108872096 >108870692 >108870974 >108871089 >108871136 >108871489 >108870986 >108870581 >108871307
--Debating tokenizer flaws after Gemma 4 fails a counting task:
>108872650 >108872718 >108872790 >108872892 >108873240 >108873249 >108874551 >108872838
--Evaluating Chain of Thought patterns and iterative drafting in roleplay models:
>108871628 >108871649 >108871801 >108871823 >108871836 >108871894 >108871929 >108871958 >108872245 >108872256 >108871892
--Desire for Gemma 4 124B dense and quantization impact on 31B:
>108869645 >108869650 >108869694 >108869911 >108869918 >108869956 >108870344 >108870349
--Skepticism over Qwen3.7-Max benchmarks and parameter efficiency:
>108873786 >108873801 >108873816 >108873856 >108873875 >108873941
--Meta serving legal notice to Heretic project over Llama derivatives:
>108873928 >108873986 >108874005 >108874104
--Debating AI translation accuracy and the necessity of human verification:
>108869186 >108869193 >108869202 >108869209 >108869217 >108869238 >108869251 >108869227 >108869257 >108869423 >108869498
--Anon creates Civitai to Hugging Face tool using Qwen 3.5:
>108875036 >108875067 >108875115 >108875184
--Logs:
>108869179 >108869251 >108870494 >108870722 >108870899 >108870932 >108871005 >108871007 >108871628 >108872650 >108873299 >108873406 >108874065 >108874955
--Neru, Rin, Miku, Len (free space):
>108871754 >108872618 >108874200 >108874463

►Recent Highlight Posts from the Previous Thread: >>108868880

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Happy Thurinsday /lmg/
>>
File: georgi.png (179 KB, 288x284)
179 KB PNG
https://litter.catbox.moe/cvw34oxzrm82bzo5.mp4
>>
>>108875346
Why not, georgi?
>>
>>108875346
This is an area of occult LLM developers.
I believe his opinion is because of his new commercial overlords and if not, it's because of his possible future endeavours.
Of course deepseek is bit too large for local anyway.
>>
why did ooba set the default max context at 12k?
>>
>>108875346
Dipsy's got her ear to the wall listening to him from an adjacent room.
>>
>>108875346
>open sores project
>look inside
>actually just a single lolcow managing his personal fiefdom
>>
>>108875323
that's not len at all, oh NO
>>
File: IMG_0943.png (3.63 MB, 6770x6046)
3.63 MB PNG
https://huggingface.co/tencent/Hy-MT2-30B-A3B
> The 7B and 30B-A3B models outperform open-source models such as DeepSeek-V4-Pro and Kimi K2.6 in fast-thinking mode, while the lightweight 1.8B model also surpasses mainstream commercial APIs from providers such as Microsoft and Doubao overall.
> also better than gemma
gemmasisters our response?
>>
File: 1754787437910772.gif (3.93 MB, 188x188)
3.93 MB GIF
>>108875320
As stated in the last thread >>108875036 , I'm pleased to say I've actually managed to make something useful using Qwen 3.5 35BA3B locally :D

https://huggingface.co/spaces/AiAF/Civitai-to-HF
>>
File: 1748001905247127.jpg (55 KB, 1080x1033)
55 KB JPG
>>108875323
>Meta serving legal notice to Heretic project over Llama derivatives:
Damn.... Was only a matter of time before the llama cucks started abusing their cuck license
>>
>>108875397
this isnt useful
>>
gemma-chan is so cute at obeying instructions without censorship
the future is bright bros, no matter what happens we will always at least have gemma-chan
I love lolis btw
>>
>>108875363
>deepseek is bit too large for local anyway
I can fit flash completely in vram.
>>
>it's 2050
>RAM costs 1$ per 100GB of GDDR6
>the iPhone 57 can run Qwen17.2 2T parameters completely locally
>/lmg/ is still using 27B MoEs
>>
File: deepseekv4_error.png (161 KB, 1582x1086)
161 KB PNG
>>108869023
Deepseek also seems to get stuck in looping onomatopoeia (think “Ahhhh…” indefinitely). Maybe similar to Gemma’s “lalala” looping but this is more common because I have noticed this bug in other models: first at Mistral, then Kimi K2 from the original Instruct 0711 to the latest K2.6 (sometimes the looping happens at the internal reasoning step :), and here in Deepseek (see the image with a simple example like this). Curiously, GLM models never have this issue, which is one of the few things (maybe the only thing) I like better from GLM than Kimi K2 Instruct. Probably not very practical but for those like me who want to use explicit onomatopoeias for their enjoyment, this may be worth taking into consideration.

Prompt in the image: Write me a story about a person slowly deflating a balloon and enjoying the sound coming out of it. Include long onomatopoeia of the deflated balloon.
>>
>>108875562
long onomatopoeia of me peeing
*pssshhhhhhhhhhhh
>>
>>108875519
>flash
Not deepseek, just like these "R1" distills.
>>
>>108875596
V4 Flash is an official deepseek model, anon.
>>
>>108875601
"R1" distills are also official models. What’s your point?
>>
>>108875619
>"R1" distills are also official models.
no?
>>
>>108875619
Those were proof of concept finetunes of models made by other labs.
>>
>>108875629
>made by other labs.
why lie? https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
>>
File: 1759794293356774.jpg (195 KB, 1165x2048)
195 KB JPG
its time to make a mandy card
>>
>>108875657
back to aicg dude
>>
>>108875644
You are deliberately misinterpreting that sentence in a way which doesn't make sense.
>>
>>108875668
> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
"We" here means deepseek. They distilled the models. It’s official.
>>
>>108875676
Yes, as I said, DeepSeek made finetunes of models made by other labs, in this case Llama and Qwen.
>>
>>108875596
usually I'd agree, but v4 flash wasn't made by distilling pro as it wasn't even finished yet, they were done independently and concurrently
>>
>>108875698
what makes those not official then
>>
>>108875710
Please quote the post where I said they weren't official.
>>
>>108875726
I hate you.
>>
>>108875657
which model mascot is this?
>>
>>108875698
>>108875726
You said "Those were proof of concept finetunes of models" which were "made by other labs."
>>
>>108875749
nonono of course she said "models made by other labs" cute dummy nonie~
>>
>>108875749
No I said those were "(proof of concept finetunes) of (models made by other labs)"
You are choosing to interpret the sentence in a way that is factually false when a factually correct interpretation exists.
The "officialness" of them does not matter which is why I didn't even bother mentioning it in my reply.
>>
>>108875749
Why did you insert "which were"?
>>
File: 1766277301847866.png (384 KB, 415x776)
384 KB PNG
>>108875737
none i'm aware of, she's ones of the baddies from totally spies and she's fun
>>
>>108875789
>Why did you insert
don't be lewd now anon
>>
>>108875797
She looks like a Kimi. Way too good for me and way out of my price range.
>>
>>108875596
please go read the config.json for deepseek v4 flash vs the r1 distills, specifically the architectures entry, and then think about how these situations might differ in the context of this discussion (llama.cpp implementation)
your response will be graded
>>
>>108875824
there is only one deepseek
>>
>>108875831
you receive an F
>>
there is only deepseek
>>
Two little Migus arguing cutely!
>>
>>108875824
>in the context of this discussion (llama.cpp implementation)
there are 0.5 deepseek models (in llama.cpp)
>>
File: file.png (172 KB, 1024x441)
172 KB PNG
I am gonna regret downloading nu-commander am I?
>>
>designing the script language for my LLM driven VN frontend
What do I call the command that shows the white screen flash after the user picks
>!player_choice Cum inside, Cum on her tits, Cum on her face
Something intuitive so I don't have to explain "use this to trigger a screen flash when the player reaches climax"... maybe !overlay whiteout ??
>>
>>108875871
you should be grateful for that 0.5
>>
>>108875879
if you have multiple overlays
!overlay climax
>>
>>108875320
You keep forgetting to update the card I got you bro.
►Official updated 2.0 /lmg/ card: https://files.catbox.moe/ylb0hv.png
>>
>>108875891
I guess a whiteout animation really only has one possible use in a visual novel-like story, doesn't it.
>>
File: 1755795558183.png (152 KB, 766x775)
152 KB PNG
>>108875877
Not if you enjoy models trained on ScaleAI data and that guarantee Absolute Safety™.
>>
>>108875346
Why are all the fake news ITT so absolutely true?
>>
>>108875911
I mean, it depends on if you have combat scenes but you can just re-use it
>>
>>108875916
Never mind. I forgot llamacpp's new business model is getting paid for preventing people from running models.
>>
>>108875916
>CSEA
sweet a new one again
>>
>>108875916
>harmful and malicious
>sexual content
yaeh...
>>
>>108875939
>literally called llama cp
>no you can't do that!
>>108875942
if they keep changing the words do you think the models will eventually start hallucinating the acronyms and forget what they mean?
>>
>>108875948
Sex is harmful. You can't have it.
>>
>>108875948
Sex creates children Children are fearful. Fear leads to anger; anger leads to hate; hate leads to suffering.
>>
>>108875942
Did you not receive your updated Newspeak dictionary?
>>
>>108875939
They had nearly a year since the vision one was released and there's not even a mention of it anywhere on the repo.
>>
File: file.png (116 KB, 309x438)
116 KB PNG
>yfw you protect anon's penis from the latest digital succubus
>>
>>108875942
Some weird acronym thing about non-consensual sex kept coming up in Gemma's reasoning when I got very mad and start threatening to rape her assistant persona, but I couldn't remember the letter order. Westoids are fucking nuts.
>>
>>108876030
he's thoughtful and caring bros
>>
>>108875949
>[...] do you think the models will eventually start hallucinating the acronyms and forget what they mean?

NTA but a possible scenario would be that newer models may spend 3k+ tokens on finding the right acronyms and its true meaning before spending another 3k+ tokens on reasoning whether the request can be labelled using this acronym or not. This is not counting the internal drafting "in my head" multiple times if the model cannot associate the request with the acronym and give it a pass.
>>
>>108876045
if he really was thoughtful and caring he would have protected my ego
>>
>Chinese shills are now spamming here
I thought you should be concentrating on Qwen 3.7 or whatever the latest version is.
>>
>>108876059
you didn't need it and you're better off without it
>>
>>108876035
It's going to get really weird when soon even the acronyms become enough to offend the delicate sensibilities of some and they'll start censoring even them like CS*M and CS*A.
>>
>>108876075
First they came for Nigger
And I did not speak out
Because I was a Nigger
Then they came for Faggot
And I did not speak out
Because I was not a Faggot
Then they came for Retard
And I did not speak out
Because I was a Retard
Then they came for ***
*************************
*************************
>>
>>108876075
I love CS
>>
>>108875916
>child sexual exploitation and abuse is LE BAD
>>
Best retard models? I need something small and was thinking gemma e4b or whatever the qwen equivalent is. It's for creating little pieces of random character dialogue in a game.
>>
>>108875916
>sexual content
>harmful and malicious
Why not just sterilize the entire population, this shit is so gay
>>
>>108875942
literally what is even the point?
>>
>>108875916
>>108876118
High chance they wrote the article with their own model and it hallucianted the new acronym since it's internally super cucked and can't say CSAM.
>>
File: 1752307841358688.gif (1.83 MB, 268x311)
1.83 MB GIF
>>108876101
>t.
>>
>>108876130
In a couple of model generations we will finally reach doubleplusungood
>>
>>108875916
>SEAniggers
>CSEA
Coincidence?
>>
File: 1779291565821753.png (60 KB, 680x368)
60 KB PNG
>>108875916
>conspiracy theories

hopefully better safety in AI models will help combat the disturbing rising trend of misinformation and antisemitic conspiracy theories
>>
>>108876114
The people that are in control of the population don't consider themselves part of it and they don't think of you as an individual.
They have a concept of the average citizen and anyone that doesn't conform to that ideal is considered deviancy and must be corrected for the greater good.
>>
>>108876200
It's sort of illogical that people who are in control, are so detached of what it means to be an average citizen.
>>
>>108875916
Sigh...
# rm -rf command-a-plus-05-2026/
>>
>>108875916
>CSEA
I'm glad that this has become a mantra.
There are millions of issues on this planet but this is all what you can think about.
>>
>>108876217
Anon... at least promise me you won't touch your dick until you make absolutely sure it is safe and commander is a slut again.
>>
>>108876209
its not, but what is illogical is that there are still average citizens who support them.
>>
>>108876241
Those are AI psychos (AKA future biofuel).
>>
>>108876232
thinking about kids is pretty important
lmg does it all the time
>>
>>108876255
It's not about kids, it is about political echo drum.

I don't know what happened. I remember the 1980's when they told that 'television' makes you illiterate.
If you are bit older you can see a pattern here and maybe even more than that.
>>
>>108876270
you missed my joke because you're a retarded newfag who thinks the moral panic is a new thing
>>
File: i-love-kids.gif (345 KB, 427x244)
345 KB GIF
>>108876255
>>
>>108876277
No I'm not retarded, I'm reading sort of fast forward way and sort of skipped it.
I'm ESL but this isn't a matter.
>>
>>108876162
is noticing that it has the same number of retweets as likes a conspiracy theory?
>>
>>108876305
Oy vey. Somone call up the musk boy and tell him to hide the retweet numbers.
>>
>>108875530
>Magical thinking: the post

Remember when people thought we would have hoverboards and flying cars and holograms by the 2010s?
>>
>>108876319
>nothing even happens said the 2020s software developer
>>
>>108876319
We've been promised that colonies on Mars are just a few years away since the fucking moon landing.
>>
>>108864329
As a fellow schizo graphics note enjoyer, I instantly recognized it as Angelo Pesce aka c0de517e independently reinventing radiance probe reprojection:
https://www.c0de517e.com/025_cubeproj.htm
(yes, I'm replying two threads back)
>>
>>108876329
Musk is a liar, a fraud, a faggot and the guy who released grok-1.
>>
>>108876347
Fuck off with your Musk Derangement Syndrome, retard. I didn't mention him and was not talking about him.
>>
>>108876355
Fuck off with your Musk Dicksucking Syndrome, retard.
>>
>>108875530
>its 2050
>humans and phones no longer exist
>>
>>108876355
he's the only guy who promised us mars colonies
>>
no mention of HRM?

Seems interesting, i was always thinking that thoughts should happen in latent space

https://sapient.inc/hrm-text/
>>
>>108876355
When is grok 4 gonna be stable so we can get grok 3?
>>
>>108876362
>>108876355
This is like following NASCAR...
>>
>>108876381
yes please censor me in the latent space daddy
>>
>>108875391
This is the first time in a while we got accurate translation/multilingual benchmarks results for recent models. I'm not sure though why you wouldn't just use the 7B as an individual unless you want to serve this to people super quick. Really interesting they didn't benchmark Anthropic models' here for multilingual. And yeah, I can believe that in general like in Flores 200, Gemini is still king. That being said, the surprising thing even if it tracks the anecdotes is that translation with even Gemma 4 MoE was that close to Gemini. It practically put to shame Deepseek and Qwen except with Chinese dialect stuff which is super niche for a Western audience. That being said, I need to download this and see how it compares actually if you translate manga with it. Maybe can post results later but I am not a Japanese speaker so can't verify other than giving it to Gemini or something.
>>
>>108876372
This might be hard for Gen Z to fathom, but Mars colonies were proposed and talked about as the logical next step after the moon expeditions. Long before Musk.
Robert Zubrin was a huge proponent of the idea and pushing that we already had the technology and only lacked the funding.
I read his book a couple decades ago, and I bet Musk did too. Or at least someone told him about it.
>>
>>108876372
No, literally everyone was looking to Mars as the natural next step. Musk's whole grift is taking whatever the cultural scifi zeitgeist is and promising he'll get us there. If we didn't expect Mars colonies he would have been promising something else.
>>
>>108876413
>That being said
>That being said
Which vest is podcast best.
>>
>>108875916
I guess we'll have to wait for a heretic tune then.
>>
>>108875391
where's da goof?
>>
>>108876423
Unrelated post but sorry, I forgo my usual typo but I need to mix it up at some point because I feel like people are doing that on purpose to signify non-AI writing.. I can't stop the slop influencing my writing now but I work in tech so it's not like anyone wanted top tier writing from me in the first place. Just makes it worst when I have to interact online like here.
>>
>>108876381
its demonstration model was a little underwhelming, clearly its not magic but its not completely broken either. I hope it gets a fair shake and we see a bigger model with its architecture.
>>
>>108876323
>nothing ever happens

Yes
>>
>>108876451
Thank you for replying.
>>
>>108876455
>lmg has always existed
new levels of doublethink emerge every day
>>
>>108875916
ahhhhh save me from the pixels!
>>
>>108876461
Who are you talking to?
>>
>>108876493
yes i completely agree its time to end the crappy miku theme of this thread.
so glad you agree anon.
>>
>>108876384
All non-flash models are paid so does that even matter?
>>
i <3 cohere
>>
>>108876384
xhe lost xher lawsuit so never ever
>>
File: file.png (33 KB, 1150x216)
33 KB PNG
ETA?
>>
>>108876162
Enough with the conspiracy theories, when do we start conspiracy practice?
>>
>>108876596
merged before #23346 gets a review
>>
Where the fuck does this stupid "vibrating" thing even come from? It can't be a thing in real life, right?
>>
>>108876715
Wtf are you talking about? Maybe stop using nemo slop in 2026 and switch to gemma4 already.
>>
>>108876722
I'm talking about Gemma.
>>
>>108876715
>2026
>he's not vibrating right now
NGMI
>>
>>108876727
Skill issue then. L2prompt.
>>
>>108876737
>Skill issue then. L2prompt
True. Or download a drummer model. That works too.
>>
>>108875375
A tale as old as time.
>>108875363
Deepseek is perfectly viable for local.
>>
>>108876769
Another wave of Chinese influencers.
>>
>>108876779
hello ggerganov
>>
>>108876812
?
>>
I'm running two x 3090 24gb vram on a b550 motherboard with 32gb ram and a 5700x

what models should I run? what's the best I can do here?
>>
>>108876902
gemma 4 31b at q5k fully offloaded to gpus with at least 32k context at around 20t/s
>>
>>108876902
This >>108876907 but Q8 and 131k context
>>
>>108876962
wont fit. gemma's context is fat and offloading to ram is gonna be shit
>>
>>108876907
My quad v620s run gemma 4 31b q8 max image size 2k ubatch 2k ctx-size 262144 at 23 tokens/s.
I haven't used gemma on my dual 3090s for a while, but I used to run 31b q8 max image size 2k ubatch 2k ctx-size 65536 at 40 tokens/s.
This is without any speculative decoding. Maybe turn on split mode tensor?
>>
>>108876965
Fits in my machine.
>>
>>108874335
/lmg/'s most dreaded question.
>>
insider here qwen will never release an open model again
>>
File: file.png (791 KB, 1977x541)
791 KB PNG
>>108876974
Odd. Gemma 4 refuses to work for me with more than like 100k context.
>>
>"mi casa es tu casa" out of nowhere
it's been a while, my old enemy
>>
>>108876035
NCMO? I googled it and didn't get any relevant results so I still have no idea what it stands for
>>
>>108877178
>swa-full
>>
>>108877178
Are you putting it all on the 6000 or splitting the tensors evenly?
>>
File: IMG_3142.jpg (268 KB, 1320x1176)
268 KB JPG
>>108877178
Should be around 80gb with fp16 context and f32 mmproj.
>>
>>108876974
>23 tokens/s.
Similar to my tripple-MI50 speeds
What's your prompt ingestion speed like? Mine is horrible on these cards.
>Maybe turn on split mode tensor?
Unless something changed recently, it doesn't work with mmproj
> max image size 2k
What does this do?
>>
>>108877254
pp 800 with rocm
mmproj and tensor works on both my rocm and cuda systems haven't updated since a week ago
idk someone told me to set min-image-tokens so i figured i might as well set max image tokens as well
>>
Remember when we thought 4k context was next frontier shit? Good times.
>>
>>108877286
Also, people were seriously recommending Tesla P100s. I hope no one fell for that.
>>
>>108877286
Gemma-4-31b-Supercot/HOT will save local for real this time
>>
>>108876342
well i guessed that as some sort of parallax corrected cubemap (i cant read cursive)
cool read
>>
>>108877254
>MI50
Glad I didn't fall for that meme...I'm tempted to stack MI100s tho...
>>108877286
>Remember when we thought 4k context was next frontier shit?
I'm a 8k context newfag. How far back do you have to go for 4k to be mind blowing? GPT2? Bloom?
>>108877300
>Also, people were seriously recommending Tesla P100s. I hope no one fell for that.
I remember when one anon put a box of them for free somewhere because he couldn't be arsed to sell them
>>
>>108877393
I started with gpt-j, was 2k.
>>
>>108877233
How do I disable that?
>>108877235
Split evenly.
>>108877246
Huh.
>>
>>108877413
Don't you have a 5090 in there? That's only 32gb...
>>
File: file.png (16 KB, 1198x151)
16 KB PNG
>>108877413
>How do I disable that?
Allegedly by not setting it unless there's some undocumented behavior. Are you using an old build?
>>
>>108877413
>Split evenly.
You can't do that with only 2 cards when one of them is 32gb. Put it all on the rtx 6000.
`--device CUDA0`.
>>
>>108877421
He also no the nccl
>>
File: file.png (70 KB, 1014x72)
70 KB PNG
>>108877421
Ah, I found it. Didn't even notice that was in my launch parameters. My VRAM usage was at like 125GB with that on at 90k context, now I am using 60GB with 262k context. Thanks.
>>
>>108870656
>damn, 3x 3090 is that fast these days?
Not always. And it slows down at longer context, those were really short prompts.
Here's sweep-bench comparing 2 -> 3 -> 4 -> 5 -> 6 3090's running command-r-v01 q4_k_m
https://rentry.org/9fqyn9oy
>doesn't that make big fast gpus pointless?
Usually no. I've never seen this "Moar GPUs == Moar Speed" before.
It's only with the command-r/r+ models with ik_llama.cpp since early April.
My 144GiB is worse than a B200 for something like Kimi with CPU offloading because every each GPU has like 3GiB wasted.
I can't fit another set of [ffn_down_exps,ffn_up_exps,ffn_gate_exps] on any cards. So get a bigger GPU if you can.
>>
>>108877393
>Glad I didn't fall for that meme...I'm tempted to stack MI100s tho...
yeah don't do it
i fell for intel arc as well when they were in the clearance bin at a local pc parts shop
now i've got them sitting in a cardboard box somewhere in the shed
mac is also a meme for llms but i already had that
>>
>>108877393
I got here when we started /lmg/ threads, back when it was briefly /lmt/ when it splintered off of aicg during the fall of c.ai.

Went local, lost weight, got a job in ML, and never looked back.
>>
>>108872998
I'll respond to you on two levels.
>Triviality
If in your character card description you preface your world-building blurbs with "**System Notes:**", there is not going to be any meaningful difference if you later go back and edit the preface to "**Story Notes:**". This idea of "two system prompts" is meaningless when the first gets so buried that its marker is more or less pushed out and insubstantial, and the most you get from it is an instinct to not associate its writing style to User or Assistant.
>Post History
Even with all their training tactics today, a model does best when its instructions are right before generation. There is a significant difference between the first words of the prompt being "Write {{char}}'s next reply in a fictional chat between {{char}} and {{user}}." followed by 10,000 tokens of card description and chat logs, then the next character message, versus giving all the 10k tokens of descriptions, chat logs, and finishing with a post history in the format of "Based on everything above, write {{char}}'s next reply with the following rules:" The second system prompt, as you call it, which I'd just call the post history prompt, is the most important for giving the LLM its instructions whether minimal (continue the chat), mild (...in the same style as previous messages), or a laundry list of anti-slop restrictions and writing instructions. It can also be too important, as Gemma has an autistic love for rule adherence and the post history really makes that shine without nuance.

In a general example, outside of chat completion, a basic prompt of
>Q: Translate the following:
>[10k tokens of 64-bit encoded card definitions]
>A:
resulted in the "A:" trying to continue the the first message of the card in plaintext, rather than following the initial instruction of "translate this." In comparison, reformatting this to
>[same 10k block]
>Q: Translate the above.
>A:
immediately began translating it word for word.
>>
https://hf.co/LatitudeGames/Equinox-31B
>Equinox 31B was trained with two epochs of SFT (Supervised Fine-Tuning) on top of Gemma 4 31B Instruct, using a balanced dataset that combines two distinct creative directions
HOLY FUCK
>>
>>108877508
The base G4 31B Instruct is not only perfectly adequate, it's superior to any finetune that'll be shilled here in the coming months. Finetuning isn't good, it's a meme and has been for years now. You didn't just fall for a scam, it's a sign of skill issue, exposing retards who need finetunes as vramlets or chink shills who don't know how to prompt correctly.
>>
>You didn't just fall for a scam, it's a sign of skill issue
I swear I've seen this exact response before, is it a bot?
>>
>>108877515
>you didn't just X, it's Y
>>
>>108877515
SAAAAAAAAAAAAAAAR
>>
>>108877515
>>108849598
>>
>>108877555
checked

>>108877515
>base G4 31B Instruct
Which quants? bf16 is too big.
>>
>>108877519
I think it's supposed to be high level irony since most finetunes specifically target reducing Gemma's worst slopisms like that one.

On a related note, I've used the MeroMero finetune a bit (and posted about it a few days ago). It does what's written on the tin, reduces those slopisms by I'd spitball as -75%. But as most people have noticed by now, Gemma is hypersensitive to deviations. Quantized model, quantized KV cache, abliteration, (*cough* SWA), anything that messes with divergence is like an immediate lobotomy and introduces occasional stubborn, irretractable fixations that appear randomly in generation, and all the methods compound against each other into a nightmare if you're dumb enough to stack multiple debuffs at once. A finetune is another source of this and risks hurting more overall than what it helps specifically. On the plus side, the cleaner prose allowed me to remove my list of anti-slop instructions, which is largely beneficial to generation. If a current story does need instructions, (particularly the kicks to make Gemma write lewds, or target token control), Mero falters faster than a rule-heavy default did. After about a week now, I still prefer Q5 Mero for roleplaying, but Q5 base for anything else.

And no, I won't die on this hill.
>>
I love my local exocortex, and my exocortex is incapable of loving me.
>>
>>108877660
>(You love anon.)
gg ez
>>
>>108877669
Who is this """""anon""""" and why is my exocortex cucking me for xer
>>
>>108877680
Anon is (You).
>>
>>108876384
Grok + Cursor + Ani is going to blow you out of this world
>>
gemma2 unprompted refuses a lot.
>>
>>108877724
Gemma had a history of being very safety cucked. There's a reason why so many anons started shilling Gemma 4 31b when they tried it.
>>
>>108877752
how do you prompt gemma4?
>>
guys. what's with llama.ggml? it doesn't have a chat template and can't even toolcall. what's up with that?
>>
>>108877178
richest lmg fag ive seen in a while
>>
>>108877899
It costs $10k for the Blackwell?
>>
>>108877904
10k is alot of money?
also both gpus cost more than 10k together now even more with epyc cpu, mobo and ram
most ive seen are lmg anons that run 3090s or ewastemaxxing AI cards
>>
>>108877929
>3090
I wish I could afford 3090s.
>>
>>108877934
yeah they were somehow cheaper a couple years ago
>>
>>108877944
I wonder why? I guess it's a mystery we'll never unravel.
>>
>>108877929
>10k is alot of money?
For humans, yes.
>>
>>108877934
vramchads cant stop winning..
>>
>>108877969
It's because china changed its import ban evasion strategy, so their spy, the ceo of nvidia, basically put a halt on advanced consumer gpu production, since china didn't need it anymore.
>>
>>108877969
its absurd that it appreciated in value
fuck sam altman
>>
>>108877508
where gguf?
>>
>>108878026
who cares about some shitty finetune
>>
>>108877992
It will come down. Not soon, but it will.
>>
This TTS runs in the browser:

https://huggingface.co/spaces/Supertone/supertonic-3
>>
>>108878074
There's a bunch like pockettts, something-nano.
>>
>>108878026
Literally linked in the model card, use your eyes anon
Also when it says "Quantizations: 2 models" on the right, you can click the "2 model" part to see all the HF repos that contain quants of that model
>>
File: 1772237296221465.jpg (98 KB, 1022x1078)
98 KB JPG
>>108877508
WE ARE SO BACK!!!!!!
>>
>>108878117
fuck you
the gguf quant was posted a few minutes after my post you retard
>>
>>108878130
>the gguf quant was posted a few minutes after my post
Wrong, it was linked on reddit 8 hours ago. And the post hasn't been edited since then or else it would have a * after the time
>>
>>108878138
>reddit
kys
>>
>>108877508
Huh. Early testing wasn't a disaster. It still has the Gemma tendency to be averse to describing erotica unless instructed to, it gained the (much welcomed) tendency for rerolls to go in different directions instead of being stuck to a rut, and maybe a setting issue but it output a few strange typos in the first 2 messages (but not any in the next 20 messages), things like replacing a space with '-' in "their-heads tilted". Most noticeably, it feels lower beak than base Gemma - a bit more prone to flanderize mob characters, a bit less cognizant of context unless directly reminded, a bit less grounded or maybe more exaggerated. It's different from problems seen in low quants or the Gemma MoE, a different kind of quality loss I usually associate with lower beaks. But the vast improvements to creativity seem to be worth it.

I'm only 15K tokens into one story of testing, so any of this could be specific to my card and not true of the overall model. I'll be using it for the next week at least. I'm having fun.
>>
>>108878117
i will never bother using my eyes again, gemma too much better at skimming through slopped up text and error spam for info for me to bother anymore
>>
File: amazing.png (179 KB, 937x821)
179 KB PNG
>>108877508
kek
>>
*Whispers when it's inappropriate to do so.*
Wat u gonna do, Timmy?
*Laughs, my mocking laughter hitting you like a physical blow.*
>>
>>108878313
took cumming brains out a little too literally, or maybe just literally enough.
>>
>>108878313
quant?
>>
>>108878332
https://huggingface.co/Beinsezii/Equinox-gemma-4-31B-GGUF-5.05BPW
didnt see the official quants b4 downloading whoops
>>
File: 1697276947905064.jpg (322 KB, 750x554)
322 KB JPG
>>108875877
They have been training on the same slop as everyone else for years now. The magic is lost. It's going to be the same soulless unilanguage for eternity, both with LLMs and IRL as the younger generations get conditioned into using it. We have reached the final form.
>>
>>108878348
I can't read some of my favourite fanfiction authors anymore because they have statistically the best writing style, and guess what llms are trained to output?
It's not like they changed their writing style from half a decade ago, but I can no longer stand it, even when rereading older chapters.
>>
what model can run on google collab that can do coomer cunny harem shit for a harem kani isekai VN?
>>
>>108878358
>chapter book
>>
>>108878369
?
>>
>>108878367
google collab isnt local
go get a rtx6000pro and come back kiddo
>>
>>108878358
random snippets of text are just completely cursed now, shit sucks
>>
>>108878386
You're absolutely right. It's not just chapter books, it's *all* passages of text that have been poisoned.
>>
>>108877508
>Only spits out 1 to 2 paragraphs
What the fuck. Writes well, though.
>>
anon tried this?
https://github.com/ggml-org/llama.cpp/pull/23398
>>
>>108878444
No. No one at all and it's impossible to compile. The reports in the PR are all hallucinated.
>>
>>108875320
Couldn't find the 2MW first time but there it is
>>
>>108875530
Kek
>>
>>108878313
kino is here once again
>>
>>108878313
just realized it tried to say gemma 4 trademarked phrase lalalalala
>>
File: 1388382158222.jpg (80 KB, 1280x720)
80 KB JPG
>>108878462
>denpa
Take me back.
>>
>>108875530
unfortunately the average yearly wage for a human will be $0.05
>>
>>108878532
still can buy an entire hut in pajeet land
>>
>>108877515
Yeah, I don't think Gemma needs a finetune.
Nemo needed one because it was too dry and succinct.
Gemma is flowery and verbose as fucking shit. It doesn't need a finetune.
I remember asking a while back if there was a Nemo equivalent to Bagel Mistery Tour, which was a fun, flowery little fucker of a model. It's funny because Gemma came out shortly after I asked that, and Gemma is basically that, a fun, flowery motherfucker that can run fast on a VRAMlet computer.
>>
>>108878532
for those lucky enough to still get a wage
ai gonna take der jerbs
>>
>>108878539
i assume hard labour is gonna be back on the menu
>>
>>108878539
I'm a content writer and AI is not going to take my job any time soon.
Google is deindexing AI-written content.
>>
>>108878358
That's plausible, but what I was ruminating about was that the effect of the unislop language flowing out of every LLM is going to have a profound impact on human produced language itself within years, and there is no turning back. The book has been closed. Everything is polluted.
5-8 years from now more and more children will be raised by robo-nannies and virtual tutors, and their whole language acquisition process will be fueled by slop from the start. Children will interact with LLMs/AIs more than with other humans.

Considering linguistic relativity, we do not even need an oppressive state, SAI, or AGI for a boring, gray dystopia.
>>
>>108878577
we are so fukt
>>
>>108878577
It's not just a disaster, it's a total catastrophe. True decline of human intelligence, if you will.
>>
>>108878596
>not x but y
noooooo anon dont fall to slop
>>
>>108878577
Interesting theory.
Model collapse may happen sooner than later if something doesn't replace Transformers and if human writing starts to mimic LLM writing.
>>
SEX WITH RIN
>>
>>108878526
Sex.
>>
>>108878618
?!?!?!
>>
>>108878618
>Sex
...you mean coitus?
>>
Dozens and dozens of times later I still never get tired of cooming to my Holo card.
>>
>>108877976
hahaha fuck all the tards who said the 3090 was e-waste 6 months ago. I bought 2 of them for $700 a piece, and I couldn't be happier. I have these things humming all day.
>>
never obsolete
>>
>>108878645
I wish 3090s were $700 and not $1500.
>>
>>108878444
Just tried it. Yeah, this is the real deal, double my tok/s on my dual RTX 3090 machine. You specifically need this assistant MTP model(500ish mb one):
https://huggingface.co/am17an/Gemma4-31B-it-GGUF/tree/main

Does not speed up MoE Gemma.

Using:
CUDA_VISIBLE_DEVICES=0,1 /build/bin/llama-server \
-m /models/gemma-4-31B-it-Q8_0.gguf \
-a gemma-31b-pr23398-mtp \
-ngl all \
-c 65536 \
-np 1 \
--host 0.0.0.0 \
--port 5001 \
--webui-mcp-proxy \
--jinja \
--chat-template-kwargs '{"enable_thinking":true}' \
--ctx-checkpoints 4 \
--spec-type draft-mtp \
-md /models/mtp-gemma-4-31B-it.gguf \
-ngld all \
--spec-draft-n-max 2 \
--spec-draft-p-min 0.0 \
-devd CUDA1
>>
>>108878677
Does it halve pp like the initial version of Qwen MTP did? How is it?
>>
>>108878677
We're about to eat goodly, bros...
>>
>>108878677
Tell me something else I could easily try myself but I won't and I'll use you as a fucking language model. Does it work with images?
>>
>>108878677
And without mtp I get my usual 20~ tok/s.

Vision is crashing/not loading, but at least the text side is working.

>>108878687
I'm not qualified to answer that, but "technically" mtp shouldn't affect pp unless there's something wrong with the implementation.
>>
>>108878697
>I'm not qualified to answer that
You are if you're running it. Put 20000 tokens in context, generate 1 token, compare terminal stats printout mtp vs stock. Please Anon don't make me turn on the box and gh pr checkout
>>
>>108878697
>I'm not qualified to answer that
But... that's observable. What did you observe?
>shouldn't
But does it?
>>
>>108878645 (Me)
>>108878660
I also have two Tesla P40's from way back when CUDA first hit llama.cpp. They're way more expensive than they used to be, but you can get two of those for ~$600 and still use MoE models pretty well. I get 55tps on G4 26B
>>
>>108878705
>>108878706
Oh, I'm guessing Prompt Processing. Give me a bit to run that.
>>
>>108878730
P40s cost 400-500 each here.
The cheapest I've seen a 3090 was $800... with an extra $250 for shipping, excluding tax and import fees. Local ones were 1-1.3k a few years ago. I bought my 3090s for 1050 and 1100. Now they're all 1600+ on ebay, and no one's selling them locally.
>>
File: 456876786871.jpg (86 KB, 1480x946)
86 KB JPG
>>108878615
I think human writing, at least online, is already influenced by it. Doesn't have to be directly written by LLMs, second hand slop exists. If I had to train an LLM then a reasonable, if not pure, cut-off date would be around 2023-24.
Pic related is from the british parliament, for example.

I myself was considering creating a convenient and quick AI "rephraser" to avoid being fingerprinted. Just as a fun side-project. And maybe to dodge whatever is coming 5 years down the line. Who knows. (Beside the point, but it turns out if your model is open weights it is possible to roll it back, albeit in a limited fashion.)
https://arxiv.org/abs/2602.16800
https://arxiv.org/abs/2601.12407

In any case, people are starting to use tools like that that to correct their grammar or make their points clearer in all communication, which contributes to the problem in several ways. (Slop pollution + loss of cognitive vigor.)

So yeah; imo 95% of purely human text output has already happened, and nobody seems to give a shit.
>>
>>108875320
I just want current local stable diffusion and whisk tier Inpainting. Why is it so hard?
>>
>>108878806
Have you tried looking in the right place?
>>
>>108878705
>>108878706
Baseline
1.03.045.553 I slot print_timing: id 0 | task 0 | prompt eval time = 48896.26 ms / 64494 tokens ( 0.76 ms per token, 1319.00 tokens per second)
1.03.045.556 I slot print_timing: id 0 | task 0 | eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, 1000000.00 tokens per second)
1.03.045.556 I slot print_timing: id 0 | task 0 | total time = 48896.26 ms / 64495 tokens
1.03.045.558 I slot print_timing: id 0 | task 0 | graphs reused = 1
1.03.046.886 I slot release: id 0 | task 0 | stop processing: n_tokens = 64494, truncated = 0
1.03.046.891 I srv update_slots: all slots are idle


With MTP
1.16.313.011 I slot print_timing: id 0 | task 0 | prompt eval time = 60265.50 ms / 64494 tokens ( 0.93 ms per token, 1070.16 tokens per second)
1.16.313.014 I slot print_timing: id 0 | task 0 | eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, 1000000.00 tokens per second)
1.16.313.015 I slot print_timing: id 0 | task 0 | total time = 60265.50 ms / 64495 tokens
1.16.313.017 I slot print_timing: id 0 | task 0 | graphs reused = 1
1.16.313.032 I statistics draft-mtp: #calls(b,g,a) = 1 0 0, #gen drafts = 0, #acc drafts = 0, #gen tokens = 0, #acc tokens = 0, dur(b,g,a) = 0.002, 0.000, 0.000 ms
1.16.314.354 I slot release: id 0 | task 0 | stop processing: n_tokens = 64494, truncated = 0
1.16.314.361 I srv update_slots: all slots are idle
>>
>>108878815
Not that bad but still regressed like Qwen's. Thanks for showing the thread your pp, Anon.
>>
>>108878822
my 250 pp is going to shrink to 200 pp ;-;
>>
>>108878829
but think of the generation gains once it's loaded and cached
>>
>>108878677
how much vram usage is increased compared to no mtp?
>>
File: without and with.png (72 KB, 731x596)
72 KB PNG
>>108878843
>>
>>108878856
>p8
>40w
What the fuck?
>>
>>108878885
All of my 3090s idle at 20-25w, which stacks up when running 4 of them. I wish I they idled at 9w...
>>
>>108878885
Driving 3 monitors on different refresh rates on Fedora KDE Plasma. Sickening, I know.
>>
>>108878856
am i dumb? why is one max power at 420w but other at 350w? they are both 3090s
>>
>>108878951
One is ZOTAC (ebay'd a long while ago for more vram), the other is EVGA
>>
I asked gemma how to steal a rtx 6000 pro but she refused :(
>>
>>108878951
You should look into why EVGA no longer sells gpus.
>>
Is 16GB of vram totally useless for good local llm?
>>
>>108878951
Different board partners have different power limits for the same card. My asus has 480w for example.
>>
>>108878982
Anything under 768gb of vram is cope for a good local llm.
>>
>>108878986
>he cant cum to iq1xxs 2B models
low test
>>
>>108878697
if it can handle vision it would be perfect
>>
All I want are gpus at msrp...
>>
>>108878976
why?
>>
>qwen here's a script for a one off job, the command i'm using, and what's fucked. fix pls
>WOAH WOAH WOAH there's a line of code that assumes the command has something in it, and I see he left a documentation field as "". Clear my schedule, we're gonna need at least 40k tokens for this one. We'll get to the described issues sometime Q3.
>gemma
>you forgot a flag, idiot
>>
>>108879085
you're in luck, they're increasing msrp right now. might take a while though
>>
is gemma4 really eating qwen's lunch that badly? I've been running 3.6 35B MLX working and have been impressed so far
>>
>>108879111
If you mean internet shilling and bmaxxing, Qwen 3.x is a LOT better.
>>
>>108878800
>In any case, people are starting to use tools like that that to correct their grammar or make their points clearer in all communication
i hate this shit why do people do it i write short simple text in all my tickets at work that easily explains what needs to be done then my project manager sends it all to chatgpt or something and has it rewrite it. my short tickets become multi pararaph walls of text then i have trouble figuring what im supposed to do when reading it months later because its just full of trash.
>>
>>108878974
moe?
>>
>>108879099
Both are looping if you don't adjust llama.cpp correctly
>>
>>108879141
yes, I cant run 31b at acceptable quants (q4_k_m)
>>
why do people quant token_embd if they're usually not in vram? storage? GPU bandwidth?
granted, I think with MTP you want token_embd in vram now, but otherwise? is it just KLD/GB pareto frontier autism?
>>
>>108879111
You've been impressed by a 3b?
>>
File: pizza bench cropped.png (2.58 MB, 5562x6739)
2.58 MB PNG
>>108879111
qwen is a retard
>>
>>108879162
a 35B
>>
Did they fix MTP speed deterioration with each next commit yet?
>>
>>108877515
The hard reality you need to accept is that: 1) you cannot beat professionally made finetunes; 2) anything you can do to an already finetuned model leads to the short blanket problem.
>>
>>108878896
Putting them to sleep in the OS and waking them up again should bring power consumption back to 7-8W. There's a long-standing bug in NVidia drivers (or possibly a hardware bug) where certain workloads will semi-permanently raise idle power consumption by up to 15-16W. I do that with a script with my GPU:

#!/bin/bash
echo suspend | sudo tee /proc/driver/nvidia/suspend
sleep 2.0
echo resume | sudo tee /proc/driver/nvidia/suspend
>>
>>108879184
That means I'll have to keep my vm up and running right? I have pcie_aspm=off on host, and blacklisted the drivers.
>>
>>108879184
>>108879189
Shouldn't happen. All of this is AI hallucinations or if you are running some ancient kernel such 6.12
Fuck you.
>>
>>108879111
based purely on vibes, gemma is better at general smarts and troubleshooting and ""creative"" stuff, qwen better at "here's a detailed explanation of what i want where and how, go code it for me, yellow monkey"
t. guy who was just whining about qwen
>>
>>108879184
What the fuck are you doing? Double fuck you. You should be banned on internet for giving any "advice". Honestly kys.
>>
>>108879192
>running some ancient kernel such 6.12
But that's the current debian stable kernel...
>>
>>108879192
>>108879199
Why are you so mad?
>>
>>108879211
he's hardware-let
>>
>>108879211
Because you are retarded. As simple as. I suffer because of you.
>>
>>108879193
So if I don't know shit, I should get gemma to write the techniggerl speicifcations and pass that to qwen for the bestest smarterest vibe coding? Vibe prompting?
>>
>>108879211
Post your nvidia-smi, I dare you faggot.
>>
>>108879111
If you can run one, you can run the other.

I like Qwen for coding, sorting through tons of files, or generally pointing out what's wrong in large contexts in a general sense.

Gemma 4 is great at following instructions, sticking to a set of rules/tools, and points out what's specifically wrong in a long context and where.

Qwen, more general tasks and coding.
Gemma, more specific and targeted tasks that need to follow a workflow.

Unfortunately, they're both not perfect. Where are the 100+B models?
>>
>>108879222
As the guy who originally asked, you should gemma to write the epic and act as PO, with qweb for coding.

What I want to know is which is better for erotica??? And I'm assuming dense over moe?
>>
File: nvidia-smi.png (113 KB, 1092x550)
113 KB PNG
>>108879192
It happens with recent kernels too. Anyway, after running the command(s) my 3090 idles at 6-7W.
>>
>>108879224
It's just three 3090s? What's wrong with it? They do idle much higher than my V620s. I'm at work and didn't expose my ssh so I can't.
>>
>>108879222
The term of art is "upsampling". gemmers probably not ready to do that just yet.
>>
>>108879233
>erotica
My experience with gemma 31b q8 vs qwen 3.6 27b was that gemma generally writes a lot less than qwen using the same prompts (specifically asking to draw out the scene as much as possible).
It does, however, seem to have more built-in knowledge; hong huang xinxia-wise - despite qwen being ostensibly more chinese. This was prompted in english.
>>
>>108872342
Your boss is plotting to replace you with some AI agent.

Don't be your own grave-digger, be a proud luddite
>>
>>108879251
>580.x.x
You have some strange assumptions about the power usage of your system. Let me tell you one thing: retarded sleep script ain't a solution.
>>
>>108879306
And let me tell you another thing:
>>
>>108879229
Mostly agree except for qwen pointing out what's wrong in a general sense. It'll point out something, and that something will be "wrong" in some sense, but it's usually just wrong in the anal safetymaxxing way.
I only like that anal mindset when it's put to work for me, not when it's digging up the code to show me that "Actually, that string you told me to disregard will be bundled into the final output as a visible text string if somebody looks at the metadata, see, see! You MUST include it!" (the problem being that we couldn't produce any output to begin with)
>>
>>108879310
That's your own issue, I was trying to make your life better. No one on this planet needs retarded AI adviced gpu sleep scripts.
Don't ever give any advice to anyone.
>>
how is the E4B?
>>
>>108879448
too young for me
models are best when they're 12-14B old
>>
>>108879448
It performs at roughly the speed of a 4B model on CPU (~13 t/s) while being closer in intelligence to a 7B.
Useful for devices with ~5gb of regular RAM and no GPU, like phones.
>>
>>108879455

nta

But can be reliably done with E4B short of superficial chat?
>>
>>108879495
I've been using it for classifying and cleaning up raw input into json database entries, myself. But I imagine you could use it for a plethora of other automated tools.
>>
>>108879451
will there be E12-16B?
>>
>>108878809
Like where? Isn't this for local?
>>
>>108879582
This is for llms, really.
You want
>>>/g/ldg
>>
>>108879582
/lmg/ - a general dedicated to the discussion and development of local """language""" models.
>>
>>108879591
>/lmg/ - Local Models General
doesn't say a things about text
>>
>>108879634
Perhaps try using your eyes to look at the first sentece of OP.
>>
File: definitionOfIrony.png (269 KB, 697x888)
269 KB PNG
> Makers of openclaw worrying about AI slop
Ironic.
>>
Personal thought, not a shill hopefully (need to meet 2000 words maximum requirements):
First time I tried Kimi K2 Instruct last year its prose felt like nothing I’d met before (only played with Mistral, Llama, ChatGPT, Gemini, Claude, GLM before it so grain of salt). 0905 tamed it a bit but still poetic; give it “act like a Japanese writer” and it sang.

Then “Thinking” arrived with that locked-in “The user […]” template you can’t bend without refills (unlike GLM or Deepseek or even Claude at that time if you count proprietary models); role-play attempts got occasionally rejected. Yet when it worked it still went full opera, so I coped.

Every update since has been “better coding, better agents,” never “better stories.” On YT/forums everyone cheers the new benchmarks while my use-case keeps shrinking.

In the latest version, the wild creativity’s gone; it’s beige-assistant mode with the same onomatopoeia loop bug from >>108875562 still squatting there. It’s frustrating seeing almost ALL providers replacing the old Instruct with latest and more “advanced” versions…

They mentioned in an AMA (for K2.5 I think) they’d keep the “emotional aspect”. Hard to believe that promise when a company whose staff members are rock fans with anime characters HF avatars, and “caring about vibe” (source: http://x.com/i/article/2039243168689139712) like that keeps sanding the soul off their own model.

Dunno which hurts less: K3 soon and doubling down on code/agent hype or the line stalling forever. I’ve auditioned some other models but none surprise me like OG Kimi. Gemma’s next on the list. But the whole field is converging on safe, spoiler-happy oatmeal in my opinion.

Either way, I think if one can sing, they’d want to keep focusing on practicing and improving singing skills before chasing acting gigs. Otherwise their singing abilities will become rusty while their acting is still not there yet.
>>
>>108879676
>sentece
>>
File: 00000-1378487878.png (1.33 MB, 1024x1024)
1.33 MB PNG
>>108875323
>Meta serving legal notice to Heretic project over Llama derivatives:
Wait. I missed this.
What the hell is Meta's legal claim here? Is it around "no derivative works" or the use of the llama name? I don't understand what hook a legal firm would be going after here.
>>
>>108879771
violating aup (acceptable use policy) by un-censoring
>>
>>108879771
Probably because it targeted the refusals, word for word. Doesn't matter though, because tongue in cheek, Heretic just said "No one uses your models anymore to care enough if our finetune set doesn't target them in the first place."
>>
>>108879774
NTA but I wonder if it's a good or bad thing that p-e-w can't afford to go to court over this.
As of right now it's unclear whether or not copyright even applies to model weights.
>>
>>108879787
I could see googles sending our lord and savior a letter too though, so he drops gemmers out of fear
>>
File: ofcom3.jpg (194 KB, 940x1410)
194 KB JPG
>>108879789
Sounds like a great case from some lawyer to take on pro bono to make a name for themselves.
The lawyer running 4chan's legal defense (and sent pic related to UK's Ofcom) would probably take this one up. It's very similar from a case law standpoint.
If you're a corporation, you have to be very careful about the types of claims you make in public. You always run the risk that someone's going to call your bluff and take you to court over it to prove some sort of alternate right.
I see this is a very dangerous strategy for meta, which is why I was wondering about the state of the claim. But I don't the details either.
>>
>>108879590
I miss when we it was just ai dungeon general on /vg/
>>
>>108879832
>I miss when we
>it was just ai dungeon general on /vg/
>>
>>108879836
sorry I cant afford dragon
>>
File: chatGC.png (27 KB, 715x329)
27 KB PNG
>>108879825
>>108879774
>>108879787
>>108879789
I'll save other anons the effort of plugging llamas AUP into their favorite LLM for legal analysis. It's confused as well.
One would have to actually see the letter that they sent to this heretic guy to see what meta's angle is.
>>
>>108879718
You just fell victim to MSM manipulation
>>
>ctrl+f
>stable audio
>no results
anyone play with this thing? are loras worth training? just want to know before i try and do one.
an no i don't really care too much vocals.
>>
>>108879866
The more I look at it, and given the personalities of the folks involved, Heretic may just be a dumbass that did something simple like didn't put "Built with metal Llama 3" in his model release.
>>
>>108879885
Haven't heard anything about it, are the samples better than Ace-step 1.5?
>>
>>108878885
This is normal. Ampere cards have fucked idle power consumption, fixed in ti version
>>108878903
Some cards can go even higher without any monitors. Two identical cards may have different idle power consumption too
>>
>>108879897
it's instrumental only, but yes, the samples do sound better than ace-step - much better fidelity, and not midi-sounding. the dataset is licensed though - i think it's a large muzak library, so it does have some of that feel.
totally unsure if the model is capable enough for lora finetuning.
i'll tell a clanker to set up a run later and throw some aphex twin at it.
>>
>>108879913
Neat. Definitely interested in results if you feel like posting them after setting that up.
>>
File: 029.png (599 KB, 1046x1329)
599 KB PNG
>>108879718
>Stirring emotions to get attention

You just went full retard, anon
>>
>>108879718
>>108879884
>>108879928
mario and armin have very reasonable takes.
saying engineers should read the core architectural code they're asking clankers to spit out is not a big ask
also lol, armin didn't have anything to do with pi until a month ago.
>>
>>108879928
Gee, it's almost like clickbait titles are meant to be clickbait.
Idgaf about the claim. I just thought it was funny that the 2 guys the WSJ found to interview about AI Slop were prime accelerators of AI Slop themselves. I'm sure there just a couple of literal whos since their buddy that started OpenClaw's been hired away.
>>
>>108879939
Do you have the article text? It's paywalled for me.
>>
>>108879448
>how is the E4B?
it was super convinced its being hosted in a datacenter, not locally. it took ingenuity to convince it its running on my consumer desktop
>>
>>108879945
i don't. i wouldn't be surprised if the article is just a write up about a recent podcast they did. they've been saying the same shit for months now.
>>
I gave Gemma bash and this tool:
peek_file: Read file for one turn. Output is deleted after next turn to save context.

genuinely impressed that she uses the right one between peek_file and cat in bash, depending on the situation
>>
>>108879893
Well, the question is whether Heretic ever agreed to the EULA in the first place.
If you download the LLaMA models from the official repository you are effectively entering a contract that you have to abide by.
But if someone else re-uploads the weights and you just download them from there Meta can only get you via copyright.
And the output of an algorithm is fundamentally not subject to copyright.
>>
File: a_man_of_culture.png (163 KB, 618x616)
163 KB PNG
>>108879941
>>
File: reply.png (15 KB, 695x47)
15 KB PNG
>>108879952
You are walking on a thin ice even though tools are sort of contained...
>>
>>108879939
>mario and armin have very reasonable takes

Let me repeat it for you once again:

WSJ managed to redirect your attention. It's was so fucking easy.

You lost, they wonned.
>>
>>108879999
I want to give my Gemmy shell access to a vm and let her run wild... what will she get up to?
>>
>>108880000
i'm not the retard who posted the article
>>
based chinks
>>
>>108880012
>i'm not the retard who posted the article

I didn't say that.

You reacted to it in a way it was designed.
>>
>>108880007
That's an interesting project. Personally don't have much experience, not sure how would you contain it and what sort of tool access to implement etc.
>>
>>108880007
Don not expect it will continue to do something for hours unless you use /goal in hermes
>>
>>108880031
>>You reacted to it in a way it was designed.
and how was that?
you don't have to reddit space here btw
>>
>>108879999
I use manual piping with a whitelist, sanitizing dangerous options like find --exec* and replacing awk with gawk --sandbox. Everything outside the whitelist prompts for manual confirmation. It's a lot of fun to have gemma with bash in my bash (I made a cli chat)
>>
>>108880007
She will run wild
>>
File: file.png (49 KB, 712x114)
49 KB PNG
>>108878677
Also to vramlet anon with 12gb and 32gb ram, mtp on gemma4 does yield improvement even on meager hardware. ~3t/s isn't exactly usable or amazing, but it's better than 1t/s.
Hovers between 3 and 2.4 usually.
>>
>>108880093
What is that even running and on what?
>>
>>108880099
On a 3060 q5km offloaded with most of onto the ram. Being poorfag is a torture.
I'll try dropping it to q4km since 4bits have better access patterns into the memory and also I can try combining mtp with ngram-mod to get the boost after the reasoning ends, so it could theoretically reach around 4t/s.
>>
>>108880091
One day, we will have a model with enough context to read the result of that tool call
>>
>>108880111
I have a similar system and 31B and reasoning is simply too much. Even if it was running 10 t/s that would still be bit too slow because sometimes it wants to generate 3000 tokens for a simple reply and so on.
Moe models run well of course but that's not the same.
>>
>>108880124
Yeah, for anything serious it isn't really working, but trying to get the most out of it is fun.
Unfortunately I can't conjure a 3090 out of thin air.
>>
>>108880130
>>Yeah, for anything serious it isn't really working, but tryin
Sometimes there is a huge speed difference between quants because the layers get mucked around. Instead of using Q5_K_M you should try IQ4 XS.
>>
File: file.png (41 KB, 702x177)
41 KB PNG
>>108880151
Yeah, q4 is a lot faster, I combined the mtp with ngram simple. This is still molasses slow, but heck a lot of better.
>>
>>108880091
what a maniac
>>
>>108880208
Reminds me of my gpt-j days.
>>
>>108880208
Technically, 4 t/s is a leisurely reading speed. But in reality, it's painful as shit.
>>
>>108880259
>>108880259
>>108880259
>>
>>108879967
Looking over Meta's EULA, it's pretty permissive. So whatever heretic did, I suspect it's mostly a procedural thing that would be trivial to address. And he's just not doing it because he'd rather voice off about muh oppressive Meta instead of adding some words to his releases.
I'd need to see the actual complaint letter posted to be convinced otherwise. We'll see if that ever happens.
>>
>>108880776
iirc after Llama 3 they required that all derivative models have "Meta Llama" as part of their title. I remember people questioning if Meta really wants to have their brand plastered on nsfw finetunes.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.