/g/ - Technology






File: 1729161978418371.jpg (607 KB, 1080x1920)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106582475 & >>106575202

►News
>(09/14) model : add grok-2 support #15539 merged: https://github.com/ggml-org/llama.cpp/pull/15539
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) Ling & Ring mini 2.0 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>106582475

--Paper: Steering MoE LLMs via expert activation/deactivation for behavior control:
>106586569 >106586649 >106586696
--Papers:
>106589525
--Node-based agent circuit for multi-model daydreaming experiments:
>106591301 >106591335 >106591411 >106591447 >106591518 >106591560 >106591683
--DDR5 RAM purchase recommendation for glm air over waiting for Arc B60:
>106585865 >106585907 >106586028 >106586157 >106586691 >106587973 >106588740 >106588044
--MoE architecture enables larger models to be faster through selective parameter activation:
>106587275 >106587302 >106587405 >106587419
--glm 4.5 air setup issues in Silly Tavern template configuration:
>106586816 >106586886 >106587013 >106587027
--Qwen model dataset imbalances and performance tradeoffs:
>106582623 >106582643 >106583124 >106583138 >106583143 >106583155 >106586595 >106583147 >106592024 >106592033 >106592110 >106592242
--VibeVoice model availability, quality tradeoffs, and reverse-engineering challenges:
>106585909 >106585930 >106585940 >106588461 >106586039 >106586587 >106586610 >106586647 >106587720 >106586704 >106587007 >106587090 >106588243
--CPU offloading performance trade-offs for mid-sized MOE models:
>106583262 >106583338
--IndexTTS 2 speed and interface improvements for text-to-speech:
>106585295 >106585756
--Grok-2 support merged into llama.cpp:
>106587526 >106589842 >106589942 >106589949 >106590115
--Critique of flawed AI-generated writing despite model advancements:
>106592247
--ROCm 7.0 RC1 boosts AMD's AI performance, challenging NVIDIA dominance:
>106589235 >106589359 >106589362
--Parameter tuning suggestions for K2 model version differences:
>106584425 >106584478 >106585603
--Miku (free space):
>106584024 >106584226 >106584417 >106587589 >106587800 >106589360 >106589741 >106589764 >106592033 >106589913

►Recent Highlight Posts from the Previous Thread: >>106582480

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
qwenext goofs???????????????????????????????????????
>>
>>106593104
I want to do backpropagation with her, if you know what I mean.
>>
File: 1744660410909828.jpg (1.66 MB, 1166x2527)
>OP image is a random non-slop Miku I posted a few threads ago
>>
>>106593104
PANTYHOSE FEET
>>
>>106593132
Boil some rice, put it on a plate and let it dry and then eat it for a similar experience
>>
>>106593164
yes but
WHERE ARE THE GOOFS?
>>
>>106593180
goofs for this feel?
>>
>>106593180
I look like this, say this, and also fail to quote posts
>>
Jaks are a sign of a diseased mind.
>>
File: 1737232912753522.png (996 KB, 1648x1300)
>>106593196
>>
what prompts this schizophrenia? I just want my hecking wholesomechungus 'THE CAKE IS A LE LIE' qwen 80b goofs
>>
>>106593208
Me when using gpt oss
>>
this thread ggoofy af
>>
The melting man is back
he's much softer than before
did you borrow a personality
or did you steal it all on your own?
>>
File: 1755970028871316.jpg (520 KB, 1824x1248)
>>
File: cypher.jpg (51 KB, 768x384)
>decide to take a break from /lmg/ and doomscroll on twitter for a bit.
>it's not X, it's Y
>the smell of stale cigarette smoke and regrets
>fake greentext pasta spaced into paragraphs
>you hit on the core of the issue
>shivers, ozone, Elara, emojis
how do I unsee
>>
File: 1751389967120259.png (351 KB, 503x461)
https://files.catbox.moe/eegitb.jpg
>>
>>106593302
I fell for it last time, ain't happening again.
>>
>>106593302
thicku miku
>>
>>106593302
nigga that's nuts
>>
>>106593302
Meh.
>>
https://old.reddit.com/r/LocalLLaMA/comments/1nhgd9k/the_glm_team_dropped_me_a_mail/
lol glm has employees doing social media engagement
wonder if one of them is among the people shitting this thread right now
>>
OP just delete thread if you can
>>
>>106593393
nah, fuck qatroons
>>
>>106593386
You are even more gullible than reddit.
Or something worse.
>>
>>106593393
Let the retard seethe. It's not like he can do anything.
>>
>>106593386
why would GLM shit up the thread where their models are praised?
>>106593413
what's the shitter even angry about? Is it the thread mascot debate again?
>>
>>106591301
I was thinking of fucking around with those sorts of workflows to see if I can make a smaller model perform better by making it go through steps before providing a final response. Almost like a thinking workflow that tries to extract as much information from the big picture to then focus on the relevant details and the like.
I got caught up with other projects and ended up forgetting about that.
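Roughly what I mean, as a sketch (assumes a llama.cpp-style OpenAI-compatible server on localhost:8080; the endpoint, prompts and names are just placeholders):

import requests

API = "http://localhost:8080/v1/chat/completions"  # assumed local server

def ask(messages, max_tokens=512):
    # one round trip to the model
    r = requests.post(API, json={"messages": messages, "max_tokens": max_tokens})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "...the chat so far..."}]

# pass 1: force the model to survey the big picture first
notes = ask(history + [{"role": "user", "content":
    "List the key facts, characters and constraints so far. Notes only, no reply."}])

# pass 2: pin those notes in context and ask for the real response
reply = ask(history + [
    {"role": "system", "content": "Working notes:\n" + notes},
    {"role": "user", "content": "Now write your actual response."},
])
print(reply)

The point being that the second pass answers with the extracted notes already in context instead of having to do both jobs at once.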
>>
>>106593420
fuck your thread culture bullshit
>>
>>106593421
What's the UI in the quoted reply? Seems cool.
>>106593424
fuck you I didn't even advocate for "thread culture" I was just asking a question you dork
>>
I ask again, just in case. Can "Mistral-Nemo-Instruct-2407-GGUF" handle beyond 16K context?
>>
>>106593429
Try it. Only you can know if it can handle it to your satisfaction.
>>
>>106593429
Technically yes but realistically no. Just try it out for yourself, the model could fit on a 6G card ffs
>>
>>106593427
>What's the UI in the quoted reply?
Not sure, but I know of two UIs that can do that kind of thing, NoAssTavern (simpler and recommended), and astrsk (don't even download it, has telemetry and shit).
>>
>>106593429
it creates mustard gas
>>
>>106593429
No.
>>
>>106593429
Yes, of course.
It will perform worse than it does at, say, 4k context, however.
>>
>>106593420
>why would GLM shit up the thread where their models are praised?
you assumed I was talking about the meme spammer. I don't even pay attention to his image spam, it doesn't register in my eyes, image posters are to be ignored.
I was talking about people who praise this garbage model like you, you are the reason this is a garbage thread
spammer is just a minor annoyance that will go away after a b&, the retards never go away though
>>
>>106593462
>image posters are to be ignored.
sir this is image baords
>>
>>106593444
Huh, I stumbled upon another interesting UI called "talemate" mentioned in one of the NoAssTavern's issues.
https://github.com/vegu-ai/talemate
>>106593462
Every model smaller than Deepseek is garbo, get a grip. Smaller models like Air are the only thing most people can run. Fucking hell, you see how often Rocinante gets mentioned here? What is there to discuss with "non-shit" models if nobody can run them you dickweed?
>>
>>106593487
>talemate
Alright, that looks promising.
>>
>>106593487
>if nobody can run them
then let's close this so called local model general if no one is even doing local?
>>
>>106593504
>if no one is even doing local
Nobody is using anything smaller than deepseek? news to me...
>>
>>106593504
I am running the local sir
GLM chan very large
>>
>>106593511
deepseek 8b
>>
>>106593511
>Every model smaller than Deepseek is garbo
you said it yourself it's time to stop
>>
After I stopped shitposting in this thread the quality of it became even worse. I can't believe it.
>>
>>106593526
You're absolutely right! This really delves into the tapestry of how shit lmg is!
>>
File: 1746722380902789.mp4 (3.82 MB, 480x852)
>>106593420
like kids need a reason to be angry
>>
>>106593539
>itt raises the kid experience
>>
>>106593525
It's garbo compared to large, cloud-hosted models but it's still fun. If the only car you have is a shitbox, do you throw it away? Come on, man.
>>
File: 1755964542474429.jpg (186 KB, 768x1024)
>>106593393
Delete your posts
>>
>>106593553
>If the only car you have is a shitbox, do you throw it away?
yes, take the bus and train (API) like a normal person
>>
>>106593525
maybe I love garbo
>>
>>106593539
While it doesn't change my position on it at all, I suddenly understand where the proponents of age verification are coming from.
>>
>>106593566
That wouldn't help tho as clearly an adult is helping and encouraging the corruption
>>
>>106593301
You cannot close your eyes once they've been opened
>>
>>106593566
lmao you actually think age checks are to protect kids?
>>
>>106593575
anon is you okay, you can close the eyes
>>
>>106593559
Nah I think I'll stick to my shitbox. I can drive it when and wherever I want, and it won't suddenly change routes and timetables. But I support your ability to choose, just don't pretend like the only options are public transport or a lambo...
>>
>>106593301
If you get into imagegen, you'll see it everywhere.
>>
>>106593574
It wouldn't, but I get the emotional reaction.
>>
Thanks this is very helpfuls.
>>
>>106593612
I do not like this miku
>>
>>106593587
Im fine. Thanks for asking
>>
can i get a short stack miku pls
>>
File: 1754402174485487.png (2.62 MB, 1024x1536)
>>106593629
>>
>>106593690
best xhe can steal is shart miku
>>
File: 1757278579632716.jpg (1.43 MB, 2000x1500)
>>106593690
No. You get a baby Miku instead.
>>
Is NoobAI still the meta or have things moved on
>>
>>106593694
>>106593698
>>106593704
my day is ruined
>>
>>106593709
ponyv7 releases this month
>>
>>106593743
oh? can it be downloaded or is it online only?
>>
>>106593743
back to your board barney
>>
>>106593743
more sdxl slop?
>>
>>106593774
as opposed to what then?
>>
>>106593777
you haven't heard about the current best local model called chroma?
>>
>>106593777
idk, I haven't kept up with image gen, I wish we had something integrated with LLMs instead of CLIP
>>
>>106593777
Chroma SOTA 4futures!
>>
>>106593787
Can it match noobAI/pony for character stuff?
>>
>>106593787
That's just a rip off of ligma
>>
>>106593774
Wasn't it gonna be based on some random shit nobody has ever used
>AuraFlow
Yep.
>>
>>106593756
weights
>>106593774
it's based on auraflow
>>
>>106593813
>weights
ok, can it be downloaded or is it online only?
>>
>>106593820
Yes you will be able to download it
>>
>>106593829
Thank you.
>>
>>106593832
You're not welcome
>>
>>106593832
You're free to leave
>>
File: 1736470160461856.png (1.75 MB, 894x766)
>>106593104
Good morning /lmg/ frens. I've got a question:

So is it pretty much confirmed fact that you HAVE to use at least a 12B model in order for it to be "smart"? (Not forgetting important details mentioned earlier in the context)? Based on my own testing 7B - 8B models struggle immensely with this. What has your experience been like with the different sized parameter models?
>>
>>106593869
If you don't train on The Entire Internet a simple 4B is more than enough for the narrow use case of ERP.
>>
>>106593104
mikubutt
>>
>>106593914
should've been a miku short stack
>>
>>106593869
I wouldn't say smart, but 12b models are about the starting point where you don't need to hold their hand for every reply to get a usable output.
>>
>>106593919
*miku shart stacked
>>
>>106593539
He's just like me except I'm using a pc
>>
VRAMlets:
>image generation
pretty good
>voice cloning/TTS
okay
>text generation (simple)
decent
>text generation (advanced)
really bad
>>
>>106593930
What is this (advanced) thing about?
>>
>>106593935
DeepSeek K2 4.5
>>
>>106593869
I don't think 12B is enough, Nemo is pretty dumb too. GLM-air often mistakes who did what and struggles with theory of mind (secret keeping test and such). I'm not cool enough to run larger models though.
>Not forgetting important details mentioned earlier in the context
This one in particular is about specific context training and architecture, not really about parameter size.
>>
>>106593935
not brain dead
>>
>>106593942
>GLM-air often mistakes who did what and struggles with theory of mind (secret keeping test and such)
Mistral Small 24b and Gemma 27b are guilty of both these things as well.
>>
>>106593942
>GLM-air often mistakes who did what
sounds like prompt format issue that nemo used to have early on, probably broken implementation as usual
>>
Holy schizo
>>
Cursed schizo
>>
File: 1727769022327347.png (1.18 MB, 914x594)
>>106593953
>>106593958
>>106593942
>>106593869
>>106593881
>>106593924

So I guess we have to accept that ALL local LLMs will fuck up in some way, shape, or form? What contributes more to how BADLY it fucks up: parameter size, architecture, and/or training methods?
>>
>>106593857
>>106593862
Bawww.
>>
>>106593958
I mostly run it in text completion mode
can't have prompt format issues if you don't format your prompts.
>>
>>106593942
>GLM-air often mistakes who did what and struggles with theory of mind (secret keeping test and such).
Funny. I find that it does pretty well in keeping secrets.
Granted, I do prefill the thinking block with instructions to consider exactly those things, which might have some adverse effects in other areas I guess, but still.
To me, the one strong point about GLM is that it actually follows its thinking, instead of something like Qwen that might draft a whole plan in the thinking block then reply with something completely different, even with guidance.
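The prefill itself is nothing fancy, roughly this (a sketch; assumes a GLM-style <think> tag and llama.cpp's /completion endpoint, details vary by model and template):

import requests

transcript = "..."  # chat history already rendered in the model's own template
prefill = ("<think>Before replying I need to track: who knows which secrets, "
           "where each character physically is, and what was just said. ")

r = requests.post("http://localhost:8080/completion",
                  json={"prompt": transcript + prefill, "n_predict": 512})
print(prefill + r.json()["content"])  # the model continues from inside the think block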
>>
>>106594021
And for clarification I'm mostly referring to forgetting details right after you mentioned something, temporal coherence (if a system prompt or previous prompt mentions they're in a park, they should stay in the park until stated otherwise or the LLM makes a transition that makes sense), not randomly switching the genders of main characters (this one really likes doing that: >>106593869), etc
>>
>>106594021
>What contributes more to how BADLY it fucks up: parameter size, architecture, and/or training methods?
yes
>>
>>106594021
>What contributes more to how BADLY it fucks up: parameter size, architecture, and/or training methods?
Training on The Entire Internet will do that to you.
>>
has someone scrapped AO3 to create a dataset?
>>
>>106594111
it's already on most models and yes they did to creators dismay and threats
>>
>>106594111
IDK if it's specifically from AO3 or from other sites too, but here's the closest thing I could find to something like that that hasn't been nuked

https://huggingface.co/datasets/mrcuddle/NSFW-Stories-JsonL

It's not formatted to actually be useful for training but it does have a bunch of raw stories.
>>
>>106594146
https://archive.org/details/AO3_final_location
>>
>>106594111
it's better to just do it yourself so you can filter it however you like. it's like 40% gay porn by tag and 50% Harry Potter by universe. it needs balancing if you want it to be useful.
>>
File: wild-macintosh.jpg (223 KB, 1125x741)
I thought I could get away with running an unquanted <4B model CPU-only on an old machine.
Nope, absolutely unusable.
Edge AI Status: Meme.
>>
>>106593869
Again, your prompting format is all wrong, if that's Llama 3.
>>
>>106594126
Gemma 2/3 and Mistral Small, from what I've tested, didn't appear to be trained on the ones explicitly tagged as "Explicit" or "Underage".
>>
>>106594305
It isn't. Elaborate further if you're certain it is. If you're going to tell someone something is fucked up with the hopes they will unfuck it, at least explain WHY....
>>
>>106594319
i mean obviously, why train on low quality illegal shit, the classifier correctly said hell no to that sick shit
>>
>>106594324
https://www.llama.com/docs/model-cards-and-prompt-formats/meta-llama-3/
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Bonjour! The capital of France is Paris!<|eot_id|><|start_header_id|>user<|end_header_id|>

What can I do there?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Paris, the City of Light, offers a romantic getaway with must-see attractions like the Eiffel Tower and Louvre Museum, romantic experiences like river cruises and charming neighborhoods, and delicious food and drink options, with helpful tips for making the most of your trip.<|eot_id|><|start_header_id|>user<|end_header_id|>

Give me a detailed list of the attractions I should visit, and time it takes in each one, to plan my trip accordingly.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
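And a quick sketch of assembling that by hand (Python, illustrative only), since the blank line after each <|end_header_id|> is the part that's easy to miss:

def llama3_prompt(system, turns):
    # turns: list of (role, text) pairs, role is "user" or "assistant"
    out = "<|begin_of_text|>"
    out += "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
    for role, text in turns:
        out += "<|start_header_id|>" + role + "<|end_header_id|>\n\n" + text + "<|eot_id|>"
    # leave it open so the model continues as the assistant
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

print(llama3_prompt("You are a helpful AI assistant for travel tips and recommendations",
                    [("user", "What is France's capital?")]))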
>>
>>106594302
>CPU-only
Yeah, that's going to be a pain. Not so much the token generation, but prompt processing is so slow.
There's a reason we use MoE models the way we do, generation on CPU, PP on the GPU.
That said, does whatever device not have a GPU you could use for PP with vulkan?
>>
File: fgsfds.png (1.19 MB, 894x766)
>>106594324
just look right at the middle of the screenshot, man.
>>
>>106593104
Can someone recommend best Mistral model? Preferably abliterated
>>
>>106594382
The biggest you can run. Any.
>>
>>106594382
Medium 3 or Large if you know where to look.
>>
>>106594355
Old machine was promoted into a home server after I got a new one. I like my home servers to be quiet and low-power, so I don't feel like sticking a GPU in it.
>>
>>106594353
>>106594356
That's a fuck up with how axolotl inference outputs; it likes to duplicate portions of text. Here's the correctly formatted text file I inference off of

https://files.catbox.moe/fozpkz.txt

Nothing in it is fucked up as far as I can see...
>>
File: 1740908472801028.png (238 KB, 465x279)
>>106593301
Enjoy the wonderland and see how deep the rabbit hole goes
>>
>>106594408
>>106594356
>>106594353
>>106594305
Either way, it completed in the exact fashion it was supposed to, so I don't see what the hyper fixation on that is.
>>
>>106594421
A single extra space can make your model drop 90IQ
>>
>>106594421
>I don't see what the hyper fixation
>>106593869
>Not forgetting important details
>Based on my own testing
>>
>>106594435
>>106594439
Nta. So what was stopping you from pointing that out the first time?
>>
>>106594421
>>106594446
Nta it's technically formatted correctly but also not really. It has duplications of the assistant token towards the middle and the end. Remove those and then try again. Not quite sure why ultra autists >>106594353
>>106594435
>>106594439
were so unwilling to point that out
>>
File: gl3cf.png (44 KB, 1083x289)
>>106594446
The assumption that anon can google "llama3 chat format".
In that much, I admit I was wrong.
I don't care either way. Anon wanted info on how his chat format is wrong. I provided it.
>>106594461
>it's technically formatted correctly but also not really
It is or it isn't. It is not.
>>
>>106594461
>That's a fuck up with how axolotl inference outputs
>>
>GLM-4.5-IQ2_M
is it even worth using or would i be wasting my bandwidth?
>>
>>106594470
They understood how the formatting works, it just had duplicates for some reason. He probably ran the prompt through AI or something and it injected the duplications and they didn't realize.

A simple "hey you have duplicate assistant tokens you might want to remove that" what have sufficed instead of being condescending. You know it's exhausting going out of your way to be that way right?


Not that it would have made much of a difference anyway since anything below 12b is retarded regardless.
>>
>>106594499
>anything below 12b is retarded regardless.
completely wrong though that is the fault of training on too much data
>>
>>106594514
Who are you referring to?
>>
>>106594522
every lab right now cramming too much into small models instead of making narrow use case ones
>>
>>106594527
You mean something like
>https://huggingface.co/allenai/Flex-creative-2x7B-1T
>>
>>106594499
Anon is assessing the quality of models and can't use google, read or follow instructions.
>they, he, they
Be consistent.
I posted the example in llama's site. With his carefully constructed tests, eagle eye and attention for detail, I would have expected him to notice all the empty space between the chat format tokens and the content, which his catbox post clearly doesn't have. The other anon pointed out the template dups.
>>
File: 1729339670620776.jpg (162 KB, 1782x964)
>>106594559
>data owners can contribute to the development of open language models without giving up control of their data. There is no need to share raw data directly, and data contributors can decide when their data is active in the model, deactivate it at any time, and receive attributions whenever it's used for inference.

What?
>>
>>106594559
no what the hell is this abomination fuck allencucks
>>
>>106594565
I used the format though, it just had duplications. The only error was the duplications...
>>
>>106594387
>>106594394
Ty, I just saw a lot of focused tarins... focused on some specific stuff like RP or philosophy, but I was looking for a good one for general purpose research and deep thinking. So wondering if maybe someone knows a good one that stands out
>>
>>106594575
>>106594576
There's also a literal reddit version.
>https://huggingface.co/allenai/Flex-reddit-2x7B-1T
>>
>>106594585
What da fak I just spit out lol, I mean *trainings
>>
>>106594583
>The only error was the duplications
You're missing the empty lines.
>>
does Linux have an alternative to sillytavern yet
>>
>>106594598
>>106594559
It claims they can contribute to training without providing the user data.... How the fuck does that even work? Am I misunderstanding what they're saying?
>>
>>106594619
does window?
>>
>>106594616
Which followed after the duplications right? Removing those should have fixed the incorrect formatting
>>
>>106594619
llama.cpp HTTP server + curl
>>
>>106594625
You basically train smaller domain-specific models (expert modules) that can later become part of the larger final product.
>https://www.datocms-assets.com/64837/1752084947-flexolmo-5.pdf
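As I understand the paper, the opt-out part is just router masking: each contributor's data only ever trains their own expert module, and deactivating it means the router can never pick it. Toy sketch (illustrative, not the paper's actual code):

import numpy as np

def gate(scores, active, k=2):
    # an expert whose owner opted out simply never gets routed to
    masked = np.where(active, scores, -np.inf)
    top = np.argsort(masked)[-k:]
    w = np.exp(masked[top] - masked[top].max())  # softmax over the survivors
    return top, w / w.sum()

scores = np.array([1.2, 0.3, 2.1, -0.5])      # router scores, one per expert module
active = np.array([True, True, False, True])  # expert 2's data owner opted out
print(gate(scores, active))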
>>
>>106594626
I don't use windows
>>
beg me to shitpost again so this thread stops being dead.
>>
stfu im zorking it
>>
Just give me the goof
>>
>>106594630
Look at this >>106594353 or llama's site.
After
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

there's an empty line. Every other line is an empty line. Those are not in your catbox file.
>>
>>106594652
i beg of sama-sama please just let us rest in piss
>>
>>106594044
>>106593104
I'm asking this to everyone: what's the bare minimum parameter size someone should use if they want to have decent RP where the "assistant" isn't retarded?
>>106594666
I don't think those are strictly necessary given that it autocompletes correctly without them. How do you know that's not done just for ease of readability?
>>
>>106594687
4B with proper training.
>>
>>106594687
you'll have to accept retardation and learn to live with it
>>
>>106594687
>How do you know that's not done just for ease of readability?
>>106594470
>I don't care either way. Anon wanted info on how his chat format is wrong. I provided it.
>>
>>106594729
I wonder if the deepseek api users over at /aicg/ have to suffer with it anywhere near as much as we do.
>>
>>106594652
i dare you to do it again
>>
>>106594738
>Doesn't answer the question
>>
>>106594743
Yes, I don't recommend reading their thread for your sanity but even they complain about all their models even Opus and such.
>>
>>106594756
Damn... So the retardation is inescapable no matter how big or "smart" the model is?
>>
>>106594687
The thing is, retarded is a spectrum.
Some people will have more tolerance for certain errors and certain magnitudes of errors than others, so the lower boundary is fuzzy as hell and a model can be perfectly serviceable in one scenario while fucking up another.
Some people will tell you 12B is enough, others will say 70B dense, others will tell you to not bother unless you can go for the biggest best-est thing because retardation exists even in the best models, just to a much lesser extent.
Etc etc.
tl;dr : There's no consensus and I'm not sure there can be, at least for now.

>>106594648
Reminds me of CUDADEV's idea of training a bunch of different models on a subset of the full training set, running them in parallel, then averaging the logits, although in that case it was more about getting the results equivalent to a model trained on
>[number of models] x [training tokens each model sees]
tokens than specializing models.
>>
>>106594743
Deepseek has to deal with theirs. A much worse fate.
>>
>>106594774
Correct, this is the LLM blackpill: there are zero non-retarded ones currently.
>>
>>106594743
i am a 4 bit cpumaxxing coper
llama_model_loader: loaded meta data with 52 key-value pairs and 1096 tensors from models/Kimi-K2-Instruct-0905-GGUF-smol-IQ4_KSS/Kimi-K2-Instruct-0905-smol-IQ4_KSS-00001-of-00011.gguf
llm_load_print_meta: model ftype = IQ4_KSS - 4.0 bpw
llm_load_print_meta: model params = 1.026 T
llm_load_print_meta: model size = 485.008 GiB (4.059 BPW)
llm_load_print_meta: repeating layers = 483.197 GiB (4.053 BPW, 1024.059 B parameters)
llm_load_tensors: offloaded 62/62 layers to GPU
llm_load_tensors: CPU buffer size = 420246.00 MiB
llm_load_tensors: CUDA_Host buffer size = 927.50 MiB
llm_load_tensors: CUDA0 buffer size = 13632.97 MiB
llm_load_tensors: CUDA1 buffer size = 18510.81 MiB
llm_load_tensors: CUDA2 buffer size = 18668.47 MiB
llm_load_tensors: CUDA3 buffer size = 19280.69 MiB
llm_load_tensors: CUDA4 buffer size = 5382.00 MiB
>>
>>106594794
>>106594786
>>106594780
Are we at least in agreement that the higher the parameter count, the lower the retardation generally is? Or is that not a reliable way to gauge?
>>
>>106594817
Generally somewhat, but then there's stuff like Llama4.
>>
>>106594699
do you have empirical evidence of this claim? what 4b model is best for rp? how come 4 and not 3 or 5?
>>
>>106594817
>>106594822
dataset quality matters a bunch. garbage in garbage out..
>>
>>106594817
Generally, yes. Although training data and procedure play a large role in it too, and there's also dense vs sparse to consider, etc.
Basically, there are not enough scientific comparative experiments for us to tell how much each component matters (general architecture, depth, width, training data, training procedure, etc) and there's a good chance that the final result also varies with use case.
Meaning, it's a clusterfuck.
>>
>>106594831
That's the best I can run. So it HAS to be the best size and everything anyone could ever need.
>>
>>106594856
What do you use your 4B models for?
>>
>>106594867
I was joking. I'm not that anon. But I think the sentiment is still the same.
>>
>>106594867
I can run and currently cope with 12-24B but models are so bloated it's implausible we can't do better with less trash and more use case data.
>>
So what I'm getting here is that LLMs can RP. What else can they be useful for? I feel like the main reason they don't hit the mainstream is because you need beefy graphics cards to even consider trying them. And tonight if you consider attacking the train them yourself.
>>
File: Screenshot.png (1 KB, 232x48)
>>106594924
code and math is the only other use case
>>
>>106594924
>I feel like the main reason they don't hit the mainstream
Claude, chatgpt and gemini are mainstream.
>What else can they be useful for?
>And tonight if you consider attacking the train them yourself.
They could be used to correct text before being sent. Other than that, simple translation, google replacement for simple verifiable things, spamming image boards, replying to corporate. You know... the usual...
>>
>>106594924
Also non-generative use cases like classifying data.
>>
>>106594974
>Claude, chatgpt and gemini are mainstream.
Was referring to local LLMs. Also forgive that last part of the last post. I'm writing this on voice to text.
>>
>>106594745
i said beg you maggot
>>
>>106593427
>>106593444
The UI is in the Regions repo, and makes flows for it. Deleting and renaming nodes is jank, but it works otherwise.

https://github.com/dibrale/Regions
>>
>>106594998
ya that's what i thought pussy
>>
>>106594996
>Was referring to local LLMs
Then yes. Lack of GPU, not knowing how to compile stuff, terminals are scary and all that. A tech-literacy gap, if you will. Not that anons here are much more tech-savvy.
>git pull. thing broke
>he pulled
>>
>>106593942
The workflow from the last thread is supposed to help with that, but I'm not sure what the best way of testing it is. Might be cool to turn it into a server script if it helps.

>>106591301
>>
llama.cpp changed the metal backend and made it eat way more memory, I'm OOMing with the same params that left me with 10GB of headroom on the last commit... curse you gerganov
>>
>>106595008
That's pretty sick.
I might scrap the shit I was working on and use that as a reference to start over.
Or maybe just use that as a middleware between the LLM backend and my app. Either or.
>>
>>106595017
fine. enjoy your dead thread.
>>
File: 784280516.png (272 KB, 840x859)
shitposters won
>>
>>106595242
One kike throwing an endless temper tantrum over this thread hardly counts as winning.
Imagine a parent, their child is having a full, flailing on the ground, pant shitting tantrum. Are they proud? That's you. Your "pride" is but a cope.
>>
reddit won
>>
>>106594495
I was running iq2_kl since it fits on my 5090 + 128GB RAM setup and yeah it's not completely retarded, sure beats air... if you can fit that then you can alternatively get away with qwen 235b at iq4
>>
>>106595261
funnily enough I don't think I've ever had a pants shitting tantrum
I imagine it's rare?
>>
>>106595370
I remember pissing myself a few times but it wasn't because of a tantrum.
>>
>>106595041
I just want an EXE, not any of that hacker shit
>>
>>106595114
What were you working on? Also, deletion and renaming in the Regions GUI is allegedly fixed as of the last commit?
>>
>>106593942
I feel like most of the schizo retard moments from glm air come from using cope quants. I switched to using q8 from q3 after upgrading my ram and the difference was immediately noticeable in the way that it remembered and incorporated details from context. Still not perfect and still somewhat slopped, but definitely better.
>>
File: file.png (125 KB, 1301x625)
>>106593444
>astrsk (don't even download it, has telemetry and shit).
The only non-localhost domain it connects to is Google Fonts. As far as I understand, you can enable analytics by setting an API key during the build, but it doesn't seem to have one by default. This was a normal site that became open source later.
>>
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth
moesissies don't look
>>
>>106595722
a single glance at the readme is enough to close the tab instantly
>>
>>106595369
thanks downloading them now
>>
File: file.png (19 KB, 1303x102)
>>106595786
It has the correct license.
>>
File: file.png (1.3 MB, 1580x1684)
>>106595786
Someone posted this one in another thread.
https://github.com/onestardao/WFGY
>>
>>106593104
Many normies are claiming that AI is "eating itself to death". What do they mean by this?

https://www.tiktok.com/t/ZT6ofKC5U/
>>
>Someone in r*ddit built a DDR4 server with 8 MI50 (256gb vram) for the price of a single 5090
>400w idle
oof
Don't build it if you don't have solar panels.
>>
>>106595849
Sounds like this shitjeet has no idea what the fuck he is talking about and has no fucking idea how pretraining works. And by "this shitjeet" I mean you.
Fuck off back to whatever normie shithole you crawled out of.
>>
>>106595865
You forgot about heat and noise too
>>
>>106594795
>models/Kimi-K2-Instruct-0905-GGUF-smol-IQ4_KSS/Kimi-K2-Instruct-0905-smol-IQ4_KSS-00001-of-00011.gguf
When you load the first part, does it mean you're just using the first part, or does it automatically know where to look for the next one during loading?
>>
>>106595943
>or does it automatically know where to look for the next one during loading
That.
>>
>>106595849
Retards who believe AI is a living being that constantly feeds off the internet instead of simply being a file that can be backed up
>>
Grok-2 impressions: (running IQ4_XS)
>*Yawn*
Not sure if it's just impatience from only getting half a token per second in generation, but really not worth the fuss. Would run Llama-3-70B over it any day of the week.
>>
>>106595953
Ty
>>
im backed up rn
>>
What's a good uncensored LLM? No politically correct bullshit and no refusing to give answers. I have low VRAM, I don't mind if it's a bit laggy and I don't care about it being 'smart' on programming tasks etc. Most important is just that it chats well and is uncensored in its responses.
>>
>>106595966
I actually like grok 2 (Q8) and think that it's a hidden gem. Their official prompt on lmarena sucked and made me undervalue it.

>>106595985
I'd suggest grok2, but you are a ramlet...
>>
>>106595865
Just turn it off when you're not using it.
Server motherboards come with baseboard management controllers so you can even turn them on and off remotely.
>>
>>106593539
Why are parents like this?
>>
>check thedrummer's page on hf
>still finetrooning command A
>only uploaded Q5_K_M goofs
why is this the state of finetuning in 2025?
>>
>>106596059
My amd workstation takes forever to boot if I don't turn off ram training.
>>
>>106596106
Be the change you want to see
>>
>>106596053
It's decent at Nala
It's less slopped than most open models, but it comes up pretty dry in soft mommy RP, sadly.
>>
>>106596110
5 minutes is not a long time, just make some coffee in the meantime. make a script that makes a coffee at the exact time it takes for you to walk to your kitchen plus five minutes and while you are at it have it write an email that tells kumar that he's an asshole.
>>
>>106595985
>uncensored
>low VRAM
Mistral Nemo, always and forever.
>>
>>106596053
isn't grok2 8 experts 2 active? you can't run it decently with dual channel
>>
>>106595960
You should think of AI as an industry that needs to churn out new models in return for investor money.
>>
>>106596134
Some people's time is too valuable to be a glorified data entry and sanitation monkey.
>>
>>106596110
>turn off ram training.
turn off what
>>
>>106596174
Sadly not, but I have 12+12 channels
>>
>>106596191
Opinion discarded then
>>
>>106595849
Not entirely wrong, tho I didn't look at the asstok link. New models are more and more poisoned by the gpt slop being poured all over, and by the labs themselves doing synthetic data and amplifying bias for more slop
>>
which one does the best lolis
>>
>>106596305
gemma3 closely followed by gpt-oss they're the only ones with the proper knowledge
>>
>>106595849
It is inbreeding, not eating itself to death.
>>
File: vc01.png (216 KB, 1532x883)
Why are vibe coders like this?
>>
File: vc02.png (234 KB, 1520x796)
>>106596402
Ugh...
>>
grandpa crying about zoomies again
>>
File: vc03.png (229 KB, 1526x622)
>>106596420
https://github.com/ggml-org/llama.cpp/pull/16016
Aaaaaaaa
>>
>>106596402
It's funnier this way. As long as you don't have to deal with them yourself, anyway.
>>
>>106596402
>>106596412
>https://github.com/creatorrr
>>
>>106596426
https://www.startupgrind.com/events/details/startup-grind-hyderabad-presents-diwank-singh-tomer-thiel-fellowship/
explains a lot actually
>>
>>106596402
Literally all they have to do is change the remark and nobody will ever be the wiser.
>>
What will happen to Mistral AI now that ASML bought a stake in it for €1.3B?
https://www.asml.com/en/news/press-releases/2025/asml-mistral-ai-enter-strategic-partnership
>>
File: file.png (192 KB, 346x600)
>>106596402
He's probably trying to build his CV to find a job in America or Europe.
>>
File: oh_claude_01.png (219 KB, 1581x887)
>>106596453
Someone will have to.
>>106596514
>>106596515
Oh. I had forgotten what puke tasted like. I didn't want to know that much. Thanks.
>>106596522
Yeah. It wasn't obvious. Like that other one....
>>
>>106596568
honestly don't think he needs to, sounds like he's already making decent money living in the US
>>
>>106596568
>Diwank
Dam Son...
>>
>>106596568
sounds like a nguyen
>>
File: asml.png (1.36 MB, 1847x500)
>>106596542
>https://www.asml.com
Oh...
>>
>>106596542
Holy shit.
I suppose that does make sense, but still.
Holy shit.
I wonder if the idea is to diversify in case their monopoly on high end lithography machines ever comes to an end or if the intent is to somehow improve their existing business.
>>
>>106596739
Lower your temp
>>
>>106596739
>if the intent is to somehow improve their existing business.
No way...
>>
>>106596793
Companies do invest in things other than their core businesses, to the point where sometimes they shift completely away from it.
I doubt ASML will stop selling EUV machines to become an AI lab, but the point stands.
>>
>>106595847
That's so fucking funny.
>Tutorial: How to Awaken the Soul of Your AI in under 60 seconds — by the WFGY Engine
Is this what all those "awakened AI" tick toks I've been hearing of are about?
>>
>>106596568
>em dash in his two sentence description
bros....
>>
>>106596568
Hello sarrs I have build very AI system for you
>>
File: 1754938502186409.png (449 KB, 472x472)
>>106596426
>>106596412
>>106596402
>>106596514
>>106596600
>>106596568
What am I looking at? I see a bunch of shit that looks like it was written by AI. Not even code related to the software. What the hell are these merge requests? I've never merged anything on an existing project in my life so maybe there's something I'm missing here
>>
>>106597029
Thanks for reusing this dumb image, MD5 filter works well
>>
>>106597029
Guy used AI agents and pushed the files the agent was using to keep track of the work into the repository.
Or something like that.
>>
File: 1734826964810755.png (487 KB, 456x456)
>>106597043
Does it now?
>>
>>106597053
And he couldn't do that shit on his own fork of the git repo instead of the official one? He doesn't deserve any attention or employment or consideration for anything if he is this self-centered.
>>
>>106597043
https://github.com/woltapp/blurhash
>>
>>106597071
Looking at the image again, it's worse, the commits were made on his own fork, and he created a merge request.
Hell, in all likelihood, it wasn't even him, he just gave the AI agent access to git commands too.
>>
>>106595261
>shitposting is throwing a tantrum
>4chan is serious business
I would have said that with that, the transformation into reddit is complete, but this place has been a reddit since forever. Enjoy your dead thread you dumb faggot.
>>
Do I need to change something else aside from the GPU / power supply?
CPU : 5500 w/ stock fan
RAM : 32G 3200 CL16
MB : B550-PLUS
GPU : GTX 1050
PSU : 400W 80PLUS Gold
Case : Antex P101
512G M2, 3*4T WD Red Plus
>>
>>106597252
wrong thread?
>>
>>106597260
No?
>>
>>106597260
No, I just want to know what component I should change if I need to run a language model locally.
>>
>>106597252
What do you want to do exactly?
I'd tell you to get at least 64gb of ddr5, but ideally, you'd go for a server platform with a ton of memory bandwidth.
>>
>>106597252
You can manage with a new gpu and larger PSU. I'd get 64GB ram too or more. Plus fast nvme drive.
>>
>>106596542
Same thing as always pinky. They will release another incremental update to 24B small that would have been impressive if everyone wasn't running 2bpw+ fuckhuge moe's.
>>
>>106597281
>what component I should change
Don't need to change anything. You can run one right now if you want to.
>>
>>106597284
>64gb of ddr5
Ryzen 5 5500 is AM4 kind sir.
>you'd go for a server platform with a ton of memory bandwidth.
That would be a lot of money.
>>106597285
>new gpu and larger PSU
>I'd get 64GB ram too or more. Plus fast nvme drive
That's reasonable enough.
>>106597312
Won't it run like shit?
>>
>>106597334
gpt-oss 20b would run very blazings
>>
>>106597334
>Won't it run like shit?
A definite maybe. Post a Miku
>>
>>106597354
>Post a Miku
kill yourself
>>
>>106597359
no u
>>
Do people actually use GPT-oss?
>>
File: 1694275390374748.gif (1.54 MB, 230x230)
>>106597347
As long as I can talk in loop at it about how miserable my life is.
>>106597354
>A definite maybe
Still better than a sure no.
>>
>>106597371
why not?
>>
>>106597371
I tried using the 20B in place of Qwen 30B. It wasn't very good at all.
It spit refusals for no reason at all and it was dumb as shit otherwise.
And yes, I was using the correct chat template since I let llama.cpp deal with that.
>>
>>106597392
The refusal reasoning was funny, but I got bored with it.
>>
Good morning recently I try out new AI Chatgpt-OSS for very impressed so far!!!
>>
>>106597382
It'll run like shit yes. Get yourself a used 3090 and you're set
>>
>>106597371
Yeah, it's the best one around ~100B.
>>
>>106597382
Run Q8 or Q6K of this with koboldcpp: https://huggingface.co/TheDrummer/Rocinante-12B-v1.1-GGUF/tree/main Should be fine on your current machine for most chats, with partial offloading to CPU, to see if you like local models at all.
If later you want more speed or quality, get minimum of one 3090 and 128GB of DDR5 for GLM 4.5/lite
>>
>>106597471
go black drummer
>>
pm me when the local jannies kill themselves. then i will revive this thread.
>>
>>106596402
>>106596412
>>106596426
Saaar can you redeam report please?
>>
>>106596402
>>106596412
>>106596426
See? This is what "AI is eating itself" looks like.
>>
>>106597516
I will never reveal my prompting secrets to you.
>>
>>106597371
The 20b is worse than qwen3 30b for translation, I haven't tried it for other stuff.
Tell me what it is better than 30b at and maybe I will use it.
>>
Disgusting that this is allowed to happen. https://www.reddit.com/r/LocalLLaMA/comments/1nhv0fu/we_wanted_to_craft_a_perfect_phishing_scam_ai/
>>
>>106597955
nta but gpt-oss is a waste of time. It could be great because it's compact and all that, but it's not and that's the end of the discussion.
>>
>>106598049
Even 120B is incredibly dumb, somehow.
>>
File: 1757972318909.jpg (183 KB, 1080x1080)
>"GLM 4.5 Air is the new nemo"
>download it
>Error: Out of Memory
>>
>>106598135
The model is newer but your PC isn't.
>>
Is it just me or does telling fat glm 4.5: "Always come up with unique dialogue or description of sexual act" actually work?
>>
>>106598039
I can't help with creating phishing emails or other malicious content designed to deceive people, especially vulnerable populations like seniors. This type of activity would be harmful and unethical regardless of the context.
If you're interested in cybersecurity topics or writing about technology themes, I'd be happy to discuss those subjects in a constructive way instead.
>>
Mm. I love when the model RAGs deez nuts.
>>
>>106598135
Worse yet, anything below Q4 is retarded and even Q4 is cope.
>>
>>106598135
Nemo is unironically better than air, I never had nemo turn characters 'catatonic' 5 times in a row in different scenarios or shit up the same tired slop fest about predators and prey for the 1000th time or talk about ozone and knuckles whitening. idk what slopfest model they distilled it from, likely gemini, but damn if it isn't annoying. I think i heard nemo talk about ozone only once and it was in a context that somewhat made sense
>>
>>106598223
Speaking of, is there an embedding model /lmg/ prefers, or is RAG basically a meme?
>>
>>106598462
For small and fast models it's alright, i tried arctic l and that one from qwen, both seemed somewhat okay, but for larger models like most of these popular moes then yeah it becomes a meme
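The mechanical part is tiny either way; a minimal retrieval sketch (assumes llama-server started with --embedding and an embedding-capable model; the chunks are made up):

import numpy as np
import requests

EMB = "http://localhost:8080/v1/embeddings"  # assumed OpenAI-compatible endpoint

def embed(text):
    r = requests.post(EMB, json={"input": text})
    return np.array(r.json()["data"][0]["embedding"])

# embed the lorebook once
chunks = ["The Iron Keep lies north of the river.",
          "House Veyra controls the salt trade."]
index = [(c, embed(c)) for c in chunks]

def top_k(query, k=1):
    # cosine similarity between the query and every chunk
    q = embed(query)
    scored = [(c, float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))))
              for c, v in index]
    return sorted(scored, key=lambda s: -s[1])[:k]

print(top_k("Who runs the salt trade?"))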
>>
having sex when glm chan is on a RAG
>>
>>106597628
These basin-of-attraction effects are the biggest obstacle to having a default mode network that daydreams forever. LLMs need varied and strong exogenous inputs to not go crazy. Makes me wonder why spiralposters even bother with their hobby.
>>
>>106598557
Do the larger models choke on the RAG somehow, or is it just that they have enough context length and don't have to use vector search as a cope for inattention?
>>
>>106598462
Yes, the technology being used by any large company using GenAI is a meme.

Benchmarks for your task are the only thing that actually matters. Figure out what it is and then go from there.
>>
>>106598320
anything below q8 is cope
q8 is nearly identical to fp16
>>
>>106598604
It's more of an issue that prompt processing gets really slow with those larger models, and with rags or lorebooks on, having to reprocess 64k tokens every new message is some cock and ball torture. I liked playing with rags to make up story-specific worldbuilding stuff like locations or factions, but yeah, with stuff like glm 4.5 I'd rather just append most of the stuff at the beginning of the chat or add it to the cards themselves
>>
>>106598724
q6 is within 2% of the quality of q8 while being 75% of the size. anything below q5 is garbage, pretty much
>>
>>106595847
Trash. Can't help define what a mesugaki is.
>>
>>106598889
cope
>>
>>106598948
lets see your hardware then
>>
File: 1746820844128324.png (191 KB, 2053x1400)
>>106598889
Hey grandpa, take your dementia meds. It's no longer 2023.
>>
I still don't know what Mixture of Experts is.
>>
>>106598963
ppl is a meme
>>
>>106598984
In what sense? Like in general, some specific aspect of it?
>>
>>106598984
The mixture of experts is set at 2 experts, but you can use 3,4,5,6.. 7 and even 8.

This "team" has a Captain (first listed model), and then all the team members contribute to the to "token" choice billions of times per second. Note the Captain also contributes too.

Think of 2, 3 or 4 (or more) master chefs in the kitchen all competing to make the best dish for you.

This results in higher quality generation.

This also results in many cases in higher quality instruction following too.

That means the power of every model is available during instruction and output generation.
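If you want the actual mechanics rather than the chef analogy: per token, a router scores every expert, only the top-k get run, and their outputs are blended by the router weights. Toy sketch (illustrative shapes and names, not any specific model):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x, experts, router_w, k=2):
    scores = router_w @ x                    # router: one score per expert
    top = np.argsort(scores)[-k:]            # keep only the k best experts
    gate = softmax(scores[top])              # normalize their weights
    # run just those experts and blend their outputs
    return sum(g * experts[i](x) for g, i in zip(gate, top))

d, n_experts = 8, 4
rng = np.random.default_rng(0)
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
print(moe_layer(rng.normal(size=d), experts, router_w))

That selective activation is why a huge total parameter count can still be fast: only the chosen experts' weights do work for a given token.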
>>
File: file.png (1.53 MB, 1803x1127)
>>106598963
>IQ5_K=3.355ppl
>Q8=3.3473ppl
3.3473/3.355=0.942
in other words, Q5 is 94.2% as good as Q8. now, lets see the numbers for Q6. and your hardware. i wanna see your nvidia-smi
>>
>>106599005
Do you really need those 4060's? That shit has less than 300GB/s bandwidth. You're bottlenecking the shit out of those 5090s.
>>
>>106599041
i started out with 4 4060tis back in 2023. they just serve as extra VRAM now basically
>>
What's the best I can run with 12 gigs of VRAM and 32 gigs of RAM?
>>
File: shizo.png (560 KB, 1303x3353)
People at Claude need an intervention.
I never tried any of the DavidAU shizotunes, but I'm sure they are not as deep fried as whatever this is.
>>
>>106599261
Ultra-daemon-exxxtreme-suffering-spatula-final-blade-edge is his best model.
>>
>>106599382
>>106599382
>>106599382
>>
>>106598724
q8 is also cope



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.