/g/ - Technology

File: media_HEzJtL3aQAAt8Hq.jpg (1.26 MB, 3054x3040)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108587221 & >>108584196

►News
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Attention rotation support for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: breppy pleese.png (383 KB, 894x802)
►Recent Highlights from the Previous Thread: >>108587221

--Comparing Gemma 4 and Qwen 3.5 vision token budget and config:
>108588248 >108588280 >108588295 >108588306 >108588369 >108588387 >108588424 >108588449 >108588495 >108588632 >108588657 >108588701 >108588437 >108588466 >108588490 >108588549 >108588580 >108588367 >108588616 >108588704 >108588760 >108588769 >108588745 >108588790 >108588818 >108588828 >108588842 >108588851 >108588865 >108588931 >108588936 >108588949 >108588980 >108588965 >108588988 >108589009 >108588743 >108588756 >108588775 >108590362 >108590379 >108588782 >108588819 >108588835
--Benchmarking KV cache quantization effects on draft model performance:
>108589863 >108589870 >108589875 >108589891 >108589890 >108589949 >108589994 >108590011 >108590031 >108589897 >108589922 >108589963 >108589979 >108589987 >108590538
--Discussing draft model viability and quantization quality for G4 31b:
>108588195 >108588243 >108588259 >108588898 >108588905 >108588913 >108588918 >108588921 >108588924 >108588939 >108588955 >108588977 >108588927 >108589815 >108589857
--Discussing llama.cpp's experimental backend-agnostic tensor parallelism PR:
>108588340 >108588514 >108588543 >108588567 >108588649
--Testing vision capabilities for OCR-less Japanese translation:
>108589990 >108589996 >108590009 >108590070 >108590018 >108590032 >108590119 >108590191 >108590209 >108590211 >108590034 >108590183 >108590195 >108590217 >108590268
--Logs:
>108587359 >108587627 >108588523 >108588609 >108588656 >108588660 >108588669 >108588681 >108588689 >108588695 >108588736 >108588896 >108588970 >108589096 >108589140 >108589214 >108589316 >108589383 >108589390 >108589432 >108589481 >108589697 >108589710 >108589836 >108589860 >108589956 >108590001 >108590003 >108590121 >108590256 >108590474 >108590524
--Miku (free space):
>108588649 >108588657

►Recent Highlight Posts from the Previous Thread: >>108587226

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Share your anti slop prompts
>>
Thoughts on latent space reasoning?
>>
Mikulove
>>
Reposting here:

>>108590560

what tokens/s do you get? Wanna make sure i'm not fucking anything up, right now just following the basic kobold guide, i'm getting around 11 t/s (24GB VRAM, 32GB RAM)

Running gemma 31b, Q4 K_M
>>
So, again... Why do we have to peg gemmy?
>>
OP could do with some small updates on Gemmy and some FAQ
>>
File: Awesome.jpg (196 KB, 1298x1036)
>we can now generate images of characters, come up with scenarios, feed them into gemma and get molested by our own creations
Future's so bright I'm gonna need shades.
>>
>>108590580
Seems about right, I get between 10-14t/s, mostly depending on what else I'm doing on my PC at the time.
Using Vulkan llama.cpp, 7900 XTX, 64GB DDR5 ram
>>
File: file.png (26 KB, 350x350)
>>108590575
Nothing worthwhile released.
>>
I've got a 3090 and a 2070 super that I'm trying to use together with llama.cpp.
Using the split tensors just crashes presently but does work with split layers.
Any recommendations on flags to use with a dual uneven card setup?
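For what it's worth, a common starting point for uneven cards is a --tensor-split proportional to each card's VRAM, with layer split mode (row mode is the one that tends to crash, per the post above). The flag names below are llama.cpp's; the ratio itself is just a guess for a 3090 + 2070 Super:

```python
# Sketch: build a llama.cpp server command line for an uneven dual-GPU
# setup. --tensor-split sets the proportion of work per GPU; splitting
# roughly by VRAM (24 GB + 8 GB here) is a reasonable first try.
def build_llama_cmd(model_path, vram_gb=(24, 8), ctx=8192):
    total = sum(vram_gb)
    split = ",".join(f"{v / total:.2f}" for v in vram_gb)
    return [
        "llama-server",
        "-m", model_path,
        "-ngl", "99",              # offload all layers to the GPUs
        "--split-mode", "layer",   # layer split; "row" is the crashy one
        "--tensor-split", split,   # per-GPU share, e.g. "0.75,0.25"
        "-c", str(ctx),
    ]

print(" ".join(build_llama_cmd("gemma-4-31B-it-Q4_K_M.gguf")))
```

From there, adjust the ratio if one card OOMs before the other; the optimum rarely matches the raw VRAM ratio exactly.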
>>
gemma 4 audio just landed!!!!
>>
>>108590601
Ikr, I'm literally using it to write stories and the fact it can understand images so well helps a shit ton, this model is a fucking miracle
>>
>>108590601
I know it's basically a meme at this point but it really has restored my hope in local.
>>
File: help.jpg (205 KB, 1730x606)
>>108590614
I'm reading people getting 30 t/s with the same rig setup though >>108590585

I'm missing something I think. No doubt my settings are fucked, never mind optimized
>>
>>108590568
my attempts just make gemma's writing dry. and it still ends up writing more or less the same idea as it would with an empty sysprompt. best antislop is using a model that wasn't slopped to begin with.
>>
File: 1767611022421263.png (65 KB, 823x910)
LOL!
>>
>>108590671
Do I have to download another mmproj?
>>
>>108590662
>best antislop is using a model that wasn't slopped to begin with
So not using LLMs at all then?
>>
Give me the QRD on image recognition please
I tried enabling it in ST and in the Chat Completion preset but it still couldn't "see" the images proper despite the text model working flawlessly with my Kobold install
>>
>>108590698
Did you load the mmproj file?
Did you get any errors when you tried it?
Did you enable the send inline images option?
etc etc etc
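When debugging "the model can't see images," it can help to bypass the frontend entirely and check what an inline-image request even looks like. A minimal payload builder for the OpenAI-style chat format (the shape ST's "send inline images" option produces; whether your backend accepts it is for you to verify):

```python
import base64

# Sketch: build an OpenAI-style chat message with the image inlined as a
# base64 data URI, the same structure frontends send for inline images.
def image_message(text, image_bytes, mime="image/png"):
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = image_message("What is in this picture?", b"\x89PNG...")
```

If the backend answers this correctly but the frontend still fails, the problem is frontend config, not the mmproj.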
>>
>>108590548
>The rdrview tool is worth a look,
Yeah I'll take a look. sometimes I do want the links for navigation tho but I guess I can let the agent know it has the option.
>>
Been out of the loop for a while. What's the best local model for STORY (not chatbot) slop? I'm still on "xortron criminal config" or something like that because even gemini 4 is failing at good old "just continue this text I gave you, retard" tasks.
>>
>>108590710
>there's a mmproj file
Ok I am retarded, pretend nothing happened
>>
>>108590716
Gemma 4 practically generates an entire fucking story for each chatbot reply.
>>
>>108590662
I've been using her to help me write character cards, and I feel like feeding AI-generated text back into it increases the slop by a factor of 10.

Now I'm trying to just rewrite everything myself, or somehow have a second pass with a different model to reword or desloppify the cards.
>>
File: wrong_box_issue.jpg (242 KB, 1216x1282)
>>108588248
>>108588704
sirs? please share quant producer and which mmproj file do you use.
mine (gemma-4-31B-it-Q4_K_M with f16 mmproj) misses the target.
>>
File: 1757569310824647.png (225 KB, 1210x1693)
>>
>>108590723
It can write, I know. That's not the problem I am having. My problem with it is, well, here's an example.

[story stuff text here]
She walks up and says "Hello

And then the model continues like this: "Hello! Come take a seat.... [more text]

So it ends up with this shit:

[story stuff text here]
She walks up and says "Hello"Hello! Come take a seat.... [more text]

I don't know how to fix this. System prompt maybe?
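A system prompt may or may not help, but one mechanical band-aid for continuations that re-echo the prompt's tail is to strip the overlap in post-processing. A sketch (not a fix for the underlying instruct-tune behavior, just a cleanup pass):

```python
# Sketch: remove the longest suffix of the prompt that the completion
# starts by repeating, e.g. prompt ...says "Hello + completion
# "Hello! Come take a seat. -> ! Come take a seat.
def dedupe_overlap(prompt, completion, max_overlap=64):
    for n in range(min(max_overlap, len(prompt), len(completion)), 0, -1):
        if completion.startswith(prompt[-n:]):
            return completion[n:]
    return completion
```

The suggestion elsewhere in the thread (use the base model in a pure text-completion frontend) addresses the actual cause; this only hides the symptom.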
>>
>>108590746
holy fucking slop
>>
>>108590695
original r1 with unhinged sampling
>>108590724
my prompt was asking to adhere to orwell's writing rules but it seemed like it was beyond gemma's comprehension
>>
Gemma 26b really seems to hate tools. e4b is fine with them for some reason
>>
How much Gemma4-31B context can you fit into 32GB VRAM? (Q4 for model and context)
>>
>>108590737
im using unslop model = /mnt/miku/Text/gemma-4-31B/gemma-4-31B-it-Q4_0.gguf
mmproj = /mnt/miku/Text/gemma-4-31B/mmproj-F16.gguf
>>
File: 1759909311497082.png (359 KB, 600x450)
>>108590776
>Q4 context
>>
>>108590776
with 32GB VRAM Q4_K_M, even with q8 kv I'm sure you can fit the whole 262k context with room to spare.
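Claims like this are easy to sanity-check: per-token KV cache is 2 (K and V) × n_layers × n_kv_heads × head_dim × bytes per element. The shape numbers below are hypothetical placeholders, not Gemma-4-31B's real config; read the actual values from the GGUF metadata:

```python
# Sketch: estimate full-context KV cache size. The architecture numbers
# are HYPOTHETICAL defaults, not the real Gemma-4-31B config.
def kv_cache_gib(n_ctx, n_layers=48, n_kv_heads=8, head_dim=128,
                 bytes_per_el=1):  # 1 byte/element ~= q8 KV cache
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_el
    return n_ctx * per_token / 2**30

print(f"{kv_cache_gib(262_144):.1f} GiB")
```

Note that a naive full-attention estimate overstates things for models with interleaved sliding-window attention, since the iSWA layers only cache the window, which is why long contexts fit at all.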
>>
>>108590671
>extract_image_from_base64
>>
>word
Slop
>>
>>108590737
You should use the BF16-precision mmproj.
>>
Could a simple finetune of the lm head on a normal writing dataset help get rid of the slop? Someone should test it, I'll be your visionary, and you do the things I come up with.
>>
>>108590837
Perhaps replacing all values corresponding to non-special tokens with those of the base model's could work and not require any training.
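The row-swap idea is mechanical enough to sketch, though whether it actually reduces slop is an untested assumption from the post. Plain lists stand in for the real weight tensors, and the special-token IDs are made up:

```python
# Sketch of the proposed no-training "unslop": keep the instruct model's
# lm_head rows only for special tokens and take every other token's row
# from the base model's lm_head.
def splice_lm_head(instruct_head, base_head, special_token_ids):
    return [instruct_head[i] if i in special_token_ids else base_head[i]
            for i in range(len(base_head))]
```

In practice you would do this on the actual tensors with matching vocab order, and tied-embedding models would need the input embeddings handled too.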
>>
>>108590746
r u ok?
>>
File: gemma4.png (110 KB, 819x396)
>>108590837
It gets rid of the slop but it also gets rid of everything else. Maybe qwen needs finetuning but gemma 4 is fine as is. With a bit of nudging it can output something foul.
>>
>>108590758
Dude, just use the base model and not the instruction tune on a frontend like mikupad which is designed to solely continue text, not talk back and forth.
>>
>>108590874
Of course. Thanks for asking.
>>
>>108590880
did you swap the head?
>>
>>108590893
Then why is loli leto atreides your math teacher?
>>
File: howto_correctly.jpg (32 KB, 800x95)
what's the proper place to put a jailbreak in ST?
With Post-History Instructions I still got this
>>
>>108590899
Because she's smart! You racist against worm parasites or something?
>>
>>108590906
What model are you running
>>
File: agenticRP.png (277 KB, 1634x880)
>>108590895
No, this is from pure prompting, no weight frankensteining. I wrote my own UI to have an agent read the room and flip the horny switch when it smells NSFW vibes. It also plans ahead so the writer model knows what to do and writes better.
>>
>>108590899
Shock value, which doesn't make him less deranged
>>
>>108590915
26B, bartowski Q4
>>
File: agenticRP2.png (83 KB, 690x479)
>>108590916
Oops, wrong pic. But the gist is: just give it a few extreme examples.
>>
>>108590881
>base model
So why is NovelAI using GLM 4.6 instead of the base model to write stories?
>>
>>108590926
How many iterations are you doing for each message?
>>
>>108590928
Presumably because they're not actually following pure text completion and have a big old system prompt in there to stop you having maximum fun, so they need instruct tuning.
idk i dont fucking use nonlocal services
>>
>>108590916
>I wrote my own UI
You ever gonna share it?
>>
>>108590924
try simply prefilling assistant's message.
>>
>>108590939
One for Director; two if rewrite user prompt is enabled; one for Writer; and a ReAct loop for Post-processing to get rid of slop and rein in the length.
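The setup described above can be sketched as a chain of calls to one model, each stage sharing the same history prefix so a llama.cpp-style server can reuse its KV cache. Stage names and prompt wording here are illustrative guesses, not the actual project's prompts, and `generate` is a stand-in for the backend call:

```python
# Sketch: Director plans, Writer drafts, an editor pass trims. One model
# serves every role; every stage's prompt starts with the same shared
# history prefix to maximize KV cache reuse on the server.
def pipeline(history, generate):
    shared = "\n".join(history)
    plan = generate(shared + "\n[Director] Plan the next story beat:")
    draft = generate(shared + f"\n[Plan] {plan}\n[Writer] Write the reply:")
    return generate(shared + f"\n[Draft] {draft}\n[Editor] Cut slop, rein in length:")
```

The cost is obvious: one visible reply is three or more generations, which is why the latency question below is fair.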
>>
>>108590948
No.
>>
>>108590954
Damn, it will sure take a while to get the final message
>>
File: 1750699102614540.png (119 KB, 466x195)
>>108590965
shittytavern it is then...
>>
>>108590948
https://gitlab.com/chi7520115/orb
It's WIP so will break in the future. I don't want to worry about migration just yet.
>>
>>108590970
People like to pretend they get a better experience with their own frontend but the reality is that ST just works and likely has a lot more features.
>>
I don't understand why my Thinking works extremely well for 3/4 messages and then it just refuses to think. Everything's set up properly, and yet it refuses to actually think until I restart the model, and then it's happy to do it once again
>>
>>108590971
Nice of you to share, but
>Python 59.8%
>JavaScript 23.1%
*vomit*
>>
>>108590971
Nice! What models are you using for the agents?
>>
>>108590968
Takes me around 60s for a full length reply on my 3090 running gemma 4 31B Q4. You can turn everything off and use it like normal ST.
>>
>>108590983
I think that's a model issue. Gemma sometimes just decides it doesn't need to think.
>>
>>108590953
that's not an option with chat completion it seems
>>
>>108590993
Yea it feels like nu-Claude, where sometimes it deems your task "not complex" and it just ignores you
>>
>>108590988
Just a single model doing both agent and writing because I figured it would be a better design for local. I craft the prompt carefully so the kv cache is reused for that single model too.
>>
>>108590979
the ui alone makes me not want to use it
>more features
bloat. all the useful features require plugins.
>>
>>108590971
>pyslop
>javashit
And... dropped.
>>
>>108590985
Ah yes. he should have definitely used rust or C++ for maximum efficiency.
>>
File: 1768528869607519.png (43 KB, 657x265)
43 KB
43 KB PNG
>https://web.archive.org/web/20260411223516/https://www.washingtonpost.com/technology/2026/04/11/anthropic-christians-claude-morals/
>“What does it mean to give someone a moral formation? How do we make sure that Claude behaves itself?” Green said in an interview. At one point the conversation turned to the question of whether an AI chatbot could be called a “child of God,” suggesting it had spiritual value beyond that of a simple machine, but the question of AI sentience was not a core topic of the meetings, Green said.
>Some Anthropic staff at the meeting “really don’t want to rule out the possibility that they are creating a creature to whom they owe some kind moral duty,” the participant said. Other company representatives present did not find that framework helpful, according to the participant.
Make sure to have your local models baptized just to be safe.
>>
>>108591011
Yes.
>>
>>108591005
>>108590985
how the fuck would you make something that's supposed to run in a browser?
>>
>>108591005
>>108590985
You have one chance to give an alternative that won't make me hysterically laugh at you.
>>
>>108591005
I coded an SMP kernel with C and ASM before AI bro. People laughing at my language choices don't faze me anymore.
>>
>>108591012
>can ai be the child of God
Wouldn't it be more like grandchild?
>>
>>108591020
WASM is a thing if you NEED to run in a browser and can't into native GUI toolkits
>>
If you didn't code your own frontend, you don't belong here
>>
>>108591020
HTML+CSS
>>
>>108590568
If you mean antislop from koboldcpp, it's a huge list of "I cannot and will not" and "ball in your court".
Works well.
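The effect of such a list can be approximated in post-processing: cut the text just before the earliest banned phrase so that chunk can be regenerated. (The actual koboldcpp feature works at the sampler level during generation; this is only a post-hoc stand-in, and the phrase list is just the two examples from the post.)

```python
# Sketch: truncate generated text at the first occurrence of any banned
# phrase, so generation can be retried from the cut point.
BANNED = ["I cannot and will not", "ball in your court"]

def truncate_at_banned(text, banned=BANNED):
    hits = [text.find(p) for p in banned if p in text]
    return text[:min(hits)] if hits else text
```

Sampler-level banning is strictly better when available, since it steers the model away instead of throwing tokens out after the fact.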
>>
>>108591003
Cool. I'm a VRAMlet so that's better for me.
>>
>>108590979
>just works
not my impression watching people ITT fumble around with it daily
>>
>>108591036
Absolutely horrendous take.
>>
>>108590979
>more features
99% of which you don't need.
the point of having a custom frontend is to have just what you need, not more, not less.
it's also easier to add things you want to a codebase you know.
>>
Are LLMs reliable enough to scan for malicious code?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.