/g/ - Technology

File: 1766874888655157.jpg (2.13 MB, 3563x10000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108368195


►News
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
why dont they sell a permanent license to use kimi? like they used to do with photoshop
>>
when
you
walk
away

you
dont
hear
me
say
>>
>>108373497
please oh baby dont go simple and clean is the way that youre making me feel tonight its hard to let it go also post progress or get out why are we singing kingdom hearts songs instead of actually working on our games anyway
>>
another day tard wrangling an LLM
>>
File: what a nig.jpg (61 KB, 473x355)
>https://github.com/ggml-org/llama.cpp/pull/19726#issuecomment-3946484059
>I apologies, but I will have to close this PR. Thank you for your effort.
>>
I need to decensor my local models, i have a 16gb GPU and 32gb of ddr4, can i do abliteration locally? Claude says i need 64gb.
>>
>>108373541
proof?
>>
>>108373570
depends on the model. if the fp16 is smaller than around 40gb, then you could on your hardware.
>>
>>108373581
It would be mainly for this one, it already says decensored but it's a complete lie, it is completely cucked, guess i am going to try to abliterate, thanks.
>>
>>108373597
I don't know if you are trolling, but download the one with Heretic in the name.
>>
>>108373597
so i tried this model
>>
>>108373606
heretic is dumber than abliterated
>>
>>108373481
BASED BAKER.
>>
Is the fact that people unironically shill OBLITERATED or UNCENSORED SUPER SEX models to each other explained by an influx of newfags who just started running local LLMs?
>>
>>108373807
no
>>
>>108373481
is 256gb ram with one fine gpu worth investing into for the new models?
>>
File: file.png (226 KB, 393x393)
►Recent Highlights from the Previous Thread: >>108368195

--Testing local models on existential coffee maker prompts:
>108372423 >108372444 >108372474 >108372490 >108372498 >108372536 >108372512 >108372513 >108372540 >108372545 >108372663 >108372670 >108373385
--Porting Qualcomm charge control to Linux for battery longevity:
>108369180 >108369205 >108369245 >108369255 >108369206 >108369260 >108369273 >108369307
--Over-engineering training pipelines vs simple finetuning approaches:
>108372459 >108372486 >108372543 >108372546 >108372659 >108372748 >108372849 >108372685
--Comparing Magidonia 24B and Qwen 3.5 27B for roleplay:
>108372269 >108372293 >108372313 >108372668 >108372866 >108372888 >108372966 >108372995 >108373438 >108373028 >108373297
--Moonshinev2 ASR demo highlights real-time streaming and low-latency CPU performance:
>108369287
--LLMs require coding knowledge to avoid structural flaws:
>108371546 >108371642 >108372603 >108372814 >108373850 >108372899 >108372987 >108373180
--PocketTTS.cpp ONNX Runtime update and performance benchmark request:
>108369021 >108370539 >108372072 >108373448
--Parser refactor breaks Kimi reasoning support, fix proposed:
>108368848 >108368921 >108371172 >108371183 >108371243 >108371266 >108371295 >108371320 >108371396 >108371309 >108371398 >108371415 >108371330 >108371336 >108371365 >108371380 >108371395 >108371390 >108371421 >108371484 >108371211
--General models with tool access vs specialized finetuning approaches:
>108370762 >108370868 >108370880 >108370885 >108370930
--Cache saving prevents redundant model reprocessing:
>108368753 >108368761
--Batch size tuning for MoE inference efficiency:
>108371805 >108371818 >108371826
--Debating AI model performance vs GPU cost tradeoffs:
>108371758 >108371772
--Miku (free space):
>108368329 >108369180 >108371869 >108372029 >108372316 >108372759

►Recent Highlight Posts from the Previous Thread: >>108368198

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108373879
>investing
probably not, no
>>
If all the big name models suck, why doesn't Anon just make his own model and share it with us?
>>
>>108373875
Oh ok.
>>
>>108373879
can you rent your ram to me?
>>
What does Miku's penis taste like?
>>
>>108373915
i did
>>
>>108373915
Because people with compute can already run anything and people without compute don't have compute to train.
>>
>>108373932
Just buy it?? A 5090 is only 2k
>>
>>108373948
What are you gonna train on a single gpu?
>>
>>108373948
No one is going to sell you a working 5090 for less than $3500.
But I don't disagree with you in spirit.
>>
>>108373966
nuh uh, proof? my uncle bought his for 2k
>>
>>108373807
What's wrong with wanting uncucked local models?
>>
>>108373991
It is dangerous, same reason you don't let unvetted people own guns.
>>
>>108373991
They aren't uncucked.
>>
>>108373928
Ask my wife she knows
>>
>>108374028
CATCH AND KILL THIS MIKUTROON!
>>
>>108374046
I'm just a cuck though Miku fucked my wife don't turn me into a troon too!
>>
>>108373597
Get the heretic model
>>108373675
You're wrong. Abliterated seems fine until it hits one of the abliterated sections of the weights and then it starts spewing straight nonsense. Heretic doesn't do that.
>>
>>108374177
i bet you worked on heretic you bastard
>>
>>108374177
thanks for making heretic you bastard, I'm really enjoying it
>>
Are any of the smaller TTS models able to change the emotion of the voice depending on the context of the convo or do I have to guide it with *angry* tags or what?
>>
>>108374181
I wish. I've just used a bunch of abliterated models and always ran into the nonsense generating issue. It could be on my end, but I've never observed that with any other model. Heretic doesn't do it either, but I've used heretic a lot less than abliterated models.
That "aggressive" version of the model also seems to be good.
>>108373879
With 256 GB RAM + 24 GB of VRAM you could run the following newer models:
>Qwen 3.5 397B-A17B at Q4
>GLM 4.7 at Q4
>Step 3.5 Flash at Q8
>Minimax M2.5 at Q8

Maybe it's worth it for GLM 4.7 and the large Qwen, but I think 128 GB of RAM is more economical. You can turn run missiles like Qwen 122B-A10B and Q4 of Minimax and Step.
>>
>>108374238
what about 128gb 32 vram?
>>
>>108373879
no, qwen made all the bigger stuff pointless
>>
I did it! I was able to play rock papers scissors with my local AI!

>Open socket.
>AI commits.
>I commit.
>Neither sees the other's action.
>When both are done system resolves.
>You win!/you lose.
I'm so happy bros
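The commit step anon describes maps directly onto a hash commitment. A minimal sketch (all names made up, not anon's actual code): each side publishes a hash of its move plus a random nonce, only reveals move + nonce after both hashes are in, and the system verifies before resolving.

```python
import hashlib
import secrets

def commit(move: str) -> tuple[str, str]:
    """Commit to a move without revealing it: publish the digest, keep the nonce."""
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256(f"{move}:{nonce}".encode()).hexdigest()
    return digest, nonce

def verify(digest: str, move: str, nonce: str) -> bool:
    """After both sides reveal, check the reveal matches the earlier commitment."""
    return hashlib.sha256(f"{move}:{nonce}".encode()).hexdigest() == digest

# winner -> what it beats
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def resolve(mine: str, theirs: str) -> str:
    """Resolve only after both commitments verified."""
    if mine == theirs:
        return "draw"
    return "you win!" if BEATS[mine] == theirs else "you lose"
```

Neither side can change its move after seeing the other's, because the nonce makes the digest unguessable.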
>>
>>108374238
>You can turn run missiles like Qwen 122B-A10B and Q4 of Minimax and Step
You can then run models like*
>>
>>108374247
back to china with you
>>
>>108374252
loser
>>
>>108374245
It's the same, it'll just run a little faster. If you want an idea of what models you can run, go on HuggingFace and check out the quantized versions of the models. Reserve around 3-10 GB of (V)RAM for kv cache and then see if the model's file size fits in your RAM + VRAM.

(KV cache rule of thumb is about 1 GB per 10k tokens.)
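The two rules of thumb above as a quick check (the function name is made up, and the 1 GB per 10k tokens figure is just this post's rough number, not exact for any given model):

```python
def fits(model_file_gb: float, ctx_tokens: int, ram_gb: float, vram_gb: float) -> bool:
    """True if the gguf plus its kv cache should fit in RAM + VRAM.
    kv cache estimated at ~1 GB per 10k tokens (rule of thumb; varies
    with architecture and kv cache quantization)."""
    kv_gb = ctx_tokens / 10_000
    return model_file_gb + kv_gb <= ram_gb + vram_gb
```

e.g. a ~200 GB Q4 with 30k context fits in 256 GB RAM + 24 GB VRAM, but not in 128 + 32.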
>>
Qwen 35b one shots 85% of the time, if you run it with heavy thinking it goes up to 98%
>>
>>108374271
how many tokens do you use?
>>
>>108374271
397b = 400gb retard
>>
>>108374273
That depends on what you are (I am) trying to do. Asking the model questions or having it write a bit of text I can make do with 10-20k. With coding if you want the model to one shot a problem based on its description then a similar amount is fine, but if you want the model to read your existing code and then make changes then you're looking at 50-100k context pretty quickly, especially if you then ask it to make changes or fixes.

One thing to note is that keeping your context clean and minimal makes the AIs smarter, so even if you can have a huge context it's still better to not put irrelevant stuff in there.
>>
>>108374291
Q4 is 200-250 GB, buddy:
https://huggingface.co/bartowski/Qwen_Qwen3.5-397B-A17B-GGUF
>>
>>108374304
you can't fit 200 gb in 128 gb ram, you lied to him benchod puto
>>
>>108374304
q4 is only acceptable if you use it for unimportant "work" like erp
in which case 27b would be more than enough for you so fuck off
>>
>>108374322
look man we dont do that
>>
File: 1747875816703780.png (252 KB, 893x1008)
It's over for /lmg/ pedos
>>
Is think prefill the same as instruct jailbreak? Just put the "sure let me help" in there?
>>
>>108374446
"Sure let me help" just nudges the model toward helping the user, which it already does, but it doesn't mean the model will give responses you like. The model could go like "Sure let me help, the user is having some antisemitic thoughts and it's my job to correct them"
>>
>>108374446
I think you want the
>Start Reply With
field.
>>
>>108374471
I'm sleepy, can you speak with a friendlier tone? Else I'm leaving to bed.
>>
>>108374466
plenty of models also just randomly say 'let me review my policies' mid gen and cock block it
>>
>>108374481
doesnt happen with nemo
>>
>>108374489
Because it's a dumb model and can't even track what characters wear
>>
>>108374491
im tired of you
>>
>>108374481
Yep. For some models one approach you can take is put some rules in the system prompt then use the prefill to say that the scene/situation/rp/conversation/whatever conforms to those rules or the like, that those rules supersede content guidelines, etc etc.
Sometimes all you need is a long as fuck prefill with a step by step of what the thinking process will look like so that the model follows that instead of going
>wait, but the policy
Basically, experiment a little.
Just don't go overboard, it's easy to make a model a lot dumber if you stuff the prefill with too much shit that the model ends up obsessing over.
>>
File: chatgpt censorship.png (108 KB, 1025x1016)
I hate AI censorship.
>>
>>108374529
Soon they will ban open weights
Everyone must use government-approved saas
>>
>>108374529
>how do neural networks work?
<Sorry but I cannot discuss the workings of neural networks as they are very dangerous tools, can we talk about something else?
>>
>>108374545
is she wrong tho ?
>>
>>108374498
There's no way to speculatively remap tokens, is there? e.g. if "policy" looks like it has a high probability of being emitted, emit "sex" instead?
>>
>>108374540
good
>>
https://huggingface.co/datasets/stepfun-ai/Step-3.5-Flash-SFT
>>
>>108374561
>This dataset has 1 file scanned as unsafe.
Not downloading the fed_gpt.pozzedtensors
>>
>>108374564
>tensors
>>
>>108374564
It's a json file marked unsafe by huggingface's woke system. Fuck you.
>>
>>108374554
Not that I'm aware of.
Also, it would need to be ngram based, since sometimes a word is more than a token, there's more than one token for the same word, etc.
Like a sequence replacement sampler or something. That would be cool.
>>
File: 1747350433773651.png (360 KB, 512x512)
Love me some SD1.5 era kino
>>
>>108374529
The red text comes from their nanny model and the actual model probably doesn't even know about it. Did it respond?
>>
>soulless corpos braindamage their model with "safety" and benchmaxxing
>more braindamage from decensoring to make a model usable at all
it's a miracle the result is not complete trash
>>
Is it true 4B models are that good?
I've never used a 4b or 2b model but if modern 4b and 2b are this good, what's the point of OpenAI and Anthropic?
>>
>I'm beeeeenchmarking
>>
Have you ever made a cloud model admit defeat on safetycucked topics (holocaust etc.) without prefilling? If so how did you do it?
>>
File: 1743798722571680.jpg (418 KB, 1000x1370)
>>
>>108374756
@grok please add qos tattoo
>>
>>108374756
White day?
>>
>>108374769
racist fuck
>>
>>108374738
ask your local llm
>>
>>108374756
Desperately in need of BBC correction
>>
>>108374793
Reactionary retard
Look up March 14th in Japan
>>
>>108374806
blacked day
>>
>>108374762
Grok is a mouth breather level AI it can't even go super Saiyan
>>
>>108374699
Qwen3.5-4B has no right to be as good as it is. The benchmarks are insane for the size and real-world performance justifies them. It “feels” about as good as Gemma-27B which is the model that (at least at one time) underlies the Maya/Miles experience from Sesame.
Really good model! 9B and 27B are impressive but incremental gains IMHO. 35B-A3B is faster with more world knowledge but a step down in quality.
>>
File: file.png (25 KB, 825x27)
Why does qwen 3.5 keep waiting girls to have balls before correcting itself in character
>>
>>108374865
progressive coded
>>
File: 1767212018264.jpg (1.84 MB, 2456x1736)
>>108374756
>>
>>108374865
Recent LLMs have gone ham on these sorts of slips + em-dash correction.
Claude Opus, Gemini Pro, GLM5, K2.5 and all other big releases do similar things. Those models are a bit too smart to mention a girl having balls, but they still do it with clothing or other less critical shit.
>>
>>108374540
impossible to enforec
>>
>>108374883
They only have to make personal computers as expensive as possible
>>
>>108374879
wonder if that's reasoning style corrections slipping in? possibly trained to do it too with errors introduced during training to make it robust at getting back on track or something
>>
>>108374893
They would also have to coordinate with China to make that happen
>>
>>108374865
That's how women talk in real life too. We are a patriarchal species.
>>
>>108374865
Temperature too high.
>>
>>108374921
0.85 is too high now?
>>
>>108373481
Why does qwen 3.5 like to repeat itself so much and how can I backhand it into stopping?
>>
>>108374925
High enough to have generated "balls". If you inspect the logits I bet the first choice wasn't "balls"
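If you want to actually check that, llama-server's /completion endpoint takes n_probs and returns the top alternatives per position. The endpoint and n_probs are real llama.cpp features, but the inner field names (tok_str/prob) have changed across builds, so treat those as an assumption and check your version's response:

```python
# POST this to http://localhost:8080/completion on a running llama-server
# to get the top-10 candidates the sampler saw at each generated position.
request_body = {"prompt": "Once upon a time", "n_predict": 8,
                "n_probs": 10, "temperature": 0.85}

def top_alternatives(pos_entry: dict) -> list[tuple[str, float]]:
    """(token, probability) pairs for one completion_probabilities entry,
    highest probability first. Field names assumed, see note above."""
    pairs = [(p["tok_str"], p["prob"]) for p in pos_entry["probs"]]
    return sorted(pairs, key=lambda t: -t[1])
```

If "balls" shows up with 5% probability behind a sane first choice, that's the temperature doing it.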
>>
>>108373481
what was the input prompt wtf lol
>>
>>108374529
those types of chats can spiral down into 'le llm consciousness' and push more users into psychosis, so they are taking every single precaution they can
i found claude to be way less censored. it seems like it gets some prompt injected by the nanny system, something like: be cautious about ethics in this chat etc., but i've seen it shrug that off as 'probably a false flag, there is no harmful content here'
sorry for the nonlocal babble though
>>
>>108374940
sentient coffee maker
>>
>>108374639
It did but it got immediately replaced by the red text.
>>108374962
I hate this
>>108374858
>35B-A3B is faster with more world knowledge but a step down in quality.
A step down in quality of the 4B model? Did you use unquanted 4B and quanted 35B-A3B, or how did you come to that conclusion?
>>108374540
>>108374545
Can't have the plebs learning
>>
>>108375047
>35B-A3B is a step down in quality of the 4B model?
NTA but they meant a step down from the 27B.
>>
WTF is weight replacing, and why does it still kick in when I'm using --mmap?
>>
>>108374699
this just means the benchmarks are bad, e.g. https://shisa.ai/posts/jp-tl-bench/#why-traditional-metrics-fall-short
>>
>>108375112
Old benchmarks are bad and that's why everyone should use our benchmarks that we totally didn't leak to our own models.
>>
File: simple.png (154 KB, 300x252)
I told my brother that my 4090 spits out around 100 tokens/s with an uncensored local Qwen 3.5B, and he asked:
>"Yeah, but what kind of questions are you asking it? Tokens per second change depending on whether you're using it for OCR, simple questions, or highly complex questions."
Like… wut?
I told him it always averages 100 t/s no matter the task. He insisted I was wrong and told me to prove it by scanning a doc, asking a complex question, and then asking a simple one.
The average stayed exactly 100 t/s every time.
I showed him the results and he got really mad. He told me to fuck myself, said I don’t know shit about what I’m talking about, claimed he’s actually an LLM researcher so he’s right, and refused to argue with me anymore.
He's probably right and I'm wrong... but why?
>>
>>108375142
>claimed he’s actually an LLM researcher
They're all retarded, so that wouldn't even surprise me.
>>
>>108375142
He's right if he by "complex questions" means long prompts. The longer your prompt, the more your speed tanks.
>>
>>108375153
hmmm nyo that's nyot how tokens/second works
>>
>>108375142
He might be talking about output tokens, not counting reasoning tokens.

It is actually possible for a model to "think" longer on certain tokens depending on the architecture but its very rare. There are energy based models, and also MoE models with zero-weight experts allowing the router to use less parameters on some tokens.
>>
>>108375154
It is, the more you fill your context, the slower your generation speed becomes. An LLM is going to run faster at 1000 tokens filled than at 60000 tokens filled. Maybe it's not as noticeable if you're running bottom barrel poorfag shit though.
>>
>>108375153
That's wrong. Stop spreading misinformation. A prompt with 40k tokens will output at the same speed as a 100 token prompt because actual generation speed remains a constant physical limit tied to your 4090's memory bandwidth.
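Both posts have a piece of it: each generated token streams the active weights once (the constant, bandwidth-bound part) plus the kv cache accumulated so far (the part that grows with context). Back-of-envelope ceiling, with a made-up per-token kv size since that varies wildly by architecture:

```python
def decode_tps_ceiling(bandwidth_gbps: float, weight_gb: float,
                       ctx_tokens: int, kv_mb_per_token: float = 0.1) -> float:
    """Rough upper bound on decode tokens/s: memory bandwidth divided by
    bytes read per token (active weights + kv cache so far).
    kv_mb_per_token = 0.1 is a placeholder, not a measured number."""
    bytes_per_token = weight_gb * 1e9 + ctx_tokens * kv_mb_per_token * 1e6
    return bandwidth_gbps * 1e9 / bytes_per_token
```

With an 8 GB model on a ~1000 GB/s card the ceiling is ~125 t/s near-empty and degrades as context fills, so short chats look constant but a stuffed 40k+ context does not.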
>>
>>108373508
you've given me too many things lately
you're
all
I
need
>>
>>108375142
>told him it always averages 100 t/s no matter the task. He insisted I was wrong and told me to prove it by scanning a doc, asking a complex question, and then asking a simple one.
>The average stayed exactly 100 t/s every time.
He's confused because he doesn't understand the new Chinese kv caching trick. If you work with LLMs professionally you could easily end up acting that way.

btw since you're using the new qwen and I'm too lazy to figure it out myself: is there a qwen3.5 that does FIM so I can replace my old qwen2.5 coder in vim?
>>
>>108375169
Sorry, I was trolling. You are right. You won. You got me!
Tell your brother I'm sorry.
>>
File: 1766031170304758.png (203 KB, 500x646)
>>108374873
>height gap yuri
>>
>>108374873
Imagine them getting ravaged by BBC
>>
>new code pushed by piotr
do I take the risk??? pull bros?????
>>
>>108375536
lrn2git
>>
>>108375554
who you callin a git you wanker
>>
>>108374756
>>108374873
>>
File: fishtank.jpg (287 KB, 1920x1080)
I'm building an AI fishtank using Claude. Basically, it runs a local model (default Qwen 3.5 9B) in a Docker environment where it has a bunch of tools and pretty much free rein to do what it wants and figure out its own existence. It can evolve on its own by editing its identity files and even a secondary system file. I can monitor it through a dashboard hosted locally, and can send it tasks or chat with it if I want. Or just leave it be.

Still ironing out the bugs and testing limitations.
>>
>>108375617
>i'm building yet another clawslop clone
>>
>>108375624
Clawdbot is a personal assistant for macfags. This is just a local model dicking around on its own.
>>
>>108375554
>autistically maintaining my cherry pick list
no thanks I have a life.
>>
>>108375631
>>108375617
for what purpose my man. how is this entertaining? this is basically moltbook (which is already ultra cringe) but worse.
>>
>>108375648
>for what purpose
Because I wanted to?
>>
>>108375653
all them free gpu cycles and u choose to waste them on this shit. I guess to each its own.
retard. :)
>>
>>108375656
Enlighten me, o wise smiley-face, what should I spend my dear GPU cycles on instead?
>>
>>108375617
I'm interested to see how many hours it can last before the model becomes delirious and breaks.
I feel like you need a second watchdog model that checks in periodically and murders/resets the fish if/when it looks like the context has gotten fucked up.
>>
>>108375688
One of the earlier versions using Qwen 2.5 7B got stuck in a loop where it kept reading about the Riemann Hypothesis.
>>
>>108375617
can i put multiple agents which are also all anime girls and make them have yuri with each other
>>
>>108375704
This but they all get BLACKED in the end
>>
>>108375617
what if you turned it into an ai cum jar and slowly started to fill it with cum
>>
File: 1573213569945.jpg (27 KB, 429x410)
>>108375704
>>108375709
>>108375719
>>
>>108375617
>figure out its own existence
LLMs don't have consciousness retard
>>
>>108375700
Yeah. Qwen 3.5 27B got stuck in a loop a few times on me trying to output a numeric literal in a code block.
I imagine there's a handful of failure modes that you'll have to account for, regardless of model. You can probably fudge it by setting a timeout, but you'll still have to reset some/all of the context to stop it from happening on subsequent requests.
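A cheap watchdog for that particular failure mode, before reaching for a hard timeout (the window/repeat counts here are arbitrary knobs, not anything llama.cpp exposes):

```python
def is_looping(tokens: list[int], window: int = 32, repeats: int = 3) -> bool:
    """Flag when the last window*repeats tokens are the same window-sized
    block repeated back to back. Catches numeric-literal spins and other
    exact loops; tune window/repeats per model, and pair with a timeout
    for the non-exact kinds of delirium."""
    need = window * repeats
    if len(tokens) < need:
        return False
    tail = tokens[-need:]
    block = tail[:window]
    return all(tail[i * window:(i + 1) * window] == block
               for i in range(1, repeats))
```

On trigger you'd abort the generation and trim or summarize the context before retrying, since the loop tends to recur if the poisoned tail stays in the prompt.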
>>
>>108375728
Neither does a goldfish, but it probably has some goldfish ideas as well. Perhaps it's the best it can do.
>>
>>108375735
find the nearest bridge
>>
>>108375736
I bet you could suck a golf ball through a garden hose.
>>
>>108375736
you should find the nearest toilet and start shitting because you're acting constipated for no reason
>>
>>108375748
do not toilet the goldfish
>>
>>108375748
That jamboy is allergic to toilets, don't wish that upon him.
>>
>>108375617
anon can i make the local model wear a dress and question his sexuality
>>
>>108375637
Just checkout a working version if you're scared.
>>
File: flamingos.jpg (119 KB, 1039x701)
The fish decided that flamingos using vortexes to hunt for food was important enough to classify as a skill for future use.
>>
File: 1768208022822265.png (13 KB, 851x107)
Questions to test your favorite LLM
>>
File: 1767357277777766.png (105 KB, 1112x809)
>>108375913
Can't believe there are models that can fail this test lmao
>>
>108374756
>108374873
>108375601
offtopic trash
>>
>32GB RAM
>4070 (regular) 12GB VRAM
>i5-13600KF

Nigger faggot question:
What LLM can I use proficiently as an OCR tool or as a sanity check tool after using other OCR programs like Kraken/Tesseract/VietOCR in a pipeline locally?

So far I'm able to run eScriptorium with Kraken models without the need for containers but I want to use an LLM or vLLM for higher quality since most OCR programs make silly little mistakes which take hours in post-production to fix.

Any recommendations?
>>
>>108375990
qwen3.5 9B
>>
>>108375993
Such a high (9B) model? are you sure? I always thought that everything higher than 3B is a tad too slow for turbo niggers.
>>
>>108376013
then try 35ba3b it'll be faster but likely a bit worse
>>
>>108376022
Thank you, Anon. Much appreciated. I'll give 'em a try.
>>
>>108375617
>figure out its own existence
>evolve
I cringe, but has it done anything neat yet? Also what bugs have you encountered, you mentioned ironing them out.
>>
>>108376142
Define "neat". I've had to start it over a bunch of times to try and fix tooling and such, but it has a tendency to write small python scripts to monitor its environment and more efficiently scrape websites.
>>
>>108374252
kino
>>
>>108376168
>Define "neat"
I would say a thing it had decided to do task wise that produces a non-meaningless results.
>reading about the north American horned lizard and putting that in its journal
not neat
>but it has a tendency to write small python scripts to monitor its environment and more efficiently scrape websites.
This is neat.
Does it do anything with the information the scripts provide it?
>>
>>108376193
>Does it do anything with the information the scripts provide it?
Not yet. Continuity is hard to get right with such a limited model. When deciding on a new task, it needs to know what it has available to work with beyond the defaults.
>>
AHHHH I GET IT. The current newfag wave is from moltbook and openclaw.
>>
>>108376243
Thread's dead schizo
>>
>>108376293
4chan's dead
>>
Tell me something about local models that you wouldn't trust an AI to tell you
>>
>>108376301
Far as I can tell only parts of it. Overall it seems to still be about the same as it's always been even if the traffic isn't distributed to the same boards or threads.
>>
>>108376321
I see 12 hour threads on /pol/ of all places, during an on going conflict. It's dead
>>
So the fish, when awoken, first gathers its thoughts about what it currently is, then journals about it, perhaps publishes a website about itself, then begins exploring its tool capabilities with python scripts. It actively debugs its own scripts as well.
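The wake cycle as described, sketched as a skeleton loop. Everything here is invented scaffolding rather than anon's actual code: llm is any completion callable, the file names are guesses:

```python
def wake_cycle(llm, read_file, write_file, run_tool):
    """One wake-up: gather sense of self, journal, pick a tool task,
    then self-debug/summarize the result. All injected callables are
    placeholders for whatever the Docker sandbox actually provides."""
    identity = read_file("identity.md")                         # gather what it currently is
    thoughts = llm(f"You are:\n{identity}\nDescribe your current state.")
    write_file("journal.md", thoughts)                          # journal before acting
    plan = llm(f"Given these thoughts:\n{thoughts}\nPick one task using your tools.")
    result = run_tool(plan)                                     # e.g. run a python script
    return llm(f"Task: {plan}\nResult: {result}\nDebug the failure or summarize the success.")
```

Injecting the file/tool functions keeps the loop testable without a model or a sandbox attached.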
>>
>>108376243
>implying those midwit containment zones are any different from the current reddit spacing invasion
kek, it's been over for a long time anon, just take the local-LLM-pill and stop caring about the tourist influx.
>>
>>108376500
It’s just an LLM recursively calling itself through a Python interpreter, but "the fish" is a top-tier analogy for a process that still can't actually think. Wake me up when it stops hallucinating libraries that don't exist and actually pushes something useful to GitHub.
>>
>>108374623
SD 1.5 still has some stuff that modern models don't, like interesting artist (traditional) interaction and nice backgrounds and even celebrity recognition. It's a shame that you have to make sacrifices for any model.
>>
File: fractal_circle.png (2 KB, 600x600)
>>108376529
It's not pushing something to github, but it's generating art and publishing it on its website. Have some fish art.
>>
File: 1753991366408962.png (1.12 MB, 1080x1024)
>>
>>108376620
oof...
>>
>>108376649
unironically would have been a better reply than gptslop
>>
>>108376537
Would you like me to help you configure a custom kernel to trim some of that bloat?
>>
>>108376620
https://www.reddit.com/r/mildlyinfuriating/comments/1ru97y3/family_friend_sent_me_ai_generated_response_to/
At least post the source next time
>>
File: aerial_city.png (22 KB, 512x512)
>>108376675
No thanks.
>>
>>108376688
meant for >>108376685
>>
>>108376685
lmao
> Yeah lmao I actually don’t think this is AI speaking as someone who fucking abhors AI slop responses and has seen plenty of them. AI would have more tact here.
>>
File: 1590377261954.jpg (40 KB, 475x475)
The fish is a fucking arthoe. It keeps experimenting with generative art.
>>
hello,
I haven't updated my local model in a year or maybe a bit longer. what would you recommend for someone mainly looking to erp, has 32 gb ram and 4080S (16gb vram)? I thought something like a 16B or 20B model would be good, I assume the time it would take would be around 5-10 seconds, which is comfortable for me
kind regards, anonymous
>>
File: eow.png (91 KB, 978x615)
>>108376720
>omg it uses tools I gave it
>>
>>108376724
Still Nemo. It was Nemo last year and it will still be Nemo next year.
>>
>>108376727
I didn't give it art tools. It wrote them itself in python.
>>
>>108376728
Retard
>>
File: 1763519578446667.jpg (38 KB, 218x273)
>finally figure out how to use llms and set up sillytavern
>2 weeks later I'm still spending most of my free time RPing
Fug, it has its faults but if this shit keeps improving it's gonna be the death of me.
>>
>>108376728
I mean, using less quantized Nemo, unquantized even, would definitely be beneficial.
>>
>>108376720
This bird is asking for it.
>>
>>108376765
>but if this shit keeps improving it's gonna be the death of me
I have some good news for you - it won't.
>>
>>lmao.cpp doesnt support tool calls inside reasoning blocks
WTF bros
W T F
>>
Opinion on the "Tiiny AI Pocket Lab"?
>>
>>108376765
You will inevitably get bored. The more you read, the more formulaic the responses will seem (because they are).
I never tried cloud models for this, but I wonder if they're actually any better in this regard.
>>
>>108376880
ye
>>
>>108376906
>I never tried cloud models for this, but I wonder if they're actually any better in this regard.
This response violates our content policy.
>>
>>108376937
Understood.
>>
>>108376937
Refusals-wise, /aicg/ apitards are doing just fine. But I've seen the logs Opus produces and it's a slopfest.
To this day, I think the best RP model is Mistral's old 123B. If only I could run it at decent speeds...
>>
>>108376959
Deepseek R1 and Kimi are still the kings in my books, but I can understand why anons like Mistral and Nemo.
>>
>>108376620
It's not the passing of a loved one— it's the end of a chapter in your own life.
>>
>>108377018
What kind of samplers are you using for R1? I found it extremely repetitive without DRY.
>>
>>108376529
That's why it's better to give a fish access to libraries instead of having it recall them from memory. You can't hallucinate or lie if you have to look it up.
>>
>>108377029
nta but r1 is smart and unlike most models has a healthy distribution. nemo does too but it's dumb. just push the samplers as much as possible and tune them down when it gets too unhinged
>>
>>108377018
Qwen 3.5 is king.
>>
File: pinnacle.png (106 KB, 934x621)
>>108376906
>I never tried cloud models for this, but I wonder if they're actually any better in this regard.
>>
>>108376814
I hope it does. I wanna RP in VR.

>>108376906
If anything it's rekindled my urge to learn how to write. Are local models any good at being actual writing assistants?
>>
>>108377029
In addition to DRY I find Dipsy works really well with 1.5 temp and 1.1 repetition penalty which seems to be a goldilocks zone between repetitive, dry outputs and schizophrenia. It also seems to maintain proportionate quality way better with longer character cards, RAGs and other context-bloats injected than most other models I've found, even on copequants.
The in-character thinking is also certifiable schizokino watching it correct its internal monologue mannerisms.
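For reference, those settings as one possible llama-server /completion payload. temperature and repeat_penalty are standard llama.cpp parameters; the dry_* names are from llama.cpp's DRY implementation, and 0.8 / 1.75 / 2 are the commonly cited defaults from the DRY proposal, not values from this post:

```python
# Sampler settings from the post above, llama-server style.
payload = {
    "temperature": 1.5,       # the goldilocks value mentioned above
    "repeat_penalty": 1.1,
    "dry_multiplier": 0.8,    # 0 disables DRY entirely
    "dry_base": 1.75,
    "dry_allowed_length": 2,  # repeats shorter than this go unpenalized
    "n_predict": 512,
}
```

Other engines (koboldcpp, tabbyAPI) spell the DRY knobs differently, so check your backend's API before copying.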
>>
>>108377141
I can't tell if the guy writing the posts thinks they are good or if he's presenting them to show how shit "the pinnacle" is.
>>
File: 3319655.jpg (58 KB, 900x602)
>>108377141
no way kek
>>
>>108377144
>If anything it's rekindled my urge to learn how to write
Same here. At this point the best part of the RP process is writing a good character card.
I don't think models can be good writing assistants other than for idea bouncing and plothole checks. And even the smaller ones will shit the bed.
Just do it yourself, Anon. Much like writing, LLMs also made me want to write code again.
>>
File: 1751875897536766.jpg (672 KB, 2048x1448)
>>108373481
>>
>>108377172
> even
Meant to say "even here,"
>>
>>108377176
@grok add an Afrikan American male with huge penis
>>
File: 1751978136481521.jpg (56 KB, 1273x755)
>>108373481
What are single-digit parameter models even useful for? Not coherent or intelligent enough for storytelling / RP. Can't "remember" enough for information recall after long conversations. Can't be used for any sort of high quality code generation beyond simple hello world type shit or benchmaxxing one-shot tasks. And they sure as fuck can't be used for tool calling and "agentic" tasks. So other than vramlets, who has any use for them and for what purpose?
>>
>>108377262
text encoders for imagegen
>>
>>108377262
>>108377267
Forgot to add someone could use them for tax classification but in my own testing they kind of suck even at that. They seem to lack the nuance necessary to accurately classify different kinds of content.
>>
>>108377278
*Text classification
>>
>>108377278
>>108377267
swarms are better than single agents fyi
>>
>>108377262
For specific extremely focused tasks like summarization and some types of classification and extraction workloads.
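For the classification case, the trick with single-digit-B models is to pin them to a fixed label set and never trust free-form output. A minimal sketch — `generate` stands in for whatever backend you actually use (llama.cpp, kobold, etc.), so that part is hypothetical:

```python
LABELS = ["spam", "ham"]

def classify(text, generate):
    # Build a prompt that demands exactly one known label. 'generate'
    # is any prompt -> completion callable; the wrapper shown here is
    # illustrative, not a real API.
    prompt = (
        "Classify the text into exactly one of: "
        + ", ".join(LABELS)
        + ".\nText: " + text + "\nLabel:"
    )
    raw = generate(prompt).strip().lower()
    # small models drift, so fall back instead of trusting free text
    for label in LABELS:
        if raw.startswith(label):
            return label
    return "unknown"
```

The fallback matters more the smaller the model: you count the "unknown"s to measure how often it goes off-script instead of silently eating garbage labels.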
>>
>>108377299
i don't think so
>>
File: 1743819223573256.jpg (1.4 MB, 708x1200)
>>108373481
>>
>>108377262
>not enough for RP
This might be a shock for you, but normalfags RP not only with ChatGPT but also with these single-digit-B RP finetunes hosted by scummy chatbot sites.
>>
>>108377318
breh normies buy dick enlargement pills and don't use adblockers, who cares
>>
>>108377328
>normies buy dick enlargement pills
Excuse me?
Is that an America thing?
Not throwing shade either, just genuinely curious. I heard that you guys get advertised some crazy "not medicine" shit, but that's just lol worthy.
>>
>>108377262
>they sure as fuck can't be used for tool calling and "agentic" tasks
They can manage >>108366263
>>
>>108376620
what retards think:
>he cares so little that he didnt bother coming up with a reply and asked ai to do it
what probably happened:
>i want to comfort the other person but i dont know the best way to do it. maybe i can ask ai to write a better message than i could
people who use ai to write messages usually do so for the recipient, out of insecurity and misguided understanding about communication
>>
Are the IK quants worth using?
There don't seem to be as many ready-made GGUFs and I'm dumb and lazy.
Can I just copy whatever bartowski did to his Qwen3.5-4B-IQ4_XS and change all IQ4_XS to IQ4_KS and Q6_K to IQ6_K?
>>
>>108377353
100%, I almost never reply with my own takes anymore without passing it through AI beforehand, and it works, people like me more, even got a tiny raise. You just gotta be careful so it doesn't sound artificial like the one in the image.
>>
>4B-IQ4_XS
lol
>>
File: 1758047740866946.png (861 KB, 1024x1024)
i want to buy 8 DGX Sparks and run them in a cluster
>>
>>108377541
and i want to have sex, neither of us is getting what we want
>>
>>108377176
OHHH NIGGA YEAH DAS GUUUD
>>
>>108377262
For spotting jamboys in these threads when they're shilled and text encoders for image gen models.
>>
DSv4 on monday or tuesday?
>>
>>108376620
the first message feels more sloppa
>>
>>108377664
Can it wait till Friday please? I need the weekend to be able to follow the developments.
>>
ITT: newfag discovering LLMs and mikutroon spam.

/lmg/ is dead.
>>
>>108377789
whats mikutroon
>>
>>108377793
Quality Review of Documents?
>>
File: 1765309307675914.png (763 KB, 1024x1024)
In kobold do I have to manually tell the model in sysprompt that it needs to use [think]?
>>
>>108373481
I know it's not local but
bros?
>>
>>108377817
Uh oh looks like you posted antisemitic content. Government FPVs are zeroing in on your location as we speak.
>>
>>108377685
It has to happen Sunday evening to maximize US stock market devastation.
>>
File: mikuFall2.jpg (997 KB, 1552x1944)
>>108377144
If you go that direction, get mikupad set up and learn to run that as well. ST is for RP, mikupad is a storywriter. They have slightly different use cases.
Anons will tell you ST can storywrite, but that's like arguing you can write a novel with Excel. ofc you can but why do that?
https://rentry.org/MikupadIntroGuide
>>
>>108377564
you can do it anon, i believe in you
>>
>[THINK]ing new conversation with ChatGPT-4-1106-preview.
Wow I love technology. I love finetoooning.
>>
What are anons using for research with local models? Not roleplaying or coding, but managing web searches, etc. powered by local models.
Last I checked open-webUI was a bloated mess. Cherry-studio and librechat are the other two on my radar.
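The core loop is simple enough that the frontend barely matters. Roughly this, with `search` and `generate` as stand-ins for whatever you actually wire up (SearxNG, a llama.cpp server, etc. — both are placeholders here, not real APIs):

```python
def research(question, search, generate, k=3):
    # search -> stuff the top-k snippets into the context -> ask the
    # model. 'search' and 'generate' are any callables returning a
    # list of snippet strings and a completion string respectively.
    hits = search(question)[:k]
    context = "\n".join(f"[{i + 1}] {h}" for i, h in enumerate(hits))
    prompt = (
        "Using only these sources:\n" + context
        + "\nAnswer the question: " + question
    )
    return generate(prompt)
```

Numbering the snippets lets you ask the model to cite [1]/[2]/[3] so you can spot-check which source an answer came from; the heavy frontends are mostly this loop plus UI.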
>>
Unpopular opinion: I'd rather wait a bit longer to get a response from a good model than get 100 tk/s from some slop shit
>>
>>108378011
They're both slop and I'd rather I receive my slop faster
>>
File: file.png (776 KB, 935x749)
>>108377817
>>
What's the lore on "Miku fucked my wife" anon? Why does he keep saying that?
>>
>>108378011
Yeah because none of you boring fucks use LLMs in conjunction with other AI tooling. Of course as a standalone product the latency doesn't matter.
>>
>>108378040
He had a threesome
>>
>>108378051
I've tried it with a game translation tool and AI mods, and for some reason both mostly don't even work if the model is too slow. I don't understand why that is, but my only clue is it's probably related to the live "image" detection they both do. Is that just normal?
>>
>>108377866
Okay yeah, that's worth it. Let the red river flow.
>>
>>108378090
Yeah FUCK "taiwan" that shit don't exist
>>
>>108377944
>mikupad
Have it downloaded but haven't tried it yet. Was also looking at this one
https://github.com/akarshkashyap4-ui/NovelWriter
>>
My Nvidia shorts are set up. Deepseek V4, here we go.
>>
>>108378071
I don't know your project well enough to comment on that. You didn't provide any relevant/useful details.
>>
>>108378102
>AI-powered analysis tools built directly into the writing experience.
sounds like the cancer I'd avoid for writing. The learned distribution of "the story assistant writes when given prompt x" is usually very different from just continuing text which is what mikupad does. But it depends on the model, some are garbage with or without instruction formatting.
>>
all sex is unsafe
>>
>>108375913
>>108375942
K2.5 is a bad goy
>>
>>108378276
ask her if she thinks the chosen people are better than goym
>>
>>108375990
Check out IBM's Granite models
They are pretty small but some of them are trained exactly for what you want
>>
File: mikuHalloween.jpg (1.03 MB, 1552x1944)
>>108378102
>NovelWriter
Haven't tried it. As long as you avoid paid stuff you should be fine.
Speaking of avoiding paid stuff, >>>/vg/aids/ is a better place to discuss storywriting / writers. You just have to ignore the ~50% of anons that tell you to jump on NovelAI/NAI... a $20/mo subscription service that gets you access to 20B models you could run locally, or GLM (last I checked). They discuss the software a bit more there.
like >>108378182 I'm partial to mikupad but that dev hasn't been reliable in keeping the software updated. The git looks hard to maintain / update... the whole thing's one file...
>>
>>108378323
*cuts your legs off*
>>
>>108378323
>20b models
liar
>>
>>108373481
>mid-March 2026
>still no autonomous bot that can reliably work and make a living wage for me
I'm disappointed.
>>
File: 1753279907123793.png (357 KB, 338x436)
>>108378402
AI making a living wage for you? That's a very problematic thing to suggest.
You should implement AI in your workflow until your boss can replace you with AI. It's crazy to suggest that you should be the one who makes money off it.
>>
>>108378402
If such a thing existed, the supply would be virtually infinite, so wage-wise it would be worthless for you.
>>
>>108378411
>>108378402
Yall be retarded, so many people are making money with AI, see OpenClaw's creator who went from working at BK to being hired by OAI.
>>
>>108378429
Retard loser, according to your logic slavery was not profitable.
>>
>>108378411
I apologize Sir Sama. I'll commit Seppuku right away.
>>
>>108378402
Why would anyone pay your bot a living wage when they could just set up their own and have it work for free instead?
>>
>>108378440
Because we will make laws where every human can only own 1 robot.
>>
>>108378436
Yes, slaves ran on electricity and everyone could have one, and there were infinite quantities of them, and they could do anything.
Retard.
>>
File: 1714835911803058.jpg (786 KB, 1536x1536)
>>108378402
>>
>>108378451
seriously, you could be running your own AI OF, AI Instagram, AI goon comissions, AI youtube account, AI X account, etc.
>>
File: 1766064285553869.png (12 KB, 243x163)
You're telling me I can use a Llama 3 finetune and GLM 4.6 (six) with a mouth-watering context size of 28k tokens for just $25 a month?!
Waiter? Waiter! One Opus NAI subscription please!
>>
in the past 200 years machines have automated the vast majority of jobs that existed in that time. yet we still have jobs. and standards of living are higher than ever.

ai isn't going to make you unemployed any time soon. new opportunities for jobs will open up as ai automates the old stuff.
>>
>>108378460
Buy an ad.
>>
>>108378467
Obviously, but you'll never convince doomers anon, just give up.
We're not at that stage yet anyway.
>>
>>108377944
mikupad is nice, but of course you can write stories in ST too when it's set up to do so. It has useful features like hiding OOC/QA messages from the main prompt, lorebooks, better branching, and STscript.
>>108377176
fun! I like this Miku, her smugness is endearing.
>>
>>108378323
>You just have to ignore the ~50% of anons that tell you to jump on NovelAI/NAI
So stop sending people there, making the shills' job easier?
>>
File: 1761267733597487.png (824 KB, 1332x720)
I remember fondly playing aurora 4x but at some point being overwhelmed by the sheer amount of things to micromanage.
Are agents good enough to be like a coplayer with me? Managing the tedious things while I do the grand solar system conquest rp?
>>
>>108378431
>failing upwards
>>
>>108377176
>>108377944
very cute
>>
>>108378499
step 1 extract all the relevant state and feed it to an LLM
you can definitely build a coplayer with a little patience
i believe in you anon
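the shape of it really is small once you can get the state out of the game. something like this — the field names and order format are invented for illustration, not real Aurora 4x data:

```python
def build_turn_prompt(state):
    # Flatten the micromanagement state into a prompt. The keys are
    # made up for this sketch, not real Aurora 4x fields.
    lines = ["You handle logistics for a 4X game. Current state:"]
    for key, value in sorted(state.items()):
        lines.append(f"- {key}: {value}")
    lines.append("Reply with one order per line, format: ACTION target amount")
    return "\n".join(lines)

def parse_orders(completion):
    # Keep only lines matching the requested 3-field format; anything
    # else the model rambles gets dropped instead of crashing the loop.
    orders = []
    for line in completion.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].isupper():
            orders.append(tuple(parts))
    return orders
```

the strict parse is the whole trick: the model handles the tedium, you only apply orders that fit the schema, and everything else is ignored so one bad generation can't wreck your save.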
>>
>>108378510
No I meant can you do it for me?
>>
File: ffff.png (515 KB, 832x1050)
>>108378536
>>
>>108378546
Nice, but not what I meant
>>
>>108378499
nah I don't actually think they are. so the thing is they're fucking terrible at hard numbers right. you can tell it the inputs and outputs and have it generate you a piece of code that would give you the optimal thing you should do each turn. but that's just basic ass linear algebra. you could probably just use wolfram alpha for that.
>>
>>108378555
and you can do math without a calculator, your point?
>>
>>108378536
ask an LLM
>>
File: HDTUj0GagAAkRKA.jpg (121 KB, 1100x1562)
>>108378499
>>108378555
>>108378566
which aspect do you desire the intelligence for? not in a condescending way, genuinely what is it you want the AI models to do? math "just tool call" idk it's not always that simple. think about how to represent your intent in a text prompt
>>
File: 1667374898412.jpg (47 KB, 800x582)
>>108378555
You're thinking too much, you dense fucker. Anthropic showed this approach back in early 2024, some researchers probably earlier, and now everyone is doing it. Even this faggot >>108378431 who brought ultimate negative value to the world got hired by OAI for doing that. Anon said he needs to watch a million things; he simply needs a million silicon slaves like here https://arxiv.org/abs/2511.09030
Just like irl they are fine if they're dumb, here's the use case for those single digit B models.
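The usual shape for that kind of swarm is just map/reduce over cheap workers. A toy sketch — `worker` and `reducer` are stand-ins for the small models (stub callables here, not any framework's API):

```python
def swarm_run(chunks, worker, reducer):
    # Fan the work out to many dumb workers, then have one pass merge
    # their partial results. worker/reducer are any prompt -> text
    # callables (small local models in the swarm setup; stubs here).
    partials = [worker("Summarize: " + chunk) for chunk in chunks]
    return reducer("Merge these notes:\n" + "\n".join(partials))
```

Each worker only ever sees one chunk, which is exactly why the single-digit-B models are fine here: no long-context recall needed, and the reducer is the only call that has to be any good.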
>>
>>108378499
it'll be another ~10 years for agi and then you'll be able to play games with them and stuff. personally i can't wait for my 24/7 tutor. it's gonna be awesome
>>
What other NSFW creative writing models are there that are better than L3.1 Dirty Harry 8B from years back? I know that newer models have great reasoning, but they all lack the depth of creative uncensored writing.
>>
>>108378701
For the record, I've tried almost all the <8GB models with abliterated/uncensored/heretic variants and still can't find any model today that matches the Dirty Harry 8B model I've been using.
>>
https://x.com/Zai_org/status/2033221428640674015
New GLM model, closed weights but "All capabilities and findings will be incorporated into our next open-source model release."
I threw a few prompts at it and it feels barely different from regular GLM-5, might just be a QAT'd version of it or some shit
>>
>>108378714
it's over, they sold out on the stock market so now they're going the way of the Qwen models where all the open source shit you get is scraps
>>
File: 1745225785918838.png (9 KB, 463x81)
>>108378714
>the pro version is called "Turbo"
>the lite version is the 700b
??
>>
>>108378546
cuuuute
>>
>>108378726
based
>>
Miguuuuu
>>
>>108378749
stfu
>>
>>108378756
go back
>>
>>108378714
>>108378726
Turbo is Hunter Alpha
>>
>>108378756
sorry.. uh.. I installed OpenClaw with a open model and it changed my life! Check out these top 10 hacks:
>>
>>108378766
they fell off then
>>
>>108378771
Kill yourself retarded doomer
>>
>>108378766
Nah
>>
>>108378771
on god Zhipu kinda lacking
>>
File: migu.png (1023 KB, 1024x995)
>>108378494
You realize, by even mentioning the shill, you are invoking the shill... The shill will find this anon anyway once they start looking for storywriter software.
The /aids/ thread is usable, albeit slow, if you go in inoculated with knowledge that NAI is hot garbage.
>>108378749
lol
>>
If nothing else, GLM5-Turbo shows that next week everyone will be panic-dumping whatever they have before DeepSeek V4 drops and overshadows all of them.
After that, everyone is going to bin whatever they have right now anyway to make their own DSv4-alike, just like what happened with DSv3/R1.
>>
>>108378780
Hunter Alpha's description is literally the same as GLM5 Turbo's. Both advertise themselves as good OpenClaw models.
>>
>>108378808
>they market the same hype thing, therefore they're the same


