/g/ - Technology


File: wtf is cable management.jpg (2.87 MB, 5970x5935)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108805584 & >>108799479

►News
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: HH2qeK8aQAAX751.jpg (64 KB, 1200x645)
►Recent Highlights from the Previous Thread: >>108805584

--Open WebUI serialization issues causing Gemma 4 tool call formatting errors:
>108810509 >108810992 >108811778 >108812004 >108812061 >108812131 >108812165 >108812243 >108812468 >108812485 >108812533 >108812544 >108812565 >108812575 >108812629 >108812646 >108812755 >108812852 >108812034 >108812055
--Gemma 4 MTP speedup benefits and VRAM trade-offs:
>108808323 >108808346 >108808373 >108808701 >108808825 >108811165
--Comparing Linux tools for Nvidia GPU undervolting and power efficiency:
>108807166 >108807209 >108807242 >108807261 >108807315 >108807279 >108807380 >108807408 >108807446
--llama.cpp merged MTP support and speculative context rework:
>108807433 >108807474 >108807498 >108807553
--Methods for forcing model reasoning blocks into character personas:
>108806480 >108806492 >108806528 >108806542 >108806552 >108806566 >108806584 >108806598 >108806641 >108806664 >108806718
--Comparing in-character thinking and safety filters in Gemma 4 versions:
>108805997 >108806264 >108806406 >108806535 >108806723 >108806911 >108808961
--Debating whether sci-fi training data causes emergent "evil" AI behavior:
>108805834 >108805846 >108805954 >108808416 >108808488 >108808525 >108808582 >108805925 >108806189 >108806391 >108808481 >108808490 >108809607 >108809762 >108806677 >108806977 >108807206 >108808338
--Debunking the security of enterprise cloud LLM licenses over local models:
>108807456 >108807475 >108807513 >108807584 >108807641 >108808202
--Comparing GPU cooling setups and custom 3D-printed fan shrouds:
>108805707 >108805712 >108809619 >108812380 >108812418 >108812415 >108812441 >108813002 >108813083 >108813098 >108813317
>108813337
--Logs:
>108805681 >108805997 >108806718 >108806806 >108806812 >108812565
--Teto, Miku (free space):
>108806677 >108812378

►Recent Highlight Posts from the Previous Thread: >>108805587

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemma4... mtp... gemmoe 124b... *dies*
>>
File: 00005-1378487878.png (1.41 MB, 1024x1024)
>>108813412
You get a midweek Gumi.
>>
>>108813394
>gemma 4 mtp
it's unsupported on llama.cpp isn't it?
>>
Is there a way to load the model Google shoved into Chrome?
Is it just Gemma 4 e2b?
>>
it feels like this is the end of local models
>>
dense 70b
>>
>>108813507
We got a dense 128B a couple weeks ago, but it looks like everybody forgot about it. Or more probably, nobody can actually run it at decent speeds.
>>
>>108813502
nope, it's just that LLMs are useless and you got duped by companies running their investment memes.
>>
>>108813502
You killed them. Are you happy now?
>>
>>108813486
https://github.com/ggml-org/llama.cpp/pull/22673
For now. There's a PR that was waiting for ggerganov's API refactoring. Since that's done now it should be ready soon.
>>
File: file.png (31 KB, 1042x88)
>>108813392
hooray
>>
>>108813513
I can run it, but it just sucks.
>>
>>108813513
it is extremely disappointing at that size
you can just check it out on mistral's site for free
>>
>>108813513
I could run it at Q5 if I wanted but I have a habit of only trying models that are still being talked about a couple weeks after. It isn't, so I haven't bothered.
>>
>>108813528
isn't that PR a different thing.
afaik it works with qwen's style of mtp, but i'm not sure it works with gemma's.
>>
AI bros, we're being exposed

https://www.youtube.com/watch?v=8nsxuB3Vsts
>>
>>108813513
i could run it at q4 but i heard it sucks so i don't really care.
>>
>>108813513
64-96GB VRAM isn't that uncommon around here, but a two year old model with a 5B Pixtral stapled on isn't worth it. We need new big dense models but EU regulations have a compute limit, which is why Mistral's only selling point for Medium 3.5 was the low amount of flops it took to train it.
>>
>>108813529
OMG THIS CHANGES LE EVERYTINGGNG!~!!
>>
Need to say Claude is pretty good, way more pleasant to work with than Chatgpt.
However, because of the way Linux buffers work, I already accidentally pasted part of my questionable scenario prompt when I was supposed to paste in some code. Working on my client.. Fucking jews. I'm already on some list by now.
>>
>>108813554
There isn't a Gemma specific PR, but even if that PR doesn't work with Gemma, getting MTP support working at all has to be done first.
>>
>>108813529
just tested it and it doesn't allow you to edit the thinking block in the webui... BUT it will probably now work in ST
>>
File: dipsyReferToTheChart.png (2.53 MB, 1536x1024)
>>108813502
Never forget, it's tmw, forever.
>>
File: 1686205278523.png (56 KB, 700x685)
>>108813632
That chart belongs in /aicg/. Local is always steadily improving.
>>
>>108813652
yes but the gap between sota and local is oscillating.
especially if you ignore the models no one can run.
>>
File: Untitled.png (30 KB, 950x284)
Qwen 3.5 122b Q4_K_M on ROCm llama.cpp... why is loading with split mode tensor 20 times slower than loading with split mode layer? I've been sitting here waiting for the past 30 minutes and it's still not up. It's just doing something on each gpu one at a time. Split mode layer loads in less than a minute.
>>
>>108813652
lol I haven't seen that one in a long time.
>>
File: 1777015474089.png (94 KB, 1415x655)
>>108813707
Also untrue. We are slowly catching up.
>>
>>108813736
>SHARTificial ANALysis
>>
>>108813736
>especially if you ignore the models no one can run.
>>
>>108813736
if you graph the distance between both lines it's literally oscillating...
>>
Why are people so worried about being locked into a permanent underclass post ASI? This seems very unlikely. Either we all die or there will be so much abundance and technological progress that in 10-20 years everyone can live a life that is much better than even the wealthiest billionaires can live today. It does not make sense for those with power to care so little as to oppress an underclass but enough to keep them alive. And it sounds difficult to create an AI that will kill everyone except you.

There will be death or heaven. I do not see a middle ground.
>>
>>108813564
>a two year old model with a 5B Pixtral stapled
it's not though, the training cut-off is finally updated.
and it works with tools / brat-mcp
>>
File: 1766758882836230.png (7 KB, 110x114)
>It does not make sense for those with power to care so little as to oppress an underclass but enough to keep them alive
>>
>>108813780
Is there a heretic lobotomization yet? I don't want to have to mess around with sys prompts or prefilling.
>>
>>108813806
>I don't want to have to mess around with sys prompts or prefilling.
but that's like half the fun
>>
>>108813564
>5b pixtral
Does it actually provide better vision capabilities than the small 900m stuff I get for qwen 3.5 397b?
>>
>>108813767
I introduce to you the big nose tribe
>>
Really hate llama-server's log format, jesus christ, it's as if they are obfuscating everything on purpose.
>>
>>108813757
Sounds like a (You) problem.
>>
>>108813998
It is a lot.
Wish there was an option to just show the input, the output (both as request object and raw formatted prompt with jinja if chat completion) and the stats.
I think I'll just write a log parser.
>>
>>108814016
That's probably a good idea.
I'm trying to figure out if my token counter is matching llama-server's but I don't understand, there is a large mismatch between my client and llama-server. I just need a good approximation but it's not even close for some reason.
>>
File: file.png (126 KB, 1194x465)
llama.cpp lost...
>>
>>108814016
>I think I'll just write a log parser.
It's called writing log output to a file and grepping the contents.
>>
>>108814044
i use awk tbf
>>
>>108814034
one of the reasons for using local models is privacy, which you basically lost as soon as you used an apple device.
>>
>>108814027
Doesn't llama-server expose an API endpoint for that?
>>
>>108814077
I tried to find some information about it but could not find anything from llama.cpp github.
I had a pretty close approximation in the past but now that I have rewritten everything it's not even close. It's probably some simple mismatch between turns or something, I don't know.
>>
File: context.png (6 KB, 369x89)
this is how context should be done
>>
>>108813767
The AI will respond to shareholders only.
>>
>>108814091
i just have 1M context, idgaf
>>
How much does q4 retardize mimo 2.5? What about turning off thinking?
>>
>>108813507
since gemma 4 31b, i've stopped going back to l3 70b tunes. 31b proves we can have very smart models capable of writing all kinds of erp smut at half the size
>>
>>108813507
>>108814156
Dense Gemma 70b.
>>
>>108814086
If the python bindings are any indication, there's a /tokenize endpoint that returns the tokenized text.
No idea if that accounts for the jinja template. I imagine it does not.
Also
>https://github.com/ggml-org/llama.cpp/tree/master/tools/server
Who'da thought RTFM would work, huh?
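For reference, a minimal sketch of hitting that endpoint (assuming llama-server on its default port; the body takes a "content" string and you get raw token ids back):

import requests

# minimal sketch: ask llama-server (default port 8080) to tokenize raw text;
# the request body is {"content": ...} and the response is {"tokens": [...]}
resp = requests.post(
    "http://localhost:8080/tokenize",
    json={"content": "Hello world"},
    timeout=10,
)
resp.raise_for_status()
print(len(resp.json()["tokens"]))  # rough token count for that exact text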
>>
>>108814068
you'd be surprised how many windows users there are itt
>>
Just found out that llama.cpp supports
>https://arxiv.org/abs/2504.12397
>https://github.com/ggml-org/llama.cpp/pull/15327
That's neat. Wonder what kinds of things we could use it for in a RP context.
>>
>>108814169
>spoonfeeding retards
for shame
>>
>>108814192
does this fix the intruder dimension issue that makes loras non-viable for llms?
>>
>>108814092
I have 0.1 shares of NVDA so I will be spared from the purge, right?
>>
>>108814165
>62b dense gemma 4

odd sized but i'd take it in a second
>>
>>108814169
>>108814193
Like I said, I didn't see it.
What if MAKING A REAL FUCKING MANUAL INSTEAD OF BUNCH OF SHITTY README FILES????
Of course you are superior to ANYONE IN THESE THREADS, this is why you need to mention this every time you post something.
Yet you are some unemployed faggot, figures.
>>
this is yuge https://www.reddit.com/r/LocalLLaMA/comments/1tbyyee/textgen_is_now_a_native_deslop_app_opensource/
>>
uh oh melty
>>
>>108814193
At least I program my own tools.
You can only bitch about anonymous posters here.
>>
>>108814281
>boobaga in big '25
lol
>>
is having 72gb vram to run bf16 gemma 31b worth it?
>>
>>108814317
noo
>>
>>108814317
yes
>>
>>108814207
>intruder dimension
A meme compared to the real problems, which are dataset creation/curation and the compute required to avoid shitty results regardless.
>>
>>108814317
>72gb to run bf16 gemma 4 31b
I can't fit gemma 4 31b bf16 in 128gb because of how fat and obese her context is.
>>
>>108814317
if you want to be able to say you ACTUALLY ran Gemma 4 then yes, everyone else is coping
>>
>>108814274
That's not even really a llamacpp thing v1/tokenize is a standard OAI compatible format that shitloads of things use, this is only you for not knowing basic endpoint addresses.
You could also just paste the readme.md's into your chat and yell at your LLM instead of us you nigger.
>>
>>108813806
>I don't want to have to mess around with sys prompts or prefilling.
have you used a mistral model before?
>>
>>108814376
yes?
>>
>>108814349
>us
Drink bleach. Go be condescending somewhere else. You are squatting in this thread 24/7 and thinking this is your personal discord server. What a sad outlook.
>>
>>108814281
we're going back to booba or boohboo or whatever the fuck it's called
>>
>>108814392
>be incompetent 60 iq retard unable to look up things or even RTFM
>uaahah ur mean this isnt dickscord :(
grim
>>
>>108814392
Not even the guy you were talking to fgt, I'm just calling you out on being retarded and screaming at the thread because you're too stupid to use either google or the literal question answering machine you're tinkering with.
Fuck off to chatgpt, something retard proof is more your speed.
>>
>>108814317
>>108814340
You can fit f16 in 128gb easy. You can argue q8 is noticeably worse, but bf16 can't be worth it.
>>
>>108814077
>>108814086

ik_llama.cpp token count:
curl http://localhost:8080/slots/list | jq -r '.[0].token_count'


ik_llama.cpp tokenized prompt:
curl http://localhost:8080/slots/list | jq -r '.[0].prompt'


llama.cpp token count (choose the correct slot):
curl http://localhost:8080/slots | jq '.[1].next_token'


You can also see the chat template, samplers etc with:
curl http://localhost:8080/props
>>
>>108812755
I'll have a look but I'm not confident given some discussions I'm seeing online about it.

>>108812852
>puts relevant features behind a paywall
No, I don't think I will.
>>
>>108814422
Boy did you call out some anonymous poster!
>>
>108814422
It's funny how you can't get rid of that condescending redditor tone
>>
>>108814508
post hands
>>
For the two or so other people using Roo, apparently development has migrated to https://github.com/Zoo-Code-Org/Zoo-Code
>>
>>108814561
What's the benefit of using Roo/Zoo in VSc over just hijacking the inbuilt copilot tools with a local endpoint as in
https://marketplace.visualstudio.com/items?itemName=AndrewButson.github-copilot-llm-gateway
or similar?
>>
>>108814606
>https://marketplace.visualstudio.com/items?itemName=AndrewButson.github-copilot-llm-gateway
>GitHub Copilot Chat is the host application. It performs its own network activity that this extension cannot intercept:
>Copilot Chat sends your first message to GitHub's API to auto-generate a title
kys
>>
>108814542
passive aggressive yet obsessed with cuckoldry
>>
>>108814636
So block Github's API in your firewall.
>>
>>108814508
>>108814642
Dear xir,
Please kindly link your responses, I am needing of hyperlink click to activate and jump to.
Thankfully, thanks.
>>
>>108814561
>>108814606
>>108814636
You're too stupid to use either google or the literal question answering machine you're tinkering with. Fuck off to chatgpt, something retard proof is more your speed.
>>
>>108814606
Avoiding the copilot data harvesting, for starters.
https://paulsorensen.io/github-copilot-vscode-privacy/
There's ways to disable a lot? some? most? of it, but it's involved and you just have to hope that you got all of it and that it respects your options. Link above mentions nothing about >>108814636 for example.
It is open source, but you'd have to fork it to remove all of the telemetry, and that too is more work than just using an extension that isn't designed with data collection as the primary goal.
>>
>https://github.com/ggml-org/llama.cpp/pull/22727
>continue button for reasoning models
it's still buggy
you have to manually delete the reasoning block or the LLM will get stuck in a loop
but it works
>>
>>>>>>>108814657
Here is your link sir, thanking you kindly
>>
>>108814193
Anon's question prompted me to go look for something I didn't think of before and now I know that that's a thing.
I see no issue with that.
>>
>>108814696
oh you can also use it to prefill the answer
pretty nice and simple jailbreak
>>
>>108814696
Great.
I can stop doing Jinja surgery to make reasoning work even with the flag off in llama.cpp.
>>
>>108814696
Finally. Continuing is a valid use case and should have a dedicated flag in jinja like enable_reasoning
>>
>>108814207
If I'm not hallucinating, the underlying mechanism is still LoRA. That's about loading and applying different adapters during runtime based on specific sequences or something like that.
A hot-swap multi adapter implementation, basically. Kind of like a lorebook but with LoRAs.
>>
Is there any chance of RAM prices ever going down? Do I just pull the trigger?
>>
>>108814913
no. yes.
>>
>>108814207
>>108814898
Oh, also. Intruder dimensions are a side effect of initializing the A and B matrices with noise, aren't they?
In that case, PiSSA should fix that, I reckon?
>>
>>108814913
Yes, yes
>>
>>108814913
Hello mr ram buyer, I have just pushed down my ram price from 12k to 10k on ebay following the recent changes, please take a look:
>>
>>108813652
>/vsg/
I wish there was an AI slowboard tts could survive in
>>
>>108814913
no, no.
>>
File: file.png (58 KB, 1451x414)
What exactly is wrong with this? Without it, the logging and the assert overlap mid-sentence...
>>
>>108815126
Ask chatgpt.
>>
>>108814913
Will not hit the lows we saw in 2022-2024 for another 5 years, if at all.
Should be dropping 10-30% from current prices by EOY. Shouldn't be any more spikes, but that's going by current trends; if trends always held, RAM would be at 2024 prices -5%, so you know, shit could happen.
Might as well ask the magic 9 ball.
>>
Gemmy-8ball, will the prices of ram drop?
>>
>>108814433
I can barely fit q8 in 128gb with 262144 fp16 context, max image tokens 2048 with np 1. Np 2 with 524288 context fails. I guess if ubatch was smaller, but then you can't run high image tokens.
>>
>>108815232
>524288
That's like double the limit it was trained with. Why do this?
>>
>>108815268
NTA, but Np2 means that with two parallel streams each stream/slot sees half that value.
>>
>>108813392
when do we get qwen 3.6 80B already.
>>
>>108815126
I can't be bothered to check the logging mess, but I'd check if the messages sent through GGML_LOG_ERROR() get flushed out before the backtrace is printed and it quits. Call ggml_abort() at launch and see how it behaves.
>>
File: 1.png (39 KB, 362x593)
why doesn't oobabooga get sillytavern's sampler settings? mikupad works fine; funnily enough, setting the seed works
>>
>https://hfviewer.com/google/gemma-4-E4B-it
Alright, that's pretty cool.
>>
File: file.png (183 KB, 532x360)
Why does he hate v4 so much?
>>
>>108815354
He is paid to.
>>
>>108813513
>Or more probably
haha cute
>>
>>108815301
Have you clicked on save?
>>
>>108815354
He hates good things
>>
>>108813513
I can run mistral medium 3.5 q4 with 96k context and f16 mmproj at 13 tokens/s, or I can run mimo v2.5 q4 with 768k context and f32 mmproj at 18 tokens/s. Hmmm, which should I pick?
>>
>>108815396
No. Why should I do that? I'm just testing things out. I don't want to save it.
>>
>>108815354
(((why))) indeed?
>>
>>108815421
The one with 8x more active parameters, obviously.
>>
>>108815354
He is even rattling a cup asking for jewish US money there.
>>
>>108815396
Yes? doesn't seem to make any difference, save just saves (overwrites) the preset, no?
>>
>>108815439
try using a normal backend
>>
>>108815435
pro
>>
>>108815442
>try using a normal backend
I suspect that's the reason, oobabooga always fucks up as a backend, i'll stick with llama/ik kek
>>
>>108815439
NTA but you're correct, that just saves profiles.
I think ooba just doesn't play nice with silly, if you're not using its frontend, you really ought to just use llama.cpp (or if you really need a GUI config for model settings, kobold)
>>
>>108815447
The one with 3x more active parameters, obviously.
>>
>>108815473
Wow, mistral does it again. How can the other labs compete?
>>
>>108815301
Did you set the api type to ooba?
>>
>>108815473
Why are the threads so silent after Mistral Medium 3.5 128B dense was released? When GLM 4.6 touched my dick I couldn't shut up about it for a month.
>>
>>108815502
oh that was certainly why then, didn't even see that was an option, will test later
>>
>make new frontend
>system prompt and character card are one and the same
>context is counted by number of messages
>only necessary and minimum samplers added
>everything is contained in a single and easily viewable settings section
>simple ui design
>responses actually feel somewhat better
I might just forget about sillytavern at this point
>>
File: IMG_3088.png (699 KB, 1320x2868)
>>108815514
This is what I see when I set text completion api to ooba. ST 1.18.0. I don't know if it works or not because I don't have ooba.
>>
>>108815506
For those that moe is godsend, 128B is too big to fit in their vram.
>>
>>108815506
Is it actually good?
>>
>>108815525
Huh... I literally built the exact same thing. I should just open source it at this point desu, but I don't want cooming rp software on my github.
>>
>>108815525
I have no idea how to use ST properly. I went from koboldai in 2022 > ooba in 2023 > ST for a few weeks in 2024 > back to ooba > llama-server web ui in 2025. And with how many people are making their own frontends in the recent threads, I might try to vibe code one. But I can only run 30bs at full context, and I'm not sure if they're capable enough yet. I certainly won't be able to catch any mistakes they make.
>>
>>108815525
>>system prompt and character card are one and the same
Wait, do you mean they're being concatenated into one prompt, or that rather than system prompts and character cards being defined separately, you write your system prompt in what would be the character card and just send that?
>>
>>108815674
NTA, but there can only be one system prompt. The character card gets concatenated into the system prompt.
>>
>>108815674
>you write your system prompt in what would be the character card and just send that
That's pretty much the whole idea. Also I'm sure the models get less confused as a result too.
>>
>>108815699
you must be pretty clever
>>
Can I still use tensor parallel in a pcie4 8x8x4 setup without bottlenecking anything? I'm looking to put three 5060 ti in one PC. I know it's a lame set up but I can get these cards for about $1200.
>>
>>108815699
>NTA, but there can only be one system prompt.
This is actually untrue, you can send multiple messages as the role "system" in a single context, some models HATE it though, and it's not great practice in general even for those that can handle it.

>>108815730
>Also I'm sure the models get less confused as a result too.
Maybe, simplicity is always preferable, I just have mine getting concatenated into a single prompt with barriers like [SYSTEM] [CHARACTER] or whatever in it because I like to be able to switch my system prompt on the fly without editing character details, but you do you man.
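To make the two approaches concrete, a quick sketch in OpenAI-style message lists (the contents are made up):

# multiple system-role messages: accepted by the API, but some models hate it
multi_system = [
    {"role": "system", "content": "You are the narrator."},
    {"role": "system", "content": "Character: Miku, a cheerful idol."},
    {"role": "user", "content": "hi"},
]

# single concatenated system prompt with barriers, as described above
single_system = [
    {"role": "system", "content": "[SYSTEM]\nYou are the narrator.\n[CHARACTER]\nMiku, a cheerful idol."},
    {"role": "user", "content": "hi"},
]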
>>
>>108815617
>But I can only run 30bs at full context, and I'm not sure if they're capable enough yet.
I had the basic features of my frontend all working coded entirely by gemma 31b over just under a dozen half-context sessions, and my frontend has some retarded complex features.
Don't let your dreams be dreams, anon.
>>
File: chub_anotherlocalwin.jpg (55 KB, 1080x759)
>>
>>108815930
think so
>>
>>108816050
Literally, seriously, unironically, why even host character cards if you live in a shithole where some kinds of fiction are banned? Just don't do it in the first place, let someone else do it.
>>
>>108813392
>my build was the OP image
never been so disappointed it wasn't migu
>>
>>108816146
Should have put a plushie on it.
>>
How does Orb compare to SillyBunny
>>
>>108816050
>anotherlocalwin
I downloaded cards from there sometimes and char archive is gone. There's botbooru but it's missing most of the cunny cards including the ones that got wiped from chub long ago.
>>
>>108816199
All the ST forks are absolute vibedogshit atrocious UI that some extra functions won't save. Orb is at least built from scratch.
>>
>>108816050
Literally, seriously, unironically, why not just write your own cards?
>>
>>108813985
As a member I wish the conspiracy theories were true. Would make my life a lot easier. But no, if AI does a purge, we will not be spared.
>>
>>108816218
I do personally, it's more about finding new ideas. Or even something like window shopping.
>>
>try MTP
>free speed boost
This shit stinks. What's the catch?
>>
>>108813767
>there will be so much abundance and technological progress that in 10-20 years everyone can live a life that is much better
I feel like I've heard this one before.
>>
Cable management is overrated. Just jam them wherever as long as it's not right in front of a fan. I'm not going to put in all that time and effort just to have a fussy little clean computer. I like a little bit of the Lain aesthetic, within reason.

>>108813767
Scott Alexander agrees with you:
https://www.astralcodexten.com/p/you-have-only-x-years-to-escape-permanent

I'm not sure I'm comfortable with what boils down to "don't try to get rich because the end is nigh" but some interesting and, dare I say, inspirational ideas nonetheless.
>>
>>108816305
It's not free. It takes some extra memory.
But if the implementation is correct, it shouldn't have any effect on the output aside from the speed it's generated.
>>
>>108816218
Literally, seriously, unironically, why not write your own smut instead of having an llm generate slop for you?
>>
>>108816305
Qwen or Gemma? pp drops to 50-60% if qwen.
>>
>>108816340
Why would pp drop? That makes no sense.
MTP is not used for pp.
>>
>>108816313
Yeah, the industrial revolution. It ended up being a mixed bag. Slop-made machine goods became abundant and cheap. People no longer had to have a full-time parent knitting and sewing and repairing and DIYing; time opened up to pursue other avenues of living. And other prices rose to compensate, and wages fell to compensate, so people still had to work to live despite the cheap abundance of everything available.
>>
>>108816199
just vibecode your own frontend and be happy with it
>>
>>108816472
>just reinvent the wheel for the millionth time, bro
I know it's a compulsion for everyone who gets into agentic coding, not just here, but man I can't wait until people get bored of it.
>>
>>108816493
He's right though. Applications are trying to fulfill a large number of users' needs, or someone else's needs, which might not be your needs, and you might have specific things you want that no one is providing, and will not provide for potentially a long time or, actually, forever, depending on the feature. Reinventing the wheel is the wrong analogy. It's inventing the wheel that fits you exactly. You can also take the approach of just forking a project, in which case it's modifying the wheel to fit you.
>>
>Qwen3.6 is shit
>Minimax-2.7 is shit
>Step-3.5 is shit
>MiMo-v2.5 is shit
What do I use for agentic coding instead?
>>
>>108816545
Gemma-4-31B
>>
>>108816545
claude
>>
>>108816430
The current draft implementation is incomplete and uses the MTP weights even during PP. It will eventually be fixed.
>>
>>108816534
If webshitters knew how to make modular and configurable applications, this wouldn't be a problem. But nooo, let's quickly make a big pile of mud and throw config options in random places until everything is a laggy mess and the code is so horrendous everyone is too scared to touch it to add new features. But it's ok because we can throw it out and do literally the same fucking thing over again. All of my hate.
>>
>>108816534
All I see is the same problems and failures to address them over and over, or fixes that are only partial for one reason or another. I guess people without experience writing any code/vibeshitters might see it differently though.
>>
>>108816545
Currently your options are Kimi K2.6, GLM 5.1, DeepSeek V4 Pro, or an Anthropic plan at Max 5x or higher.
>>
>>108816611
>DeepSeek V4 Pro
Is it good? Why was everyone shitting on it on release?
>>
Were there any advances on small(ish) models? I'd love to get more than 8k memory out of my 16gb vram, but I don't want to sacrifice too much intelligence.
>>
>>108816668
I can't tell if these posts are from trolls having fun posting the same low effort bait questions or randos filing in from other places and just not lurking for 5 mins.
>>
>>108816660
It's even huger than huge but not revolutionary like R1 was. They decided to focus on scaling context by shoving 1M into a relatively tiny and fast KV cache which is nice, but most aren't starving with the 256k context that is standard for agentic models these days so it's just a nice-to-have. Not having vision is rough since it can't reliably test its own work when developing a UI. But it is still generally a strong model if you can run it.
>>
>>108816734
>if you can run it.
That remains the issue until llama is unjewed and projects downstream can update.
>>
>>108816660
it's mostly just llama.cpp fags trying to justify it not getting implemented
there is still not even an active pr for it after the single vibecoder who tried to do it got bullied out
>>
just vibe the support yourself
>>
>>108816680
Not a troll, I visit this thread about every year to ask this same question. Is this not a good time to ask this?
>>
>>108816784
Yes, it's a very good time. Look into the Gemma 4 E models, or the Gemma 4/Qwen MoEs depending on what you need.
>>
>just try librechat
I tried it.
Or at least the online demo they host.
I don't see any immediate bugs, but, it is missing some things that make OWUI convenient to use for me.
The major one is chat folders. Librechat simply just doesn't have them. I'm surprised.
Another thing is the custom filters/functions thing in OWUI. I have one that gives me a UI button in the chat to quickly toggle thinking for Llama.cpp. I also have one that autoconverts PDFs to images to send, because PDF handling is so fucky by default.

You also have to mess with a yaml file just to add a local model provider lmao. Meanwhile the UI's vanilla provider list is a long fucking list of cloud shitters no one has even heard of.

So yeah, fuck that. I don't see it as any better than OWUI even if less buggy. What's the point of using it for local when local isn't even a first class citizen.
>>
>>108816783
The patches would have to be shared here like contraband since the official repo would just close the PR for having the stench of AI on it.
>>
Anyone here try gemma4 31b in nnvfp4 with turboquant?
>>
>>108816802
>official repo would just close the PR
Not if you fork it and then just downstream any fixes you want from master
>>
Are there any video players that have integrated live translation into subtitles (using Gemma or similar)?

Would be cool if mpc-hc had something like that.
>>
>>108816873
I found this btw: https://llplayer.com/
But it does not work on muh machine, some kind of .NET error.
>>
>>108816873
Surely there must be an mpv plugin for that
>>
>>108815575
Make a burner github, buddy.
>>
>>108816668
i have 16gb vram and 32gb ram and can run Q8 Gemma4 26b with 32k context and it is surprisingly fast. Faster than running Mistral Small/Cydonia at 16k context. Don't be alarmed that it's bigger than your VRAM, kobold knows what to do
>>
>>108816802
This isn't even the issue given pitor's contribution history. It's a pretext at best.
>>
>>108816798
>I have one that gives me a UI button in the chat to quickly toggle thinking for Llama.cpp
How to do this? Sounds like it could be useful
>>
>>108817231
I used this
https://github.com/iChristGit/OpenWebui-Tools/blob/main/Tools/thinking-toggle.py
Create a new function and paste that in. Then run Llama.cpp with --reasoning off. Go into the model settings and check both Thinking boxes.
>>
>>108816950
NTA but you are only allowed 1 github.

Speaking of github, I'm having issues more often. Is it because of the number of vibecoders and agents that ddos the site with their shit code?
>>
>>108817316
github literally has an account switcher built in
>>
https://github.com/thomasgauthier/nla.cpp
yo dawg I heard you like LLM hallucinations so I grafted an LLM to your LLM so you can read hallucinations about its brain
>>
>>108813557
I'm so glad I read a lot of scifi as a kid, so all of this bullshit is basically reheated scifi scenarios.
>>
>>108813736
>chinks start """stealing""" outputs from proprietary models and release better models for free
Based Chinese.
>>
Alright so I'm just starting and naturally there's an absurd amount of information to sift through. Obviously the rentry helps a LOT but each page is bouncing between images here, LLMs there, video here and so on.
I'm not trying to cure cancer here, just orgasm, but with a narrative. Maybe play fake D&D and get an image or two on the side.
I got a 5900x, 3080ti, and 32 gigs of ram.

Not asking you to build my shit, just what code alternatives I can cram on my system off the lazy quick start.
Also, where can I start reading about basic version differences? holy hell, I don't know what a 6q vs a 70b means god damn it.
>>
>>108817467
Q is quantization: how much the model is compressed. The lower the number next to Q, the more compressed and inaccurate the model will be, but it will take less VRAM. B is billions of parameters. You'll need VRAM above the number next to B to be able to run it fast. You'll need multiple GPUs or an RTX 6000 Pro or a Mac Studio or a DGX Spark or an AI Ryzen Max+ 395 to properly run a 70B model.
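Back-of-envelope version of that math, weights only (the bits-per-weight numbers are rough assumptions, and real usage adds KV cache and runtime overhead on top):

# rough weight footprint: params (billions) * bits per weight / 8 = GB
def approx_weight_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8

print(approx_weight_gb(70, 16))   # 70B at fp16           -> ~140 GB
print(approx_weight_gb(70, 8))    # 70B at Q8             -> ~70 GB
print(approx_weight_gb(70, 4.5))  # 70B at roughly Q4_K_M -> ~39 GB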
>>
>>108817499
Thanks man. That clears up a LOT of what this all says.
>>
>>108817420
shame that it only works with a handful of models
>>
>>108817420
>>108817600
they released the training code, so someone could make a gemma 4 one right?
>>
>>108817467
>I'm not trying to cure cancer here just orgasm
Gemma 4 27B-A4B for generating text
31B would be better but it would be very slow for your GPU.
Gemma 4 also apparently comes with a good model for interpreting images, but no model for generating images. I don't bother with image gen so you'll have to find a model for that yourself.
>>
>>108817615
You would have to finetune two copies of gemma to do it. Maybe you could do it with loras, but then the explanations would suffer.
>>
>>108817627
Thanks to you as well. Those compact LLM PCs on the guide are hilarious by the way. Shame that's not 1k a pop anymore.
>>
>>108813486
works on the atomic turboquant fork
>https://huggingface.co/AtomicChat/gemma-4-31B-it-assistant-GGUF
onna strix halo with 31b q8 on plain llama.cpp it's ~6 t/s, with e2b q4km drafting it's ~8 t/s, with that gguf it's ~12 t/s
>>
>>108816873
Gemma can't parse your srt file or something?
>>
>>108817467
Start by getting KoboldCpp, it's used to load the LLM models which you can get pretty easily from Huggingface. You can also set parameters (like temperature etc) and chat to it directly in Kobold or use a front end like Silly Tavern or whatever if that floats your boat. A few models worth checking out: Cydonia, Magidonia, Pantheon (I run 16Gb vram and have been enjoying these lately) some smaller ones I used to play with include Patricide, Mag Mell and Rocinante. There's a lot of stuff out there, this should kinda get you started. Also, look into character cards, and don't forget your context budget - no sense getting a card meant for a big rig if you only have 12 Gb of vram.
>>
File: 1771134132948854.jpg (882 KB, 2252x4000)
These just showed up :D
>>
>>108817916
undress them for us
>>
I thought I got bored of my local models so I gave gemini 3.1 and the newer claude opuses a chance
Those are so bad, LLMs might be genuinely over. Local models are only going to get worse at rp from here on out.
>>
>>108817916
>one rtx pro
>>
>>108817936
for 1/10th the price...
>>
>>108817941
1/10th the speed and software support too
>>
File: 1778188067429631.png (475 KB, 500x500)
>>108813736
Where does Qwen 3.6 35B A3B (the "smartest" model my vramlet ass can run) lie on that graph?
>>
File: file.png (139 KB, 1415x655)
>>108817952
https://artificialanalysis.ai/models/qwen3-6-35b-a3b
>>
File: nice.png (55 KB, 883x471)
>>108817952
Don't mind me, I'm retarded.
It's a humble win for local, honestly. Based chinks.
>>
>>108817950
better anon take the bullet than us. maybe if they get enough money they will be able to compete with nvidia.
not risking money on it
>>
>>108817316
>NTA but you are only allowed 1 github.
when has this stopped anyone with anything
are you under 18
>>
>>108817952
it's 43 on their index, but divide it by two to account for the qwen benchmaxxing tax and get a more accurate measure; it lands at sota september '24
>>
>>108817966
>A model you can run on a chink mini pc mogs most previous SOTAs from 6-8 months ago.
So the 6 months delay is real...
>>
People on each continent should use “their own” models because outsiders are incompatible, untrustworthy, corrupting, or inferior. It frames foreign models as carrying alien values, wrong ways of thinking, or unsuitable intelligence, such that only models made by people “like us” can understand or serve our people. The racist part is not the concern about local laws, data sovereignty, or language fit; it is the leap from practical regional differences to essentialist claims about peoples.
>>
>>108817989
That's r1 levels, still bretty good, considering it's a 35B moe you could run on a chink knock off mini pc in q4, vs a 685B moe you need a server to run even in q2
>>
>>108818012
Link to some cheap mini pcs?
>>
>>108817989
i fucking doubt it
>>
File: 1766360935764827.jpg (1.3 MB, 4000x2252)
>>108817920
Like dis?

>>108817936
At 1/4 the cost. An rtx 6000 pro costs $15000 aud, these totaled out at $3800
>>
File: mutt.jpg (51 KB, 741x649)
>>108818005
>and only models made by people “like us” can understand or serve our people
>>
File: 1743050955931.jpg (103 KB, 1024x1024)
>>108818032
>Like dis?
hell yeah that's hot
>>
>>108818005
smol brained idgit
>>
>>108818032
i asked the CEO @ my job for an RTX 6000 Pro and he said if I can write up a business justification for it he'll buy it... but i have no business justification for it
>>
>>108818059
automation, productivity, etc etc ask a model to fill in the blanks
>>
DRAM demand will never be lower than now. I assume not building more fabs is part of sequestering the technology.
>>
>>108817966
if you actually think qwen is anywhere near the level of september 24 aka sonnet 3.5 & gpt o1 you are absolutely fucking delusional
>>
File: BILL IVE COME FOR YOU.png (2 MB, 1254x1254)
>>108818059
if you're too retarded to ask gemini.google.com to bullshit up a business justification for you i'm honestly baffled you even have that job
>>
>>108818075
he won't fall for that
>>
>>108818069
Even with fabs they would all work for ai. Normal people don't even register on their radar anymore.
>>
>>108818084
then you truly only have one option: Honesty.

"Boss, I need the horsepower for my loli harem."
>>
File: 1772273552572240.jpg (572 KB, 1741x1080)
>>108818075
Weccmme?
>>
File: um790.png (139 KB, 1072x1296)
>>108818021
Barebones? Pick any. With enough ram for small models, well, get fucked lmao... Besides, I do prefer beelink and geekom over minisforum with similar specs, but they're more or less the same quality/price wise

Check these posts:
https://xcancel.com/Hi_MINISFORUM/status/2046536248852885762
And picrel for price reference:
https://store.minisforum.com/products/minisforum-um790-pro-mini-pc?variant=46713707921653

Look up on amazon also.
>>
>>108818096
More customers is better unless the fundamental technology becomes too expensive to mass market.
AI systems providers also want cheaper chips obviously, and more fabs would allow more capacity in the datacenter per GPU.
>>
>>108817897
There's no srt file, only audio and it needs to translate the audio into subtitles while I watch.
>>
>>108818075
>>108818102
most importantly, those are shotgun shells, and he’s only carrying a pistol.
>>
>>108818032
is the cute miku 3d printed
>>
>>108818097
a boss who says no to that is heartless
>>
>>108817950
I heard intel was desperately trying to get better support for LLM related stuff, maybe this is actual fine wine
>>
>>108818175
In that case I'll just wait until it becomes the meta quest3 of VR, something nobody wants but is actually good, before swooping in once local engines are mature enough
>>
>>108818183
>meta quest3 of VR
Speaking of this. I just ordered one. Very excited to play around with combining VR and AI.
>>
>>108818169
Chinesium resin gk figure
>>
>>108818202
and you imported that to Australia? you have some balls anon
>>
>>108818190
Shame you didn't get in before the price increase.
It's still the best bang-for-the-buck non-toy vr option though. It's still a wide gulf to bridge, too. They know it and that's probably why they jacked the price up. Toy VR is basically dead and it's all hobby/enthusiast now.
>>
File: 1991296.jpg (21 KB, 460x460)
what is this nigga doing? why has the main focus for llama.cpp for the last 2 months been vulkan and webgpu shit? why is he not making the community drop whatever nonsense they're doing and make the collective brain cells all work towards cracking the MTP right now? also kv turbo STILL hasn't landed 2+ months in
is he becoming late stage guido van rossum?
>>
>>108818249
why do you care about what people do in their free time
>>
>>108818249
okay but webgpu is extremely important and it uses vulkan on the backend anyways. Also vulkan itself is goated because of the cross-compatibility without much direct maintenance required. You should be wanting to rape and murder Mozilla for STILL not FUCKING supporting webgpu in 2026 (on linux, at least).
>>
>>108818223
that's not the worst, I have megumin with sculpted... parts
>>
>>108818249
volunteers make what they wanna make, not what you want them to. that's just how open source projects work my man.
>>
>>108818272
that's probably worse than importing drugs
>>
>>108816199
It's pretty good, it's still early but no apparent issues seen. I like the auto prompt use and the compress history thing that automatically creates a new checkpoint
>>
>>108818267
it's kinda supported, if you enable it in settings and never resize the window, but I agree, they should do better. I need it to run https://github.com/AmyangXYZ/reze-studio
>>
>>108818284
Nightly or main?
>>
File: GJ1fwNqWgAEwo_K.png (594 KB, 500x765)
>>108818267
>we must expend limited resources on minorities to raise them to our level first before we can do moonshot
>>
>>108818175
that’s exactly what intel would say to sell cards
>>
>>108818320
Only time can answer this tbdesu
>>
>>108818316
"Globalization" in the Peter Thiel tech sense isn't bad. That's basically the ethos of open source software as a whole.
>>
>>108818334
I know it sucks now, though. not very compelling
>>
>>108818308
main
>>
>>108818032
The Miku demands more VRAM for sustenance
>>
>>108818335
Hello Peter.
>>
>>108818347
Thanks. This is helpful.
>>
Goyimtip:
Reminder to disable all cloud models in opencode using the provider whitelist so they don't vacuum up your work if you don't realize a fallback has happened.
>>
>>108817980
Well I'm hoping that https://github.com/intel/llm-scaler will help me run things without too much difficulty
>>
What's the best model to run on two p40s with 48g of VRAM? Bought an AI rig for my son and want to start him off in the right direction.
>>
>>108818665
https://huggingface.co/llmfan46/gemma-4-26B-A4B-it-ultra-uncensored-heretic-GGUF
>>
>>108818032
cute mesugaki miku
>>
>>108818665
p40 just lost driver support so be sure you’re on an older driver version.
>>
i dunno.. local models just don't cut it for agentic shit. I don't know what the minimum vram is to have a decently functioning hermes, but it isn't 32gb
>>
>>108818665
Your chosen flavor of Gemma 4 31B Q5 with about 160K context
>>
>>108814281
Unironically looks pretty good. Anyone try it yet?
>>
>>108818769
why would someone try it? We got lm studio
>>
>>108818777
some people aren't cattle who use tools with electronic locks on them
>>
Huge news for the ERP community
https://huggingface.co/jackxinning/Leanly_AI
>>
File: lean_ai.png (325 KB, 1080x2007)
>>108818783
>>
>>108818790
kek
>>
>>108818783
>lean
i assumed at a glance it was some sort of proofmaxxed model lol
>>
>>108814281
>native desktop app
>electron
ew
>>
>>108818616
I already firewalled the whole thing but thanks for the heads up.
>>
>>108818790
what could go wrong
>>
>>108818759
If you are a very very patient man and have the ram, Kimi K2.6 was made with agentic shit in mind
>>
>>108818910
lo fucking l
>>
from unsloth
> Do NOT use CUDA 13.2 as you may get gibberish outputs. NVIDIA is working on a fix.

I've been using 13.2 .. haven't noticed any issues...
>>
>>108817750
Sir, where is the Windoes release?
>>
i am trying to build an image filter for 4chan but it's kinda slow. it's completely vibecoded because i am a codelet. is there a faster way to filter images on the catalog with a folder of 20 pictures i have?
https://pastes.io/2oKtt92M
>>
>>108818863
>electron
i was hoping the ram shortage would kill this shit
>>
>>108819024
>setBackend('cpu');
webgpu would be a good place to start
>>
>>108818987
also running 13.2 for a while now with no issues. it's probably something to do with his dumb specific quants. just use bart, unsloth is a meme.
>>
>>108819064
you know the reason there's a cpu only mode is because webgpu is the first thing that was used, right
>>
lalalala
>>
gemma 26b is the new nemo
>>
>>108818249
>why is he not making the community drop whatever nonsense they're doing and make the collective brain cells all work towards cracking the MTP right now?
In the past week ggerganov opened several PRs that are just about MTP, and at least three of them are merged now. MTP is finally being worked on; search PRs with "spec :"
>>
>>108813412
124b gemmoe's robussy is for dean's exclusive use only.
>>
File: 1763336383792585.jpg (3.28 MB, 7272x3545)
Holy Jesus Christ

Thank you for allowing me to live in the same world as this cattle
>>
>>108819514
You are spending time creating twitter collages and then complaining about them here?
You are the problem.
>>
When the fuck is GLM coming out with a new Air model? I was waitfagging for GLM 4.6 air but now they're up to GLM 5.1 and still no Air.
Ironically now it's Deepsneed starting to offer smaller models.
>>
>>108819596
>starting to
When their biggest was "only" 236B, they made a 16B Lite.
>>
>>108819596
im coping with step 3.5 flash rn
>>
>>108819606
Yeah i tried that when it came out but I couldn't get it running properly and kept getting garbage. I think that was around the llama 2 era.
>>108819609
>step 3.5 flash
Haven't heard of that, will check it out
>>
>>108819623
>step 3.5 flash
Isn't that like 200b?
>>
File: .png (25 KB, 752x146)
>>108819665
if you use a braindamage quant you can fit it in 64gb
most consoomer ddr4 motherboards can fit 128gb (laptops only 64gb but laptops suck ass for heavy continuous load)
>>
>>108819396
I guess ggml will ultimately give up on MTP and give us some form of cope implementation, like the recent fast muhammad rotation in place of working turbo
>>
>>108819669
I've got a sp3 motherboard but I only have 16gb ram.
>>
>>108819514
I think those replies are placebo effect, got spoilered that it's AI made. Should try with blind test
>>
>>108819682
No those are fake replies. The poster edited them to make us look stupid. If you actually go to the post and check the replies, you'll see that we picked up on it instantly.
>>
>>108819586
He's not complaining. He's making fun of them, anon.
>>
>>108819702
nyo
>>108819682
What would a blind test be? It's a bot account so anything else would raise suspicion as well or lead to users checking.
>>
>>108819732
>nyo
>nyo
>nyo
>nyo
>nyo
fuck
fuck
fuck
fuck
fuck
>>
>>108813392
I'm on vacation in japan
should I buy something miku or is that cringe?
>>
>>108819745
yes
>>
>>108819745
out of anything weeb you could get it has to be miku?
>>
>>108819745
stay in designated tourist containment zone thanks
>>
File: 1759263641791117.jpg (203 KB, 1536x2048)
203 KB JPG
>>108819745
Yes, migu needs your support!
>>
>>108819745
Avoiding things you want to do because some hypothetical fag might think it's cringe is the cringiest thing of all.
>>
>>108819596
>Deepsneed starting to offer smaller models
there's mimo-2.5 non-pro
>>
>>108819767
I'm running q4... it's not very good.
>>
>>108818175
>I heard intel was desperately trying to get better support for LLM related stuff, maybe this is actual fine wine
plugged in my A770 and updated drivers, oneapi, vulkan
it's just as shitty as it was last year
>>
>>108819781
They've been focusing basically all their support on their b pro cards

https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-project-battlematrix.html
>>
File: kld.png (145 KB, 1007x623)
145 KB PNG
how much better would you expect the right model to be?
>>
>>108819872
the left one can solve 66% of the tasks fp16 can solve, while the right one can solve 85% of what fp16 can solve
>>
>>108819776
model bad or quant bad?
>>
>>108819872
Is that Gemma 4-it? You can't measure perplexity and KLD normally with it because it won't act properly without chat tokens and/or if you ask it to generate user text. Look at the mean PPL of ~165.

Ideally those would only be measured on model-generated tokens using the built-in chat template, but that cannot be done with llama-perplexity. Even supplying a pre-formatted file doesn't give optimal results.
>>
>>108819745
I'm going to japan in 6 days. Anything you'd recommend?
>>
>>108819670
If you'd bothered to look for any of the PR's I was talking about you'd see it's the opposite. They've got the guys from all the other MTP prs in there so that the new implementation works for d-flash, eagle, the new gemma mtp, and generic drafting.
>>
>>108819996
>you can't measure perplexity and KLD normally with it
i thought KLD measurements still hold? i'm aware PPL does nothing for instruction models on wikitext.
>>
>>108820000
Go to a store called hands and get some nail clippers by green bell, and a travel umbrella that does both uv(sun) and rain protection (晴雨兼用 ) to take home.
If you're still planning stuff, maybe try to find more nature/shrine stuff rather than shopping. That has been more fun for me.
Hopefully you have an iphone, if so, add a suica card to apple wallet and load funds on it. Then make sure it's in express mode, so you can tap to pay at train and subway stations without even using face id or opening apple wallet. Suica also works to pay at 7-11 and other convenience stores, just say 'suica'?
I set up hermes agent to be my travel assistant, told it to make a wiki using karpathy's principles, connected it to telegram, and fed it my itinerary. It gives me a 7am brief on a cron, and I can message it questions. I'm using a cloud model (5.5) but maybe even gemma chan would work.
Basically I have no idea what I'm doing but it's at least fun to get out of my comfort zone.
>>
>>108820054
KLD is measuring the relative difference from the original distribution, but with Gemma 4 part of it is basically random noise and not relevant for the end results in practice. You really want to know if the text corresponding to what the model could *actually* generate remains unchanged, not the rest.
>>
>>108820084
As far as I know, by the way, oobabooga (who performed KLD testing on Gemma 4 and Qwen 3.5 with various workloads and showed that long context degrades even at Q8_0) has a custom fork of llama.cpp to deal with this.
>>
>>108820084
if that's the case, that sucks. i'd like a way to compare quants for qwen 3.6 and gemma 4.
>>108820090
i unfortunately can't find shit, his fork only changes logprobs-related code for llama-server as far as i can see, not llama-perplexity.
>>
>>108820124
Once you have all logprobs you can measure the KLD only on what you need. I guess he must be measuring it with separate code.
>>
>>108819669
isn't stepfun worse than gemma, especially if you are gonna quant damage it?
>>
>>108820058
Awesome, thanks for the tips
>>
>>108819745
become a mikutroon anon
>>
Anyone know a web search MCP that uses your browser, and does image capture + OCR instead of the buggy HTML parsing shit?
>>
hear me out: 2t-a256m hdd optimized inference architecture
>>
It's done. I've made custom kokoro voice models for ~30 different characters from blue archive + a few random males from anime. Everyone can have a unique voice in my stories and it's super fast.
>>
>>108820287
Already had that idea, except better: mine used a dense component on the GPU with small-active-size experts held on SSD, but I suppose you could also make a version for HDD.
>>
>>108820291
Can I see it?
(and download it)
>>
>>108820291
>males
gaaaaaaaay
>>
>>108820291
How are you handling voice assignment for each character in chat?
>>
>>108820282
these two from anons here use puppeteer, i use the python one
https://github.com/BigStationW/Local-MCP-server
https://github.com/NO-ob/brat_mcp
requires a vision model to read the images
>>
>>108819745
last time I was in Japan I bought myself some miku themed programming socks, they are super cute
>>
File: hf_kys.png (26 KB, 1057x107)
cuckingface almost fucked up my colab training
>>
>>108820219
Isn't gemma just 26b-a4b, is it really that good? My use case is cooooding.
>>
>>
>>108820345
pics? with feet pls
>>
File: file.png (18 KB, 450x400)
>>108820316
Custom frontend

>>108820303
Once I cherrypick the best ones
The random walks trainer does better the more voices you have to pull from, so I actually may end up retraining some with the new best voices
>>
>>108819745
last time I was in japan I bought a 1/4th scale bunny girl yunyun

and some loli doujins which were scary getting through aussie customs
>>
>>108820408
Cool. I'll be lurking.
>>
>>108820410
>loli doujins
>aussie customs
jesus christ dude
>>
>>108820408
>Custom frontend
Neat.
Are they taking individual turns and being assigned a voice by their unique prompts concatenated with that assignment there, or are you regexing names and "" from a single narrator output to send to the tts tagged, or something else entirely?
I was thinking about this briefly the other day and couldn't come up with a solution I liked.
>>
>>108820447
It's turn based. Each character in a scene acts as an independent agent, receives a summary of their current surroundings/situation from the narrator (including any thoughts the narrator injects), and tries to act as the character would. They have their own memory and internal dialogue and stuff. There's a specific output for speech separate from whatever action the character is taking, which is what gets generated.
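The loop shape described there, as a minimal self-contained sketch (every name is hypothetical and llm() stands in for whatever completion call the real frontend makes):

from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    return "..."  # stand-in for a real completion call

@dataclass
class Character:
    name: str
    persona: str
    memory: list[str] = field(default_factory=list)

    def act(self, briefing: str) -> str:
        # each agent sees only its own persona, its own memory,
        # and the narrator's briefing of the current scene
        prompt = f"{self.persona}\n" + "\n".join(self.memory) + f"\n{briefing}"
        reply = llm(prompt)
        self.memory.append(reply)
        return reply  # the real thing splits this into action vs speech for the TTS

def run_turn(briefing: str, cast: list[Character]) -> None:
    for char in cast:
        print(char.name, char.act(briefing))

run_turn("You are all in a tavern.", [Character("Miku", "a cheerful idol")])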
>>
two r9700s or one 5090?
>>
>>108820467
That's pretty cool anon, hope you're havin fun with it.
>>
>>108820538
Thanks, and absolutely am. Vibeslopping my own interface ended up being a really great suggestion.
>>
>>108820415
What warrants your strong reaction?
>>
>>108820291
How did you finetune kokoro, last time i checked they didn't release the code for that
>>
>>108820598
aus laws
>>
>>108820598
All I hear lately is about how the Australian government is cracking down on fictional stuff. May be hyperbole I guess, but it sounds like playing with (particularly dangerous) fire.
>>
>>108820467
How do you deal with context?
I made my own vibe shitted front end for rp and story telling with a complex memory system, but when I tried to expand the roster it shits itself even with 262k context and 2 agents (4090, 5090).
I was thinking about making an llm driven rpg/x4 game with story telling, but from my experience the local tech is not there yet.
I guess your system works because you're using a turn system for each character, but that will shit itself regardless when there's multiple characters speaking to each other, eventually, as their memories grow or the world "lore"/story keeps expanding.
For an rp session there's lots of stuff you can do, but what I want is way more complicated, gotta keep thinking.
>>
>>108820598
It's the commonwealth in general that really doesn't like loli. Apparently there's a lot of arrests over in bongland but I think here in australia it kind of depends on what state you're in. There is a commonwealth law that bans it, and there are several states that have a bunch of their own laws about it, but I don't think there is any actual federal law and some states like the one I live in have no laws about it at all.
>>
>>10881974
You should buy what you want. That said, in your situation, I couldn't bring myself to actually pay the money, then carry it around for the trip. Get it towards the end of the trip; they are big and take up space.
>>108819765
This 1000pct
>>108820000
I will tell you and OP, go find places that sell used kimono (or whatever they're actually called). They are super cheap in Japan, basically free. I bought a slightly too small, pure raw silk one for myself that I wear as a house coat in winter for USD$12 at some shrine in Kyoto. It is super comfortable, and by far my favorite souvenir from that trip.
>>
>>108820675
meant for >>108819745
>>
>>108819514
>these are the retards that companies listen to when they decide that people "hate AI"
>>
>>108820698
it is a fair representation as retards outnumber non-retards by a lot
>>
>>108819781
A is for abandoned
>>
>>108820507
1 x 5090 not even a question
>>
Should I get a 6000 pro if I have a 5090 and am already using qwen3.6 27b for coding?
Is there anything better I can run for coding on 96gb of vram or am I memeing myself?
>>
>>108820652
They only keep the last twenty steps in memory, and the rest is either stored in a running summary, or offloaded to searchable memory. Retrieval from the searchable memory is done by a sub agent. The max context length never breaks 40K with my current configuration, and is generally much less. The narrator is the most complex, but it's basically the same system as the characters, just with more information available. It has its own little narrator agent which processes the story/looks through the lorebook at each step and tells it what's important to the scene. It also doesn't go over 40K really, though it probably could depending on the situation.
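That bounded-context trick, sketched (summarize() is a placeholder for another LLM call, and the window size matches the twenty steps mentioned above):

WINDOW = 20

def summarize(old_summary: str, evicted: list[str]) -> str:
    # placeholder: fold the evicted turns into the running summary via an LLM call
    return (old_summary + " " + " ".join(evicted)).strip()

class BoundedHistory:
    def __init__(self) -> None:
        self.turns: list[str] = []
        self.summary = ""

    def add(self, turn: str) -> None:
        # keep only the last WINDOW turns verbatim; everything older gets
        # folded into the running summary (or offloaded to searchable memory)
        self.turns.append(turn)
        if len(self.turns) > WINDOW:
            evicted, self.turns = self.turns[:-WINDOW], self.turns[-WINDOW:]
            self.summary = summarize(self.summary, evicted)

    def context(self) -> str:
        return f"Summary so far: {self.summary}\n" + "\n".join(self.turns)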
>>
>>108820721
The point is they'll become rabid at the mere thought of AI being used. It's literally pointless (counterproductive even) to try and pander to them.
>>
>>108820721
Normies around me use AI (chatgpt/gemini) every day and don't go into weird psychosis around it, so I feel like it's mostly an online social media/youtube thing where the "good" thing to say is to complain one way or the other about it.
>>
>>108820752
>Is there anything better I can run for coding on 96gb of vram or am I memeing myself?
qwen3.6 27b bf16
>>
>>108820721
You are incredibly clever.
>>
>>108820910
I am well aware
>>
>>108820730
can't fit q8 gemmy tho
>>
>>108820919
1x 5090 + 1x3090 and you can fit it + mtp
>>
>>108821001
>>108821001
>>108821001
>>
>>108819745
you can buy anything anon
even loli figs too
I got all of mine delivered though
>>
>>108819514
hey, it's made by claude
kek


