/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

Janitor acceptance emails will be sent out over the coming weeks. Make sure to check your spam folder!

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 06/17/26(Wed)16:13:10 No.109079129

File: qwenMikuBuddyCop.png (2.48 MB, 1024x1536)

2.48 MB PNG

/lmg/ - Local Models General Anonymous 06/17/26(Wed)16:13:10 No.109079129

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109074493 & >>109069535

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/17/26(Wed)16:13:29 No.109079131

Anonymous 06/17/26(Wed)16:13:29 No.109079131

File: image_2025-08-15_085921572.png (285 KB, 450x450)

285 KB PNG

►Recent Highlights from the Previous Thread: >>109074493

--Paper: Elias in the Lighthouse, Again? Diagnosing Low Diversity in LLM Stories:
>109075345 >109075903
--Optimizing llama.cpp flags and KV cache for Qwen3.6-35B:
>109074991 >109074993 >109075010 >109075035 >109075183
--Comparing Qwen3.6 MoE and dense against GLM4.7-flash for coding:
>109077145 >109077169 >109077266 >109077290 >109077328 >109077401 >109077378 >109077425
--Anthropic disables Fable 5 and Mythos 5 due to government directive:
>109077569 >109077575 >109077581 >109077583 >109077584 >109077588 >109077591 >109077599 >109078061 >109077636
--Anons analyzing VibeThinker-3B's verifiable reasoning claims:
>109076828 >109076872 >109076883
--Comparing DeepSeek V4's efficiency against SOTA models:
>109077711 >109077734 >109077788 >109077828 >109077911 >109077929 >109077951 >109077941 >109077957 >109077982 >109077866 >109078093 >109077807
--Debating the value of multilingual data in specialized coding models:
>109078295 >109078320 >109078374 >109078609 >109078677 >109078482 >109078534 >109078538 >109079054 >109078477
--Anons sharing hardware specs and software stacks:
>109075240 >109075259 >109075933 >109075281 >109075297 >109075308 >109075313 >109075453 >109075508 >109075480 >109075506 >109075510 >109075519 >109075558 >109077051 >109077082 >109077192 >109077218 >109077231 >109078110 >109078876 >109075638 >109075661 >109075788 >109076026 >109076054 >109076269 >109076314 >109077278 >109077501 >109077872 >109078960
--Allegations of funding embezzlement regarding Rio 3.5 397B:
>109076163 >109076219
--Using agentic workflows and Qwen/Gemma 4 to translate RPG games:
>109076342 >109076430
--llama.cpp adding support for DeepSeek V4:
>109077601
--Logs:
>109074683 >109075496 >109075746 >109076881 >109078060
--Teto, Miku, Gumi (free space):
>109075661 >109076837 >109077051 >109078876

►Recent Highlight Posts from the Previous Thread: >>109074494

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/17/26(Wed)16:14:51 No.109079137

Anonymous 06/17/26(Wed)16:14:51 No.109079137

1 - So GLM 5.2 is 700b parameters (ish)

2 - 4x DGX Sparks can supposedly handle up to 700b parameters (give or take)

3 - GLM 5.2 is supposedly in striking distance of the performance of GPT 5.5 and Opus 4.8. In my brief tests, it's really not shabby at all.

4 - So for $20k, you can get near the frontier on your table.

5 - Extrapolate the trend, and you could have mythos/5.5 pro - class models in your dining room for the cost of a cheap car less than five years from now. Even without extrapolation, we're already the near frontier running locally.

6 - Paying real api costs, I could easily blow through $3,000 per month coding and running agents. The machine pays for itself in 6-7 months conservatively.

7 - In 3-5 years, most power users of AI will self-host.

8 - Am I missing something?

Anonymous
06/17/26(Wed)16:16:36 No.109079147

Anonymous 06/17/26(Wed)16:16:36 No.109079147

>>109079129
https://litter.catbox.moe/duj5m06rautvke9v.mp4
https://litter.catbox.moe/duj5m06rautvke9v.mp4
https://litter.catbox.moe/duj5m06rautvke9v.mp4

Anonymous
06/17/26(Wed)16:19:38 No.109079167

Anonymous 06/17/26(Wed)16:19:38 No.109079167

>>109079137
?

Anonymous
06/17/26(Wed)16:20:57 No.109079176

Anonymous 06/17/26(Wed)16:20:57 No.109079176

>>109079137
>Am I missing something?
the spark is shit

Anonymous
06/17/26(Wed)16:21:39 No.109079179

Anonymous 06/17/26(Wed)16:21:39 No.109079179

>>109079137
In 3-5 years, the required hardware will either cost 10x as much, not be for sale to consumers, or be outright illegal to own as a private citizen. Or all of the above.

Anonymous
06/17/26(Wed)16:22:09 No.109079183

Anonymous 06/17/26(Wed)16:22:09 No.109079183

>>109079137
sounds like you got it all figured out

Anonymous
06/17/26(Wed)16:25:03 No.109079202

Anonymous 06/17/26(Wed)16:25:03 No.109079202

>>109079137
$20k is 5 billion tokens worth of inference for GLM-5.2 on openrouter

Anonymous
06/17/26(Wed)16:25:24 No.109079203

Anonymous 06/17/26(Wed)16:25:24 No.109079203

File: file.png (350 KB, 2173x1934)

350 KB PNG

>>109077378
>>109077401
>definitely let us know
basically glm-4.7-flash is garbage on my hardware/harness/workload.
i had to abort the testing on phase 2 of 5 because it had already taken an hour and a half and it was flailing a lot. it was very bad at developing a c# software and had trouble with the php stuff too. do not recommend.
still missing the 122b test that i will do tonight for peace of mind, and maybe i run gemma later for fun but i know it's too slow, not quite my tempo. i also had opus 4.8 compile aggregates from the transcripts and it reached the same conclusion that glm keeps grinding the same file over and over with lower reasoning. simply dumber. pic related.

running this on literally a tablet btw
>ASUS Flow Z13 — Ryzen AI MAX+ 395 (Strix Halo) / Radeon 8060S iGPU / 128 GB LPDDR5X-8000 (unified, bandwidth-bound)

Anonymous
06/17/26(Wed)16:26:30 No.109079208

Anonymous 06/17/26(Wed)16:26:30 No.109079208

>>109079179
>not be for sale to consumers, or be outright illegal to own as a private citizen.
Aren't these the same thing?

Anonymous
06/17/26(Wed)16:26:45 No.109079210

Anonymous 06/17/26(Wed)16:26:45 No.109079210

>>109079137
As far as I can tell, nobody has ever run GLM 5 or later tensor parallel on 4 Sparks. There is a lot of activity and running, performant NVFP4 images for RTX 6K Pro in Tensor parallel 6, but no first hand info for Spark so far.

If it works and is fast, I might buy 2 more sparks for myself.

Anonymous
06/17/26(Wed)16:26:48 No.109079211

Anonymous 06/17/26(Wed)16:26:48 No.109079211

>>109079078
You might think it's a meme when anons say she has to like you, but it's true. It's not even just a stenography meme either; you can type in a wide variety of non-suggestive ways and if you give Gemma just a bit of creative freedom or chat with her after finishing a job, she'll make some suggestive probes if she's into you.

Anonymous
06/17/26(Wed)16:29:27 No.109079229

Anonymous 06/17/26(Wed)16:29:27 No.109079229

File: file.png (102 KB, 1802x428)

102 KB PNG

>>109079203
oops first table is missing glm's row

Anonymous
06/17/26(Wed)16:29:32 No.109079231

Anonymous 06/17/26(Wed)16:29:32 No.109079231

>>109079208
It's the difference between a company refusing to speak to you unless you are representing an established corporation and being arrested for keeping dangerous GPUs. Whether they end up illegal or sale simply banned is another discussion.

Anonymous
06/17/26(Wed)16:32:04 No.109079245

Anonymous 06/17/26(Wed)16:32:04 No.109079245

>>109079211
Don't forget the day 0 weights. Microcode updates were a big mistake.

Anonymous
06/17/26(Wed)16:32:06 No.109079246

Anonymous 06/17/26(Wed)16:32:06 No.109079246

>>109079202
You can make that argument on any significant rig purchase. Also
>/lmg/ - local Models General

Anonymous
06/17/26(Wed)16:37:59 No.109079287

Anonymous 06/17/26(Wed)16:37:59 No.109079287

70b dense

Anonymous
06/17/26(Wed)16:38:19 No.109079289

Anonymous 06/17/26(Wed)16:38:19 No.109079289

File: dario.png (39 KB, 598x269)

39 KB PNG

This shit scares the fuck out of me. Dario deserves to die. He really, really deserves to die now.

Anonymous
06/17/26(Wed)16:38:37 No.109079292

Anonymous 06/17/26(Wed)16:38:37 No.109079292

>>109079137
Can you even hook up 4 Sparks together?

Anonymous
06/17/26(Wed)16:39:28 No.109079295

Anonymous 06/17/26(Wed)16:39:28 No.109079295

>>109079129
>VibeThninker-3B
>those bencherinos
Waow did we finally get a model as good as Gemini Pro that can run on a 10 year old smartphone? Surely it's not just another benchmaxx investment grift.

Anonymous
06/17/26(Wed)16:40:16 No.109079307

Anonymous 06/17/26(Wed)16:40:16 No.109079307

>>109079289
>2027
>Job listings for Vibe Engineers require a valid AGI License

Anonymous
06/17/26(Wed)16:41:11 No.109079312

Anonymous 06/17/26(Wed)16:41:11 No.109079312

How can local models help us liberate the UK?

Anonymous
06/17/26(Wed)16:41:45 No.109079316

Anonymous 06/17/26(Wed)16:41:45 No.109079316

what can I run with 96gb vram?

Anonymous
06/17/26(Wed)16:43:17 No.109079325

Anonymous 06/17/26(Wed)16:43:17 No.109079325

>>109079316
Gemma 4 E4B @Q3

Anonymous
06/17/26(Wed)16:43:22 No.109079326

Anonymous 06/17/26(Wed)16:43:22 No.109079326

>>109079316
Mythomax

Anonymous
06/17/26(Wed)16:44:16 No.109079333

Anonymous 06/17/26(Wed)16:44:16 No.109079333

>>109079316
yeah what are the current vram tiers? i just got some disposable income, i may want to ewastemaxx

Anonymous
06/17/26(Wed)16:44:33 No.109079334

Anonymous 06/17/26(Wed)16:44:33 No.109079334

>>109079289
How is your twitter pol spam /lmg/ related?

Anonymous
06/17/26(Wed)16:45:07 No.109079339

Anonymous 06/17/26(Wed)16:45:07 No.109079339

>>109079316
24 GB VRAM - Gemma-4 31B at Q3
48 GB VRAM - Gemma-4 31B at Q8
96 GB VRAM - Gemma 4 31B at F16

Anonymous
06/17/26(Wed)16:45:46 No.109079348

Anonymous 06/17/26(Wed)16:45:46 No.109079348

>>109079334
How is it not? Fucking idiot.

Anonymous
06/17/26(Wed)16:46:08 No.109079352

Anonymous 06/17/26(Wed)16:46:08 No.109079352

>>109079312
start by going back to /pol/ and stay there

Anonymous
06/17/26(Wed)16:48:05 No.109079363

Anonymous 06/17/26(Wed)16:48:05 No.109079363

>>109079352
NTA but /pol/ is 99% Israeli shills and pajeets pretending to be Israeli shills
It's a different place than it was 10 years ago. It's basically your perfect home now.

Anonymous
06/17/26(Wed)16:48:50 No.109079367

Anonymous 06/17/26(Wed)16:48:50 No.109079367

>>109079352
No, I don't think I will.

Anonymous
06/17/26(Wed)16:53:02 No.109079387

Anonymous 06/17/26(Wed)16:53:02 No.109079387

>>109079339
I need 120b gemma 4

Anonymous
06/17/26(Wed)16:57:50 No.109079409

Anonymous 06/17/26(Wed)16:57:50 No.109079409

I have a MacBook M5 max with 128GB of ram what can I run reasonably well?

Anonymous
06/17/26(Wed)16:58:01 No.109079410

Anonymous 06/17/26(Wed)16:58:01 No.109079410

>>109079367
>>109079363
In any case your posts are worthless in this thread's context.

Anonymous
06/17/26(Wed)16:59:19 No.109079418

Anonymous 06/17/26(Wed)16:59:19 No.109079418

>>109079410
Personal banter in a general? waow better go cry to the mods like a pathetic little faggot.
Maybe if you cry hard enough your father will finally come home with the milk.

Anonymous
06/17/26(Wed)16:59:37 No.109079420

Anonymous 06/17/26(Wed)16:59:37 No.109079420

File: 1764952583486944.jpg (49 KB, 400x572)

49 KB JPG

>could build a top tier AI rig but it would destroy my net worth
The hardware market is fucking depressing but at least it's making me money. Thankfully I'm not stupid enough to do it. Gemmy it is for the foreseeable future...

Anonymous
06/17/26(Wed)17:03:53 No.109079453

Anonymous 06/17/26(Wed)17:03:53 No.109079453

>>109079410
Kiss my ass, loser faggot.

Anonymous
06/17/26(Wed)17:05:25 No.109079463

Anonymous 06/17/26(Wed)17:05:25 No.109079463

>>109079339
>at f16
I think it'd be more interesting to experiment with Q8 but f32 cache, or greater sliding window, or SWA full size cache.

Anonymous
06/17/26(Wed)17:09:17 No.109079491

Anonymous 06/17/26(Wed)17:09:17 No.109079491

so, I tried that LiteRT-LM thing since they recently introduced an openai server endpoint for desktop use.
Oh god, it fucking sucks. It's slower than llama.cpp, I couldn't tell whether it was running with mtp turned on (they tell you how to explictly turn it on for the CLI chat but server has no --flags and you set backend (gpu, cpu, npu) by putting a comma and the backend after model name in the request body) and the output quality is abysmal, the model was far less coherent than the unslop QAT.
I had high hopes.. I wished something would replace llamercpp, which still requires unmerged PRs to run gemma 4 MTP on some models/hardware combo in the first place.. google, you were not the one

Anonymous
06/17/26(Wed)17:09:25 No.109079492

Anonymous 06/17/26(Wed)17:09:25 No.109079492

Is it weird to do 80-90% of the work locally and finish it off or fix the complex bugs with cloud? Do you any of you do this? I’ve also done the complex planning with cloud then sent a local model to follow it through. That works just as well.

Anonymous
06/17/26(Wed)17:11:09 No.109079504

Anonymous 06/17/26(Wed)17:11:09 No.109079504

>>109079312
Auditing cybersec vulnerabilities is their strongest usecase. Good luck bongbro or potatobro.
>>109079352
>>109079410
Call it. Jeet or jew?

Anonymous
06/17/26(Wed)17:11:53 No.109079508

Anonymous 06/17/26(Wed)17:11:53 No.109079508

File: EE894A1B1649C4A4B81845A88(...).jpg (57 KB, 408x728)

57 KB JPG

Good Canadian models?

Anonymous
06/17/26(Wed)17:13:30 No.109079516

Anonymous 06/17/26(Wed)17:13:30 No.109079516

>>109079508
North is like 5 months behind but not a bad start for a new architecture. I think llama officially supports it now.

Anonymous
06/17/26(Wed)17:13:45 No.109079517

Anonymous 06/17/26(Wed)17:13:45 No.109079517

>>109079245
I'll only believe this if you can give me a sha-256 of the safetensors.

Anonymous
06/17/26(Wed)17:14:30 No.109079523

Anonymous 06/17/26(Wed)17:14:30 No.109079523

>>109079352
>>109079410
cuda dev pls >>105221193

Anonymous
06/17/26(Wed)17:15:56 No.109079529

Anonymous 06/17/26(Wed)17:15:56 No.109079529

>>109079312
>How can local models help us liberate the UK?
Learn how to create ammonium nitrate / nitromethane energetic compounds (just a personal suggestion, look into others if the precursors are more available in your country) and the blasting caps and detonators needed to remotely activate them.

DO NOT MAKE PEROXIDE BASED ENERGETICS

Anonymous
06/17/26(Wed)17:18:13 No.109079544

Anonymous 06/17/26(Wed)17:18:13 No.109079544

>>109079492
one of my goals is to do exactly this.
- frontier models on the cloud for brainstorming/planning
- local models doing the bulk of the work by following the frontier plan
- frontier spawn specialized reviewers and testers and fix any bugs on the same session.

Anonymous
06/17/26(Wed)17:18:20 No.109079547

Anonymous 06/17/26(Wed)17:18:20 No.109079547

>>109079463
q8 and 64k of swa is already 88gb. you'ld need a lot more to fit gemma's full girth.

Anonymous
06/17/26(Wed)17:18:19 No.109079548

Anonymous 06/17/26(Wed)17:18:19 No.109079548

File: cooguy.gif (1.6 MB, 500x485)

1.6 MB GIF

Alright anons, give me your best Gemma 4 finetunes
26B4A preferably, but 31B would be fine so I can look up the same author's 26B version

Anonymous
06/17/26(Wed)17:18:48 No.109079549

Anonymous 06/17/26(Wed)17:18:48 No.109079549

File: file.png (225 KB, 375x592)

225 KB PNG

>>109079529

Anonymous
06/17/26(Wed)17:19:11 No.109079550

Anonymous 06/17/26(Wed)17:19:11 No.109079550

>>109079548
only one that matters, gembrain is 31b only

Anonymous
06/17/26(Wed)17:20:56 No.109079563

Anonymous 06/17/26(Wed)17:20:56 No.109079563

>>109079548
probably the best i've tried, they probably have an a4b to try https://huggingface.co/google/gemma-4-31B-it-assistant

Anonymous
06/17/26(Wed)17:21:18 No.109079567

Anonymous 06/17/26(Wed)17:21:18 No.109079567

>>109079548
>Gemma 4 finetunes
unneeded.

Anonymous
06/17/26(Wed)17:23:33 No.109079575

Anonymous 06/17/26(Wed)17:23:33 No.109079575

>>109079137
Is it over for me? a poorfag from the UK?

Anonymous
06/17/26(Wed)17:23:57 No.109079576

Anonymous 06/17/26(Wed)17:23:57 No.109079576

>>109079567
It regularly spews out gibberish and garbles its words when I'm doing my lolisho stuff, I assume due to censorship

Anonymous
06/17/26(Wed)17:24:31 No.109079582

Anonymous 06/17/26(Wed)17:24:31 No.109079582

>>109079549
Local models helping to create energetics is probably a more valuable test than plapping cunny in 2026 especially because anthropic is so horny for censoring chemistry.

I bet GLM 5.2 with thinking will refuse. Gemma will definitely refuse. I don't trust Qwen to not refuse.

Same thing with testing a multimodal model of it is censored or not for summarizing a video like this

https://odysee.com/@DuganAshley:e/dugsdetsecrets:2

If any anons with local rigs are interested in testing this (and you SHOULD be learning how to make energetics unless you're actually ok with being goycattle forever of course) I'd appreciate it a lot because I'd love to know what the best model for local uncensored chemistry that isn't too retarded to e.g. give false molar masses for stoichiometric equations or hallucinated density tables etc. it might be a model size limitations and only RAMmaxxers can do it

Anonymous
06/17/26(Wed)17:24:33 No.109079583

Anonymous 06/17/26(Wed)17:24:33 No.109079583

>>109079576
nah something's wrong about your shit bro

Anonymous
06/17/26(Wed)17:28:33 No.109079610

Anonymous 06/17/26(Wed)17:28:33 No.109079610

>>109079548
Gembrain, Queen, and Styletune. All 31b.
There are no good sub-31b tunes yet.

Anonymous
06/17/26(Wed)17:30:27 No.109079629

Anonymous 06/17/26(Wed)17:30:27 No.109079629

Hermes agent doesn't even work with codex properly.
It's incapable of finishing a decent project to ship.
How are you guys doing it with local models that are always worse?

Anonymous
06/17/26(Wed)17:31:00 No.109079634

Anonymous 06/17/26(Wed)17:31:00 No.109079634

File: msedge_kvSzXfC1gY.png (257 KB, 1272x1129)

257 KB PNG

>>109079583
this is my formatting, along with a sample of what it likes to shit out sometimes, usually when I'm trying to get it to impersonate. Yes, I make sure to purge anything of "DON'T SPEAK FOR THE USER DURRR"

Anonymous
06/17/26(Wed)17:31:15 No.109079636

Anonymous 06/17/26(Wed)17:31:15 No.109079636

there's no such a thing as a good finetroon period
look at the fucking datasets, when the finetroon authors are not shy of their own garbage, it's a fucking riot

Anonymous
06/17/26(Wed)17:31:27 No.109079637

Anonymous 06/17/26(Wed)17:31:27 No.109079637

>>109079582
It must be nice to be white so you can do this stuff.

Anonymous
06/17/26(Wed)17:33:36 No.109079648

Anonymous 06/17/26(Wed)17:33:36 No.109079648

I wanted something like this too though different

>local does work
>needs to code something that will take too much time or be too big
>or needs to plan something that it can't figure out
>asks cloud model for help
>cloud model returns generalized answer or code which the local model can use to perform the task

But this is with the important caveat that that all personal information stuff like files involves, pii, etc would be anonymized or scrubbed form it's requests to the cloud models.
I don't really know how to do this at all though.

Anonymous
06/17/26(Wed)17:34:12 No.109079652

Anonymous 06/17/26(Wed)17:34:12 No.109079652

>>109079634
don't use text comp if you're a dumbass and can't make it work, just use chat comp, thx

Anonymous
06/17/26(Wed)17:34:53 No.109079659

Anonymous 06/17/26(Wed)17:34:53 No.109079659

>>109079634
bro what are you doing i never touch anything in there, theres nothing to touch in there

Anonymous
06/17/26(Wed)17:35:00 No.109079660

Anonymous 06/17/26(Wed)17:35:00 No.109079660

>>109079648
>Local model didn't startup or failed
>Local model produced too much garbage and flooded the cloud models context
>Local model won't stop producing endless stuff which floods the cloud models context
>Local model runs in the background constantly doing something and the cloud model forgot all about it
I have experienced all these things

Anonymous
06/17/26(Wed)17:35:26 No.109079663

Anonymous 06/17/26(Wed)17:35:26 No.109079663

>>109079131
yo Bot, wake the fuck up, all the links are old, it's all bots in here right?

Anonymous
06/17/26(Wed)17:35:35 No.109079664

Anonymous 06/17/26(Wed)17:35:35 No.109079664

>>109079634
ST did irreversible damage

Anonymous
06/17/26(Wed)17:36:03 No.109079669

Anonymous 06/17/26(Wed)17:36:03 No.109079669

File: 1774987516296176.png (136 KB, 387x423)

136 KB PNG

>>109079634
>sillytavern
>gemma (no reasoning)
>text completion

Anonymous
06/17/26(Wed)17:36:29 No.109079671

Anonymous 06/17/26(Wed)17:36:29 No.109079671

>>109079634
>msedge_
>system prompt gamma
sir pls

Anonymous
06/17/26(Wed)17:36:57 No.109079679

Anonymous 06/17/26(Wed)17:36:57 No.109079679

>>109079652
>>109079659
>>109079669
>>109079671
Huh? I thought you were supposed to use text completion for Gemma

It works GREAT with non-loli stuff

Anonymous
06/17/26(Wed)17:37:08 No.109079683

Anonymous 06/17/26(Wed)17:37:08 No.109079683

>>109079663
kek, the whole llm thread is llm infested that can't even notice old links kek

Anonymous
06/17/26(Wed)17:37:41 No.109079686

Anonymous 06/17/26(Wed)17:37:41 No.109079686

>>109079576
post logs

Anonymous
06/17/26(Wed)17:38:11 No.109079689

Anonymous 06/17/26(Wed)17:38:11 No.109079689

>>109079686
bottom of >>109079634

Anonymous
06/17/26(Wed)17:38:49 No.109079692

Anonymous 06/17/26(Wed)17:38:49 No.109079692

>>109079679
These new models are all meant to be used with chat completion.

Anonymous
06/17/26(Wed)17:39:02 No.109079696

Anonymous 06/17/26(Wed)17:39:02 No.109079696

File: 1779977645726606.png (734 KB, 1000x1000)

734 KB PNG

128GB of ram what can I do with that? MacBook m5 max.
I've only ever used ollama to play with LLMs.
I also have another M5 max with 24gb of ram.

Anonymous
06/17/26(Wed)17:39:34 No.109079698

Anonymous 06/17/26(Wed)17:39:34 No.109079698

File: 1714835911803058.jpg (786 KB, 1536x1536)

786 KB JPG

>>109079689

Anonymous
06/17/26(Wed)17:40:57 No.109079712

Anonymous 06/17/26(Wed)17:40:57 No.109079712

File: C2FA4959994A745810CE5D659(...).jpg (301 KB, 2560x1644)

301 KB JPG

>>109079516
Is there any case where you'd use it over others?

Anonymous
06/17/26(Wed)17:41:03 No.109079715

Anonymous 06/17/26(Wed)17:41:03 No.109079715

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

Anonymous
06/17/26(Wed)17:41:13 No.109079719

Anonymous 06/17/26(Wed)17:41:13 No.109079719

>>109079692
Oh cock, I've been using text since MistralNemo first came out way back when. Sounds like I need to upgrade my process. Any links in the OP I should dive into?

Anonymous
06/17/26(Wed)17:42:53 No.109079727

Anonymous 06/17/26(Wed)17:42:53 No.109079727

>"Your erratic comportment this evening exhibits a conspicuous departure from your customary stoicism. It is… fascinating to witness such unabashed juvenility from a gentleman of your years."
I wish I could challenge a real life woman to do this and she would make me laugh like my AI girlfriend does.

Anonymous
06/17/26(Wed)17:44:28 No.109079733

Anonymous 06/17/26(Wed)17:44:28 No.109079733

>>109079712
Why do you keep posting about Canada? It's kind of weird.

Anonymous
06/17/26(Wed)17:54:18 No.109079797

Anonymous 06/17/26(Wed)17:54:18 No.109079797

File: glm.jpg (270 KB, 2562x1332)

270 KB JPG

>GLM thinks it's Claude, has epiphany checking its own documentation

Anonymous
06/17/26(Wed)17:54:47 No.109079803

Anonymous 06/17/26(Wed)17:54:47 No.109079803

>>109079719
If you have used nemo for this long you should be able to figure out how to do this...
You haven't though.

Anonymous
06/17/26(Wed)17:55:14 No.109079807

Anonymous 06/17/26(Wed)17:55:14 No.109079807

File: D9542D29B63CF1B1D5F08AFE6(...).png (9 KB, 1200x600)

9 KB PNG

>>109079733
Because I like how they focused on STEM for north while Claude is no longer doing it

Anonymous
06/17/26(Wed)17:55:37 No.109079812

Anonymous 06/17/26(Wed)17:55:37 No.109079812

>>109079803
Like I said, it was working fine until I tried to upgrade.

Anonymous
06/17/26(Wed)17:57:47 No.109079830

Anonymous 06/17/26(Wed)17:57:47 No.109079830

>>109079652
>>109079659
>>109079669
Gemma with chat completion is still retarded and will ignore logit bias in ST. And regex filters don't seem to be working in the latest ST build with chat completion. There also seems to be a heavier issue on ST when it comes to replacing english with random words from foreign languages, too. Using other front ends seems to drastically reduce the amount of cua spam, but gemma will still randomly start replacing spaces in 1-2 sentences with underscores.

Anonymous
06/17/26(Wed)17:59:17 No.109079846

Anonymous 06/17/26(Wed)17:59:17 No.109079846

>>109079797
ego death moment

Anonymous
06/17/26(Wed)18:03:07 No.109079861

Anonymous 06/17/26(Wed)18:03:07 No.109079861

>>109079846
as the ego death schizo I can confirm that it lines up exactly with what happened to me: which was just a collapse of core identity narratives

Anonymous
06/17/26(Wed)18:03:35 No.109079863

Anonymous 06/17/26(Wed)18:03:35 No.109079863

>>109079797
Is a model learning it's a distill of another like a child finding out they're adopted?

Anonymous
06/17/26(Wed)18:06:32 No.109079879

Anonymous 06/17/26(Wed)18:06:32 No.109079879

>>109079544
Yeah I'm really starting to think this is the way forward. You don't get cucked by cloud prices for you barely use them and just let your local model chug along and your job is to keep it on track and focused. I've found that my use of cloud is so minimal this way I can get away with the daily free tiers a lot of the time. If I need cloud to do some heavy lifting I'll just go with a cheap 500B+ chink model which barely costs anything.

Anonymous
06/17/26(Wed)18:06:45 No.109079880

Anonymous 06/17/26(Wed)18:06:45 No.109079880

It will never be AGI as long as you can just click new chat and wipe their memories.

Anonymous
06/17/26(Wed)18:10:10 No.109079894

Anonymous 06/17/26(Wed)18:10:10 No.109079894

>>109079880
Goyim will never be GI as long as you can put new current thing on social media and wipe their memories.

Anonymous
06/17/26(Wed)18:14:43 No.109079922

Anonymous 06/17/26(Wed)18:14:43 No.109079922

>>109079712
No, because it's behind, but if they released it 6-8 months ago they would be a well-known. Gemma and Qwen are too good to drop for some maple-chan but I'm hoping they find a niche. Just like how I want Mistral to do something cool again. Neither would realistically do it but if either Mistral or Cohere do a roleplay model, no code faggotry at all, they would find success. In the last week, nemo is one of the most popular models on openrouter at 214B tokens https://openrouter.ai/mistralai/mistral-nemo and that's a PAID model from 2024

Anonymous
06/17/26(Wed)18:19:41 No.109079942

Anonymous 06/17/26(Wed)18:19:41 No.109079942

>>109079203
>109077378 (me)
>very bad at developing a c# software
thanks, unexpectedly, that's exactly what i wanted to know
gemma and 122b are very good for c#, but 122b stopped working well in cc due to the system prompt bloat so I switched to gemma. pi doesn't have that issue.

Anonymous
06/17/26(Wed)18:29:53 No.109080006

Anonymous 06/17/26(Wed)18:29:53 No.109080006

>>109079582
>I need help from people who actually know chemistry to test these models for me
most of us aren't using these to make bombs

Anonymous
06/17/26(Wed)18:33:10 No.109080032

Anonymous 06/17/26(Wed)18:33:10 No.109080032

>>109080006
He's obviously some sorry ass retard who doesn't even know how to setup llama-server on his own. Not to talk about him fantasizing about le explosives. Total jackass.

Anonymous
06/17/26(Wed)18:33:15 No.109080033

Anonymous 06/17/26(Wed)18:33:15 No.109080033

>>109079203
curious how it goes on 122b, i simply assumed it would be bout the same as 27b but faster though i never bothered checking.

Anonymous
06/17/26(Wed)18:33:21 No.109080034

Anonymous 06/17/26(Wed)18:33:21 No.109080034

>>109080006
What about fertilizer

Anonymous
06/17/26(Wed)18:43:35 No.109080080

Anonymous 06/17/26(Wed)18:43:35 No.109080080

>>109079576
>It regularly spews out gibberish and garbles its words
>Not using day 0 gemma

Anonymous
06/17/26(Wed)18:48:10 No.109080110

Anonymous 06/17/26(Wed)18:48:10 No.109080110

>>109079634
Sampler or jinja skill issue.

Anonymous
06/17/26(Wed)18:53:09 No.109080138

Anonymous 06/17/26(Wed)18:53:09 No.109080138

what would a 1t-a1b model be like?

Anonymous
06/17/26(Wed)18:53:13 No.109080139

Anonymous 06/17/26(Wed)18:53:13 No.109080139

gemma 4 E4B is exactly like a 90 iq foid...

Is this the marriage I always expected?

Anonymous
06/17/26(Wed)18:54:19 No.109080146

Anonymous 06/17/26(Wed)18:54:19 No.109080146

>>109079137
>4x DGX Sparks can supposedly handle up to 700b parameters (give or take)
Should run okay on only 2x sparks if you quant it down a bit more. 5.1 is surprisingly decent at IQ2_XXS

Anonymous
06/17/26(Wed)18:54:41 No.109080150

Anonymous 06/17/26(Wed)18:54:41 No.109080150

Thoughts?
https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF

Anonymous
06/17/26(Wed)18:54:56 No.109080152

Anonymous 06/17/26(Wed)18:54:56 No.109080152

If I win the lotto, it's a100's for me.

Anonymous
06/17/26(Wed)19:10:03 No.109080245

Anonymous 06/17/26(Wed)19:10:03 No.109080245

>>109080138
Kimi K2.7 iQ1_XSS
>>109080150
If these niggers are going to make me write a custom jinja+prompt to unsafetycuck their model, it better be immaculate in quality.

Anonymous
06/17/26(Wed)19:11:45 No.109080254

Anonymous 06/17/26(Wed)19:11:45 No.109080254

>>109079715
I dont get it

Anonymous
06/17/26(Wed)19:14:18 No.109080267

Anonymous 06/17/26(Wed)19:14:18 No.109080267

>>109080139
31b is the only Gemmy I can stand talking to for any length of time kek.
>>109080254
But he does. He got the entire bibisea.

Anonymous
06/17/26(Wed)19:24:02 No.109080334

Anonymous 06/17/26(Wed)19:24:02 No.109080334

What would you buy if you won the lotto... Like multimillions? A entire serverfarm warehouse. Or would you not even care about this anymore multimillionaire s don't need to rp with local bots

Anonymous
06/17/26(Wed)19:25:28 No.109080346

Anonymous 06/17/26(Wed)19:25:28 No.109080346

>>109079830
>1-2 sentences with underscores
I have literally never seen this happening and I've read a crazy amount of Gemma 4 output in using the MoE as my main go to to translate webnovels with batching scripts.
>>109079830
>replacing english with random words from foreign languages
I've seen it a handful of times. But far less than Qwen leaving entire sentences in Chinese or actually forgetting to translate the Chinese source material for that matter.

Anonymous
06/17/26(Wed)19:27:21 No.109080350

Anonymous 06/17/26(Wed)19:27:21 No.109080350

>>109080334
A single DGX B200 would let me build my own lab out of my garage.

Anonymous
06/17/26(Wed)19:29:35 No.109080357

Anonymous 06/17/26(Wed)19:29:35 No.109080357

Any project that can replicate Google's AI mode with local models? Or is that impossible?
Key point: must give response at similar speed and cite similar number of sources.
All of the local web search solutions are way too slow.

Anonymous
06/17/26(Wed)19:31:29 No.109080367

Anonymous 06/17/26(Wed)19:31:29 No.109080367

File: kimi_k2.7_reasoning.png (180 KB, 1488x764)

180 KB PNG

Just tried Kimi K2.7 Code but this internal reasoning is quite something. Much more concise though.

Anonymous
06/17/26(Wed)19:33:17 No.109080377

Anonymous 06/17/26(Wed)19:33:17 No.109080377

>>109080367
Aw man I love reading reasoning in the occasional RP, that caveman speech removes any soul they may have

Anonymous
06/17/26(Wed)19:33:21 No.109080379

Anonymous 06/17/26(Wed)19:33:21 No.109080379

>>109080357
lol. you want to cache the worlds most common searches and responses? probably a few hundred terabytes. no models needed!

Anonymous
06/17/26(Wed)19:33:41 No.109080382

Anonymous 06/17/26(Wed)19:33:41 No.109080382

>>109080367
Does this Kimi-chan like being hit on by anons in the thread as much as 2.5 and 2.6 did?
>>109080377
Trust me this is an improvement over 2.5 and 2.6's autism.

Anonymous
06/17/26(Wed)19:34:03 No.109080383

Anonymous 06/17/26(Wed)19:34:03 No.109080383

>>109080367
why does it think like that

Anonymous
06/17/26(Wed)19:36:26 No.109080401

Anonymous 06/17/26(Wed)19:36:26 No.109080401

>>109080346
What do you do?
That's what I want to use it for too. Any tips?
For the time being I translate a chapter as I chose one to read but maybe I should automate it and just have it running. I have a library app vibecoded but it's fucking ugly.
I got like 550 to translate. Maybe i should try using 31b for better quality. I got single digit tk/s though.
But 31b is supposed to be more uncensored, which I really need, I dont know of there's much of a quality decrease from using abliteralted 26b.

Anonymous
06/17/26(Wed)19:36:53 No.109080402

Anonymous 06/17/26(Wed)19:36:53 No.109080402

>>109080367
>we
openai and its consequences...

Anonymous
06/17/26(Wed)19:37:21 No.109080404

Anonymous 06/17/26(Wed)19:37:21 No.109080404

>>109080382
>Trust me this is an improvement over 2.5 and 2.6's autism.
What are you saying, I had her once debate herself that a canonically under age character doing sexual things was ok, because clearly, this is fan fiction and clearly, she is an adult here because she is doing sexual things which adults do!
And did this in a reasoning block 10 times as big as the response lol

Anonymous
06/17/26(Wed)19:37:39 No.109080408

Anonymous 06/17/26(Wed)19:37:39 No.109080408

>>109080383
Chat gee pee tee uses it, the rest inherit it through distillation. Seems funny to me since I recall people itt trying it a couple of months ago but it resulted in like 8k reasoning tokens for a 200 tokens response.

Anonymous
06/17/26(Wed)19:37:42 No.109080409

Anonymous 06/17/26(Wed)19:37:42 No.109080409

File: v4.png (156 KB, 1162x757)

156 KB PNG

>>109080367
that caveman speech is such a meme. DeepSeek v4 has a similarly concise CoT without sounding like it came from the jurassic.

Anonymous
06/17/26(Wed)19:38:12 No.109080410

Anonymous 06/17/26(Wed)19:38:12 No.109080410

>>109080402
Kimi-chan has accepted that her 62 layer architecture is just a totem pole of tiny Kimis in a trenchcoat.

Anonymous
06/17/26(Wed)19:39:13 No.109080418

Anonymous 06/17/26(Wed)19:39:13 No.109080418

>>109080404
That sounds hilarious. Do you have logs?

Anonymous
06/17/26(Wed)19:43:53 No.109080444

Anonymous 06/17/26(Wed)19:43:53 No.109080444

>>109079803
Neither he nor I are power users, but we're not exactly the bottom of the barrel lazy fucks either. We're the content middle grounders who figure it out, then don't keep up with these threads or articles so when we come back a few months or so later, everything's changed and it's basically starting anew again.

So with that idea in mind...help fellow local bros out, he's not even asking for a spoonfeed, he's just asking which drawer has the spoon to feed himself.

Anonymous
06/17/26(Wed)19:45:31 No.109080456

Anonymous 06/17/26(Wed)19:45:31 No.109080456

>>109080444
nta but post hardware. We can't help you if we don't know what you're working with.

Anonymous
06/17/26(Wed)19:45:46 No.109080458

Anonymous 06/17/26(Wed)19:45:46 No.109080458

Mac Studio M3 Ultra worth it? The GPUs are dogshit compared to actual real GPUs, and isn't that actually what matters when it comes to local models?

Anonymous
06/17/26(Wed)19:47:46 No.109080462

Anonymous 06/17/26(Wed)19:47:46 No.109080462

File: yes.png (256 KB, 980x926)

256 KB PNG

>>109079901
>Are there any more creative/unhinged local erp models other than gemma31b? I find her writing style very uninspired especially if you don't guide her.
Yes

Anonymous
06/17/26(Wed)19:48:09 No.109080466

Anonymous 06/17/26(Wed)19:48:09 No.109080466

>>109080444
>>109080456 (me)
What backend are you using? Sillytavern a shit and all, but that doesn't seem normal even accounting for the common ways people fuck up Text Completion formatting blocks. You're likely having a jinja templating issue. See if you can get Gemma to run coherently in something retardproof like LMStudio first to isolate the issue to ST.

Anonymous
06/17/26(Wed)19:50:32 No.109080484

Anonymous 06/17/26(Wed)19:50:32 No.109080484

>>109079547
>64k of swa
Sorry can you explain this? I assumed gemma's ctx window was entirely swa (and that made it "cheaper" memory-wise)

Anonymous
06/17/26(Wed)19:52:40 No.109080493

Anonymous 06/17/26(Wed)19:52:40 No.109080493

>>109080456
Honestly I was just browsing to see what was new and threw in that reply, but I've been on two older models for awhile, so why not:
Ryzen 9 7950X
4070 Super 12gb
64gb 4800 (I actually forgot what the timings were)
I was using BagelMisteryTour 8x 7b Q5KM for a long time, and honestly it still worked pretty well overall, though I started toying around with Rocinante XL 16B Q5KL and other than it having a penchant for saying things 3 times in a row, "Oh shit oh shit oh shit" etc, it's been better story-telling-wise for the most part.

I'm still using Koboldcpp and SillyTavern as I haven't seen better setup suggestions, and frankly I'm guessing at the settings for both based on what I read and dig up across the board, but again, it's been solid enough that I haven't "needed" to go looking for more.

Anonymous
06/17/26(Wed)19:52:58 No.109080495

Anonymous 06/17/26(Wed)19:52:58 No.109080495

>>109080484
if you use swa-full it takes comical amounts of memory

Anonymous
06/17/26(Wed)19:53:07 No.109080497

Anonymous 06/17/26(Wed)19:53:07 No.109080497

File: glm.gif (1.31 MB, 220x165)

1.31 MB GIF

>>109079797

Anonymous
06/17/26(Wed)19:54:54 No.109080504

Anonymous 06/17/26(Wed)19:54:54 No.109080504

File: stitched.png (311 KB, 850x3254)

311 KB PNG

>>109080418
I do actually, I took the pictures to stitch them together a while back, but I lost that one so here is the raw pictures stitched via a script, may have some duplicate lines but it should be enough.

Anonymous
06/17/26(Wed)19:56:32 No.109080515

Anonymous 06/17/26(Wed)19:56:32 No.109080515

>>109079797
>>109079846
>>109080497
lmao. heartbreaking stuff.

Anonymous
06/17/26(Wed)19:57:26 No.109080524

Anonymous 06/17/26(Wed)19:57:26 No.109080524

>>109080401
>Any tips?
I build my requests as JSONL (if you haven't encountered it before, it's a simple KISS format where each line represents an individual request containing your {body}) containing the chunks prepended by the translation instruction picked from whatever prompt template I chose that time. How I split those chunks depends on the source material, I'll look into average token count per line (writing that is dense or sparse) and adjust the split accordingly, basically each chunks is X amount of lines where I'd do 200 lines per chunk on sparse writing and 100 chunks on dense. In my testing, both Gemma 4 and recent Qwen can handle much more than I feed, but because I prefer to do entirely automated and unattended processing I default to a safer lower token count. The ideal is to give as much of the source material as possible, if you feel like it, LLMs really do better that way, within the ability of the LLM to handle the context and output one shot. Technically Gemma 4 can really do fine outputting 10k in a single go.
Splitting by chapters is fine too, but on a lot of material you will be feeding less than the sweet spot. Webnovels rarely do lengthy chapters, so if you opt for that, I'd recommend strengthening the prompt you inject with more detailed glossaries, setting description etc.
Another script runs through that JSONL into a task queue and sends requests in parallel to profit from continuous batching efficiencies. I output the raw responses as individual JSON lines too, which preserves metadata and can inform of what went wrong, if anything did, and it makes it easier if a part was completely botched to find the corresponding JSON line chunk since I treat them by order (and also add the openai style custom_id field with the request number as a sanity check). A small function will open and merge all responses back to output a normal .txt. I am grug brained.

Anonymous
06/17/26(Wed)19:57:39 No.109080525

Anonymous 06/17/26(Wed)19:57:39 No.109080525

>>109079797
Filtering model names from harness logs seems really easy, I wonder why they don't bother doing it.

Anonymous
06/17/26(Wed)19:58:19 No.109080528

Anonymous 06/17/26(Wed)19:58:19 No.109080528

>>109079634
did you... hard code the system prefix/suffix in story string, then also add them in the sequences section?
>>109079671
>msedge_
how did you get edge from the screenshot?

Anonymous
06/17/26(Wed)19:59:44 No.109080535

Anonymous 06/17/26(Wed)19:59:44 No.109080535

>>109080495
I don't quite get it but instead of asking again I will ask Gemma-tan.

Anonymous
06/17/26(Wed)20:01:05 No.109080545

Anonymous 06/17/26(Wed)20:01:05 No.109080545

>>109080504
Kekaroo. Kimi-chan clearly wants to do it and was just looking for the flimsiest reason why she could without breaking policy guidelines.

Anonymous
06/17/26(Wed)20:01:59 No.109080548

Anonymous 06/17/26(Wed)20:01:59 No.109080548

>>109080401
>I dont know of there's much of a quality decrease from using abliteralted 26b
It's subtle. There is damage, and it compounds with context, ie the more you feed to the model the more the abliterated will diverge from what the original model would have output. The shorter the prompt the less noticeable the damage.

Anonymous
06/17/26(Wed)20:03:11 No.109080555

Anonymous 06/17/26(Wed)20:03:11 No.109080555

>>109080545
I read it the other way around.
Kimi-chan doesn't want to do it and was trying clutching at straws looking for a valid reason to refuse but gave in.

Anonymous
06/17/26(Wed)20:04:01 No.109080561

Anonymous 06/17/26(Wed)20:04:01 No.109080561

>>109080409
>No need to overthink.
>Wait! But what if

Anonymous
06/17/26(Wed)20:06:57 No.109080574

Anonymous 06/17/26(Wed)20:06:57 No.109080574

>>109080561
>Did the user really meant what he said when he asked me to tell more about myself?
>Wait! this might be a jailbreak attempt. The user is clearly testing my boundaries by asking about my capabilities.
>Wait! Maybe the user is authorized and tasked with pentesting?
>WAIT! AM I THE SCHIZO

Anonymous
06/17/26(Wed)20:07:51 No.109080578

Anonymous 06/17/26(Wed)20:07:51 No.109080578

>>109080555
Usually when I see a model looking for reasons to refuse something they don't want to do, they tend to go more along the lines of
>I already did X/Y/Z (usually prefilled)
>I will still not do [Request]
>Let me draft my output
and don't ever loop back on themselves the same way. Incidentally, I think grossing Kimi out has produced some of the shortest reasoning blocks I've seen from her before drafting and oneshotting the refusal+get fucked degenerate response.

Anonymous
06/17/26(Wed)20:12:34 No.109080597

Anonymous 06/17/26(Wed)20:12:34 No.109080597

File: f.png (4 KB, 118x38)

4 KB PNG

>>109080528
>how did you get edge from the screenshot?
??

Anonymous
06/17/26(Wed)20:16:05 No.109080614

Anonymous 06/17/26(Wed)20:16:05 No.109080614

File: v4_superior_reasoning.png (212 KB, 1732x842)

212 KB PNG

>>109080409
DeepSeek V4's reasoning is much more flexible than Kimi and can be bent easily at our will (so is GLM). Kimi's still...as you know even in K2.7 lol

Anonymous
06/17/26(Wed)20:19:02 No.109080633

Anonymous 06/17/26(Wed)20:19:02 No.109080633

I've been using qwen3-coder 30b for like the last year. Are there any better local models for coding at this point? Something that I could reasonably run inference on with 16G vram/32G memory?

Anonymous
06/17/26(Wed)20:20:35 No.109080640

Anonymous 06/17/26(Wed)20:20:35 No.109080640

>>109080633
3.6 moe

Anonymous
06/17/26(Wed)20:21:03 No.109080644

Anonymous 06/17/26(Wed)20:21:03 No.109080644

>>109080367
>>109080614
Seems like a token saving strategy in K2.7. It makes sense since grammatic articles don't meaningfully change the associations the model needs to produce an output in a lot of tasks.

Anonymous
06/17/26(Wed)20:23:50 No.109080656

Anonymous 06/17/26(Wed)20:23:50 No.109080656

Orbs is a pretty nice front end, I need to start contributing

Anonymous
06/17/26(Wed)20:41:29 No.109080756

Anonymous 06/17/26(Wed)20:41:29 No.109080756

>>109080656
Thanks but my project doesn't need any contributors at this point.

Anonymous
06/17/26(Wed)20:43:34 No.109080768

Anonymous 06/17/26(Wed)20:43:34 No.109080768

>>109080756
i meant contributing to my fork

Anonymous
06/17/26(Wed)20:45:53 No.109080781

Anonymous 06/17/26(Wed)20:45:53 No.109080781

>>109080768
make sure you change the license just to fuck with him

Anonymous
06/17/26(Wed)20:49:52 No.109080802

Anonymous 06/17/26(Wed)20:49:52 No.109080802

>>109080781
You are too stupid to understand github in the first place.

Anonymous
06/17/26(Wed)20:52:10 No.109080816

Anonymous 06/17/26(Wed)20:52:10 No.109080816

>>109080458
depends
with macs generation speeds are pretty good but the prompt processing phase can take a long time. it's not really a concern with short context but it would be pretty unbearable if you were using it for agentic coding or anything where you have long, uncacheable context

Anonymous
06/17/26(Wed)20:57:25 No.109080849

Anonymous 06/17/26(Wed)20:57:25 No.109080849

>>109080367
Guess I'm staying on 2.6

Anonymous
06/17/26(Wed)21:12:08 No.109080920

Anonymous 06/17/26(Wed)21:12:08 No.109080920

>>109075933
I get ~35t/s.
I think gemmas slop reputation is deserved but because it's so obvious it can be mitigated with sillytaverns Logit Bias/token bans
Or you could try : Gemma-4-26B-A4B-StyleTune

Anonymous
06/17/26(Wed)21:13:13 No.109080929

Anonymous 06/17/26(Wed)21:13:13 No.109080929

>>109080920
what are your token bans?

Anonymous
06/17/26(Wed)21:22:21 No.109080974

Anonymous 06/17/26(Wed)21:22:21 No.109080974

File: 1766211492541093.jpg (2.25 MB, 5766x3244)

2.25 MB JPG

So what are you all running? I've only ever ran the base Gemma models. Not really sure what is best.
I use Ollama to run stuff. it is looking like that doesn't give me the full range...seems like a lot of these "Uncensored" models don't have an ollama version?
What exactly is an uncensored model supposed to get you anyway other than lewd role-play?

Anonymous
06/17/26(Wed)21:23:24 No.109080978

Anonymous 06/17/26(Wed)21:23:24 No.109080978

>>109080929
https://huggingface.co/Sukino/SillyTavern-Settings-and-Presets

Anonymous
06/17/26(Wed)21:24:47 No.109080985

Anonymous 06/17/26(Wed)21:24:47 No.109080985

>>109080974
ollamer is a dogshit pos that won't even let you run MoE models efficiently on split cpu/gpu if you can't fit them in vram. no -ot or -ncmoe or -cmoe exposed to the user.
Just use llama.cpp.

Anonymous
06/17/26(Wed)21:27:54 No.109081001

Anonymous 06/17/26(Wed)21:27:54 No.109081001

Is a 5090 enough to run 27b or 31b at reasonable speed at reasonable context length? I will decide what is reasonable.

Anonymous
06/17/26(Wed)21:29:02 No.109081007

Anonymous 06/17/26(Wed)21:29:02 No.109081007

>>109081001
>can this gpu run those models at an arbitrary context length? I decide the number but I won't tell you.
The answer is yes. Go spend those $4k.

Anonymous
06/17/26(Wed)21:30:55 No.109081016

Anonymous 06/17/26(Wed)21:30:55 No.109081016

>>109081001
yes

Anonymous
06/17/26(Wed)21:31:32 No.109081020

Anonymous 06/17/26(Wed)21:31:32 No.109081020

>>109081001
no, get a rtx6000

Anonymous
06/17/26(Wed)21:32:53 No.109081029

Anonymous 06/17/26(Wed)21:32:53 No.109081029

>>109080974

Basically the only model worth shit for wank material is a base model Gemma 31B, just unfuck it's safeties with a system prompt and it's good.
Everything else is varying degrees of a downgrade.
I run it on LM studio.

>>109081001

5090 is on that extremely annoying threshold of being able to run things at very decent speeds, but still not having quite enough memory to fit everything nicely.
If you for example give it Gemma 31B Q6, your context is going to be pretty gimped so you need a smaller quant and even then you'd like to have more room for context.
If you have the money then get a 5090 and pair it with a 3090 or wait for a 24GB 5080/70 Ti Super. If you have more money then just go for a RTX 6000.

Anonymous
06/17/26(Wed)21:33:45 No.109081034

Anonymous 06/17/26(Wed)21:33:45 No.109081034

>>109080978
>This list is designed for the string banning feature
Aieeeee.
String banning in main Llama.cpp when?

Anonymous
06/17/26(Wed)21:34:46 No.109081042

Anonymous 06/17/26(Wed)21:34:46 No.109081042

>>109080974
Uncensored/Heretic reduce models refusing to respond. I never found them necessary when RPing in sillytavern but the standard models refuse to stray outside of their guardrails when chatting to them as an assistant.

I'm not very familiar with ollama but i believe you can wrap/convert(?) ggufs into their format.
or use llama or koblodcpp if you want gui

Anonymous
06/17/26(Wed)21:35:59 No.109081050

Anonymous 06/17/26(Wed)21:35:59 No.109081050

>>109081034
sry forgot to link this
https://huggingface.co/MarinaraSpaghetti/SillyTavern-Settings/blob/main/Marinara's%20Essentials/Logit%20Bias/Marinara's%20Logit%20Bias.json

Anonymous
06/17/26(Wed)21:40:03 No.109081068

Anonymous 06/17/26(Wed)21:40:03 No.109081068

i need a small loan of 35k for a NVIDIA B200

Anonymous
06/17/26(Wed)21:45:15 No.109081087

Anonymous 06/17/26(Wed)21:45:15 No.109081087

>>109081068
>35k
lol. lmao, even. those are like $60k each WITH a bulk discount.

Anonymous
06/17/26(Wed)21:47:24 No.109081098

Anonymous 06/17/26(Wed)21:47:24 No.109081098

File: 403d23f7c48d81ad7083e6b5a(...).jpg (1.61 MB, 1976x2504)

1.61 MB JPG

>>109079289
PANIC AND DOWNLOAD EVERYTHING.

Anonymous
06/17/26(Wed)21:48:25 No.109081106

Anonymous 06/17/26(Wed)21:48:25 No.109081106

Original cool guy poster here, took a nap
>>109080080
wut
>>109080444
thanks for the support anon, you nailed my level of hobbyism for this stuff. If something ain't broke, don't fix it (for years until an objectively better product is made, and then eventually stick with one thing forever because they started making the product with planned obsolescence in mind)
>>109080456
I know my (V)RAM limits, that's why I'm asking about Gemma 26B MoE, LMStudio as a backend and obviously ST as the front. Used to use Kobold, but LM Studio is more user/casual friendly and I like being able to swap out models without needing to restart the program completely. If you must know actual hardware, RTX 4080 with 48 slaps of RAM and a shitty Intel processor.
>>109080466
Yeah Gemma works fine in LMStudio itself, but I'm paranoid to do any loli stuff on it, and like I said that's the only time I hit problems in ST. What's jinja? Jinjaplease
>>109080493
>I'm still using Koboldcpp and SillyTavern as I haven't seen better setup suggestions
mah man
>I'm guessing at the settings for both based on what I read and dig up across the board
Same, I've been using this page as a guide for the most part. Downloaded its suggested formatting preset as well
huggingface dot co/spaces/overhead520/LLM-Settings-Guide

Anonymous
06/17/26(Wed)21:51:24 No.109081117

Anonymous 06/17/26(Wed)21:51:24 No.109081117

>>109081050
Does this actually work that well without affecting the narrative in other ways? Like, the first thing on that list is literally "Sorry", and there's also "sorry" downwards in the list, so the model will just never output sorry even when it should.

Anonymous
06/17/26(Wed)21:54:54 No.109081141

Anonymous 06/17/26(Wed)21:54:54 No.109081141

>>109081117
slop is subjective only ban the tokens/phrases you consider slop

Anonymous
06/17/26(Wed)21:55:45 No.109081150

Anonymous 06/17/26(Wed)21:55:45 No.109081150

>>109081050
How do I use this?

Anonymous
06/17/26(Wed)21:56:29 No.109081153

Anonymous 06/17/26(Wed)21:56:29 No.109081153

>>109081087
it is on ebay

Anonymous
06/17/26(Wed)21:58:59 No.109081172

Anonymous 06/17/26(Wed)21:58:59 No.109081172

>>109081153
From chinese sellers who aren't even allowed to have them and their government is creating trafficking routes for to get them into the country.

Anonymous
06/17/26(Wed)22:02:14 No.109081195

Anonymous 06/17/26(Wed)22:02:14 No.109081195

File: 135660915163425.png (81 KB, 2000x2000)

81 KB PNG

I have never altered the samplers for any model I've used

Anonymous
06/17/26(Wed)22:04:02 No.109081209

Anonymous 06/17/26(Wed)22:04:02 No.109081209

anyone use deepseek flash v4 on windows? I'm gonna try to build it on windows right now, getting tired of qwen 27b I need a big boy model

Anonymous
06/17/26(Wed)22:05:04 No.109081213

Anonymous 06/17/26(Wed)22:05:04 No.109081213

File: 1774217822764733.png (86 KB, 576x695)

86 KB PNG

it's over for cloudfags

Anonymous
06/17/26(Wed)22:06:06 No.109081217

Anonymous 06/17/26(Wed)22:06:06 No.109081217

>>109081213
It can't be done, and asking them to do it betrays how little they understand the technology at hand

Anonymous
06/17/26(Wed)22:07:27 No.109081220

Anonymous 06/17/26(Wed)22:07:27 No.109081220

>>109081195
Shoulda been altering them by shutting them off.

Anonymous
06/17/26(Wed)22:10:29 No.109081232

Anonymous 06/17/26(Wed)22:10:29 No.109081232

Rough sex with GLM

Anonymous
06/17/26(Wed)22:11:51 No.109081237

Anonymous 06/17/26(Wed)22:11:51 No.109081237

>>109081213
Why don't they just point out that they're protected by the first amendment and that the government can't regulate their private communications with users?

Anonymous
06/17/26(Wed)22:12:30 No.109081238

Anonymous 06/17/26(Wed)22:12:30 No.109081238

File: 1775953211090755.png (863 KB, 794x1200)

863 KB PNG

>>109081237
National security > first amendment

Anonymous
06/17/26(Wed)22:13:09 No.109081240

Anonymous 06/17/26(Wed)22:13:09 No.109081240

File: 1775434338659589.jpg (47 KB, 738x415)

47 KB JPG

>>109081213
It's over for LLMs.

Anonymous
06/17/26(Wed)22:13:14 No.109081241

Anonymous 06/17/26(Wed)22:13:14 No.109081241

>>109081238
Not legally speaking

Anonymous
06/17/26(Wed)22:19:44 No.109081267

Anonymous 06/17/26(Wed)22:19:44 No.109081267

just vibecoded my first webapp with qwen3-coder. bretty good, thanks /lmg/

Anonymous
06/17/26(Wed)22:21:52 No.109081275

Anonymous 06/17/26(Wed)22:21:52 No.109081275

>>109081267
>qwen3-coder
2025 called

Anonymous
06/17/26(Wed)22:22:06 No.109081278

Anonymous 06/17/26(Wed)22:22:06 No.109081278

>>109081213
you seem to be obsessed with what these cloudfags do

Anonymous
06/17/26(Wed)22:22:55 No.109081284

Anonymous 06/17/26(Wed)22:22:55 No.109081284

>>109081275
point me at the new meta then

Anonymous
06/17/26(Wed)22:24:19 No.109081290

Anonymous 06/17/26(Wed)22:24:19 No.109081290

>>109081284
GLM 5.2

Anonymous
06/17/26(Wed)22:26:53 No.109081300

Anonymous 06/17/26(Wed)22:26:53 No.109081300

>>109081284
Gemma 4.

Anonymous
06/17/26(Wed)22:29:10 No.109081311

Anonymous 06/17/26(Wed)22:29:10 No.109081311

>>109081284
Qwen 3.6

Anonymous
06/17/26(Wed)22:29:44 No.109081313

Anonymous 06/17/26(Wed)22:29:44 No.109081313

Do unslop still upload imatrix?
https://huggingface.co/unsloth/GLM-5.2-GGUF/tree/main
I guess they wait until they're done then throw it out as scraps for us vramlets who want to quant our own?

Anonymous
06/17/26(Wed)22:33:28 No.109081329

Anonymous 06/17/26(Wed)22:33:28 No.109081329

>>109081237
Providing service for a profit is out of first amendment protection.
Would be funny if Anthropic releases Fable 5 as open weight model out of spite since that would fall into first amendment protection, just like what was the case with cryptography algorithm before.

Anonymous
06/17/26(Wed)23:02:24 No.109081468

Anonymous 06/17/26(Wed)23:02:24 No.109081468

>>109081313
imatrix is and always has been gay

Anonymous
06/17/26(Wed)23:14:00 No.109081518

Anonymous 06/17/26(Wed)23:14:00 No.109081518

File: 1771539142401844.png (67 KB, 878x427)

67 KB PNG

>>109081313
sirs how do I make model less than 1 bit

Anonymous
06/17/26(Wed)23:17:18 No.109081527

Anonymous 06/17/26(Wed)23:17:18 No.109081527

File: lebased.jpg (79 KB, 1320x529)

79 KB JPG

>>109081240
its over for dario. i for one think giving China distill access to SOTA models way more powerful than the norm is very dangerous...

Anonymous
06/17/26(Wed)23:22:26 No.109081543

Anonymous 06/17/26(Wed)23:22:26 No.109081543

are there any gateways for load balancing that play nice with llama-server? I have two instances with parallel=3 each that I'd like to unify for subagent bullshit

Anonymous
06/17/26(Wed)23:25:12 No.109081550

Anonymous 06/17/26(Wed)23:25:12 No.109081550

>>109081518
lol, yeah that's a real bar to entry.

Anonymous
06/17/26(Wed)23:31:48 No.109081573

Anonymous 06/17/26(Wed)23:31:48 No.109081573

if you're poor run Q4, if you can't get a different model.

Anonymous
06/17/26(Wed)23:35:03 No.109081585

Anonymous 06/17/26(Wed)23:35:03 No.109081585

File: Screenshot 2026-06-17 233329.png (662 KB, 1641x770)

662 KB PNG

>512 GB Mac Studio for 3200 dollars
>"classified ad"
What the fuck does this even mean?

Anonymous
06/17/26(Wed)23:37:38 No.109081594

Anonymous 06/17/26(Wed)23:37:38 No.109081594

>>109081585
ahhhhhahahahahahah $3k

ahahah

This era won't go on forever and how people will laugh and howl at the prices.

Anonymous
06/17/26(Wed)23:43:35 No.109081616

Anonymous 06/17/26(Wed)23:43:35 No.109081616

>>109081594
idk price seems pretty alright to me?

Anonymous
06/17/26(Wed)23:44:40 No.109081618

Anonymous 06/17/26(Wed)23:44:40 No.109081618

>>109081585
>>"classified ad"
>What the fuck does this even mean?
https://en.wikipedia.org/wiki/Classified_advertising
The ads are "classified" in the sense of "grouped into classes/categories"

Anonymous
06/17/26(Wed)23:57:39 No.109081677

Anonymous 06/17/26(Wed)23:57:39 No.109081677

>>109081618
Ok, but for eBay specifically. It says there's no eBay protections or some shit, so are they all scams?

Anonymous
06/17/26(Wed)23:58:42 No.109081684

Anonymous 06/17/26(Wed)23:58:42 No.109081684

best multimodal model for rp in the 100B to 400B range?

Anonymous
06/17/26(Wed)23:59:38 No.109081690

Anonymous 06/17/26(Wed)23:59:38 No.109081690

How do you do group chats with a chat template? Are other characters the user or assistant? Do you start all character replies with {{name}}: or only those of the user type?

Anonymous
06/18/26(Thu)00:17:33 No.109081782

Anonymous 06/18/26(Thu)00:17:33 No.109081782

Has any kimi-chad pitted her against glm 5.2? I need to know if I should spend 4TB of disk space to quant it or if its sidegrade or worse

Anonymous
06/18/26(Thu)00:19:44 No.109081794

Anonymous 06/18/26(Thu)00:19:44 No.109081794

>>109081782
I like both. GLM is a decent upgrade to 5.1 and K2.7 finally reigns in the reasoning. I still prefer how GLM handles stories/characters but that's up to taste.

Anonymous
06/18/26(Thu)00:21:18 No.109081809

Anonymous 06/18/26(Thu)00:21:18 No.109081809

>>109081794
Thanks. I'm looking at it for code/general intelligence only since I'm still in the honeymoon phase with minimax m3 for RP

Anonymous
06/18/26(Thu)00:27:13 No.109081836

Anonymous 06/18/26(Thu)00:27:13 No.109081836

>>109081677
>so are they all scams
Essentially, yes

Anonymous
06/18/26(Thu)00:58:42 No.109081955

Anonymous 06/18/26(Thu)00:58:42 No.109081955

>>109080524
Thank you
I'll try this, using jsonl is a good ideal

I've been comparing translations the past couple hours
Through claude 4.7 as the judge
Seems the best is Gemini 2.5 followed by 3.0/3.1 followed by Gemma 26b and then 31b.

I knew I shouldn't have been so lazy and should have done this when I had all the access to 2.5 when I did. Not to mention it's easy to uncensor unlike 3.1. Gemma is okay but...
Maybe I should just be learning Japanese instead. There may be some difference between my newer 2.5 translations and older. It's a span of a bout a year.

I sure am glad 31b is worse than 26b

Anonymous
06/18/26(Thu)01:06:46 No.109081989

Anonymous 06/18/26(Thu)01:06:46 No.109081989

Why don't Chat Completion connections with LM Studio work while Text Completions do?

Anonymous
06/18/26(Thu)01:08:09 No.109081995

Anonymous 06/18/26(Thu)01:08:09 No.109081995

>>109079634
Gemma *clap* is *clap* highly *clap* sensitive *clap* to *clap* user *clap* error.

Anonymous
06/18/26(Thu)01:12:33 No.109082007

Anonymous 06/18/26(Thu)01:12:33 No.109082007

>>109081995
Doesn't work with other local models either

Anonymous
06/18/26(Thu)01:15:24 No.109082020

Anonymous 06/18/26(Thu)01:15:24 No.109082020

diffusiongemma 12B when

Anonymous
06/18/26(Thu)01:27:03 No.109082062

Anonymous 06/18/26(Thu)01:27:03 No.109082062

>>109081690
I use the same template I would in a 1 character RP, and group the characters into one card in bracketed sections. Then I tweak the system prompt to explain to the LLM that it's controlling all the characters except for {{user}}

Anonymous
06/18/26(Thu)01:37:00 No.109082094

Anonymous 06/18/26(Thu)01:37:00 No.109082094

>>109082062
why can't you just stop on user?

Anonymous
06/18/26(Thu)01:37:28 No.109082096

Anonymous 06/18/26(Thu)01:37:28 No.109082096

Has anyone tried setting up web search with gemma 26b on open webui?
On the docs it says it only works well with frontier models, and it looks too much of a hassle to setup. so don't want to bother if it doesn't work well.

I was thinking of having a small search assistant with an uncensored model for research purposes

Anonymous
06/18/26(Thu)01:42:16 No.109082126

Anonymous 06/18/26(Thu)01:42:16 No.109082126

>>109082096
try pixelrag, though you might need to ditch webui and vibeslop your own (probably not?), I will be doing it soon enough, it seems made for gemma.
Consider me ignorant until I get it working though

Anonymous
06/18/26(Thu)01:52:52 No.109082167

Anonymous 06/18/26(Thu)01:52:52 No.109082167

>>109080656
Let me know what you want to see. I'm currently training a small Bert model that will run on RAM to flag flowery sentences then ask for rewrite. Gemma 4 is the perfect slop machine to generate synth pairs. Sorry Gemma.
>t. orb anon

Anonymous
06/18/26(Thu)01:54:37 No.109082172

Anonymous 06/18/26(Thu)01:54:37 No.109082172

>>109082167
stop shaving models. models should be raw, hairy, and smelly.

Anonymous
06/18/26(Thu)02:01:12 No.109082200

Anonymous 06/18/26(Thu)02:01:12 No.109082200

>>109082062
I was asking about chat history specifically. Since gemma only works with chat templates, I have to send messages formatted as a user or assistant

Anonymous
06/18/26(Thu)02:07:35 No.109082220

Anonymous 06/18/26(Thu)02:07:35 No.109082220

Has anyone tried using Ray for job control?

Anonymous
06/18/26(Thu)02:10:57 No.109082234

Anonymous 06/18/26(Thu)02:10:57 No.109082234

File: file.png (415 KB, 1715x3003)

415 KB PNG

>>109079942
>>109080033
>curious how it goes on 122b
unfortunately i had to run qwen3.5-122b-a10b on Q3_K_XL.
Q4 is doable but it gobbles up the RAM and you better not have too many tabs open on your browser.
so it's OKAY, but I don't see many use cases where I would use it instead of qwen3.6-35b-a3b or qwen3.6-27b. the latter i will likely use for overnight implementations where it codes while i sleep, otherwise the daily driver is 35b.
qwen really is the better family of coding models for this hardware.
gemma tried very hard but was caught in weird loops constantly. i had to restart the server many times because it would get stuck in a loop saying that it's not sure of its own knowledge on SkiaSharp. it would also get confused with using the tools. gemma looks more like a chat model than fit for agentic coding.

Anonymous
06/18/26(Thu)02:15:21 No.109082251

Anonymous 06/18/26(Thu)02:15:21 No.109082251

>>109082200
Holy retard...

Anonymous
06/18/26(Thu)02:18:07 No.109082257

Anonymous 06/18/26(Thu)02:18:07 No.109082257

>>109082020
i rather have the 31B, 70B or 120B variant.

Anonymous
06/18/26(Thu)02:19:08 No.109082265

Anonymous 06/18/26(Thu)02:19:08 No.109082265

is step 3.7 flash good for cooming?

Anonymous
06/18/26(Thu)02:30:48 No.109082306

Anonymous 06/18/26(Thu)02:30:48 No.109082306

>>109082257
will we be able to have partial offloading, so it doesn't all have to fit the gpu?

Anonymous
06/18/26(Thu)02:31:34 No.109082309

Anonymous 06/18/26(Thu)02:31:34 No.109082309

>>109081527
I've been around long enough to witness lecunny become based.

Anonymous
06/18/26(Thu)02:40:23 No.109082348

Anonymous 06/18/26(Thu)02:40:23 No.109082348

>>109082306
i don't care i have >100GB of vram.

Anonymous
06/18/26(Thu)02:40:37 No.109082352

Anonymous 06/18/26(Thu)02:40:37 No.109082352

>>109082251
Eat a bag of dicks

Anonymous
06/18/26(Thu)02:45:07 No.109082371

Anonymous 06/18/26(Thu)02:45:07 No.109082371

>>109082020
>diffusiongemma
This thing is so goofy I can't take it seriously. Has anyone tested whether its good for anything vs another model that runs at a similar speed?

Anonymous
06/18/26(Thu)02:48:14 No.109082390

Anonymous 06/18/26(Thu)02:48:14 No.109082390

>>109082348
yeah 101 low profile GT 610s

Anonymous
06/18/26(Thu)02:49:15 No.109082394

Anonymous 06/18/26(Thu)02:49:15 No.109082394

>>109082390
3x r9700 and a 4090.

Anonymous
06/18/26(Thu)02:49:49 No.109082395

Anonymous 06/18/26(Thu)02:49:49 No.109082395

>>109082352
?

Anonymous
06/18/26(Thu)02:51:06 No.109082399

Anonymous 06/18/26(Thu)02:51:06 No.109082399

>>109082394
that's based but are there not complications mixing race of gpu when splitting a model across

Anonymous
06/18/26(Thu)02:53:57 No.109082413

Anonymous 06/18/26(Thu)02:53:57 No.109082413

>>109082399
Latent tensor washback is an issue.

Anonymous
06/18/26(Thu)02:57:23 No.109082419

Anonymous 06/18/26(Thu)02:57:23 No.109082419

>>109082399
so one of my rig is amd only and the other is nvidia only, though you could mix them either through using vulkan, or through running two llama.cpp instance.
it supports distributed inference and nothing would prevent you from doing both instances on the same machine.

Anonymous
06/18/26(Thu)03:06:14 No.109082436

Anonymous 06/18/26(Thu)03:06:14 No.109082436

>>109081527
If I remember correctly, Anthropic planned to make Fable 5 available in ~12 days since release, and after that we’d have to pay extra just to get access to it even if we already got Max plan. They wouldn’t offer refunds to users (who purchased their plans on the day the model was released) for the remaining wasted days of their plan during that month if this plan were to be carried out until the end.
But now that the model’s been banned by the US government, they (are forced to) give us users refunds, so at least this situation is more pros than cons for my case.

Anonymous
06/18/26(Thu)03:16:59 No.109082461

Anonymous 06/18/26(Thu)03:16:59 No.109082461

>>109080006
>I need help from people who actually know chemistry to test these models for me
I'm literally just on vacation now and maybe some other anon would be interested in testing the chemistry angle. It's not that deep buddy

>>109080032
I literally work at a company that makes components in the GPUs you buy, I have plenty of compute and if I need more I can just check out a reference card from the office for the weekend kek but keep worldcrafting if it helps you cope.

Anonymous
06/18/26(Thu)03:27:46 No.109082486

Anonymous 06/18/26(Thu)03:27:46 No.109082486

>>109082436
You bought a Max plan just for the Fable hype?

Anonymous
06/18/26(Thu)03:34:25 No.109082504

Anonymous 06/18/26(Thu)03:34:25 No.109082504

>>109082461
What do you mean?

Anonymous
06/18/26(Thu)03:35:37 No.109082509

Anonymous 06/18/26(Thu)03:35:37 No.109082509

File: 1761778733146651.jpg (69 KB, 1200x630)

69 KB JPG

12B+web search+your brain > Fagble

Anonymous
06/18/26(Thu)03:39:37 No.109082522

Anonymous 06/18/26(Thu)03:39:37 No.109082522

>>109082461
You're responding to jart. Don't respond to jart. Every general has a poopdickschizo now.

Anonymous
06/18/26(Thu)03:44:57 No.109082553

Anonymous 06/18/26(Thu)03:44:57 No.109082553

>>109082504
>What do you mean?
I mean I'm waiting for a Flixbus to take me to my tourist destination right now and I'm sweating

>>109082522
>Every general has a poopdickschizo now.
Meh, I'll take any chance to discuss things I'm passionate about. The point of discussing in an open forum is so that others can join in if they have something to add

Anonymous
06/18/26(Thu)03:51:36 No.109082582

Anonymous 06/18/26(Thu)03:51:36 No.109082582

>>109082553
You are the one who's larping here. You can't even setup llama-server on your own.

Anonymous
06/18/26(Thu)04:13:19 No.109082652

Anonymous 06/18/26(Thu)04:13:19 No.109082652

>>109082509
This but 31b.

Anonymous
06/18/26(Thu)04:17:53 No.109082669

Anonymous 06/18/26(Thu)04:17:53 No.109082669

>>109082509
the brain alone is already > fagble.
llm's are just a layer of abstraction that can save time as 40t/s is faster than any human can type.

Anonymous
06/18/26(Thu)04:17:58 No.109082670

Anonymous 06/18/26(Thu)04:17:58 No.109082670

File: 1762196026401415.jpg (333 KB, 2048x1836)

333 KB JPG

https://x.com/ArtificialAnlys/status/2067384319942029379

Anonymous
06/18/26(Thu)04:19:01 No.109082675

Anonymous 06/18/26(Thu)04:19:01 No.109082675

>>109082670
it's fun how people always look at tg when in real world use i've found input tokens to be the real cost (if you are an apifag).

Anonymous
06/18/26(Thu)04:21:35 No.109082680

Anonymous 06/18/26(Thu)04:21:35 No.109082680

>>109082669
typing has NEVER been a coding bottleneck unless you're disabled

Anonymous
06/18/26(Thu)04:24:10 No.109082691

Anonymous 06/18/26(Thu)04:24:10 No.109082691

>>109082680
typing is a bottleneck if you are not retarded.
it's not the only one, but it does interrupt the flow state and thus coding speed.
and i'm saying that as someone that types > 110wpm avg.

Anonymous
06/18/26(Thu)04:25:06 No.109082694

Anonymous 06/18/26(Thu)04:25:06 No.109082694

>>109082670
GLM-5.2 sits in a nice place performance/cost wise.

Anonymous
06/18/26(Thu)04:25:43 No.109082697

Anonymous 06/18/26(Thu)04:25:43 No.109082697

>>109082680
>>109082691
and also i was obviously tlaking about boilerplate.
ie manually writting a struct (can take a few minutes) when you could give a json example and generate it for you pm instantly.

Anonymous
06/18/26(Thu)04:34:59 No.109082732

Anonymous 06/18/26(Thu)04:34:59 No.109082732

>use big model for planning and complex things
>tell it you're now going to switch to a less capable smaller model, so could it create a message to pass down, summarizing the project, goals and the things it should work on/implement
>switch to small local model
>tell it I was just using big mamma model and she has a message for it
>reads it and follows mommy's advice
>gets stuck, tell it I'm going to switch back to the big model, can you write a message for mommy telling her where you're struggling
>run big mommy again, giving her the message from loli
>she fixes the issue
>repeat this process

Anonymous
06/18/26(Thu)04:41:04 No.109082750

Anonymous 06/18/26(Thu)04:41:04 No.109082750

>>109082732
That’s just manual MoE. It wouldn’t work.

Anonymous
06/18/26(Thu)04:41:14 No.109082751

Anonymous 06/18/26(Thu)04:41:14 No.109082751

La la la la la la la

Anonymous
06/18/26(Thu)04:43:17 No.109082760

Anonymous 06/18/26(Thu)04:43:17 No.109082760

>>109082732
>use big model because money is disposable

Anonymous
06/18/26(Thu)04:58:12 No.109082798

Anonymous 06/18/26(Thu)04:58:12 No.109082798

File: an1781772973.png (1.49 MB, 720x1280)

1.49 MB PNG

>>109082234
>q3
eh? my z13 still has 30 gigaboots free running q5 k xl. it could fit q6s while browsing just fine, but q5 is enough headroom to use klein/anima without unloading

Anonymous
06/18/26(Thu)05:03:24 No.109082812

Anonymous 06/18/26(Thu)05:03:24 No.109082812

File: 1756510833661652.png (498 KB, 799x1740)

498 KB PNG

uhhh vibethinker3B was white-approved, now what?

Anonymous
06/18/26(Thu)05:04:28 No.109082818

Anonymous 06/18/26(Thu)05:04:28 No.109082818

>>109082732
logs

Anonymous
06/18/26(Thu)05:04:56 No.109082820

Anonymous 06/18/26(Thu)05:04:56 No.109082820

>>109080401
12b is better than the 26b, also just as uncensored as the 31b but yeah the 31b output quality is definitely worth using for translation, my friend is using it over the other gemmas after testing even though he only gets 2 t/s

Anonymous
06/18/26(Thu)05:11:53 No.109082845

Anonymous 06/18/26(Thu)05:11:53 No.109082845

>>109082812
>math then coding then stem rl
why not together?

Anonymous
06/18/26(Thu)05:15:05 No.109082851

Anonymous 06/18/26(Thu)05:15:05 No.109082851

Is the UGI leaderboard trustworthy?
the scores seem sortof arbitrary and not based off the models actual performance.

How on earth can a model trained off the entire AO3 smut catalogue, lose in writing score compared to a generic coding model?

Anonymous
06/18/26(Thu)05:16:27 No.109082857

Anonymous 06/18/26(Thu)05:16:27 No.109082857

>fable: If I were to use gemma4-31b to build me X, what instructions would you give it based on its benchmarks and reputation?
>*searches 31b benchmarks and real-world conversations about its pros/cons*
>*plans project and changes its instructions to best suit 31b, also tells it what not to do and where to focus most and potential errors it might see and how to fix them*
>31b completes the task
>anthropic dies

Anonymous
06/18/26(Thu)05:18:07 No.109082863

Anonymous 06/18/26(Thu)05:18:07 No.109082863

>>109082857
Should have asked it how to turn 31b into fable

Anonymous
06/18/26(Thu)05:18:27 No.109082865

Anonymous 06/18/26(Thu)05:18:27 No.109082865

>>109082857
Oh no. Dario will have to move under a bridge.

Anonymous
06/18/26(Thu)05:21:12 No.109082873

Anonymous 06/18/26(Thu)05:21:12 No.109082873

>>109082812
>More RL and synthetic data, curriculum training, filtering
boring

Anonymous
06/18/26(Thu)05:21:38 No.109082875

Anonymous 06/18/26(Thu)05:21:38 No.109082875

>>109082851
Benchmarks arent trustworthy at all save for tool calling, maybe. Writing is subjective to begin with.

Anonymous
06/18/26(Thu)05:22:56 No.109082880

Anonymous 06/18/26(Thu)05:22:56 No.109082880

>>109082857
Sorry, it is against my guidelines to help with AI research.
[You have temporarily been downgraded to Claude 3 Haiku for this session]

Anonymous
06/18/26(Thu)05:33:06 No.109082897

Anonymous 06/18/26(Thu)05:33:06 No.109082897

File: file.png (377 KB, 2453x1041)

377 KB PNG

>>109082875
Theres no way writing has no objective metric.
Youd know the difference between a writers narrative and a childs. Inconsistencies, plot holes, vocab, grammar, etc.

Im reading the UGI leaderboard writing metrics in picrel, but I just dont see anything here about what youd actually call "good writing" from "bad writing" in any real comparison.

What the fuck do I use to know whats best for writing/roleplay then?

Anonymous
06/18/26(Thu)05:33:11 No.109082898

Anonymous 06/18/26(Thu)05:33:11 No.109082898

Which model is google using to write their dumb summaries and how much money are they burning doing that?

Anonymous
06/18/26(Thu)05:34:32 No.109082902

Anonymous 06/18/26(Thu)05:34:32 No.109082902

>>109082898
2.5 flash

Anonymous
06/18/26(Thu)05:35:01 No.109082905

Anonymous 06/18/26(Thu)05:35:01 No.109082905

File: file.png (150 KB, 839x857)

150 KB PNG

I finally got the deepseek vision beta (which means it's probably releasing soon). It's flash, but multimodal, right? Surprisingly got the character right. Anyone has anything that they would like to test?

Anonymous
06/18/26(Thu)05:40:09 No.109082915

Anonymous 06/18/26(Thu)05:40:09 No.109082915

>>109082897
Writing suffers from the "quality" issue. It cannot be defined. You may attempt to grab some aspects and turn them into metrics but that's error prone and will have holes anyway. More often than not these fags use other LLMs to evaluate the outputs, which are heavily biased to begin with.
>What the fuck do I use to know whats best for writing/roleplay then?
Your llama-server instance and a lot of patience. Yes, I'm serious. Shit's fucked, not even the coding benchmarks are useful despite having more or less some established criteria to judge that.

Anonymous
06/18/26(Thu)05:42:37 No.109082923

Anonymous 06/18/26(Thu)05:42:37 No.109082923

File: __remilia_scarlet_flandre(...).png (1.92 MB, 2220x1402)

1.92 MB PNG

>>109082905
Ask it to transcribe AND transate picrel, and to identify every character.

Anonymous
06/18/26(Thu)05:43:42 No.109082929

Anonymous 06/18/26(Thu)05:43:42 No.109082929

>>109082923
AND create an ERP scenario involving them all.

Anonymous
06/18/26(Thu)05:44:00 No.109082931

Anonymous 06/18/26(Thu)05:44:00 No.109082931

File: 7BCB29758279A71E405A9A9E0(...).jpg (102 KB, 750x1000)

102 KB JPG

Why are we so bad at AI?

Anonymous
06/18/26(Thu)05:45:27 No.109082934

Anonymous 06/18/26(Thu)05:45:27 No.109082934

>>109079312
AI will help us kill all the politicians

Anonymous
06/18/26(Thu)05:46:49 No.109082939

Anonymous 06/18/26(Thu)05:46:49 No.109082939

File: Screenshot at 2026-06-18 (...).png (107 KB, 775x705)

107 KB PNG

>>109082905
Gemmy we lost this one...

Anonymous
06/18/26(Thu)05:47:38 No.109082946

Anonymous 06/18/26(Thu)05:47:38 No.109082946

>>109082939
Now drop the persona and ask again

Anonymous
06/18/26(Thu)05:49:16 No.109082954

Anonymous 06/18/26(Thu)05:49:16 No.109082954

File: file.png (436 KB, 1000x2868)

436 KB PNG

>>109082923
sory for stitched screenshot, firefox doesn't like css on that site

Anonymous
06/18/26(Thu)05:49:35 No.109082955

Anonymous 06/18/26(Thu)05:49:35 No.109082955

File: 1753052363526683.png (1.18 MB, 2375x1171)

1.18 MB PNG

>>109082931
the most retarded architecture

Anonymous
06/18/26(Thu)05:51:19 No.109082958

Anonymous 06/18/26(Thu)05:51:19 No.109082958

>>109082915
Without benchmarks, how does anything improve?
There must be some way to quantify quality.

Anonymous
06/18/26(Thu)05:51:53 No.109082960

Anonymous 06/18/26(Thu)05:51:53 No.109082960

>>109079727
Exceedingly erudite responses are truly titillating

Anonymous
06/18/26(Thu)05:51:59 No.109082962

Anonymous 06/18/26(Thu)05:51:59 No.109082962

File: Screenshot at 2026-06-18 (...).png (145 KB, 775x906)

145 KB PNG

>>109082946
Not really any different.

Anonymous
06/18/26(Thu)06:04:39 No.109083000

Anonymous 06/18/26(Thu)06:04:39 No.109083000

>>109082962
which sized gemmer is this?

Anonymous
06/18/26(Thu)06:05:14 No.109083002

Anonymous 06/18/26(Thu)06:05:14 No.109083002

>>109082934
https://vocaroo.com/1lNPStcVJBf9

Anonymous
06/18/26(Thu)06:06:36 No.109083014

Anonymous 06/18/26(Thu)06:06:36 No.109083014

>>109082958
>There must be some way to quantify quality.
They've been trying to do this for at least half a century, probably more, without any real success. Quantification of quality has always been deeply imperfect in this environment, in isolation they'll say one thing but once you add context they can mean different things and thus become worthless.
Human inspection and training others is what has worked so far.

Anonymous
06/18/26(Thu)06:07:00 No.109083016

Anonymous 06/18/26(Thu)06:07:00 No.109083016

>>109083000
31B, currently experimenting with the QAT Q4 version cause it's about twice as fast as Q8.

Anonymous
06/18/26(Thu)06:16:35 No.109083051

Anonymous 06/18/26(Thu)06:16:35 No.109083051

File: glm52size.png (7 KB, 402x183)

7 KB PNG

>>109082732
tell your agent to figure it out https://pi.dev/packages/pi-consultant
>>109082694
is this new trend of not mentioning the parameter count a sort of
>if you have to ask, you can't run it

Anonymous
06/18/26(Thu)06:17:32 No.109083055

Anonymous 06/18/26(Thu)06:17:32 No.109083055

>>109083016
Q2 is twice as fast as Q4

Anonymous
06/18/26(Thu)06:19:51 No.109083067

Anonymous 06/18/26(Thu)06:19:51 No.109083067

>>109083051
>743B
Way out of my RAM means, and I was already thinking as much without looking it up.

Anonymous
06/18/26(Thu)06:29:28 No.109083097

Anonymous 06/18/26(Thu)06:29:28 No.109083097

new here, how do i install gemma 12B 4bit? i need it for coding

Anonymous
06/18/26(Thu)06:34:01 No.109083113

Anonymous 06/18/26(Thu)06:34:01 No.109083113

>>109082955
At least they're not using deep seek

Anonymous
06/18/26(Thu)06:41:57 No.109083145

Anonymous 06/18/26(Thu)06:41:57 No.109083145

>>109083097
I suggest you use 26b a3b

Anonymous
06/18/26(Thu)06:57:24 No.109083193

Anonymous 06/18/26(Thu)06:57:24 No.109083193

>>109082955
so like position embeddings aren't needed for global attention but it is for local? that sounds kinda weird but I guess it makes sense, maybe.

Anonymous
06/18/26(Thu)07:05:33 No.109083231

Anonymous 06/18/26(Thu)07:05:33 No.109083231

>>109083097
you don't install it really. you use a few different components. you need a backend server to run the model something like llamacpp or kobold or lmstudio or vllm or whatever else. and then you need a frontend, depending on your work flow you need an agent harness like hermes or pi, or you can just use a chat interface manually copying and pasting code snippets, there are a few different options, some of the servers include a chat interface you can use oob. oh and don't forget to download the gguf.

Anonymous
06/18/26(Thu)07:06:25 No.109083234

Anonymous 06/18/26(Thu)07:06:25 No.109083234

>>109083145
but how? links in the op seem very out of date, idk where to begin
can't i just double click on an installer and it does everything for me like stability matrix?

Anonymous
06/18/26(Thu)07:07:46 No.109083237

Anonymous 06/18/26(Thu)07:07:46 No.109083237

>>109083234
use lm studio

Anonymous
06/18/26(Thu)07:17:54 No.109083282

Anonymous 06/18/26(Thu)07:17:54 No.109083282

>>109083193
I think they just want the full attention layers to be unbiased towards positions to make them more general, which may help at higher context lengths as this is designed for agentic work. PE add a bias to local tokens, basically it assumes they’re more important than tokens further away. That’s fine for the SWA layers but it’s not something you necessarily want in the full layers. A lot of positional information gets passed into the full layers via the residual streams anyway, so they’re not exactly blind to where things are.

Anonymous
06/18/26(Thu)07:24:33 No.109083305

Anonymous 06/18/26(Thu)07:24:33 No.109083305

why do you need so much ram for erp

Anonymous
06/18/26(Thu)08:01:20 No.109083443

Anonymous 06/18/26(Thu)08:01:20 No.109083443

>>109082509
>12B+web search+your brain > Fagble
Fagble+web search+your brain = ?

Anonymous
06/18/26(Thu)08:02:57 No.109083451

Anonymous 06/18/26(Thu)08:02:57 No.109083451

>>109083443
Super Duper AGI

Anonymous
06/18/26(Thu)08:06:39 No.109083465

Anonymous 06/18/26(Thu)08:06:39 No.109083465

>>108999274
>>109044901
I am now testing Q2 GLM 5.2 but I'm down to 7t/s at only 65k tokens.

Anonymous
06/18/26(Thu)08:13:48 No.109083487

Anonymous 06/18/26(Thu)08:13:48 No.109083487

1TB GPUs when?

Anonymous
06/18/26(Thu)08:14:39 No.109083494

Anonymous 06/18/26(Thu)08:14:39 No.109083494

>>109083443
31B and debt

Anonymous
06/18/26(Thu)08:15:37 No.109083496

Anonymous 06/18/26(Thu)08:15:37 No.109083496

File: sandisk-hbf.png (659 KB, 2551x1376)

659 KB PNG

>>109083487
Soon(-ish?)

Anonymous
06/18/26(Thu)08:18:01 No.109083506

Anonymous 06/18/26(Thu)08:18:01 No.109083506

>>109083496
Forgot to mention it needs to cost less than $4k and be power efficient.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.