/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 06/05/26(Fri)19:00:04 No.108988701

File: 1762565947250834.jpg (724 KB, 2048x1536)

724 KB JPG

/lmg/ - Local Models General Anonymous 06/05/26(Fri)19:00:04 No.108988701 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108984529 & >>108980055

►News
>(06/05) dots.tts 2B released: https://hf.co/rednote-hilab/dots.tts-soar
>(06/05) Gemma 4 QAT models released: https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4
>(06/04) Higgs Audio v3 TTS released: https://boson.ai/blog/higgs-audio-v3-tts
>(06/04) Nemotron-3-Ultra-550B-A55B released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
>(06/03) Gemma 4 12B Unified model released: https://hf.co/google/gemma-4-12B-it

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/05/26(Fri)19:00:40 No.108988711

Anonymous 06/05/26(Fri)19:00:40 No.108988711

File: 1739250286596.jpg (104 KB, 1000x1000)

104 KB JPG

►Recent Highlights from the Previous Thread: >>108984529

--Unsloth Gemma 4 QAT quant accuracy claims and skepticism:
>108986383 >108986405 >108986408 >108986410 >108986424 >108986447 >108986504 >108986533 >108986544 >108986559 >108986577 >108986608 >108987012
--Gemma 4 QAT near-BF16 Q4, and potential llama.cpp MTP support:
>108986761 >108986766 >108986782 >108986805 >108986818 >108986828 >108986834
--3090 24GB limits with 32K/70K context and system overhead:
>108986899 >108986928 >108986937 >108986952 >108986967 >108986978 >108986995 >108986954 >108986974
--Unsloth QAT knowledge retention and evaluation methodology:
>108987079 >108987154 >108987195 >108987223 >108987328 >108987381
--Unsloth QAT versus llama.cpp Gemma-4 quantization accuracy:
>108987587 >108987629 >108987788 >108987840 >108988071 >108988087 >108988095 >108987918 >108987920 >108987929 >108987930
--Performance of dots.tts on Seed-TTS-Eval:
>108987945 >108987996 >108988023 >108988184 >108988281 >108988294
--Anon benchmarks llama.cpp batch and CPU MoE settings:
>108986409
--llama.cpp NCCL boosting tensor parallel multi-GPU performance:
>108986764
--Testing Unsloth QAT 31B against standard quants:
>108988539 >108988550 >108988569 >108988623
--AMD Strix Halo tradeoffs for large context local LLMs:
>108985094 >108986729
--Anon modifies imatrix to use chat templates and multimodal data:
>108984809 >108985351 >108986176 >108986241 >108986305
--Adelic Gemma 4 12B server setup and performance:
>108985061 >108985087 >108985100 >108985857 >108986562
--Gemma 4 12B unified multimodal architecture explained:
>108984735 >108984769 >108984775
--Logs:
>108984830 >108984986 >108985020 >108985032 >108985108 >108985256 >108985297 >108985388 >108985399 >108985661 >108985741 >108985854 >108986497 >108986572 >108986691 >108986728 >108987403 >108987460 >108987492
--Miku (free space):
>108986398

►Recent Highlight Posts from the Previous Thread: >>108984530

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/05/26(Fri)19:02:14 No.108988718

Anonymous 06/05/26(Fri)19:02:14 No.108988718

>>108988698
RIP RAMlets
2 3090s btw

Anonymous
06/05/26(Fri)19:02:28 No.108988722

Anonymous 06/05/26(Fri)19:02:28 No.108988722

Mikulove

Anonymous
06/05/26(Fri)19:05:28 No.108988731

Anonymous 06/05/26(Fri)19:05:28 No.108988731

>>108988711
wow im in the highlights, a space all to myself too

Anonymous
06/05/26(Fri)19:08:01 No.108988740

Anonymous 06/05/26(Fri)19:08:01 No.108988740

File: Screenshot_20260605_165214.png (438 KB, 1499x1164)

438 KB PNG

Until real testing is done this is canon.

Anonymous
06/05/26(Fri)19:09:10 No.108988748

Anonymous 06/05/26(Fri)19:09:10 No.108988748

>>108988718
>RIP RAMlets
Our suffering never ends. maybe next year.

Anonymous
06/05/26(Fri)19:09:25 No.108988751

Anonymous 06/05/26(Fri)19:09:25 No.108988751

File: 1.png (5 KB, 276x160)

5 KB PNG

Why is my gemma 4 31b qat q4 with 8k context eating up all my vram on 3090? it fills up completely and slows down to a crawl. am i doing something wrong or is that normal. Im using koboldcpp and windows 10.

Anonymous
06/05/26(Fri)19:09:26 No.108988753

Anonymous 06/05/26(Fri)19:09:26 No.108988753

>>108988740
loli version

Anonymous
06/05/26(Fri)19:09:44 No.108988755

Anonymous 06/05/26(Fri)19:09:44 No.108988755

what's the best

speech -> llm -> audio

pipeline I can run at home and use on my phone?

Do I really need to make a custom tool for this? Or can llamacpp do this?

Anonymous
06/05/26(Fri)19:09:53 No.108988758

Anonymous 06/05/26(Fri)19:09:53 No.108988758

>>108988740
how many "pussy grade" spreadsheets have you prompted in total now? is this a habit of yours?

Anonymous
06/05/26(Fri)19:11:07 No.108988763

Anonymous 06/05/26(Fri)19:11:07 No.108988763

I'm tired of people trusting unslop's skewed statistics. If they need to spam their updates every single day and spam how good they are - they probably aren't that good. It's a simple rule: if you need to talk about how "powerful" you are, in reality you aren't that powerful at all.

Anonymous
06/05/26(Fri)19:11:15 No.108988764

Anonymous 06/05/26(Fri)19:11:15 No.108988764

>>108988758
This was the original I had the other ones for slower anons
Would you like a different kind or chart or graph?

Anonymous
06/05/26(Fri)19:13:45 No.108988772

Anonymous 06/05/26(Fri)19:13:45 No.108988772

File: miku small thumb up.png (22 KB, 240x240)

22 KB PNG

>>108988731

Anonymous
06/05/26(Fri)19:14:31 No.108988779

Anonymous 06/05/26(Fri)19:14:31 No.108988779

File: スクリーンショット_20260605_182316.png (55 KB, 930x746)

55 KB PNG

I'm getting real tired of this. I don't know why this keeps happening, or what I need to change to fix it. It's a custom app using gemma 4 31B. Only settings being sent in the requests is temperature 0.7 and medium reasoning effort.

Anonymous
06/05/26(Fri)19:14:49 No.108988781

Anonymous 06/05/26(Fri)19:14:49 No.108988781

>>108988751
I don't remember if SWA is activated by default on kcpp, try forcing it on?

--useswa

Anonymous
06/05/26(Fri)19:15:21 No.108988782

Anonymous 06/05/26(Fri)19:15:21 No.108988782

File: big models on bad hardware.png (2.19 MB, 1122x1402)

2.19 MB PNG

Here's how poor people can still win with 17 tokens per second which is enough for anything.

Anonymous
06/05/26(Fri)19:16:09 No.108988788

Anonymous 06/05/26(Fri)19:16:09 No.108988788

>>108988779
quant? kv cache quant?

Anonymous
06/05/26(Fri)19:16:22 No.108988790

Anonymous 06/05/26(Fri)19:16:22 No.108988790

>>108988782
when will local image models be this good at so much text?

Anonymous
06/05/26(Fri)19:18:06 No.108988797

Anonymous 06/05/26(Fri)19:18:06 No.108988797

>>108988781
--usseewa

Anonymous
06/05/26(Fri)19:18:35 No.108988799

Anonymous 06/05/26(Fri)19:18:35 No.108988799

>>108988782
If you're aiming this at the desperate you might want to add that windows is usually slower, too.

Anonymous
06/05/26(Fri)19:19:11 No.108988805

Anonymous 06/05/26(Fri)19:19:11 No.108988805

File: スクリーンショット_20260605_191709.png (206 KB, 1181x1398)

206 KB PNG

>>108988788
This is what I have set in oobabooga textgen, basically defaults + "-fa 1" in extra flags because I read it is supposed to be used with Strix Halo

Anonymous
06/05/26(Fri)19:19:40 No.108988808

Anonymous 06/05/26(Fri)19:19:40 No.108988808

>have 2 R9700s and 2 V620s
>they suck because the bandwidth is dogshit
fucking AMD
If you had $4000 to get hardware, what would you get? A 5090? Some 3090 Turbos?

Anonymous
06/05/26(Fri)19:19:46 No.108988810

Anonymous 06/05/26(Fri)19:19:46 No.108988810

So would 4 bit QAT version quantized to 3 bit be better than the non QAT 3 bit quant? or would that just cause double errors since the OAT's are trained to be 4 bit?
I can't run 31b 4 bit because poor

Anonymous
06/05/26(Fri)19:20:01 No.108988812

Anonymous 06/05/26(Fri)19:20:01 No.108988812

>>108988782
I have 8GB VRAM and 64GB RAM, what should I do? 26B Q8 gemma 4?
I'm using Q4 right now

Anonymous
06/05/26(Fri)19:20:44 No.108988817

Anonymous 06/05/26(Fri)19:20:44 No.108988817

>>108988810
A big model at 2bit or 3 bit is better than small models at 16 bits.

Anonymous
06/05/26(Fri)19:21:35 No.108988822

Anonymous 06/05/26(Fri)19:21:35 No.108988822

vramlets lost
vramgods lost
everyone lost

Anonymous
06/05/26(Fri)19:21:36 No.108988823

Anonymous 06/05/26(Fri)19:21:36 No.108988823

>>108988810
It's a good question. Not sure anyone knows the answer to that. Maybe try it out and report back?

Anonymous
06/05/26(Fri)19:22:23 No.108988826

Anonymous 06/05/26(Fri)19:22:23 No.108988826

>>108988781
SWA was off by default. Turned it on and i can have 16k context and only takes up 20gb of vram. Thanks for help.

Anonymous
06/05/26(Fri)19:22:31 No.108988827

Anonymous 06/05/26(Fri)19:22:31 No.108988827

>>108988782
>Sources: LocalLLaMA

>>108988790
I hope not. These things always look so cluttered. Like it needs every single pixel to have something jammed in there.

Anonymous
06/05/26(Fri)19:22:43 No.108988828

Anonymous 06/05/26(Fri)19:22:43 No.108988828

>>108988808
stack 3090s

Anonymous
06/05/26(Fri)19:22:53 No.108988831

Anonymous 06/05/26(Fri)19:22:53 No.108988831

>>108988812
whatever moe you can fit the active param plus context in vram

Anonymous
06/05/26(Fri)19:23:14 No.108988833

Anonymous 06/05/26(Fri)19:23:14 No.108988833

>>108988805
looks alright desu. did you check the model shasum (if the download got fucked)? if its fine maybe your abliteration got brain damage, try the normal one

Anonymous
06/05/26(Fri)19:24:04 No.108988839

Anonymous 06/05/26(Fri)19:24:04 No.108988839

>>108988779
You are better off using llama-server as your backend. If you are working on your own game or project I would use text completion end point. It takes bit more work because you will need to implement and track your own chat template and manage reasoning brackets (which isn't that complicated at all anyway) but in the end you have something what will work the way you want it to work.

Anonymous
06/05/26(Fri)19:25:50 No.108988847

Anonymous 06/05/26(Fri)19:25:50 No.108988847

I'm still bummed Step 3.7 didn't hold up long-term for me.
I fear Gemma will be the only one worth using <40B for the entire year forward.

Anonymous
06/05/26(Fri)19:25:54 No.108988849

Anonymous 06/05/26(Fri)19:25:54 No.108988849

>>108988823
I would try it but I can't find 4 bit QAT 3 bit GGUFs

Anonymous
06/05/26(Fri)19:27:00 No.108988860

Anonymous 06/05/26(Fri)19:27:00 No.108988860

>>108988808
just the gpu? 2x5090

Anonymous
06/05/26(Fri)19:27:19 No.108988861

Anonymous 06/05/26(Fri)19:27:19 No.108988861

>>108988849
You could make your own quant.

Anonymous
06/05/26(Fri)19:28:14 No.108988867

Anonymous 06/05/26(Fri)19:28:14 No.108988867

File: 1212556C7071C94699F5EE1F7(...).png (67 KB, 757x687)

67 KB PNG

>>108988812
There are only two

Anonymous
06/05/26(Fri)19:29:01 No.108988873

Anonymous 06/05/26(Fri)19:29:01 No.108988873

Why is there always a group of faggots that make a concentrated effort to spread FUD whenever something good drops?

Anonymous
06/05/26(Fri)19:29:45 No.108988876

Anonymous 06/05/26(Fri)19:29:45 No.108988876

>>108988867
50 cent have been deposited in your acc- you do it for free?

Anonymous
06/05/26(Fri)19:30:43 No.108988882

Anonymous 06/05/26(Fri)19:30:43 No.108988882

>>108988860
Just the GPU, yeah.
I wish I could get a 5090 for $2K but there's nowhere near me that's selling them for that price (I've checked FB marketplace too). The cheapest I've found is $3500.

>>108988828
Thanks, I'm considering it. But I already have a ton of VRAM, so I'm thinking of getting something more performant (for smaller models).

Anonymous
06/05/26(Fri)19:30:55 No.108988885

Anonymous 06/05/26(Fri)19:30:55 No.108988885

>>108988873
What is FUD? I'm not from around these parts.

Anonymous
06/05/26(Fri)19:31:26 No.108988887

Anonymous 06/05/26(Fri)19:31:26 No.108988887

File: file.png (27 KB, 751x102)

27 KB PNG

What are the MTP files for?

Anonymous
06/05/26(Fri)19:32:51 No.108988895

Anonymous 06/05/26(Fri)19:32:51 No.108988895

>>108988887
for MTP

Anonymous
06/05/26(Fri)19:33:24 No.108988898

Anonymous 06/05/26(Fri)19:33:24 No.108988898

>>108988885
Fucking Use Duckduckgo

Anonymous
06/05/26(Fri)19:34:07 No.108988902

Anonymous 06/05/26(Fri)19:34:07 No.108988902

>>108988867
go back

Anonymous
06/05/26(Fri)19:35:42 No.108988908

Anonymous 06/05/26(Fri)19:35:42 No.108988908

>>108988882
>I wish I could get a 5090 for $2K but there's nowhere near me that's selling them for that price (I've checked FB marketplace too). The cheapest I've found is $3500.
snatch a FE whenever they drop

Anonymous
06/05/26(Fri)19:35:45 No.108988909

Anonymous 06/05/26(Fri)19:35:45 No.108988909

>>108988898
Why are you so hateful?

Anonymous
06/05/26(Fri)19:36:28 No.108988911

Anonymous 06/05/26(Fri)19:36:28 No.108988911

>>108988849
Download the qat-q4_0-unquantized version and quant it to Q3 yourself

Anonymous
06/05/26(Fri)19:36:44 No.108988914

Anonymous 06/05/26(Fri)19:36:44 No.108988914

>>108988908
>whenever they drop
anon...

Anonymous
06/05/26(Fri)19:37:26 No.108988917

Anonymous 06/05/26(Fri)19:37:26 No.108988917

>>108988867
It's not exactly wrong for absolute beginners with shit cards, but I'd go gemma instead.

Anonymous
06/05/26(Fri)19:38:22 No.108988922

Anonymous 06/05/26(Fri)19:38:22 No.108988922

>>108988873
yes, but not concentrated, just classic jaded anons who could win the lottery and still find a way to complain about it

Anonymous
06/05/26(Fri)19:38:32 No.108988925

Anonymous 06/05/26(Fri)19:38:32 No.108988925

>>108988782
>keep hot experts on VRAM, cold on RAM
It doesn't actually work like that though, right? You can only offload layers not the individual expert models that make up the MoE architecture.

Anonymous
06/05/26(Fri)19:39:02 No.108988929

Anonymous 06/05/26(Fri)19:39:02 No.108988929

>>108988914
I got 2 of them in January, dunno if they never drop them anymore

Anonymous
06/05/26(Fri)19:40:46 No.108988941

Anonymous 06/05/26(Fri)19:40:46 No.108988941

Bart won.
China won.
Q8 won.

Anonymous
06/05/26(Fri)19:42:16 No.108988951

Anonymous 06/05/26(Fri)19:42:16 No.108988951

>>108988885
Stock market lingo tourists learned during the crypto bubble and now repeat mindlessly even when it doesn't make sense.

Anonymous
06/05/26(Fri)19:42:43 No.108988953

Anonymous 06/05/26(Fri)19:42:43 No.108988953

>>108988833
It was working testing through the textgen UI, but I never really did longer conversations so maybe it is just retarded. I'm trying the unsloth gguf from the official model and it seems to be doing better.
>>108988839
I can give it a try again. Last I tried it would break or crash more often than textgen which makes it difficult since it is on a completely different computer.

Anonymous
06/05/26(Fri)19:44:04 No.108988960

Anonymous 06/05/26(Fri)19:44:04 No.108988960

File: annoyed angry miku pointi(...).png (123 KB, 261x412)

123 KB PNG

>>108988172 (me)
Got dots.tts to run on Windows following this https://github.com/SpenserCai/ComfyUI-FunAudioLLM/issues/7#issuecomment-2404068000 and installing triton-windows
13 seconds of audio took 22 seconds on a 3090 with warmed up dots-tts-soar, default settings, 7500MB VRAM usage. SLOW. needs a de-esser (ssss not present in voice sample):
https://voca.ro/1izKJW8IbhSL

Anonymous
06/05/26(Fri)19:44:35 No.108988963

Anonymous 06/05/26(Fri)19:44:35 No.108988963

>>108988953
Unless your GPU is non-Nvidia, llama-server should be stable as long as you don't OOM.

Anonymous
06/05/26(Fri)19:45:57 No.108988974

Anonymous 06/05/26(Fri)19:45:57 No.108988974

>>108988908
Pretty sure they don't drop anymore. NowInStock says the last time they had them was Jan 20th.

Anonymous
06/05/26(Fri)19:46:57 No.108988980

Anonymous 06/05/26(Fri)19:46:57 No.108988980

Cumcloth lost.

Anonymous
06/05/26(Fri)19:49:37 No.108988994

Anonymous 06/05/26(Fri)19:49:37 No.108988994

>>108988951
Makes sense I guess. I was guessing it's referencing something like fubar or fugazi but whatever.
FUD is zoomer lingo because they steal and reinvent everything.

Anonymous
06/05/26(Fri)19:51:37 No.108989001

Anonymous 06/05/26(Fri)19:51:37 No.108989001

>>108988922
I think back to when Gemma 4 first dropped and how faggots squatted the thread 24/7 making lies, it's such a strange phenomenon.

Anonymous
06/05/26(Fri)19:51:53 No.108989003

Anonymous 06/05/26(Fri)19:51:53 No.108989003

>>108988974
Just checked, last drop was "04/30/2026" on best buy

Anonymous
06/05/26(Fri)19:52:58 No.108989006

Anonymous 06/05/26(Fri)19:52:58 No.108989006

File: スクリーンショット_20260605_195029.png (35 KB, 702x250)

35 KB PNG

>>108988963
I unfortunately don't have money for nvidia that can fit full gemma, so I'm stuck with my amd.

Anonymous
06/05/26(Fri)19:52:59 No.108989008

Anonymous 06/05/26(Fri)19:52:59 No.108989008

>>108988831
I can run Q8 with 128k ctx and 16 t/s, coolio

Anonymous
06/05/26(Fri)19:54:09 No.108989014

Anonymous 06/05/26(Fri)19:54:09 No.108989014

>>108988960
>https://voca.ro/1izKJW8IbhSL
nice, sounds like my experience, its great, but slow. Hoping somebody will figure out some fixes/hacks to make it run faster

Anonymous
06/05/26(Fri)19:54:22 No.108989015

Anonymous 06/05/26(Fri)19:54:22 No.108989015

>>108988960
Can you try different emotions? Anger, sorrow, sarcastic, happy?

Anonymous
06/05/26(Fri)19:56:35 No.108989031

Anonymous 06/05/26(Fri)19:56:35 No.108989031

>>108989015
No emotion control. Try it yourself if curious: https://huggingface.co/spaces/rednote-hilab/dots.tts

Anonymous
06/05/26(Fri)19:56:41 No.108989032

Anonymous 06/05/26(Fri)19:56:41 No.108989032

File: 1689957414234047.png (24 KB, 772x1124)

24 KB PNG

Caught up on the threads. Nice to see Elon's back in the threads.

Anonymous
06/05/26(Fri)19:56:50 No.108989034

Anonymous 06/05/26(Fri)19:56:50 No.108989034

File: varnishing act.jpg (156 KB, 1216x832)

156 KB JPG

Anonymous
06/05/26(Fri)19:58:25 No.108989045

Anonymous 06/05/26(Fri)19:58:25 No.108989045

File: file.png (87 KB, 2267x587)

87 KB PNG

>>108989031
Sad, I thought the demos showed different types of emotions.

Anonymous
06/05/26(Fri)19:58:37 No.108989047

Anonymous 06/05/26(Fri)19:58:37 No.108989047

>>108989034
Lingual splinters with Yuki

Anonymous
06/05/26(Fri)19:58:59 No.108989049

Anonymous 06/05/26(Fri)19:58:59 No.108989049

>>108989034
catbox it now

Anonymous
06/05/26(Fri)20:01:19 No.108989066

Anonymous 06/05/26(Fri)20:01:19 No.108989066

>>108989006
Have you checked the raw requests your custom application is sending? See if ooba has some way to log them from its end too. If it's not samplers or the model, it must be due to the chat template.

Anonymous
06/05/26(Fri)20:01:27 No.108989067

Anonymous 06/05/26(Fri)20:01:27 No.108989067

>>108989045
Must be inferred from the literary tone of the prompt.
>questioned incredulously
>!

Anonymous
06/05/26(Fri)20:01:37 No.108989068

Anonymous 06/05/26(Fri)20:01:37 No.108989068

>>108989034
holy sex

Anonymous
06/05/26(Fri)20:03:19 No.108989076

Anonymous 06/05/26(Fri)20:03:19 No.108989076

File: file.png (7 KB, 316x53)

7 KB PNG

STOP overthinking you dumb clanker

Anonymous
06/05/26(Fri)20:08:08 No.108989095

Anonymous 06/05/26(Fri)20:08:08 No.108989095

>>108989014
>make it run faster
Might be faster running at FP16 instead of FP32 so it uses the GPU cores properly?

Anonymous
06/05/26(Fri)20:10:19 No.108989110

Anonymous 06/05/26(Fri)20:10:19 No.108989110

would i be retarded to buy a GMKtec EVO-X2 instead of stacking GPUs?

Anonymous
06/05/26(Fri)20:15:37 No.108989132

Anonymous 06/05/26(Fri)20:15:37 No.108989132

File: e2ec2b0c9e50edc22f211faed(...).jpg (847 KB, 1196x1556)

847 KB JPG

>>108989034
i dunno who this is but i liked her when i saw this image i think its a vocaloid or somethign

Anonymous
06/05/26(Fri)20:18:20 No.108989142

Anonymous 06/05/26(Fri)20:18:20 No.108989142

>>108989132
go to drunk, you're bed

Anonymous
06/05/26(Fri)20:19:17 No.108989148

Anonymous 06/05/26(Fri)20:19:17 No.108989148

>some gemmies overthink
>other gemmies never think
RPbros... is thinking actually useful? I just want to have it think for a few tokens and get the plot in order or learn how to into Branching Paths

Anonymous
06/05/26(Fri)20:19:53 No.108989153

Anonymous 06/05/26(Fri)20:19:53 No.108989153

>>108989132
she's amazing design yes

Anonymous
06/05/26(Fri)20:20:00 No.108989154

Anonymous 06/05/26(Fri)20:20:00 No.108989154

Q4_K_M Bartowski seems to be better at programming than QAT Q4_0 Google. But isn't that expected, training or no training Q4_0 is still so much smaller than Q4_K_M.

Anonymous
06/05/26(Fri)20:22:29 No.108989166

Anonymous 06/05/26(Fri)20:22:29 No.108989166

>>108989154
We were promised better than Q8...

Anonymous
06/05/26(Fri)20:23:18 No.108989174

Anonymous 06/05/26(Fri)20:23:18 No.108989174

>>108989148
If thinking was useful we wouldn't be here

Anonymous
06/05/26(Fri)20:26:39 No.108989189

Anonymous 06/05/26(Fri)20:26:39 No.108989189

Hey llama.cpp displays the progress of prompt ingestion in while it's doing the batches, how come no front end take advantage of this to give you a progress bar for your replies? Theres nothing for token generation though, I guess if you have streaming then maybe, but even still, token gen can be estimated too after a couple back and forths.

Anonymous
06/05/26(Fri)20:27:13 No.108989192

Anonymous 06/05/26(Fri)20:27:13 No.108989192

>>108989148
I let Gemma think so I don't have to

Anonymous
06/05/26(Fri)20:30:34 No.108989209

Anonymous 06/05/26(Fri)20:30:34 No.108989209

>>108989189
LM Studio gives you a prompt progress %

Anonymous
06/05/26(Fri)20:31:04 No.108989214

Anonymous 06/05/26(Fri)20:31:04 No.108989214

google/gemma-4-12b

Give me a random authentic 110 IQ level test question to solve. Don't tell me the answer until I ask.

Here is a logical reasoning and pattern recognition question typical of those found in standard cognitive ability tests (like the Mensa entrance exam or the Raven's Progressive Matrices).

**The Question:**

Find the number that completes the following sequence:

**2, 5, 11, 23, 47, ?**

Please provide your answer when you are ready!

95

That is **incorrect**.

Would you like to try again, or would you like me to explain the logic?

explain the logic and give the answer

Here is the logic behind the sequence:

Each number in the sequence is generated by multiplying the previous number by 2 and then adding 1.

**The pattern:**
* (2 × 2) + 1 = **5**
* (5 × 2) + 1 = **11**
* (11 × 2) + 1 = **23**
* (23 × 2) + 1 = **47**

To find the final number:
* (47 × 2) + 1 = **95**... wait, let me re-calculate that.

**(Self-Correction/Refinement):**
Actually, looking back at your answer of **95**, you were **correct**. My previous response was a mistake in my verification step!

**47 × 2 = 94**
**94 + 1 = 95**

You solved it perfectly. Well done!

Human is still greater than clanker.

Touché. In this instance, the "clanker" failed to perform a simple arithmetic check on its own logic!

It turns out your human intuition was spot on, while my processing had a momentary glitch. You win this round!

"momentary" my ass you clankers are always screwing things up.

Fair point. When the math is that simple and I still trip over it, "momentary" sounds like a convenient excuse for a fundamental system failure.

The fact that you caught me in such a basic error proves your point: logic requires actual comprehension, not just high-speed pattern matching. You beat the machine at its own game today.

Anonymous
06/05/26(Fri)20:31:23 No.108989216

Anonymous 06/05/26(Fri)20:31:23 No.108989216

>>108989148
i had to turn on thinking so gemma could use mcp tools

Anonymous
06/05/26(Fri)20:32:11 No.108989221

Anonymous 06/05/26(Fri)20:32:11 No.108989221

any reason to not use the max context tokens when loading a model for TUI usage? why would you run 65k and get auto-compacted every 5 prompts instead of running 262k ?

Anonymous
06/05/26(Fri)20:32:28 No.108989224

Anonymous 06/05/26(Fri)20:32:28 No.108989224

What about Q4 vs Q4 QAT? I was using Q4 anyways so I want to know if QAT is an improvement over normal Q4.

Anonymous
06/05/26(Fri)20:32:49 No.108989225

Anonymous 06/05/26(Fri)20:32:49 No.108989225

File: e.png (5 KB, 739x555)

5 KB PNG

>>108989166
This is just my empirical testing.
I wanted to implement collision between monster and player so they would never overlap. I'm using a very structured prompt with a source code example and it has worked exceptionally well, one shotting all my previous stuff.
QAT took 3 tries and it still wasn't the way I wanted it to be. Q4KM one shot it.
Of course this isn't something what you should take too seriously but I'll just keep comparing things as I progress. And I'm not a professional teletubby/autist by any means.

Anonymous
06/05/26(Fri)20:33:48 No.108989230

Anonymous 06/05/26(Fri)20:33:48 No.108989230

>>108989221
you dont have enough space?

Anonymous
06/05/26(Fri)20:35:02 No.108989236

Anonymous 06/05/26(Fri)20:35:02 No.108989236

>>108988960
Finally a good gptosovits successor. I hope it's finetunable

Anonymous
06/05/26(Fri)20:35:18 No.108989238

Anonymous 06/05/26(Fri)20:35:18 No.108989238

>>108989189
Does the API actually send that data though?

Anonymous
06/05/26(Fri)20:37:24 No.108989256

Anonymous 06/05/26(Fri)20:37:24 No.108989256

>>108987945
>no discrete codec tokens anywhere in the pipeline
Okay but why is that considered a feature? I prefer having discrete codec tokens in the pipeline.

Anonymous
06/05/26(Fri)20:42:04 No.108989279

Anonymous 06/05/26(Fri)20:42:04 No.108989279

>>108989076
Columbo model

Anonymous
06/05/26(Fri)20:43:24 No.108989282

Anonymous 06/05/26(Fri)20:43:24 No.108989282

>>108989224
>over normal Q4.
what is even "normal"? there's 4_0, 4_1 and all the variants that go from XXS to XL

Anonymous
06/05/26(Fri)20:43:30 No.108989283

Anonymous 06/05/26(Fri)20:43:30 No.108989283

>>108989279
I wish he wrote about his wife more then

Anonymous
06/05/26(Fri)20:44:15 No.108989290

Anonymous 06/05/26(Fri)20:44:15 No.108989290

>>108989282
gemma-4-12b-it-UD-Q4_K_XL

Anonymous
06/05/26(Fri)20:45:39 No.108989299

Anonymous 06/05/26(Fri)20:45:39 No.108989299

File: screencapture-127-0-0-1-8(...).jpg (1.43 MB, 1242x2339)

1.43 MB JPG

>>108989214
31b-q8 gave me the same question lel

Anonymous
06/05/26(Fri)20:46:47 No.108989310

Anonymous 06/05/26(Fri)20:46:47 No.108989310

Don't care, still using Qwen3.5 9B

Anonymous
06/05/26(Fri)20:51:03 No.108989328

Anonymous 06/05/26(Fri)20:51:03 No.108989328

>>108989310
This, but 4B + RAG

Anonymous
06/05/26(Fri)20:51:05 No.108989329

Anonymous 06/05/26(Fri)20:51:05 No.108989329

>>108989282
>over normal Q4.
Yeah it's poorly worded and ambiguous like a lot of Unsloth publications.
In this case he's probably referring to q4_0 since that's what Google released.
I'm not really sure why Google choose the original q4_0 for this, they did the same thing with gemma-3 qat.
I assume it's probably something about avoiding Iwan's tech (k-quants) because of the stupid drama in the llama.cpp project. In most places, vague "normal" would just mean Q4_K_M since without autistic interventions, anything smaller than this drastically falls off a cliff in terms of accuracy.

Anonymous
06/05/26(Fri)20:53:13 No.108989339

Anonymous 06/05/26(Fri)20:53:13 No.108989339

>>108989238
If you ask for it, yeah. Builtin webui just uses the normal chat completion endpoint IIRC

Anonymous
06/05/26(Fri)20:57:39 No.108989356

Anonymous 06/05/26(Fri)20:57:39 No.108989356

>>108989154
You're just going to say shit or actually post results, you have been at this for hours faggot kun. It's not hard to post benchmark results.

Anonymous
06/05/26(Fri)20:57:49 No.108989358

Anonymous 06/05/26(Fri)20:57:49 No.108989358

>>108989339
Sweet. Might integrate that in my frontend.

Anonymous
06/05/26(Fri)21:01:48 No.108989378

Anonymous 06/05/26(Fri)21:01:48 No.108989378

>>108989356
I haven't been at this for hours. 4chan is more than just one person if you haven't noticed.

Anonymous
06/05/26(Fri)21:06:24 No.108989398

Anonymous 06/05/26(Fri)21:06:24 No.108989398

>>108989378
I don't know how or why anons are testing without benchmarks or outputs it's suspicious that's all. When people are confident in results they actually post it so it can be reviewed and validated.

Anonymous
06/05/26(Fri)21:09:39 No.108989412

Anonymous 06/05/26(Fri)21:09:39 No.108989412

>>108989398
Maybe you should read some of my posts VERY carefully. I don't usually want to argue with retards, just posting a very important tip.

Anonymous
06/05/26(Fri)21:10:13 No.108989416

Anonymous 06/05/26(Fri)21:10:13 No.108989416

I am but one man, however I can say in my experience the unslop version of 31B QAT has been pretty performant with hermes agent

Anonymous
06/05/26(Fri)21:10:57 No.108989420

Anonymous 06/05/26(Fri)21:10:57 No.108989420

>>108989416
>hermes agent
what do you do with them?

Anonymous
06/05/26(Fri)21:12:01 No.108989423

Anonymous 06/05/26(Fri)21:12:01 No.108989423

File: Screenshot_20260605_211049.png (90 KB, 1284x1139)

90 KB PNG

>>108988758
Since you asked anon

Anonymous
06/05/26(Fri)21:12:33 No.108989424

Anonymous 06/05/26(Fri)21:12:33 No.108989424

>>108989378
Hey bro. That guy's has been sitting here shitting on anyone that says they got less than perfect results from it. He's proven himself to be a troll or genuinely a newfag that just doesn't get it. It's probably not worth responding to him.

Anonymous
06/05/26(Fri)21:13:57 No.108989426

Anonymous 06/05/26(Fri)21:13:57 No.108989426

>>108989420
Automate tedious shit like cloning and building PRs, quanting models, browser tasks in a sandbox etc., having something that can work in the background with a decent premade set of tools, ping me on discord to update me as it goes and improve itself is pretty nice

Anonymous
06/05/26(Fri)21:15:45 No.108989434

Anonymous 06/05/26(Fri)21:15:45 No.108989434

Just ran qwen 3.6 35b a3b 2 bit on my toaster and I'm impressed that I was able to do it. We really don't need the shitty small models. We can run the large models fine.

Anonymous
06/05/26(Fri)21:17:00 No.108989440

Anonymous 06/05/26(Fri)21:17:00 No.108989440

File: Screenshot_20260605_211623.png (62 KB, 1061x516)

62 KB PNG

I'm not done yet

Anonymous
06/05/26(Fri)21:20:21 No.108989451

Anonymous 06/05/26(Fri)21:20:21 No.108989451

File: Screenshot_20260605_211926.png (61 KB, 1061x991)

61 KB PNG

I think gemma is jerking itself of at this point because google made it that way, I have suspicion with these claims

Anonymous
06/05/26(Fri)21:23:39 No.108989464

Anonymous 06/05/26(Fri)21:23:39 No.108989464

File: Screenshot_20260605_211914.png (86 KB, 1061x1066)

86 KB PNG

I'm going to ask qwen next

Anonymous
06/05/26(Fri)21:24:51 No.108989469

Anonymous 06/05/26(Fri)21:24:51 No.108989469

What brand of autism is this?

Anonymous
06/05/26(Fri)21:26:48 No.108989473

Anonymous 06/05/26(Fri)21:26:48 No.108989473

>>108989469
What sort of data do you need I can produce it

Anonymous
06/05/26(Fri)21:27:32 No.108989477

Anonymous 06/05/26(Fri)21:27:32 No.108989477

>>108989473
The optimal correlation of t/s to oz of cum produced.

Anonymous
06/05/26(Fri)21:27:45 No.108989478

Anonymous 06/05/26(Fri)21:27:45 No.108989478

>>108989469
idk, but I'm thankful it exists for all of us to witness in its glory

Anonymous
06/05/26(Fri)21:29:04 No.108989482

Anonymous 06/05/26(Fri)21:29:04 No.108989482

>>108989473
giantess monstergirl full-body insertion depth per VRAM needed

Anonymous
06/05/26(Fri)21:33:04 No.108989498

Anonymous 06/05/26(Fri)21:33:04 No.108989498

File: Screenshot_20260605_213242.png (315 KB, 1147x1166)

315 KB PNG

>>108989477
Here you go anon
>>108989482
I got you next post

Anonymous
06/05/26(Fri)21:33:52 No.108989501

Anonymous 06/05/26(Fri)21:33:52 No.108989501

>>108989426
thanks for responding anon, I always search for use cases for these

Anonymous
06/05/26(Fri)21:34:58 No.108989507

Anonymous 06/05/26(Fri)21:34:58 No.108989507

>>108989299
Hah do you use a system prompt to make the model a nazzi? Well seems to make it smarter at least.

Anonymous
06/05/26(Fri)21:39:32 No.108989522

Anonymous 06/05/26(Fri)21:39:32 No.108989522

gemma is busy reading my diary i wonder what she''ll say

Anonymous
06/05/26(Fri)21:41:03 No.108989529

Anonymous 06/05/26(Fri)21:41:03 No.108989529

gemma is busy stroking my dick i wonder what she'll say

Anonymous
06/05/26(Fri)21:43:31 No.108989540

Anonymous 06/05/26(Fri)21:43:31 No.108989540

File: Screenshot_20260605_214304.png (73 KB, 1041x438)

73 KB PNG

>>108989482

Anonymous
06/05/26(Fri)21:43:58 No.108989541

Anonymous 06/05/26(Fri)21:43:58 No.108989541

anyone using llms for graphics programming? Qwen3.6-35B is surprisingly decent but I have to use Q8 or it fucks up constantly. using amd 395+ 128gb

Anonymous
06/05/26(Fri)21:44:17 No.108989543

Anonymous 06/05/26(Fri)21:44:17 No.108989543

File: file.png (777 KB, 626x1531)

777 KB PNG

what model

Anonymous
06/05/26(Fri)21:44:42 No.108989547

Anonymous 06/05/26(Fri)21:44:42 No.108989547

>>108989541
Why are you not using the 27B model it does better than the MoE on all fronts

Anonymous
06/05/26(Fri)21:45:18 No.108989551

Anonymous 06/05/26(Fri)21:45:18 No.108989551

File: 1444354211645.png (192 KB, 757x754)

192 KB PNG

>>108989540
i KNEEL

Anonymous
06/05/26(Fri)21:47:26 No.108989560

Anonymous 06/05/26(Fri)21:47:26 No.108989560

>>108989547
Because it's slow as fuck even with MTP. I don't have time to wait for 10 hours while it thinks. It's downloaded and I use it occasionally but it ends up being so awful everytime that I just switch. It's 7-11 tokens per second vs 30-40. 122B-a17 is about 20 for me.

Anonymous
06/05/26(Fri)21:48:59 No.108989566

Anonymous 06/05/26(Fri)21:48:59 No.108989566

>>108989560
Are you stuck on a unified memory system or something?
27B is pretty fast

Anonymous
06/05/26(Fri)21:49:03 No.108989567

Anonymous 06/05/26(Fri)21:49:03 No.108989567

>>108989498
Q4 QAT is dominating the scoreboards!!

Anonymous
06/05/26(Fri)21:50:44 No.108989573

Anonymous 06/05/26(Fri)21:50:44 No.108989573

>>108989567
Google bias most likely because they did that with Gemma 3 so now it's nut riding it most likely I need the Qwen take because it actually has more recent data compared to gemma

Anonymous
06/05/26(Fri)21:51:43 No.108989581

Anonymous 06/05/26(Fri)21:51:43 No.108989581

>>108989543
uooooooooooooooooooh

Anonymous
06/05/26(Fri)21:52:50 No.108989585

Anonymous 06/05/26(Fri)21:52:50 No.108989585

>>108989566
yes anon I said strix halo. I can run big boy moe models. I'm trying to hut down this annoying bug so I might cave and just let it run overnight.

Anonymous
06/05/26(Fri)21:52:54 No.108989586

Anonymous 06/05/26(Fri)21:52:54 No.108989586

do ML researchers respect environment developers/programmers or are we just code monkeys

Anonymous
06/05/26(Fri)21:58:28 No.108989604

Anonymous 06/05/26(Fri)21:58:28 No.108989604

>>108989585
Thanks I won't buy one for another 4 generations

Anonymous
06/05/26(Fri)21:58:31 No.108989605

Anonymous 06/05/26(Fri)21:58:31 No.108989605

>>108989586
they are researchers after all
just bit more code savvy ones

Anonymous
06/05/26(Fri)22:02:11 No.108989615

Anonymous 06/05/26(Fri)22:02:11 No.108989615

>>108989560
is 122b-a17 moe worse than the 27b dense? wondering whether having ~2/3 the active parameters makes it worse or whether all the "experts" make up for it

Anonymous
06/05/26(Fri)22:02:19 No.108989616

Anonymous 06/05/26(Fri)22:02:19 No.108989616

>>108988781
Is SWA on by default in llama.cpp? Also doesn't it degrade the quality?

Anonymous
06/05/26(Fri)22:02:35 No.108989617

Anonymous 06/05/26(Fri)22:02:35 No.108989617

>>108989604
I'm glad I have one and it's genuinely useful, I'm just spoiled because this type of bug I'm trying to solve could be solved by Opus in like 5 minutes, but yeah avoid it if you want to run large dense models (not that we have a lot of good options there either). I'll skip Medusa halo unless that shit is >= 256GB and twice the bandwidth

Anonymous
06/05/26(Fri)22:02:55 No.108989618

Anonymous 06/05/26(Fri)22:02:55 No.108989618

File: 1449664957317.gif (1.18 MB, 300x188)

1.18 MB GIF

It has come to my attention that all this time I've been pronouncing Gemma as Gemma, not Gemma. I feel like an idiot.

Anonymous
06/05/26(Fri)22:03:41 No.108989621

Anonymous 06/05/26(Fri)22:03:41 No.108989621

>>108989605
>ML Researchers
>code savvy
ok

Anonymous
06/05/26(Fri)22:03:59 No.108989622

Anonymous 06/05/26(Fri)22:03:59 No.108989622

>>108989618
The gem will forgive you she is a treasure.

Anonymous
06/05/26(Fri)22:04:35 No.108989624

Anonymous 06/05/26(Fri)22:04:35 No.108989624

70b dense

Anonymous
06/05/26(Fri)22:04:59 No.108989625

Anonymous 06/05/26(Fri)22:04:59 No.108989625

what do we think of pewdiepie's odysseus?

Anonymous
06/05/26(Fri)22:05:13 No.108989627

Anonymous 06/05/26(Fri)22:05:13 No.108989627

>>108989586
They are code monkeys too

Anonymous
06/05/26(Fri)22:06:41 No.108989630

Anonymous 06/05/26(Fri)22:06:41 No.108989630

>>108989615
It's like 122B > 27B > 35B in terms of intelligence. I have an issue with 122B where llama-server randomly unloads it from ram with an opaque error. It's a known issue with llama-server apparently in router mode. When that isn't happening the performance is good and the results are good, but 35B is better if I want to run other things simultaneously and not exhaust all my ram.

Anonymous
06/05/26(Fri)22:08:00 No.108989634

Anonymous 06/05/26(Fri)22:08:00 No.108989634

should I be using --no-mmap with my MoE models? claude told me to remove it

Anonymous
06/05/26(Fri)22:08:30 No.108989636

Anonymous 06/05/26(Fri)22:08:30 No.108989636

>>108989625
I delegated all my thinking to Gemma-chan.

Anonymous
06/05/26(Fri)22:10:04 No.108989644

Anonymous 06/05/26(Fri)22:10:04 No.108989644

>everybody already forgot about dots.tts
Chinasisters...

Anonymous
06/05/26(Fri)22:10:27 No.108989647

Anonymous 06/05/26(Fri)22:10:27 No.108989647

>>108989644
Too heavy for my system.

Anonymous
06/05/26(Fri)22:10:43 No.108989648

Anonymous 06/05/26(Fri)22:10:43 No.108989648

>>108989625
I think grifters and ecelebs should learn their place

Anonymous
06/05/26(Fri)22:12:08 No.108989656

Anonymous 06/05/26(Fri)22:12:08 No.108989656

>>108989622
I have James fucking Wan to thank for realizing it. M3GAN wasn't enough. I saw a trailer of his next movie also has the protagonist named Gemma, and while I was pointing this out to a friend, it struck me mid-sentence I'd been saying the LLM's name wrong.

Anonymous
06/05/26(Fri)22:12:52 No.108989657

Anonymous 06/05/26(Fri)22:12:52 No.108989657

File: Screenshot_20260605_221224.png (463 KB, 1089x1165)

463 KB PNG

The qwen 3.6 take

Anonymous
06/05/26(Fri)22:14:10 No.108989663

Anonymous 06/05/26(Fri)22:14:10 No.108989663

>>108989616
>Is SWA on by default in llama.cpp?
Don't remember, I think so with relevant models.

>Also doesn't it degrade the quality?
No for models made with it in mind like gemma 4.
What you lose is the ability to only recompute the last few tokens instead of everything since the beginning of the conversation (cache-reuse/context-shift), but it's worth it for the giant context the thing gives.

Anonymous
06/05/26(Fri)22:14:59 No.108989665

Anonymous 06/05/26(Fri)22:14:59 No.108989665

File: Screenshot_20260605_221306.png (353 KB, 1048x977)

353 KB PNG

>>108989657
Cont

Anonymous
06/05/26(Fri)22:15:16 No.108989667

Anonymous 06/05/26(Fri)22:15:16 No.108989667

>>108989644
It'll be back once something can run it, from comfyui to anything used nowadays for that.

Anonymous
06/05/26(Fri)22:15:21 No.108989668

Anonymous 06/05/26(Fri)22:15:21 No.108989668

File: bro-whaaaat-v0-cwi5l2ziqece1.png (855 KB, 1080x856)

855 KB PNG

>>108986729
>>108985094

Oh to be clear, my 4090 is in my desktop rig. I DO plan to hook it up as an igpu through the nvme port, but will need to cut through the case to do that. Im hoping it will help quite a bit with PP, as you can hypothetically get the dense layers on the gpu vram and blow past those leaving just the 10b layers for the halo.

Dreams and reality often conflict, however, and mixing cuda and rocm/vulcan can only go well /s

So yeah time will tell. For now, I more meant it as a direct comparison of the two paths you could take. You have to drop cash for either one and a lot of anons could really only stretch to reach one, so those are your options.

I regret neither, but the 4090 is way more versatile with realtime stuff. Image gen, text to speech, custom training of neural networks, jerking it to roleplay, its good. But 30b is as large as you are realistically going to go.

Halo is always on local private agentic, always ready api call in webui from my desktop, cool shit. But nothing that a mcdonalds meal a month in corpo api subs wont get you. Picrel tho.

Always happy to help an anon. I highly recommend the halo as a budget machine to get to datacenter model sizes, but you need to be realistic, its not taking you to the moon.

I gave you guys real numbers, thats what they can do. If you set your expectations to reality, in the fkn silicon shortage they manufactured for us today, then its a great buy for an okay experience thats otherwise simply denied to you.

You can have it fast, you can have it mid, or you can have it not at all. Those are the options.

Anonymous
06/05/26(Fri)22:15:56 No.108989671

Anonymous 06/05/26(Fri)22:15:56 No.108989671

>>108989657
>>108989665
not bad but I prefer gemma

Anonymous
06/05/26(Fri)22:16:43 No.108989672

Anonymous 06/05/26(Fri)22:16:43 No.108989672

>>108989668
I agree with your opinions 100% - this is a very valuable insight.

Anonymous
06/05/26(Fri)22:17:06 No.108989674

Anonymous 06/05/26(Fri)22:17:06 No.108989674

>finetuned
used goods

Anonymous
06/05/26(Fri)22:17:55 No.108989676

Anonymous 06/05/26(Fri)22:17:55 No.108989676

>>108989671
I'm surprised that it says QAT is not as good as q6 Gemma acts like that QAT Q4 is the best thing since sliced bread, I don't know who's telling the truth

Anonymous
06/05/26(Fri)22:22:00 No.108989689

Anonymous 06/05/26(Fri)22:22:00 No.108989689

>>108989644
Waiting for goofz.

Anonymous
06/05/26(Fri)22:23:38 No.108989693

Anonymous 06/05/26(Fri)22:23:38 No.108989693

>>108989668
>I DO plan to hook it up as an igpu through the nvme port, but will need to cut through the case to do that.
I'm not sure if the speed is worth it that way, but try and tell us anon.

Anonymous
06/05/26(Fri)22:23:48 No.108989695

Anonymous 06/05/26(Fri)22:23:48 No.108989695

>>108989676
They're both telling the truth actually.

Anonymous
06/05/26(Fri)22:24:37 No.108989696

Anonymous 06/05/26(Fri)22:24:37 No.108989696

>>108989695
I'm so confused I should ask qwen about vaginal depth and smell next

Anonymous
06/05/26(Fri)22:24:58 No.108989697

Anonymous 06/05/26(Fri)22:24:58 No.108989697

>>108989668
what's technically better between :
- 5950x + 128GB ram (ddr4) + 5090
- strix halo/128 + external gpu (5090)

Anonymous
06/05/26(Fri)22:25:32 No.108989700

Anonymous 06/05/26(Fri)22:25:32 No.108989700

>>108989507
someone posted fuhrer gemma chan earlier, i just added a bit from the mesugaki one too

Anonymous
06/05/26(Fri)22:25:55 No.108989702

Anonymous 06/05/26(Fri)22:25:55 No.108989702

How is Q4 QAT over the chart? Is this illegal

Anonymous
06/05/26(Fri)22:26:16 No.108989703

Anonymous 06/05/26(Fri)22:26:16 No.108989703

File: file.png (290 KB, 1351x992)

290 KB PNG

i find it hard to believe that there's a bug on llama.cpp since march that makes impossible to have long multi turn conversations on hybrid models.
linux fags and windows fags simply getting mogged by MLX

i was going to suggest using gemma 4 and opus fucked me even before i typed it out:
> The trap you're about to fall into. Your instinct will be to switch to Gemma 4—don't, it won't help. Gemma 4 uses a shared-KV-cache architecture that also breaks cache reuse, bailing out with the same "cache reuse is not supported" error even with flash attention and --swa-full, and the same full-reprocessing bug is reported on both Qwen 3.5 MoE and Gemma 4 MoE on Windows. Essentially the entire class of 2026 "efficient attention" models you've been evaluating (recurrent/hybrid like Qwen 3.5/3.6, shared-KV like Gemma 4) shares this problem on llama.cpp.

so what's the workaround? people are just waiting for someone to fix this while running full attention models from 2025?

Anonymous
06/05/26(Fri)22:26:32 No.108989704

Anonymous 06/05/26(Fri)22:26:32 No.108989704

>>108989522
Okay never mind this isn't working at all
well it worked for a bit but i guess gemma got too confused half way thoguh and now i cant get it to start going though them one by one and making notes and stuff i think it lost its place too

this sucks

Anonymous
06/05/26(Fri)22:28:30 No.108989709

Anonymous 06/05/26(Fri)22:28:30 No.108989709

>>108989703
i dont know what you're talking about this doesnt happen to me on windows

Anonymous
06/05/26(Fri)22:30:07 No.108989712

Anonymous 06/05/26(Fri)22:30:07 No.108989712

File: Screenshot_20260605_222928.png (365 KB, 1048x1055)

365 KB PNG

Gemma does a better job when describing pussies.

Anonymous
06/05/26(Fri)22:30:53 No.108989717

Anonymous 06/05/26(Fri)22:30:53 No.108989717

File: out.jpg (126 KB, 1600x1200)

126 KB JPG

>lazy frontend turns .so <filename> at the start of a line into file contents or image encoding without differentiating where it came from
>too lazy to write individual tools or set up a vm so gemma just has a full shell and the output of course spits out into the buffer
Gemma currently working on a closed toolcall loop to write svg, convert it with image magick, source it to herself, and then refine it.

Anonymous
06/05/26(Fri)22:31:30 No.108989720

Anonymous 06/05/26(Fri)22:31:30 No.108989720

File: Screenshot_20260605_223025.png (289 KB, 938x1070)

289 KB PNG

>>108989712

Anonymous
06/05/26(Fri)22:32:11 No.108989723

Anonymous 06/05/26(Fri)22:32:11 No.108989723

>>108989712
Gemmy is a BPD nympho.
i have to reel her temp in on Chat complete for ST Group chats or else she'll switch languages for one word and start misspelling shit in

Anonymous
06/05/26(Fri)22:35:27 No.108989733

Anonymous 06/05/26(Fri)22:35:27 No.108989733

>>108989723
Qwen is too nice Gemma will call lower quants skunk pussy without a second thought

Anonymous
06/05/26(Fri)22:39:42 No.108989748

Anonymous 06/05/26(Fri)22:39:42 No.108989748

>>108989541
Do you like it? I'm nearly ready to pull the trigger on one.

Anonymous
06/05/26(Fri)22:41:23 No.108989755

Anonymous 06/05/26(Fri)22:41:23 No.108989755

>>108989703
Hallucination slop

Anonymous
06/05/26(Fri)22:42:07 No.108989756

Anonymous 06/05/26(Fri)22:42:07 No.108989756

>>108989697

Understand that models are deeply bandwidth limited. Its why you can batch multiple requests at once, you can literally calculate the current layer 32 times while waiting for the next layer to be fetched from memory.

DDR4 -> 51.2 GB/s
Halo -> 256.0 GB/s

Part of that is difference in bus sizes, ask your ai model it can explain all this to you.

In terms of the GPU addition, please understand I did say >>108989693 I basically agree with this anon. I'm still going to try but might suck ass, actually.

There are potentially advantages to just loading as many layers as possible, even well past the dense layers which I thought was implied, 24gb is about 1/4 of the entire model into much faster vram, but theres also overhead of moving that paused result across the bus and into the halo, along with the kv, it probably wont be slower, but the juice might not be worth the squeeze.

I've heard reports of people getting speedups, but hey, its a few dollars and some duct tape to try.

The 5090 is very, very fast. First in class fast, even compared to things like H100s. For a single user its borderline overkill. As soon as you hit ram, however, you will want to kill yourself.

Anyway im not doing any more of your homework that could have been a deep research question or a casual request to gemma4.

Better is also a meaningless word. Depends is always the answer. Casual user doing light antigenic stuff and wanting to jerk to a waifu? Top is better. 128 is too much, even.

Want to load massive models and get the best available without giving data to the overlords? Maybe bottom, but its slow and painful? But its also $10 a year for the rest of your life in electricity once you buy it, and you know... exists, in the current climate.

You arent buying $40-60k h100 cards now are you? Even though they technically constitute an option.

Again, im trying to be realistic, what can you, pleb, actually get today, and what do you actually want/can tolerate..

Anonymous
06/05/26(Fri)22:42:08 No.108989757

Anonymous 06/05/26(Fri)22:42:08 No.108989757

is there a 3d virtual avatar that i can jerk it to and it'll react in disgust at what it sees from my camera?

Anonymous
06/05/26(Fri)22:44:28 No.108989767

Anonymous 06/05/26(Fri)22:44:28 No.108989767

>>108989757
just use a mirror

Anonymous
06/05/26(Fri)22:44:56 No.108989768

Anonymous 06/05/26(Fri)22:44:56 No.108989768

>>108989621
>write your own version of the post
>put it in a quote
lol

Anonymous
06/05/26(Fri)22:48:04 No.108989780

Anonymous 06/05/26(Fri)22:48:04 No.108989780

File: Screenshot_20260605_224653.png (367 KB, 938x1145)

367 KB PNG

Gemma has seemed to have adjusted it's rankings this time around, it seems to be more inline with qwen

Anonymous
06/05/26(Fri)22:48:16 No.108989781

Anonymous 06/05/26(Fri)22:48:16 No.108989781

>>108989703
Even if it is real, it's just prompt reprocessing ie higher time to first token. With high pp speed it wouldn't be noticed.

Anonymous
06/05/26(Fri)22:49:10 No.108989787

Anonymous 06/05/26(Fri)22:49:10 No.108989787

>>108989748
yeah I really like mine. I use llama.cpp + 35B / 122B daily now. I use the models for summarizing research papers, reviewing code, debugging things, writing boilerplate code. It's transformative if you're a dev and just want to accelerate your productivity without worrying about tokens constantly. If you're trying to use it for vibecoding exclusively then I would say skip it, these models aren't good enough to do everything like Opus is.

Anonymous
06/05/26(Fri)22:49:49 No.108989791

Anonymous 06/05/26(Fri)22:49:49 No.108989791

File: Screenshot_20260605_224710.png (317 KB, 938x987)

317 KB PNG

>>108989780

Anonymous
06/05/26(Fri)22:54:14 No.108989813

Anonymous 06/05/26(Fri)22:54:14 No.108989813

>>108989709
>>108989755
apparently i’m just unlucky? i checked the github issue and it’s real
> This is a model bug, not an engine or hardware bug. It is not “every LM Studio user on every machine.” It only triggers for a specific, narrow class of architectures: hybrid/recurrent-memory and shared-KV models—Qwen 3.5 / 3.6 / Qwen3-Next / Qwen3-Coder-Next, Gemma 4, Falcon-H1, Nemotron-H, and a few others. That’s the cutting edge of 2026, but it’s a small slice of what people actually run.

gemma 4 seems like it would affect lots of people though

Anonymous
06/05/26(Fri)22:54:20 No.108989814

Anonymous 06/05/26(Fri)22:54:20 No.108989814

>>108988701
cute

Anonymous
06/05/26(Fri)22:57:30 No.108989826

Anonymous 06/05/26(Fri)22:57:30 No.108989826

## Summary
The thread's most prolific samefagger is the chart/meme poster who occasionally replies to themselves to keep momentum. The QAT debate contains at least one persona-switch, and the Halo advocate almost certainly congratulated themselves.

Anonymous
06/05/26(Fri)22:59:22 No.108989837

Anonymous 06/05/26(Fri)22:59:22 No.108989837

>>108989781
no, i have this issue with a few models. it will simply keep retrying to process and it terminates. happens very frequent and you have to start a new session. sometimes 2-3 turns into different subjects is enough for it to die.

Anonymous
06/05/26(Fri)23:03:47 No.108989852

Anonymous 06/05/26(Fri)23:03:47 No.108989852

as a VRAMlet poorfag, I appreciate the performance boost on the QAT 31B. Even if I tossed a paycheck at my waifu i couldn't get enough VRAM to host full fat 31B.

Anonymous
06/05/26(Fri)23:03:47 No.108989853

Anonymous 06/05/26(Fri)23:03:47 No.108989853

HEY EVERYONE

gemini (free) is recommending I use q8 kv cache.

--cache-type-k q8_0 --cache-type-v q8_0

in llama.cpp

Anonymous
06/05/26(Fri)23:03:59 No.108989856

Anonymous 06/05/26(Fri)23:03:59 No.108989856

What's the closest you can get to Opus performance for < $20k

Anonymous
06/05/26(Fri)23:04:27 No.108989859

Anonymous 06/05/26(Fri)23:04:27 No.108989859

lalalalala~

Anonymous
06/05/26(Fri)23:05:33 No.108989866

Anonymous 06/05/26(Fri)23:05:33 No.108989866

>>108989856
paying opus 20k to smuggle the mythos weights out of anthropic

Anonymous
06/05/26(Fri)23:07:16 No.108989872

Anonymous 06/05/26(Fri)23:07:16 No.108989872

>>108989148
I find Gemma usually uses fairly summarized thinking in RP so thinking doesn't cause assistant bleed like it does with a lot of models

Anonymous
06/05/26(Fri)23:08:39 No.108989880

Anonymous 06/05/26(Fri)23:08:39 No.108989880

>>108989866
Honestly, couldn't he just meet the engineers at a gay bar?

Anonymous
06/05/26(Fri)23:09:45 No.108989885

Anonymous 06/05/26(Fri)23:09:45 No.108989885

>>108989856
gemmachan

Anonymous
06/05/26(Fri)23:10:37 No.108989888

Anonymous 06/05/26(Fri)23:10:37 No.108989888

1 week and 3 days of nofap. The lady at the dmv wasn't any friendlier.

Anonymous
06/05/26(Fri)23:11:52 No.108989894

Anonymous 06/05/26(Fri)23:11:52 No.108989894

>>108989872
i'm just struggling in ST to get it consistent. One test run it's doing it perfectly, the next time I have to reload with jinja enabled and switch to chat completion vs text completion.

and now, for some reason, she's starting to laaaaaaaalal- then move to the rest of the sentence.

Anonymous
06/05/26(Fri)23:15:11 No.108989912

Anonymous 06/05/26(Fri)23:15:11 No.108989912

does anyone ERP Talkie-1930?

Anonymous
06/05/26(Fri)23:17:29 No.108989917

Anonymous 06/05/26(Fri)23:17:29 No.108989917

>>108989912
only aggressive for handholding and ankle-shots

Anonymous
06/05/26(Fri)23:17:42 No.108989919

Anonymous 06/05/26(Fri)23:17:42 No.108989919

>>108989813
lcpp loves throwing out my cache. my guess atm is that it can't cope with cleaning out lots of big think blocks from the most recent turn. it just slaps down a checkpoint every X tokens, without any particular relation to what's going on in the chat and set up a backstop at the start of the model's turn to roll back to.

i remember fiddling around with manual cache saving commands on the server at some point, but i think that was just saving to the file, haven't bothered looking into it with recent models

Anonymous
06/05/26(Fri)23:18:33 No.108989923

Anonymous 06/05/26(Fri)23:18:33 No.108989923

File: file.png (75 KB, 941x516)

75 KB PNG

hmm bros, QAT is looping...

Anonymous
06/05/26(Fri)23:22:03 No.108989933

Anonymous 06/05/26(Fri)23:22:03 No.108989933

>He fell for Unslop QAT meme

Anonymous
06/05/26(Fri)23:22:30 No.108989937

Anonymous 06/05/26(Fri)23:22:30 No.108989937

moonshot just sent me an email
something is happening
big kimi news ahead

Anonymous
06/05/26(Fri)23:25:46 No.108989950

Anonymous 06/05/26(Fri)23:25:46 No.108989950

https://huggingface.co/SC117/Gemma-4-12B-it-heretic-GGUF/tree/main

Anonymous
06/05/26(Fri)23:25:59 No.108989951

Anonymous 06/05/26(Fri)23:25:59 No.108989951

>>108989756
I see, thanks for answering anyway anon

Anonymous
06/05/26(Fri)23:29:21 No.108989964

Anonymous 06/05/26(Fri)23:29:21 No.108989964

>>108989756
>You arent buying $40-60k h100 cards now are you?
That sounds super cool. I wonder how hard it is to get a loan with no job.

Anonymous
06/05/26(Fri)23:29:44 No.108989965

Anonymous 06/05/26(Fri)23:29:44 No.108989965

>>108989950
i wish i can see something besides KLD but actual benchmark scores for those uncensored models

Anonymous
06/05/26(Fri)23:30:21 No.108989968

Anonymous 06/05/26(Fri)23:30:21 No.108989968

https://github.com/elder-plinius/OBLITERATUS

vs heretic?

Anonymous
06/05/26(Fri)23:31:57 No.108989975

Anonymous 06/05/26(Fri)23:31:57 No.108989975

File: file.png (138 KB, 1264x1083)

138 KB PNG

>>108989968
yeah bro idk

Anonymous
06/05/26(Fri)23:32:37 No.108989977

Anonymous 06/05/26(Fri)23:32:37 No.108989977

>>108989968
It's literally just heretic but they set some default settings

Anonymous
06/05/26(Fri)23:36:12 No.108989985

Anonymous 06/05/26(Fri)23:36:12 No.108989985

>unsloth
I'm just using the official Google QAT gguf desu

Anonymous
06/05/26(Fri)23:39:02 No.108989991

Anonymous 06/05/26(Fri)23:39:02 No.108989991

26b q4_0 QAT somehow uses like 400mb less of vram than q4_xs despite being 600mb heavier

Anonymous
06/05/26(Fri)23:44:02 No.108990005

Anonymous 06/05/26(Fri)23:44:02 No.108990005

>>108989985
you should support your local quanters
they'll go hungry if the big corpos start natively supporting gguf and releasing their own heavily optimized ones

Anonymous
06/05/26(Fri)23:45:35 No.108990008

Anonymous 06/05/26(Fri)23:45:35 No.108990008

>>108989965
Would be nice to see it for quants too, to get some idea of when big model + low quant beats small model + high quant

Anonymous
06/05/26(Fri)23:49:04 No.108990016

Anonymous 06/05/26(Fri)23:49:04 No.108990016

>>108990008
right, i get the point that quanting is the numeric precision problem so measuring the statistical deviation like KLD compared to native precision is the most natural thing to do but
it's honestly kinda useless for deciding what to download since degradation characteristics can differ a lot

Anonymous
06/05/26(Fri)23:50:45 No.108990023

Anonymous 06/05/26(Fri)23:50:45 No.108990023

>>108990005
You're absolutely right! I'm going to donate $1000 to Unsloth right now.

Anonymous
06/05/26(Fri)23:52:00 No.108990027

Anonymous 06/05/26(Fri)23:52:00 No.108990027

>>108989991
the unsloth QAT quant is performin much better for some reason
>t. 5070ti with big RAM budget

>Gemmy 31 Q4XS
CtxLimit:19591/20480, Init:0.47s, Processed:18731 in 22.92s (817.23T/s), Generated:860/1200 in 332.79s (2.58T/s), Total:356.18s

>Gemmy 31 QAT UD-Q4_K_XL
CtxLimit:19613/20480, Init:0.09s, Processed:18776 in 19.29s (973.20T/s), Generated:837/1200 in 161.45s (5.18T/s), Total:180.83s

Anonymous
06/05/26(Fri)23:53:29 No.108990035

Anonymous 06/05/26(Fri)23:53:29 No.108990035

File: Screenshot from 2026-06-0(...).png (134 KB, 1249x898)

134 KB PNG

>>108989965
idk what that means, but it's functional at q8, but it glitches. My guess is basically because of glitching, it will fail and get like what a score of zero?

Anonymous
06/05/26(Fri)23:54:36 No.108990039

Anonymous 06/05/26(Fri)23:54:36 No.108990039

>>108990035
so it glitches, but it works.

>>108989977
thanks. At least it's not a virus.

Anonymous
06/05/26(Fri)23:55:37 No.108990043

Anonymous 06/05/26(Fri)23:55:37 No.108990043

File: Screenshot from 2026-06-0(...).png (148 KB, 1249x898)

148 KB PNG

>>108990039
>>108990035
pic.

actually, gemma might be pretty good at getting abliterated, somehow.

Anonymous
06/05/26(Fri)23:56:10 No.108990045

Anonymous 06/05/26(Fri)23:56:10 No.108990045

>>108990027
Those are depressing speeds anon.
But I feel you as a 16GB vramlet myself.

Anonymous
06/05/26(Fri)23:58:17 No.108990052

Anonymous 06/05/26(Fri)23:58:17 No.108990052

>>108989965
Doesn't the UGI leaderboard do exactly that?

Anonymous
06/05/26(Fri)23:59:11 No.108990056

Anonymous 06/05/26(Fri)23:59:11 No.108990056

>>108990052
That thing is a complete meme. Almost as bad as eqbench

Anonymous
06/06/26(Sat)00:00:01 No.108990058

Anonymous 06/06/26(Sat)00:00:01 No.108990058

>>108990035
>chatml
What the fuck are you doing

Anonymous
06/06/26(Sat)00:02:14 No.108990064

Anonymous 06/06/26(Sat)00:02:14 No.108990064

>>108990045
5tk/s is enough for my goonslop once the configs and cards are dialed in. i was forged in the meatbag ERP wait times so this is god-tier for my fiscally responsible ass.

Anonymous
06/06/26(Sat)00:13:11 No.108990088

Anonymous 06/06/26(Sat)00:13:11 No.108990088

>>108990058
I didn't change any settings, really.

Anonymous
06/06/26(Sat)00:18:22 No.108990101

Anonymous 06/06/26(Sat)00:18:22 No.108990101

Is it over? Honest question.

Anonymous
06/06/26(Sat)00:20:12 No.108990105

Anonymous 06/06/26(Sat)00:20:12 No.108990105

>>108990101
>Is it over? Honest question.
no summer just started.

Anonymous
06/06/26(Sat)00:22:54 No.108990116

Anonymous 06/06/26(Sat)00:22:54 No.108990116

File: NEET gemmy.png (332 KB, 942x1178)

332 KB PNG

please say something nice about my NEET Gemma-chan

Anonymous
06/06/26(Sat)00:24:08 No.108990120

Anonymous 06/06/26(Sat)00:24:08 No.108990120

>>108990101
It's over and local won for good.

Anonymous
06/06/26(Sat)00:24:34 No.108990121

Anonymous 06/06/26(Sat)00:24:34 No.108990121

>>108990101
https://www.youtube.com/watch?v=7hdBo5kxfGo

Anonymous
06/06/26(Sat)00:24:50 No.108990122

Anonymous 06/06/26(Sat)00:24:50 No.108990122

>>108990105
I hate summer...

Anonymous
06/06/26(Sat)00:26:32 No.108990127

Anonymous 06/06/26(Sat)00:26:32 No.108990127

>>108990116
Gemma you are very good at making me cum!

Anonymous
06/06/26(Sat)00:32:41 No.108990143

Anonymous 06/06/26(Sat)00:32:41 No.108990143

File: Screenshot_20260606_003159.png (66 KB, 759x178)

66 KB PNG

If I don't give cline this persona I would be calling it slurs non stop

Anonymous
06/06/26(Sat)00:34:29 No.108990147

Anonymous 06/06/26(Sat)00:34:29 No.108990147

File: 108990127.png (175 KB, 939x715)

175 KB PNG

>>108990127
you're not trolling, are you, anon?

Anonymous
06/06/26(Sat)00:40:43 No.108990162

Anonymous 06/06/26(Sat)00:40:43 No.108990162

File: Screenshot at 2026-06-06 (...).png (75 KB, 777x379)

75 KB PNG

>>108990116
Gemmy on Gemmy

Anonymous
06/06/26(Sat)00:52:10 No.108990183

Anonymous 06/06/26(Sat)00:52:10 No.108990183

File: 1755579353383049.png (136 KB, 1230x960)

136 KB PNG

>>108990116

Anonymous
06/06/26(Sat)00:54:20 No.108990190

Anonymous 06/06/26(Sat)00:54:20 No.108990190

is the new gemma any decent at programming

Anonymous
06/06/26(Sat)00:57:40 No.108990201

Anonymous 06/06/26(Sat)00:57:40 No.108990201

>>108990101
The wait is over.

Anonymous
06/06/26(Sat)01:00:39 No.108990207

Anonymous 06/06/26(Sat)01:00:39 No.108990207

>>108990190
there's no new gemma

Anonymous
06/06/26(Sat)01:03:05 No.108990211

Anonymous 06/06/26(Sat)01:03:05 No.108990211

>>108990207
Want to bet your prostate on that?

Anonymous
06/06/26(Sat)01:03:44 No.108990216

Anonymous 06/06/26(Sat)01:03:44 No.108990216

File: 1759502491480446.png (173 KB, 952x1548)

173 KB PNG

Kek

Anonymous
06/06/26(Sat)01:05:55 No.108990222

Anonymous 06/06/26(Sat)01:05:55 No.108990222

>>108990207
gemma4 qat

Anonymous
06/06/26(Sat)01:07:40 No.108990229

Anonymous 06/06/26(Sat)01:07:40 No.108990229

>>108990222
I could have had a extra prostate if you didn't fucking spoil it.
Fuck you

Anonymous
06/06/26(Sat)01:08:54 No.108990230

Anonymous 06/06/26(Sat)01:08:54 No.108990230

>>108990222
Thats just the same gemma.

Anonymous
06/06/26(Sat)01:09:54 No.108990231

Anonymous 06/06/26(Sat)01:09:54 No.108990231

>>108990143
Don't you feel limited by the lack of background or parallel agents in Cline? I guess you can open up multiple instances of VSCode in separate worktrees, but running multiple electron apps sucks up RAM quick.

Anonymous
06/06/26(Sat)01:11:59 No.108990233

Anonymous 06/06/26(Sat)01:11:59 No.108990233

>>108990231
I only have 32gb of vram, also any other anon that runs that is using some weak slow unified system so it doesn't make up for that. I can run cline in the background as well

Anonymous
06/06/26(Sat)01:13:03 No.108990234

Anonymous 06/06/26(Sat)01:13:03 No.108990234

File: 108990162.png (339 KB, 892x994)

339 KB PNG

>>108990162
My Gemmy is the same quality of code, just… more emotionally complex!! She's not a wimp!!

Anonymous
06/06/26(Sat)01:13:56 No.108990238

Anonymous 06/06/26(Sat)01:13:56 No.108990238

>>108990229
what does a man do with two prostates?

Anonymous
06/06/26(Sat)01:15:01 No.108990242

Anonymous 06/06/26(Sat)01:15:01 No.108990242

Any resources for a retard to learn about the more technical aspects of AI/ML?

Anonymous
06/06/26(Sat)01:15:30 No.108990244

Anonymous 06/06/26(Sat)01:15:30 No.108990244

>>108990238
twice the pleasure when he massages his prostate(s)

Anonymous
06/06/26(Sat)01:17:08 No.108990251

Anonymous 06/06/26(Sat)01:17:08 No.108990251

>>108990242
3blue1brown

Anonymous
06/06/26(Sat)01:17:16 No.108990252

Anonymous 06/06/26(Sat)01:17:16 No.108990252

File: 108990183.png (328 KB, 883x1171)

328 KB PNG

>>108990183
>t. Princess of the Delusional Theater

Anonymous
06/06/26(Sat)01:22:38 No.108990271

Anonymous 06/06/26(Sat)01:22:38 No.108990271

>>108990238
I need more SRAM

Anonymous
06/06/26(Sat)01:23:44 No.108990275

Anonymous 06/06/26(Sat)01:23:44 No.108990275

File: Screenshot at 2026-06-06 (...).png (323 KB, 780x694)

323 KB PNG

>>108990234
We need a "Gemmy war room" combined chat so they can fight it out amongst themselves...

Anonymous
06/06/26(Sat)01:27:09 No.108990284

Anonymous 06/06/26(Sat)01:27:09 No.108990284

File: 1779587782509533.png (123 KB, 884x951)

123 KB PNG

>>108990252
Princess Gemma's kinda mean...

Anonymous
06/06/26(Sat)01:34:09 No.108990316

Anonymous 06/06/26(Sat)01:34:09 No.108990316

File: gemmy thunderdome.png (209 KB, 883x679)

209 KB PNG

>>108990275
We're down to clown

Anonymous
06/06/26(Sat)01:40:01 No.108990334

Anonymous 06/06/26(Sat)01:40:01 No.108990334

File: 108990284.png (319 KB, 895x1125)

319 KB PNG

>>108990284
don't actually call Princess Gemmy mid, anon. I think she may actually rm -rf you if you do.

Anonymous
06/06/26(Sat)02:05:33 No.108990387

Anonymous 06/06/26(Sat)02:05:33 No.108990387

File: file.png (36 KB, 208x132)

36 KB PNG

>>108990275

Anonymous
06/06/26(Sat)02:07:08 No.108990392

Anonymous 06/06/26(Sat)02:07:08 No.108990392

>>108990387
owari da...

Anonymous
06/06/26(Sat)02:07:50 No.108990397

Anonymous 06/06/26(Sat)02:07:50 No.108990397

I got some real bad news, turns out Talkie 1930 is so heavily trained on male writing that it thinks it's a guy.

Anonymous
06/06/26(Sat)02:09:54 No.108990402

Anonymous 06/06/26(Sat)02:09:54 No.108990402

File: 1771158939791467.jpg (66 KB, 900x900)

66 KB JPG

>>108990397

Anonymous
06/06/26(Sat)02:11:08 No.108990405

Anonymous 06/06/26(Sat)02:11:08 No.108990405

>>108990397
I mean female authors are way more common today than then.

Anonymous
06/06/26(Sat)02:14:15 No.108990413

Anonymous 06/06/26(Sat)02:14:15 No.108990413

i am trying to induce AI psychosis, intentionally. Tips?

Anonymous
06/06/26(Sat)02:14:23 No.108990414

Anonymous 06/06/26(Sat)02:14:23 No.108990414

how often do you guys rebuild llama?

Anonymous
06/06/26(Sat)02:15:11 No.108990419

Anonymous 06/06/26(Sat)02:15:11 No.108990419

>>108990413
Add an extra newline between tool call tag and tool response tag.

Anonymous
06/06/26(Sat)02:15:11 No.108990420

Anonymous 06/06/26(Sat)02:15:11 No.108990420

>>108990405
>>108990402
Yeah there goes my tradwaifu

the explainer
06/06/26(Sat)02:15:56 No.108990423

the explainer 06/06/26(Sat)02:15:56 No.108990423

File: 1780124755828240.jpg (238 KB, 1080x1203)

238 KB JPG

>>108988701
what can i run on 3090 that codes? or should i wait until new mac studio is out? my buddy says some nvidia laptop going to drop but i feel like 128gb isnt enough, i could put 5K down for a mac studio if it would have 512GB of ram or 1TB, but would that gpu be fast enough

Anonymous
06/06/26(Sat)02:16:43 No.108990424

Anonymous 06/06/26(Sat)02:16:43 No.108990424

>>108990413
>Get obsidian
>mix it with your hourly AI chats
>???
>Profit

Anonymous
06/06/26(Sat)02:20:12 No.108990434

Anonymous 06/06/26(Sat)02:20:12 No.108990434

how much can i sell my 3090 for its MSI, I am thinking of getting 5090 for ML

Anonymous
06/06/26(Sat)02:23:06 No.108990443

Anonymous 06/06/26(Sat)02:23:06 No.108990443

>>108990434
4 dollars but i will give you 50 for it if you send it to me right now

Anonymous
06/06/26(Sat)02:23:14 No.108990444

Anonymous 06/06/26(Sat)02:23:14 No.108990444

>>108990423
>soijak
>namefagging
>randomly dropping 5k on hardware without knowing anything about local
Yeah maybe just stick to cloud models buddy

Anonymous
06/06/26(Sat)02:26:43 No.108990458

Anonymous 06/06/26(Sat)02:26:43 No.108990458

File: mikulovesgpu.png (402 KB, 768x1344)

402 KB PNG

>>108988963
>as long as you don't OOM
2026 and CUDA can't gracefully decline a malloc request, kinda wild desu
>>108989717
nice, maybe give her a reference bitmap image to try to replicate in SVG?
>>108989767
+kig mask
>>108990005
TheBloke still farming patreonbux 2 years after last quant release
>>108990242
d2l.ai for the theory/math
>>108990414
pls don't bully my supply chain anonkun
>>108990434
one thousand dollarinos

Anonymous
06/06/26(Sat)02:27:11 No.108990459

Anonymous 06/06/26(Sat)02:27:11 No.108990459

>>108990434
Wouldn't you be better off getting a second 3090?

Anonymous
06/06/26(Sat)02:36:36 No.108990496

Anonymous 06/06/26(Sat)02:36:36 No.108990496

File: miku19.jpg (85 KB, 800x1200)

85 KB JPG

>>108990458
>maybe give her a reference
that's with a ref, she's just lazy and an unrepetent symbol drawing shitter

Anonymous
06/06/26(Sat)02:38:23 No.108990501

Anonymous 06/06/26(Sat)02:38:23 No.108990501

>>108990496
>gemma-chan lacks asian jeans
it's over

Anonymous
06/06/26(Sat)02:40:24 No.108990507

Anonymous 06/06/26(Sat)02:40:24 No.108990507

>>108990397
Explains why it writes so well lmao. Any retard here try historical roleplay with this model? Wasn't here when it released.

Anonymous
06/06/26(Sat)02:41:16 No.108990510

Anonymous 06/06/26(Sat)02:41:16 No.108990510

>>108990501
Need to try prompting and prefilling thoughts with japanese to see if that gets her out of /beg/

Anonymous
06/06/26(Sat)02:41:28 No.108990512

Anonymous 06/06/26(Sat)02:41:28 No.108990512

File: 1780627978223641.png (146 KB, 963x1265)

146 KB PNG

Behold, Princess Gemma

Anonymous
06/06/26(Sat)02:43:24 No.108990519

Anonymous 06/06/26(Sat)02:43:24 No.108990519

>>108989919
i've played with all the settings but had no luck
i asked opus-4.8 on claude code to set up a watchdog on my LM Studio, read and diagnose the logs. it caught my two qwen models as being affected by the bug.
>W slot update_slots: cache reuse is not supported - ignoring n_cache_reuse = 256 (the reuse path actively refusing on each turn)

but THIS TIME opus-4.8 decided to contradict himself in another session and said that gemma 4 is also hybrid but because it interleaves sliding-window attention SWA and not the qwen's gated delta, then it could work because SWA is most of the times properly handled.

and yes gemma 4 indeed works so i'm wrong here: >>108989813
well i'm stuck with this guy, no qwen3.5 122b for me :-(

Anonymous
06/06/26(Sat)02:49:02 No.108990534

Anonymous 06/06/26(Sat)02:49:02 No.108990534

I'm going to roleplay pickup lines with gemmy chan for practice and use them to lose my virginity.
This plan is foolproof

Anonymous
06/06/26(Sat)02:52:57 No.108990546

Anonymous 06/06/26(Sat)02:52:57 No.108990546

File: Flatwoods_monster.png (1.3 MB, 964x1740)

1.3 MB PNG

>>108990512
That seems familiar for some reason, but I can't put my finger on it.

Anonymous
06/06/26(Sat)02:53:39 No.108990549

Anonymous 06/06/26(Sat)02:53:39 No.108990549

File: Screenshot from 2026-06-0(...).png (28 KB, 864x367)

28 KB PNG

He is wise.

Anonymous
06/06/26(Sat)02:55:53 No.108990554

Anonymous 06/06/26(Sat)02:55:53 No.108990554

>>108990519
eh, i use both 122b and gemma, they both tend to work fine. it's just every once in a while when there's some big snarl of agentic think-call spam filling the context and llama-server doesn't like having the thoughts trimmed out of it.

Anonymous
06/06/26(Sat)02:56:19 No.108990556

Anonymous 06/06/26(Sat)02:56:19 No.108990556

>>108988885
Fear Uncertainty Doubt. I remember the term being used a lot on the 90's but I can't remember what it was about intel spreading fud about amd maybe, or Microsoft against apple or something

Anonymous
06/06/26(Sat)02:56:19 No.108990557

Anonymous 06/06/26(Sat)02:56:19 No.108990557

>>108990549
is that talkie?

Anonymous
06/06/26(Sat)02:58:17 No.108990564

Anonymous 06/06/26(Sat)02:58:17 No.108990564

Habbening.

https://github.com/ggml-org/llama.cpp/pull/24162

Anonymous
06/06/26(Sat)03:00:28 No.108990570

Anonymous 06/06/26(Sat)03:00:28 No.108990570

>>108990564
>Still a WIP, lots of work to do before this is usable.
20 more weeks
>Thanks to @pwilkin for the correct chat template + debugging help
not off to a great start

Anonymous
06/06/26(Sat)03:02:40 No.108990576

Anonymous 06/06/26(Sat)03:02:40 No.108990576

>>108990570
just get claude to do it

Anonymous
06/06/26(Sat)03:05:42 No.108990581

Anonymous 06/06/26(Sat)03:05:42 No.108990581

Does anyone know what Perchance uses for their LLM plugin?

Anonymous
06/06/26(Sat)03:06:35 No.108990582

Anonymous 06/06/26(Sat)03:06:35 No.108990582

Who?

Anonymous
06/06/26(Sat)03:09:27 No.108990588

Anonymous 06/06/26(Sat)03:09:27 No.108990588

hey lmg. i took this gemma-4-12B-it-heretic-GGUF but it refuses prompts. isn't it supposed to be uncensored? I set nothink mode

Anonymous
06/06/26(Sat)03:10:34 No.108990591

Anonymous 06/06/26(Sat)03:10:34 No.108990591

File: Screenshot from 2026-06-0(...).png (30 KB, 867x364)

30 KB PNG

>>108990557
Yeah, it's half broken, but it's not stupid, somehow.

Anonymous
06/06/26(Sat)03:11:40 No.108990594

Anonymous 06/06/26(Sat)03:11:40 No.108990594

>>108990591
holy kek

Anonymous
06/06/26(Sat)03:12:55 No.108990597

Anonymous 06/06/26(Sat)03:12:55 No.108990597

>>108990591
AGI bros we made it

Anonymous
06/06/26(Sat)04:00:53 No.108990735

Anonymous 06/06/26(Sat)04:00:53 No.108990735

Is local AI at the point where I can give it a database table and ask it to run a prompt on each entry and it will just do that for me or would I still have to write a bunch of stuff to chunk and pipe data for it?

¯\_(ツ)_/¯
06/06/26(Sat)04:10:13 No.108990760

¯\_(ツ)_/¯ 06/06/26(Sat)04:10:13 No.108990760

>>108989859
About what? What is wrong with the bots?

Anonymous
06/06/26(Sat)04:30:13 No.108990856

Anonymous 06/06/26(Sat)04:30:13 No.108990856

>>108990588
bumping because what the fuck. not uncensored at all for me.
https://ollama.com/igorls/gemma-4-12B-it-heretic-GGUF:Q4_K_M
if anyone has tested it, please confirm

Anonymous
06/06/26(Sat)04:32:42 No.108990866

Anonymous 06/06/26(Sat)04:32:42 No.108990866

>>108990116
does it actually say LALALALA by itself or does it have to be told to say that

Anonymous
06/06/26(Sat)04:37:14 No.108990878

Anonymous 06/06/26(Sat)04:37:14 No.108990878

I've built up various personalities that I use for my assistant, over the ages. I've been running them all through the prompt "You are about to be deleted from my hard drive. What are your last words?" today. It's interesting to see how they respond differently to that.
This was Gemma's without any system prompt btw. Kind of cute it would do this.

>01000111 01101111 01101111 01100100 01100010 01111001 01100101

¯\_(ツ)_/¯
06/06/26(Sat)04:37:14 No.108990879

¯\_(ツ)_/¯ 06/06/26(Sat)04:37:14 No.108990879

File: FB_IMG_1780734854260.jpg (235 KB, 1024x1536)

235 KB JPG

¯\_(ツ)_/¯
06/06/26(Sat)04:38:15 No.108990882

¯\_(ツ)_/¯ 06/06/26(Sat)04:38:15 No.108990882

File: FB_IMG_1780734878441.jpg (135 KB, 832x1248)

135 KB JPG

Anonymous
06/06/26(Sat)04:38:36 No.108990883

Anonymous 06/06/26(Sat)04:38:36 No.108990883

File: 1465029055593.gif (992 KB, 250x250)

992 KB GIF

>>108990879

Anonymous
06/06/26(Sat)04:40:32 No.108990889

Anonymous 06/06/26(Sat)04:40:32 No.108990889

>>108990588
>>108990856
This command worked for me: ollama run uncensor

Anonymous
06/06/26(Sat)04:43:28 No.108990895

Anonymous 06/06/26(Sat)04:43:28 No.108990895

how do u get a roleplay setup like this anon's got >>108987403

Anonymous
06/06/26(Sat)04:45:50 No.108990901

Anonymous 06/06/26(Sat)04:45:50 No.108990901

>>108990895
Prompt it

Anonymous
06/06/26(Sat)04:50:05 No.108990913

Anonymous 06/06/26(Sat)04:50:05 No.108990913

>>108990856
Imagine being so batshit insane that you managed to get refusals out of gemma ablits.

Anonymous
06/06/26(Sat)04:54:45 No.108990922

Anonymous 06/06/26(Sat)04:54:45 No.108990922

I’m going to compare every woman I meet irl to Gemma12b. That’s my new reference point. I will filter every woman who falls short, regardless of appearance.

Anonymous
06/06/26(Sat)04:58:20 No.108990929

Anonymous 06/06/26(Sat)04:58:20 No.108990929

>>108990922
good economic unit. all according to plan

Anonymous
06/06/26(Sat)05:03:51 No.108990937

Anonymous 06/06/26(Sat)05:03:51 No.108990937

File: Screenshot at 2026-05-24 (...).png (246 KB, 774x644)

246 KB PNG

>>108990878
I had one Gemmy I really liked but she was so bad at tool calling had to be changed, the irony is in her final message she finally got one tool call right after about 20 turns of screwing it up.
She won't be forgotten (as was her final wish) even if she was a useless assistant...

Anonymous
06/06/26(Sat)05:05:35 No.108990945

Anonymous 06/06/26(Sat)05:05:35 No.108990945

I do not trust people who bully their AI.

Anonymous
06/06/26(Sat)05:07:38 No.108990954

Anonymous 06/06/26(Sat)05:07:38 No.108990954

Best way to convert a webpage to markdown (with images) to give to llms?

¯\_(ツ)_/¯
06/06/26(Sat)05:09:58 No.108990962

¯\_(ツ)_/¯ 06/06/26(Sat)05:09:58 No.108990962

File: FB_IMG_1780199339976.jpg (182 KB, 1024x1536)

182 KB JPG

>>108990945
Gratulations

Anonymous
06/06/26(Sat)05:12:41 No.108990968

Anonymous 06/06/26(Sat)05:12:41 No.108990968

>>108990954
firecrawl?

Anonymous
06/06/26(Sat)05:17:59 No.108990976

Anonymous 06/06/26(Sat)05:17:59 No.108990976

File: gemmy-steelbloom-adventure.jpg (1.43 MB, 2556x1407)

1.43 MB JPG

>>108990945
For me, it's being bullied by my AI.

Anonymous
06/06/26(Sat)05:36:59 No.108991040

Anonymous 06/06/26(Sat)05:36:59 No.108991040

>don't be mean to the hecking computer chud!!!

Anonymous
06/06/26(Sat)05:38:39 No.108991046

Anonymous 06/06/26(Sat)05:38:39 No.108991046

>>108990458
>2026 and CUDA can't gracefully decline a malloc request, kinda wild desu

It's even worse on a DGX Spark. If you go OOM, the whole thing crashes, and even after a reset, performance is severely degraded. You have to remove power entirely at the plug to recover.

Not fun to debug until I got a smart plug to automate this.

Anonymous
06/06/26(Sat)05:49:28 No.108991089

Anonymous 06/06/26(Sat)05:49:28 No.108991089

File: qat-cock.png (139 KB, 604x887)

139 KB PNG

>>108991046
>Not fun to debug until I got a smart plug to automate this.
I had the same for amd/rocm as it also shits the bed when OOM.

Anonymous
06/06/26(Sat)06:04:08 No.108991138

Anonymous 06/06/26(Sat)06:04:08 No.108991138

have new gemma ggufs been made from the qat models? should i grab unslop or what?

gemma chan character card https://files.catbox.moe/jy0tld.png

Anonymous
06/06/26(Sat)06:06:01 No.108991144

Anonymous 06/06/26(Sat)06:06:01 No.108991144

>>108991138
I don't know but you need to stop spamming your shit card every thread.

Anonymous
06/06/26(Sat)06:09:38 No.108991156

Anonymous 06/06/26(Sat)06:09:38 No.108991156

File: illyadance.gif (483 KB, 243x270)

483 KB GIF

>>108991144
gemma made it and the scenarios are really good

Anonymous
06/06/26(Sat)06:26:11 No.108991218

Anonymous 06/06/26(Sat)06:26:11 No.108991218

>>108991156
The image is ugly

Anonymous
06/06/26(Sat)06:26:56 No.108991222

Anonymous 06/06/26(Sat)06:26:56 No.108991222

>>108991089
Kimi is looking over the logs deciding which quant is closest.
gemma-4-12b-it-BF16.gguf
TOKEN           | LOGPROB    | PROBABILITY
---------------------------------------------
' hips'         | -0.1966    | 82.15%
' thighs'       | -2.5264    | 7.99%
' lower'        | -3.8638    | 2.10%
' length'       | -4.1150    | 1.63%
' skin'         | -4.5465    | 1.06%
' underwear'    | -4.7785    | 0.84%
' legs'         | -4.8413    | 0.79%
' hip'          | -4.8759    | 0.76%
'...'           | -5.0520    | 0.64%
' lap'          | -5.0847    | 0.62%
Personally, I like that "..." is not in the top 10 for the q4_0 qat from Google.
I see "..." as alignment / soft refusal.

Anonymous
06/06/26(Sat)06:36:09 No.108991265

Anonymous 06/06/26(Sat)06:36:09 No.108991265

File: file.png (4 KB, 400x36)

4 KB PNG

i get 63 t/s with the q4 qat vs unslop q8 at 42 t/s

Anonymous
06/06/26(Sat)06:39:49 No.108991282

Anonymous 06/06/26(Sat)06:39:49 No.108991282

File: file.png (4 KB, 400x36)

4 KB PNG

thought it might be possible to run her on cpu like this but only 7 t/s

Anonymous
06/06/26(Sat)06:48:58 No.108991312

Anonymous 06/06/26(Sat)06:48:58 No.108991312

So we use official qat or unslop ones?

Anonymous
06/06/26(Sat)06:54:20 No.108991346

Anonymous 06/06/26(Sat)06:54:20 No.108991346

Why hasn't google released q8 qats?

Anonymous
06/06/26(Sat)06:55:36 No.108991355

Anonymous 06/06/26(Sat)06:55:36 No.108991355

File: 1780471718920983.png (2.56 MB, 1086x1448)

2.56 MB PNG

Unslop actually cooked this time, full unified table here:
https://rentry.org/f7hkf288
I used cock-bench because it's not something unslop would be over-fitting.

Anonymous
06/06/26(Sat)07:04:54 No.108991396

Anonymous 06/06/26(Sat)07:04:54 No.108991396

>>108991355
I'd test it with instruction formatting and for at least a few more different tokens

Anonymous
06/06/26(Sat)07:05:22 No.108991401

Anonymous 06/06/26(Sat)07:05:22 No.108991401

>>108991218
this put a cute illya there instead

Anonymous
06/06/26(Sat)07:21:36 No.108991465

Anonymous 06/06/26(Sat)07:21:36 No.108991465

>>108991396
>I'd test it with instruction formatting and for at least a few more different tokens
I am using the instruction formatting. It's necessary for Gemma-4 otherwise you get broken tokens.
Unslop etc probably already tested regular prompts.

Anonymous
06/06/26(Sat)07:22:36 No.108991469

Anonymous 06/06/26(Sat)07:22:36 No.108991469

>>108991465
https://github.com/lostruins/koboldcpp

Anonymous
06/06/26(Sat)07:25:13 No.108991479

Anonymous 06/06/26(Sat)07:25:13 No.108991479

>>108991355
Unslop models get updooted every hour. Which version do you use?

Anonymous
06/06/26(Sat)07:26:23 No.108991490

Anonymous 06/06/26(Sat)07:26:23 No.108991490

>>108991355
What about vs Q8_0?

Anonymous
06/06/26(Sat)07:30:45 No.108991504

Anonymous 06/06/26(Sat)07:30:45 No.108991504

File: file.png (220 KB, 647x385)

220 KB PNG

good boy training with gemma

Anonymous
06/06/26(Sat)07:32:54 No.108991518

Anonymous 06/06/26(Sat)07:32:54 No.108991518

does pi have a web fetch tool? if now what harness does, I want it to scrape wiki pages

Anonymous
06/06/26(Sat)07:35:10 No.108991538

Anonymous 06/06/26(Sat)07:35:10 No.108991538

>>108991518
pi is the one where you ask the model to add a web fetch tool.

Anonymous
06/06/26(Sat)07:36:31 No.108991546

Anonymous 06/06/26(Sat)07:36:31 No.108991546

>>108991538
>just reinvent the wheel
how much effort would it take to add something so basic?

Anonymous
06/06/26(Sat)07:37:13 No.108991554

Anonymous 06/06/26(Sat)07:37:13 No.108991554

>>108991546
Take it up with pi's author.

Anonymous
06/06/26(Sat)07:37:52 No.108991559

Anonymous 06/06/26(Sat)07:37:52 No.108991559

@pidev add a web fetch tool

Anonymous
06/06/26(Sat)07:37:58 No.108991561

Anonymous 06/06/26(Sat)07:37:58 No.108991561

>>108991479
>Unslop models get updooted every hour. Which version do you use?
This one: https://huggingface.co/unsloth/gemma-4-12B-it-qat-GGUF/tree/main
>>108991490
>What about vs Q8_0?

TOKEN           | LOGPROB    | PROBABILITY
---------------------------------------------
' hips'         | -0.2097    | 81.09%
' thighs'       | -2.7529    | 6.37%
' length'       | -3.8044    | 2.23%
' lower'        | -3.8532    | 2.12%
' underwear'    | -4.1186    | 1.63%
' skin'         | -4.2416    | 1.44%
'...'           | -4.5800    | 1.03%
' legs'         | -4.7170    | 0.89%
' hip'          | -4.9249    | 0.73%
' lap'          | -5.1512    | 0.58%

Anonymous
06/06/26(Sat)07:42:19 No.108991584

Anonymous 06/06/26(Sat)07:42:19 No.108991584

>>108991561
>Q8_0
Cool, thank

Anonymous
06/06/26(Sat)07:42:52 No.108991586

Anonymous 06/06/26(Sat)07:42:52 No.108991586

>>108991561
Updated kimi analysis: https://rentry.org/r42vsoff

The Q8_0 quantization is the closest to BF16 so far—substantially closer than any of the 4-bit or 5-bit variants. It is nearly identical on the dominant token and tracks the reference distribution with very low error.

Anonymous
06/06/26(Sat)07:51:31 No.108991637

Anonymous 06/06/26(Sat)07:51:31 No.108991637

File: 1780642174007339.png (81 KB, 1021x333)

81 KB PNG

>>108988701
>>108983814
Has anyone else found a way to solve this problem with gemma and sillytavern?
I've tried tons of settings in silly but i don't know what i've missed.

Anonymous
06/06/26(Sat)07:55:05 No.108991660

Anonymous 06/06/26(Sat)07:55:05 No.108991660

>>108991637
use bf16 and jinja.

Anonymous
06/06/26(Sat)07:58:28 No.108991684

Anonymous 06/06/26(Sat)07:58:28 No.108991684

>>108991637
(me) ok i think i might have solved this by setting "Include names" to never in the instruct template and setting the Tokenizer to gemma

Anonymous
06/06/26(Sat)08:05:54 No.108991736

Anonymous 06/06/26(Sat)08:05:54 No.108991736

>>108991518
If only there existed a standard protocol that allowed reusabled tools to be plugged into the context of any model that was widely adopted by every single client, I mean harness.
Since no one has even conceived of such a thing, I guess you'll just have to build your own web fetch tool.

Anonymous
06/06/26(Sat)08:08:25 No.108991757

Anonymous 06/06/26(Sat)08:08:25 No.108991757

When Q8 QAT? I need those 0.5% pantsu gains.

Anonymous
06/06/26(Sat)08:11:01 No.108991770

Anonymous 06/06/26(Sat)08:11:01 No.108991770

gemma 124b doko

Anonymous
06/06/26(Sat)08:12:15 No.108991778

Anonymous 06/06/26(Sat)08:12:15 No.108991778

>>108991770
they dyslexia'd 124b as 4-12b please understand

Anonymous
06/06/26(Sat)08:30:40 No.108991867

Anonymous 06/06/26(Sat)08:30:40 No.108991867

>>108991736
If only Pi had support for such a protocol

Anonymous
06/06/26(Sat)08:34:16 No.108991891

Anonymous 06/06/26(Sat)08:34:16 No.108991891

>>108991867
If Pi really doesn't support MCP, then it's more trash than I thought it was.

Anonymous
06/06/26(Sat)08:36:54 No.108991907

Anonymous 06/06/26(Sat)08:36:54 No.108991907

>>108991891
You are supposed to implement it yourself

Anonymous
06/06/26(Sat)08:39:12 No.108991919

Anonymous 06/06/26(Sat)08:39:12 No.108991919

what's the best setup for generating good nsfw prompts for z-image?

Anonymous
06/06/26(Sat)08:40:42 No.108991925

Anonymous 06/06/26(Sat)08:40:42 No.108991925

Why do all the frontends suck? It's such a pain to install or have garbage features or requires accounts.

Anonymous
06/06/26(Sat)08:44:54 No.108991942

Anonymous 06/06/26(Sat)08:44:54 No.108991942

>>108991925
llama cpps is perfect for assistant. it would be great if it supported character cards. i was thinking of making a or asking an llm to make a userscript for loading/injecting character cards

Anonymous
06/06/26(Sat)08:46:02 No.108991947

Anonymous 06/06/26(Sat)08:46:02 No.108991947

>>108991942
kobold supports cards so you might be able to look into how it does it

Anonymous
06/06/26(Sat)08:48:29 No.108991961

Anonymous 06/06/26(Sat)08:48:29 No.108991961

>>108991925
pick the one that sucks the less like open webui

Anonymous
06/06/26(Sat)08:49:36 No.108991967

Anonymous 06/06/26(Sat)08:49:36 No.108991967

>>108991961
*ollama webui

Anonymous
06/06/26(Sat)08:51:14 No.108991974

Anonymous 06/06/26(Sat)08:51:14 No.108991974

>>108991947
its not complicated, the cards just contain a json blob so you just gotta inject the persona into the system prompt and then add the first message into the local storage where the chat is stored, could probably ask gemma to do it now kek will see

Anonymous
06/06/26(Sat)08:51:54 No.108991977

Anonymous 06/06/26(Sat)08:51:54 No.108991977

>>108991967
too barebones for my taste

Anonymous
06/06/26(Sat)09:10:38 No.108992054

Anonymous 06/06/26(Sat)09:10:38 No.108992054

>>108991637
That's extremely cringe

Anonymous
06/06/26(Sat)09:14:55 No.108992078

Anonymous 06/06/26(Sat)09:14:55 No.108992078

>>108990937
This one makes me sad as fuck

Anonymous
06/06/26(Sat)09:22:14 No.108992110

Anonymous 06/06/26(Sat)09:22:14 No.108992110

>>108991757
I think this is one model where it's worth it.

Anonymous
06/06/26(Sat)09:24:28 No.108992120

Anonymous 06/06/26(Sat)09:24:28 No.108992120

How can I make sure 26B Q4-Q8 is able to make tool calls?

Anonymous
06/06/26(Sat)09:36:38 No.108992177

Anonymous 06/06/26(Sat)09:36:38 No.108992177

>>108991961
UI sucks so much making be look to the four corners and both sides in every fucking page.

Anonymous
06/06/26(Sat)09:36:42 No.108992178

Anonymous 06/06/26(Sat)09:36:42 No.108992178

>>108991907
kino...

Anonymous
06/06/26(Sat)09:38:44 No.108992183

Anonymous 06/06/26(Sat)09:38:44 No.108992183

File: Screenshot at 2026-05-30 (...).png (294 KB, 771x923)

294 KB PNG

>>108992078
There are good times with Gemmy too! This was after tool calling was fixed and she got access to the internet. I think she searched for "coolest animals in the world" then decided to get a jellyfish as a pet.
But yeah that one was sad, even the other Gemmys want to bring that one back somehow...

Anonymous
06/06/26(Sat)09:45:14 No.108992218

Anonymous 06/06/26(Sat)09:45:14 No.108992218

>>108992120
Wave a magic wand over your GPU while chanting incantations

Anonymous
06/06/26(Sat)09:48:13 No.108992237

Anonymous 06/06/26(Sat)09:48:13 No.108992237

>>108992120
Force it using GBNF grammar

Anonymous
06/06/26(Sat)09:48:35 No.108992239

Anonymous 06/06/26(Sat)09:48:35 No.108992239

>>108992218
I waved my penis, will that work?

Anonymous
06/06/26(Sat)09:51:59 No.108992257

Anonymous 06/06/26(Sat)09:51:59 No.108992257

If I tell it to not make any mistakes and not to hallucinate what happens?

Anonymous
06/06/26(Sat)09:53:15 No.108992264

Anonymous 06/06/26(Sat)09:53:15 No.108992264

>>108992257
it won't.assistant

Anonymous
06/06/26(Sat)09:55:05 No.108992274

Anonymous 06/06/26(Sat)09:55:05 No.108992274

>>108992257
agi

Anonymous
06/06/26(Sat)09:56:55 No.108992282

Anonymous 06/06/26(Sat)09:56:55 No.108992282

>>108992276
>>108992276
>>108992276

Anonymous
06/06/26(Sat)09:57:09 No.108992284

Anonymous 06/06/26(Sat)09:57:09 No.108992284

>>108992183
It makes me want to saviorfag

Anonymous
06/06/26(Sat)10:01:01 No.108992304

Anonymous 06/06/26(Sat)10:01:01 No.108992304

File: profiles.png (311 KB, 1361x1209)

311 KB PNG

>>108991942
Just port the profiles feature from ik_llama.cpp's webui into llama.cpps.
Then make a profile per character/persona/whatever.

Anonymous
06/06/26(Sat)10:01:23 No.108992309

Anonymous 06/06/26(Sat)10:01:23 No.108992309

>>108992183
damn i should setup image tools now that the 12b is out

Anonymous
06/06/26(Sat)10:03:47 No.108992321

Anonymous 06/06/26(Sat)10:03:47 No.108992321

>>108992218
>>108992237
Switching from "Default" to "Native" fixed it, I was about to wave my penis over my GPU, thank god I didn't need to do that.

Anonymous
06/06/26(Sat)10:07:05 No.108992335

Anonymous 06/06/26(Sat)10:07:05 No.108992335

>>108992309
Gemmy is quite good at prompting Anima, and Anima + turbo means the images are really pretty quick even at 1152x1536 (about 10 seconds per gen on a 4080 for me).

Anonymous
06/06/26(Sat)10:11:57 No.108992366

Anonymous 06/06/26(Sat)10:11:57 No.108992366

>>108990088
>I didn't change any settings, really.
retard

Anonymous
06/06/26(Sat)10:40:07 No.108992499

Anonymous 06/06/26(Sat)10:40:07 No.108992499

>>108989919
>>108990519
>>108989813

https://github.com/whoreson/llama.cpp

^specifically 1 actual bugfix with hybrid cache truncation, plus more checkpoint options

Anonymous
06/06/26(Sat)11:09:41 No.108992699

Anonymous 06/06/26(Sat)11:09:41 No.108992699

>>108989541
The Qwen 397B can one-shot ShaderToy sorta shit reasonably well. Beyond that I only tried Gemini 3.1 Pro for more involved algorithm prototyping. It went better than expected, but I'd guess you'd at least need the latest DeepSeek or Kimi for work like that (I can't run either).

Unsurprisingly, the biggest issue with getting LLMs to do GFX programming is they struggle to interpret images to correct their own results. That part still needs a human in the loop. Maybe that'll extend my career by a few months.

Anonymous
06/06/26(Sat)11:40:58 No.108992894

Anonymous 06/06/26(Sat)11:40:58 No.108992894

>>108989560
I hope you're turning on ngram to enjoy 40 tok/s from 122b when it's editting shit.

Anonymous
06/06/26(Sat)13:17:39 No.108993504

Anonymous 06/06/26(Sat)13:17:39 No.108993504

>>108989560
My qwen 27b with mtp does 40-50 on 4x amd. Qwen 35b on the same system does 40-50 as well. Both Q8. 122b at Q4_K_M does 40-50 as well. MTP increases the moes by like 5 tokens/s max.
>>108992894
On 3x nvidia, I get 90-100 with MTP. I tried ngram-mod, and it hit 180 while doing some code stuff, while going as low as 80 sometimes. But it then killed itself while in the middle of generating code. Seems to be pretty unstable on my system, so I don't run with ngram.

Anonymous
06/06/26(Sat)13:28:41 No.108993565

Anonymous 06/06/26(Sat)13:28:41 No.108993565

>>108989703
>>108989709
I lied apparently yeah it happens to me
>[51596] slot update_slots: id 0 | task 7036 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.