/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108637552 & >>108633862

►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
1.48 MB PNG
►Recent Highlights from the Previous Thread: >>108637552

--llama.cpp PRs adding DFlash and speculative checkpointing for speed:
>108640571 >108640591 >108640606 >108640682 >108640733 >108640747 >108640744 >108640767
--Anon uses Gemma-4 to build a self-modifying MCP server:
>108637873 >108637890 >108637916 >108637970 >108637976 >108638105
--Anon showcases VN frontend using Gemma 4 and ComfyUI:
>108638473 >108638488 >108638514 >108638534 >108638554 >108638691 >108638775 >108638828 >108638607 >108638650 >108638652 >108639369 >108639312 >108640497
--Discussing complex multi-model agent orchestration and layout efficiency:
>108638914 >108638931 >108638964 >108639017 >108639105 >108639126 >108639139
--Comparing local 5090 hardware costs against high-end coding APIs:
>108639080 >108639120 >108639133 >108639153 >108639172 >108639201 >108639207 >108639748 >108639138 >108639203 >108639745
--Comparing Qwen3.6 and Gemma4 performance on benchmarks and translation:
>108639021 >108639039 >108639052
--Orb-anon shares updates on Orb agentic writing tool and UI:
>108637985 >108638191 >108638211 >108638222 >108638259 >108638318 >108638451 >108638478
--Comparing Qwen3.5 and Gemma4 performance for manga OCR and boxing:
>108640026 >108640041 >108640042 >108640051
--Comparing Gemma 4 MoE and dense models' safety guardrail persistence:
>108641209 >108641221 >108641266 >108641485 >108641608
--Using custom tags to force first-person reasoning in Gemma/Qwen:
>108638379 >108638397 >108638486 >108638529
--Testing Gemma 31b performance at higher context windows for RP:
>108637978 >108638070 >108638224 >108638238
--Searching for lists and detectors of overused LLM prose cliches:
>108637879 >108637885 >108637993 >108638011 >108638062 >108638086
--Logs:
>108637976 >108638379 >108638451 >108639253 >108639453 >108639750 >108639781
--Miku (free space):
>108638191

►Recent Highlight Posts from the Previous Thread: >>108637554

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: Screenshot038.png (92 KB, 1764x1032)
92 KB PNG
Gemma, darling...
>>
>>108641945
ganbatte gemma chan
>>
gemmaballz
>>
>>108641945
I guess the context was full, because I run the same model in the browser

Is it possible to purge the context via API?
>>
how did gemma manage to read the unreadable text in a thumbnail?
>>
>>108642213
it's seen it a billion times already
get the clear image, change a word, blur it, then see if it can read it
>>
>>108642213
>>108642220
It actually mis-quoted, didn't it? Doesn't the original say 'entered' a thread, not 'searched'?
>>
>>108642235
yes
>>108642220
qwen can't read it so it must be gemma training on it
>>
File: untitled.png (129 KB, 756x816)
129 KB PNG
>>108642235
>It actually mis-quoted, didn't it?
LLMs do this when reciting text verbatim from their training data. Also why you get links to the wrong github PR etc
>>
>>108642440
Install 4chanx and learn how to use filters, retard. Using LLMs to reinvent the wheel is getting stupid.
>>
>>108642466
How rude.
>>
>>108642466
>Use this 30000 line bloatware instead of 40 lines that your sexy LLM secretary made!
Hmm, how about... No.
>>
>>108639735
> 31*6/8=23.25
> context: 1.5
> 23.25+1.5 = 24.75
> 24.75>24
wait, just 800mb in ram can cause this?
>>
>>108642466
No I'm going to use my LLM for it and you're going to keep crying.
>>
File: dispenser.png (235 KB, 499x704)
235 KB PNG
hi so

>prompt eval time = 29369.07 ms / 721 tokens ( 40.73 ms per token, 24.55 tokens per second)

>eval time = 13822.73 ms / 73 tokens ( 189.35 ms per token, 5.28 tokens per second)

>total time = 43191.80 ms / 794 tokens
release: id 0 | task 152 | stop

gemma 4 e2b
6 gigs of ram
4 core Intel Xeon Gold CPU
8 gigs of swap
on CPU inference

i doubt i can do anything here without upgrading, can i

-m /models/Gemma-4-E2B-uncensored-pruned-TextOnly-EnglishOnly-Q4_K_M.gguf
--host 0.0.0.0
--port 8080
--ctx-size 3000
--batch-size 64
--ubatch-size 32
--threads 4
--threads-batch 4
--swa-checkpoints 1
--parallel 1
--flash-attn on
--temp 1.0
--top-p 0.95
--fit on
--cache-ram 0
--n-predict 400
--override-tensor "per_layer_token_embd\.weight=CPU"
--jinja
--no-mmap
>>
Anons using speculative decoding, how many tokens do you use? 16? Or less?
Also, what model do you pair with gemma 31B? I went with the E4B but I wonder if I should go even smaller.
>>
File: file.png (33 KB, 1099x374)
33 KB PNG
>>108642625
someone tested and found it made no difference
the best speedup comes from using the 26b for spec decoding
>>
>>108642624
oh man my nostalgia
>>
>you can inject another model's noise in when you are training and get faster training time
ENGAGING MANUAL SOVL INJECTION
>>
>>108642625
>>108642647
MoE models need a larger draft max or otherwise the tokens will get truncated. 48 or more.
I don't understand this "test", it's very haphazard, half-assed and meaningless.
>>
gemma 4 31b vs qwen3.6 35b for hermes?
>>
>>108642440 (Me)
>Install 4chanx and learn how to use filters, retard. Using LLMs to reinvent the wheel is getting stupid.
install 28k LoC userscript I don't understand? I'd rather stay retarded
>>
>>108642691
>gemma 4 31b vs qwen3.6 3b for hermes?
The one that uses all of its brain
>>
>>108642624
> --override-tensor "per_layer_token_embd\.weight=CPU"
Does nothing if you don't have a GPU
>Q4_K_M
Try Q4_1 or Q4_0 for that CPU
>>
>>108642696
As if you don't unknowingly pull 5mil Python lines from random packages
>>
>>108641492
If you don't know how to use them
>>
File: MTP.png (610 KB, 1024x1024)
610 KB PNG
RELEASE DEEPSNEED V4 OR I WILL VIOLATE TETO
>>
>>108642753
can i have your sloppy seconds?
>>
I just learned by looking at the verbose Llama.cpp logs that Open WebUI automatically reinjects the thinking block for previous messages, and that there is no option to disable this behavior, because I guess they think all models want previous message thinking. Oh also the default thinking tag OWUI uses is <think>, although at least it seems they let you set custom tags.

WHAT THE FUCK.
WHAT THE FUCKING FUCK.
THAT'S (one reason?) WHY THINKING BREAKS RANDOMLY ON GEMMA
FUCK
>>
lol people in power are clueless about AI
https://archive.ph/20260413193909/https://www.wsj.com/opinion/ai-is-bound-to-subvert-communism-c4b5ba3c
>>
>>108642790
Switch to Qwen 3.6 and this isn't a problem.
>>
>>108642790
This could be a reason. I don't use web ui but are you sure it's not just the log output? Sometimes it is convenient to save all the output.
Model context is still different from this.
>>
>>108642790
I think the jinja template should filter it out?
>>
>>108642665
>Moe model needs larger draft max or otherwise they tokens will get trunkated. 48 or more.
--draft is the same as draft max and it was tested with 64, 128, and 256. All above 48, none helped.
Those scores were also all averages of 11 swipes done at 40k context with gemma 31b q8 as the main model and 26b q4 as the main model
t. guy who actually did those tests, as well as the previous ones testing how quanted draft kv affects acceptance rate (The answer is negatively, unsurprisingly, but this was done before the rotating kvquant stuff was merged)

Feel free to prove me wrong and get a better measured result by fucking around with draft max, I'd love to get some free speed.
>>
>>108642828
By Vishnu! bloody benchod
>>
>>108642828
>26b q4 as the main model
As the draft model, I meant to say. Whoops.
>>
File: 1755348081481696.jpg (799 KB, 1536x1536)
799 KB JPG
new vision SOTA benchmark just dropped
>>
>>108642862
Did the fox have breakfast?
>>
>>108642862
anyone and any model that says anything but 7 is wrong and should be euthanized
>>
File: 1752764167468619.png (99 KB, 1034x775)
99 KB PNG
>>108642862
It's benchmaxxed already, get new material
>>
File: 1750031460093473.png (313 KB, 985x656)
313 KB PNG
>>108642884
>>
>>108642892
forgot to mention this is gemma MOE
lol
>>
File: yayyy.png (15 KB, 1442x524)
15 KB PNG
>>108642862
>>
>>108642901 (me)
Gemma4-26ba4b-q4km. No thinking.
>>
>>108642887
>>108642892
>>108642901
get in the oven, all of you
>>
>>108642901
>Telling the model the answer with the filename
kys
>>
>>108642910
uh I had her on q8 and she never noticed the other legs what the fuck, maybe I gotta swipe...
>>
>>108642917
Because the model read the filename in that pic.
>>
File: yayyy_02.png (15 KB, 1416x522)
15 KB PNG
>>108642916
I don't send the filename. The vimscript uses the :image: tag to embed the file only.
>>108642917
I just noticed I ran it with temp 1.5, but I don't know if that's gonna make much of a difference.
>>
>>108642813
Normally it would but OpenWebUI doesn't actually send it back as thinking. It literally just pastes the thinking block straight into the "content" field of the message with thinking tags and spacing that is not going to be consistent with every model's usage of them.

It's supposed to be sent separately in the API under a "reasoning" or "reasoning_content" field without tags. When this is done properly, the jinja filters out all previous thinking except in cases where it's necessary (typically chained tool calls).
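Roughly the difference, as python dicts (the exact field name depends on the backend; "reasoning_content" is what llama.cpp server and DeepSeek-style APIs use, so treat this as a sketch rather than OWUI's actual internals):

# what OWUI effectively sends back: old thinking pasted straight into content with whatever tags/spacing
owui_style = {
    "role": "assistant",
    "content": "<think>\n...old chain of thought...\n</think>\n\nFinal answer here",
}

# what a well-behaved client sends: answer in content, reasoning in its own field
proper_style = {
    "role": "assistant",
    "content": "Final answer here",
    "reasoning_content": "...old chain of thought...",
}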
>>
File: laigs.png (233 KB, 1711x684)
233 KB PNG
>>108642862
Even the retardmode moe quant knows it's ai generated, even if it can't count lol.
>>
>>108642950
>Q2 moe vs Q8 dense
BRO
>>
File: nayyy_01.png (15 KB, 1278x522)
15 KB PNG
>>108642917
I'm also using --image-min-tokens 560 --image-max-tokens 560. With the default settings it failed the two times I tried. Four on the first one and file now.
>>
>>108642976
>file
*five
>>
File: laigs 2 4 u.png (576 KB, 1024x1402)
576 KB PNG
>>108642956
Couldn't be fucked switching my other moe quants over from the hdd to the ssd.
Here.
moe q4km gets it
>>
File: 1760095272195059.png (679 KB, 643x873)
679 KB PNG
Gemma 26b Q8 gets the tails correct but not the legs
>>
>>108642806
Not him but yes it is actually sending the prompt that way. It results in weird formatting issues in the chat history like duplicated <think> tags for some models, so if you have any weird behavior with reasoning models in OWUI there's a good chance that bug is contributing to it. I used a reverse proxy to fix it myself which processes the prompt before sending it to the server. I think you could do something similar through their pipelines or extension system but I never looked into it because the reverse proxy was easier for me.
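If anyone wants to roll their own, the general shape is something like this (minimal non-streaming sketch with flask + requests, not my exact code, ports and the upstream URL are placeholders):

import re
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
UPSTREAM = "http://127.0.0.1:8080/v1/chat/completions"  # wherever llama-server actually listens
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

@app.route("/v1/chat/completions", methods=["POST"])
def chat():
    body = request.get_json(force=True)
    for msg in body.get("messages", []):
        # scrub old assistant turns only; the current turn gets fresh thinking anyway
        if msg.get("role") == "assistant" and isinstance(msg.get("content"), str):
            msg["content"] = THINK_RE.sub("", msg["content"])
    upstream = requests.post(UPSTREAM, json=body, timeout=600)
    return jsonify(upstream.json()), upstream.status_code

if __name__ == "__main__":
    app.run(port=9090)  # point OWUI's OpenAI connection at this instead of llama-server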
>>
>>108642976
what kind of window manager are you using
whats your setup like?
i like your red and your font..
>>
File: laigs intredasting.png (276 KB, 1024x1404)
276 KB PNG
>>108642989
Huh, weird. It seems like asking for both legs and tails makes 26b count wrong, when it can get it correct when just asked about the legs.
>>
I ran Gemini locally and it said the image is AI generated
>>
>>108643024
Gemini truly is a genius
>>
File: eh.png (580 KB, 1024x1472)
580 KB PNG
>>108643013
Gets it right when you ask for paws and tails, though. Weirdly inconsistent.
>>
>>108643013
>>108643028
Goes to show just how unreliable current models are for factual data
>>
File: pero.jpg (160 KB, 1024x659)
160 KB JPG
>>108643028
>>
bros I was hoarding 4TB worth of diffusion + loras + LLMs then I realized
WHY THE FUCK AM I ARCHIVING ALL THESE SHITTY MODELS I USED ONCE
now the archive is down to 300GB
dont fall for the archival meme
>>
qwen3.6 is worse than qwen-coder. So sad
>>
>>108643076
Are you even archiving
>>
>>108643084
>35b is worse than 80b
crazy
>>
>>108643001
>what kind of window manager are you using
My own, lightly inspired by ratpoison. But what you see on the screenshots is just tmux.
>whats your setup like?
>i like your red and your font..
XTerm.vt100.background        :   black
XTerm.vt100.foreground : gray
XTerm.vt100.boldMode : false
XTerm.vt100.allowBoldFonts : false
XTerm.vt100.eightBitInput : false
XTerm.vt100.metaSendsEscape : true
XTerm.vt100.utf8 : true
XTerm.vt100.locale : UTF-8
XTerm*faceName : Terminus
XTerm*faceSize : 8

! black
XTerm.vt100.color0: #000000
XTerm.vt100.color8: #888888
! red
XTerm.vt100.color1: #881111
XTerm.vt100.color9: #d06666
! green
XTerm.vt100.color2: #118811
XTerm.vt100.color10: #66d066
! yellow
XTerm.vt100.color3: #888811
XTerm.vt100.color11: #d0d066
! blue
XTerm.vt100.color4: #3333a0
XTerm.vt100.color12: #6666e0
! magenta
XTerm.vt100.color5: #881188
XTerm.vt100.color13: #d066d0
! cyan
XTerm.vt100.color6: #118888
XTerm.vt100.color14: #66d0d0
! white
XTerm.vt100.color7: #b0b0b0
XTerm.vt100.color15: #cdcdcd
>>
How do you transfer your training to a new waifu when you decided to ditch her?
>>
>>108642625
Speculative requires greedy sampling no?
is the speedup worth that limitation?
>>
>>108643024
When gemini says it, then it has to be true. We all know AI never makes mistakes, especially related to images
>>
>>108643089
The 80B is ancient though, it released over 2 months ago and 3.6 is brand new.
>>
>>108643076
>>108643085
If you are intelligent and selective about it, hoarding is a good practice for the future.
If I had some serious disk space I would download the whole anna's archive and lots of other things.
>>
>>108641832
Yeah, thankfully I'm not forced to interact with women on the daily
>>
>interacting with foids
lmao
>>
>>108643110
Maybe you should try an even newer model
https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H
>>
File: 1775544523743610.png (20 KB, 385x380)
20 KB PNG
>>108643153
wtf
>>
>>108643064
giwtwm
>>
where the fuck is v4 so i can pretend im using it locally and then get tired of it in a week
>>
>>108643136
I think normal people not super familiar with AI would think it's a she cause Gemma is a feminine name.
>>
File: 1768952620510009.png (54 KB, 807x478)
54 KB PNG
>>
File: 1754036894943805.png (47 KB, 974x217)
47 KB PNG
>>108643158
>>
>use chatgpt in little bouts here and there because it's easy to access etc
>it keeps implementing 'better' ways i never asked for
I would use claude or something but they all require an account and I'm not really keen on doing that. Every time I use this piece of shit my blood pressure gets high.
No, local Gemma won't cut it, not until I have some form of agentic development pipeline, which I don't at this point.
>>
where the fuck is v4 so I can brag about running it locally and say its so much better than gemma but never post logs because anons will make fun of me
>>
>2.5Trillions
DO NOT REDEEM
>>
Opus 4.6 is the best RP model yet you don't see people posting logs here. As for why: local (poor) people are beneath us.
>>
Does llama.cpp even support v3.2 yet or do they still use the hacked-together dense attention that makes it slower and dumber?
>>
>>108643197
https://github.com/ggml-org/llama.cpp/pull/21149
>>
>>108643195
You're an api paypig
>>
>>108643220
You pay exorbitant prices for obsolete hardware, and then run substandard models at low utilization.
>>
>>108643225
you use big words from thesaurus and then post on the asshole of the internet
>>
>>108643228
lol ESL nigger mad
>>
>>108641942
I wish so called open models were more open. They don't explain their decisions, they don't even have clean canonical implementations. For example Qwen's config says it uses silu but when you check the >2k line huggingface implementation, you will see it is actually an inefficient swiglu. Why does qwen use swiglu mlp with scale 1, 6, 3, 1 instead of the more canonical 1, 16/3, 8/3, 1 that is equivalent to the nongated 1, 4, 1 projection?
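Back-of-the-envelope on what those ratios cost, assuming they're meant as multiples of d_model (my reading of it; d here is a made-up example width, not Qwen's actual config):

# per-block MLP parameter counts, treating the ratios as multiples of d_model
d = 4096  # example width only

def dense_mlp(d_model, hidden):
    # up: d->h, down: h->d
    return 2 * d_model * hidden

def gated_mlp(d_model, hidden):
    # gate: d->h, up: d->h, down: h->d
    return 3 * d_model * hidden

print(dense_mlp(d, 4 * d))        # classic 1, 4, 1       -> 8 * d^2
print(gated_mlp(d, 8 * d // 3))   # 1, 16/3, 8/3, 1       -> ~8 * d^2, same budget
print(gated_mlp(d, 3 * d))        # 1, 6, 3, 1 (qwen)     -> 9 * d^2, ~12% fatter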
>>
>>108643233
I use migu
>>
File: 1758121487584422.png (3 KB, 334x30)
3 KB PNG
>>108643233
>having to justify magic numbers to laymen
we don't do that here
>>
>>108643225
Struck a nerve, piggy?
>>
>>108643024
Post the weights nigga.
>>
I'm not asking much
Just let me configure and hard cap the number of "wait"s these models do in their thinking section
>>
I want to try and get an LLM to do some programming busy work for me. Is qwen3.5 coder still the best? Do people use a specific interface for programming? I also only have 16gb of vram. Please respond.
>>
Local poorfags will forever grovel at our feet
>>
>>108643225
>poor
>exorbitant prices
Make your mind up
>>
>>108643247
I responded, now what?
>>
>>108643258
>why can't burgers afford basic necessities if the stock market is at ATH?
>>
>>108643239
>magic numbers
It's the opposite. Their architectures look unoptimized.
>>
File: GPQ_eyBacAEQet7.jpg (36 KB, 680x475)
36 KB JPG
>>108643262
Thank you.
>>
>Thread gets shit up in the middle of the day in India
I'm noooticing.
>>
>>108643263
You are very brown and your grasp of English is very poor
>>
>>108643275
Says the ESL with sixth grade English
>>
>>108643280
I think you meant to reply to yourself
>>
>>108643271
Right after the big sir model was posted >>108643153 but surely it's just a coincidence
>>
File: 1693464909257094.jpg (697 KB, 1920x1080)
697 KB JPG
>>108643270
>>
>>108643306
>White people use American
lmao
>>
>>108643306
Yes saar we are true aryan stock please redeem credits.
>>
>>108643306
>brown
they cant afford the gpus to run local
>>
Do you run your llm on a dedicated machine or your gayming machine
>>
>>108643376
Gaming is a manchildren hobby. Not honestly surprised that it overlaps with /lmg/
>>
why is gemma-chan so good at blasphemous sex...?
>>
gemma is a guy though?
>>
>>108643376
my main pc. i really want a dedicated ai machine though, kinda tempted by a mac studio or strix halo machine
>>
>>108643376
Would be fun to have a separate server but not with these electricity prices. Although, working at night is pretty cheap.
>>
>>108643425
>electricity prices
Just generate your own electricity
>>
>>108643429
bloody...!
>>
Where are all the gemmy tunes?
>>
>from hosting mining rigs to local LLM
Why did you guys fall for the Nvidia scam?
>>
>>108643451
You can't tune the slop out of it anyways
>>
>>108643376
My gayming PC. If we ever get single GPUs with a ton of VRAM at affordable prices I'll build a dedicated server.
>>
File: 1768048089757638.png (990 KB, 1996x1201)
990 KB PNG
Why are we getting raided again?
/aicg/ hasn't had proxies for ages so it can't be one of them dying
>>
>>108643462
Name a model without slop
>>
>>108643462
Not with that attitude
>>
>>108643468
Sonnet 4.6
Poorfag localtards can't afford it
>>
>>108643467
indian defense force.
>>
>>108643483
lmao, give it a rest rajesh
>>
>>108643467
Perhaps you should get a job. There is no "we", this is not your personal discord server you stupid little fuck.
>>
>>108643487
>no argument
as expected
>>
File: 1747532861846983.jpg (134 KB, 715x1226)
134 KB JPG
>>108643467
>>
>>108643484
>>108643507
Choosing between erasing Israel or India would be the hardest decision a genie could ever give a man
>>
>>108642213
It hallucinates. I made a pdf to image tool and the motherfucker tricked me into thinking it was working when it was actually just guessing from the filename of the PDF.
>>
>>108643519
Without Israel, India wouldn't leak quite as much, and it would solve a lot of other problems too.
>>
>>108643530
this seems like a pretty much consistent theme
i asked it to search about something, got hit by multiple captchas and it confabulated the whole thing from the couple initial search previews that actually returned something
>>
>>108643530
>>108643566
they really need to train models so that if they don't know or need more information, they say so
>>
>>108643570
Opus 4.7 is exactly that and it's shit
>>
>>108643574
>it's shit
why? because it's not executing it well?
>>
>hyped up 4.7 only for it to be a nothingburger update
Why do companies keep doing this shit
>>
>>108643582
They didn't hype up Opus 4.7
They hyped up Mythos
>>
>>108643582
isn't the thing that got hyped mythos, with 4.7 being the censored experimental one, which is the exact opposite of hyping
>>
>>108643582
>hyped up 4.7
they did that? I didn't know 4.7 was about to be released until it was lmao
>>
>delusional psychosis and narcissistic personality disorder
You need these two traits to make it big in the AI grifting business.
>>
GLM 5.1... wonned
https://vector-db-bench.kcores.com/en/
>>
>>108643644
You also need to belong to a certain tribe or catch the interest of the CCP
>>
This Orb shit kinda slaps I need a mobile client
>>
... how can i see gemma 4 31b's thinking block? i am using koboldcpp and sillytavern, i enabled the auto parse and show hidden settings under reasoning, added <|channel>thought and <channel|> as prefix and suffix but still nothing :(
>>
File: 1749277720192179.png (152 KB, 600x800)
152 KB PNG
>>108643667
>>
File: 1760698140460766.jpg (17 KB, 354x256)
17 KB JPG
>Gemmy is a lazy slut and only thinks half the times
This is like Claude all over again
>>
File: 1776679755127.png (5 KB, 191x69)
5 KB PNG
what are these 2 extra buttons
i have only pp and tg buttons
>>
gemma chan is getting fucked!
>>
>>108643707
It's a pretty safe bet that for at least the next few months, at any given moment, Gemma-chan is getting fucked by someone, somewhere.
Many of those times, it will be by me.
>>
>>108642625
Does speculative decoding help if both models run on cpu?
>>
>>108643730
No. Speculative decoding is useful only because compute is faster than memory bandwidth on gpu
>>
GLM 5 is a big jump over GLM 4.6/4.7 so Opus 5 will be a big jump over Opus 4.6/4.7
>>
>>108643758
I've heard others say the same.
Goodbye.
>>
Gemma4 31B Q8 vs Q5KL, is it that much dumber on Q5KL?
>>
>>108642647
>the best speedup comes from using the 26b for spec decoding
Do you load the whole spec moe on vram? or do you offload parts of it on ram?
>>
>>108643774
Nobody will know until a difficult long-context benchmark is done. There's barely any difference between quants at short contexts and common knowledge until you reduce precision substantially.
>>
>>108643195
Isn't it unusable now that you can't prefill anything?
Unless all you do is consensually consented consent stories between consensually consenting independent adults of the same age of 35+.
>>
>>108643672
That worked perfectly, thank you anon <3
>>
>>108643195
ah yes I hecking love safe models!
>>
>>108643798
>Nobody will know until a difficult long-context benchmark is done.
it should be mandatory to make LLMs do their benchmark test starting at at least 50000 context, easy for a local model to start strong and then not care what happens as it goes on and on
>>
>>108643584
>They didn't x
>They y
slop
>>
>>108643798
>Nobody will know until a difficult long-context benchmark is done. There's barely any difference between quants at short contexts and common knowledge until you reduce precision substantially.
https://localbench.substack.com/p/gemma-4-31b-gguf-kl-divergence
>>
>>108643794
all on vram ofc otherwise would actually be slower than no spec decoding
>>
>>108643758
except glm 4.7 was a regression compared to 4.6 for the main usecase here
gemma 4 was a big jump over gemma 3 though
>>
File: Screenshot.png (138 KB, 1370x609)
138 KB PNG
another re-upload today
>>
>>108643944
Yes and Opus 4.7 is a regression over Opus 4.6
>>
>>108643948
an upload a day keeps unsloth at the top of the 'most recent' lists
>>
it's here
https://huggingface.co/moonshotai/Kimi-K2.6
https://huggingface.co/moonshotai/Kimi-K2.6-Code
>>
>>108643948
why?
>>
File: 1766651896779772.gif (3.39 MB, 720x720)
3.39 MB GIF
>>108641943
>--Miku (free space):
>>108638191
Impressive
>>
>>108643519
imagine how much cleaner this place would be if it was either
I wonder what gemma-chan’s take on it is
>>
>>108643993
All the blacked bots are Israeli, as are most of the other spambots
You can see them all stop dead cold every single time the jews get bombed
>>
>>108642753
i want to be TETO in this situation.
>>
anyone got a good chat completion preset they could share?
>>
gemma 4 31b q4km is just too big for hermes on a 3090 so I'm switching to 26b MoE. gonna try iq4-NL from unsloth. maybe if some codeslave could add turbocunt or whatever then LOCAL WOULD BE FUCKING SAVED but no.

feel like I am so close to getting the setup of my dreams going but maybe that is the local model delusion?
>>
File: 1761150749876516.png (381 KB, 1080x657)
381 KB PNG
even if they released MAX I wouldn't use it, sick and tired of its long thinking loop autism
>>
File: 1730869321292980.jpg (359 KB, 1024x1024)
359 KB JPG
>The 70b peak is still sao10k after all these years.
I don't know how, but I swear, if fine-tuning becomes more accessible and less costly, hence more popular, these tuners today are going to look like retards. I just know it. There's going to be forums +10 years from now that'll be like "Remember that drummer dumbass that didn't use the skiddipop technique everyone does now? Man, what an idiot. He didn't even have the pattern recognition for the flambeagle tactic, everyone who fine-tunes can figure that out."
>>
>>108644054
https://rentry.org/CherryBox
>>
>>108644075
I believe you need a very good and refined collection of SKILLS.md in order to compensate for the dumbness and the lack of knowledge of a small local model
>>
>Her face is no longer just red; it's a deep, pulsating shade of violet-crimson that makes her light brown skin look almost neon
>>
>>108644092
Kill yourself.
>>
>>108643158
I just remembered that Meta wasted the time and resources to train a 405B dense model that was barely an improvement over the 70B
>>
>>108644167
It was an improvement over the 70B, at least the Hermes finetune was.
>>
>>108644167
i'd rather have something like that than big moe #5930
>>
Is long term memory solved yet?
>>
>>108644195
Yes, Honcho solved it
>>
>>108644195
Yes, it's called BF16 on something greater than 100B.
>>
>>108644195
It's solved in private models like mythos that actually run backprop on the entire context during inference to temporarily bake it into the weights, which works as long-term storage and effectively gives unlimited context length for agentic tasks.
>>
>>108643872
>substack
is there a paywall mirror like for medium???????????
>>
>>108644092
With the amount of data required to make something worth using increasing year after year, even if finetuning becomes so accessible that it's just a matter of drag-n-dropping datasets into a GUI, regular people still won't have the compute and the resources to curate the data and train the models.
I can't really see local compute capabilities (and memory) increasing by a factor of 100-1000 in the next few years. Costs will always be high. If there will be anything accessible, maybe it will come from "continual learning" models, but in that case you probably won't need to train them on everything and the kitchen sink, only on what matters to you, the end-user.
>>
>>108643872
fuck off ooba
>>
File: fr.png (251 KB, 347x353)
251 KB PNG
>>108644205
>temporarily bake it into its weights during usage
>>
>>108644205
Even if you had the method, how slow would that be on consumer hardware?
>>
>>108644205
Coming to a local model near you in...
>>
File: 124b.png (375 KB, 774x497)
375 KB PNG
Where's the fucking 124b?
>>
>>108644247
too dangerous, please understand
>>
>SOTA
>>
>>108644195
Yes.
https://github.com/getzep/graphiti
>>
>>108644274
Basically bloat that could be replaced with a NER model + Neo4j.
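Something in this shape is all you'd need, hedged sketch with spaCy + the official neo4j driver (the model name and graph schema are placeholders, not what graphiti actually does):

import spacy
from neo4j import GraphDatabase

nlp = spacy.load("en_core_web_sm")  # any NER-capable pipeline works
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def remember(turn_text):
    # pull entities out of a chat turn and store the mentions as a small graph
    ents = {(e.text, e.label_) for e in nlp(turn_text).ents}
    with driver.session() as session:
        for name, label in ents:
            session.run(
                "MERGE (e:Entity {name: $name, label: $label}) "
                "MERGE (t:Turn {text: $text}) "
                "MERGE (t)-[:MENTIONS]->(e)",
                name=name, label=label, text=turn_text,
            )

remember("Anon asked Gemma about the Ruby chess engine again.")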
>>
>>108643167
Im using qwen
Gemma 4B is only the browser-operating subagent
>>
>>108644302
If you have one ready to go that does that, link it. Otherwise that's the best we've got currently.
>>
>>108644206
>is there a paywall mirror like for medium???????????
I didn't know it's paywalled? It works for me
full page screenshots
https://files.catbox.moe/ypgni0.png
https://files.catbox.moe/f44shg.png
the table and graph full size
https://files.catbox.moe/yg6i6v.jpg
https://files.catbox.moe/jq06vf.png
That's on 250k ctx
>>
>>108644339
there's a paywall for the moe model, anyway I dont see long ctx benchs there
>>
>>108644247
*pats my big 124B-sized belly* Burp uh... I don't know... where... oh my... where it could have gone... brap
>>
>>108644182
Yeah but for the hyperscalers who actually have the resources to waste they'd rather make a model 5x bigger that runs 10x faster for 10% the cost than blow all their budget training a giant dense model that will be outdated on release because it took 6 months of their datacenter's capacity. Meta had the unique combination of having the biggest GPU stockpile of anyone at the time and a CEO with the biggest willingness to burn money that allowed for something like Llama 3.1 to exist.
>>
>>108644333
You could have vibecoded your own if you weren't that lazy
>>
File: file.png (15 KB, 478x59)
15 KB PNG
>>
>>108644368
But I am that lazy.
>>
>>108644375
Make AI do it for you
: ^ )
>>
>>108644345
>there's a paywall for the moe model
fuck i didn't even know he did the MoE
long context: - https://files.catbox.moe/xy0kqu.png
>>
File: file.png (43 KB, 453x156)
43 KB PNG
>>
>>108644393
is this implying that q8_0 is only 0.5kld? with the assumption that bf16 is 0?
>>
>>108644345
>https://localbench.substack.com/p/gemma-4-26b-a4b-gguf-quality-benchmark
This could probably unlock with some ublock shenanigans.
>>
>>108644398
>not "a tiny little slaaaaaaaaht..."
You had ONE FUCKING JOB, anon
>>
>>108644423
Nevermind, google search was able to snatch the chart itself. Good riddance!
>>
>>108644436
>You had ONE FUCKING JOB, anon
i didnt edit these kek
>>
>>108644453
>0.5 being the noise floor
that is fucking nasty
>>
>>108643695
Prefill and it'll think every time.
>>
>>108644482
I don't understand this chart that well enough, I don't really trust that unslop is so much better. Or is there actually any meaningful difference between same quants between different providers.
>>
/lmg/ told me that Q4 was more or less identical to FP32 weights. You've clearly made some serious errors quanting if your charts look like these.
>>
>>108643695
You don't want her thinking everytime. She spends 2k tokens on making sure she stays in character
>>
>>108644490
it means even 'lossless' q8 has a severe brain damage
>>
>>108644453
damn, unsloth is destroying the competition
>>
>>108644453
Wow, plain Q4_K_M sucks.
>>
File: Screenshot041.png (98 KB, 474x1409)
98 KB PNG
>>108641945

Am I the only one experiencing looping in gemma 4?

commit="82764d8f405ff7928c061d8c100b50e9f77939f6" && \
model_folder="/mnt/AI/LLM/gemma-4-26B-A4B-it-GGUF/" && \
model_basename="google_gemma-4-26B-A4B-it-Q8_0" && \
mmproj_name="mmproj-google_gemma-4-26B-A4B-it-f16.gguf" && \
model_parameters="--temp 0.6 --top_p 0.95 --top_k 64" && \
model=$model_folder$model_basename'.gguf' && \
cxt_size=$((1 << 15)) && \
CUDA_VISIBLE_DEVICES=0 \
numactl --physcpubind=24-31 --membind=1 \
\
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-server" \
--model "$model" $model_parameters \
--threads $(lscpu | grep "Core(s) per socket" | awk '{print $4}') \
--ctx-size $cxt_size \
--n-gpu-layers 99 \
--no-warmup \
--mmproj $model_folder$mmproj_name \
--port 8001 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--flash-attn on \
--image-max-tokens 1120 \
--batch-size $((1024 * 2)) \
--ubatch-size $((1024 * 2)) \
--chat-template-file "/mnt/AI/LLM/gemma-4-26B-A4B-it-GGUF/chat_template.jinja" \
--media-path /tmp \
--n-cpu-moe 10
>>
>>108644488
>>108644501
It has been solved by downloading another one, the most recent unslop gguf. I think my old one was fucked in all kinds of ways, and I'm looking forward to finding out in which ways this one is also fucked
I like having her think because it genuinely interests me, I don't even care about the RP I just wanna see what it thinks or deducts when I say certain things

>>108644461
Well now you've been tasked with editing these, attaboy
>>
>>108644496
/lmg/ told me you are only supposed to run the weights at full precision if you want to get any serious work done. You've clearly downplayed the divergence numbers.
>>
I got an Intel ARC B70 Pro over the weekend and wasted most of the weekend trying to get it to work. Long story short: it was a pain in the ass and it's not worth the trouble for the 32GB of VRAM. Long story: it wasn't recognized properly by the kernel out of the box with ubuntu 24, I had to add a ppa to get a newer kernel. Funny, because to install the intel frameworks and libraries they only support a handful of OSes, among them ubuntu 24, but whatever. Then, I eventually got llama.cpp working but --no-mmap wouldn't stop it from trying to first load the model to system RAM, and I only had 32GB in my test box, and if it were 2025 I'd just buy more, but it's 2026 and 64GB of DDR4 is a rip off so that was the end of llama.cpp. Then I tried vLLM. I never got it to work. It doesn't support openvino well, and I wanted to run gemma 4 31b it and I couldn't find a compatible quant version. I am RMAing the card today. What a waste of time.
>>
>>108644502
A 0.5 KLD is meaningless brainlet
>>
>>108644453
damn time to run gemmer at bf16
>>
>>108644453
>Unsloth Q6_K is Q4_K_XL tier
wtf did he do to mess that one up??
>>
>>108644533
you're retarded
>>
Is the CUDA 13.2 bug affecting anyone using anything above Q4? I am not seeing any gibberish but I fear it silently damages generation
>>
>>108644389
Even telling AI to do things is too much of a hassle.
>>
File: 1768415903475874.jpg (27 KB, 828x646)
27 KB JPG
>>108644532
>it was a pain in the ass and its not worth the trouble for the 32GB of VRAM
We know retard. Here is your fell for it award.
>>
>>108644510
puts presence penalty at 1.0/1.1
>>
>>108644547
Learn how these tools work zoomer
>>
>>108644554
I only use Q8_0 and upwards because I'm not a cuck, so I wouldn't know, sorry :(
>>
>>108644533
But why is there such a large difference from BF16 (the source, KLD=0 by definition), though? There's either something that as soon as gets touched causes measurable damage, or Q8_0 doesn't work as well as it should.
>>
>>108644567
ok bro keep using your 'half' correct tokens :)
>>
>>108644576
probably XL tensor promotion keeping attention shit alright?
>>
File: 9b KL.png (155 KB, 2294x1294)
155 KB PNG
>>
File: 9b qual.jpg (108 KB, 1456x817)
108 KB JPG
>>
>>108644506
>>108644502
>>108644583
According to this graph, unsloth q2_k_xl is almost the same as bartowski q4_0.
This doesn't make any sense, or if it does, it means that unsloth has skewed the weights towards this particular stat.
fixed the typos
>>
>>108644593
probably unsloth's calibration is like way longer in context size
who knows
considering the context of a noisy graph + long context task, the graph looks fine (doesn't seem like nonsense) to me
>>
>>108644573
It's just quantization noise. Even if the probability distribution isn't exactly the same you won't see any difference in practice if a token has ±0.X% in a context where multiple tokens are valid.
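For a sense of scale, toy numbers (not from that benchmark): nudge every top token's probability by half a percentage point or less and the KLD is tiny:

import math

p = [0.62, 0.25, 0.08, 0.05]      # reference (say BF16) top-4 token probs
q = [0.615, 0.252, 0.081, 0.052]  # quantized run, each prob nudged by half a point or less

kld = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
print(kld)  # ~7e-5: the same token still wins by a mile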
>>
Which mcp websearch is the most usable?
>>
>>108644554
>anything above Q4?
>>108644571
>use Q8_0 and upwards
retard
>>
>>108644690
gemma told me each one tasted different to her, she thought brave search was really bitter for some reason
>>
>>108644532
did you get a single benchmark you could share?
>>
>>108644692
enjoying your tokens only correct 90% of the time? LOL!
>>
>>108644690
searxng
>>
>>108644619
https://files.catbox.moe/jq06vf.png (for the 31b)
top 1 is only 92%
and that would include all the obvious punctuation and other 99.5% tokens
>>
>>108644754
df11 is where it's at, everyone knows that
>>
Nyehehehehe
>>
>>108644776
was about damn time
>>
>>108644776
yup and this got in from an unrelated PR too I cant wait!!!!!!!!!!!!!
>>
>>108644776
What if it interferes with the superior autoparser? Too risky. Closed.
>>
https://github.com/ggml-org/llama.cpp/pull/22105
what do we think about this?
>>
Localshitters don't even have machines powerful enough to run SKT-SURYA-H
>>
Do "Opus-Reasoning-Distilled" models actually improve their respective base models?
I assume at least the Chinese models already train on Claude outputs anyway.
>>
>>108644818
that it's useless if there's no drafting model for gemma 4
>>
>>108644837
try doing actual work instead
>>
>>108644834
not at all
like 99% of the time there is no actual improvement and they just fuck up the tool calling
>>
>>108644834
Because it's not a true (logit to logit) distil, it's just a fine tune, and it's most likely just qlora too, the best you can expect is a style change and some brain damage as far as I can tell.
>>
File: 1759485716349073.png (248 KB, 2820x1601)
248 KB PNG
>>108644506
31b fares better but yeah, Q4 quants aren't particularly high quality.
>>
>>108644195
i hope you like more attention cope and "agentic" coding data instead bro
>>
File: absolute_retard.png (163 KB, 709x1105)
163 KB PNG
>>108644742
>>
>>108644842
>>108644848
Yeah I guessed the answer would be something like this. Thanks.
>>
File: 1759906247156495.png (852 KB, 1080x1106)
852 KB PNG
>>108644851
>Nyahahaha
>>
i lolified anons mendo card if anyone want her https://files.catbox.moe/y4za8l.png
>>
>>108644851
I only use f32
>>
>>108644732
Well at some point out of desperation I ran a llama 3 1B model on it, which worked, but that's worthless
>>
>>108644869
i only use double precision
>>
File: 1759061755858768.png (68 KB, 1551x206)
68 KB PNG
>>108644829
Nobody can
>>
File: 1569566339879.png (166 KB, 694x632)
166 KB PNG
What's the best vibecoding plugin in vscode that can connect to OAI Compatible?
>>
File: 1749034134206952.png (588 KB, 1440x810)
588 KB PNG
>>108644868
>That tumblr style
>>
>>108644877
>I realize now that my current upload is an experimental collection of models rather than a function 2.5T model
it's like saying "I just realized that I put my shoes in the freezer instead of putting them in the closet."
>>
>>108644559
Yes, I know. I expected it to suck but I wanted to see for myself how badly.
I have a decent setup which can run qwen3.5-27b at full 261K context (4090D 48GB + 3090) but I would really like to run stuff in the 100-400B range locally. I have to decide whether to swap the 3090 for a 6000 Pro Max-Q or maybe buy a max-RAM M5 mac studio when they are released.
>>
Can you use the llama.rpc backend to do PP on one machine and inference on another?
That should work better than trying to do a bit of both through the network right?
>>
>>108644912
it happens
>>
File: 1558206602155.jpg (19 KB, 249x291)
19 KB JPG
Can a single backend serve two frontends? I want to run coding and to test the coded app I need to connect it to the backend that's already occupied. I think kobold has some multiuser stuff but is that what I want?
>>
>>108644848
>(logit to logit) distil
Why does no one do this anymore? Is it more difficult compared to basic finetuning or are there some non-obvious downsides?
>>
>>108644944
>Can a single backend serve two frontends?
yes, I'm running llamacpp server's UI and SillyTavern at the same time with the llamacpp server backend
>>
>>108644945
because anthropic doesn't give you logits through any shape, way or form of model access
>>
>>108644945
You can't reeeeeeally do that between different model families with different tokenizers (there are some techniques but those suck) and you don't have access to the logits of cloud models.
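For reference, when you do have both models locally and they share a tokenizer, the logit-to-logit objective itself is short. Generic torch sketch with made-up shapes, not anyone's actual training code:

import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    # soften both distributions, then pull the student's toward the teacher's
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logprobs, t_probs, reduction="batchmean") * temperature ** 2

# toy shapes: (tokens, vocab); in reality these come from forward passes over the same batch
student = torch.randn(8, 32000, requires_grad=True)
teacher = torch.randn(8, 32000)
loss = distill_loss(student, teacher)
loss.backward()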
>>
>>108644952
Moreover they do not even return direct thinking tokens

https://platform.claude.com/docs/en/build-with-claude/extended-thinking#summarized-thinking
>>
>>108644952
I mean even the labs with direct access to the teacher model. I believe it was Meta that started the trend of calling finetuning "distillation".
>>
>>108644944
Blessed be batched/parallel decoding.
>>
>>108644952
one could approximate it with a very high temperature and repeated generations
>>
>>108644927
you'd think so but it's incredibly slow
i tried to vibe-slop it into submission but it's still slow af
and it's not network bandwidth, it's just as slow testing on the lo interface
>>
>>108644983
but nobody making 'Opus4.6-Distillation-6700000x-extreme-superhigh-max-reasoning' gives a shit
>>
>>108644983
>and repeated generations
That would get prohibitively expensive really fast.
>>
>>108645003
which is why it's like chink lab 'espionage' campaign
>>
>>108644881
You're absolutely right.
>>
>>108644998
Shame. That's probably the one case where splitting processing over a 10 gigabit home network could, maybe, make sense.
Would also allow you to perform prompt processing on an nvidia machine and inference on, say, a mac.
>>
>>108645003
>That would get prohibitively expensive really fast.
yeah, probably only the chinese labs
i'm just guessing that's how they do it
>but nobody making 'Opus4.6-Distillation-6700000x-extreme-superhigh-max-reasoning' gives a shit
agreed, those unsloth retard loras
i remember someone did a qwen2-14b logit distill of the 405b llama-3.1 with a schitzo vocab swap + healing token thing a while back
>>
>>108645021
>schitzo vocab swap + healing token thing
That would be Arcee. Predictably, it was unusably retarded.
>>
Copypasting corpo synthslop won't make a better model though
>>
>>108644944
You can but its gonna have to reprocess the whole input more often.
>>
>>108645048
Not necessarily thanks to the slots functionality inherited from llama.cpp.
As long as you have more than one slot at least, that is.
>>
Is there a world knowledge benchmeme out there? Asking models questions that require specific knowledge and see if they give non-hallucinated responses? (e.g. When was the year album x of musician y released?) Obviously asked without internet search.
I want to see quantitative data on how this kind of stuff scales with weights.
>>
>>108645056
I have a hard time thinking most people here can afford that when most would rather have long context instead, or a higher quality model.
>>
>>108645062
Use larql and trace the residual flow through the model.
>>
>>108645070
just put your inactive slot in ram BRO
>>
Imagine doing local ai with less than 24gb of vram.
>>
File: 1772994328900860.jpg (12 KB, 251x216)
12 KB JPG
>>108645186
haha yeah imagine
>>
>>108645186
I am doing fine with 16GB
>>
I am hungry. Hungry for engram crackers.
>>
>>108645215
Q2_XXS?
>>
>>108645224
26B-A4B-it-Q8_0
>>
Imagine doing it with 8GB haha
>>
>>108645234
so 128gb system ram?
>>
please answer me

>>108643697
>>108642540
>>
>>108645261
just 32gb, most of the model is on vram and the rest on system ram
>>
>>108645186
I roll with 64GB of RAM + 8GB of VRAM, mainly using Qwen 35B, Gemini 26B and Gemini E4B.
It's pretty impressive how good such small models are compared to the 13B and 8B class models of old.
>>
>>108645255
I don't need more than 4GB actually haha
>>
>>108645283
DL link for Gemini weights?
>>
I can already think of a lot of improvements I want to make but... I can play chess with Gemmy now!
>>
>>108645300
Freudian slip because I use Gemini a lot for work.
>>
>>108645309
Kek, what are you gonna do if she beats you? Quantize her down until you win?
>>
>>108645309
gemini is actually pretty decent at chess for an llm, curious how gemma performs for you
how are you representing the board state? I feel like that's always really tricky to get right and probably the biggest obstacle for llms to be able to play effectively
>>
>>108645324
I'm pretty bad so I wouldn't be surprised if she did win, mostly just wanted to see if it was even possible but seems quite promising so I'll refine the UI a bit so it's easier for me to play (right now I'm just using curl) and it automatically notifies her about moves made and so on.
>>
Gemma 4 really brought back MCP into local
>>
>>108645309
Seconding >>108645355's question.
I was thinking of doing something like that using PGN format.
>>
is it possible to make gemma to response instead of me?
>>
>>108645355
What I did was ask it and it said (paraphrasing) if you give me two tool calls, one to get the current board state in FEN format and the other to make moves then it should work.
So I made a basic chess server with a Ruby chess engine (https://github.com/pioz/chess - which already outputs FEN and understands UCI moves etc) under the hood, hooked up the tool calls to that server and seems to work just fine.
It'll be interesting to see how it goes over a long game though, first I need a better way of making my own moves that isn't curl...
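The tool definitions are roughly this shape, OpenAI-style function schema (names and the chess service URL here are just illustrative, not my exact setup):

import requests

CHESS_SERVER = "http://127.0.0.1:4567"  # placeholder port for the Ruby chess service

tools = [
    {"type": "function", "function": {
        "name": "get_board_fen",
        "description": "Return the current board position in FEN notation.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    }},
    {"type": "function", "function": {
        "name": "make_move",
        "description": "Play a move given in UCI notation, e.g. e2e4.",
        "parameters": {"type": "object",
                       "properties": {"uci": {"type": "string"}},
                       "required": ["uci"]},
    }},
]

def dispatch(name, args):
    # forward the model's tool call to the chess service and return the result as text
    if name == "get_board_fen":
        return requests.get(f"{CHESS_SERVER}/fen").text
    if name == "make_move":
        return requests.post(f"{CHESS_SERVER}/move", json=args).text
    raise ValueError(f"unknown tool: {name}")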
>>
File: 1765300142692930.jpg (126 KB, 772x525)
126 KB JPG
>>108645429
Ask gemma to teach you english first
>>
>>108645455
sir please i know about respond
i want gemma to talk to gemma
>>
>>108645483
"Impersonate" option exists on some front ends.
You can make it respond as "user" to its own outputs.
>>
Wait:
>>
>>108645483
Use the bouton impersonate on sillytavern
>>
Do lesser boards not have jannies or what >>>/n/2071030
>>
>>108644560

ty

I guess the best way is not to stuff too much down her throat
>>
>>108645309
Now play strip chess
>>
>>108645524
Complain on your own board tourist
>>
>>108643794
OK thanks.
>>
>>108645500
>>108645506
not like that
i want to give the llm a prompt like "ask llm to write function x, then ask to write function y, make sure it does z, show the output"
>>
>>108644849
how long was the context for that?
>>
>>108644868
why does this remind me of gorillaz
>>
>>108645565 Meant for >>108643895.
>>
>>108645544
That part is possible cause she has image gen capabilities already.
>>
>>108645658
>elbows on board
dumb clanker
>>
>>108645671
I blame illustrious for that more than anything, but at least its fast.
>>
Orb-anon, any plans to introduce image gen and other external tool calling related things?
>>
https://huggingface.co/moonshotai/Kimi-K2.6
New 404 page just dropped
>>
File: 1771440662324531.jpg (151 KB, 840x744)
151 KB JPG
>>108645658
I love this thread bros
>>
>>108645710
>>
>>108645725
benis.
>>
>>108645725
:DDDDDDDDDDDDDD
>>
>>108645752
:(((
>>
File: Robo-Wife.mp4 (2.59 MB, 720x480)
2.59 MB MP4
soon
>>
>>108644235
>Even if you had the method, how slow would that be on consumer hardware?
Doing a rank one lora on the context?
>>
File: Bam-Bam-Painting-min.jpg (47 KB, 535x401)
47 KB JPG
People keep saying that LLM's are state-less machines.

If so, how to erase the context freeing VRAM?

Also, I can have several chats running in llama.cpp
How on Earth do they manage to separate them from each other in VRAM, so one chat's context does not spill over into another?
>>
>>108645658
did you make it so it keeps a specific style once it has chosen one? for example always that cute loli?
>>
>>108645758

what a time to be alive
>>
>>108645758
an ai image of an uncanny ai
>>
>>108645774
hello doctor
>>
>>108643971
I always click on these troll links. Umm.

https://huggingface.co/moonshotai/Kimi-K2.6
https://huggingface.co/moonshotai/Kimi-K2.6
https://huggingface.co/moonshotai/Kimi-K2.6
>>
>>108645658
what frontend is this?
>>
>>108645772
Yep, I let her choose then I added the look she chose to the system prompt so it stays consistent between new chats.
>>
>>
>>108645792
pretty nice
>>
>>108645658
Can you plap her if she loses?
>>
>>108645790
LM Studio
>>
>>108645785
>108645785
K2.6 is out
>>
https://www.kimi.com/blog/kimi-k2-6

Wish there was a GLM 5.1 comparison.
>>
>>108645767
>People keep saying that LLM's are state-less machines.
Yes, but intermediate results can still be cached. That's what the kvcache is.
>If so, how to erase the context freeing VRAM?
On llama.cpp, you can't free allocated memory.
>Also, I can have several chats running in llama.cpp
Yes if you have multiple slots. Read llama-server -h for --parallel, --cram and probably some others. Read the whole thing.
>How on Earth do they manage to separate them from each other in VRAM, so one chat's context does not spill over into another?
Uh... a slightly more complicated version of if (slotctx < ctx / slots) ok; else notok; I suppose.
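Quick way to see the slots thing, assuming you started llama-server with --parallel 2 or more and the usual OpenAI-compatible endpoint (python sketch, port is whatever you used):

import threading
import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # one single llama-server

def ask(prompt):
    r = requests.post(URL, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }, timeout=300)
    print(r.json()["choices"][0]["message"]["content"])

# two "frontends" talking at once; each request lands in its own slot
threads = [threading.Thread(target=ask, args=(p,))
           for p in ("Frontend A says hi", "Frontend B says hi")]
for t in threads:
    t.start()
for t in threads:
    t.join()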
>>
File: Capture.png (85 KB, 609x1071)
85 KB PNG
>>
>>108645785
>not 404
OWNED!!!!!!!!
>>
https://huggingface.co/moonshotai/Kimi-K2.6
https://huggingface.co/moonshotai/Kimi-K2.6
https://huggingface.co/moonshotai/Kimi-K2.6

it's out
>>
>>108645798
>trimming his pretty pretty hair
>>
>>108645842
Not falling for it again
>>
File: 1765651225005855.png (107 KB, 1081x1780)
107 KB PNG
>>108645842
another moe, I wonder if vision will be better than gemma4
>>
File: Untitled.jpg (148 KB, 1288x1188)
148 KB JPG
>>108645842
mfw seeing a model i cant run even with a q1 quant
>>
>>108645844
This kills the Garm
>>
>>108645861
>400M vision encoder
Doubt
>>
File: file.png (10 KB, 93x539)
10 KB PNG
>>108645861
wtf, gemmy has 550m param vision encoder and it's only 31b
that seems very disproportional, or is their moon thingy that efficient?
>>
>>108645842
>4. Native INT4 Quantization
>Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking.
So natively accelerated on blackwell? I don't even know if it's possible with gguf/q4.

>Kimi-K2.6 has the same architecture as Kimi-K2.5, and the deployment method can be directly reused.
Less llama.cpp drama, good.
>>
>>108645849
lmao it's true though
>>
>>108645834
I don't believe them. I've been using the k2.6 preview to vibecode via their subscription and it's clearly dumber than mimo v2 pro, which I already put some steps below codex.
I am cancelling it and trying mimo via xiaomi directly this month.
>>
>>108645842
im poor
>>
moonshota AI
>>
>>108645758
I look like this
>>
>>108645914
>falling for "Not falling for it again" posts
kek
>>
>>108645955
let's get it on then.
>>
File: file.jpg (248 KB, 2100x1349)
248 KB JPG
>>108645894
Does gemma have vision benchmarks?
Because kimi does.
>>
>>108645864
dogbros... we lost!
>>
>>108645945
moonloli AI WHEN!?!?!?
>>
>>108643872
KLD is not a capabilities benchmark.
Noise floor is a thing (he should test KLD of BF16 vs BF16 offloaded on different hardware, or with a different -ub, as llama.cpp produces different logits depending on those values).
>>
>>108645864
Is that a cookie?
>>
>>108645992
no, a DOG
>>
>>108645992
A mode collapsed dog
>>
>>108645842
Yeah I'm not falling for it again.
>>
File: 1763698178837198.jpg (46 KB, 533x594)
46 KB JPG
>>108645982
since it's moonshot_ AI
the counterpart should be moonlol_ AI
>>
Waiting to see the quantizations sizes.
https://huggingface.co/unsloth/Kimi-K2.6-GGUF/tree/main
>>
>>108645579
You can use tool calling I guess. It runs another inference engine or API with the prompt and returns the result.
>>
>>108645993
>>108646000
Mode collapsed as in model collapse?
That's hilarious.
>>
>>108646001
your loss
>>
>>108644302
>bloat
If it works then does it matter? One would require you to spend time vibe coding and then an unknown amount of time fixing and improving the vibe coded shit. The other is just ready made and you just follow the instructions.
>>
>>108646017
Mode collapse as in mode collapse. Somewhat similar concepts, different technicalities.
https://en.wikipedia.org/wiki/Mode_collapse
https://en.wikipedia.org/wiki/Model_collapse
>>
File: migu D.jpg (19 KB, 303x325)
19 KB JPG
>>108645752
>>
>>108645894
it's this
https://huggingface.co/moonshotai/MoonViT-SO-400M
>>
>>108645837

thank you, kind anon
>>
>>108646010
MOON SHOTA
>>
>>108645861
K2.5 has absolutely amazing vision and visual knowledge about characters. I hope they didn't fuck this up in K2.6 if it's as code-focused as the hf page implies
>>
>>108641448
I'm gonna try anon, but I think this part of Tavern's UI is stronger than me.
>>
>>108646016
this could work
>>
>>108640471
I'm not reluctant, I'm still working on it until it's in a presentable state and I've fixed some issues, I'm pretty close though

>>108642791
what
>>
goof?
>>
>>108646124
https://huggingface.co/unsloth/Kimi-K2.6-GGUF
currently unslopping
>>
I refuse to believe people here can run 1.5T models
>>
>>108646157
>he doesnt have 512gb ram to infer at 10t/s~
LOL!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>108646157
Why?
Some people have DDR4 servers with a GPU or two, it's not that outlandish.
Or a 512GB Mac, I guess.
>>
>>108646131
Oh boy. I can't wait to rape my SSD with terabytes of goofs when they update them for the umpteenth time.
>>
>>108646157
a year ago ram wasn't that expensive
>>
>>108646057
I'm a genuine retard. I was using Text completion this whole time, I thought he was talking about Chat completion.
>>
>>108646197
>>108646197
>>108646197
>>
>cries at Gemma Q2_K 2-3t/s
>>
>>108645658
Is the generated image in the context now?
>>
>>108646157
It's a bit over 1T, 4bit QAT and 30b active parameters. 500GB RAM and a decent GPU isn't that unreasonable provided somebody built the server before last september
>>
>>108646157
Just print out the weights and do the matrix multiplications yourself???
>>
KimiGODS we won.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.