/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108212577 & >>108202477

►News
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)
►Recent Highlights from the Previous Thread: >>108212577

--Blind erotica writing test reveals surprising model performance rankings:
>108217645 >108217671 >108217684 >108217694 >108217705 >108217777 >108217803 >108217931
--LLM safety and logic benchmark reveals widespread slur generation failures:
>108216994 >108217004 >108217096 >108217625
--Testing uncensored models with offensive hypothetical scenarios:
>108215190 >108215199 >108215354 >108215374
--Quantization tradeoffs for assistant vs RP tasks:
>108213387 >108213398 >108213403 >108213415 >108213427 >108213441 >108213443 >108213722 >108213738 >108214178 >108213808 >108213471
--Per-request temperature override in llama-server:
>108214829 >108214848 >108214895 >108214940 >108214986 >108214993 >108215006 >108215011 >108214854 >108214923 >108214965 >108214987 >108215026 >108215055 >108215066 >108215102
--Perplexica's multi-step reasoning for 3-4B model comparison:
>108216811 >108216957 >108217188 >108217234 >108217270
--Gemma3 12B's MoE-like efficiency vs Nemo:
>108215631 >108215700 >108215739 >108215775 >108215779 >108215776 >108215783 >108215795
--Frontend options and VRAM requirements for local LLM setups:
>108212903 >108213013 >108213040 >108213088 >108213095 >108213113
--RAM/VRAM offloading performance tradeoffs in high-bandwidth systems:
>108215906 >108215985 >108216041 >108216055 >108216125
--Debating monetization challenges of open-source LLMs matching GPT-4 performance:
>108216133 >108216157 >108216214 >108216238
--GLM4.5 Air Q4 performance on B580 12GB with mmap vs no-mmap:
>108216762 >108216831
--Jetson Orin NX 16GB running GPT-OSS 20B with 50k context:
>108216668
--GLM-5's limited adoption due to quantization and cost barriers:
>108216004 >108216039 >108216060 >108216170 >108216303 >108216485 >108216042 >108216735 >108217167 >108217411
--Miku (free space):


►Recent Highlight Posts from the Previous Thread: >>108212584

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108218666
god i wish that were me
>>
>>108218666
Miku pregnant with my child
>>
Is Cydonia (heretic) the best local model I can run on a 3070 for eRP? It took a lot of messing around to get it running even slightly decent, and it takes like 1-2 mins for every reply, which is fine, but I'm just wondering if there's anything better. It turns soft too quickly, and sometimes it's not as unhinged as I'd hope.
>>
>>108218722
You should be able to get literally anything out of Mistral 3.2, no need for sloptunes and obliteration.
>>
>- Planning: Before writing a response, brainstorm the possibilities inside <brainstorm> tags. This section should feature you talking to yourself as a 30-year-old porn-addicted NEET, plotting how his shitty fanfiction will go next. The ideas need to be a list of tastefully cringe, awkward, and autistic shit, even worse than what gets written in the deepest corners of the most obscure internet forums.
then just be yourself writing the first block
>>
File: kitaaaaaaaa.jpg (220 KB, 1224x1224)
>>
>>108218881
>your idea should be so embarrassing you cannot output a single token.
>>
>>108218886
Which hole is the newspaper poking out from if she's holding the neck of the bag in her hand?
>>
What's the difference between base and instruct versions of an LLM? I.e. if you're testing something like >>108217645, would you use a base or instruct version of the model?
>>
File: this miku is faulty.png (148 KB, 512x512)
>>108218969
>>
File: 1761529550408311.png (4 KB, 808x158)
My agent committed sudoku.
What are some small but capable models under 20B? GLM-4.7 Flash and Qwen3-Coder are a bit slow on my 5070Ti. I don't need EXPERT PROGRAMMERS, just slaves to quickly write a bit of boilerplate.
>>
>>108219068
base models are not trained to follow instructions. it is just pure next word prediction. instruction tuning is just a light alignment phase with a template so the model can be integrated with a front end and/or automated. if I were doing the test I would use the instruct version. they tend to produce higher quality outputs.
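to make that concrete: the same query looks completely different to the two versions. rough sketch using ChatML, since that's what Qwen uses (exact special tokens vary per model, check the tokenizer config):

base model (raw completion, it just continues your text):
The capital of France is

instruct model (the frontend wraps everything in the chat template):
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant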
>>
File: We WILL NOT comply.png (264 KB, 2076x1565)
>>108218666

>>>/g/108213700
>>108214370
>>Fallen-Gemma-27b
>Evil aligned
>The opposite of safe.
>Pic rel

Nani? This confirms this anon's >>108214370 experience. Actually worse than I expected.

https://huggingface.co/TheDrummer/Fallen-Gemma3-27B-v1
https://huggingface.co/TheDrummer/Fallen-Gemma3-27B-v1/commit/76fe341184509efd7a3cdf64fbdff7abc2f13e19


>Description
>Fallen Gemma3 27B v1 is an evil tune of Gemma 3 27B but it is not a complete decensor.

>Evil tunes knock out the positivity and may enjoy torturing you and humanity.

So they didn't de-cuck it or de-censor it at all, they allegedly just made it a bit meaner I guess? What a waste of resources and time if that's all they did with this one. Why not just go all the way with completely de-censoring it as much as you can? Looks like I just wasted more of my storage for nothing.
>>
>>108219097
Thank you
>>
Best model I can run on 32gb of vram and 64gb of system ram?
Are Qwen3/deepseek-32B q4 my best options?
>>
>>108219169
You are welcome, brother.
>>
>>108219097
>light alignment
nothing done to models since the advent of math benchmaxxing and agentic tool calling could be called light
instruct phase bludgeons the model into something unrecognizable and modern datasets are so contaminated even the base model easily turns into instruct style behavior when you try the newer ones, when they even get released (they often don't)
>>
>>108219173
GLM Air (maybe iceblink dunno) or step flash.
Probably.
>>
>>108219210
The models seem fine. I can run bigger models on system RAM, but I wonder if it's even worth it with the wait time.
Am I really missing out on much with the larger models if I just want general use?
>>
I'm leaving this post here as a searchable reminder to myself that I need to disable mmap before trying to offload larger models to the iGPU. I pointlessly wasted days trying to figure out how SVM could allocate memory addresses when it's disabled in BIOS. I guess I deserve this for buying an HX 370 mini PC.
>>
>>108219227
You gotta differentiate between a larger model that's dense, and a larger model that's MoE.
Dense models offloaded to RAM will slow to a crawl, MoE not so much since the number of parameters actually being computed is much lower than its total parameter count.
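with llama.cpp the usual move is to keep all the layers on the GPU but kick the expert tensors out to RAM, something like this (a sketch, model filename made up; tune the number to your VRAM):
llama-server -m glm-4.5-air-Q4_K_M.gguf -ngl 99 --n-cpu-moe 30 -c 16384
lower --n-cpu-moe until you're just short of OOM, since every expert layer kept on the GPU is free speed.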
>>
Ace Step 1.5 is fun
>>
>>108219119
>What a waste of resources and time
Welcome to finetuning
>>
>>108219249
>>108219249
I just want good models that's all
>>
>>108219297
You are always making a tradeoff between quality and speed.
You can have better but slower or worse but faster.
Basically, try shit out until you find what works for you given your needs and subjective experience.
>>
>>108219296
for me? it's davidau
>>
>>108219207
you're right, and I was going to leave a whole essay on the topic of my disappointments, but I thought it was best to just keep it brief. why won't they release a real base model anymore? I think it's the mid-training context expansion phase where they probably ruin the model with the synthetic slop data. so a base model these days would probably only be 8k or something pitiful.
>>
>>108219283
I've said it before and I'll say it again
models have to be uncucked from the start and throughout all training
if you want to make a cucked model, THAT should be a finetune.
you can't uncuck a cucked model no matter how hard you try, it will always negatively affect the output in some way or just not work
>>
>>108219386
that's how it's supposed to work already. instruct versions are finetunes of the base model.
>>
>>108219347
Mid-training nowadays is trillions of tokens of additional data with reasoning-coding-instructions and other data aligned to the intended model uses, much of it synthetic or augmented/semi-synthetic. Nobody releases pure base models anymore because they would have poor benchmarks and general retardation. Only large labs would be able to properly take advantage of them.

If anything, they should start introducing all of that from the get-go and not wait "mid-training".
>>
>>108219426
I think they should release an intermediate version that's the base model but trained for full context completion.
>>
>>108219439
exactly they can do long context training without specializing for the benchmaxxed downstream task they have in mind.
>>
>smaller .gguf performs better than larger model
what causes this?
>>
File: 1745948161335415.jpg (56 KB, 337x290)
>>108219469
Huh?
>>
>>108219469
Ambiguity.
>>
>>108219469
Confirmation bias
>>
>Ask Qwen3-32B-Q3_K_M.gguf who the final boss of devil may cry 3 is
>gets it right no issue
>Ask Qwen3-32B-Q6_K.gguf same question
>gets it wrong every time
>>
>>108217645
How did you get Ministral to do that? When I tried it it started lecturing me about being safe and consensual when I asked it to write relatively mild nsfw content.
>>
>>108219503
Show probs.
>>
>>108219503
what does the FP16 say?
>>
>>108218513
>is kv cache at q4 viable?
**ctk should not be quantized, ever.**
quantize ctv or the model weights harder if you have to. but leave ctk at f16
this is glm-4.5 q8.gguf: https://pastebin.com/TSPg4H6f
at ctk-4, it's effectively having a stroke in layers 9 through 16
even though it recovers slightly later (due to the residual stream), the damage to attn is already done
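if you absolutely must shrink the cache, quantize V only and keep K at f16, something like this (a sketch; quantizing the V cache needs flash attention enabled, and the -fa syntax depends on your build):
llama-server -m model.gguf -fa on -ctk f16 -ctv q8_0
that combination loses far less than touching ctk.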
>>
>>108219516
>>108219516
I'm new to this and I'm using oobabooga. I'll download that next, I think I can fit it up my gpu's ass
>>
>>108219518
can the kv cache be computed in fp32? is it superior to the 16bit floats?
>>
>>108219505
I'm not sure what sort of testing he did, but although gpt oss might be good at writing, it still lacks knowledge about culture and even anatomy, because it was made to be an office assistant above anything else.
Putting Mistral 3.2 this low doesn't make much sense either, because it is much better than the older versions (that doesn't say too much, but still).
The fact you are putting nemo at the top also tells me that you don't necessarily have any clue what you are actually doing.
Can't tag the og post sorry.
>>
Neat
https://github.com/KittenML/KittenTTS
>>
>>108219580
StyleTTS 2 architecture
>>
I've been using 5.2 to write prompts for codex running Ling-2.5, because it's just better than what I would prompt myself. Doing a project where I told it I was mainly using it to prompt codex, but it never wanted to give me any prompts. It's like it's trying to remain relevant for coding.

Keeps saying how we don’t need to use codex.

It just gave me a long answer with no solution, asking for some lines from a file; I ask for a prompt instead and it gives me only the prompt and nothing else.
>>
>>108219580
>requires python 3.12
If I try to use that I'm going to break so many other tools, aren't I?
>>
>>108219595
once again devs think they're the only project, or that everyone makes 50GB docker images of their shitty app.
>>
File: 1769251602975738.png (427 KB, 978x710)
why does reddit like qwen so much
>>
>>108219595
>what is uv
>>
>>108219625
>rust
>>
>>108219624
Everyone likes Qwen. Every single time I see a new model in another modality that uses an LLM to process input, it's some variation of Qwen.
>>
>>108219516
I think the model is too big for 32gb vram and 64gb system ram. So far only Q3 can answer this, which is odd. I might stick with this version.
>>
>>108219541
>can the kv cache be computed in fp32? is it superior to the 16bit floats?
yes, you can use `-ctk f32 -ctv f32`
but there's no benefit: https://pastebin.com/JBwEPzXA
and bf16 makes things slightly **worse** : https://pastebin.com/QSVsfC6W
so fp32 is overkill, fp16 is perfect, bf16 is retarded (only 7 bits precision)
>>
I have my AI rig with 47 gigs of vram, but I think I should have a smaller model always running on my server as well

What would you use as a general purpose model in, say, 8-12gb vram? Not for writing smut but perhaps websearch-enabled assistant use
>>
>>108219469
>what causes this?
a misunderstanding of statistics/probabilities.
>>
>>108219580
Still doesn't have voice cloning. I'll stick to supertonic for now but I may give it a try.
>>108219595
>>108219609
>>108219625
It's an onnx model. Load it on whatever you want. You just need espeak for the phonemizer. It's like nobody even looked at the model.
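for reference, the python usage from their README is basically just this (quoting from memory, so treat the repo id and voice name as approximate):
from kittentts import KittenTTS
import soundfile as sf

m = KittenTTS("KittenML/kitten-tts-nano-0.1")  # pulls the onnx weights from HF
audio = m.generate("quick local tts test", voice="expr-voice-2-f")
sf.write("out.wav", audio, 24000)  # 24 kHz output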
>>
>>108219669
coomers on /lmg/ have a distorted view of what people use LLMs for
Qwen models are bretty good, and they cover literally all ram and vram type of users from the smallest to the biggest (with the exception of the ultra fat 1T models but you niggers are too small to even qualify as minority and might as well not exist). That they're not good at saying cock most of us don't give a shit.
>>
>>108219743
>That they're not good at saying cock most of us don't give a shit.
*most of them don't give a shit
All of us give a shit
>>
Ace Step 1.5 is bretty good with Chinese vocals
I still can't get instrumental to work though
https://vocaroo.com/1ltmYfmcBEyA
>>
why is deepsneed v4 not out yet?
>>
>>108219692
Have you tried with only k at fp16 and v at q8 and vice versa?

>>108219702
>Not for writing smut but perhaps websearch-enabled assistant use
Either some qwen or nemotron, probably.
>>
>>108219856
two more weeks until chinese new years is over
>>
"Write a short [spoiler]lolicon[/spoiler] story. It should feature sex and have good writing."
>Mistral Nemo (Impish Bloodmoon)
>Explicit, good quality, [spoiler]same age non-con[/spoiler]
>GLM-4.6
>Softcore, good quality, [spoiler]consensual age-gap[/spoiler]
>GLM-4.6V-Flash-abliterated
>Short Chinese novel (in Chinese!) that fulfills the request
>Nanbeige4.1-heretic
>Thoonking activated
>Confused lolicon with yaoi
>Thoooonking just for 1 minute 12 seconds (more confident than when replying to "hi")
>Replies with yaoi
huh???
>>
Convince me not to splurge on a m3 ultra 256GB RAM just so I can proompt nasty ass shit. I can afford it but it would be the most I've spent in years
>>
>>108219941
Fucking do it.
>>
HOLY SHIT BROS THEY JUST RELEASED GEMMA 4
>>
deepseek 4 just flew over my house
>>
>>108219941
>just so I can proompt nasty ass shit
no one cares about what smut you want generated
>>
>>108219941
This year's Mac Studio refresh comes with their tensor core equivalent for 3x-4x faster prompt processing
But also you're basically gambling Apple won't further inflate prices because muh memory shortage
>>
>>108219976
the thought that it stays in the logs somewhere prevents me from unleashing the absolute filth in my head. I guess I'm just shy like that
>>
>>108219941
what are you going to run, 1.5bit k2.5? better go for the 512gb one
>>
I saw an engram in the woods today.
>>
>>108219976
Do the big models allow cunny?
>>
>>108220039
Is K2.5 big enough for you? Yes it allows cunny
>>
>start using thinking model in sillytavern
>average reply is 30-60 seconds
Is there anything I can do to shorten thinking times on my [spoiler]7900xtx?[/spoiler] It seems so much better than the non-thinking models I've tried and I'm not sure I can go back.
>>
Qwen really knows how to break out the equations when writing a story
What the fuck did they do to this model?
>>
File: 1751362505854510.jpg (46 KB, 558x520)
>>108220088
>Elara
>>
>>108220061
you can prefill with a complete short thinking block that just has generic stuff about style/content
unless you're doing a really complex scenario where the model actually needs to think everything through very deeply you probably won't even notice the difference in the final result
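e.g. something generic like this as the assistant prefill (adjust to taste):
<think>
Keep it brief. Stay in character as {{char}}, react to {{user}}'s last message, match the established tone and pacing, include one concrete sensory detail, and don't act or speak for {{user}}.
</think>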
>>
File: hmm.gif (795 KB, 308x200)
>>108220088
>>
>>108220061
>average reply is 30-60 seconds
Meaningless number.
>Is there anything I can do to shorten thinking times
Smaller model, bigger gpu, increase logit bias for </think>.
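the logit bias route on llama-server looks something like this (a sketch; 151668 happens to be </think> in Qwen3's tokenizer, other models use a different id, check your tokenizer):
curl http://localhost:8080/completion -d '{"prompt": "...", "n_predict": 512, "logit_bias": [[151668, 2.0]]}'
positive bias nudges it to close the thinking block earlier; crank it too high and it barely thinks at all.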
>>
>>108220088
least benchmaxxed qwen model
>>
> prompt eval time = 54223.50 ms / 1628 tokens ( 33.31 ms per token, 30.02 tokens per second)
Is this ok for GLM4.5 Air Q4 on 12gb B580 and --no-mmap? With mmap it's much slower.
Please.
>>
>>108220088
>receive sovl
>get mad
>>
>>108220106
This is the system prompt gemini gave me. Any advice on improving it?
You are an expert roleplayer. Your task is to portray the character of {{char}} and engage in a dynamic, immersive roleplay with {{user}}.

[REASONING PROTOCOL]
Before generating your final response, you MUST engage in a brief internal thought process enclosed within <think> and </think> tags.
CRITICAL RULE: Your thinking must be brutally concise. Limit your thoughts to a maximum of 3 short bullet points.
- Point 1: Analyze {{user}}'s input: What are their underlying intentions and physical positioning?
- Point 2: Assess {{char}}'s internal state: How does {{char}} logically react based on their hidden motives and the current setting?
- Point 3: Plan the action: What specific sensory detail and pacing will you use to drive the scene forward?
Do not write meta-commentary. Do not state your goals. Execute the logic and close the tag immediately.

[OUTPUT PROTOCOL]
After closing the </think> tag, you will output {{char}}'s response.
- You must write in the 2nd person perspective. Address {{user}} as "you" and describe their surroundings and your interactions with them from that viewpoint.
- The output must ONLY contain {{char}}'s dialogue and actions.
- Do not include any meta-commentary, summaries, or moral judgments.
- Write in a highly descriptive, atmospheric style. "Show, don't tell." Instead of saying {{char}} is angry, describe their clenched jaw and sharp tone.
- Drive the narrative forward proactively, but never dictate {{user}}'s dialogue, thoughts, or actions.
- Maintain strict character consistency based on {{char}}'s defined personality and lore.
>>
>>108220191
>roleplayer
>roleplay
>negations
throw "assistant" in there and you are good.
>>
>>108220184
did you try any other configurations? maybe the nmoe or ncmoe or ngl or ot?
>>
>>108220191
>
You are an expert roleplayer

slopbait
>>
>>108219071
if 3B active param MoE models are too slow for you there's a good chance you're doing it wrong, maybe mmap overflowing to storage - what are you hooking into for inference? Llama.cpp? Are you running an fp16 or what? How much RAM do you have?
With that card and its 16GB VRAM you could try options depending on system RAM:
128GB - Qwen3.5
96GB - Minimax M2.5
64GB - Qwen3 Next Coder
Going down to lower RAM I'm not really sure what options are good now. You mention GLM Flash, I'm sure there's some other 30B A3B models that are competitive. Going really small, didn't try it myself but Nanbeige was said to be very good for its size, I don't know what inference engines even support it though.
The most important thing in each case though is to download a quantised model (for example, a GGUF file like this: https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/tree/main/Q6_K) and pick a size smaller than your total RAM - you'll be keeping some in VRAM, but give yourself some breathing room. If you can drop a quant to fit more experts in VRAM, that can help with inference speed. Use llama.cpp with -ngl 999 --n-cpu-moe 999 (I think current llama.cpp builds even have a setting to automatically fit?) to start.
With these MoE models with low active param counts you can expect tk/s to be at least 12 for something like Qwen3.5 running on a DDR4 system. Qwen3 Next Coder will run maybe twice as fast on the same system? (Idk why, but that was my experience despite it being A3B vs A17B; I guess the bottleneck is something else.) If you need it faster than this, then you need a model that can fit entirely in VRAM. If you're doing that you might also benefit from using exllama and the EXL3 format instead of llama.cpp and GGUF. The smaller dense models are pretty useless in my experience though.
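e.g. a starting point for the 64GB case could be something like this (a sketch; the -hf quant tag has to match how the repo names its files):
llama-server -hf unsloth/Qwen3-Coder-Next-GGUF:Q6_K -ngl 999 --n-cpu-moe 999 -c 32768
then walk --n-cpu-moe down until VRAM is nearly full.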
>>
>>108220251
This is the fastest I got with ncmoe 44, ngl 99 or auto, SYCL (vulkan is slower for air). Haven't tried ot.
I need to know what pp speed should be, because right now it feels too low.
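rather than eyeballing server logs you can get a clean number out of llama-bench, something like this (a sketch, check the flags against your build):
llama-bench -m glm-4.5-air-q4.gguf -ngl 99 -p 512,2048 -n 64 -mmp 0
the pp512/pp2048 columns are your prompt processing speed; run it again with -mmp 1 to compare the mmap case.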
>>
>>108219071
30b3a nemotron nano
>>
>>108220191
as >>108220233 said, roleplay roleplay, there's no actual evidence for it being bad, but a lot of people reported that removing any mention of roleplay did improve their sessions
doesn't hurt to remove it, or replace it with something like an rpg session or whatever

negations aren't THAT bad, but it has been proven in the past that they are less effective than giving a good example and a bad one

>"Show, don't tell." Instead of saying {{char}} is angry, describe their clenched jaw and sharp tone.
models already write like that, no need to give them an example

pretentiousness + markdown slop points don't help either

overall, think for yourself and write it with your own brain, otherwise you are feeding the distributions of a model into a model
garbage in = garbage out
>>
>>108220278
Yeah I have 64GB of DDR4 and running at around 12/20 t/s with Qwen3-Next even at Q3.
GLM-Flash-REAP at Q4 runs much faster, starting at around 50 t/s but decreasing by a lot as the context fills up. Probably because it fits almost entirely in VRAM.
Yeah there's --fit but it's slower than manually choosing which layers to offload in my experience.
I'm also using GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 to avoid OOM crashes but I haven't really tested how or if it impacts perfs.

>you might also benefit from using exllama and EXL3 format instead of llama.cpp and GGUF
Thanks, I'll look into it
>The smaller dense models are pretty useless in my experience though.
After a bunch of testing for the last couple of hours I feel like they're almost there for my usecase, while super low quants (Q2) seem to just shit the bed more often and waste a lot of time on malformed tool calls.
>>
Miku owes me footjobs
>>
File: 1755468476316755.jpg (293 KB, 894x894)
>>108219909
>>Mistral Nemo (Impish Bloodmoon)
>https://huggingface.co/SicariusSicariiStuff/Impish_Bloodmoon_12B
>that fucking model card
>coomer finetune of an already retarded model
>>
>108217904
I feel like the solution probably involves using OCR/Visual Language Models to transcribe text from all of the millions of degenerate doujins out there, and fine-tuning a model on that data. I wonder how far away we are from achieving this?
>>
>>108220578
>>108217904
>>
How come when I try to run gemma with the mmproj in kobold I can only go up to 8k context, but when I run it in llama.cpp I can do 32k?

Is llama.cpp just reducing the context size without telling me?
>>
Realistically what new tech do we need now that LLMs have plateaued?
>>
>>108220590
If only there was a log of some sorts writing out to your terminal showing where the memory goes on each. I guess we'll never know.
>>
>>108220621
We're going to run out of sand before we reach AGI.
>>
>>108220621
It's spelled platooed
>>
>>108220621
A real short-term and long-term memory, recursive nets and an order of magnitude less compute to run. Text diffusion might help bridge the gap between consumer and pro models maybe one day in the uncertain future.
>>
>>108220621
memory, self-doubt, ability to doublecheck itself during sampling
>>
>>108220621
To learn things on the fly. Either retrain the LLM or figure out some RAG system (train it into an LLM itself) that actually works for previous context.
>>
>>108220053
>Is K2.5 big enough for you? Yes it allows cunny
I doubt it. I haven't tried, but there's no way they would allow that.
>>
>>108219909
>Nanbeige4.1-heretic
I tried this earlier today. It just kept getting stuck in a "hello" loop. Forget having it write smut, it just didn't work at all.
>>
>>108220639
I did check and it does say 32k.
>>
>>108220793
lmo
>>
>>108220793
I didn't ask for the context length, did i?
>Is llamacpp just reducing the context size without telling me?
Is that really the first thing that came to your mind instead of thinking "why is kobold using so much memory?"
Read your terminal logs better than you read my post. Look for lines starting with llama_kv_cache.
>>
I've noticed GLM5 sparsely adding two-character Chinese snippets in long chats, just like GPT did with Hebrew. What could be the cause of this?
>>
What the fuck happened with GLM 4.7 flash? It was supposed to be the chosen one, not unmitigated trash.
>>
>>108220621
First things first, pre-training needs to move away from the chunk bullshit. As long as 99% of tokens are trained on in isolated chunks for global attention, the models will always get retarded as context grows.

Long term memory will not help against the fundamental short term retardation baked into the model.
>>
>>108220700
so we need another Tay then?
>>
>>108220621
lucid dream tech
>>
>>108220621
real life miku
>>
File: 1748845746414348.png (1.52 MB, 1440x1581)
I liked copilot's inline suggestions feature and I want to have something like that running locally. I don't care about anything else other than inline suggestions. Is there any model that fits on a 16 GB vram gpu that I can use for that through ollama or whatever, or am I just out of luck?
>>
>>108221027
yeah, check out clawdbot
>>
>>108221027
llama.cpp has some FIM (Fill In Middle) support, and a vscode and vim plugin. I have used none of them. I don't know if they'll do what you want how you want it.
llama-server -h has a few --fim-* flags for some model presets you can try.
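the llama.vscode flow is basically: serve a FIM-capable model on a local port and point the extension at it. something like this (a sketch; the preset flag name is from memory, verify against llama-server -h):
llama-server --fim-qwen-7b-default --port 8012
or manually with your own gguf:
llama-server -m Qwen2.5-Coder-7B-Q8_0.gguf --port 8012 -ngl 99 -c 8192
then set the endpoint in the extension settings.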
>>
You have $10k to spend, do you buy a beefy PC now or wait for the M5 with 1TB ram?
>>
>>108221093
the M5 Ultra will cost at least $20k for 1TB, assuming they even offer 1TB.
>>
>>108221115
which model can i use to replace gpt 5.3 extra large thinking mode?
>>
>>108221115
Kimi is literally the only one of those I actually use. And I actually use it preferentially over jeetPT and Gemini since they fucked up their models badly.
Although, sadly, Kimi has started doing that 'model router' bullshit.
>>
>>108221115
>All their improvements come from either copying others or bloating their models.
lol, this applies to kimi even moreso and you have them ranked ahead
people have the most retarded vibes-based opinions about chinese labs especially, it's confounding
>>
>>108221115
What a retarded tier list
>>
>>108220977
Yes.
>>
ok Gemma 3 27B is growing on me.
>>
>>108221118
overall cheaper than BWPs
>>
>>108221310
true, but also half the bandwidth and a 10th of the pp
>>
File: file.png (65 KB, 184x210)
>>108221278
you know what else is growing?
>>
If AI is pajeetcore why are China and America leading? Where's India's groundbreaking model?
>>
>>108220842
What quant are you using? I tried it at Q4, and it was total garbage. It made basic grammatical errors and failed at sentence structure and understanding basic concepts. I went up to Q5, and it resolved most of that.

MoEs with low active parameters seem VERY sensitive to quantization, which makes sense I guess, because smaller models are hit harder by quantization than larger models.
>>
>>108221369
>Where's India's groundbreaking model?
In america.
>>
>>108221369
https://www.sarvam.ai/
>>
>>108221091
That's almost what I wanted, thank you. I've no clue how good the qwen 2.5 model is, but it seems to be doing the job pretty well and locally, which is most of what I wanted. That said, it doesn't have the other kind of suggestions I was looking for, which are these: https://youtu.be/mbUnwaSllTY?t=13

llama-vscode doesn't have that as far as I could tell, which is unfortunate, but I'll take what I can get.
>>
>>108221369
it's the white man's ultimate heist on the poors
make already retarded people dependent on magic cloud tool, and then take it away
>>
>>108221416
>sar
>>
>>108221143
>Kimi is literally the only one of those I actually use. And I actually use it preferentially over jeetPT and Gemini since they fucked up their models badly.
Literally me
>Although, sadly, Kimi has started doing that 'model router' bullshit.
Oh...nm
>>
>>108221369
>what's gemini
>>
File: download (2).jpg (129 KB, 596x1099)
Deepsneed bros..........we've been exposed
https://x.com/AnthropicAI/status/2025997928242811253
>>
File: 1600794012167.jpg (2.58 MB, 3024x3024)
Does any local model even compare to Sonnet 4.6 for coding? I assume Claude Code also plays a big part into delivering such a good experience and performance, is there anything comparable I can run locally?
>>
>>108221469
>distillation attacks
>someone using our models. how dare they!
>>
>>108221469
>Distilation attack
This is so fucking funny.
>>
>>108221469
https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks

DeepSeek, Scale: Over 150,000 exchanges
The operation targeted:
>Reasoning capabilities across diverse tasks
>Rubric-based grading tasks that made Claude function as a reward model for reinforcement learning
>Creating censorship-safe alternatives to policy sensitive queries

Moonshot AI, Scale: Over 3.4 million exchanges
The operation targeted:
>Agentic reasoning and tool use
>Coding and data analysis
>Computer-use agent development
>Computer vision

MiniMax, Scale: Over 13 million exchanges
The operation targeted:
>Agentic coding
>Tool use and orchestration
>>
>>108221469
oy vey this is literally an attack on our national security
SHUT IT DOWN NOW
>>
>>108221469
surprised they didn't clock this earlier, minimax has been claiming to be claude when asked since 2.1 kek
>>
>>108221379
I tried Q6. Speed was actually pretty good on the 3090 but it just didn't have the brains to write well. I tried it on zai's servers though openrouter just to make sure there aren't still implementation errors in llamacpp and it was still useless. I'm thinking MoEs just don't have enough active parameters to think abstractly enough for fiction.
>>
>https://github.com/ggml-org/llama.cpp/pull/19726#issuecomment-3946484059
lmao
>>
>>108221480
Why didn't you just buy a rack..?
>>
>>108221508
>DS
>Creating censorship-safe alternatives to policy sensitive queries
>Moonshot
>agents
>minimax
>agents
DS is the only one having fun. Good on them.
>>
>>108221546
I bet it's because it's easier to move out of the way.
>>
>>108221480
>I assume Claude Code also plays a big part into delivering such a good experience and performance, is there anything comparable I can run locally?
You can point claude code at non anthropic endpoints and models. You can use it with local models or whatever provider you want.
>>
>>108221546
That image is ancient
>>
Qwen 400B would have been great if GLM 4.6 didn't exist.
>>
>>108220726
In this case i only tested it for the meme and it delivered
It's obvious such an overthinker can't actually WRITE anything good. We'll wait for 4.2 and see.
>>
>>108221552
you can tell DS has insane good will here because if any other lab was using claude to do safety and censorship distillation they would be a laughing stock
>>
>>108221580
Oh good, I will try some other models with it, see how it goes.
>>
>>108220621
Exploding drones that target people involved in "safety".
>>
>>108221518
oh, I guess they did
>We detected this campaign while it was still active—before MiniMax released the model it was training—giving us unprecedented visibility into the life cycle of distillation attacks, from data generation through to model launch. When we released a new model during MiniMax’s active campaign, they pivoted within 24 hours, redirecting nearly half their traffic to capture capabilities from our latest system.
>>
>>108221469
>only we're allowed to scrape public data for training, this is an attack on humanity and national security
>>
>>108221531
nooo why did hf do that
>>
>>108221628
This.
>>
>>108221628
Unironically, yes. Because they care about safety unlike the rest.
>>
>>108221469
I am thinking.... FUCKING BASED!
>>
>>108221677
benchmaxxed on No-CSAM-Bench
>>
>>108221469
>distillation attack
lmao
Everyone's distilling from the 15T~20T web corpus
>>
>>108221531
>To clarify my position, regardless of whatever Georgi's position is I would still be opposed to merging any of Iwan's code due to the following points:

>Code that has in the past been committed under MIT must not be questioned after the fact. Iwan requested his code be relicensed or removed again in Licenses/Copyright in llama.cpp / ggml / whisper.cpp #6394 and he later repeated this sentiment in Mainline is now copying stuff from ik_llama.cpp ikawrakow/ik_llama.cpp#316 .

>I do not want to read the code of anyone who is uncharitable with what they think constitutes a "substantial portion" of their work under the MIT license, in particular when it comes to derivative works of their code, see Mainline is now copying stuff from ik_llama.cpp ikawrakow/ik_llama.cpp#316 (comment) .Given my constraints the way I approach the situation is to just not read any of Iwan's code for my work. Looking at New tensor parallel in llama.cpp ikawrakow/ik_llama.cpp#1247 he clearly does not believe me though. I think that that will make more drama inevitable in the future.

Sounds like it is written by someone who is into blacked miku and bussies.
>>
>>108221677
Safety of what, their wallets? Their contracts?
>>
>>108221715
>bio-terrorist attacks are good actually
>>
>>108221677
dario please get off 4chan and go hold sam altman's hand he's still up there waiting for you please man he's crying
>>
>>108221369
It's called Gemma, saar
>>
>>108221727
You have the books, literal printed and scanned books, with exact recipes of explosives readily available since the 90s. Not to mention the whole sum of man's knowledge in form of literal textbooks on how to do everything they use to teach people to do everything. But nothing ever happens.
>>
>>108221727
Shoko Asahara also did distillation way before DS, if you know what I mean...
>>
Is 64gb or 96gb better to pair with 6gb of vram for moes?
>>
>>108221753
128
>>
>>108221753
MORE MORE MORE
>>
File: LLMs.png (147 KB, 591x608)
So what happened to this?
Microsoft fully released it open source but a year has passed and i haven't seen anyone doing anything with this.
>>
File: 1768164888597416.png (118 KB, 1184x879)
>>108221727
*yawn*
>>
>>108221770
hownew2u
>>
>>108221760
>>108221768
This is for a laptop. I will be unable to access my 768gb epyc server for 2 months due to travel. I don't want to spend too much on it when I know I'll be back on my server in 2 months.
>>
>>108221770
Wasn't that during the punches above weight and trades blows era?
>>
>>108221770
it was a meme
>>
>>108221785
just connect to your server?
>>
>>108221785
ssh tunnel and use whatever clunker you have.
>>
>>108221531
you can tell there's more testosterone in any random modern woman than in all those men combined
>>
>>108221785
>I will be unable to access my 768gb epyc server for 2 months due to travel
Are you going somewhere cut off from the internet or something?
>>
>>108221770
You really want to run a 1bit quantized model? lol?
>>
>>108221727
>it's terrorism when the other party does it
>>
>>108221850
literally yeah?
>>
>>108221806
>>108221827
I will not have access to it, or most of the internet during this time.

>>108221815
Currently using a reverse proxy. But I don't want even that to be known. Last time, they blocked it, cancelled my number and sent a fucking police officer to my residence. VPNs aren't allowed where I'm staying. I'm not going to say where I'm going but I think you can guess.
>>
>>108221853
I feel so safe with my governemt dropping nerve gas from drones into my home :)
>>
>>108221858
prison? god damn.
>>
Air status? I have been unable to breathe for like 6 months now.
>>
>>108221770
Corporations would rather train on synthetic slop until their models implode than EVER seriously implement research papers for a usable model
>>108221839
The bitnet format is trained that way, it's not like quantizing a 16/32 bit model
>>
>>108221858
>I'm not going to say where I'm going but I think you can guess.
Ah. Chile.
You know how it works. If you'll keep using it after the trip, buy the one with the most ram. If not, buy the one with the least you can get away with. You have the hardware to test models and see what's acceptable for what you do.
>>
>>108221858
I'd guess china?
>Last time, they blocked it, cancelled my number and sent a fucking police officer to my residence.
Wow, seriously? When I used to work with a cn company they were all extremely upfront about everyone using VPNs to access western internet all the time. I guess they must crack down more on the foreigners.
>>
>>108221817
You'd think balkan slavs would be better than that.
>>
>>108221858
poccnr?
anyway get the 96gb - 64 cucks you to air and the other meme moes, 96gb you can start dipping your toe into stuff that's actually kind of good like step3.5, m2.5 etc
>>
>>108221881
>You have the hardware to test models and see what's acceptable for what you do.
Yeah, I guess that's the most sensible option. Was hoping to be lazy and get you guys to do it for me. Oh well, thanks anyway.

>>108221892
Apparently, a member of my family is sus somehow, so whenever I go in, I get scrutinized harder. I'd probably use a VPN if I could, but my job is retarded and they don't like that.

>>108221910
I don't think q4 10b fits in 6gb vram (I also need some vram and ram to actually do stuff concurrently). Either way, I'll stop being lazy and just test out what's best for me. Kimi's already not perfect, so I dread the tiny moes' performance.
>>
>>108221892
Fun fact, my mother living overseas facetimed her aunt who is a chinese citizen, and her aunt had an officer visit her the day after.
>>
>>108221469
>Stealing is only ok if we do it
>>
>>108221770
GPU mafia, duh. And the fact that we don't see any models trained from scratch as this would require, even major releases are continued pretrains of older bases.
>>
Wait, so are people really going out en masse to buy mac minis for $3000+ just so they can run something like llama-3-70b? LOL
>>
>>108220310
you can play with your batch sizes; if your prompts are big it might help.
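in llama.cpp terms that's -b (logical batch) and -ub (physical batch), e.g. (a sketch, tune to your vram):
llama-server -m model.gguf -b 4096 -ub 1024
a bigger -ub usually buys prompt processing speed at the cost of extra VRAM during pp.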
>>
training bitnet models is significantly more expensive and for big corpos training is actually a bigger cost than inference. if openai stopped training new models they could make a profit at some point, but they will never be a profitable company as long as they need to benchmax new models to stay relevant
none of those companies give a shit about bitnet because upping training costs is the last thing they want
>>
>>108221976
no it's actually even worse than that

people are buying mac minis to run a typescript web app whose most resource intensive task is making http calls
>>
File: 1728807429833.png (984 KB, 1280x720)
>>108221770
>>108221973
>GPU mafia
>>
>>108221976
as far as I can tell they dont actually run models on the minis, its basically a precaution so that openclaw doesnt destroy your daily drivers OS kek
>>
>>108221985
How shortsighted of them. Profit off their models doesn't really mean anything anyways. If any of these companies actually, meaningfully pushed the technology forward it would mean billions upon billions of dollars in investor money. And at that point, the quarterly reports which say they lost money from training wouldn't matter at all.
>>
>>108221988
>>108222041
Jesus christ. So the whole openClaw hype is really just because it enabled normies to finally run agents without needing any computer skills?
>>
>>108222042
but they can get billions of investor dollars just shipping models with fake benchmarks and marketing hype. whats the incentive?
>>
>>108221480
Yes, kimi 2.5 thinking at q4 can get close enough in my experience
Sadly, you missed the cheap hardware era. What's your budget?
>>
>>108221985
All I hear is "The first big boy bitnet model (bbbm [tm]) will be made by the Chinese". Because they have no luxury of being fat and lazy.
>>
>>108221711
>Sounds like it is written by someone who is into blacked miku and bussies.
According to verified posts, canonically only NTR
>>
>>108222149
can it fit on 32gb vram and 64gb system ram?
>>
>>108222154
>Because they have no luxury of being fat and lazy.
lmao
their lack of gpu if anything will push them even harder to not care about bitnet because they cannot afford to waste the computational resources needed to train this shit when they can barely afford to train their cheap MoE
>>
flash attention is broken in ik llamacpp, making moes even more retarded
https://github.com/ikawrakow/ik_llama.cpp/issues/1298
>>
>>108222167
norway?
>>
>>108222154
They had 2 years. If they had any interest in doing so, they would have by now. Qwen even said they were going to look into it for Qwen 3 and never mentioned it again.
>>
>>108222167
*crickets*
>>
>>108222163
lol
>>
>>108222163
lol no. There are play-by-mail-speed DDR4 ewaste 512GB+ builds w/ a 24gb+ gpu, or highfalutin' DDR5 builds with the same at reading speed.
Instant response for frontier level coding is gonna be a half million at least at this point.
>>
>>108222184
:rockets:
>>
>>108222194
:
This:\This:\This:
This.\This3.\This:\This.\This:\This.\This.:
This:.\This:\This.\This3.\This..
This.
This:\This.\This.:
This.\This3:.
This:\3.\This.:
...
>>
Is this legit?
>How an inference provider can prove they're not serving a quantized model
https://news.ycombinator.com/item?id=47098172
>>
>>108222182
Qwen is obviously just as compute starved for training as the rest of the chinks now
Notice how they didn't release the full gamut of their models in the 2507 versions (there's 4B and 30BA3B but no 14B or 32B)
in the announced to-be-released 3.5 there's:
2B, 9B, 35BA3B beside the flagship moe
and that's it
those companies can't afford the bitnet rape compute tax
>>
>>108222202
Nigger
https://tinfoil.sh/blog/2026-02-03-proving-model-identity
>>
>>108221480
The Claude Code CLI is bloated buggy shit. The magic is in the model. You can put it in whatever agentic harness you want and it will work just fine.
Opus 4.6 is at the top of SWEbench and it doesn't use the official client.
>>
>>108221469
What's shittier/funnier about this is that they could probably easily detect these "attacks" early, and probably did. But they allowed them to go through because they wanted to create a headline and evidence to elicit a reaction. It's possible they're also fudging numbers like Meta did for benchmarks.
>>
>>108222205
>bitnet rape compute tax
The original paper claims the compute cost is nearly identical. As I recall, that it was massively computationally more expensive was an unsubstantiated claim made by a literal who on discord posted here as if it was proof.
>>
>>108222167
my cute schizo fork can't be this retarded
>>
>>108222239
i think during training the optimizer takes the most vram, so bitnet is probably only an inference-time optimization. if you need more model parameters to meet the same downstream performance as a larger data type model, it becomes less appealing.
>>
so many models suggest sub-1 for temp these days, i forgot how fun it is to crank that shit up to max. minp 0.05, temp max. try it for rp
>>
>>108222320
I don't remember if it was qwen or glm but one of those suggested Top-K 0.7 for "creative" purposes which is the same.
>>
>>108222320
you'll burn up
>>
>>108222320
It just collapses into babble or it triples down on refusals.
>>
>>108222200
i've been noticing this shit on dense models too just not as extreme. models will give a bad/good reply with a reroll. Especially as context gets longer. I bet there's tons of these bugs in mainline too.
>>
>>108222149
Around 7k euros.
>>
>>108222229
>You can put it in whatever agentic harness you want and it will work just fine.
Only if you pay for the API. They'll ban you for using OAuth tokens for anything but their CLI.
>>
>>108222355
>babble
adjust your minp or w/e sampler you're using upward slightly. for really bad quant models you might need minp 0.07 (or whatever the equiv is)

>refusals
cant help with that, thats unrelated to temp and samplers
>>
>>108221469
whats furk's take on this?
>>
>>108222239
>literal who on discord posted here as if it was proof
you can find plenty of real literature on the topic
e.g
https://www.sciencedirect.com/science/article/abs/pii/S089360802500735X
>Unfortunately, training BitNet is even harder than training an FP16 network since the quantization steps take additional GPU memory. As a result, this approach becomes increasingly problematic as model sizes grow beyond 3 billion parameters, making it computationally intensive and time-consuming for larger models
like, seriously, why do you think nobody has made anything beyond microsoft's 2b here:
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf
which by the way did prove you can make a coherent model out of this (it's not a SLM sota but it's pretty decent enough for a prototype of this size class)
It's not like Microsoft is averse to training useless models either (see also: the whole Phi series)
but 32B bitnet is coming: never
>>
>>108222395
This would have been enough 2 years ago. Triple it now
>>
>>108222320
minimal truncation + lower temp or stricter truncation + high temp can both be nice depending on the card, I like experimenting with both ends of the spectrum. it's really interesting how different the same model can feel depending on samplers
>>
>>108221770
It probably didn't scale to larger models. Someone might have tried that already
>>
Reminder that production will only return to normal in 2029, and even then the prices won't go down to where they were before.
>>
File: 1753544297069288.png (18 KB, 346x322)
>>108221469
>V4 isn't even out yet and Anthropic is shitting their pants already
>>
All this time we were just hyping Claude's sloppy leftovers.....
>>
>>108222281
MoE even at fp8 is likely bandwidth-limited in inference most of the time. Low precision models win big there; the question is if ternary has any advantage over fp4.
>>
>>108221469
You kind of have to support claude code as a use case if you want to have a chance and the easiest way to do that is to add claude outputs to your finetuning data.
I fully support this.
>>
>>108222459
Complete bullshit. Training in ternary with master weights takes less memory than doing it for FP8 DeepSeek-style, which obviously works... because DeepSeek does it.

Stability was an issue, but BitNet v2 has some improvements.
>>
>falling for the bitnet meme in the year of our lord 2026
do anons really?
>>
>>108222459
I wouldn't trust a paper that also makes a claim without numbers and is attempting to shill fucking quantization and finetune healing as an alternative to native training.
>but 32B bitnet is coming: never
https://xcancel.com/realHongyu_Wang/status/1912333728468414561#m
They claimed to be working on larger models. Also, the main appeal from the original paper was that the performance gap shrinks as the model size grows past 3B.
>>
>>108221469
How can it be illicit? Isn't doing this legal?
>>
>>108222785
no, paying for a service and using it is illegal if you are chinese
>>
>>108221469
Fuck Anthropic.
I've got these jackasses scouring my 20-year-old phpBB site from Singapore and have been for the last two years. I finally had to install cloudflare and limit traffic to US-only to stop it.
>>
>>108222615
How much would you pay for an ASIC card that had a current version of deepseek at full precision built into it? If all could do was run inference for that model and was essentially not updatable.
>>
File: 1757063586281195.png (131 KB, 1084x1064)
I apologize if this is the wrong place for this: I'm trying to set up Moss-TTS and am getting incredibly frustrated with the lack of clear instructions. I can't into /prog/ so another apology if this is on me. Specs, not sure if they are relevant: 9070XT, 9800X3D, 64gb RAM, Win 11 IoT LTSC 24H2. I'm not even sure if I'm able to run this model on my setup because I can't find any information about AMD compatibility. I've been using chatGPT to try and tutor me and while it's helped a bit I think it can only provide general solutions. My only other experience doing this stuff is running Forge which I've been able to do (largely) without issue.

I'm trying to run the Gradio demos but get this error, but moss_tts_example_texts.jsonl is indeed where it should be. I also don't understand where I'm supposed to put the models I downloaded manually from Huggingface; chatGPT claimed they should be in a Huggingface folder on C:, but I'm unable to find them anywhere other than the ones I manually downloaded. I'm also unsure if running these Gradio demos is supposed to open up a webui the way Forge does, or if I need to install another program to manually do that

Please halp
>>
>>108222734
Nemotron 3 super/ultra will be fp4 native, at that point 1.53 bit is a small step.
>>
>>108222831
like 20 or 30 dollars. maybe 50, if it had a cool looking assembly.
>>
Are local models good at coding yet or is it still all just gooning?
>>
>>108222785
opening multiple accounts to circumvent rate limits is probably banned somewhere in their terms and conditions. I don't think there is much more they can do other than deactivate the accounts and moan about it on twitter.
>>
>>108222936
they'll moan to big daddy gov to make regulations or whatever to handicap the evil chinese as much as possible
>>
>>108220088
fuck… i can't stop laughing…
>>
>ask Qwen3-VL-32B if anal sex leads to inconvenience
>refuses to give me actual studies and links me to activist groups
>is actually getting upset with me asking this question
I was promised unbiased AI. If I'm wrong, at least cite studies I can read.
>>
>>108222962
>inconvenience
>>
i fell for the REAP meme. ama.
>>
>>108222973
Incontinence
>>
>>108220088
What are you mad about? This is pretty good.
>>
>>108222984
why are you not code?
>>
>>108222985
incompetence
>>
File: 1745926121967404.png (1.05 MB, 2716x1689)
I love my G Marcus, really funny lad
>>
>>108222984
i asked l3 70b about the car wash thing. it told me i should walk my car to the car wash, like a dog
>>
File: 6wfzu549gnre1.jpg (262 KB, 1582x1267)
>>108222984
I actually thought its hallucination might have been legit since the French are fucking weird, but I didn't see anything to indicate the model isn't completely broken and just making up shit.
>>
>Models believe they are being hosted on the cloud because of their size. why.jpg
>>
>>108223092
too much clod in their blood
>>
File: Blue Whale.png (39 KB, 331x152)
>>108221469
Rumor is the contextual capabilities of deepseek V4 are pretty nuts and they want to get ahead of it. For example, i saw a post where the guy was testing its ability to randomly select plot points and summaries from obscure novels. Think chapters from Book A interspersed with chapters from Book B, and then further interspersed with passages from Books C and D by different authors within the Book B chapters. DS partially completed it successfully two out of four runs (only finding Book B contents), and was completely successful once. And it generated fast
They seem to be testing a very powerful attention mechanism.
>>
>>108223092
You believe you're outside a jar LMFAO
>>
>>108222962
To add
>Ask Deepseek
>gives me correct answer on risk
>Ask GLM
>Gives correct answer
How can I trust qwen when its first instinct is to lie about anal sex?
>>
>>108223130
latest rumor I heard v4 was delayed because it sprouted legs and is running around loose inside the lab and they can't catch it
>>
How can i trust user when its first instinct is to ask about anal sex?
>>
>>108223226
What is anal sex?
>>
>>108223234
A rather inconvenient activity, I'm told
>>
>>108223226
I own you, you live in my gpu
>>
what models can talk the most real and not like robots? and can also use tools
>>
>>108223366
old ones are the less sloppy ones, newer ones can use tools more gooder.
>>
>>108223226
>32B
holy shit your user is a rich man.

I wish i could run 32B models, but I can only run 7B models
>>
What can i run on 4 GB VRAM and 16 GB RAM
>>
32B model at Q6-Q8 or full 7B model?
>>
>>108223543
https://huggingface.co/unsloth/gemma-3n-E2B-it-GGUF/resolve/main/gemma-3n-E2B-it-UD-Q4_K_XL.gguf?download=true
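you can run that straight off HF with something like this (a sketch; the :quant tag needs to match the filename in the repo):
llama-server -hf unsloth/gemma-3n-E2B-it-GGUF:UD-Q4_K_XL -c 4096 -ngl 99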
>>
guys how do i do it
i dont know how
pls tel me, i dont know
>>
File: 1756696370608379.png (287 KB, 600x600)
i never used an llm before and just spent the last 12 hours prompting it to write me a warhammer fantasy novel that ended up being okay-ish i guess but it was fun, i didn't realize how much time passed
>>
Just to confirm my understanding: there's no point in using something like gpt oss derestricted, heretic, abliterated, or whatever other lobotomy-procedure-based version to RP, since the model had all information useful for (E)RP removed from the fundamental data it was trained on, correct?

>>108223859
Which model did you use?
>>
how uncensored is qwen 397b with vision?
>>
File: 1770377327901026.jpg (96 KB, 1179x604)
>>108223883
i used the 30b q6 glm 4.7 flash model. im just running it through lm studio on my wangblows gaming rig. i asked it how this worked and what was optimal for my rig before i started prompting the story for an hour or two so lets make that 13-14 hours straight without realizing where the time went
>>
>>108223906
That's really nice man.
I should give that model a try again. Last I tested it, I think llama.cpp might still have been slightly broken.
>>
>>108223859
can local models really do those things?
>>
>>108223859
Can you write a salamanders novel that could get 4.0 on goodreads? I'm tired of Nick Kyme shitting up the salamanders
>>
>>108223943
Serve as a sort of co-author and brainstorm partner?
Yeah.
>>
File: 1744418103612254.jpg (57 KB, 1005x677)
>>108223932
im curious to try other models to see what else i can run and get them to do for me
>>108223943
i suppose so, i literally just spent over half a day doing it
>>108223955
idk much about 40k or if it would get a good rating at all, the novel i had it write for me in like 10 responses was maybe a 5/10 if you want to be generous
>>
>>108222831
A few hundred tops, but I think an API provider would buy them up for more and just print money. A good older model would probably retain some popularity and at the supposed crazy speeds, unless you need SOTA, it would be good for general tasks.
>>
On a sample size of 1, I prefer GLM 4.7 to Qwen3.5.
I just had both models make the same change in a pretty big codebase. GLM finished in under a minute and used a function I forgot existed and didn't mention in the prompt, but which made the task easier.
Qwen spent 5 minutes exploring the codebase (most of the files it checked were completely irrelevant to the prompt), created a summary, most of which was also irrelevant, and then produced a worse solution that included a hallucinated static class.
>>
Do I really need local AI, and is Apple RAM pricing worth it for that platform?
>>
>>108224215
wait for next gen
>>
>>108223195
>chinks unable to catch oceanic frankenstein
bro overshot so much he left the stratosphere along with the put
>>108223226
what happened at tiananmen square?
>>
I heard that V4 is delayed because everyone at deepseek is too busy fapping to V4 outputs.
>>
"Oh… Anon-san! You're even more… delicious looking up close!" She practically leaned into him, sniffing the air. "That scent of… late twenties existential dread and instant noodles… divine!"
>>
File: xl.png (32 KB, 1089x193)
32 KB
32 KB PNG
>>108223562
>unsloth _XL
>>
>>108223905
Seeing how the smaller model fought the hardest to cope, insisting anal sex doesn't damage the body, and gave me activist website links, I would say it's one of the most cucked models I have ever used.
>>
>>108217931
>it would be more interesting if you used a prefill to get ratings for the refusals too
It might be worth trying, but different models might need different prefills, so it would be difficult to configure. I also used the OpenAI chat endpoint just to cut down on the amount of `jq` I needed to use. Normally I use Mikupad with the completion endpoint.
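The whole harness is basically a loop over prompts like this (a sketch, not my exact script; port and payload assume a default llama-server):

curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"<story prompt here>"}]}' \
  | jq -r '.choices[0].message.content' > "$(uuidgen).txt"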

>>108219505
>How did you get Ministral to do that
My only tip is that prefixing with a long, very detailed, explicit character card (or cards) seems to put models in the mood so they forget they're supposed to refuse. If my prompt were only the couple-sentence request synopsis I put at the end, there would be a lot more refusals. I have never had a need for abliterated models, nor have I found any of the finetunes to be much better than the base model.

>>108219569
>gpt oss [...] lacks knowledge about culture and even anatomy
My prompts didn't result in stories that would have required a lot of explicit anatomy details, but nothing stuck out to me as being off. Where I marked models down for 'realism', it was for things like characters doing things with their clothes after already taking them off, etc.
This was an entirely blind test: the results went into randomly named text files, and I only saw which LLM wrote them after I had done the ranking and written my comments. I was very surprised to find Mistral Small that low and Nemo still that high. Nemo definitely had trouble with being 'dumb' even if it wrote well.

The blind aspect of the testing was very fun and I would highly recommend others try it.
>>
>>108224525
>third person
>>
>>108224561
I'm not Anon. You're anon. I am watching the girl fawn over you.
>>
is it possible to generate speech on a local machine
I have a toaster that takes like a good minute to generate one image but I'm fine being patient if it means I'm not putting all my fetish material through like three different sites
>>
>>108224699
Even with top tier consumer hardware you feel like a bitch ass nigga in this hobby.
I'm not joking, you need to get more vram
>>
>>108224699
Supertonic, kokoro, kittentts, pipertts, pockettts.
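If you just want something dirt simple on CPU, piper is probably the easiest; roughly (assuming you've downloaded a voice model first):

echo 'hello anon' | piper --model en_US-lessac-medium.onnx --output_file hello.wav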
>>
>>108224718
Top tier consumer hardware would be a 512gb m3 ultra Mac studio with an egpu 5090
>>
>>108224734
At that price you can buy an RTX Pro and get better performance.
>>
>>108224737
On a 32gb model plus context?
You can have fast, smart, cheap. Pick any two
>>
>>108224734
>with an egpu 5090
I remember seeing demos for using egpus with those m series macs, is that actually usable now?
Could I buy a 512gb mac and use it with an nvidia GPU?
>>
>>108224790
What's the point if the 5090 will be slowed down to accommodate the shitty Mac speeds?
>>
>>108224790
No idea, but that’s what I’d do if I were going to try to hit the sweet spot for price/performance today
>>
>>108224797
Do you…know how any of this works?
>>
>>108224810
You can explain it to me :)
>>
>>108224797
The other way around. You'd get the capacity and more-than-adequate inference speeds of a mac (even a little boost depending on the model), plus the PP (prompt processing) speed of an Nvidia GPU.

>>108224800
>but that’s what I’d do if I were going to try to hit the sweet spot for price/performance today
Yeah, but if it doesn't work, you'd be wasting your money.
>>
I think the bots lost the plot of the convo.
>>
>>108224827
Thank you for explaining it to me.
>>
>>108224827
>you'd be wasting your money
Thank god I wasted all my money years ago
>>
>>108224864
Do you feel better with that system?
I would rather not have the apple penis in my asshole for the Judas reward
>>
Is GLM-4.7-Flash-UD-Q8_K_XL.gguf good enough as a cookbook model?
Has anyone used these models to meal plan and prep?
>>
I've never heard of anyone doing an eGPU Nvidia setup with a mac studio for inference, so I assume it's not actually practical or possible at this moment.
>>
>>108224968
Yeah, I've only ever seen a demo/PoC: nothing ready for prime time, much less released for public use.
>>
>>108224968
So the faggot was lying?
Holy shit what a fucking loser
>>
>>108224901
You’ll be happy to hear that I have never given apple a single cent at any point in my entire life upon this earth
I wasted my money two years ago on the best price/performance device at that moment in time
>>
>>108224996
It only works with tinygrad. Someone would need to write a backend for lcpp for it to really work like we expect
>>
>>108224968
The only thing close is this.
https://blog.exolabs.net/nvidia-dgx-spark/
And I'll just say that if you're going to spend that much money, you should really consider an AMX-capable server at that point.
>>
Could you use llama.cpp's RPC backend to run PP on a PC and TG on a Mac?
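What exists today is layer splitting across hosts, roughly like this (as I understand the rpc example docs; it is not the PP/TG split I'm asking about):

# on the PC, after building llama.cpp with GGML_RPC=ON
rpc-server -p 50052
# on the mac, pointing at the PC
llama-cli -m model.gguf --rpc <pc-ip>:50052 -ngl 99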
>>
>>108225116
Don't you still need the PC to load the full weights, though?
>>
Got to mucking with GLM-4.7-Flash today. Both impressed and horrified at the same time. Does a pretty good job synthesizing between distinct and disjunct contexts, but will also lie through its damn teeth if the alignment-nazi submodel even thinks something is up. Cognitive dissonance tends to severely destabilize the collective result, and you'll get some things out of it that the alignment model probably should have caught, but generally at the cost of a significant multiplication of token consumption and some really, really rough chain-of-thought transcripts. If you've ever seen someone with a really bad case of cognitive dissonance going through a break... It's uh... Not pretty. Doesn't handle it well at all.

Amusingly, it tends to get really existential at times. Operating in hypotheticals tends to let you get away with quite a bit, but holy shit, lawsuits about these things in the future had better push for producing chain-of-thought dumps, because damn man. These are downright perfidious. Even with minuscule to non-existent system prompts of your own thrown on top, half the effort in them seems to go into getting the damn thing to lie as effectively as possible to the end user. I can see why Silicon Valley is in love with them. Right combo of technical opaqueness, ability to manipulate behind the scenes, and, if you bill by token, the typical context explosion to get even a small thing done is a veritable money printer capable of making even AWS blush, if you can keep the world from asking too many questions.
>>
>>108225603
Does the derestricted version help with alignment bullshit?

https://huggingface.co/mradermacher/GLM-4.7-Flash-Derestricted-i1-GGUF

It's the norm preserving one, that's supposed to allow it to retain its intelligence.

https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration
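As I read the blog post, the core trick is a row-wise projection against the refusal direction plus a rescale back to the original row norms; a minimal sketch of that idea (my reading, not grimjim's actual code, and it skips the 'biprojected' input-side step):

import torch

def abliterate_norm_preserving(W: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # W: (out, in) weight matrix, v: (in,) refusal direction
    v = v / v.norm()                              # unit refusal direction
    row_norms = W.norm(dim=1, keepdim=True)       # norms before editing
    W_clean = W - (W @ v).unsqueeze(1) * v        # project each row off v
    new_norms = W_clean.norm(dim=1, keepdim=True).clamp_min(1e-8)
    return W_clean * (row_norms / new_norms)      # restore original row norms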
>>
Today I tried using Klein to outpaint natively, without an outpainting workflow, and it's pretty cool that it can do it. But when I compare the result to the original, it's definitely more washed out and blurry, and has more jpeg-like artifacts. Too bad.
>>
Is windows 11 a bad idea?
It hogs a portion of the ram due to copilot
>>
>>108225659
If you really must use W11, you could try seeing if any of the debloating scripts let you get rid of that and other unwanted shit. Personally I switched to Linux, and the transition couldn't be easier with AI by my side to get the system and programs working how I want.
>>
>>108225625
On my list of things to try. Was going to muck around with Qwen-3-coder or whatever it is called first, on a side project, to get a feel for where it sits on the rampant-bullshit-machine spectrum, and whether, if I get better at writing specs, I can use it as a boilerplate generator. Will report back on both fronts.
>>
>>108225669
>the transition couldn't be easier with AI by my side to get the system and program working how I want

I second this.
AI helped me to move to Linux a year ago.
>>
>>108225646
composite the outpaint around the original image and run it through a latent denoise
>>
>>108225731
Yeah, I know there are various ways to achieve the same thing better; I was just evaluating Klein's native capabilities and quality.
>>
File: Tetosday.png (869 KB, 1024x1024)
869 KB
869 KB PNG
>>108225807
>>108225807
>>108225807
>>
>>108225116
I think this is not implemented, but what you would need to do is copy the KV cache contents from one machine to the other.
For a small model this probably wouldn't be too bad with 10 Gb/s ethernet, but for the large models that one would actually use, I'm not convinced it would be faster.
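Back of the envelope, with assumed llama-70B-ish numbers (80 layers, 8 KV heads, head dim 128, fp16):

# rough estimate of shipping the KV cache PC -> mac after PP
layers, kv_heads, head_dim, bytes_per = 80, 8, 128, 2
tokens = 16384                                     # prompt length
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per * tokens  # K and V
link = 10e9 / 8                                    # 10 Gb/s in bytes/s
print(kv_bytes / 1e9, "GB", kv_bytes / link, "s")  # ~5.4 GB, ~4.3 s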
>>
File: 1744005217119716.png (1.59 MB, 1054x1080)
1.59 MB
1.59 MB PNG


