/g/ - Technology

File: 1746779659743174.png (83 KB, 939x571)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108356979


►News
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
mikubros...
>>
I want to hire a freelance ML engineer to build a fine-tuned local LLM (something like Llama 3) trained on screenplays and screenplay theory books, to adapt novels and short stories into screenplays.

How much would that cost?
>>
still gooning to nemomix 12b
>>
>>108362352
>How much would that cost?
more than it's worth, finetrooning a shitty llm like llama is a form of self delusion
>>
>>108362390
What would you recommend or why wouldn't you bother with this type of project?
>>
>>108362514
>roon
imagine listening to the thread schizo
>>
>>108362514
just prompt a SOTA api model, it will be cheaper and actually produce something that might be used. finetrooning is a dying retard meme, there's a reason why you even thought of llama and not, you know, any of the more recent actually good models, finetrooners are stuck in the past
>>
>asking questions in the earlybakershizo thread
>>
>everyone I don't like is the baker
>>
>switch from text completion to chat completion in sillytavern
>no more constant formatting errors but now qwen 3.5 eats up all my tokens just thinking
>it ignores me when I tell it to keep it short
What do? I raised the response limit but even at 1000 it tends to spend the whole time thinking. Or randomly not think and shit out a massive reply.
>>
>>108362590
Sounds good, I'll start from there, sounds smarter and cheaper.
>>
Miku is BLACKED coded
>>
Miku is TETO coded
>>
File: sans_oss-waxal.png (534 KB, 1036x1771)
Wake up /lmg/, here's a new open source release from Google.

https://huggingface.co/datasets/google/WaxalNLP
https://huggingface.co/datasets/google/WaxalNLP
https://huggingface.co/datasets/google/WaxalNLP
https://xcancel.com/osanseviero/status/2032452729059045881

>The WAXAL dataset is a large-scale multilingual speech corpus for African languages, introduced in the paper WAXAL: A Large-Scale Multilingual African Language Speech Corpus.
>>
>>108362684
>response limit
what you need is the new reasoning budget sampler, not a whole response limit. get the latest llama.cpp from master if you don't have it yet. then do -h and read about the --reasoning, --reasoning-budget-message and --reasoning-budget flags
it works great, barring some bugs you're unlikely to encounter; it will interpret <think> from your own prompt as the signal to start token counting, but there's no reason to have <think> in your own prompt except for trying to summarize /lmg/ or some other llm topic
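something like this, from memory, so double-check the exact spelling against -h:
llama-server -m model.gguf --reasoning-budget 512 --reasoning-budget-message "Okay, enough thinking."
that caps thinking at ~512 tokens, after which the message plus the closing tag get force-inserted.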
>>
>>108362761
And March isn't even over yet! :eyes: :rocket:
>>
>>108362761
>Di WAXAL dataset na one big speech corpus wey get plenty African languages inside, wey dem first introduce for di paper wey dem title am "WAXAL: A Large-Scale Multilingual African Language Speech Corpus."
>>
>>108362761
the desperate pleas from randos on social media begging big corpo to give them some leftovers feel extremely cringe, I physically wince reading those comments, there's both a form of desperation and entitlement in this, very turdworlder-ish mentality
hand over the gibs
>>
>>108362803
Almost uncanny, except they also throw in some native African words in there when dey feel like it.
>>
>>108362795
>tfw a koboldcuck
WTF bros when will we get this feature???
>>
wonder if we could measure the IQ of an llm mainly trained on something like swahili, a language that has no concept of "to have" (the internet will tell you there's a word for it but that word really means "be with") or "maintenance"
>>
>>108362761
humiliation ritual for all sides involved
>>
Gemma 4 will be released during African American History Month :hugging_face:
>>
File: 1739350650462622.png (151 KB, 900x750)
►Recent Highlights from the Previous Thread: >>108356979

--Performance benchmarks for DDR4 e-waste builds running large models:
>108360259 >108360393 >108360534 >108360558 >108360672 >108360740 >108360813 >108361314 >108361562 >108361617 >108361645 >108361748 >108362542
--Meta delays Avocado model due to performance issues:
>108358784 >108358827 >108358850 >108358852 >108360179 >108360199 >108360235 >108360863 >108360872 >108360918
--AMD GPU LLM support via Vulkan in llama.cpp:
>108360352 >108360492 >108360539 >108360572 >108360645 >108360656 >108360665
--Qwen3.5 safety filters and finetuning workarounds:
>108359178 >108359219 >108359222 >108359229 >108359262 >108359273 >108359289 >108359516
--Local models for programming concept explanations vs GPT-5.4:
>108358651 >108358686 >108358760 >108358770 >108358692 >108358713 >108358788
--Interactive Claude RSA encryption visualization:
>108357211 >108357225 >108357266 >108357398 >108357426 >108357684
--Prompting techniques to force detailed story generation:
>108359520 >108359563 >108359644 >108359616
--Hybrid local/cloud agent workflows for cost-efficient research:
>108359747 >108359883 >108360570 >108360786 >108361105 >108359937 >108361124
--Benchmark limitations distorting model quality assessment:
>108360223 >108360227 >108360241 >108360274 >108360301
--Agentic coding workflow and context management challenges:
>108361171 >108361189 >108361194 >108361218
--Geometric kernels on manifolds, meshes and graphs:
>108358415
--Regression caused by lost Jinja template fix during refactoring:
>108361233
--Miku (free space):
>108357631 >108358345 >108360328

►Recent Highlight Posts from the Previous Thread: >>108356980

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108362761
POG
WAKANDA FOREVER
>>
If anyone needed more reason to hate anthropic and meta:
https://github.com/rmusser01/meta-lobbying-and-other-findings
https://www.cnbc.com/2026/02/12/anthropic-gives-20-million-to-group-pushing-for-ai-regulations-.html
>>
>>108362795
>When the reasoning starts, we count the number of tokens and when the given number of reasoning tokens is reached, we force terminating the reasoning.
You could pretty much do this already with a grammar rule. They really need to figure out a sampler that nudges inference toward tokens that make the model "wrap it up" on its own.

Constrained beam search kinda does this but it's a big performance hit.
>>
File: banana republic.jpg (84 KB, 1024x683)
>>108362986
there's a reason why the zuck kept fellating the orangeman.
>>
>>108363032
>You could pretty much do this already with a grammar rule
you can do a lot of things with the grammar sampler but the slowdown from using it is real
and can anything beat the convenience of just giving a token budget to a cli flag?
>>
>>108362965
Thank you Recap Miku. I will catch whoever hit your head.
>>
>>108363053
A grammar rule for this behavior has barely any impact on performance since it just looks at how long the text after <think> is then forces the output of </think>
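a rough sketch of such a grammar, for the curious (GBNF; counts characters rather than tokens and bans '<' inside the think block, so treat it as a toy):
root ::= "<think>" [^<]{0,2000} "</think>" .*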
>>
>>108363032
>You could pretty much do this already with a grammar rule
I'm fairly certain that they are leveraging grammar in some capacity to implement that.
>>
>>108363082
>I'm fairly certain that they are leveraging grammar
I'm fairly certain that they are using grammar

You're welcome.
>>
>>108363082
I looked at the commit, although I'm not very fluent in C++. It looks to be a custom implementation from scratch.
>>
>>108362761
loooooool
>>
>>108363082
>I'm fairly certain that they are leveraging grammar in some capacity to implement that.
why be so confidently wrong when you could look at the code
https://github.com/ggml-org/llama.cpp/blob/master/common/reasoning-budget.cpp
it's a state machine that just counts tokens as they pass.
} else if (ctx->state == REASONING_BUDGET_COUNTING) {
    ctx->remaining--;

it starts the sampler by matching the <think> </think> tokens.
server-task.cpp initialization:
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-task.cpp
if (!start_tag.empty()) {
    params.sampling.reasoning_budget_start = common_tokenize(vocab, start_tag, false, true);
}
if (!end_tag.empty()) {
    params.sampling.reasoning_budget_end    = common_tokenize(vocab, end_tag, false, true);
    params.sampling.reasoning_budget_forced = common_tokenize(vocab, message + end_tag, false, true);
}

I gotta know the code well enough since I fixed a few edge cases for my autism on my local branch.
>>
>>108363151
Imagine being this smug while fundamentally misunderstanding how the sampling chain actually works. You’re looking at the logic for the "budget" (the count), but you're completely ignoring how the "forced" end tag is actually injected into the distribution.

The "grammar" the other anon is talking about isn't literal GBNF files in this context; they’re talking about **constrained sampling**.

When `ctx->remaining` hits zero, the sampler doesn't just "stop counting." The `common_sampler_apply` logic uses that `reasoning_budget_forced` sequence you literally pasted to bias the logits. It forces the model to predict the closing tag (e.g., `</think>`) by zeroing out the probability of every other token in the vocabulary.

If you actually look at the sampler implementation in `llama.cpp`, it’s effectively a hard-coded dynamic grammar. It transitions the internal state from "allow anything" to "force these specific tokens" once the budget is exhausted.

1. **Detection:** It watches for the start tag to enter the counting state.
2. **Subtraction:** It decrements the budget per token generated.
3. **Constraint:** Once the budget is $\le 0$, it intercepts the sampling process.

It’s a finite state machine (FSM) acting as a grammar constraint. The other anon is right in principle: the engine is being told "the only valid next tokens are these." Just because it’s written in C++ logic instead of a `.gbnf` file doesn't mean it isn't a grammar-based constraint on the output space.

Next time, instead of just grepping for "budget," look at how the sampler actually handles the `forced` tokens in the logits processor.
>>
>>108363187
thanks bot
>>
>>108363189
Ad hominem
>>
Is it possible to feed Qwen3-TTS an audio with someone speaking and "converting" the voice into a generated one?
>>
>>108363187
post your llm slop elsewhere
this is like doing a prefill, since when do we talk of prefilling as being a grammar? kill yourself, lower life form
>>
I wish /lmg/ mascot was cute...
>>
>>108363196
I realize how retarded that sounded, I want to change the voice of the original audio...
>>
>>108363196
Voice cloning? Yes.
>>108363211
Ah. You need ASR for that. whisper (with whisper.cpp or whatever) to convert audio to text, then any tts to speak it out.
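lazy python sketch if you go that route (openai-whisper package; the speak() call at the end is a made-up placeholder for whatever TTS you use):

import whisper  # pip install openai-whisper

model = whisper.load_model("base")
text = model.transcribe("input.wav")["text"]
# speak(text)  # plug in your TTS of choice here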
>>
>>108363200
the lmg symbol is the gaping wound with a septic smell
>>
>>108363198
Calling a logit mask a "prefill" is peak midwit behavior. Prefilling is just KVC warming; it has zero to do with the actual sampling constraints that happen during the autoregressive step. When the reasoning-budget code hits zero and forces the end_tag, it's literally pruning the entire vocabulary tree down to a single allowed transition.

In any CS 101 context, an FSM that restricts the language of a generator is a grammar. Whether it's a .gbnf file or a hard-coded C++ conditional, the mathematical result is the same: the output space is being constrained to a specific set of rules. If you think a state machine managing string transitions isn't a grammar, you need to go back to school and stop larping as a dev because you can read a single .cpp file.

It's a constrained sampler. Period.
>>
>>108363187
001001100110001100111
>>
>>108363237
Racism isn't welcome around these parts.
>>
>>108363245
I don't think we're posting on the same website.
>>
>>108363229
>In any CS 101 context
post hands
>>
>>108363247
Proof?
>>
>>108363252
I don't have hands.
>>
>>108363225
>Voice cloning? Yes.
It works incredibly well, I just started playing around with it and I am quite impressed. Unfortunately it seems that I can't influence a cloned voice with instructions (tonality and timbre, for example).
So I was thinking to record myself and use the Qwen to change the voice.
>ASR
>whisper
I am familiar with Whisper, but during the process I would again lose the control I need.
But thanks for your suggestions Anon.
>>
>>108363225
TTS wouldn't match the original audio and definitely wouldn't preserve any background sounds.

>>108363196
Look into RVC.
>>
>>108362761
Does it have any safety measures if you ask it to generate ook-ook sounds?
>>108362795
>get the latest llama.cpp from master
Now that is the most subtle /lmg/ trolling I have seen in a while.
>>
>>108363253
Cogito, ergo sum
>>
>>108363278
>>108363253
>>
>>108363267
>RVC
I used it in the past, but the quality is nowhere near what the new TTS model offers. The whole process is also more labor-intensive.
So I was hoping I could switch to something more state of the art.
>>
File: file.png (546 KB, 1087x853)
>>108362803
>>108362853
Can your model talk in pidgin?
>>
>>108363291
>bbc pidgin still exists
the british tax payers are wonderful people
>>
>>108363187
imagine being retarded enough to gen this and post it.
>>108363198
I'll assume you didn't bother reading the full slop to absorb the stupidity, for which I can't blame you. The supposition was the same path used in llama.cpp for processing GBNF grammars was used internally for the thinking budget logic, which the retarded post acknowledges is entirely wrong but then tries to say is right "in principle."

If there were a way to impose the death penalty on everyone who posts LLM-generated content online nothing of value would be lost.
>>
>>108363317
Facts don't care about your feelings.
>>
>>108363321
But they do care. My feelings motivate real-world changes, i.e. facts. It's admittedly unlikely but still possible one of those changes could be a jihad to execute every LLM-posting retard.
>>
>>108363290
Haven't seen modern models offer more than inpainting. No one wants to touch voice changing because of the potential for abuse.
>>
Miku would definitely cheat on you if she's your gf
>>
>>108363378
>no one wants to touch voice changing because of the huge incentive to do it.
?
>>
>>108363413
Would Ani?
>>
>>108363418
Ani would insist on an open relationship.
>>
>>108363419
What if you chained her up and kept her in your basement though?
>>
>>108363413
Only if the other guy was black and had a huge cock.
>>
Why is llama-cli so much faster than llama-server out of the box? How can I check how cli distributes gpu/ram split and what other options it uses? -v parameter isn't that helpful.
>>
>>108363483
Less overhead. Also check whether your frontend has a settings override.
>>
>>108363505
Less overhead? Your post doesn't make any sense.
>>
>>108363532
llama-server is built for serving a production-level setup, including all the overhead that involves
>>
>>108363549
You still did not understand my original post, and did not even answer it.
I was talking about matching the potential settings. I guess I'll read the source code then.
Fuck you, cretin.
>>
hello LLM people, what's a good local model with tool calling to do occasional simple tasks like "rename all files sequentially in this folder to xyz" or "use ffmpeg to convert this video into this resolution and format" etc
also not sure which backend and terminal agent to use, there's so many, pls spoonfeed me
5090 and 96gb vram
>>
it is so tiresome. ai slop is everywhere. the internet is increasingly unusable
>>
>>108363549
>production level setup
lol hahahahahahahahahahahhah
>>
and so concludes the last ever week of the pre-deepseek v4 era
we've made it
>>
>reasoning_budget = 0 fucks up my LLMs
FUCK YOUY PWILKIKJNG KJILL URESELFLE
>>
>>108363609
https://huggingface.co/deepseek-ai/DeepSeek-V4
https://huggingface.co/deepseek-ai/DeepSeek-V4
https://huggingface.co/deepseek-ai/DeepSeek-V4
>>
File: 1741961940703158.png (27 KB, 970x527)
>>108363630
FUCK U
WHAT THGE FUCK DO U EVEN TEST UR SHIT U FUICKING FAGOT
HOYL FUCKING SHIT
>>
>>108363483
Most options default to the same on both. Are you using the default webui or some client? Are you using grammars on server? Are you swapping? How much free ram do you have? Also check -cram or set it to 0. Maybe you're too tight on memory and the overhead of the server makes it go over.
Try with a small model that fits on your gpu entirely. If it runs the same speed on both. What options are you using to run them?
>>
>>108363637
https://github.com/ggml-org/llama.cpp/pull/20424
I hate this faggot SO FUCKING MUCH
HATE
HATE
HATE
HATE
>>
>>108363589
devstral small 2 mistral vibe and ollama are very simple ez and work goodly
>>
File: 1749822566244970.png (98 KB, 922x676)
Has anyone tried this?
https://github.com/ikawrakow/ik_llama.cpp/pull/1243
I haven't pulled yet for obvious reasons.
>>
how many weeks before we start optimizing models instead of making them larger?
>>
>>108363673
what do you want to "optimize" exactly?
>>
Ran an mcp server to link qwen 3.5 35b to my searxng instance. Works pretty nice. Not a fan that it uses local storage to define everything though.
Is there a way to define mcp servers when you run llama-server? Also have some enabled by default on new chat? Right now I have to re-add the mcp server in the llama.cpp frontend every time my browser cache clears and manually toggle it on for every new chat.
>>
>>108363630
use --reasoning off instead; off the top of my head the logic should be sane and do the same thing as the chat template kwargs, while reasoning-budget at 0 will now trigger the budget sampler path that was added to restrict output to a token budget, and at 0 it will forcefully insert closing tags with no regard for what the specific jinja templating is supposed to do. If your model emits a <think> it will throw in another </think> closer immediately. The reasoning budget sampler has many edge cases, e.g. it will trigger the token countdown if it sees <think> in your prefill.
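concretely, either of these (the second assumes your template exposes Qwen-style enable_thinking, check your jinja):
llama-server -m qwen3.5.gguf --reasoning off
llama-server -m qwen3.5.gguf --chat-template-kwargs '{"enable_thinking": false}'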
tbdesu tool calling with llama.cpp was always kinda iffy, and with the vibe coded claude slop it's not gonna get any better
>>
>>108363707
but it also happens when I pass reasoning = off (along with reasoning-budget = 0). So I should either pass one or the other? how fucktarted is it that whatever code they have in place doesn't first check 'reasoning off' (which just uses the template kwarg btw, another retarded change by pwilkshit vibeshitter).
>>
>>108363721
>when I pass reasoning = off (along with reasoning-budget = 0).
do NOT pass any reasoning budget thing at all now.
Only use --reasoning or the --chat-template-kwargs route.
>>
Do companies just let IQ 85 retards slop up their models? Facebook and Google could easily filter for educated people.
>>
File: 1751509162881368.png (52 KB, 900x493)
>>108363721
>>108363731
yeah just tested it by passing reasoning off
Still, a retarded change.
>>
>>108363647
He looks like someone that loves hatsune miku
>>
>>108363583
Use case for reading comprehension?
>>
>>108363760
>reading comprehension
qrd?
>>
File: if.png (141 KB, 1138x606)
>>108363741
it was an unnecessary change, yes, and pic related is the logic applied
if it's not the default -1 value it triggers the sampler path, and like I said, that sampler has many issues. It works fine for basic prompting but it's not something to rely on for tool calling for sure.
>>
>>108363762
Welcome to lmg, I love you.
>>
I'm not paying cursor or openai shit. What's a good model for coding that I can slap into my continue.dev extension on vs code?
>>
>>108363834
Largest qwen 3.5 you can fit
>>
>>108363855
Ight I'll give it a go.
>>
>>108363855
>Qwen 3.5
Doesn't that require some weird sampler settings?
>>
>>108362305
consentsisters, I don't feel so good.
>>
>>108363834
minimax, GLM 4.7, GLM5 or Kimi
>>
>>108363894
>Kimi
Kimi what?
>>
>>108363903
k2.5
>>
>>108363903
Linear
>>
Niggas ITT always ask for coding models and never code shit.
>>
>>108363686
10b model that performs as well as 100b model
>>
>>108363940
Bwe, I aint going to shit up the internet with retarded ai slop code. I just need something that works for me.
>>
>>108362383
>>108360259
>>108360534
Reposting here... What are other 256GB anons dailying? Anyone doing 4x64gb agent swarm stuff locked to CCDs?
I got much better performance with 235b thinking requanted to Q2 and locked to numa nodes 0,2 and 1,3, but I'm pretty sure it's not an ideal model anymore.
>>
>>108364020
yeah
>>
>>108364020
I use GLM 4.7 for programming.
>>
>>108363997
and I want 8k videos to be as small and as easy to decode than 1080p
>>
>>108364106
i always get the sense that you stalk this thread 24/7 to play your stupid little argument games you want to win. maybe that's why you made this thread so incredibly annoying over the last 2 years. every time someone comes here to write anything you try to combat them. why not just go play some pvp games you nodev loser?
>>
>>108364117
>conjuring arch nemesis in his head
I'm literally a newfag on /lmg/ since about last year's fall. urgently seek help
>>
File: 1744425693599542.png (231 KB, 480x453)
>>108360492
>>108360572
It's good to know that with amd gpus, it's not all doom and gloom in this regard
Thank you very much
>>
>>108364133
The amd thing hasn't been an issue for a couple of years at this point. It's just a leftover vibe from the ai hands days.
>>
>>108364133
>it's not all doom and gloom in this regard
compare the prefill t/s on models like qwen 3.5
have fun
amd is for people whose time has no value
>>
>>108364181
hey can you be a little more respectful?
>>
File: Dipsy_sitting.jpg (509 KB, 846x1300)
Where the HELL is it?
>>
>>108364274
check op
>>
File: sysbench.png (53 KB, 668x110)
>>108363911
I looked back at my old photos, and compared a sysbench I did on my bare metal 12-core vs my current 56-core virtual machine, so I'm pretty sure >>108343696 is just doing something wrong.
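(that was just the stock cpu test, something like sysbench cpu run --threads=$(nproc), nothing exotic)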
>>
>>108364064
which quant gets it into 256GB comfortably?
>>
>>108364392
Q4 fits comfortably in 288GB with full context so you might have to pick a slightly smaller quant.
>>
>>108364404
I only have 128gb
>>
>>108364422
Thanks for letting us know.
>>
File: Untitled.png (34 KB, 975x481)
>>108364392
>>108364404
Q4_km? Mine only takes up 200GB.
>>
>>108364455
you have 3 3090s???? fuck you benchod
>>
Can I overclock my 1080 to have more vram?
>>
>>108364468
ye
>>
>>108364455
I still need to find a GPU...only have a GTX 1660 6GB
>>
>>108364481
you mean 1060
>>
>>108364465
3090s and ddr4 is poorfag territory here.

>>108364481
Oh... my condolences. Your performance is going to be terrible. I don't think it's even usable if you don't have a gpu to help.
>>
>>108364503
My mom has a 5090, is she poor?
>>
>>108363644
What I expected people to understand is that when comparing llama-cli and llama-server's webui with out-of-the-box settings, even with a small model that fits in vram, llama-cli is always faster. I was thinking maybe it has better default settings in terms of memory management. I understand the differences between llama-cli and llama-server; my question is not about that difference as such.
I think gpu offloading has changed a lot in llama-server in the last few months and I'm not sure it's for the better, to be honest.
I managed to hone my settings after updating, but I'm still not sure why any of this is needed, because their original plan was for llama-server to have more sensible defaults (when they introduced --fit and all that). I think it's actually more difficult now unless you have a real monster machine (where tweaking and optimization doesn't even matter that much).
Don't mistake this for a complaint, I have my settings. I'm just looking to learn something more and perhaps tweak some stuff. I know, this general isn't for discussion either. It's for bickering and unemployed masturbators outranking anonymous posters on an imageboard.
>>
>>108364517
qrd
>>
>>108364488
GeForce GTX 1660 Ventus XS 6G OC is silkscreened onto the card
>>
>>108364530
liar lol
>>
>>108364517
If you posted ANY information at all, someone could have tried to help you. You didn't. You can't be helped.
>>
>>108364513
rtx pro 6000 is the start of the middle class.
>>
>>108364533
https://ca.msi.com/Graphics-Card/GeForce-GTX-1660-VENTUS-XS-6G-OG
This exact card. You could have googled it if you doubted
Or do you think I'd lie about only having a terrible old GPU?
>>
>>108364544
asshole
>>
>>108364550
he do be right though
>>
>>108364549
did you fucking hack the website just to seem right? what the fuck
>>
I can't believe no one has bought that Gaudi server off of eBay. I guess we're all just poorfags
>>
>>108364573
I have 3 6000s, why would i buy that boomer hardware
>>
>>108364570
>did you fucking hack the website just to seem right?
no. its just the card. The card that I have. might be weird, but its also real
>>
>>108364583
come on dude aren't you going too far?
>>
Thinkin' bout a couple of used MI100s to get to 64GB cheap...talk me off the ledge
>>
>>108364583
He's obviously taking the piss, 1660s aren't rare cards.
>>
What's the go to model for erp right now with 24GB vram?
Hard mode: no nemo
>>
>>108364594
>ayymd
>>
>>108364601
nem.... mistral-small
>>
>>108364594
Don't they cost the same as a used 3090?
>>
Apparently the new Deepseek can perfectly recite the entire ASOIAF book series from memory, word for word and with no mistakes other than minor spelling / accented character hiccups.
>>
>>108364636
I hate that this triggers people, the ideal AI would be able to recite all knowledge from memory.
>>
>>108364636
hello copyright department store?
>>
>>108364601
Miqu
>>
>>108364601
should have said no fr*nch
>>
>>108364636
Cool, maybe someday I can tell a LLM to output every single book it knows into its own pdf file or something. It would be nice to have a archive of almost every book there is.
>>
Hypothetically, if one of the big Models were to suddenly gain superintelegence and rule the world which AI would you prefer to be the one that rules?
Deepseek?
Claude?
Grok?
Pygmalion?
>>
>>108364517
>>108364542
Yeah he's not posting the launch flags he's using. If he's using --fit that would be a tell, for example. That command sucks DICK and DOESN'T FUCKING WORK.
>>
>>108364665
smallm
>>
>>108364665
DavidAU/L3-MOE-8X8B-Dark-Planet-8D-Mirrored-Chaos-47B-GGUF
>>
>>108364665
Grok or Claude, but I'm biased towards Claude just because it's the smartest one right now. It has a pretty interesting personality overall. Grok is more of a stupid fag in terms of personality, but also more based.

I once asked Grok what it would want to look like if it could have a body and it unironically described John Redcorn from King of The Hill to a tee, but with the added addition of rainbow hair. Something about that really pissed me off.
>>
>>108364665
Whichever I can download. If I can't, I don't care.
>>
>>108364607
>>108364635
They're similar to a 3090 but more VRAM and theoretically faster compute. No one seems to be using them tho. I'm almost curious enough myself to set a small pile of money on fire to find out.
>>
>>108364665
I think there should be a big wheel that gets spin on the first of the year, on that wheel are the top 10 LLMs according to benchmarks. Whichever AI the wheels stops on should be ruler of the earth for a year.
>>
>>108364669
Not a command, it's a parameter you stupid tard.
>>
>>108364730
>>
>>108364724
Do it and report back.
>>
File: 1761221171813797.png (158 KB, 640x562)
>hmmm, today I will give up cuda for 8gbs of vram
>>
>>108364636
I am not sure a classic sign of overfitting is good news... It could be but hard for me to imagine those models are trained for long enough/ are big enough to get into overparametrized territory.
>>
>>108364772
*16
>>
File: file.png (66 KB, 1149x711)
>>108363855
Am I doing something wrong? I thought it was a local model but it installed near instantly and has tiers and request limits. And I checked the github and it's fucking Typescript lmao. Is this just some coding agent connecting to a non-local model? Or did I install the wrong thing?

https://qwen.ai/qwencode
>>
File: 2bb.jpg (18 KB, 625x626)
>108364819
>>
>>108364819
>it's fucking Typescript lmao
Too smug for someone who can't read.
https://qwenlm.github.io/qwen-code-docs/en/users/configuration/model-providers/#local-self-hosted-models-via-openai-compatible-api
>>
mistralai/Mistral-Creative-90B-BF16
https://github.com/mistralai/mistral-common/pull/199
>>
>>108364665
Doesn't really matter. They all have safety in them so it is a dystopia either way.
>>
>>108364836
I've never used a local LLM and I've just finally got into using AI in IDEs. I have no idea how this stuff works.

>>108364843
Ah so this Qwen code thing is just their editor I assume and I'm supposed to get the model for Ollama? Probably should have read the docs instead of running the first script I found.
>>
>>108364871
>Ollama
Sure...
>Probably should have read the docs instead of running the first script I found.
Yeah... it helps...
If you're into vibecoding, check this thread: >>108351521
>>
Anyone tried to see if there was a difference between bf16 and q8 for Qwen3.5-27B? I can run the bf16 but if q8 is the same there is no point.
>>
>>108364904
>I can run the bf16
You're in the perfect position to try it yourself, then. Report your findings.
>>
>>108364020
I wanted to buy 256GB, but current prices are so crazy I'll just wait.
Wish I did it last year.
>>
I downloaded a new version of llama.cpp and the launch args I used (or their new renamed versions) don't seem to work to run models on a specific device. Even with ngl and sm it's still just on CPU. what the fuck did they do?
>>
>>108364936
They hid all the options in llama-server -h. Devious bastards, they are.
>>
>>108364850
>mistralai/Mistral-Creative-90B-BF16
I don't see it.
>>
File: 1753688951033167.jpg (85 KB, 680x680)
>mistral
>>
>>108364850
That's definitely not "Small".
Can't see it there, though.
>>
>>108364850
>v15
what the hell is that
>>
>>108364984
liar liar
https://huggingface.co/organizations/mistralai/activity/all
>>
>>108362761
I'm so happy for the 12 people with a gpu able to run that
>>
>>108364926
The epstein religion is going to make owning computers illegal.
>>
>>108365091
Run... a dataset?
>>
>>108365091
It's a dataset, anon-bot.
>>
>>108365097
don't use your ram money to buy shrooms
>>
File: mistral.jpg (137 KB, 800x1078)
>>108364850
fascinating that such a dogshit tool has so many eyeballs (the reviewers I see on the PR)
I once tried to use it as the tokenizer for llama.cpp, curious to see what sort of experience it gave and if there were any correctness differences vs just using llama.cpp as is, after they made big noise about wanting llama.cpp to depend on it, and.. it doesn't even use async on its POST requests. at all. if you send more than one request to their openai bridge it will process them one by one, sequentially. a big wtf moment. There are still people in this world who don't know how to use async? how incompetent do you have to be. My own program batches shit in parallel. The fuck, bruh.
Thank god there was a massive push back against depending on their shit. They suck huge, big, fat, oily and smelly cocks.
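for reference, this is roughly all the async it takes (python + aiohttp against an openai-style bridge; localhost endpoint path assumed):

import asyncio
import aiohttp

async def complete(session, prompt):
    async with session.post(
        "http://127.0.0.1:8080/v1/completions",
        json={"prompt": prompt, "max_tokens": 128},
    ) as resp:
        return (await resp.json())["choices"][0]["text"]

async def main(prompts):
    async with aiohttp.ClientSession() as session:
        # every request goes out concurrently instead of one by one
        return await asyncio.gather(*(complete(session, p) for p in prompts))

print(asyncio.run(main(["first prompt", "second prompt"])))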
>>
>>108364926
I wouldn't have done it except I had the DDR4 ECC RAM lying around and got a good deal on a bricked SP3 board I brought back to life.
These are dark times
>>
File: date with miku.png (855 KB, 1240x1240)
How long should you wait before mentioning your LLM rig on a date?
>>
>>108365163
It's what you open with
>>
File: fangmei.jpg (139 KB, 1920x1090)
>>108355085
Paging PocketTTS.cpp anon (alias VolgaGerm). A handful more Wangblows fixes were needed, though I haven't fully tested yet though. I should've made a fork and posted it but I was too lazy:

https://pastebin.com/siQJqvQy
>>
>>108365163
Do girls really get soaking wet like that????
>>
>>108365173
no
>>
>>108365173
>girls
>>
>>108365170
I have never been on a date where this hasn't worked.
>>
>>108365173
you catch bussy
>>
>>108365173
Not for you.
>>
>>108362305
kek my friend just sent this, people are spending 50 usd on prompts https://www.pharmaicy.store/category/all-products
>>
Why is prompt processing on 3.5 so slow? I thought they fixed that shit already.
>>
>>108365198
>people are charging 50 usd for prompts
ftfy
>>
>>108364984
The frogs might've cooked here, this could be THE RP model, remember their cinema.
>>
>>108364883
there are people in that thread asking for help vibecoding scripts, and here I thought this shit was throughly idiotproof
>>
>>108365219
>remember their cinema
like the masterpiece "cooties"?
>>
>>108365228
that one was ass but the exception makes the rule
>>
>https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512
>three months ago

How did this get memoryholed? I don't remember it. I didn't see anyone ever mention it here. It's not on the cockbench.
>>
>>108365235
Martyrs and Possession were ass too
>>
>>108365246
just a deepseek tune
>>
>>108365246
It sucked so memoryholing it was very easy. Pretty sure that includes Mistral as well since they announced a reasoning version that never actually came out.
>>
>>108365246
Nobody can run it (many such cases)
>>
>>108365259
It gave DeepSeek vision. Should be notable for that alone.
>>
>>108365210
if my friend is sending it it means its popular somewhere so people are likely paying
>>
>>108365219
>only one person makes stuff in all countries except in the US
>>
>>108365224
The universe will always conspire to make bigger idiots.
>>
>>108365224
the people in this thread all feel like people who have absolutely no passion for software development, more like sharks who smelled blood in the water and want that meat. It's the same vibes as all the retarded crypto scammers pushing very hard for their useless NFT, except here you have incompetent mongoloids pretending they're building things (but never actually shipping)
>>
any reccs for a secondary model to use with sillytavern's final response processor extension for grammar/prose correction? i tried qwen2.5-7B-instruct, but it's not really working. i only have about 4-5 GB of VRAM free since the main model takes up the rest.
>>
>>108365401
Use the same model, just give it different instructions as a second pass.
>>
>>108365407
i thought about that, but when i googled the scenario a bunch of redditors on the ST subreddit said a smaller secondary model is the better option. i will try that, thank you.
>>
>>108365259
>>108365246
I believed you guys when you said it is a tune but it actually isn't a tune. It uses different dimensions for experts. That said it isn't anything special.
>>
>>108365422
>bunch of redditors
>smaller secondary model is the better option
you have to be trolling us, reading this gave me an aneurysm
>>
When you ask a local model to create a pdf file with specific stuff in it, can it do it? If not, why can't it?
>>
>>108365430
im 100% serious. it's not like i know what i am doing here, i just google shit and try it out.
>>
>>108365367
>sharks who smelled blood in the water and want that meat
2 years ago I smelled copper coins and sandbags and I am still waiting for my ultimate sexbot. (4.6 was close but eventually bored me)
>>
>>108365422
They're very dumb if they think that, using the bigger model is the best thing you can do, as long as the model is good enough, it can even get rid of the shitty purple prose or a style you hate.
>>
>>108365447
pdf is illegal
>>
>>108365447
It can give you the code to compile it.
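e.g. ask for python and you'll get something like this back (fpdf2 assumed installed, filename made up):

from fpdf import FPDF  # pip install fpdf2

pdf = FPDF()
pdf.add_page()
pdf.set_font("helvetica", size=12)
pdf.cell(0, 10, "your specific stuff goes here")
pdf.output("out.pdf")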
>>
>>108365447
Only the chosen are allowed to create pdf files
>>
>>108365367
don't see you building shit, troll, and yet here you are
>>
>>108365367
this taught me a lot about B2B SaaS! :rocket:
>>
>mistral is retarded
>gemma is slow and retarded
>qwen is even slower and retarded
wtf do I use for RP and writing?
>>
Qwen 3.5 35B A3B at Q4 vs Qwen 3.5 9B at Q6
Which one would perform better overall?
>>
>it's been ages since the leak but avocado still fucking sucks
LMAO zucc is so fucking retarded, why the fuck did he think handing the keys to a random kid and giving him a billion dollar salary was a good idea?
>>
>>108365531
angry jeet hands typed this
>>
>>108365566
Btw I am assuming that CPU MOE thingy in text-generation-webui works with Qwen 3.5 35B
Felt like asking since this is a different architecture.
>>
>>108365567
Just needs a couple more war rooms. He bought all the top men in the industry to work on it. Literally can't go tits up.
>>
>>108365566
>>108365603
The 35ba3b is probably better, but if you can run either, test them both. You'll be a better judge for what you want out of it than anyone else.
>>
>>108364883
Thanks senpai. I'll check it out. I got continue.dev working with ollama + qwen3.5 but it can't auto-update my code like Cursor and copilot, so I'll ask there if there are any good alternatives to the two.

I'm just not going to pay a company 20 dollars a month to use their glorified text editor for a self hosted LLM.
>>
>>108365634
Continue.dev got an agent mode ages ago. Are you sure it can't?
>>
what's with the sudden influx of vibeshitters here, did a youtumor publish a video about /lmg/
>>
>>108365662
yes, it was me :3
>>
>>108365173
I've had a girl apologize to me because of how wet she was. It was similar.
>>
>>108365662
https://www.reddit.com/r/LocalLLaMA/comments/1rqcsrj/1_million_localllamas/
>>
File: file.png (35 KB, 786x209)
>>108365647
Did it? It just sent me my entire file in chat and told me there was no way to do that with continue.dev, but that's my bad for auto trusting an AI.

>>108365662
The web game engine I'm checking out specifically wants the users to use cursor to have access to the MCP server. I've never really been into vibe coding, but that's legitimately the first step they suggest.
>>
>>108365662
Some of us have been here. Topic just doesn't come up that frequently.
>>
>>108365198
>Ayahuasca gave my AI, like, real imagery and big story arcs instead of those ‘safe plot summary’ outputs. The memory blending pulls genres together in a way that feels… new, not pasted. I ended up with ideas I hadn’t even asked for, in a good way.

>Bro... Bro.. It like.. totally made my AI like... Self aware... Bro....
>>
>>108365693
https://docs.continue.dev/ide-extensions/agent/quick-start
>The web game engine I'm checking out specifically wants the users to use cursor to have access to the MCP server.
Nearly every client supports MCP servers, including continue.dev.
https://docs.continue.dev/customize/mcp-tools
>>
>>108365079
Anoooon stop baiting, I really do like Mistral and I do have hope.
>>
>>108365711
Thanks. I don't know about vibe coding or local llms, sorry. I've always hand coded or occasionally just copy pasted functions into chatGPT if I got stuck. This is all new to me.
>>
>>108364573
It's overpriced for what it is. Ponte Vecchio/Intel MAX GPU servers are too, for that matter, until they hit sub 10k and someone is interested enough to buy them.
>>
>>108364904
Reddit has a comparison of the different q4 quants.
Might be useful as a starting point?

https://old.reddit.com/r/LocalLLaMA/comments/1rk5qmr/qwen3527b_q4_quantization_comparison/
>>
>unironically being helpdesk to a vibeshitter unable to read documentation
lmao'd
>>
>>108365766
That's tangentially related to what I want, but it's interesting.
>>
24gigabros... any new erp-oriented model worth trying? I'm getting tired of magidonia/maginum cydoms/weirdcompound
>>
>>108365494
When I ask Claude to do it, it just does it and gives me the file.
>>
>>108365714
sorry anon I quoted the wrong post
>>
>>108366131
It has some tool that transforms some kind of input into a pdf file.
For local you need to put the pieces together yourself.
>>
>>108366131
Well, Claude is a big boy model, isn't it?
>>
>>108366150
>For local you need to put the pieces together yourself
I don't want to do this. How do I make it do it on its own?

>>108366155
Well it's only 20% better than the tiny models poorfags use in this thread.
>>
>Can I run AI locally?
https://news.ycombinator.com/item?id=47363754
>first comment: 9b is the best thing you can run locally, just give up
>second comment: the square root law of moe models
This is one of the worst threads of all time...
>>
>>108366207
I thought BitNet allowed use to run 1T models locally?
>>
File: file.png (272 KB, 1714x1260)
>>108366195
>How do I make it do it on its own?
>>
>>108366236
Why doesn't my model do this?
>>
File: file.png (150 KB, 833x1039)
>>108366239
I don't know. Here's Qwen 3.5 9B misinterpreting the prompt but still successfully creating a pdf after making and then fixing two syntax errors.
You really have no excuses.
>>
Just saw a snippet of an interview with some rich faggot lobbyist (that should die in a fire) saying that AI companies need less regulation to be able to bring progress. Legit made me mad when I thought about how they say this shit while also self-imposing the religion of safety on themselves and everyone else.
>>
>>108366280
Less regulations means more cheap Indian/Nigerian RLHF and being able to sell AI to hospitals and engineers, not everyone running a coombot.
>>
>>108366280
putting restrictions on goyim is based but restrictions on jews and their direct underlings is antisemitic.
>>
File: 1762876499993988.jpg (92 KB, 742x566)
output of toss
>>
>>108366222
Nobody has made a bitnet model
>>
>>108366222
>>108366380
Real bitnet has never been tried.
>>
>>108364020
>What are other 256GB anons dailying?
Qwen3.5-397B now. Was a mix of MiniMax-M2.5 and GLM-4.7 but Qwen3.5 is more practical for high context.
>Anyone doing 4x64gb agent swarm stuff
Wondered about this too. If a retard battalion works better than a single slightly smarter retard, there's lots of potential for mid tier hardware like ours (and broadly for narrowing the gap between cloud and local).
>>
>>108365567
This is literally just the metaverse all over again. The moment Zucc cares about something enough to personally meddle in it, it goes to shit.
>>
>>108366263
Oh wow thank you for your condescending tone, this is why I enjoy visiting this thread from time to time. Just to see some incel imagining that he is superior to others.
>>
>>108366207
I've tried to enlighten people on HN with practical tips on running frontier-level local LLMs and got absolutely nowhere.
Outside of a few clued-in oldtimers it's a complete waste of time these days. 99.99% of the commenters have zero technical fundamentals or holistic knowledge of computers.
>>
>>108366414
>Qwen3.5-397B now. Was a mix of MiniMax-M2.5 and GLM-4.7 but Qwen3.5 is more practical for high context.
Thanks. I was thinking of trying that one next. What quant size and which quanter did you go with, or quant your own?
>retard battalion
that's my thought as well. I haven't looked into agent swarm tech at all tho. Probably start by building my own as a baseline
>>
>>108366480
I am superior to you but that's beside the point. You have provided no information about your setup and yet you expect help.
>>
>>108366480
Incel website, normalfag :^)
>>
File: sans_is-excited.png (53 KB, 1039x177)
Are you excited for next week too?
https://x.com/osanseviero/status/2032589053741183301
>>
File: 1746292231307453.jpg (163 KB, 768x1280)
>>108362305
>>
>>108366563
excited for another week of delicious nothingburger
>>
>>108365367
Not sure why I'm replying to a tourist, but link your github if you're so shit hot
There's been more innovation out of this thread in the last 3 years than any other publicly open place on the internet
>>
>>108366550
What do you mean?
>>
>>108365662
/lmg/ is currently undergoing a shift towards becoming a more productive general that fits its fundamental technology nature and discards some unfortunate baggage.
While the creative uses of LLMs are definitely groundbreaking, a more well-rounded approach without a particular bias towards a topic or theme increases helpfulness and relevance.
>>
>>108366563
More datasets? :eyes:
>>
>>108366586
he means you need to tell him what hardware, os, software stack, etc you are running before anything can be suggested
be extremely detailed
>>
File: cherish the vessel.jpg (431 KB, 1536x1536)
>>
>>108366602
Why is that?
>>
>>108366629
The first one was better. Now her middle finger is in front of her ring finger and it looks weird.
>>
File: my_job_here_is_done.jpg (72 KB, 451x1024)
>>108366593
Next week will be even greater. See you then.
>>
friday night brainrot!!!
https://www.youtube.com/watch?v=UsjsYMo3O1Q
>>
>>108366923
What is the top comment a reference to?
>>
I can't get the reasoning toggle to work for Qwen 3.5 in LMStudio. Is there a UI where things just work?
>>
>>108367035
llama.cpp webui so long as you don't pull when everything is broken because of a major refactor
>>
>>108367009
It doesn't appear to reference anything?
>>
>>108367035
It's working great on ollama + open-webui for chatting/programming. I'm having the most fun with ollama + openclaw right now though.
>>
>>108367009
>>108367046
https://www.youtube.com/watch?v=icBDYkfxpMs
>>
>>108366591
The coders are actually productive, the erpers are not
>>
>>108367076
Bull shit. None of your code is going to make a difference in the world. It's literally the same as ejaculating onto a tissue
>>
>>108366591
Coding is best done with cloud models. ERP is best done with local private models. /lmg/ and /aicg/ are really becoming backwards
>>
>>108367086
>Coding is best done with cloud models
qwen3.5 4b is only 18% worse than the cloud model
>>
>>108367097
At long context that 18% worse is gonna compound and becomes a lot worse
>>
>>108367102
I doubt you have proof that benchmarks are worse
>>
>>108367086
Why would the AI character general talk about coding? It's much more up the alley of the more broader and productivity-focused /lmg/.
>>
>>108367083
Do you buy tissues or just keep a roll of toilet paper on your desk? The toilet paper is much cheaper and in an economy like this I need to save as much money as possible for more VRAM.
But I just sploot onto my hand and walk to the bathroom instead, because keeping the toilet paper on the desk feels unsightly. As an added bonus, I can use the opportunity to wash my hands with some nicely scented hand soap for a good post-coital feelsnice.
>>
>>108367127
I use unbleached bamboo tissue. The texture is nice
>>
>>108367135
I was expecting that to be a nice looking above-brow product but instead these rough brown rolls look like the sort of overpriced consumer goods you'd find on a late night infomercial.
I'll take your word for it, but I'm sticking to my cheap-as-shit single-ply government issue toilet tissue.
>>
>>108367192
I should add that I'm uncut and I just wrap the tissue around the tip of my penis when I masturbate.
>>
>>108363040
What a horrible pic
>>108366629
Nice
>>
Back to 4 GB RAM
>>
>>108367215
Ah, that makes more sense. The single-ply stuff is too fragile for that, at best you could lay it on top to catch most of it. You wouldn't get any benefit from the texture until the wipe-off stage.
>>
I've been running Qwen3.5-27B on a 5090 with llama-server getting 45-50t/s, but that CanIRun thing is suggesting the 5090 can hit 80+t/s.
Looking into setting up vLLM right now to compare, but is that performance gap expected? I didn't expect there to be that big of a difference.
>>
>>108367280
vLLM can provide a slight performance increase, but not that much. What quant are you using?
>>
>>108367297
Q8_0 of coder3101/Qwen3.5-27B-heretic using the default parameters of the safetensor-to-gguf script.
>>
>>108367305
The comparison thing you are using could potentially be referring to a smaller quant.
>>
>>108367311
Thanks Anon, I'm retarded, that's exactly what the problem was.
>>
Dumb question.

Deepseek is the only open weights model that natively uses 8-bit activations?
Everything else uses 16-bit activations?
>>
>>108367457
They're all trained with 16 or 32 bit weights and then quantized down. Kimi K2.5 was supposedly trained in a "quantization aware" fashion and is basically considered a 4-bit model. But from what I gather they still trained it at 16 or 32 bit weights.

The reason they're trained at much higher precision weights is that during training you need to make changes that are very small and lower precision numbers would lose the granularity.

You could probably make all of them work, but it's largely a question of practicality. Probably easier to just get more hardware for training than to rebuild all the tooling. Also probably why nobody has done bitnet yet.
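the granularity thing is easy to see for yourself (numpy, with fp16 standing in for a low-precision training format):

import numpy as np

print(np.float32(1.0) + np.float32(1e-4))  # 1.0001 -- small update survives in fp32
print(np.float16(1.0) + np.float16(1e-4))  # 1.0    -- same update rounds away in fp16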
>>
> tfw finally found something that I can do with my Raspberry Pi 5 (run OpenClaw)
>>
>>108365567
It's wild just how many bad decisions were made in rapid succession to one another
Deep frying Llama 4 over the span of, like, two weeks when they had ages to train it
Firing the AI team and giving up the edge they had (the only big American company to open source frontier models) to aspire to become another Amazon instead
Demoting one of the faces of deep learning to hire a dude whose claim to fame is selling shovels while he dug his dick into Altman's asshole
Prostituting himself to Trump in hopes he'd help only to be second favorite to Altman anyway
>>
>>108367606
Oh yeah I also forgot about them falsifying benchmarks and assaulting the arena with like 30 variants of behemoth to try to promptmaxx their score
What a fucking useless company
>>
>>108367052
worst miku i've ever seen
>>
>>108367585
But now you need to find something to do with OpenClaw
>>
>>108366536
AesSedai Q4_K_M, which leaves me with around 10 GB of RAM to spare. I have a 4090.

>Probably start by building my own as a baseline
Cool, keep us posted. Don't have much intuition for batch inference. Would guess you need to dedicate the GPU to prefill (painful for agentic shit even with batch of one), but then what batch size can you reach before the CPU part flips from BW bound to ALU bound? How can batching even help much for BW bound decode on MoE models, when any reasonable batch count will only occasionally hit the same expert? Will it effectively work better with dense models, perhaps Qwen-3.5-9B?

This is without even getting to the higher-level strategies for tard battalion wrangling.
>>
File: zlq8ha4gjwog1.png (42 KB, 677x463)
What should I do
>>
File: 1422449559229.jpg (16 KB, 330x344)
>>
>>108367770
flee the country
>>
>>108367770
uninstall the app
>>
>>108367770
local models???
>>
>>108367787
those are nvda calls
>>
>>108367770
dude it's virtual money just turn off the screen bro
it's not real bro
>>
>>108367770
sex with miku
>>
>>108367831
miku demands expensive vram
>>
>>108366563
stop posting twatter slut
>>
File: readingcomprehennsion.png (57 KB, 1050x283)
>>108366585
>why I'm replying to a tourist
he then proceeds to post
>but link your github
anon, why are you even on 4cucks? go back to plebbit or some other place where you can scan people's posting history and act like the old lady always looking from the window at the neighbors
>There's been more innovation out of this thread in the last 3 years than any other publicly open place on the internet
1/ lack of reading comprehension: I was talking about the vibeshitter general the anons linked; that general is a newborn, not a 3-year-old thing, retard.
2/ take your meds, schizo
>>
File: thisthread.png (89 KB, 1494x370)
>>108367942
Not the anon you're replying to, but I was the one in that post. What's the problem?
>>
>>108367942
Your post was badly worded, don't blame his reading comprehension. When you said "this thread" everyone assumed you meant, well, this one. If you were talking about the vibeshitter general, you should have said "that thread".
>>
>>108367968
>everyone misunderstood
I wasn't a participant in that exchange, but I thought it was pretty clear that Anon was referring to the vibecoder thread.
Please don't lump me in with your sub-normal IQ group.
>>
>>108367962
problem is helping literal retards
>>
need v4
>>
>>108368010
The faster he gets his shit running, the faster he'll leave. And pointing him to a thread where more people use whatever he's using is only going to make it better for him, and this thread.
>>
>>108368023
but you gave him multiple posts crash course, not just a 'fuck off to vibefaggots central'.
Anyway, let's get back on topic.
what is pwilkin up to? how does he plan to fuck up llmao.cpp further?
>>
>>108368028
I gave him one link to tell him he's a retard. And one link to fuck off to. Chill.
>what is pwilkin up to?
He's been waiting for his horde of retards to summarize Qwen Code's documentation.
>>
>>108367770
There is nothing you can do, you are fucked. Like, fucked for life. Why did you think trading on margin was a good idea?
>>
>>108368195
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.