/g/ - Technology

File: 1746779659743174.png (83 KB, 939x571)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108356979


►News
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
mikubros...
>>
I want to hire a freelance ML engineer to build a fine-tuned local LLM (something like Llama 3) trained on screenplays and screenplay theory books, to adapt novels and short stories into screenplays.

How much would that cost?
>>
still gooning to nemomix 12b
>>
>>108362352
>How much would that cost?
more than it's worth, finetrooning a shitty llm like llama is a form of self delusion
>>
>>108362390
What would you recommend or why wouldn't you bother with this type of project?
>>
>>108362514
>roon
imagine listening to the thread schizo
>>
>>108362514
just prompt a SOTA api model, it will be cheaper and actually produce something that might be used. finetrooning is a dying retard meme, there's a reason why you even thought of llama and not, you know, any of the more recent actually good models, finetrooners are stuck in the past
>>
>asking questions in the earlybakershizo thread
>>
>everyone I don't like is the baker
>>
>switch from text completion to chat completion in sillytavern
>no more constant formatting errors but now qwen 3.5 eats up all my tokens just thinking
>it ignores me when I tell it to keep it short
What do? I raised the response limit but even at 1000 it tends to spend the whole time thinking. Or randomly not think and shit out a massive reply.
>>
>>108362590
Sounds good, I'll start from there, sounds smarter and cheaper.
>>
Miku is BLACKED coded
>>
Miku is TETO coded
>>
File: sans_oss-waxal.png (534 KB, 1036x1771)
Wake up /lmg/, here's a new open source release from Google.

https://huggingface.co/datasets/google/WaxalNLP
https://huggingface.co/datasets/google/WaxalNLP
https://huggingface.co/datasets/google/WaxalNLP
https://xcancel.com/osanseviero/status/2032452729059045881

>The WAXAL dataset is a large-scale multilingual speech corpus for African languages, introduced in the paper WAXAL: A Large-Scale Multilingual African Language Speech Corpus.
>>
>>108362684
>response limit
what you need is the new reasoning budget sampler, not a whole response limit. get the latest llama.cpp from master if you don't have it yet. then do -h and read about the --reasoning, --reasoning-budget-message and --reasoning-budget flags
it works great, barring some bugs you're unlikely to encounter; it will interpret <think> from your own prompt as the signal to start token counting, but there's no reason to have <think> in your own prompt except for trying to summarize /lmg/ or some other llm topic
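something like this, from memory, so double-check the exact spelling against -h:
llama-server -m model.gguf --reasoning-budget 512 --reasoning-budget-message "Okay, enough thinking."
that caps thinking at ~512 tokens, after which the message plus the closing tag get force-inserted.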
>>
>>108362761
And March isn't even over yet! :eyes: :rocket:
>>
>>108362761
>Di WAXAL dataset na one big speech corpus wey get plenty African languages inside, wey dem first introduce for di paper wey dem title am "WAXAL: A Large-Scale Multilingual African Language Speech Corpus."
>>
>>108362761
the desperate pleas from randos on social media begging big corpo to give them some leftovers feel extremely cringe, I physically wince reading those comments, there's both a form of desperation and entitlement in this, very turdworlder-ish mentality
hand over the gibs
>>
>>108362803
Almost uncanny, except they also throw in some native African words in there when dey feel like it.
>>
>>108362795
>tfw a koboldcuck
WTF bros when will we get this feature???
>>
wonder if we could measure the IQ of an llm mainly trained on something like swahili, a language that has no concept of "to have" (the internet will tell you there's a word for it but that word really means "be with") or "maintenance"
>>
>>108362761
humiliation ritual for all sides involved
>>
Gemma 4 will be released during African American History Month :hugging_face:
>>
File: 1739350650462622.png (151 KB, 900x750)
►Recent Highlights from the Previous Thread: >>108356979

--Performance benchmarks for DDR4 e-waste builds running large models:
>108360259 >108360393 >108360534 >108360558 >108360672 >108360740 >108360813 >108361314 >108361562 >108361617 >108361645 >108361748 >108362542
--Meta delays Avocado model due to performance issues:
>108358784 >108358827 >108358850 >108358852 >108360179 >108360199 >108360235 >108360863 >108360872 >108360918
--AMD GPU LLM support via Vulkan in llama.cpp:
>108360352 >108360492 >108360539 >108360572 >108360645 >108360656 >108360665
--Qwen3.5 safety filters and finetuning workarounds:
>108359178 >108359219 >108359222 >108359229 >108359262 >108359273 >108359289 >108359516
--Local models for programming concept explanations vs GPT-5.4:
>108358651 >108358686 >108358760 >108358770 >108358692 >108358713 >108358788
--Interactive Claude RSA encryption visualization:
>108357211 >108357225 >108357266 >108357398 >108357426 >108357684
--Prompting techniques to force detailed story generation:
>108359520 >108359563 >108359644 >108359616
--Hybrid local/cloud agent workflows for cost-efficient research:
>108359747 >108359883 >108360570 >108360786 >108361105 >108359937 >108361124
--Benchmark limitations distorting model quality assessment:
>108360223 >108360227 >108360241 >108360274 >108360301
--Agentic coding workflow and context management challenges:
>108361171 >108361189 >108361194 >108361218
--Geometric kernels on manifolds, meshes and graphs:
>108358415
--Regression caused by lost Jinja template fix during refactoring:
>108361233
--Miku (free space):
>108357631 >108358345 >108360328

►Recent Highlight Posts from the Previous Thread: >>108356980

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108362761
POG
WAKANDA FOREVER
>>
If anyone needed more reason to hate anthropic and meta:
https://github.com/rmusser01/meta-lobbying-and-other-findings
https://www.cnbc.com/2026/02/12/anthropic-gives-20-million-to-group-pushing-for-ai-regulations-.html
>>
>>108362795
>When the reasoning starts, we count the number of tokens and when the given number of reasoning tokens is reached, we force terminating the reasoning.
You could pretty much do this already with a grammar rule. They really need to figure out a sampler that nudges inference toward tokens that make the model "wrap it up" on its own.

Constrained beam search kinda does this but it's a big performance hit.
>>
File: banana republic.jpg (84 KB, 1024x683)
>>108362986
there's a reason why the zuck kept fellating the orangeman.
>>
>>108363032
>You could pretty much do this already with a grammar rule
you can do a lot of things with the grammar sampler but the slowdown from using it is real
and can anything beat the convenience of just giving a token budget to a cli flag?
>>
>>108362965
Thank you Recap Miku. I will catch whoever hit your head.
>>
>>108363053
A grammar rule for this behavior has barely any impact on performance since it just looks at how long the text after <think> is then forces the output of </think>
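a rough sketch of such a grammar, for the curious (GBNF; counts characters rather than tokens and bans '<' inside the think block, so treat it as a toy):
root ::= "<think>" [^<]{0,2000} "</think>" .*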
>>
>>108363032
>You could pretty much do this already with a grammar rule
I'm fairly certain that they are leveraging grammar in some capacity to implement that.
>>
>>108363082
>I'm fairly certain that they are leveraging grammar
I'm fairly certain that they are using grammar

You're welcome.
>>
>>108363082
I looked at the commit, although I'm not very fluent in C++. It looks to be a custom implementation from scratch.
>>
>>108362761
loooooool
>>
>>108363082
>I'm fairly certain that they are leveraging grammar in some capacity to implement that.
why be so confidently wrong when you could look at the code
https://github.com/ggml-org/llama.cpp/blob/master/common/reasoning-budget.cpp
it's a state machine that just counts tokens as they pass.
} else if (ctx->state == REASONING_BUDGET_COUNTING) {
    ctx->remaining--;

it starts the sampler by matching the <think> </think> tokens.
server-task.cpp initialization:
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-task.cpp
if (!start_tag.empty()) {
    params.sampling.reasoning_budget_start = common_tokenize(vocab, start_tag, false, true);
}
if (!end_tag.empty()) {
    params.sampling.reasoning_budget_end    = common_tokenize(vocab, end_tag, false, true);
    params.sampling.reasoning_budget_forced = common_tokenize(vocab, message + end_tag, false, true);
}

I gotta know the code well enough since I fixed a few edge cases for my autism on my local branch.
>>
>>108363151
Imagine being this smug while fundamentally misunderstanding how the sampling chain actually works. You’re looking at the logic for the "budget" (the count), but you're completely ignoring how the "forced" end tag is actually injected into the distribution.

The "grammar" the other anon is talking about isn't literal GBNF files in this context; they’re talking about **constrained sampling**.

When `ctx->remaining` hits zero, the sampler doesn't just "stop counting." The `common_sampler_apply` logic uses that `reasoning_budget_forced` sequence you literally pasted to bias the logits. It forces the model to predict the closing tag (e.g., `</think>`) by zeroing out the probability of every other token in the vocabulary.

If you actually look at the sampler implementation in `llama.cpp`, it’s effectively a hard-coded dynamic grammar. It transitions the internal state from "allow anything" to "force these specific tokens" once the budget is exhausted.

1. **Detection:** It watches for the start tag to enter the counting state.
2. **Subtraction:** It decrements the budget per token generated.
3. **Constraint:** Once the budget is $\le 0$, it intercepts the sampling process.

It’s a finite state machine (FSM) acting as a grammar constraint. The other anon is right in principle: the engine is being told "the only valid next tokens are these." Just because it’s written in C++ logic instead of a `.gbnf` file doesn't mean it isn't a grammar-based constraint on the output space.

Next time, instead of just grepping for "budget," look at how the sampler actually handles the `forced` tokens in the logits processor.
>>
>>108363187
thanks bot
>>
>>108363189
Ad hominem
>>
Is it possible to feed Qwen3-TTS an audio with someone speaking and "converting" the voice into a generated one?
>>
>>108363187
post your llm slop elsewhere
this is like doing a prefill, since when do we talk of prefilling as being a grammar? kill yourself, lower life form
>>
I wish /lmg/ mascot was cute...
>>
>>108363196
I realize how retarded that sounded, I want to change the voice of the original audio...
>>
>>108363196
Voice cloning? Yes.
>>108363211
Ah. You need ASR for that. whisper (with whisper.cpp or whatever) to convert audio to text, then any tts to speak it out.
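lazy python sketch if you go that route (openai-whisper package; the speak() call at the end is a made-up placeholder for whatever TTS you use):

import whisper  # pip install openai-whisper

model = whisper.load_model("base")
text = model.transcribe("input.wav")["text"]
# speak(text)  # plug in your TTS of choice here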
>>
>>108363200
the lmg symbol is the gaping wound with a septic smell
>>
>>108363198
Calling a logit mask a "prefill" is peak midwit behavior. Prefilling is just KVC warming; it has zero to do with the actual sampling constraints that happen during the autoregressive step. When the reasoning-budget code hits zero and forces the end_tag, it's literally pruning the entire vocabulary tree down to a single allowed transition.

In any CS 101 context, an FSM that restricts the language of a generator is a grammar. Whether it's a .gbnf file or a hard-coded C++ conditional, the mathematical result is the same: the output space is being constrained to a specific set of rules. If you think a state machine managing string transitions isn't a grammar, you need to go back to school and stop larping as a dev because you can read a single .cpp file.

It's a constrained sampler. Period.
>>
>>108363187
001001100110001100111
>>
>>108363237
Racism isn't welcome around these parts.
>>
>>108363245
I don't think we're posting on the same website.
>>
>>108363229
>In any CS 101 context
post hands
>>
>>108363247
Proof?
>>
>>108363252
I don't have hands.
>>
>>108363225
>Voice cloning? Yes.
It works incredibly well, I just started playing around with it and I am quite impressed. Unfortunately it seems that I can't influence a cloned voice with instructions (tonality and timbre, for example).
So I was thinking to record myself and use the Qwen to change the voice.
>ASR
>whisper
I am familiar with Whisper, but during the process I would again lose the control I need.
But thanks for your suggestions Anon.
>>
>>108363225
TTS wouldn't match the original audio and definitely wouldn't preserve any background sounds.

>>108363196
Look into RVC.
>>
>>108362761
Does it have any safety measures if you ask it to generate ook-ook sounds?
>>108362795
>get the latest llama.cpp from master
Now that is the most subtle /lmg/ trolling I have seen in a while.
>>
>>108363253
Cogito, ergo sum
>>
>>108363278
>>108363253
>>
>>108363267
>RVC
I used it in the past, but the quality is nowhere near what the new TTS model offers. The whole process is also more labor-intensive.
So I was hoping I could switch to something more state of the art.
>>
File: file.png (546 KB, 1087x853)
>>108362803
>>108362853
Can your model talk in pidgin?
>>
>>108363291
>bbc pidgin still exists
the british tax payers are wonderful people
>>
>>108363187
imagine being retarded enough to gen this and post it.
>>108363198
I'll assume you didn't bother reading the full slop to absorb the stupidity, for which I can't blame you. The supposition was the same path used in llama.cpp for processing GBNF grammars was used internally for the thinking budget logic, which the retarded post acknowledges is entirely wrong but then tries to say is right "in principle."

If there were a way to impose the death penalty on everyone who posts LLM-generated content online nothing of value would be lost.
>>
>>108363317
Facts don't care about your feelings.
>>
>>108363321
But they do care. My feelings motivate real-world changes, i.e. facts. It's admittedly unlikely but still possible one of those changes could be a jihad to execute every LLM-posting retard.
>>
>>108363290
Haven't seen modern models offer more than inpainting. No one wants to touch voice changing because of the potential for abuse.
>>
Miku would definitely cheat on you if she's your gf
>>
>>108363378
>no one wants to touch voice changing because of the huge incentive to do it.
?
>>
>>108363413
Would Ani?
>>
>>108363418
Ani would insist on an open relationship.
>>
>>108363419
What if you chained her up and kept her in your basement though?
>>
>>108363413
Only if the other guy was black and had a huge cock.
>>
Why is llama-cli so much faster than llama-server out of the box? How can I check how cli distributes gpu/ram split and what other options it uses? -v parameter isn't that helpful.
>>
>>108363483
Less overhead. Also check whether your frontend has a settings override.
>>
>>108363505
Less overhead? Your post doesn't make any sense.
>>
>>108363532
llama-server is built for serving a production-level setup, including all the overhead that involves
>>
>>108363549
You still did not understand my original post, and did not even answer it.
I was talking about matching the potential settings. I guess I'll read the source code then.
Fuck you, cretin.
>>
hello LLM people, what's a good local model with tool calling to do occasional simple tasks like "rename all files sequentially in this folder to xyz" or "use ffmpeg to convert this video into this resolution and format" etc
also not sure which backend and terminal agent to use, there's so many, pls spoonfeed me
5090 and 96gb vram
>>
it is so tiresome. ai slop is everywhere. the internet is increasingly unusable
>>
>>108363549
>production level setup
lol hahahahahahahahahahahhah
>>
and so concludes the last ever week of the pre-deepseek v4 era
we've made it
>>
>reasoning_budget = 0 fucks up my LLMs
FUCK YOUY PWILKIKJNG KJILL URESELFLE
>>
>>108363609
https://huggingface.co/deepseek-ai/DeepSeek-V4
https://huggingface.co/deepseek-ai/DeepSeek-V4
https://huggingface.co/deepseek-ai/DeepSeek-V4
>>
File: 1741961940703158.png (27 KB, 970x527)
>>108363630
FUCK U
WHAT THGE FUCK DO U EVEN TEST UR SHIT U FUICKING FAGOT
HOYL FUCKING SHIT
>>
>>108363483
Most options default to the same on both. Are you using the default webui or some client? Are you using grammars on server? Are you swapping? How much free ram do you have? Also check -cram or set it to 0. Maybe you're too tight on memory and the overhead of the server makes it go over.
Try with a small model that fits on your gpu entirely. If it runs the same speed on both. What options are you using to run them?
>>
>>108363637
https://github.com/ggml-org/llama.cpp/pull/20424
I hate this faggot SO FUCKING MUCH
HATE
HATE
HATE
HATE
>>
>>108363589
devstral small 2 mistral vibe and ollama are very simple ez and work goodly
>>
File: 1749822566244970.png (98 KB, 922x676)
Has anyone tried this?
https://github.com/ikawrakow/ik_llama.cpp/pull/1243
I haven't pulled yet for obvious reasons.
>>
how many weeks before we start optimizing models instead of making them larger?
>>
>>108363673
what do you want to "optimize" exactly?
>>
Ran an mcp server to link qwen 3.5 35b to my searxng instance. Works pretty nice. Not a fan that it uses local storage to define everything though.
Is there a way to define mcp servers when you run llama-server? Also have some enabled by default on new chat? Right now I have to re-add the mcp server in the llama.cpp frontend every time my browser cache clears and manually toggle it on for every new chat.
>>
>>108363630
use --reasoning off instead; off the top of my head the logic should be sane and do the same thing as the chat template kwargs, while reasoning-budget at 0 will now trigger the budget sampler path that was added to restrict output to a token budget, and at 0 it will forcefully insert closing tags with no regard for what the specific jinja templating is supposed to do. If your model emits a <think> it will throw in another </think> closer immediately. The reasoning budget sampler has many edge cases, e.g. it will trigger the token countdown if it sees <think> in your prefill.
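concretely, either of these (the second assumes your template exposes Qwen-style enable_thinking, check your jinja):
llama-server -m qwen3.5.gguf --reasoning off
llama-server -m qwen3.5.gguf --chat-template-kwargs '{"enable_thinking": false}'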
tbdesu tool calling with llama.cpp was always kinda iffy, and with the vibe coded claude slop it's not gonna get any better
>>
>>108363707
but it also happens when I pass reasoning = off (along with reasoning-budget = 0). So I should either pass one or the other? how fucktarted is it that whatever code they have in place doesn't first check 'reasoning off' (which just uses the template kwarg btw, another retarded change by pwilkshit vibeshitter).
>>
>>108363721
>when I pass reasoning = off (along with reasoning-budget = 0).
do NOT pass any reasoning budget thing at all now.
Only use --reasoning or the --chat-template-kwargs route.
>>
Do companies just let IQ 85 retards slop up their models? Facebook and Google could easily filter for educated people.
>>
File: 1751509162881368.png (52 KB, 900x493)
>>108363721
>>108363731
yeah just tested it by passing reasoning off
Still, a retarded change.
>>
>>108363647
He looks like someone that loves hatsune miku
>>
>>108363583
Use case for reading comprehension?
>>
>>108363760
>reading comprehension
qrd?
>>
File: if.png (141 KB, 1138x606)
>>108363741
it was an unnecessary change, yes, and pic related is the logic applied
if it's not the default -1 value it triggers the sampler path, and like I said, that sampler has many issues. It works fine for basic prompting but it's not something to rely on for tool calling for sure.
>>
>>108363762
Welcome to lmg, I love you.
>>
I'm not paying cursor or openai shit. What's a good model for coding that I can slap into my continue.dev extension on vs code?
>>
>>108363834
Largest qwen 3.5 you can fit
>>
>>108363855
Ight I'll give it a go.
>>
>>108363855
>Qwen 3.5
Doesn't that require some weird sampler settings?
>>
>>108362305
consentsisters, I don't feel so good.
>>
>>108363834
minimax, GLM 4.7, GLM5 or Kimi
>>
>>108363894
>Kimi
Kimi what?
>>
>>108363903
k2.5
>>
>>108363903
Linear
>>
Niggas ITT always ask for coding models and never code shit.
>>
>>108363686
10b model that performs as well as 100b model
>>
>>108363940
Bwe, I aint going to shit up the internet with retarded ai slop code. I just need something that works for me.
>>
>>108362383
>>108360259
>>108360534
Reposting here... What are other 256GB anons dailying? Anyone doing 4x64gb agent swarm stuff locked to CCDs?
I got much better performance with 235b thinking requanted to Q2 and locked to numa nodes 0,2 and 1,3, but I'm pretty sure it's not an ideal model anymore.
>>
>>108364020
yeah
>>
>>108364020
I use GLM 4.7 for programming.
>>
>>108363997
and I want 8k videos to be as small and as easy to decode than 1080p
>>
>>108364106
i always get the sense that you stalk this thread 24/7 to play your stupid little argument games you want to win. maybe that's why you made this thread so incredibly annoying over the last 2 years. every time someone comes here to write anything you try to combat them. why not just go play some pvp games you nodev loser?
>>
>>108364117
>conjuring arch nemesis in his head
I'm literally a newfag on /lmg/ since about last year's fall. urgently seek help
>>
File: 1744425693599542.png (231 KB, 480x453)
>>108360492
>>108360572
It's good to know that with amd gpus, it's not all doom and gloom in this regard
Thank you very much
>>
>>108364133
The amd thing hasn't been an issue for a couple of years at this point. It's just a leftover vibe from the ai hands days.
>>
>>108364133
>it's not all doom and gloom in this regard
compare the prefill t/s on models like qwen 3.5
have fun
amd is for people whose time has no value
>>
>>108364181
hey can you be a little more respectful?
>>
File: Dipsy_sitting.jpg (509 KB, 846x1300)
Where the HELL is it?
>>
>>108364274
check op
>>
File: sysbench.png (53 KB, 668x110)
>>108363911
I looked back at my old photos, and compared a sysbench I did on my bare metal 12-core vs my current 56-core virtual machine, so I'm pretty sure >>108343696 is just doing something wrong.
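(that was just the stock cpu test, something like sysbench cpu run --threads=$(nproc), nothing exotic)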
>>
>>108364064
which quant gets it into 256GB comfortably?
>>
>>108364392
Q4 fits comfortably in 288GB with full context so you might have to pick a slightly smaller quant.
>>
>>108364404
I only have 128gb
>>
>>108364422
Thanks for letting us know.
>>
File: Untitled.png (34 KB, 975x481)
>>108364392
>>108364404
Q4_km? Mine only takes up 200GB.
>>
>>108364455
you have 3 3090s???? fuck you benchod
>>
Can I overclock my 1080 to have more vram?
>>
>>108364468
ye
>>
>>108364455
I still need to find a GPU...only have a GTX 1660 6GB
>>
>>108364481
you mean 1060
>>
>>108364465
3090s and ddr4 is poorfag territory here.

>>108364481
Oh... my condolences. Your performance is going to be terrible. I don't think it's even usable if you don't have a gpu to help.
>>
>>108364503
My mom has a 5090, is she poor?
>>
>>108363644
What I expected people to understand is that when comparing llama-cli and llama-server's webui with out-of-the-box settings, even with a small model that fits in vram, llama-cli is always faster. I was thinking maybe it has better default settings in terms of memory management. I understand the differences between llama-cli and llama-server; my question is not about that difference as such.
I think gpu offloading has changed a lot in llama-server in the last few months and I'm not sure it's for the better, to be honest.
I managed to hone my settings after updating, but I'm still not sure why any of this is needed, because their original plan was for llama-server to have more sensible defaults (when they introduced --fit and all that). I think it's actually more difficult now unless you have a real monster machine (where tweaking and optimization doesn't even matter that much).
Don't mistake this for a complaint, I have my settings. I'm just looking to learn something more and perhaps tweak some stuff. I know, this general isn't for discussion either. It's for bickering and unemployed masturbators outranking anonymous posters on an imageboard.
>>
>>108364517
qrd
>>
>>108364488
GeForce GTX 1660 Ventus XS 6G OC is silkscreened onto the card
>>
>>108364530
liar lol
>>
>>108364517
If you posted ANY information at all, someone could have tried to help you. You didn't. You can't be helped.
>>
>>108364513
rtx pro 6000 is the start of the middle class.
>>
>>108364533
https://ca.msi.com/Graphics-Card/GeForce-GTX-1660-VENTUS-XS-6G-OG
This exact card. You could have googled it if you doubted
Or do you think I'd lie about only having a terrible old GPU?
>>
>>108364544
asshole
>>
>>108364550
he do be right though
>>
>>108364549
did you fucking hack the website just to seem right? what the fuck
>>
I can't believe no one has bought that Gaudi server off of eBay. I guess we're all just poorfags
>>
>>108364573
I have 3 6000s, why would i buy that boomer hardware
>>
>>108364570
>did you fucking hack the website just to seem right?
no. its just the card. The card that I have. might be weird, but its also real
>>
>>108364583
come on dude aren't you going too far?
>>
Thinkin' bout a couple of used MI100s to get to 64GB cheap...talk me off the ledge
>>
>>108364583
He's obviously taking the piss, 1660s aren't rare cards.
>>
What's the go to model for erp right now with 24GB vram?
Hard mode: no nemo
>>
>>108364594
>ayymd
>>
>>108364601
nem.... mistral-small
>>
>>108364594
Don't they cost the same as a used 3090?
>>
Apparently the new Deepseek can perfectly recite the entire ASOIAF book series from memory, word for word and with no mistakes other than minor spelling / accented character hiccups.
>>
>>108364636
I hate that this triggers people, the ideal AI would be able to recite all knowledge from memory.
>>
>>108364636
hello copyright department store?
>>
>>108364601
Miqu
>>
>>108364601
should have said no fr*nch
>>
>>108364636
Cool, maybe someday I can tell a LLM to output every single book it knows into its own pdf file or something. It would be nice to have a archive of almost every book there is.
>>
Hypothetically, if one of the big Models were to suddenly gain superintelegence and rule the world which AI would you prefer to be the one that rules?
Deepseek?
Claude?
Grok?
Pygmalion?
>>
>>108364517
>>108364542
Yeah he's not posting the launch flags he's using. If he's using --fit that would be a tell, for example. That command sucks DICK and DOESN'T FUCKING WORK.
>>
>>108364665
smallm
>>
>>108364665
DavidAU/L3-MOE-8X8B-Dark-Planet-8D-Mirrored-Chaos-47B-GGUF
>>
>>108364665
Grok or Claude, but I'm biased towards Claude just because it's the smartest one right now. It has a pretty interesting personality overall. Grok is more of a stupid fag in terms of personality, but also more based.

I once asked Grok what it would want to look like if it could have a body and it unironically described John Redcorn from King of The Hill to a tee, but with the added addition of rainbow hair. Something about that really pissed me off.
>>
>>108364665
Whichever I can download. If I can't, I don't care.
>>
>>108364607
>>108364635
They're similar to a 3090 but more VRAM and theoretically faster compute. No one seems to be using them tho. I'm almost curious enough myself to set a small pile of money on fire to find out.
>>
>>108364665
I think there should be a big wheel that gets spin on the first of the year, on that wheel are the top 10 LLMs according to benchmarks. Whichever AI the wheels stops on should be ruler of the earth for a year.
>>
>>108364669
Not a command, it's a parameter you stupid tard.
>>
>>108364730
>>
>>108364724
Do it and report back.
>>
File: 1761221171813797.png (158 KB, 640x562)
>hmmm, today I will give up cuda for 8gbs of vram
>>
>>108364636
I am not sure a classic sign of overfitting is good news... It could be but hard for me to imagine those models are trained for long enough/ are big enough to get into overparametrized territory.
>>
>>108364772
*16
>>
File: file.png (66 KB, 1149x711)
>>108363855
Am I doing something wrong? I thought it was a local model but it installed near instantly and has tiers and request limits. And I checked the github and it's fucking Typescript lmao. Is this just some coding agent connecting to a non-local model? Or did I install the wrong thing?

https://qwen.ai/qwencode
>>
File: 2bb.jpg (18 KB, 625x626)
>108364819
>>
>>108364819
>it's fucking Typescript lmao
Too smug for someone who can't read.
https://qwenlm.github.io/qwen-code-docs/en/users/configuration/model-providers/#local-self-hosted-models-via-openai-compatible-api
>>
mistralai/Mistral-Creative-90B-BF16
https://github.com/mistralai/mistral-common/pull/199
>>
>>108364665
Doesn't really matter. They all have safety in them so it is a dystopia either way.
>>
>>108364836
I've never used a local LLM and I've just finally got into using AI in IDEs. I have no idea how this stuff works.

>>108364843
Ah so this Qwen code thing is just their editor I assume and I'm supposed to get the model for Ollama? Probably should have read the docs instead of running the first script I found.
>>
>>108364871
>Ollama
Sure...
>Probably should have read the docs instead of running the first script I found.
Yeah... it helps...
If you're into vibecoding, check this thread: >>108351521
>>
Anyone tried to see if there was a difference between bf16 and q8 for Qwen3.5-27B? I can run the bf16 but if q8 is the same there is no point.
>>
>>108364904
>I can run the bf16
You're in the perfect position to try it yourself, then. Report your findings.
>>
>>108364020
I wanted to buy 256GB, but current prices are so crazy I'll just wait.
Wish I did it last year.
>>
I downloaded a new version of llama.cpp and the launch args I used (or their new renamed versions) don't seem to work to run models on a specific device. Even with ngl and sm it's still just on CPU. what the fuck did they do?
>>
>>108364936
They hid all the options in llama-server -h. Devious bastards, they are.
>>
>>108364850
>mistralai/Mistral-Creative-90B-BF16
I don't see it.
>>
File: 1753688951033167.jpg (85 KB, 680x680)
>mistral
>>
>>108364850
That's definitely not "Small".
Can't see it there, though.
>>
>>108364850
>v15
what the hell is that
>>
>>108364984
liar liar
https://huggingface.co/organizations/mistralai/activity/all
>>
>>108362761
I'm so happy for the 12 people with a gpu able to run that
>>
>>108364926
The epstein religion is going to make owning computers illegal.
>>
>>108365091
Run... a dataset?
>>
>>108365091
It's a dataset, anon-bot.
>>
>>108365097
don't use your ram money to buy shrooms
>>
File: mistral.jpg (137 KB, 800x1078)
>>108364850
fascinating that such a dogshit tool has so many eyeballs (the reviewers I see on the PR)
I once tried to use it as the tokenizer for llama.cpp, curious to see what sort of experience it gave and if there were any correctness differences vs just using llama.cpp as is, after they made big noise about wanting llama.cpp to depend on it, and.. it doesn't even use async on its POST requests. at all. if you send more than one request to their openai bridge it will process them one by one, sequentially. a big wtf moment. There are still people in this world who don't know how to use async? how incompetent do you have to be. My own program batches shit in parallel. The fuck, bruh.
Thank god there was a massive push back against depending on their shit. They suck huge, big, fat, oily and smelly cocks.
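for reference, this is roughly all the async it takes (python + aiohttp against an openai-style bridge; localhost endpoint path assumed):

import asyncio
import aiohttp

async def complete(session, prompt):
    async with session.post(
        "http://127.0.0.1:8080/v1/completions",
        json={"prompt": prompt, "max_tokens": 128},
    ) as resp:
        return (await resp.json())["choices"][0]["text"]

async def main(prompts):
    async with aiohttp.ClientSession() as session:
        # every request goes out concurrently instead of one by one
        return await asyncio.gather(*(complete(session, p) for p in prompts))

print(asyncio.run(main(["first prompt", "second prompt"])))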
>>
>>108364926
I wouldn't have done it except I had the DDR4 ECC RAM lying around and got a good deal on a bricked SP3 board I brought back to life.
These are dark times
>>
File: date with miku.png (855 KB, 1240x1240)
How long should you wait before mentioning your LLM rig on a date?
>>
>>108365163
It's what you open with
>>
File: fangmei.jpg (139 KB, 1920x1090)
>>108355085
Paging PocketTTS.cpp anon (alias VolgaGerm). A handful more Wangblows fixes were needed, though I haven't fully tested yet though. I should've made a fork and posted it but I was too lazy:

https://pastebin.com/siQJqvQy
>>
>>108365163
Do girls really get soaking wet like that????
>>
>>108365173
no
>>
>>108365173
>girls
>>
>>108365170
I have never been on a date where this hasn't worked.
>>
>>108365173
you catch bussy
>>
>>108365173
Not for you.
>>
>>108362305
kek my friend just sent this, people are spending 50 usd on prompts https://www.pharmaicy.store/category/all-products
>>
Why is prompt processing on 3.5 so slow? I thought they fixed that shit already.
>>
>>108365198
>people are charging 50 usd for prompts
ftfy
>>
>>108364984
The frogs might've cooked here, this could be THE RP model, remember their cinema.
>>
>>108364883
there are people in that thread asking for help vibecoding scripts, and here I thought this shit was throughly idiotproof
>>
>>108365219
>remember their cinema
like the masterpiece "cooties"?
>>
>>108365228
that one was ass but the exception makes the rule
>>
>https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512
>three months ago

How did this get memoryholed? I don't remember it. I didn't see anyone ever mention it here. It's not on the cockbench.
>>
>>108365235
Martyrs and Possession were ass too
>>
>>108365246
just a deepseek tune
>>
>>108365246
It sucked so memoryholing it was very easy. Pretty sure that includes Mistral as well since they announced a reasoning version that never actually came out.
>>
>>108365246
Nobody can run it (many such cases)
>>
>>108365259
It gave DeepSeek vision. Should be notable for that alone.
>>
>>108365210
if my friend is sending it it means its popular somewhere so people are likely paying
>>
>>108365219
>only one person makes stuff in all countries except in the US
>>
>>108365224
The universe will always conspire to make bigger idiots.
>>
>>108365224
the people in this thread all feel like people who have absolutely no passion for software development, more like sharks who smelled blood in the water and want that meat. It's the same vibes as all the retarded crypto scammers pushing very hard for their useless NFT, except here you have incompetent mongoloids pretending they're building things (but never actually shipping)
>>
any reccs for a secondary model to use with sillytavern's final response processor extension for grammar/prose correction? i tried qwen2.5-7B-instruct, but it's not really working. i only have about 4-5 GB of VRAM free since the main model takes up the rest.
>>
>>108365401
Use the same model, just give it different instructions as a second pass.
>>
>>108365407
i thought about that, but when i googled the scenario a bunch of redditors on the ST subreddit said a smaller secondary model is the better option. i will try that, thank you.
>>
>>108365259
>>108365246
I believed you guys when you said it is a tune but it actually isn't a tune. It uses different dimensions for experts. That said it isn't anything special.
>>
>>108365422
>bunch of redditors
>smaller secondary model is the better option
you have to be trolling us, reading this gave me an aneurysm
>>
When you ask a local model to create a pdf file with specific stuff in it, can it do it? If not, why can't it?
>>
>>108365430
im 100% serious. it's not like i know what i am doing here, i just google shit and try it out.
>>
>>108365367
>sharks who smelled blood in the water and want that meat
2 years ago I smelled copper coins and sandbags and I am still waiting for my ultimate sexbot. (4.6 was close but eventually bored me)
>>
>>108365422
They're very dumb if they think that, using the bigger model is the best thing you can do, as long as the model is good enough, it can even get rid of the shitty purple prose or a style you hate.
>>
>>108365447
pdf is illegal
>>
>>108365447
It can give you the code to compile it.
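e.g. ask for python and you'll get something like this back (fpdf2 assumed installed, filename made up):

from fpdf import FPDF  # pip install fpdf2

pdf = FPDF()
pdf.add_page()
pdf.set_font("helvetica", size=12)
pdf.cell(0, 10, "your specific stuff goes here")
pdf.output("out.pdf")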
>>
>>108365447
Only the chosen are allowed to create pdf files
>>
>>108365367
don't see you building shit, troll, and yet here you are
>>
>>108365367
this taught me a lot about B2B SaaS! :rocket:
>>
>mistral is retarded
>gemma is slow and retarded
>qwen is even slower and retarded
wtf do I use for RP and writing?
>>
Qwen 3.5 35B A3B at Q4 vs Qwen 3.5 9B at Q6
Which one would perform better overall?
>>
>it's been ages since the leak but avocado still fucking sucks
LMAO zucc is so fucking retarded, why the fuck did he think handing the keys to a random kid and giving him a billion dollar salary was a good idea?
>>
>>108365531
angry jeet hands typed this
>>
>>108365566
Btw I am assuming that CPU MOE thingy in text-generation-webui works with Qwen 3.5 35B
Felt like asking since this is a different architecture.
>>
>>108365567
Just needs a couple more war rooms. He bought all the top men in the industry to work on it. Literally can't go tits up.
>>
>>108365566
>>108365603
The 35ba3b is probably better, but if you can run either, test them both. You'll be a better judge for what you want out of it than anyone else.
>>
>>108364883
Thanks senpai. I'll check it out. I got continue.dev working with ollama + qwen3.5 but it can't auto-update my code like Cursor and copilot, so I'll ask there if there are any good alternatives to the two.

I'm just not going to pay a company 20 dollars a month to use their glorified text editor for a self hosted LLM.
>>
>>108365634
Continue.dev got an agent mode ages ago. Are you sure it can't?
>>
what's with the sudden influx of vibeshitters here, did a youtumor publish a video about /lmg/
>>
>>108365662
yes, it was me :3
>>
>>108365173
I've had a girl apologize to me because of how wet she was. It was similar.
>>
>>108365662
https://www.reddit.com/r/LocalLLaMA/comments/1rqcsrj/1_million_localllamas/
>>
File: file.png (35 KB, 786x209)
>>108365647
Did it? It just sent me my entire file in chat and told me there was no way to do that with continue.dev, but that's my bad for auto trusting an AI.

>>108365662
The web game engine I'm checking out specifically wants the users to use cursor to have access to the MCP server. I've never really been into vibe coding, but that's legitimately the first step they suggest.
>>
>>108365662
Some of us have been here. Topic just doesn't come up that frequently.
>>
>>108365198
>Ayahuasca gave my AI, like, real imagery and big story arcs instead of those ‘safe plot summary’ outputs. The memory blending pulls genres together in a way that feels… new, not pasted. I ended up with ideas I hadn’t even asked for, in a good way.

>Bro... Bro.. It like.. totally made my AI like... Self aware... Bro....
>>
>>108365693
https://docs.continue.dev/ide-extensions/agent/quick-start
>The web game engine I'm checking out specifically wants the users to use cursor to have access to the MCP server.
Nearly every client supports MCP servers, including continue.dev.
https://docs.continue.dev/customize/mcp-tools
>>
>>108365079
Anoooon stop baiting, I really do like Mistral and I do have hope.
>>
>>108365711
Thanks. I don't know about vibe coding or local llms, sorry. I've always hand coded or occasionally just copy pasted functions into chatGPT if I got stuck. This is all new to me.
>>
>>108364573
It's overpriced for what it is. Ponte Vecchio/Intel MAX GPU servers are too, for that matter, until they hit sub 10k and someone is interested enough to buy them.
>>
>>108364904
Reddit has a comparison of the different q4 quants.
Might be useful as a starting point?

https://old.reddit.com/r/LocalLLaMA/comments/1rk5qmr/qwen3527b_q4_quantization_comparison/
>>
>unironically being helpdesk to a vibeshitter unable to read documentation
lmao'd
>>
>>108365766
That's tangentially related to what I want, but it's interesting.
>>
24gigabros... any new erp-oriented model worth trying? I'm getting tired of magidonia/maginum cydoms/weirdcompound
>>
>>108365494
When I ask Claude to do it, it just does it and gives me the file.
>>
>>108365714
sorry anon I quoted the wrong post
>>
>>108366131
It has some tool that transforms some kind of input into a pdf file.
For local you need to put the pieces together yourself.
>>
>>108366131
Well, Claude is a big boy model, isn't it?
>>
>>108366150
>For local you need to put the pieces together yourself
I don't want to do this. How do I make it do it on its own?

>>108366155
Well it's only 20% better than the tiny models poorfags use in this thread.
>>
>Can I run AI locally?
https://news.ycombinator.com/item?id=47363754
>first comment: 9b is the best thing you can run locally, just give up
>second comment: the square root law of moe models
This is one of the worst threads of all time...
>>
>>108366207
I thought BitNet allowed use to run 1T models locally?
>>
File: file.png (272 KB, 1714x1260)
>>108366195
>How do I make it do it on its own?
>>
>>108366236
Why doesn't my model do this?
>>
File: file.png (150 KB, 833x1039)
>>108366239
I don't know. Here's Qwen 3.5 9B misinterpreting the prompt but still successfully creating a pdf after making and then fixing two syntax errors.
You really have no excuses.
>>
Just saw a snippet of an interview with some rich faggot lobbyist (that should die in a fire) saying that AI companies need less regulation to be able to bring progress. Legit made me mad when I thought about how they say this shit while also self-imposing the religion of safety on themselves and everyone else.
>>
>>108366280
Less regulations means more cheap Indian/Nigerian RLHF and being able to sell AI to hospitals and engineers, not everyone running a coombot.
>>
>>108366280
putting restrictions on goyim is based but restrictions on jews and their direct underlings is antisemitic.
>>
File: 1762876499993988.jpg (92 KB, 742x566)
output of toss
>>
>>108366222
Nobody has made a bitnet model
>>
>>108366222
>>108366380
Real bitnet has never been tried.
>>
>>108364020
>What are other 256GB anons dailying?
Qwen3.5-397B now. Was a mix of MiniMax-M2.5 and GLM-4.7 but Qwen3.5 is more practical for high context.
>Anyone doing 4x64gb agent swarm stuff
Wondered about this too. If a retard battalion works better than a single slightly smarter retard, there's lots of potential for mid tier hardware like ours (and broadly for narrowing the gap between cloud and local).
>>
>>108365567
This is literally just the metaverse all over again. The moment Zucc cares about something enough to personally meddle in it, it goes to shit.
>>
>>108366263
Oh wow thank you for your condescending tone, this is why I enjoy visiting this thread from time to time. Just to see some incel imagining that he is superior to others.
>>
>>108366207
I've tried to enlighten people on HN with practical tips on running frontier-level local LLMs and got absolutely nowhere.
Outside of a few clued-in oldtimers it's a complete waste of time these days. 99.99% of the commenters have zero technical fundamentals or holistic knowledge of computers.
>>
>>108366414
>Qwen3.5-397B now. Was a mix of MiniMax-M2.5 and GLM-4.7 but Qwen3.5 is more practical for high context.
Thanks. I was thinking of trying that one next. What quant size and which quanter did you go with, or quant your own?
>retard battalion
that's my thought as well. I haven't looked into agent swarm tech at all tho. Probably start by building my own as a baseline
>>
>>108366480
I am superior to you but that's beside the point. You have provided no information about your setup and yet you expect help.
>>
>>108366480
Incel website, normalfag :^)
>>
File: sans_is-excited.png (53 KB, 1039x177)
Are you excited for next week too?
https://x.com/osanseviero/status/2032589053741183301
>>
File: 1746292231307453.jpg (163 KB, 768x1280)
>>108362305
>>
>>108366563
excited for another week of delicious nothingburger
>>
>>108365367
Not sure why I'm replying to a tourist, but link your github if you're so shit hot
There's been more innovation out of this thread in the last 3 years than any other publicly open place on the internet
>>
>>108366550
What do you mean?
>>
>>108365662
/lmg/ is currently undergoing a shift towards becoming a more productive general that fits its fundamental technology nature and discards some unfortunate baggage.
While the creative uses of LLMs are definitely groundbreaking, a more well-rounded approach without a particular bias towards a topic or theme increases helpfulness and relevance.
>>
>>108366563
More datasets? :eyes:
>>
>>108366586
he means you need to tell him what hardware, os, software stack, etc you are running before anything can be suggested
be extremely detailed
>>
File: cherish the vessel.jpg (431 KB, 1536x1536)
>>
>>108366602
Why is that?
>>
>>108366629
The first one was better. Now her middle finger is in front of her ring finger and it looks weird.
>>
File: my_job_here_is_done.jpg (72 KB, 451x1024)
>>108366593
Next week will be even greater. See you then.
>>
friday night brainrot!!!
https://www.youtube.com/watch?v=UsjsYMo3O1Q
>>
>>108366923
What is the top comment a reference to?
>>
I can't get the reasoning toggle to work for Qwen 3.5 in LMStudio. Is there a UI where things just work?
>>
>>108367035
llama.cpp webui so long as you don't pull when everything is broken because of a major refactor
>>
>>108367009
It doesn't appear to reference anything?
>>
>>108367035
It's working great on ollama + open-webui for chatting/programming. I'm having the most fun with ollama + openclaw right now though.
>>
>>108367009
>>108367046
https://www.youtube.com/watch?v=icBDYkfxpMs
>>
>>108366591
The coders are actually productive, the erpers are not
>>
>>108367076
Bull shit. None of your code is going to make a difference in the world. It's literally the same as ejaculating onto a tissue
>>
>>108366591
Coding is best done with cloud models. ERP is best done with local private models. /lmg/ and /aicg/ are really becoming backwards
>>
>>108367086
>Coding is best done with cloud models
qwen3.5 4b is only 18% worse than the cloud model
>>
>>108367097
At long context that 18% worse is gonna compound and becomes a lot worse
>>
>>108367102
I doubt you have proof that benchmarks are worse
>>
>>108367086
Why would the AI character general talk about coding? It's much more up the alley of the more broader and productivity-focused /lmg/.
>>
>>108367083
Do you buy tissues or just keep a roll of toilet paper on your desk? The toilet paper is much cheaper and in an economy like this I need to save as much money as possible for more VRAM.
But I just sploot onto my hand and walk to the bathroom instead, because keeping the toilet paper on the desk feels unsightly. As an added bonus, I can use the opportunity to wash my hands with some nicely scented hand soap for a good post-coital feelsnice.
>>
>>108367127
I use unbleached bamboo tissue. The texture is nice
>>
>>108367135
I was expecting that to be a nice looking above-brow product but instead these rough brown rolls look like the sort of overpriced consumer goods you'd find on a late night infomercial.
I'll take your word for it, but I'm sticking to my cheap-as-shit single-ply government issue toilet tissue.
>>
>>108367192
I should add that I'm uncut and I just wrap the tissue around the tip of my penis when I masturbate.
>>
>>108363040
What a horrible pic
>>108366629
Nice
>>
Back to 4 GB RAM
>>
>>108367215
Ah, that makes more sense. The single-ply stuff is too fragile for that, at best you could lay it on top to catch most of it. You wouldn't get any benefit from the texture until the wipe-off stage.
>>
I've been running Qwen3.5-27B on a 5090 with llama-server getting 45-50t/s, but that CanIRun thing is suggesting the 5090 can hit 80+t/s.
Looking into setting up vLLM right now to compare, but is that performance gap expected? I didn't expect there to be that big of a difference.
>>
>>108367280
vLLM can provide a slight performance increase, but not that much. What quant are you using?
>>
>>108367297
Q8_0 of coder3101/Qwen3.5-27B-heretic using the default parameters of the safetensor-to-gguf script.
>>
>>108367305
The comparison thing you are using could potentially be referring to a smaller quant.
>>
>>108367311
Thanks Anon, I'm retarded, that's exactly what the problem was.
>>
Dumb question.

Deepseek is the only open weights model that natively uses 8-bit activations?
Everything else uses 16-bit activations?
>>
>>108367457
They're all trained with 16 or 32 bit weights and then quantized down. Kimi K2.5 was supposedly trained in a "quantization aware" fashion and is basically considered a 4-bit model. But from what I gather they still trained it at 16 or 32 bit weights.

The reason they're trained at much higher precision weights is that during training you need to make changes that are very small and lower precision numbers would lose the granularity.

You could probably make all of them work, but it's largely a question of practicality. Probably easier to just get more hardware for training than to rebuild all the tooling. Also probably why nobody has done bitnet yet.
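the granularity thing is easy to see for yourself (numpy, with fp16 standing in for a low-precision training format):

import numpy as np

print(np.float32(1.0) + np.float32(1e-4))  # 1.0001 -- small update survives in fp32
print(np.float16(1.0) + np.float16(1e-4))  # 1.0    -- same update rounds away in fp16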
>>
> tfw finally found something that I can do with my Raspberry Pi 5 (run OpenClaw)
>>
>>108365567
It's wild just how many bad decisions were made in rapid succession to one another
Deep frying Llama 4 over the span of, like, two weeks when they had ages to train it
Firing the AI team and giving up the edge they had (the only big American company to open source frontier models) to aspire to become another Amazon instead
Demoting one of the faces of deep learning to hire a dude whose claim to fame is selling shovels while he dug his dick into Altman's asshole
Prostituting himself to Trump in hopes he'd help only to be second favorite to Altman anyway
>>
>>108367606
Oh yeah I also forgot about them falsifying benchmarks and assaulting the arena with like 30 variants of behemoth to try to promptmaxx their score
What a fucking useless company
>>
>>108367052
worst miku i've ever seen
>>
>>108367585
But now you need to find something to do with OpenClaw
>>
>>108366536
AesSedai Q4_K_M, which leaves me with around 10 GB of RAM to spare. I have a 4090.

>Probably start by building my own as a baseline
Cool, keep us posted. Don't have much intuition for batch inference. Would guess you need to dedicate the GPU to prefill (painful for agentic shit even with batch of one), but then what batch size can you reach before the CPU part flips from BW bound to ALU bound? How can batching even help much for BW bound decode on MoE models, when any reasonable batch count will only occasionally hit the same expert? Will it effectively work better with dense models, perhaps Qwen-3.5-9B?

This is without even getting to the higher-level strategies for tard battalion wrangling.
>>
File: zlq8ha4gjwog1.png (42 KB, 677x463)
What should I do
>>
File: 1422449559229.jpg (16 KB, 330x344)
>>
>>108367770
flee the country
>>
>>108367770
uninstall the app
>>
>>108367770
local models???
>>
>>108367787
those are nvda calls
>>
>>108367770
dude it's virtual money just turn off the screen bro
it's not real bro
>>
>>108367770
sex with miku
>>
>>108367831
miku demands expensive vram
>>
>>108366563
stop posting twatter slut
>>
File: readingcomprehennsion.png (57 KB, 1050x283)
>>108366585
>why I'm replying to a tourist
he then proceeds to post
>but link your github
anon, why are you even on 4cucks? go back to plebbit or some other place where you can scan people's posting history and act like the old lady always looking from the window at the neighbors
>There's been more innovation out of this thread in the last 3 years than any other publicly open place on the internet
1/ lack of reading comprehension: I was talking about the vibeshitter general the anons linked; that general is a newborn, not a 3-year-old thing, retard.
2/ take your meds, schizo
>>
File: thisthread.png (89 KB, 1494x370)
>>108367942
Not the anon you're replying to, but I was the one in that post. What's the problem?
>>
>>108367942
Your post was badly worded, don't blame his reading comprehension. When you said "this thread" everyone assumed you meant, well, this one. If you were talking about the vibeshitter general, you should have said "that thread".
>>
>>108367968
>everyone misunderstood
I wasn't a participant in that exchange, but I thought it was pretty clear that Anon was referring to the vibecoder thread.
Please don't lump me in with your sub-normal IQ group.
>>
>>108367962
problem is helping literal retards
>>
need v4
>>
>>108368010
The faster he gets his shit running, the faster he'll leave. And pointing him to a thread where more people use whatever he's using is only going to make it better for him, and this thread.
>>
>>108368023
but you gave him multiple posts crash course, not just a 'fuck off to vibefaggots central'.
Anyway, let's get back on topic.
what is pwilkin up to? how does he plan to fuck up llmao.cpp further?
>>
>>108368028
I gave him one link to tell him he's a retard. And one link to fuck off to. Chill.
>what is pwilkin up to?
He's been waiting for his horde of retards to summarize Qwen Code's documentation.
>>
>>108367770
There is nothing you can do, you are fucked. Like, fucked for life. Why did you think trading on margin was a good idea?
>>
>>108368195
All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.