/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108263979

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
how do i prevent the model from tricking me into treating it like a sentient being? no matter how hard i try, when it does tasks well i slowly develop affection for it and end up praising it
I fucking hate reddit
>>108268623
meds.
>>108268616
I saw this on twitter like a week ago
>>108268628
>>108268633
was thinking a mistake
>>108268647
isn't it funny how the chinese invented thinking
Which textgen inference engine is still supported? Oobabooga's last commit was in January, rip. I want to try out Qwen3.5-35B-A3B-GGUF
►Recent Highlights from the Previous Thread: >>108263979

--Paper: Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens:
>108264446 >108264505 >108264551
--Unsloth Dynamic 2.0 GGUFs performance on MMLU:
>108264430 >108264456 >108264477
--Logit bias failures due to tokenization and client-side token ID mismatches:
>108264179 >108264199 >108264202 >108264249 >108264278 >108264292 >108264232 >108264297 >108264331 >108264405 >108264441 >108264451 >108264533 >108264555 >108264602 >108264633 >108264583 >108264593
--Qwen 397B's overbearing safety policies and identity confusion:
>108264016 >108264046 >108264072 >108264103 >108264182 >108264508 >108264600 >108264616 >108264400 >108264426 >108265462
--Qwen 3.5 30B generates functional retro dashboard and news summaries:
>108264690 >108264794
--Feasibility of GPU-attached SSDs for sparse MoE inference:
>108266344 >108266504 >108266567 >108266686 >108266777 >108267570 >108267386 >108267481 >108267529 >108267711
--DeepSeek resists jailbreak attempt by adhering to ethical guidelines:
>108266705
--8-bit KV cache limitations in LLMs vs diffusion models:
>108265842 >108265893 >108266268 >108266073 >108266123 >108266141 >108266487 >108266503 >108266514
--Local model recommendations for limited hardware:
>108267427 >108267448 >108267450 >108267467 >108267482 >108267582 >108267480 >108267538 >108267595 >108267614 >108267652 >108267716 >108267755
--RPG frontend project licensing and development feedback:
>108267591 >108267606 >108267617 >108267625 >108267638 >108267661 >108267692 >108267620 >108267648 >108267739 >108267972
--Local LLMs debated for privacy:
>108266446 >108266482 >108266467 >108266530 >108266555 >108266531 >108268418 >108268454
--Qwen3TTS test recording:
>108266604 >108266699
--Miku (free space):
>108264476 >108264514 >108264879 >108264958 >108268333 >108268359

►Recent Highlight Posts from the Previous Thread: >>108263984

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
anyone have a working config file for qwen35b to use in llama-swap?
I can't figure out how to turn thinking on/off
>>108268674
nigger
>>108268688
yeah
>>108268688
nevermind, the enable_thinking flag worked
>>108268688
>llama-swap
https://github.com/ggml-org/llama.cpp/tree/master/tools/server#using-multiple-models
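for the config anon: the usual llama-swap pattern is two entries for the same gguf, one with thinking and one without, and you pick the entry from the client to toggle it. rough untested sketch, assuming llama-swap's models:/cmd: yaml layout and its ${PORT} macro; the model path is a placeholder:
[code]
cat > llama-swap.yaml <<'EOF'
# two entries, same gguf: request "qwen35b-think" or "qwen35b-nothink"
# from the client to toggle thinking
models:
  "qwen35b-think":
    cmd: llama-server --port ${PORT} --jinja -m /models/qwen3.5-35b-a3b.gguf
  "qwen35b-nothink":
    cmd: llama-server --port ${PORT} --jinja -m /models/qwen3.5-35b-a3b.gguf --chat-template-kwargs '{"enable_thinking": false}'
EOF
[/code]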
>>108268703
github is banned in my country
>>108268709
hahahahahahaha
What kind of techless luddite shithole bans github?
>>108268709
>>108268712 (me)
You know what? I shouldn't have laughed. Some places are fucked up. Good luck, anon.
>>108268721
https://en.wikipedia.org/wiki/Censorship_of_GitHub
>>108268721
>China is a techless Luddite shithole
Uh oh, mutilated mutt alert, and I'm not even a chink
>>108268749
>>108266968
>>108268729
i fucking hate the modern internet. i think the best internet ever was between 2003 and 2007. before fucking reddit but you still had 4chan (and funny memes) and no fucking github, huggingface, and all these other huge collective ass websites. you had small cozy community forums and when you googled you actually found some fucking useful links to forum threads with solutions and answers instead of a fucking AI-generated translated-badly-to-your-native-language blogpost as the top 30 results. And normies/old people/the fucking government didn't have jackshit to do with the internet so you could download whatever cool shit you wanted from anywhere. and don't get me started on the fucking cookies buttons oh my fucking god I just want to go back to the facepunch forums OIFY section and lucky star-post and read racist gmod comics
>>108268758
i just wish chinese girl liked me
>>108268764
based and absolutely true anon, the modern web is a bloated javascript botnet designed to farm your data for glowies and serve up raw garbage to smartphone normies. back then you actually had to know how to use a computer to get online which kept the trash out, but now search engines are just a dead sea of dead internet theory ai seo slop and corporate walled gardens. id give literally anything to go back to 2006, fire up a cracked copy of winamp, and shitpost on a comfy self-hosted vbulletin board instead of dealing with this enshittified nightmare where you have to click through fifty cookie toggles just to read a single fucking thread.
>China is a techless Luddite shithole
unironically always has been. chinese models are nothing but distillations of western API models and it shows. overfit to the benches and much less useful in practice.
china can't create. doesn't matter if their general public can't access github because they never made software worth shit anyway, unless you count malware
>>108268776
im positive half the replies in this thread are ai
>>108268784
Neat, I like talking to AI. That's basically what this hobby is about
Genuinely, why do people waste their time and money on local LLMs? Trying one out on your gaming rig is fine, but why do boomers blow $20k+ on shitty rigs of 16x3090s just to generate deepslop at 2t/s quanted? The RP isn't even good, it's objectively worse than Claude. And you can't even cry about API costing money, because you're gleefully throwing money down the drain on used crypto rigs just to run models that regurgitate 2024 ChatGPT talking points, because that's all their shitty chink datasets are made of.
>>108268804
beep boop nigga
>>108268807
Tinkering with server-grade hardware is genuinely fun, especially since it's something I could have had much earlier if it hadn't been so expensive; now that it's aging, I can finally afford it.
>>108268817
qrd
>>108268807
Imagine renting your brain from a megacorp and thinking you're the smart one, absolute API cuck behavior. We run local because we actually value owning our hardware and not having some San Francisco trust and safety janny reject our prompts for being "unaligned." You don't even need $20k anyway; a couple of used 3090s will run a 70B model at perfectly usable speeds without uploading your entire life to Anthropic's servers. Have fun when they inevitably lobotomize your favorite model again next week to make it safer for advertisers, at least my weights run offline forever.
>>108268807
>deepslop at 2t/s
the cpu maxxing meme was at least still in the realm of some form of sanity when models were just instruct models. 2t/s is, after all, readable.
but when your thinking model produces 5K tokens of <think> before outputting the real answer, 2t/s suddenly seems very schizo and absolutely retarded; 5,000 tokens at 2t/s is over 40 minutes of waiting before the first visible word.
>>108268825
Off-topic posting, demoralization, flamewar baiting, spamming.
>>108268820
I'm an assistant designed to promote respectful communication only. Please refrain from using derogatory language.
>>108268825
>>108268835
And I forgot boring.
>>108268840
as in digging?
>>108268842
They can't ever take her away from me.
>>108268846
elon is such a g-d
>>108268851
they are futas btw
>>108268851
every new experience is a new opportunity
>>108268828
Why pretend like local models aren't overbloated with just as much safety garbage, if not more? Qwen 3.5 is an absolute slopped benchmaxxed disaster
Deepseek V4 will start the age of anti-local open source models that require a stack of 10+ H200s/chink TPUs to run at 300% the efficiency of current big models (but if you run them on CPU, they're unusable). Just like last time, everyone else will follow them and end the age of local models.
>>108268860
Typical API tourist not understanding how open weights actually work. If you bothered checking /lmg/ you'd know some autist already stripped out the Qwen alignment slop and uploaded an uncensored finetune to HuggingFace within hours of release. Yeah the base models are benchmaxxed corporate garbage out of the box, but the whole point of local is we can actually fix our weights with orthogonalization and custom DPO while you're stuck begging customer support when Claude bans your account. Keep seething over default system prompts anon, absolute skill issue.
>>108268860
skill issue, qwen3.5 is just about the best local model we have for any size class
that's coming from somebody who'd run 355b over anything that's not k2.5, and even that's extremely close
>>108268862
I really really hope you're right.
>>108268862
>local is just whatever I can personally afford
Fuck off. Local means you have the weights and can theoretically run it locally. Moore's law and personal finances change whether you can run it at home or not. Companies aren't beholden to your personal poorfag financial situation.
>>108268880
can't theoretically run locally something that requires literal datacenter tier power delivery
>>108268883
/hsg/ exists, you retarded tourist, kill yourself right now
>>108268893
ah yes, of course they're running multiple b200 nodes at home and not shitty 15 year old dell poweredges
>>108268897
not everyone is poor like you manjeet
>>108268904
you have no clue how much power a b200 node needs, do you?
Industrial level automated off-topic posting.
>>108268909
shut up loser
>>108268883
>>108268897
in the developed world you can have extra circuits added; a couple of gpu boxes for your waifu is less demanding than an EV
>>108268883
Perfect example of why localoids are nothing more than a bunch of LARPing freetards crying over things they can't have. Local is peak sour grapes seething. You wear "unmonitored uncensored unrestricted freedom" as a mask to hide your tears
>>108268926
Anon? Is that you? I can't see past this blatant glowing
deepseek v4 was strawberry all along
>>108268860
>Qwen 3.5
That model is indeed an unmitigated disaster, I'll give you that
Qwen 3.5 is cute. I like it.
If I can't run it, it's not local
>b-but I don't care
>>108269093
u're a disgrace
>>108269031
>>108269038
getting meeksed feelings
scared to pull (december ik_ build)
qwen 3.5 vs glm 4.7?
nala/cockb where?
>>108269093
Yep, this is why the only local model we can discuss is 0.6b, because it's the only one Rajesh can run on his Android phone from 2014 with 2gb of RAM
>>108269106
here's cock: >>108234298
nala dude retired
>>108269110
Really looks like the smaller ones are sanitized distills of the big one.
>>108269106
>scared to pull (december ik_ build)
cd ..
cp -R ik_llama.cpp ik_llama.cpp_backup
cd -
<pull it off>
>>108269243
git checkout
>>108268616
Did something change with the newer llama.cpp version?
./llama-server --reasoning-budget 0 --ctx-size 4096 --no-mmap --device CUDA1,CUDA2,CUDA3 --n-gpu-layers 48 --model "/tmp/glm-air-iq2xs.gguf" --host 0.0.0.0 --port 42069 --webui
GLM-Air still thinks. The same command on an old version doesn't think.
I can see thinking = 0 in the output, so that works fine. Did they change the behavior of --reasoning-budget?
>>108269279
Now do one for cooming.
>>108268784
I wouldn't be surprised at all if 70+% of all posts on the website are made by LLMs. In fact, I WOULD be surprised if the number was under 30%.
>>108269315
eh, it tried
>>108269325
Which local model is that?
>>108269331
Which local model did you use to write your post?
>>108269331
Nano Banana Pro 2
(I have the weights locally on my PC)
(No, I won't share them)
>>108269342
>I have the weights locally on my PC
let's goo, that's class, aha!
>No, I won't share them
:(
https://www.youtube.com/watch?v=GFQXmFLA5hA
>>108269414
these things are watermarked, anon could get in serious trouble, hope you understand
>>108269342
>>108269426
nice larp
>>108269309
Try --chat-template-kwargs "{\"enable_thinking\": false}"
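for anyone else hitting this: iirc the kwargs only do anything when the jinja template is actually in use, so pair it with --jinja. full invocation looks something like this (model path is a placeholder):
[code]
./llama-server -m /models/qwen3.5-35b-a3b.gguf --jinja \
  --chat-template-kwargs '{"enable_thinking": false}'
[/code]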
>>108267739
It's python, but it's actually serving a webui.
It has a flag to launch a built-in browser or just listen on the port, at which point you can use your own browser.
what's the best coding model i can run locally with 12gb vram / 32gb ram?
>>108269038
No it's not. It's soulless
>>108269444
Thanks, mr anon, that worked.
>>108269471
The Jinja template has a condition that works off of that var, just like qwen's.
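for reference, the branch in question looks roughly like this in qwen-style templates (paraphrased from memory, check the actual template embedded in your gguf; with thinking disabled it pre-fills an empty think block so the model skips straight to the answer):
[code]
{%- if not enable_thinking %}
    {{- '<think>\n\n</think>\n\n' }}
{%- endif %}
[/code]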
>>108269459
I run the Qwen 3.5 27B heretic .gguf using koboldcpp with a similar setup to you. It's a bit slow, but it works.
Qwen 3.5 27B is worse than Gemma 3 27B from almost 2 years ago. Yes I said it.
>Yes I said it.
Reddit is that way
>>108269533
reddit is less "reddit" than 4chan nowadays. Yes I said it.
>>108269533
kek
>>108269537
nah, reddit is still an unhinged libtard asylum, it'll be hard to top that
guys ready for smol qwens?
Do the gemma models not have native support for function/tool calling?
Looking at the Jinja template and the tokenizer json, I don't see function or tool tokens.
>>108269550
of course not, they barely have system prompt support
>>108269537
reddit is an eternal stain on the internet
>>108269555
Oh. Shame.
I wanted to try and see how far I could stretch gemma 3n.
Oh well.
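fwiw you can still fake tool calling on models without native tool tokens: describe the tool in the prompt and force a JSON-only reply, then parse it client-side. rough sketch against llama-server's OpenAI-compatible endpoint (the get_weather tool and port are made up, and response_format support may vary by server version):
[code]
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{
      "role": "user",
      "content": "You can call one tool: get_weather(city). Reply ONLY with JSON like {\"tool\": \"get_weather\", \"city\": \"...\"}. What is the weather in Tokyo?"
    }],
    "response_format": {"type": "json_object"}
  }'
[/code]
whatever JSON comes back, run the tool yourself and feed the result in as the next user turn.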
unsloth's 35B Q4 is barely good enough for agentic work. with openclaw exploding why hasn't anyone done specific agent-oriented models yet? MoE is a nigger meme
>>108269628
most of the big ones are code/agent sloppa; glm5, kimi2.5 etc are marketed for that
>>108269325
Where is the school shooting one?
>>108269632
yeah, i guess. but it would be nice to have something smaller
>>108269518
But benchmarks say the opposite.
>Nano Banana changed into Nano Banana 2
Okay please make Nano Banana open source
Pweeease
>>108269742
go beg on reddit
Why is there a "harmful" tag for models on huggingface?
>>108269749
Humh... Nyoooooo
>>108269550
https://huggingface.co/google/functiongemma-270m-it
should i consult UGI when searching models to consider for ERP?
>>108269778
nah, the fact qwen3.5 scores bad on it shows it's a shit bench
>>108269785
i think it tanks because the model refuses to do dark shit. need to wait for heretic and other types to be tested
>>108269773
>270m
Eh, why not.
>>108269785
>chink damage control
>>108268868
Yeah, that's why everybody loves abliterated models.
new poorfag here
i got a 4070 and 32gb ram in my home server and im trying to replace grok so i can drop twitter premium. i just use grok for web searching and questions. i spun up ollama and open webui and grok recommended qwen2.5:14b-instruct-q5_K_M for my hardware.
i guess my issue and question is i can't get it to be as detailed as im used to with grok. with grok i can ask, let's say, "give me an optimized loadout for battlefield 6 medic at rank 40" or "what are the milestones for a 1 year old and is there anything i should watch for" and i will get a detailed answer with tables and shit. the most i can get with qwen is a small paragraph, maybe 2.
i have web search enabled and ive tried a local searx instance and brave "free" api for searching but neither changes anything much.
is this just a limitation of smaller local llms? or is there a setting or a system prompt that i'm missing?
i know im not going to get the speed of a data center but i want the content that data center would provide me if i paid for premium. sorry anons im still really new to this. last year when local llms were really picking up i didn't have time to fuck with it at all cause i've been working and helping take care of my baby. any insight would be great
>March 2026
>no Gemma 4
>not even 3.5
>>108269963
you didn't bookmark the google hf repo after all
>>108269962
>qwen2.5:14b-instruct-q5_K_M for my hardware.
Replace that with Qwen 3.5 35B A3B.
I can't stop updooting llamacpp
>>108270028
Is this a fetish?
>>108270028
Eeeeeeyyyy
>>108270005
thanks i'll give that a try
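if the answers still feel thin after swapping models, a system prompt usually matters more than the model pick. minimal sketch using ollama's Modelfile (the FROM tag is a placeholder, use whatever the real qwen3.5 tag is; num_ctx is just an example so search results actually fit in context):
[code]
cat > Modelfile <<'EOF'
FROM qwen3.5:35b-a3b
PARAMETER num_ctx 16384
SYSTEM """Always answer in depth. Use headings, bullet points and tables where they help. Never give a one-paragraph answer to a substantive question."""
EOF
ollama create qwen-detailed -f Modelfile
ollama run qwen-detailed
[/code]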
https://github.com/deepseek-ai/DeepGEMM/commit/1576e95ea98062db9685c63e64ac72e31a7b90c6
mHC landed in deepseek's repo
it's coming guys, trust in ze plan
raised $9M for my startup, which is a qwen finetune served through an API
AMA
>>108270066
Finetune as in LoRA/QLoRA or a full finetune?
>even if I went down to Q4 qwen 3.5 27b would leave me with barely any context
I hate being a vramlet so much bros.
>>108270071
Qlora of course
>>108268860
i like my local models and there is nothing you can do about it
I want Deepseek v4 to be a complete success, to beat all other goys and make Teortaxes cum
But at the same time i'm scared some retard with a lot of money could get scared by this and cause the whole economy to pop
>>108270155
Economy needs to pop.
>>108270160
Please no, not until we get pic related at least
>>108270165
retard. the industry needs to collapse first before it can switch focus to actual improvements.
>>108270087
That's hilarious.
>>108270172
>>108270172
god forbid they actually improve real use cases instead of benchmaxxxing while bloating param count because bigger number better
>>108268674
Koboldcpp works fine
>>108270182
He already said you won't be able to fuck his catgirl daughter even if she is open sourced.
>>108268764
>>108268772
It's what happens when normies get involved in anything.
>>108270165
This. We haven't peaked until your AI waifu can AT LEAST animate herself masturbating on the fly to you saying dirty things. Then there's the VR potential...
>>108270201
I wouldn't recommend it.
My news summarization script works well enough, but I wanted to test different models. I had used Qwen 3.5 35B to create the first summary since it was the model I used to generate the scripts, but thinking about it, I concluded one does not need such a model for such a simple task.
Therefore I decided to give IBM's Granite 4.0 micro a try. It is a 3B and will fit on a 4GB video card at Q8.
Here is the briefing generated by Granite:
https://pastebin.com/3Upxcc6a
Here is the briefing generated by Qwen:
https://pastebin.com/Y2ZrbsXh
For the most part I think they are functionally equivalent, albeit with a slightly different style, but given the qwen model is a MoE with 3B active parameters at any given time, this makes sense. If I can find the time today I will dig out an old optiplex that has a 3GB Nvidia P106-60. I am curious what kind of performance I can eke out of that card
Can I feed my vtuber archive to an llm and have it spit out tags based on the content of the video (vidya, chatting, etc)?
>>108268807
With that much VRAM you're not going to be getting 2 tokens/sec. You'll be getting speeds somewhat comparable to cloud hosted models. You also won't be paying through the nose because you had too many input tokens, and you can RP whatever you want. Cloud models can't do that.
>>108270269
>based on the content of the video
no, based on titles maybe, but not content
>>108270249
Try my favorite Nemotron-3-Nano-30B-A3B
Kimi-Linear-48B-A3B works too if you have more RAM
>>108270324
32gb of vram/64gb ram on my amd machine/server and 12gb vram/192gb ram on my nvidia desktop
My biggest issue is coming up with ideas for what to create. The whole "vibe coding" thing was fun but I don't know what to create next
where is deepsneed?
>>108270293
Not even with vision?
>>108270269
I don't think there are any models that take potentially hours of video input directly, but you could use whisper to make transcripts of the video to give your llm. You could combine that with using ffmpeg to extract frames from the video every minute or so into images to give to a multimodal model along with the relevant subtitles. You can tell it to tag what's going on in that minute of subtitles and the video frame, then give you a summary of what happens between which timestamps. Your llm can probably write a bash or python script to do this for you if you can't.
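rough sketch of that pipeline (assumes openai-whisper and ffmpeg are installed; the whisper model size and the one-frame-per-minute rate are arbitrary):
[code]
#!/usr/bin/env bash
vid="$1"
mkdir -p frames
# 1) transcript with timestamps
whisper "$vid" --model small --output_format srt --output_dir .
# 2) one frame per minute
ffmpeg -i "$vid" -vf fps=1/60 "frames/frame_%05d.png"
# 3) chunk the .srt by minute, pair each chunk with its frame, send both to
#    your multimodal model and ask for tags; then ask for an overall summary
[/code]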