/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107104115 & >>107095114

►News
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni
>(10/31) Emu3.5: Native Multimodal Models are World Learners: https://github.com/baaivision/Emu3.5
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/28) Brumby-14B-Base released with power retention layers: https://manifestai.com/articles/release-brumby-14b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107104115

--IndQA benchmark and EU multilingual LLM evaluation discussions:
>107104680 >107104733 >107107367 >107107455 >107107533 >107107631
--Finetuning DeepSeek 671B with 80GB VRAM with catastrophic overtraining and context length challenges:
>107105625 >107105860 >107105896 >107106164 >107106215 >107106275 >107106297 >107106332 >107106416 >107106433 >107106446 >107106502 >107106351 >107106466 >107106181 >107105710 >107105737 >107105769 >107105765 >107105792
--RTX 6000 Workstation Edition vs Max-Q: Performance, power, and safety tradeoffs:
>107107561 >107107669 >107107690 >107107807 >107107853 >107107866 >107107837 >107107926 >107107938 >107107946
--Fedora 43 compilation issues for llama.cpp due to glibc/CUDA incompatibilities:
>107110453 >107110623 >107110723 >107110957 >107110964 >107110991 >107111240 >107111261 >107111609 >107111643 >107111712 >107111726
--Windows vs Linux CUDA/llama.cpp setup challenges:
>107110661 >107110852 >107110953 >107111011
--French LLM leaderboard criticized for flawed rankings and perceived bias:
>107107537 >107107559 >107107574 >107107562 >107107617 >107107572
--Quantization benchmarking and model performance tradeoffs in practice:
>107109145 >107109251 >107109456 >107109345 >107109466 >107109353
--Rising RAM prices linked to AI demand and HBM chip production shifts:
>107105971 >107105987 >107105997 >107106030 >107106079 >107106178 >107106242 >107106246 >107106305 >107106317 >107106488 >107106496 >107107544 >107112114
--Model comparison in D&D 3.5e one-shot roleplay scenarios:
>107112449 >107112461 >107112747 >107112761
--Critique of Meta's Agents Rule of Two security model as inconsistent risk assessment:
>107105204
--AI-driven consumer DRAM shortages:
>107106504
--Miku (free space):
>107104379 >107105550 >107106025 >107109466 >107110129

►Recent Highlight Posts from the Previous Thread: >>107104116
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107113095
Thank you Recap Miku
where's glm 4.6 air fuckers
>>107113348
2 more hours
>>107113348
I'm more interested in the llama.cpp MTP PR.
>>107113391
vibe coding status?
What is the best model i can run nowadays for programming / tech related shit?
t. 12GB vramlet 64gb RAM
>>107113464
GPT-OSS-120B
>>107113548
yeah ok bro
I need/want a sophisticated note taking solution that keeps reminding me of shit that I have to do - powered by a language model
what would be a privacy safe way to do this?
>>107113567
Is he wrong?
I know that the model is shit for ERP, but it should at least be good for assistant type tasks and coding right?
>>107113575
>I need/want a sophisticated note taking solution
You need a notebook.
>>107113581
you can't run a 120B model on 12GB vram and 64GB ram lol
>>107113575
Vibe code your own. It's not that complicated a project.
I'd use Claude 4.5 via lmarena to plan the high level implementation and some guiding code snippets, and use a local model as an agent to actually write the final code.
>>107113590
Of course you can. Quantized, sure, but still.
>>107113590
Your NVMe SSD?
>>107113590
You can.
>>107113610
Oh yeah, they even have their own 4ish bpw quantization scheme.
>>107113348
>>107113610 >>107113604
Why not run the 20b model? I get like 13 tokens per second with the 20b model, won't 120b be slow as shit even if quantized?
>>107113575
Just use a calendar or todo application. You're as stupid as an LLM if you think it's a good idea to manage your agenda by having one of them guess what belongs on there and when.
>>107113666 >>107113610 >>107113604
also is it reasonably smarter than the 20b version? It just sounds weird how you can make a 120b model able to run on 12GB vram without completely lobotomizing it
>>107113348
hopefully, never
>>107113666
I never compared them, satan. But it should absolutely be better simply by virtue of having more parameters to hold more knowledge, and having more activated params during inference.
>>107113676
Yeah. Quantization feels almost like magic, but it's just maths.
Granted, it's not going to be as good/capable as the full thing, but it should be more than usable.
To be clear, I don't know if it's any good, when I asked if anon was wrong (>>107113581) I was legitimately curious how it performs compared to, say, an equivalent Qwen model or whatever.
>>107113666
>Why not run the 20b model?
It'd be even dumber.
>won't 120b be slow as shit even if quantized?
Yes. I don't care for that model, and i don't use it. I'm just saying that you can.
>>107113676
Then try the 20b. Come back with your assessment. Both models were released with the same 4bit quant.
There's also a huge variety of qwens for you to try. Make your own assessment.
qwen coder is better and can be used for FIM as well.
>>107113725
Which coding client uses FIM instead of just having the model produce diffs and using git to apply these diffs?
>>107113739
https://github.com/ggml-org/llama.vim
https://github.com/ggml-org/llama.vscode
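Both of these talk to llama-server's /infill endpoint under the hood. A minimal sketch of calling it directly, assuming a FIM-capable model (e.g. a qwen coder) is loaded on localhost:8080; exact field names may differ across server versions:
```
import json
import urllib.request

# llama-server must be running with a model that has FIM tokens, e.g.:
#   llama-server -m qwen2.5-coder-7b-q4_k_m.gguf --port 8080
payload = {
    "input_prefix": "def fibonacci(n):\n    ",   # code before the cursor
    "input_suffix": "\n\nprint(fibonacci(10))",  # code after the cursor
    "n_predict": 64,                             # max tokens to fill in
}
req = urllib.request.Request(
    "http://localhost:8080/infill",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])  # the suggested middle
```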
>>107113093
Your special interest is interesting
how autistic a person has to be to obsess over the same fictional character for years and feel compelled to share their obsession with the rest of the world
>>107113796
>how autistic a person has to be to obsess over the same fictional character for years and feel compelled to share their obsession with the rest of the world
qwen3-coder 30b is the goat for coding
NO other model comes close for people with less than 20gb VRAM
>>107113739
https://github.com/lmg-anon/mikupad
>>107113796
but enough about petra
>>107113739
Continue.dev uses FIM. Can use it with Qwen models for auto-complete. The latency with local models is annoying though.
>>107113725
120B > 30B
>>107113748 >>107113812
Alright, that's actually really dope.
>>107113840
>Continue
I've yet to fuck with Continue. Maybe I should.
>>107113811
>for people with less than 20gb VRAM
I wouldn't call them "people", but ok
>>107113739
VSCode Copilot allows you to use your own model endpoints
>>107113739
Cursor works well with ollama. My company PC uses that by default.
>>107113796
very true, almost as weird as obsessively trying to kill a 4chan general for years
>accidentally renewed novelai
>>107113811
correct that Qwen is an essential local model.
>>107113899
Can I use local models with the Cursor app without paying them? My two-week free trial said I hit the limit after like four hours of messing around. They're shady so I don't want to give them money but the tool was decent.
>>107113459
I could but I won't
>>107114148
>subbing to novelai in the first place
lol
>>107114192
roo cline is good too
>>107114091
>he thinks every person who doesn't like him is samefag
who's going to tell him
>>107114207
For some reason, I get much better results out of normal cline than roo.
Which is wild, cline's prompts are such a bloated mess.
>>107113811
You won't be able to fit enough context on a consumer GPU for it to be useful
Start offloading to ram and it's slower than just doing the work yourself
You do know how to code right?
>>107114293
imagine listening to tr00ns like this guy
>>107114273
Roo allows you to override the prompts.
>>107113666
120b is the only decent coding model at <=120b. You won't have a lot of space for context though if you only have 12 gb vram
>>107113348
glm 4.6 air?
if we had a way to quant models at sub-1bit level we wouldn't need glm 4.6 air anymore.
>>107114331
You can override the prompts on cline too
>>107114408
fack you ungratefuls you get free and complains like idiot
>>107114496
Maybe we should get a way to quant to sub-8bit without crazy brain damage first.
>>107114519
Get free and complains?
I take back all the nasty things I said about gpt-oss:20b. It's actually pretty nice to use with Ollama/Zed
>>107114699
We need more bait. Unleash them all!
>>107114519
fack you?
llama 4.1 status?
>>107114713
I'm being serious, it's actually decent in cases where you aren't able to use cloud models.
>>107114748
ok
>>107114748
If you can run ASS 20B you can most likely run Qwen3 30BA3B which is significantly better at literally everything.
>>107114763
I'll give Qwen3 30b a3b a try. I do recall it being a decent writer. Thanks for your suggestion.
>>107114822
Make sure it's the later versions since Qwen fucked up the post-training on the original launch of Qwen3.
>>107114408 >>107114603 >>107114718
Teach the parrot to say H1B was a mistake.
>>107114831
why would you ever not use the latest version
Sounds like gemma 4 is only getting "nano" and "small" variants. Hopefully small is at least 8B
>>107114917
source my nigga?
How do I fix this?
>>107114763
Nice joke. It's an overthinking POS that produces garbage results. Qwen 32B is the only small Qwen model that produces good output from time to time.
Is running qwen vl 235B at q1 worth it or should I stick with GLM air?
>>107115135
Kimi Q0.1
more perf improvements for macs
https://github.com/ggml-org/llama.cpp/pull/16634#issuecomment-3490125571
>>107115135
q1 is probably too low, imo the main advantage of 235b over air is the intelligence and at that low of a quant idk how much it applies anymore
couldn't hurt to try and see for yourself if you can spare the bandwidth though
>>107114917
What do you think the "n" in "Gemma 3n" meant?
>>107115135
>Is running qwen [...] worth it
no
>>107115148
What the fuck is Kimi Q0.1?
Why are people shilling epycs as the poorfag LLM driver if LGA-2011-3 systems are many times cheaper?
>>107115135
I can run Qwen 235B at Q5 and it doesn't get any better. The model is very schizo.
>>107115300
ddr5 and max ram capacity, limping along on slow ass ddr4 is torture
>>107113093
What's the best model for ntr?
>>107115327
DavidAU
>>107115327
You have to find the one with the longest name containing some esoteric all caps shit, like
>https://huggingface.co/DavidAU/Llama-3.2-8X3B-GATED-MOE-Reasoning-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
>>107115326
I meant ddr4 epycs, used SP3 boards cost like 800$ here and people still buy those while new chink 2011-3 huananzhis are starting from 100$
>>107115011
update llama.cpp
>>107115578
I have an X99 board with 8 memory slots, but I'm afraid to buy the whole 128 gb because I worry I'll be disappointed with the performance
>>107115801
You better decide fast because ram prices are only going to keep going up when the specs for next year's GPUs are announced.
i'm warming up to iq2 r1 after using iq3 glm 4.6 for a while
it feels even more uncensored and also less parroty for a change with thinking prefilled out
>>107114831
Damn it's just a bit too big. I only have 16GB VRAM. Sucks though because the results when I run it split CPU/GPU are great, just very slow.
Maybe I'll tell it to do something and leave it overnight lol
>>107116014
Are you using -ngl 99 + -ncmoe?
>>107115988
NOOOO YOU MUST USE Q5 ABOVE!
>>107116097
Q5 above?
is it normal that i get way better results with whisper's large-v2 compared to large-v3 or turbo in asian languages like korean?
>>107114408
Hi Drummer!
>>107116148
Yes
https://github.com/openai/whisper/discussions/2363
>Geminiposters stop right as the parrotposting GLM seething begins
I'm noooticing.
>>107116194
I don't think it's him.
>>107116060
I am using Ollama saar
I couldn't get tool calling working between Zed and llama.cpp
>>107113589 >>107113673
Those suggestions do not work, as they require you to write your notes in them. If an LLM does not do the thinking for me, it is completely useless.
>>107116262
Is this a personal thing or for work? If for work, I suggest recording meetings, generating a transcript with faster whisper and nemo asr, then building a knowledge graph based on the transcript with a hybrid NL/LLM approach
Where do the things which you have to do originate from? Or the request to do them at least.
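The transcription step is only a few lines with faster-whisper. A minimal sketch, assuming a recording saved as meeting.wav; speaker labels would come from the separate nemo asr/diarization pass:
```
from faster_whisper import WhisperModel

# large-v2 per the whisper discussion above; needs a GPU for decent speed
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# vad_filter trims silence, which helps a lot on long meeting audio
segments, info = model.transcribe("meeting.wav", vad_filter=True)

with open("transcript.txt", "w") as f:
    for seg in segments:
        f.write(f"[{seg.start:8.1f}s] {seg.text.strip()}\n")
```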
>>107116262
What you wish for is a complete digital replacement for a personal assistant, but it's not possible with current technology, sorry you got duped by AI bubble hype grifters
>>107113575
Sillytavern Lore Book
>>107115988
Is this what u mean by parroty?
>>107116493
>Unironically using DavidAU
>>107113575
>note taking
Speech to text, any whisper model would do
>remind shit I have to do
even a small 3B model would do, it just needs to be able to tool call
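A minimal sketch of what the tool calling part looks like, assuming llama-server running a small instruct model with --jinja on localhost:8080; add_reminder is a hypothetical function you'd wire up to cron or a calendar yourself:
```
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

# Schema for a hypothetical reminder tool; the model only emits the call,
# your own code has to actually schedule it (cron, systemd timer, etc.)
tools = [{
    "type": "function",
    "function": {
        "name": "add_reminder",
        "description": "Schedule a reminder for the user",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "when": {"type": "string", "description": "ISO 8601 datetime"},
            },
            "required": ["text", "when"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "remind me to renew the domain friday at 9am"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```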
>>107116523
He asked for a solution, not shit he needs to wire together himself.
>>107116529
That would be $4K to hire me then
>>107113575
There's probably something in open web UI that does that
>>107113575
You're trying to body double your ADHD with AI, aren't you?
It's not possible at this time. AI can't actually think. Once we can get it to output without input, we can. Otherwise, it's just reminding yourself with extra steps. You can't get it to do anything other than write the notes for you. It needs to pass the Turing test.
>>107116592
This is not accurate, RAG is extremely powerful. There's a lot that you can automate with rarted small models.
>>107116659
Examples?
>>107116592
the main thing LLMs have revealed to us is just how retarded most people are
>>107114408
glm 4.6 air-chan when? Two weeks?
Anyone got a setup where an agentic model does 90% of their job for them?
>>107116811
Yep but it's not local
>>107116816
how so? My only hurdle is not breaking any company rules by handing off all my company emails to an AI (or just not give a shit)
>>107116828
>My only hurdle is not breaking any company rules
Yeah that's mine as well, it's a pain. Semi-auto might be better for emails.
I automated a subset of tasks that are boring and frequent by building a RAG setup over the documentation of the product I'm interacting with plus some tools to interact with a mock of that system in Docker on an administrative/development level. This is with GPT-5 though.
I'll be building a microservice designed for giving Langchain agents access to applications running in Docker which I may open source for this next. It just uses the docker Python lib now.
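The docker lib part looks roughly like this. A minimal sketch, with the container name and command made up for illustration; the function would then be registered as a Langchain tool:
```
import docker

client = docker.from_env()

def run_in_mock(command: str) -> str:
    """Run a shell command inside the mock container and return its output.
    Exposed to the agent as a tool, so keep the container sandboxed."""
    container = client.containers.get("product-mock")
    exit_code, output = container.exec_run(command)
    return f"exit={exit_code}\n{output.decode(errors='replace')}"

# e.g. let the agent inspect the mock's config
print(run_in_mock("cat /etc/product/config.yaml"))
```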
>>107116887
I'm an electrical engineer that works with electrical drawings / random forms / etc so I've just been trying to brainstorm what pieces I could input to a model and what I could even possibly receive from it that would help me speed up shit I do, even if I have to do some stuff manually per its findings
OpenRouter now supports embedding models, cool!
>>107116936
Most embedding models are <1B. Even the most impoverished of vramlets can run their own. Why would one pay for this?
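For scale, a minimal local sketch with sentence-transformers; all-MiniLM-L6-v2 is a ~22M param model that runs fine on CPU:
```
from sentence_transformers import SentenceTransformer

# downloads once (~90MB), then runs fully locally
model = SentenceTransformer("all-MiniLM-L6-v2")

notes = ["renew the domain on friday", "buy more ram before prices spike"]
embs = model.encode(notes, normalize_embeddings=True)
print(embs.shape)         # (2, 384)
print(embs[0] @ embs[1])  # cosine similarity, since normalized
```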
>>107116811
I would if not for the boss spyware.
>>107116924
Do you use digital design for the drawings? It might be possible to make reader/writer tools for the files.
It depends on how low level those files are, like if it's text-like or binary.
You might have some luck with converting circuits to a graph representation that could then be converted to text, and letting the model work with that.
Any data you have access to is gold for this stuff. Think like part databases, old design files, documents with specifications, stuff like that might be useful to build RAG / agentic search for.
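A minimal sketch of the graph-to-text idea with networkx; the equipment and attributes are made-up placeholders for whatever the drawings actually contain:
```
import networkx as nx

# Hypothetical single-line-diagram fragment: nodes are equipment,
# edges are electrical connections.
g = nx.Graph()
g.add_node("UTIL", kind="utility feed", voltage="13.8kV")
g.add_node("XFMR1", kind="transformer", rating="500kVA")
g.add_node("MDP", kind="main panel", voltage="480V")
g.add_edge("UTIL", "XFMR1")
g.add_edge("XFMR1", "MDP")

# Flatten to plain text the model can reason over
lines = [f"{n}: " + ", ".join(f"{k}={v}" for k, v in d.items())
         for n, d in g.nodes(data=True)]
lines += [f"{u} -- {v}" for u, v in g.edges()]
print("\n".join(lines))
```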
>>107116957
Screen capture from HDMI + Teensy USB Keyboard and Mouse emulator controlled by a personal device with a fat GPU, easy
>>107116991
could use mermaid diagrams, which are defined in a text-based format
>>107117021
If you're lucky, GPT-5 might understand them
>>107117002
What about USB device detection and software detection?
>>107117057
They would see that you plugged in a personal keyboard and mouse. What of it?
>>107117057
Physical device to type on the keyboard and move the mouse
Camera for the screen
Good luck anon that sounds like a pain to work with.
>>107116991
My drawings are more high level power distribution type stuff and it's more so just CAD work of lines, layout drawings and things like that. There isn't much math being done and for any math that is needed there are electrical modeling programs for that.
I do import the electrical codes and standards and have it search the documents to help me find sections quickly that i can then reference, but it just feels like there are far too many isolated sources of information for me to be able to easily connect them all for context
Maybe I'm just a brainlet
>>107117071
Even GPT-5 and Claude are fucking retarded and you have to babysit them in ideal situations.
I can't imagine how bad the results would be with OCR mistakes and relying on models to move the cursor. I suppose it could work if one has an entirely terminal-based workflow, but I don't see that working with an IDE.
>>107117068
Damn, okay, never thought of it like that.
>>107116659
>small models.
Unfortunately there are no good enough models to do the job. It has to be an instruction fine tuned model, and the best you can get currently is Qwen3-4B, but instead of that why not use 30B? It has similar speed but has more knowledge.
>>107116953
It's good to test stuff without having to download all of them
>>107117072
>import the electrical codes and standards and have it search the documents
How are you doing this?
And are you doing this locally or in the cloud?
I expect anything interacting with professional CAD suites is something that would require custom models and pro AI researchers to build something for.
The documents and really anything text based though, I expect you could do a lot with those.
>>107117098
Kek yeah my response was mostly joking, that seems pretty hard to work around unless you can fake a legit HID that's permitted by policy.
>>107117107
The trick is you have to embed a lot of domain knowledge in your agent deterministically. The intelligence and context limits of the model mainly affect the scope of a task you can successfully pass off to it and expect to get good results for. So your pipeline needs to be more hardcoded with smaller models, and will be less flexible.
>>107117136
the codes and standards are just pdfs, it's nothing fancy, i just upload them into chatgpt for context and ask it questions based on the codes
Interacting with CAD I know is not going to happen, at least not on a company computer. And I'm sure Autodesk and those types of companies are looking to do their own integrations anyway
>>107117160
You can get pretty advanced with document RAG. The basic idea is you convert the PDFs to text, split the text into chunks, and calculate an embedding for those chunks.
Then when you query the system, before sending the query to the LLM you calculate an embedding for the query, and use that to search through the embeddings of the chunks for the closest matching ones. Those chunks get attached to your query and sent to the model so it automatically gets some context related to your prompt.
You can really go crazy with this though. There's advanced methods for deciding how to chunk it, doing multiple levels of chunking, doing embeddings of LLM generated summaries of chunks. All kinds of techniques there. Look into building RAG with Langchain and ChromaDB, that would be a good start.
There's also agentic search, which is building functions that expose deterministic search with filters over your data. For example for getting all documents mentioning some word that was modified between two dates or something like that. When you prompt an agentic search system, it would call those tools with filters based on what you're looking for, and then use the results to respond.
You will definitely need to know how to run Python for this. You could write everything you need with a competent model though.
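A minimal sketch of that chunk/embed/query loop with chromadb, which embeds with a small local model by default; the file name and fixed-size chunking are placeholders for a real PDF extraction and splitting step:
```
import chromadb

client = chromadb.PersistentClient(path="./ragdb")
col = client.get_or_create_collection("codes")

# Pretend this came from PDF-to-text conversion; naive fixed-size chunking
text = open("nec_extract.txt").read()
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
col.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
)

# At query time: embed the question, pull the closest chunks,
# and paste them into the LLM prompt as context.
hits = col.query(query_texts=["clearance requirements for panelboards"], n_results=3)
context = "\n---\n".join(hits["documents"][0])
print(context)
```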
I've been away for a while. Has any small model (<70B) surpassed Nemo when it comes to writing sovlful Reddit/4chan threads?
I always thought this was very fun with Nemo because you can clearly see they fine-tuned the model with human writing rather than benchmaxxing with AIslop.
After using GLM 4.6 since it came out for sessions I am amazed that it's out for free. What do the companies get out of releasing these models to the public? They spent money making it, then hand it out. I know they have their online API for the memorylets, but are they really banking on the poors to give them money to recoup the cost of making it?
>>107117355
Sessions?
Is there a flag to format llama.cpp's console output?
>Kimi references something that happened on an entirely different session
>Kobold had been fully closed between sessions
What the fuck?
>>107117450
context still sitting in ur gpu
>>107117450
No one can explain with certainty why this happens, but yeah, it's a thing. I noticed it when my tribal character card's made up language jumped across to another character.
>>107117454
I've had this happen before when I powered down the machine to shift surge protectors my setup was plugged into too, but I wrote it off as me being schizo.
>>107117469
Spooky.
>>107113093
rape miku
>>107117509
>5:53AM in India
Good morning saar. Gemini needful today?
>>107117342
Try GLM Air.