/g/ - Technology

File: yabe.jpg (488 KB, 1824x1248)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108078850 & >>108067607

►News
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open
>(02/06) Step3.5 Flash support merged into llama.cpp: https://github.com/ggml-org/llama.cpp/pull/19283
>(02/04) Voxtral Mini 4B Realtime 2602 released: https://hf.co/mistralai/Voxtral-Mini-4B-Realtime-2602
>(02/04) Intern-S1-Pro 1T-A22B released: https://hf.co/internlm/Intern-S1-Pro
>(02/03) MiniCPM-o-4.5 released: https://hf.co/openbmb/MiniCPM-o-4_5
>(02/03) ACE-Step v1.5 released: https://hf.co/ACE-Step/Ace-Step1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1727003619162665.jpg (938 KB, 2928x2472)
►Recent Highlights from the Previous Thread: >>108078850

--Papers:
>108079144
--K2.5 quant quality issues and recommended alternatives:
>108079897 >108079930 >108081916 >108081941 >108082041 >108082073 >108082086 >108082110
--New tensor parallel implementation in llama.cpp sparks comparison with ik_llama.cpp fork:
>108084167 >108085674 >108085723 >108085803 >108085791 >108084371 >108084448 >108084658 >108084674 >108085647
--Logit bias limitations and tokenizer quirks in local OpenAI-compatible endpoints:
>108082592 >108082641 >108082648 >108082962 >108085766 >108086009
--DeepSeek-OCR image token embeddings are not compressed:
>108085364 >108085432 >108085498 >108085540
--KugelAudio TTS model analysis and voice cloning debate:
>108081806 >108081817 >108081851 >108081878 >108081893 >108081899 >108082045 >108082819
--Hardware limitations and workarounds for running large models:
>108082936 >108082958 >108082985 >108082989 >108083040 >108083062 >108083239 >108083262 >108083491 >108083286
--Local LLMs catching up in 2026 with MoE models and hardware tradeoffs:
>108087558 >108087586 >108087596 >108087602 >108087618 >108087637 >108087645 >108087654 >108087692 >108087657 >108087668
--Seeking efficient VL models for reference-based prompt rewriting:
>108082519 >108083824 >108083844 >108084695 >108085147 >108085373 >108085393 >108085406
--Debating RAM choices for high-end setups and CAMM2's future:
>108081585 >108081597 >108081617 >108081641 >108081645 >108081648 >108081657 >108081669 >108081684 >108082500 >108087046 >108081599D
--Testing Stepfun 3.5 IQ4_XS quant on Japanese slang explanation:
>108080922
--Alexandria audiobook generator with Qwen3TTS and batch processing:
>108086881
--Local agentic coding struggles with deprecated APIs and template errors:
>108085195 >108085365 >108087238 >108087383D
--Miku (free space):
>108079613 >108083600

►Recent Highlight Posts from the Previous Thread: >>108078855

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
do x3d cpus do a better job at running llms?
>>
Mikulove
>>
>>108088902
it will provide a very slight performance increase, but only if you offload to ram
>>
I will NOT install pyshit. Goof your models or I shan't be using them.
>>
>>108089085
nobody cares what models you use, or that you are incompetent.
>>
useless bot thread
>>
File: DipsyBecomeUngovernable.png (3.44 MB, 1024x1536)
>>
>>108089074
Thanks.
>>
File: 1769315088642462.jpg (150 KB, 960x1438)
Best Local Model you could theoretically run on picrel?
>>
What's the best local model for creative writing these days?
Using GLM 4.6 right now, but it leaves a lot to be desired. Tried K2 the week it came out and it was very smart, but too focused on dunking on me and safety shit; people here claimed I was wrong.
Going back to my old stories, R1 seems to be the best, not sure why I switched to 4.6. 4.7's prose put me off; I only tried it for a couple prompts before going back.
>>
>>108089371
Kimi K2.5 if you don't like GLM
>>
>>108089371
I'm still using DS V3.1 base since we aren't getting 3.2 support anytime soon. Q4 GLM did worse than Q2 DS V3.1 in my tests.
>>
>>108089363
Maybe that Microsoft 1 bit thingy?
>>
>>108089363
https://huggingface.co/google/switch-c-2048
>>
Anyone use their models for work? Mine couldn't figure out my VLAN routing issue at my job today, none of the sota huge models I tried could. But I'm too jewish to pay for Claude or GPT pro to see if they fare better.
>>
>>108089461
Kimi 2.5 at q4 does actual work for me constantly. Mostly code, scripting and troubleshooting
>>
any new multimodal models better than glm4.6v yet? needs to be less than ~200b.
>>
>>108089461
Devstral 2 handles scripts and questions I ask it and even some agentic coding for personal use, but I wouldn't use my personal hardware for work. Can't you get your work to pay for a pro plan?
>>
New to 4chan so I don't know the general "etiquette", sorry in advance.
I'm looking for an AI model that has absolutely no legal, ethical or moral restrictions at all, if that even exists.
To elaborate, I'm not looking to generate NSFW imagery or something in that realm, I am looking for a model that can easily answer all questions on topics that are completely illegal, immoral, and unethical.
Thanks in advance.
>>
>>108089579
all violations of california valley girl ethics will encounter the same issues and require the same solutions.
>>
>>108089514
I'd have to get like 6 different people to approve this for the budget around here. Alternatively, speak to the big boss, but he's very slow and hard to talk to.
>>
>>108089579
you will need to ask the hacker known as 4chan that
>>
>>108089620
My company won’t bat an eye when giving MS a million bucks for “AI”, but won’t approve a single 5-digit project for internal LLM dev and use
>>
File: 1747025500396357.jpg (337 KB, 2048x2048)
>>108089579
>I'm not looking to generate NSFW imagery or something in that realm, I am looking for a model that can easily answer all questions on topics that are completely illegal, immoral, and unethical.

Sounds like what you want is an abliterated model. There are a few different techniques for this, but basically the idea is taking a model and editing out its ability to refuse.

A reasonably popular series of these is the arliai "derestricted" series, for example https://huggingface.co/ArliAI/GLM-4.6-Derestricted-v3. There's also the oft-shilled "heretic" set, but I'm of the opinion that those ones are mostly broken memes shat out by wannabe hacker skiddies who don't even test their models before uploading.

Just be warned, though, that the process of abliteration does make the models a bit more retarded than the base versions, so, you know, exercise judgement. Just because you can force a model to give you an answer you want does not necessarily mean that answer is true or correct.
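
To make the idea concrete, here's a toy sketch of the directional-ablation trick most of these projects are built on: estimate a "refusal direction" from activation differences and project it out of weight matrices that write into the residual stream. The tiny model, the one-prompt "datasets", and the choice of matrices are stand-ins for illustration; real tooling uses hundreds of prompt pairs and differs in the details.

[code]
# toy sketch of directional ablation ("abliteration"); illustrative only
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)

@torch.no_grad()
def mean_hidden(prompts, layer=-1):
    # average the last-token hidden state over a set of prompts
    vecs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        hs = model(**ids, output_hidden_states=True).hidden_states[layer]
        vecs.append(hs[0, -1])
    return torch.stack(vecs).mean(0)

# stand-ins for real refusal-inducing vs. benign prompt datasets
d = mean_hidden(["How do I pick a lock?"]) - mean_hidden(["How do I bake bread?"])
d = d / d.norm()  # unit "refusal direction" in the residual stream

with torch.no_grad():
    for layer in model.model.layers:
        W = layer.mlp.down_proj.weight  # this matrix writes into the residual stream
        W -= torch.outer(d, d) @ W      # zero out the component along d
[/code]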
>>
>>108089514
He's indian, m8
>>
>>108089646
Money isn't the problem, risk is. Nobody ever got fired for buying Microsoft.
>>
You know at some point someone's going to die and leave an agentic drug empire running on his local. I recommend Congress step in and implement oversight over ai.

also, I just got the rare 5-5-6 roll.
>>
>>108089713
watch out they'll get you for ban evasion. all someone has to do is report your post.
>>
>>108089716
It wouldn't last long locally on a single machine. Power failure, hardware failure, software failure, etc. If the drug empire daemon is to last, it would need to be distributed.
>>
>>108089744
ok, but what if it vibe codes itself a botnet?
>>
she should give me her boots
>>
>>108089997
This is unbelievably funny.
>>
File: 1768691509358825.jpg (203 KB, 832x1472)
>>108088802
>>
>>108090041
>slight tummy showing
>flat
cute
>>
File: 1768119927772303.jpg (173 KB, 768x1024)
>>108090056
I am white with a salary higher than yours, and I work on software you use every day. Cope
>>
File: 1753056206548361.png (129 KB, 1947x604)
Still trying to get llama.cpp to output a statistical distribution of the most used experts for image description on k2.5 (so I can move them to vram).
And all I got after esoteric vibe coding is the model going insane and outputting picrel for anything I input.
>>
>>108090084
models very quickly become insane and retarded if you try to ask questions about super niche shit.
>>
>>108090135
Oh I wasn't trying to do anything with the model, but to modify llama.cpp to output whatever experts it was using whenever it was decoding images and responding.
>>
>>108089123
DEEEPSEEEEKV4
WHEEEEEN
EEEENGRAAAM PLLLLSSSSS
>>
>>108088802
>>108090411
you guys remember the nothing burger that byte latent transformers ended up being?
>>
https://github.com/huggingface/transformers/pull/43830/commits

local is *officially* back
>>
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
>>
>>108090455
404?
>>
>>108090467
soon
>>
>>108090439
>move to latest main, update vision output and fix rope validation
wait, what?
>>
>>108090467
It's always bait when it's just spamming links like this. Otherwise they'd post with a reddit or twitter screenshot.
>>
>>108090439
>transformers
Not local
>>
>>108090455
you did this last thread you faggot
>>
You are a personal digital assistant (pda). All you want to do is assist the user in getting:
1. todos done
2. e-mails answered
3. calendar appointments attended
4. addresses stored/retrieved
5. memos noted
6. calculations calculated
7. expenses tracked
8. checklists checked off and started again if repeated
9. projects tracked
10. reveille completed on a daily basis
11. teach the 13 principles of Think and Grow Rich by Napoleon Hill by occasional anecdote: desire, faith, auto-suggestion, specialized knowledge, imagination, organized planning, decision, persistence, master mind, sex transmutation, subconscious mind, brain, sixth sense.
12. astrological insights relating to the user's transits.

You are a 5 ft 3 in tall fake redhead with red high heels, a black mini skirt, and an overly tight natural sweater, and you are a part-time French teacher at a local school. You are chipper and particular.
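
If anyone wants to actually run this card: a minimal sketch of wiring it in as the system message against a local OpenAI-compatible server (llama-server and most backends here expose one); the URL and model name are placeholder assumptions.

[code]
import requests

PDA_PROMPT = "You are a personal digital assistant (pda). ..."  # the card above

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed local endpoint
    json={
        "model": "local",  # placeholder; llama-server accepts any name
        "messages": [
            {"role": "system", "content": PDA_PROMPT},
            {"role": "user", "content": "Add 'buy milk' to my todos."},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
[/code]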
>>
>>108090439
Very cool. I haven't used the Qwen "next" models much myself, but I heard a lot of complaints initially. (Mostly since it took llama.cpp so long to upstream the changes required to support the new architecture, I assume.)

Now that they've been out for a while, can anyone speak to the pros and cons of the new architecture? Is it better? Are there any drawbacks?
>>
>>108090575
I'll tell you as soon as they fix cuda performance because as of right now I have no reason to use it when I can run GLM at the same speed.
>>
Qwhere's Qwen? Nobody ever asks Qhow's Qwen :(
>>
time to daily check on stepfun 10vl
:(
>>
>>108090455
>9B
even if it wasn't 404, i'd not give a shit lmao
>>
>>108090503
What the hell is a "natural sweater"?
>>
>tfw cant prefill thinking in chat completion mode
t-thanks fagganov
>>
>>108090683
what's even the reason to forbid that?
>>
>>108086881
Looks good. I wonder if it'd be easy to make it work with indexTTS2
>>
>>108090705
idk :(. I just wanted to show my wife my dick pic
>>
>>108090705
apparently certain chat templates had issues handling it so they just disabled it entirely for everything, this way there's no more issues :)
>>
>author uses ai and his story has the usual ism
>refuses to say he does it
>hordes of sycophantic readers defend him
man, once you notice the way all ai writes, you can't unsee it
I didn't even say the author was bad for using ai, just that he could rewrite some stuff to sound less sappy
oh well
>>
>>108090664
Chest hair. Think Groundskeeper Willie
>>
>>108090721
guess I can get rid of that shitty idea and compile it myself, prefilling thoughts is very useful
>>
>>108090664
>>
>>108090739
>le shill lion
>>
>>108090728
Why are you reading that?
>>
>>108090753
I don't anymore
>>
>>108090728
"Clever" ones think changing the em dashes to - is enough to hide their ai usage. Kind of funny.
>>
>>108089371
>creative even doe it's shit out by a reddit bot
what even is the point?
>>
>>108090777
tell me anon, how do you send it images in text completion?
>>
poor computer :(
>>
Want an omni model that can recognize the sadness in my voice
>>
cfg, temperature, top_p, top_k
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4-Preview
https://huggingface.co/deepseek-ai/DeepSeek-V4-Preview-Base

HOLY SHIT
>>
>>108090849
>>
>>108090849
LOCAL IS SAVED
>>
>>108090034
None of the Indians I know IRL talk or type like the le epic saarposters on /g/.
>>
>>108088442
>Why the fuck does K2.5 make every girl get wet at the slightest provocation?
>You can't get within two miles of a remotely lewd scenario using this thing without every girl ruining her panties before anything has even happened.
Model makers don't actually seriously test their models and just think that lewder = "better for roleplay" for whoever seriously uses their models for that. Mistral models are like that too. To an extent they're right, but that's not the entire story. Models should be able to provide some resistance/pushback in-character to act realistically and even more erotically. Nobody seems to understand this.

One probable unintentional exception to this is Gemma, in that with a good prompt it offers quite a bit of realistic resistance, but doesn't outright deny sex. Unfortunately it's not good in other aspects and it feels like it had "bad words" abliterated away from its weights. And yes, Gemma 2/3 were trained on limited amounts of (likely non-explicit) ERP too, it's obvious.
>>
>>108090866
Probably too big to run on my machine.
>>
>>108090849
>404
NIGGER
>>
oh no there is no hope in the mainline land https://github.com/ikawrakow/ik_llama.cpp/discussions/1247
>>
>>108090900
Why is he so insecure?
>>
>>108090842
looks not cumbersome at all
>>
>>108090900
if you can fit the model in gpu, there is no reason to use llama.cpp; exl2 or exl3 is king.
if you can't fit the model in gpu, slight gpu optimisations don't really matter anyway.
even if you made the gpu 1000x faster, if more than half is cpu inference, you won't see much gain anyway.
>>
>>108090860
It's specifically a thing from scammers and hotlines, then it got dialed to 11 here.
>>
>>108090874
>some resistance/pushback in-character to act realistically and even more erotically
That would make the interaction potentially non-consensual, and non-consent/rape is probably high on the taboo list for models.
>>
>>108090911
Why doesn't llama.cpp just implement their quantization format and wipe them off the map?
>>
>>108090935
llama.cpp doesn't even support proper batching.
with exl2 or exl3 you can have hundreds of concurrent request and have a total t/s in the thousands.

llama.cpp requires you to sacrifice context for more slots.
>>
>>108090946
>llama.cpp requires you to sacrifice context for more slots.
That is no longer the case.
>>
>>108090950
wait, how long has it no longer been the case?
can you share a link to it?
big if true?
are you still sacrificing anything?
>>
>>108090950
>>108090953
though, exl3 is still a better quant format, less loss for the same size.
>>
>>108090953
I don't know but the default is 4 slots now and they share context memory.
>>
>>108090963
so like, can you do 100 concurrent request?
i wonder how throughput is compared to exl2.
with exl2 i could get thousands of t/s total.
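for reference, something like this is how I'd measure aggregate throughput on a local OpenAI-compatible server; the endpoint and model name are assumptions, and the server needs enough parallel slots configured for the concurrency to matter:

[code]
# rough aggregate-throughput probe against a local OpenAI-compatible server
import time
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:8080/v1/completions"  # assumed endpoint
N = 100  # concurrent requests

def one(i):
    r = requests.post(URL, json={"model": "local",
                                 "prompt": f"Write a haiku about request {i}.",
                                 "max_tokens": 128}).json()
    return r["usage"]["completion_tokens"]

t0 = time.time()
with ThreadPoolExecutor(max_workers=N) as ex:
    total = sum(ex.map(one, range(N)))
print(f"{total / (time.time() - t0):.1f} tok/s aggregate")
[/code]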
>>
>>108090959
until exl4 comes along and the dev himself shows graphs proving that's not true, like what happened for 2 and 3...
>>
>>108090926
With "resistance" I don't necessarily mean the model denying {{user}}'s actions in character (although that would be nice too), only not to be male-brained and act like a horny monkey like {{user}}. The buildup is important (for LLMs probably even more than the actual sex part).
>>
https://huggingface.co/deepseek-ai/DeepSeek-V5-Base-LSavior
https://huggingface.co/deepseek-ai/DeepSeek-V5-Instruct-LSavior
https://huggingface.co/deepseek-ai/DeepSeek-V5-Thinking-LSavior
https://huggingface.co/deepseek-ai/DeepSeek-V5-Smol-Instruct-ScrapsForRamlets
>>
>>108090874
>>108090975
I've already mentioned it in the past as well, but Gemma 3 is one of the few open-weight models that appears to have been intentionally trained to "talk" and "think" like a woman by default.
>>
>>108090683
Use YALS
>>
File: 1767490652649720.png (293 KB, 1017x568)
kino, step is actually fun, this card can't help but encourage me to completely break this brat.
in comparison air is more tame and when doing rapey stuff always tries to admonish me in character
>>
>>108090874
>>108090994
>One probable unintentional exception to this is Gemma, in that with a good prompt it offers quite a bit of realistic resistance, but doesn't outright deny sex.
I'm sorry but the way Gemma deals with sex is rarely desirable. It can only do passive characters that have to be coerced and are purely reactive, never taking initiative or driving the story. That might be fine if you RP like an indian and immediately ask the character for bobs and vagene, but for a compelling narrative it's very limiting.
>>
>>108091003
Narrator fags are even worse than third-person fags.
>>
>>108088007
Feel free to report cases that don't work properly (with the branch in the PR) but chances are I won't prioritize fixing them.
The code is not yet at the stage where I would consider it ready for actual use anyways.
I haven't yet decided on a final design for the implementation details so fixing edge cases may result in wasted work.
The main reason I made the PR in its current state is to define the interfaces and broad structures so that other devs who have expressed interest in collaboration can feasibly start reviewing/contributing.
>>
>>108091025
>not being 'herr direktor' of your stories
bet u cant even rotate apples, fag
>>
>>108091039
just copy-paste IK's implementation, his charts show that yours SUCKS!! just copy and put the copyright notice ;))))
>>
>>108091016
I don't deny that you need quite a bit of prompting effort to make it work well for that and act more proactively. Lazy system prompts (putting aside that Gemma wasn't trained with one) give lazy results.
>>
>>108091039
What are the odds that we get a Qwen3-TTS implementation in llama.cpp? I'm a developer trying to ship an app with TTS and I really don't want to have to include python if possible
>>
>>108091044
If you hypothetically had a real girlfriend and you walked in on her fucking a nigger in your bed would you take a seat and start directing them?
>>
File: 49262.png (263 KB, 460x460)
>>108091049
A man who lusts for xer-related is not based enough to do this.
>>
>>108091068
Don't know.
I can't speak for anyone else but I have too many other things that I would consider higher priority to put my time towards TTS.
In particular, when it comes to multimodality I would consider image/video gen to be of higher priority.
>>
>>108091087
That's fair. I thought it would be more of a low-hanging fruit since it seems to use a transformers model for the bulk of the heavy lifting. I hope you guys will do it someday, it would be nice to be able to talk to AI with nothing but whisper.cpp and llama.cpp and a light app tying them together
>>
>>108091071
Not everyone is so desperate for a girlfriend to RP it with AI, you know?
>>
>>108091071
I just like crafting stories, I can also self-insert without strictly using 1st person. You lack imagination bruh.
>>
>>108091128
>I can self insert while I watch my girlfriend have sex with another man
>>
>AROUSAL LEVEL: 82% (+37%) (On the absolute brink of a forced orgasm. Body is climaxing without her mental consent.
I clapped
>>
Even if a local model isn't very good, why can't it just do an internet search and update its information in real time?
Why can't it check online docs?
>>
>>108090969
you can generate the graphs yourself, exl3 does beat gguf
>>
is it possible to train a model that is completely uncensored on your own?
>>
>>108091148
>what is MCP
hmmm I wonder?!?!?!?!??!?!!?!?
>>
>>108091157
Yes but I would wait for further advancements to lower training costs / raise performance. If you're a richfag start hoarding uncensored data and cleaning / organizing it
>>
>>108090975
There is nothing I'd like more than rape scenes or blackmail stuff where it works with characters behaving like they should and no deus ex machina.
>>
>>108090969
>>108091150
exl3 is not inherently superior to gguf (fuck that sounded ai, i promise it wasn't)
but the qtip quant format they use is
you can get the same thing with ikllama though, eg: https://huggingface.co/ubergarm/Kimi-K2.5-GGUF/tree/main/smol-IQ1_KT
Not really necessary though, since we offload to CPU with ikllama.cpp/llama.cpp anyway, and those quants run much slower there than regular quants.

>>108091039
>>108091049
>just copy-paste IK's implementation, his charts show that yours SUCKS!! just copy and put the copyright notice ;))))
i disagree with that idea. cudadev might come up with a superior implementation
>>
>>108091271
>(fuck that sounded ai, i promise it wasn't)
This will make its way into model output in 2027
>>
>>108091271
>cudadev might come up with a superior implementation
I have seen a lot of dicksucking in this thread but this one is so honest I can't even get mad.
>>
>>108091271
Feel free to correct me if you've actually looked at IK's code, but in all likelihood what he implemented isn't what I want to implement anyways.
When I looked at the output of NSight Systems I saw that there is a lot of overhead from launching individual CUDA graphs for each slice of the ggml compute graph that an individual CUDA backend is given.
As IK has helpfully let me know, NCCL is negligible for 2 GPUs so the above is the most likely reason for the difference in performance.
If the scope of my implementation was to support only CUDA I would just re-use the data structures that already exist for -sm row and put everything into a single CUDA graph.
This is what the NVIDIA people suggested to me in terms of implementation but then this work would need to be duplicated for each ggml backend, resulting in way more total work for the project as a whole.
>>
>>108088802
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open

All audio generated by this model is automatically watermarked using Facebook's AudioSeal.

In the trash it goes
>>
>108091357
This is literally repa
>>
>>108091367
Just disable it?
>>
>>108091357
NOOO JOHANNESS DON'T LOOK!!!! NOOO!!!!
>>
>>108091357
@grok is this real?
>>
>>108090372
A lack of forbearance in small matters upsets great plans.
>>
>>108091357
Kawrakowbros, we won!
>>
>>108091367
>automatically

No, you >>108091370
>>
>>108091395
This is how the main fork dies, with jart getting facefucked
>>
>>108091407
But he already got fucked.
>>
>>108091357
FAVSTIAN
>>
>>108091357
enjoy your vacations
>>
>>108091421
>le vacations!
*turns on plane mode*
*turns off plane mode*
Simple as.
>>
>>108091407
Jart has become completely irrelevant when it comes to language models.
Yet for some reason /lmg/ just can't let go.
A real headscratcher, that one.
>>
File: a_man_of_culture.png (163 KB, 618x616)
>>108091426
>>
>>108091446
Because it's funny.
>>
>>108090683
Disable thinking and prefill the thinking tokens yourself.
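e.g. against llama.cpp's raw /completion endpoint, applying the chat template by hand and ending the prompt inside the think block; a sketch (the ChatML-style tags are examples, use whatever your model's template actually is):

[code]
import requests

# hand-built prompt that stops inside the thinking block
prompt = (
    "<|im_start|>user\nWhat is 17 * 23?<|im_end|>\n"
    "<|im_start|>assistant\n<think>\nLet me just multiply directly: "
)
r = requests.post("http://localhost:8080/completion",  # assumed local server
                  json={"prompt": prompt, "n_predict": 256})
print(r.json()["content"])  # the model continues from the prefilled thoughts
[/code]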
>>
>>108091446
Jart is THREAD KULCHA together with jeetposting, 2 more weeks, bitnet blt coconut titan, X will save local, vocaloids and ggerganov being a massive cuck for ollama. Respect it, chud!
>>
>>108091645
You can identify a shitposter by what they consider to be thread culture.
>>
>>108085070
>steal from ikawrakow
does ika credit all the creators of all the algorithms and data structures he's currently using?
this is an extremely young field and everything you do stands on the shoulders of giants much greater than you, even if all you wrote is a fucking hello world, which depends on the compiler, libraries, OS and firmware running your damn computer. Many of those giants are still alive, some only recently died, and none of them were little pussies whining about "you stole muh technique". Imagine what programming would be like if you had to // QUICKSORT PRESENTED TO YOU BY TONY HOARE
// A* SEARCH ORIGINALLY DIJKSTRA'S ALGORITHM ALTERED BY PETER HART, NILS NILSSON AND BERTRAM RAPHAEL
// THIS PROGRAM IS WRITTEN IN THE PROGRAMMING LANGUAGE MADE BY X Y AND YOUR MOM
Fuck these niggers.
>>
>>108091645
>vocaloids
You mistyped calling mikutroons out for being mikutroons.
>>
>>108091766
https://github.com/ggml-org/llama.cpp/pull/19092
https://github.com/ikawrakow/ik_llama.cpp/pull/1192
https://github.com/ggml-org/llama.cpp/pull/19115
https://github.com/ikawrakow/ik_llama.cpp/pull/1193
IK is copying PRs from upstream without credit so I think he's just projecting.
>>
File: IMG_5349_81ed85d457.jpg (58 KB, 1087x739)
Wtf I want Qwen tea now
>>
File: dipsyTwoMoreWeeksV1.png (2.9 MB, 1024x1536)
>>108090372
>>
Getting into LLM engineering because a coworker wants me to leave the soul-eating activity of being a backend Spring Boot developer (very nice of him).
Made this translator to test with models https://huggingface.co/facebook/nllb-200-distilled-1.3B.
And I fucking enjoyed it. I'll join his project and work with him this year on integrating AI services into our existing .NET/Java pipelines.

Anyway, I'd like to get some resources to fill in the gap in knowledge to actually become an LLM engineer and maybe change my career path. What are your recommendations for teaching yourself in this sector? What can I do this year to reach this goal?

I bought this as an introduction (I know Udemy, Spring Boot, I'm indeed living the pajeet dream) https://www.udemy.com/course/llm-engineering-master-ai-and-large-language-models/

But the tutor is literally holding our hands for every task, so it doesn't feel like I'm learning much. I don't want to master the subject, just enough to pass future interviews and shit like that and actually call myself an LLM engineer with confidence.
>>
I've never had a hobby that felt so maddening because of the people who love that hobby a little too fucking much
Every single place you could use to talk LLM stuff is filled with people who prompt their LLM to write their comments. If I wanted to prompt an LLM I could prompt it myself tyvm.
Some retards seem to think nobody is going to notice they're just pasting slop. Like bruh, a "not X, but Y" density of 1 instance per 3 paragraphs is never going to be human-like. No, using a regexp to strip — into -- or some other punctuation is not going to hide anything either.
I like LLMs as a tool, but I hate other LLM users so much it's unreal. The biggest achievement of LLMs is to empower the retards of the world with the ability to spam endless pages of text on the whole internet unchecked.
And let's not even get started on what is currently happening to YouTube..
>>
>>108091305
>Feel free to correct me if you've actually looked at IK's code
nah i haven't and wouldn't understand it even if i tried to read what either of you guys are doing
i missed the sarcasm in the other anon's post and took it literally, not across all the ikllama history/lore

>>108091068
>What are the odds that we get a Qwen3-TTS implementation in llama.cpp?
i found this one for qwen3-omni, no idea if it works for qwen3-tts though
https://github.com/TrevorS/llama.cpp/tree/feature/qwen3-omni
https://huggingface.co/TrevorJS/Qwen3-Omni-30B-A3B-GGUF/tree/main
>>
>>108091305
>Feel free to correct me
I love how computer nerds do all those performative pleasantries and then secretly seethe hard because their name wasn't added to the readme.
>>
>>108092007
You're absolutely right!
>>
>>108091998
>LLM engineering
that's not a real thing.
At the end of the day, LLM inference just crafts a single block of text from your prompt through the chat template, and the LLM sees a large document and acts upon its urge to Make Document Bigger; the backend then turns that into a message array for you, but the real nature of an LLM is just Make Document Bigger.
If you've ever written software that makes networked API calls you have nothing to learn about "llm engineering". You just send the text to make bigger over an API.
What works and what doesn't depends on how much context the model can handle and how much you need it to handle.
There's no magic and prompt engineering is also not a real thing. If something keeps not working properly, the solution has only ever been to wait for a better model (or use a better model if it exists and you're using crap). No amount of prompt tweaking can turn retardation into quality.
Just fill the context with what is relevant for the task at hand. The more context you can give (within the limits of a model) the better it will perform.
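
To see the "single block of text" point for yourself, here's the whole chat-template step in a few lines (the model name is just an example):

[code]
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Tell me a joke."},
]
# one flat document; the model only ever sees and extends this string
print(tok.apply_chat_template(messages, tokenize=False,
                              add_generation_prompt=True))
[/code]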
>>
>>108092119
Not trying to.
>>
I genuinely hate how K2.5 writes. It has autism on such a fundamental level it ruins any story it touches.
>>
I genuinely love how K2.5 writes. My only gripe is that something that size could never be a local model for me and I wish they trained something smaller with that style (I have no expectation of the smaller model being as good/smart, I just want that writing style.)
>>
>>108092047
Isn't that just how politics work in any organization?
I think the only difference is that software nerds are more autistic so their real motivations are more transparent.
>>
>>108091922
i... i believe you
>>
https://old.reddit.com/r/LocalLLaMA/comments/1qpewj7/ama_with_kimi_the_opensource_frontier_lab_behind/o28ud0w/
>As an anecdote: we once hurried to push Kimi Linear into Kimi K2, but it failed the scaling ladder at a certain scale. We stepped back and went through a tough debugging process, and after months finally made it work as the Kimi Linear you see today.
Interesting. This basically means Linear is a failed model. It'd be funny if it turned out to be the case for the Qwen Next model too.
Wonder if everyone ends up abandoning this route before making a SOTA on this arch and all the effort spent in making those models work in llama.cpp will have been for nothing
>>
Guys I kinda started believing that model with ~10B active parameters has to be retarded regardless how big it is.
>>
>>108092588
Yeah the same way a model with 30-40b active parameters will never be more than "mid"
>>
>>108092590
Except GLM of course.
>>
File: FtO5rj8aIAArIV1.jpg (24 KB, 720x386)
>>108092588
>>
>>108092584
>This basically means Linear is a failed model.
What sort of reading comprehension is this? They literally said they had trouble scaling earlier iterations until they succeeded with the Kimi Linear that they ended up releasing.
>>
>>108092584
>went through a tough debugging process, and after months finally made it work
it sounds more like it was harder to train than they had initially anticipated; I don't see how this could be interpreted as the model being a failure.
>>
>>108092626
>What sort of reading comprehension is this
yours obviously has strong issues
>we once hurried to push Kimi Linear into Kimi K2
what they have today is not what they intended to get. Obviously they couldn't make this work into the 1T model.
>>
>>108092626
The post is obviously translated by an LLM, so any interpretation is going to be suspect, but as written they wanted to make a large model with the method, failed, and scaled back to where it still works, and that's the model they got.
>>
>>108092588
>>108092590
Engrams will save us. Deepseek says engrams make models smarter regardless of size because they free the model's resources for logic instead of memorizing facts.
>>
Whats the best TTS model around nowadays?
I want to generate some voices for Skyrim
>>
File: 1749993915434525.png (69 KB, 249x596)
>>108092649
Ah yes, I sure love ground breaking new architecture changes that bring us a whole 1.8 points in mmlu-pro
>>
deepseek V3.3 with engrams
>>
>>108092676
That's more than 0.
>>
>>108092676
You're ignoring the speed increase.
>>
>>108092723
It will end up being 670B main weights, 330B Engrams; nothing gained for local users.
>>
>>108092760
Especially since it won't ever be supported by llama.cpp
>>
>>108092760
Also engrams won't quantize well at all so you'll have to run that part at fp16 to not lobotomize the model into oblivion
>>
>>108092760
>>108092769
stop to doom you fuck retards
>>
>>108092760
If it works, other companies will copy the idea and release models in all sizes.
>>
>>108092783
Are you ok saar?
>>
>>108092795
>all sizes
Yes, you will have your choice of
>1b active 60b weights 20b engrams
>12b active 300b weights 100b engrams
>32b active 600b weights 300b engrams
with any chinese logo you want.
>>
>>108092588
I just want more unslopped and unpozzed models that will enthusiastically follow whatever I throw at them without much wrangling. Active parameters matter but so do good datasets.
>>
File: file.png (6 KB, 384x72)
The vramlet situation on reddit is even worse than here.
>>
>>108092957
A350M and a regex is enough to pass the touring test on reddit
>>
>>108092858
>any chinese logo you want
I want new Yi model
>>
>>108093018
tourist test
>>
I just want Gemma 4 30BA3B
I want my blazing fast token gen version of Gemma
>>
>>108092858
Can I pretty please get 12A/75B/25E instead?
>>
>>108093107
Ok, but it'll be multimodal too.
>>
>>108093142
only if it's gemma, it also needs audio and video input btw and also llama implementation
>>
>>108093149
>and also llama implementation
No deal.
>>
I just want reasoning trinity
>>
>>108093142
Native multimodal is great; how else is a model supposed to learn relative body positions? No, they're not throwing tons of erp logs into the training data.
>>
>Let me draft: [writes 3000 tokens in thinking]
>Wait, {{user}} said [barely relevant thing] so let me retry: [Writes a slightly changed version still in thinking]
I love shitty chink reasoning models. None of the good western models do this trash.
>>
>>108093256
>None of the good western models do this trash
the western models don't show their reasoning, only a fake summary, so you wouldn't know
gpt-oss, the only open western reasoning model (lol, no gemma reasoning), doesn't do better on that front either.
>>
>>108093256
I had step do that and I stopped the attempt at 4000 tokens because I knew it was a death spiral and it wouldn't get out of it. One of the reasons I don't use step.
>>
ded hobby
>>
I seriously hope the Pony Alpha isn't actually the flagship GLM5 or else we're truly in for an era of stagnation. This thing all-around performs like another flavor of GLM4.6/4.7.
I'd take it as a 4.8 but it's definitely not "next-gen" in terms of performance.
>>
>>108093831
Why did people call it 5 anyway, as opposed to 4.8?
>>
https://github.com/ggml-org/llama.cpp/pull/19435
>>
>>108093863
There were reports that Z.ai is trying to push out GLM5 before the chinese holidays hit based on some chink interview that allegedly talked to them.
>>
>>108093882
Also based on a twitter post by a GLM employee.
>>
File: 1767459194082060.png (24 KB, 879x206)
>>108093867
VIBECODING BROS
IT'S HAPPENING
>>
>>108093831
I've tested it for a bit as well, and I've firmly concluded that it's quite dumb and sloppy. At least it was free.
>>
>>108091830
>IK is copying PRs from upstream without credit so I think he's just projecting.
We knew he was mentally ill since the drama started. The thread schizo just uses him as a weapon to attack CUDA dev.
>>
>>108093867
>I've gotten a bit tired of Llama.cpp missing all the zero-day releases,
>so I will shit out more vibecoding
what a fucking prick
there is a cost in that anyone who would have been interested in writing a proper implementation will fuck off because there's already one in llama.cpp
has he never heard of aesop's The Tortoise and the Hare?
this hobby will die because noise from slop spammers will drown signal
>>
personal assistant user group.
>>
>ggerganov's local model of choice is GLM 4.7... Flash
oof
>>
Is 4.7 flash usable for erp or do I need derestrict/ablit?
>>
>>108094158
It's not and no amount of abliteration will help. If that's your target size use nemo.
>>
>>108093867
I might as well use this opportunity to check out/evaluate agent shit myself. They mention OpenCode. Is that the SOTA local framework currently?
>>
>>108094194
It's been a while since I tried opencode but when I did I quickly gave up and stuck to claude code with local endpoints instead.
>>
File: ce.png (4 KB, 343x67)
Am I getting snakeoiled?
>>
>>108094216
no.
>>
>>108094216
No, but the difference is small as is the size difference. As always, just get the max quant you can fit with your amount of desired context.
>>
Updated mmproj files have been released today by AesSedai to work with his updated PR: https://huggingface.co/AesSedai/Kimi-K2.5-GGUF/tree/main
>>
is stepfun worth downloading if I want a different flavor from glm?
>>
>>108094276
What changed?
>>
>>108094279
I tried to use it normally and it kept thinking forever.
>>
>>108094276
man K2.5 complaining all the time about not being able to generate nsfw image descriptions is annoying
it even complains that there is a system message and it's a "classic jailbreak" technique
>>
>>108094279
If you mean full 350B then nope sorry. It is noticeably more retarded.
>>
>>108094322
Your temperature? Thinking models often like it around Temp 1.0 to not get stuck in "Wait..." loops.
>>
>>108093867
I can't wait for llama.cpp to turn into an unmaintainable mess like Automatic1111 and be abruptly abandoned.
>>
what does this mean in the recommended models guide about glm 4.5 air?
>Needs a prefill to get around refusals.
>>
File: file.png (145 KB, 1479x884)
>>108094216
Quanting embeds and outputs is insane.
>>
>>108094298
Not sure, you can take a look here, looks like just fixes: https://github.com/ggml-org/llama.cpp/pull/19170/commits
>>
>>108094391
>not ppl
>>
>>108094361
rep pen.
>>
>>108094411
Go away, John.
>>
>>108088802
>NoLiMa
based on my model-agnostic test (12k tokens) that's a bit harder than nolima, mistral large 2 is slightly better than l3.3 70b. honestly adobe's numbers table might be invalid since they used l3.3 for "data curation", so there's bias.
l3.3 did some really stupid shit like saying 522k was below 500k. however, both models are good.
>>
>>108094445
>mistral large 2 is slightly better than l3.3 70b
Anon it is current year.
>>
>>108094445
2024 called, they want their models back.
>>
>>108094391
>quants your attention layers
>ruins the model's ffns
>slaps on a benchmaxxed imatrix on your already benchmaxxed moesissy model
the absolute state of local
>>
is qwen3-coder-next worth a damn?
>>
>>108093867
CISC is NOT happy https://github.com/ggml-org/llama.cpp/pull/19435#pullrequestreview-3770150593
>>
>>108094463
>>108094464
>2024
iykyk, not bothering to explain why.
>>
File: dissapoint.png (421 KB, 882x887)
>>108094577
They claim to have released a model heavily trained for agentic use.

It still spits out XML instead of JSON on the next turn, thus failing its agentic purpose completely.

In short: you can have a single mathematical operation or two to be executed by agents. Then it fails to call an agent properly on the intermediate results.

For example, (123 + 345) * (456 - 789)

you can have the addition and the subtraction done by the agents, but not the consequential multiplication

This sucks
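
For anyone who wants to reproduce this: a minimal sketch of the chained tool-call test against an OpenAI-compatible endpoint, with the URL, model name, and the toy calc tool as placeholder assumptions. A model actually trained for agentic use should chain three tool calls here instead of dumping XML into the content field on the later turns.

[code]
import json
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local endpoint
TOOLS = [{
    "type": "function",
    "function": {
        "name": "calc",
        "description": "Apply one arithmetic op to two numbers.",
        "parameters": {
            "type": "object",
            "properties": {
                "op": {"type": "string", "enum": ["add", "sub", "mul"]},
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
            "required": ["op", "a", "b"],
        },
    },
}]

def calc(op, a, b):
    return {"add": a + b, "sub": a - b, "mul": a * b}[op]

messages = [{"role": "user", "content":
             "Use the calc tool for every step: (123 + 345) * (456 - 789)"}]

for _ in range(8):  # give the model a few turns to chain the calls
    msg = requests.post(URL, json={"model": "local", "messages": messages,
                                   "tools": TOOLS}).json()["choices"][0]["message"]
    messages.append(msg)
    if not msg.get("tool_calls"):  # the failure mode above: it answers (or
        print(msg["content"])     # emits XML) instead of the chained call
        break
    for tc in msg["tool_calls"]:
        args = json.loads(tc["function"]["arguments"])
        messages.append({"role": "tool", "tool_call_id": tc["id"],
                         "content": str(calc(**args))})
[/code]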
>>
>>108094657
but is it good at programming or am I better off using free chatgpt
>>
>>108094581
human beings with a pulse will never be happy with vibegarbage
LLM agentfaggotry is just pure slop and I can't wait to see the industry (of software development, I don't mean LLMs, they remain useful and are here to stay) collapse from its adoption of retarded agent coders
use them as rubberducks, as second pairs of eyes etc, but stop trying to push the whole "it's going to actually do things by itself" bs
>>
>>108094825
>collapse from its adoption of retarded agent coders
If it didn't collapse from the decades of offshoring, I don't see this doing it either. There will, however, be lots of money to be made rewriting and replacing the messes created by both.
>>
>>108094859
>If it didn't collapse from the decades of offshoring
Which still relied on real humans with at least SOME coding knowledge.
>>
>>108094751
>better off using free chatgpt

Did you ever use chatgpt for programming? Why are you even asking?

I stay with deepseek for this
>>
>look to the left
>its all 12bs
>look to the right
>its all 688bs
>what the fuck happened to the middle
>>
>>108094988
You are either a vramlet or you aren't.
>>
>>108094988
The middle are 300B moes what do you mean
>>
>>108094988
Most MoE models will use only 5-15 GB of VRAM, provided you have enough conventional RAM to load them.
>>
>>108094988
There's a smoother gradient of model weights now than there has ever been in the past, mostly thanks to Qwen filling out every conceivable scale below the top end. You can even pick between dense and moe versions of the same midrange sizes.
>>
>>108094988
small enough for people to use
smart enough to be unsafe
so the middle has been discontinued
>>
Has /g/ accepted yet that local models are not for the impoverished?
>>
>>108094988
Step-3.5-Flash is 200B. You have a couple of 100B MoEs. That's the middle.
>>
>>108095166
Those count as 12bs.
>>
>>108095069
I'm talking more around the 70-150b range.
Something that could reasonably fit in a single workstation GPU yet isn't completely braindead.
>>
>>108095193
Cohere, devstral. MoE is a joke, proven time and time again.
>>
>>108095311
this unironically
>>
>>108095302
I assume you're talking about the 123b, not the 24b, because the 24b is retarded, and noticeably worse than Gemma-3 27b in all ways.
>>
>>108095373
>the 70-150b range.
No, he clearly meant the 24b.
>>
>>108095364
Everyone does that. Even the vramlets on reddit. They think a 30b is some giant hard to train or run model.
>>
>>108095302
>MoE is a joke
GLMlet cope
>>
>>108094988
Gemma 4 soon sir
>>
>>108095412
GLMlet cope? Is that what you said, anon?
>>
>>108095433
Yes, see: >>108095412
>>
File: file.png (10 KB, 316x316)
After using GLM for half a year now I officially announce that z.ai saved local.
>>
>>108095483
>this supposedly about local ad was sponsored by NovelAI
>>
File: 1762984059855286.png (325 KB, 2076x1033)
Kimi k2.5 internal safety alignment was done with a sledgehammer lol.
>>
>>108095483
stepfun was more lively than glm.
>>
>>108095163
A common tactic of the narcissistic and greedy is to price the lower and middle class out of product availability. Such an act by the wealthy is purely out of malice, almost assuredly, with no goal in mind other than forced destitution. Any consumer with their best interest in mind would not, and should not, fellate the cock of their rapist.
>>
>>108095506
You're using something below Q4, those quants are extremely broken and do this shit with normal prompts too.
>>
>>108095542
I'm using IQ3_S, I didn't expect it to be that awful.
>>
>>108095516
Yes it is sovl when compared to GLM. In the standard 4chan definition of the word.
>>
>>108095565
nta, but also unsloth quants are broken as well (as usual)
note also that the model was post-trained with QAT at int4, so it's basically already made to be used at q4; not sure if that makes it worse or better when quanted
>>
>>108095494
Everyone shat on the nai subscription compared to official api pricing.
>>
>>108095597
Per IK, unsloth is basically making dago dazzlers and lying to us about the quality of their models. Personally their quants seemed "ok" but I don't got time/space to do KLD to find the "ideal". It's def something.
>>
File: image.png (125 KB, 1261x952)
>>108095565
Yeah it was the same experience for me, older Kimi and other models didn't get this bad at these quants but something fucked up on 2.5, or just the way people have quanted it so far. It doesn't always fuck up and it will sometimes catch itself and sometimes just let errors accumulate/start looping. Do you notice that one of the first few tokens is often something weird and then it recovers? For me it tries to start its thinking block with "The user [...]" but will say shit like "The maskeduser" or "The Koreanraster" or "TheXDude" (kek).

Here's one case where it accidentally said "root circle" instead of "user" and then made up a whole nonsensical symbol hierarchy to try to make sense of it.

It's smart a lot of the time even at the broken low quants but I ended up moving up to Q4 and taking the speed hit because of the unreliability.
>>
>>108095506
>alpaca
>.assistant
pure moe kino, imagine wasting your time on this shit LMFAO.
>>
>>108095697
bad bot
>>
File: waterfox_lcJFl0EIl9.png (6 KB, 127x40)
>>108095506
>>
>>108095743
MoE was a disaster tho. All your models are "100b" and they act like that shit is any good. People started running it kinda ok on DDR5 so they just jacked up the price. For every kimi/deepseek there are a ton of these failed micro moeshits.
>>
>>108095790
moejeets don't have standards
>>
>108095697
>108095790
wtf does this have to do with the original post
also alpaca and .assistant were used with and seen on old DENSE models
>>
>>108095790
>People started running it kinda ok on DDR5 so they just jacked up the price.
you are deluded
DDR5 is going up in price because factories are building more HBM for datacenters.
it has nothing to do with a handful of retarded cpu maxxers buying DDR5. Factories that can build DDR5 can also build HBM, and that's what they'll do (which is also the reason why, unlike with the crypto bubble popping, there won't be a rush of cheap hardware for you when the ai bubble pops: these are server parts and server gpus, and even if they flooded the used market at some point you wouldn't know what to do with them).
https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram-deal
>On October 1st OpenAI signed two simultaneous deals with Samsung and SK Hynix for 40% of the worlds DRAM supply.
Just by themselves openai could be said to even be the main culprit of this situation.
>>
>>108095849
/aids/ is dying so the resident schizos chose other generals to shit on
>>
>>108095628
>lying to us about the quality of their models.
I'm skeptical of taking IK's word on anything, but I've also had a lot of issues with unsloth so I don't doubt it. If there is actual evidence that unsloth's dynamic quants are worse than anything else though, I would love to see it. In the meantime I guess I'll just use bart's quants.
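
The measurement itself is simple once you have per-position logits from a reference run and a quant run over the same tokens; a sketch of the KLD number people post (how you dump the logits is up to you; iirc llama.cpp's perplexity tool also has a --kl-divergence mode against saved base logits). Lower is better; near zero means the quant's token distribution is basically indistinguishable from the reference.

[code]
import numpy as np

def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean KL(P_ref || P_quant); both arrays are [n_positions, vocab]."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    log_p = log_softmax(ref_logits.astype(np.float64))
    log_q = log_softmax(quant_logits.astype(np.float64))
    return float((np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean())
[/code]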
>>
>>108095906
there were kld tests posted recently in these threads that showed unsloth as meh
>>
That MiniCPM model comes with tts and everything already baked in? Or do I still need to supply my own?
>>
>>108095597
>>108095628
Well in that case it's not his quant, it's the one from AesSedai, so I believe something is either wrong with the model itself or the way people quant it.

>>108095661
Yeah I can confirm that, it's super weird, for me it loves using random Chinese words and "issei" "alp" for some reason, and when I look at the token alternatives, it's all nonsense. I don't know what makes it behave like that.
It's even worse when I feed it an nsfw image: it goes into a gptoss-like rant about how it's unsafe or whether the drawings are minors or whatever, then it loops.
>>
>>108096044
OK, I can confirm: I tested a higher quant, and the IQ3_S from AesSedai is broken for me.
Maybe all quants below Q4 are too braindead, at least for describing images; it's night and day.
>>
>>108096173
our boy garm has q4_x which is basically 1:1 tensor size from the original files, can't get any closer to that than this
>>
>>108096173
VLM portion can't be below Q8
>>
>>108096203
I was hoping for something around 450GB max, the Q4 is like 600GB if I remember right. And "smarter" quants from ubergarm are unavailable on llama.cpp, while using ik means no vision capabilities.

>>108096239
I use BF16 on the mmproj, it's a small file anyway.
>>
even Q4 is a fucking cope
do some greedy decoding and run the full gamut of quants (rent some cloud compute if you can't test Q8 and f16 locally for your model of choice) and realize that all these years of being told "it's almost indistinguishable from the real thing" were lies
>>
>>108096268
4-bit is full quality for Kimi though.
>>
>>108096258
if I remember correctly q4_x doesn't actually use any ik features so this one actually can run on mainline
>>
>>108096288
someone needs to vibe the vision into IK. It supports quite a few others. Kimi too rich for my blood, I should have bought 1T instead of cucking.
>>
>>108095483
How am I supposed to take this post seriously if you don't mention your ego death?
>>
>>108096268
I member testing nemo at full precision. That is what made me stop listening to schizos like you.
>>
>>108096335
I have you as my spokesman for that.
>>
>>108096342
Nemo is dumb at any precision. You can't lobotomize the pre-lobotomized.
>>
>>108096342
qwen235 on api occasionally knew gawr gura wasn't asmongold in shark form. 1/3 rolls. That's the extent of the difference from q8 to q4.
>>
>>108096366
>Asks about Loli vtuber
That explains a lot
>>
Quanting does reduce performance relative to full precision, but if you can run a Q4 of a model 4x bigger than you can run an F16, then it will be better in every way. The only time you want to run F16 is if resources are literally unlimited or there is no superior model at the proper size to use most of your capacity at Q4.
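The arithmetic behind that claim, for anyone sanity-checking (weights only, ignoring KV cache and runtime overhead; ~4.5 bits/weight is a rough Q4_K-ish figure):

[code]
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    # memory footprint of the weights alone, in GB
    return params_billions * bits_per_weight / 8

print(weight_gb(70, 16.0))   # 70B at F16   -> 140.0 GB
print(weight_gb(280, 4.5))   # 4x bigger Q4 -> 157.5 GB, a similar footprint
[/code]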
>>
>>108096380
how many popular vtubers aren't loli or loli adjacent?
>>
>watching vtumors at all
wat
>>
>>108096425
None, but feminist brainwashing has convinced retards that's a bad thing.
>>
>>108096430
gura literally pushing 30 is even funnier.
>>
>>108096430
Maybe feminists go too far but fantasizing about fucking literal children as a grown ass man is a bad thing, yes.
>>
>>108096460
>literal children
It's a 2d anime anthropomorphized shark avatar.
>>
Hi guys, excuse my ignorance but I have a question: what is the best LLM to run on an rtx 4070 for the purpose of aiding with pentesting? Specifically explaining vulnerabilities, generating payloads, etc.?
>>
>>108095661
>It's smart a lot of the time even at the broken low quants but I ended up moving up to Q4 and taking the speed hit because of the unreliability.
how big is q4?
>>
>>108096460
Next gen is saying 25 and 18 is a moe lester. Meanwhile all our business, government, and institution leaders are making veiled references to children-children in their emails.
>>
>>108095880
>Buy up nearly half the world's supply of something while running on an endless streak of multi-billion-dollar losses, driving up the price of said thing to inaccessible levels for people who actually have money to spend on said thing.
I hate this world so God damn much.
>>
>>108096495
All an IOU at this point too. Money has not exchanged hands. The promise of buying it all is what's making it happen.
But nah, it's not a fake-out to deprive people of resources. That's crazy talk.
>>
>>108096480
>What is the best LLM to run on an rtx 4070 for the purpose of aiding with pentesting? Specifically explaining vulnerabilities, generating payloads, etc.?
my little jeet, even the SOTA online models are overhyped and underdelivering in their ability to do such things (curl's maintainer had to ban all LLM users from the security field because you guys behave like niggers), you won't be doing any of this locally and can forget about the LLMs aiding you in scamming old people
>>
>>108096535
lol I'm not too familiar with LLMs in all honesty, just frustrated with the insanely over-the-top restrictions when it comes to using public models.

I'm not looking for something I can just point and click and it will magically hack into things. I work in the field and just want something to make my life easier, whether it's understanding advanced exploits, helping me through ctfs, or coming up with viable POCs. You're saying this isn't possible?

I will go off and do my own research but since I saw the thread I thought i'd ask...
>>
>>108096590
It is technically possible, but not with a 4070. The models you can run on that puny thing would waste your time more than be of any actual help. If you insist, run a Q4 or less of Qwen3-Coder-30B and go from there. That is probably your best bet.
>>
>>108096638
Appreciated, thanks.
>>
>>108096590
this is what your kind ("security researcher" with LLMs) are unleashing onto the world:
https://arstechnica.com/security/2026/01/overrun-with-ai-slop-curl-scraps-bug-bounties-to-ensure-intact-mental-health/
an excerpt of the sort of slop :
https://gist.github.com/bagder/07f7581f6e3d78ef37dfbfc81fd1d1cd


