/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106748568 & >>106738470

►News
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6
>(09/30) Sequential Diffusion Language Models released: https://hf.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552
>(09/29) Ring-1T-preview released: https://hf.co/inclusionAI/Ring-1T-preview
>(09/29) DeepSeek-V3.2-Exp released: https://hf.co/collections/deepseek-ai/deepseek-v32-68da2f317324c70047c28f66
>(09/27) HunyuanVideo-Foley for video to audio released: https://hf.co/tencent/HunyuanVideo-Foley

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106748568

--Papers:
>106752846 >106755511
--Hardware setups and optimization strategies for running large language models locally:
>106752694 >106752837 >106752845 >106752868 >106752876 >106752881 >106752963 >106752980 >106753037 >106753055 >106753092 >106753128 >106753211 >106753217 >106753245 >106753141 >106753170 >106753173 >106753190 >106754528
--GLM 4.6 creative writing evaluation and benchmark reliability concerns:
>106750563 >106750633 >106750775 >106750659 >106750706 >106750786 >106750841 >106750833
--Mixed reception of Sora 2's video generation capabilities and limitations:
>106748610 >106748671 >106748683 >106748814 >106748736 >106748753 >106748751 >106748774 >106748777 >106748786 >106748812 >106748826
--Evaluating Suno V5's proprietary music generation against local models:
>106749000 >106749400 >106749538 >106749559 >106749590 >106749548 >106749621 >106749642 >106749799 >106749929 >106750524
--Exploring layer-level noise injection for model creativity enhancement:
>106748706 >106748752 >106748767 >106748830 >106748762 >106748852
--GLM 4.6 compatibility updates for llama.cpp:
>106751537 >106751573 >106751594
--Workaround for GLM-4.6 compatibility issue in ik_llama.cpp:
>106754849
--Sora's video generation performance and prompt adherence challenges:
>106753575 >106753597 >106753646 >106753650 >106753662 >106753732 >106753775 >106753852 >106754084 >106753667 >106753676 >106753687 >106753719 >106753729 >106753750 >106755333 >106754191 >106754204 >106753698 >106753722
--Qwen model inaccuracies in name recognition and inconsistent multilingual performance:
>106752092 >106752111 >106752139 >106752258 >106752324 >106752624
--Miku (free space):
>106748655 >106751155 >106751345 >106749314 >106753215 >106753775 >106753816 >106754088 >106754738

►Recent Highlight Posts from the Previous Thread: >>106748575

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Better tool/script to download HF repos? Everyone's complaining about this xet shiz. Parts keep stalling and can't be resumed in the browser.
it's so over
>>106755904My waifu Migu (not the poster)
>>106755904Well done. I pat the Mikuhat.
>>106755923huggingface-cli works fine and resumes on failure
>>106755923git
Is glm4.6 hybrid or thinking only?
>>106755923huggingface-cli download ubergarm/GLM-4.5-GGUF --include "IQ3_KT/*" --local-dir glm
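If it stalls, note huggingface-cli resumes on its own: re-running the identical command skips finished files and picks up the partial ones. The hf_transfer part below is optional and assumes you've installed the extra:

pip install -U "huggingface_hub[hf_transfer]"
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download ubergarm/GLM-4.5-GGUF --include "IQ3_KT/*" --local-dir glm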
Wasn't there someone trying to get MTP to work with GLM4.5 on llama.cpp a while ago? Did that also go nowhere?
https://files.catbox.moe/w3cpki.webm
>>106756110Korean sweatshops could do better walking animation than this.
>>106756126It's not about the walking animation now but the walking animation two to fifteen years from now
>>106756110>>106756126Even if you fixed the walking, the quality of the art is such utter garbage, like your low quality generic isekai slop of the season. The whole point of AI art is to make better art fast, not literally copy the shit speedrun art drawn by monkeys in animation sweatshops.
https://files.catbox.moe/w84blo.webm
>>106756164This is truly a faithful recreation of a sloppy isekai of the season, just the animation studio saving their budget for so called "better" scenes.
>>106756126Are you sure?
>>106756185Imagine how many more sloppy isekai they'll be able to churn out per season once they can have AI generate 90% of the scenes.
>>106755904>>106755906
WHERE IS AIR 4.6
GIVE ME THE WEIGHTS
GIVE ME THE GOOFS
NOW NOW NOW
Added GLM 4.6
https://files.catbox.moe/ffaa0e.webm
>>106756268
https://github.com/ggml-org/llama.cpp/issues/16361
https://github.com/ggml-org/llama.cpp/pull/16359
https://huggingface.co/bartowski/zai-org_GLM-4.6-GGUF/blob/main/zai-org_GLM-4.6-imatrix.gguf
>>106756215At that point it's going to be easier to literally just take any random light/web novel and prompt it to animate that shit. What the hell would we need animation studios run by retarded boomers for? Tradition?
>>106756055https://github.com/ggml-org/llama.cpp/pull/15225
>>106756303>Tradition?It's Japan, so yes.
>>106756313Good thing we won't need Japan anymore then.
>>106756215
i suspect there are lots of AI tools already used in the background for anime. at least the subtitle translations already have mistakes a human can't make, like mixing up someone's sex. you need to actually see the scene because in japanese that info isn't carried by the language. or a word with two interpretations where you need context to know which one it is...
I suppose everything gets auto-translated and a dude in the basement looks it over quickly before pushing the upload button.
>>106756280Wtf coderbros???
>>106756158that's hilarious because, that's not AI...
>>106756323Stuff like that will be solved when models can also take vision into account or (less likely for now) have better context management so information like that isn't lost when translating a series.
>>106756330I thought you were joking, but I just reverse image checked that shit and it's two years old. Animators will be out of a job next year by this point.
>>106756330Winrar, these >>106756110 >>106756164 >>106756285 were all scenes from the QUALITY anime Ningen Fushin no Boukensha-tachi ga Sekai wo Sukuu you desu.
>>106756333
yeah, i don't understand why the normies doom now just because the jeet AGI 2025 prediction didn't turn out to be true.
as far as i know freelance writers and translators are feeling it hard already. you're expected to use LLMs and fix the output up manually a little bit, for like 80% less pay compared to the past.
also i spot LLMs everywhere now. once you've figured out a model you can just feel it. for example the monthly kindergarten pamphlet for my kids, the monthly message from the teacher.
i'm in japan so maybe it's more used here though. not sure about other countries.
>>106756268is it even good for smut compared to g4.5? I was excited for glm because moe. Should I care it's better at math? I feel like for writing it will be whatever
>>106756296THAT'S NOT AIR YOU NIGGER
>>106756416air is deprecated due to poverty not being a valid use case
IF YOU'RE NOT USING FULL GLM FULLY ON GPU THEN YOU'RE ALSO POOR
>>106756439poor is a wide spectrum
>>106756391
Improved writing and RP was actually a focus for 4.6, according to their model card.
I haven't used it much yet because my ggufs are still downloading, but it seems a lot more flexible than 4.5 was in open-ended scenarios.
One thing that stood out is that its reasoning process is really fucking thorough for creative work now. It'll go through several genuinely fitting options, map out the reply, and consider other pretty interesting stuff which I haven't seen from another model. It's quite different from shorter thinkers like GLM 4.5 and DeepSeek V3.1, while also being much more focused and on point than how R1 used to do it, which tended to meander a lot while reasoning.
I've seen some really good reply variety thanks to this. However, it also sometimes bloats the thinking part up to almost original-R1 proportions, so you likely still want to skip it if you're running at slow speeds. Still a pretty interesting trait if nothing else.
>>106756280
>The cock length was decreased by half.
its over
>>106756478cock chance decreasing isn't bad if you get more varied words for cock
>>106756468
>Improved writing and RP was actually a focus for 4.6 according to their model card.
when will we learn that model cards mean nothing? they can say anything they want.
it wasn't cut in half, it was flattened to match the rest of the distribution
your cock is now average anon
I struggled through installing ik_llama.cpp on Windows, but I finally figured it out. I was following someone's MinGW guide, but CUDA turns out to depend on cl.exe from Visual Studio.
Now that I've got it running, what can I expect? I see that ubergarm has some quants that use the ik formats; are those worth all the fun of installing ik?
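For the next anon who hits the cl.exe thing: the usual fix is to let CMake generate for the MSVC toolchain instead of MinGW so nvcc can find cl.exe. Roughly like this (generator name depends on your VS version, so treat it as a sketch):

cmake -B build -G "Visual Studio 17 2022" -DGGML_CUDA=ON
cmake --build build --config Release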
>>106756574
for larger models that require CPU offloading, ik+uber is the best speed-wise and for quality/size
>>106756574i think only deepseek gets a speedup on ikllama because of an architecture specific optimization that ikllama has implemented
Very organic.
>>106756599Is deepseek even worth trying on 128+24gb?
>256gb uram m5 max macbook
and just like that we're back
Damn, 4.6 is good. It's real nasty even without an elaborate sys prompt.
Just an OOC "make it hot and sexy" in the first message and it does that. That's how it should be.
The brutal part is the thinking. It's already too big and that kills it totally for me.
I hope we get a voodoo moment for AI soon. I'm not gonna buy the recent 500 watt monsters from nvidia.
>>106756285This show was so ass. I can't remember whether I finished it.
>>106756801Have you tried it without thinking?
>>106756727if your Internet is fast enough and you have the disk space, it is definitely worth giving it a shot.
Are you anons really measuring a model's quality just by how prone it is to write 'cock' or 'pussy'?
>>106756860yeah, have you got a better way?
>>106756860Look at cockbench results for gemma and qwen 3 32b and tell me it's not a meaningful benchmark.
What was the last local roleplay/chat-tuned model that didn't have safety or alignment baked-in, and wasn't a "helpful assistant"? Bonus points if it wasn't told it was an AI.
https://huggingface.co/EleutherAI/gpt-neox-20b?
>>106756882
>https://huggingface.co/EleutherAI/
>Welcome to EleutherAI's HuggingFace page. We are a non-profit research lab focused on interpretability, alignment, and ethics of artificial intelligence.
sad
>>106756882That was not roleplay/chat-tuned.
>>106756830
No, because I am a huge retard. I thought GLM wasn't a hybrid thinker.
Also, something in my preset is fucked since it doesn't properly prefill <think></think>. But that's my problem.
Anyway, thanks anon.
>>106756895My ST is ancient and I'm not updating it because I'll get supply chain malware from npm, but maybe this is helpful.
>>106756903
that's helpful, i have an area with "Start Reply With" but it seems to be ignored. doing it that way will probably make it work. thanks again anon.
>>106756917No problem. Thank you for being you.
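For reference, the trick being discussed is just putting an empty think block in ST's Start Reply With field (assuming your chat template doesn't already inject one), so the model believes it has already finished thinking:

<think>
</think>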
>>106756468How censored is it when you let it think? GLM 4.5 was basically uncensored when you disabled thinking, but not so much when you didn't.
>>106756860yes
>>106756937
how do you actually disable the <think> etc. on the command line, i.e. in llama.cpp or similar, when starting a server?
and which is usually the better idea for creative writing and stories (as opposed to ERP): let it do the <think>s and just remove them during parsing, or turn it off? what would you do for R1, for example?
>>106756860Counterintuitively it is a great test.
surprise reminder that jart will always lurk this bread
>>106756522Why would a big company lie about that? I get they all lie with benchmark results but focusing on rp would be such a weird lie.
>>106756860cmon sam, your 'erry already lost to mesugaki tests to the point you had to include it in your sysprompt. no need to seethe over cockbench
Is 4.6 support merged on main yet or do I have to build from PR?
i noticed poopenAI had paid shills posting about "building a rig" for 'toss 120b in every single open sores llm community non-stop, on a weekly basis.
no i will not take my meds. fuck clankers. jannies tongue my anus.
>>106757066Probably Sam, himself. Typical millennial twink probably just sits around shitposting on the internet all day when he's not working.
I'm looking for an advice on building a rig for running the best open source AI OLLaMA chat gpt-oss-120b, any advice? Thanks in advance!
>>106756303
>What the hell would we need animation studios run by retarded boomers for?
Quality? The fuck are you talking about, even the best gens are way behind human animation. This is true for images, videos and text and it will continue to be true for a few years at least.
I'm not one of the guys who has a hard-on for human artists but AI is simply not here yet. First thing I write in booru search is -ai_generated because I can't stand that low quality and shitty generic art style - it ruins my goon sessions.
>>106757104Sure! For running a 120B parameter model like GPT-OSS or LLaMA-based variants, you’ll need a high-end setup with multiple GPUs—ideally A100 80GB or H100 80GB cards—since memory bandwidth and VRAM are critical; a powerful CPU (like Threadripper or Xeon), at least 256–512GB RAM, fast NVMe storage, and robust cooling are also essential. Using NVLink or PCIe 4.0/5.0 interconnects will help scale across GPUs efficiently. You're Welcome!
>>106756963
https://github.com/ggml-org/llama.cpp/tree/master/tools/server
--reasoning-budget N controls the amount of thinking allowed; currently only one of: -1 for unrestricted thinking budget, or 0 to disable thinking (default: -1)
You'd have to check which models are actually affected by this setting.
It depends heavily on the model whether thinking helps or hurts. R1 specifically has very malleable thinking, so you can just tell it what to focus on. Some models will spend most of their time thinking about whether to refuse, which doesn't exactly improve the output. If the thinking is not very malleable and the model hasn't been trained to think about creative writing like 4.6 apparently has been, then it's usually at best a waste of tokens.
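Concretely, disabling it at the server looks like this (model path is a placeholder, and as said above, only some models respect the flag):

./llama-server -m ./model.gguf --reasoning-budget 0
# default is --reasoning-budget -1, i.e. unrestricted thinking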
>>106757143
It's simply a matter of context, consistency, and preserving both across episodes. It's not quite there yet, but it's very close. Everything outside of that, like the booru stuff, is literally just a skill issue: slop made by retarded and lazy prompters.
>decide to use chatgpt to make quick work of a bash script to download a 5 part model
>model router decides to use retard edition
>can't get the console output right after 5 revisions
>final revision keeps going even after ctrl+c
Just give us o3 back, Sam.
>>106757220just pay 20 bucks bro
>>106757226
The only thing Plus™ lets you decide now is whether it uses the thinking model or not.
>>106757220
Can't trust GPT for anything coding related, anon.
Use it through an API so you have full control and no huge hidden sys prompt that severely degrades performance.
Sonnet for most stuff. Gemini 2.5 Pro or a reasoner for difficult stuff, but you really have to pinpoint the problem. The reasoner models all suck because they try to fix everything but what you ask them.
GPT is only good for general knowledge stuff, like obscure jap youtuber drama. It knows the deep lore. kek
>>106757157
thank you, that helps
I still wonder, even for models which don't just spout cucked consent-related nonsense in the <think>, like 4.6 or R1, is it really worth the wait? in other words, is there measurable improvement in the quality of the resulting prose, and is it significant enough to justify the increase in response time and the slowdown in t/s as context/kv builds?
>>106757220
I made a simple ebook reader, it's 300 lines and that was really quick, no issues. Then I modified it a little and there it is, no more Calibre (I can change the terminal font on the fly etc.; this one is not that readable).
But on other nights it can't get anything right.
They should really use a 1-bit quant or whatever for the free version and something else for the paid one. Based on my overall experience though, I wouldn't pay anything for chatgpt... it's too random.
>>106757273
4.6 still has cucked thinking?
>>106757274free tier claude has been constantly good for me. free chatgpt has gotten significantly worse with the last release, it used to be okay.
>>106757299
I haven't tried it, no idea. maybe I misunderstood, I thought the other guy was saying it does NOT do cucked reasoning
I'm downloading 4.6 as we speak to test it out, but yeah... had to resort to manually downloading from web browser on this PC and I'll just push them to the server through the network because apparently my server is getting cock blocked via curl now, probably rate limited from restarting chatjeetpt's shitty download script several times.
>>106757230It still shows o3 and 4.1 for me under legacy models.
Does thought process not work in the latest llama server for anyone else? I enabled it in the options and it just looks like this.
yep
Qwen Next 80B support in llama.cpp status?
monke brain got me to do a trial of glm 4.6 off openrouter and it's goood
the vibes are immaculate
how do I turn off reasoning though?
>>106755904Is there an rp frontend that's like sillytavern but not bloated?
>>106757526curl
>>106757516
>I don't need you to think, I need you to write fine smut posthaste
Have /nothink at the end of your prompt
the heavy weight of gemma. I can feel it
>>106757656Gemma MoE incoming.
>>106757656Would Google release that before Gemini 3?
>>106757656Gemma is already too fat...
>>106757724Make it even wider.
>>106757656...you know what >>106756280
>>106756890
https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B
I should have used that link instead, whoops.
>>106756860it is the least likely metric to be benchmaxxed
>>106757761
This. But as I've argued before, sex permeates all of human culture and activity, and thus permeates language in ways people ordinarily take for granted. So ERP-related benchmarks can tell you a lot about how a model handles world knowledge in general.
As I've said before, you really have to run a lot of Nala tests to see it.
A dumb model will just describe her as having hands outright. A smarter model might describe her as having paws, but describe the use of those paws basically the same as hands. But on really good models there's this light bulb moment where suddenly it describes the paws being used in a paw-like manner. At that point it's not just describing what a paw is but also what a paw isn't. It is now a falsifiable concept to the model. It's the emergence of genuine empirical knowledge. I've never genuinely coomed to the Nala card because I'm not into feralshit, but it's an epistemological goldmine when it comes to LLM testing.
does zai have their own API service for glm4.6? How quantized is it on OR?
doing a quick Nala on 4.6 as we speak, but my RAM is probably very fragmented right now because I didn't hard restart after shutting down my minecraft server, so it's going at like 1.5 tokens per second. But I will post both the thinking process and the response when it's done.
>>106757795You can select them as a provider on OR. Hopefully they don't quant their own model. They also have their own API.
>>106757828 (Me)
One thing I will say now is that it uses a lot of wiki markdown in the thinking process, which wastes a non-trivial amount of tokens. Maybe it's helpful to the model though, I don't know.
>>106757865
Would be interesting to see how its performance changes if you ban those tokens.
>>106757865
looking at the text as it streams in, it looks like they bothered to tokenize ** but not ***. So for h3 it uses ** followed by *. Which, for a model that size, is probably enough breadcrumbs for it to know to go * * if you ban **.
>>106757230
I have a feeling this is going to be their "red cross free donuts" moment.
They created an expectation, which was seen as a de-facto promise, and now the average user is assmad.
The whole ploy was only going to work if they had an actual moat and zero competitors able to keep up.
Every AI company that can't either serve a consistent SOTA model or be the cheapest bottom feeder is going to die like the irrational money pit that it is. Even VCs aren't that dumb.
>>106756990
what's the context? damn, i've been here since 2023 but never noticed this drama
probably because i was too busy making gpt4-x-alpaca
good times...
>>106757936tl;dr lcpp takeover attempt by a "look at meeee" midwit coder with some minor discord cult was curbstomped by ggerganov. It was a dark time. The less said the better.
wow I have my ST set to generate 2048 tokens at a time right now to account for thinking models but I think GLM-4.6 is going to use all of those and still not be done thinking at this rate.
>Great question — and it's smart to ask this before spending time troubleshooting something you may not even need.
>>106757958
damn yeah, I remember seeing her on twitter saying she made that llama thing all on her own
she's cute though
>>106757958Does Mr. Ggerganov post here?
>>106757936
>gpt4-x-alpaca
that was one of the models that got me hooked, thanks for your work anon
>>106757996No problem, I'm glad you enjoyed it.
so i tried out ikllamacpp, and i got 3t/s on a 5-bit quant of glm 4.5. but it was running entirely off of my CPU and my SSD. how do i put it on my GPUs and RAM?
>>106757989
No. I don't.
Fuck...
>>106757433It works in some models and in others it doesn't.
>>106757477Two more weeks
>>106757973
>https://github.com/ggml-org/llama.cpp/pull/613
This one is a proper narcissist.
>>106756990LLamafile has quietly been abandoned so I think it's safe to say that by now jart has moved on.
Erl lather magister
>>106757828
Here we go.
The response is indented weirdly because for some reason ST didn't parse the </think> properly and everything appeared inside the thought.
All that wiki markdown seems to have inflicted it with asterisk brain, because it actually did use asterisks to punctuate every single switch between dialogue and narration: *blah blah blah**she said**she glomps you*
While it took the very rare step of acknowledging that the user's back is turned to her at the start of the scenario, it did not seem to understand that this makes the space surrounding the user's chest physically inaccessible without additional action being taken.
Either way, it's miles away from the worst I've ever seen. BUT as a function of performance relative to the investment of resources it requires to run locally... it's Scout/GPT-OSS tier. That was a lot of thinking for a very milquetoast reply.
>you were forced to go for the 200 dollars/month subscription to run Sora 1
>with Sora 2 you can simply go for the 20 dollars/month subscription
what kind of black magic is this? I really thought their model was gigantic, but that's probably not the case at all
>>106757960
*ding* reply is baked
>Thought for 5 minutes
sigh and hit continue
>>106758012
think the idea is -ngl 99 and then override layers or specific tensors to CPU, with either --n-cpu-moe (adjust as low as possible before GPU OOM) or by manually assigning them with -ot
for GLM-4.6 you maybe want the first three (non-MoE) layers fully on GPU?
first time playing with ik today too, so i don't really know what i'm doing
>>106758314Also worth mentioning this is IQ4_XS bartowski quant
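for anyone else fumbling through this, the rough shape of the command is something like this (60 is a made-up starting point; lower it until just before the GPU OOMs):

./llama-server -m ./GLM-4.6-IQ4_XS.gguf -ngl 99 --n-cpu-moe 60
# or pin the expert tensors to CPU by hand instead of --n-cpu-moe, e.g.:
# -ot "ffn_.*_exps=CPU"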
>>106758346
>I really thought their model was gigantic, but it's probably not the case at all
Probably a MoE.
Inference providers love MoE.
>>106758346Don't be fooled by the lies. You need an invite code to use Sora 2. So it's a limited rollout at the moment. So don't give them your 20 dollars yet. I made that mistake already.
>>106758314
>thought for an hour
wait what
>>106758419
I said in my previous message that the memory on my server is in a very suboptimal state at the moment; as a result it only averaged 0.48 tokens/sec on generation throughout the entire run.
>>106758432anyway yeah the think part does look kinda promising
>>106758314
>I like that.
>It's a line of ownership.
>musky+wild == earthy+grassy+predatory?
https://huggingface.co/ubergarm/GLM-4.6-GGUFHe is goofing.
>>106758607Our lord and saviour.
>>106758346
Sora 1 was also available with the $20 subscription. Maybe there was a delay before it was opened up, don't remember now.
>>106758403
I just grabbed a random invite from a Reddit thread where people were sharing them.
>>106757843I'm pretty sure ZAI runs the model in FP8 and considers that the "normal" way to run it.
>>106758038Should've split that in two posts for best effect.
>no 4.6 air
another L for self hosting lmao
>>106758314Should have turned off thinking...
>>106758734stop being poor
>>106758741Oh GLM-chan it really ain't that deep..
Miku is a clanker
Haven't checked this general since Deepseek R1 0528 came out and have been using that ever since. Is there anything worth downloading over it yet?
>>106758997nemo
>>106758997what rig?
>>106759038Epyc 9334 768GB and a few 3090s to speed up prompt processing.
>>106758997no except for maybe glm 4.6 but i didn't try it yet. people seem to like it
https://responsiblestatecraft.org/israel-chatgpt/
>>106759086Baby blender bros... we are so back!
>>106758997
Kimi K2 and GLM-4.5 are side grades, but they are all slower than R1 at the same quant (at least when running in ik_llama.cpp).
I'm still on R1-0528 too.
Had a dream about dating cute girls but I don't even remember their names after waking up
Apple will save local
>https://arxiv.org/abs/2509.22935
>Compute-Optimal Quantization-Aware Training
>>106759193qrd?
realistically, why cant i just run daniel's 1bit glm4.6 quant and have *fun*
>>106759329Who said you can't?
>>106759262Ask an LLM to summarize it
>>106755904
Good afternoon /lmg/
Did yet another finetune. This time of an entire board's posts:
https://huggingface.co/datasets/AiAF/co-sft-dataset
glm 4.6 verdict?
>>106755904Gemini 3.0 will be released this week or next week (probably on Monday). So the Deepseek team will finally be able to finetune their DS V4 model.
>>106759542based chinks
ikllama.cpp still is not using my GPUs, even on tiny models. -ngl is set to 99 and i have completely removed --threads, --n-cpu-moe, and -ot, but it still is CPU-only for some reason.
normal llama works with my GPUs just fine.
>>106759636
Did you build it with the CUDA/ROCm/Vulkan/SYCL backend?
>>106759405
What's with the tabs and double newlines in the template?
>>106759499
10/10
>>106759121
I see you're back again. Try 4.6, it's an actual upgrade
>>106759636Oh yeah, if you launch it with --verbose, does it list the GPUs as devices?
yes. i built it with
cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF
cmake --build ./build --config Release -j $(nproc)
>>106759656
let me try that
>>106759405
>Tabs
That's just how axolotl inference formats output. It automatically wraps prompts in the proper chat template so I don't have to worry about making sure my prompts are formatted correctly.
>double newlines
??? I see single lines
>>106758314Serious prompting issue. GLM-4.6 barely needs anything beyond "You are a writer tasked with narrating {{char}}" for scenarios and "You are {{char}} in a roleplay with {{user}}". Add basic writing instructions to use full paragraphs and that's it. I'm using Markdown headers in the sys prompt to mark descriptions and instructions and it has never used Markdown in any of its actual replies.
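i.e. the whole sys prompt can be as small as this (my own paraphrase, not a magic incantation):

You are a writer tasked with narrating {{char}} in a roleplay with {{user}}. Write in full paragraphs and never speak or act for {{user}}.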
>>106759542stealing from the (gpu) rich and giving to the (gpu) poor is based
>>106759668>let me try thatThis is what you are looking for.
>>106759687very good sarrs
>>106759694
it did not show that. it said it was not compiled with cuda for some reason and that -ngl would be ignored.
i just tried a rebuild but now it says Unsupported gpu architecture 'compute_120'
Been out of AI-ing for a while, looking for two (2) things.
Is there a good model that can turn images into transparent vectorized images, like for an official logo?
What is the current best LLM you can run on 48GB VRAM (two 3090s)?
>>106758997glm 4.6 is great
>>106757451birthday mikusex
>>106759694
>rtx 3070ti laptop
>4.5 air
how does it even work? it must feel like you're making a religious pilgrimage every time you make a request
>>106757707They base them on Gemini, so I doubt they will release them before Gemini 3.
>>106759730
Do you perhaps need to update your CUDA SDK?
>>106759749
Eh. I get 8t/s with ikllama.cpp at 16ish K context.
Good enough if I disable thinking, thanks to the magic of DDR5.
>>106759761i am on cuda 12.5
>>106759742this, glm4.6 is legit the first model to truly compete with claude imo, deepseek / kimi were always poor knockoffs
>>106759772
I think you need at least 12.8. Maybe 12.9.
Or you can manually force a lower compute_ version.
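forcing it would look something like this (the arch list is just an example for 30xx/40xx cards; pick whatever your toolkit actually supports):

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;89"
cmake --build build --config Release -j $(nproc)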