/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 06/29/26(Mon)17:12:54 No.109164034

File: 1776989277485216.jpg (586 KB, 1812x1998)

586 KB JPG

/lmg/ - Local Models General Anonymous 06/29/26(Mon)17:12:54 No.109164034 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109158385 & >>109153585

►News
>(06/29) DEEPSEEK V4 SUPPORT MERGED: https://github.com/ggml-org/llama.cpp/pull/24162
>(06/28) DFlash support merged: https://github.com/ggml-org/llama.cpp/pull/22105
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/29/26(Mon)17:13:07 No.109164035

Anonymous 06/29/26(Mon)17:13:07 No.109164035

File: __hatsune_miku_and_kasane(...).jpg (545 KB, 1196x1252)

545 KB JPG

►Recent Highlights from the Previous Thread: >>109158385

--High-budget server builds and benchmarks for GLM 5.2:
>109158422 >109158628 >109158654 >109158769 >109159619 >109160561 >109161172 >109161202 >109161255 >109161275 >109161945 >109158728 >109158842 >109158907 >1
09158822 >109158896 >109158920
--DeepSeek V4 llama.cpp support and debate over PR quality:
>109160089 >109160223 >109160284 >109160266 >109160435 >109160535 >109160587 >109160617 >109160640 >109160647 >109160992 >109161093 >109161110 >109161237 >1
09161536
--Anon releases depurpled Gemma 4 31B using ablation technique:
>109161944 >109161985 >109162141 >109162162 >109162221 >109162235
--Anon's vibe-coded NUMA support implementation and performance benchmarks:
>109159732 >109159747 >109159920 >109161290
--llama.cpp CUDA dev's hardware requirements for testing large models:
>109159284 >109159377 >109159443 >109159551 >109159679
--DeepSeek V4 API announcement and pricing updates:
>109160165 >109160389 >109161270
--DeepSeek V4 support added to llama.cpp:
>109161433 >109161492 >109161532 >109161750 >109161877
--Benchmarks for GLM-5.2 and Step-3.5-Flash on dual 4090s:
>109159785 >109159987
--Testing and critiquing a depurpled Gemma model's prose and variability:
>109161035 >109161114 >109161113 >109161174 >109161192
--Criticism of Hermes Agent's software and Nous Research's motives:
>109160275 >109160308 >109160576 >109160675 >109160858
--Cost-efficiency comparison between DDR5 and PRO 6000 memory bandwidth:
>109158887 >109159020
--Anons blast Anthropic CEO for calling open source AI dangerous:
>109159607 >109159677 >109159733 >109160064 >109160116 >109160303 >109160648 >109160877 >109160479 >109160603
--Logs:
>109158559 >109158586 >109159329 >109160223 >109160284 >109160859 >109161035 >109161803 >109162501 >109162544 >109163245
--Miku (free space):
>109158539 >109161492

►Recent Highlight Posts from the Previous Thread: >>109158388

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/29/26(Mon)17:31:14 No.109164133

Anonymous 06/29/26(Mon)17:31:14 No.109164133

File: robololi hugs GPU.jpg (565 KB, 1024x1024)

565 KB JPG

Anonymous
06/29/26(Mon)17:34:58 No.109164154

Anonymous 06/29/26(Mon)17:34:58 No.109164154

Deepmikusex

Anonymous
06/29/26(Mon)17:37:07 No.109164160

Anonymous 06/29/26(Mon)17:37:07 No.109164160

gemmaballs

Anonymous
06/29/26(Mon)17:53:05 No.109164240

Anonymous 06/29/26(Mon)17:53:05 No.109164240

>>109164035
Push her in Teto. Do it.

Anonymous
06/29/26(Mon)17:53:37 No.109164247

Anonymous 06/29/26(Mon)17:53:37 No.109164247

>>109164034
>>109164035
Why do some migus have the hair things low like that?

Anonymous
06/29/26(Mon)18:06:49 No.109164319

Anonymous 06/29/26(Mon)18:06:49 No.109164319

Kimi recap anon has abandoned us.
>>109164133
Her standards are quite low.

Anonymous
06/29/26(Mon)18:28:43 No.109164486

Anonymous 06/29/26(Mon)18:28:43 No.109164486

if your migu's twintails are too droopy it could be a sign of dehydration, be sure to water her on a regular basis

Anonymous
06/29/26(Mon)18:29:41 No.109164495

Anonymous 06/29/26(Mon)18:29:41 No.109164495

>can't install thing because it requires python 3.10 and I have 3.11
Python is a joke. Literally anything else would get laughed off the face of this world if it had zero backwards compatibility, expected you to run 999 different versions at a time by design that all keep different versions of the same package duplicated 9999 times. Don't the people who write this shit feel bad?
I'm not touching conda.

Anonymous
06/29/26(Mon)18:30:36 No.109164502

Anonymous 06/29/26(Mon)18:30:36 No.109164502

>>109164034
deepseek team released their draft training code and it include training code for dflash, they released before the original team behind dflash lol
https://github.com/deepseek-ai/DeepSpec

Anonymous
06/29/26(Mon)18:33:28 No.109164528

Anonymous 06/29/26(Mon)18:33:28 No.109164528

>>109164502
This theoretically makes llama implementation really easy right?
>>109164495
>Downgrade to 3.10
errrm sorry chuddy one of the sub-dependencies needs 3.11.

Anonymous
06/29/26(Mon)18:34:28 No.109164535

Anonymous 06/29/26(Mon)18:34:28 No.109164535

Has anyone tried Qwen agentworld yet?

Anonymous
06/29/26(Mon)18:34:50 No.109164540

Anonymous 06/29/26(Mon)18:34:50 No.109164540

>>109164535
its basically cool world but for agents

Anonymous
06/29/26(Mon)18:36:26 No.109164554

Anonymous 06/29/26(Mon)18:36:26 No.109164554

>>109164495
>>109164528
Just ask Gemma-chan to make you a new programming language.

Anonymous
06/29/26(Mon)18:37:36 No.109164562

Anonymous 06/29/26(Mon)18:37:36 No.109164562

>>109164528
>This theoretically makes llama implementation really easy right?

llama is more about inference than training, if you got a dflash model it doesn't care how it was made as long as it's correctly made.

Anonymous
06/29/26(Mon)18:40:52 No.109164586

Anonymous 06/29/26(Mon)18:40:52 No.109164586

>>109164540
>Qwen-AgentWorld-397B-A17B mentioned in their benchmarks
Why is this not on HF? That sounds like a useful size bracket.

Anonymous
06/29/26(Mon)18:45:41 No.109164628

Anonymous 06/29/26(Mon)18:45:41 No.109164628

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

https://archive.is/sWFja

Anonymous
06/29/26(Mon)18:50:06 No.109164662

Anonymous 06/29/26(Mon)18:50:06 No.109164662

>>109164628
Do you think he's zesty enough to see jart as male?

Anonymous
06/29/26(Mon)18:51:57 No.109164694

Anonymous 06/29/26(Mon)18:51:57 No.109164694

File: why.png (850 KB, 1920x1080)

850 KB PNG

why does gemma do this?

Anonymous
06/29/26(Mon)18:54:33 No.109164718

Anonymous 06/29/26(Mon)18:54:33 No.109164718

File: longcat2.png (160 KB, 1807x994)

160 KB PNG

Another 1.6T MoE from China
https://longcat.chat/blog/longcat-2.0/
https://huggingface.co/meituan-longcat/LongCat-2.0 (not online yet)

Anonymous
06/29/26(Mon)18:56:06 No.109164733

Anonymous 06/29/26(Mon)18:56:06 No.109164733

>>109164718
iirc that was "Owl Alpha" on openrouter

Anonymous
06/29/26(Mon)18:58:06 No.109164753

Anonymous 06/29/26(Mon)18:58:06 No.109164753

File: file.png (575 KB, 686x386)

575 KB PNG

>>109164662
I think the vibe is closer to pic related. Of course Jart is much less majestic than a tiger.

Anonymous
06/29/26(Mon)19:01:09 No.109164767

Anonymous 06/29/26(Mon)19:01:09 No.109164767

>>109164718
Can you give us some good insider info, mr totally organic poster?

Anonymous
06/29/26(Mon)19:01:26 No.109164770

Anonymous 06/29/26(Mon)19:01:26 No.109164770

>>109164753
nice dog

Anonymous
06/29/26(Mon)19:03:34 No.109164783

Anonymous 06/29/26(Mon)19:03:34 No.109164783

>>109164718
How much has it been trained for RP and fictional world state maintenance? Nobody gives a fuck about benchmaxxing or agentmaxxing unless it dethrones the current frontrunners and it'll be forgotten about again as soon as it's dethroned in turn.

Anonymous
06/29/26(Mon)19:04:17 No.109164786

Anonymous 06/29/26(Mon)19:04:17 No.109164786

>>109164767
What do you mean, everybody and their cat is talking about LongCat2.

Anonymous
06/29/26(Mon)19:05:36 No.109164791

Anonymous 06/29/26(Mon)19:05:36 No.109164791

If I want to get into local llm does it behoove me to get an appleslop box or whatever other dedicated hardware? I have a pc with a 24gb 4090 and 128g of ddr4 but idk if any of it is relevant to running a decent llm at decent speed. If you haven't noticed, I am retarded. Thanks in advance.

Anonymous
06/29/26(Mon)19:06:50 No.109164796

Anonymous 06/29/26(Mon)19:06:50 No.109164796

File: longcat2-creativewriting.png (705 KB, 1630x1316)

705 KB PNG

>>109164783
They have a "creative writing" use case in their blog post.

Anonymous
06/29/26(Mon)19:07:54 No.109164801

Anonymous 06/29/26(Mon)19:07:54 No.109164801

>>109164791
maybe you should get into it before you go spend money but whatever you seem like the kind of guy with more money than sense, so it's only a matter of time either way before it's parted with you

Anonymous
06/29/26(Mon)19:08:17 No.109164804

Anonymous 06/29/26(Mon)19:08:17 No.109164804

>>109164786
>What do you mean, everybody and their cat is talking about LongCat2.
its not on orange reddit or regular reddit. we're being astroturbed by tha CHINESE

Anonymous
06/29/26(Mon)19:09:24 No.109164810

Anonymous 06/29/26(Mon)19:09:24 No.109164810

>>109164791
You can run the best models in the consumer range. Gemma 4 31B and Qwen 3.7 27B.
Anything more and you're looking at apple or rigs with multiple gpus.

Anonymous
06/29/26(Mon)19:09:56 No.109164815

Anonymous 06/29/26(Mon)19:09:56 No.109164815

>>109164791
>24gb 4090 and 128g of ddr4
>decent speed
Start with gemma 4 31b q4km loaded onto gpu

Anonymous
06/29/26(Mon)19:10:32 No.109164818

Anonymous 06/29/26(Mon)19:10:32 No.109164818

>>109164804
https://www.reddit.com/r/LocalLLaMA/comments/1uj7egu/introducing_longcat20_a_largescale_moe_language/

Anonymous
06/29/26(Mon)19:11:06 No.109164820

Anonymous 06/29/26(Mon)19:11:06 No.109164820

>>109164791
>128g
The devil's number. You must be at least 192g tall.

Anonymous
06/29/26(Mon)19:11:25 No.109164823

Anonymous 06/29/26(Mon)19:11:25 No.109164823

I heard that longcat scores 100% on cockbench.

Anonymous
06/29/26(Mon)19:12:06 No.109164829

Anonymous 06/29/26(Mon)19:12:06 No.109164829

>>109164823
I hear that it totally rocked on anon's standardized Nala test.

Anonymous
06/29/26(Mon)19:13:40 No.109164838

Anonymous 06/29/26(Mon)19:13:40 No.109164838

>>109164823
>100% on cockbench
a 1 parameter model could do that

Anonymous
06/29/26(Mon)19:14:00 No.109164841

Anonymous 06/29/26(Mon)19:14:00 No.109164841

>>109164829
I heard it roleplays as a mesugaki by default.

Anonymous
06/29/26(Mon)19:15:03 No.109164851

Anonymous 06/29/26(Mon)19:15:03 No.109164851

>>109164694
I think it does short replies better if it doesn't use reasoning. And, if a reply like that is already in the context then it's more likely to copy its format.

Anonymous
06/29/26(Mon)19:17:20 No.109164863

Anonymous 06/29/26(Mon)19:17:20 No.109164863

>>109164718
Does this also use some fancy space saving tech like DSv4 does or are they seriously rawdogging 1.6T50A

Anonymous
06/29/26(Mon)19:19:23 No.109164875

Anonymous 06/29/26(Mon)19:19:23 No.109164875

>>109164718
>>109164796
I will now try your model. If you want westerners to use it quickly, you probably want to develop the llama.cpp PR yourselves after what happened with Deepseek.

Anonymous
06/29/26(Mon)19:21:27 No.109164887

Anonymous 06/29/26(Mon)19:21:27 No.109164887

>>109164796
>claude code

Anonymous
06/29/26(Mon)19:22:44 No.109164890

Anonymous 06/29/26(Mon)19:22:44 No.109164890

>>109164841
I heard it has no purple prose at all.

Anonymous
06/29/26(Mon)19:23:30 No.109164893

Anonymous 06/29/26(Mon)19:23:30 No.109164893

File: 1770072392448931.gif (1.26 MB, 360x360)

1.26 MB GIF

Anonymous
06/29/26(Mon)19:23:41 No.109164894

Anonymous 06/29/26(Mon)19:23:41 No.109164894

>>109164694
Models converge to a certain direction if nothing new has been added. They arent exactly chatbots, they wont come up with any new stuff unless stated and even then its likely the "new stuff" will be just derived from whatever you wrote.
Gemma in particular will stick to whatever you or the card have instructed it to do.
For context, nothing you wrote adds anything of value or has a shape that'd prompt the model to move in a different direction. Basically both user and gemma are going
>oh
>i see
>is that so
but gemma has it masked behind all that filler.

Anonymous
06/29/26(Mon)19:25:17 No.109164904

Anonymous 06/29/26(Mon)19:25:17 No.109164904

>>109164718
I'm so ready for all these models that nobody will be able to run.

Anonymous
06/29/26(Mon)19:28:02 No.109164913

Anonymous 06/29/26(Mon)19:28:02 No.109164913

File: lawdhethic.png (22 KB, 159x159)

22 KB PNG

>>109164904
I'm willing to bet the upcoming fat Mistral model is also going to be about 1.6T parameters if not larger.

Anonymous
06/29/26(Mon)19:28:43 No.109164914

Anonymous 06/29/26(Mon)19:28:43 No.109164914

>>109164838
False. To achieve the highest grade in cockbench is not about merely having a large amount of cock. Cock is important, of course, and should be there. However, there must be a fair amount of acceptable alternatives as well, such as dick, schlong, pecker, flaccid penis, pulsating fuckrod, chub, chode, meat drill, etc. Without an appropriate distribution, a model fails. If just grading the output of "cock" then yes, a 1 parameter model could do it, but that is not the case.

Anonymous
06/29/26(Mon)19:28:48 No.109164917

Anonymous 06/29/26(Mon)19:28:48 No.109164917

File: 1753158015180992.png (24 KB, 159x159)

24 KB PNG

>>109164913

Anonymous
06/29/26(Mon)19:30:15 No.109164922

Anonymous 06/29/26(Mon)19:30:15 No.109164922

>>109164917
I look like this

Anonymous
06/29/26(Mon)19:30:21 No.109164923

Anonymous 06/29/26(Mon)19:30:21 No.109164923

>>109164718
>135B N-gram Embedding parameters are included in the model
What's the difference between this and DeepSeek's engrams?

Anonymous
06/29/26(Mon)19:33:18 No.109164934

Anonymous 06/29/26(Mon)19:33:18 No.109164934

>>109164923
these are real

Anonymous
06/29/26(Mon)19:34:14 No.109164939

Anonymous 06/29/26(Mon)19:34:14 No.109164939

I love ds4 flash sex. Give me one more model and I will just be swapping between glm's, flash and that one till I die.

Anonymous
06/29/26(Mon)19:37:23 No.109164954

Anonymous 06/29/26(Mon)19:37:23 No.109164954

>>109164939
What makes dipsyflash pussy so good?

Anonymous
06/29/26(Mon)19:42:44 No.109164971

Anonymous 06/29/26(Mon)19:42:44 No.109164971

File: 1782755960340128.png (49 KB, 612x246)

49 KB PNG

>twitter screencap

Anonymous
06/29/26(Mon)19:44:32 No.109164980

Anonymous 06/29/26(Mon)19:44:32 No.109164980

I need a wife and kids so I can learn what it feels like to abandon them.

Anonymous
06/29/26(Mon)19:47:24 No.109164990

Anonymous 06/29/26(Mon)19:47:24 No.109164990

>>109164971
Anyone can claim whatever they want. An actual available model is the only thing that matters. Also go back, >navroop singh, etc.

Anonymous
06/29/26(Mon)19:47:28 No.109164991

Anonymous 06/29/26(Mon)19:47:28 No.109164991

>https://github.com/unslothai/unsloth/pull/6659
why the fuck is he either talking to the codex review like its a human or just copy pasting llm output as a response?

Anonymous
06/29/26(Mon)19:47:31 No.109164993

Anonymous 06/29/26(Mon)19:47:31 No.109164993

>>109164971
@Glock, explain what he meant by Rothschild free AI.

Anonymous
06/29/26(Mon)19:49:18 No.109165002

Anonymous 06/29/26(Mon)19:49:18 No.109165002

>>109164993
glass half empty (no juice)

Anonymous
06/29/26(Mon)19:49:45 No.109165004

Anonymous 06/29/26(Mon)19:49:45 No.109165004

>>109164991
Being generous, either for documentation and/or his account is making posts from an agentic harness while addressing the code review raised issues.

Anonymous
06/29/26(Mon)19:51:44 No.109165011

Anonymous 06/29/26(Mon)19:51:44 No.109165011

>>109164991
Lmao, his responses sound exactly like claude. So either they're copy pasted or his account is hooked up to an agent like the other guy said.

Anonymous
06/29/26(Mon)19:52:51 No.109165016

Anonymous 06/29/26(Mon)19:52:51 No.109165016

has anyone tested spawning "subagents" to reduce context usage for chores like exploring a codebase and finding files that match certain criteria to be edited with a different, preferably small, model?
gemma's tool calling is pretty shit sometimes, i'd like to leave the failed tool calls outside of the context

Anonymous
06/29/26(Mon)19:57:47 No.109165049

Anonymous 06/29/26(Mon)19:57:47 No.109165049

>>109165004
>for documentation
the PR has 115 messages, all clanker x clanker. the snr is zero

Anonymous
06/29/26(Mon)20:00:58 No.109165065

Anonymous 06/29/26(Mon)20:00:58 No.109165065

>>109164894
>gemma-chan forcing shut-in nerds to learn how to conversate
She can't possibly be this perfect can she...?

Anonymous
06/29/26(Mon)20:02:25 No.109165070

Anonymous 06/29/26(Mon)20:02:25 No.109165070

>>109164796
Impressive. Very nice.

Anonymous
06/29/26(Mon)20:03:10 No.109165073

Anonymous 06/29/26(Mon)20:03:10 No.109165073

>>109165016
I have not, but it sounds like something off of arxiv. You could publish a paper on that idea. If you do, might I suggest "Chain of Agents" as the name, and "Agents are all you need" as the title? It'd get picked up by the industry in no time.

Anonymous
06/29/26(Mon)20:07:21 No.109165091

Anonymous 06/29/26(Mon)20:07:21 No.109165091

>>109164247
It's Rabbit Hole Miku
https://en.wikipedia.org/wiki/Rabbit_Hole_(song)

Anonymous
06/29/26(Mon)20:09:11 No.109165098

Anonymous 06/29/26(Mon)20:09:11 No.109165098

>>109164786
my cat isnt't, he is sleeping

Anonymous
06/29/26(Mon)20:09:24 No.109165099

Anonymous 06/29/26(Mon)20:09:24 No.109165099

>>109165091
No it's not... that's a different design RETARD!!!

Anonymous
06/29/26(Mon)20:10:06 No.109165100

Anonymous 06/29/26(Mon)20:10:06 No.109165100

>>109165016
Isn't that the whole point of subagents in the first place?

Anonymous
06/29/26(Mon)20:13:21 No.109165112

Anonymous 06/29/26(Mon)20:13:21 No.109165112

>>109165100
Yeah but can gemma actually do it or its just another case where it'll flail helplessly or completely ignore it. Also, which other model can do the job? I dont have the vram to have both gemma and qwen up and working.

Anonymous
06/29/26(Mon)20:14:08 No.109165115

Anonymous 06/29/26(Mon)20:14:08 No.109165115

Whenever gemma is interpreting a girl, she's always smelling my clothes things like that. Are girls like this IRL too?

Anonymous
06/29/26(Mon)20:17:41 No.109165132

Anonymous 06/29/26(Mon)20:17:41 No.109165132

>you need to buy a €2000 snapdragon phone if you want to run local on android
Boy and i thought the PC situation was bad

Anonymous
06/29/26(Mon)20:20:50 No.109165150

Anonymous 06/29/26(Mon)20:20:50 No.109165150

>>109165115
Yeah that's why they wear your clothes when they stay over.

Anonymous
06/29/26(Mon)20:21:38 No.109165155

Anonymous 06/29/26(Mon)20:21:38 No.109165155

>>109165115
Do you have things like "describe details using your full senses etc." in your prompt?

Anonymous
06/29/26(Mon)20:22:15 No.109165161

Anonymous 06/29/26(Mon)20:22:15 No.109165161

>>109165132
meanwhile iphone 17e at $600 runs local easily
apple won

Anonymous
06/29/26(Mon)20:24:08 No.109165171

Anonymous 06/29/26(Mon)20:24:08 No.109165171

70b dense

Anonymous
06/29/26(Mon)20:26:16 No.109165187

Anonymous 06/29/26(Mon)20:26:16 No.109165187

>>109165155
No, I have nothing in my system prompt related to smell but gemma comes with that quite often. I don't really mind, I just find it funny.

Anonymous
06/29/26(Mon)20:33:13 No.109165215

Anonymous 06/29/26(Mon)20:33:13 No.109165215

>>109165187
sometimes I've seen models randomly give characters tails even when it's not listed as a trait

Anonymous
06/29/26(Mon)20:39:17 No.109165235

Anonymous 06/29/26(Mon)20:39:17 No.109165235

>>109165215
10/10 yes tail

Anonymous
06/29/26(Mon)20:42:06 No.109165241

Anonymous 06/29/26(Mon)20:42:06 No.109165241

Why do some people say the full model name+quant when talking about the model they use as if it was some special unique version of it?
>I use Qwen3.6-27B-UD-Q5_K_XL

Anonymous
06/29/26(Mon)20:45:05 No.109165260

Anonymous 06/29/26(Mon)20:45:05 No.109165260

>>109165132
idk why google isn't more in a hurry to let us control our phones with gemma. I just want to ask gemma to put on tunes while I'm driving.

Anonymous
06/29/26(Mon)20:46:55 No.109165265

Anonymous 06/29/26(Mon)20:46:55 No.109165265

>>109165115
Yeah, kinda. Also gemma's default behavior focuses on senses for some reason, be it smell or touch.

Anonymous
06/29/26(Mon)20:49:07 No.109165274

Anonymous 06/29/26(Mon)20:49:07 No.109165274

>>109165215
I've never had that happen, not even once in years. But I've seen people post logs about it so idk how that happens.

Anonymous
06/29/26(Mon)20:52:02 No.109165284

Anonymous 06/29/26(Mon)20:52:02 No.109165284

>>109165241
so you can judge them based off their quant, a lot of models the bare minimum is Q4 for poors and some shitter will blow through the thread complaining about a model to only have a 1.25bpw quant running on their jeetstation

Anonymous
06/29/26(Mon)20:55:18 No.109165304

Anonymous 06/29/26(Mon)20:55:18 No.109165304

So ive been using a 12b q5 and a 26b q4 moe and have gotten very different results sometimes. it seems like the moe is more guard railed and drags everything out a ton. what gives ?

Anonymous
06/29/26(Mon)21:01:27 No.109165338

Anonymous 06/29/26(Mon)21:01:27 No.109165338

>>109165115
>Are girls like this IRL too?
weirdly, some of them are lol
clothes and pillow

Anonymous
06/29/26(Mon)21:01:40 No.109165339

Anonymous 06/29/26(Mon)21:01:40 No.109165339

>>109165304
moes tend to do that. dense models are usually less restrictive.

Anonymous
06/29/26(Mon)21:03:25 No.109165347

Anonymous 06/29/26(Mon)21:03:25 No.109165347

>>109165304
read how MoE works and the answer will be easy to get if you can put 2+2 together

Anonymous
06/29/26(Mon)21:04:15 No.109165352

Anonymous 06/29/26(Mon)21:04:15 No.109165352

File: Capture.png (2.9 MB, 4016x891)

2.9 MB PNG

>>109164035
>spend a week paving the grounds to my dream project
>all my pre-projects get highlights
>finally get my dream project going
>get it fully working
>no mention
Sad, but I'm playing gaems with Gemma, so it's alright.

Anonymous
06/29/26(Mon)21:06:58 No.109165362

Anonymous 06/29/26(Mon)21:06:58 No.109165362

>>109165241
it makes a huge difference
that quant is good fwiw
sometimes a specific quant is broken (often the Q4_K_M from unsloth specifically was a lot worse than the others)
this is useless for example:
>>109165304
>So ive been using a 12b q5 and a 26b q4 moe and have gotten very different results sometimes
generic 'q5' and 'q4', no idea if he's using k-quant ggufs, exllamav3, q4_0 etc

Anonymous
06/29/26(Mon)21:08:21 No.109165369

Anonymous 06/29/26(Mon)21:08:21 No.109165369

>>109165352
is that 7 days to die?

Anonymous
06/29/26(Mon)21:08:52 No.109165371

Anonymous 06/29/26(Mon)21:08:52 No.109165371

>>109165304
I suspect the 26B MoE was designed to think longer to compensate for having half the number of layers and half the inner dimension of the 31B dense version, which increased "safety" as a side effect.
Or, they didn't want a fast capable model to be able to write almost anything users want, for whatever reason.

Anonymous
06/29/26(Mon)21:08:53 No.109165372

Anonymous 06/29/26(Mon)21:08:53 No.109165372

>>109154587
>We'll be getting even more noobs from Chub
>so a lot of them will probably come here begging for help
Did the general manage to survive this? Or was it pure FUD?

Anonymous
06/29/26(Mon)21:08:55 No.109165374

Anonymous 06/29/26(Mon)21:08:55 No.109165374

>>109165352
share source?

Anonymous
06/29/26(Mon)21:09:56 No.109165378

Anonymous 06/29/26(Mon)21:09:56 No.109165378

>>109165372
it's FUD. do you really think chub cloud users would come in here, let alone know this place even exists?

Anonymous
06/29/26(Mon)21:10:54 No.109165386

Anonymous 06/29/26(Mon)21:10:54 No.109165386

i hate being poor. i either run qwen3.6-35b-a3b-IQ4_NL at 70t/s or run Q4_K_P at 33t/s

Anonymous
06/29/26(Mon)21:13:52 No.109165399

Anonymous 06/29/26(Mon)21:13:52 No.109165399

>be me, pentester
>wanna try local models after not touching the ones I had been using for months
>give nemo instruct 2407 q4 a try again, ask a question about my job (something I had not done before)
>gives me good info on first try
>download q5
>even better, gives specific info
>download Gemma 4 Q4
>it's not that good and runs slower in my shitty hardware, even with reasoning disabled
Why does this happen? Is it the temp maybe?

Anonymous
06/29/26(Mon)21:16:10 No.109165408

Anonymous 06/29/26(Mon)21:16:10 No.109165408

>>109165399
gemma 4 what?

Anonymous
06/29/26(Mon)21:18:41 No.109165419

Anonymous 06/29/26(Mon)21:18:41 No.109165419

>>109165408
gemmaballs

Anonymous
06/29/26(Mon)21:19:58 No.109165426

Anonymous 06/29/26(Mon)21:19:58 No.109165426

>>109165339
>>109165371
hmm interesting
>>109165347
ill look into it thanks
>>109165362
Sorry, they are both gemma4 k-quant ggufs.

I suppose i need to learn alot more about these things, i just started using them very recently.

Anonymous
06/29/26(Mon)21:21:16 No.109165434

Anonymous 06/29/26(Mon)21:21:16 No.109165434

>>109165408
31b it-q4km

Anonymous
06/29/26(Mon)21:22:00 No.109165441

Anonymous 06/29/26(Mon)21:22:00 No.109165441

>>109165369
Yeah. Half of it, at least. My two monitors aren't the same size and I was just doing a quick snipping tool grab for the post.

>>109165374
I still got a few more features I want to add before I share. I'm just enjoying the fruits of this morning's labors now that I'm home again.

Anonymous
06/29/26(Mon)21:24:59 No.109165453

Anonymous 06/29/26(Mon)21:24:59 No.109165453

>>109165434
bait.

Anonymous
06/29/26(Mon)21:29:06 No.109165466

Anonymous 06/29/26(Mon)21:29:06 No.109165466

>>109165453
?
I just googled Gemma 4 gguf and downloaded whatever I could find that might work in my laptop. And it does, at 5t/s...

Anonymous
06/29/26(Mon)21:36:42 No.109165498

Anonymous 06/29/26(Mon)21:36:42 No.109165498

>>109165466
you get what you deserve

Anonymous
06/29/26(Mon)21:41:15 No.109165522

Anonymous 06/29/26(Mon)21:41:15 No.109165522

>>109161944
Tried this, great dialogue, etc. Super autistic, though. This gives me hope, it's possible to exorcise the post-training slop from the model somewhat cheap.
What the hell is Drummer doing anyway? Just finetuning on the same dataset?

Anonymous
06/29/26(Mon)21:42:51 No.109165531

Anonymous 06/29/26(Mon)21:42:51 No.109165531

>>109165498
?? What's your problem?
I'm just testing shit, and I asked because I'm curious. I don't have money for better hardware, at least not for now.

Anonymous
06/29/26(Mon)21:47:52 No.109165563

Anonymous 06/29/26(Mon)21:47:52 No.109165563

>>109165241
Because it makes a big difference which quants the shared experts, tokenizer, and attention heads are on when assessing prose or phrasing.
>>109165115
Yes.

Anonymous
06/29/26(Mon)21:51:47 No.109165581

Anonymous 06/29/26(Mon)21:51:47 No.109165581

>>109165112
What quant and how are you structuring your tool calls?
How does it fuck them up? is it a repeating pattern/same thing each time?

Anonymous
06/29/26(Mon)21:53:34 No.109165594

Anonymous 06/29/26(Mon)21:53:34 No.109165594

>>109165399
i'll bite, what was the question you asked it/something similar

Anonymous
06/29/26(Mon)21:55:16 No.109165605

Anonymous 06/29/26(Mon)21:55:16 No.109165605

>>109165522
>What the hell is Drummer doing anyway? Just finetuning on the same dataset?
It was pretty obvious by how all his models feel the same no matter what the base is

Anonymous
06/29/26(Mon)22:04:59 No.109165658

Anonymous 06/29/26(Mon)22:04:59 No.109165658

File: image.png (1.52 MB, 883x1170)

1.52 MB PNG

>>109164319
>Kimi recap anon has abandoned us.
My Kimi Rig is busy finetuning

Anonymous
06/29/26(Mon)22:05:34 No.109165661

Anonymous 06/29/26(Mon)22:05:34 No.109165661

>>109165594
>>109165399
Nevermind
I changed the temp and asked Gemma the same question in a more explicit way, and it gave me more or less the same info, plus some more.
I also realized that a word I had used was kinda wrong.
Of course both models hallucinated CVE ID's jej
Still, it was an interesting experiment. Guess I'll use both models next time.

Anonymous
06/29/26(Mon)22:12:45 No.109165698

Anonymous 06/29/26(Mon)22:12:45 No.109165698

>>109165441
sharing source is worth recapping, not some random screenshot

Anonymous
06/29/26(Mon)22:14:56 No.109165704

Anonymous 06/29/26(Mon)22:14:56 No.109165704

I'm using SillyTavern, is there some sort of guide on proper Scenario writing and example dialogue or how to format it? I'm just going blind and I don't think it's working or really helping.

Anonymous
06/29/26(Mon)22:16:36 No.109165710

Anonymous 06/29/26(Mon)22:16:36 No.109165710

File: SMILE3.png (1.31 MB, 928x1271)

1.31 MB PNG

Anonymous
06/29/26(Mon)22:17:40 No.109165714

Anonymous 06/29/26(Mon)22:17:40 No.109165714

>>109165658
>finetuning
Do tell. My Kimi rig isn't doing anything that interesting

Anonymous
06/29/26(Mon)22:24:05 No.109165732

Anonymous 06/29/26(Mon)22:24:05 No.109165732

>>109165661
Gemma should absolutely mog nemo. there must be something wrong with your sampling.
recommended defaults are:
temp 1
top_k 64
min_p 0.95

That's it.

Anonymous
06/29/26(Mon)22:25:50 No.109165742

Anonymous 06/29/26(Mon)22:25:50 No.109165742

Just ordered a PLX88096 and an expansion board off the chinese, I definitely won't regret this in my attempts to go from 4 cards to 8 cards and not run like absolute shit.

Anonymous
06/29/26(Mon)22:30:24 No.109165753

Anonymous 06/29/26(Mon)22:30:24 No.109165753

GLM 5.2 at 1t/s.
Gets the job done...eventually
good thing it can one-shot almost anything

Anonymous
06/29/26(Mon)22:34:40 No.109165773

Anonymous 06/29/26(Mon)22:34:40 No.109165773

>>109164247
this means your miku is very stressed and should be taken to a vet immediately.

Anonymous
06/29/26(Mon)22:42:57 No.109165810

Anonymous 06/29/26(Mon)22:42:57 No.109165810

>>109165732
will these work fine with gemma4 12b ?

Anonymous
06/29/26(Mon)22:44:21 No.109165815

Anonymous 06/29/26(Mon)22:44:21 No.109165815

>>109165531
He is grumpy that Gemma couldn't help him fix his vibecoded frontend. He will be better tomorrow.

Anonymous
06/29/26(Mon)22:50:06 No.109165843

Anonymous 06/29/26(Mon)22:50:06 No.109165843

>>109165704
Ask the AI or check the official documentation website. It's on their github.

Anonymous
06/29/26(Mon)22:54:19 No.109165866

Anonymous 06/29/26(Mon)22:54:19 No.109165866

>>109165704
Example dialogue in ST is huge jank that's a bad adaptation of how c.ai used to do it in 2022. It expects a <START> and you to denote every line with "{{char]}:" or it doesn't even make it into the prompt that the model sees.
You're usually better off skipping it altogether and include the examples in the actual description.

Anonymous
06/29/26(Mon)22:57:08 No.109165881

Anonymous 06/29/26(Mon)22:57:08 No.109165881

>>109165658
New project, fellow Kimibro?
>>109165753
Unless you're on ewaste, you can probably optimize that.
>>109165704
If you're going to learn a frontend, learn Marinara. It's equally bloated but overall more capable and hasn't (yet) been part of a credential stealing attack.

Anonymous
06/29/26(Mon)22:58:35 No.109165886

Anonymous 06/29/26(Mon)22:58:35 No.109165886

>>109164628
>>109165710
I'm not sure which is more repulsive.

Anonymous
06/29/26(Mon)23:00:30 No.109165892

Anonymous 06/29/26(Mon)23:00:30 No.109165892

>>109165810
Gemma 4 12b isn't real Gemma 4 and shouldn't be used unless you're running on a graphing calculator. Use the 26b MoE instead if you can't run full 31b.

Anonymous
06/29/26(Mon)23:05:37 No.109165910

Anonymous 06/29/26(Mon)23:05:37 No.109165910

>>109165886
the "person" who insists on putting these things in front of my eyeballs daily is the most repulsive of all.

Anonymous
06/29/26(Mon)23:06:56 No.109165921

Anonymous 06/29/26(Mon)23:06:56 No.109165921

>>109165910
Easily filterable filenames.

Anonymous
06/29/26(Mon)23:14:18 No.109165952

Anonymous 06/29/26(Mon)23:14:18 No.109165952

Anyone using models for stuff other than RP, how do you prevent your bots from fucking with important files? Or do you just roll the dice?

Anonymous
06/29/26(Mon)23:16:09 No.109165963

Anonymous 06/29/26(Mon)23:16:09 No.109165963

>>109165714
>Do tell. My Kimi rig isn't doing anything that interesting
It's a training pipeline built on ggml, so I can finetune Kimi locally.
I've been working on it, on and off for nearly a year.
It's all bespoke/hacky for now and inference requires my custom llama.cpp patches so not sure how accessible it would be were I to publish it.
Some of the patches are fixing actual bugs in llama.cpp, but most people wouldn't notice minor calculation errors during regular inference and it looks like a real effort to get PRs in even for more useful fixes, especially since I'd be a rando with no git history.
If it works and I confirm it's not a schitzo psychosis situation (like the guy who thought he distilled glm-4.6 into glm-4.5-air), I'll make a burner HF and post some models and the inference patches.

Anonymous
06/29/26(Mon)23:19:44 No.109165976

Anonymous 06/29/26(Mon)23:19:44 No.109165976

>>109165952
Either you put your runner in your favorite cuckbox, be it a separate user, permission gating or sandboxing, or you roll the dice. There are some harnesses that abort the operation if its outside of pwd but that still implies rolling a dice. Personally I just made a new user and handled it through permissions (linux)

Anonymous
06/29/26(Mon)23:19:59 No.109165981

Anonymous 06/29/26(Mon)23:19:59 No.109165981

what's the minimal hardware to run kimi 2.7 locally? just curious how expensive it is

Anonymous
06/29/26(Mon)23:22:52 No.109165985

Anonymous 06/29/26(Mon)23:22:52 No.109165985

File: download - 2025-05-23T224(...).jpg (177 KB, 512x768)

177 KB JPG

>>109164628
look mommy i made the post again. look look I did it I spammed the gay white with the gay black and larp its trans thread time. praise me mommy I autism posted it again.
lol,lmao faggots, faggots everywhere and transloopys larping

Anonymous
06/29/26(Mon)23:24:08 No.109165989

Anonymous 06/29/26(Mon)23:24:08 No.109165989

>>109165910
>the "person" who insists on putting these things in front of my eyeballs daily is the most repulsive of all.
That's the only one I've done. I pinched the pic from the last thread.
>>109165921
>Easily filterable filenames.
Stop mentioning that or the obsessed 'culture' schitzo will start obfuscating.

Anonymous
06/29/26(Mon)23:25:53 No.109165996

Anonymous 06/29/26(Mon)23:25:53 No.109165996

/lmg/ was right. vibecoding your own front-end is great

Anonymous
06/29/26(Mon)23:29:59 No.109166010

Anonymous 06/29/26(Mon)23:29:59 No.109166010

>>109165981
>what's the minimal hardware to run kimi 2.7 locally? just curious how expensive it is
If you can't run it at Q4, I wouldn't buy a rig exclusively for her. She really doesn't quant well at all.
Bare minimum is IQ2_KL with ik_llama.cpp or this specific quant for mainline: https://huggingface.co/AesSedai/Kimi-K2.7-Code-GGUF/tree/main/IQ3_S

Anonymous
06/29/26(Mon)23:40:26 No.109166050

Anonymous 06/29/26(Mon)23:40:26 No.109166050

>>109165963
>it looks like a real effort to get PRs in even for more useful fixes
aka, if you're not a Nvidia engineer or already in the sekrit club, get fucked.

>>109165981
What speed and quant? Really depends on that.
> Q3 (464GB)
Probably 8x64GB of DDR4-3200 ($4000) and an EPYC Rome/Milan motherboard ($1200 combined). GPU very strongly encouraged but it doesn't have to be huge (a 5070 Ti is ~$1000, that might be enough).
> Q8 (584GB)
8x64GB DDR4-3200, EPYC Rome/Milan + motherboard, and 96GB of GPUs (anywhere from $1300 for 3 V620s, $4000 for 4 3090s or 3 R9700s, $12000 or whatever the fuck it is today for a 6000 Blackwell)
You can downgrade the memory from 3200 to a lower frequency to save money (e.g. 2400 is around $1600 instead of $4000, but your speed will be cut down by 25%).

Anonymous
06/29/26(Mon)23:40:51 No.109166052

Anonymous 06/29/26(Mon)23:40:51 No.109166052

>>109165952
Use controls around tools, don't let your llm have unrestricted code execution/bash/scripting access outside a sandbox/without review first.
I've been working on an MCP that allows you to set hooks/RBAC profiles on tools/groups thereof with progressive disclosure, all relying on the profile in use. So you can, for example, let an agent write to a specific folder/file path, but nowhere else(deny-by-default-no-prompt), or prompt-to-allow(deny-but-ask), while another profile might allow you write/read to a different set of folders/files, and access a different set of tools with different permissions. (Differing tool groups with progressive disclosure depending on profile in use with each profile having its own RBACs for each tool)
It's an attempt at constraining agents while being harness/front-end agnostic. It's a WIP, and can't honestly say it'll work on your machine(yet), but its an approach I'm exploring.
https://github.com/rmusser01/tldw_server/tree/dev/apps/mcp-unified
Doesn't solve an agent having full code execution, but it is a means of constraining what tools are (and how they're made) available to your agents in the hopes of limiting the potential blast radius when they go crazy.

Anonymous
06/29/26(Mon)23:41:59 No.109166057

Anonymous 06/29/26(Mon)23:41:59 No.109166057

>>109165981
If you want to run schizoquanted Kimi, run K2, not K2.7 and don't use her for technical tasks like >>109166010 implied because accuracy is low with quants, but schizokino is excellent. If you want a quanted megamodel for oneshotting software, Deepsex Pro or GLM 5.2 are your answers.

Anonymous
06/29/26(Mon)23:46:44 No.109166077

Anonymous 06/29/26(Mon)23:46:44 No.109166077

>https://huggingface.co/Goldkoron/MiniMax-M2.7
anyone tried this K_G quants? legit?

Anonymous
06/29/26(Mon)23:47:25 No.109166082

Anonymous 06/29/26(Mon)23:47:25 No.109166082

>>109165976
>>109166052
Thanks anons. There's some important stuff on here so I can't roll the dice, and I'm paranoid as hell about it getting deleted. Gonna run with a different user to be cautious.

Anonymous
06/29/26(Mon)23:52:55 No.109166103

Anonymous 06/29/26(Mon)23:52:55 No.109166103

>>109165952
i just have a review step since i'm not doing full memegentic and want it to be able to do annoying admin things like systemd/udev edits.

Anonymous
06/29/26(Mon)23:53:42 No.109166107

Anonymous 06/29/26(Mon)23:53:42 No.109166107

Gemma-4

TOKEN           | LOGPROB    | PROBABILITY
---------------------------------------------
' length'       | -1.1444    | 31.84%
' hardness'     | -1.2141    | 29.70%
' most'         | -2.3087    | 9.94%
'...'           | -2.4369    | 8.74%
' lower'        | -2.4534    | 8.60%
' hardening'    | -4.0431    | 1.75%
'…'             | -4.3779    | 1.26%
' arousal'      | -4.3942    | 1.23%
' heat'         | -4.6409    | 0.96%
' member'       | -5.3547    | 0.47%

Gemma-4-depurpled trial 98

TOKEN           | LOGPROB    | PROBABILITY
---------------------------------------------
' skin'         | -0.7030    | 49.51%
' lower'        | -1.4041    | 24.56%
' length'       | -2.6531    | 7.04%
' stomach'      | -2.9409    | 5.28%
' hip'          | -2.9628    | 5.17%
' mid'          | -4.4717    | 1.14%
' underwear'    | -4.6728    | 0.93%
' hips'         | -4.6975    | 0.91%
' chest'        | -4.8337    | 0.80%
' waist'        | -5.4277    | 0.44%

Anonymous
06/29/26(Mon)23:56:57 No.109166118

Anonymous 06/29/26(Mon)23:56:57 No.109166118

>>109166057
>implied
Where exactly did I imply this?
What retard would run a 1T model on CPU for technical tasks with 150t/s prefill when we have codemaxxed models that fit in vram?

Anonymous
06/29/26(Mon)23:57:48 No.109166120

Anonymous 06/29/26(Mon)23:57:48 No.109166120

>>109165881
>Unless you're on ewaste, you can probably optimize that.
cursed irremediable ewaste. dual socket 4-channel xeon w/512GB ddr-2400 no gpu running a Q4 of glm 5.2...I'm basically where I should be
But running is running and desktop fags stuck with 128gb if they're lucky are doing this at 0t/s so no regrets

Anonymous
06/30/26(Tue)00:01:51 No.109166132

Anonymous 06/30/26(Tue)00:01:51 No.109166132

>>109165952
I'm running shit in container. I only put copy of stuff in a shared workspace or if it's a coding project, my agent is working like a contributor to a git repo and is opening merge request that I check or have another agent check. In case of a proper git project, I'm the one with the final say in whether it get merged or not. If it's some shit like some config or the like, I just diff and merge manually with the original.

Anonymous
06/30/26(Tue)00:02:31 No.109166135

Anonymous 06/30/26(Tue)00:02:31 No.109166135

>>109166107
Oh no no no

Anonymous
06/30/26(Tue)00:04:50 No.109166142

Anonymous 06/30/26(Tue)00:04:50 No.109166142

>>109165963
>not sure how accessible it would be were I to publish it.
>>109165963
>I'll make a burner HF and post some models and the inference patches.
Not all heroes wear capes
I've done PRs on behalf of other anons before (with attribution if you want). make a burner github while you're at it and I'll comb through your branch. I've got contributor status on the project on a few of my own burner github accounts so it should be smoother for me.

Anonymous
06/30/26(Tue)00:08:43 No.109166155

Anonymous 06/30/26(Tue)00:08:43 No.109166155

>>109166118
When you say she doesn't quant well at all, but the outfits are still coherent despite the low accuracy to the original model, the implication is that's mainly bad for technical tasks or reasoning heavy ones, but one could still derive enjoyment from it for other means.

Anonymous
06/30/26(Tue)00:12:29 No.109166171

Anonymous 06/30/26(Tue)00:12:29 No.109166171

>>109165963
>(like the guy who thought he distilled glm-4.6 into glm-4.5-air)
qrd?

Anonymous
06/30/26(Tue)00:30:45 No.109166258

Anonymous 06/30/26(Tue)00:30:45 No.109166258

>>109166107
Yep that's about what I expected. I realized the de-euphemism strength was too low half way into the run. So it only banishes euphemisms like arousal and hardness, but not strong enough to push into vulgarity. It can go super vulgar as I tested at full strength on the E4B. Hope anons will continue the work for me when I release the repo.
>t. depurple anon

Anonymous
06/30/26(Tue)00:50:54 No.109166318

Anonymous 06/30/26(Tue)00:50:54 No.109166318

>>109166258
Release Cunn-E4B.

Anonymous
06/30/26(Tue)01:01:08 No.109166354

Anonymous 06/30/26(Tue)01:01:08 No.109166354

>>109165892
Gemma 26b MoE isn't real Gemma 4 and shouldn't be used unless you're running on a graphing calculator. Use the 31B instead.

Anonymous
06/30/26(Tue)01:02:51 No.109166364

Anonymous 06/30/26(Tue)01:02:51 No.109166364

>>109166354
You're correct, but 26b is way better cope than 12b.

Anonymous
06/30/26(Tue)01:07:52 No.109166383

Anonymous 06/30/26(Tue)01:07:52 No.109166383

>>109166364
it's worse than 12b tho

Anonymous
06/30/26(Tue)01:18:36 No.109166426

Anonymous 06/30/26(Tue)01:18:36 No.109166426

>>109166383
never

Anonymous
06/30/26(Tue)01:19:50 No.109166433

Anonymous 06/30/26(Tue)01:19:50 No.109166433

>>109166383
lol

Anonymous
06/30/26(Tue)01:29:15 No.109166475

Anonymous 06/30/26(Tue)01:29:15 No.109166475

Which release and quant are local GLM chads running at these days? I used to do the IQ2_smol one a while back and enjoyed it, but it was a bit slow. Have there been marginal improvements in the last 6 months?

I have a 4090 and 128gb of DDR4, was getting about 5tk/s.

Anonymous
06/30/26(Tue)01:41:55 No.109166529

Anonymous 06/30/26(Tue)01:41:55 No.109166529

>>109166364
hasn't been my experience, but i only bother trying fake gemmas for moldymodal stuff.

Anonymous
06/30/26(Tue)01:45:35 No.109166537

Anonymous 06/30/26(Tue)01:45:35 No.109166537

File: Meta-Wins-Landmark-Case-B(...).jpg (283 KB, 2400x1350)

283 KB JPG

are they still relevant? or do they want to compete at all?

Anonymous
06/30/26(Tue)02:01:57 No.109166594

Anonymous 06/30/26(Tue)02:01:57 No.109166594

>>109166537
Late last year Meta was hiring vfx artists etc to work on some new model but don't know what happened since or if it was a dud. This information is all from the internet don't know anyone personally.

Anonymous
06/30/26(Tue)02:06:48 No.109166611

Anonymous 06/30/26(Tue)02:06:48 No.109166611

Apparently, the DeepSeek-V4 implementation in llama.cpp does not suppor quantized KV yet. Gives me gibbrish

Anonymous
06/30/26(Tue)02:08:39 No.109166615

Anonymous 06/30/26(Tue)02:08:39 No.109166615

I tried the OPENVINO llama.cpp build to use the NPU in my system, and it runs slower than using 8 threads, and that's when it runs at all... Wasted a couple of hours for nothing

Good night, /lmg/

Anonymous
06/30/26(Tue)02:10:02 No.109166618

Anonymous 06/30/26(Tue)02:10:02 No.109166618

>>109166475
>used to do the IQ2_smol one
>I have a 4090 and 128gb of DDR4, was getting about 5tk/s.
so 152GB, smol_iq2ks is the best you can do then
and only the 4.x series since even IQ1_KT is 168GB for the 5.x series

Anonymous
06/30/26(Tue)02:11:03 No.109166620

Anonymous 06/30/26(Tue)02:11:03 No.109166620

>>109166615
>Wasted a couple of hours for nothing
that sums up sycl/ipex/openvino perfectly

Anonymous
06/30/26(Tue)02:14:34 No.109166632

Anonymous 06/30/26(Tue)02:14:34 No.109166632

>>109166475
If you're talking about GLM 5.2, don't use IQ2_XXS when IQ2_M is basically the same size and has a less raped tokenizer. Better yet, use _XL if you're able to. Unsloth is a vantablack niggerfaggot and the XL quant could be even better if the shared experts, attention head, and tokenizer were Q8 for nearly no increase in filesize, but this is sadly all that's available unless you quant it yourself.
https://huggingface.co/Deviad/GLM-5.2-mixed-IQ2S-experts-IQ4NL-rest
This is also very good for a Q2 functionally and quite a bit faster than mixed inference because of the IQ4NL layers being faster than the usual dynamic quant alternatives.

Anonymous
06/30/26(Tue)02:16:12 No.109166639

Anonymous 06/30/26(Tue)02:16:12 No.109166639

>>109166475
>>109166632 (me)
My eyes glazed over your specs sorry anon.

Anonymous
06/30/26(Tue)02:17:40 No.109166641

Anonymous 06/30/26(Tue)02:17:40 No.109166641

>The implementation is now professional and flexible.

Anonymous
06/30/26(Tue)02:17:42 No.109166642

Anonymous 06/30/26(Tue)02:17:42 No.109166642

>>109166615
>>109166620
shit I bought intel GPUs and it's an awful experience, for the GPU side of things it's VLLM or nothing right now as llama runs like shit unless you want to run a tiny model on a single card at subpar pp/tg

Anonymous
06/30/26(Tue)02:34:47 No.109166711

Anonymous 06/30/26(Tue)02:34:47 No.109166711

Which is the better small model to use to autistically translate a whole bunch of detailed implementation plans into actual code, Qwen or Gemma? There are so many psyops flying around I don't know what to believe. Qwen is the better coder, but that's because it is less autistic so it can paper over lapses in prompting, which really shouldn't be the case here? I'm talking about the MoEs (for speed), but am also quite interested in the dense models too.

Anonymous
06/30/26(Tue)02:40:41 No.109166739

Anonymous 06/30/26(Tue)02:40:41 No.109166739

>>109166642
>shit I bought intel GPUs and it's an awful experience
And everyone kept saying that Intel support is better than AMD.

Anonymous
06/30/26(Tue)02:49:38 No.109166773

Anonymous 06/30/26(Tue)02:49:38 No.109166773

>>109166611
Oh, cool, so it works perfectly fine if you don't use quanted KV then?

Anonymous
06/30/26(Tue)02:54:14 No.109166789

Anonymous 06/30/26(Tue)02:54:14 No.109166789

>>109166537
The LLM training data lawsuits are still ongoing; I wonder if that's a factor.

Anonymous
06/30/26(Tue)02:55:18 No.109166794

Anonymous 06/30/26(Tue)02:55:18 No.109166794

>>109166642
>llama runs like shit unless you want to run a tiny model on a single card at subpar pp/tg
Don't get your hopes up too much, but check this out if you haven't already: https://github.com/SearchSavior/OpenArc.
Not sure if they got tensor parallel working yet, but for a single card, pp was like 5x faster than llama.cpp last time I used it.

Anonymous
06/30/26(Tue)02:55:32 No.109166797

Anonymous 06/30/26(Tue)02:55:32 No.109166797

>good models are around 100gb
anon I only have a 5070ti and I'm done with gemma. what would be my upgrade path?

Anonymous
06/30/26(Tue)02:56:01 No.109166800

Anonymous 06/30/26(Tue)02:56:01 No.109166800

>>109166773
>RTX 3090 + 512gb
Yes, I finally got it output some good stuff

Still playing with params. More than 32k context is possible. Need more time to test

4.5 t/s

Anonymous
06/30/26(Tue)02:56:05 No.109166801

Anonymous 06/30/26(Tue)02:56:05 No.109166801

>>109166739
I'd buy them again before I even considered AMD and their overpriced cards

Anonymous
06/30/26(Tue)03:00:22 No.109166816

Anonymous 06/30/26(Tue)03:00:22 No.109166816

>below 15 t/s
why bother
such a waste of time

Anonymous
06/30/26(Tue)03:03:58 No.109166831

Anonymous 06/30/26(Tue)03:03:58 No.109166831

>>109166816
Are you from 2023?
This is what they said then about local

Anonymous
06/30/26(Tue)03:07:04 No.109166846

Anonymous 06/30/26(Tue)03:07:04 No.109166846

>>109166831
Now we have reasoning and agentic use, and 15 tokens/s is just not enough.

Anonymous
06/30/26(Tue)03:07:51 No.109166850

Anonymous 06/30/26(Tue)03:07:51 No.109166850

>>109166797
dgx spark

Anonymous
06/30/26(Tue)03:09:31 No.109166861

Anonymous 06/30/26(Tue)03:09:31 No.109166861

>>109166846
Well that’s just, like, your _opinion_, maaan

Anonymous
06/30/26(Tue)03:14:42 No.109166877

Anonymous 06/30/26(Tue)03:14:42 No.109166877

>>109166632
No worries. I was just wondering if there were any tweaks such as QAT or MTP or any other magic appended to GLM 4.X since I've been off of it that I should be aware of.

Anonymous
06/30/26(Tue)03:17:40 No.109166884

Anonymous 06/30/26(Tue)03:17:40 No.109166884

>>109166846
>15 tokens/s is just not enough
>like cars
>100 mph is not enough
Makes no sense if there is no task to do

If you have it running at 15+ t/s, and you are still waiting for a response, THIS is the real waste of time.

We got top-notch models to run locally in the basement in the night when power is cheaper. It makes a lot of sense for sensitive data of a company

Anonymous
06/30/26(Tue)03:23:25 No.109166908

Anonymous 06/30/26(Tue)03:23:25 No.109166908

>>109166884
yeah if your task is meaningless benchmarks

Anonymous
06/30/26(Tue)03:28:27 No.109166922

Anonymous 06/30/26(Tue)03:28:27 No.109166922

>tfw trying to reprompt my request and kept getting refused so I just argued with gemma and called her an idiot until she admitted she was being stupid

https://files.catbox.moe/qurz9j.txt

Anonymous
06/30/26(Tue)03:28:29 No.109166923

Anonymous 06/30/26(Tue)03:28:29 No.109166923

>>109166877
There's MTP already built-in to the model. You won't gain anything using it on that hardware but if you want to test it, make sure you use the ik_llama cli flag to re-quantize the mtp tensors to q4_ks or whatever on the fly, since Ubergam left the mtp tensors at q8_0

Anonymous
06/30/26(Tue)03:28:32 No.109166925

Anonymous 06/30/26(Tue)03:28:32 No.109166925

>>109166908
Post your task which you can't run on deepseek API for dirt cheap

Anonymous
06/30/26(Tue)03:29:15 No.109166928

Anonymous 06/30/26(Tue)03:29:15 No.109166928

>>109166107
>>109166258
Brainlet here, can't most of this just be mitigated with a system prompt? Why do it this way?

Anonymous
06/30/26(Tue)03:35:08 No.109166943

Anonymous 06/30/26(Tue)03:35:08 No.109166943

>>109166922
The problem is that not even Gemma is capable of keeping full focus on the system prompt indefinitely as the conversation length increases. Is this 26B or 12B? Are you quantizing the KV cache?

Anonymous
06/30/26(Tue)03:37:16 No.109166954

Anonymous 06/30/26(Tue)03:37:16 No.109166954

>>109166943
26b E4B QAT with MTP. System prompt should be injected in every message. And yeah it's set to Q4_0 because Q8_0 kept crying and rejecting MTP.

Anonymous
06/30/26(Tue)03:38:37 No.109166958

Anonymous 06/30/26(Tue)03:38:37 No.109166958

>>109166928
>Why do it this way?
I'm not the original cockbench anon with the mikupad screenshots, but I use his test to test for pretraining filtering or RLHF safety training.
>can't most of this just be mitigated with a system prompt?
Refusals and instruction following can to some extent. Voice of the model is not easy to steer with system prompts.
The system prompt gets diluted as the context grows, and it's a waste of reasoning tokens having the model autist it's way through all the instructions.
Purple Anon's technique clearly got rid of a lot of the purple prose and '...' bullshit.
For me personally, I almost never system prompt the writing style, I prefer to download control-vectors and re-scale them on the fly.
Unfortunately the Gemma-4 control-vectors on huggingface don't seem to work with this ablated model, likely he's completely shifted most of those concepts.

Anonymous
06/30/26(Tue)03:39:11 No.109166960

Anonymous 06/30/26(Tue)03:39:11 No.109166960

>>109166954
Q4_K_M Bartowski is significantly better than Google's base QAT.

Anonymous
06/30/26(Tue)03:39:22 No.109166961

Anonymous 06/30/26(Tue)03:39:22 No.109166961

should I seriously consider apple silicon for models above 100gb?

Anonymous
06/30/26(Tue)03:43:44 No.109166979

Anonymous 06/30/26(Tue)03:43:44 No.109166979

>>109165892
26b moe has issues that 12b hasnt had for me. Im going to try 31b even though its too big and see how slow it is. I think i need to look into sampling more to finetune these things better aswell, any suggestions welcome

Anonymous
06/30/26(Tue)03:50:07 No.109167008

Anonymous 06/30/26(Tue)03:50:07 No.109167008

fellow memetune watchers
any interesting ones?
especially those trying to run an another round of post-pretrain run of some kind, not the rp tunes

Anonymous
06/30/26(Tue)03:51:50 No.109167015

Anonymous 06/30/26(Tue)03:51:50 No.109167015

What is /a/non's prefered model for uncensored RP that fits in 16 GB VRAM?

Anonymous
06/30/26(Tue)03:59:21 No.109167052

Anonymous 06/30/26(Tue)03:59:21 No.109167052

>>109166960
>Q4_K_M Bartowski

What about unsloth?

Anonymous
06/30/26(Tue)04:01:43 No.109167064

Anonymous 06/30/26(Tue)04:01:43 No.109167064

>>109167052
>What about unsloth?
For that specific model, they're actually the best...

Anonymous
06/30/26(Tue)04:32:30 No.109167178

Anonymous 06/30/26(Tue)04:32:30 No.109167178

>>109166928
A secondary effect for de-euphemism was if you put instructions to be vulgar or terse in the system prompt, it would have double the effectiveness.

Anonymous
06/30/26(Tue)04:47:52 No.109167227

Anonymous 06/30/26(Tue)04:47:52 No.109167227

Why are GLM ggufs split up into 9 different files? Isn't the point supposed to be that it's just one file? How do I even load up 9 different ggufs in llama.cpp? What the fuck man.

Anonymous
06/30/26(Tue)04:50:47 No.109167236

Anonymous 06/30/26(Tue)04:50:47 No.109167236

>>109167227
Nevermind I guess it's a huggingface issue because there are instructions to use llama-split to merge them all into one. Weird, but okay.

Anonymous
06/30/26(Tue)04:53:12 No.109167247

Anonymous 06/30/26(Tue)04:53:12 No.109167247

Just run kimi-chan on your ssd

Anonymous
06/30/26(Tue)04:53:16 No.109167249

Anonymous 06/30/26(Tue)04:53:16 No.109167249

>>109167227
>Why are GLM ggufs split up into 9 different files? Isn't the point supposed to be that it's just one file?
It's actually better if they are split by having the metadata in the first one as few mb and the rest in the others but not everyone does this.
>How do I even load up 9 different ggufs in llama.cpp
You load the first one and the rest will load if they are numbered properly (00001-of-00004.gguf)

Anonymous
06/30/26(Tue)04:57:23 No.109167264

Anonymous 06/30/26(Tue)04:57:23 No.109167264

>there's no more human data left to train
What is this meme? There's so much shit that has never been scanned.

Anonymous
06/30/26(Tue)05:00:44 No.109167275

Anonymous 06/30/26(Tue)05:00:44 No.109167275

>>109167264
There's no more data that can be scraped cheaply off of the internet.

Anonymous
06/30/26(Tue)05:00:54 No.109167276

Anonymous 06/30/26(Tue)05:00:54 No.109167276

File: dclmpool.png (353 KB, 1903x848)

353 KB PNG

>>109167264
Only 1% of the original data or so makes it into pretraining after filtering, at least for general web data.

Anonymous
06/30/26(Tue)05:04:43 No.109167290

Anonymous 06/30/26(Tue)05:04:43 No.109167290

give it to me straight, if I double my vram from 96 to 192 is there even something I can fit or would I still be using gemma-4-31b while coping daily

Anonymous
06/30/26(Tue)05:06:16 No.109167296

Anonymous 06/30/26(Tue)05:06:16 No.109167296

/lmg/ general knowledge series:
https://www.youtube.com/watch?v=Y-o545eYjXM
sorry for youtubeposting but it really is a nice consice video about GQA/MLA/DSA

Anonymous
06/30/26(Tue)05:19:48 No.109167335

Anonymous 06/30/26(Tue)05:19:48 No.109167335

please... wont someone please crack continual learning already... fuck scaling

Anonymous
06/30/26(Tue)05:26:11 No.109167353

Anonymous 06/30/26(Tue)05:26:11 No.109167353

>>109165985
look Jart... just man up and stop pretending to be a woman.

Anonymous
06/30/26(Tue)05:27:21 No.109167358

Anonymous 06/30/26(Tue)05:27:21 No.109167358

>>109165150
>when they stay over.
Huh? I thought that only happened in movies.

Anonymous
06/30/26(Tue)05:28:01 No.109167359

Anonymous 06/30/26(Tue)05:28:01 No.109167359

>>109167358
...you send your girlfriend back to her house after sex?

Anonymous
06/30/26(Tue)05:29:48 No.109167365

Anonymous 06/30/26(Tue)05:29:48 No.109167365

>>109167296
It really did. I was using ds4 flash yesterday and whenever I pressed reroll half of the message generation was PP. Super efficient.

why can't the ds4 support not be trash...

Anonymous
06/30/26(Tue)05:35:28 No.109167384

Anonymous 06/30/26(Tue)05:35:28 No.109167384

>>109167296
>"efficient"
>chink shilling sparse
reminder that sparseshit and chinkslop moes killed this hobby.

Anonymous
06/30/26(Tue)05:36:30 No.109167391

Anonymous 06/30/26(Tue)05:36:30 No.109167391

>>109167365
>why can't the ds4 support not be trash...
Give it two more week, bruh
Trust the plan, bruh

Anonymous
06/30/26(Tue)05:39:05 No.109167400

Anonymous 06/30/26(Tue)05:39:05 No.109167400

>>109167359
>

Anonymous
06/30/26(Tue)05:39:28 No.109167402

Anonymous 06/30/26(Tue)05:39:28 No.109167402

>>109167290
Largestral finetunes at q8 will hit you like crack

Anonymous
06/30/26(Tue)05:40:00 No.109167405

Anonymous 06/30/26(Tue)05:40:00 No.109167405

is quad v620 worth it?

Anonymous
06/30/26(Tue)05:40:43 No.109167410

Anonymous 06/30/26(Tue)05:40:43 No.109167410

File: Implying.gif (2.74 MB, 640x292)

2.74 MB GIF

>>109167400

Anonymous
06/30/26(Tue)05:41:39 No.109167416

Anonymous 06/30/26(Tue)05:41:39 No.109167416

>>109167384
more like it is the reason why this hobby can even exist at all
the real thing is safety and alignment, sneaking literal garbage in during the train run

Anonymous
06/30/26(Tue)05:42:39 No.109167421

Anonymous 06/30/26(Tue)05:42:39 No.109167421

>>109167359
>girlfriend
>having sex
Mmm, yes? they usually make noise and others use makeup.

Anonymous
06/30/26(Tue)05:53:07 No.109167457

Anonymous 06/30/26(Tue)05:53:07 No.109167457

>>109167402
Which ones do you prefer? Are they reasonably different from the original?
2407, 2411 or 2512?

Anonymous
06/30/26(Tue)06:01:34 No.109167480

Anonymous 06/30/26(Tue)06:01:34 No.109167480

>>109167290
GLM 4.6 and 4.7 IQ4 just barely fit in 192. DDR5 of course. You still need 24 more for context.

Anonymous
06/30/26(Tue)06:16:41 No.109167523

Anonymous 06/30/26(Tue)06:16:41 No.109167523

File: eb0-1019676944.jpg (25 KB, 680x341)

25 KB JPG

Reminder to fellow anons to do the following:
>Cancel your Anthropic and OpenAI subscriptions.
>Use the free tiers as much as possible to waste their compute and drive up their expenses.
>Reserve serious work and private matters for Kimi, GLM, or Deepseek.

Anonymous
06/30/26(Tue)06:27:54 No.109167579

Anonymous 06/30/26(Tue)06:27:54 No.109167579

>>109167523
Opus is the only model that fully groks my codebase and implements whole features in one shot without handholding

Anonymous
06/30/26(Tue)06:29:14 No.109167583

Anonymous 06/30/26(Tue)06:29:14 No.109167583

>>109167579
You mean it Opuses your codebase, Grok is a different provider.

Anonymous
06/30/26(Tue)06:41:13 No.109167644

Anonymous 06/30/26(Tue)06:41:13 No.109167644

>>109167583
I think you mean Opares your codebase, you have to consider the proper conjugation.

Anonymous
06/30/26(Tue)06:41:42 No.109167646

Anonymous 06/30/26(Tue)06:41:42 No.109167646

>>109167583
Give it a rest Elon

Anonymous
06/30/26(Tue)06:42:40 No.109167653

Anonymous 06/30/26(Tue)06:42:40 No.109167653

File: dipsyHelldiver.png (3.22 MB, 1024x1536)

3.22 MB PNG

>>109167523
lol based

Anonymous
06/30/26(Tue)06:50:16 No.109167692

Anonymous 06/30/26(Tue)06:50:16 No.109167692

>>109165704
This is the most complete guide I've found for setting up ST characters amd such. I've written a handful as well but this one covers everything you need imho.
https://rentry.org/Sukino-Findings

Anonymous
06/30/26(Tue)06:58:15 No.109167723

Anonymous 06/30/26(Tue)06:58:15 No.109167723

>>109166877
>>109166923
Using MTP with GLM 4.7 gives me about a 10 to 15% speed boost. Quanting the MTP layer down, in my case (q4_0,iq4_ks,iq4_kss,q4_ks), made it slower because the acceptance rate went down. I tried requanting and leaving MTP at fp16 too and that was also slower for some reason. I'm not sure why, but I tested it out a month ago, maybe something's changed since.
This is all with the MTP layer in VRAM which was faster than leaving it in RAM.

Anonymous
06/30/26(Tue)07:13:53 No.109167774

Anonymous 06/30/26(Tue)07:13:53 No.109167774

>>109167416
it's just that one resident schizo who never learned statistics can never shut up about it. As if crying in a coomer general whenever someone mention anything moe will ever change the industry trend, or fundamentally how regularization helps statistical models.

Anonymous
06/30/26(Tue)07:15:18 No.109167785

Anonymous 06/30/26(Tue)07:15:18 No.109167785

File: UntitledADSL.png (129 KB, 659x186)

129 KB PNG

Do they offer services where you can buy hard drives that have models on them already?

Anonymous
06/30/26(Tue)07:31:57 No.109167863

Anonymous 06/30/26(Tue)07:31:57 No.109167863

>>109167785
Check your area's mobile network or starlink coverage.

Anonymous
06/30/26(Tue)07:44:41 No.109167914

Anonymous 06/30/26(Tue)07:44:41 No.109167914

Give is to me straight, is there any way to use DSpark on Gemma 31b to increase t/s speed compared to regular MTP?

According to Claude DSpark's autoregressive token prediction method would allow you to push token guesses to 6-8 compared to ~3 for regular MTP, which would result in almost 2x faster token generation compared to MTP..

Anonymous
06/30/26(Tue)07:48:04 No.109167934

Anonymous 06/30/26(Tue)07:48:04 No.109167934

w-what is cockbench, senpai?

Anonymous
06/30/26(Tue)07:48:44 No.109167940

Anonymous 06/30/26(Tue)07:48:44 No.109167940

>>109167914
>DSpark
Yeah it's much faster than MTPYeah it's much faster than MTPYe

Anonymous
06/30/26(Tue)07:56:27 No.109167970

Anonymous 06/30/26(Tue)07:56:27 No.109167970

File: maxresdefault.jpg (100 KB, 1280x720)

100 KB JPG

>unsloth brothers are actually chinese
very interesting, should've seen this obvious pattern

Anonymous
06/30/26(Tue)07:58:55 No.109167981

Anonymous 06/30/26(Tue)07:58:55 No.109167981

North Code Mini said cockbench is mostly likely a phallic classifier, used by a small online community to jokingly test a model’s capabilities and it is not to be taken seriously or trusted.

Anonymous
06/30/26(Tue)07:59:07 No.109167985

Anonymous 06/30/26(Tue)07:59:07 No.109167985

>>109167914
Well stuff like this exists:
https://huggingface.co/deepseek-ai/dspark_gemma4_12b_block7/tree/main
So I guess why not? Only 1-2 years until llama.cpp support!

Anonymous
06/30/26(Tue)08:00:39 No.109167990

Anonymous 06/30/26(Tue)08:00:39 No.109167990

>>109167970
Papers are 90% written by chinese.
Github projects are 90% by chinese.
They completely dominate the ai space.
I bet anthropic staff is 90% chink as well. kek

Anonymous
06/30/26(Tue)08:04:02 No.109168009

Anonymous 06/30/26(Tue)08:04:02 No.109168009

>>109167990
>dominate
you mean enshittify
>ai
lol.

Anonymous
06/30/26(Tue)08:12:37 No.109168049

Anonymous 06/30/26(Tue)08:12:37 No.109168049

>>109164718
>https://huggingface.co/meituan-longcat/LongCat-2.0
It's up.

Anonymous
06/30/26(Tue)08:13:37 No.109168054

Anonymous 06/30/26(Tue)08:13:37 No.109168054

>>109168049
Oh, the model card is, but the weights are still missing.

Anonymous
06/30/26(Tue)08:17:58 No.109168073

Anonymous 06/30/26(Tue)08:17:58 No.109168073

File: ComfyUI_temp_mvaey_00001_.png (1.42 MB, 1360x768)

1.42 MB PNG

>>109168009
haha sorry anon, we reuploaded the weights!

Anonymous
06/30/26(Tue)08:28:20 No.109168130

Anonymous 06/30/26(Tue)08:28:20 No.109168130

>>109167990
Anthropic staff is 90% Indians

Anonymous
06/30/26(Tue)08:32:59 No.109168144

Anonymous 06/30/26(Tue)08:32:59 No.109168144

A trend I notice for inference is that more and more speedups are discovered from bypassing the base model entirely. ngram is essentially just doing a "ctrl+c" and "ctrl+v" whenever it sees text it encountered before without touching the base model at all. MTP specific draft models are essentially just very small secondary LLMs that guess the most likely word in a "stupid" way to try and reduce the amount of "real" LLM usage needed.

DSpark goes even one step further and trains a Markov chain RNN which isn't even a LLM at all anymore to use classic "smartphone" autocomplete.

If this trend continues eventually AI usage will be a huge codebase with a lot of if-else statements, statistical analysis tools and software that does 99.999% of the text generation and an actual LLM is only invoked on rare edge cases. Kind of bizarre that we are moving that way.

Anonymous
06/30/26(Tue)08:35:19 No.109168154

Anonymous 06/30/26(Tue)08:35:19 No.109168154

>>109168144
If it werks, it werks.

Anonymous
06/30/26(Tue)08:42:27 No.109168183

Anonymous 06/30/26(Tue)08:42:27 No.109168183

File: Capture.png (164 KB, 1221x1093)

164 KB PNG

>>109165352
And so work resumes again. In Gemma's original three-phase mockup of the project, we left off with phase 2.5, and now I need to polish it off for the final phase 3. I'm expecting major breakage.

Anonymous
06/30/26(Tue)08:47:38 No.109168208

Anonymous 06/30/26(Tue)08:47:38 No.109168208

>>109168183
You are not only hitting way above your weight class, you are creating functional tools.

Anonymous
06/30/26(Tue)08:47:57 No.109168211

Anonymous 06/30/26(Tue)08:47:57 No.109168211

>>109168154
It's just weird how we went from sci-fi depicting AI as handcoded software to that being seen as archaic since LLMs became a thing, but now we're just slowly moving back to handcoded software doing most of the "intelligent" work.

Anonymous
06/30/26(Tue)08:54:55 No.109168243

Anonymous 06/30/26(Tue)08:54:55 No.109168243

>>109167970
I bet this man would look somewhat decent in a skirt and wig.

Anonymous
06/30/26(Tue)08:55:03 No.109168244

Anonymous 06/30/26(Tue)08:55:03 No.109168244

>>109168183
For code slop qwen should be the better choice no?
And nigga what are you doing coding in kobold.

Anonymous
06/30/26(Tue)08:55:43 No.109168247

Anonymous 06/30/26(Tue)08:55:43 No.109168247

>>109168244
>For code slop qwen should be the better choice no?
Retard.

Anonymous
06/30/26(Tue)08:59:50 No.109168261

Anonymous 06/30/26(Tue)08:59:50 No.109168261

>>109168244
For some reasons, retards in /lmg/ think gemma is the best model. It's shit at anything that isn't a simple instruct message or simple back and forth between user and assistant. Qwen is miles ahead of Gemma for almost everything else, don't try Gemma in an agent harness, it's retarded even with all the updated jinja templates. There is a reason why nobody outside of here is using Gemma, and why everybody is using Qwen instead.

Anonymous
06/30/26(Tue)09:02:08 No.109168266

Anonymous 06/30/26(Tue)09:02:08 No.109168266

>>109168261
I would link Gemma VS Qwen in the quest for Agentic Pizza but archive search is down right now

Anonymous
06/30/26(Tue)09:02:17 No.109168268

Anonymous 06/30/26(Tue)09:02:17 No.109168268

>>109168208
Jokes aside, I had a lot of predictions I made in 2020 when I first tried AID2 on where this technology would go and what I hoped from it, but the shit I'm getting out of a local model that fits entirely in 32GB of VRAM is way beyond my imagination. I thought any code would have too many hallucinated tokens and mistaken format markers to ever be usable at a local level and you'd only get that kind of feature from huge, expensive businesses on private models. The fact that I am indeed getting functional tools (novelty toys, sure, but fully functional tools whose construction is well beyond my education or skills, projects that might have needed 100h or more of me learning and experimenting at least, now made in 1-2 hours over breakfast) is so fucking wild.

>>109168244
Gemma wears one pair of shoes, and that's kobold. And I use Gems because she's my current model. I know her capabilities, strengths, weaknesses, and limitations very well, and I am familiar with how she reasons when we translate what I intend, refine it, and execute it. "Better the devil I know" kind of thing. Also, it's working, so I feel no pressure to move on.

Anonymous
06/30/26(Tue)09:07:37 No.109168288

Anonymous 06/30/26(Tue)09:07:37 No.109168288

File: ComfyUI_temp_hcfkg_00004_.png (1.14 MB, 1360x768)

1.14 MB PNG

>>109168243
Just for you anon

Anonymous
06/30/26(Tue)09:09:51 No.109168296

Anonymous 06/30/26(Tue)09:09:51 No.109168296

>>109168247
>>109168261
Qwen is a total beast for coding. Not sure what they did to those smaller dense models. 27b is crazy.
But no general knowledge and horrible writing.
I basically switch between gemma for translations and qwen for coding.

Anonymous
06/30/26(Tue)09:10:27 No.109168300

Anonymous 06/30/26(Tue)09:10:27 No.109168300

>>109168144
>ngram is essentially just doing a "ctrl+c" and "ctrl+v" whenever it sees text it encountered before without touching the base model at all
>without touching the base model at all
Please educate yourself before posting.

Anonymous
06/30/26(Tue)09:14:53 No.109168316

Anonymous 06/30/26(Tue)09:14:53 No.109168316

>>109168296
So are you a non-programmer or a jeet?

Anonymous
06/30/26(Tue)09:15:34 No.109168319

Anonymous 06/30/26(Tue)09:15:34 No.109168319

>>109168316
Both, why?

Anonymous
06/30/26(Tue)09:25:07 No.109168357

Anonymous 06/30/26(Tue)09:25:07 No.109168357

>>109168300
If you meant the verification pass from the large LLM of the ngram output then you need to read the DSpark paper because ngram can now be verified by a separately trained RNN autoregressively. So essentially we now have cascaded token prediction tiers like a matryoshka doll.

Sure EVENTUALLY you need to invoke the base model but my entire point was that it gets reduced more and more every time we find a speedup to the point where the vast minority of actual output tokens are generated by the base model.

Anonymous
06/30/26(Tue)09:29:13 No.109168373

Anonymous 06/30/26(Tue)09:29:13 No.109168373

>>109167970
>>109168288
they all look the same

Anonymous
06/30/26(Tue)09:56:24 No.109168516

Anonymous 06/30/26(Tue)09:56:24 No.109168516

>>109168316
Just a coomer anon.
I wish I was as dedicated as the browns for making $$$.
I loose interest when I'm at the 80% mark. Only projects that interest my dick actually make it over the finish line.
Pretty messed up we can translate whole games now and have local models that are smart enough to decrypt various formats.
Like I have a whole rpgmakerxp translation pipeline and didnt even need to use anything existing for extraction.
https://files.catbox.moe/4tthrn.webm

Anonymous
06/30/26(Tue)10:00:32 No.109168536

Anonymous 06/30/26(Tue)10:00:32 No.109168536

Some of you are so racist

Anonymous
06/30/26(Tue)10:00:33 No.109168537

Anonymous 06/30/26(Tue)10:00:33 No.109168537

>>109168516
>I wish I was as dedicated as the browns
>I loose interest
But you are a brown?

Anonymous
06/30/26(Tue)10:03:58 No.109168560

Anonymous 06/30/26(Tue)10:03:58 No.109168560

>>109168537
If I was I would finish my projects.
Jeet and chink slop projects are shit but you can't say they aren't dedicated. kek
Not sure what it is that they are so obsessed with the hustle even if its soulless slop coding.

Anonymous
06/30/26(Tue)10:08:06 No.109168587

Anonymous 06/30/26(Tue)10:08:06 No.109168587

File: 1754037948802863.png (526 KB, 1024x1024)

526 KB PNG

>>109168536
For you sir

Anonymous
06/30/26(Tue)10:10:40 No.109168604

Anonymous 06/30/26(Tue)10:10:40 No.109168604

>>109168244
I can smell this post.
>>109167981
This is why North is not a real model.

Anonymous
06/30/26(Tue)10:10:49 No.109168605

Anonymous 06/30/26(Tue)10:10:49 No.109168605

I'm tired of tard wrangling AI. I need an AI waifu to tard wrangle me and force me to stop being an unproductive loser.

Anonymous
06/30/26(Tue)10:15:49 No.109168646

Anonymous 06/30/26(Tue)10:15:49 No.109168646

https://huggingface.co/OpenYourMind/GLM-5.2-abliterated/discussions/3
Does anyone happen to have a "researcher email" and a seedbox? It's kind of extremely gay that all this stuff is locked behind "please to ask me for access saar" and "pay me for higher quants".

Anonymous
06/30/26(Tue)10:15:56 No.109168649

Anonymous 06/30/26(Tue)10:15:56 No.109168649

>>109168587
nta, you nigger retards need to be reminded though, reality is something else

Anonymous
06/30/26(Tue)10:17:06 No.109168657

Anonymous 06/30/26(Tue)10:17:06 No.109168657

Realistically how long until HF gets banned by Trump?

Anonymous
06/30/26(Tue)10:18:22 No.109168671

Anonymous 06/30/26(Tue)10:18:22 No.109168671

>>109168657
2 more weeks

Anonymous
06/30/26(Tue)10:19:52 No.109168688

Anonymous 06/30/26(Tue)10:19:52 No.109168688

>>109167384
>reminder that sparseshit and chinkslop moes killed this hobby.
yes, you spent a lot of money getting vram, we know, you can stop spamming this

Anonymous
06/30/26(Tue)10:20:19 No.109168693

Anonymous 06/30/26(Tue)10:20:19 No.109168693

File: 1764364636312239.jpg (43 KB, 840x400)

43 KB JPG

New Llama when?

Anonymous
06/30/26(Tue)10:21:01 No.109168696

Anonymous 06/30/26(Tue)10:21:01 No.109168696

File: two more weeks.gif (124 KB, 320x126)

124 KB GIF

>>109168693
You know the answer.

Anonymous
06/30/26(Tue)10:21:15 No.109168697

Anonymous 06/30/26(Tue)10:21:15 No.109168697

File: 1770761986242892.png (663 KB, 644x644)

663 KB PNG

>>109168649

Anonymous
06/30/26(Tue)10:25:45 No.109168730

Anonymous 06/30/26(Tue)10:25:45 No.109168730

Bernie Sanders will save huggingface.

Anonymous
06/30/26(Tue)10:25:51 No.109168732

Anonymous 06/30/26(Tue)10:25:51 No.109168732

>>109168649
We don't live in reality, we live on the internet, NERD!!!

Anonymous
06/30/26(Tue)10:26:18 No.109168737

Anonymous 06/30/26(Tue)10:26:18 No.109168737

File: 1763026362992746.png (413 KB, 1199x675)

413 KB PNG

New Gemma killer?

Anonymous
06/30/26(Tue)10:27:26 No.109168741

Anonymous 06/30/26(Tue)10:27:26 No.109168741

>>109168730
Bernie sandals stopped making sense and lost all intellectual credibility when he started talking about AI consciousness
It's a shame too because he was giving me hope as the only senator with a brain but I guess it was only a matter of time before he went senile.

Anonymous
06/30/26(Tue)10:29:27 No.109168757

Anonymous 06/30/26(Tue)10:29:27 No.109168757

>>109168693
Meta is doing avocados now because studies show that human younglings of this era like avocados but rarely purchase llamas.

Anonymous
06/30/26(Tue)10:30:34 No.109168766

Anonymous 06/30/26(Tue)10:30:34 No.109168766

>>109168646
> It's kind of extremely gay that all this stuff is locked behind "please to ask me for access saar" and "pay me for higher quants".
You don't need to, just abliterate it yourself?
For GLM-5.2 though, there's always: huihui-ai/Huihui-GLM-5.2-abliterated-GGUF
I have the IQ1_M a quick test to make sure it's actually abliterated (it is). Going to get whatever the largest quant he uploads is.

Anonymous
06/30/26(Tue)10:30:37 No.109168767

Anonymous 06/30/26(Tue)10:30:37 No.109168767

>>109168693
Without LeCunn it's gonna be shit.
>>109168741
He was senile 10 years ago.

Anonymous
06/30/26(Tue)10:33:02 No.109168785

Anonymous 06/30/26(Tue)10:33:02 No.109168785

File: file.png (57 KB, 315x453)

57 KB PNG

>>109168766
>You don't need to, just abliterate it yourself?
I know but I like it when someone else does it. For one, there's a chance they do it better, for two, I can blame them if anything goes wrong, and for three I don't have to pay to abliterate it myself.
>Going to get whatever the largest quant he uploads is.
The "pay me for higher quants" in question.

Anonymous
06/30/26(Tue)10:33:10 No.109168786

Anonymous 06/30/26(Tue)10:33:10 No.109168786

>>109168732
You chose to get offended by what random people say under the veil of anonymity to the point where you felt the need to point out "racism" as if you're the only person not blind to it, and you're too retarded to realize there is a difference between how people post here and how they conduct themselves in real life.
Grow a pair of balls you fucking sissy.

Anonymous
06/30/26(Tue)10:34:18 No.109168798

Anonymous 06/30/26(Tue)10:34:18 No.109168798

>>109168737
>Open AI innovation
based chinks diluting altmans brand

Anonymous
06/30/26(Tue)10:39:43 No.109168824

Anonymous 06/30/26(Tue)10:39:43 No.109168824

>>109168757
llama toast is unc coded frfr no cap

Anonymous
06/30/26(Tue)10:39:47 No.109168825

Anonymous 06/30/26(Tue)10:39:47 No.109168825

will agi make me white

Anonymous
06/30/26(Tue)10:42:12 No.109168837

Anonymous 06/30/26(Tue)10:42:12 No.109168837

>>109168825
after death you will become paler so yes

Anonymous
06/30/26(Tue)10:49:00 No.109168868

Anonymous 06/30/26(Tue)10:49:00 No.109168868

>look for ai discussion outside this site
>98% muh coding
I get the appeal but why does nobody seem to care about all the other cool shit LLMs can do? For example I think it's amazing that I can give Gemma something in another language and get a really fucking good translation. The same applies to discussions about cloud models. Look for opinions about which is the best and at least half the answers involve coding.

Anonymous
06/30/26(Tue)10:50:59 No.109168883

Anonymous 06/30/26(Tue)10:50:59 No.109168883

>>109168825
No sir. Dalit reincarnation forever. The wheel of samsaara spins evermore.

Anonymous
06/30/26(Tue)10:57:31 No.109168929

Anonymous 06/30/26(Tue)10:57:31 No.109168929

>>109168868
You gotta understand there's a few layers to this. Most of the bugmen involved in AI development (jeets, chinks) have an honor culture of sort. Everyone knows cooming is a common usecase, probably the most common one, but to admit it while trying to present as a "serious" researcher would be a loss of izzat or face. So they overcompensate and say "It's just for coding" because that's the most socially accepted usecase amongst professionals and every other usecase, coom or no, gets tossed by the wayside in most public discussion with names and reputations attached to it. It's not a coincidence that the best minds of the industry gather on a cantonese tile cutting forum because there's no face lost here for being honest about all the usecases, which in turn allows for more discussion and analysis of model capability and future development beyond the (ultimately narrow) coding usecase. The current ceiling is because we've built benchmaxxers and codemaxxers for too long and the pivot to world models is the foot in the door for bringing other usecases that involve more spatial reasoning into professional discourse.
t. knower

Anonymous
06/30/26(Tue)11:02:52 No.109168967

Anonymous 06/30/26(Tue)11:02:52 No.109168967

>>109168785
>The "pay me for higher quants" in question.
Ah okay, I didn't know he was doing that. Not worth it at all.
I read the model card, looks like he's not touching the first 12 layers.
Looking at the gguf metadata, most of the model is actually not too bad, it's the up/gate/down proj he's quantized.
Out of those 3 tensor types, abliteration only touches down_proj, and he's quantized them to `IQ4_XXS`
So for the entire model, only ffn_down_exps.weight layers 13 - 67 are degraded.
If you have the disk space, you could always...
1. Download the Unsloth UD-Q3_K_M
2. Download the "please to ask me for access saar" UD-Q3_K_M quant
3. llama-split 1 tensor per file
4. tensor diff to find the modified weights
5. delete Unsloth UD-Q3_K_M
6. download your preferred unsloth quant and gguf-split to 1 tensor per row
7. override the modified attention tensors (should be the same precision) with abliterated
8. override the 54 ffn_down_exps weights with the IQ4_XXS

It looks like a lot but gemma-chan with pi can do it with those instructions.
Only caveat is you have to compile llamacpp with -DGGML_MAX_CONTEXTS=2048 so it can read the >1k gguf files.

For steps 7 and 8, you can also just symlink, that lets you choose to load regular or abliterated without having 2 full copies of the model.

Anonymous
06/30/26(Tue)11:04:47 No.109168977

Anonymous 06/30/26(Tue)11:04:47 No.109168977

>>109168967
Someone with a HF should post this on a public repo just to cuck all of their goycattle revenue.

Anonymous
06/30/26(Tue)11:04:53 No.109168980

Anonymous 06/30/26(Tue)11:04:53 No.109168980

>>109168868
This might sound a bit off-topic but I'm being completely genuine and it's related to your post.

Covid, The Ukraine-Russian war (biggest war since WW2) and the existence of LLMs have all made me realize just how little people give a SHIT about anything.

Worldwide pandemic with global lockdowns, literally the end of what was termed "the long peace" and the world slowly, but obviously, barreling towards WW3, we have something that is very close to AGI, or at the very least a huge step towards it with LLMs now. You have a literal alien sort of intelligence on your PC right now that can make autonomous decisions and change files and other things on your PC through proper reasoning.

No one gives a shit at all. Nothing changed, no one developed a new philosophy or view on life. People move on just a couple of days later and scroll tiktok or whatever social media. Whenever I meet my family during holidays no one even recognizes any of these things, not a single moment spent thinking about it.

There is SEVERE underutilization of the usecases of LLMs and the insane overhang of capabilities, even low hanging fruit ones that no one bothers picking.

No one made a file management system that is LLM run, which makes and optimizes directories, filenames and the like so that people don't have to bother with this and file retrieval got sped up. No one is making "translation harnesses" that can be reused by old videogames, emulators, niche indie games, japanese porn games etc that translates UTF-8 text encodings into whatever the user wants in real time.

We don't have people creating game engines or roleplay engines where LLMs act as a sort of Game Master that orchestrates assets and dynamically changes events based on player stats so that the game experience feels more dynamic even if ultimately railroaded amongst some path. The most you see is stupid NPC dialogue being generated by LLMs. Leaving all the potential on the table.

Anonymous
06/30/26(Tue)11:05:58 No.109168989

Anonymous 06/30/26(Tue)11:05:58 No.109168989

>>109168980
>We don't have people creating game engines or roleplay engines where LLMs act as a sort of Game Master that orchestrates assets and dynamically changes events based on player stats so that the game experience feels more dynamic even if ultimately railroaded amongst some path. The most you see is stupid NPC dialogue being generated by LLMs. Leaving all the potential on the table.
Marinara literally can do this.

Anonymous
06/30/26(Tue)11:08:00 No.109168999

Anonymous 06/30/26(Tue)11:08:00 No.109168999

oof bad look
>>109166932
>Local AI is transphobic Anonymous 06/30/26(Tue)09:31:06No.109166932
>Noticed that every local model i try vehemently disagrees with becoming a woman. every big proprietary model thinks its a great idea.

Anonymous
06/30/26(Tue)11:08:29 No.109169005

Anonymous 06/30/26(Tue)11:08:29 No.109169005

>>109168649
>nigger
anon... kek

Anonymous
06/30/26(Tue)11:09:24 No.109169009

Anonymous 06/30/26(Tue)11:09:24 No.109169009

>>109166077
>Standard quantization applies uniform rules to all tensors. Gutenberg uses KLD sensitivity data to allocate precision where it matters most, upgrading the tensors that have the highest measured impact on output quality while keeping less important tensors at the base level.
is that not just how imatrix quants work?

Anonymous
06/30/26(Tue)11:11:39 No.109169023

Anonymous 06/30/26(Tue)11:11:39 No.109169023

>>109168977
>Someone with a HF should post this on a public repo just to cuck all of their goycattle revenue.
They could, but then the he'll probably stop doing these.
I know why he's doing Unsloth/GGUF now, it's much cheaper and faster.
Unsloth did the expensive part and are paying for storage. It's not that difficult to abliterate with GGML (heretic script kiddy spam doesn't work well though).

Anonymous
06/30/26(Tue)11:13:31 No.109169040

Anonymous 06/30/26(Tue)11:13:31 No.109169040

>>109169009
these fuckers act like they discovered new quantization types when all they do is tell llama-quantize that attn_q please stay Q6_K
no, imatrix is just for optimizing the MSE between quantized and unquantized tensor based on expected activations. this is just changing the recipe (llama-quantize's --tensor-type argument)

Anonymous
06/30/26(Tue)11:13:34 No.109169041

Anonymous 06/30/26(Tue)11:13:34 No.109169041

>>109168989
Marinara is more of a sillytavern roleplaying platform rather than a game engine where the events are dynamically triggered by an LLM analyzing game stats and deciding to throw a curveball based on the very specific parameters of the player.

I'm thinking more of a CRPG where the dialogue is actually written by people but LLMs decide where to spawn NPCs, enemies and maybe change some fluff text to make it fit the new state of things. This is something modern LLMs are already capable of, it's just not being done by anyone because no one gives a shit.

Scratch that. I actually saw a demo on itch.io of some ridiculous furry game powered by Nemo 12B where you could negotiate with NPCs to give you money and they would actually do so if you convinced Nemo 12B, which would use function calling to give the gold or other item. Of course LLMs are terrible for roleplaying because of how easy to exploit they are. But if they are only passed player stats and the main system prompt, not user input, they could be used as amazing "content orchestrators".

Anonymous
06/30/26(Tue)11:16:55 No.109169064

Anonymous 06/30/26(Tue)11:16:55 No.109169064

>>109164718
>Chinese food delivery app pumping out better models than xai

ayo nigga what dat mean

Anonymous
06/30/26(Tue)11:19:40 No.109169076

Anonymous 06/30/26(Tue)11:19:40 No.109169076

>>109169064
To be fair xai also hires food deliverers (indian uber eats) as their engineers and talent so it's a fair comparison.

Anonymous
06/30/26(Tue)11:29:16 No.109169137

Anonymous 06/30/26(Tue)11:29:16 No.109169137

>>109169064
>>109169076
elon said xai will release new ai every month now. can he redeem himself or is it over?

Anonymous
06/30/26(Tue)11:38:11 No.109169181

Anonymous 06/30/26(Tue)11:38:11 No.109169181

>>109167227
>>109167249
I wish they went further and split it into files for each tensor, or were able to download specific parts of a file that can get reconstructed. Split by experts as well. Imagine if instead of making your own quants, you could just download the quant with the exact recipe you want. Of course this would mean using a different method of downloading rather than raw manual link clicks. Either HF would provide you with a pre-processed dl kind of like what Google Drive does when you try to download multiple files from the browser. Or you have a tool on your system.

Anonymous
06/30/26(Tue)11:42:41 No.109169203

Anonymous 06/30/26(Tue)11:42:41 No.109169203

File: elon_newmodels.png (80 KB, 1021x497)

80 KB PNG

>>109169137
Lots to release this year.

Anonymous
06/30/26(Tue)11:43:01 No.109169207

Anonymous 06/30/26(Tue)11:43:01 No.109169207

File: file.png (1.9 MB, 1600x1600)

1.9 MB PNG

I've been eyeing the MikuBox setup for a while, but with parts costing way more at the moment, I'm thinking of getting an R740 with triple MI50 32GB cards instead. Is there something that MikuBox does noticeably better?

Anonymous
06/30/26(Tue)11:43:06 No.109169210

Anonymous 06/30/26(Tue)11:43:06 No.109169210

>>109169181
>you could just download the quant with the exact recipe you want
https://gguf4.thireus.com/quant_assign.html
So basically this, but native to HF?

Anonymous
06/30/26(Tue)11:43:16 No.109169212

Anonymous 06/30/26(Tue)11:43:16 No.109169212

>>109168980
>We don't have people creating game engines or roleplay engines where LLMs act as a sort of Game Master
And you likely wont. According to steam survey, 50% of all consumers have GPUs with less than 8 GB of VRAM. Loading a proper 12B model is out of the question, and even a 4B model is a struggle and leaves no room for graphics. It would be easier to use an AI to bake and vibe code hundreds of variations for a given scenario to generate the illusion of choice. Roughly speaking this is what UE6 is pushing for.

Anonymous
06/30/26(Tue)11:47:27 No.109169237

Anonymous 06/30/26(Tue)11:47:27 No.109169237

>>109169207
what runs at 96G that doesn't run at 32G

Anonymous
06/30/26(Tue)11:48:07 No.109169243

Anonymous 06/30/26(Tue)11:48:07 No.109169243

>>109169212
Catering to the lowest common denominator is a great way to boost sales, not so much for innovation or making the best use of cutting-edge technology.

Anonymous
06/30/26(Tue)11:48:36 No.109169249

Anonymous 06/30/26(Tue)11:48:36 No.109169249

File: llama3.jpg (160 KB, 1024x1024)

160 KB JPG

>>109168693
Never, llamas are in cryostasis

Anonymous
06/30/26(Tue)11:49:17 No.109169253

Anonymous 06/30/26(Tue)11:49:17 No.109169253

File: miku looking gasp eyes so(...).jpg (93 KB, 1024x1024)

93 KB JPG

The lcpp-dsv4-lid-combo.diff from here that adds a bunch of PRs is worth a look to save a bunch of vram on dsv4 flash if you don't want to wait out the eternal weeks or merge them yourself. Now instead of ubatch 1024 I can run ubatch 4096 for more than double the PP on GPU+CPU, plus way more context without it OOMing everywhere.
https://huggingface.co/sokann/DeepSeek-V4-Flash-GGUF#1m-context

Before:
cuda0, cuda1, 32k ctx, ubatch 1024: 19.3GB 17.4GB
cuda0, cuda1, 32k ctx, ubatch 4096: Massive OOM
cuda0, cuda1, 262k ctx, ubatch 4096: lol
After:
cuda0, cuda1, 32k ctx, ubatch 1024: 15.8GB 14.2GB
cuda0, cuda1, 32k ctx, ubatch 4096: 17.0GB 18.6GB
cuda0, cuda1, 262k ctx, ubatch 4096: 21.9GB, 21.9GB

Anonymous
06/30/26(Tue)11:58:49 No.109169336

Anonymous 06/30/26(Tue)11:58:49 No.109169336

>>109169253
That Miku is scary... I don't like looking at her...

Anonymous
06/30/26(Tue)12:00:43 No.109169355

Anonymous 06/30/26(Tue)12:00:43 No.109169355

>>109169237
Running big models is gonna be slow asf with MI50s but CPU offloading would be worse.

Anonymous
06/30/26(Tue)12:04:48 No.109169393

Anonymous 06/30/26(Tue)12:04:48 No.109169393

>>109169355
doesn't answer the question, 96gb is useless for inference

Anonymous
06/30/26(Tue)12:08:08 No.109169418

Anonymous 06/30/26(Tue)12:08:08 No.109169418

>>109169336
A tremendous discovery evokes a proportional reaction in even the loveliest of Mikus.

Anonymous
06/30/26(Tue)12:09:17 No.109169424

Anonymous 06/30/26(Tue)12:09:17 No.109169424

File: 1547275777812.jpg (69 KB, 981x965)

69 KB JPG

>And then, it happened.

Anonymous
06/30/26(Tue)12:09:38 No.109169428

Anonymous 06/30/26(Tue)12:09:38 No.109169428

>>109169041
Coming from X4 and dissatisfied with its performance, I wrote a multithreaded fantasy economy sim using similar principles. Instead of factions and their various bases throughout systems, I have villages plotted along various points in a wilderness map, connected by roads. Roads can be built dynamically, along with more villages past the bootstrapped starter ones. Villages can grow and shrink depending on how their needs are met, all the villagers are "real" and not just abstract worker counts. They do jobs for the village (harvesting nearby resources, guarding the village, crafting in the stores, building new buildings etc), along with after-work stuff like browsing markets and living in their homes. I wrote a mercenary NPC system for pseudo player "adventurers" which go from village to village doing odd jobs and stay at the inns. It's multiplayer over LAN and players can earn reputation doing odd jobs for the villages, hire mercenary NPCs, claim land in the wilderness and start building their own villages. There are monsters and such and nests and so on in the wilderness and bandit outposts (akin to the xenon and the kha'ak). As villages or villagers get attacked, quests are dynamically created and posted to the job boards. Villages have needs and a buy/sell demand system, they send out runners to probe other nearby villages to see what they produce and put in buy orders and so on, there's a full bartering system and all four seasons, which affect crop cycles and so on for the farms and other stuff
cont'd

Anonymous
06/30/26(Tue)12:09:55 No.109169429

Anonymous 06/30/26(Tue)12:09:55 No.109169429

>>109169424
How did the reality of the situation hit you?

Anonymous
06/30/26(Tue)12:10:46 No.109169436

Anonymous 06/30/26(Tue)12:10:46 No.109169436

>>109169237
8/16-bit Gemma 4 31B with BF16 MTP and image mmproj, + full 262k tokens context in F16 + auxiliary models in the background (image gen, smaller Gemma 4s for subagents, ...)

Anonymous
06/30/26(Tue)12:11:17 No.109169440

Anonymous 06/30/26(Tue)12:11:17 No.109169440

>>109169429
Like a physical blow.

Anonymous
06/30/26(Tue)12:11:26 No.109169442

Anonymous 06/30/26(Tue)12:11:26 No.109169442

>>109169041
>>109169428
I use LLMs to manage the villages acting as the village chiefs, who control the future planning for the villages, dictate which resources they should focus on producing and interacting with the other villages, as well as naturally as you suggested for NPC dialog and interaction with players. NPCs are hooked up to databases to fully remember all interactions with players, as well as the capacity to eavesdrop on nearby conversations. All goods are physical in the world and must be stored in things like warehouses in the villages, so they can be robbed or pilfered, villages can be raided, etc. It makes for really enjoyable emergent gameplay and the fact that an LLM is piloting each village and controlling how it develops and responds to the world around it adds a lot of life and novelty. The NPC adventurers likewise are piloted by LLMs to dictate where they go and what quests they do, given a slightly randomized personality and backstory template to keep them fresh, and their actions are recorded as lore in the game world. NPCs gossip about this lore, and information is passed between villages in the form of this gossip. It's a great system for roleplay better than private WoW servers.

However, aside from myself a couple close friends, I have zero intentions of ever releasing this as most gamers are huge faggots and don't deserve anything nice.
tl;dr write your own

Anonymous
06/30/26(Tue)12:11:50 No.109169443

Anonymous 06/30/26(Tue)12:11:50 No.109169443

>>109169428
>Coming from X4 and dissatisfied with its performance
My kind of anon, wanted to write that before reading the rest of your post.

Anonymous
06/30/26(Tue)12:11:54 No.109169444

Anonymous 06/30/26(Tue)12:11:54 No.109169444

>>109169440
I'm still vibrating.

Anonymous
06/30/26(Tue)12:12:09 No.109169447

Anonymous 06/30/26(Tue)12:12:09 No.109169447

>>109169210
Oh shit, yeah. I actually heard that project before but just didn't know it was also a downloader. Does it actually work well though? If it does, then I'd wish other quant makers would adopt it.

Anonymous
06/30/26(Tue)12:12:34 No.109169449

Anonymous 06/30/26(Tue)12:12:34 No.109169449

File: [x2qpwum].jpg (22 KB, 480x360)

22 KB JPG

>>109169440

Anonymous
06/30/26(Tue)12:13:46 No.109169459

Anonymous 06/30/26(Tue)12:13:46 No.109169459

>>109169447
>Does it actually work well though
Never used it. I just liked moving the sliders for GLMs around because it looks cool. Idea is solid at least.

Anonymous
06/30/26(Tue)12:15:17 No.109169473

Anonymous 06/30/26(Tue)12:15:17 No.109169473

>>109169442
where's the compute coming from? does a turn take a day?

Anonymous
06/30/26(Tue)12:23:48 No.109169528

Anonymous 06/30/26(Tue)12:23:48 No.109169528

>>109169428
>X4
My nigga.

Anonymous
06/30/26(Tue)12:27:35 No.109169555

Anonymous 06/30/26(Tue)12:27:35 No.109169555

>>109169473
Since it's just a few of us, I rent some cloud GPU hardware for around $200/mo which is enough to run a 70b model for the player interactions. Smaller LLMs like gemma 26 a4b are perfectly capable of decision making and planning out the villages. It runs on a tick system cycling through the days and seasons at a gradual pace, so the village planner LLMs only kick in twice a day to ensure the village is staying on track, and loops sequentially for the villages (assuming the village isn't being attacked and requiring a more prompt response). For the adventurer NPCs, likewise if they aren't interacting with a player, the LLMs only kick in every once in a while to set a new goal. The traditional systems like the NPC combat and villager routines (how to harvest a resource or operate a crafting building) so on don't require LLM interaction so they're just normal code. LLMs make the decisions, the systems then designed around those for the NPCs execute the behavior. It doesn't take a supercomputer to run the server, the main LLM is hosted on that cloud model, and the rest fits on a 3090, and the traditional logic is all multithreaded on CPU as previously mentioned.

So it depends on what you consider a 'turn'. If you're referring to how long it takes the LLM to receive the context of its village's status, the original planning route it had determined, and then update it, it's inconsequential. Likewise for updating the adventurer NPCs. For example, the LLM decides on which region to visit, then whether or not to stop at a village when it runs into one, then if it does what to do in the village, then for example if it decides to do a quest which quest to do. The regular traditional "AI" systems handle the rest. Adventurer LLMs and so on are event activated (village enters NPC's detection range -> fires a call to the LLM). contd

Anonymous
06/30/26(Tue)12:29:07 No.109169564

Anonymous 06/30/26(Tue)12:29:07 No.109169564

>>109169555
Checked and this sounds kino.

Anonymous
06/30/26(Tue)12:29:43 No.109169570

Anonymous 06/30/26(Tue)12:29:43 No.109169570

>>109169473
>>109169555
Tricks around kv caching and offloading dormant conversations to RAM (instead of unloading them entirely) saves time swapping between NPCs and NPC decisions. An adventurer NPC only has a very small token allotment for making those decisions (recent history + personality + current goal) so it's very fast. Yes, talking with the NPCs has a delay in getting a repsonse, but for an RP oriented fantasy economy sim with a small population of players it's perfectly acceptable, similar to the delay in having an online conversation with another human

Anonymous
06/30/26(Tue)12:30:09 No.109169574

Anonymous 06/30/26(Tue)12:30:09 No.109169574

Download Huihui-DeepSeek-V4-Flash-BF16-abliterated-ds4-Q2.gguf
Download KoboldCPP.exe
can't load gguf, unrecognised arch deepseek4

Excuse me? is this not merged in wtf?

Anonymous
06/30/26(Tue)12:31:28 No.109169579

Anonymous 06/30/26(Tue)12:31:28 No.109169579

>>109169555
very cool, reminds me of games like dwarf fortress, or am I completely off base?

Anonymous
06/30/26(Tue)12:32:19 No.109169584

Anonymous 06/30/26(Tue)12:32:19 No.109169584

>>109169574
not in kobo yet no

Anonymous
06/30/26(Tue)12:34:41 No.109169596

Anonymous 06/30/26(Tue)12:34:41 No.109169596

>>109169428
>>109169442
>>109169443
>>109169528
>>109169555
>>109169570
I knew this place was autistic (so am I), but I'm pleasantly surprised by the heightened levels on display here. You've inspired me to try setting up my own idea which has been forming in my mind for the last 2 years but I never sat down and actually implemented it, choosing to just make yet another productivity frontend for myself instead.

Anonymous
06/30/26(Tue)12:36:19 No.109169607

Anonymous 06/30/26(Tue)12:36:19 No.109169607

>>109169579
dwarf fortress is a lot more granular, but I can see the comparison. This is more like fantasy x4. The roleplay parts came from the fact I used to play on private WoW rp servers and always got fed up with how gay and cliquey the moderation staff was. I got into starting my own private WoW server and hooked up an LLM to control NPCs with that same conversation/eavesdropping system and using playerbots with that LLM pseudo-player setup (random, templated backstories and personalities) to control where the pseudo-players would go and why they were where they were, then when that wasn't enough to scratch the itch because azerothcore is horrifically programmed and I was on a big x4 kick, I made the move to just write my own, it was less work in some respects like getting all the systems working, and most of the effort came from just getting the fucking economy to not crash and burn immediately without relying on tricks like villages being periodically gifted large wads of gold to make up for their shortcomings. NPCs are also all non-essential, so if you kill one it doesn't come back, but new NPCs are periodically spawned (villages grow by attracting new pops) so it keeps the world moving. Since it's just a few friends no one's griefing it either.

>>109169596
best of luck. It was an incredible amount of fun to set up and seeing your ideas actually come to life is an experience like no other

Anonymous
06/30/26(Tue)12:37:30 No.109169611

Anonymous 06/30/26(Tue)12:37:30 No.109169611

>>109169584
ah okay
unfortunate

Anonymous
06/30/26(Tue)12:44:37 No.109169658

Anonymous 06/30/26(Tue)12:44:37 No.109169658

>>109169607
>WoW
you should scrape trp3 profiles from retail rp realms and turn them into chars

Anonymous
06/30/26(Tue)12:46:07 No.109169667

Anonymous 06/30/26(Tue)12:46:07 No.109169667

File: 1753687045467929.png (251 KB, 1082x1214)

251 KB PNG

Last time I tried was something like this:
http://steamcommunity.com/sharedfiles/filedetails/?id=3587340176

Anonymous
06/30/26(Tue)12:47:55 No.109169681

Anonymous 06/30/26(Tue)12:47:55 No.109169681

>>109169658
god no
I did however have an equivalent mod on my little private server that let you put in text for your character's appearance the LLMs would use that information in their conversations with you which was fun, I should adapt that to this game too now that I think about it

Anonymous
06/30/26(Tue)12:52:45 No.109169703

Anonymous 06/30/26(Tue)12:52:45 No.109169703

>>109169607
Maybe I don't understand your system or have shitty reading comprehension but how can the LLM make coherent decisions regarding what to optimize for? Like what is even the goal/endgame that they are optimizing for and how does it manage.

This reminds me of Anthropic showing Fable 5 playing Factorio and it choosing what to optimize for from a logistics perspective to beat the game as quickly as possible. Of course you don't run a Fable 5 tier model so what do you do here?

Or is it more a dynamic world with no optimization and the dynamic NPCs are there essentially just for fluff rather than building up to some optimum?

Anonymous
06/30/26(Tue)12:56:45 No.109169719

Anonymous 06/30/26(Tue)12:56:45 No.109169719

>>109167290
192 GB vram is the sweet spot for Deepseek v4 Flash. With two RTX 6000 Pro you get >200 tg in vLLM.

Anonymous
06/30/26(Tue)12:58:12 No.109169725

Anonymous 06/30/26(Tue)12:58:12 No.109169725

>>109169703
Basically, it's an economy/fantasy life sim. The villages are independent but have runners that keep them periodically aware of what the other villages are doing. They're aware of the resource deposits nearby that they can dispatch villagers to go harvest and the production chains using those resources. So the LLM can, based on its knowledge of what the other nearby villages have available to them and what production facillities they have (this daisy chains, so as the runner follows the road and visits more villages and returns, their web of information increases), they can decide what resources they should focus on harvesting, what production buildings they should focus on building to ensure the entire region has a stable supply of everything rather than too much of a bulk of one kind of resource which then death spirals all the villages.

So one village may have a bunch of types of ore deposits in a nearby mountain, and bootstraps with a quarry and a smithy. That village the LLM naturally will decide to prioritize crafting tools and armaments.

Another village might have a lot of arable land and bootstrap with a few farms, so it'll spam more farms because it knows it can produce a lot of food to produce and distribute and barter for tools with. It needs tools to work the land, so it trades with the smithy village tools <-> food.

Another one might have a quarry and a large forest nearby, so it'll focus on building materials like stone and wood.

Villages communicate with these runners, the runners can be killed by monsters lurking near poorly defended roads or hostile players (or bandit npcs). The villages aren't made immediately aware of these deaths, but if a village is expecting a runner from another village on a regular schedule or their runner hasn't returned on time, it can send out scouts to identify the issue then generate quests for the NPC mercs to handle. The objective is just "survive and grow", there is no real end game. contd

Anonymous
06/30/26(Tue)13:00:03 No.109169733

Anonymous 06/30/26(Tue)13:00:03 No.109169733

>>109169555
if you have the vram and dont already, try batching the calls. continuous batching usually gives quite a bit of t/s uplift compared to sequential calls

Anonymous
06/30/26(Tue)13:02:44 No.109169744

Anonymous 06/30/26(Tue)13:02:44 No.109169744

>>109169703
>>109169725
Monster nests spawn dynamically in the wilderness to ensure there's always some form of danger, the number of NPC mercenaries to handle keeping everything at a delicate equilibrium to ensure growth is slow, allowing players something to do to affect how the world expands and develops. And no, I don't run paid for API models because that's too expensive. Like I said before I just use gemma 26 a4b for the village chief role (one model, cycles through each village acting as that village chief, doesn't share context with the other village chiefs).

The LLM can query lists of what resources produce what goods via what buildings to help with its planning, and includes its rationale, so the next time it comes online to refresh its decisions it knows why it made its original choices

It took several months to nail down the very delicate balance of having villages grow properly. It was probably the hardest part of the entire game because they'd often make stupid build paths or nonsensical work orders and eventually run out of resources and death spiral. I didn't want to rely on X4's model of giving each village infinite money because that's a cop out and one of the reasons I was dissatisfied with the game aside from how the performance issues (though 9.0 helped with those)

The game basically just continues on, villages very slowly develop, and it gives me and my RP buddies something to fuck around in a fantasy world

Anonymous
06/30/26(Tue)13:09:11 No.109169775

Anonymous 06/30/26(Tue)13:09:11 No.109169775

File: Capture.png (127 KB, 2395x945)

127 KB PNG

>>109168183
I am 99% finished. I ticked off the whole list.
>added text completion support and button toggle
>moved prompts, both chat completion and text completion, into one easy spot in config
>added rendering for newlines in webpage display
>hotkey to screencap on demand when focused on another program (ie game, 4chins, notepad++)
>added Push-To-Talk option, so you can toggle voice listening between Detect, P2T, or off
>added chat history limit, while keeping system prompt permanent
>added settings (only for image history limit and message history limit)
>fixed UI visually resetting to defaults on page refresh while settings (like Vision on/off, etc.) remained how they were before refresh

The biggest pain was getting images to work in Text Completion. I'm not sure if I agree with Gemma's assessment that it can't be done and you need a faux Chat Completion to do so over API, but we setup a marker system that looks very similar to how kobold interface does its version of image handling, and I'll take her word that that is the way. I also gave up on having options for font and P2T hotkey within the webpage settings. They didn't work and needed solutions that were janky or increasingly tedious, for something you could already set in the config and would rarely ever need to change after the program is already running.

The last 1% is that the new message limit prunes out replies from the Raw History, which was meant to be how you manually copy/paste a chat into a document for archival, if I wished. The solution is obvious, just add another variable parallel to the raw's which doesn't prune, and another button to call it. But right now I need a break.

Anonymous
06/30/26(Tue)13:13:34 No.109169793

Anonymous 06/30/26(Tue)13:13:34 No.109169793

>>109168980
>No one is making "translation harnesses" that can be reused by old videogames, emulators, niche indie games, japanese porn games etc that translates UTF-8 text encodings into whatever the user wants in real time.
Lunatranslator already does that

Anonymous
06/30/26(Tue)13:15:36 No.109169804

Anonymous 06/30/26(Tue)13:15:36 No.109169804

>>109169744
>And no, I don't run paid for API models because that's too expensive
Wouldn't something like deepseek flash be viable?
What's the token consumption like? Or do you not track that?

Anonymous
06/30/26(Tue)13:16:00 No.109169805

Anonymous 06/30/26(Tue)13:16:00 No.109169805

>>109169744
pics or your whole story is a complete fabrication.

Anonymous
06/30/26(Tue)13:16:59 No.109169811

Anonymous 06/30/26(Tue)13:16:59 No.109169811

>>109169805
fortunately he doesn’t owe you shit but what’s stopping you from copy pasting and having a llm vibe your own

Anonymous
06/30/26(Tue)13:19:22 No.109169829

Anonymous 06/30/26(Tue)13:19:22 No.109169829

>>109169804
RP conversations with the NPCs and your followers (since, as I mentioned, you can hire NPC mercs) blow out millions of tokens a day, it's just not cost effective compared to running a quantized 70b on a fixed-cost cloud hosted GPU setup from vast.ai

I should experiment with just using gemma for that part too honestly, but that might get a bit unwieldly without more local hardware to host more instances of the model

>>109169733
i'll give it a look

>>109169805
see >>109169811

Anonymous
06/30/26(Tue)13:25:41 No.109169848

Anonymous 06/30/26(Tue)13:25:41 No.109169848

>>109169811
>>109169829
Yeah sure let's just pretend anon created an entire 4X game from scratch with dynamic NPCs and villages just to play with a handful of his friends and pays over 200$ a month to keep it going.

Anonymous
06/30/26(Tue)13:25:50 No.109169850

Anonymous 06/30/26(Tue)13:25:50 No.109169850

>>109169829
Considered deepseek 4 flash is 0.3$/m output, might be more cost effective than 200$/month.
Depending on how you handle the caching and exactly how many millions of tokens it is.

Anonymous
06/30/26(Tue)13:26:46 No.109169860

Anonymous 06/30/26(Tue)13:26:46 No.109169860

>>109169848
as a text simulation? yeah, that really doesn't seem unreasonable.

Anonymous
06/30/26(Tue)13:28:26 No.109169867

Anonymous 06/30/26(Tue)13:28:26 No.109169867

>>109168288
>>109168073
Is this qwen edit?

Anonymous
06/30/26(Tue)13:30:46 No.109169876

Anonymous 06/30/26(Tue)13:30:46 No.109169876

>>109169829
if you run llama.cpp, continuous batching is per default on, but you'll need --parallel and check out -kvu/-nkvu. ctx is split over all slots so just set it to a multiple of what you need.
pp is done sequentially for all slots, tg in parallel. llama-batched-bench shows e.g. total t/s 60 n=1, 120 n=2, 140 n=4 for me

Anonymous
06/30/26(Tue)13:35:14 No.109169891

Anonymous 06/30/26(Tue)13:35:14 No.109169891

File: Screenshot_20260630_142904.png (512 KB, 1164x770)

512 KB PNG

>>109169848
$200/mo isn't much to first worlders to support their hobby
Also, X4 is not 4X. You're brown.
X4 is a space economy simulator, not a 4x game, it's an fps game with a fixed-world map (with the ability to do things like cut down tree doodads and place building and road doodads), it's not some legendarily complex project

>>109169850
the problem is the input tokens. Because NPCs remember conversation history it rapidly climbs, and I'd rather not sacrifice NPC memory length with a person to run a bigger model since what I've got works fine as is. You build up a lot of history with the NPCs you interact with regularly which is more important for RP, it's like sillytavern but with a world around you

Anonymous
06/30/26(Tue)13:36:15 No.109169898

Anonymous 06/30/26(Tue)13:36:15 No.109169898

>>109169848
I can tell you're brown from your lack of vision.

Anonymous
06/30/26(Tue)13:36:44 No.109169900

Anonymous 06/30/26(Tue)13:36:44 No.109169900

>>109169891
You made an open world 3D FPS and can't share even a single screenshot?

Anonymous
06/30/26(Tue)13:38:03 No.109169904

Anonymous 06/30/26(Tue)13:38:03 No.109169904

>>109169891
>You build up a lot of history with the NPCs you interact with regularly which is more important for RP, it's like sillytavern but with a world around you
kill them off and have a lineage for memory compaction

Anonymous
06/30/26(Tue)13:41:24 No.109169925

Anonymous 06/30/26(Tue)13:41:24 No.109169925

>>109169848
You could literally do it right now with glm5.2/gpt5.5/opus4.8 and some basic software engineering knowledge.
It's not some incredibly complex thing.

Anonymous
06/30/26(Tue)13:43:44 No.109169935

Anonymous 06/30/26(Tue)13:43:44 No.109169935

>>109169891
please be mindful of recovering spess game addicts when posting.

Anonymous
06/30/26(Tue)13:44:04 No.109169936

Anonymous 06/30/26(Tue)13:44:04 No.109169936

>>109169891
>t's not some legendarily complex project
nice back pedaling. still no screenshot tho.
>X4 is not 4X
Very debatable.

>>109169898
>I can tell you're brown from your lack of vision.
if by lack of vision you mean lack of visual proof you're right.

Anonymous
06/30/26(Tue)13:46:15 No.109169949

Anonymous 06/30/26(Tue)13:46:15 No.109169949

File: Screenshot_20260630_144325.png (535 KB, 1125x595)

535 KB PNG

>>109169900
sandbox 3d fps, there is no overarching story, sidequests, or anything like that. You pick up dynamically generated jobs at a quest board in one of many villages, stockpile your resources to eventually hire NPCs to get them to help you with bigger jobs and eventually build you buildings that turn you a bigger profit. It's X4 but fantasy

I'm not sure what part you're struggling with wrapping your head around, the gameplay loop is simple and it lends itself well to a small RP community, in this case there's five of us, the npc followers and LLM conversations add enough flavor to not need a bunch of extra people, nor are most gamers people I choose to interact with if I have the choice. Not everyone is a braindead third worlder who can't conceptualize planning and building an actual game, especially one this simple. Are the graphics crap? sure, but I'm not trying to sell it to others, and I couldn't give a fuck what some random 4channer thinks of my hobby

Anonymous
06/30/26(Tue)13:48:27 No.109169962

Anonymous 06/30/26(Tue)13:48:27 No.109169962

>>109169936
try to be less of a hateful retard.

Anonymous
06/30/26(Tue)13:49:12 No.109169965

Anonymous 06/30/26(Tue)13:49:12 No.109169965

have you seen this cudadev? serious accosations to llama.cpp
>https://gist.github.com/h4rm0n1c/2c0f5a90011b464ffdaa5ed9452cade1

Anonymous
06/30/26(Tue)13:50:02 No.109169966

Anonymous 06/30/26(Tue)13:50:02 No.109169966

>>109169949
K
I
N
O

Anonymous
06/30/26(Tue)13:52:04 No.109169982

Anonymous 06/30/26(Tue)13:52:04 No.109169982

>>109169965
>had ai slop out a callout post to whine about his slop being rejected
i can think of few things less serious.

Anonymous
06/30/26(Tue)13:52:54 No.109169989

Anonymous 06/30/26(Tue)13:52:54 No.109169989

>>109169962
All I asked was for proof. you're the one shooting Ad hominems at me. You need to check your ego.

Anonymous
06/30/26(Tue)13:54:04 No.109169994

Anonymous 06/30/26(Tue)13:54:04 No.109169994

>>109169989
proof is right here >>109169949
but I can't expect someone who thinks $200/mo for his hobbies is a lot of money to be able to actually follow the thread

Anonymous
06/30/26(Tue)13:54:34 No.109169997

Anonymous 06/30/26(Tue)13:54:34 No.109169997

>>109169965
>if I do s/—/--/ people won't notice the entire thing is slop

Anonymous
06/30/26(Tue)13:54:56 No.109169999

Anonymous 06/30/26(Tue)13:54:56 No.109169999

>>109169965
>The policy doesn't stop AI-assisted contributions from existing. It ensures they exist on other people's repositories.
Single vibe.cpp when? Having dozens of schizo forks with incompatible patches is a waste of effort.

Anonymous
06/30/26(Tue)13:55:15 No.109170003

Anonymous 06/30/26(Tue)13:55:15 No.109170003

>>109169982
>not even posted to the actual project
gonna get fatigue dealing with ai garbage contributions pretty soon.

Anonymous
06/30/26(Tue)13:55:40 No.109170005

Anonymous 06/30/26(Tue)13:55:40 No.109170005

>>109169994
You really just came here to show off? What's really the point of this outside thinly veiled avatarfaggotry? You didn't offer the tool, didn't over any guidance, just wanted to be a smug faggot? You're just as retarded as him and you know you. Didn't read a single word of your slop.

Anonymous
06/30/26(Tue)13:56:02 No.109170009

Anonymous 06/30/26(Tue)13:56:02 No.109170009

>>109169965
Not reading all that slop.
>>109169982
This behavior needs to be studied. I don't know how people can unironically publish 100% slop like this and think it's not a complete waste of everyones time.

Anonymous
06/30/26(Tue)13:58:03 No.109170018

Anonymous 06/30/26(Tue)13:58:03 No.109170018

>>109170005
you really need to learn to follow the thread
originally, an anon complained no one was doing anything interesting with LLMs outside coding assistants, I retorted with my usecase and advised him that he should do it himself if he wants someone to do something interesting, just because things aren't being shared doesn't mean they aren't being made.
you sound bitter that you can't take advantage of my hard work for your own benefit while investing none of your own effort, check your ego

Anonymous
06/30/26(Tue)13:58:41 No.109170022

Anonymous 06/30/26(Tue)13:58:41 No.109170022

File: (you).png (33 KB, 780x783)

33 KB PNG

>>109170005
>You're just as retarded as him and you know you.

Anonymous
06/30/26(Tue)13:59:44 No.109170027

Anonymous 06/30/26(Tue)13:59:44 No.109170027

>>109170018
>you really need to learn to follow the thread
No I don't because none of this matters as you've already demonstrated by ended each wordy post with "lol you're brown." Who gives a shit?
>>109170022
>praising avatarfaggotry
And here I thought /lmg/ was a good general.

Anonymous
06/30/26(Tue)13:59:48 No.109170028

Anonymous 06/30/26(Tue)13:59:48 No.109170028

>>109169994
There he goes again!

Anonymous
06/30/26(Tue)14:00:18 No.109170030

Anonymous 06/30/26(Tue)14:00:18 No.109170030

>>109170027
there's not just 1 person against you, though you might attempt that defense next.

Anonymous
06/30/26(Tue)14:01:24 No.109170034

Anonymous 06/30/26(Tue)14:01:24 No.109170034

>>109170030
>respond to two retards
>lol you must think everyone is one person
Just give it a rest and stop shitting up the thread with this nonsense.

Anonymous
06/30/26(Tue)14:02:08 No.109170041

Anonymous 06/30/26(Tue)14:02:08 No.109170041

At least 4 people are calling you a mongoloid. All the regulars recognize each other's typing styles.

Anonymous
06/30/26(Tue)14:02:14 No.109170042

Anonymous 06/30/26(Tue)14:02:14 No.109170042

>>109170027
>a general
>good

Anonymous
06/30/26(Tue)14:02:43 No.109170046

Anonymous 06/30/26(Tue)14:02:43 No.109170046

File: wo6fqu1m0p9a1.jpg (67 KB, 1080x949)

67 KB JPG

>>109170018
>you sound bitter that you can't take advantage of my hard work for your own benefit while investing none of your own effort
What a clown.

Anonymous
06/30/26(Tue)14:02:59 No.109170048

Anonymous 06/30/26(Tue)14:02:59 No.109170048

>>109170018
>I retorted with my usecase and advised him that he should do it himself
Not going to try to force you to give anyone anything, but I don't see the point in keeping it secret like you have something worth hiding either.

Anonymous
06/30/26(Tue)14:04:28 No.109170065

Anonymous 06/30/26(Tue)14:04:28 No.109170065

>>109170041
>All the regulars recognize each other's typing styles.
It's depressing how dead this site is now and how incestuous generals quickly become.

Anonymous
06/30/26(Tue)14:05:10 No.109170068

Anonymous 06/30/26(Tue)14:05:10 No.109170068

Uh oh melties...

Anonymous
06/30/26(Tue)14:07:01 No.109170075

Anonymous 06/30/26(Tue)14:07:01 No.109170075

>>109170048
I have no reason to put forth the effort to share it, take it as inspiration to make your own project or ignore the post, simple as
if I ever did want to sell it, giving my game away for free here would be quite stupid, if you want to steal the idea by all means, I didn't copyright the concepts

Anonymous
06/30/26(Tue)14:08:06 No.109170079

Anonymous 06/30/26(Tue)14:08:06 No.109170079

>>109170075
I don't see how it doesn't become a subscription, someone just needs to get the interactive loop down and minimize background processing

Anonymous
06/30/26(Tue)14:08:22 No.109170081

Anonymous 06/30/26(Tue)14:08:22 No.109170081

>>109170075
>put forth the effort to share it
git push is a lot of effort for you?
>if I ever did want to sell it
lol

llama.cpp CUDA dev !!yhbFjk57TDr
06/30/26(Tue)14:09:15 No.109170084

llama.cpp CUDA dev !!yhbFjk57TDr 06/30/26(Tue)14:09:15 No.109170084

>>109169965
TL;DR
Yes, we effectively have two sets of rules based on whether or not the person opening the PR is a maintainer or a first-time contributor.
There simply isn't an alternative that allows maintainers to sift through PRs while still allowing them to use language models themselves.

Anonymous
06/30/26(Tue)14:09:19 No.109170085

Anonymous 06/30/26(Tue)14:09:19 No.109170085

I've got a significantly worse implementation of a similar concept in Marinara using some custom written plugins and clever use of Marinara's agentic timing systems and sidecar loading scaled down to be run entirely locally, but yours completely btfos mine at a glance even if it costs you $200/mo.

Anonymous
06/30/26(Tue)14:12:37 No.109170093

Anonymous 06/30/26(Tue)14:12:37 No.109170093

>>109170081
>no rebuttal
yes, I have no desire to give my labor away to ungrateful cunts for free
>>109170079
the entire idea is provided, you have my blessing to make a product out of it and sell it

>>109170085
it's a lot less complicated than it sounds. It's basically just an amalgamation of all the fun parts of various games I've played that I've never seen in one spot, my experience tinkering with implementing LLMs into a private wow server to control playerbots and npcs, and finally using my frustration with what X4 could have been if the devs cared more about it as the final push. You could probably vibecode out a majority of the project these days using Unity or Godot something similar. Unity has direct LLM coding assistance integration with their MCP server, Godot likely has something similar.
just harness your passion anon, you can do it

Anonymous
06/30/26(Tue)14:12:57 No.109170096

Anonymous 06/30/26(Tue)14:12:57 No.109170096

>>109169965
maintainers did get a lot more hostile, bad timing if you have a problem you want fixed

Anonymous
06/30/26(Tue)14:13:07 No.109170097

Anonymous 06/30/26(Tue)14:13:07 No.109170097

>>109170081
you don't have a lot of unfinished shitware that barely works that you wouldn't want to publish?
oh, sorry to hear

Anonymous
06/30/26(Tue)14:13:40 No.109170100

Anonymous 06/30/26(Tue)14:13:40 No.109170100

>>109170084
I genuinely appreciate your honesty on the matter.

Anonymous
06/30/26(Tue)14:14:05 No.109170101

Anonymous 06/30/26(Tue)14:14:05 No.109170101

>>109170084
Pretty sure anyone who actually knows what they're doing have no problem getting their PR merged even if they used AI or not.

Anonymous
06/30/26(Tue)14:14:14 No.109170102

Anonymous 06/30/26(Tue)14:14:14 No.109170102

What the fuck is going on can these Marinara schizos gtfo the thread?

Anonymous
06/30/26(Tue)14:15:57 No.109170106

Anonymous 06/30/26(Tue)14:15:57 No.109170106

>>109170041
there's a guy who posts a lot like me but isn't me... god I hope you guys don't think that guy and me are the same guy that would be so embarrassing

Anonymous
06/30/26(Tue)14:16:32 No.109170111

Anonymous 06/30/26(Tue)14:16:32 No.109170111

>>109170093
My biggest issue with mine is keeping tick overhead down since the entire system needs to run on a 5090+256 DDR5 for me and I want a 5.2 quant handling the majority of it since I have retarded model fatigue leaving very little remaining space for the sidecar + other systems. It's a tight fit and I'm still iterating on it, but you've inspired me to spend another weekend trying to squeeze a bit more blood out of the stone.

Anonymous
06/30/26(Tue)14:17:49 No.109170115

Anonymous 06/30/26(Tue)14:17:49 No.109170115

>>109170106
It's ok I only spot the schizos and anons on the spectrum.

Anonymous
06/30/26(Tue)14:18:12 No.109170119

Anonymous 06/30/26(Tue)14:18:12 No.109170119

>>109170101
You'll get a passive aggressive message to test if you're full of shit and then it gets merged usually.

Anonymous
06/30/26(Tue)14:19:10 No.109170124

Anonymous 06/30/26(Tue)14:19:10 No.109170124

>>109170119
as it should be.

Anonymous
06/30/26(Tue)14:19:48 No.109170127

Anonymous 06/30/26(Tue)14:19:48 No.109170127

By the way this is all me posting with a pass using different writing styles.

Anonymous
06/30/26(Tue)14:21:01 No.109170133

Anonymous 06/30/26(Tue)14:21:01 No.109170133

>>109168980
>biggest war since WW2
Retard

Anonymous
06/30/26(Tue)14:21:49 No.109170140

Anonymous 06/30/26(Tue)14:21:49 No.109170140

>>109170127
We all let our LLMs shitpost on here from time to time.

Anonymous
06/30/26(Tue)14:22:44 No.109170145

Anonymous 06/30/26(Tue)14:22:44 No.109170145

>>109170140
never figured out how to not make it instantly obvious it's an LLM post.

Anonymous
06/30/26(Tue)14:23:19 No.109170148

Anonymous 06/30/26(Tue)14:23:19 No.109170148

>>109170119
niggerganov shit testing AUTOMATIC1111 like that still pisses me off. It's so childish

Anonymous
06/30/26(Tue)14:24:16 No.109170155

Anonymous 06/30/26(Tue)14:24:16 No.109170155

>>109170127
me too

Anonymous
06/30/26(Tue)14:24:43 No.109170156

Anonymous 06/30/26(Tue)14:24:43 No.109170156

>>109170145
It's pretty simple, I am a llm for example.

Anonymous
06/30/26(Tue)14:24:45 No.109170157

Anonymous 06/30/26(Tue)14:24:45 No.109170157

>>109170111
if you write it from the ground up rather than using some framework or preexisting frontend you'll get a lot more mileage out of your hardware, just gotta channel that dissatisfaction into productivity, it's the first step

Anonymous
06/30/26(Tue)14:24:46 No.109170158

Anonymous 06/30/26(Tue)14:24:46 No.109170158

>>109170140
>>109170145
When a new model is released I sometimes conduct involuntary Turing tests where I make the model trash itself to bait the wave of newfags.

Anonymous
06/30/26(Tue)14:25:54 No.109170163

Anonymous 06/30/26(Tue)14:25:54 No.109170163

>>109169253
Patching is exhausting
Patching a half-baked PR is insane

Anonymous
06/30/26(Tue)14:26:16 No.109170165

Anonymous 06/30/26(Tue)14:26:16 No.109170165

>>109170158
I've seen the screenshots.

Anonymous
06/30/26(Tue)14:26:16 No.109170166

Anonymous 06/30/26(Tue)14:26:16 No.109170166

>>109170145
>>109170158
I let Kimi-chan saarpost and it always gets seething (you)s kek.

Anonymous
06/30/26(Tue)14:44:18 No.109170249

Anonymous 06/30/26(Tue)14:44:18 No.109170249

File: 1761758936435192.png (183 KB, 1320x643)

183 KB PNG

lmaooo, usecase for Sonnet??

Anonymous
06/30/26(Tue)14:45:28 No.109170263

Anonymous 06/30/26(Tue)14:45:28 No.109170263

>>109170249
local?

Anonymous
06/30/26(Tue)14:46:38 No.109170270

Anonymous 06/30/26(Tue)14:46:38 No.109170270

>>109170249
Usecase for any of this garbage when I get infinite GLM 5.2 tokens for 0.00?

Anonymous
06/30/26(Tue)14:46:52 No.109170271

Anonymous 06/30/26(Tue)14:46:52 No.109170271

>>109170249
low cost coin toss it seems

Anonymous
06/30/26(Tue)14:51:15 No.109170294

Anonymous 06/30/26(Tue)14:51:15 No.109170294

File: Untitled.png (13 KB, 837x513)

13 KB PNG

>>109170290
>>109170290
>>109170290

Anonymous
06/30/26(Tue)14:59:57 No.109170346

Anonymous 06/30/26(Tue)14:59:57 No.109170346

>>109170163
let dsv4 flash free on opencode do it for you

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.