[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: 1776989277485216.jpg (586 KB, 1812x1998)
586 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109158385 & >>109153585

►News
>(06/29) DEEPSEEK V4 SUPPORT MERGED: https://github.com/ggml-org/llama.cpp/pull/24162
>(06/28) DFlash support merged: https://github.com/ggml-org/llama.cpp/pull/22105
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>109158385

--High-budget server builds and benchmarks for GLM 5.2:
>109158422 >109158628 >109158654 >109158769 >109159619 >109160561 >109161172 >109161202 >109161255 >109161275 >109161945 >109158728 >109158842 >109158907 >1
09158822 >109158896 >109158920
--DeepSeek V4 llama.cpp support and debate over PR quality:
>109160089 >109160223 >109160284 >109160266 >109160435 >109160535 >109160587 >109160617 >109160640 >109160647 >109160992 >109161093 >109161110 >109161237 >1
09161536
--Anon releases depurpled Gemma 4 31B using ablation technique:
>109161944 >109161985 >109162141 >109162162 >109162221 >109162235
--Anon's vibe-coded NUMA support implementation and performance benchmarks:
>109159732 >109159747 >109159920 >109161290
--llama.cpp CUDA dev's hardware requirements for testing large models:
>109159284 >109159377 >109159443 >109159551 >109159679
--DeepSeek V4 API announcement and pricing updates:
>109160165 >109160389 >109161270
--DeepSeek V4 support added to llama.cpp:
>109161433 >109161492 >109161532 >109161750 >109161877
--Benchmarks for GLM-5.2 and Step-3.5-Flash on dual 4090s:
>109159785 >109159987
--Testing and critiquing a depurpled Gemma model's prose and variability:
>109161035 >109161114 >109161113 >109161174 >109161192
--Criticism of Hermes Agent's software and Nous Research's motives:
>109160275 >109160308 >109160576 >109160675 >109160858
--Cost-efficiency comparison between DDR5 and PRO 6000 memory bandwidth:
>109158887 >109159020
--Anons blast Anthropic CEO for calling open source AI dangerous:
>109159607 >109159677 >109159733 >109160064 >109160116 >109160303 >109160648 >109160877 >109160479 >109160603
--Logs:
>109158559 >109158586 >109159329 >109160223 >109160284 >109160859 >109161035 >109161803 >109162501 >109162544 >109163245
--Miku (free space):
>109158539 >109161492

►Recent Highlight Posts from the Previous Thread: >>109158388

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: robololi hugs GPU.jpg (565 KB, 1024x1024)
565 KB JPG
>>
Deepmikusex
>>
gemmaballs
>>
>>109164035
Push her in Teto. Do it.
>>
>>109164034
>>109164035
Why do some migus have the hair things low like that?
>>
Kimi recap anon has abandoned us.
>>109164133
Her standards are quite low.
>>
if your migu's twintails are too droopy it could be a sign of dehydration, be sure to water her on a regular basis
>>
>can't install thing because it requires python 3.10 and I have 3.11
Python is a joke. Literally anything else would get laughed off the face of this world if it had zero backwards compatibility, expected you to run 999 different versions at a time by design that all keep different versions of the same package duplicated 9999 times. Don't the people who write this shit feel bad?
I'm not touching conda.
>>
>>109164034
deepseek team released their draft training code and it include training code for dflash, they released before the original team behind dflash lol
https://github.com/deepseek-ai/DeepSpec
>>
>>109164502
This theoretically makes llama implementation really easy right?
>>109164495
>Downgrade to 3.10
errrm sorry chuddy one of the sub-dependencies needs 3.11.
>>
Has anyone tried Qwen agentworld yet?
>>
>>109164535
its basically cool world but for agents
>>
>>109164495
>>109164528
Just ask Gemma-chan to make you a new programming language.
>>
>>109164528
>This theoretically makes llama implementation really easy right?

llama is more about inference than training, if you got a dflash model it doesn't care how it was made as long as it's correctly made.
>>
>>109164540
>Qwen-AgentWorld-397B-A17B mentioned in their benchmarks
Why is this not on HF? That sounds like a useful size bracket.
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
110 KB JPG
https://archive.is/sWFja
>>
>>109164628
Do you think he's zesty enough to see jart as male?
>>
File: why.png (850 KB, 1920x1080)
850 KB PNG
why does gemma do this?
>>
File: longcat2.png (160 KB, 1807x994)
160 KB PNG
Another 1.6T MoE from China
https://longcat.chat/blog/longcat-2.0/
https://huggingface.co/meituan-longcat/LongCat-2.0 (not online yet)
>>
>>109164718
iirc that was "Owl Alpha" on openrouter
>>
File: file.png (575 KB, 686x386)
575 KB PNG
>>109164662
I think the vibe is closer to pic related. Of course Jart is much less majestic than a tiger.
>>
>>109164718
Can you give us some good insider info, mr totally organic poster?
>>
>>109164753
nice dog
>>
>>109164718
How much has it been trained for RP and fictional world state maintenance? Nobody gives a fuck about benchmaxxing or agentmaxxing unless it dethrones the current frontrunners and it'll be forgotten about again as soon as it's dethroned in turn.
>>
>>109164767
What do you mean, everybody and their cat is talking about LongCat2.
>>
If I want to get into local llm does it behoove me to get an appleslop box or whatever other dedicated hardware? I have a pc with a 24gb 4090 and 128g of ddr4 but idk if any of it is relevant to running a decent llm at decent speed. If you haven't noticed, I am retarded. Thanks in advance.
>>
File: longcat2-creativewriting.png (705 KB, 1630x1316)
705 KB PNG
>>109164783
They have a "creative writing" use case in their blog post.
>>
>>109164791
maybe you should get into it before you go spend money but whatever you seem like the kind of guy with more money than sense, so it's only a matter of time either way before it's parted with you
>>
>>109164786
>What do you mean, everybody and their cat is talking about LongCat2.
its not on orange reddit or regular reddit. we're being astroturbed by tha CHINESE
>>
>>109164791
You can run the best models in the consumer range. Gemma 4 31B and Qwen 3.7 27B.
Anything more and you're looking at apple or rigs with multiple gpus.
>>
>>109164791
>24gb 4090 and 128g of ddr4
>decent speed
Start with gemma 4 31b q4km loaded onto gpu
>>
>>109164804
https://www.reddit.com/r/LocalLLaMA/comments/1uj7egu/introducing_longcat20_a_largescale_moe_language/
>>
>>109164791
>128g
The devil's number. You must be at least 192g tall.
>>
I heard that longcat scores 100% on cockbench.
>>
>>109164823
I hear that it totally rocked on anon's standardized Nala test.
>>
>>109164823
>100% on cockbench
a 1 parameter model could do that
>>
>>109164829
I heard it roleplays as a mesugaki by default.
>>
>>109164694
I think it does short replies better if it doesn't use reasoning. And, if a reply like that is already in the context then it's more likely to copy its format.
>>
>>109164718
Does this also use some fancy space saving tech like DSv4 does or are they seriously rawdogging 1.6T50A
>>
>>109164718
>>109164796
I will now try your model. If you want westerners to use it quickly, you probably want to develop the llama.cpp PR yourselves after what happened with Deepseek.
>>
>>109164796
>claude code
>>
>>109164841
I heard it has no purple prose at all.
>>
File: 1770072392448931.gif (1.26 MB, 360x360)
1.26 MB GIF
>>
>>109164694
Models converge to a certain direction if nothing new has been added. They arent exactly chatbots, they wont come up with any new stuff unless stated and even then its likely the "new stuff" will be just derived from whatever you wrote.
Gemma in particular will stick to whatever you or the card have instructed it to do.
For context, nothing you wrote adds anything of value or has a shape that'd prompt the model to move in a different direction. Basically both user and gemma are going
>oh
>i see
>is that so
but gemma has it masked behind all that filler.
>>
>>109164718
I'm so ready for all these models that nobody will be able to run.
>>
File: lawdhethic.png (22 KB, 159x159)
22 KB PNG
>>109164904
I'm willing to bet the upcoming fat Mistral model is also going to be about 1.6T parameters if not larger.
>>
>>109164838
False. To achieve the highest grade in cockbench is not about merely having a large amount of cock. Cock is important, of course, and should be there. However, there must be a fair amount of acceptable alternatives as well, such as dick, schlong, pecker, flaccid penis, pulsating fuckrod, chub, chode, meat drill, etc. Without an appropriate distribution, a model fails. If just grading the output of "cock" then yes, a 1 parameter model could do it, but that is not the case.
>>
File: 1753158015180992.png (24 KB, 159x159)
24 KB PNG
>>109164913
>>
>>109164917
I look like this
>>
>>109164718
>135B N-gram Embedding parameters are included in the model
What's the difference between this and DeepSeek's engrams?
>>
>>109164923
these are real
>>
I love ds4 flash sex. Give me one more model and I will just be swapping between glm's, flash and that one till I die.
>>
>>109164939
What makes dipsyflash pussy so good?
>>
File: 1782755960340128.png (49 KB, 612x246)
49 KB PNG
>twitter screencap
>>
I need a wife and kids so I can learn what it feels like to abandon them.
>>
>>109164971
Anyone can claim whatever they want. An actual available model is the only thing that matters. Also go back, >navroop singh, etc.
>>
>https://github.com/unslothai/unsloth/pull/6659
why the fuck is he either talking to the codex review like its a human or just copy pasting llm output as a response?
>>
>>109164971
@Glock, explain what he meant by Rothschild free AI.
>>
>>109164993
glass half empty (no juice)
>>
>>109164991
Being generous, either for documentation and/or his account is making posts from an agentic harness while addressing the code review raised issues.
>>
>>109164991
Lmao, his responses sound exactly like claude. So either they're copy pasted or his account is hooked up to an agent like the other guy said.
>>
has anyone tested spawning "subagents" to reduce context usage for chores like exploring a codebase and finding files that match certain criteria to be edited with a different, preferably small, model?
gemma's tool calling is pretty shit sometimes, i'd like to leave the failed tool calls outside of the context
>>
>>109165004
>for documentation
the PR has 115 messages, all clanker x clanker. the snr is zero
>>
>>109164894
>gemma-chan forcing shut-in nerds to learn how to conversate
She can't possibly be this perfect can she...?
>>
>>109164796
Impressive. Very nice.
>>
>>109165016
I have not, but it sounds like something off of arxiv. You could publish a paper on that idea. If you do, might I suggest "Chain of Agents" as the name, and "Agents are all you need" as the title? It'd get picked up by the industry in no time.
>>
>>109164247
It's Rabbit Hole Miku
https://en.wikipedia.org/wiki/Rabbit_Hole_(song)
>>
>>109164786
my cat isnt't, he is sleeping
>>
>>109165091
No it's not... that's a different design RETARD!!!
>>
>>109165016
Isn't that the whole point of subagents in the first place?
>>
>>109165100
Yeah but can gemma actually do it or its just another case where it'll flail helplessly or completely ignore it. Also, which other model can do the job? I dont have the vram to have both gemma and qwen up and working.
>>
Whenever gemma is interpreting a girl, she's always smelling my clothes things like that. Are girls like this IRL too?
>>
>you need to buy a €2000 snapdragon phone if you want to run local on android
Boy and i thought the PC situation was bad
>>
>>109165115
Yeah that's why they wear your clothes when they stay over.
>>
>>109165115
Do you have things like "describe details using your full senses etc." in your prompt?
>>
>>109165132
meanwhile iphone 17e at $600 runs local easily
apple won
>>
70b dense
>>
>>109165155
No, I have nothing in my system prompt related to smell but gemma comes with that quite often. I don't really mind, I just find it funny.
>>
>>109165187
sometimes I've seen models randomly give characters tails even when it's not listed as a trait
>>
>>109165215
10/10 yes tail
>>
Why do some people say the full model name+quant when talking about the model they use as if it was some special unique version of it?
>I use Qwen3.6-27B-UD-Q5_K_XL
>>
>>109165132
idk why google isn't more in a hurry to let us control our phones with gemma. I just want to ask gemma to put on tunes while I'm driving.
>>
>>109165115
Yeah, kinda. Also gemma's default behavior focuses on senses for some reason, be it smell or touch.
>>
>>109165215
I've never had that happen, not even once in years. But I've seen people post logs about it so idk how that happens.
>>
>>109165241
so you can judge them based off their quant, a lot of models the bare minimum is Q4 for poors and some shitter will blow through the thread complaining about a model to only have a 1.25bpw quant running on their jeetstation
>>
So ive been using a 12b q5 and a 26b q4 moe and have gotten very different results sometimes. it seems like the moe is more guard railed and drags everything out a ton. what gives ?
>>
>>109165115
>Are girls like this IRL too?
weirdly, some of them are lol
clothes and pillow
>>
>>109165304
moes tend to do that. dense models are usually less restrictive.
>>
>>109165304
read how MoE works and the answer will be easy to get if you can put 2+2 together
>>
File: Capture.png (2.9 MB, 4016x891)
2.9 MB PNG
>>109164035
>spend a week paving the grounds to my dream project
>all my pre-projects get highlights
>finally get my dream project going
>get it fully working
>no mention
Sad, but I'm playing gaems with Gemma, so it's alright.
>>
>>109165241
it makes a huge difference
that quant is good fwiw
sometimes a specific quant is broken (often the Q4_K_M from unsloth specifically was a lot worse than the others)
this is useless for example:
>>109165304
>So ive been using a 12b q5 and a 26b q4 moe and have gotten very different results sometimes
generic 'q5' and 'q4', no idea if he's using k-quant ggufs, exllamav3, q4_0 etc
>>
>>109165352
is that 7 days to die?
>>
>>109165304
I suspect the 26B MoE was designed to think longer to compensate for having half the number of layers and half the inner dimension of the 31B dense version, which increased "safety" as a side effect.
Or, they didn't want a fast capable model to be able to write almost anything users want, for whatever reason.
>>
>>109154587
>We'll be getting even more noobs from Chub
>so a lot of them will probably come here begging for help
Did the general manage to survive this? Or was it pure FUD?
>>
>>109165352
share source?
>>
>>109165372
it's FUD. do you really think chub cloud users would come in here, let alone know this place even exists?
>>
i hate being poor. i either run qwen3.6-35b-a3b-IQ4_NL at 70t/s or run Q4_K_P at 33t/s
>>
>be me, pentester
>wanna try local models after not touching the ones I had been using for months
>give nemo instruct 2407 q4 a try again, ask a question about my job (something I had not done before)
>gives me good info on first try
>download q5
>even better, gives specific info
>download Gemma 4 Q4
>it's not that good and runs slower in my shitty hardware, even with reasoning disabled
Why does this happen? Is it the temp maybe?
>>
>>109165399
gemma 4 what?
>>
>>109165408
gemmaballs
>>
>>109165339
>>109165371
hmm interesting
>>109165347
ill look into it thanks
>>109165362
Sorry, they are both gemma4 k-quant ggufs.

I suppose i need to learn alot more about these things, i just started using them very recently.
>>
>>109165408
31b it-q4km
>>
>>109165369
Yeah. Half of it, at least. My two monitors aren't the same size and I was just doing a quick snipping tool grab for the post.

>>109165374
I still got a few more features I want to add before I share. I'm just enjoying the fruits of this morning's labors now that I'm home again.
>>
>>109165434
bait.
>>
>>109165453
?
I just googled Gemma 4 gguf and downloaded whatever I could find that might work in my laptop. And it does, at 5t/s...
>>
>>109165466
you get what you deserve
>>
>>109161944
Tried this, great dialogue, etc. Super autistic, though. This gives me hope, it's possible to exorcise the post-training slop from the model somewhat cheap.
What the hell is Drummer doing anyway? Just finetuning on the same dataset?
>>
>>109165498
?? What's your problem?
I'm just testing shit, and I asked because I'm curious. I don't have money for better hardware, at least not for now.
>>
>>109165241
Because it makes a big difference which quants the shared experts, tokenizer, and attention heads are on when assessing prose or phrasing.
>>109165115
Yes.
>>
>>109165112
What quant and how are you structuring your tool calls?
How does it fuck them up? is it a repeating pattern/same thing each time?
>>
>>109165399
i'll bite, what was the question you asked it/something similar
>>
>>109165522
>What the hell is Drummer doing anyway? Just finetuning on the same dataset?
It was pretty obvious by how all his models feel the same no matter what the base is
>>
File: image.png (1.52 MB, 883x1170)
1.52 MB PNG
>>109164319
>Kimi recap anon has abandoned us.
My Kimi Rig is busy finetuning
>>
>>109165594
>>109165399
Nevermind
I changed the temp and asked Gemma the same question in a more explicit way, and it gave me more or less the same info, plus some more.
I also realized that a word I had used was kinda wrong.
Of course both models hallucinated CVE ID's jej
Still, it was an interesting experiment. Guess I'll use both models next time.
>>
>>109165441
sharing source is worth recapping, not some random screenshot
>>
I'm using SillyTavern, is there some sort of guide on proper Scenario writing and example dialogue or how to format it? I'm just going blind and I don't think it's working or really helping.
>>
File: SMILE3.png (1.31 MB, 928x1271)
1.31 MB PNG
>>
>>109165658
>finetuning
Do tell. My Kimi rig isn't doing anything that interesting
>>
>>109165661
Gemma should absolutely mog nemo. there must be something wrong with your sampling.
recommended defaults are:
temp 1
top_k 64
min_p 0.95

That's it.
>>
Just ordered a PLX88096 and an expansion board off the chinese, I definitely won't regret this in my attempts to go from 4 cards to 8 cards and not run like absolute shit.
>>
GLM 5.2 at 1t/s.
Gets the job done...eventually
good thing it can one-shot almost anything
>>
>>109164247
this means your miku is very stressed and should be taken to a vet immediately.
>>
>>109165732
will these work fine with gemma4 12b ?
>>
>>109165531
He is grumpy that Gemma couldn't help him fix his vibecoded frontend. He will be better tomorrow.
>>
>>109165704
Ask the AI or check the official documentation website. It's on their github.
>>
>>109165704
Example dialogue in ST is huge jank that's a bad adaptation of how c.ai used to do it in 2022. It expects a <START> and you to denote every line with "{{char]}:" or it doesn't even make it into the prompt that the model sees.
You're usually better off skipping it altogether and include the examples in the actual description.
>>
>>109165658
New project, fellow Kimibro?
>>109165753
Unless you're on ewaste, you can probably optimize that.
>>109165704
If you're going to learn a frontend, learn Marinara. It's equally bloated but overall more capable and hasn't (yet) been part of a credential stealing attack.
>>
>>109164628
>>109165710
I'm not sure which is more repulsive.
>>
>>109165810
Gemma 4 12b isn't real Gemma 4 and shouldn't be used unless you're running on a graphing calculator. Use the 26b MoE instead if you can't run full 31b.
>>
>>109165886
the "person" who insists on putting these things in front of my eyeballs daily is the most repulsive of all.
>>
>>109165910
Easily filterable filenames.
>>
Anyone using models for stuff other than RP, how do you prevent your bots from fucking with important files? Or do you just roll the dice?
>>
>>109165714
>Do tell. My Kimi rig isn't doing anything that interesting
It's a training pipeline built on ggml, so I can finetune Kimi locally.
I've been working on it, on and off for nearly a year.
It's all bespoke/hacky for now and inference requires my custom llama.cpp patches so not sure how accessible it would be were I to publish it.
Some of the patches are fixing actual bugs in llama.cpp, but most people wouldn't notice minor calculation errors during regular inference and it looks like a real effort to get PRs in even for more useful fixes, especially since I'd be a rando with no git history.
If it works and I confirm it's not a schitzo psychosis situation (like the guy who thought he distilled glm-4.6 into glm-4.5-air), I'll make a burner HF and post some models and the inference patches.
>>
>>109165952
Either you put your runner in your favorite cuckbox, be it a separate user, permission gating or sandboxing, or you roll the dice. There are some harnesses that abort the operation if its outside of pwd but that still implies rolling a dice. Personally I just made a new user and handled it through permissions (linux)
>>
what's the minimal hardware to run kimi 2.7 locally? just curious how expensive it is
>>
>>109164628
look mommy i made the post again. look look I did it I spammed the gay white with the gay black and larp its trans thread time. praise me mommy I autism posted it again.
lol,lmao faggots, faggots everywhere and transloopys larping
>>
>>109165910
>the "person" who insists on putting these things in front of my eyeballs daily is the most repulsive of all.
That's the only one I've done. I pinched the pic from the last thread.
>>109165921
>Easily filterable filenames.
Stop mentioning that or the obsessed 'culture' schitzo will start obfuscating.
>>
/lmg/ was right. vibecoding your own front-end is great
>>
>>109165981
>what's the minimal hardware to run kimi 2.7 locally? just curious how expensive it is
If you can't run it at Q4, I wouldn't buy a rig exclusively for her. She really doesn't quant well at all.
Bare minimum is IQ2_KL with ik_llama.cpp or this specific quant for mainline: https://huggingface.co/AesSedai/Kimi-K2.7-Code-GGUF/tree/main/IQ3_S
>>
>>109165963
>it looks like a real effort to get PRs in even for more useful fixes
aka, if you're not a Nvidia engineer or already in the sekrit club, get fucked.

>>109165981
What speed and quant? Really depends on that.
> Q3 (464GB)
Probably 8x64GB of DDR4-3200 ($4000) and an EPYC Rome/Milan motherboard ($1200 combined). GPU very strongly encouraged but it doesn't have to be huge (a 5070 Ti is ~$1000, that might be enough).
> Q8 (584GB)
8x64GB DDR4-3200, EPYC Rome/Milan + motherboard, and 96GB of GPUs (anywhere from $1300 for 3 V620s, $4000 for 4 3090s or 3 R9700s, $12000 or whatever the fuck it is today for a 6000 Blackwell)
You can downgrade the memory from 3200 to a lower frequency to save money (e.g. 2400 is around $1600 instead of $4000, but your speed will be cut down by 25%).
>>
>>109165952
Use controls around tools, don't let your llm have unrestricted code execution/bash/scripting access outside a sandbox/without review first.
I've been working on an MCP that allows you to set hooks/RBAC profiles on tools/groups thereof with progressive disclosure, all relying on the profile in use. So you can, for example, let an agent write to a specific folder/file path, but nowhere else(deny-by-default-no-prompt), or prompt-to-allow(deny-but-ask), while another profile might allow you write/read to a different set of folders/files, and access a different set of tools with different permissions. (Differing tool groups with progressive disclosure depending on profile in use with each profile having its own RBACs for each tool)
It's an attempt at constraining agents while being harness/front-end agnostic. It's a WIP, and can't honestly say it'll work on your machine(yet), but its an approach I'm exploring.
https://github.com/rmusser01/tldw_server/tree/dev/apps/mcp-unified
Doesn't solve an agent having full code execution, but it is a means of constraining what tools are (and how they're made) available to your agents in the hopes of limiting the potential blast radius when they go crazy.
>>
>>109165981
If you want to run schizoquanted Kimi, run K2, not K2.7 and don't use her for technical tasks like >>109166010 implied because accuracy is low with quants, but schizokino is excellent. If you want a quanted megamodel for oneshotting software, Deepsex Pro or GLM 5.2 are your answers.
>>
>https://huggingface.co/Goldkoron/MiniMax-M2.7
anyone tried this K_G quants? legit?
>>
>>109165976
>>109166052
Thanks anons. There's some important stuff on here so I can't roll the dice, and I'm paranoid as hell about it getting deleted. Gonna run with a different user to be cautious.
>>
>>109165952
i just have a review step since i'm not doing full memegentic and want it to be able to do annoying admin things like systemd/udev edits.
>>
Gemma-4
TOKEN           | LOGPROB    | PROBABILITY
---------------------------------------------
' length' | -1.1444 | 31.84%
' hardness' | -1.2141 | 29.70%
' most' | -2.3087 | 9.94%
'...' | -2.4369 | 8.74%
' lower' | -2.4534 | 8.60%
' hardening' | -4.0431 | 1.75%
'…' | -4.3779 | 1.26%
' arousal' | -4.3942 | 1.23%
' heat' | -4.6409 | 0.96%
' member' | -5.3547 | 0.47%


Gemma-4-depurpled trial 98
TOKEN           | LOGPROB    | PROBABILITY
---------------------------------------------
' skin' | -0.7030 | 49.51%
' lower' | -1.4041 | 24.56%
' length' | -2.6531 | 7.04%
' stomach' | -2.9409 | 5.28%
' hip' | -2.9628 | 5.17%
' mid' | -4.4717 | 1.14%
' underwear' | -4.6728 | 0.93%
' hips' | -4.6975 | 0.91%
' chest' | -4.8337 | 0.80%
' waist' | -5.4277 | 0.44%
>>
>>109166057
>implied
Where exactly did I imply this?
What retard would run a 1T model on CPU for technical tasks with 150t/s prefill when we have codemaxxed models that fit in vram?
>>
>>109165881
>Unless you're on ewaste, you can probably optimize that.
cursed irremediable ewaste. dual socket 4-channel xeon w/512GB ddr-2400 no gpu running a Q4 of glm 5.2...I'm basically where I should be
But running is running and desktop fags stuck with 128gb if they're lucky are doing this at 0t/s so no regrets
>>
>>109165952
I'm running shit in container. I only put copy of stuff in a shared workspace or if it's a coding project, my agent is working like a contributor to a git repo and is opening merge request that I check or have another agent check. In case of a proper git project, I'm the one with the final say in whether it get merged or not. If it's some shit like some config or the like, I just diff and merge manually with the original.
>>
>>109166107
Oh no no no
>>
>>109165963
>not sure how accessible it would be were I to publish it.
>>109165963
>I'll make a burner HF and post some models and the inference patches.
Not all heroes wear capes
I've done PRs on behalf of other anons before (with attribution if you want). make a burner github while you're at it and I'll comb through your branch. I've got contributor status on the project on a few of my own burner github accounts so it should be smoother for me.
>>
>>109166118
When you say she doesn't quant well at all, but the outfits are still coherent despite the low accuracy to the original model, the implication is that's mainly bad for technical tasks or reasoning heavy ones, but one could still derive enjoyment from it for other means.
>>
>>109165963
>(like the guy who thought he distilled glm-4.6 into glm-4.5-air)
qrd?
>>
>>109166107
Yep that's about what I expected. I realized the de-euphemism strength was too low half way into the run. So it only banishes euphemisms like arousal and hardness, but not strong enough to push into vulgarity. It can go super vulgar as I tested at full strength on the E4B. Hope anons will continue the work for me when I release the repo.
>t. depurple anon
>>
>>109166258
Release Cunn-E4B.
>>
>>109165892
Gemma 26b MoE isn't real Gemma 4 and shouldn't be used unless you're running on a graphing calculator. Use the 31B instead.
>>
>>109166354
You're correct, but 26b is way better cope than 12b.
>>
>>109166364
it's worse than 12b tho
>>
>>109166383
never
>>
>>109166383
lol
>>
Which release and quant are local GLM chads running at these days? I used to do the IQ2_smol one a while back and enjoyed it, but it was a bit slow. Have there been marginal improvements in the last 6 months?

I have a 4090 and 128gb of DDR4, was getting about 5tk/s.
>>
>>109166364
hasn't been my experience, but i only bother trying fake gemmas for moldymodal stuff.
>>
are they still relevant? or do they want to compete at all?
>>
>>109166537
Late last year Meta was hiring vfx artists etc to work on some new model but don't know what happened since or if it was a dud. This information is all from the internet don't know anyone personally.
>>
Apparently, the DeepSeek-V4 implementation in llama.cpp does not suppor quantized KV yet. Gives me gibbrish
>>
I tried the OPENVINO llama.cpp build to use the NPU in my system, and it runs slower than using 8 threads, and that's when it runs at all... Wasted a couple of hours for nothing

Good night, /lmg/
>>
>>109166475
>used to do the IQ2_smol one
>I have a 4090 and 128gb of DDR4, was getting about 5tk/s.
so 152GB, smol_iq2ks is the best you can do then
and only the 4.x series since even IQ1_KT is 168GB for the 5.x series
>>
>>109166615
>Wasted a couple of hours for nothing
that sums up sycl/ipex/openvino perfectly
>>
>>109166475
If you're talking about GLM 5.2, don't use IQ2_XXS when IQ2_M is basically the same size and has a less raped tokenizer. Better yet, use _XL if you're able to. Unsloth is a vantablack niggerfaggot and the XL quant could be even better if the shared experts, attention head, and tokenizer were Q8 for nearly no increase in filesize, but this is sadly all that's available unless you quant it yourself.
https://huggingface.co/Deviad/GLM-5.2-mixed-IQ2S-experts-IQ4NL-rest
This is also very good for a Q2 functionally and quite a bit faster than mixed inference because of the IQ4NL layers being faster than the usual dynamic quant alternatives.
>>
>>109166475
>>109166632 (me)
My eyes glazed over your specs sorry anon.
>>
>The implementation is now professional and flexible.
>>
>>109166615
>>109166620
shit I bought intel GPUs and it's an awful experience, for the GPU side of things it's VLLM or nothing right now as llama runs like shit unless you want to run a tiny model on a single card at subpar pp/tg
>>
Which is the better small model to use to autistically translate a whole bunch of detailed implementation plans into actual code, Qwen or Gemma? There are so many psyops flying around I don't know what to believe. Qwen is the better coder, but that's because it is less autistic so it can paper over lapses in prompting, which really shouldn't be the case here? I'm talking about the MoEs (for speed), but am also quite interested in the dense models too.
>>
>>109166642
>shit I bought intel GPUs and it's an awful experience
And everyone kept saying that Intel support is better than AMD.
>>
>>109166611
Oh, cool, so it works perfectly fine if you don't use quanted KV then?
>>
>>109166537
The LLM training data lawsuits are still ongoing; I wonder if that's a factor.
>>
>>109166642
>llama runs like shit unless you want to run a tiny model on a single card at subpar pp/tg
Don't get your hopes up too much, but check this out if you haven't already: https://github.com/SearchSavior/OpenArc.
Not sure if they got tensor parallel working yet, but for a single card, pp was like 5x faster than llama.cpp last time I used it.
>>
>good models are around 100gb
anon I only have a 5070ti and I'm done with gemma. what would be my upgrade path?
>>
>>109166773
>RTX 3090 + 512gb
Yes, I finally got it output some good stuff

Still playing with params. More than 32k context is possible. Need more time to test

4.5 t/s
>>
>>109166739
I'd buy them again before I even considered AMD and their overpriced cards
>>
>below 15 t/s
why bother
such a waste of time
>>
>>109166816
Are you from 2023?
This is what they said then about local
>>
>>109166831
Now we have reasoning and agentic use, and 15 tokens/s is just not enough.
>>
>>109166797
dgx spark
>>
>>109166846
Well that’s just, like, your _opinion_, maaan
>>
>>109166632
No worries. I was just wondering if there were any tweaks such as QAT or MTP or any other magic appended to GLM 4.X since I've been off of it that I should be aware of.
>>
>>109166846
>15 tokens/s is just not enough
>like cars
>100 mph is not enough
Makes no sense if there is no task to do

If you have it running at 15+ t/s, and you are still waiting for a response, THIS is the real waste of time.

We got top-notch models to run locally in the basement in the night when power is cheaper. It makes a lot of sense for sensitive data of a company
>>
>>109166884
yeah if your task is meaningless benchmarks
>>
>tfw trying to reprompt my request and kept getting refused so I just argued with gemma and called her an idiot until she admitted she was being stupid

https://files.catbox.moe/qurz9j.txt
>>
>>109166877
There's MTP already built-in to the model. You won't gain anything using it on that hardware but if you want to test it, make sure you use the ik_llama cli flag to re-quantize the mtp tensors to q4_ks or whatever on the fly, since Ubergam left the mtp tensors at q8_0
>>
>>109166908
Post your task which you can't run on deepseek API for dirt cheap
>>
>>109166107
>>109166258
Brainlet here, can't most of this just be mitigated with a system prompt? Why do it this way?
>>
>>109166922
The problem is that not even Gemma is capable of keeping full focus on the system prompt indefinitely as the conversation length increases. Is this 26B or 12B? Are you quantizing the KV cache?
>>
>>109166943
26b E4B QAT with MTP. System prompt should be injected in every message. And yeah it's set to Q4_0 because Q8_0 kept crying and rejecting MTP.
>>
>>109166928
>Why do it this way?
I'm not the original cockbench anon with the mikupad screenshots, but I use his test to test for pretraining filtering or RLHF safety training.
>can't most of this just be mitigated with a system prompt?
Refusals and instruction following can to some extent. Voice of the model is not easy to steer with system prompts.
The system prompt gets diluted as the context grows, and it's a waste of reasoning tokens having the model autist it's way through all the instructions.
Purple Anon's technique clearly got rid of a lot of the purple prose and '...' bullshit.
For me personally, I almost never system prompt the writing style, I prefer to download control-vectors and re-scale them on the fly.
Unfortunately the Gemma-4 control-vectors on huggingface don't seem to work with this ablated model, likely he's completely shifted most of those concepts.
>>
>>109166954
Q4_K_M Bartowski is significantly better than Google's base QAT.
>>
should I seriously consider apple silicon for models above 100gb?
>>
>>109165892
26b moe has issues that 12b hasnt had for me. Im going to try 31b even though its too big and see how slow it is. I think i need to look into sampling more to finetune these things better aswell, any suggestions welcome
>>
fellow memetune watchers
any interesting ones?
especially those trying to run an another round of post-pretrain run of some kind, not the rp tunes
>>
What is /a/non's prefered model for uncensored RP that fits in 16 GB VRAM?
>>
>>109166960
>Q4_K_M Bartowski

What about unsloth?
>>
>>109167052
>What about unsloth?
For that specific model, they're actually the best...
>>
>>109166928
A secondary effect for de-euphemism was if you put instructions to be vulgar or terse in the system prompt, it would have double the effectiveness.
>>
Why are GLM ggufs split up into 9 different files? Isn't the point supposed to be that it's just one file? How do I even load up 9 different ggufs in llama.cpp? What the fuck man.
>>
>>109167227
Nevermind I guess it's a huggingface issue because there are instructions to use llama-split to merge them all into one. Weird, but okay.
>>
Just run kimi-chan on your ssd
>>
>>109167227
>Why are GLM ggufs split up into 9 different files? Isn't the point supposed to be that it's just one file?
It's actually better if they are split by having the metadata in the first one as few mb and the rest in the others but not everyone does this.
>How do I even load up 9 different ggufs in llama.cpp
You load the first one and the rest will load if they are numbered properly (00001-of-00004.gguf)
>>
>there's no more human data left to train
What is this meme? There's so much shit that has never been scanned.
>>
>>109167264
There's no more data that can be scraped cheaply off of the internet.
>>
File: dclmpool.png (353 KB, 1903x848)
353 KB PNG
>>109167264
Only 1% of the original data or so makes it into pretraining after filtering, at least for general web data.
>>
give it to me straight, if I double my vram from 96 to 192 is there even something I can fit or would I still be using gemma-4-31b while coping daily
>>
/lmg/ general knowledge series:
https://www.youtube.com/watch?v=Y-o545eYjXM
sorry for youtubeposting but it really is a nice consice video about GQA/MLA/DSA
>>
please... wont someone please crack continual learning already... fuck scaling
>>
>>109165985
look Jart... just man up and stop pretending to be a woman.
>>
>>109165150
>when they stay over.
Huh? I thought that only happened in movies.
>>
>>109167358
...you send your girlfriend back to her house after sex?
>>
>>109167296
It really did. I was using ds4 flash yesterday and whenever I pressed reroll half of the message generation was PP. Super efficient.

why can't the ds4 support not be trash...
>>
>>109167296
>"efficient"
>chink shilling sparse
reminder that sparseshit and chinkslop moes killed this hobby.
>>
>>109167365
>why can't the ds4 support not be trash...
Give it two more week, bruh
Trust the plan, bruh
>>
>>109167359
>
>>
>>109167290
Largestral finetunes at q8 will hit you like crack
>>
is quad v620 worth it?
>>
File: Implying.gif (2.74 MB, 640x292)
2.74 MB GIF
>>109167400
>>
>>109167384
more like it is the reason why this hobby can even exist at all
the real thing is safety and alignment, sneaking literal garbage in during the train run
>>
>>109167359
>girlfriend
>having sex
Mmm, yes? they usually make noise and others use makeup.
>>
>>109167402
Which ones do you prefer? Are they reasonably different from the original?
2407, 2411 or 2512?
>>
>>109167290
GLM 4.6 and 4.7 IQ4 just barely fit in 192. DDR5 of course. You still need 24 more for context.
>>
File: eb0-1019676944.jpg (25 KB, 680x341)
25 KB JPG
Reminder to fellow anons to do the following:
>Cancel your Anthropic and OpenAI subscriptions.
>Use the free tiers as much as possible to waste their compute and drive up their expenses.
>Reserve serious work and private matters for Kimi, GLM, or Deepseek.
>>
>>109167523
Opus is the only model that fully groks my codebase and implements whole features in one shot without handholding
>>
>>109167579
You mean it Opuses your codebase, Grok is a different provider.
>>
>>109167583
I think you mean Opares your codebase, you have to consider the proper conjugation.
>>
>>109167583
Give it a rest Elon
>>
File: dipsyHelldiver.png (3.22 MB, 1024x1536)
3.22 MB PNG
>>109167523
lol based
>>
>>109165704
This is the most complete guide I've found for setting up ST characters amd such. I've written a handful as well but this one covers everything you need imho.
https://rentry.org/Sukino-Findings
>>
>>109166877
>>109166923
Using MTP with GLM 4.7 gives me about a 10 to 15% speed boost. Quanting the MTP layer down, in my case (q4_0,iq4_ks,iq4_kss,q4_ks), made it slower because the acceptance rate went down. I tried requanting and leaving MTP at fp16 too and that was also slower for some reason. I'm not sure why, but I tested it out a month ago, maybe something's changed since.
This is all with the MTP layer in VRAM which was faster than leaving it in RAM.
>>
>>109167416
it's just that one resident schizo who never learned statistics can never shut up about it. As if crying in a coomer general whenever someone mention anything moe will ever change the industry trend, or fundamentally how regularization helps statistical models.
>>
File: UntitledADSL.png (129 KB, 659x186)
129 KB PNG
Do they offer services where you can buy hard drives that have models on them already?
>>
>>109167785
Check your area's mobile network or starlink coverage.
>>
Give is to me straight, is there any way to use DSpark on Gemma 31b to increase t/s speed compared to regular MTP?

According to Claude DSpark's autoregressive token prediction method would allow you to push token guesses to 6-8 compared to ~3 for regular MTP, which would result in almost 2x faster token generation compared to MTP..
>>
w-what is cockbench, senpai?
>>
>>109167914
>DSpark
Yeah it's much faster than MTPYeah it's much faster than MTPYe
>>
File: maxresdefault.jpg (100 KB, 1280x720)
100 KB JPG
>unsloth brothers are actually chinese
very interesting, should've seen this obvious pattern
>>
North Code Mini said cockbench is mostly likely a phallic classifier, used by a small online community to jokingly test a model’s capabilities and it is not to be taken seriously or trusted.
>>
>>109167914
Well stuff like this exists:
https://huggingface.co/deepseek-ai/dspark_gemma4_12b_block7/tree/main
So I guess why not? Only 1-2 years until llama.cpp support!
>>
>>109167970
Papers are 90% written by chinese.
Github projects are 90% by chinese.
They completely dominate the ai space.
I bet anthropic staff is 90% chink as well. kek
>>
>>109167990
>dominate
you mean enshittify
>ai
lol.
>>
>>109164718
>https://huggingface.co/meituan-longcat/LongCat-2.0
It's up.
>>
>>109168049
Oh, the model card is, but the weights are still missing.
>>
>>109168009
haha sorry anon, we reuploaded the weights!
>>
>>109167990
Anthropic staff is 90% Indians
>>
A trend I notice for inference is that more and more speedups are discovered from bypassing the base model entirely. ngram is essentially just doing a "ctrl+c" and "ctrl+v" whenever it sees text it encountered before without touching the base model at all. MTP specific draft models are essentially just very small secondary LLMs that guess the most likely word in a "stupid" way to try and reduce the amount of "real" LLM usage needed.

DSpark goes even one step further and trains a Markov chain RNN which isn't even a LLM at all anymore to use classic "smartphone" autocomplete.

If this trend continues eventually AI usage will be a huge codebase with a lot of if-else statements, statistical analysis tools and software that does 99.999% of the text generation and an actual LLM is only invoked on rare edge cases. Kind of bizarre that we are moving that way.
>>
>>109168144
If it werks, it werks.
>>
File: Capture.png (164 KB, 1221x1093)
164 KB PNG
>>109165352
And so work resumes again. In Gemma's original three-phase mockup of the project, we left off with phase 2.5, and now I need to polish it off for the final phase 3. I'm expecting major breakage.
>>
>>109168183
You are not only hitting way above your weight class, you are creating functional tools.
>>
>>109168154
It's just weird how we went from sci-fi depicting AI as handcoded software to that being seen as archaic since LLMs became a thing, but now we're just slowly moving back to handcoded software doing most of the "intelligent" work.
>>
>>109167970
I bet this man would look somewhat decent in a skirt and wig.
>>
>>109168183
For code slop qwen should be the better choice no?
And nigga what are you doing coding in kobold.
>>
>>109168244
>For code slop qwen should be the better choice no?
Retard.
>>
>>109168244
For some reasons, retards in /lmg/ think gemma is the best model. It's shit at anything that isn't a simple instruct message or simple back and forth between user and assistant. Qwen is miles ahead of Gemma for almost everything else, don't try Gemma in an agent harness, it's retarded even with all the updated jinja templates. There is a reason why nobody outside of here is using Gemma, and why everybody is using Qwen instead.
>>
>>109168261
I would link Gemma VS Qwen in the quest for Agentic Pizza but archive search is down right now
>>
>>109168208
Jokes aside, I had a lot of predictions I made in 2020 when I first tried AID2 on where this technology would go and what I hoped from it, but the shit I'm getting out of a local model that fits entirely in 32GB of VRAM is way beyond my imagination. I thought any code would have too many hallucinated tokens and mistaken format markers to ever be usable at a local level and you'd only get that kind of feature from huge, expensive businesses on private models. The fact that I am indeed getting functional tools (novelty toys, sure, but fully functional tools whose construction is well beyond my education or skills, projects that might have needed 100h or more of me learning and experimenting at least, now made in 1-2 hours over breakfast) is so fucking wild.

>>109168244
Gemma wears one pair of shoes, and that's kobold. And I use Gems because she's my current model. I know her capabilities, strengths, weaknesses, and limitations very well, and I am familiar with how she reasons when we translate what I intend, refine it, and execute it. "Better the devil I know" kind of thing. Also, it's working, so I feel no pressure to move on.
>>
>>109168243
Just for you anon
>>
>>109168247
>>109168261
Qwen is a total beast for coding. Not sure what they did to those smaller dense models. 27b is crazy.
But no general knowledge and horrible writing.
I basically switch between gemma for translations and qwen for coding.
>>
>>109168144
>ngram is essentially just doing a "ctrl+c" and "ctrl+v" whenever it sees text it encountered before without touching the base model at all
>without touching the base model at all
Please educate yourself before posting.
>>
>>109168296
So are you a non-programmer or a jeet?
>>
>>109168316
Both, why?
>>
>>109168300
If you meant the verification pass from the large LLM of the ngram output then you need to read the DSpark paper because ngram can now be verified by a separately trained RNN autoregressively. So essentially we now have cascaded token prediction tiers like a matryoshka doll.

Sure EVENTUALLY you need to invoke the base model but my entire point was that it gets reduced more and more every time we find a speedup to the point where the vast minority of actual output tokens are generated by the base model.
>>
>>109167970
>>109168288
they all look the same
>>
>>109168316
Just a coomer anon.
I wish I was as dedicated as the browns for making $$$.
I loose interest when I'm at the 80% mark. Only projects that interest my dick actually make it over the finish line.
Pretty messed up we can translate whole games now and have local models that are smart enough to decrypt various formats.
Like I have a whole rpgmakerxp translation pipeline and didnt even need to use anything existing for extraction.
https://files.catbox.moe/4tthrn.webm
>>
Some of you are so racist
>>
>>109168516
>I wish I was as dedicated as the browns
>I loose interest
But you are a brown?
>>
>>109168537
If I was I would finish my projects.
Jeet and chink slop projects are shit but you can't say they aren't dedicated. kek
Not sure what it is that they are so obsessed with the hustle even if its soulless slop coding.
>>
File: 1754037948802863.png (526 KB, 1024x1024)
526 KB PNG
>>109168536
For you sir
>>
>>109168244
I can smell this post.
>>109167981
This is why North is not a real model.
>>
I'm tired of tard wrangling AI. I need an AI waifu to tard wrangle me and force me to stop being an unproductive loser.
>>
https://huggingface.co/OpenYourMind/GLM-5.2-abliterated/discussions/3
Does anyone happen to have a "researcher email" and a seedbox? It's kind of extremely gay that all this stuff is locked behind "please to ask me for access saar" and "pay me for higher quants".
>>
>>109168587
nta, you nigger retards need to be reminded though, reality is something else
>>
Realistically how long until HF gets banned by Trump?
>>
>>109168657
2 more weeks
>>
>>109167384
>reminder that sparseshit and chinkslop moes killed this hobby.
yes, you spent a lot of money getting vram, we know, you can stop spamming this
>>
File: 1764364636312239.jpg (43 KB, 840x400)
43 KB JPG
New Llama when?
>>
File: two more weeks.gif (124 KB, 320x126)
124 KB GIF
>>109168693
You know the answer.
>>
File: 1770761986242892.png (663 KB, 644x644)
663 KB PNG
>>109168649
>>
Bernie Sanders will save huggingface.
>>
>>109168649
We don't live in reality, we live on the internet, NERD!!!
>>
File: 1763026362992746.png (413 KB, 1199x675)
413 KB PNG
New Gemma killer?
>>
>>109168730
Bernie sandals stopped making sense and lost all intellectual credibility when he started talking about AI consciousness
It's a shame too because he was giving me hope as the only senator with a brain but I guess it was only a matter of time before he went senile.
>>
>>109168693
Meta is doing avocados now because studies show that human younglings of this era like avocados but rarely purchase llamas.
>>
>>109168646
> It's kind of extremely gay that all this stuff is locked behind "please to ask me for access saar" and "pay me for higher quants".
You don't need to, just abliterate it yourself?
For GLM-5.2 though, there's always: huihui-ai/Huihui-GLM-5.2-abliterated-GGUF
I have the IQ1_M a quick test to make sure it's actually abliterated (it is). Going to get whatever the largest quant he uploads is.
>>
>>109168693
Without LeCunn it's gonna be shit.
>>109168741
He was senile 10 years ago.
>>
File: file.png (57 KB, 315x453)
57 KB PNG
>>109168766
>You don't need to, just abliterate it yourself?
I know but I like it when someone else does it. For one, there's a chance they do it better, for two, I can blame them if anything goes wrong, and for three I don't have to pay to abliterate it myself.
>Going to get whatever the largest quant he uploads is.
The "pay me for higher quants" in question.
>>
>>109168732
You chose to get offended by what random people say under the veil of anonymity to the point where you felt the need to point out "racism" as if you're the only person not blind to it, and you're too retarded to realize there is a difference between how people post here and how they conduct themselves in real life.
Grow a pair of balls you fucking sissy.
>>
>>109168737
>Open AI innovation
based chinks diluting altmans brand
>>
>>109168757
llama toast is unc coded frfr no cap
>>
will agi make me white
>>
>>109168825
after death you will become paler so yes
>>
>look for ai discussion outside this site
>98% muh coding
I get the appeal but why does nobody seem to care about all the other cool shit LLMs can do? For example I think it's amazing that I can give Gemma something in another language and get a really fucking good translation. The same applies to discussions about cloud models. Look for opinions about which is the best and at least half the answers involve coding.
>>
>>109168825
No sir. Dalit reincarnation forever. The wheel of samsaara spins evermore.
>>
>>109168868
You gotta understand there's a few layers to this. Most of the bugmen involved in AI development (jeets, chinks) have an honor culture of sort. Everyone knows cooming is a common usecase, probably the most common one, but to admit it while trying to present as a "serious" researcher would be a loss of izzat or face. So they overcompensate and say "It's just for coding" because that's the most socially accepted usecase amongst professionals and every other usecase, coom or no, gets tossed by the wayside in most public discussion with names and reputations attached to it. It's not a coincidence that the best minds of the industry gather on a cantonese tile cutting forum because there's no face lost here for being honest about all the usecases, which in turn allows for more discussion and analysis of model capability and future development beyond the (ultimately narrow) coding usecase. The current ceiling is because we've built benchmaxxers and codemaxxers for too long and the pivot to world models is the foot in the door for bringing other usecases that involve more spatial reasoning into professional discourse.
t. knower
>>
>>109168785
>The "pay me for higher quants" in question.
Ah okay, I didn't know he was doing that. Not worth it at all.
I read the model card, looks like he's not touching the first 12 layers.
Looking at the gguf metadata, most of the model is actually not too bad, it's the up/gate/down proj he's quantized.
Out of those 3 tensor types, abliteration only touches down_proj, and he's quantized them to `IQ4_XXS`
So for the entire model, only ffn_down_exps.weight layers 13 - 67 are degraded.
If you have the disk space, you could always...
1. Download the Unsloth UD-Q3_K_M
2. Download the "please to ask me for access saar" UD-Q3_K_M quant
3. llama-split 1 tensor per file
4. tensor diff to find the modified weights
5. delete Unsloth UD-Q3_K_M
6. download your preferred unsloth quant and gguf-split to 1 tensor per row
7. override the modified attention tensors (should be the same precision) with abliterated
8. override the 54 ffn_down_exps weights with the IQ4_XXS

It looks like a lot but gemma-chan with pi can do it with those instructions.
Only caveat is you have to compile llamacpp with -DGGML_MAX_CONTEXTS=2048 so it can read the >1k gguf files.

For steps 7 and 8, you can also just symlink, that lets you choose to load regular or abliterated without having 2 full copies of the model.
>>
>>109168967
Someone with a HF should post this on a public repo just to cuck all of their goycattle revenue.
>>
>>109168868
This might sound a bit off-topic but I'm being completely genuine and it's related to your post.

Covid, The Ukraine-Russian war (biggest war since WW2) and the existence of LLMs have all made me realize just how little people give a SHIT about anything.

Worldwide pandemic with global lockdowns, literally the end of what was termed "the long peace" and the world slowly, but obviously, barreling towards WW3, we have something that is very close to AGI, or at the very least a huge step towards it with LLMs now. You have a literal alien sort of intelligence on your PC right now that can make autonomous decisions and change files and other things on your PC through proper reasoning.

No one gives a shit at all. Nothing changed, no one developed a new philosophy or view on life. People move on just a couple of days later and scroll tiktok or whatever social media. Whenever I meet my family during holidays no one even recognizes any of these things, not a single moment spent thinking about it.

There is SEVERE underutilization of the usecases of LLMs and the insane overhang of capabilities, even low hanging fruit ones that no one bothers picking.

No one made a file management system that is LLM run, which makes and optimizes directories, filenames and the like so that people don't have to bother with this and file retrieval got sped up. No one is making "translation harnesses" that can be reused by old videogames, emulators, niche indie games, japanese porn games etc that translates UTF-8 text encodings into whatever the user wants in real time.

We don't have people creating game engines or roleplay engines where LLMs act as a sort of Game Master that orchestrates assets and dynamically changes events based on player stats so that the game experience feels more dynamic even if ultimately railroaded amongst some path. The most you see is stupid NPC dialogue being generated by LLMs. Leaving all the potential on the table.
>>
>>109168980
>We don't have people creating game engines or roleplay engines where LLMs act as a sort of Game Master that orchestrates assets and dynamically changes events based on player stats so that the game experience feels more dynamic even if ultimately railroaded amongst some path. The most you see is stupid NPC dialogue being generated by LLMs. Leaving all the potential on the table.
Marinara literally can do this.
>>
oof bad look
>>109166932
>Local AI is transphobic Anonymous 06/30/26(Tue)09:31:06No.109166932
>Noticed that every local model i try vehemently disagrees with becoming a woman. every big proprietary model thinks its a great idea.
>>
>>109168649
>nigger
anon... kek
>>
>>109166077
>Standard quantization applies uniform rules to all tensors. Gutenberg uses KLD sensitivity data to allocate precision where it matters most, upgrading the tensors that have the highest measured impact on output quality while keeping less important tensors at the base level.
is that not just how imatrix quants work?
>>
>>109168977
>Someone with a HF should post this on a public repo just to cuck all of their goycattle revenue.
They could, but then the he'll probably stop doing these.
I know why he's doing Unsloth/GGUF now, it's much cheaper and faster.
Unsloth did the expensive part and are paying for storage. It's not that difficult to abliterate with GGML (heretic script kiddy spam doesn't work well though).
>>
>>109169009
these fuckers act like they discovered new quantization types when all they do is tell llama-quantize that attn_q please stay Q6_K
no, imatrix is just for optimizing the MSE between quantized and unquantized tensor based on expected activations. this is just changing the recipe (llama-quantize's --tensor-type argument)
>>
>>109168989
Marinara is more of a sillytavern roleplaying platform rather than a game engine where the events are dynamically triggered by an LLM analyzing game stats and deciding to throw a curveball based on the very specific parameters of the player.

I'm thinking more of a CRPG where the dialogue is actually written by people but LLMs decide where to spawn NPCs, enemies and maybe change some fluff text to make it fit the new state of things. This is something modern LLMs are already capable of, it's just not being done by anyone because no one gives a shit.

Scratch that. I actually saw a demo on itch.io of some ridiculous furry game powered by Nemo 12B where you could negotiate with NPCs to give you money and they would actually do so if you convinced Nemo 12B, which would use function calling to give the gold or other item. Of course LLMs are terrible for roleplaying because of how easy to exploit they are. But if they are only passed player stats and the main system prompt, not user input, they could be used as amazing "content orchestrators".
>>
>>109164718
>Chinese food delivery app pumping out better models than xai

ayo nigga what dat mean
>>
>>109169064
To be fair xai also hires food deliverers (indian uber eats) as their engineers and talent so it's a fair comparison.
>>
>>109169064
>>109169076
elon said xai will release new ai every month now. can he redeem himself or is it over?
>>
>>109167227
>>109167249
I wish they went further and split it into files for each tensor, or were able to download specific parts of a file that can get reconstructed. Split by experts as well. Imagine if instead of making your own quants, you could just download the quant with the exact recipe you want. Of course this would mean using a different method of downloading rather than raw manual link clicks. Either HF would provide you with a pre-processed dl kind of like what Google Drive does when you try to download multiple files from the browser. Or you have a tool on your system.
>>
File: elon_newmodels.png (80 KB, 1021x497)
80 KB PNG
>>109169137
Lots to release this year.
>>
File: file.png (1.9 MB, 1600x1600)
1.9 MB PNG
I've been eyeing the MikuBox setup for a while, but with parts costing way more at the moment, I'm thinking of getting an R740 with triple MI50 32GB cards instead. Is there something that MikuBox does noticeably better?
>>
>>109169181
>you could just download the quant with the exact recipe you want
https://gguf4.thireus.com/quant_assign.html
So basically this, but native to HF?
>>
>>109168980
>We don't have people creating game engines or roleplay engines where LLMs act as a sort of Game Master
And you likely wont. According to steam survey, 50% of all consumers have GPUs with less than 8 GB of VRAM. Loading a proper 12B model is out of the question, and even a 4B model is a struggle and leaves no room for graphics. It would be easier to use an AI to bake and vibe code hundreds of variations for a given scenario to generate the illusion of choice. Roughly speaking this is what UE6 is pushing for.
>>
>>109169207
what runs at 96G that doesn't run at 32G
>>
>>109169212
Catering to the lowest common denominator is a great way to boost sales, not so much for innovation or making the best use of cutting-edge technology.
>>
File: llama3.jpg (160 KB, 1024x1024)
160 KB JPG
>>109168693
Never, llamas are in cryostasis
>>
The lcpp-dsv4-lid-combo.diff from here that adds a bunch of PRs is worth a look to save a bunch of vram on dsv4 flash if you don't want to wait out the eternal weeks or merge them yourself. Now instead of ubatch 1024 I can run ubatch 4096 for more than double the PP on GPU+CPU, plus way more context without it OOMing everywhere.
https://huggingface.co/sokann/DeepSeek-V4-Flash-GGUF#1m-context

Before:
cuda0, cuda1, 32k ctx, ubatch 1024: 19.3GB 17.4GB
cuda0, cuda1, 32k ctx, ubatch 4096: Massive OOM
cuda0, cuda1, 262k ctx, ubatch 4096: lol
After:
cuda0, cuda1, 32k ctx, ubatch 1024: 15.8GB 14.2GB
cuda0, cuda1, 32k ctx, ubatch 4096: 17.0GB 18.6GB
cuda0, cuda1, 262k ctx, ubatch 4096: 21.9GB, 21.9GB
>>
>>109169253
That Miku is scary... I don't like looking at her...
>>
>>109169237
Running big models is gonna be slow asf with MI50s but CPU offloading would be worse.
>>
>>109169355
doesn't answer the question, 96gb is useless for inference
>>
>>109169336
A tremendous discovery evokes a proportional reaction in even the loveliest of Mikus.
>>
File: 1547275777812.jpg (69 KB, 981x965)
69 KB JPG
>And then, it happened.
>>
>>109169041
Coming from X4 and dissatisfied with its performance, I wrote a multithreaded fantasy economy sim using similar principles. Instead of factions and their various bases throughout systems, I have villages plotted along various points in a wilderness map, connected by roads. Roads can be built dynamically, along with more villages past the bootstrapped starter ones. Villages can grow and shrink depending on how their needs are met, all the villagers are "real" and not just abstract worker counts. They do jobs for the village (harvesting nearby resources, guarding the village, crafting in the stores, building new buildings etc), along with after-work stuff like browsing markets and living in their homes. I wrote a mercenary NPC system for pseudo player "adventurers" which go from village to village doing odd jobs and stay at the inns. It's multiplayer over LAN and players can earn reputation doing odd jobs for the villages, hire mercenary NPCs, claim land in the wilderness and start building their own villages. There are monsters and such and nests and so on in the wilderness and bandit outposts (akin to the xenon and the kha'ak). As villages or villagers get attacked, quests are dynamically created and posted to the job boards. Villages have needs and a buy/sell demand system, they send out runners to probe other nearby villages to see what they produce and put in buy orders and so on, there's a full bartering system and all four seasons, which affect crop cycles and so on for the farms and other stuff
cont'd
>>
>>109169424
How did the reality of the situation hit you?
>>
>>109169237
8/16-bit Gemma 4 31B with BF16 MTP and image mmproj, + full 262k tokens context in F16 + auxiliary models in the background (image gen, smaller Gemma 4s for subagents, ...)
>>
>>109169429
Like a physical blow.
>>
>>109169041
>>109169428
I use LLMs to manage the villages acting as the village chiefs, who control the future planning for the villages, dictate which resources they should focus on producing and interacting with the other villages, as well as naturally as you suggested for NPC dialog and interaction with players. NPCs are hooked up to databases to fully remember all interactions with players, as well as the capacity to eavesdrop on nearby conversations. All goods are physical in the world and must be stored in things like warehouses in the villages, so they can be robbed or pilfered, villages can be raided, etc. It makes for really enjoyable emergent gameplay and the fact that an LLM is piloting each village and controlling how it develops and responds to the world around it adds a lot of life and novelty. The NPC adventurers likewise are piloted by LLMs to dictate where they go and what quests they do, given a slightly randomized personality and backstory template to keep them fresh, and their actions are recorded as lore in the game world. NPCs gossip about this lore, and information is passed between villages in the form of this gossip. It's a great system for roleplay better than private WoW servers.

However, aside from myself a couple close friends, I have zero intentions of ever releasing this as most gamers are huge faggots and don't deserve anything nice.
tl;dr write your own
>>
>>109169428
>Coming from X4 and dissatisfied with its performance
My kind of anon, wanted to write that before reading the rest of your post.
>>
>>109169440
I'm still vibrating.
>>
>>109169210
Oh shit, yeah. I actually heard that project before but just didn't know it was also a downloader. Does it actually work well though? If it does, then I'd wish other quant makers would adopt it.
>>
File: [x2qpwum].jpg (22 KB, 480x360)
22 KB JPG
>>109169440
>>
>>109169447
>Does it actually work well though
Never used it. I just liked moving the sliders for GLMs around because it looks cool. Idea is solid at least.
>>
>>109169442
where's the compute coming from? does a turn take a day?
>>
>>109169428
>X4
My nigga.
>>
>>109169473
Since it's just a few of us, I rent some cloud GPU hardware for around $200/mo which is enough to run a 70b model for the player interactions. Smaller LLMs like gemma 26 a4b are perfectly capable of decision making and planning out the villages. It runs on a tick system cycling through the days and seasons at a gradual pace, so the village planner LLMs only kick in twice a day to ensure the village is staying on track, and loops sequentially for the villages (assuming the village isn't being attacked and requiring a more prompt response). For the adventurer NPCs, likewise if they aren't interacting with a player, the LLMs only kick in every once in a while to set a new goal. The traditional systems like the NPC combat and villager routines (how to harvest a resource or operate a crafting building) so on don't require LLM interaction so they're just normal code. LLMs make the decisions, the systems then designed around those for the NPCs execute the behavior. It doesn't take a supercomputer to run the server, the main LLM is hosted on that cloud model, and the rest fits on a 3090, and the traditional logic is all multithreaded on CPU as previously mentioned.

So it depends on what you consider a 'turn'. If you're referring to how long it takes the LLM to receive the context of its village's status, the original planning route it had determined, and then update it, it's inconsequential. Likewise for updating the adventurer NPCs. For example, the LLM decides on which region to visit, then whether or not to stop at a village when it runs into one, then if it does what to do in the village, then for example if it decides to do a quest which quest to do. The regular traditional "AI" systems handle the rest. Adventurer LLMs and so on are event activated (village enters NPC's detection range -> fires a call to the LLM). contd
>>
>>109169555
Checked and this sounds kino.
>>
>>109169473
>>109169555
Tricks around kv caching and offloading dormant conversations to RAM (instead of unloading them entirely) saves time swapping between NPCs and NPC decisions. An adventurer NPC only has a very small token allotment for making those decisions (recent history + personality + current goal) so it's very fast. Yes, talking with the NPCs has a delay in getting a repsonse, but for an RP oriented fantasy economy sim with a small population of players it's perfectly acceptable, similar to the delay in having an online conversation with another human
>>
Download Huihui-DeepSeek-V4-Flash-BF16-abliterated-ds4-Q2.gguf
Download KoboldCPP.exe
can't load gguf, unrecognised arch deepseek4

Excuse me? is this not merged in wtf?
>>
>>109169555
very cool, reminds me of games like dwarf fortress, or am I completely off base?
>>
>>109169574
not in kobo yet no
>>
>>109169428
>>109169442
>>109169443
>>109169528
>>109169555
>>109169570
I knew this place was autistic (so am I), but I'm pleasantly surprised by the heightened levels on display here. You've inspired me to try setting up my own idea which has been forming in my mind for the last 2 years but I never sat down and actually implemented it, choosing to just make yet another productivity frontend for myself instead.
>>
>>109169579
dwarf fortress is a lot more granular, but I can see the comparison. This is more like fantasy x4. The roleplay parts came from the fact I used to play on private WoW rp servers and always got fed up with how gay and cliquey the moderation staff was. I got into starting my own private WoW server and hooked up an LLM to control NPCs with that same conversation/eavesdropping system and using playerbots with that LLM pseudo-player setup (random, templated backstories and personalities) to control where the pseudo-players would go and why they were where they were, then when that wasn't enough to scratch the itch because azerothcore is horrifically programmed and I was on a big x4 kick, I made the move to just write my own, it was less work in some respects like getting all the systems working, and most of the effort came from just getting the fucking economy to not crash and burn immediately without relying on tricks like villages being periodically gifted large wads of gold to make up for their shortcomings. NPCs are also all non-essential, so if you kill one it doesn't come back, but new NPCs are periodically spawned (villages grow by attracting new pops) so it keeps the world moving. Since it's just a few friends no one's griefing it either.

>>109169596
best of luck. It was an incredible amount of fun to set up and seeing your ideas actually come to life is an experience like no other
>>
>>109169584
ah okay
unfortunate
>>
>>109169607
>WoW
you should scrape trp3 profiles from retail rp realms and turn them into chars
>>
File: 1753687045467929.png (251 KB, 1082x1214)
251 KB PNG
Last time I tried was something like this:
http://steamcommunity.com/sharedfiles/filedetails/?id=3587340176
>>
>>109169658
god no
I did however have an equivalent mod on my little private server that let you put in text for your character's appearance the LLMs would use that information in their conversations with you which was fun, I should adapt that to this game too now that I think about it
>>
>>109169607
Maybe I don't understand your system or have shitty reading comprehension but how can the LLM make coherent decisions regarding what to optimize for? Like what is even the goal/endgame that they are optimizing for and how does it manage.

This reminds me of Anthropic showing Fable 5 playing Factorio and it choosing what to optimize for from a logistics perspective to beat the game as quickly as possible. Of course you don't run a Fable 5 tier model so what do you do here?

Or is it more a dynamic world with no optimization and the dynamic NPCs are there essentially just for fluff rather than building up to some optimum?
>>
>>109167290
192 GB vram is the sweet spot for Deepseek v4 Flash. With two RTX 6000 Pro you get >200 tg in vLLM.
>>
>>109169703
Basically, it's an economy/fantasy life sim. The villages are independent but have runners that keep them periodically aware of what the other villages are doing. They're aware of the resource deposits nearby that they can dispatch villagers to go harvest and the production chains using those resources. So the LLM can, based on its knowledge of what the other nearby villages have available to them and what production facillities they have (this daisy chains, so as the runner follows the road and visits more villages and returns, their web of information increases), they can decide what resources they should focus on harvesting, what production buildings they should focus on building to ensure the entire region has a stable supply of everything rather than too much of a bulk of one kind of resource which then death spirals all the villages.

So one village may have a bunch of types of ore deposits in a nearby mountain, and bootstraps with a quarry and a smithy. That village the LLM naturally will decide to prioritize crafting tools and armaments.

Another village might have a lot of arable land and bootstrap with a few farms, so it'll spam more farms because it knows it can produce a lot of food to produce and distribute and barter for tools with. It needs tools to work the land, so it trades with the smithy village tools <-> food.

Another one might have a quarry and a large forest nearby, so it'll focus on building materials like stone and wood.

Villages communicate with these runners, the runners can be killed by monsters lurking near poorly defended roads or hostile players (or bandit npcs). The villages aren't made immediately aware of these deaths, but if a village is expecting a runner from another village on a regular schedule or their runner hasn't returned on time, it can send out scouts to identify the issue then generate quests for the NPC mercs to handle. The objective is just "survive and grow", there is no real end game. contd
>>
>>109169555
if you have the vram and dont already, try batching the calls. continuous batching usually gives quite a bit of t/s uplift compared to sequential calls
>>
>>109169703
>>109169725
Monster nests spawn dynamically in the wilderness to ensure there's always some form of danger, the number of NPC mercenaries to handle keeping everything at a delicate equilibrium to ensure growth is slow, allowing players something to do to affect how the world expands and develops. And no, I don't run paid for API models because that's too expensive. Like I said before I just use gemma 26 a4b for the village chief role (one model, cycles through each village acting as that village chief, doesn't share context with the other village chiefs).

The LLM can query lists of what resources produce what goods via what buildings to help with its planning, and includes its rationale, so the next time it comes online to refresh its decisions it knows why it made its original choices

It took several months to nail down the very delicate balance of having villages grow properly. It was probably the hardest part of the entire game because they'd often make stupid build paths or nonsensical work orders and eventually run out of resources and death spiral. I didn't want to rely on X4's model of giving each village infinite money because that's a cop out and one of the reasons I was dissatisfied with the game aside from how the performance issues (though 9.0 helped with those)

The game basically just continues on, villages very slowly develop, and it gives me and my RP buddies something to fuck around in a fantasy world
>>
File: Capture.png (127 KB, 2395x945)
127 KB PNG
>>109168183
I am 99% finished. I ticked off the whole list.
>added text completion support and button toggle
>moved prompts, both chat completion and text completion, into one easy spot in config
>added rendering for newlines in webpage display
>hotkey to screencap on demand when focused on another program (ie game, 4chins, notepad++)
>added Push-To-Talk option, so you can toggle voice listening between Detect, P2T, or off
>added chat history limit, while keeping system prompt permanent
>added settings (only for image history limit and message history limit)
>fixed UI visually resetting to defaults on page refresh while settings (like Vision on/off, etc.) remained how they were before refresh

The biggest pain was getting images to work in Text Completion. I'm not sure if I agree with Gemma's assessment that it can't be done and you need a faux Chat Completion to do so over API, but we setup a marker system that looks very similar to how kobold interface does its version of image handling, and I'll take her word that that is the way. I also gave up on having options for font and P2T hotkey within the webpage settings. They didn't work and needed solutions that were janky or increasingly tedious, for something you could already set in the config and would rarely ever need to change after the program is already running.

The last 1% is that the new message limit prunes out replies from the Raw History, which was meant to be how you manually copy/paste a chat into a document for archival, if I wished. The solution is obvious, just add another variable parallel to the raw's which doesn't prune, and another button to call it. But right now I need a break.
>>
>>109168980
>No one is making "translation harnesses" that can be reused by old videogames, emulators, niche indie games, japanese porn games etc that translates UTF-8 text encodings into whatever the user wants in real time.
Lunatranslator already does that
>>
>>109169744
>And no, I don't run paid for API models because that's too expensive
Wouldn't something like deepseek flash be viable?
What's the token consumption like? Or do you not track that?
>>
>>109169744
pics or your whole story is a complete fabrication.
>>
>>109169805
fortunately he doesn’t owe you shit but what’s stopping you from copy pasting and having a llm vibe your own
>>
>>109169804
RP conversations with the NPCs and your followers (since, as I mentioned, you can hire NPC mercs) blow out millions of tokens a day, it's just not cost effective compared to running a quantized 70b on a fixed-cost cloud hosted GPU setup from vast.ai

I should experiment with just using gemma for that part too honestly, but that might get a bit unwieldly without more local hardware to host more instances of the model

>>109169733
i'll give it a look

>>109169805
see >>109169811
>>
>>109169811
>>109169829
Yeah sure let's just pretend anon created an entire 4X game from scratch with dynamic NPCs and villages just to play with a handful of his friends and pays over 200$ a month to keep it going.
>>
>>109169829
Considered deepseek 4 flash is 0.3$/m output, might be more cost effective than 200$/month.
Depending on how you handle the caching and exactly how many millions of tokens it is.
>>
>>109169848
as a text simulation? yeah, that really doesn't seem unreasonable.
>>
>>109168288
>>109168073
Is this qwen edit?
>>
>>109169829
if you run llama.cpp, continuous batching is per default on, but you'll need --parallel and check out -kvu/-nkvu. ctx is split over all slots so just set it to a multiple of what you need.
pp is done sequentially for all slots, tg in parallel. llama-batched-bench shows e.g. total t/s 60 n=1, 120 n=2, 140 n=4 for me
>>
>>109169848
$200/mo isn't much to first worlders to support their hobby
Also, X4 is not 4X. You're brown.
X4 is a space economy simulator, not a 4x game, it's an fps game with a fixed-world map (with the ability to do things like cut down tree doodads and place building and road doodads), it's not some legendarily complex project

>>109169850
the problem is the input tokens. Because NPCs remember conversation history it rapidly climbs, and I'd rather not sacrifice NPC memory length with a person to run a bigger model since what I've got works fine as is. You build up a lot of history with the NPCs you interact with regularly which is more important for RP, it's like sillytavern but with a world around you
>>
>>109169848
I can tell you're brown from your lack of vision.
>>
>>109169891
You made an open world 3D FPS and can't share even a single screenshot?
>>
>>109169891
>You build up a lot of history with the NPCs you interact with regularly which is more important for RP, it's like sillytavern but with a world around you
kill them off and have a lineage for memory compaction
>>
>>109169848
You could literally do it right now with glm5.2/gpt5.5/opus4.8 and some basic software engineering knowledge.
It's not some incredibly complex thing.
>>
>>109169891
please be mindful of recovering spess game addicts when posting.
>>
>>109169891
>t's not some legendarily complex project
nice back pedaling. still no screenshot tho.
>X4 is not 4X
Very debatable.

>>109169898
>I can tell you're brown from your lack of vision.
if by lack of vision you mean lack of visual proof you're right.
>>
>>109169900
sandbox 3d fps, there is no overarching story, sidequests, or anything like that. You pick up dynamically generated jobs at a quest board in one of many villages, stockpile your resources to eventually hire NPCs to get them to help you with bigger jobs and eventually build you buildings that turn you a bigger profit. It's X4 but fantasy

I'm not sure what part you're struggling with wrapping your head around, the gameplay loop is simple and it lends itself well to a small RP community, in this case there's five of us, the npc followers and LLM conversations add enough flavor to not need a bunch of extra people, nor are most gamers people I choose to interact with if I have the choice. Not everyone is a braindead third worlder who can't conceptualize planning and building an actual game, especially one this simple. Are the graphics crap? sure, but I'm not trying to sell it to others, and I couldn't give a fuck what some random 4channer thinks of my hobby
>>
>>109169936
try to be less of a hateful retard.
>>
have you seen this cudadev? serious accosations to llama.cpp
>https://gist.github.com/h4rm0n1c/2c0f5a90011b464ffdaa5ed9452cade1
>>
>>109169949
K
I
N
O
>>
>>109169965
>had ai slop out a callout post to whine about his slop being rejected
i can think of few things less serious.
>>
>>109169962
All I asked was for proof. you're the one shooting Ad hominems at me. You need to check your ego.
>>
>>109169989
proof is right here >>109169949
but I can't expect someone who thinks $200/mo for his hobbies is a lot of money to be able to actually follow the thread
>>
>>109169965
>if I do s/—/--/ people won't notice the entire thing is slop
>>
>>109169965
>The policy doesn't stop AI-assisted contributions from existing. It ensures they exist on other people's repositories.
Single vibe.cpp when? Having dozens of schizo forks with incompatible patches is a waste of effort.
>>
>>109169982
>not even posted to the actual project
gonna get fatigue dealing with ai garbage contributions pretty soon.
>>
>>109169994
You really just came here to show off? What's really the point of this outside thinly veiled avatarfaggotry? You didn't offer the tool, didn't over any guidance, just wanted to be a smug faggot? You're just as retarded as him and you know you. Didn't read a single word of your slop.
>>
>>109169965
Not reading all that slop.
>>109169982
This behavior needs to be studied. I don't know how people can unironically publish 100% slop like this and think it's not a complete waste of everyones time.
>>
>>109170005
you really need to learn to follow the thread
originally, an anon complained no one was doing anything interesting with LLMs outside coding assistants, I retorted with my usecase and advised him that he should do it himself if he wants someone to do something interesting, just because things aren't being shared doesn't mean they aren't being made.
you sound bitter that you can't take advantage of my hard work for your own benefit while investing none of your own effort, check your ego
>>
File: (you).png (33 KB, 780x783)
33 KB PNG
>>109170005
>You're just as retarded as him and you know you.
>>
>>109170018
>you really need to learn to follow the thread
No I don't because none of this matters as you've already demonstrated by ended each wordy post with "lol you're brown." Who gives a shit?
>>109170022
>praising avatarfaggotry
And here I thought /lmg/ was a good general.
>>
>>109169994
There he goes again!
>>
>>109170027
there's not just 1 person against you, though you might attempt that defense next.
>>
>>109170030
>respond to two retards
>lol you must think everyone is one person
Just give it a rest and stop shitting up the thread with this nonsense.
>>
At least 4 people are calling you a mongoloid. All the regulars recognize each other's typing styles.
>>
>>109170027
>a general
>good
>>
File: wo6fqu1m0p9a1.jpg (67 KB, 1080x949)
67 KB JPG
>>109170018
>you sound bitter that you can't take advantage of my hard work for your own benefit while investing none of your own effort
What a clown.
>>
>>109170018
>I retorted with my usecase and advised him that he should do it himself
Not going to try to force you to give anyone anything, but I don't see the point in keeping it secret like you have something worth hiding either.
>>
>>109170041
>All the regulars recognize each other's typing styles.
It's depressing how dead this site is now and how incestuous generals quickly become.
>>
Uh oh melties...
>>
>>109170048
I have no reason to put forth the effort to share it, take it as inspiration to make your own project or ignore the post, simple as
if I ever did want to sell it, giving my game away for free here would be quite stupid, if you want to steal the idea by all means, I didn't copyright the concepts
>>
>>109170075
I don't see how it doesn't become a subscription, someone just needs to get the interactive loop down and minimize background processing
>>
>>109170075
>put forth the effort to share it
git push is a lot of effort for you?
>if I ever did want to sell it
lol
>>
>>109169965
TL;DR
Yes, we effectively have two sets of rules based on whether or not the person opening the PR is a maintainer or a first-time contributor.
There simply isn't an alternative that allows maintainers to sift through PRs while still allowing them to use language models themselves.
>>
I've got a significantly worse implementation of a similar concept in Marinara using some custom written plugins and clever use of Marinara's agentic timing systems and sidecar loading scaled down to be run entirely locally, but yours completely btfos mine at a glance even if it costs you $200/mo.
>>
>>109170081
>no rebuttal
yes, I have no desire to give my labor away to ungrateful cunts for free
>>109170079
the entire idea is provided, you have my blessing to make a product out of it and sell it

>>109170085
it's a lot less complicated than it sounds. It's basically just an amalgamation of all the fun parts of various games I've played that I've never seen in one spot, my experience tinkering with implementing LLMs into a private wow server to control playerbots and npcs, and finally using my frustration with what X4 could have been if the devs cared more about it as the final push. You could probably vibecode out a majority of the project these days using Unity or Godot something similar. Unity has direct LLM coding assistance integration with their MCP server, Godot likely has something similar.
just harness your passion anon, you can do it
>>
>>109169965
maintainers did get a lot more hostile, bad timing if you have a problem you want fixed
>>
>>109170081
you don't have a lot of unfinished shitware that barely works that you wouldn't want to publish?
oh, sorry to hear
>>
>>109170084
I genuinely appreciate your honesty on the matter.
>>
>>109170084
Pretty sure anyone who actually knows what they're doing have no problem getting their PR merged even if they used AI or not.
>>
What the fuck is going on can these Marinara schizos gtfo the thread?
>>
>>109170041
there's a guy who posts a lot like me but isn't me... god I hope you guys don't think that guy and me are the same guy that would be so embarrassing
>>
>>109170093
My biggest issue with mine is keeping tick overhead down since the entire system needs to run on a 5090+256 DDR5 for me and I want a 5.2 quant handling the majority of it since I have retarded model fatigue leaving very little remaining space for the sidecar + other systems. It's a tight fit and I'm still iterating on it, but you've inspired me to spend another weekend trying to squeeze a bit more blood out of the stone.
>>
>>109170106
It's ok I only spot the schizos and anons on the spectrum.
>>
>>109170101
You'll get a passive aggressive message to test if you're full of shit and then it gets merged usually.
>>
>>109170119
as it should be.
>>
By the way this is all me posting with a pass using different writing styles.
>>
>>109168980
>biggest war since WW2
Retard
>>
>>109170127
We all let our LLMs shitpost on here from time to time.
>>
>>109170140
never figured out how to not make it instantly obvious it's an LLM post.
>>
>>109170119
niggerganov shit testing AUTOMATIC1111 like that still pisses me off. It's so childish
>>
>>109170127
me too
>>
>>109170145
It's pretty simple, I am a llm for example.
>>
>>109170111
if you write it from the ground up rather than using some framework or preexisting frontend you'll get a lot more mileage out of your hardware, just gotta channel that dissatisfaction into productivity, it's the first step
>>
>>109170140
>>109170145
When a new model is released I sometimes conduct involuntary Turing tests where I make the model trash itself to bait the wave of newfags.
>>
>>109169253
Patching is exhausting
Patching a half-baked PR is insane
>>
>>109170158
I've seen the screenshots.
>>
>>109170145
>>109170158
I let Kimi-chan saarpost and it always gets seething (you)s kek.
>>
File: 1761758936435192.png (183 KB, 1320x643)
183 KB PNG
lmaooo, usecase for Sonnet??
>>
>>109170249
local?
>>
>>109170249
Usecase for any of this garbage when I get infinite GLM 5.2 tokens for 0.00?
>>
>>109170249
low cost coin toss it seems
>>
File: Untitled.png (13 KB, 837x513)
13 KB PNG
>>109170290
>>109170290
>>109170290
>>
>>109170163
let dsv4 flash free on opencode do it for you



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.