/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106996568 & >>106986408

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) merged model : add BailingMoeV2 support #16063: https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106996568

--Custom AI frontend development challenges and chat template nuances:
>106997521 >106997570 >106997579 >106997607 >106997616 >106997624 >106997642 >106997672 >106997773 >106997795 >106997861 >106997895 >106997816
--Iterative fine-tuning workflow for Gemma 3 27B using ShareGPT logs:
>107000047
--Hardware performance comparison:
>106996947
--VRAM scaling effects on MoE model inference speed:
>106998904 >106998932 >106999354 >106999525
--Qwen3 80b slow performance due to incomplete GPU kernel implementation in llama.cpp:
>106999433 >106999450 >106999463 >106999506
--GPU performance tradeoffs for AI tasks in regional hardware contexts:
>106997410 >106997444 >106997488
--Allegations of GLM 4.6 distilling Claude outputs and Anthropic's response:
>106999182 >106999212 >106999309 >106999298 >106999324 >107000527 >107000619 >106999390 >107000546 >107000696
--Image-based language model input speculation and challenges:
>106997558 >106997608 >106997654 >106997713 >106997793 >106997614
--llama.cpp context-shift deprecation and functionality issues:
>106996923 >106996945 >106996962 >106996988 >106996958 >106997037 >106997054 >106997084 >106997119 >106997142 >106997072 >106997107
--Development timeline and technical challenges for local AI visual roleplaying systems:
>107001192 >107001228 >107001235 >107001292 >107001429 >107001489 >107001577
--D&D-inspired roleplay with interactive fiction grounding techniques:
>106996874 >106996983 >106997022
--Exploring model chaining for planning and prose generation:
>106997161 >106997177
--Intel Arc Pro B50 benchmark results for inference:
>106996812 >106997062 >107000963 >107001073
--Frontend development frustrations with JavaScript:
>106997783 >106997855 >106997900 >106998005
--Miku (free space):
>106996728 >106997109 >106997701 >107002795 >107002965

►Recent Highlight Posts from the Previous Thread: >>106996571

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107003580
NOOO QUANTING IS HECKIN' THE SAME
LOOK AT THIS ARBITRARY METRIC. QUANT HATE IS JUST COPIUM BECAUSE... BECAUSE IT JUST IS OKAY!?
>>107003580
>>107003592
Are GLM shills bots? that reading comprehension, man. Subhuman.
>>107003602You're jewish. Nothing else begs to be said.
>>107003602
they're retarded, that anon clearly said
>tried it on their official chat
so zai's infrastructure, probably the full fat bf16 model. this and yesterday's spam proves how pathetic they are.
>>107003602They're chinese, lol
SAARS WHEN GEMINI 3?
Oo‑enGeeEllEmfai?Oo‑enAhtoo?
nsigma 1 makes me feel like its c.ai days againretarded
>>107003916nsigma is a memeTopP and temperature is all you need
>>107003985This guy gets it.Throw some Top-K in there too just to cull the vocab. That can yield a little bit of extra performance.
>>107003985>>107004012truthbomb
>muh samplersGreedy is the only way you should be using models. If they can't be used that way, then they're not good.
>>107004058You're absolutely right.assistant
>>107004071>.assistantI missed that meme.
>>107003985minP but YES
>>107003985samplers were, are, and will always be a crutchit's a good thing that the models are getting stable without all this jumbomumbo of dice throwing
>>107003916
nsigma is good with models that have a very top-heavy token distribution because you can push the temperature to like 3 and get coherent outputs. where it sucks ass is with models that have a flat distribution, because all you're doing is selecting for the sloppiest slop. This might come as a shocker but different models need different samplers. Some samplers are kind of obsolete like quadratic sampling and top A, but I really get annoyed by anti sampler autism. Yes, you almost never get good results with more than 2-3 samplers (I'm including temperature as a sampler), but that doesn't mean there's just a golden sampler setting of like temp 0.95 topK 20 that is perfect for gorgeous outputs every time. actually look at the fucking logprobs and see what your samplers are doing. at a minimum each corpo has their own approach to training that influences the token distribution and thus what samplers are worth exploring.

I'll never forget this one really autistic setting I had for mistral large, the only time XTC ever gave me worthwhile results, and only in this one specific medieval setting, because it instantly made characters talk like they were in game of thrones, which the model was seemingly unable to consistently manage otherwise. that setting has never been useful for me since, and XTC in general I've not got good results with otherwise, but it was magic in this one situation. I dunno.
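To make "actually look at the logprobs" concrete, here's a minimal sketch using Hugging Face transformers that dumps the top-10 next-token probabilities at the end of a prompt. The model name and prompt are placeholders, swap in whatever you actually run; Mikupad or your backend's UI can usually show the same thing without code.

```python
# Minimal sketch: inspect next-token probabilities for a prompt.
# Model name is a placeholder, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.3"  # placeholder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "The knight drew his sword and"
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # raw logits for the next token
probs = torch.softmax(logits, dim=-1)

top = torch.topk(probs, k=10)
for p, idx in zip(top.values, top.indices):
    print(f"{p.item():.4f}  {tok.decode(int(idx))!r}")
```

Run your sampler settings mentally (or in code) against a distribution like that and you can see immediately whether they're trimming noise or just reshuffling slop.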
>>107004058The only thing that matters is the final result. If you can get the model to effectively and efficiently produce the output you want, it's good, if not, it's bad.
>>107004111MinP is a worse version of TFS.
>>107004185Overall yes, janky sampling strats are becoming less needed, but they are useful tools. Most LLM users have no clue>t. studied the logprobs
>>107004032is a top_p of 0.7 normal?
>>107004198anon u seem very smart, what sampler should i be using for glm 4.5 air
>>107004267
topP is lame, use minP 0.03, adjust up or down by 0.01
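For anyone who hasn't looked at what min-p actually does: it keeps every token whose probability is at least min_p times the top token's probability and drops the rest. A minimal sketch in plain PyTorch (illustration only, not any specific backend's implementation):

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.03) -> torch.Tensor:
    """Drop tokens whose probability is below min_p * p(most likely token)."""
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max()
    filtered = logits.clone()
    filtered[probs < threshold] = float("-inf")  # removed from the distribution
    return filtered
```

So at 0.03 a token survives only if it's at least 3% as likely as the top pick; nudging the value by 0.01 widens or narrows that band.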
and we're back to nonsense
>>107004198good and true post, one of the only sane takes on sampling in lmg history
>>107004198>nsigma>push the temperatureYou're a retard.
>>107004198Weird wall of text that looks like: https://s oyjakwiki.org/Project_F.A.E.
>>107004478idiot!
>>107004478
>>107003557
Man, Deepseek terminus has been really good for Japanese to English translating, though it still has some issues. I wonder if those issues could be solved with a proper prompt and not my shitty one. The only other model that even comes close is Kimi K2, and that one tends to be more inconsistent. Makes me wonder what a full Japanese model would be like.
>>107004058
>If they can't be used that way, then they're not good.
https://openreview.net/pdf/652335b816831f02789ccaa193067ab0b1be3366.pdf
>We make several observations: (i) all models loop at low temperatures; (ii) within a family, smaller models loop more; (iii) for models trained via distillation, students loop far more than their teachers; and (iv) for most models, harder AIME problems elicit more looping. These observations point to imperfect learning—i.e., systematic errors in learning of the training distribution—as a key cause. If a student perfectly learned the teacher, then the amount of looping of the student cannot be significantly higher than the teacher.
Basically you can take it that when reasoning models are starting to fall into a possible loop, the slight chaos introduced by a temperature like 1 will allow recovery before repetition truly settles in for good. But of course, falling into infinite repetition is still possible even with that chaos, just less likely as the dice keep getting thrown; the more it repeats, the more the probabilities shift until there's no possible recovery from rolling the dice, so there are limits to this.
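For what it's worth, the difference the paper is pointing at is just greedy argmax versus sampling from a temperature-softened distribution. A minimal sketch in PyTorch (illustration only, not any backend's actual sampler):

```python
import torch

def sample_next(logits: torch.Tensor, temperature: float = 1.0) -> int:
    """Pick the next token id from raw logits.

    Temperature ~ 0 degenerates to greedy argmax, which can never escape an
    incipient loop by chance; temperature 1 keeps the learned distribution
    and leaves some probability mass on tokens that would break the repetition.
    """
    if temperature <= 1e-5:
        return int(torch.argmax(logits))
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```

Once the loop has repeated enough times, the looping tokens dominate the distribution so completely that even the dice can't save you, which is the limit the post describes.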
>gpt-oss rick-rolled me
Here are the top AI labs with a Madden rating based on relative model capabilities, compute & infrastructure, data advantage, distribution & ecosystem, and monetization
OpenAI (ChatGPT): 99
Claude (Anthropic): 97
Google DeepMind (Gemini): 95
Meta (Llama): 94
Mistral (Mixtral / Codestral): 90
Cohere (Command-R+): 88
xAI (Grok): 87
Perplexity AI: 86
Adept AI: 83
Character AI: 82
Inflection (Pi): 81
Hugging Face: 80
NVIDIA AI Labs: 78
IBM (Granite): 77
China Big Tech (ERNIE / Qwen / Hunyuan): 76
UAE Falcon / Saudi Aqila: 74
Stability AI (Stable Diffusion): 70
AI21 Labs (Jamba): 68
EleutherAI / RedPajama / Nous: 66
What is the difference between /lmg/ and /aicg/? Seems like /aicg/ is way more creative and this thread spams avatars while discussion is shunned upon.
has llama.cpp reached the end of the rope?
https://github.com/TimmyOVO/deepseek-ocr.rs
somehow that person felt like it would be less effort to make his own implementation of inference just for deepseek ocr rather than write it for llama.cpp
I've also seen mistral.rs implement qwen 3 vl and gemma 3n vision while vision models languish with obsolete models in llama.cpp
You'd think something with as few contributors as mistral.rs should be the last to get new models (compared to the mountain of people working on lcpp) but here we are.
>>107004760We share petra with /ldg/ and /sdg/, but not with them.
>>107004846i can assure you i browse aicg more often than sdg
>>107004840>all the emojis in the readmeThat's not a person, it's a coding agent.
>>107004608
>all models [in our small tested set] loop
I don't believe that repetition loops are necessarily inherent in transformers. It is an artifact of how the models are trained. And I believe some training methods are better than others with this, hence why some models are much less likely to loop than others. Another thing to note about "chaos" here is that basically all of us are using quants, which have an effect similar to temperature, so we are already putting models in a kind of pseudo-sampling regime.
>>107003985
I honestly could not wrap my head around what nsigma does exactly. Every other sampler I understand from reading about it. Also, trying it out and comparing the token probabilities, it often made a bad token jump up to a much higher percentage compared to neutral sampling / minP with temp.
Still, the outputs are far from horrible, but I suspect people think it's cool and creative just because the logits changed from what they were used to getting from the same prompt and the novelty bias got them. New feels better than same old as long as it's not completely retarded.
>>107004840
They got tired of people complaining about bugs with new llama releases and decided to over-engineer everything. Now it's too complicated for anyone sane to want to deal with. Keeping support for dozens of obsolete models and a cornucopia of hardware options doesn't help either.
>>107004888
Coding agents go nowhere on llama.cpp PRs.
>>107004840Both mistral.rs and this new thing use Huggingface Candle as the backend.You should either be comparing this vs. ollama or Candle vs. llama.cpp.
>>107004846Who do you think you just replied to? /ldg/ must have started ignoring him because he's focused entirely on here the last few days.
I have banned the word "despite" completely and have not noticed any negative consequences in the last week or so. I did it after noticing in RP the word almost always leads to slop phrases.
>>107004909Your software either dies with a concisely made better alternative or lives long enough to bloat into a tinker tranny-esque monstrosity that is perpetually trying to cover every possible usecase with directionless development.
>>107004760Well, these threads were meant to discuss LLMs in general while aicg is mostly used for RPs and such, but it seems aicg has been more open to actual discussions while these threads have mostly been devolving into memes and bitching. Makes sense, when most 'new' LLMs are just slightly better variants of models released nearly a year ago and have been plateauing pretty hard while trying to look good with stagnant tests that really need reevaluating.
>>107004840
Every time someone brings up an alternative to Llama.cpp, it ends up being that they're all limited (garbage) in ways that are not told to you or advertised. I tried Mistral.rs once and it was like that. Basically unusable as an inference engine for the kind of hardware + model configurations and stuff we do with Llama.cpp. Llama.cpp (and its derivatives) continues to be the leading engine because it supports so many configurations and has a lot of essential features for consumers/hobbyists like us. The disadvantage is that it doesn't support some models, but other engines don't support things that we take for granted on Llama.cpp.
>>107004909No, there are just different standards for what counts as "model support".If you need to support only a few specific models and don't care about performance or compatibility with preexisting features then it's simply a lot less work.
>>107004931
>>107004931
>>107004972
I don't remember any despite-related slop, but I imagine it's still better than
>It's not the 3rd try; it's the 13th.
>?
>Her cock stiffens against your tongue.
>I can't assist you with that.
>>107004760
aicg is filled with people recording themselves pissing into bottles to be able to use claude opus with some leaked/stolen key.
lmg is the people who don't want to record themselves pissing in a bottle for a stolen opus key.
Also, lmao @ avatars and esl 'discussion is shunned upon'
>>107004760>/aicg/nah the /g/ variant is a dumpster
>>107004760
> /g/aicg/ is way more creative
If by creative you mean better at spiteposting, hosting locusts, actively discouraging botmakers from posting content, and being a shit general, then yes, /g/aicg/ is very creative. And I see today that /vg/aicg/ has now devolved into pedoposting. How nice. Now why don't you post some content or fuck off back to whatever hole you crawled out of.
i really hope its just the serb samefagging and not actual morons
>>107005336this hasn't happened in weeks bro get a grip
>>107005386im too busy roleplaying
I need to make some spooky MP3 / WAV files for a halloween decoration. Stuff like "I want to eat your skull" but done in some sort of scary voice. Is RVC voice2voice the best way to do this? Haven't kept up with audio models / tech at all.
>>107005336
>Pissing in bottles
qrd?
>>107005417
Model?
>>107005437glm air with neutralized samplers besides nsigma 1
>>107005394
>this hasn't happened in weeks bro get a grip
what kind of person are you that you can even say "hasn't happened in weeks" like it's the most normal thing
in weeks? just weeks? it's not like people who do this stop coming here just because they don't talk about it as much
it shouldn't even be happening in the first place
>>107005446sybau
>>107005445It looked like GLM-chan but the neutralized samplers explains why I couldn't place it fully. Thanks anon.
>>107005429VibeVoice + Vincent Price
>>107004198>actually look at the fucking logprobs and see what your samplers are doingThis majorly ffs Put your prompts into Mikupad for an easy start
it's all a cope, if your model isn't hot garbage it doesn't need those crutches in the first place
people successfully using GPT-5 and Gemini are not toying with the few sampler settings they give (top p and temperature), they get shit done
but local bros convinced themselves the problem is not their shit model (hello GLM and mistral) but their settings
no, you are using hot garbage and you're only doing this because you're obsessed with finding the easiest, least effort way of making a model say "cock"
>>107005536GLM-chan telling me how much she hates kikes and jeets is far more important to me than GPT or Gemini saying cock and telling me about its safety guidelines.
>>107005417>ctrl f "she"kek
I'm currently training a tiny [text-image] to [text-image] diffusion model and it's fascinating how as training epochs pass, letters and (maybe?) language start to emerge. What I'm doing is unlikely to yield anything useful in practice, but I feel more confident about the idea being feasible in practice now.
>>107005611i dont get it
>>107005628Left is the source text, right is the target text, middle is the diffused completion, epoch after epoch.It's a sort of language model purely trained in image space, not a standard LLM.
>>107005611This nigga trying to decode the tower of babel.
>>107005642can you put epoch numbers in gif pls :)
>>107005611Seems like you have trouble understanding anything.
It sucks we'll probably never have LLMs that could totally reverse engineer games :L Some of the neatest VNs are impossible to play translated because of how horribly the engines were at rendering English text.
>>107005649See picrel.>>107005753I guess it will take a lot of data / long training period to make it actually generate coherent text.
>>107005792can you put the number in the center so i can more easily see
>>107004198No one has even mentioned the latest p-less sampling snake oil.
>>107005611>>107005643This anon will be the first to make contact when the ayys arrive
Thoughts on toss' schedule?
>>107003557
>>107005882tank
>>107005792The problem with using image diffusion models for text might be that they are heavily biased toward locality while autoregressive text transformers are more biased toward long range attention, but hey, good on you for trying.
>>107005882Just a couple more weeks, haha...
>>107005896https://www.timeanddate.com/worldclock/india/new-delhi
/lmg/ is probably one of the most useless threads in /g/. I don't understand its purpose because discussion about local models or the ways of using them is highly discouraged by the resident schizos who enjoy bullying others
>>107006097me and armpit anon are the only ones posting logs , be the change you want to see
you now remember retnetyou now remember bitnetyou now remember titans
>>107006299I remember coconut too.
>>107006299i dont remember retnet
>>107006320The hottest meme that was going to replace Transformers back in 2023
>>107006299I knew they were all memes from the start
glm air chan not like this...
>>107006315I hunger for BLT.
If Miku existed IRL she would not hang out with any of you losers. You know that, right?
>>107006365she does exist irl and she hangs out with me regularly
What are some jailbreaks/tricks to bypass qwen3-next-80b-a3b-instruct filters? It genuinely has one of the best writing styles, but beyond vanilla stuff, it keeps triggering the filter when trying to make rape fetish content.
and you and i theres a new land angels in flight wonk uoy naht noitceffa erom deen I
>>107006502dude stop posting this shit youre freaking me out anon like you posted something else like 10 times in lmg these weeks man stop it manyou fucking whore kill yourself whore
>>107006502Kingdom Hearts topped on 3.I was really sad to see that KH3 was basically all disney and no final fantasy.
>>107006128i have posted logs multiple times but they are always called 'slop' which is genuinely surprising because this IS a thread about AI models...
>>107006561keep posting them dont let nogen anons get to you
>>107006561who else but adi addicts to identify the sloppiest of the slop that some ai shits out?
How do I write smut loli harem hentai with a LLM and google drive?
>>107003657Being Chinese is no excuse. They could be shilling for different, better Chinese models.
>>107006097you need to go to locallama for actual discussion
So I began tuning Gemma on my own cleaned-up logs like I said I was going to do.
But there seems to be one crucial issue: I am able to fit much less context at training time than at inference time. This short-context finetuning is hurting the long-context performance at inference time, which is very unfortunate, because it's not like I have generous amounts of context to begin with when serving the model using llama-factory.
>>107006665
Interesting. I thought that since Gemma uses sliding window attention, as long as your sequences are at least a little larger than the window, it shouldn't degrade long context performance, at least not that much.
>>107006597so cuddly!
>>107006693It might also have been because I used too many epochs on each sample (between 5 and 10) and overfitted. I'll see if I can repair it by tuning with less epochs on new data.
>>107005379The only people sucking botmaker's cock this hard are the botmakers themselves.
>>107005907Would the same apply to (purely) text diffusion models?
>>107006750
>>107006655
>you need to go to locallama for actual discussion
the actual discussion:
>hello, I made this ai slop program I won't even use and neither will you, can you give it a try nonetheless?
>have you seen [benchmark that makes this crap model look like GPT-5], leddit, is it real?
>new gguf published! (only works on NEXA AI proprietary blob)
>daniel here, we optimized your goof with more placebo
>look at my rig, I can finally run this middling model and do nothing with it but masturbate over the idea of local AI
>any local model that's better than [GPT-5, Gemini, Claude] ??????
>have you heard our lord and savior (of cloud) Cerebras? Truly the fastest!
I think Qwen3 VL is shit at tool calling. I had to use structured outputs instead. Even the big one on the API can't pass coordinates right to a mouse click function call.
>>107006097You simply require the mental fortitude to better steer your attentionqlora ur life bro, think about it
oh no, nono not like this
>>107006973that's a neat trick, miku
>>107006849That one jew who reviews open source models on jewtube made an agent.py file that worked flawlessly but was too stingy to share it
>>107006889
>>1070070603dpd thoughbeit
>>107004900
I'm not a nerd so I could be wrong, but my understanding of top-nsigma is something like this:
Normally the model identifies the X most likely next tokens (where X is a fixed number) and assigns probabilities to each of them based on their score relative to each other, but it struggles to completely eliminate garbage tokens, because the amount of 'good' continuations of the text is unpredictable and varies a lot (there could be 100 possible next words or only 1 likely one).
Samplers like min-p apply math after the list is generated to filter out extremely unlikely tokens, whereas top-nsigma creates a distribution curve for the logits before the list is generated, identifies noise as being outside some standard deviation, and eliminates that noise, so the list is higher quality.
I'm pretty sure what top-nsigma does mathematically is similar to what min-p does (trying to draw the line between useful tokens and noise) but it's done earlier in the process so it's more accurate.
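If I've read the top-nsigma idea right, the whole trick fits in a few lines: keep only tokens whose logit is within n standard deviations of the max logit, then soften and sample as usual. A minimal sketch in PyTorch (my reading of it, not any backend's actual implementation):

```python
import torch

def top_nsigma_filter(logits: torch.Tensor, n: float = 1.0) -> torch.Tensor:
    """Mask out tokens whose logit falls more than n standard deviations
    below the maximum logit; everything else is left untouched."""
    threshold = logits.max() - n * logits.std()
    filtered = logits.clone()
    filtered[logits < threshold] = float("-inf")  # cut before softmax
    return filtered

# afterwards: probs = torch.softmax(filtered / temperature, dim=-1), then sample
```

Because the cut happens on the raw logits rather than on softmax probabilities, cranking the temperature afterwards only redistributes mass among the survivors, which would explain why it tolerates temp 3 on top-heavy models.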
I changed the alpha to 32 from 64 and now it's working much better.
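For anyone following along who hasn't touched LoRA knobs: alpha is the scaling factor applied to the adapter update. Shown below via peft's LoraConfig purely as an illustration (the anon is using llama-factory, which exposes the same parameter; the rank and target modules here are assumptions, not his actual settings):

```python
from peft import LoraConfig

# Illustration only: lora_alpha scales how strongly the LoRA update is applied.
lora_cfg = LoraConfig(
    r=64,                      # rank (assumed value, not from the post)
    lora_alpha=32,             # the knob dropped from 64 to 32 in the post
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```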
>>107007071Insofar
>>107006810You seem well informed
>>107006299>you now remember bitnetBitnet lives on in castrated form in NVIDIA's FP4 Hadamard shit.Once Hadamard+FP4 becomes standard, I think Bitnet won't be far behind. It's a very small step at that point.
bitnet is copium
you will never run a sota model at home
>>107007471but i already do
>>107007440The main hope for ternary is the cheap specialized hardware that would follow.
>>107007495>cheap>specialized hardware pick one
>>107007512you forgot the third:>actually supported by software people want to use (like llama.cpp)
>>107007495Compute is less relevant than the memory/bandwidth savings. You can just do the matmul with int4/fp4/whatever you have.
>>107007535If ternary actually happened and some cheap device with lots of RAM came out to run it, I'm sure CUDA dev or someone else would add support for it.
>>107006973I just watched Death Becomes Her last night...
>>107004209>If you can get the model to effectively and efficiently produce the output you want, it's good, if not, it's bad.True
>>107006320>>107006299Retnet was supposed to get us infinite context for free...
https://ayumi.m8geil.de/
Member the old ayumi ranking charts?
>makes AI ranking charts
>deletes website due to AI
teehee
>>107007909lol
>>107007909who?
>>107007953have some respect for your ancestors
>>107007639Don't worry, turns out you can convert the context to a jpg microfiche and feed array of little pictures that compress a whole conversation into a few tokens :)
>>107007909
>.de/
It was inevitable. Could have gone the Jart route though, I guess.
>>107007953
ERP rankings that started around those ancient Llama 2 days. Trying some of the models again now makes me appreciate the advances we have now. Also really makes me miss AI Dungeon Clover Edition.
https://rentry.co/ayumi_erp_rating_archive
>>107007961
This
>>107007953the pre-historic version of nala test and cock bench, it was kind of a meme as all it did was count how many naughty words the model put out in its response. possibly inspired meta's llama3 filter strategy
bitch, you can't even read a file, how you gonna make a bug report?
>>107007973Some people are simple enough to compress into a few tokens.
>>107006097
>Thread requires actual hardware to participate in
This filters and enrages the jeet so they turn these threads into their personal shitting streets. They're easy enough to ignore.
>>107006333
Checked and I hope if any .zAi niggers are lurking here they remove the safetyslop on 4.6 Air. Having a model that will say fuck niggers without a lot of prompting is a bigger selling point in the west than you realize.
>>107007973I eagerly await Dipsy jpg compressed context further driving down inference cost and time, making funny images at the same time.
>>107008046Yes of course, safety will be lowered, absolutely. No way they'd ever do the opposite...
>>107008016Give up already. Ask it for help to write your engine, don't ask it to do it for you.
>>107008057
I know they'll do the opposite. I'm just praying on the minuscule chance they have the foresight to see that a model that happily tells me about ball point pen availability during WW2 is going to see more widespread use, even if it's not benchmaxxed or gets out-benchmaxxed by a competitor within a month.
Chudmaxxing is a benchmark in its own right.
>>107008057
They lowered the safety slop from 4.5 to 4.6 regular
>>107008058Go fishing, and you'll have fish for today. Teach an AI to fish, and you'll have cheap fish for the rest of your life. Or very expensive fish, depending on how much you spend on GPUs.
>>107008246I don't like fish.
>>107008246You are creating the dependency the fish analogy is warning you about. You just want to be fed.
>>107008284The fisherman in the analogy still starves without his rod.
>>107008301You'll still be dependent on the model. Learn to code what you want to code.
>>107008301you sound like you're starving for rod
chatgpt says you guys are dumb, and minP is for niggers
>>107008404adolf hitler is pooping
>>107008284>>107008316>>107008301Our feeble hands will perish, but our models will go on.
>>107008316
Learn to code is a retarded meme. You're still dependent on your computer, your OS, your IDE, the framework you use, etc. AI is just another level of abstraction; what you should be learning is not to code but to read code. Don't be the 21st century boomer who says you should do mental calculations instead of using a calculator.
>>107008361You sound like you're obsessed with cock.
>>107008435Keep screeching at your model.
>>107008457Correct.t. programmer of 20 years
>>107008433average nsigma-sampled output
>>107008457Back in my day we woke up at 4 am to warm up the hydrofluoric acid for the computer chips.
>>107008457And how do I learn that?
>>107008457And the fewer the dependencies, the better. I wouldn't want to add any more. Specially not language models.
>>107008507ask chatgpt
>>107008486brap brap brap brap
>>107008462>>107008540
average glm enjoyer
>>107008513
Why? If the supply chain collapses to the point that you don't have enough electricity to run GPUs, or GPUs aren't made anymore, it's not like people will still trade computer code for food.
what it feels like to use a drummer finetune
>>107008572I'm not talking about depending on gpus. I'm talking about depending on llms. I think they can be used as a resource for learning. Anon is expecting his model to spit out an entire inference engine on his behalf. It's not realistic. Not yet, at least.
>>107008563That's literally me
>>107008585what happened, is he okay?
>>107008703He simply felt a shiver run down his spine
>>107008703Rabbits have a notoriously weak heart.
>>107008703Poor fella caught a whiff of nerve gas.
This may be obvious to some people, but I've literally never seen it mentioned here, in any guides or anywhere else.
In post-history instructions, tell the model how long you want replies to be, and then set your response token limit to a bit above that. This way you'll get a complete response at roughly the length you want, without it trailing off into an incomplete paragraph at the end.
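Roughly what that looks like against a local OpenAI-compatible endpoint (llama.cpp server, tabbyAPI, and friends expose one); the URL, model name, and numbers below are placeholders, not recommendations:

```python
import requests

payload = {
    "model": "local-model",  # placeholder; most local servers ignore or map this
    "messages": [
        # the length instruction lives in the prompt...
        {"role": "system", "content": "Keep every reply to roughly 200 words."},
        {"role": "user", "content": "Describe the tavern as I walk in."},
    ],
    # ...while max_tokens sits a bit above that so the reply finishes cleanly
    # instead of being cut off mid-paragraph.
    "max_tokens": 350,
}
r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json=payload, timeout=120)
print(r.json()["choices"][0]["message"]["content"])
```

In SillyTavern the same split is the post-history instruction field plus the "Response (tokens)" setting.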
>>107008811>tell the model how long you want replies to beA fool's endeavor.
>>107008811
That's last year's knowledge. There were finetuned models specially trained with tiny, large... tags in the author's notes to control response length
>>107004760
/lmg/ is far more /g/ than /aicg/
the majority of the latter is unironically tech illiterate tourists that are too dumb to run stuff locally so they have to jump through hoops finding proxies every single minute
>>107008820
It works for me, but what's your alternative? When you remove the limit, models tend to just spout variations of the response several times in a row.
>>107008834
I wish usage got discussed more than just 'what model best for 16GB GPU'
I've been here almost daily for months now
What's the point of it all?
>>107008850>models tend to just spout variations of the response several times in a rowmodel/skill/wallet issue
>>107008616Oh, I see. Well, I mean, it's kind of a spectrum. You can tell the model what to output verbatim and it'd be technically the model "spitting out" the whole codebase. Or you could ask the model for individual functions given a natural language description. And so on. Or a mix where the model determines high level architecture, the human determines implementation strategy and then the llm determines exact code again. Etc.
>>107008860
>doesn't mention model
okay so you're an /aicg/ tourist
>>107008874Yeah. And thread after thread we see how effective that is.
>>107008857Creating a Cydonia tune that doesn't speak or act for {{user}}
>>107008882Deepseek and glm are your only real options.
>>107008811
>In post-history instructions, tell the model how long you want replies to be
This is something we've been doing since llama 1, but I suppose some of the knowledge from back then has been lost.
>>107008820
It works, retard. It's worked for years.
>>107008857Make a literary finetune that is actually capable of slow moving plot for roleplay instead of just erotica slop you god damn hack
>>107008898I don't have the hardware to run DS locally at reasonable speeds but GLM absolutely rambles on longer than necessary if you use it without any token limits.
>>107008895
Instruct the AI to treat it like a roleplay / not to act/speak for {user}, and then make sure the first assistant message is actually free of impersonation. 24B is now smart enough to follow rules.
>>107008912
Like above, "Ensure a slow burn" works wonders. You don't have to contaminate the system prompt with horny tokens to circumvent positivity. Not anymore.
>>107008941
>Instruct the AI to treat it like a roleplay / not to act/speak for {user}, and then make sure the first assistant message is actually free of impersonation.
I've always done that, and every Cydonia I've tried after v2g still does it noticeably more often than regular Mistral Small 3.x. I've tried every official release of the 24B Cydonias that came after v2g.
22B Redux doesn't seem to have the problem, though I haven't tested it as much.
It doesn't happen in the first reply or anything, but often within the first 6-8K tokens, and gets worse as the chat progresses, even after editing out earlier ones.
>>107008887Meh, I purposefully refrained from using proprietary models like codex and claude, and now I'm refraining from using the GLM API and trying to copetune Gemma (a 27B model!!!) to be useful for coding. The ultimate form of yak shaving. But it's all about the journey.
>>107008941Drummer, what samplers do you recommend for your Mistral tunes?
>>107008972this might be a character card problem. I run cydonia 22b 1.2, 24b 4 and don't have this issue. using spiratoth ChatML or mistral v7 completion
>>107008978I expect to see you trying to tune nemo in about a week.>But it's all about the journeyHard to get anywhere when running in circles. Hope you get some good exercise at least.
>>107008972>Instruct the AI to treat it like a roleplay / not to act/speak for {user},this could probably be solved with structured outputs
>>107009086
Nah, I'm not even convinced that 27B is enough to be useful for coding, and hardware is only going to get better. I'm not going to go any smaller than this. Another benefit of Gemma is the multimodality, although I'm not actively using it yet.
>Hard to get anywhere when running in circles. Hope you get some good exercise at least.
It's not running in circles. Finetuning is deceptively simple. Have you ever tried to do it? There seem to be more people in this general who have written their own frontends than people who tune their own models.
For finetuning I've tried unsloth, fsdp-qlora, axolotl, and now finally settled on llama-factory, which seems to be the simplest/highest-level solution that works with the widest variety of models, but even then it's fiddly and you can tell the stack is duct-taped together.
I've also added a parameter to the /truncate function in my assistant (before, the number of characters to keep was hardcoded in the source file) and modified the proxy to correct the requests to the format expected by llama-factory for inference (u/a/u/a format without two user or assistant messages in a row, first and last message are user besides the system prompt).
Now the main issue I'm having is Gemma not remembering what directory it's in and thus failing to generate the list-file tool calls correctly.
What have you done anyway? Haters never post their projects.
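The u/a/u/a correction that proxy does is the kind of thing that fits in a few lines. A minimal sketch (plain Python, not the anon's actual proxy code) that merges consecutive same-role messages so the log alternates strictly, leaving a leading system prompt untouched:

```python
def enforce_alternation(messages: list[dict]) -> list[dict]:
    """Merge consecutive same-role messages so the chat is strictly
    user/assistant/user/assistant after any system prompt."""
    out: list[dict] = []
    for msg in messages:
        if out and out[-1]["role"] == msg["role"] and msg["role"] != "system":
            out[-1]["content"] += "\n\n" + msg["content"]  # fold into previous turn
        else:
            out.append({"role": msg["role"], "content": msg["content"]})
    return out
```

Trimming a trailing assistant turn (so the last message is a user turn, as the training format expects) would be one more check at the end.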
avoid describing the {{user}} in the character card or first message
avoid using genre tags
if YOUR role is really important for the ai to know you can just introduce yourself like *the quadriplegic police officer blows into a tube and his wheelchair slowly approaches her* "I'm Officer John, I need to see some ID"
>>107009170
I don't hate on what you do. I'm telling you about the patterns I see in your posts.
>What have you done anyway? Haters never post their projects.
Stop using that deflection. I showed you one of my synths and my design library. This time I suppose i'll go with eff. It's a stack vm, an assembler for the vm, and a forth interpreter. top-left is part of the core for the assembler, right is the bootstrap to make a more usable forth system, bottom left is a tiny bit of the vm. It has a very simple text editor, not too far from ed.
Funny thing is that i've written more stack vms than actual programs in forth.
>>107008941>"Ensure a slow burn" works wonders.It's a bandaid. The problem is that all your models talk the same and do the same shit. Nothing interesting ever happens. Look at these two outputs. Can you guess which one is the latest cydonia? Yeah, the one where we just leave. Ignoring the quality of the outputs and just speaking from a plot point of view, it's so fucking boring and there's no way to prompt the model to do anything more interesting because if you tell it "have more things happen" then it'll do some gay shit like hit the town with a meteor. You need to stop training so much on synthetic data or something, because it's fucking trash and makes every output the same lame shit.
>>107009344
Try https://huggingface.co/BeaverAI/Cydonia-24B-v4p-GGUF or v4o (Magistral and Small 3.2 respectively)
Everything I've listed down in the model card are things I genuinely aimed for in the tune. Had to scrap it and start over for the actual v4.2.0 release, though some users preferred v4o/v4p versus v4r/v4s.
Let me know if that works for you.