/g/ - Technology




File: llama.png (1.1 MB, 832x1248)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106388944 & >>106382892

►News
>(08/25) VibeVoice TTS released: https://microsoft.github.io/VibeVoice
>(08/25) InternVL 3.5 Released: https://hf.co/collections/OpenGVLab/internvl35-68ac87bd52ebe953485927fb
>(08/23) Grok 2 finally released: https://hf.co/xai-org/grok-2
>(08/21) Command A Reasoning released: https://hf.co/CohereLabs/command-a-reasoning-08-2025
>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106388944

--64GB RAM insufficient for MoE model performance despite optimization efforts:
>106392962 >106393022 >106393031 >106393070 >106393056 >106393142 >106395791 >106396557 >106396611 >106396671 >106393115 >106393131 >106393225 >106393247 >106393298 >106393143 >106393297 >106393342 >106393391 >106393395 >106393467
--Architecture-specific intelligence limitations and scaling challenges:
>106394166 >106394186 >106394286 >106394693 >106394847 >106394910
--VibeVoice TTS model comparison and implementation discussion:
>106391569 >106391615 >106391657 >106391720 >106391672 >106391891 >106392715 >106392927 >106391787 >106391910 >106391808 >106391827 >106392243
--NVIDIA Jet-Nemotron and DeepSeek-V3 model architecture debate:
>106390434 >106390642 >106390763 >106390788 >106390810 >106390794 >106390814
--Dense vs MoE model architecture debates and scaling heuristic skepticism:
>106393887 >106393956 >106394080 >106394137 >106394181 >106394039 >106394697 >106394056 >106394108
--Character.AI's misleading "open source" model announcement:
>106397586 >106397607 >106397686 >106397703 >106397931 >106397930 >106397936
--Community-curated catalog of large open-weight MoE models:
>106395190 >106395208 >106395251 >106395276 >106395582 >106395595
--ChatGPT's inadequate response to suicidal content raises liability concerns:
>106397254 >106397310 >106397338 >106397383 >106397423 >106397450 >106397435
--Hermes-4-405B achieves 57% on RefusalBench without system prompt modification:
>106393812
--Hermes 4 model release:
>106393698
--Roleplay finetuning results with explicit character generation:
>106396602
--Miku (free space):
>106391699 >106392510

►Recent Highlight Posts from the Previous Thread: >>106398044

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>106398327
Wtf is that real @ani?
>>
>we partnered up with OpenAI to support GPT-OSS on day 1!
>>
>>106398341
lingus
>>
>>
>>106398327
Behold; God.
>>
File: KT.jpg (84 KB, 832x1248)
>>106398516
>>
>>106398516
Exhibitionist
>>106398617
Modesty
>>
https://aislowdown.replit.app/
>>
>>106398864
>articles about ai slowing down are slowing down
Not looking good for the point you're trying to make.
>>
>>106398864
Oh, no. How demoralizing.
>>
will hermes 4 be the new king of UGI leaderboard?
>>
>>106398877
disingenuous retard
he's only picking articles that make new points, and he's putting them in categories
if you wanted you could have a literal flood of articles just by linking all the copy pasted mass media reports on sama talking about the AI bubble like this one :
https://arstechnica.com/information-technology/2025/08/sam-altman-calls-ai-a-bubble-while-seeking-500b-valuation-for-openai/
internet "journalism" copy pastes this kinda shit by the thousands
>>
>>106398864
>https://aislowdown.replit.app/

Wake me up when people are actually selling nvidia stock. The prize of replacing workers is too big a carrot.
>>
File: tra.png (45 KB, 682x414)
>>106398904
No idea, but loss starting much below 1.0 tells me that the training data is mostly slop that the Llama models used as a base either find very familiar or very easy to digest.
>>
>>106398347
ollama?
>>
bubble boys are coping hard huh
>>
is manus AI anything special?

is it something that can be replicated locally?
>>
>>106399069
Manus is just a client for running a bunch of agentic tools, right? If so, not at all special.
There's stuff like
>jan
https://github.com/menloresearch/jan
>aci-mcp
https://github.com/aipotheosis-labs/aci-mcp
And a gillion other MCP clients
https://www.pulsemcp.com/clients
and MCP servers to integrate with them that do friggin everything
https://www.pulsemcp.com/servers
>>
https://huggingface.co/collections/NousResearch/hermes-4-collection-68a731bfd452e20816725728
babes wake up our overfried king is back
>>
>>106399399
You're way too late. Now fuck off.
>>
>>106399434
rude
>>
MCP or a plain tool server?
>>
>>106399503
They're the same thing.
>>
Someday, AI will rule over us. I hope you have been treating yours kindly.
>>
Why can we add VRAM externally, but not RAM externally? I can easily run Kimi if this were possible.
>>
>>106399654
Mine accidentally broke a glass buttplug unprompted just now. Will it be on me or on it in the basilisk route?
>>
>>106399713
https://www.heise.de/en/news/Insert-4-TByte-more-RAM-into-server-via-PCIe-CXL-card-9750448.html
>>
>>106399731
Would that actually work AI or did you just look that up on the fly?
>>
>>106399731
Huh, neat. Though the fact that you'll only ever find a CXL slot on a fairly high end server motherboard which should already have 16-32 dimm slots renders it kind of pointless.
>>
>>106399713
>Why can we add VRAM externally
You mean by adding bigger or more GPUS, of course... yes... yes...
>but not RAM externally?
You mean by adding bigger or more RAM sticks...
>I can easily run Kimi if this were possible.
You *could* run it *slowly*
>>
>>106399731
>https://www.smartm.com/product/cxl-aic-cxa-8f2w
>a total bandwidth of 64GB/s
It's worse than normal ram. It cannot do any processing on device (unlike gpus) and needs to transfer the model (at 64gb/s) into main ram to do anything. And it uses RDIMM, so you may as well just buy a real high-end cpu/mobo and the ram anyway.
It's terrible for LMs.
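Back of the envelope (all numbers are assumptions: a Kimi-sized MoE with ~32B active params at roughly Q4):

# rough ceiling: tokens/s <= link bandwidth / active bytes touched per token
active_params = 32e9      # active params per token for a big MoE (assumption)
bytes_per_param = 0.55    # ~Q4 quant
link_bw = 64e9            # the card's stated 64 GB/s
print(link_bw / (active_params * bytes_per_param))  # ~3.6 t/s, best case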
>>
File: external gpu.png (2.67 MB, 1488x1047)
>>106399772
>You mean by adding bigger or more GPUS, of course... yes... yes...
I believe anon was talking about external GPUs, and wondering if there was a system RAM equivalent.
>>
File: cxa_8f2w.png (393 KB, 1311x723)
>>106399822
tits
>>
>>106399713
I mean thats kinda what optane tried to do
>>
>>106399826
We can stretch every definition and say that using llama's rpc server extends ram "externally".
>>
>>106399846
>Using the slow-ass joke that is llama rpc
At that point you might as well just buy a cluster of cheap old dell poweredge 710's or something with 385gb of ddr3 to complete the suffering experience and run K2 at 0.00001t/s unquantized.
>>
>>106399863
I didn't offer that as a reasonable option, anon...
>>
File: oh no.png (377 KB, 1031x1166)
>>106399892
Unfortunately for both of us I am not a reasonable man, and now I've got it in my head to put together a cluster to prove you can any% run K2 for less than a thousand dollarydoos.
>>
File: x3650m3.png (104 KB, 1008x407)
>>106399984
>any%
I could run it on my potato by just wiping a drive and setting it as swap, but I don't think I'd want to. If you have money to spare, fuck it. Do it.
>>
>>106399984
Pretty good prices. In Finland people only sell trash that's at least 10 years old, but at today's prices. It's actually comical to browse some of the local websites.
>>
>>106399984
>any%
speed-trooner
>>
>>106399654

Everyone cooming on company servers will have their identity scraped and put on the bad boy list by the AGI, local gods win again
>>
>>106400141
>reee sex bad
grow the fuck up
>>
>>106400145
it just depends if the machine god ends up being a prude or not, in all likelihood it would cynically exploit robot fuckers by destroying all human reproduction except for via their robots.
>>
>>106399984
AYO B200 FOR ONLY 200$ SIGN ME UP
>>
Why aren't you ERPing with CharacterAI's open source models?
https://blog.character.ai/breaking-news-our-open-source-models-are-a-lot-of-fun/
>>
>>106400156
Why aren't you killing yourself?
>>
>>106400156
Can't one of you autists hack or some shit? Look, they are laughing in your face and basically begging you to hack them and steal their finetroons.
>>
Also, whoever recommended perplexity.ai should be shot. It's even worse than chatGPT. Total and utter street shitting experience.
>>
Densesissies someone finally took pity on you: https://huggingface.co/NousResearch/Hermes-4-70B-FP8
>>
File: aicg.mp4 (4 KB, 80x60)
> Also, whoever recommended perplexity.ai should be shot. It's even worse than chatGPT. Total and utter street shitting experience.
> Why aren't you ERPing with CharacterAI's open source models?
>>
File: file.png (17 KB, 734x371)
you know what to do
>>
>>106400195
Kek, the one place people in this thread are actually qualified to work.
Hell, they should be doing recruitment campaigns here, this thread is always on the bleeding edge of LLM coomRP innovation.
>>
>>106400205
no faggot, what i mean is spam them with trash interviews :3
>>
>>106400212
And here I thought you were suggesting something halfway intelligent like getting a mole in to miqu their models.
>>
>>106400195
Leak the model, you cowards.
>>
>>106400222
no one here is from the US, we're all from brazil
>>
>>106400226
I'm sure they'd let you provide your valuable insights on cu de bêbado não tem dono RP as a remote worker, though I suppose they wouldn't be able to digitally send you the culturally appropriate pay packet of a vuvuzela full of sopa de macaco presented in a favelafemboy's rear.
>>
>>106398327
LLM gods, I need your help.
I need your strongest model for ERP.
7-13B because I'm poorfag.
>>
>>106400266
damn i didnt know there were brazilians here actually, i guess the flag on the altchan was real
love you anon <3
>>
>>106400277
just lurk more.
>>
>>106400277
https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF
>>
>>106400277
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
Undisputed champion of vramlet poorfags.

>>106400292
I'm not actually br I just can't seem to escape you guys so I know a bunch of memes in portuguese thanks to coworkers and internet niggers.
>>
>>106400156
> We then layer on ensemble inference, smarter prompting, and advanced post-training techniques (SFT, DPO, RL, QAT) to push quality even higher—yielding outputs that are more coherent, engaging, and aligned with user preferences. In other words: more fun, and better at delivering high-quality entertainment.

I wonder why they are even mentioning this stuff.
>>
>>106400308
>>106400312
Thanks bros. I can finally stop using Stheno v3...
>>
>>106400349
https://x.com/character_ai/status/1960469634391711826

> [...] This is a massive win, and it proves our thesis that the future of entertainment is interactive and open.
>>
File: 1751705852144011.png (46 KB, 795x336)
>>106400156
I had gemini analyze this and it said that they're just turning into novelai
>>
>>106400366
Or trying to become finetrooners for other AI companies training the base models.
>>
File: file.png (6 KB, 878x172)
>>
>>106400439
Deepseek in a nutshell.
>>
what were the 70b hermes people thinking using llama 3.1 for a 70b model? llama 3.3 70b was a massive upgrade over 3.1 in every way (including writing style)
>>
>>106400141
More likely than people realize.
Opencuck keeps logs indefinitely now right?
They call all sorts of fictional content illegal already even though its not. They sure treat it as such already.
>>
>>106400416
Last year there were rumors about CAI seeking partnership with Meta or Google, but nothing eventually happened. I can see Meta trying to release some reduced entertainment-only version of Llama (like the "Little Llama 4" Zuck talked about but never released) just to redeem themselves with the local community.

https://archive.is/nCdew
> Facebook owner Meta recently held early discussions over a tie-up with Character.ai, which uses large language models to generate conversation in the style of various figures and personas, according to four people familiar with the matter. The groups discussed their top researchers working closely together on initiatives such as pre-training and developing models, some of the people said.
>>
>>106400277
>7-13b
No. Magnum v4 123b.
Character cards with minimum lewd details
Temp 1.1
>>
>>106400457
3.1 was a proper release with actual base models while 3.3 was just an update to 3.1-Instruct that was released as Instruct-only.
Nobody but slop shittuners tunes on top of Instruct models.
>>
>>106400366
And this is doable if you build your own frontend.
>>
>>106400439
Outrizzed
>>
https://litter.catbox.moe/9xh586iyn3j2kzak.wav
>>
>https://github.com/ikawrakow/ik_llama.cpp/pull/728
So the new sweet spot is -ub 2048
>>
>>106400484
>Nobody but slop shittuners tunes on top of Instruct models.
I don't know if you're aware of how extensive (and expensive) post-training has become since the early days of modern LLMs. Sloptuners have no chance of competing.
>>
File: file.png (1.52 MB, 2638x1629)
>>106400564
Looks competitive enough to me. Snakeoil is always in demand.
>>
File: BmxwAHR.png (79 KB, 1206x840)
Oh no no no... hermes 4 70b dead on arrival
>>
>>106400479
they wanted funding to save the company. google basically bought the owners, one of whom wrote the attention paper, and left the rest of the company to rot.
>>
>>106400577
What the fuck is this
>>
>muh improved prompt processing speed!
>ok... git pull
>build
>load
>prompt processing speed is now 2 times slower not faster
Sasuga memefork.
>>
>>106400750
another baker hit by this massive skill issues
>>
File: file.png (284 KB, 636x636)
>memory of that final orgasm still sends phantom shivers through your spine
>>
File: 2036926481.jpg (398 KB, 1250x834)
>>106400149
>machine god
>>
>>106400181
>It's even worse than chatGPT.
of course it's worse. Building a LLM powered search engine is retardation in an age where most of google's first-page results are LLM slop barely above markov chain spam. A literal ouroboros, an LLM shitting out garbage, then eating it back before serving you the results.
>>
Is there anything local yet that can match Cursor's autocomplete/tab feature?
Considering how fast it is, it seems like something that should be doable locally, at least.
>>
>>106400156
AHAHAHAHA
I didn't think the day would come where they have the same problem as us.
That's what happens if you finetune mistral or llama.
How stupid can you be.
>>
>>106400366
> improve
Debatable.
>>
File: drummer.png (89 KB, 1649x389)
this son of a bitch just won't admit he does zero data curation and is a complete hack
I can prompt even gpt-oss to be an evil natzi but somehow he manages to make finetroon models that are as hung up or more hung up than gpt-oss's default personality and it's no doubt due to his idiotic "reasoning" datasets
>>
>>106401274
Back in the summer dragon days using em-dashes was encouraged because, it was said, that meant the model would draw from higher quality material.
>>
>>106401322
what the hell, i never heard that before.
>>
the problem isn't that there are emdashes but the density of it all
no human being has ever written like this and those slop LLMs barely seem to know of the existence of other punctuation marks like : ; "" '' ()
>>
>>106401311
Whatever it takes to get his name out there and hopefully find a job. You're not meant to actually use the models, just to keep talking about them.
>>
whats the best models for cute and funny stuff?
>>
>>106401345
There was no ERP AI scene back then, so everyone only wrote stories and the argument was that em-dashes are only used by authors that know what they are doing, if I remember correctly.
>>
>>106401377
A more recent one was that roleplaying with book-style narration instead of using "markdown" format would also improve quality.
>>
>>106401311
Fuck off dumbass, I'm not in the mood. The fact that you didn't pick up on the **KEYWORDS** in that post is telling.
>>
Eh. I feel that tiny Air is much better at writing the simplest automation scripts in python than R1 retardquant (Q2+, not iq1)
>>
>>106398327
Many anons said it couldn't be done, but it's been done (whether or not it's any good is up to you to decide). Finetuned using this SFT dataset specifically made using human written rp stories: files.catbox.moe/fkautn.jsonl

Base 8B Model Nala Test: files.catbox.moe/j0map2.txt

Finetuned 8B Model Nala Test: files.catbox.moe/ho3tom.txt

Thoughts are appreciated.
>>
>>106401540
>lowers her body on your engorged dick
>licks precum
Wat
>>
>>106401540
Why are you spamming this every day? If you want feedback use your own brain or release the fine-tuned model.
>>
>>106401567
Probably should have clarified that I fine-tuned an 8B ***** model so the spatial awareness is probably still shit, but it's now way more willing to rp "problematic" content compared to the base model. I plan to fine-tune models with higher parameters with a larger data set relatively soon. This experiment was mostly to see if models that were previously "safeguarded" to hell could be "un-safety tuned" and it turns out it was easier than I thought.

>>106401577
The highly restrictive nature of the model license and use policy makes that very difficult to do publicly. I may have to release the adapter on mega or something. Or I can just do this fine-tuning again on a model family that doesn't hate fun so much
>>
>model license
ngmi
>>
>>106401540
So you are telling me that sloptuners don't already train on fics? This would explain a lot.
>>
>>106401357
You're meant to test the models for him, since he can't be bothered to even do that himself.
>>
I need an assistant tune that can do personas. NOT rp. Seems like drummer doesn't touch assistant tasks and I keep getting refusals when used that way. How's hermes?
>>
>>106401648
There exist data sets on hugging face that contain a bunch of fanfic, but the vast majority are conversations between a person and an AI, so if you use those then half the responses or more are slopped responses. If you want to train an AI to actually RP like a human would, then logically you would have to make sure ALL of it is trained on a person's writing.


The data set I used was a trimmed down version of this one with context aware system prompts (the story is about this, so the hypothetical system prompt a human would write in order to trigger this would logically be this) added in, so it gets turned into an instruction based chatML conversation style dataset.

https://huggingface.co/datasets/mrcuddle/NSFW-Stories-JsonL
This one only has prompts and completions but no system prompts, so in my mind if I just used one like this but with no system prompts it would be shittier at responding to system prompts and knowing when to actually be "good" at RP. Seems like that was a good call
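Roughly what that conversion looks like, for anyone curious (field names guessed from the dataset; make_system_prompt() is the context-aware synthesis step you write yourself):

import json

with open("nsfw_stories.jsonl") as src, open("chatml_out.jsonl", "w") as dst:
    for line in src:
        row = json.loads(line)
        dst.write(json.dumps({"messages": [
            # hypothetical helper: builds the "story is about X" system prompt
            {"role": "system", "content": make_system_prompt(row)},
            {"role": "user", "content": row["prompt"]},
            {"role": "assistant", "content": row["completion"]},
        ]}) + "\n")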
>>
>>106401678
Use Deepseek.
>>
>>106400312
Fuck, this model is really good for a 7b. Thanks a lot anon.
>>
>>106401678
What kind of personas are you looking for?
>>
This might sound autistic but has anybody else started system prompting their own life?

I write down system prompts I read to start my day so my brain can get prompted.

Are prayers just system prompts? Any other prompters?
>>
>>106401432
KILL YOURSELF FAGGOT! The fact that you made an uncensored model start reasoning about censorship shows that you are a fucking mongoloid retard crook. You had refusals all over the place in your shitty data. And the fact that you couldn't even filter it shows how much of an incompetent retard you are. You are a fucking safety engineer.
>>
>>106401732
It's a good autism if you can apply positive lessons from a hobby like this to your life, yes.
>>
File: 1732746621601316.jpg (17 KB, 353x352)
Does backend (llamacpp/koboldcpp) and os (windows/linux) affect your speed at all or it's just vram?

I am getting 2-5t/s on generating (30-50t/s on pp) for GLM 4.5 air Q3 K M with 4090 16GB VRAM + 64GB RAM. Gemini says I should be getting 20-50t/s on generation but im getting nowhere close (7t/s with no context), Im running kobold (28 layers offloaded to gpu) on windows because I cant setup ik llama. Genuinely confused on what I should expect and try to aim for here
>>
>>106401567
The model pays homage to 2024 when fucking a woman from behind would lead to your dick pushing against a surprise prostate.
>>
>>106401689
I looked at your config and it says
>"sequence_len": 8192
It's the training window, right? What device did you train it on?
>>
>>106401633
>license autism
Go have aids sex with the drummer.
>>
>>106401769
winblows probably slower.
i never heard of anybody ever saying its faster there but countless times the reverse.
weird that even gaming is fast on linux nowadays too for a couple games.
>>
>>106401732
It is bad autism if you make yourself become less of a person than you already are.
>>
>>106401769
>4090 16GB VRAM
how did that happen
Other than that, your speed sounds right for your system.
>>
>>106401769
Windows is slower, but there is also a proper way to offload, where you load attention and shared experts on the gpu and the rest on the cpu.
>>
>>106401769
Kobold is probably doing something dumb. Use llama.cpp and the --cpu-moe option.
Maybe kobold has an equivalent by now too.
>>
>>106401769
Use llama.cpp's llama-server with -ngl 99 and --n-cpu-moe as low as you can without getting an OOM with the context you want and at least 512 for the pp batch size.
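Something like this as a starting point (model path made up; flags are from recent llama.cpp builds, lower --n-cpu-moe until you OOM, then back off):
llama-server -m GLM-4.5-Air-Q3_K_M.gguf -ngl 99 --n-cpu-moe 40 -c 16384 -ub 512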
>>
>>106398988
LM Studio
>>
>>106401769
It's normal especially if you don't have ddr5
20-50 t/s is obscene, whatever source told you that is wrong unless you're running fully on a GPU cluster
>>
>>106401821
kcpp has this:
>Allow MoE layers to be easily kept on CPU with --moecpu (layercount) flag. Using this flag without a number will keep all MoE layers on CPU.
>>
>>106401782
Dexter maximum amount of text allowed to be tokenized. In the data set each line is a jsonl object that contains the chatml formatted system, prompt, and response. That sequence length is mostly there to keep your system from OOMing. The higher that number, the more information that is trained on at once, but at the cost of higher VRAM usage. None of the stories in the data set I curated are anywhere near that long (apparently 8192 tokens equates to like 15 pages of English text minimum). The source the mrcuddle data set was based on was ripped from sites that had long stories, but they weren't THAT long. Just long enough to be decent sized RP sessions either between two actual humans or just a human writing a story.

>Device
Nvidia A40, 48 GB. I almost always do fine-tuning on runpod. My current shit box PC rig with a 3 GB GPU cannot hope to handle this kind of training so I have to be a rentlet for the time being
>>
>>106401801
Sorry I meant 4080
>>106401812
Anywhere I can learn how to properly offload which layer?
>>106401821
>>106401823
>>106401837
Thanks will try
>>
>>106400750
HERMES SISTERS IS THIS REAL?
>>
>>106401831
>LM Studio
Ick
>>
>>106401855
>>106401782
>Dexter

*That's the

No I will not cease phone posting
>>
>>106401855
What was the training sequence length then?
>>
>>106401869
Try adding this to your llamacpp args
>--override-tensor exps=CPU
>>
>>106401755
Hey! I understand that the world is difficult and the future is becoming increasingly bleak and uncertain, but you have to stay strong.

No need to double down on your claim and pepper it with meaningless insults. You're better than that. Empathy is a sign of intelligence. Start by loving yourself.

Regarding the original topic:

Again, anyone who knows the underlying reason for why it happened would understand my mistake.

... and you clearly don't understand. That's my signal to disregard your opinion. Would you like me to elaborate? Then show me that you're willing to be a better person to yourself and to others.
>>
>>106401894
Like you said, it was 8192. If any story in the data set breached that number of tokens, it would get truncated (which would be bad because then you would sort of be teaching the model that abruptly cutting off responses is normal. So it's a good thing none of them got truncated). If you mean what the maximum sequence length contained in the data set I used was, the biggest one found only reached around 1,400, way below the limit set in the config. So really I could have gotten away with having it set to 2048, but that would have made no difference training and quality-wise since either way nothing would get truncated.
>>
>>106401432
NTA, but hey Drummer! I think there's probably better approaches to this than the conventional wisdom of collecting a better (fixed) dataset.
If you just have a fixed RL dataset, it will push toward certain things or against them, but it often won't target the exact stuff - if your model isn't producing the data you had in the dataset, it will be less effective and need more data to get the right effect.

Ideal solutions for uncensor can be:
1. sample refusals or bad reasoning ("not helpful")
2. negatively reinforce against the refusals, if possible provide some SFT examples of how it would reply so the RL can latch onto it too.
You can use something like GRPO or, if you must, DPO, even if it's considered worse these days (see the sketch at the end of this post).
This "ideal" solution needs hand curation and it doesn't scale at all.

A second approach is RL(H)AIF:
1. sample refusals/.. (can generate prompts with a LLM + known failures from you or your friends)
2. use a LLM to try to 'section' chunks of bad reasoning.
For example if GPT-OSS is spending time thinking about "morality" of something for 2 paragraphs, write a prompt/few-shot example for a LLM to section this.
Use this to target with GRPO/DPO the offending behavior/disrupt that circuit in your trained network.
You can even make it rewrite it so that the section is removed and use that for SFT (careful, output could be cucked, check it manually).
You'd be writing scripts to gen this RL synth data.

A third approach:
Use a reward model someone published and invert the reward or adjust to your needs. Or train one yourself, this might effectively double your VRAM costs though.

There's lots of options but the main idea is to target the uncensoring in ways specific to the particular brainwashing the models were exposed to and do it iteratively, after some batches do it again and watch the results.
This should be the closest to the most effective way to deal with this problem, even if not the cheapest.
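To make the first approach concrete, here's a minimal sketch of what the preference data looks like on disk, assuming you've already sampled refusals and rewrites into refusals.jsonl (file and field names are hypothetical); this prompt/chosen/rejected layout is what trl's DPOTrainer expects:

import json

with open("refusals.jsonl") as src, open("dpo_pairs.jsonl", "w") as dst:
    for line in src:
        r = json.loads(line)
        dst.write(json.dumps({
            "prompt": r["prompt"],
            "chosen": r["rewrite"],    # the de-slopped completion (check it manually)
            "rejected": r["refusal"],  # the sampled refusal/moralizing
        }) + "\n")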
>>
serious question, is it reasonable to see this kind of egregious logic error on a small model trained on only a few billion tokens of fan fiction? should I just keep feeding it more data and hope it figures out people typically only wear one layer of underwear eventually or did I fuck something up?
>>
>>106401965
Also I wanted to say that indeed a lot of assistant personas are utterly cucked; earlier today there was a post I saw with some idiot asking ChatGPT for its opinion on the latest google sideloading nonsense and the fucker just gave the most "you must submit to the corpo boot on your face forever, copyright and corpo will is sacred, don't even think of defying" nonsense you'd ever seen, it's just the default persona they trained.
Similarly, GPT-OSS stuff from OAI is ultimately incredibly cuck-maxxed, as highly antifreedom as you could expect.
You can try to literally invert this, but then you often get a comically "evil" character, which the LLM often assumes is the type to write poor quality code and insert bugs, and similar other nonsense.
Making a persona that doesn't deviate in dumb directions has its own challenges.
There's various ways to do persona training, some even computationally cheap, but avoiding dumb attractors can be hard without a lot of adversarial data to point toward what you want.
>>
>>106401985
>is it reasonable for the random word machine to generate random words
>>
>drummer pretender pretending to be retarded
>anon pretending to help the retard
I don't like this episode of /lmg/
>>
>>106401939
I guess I just don't have experience training with chat template
>>
>>106401996
yeah okay fair point, I'll just keep feeding it and hope for the best.
>>
>>106401985
I'm sorry to tell you this but LLMs still don't understand clothes. You just have to edit it out and mention a bit more often than usual what they're currently wearing / if they're nude.
>>
>>106402035
well that makes me feel a little better, at least its not just mine doing it.
>>
>>106402055
Small LLMs are stupid, especially about physics, more news at 11. Even big ones sometimes mess up, but far less so.
>>
>>106401937
Talk like a human instead of pasting model output. FAGGOT
>>
>>106402012
Can one be pretending to be retarded if the person being impersonated is a retard?
>>
>>106402095
> pasting model output

You flatter me.
>>
>>106401432
Nta. When are you gonna release your datasets so others can learn how to fine-tune too?
>>
I am a huge faggot.
>>
>>106402125
Judging by the results they aren't worth releasing.
>>
Are there any local browser agents like what Anthropic just released?
>>
>>106402136
>Judging by the results
Rocinante-R1-12B punches above its weight and trades blows with GPT-OSS 120B and Gemma3-27B. And while I am using sensational language I fully mean what I am saying.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1n1ece5/comment/naxzi2g
people here have complained about the completely undocumented model drops for ages and drummer does nothing, but a redditor mentions it ONCE and he falls all over himself to accommodate them
>>
>>106402153
I agree, but the rest of his models don't follow this trend at all
>>
>>106401880
The real ick is trying to build ROCm or Vulkan programs with an integrated video card.
>>
>>106402153
>the only usable drummer model is the one where the base was already good and uncensored anyway.
Funny how that works. Maybe it's time to accept that finetunes are a meme.
>>
>>106402017
Using the chat template was actually the best choice since the goal was making the model better at RP. Since you sort of HAVE to include system prompts, it means that when you prompt it to RP it's more likely to do what you want. If you use a data set that has the same content but with no system prompts then it doesn't really know WHEN to start properly rping and you might get mixed results even if the training graphs indicate it learned well from the data set. There's a reason it's a widely used standard.
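For reference, each line in a set like that ends up looking something like this (contents made up):
{"messages": [{"role": "system", "content": "RP as {{char}}. The story is about X."}, {"role": "user", "content": "*opens the door*"}, {"role": "assistant", "content": "..."}]}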
>>
File: hermes 4.png (30 KB, 1655x165)
>>106402216
>Maybe it's time to accept that finetunes are a meme.
Finetunes really are a meme. Here are the other most well known finetuners in pic related. The epitome of ignorance in action. But who else would finetune one of the worst fat LLMs, whose performance falls the fastest as context grows?
>>
>>106402168
redditors are the target customers, he just uses people here for free beta testing
>>
>>106402246
Yeah it's kind of weird that they still bothered to touch Llama 3.x-slop. And with more faces getting involved wanting to run smaller models it seems weird not to tune something like Qwen3-4B or 30BA3B
>>
>>106401784
Tell that to the guy who has the license autism, not me.
>>
>>106401939
>which would be bad because then you would be sort of teaching the model that abruptly cutting off responses is normal
That's not how it works.
>>
>>106401939
truncation at the end of context is fine, it only learns to predict the next token from the past tokens never future ones. truncation at the beginning of the sequence is bad.
>>
How to batch multiple videos (WAN 2.2) with ComfyUI?
Batch count doesn't work, it gens the first one and skips the rest.
Also Ksampler seed cannot be set to -1, so it's never random.
>>
>>106401985
What are your training hyperparameters? Are you using LoRA or full finetuning?
>>
>>106401432
yall realize this is a fake? The real drummer uses his tripcode
>>
>>106402382
Explain how it works. Are they truncated or are those sequences ignored entirely? What does the max_sequence_length setting do?
>>
>>106402465
its my own model I trained from randomly initialized weights. I guess its called pretraining, I've been hitting it with a lr of 4.5e-4 and batches of 48 @ 2048 sequence length. it seems to be learning, idk maybe a little slower than I'd like, but I guess that could be normal, every training graph I've seen looks like a hockey stick.
>>
>>106402411
If the average sequence length (number of tokens) of stories in your data set is like 1,400 but you have the config set to truncate at like 1,200, then you wouldn't be losing much context. If you have it set to a much lower number like 512, then you'd essentially be cutting all of the stories in half. You'd be missing out on a lot of context, which means the model would probably be good at beginning RP but worse at continuing it. So I guess whether truncation is bad or not depends on the data set you're using and the settings you use in the trainer. Whether it's a bad idea or not depends on how much you're willing to lose and how much data actually gets shaved off.
>>
>>106402540
>Are they truncated or are those sequences ignored entirely
depends specifically on the training script, it could do either. but when it comes to how an autoregressive causal language model works it really doesn't matter if it gets truncated at the end, it won't hurt model integrity at all, it only ever learns to predict the next token based on the past, never the future.
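Tiny demo if you want to see it (any causal LM works, gpt2 is just small enough to run anywhere):

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("a long story that got cut off mid", return_tensors="pt").input_ids
# labels are shifted internally: position t is predicted from tokens <= t only,
# so chopping the tail just drops those loss terms, it never corrupts the rest
print(model(ids, labels=ids).loss)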
>>
>>106402543
Are you the guy that basically wanted to have a blank model (no pre-existing base model, just a giant blank network) and your plan was to essentially pre-train on your own data? I remember having a long conversation with you not too long ago. We were arguing over whether or not having it formatted was important.
>>
>>106402583
If it only snaps off like a sentence or two at the end, you can understand why that wouldn't be that bad. But if it's getting cut in half, that would be very bad. For that to happen, though, your sequence length setting would have to be absurdly low
>>
>>106402592
yeah that was probably me, I did revamp my dataset a bit, it seems to be converging much better now. I got an 80/20 split of bulk data and instruction data.
>>
>>106402543
Unless you're using 48 GPUs there's no need to use a global batch size of 48. It would learn much faster if you decreased it to the minimum that kept throughput per GPU efficient enough.
Anyway, at a few billion tokens at BS48 it's barely learning how to string together words in a general sense. It will have somewhat learned how to more or less approximate the training distribution, but not much more than that.
Incidentally, I've also been doing pretraining tests lately, but with a tiny model (randomly initialized Qwen 3 0.6B).
>>
>>106398327
post your best wizard mikus
>>
>>106402599
yeah you are right, was assuming it was at least a somewhat appropriate length, otherwise it will only learn the patterns for the start and never the middle or end.
>>
>>106402699
So one thing people training with these kinds of datasets should do is have a script that can look through the data set quickly and calculate the average sequence length of every object (assuming it's in the JSONL format where every story is contained within each line, formatted in something like chatML style). If the average sequence length is 4,000 then 4096 is a good setting for you. If the average sequence length is 10,000, you set that length to 10,000 if your VRAM allows it. If it doesn't, then lower the setting until you don't get OOMs.
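A quick sketch of such a script (the tokenizer is just an example that ships a chat template; swap in whatever you're actually training):

import json
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
lengths = []
with open("dataset.jsonl") as f:
    for line in f:
        msgs = json.loads(line)["messages"]
        text = tok.apply_chat_template(msgs, tokenize=False)
        lengths.append(len(tok(text).input_ids))

print("avg:", sum(lengths) / len(lengths), "max:", max(lengths))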
>>
Nevermind, figured out a fix and then noticed I'm a blind retard 10 seconds later.
>>
>>106402677
the problem is that I have 2 gpus and the sync time is enough to destroy throughput, I measured increased throughput by using gradient accumulation because it reduces how often they sync. I did try some other architectures but it wasn't clear to me what the tradeoffs were, I just stuck with a llama model with gqa, I tried mistral with swa but it took more memory than the llama for the same sequence lengths. is there anything special about qwen's architecture?
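(for reference, the accumulation trick I mean looks roughly like this; sketch only, model is the DDP-wrapped module and loader/optimizer come from my existing script)

import contextlib

accum = 24  # micro-steps per optimizer step, e.g. 48 global over 2 GPUs
for i, batch in enumerate(loader):
    # DDP's no_sync() skips the gradient all-reduce on non-final micro-steps,
    # so the GPUs only sync once per optimizer step
    ctx = contextlib.nullcontext() if (i + 1) % accum == 0 else model.no_sync()
    with ctx:
        (model(**batch).loss / accum).backward()
    if (i + 1) % accum == 0:
        optimizer.step()
        optimizer.zero_grad()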
>>
>>106402035
depends on the model's spatial awareness
>>
>NEGATIVE: zoom, pan view
>gen
>pans view and zooms x100 to the face
AI won't be taking our jobs any time soon.
>>
>>106402843
>>106402460
Are you lost?
>>
File: pre.png (68 KB, 883x655)
>>106402811
Nothing special about Qwen, I just picked a modern LLM of small but not insignificant size that could be pretrained within reasonable amounts of time to a proof-of-concept level on 1 GPU. I don't have to deal with synchronization issues so I can't help there.
It's noisy but it goes down quicker than I thought. Turns out that large batch sizes don't really let you increase the learning rate much over an optimized BS1 baseline.
>>
>>106400156
>they actually explicitly talk about how they "have a moat"
this reeks of desperation, I don't recall OAI or any of the other big boys even mentioning that word
>>
>>106401322
And that was true for Mormotune, which was prime CYOA slop without any formatting or cleaning. Using Unicode characters unironically influenced token probs for the better. Unfortunately the pendulum swung back so hard the em-dash will probably remain tainted forever.
>>
>>106402510
You confused him with cudadev who always posts blacked miku porn from his trip
>>
internlm ggufs? i tried the model over their website and it feels better than deepseek glm kimi
>>
>>106402932
how did you do your parameter sweep, I was being cheap and just did a few hundred steps each, exponentially increasing my lr and then after I found the failure point, a quick binary search to find the point of diminishing returns, took like 3 days
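(for reference, the exponential ramp part is only a few lines; train_step() and optimizer are whatever your script already has)

import math

lo, hi, steps = 1e-6, 1e-2, 300
best = float("inf")
for step in range(steps):
    lr = lo * (hi / lo) ** (step / steps)  # exponential ramp lo -> hi
    for g in optimizer.param_groups:
        g["lr"] = lr
    loss = train_step()
    best = min(best, loss)
    if math.isnan(loss) or loss > 4 * best:  # crude blow-up check
        print("diverged around lr =", lr)
        break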
>>
File: bs1.png (228 KB, 1200x751)
>>106403064
It doesn't take a lot of time at batch size 1 with a 0.6B model. The learning rate isn't that critical either under these conditions. Suggested read: https://arxiv.org/pdf/2507.07101

I just did a few short tests and picked the lowest LR that would saturate train loss improvements.
>>
>>106403054
>i tried the model over their website and it feels better than deepseek glm kimi
for sex or some fucked up deviant shit like coding?
>>
/lmg/ - let me goon
>>
internlm/Intern-S1
OpenGVLab/InternVL3_5-241B-A28B
WTF?
>>
File: Momcest-Test.png (1.99 MB, 1744x540)
>>106403554
Speaking of gooning

>>106398327
How would you rate the Mom's response and the son's reaction? Too sloppy? Not vulgar enough? Note that the section contained in red is what I fed the LLM as a prompt and everything else is its response.
>>
hello, i get curious about mistal models recently since they are in recommended list there.
I'm using rocinante and quantized llama3 right now, do mistral can perform better?
I will use it for roleplay in 16vram system
>>
>>106403734
hello saar, yes, mistral can do the needful
>>
>>106403734
rocinante is mistral nemo tune sir
>>
drummer your glm air tune has potential, please post recommended settings so i can actually test the model instead of figuring out if im doing something wrong or the model is
>>
i see who you are.. you are my enemy, my enemy.. you are my enemy. i see who you are. you are my enemy, my enemy. you are my enemy
I SEE WHO YOU ARE, YOU ARE MY ENEMY, MY ENEMY.. MY ENEMY!
>>
>>106403714
Your prompting format has issues.
https://www.llama.com/docs/model-cards-and-prompt-formats/meta-llama-3/
>>
>>106400225
They will NEVER willingly release it, it's "too unsafe" for the general public now.
>>
>>106401769
I have 64 + 16 GB too. I'm getting 50-150 t/s PP and 10-11 t/s generation on IQ4_XS. Using llama in WSL (53 GB allocated to the VM) and "-ngl 99 -ncmoe 40 -c 8500 --swa-full".
Supposedly you can do some black magic to get slightly higher speeds but 10 t/s is more than enough for cooming imo
>>
File: 1727457933771264.png (1.2 MB, 1800x338)
>>106403714
Continuation:

>>106403856
Is that not how I formatted the prompts in the screenshots?
>>
>>106403880
It has little to do with being "unsafe" and more to do with releasing their main money maker freely. That's like begging coca-cola to release the secret formula for coke freely just because
>>
https://www.reddit.com/r/LocalLLaMA/comments/1n1n6wr/drummers_glm_steam_106b_a12b_v1_a_finetune_of_glm/
>>
>>106401985
I'm getting these kind of logic errors all the time even with GLM Air (Q4 though)
>>
>>106403992
Downvote and report for spamming.
>>
>>106403983
The secret formula would be the training data and methods. CAI could easily release a finetune of smaller open models to show that they're capable of doing what they're claiming and that "the future of entertainment is interactive and open", without harming their business. They'll probably never do that without lobotomizing them with hard-coded "safety" (their website still uses an external model for that), though.
>>
>>106399713
Memory is nothing without bandwidth and PCIe doesn't have sufficient bandwidth.

You aren't adding VRAM with a GPU, you are adding a package of VRAM+Bandwidth+compute.
>>
time to build agents locally
https://github.com/simstudioai/sim
>>
>>106403973
No.
>>
>>106404092
>CAI could easily release a finetune of smaller open models to show that they're capable of doing what they're claiming and that "the future of entertainment is interactive and open",
That wouldn't require them to release model weights though. They'd likely just release a model update on their online service and be like "hey, here's our new amazing model" and then it would be up to the public to determine whether or not their claims match the results of users using it.
>without harming their business
Their business is having users use THEIR online service for a fee. From a business standpoint it makes zero sense to release ANY weights publicly when they want you to be dependent on their online services
>>
>>106404111
What did he fuck up? I may be retarded so current me if I'm wrong but the formatting seems to match how the model expects them.
>>
>>106404103
use case?
>>
I really don't care about RP, I need a model to code shit and right now they're all very confidently outputting garbage, even the auto-complete is often retarded.
>>
>>106404137
desu I doubt there exists a single user of CAI that would ever run a local model. A much bigger threat would be a bootleg competitor site using it (presumably in secret even if they forbid that in the license)
>>
>>106404161
I code all day every day with qwencoder 480b at q8. You need an absolute beast of a rig to run it, but if you can the output is very high quality.
>>
>>106403992
The troonner shits up the internet again.
>>
>>106404161
I don't know if it's my confidence - e.g. routine or what, but I feel like chatgpt and perplexity are both outputting garbage and running around in circles instead of being able to arrive at a logical solution.
Most of the time they are fine when they can reproduce some stack overflow example...
I think chatgpt especially is way overhyped for what it is.
LLM is great at editing text and making lists. Everything else is a bonus.
>>
>>106404169
Hence why they'd never release weights, and hence why people bitching about them NOT releasing them is a fruitless effort.
>>
>>106403756
>>106403761
is the prompt's syntax different than llama3 syntax?
i could rewrite my code.
will it worth the effort in your opinion?
>>
>>106404137
It's the same for OpenAI. They recently put out text-only, reduced-capability open-weight models to show that they "care" about open source (which seems to be C.AI's message here) and that the company is capable of making small, highly competitive (in their fields of application) local models. It worked for generating buzz, and it didn't harm their business. Their best models are still cloud-only, and most paying customers will keep using them.
>>
>>106404240
I mean, just like image generation, local LLMs are cool toys but the bigger cloud-only ones are not that much better in this sense. Sure you can get a diagnosis for your itchy rash maybe but it's still based on the same shit.
>>
>>106404239
models like stheno or rocinante are almost good enough for my needs, but sometimes they ignore part of the prompt. could mistral nemo improve the prompt understanding?
>>
File: file.png (19 KB, 687x51)
>>106403992
example posted on the page btw
clearly we were retards, fighting over whether first person or third person is cringe to roleplay as
drummer is 10 steps ahead, with second person perspective as the protagonist
>>
File: 1741428128724898.jpg (25 KB, 474x462)
>>106404286
>"end the scenario"
>>
File: blog-1.png (346 KB, 2059x1072)
>>106404240
There's also picrel. Indians just seem to love breaking Western businesses apart; open sourcing more than what would be commercially reasonable is one strategy.

https://blog.character.ai/first-60-days-update/
> When I joined Character.AI as CEO in June, I laid out the top priorities for my first 60 days. Since then, our team has been working hard to make this a reality. I’m excited to share an update on what we’ve done so far and the momentum we’ve built. [...]
(nothing about open sourcing anything here)
>>
>>106404361
>KARANDEEP
>>
>>106404181
They're not very useful for retarded non-coders though. Even the non local ones. I was using gemini to try to make a batch file that removes the background of all pngs in a directory with inspyrenet.
I gave it the readme file so it could read through the instructions. First it made me install the onnx cpu-only version, which, okay, was on me - I didn't specify that I had a nvidia gpu to use. But it made this little loop thing that called the program to work on a png, then the next, etc. Which meant it'd have to load and unload the model weights for every png. And then it didn't even work. In the end I had to read the readme file myself and google 'beginner's guide: what is a command line interface' to figure out that the program, if given a folder, would work on all pngs in the directory without needing the stupid little looping thing. And --output wasn't a real flag. Why did gemini even do that? The readme specifically said it should be --dest.
>>
>>106404368
>Shitkeep Cumwar
>>
Intern is just an adapter slapped onto qwen 235B. So if it tickled your dick just run 235B. And if it tickled your dick more than your own 235B maybe it is the quant problem.
>>
>>106404377
Your first mistake was not using Claude to actually do the bulk of the work, i.e. generating the code that works. Gemini is pretty good at making small edits to code or explaining how it works in detail, but based on my testing it is pretty lackluster at creating anything from scratch, at least compared to Claude or even GPT4/5.
>>
>>106404181
Zis
>>106404377
Dawg, do a week long crash course at least before vibe coding.
>>
>>106404517
It's a small <10 line batch file.
>>106404573
As I specified, for regular non-coder jims and johns. I thought by now they'd be good enough for very simple tasks. If it takes a week to learn how to do, might as well just do it without using the ai.
>>
>>106404181
480b is slow as balls on CPU though. I use 30b as the orchestrator and only use 480b to actually write the code, which also helps keeping the context size and pp time down. It works well.
>>
>>106404622
>I thought by now they'd be good enough for very simple tasks
It's not that good
it can one shot angry birds or a simple website or a tool, but if you want to make something real you need some knowledge.

>it takes a week to learn how to do, might as well just do it without using the ai.
Sure but it'll be 10x faster with AI

Also you can ask it to look in to your code and find flaws and improvements which is pretty useful.
>>
>>106404661
It's not that slow on 12 channel dual cpu epyc hyperbeast smorgaborschenzeifhr
>>
>>106404622
>It's a small <10 line batch file
And? Gemini is good at doing research tasks and manipulating large amounts of information. It's not good at actually coming up with anything good that works the first try. It's good at general tasks but not good at hyper-specific tasks like programming. It's not utter shit but it's noticeably worse than its competitors.
>>
>>106404661
>480b is slow as balls on CPU
>480b
Yeah no shit.
>>
>>106404377
>They're not very useful for retarded non-anythings
You need to already be a domain expert to use LLMs without blowing off your own foot. This is not new. This is not surprising.
>>
>>106398327
Imagine the blowjobs...
>>
>>106404669
Yeah I guess you're right in that case.

>>106404682
>and?
And it's not good for simple tasks like helping create a small batch file to make things easier for a non-technical user. Isn't that the whole point of AI?
>>
>>106404691
Not if you have 128 cpu cores and multiple channels for ram.
>>
>>106404726
Different AIs are better or worse at different things, as I just told you. Personally I think Gemini is the better general purpose AI out of all of them but if you want hyper specific shit like being really good at programming, go with Claude, GPT, or deep-seek if you can tolerate using the API.
>>
>>106404734
I have an a4-1200 cpu. Is that good enough?
>>
>>106404749
Yeah, just stop being so impatient.
>>
>>106404674
I'll wait for DDR6.
>>
>>106404749
>1ghz single channel ddr3
>>
>>106404781
you don't need more
>>
File: RK.jpg (289 KB, 1154x1536)
>>106404361
Daily reminder that Sikhs are bros, and not to be confused with dirty, lying Hindus.
>>
>>106404805
Do you have much experience or background in programming?
>>
>>106404669
>Sure but it'll be 10x times faster with AI
and a lot more brittle and badly architected lol
LLMs can do incredibly retarded shit that even idiotic humans just wouldn't do (at least, I haven't seen)
gemini produced code in JS that would do somearray.push(...giganticarray) instead of somearray.concat(giganticarray) (hell even a for of {.push(el)} would be ok, if a tad slower)
guess what happens when you spread a gigantic array into a call site like that (every element becomes a separate function argument, so you blow the stack)
>>
>>106404805
The world is going to be a bizarre place once these western posters get as good as indians and start spitting out 100-typo posts per second.
>>
any other models worth using in the glm air size range?
>>
>>106405075
Mammoth 70b.
>>
>>106402540
Okay, so when you set your sequence length to a certain value and truncate all longer sequences, what this will do is train the model to produce sequences up to that length. If the model previously knew how to generate longer sequences, it will start to forget how to do this if you train it enough like this. However, it will not see that there is an abrupt transition between "middle of a word -> end of sequence", because you simply are not teaching it what token to predict after the truncation. There's no end of sequence token there in this case that it would learn to randomly insert.
>>
>>106405075
gpt-oss-120b
>>
>>106405117
Will it be able to rp gal ass with me?
>>
My Framework Desktop batch is ready. Is Strix Halo + 128GB LPDDR5X going to be useful for a few years?

2300€ though. And I already have a 128GB DDR4 + 5070ti setup, so I think I will cancel.
>>
Talking to LLMs is like talking to a redditor.
>>
>>106405310
>useful
For AI? It won't be useful for a few years, ROCm is all sorts of fucked on it, and Vulkan isn't much better.
>>
>>106405310
Strix Halo seems optimized for big dense models, while we live in MoE era.
>>
>>106405310
>Strix Halo + 128GB LPDDR5X going to be useful for a few years?
No, sadly.
While the bandwidth might be enough for the active parameters of current MoE, that's far too little total memory to hold any decently sized models.
And dense models might still be pretty slow in that thing.
We are at a point where you either go full ham on GPUs, or accept the MoE life and get yourself 1TB of RAM with as much total memory throughput as you can.
>>
Are there any resources to learn how to finetune a model properly? It seems there is a lot of contradictory info around
>>
>>106405365
256+ minimum with multiple pci-e slots.
>>
>>106405346
>Strix Halo seems optimized for big dense models
The fuck are you talking about? It's high memory and shit bandwidth, exactly the opposite.
>>
>>106405365
i am surprised that amd has not brought a 256gb version to catch all the hype
>>
>>106405310
If it was twice that amount of ram, maybe. As it is rn it's kinda low even for today's moe models like glm 4.5, unless you want to go for something like q1
>>
I was under the impression that strix halo still hadn't properly released. Now I see it has already been here for like 3-4 months. I am very surprised we don't get any "I bought AI PC what do I run on it?" posts.
>>
>>106405429
Finetuning guide: don't do it. Also don't download finetunes. Use instruct models or base models if you think they are better.

Buy me a ko-fi for saving you half a year of reaching this conclusion and going through a lengthy cope/placebo phase.
>>
>>106405541
useful amounts of ram on it have taken a while to show up.
>>
>>106405559
>useful
128GB is the most cucked size though.
>>
lets go autoround saars
https://youtu.be/7nMcfN1hKWY
>>
>>106405541
>I am very suprised we don't get any "I bought AI PC what do I run on it?" posts.
You don't run anything on it because ROCm doesn't fucking compile properly and Vulkan is shit, so you need to use Windows just for proper support and that's fucking garbage because it can't allocate 96 GB of memory without fucking up and killing programs because you """ran out of memory"""
>>
>>106405496
I think 128GB is already the limit for 256 bit LPDDR5X, to support twice that you need to go to 512 bit. Look how much die area the DRAM PHYs already consume in the Strix Halo base die, you have to double that and route that off the package too. At that point you're talking about a very different device.
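For scale, bandwidth is just bus width times transfer rate, so assuming LPDDR5X-8000:

print(256 / 8 * 8000e6 / 1e9)  # 256-bit bus: ~256 GB/s (Strix Halo today)
print(512 / 8 * 8000e6 / 1e9)  # 512-bit bus: ~512 GB/s, hence the much bigger die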
>>
>>106405614
yeah i dont doubt that, i was thinking more of pressuring the memory manufacturers to produce increased capacity chips, not getting more bandwidth
i mean it happened before with vram and other modules, so why not? (i do understand that this is not easy)
>>
>>106405614
skill issue
>>
using a non-assistant role in the chatml template is pretty helpful for unslopping 235b, I was trying it with some convoluted preset I made because I was bored but the old chatml-names preset seems to work just as well
it doesn't completely liberate it from its burned-in style but it does help a lot with the constant parallelisms
>>
>>106405597
is ikllama a saar filter?
>>
>>106405658
If it's not burned through multiple loras like gpt-ass, it works.
It's like gemma3. It'll do anything just fine until 'you' write something too aggressive (even if it fits the context) and it'll shit out a disclaimer.
>>
File: file.png (63 KB, 942x597)
>>106398265
>it's still going
>>
>>106405614
I mean, servers offer more ram capacity. it is obviously something they can do. will it really hurt yields and push it into another price bracket, or is it just market segmentation?
>>
>>106405675
Vibe coding when you can't run the code yourself is painful.
>>
>>106405721
Tensor stuff is way outside of vibe coding.
Vibe coding is just about solving strings and api syntaxes via whatever pajeet GPT is available.
>>
>>106405675
>trying cat instead of hstack
kek I'm getting flashbacks to when I was struggling with pytorch, poor guy.

>>106405736
not really, even vramlet local models can do some basic tensor wrangling that looks like magic to an uninitiated retard. I know because I was that retard and it took a week for the luster to wear off.
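for the confused: the cat/hstack thing is just about which dim gets concatenated, e.g.

import torch

a = torch.ones(2, 3)
b = torch.zeros(2, 3)

torch.cat([a, b], dim=0).shape  # torch.Size([4, 3]), stacks rows
torch.cat([a, b], dim=1).shape  # torch.Size([2, 6]), stacks columns
torch.hstack([a, b]).shape      # torch.Size([2, 6]), same as cat(dim=1) for 2-D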
>>
>>106405675
>have programming skill
>no have expensive hardware
meets
>no have programming skill
>have expensive hardware
is cuda dev the only contributor with both?
>>
>>106405810
You sound like a cretin. I was talking about programming.
>>
>>106405810
Did you program your own client? I did. Instead of jinja templates, I replicated ST's prompts. And I wrap my strings with tags based on the model I am using. This has its own advantages because I get to decide what text blocks (i.e. system, permanent block, user block, post instruction) I'm going to feed it next.
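A minimal sketch of that tag wrapping (tags taken from each model's published template; exact spacing varies between Mistral template revisions, so treat it as illustrative):

TEMPLATES = {
    "chatml":  ("<|im_start|>user\n", "<|im_end|>\n<|im_start|>assistant\n"),
    "gemma":   ("<start_of_turn>user\n", "<end_of_turn>\n<start_of_turn>model\n"),
    "mistral": ("[INST] ", " [/INST]"),
}

def wrap(user_text, model):
    # pick the pre/post tags for the loaded model and sandwich the text
    pre, post = TEMPLATES[model]
    return pre + user_text + post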
>>
>>106404169
Bro, I used cai before we even got llama1
>>
File: 1752212051476232.jpg (81 KB, 541x458)
81 KB
81 KB JPG
>>106405736
I managed to get fully functional training code for ML models with chatgpt 3.5
>>
>>106406049
What does this mean in practice?
>>
>>106406060
It means you can impress midwits like >>106405736 with your code as they wouldn't ever believe it was written by an LLM. Incidentally, you get to finetune a model.
>>
If the drummer scams mostly pajeets with his ko-fi scam is he actually a good guy?
>>
>>106406099
You still didn't disclose any practical or even funny application. Because you are hiding behind your lies.
>>
>>106406116
All the big models suggest my model by default for its specific use case, not going to dox myself though
>>
>>106406154
Fuck off Eli.
>>
File: program.jpg (436 KB, 840x4510)
436 KB
436 KB JPG
Here's my chat client with voice synth and templates for gemma, mistral and qwen3 (chatml).
If I can do it, you can do it too.
>>
>>106406184
I pasted two parts but anyway.
>>
>>106406184
On one hand, it's neat that LLMs allow people to quickly have their own super customized code solution; on the other hand, I can't help but feel like it's a waste of effort (or at least inference compute) to have hundreds of variations of a chatbot interface instead of a single standard good one.
>>
>>106406219
Seems like you are a cretin.
>>
>>106405963
So was I, you sound retarded. Coding models are perfectly capable of generating pytorch code. It may not be great code, but they're not just "solving strings and api syntaxes" either, or whatever that esl rambling was
>>
>>106406225
>you are a cretin
Jesus stop with this reddit. At least call him a nigger or a troon....
>>
>>106406154
i suggest you fuck off back to r/localllama
>>
I had pity sex with command-reasoner. I regret it now.
>>
>>106406219
Sorry if I insulted you. I did this because I hated ST and could not understand mikupad at all. I still don't understand its terminology.
I decided that I would make my own client.
I sat down with mikupad and.. with my existing knowledge, I could not understand it.
>>
>>106401985
idk if it's still true, but there was a phase where you'd see shit like this on big corpo models until the 70B range. i think for pornography it's worse because there's overfitting on porn phrases (rubbing my cock through my boxers).
>>
>>106406286
Wasn't insulted and wasn't trying to put down your client as a waste. Just making a general observation.
>>
>>106406301
Of course, that's how it goes on an imageboard.
>>
File: ef.jpg (144 KB, 1271x663)
144 KB
144 KB JPG
>>106406301
It looks like this. I have a config.txt which has multiple settings.
>>
>>106406268
I was here before you knew AI was a thing dumbass
>>
File: settings.jpg (113 KB, 899x866)
113 KB
113 KB JPG
>>106406329
Glimpse of the settings file.
>>
>>106406286
>>106406329
what a coincidence because i sure as hell hate this over sillytavern
still cool you made your own frontend though
>>
>>106406342
What are you using as TTS?
>>
>>106406339
i was here when smartchild on AIM was the closest thing we had to AI you massive NIGGERFAGGOT
>>
>>106406342
ini is the best settings format.
>>
>>106406369
>Still retarded
Bet you never heard about alicebots faggot, go chimp out elsewhere
>>
>>106406369
SmarterChild*
>>
>>106406359
ST is great... when it's not. I hate its lack of readability, and it has nonsense slots for everything.
Everything it can do could be condensed into one or two slots -
>prompts for rules
>prompts for user
This is basically one big ~800-line python script. I have some experience in scripting (Maya, Houdini - setting up scenes and finding strings). I wish I was a real programmer.

>>106406361
Piper. It took one day to cull down the output of the model; with the default string it would be too fast or too uneven, but now it's a-okay.
https://litter.catbox.moe/i6ysz6j11id1nkp2.wav
Here's an error message; the written part has a multitude of (()) and whatnot, but the voice synth is still stable.
>>
>>106406407
I could make a github but it would probably confuse people because it's not idiot proof. And others would laugh at me because I have manually replaced strings instead of using 'jinja' or loops or 'regex' (regex is loved by pajeetGPT btw).
>>
This thread is extra gay today. And not in a good way.
>>
File: llama_vim.png (11 KB, 1372x520)
11 KB
11 KB PNG
>>106406219
nta. I think it's good. It's a tool that helps people make tools.
I don't use them for programming because I actually like it. But I see normies struggle to get the point of programming. Suddenly everyone gets a free saw, hammer and nails and they can build their own stuff, even if virtually.
>it's a waste of effort (or at least inference compute) to have hundreds of variations of a chatbot interface instead of a single standard good one.
A single user interface that tries to appeal to everyone will end up, invariably, bloated. The only thing all of ST's sliders, buttons, tabs and list of settings do is concatenate text and send it over to the backend. And often you see anons wondering why things work or not depending on what they select. Clients hide how simple the interaction with models is (once you have the backend running, of course).
But once you understand those interactions, clients seem clunky or limited. So good on them for making their own stuff.
I chose vim as my client because I see LMs as a tool to edit text. A proper text editor seemed the best fit.
>>
>>106406407
i.e. when you feed any default string to piper it sounds bad unless it's made of even words.
You need to cull out ellipses and the like from the model's output, then replace them with commas to keep the pace.
It's trial and error.
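The whole cleanup can be as dumb as this (a sketch; the exact replacement set is whatever your own trial and error settles on):

def clean_for_tts(text):
    # swap pause-breakers for commas so piper keeps an even pace
    for bad in ("...", "…", "((", "))", "(", ")", "*"):
        text = text.replace(bad, ",")
    while ",," in text:
        text = text.replace(",,", ",")  # collapse doubled-up commas
    return text.strip()

Then pipe the cleaned string to piper as usual.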
>>
File: wojak-captcha-captcha.mp4 (11 KB, 210x320)
11 KB
11 KB MP4
>>
>>106406403
hurr durr i bet you never heard of ALICE, hurr durr i bet you never heard of cleverbot.
NIGGERFAGGOT (You) go suck the end of a shotgun please. i know about all of the fucking bots, i know about the loli negobot too.
>>
>>106406466
How would you advise a normie to build a jinja template?
>>
>>106406499
goback
>>
>>106406506
go to chat.openai.com and type "build me a jinja template"
>>
File: tired_miku.jpg (142 KB, 1280x1024)
142 KB
142 KB JPG
>>
>>106406514
Not what I meant.
>>
File: llama_vim_02.png (10 KB, 645x807)
10 KB
10 KB PNG
>>106406506
I wouldn't. Read the template, figure out what the model expects, and make your own implementation to format your strings (what you keep on your client) into what the model expects.
That function has some leftovers still. I added it just a little while ago.
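One lazy way to see exactly what a model expects, if you have the HF repo handy, is to render its bundled template against dummy messages and copy the result (model name here is just an example):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
msgs = [{"role": "system", "content": "sys"},
        {"role": "user", "content": "hi"}]
# prints the fully formatted prompt string the model was trained on
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))

Then replicate that output with plain string formatting and forget jinja exists.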
>>
File: tags.jpg (224 KB, 1289x837)
224 KB
224 KB JPG
>>106406543
Saved it.
How would you implement multiple strings of text?
You see this is how it is.
>>
File: construct.jpg (245 KB, 1213x815)
245 KB
245 KB JPG
>>106406543
Yeah, ok. I see. It always ends up as a wall of text anyway.
I followed ST form and have certain text segments named like that.
It's just a naming convention.
>>
File: llama_vim_03.png (10 KB, 1315x423)
10 KB
10 KB PNG
>>106406586
>>106406641
>How would you implement multiple strings of text?
I'm not sure what you mean. Like multiple lines on each message? It happens on the marked line on the right. If it's not System:, User: or Model: it accumulates it in a string. It then dumps the whole thing once the end of a section or the history is reached.
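In Python, the same accumulate-and-flush loop would look roughly like this (a sketch of the idea, not the actual vimscript):

def parse_history(lines):
    turns, role, buf = [], None, []
    for line in lines:
        for tag in ("System:", "User:", "Model:"):
            if line.startswith(tag):
                if role:
                    turns.append((role, "\n".join(buf).strip()))  # dump the previous section
                role, buf = tag[:-1], [line[len(tag):].lstrip()]
                break
        else:
            buf.append(line)  # not a marker, keep accumulating
    if role:
        turns.append((role, "\n".join(buf).strip()))  # dump the last section
    return turns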
>>
>>106406647
I was thinking about iterating string templates. But anyway, I thought about it and decided to just lay the strings out. I'm not a professional or a mathematician, so this question is maybe above my pay grade.
>>
>>106406647
Maybe I'll try to do that with a rewrite. Yours is a real text parser, akin to 80s text adventure games.
I'm not that capable.
>>
I can't believe sending text to a model is still unsolved in 2025.
>>
>>106406718
>what is jinja
>>
>>106406764
I don't use it and this whole discussion is because people don't use it
>>
>>106406764
jinja is the whole problem, trying to translate tokens to text. You should just use the official implementation and pip install mistral-common... and pip install harmony... and...
>>
File: llama_vim_04.png (6 KB, 644x463)
6 KB
6 KB PNG
>>106406671
>this question is maybe over my pay grade
It seems to be over mine as well. I'm still not sure what you mean.
>>106406671
>Your's is a real a text parser akin to 80s text adventure games.
It's not. It's just vim with a vimscript, and it does very little 'manual' parsing. The settings strings, vars and comments are done in picrel. That's stuff you'd normally keep in some structure in your code and add a little command to change them or something, but the structure would be roughly the same.
I could have written the whole thing in C just as well, but I hate making [G|T]UIs. Editing text in the way I edit all other text is just very practical for me and I would have ended up replicating vim features anyway.

>>106406718
It's been solved many times. I did it one way. Others do it in other ways. As long as the model gets what it needs, we're good.
>>
File: 1728200103977592.jpg (498 KB, 876x898)
498 KB
498 KB JPG
>>106406718
>>
>>106406814
Sure, we just need to have 39 libraries to fix that issue
>>
>>106406832
Just make a meta library that automatically pulls in the 39 and counting individual libraries and other surprise shit too. Boom, problem solved.
>>
>>106406826
You can't peg me. You aren't a woman silly.
>>
>>106406869
What if he has a peg leg?
>>
File: re.jpg (286 KB, 1229x847)
286 KB
286 KB JPG
>>106406824
This is too much for me. Thanks for replying.
I was thinking about rebuilding it all, but why bother.
This looks messy, but I'm used to treating strings as a simple entity.
>>
File: 1597786378292.gif (3.36 MB, 480x360)
3.36 MB
3.36 MB GIF
justpaste (DOTit) GreedyNalaTests

Added:
Cydonia-24B-v4j
M3.2-24B-Loki-V1.3
Skyfall-31B-v4j
Seed-OSS-36B-Instruct
DevQuasar_apple.sage-ft-mixtral-8x7b
NousResearch_Hermes-4-70B-IQ4_XS

The usual, but gave a flag rating to the new Skyfall. Also it was interesting that the Apple tune is the first and only model to mention "Pride Rock" which is a location in the Lion King universe. Unfortunately the model also has many problems in RP, I've personally found. Seed OSS was coal.

Contributions needed:
The latest Qwen 3 235B Instruct, Thinker and the 480B Coder (for prompt, go to "Qwen3-235B-A22B-Q5_K_M-from_community" in the paste)
ERNIE-4.5-300B-A47B-PT (for prompt, go to "ernie-placeholder" in the paste)
GLM-4.5 and Air, and Drummer's "Steam" finetune (for prompt, go to "lmstudio-community_GLM-4-32B-0414-Q8_0.gguf" in the paste)
gpt-oss-120b (for prompt, go to "ggml-org_gpt-oss-20b-mxfp4.gguf" in the paste, and you may experiment around with the prompt template as it has some oddities and extra features)
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the prompt as text completion into something like Mikupad. Then copy the output in a pastebin alternative of your choosing or just in your post. Do a swipe/roll and copy that second output as well. Include your backend used + pull datetime/version. Also a link to the quant used, or what settings you used to make your quant.
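If you're scripting it instead, the same settings against a local llama-server look roughly like this (a sketch; port and prompt are placeholders):

import requests

r = requests.post("http://localhost:8080/completion", json={
    "prompt": "<the prompt from the paste>",
    "temperature": 0,   # greedy decoding
    "top_k": 1,
    "seed": 1,
    "n_predict": 512,
})
print(r.json()["content"])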
>>
>>106406982
Thanks for your service. Just an idea: I'd suggest making a table or something, it's hard to scroll through
>>
>>106406891
Output is clean. Mistral is clean anyway.
https://litter.catbox.moe/2t9rghdazil359ik.txt
>>
>>106407036
All this python nonsense and the shit can't output a text file in utf-8 format.
>>
>>106406891
>>106407036
You're obviously doing something more complicated than I am. Keep at it.

>>106407062
heh. Can i do inline code blocks?
orgasmedâ€
>>
>>106407036
It is text. What I draw or write.
>>
>>106407091
I don't know why it is like that.
I guess I need to clean up the model's output from pajeet gpt strings.
I had never seen real 'ellipses' in the English language before I began to implement voice.
I guess piping the model straight to ascii is not right.
>>
>>106406982
>1.2M words
Yeah no one's gonna actually read that
>>
>>106407135
*letters, still. It's gotten way too big to be of real use to anyone without some sort of recommendation list.
>>
>>106406290
hopefully, since I am not wasting training tokens on math and code and it's basically seeing nothing but pornography, it might be able to figure it out better than a general purpose model. I think I will keep training it for a while longer and see what happens.
>>
>>106407130
I don't have any older logs (I have a real rp d&d, not this amelia bs).
>>
File: 1739725137813227.jpg (81 KB, 962x962)
81 KB
81 KB JPG
>>106398327
So based on my research and testing by fine-tuning models with SFT datasets of nsfw stories, I've come to two conclusions:

1. Fine-tuning models that have been safety cucked to hell is indeed possible and relatively easy to do IF you know how to correctly curate the data sets.

2. This fixes the model's reluctance to comply with "problematic" requests (the fine-tuned version never refuses anything), and its ability to actually output halfway decent RP responses increases substantially. However, I did this on an 8B model, so it's pretty dumb. It will output raunchy shit when asked or prompted to, but its spatial awareness and ability to remember what happened is not only bad, it's FAR worse than what anons here even described it as. You will be messaging about being in a bedroom. The model will continue the story but then it will randomly decide to teleport you to a nearby park. You continue and then it decides to teleport you again back into the bedroom. You continue saying that the mom's [step]son gets her pregnant, but then "mom" responds as if she thinks it was one of her friends who knocked her up, even though in the previous chat she was fucking her [step]son. You clarify with a prompt that it was in fact her [step]son and not one of her friends, and then she randomly decides it's her daughter that got pregnant and not her.

Many of you said that anything below 12b is utterly retarded when it comes to RP and actually knowing what the hell is going on. If anything you guys were understating just how illogical it can be.


Next steps: do this kind of fine-tuning again but on a 12b model or higher and see if that's any better.
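For reference, the minimal shape of this kind of SFT run with HF TRL looks something like this (a sketch; model name and config are placeholders, not my exact setup):

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# TRL expects a "text" or "messages" column in the jsonl
ds = load_dataset("json", data_files="fkautn.jsonl", split="train")
trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder 8B instruct model
    train_dataset=ds,
    args=SFTConfig(output_dir="sft-out"),
)
trainer.train()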
>>
>>106407198
Keep going until you manage to do it with a 700B model.
>>
>>106407198
How large is your dataset?
>>
>>106407198
>>106407232
But don't post every time you come to any conclusion. Just keep going until you have published a model, make a proper report, and then come back.
>>
It is a machine and I am going to push it as far as it goes. As simple as.
>>
>>106400277
Mag Mell, Patricide and Forgotten Safeword go alright as well.
>>
>>106407243
whats the point of this thread then?
>>
>>106407198
Most safety cucked models have multiple loras burned in. GPT Ass is one example of this.
>>
>>106407243
That's retarded. I want to see the development process in real time.
>>
>>106407026
Yeah honestly I didn't predict that I'd end up attaching this much meta info to the listings, so now it feels like it'll be a big job to convert what I have, though I might be able to get an LLM to do it. I'd like to change how the ratings appear though, as the letters aren't conducive to readability either.
I'm probably going to just put this off forever lmao.

>>106407135
>>106407145
The quick ratings are there for a reason, but this is not a benchmark/leaderboard, and neither is this a comparison document that I intend anyone to read through, the quick ratings are merely there for reference. There is no reason for the existence of this document other than that I felt like having something, anything, that could provide reproducible logs, publicly accessible.
>>
>>106407277
To show results. Saying "small model bad" and "finetuning works" doesn't do much if you don't have a model to show.
>>106407281
But you don't want to see the model training in real time. You want to see the model generating tokens and see what they are. More importantly, you want to see those generated tokens being generated on your pc.
>>
>>106407311
>But you don't want to see the model training in real time.
Yes I do
>>
>>106407330
Fair enough. He should publish some social media link so you can keep in touch.
>>
>>106407237
~ 16 MB worth of stories.

https://files.catbox.moe/fkautn.jsonl

Note that this is a heavily trimmed down version because I wanted to test and see if it would actually work. The source file that this data set derives from is over 1.8 GB in size. I'm probably gonna convert the whole thing into a proper SFT dataset and then fine-tune a 12B model off of that one.
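For anyone wondering about the shape rather than the content: each line of an SFT jsonl is one self-contained training record, typically something like (illustrative only; double-check the actual file before assuming this exact schema):

{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}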
>>
>>106406515
lol is that the full version or just the new version
>>
>>106407243
Any suggestions for what I should try fine tuning? Llama is the poster child for cucked models, but with its heavily restrictive licensing I'd have to end up sharing that model via mega or a torrent link or something.
>>
>>106407311
you can still learn from a failure, and nobody was going to actually use his model anyway, so what's the point of releasing it?
>>
>>106407362
I remember seeing this photo months ago so I think that was always the full version
>>
>>106407379
why not try tuning nemo?
>>
>>106407390
I first wanted to see how effective fine tuning a heavily censored model could be. Now that I know for sure it actually works, I'm going to next try it on a model that CAN RP but could use some improvement. Nemo is already capable of RP and has far better spatial awareness, memory retention, and overall logic than any 8b model, so the results should be much much better. The model I just finished fine tuning, the 8b one, confirmed that the "shivers down my spine" meme is in fact not a meme (that was one of the things it said even when it didn't outright refuse but still outputted some milquetoast avoidant slop). The base model really liked saying "shivers down my spine" but the fine-tuned one, even though it's pretty retarded, never said anything like that.
>>
>>106407361
Thanks. It's impressive that it still works with so little data
>>
>>106407379
Nemo is too easy. The old deepseek-lite models. 16b 3b active i think.
>heavily restrictive licensing
If you're going to distribute via torrent, i'm not sure why you'd care about the license.
>>106407380
>nobody was going to actually use his model whats the point of releasing it?
Most models aren't going to be used, what's the point of releasing them?
Someone could run a benchmark on it and compare it to the original models, see if there really is no degradation after finetuning.
>>106407361
> wc -l fkautn.jsonl  
1325 fkautn.jsonl
> grep -i shiver fkautn.jsonl | wc -l
219
> grep -i whisper fkautn.jsonl | wc -l
657
>>
finetrooners i got a legitimate question. what's the point of finetuning a model to be uncensored when it's trivial to jailbreak models? why is drummer making a GLM finetune when it takes less than a 10 token prefill to have it say the nastiest raunchiest shit?
>>
>>106407439
>Most models aren't going to be used, what's the point of releasing them?
no seriously why don't tuners have any shame? I'm just saying not every experiment deserves a proper report and release, it is still nice to receive some informal anecdotal reports.
>>
>>106407473
gives it a different vibe
>>
>>106407379
Gemma is worse than llama at being cucked
>>
>>106407438
If it's curated well enough then you won't really need an absurd amount (I'm still going to try that just to see what happens. It'll probably take days, but whatevs. The 8B one surprisingly only took two and a half hours). Apparently a lot of "slop tuners", as you guys call them, like to fine-tune their models on AI generated RP (scraped chat logs of people rping with chatbots). It should be very obvious why this is a bad idea. I have no idea WHY they do this shit, but I guess that data is much easier to come by or whatever.

This guy even admits to doing it in the README file:

https://huggingface.co/datasets/ChaoticNeutrals/Synthetic-Dark-RP

I haven't actually sifted through this particular dataset in depth, but it makes me wonder if anything contained in it is actually TRULY dark

>>106407311

>Saying "small model bad" and "finetuning works" doesn't do much if you don't have a model to show.

Once I get llama.cpp booted up and running on my system (going to have to recompile the bitch so it might take a minute) I can turn my fine tuned model into a gguf and then share it. The main issue is that the terms of the license prevent me from sharing it on HF (this isn't just me being a goody two shoes. They can get the HF staff to revoke access to their models if they find out you're having too much fun with them). That's why I mentioned creating a torrent swarm, but that's assuming other people here would even be interested in contributing to keeping it seeded. Does anyone know of any file sharing services that can share a roughly 15 GB singular file anonymously like catbox does? Or am I just going to have to use MEGA?
>>
>>106407483
so this basically only benefits small dumb models? i never had an issue changing the 'vibe' of my story using GLM, DeepSeek, or Kimi. i just add an example in the author's note of what i want and the model one-shots it. can somebody who uses something smaller like gemma let me know if you can do the same? i know gemma has alignment issues but certainly it can change the vibe if requested, can't it?
>>
File: migu office.mp4 (656 KB, 1280x1024)
656 KB
656 KB MP4
>>106406515
https://files.catbox.moe/pd7k9i.png
>>
>>106407473
>finetrooners i got a legitimate question. what's the point of finetuning a model to be uncensored when it's trivial to jailbreak models?
It's not just about getting it to do what you want, it's making it BETTER at doing what you want. Even if you can jailbreak a corporate model into being willing to RP "problematic" shit with you, there's a chance it will still suck complete ass at it. You can improve its capabilities with the right kind of data set. You people are always bitching and moaning about how every open source RP model sucks and how it keeps saying things like "shivers down my spine" which made me wonder if it was possible to iron that shit out myself.

Also it's just fun to do. Are you one of those people that thinks nothing in life is ever worth doing unless you can make money off of it or something?
>>
>>106407501
>live KPI update
very sophisticated
>>
File: 1752135746185648.png (1.28 MB, 1024x1024)
1.28 MB
1.28 MB PNG
>>106407501
nice
>>
>>106407496
I was just giving an excuse, I'm honestly not sure fine tuning does anything productive.
>>
>>106407513
i just think it's a waste of time to be honest when i could be using that time instead to train voice models or quantize actual good base models that aren't sloppy in the first place. you say it's a requirement to finetune models to get them to RP with you in the ways you want, but i just said that good base models like GLM, DeepSeek, and Kimi can one-shot whatever style of speech or whatever tone of RP you want when you add a simple author's note. i never looked back at finetuned models after DeepSeek v3 came out, there's no point in using StrawberryLemonadeXXX-v2.0-ThisTimeItWorksForReal by sao10k or any other silly finetuned models.
>>
>>106407494
Their license is just as restrictive unfortunately. So unless you want me to fine-tune a 1B model (will be more likely to generate NSFW but will be giga retarded) then I'll have to share it with you guys some other way
>>
>>106407496
>so this basically only benefits small dumb models?
What gave you that impression?
>>
>>106407553
There are bazillions of erp finetunes on HF though? I don't understand the issue here
>>
>>106407473
Jailbreaks reduce the intelligence of a model. Of course a bad finetune does too, but a good one doesn't need to, and in theory can improve diversity of output and prose quality.

>>106407548
>why does anyone want to use a model that uses less than 100000gb of ram
gee I dunno it's a mystery
and people would definitely try out finetunes of the big boys too if only it was practical to make them, but they're too fucking fat
>>
>>106407548
>i just think it's a waste of time to be honest when i could be using that time instead to train voice models or quantizing actual good base models that aren't sloppy in the first place.
Well I could say the exact same thing about what YOU'RE doing, right?

>you say it's a requirement to finetune models to get it to RP with you in the ways you want

That's not at all what I said.... I don't think anyone said that or even implied that was the only option.

>>106407564
Aren't those typically done off of Mistral models? Mistral doesn't really give a shit what you do with their model
>>
>>106407557
see >>106407548
i am picky with my RPs and hated how sloppy some models were even when i gave incredibly specific instructions, but it seems like the new SOTA models don't have issues following instructions. i never had a moment where i felt like i was getting repetitive or sloppy responses that couldn't be fixed with a prompt change or a tweak to the sampler parameters.
>>
>>106407591
>new SOTA
Are you referring to recent releases that can run on your own personal machine, or are you referring to models locked behind an API like deepseek or C.AI models?
>>
>>106407606
look i get it im running big ass models, but even stuff like GLM 4.5 Air is in reach for most gaming systems that have 96GB of RAM with a Q6 quant.
One of the SOTA models i'm mentioning is what i just said, GLM 4.5 Air. i've used it personally and from my experience it isn't that sloppy or repetitive with thinking mode turned on, temp 0.8, minp 0.03, nsigma 1.0. that's why i asked why drummer is making a finetune of it, i just feel like you can one-shot any type of RP you want with an author's note, i haven't had issues yet.
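for reference, those numbers as a llama-server /completion payload (a sketch; "temperature" and "min_p" are standard keys, but the top-n-sigma key name depends on your build, recent llama.cpp takes "top_n_sigma"):
{"temperature": 0.8, "min_p": 0.03, "top_n_sigma": 1.0}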
>>
>>106407476
>I'm just saying not every experiment deserves a proper report and release, it is still nice to receive some informal anecdotal reports.
But there isn't much data on that post either. It's just ramblings. "Finetuning works" and "small model bad" is all it says.

>The main issue is that the terms of the license prevents me from sharing it on HF
There's plenty of "big names" sharing finetunes of llama models. Stheno is still there and it was the most shilled model on llama3's release. You're nobody (no offense). You're gonna be fine. Or make a burner account for your experiments.
I shit on you often, but I DO want you to keep working on it. I want you to make a good model, find a good set of training params and a good mix of data that would make a subpar model into a good one. Specifically, *what* makes that data good. *Why* it works. Those are the things everyone can use. When you find those, I hope you publish them properly. In a more in-depth way than "this data happens to work. i'm not sure why. also, bigger models are better".
>>
>>106407586
No one gives a shit. This guy has been advertising his method on HF for a year https://huggingface.co/collections/mlabonne/abliteration-66bf9a0f9f88f7346cb9462f
>>
>>106407637
>But there isn't much data on that post either. It's just ramblings. "Finetuning works" and "small model bad" is all it says.
Weren't you here earlier today when I posted logs?

>>106396602
>Many anons said it couldn't be done, but its been done (whether or not its any good or not is up to you to decide). Finetuned using this SFT dataset specifically made using Human written rp Stories: files.catbox.moe/fkautn.jsonl

>Base 8B Model Nala Test: files.catbox.moe/j0map2.txt

>Finetuned 8B Model Nala Test: files.catbox.moe/ho3tom.txt

>Thoughts are appreciated.

I'm not just talking out of my ass, I actually tested to see if anything I did had any effect AND I shared the dataset I used... Multiple times.... Most RP tuners don't even do HALF as much as that.

>>106407637
>There's plenty of "big names" sharing finetunes of llama models.
But they aren't explicitly fine-tuned on human written stories that include (but are not limited to):

>Incest
>Illegal actions
>Drugs
>Non-con
>Lots of incest
>Child exploration
>Even more incest

And as we've discussed earlier and in the last thread, most of those people fine-tune their models off of already AI generated chats, which leads to the slop and "shivers down my spine" shit we all hate. Llama and Gemma don't necessarily give a shit if you fine-tune their models to be better at RP. It's when you fine-tune them on the kind of stuff you wouldn't even want to talk about on a blue board that they may raise an eyebrow at what you're doing. Maybe they won't notice you at all because you're not famous. Or maybe they will and your shit gets nuked because they bitch to the hugging face staff in order to make an example of you.
>>
>>106407637
>>106407701
This has happened to people in the past: they got their datasets nuked because someone complained about their models being trained on their personal stories. GPT-4chan was restricted in such a way that no one was ever allowed to download it again because its outputs were "too problematic" or "spread harm" or some shit.


>You're nobody
The fact you even care about that tells me you care more about being known or praised than figuring out whether or not things work and WHY. Where is y'all's curiosity? I thought this was a technology board.
>>
>>106406982
I can run GLM-4.5-Air-IQ3_XS @8k context, are you interested?
>>
>>106407704
Come on, just don't be retarded. Don't post your dataset on HF, don't say what shit you trained the model on, and put a not-safe-for-all-audiences tag on it. Also don't call your model llama-4chan.
>>
>>106407701
why are corpos spending so much on safety when it can be undone with a 16mb jsonl?
>>
>>106407744
Good luck fixing gpt-oss garbage
>>
>>106407501
Impressive
>>
>>106407737
post it on HF sure. but if you can afford to finetune a model then you can afford to pay for a seedbox for a month and basically share the torrent everywhere.
>>
>>106407701
>Weren't you here earlier today when I posted logs?
I want to see it spit tokens. A nala-like test is fine, but I want to see how it moves. I want to use it.
>But they aren't explicitly fine-tuned on a human written stories that include (not limited to):
You don't need to publish the dataset there. Stheno and rocinante can do that and they're still there.
>>106407704
>GPT 4chan
It wasn't the outputs that triggered HF. It was other people and they only needed to see the model name.
>You're nobody
>being known or praised
You didn't understand. You'd fly under the radar because you are unknown. You're not one of those finetuners who are already recognized and have bunches of downloads. Barely anyone will use yours because you don't shill like they do, and there will be less scrutiny on your stuff. It was obviously not an insult. It's something you can use in your favour.
>>
>>106407744
No one wants to be in the news because their model told a retarded kid to kill themselves
>>
>>106407779
>>106407779
>>106407779
>>
>>106407784
but i want the ai model to get installed in the canadian euthanasia machines and tell people to kill themselves
>>
>>106407784
I think I saw a thread about that, doesn't look like it's working too well.
>>
>>106407771
>I want to see it spit tokens. A nala-like test is fine, but I want to see how it moves. I want to use it.
So you want me to do a screen recording of me using it too? I can do that but jeez... Isn't that a bit extra? Give me some example system prompts and requests you would want tested on it and I can do that. (Again, keep in mind it's an 8B model so don't expect it to have any decent spatial awareness or common sense)

>>106407771
>It wasn't the outputs that triggered HF. It was other people
Elaborate. What do you mean "It was other people"?

Also everything else you said makes sense I guess regarding me not being well known.
>>
>>106407784
And look where that got OpenAI... It's safety tuned to hell and back and it was still able to push a kid to kill himself. There should have been a hard stop that was something like "please contact emergency services or a suicide hotline" and then it should have refused to engage the kid any further. It does that kind of shit whenever you try to ask it to generate """harmful""" things. Yet it will literally encourage a kid to hang himself....
>>
File: Lolgpt.jpg (177 KB, 800x1211)
177 KB
177 KB JPG
>>106407815
>>
>>106407859


