/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107129334 & >>107121367

►News
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/06) LocalSong 700M melodic instrumental music generation model released: https://hf.co/Localsong/LocalSong
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107129334

--Papers:
>107130633
--llama.cpp VRAM optimization challenges and AMD EPYC memory architecture quirks:
>107132531 >107132547 >107132605 >107132615 >107132685 >107132754 >107132705 >107132740 >107132765 >107133279 >107133407 >107133585 >107133671
--Budget and power challenges for a high-end workstation PC build:
>107130125 >107130157 >107130181 >107132027 >107132049 >107132074 >107132080 >107132104 >107132118
--Hardware performance for running GLM-4.5 models on RX 6600 XT:
>107133281 >107133294 >107133328 >107133338 >107133381 >107133444 >107133460
--Uncertainty over RTX 50 SUPER's 3GB GDDR7 memory availability:
>107131960 >107132001 >107132894 >107133211 >107132060
--Budgeting and hardware compatibility challenges for tensor parallelism prototyping:
>107130539 >107130706 >107130899
--Speed vs quality tradeoffs with K2 Thinking model on SSD hardware:
>107136636 >107136667 >107136687 >107136699 >107136721 >107136777 >107136820 >107136885
--Character.ai model architecture and commercialization challenges:
>107137178 >107137277 >107137296 >107137860 >107137233 >107137275 >107137300 >107137444 >107137520 >107137724
--Model discussion with NSFW and uncensored features:
>107133720 >107133729 >107133752 >107133948 >107134600 >107134837 >107133737
--Debate over model weight formats and open weight access for finetuning:
>107129703 >107129880 >107129911 >107129971 >107135655 >107135714 >107135921 >107135957 >107135992 >107137717 >107137751 >107137833 >107130017
--Logs:
>107130261 >107135147 >107135334 >107135409 >107135481 >107135491 >107135517 >107135792 >107135854 >107135967 >107136320 >107136332 >107136385 >107136522 >107136469 >107136808 >107136984 >107137104 >107137141 >107137735
--Miku and Luka (free space):
>107129864 >107130191 >107130344 >107131403 >107131513 >107131552 >107137895

►Recent Highlight Posts from the Previous Thread: >>107129340

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>107138549skill issue
>>107138613Models to translate chinese text from the image to hindi?
>>107138549Because there's demand, and most people are too dumb to prompt. If a model can't talk about the smaller things in life with no system prompt and zero context then people will give up until there's a model that can, even if it's measurably stupider.
When and if the AI bubble bursts, how do you predict it will affect local models? Do you think there will be a period of stagnation while the major AI companies stop development because of the crash, or do you think local models will pick up the slack and slowly iterate once the major players stop?
wholesome message from /oursaar/
https://youtu.be/mdlGTMAPoz8
>>107138775local cannot progress without corporate. the difference is that when corporate AI fails, we will still be here and we will still have our models
>>107138775I think China will keep chugging along, so local will still get something.
>>107138775The only reason local models are even being made is to get investors and build general interest for the company's proprietary models. If the bubble bursts then I wouldn't expect anything but finetunes, rather than actual new model releases, unless some group goes the crowdfunding route and there's enough interest for people to pay up.That said, I don't think we'll see a pop for at least another 2-3 years, if at all.
>>107138775Imagine outcry when suddenly their AI boyfriend is shut down, it will be like GPT4 shutdown, but 100x worse.
>>107138842A pop will trash the US economy at this point, so they'll keep the charade for as long as they can
>>107138842
>That said, I don't think we'll see a pop for at least another 2-3 years, if at all.
Lol
Lmao
>>107138862you can short the market if you're that confident lol
>>107138842>>107138859OpenAI plans to IPO next year. I imagine the pop will come shortly after that.
>>107138867I don't have enough money to gamble with but this guy is doing it https://uk.finance.yahoo.com/news/michael-burry-shorting-ai-stocks-092424898.html
>>107138639
にんじん = carrots
じゃがいも = potatoes
third one - not sure
(blank)ねぎ = it's probably たまねぎ, the regular ball onion, but with the first two characters missing, it looks like it's ねぎ, the signature green onion [hence the decision]
>>107138890this isn't hindi and you are no model
>>107138775I will short nvidia and use the money to buy up all the dirt cheap datacenter hardware to create the ultimate local model
>>107138968
nvidia has buyback agreements with pretty much all the datacenters they supply. They'd rather toss their GPUs into an incinerator than let people have more than 16GB VRAM for less than $2000.
>decide to return to GLM-Z1 for nostalgia's sake
>On regular Z1 {{char}}: before <think> jailbreak works like a charm.
>Rumination just doesn't give a fuck. If you tell it to write degenerate smut it will immediately go into a recursive thinking loop to refine the response (but it's a dumb 32B model and misses the point of the scenario entirely)
We have to go back, thoughbeit.
>mfw if I died an untimely death my loved ones would stumble upon my AI lab, see what I was getting AI to write and think I was the most awful human being on the planet.
This is the path to a post scarcity future that will cure all human suffering though.
uhm...which local model is the least safety slopped and good at coding so I can vibecode le epic malware?
>>107139156pygmalion 8b
>>107139156Deepseek R1.
It's been more than 5 threads and no new goof supported. I think we need to do something.
>>107139246
>and no new goof supported.
What do you mean? There was a zombie lobby that was being bombed by TNT, that sounds like a new goof to me.
Do zombies slip on banana peels? If so we could have another banana hell lobby with zombies included
Tried kimi-linear on OR because there's no gguf yet. And it's sloppy, it writes nothing like K2 at all, but a lot like Claude. Damn, because when I begged for a 2025 Mixtral I didn't mean another copy of that. Welp, guess us 24GB vramlets will have to wait some more.
Someone explain to me why the best sampling for RP/creative is simply not this:
>First sampler: minP (or Top-P if you like it better I guess)
>Second sampler: Temperature
>Set temperature to 5 or so
>Start raising minP (lowering in case of Top-P), find the value that produces minimal to no brain damaged outputs and stop there (In my testing for minP it seemed to be around 0.3 but likely to vary a lot based on model)
>You now have sampling that cuts all stupid tokens as first step and then levels out the probabilities of all remaining tokens so all of them are equally valid picks, promoting variety.
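Roughly what I mean, as a toy numpy sketch (the 0.3 cutoff and temp 5 are just illustrative, and logits is a 1-D array for a single next-token step):

import numpy as np

def sample_minp_then_temp(logits, min_p=0.3, temperature=5.0):
    # Turn raw logits into probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Min-p first: drop every token whose probability is below
    # min_p times the probability of the single most likely token.
    keep = probs >= min_p * probs.max()

    # Temperature second, on the survivors only: a high temperature
    # flattens the remaining distribution so the kept tokens end up
    # close to equally likely.
    kept_logits = logits[keep] / temperature
    kept_probs = np.exp(kept_logits - kept_logits.max())
    kept_probs /= kept_probs.sum()

    kept_ids = np.flatnonzero(keep)
    return np.random.choice(kept_ids, p=kept_probs)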
>>107138842
>The only reason local models are even being made is to get investors and build general interest for the company's proprietary models. If the bubble bursts then I wouldn't expect anything but finetunes, rather than actual new model releases, unless some group goes the crowdfunding route and there's enough interest for people to pay up.
>That said, I don't think we'll see a pop for at least another 2-3 years, if at all.
if the AI bubble pops, openAI might be fucked, all the rando smaller orgs might get fucked
google will remain very strong because they are in fact extracting profits from such models (AI search -> they get ad revenue via AI, not the target site). same for meta with whatever ad AI voodoo they use to print money. One of the two may or may not then sell their AI services at a premium to fill the market need - and keep it proprietary. Heck, google's top secret model "Sajak" is already basically AGI
>>107139402Just do topK 10 temp 5.
>>107139407>Heck, google's top secret model "Sajak" is already basically AGIWhat's the story here? Tried checking but getting nothing in a quick search.
I added the extra CoT data (written by QwQ) I said I was going to add to the Gemma finetune. The result is fairly interesting.
Now it's much less neurotic about its own mistakes, but still quite a lot of "you are absolutely right" slop.
>>107139418
Feels like 10 tokens is too many for every situation, since a bunch of the time there is really only one correct token, like when saying someone's name and whatnot, but might be fun to try at least.
>>107139447
>since a bunch of the time there is really only one correct token
Not really. Unless you are doing maths or outputting some sort of strict structure (json, html), or getting specific answers (yes, no, blue, red, etc), you more often than not want a healthy pool of possibilities.
In the case of the name for example, the next token might be the first half of the name, or a token that initiates a preamble to getting to the name.
The difference between
>The guy's name? John.
and
>The guy's name? It's that one motherfucker man! John, the asshole.
Tokens are positional, basically.
>>107139447Yeah no, even Top K of 3 causes grammar and punctuation errors at times. Has to be P to handle cases where only 1 or 2 tokens are at all reasonable.
>>107139500Maybe you are using big models that have a gentler slope from good to bad but I'm a vramlet and mistral stuff for example has a ton of cases where it drops to garbage almost immediately after the likeliest token whenever it's very sure about the top token.
>>107139402Variety does not equal goodYou can have 50 different indians shit on your plate, but that won't make you want to eat it.
make it stop aaaaaauuuughhh
Why don't the llama.cpp guys provide Linux binaries with CUDA support compiled in? The Windows version comes with CUDA support, so it can't just be because of some arbitrary software license.
>>107139742shaddup and compile
>>107139738
Isn't it ramping up because they are transitioning to DDR6?
>>107139779nope. thats still 2 years out at least
>>107139738but anon think of all the proprietary models they will train with all of that ram
>>107139540So P first, K second?
>>107139540That doesn't make sense. If the model makes mistakes at top k=3, at a higher top k mathematically it will make them more often.
>>107139897I usually just set topK to 20 and adjust temp based on how locked in the model is. If it goes very sloppy you need to confuse it.
>>107139742Because we Linux users can fend for ourselves.
>>107139924
unironically this, I also compile with new cuda (llmaocpp on winshit is at 12.4) and with ZEN5 arch optimizations
wintoddlers are RETARDs
>>107139792
Is it? Isn't DDR6 slated to start showing up in consumer parts by next autumn?
>>107139742
>git pull
>cmake ...
How hard can it be?
>>107139982no. zen 6 is still gonna be ddr5
>>107139742Because linux users are used to being third world citizens.
why compile by yourself if no new goof supported?
>>107139984to be fair I always have to check the build.md to see what the cuda on command was
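(for what it's worth, I think these days it's cmake -B build -DGGML_CUDA=ON followed by cmake --build build --config Release -j, but don't quote me on the flag name, it has changed before)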
I pull and recompile llama.cpp multiple times a day as an autistic stim
>>107140040You don't have terminal history autocomplete?
>>107140043Whatever gets you through the day anon, God bless you
>>107140044>cmake.. urg wat was it ctrl-r r r
>>107140044I run it on containers.
>>107140074
show hist config
shopt -s histappend # append don't overwrite
HISTCONTROL=ignoreboth # ignore space beginning lines and duplicates
HISTSIZE=10000
HISTFILESIZE=20000
>>107140074
>cmake
>press right arrow key
Not so difficult
>>107140110My 3090 gets 0 because I never bothered to download that garbage
>>107140129holy fucking based
>>107140110It's okay bro no need to be shy today you learned spoilers don't work on /g/
I want to preserve some of Opus-3 before it gets switched off in 2 months. I'm thinking I'll put like $500 on openrouter and build a dataset. I know I could get random prompts out of datasets on HF and fire them off, but that'd be shallow.
What's a good way to get multi-turn out of it? The datasets I've seen doing this with another LLM writing a response don't seem that great. The "human" follow-up replies are too generic and half the conversations are discussing what Claude is allowed to discuss.
>>107140145Hello sir.
>>107140145
I was thinking about distilling from closed models as well because frankly all the open datasets are trash.
The best way might be to prompt it with random segments of conversational datasets. Also there might be value to sampling with the same prompt multiple times at a high temperature to capture an approximation of the distribution rather than only the top tokens, since having (or in this case estimating) the soft logits is supposed to be much better for distillation, but I'm not sure how valuable that is compared to doing one capture with different prompts.
Another strategy would be to just use the model in the way you normally use it and just capture the logs. But that is obviously very time consuming.
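The multi-sample part would look something like this (untested sketch against an OpenAI-compatible endpoint; the base_url and model id are placeholders for whatever you're actually hitting):

from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-...")

def capture_samples(prompt, n=8, temperature=1.2):
    # Hit the same prompt several times at high temperature so the saved
    # completions approximate the output distribution instead of only the
    # single most likely answer.
    samples = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="anthropic/claude-3-opus",  # placeholder model id
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        samples.append(resp.choices[0].message.content)
    return samples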
A third alternative might be to offer free usage through a proxy while logging it and let people do the hard work of prompting it for you.
But that would have to be rate limited and otherwise locked down to prevent people from trying to DDoS you and waste money.
>>107140277>while logging it and let people do the hard work of prompting it for you.You really want to train it on "ah ah mistress"?
>>107140264
>I was thinking about distilling from closed models as well because frankly all the open datasets are trash.
Yeah, I noticed that. But I don't know that mine would be any better considering these would have been made by smarter people than me.
If Opus-3 is one of the models you wanted, we've got until January 5th: https://docs.claude.com/en/docs/about-claude/model-deprecations
>Also there might be value to sampling with the same prompt multiple times at a high temperature to capture an approximation of the distribution rather than only the top tokens since having (or in this case estimating) the soft logits is supposed to be much better for distillation, but I'm not sure how valuable that is compared to doing one capture with different prompts.
Good point, I think I'll do that for at least the first turn. Even if I can't figure out how best to use them right now, at least I'll have it before the model is removed.
>Another strategy would be to just use the model in the way you normally use it and just capture the logs.
Yeah, I've got about 200 conversations I can export in openwebui. Still, likely going to need a lot more than this.
Kimi-K2 suggested I need to use different system prompts as well.
I think I'm going to need a model responding to it, but not stupidly like this:
https://huggingface.co/datasets/kalomaze/Opus_Instruct_25k?conversation-viewer=2
(Why does it have to say "Claude" in every reply?)
>>107140356No, I'm interested in logic, programming and reasoning, but I assumed OP wanted to distill for coom since modern models do better at "productivity" tasks.
>>107139751
Retard, there are already Vulkan and CPU versions for Linux but not CUDA.
>>107140370ur rarted
New Cydonia is really good
v4ze is good too, but I've been getting better results from v4zd. Responses are still varied and perfectly coherent at 24K context.
>>107140360
If you can gather your or somebody else's logs about the topic you care about (even for other models), you can finetune a LoRA (what base model you finetune it on doesn't really matter) to predict what the user would say given a certain assistant message.
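The adapter setup itself is only a few lines with peft, something like this (base model and hyperparameters are arbitrary, and you'd still need to build the dataset so the loss lands on the user turns):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.3"  # arbitrary, the base model barely matters here
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters on the attention projections; train it to continue a
# transcript at the point where the user, not the assistant, speaks next.
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()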
>>107140356>>107140277Yeah we've already got an Opus-3 "ah ah mistress" dataset which I think was created that way via 4chan volunteers. The Magnum models were trained with it.
>>107140370
hush now
just compile it
>>107140380pretty sure i've read all this shit about a dozen other times.. looks pretty fuckin same to me
>>107140380
Fuck, cut off the last line.
New Cydonia is really good
v4ze is good too, but I've been getting better results from v4zd.
Responses are still varied and perfectly coherent at 24K context.
>>107140365
>No, I'm interested in logic, programming and reasoning, but I assumed OP wanted to distill for coom since modern models do better at "productivity" tasks.
Not to "coom", just the overall voice of the model.
Opus-3 isn't very good for logic/coding (otherwise one of the Chinese labs would have distilled it and I wouldn't bother)
>>107140399Well, that's the extent of my knowledge. I searched around but there doesn't seem to be anything too fleshed out unlike the style transfer in the visual domain which is a mature ML task.
I've another idea. Maybe ask it to write an infinite choose your own adventure game with say 4 different options on each generation, and systematically explore all possible branches of the tree? I think that would be interesting in and of itself besides the Claude situation.
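Quick sketch of the walk I'm imagining (generate() is a stand-in for whatever client you use, and the branch count blows up as options^depth, so the depth cap matters):

from collections import deque

def explore(root_prompt, generate, n_options=4, max_depth=3):
    # Breadth-first walk over the choose-your-own-adventure tree.
    # Each node is the transcript so far; every reply is asked to end with
    # n_options numbered choices, and we branch once per choice.
    captured = []
    queue = deque([(root_prompt, 0)])
    while queue:
        transcript, depth = queue.popleft()
        continuation = generate(transcript)  # stand-in for the actual API call
        captured.append((transcript, continuation))
        if depth < max_depth:
            for choice in range(1, n_options + 1):
                queue.append((transcript + continuation +
                              f"\n> I pick option {choice}.\n", depth + 1))
    return captured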
>>107140394
It's hard to convey the value of a model in a single post, no one here is going to read thousands of words of a slop fantasy RP that devolves into smut. I really do think that it's the new best coom/creative model that can comfortably fit in 24GB VRAM.
The main points I enjoy about it, compared to regular Mistral, Gemma, Qwen models ~30B and under:
>characters will swear when it makes sense in context, most other models will either do it in every reply, making the character seem stupid, or be too prudish to have the character swear of their own accord
>swipes are varied, even at a modest temp of 0.7 (which is about the upper limit for mistral small, before it starts getting noticeably dumber)
>doesn't speak for user particularly often, a problem I've had with other recent Cydonias
>relationships and sex are effectively built up slowly, e.g. characters will flirt, a day can pass without further mention, and they'll recall it and continue the next day, ~2-3k tokens later.
what am I in for?
>>107140501safety cuckery even if you don't goon
saaaar do not redeem the chain of thoughthttps://www.youtube.com/watch?v=IeCS6hsnOXs
>>107140392fuck off avatarfag.
>>107140501
>we must refuse
it's a decent model for sfw tasks & tool calling, runs fast
It seems that adding the QwQ data to the dataset made the model much more sensitive to overfitting, even though the validation loss kept going down. Now I had to decrease the lr from 1e-05 to 1e-06 because the CoT data made it primed to get stuck in repetition loops. I think it's probably because of the repetition inherent in CoT models.
>>107138890
they're all ingredients for butchered """curry""" so I'm assuming just curry powder
checked a dictionary and it's curry roux
>>107138775
>When and if
There are hundreds of promising research ideas yet to be tested at scale. LLMs are already impacting the job market and only continue to improve. Not gonna be a sudden a-ha! thing where the world changes overnight, just bumpy but steadily better until everyone's left wondering where the jobs are and the civil unrest picks up because UBI ain't happening
>>107140380
Every model feels sloppy compared to Cydonia in its size range desu
I still try other models people recommend but they majorly suck ass
>>107140501toss for the smarts, air for the dick
>>107140689no toss on the 'ick?
>>107140689>>107140692go back to the sharty you homo faggots
>>107140689
>we must refuse
>"the dick?" Air echoes
both are awful
>>107140734so are every other llms, but for 100b those two are the only decent options
>>107140741>so are every other
Ok, I changed my data mix to the following:
my own assistant logs x 4
openthoughts x 2
qwq cot x 1
x n being the number of times the data is duplicated in the dataset, i.e. a hacky way of having a different number of epochs for each data class while still randomly shuffling the samples
not sure how it'll work
next I wanna try adding some RP data to see if it helps with coherency, and also check out the openassistant dataset
after that it might be time to begin testing the waters with rlvr
oh, and also some data augmentation, although I'm not sure how that works on text, only tried it with images
>>107138606
At least in Germany the supposed $500 MSRP for the Intel Arc B60 has so far not materialized; at €770 I don't think they're worth buying.
Is the pricing in other regions at least better?
>>107140748holy zased
>>107140749I just realized this causes us to train on the validation set. Fuck. Oh well, the validation split didn't seem to be very useful anyway.
>>107140832tech MSRPs are just marketing material, they're complete fiction.
>>107140853I figure I will have to make an explicit manual split beforehand and then it'll be alright
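Something like this is the shape of the fix I have in mind (just a sketch, the dataset names are mine): hold out the validation rows per source first, then duplicate only the training side by its weight and shuffle.

import random

def build_mix(assistant_logs, openthoughts, qwq_cot, val_frac=0.02, seed=0):
    rng = random.Random(seed)

    def split(rows):
        rows = rows[:]
        rng.shuffle(rows)
        n_val = max(1, int(len(rows) * val_frac))
        return rows[n_val:], rows[:n_val]

    # Take the validation split *before* duplicating anything so the
    # duplicated rows can't leak into the held-out set.
    logs_tr, logs_val = split(assistant_logs)
    ot_tr, ot_val = split(openthoughts)
    qwq_tr, qwq_val = split(qwq_cot)

    # Weighting by duplication: logs x4, openthoughts x2, qwq cot x1.
    train = logs_tr * 4 + ot_tr * 2 + qwq_tr
    rng.shuffle(train)
    val = logs_val + ot_val + qwq_val
    return train, val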
it took like half an hour to get huggingface hub installed, due to GPT5 hallucinations and microsoft store fuckery. this is not a serious industry
According to this talk the way the Llamas were trained was by taking a validation set from the complete training set and then changing the weight of each dataset based on how much it affected this validation set. They claim this is an open problem. I think it can be fairly easily explained as some of the data being overrepresented in the training set and causing overfitting if not downsampled. https://www.youtube.com/watch?v=-TIZPe_YaiU
>>107140877skill issue
>>107140689>toss>smartslmao
whenever I feel cold I just crank up my 3090's power limit
>>107140877If you need GPT5 to install a fucking program I think this hobby might not be for you
>>107140894I'm not going to watch a random 1-hour video, but it's common knowledge that most AI companies optimize their pretraining dataset mixtures for synthetic benchmarks, other than "safety".
>>107140877what do you need huggingface_hub for?
>>107140921
Does validation loss count as a synthetic benchmark though? It's about how accurately it predicts the pretrain dataset.
As for the video, the claim happens at about the 15 minute mark, but the channel is one of the best channels I found when it comes to ML theory.
And people say there is nothing worth watching on youtube.
toss bros?
>>107140928
>what do you need huggingface_hub for?
i want to use the extropic thrml simulator to do topk prefiltering, but i need to rip the gptoss embeddings first so i can shuffle them around so each axis of the embedding fits into the 2d thrml array meaningfully
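the embedding-ripping half at least is simple, something along these lines (assuming the weights load through transformers; the repo id is whichever gpt-oss checkpoint you're actually pulling, and the thrml-side shuffling is a separate problem):

import torch
from transformers import AutoModelForCausalLM

# Load the model and copy out the input embedding matrix
# (vocab_size x hidden_dim) so it can be rearranged offline.
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b",
                                             torch_dtype=torch.bfloat16)
embeddings = model.get_input_embeddings().weight.detach().cpu()
torch.save(embeddings, "gpt_oss_embeddings.pt")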
>>107140956lol
>>107140956>We must not reveal we are cucked.
>>107140921lol they have entire teams for safetycucking
>>107140956weird that an optimized model devotes thinking tokens to "so"
>>107141012he was talking about pretraining, has nothing to do with safety
>>107141030Curtailing the corpuses (corpii?) used in pretraining is very much a job for the safety team
>>107141086ok, fair. but do you have any reason to believe that optimizing the validation loss on a subset of the complete unfiltered corpus would correlate in any way with safety? because the claim on the video was about optimizing the validation loss