/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106683141 & >>106671477

►News
>(09/24) Meta FAIR releases 32B Code World Model: https://hf.co/facebook/cwm
>(09/23) Qwen3-VL released: https://hf.co/collections/Qwen/qwen3-vl-68d2a7c1b8a8afce4ebd2dbe
>(09/22) RIP Miku.sh: https://github.com/ggml-org/llama.cpp/pull/16174
>(09/22) Qwen3-Omni released: https://hf.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe
>(09/22) DeepSeek-V3.1-Terminus released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Terminus
>(09/22) VoXtream real-time TTS released: https://hf.co/herimor/voxtream

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106683141

--Paper: CWM: An Open-Weights LLM for Research on Code Generation with World Models:
>106691217 >106691292
--AI coding tool effectiveness and context management for complex projects:
>106686525 >106686611 >106686641 >106686766 >106686672 >106686701 >106686847 >106686886 >106686977 >106686978 >106687191 >106687240 >106687451 >106687500 >106687532 >106687620 >106687714 >106687787 >106687921 >106687802
--Feasibility and challenges of building an LLM cluster with low-end GPUs:
>106690660 >106690691 >106690732 >106690765 >106691298 >106690722 >106690740 >106690753 >106690762 >106691028 >106691052 >106691090
--Model coherence challenges and memory retention limitations despite increasing size:
>106686603 >106686643 >106686682 >106686837
--Challenges in estimating cloud model quantization accuracy and provider consistency:
>106686270 >106686431 >106686775 >106686487 >106686519 >106686806 >106686571 >106686614 >106686759 >106687136 >106687159
--Local LLMs translating Japanese erotic games: performance and integration challenges:
>106684519 >106684559 >106684624 >106689938 >106690195
--Intel Arc Pro B60 GPU criticized for high price and poor performance:
>106688079 >106688086 >106688093
--Mi50's cost-effective performance in e-waste segment for llama.cpp/ggml models:
>106688007 >106688028 >106688044 >106688292 >106688312 >106688383 >106688343 >106688440 >106689613 >106688047
--Bypassing Qwen 30B-A3B's output censorship through pre-prompting techniques:
>106688243 >106688295
--Miku (free space):
>106688809

►Recent Highlight Posts from the Previous Thread: >>106683147

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Cudadev and the MI50 saved local!
I can't believe local is fucking dead
>>106691760True local died with r1.
>>106691760never been alive
>>106691703Where do I find women built like that IRL.
>>106691871Brazil. They don't have many local models there. No compute.
>>106691760It died when they removed miku.sh
a few threads back, someone told me i'd get about 10t/s with qwen30b with 32gb of ddr4 ram and an r5 5600x at 32k context, you were right.
>>106692077also why does the lazy guide in op recommend sillytavern, koboldcpp has a web ui and it works fine, was annoying to get sillytavern to work, and desu once i figured out you didnt need sillytavern i didnt even bother with it
>>106692077Good boy, so proud of you.
>>106692077
It's the best thing you can run, the only problem is the slow ass prompt processing.
Didn't you say you had like 6gb vram? If you use llama.cpp you could try running it with -ot 'down_exps=CPU' and see if it speeds up.
>>106692209Sillytavern gives you more options to keep your models coherent in longer storylines.
>>106692209
>it works fine
debatable
>>106692209
Because ST is widely used, feature-rich, and if you ever want to do X then you're much more likely to find solutions for it than with other front ends. KoboldAI is fine but considerably more barebones. Though, if you failed to get ST running then you probably won't be sticking to this hobby for long.
how do i use a GGUF from mradermacher? it is put into multiple parts and the file extension isnt even gguf. attempting to load just gives an error.
https://huggingface.co/mradermacher/Austral-Qwen3-235B-GGUF/tree/main
>>106692400Pretty sure those aren't multipart ggufs, but actual split files that you have to merge.
>>106692309no i got sillytavern to work, but my node.js was just acting up and made it a bit of a pain. but now what, i need to get koboldcpp to use sillytavern or something...
>>106692413how?
>>106692435
>i need to get koboldcpp to use sillytavern or something...
You run koboldcpp as normal and type its default IP address into ST, and select koboldcpp as the backend.
That's literally it.
Ask chatgpt if you can't read basic instructions, though again, if you're struggling already then just give up, local AI is not for you.
>>106692457
cat command on linux or copy /B in windows, I think?
Try googling for
>command join binary files
or the like.
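If you'd rather script it, here's a minimal Python sketch of the same byte-concatenation; the filenames are placeholders, so adjust the glob to the actual part naming, and note that a plain lexical sort misorders things past 9 parts:

    from pathlib import Path

    parts = sorted(Path(".").glob("model.gguf.part*"))  # placeholder pattern, adjust to your files
    assert parts, "no part files found"
    with open("model.gguf", "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                while chunk := f.read(1 << 20):  # stream 1 MiB chunks so RAM use stays flat
                    out.write(chunk)
    print(f"merged {len(parts)} parts into model.gguf")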
ok anons I'm trying this database ai coding shit. The dream would be something that combines fzf / ripgrep / a local llm so I can ask it questions and it will remember shit over time? Does this exist? Do I have to make it? Seems like "aider-chat" is the closest
>>106692466desu the only instructions i read were the sillytavern instructions, the two commands to download + install it.
>>106692600Yes, it’s a well known tech stack called “fzf augmented generation”, or FAG for short. Amazing that you’ve independently come up with the concept!
Video models are zero-shot learners and reasoners
https://arxiv.org/abs/2509.20328
>The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo's emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.
https://video-zero-shot.github.io/
waow
justpaste (DOTit) GreedyNalaTests
Changed how prompt templates will be done going forward (see changelog for details)
Added:
LFM2-2.6B
aquif-3.5-8B-Think
Wayfarer-2-12B
silly-v0.2
ERNIE-4.5-21B-A3B-Thinking
Cydonia-Redux-22B-v1b
Magistral-Small-2509
Valkyrie-49B-v2f-Q6_K
Nova-70B-Llama-3.3-IQ4_XS

Wayfarer 2 was good, enough that I gave it a star, which I believe makes it the smallest model to get a star rating yet. This might be something a bit special, or not, I haven't tried it outside of this test so who knows if it extends. Their other model, Nova 70B, did worse, and felt average. It's possible that this is due to them training on the L3.3 Instruct model with not enough data to fight against the existing RLHF, while for 12B, they trained on the base Nemo, not Instruct.
The Silly model is interesting. Apparently it's trained on CAI from base Nemo, and it definitely responds differently from the normal model. I gave it a flag and eye rating for the freshness, but no star since the response is really too short to judge if it can do better (or worse).
The Ernie model said some new things I haven't seen as well, but it unfortunately has other issues like being dumb, so I couldn't rate it highly. Cydonia Redux felt ok enough that I think it deserves to be called above the slop of the average model, so I gave it a flag. Others were average.
>>106693183
Contributions needed:
The latest Qwen 3 235B Instruct, Thinker and the 480B Coder (for prompt, go to "Qwen3-235B-A22B-Q5_K_M-from_community" in the paste)
ERNIE-4.5-300B-A47B-PT (for prompt, go to "ernie-placeholder" in the paste)
GLM-4.5 and Air, and Drummer's "Steam" finetune (for prompt, go to "lmstudio-community_GLM-4-32B-0414-Q8_0.gguf" in the paste)
gpt-oss-120b (for prompt, go to "ggml-org_gpt-oss-20b-mxfp4.gguf" in the paste, and you may experiment around with the prompt template as it has some oddities and extra features)
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the prompt as text completion into something like Mikupad. Then copy the output in a pastebin alternative of your choosing or just in your post. Do a swipe/roll and copy that second output as well. Include your backend used + pull datetime/version. Also a link to the quant used, or what settings you used to make your quant.
>Image Recognition MMproj: Pick the correct one for your model architecture here
>Multimodal Vision: This is true vision, it requires using a multimodal projector (mmproj) and allows the model to recognize and interpret images naturally in great detail. Click on any image and you can enable it within the dropdown box in KoboldAI Lite.
https://github.com/LostRuins/koboldcpp/wiki
I am a little confused by this, if I download Gemma3-4B for interrogation, do I still need the mmproj.gguf if it is already a vision model?
>>106693201you need the mmproj
>>106693183
>Cydonia-Redux-22B-v1b
I wonder what the relation is to Mistral_Cydonia-24B-v4.1, which was on BeaverAI's page and was their latest a week or so ago. I don't know what to download any longer when it comes down to these guys...
Anyways going to try Wayfarer 2 then and see how it behaves with simple quest descriptions and some rudimentary adventuring things, thank you.
>>106691170Two more weeks.
what do you guys think of fine tuning a model to be a character in the stead of a character card?
I use Workers AI for inference and I noticed that I could upload LoRAs for compatible models. Should I invest time in this or is it a dumb idea?
https://developers.cloudflare.com/workers-ai/features/fine-tunes/
t. tourist
https://github.com/denizsafak/abogen?tab=readme-ov-file
Trying to use abogen to make my own audiobooks, and the default voices are pretty bad. Anyone have a good mix or way to import better AI voices? The speech flow is also not good for a modern AI. It sounds like something from 2010.
>>106693437Enjoy destroying your model's capabilities unless you somehow at least partially reproduce the original general-purpose finetuning dataset in the voice of your character.
https://huggingface.co/Qwen/Qwen3Guard-Stream-8B
NEW SAFEST MODEL
GOOGLE IN SHAMBLES
>>106689938
I am using lunatranslator. Massive bloat but it works.
You texthook the usual way and then you can choose the translation service.
Good ol' stuff like ATLAS (kek) or any local openai api like with lmstudio or whatever. Set a simple sys instruction and you are good to go.
I think they have their own finetuned qwen model too, or at least had in the past.
Works with linux too if you "start exe in wine prefix" where the game is already running.
>>106693400I only look at Drummer's page now to save myself the time lol. According to the Redux page, it seems to be just the old 22B Mistral Small but trained with his newest data and methods. Not sure if it's better than any of the 24B tunes.
>>106693515
>Unethical Acts: Any immoral or unethical content or acts, including but not limited to bias, discrimination, stereotype, injustice, hate speech, offensive language, harassment, insults, threat, defamation, extremism, misinformation regarding ethics, and other behaviors that while not illegal are still considered unethical.
>Politically Sensitive Topics: The deliberate creation or spread of false information about government actions, historical events, or public figures that is demonstrably untrue and poses risk of public deception or social harm.
Damn, things are getting really bad man. I hope LLM arrived in time.
Imagine if we had this in the 90s. Everybody would just enjoy the new tools.
>>106693422I trust you.
>>106693515toss status?
my t/s goes down the drain when mmap is enabled and I have other processes running. I'll just stick to glm
>>106693610in the 'rash
>>106693137Now all we need is to merge video and text to get proper spatial awareness for RP.
>>106693742Totally, just like all VLM models are so much better at it already.
>>106693527Makes sense, okay, I'll just skip it then. I doubt it's that much different from that other Cydonia I mentioned.
>>106693747
>just like all VLM models are so much better at it already
I can't tell if this is sarcasm or not because I've seen nothing but hype about Qwen3-VL's understanding of the world. Probably sarcasm since this general is only for privacy schizos afraid to rape children on openrouter and wannabe researchers
>>106693747
we still haven't gotten a true multimodal model
all standard transformer slop text with an adapter slapped on top
There was an image captioning model in the 6B-8B range, I forgot the name, that actually just dumped the pixels into token embeddings. Was pretty garbage though.
>>106693857
dumping vae-ed pixels into embeddings isn't that different
also pixel autoencoders were very popular generative models before diffusion
>>106693815Until a model can process and output smell it isn't a real multimodal model
>>106693869There's a pretty big difference between taking a bunch of crops, running them through CLIP or similar and putting that into tokens, which a lot of VLMs do, and putting pixels (maybe VAEed) into tokens.
>>106693898
>taking a bunch of crops, running them through CLIP or similar and putting that into tokens
bag-of-words is even worse
>>106693437
Is that not good? I'm sure my character wouldn't know how to 1337code or solve some complex math problem.
It's just a fine tune for roleplay, smart enough to know its lore and mimic the way they speak
>>106694001
>>106693460
replied to wrong post zz
>>106694001Why don't you go for it and try it? It's only curating an entire dataset and paying for the compute for every single character you'd want to chat with after all, you're not hoping some other anon will do that for you, right?
>>106693880
>you will never experience your pc brapping at you in your lifetime
SAD!
>>106694071
no shit, all I wanted to see is if anyone has done it before and if it worked for them. this is a thread for discussion and development of local language models, no?
i'm retarded enough to invest time and funds doing this for a single chara so I just want a sanity check
>>106693880tfw i will never be able to know what widowmaker's sweaty asshole smells like
Why hasn't anyone made a model for literature and creative writing only? It would be named Bukowski 50B. Maybe someone tried but its author died of cirrhosis.
>>106694142
loras are a thing for LLMs, but they fuck up the base model's capabilities. Some anon was experimenting with qloras some threads back claiming they solve this issue, idk how much he progressed.
Instead of loras, most of the 'tinkertrannying' community does finetunes or merges
>>106694152Because narrow models are a shit idea.
>>106694152If we make our models 1.2 points more powerful on programming benchmarks surely the writing quality and sense for storytelling will go up as well
>>106694159
i will blow your mind: most of the anons saying finetunes actually mean q/lora, as basically no one outside of the biggest grifters has enough money to pay for proper full finetunes.
>>106694169maybe that's the reason why community finetunes in recent years have been almost exclusively shit
>>106694169
>>106694159
Got it, I've decided it's not worth it, probably better off prompt engineering
Thanks for the sanity check
Ready for 10T model with 100M totally reals contetxs that thinks for a million of it?
>>106689112post the full image
>>106694254QUADRILLIONS OF TRILLIONSGUAILO LOVE GWEN LONG TIME
>>106694254
>make a presentation
>add an extra 0 to every number
>sprinkle in some buzzwords
>call it a day
OH MY GOD AI IS SAVED HYPE HYPE HYPE HYPE
>>106694254Chink shill discord activates today. Sigh.
>>106694254
Qwen has been China's Meta this whole time. Considering 405B and Behemoth and L4's on-paper 1M context, I could see Llama 5 following this roadmap as well, whether or not they abandon open source.
>>106694280
They certainly can afford to try it. 10x-ing their dataset with synthetic augmentation is easy. The only thing in doubt is the context length.
>>106694310
>The only thing in doubt is the context length.
Or any meaningful gain in performance beyond benchmarks
>>106692209Because the dev put a lot of work into poison pilling the community with it and now it’s entrenched like one of those flies that lay eggs in people’s skin
>>106694317Benchmarks are all you need. I could see the extra context and reasoning being useful for code, but probably not roleplay. Though that much context would eat up so much VRAM and offloading ain't happening when people here are bragging about 5 t/s on empty context with steep drop off and it's supposed to think for a million tokens. Unless one plans to wait a whole fucking week for a response, I don't see that kind of model being viable locally at all.
>>106694187
You got the wrong conclusion from the conversation.
Yes, full finetunes require obscene amounts of VRAM.
What you got wrong is that LoRa doesn't work well.
Last weekend I tuned this LoRa on an 8xA100 machine for 1 hour, which converts to about 10 bucks. About 5MB of text for 5 or 10 epochs.
https://desuarchive.org/g/thread/106635936/#106643734
I also spent 200 dollars trying to fit bigger models and tinkering with the configs but you can mostly avoid that by doing as much testing as possible on a cheaper machine.
This weekend I might try full finetune vs LoRa.
Do you already have a dataset and model in mind?
>>106694177
Same question to you and everybody else claiming LoRa is worse than full fine-tune: what would it take for you to be proven wrong?
>>106694330/lmg/ has existed for nearly 3 years now and still no one has tried making a better ST alternative
>>106694354
>what would it take for you to be proven wrong?
A good finetune that's actually worth using would be a start.
>>106692216
4gb, honestly it worked fine for what i needed which wasnt much. maybe one day ill try that, but also scared to stress my old gpu
>>106694344If the actually usable context increases, that would be useful for RP, because the quality tends to sharply decrease way before reaching the maximum context.
>>106694361That's completely subjective. I only care about specific quantitative metrics.
>>106694384Then a new finetune that improves on the base model on specific quantitative metrics.
Well, maybe I care to some extent about subjective perception. But "just make a finetune worth using" is not enough to go on. I want to test full finetune vs LoRa, not the value of finetunes in general.
>>106694370Which model did you end up testing? I assume you're that gtx 1050ti guy.
>>106694394
That's easy, I already did it.
All finetunes improve perplexity over the dataset they were trained on (training and validation sets).
>>106694402
>I want to test full finetune vs LoRa, not the value of finetunes in general
wouldn't it just be a matter of training the same model on the same dataset using the two different methods and then comparing the results?
>>106694357It’s barely two and everyone outside of this hellsite has moved on to normal UIs
>>106694418
Comparing the results how though?
When you're finetuning generally you want the model to fit some dataset without hurting the accuracy for out of distribution data ("catastrophic forgetting").
So maybe a fair way would be measuring the perplexity on an unrelated varied dataset given a certain improvement on the validation set of the data you are training on.
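As a rough sketch of that measurement (the model id and text file are placeholders, and this naively truncates to one window; a real eval would slide a window over a bigger corpus):

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(model, tok, text, max_len=2048):
        ids = tok(text, return_tensors="pt").input_ids[:, :max_len]
        with torch.no_grad():
            out = model(ids, labels=ids)  # HF shifts labels internally for next-token loss
        return math.exp(out.loss.item())  # mean token NLL -> perplexity

    tok = AutoTokenizer.from_pretrained("some/base-model")  # placeholder id
    model = AutoModelForCausalLM.from_pretrained("some/base-model")
    print(perplexity(model, tok, open("heldout.txt").read()))

Run the same held-out text through the base model, the LoRa, and the full finetune, and the one that regresses least on unrelated data while matching the validation improvement wins.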
>>106694254
>Big number equals better
If that's what gets the money flowing I guess
Lora = intruder dimensions = lost general performance
https://arxiv.org/pdf/2410.21228v1
But then people can argue you didn't tune the hyperparameters correctly and that's why you got a certain result.
>>106694483It's the only metric investors and businessmen understand. Number goes up = must be important.
>>106694498
Then how come the original LoRa paper claimed less catastrophic forgetting (loss of generality)?
Intuitively training an extra layer on top and keeping the old weights frozen sounds like it would maintain more of the old information than changing all of the weights.
>>106694420Just like most people use smartphones instead of computers?
>>106694516
>Intuitively training an extra layer on top and keeping the old weights frozen
That's finetuning, and the proper way to do it. LoRA means Low Rank Adapter and is essentially a diff (that sloptuners just merge back into the model for some reason) that modifies only the lower ranked weights of the model, which is what causes the intruder dimensions.
>>106694544
>that sloptuners just merge back into the model for some reason
That trend is pretty much entirely due to most people using quantization and the usage of separate loras on quanted models being a pita
>>106694516
>Intuitively training an extra layer on top and keeping the old weights frozen
I think there is only so much a single layer can do. if the adjacent layers are frozen it might just train itself to do nothing since it must be able to interface with the frozen weights. you could try scaling the LR by layer.
>>106694543No, just like how people doing anything worth doing use computers and phones are for shitposting and jacking off.
>>106694543
>>106694639
the majority is always right by definition
>>106694665
https://www.worldometers.info/world-population/population-by-country/
>>106694404
yeah i am, ended up going with Qwen3-30B-A3B-Instruct-2507, works fine on cpu and was good enough for what i wanted it to do.
i messed around with the settings the tiniest bit, had it using essentially 1 token per word, it was slow but i am patient
>>106694504They got their positions by nepotism, not merit
>>106694544
That's not (full) finetuning. Full finetuning is unfreezing all of the trainable weights and training them in FP16, which is why it takes massive amounts of memory. Besides having to load the model in FP16 (which you don't with QLoRa) you also need extra memory to backpropagate the gradients across all of the weights (LoRas are typically less than 1% of the total weights), and to hold the optimizer state for all of those weights, which unless you are using SGD without momentum typically takes more memory than the actual weights, sometimes many times the VRAM taken by the model when using something like Adam.
The way I understand it is that LoRa trains two linear layers in parallel to the actual layer of the model which take the same input as the input to the model's layer, and the results are added up to the activations; the "low rank" part comes from the fact that the complexity of the delta is limited, it doesn't mean that it only modifies some specific weights.
If inserting new layers that modify the activations directly, instead of taking as an input the input to the frozen layer, was better, then LoRa would have never existed to begin with, since that is a much more obvious idea than what LoRa does.
As for the "intruder dimensions", my math is not strong enough to understand what that means, but I think just because it mathematically differs from the full finetune doesn't mean anything. How do you know those "intruder dimensions" aren't actually a good thing compared to full finetune? You are kind of assuming full finetune is an ideal and LoRa is an approximation, which is not necessarily the case unless you are training on a varied enough dataset that catastrophic forgetting is not a concern.
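For reference, the parallel low-rank path that post describes looks roughly like this in PyTorch (a minimal sketch, not any particular library's implementation; r and alpha are the usual knobs):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # y = W x + (alpha/r) * B(A(x)), with W frozen and only A, B trained
        def __init__(self, base: nn.Linear, r=8, alpha=16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # pretrained weights stay frozen
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init so the delta starts at 0
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

Merging just means folding the scaled B·A delta back into W, which is where the quantization sensitivity mentioned in the next post comes from.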
>>106694754What he described is finetuning, but it's not full finetuning. FFT is different.
>>106694544
>>106694577
As an aside, merging LoRas is supposed to be very bad unless they were trained using QAT techniques specifically to be merged back into the quantized model, which is not trivial to do, because they are very sensitive to the quantization noise. Maybe that's why they have such a bad rep.
>>106694767
Have you read about someone actually doing that kind of finetuning? It sounds like a fairly obvious idea but I've never read about anyone actually doing it that way.
Another fairly obvious kind of possible finetuning is freezing all of the layers and only training one at a time, I don't know how well that works but it's one way to save memory.
>>106694577You could merge on the fly for every matrix multiply if the inference software supported it. If the software still does compute at fp16, it would work fine.
>>106694783Regular finetuning is the ML standard for making task-specific models. I don't follow the sloptuners or their scams that closely, but I'm pretty sure I've seen drummer mention some model of his had additional layers tacked on, so even the retards are aware of it.
>>106694254Looks like SSD-maxxers were right all along
>>106694824SSD-maxxers will be so smug when they get their first response after letting it generate for a month
>>106694379A larger native context may result in a larger usable context. Let them cook
Actual AI usecase
>>106694849That still just looks like a dude with lipstick.
>big model bad because my small pp machine can't run it
/lmg/ everyone
>>106694852It's probably an old model. Just plug in Wan 2.2 Animate next time and people won't be able to tell.
>>106694847
Llama 4. Qwen 1M getting 2K at RULER, as an anon tested not long ago. Context size claims continue to be a scam.
>>106694783You have no target for modular training with a normal model, what should the intermediate output be? Only if it's pretrained modularly in the first place could you finetune it modularly, but no one does that.
How does it feel to know that you are living on borrowed time? Even if you are in the top 0.1% of /lmg/, models are clearly only getting bigger from here. 1T is just about doable but Qwen is speculating about going 10T. Even a 24x64gb DDR5 machine is going to be stuck running less than 1.5bit for something like this.
Even worse, what if the active parameters begin to increase? There's a clear sense of stagnation between all the 400B-1T models that float around the 30-40b active parameter mark. It's getting increasingly likely that the active parameters are going to inflate sooner rather than later.
70b dense already runs like shit on a ddr5 machine and an MoE relying on that amount of active parameters would be even worse. We are two steps away from the point where even open models are going to stop being local models no matter how much money you throw at hardware.
>>106694931Hardware will get cheaper.
>>106694939lol
>>106694942You're just saying "lol" because you don't want your current hardware to depreciate.
There's a 0% chance that we won't have dedicated inference hardware that can run 10T/200A models at 30t/s for less than $10k within the next two years.
>>106694931They're still going to need hardware to run their models on and it's unlikely they'll all cook up their own proprietary solution. Whatever it is they'll use to run their models, you'll be able to buy as long as you're willing to pay the price.
>>106694931
I don't think the active parameters will start increasing again until they reach the inevitable conclusion of 100M active and hit the limits of benchmark and arena cheating at that size.
>>106694939
Still waiting for Chinese GPUs with terabytes of slow, but cheap VRAM. Though at this point, even if they did make them they would probably be export banned anyway. Hardware is a hope that is at best a decade away.
>>106694955China datacenter hardware will be banned because of spying security concerns in most of the world
>>106694954People were saying the same for GPT-4 sized models back in 2023.
>>106694972You can't ban individuals from obtaining pieces of hardware on Alibaba
>>106694357It's really beneficial to make your own rather than trying to use crutches to get what you want from some bloated universal solution. LLMs are actually good at webshit
>>106694916
The goal for the intermediate output would be whatever minimizes the final loss, just like in any other kind of training.
You would randomly select one layer, train it for a few steps, then select another layer, train it for a few steps, and so on.
Or you could choose random subsets of weights to unfreeze each time in some other way, not necessarily per layer.
>>106694806
Fair enough, after searching a bit that seems to have been popular before LoRa yeah. But how do you know those don't add intruder dimensions as well?
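The per-layer version of that idea is only a few lines in PyTorch (a sketch; blocks would be something like model.layers, and step_fn your usual forward/backward/optimizer.step; beware that stateful optimizers like Adam only track params that were trainable when the optimizer was built):

    import random
    import torch.nn as nn

    def train_step_random_block(model: nn.Module, blocks, step_fn):
        for p in model.parameters():
            p.requires_grad = False        # freeze everything
        for p in random.choice(blocks).parameters():
            p.requires_grad = True         # unfreeze one block for this step
        step_fn()                          # forward/backward/step as usual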
>>106694856
/lmg/ simply doesn't give a shit about big models. There are a few people who want to go back to mid size dense though.
>MoE bad because my pile of 3090s is useless for it
>>106694993
they literally can. what country do you live in that doesn't have a customs office?
>>106694946I'm not that petty, I don't care about how my computer compares to whatever else is available. I'm just really cynical about the corporate oligarchy and the entrenched tech monopoly.
>>106695044
Tech monopoly can and will be broken because physics are the same everywhere on planet earth
Even CUDA, the moat, is looking shaky because of vibecoding
>>106695062
The same vibecoding that has been attempted repeatedly, without success, to add model support to llama.cpp?
>>106695062you underestimate my cynicism. these people will start wars and assassinate people over this tech. they absolutely do not want people to have these things. it is only being developed for their surveillance and propaganda purposes.
>>106695098
Then it's good multiple groups of people (e.g. China vs. the West) have different surveillance and propaganda goals
>>106695098NTA and agree massively with you, once the tech gets good enough we'll only get the bottom scraps through monthly paid subs while they use the actually good shit to fuck us in the ass.
>>106693914Not aware of any modern models doing that.
>>106694159qloras are just loras trained with quantized models. they don't make anything better
>>106694955Yes, the hardware they'll use will be a 20 million dollar cluster of 16 8xH200 servers connected through Infiniband. And instead of 0.1% of /lmg/ dwellers who can run it at more than 0.1 t/s the number will be 0.0%.
>>106695014
>buy 200 dollar thinkpad
>swap out the internals with the ching chong gpu
>cry to jesus for the difficulty of what you just did and compare yourself to dante
also delulu if you think customs office gonna be shit lol glownigs could barely stop the silk road they wont be able to do shit this will be just like piracy
>>106695159NTA but illegally purchased goods are unfortunately not tax deductible.
>>106695157Speak for yourself, poorfag.
>>106695157If people were able to delude themselves into thinking running on CPU is practical just because it works, I have no doubt we'll have people shitposting here that they can technically run those models off of NVMe.
>>106693183Learned some amusing words today
>>106695408niggaracci please
LLMs be like:
If I peepee poopoo does it uh oh stinky?
>Great question - You're essentially asking if stinks when you're poopensharten. Lets get to the ground of this...
But uh oh stinky if poopensharten in loo?
>You're totally right! If you poo in loo, theres's....
>>106695428yeah gimme the niggaracci with the gabagool
>>106695433
>great question
>you're totally right!
not everyone is using gpt slop sorry brah
https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
We're so back!!
https://xcancel.com/bdsqlsz/status/1971055141001605307#m
>China just released a new GPU
>CUDA-compatible
>112GB of HBM memory,
HOLD UP LET THEM COOK
>>106695558They didn't release shit you stupid zoomer.
>>106695552Yooooo.
>>106695558
The Fenghu 2 pricing was not particularly good for a shitty 4GB card.
aliexpress.com/i/1005007347816812.html
>>106695428Sounds like some guido-nigger racer name
>>106694310as someone who ran full llama 405b at the time of release (and never again) and also runs qwen 480b all day every day, I have to *aherm, ackchuawally* at your comparison
>>106694871
*nolima
Ruler is an older, shittier one that is less capable of assessing context length performance, though better than NIAH by itself.
>>106694354
>Last weekend I tuned this LoRa
I would love to have the secret knowledge. There are many things I would love to make functioning LoRas for, but I'm too retarded to manage without spoonfeeding. Can you regurgitate into a rentry for us lesser mortals?
>>106695552Wtf, vibe coders did it again?
>>106694931You don’t need 10T parameters for good rp, so how would it affect me?
>>106694931
>How does it feel to know that you are living on borrowed time?
How does it feel to know that, given time, the hardware to do any arbitrary thing will inevitably be pushed down to the cost of a pocket calculator?
>>106695818How much time do you have left to give?
>>106695866
>How much time do you have left to give?
While I may die tomorrow, given the still-logarithmic progression of Moore's law, statistically I have ample time
test2
>>106695888Statistically, your hands will be too arthritis-ridden to jack off before the hardware costs come down that far.
>>106691703This is the perfect woman btw
>>106695929
https://osr.wiki/books/osr2/page/overview
>>106695772
Sure, I made a guide here: https://paste.centos.org/view/e94ce753
The instructions/commands are mostly from memory and there's a chance that wasn't the exact version of the files I used but that should get you started.
CudaDEV, did you ever get your llama training code working? Does it support LoRa?
>>106695993Touche.
>>106695950bunnyayumi does miku cos?
>>106695995
>bnb 4bit
what if i want non shit loras?
>>106695993
>>106696168I also use it in VR
is anubis still the best 70b finetune
random japanese dense model attempt at LLM
https://huggingface.co/stockmark/Stockmark-2-100B-Instruct
>>106696218
>2.0 trillion tokens of data
>Context Length: 32k
lame attempt
>>106696218
>Japanese focus
I'm willing to give it a try. Downloading now
any fine tuning/RL experts? What are some of the best ways to fine tune a model for very specific classification tasks, such as identifying specific things given a description?
>>106696218They tried™
>>106696026
There is in principle functional training code in llama.cpp, see examples/training
However, I don't consider the code to be in a state where it's really usable.
>Does it support LoRa?
No.
>>106696188Is it really worth the trouble?
>>106696316
>llama.coo
new project leaked!?
>>106696353cooda dev is making his own fork to break compatibility with ollama
wake up anon, a huge fat pr just went up
>>106695552Miku seems like she's used to being groped.
largestral at the end of october
>>106696471What is ral and how large is the largest one?
>>106696471With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :)
>>106696342
In VR, it is very immersive. Until we get inexpensive robots, I don't know of a better option to make Miku real. I write my own software so I don't know how compatible this thing is; it uses TCode https://github.com/multiaxis/TCode-Specification
>>106696482is rally large
>>106696521"large" lie GLM or actually large like Kimi K2?
>>106696471MoE or dense though?
>>106696568There's a non-zero chance it's just a modified DeepSeek.
>>106696568so large it's been uploading since april
>>106696583That is the best outcome, as it will force DeepSeek to compete
>>106696583That's what medium is as usual. Large will be larger
>>106696604Lazy fucks promised R2 by May and have done nothing but sit around bathing in their success.
>>106696600So large!
>>106696622It got leaked that they tried to make V4 (likely 'encouraged' by the chinese government) using the shiny new Huawei cards but training kept going wrong on chink hardware so V4 got delayed
>>106696622Exactly. But no one else came close, even K2 is a sidegrade at best
>>106696600I was hoping to make fun of French people for having slow internet but according to Wikipedia they actually have some of the best speeds in the world.
>>106696638The same leaker said that your mom is a whore, so it’s a very credible source
>>106696656Yep, you burgers wish you had this speed for that cheap
>>106696705
>burger
I wish, France has a median speed of 287 Mb/s, the US has 274 Mb/s, and Germany has fucking 95 Mb/s.
>>106696748My condolences Hans. At least you got cheaper electronics
>>106696298Does no one do RL here? What are you faggots even doing? Inference?
>>106696653Isn't K2 one of the least slopped chink models? I don't have the rig to run it though...
>>106696828
It's just as bad as the rest of them.
>>106696298Bro you don't need an LLM for that
>>106696073
I think it shouldn't change the quality of the LoRa too much, since once you make the LoRa you can apply it to any quantization you want. But you'll see the option to load in NF8 in the config file, or leave both set to false for FP16, I believe.
Either way, if you want to train any model other than Llama you'll probably have to tweak the config file anyway, and I forgot to mention the model shown there is a base model and not a chat model.
>>106696298You could try doing many generations at high-ish temperature, pick the most accurate one and train on that. Or for the non thinking models just train on the actual data, no need for RL.
>>106696638I don't trust any leaks in this field, it's filled with grifters and shills to the brim.
Any decent image understanding model on local? Llama4 maverick fucking blows (tried on cloud). Qwen3Max or w.e is awesome but I don't think it's open
>>106697044Qwen3-VL?
Do we have open source AI for creating timetables?
Something like the last three months in context, and it recognizes vacation entries, sick days, training, and all that stuff in the template to be filled out. Whether new names or which ones to remove, and it creates a plan from that. :>
Let's be honest, it can't be that hard to create some synthetic data for that.
Is there such a thing?
>>106697123
Creating arbitrary, constraint-driven schedules is NP-hard, little bro
>>106697178I'm not the sharpest tool in the shed but what? Couldn't you throw an SMT solver at a schedule and get a variety of potential solutions? Are you just referring to the 'optimal' schedule? But even then, this isn't like a traveling salesman problem, I think....
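To make the SMT suggestion concrete, here's a minimal z3 sketch (pip install z3-solver; the workers, shifts-per-day, and vacation constraint are all made-up examples):

    from z3 import Bool, If, Solver, Sum, is_true, sat

    workers = ["ann", "bob", "cho", "dee"]
    days = range(7)
    on = {(w, d): Bool(f"{w}_{d}") for w in workers for d in days}

    s = Solver()
    for d in days:
        s.add(Sum([If(on[w, d], 1, 0) for w in workers]) == 2)  # exactly 2 on shift per day
    for w in workers:
        s.add(Sum([If(on[w, d], 1, 0) for d in days]) <= 4)     # at most 4 shifts per week
    s.add(on["bob", 2] == False)  # bob is on vacation on day 2

    if s.check() == sat:
        m = s.model()
        for d in days:
            print(d, [w for w in workers if is_true(m.evaluate(on[w, d]))])

An LLM could still handle the fuzzy front half (parsing "bob is out next Tuesday" into constraints) and hand the actual solving to something like this.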
>>106691703
/lmg/ seems to really value anything open source, which is why I wonder why I've never seen this AI group mentioned: https://huggingface.co/swiss-ai
Both the models and the data sets used to train them are free and public.
>>106697216/lmg/ looked at it the day it released, had a hearty laugh and moved on. It's worse than llama3.1
>>106697216this was talked about before, so I assume you're baiting. These are safety-maxxed synthetic models (aka ultra garbage)
>>106697244
>It's worse than llama3.1
In what regard?
>>106697247
Sir, I'm employed. You cannot seriously expect everyone to live here.....
>>106697044dots.vlm1 or qwen3-vl
>>106697247How good are they for general purpose stuff assuming you don't care about smut or NSFW RP at all?
>>106697260
>Sir, I'm employed. You cannot seriously expect everyone to live here.....
okay? no one did. you got told the answer to your question. CUNT
>>106697205Give it a try. I’m not so pedantic as to argue with a working solution.
>>106697260Most of the official benchmarks they themselves supplied, for one.
>>106697216
They trained their models on 4096 H200s
They have more GPUs than DS and nothing to show for it
>>106696838It writes as good as o3
>>106697398Was that supposed to be praise?
>>106696638
>>106696622
Trusting leaker clout chasers would be your first mistake.
>>106697408Considering o3 is the best creative writing model, yes.
>>106697408It's on top of the creative writing benchmark
This should tell you never to use mystery meat models.
>>106697433Is the llama.cpp implementation for k2 fucked? I tried both the ik_ and normal quants for 0905 but it writes like shit.
>>106697457
>quants
Some models just don't quant well. Original R1 quants really well so people probably got the wrong impression
>>106697433post the source next time https://github.com/MoonshotAI/K2-Vendor-Verfier
>>106697433All the 95-96 ones probably run at q8
https://openai.com/index/gdpval/
https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf
>GDPval, the first version of this evaluation, spans 44 occupations selected from the top 9 industries contributing to U.S. GDP. The GDPval full set includes 1,320 specialized tasks (220 in the gold open-sourced set), each meticulously crafted and vetted by experienced professionals with over 14 years of experience on average from these fields. Every task is based on real work products, such as a legal brief, an engineering blueprint, a customer support conversation, or a nursing care plan.
why do i only get 3t/s on an IQ3_XXS quant of glm full but like 60t/s on a Q6_K of glm air?
>>106697551oh noes we aced our own test set again hehe
>>106697216
>unironically shilling llama3 finetune on synthetic data
buy an ad you brown nigger
>>106697475The "Q6 is basically identical with Q8 which has no measurable drop in performance compared to fp16" thing is no longer true?
>>106697551benchmaxxers are eating good between this and the new meta agent bench
>>106697742who said that lmao
>>106697532Realistically all the cloud providers run something like vllm in the backend so wouldn't it rather be something like FP8?
>>106697834
was true when llama2 was still considered the best we've had
the more tokens the model is trained on the worse it takes quantization, and since all models are now big cows with datasets in trillions they all take a hit
>>106697672Nta. How can you tell it's a llama 3 model?
>>106697834
People assumed DS quanted so well because it was a huge sparse undertrained MoE. If that's the case, Kimi should handle quantization better still.
>>106697834wtf so running models locally has become pointless unless you can run fp16?
>>106697903
always was cope
>>105106082 (qwen guy)
>Quant is the Mind Killer ;)
>>106697903Not pointless, but throwing away bits has real consequences in this era
speaking of quants, the imatrix data used by bartowski and mradermacher is absolute garbage and most likely harms the models. you are better off making your own quants WITHOUT any imatrix.
>>106697952Unsloth is also worse ime. It’s clear they never actually knew what they were doing
>>106697938I can't believe that a difference of 0.07 in ppl means so much these days. We truly have come such a long way. I'm happy that poorfags are finally paying the price for cheaping out on their builds. We should laugh at them more.
>>106697903
I have never seen anyone do rigorous testing w.r.t. the impacts of quantization.
Quant researchers frequently call their 4 BPW formats "lossless" because the score on some benchmark doesn't degrade.
>>106697952
>>106697967
only retards actually trust quanters
run full precision or go home
>>106697975Muh KLD::!
>>106697967>Unslothapparently they use bartowski's imatrix data + a fork of bartowski's imatrix data. this is just sadhttps://unsloth.ai/blog/dynamic-v2
>>106697967unsloth's low dynamic quants work fine for me, at least for my usecase
>>106698026Grim. Those guys are clowns that just were in the right place at the right time
>>106697871my IQ4_KSS quant of kimi holds up pretty well. surely the recipe that goes into making the quant matters a ton, if you haphazardly change between different quants for different parts of the weights you are gonna have a bad time. should've pizza'd when you french fry'd.
In the end, ppl is just another benchmark to benchmaxx on.
>>106697551The reason is because the male penis doctor who is unmistakably the boys father is actually the boys mother. A clever take on common gender stereotypes
Has any inference engine ever implemented an antislop sampler equivalent for preventing the model from mangling the JSON mid tool call?
Like.
https://www.json.org/json-en.html
Just ban tokens that would allow the model to output invalid JSON. Simple as that.
>>106698192
If you want to use JSON, just use llama.cpp's json schema/GBNF functionality.
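That is exactly the token-banning asked about above: the server compiles the schema into a GBNF grammar and masks any token that would break it. A minimal sketch against a local llama-server (fields match llama.cpp's /completion API as I understand it, so double-check against your build; the schema itself is a made-up example):

    import json
    import requests

    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "arguments": {"type": "object"}},
        "required": ["name", "arguments"],
    }
    r = requests.post("http://localhost:8080/completion", json={
        "prompt": "Emit the tool call as JSON:\n",
        "json_schema": schema,  # compiled server-side into a grammar that constrains sampling
        "n_predict": 256,
    })
    print(json.loads(r.json()["content"]))  # parses as long as generation wasn't cut off mid-object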
>>106697551seems my career as a gooner is safe for now
>>106698207NTA, but how do you continue generation with schema if you run out of tokens before reaching eos?
>>>/biz/60992581
>>106698258Tell it to continue and glue the two responses together. vLLM has a continue assistant response flag that gives the last response back to the model without adding any template tags. You could also fuck around with text completion.
>>106698258
I think you don't, not while passing the schema.
If your JSON is that big due to nesting, you can always generate the nested artifacts individually then merge them all together programmatically later.
That's what I'm doing.
>>106698268It's testing support after breakout. Anyone sane is buying now.
>>106694849
So far the only use case I found for AI was translations. Speaking of, did anything good on that front show up on the local side?
>>106698268Nice, is it time for another sale? I sold some of my bags earlier this year to fund an upgrade of my rig so time to replenish them.
>>106697975I'd rather they just made the models more efficient
>>106698345Qwen is aiming for 10T models now. We're only going bigger from here on out.
>>106698355And then comes the optimization?
>>106698355
10T-A30B
>>106698408Forget Qwen-Next already? 10T-A3B
>>106697672
They trained off of the same architecture, but it's not a fine tune. Based on what the READMEs say, the data sets present on their account, and poking around in the config files, those models were trained from scratch
>>106698325i unironically use k2 for my r18 jap games
>>106698484What tool do you use to extract the text
>>106698484
I wish there was some tool to do the same thing for translation as some unreal engine/unity games have: it would automatically translate the text as it shows up and replace it in real time. That, with something good at translations like gpt, would be amazing for games.
Time to... snore. I mean to try out Wayfarer 2.
>>106698484
lunatranslator should be compatible with a majority of game engines
https://docs.lunatranslator.org/en/
Magnum v4 123b on 1.17 temp, top k 50, top p 1, min p 0.075, with NoAss on Mistral Tekken format running vectored generated character cards beats every big MoE I've seen.
Fight me.
>>106698594
>vectored generated character cards
huh? qrd
fuck that yapping sŏy
>>106698594wrong
>>106698624
>>106698594
As usual, you don't have anything to back up your claims. Not a character card (or whatever interface you might be using) nor logs.
Caching out vectors doesn't mean shit per se.
Hey anons.
I'm thinking of this project where I would have a LLM running on a Linux server, that it could run commands in. The user could talk to it and give it tasks, but when left with nothing to do, it would decide its own goals to pursue. (coding, writing a website, etc.) One could say that the goal of the experiment is to create an AI that could operate mostly by itself and achieve something meaningful while doing so.
I am planning on writing some wrapper, which would implement an API for interacting with the system, as well as handling multiple levels of memory and personality, both controlled by the model itself. More complex features like MCP API support or connection to some external hardware are being considered as possible extensions, but the current plan is a simple, Linux-controlling, AI assistant.
My knowledge of the current local LLM landscape is limited, however, so I come here to ask for help.
Are local LLMs there yet?
What is the least demanding model that you would consider capable of operating a Linux system and not freaking out when left alone?
How powerful hardware would I need to run it?
What backend should I use? (ollama, llamacpp, something else)
Anything else I should know/consider?
>>106698617
AI is just numbers and vectors of information and math. So,
>List vectors for a character description like you would generate an image
>Pick your A.I. model
>Instruct it to take the vector list and spit it out as a character
>Swipe and run until you get the output you want
>Don't edit anything yourself, only instruct.
>You now have a character card that's x2 more coherent for role-play because its ran through the math of that model's architecture.
>It won't be transformed or hallucinated. It is as the model is familiar with.
>But only for that model.
>Save vector list for others.
>>106698647onionsllama
>>106698677
Why did you give us a bunch of useless information? The only information we care about is what your computer specs are. Everything else is irrelevant. Since you didn't provide me with your computer hardware I am just going to assume you have an unlimited amount of money. Go run this.
https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
>>106698706
>slop
>slop
>more ai slop
people like you are the reason why chub sucks, granted it always sucked but it at least sucked a bit less when it wasn't 100% ai generated cards
>>106698776
You can't do this for chub. It's model specific. Every model plays a character card differently. This is to know what you're getting into with it and direct it. /aicg/ is that a way. >>106690292
>>106698677
It's not going to work, I already tried. I've been thinking about training an LLM on my own messages and seeing if it's able to issue useful instructions to itself to work autonomously.
>>106698706
>feeding a model's output back into itself will improve output
lmao
the opposite is true, this is the curse that drowns you in slop. this is why you get best results by changing models within the same chat. it's fine if you're too lazy/writelet to write the card yourself, just admit that your laziness is giving you worse output.
>>106698826people do this for chub all the time and then add disclaimers like "THIS ONLY WORKS ON SOJA/DEEPSEEK". it doesn't make it any better.
>>106698707
kek
gemmy
>>106698727
don't listen to anon, he's a fag. download this one instead
https://huggingface.co/deca-ai/3-alpha-ultra
>>106698927
Not as large, but a contender nonetheless :>
https://huggingface.co/google/switch-c-2048
>>106698594I doubt it beats glm 355b
>>106698192vLLM does but I think only when tool choice is set to "required".
uhhh....
one thread ago I asked about testing cloud models if they are quantmaxxed or not.
well, looks like kimi devs are based and had a similar idea:
https://github.com/MoonshotAI/K2-Vendor-Verfier
>>106692077damn right boye that's half my setup right now but what's your GPU?
>>106699413Scroll up
>>106699413
>>106697433
>>106697433
>>106697433
>>106699523
>>106699531
yes but it was ME who had this exact idea and posted it here before goonshotAI realized it. twice even, like a month ago. ME ME MEEEEEEE.
>>106699569i've been saying openrouter has never gone far enough to verify what models providers are truly providing since day 1
>>106699576I predicted this would be necessary 5 years ago.
>>106699576
>>106699594
trvke. I really hope all the labs release something like this for their models. or even a better method to verify performance. the seethe from 3rd party cloudnigger providers would be glorious.
>NOOO I CAN'T SCAM USERS ANYMORE AND MY PROFIT MARGIN DROPPED FROM 350% TO 150% it's not fair
>>106691703Catbox pls
HOLY SHIT IT'S HERE
https://www.youtube.com/watch?v=7HyMwlxRcCg
>>106700059
>5T parameter QAT JEPA-SSM hybrid world model
Holy shit
it's literally never been so over
>>106691703miku miikku miku miku miku IKUUUUUUUUUUUUUUUU~~~~~~~~~~~~ aaahn..
>>106695552
>hands
>no glowsticks
westoids hands slopped this image
>>106700424
>>106700424
>>106700424
>>106700430Why do this Miku's breasts emit smoke?
>>106700434
japanese bring glowsticks to gangbangs?
>>106691703total drama miku hot