GLM-4.5-Air CPU-MoE bench: -ngl 99 with --n-cpu-moe 33 vs plain -ngl 19. Token generation goes from 9.89 to 12.19 t/s, roughly a 2.3 t/s (~23%) improvement, which is pretty good; prompt processing is basically unchanged.
./llama-bench -m '/mnt/miku/Text/GLM-4.5-Air-Q3_K_M/GLM-4.5-Air-Q3_K_M-00001-of-00002.gguf' -ngl 99 --n-cpu-moe 33 -t 48 -fa 1 --mmap 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | threads | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 48 | 1 | 0 | pp512 | 207.63 ± 3.52 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 48 | 1 | 0 | tg128 | 12.19 ± 0.21 |
build: da30ab5f8 (6531)
(づ◡﹏◡)づ [llama.cpp]$ ./build/bin/llama-bench -m '/mnt/miku/Text/GLM-4.5-Air-Q3_K_M/GLM-4.5-Air-Q3_K_M-00001-of-00002.gguf' -ngl 19 -t 48 -fa 1 --mmap 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | threads | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 19 | 48 | 1 | 0 | pp512 | 206.51 ± 8.28 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 19 | 48 | 1 | 0 | tg128 | 9.89 ± 0.19 |
build: da30ab5f8 (6531)
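If the --n-cpu-moe split is the keeper, here is a minimal sketch of carrying it over to llama-server for actual use. This assumes the llama-server in this build accepts the same offload flags as llama-bench; the context size and port below are arbitrary example values, not taken from the benchmark above.

# hedged sketch: reuse the faster split (-ngl 99 --n-cpu-moe 33) for serving
# context size and port are arbitrary example values
./build/bin/llama-server \
  -m '/mnt/miku/Text/GLM-4.5-Air-Q3_K_M/GLM-4.5-Air-Q3_K_M-00001-of-00002.gguf' \
  -ngl 99 \
  --n-cpu-moe 33 \
  -t 48 \
  --no-mmap \
  -c 8192 \
  --port 8080

The difference between the two runs is where the weights live: --n-cpu-moe 33 keeps the MoE expert tensors of the first 33 layers on the CPU while -ngl 99 leaves everything else (attention and dense weights) on the GPU, whereas plain -ngl 19 offloads only 19 full layers, which is presumably where the tg128 gain comes from.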