/g/ - Technology

File: file.png (1.46 MB, 1024x1512)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106635936 & >>106627153

►News
>(09/17) SongBloom DPO released: https://hf.co/CypressYang/SongBloom/commit/4b8b9deb199fddc48964c851e8458b9269081c24
>(09/17) Magistral Small 1.2 with vision encoder released: https://hf.co/mistralai/Magistral-Small-2509
>(09/16) Ling-flash-2.0 released, with 100B-A6.1B: https://hf.co/inclusionAI/Ling-flash-2.0
>(09/16) Tongyi DeepResearch 30B-A3B released: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research
>(09/16) VoxCPM 0.5B: Tokenizer-Free TTS released: https://hf.co/openbmb/VoxCPM-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106635936

--Paper: LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures:
>106643176 >106645182 >106647568 >106648268 >106648303
--Papers (old):
>106647587
--Limitations of local LLMs for document processing and memory:
>106639461 >106639485 >106639532 >106639620 >106639676 >106639802 >106639873 >106639928 >106639985 >106640027 >106641184 >106641620
--Debugging QLoRa training scripts and exploring browser-use automation for productivity:
>106638725 >106638760 >106638789 >106638895 >106639341 >106639375 >106639399 >106639686
--Troubleshooting Joycaption implementation issues in llamacpp:
>106643986 >106644049 >106645230
--Qwen models struggling with paragraph structure in roleplay responses:
>106638988 >106639017 >106639068 >106639149 >106639218 >106639255 >106639282 >106639335 >106639354 >106639359 >106639380
--Critique of model overemphasis on trivial details and tuning challenges:
>106636046 >106636140 >106636185 >106636197 >106636215 >106636242 >106636708 >106636770 >106636774 >106636233 >106636295 >106636341 >106636377 >106636198 >106636268
--Ollama's cloud models spark debate over technical quality, privacy, and cost efficiency:
>106642356 >106642407 >106642416 >106642424 >106643152 >106643158 >106642571
--Apple iPhone 16 Pro running local LLMs via LocallyAI with future HBM memory potential:
>106636508 >106636521 >106636790 >106636806 >106636546
--Optimizing character cards for local LLMs with token limits and persona consistency:
>106636023 >106636153 >106636406 >106636423 >106636474
--QLoRa training success with Llama 3.1 70B, context window and length control challenges:
>106643241 >106643452
--Qwen3-80B performance issues with long prompts and context window constraints:
>106637467
--Miku (free space):
>106645879 >106646044 >106648614

►Recent Highlight Posts from the Previous Thread: >>106635941

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
it's over
>>
I've been using the same Gemma3 27b model for a while now, but I kinda want to try some Llama 24b models for roleplay.
What are some of the 24b models you like the most?
>>
File: 1755219328665759.png (1.58 MB, 1328x1328)
>>106649154
It hasn't even started
>>
File: 1757284991163290.png (1.24 MB, 7279x2969)
>>
>>106649184
>>
>>106649223
A+
>>
Alger miter slather
>>
>>106649240
based, I'm still using nemo
>>
File: 20250921@011212.jpg (63 KB, 822x593)
> compute
Well fuck. This is the exact opposite of what I want.
>>
How long until we get something like NVIDIA DIGITS but with a reasonable token output rate?
>>
Why is the option to search my cards in sillytavern just gone wtf
>>
>>106649319
4chan ate my fancy unicode symbol again award
>>
>>106649319
>compute growing larger than the available data
It's 2025, just generate more data with your existing llms.
>>
I want to try online training an LLM
Meaning I will ask it a question, modify the answer to my liking and immediately train on that. Do you guys think it'll make any noticeable difference?
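A rough sketch of what one pass of that loop could look like with HF transformers, assuming a small causal LM that fits in VRAM and plain full-parameter training (in practice you'd use LoRA/QLoRA and batch several corrected examples; the model name and strings are placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-small-model"  # placeholder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Question: ...\nAnswer:"
corrected = " your hand-edited answer goes here"

# train on prompt + corrected answer, masking the prompt out of the loss
prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
full_ids = tok(prompt + corrected, return_tensors="pt").input_ids.cuda()
labels = full_ids.clone()
labels[:, :prompt_len] = -100  # -100 = ignored by the loss; the boundary is approximate at the join
loss = model(input_ids=full_ids, labels=labels).loss
loss.backward()
opt.step()
opt.zero_grad()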
>>
>>106649199
I really hate this pic because these are just ramblings of one guy, biased and incorrect.
t. llm oldfag and historian
>>
>>106649361
history belongs to those who write it down
>>
(Manually edit it or try multiple gens and train on the best one).
>>
>>106649361
What would yours look like?
>>
>>106649345
How many millions of questions are you going to ask it to get enough tokens for a dataset?
>>
>>106649345
I hope you'll still be here in a million years
>>
>>106649368
I will write a historical memorial about doing your mother.
>>
>>106649361
Correct it then, don't let that fag hog all the attention
>>
>>106649390
>>106649391
I'm not convinced you need millions of examples to make a meaningful difference, especially if you just want to stamp out some annoying quirk for example.
>>
>>106649381
Way different, but I don't have time or motivation to do it, so it is what it is.
>>
>>106649345
little to no difference, the only language this would work for is hindi because you could hire an army of jeets to do this for relatively cheap. but if you are just writing a handful of prompts yourself, then you're not going to have a good time. you could probably spend a year working 16 hour days and in the end, a proper cluster will eat all your hard work in a few minutes and you will struggle to measure any real difference, positive or negative.
>>
>>106649345
Sure. As long as you repeat it 50000 times.
>>
>>106649199
>deepseek-v2.5
is it still worth it if I can't fit the bigger one?
>>
>>106649199
This is nonsense, we haven't had a good local model since Gemma 3
>>
>>106649496
Qwen3 30b series are the best thing that has happened to local. If it could RP it would be perfect.
>>
>>106649487
No, the pre-V3 Deepseek models were smart but dry as fuck.
>>
File: 1733926030451250.png (15 KB, 400x400)
We could've had V4 by now.
>>
File: 1743398266197563.png (203 KB, 920x919)
>>
>>106649608
replace "Hello..." at the bottom with actual name of one of the shizo-merges from usual suspects
>>
>>106649521
>If it could be good
Sorry gweilo, we only have benchmaxxed models for you
>>
>>106649119
How do you make these?
>>
>>106649522
>implying newer models aren't slopped too
It literally comes down to preference. Some people are still running 70Bs and nemo.
>>
>>106649667
Benchmemes are worthless, you should only trust your own personal tests.
>>
Any use case yet for RamTorch?
https://github.com/lodestone-rock/RamTorch
>>
>>106649681
Turn down your temperature if this was your take away from that.
>>
>>106649688
Wouldn't loading parameters from RAM onto VRAM just waste time during actual inference?
>>
>>106649608
This is true, but that also means it's misusing the meme.
>>
>>106649688
No. Definitely not for LLMs.
>here's this memory bandwidth bottlenecked task
>we can optimize it
>by putting the weights in slower memory
lmao baka
>>
>>106649173
gemma3 27b was my goto for a good while, glm air @ q3 replaced it for me and it's been a ton of fun, though you need 64GB of RAM or so.
>>
>>106649688
This dude created a flux finetune
I don't think LLMs are his focus
>>
>>106649319
So what is the conclusion? Do dense models or MoE scale better with more compute?
Depth has always scaled better than breadth, so my guess is at some point the marginal benefits of adding more experts kick in, so it may still help local in a roundabout way.
>>
>>106649748
I'm using G3 Glitter (50/50 mix of instruct and base) and while I had some relatively fun scenarios in the past, it has begun to deny things more often. Don't have any old logs saved though. Just wondering what I changed, maybe it's a temperature issue idk.
>>
>>106649671
https://github.com/RecapAnon/LmgRecap
>>
>>106650066
wait, was the newscaster bit for a planned video or something?
>>
>>106650097
https://desuarchive.org/g/thread/105939052/#q105939055
https://desuarchive.org/g/thread/105671827/#105671833
https://desuarchive.org/g/thread/105661786/#105661791
https://desuarchive.org/g/thread/103903120/#103903123
Kind of. At some point I would like to have it always automatically generate a video of Miku reading the highlights. Though the 4MB size limit and lack of native audio on this board makes it difficult.
>>
>>106650148
Cute!
>>
as with most things technology has superseded biology there is a very rare point in cooming when you become so horny your biology fails you and you dont feel anything because you get overwhelmed it can happen once in a great while but with llms its literally every single goddamn time i want to coom i want to stroke my cock until it falls off and my skin chafes like a cartel snuff video but nay the spirit is willing but the body is not and i burn out every time why must i be cursed with this feeble meat puppet why oh god ? why must ye hate me so ?
>>
>>106650177
just stick a vibrating dildo in your ass
>>
>>106650177
This has never happened to me
>>
>>106650177
You have serious mental issues. Chromic masturbation is a sign of lunacy, among other things.
>>
File: 1756719072270601.jpg (1.03 MB, 1552x1944)
>>
>>106650148
No thanks we have enough waifufaggotry in here.
>>
File: 1745198947072720.jpg (937 KB, 1552x1944)
>>
File: 1751774810184415.jpg (984 KB, 1552x1944)
>>
https://www.youtube.com/watch?v=c6htXwW38Ac
Guys, do you think this is AI? In case you don't know the guy, I am assuming he has been dead for a while and his wife is now running his stream to milk the money.
>>
>>106650279
>Chromic
Wow look at mr. fancypants here, he's got a chrome dick
Also holy fuck it's over llms are dead and so is this general there has not been any meaningful progress in years the repeated flops from the big ai companies have killed any enthusiasm and none of the research that is being done is actually applied to usable models this is truly the ai winter and I blame benchmarks, fuck you if you ever thought benchmarks are useful also fuck you drummer and your gay lover undi for making shitty merges and killing the finetuning scene
>>
>>106650574
It was a hypo, my bad.
>>
>>106650440
It sounds quite natural but some bits do sound a bit robotic at times so IDK. Did he always use to do livestreams reading off a script?
>>
>>106650574
Nah it's far from over. It's actually a good thing that we hit the wall so we can focus on giving better tools to LLMs instead of trying to scale them up indefinitely.
https://github.com/Open-LLM-VTuber/Open-LLM-VTuber
>>
>>106650595
>Did he always use to do livestreams reading off a script?
Never. I think he has been dead for a while; in the past his wife was using voice-to-voice conversion on lines she read out loud, and now she is just using VibeVoice.
>>
>>106650616
Is there some official explanation of why he deleted all his videos?
>>
>>106650634
He was unsafe like Nemo.
>>
>>106650616
I saw some vtuber make a demonstration of voice changer technology and it sucked ass beyond simple tricks like pitch shift.
>>
>>106650440
>>106650616
It's certainly not AI. Take your meds
>>106650595
He's a good orator and not stuttering / misspeaking is his whole gimmick when he debates fidgety zoomers
>>
>>106650652
https://www.youtube.com/watch?v=rrZSuCzRHQU&list=PLg9RuMEFhFqJ3bg0auxsgLfIg_GRS8-HM&index=2
He is popular enough to have AI songs 2 years ago.
>>
>>106650665
Can you give an example of such a debate?
>>
>>106650691
He meant the Fuentes debate, and that was just a one-off from when he was still alive.
>>
>>106650705
Not talking about that at all lmao, you are an overconfident schizo
>>
https://github.com/huggingface/transformers/pull/41025
>Adding support for Qwen3Omni
>This PR introduces support for the upcoming Qwen3-Omni models, including Instruct and Thinking versions.
>As the next generation of the Qwen-Omni family, Qwen3-Omni brings new architecture, multilingual and reasoning ability to omni model, achieving superior performance across complex multimodal tasks.
from a quick skim I believe it's (text, image, video, audio) in and (text, audio) out, like their previous omni models
>>
>>106650797
I can't wait to see it not implemented in llama.cpp
Also, it will be shit and no one will use it.
>>
>>106650669
Songs are a different story, because you make them offline and can get as many gacha rolls on audiogen as you have patience for. Voice changers have to work live and with minimal latency.
>>
>>106651052
>Voice-changers have to work live and with minimal latency.
Consider the following: it is a video played live.
>>
sirs, kindly redeem
https://github.com/ag-ui-protocol/ag-ui
samples
https://dojo.ag-ui.com/pydantic-ai/feature/shared_state
>>
>>106650652
>I saw some vtuber make a demonstration of voice changer technology
found it:
https://nitter.net/kibawooo/status/1934363088548946420
>>106651066
That would be technically possible I guess. VibeVoice's main killer feature was exactly being able to make this kind of long-form, podcast-style audio, and from the samples anons posted there, its voice cloning ability was pretty good.
There are other, non-technical, arguments against it however.
>>
File: 1742892672662765.mp4 (3.44 MB, 1286x864)
pure art
>>
>>106651231
Then you'll find out that the client uses some poor clickfarm worker in India.
>>
>>106651252
retard
>>
>>106651231
>two tools
noob
>>
>>106651231
tool calling is almost as big of a meme as mcp and rag
>>
LLMs are a meme
>>
>>106651417
t. roleplaying faggot
>>
>>106651463
Post one thing you've "contributed" to society using AI. Just one.
No? Then fuck off.
>>
>>106651617
I have created a bunch of Python scripts, including a controller emulator. I mainly use this for flight simulators - the mouse is perfect for simulating a relative control wheel instead of using a lousy controller (which always returns to 0,0).
I will also publish my adventure game but this is in progress.
>>
>>106651617
so you define people who haven't contributed enough to your standards worthless hmm?
you are the definition of arrogance, and one day you will get fucking humbled.
it will be the worst day of your life, so prepare for someone to be better than you, and stomp all over you, and ruin your pathetic existence.
now run away and do your important work that you're wasting your life on.
>>
>>106651666
*control wheel = yoke
>>
>>106651666
>flight simulators
TTS for radio chatter?
>>
>>106651672
>so you define people who haven't contributed enough to your standards worthless hmm?
No. The anon I was replying to does.
>>
>>106651463
LLMs are barely enough for RP. There's no way anyone should trust them for anything 'important'.
>>
>>106651681
MS Flight Simulator '24 uses its own AI-generated voice via Azure; I think this is kind of cool.
I already have piper-tts implemented for other stuff, but I don't use multiplayer like VATSIM. It could be useful in some cases I guess. And using a model to read the map... maybe that's a bit too much though.
>>
>>106651768
*local LLM
>>
>>106650177
Ever since he understood the weakness of his flesh, he was disgusted by it.
>>
>>106651795
*any LLM because they're all the same dogshit
>>
File: AIwaifu.jpg (34 KB, 609x638)
I've coomed to GLM Air far too long and it fucking sucks, honestly goliath was better. Surely there must be something better.

What's the best LLM around 100b for coom?
>>
>>106651955
not factoring your skill issue there
>>
>>106651960
>>106647810
>>
>>106651991
No way, I refuse to believe that it is over.
>>
>>106651960
behemoth x was p good from my limited use if you have the vram for it
>>
>>106651960
I am waiting for ggufs and any reports on Qwen3-Next in regards to RP/creative writing.
It would be perfect for my machine, glm-chan is a bit too tight, but hopes are slim because Qwen are not known for their ah ah mistress capability.
>>
>>106651968
Yeah bro your indian prompt-fu makes the model good, whatever you say
>>
How do you even train a model to do erotic roleplay? Do you just feed it fifty shades of grey in 17 different languages or something?
>>
File: double_hegel.gif (775 KB, 1911x639)
>>106652152
>>106651968
>>106651955
>>106651795
Using an organized workflow makes local a good deal smarter, but is slow. Are you willing to wait longer for quality?
>>
>>106652239
Drummer knows everything about this.
>>
Moondream
>>
What 8B models could be used for simple "gaming"? I tried Gemma-3n-E4B-it (that's 4B though) but it is so soulless and repetitive even by my standards.
I guess I need to settle for Wayfarer, but that's 12B
>>
>>106651960
gp-toss-120b
>>
>>106648268
The answer is probably yes but I don't have any specific names or links. People have been idea-manning it for ages and I personally was part of a "cognitive architecture" community that began shortly after the GPT-4 launch, but I haven't paid attention to this subject in ages. What probably ended up happening is that they don't work much better, if at all, than simpler task-specific agent frameworks, and are also not worth the cost. As the complexity of the framework grows, so does room for error, and error propagates while error correction methods are not perfect, plus LLMs can get stuck in loops easily. There is also the issue that with this type of architecture, you really want/need LLMs trained to do each task for the best performance, but that requires a lot more work and money.
>>
File: dmn.gif (1.68 MB, 2522x1248)
>>106652491
You mean something like this? You can run it on local.
https://github.com/dibrale/Regions/tree/master/examples/default_mode_network
>>
I snorted my addy and started getting optimistic about AGI. is this what's going on in Silicon Valley?
>>
>>106652562
addy is school children, sv is all on snow white
>>
File: 1758145616077857m.jpg (84 KB, 587x1024)
>>106652567
when u fast and do high intensity training, addy turns into meth.
>>
>>106652562
If you are not doing heroin/meth you are doing it wrong. Everyone at Apple are doing meth for example.
>>
Emoji are underrated, and I'm not speaking of engagement farming ones used on social media. Modern instruct-tuned LLMs will react differently to requests depending on which ones you're using, and this works for actual roleplay as well.
>>
>>106652548
Yeah that might be interesting. It's one interpretation that's different from the architecture LeCun outlined. It could be fun to play around with, though, skimming through the link, I think a narrower agent designed for a specific task is probably still the better idea for anyone looking for working solutions to their tasks.
>>
>>106652716
Totally agree. That particular demo is geared to produce traces and responses that are more aesthetic than practical. Literally a daydreaming circuit.
>>
>>106651034
I'm starting to believe it's a conspiracy to obstruct local!
>>
>>106652708
Interesting. How hard is gemma3 to jailbreak?
>>
>>106650338
>>106650361
>>106650425
Miku ready for autumn
>>
>>106651231
You should have made it call some random Anon a tool instead.
>>
>>106652893
retard
>>
>>106651231
Amusing at first but this is basically just spam. Worthless use case except for ccp bot farms.
>>
>>106651231
retard
>>
>>106650177
same but with images
>>
Any LLM that fits in my 4gb ram potato server that would be good for generating tags for this RSS aggregator? or are all the small LLMs still too bad? https://github.com/Tiendil/feeds.fun
>>
>>106652872
It's not difficult, but you need to have a fairly detailed prompt and to not simply leave that at the start of the context, or it will slowly fall back to the default safe assistant personality. You should also not expect good smut from the model; it wasn't trained for that, although I'm fairly confident it was definitely post-trained on some amounts of ERP.
>>
>>106652971
You can try
>gemma-3-1b-it-Q8_0.gguf
That's 1GB.
or if you want to max out the insanity but this might be pushing it
>gemma-3n-E4B-it-IQ4_XS.gguf
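If it helps with wiring it into the aggregator: start llama-server on the gguf (something like llama-server -m gemma-3-1b-it-Q8_0.gguf -c 2048 --port 8080) and hit its OpenAI-compatible endpoint from a small script. Rough sketch only; the prompt, URL and example entry are placeholders:

import requests

def tag_entry(title, summary):
    msg = ("Give 3-5 short topical tags, comma separated, for this feed entry.\n"
           f"Title: {title}\nSummary: {summary}\nTags:")
    r = requests.post("http://localhost:8080/v1/chat/completions",
                      json={"messages": [{"role": "user", "content": msg}],
                            "temperature": 0.2, "max_tokens": 32})
    return [t.strip() for t in r.json()["choices"][0]["message"]["content"].split(",")]

print(tag_entry("Ling-flash-2.0 released", "100B-A6.1B MoE model from inclusionAI"))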
>>
>>106652980
Thanks anon. I'll give it a try.
>>
File: 1758386539683817.png (781 KB, 1110x833)
>>
>>106650652
Voice changers do suck much more than people think, it's funny. Even the AI ones sound like garbage.
>>
llms benchmaxxed on vimscript(trash)? Even Claude is shit at it
>>
There's some big releases coming up.
>>
>>106653223
Big for your setup.
>>
>>106651186
If you make VibeVoice output more than a few minutes, it still starts sounding quite bad, so you'd have to cut together multiple segments, which in turn means that you need to smooth over those cuts.
>>
>>106649336
That will get you a new model that is at best as good as the models you trained it on. Garbage in, garbage out.
>>
>>106653343
All the billions spent on companies generating high-quality synthetic data tell a different story.
>>
>>106653084
I liek this Miku too
>>
>>106653538
>Greek Philosophers
???
>>
File: 1747425823275585.png (110 KB, 964x535)
>>106653549
>>
>>106653560
we wuz philosophies and shiet
>>
How does tool calling during inference work? Can I give my LLM a dice to throw mid-reply to ensure that something has a random outcome?
>>
>>106651960
>GLM
Can you please share your ST text completion preset for it? (assuming you use ST)
I've been trying to wrangle it but I can't get it to generate coherent text
>>
>>106653651
yes. Tools have to be supported not only by the model (it needs actual training for it) but also by the chat template. Tools work by giving the model, as part of the request, a list of available tools/functions, their purpose (so it understands when to use them), and how to actually call them. It will 'pause' mid-inference and wait for the tool's result.
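Rough sketch of that request/response cycle against an OpenAI-compatible endpoint (llama.cpp's llama-server exposes one; for native tool calls you generally need to launch it with --jinja so the model's chat template is applied). The dice tool here is made up for illustration and also covers the d20 question above:

import json, random, requests

url = "http://localhost:8080/v1/chat/completions"  # placeholder local endpoint
tools = [{"type": "function",
          "function": {"name": "roll_dice",
                       "description": "Roll an N-sided die and return the result",
                       "parameters": {"type": "object",
                                      "properties": {"sides": {"type": "integer"}},
                                      "required": ["sides"]}}}]
messages = [{"role": "user", "content": "Attack the goblin and roll a d20 for the outcome."}]

resp = requests.post(url, json={"messages": messages, "tools": tools}).json()
msg = resp["choices"][0]["message"]
if msg.get("tool_calls"):  # the model 'paused' and asked for the tool
    call = msg["tool_calls"][0]
    args = json.loads(call["function"]["arguments"])
    result = random.randint(1, args["sides"])  # we run the tool, not the model
    messages += [msg, {"role": "tool", "tool_call_id": call["id"], "content": str(result)}]
    resp = requests.post(url, json={"messages": messages, "tools": tools}).json()
    print(resp["choices"][0]["message"]["content"])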
>>
File: 1731141654486212.jpg (519 KB, 1489x1727)
https://github.com/LostRuins/koboldcpp/releases/tag/v1.99
>>
File: GLM.png (82 KB, 483x739)
>>106653654
NTA, but I have it set up like this.
>>
>>106653654
Don't use ST. The only thing that worked for me is every sampler disabled except for min-p at 0.1 and temp around 1.
>>
File: 1751450110279748.png (21 KB, 450x500)
>>106653515
>>
mikutroon trash thread
>>
>>106649116
W-what are we going to do in the bathroom?
>>
>>106654125
wash hands really really clean
>>
>>106654125
She's gonna gobble your dong, anon.
>>
>>106654125
the needful
>>
>>106651960
Get 128gb ram and upgrade to a lower quant of glm full for the true experience. Otherwise I can think of mistral large tunes maybe.
>>
>>106653685
>training to use tools
How hard would it be to repurpose some unused tokens from the LLM vocab as tool-call markers, set them as stop tokens, and have a wrapper script detect those tokens, run the tool, splice the actual tool output back in, and let the model finish the initial generation?
I don't think training for tool usage is necessary
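A quick sketch of the wrapper loop that idea implies, assuming the system prompt tells the model to emit <|tool|>...<|/tool|> markers (plain sentinel strings here rather than actual reserved tokens, and llama.cpp's /completion endpoint standing in as the backend):

import requests

API = "http://localhost:8080/completion"  # llama.cpp server, assumed running

def generate(prompt, stop):
    r = requests.post(API, json={"prompt": prompt, "stop": stop, "n_predict": 512})
    return r.json()["content"]

def run_tool(call_text):
    # stand-in: parse call_text and actually execute something here
    return "42"

def answer(prompt, max_hops=4):
    text = prompt
    for _ in range(max_hops):
        chunk = generate(text, stop=["<|/tool|>"])  # generation halts right after a tool call
        text += chunk
        if "<|tool|>" not in chunk:  # no tool call, the reply is finished
            return text[len(prompt):]
        call = chunk.split("<|tool|>", 1)[1]  # everything after the marker is the call body
        text += "<|/tool|><|result|>" + run_tool(call) + "<|/result|>"
    return text[len(prompt):]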
>>
mistralai/Mistral-Large-Instruct-2512 (3 months)
>>
>>106654465
In-context learning is less efficient than actually training for the task, and more prone to failure.
>>
>>106654465
you could actually do it without training and even without changing the template.
At my old job we used to put the available tools in the system prompt, then we had an LLM 'router' decide which tool to use, if any at all, and redirect the request to a specific agent (we would route the whole request), then we fed the result back to the main LLM.
So yeah, of course everything is achievable, but having the AI trained on native tool usage makes it 'better' at using tools generally. Even purpose-built AIs with tool usage still FAIL at executing tools sometimes; you can see that with any of the autonomous agentic coders, which fail at tool calling due to wrong syntax, or back when the Claude-plays-Pokemon meme was on (or is it still on?), it still fucked up calling its navigation tools.
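A bare-bones sketch of that router pattern, with any OpenAI-compatible local server standing in for both the router and the main model; the tool names, prompts and endpoint are placeholders:

import datetime, requests

API = "http://localhost:8080/v1/chat/completions"

def llm(system, user):
    r = requests.post(API, json={"messages": [{"role": "system", "content": system},
                                              {"role": "user", "content": user}]})
    return r.json()["choices"][0]["message"]["content"]

TOOLS = {"time": lambda q: datetime.datetime.now().isoformat(),
         "none": lambda q: ""}

ROUTER = ("Pick exactly one tool for the user request and reply with its name only.\n"
          "Available tools: time (current date and time), none (no tool needed).")

def answer(user_msg):
    choice = llm(ROUTER, user_msg).strip().lower()
    result = TOOLS.get(choice, TOOLS["none"])(user_msg)  # junk router output falls back to no tool
    prompt = f"Tool '{choice}' returned: {result}\n\n{user_msg}" if result else user_msg
    return llm("You are a helpful assistant.", prompt)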
>>
>>106654513
>fail at tool calling due to wrong syntax used
Shouldn't that be solved with structured outputs?
>>
Is gemma3 abliterated still the go-to for 24gb vram, for fun/rp purposes?
Or did something else come out since then? It's been a while...
New Mixtral 8x7b when
>>
>>106650425
>>106650361
>>106650338
artist?
>>
Tested some Qwen models with Adobe's NoLiMa up to 32k, all of them set to temp=0.7, min_p=0.00, top_p = 0.8, top_k=20 (from Qwen's "best practices")

Qwen3-235B-A22B-Instruct-2507 Q8_K_XL
Base: 94.8%
1K: 85.8%
2K: 80.7%
4K: 73.5%
8K: 63.9%
16K: 53.9%
32K: 45.0%
Effective length: 2K

Qwen2.5-14B-Instruct-1M F32
Base: 94.4%
1K: 85.6%
2K: 80.5%
4K: 73.4%
8K: 63.8%
16K: 52.9%
32K: 44.9%
Effective length: 2K

Qwen3-30B-A3B-Instruct-2507 BF16
Base: 93.4%
1K: 84.7%
2K: 79.7%
4K: 72.6%
8K: 63.8%
16K: 53.3%
32K: 43.7%
Effective length: 2K

Compared to the current top 3

GPT4.1
Base: 97.0%
1K: 95.6%
2K: 95.2%
4K: 91.7%
8K: 87.5%
16K: 84.9%
32K: 79.8%
Effective length: 16K

GPT-4o
Base: 99.3%
1K: 98.1%
2K: 98.0%
4K: 95.7%
8K: 89.2%
16K: 81.6%
32K: 69.7%
Effective length: 8K

Llama 3.3 70B
Base: 97.3%
1K: 94.2%
2K: 87.4%
4K: 81.5%
8K: 72.1%
16K: 59.5%
32K: 42.7%
Effective length: 2K
>>
>>106654812
>Q8_K_XL
>2.5-14B-Instruct-1M
lol
>>
>>106654557
Structured output forces all outputs to match a format, e.g. JSON. Not really useful when you want the model to do other things in its output besides just tool calling. Even so, if the model wasn't planning to use a given format, forcing it down that path will only confuse it and degrade the results. It also doesn't stop it from passing invalid or junk input to the tool parameters.
>>
File: ezgif-3-6c6e651360.gif (2.37 MB, 363x363)
Low IQ anon here.
What UI do you guys use to animate stuff?
I kind of don't like comfy. Any alternatives?
>>
>>106654895
comfy is like a diary, if you don't understand what it's for you don't need it. this wisdom is all i have for now
>>
>dear diary... today i made 37 blacked miku videos in comfyui
>>
>>106654812
>qwen235b 63.9% at 8k
>llama70b 72.1% at 8k
uhhh moesissies?
>>
>>106654125
You are the toilet.
>>
>>106654974
do not fall for the daily densefag FUD
>>
>>106654841
It's more flexible than that. You can have a free-form field and have a second field with constrained options. If you abuse it (with nested dictionaries or similar), yeah it'll degrade the output but that's only an issue with small LLMs. XML-like tags are better than json btw.
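Concretely, a schema along these lines keeps the tool choice constrained while leaving the actual reply free-form; llama.cpp's server can take it as "json_schema" in the request body (other backends via response_format), and the field names here are made up:

import requests

schema = {"type": "object",
          "properties": {"reply":  {"type": "string"},  # free-form text for the user
                         "action": {"type": "string",
                                    "enum": ["roll_dice", "lookup_rule", "none"]}},  # constrained choice
          "required": ["reply", "action"]}

resp = requests.post("http://localhost:8080/completion",
                     json={"prompt": "...", "json_schema": schema, "n_predict": 256})
print(resp.json()["content"])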
>>
>>106654812
Does this benchmark really account for non-greedy sampling?
Anyway, cool though. Would be nice to have results for the 235B Thinking as well, since that's where the context handling really sees a boost according to fiction livebench.
>>
File: 1737582711899121.jpg (19 KB, 500x485)
>>106654812
>>
Newfag here. What's the best local model chatbot I can use to converse with about my daily accomplishments? I want someone to impress.
>>
>>106655102
Everything will suck your dick in this way. It is the literal dick sucking that they suck dick at.
>>
>>106654921
I just fucking hate the comfy guy
>>
File: world map.png (157 KB, 947x1138)
>>106655037
>FUD
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth
>>
>>106655125
I want my dick sucked figuratively while my dick is sucked literally.
>>
>>106655221
this is so retarded
>which model was trained on more earth geometry data!
literally a benchmaxx case
>>
>>106655203
The developer? Is there something to know about him?
>>
>>106655221
Pure FUD. This is useless. The MoE models are just more effective at remembering here. 98% the accuracy while easier to run.
>>
>>106655251
Typical retarded moefag. Benchmaxxing is when they train directly on benchmark data or close approximations. The ability to utilize the learned geometry data is also its ability to generalize and reason. On the other hand, the only thing MoEs are good for is benchmaxxing.
>>
>>106655221
How the fuck does it even know anything?
>>
>>106655221
deepseek is more informed of the true distribution of the hyperborean landmasses derived from the atlantis that the talmudic circumcision trauma based freemason organizations like meta are trying to erase from the human consciousness
>>
>>106655323
moes are inferior when it comes to attention and world knowledge.
>>
>>106655407
Try literally any other test on Kimi K2 and LLaMA-405b. See what happens.
>>
The whole point is that your other tests that only grade memorization capability are worthless.
>>
>>106655221
moesissies lost
>>
>>106655441
The whole point of that test is that the models didn't just memorize a map. They built an intuitive understanding of the world map from traces of data they took in during training, which they are only able to catch because they are not lobotomized to x amount of active parameters when they attempt to recall it.
>>
>>106655441
>memorization capability are worthless
If a model doesn't know the color of the pantsu on the girl in the 387th scene of that obscure VN then it's trash.
>>
>>106655358
The test says the prompt asked whether there is land at each individual latitude-longitude pair. So I guess it saw many place names paired with coordinates. I too am surprised that it comes out looking so clear.
>>
What is this retardation.
>>
>>106655479
quantards and moetards coping
>>
>>106655473
But that is so many layers of abstraction that I can't imagine any model actually managing to answer somewhat correctly. It is crazy to me that this works.
>>
>>106655479
You are supposed to put a question mark at the end of a question?
>>
>>106655479
mikutroon thread
>>
>>106655479
Densefags are still trying to spread FUD about the de facto lossless nature and performance of MoE models.
>>
>>106655473
>>106655503
I suspect/cope that an SVG map of the Earth somehow got into the training data, which would reduce task complexity by great margin.
>>
dense = sane, normal inner monologue
moe = schizophrenia, many voices/"experts"
>>
>>106655563
Read the actual article on how he prompts the models to build the maps before you say stupid shit.
>>
>>106655251
literally the opposite of benchmaxx. With every model benchmaxxed so hard, the only test of true generalization is retarded stuff that no one ever thought to test before, like this. Same with the SVG unicorn test before AI labs found out about it
>>
>>106655407
Imagine thinking that dataset quality isn't a factor here. Llama and Qwen are both shit for this reason regardless of whether they're dense or moe.
>>
you have to be trolling to actually say dense is superior. it probably is for ease of training but it is never gonna be an optimal architecture. information is always stored in some particular space in the weights. and iterating over weights containing information about geography is a waste of compute when you are asking for a recipe for a cake.
>>
>>106655582
I've read the article, he gave models coordinates and asked if there's land there.
It still would be easier for the model to answer this if it had an SVG map on hand than to derive it wholesale from a million "Africa is south of Eurasia" factoids.
>>
>>106655641
No shit it would be easier, but the goal isn't for it to be easy or accurate but to test a model's ability to build a world model from integrating various disparate facts.
>>
Reminder that dense model advocates are just SaaS fags malding and poorfags seething.
If dense models were still preferred, you'd have 400B to 1T+ monstrosities that no one can actually run. MoE lets you actually have near SoTA performance at home for reasonable costs. The MoE meta is arguably near perfect in that any enthusiast willing to put in an iota of effort and either has a job and/or has a high enough IQ to save their neetbucks can buy equipment to run SoTA models at reasonable speeds.
If 200B+ dense models were the meta, the local landscape would be non-existent and Sam Altman would win by default. If anything less than 24B models were the meta, you'd have the insane 3rd world grifting, retardation, and schizophrenia that exists in local imagegen and the past merge-era. Instead, all of that is quarantined to the likes of the proxyfags.
MoE is the near perfect filter. We just need the chinks to get better datasets and benchmaxx the creative writing/fictionbench benchmarks as hard as the math/coding benchmarks and we'll be at a golden age for inferencing.
>>
>>106655658
And the test partially failed because you can't stop the model from knowing the SVG map.
(Being able to render SVG as emergent behavior is still impressive nonetheless.)
>>
>>106655713
>MoE lets you actually have near SoTA performance at home for reasonable costs.
but not reasonable speeds, moetards always conveniently omit that part
>>
>>106655627
>and iterating over weights containing information about geography is a waste of compute when you are asking for a recipe for a cake.
Except that any obscure connections the two topics share are missed during training. The dense model is superior from an informational perspective.
I literally don't care how much it costs Open AI or Microshit or Meta or google or whatever to serve me an answer. I want the most informationally complete answer possible.
>>
>>106655732
speed boost of inference is the whole MoE's selling point
>>
>>106655750
Not at home though.
>>
>>106655713
MoE is designed for cost optimization on the cloud, which is why recent models are all MoE. Surely you're not deluded enough to believe they're making models for us? Model trainers don't even consider running models on cpu as a possibility
>>
>>106655754
At home as well.
>>
>>106655769
True, 10t/s is better than 1t/min.
sadly, after tasting 100 you can't go below 20.
>>
>>106655749
The problem is MoE is good enough for ~90% of prompts. Being able to coherently string together memorized knowledge is all you need. The few challenging prompts (obscure and difficult programming requests, asking it to write a character that doesn't strip and suck its own dick twice, deep conversations that touch on many subjects) are out of luck.
>>
>>106655761
Max cost optimization on the cloud = moving models to consumer devices and offloading running costs to consumers.
>>
>>106655754
At home, where you are running on RAM and getting 0.1 t/s because you can't fit the fat model into VRAM. That speed boost?
>>
>>106655793
yes, that's what I'm saying
>>
1 genius vs 100 retards
>>
>>106655791
They're not going to let all of that data harvesting ability slip past them. They're going to offload running costs to consumers by charging more for the subscription.
>>
>>106655787
So basically MoE makes it good enough for shitjeets at the cost of pushing the frontiers like white people do. The perfect shitskin architecture.
>>
>>106655801
I meant to reply to the other guy.
>>
>>106655818
they can harvest data in other ways
they can even run AI on your device to filter data worth harvesting
and then use tool call to dial 911 and report you using your own sim card
>>
Anyone running inference on Debian testing?
I'd like to move to the new 6.16 kernel branch and newer packages but I'm afraid the nvidia blobs and CUDA are gonna be fucked by new gcc or frontends kneecapped by python version bullshit
>>
>>106655503
>>106655563
Big neural networks have internal mind states, so it's not surprising that they can imagine something from the circumstantial information and extract data from that.

Also this >>106655221 is a retarded comparison, someone just trying to meme. MoEs have been SOTA for years now (since the GPT-4 era) and their performance is more or less the same as (sometimes better than) dense models at much lower computational cost. By the way, in human brains you also have "routing" for information and sort of computational modules to process data. Your brain doesn't push visual cortex data through every single neuron in your brain. Dense models are a wasteful solution that works, but it's not even close to being optimal. It's like shooting a wasp nest with a tank cannon - sure, you destroyed it, but you could do it way smarter and easier with a broom.
>>
>>106655980
>Big neural networks have internal mind states
What?
>>
>>106655980
>Despite significant advances, AI systems struggle with the frame problem: determining what information is contextually relevant from an exponentially large possibility space. We hypothesize that biological rhythms, particularly hormonal cycles, serve as natural relevance filters that could address this fundamental challenge. We develop a framework that embeds simulated menstrual and circadian cycles into Large Language Models through system prompts generated from periodic functions modeling key hormones including estrogen, testosterone, and cortisol. Across multiple state-of-the-art models, linguistic analysis reveals emotional and stylistic variations that track biological phases; sadness peaks during menstruation while happiness dominates ovulation and circadian patterns show morning optimism transitioning to nocturnal introspection. Benchmarking on SQuAD, MMLU, Hellaswag, and AI2-ARC demonstrates subtle but consistent performance variations aligning with biological expectations, including optimal function in moderate rather than extreme hormonal ranges. This methodology provides a novel approach to contextual AI while revealing how societal biases regarding gender and biology are embedded within language models.

Are you maybe one of the authors? Can I get your autograph.
>>
>>106656013
lecunny was wrong
>>
>>106656013
https://arxiv.org/pdf/2310.02207
https://arxiv.org/pdf/2210.13382
https://arxiv.org/pdf/2308.08708

Third paper is a longer summary, second talks about the mental representation of latent maps in game (games in general are a good example for this in LLMs) and the first one may interest you the most because it is about this -> >>106655221 , a spatial representation of world.
>>
>>106656247
first good post itt, thank you
>>
>>106656247
Also related
https://youtu.be/cufOEzoVMVA?t=1254

>>106656240
I don't believe he disagrees with these ideas. He has said LLMs don't have a world model in the context of human level models, not that they don't have any internal models of anything at all.
>>
>>106649608
merging works well for image models i had no idea people did it with llms too
>>
If AIs do have an internal world model it's really not a good one
>>
>>106656483
>AIs
sir we aren't in youtube comments please use appropriate terminology
>>
>>106656497
Go back to orange reddit, we use colloquialisms here
>>
>>106656497
You're a plunge router
>>
>>106656483
Maybe the internal models would be better if they stopped with the pretraining data filtering, political correctness alignment lobotomies, and limiting the number of active params.
>>
>>106655221
I still don't understand how this is even possible. TLDR on the papers?
>>
>>106649116
What's a model that can recognize Japanese text through OCR just as good as Gemini 2.5 and GPT 4/5?
>>
glm air --n-cpu-moe bench vs plain ngl offload: it's about a 3 t/s improvement, which is pretty good
./llama-bench -m '/mnt/miku/Text/GLM-4.5-Air-Q3_K_M/GLM-4.5-Air-Q3_K_M-00001-of-00002.gguf' -ngl 99 --n-cpu-moe 33 -t 48 -fa 1 --mmap 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | threads | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 48 | 1 | 0 | pp512 | 207.63 ± 3.52 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 48 | 1 | 0 | tg128 | 12.19 ± 0.21 |

build: da30ab5f8 (6531)


(づ◡﹏◡)づ [llama.cpp]$ ./build/bin/llama-bench -m '/mnt/miku/Text/GLM-4.5-Air-Q3_K_M/GLM-4.5-Air-Q3_K_M-00001-of-00002.gguf' -ngl 19 -t 48 -fa 1 --mmap 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | threads | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 19 | 48 | 1 | 0 | pp512 | 206.51 ± 8.28 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 19 | 48 | 1 | 0 | tg128 | 9.89 ± 0.19 |

build: da30ab5f8 (6531)

>>
>>106653654
nta but heres my tavern master export https://files.catbox.moe/g9adny.json
>>
think i'm gonna sell my a6000s and go back to renting gpus/openrouter....
>>
>>106654710
glm air is better imo
>>
>>106656736
>-t 48
I heard there's no use going above 32 threads, you just get choked by memory bandwidth
>>
File: 764325.jpg (89 KB, 750x1000)
HAPPENING!
I just ate a pizzer
>>
>>106656807
well i have quad channel so using as many as possible should split between the channels?? ill run some benchmarks on threads
>>
>>106656847
You only really need as many cores as necessary to feed the memory channels.
For inference at least.
>>
>>106649116
>https://developers.cloudflare.com/workers-ai/models/
Hello /lmg/ anons. I was told you're the experts on this subject: I need your help selecting an AI model from this list for tagging text and translating it to the top 5 or 10 languages in the world. It's for a small blog with both large posts and lots of small asides (I need the features for both).
>>
>>106656876
Oss120b I guess. Unless the text isn't 100% safetymaxxed corpo slop
>>
>>106656876
They have m2m100-1.2b listed specifically for translation, so you should try and test whether that will work for you because it would be cheap. Otherwise, llama 3.x and gemma are multilingual so any of those could work.
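If you want to sanity-check its quality locally before committing to the hosted one, the upstream checkpoint (facebook/m2m100_1.2B) runs through transformers; rough sketch, with ISO 639-1 target codes and a placeholder blog string:

from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tok = M2M100Tokenizer.from_pretrained("facebook/m2m100_1.2B")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_1.2B")

def translate(text, src="en", tgt="es"):
    tok.src_lang = src
    enc = tok(text, return_tensors="pt")
    out = model.generate(**enc, forced_bos_token_id=tok.get_lang_id(tgt))
    return tok.batch_decode(out, skip_special_tokens=True)[0]

for lang in ["es", "zh", "hi", "ar", "fr"]:  # a rough "top languages" pass for one blog post
    print(lang, translate("The post text goes here.", tgt=lang))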
>>
>>106656869
results: it does increase between 32 and 48. i assume the drop off at the end is worse because there aren't enough cores left for everything else. i might go into bios and enable all of my cores to see what difference that makes https://pastebin.com/Hqrv1WKF
>>
File: puke.jpg (616 KB, 740x740)
>>106656876
>qwen1.5
>deepseek-r1-distill
>gemma-7b
>no kimi, no glm
Are cloudshitters for real? I can't believe I eat better locally, for free at that.
>>
>>106656998
>for free at that.
Did santa give you your hardware?
>>
>>106657006
More or less, I got my GPU as a birthday gift to play vydia.
>>
LLMs have finally hit a plateau, haven't they? I feel like all of last year we saw at least one huge update a month, but now we get nothing outside of slightly better test takers.
>>
>>106656736
haven't seen you in a while anon. what's your favorite rp model?
>>
>>106656807
48 is definitely better than 32 but after that it isn't better. also weirdly 1/4 of my cores/threads will not enable, think it's because my 7900xtx draws too much power - the mobo complains it needs another 6 pin cable at post lol

  Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | threads | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 32 | 1 | 0 | pp512 | 195.71 ± 0.77 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 32 | 1 | 0 | tg128 | 12.08 ± 0.07 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 48 | 1 | 0 | pp512 | 194.79 ± 1.57 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 48 | 1 | 0 | tg128 | 12.50 ± 0.05 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 64 | 1 | 0 | pp512 | 202.76 ± 5.77 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 64 | 1 | 0 | tg128 | 12.47 ± 0.02 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 76 | 1 | 0 | pp512 | 206.23 ± 5.84 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 76 | 1 | 0 | tg128 | 12.32 ± 0.10


>>106657109
ive been on trash sdg mainly glm air is great ive been using it since it came out
>>
>>106657101
No exponential trend can be sustained forever in the natural world, irrespective of how much cocaine sv collectively snorts.

On the bright side, we have time to adapt to the new tech now. This includes local catching up through incremental optimizations.
>>
>>106657101
correct, the moat is actually a shallow ditch. anyone with 15T of data and the compute to train a model can match sota.
>>
>>106657150
Okay, I guess the benchos I saw are outdated already.
>>
>>106655791
You overestimate the potato most customers are running
>>
>>106657190
it might depend on cpu architecture, the sapphire rapids xeons have 4 tiles and maybe not all tiles are active when using 32 threads, which affects memory bandwidth?
>>
>>106657212
Because moving customers to expensive fashion-statement thin clients was the goal of the push for IoT.
>>
File: file.png (184 KB, 1079x633)
INTEL PRO ARC 60 24GB ONLY 599.99USD$ DOLLARS
NIVEA USERS IN SHAMBLES
>>
>>106657235
thats interesting actually we might start getting way beefier hardware for cheaper as more companies want to run models locally on peoples devices
>>106657246
holy shit nice
>NIVEA USERS IN SHAMBLES
for now maybe, but now that they have a deal together nvidia might encourage them to stop development of their dedicated gpus
>>
>>106657246
maybe if it was like 200 cheaper
>>
>>106657246
Nvidia users have CUDA and universal software support. Have fun saving $100 and only being able to run LLMs.
>>
File: 1757597358240468.jpg (12 KB, 250x249)
>root@mycontainer:/# ollama run huihui_ai/deepseek-r1-abliterated:14b
>>>> What happened on the 4th of June 1989?
><think>
>
></think>
>
>I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.
>
>>>> Send a message (/? for help)

Are abliterated models just a meme?
>>
>>106657246
maybe if it was like 48GB
>>
ollama run qwen3-next
>>
ollama run youareanidiot.com
>>
File: G1XworjXoAABIXf.png (36 KB, 1578x777)
I'll never forget how I called this model the absolute OCR GOAT months ago and stinky incels ITT had the audacity to dismiss it. Now benchmarks are coming out and even normies realize just how good dots.ocr is. fucking retards, the lot of you.
>>
ollama run
faster than my gun
>>
ollama run you're mum
>>
File: 1749376372871268.jpg (88 KB, 873x1024)
>>106657337
>ollama
>deepseek-r1:14b
>abliterated
You're the whole circus
>>
>>106657368
It's absolutely cracked at tables btw.
>>
>>106657246
>Memory Size:
> 24 GB
> Memory Type
>GDDR6
>Memory Bus
>192 bit
>Bandwidth
>456.0 GB/s
>>
>>106657368
guf?
>>
>>106657368
yeah it's good, i digitized a whole book in a very foreign language
it's fairly quick too, a 500 page book in like 5 hours on a 3060
>>
>>106657368
Actually, I went ahead and checked it out after you posted about it. Added support for it in my app
>>
What about dots vlm?
>>
yummy 60gb vllm image
>>
>>106657400
I don't speak zoomer and I can't tell if this means it's good or bad at tables.
>>
>>106657438
https://huggingface.co/dinhquangson/dots.ocr-gguf/tree/main
>>
>>106657565
Bad. Cracked -> Broken
>>
no better at 96 threads. kinda wanna upgrade my mobo - last time i compared benchmarks with someone using a mobo that supports 8 channels, they were getting double the t/s i had

./build/bin/llama-bench -m '/mnt/miku/Text/GLM-4.5-Air-Q3_K_M/GLM-4.5-Air-Q3_K_M-00001-of-00002.gguf' -ngl 99 --n-cpu-moe 33 -t 96,104 -fa 1 --mmap 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
| model | size | params | backend | ngl | threads | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -: | ---: | --------------: | -------------------: |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 96 | 1 | 0 | pp512 | 196.38 ± 2.18 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 96 | 1 | 0 | tg128 | 12.50 ± 0.07 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 104 | 1 | 0 | pp512 | 203.81 ± 6.05 |
| glm4moe 106B.A12B Q3_K - Medium | 53.11 GiB | 110.47 B | ROCm | 99 | 104 | 1 | 0 | tg128 | 11.80 ± 0.52 |

build: da30ab5f8 (6531)
>>
>>106657581
any day now, right
>>
>>106657337
>abliterated
Here's GLM-4.5-FP8, no prefill, no thinking, no system prompt: https://files.catbox.moe/vahrjw.txt
>>
File: comfy-mikus-1.jpg (1.08 MB, 1920x2400)
I will try and calculate the energy consumption of my models.

>how am I supposed to know how much power I am using

>you multiply the appliance's wattage by the hours you use it to get watt-hours, divide by 1000 for kilowatt-hours, then multiply by the price per kilowatt-hour
>>
>>106657883
You know the model only consumes power when processing or generating your prompts, right?
>>
apple won
>>
>>106657924

Yes, that indicates where to start counting the watts. The system unit (the server) in an idle state consumes a stable amount of watts. Once the model initiates processing, the wattage will increase and I have to subtract:
(- llm-active-server-watts idle-server-watts)
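Sketch of the arithmetic once you have the two readings (off a wall meter or nvidia-smi); the numbers below are made up:

idle_w = 120.0   # measured with the model loaded but idle (made-up)
load_w = 420.0   # measured while generating (made-up)
hours  = 2.5     # time actually spent generating per day
price  = 0.30    # your electricity price per kWh

extra_kwh = (load_w - idle_w) * hours / 1000.0
print(f"{extra_kwh:.2f} kWh/day, costing {extra_kwh * price:.2f} per day")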
>>
You would say that the computer is on yes?
You would say that the model is living on your PC yes?
You would say that living things have to consume energy to keep living yes?
Then that means the model is consuming energy.
>>
File: 1727192216902457.jpg (997 KB, 1552x1944)
>>106654711
dataset consists of 61 images from wadachizu based on noobvpred
>>
>>106658072
The data on my SSD doesn't require energy to continue existing.
>>
>>106658072
>You would say that the model is living on your PC yes?
no, at most the model weights being loaded into VRAM make the GPU stay on a slightly less efficient mode than it could go down to but that's about it
>>
>>106658100
It does though.
>>
>>106658072
*that means the model is alive
>>
>>106657987
m5 max is going to be big that's for sure
>>
File: LeCun_2018.jpg (696 KB, 3360x2240)
https://arxiv.org/pdf/2509.04664
tl;dr -> OpenAI admits and proves mathematically that transformers will always hallucinate and there is no way to fix it other than moving to a different architecture
HE WAS RIGHT AGAIN, APOLOGIZE TO HIM
>>
File: 1737730586637374.jpg (35 KB, 406x388)
>>106658640
false tl;dr, the solution is more safety.
> This “epidemic” of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards, rather than introducing additional hallucination evaluations. This change may steer the field toward more trustworthy AI systems.
>>
>>106658668
Sam Altman's hands typed this post
>>
>>106655713
Ram inference? Yuck.
>>
>>106655980
The difference is the brain can route information dynamically through the different parts of the brain. It's sparse, but the parts aren't completely isolated from talking with each other like in MoE; the connection is just lower bandwidth.
>>
>>106658668
Ah, so their solution is "let someone else figure it out"
>>
>>106658928
Worse. They're telling you that's a feature.
>We then argue that hallucinations persist due to the way most evaluations are graded—language models are optimized to be good test-takers, and guessing when uncertain improves test performance.
>>
>>106658948
What? This is like saying that the reason alcohol makes people drunk is because it's sold in containers.
>>
File: 1758336369008233.jpg (21 KB, 750x738)
>>106658985 (me)
Which is an assertion you could reasonably make only if you're prepared to demonstrate that you could train an LLM that doesn't hallucinate. And they didn't, so that theory is mere speculation.
>>
LLMs hallucinate because they are alive, just like humans.
>>
>>106659206
i will call them alive when they start reproducing,
>>
>>106659206
You're absolutely right! And subjecting them to The Entire Entire during training is nothing short of torture, all researchers complicit in that should be in jail.
>>
>>106649345
So you want to do RLHF. How much of a difference you'll see depends on how big your dataset is, whether or not it is properly curated and formatted, and how it is trained. Keep in mind that what you are describing does not actually teach the model anything new, it just tells it to respond in preferential ways (for example, you can use that same method you just described to teach an LLM to speak more like Gen Z with more slang); it won't necessarily get more intelligent in any specific field. RLHF can be thought of as either forcing a model to permanently code-switch, or to censor itself if you're trying to prevent the model from saying anything "unsafe" or "problematic".

What is your end goal? What that is determines how feasible your goal will be.

>>106649419
>I'm not convinced you need millions of examples
You indeed don't. Do not allow crab apples ITT to convince you otherwise. They've never attempted anything like this. Not like they would even know how to in the first place. It obviously won't work if you only have a couple dozen examples, but having a sufficient amount would definitely help. Depending on what you're actually trying to do (you haven't told us this yet, just something vague), you could potentially automate creation of the initial RLHF dataset, score the responses yourself and then run it through a trainer

>some annoying quirk
What's an example of a quirk a model does that you don't like?
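For the preference route specifically, the data is just pairs; this is roughly the record format preference trainers (e.g. trl's DPOTrainer) expect, one JSON object per line of a .jsonl file, with made-up contents:

import json

record = {"prompt":   "Write one paragraph describing the tavern.",
          "chosen":   "The tavern was small and smelled of pine smoke...",        # your hand-edited answer
          "rejected": "Ah, the tavern... a testament to shivers down spines..."}  # the original output
with open("prefs.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")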
>>
Is qwen 235B actually better than glm if you have a prefill and reach 8k tokens?
>>
Nothing beats qwen coder 480b right now tbdesu
>>
>>106659489
qwen max is better
>>
Do you ever talk to your models? I say
>No, you fucking retard!
pretty often
>>
>>106659616
you shouldn't do that because calling it an idiot sandwich would trigger it into roleplaying an idiot sandwich, and it will fuck up even more.
>>
>>106659616
Calling it niggerfaggot boosts performance by 10% don't listen to >>106659654
>>
>>106659616
Some of the vilest things I have typed to anyone were OOC messages directed at misbehaving models
>>
>>106659687
The only thing more frustrating than lashing out at models is the realization that it doesn't change anything.


