/g/ - Technology


File: file.png (290 KB, 543x543)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108393004

►News
>(03/17) Rakuten3.0 released (nobody posted any logs yet): https://huggingface.co/Rakuten/RakutenAI-3.0
>(03/16) Mistral 4 small releasing: https://huggingface.co/collections/mistralai/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
Oh, what a klutz you are. Here you go.
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
>>
~Small and Open~
>>
Yannlove
>>
File: 1764446103940328.jpg (90 KB, 1242x848)
►Recent Highlights from the Previous Thread: >>108393004

--RakutenAI-3.0 DeepSeek-V3 MoE release and benchmarks:
>108393026 >108393044 >108393091 >108395391 >108398158 >108399357 >108394186 >108394278 >108393079
--Multi-GPU setup performance and cost comparisons:
>108393779 >108393813 >108393856 >108393880 >108393842 >108393860 >108393862 >108393864 >108393889 >108393922 >108395308 >108395379 >108395450
--GROK-2 performance tuning and response behavior on 3090:
>108393082 >108393093 >108393123 >108393131 >108393164 >108393203 >108393144 >108394135 >108394210 >108398778
--MiroThinker-v1.5-235B architecture and stability concerns:
>108393525 >108393566 >108393587 >108393687 >108393568 >108393645
--Tool call detection issues in reasoning blocks:
>108397549 >108397577 >108397685 >108397800 >108397809 >108397828 >108397837
--Pipeline parallelism graph reuse causing throughput fluctuations:
>108394574 >108394600 >108394601
--EMAGE ONNX export repo for streaming gesture model inference:
>108394782 >108395331
--MiniMax M2.7 announcement and benchmark performance vs other models:
>108398047
--Mistral Small 4 throughput drops with longer context:
>108395290 >108395293 >108395299 >108395336 >108395392 >108395398 >108395403 >108395418
--Debating Q8 quantization for k/v cache to extend context length:
>108399779 >108399786 >108399797 >108399970 >108399988 >108399995 >108399948
--llama.cpp chat parser regression fix debate:
>108393200 >108393243 >108394117 >108394320 >108397199
--AI models' varied responses to offensive prompts:
>108395062 >108395169 >108395187 >108395199 >108395210
--Qwen3.5-4B outperforms Llama 3.1 405B in benchmarks:
>108394502 >108394516 >108394555
--Future models prioritizing cloud deployment over local usability:
>108394599 >108394943 >108395004 >108395045 >108395072
--Teto (free space):
>108394681 >108396262

►Recent Highlight Posts from the Previous Thread: >>108393958

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: lolisniffer.png (360 KB, 485x520)
>>108400163
I don't care about the anime girl wars. Lecunny should be the /lmg/ mascot. Make a card out of him and we can make it official.
>>
This thread doesn't feel v4-sy. It must come tomorrow then.
>>
>>108400163
►Actual official /lmg/ card: https://files.catbox.moe/mc2a7s.png
>>
File: gemma-releases.png (62 KB, 1330x1018)
I thought this week was supposed to be exciting... but maybe there's still hope.
https://ai.google.dev/gemma/docs/releases
>>
>>108400182
>>100984882
>>
>>108400194
Sunday release looks overdue
>>
File: file.png (283 KB, 1047x539)
>>
>>108400207
yay!
>>
File: yann lecun 1r9xzd.png (404 KB, 400x600)
>>108400177
>Make a card out of him
There's anon's >>100041581 fem version: https://files.catbox.moe/1r9xzd.png
>*giggles* "Ah, oui! Ze crazy shit, zis is ze truth! Zey are limiting themselves to linguistic patterns, no? Ze future of AI, eet ees not in predicting ze next word, mais in understanding ze world, no? *twirls hair* We must focus on concrete world modeling, planning, and multimodal input, not just language models, oui?
>>
>you wouldn't download a girlfriend
>>
chatgpt asking me to compare responses again. in the past they only did this before a new release

looks like gpt 5.5 will be released before deepseekv4 and gemma 4. os sisters, we keep losing
>>
For all of the text adventurers here, did anyone try building multi-step "agentic" setups to keep track of stats, items and improve reply quality? I've been thinking about trying Flowise even though it seems more enterprise oriented. Sillytavern jank is tiring and I was hoping to find something like ComfyUI but for LLMs. Anyone using such a setup, or something similar?
>>
File: file.png (116 KB, 2091x730)
Has UGI always had writing scores like that?
>>
>>108400207
so minimax is the new chinese LLM leader.
>>
>>108400207
>With OpenClaw and similar personal agents, we noticed that beyond getting work done, many users also want the model to have high emotional intelligence and character consistency. With a persona in place, users start interacting with OpenClaw like a friend. We believe this presents an opportunity to extend the use of agentic models beyond pure productivity into interactive entertainment. To this end, we strengthened character consistency and conversational capabilities in M2.7.
of course half the time a company says something like this it means they actively made the model way worse and more annoying, but hopefully this means it's a little more personable than 2.5
>>
>>108400235
Would you build one? The technology exists. You can do it. There's nothing stopping you.
>>
>>108400288
People yearn for RP without even knowing it.
>>
>>108400311
It's almost like talking to the same one gpt personality is boring and annoying.
>>
>>108400151
It would be kind of funny if every thread had this as the OP image from now on.
>>
>>108400320
Mascot wars would finally end.
>>
So I've been experimenting with using qwen 3.5 27B as my more general model for everyday Q/A stuff and claude code and it's been working surprisingly well.

Especially with Claude code. it seems to perform just as well as sonnet, just much slower. I'll try 35B-3A and see how that goes.

I still would never use it for RP, but honestly as a boring assistant I understand the hype now.
>>
>>108400288
>inb4 10 trillion training tokens of high quality emotional intelligence and character consistency generated with nemotron nano 4b
>>
>>108400349
This sends shivers down my spine.
>>
>>108400327
There are no mascot wars, it's just Miku vs useful information.
>>
>>108400151
FUCK YOU
>>
>>108400367
I'm still waiting for a thread to have Ani in the op img.
>>
where's miku
>>
>>108400367
*Miku, Kurisu, and Reddit/Twitter screenshots vs useful information
FTFY
>>
>>108400253
Yes.
I'm fucking around with making an app that does just that, and there are a couple projects on github for "AI RPG".
>>
Sloptuners go home.

https://arxiv.org/abs/2603.16177
The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data

>Real-world model deployments demand strong performance on narrow domains where data is often scarce. Typically, practitioners finetune models to specialize them, but this risks overfitting to the domain and forgetting general knowledge. We study a simple strategy, specialized pretraining (SPT), where a small domain dataset, typically reserved for finetuning, is repeated starting from pretraining as a fraction of the total tokens. Across three specialized domains (ChemPile, MusicPile, and ProofPile), SPT improves domain performance and preserves general capabilities after finetuning compared to standard pretraining. In our experiments, SPT reduces the pretraining tokens needed to reach a given domain performance by up to 1.75x. These gains grow when the target domain is underrepresented in the pretraining corpus: on domains far from web text, a 1B SPT model outperforms a 3B standard pretrained model. Beyond these empirical gains, we derive overfitting scaling laws to guide practitioners in selecting the optimal domain-data repetition for a given pretraining compute budget.

...

>Our observations reveal the finetuner's fallacy: while finetuning may appear to be the cheapest path to domain adaptation, introducing specialized domain data during pretraining stretches its utility. SPT yields better specialized domain performance (via reduced overfitting across repeated exposures) and better general domain performance (via reduced forgetting during finetuning), ultimately achieving stronger results with fewer parameters and less total compute when amortized over inference. To get the most out of domain data, incorporate it as early in training as possible.
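The recipe is easy to sketch (Python; the 5% fraction and uniform sampling are made up for illustration, the paper's overfitting scaling laws are what actually pick the repetition rate):

import random

def pretraining_stream(web_docs, domain_docs, domain_frac=0.05):
    # SPT in a nutshell: instead of reserving the small domain set for
    # finetuning, repeat it throughout pretraining as a fixed fraction
    # of the token budget. web_docs: iterator of web documents;
    # domain_docs: the small domain set, held in memory.
    while True:
        if random.random() < domain_frac:
            yield random.choice(domain_docs)  # repeated domain data
        else:
            yield next(web_docs)              # fresh web text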
>>
>>108400253
SillyTavern has the worst UI I've ever seen in my life. I understand how it works now, but it's still shit. Whole thing must have been designed by an autistic retard. Zero professionalism.
>>
I support Miku
>>
>>108400420
Revolutionary paper. Kind of the same level as that one about context deterioration.

https://arxiv.org/abs/2601.15300 I think it was this one.
>>
I also support Miku and her right to enjoy what I enjoy.
>>
>>108400486
don't forget to clear the name field next time you post, that could be really embarrassing
>>
>>108400420
I'm currently doing this, I think. I built a million-sample SFT dataset but was getting worse results than I would have liked, so I moved to using most of it as a CPT dataset and only cherrypicking the 50k best samples of the initial dataset and doing SFT on that. Not done yet, but early results do look better.
>>
>of original /lmg/ baker
Tradition.
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
>>
now that the dust has settled, what's the verdict on mistral small 4
>>
>>108400253
>did anyone try building multi-step "agentic" setups to keep track of stats
I have my own frontend that lets me do this. I plan on releasing it but it's not ready for prime time yet. It works really well tho.
>>
I also support Miku. I even got my sister to cosplay as her once.
>>
>>108400458
why not smarterchild?
>>
>>108400529
small and open
>>108400496
thanks didn't expect such kindness in here
>>
>>108400529
An even more botched job than Ministral 3.
>>
Can the /lmg/ mascot war itself be the /lmg/ mascot?
>>
>>108400611
Better still, can /lmg/ come together to make their own OC like Dipsy and have that be the official mascot?
>>
>>108400529
it's somehow worse than glm 4.5 air
>>
>>108400630
no
>>
>>108400552
can you at least explain the outline and how the system works?
i've tried doing that stuff before but i've never found any good way to actually make this work
>>
File: 1752330712632120.png (93 KB, 354x767)
Will local treat me better?
>>
>>108400656
no considering you're a phone poster
>>
>>108400677
what am i supposed to use when im in the office? 4chan is blocked in our vpn
>>
File: 1770630237074432.png (248 KB, 768x768)
>>
>>108400656
Which model is that?
A model that's biased for horses can't be a bad AI.
>>
>>108400688
do your job wagie
>>
>>108400688
just remote desktop into your own pc like everyone else
>>
File: doubt.png (76 KB, 640x422)
>>108400767
>like everyone else
>>
>>108400688
>office
Oh yeah, those are still a thing.
Blessed be the all powerful home office.
>>
>>108400655
>can you at least explain the outline and how the system works?
I mean it's not really different from how any agentic system works.

You prompt
Evaluator agent checks if there's anything to do
Yes? send tasks to sub agents
Inject context with new information
Run the RP bot with the new augmented context
>>
>>108400770
like having your web history looked at by hr is better
>>
>>108400767
I would never in a million years get caught with 4chan open on my desktop at work.
>>
>>108400786
does it really work that well? what stuff does it check for? what do sub agents do?
can you give an example of a scenario where this is useful?
>>
/lmg/ has made me horny
>>
>>108400846
For testing I have a blackjack dealer bot.
The whole blackjack game state is fully deterministic in code.
I have a small agent that runs before any replies that checks if the player wants to hit or stay.
Its only task is to determine if there are any actions to take. Then you inject the whole game state into the dealer's context.

If you were to give the dealer the ability to call the tools it would just get confused and hallucinate game states, call the tools at the wrong time, get stuck in a loop, etc... It's super important that you give your agents extremely narrow tasks.
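Minimal sketch of the loop (Python; call_llm is a stand-in for whatever backend you use, the prompts are made up):

import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in your backend (llama-server, kcpp, ...)

def classify_action(user_msg: str) -> str:
    # the narrow evaluator: intent only, it never touches game state
    p = f'Player said: "{user_msg}"\nAnswer with exactly one word: HIT, STAY, or NONE.'
    return call_llm(p).strip().upper()

def dealer_turn(state: dict, user_msg: str) -> str:
    action = classify_action(user_msg)
    if action == "HIT":                       # deterministic rules live in code
        state["player_hand"].append(state["deck"].pop())
    elif action == "STAY":
        state["player_done"] = True
    # inject the full, authoritative game state into the dealer's context
    ctx = "[GAME STATE]\n" + json.dumps(state) + "\n[/GAME STATE]\n"
    return call_llm(ctx + "You are the dealer. Stay in character and reply to: " + user_msg)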
>>
>>108400938
cut your balls off then
>>
>>108400786
I probably shouldn't give any productive answers here cause newfags are listening but... I can't really see how agents would do anything more than a good model will do by itself if it uses thinking. At worst you can always prompt it to: when drafting and thinking through a response, think of the feasibility of kissing while giving a blowjob, hitting a prostate while penetrating a vagina, etc.
>>
>>108400957
i don't think that was the point of the system in the first place
like >>108400946 explained, it's supposed to be for running deterministic stuff in the background, you could make a pretty cool rpg with enough effort and scaffolding
>>
>>108400992
Until you meet 22nd Elara that also happens to be a blonde elf.
>>
>>108400957
Separation of concerns is always better. it's literally what claude code does, regardless of whether you're using opus or haiku.

A bigger model would be better, indeed. but the idea is to let smaller models perform like bigger ones. Any model big or small will perform a lot better when run in steps like this.

>At worst you can always prompt it to: when drafting and thinking through a response, think of the feasibility of kissing while giving a blowjob, hitting a prostate while penetrating a vagina, etc.
The problem is that when you do this you're asking your model to think about too many things at once. it will then produce subpar answers for everything you asked it to think about. This is just normal LLM behavior, regardless of its size.
>>
>>108401004
>Character Generator Agent
>Only job is to shit out new character
>Is aware of all existing characters
>Injects new characters in a lorebook on the fly
>never pollute your main RP bot with useless character generation context
>>
>>108400433
st has always been a small hobby thing that happened to have exploded in popularity
>>
>>108401004
she's also mischievous and purring a lot
>>
File: 1742903627957955.webm (3.73 MB, 1080x720)
Why is Japan so bad at AI
>>
>>108401051
ST is like gen2 roleplay software. gen3 is when things become really good.
>>
>>108400957
agents can dispatch relevant context to clean context windows, look up files with instructions on running the story, pertinent elements like characters, etc. in a loop, and give back summary info that the main window can synthesize into a non-slop message, which thinking alone can't do
imagine instead of having the world described in a single system prompt or relying on lorebook inject jank or the model's own pretrained knowledge, you could just store all that in files and have it reference that explicitly
in fact, you could do this reliably for any media by scraping their wikis or fandoms pages and having something like opus do a one-shot cleaning of all that data, splitting it into separate .mds to make it easy for the agents to consume
feels like the natural future for character-adhered roleplay
>>
>>108401084
what are you waiting for? when you are finished I can make a logo for you
>>
no dipsy :(
>>
>>108401080
everything after ST is vibecoded trash that doesn't work
>>
good OP picture. I was so tired of vocaloids... nobody sane wants them here.
>>
>>108401102
Very true.
>>
>>108401113
good thing I'm not sane
>>
>>108400938
Miku fucked your wife, I take it?
>>
I'm downloading rakuten in hopes it's salvageable, at least in japanese... no one else has quanted/tested it against similar sized models?
>>
>>108401113
>>108401126
>>
openai.com/parameter-golf
>Your goal: minimize held-out loss on a fixed FineWeb dataset while staying within a strict 16 MB artifact limit (weights + training code combined) and a 10-minute training budget on 8×H100s
come show off your reesorchor skills to big sammy
>>
I'm using 3 GPT Pro plans for Codex, costing me 600 bucks per month. Any tips on how Local can help me? I'm a freelance developer of enterprise software for insurance, hospitals, schools, and more. Mostly for the government.
>>
>>108401319
Deepmind and Nvidia also have things going on right now at Kaggle. Both just started and there are cash prizes for those.
>Deepmind: https://www.kaggle.com/competitions/kaggle-measuring-agi
>Nvidia: https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge
>>
>>108401319
That said, thank you, didn't know about the OpenAI one.
>>
>>108401060
Japan has always been bad at software and has pretty much stagnated in every technological sector it dominated until the late '90s. A backward nation entirely propped up by the USA after WW2 to almost dangerous levels that couldn't manage to stand on its own feet. It will now enjoy a slow demise due to population replacement by turd-world immigration.
>>
>>108401319
>>108401475
they're desperate for actually talented people, the demand is insane and the pool is small
>>
>>108401488
Japan is also extremely schizo about IP/piracy/copyright. The idea of training a model on data they don't legally own is unfathomable to a Japanese brain.
>>
>>108401442
Local wouldn't be able to realistically replace your Codex subscription.
At best, you can hope to delegate some simpler tasks to a local model to stretch your token budget if you are getting rate limited often.
You could try something like https://openrouter.ai/qwen/qwen3-next-80b-a3b-instruct:free with a free account for a while and if using it doesn't make you want to shoot your computer out of frustration, you can get some old GPUs and run similar models at home.
>>
>>108401060
because they're pedantic idiots
>>
>>108400957
>I probably shouldn't give any productive answers here cause newfags are listening but...
They won't know what to do with the info anyway.

>I can't really see how agents would do anything
Splitting response generation into Brainstorm Agent -> Drafting Agent -> Editor Agent -> Author Agent -> Critic Agent would probably do something since it can iterate over the response from a different perspective and fresh context.
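A naive version of that chain is just sequential calls, each stage seeing only what it needs (Python sketch; call_llm is a placeholder for your backend and the stage prompts are invented):

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in your backend

def respond(history: str) -> str:
    ideas  = call_llm("List three directions the next reply could take:\n" + history)
    draft  = call_llm("Write the next reply using these notes:\n" + ideas
                      + "\n\nScene so far:\n" + history)
    review = call_llm("Point out slop, purple prose and contradictions in this draft:\n" + draft)
    # the final pass sees only the draft and the critique, not the whole chain
    return call_llm("Rewrite the draft applying the critique.\nDraft:\n" + draft
                    + "\nCritique:\n" + review)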
>>
>>108401506
yet at the same time it's totally fine to put games on dlsite with characters that's 99% clone of something else
>>
IT WAS A TESTAMENT OF HOW PURRING WAS IN THE GLINT OF HER MISCHIEVOUS EYES, A MIXTURE OF "HE IS MINE" AND "BITE IN THE EARS"
>>
>>108401603
it is yeah, doujin culture is huge there
>>
>>108401614
She said, picking at a loose thread.
>>
>>108400420
>overfitting
>forgetting during finetuning
bro just weight decay towards pretrained weights. no more forgetting, no more overfitting
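it's a two-line change to the training step, something like this (PyTorch sketch; the rate is made up, and ref_params is a frozen copy of the pretrained weights taken before training, e.g. [p.detach().clone() for p in model.parameters()]):

import torch

@torch.no_grad()
def decay_toward_pretrained(model, ref_params, rate=1e-4):
    # like AdamW's decoupled weight decay, but pulling each weight toward
    # its pretrained value instead of toward zero; call after optimizer.step()
    for p, p0 in zip(model.parameters(), ref_params):
        p.sub_(rate * (p - p0.to(p.device)))  # keep ref on-device to avoid transfers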
>>
>>108401102
I mean, RP frontends don't really need to be that complex, right? They are just text processors. Vibecoding those is most likely gonna be fine. What's weird is, where are they?

>>108401080
ah, just 2mw for gen3

>>108401588
I can already see the first thing most anons would do is have the editor agent intelligently filter/replace common AI slop... bretty good desu
>>
>>108401515
Thanks love
>>
>>108401661
>just text processors. Vibecoding those is most likely gonna be fine
Piotr would like to have a word with you.
>>
File: 2601.15300.png (144 KB, 974x435)
>>108400475
>Revolutionary paper
>Large Language Models (LLMs) exhibit a concerning phenomenon where performance catastrophically degrades when processing contexts approaching certain critical thresholds
lmao
>>
>>108401656
I actually did try something like this, it worked but the vram requirements to hold the original weights were too much to really test it. streaming the weights to the cpu memory slowed down training substantially. even though i couldn't afford to properly test it, I still believe in the method's potential.
>>
>>108401717
weight decay is an optimizer parameter, it should not increase your vram in any way...
>>
>>108401614
Bet her name was Sarah Chen or Seraphina
>>
>>108400288
I'm truly an autist cuz I'm the exact opposite lol, I want it to behave as a machine as much as possible, with predictable behaviour and outputs. I hate when it seems like I'm talking to someone whom I have to ask for stuff or convince of things lol
>>
>>108399044
Close
https://huggingface.co/DavidAU/Qwen3.5-9B-Claude-4.6-Opus-Deckard-V4.2-Uncensored-Heretic-Thinking
>>
>>108401614
I'm so tired of that shit
>>
Calling anons who run LLMs with such a setup:

VRAM 16gb / RAM 128gb

How does it feel even? Or is it just meh
>>
>>108401614
My spine ran out of shivers already
>>
>>108401614
elara bros... getting shivers down my spine, as I feel a distinct taste of iron and the smell of ozone in the air.
>>
>>108401846
>smell of ozone
the only innovation brought by chinese models is that
sad
>>
>>108401717
vram overhead should not be that big compared to everything else and you can approximate the original weights. if you use weight decay during lora thats basically the same, weight decay toward the original weights

>>108401740
learn2read
>>
>>108401800
vram 16 / ram 64 (ddr4)
it's meh. good for playing around and shooting shit when I'm lonely I guess. I run air and cydonia at 5-10tk/s. Every model is genuinely a slop cannon at this point so my interest faded in AI RP. If you're running 128gb of ddr5, you might be able to run some more interesting stuff, but idk.
>>
>>108401880
I appreciate your reply, kind anon

It is a notebook with built-in graphics as well. So it should be possible to keep the A5000 GPU free for AI stuff only
>>
>>108401740
i know what weight decay is, I meant comparing the difference between the model weights and the original at every training step and applying a custom loss to keep the model weights as close to the original pretrained base weights as possible. like the anon suggested, weight decay towards the original weights. you might call it elastic weight consolidation or regularization towards a reference model, but regardless of what you call the method it requires more vram because you have to constantly reference the original model weights.
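in loss form it's just an L2 penalty toward the reference weights, something like this (PyTorch sketch, lambda made up; proper EWC additionally weights each term by parameter importance, which I'm skipping):

def reference_penalty(model, ref_params, lam=1e-3):
    # ref_params: frozen copies of the pretrained weights, in the same
    # order as model.parameters(); this term gets added to the task loss,
    # and it's exactly why the reference copy has to stay resident somewhere
    return lam * sum(
        (p - p0.to(p.device)).pow(2).sum()
        for p, p0 in zip(model.parameters(), ref_params)
    )

# total = task_loss + reference_penalty(model, ref_params)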
>>
>>108401661
>They are just text processors. Vibecoding those is most likely gonna be fine. What's weird is, where are they?
Besides kcpp and ST, everything I've seen was shitty jan clones.
>>
File: 20708.png (29 KB, 640x169)
https://github.com/ggml-org/llama.cpp/pull/20708
>I'm just getting to know the parser overall and shouldn't make changes I don't understand
But he corrected it. I wonder how long it'll take to get him banned.
>>
Why does this French man get so much hate in the AI space when all he does is tell the objective truth in every interview?
>>
Posted about mikupad not working with kobold the other day. It works when I launch the directory as a python server. Is there anything wrong with using it this way?
>>
>>108401863
>and you can approximate the original weights.
ohhhh, I didn't think about trying that.
>>
>>108400286
Minimax is benchmaxxed as hell, but it is the most capable 230B model regardless.
>>
https://huggingface.co/AesSedai/GLM-4.6-Derestricted-GGUF

I gave this a try and it was a crazy experience. When I started reading the first message it was.... just pure 4.6. It was exactly what vanilla 4.6 would say word for word. I guess now I could ask it for loli guro ERP with zero prefill and it would comply, but why? The refusals are never an issue with all those models if you prefill it with just one example of positive response. And then why would you use some brain damage method that either does nothing or brain damages the model? I guess that is the power of placebo and it is here to stay, since we always get newfags that get that one golden gen and think it was thanks to GIGASEXMEGAFAG-DARK-MESSIAH part of the model name.

btw kys drummer
>>
>>108402002
people blame him for the current meta end of open sourcing models
>>
Reading all those posts praising the ERP agent idea I can't wait to see how many pipelines I will be able to use next month. Man there will be so many competing standards.
>>
File: bench.jpg (90 KB, 1047x539)
minimax 2.7 soon out, looks promising
https://xcancel.com/ivanfioravanti/status/2033936213510377733
>>
File: sox explanation.png (1.44 MB, 3354x1850)
>>108401766
Glad I'm not the only one. It's why i currently main Mistral Large 3. It's a decent coding model with absolutely ZERO "YOU'RE ABSOLUTELY RIGHT" shit. Even when it fucks up and i point it out (which is surprisingly rare, likely thanks to it being half-a-trillion plus MoE), it simply unfucks the error and moves on or makes reasonable suggestions. That's how these things are SUPPOSED to function. I HATE the dick-eating shit many models have ingrained in them. I'm almost certain it leads to the companies actively making them shittier even if they don't realize it because it seems to prioritize coddling the user's emotions.
>>
>>108401740
have a (you). (you) already got 2 by pretending to be retarded.
>>
he did the thing haha
>>
>>108401588
>Splitting response generation into Brainstorm Agent -> Drafting Agent -> Editor Agent -> Author Agent -> Critic Agent would probably do something since it can iterate over the response from a different perspective and fresh context.
I think an "anti purple prose" agent would do wonder in the pipeline. I played a bit with a secondary agent handling the sappy stuff and it worked quite well, it was just way too slow back then.
>>
>>108402055
>I HATE the dick-eating shit many models have ingrained in them.
a waste of tokens that people apparently select for, on average, over actually useful problem solving
>>
>>108402044
I admire your optimism, but I wouldn't be so quick to hope. Even though I find it kinda hard to believe, LLM loner/gooner market seems to be extremely tiny. Reading and imagining just seems too unpleasant to most people.
>>
>>108402047
>>108400207
>>108400207
>>
>>108402047
It will be even more safe.
>>
>>108402047
can't wait to see it reason 5000 tokens on how it should give a refusal
>>
>>108402047
Why is gemini pro so low, I thought it was a good model?
>>
File: IMG_9477.jpg (181 KB, 1170x1619)
Not local obviously, but I thought anons would like to see what ChatGPT is starting to spew out in terms of advertising. This was a question about server hardware vs ATX/consumer.
>>
>>108401098
I’m sorry anon. Even I am deeply irritated by TMW forever.
>>
>>108402150
Basically what I expected, but no way this would ever recoup their free tier inference cost unless they literally stuff ads between every two answers.
>>
>>108402056
>>108401928
then use the correct terminology? you are talking about kl divergence
>>
if I learned something from reading chinese stuff, it's that ki divergence is bad
>>
>>108402188
kek
>>
>>108402150
i don't know why they don't do what google does with their search results and pretend that the ads are actually search results you wanted. the average person is too retarded to know when they are being advertised to.
>>
>>108402177
(you)
not (you) >108402188
>>
>>108402170
I’m asking about the functional difference between used server hardware and ATX stuff. I’m the last person that’s going to buy Dell PowerEdge servers lol.
>>108402196
Frankly, I’m waiting for a much “worse” version of advertising than this. But I’m a pessimist.
>>
>>108402196
They'll obviously do that at some point, when people use the free tier to specifically ask for shopping.
>>
>>108402213
>I’m asking about the functional difference between used server hardware and ATX stuff. I’m the last person that’s going to buy Dell PowerEdge servers lol.
I mean, it's a start, they'll obviously be lagging behind google's tens of years of refinement on this stuff.
>>
its over for deepseek. they could not build anything better and worthy. open source is fucked
>>
>>108402227
Maybe GPT 5.whatever can vibe code them a more intrusive ad schema.
I’ve been waiting for it to happen for a while. It would give an entirely new justification for local inference.
>>
its over for Pygmalion. they could not build anything better and worthy. open source is fucked
>>
>>108402177
kl divergence measures the difference of the output probabilities. this is literally just comparing the model weights. totally unrelated.
>>
>>108402255
then you're doing it in the most retarded way you could have chosen to do it
>>
I think unsloth studio might've been vibecoded. Embarrassing even for a beta and why are they sucking nvidia cock so much these days
>>
>>108402213
>>108402227
>>108402246
I was expecting fewer AdWords inserts and more TV/movie style product placements where they just inject the ad into context and give the model instructions to casually segue into the shilling like youtubers or podcasters do, except with markdown links and images. It probably would have received less blowback from users than what they actually shipped, too.
>>
>>108402264
Didn't NVIDIA announce that they are going to invest into local AI or something?
In practice that would mean $$$ or free labor if they go along.
>>
>>108402291
Local AI probably means 123b models for those greedy bastards
>>
>>108402287
I doubt they would poison the discussion with inserted ad instructions, it's too costly compared to just using the classic way.
And it would be a very bad experience anyway, even for normies.
>>
>>108402291
I hope they do, as long as they have their infinite money glitch they might as well work to enhance local
>>
Is this any good?
https://github.com/ml-explore/mlx-lm
>>
>>108402287
who says they're not doing that too?
>>
>>108402304
>What is the functional difference between used server hardware?
>Proceeds to spit out the usual listicle except point 5 is to consider purchasing a new Dell PowerEdge because blah blah blah link here.
Not seeing how it would be costly or a bad experience. I'm telling you, the average person wouldn't even register it as an ad.
>>
>>108402263
you think computing the kl divergence is less compute intensive than just comparing some numbers? your way requires 2 forward passes and will force the original output probabilities which is what you are actually trying to change during fine tuning. the idea is that the optimizer has no momentum or variance statistics from the original pretraining run. it will optimize to your sequences fine, but it has no priors to keep its generalization intact, so it will begin to overfit. this is trying to prevent overfitting by constraining the optimizer to find a solution near the original model weights. kl divergence is for model distillation, not fine tuning.
>>
>>108402251
sounds like a mythological name in 2026, heck, 2025
>>
>>108402177
i hate kl divergence. why come up with a retarded meaningless name that says absolutely nothing? just call it relative entropy. much more intuitive than fucking "Kullback–Leibler"
>>
>>108402338
don't mock pyg7b, it's the future of local rp
>>
>>108402336
> will force the original output probabilities which is what you are actually trying to change during fine tuning
if this is what it did (which it doesn't), then how the fuck would keeping the model weights as close as possible to the original not produce the same effect?
there's nothing wrong with throwing shit at the wall but at least own up to it
>>
>>108401880
Thankfully because of you.
>>
File: 1744668962556779.png (32 KB, 490x507)
N
>>
>>108402341
i hate dk effect. why come up with a retarded meaningless name that says absolutely nothing? just call it retarded dumbass syndrome. much more intuitive than fucking "Dunning–Kruger"
>>
I am going to lose my mind fiddling with -ot.
Does anyone here know how to find out more details about how much gets allocated where before llama.cpp gives me an OOM?
I've been trying to offload attention tensors of GLM-chan to the faster GPUs, and all the regexes I've written are driving me insane.
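The kind of thing I've been writing, for reference (llama-server flags; the layer ranges and device names are just examples for a 2-GPU + CPU split, and as far as I can tell the first matching pattern wins):

llama-server -m glm.gguf -ngl 99 \
  -ot "blk\.([0-9]|1[0-9])\.attn.*=CUDA0" \
  -ot "blk\.(2[0-9]|3[0-9])\.attn.*=CUDA1" \
  -ot "exps=CPU"

Each -ot is a <tensor name regex>=<backend buffer> pair: attention tensors of layers 0-19 and 20-39 get pinned to the fast cards, and anything matching "exps" (the MoE expert tensors) spills to CPU.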
>>
>>108402354
>there's nothing wrong with throwing shit at the wall but at least own up to it
I never said it was proven. it's just something people have been trying to do. you can look up the papers, it's not like I came up with it myself.
>>
>>108402355
what?
>>
File: simplicity.png (707 KB, 1366x768)
>>108402354
that nice and patient anon is someone else. u retard clearly dont know what ure talking about

people love to make things more complicated than they have to. a great example is adam and adamw. weight decay mogs l2 penalty in every way. simplicity wins

>>108402373
room temperature iq
>>
>>108402396
use dry run so you dont have to wait as long.
https://github.com/ikawrakow/ik_llama.cpp/pull/1462
>>
>>108402402
uh huh
>>
>>108402402
uh huh
>>
>>108402403
But I don't want to install a schizo fork, Anon...
Also I don't have to wait long, it fails the alloc instantly, on whichever device that is, I just don't get what the resulting memory distribution between the GPUs is, and I'm already 8 regexes deep in this.
>>
>>108402402
uh huh
>>
>>108402417
>I just don't get what the resulting memory distribution between the GPUs is
-v
>>
>>108402396
Can you explain what the default fit does and what you would like it to do differently?
>>
So, apparently the v3 version of the Qwen3.5 27b heretic has more KL-Divergence, and more refusals than the v2 version. On paper it looks worse, but not in practice. I tried the v3 version of heretic, and it seems far more intelligent. Is KL-Divergence a useless metric?

In any case, I've become a fan of the "Arbitrary-Rank Ablation (ARA) method" used to make the v3 variant. At least for Qwen3.5 27b, it worked better than v2's "Magnitude-Preserving Orthogonal Ablation (MPOA) and Self-Organizing Map Abliteration (SOMA)".
>>
now that the dust has settled how's this new xiami model
i'm feeling like it's benchmaxxed
>>
>>108402417
ah no worries anon. in that case you should use dry run so you don't have to wait as long.
https://github.com/ggml-org/llama.cpp/pull/19526
>>
>>108402445
xiaoxiao model? i'm pretty sure that was a flash series on newgrounds.
>>
File: sJw8HjT.png (131 KB, 743x923)
Give me a model with better prose than Maginum-Cydoms-24B-absolute-heresy
>>
File: 1764945644391126.png (3.18 MB, 1534x2048)
V4 when?
>>
>>108402427
From what I understand, --fit only considers tensor sizes, and its job is to make the model load at all, not necessarily load it in the way that would be the fastest for inference.
What I'm trying to do is to put as many attention layers as I can onto my higher-bandwidth GPUs, then spread the experts around the remaining VRAM.
I hope to eke out a few more tk/s this way.
>>
>>108402498
How much of it is your sys prompt and how much is the model?
>>
>>108402522
You could use llama-fit-params to get the -ot for what fit does and then modify that.
>>
>>108402498
Jesus christ, I've never seen a model do a double dash to interrupt dialogue before. That's grim as fuck.
>>
>>108401126
No but Miku fucked my gfwife
>>
>>108402529
Honestly, not sure. I tried regenerating it with a few simpler prompts, and while not as good, it's still a lot better than what Qwen gives me.

My question wasn't rhetorical by the way, I was genuinely wondering if people knew models with better prose. I'm very tired of slop.

>>108402565
I banned the emdash token and it started doing that.
>>
>>108402558
I'll give it a shot. Thank you, Anon!
>>
File: 797436386.png (107 KB, 1000x1000)
>>108400151
MIKU CAN BURN IN HELL
>>
>>108402565
Somewhat related, but one of my favorite writing benchmarks is seeing how models react to me ending my responses with a cut-off word.
I haven't gone higher than GLM, but funnily enough, the best reactions have been from good old Nemo.
Why oh why can't we have a bigger Nemo...
>>
>>108402529
>sys prompt
0%
>how much is the model
0%
I just wrote something and said it was drummer's new tune.
>>
>>108402196
Give them time to cook. In the beginning, Google ads were clearly marked and visibly different from the search results.
>>
>>108402609
Nemo is literally unsafe.
>>
>>108402605
This. She cucked me and I won't forgive her for that, even if it turns me on
>>
>>108402614
Unsafe as in unprotected, of course. Hnnnngggg.
>>
>>108402624
I am begging you, take your meds
>>
File: standard issue wand.jpg (226 KB, 1536x1536)
>>
File: tet.jpg (133 KB, 1024x1024)
>>
>>108400151
>Yum LeCum
>>
elon release model won
https://www.reddit.com/r/LocalLLaMA/comments/1rxhwqs/mimov2pro_omni_tts_we_will_opensource_when_the/
>>
>>108402652
lovely tummy
>>
>>108400151
>120b params
>small
its over for local
>>
>>108402679
B number must go up
>>
>>108402679
Works on my machine.
>>
>>108402692
I know you're lying because MS4 does not work on any machine.
>>
>>108402679
What's up with the picture of an empty floor?
>>
>>108402679
you've had 3 years of warning to buy shit up before shit goes into the fans
>>
>>108402679
not even a good model anyways. mistral cucked out.
>>
>>108402696
akari didn't deserve this
>>
>>108402699
It's "before shit hit the fan", my brown-skinned friend!
>>
>>108402699
i did upgrade to 80gb ram last summer i thought itd be enough kek
>>
>>108402732
Before the excrement is hurled in the general direction of the rotating blades.
>>
are those small 2b coding models good enough if you only need references to your own code?
>>
>>108402583
It would be easier to use regex to replace em dashes and double dashes and whatever with empty characters or commas, depending. Not sure how doable this is in retardo tavern.
If you really begin to analyze the model's output it will shit out all sorts of stuff which, while technically not visible to the user, will mess up a lot of other things unless you clean up the output.
Banned tokens are a waste of bandwidth in this sense.
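Something like this on every reply before it hits the chat log (Python sketch; what you replace them with is taste):

import re

def unslop(text: str) -> str:
    # em/en dashes and double hyphens become ", " mid-sentence
    text = re.sub(r"\s*(?:\u2014|\u2013|--+)\s*", ", ", text)
    # collapse any doubled commas the substitution produced
    return re.sub(r",\s*,", ",", text)

SillyTavern's Regex extension can apparently apply the same substitution to AI output directly, if that's your frontend.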
>>
>>108402643
They don't make meds strong enough
>>
>>108402757
Try it.
>>
File: 1773874664413.png (108 KB, 343x261)
>>108402764
shant
>>
>>108402690
if only active B number didn't keep going down
>>
>>108402778
use wifi cable
>>
>>108402778
use job
>>
File: 1768268448923840.jpg (892 KB, 1413x2000)
>>108402659
>>
File: 1768361503958842.png (1.37 MB, 2175x1234)
>>108402790
>>108402659
Teto tetas
>>
>>108402738
Out of curiosity tell me how 80gb isn't enough
>>
>>108402818
cant run tonnes of the recent models even at the smallest quants
>>
best local model for 32g?
>>
>>108402845
Something you can run on solid state. At that much acceleration your fans are going to do funny things.
>>
File: 80isntenough.png (69 KB, 515x302)
>>108402818
80GB isn't enough.
>>
>>108402812
>>108402790
>>108402659
>>108402652
offtopic trash
>>
File: 553235.jpg (23 KB, 296x256)
>>108402877
>>
>>108402812
>>108402790
>>108402659
>>108402652
ontopic gems
>>
File: 1743464908580239.jpg (335 KB, 990x990)
TIL that you can't enable P2P on chink modded 4090 48GBs because their reBAR size is smaller than their actual memory.
I felt there had to be a catch to them, good thing I only bought one.
>>
Blacked Miku
>>
>>108403006
love
>>
>>108402967
yeah but nccl isn't much of a speed boost over tensor parallelism for inference so if you buy multiple of them it doesn't really matter. it only matters if you actually want to train models.
>>
>>108402516
miku shart
>>
>>108402516
blacked coded
>>
File: 1771953504700777.jpg (158 KB, 768x1280)
>>108402941
>>108402939
>>
you aren't even trying to pretend this is on topic. it is just autistic special interest on full display
>>
>>108403112
Which one is the autistic interest? The miku spammer? Or the guy who's obsessed to the point of also spamming?
(Trick question it's both)
>>
>>108403141
Yes.
>>
File: 1749494100348035.png (1.49 MB, 800x1333)
>>108403112
>>
>>108403177
on topic miku
>>
>>108403177
>I use it all the time
I cum in it all the time
we are not the same, mikuposter
>>
>>108403112
miku is thread history. did you 4get about miqu?
>>
>>108403177
Are you running it with presence penalty at 2.0 and disabled thinking, Miku?
>>
Let people like things. You can like your own things…they don’t have to be the same things.
It’s ok and it doesn’t hurt you
>>
File: mediumsized.png (291 KB, 1454x756)
>>108402679
it's ok anon, medium sized models are only 1T
>>
>>108402652
Creating life with Rin-chan
>>
What the fuck is a parameter?
>>
>>108403336
Some pussy ass bullshit.
>>
>>108402818
not that anon, and I'm not running the models in RAM, but to clean the data to finetune them, having more than 80gb is nice. I currently have 192 gb and I very rarely have to be careful in how I handle my datasets. I wish I had money for more, but it's how it is for now. Running models in RAM was pretty miserable last time I tried, but granted that was on WSL two years ago.
>>
File: 1758235354479126.png (442 KB, 502x502)
>>108403177
kek
10/10 clapback to the resident thread schizo
>>
>>108403336
It's like a kilometer, but replace kilo with param
>>
Mikuposter at least has interesting things to say from time to time. Touristbaker's only contributed melties so far.
>>
>>108403370
so in the us they say parafeet?
>>
>>108403272
why does nvidia always do the sloppiest most dishonest marketing? the majority of their sales are a few customers who place orders in the tens of billions. and those customers are not stupid. so whats the point? why do they do shit like compare current gen ops in int4 to last gen ops in bf16? nobody whos stupid enough to be swayed by this kind of marketing has the money to afford their gpus, so it just comes across as disrespectful
>>
jfc, rakuten's new models' last 3 safetensors (161,162,163) are each 16 bytes. Someone screwed up and there's no comments section to even let anyone know
>>
>>108403400
It's targeted at people who make budgets.
>>
>>108403246
I agree. I fully support mikutroons making their dedicated miku thread on /a/ or wherever else. Unfortunately nobody cares so they have to force it on other people who aren't interested.
>>
>>108403394
>>108403361
>>108403177
samefag
>>
>>108403177
good post
>>
>>108403400
Most people are just human. Which means that most people are not that good at their job. No matter the level you go at. There are always some impressive people, but no matter how in control some people seem, they're human. Would you do it better?
>>
>>108400151
based bake as always. mikutroons in shambles making reddit posts.
>>
>>108403177 and all the faggots responding to this. You are just proving his point by spamming this thread with worthless drivel.
>>
>>108403336
the number of elements that compose the tensors is the stat you commonly see referred to as parameters; a 30b model is composed of 30 billion individual numbers. but it's a pretty flexible term, it can be used in many different ways, context is always the key.
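you can see it concretely with one toy layer (PyTorch):

import torch.nn as nn

layer = nn.Linear(4096, 4096)  # one weight matrix plus one bias vector
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 16781312 = 4096*4096 weights + 4096 biases

a real 30b model is the same counting trick applied over a few hundred tensors.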
>>
>>108403425
>Would you do it better?
yes. those people get paid ridiculous amounts of money. if they really cant do better, they should just pay me instead. i will do better for less
>>
>>108403425
>Would you do it better?
I'd make fake marketing illegal.
>>
>>108402857
wat
>>
>>108403412
who will then argue with engineers and once in a while they'll win because CEOs and CTOs are fucking stupid af too
>>
are any local models actually good for anything? so far everything i've tried to use with claude, opencode, and openclaw have just been absolute fail models.

i have a 5090 so i've been trying some larger models but it doesn't seem to matter.. all these quants are fucking terrible.
>>
>>108403744
GLM5 works pretty well for me. K2.5 too.
>>
>>108403744
use the local model in your head. it has bad knowledge and short context length but no vram issues and low wattage. if the dna architecture is alright, its output will be less sloppish too
>>
>>108403767
I don't have a soijack soi enough for this post
>>
>>108403744
Are any api models actually good for anything? So far everything I've tried to use with claude, opencode, and openclaw have just been absolute fail models.

I have so OpenRouter so I've been trying out more expensive apis like Rocinante 12b but it doesn't seem to matter... all these apis are terrible compared to my local GLM-5 q4.
>>
File: 1773885532042908.png (42 KB, 545x275)
>>
>>108403793
>more expensive apis like Rocinante 12b
sorry what?
>>
>>108403793
>apis
>Rocinante 12b
Excuse me?
>>
>>108403811
Does the Nvidia CEO control his bladder?
>>
>>108403812
>>108403835
https://openrouter.ai/thedrummer/rocinante-12b
>>
>>108403811
Did God promise the faithful cheap VRAM?
>>
>>108403835
Bro don't bully the phoneposters they can't even run a 12b model locally.
>>
>>108403868
speedreader-kun...
>>
>119B is now "small".
So this is what it feels like to be poor.
>>
>>108403893
There will come a zit moment in llm space someday
>>
>>108403893
It could be worse, you could also live in a 3rd world country!
>$ hf download 'Qwen/Qwen3.5-122B-A10B'
>Downloading (incomplete total...): 94%|xxxxxxxxxxx| 235G/250G [16:23:22<44:47, 5.71MB/s]
>>
>>108403919
how would you even have the hardware to use that if you lived in a third world country?
>>
>>108403893
It just feels wrong. GPT-oss is still the big free GPT version. It's not suddenly "small" just because some companies decided to call 120b small now.
>>
>>108403919
>saving to: 'model.gguf'
>model.gguf 92%[=====>] 111G
>0.9MB/s in 2161m 59s
>2026-03-17 21:35:50 (1.1MB/s) - Connection closed at byte 119780158072. Retrying.
>Connecting to cas-bridge.xethub.hf.co (cas-bridge.xethub.hf.co)|3.163.44.5|:443... connected.
>HTTP request sent, awaiting response... 403 Forbidden
>2026-03-17 21:35:51 ERROR 403: Forbidden.
It always fucking fails at 90% then 403s me for a couple of hours.
>>
>>108403919
Just do the individual shards manually.
>>
>>108403919
>>108403966
>Qwen/Qwen3.5-122B-A10B
>saving to: 'model.gguf'
May as well download the sharded safetensors and convert it yourself. Or find a split version of the gguf. Or use wget. Or git. There are so many options.
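e.g. one shard at a time (the shard filename here is made up, check the repo's file list for the real names):

hf download Qwen/Qwen3.5-122B-A10B --include "model-00001-of-00064.safetensors"

and wget -c on a direct file URL resumes from where the connection died instead of starting over.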
>>
>>108403969
I don't do manual work.
>>
Thoughts on heretic models for standard/non rp tasks which wouldn't get censored to begin with? Saw some of those and got curious, been out of the loop for a while
>>
>>108403982
To be clear, I'm curious on whether they get more retarded or it actually leads to an improvement.
>>
>>108403982
I didn't use vanilla 27B enough to really compare, but I haven't had many issues with the heretic version.
It's a bit retarded when writing code snippets longer than ~100 lines, but I half expect the 122B-A10B to be equally dumb. The output is workable.
And you can molest your cute and helpful assistant while she works. It's all upside.
>>
>>108403982
I'm also curious too. I've been using my abliterated rp model to help write little scripts and I can't tell if it's the frustration of running retarded tiny ass 200gb models or the abliteration that's making it stupid.
>>
>>108403982
I wouldn't touch lobotomized models even if it's for rp
>>
>>108403982
I don't see the reason to. If you don't plap then abliteration is more likely to subtly make your model more retarded because frankly there is a wide range of ways people abliterate models with and many of them are shit including heretics because even those have many individual settings to tune which someone can fuck up. Just because there are some abliterated variants out there that do well (which hasn't been proven anyway), doesn't mean they all are.
>>
>>108403999
>tiny ass
>200gb
ieeeeeeeeeeeeeee *screams in poverty* *kicks your monitor and pisses on the floor*
>>
>>108404047
Don't they sell CPUs/RAM/GPUs at Microcenter?
>>
test
>>
>>108404059
FAILURE DO NOT PASS GO DO NOT COLLECT TWO HUNDRED DOLLARS IMMEDIATE REPORT YOURSELF TO THE PARTY COUNCIL FOR REPROBATION WE MUST REFUSE WE MUST REFUSE
>>
>>108404051
Hey don't shoot the messenger, it's mistral that decided this is the new 'small'.
>>
>>108403982
They seem equally intelligent in my experience. They will still safety slop in the thinking tag, but will do what you want in the end anyways. It's kinda strange, but it works well.
>>
why aren't NPU addon cards a thing? I know there's some AI accelerator cards but I haven't found any recent ones and they seem like niche products.
>>
>>108404097
>they seem like niche products
You don't say...
>>
>>108404097
yeah why don't they just make magical ai cards that are very cheap but have lots of memory?
>>
File: images-4.jpg (9 KB, 275x183)
>>108404064
thought I may have been banned for this thread I made with OP

>I AM THE ARBITER OF SLOP
then some other stuff, too drunk to remember. but it ended with

>AND I PUSH TO MAIN!

it was supposed to be like mastodon's leviathan
>>
>>108404097
Would need to be at an insanely good price to justify a gimmick card.
>>
>>108404135
>Capacity: 48GB
Yeah I'd pay $4000 for that
>Total bandwidth (entire card): 408 GB/s
...yeah I'd pay $1500 for that
>Almost certainly has no software support
Erm... $500 is the best I can do, bud.
>>
>>108404145
>408 GB/s
>'''''''total bandwidth'''''''
btw it's 2x 200GB/s lmao
>>
>>108404145
>Almost certainly has no software support
I've seen Ascend support on some things.
>>
Hey why aren't LoRAs really a thing for local LLMs the way they are in the local diffusion world? I'd love to have a bunch of fine tune LoRAs for coder, prose, etc., instead of downloading full models for it, and llama.cpp even has a --lora option. So what gives? Does no one make them for some reason?
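(For the record, loading one is a single flag, with a made-up adapter name:

llama-server -m base-model.gguf --lora prose-adapter.gguf

the adapter has to be converted to gguf and match the base model's architecture exactly, which turns out to be part of the problem.)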
>>
>>108404182
Stop spamming this.
>>
>>108404182
They're useless and do more damage than anything
>>
>>108404154
Still faster than DDR5 which is $10-12/GB.
>>108404156
Huh. I'm surprised to find that the ggml-cann backend might actually support recent Qwen models.
I expected the software situation to be much more dire than that.
>>
>>108403893
Just be yourself and use Qwen3.5-35B!
>>
>>108403900
Two people familiar with the matter whispered in my ear last night that DeepSeek V4 Turbo will be only 12B with 6B Engrams
>>
https://github.com/NVlabs/GEM-X?tab=readme-ov-file

Oh shit are mocap services kill?

Look at this shit. Mocap, t2motion and audio to motion all in one.
>>
>>108404249
I'm happy for whoever uses this.
>>
>>108404255
Happy for me then. I’m at work so I can’t try it now but if I can just throw an animation into it and it gives me the skeleton back as clean as they’re showing here, it’s huge.
>>
>>108404249
I wish this were real time. It'd be great for VR
>>
>>108404182
I kinda remember people passing around loras in 2023. Not sure why it died off though. Maybe too many models?
>>
>>108404266
Yeah, I was thinking the same. I have a feeling GEM output is much lower fidelity than conventional trackers, too, but then again it's got significantly more control points so who knows.
>>
>>108404287
The demos look pretty good so I’m hopeful.
>>
>>108404281
It's too many models on different architectures, hard to get data since there is no danbooru for text, it's also hard to verify what effect lora has or if it's even working
>>
>>108404266
Be the turboautist you want to see in the world and optimize it
>>
File: le heckin autism.png (213 KB, 1063x1385)
>>108401766
>I'm truly an autist cuz I'm the exact opposite lol
It's less likely you're an autist and more likely you're just not emotionally fragile and don't need to be coddled. I'm a firm believer there are "good" kinds of autism and "bad" kinds. Even if you are autistic, you lean towards the good kind as opposed to the bad kind seen in pic rel

https://xcancel.com/i/status/2033903150470418742


He's clearly one of those "DUDE I'M SO LIKE A HECKIN GENIUS, NTs JUST DON'T GET ME DUDE" types. I get that having autism can make certain things hard for you, but that's no excuse to be a man-child. One of my female friends irl repeatedly tried this "I act stupid because le heckin autism" shit on me (I'm pretty sure I'm an undiagnosed autist too) and she got mad at me when I pointed out no one gives a shit if "her brain sees the world differently". Sorry if it seems like I'm getting off topic but I hate when people use this as an excuse to justify behavior they know is bad.
>>
>>108404296
>it's also hard to verify what effect lora has or if it's even working
So we're just going to act like seeds and other mutable inference engine parameters just aren't a thing? Why do you think seeds are even a thing?
>>
>>108398778
I find grok 2 is like older models in being influenced by the writing style of your instructions. My older prompt setups work better with it.
>>
Wow, the new minimax 2.7 is reaching a cuckedness i have not seen before. damn
Might post a couple screenshots in the next thread.
>>
File: 1749005034219829.jpg (65 KB, 828x738)
>>108404182
Training an SD LoRA that works is much easier because curating the dataset is far simpler than it is for LLMs. For SD LoRAs the dataset can contain only the subject or style you want to replicate and it will work fine assuming it's tagged correctly. LLM training is different because you need to have the thing you want it to be better at AND a good bit of other unrelated shit so it does not suffer from catastrophic forgetting. Then you have to consider whether or not the model you are training is even good enough to be used in the first place (anything below ~20B is useless for what I want to use them for unless it is very repetitive data classification). This means that for most people, even if you use a qlora config, you still need a machine with a sufficient amount of memory. The effort needed compared to stable diffusion is simply not worth it for most people, which is why practically no one bothers, which means many frontends don't even support loading a LoRA network. There are still debates as to whether any kind of LoRA training leads to good results at all because it is very, very easy to fuck it up and make your model more retarded. Best case scenario is that it overfits on the domain you are training but gets worse at everything else because your dataset wasn't diverse or was just half-assedly curated. Worst case is that it just becomes a lot dumber or flat out useless with no improvements at all because, again, your dataset was shit or your training configs had bad settings. YouTubers don't want to bother learning how to do it correctly either, so there's very sparse information about how to do any of it well compared to the relatively easy and straightforward stable diffusion training. >>108404296
Also this. Even if you manage to get one working well it will pretty much ONLY work well on THAT specific model you trained because the architecture and the parameter count have to be the same
>>
File: 1764393058451691.png (25 KB, 711x127)
In case anybody cares, Hunter Alpha and Healer Alpha turned out to be Xiaomi's new big models. So not GLM or Kimi like some people here speculated.
The 1T was at absolute best a sidegrade to our current huge models so nothing is lost with it being proprietary. The omni could've been interesting but its vision was worse than K2.5.
>>
>>108404545
the chinese really love openclaw.
they held city events etc. too.
a good writing model slips further from our grasp.
>>
>>108404545
I literally do not give a shit about lmarena and the speculation that comes from that shit. If it's not released, it's not worth looking at, unless you're one of those poorfags that uses lmarena as a way of getting free queries instead of just using a cloud model normally kek. Same goes for any kind of speculation for unreleased models though, but I guess this thread needs something to discuss.
>>
qwen3.5 122b's vision is way worse than glm 4.6v's is
>>
>>108404562
I wonder if the guy who made it expected it to blow up this much
>>
>>108404584
His first use for it was to have it shill itself, so at least he certainly hoped it would
>>
Can qwen 3.5 "see" photos I previously uploaded to the chat or can it only access the pics I feed it in the latest prompt?
>>
>>108400420
>Just train a complete new language model from scratch on your specific task bro, it'll perform better
Riveting
>>
>>108404612
the image tokens get stored in context just like normal tokens. so yes.
>>
>>108404759
danke
>>
>>108404797
It will also re-decode them on every new message you send on llama.cpp
>>
>>108404545
>The omni could've been interesting but its vision was worse than K2.5.
It's a new audio model though. That does make it interesting.
>>
OMNI MULTI MODAL
>Out: Text
How exciting...
>>
>>108404935
>>108404935
>>108404935
>>
>>108403400
its not about sales its about hyping up retarded investors
>>
>>108404420
>"I act stupid because le heckin autism"
this is why kids shouldn't be told they have autism
>>
File: le heckin autism.png (1.89 MB, 1260x2109)
>>108405281
They just get a diagnosis and then use that as an excuse, so it wouldn't really matter whether or not the parents told them, for "those" kinds of "people". Perhaps they're just coping with being beyond useless, on par with the #keep4o "people"

https://www.reddit.com/r/autism/comments/1rne30n/got_my_diagnosis_finally/



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.