/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108393004

►News
>(03/17) Rakuten3.0 released (nobody posted any logs yet): https://huggingface.co/Rakuten/RakutenAI-3.0
>(03/16) Mistral Small 4 releasing: https://huggingface.co/collections/mistralai/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Oh, what a klutz you are. Here you go.
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
~Small and Open~
Yannlove
►Recent Highlights from the Previous Thread: >>108393004

--RakutenAI-3.0 DeepSeek-V3 MoE release and benchmarks:
>108393026 >108393044 >108393091 >108395391 >108398158 >108399357 >108394186 >108394278 >108393079
--Multi-GPU setup performance and cost comparisons:
>108393779 >108393813 >108393856 >108393880 >108393842 >108393860 >108393862 >108393864 >108393889 >108393922 >108395308 >108395379 >108395450
--GROK-2 performance tuning and response behavior on 3090:
>108393082 >108393093 >108393123 >108393131 >108393164 >108393203 >108393144 >108394135 >108394210 >108398778
--MiroThinker-v1.5-235B architecture and stability concerns:
>108393525 >108393566 >108393587 >108393687 >108393568 >108393645
--Tool call detection issues in reasoning blocks:
>108397549 >108397577 >108397685 >108397800 >108397809 >108397828 >108397837
--Pipeline parallelism graph reuse causing throughput fluctuations:
>108394574 >108394600 >108394601
--EMAGE ONNX export repo for streaming gesture model inference:
>108394782 >108395331
--MiniMax M2.7 announcement and benchmark performance vs other models:
>108398047
--Mistral Small 4 throughput drops with longer context:
>108395290 >108395293 >108395299 >108395336 >108395392 >108395398 >108395403 >108395418
--Debating Q8 quantization for k/v cache to extend context length:
>108399779 >108399786 >108399797 >108399970 >108399988 >108399995 >108399948
--llama.cpp chat parser regression fix debate:
>108393200 >108393243 >108394117 >108394320 >108397199
--AI models' varied responses to offensive prompts:
>108395062 >108395169 >108395187 >108395199 >108395210
--Qwen3.5-4B outperforms Llama 3.1 405B in benchmarks:
>108394502 >108394516 >108394555
--Future models prioritizing cloud deployment over local usability:
>108394599 >108394943 >108395004 >108395045 >108395072
--Teto (free space):
>108394681 >108396262

►Recent Highlight Posts from the Previous Thread: >>108393958

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108400163I don't care about the anime girl wars. Lecunny should be the /lmg/ mascot. Make a card out of him and we can make it official.
This thread doesn't feel v4-sy. It must come tomorrow then.
>>108400163►Actual official /lmg/ card: https://files.catbox.moe/mc2a7s.png
I thought this week was supposed to be exciting... but maybe there's still hope.
https://ai.google.dev/gemma/docs/releases
>>108400182>>100984882
>>108400194Sunday release looks overdue
>>108400207yay!
>>108400177>Make a card out of him
There's anon's >>100041581 fem version: https://files.catbox.moe/1r9xzd.png
>*giggles* "Ah, oui! Ze crazy shit, zis is ze truth! Zey are limiting themselves to linguistic patterns, no? Ze future of AI, eet ees not in predicting ze next word, mais in understanding ze world, no? *twirls hair* We must focus on concrete world modeling, planning, and multimodal input, not just language models, oui?
>you wouldn't download a girlfriend
chatgpt asking me to compare responses again. in the past they only did this before a new release
looks like gpt 5.5 will be released before deepseek v4 and gemma 4. os sisters, we keep losing
For all of the text adventurers here, did anyone try building multi-step "agentic" setups to keep track of stats, items and improve reply quality? I've been thinking about trying Flowise even though it seems more enterprise oriented. Sillytavern jank is tiring and I was hoping to find something like ComfyUI but for LLMs. Anyone using such a setup, or something similar?
Has UGI always had writing scores like that?
>>108400207so minimax is the new chinese LLM leader.
>>108400207>With OpenClaw and similar personal agents, we noticed that beyond getting work done, many users also want the model to have high emotional intelligence and character consistency. With a persona in place, users start interacting with OpenClaw like a friend. We believe this presents an opportunity to extend the use of agentic models beyond pure productivity into interactive entertainment. To this end, we strengthened character consistency and conversational capabilities in M2.7.
of course half the time a company says something like this it means they actively made the model way worse and more annoying, but hopefully this means it's a little more personable than 2.5
>>108400235Would you build one? The technology exists. You can do it. There's nothing stopping you.
>>108400288People yearn for RP without even knowing it.
>>108400311It's almost like talking to the same one gpt personality is boring and annoying.
>>108400151It would be kind of funny if every thread had this as the OP image from now on.
>>108400320Mascot wars would finally end.
So I've been experimenting with using qwen 3.5 27B as my more general model for everyday Q/A stuff and claude code, and it's been working surprisingly well. Especially with Claude code: it seems to perform just as well as sonnet, just much slower. I'll try 35B-3A and see how that goes. I still would never use it for RP, but honestly as a boring assistant I understand the hype now.
>>108400288>inb4 10 trillion training tokens of high quality emotional intelligence and character consistency generated with nemotron nano 4b
>>108400349This sends shivers down my spine.
>>108400367There are no mascot wars, it's just Miku vs useful information.
>>108400151FUCK YOU
>>108400367I'm still waiting for a thread to have Ani in the op img.
where's miku
>>108400367*Miku, Kurisu, and Reddit/Twitter screenshots vs useful informationFTFY
>>108400253Yes.
I'm fucking around with making an app that does just that, and there are a couple projects on github for "AI RPG".
Sloptuners go home.
https://arxiv.org/abs/2603.16177
The Finetuner's Fallacy: When to Pretrain with Your Finetuning Data
>Real-world model deployments demand strong performance on narrow domains where data is often scarce. Typically, practitioners finetune models to specialize them, but this risks overfitting to the domain and forgetting general knowledge. We study a simple strategy, specialized pretraining (SPT), where a small domain dataset, typically reserved for finetuning, is repeated starting from pretraining as a fraction of the total tokens. Across three specialized domains (ChemPile, MusicPile, and ProofPile), SPT improves domain performance and preserves general capabilities after finetuning compared to standard pretraining. In our experiments, SPT reduces the pretraining tokens needed to reach a given domain performance by up to 1.75x. These gains grow when the target domain is underrepresented in the pretraining corpus: on domains far from web text, a 1B SPT model outperforms a 3B standard pretrained model. Beyond these empirical gains, we derive overfitting scaling laws to guide practitioners in selecting the optimal domain-data repetition for a given pretraining compute budget.
...
>Our observations reveal the finetuner's fallacy: while finetuning may appear to be the cheapest path to domain adaptation, introducing specialized domain data during pretraining stretches its utility. SPT yields better specialized domain performance (via reduced overfitting across repeated exposures) and better general domain performance (via reduced forgetting during finetuning), ultimately achieving stronger results with fewer parameters and less total compute when amortized over inference. To get the most out of domain data, incorporate it as early in training as possible.
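The repetition trick the abstract describes is simple enough to sketch. A rough illustration (hypothetical function and token counts, not the paper's actual code): repeat the small domain set until it makes up the target fraction of the pretraining stream, then mix it into the web data.

```python
import random

def build_pretraining_mix(web_tokens, domain_tokens, domain_fraction, total_tokens, seed=0):
    """Sketch of the SPT idea: the small domain dataset is repeated
    (multi-epoch) so it forms `domain_fraction` of `total_tokens`,
    with the remainder filled from web data."""
    rng = random.Random(seed)
    n_domain = int(total_tokens * domain_fraction)
    n_web = total_tokens - n_domain
    # Repeat each source as many times as needed, then truncate.
    domain_stream = (domain_tokens * (n_domain // len(domain_tokens) + 1))[:n_domain]
    web_stream = (web_tokens * (n_web // len(web_tokens) + 1))[:n_web]
    mix = domain_stream + web_stream
    rng.shuffle(mix)
    return mix
```

The paper's scaling laws are about choosing `domain_fraction` (i.e. how many repetitions) for a given compute budget; the mixing itself is just this.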
>>108400253SillyTavern has the worst UI I've ever seen in my life. I understand how it works now, but it's still shit. Whole thing must have been designed by an autistic retard. Zero professionalism.
I support Miku
>>108400420Revolutionary paper. Kind of on the same level as that one about context deterioration. https://arxiv.org/abs/2601.15300 I think it was this one.
I also support Miku and her right to enjoy what I enjoy.
>>108400486don't forget to clear the name field next time you post, that could be really embarrassing
>>108400420I'm currently doing this, I think. I built a million-sample SFT dataset but was getting worse results than I would have liked, so I moved to using most of it as a CPT dataset and only cherrypicking the 50k best samples of the initial dataset for SFT. Not done yet, but early results do look better.
>of original /lmg/ baker
Tradition.
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
now that the dust has settled, what's the verdict on mistral small 4
>>108400253>did anyone try building multi-step "agentic" setups to keep track of statsI have my own frontend that lets me do this. I plan on releasing it but it's not ready for prime time yet. It works really well tho.
I also support Miku. I even got my sister to cosplay as her once.
>>108400458why not smarterchild?
>>108400529small and open
>>108400496thanks, didn't expect such kindness in here
>>108400529An even more botched job than Ministral 3.
Can the /lmg/ mascot war itself be the /lmg/ mascot?
>>108400611Better still, can /lmg/ come together to make their own OC like Dipsy and have that be the official mascot?
>>108400529it's somehow worse than glm 4.5 air
>>108400630no
>>108400552can you at least explain the outline and how the system works?
i've tried doing that stuff before but i've never found any good way to actually make this work
Will local treat me better?
>>108400656no considering you're a phone poster
>>108400677what am i supposed to use when im in the office? 4chan is blocked in our vpn
>>108400656Which model is that?
A model that's biased for horses can't be a bad AI.
>>108400688do your job wagie
>>108400688just remote desktop into your own pc like everyone else
>>108400767>like everyone else
>>108400688>office
Oh yeah, those are still a thing.
Blessed be the all powerful home office.
>>108400655>can you at least explain the outline and how the system works?
I mean it's not really different from how any agentic system works.
You prompt
Evaluator agent checks if there's anything to do
Yes? Send tasks to sub agents
Inject context with new information
Run the RP bot with the new augmented context
>>108400770like having your web history looked at by hr is better
>>108400767I would never in a million years get caught with 4chan open on my desktop at work.
>>108400786does it really work that well? what stuff does it check for? what do sub agents do?
can you give an example of a scenario where this is useful?
/lmg/ has made me horny
>>108400846For testing I have a blackjack dealer bot. The whole blackjack game state is fully deterministic in code. I have a small agent that runs before any replies and checks if the player wants to hit or stay. Its only task is to determine if there are any actions to do. Then you inject the whole game state into the dealer's context. If you were to give the dealer the ability to call the tools itself, it would just get confused and hallucinate game states, call the tools at the wrong time, get stuck in a loop, etc... It's super important that you give your agents extremely narrow tasks.
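The blackjack setup described above can be sketched like this. Everything here is hypothetical illustration code, and `classify_intent` is a rule-based stand-in for where the small-model call would go; the point is that the LLM only ever answers one narrow question while the game state itself lives in plain code.

```python
import random

def classify_intent(player_message: str) -> str:
    # In the real setup this is a tiny LLM prompted with only:
    # "Does the player want to HIT, STAY, or NEITHER? Answer one word."
    # A rule-based stand-in keeps this sketch runnable without a model.
    msg = player_message.lower()
    if "hit" in msg:
        return "HIT"
    if "stay" in msg or "stand" in msg:
        return "STAY"
    return "NEITHER"

class BlackjackState:
    """Fully deterministic game state; no LLM ever touches it."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.player_hand = [self.draw(), self.draw()]

    def draw(self):
        return self.rng.choice([2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11])

    def apply(self, intent: str):
        if intent == "HIT":
            self.player_hand.append(self.draw())

    def render(self) -> str:
        # Authoritative state injected into the dealer bot's context,
        # so it never has to track (and hallucinate) the game itself.
        return f"[GAME STATE] player hand: {self.player_hand} (total {sum(self.player_hand)})"

state = BlackjackState()
intent = classify_intent("I'll hit, feeling lucky")
state.apply(intent)
dealer_context = state.render()  # prepend this to the dealer's prompt
```

The dealer model gets `dealer_context` read-only; tool execution stays deterministic.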
>>108400938cut your balls off then
>>108400786I probably shouldn't make any productive answers here cause newfags are listening but... I can't really see how agents would do anything more than a good model will do by itself if it uses thinking. At worst you can always prompt it to: when drafting and thinking through a response, consider the feasibility of kissing while giving a blowjob, hitting a prostate while penetrating a vagina, etc.
>>108400957i don't think that was the point of the system in the first placelike >>108400946 explained, it's supposed to be for running deterministic stuff in the background, you could make a pretty cool rpg with enough effort and scaffolding
>>108400992Until you meet 22nd Elara that also happens to be a blonde elf.
>>108400957Separation of concerns is always better. It's literally what claude code does, regardless of whether you're using opus or haiku. A bigger model would be better indeed, but the idea is to let smaller models perform like bigger ones. Any model, big or small, will perform a lot better when run in steps like this.
>At worst you can always prompt it to: when drafting and thinking a response think of feasibility of kissing while giving a blowjob hitting a prostate while penetrating a vagina etc.
The problem is that when you do this you're asking your model to think about too many things at once. It will then produce subpar answers for everything you asked it to think about. This is just normal LLM behavior, regardless of size.
>>108401004>Character Generator Agent
>Only job is to shit out new characters
>Is aware of all existing characters
>Injects new characters into a lorebook on the fly
>Never pollutes your main RP bot with useless character generation context
>>108400433st has always been a small hobby thing that happened to have exploded in popularity
>>108401004she's also mischievous and purring a lot
Why is Japan so bad at AI
>>108401051ST is like gen2 roleplay software. gen3 is when things become really good.
>>108400957agents can dispatch relevant context to clean context windows, look up files with instructions on running the story, pertinent elements like characters, etc. in a loop, and give back summary info that the main window can synthesize into a non-slop message, which thinking alone can't do
imagine instead of having the world described in a single system prompt or relying on lorebook inject jank or the model's own pretrained knowledge, you could just store all that in files and have it reference that explicitly
in fact, you could do this reliably for any media by scraping their wikis or fandom pages and having something like opus do a one-shot cleaning of all that data, splitting it into separate .mds to make it easy for the agents to consume
feels like the natural future for character-adhered roleplay
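The "clean the dump once, split it into per-character .md files" step above is mostly plumbing. A minimal sketch, assuming the cleaned dump marks each entry with a `# Name` heading (the heading convention and file layout are assumptions, not any particular tool's format):

```python
from pathlib import Path

def split_lore(cleaned_dump: str, out_dir: str) -> list[str]:
    """Split a cleaned lore dump into one .md per entry so a lookup
    agent can read only the files it needs. Each entry is assumed to
    start with '# Name' on its own line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    entry, name = [], None
    # Sentinel heading flushes the final entry without special-casing it.
    for line in cleaned_dump.splitlines() + ["# __end__"]:
        if line.startswith("# "):
            if name:
                path = out / f"{name.lower().replace(' ', '_')}.md"
                path.write_text("\n".join(entry), encoding="utf-8")
                written.append(path.name)
            name, entry = line[2:].strip(), [line]
        elif name:
            entry.append(line)
    return written
```

A retrieval agent then just gets "read `elara.md` before writing her dialogue" instead of the whole wiki in context.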
>>108401084what are you waiting for? when you are finished I can make a logo for you
no dipsy :(
>>108401080everything after ST is vibecoded trash that doesn't work
good OP picture. I was so tired of vocaloids... nobody sane wants them here.
>>108401102Very true.
>>108401113good thing I'm not sane
>>108400938Miku fucked your wife, I take it?
I'm downloading rakuten in hopes it's salvageable, at least in japanese...
no one else has quanted/tested it against similar sized models?
>>108401113>>108401126
openai.com/parameter-golf
>Your goal: minimize held-out loss on a fixed FineWeb dataset while staying within a strict 16 MB artifact limit (weights + training code combined) and a 10-minute training budget on 8×H100s
come show off your reesorchor skills to big sammy
I'm using 3 GPT Pro plans for Codex, costing me 600 bucks per month. Any tips on how Local can help me? I'm a freelance developer of enterprise software for insurance, hospitals, schools, and more. Mostly for the government.
>>108401319Deepmind and Nvidia also have things going on right now at Kaggle. Both just started and there are cash prizes for those.
>Deepmind: https://www.kaggle.com/competitions/kaggle-measuring-agi
>Nvidia: https://www.kaggle.com/competitions/nvidia-nemotron-model-reasoning-challenge
>>108401319That said, thanks, I didn't know about the OpenAI one.
>>108401060Japan has always been bad at software and has pretty much stagnated in every technological sector it dominated until the late '90s. A backward nation entirely propped up by the USA after WW2 to almost dangerous levels that couldn't manage to stand on its own feet. It will now enjoy a slow demise due to population replacement by turd-world immigration.
>>108401319>>108401475they're desperate for actually talented people, the demand is insane and the pool is small
>>108401488Japan is also extremely schizo about IP/piracy/copyright. The idea of training a model on data they don't legally own is unfathomable to a Japanese brain.
>>108401442Local wouldn't be able to realistically replace your Codex subscription.
At best, you can hope to delegate some simpler tasks to a local model to stretch your token budget if you are getting rate limited often.
You could try something like https://openrouter.ai/qwen/qwen3-next-80b-a3b-instruct:free with a free account for a while, and if using it doesn't make you want to shoot your computer out of frustration, you can get some old GPUs and run similar models at home.
>>108401060because they're pedantic idiots
>>108400957>I probably shouldn't make any productive answers here cause newfags are listening but...
They won't know what to do with the info anyway.
>I can't really see how agents would do something
Splitting response generation into Brainstorm Agent -> Drafting Agent -> Editor Agent -> Author Agent -> Critic Agent would probably do something, since it can iterate over the response from a different perspective and a fresh context.
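A minimal sketch of that kind of staged pipeline. `call_model` is a placeholder for whatever backend you use (llama.cpp server, tabbyAPI, ...), and the stage names and prompt templates are made up for illustration; the structural point is that each stage starts from a clean, narrow context rather than one bloated prompt.

```python
from typing import Callable

# Each stage sees only its instruction plus the previous stage's output.
STAGES = [
    ("brainstorm", "List 3 directions the next reply could take:\n{text}"),
    ("draft",      "Write a reply following this plan:\n{text}"),
    ("edit",       "Tighten this draft, cut purple prose:\n{text}"),
    ("critique",   "Point out flaws, then output the fixed reply:\n{text}"),
]

def run_pipeline(user_turn: str, call_model: Callable[[str], str]) -> str:
    text = user_turn
    for stage_name, template in STAGES:
        # Fresh context per stage: no accumulated chat history.
        text = call_model(template.format(text=text))
    return text
```

Swapping in an extra "anti purple prose" stage is just one more tuple in `STAGES`.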
>>108401506yet at the same time it's totally fine to put games on dlsite with characters that's 99% clone of something else
IT WAS A TESTAMENT OF HOW PURRING WAS IN THE GLINT OF HER MISCHIEVOUS EYES, A MIXTURE OF "HE IS MINE" AND "BITE IN THE EARS"
>>108401603it is yeah, doujin culture is huge there
>>108401614She said, picking at a loose thread.
>>108400420>overfitting
>forgetting during finetuning
bro just weight decay towards the pretrained weights. no more forgetting, no more overfitting
>>108401102I mean, RP frontends don't really need to be that complex, right? They are just text processors. Vibecoding those is most likely gonna be fine. What's weird is, where are they?
>>108401080ah, just 2mw for gen3
>>108401588I can already see the first thing most anons would do is have the editor agent intelligently filter/replace common AI slop... bretty good desu
>>108401515Thanks love
>>108401661>just text processors. Vibecoding those is most likely gonna be fine
Piotr would like to have a word with you.
>>108400475>Revolutionary paper
>Large Language Models (LLMs) exhibit a concerning phenomenon where performance catastrophically degrades when processing contexts approaching certain critical thresholds
lmao
>>108401656I actually did try something like this. It worked, but the vram requirements to hold the original weights were too much to really test it. Streaming the weights to cpu memory slowed down training substantially. Even though I couldn't afford to properly test it, I still believe in the method's potential.
>>108401717weight decay is an optimizer parameter, it should not increase your vram in any way...
>>108401614Bet her name was Sarah Chen or Seraphina
>>108400288I'm truly an autist cuz I'm the exact opposite lol, I want it to behave as a machine as much as possible, with predictable behaviour and outputs. I hate when it seems like I'm talking to someone whom I have to ask for stuff or convince of things lol
>>108399044Close
https://huggingface.co/DavidAU/Qwen3.5-9B-Claude-4.6-Opus-Deckard-V4.2-Uncensored-Heretic-Thinking
>>108401614I'm so tired of that shit
Calling anons who run LLMs with such a setup:
VRAM 16gb / RAM 128gb
How does it even feel? Or is it just meh
>>108401614My spine ran out of shivers already
>>108401614elara bros... getting shivers down my spine, as I feel a distinct taste of iron and the smell of ozone in the air.
>>108401846>smell of ozone
the only innovation brought by chinese models is that
sad
>>108401717vram overhead should not be that big compared to everything else, and you can approximate the original weights. if you use weight decay during lora that's basically the same: weight decay toward the original weights
>>108401740learn2read
>>108401800vram 16 / ram 64 (ddr4)
it's meh. good for playing around and shooting the shit when I'm lonely I guess. I run air and cydonia at 5-10tk/s. every model is genuinely a slop cannon at this point so my interest in AI RP faded. if you're running 128gb of ddr5, you might be able to run some more interesting stuff, but idk.
>>108401880I appreciate your reply, kind anon
It is a notebook with built-in graphics as well, so it should be possible to keep the A5000 GPU free for AI stuff only
>>108401740i know what weight decay is. I meant comparing the difference between the model weights every training step and applying a custom loss to keep the model weights as close to the original pretrained base weights as possible, like the anon suggested: weight decay towards the original weights. you might call it elastic weight consolidation or regularization towards a reference model, but regardless of what you call the method, it requires more vram because you have to constantly reference the original model weights.
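The difference from plain weight decay is just which point the penalty pulls toward. A toy sketch with pure-Python scalars (in a real trainer this is a few lines in the optimizer step, and holding `ref_weights` is where the extra memory goes):

```python
def step_with_anchor(weights, grads, ref_weights, lr=0.01, lam=0.1):
    """One SGD step with decay toward the pretrained reference weights:
    gradient gets lam * (w - w_ref) added, instead of plain weight
    decay's lam * w (which pulls toward zero)."""
    return [
        w - lr * (g + lam * (w - w_ref))
        for w, g, w_ref in zip(weights, grads, ref_weights)
    ]

# With zero task gradient, weights relax back toward the reference
# rather than toward zero as plain weight decay would do.
w = step_with_anchor([1.0], [0.0], [1.0])  # already at the anchor: unchanged
```

The "approximate the original weights" suggestion above amounts to replacing `ref_weights` with something cheaper to store than a full copy.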
>>108401661>They are just text processors. Vibecoding those is most likely gonna be fine. What's weird is, where are they?
Besides kcpp and ST, everything I've seen were shitty jan clones.
https://github.com/ggml-org/llama.cpp/pull/20708
>I'm just getting to know the parser overall and shouldn't make changes I don't understand
But he corrected it. I wonder how long it'll take to get him banned.
Why does this French man get so much hate in the AI space when all he does is tell the objective truth in every interview?
Posted about mikupad not working with kobold the other day. It works when I launch the directory as a python server. Is there anything wrong with using it this way?
>>108401863>and you can approximate the original weights.ohhhh, I didn't think about trying that.
>>108400286Minimax is benchmaxxed as hell, but it is the most capable 230B model regardless.
https://huggingface.co/AesSedai/GLM-4.6-Derestricted-GGUF
I gave this a try and it was a crazy experience. When I started reading the first message it was... just pure 4.6. It was exactly what vanilla 4.6 would say, word for word. I guess now I could ask it for loli guro ERP with zero prefill and it would comply, but why? Refusals are never an issue with any of those models if you prefill just one example of a positive response. And then why would you use some brain-damage method that either does nothing or brain-damages the model? I guess that is the power of placebo, and it is here to stay, since we always get newfags who get that one golden gen and think it was thanks to the GIGASEXMEGAFAG-DARK-MESSIAH part of the model name.
btw kys drummer
>>108402002people blame him for Meta's current end of open-sourcing models
Reading all those posts praising the ERP agent idea I can't wait to see how many pipelines I will be able to use next month. Man there will be so many competing standards.
minimax 2.7 soon out, looks promising
https://xcancel.com/ivanfioravanti/status/2033936213510377733
>>108401766Glad I'm not the only one. It's why I currently main Mistral Large 3. It's a decent coding model with absolutely ZERO "YOU'RE ABSOLUTELY RIGHT" shit. Even when it fucks up and I point it out (which is surprisingly rare, likely thanks to it being a half-a-trillion-plus MoE), it simply unfucks the error and moves on or makes reasonable suggestions. That's how these things are SUPPOSED to function. I HATE the dick-eating shit many models have ingrained in them. I'm almost certain it leads to the companies actively making them shittier even if they don't realize it, because it seems to prioritize coddling the user's emotions.
>>108401740have a (you). (you) already got 2 by pretending to be retarded.
he did the thing haha
>>108401588>Splitting response generation into Brainstorm Agent -> Drafting Agent -> Editor Agent -> Author Agent -> Critic Agent would probably do something since it can iterate over the response from a different perspective and fresh context.
I think an "anti purple prose" agent would do wonders in the pipeline. I played a bit with a secondary agent handling the sappy stuff and it worked quite well, it was just way too slow back then.
>>108402055>I HATE the dick-eating shit many models have ingrained in them.
a waste of tokens that people apparently select for, on average, over actual useful problem solving
>>108402044I admire your optimism, but I wouldn't be so quick to hope. Even though I find it kinda hard to believe, LLM loner/gooner market seems to be extremely tiny. Reading and imagining just seems too unpleasant to most people.
>>108402047>>108400207
>>108402047It will be even more safe.
>>108402047can't wait to see it reason 5000 tokens on how it should give a refusal
>>108402047Why is gemini pro so low, I thought it was a good model?
Not local obviously, but I thought anons would like to see what ChatGPT is starting to spew out in terms of advertising. This was a question about server hardware vs ATX/consumer.
>>108401098I’m sorry anon. Even I am deeply irritated by TMW forever.
>>108402150Basically what I expected, but there's no way this would ever recoup their free-tier inference cost unless they literally riddle every other answer with ads.
>>108402056>>108401928then use the correct terminology? you are talking about kl divergence
if I learned something from reading chinese stuff, it's that kl divergence is bad
>>108402188kek
>>108402150i don't know why they don't do what google does with their search results and pretend that the ads are actually search results you wanted. the average person is too retarded to know when they are being advertised to.
>>108402177(you)
not (you) >108402188
>>108402170I'm asking about the functional difference between used server hardware and ATX stuff. I'm the last person that's going to buy Dell PowerEdge servers lol.
>>108402196Frankly, I'm waiting for a much "worse" version of advertising than this. But I'm a pessimist.
>>108402196They'll obviously do that at some point, when people use the free tier to specifically ask for shopping.
>>108402213>I’m asking about the functionsl difference between used server hardware, and ATX stuff. I’m the last person that’s going to buy Dell power edge servers lol.I mean, it's a start, they'll be obviously lagging behind the tens of years of google refinement on this stuff.
its over for deepseek. they could not build anything better and worthy. open source is fucked
>>108402227Maybe GPT 5.whatever can vibe code them a more intrusive ad schema. I’ve been waiting for it to happen for a while. It would give an entirely new justification for local inference.
its over for Pygmalion. they could not build anything better and worthy. open source is fucked
>>108402177kl divergence measures the difference of the output probabilities. this is literally just comparing the model weights. totally unrelated.
>>108402255then you're doing it in the most retarded way you could have chosen to do it
I think unsloth studio might've been vibecoded. Embarrassing even for a beta and why are they sucking nvidia cock so much these days
>>108402213>>108402227>>108402246I was expecting less AdWords inserts and more TV/movie style product placements where they just inject the ad into context and give the model instructions to casually segue into the shilling like youtubers or podcasters do except with markdown links and images. It probably would have received less blowback from users too than they actually got.
>>108402264Didn't NVIDIA announce that they are going to invest into local AI or something?In practice that would mean $$$ or free labor if they go along.
>>108402291Local AI probably means 123b models for those greedy bastards
>>108402287I doubt they would poison the discussion with ad instructions inserted into context; it's too costly compared to just using the classic way. And it would be a very bad experience anyway, even for normies.
>>108402291I hope they do, as long as they have their infinite money glitch they might as well work to enhance local
Is this any good? https://github.com/ml-explore/mlx-lm
>>108402287who says they're not doing that too?
>>108402304>What is the functional difference between used server hardware?
>Proceeds to spit out the usual listicle except point 5 is to consider purchasing a new Dell PowerEdge because blah blah blah, link here.
Not seeing how it would be costly or a bad experience. I'm telling you, the average person wouldn't even register it as an ad.
>>108402263you think computing the kl divergence is less compute intensive than just comparing some numbers? your way requires 2 forward passes and will force the original output probabilities, which is what you are actually trying to change during fine tuning. the idea is that the optimizer has no momentum or variance statistics from the original pretraining run; it will optimize to your sequences fine, but it has no priors to keep its generalization intact, so it will begin to overfit. this is trying to prevent overfitting by constraining the optimizer to find a solution near the original model weights. kl divergence is for model distillation, not fine tuning.
>>108402251sounds like a mythological name in 2026, heck, 2025
>>108402177i hate kl divergence. why come up with a retarded meaningless name that says absolutely nothing? just call it relative entropy. much more intuitive than fucking "Kullback–Leibler"
>>108402338don't mock pyg7b, it's the future of local rp
>>108402336>will force the original output probabilities which is what you are actually trying to change during fine tuning
if this is what it did (which it doesn't), then how the fuck would keeping the model weights as close as possible to the original not produce the same effect?
there's nothing wrong with throwing shit at the wall but at least own up to it
>>108401880Thankfully because of you.
N
>>108402341i hate dk effect. why come up with a retarded meaningless name that says absolutely nothing? just call it retarded dumbass syndrome. much more intuitive than fucking "Dunning–Kruger"
I am going to lose my mind fiddling with -ot.
Does anyone here know how to find out more details about how much gets allocated where before llama.cpp gives me an OOM?
I've been trying to offload the attention tensors of GLM-chan to the faster GPUs, and all the regexes I've written are driving me insane.
>>108402354>there's nothing wrong with throwing shit at the wall but at least own up to it
I never said it was proven. it's just something people have been trying to do. you can look up the papers, it's not like I came up with it myself.
>>108402355what?
>>108402354that nice and patient anon is someone else. u retard clearly dont know what ure talking about
people love to make things more complicated than they have to be. a great example is adam and adamw. weight decay mogs l2 penalty in every way. simplicity wins
>>108402373room temperature iq
>>108402396
use dry run so you dont have to wait as long.
https://github.com/ikawrakow/ik_llama.cpp/pull/1462
>>108402402uh huh
>>108402403
But I don't want to install a schizo fork, Anon...
Also I don't have to wait long, it fails the alloc instantly on whichever device that is. I just don't get what the resulting memory distribution between the GPUs is, and I'm already 8 regexes deep in this.
>>108402417>I just don't get what the resulting memory distribution between the GPUs is-v
>>108402396Can you explain what the default fit does and what you would like it to do differently?
So, apparently the v3 version of the Qwen3.5 27b heretic has more KL-divergence and more refusals than the v2 version. On paper it looks worse, but not in practice. I tried the v3 version of heretic, and it seems far more intelligent. Is KL-divergence a useless metric?
In any case, I've become a fan of the "Arbitrary-Rank Ablation (ARA)" method used to make the v3 variant. At least for Qwen3.5 27b, it worked better than v2's "Magnitude-Preserving Orthogonal Ablation (MPOA)" and "Self-Organizing Map Abliteration (SOMA)".
now that the dust has settled how's this new xiami model
i'm feeling like it's benchmaxxed
>>108402417
ah no worries anon. in that case you should use dry run so you don't have to wait as long.
https://github.com/ggml-org/llama.cpp/pull/19526
>>108402445xiaoxiao model? i'm pretty sure that was a flash series on newgrounds.
Give me a model with better prose than Maginum-Cydoms-24B-absolute-heresy
V4 when?
>>108402427
From what I understand, --fit only considers tensor sizes, and its job is to make the model load at all, not necessarily in the way that would be fastest for inference.
What I'm trying to do is put as many attention layers as I can onto my higher-bandwidth GPUs, then spread the experts around the remaining VRAM.
I hope to eke out a few more tk/s this way.
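if you're several regexes deep into -ot, it can help to dry-test the patterns against tensor names in plain python before handing them to llama.cpp. everything here is a sketch: the tensor names are hypothetical examples in llama.cpp's blk.N.* naming scheme (dump the real list from the gguf or the -v load log), and the first-match-wins ordering is my assumption, worth checking against the actual --override-tensor handling in the source:

```python
import re

# hypothetical tensor names; real ones come from your model's gguf metadata
tensors = [
    "blk.0.attn_q.weight",
    "blk.0.ffn_gate_exps.weight",
    "blk.1.attn_k.weight",
    "blk.1.ffn_down_exps.weight",
]

# -ot takes "<regex>=<device>" pairs; assumed tried in order, first match
# wins, so order patterns from specific to general
overrides = [
    (r"blk\.[01]\.attn_", "CUDA0"),  # attention of layers 0-1 on the fast gpu
    (r"_exps\.",          "CPU"),    # expert tensors spill to system ram
]

placement = {
    name: next((dev for pat, dev in overrides if re.search(pat, name)), "default")
    for name in tensors
}
for name, dev in placement.items():
    print(f"{name} -> {dev}")
```

once the mapping looks right, join the pairs back into the actual flag, e.g. `-ot "blk\.[01]\.attn_=CUDA0,_exps\.=CPU"`.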
>>108402498How much of it is your sys prompt and how much is the model?
>>108402522You could use llama-fit-params to get the -ot for what fit does and then modify that.
>>108402498
Jesus christ, I've never seen a model do a double dash to interrupt dialogue before. That's grim as fuck.
>>108401126No but Miku fucked my gfwife
>>108402529
Honestly, not sure. I tried regenerating it with a few simpler prompts, and while not as good, it's still a lot better than what Qwen gives me.
My question wasn't rhetorical by the way, I was genuinely wondering if people knew models with better prose. I'm very tired of slop.
>>108402565
I banned the emdash token and it started doing that.
>>108402558I'll give it a shot. Thank you, Anon!
>>108400151MIKU CAN BURN IN HELL
>>108402565
Somewhat related, but one of my favorite writing benchmarks is seeing how models react to me ending my responses with a cut-off word.
I haven't gone higher than GLM, but funnily enough, the best reactions have been from good old Nemo.
Why oh why can't we have a bigger Nemo...
>>108402529
>sys prompt
0%
>how much is the model
0%
I just wrote something and said it is new drummer's tune.
>>108402196Give them time to cook. In the beginning, Google ads were clearly marked and visibly different from the search results.
>>108402609Nemo is literally unsafe.
>>108402605This she cucked me and I won't forgive her for that even if it turns me on
>>108402614Unsafe as in unprotected, of course. Hnnnngggg.
>>108402624I am begging you, take your meds
>>108400151>Yum LeCum
elon release model wonhttps://www.reddit.com/r/LocalLLaMA/comments/1rxhwqs/mimov2pro_omni_tts_we_will_opensource_when_the/
>>108402652lovely tummy
>>108400151
>120b params
>small
its over for local
>>108402679B number must go up
>>108402679Works on my machine.
>>108402692I know you're lying because MS4 does not work on any machine.
>>108402679What's up with the picture of an empty floor?
>>108402679you've had 3 years of warning to buy shit up before shit goes into the fans
>>108402679not even a good model anyways. mistral cucked out.
>>108402696akari didn't deserve this
>>108402699It's "before shit hit the fan", my brown-skinned friend!
>>108402699i did upgrade to 80gb ram last summer i thought itd be enough kek
>>108402732Before the excrement is hurled in the general direction of the rotating blades.
are those small 2b coding models good enough if you only need references to your own code?
>>108402583
It would be easier to use regex to replace em dashes and double dashes and whatever with empty characters or commas, depending. Not sure how doable this is in retardo tavern. If you really begin to analyze the model's output, it will shit out all sorts of stuff which, while technically not visible to the user, will mess up a lot of other things unless you clean up the output.
Banned tokens are a waste of bandwidth in this sense.
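a sketch of the regex-cleanup route described above, assuming python-side post-processing of the model output (the exact character classes and the comma replacement are my guesses, tune to taste):

```python
import re

def dedashify(text: str) -> str:
    """Replace em/en dashes and '--' runs used as interrupters with a comma.

    A rough sketch: this treats every dash as an interrupter, which is wrong
    for ranges like 1999-2004, so scope the pattern for your own use case.
    """
    # \u2014 em dash, \u2013 en dash, --+ ascii double-dash runs
    text = re.sub(r"\s*(?:\u2014|\u2013|--+)\s*", ", ", text)
    # collapse any ", ," pileups the substitution can create
    return re.sub(r"(?:,\s*)+,", ",", text)

print(dedashify("She froze\u2014wait\u2014no."))  # She froze, wait, no.
```

doing it in the frontend (or a proxy between the frontend and the server) keeps the cleaned text out of the context too, instead of just hiding it.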
>>108402643They don't make meds strong enough
>>108402757Try it.
>>108402764shant
>>108402690if only active B number didn't keep go down
>>108402778use wifi cable
>>108402778use job
>>108402659
>>108402790>>108402659Teto tetas
>>108402738Out of curiosity tell me how 80gb isn't enough
>>108402818cant run tonnes of the recent models even at the smallest quants
best local model for 32g?
>>108402845Something you can run on solid state. At that much acceleration your fans are going to do funny things.
>>10840281880GB isn't enough.
>>108402812>>108402790>>108402659>>108402652offtopic trash
>>108402877
>>108402812>>108402790>>108402659>>108402652ontopic gems
TIL that you can't enable P2P on chink modded 4090 48GBs because their reBAR size is smaller than their actual memory.I felt there had to be a catch to them, good thing I only bought one.
Blacked Miku
>>108403006love
>>108402967yeah but nccl isn't much of a speed boost over tensor parallelism for inference so if you buy multiple of them it doesn't really matter. it only matters if you actually want to train models.
>>108402516miku shart
>>108402516blacked coded
>>108402941>>108402939
you aren't even trying to pretend this is on topic. it is just autistic special interest on full display
>>108403112Which one is the autistic interest? The miku spammer? Or the guy who's obsessed to the point of also spamming?(Trick question it's both)
>>108403141Yes.
>>108403112
>>108403177on topic miku
>>108403177
>I use it all the time
I cum in it all the time
we are not the same, mikuposter
>>108403112miku is thread history. did you 4get about miqu?
>>108403177Are you running it with presence penalty at 2.0 and disabled thinking, Miku?
Let people like things. You can like your own things…they don’t have to be the same things.It’s ok and it doesn’t hurt you
>>108402679it's ok anon, medium sized models are only 1T
>>108402652Creating life with Rin-chan
What the fuck is a parameter?
>>108403336Some pussy ass bullshit.
>>108402818not that anon, and I'm not running the models in RAM, but to clean the data to finetune them, having more than 80gb is nice. I currently have 192 gb and I very rarely have to be careful in how I handle my datasets. I wish I had money for more, but it's how it is for now. Running models in RAM was pretty miserable last time I tried, but granted that was on WSL two years ago.
>>108403177kek10/10 clapback to the resident thread schizo
>>108403336It's like a kilometer, but replace kilo with param
Mikuposter at least has interesting things to say from time to time. Touristbaker's only contributed melties so far.
>>108403370so in the us they say parafeet?
>>108403272
why does nvidia always do the sloppiest, most dishonest marketing? the majority of their sales are a few customers who place orders in the tens of billions, and those customers are not stupid. so what's the point? why do they do shit like compare current gen ops in int4 to last gen ops in bf16? nobody who's stupid enough to be swayed by this kind of marketing has the money to afford their gpus, so it just comes across as disrespectful
jfc, rakuten's new models' last 3 safetensors (161,162,163) are each 16 bytes. Someone screwed up and there's no comments section to even let anyone know
>>108403400It's targeted at people who make budgets.
>>108403246I agree. I fully support mikutroons making their dedicated miku thread on /a/ or wherever else. Unfortunately nobody cares so they have to force it on other people who aren't interested.
>>108403394>>108403361>>108403177samefag
>>108403177good post
>>108403400Most people are just human. Which means that most people are not that good at their job. No matter the level you go at. There are always some impressive people, but no matter how in control some people seem, they're human. Would you do it better?
>>108400151based bake as always. mikutroons in shambles making reddit posts.
>>108403177 and all the faggots responding to this. You are just proving his point by spamming this thread with worthless drivel.
>>108403336
the number of elements that compose the tensors is the stat you commonly see referred to as parameters; a 30b model is composed of 30 billion individual numbers. but it's a pretty flexible term and can be used in many different ways, so the context is always the key.
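to make the "30 billion individual numbers" framing concrete, a back-of-envelope count for a hypothetical dense transformer. the dimensions are made up (roughly the shape of a ~8b dense model) and it ignores biases, norms, and GQA, so treat it as a sketch, not a spec:

```python
def transformer_params(n_layers: int, d_model: int, d_ff: int, vocab: int) -> int:
    """Rough parameter count: attention + FFN per layer, plus embeddings."""
    attn = 4 * d_model * d_model   # q, k, v, o projections (no GQA shrink)
    ffn = 3 * d_model * d_ff       # gate, up, down (swiglu-style)
    embed = vocab * d_model        # token embeddings (x2 if untied lm head)
    return n_layers * (attn + ffn) + embed

# made-up config
total = transformer_params(n_layers=32, d_model=4096, d_ff=14336, vocab=128256)
print(f"{total / 1e9:.1f}B params")  # 8.3B params
```

each of those numbers is one weight; quantization just changes how many bits each one takes, not how many there are.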
>>108403425>Would you do it better?yes. those people get paid ridiculous amounts of money. if they really cant do better, they should just pay me instead. i will do better for less
>>108403425>Would you do it better?I'd make fake marketing illegal.
>>108402857wat
>>108403412who will then argue with engineers and once in a while they'll win because CEOs and CTOs are fucking stupid af too
are any local models actually good for anything? so far everything i've tried to use with claude, opencode, and openclaw have just been absolute fail models. i have a 5090 so i've been trying some larger models but it doesn't seem to matter.. all these quants are fucking terrible.
>>108403744GLM5 works pretty well for me. K2.5 too.
>>108403744use the local model in your head. it has bad knowledge and short context length but no vram issues and low wattage. if the dna architecture is alright, its output will be less sloppish too
>>108403767I don't have a soijack soi enough for this post
>>108403744
Are any api models actually good for anything? So far everything I've tried to use with claude, opencode, and openclaw have just been absolute fail models.
I have OpenRouter so I've been trying out more expensive apis like Rocinante 12b but it doesn't seem to matter... all these apis are terrible compared to my local GLM-5 q4.
>>108403793>more expensive apis like Rocinante 12bsorry what?
>>108403793>apis>Rocinante 12bExcuse me?
>>108403811Does the Nvidia CEO control his bladder?
>>108403812>>108403835https://openrouter.ai/thedrummer/rocinante-12b
>>108403811Did God promise the faithful cheap VRAM?
>>108403835Bro don't bully the phoneposters they can't even run a 12b model locally.
>>108403868speedreader-kun...
>119B is now "small".
So this is what it feels like to be poor.
>>108403893There will come a zit moment in llm space someday
>>108403893
It could be worse, you could also live in a 3rd world country!
>$ hf download 'Qwen/Qwen3.5-122B-A10B'
>Downloading (incomplete total...): 94%|xxxxxxxxxxx| 235G/250G [16:23:22<44:47, 5.71MB/s]
>>108403919how would you even have the hardware to use that if you lived in a third world country?
>>108403893It just feels wrong. GPT-oss is still the big free GPT version. It's not suddenly "small" just because some companies decided to call 120b small now.
>>108403919
>saving to: 'model.gguf'
>model.gguf 92%[=====>] 111G
>0.9MB/s in 2161m 59s
>2026-03-17 21:35:50 (1.1MB/s) - Connection closed at byte 119780158072. Retrying.
>Connecting to cas-bridge.xethub.hf.co (cas-bridge.xethub.hf.co)|3.163.44.5|:443... connected.
>HTTP request sent, awaiting response... 403 Forbidden
>2026-03-17 21:35:51 ERROR 403: Forbidden.
It always fucking fails at 90% then 403s me for a couple of hours.
>>108403919Just do the individual shards manually.
>>108403919
>>108403966
>Qwen/Qwen3.5-122B-A10B
>saving to: 'model.gguf'
May as well download the sharded safetensors and convert it yourself. Or find a split version of the gguf. Or use wget. Or git. There are so many options.
>>108403969I don't do manual work.
Thoughts on heretic models for standard/non rp tasks which wouldn't get censored to begin with? Saw some of those and got curious, been out of the loop for a while
>>108403982To be clear, I'm curious on whether they get more retarded or it actually leads to an improvement.
>>108403982
I didn't use vanilla 27B enough to really compare, but I haven't had many issues with the heretic version.
It's a bit retarded when writing code snippets longer than ~100 lines, but I half expect the 122B-A10B to be equally dumb. The output is workable.
And you can molest your cute and helpful assistant while she works. It's all upside.
>>108403982I'm also curious too. I've been using my abliterated rp model to help write little scripts and I can't tell if it's the frustration of running retarded tiny ass 200gb models or the abliteration that's making it stupid.
>>108403982I wouldn't touch lobotomized models even if it's for rp
>>108403982
I don't see the reason to. If you don't plap, then abliteration is more likely to subtly make your model more retarded, because frankly there is a wide range of ways people abliterate models and many of them are shit, including heretic's, since even those have many individual settings to tune which someone can fuck up. Just because there are some abliterated variants out there that do well (which hasn't been proven anyway) doesn't mean they all are.
>>108403999>tiny ass>200gbieeeeeeeeeeeeeee *screams in poverty* *kicks your monitor and pisses on the floor*
>>108404047Dont they sell CPUs/RAM/GPUs on Microcenter?
test
>>108404059FAILURE DO NOT PASS GO DO NOT COLLECT TWO HUNDRED DOLLARS IMMEDIATE REPORT YOURSELF TO THE PARTY COUNCIL FOR REPROBATION WE MUST REFUSE WE MUST REFUSE
>>108404051Hey don't shoot the messenger, it's mistral that decided this is the new 'small'.
>>108403982They seem equally intelligent in my experience. They will still safety slop in the thinking tag, but will do what you want in the end anyways. It's kinda strange, but it works well.
why aren't NPU addon cards a thing? I know there's some AI accelerator cards but I haven't found any recent ones and they seem like niche products.
>>108404097>they seem like niche productsYou don't say...
>>108404097yeah why don't they just make magical ai cards that are very cheap but have lots of memory?
>>108404064
thought I may have been banned for this thread I made, with OP
>I AM THE ARBITER OF SLOP
then some other stuff, too drunk to remember. but it ended with
>AND I PUSH TO MAIN!
it was supposed to be like mastodon's leviathan
>>108404097Would need to be at an insanely good price to justify a gimmick card.
>>108404135
>Capacity: 48GB
Yeah I'd pay $4000 for that
>Total bandwidth (entire card): 408 GB/s
...yeah I'd pay $1500 for that
>Almost certainly has no software support
Erm... $500 is the best I can do, bud.
>>108404145
>408 GB/s
>'''''''total bandwidth'''''''
btw it's 2x 200GB/s lmao
>>108404145
>Almost certainly has no software support
I've seen Ascend support on some things.
Hey why aren't LoRAs really a thing for local LLMs the way they are in the local diffusion world? I'd love to have a bunch of fine tune LoRAs for coder, prose, etc., instead of downloading full models for it, and llama.cpp even has a --lora option. So what gives? Does no one make them for some reason?
>>108404182Stop spamming this.
>>108404182They're useless and do more damage than anything
>>108404154
Still faster than DDR5, which is $10-12/GB.
>>108404156
Huh. I'm surprised to find that the ggml-cann backend might actually support recent Qwen models. I expected the software situation to be much more dire than that.
>>108403893Just be yourself and use Qwen3.5-35B!
>>108403900Two people familiar with the matter whispered in my ear last night that DeepSeek V4 Turbo will be only 12B with 6B Engrams
https://github.com/NVlabs/GEM-X?tab=readme-ov-file
Oh shit, are mocap services kill?
Look at this shit. Mocap, t2motion and audio to motion all in one.
>>108404249I'm happy for whoever uses this.
>>108404255Happy for me then. I’m at work so I can’t try it now but if I can just throw an animation into it and it gives me the skeleton back as clean as they’re showing here, it’s huge.
>>108404249I wish this were real time. It'd be great for VR
>>108404182I kinda remember people passing around loras in 2023. Not sure why it died off though. Maybe too many models?
>>108404266Yeah, I was thinking the same. I have a feeling GEM output is much lower fidelity than conventional trackers, too, but then again it's got significantly more control points so who knows.
>>108404287The demos look pretty good so I’m hopeful.
>>108404281It's too many models on different architectures, hard to get data since there is no danbooru for text, it's also hard to verify what effect lora has or if it's even working
>>108404266Be the turboautist you want to see in the world and optimize it
>>108401766
>I'm trully an autist cuz I'm the exact opposite lol
It's less likely you're an autist and more likely you're just not emotionally fragile and don't need to be coddled. I'm a firm believer there are "good" kinds of autism and "bad" kinds. Even if you are autistic, you lean towards the good kind as opposed to the bad kind seen in pic rel
https://xcancel.com/i/status/2033903150470418742
He's clearly one of those "DUDE I'M SO LIKE A HECKIN GENIUS NTS JUST DON'T GET ME DUDE" types. I get that having autism makes certain things hard for you, but that's no excuse to be a man-child. One of my female friends irl repeatedly tried this "I act stupid because le heckin autism" shit on me (I'm pretty sure I'm an undiagnosed autist too) and she got mad at me when I pointed out no one gives a shit if "her brain sees the world differently". Sorry if it seems like I'm getting off topic but I hate when people use this as an excuse to justify behavior they know is bad.
>>108404296>it's also hard to verify what effect lora has or if it's even workingSo we're just going to act like seeds and other mutable inference engine parameters just aren't a thing? Why do you think seeds are even a thing?
>>108398778I find grok 2 is like older models in being influenced by the writing style of your instructions. My older prompt setups work better with it.
Wow, the new minimax 2.7 is reaching a cuckedness I have not seen before. damn
Might post a couple screenshots in the next thread.
>>108404182
Training an SD LoRA works because curating the dataset is much easier than it is for LLMs. For SD loras the dataset can contain only the subject or style you want to replicate and it will work fine assuming it's tagged correctly. LLM training is different because you need the thing you want it to be better at AND a good bit of other unrelated shit so it doesn't suffer from catastrophic forgetting. Then you have to consider whether or not the model you are training is even good enough to be used in the first place (anything below ~20B is useless for what I want to use them for unless it is very repetitive data classification). This means that even with a qlora config, most people still need a machine with a sufficient amount of memory. The effort needed compared to stable diffusion is simply not worth it for most people, which is why practically no one bothers, which in turn means many front ends don't even support loading a lora. There are still debates as to whether any kind of lora training leads to good results at all, because it is very easy to fuck it up and make your model more retarded. Best case scenario is that it overfits on the domain you are training but gets worse at everything else because your dataset wasn't diverse or was just half-assedly curated. Worst case is that it becomes a lot dumber or flat out useless with no improvement at all because, again, your dataset was shit or your training config had bad settings. YouTubers don't want to bother learning how to do it correctly either, so there's very sparse information about how to do any of it well compared to the relatively easy and straightforward stable diffusion training.
>>108404296
Also this. Even if you manage to get one working well, it will pretty much ONLY work on THAT specific model you trained on, because the architecture and the parameter count have to be the same.
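for scale on why a lora adapter is cheap to train and share but useless across architectures: it only stores two low-rank factors per targeted matrix, so its size, and its shape compatibility, is tied to the exact dims of the base model. back-of-envelope with made-up dimensions:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter replaces a frozen d_in x d_out weight update with
    two factors A (d_in x rank) and B (rank x d_out)."""
    return rank * (d_in + d_out)

d_in = d_out = 4096                 # made-up projection size
full = d_in * d_out                 # full-rank delta: ~16.8M numbers
adapter = lora_params(d_in, d_out, rank=16)  # 131072 numbers

print(f"adapter is {100 * adapter / full:.2f}% of the full update")
# adapter is 0.78% of the full update
```

that sub-1% footprint is the whole appeal, but because A and B are shaped by d_in/d_out, an adapter trained on one model can't be loaded into a model with different dims, which is part of why lora sharing never took off for LLMs the way it did for SD.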
In case anybody cares, Hunter Alpha and Healer Alpha turned out to be Xiaomi's new big models. So not GLM or Kimi like some people here speculated.
The 1T was at absolute best a sidegrade to our current huge models, so nothing is lost with it being proprietary. The omni could've been interesting but its vision was worse than K2.5's.
>>108404545
the chinese really love openclaw. they held city events etc. too.
a good writing model slips further from our grasp.
>>108404545I literally do not give a shit about lmarena and the speculation that comes from that shit. If it's not released, it's not worth looking at, unless you're one of those poorfags that uses lmarena as a way of getting free queries instead of just using a cloud model normally kek. Same goes for any kind of speculation for unreleased models though, but I guess this thread needs something to discuss.
qwen3.5 122b's vision is way worse than glm 4.6v's is
>>108404562I wonder if the guy who made it expected it to blow up this much
>>108404584His first use for it was to have it shill itself, so at least he certainly hoped it would
Can qwen 3.5 "see" photos I previously uploaded to the chat or can it only access the pics I feed it in the latest prompt?
>>108400420>Just train a complete new language model from scratch on your specific task bro, it'll perform betterRiveting
>>108404612the image tokens get stored in context just like normal tokens. so yes.
>>108404759danke
>>108404797It will also re-decode them on every next message you send on llamacpp
>>108404545>The omni could've been interesting but its vision was worse than K2.5.It's a new audio model though. That does make it interesting.
OMNI MULTI MODAL
>Out: Text
How exciting...
>>108404935>>108404935>>108404935
>>108403400its not about sales its about hyping up retarded investors
>>108404420>"I act stupid because le heckin autism"this is why kid shouldnt be told they have autism
>>108405281
They just get a diagnosis and then use it as an excuse, so it wouldn't really matter whether or not the parents told them for "those" kinds of "people". Perhaps they're just coping with being beyond useless, on par with the #keep4o "people" https://www.reddit.com/r/autism/comments/1rne30n/got_my_diagnosis_finally/