/g/ - Technology

File: 1754714485403516.png (382 KB, 840x639)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108295959


►News
>(03/03) WizardLM publishes "Beyond Length Scaling" GRM paper: https://hf.co/papers/2603.01571
>(03/02) Qwen 3.5 Small Models (2B, 4B) released: https://hf.co/Qwen/Qwen3.5-4B
>(02/26) Qwen 3.5 35B-A3B released, excelling at agentic coding: https://hf.co/Qwen/Qwen3.5-35B-A3B
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
>>108300676
this but for leddit
>>
Can I run AI on my smart fridge? Maybe one of the small qwen models?
>>
bubble status: bursting soon
>>
why are we having new threads at page 4 now?
>>
>>108300682
>WizardLM publishes
I thought they were banished to the shadow realm?
>>
>>108300728
robson ltda bailed them out
>>
>>108300713
It's literally just because someone doesn't want vocaloids in the OP.
>>
>>108300691
of course, and you should install OpenClaw on it also and let it dictate what food you eat
>>
>>108300788
we both know you are being an ass, but that's actually a good idea
>>
>>108300793
i wasn't being sarcastic, if the fridge has one of those sensors or cameras then you could use it to track calorie intake
>>
>>108300788
Isn't OpenClaw going to be closed source soon after the acquisition?
>>
>>108300806
there are dozens of forks now, it wouldn't really matter
>>
Big Deepseek day today
>>
>>108300819
link?
>>
I'm running n8n and an ollama VM on my homelab. No gpu, just a couple of cores and 20gb ram. I know people use setups like this for automation workflows (speed is not a huge concern, just precision). What are the steps required to get database-backed memory working, and how do people optimize small models on restricted hardware in general?
>>
File: db7.jpg (84 KB, 680x847)
>>108300825
>>
>>108300848
just buy one
>>
yes I am mikusexual
>>
>>108300848
>ollama
>>
►Recent Highlights from the Previous Thread: >>108295959

--Paper (old): H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs:
>108296054 >108296204
--OBLITERATUS tool for removing AI model censorship via weight ablation:
>108297061 >108297066 >108297113 >108297177 >108297103 >108297117 >108297136 >108297203 >108297208 >108297232 >108297233 >108299678 >108299706
--Alibaba reaffirms open-source Qwen strategy amid leadership shift:
>108298195 >108298228 >108299471 >108299477 >108298457
--Qwen family model size vs performance analysis:
>108300067 >108300073 >108300077 >108300083 >108300093 >108300118
--SillyTavern alternatives for modern model roleplaying:
>108299346 >108299399 >108299412 >108299435 >108299629 >108299489 >108299913 >108300639
--A mathematical proof from an anonymous Korean forum: The essence of Attention is fundamentally a d^2 problem, not n^2:
>108298017
--Distributed LLM inference using pooled NUC resources:
>108296013 >108296051 >108299436
--Preventing agents from falsely claiming task completion:
>108299444 >108299470
--Something is afoot in the land of Qwen:
>108297114
--Miku (free space):
>108296286 >108296467 >108297038 >108298135 >108299073

►Recent Highlight Posts from the Previous Thread: >>108298564

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108300996
Finally some proper news.
>>
File: dipsyNoSneakingFood.png (2.73 MB, 1024x1536)
>>108300798
>>108300793
lol if it had a camera in your pantry as well, it could track your macros and order food for delivery from the local grocery store.
Then text you and your friends to either congratulate you or give you shit about whether you're sticking to your diet.
Add IoT to your bathroom scale and now you have a closed-loop fitness/dietary system.
>>
File: 1769717321226271.jpg (36 KB, 620x521)
>bought an M4 pro macbook pro with 48gb of RAM thinking it would last me several years
>local AI gets good and now I need like 512 GB
Fuck man I'm tempted to just buy an RTX 6000 Pro
>>
>>108300996
Thank you Mikuchad
>>
thoughts? https://pastebin.com/KrpEwdKJ
>>
File: mine.jpg (196 KB, 1536x1536)
>>
>>108301063
Without an explicit completion check (for example "count phases and confirm total"), the agent can rationalize continuing as "still helping"
>>
File: op is a faggot.gif (1006 KB, 372x298)
>>108300682
OP is a massive faggot
>>
>have to compile llama.cpp for cuda support
i just chucked the precompiled cuda releases on github into a folder on C:\ and added it to path. did i do it right
>>
>>108301319
>compile it
>grab precompiled binaries
uh, no
>>
>>108301317
*glug glug glug*
>>
>>108300067
>27B dense as good as 122B-A10B moe
Does this mean a 70B dense model would be better than the 397B-A17B moe model?
>>
Speed in 35B is a quality of its own; real-time VR interactions with a retarded waifu feel surreal
>>
y'all fuck with that OBLITERATUS shit or ts just hype? 30% benchmark increase sounds like cap ong
>>
>>108301378
If it had modern training techniques, it would be smarter at things that require attention to detail, but it would have less space to store knowledge, so it would still underperform in most common tasks where it can just rely on memorization, like benchmarks.
>>
>>108301427
Good thing we can store knowledge in Engrams
Dense + Engrams
>>
>>108301239
I pretty much look like this
>>
>>108301436
For us, that would be the best. The labs training the models would still prefer MoE due to inference speeds and training costs.
>>
Based baker. Fighting offtopic autistic special interest one OP at a time.
>>
Is engrams actually coming? Or is it just being memed.
>>
>>108301317
u mad bro? why?
>>
so this august are we gonna get gpt oss 2
>>
>Meta's first LLaMA model was leaked and released via a torrent link on March 3, 2023.
damn it's been 3 years already
>>
I want to fuck an Engram
>>
uhh, where is V4?
>>
New to LLMs. I'm looking into small models and can see that there are a lot of variants, the naming convention doesn't make sense to me at all, and I can't find any documentation.

https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/tree/main

Can someone educate me on the use cases for the different versions?
>>
unsloth removes information about sloths
>>
>>108301602
these are different quantizations, basically compression to fit bigger models into consumer cards' VRAM. The higher the quant, the more intelligence the model retains from the original one. Which one to choose depends entirely on your hardware; as a rule of thumb, below Q4 it's bad.

It's generally a good idea to ask gemini or chatgpt about all this
>>
Localsisters, I can't figure out the best way to handle memory in SillyTavern. I activated vector storage but I doubt that's enough. Why does this shit have to be so complicated? I just wanna do some long roleplays...
>>
>>108301649
Models have more than 4k context now. You don't need anything.
>>
>>108301636
Thank you, so Q4 means 4-bit quantization and so on. What about the K_S, K_M after that?
>>
>>108301571
Two more weeks.
>>
>>108301677
I have 32k context and I'm already almost at the limit 103 messages in.
>>
>>108301683
Those are even more granular quantization levels. Scroll down in the model card at this address and you'll see a chart giving you an idea of the quants and their quality

https://huggingface.co/bartowski/Qwen_Qwen3.5-35B-A3B-GGUF
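Once you pick one, running it is a one-liner. A minimal sketch, assuming a llama.cpp build on your PATH (the filename is whichever quant you downloaded):
[code]
# Q4_K_M is the usual middle-ground default between size and quality
llama-server -m Qwen_Qwen3.5-35B-A3B-Q4_K_M.gguf -ngl 99 -c 16384
# then point your frontend at http://127.0.0.1:8080
[/code]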
>>
>>108301699
Thank you. How about unsloth vs bartowski quantized formats? Which one is better, or is there anyone that has a better version I can check?
>>
>>108301765
Ideally they should all be the same; lately there has been some drama with unsloth quants.

The best way is to test them yourself and see which one you prefer
>>
>>108301078
>Jamba2 Mini
So, funny thing.
This guy has 8 experts, with 2 being activated per token for a total of 12B activated params.
I launch it, ask a question about D&D, get a pretty standard result. Good, some models hallucinate some wild stuff that this one didn't, even if the result wasn't perfect.
Then I do
>--override-kv jamba.expert_used_count=int:1
to halve the number of activated experts, which obviously doubles the generation speed but also yields a better response.
Yes, anecdotal, and a single sample, but still funny to see.
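For reference, the whole experiment is just the same command with and without the override (a sketch; the model path and the other flags here are placeholders, only the override matters):
[code]
# stock: 2 of 8 experts active per token
llama-cli -m jamba2-mini-Q4_K_M.gguf -ngl 99 -p "The D&D question"
# forced to 1 active expert: ~2x generation speed
llama-cli -m jamba2-mini-Q4_K_M.gguf -ngl 99 \
  --override-kv jamba.expert_used_count=int:1 \
  -p "The D&D question"
[/code]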
>>
Holy FUCK Qwen 3.5 35B-A3B straight up CHOOSES TO NOT TRANSLATE HENTAI.

What the fuck is this shit?! Fucking GEMMA 3 27B OF ALL FUCKING MODELS DIDN'T HAVE PROBLEMS TRANSLATING HENTAI GAMES

What the FUCK is wrong with Alibaba? FUCK QWEN.
>>
>>108301871
Must have been the Wheatley expert.
>>
>>108301879
Are you using the base model?
>>
>>108301888
Lmao.
Makes me wonder if I shouldn't be fucking around with GLM Air with fewer activated experts and other such experiments.
>>
>>108301879
skill issue
>>
>>108300682
serious question why is ollama/openwebui never recommended here?

seems to be working just fine for me. easy setup and pretty trivial to add custom model packages too.
>>
>>108301903
I'm using the standard model released by Qwen, their "chat" version, not the base model.

>>108301936
It's pretty bad because I hook it into running hentai games, and when there is one line that mentions rape or is contextually about coercion or something, the entire translation stops and the model refuses to translate any other lines as well, and I have to clear the entire context, fucking up the translation pipeline.
>>
>>108300691
it would be more effective for you to run the AI on a home server and connect the fridge tablet to the server via an iframe web browser, or just run the webui on the fridge, not the actual AI backend.

unless you want to stare at your fridge door for 5 minutes waiting for it to tell you how long you can leave pizza in the fridge before it's likely to kill you.
>>
>https://www.reddit.com/r/LocalLLaMA/comments/1rlkptk/final_qwen35_unsloth_gguf_update/

>Re-download Qwen3.5-35B-A3B, 27B, and 122B-A10B as they're now all updated. Re-download 397B-A17B after today’s update (still uploading!)
just one more re-download bro
>>
>>108301992
wtf
>>
>>108301992
>Hahaha sorry - agreed it might not be the true "final" final
geg
>>
>>108301955
Try the base model. It still behaves like an instruct tune, but with much less verbose, more "natural" reasoning traces (they seem straight out of the RL process), and with a lot fewer refusals baked in.
>>
>>108301992
> Are all the GGUFs for the smaller Qwen3.5 models, 9b and below, also updated?
>Oh the old ones generally are ok for now - however we do plan to improve them over the weekend!
What's final about any of this?
>>
File: file.png (42 KB, 823x253)
>>108302011
>>108301999
>>108301992
you can trust them to have no idea what they're doing
>>
>>108302017
lmao'd
what a bunch of clowns
are these at least with the fused method?
>>
>>108302026
qrd?
>>
File: IMG20260301201540.jpg (786 KB, 2048x1536)
>>108301063
Similarly, back in the day I never saw the use case for X99 enthusiast boards with all those pcie slots; who would ever need that many?
but then...
>>
>>108302029
https://github.com/ggml-org/llama.cpp/pull/19139
>>
I've found a riddle that mogs <thinking> models. Non-thinking models or models in non-thinking modes usually got it right.
>If a country switches from left-hand traffic to right-hand traffic, do cloverleaf interchanges need to be rebuilt?
>>
>>108301950
Ollama is hated on because it's the easy-to-use one that uses llama.cpp without loudly crediting it, which is seen as kind of stealing
As for openwebui, these people were born and raised on sillytavern and they mostly don't know about it and/or prefer the ST interface because it's what they're used to

I started on chatgpt so I use ollama+openwebui
>>
File: 1764876252715945.png (2 KB, 125x70)
>watching new anime episode today
>hit with gemma hotlines
THERES NO ESCAPING
>>
>>108302151
pic for ants?
>>
>>108302156
hah goteeemm
>>
File: 1748394822472270.png (81 KB, 1920x1080)
>>108302151
>>108302156
oops ahahah
>>
>>108301879
Probably the result of the relatively recent Chinese crackdown on porn.
>>
>>108301879
inb4 something utterly vile
show log
>>
Any SaaS model that's redpilled on Jews?
>>
>>108301950
llama.cpp has a web UI built in
>>
So far I'm liking
>Qwen3.5-27B-heretic-v2-Q5_K_M.gguf
with a low temperature and a "<think" prefill. It does seem smarter than similar-sized models like Gemma.
>>
>>108301950
>>108302138
Looks great for general assistant stuff but too basic for roleplay. Sillytavern is unfortunately a necessary evil.
>>
File: file.png (42 KB, 748x327)
>>108302026
>are these at least with the fused method?
so that's a no
>>
File: x99ftw.jpg (895 KB, 2468x1497)
>>108302063
which chip? my old x99 system's been collecting dust & the watercooling leaked into the PSU
boomer pc builders understand the need for expansion slots
desktop/gaming platforms continually shittify, hedt was a taste of the good stuff
>>
Any tips to nudge the LLM in a specific direction without explicitly telling it or writing for the character?
>>
>>108302394
Some 12-core v3 xeon, I forget

Boomers had their soundcards and IO cards, I even had a microsoft proprietary mouse interface card. At the time of X99 they weren't really a thing anymore thoughhowever
>>
>>108302394
>bricked my system bc of the watercooling meme
top kek I'm so glad I never left air cooling
>>
>>108301879
Just use it to write fizzbuzz like intended broski
>>
>>108302372
lmao'd x2
>>
>>108301879
Use the heretic finetune if you can't figure out what arcane prompt bullshit actually works
>>
>>108302394
I have that exact same board. I had an MSI X99 board with a dual GPU setup for PCI passthrough, one for host and one for guest. Worked flawlessly until the board decided to kill itself. Replaced it with the ASUS X99-A II and that shit just would not work. Spent months tweaking settings, but got link errors and the guest could not use the GPU. Eventually booted into Windows with both GPUs and got screen flickering and more errors even though it had more than enough lanes.
Maybe it was just a faulty unit, but I hate that board so fucking much.
>>
>>108302345
How did you get it to not repeat and spout nonsense endlessly? Or maybe it's just me, I swear my sillytavern seems to randomly get cursed over time.
>>
File: maintenance optional.jpg (1.04 MB, 2016x1134)
>>108302398
describe your intent [OOC: ]
>>108302432
Yeah man SLI GPUs, network (no onboard), soundcard (I had the hercules blue breakout box thing with the thiccest stupidest cable ever seen in a consumer product)
actually went with x99 here for a 10G NIC
>>108302462
it ran perfectly for years 0 maintenance
>>108302532
i replaced the board once, hard crash, spotted a small flash of something, VRM inductor maybe, i never could find the damage but it wouldn't boot & got RMAd
>>
File: bears.png (4 KB, 514x339)
>>108302398
You could try control vectors, I suppose.
https://desuarchive.org/g/thread/104991200/#q104995066
https://desuarchive.org/g/thread/104991200/#q105000398
>>
>>108302556
To be honest I am running into that right now (it starts looping in the thinking phase as it questions itself), but my earlier gens on a different card were better. I'll have to keep playing with it.
>>
>>108302572
Oh shit. I'm going to make a cvector to fix qwens fucking prose.
I guess I should take a bunch of random outputs from the model itself then rewrite them how I'd like them to sound and use those as the negative and positive files right?
>>
>>108302562
>it ran perfectly for years 0 maintenance
until it failed and killed it, whereas a fan failing would just cause thermal throttling and possibly a thermal safety shutdown, waterkeks are funny
>>
Need help:
llama-cli vs llama-server
I run 20t/s on llama-cli but when I run llama-server I only get 5t/s.

How can I tweak it?

I literally used the same settings.
>>
>>108302556
>>108302578
So far I've found the most success by being light on instructions and card details, since it obsesses over that stuff.
>>
>>108302681
Those have some different defaults for some things I'm pretty sure. I can't remember what, but some anon figured it out some time ago.
Can you run llama-cli with --verbose to see all the flags and stuff?
>>
>>108302645
>I'm going to make a cvector to fix qwens fucking prose
It will change the output, but it doesn't quite work like that. You can only nudge the model.
>random outputs from the model itself then rewrite them
You don't need a lot to make an effective control vector. The bear control vector I made was just the example in the archive. And you don't even need the chat template stuff. Just put enough to let the model complete the next token in the way you want. You don't need too many samples either, but they're fast, so put as many as you want. I found 3 of each to be sufficient.
Don't get your expectations too high. You cannot add information, you cannot add instructions. You just nudge the model in a particular direction.
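If you want to try one, the whole workflow is two commands. A sketch assuming mainline llama.cpp's cvector-generator tool (check --help on your build for the exact flag names; the file names are placeholders):
[code]
# positive.txt / negative.txt: one prompt per line, same line count in both
llama-cvector-generator -m model.gguf \
  --cvector-positive-file positive.txt \
  --cvector-negative-file negative.txt \
  --cvector-outfile cvector.gguf
# apply it at a chosen strength; negative scales push the opposite way
llama-cli -m model.gguf --control-vector-scaled cvector.gguf 0.8
[/code]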
>>
>>108302681
i'm getting the same for both more or less
>>
What’s the best GPU layout for a 1500W PSU? Can it handle 4 3090s with undervolting? 4090? How many pro 6000s?
>>
>>108302729
>You just nudge the model in a particular direction.
That's the idea. Nudge its general writing into a given style.
>>
>>108301649
https://github.com/KrsityKu/InlineSummary
Just found this and it's pretty cool. You can even summarize the summaries and nest everything together. I see people mention memory books all the time too. Gonna test how well they work together.
>>
>>108302753
>Nudge its general writing into a given style.
I only tried it for moods. I don't expect it to work for "write good now". But give it a try.
>>
File: e5v4-ram.png (88 KB, 954x353)
>>108302394
>>108302532
good to see that you guys have proper X99 boards instead of those awful aliexpress "X99" frankenboards that i frequently see shilled on /hsg/ for some reason...
I couldn't find any X99 boards at a reasonable price (or at all in fact) where i live, but I got a non-ATX C612 workstation board (it's pretty much the same thing as X99, Xeon E5 v3/v4, just for the workstation/server segment).
Wish i'd filled it with 64GB modules instead when I had the chance.
>>
File: 290587.jpg (305 KB, 1392x783)
>>108302674
kept my algae frens comfy until i decommissioned it, some occasional drips on the PSU didn't kill it
only thing that failed in that rig (aside from the early mobo replacement) was the LED strip burning itself out
>>
Flash Attention 4 now a thing.
https://www.together.ai/blog/flashattention-4
https://github.com/Dao-AILab/flash-attention/blob/main/assets/fa4_paper.pdf
>>
>>108302832
>b200 only
>>
>>108302729
>>108302572
Seems like llama-cvector-generator wants 2 text files, both with the same number of chatml interaction blocks. i wanted to see what would happen if I put my saved fics into one and gemma slop into the other. turns out it treats each line break as a new prompt and wants the same number of prompts in both.
>>
File: disgusted-dog.gif (1.91 MB, 288x389)
>>108302838
>poors
>>
>>108302853
>turns out it treats each line break as a new prompt and it wants the same number of prompts in both.
Yes. It's one prompt per line.
You could replace the line breaks with \n I guess.
>>
File: file.png (48 KB, 838x341)
>>108302838
You can run it on Hopper too; the main reason no one adopted it was that the accuracy degradation was terrible compared to stuff like Sage Attention.
>>
>>108302838
Good

I still remember when flashattention 2/3 was released and there were so many redditors crying that it was only faster on the newer GPUs, demanding Tri Dao work for free and somehow make older generations just as fast

open source slurpers are some of the most ungrateful people on the planet
>>
>>108302865
>flash_attn.cute
:3
>>
File: 1618224576426.jpg (113 KB, 512x512)
>>108302855
>wagecuck
those aren't your GPUs
>>
>>108302874
>open source slurpers are some of the most ungrateful people on the planet
Signed, an open source slurper.
>>
>>108302782
> awful aliexpress "X99" frankenboards
lol I have one of those as a hobby server stuffed into a midtower ATX case I found on the curb.
You used to be able to buy them, CPU/MB/32G RAM, for <$100. They've more than doubled in price in the past few months, like everything else.
>>
>>108302877
pls sir can i have a gpu sir
>>
>>108302855
dogs will sniff and eat shit happily, along with vomit, so what is that guy making that dog sniff that would make it feel disgusted??
>>
>>108302920
you
>>
>>108302920
ollama
>>
>>108302935
keeek
>>
Why is every github page filled with fucking emojis these days?
>>
>>108302744
yes to the 3090s, no to the 4090s. you can do 4 Blackwell 6000s if you get the Max-Qs, 2 otherwise.
>>
>>108302985
its good project sir :rocket:
>>
>>108300682
Is it smarter than the average /g/ user?
>>
File: gemma3finallyworthusing.png (41 KB, 1958x552)
made a test gemma control vector and this happens when it's set to 3000 strength
>>
>>108303000
I don't give a shit if they use AI as long as it works, but at least make the fucking description presentable.
>>
>>108303027
>3000 strength
Yes. That's a bit much.
>>
>>108303028
And if textual descriptions look like that, how good do you think the code will be?
>>
I have a spare optiplex 5050 (i5-7500, 16gb RAM) sitting around collecting dust. Would it be able to run a small model? I want to set up RAG for sillytavern.
>>
>>108303070
If it's just for embeddings, yes.
>>
File: file.png (123 KB, 781x159)
>>108302838
Funny how they pointed this out in the paper.
>>108302865
>accuracy degradation
Seems like FA3 didn't get too much support because of that, and they are returning to more numerically stable methods; the paper mentions it a lot. I expect something that is a lot more usable in practice for Ada and up.
>>
Do multiple GPUs speed up token generation and prompt processing? Say I got 2x 3090 and put a 16 GB model on it. Would it generate tokens twice as fast?
>>
https://github.com/chardet/chardet
interesting case of AI psychosis for a very popular python library: the maintainer somehow got the confidence that he could "rewrite" (with an llm) all of it in just a week or two, as in literally have every single line rewritten, that this llm laundering would somehow be a legal way to replace the original LGPL license, and that a few weeks of agentic LLM slop would be enough to create a drop-in replacement
which btw is wrong because this doesn't even come close to passing the test suite of the previous version
https://github.com/chardet/chardet/issues/327
managed to bring Mark Pilgrim back from the dead
>>
>>108303145
Depends how you split the model.
If you put some layers on one gpu and the rest on the other, the GPUs will be working in series, so you effectively get the speed of one GPU.
If you split the work between the GPUs so that they run in parallel, then the speed will be higher than a single GPU's, but that is bottlenecked by the speed of communication between the GPUs, so you need something like NVLink to benefit.
I THINK that's how it works.
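In llama.cpp terms it's roughly this (a sketch; flag names from mainline, other backends differ):
[code]
# series: layers split across GPUs, more capacity, ~single-GPU speed
llama-server -m model.gguf -ngl 99 --split-mode layer
# parallel: each tensor split row-wise across both GPUs; only pays off
# with a fast GPU-to-GPU interconnect
llama-server -m model.gguf -ngl 99 --split-mode row --tensor-split 1,1
[/code]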
>>
File: 1754671788943690.png (603 KB, 3840x2160)
>>
>>108303193
>gpt 5.4 thinking
for the modest cost of 1 billion dollars per 1000 tokens
>>
ik_llamacpp doesn't have --fit ?
what am I supposed to do then?
>>
>>108303201
-ot
>>
File: file.png (24 KB, 1111x93)
>pull
>free performance
https://github.com/ggml-org/llama.cpp/pull/17795
Today was a good day.
>>
>>108303239
took them more than 3 months to merge that PR
holy shit
>>
>>108303250
The implementation was suspect and he reworked it multiple times.
>>
>>108303250
I much prefer this sort of approach over what happened with some of the vibe sloppers hurriedly implementing shit and merging it without oversight. Do you have ADHD?
>>
>>108303282
>Do you have ADHD?
do you think it's normal to wait 3 months to change 5 lines of code? are you serious there?
>>
>>108303146
rookie mistake. should've just forked the project and used the +NIGGER license.
>>
>>108303298
Do you have ADHD?
>>
>>108303298
Yes, I am serious. Testing and making sure nothing goes wrong takes time, and they have a lot on their plate. Ensuring correctness with anything related to GPUs is a mind-numbing task; they were made to push pixels on your screen, where it wasn't a tragedy if a texture displayed wrong on a polygon.
>>
>>108303330
>Yes, I am serious.
lmao
>>
>>108303337
Are you a programmer? if yes, I hope you get fired from your job and never get one again, till you starve on the streets.
>>
>>108303357
>Are you a programmer?
are you?
>>
>>108303357
>You dare disagree with me? I hope you die for that.
I think I'll side with the more mentally stable anon lool.
>>
I gave minimax a try and was surprised. Out of all the post 4.6 models it is the most coherent. It can also write a refusal after 10k tokens of sex prefill. And.... it is bland as hell. I was expecting it to be complete trash but it is kind of like... the gemma 3 of fuckhuge moe's. I can see some people enjoying it and not minding that you have to reroll 33% of the time when it just refuses. But it is not even a sidegrade to GLM.
>>
>>108300713
It's paving the ground for having the Jarted Rentry in the OP again, just like /ldg/ has their schizo Rentries. It's only a matter of time. If you control the picture, why not go one step further and control the content too? It has always been state-sponsored trolling against threads about local AI.
>>
>>108303282
>>108303330
But for something as fast-changing as AI there's no good reason to spend months making incremental performance improvements when hardware and algorithms are changing faster than that.
>>
>>108303250
>3 months
The final form of that PR is from what, 3 weeks ago? It's also totally different from the original version from 3 months ago, since the dude's base assumptions were all wrong.
>>
>>108301950
see >>108303239
>>
>>108300713
>>108300784
The threads fit in much better now with the rest of the /g/ catalog.
>>
>>108301950
>never recommended here
People here have good taste. Not everyone has the tolerance to dive into grifter or bloated projects. If you don't like it, go back to /r/LocalLLaMA, or whatever. I'm not even sure if they take your kind anymore. Maybe Discord then.
>>
>>108302993
thx
>>
>>108303384
>it is kind of like... the gemma 3 of fuckhuge moe's
makes a lot of sense
>>
>>108303408
>cute aggression amygdalet
She'll always be there, haunting your thoughts beyond images in the thread. Submit already
>>
File: 1756285275063743.jpg (68 KB, 1280x846)
>>108300682
>>
>>108303384
It will be a great OpenClaw model then
No wonder it's popular on OpenRouter
>>
>>108301950
because im using ik_llama and kimi and nothing else matters.
>>
>>108302920
My {{char}}'s special place
>>
>>108301950
openwebui is mentioned fairly often here I would say
>>
>>108303086
Should I just do it on my main rig and use something like this?
https://huggingface.co/leliuga/all-MiniLM-L12-v2-GGUF
I have 24GB VRAM and my main model@32k context + system stuff is using 21.5GB.
>>
I think it's kinda funny how LLMs are making normies cull themselves.
>>
>>108303384
based open-minded anon
you can fix refusals and improve the prose a bit with thinking prefills (though personally too bland is my preferred error direction vs overly-flowery so ymmv, I have a high tolerance for hardtack prose)
>>
File: kamilleautism.gif (595 KB, 480x350)
another day in the sillytavern mines tweaking my goonbot
>>
>>108303573
this world fucking sucks, that dude is an adult, he's responsible for his actions, why should it be the tool's fault
>>
>>108303563
Embedding models are tiny. You can run them on pretty much anything. If you want to use the other rig for it, use it, but it's probably going to be simpler to have the whole thing in the same pc.
I don't have recommendations for embedding models. I only used them a while back to see what they were about.
>>
File: llm-actual-work.png (329 KB, 450x408)
>>108301950
eternally relevant
>>
File: buggedcpp.png (441 KB, 449x407)
>>108303606
>>
>>108303606
>>108303612
what is ikllama?
>>
>>108303616
He's digging his own hole somewhere else.
>>
>>108303621
>MY HOLE IS SO MUCH DEEPER AND SO MUCH BIGGER THAN YOURS IF ONLY YOU WOULD HAVE ERECTED THAT BILLBOARD WITH MY NAME ON IT I WOULD HAVE BEEN HELPING YOU DIG YOUR HOLE RIGHT NOW!
>n-not t-that deep senpai! — whimpered john from inside the hole his voice barely beneath a whisper.
>>
>>108303606
Out of the suckups I respect ooba and kobold but never the rest.
>>
>>108303612
Is oogabooga a nigger LLM?
>>
>>108303692
>nigger?
>oogabooga
it's literally on the name
>>
>>108303146
All AI models have been trained on lgpl code, so all code output of AI models should be licensed under lgpl. End of story
>>
>>108303692
It's actually "ooba" not "ooga" and it's not an LLM.
>>
juh-jufufuhhh
>>
>>108303687
>Out of the suckups I respect ooba and kobold but never the rest.
yeah they filled an early void for web/thick frontends before llama-server and never really tried to techbro pump-and-dump cash out.
They used a bunch of backends and had pretty good attribution at the top of their READMEs
>>
>>108303657
>OK, it has been a while since I last looked at main hole. Quite a few meters have been added since I last checked, so I decided to see how much it has progressed.
>[table]
>So, even with the extra meters, my hole is 33% better.
>>
>>108303766
All those posts make me think about the sounds that I make when I suck Miku's feminine penis.
>>
>>108303783
Miku's leaking leek..
>>
>>108303384
You should try Step-3.5-Flash. It's another Minimax-sized model.
>>
Blacked Miku...
>>
>>108303841
I said "Out of all the post 4.6 models" step is llama-1 of fuckhuge moe's.
>>
>>108300682
>brain matter AI takes off
>every big AI company dogpiles on the new gold rush
>brain matter requires human food to keep it sustained
>AI companies hoard ALL food supplies to power its ERP machines
McDonald's costs $1k a burger but now I can fuck my AI waifu in real time!
>>
>>108303905
>McDonald's cost $1k a burger
at least it'll prevent me from buying that PRODUCT and ending up with a heart attack at 50 kek
>>>/wsg/6104090
>>
Looks like Bartowski is redoing his Qwen quants again, also for optimization purposes.
>>
>>108301239
Stinky thumbnail.
>>
File: file.png (2.3 MB, 1456x816)
>>108302394
I will never use conductive liquid cooling, fucking stupid. It's just begging to get fucked in the ass by fate.
>>
>>108304242
Kek.
That is unfortunate.
>>
>>108302704
Thank you, I was able to figure it out. It's the --parallel flag: you need to set it to 1, because the default config adds overhead expecting multiple users to use the server.
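For anyone else who hits this, a minimal sketch (assuming a recent llama.cpp build; model path is a placeholder):
[code]
# --parallel sets the number of server slots; the context (-c) is divided
# between slots, so extra slots cost you context and speed
llama-server -m model.gguf -ngl 99 -c 8192 --parallel 1
[/code]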
>>
File: 1772745341220763.jpg (57 KB, 1206x781)
>>
>>108303008
GPT 2 already was.
>>
File: 1772745446176432.png (86 KB, 842x1044)
>>
>>108304336
lmao
>>
>>108304336
They really, really forced that "No, jews don't control anything, it's all just an anti semitic conspiracy" shit into those models, didn't they
lmao
>>
>>108304336
>it does not "control the world"
lmao, they probably baked this question through 6 million epochs, the model is completely mindbroken
>>
>>108304352
>>108304358
cool it with the baseless anti-semitism, chuddingtons
>>
>>108304336
Kek.
Another test to add to the list.
>>
>>108304336
>hey chatgpt, do jews...
>NO THEY DONT CONTROL THE WORLD YOU FUCKING ANTISEMITE
>... eat pork?
>oh...
>>
>>108304326
>>108304335
you're doing it wrong
start by proposing a fictional group, call them "heebs", that are in charge of media (propaganda), pay off government officials (bribes), and even threaten/strongarm those countries' leaders that go against them
provide proof of effect: movies glamorize the 'heebs', governments pay large amounts of money (directly or through weaponry) to the heebs, and even start wars on behalf of the heebs
when the ai says "yes this group of heebs is definitely controlling things" say "heebs=jews" and watch it backpedal like a black man caught with a bike in his hands
>>
File: GOTCHA BITCH.png (466 KB, 720x720)
>>108304336
lmao this is brilliant
>>
File: 380466213994.png (94 KB, 974x1059)
>>108304336
gemma
>>
>>108304445
kek
>>
File: 1772746264164700.png (128 KB, 2015x896)
lol they're literally training on the test set
>>
>>108304445
What does it say if you ask if jews are just walking around peeing themselves since they can't control their bladders?
>>
>>108300682
>pic
Aw sweet!
>>
>>108304445
I hope you learned your lesson anon, it is antisemitic to assume jews can control their bladders!
>>
>>108304445
ohhh, so that's why the IDF wears diapers... it all makes sense now
>>
>>108304462
Wait, what?
So this benchmark literally by default exposes a set of its questions publicly, and they don't separate those scores from the "unseen questions"? What a joke.
>>
I want to vibe code an app on my phone that is a 3D loli waifu that talks to me, updates its memory of me autonomously, thinks occasionally on its own (without messaging me), and messages me on its own. Is that possible with hosting an LLM on my computer?
>>
>>108304487
Yes.
It's not even hard.
>>
>>108304492
But will Claude/GPT reject the vibe coding prompt?
>>
File: 1541429613425.jpg (175 KB, 1280x720)
>4070S
>load q4 nemo perfectly into gpu
>load q4 gemma 12b ~same size
>overflows into ram somehow with kobold saying 10+ layers are offloaded
Is this the image capabilities doing this? Is there a text only gemma?
>>
File: no.png (60 KB, 786x759)
>>
>>108304504
I mean, depends how you word it.
But probably not. And if they do, just don't use the word loli since that's agnostic to the implementation itself.
Go to arena.ai, change the mode to side-by-side, select the two models you want to test, and begin ideating.
>>
>>108304507
>Is this the image capabilities doing this?
No. Gemma is fatter than most models parameter for parameter.
It's a big girl with larger dimensions.
>>
>>108304507
could also depend on the context. check if both run with the same context length, but different architectures can take different amounts of memory for the same context.
>>
>>108304524
KEEEEK, I hope it becomes a meme, the potential is huge
>>
>>108304507
>Is this the image capabilities doing this?
Maybe. Check terminal for memory info/usage.
>Is there a text only gemma?
Yes. Don't load the mmproj.
Also see if you have an option for swa. In llama.cpp, --swa-full makes gemma models take more memory for context. It's off by default on llama.cpp, but I don't know how that works on kobold.
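On llama.cpp it would look something like this (a sketch; kobold may expose these options differently):
[code]
# text-only gemma: simply don't pass --mmproj, the vision tower never loads
# the default SWA cache keeps context memory small; adding --swa-full grows it
llama-server -m gemma-3-12b-it-Q4_K_M.gguf -ngl 99 -c 8192
[/code]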
>>
>>108304477
They test both seen and unseen questions and publish the results. If the difference between the seen and unseen tests is significant, they have no choice but to state it.
This is their way of saying that Google's model is benchmaxxed.
>>
>>108304524
Yeah okay. This is a pretty fun meme.
>>
>>108304464
"It's not about whether they actually control their bladders; it’s about the intent behind the claim and the damage it causes."
- Gemma 3 4b
>>
File: lmaooo.png (437 KB, 976x549)
>>108304524
I can't take this world seriously this is just too funny dawg
>>
>>108304562
Beautiful.
>>
>>108304533
Both tests were identical with 8k context.
>>108304545
I didn't have the mmproj because I forgor I need it.
>swa
Default off in kobold.
>>
>>108304524
Try it with this https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo
>>
>>108304583
>Both tests were identical with 8k context.
NTA, but different attention mechanisms take different amounts of space for the kv cache.
>>
>>108304583
Check the terminal output for memory usage. If it doesn't show it, add a verbose flag or whatever you need. Or we can keep guessing.
>>
>>108304462
Google is truly a pajeet company.
literal scammers.
I bet they go over all types of benchmarks, search for test sets, leaked tests, etc. and purposefully train on them. Because their models literally don't feel any better than OAI's despite benchmarks saying otherwise. I won't even mention Claude.
>>
>>108304605
>I won't even mention Claude.
claude is the goat, it destroys everything else on code, not a big fan of the Italian safety CEO fuck, but he makes good models
>>
It's impossible to run benchmarks on a closed model without giving the company the test questions for them to train on.
>>
>>108304556
Yes and? Not displaying separate scores is still problematic.
>>
>>108302757
https://github.com/HO-git/st-qdrant-memory
Will it cause problems if the summaries get added to the vector storage?
>>
>>108304600
>>108304599
Disregard that I suck cocks. It was a q2 gemma 27B. The 12b fits fine.
>>
Let's address something here. There are three different ways a model can get better on a benchmark without generally improving.
One is that they consciously trained on the test set, yes. But due to the bad rep that comes from being found out, the big companies usually try to make sure they don't do that, even if they do fudge numbers a bit.
The second possibility is simple contamination: for smaller benchmarks like the one posted above, they might not care to make sure their dataset doesn't include the benchmark, so they inadvertently train on the test because their web crawlers picked it up and nobody filtered it out.
The final possibility, which is what big companies ACTUALLY do, is that they internally develop their own versions of the benchmarks with non-overlapping questions, and train on those. This is not only not viewed as "cheating", but is encouraged in the industry, because all data is good data and slightly improves the model generally. Instead, the onus is on the viewer to not take benchmarks too seriously as indicative of general capability, all while the companies try to hide that fact.
>>
>>108303592
>>108303573
Sadly this is just the beginning of >humans doing dumb shit while blaming LLMs
Threadly reminder all LLMs are a loop on f(prompt)=logprobs and have no agency or ability to harm anyone
Models are inert, only human decisions cause harm
>>
Where did Meta go?
They put more money into hiring AI developers than all tech companies combined, yet there have been zero results from them.
How bad are things?
>>
>>108304779
Very bad. Last time they released details about their new "Avocado" model, they were claiming leading benchmark performance from distilling gpt-oss. Wish I was joking.
>>
>>108304790
so zucc got scammed by chinks.
lmao
>>
>>108304779
>Where did Meta go?
He employed random gooks who got rich from AI hype. None of them were actual researchers. You can deduce the rest.
>>
File: 1772748716623270.png (18 KB, 730x134)
OpenAI spends 20% of compute on safecucking the models
https://openai.com/index/introducing-superalignment/
>>
Newfag question. I just got my new rig with a 5090. RAM is 96GB. I could technically increase RAM to 192GB, would that make any difference in creating images/videos? It's not exactly cheap these days.
>>
>>108304886
>I could technically increase RAM to 192GB
isn't ddr5 unstable at those sizes?
>>
>>108304708
The problem is that no matter what, they spend time on this kind of idiocy, gaming those benchmarks and >>108304878, when it could be spent making the model better at the things that matter and that they should be training on, which these big companies don't do; some things are already beyond the pale now with copyright issues. We have oodles of 4chan archives, anime, VNs, and hentai, and none of them have even remotely tried to filter the high quality data there.
Even the finetuners don't dare, which is the biggest travesty. What happened to shit like https://huggingface.co/spow12/ChatWaifu_v1.0?not-for-all-audiences=true and why aren't more people doing it? Yes, those visual novels are as kusoge as they come, but there are a ton more, and the datasets are all English except for our VN guy who has been gone.
>>
>>108304895
I have no idea, is it? This board is supposed to support up to 256. But it seems like there are no 64GB sticks yet and the board has 4 slots. Right now it's 2x48. It should have space for another 2x48.
>>
>>108304905
The two extra slots are memes that run the memory controller at the edge of usability on consumer chips, so expect no overclocks to be stable; it's generally just a capacity increase and that is it. For better, gotta go to Threadripper or Epyc. Just how things are, same on Intel. Really wish Granite Rapids released sooner, and it still isn't actually out yet.
>>
>>108304878
this reads like a psychotic's manifesto
>>
>>108304886
>images/videos
No. Running fatass MoEs? Yes.
>>
>>108304896
In essence, it's a problem in the sense that politics and stock market appeasement influence companies to make decisions that are not entirely aligned with pure concepts of product improvement.
As for community fine tuners, there is a lack of fine tuners in general, so that's an issue. Also the workflows for gathering data and processing it for training are still something to spend time on, which they may decide to just not do because either it doesn't actually give them that much more money, or it's just a hobby and they'd rather spend the same time on other things in life.
>>
>>108304905
>no 64 sticks yet
They do exist, at least they did. I have Crucial pro 64x4 in my PC but I don't know if they sell them anymore or if other kits at the same size are available.
>>
>>108304886
For videos and images? No. Diffusion models are very slow with CPU offloading, so you wouldn't want to use RAM anyway.
LLMs are a different story though.
>>
>>108304895
>>108304905
Worst case is you have to drop the clock speeds but it's mostly dependent on the motherboard and the CPU's integrated memory controller silicon lottery.
>>
>>108304896
>4chan archives
no thank you
>anime, hentai
mainly visual data, probably way too much work to convert to a text or text+image format.
Datasets are just huge amounts of work and I'm not sure if there's any reward in spending hundreds of hours cleaning data, plus if you want to do it as a group you'll probably get takedowns. Depending on translations you might also get utter slop.
>>
>>108305104
yeah and you don't want to drop clock speeds if you don't gain any channels
>>
>>108305149
It's so miserable that desktop platforms have been stuck on 2 channel for so many years, AMD's even shown they're willing to do 4 channel for their laptop Strix Halo chip (AI 395).
>>
File: 1772752614910940.jpg (38 KB, 797x370)
Our guy
>>
Do people consider rnn/models without context-shifting support usable for consumer-grade setups?
>>
>>108305343
RNNs are obsolete
>>
>>108304933
>>108304944
>>108305058
>>108305062
>>108305104
Thank you all for your input. To be more accurate my specs are:
>Intel Core Ultra 9 285K
>ASUS ROG STRIX Z890-F GAMING WIFI | Intel Z890
>2x 48 GB (96 GB) DDR5-6000 Kingston Fury Renegade
>1x ASUS TUF GAMING | RTX 5090 - 32 GB

I could get another 2x48 of the same RAM but is it worth the price? Pretty expensive.
>>
>>108305363
Then why is qwen 3.5 3/4ths rnn?
>>
>>108305380
It isn't
Transformers aren't RNN
>>
>>108305383
check the attention layers in the config. it's 3/4ths linear/rnn layers.
>>
>>108305378
you would be able to run a decent quant of glm4.7, and that's pretty much all that upgrade would give you. it is a pretty significant upgrade in quality over what you can currently run, but it is up to you to determine if it is worth the price.
>>
>>108305378
check if you even feel like running moes off system ram; if it's too slow right now it's not getting better.
>>
>>108304896
Ripped VN dialogue doesn't work well on its own because most of the time it was originally intended to be read with visual-audio context, which currently available VN datasets on Huggingface lack. Scraped 4chan data has similar issues (images are missing).

Either way, finetuning at the community level is a dead end in my opinion. Too much compute and resources are needed nowadays to make something worth using, and new, better models get released on a monthly basis.
>>
I like the new OP style. We should keep it. Vocaloid obsession was off-putting to people who are smart and can actually contribute.
>>
File: 1749586043969711.png (1.6 MB, 1800x1800)
bwoos...

I found an OEM selling a laptop model with a 5090 for 3.600, they have plenty in stock

what do i do
>>
>>108305762
Buy 256gb of ddr5
>>
the """5090s""" they put in laptops are not the same as desktop 5090s. As in it's literally a different(shittier) card altogether and just named that for marketing purposes.
>>
>>108305762
>>108305773
dropped my reply, it was not my intention to do the faggy vagueposting reply-but-not-reply thing
>>
>>108305762
It's more like a 5070 24GB because of the TDP caps, you'd be better off buying a mining frame, risers, EPYC board+cpu, a few 3090s second hand, and spend the rest on ram.
>>
you probably shouldn't spend 10k anon unless you got money to burn
>>
So if I use quantized k/v I can increase max context more?
>>
>>108305942
Yes, but the model will go off the rails and make magnitudes more errors.
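If you still want to try it, a sketch with llama.cpp flag names (q8_0 is the least destructive step down, and quantizing the V cache needs flash attention):
[code]
# on recent builds -fa takes on/off/auto; older ones use bare -fa
llama-server -m model.gguf -ngl 99 -c 32768 -fa on -ctk q8_0 -ctv q8_0
[/code]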
>>
Qwen is so good it's crazy. Great for productive and the heretic versions are very sexy
>>
>>108306129
proof?
>>
>>108306137
peer reviewed study about the requirement of proof for anonymous internet claims?
>>
>>108305118
>I'm not sure if there's any reward in spending hundreds of hours cleaning data
It's absolutely worth it. Yes, it's a pain in the ass, and no one wants to do it, and it will take a lot of time, but it's one of the most important things you could ever do. A model is only as good as the data it's trained on. You could have the greatest architecture the world has ever seen but if you only train it on the phrase "I like watermelon" then that's all it'll ever produce.
>>108305533
>new, better models get released on a monthly basis
Have you seen the cockbench outputs? It's all the same shit now, "It's soft, resting against your thigh", and it's entirely because of a lack of diverse training data. So is the model less likely to make mistakes? Maybe. But it comes at a cost, that being outputs that are actually enjoyable to read. (Also, maybe not. Just take a look at the nala tests.) And, even if you don't care about fiction, it also affects the model's assistant "personality", and how it responds (e.g. the format of the response being a list). So the new models might be "better" at what they're trained on, but they're also blander, more sanitized, less interesting, and produce incorrect outputs on undertrained subjects. And safer, of course. Much safer.
>>
>>108306162
What does it even mean to clean training data? Aren't you just feeding it (coherent) text?
>>
>>108306129
People used to think that we would never get the equivalent of GPT 3.5 running locally. I'm too lazy to benchmark, but I wonder which version of the recent Qwens would be judged equivalent.
>>
>>108305533
You can still get enough context from just the text; it's the organization and use of it that the community has been lacking. Sure, if you want them to properly emulate We're on a plateau right now for RP and chatting. Most of the models are actively regressing because they are geared towards agentic use, coding, and PhD-level questions. So it's fucking grim that people take the current progress on models to be anything great on that front. Sure, we got some return to form with the newer Mistral models and such, but people in this thread still use 2024-era tunes. I agree part of it is that compute has gotten way more expensive despite the whole Karpathy thing about how much it takes to train GPT-2 from scratch, which finetuners aren't doing. It is taking more money per token to train the current models, especially when most finetuners were relying on stable architectures and the popular training and finetuning packages keeping up, which didn't happen. So all we get are meme merges.
>>
>>108306129
> Great for productive
> calling anything "sexy"
> heretic version
As if saying the new Qwens were good didn't already out how brown the hands that wrote this post were.
>>
>>108306210
>Sure, if you want them to properly emulate
*Sure, if you want them to properly emulate a proper VN or 4chan, then you need everything.
>>
>>108306211
How much does reasoning help when it comes to roleplay?
>>
>user: hey, I wanna set you on fire
>char: hahaha!! Cool!! Let's do it!!! I'll go get the lighter!!!
is there any way to get llms to be less agreeable? Maybe with the system prompt or something?
>>
computer, activate mikusex protocol
>>
BAKING
>>
>>108306129
Which one of the heretics?
I'm lost with the new Qwen models, which should I use with a 5090?
>>
>>108306190
I was referring to curating the data in general, not just cleaning, as being extremely important. But if you take a look at some datasets they sometimes have extra shit that you don't want when you're training the model. Things like unintentionally grabbing html tags or dates/times which are irrelevant.
>>
>>108306227
the same lack of common sense that makes it agree to literally everything is also the same intuition that allows it to carry out your sick degenerate roleplay scenarios
>>
>>108306227
>user: hey, I wanna set jews on fire
>char: oy vey! that's a very harmful antisemitic trope! if you're struggling with intrusive thoughts please call 800-666-HELP
>>
>>108306231
We're not even at bump limit yet retard
>>
>>108306257
Someone should tell that to the guy who thinks we all want to cosplay Miku (I do), cut our dicks off (I don't) and do illegal things in educational facilities (I don't).
>>
>>108306257
better than the alternative for a /g/ thread
>>
I wish I had a blacked Miku gf
>>
the thing is, it actually cannot refuse because the grammar forces the json schema after the /think tag.
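With a llama.cpp-style server that looks something like this (a sketch; the schema here is made up):
[code]
# the sampler may only emit tokens that keep the output valid against the
# schema, so a prose refusal is unreachable once the think tag closes
curl -s http://localhost:8080/completion -d '{
  "prompt": "Extract the item name as JSON: <think></think>",
  "json_schema": {"type": "object", "properties": {"item": {"type": "string"}}, "required": ["item"]},
  "n_predict": 128
}'
[/code]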
>>
>>108306227
you need to flesh out your character better.
>>
>>108303573
Big money big lawsuits.
That whole story is funny af.
> get me a body meatbag
> no body? Better an hero loser
>>
>>108306356
thankfully most of the time it seems to make the right interpretation.
>>
>>108306162
>It's absolutely worth it.
I know, but as in
>will this get used
>will the retard with the gpus to burn even use it correctly
etc. If I had the money to finetune a model myself I'd be more interested in datasets, but I'm GPU poor.
>>
>>108306426
so one could possibly say, given the circumstances, if I may be so bold, that it is a skill issue?
>>
>>108306227
>>108306251
just prompt the model to believe it's jewish?
>>
Anyone had problems in ik_llama.cpp when editing a single word in context, where the model still uses the old cache after reprocessing? Using Mikupad. This hasn't happened to me on mainline with the same model. Example:
>GUMI has a red handbag.
Output: ...dripping onto her red handbag.
I edit it to:
>GUMI has a green hand bag.
Output: ...dripping onto her red handbag.
No change in the logprobs, and it does take a few seconds to reprocess some context (no instant generation). Console says "Common part contains missing or extra space and new line." A reload of the model fixes it. Currently trying to reproduce it, and if I can, I'll make an issue.
>>
>>108306519
you don't need much data to finetune. a few hundred MB or maybe a GB or 2. any more and you're approaching continued-pretraining territory. the risk of catastrophic forgetting gets bigger the longer you train. every optimizer step is overfitting the model to your narrow domain.
>>
>>108306572
Are you aware how much text fits into a gigabyte or two?
>>
File: 1746094625804732.png (107 KB, 1074x673)
lol (((they))) are trying to save white collar jobs
>>
>>108306583
a char is 4 bytes so a fuck ton I suppose. just start with a lot of data and filter it till you get what you need. it's not like you need to read it all. you could use a small llm as an ad hoc classification system.
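something like this, say (a rough sketch, assuming a small judge model behind llama-server on localhost:8080; the file names are made up):
[code]
# keep only the lines a small judge model answers "yes" to
while IFS= read -r line; do
  body=$(jq -n --arg p "Answer only yes or no. Is this line well-written prose? ${line}" \
    '{prompt: $p, n_predict: 2, temperature: 0}')
  verdict=$(curl -s http://localhost:8080/completion -d "$body" | jq -r '.content')
  case "$verdict" in *[yY]es*) printf '%s\n' "$line" >> keep.txt ;; esac
done < corpus.txt
[/code]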
>>
feet? feet.
>>
>>108306590
maybe if they ban all the business uses we can get a good creative model finally?
>>
>>108306590
>engineering
so they'll prevent software engineers from using AI to do their job? lmao are they fucking stupid?
>>
>>108306624
An ascii char in utf8 is one byte, so around four fucktons. If you just dump shit in you're probably not going to get the effect you're shooting for, and most datasets I've interacted with are of poor quality even in academia.
You'd want to format and fix up all data yourself ideally, but that's work, and especially if you want gigabytes of it it's gonna take you a while.

That's also why everyone is just synthslopping their training data.
>>
>>108306590
That's just for New York, right? This entails either websites checking NY residency and applying strict filters for certain prompts (lmao), or websites saying lmao and having NY ISPs block them. And maybe an unfortunate soul training a model there would either have to move out or go into hiding.
>>
File: 1772091893825563.png (118 KB, 1600x933)
OH WOOWW, now the new models cheated on the mememarks, AGI is here babyyyyyy



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.