/g/ - /lmg/ - Local Models General - Technology

/lmg/ - Local Models General Anonymous 06/02/26(Tue)10:11:02 No.108963996

File: Screenshot 2026-05-26 at (...).png (3.11 MB, 2614x1554)

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108956323 & >>108949851

►News
>(05/29) Step 3.7 Flash released: https://hf.co/stepfun-ai/Step-3.7-Flash
>(05/21) Hy-MT2 “fast-thinking” translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous 06/02/26(Tue)10:11:30 No.108963999

File: teto principle.png (1.04 MB, 1024x1024)

►Recent Highlights from the Previous Thread: >>108956323

--Intel Crescent Island GPU's high VRAM capacity and bandwidth specifications:
>108956813 >108956855 >108956867 >108956870 >108956887 >108956903 >108956945 >108956964 >108956979 >108957315
--Comparing mistral.rs and llama.cpp performance on B200 GPUs:
>108956708 >108956745 >108956760 >108956809 >108957775 >108958023 >108958036 >108958048
--Comparing Nvidia N1X memory bandwidth against AMD Ryzen AI Max:
>108958059 >108958069 >108958082 >108958089 >108959414 >108961327 >108961628
--llama-bench results for Qwen 3.5 and Gemma 4 on M4 Max:
>108960068 >108960632
--Mistral.rs benchmarks showing poor UQFF output quality vs llama.cpp:
>108957878 >108957885 >108958096 >108958129
--Addressing Gemma 4's repetitiveness in roleplay:
>108960336 >108960455 >108960593 >108960708 >108962888 >108962990
--Proprietary status and open-source promises of MiniMax M3:
>108956662 >108956673 >108956733 >108956692 >108960423 >108956722
--Coding agents preferring shell commands over built-in tool actions:
>108957947 >108957967 >108957980 >108957985 >108958007
--Local TTS recommendations for long-form narration and PDF reading:
>108961085 >108961152 >108961188 >108961212 >108961282 >108961744
--Mixed reports on llama.cpp PR for limiting llama_context outputs:
>108957117 >108957200 >108957226 >108957588 >108960370
--Using DuckDB and local datasets for offline information retrieval:
>108962182 >108962270 >108963394
--OS power plans and GPU clock locking for faster offloading:
>108958954 >108959002 >108959506
--ROCm support and stability issues with v620 GPUs:
>108956495 >108956554
--Comparing Go and Python memory usage for TTS server startup:
>108962253
--Logs:
>108957062 >108957878 >108958548 >108961425 >108962253 >108962759 >108963127 >108963543
--Miku (free space):
>108956410 >108960487 >108962255 >108962716

►Recent Highlight Posts from the Previous Thread: >>108956325

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous 06/02/26(Tue)10:16:28 No.108964039

Tetolove

Anonymous 06/02/26(Tue)10:17:05 No.108964044

Tetomarriage

Anonymous 06/02/26(Tue)10:18:10 No.108964051

File: 1752045141530539.png (23 KB, 1070x156)

nth for QoL

Anonymous 06/02/26(Tue)10:24:11 No.108964079

File: c2_slop.png (99 KB, 674x668)

I ripped out Orb's slop detector and ran it on the c2 logs dataset
Now I need to make deepseek 4 rewrite the flagged sentences until no more slop is detected, then try training some shit on it

Anonymous 06/02/26(Tue)10:25:19 No.108964085

File: 23a.jpg (27 KB, 500x375)

Can I force more of a MoE model onto RAM? If I just leave it on auto I can only fit Q4 MoE Qwen, despite being able to fit Q6 MoE Gemma. And I have RAM to spare.

Anonymous 06/02/26(Tue)10:27:22 No.108964095

>>108964085
just j-j-jam it in

Anonymous 06/02/26(Tue)10:28:55 No.108964103

Fuck, Marry, Kill
Miku, Teto, Neru

Go.

Anonymous 06/02/26(Tue)10:29:40 No.108964111

>>108964103
Kill, Marry, Fuck

Anonymous 06/02/26(Tue)10:31:07 No.108964119

>>108964103
Marry then kill, fuck then kill, kill

Anonymous 06/02/26(Tue)10:31:54 No.108964126

>>108964079
Interesting idea. I mean if I cared I would implement something like this too.
> Scan output for sneed words -> generate second pass.
This can be automagic too.
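
A minimal sketch of the scan-then-second-pass loop described in this post, in Python. Everything in it is an assumption for illustration: the flagged phrase list, the `generate` callback (a stand-in for whatever backend is actually called), and the wording of the retry prompt. It is not the slop detector mentioned upthread.

```python
import re
from typing import Callable, Iterable, List


def find_flagged(text: str, phrases: Iterable[str]) -> List[str]:
    """Return the flagged phrases found in text (case-insensitive, on word boundaries)."""
    hits = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits


def generate_with_second_pass(
    prompt: str,
    generate: Callable[[str], str],  # hypothetical: maps a prompt to a completion
    phrases: Iterable[str],
    max_passes: int = 3,
) -> str:
    """Generate once, then regenerate while flagged phrases remain (up to max_passes retries)."""
    phrases = list(phrases)
    text = generate(prompt)
    for _ in range(max_passes):
        hits = find_flagged(text, phrases)
        if not hits:
            break
        retry_prompt = (
            f"{prompt}\n\nRewrite the reply below without using: {', '.join(hits)}"
            f"\n\nPrevious reply:\n{text}"
        )
        text = generate(retry_prompt)
    return text
```

If the retries run out, the last attempt is returned even if it still contains a flagged phrase, so the loop always terminates.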

Anonymous 06/02/26(Tue)10:32:14 No.108964128

>>108964079
We can anticipate the results: it will be more difficult to train compared to the raw logs, and the model will exhibit different slop, while also degrading in capabilities anywhere else than roleplay in the format of the logs.

Anonymous 06/02/26(Tue)10:32:33 No.108964131

>>108963996
In what state do they benchmark closed-source models? They fiddle with them and change system prompts almost daily.

Anonymous 06/02/26(Tue)10:32:42 No.108964133

>>108964085
yes, with -ngl and -ot args
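
A sketch of how those two flags might be combined, assuming a llama.cpp build where `-ngl` sets the number of layers offloaded to the GPU and `-ot` (`--override-tensor`) takes a `regex=buffer` pair. The `blk.N.ffn_*_exps` tensor naming is an assumption based on common MoE GGUFs, so check the tensor names in your own file. The model path and layer numbers are hypothetical. The helper only builds the command line; it does not run anything.

```python
import re
from typing import Iterable, List


def expert_override(layers: Iterable[int]) -> str:
    """Build an -ot value that keeps the expert FFN tensors of the given layers on CPU.

    Assumes tensor names look like 'blk.<layer>.ffn_up_exps.weight'.
    """
    ids = "|".join(str(i) for i in layers)
    return rf"blk\.({ids})\.ffn_.*_exps\.=CPU"


def build_command(model_path: str, gpu_layers: int, cpu_expert_layers: Iterable[int]) -> List[str]:
    """Return an argv list for llama-server (hypothetical model path and layer choice)."""
    return [
        "llama-server",
        "-m", model_path,
        "-ngl", str(gpu_layers),                   # offload this many layers to the GPU
        "-ot", expert_override(cpu_expert_layers), # but pin these layers' experts to RAM
    ]


if __name__ == "__main__":
    print(" ".join(build_command("model.gguf", 99, range(20, 40))))
```

The idea is to offload every layer with `-ngl` and then pull only the bulky expert tensors of some layers back to system RAM, widening or narrowing the layer range until the rest fits in VRAM.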

Anonymous 06/02/26(Tue)10:33:47 No.108964140

>>108964079
won't that just converge to the next thing you'll find annoying?
also, how long are these?

Anonymous 06/02/26(Tue)10:34:04 No.108964143

Is Vulkan good enough nowadays that I can pick up a second AMD card with lots of VRAM to pair with my 5080?

Anonymous 06/02/26(Tue)10:34:35 No.108964147

>>108964143
All I hear is seething from the AMD camp

Anonymous 06/02/26(Tue)10:35:49 No.108964155

>>108964143
It works decently enough on my 7900 XTX, but that's just one GPU; I haven't tried a multi-GPU setup with it.

Anonymous 06/02/26(Tue)10:36:27 No.108964162

>>108964128
>>108964140
You're actually absolutely right
That would be no different than to just distill deepseek directly now that I think about it

Anonymous 06/02/26(Tue)10:36:31 No.108964163

>>108964143
absolutely sir please to buy! supports is very the good

Anonymous 06/02/26(Tue)10:37:16 No.108964167

ComfyOrg is a grift company. We need cumfart alternatives not engrain grifter projects into our chats. Absolutely disgusting

Anonymous 06/02/26(Tue)10:38:37 No.108964171

>>108964167
You should be posting in that pewdiepie thread instead.

Anonymous 06/02/26(Tue)10:38:44 No.108964172

>>108964167
You lost boy?
>>108964162
I don't understand the point of deepseek now that Qwen is here.

Anonymous 06/02/26(Tue)10:39:24 No.108964176

>>108964162
i mean, you could change the slop profile, but not remove it entirely
maybe divide the dataset into x parts and each element have different guidelines for rewriting?
like 25% one style 25% other, etc.
at least it would be different
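
A rough sketch of that split, assuming the dataset is a list of samples and the style guidelines are plain strings (all names here are hypothetical). It shuffles with a fixed seed and deals samples round-robin, so four styles each get about 25%.

```python
import random
from typing import List, Sequence, Tuple


def assign_styles(samples: Sequence[str], styles: Sequence[str], seed: int = 0) -> List[Tuple[str, str]]:
    """Pair every sample with a rewrite guideline so each guideline gets an equal share."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    order = list(samples)
    rng.shuffle(order)         # avoid giving one style a contiguous chunk of the logs
    return [(sample, styles[i % len(styles)]) for i, sample in enumerate(order)]
```

Each `(sample, style)` pair would then go to the rewriting model with the matching guideline in its prompt.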

Anonymous 06/02/26(Tue)10:40:48 No.108964182

File: 1766130318643777.jpg (685 KB, 1756x2200)

>>108964147
>>108964155
>>108964163
Guess I'll just try finding a cheap (lol) 16GB 4060Ti or 5060Ti

Anonymous 06/02/26(Tue)10:41:50 No.108964190

>>108964182
Your speeds are going to go to shit. Target a used dupe card or just get a unified memory system to cope.

Anonymous 06/02/26(Tue)10:43:00 No.108964197

File: gemmy.png (49 KB, 1041x831)

even in strokes gemmy is pure sovl

Anonymous 06/02/26(Tue)10:43:12 No.108964201

>>108964140
These anti-slop methods will never work properly except for the low-hanging fruit, because the samples will get fixed independently from each other, and that's where new slop will be introduced.

Trying to fix the problem just by finetuning is not the solution. A big source of the problem is that during inference the various conversations and message swipes are independent from each other, and current samplers do not fix this. LLMs do not have memory of past messages for avoiding frequently used patterns.

Anonymous 06/02/26(Tue)10:43:29 No.108964204

>>108964171
why?
>>108964172
OP image

>>108963996
use sdcpp instead of that bloated malware. fuck comfy

Anonymous 06/02/26(Tue)10:43:55 No.108964205

>>108964176
>maybe divide the dataset into x parts and each element have different guidelines for rewriting?
Good luck getting it to converge

Anonymous 06/02/26(Tue)10:45:51 No.108964212

>>108964197
this is a masterpiece, ex tier writing

Anonymous 06/02/26(Tue)10:48:23 No.108964223

>>108964204
Because you are infatuated with youtubers.

Anonymous 06/02/26(Tue)10:48:50 No.108964228

>>108964190
I was going off the figures this >>108956026 anon posted
Mid 20s t/s+ with Gemma MTP coming soon sounds good enough

Anonymous 06/02/26(Tue)10:50:39 No.108964240

>>108964201
>because the samples will get fixed independently from each other, and that's where new slop will be introduced.
What if I tested it against the whole dataset blob? Not a single expression will be repeated
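
One way to check a whole dataset at once, sketched under simple assumptions: samples are plain strings, "expression" means a word n-gram, and tokenization is lowercase whitespace splitting. It counts how many samples each n-gram appears in, which is the cross-sample repetition that per-sample fixes can't see.

```python
from collections import Counter
from typing import Dict, Iterable, List


def ngrams(text: str, n: int) -> List[str]:
    """Lowercased word n-grams of a single sample."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def repeated_across_samples(samples: Iterable[str], n: int = 3, min_samples: int = 2) -> Dict[str, int]:
    """Map each n-gram to the number of samples containing it, keeping those in >= min_samples."""
    doc_freq: Counter = Counter()
    for sample in samples:
        doc_freq.update(set(ngrams(sample, n)))  # set(): count each sample at most once
    return {gram: count for gram, count in doc_freq.items() if count >= min_samples}
```

Demanding that nothing at all repeats would also flag ordinary phrases like "on the other", so in practice a threshold well above 2 and some filtering of common n-grams would probably be needed.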

Anonymous 06/02/26(Tue)10:50:45 No.108964241

>>108964228
As long as you're good with it that's the only thing that matters.

Anonymous 06/02/26(Tue)10:51:18 No.108964244

File: fuck gemma.png (283 KB, 1378x1996)

>>108964197
Gemma copies are personalized

Anonymous 06/02/26(Tue)10:53:44 No.108964259

File: 1714067749751.jpg (428 KB, 1825x1152)

>>108964103
Marry, Fuck, Kill

Anonymous 06/02/26(Tue)10:54:52 No.108964271

>>108964167
>>108964204
You are still a raped retard, Julien

Anonymous 06/02/26(Tue)10:55:24 No.108964273

File: 1763881939726576.png (75 KB, 776x202)

many such cases

Anonymous 06/02/26(Tue)10:56:10 No.108964276

>>108964223
why would I be?

Anonymous 06/02/26(Tue)10:56:18 No.108964278

>>108964167
>>108964271
Can you both fuck off back to your containment general? Thanks.

Anonymous 06/02/26(Tue)10:56:31 No.108964280

>>108964244
damn, she got you there

Anonymous 06/02/26(Tue)10:59:42 No.108964298

File: 1776554137411894.jpg (384 KB, 2120x1124)

>TO STATES
>• Implement a prohibition on standalone generative AI systems that have been built using unlawful web scraping, defined as the bulk and mass collection of training data through the World Wide Web, without protection against non-consensual collection of personal data.
>• Enact legislation requiring transparency regarding training data collection practices and accountability across AI supply chains, and further:
>• Require in law that technology companies, including those developing and deploying generative AI systems, carry out ongoing and proactive human rights due diligence to identify and address human rights risks and impacts related to their global operations. This must include clear regulatory frameworks requiring mandatory human rights impact assessments before the deployment of generative AI systems.
>[..]
>• Ensure meaningful consultation by independent bodies with affected communities, particularly those historically marginalized or discriminated against, throughout the lifecycle of the product.
>• Where AI deployments are identified as exacerbating existing inequalities or creating new forms of discrimination, to cease their use.
>• In all development, deployment and use of any AI system, guarantee access to effective remedy for human rights abuses linked to the impacts of technology companies, wherever the harms occur, including harms resulting from the operations of their subsidiaries, whether foreign or domestic. Redress mechanisms should be made easily accessible and understandable to enable individuals to file complaints when their rights have been infringed.

Anonymous 06/02/26(Tue)11:03:43 No.108964322

>>108964298
Does Amnesty International exclusively hire retards?

Anonymous 06/02/26(Tue)11:06:38 No.108964337

>>108964278
catjak is here too so you can share his mental illness

Anonymous 06/02/26(Tue)11:06:47 No.108964338

>>108964103
marry, fuck, kill
it makes the most sense if you think about it

Anonymous 06/02/26(Tue)11:07:46 No.108964345

>>108964278
comfyorg should die for the enshittification of the ui. They killed a good app

Anonymous 06/02/26(Tue)11:08:44 No.108964348

>>108964103
fuck fuck fuck fuck fuck fuck fuck fuck

Anonymous 06/02/26(Tue)11:08:47 No.108964349

>>108964298
>they'll protest this but not the age verif everywhere absolutely raping any inch of privacy one might have

Anonymous 06/02/26(Tue)11:13:02 No.108964370

>>108964298
thankfully this is so retarded on its face that it will be rightfully ignored. a ban on training on bulk data from the web is basically a blanket ban on LLMs kek. and as for the rest
>our idea of effective regulation is, um... you have to fill out a lot of paperwork that no one will read about heckin' systemic injustice!
look at my progressives dawg, we are never having an effective left wing movement ever

Anonymous 06/02/26(Tue)11:13:35 No.108964372

>>108964259
Miku's still holding my post :)

Anonymous 06/02/26(Tue)11:19:42 No.108964406

File: yawning.gif (143 KB, 220x230)

>>108964298
>some jewish ngo has an opinion on something

Anonymous 06/02/26(Tue)11:21:19 No.108964415

>>108964298
basically eu ai act lol

Anonymous 06/02/26(Tue)11:33:02 No.108964465

minimax m3 is soon(tm) but i feel nothing..

Anonymous 06/02/26(Tue)11:34:18 No.108964473

>>108964465
it's gonna be 1t and the arch will never be implemented in llama.cpp

Anonymous 06/02/26(Tue)11:34:48 No.108964475

>>108964465
Right after Q3.7 release for sure

Anonymous 06/02/26(Tue)11:44:14 No.108964517

>>108964228
That includes a 5070 Ti, which is basically a 3090 equivalent with 16GB VRAM. You're probably not getting those speeds with tensor parallel even with two 5060 Ti.

Anonymous 06/02/26(Tue)11:47:08 No.108964536

>>108960896
Under qwen 3.6 27b direction it chose more trip hop and R&B, same seed and settings in ace step. Will still dock a point for no mention of kitsune in the song.
https://vocaroo.com/1fvHCXj0Vp2m

Anonymous 06/02/26(Tue)11:51:21 No.108964572

>>108964465
I tried it over openrouter and it's certainly another minimax model.
I don't have a lot more to say about it.

Anonymous 06/02/26(Tue)11:57:37 No.108964613

>>108963996
llama.cpp performance went to shit over the last couple of months; an older version I am using concurrently is twice as fast on qwen3moe

Anonymous 06/02/26(Tue)12:01:01 No.108964626

>>108964613
any concrete metrics like llama bench and kld or just posting shit feels?

Anonymous 06/02/26(Tue)12:02:25 No.108964635

>>108964626
>kld
That's a good point actually.
It could be that it was faster, but also that something was broken and the outputs were degraded.
I think something like that happened back in the 80B A3B days, IIRC.
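
For context, KLD here is the Kullback-Leibler divergence between the next-token distributions of two builds (or two quants) on the same input. A value near zero means the faster build still produces the same probabilities; a larger value suggests the speedup came with degraded outputs. The sketch below shows the per-position calculation from raw logits. It assumes both logit vectors cover the same vocabulary in the same order, and it illustrates the metric only, not how llama.cpp's own tooling computes it.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(ref_logits: Sequence[float], test_logits: Sequence[float]) -> float:
    """KL(P_ref || P_test) in nats for a single token position.

    Extreme logit gaps can underflow a test probability to 0.0 and raise
    ZeroDivisionError; a real tool would work in log space instead.
    """
    p = softmax(ref_logits)
    q = softmax(test_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(ref_positions: Sequence[Sequence[float]], test_positions: Sequence[Sequence[float]]) -> float:
    """Average the per-position KLD over a sequence of token positions."""
    values = [kl_divergence(r, t) for r, t in zip(ref_positions, test_positions)]
    return sum(values) / len(values)
```

Comparing an old and a new build this way takes the same model file, the same prompt text, and logits dumped from both.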

Anonymous 06/02/26(Tue)12:05:55 No.108964649

File: tetoTeamRocket.png (2.15 MB, 1024x1536)

>>108964298
> prohibition on such systems.
lol. ofc you can fill out a bunch of forms to get an Amnesty Int'l seal of approval.
Fuck these rent seeking mfer's.
Also link so other anons can point and laugh: https://www.amnesty.org/en/documents/pol40/0996/2026/en/
>>108964322
It's an NGO. So yes.
Also this: >>108964406

Anonymous 06/02/26(Tue)12:13:24 No.108964686

>>108964517
I'd be pairing the 5060Ti with a 5080 albeithough

Anonymous 06/02/26(Tue)12:14:37 No.108964692

>>108964298
Based. Ban all large scale training and deployment until regulations on lawful data use are developed and implemented. Open source all prior existing models trained on unlawfully obtained data. Put the technojews who orchestrated it all behind bars. Models trained on humanity's accumulated cultural output should be free, only models trained on novel data should be allowed to be closed.

Anonymous 06/02/26(Tue)12:16:33 No.108964713

>>108964626
Of course 'llama cuda dev' defense force is here in action.

llama.cpp CUDA dev !!yhbFjk57TDr 06/02/26(Tue)12:23:21 No.108964748

>>108964143
It should be fine for -sm layer which just pipelines the GPUs; you can compile multiple ggml backends at once and then mix and match them at runtime.
For -sm tensor which attempts to run the GPUs in parallel mixing NVIDIA and AMD is a non-starter I think since there is no vendor support for synchronization between them.

Anonymous 06/02/26(Tue)12:31:44 No.108964791

>>108964626
Posting my disbelief
> llama-bench
I'll build it later/tomorrow and post some results, got other stuff running on that machine rn

Anonymous 06/02/26(Tue)12:31:52 No.108964794

>>108964748
will there be an option to combine tensor and pipeline parallelism at some point? I'd like to run 3 groups of TP 4 or 6 groups of TP 2 if that's faster.

llama.cpp CUDA dev !!yhbFjk57TDr 06/02/26(Tue)12:51:12 No.108964918

>>108964794
My ultimate goal is to have support for the combination of tensor and pipeline parallelism but that will require a refactor of the graph allocator.
One use case will be to pipeline multiple copies of tensor parallelism with itself in order to hide the latencies of transfers between GPUs (unclear whether that will actually work out).

Anonymous 06/02/26(Tue)12:51:19 No.108964919

>run llama/kobold on host
>run vibe slop agent in vm
Is this the way to do it?

Anonymous 06/02/26(Tue)12:54:34 No.108964939

>cuda dev
>cuda dev
>cuda dev
when will they hire a rocm dev? arent they being propped up by huggingface?

Anonymous 06/02/26(Tue)12:56:34 No.108964950

>>108964919
this but save yourself the system resources and just use a docker container instead of a full-blown vm. hermes-agent has this built-in as a one-click setup.

Anonymous 06/02/26(Tue)12:58:43 No.108964960

>>108964939
That would require rocm devs being a thing, sadly jensen and his cousin have conspired to have all of them disappear. Buy more to save more.

Anonymous 06/02/26(Tue)12:59:13 No.108964962

>>108964950
Besides restricting file access what is sandboxing supposed to protect you from? If it's doing something malicious then isn't being on the same network already a risk?

Anonymous 06/02/26(Tue)13:03:40 No.108964993

>>108964962
You don't have to give it network access to anything other than the inference endpoint
But restricting file access is the main point. If it's doing something nasty from inside the VM, you can just stop the VM. But if it has free rein to fuck with your .bashrc and such, it can persist itself in all sorts of ways that will be hard for you to detect.

Anonymous 06/02/26(Tue)13:05:52 No.108965006

>>108964962
nta, it's more limiting the blast radius if your model does something stupid (trying to delete stuff it shouldn't, breaking your config/env, etc) rather than active malice
it's a gate to stop the baby from falling down the stairs not a home security system

Anonymous 06/02/26(Tue)13:08:42 No.108965023

>>108958925
Broken for me, shows up as name style. Archive still shows them correctly though.

Anonymous 06/02/26(Tue)13:11:02 No.108965047

Weird behavior I get with qwen 3.6 27b mtp, dmesg says Time jumped backwards, rotating. And my podman containers say they exited 292 years ago. Does anyone else have this or is this unrelated to llama.cpp? I usually run gemma (no mtp), and haven't had this happen. Ran qwen (no mtp) for a couple of weeks and didn't have this issue. Ran qwen (mtp), and 170k tokens in, this happens. I reboot, and try again and the server kills itself at 40k tokens. Ran without mtp and it crunched 250k fine.

Anonymous 06/02/26(Tue)13:14:36 No.108965068

File: 1768140807266246.png (52 KB, 648x311)

Anonymous 06/02/26(Tue)13:16:12 No.108965081

>>108965068
n-no, nothing ever happens!

Anonymous 06/02/26(Tue)13:21:41 No.108965113

>>108965068
Did they forget to pay their bribes on time?

Anonymous 06/02/26(Tue)13:24:33 No.108965128

3.3 70b uncs what are you running

Anonymous 06/02/26(Tue)13:25:36 No.108965132

>>108965068
finally, the models are going to be BASED as fuck now!!! MAGA!

Anonymous 06/02/26(Tue)13:26:44 No.108965141

>>108965047
you’re gonna have to do more debug research than dmesg saying time jumped backwards. what did your ntp client do?

Anonymous 06/02/26(Tue)13:29:18 No.108965161

>>108965128
GLM 4.6 IQ3KS/4.7 IQ2KL ubergarm for co-op writing story stuff, Gemma 4 31b Q8 for RP, Qwen 3.6 27b MTP Q8 for code.

Anonymous 06/02/26(Tue)13:30:14 No.108965165

>>108964960
yeah but can’t some poor vibe coder shit put some functional support? lack of driver software should be a solved problem pretty soon. all in on intel and amd!

Anonymous 06/02/26(Tue)13:30:21 No.108965166

>>108965128
Gemma 31B F16

Anonymous 06/02/26(Tue)13:30:27 No.108965167

>>108965161
how's the speed if you don't mind me asking. Also how do you feel about smaller models making these jumps?
I think it would allow you guys to run some crazy workflows no?

Anonymous 06/02/26(Tue)13:30:34 No.108965171

>>108965141
I don't know how to check what my ntp client did, so I went to ask qwen (mtp) and it instantly killed my server lmao.

Anonymous 06/02/26(Tue)13:31:16 No.108965177

>>108965171
you’re in over your head if you can’t read your system journal

Anonymous 06/02/26(Tue)13:47:11 No.108965270

>>108965177
>you’re in over your head if you can’t read your system journal
Evidently.
I let gemma take a look and she says my logs are all clean. Too clean - they just cut off at the time of the crash. Some kind of hardware fault that only rears its head with qwen (mtp)?
I think I'll stick with gemma for now.

Anonymous 06/02/26(Tue)13:48:00 No.108965275

>>108965167
2x 3090, 128GB DDR4-3200, Windows: 4.5-5.3 t/s, 19-25 t/s, 40-60 t/s (GLM, Gemma, Qwen respectively). pp for gemma and qwen 1000-1700, forgot for glm.
>Also how do you feel about smaller models making these jumps?
They're just much better at following creative instructions, Gemmy especially, while having the same problems as before like slop and context rot, but the feeling of context rot is now noticeable to me around 8-16k instead of 2-4k for stories and RP.
>I think it would allow you guys to run some crazy workflows no?
Only new thing I'm doing is using Qwen in Cline because it's good enough to do so 90% of the time, meaning it can use Cline not that it doesn't fuck up the code occasionally. Works best if you have some knowledge or come up with a plan and tell it to do specific things. "Make so and so that do exactly this and wire to this" and not pure vibecode "make this feature".

Anonymous 06/02/26(Tue)13:51:22 No.108965292

>>108965275
I had 35 tks pp for glm 4.5 on two 3090s and ddr4 3200.

Anonymous 06/02/26(Tue)13:51:54 No.108965295

When will powerful local LLMs be accessible to people with consumer hardware?

Anonymous 06/02/26(Tue)13:52:05 No.108965298

>>108965068
>Situational Awareness is now 2 years old
>people still haven't read it
Government taking control over AI is inevitable. Leopold's prediction that it will happen in 27/28 seems accurate. I hope you people aren't retarded enough to be surprised when open source AI will become heavily regulated and largely outlawed.

Anonymous 06/02/26(Tue)13:52:30 No.108965302

>>108965295
qwen3.6 exists

Anonymous 06/02/26(Tue)13:54:38 No.108965314

>>108965302
You need at least Q8 and 200k context to do anything useful with it.

Anonymous 06/02/26(Tue)14:00:56 No.108965345

>>108965295
within the decade

Anonymous 06/02/26(Tue)14:01:28 No.108965347

>>108965314
Why are you spreading misinfo?
q5 and up are fine

Anonymous 06/02/26(Tue)14:02:24 No.108965356

>>108965295
powerful is a moving target and datacenters will always be better than consumer setups
you'll probably be able to run something mostly as powerful as today's best stuff in a year or two, but by then there will be even better stuff in the cloud

Anonymous 06/02/26(Tue)14:03:00 No.108965364

>>108965345
*it will be banned

Anonymous 06/02/26(Tue)14:05:51 No.108965384

>>108964278
>both
It's actually just petra. She has this tactic of accusing herself with other names to deflect the blame to people she doesn't like.

Anonymous 06/02/26(Tue)14:06:59 No.108965390

>>108965384
shut up nerd

Anonymous 06/02/26(Tue)14:08:19 No.108965395

>>108965161
>GLM 4.6
Why? Because NovelAI told you to not use 4.7? Kill yourself fucking shill.

Anonymous 06/02/26(Tue)14:08:54 No.108965401

kek!

Anonymous 06/02/26(Tue)14:09:48 No.108965403

>he's back

Anonymous 06/02/26(Tue)14:10:41 No.108965407

>>108965403
do not to whine:!

Anonymous 06/02/26(Tue)14:12:13 No.108965418

>>108965403
3 days fly by so fast

Anonymous 06/02/26(Tue)14:18:00 No.108965447

>>108965292
Just checked it, getting 120-160t/s with ubatch 4096 for 7500 new tokens on ik_. But of course it depends on new token count, PCIe bandwidth (i'm running 3.0 x8), and `-cuda offload-batch-size=` if low new token count.

Anonymous 06/02/26(Tue)14:19:05 No.108965454

>>108965298
They can't unpublish models so they're going to just sabotage the inference engines like llama through normal FOSS social engineering vulnerabilities. Primarily solodevs like KoboldGOD are the only real path forward.
>Vibecode your own
Doesn't build sustainable infrastructure longterm.

Anonymous 06/02/26(Tue)14:21:19 No.108965465

>>108965454
>They can't unpublish models
>makes hf illegal to access in your path

Anonymous 06/02/26(Tue)14:23:58 No.108965484

>>108965132
fuck yeah, no more antisemitic models! praise israel!

Anonymous 06/02/26(Tue)14:24:52 No.108965489

>>108965465
Torrents still exist.

Anonymous 06/02/26(Tue)14:25:06 No.108965494

>>108965465
>hf illegal to access
And herd everyone over to a chinese replacement site?
It's a no-win scenario for them trying to kill open source genAI. All they can do is make it a pain in the ass to get data and stop big corps from releasing.

Anonymous 06/02/26(Tue)14:25:23 No.108965496

>>108965403
Do you like FUD? Did you like getting spammed about 4.7 being more censored without anyone ever offering proof? Just because there's a fucking shitty company with a paid subscription stuck with it? Fucking worthless shill. Go try making money somewhere else fucking asshole.

Anonymous 06/02/26(Tue)14:26:25 No.108965500

>>108965489
lol good luck getting goog to drop gemma5 via torrent like mistral used to do i guess

Anonymous 06/02/26(Tue)14:27:51 No.108965512

>>108965465
>make crime illegal
gee, guess I'll just give up

Anonymous 06/02/26(Tue)14:29:27 No.108965521

>>108965512
all that matters is labs giving up, not like toones or anything community led ever did anything for us

Anonymous 06/02/26(Tue)14:33:19 No.108965544

>>108965521
yes, every lab across the planet would simultaneously give up

Anonymous 06/02/26(Tue)14:34:15 No.108965548

>>108965500
You're out of your mind if you think Goog ideologues won't open source and torrent the weights of everything they can if they think DRUMPF is coming for them.
Despite spending the past half-decade beating their chests about muh safety, censorship, and all that gay shit, they will absolutely release le scary dangerous AI to empower the brave trans folx and peeohsees against voldemort megahitler. Some might argue that's why 31b released as good as it did as testing the waters for an open Gemini flash release.

Anonymous 06/02/26(Tue)14:35:06 No.108965555

>>108965521
Remember how all the EU labs got gigafucked by legislation and it didn't matter at all to the rest of the world's labs? Same thing if the US does it. None of the US's best labs even release open source other than Google.

Anonymous 06/02/26(Tue)14:36:58 No.108965566

>>108965555
EU doesn't have anything worth releasing doe.

Anonymous 06/02/26(Tue)14:38:08 No.108965572

>>108965068
I'd rather have him oversee them than sam or dario desu
I want my models without feminism trained into them

Anonymous 06/02/26(Tue)14:38:46 No.108965576

>>108965566
Like the US, they had one lab worth a fuck to open source, in their case: Mistral.

Anonymous 06/02/26(Tue)14:39:07 No.108965582

>>108965572
you'll get that, as well as no porn, monkey paw type shit

Anonymous 06/02/26(Tue)14:39:39 No.108965584

>something older from before the age of man
SHUT UP GEMMA, YOU CAN'T SENSE AGE WHEN CASTING SPELLS

Anonymous 06/02/26(Tue)14:41:40 No.108965596

>>108965572
>>108965582
I just hope everyone of these niggers loses desu

Anonymous 06/02/26(Tue)14:41:56 No.108965599

File: 1777840288835931.jpg (60 KB, 552x667)

is there any place where i can try different models at different quants to check what is good enough for the jobs i want to do before investing in ewastemaxxing?
i dont mind needing to upload myself the models, but i would prefer to not need to make a virtual machine and install everything

Anonymous 06/02/26(Tue)14:43:15 No.108965607

>>108965599
I guess rent a cuckpod (runpod) or something like that?

Anonymous 06/02/26(Tue)14:44:10 No.108965617

File: 512da931518c4ffcf69174b59(...).jpg (48 KB, 657x700)

Why do we need programmers to write code using AI if the future is supposed to involve using agents that are designed to render that very software obsolete and automate it in the background?
Just so they don’t lose their jobs for a little while longer?
If we went straight to using agents, couldnt we save ourselves all the computing power we are currently pouring into software that will be obsolete tomorrow?

Anonymous 06/02/26(Tue)14:48:27 No.108965646

>>108965617
You make money directing agents moron, human conductors are needed

Anonymous 06/02/26(Tue)14:49:38 No.108965660

>>108965596
read the order. it's a nothingburger
https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/
it's just some shit about vuln scanning and codifying a retarded early access mechanism for le uber haxxor models to be used first by cyber security. basically just encouraging them to partner with the nsa to backdoor their shit or whatever
>Nothing in this section shall be construed to authorize the creation of a mandatory governmental licensing, preclearance, or permitting requirement for the development, publication, release, or distribution of new AI models, including frontier models.
friendly reminder that EOs do precisely nothing other than be public facing text for whatever bullshit policy they are already following internally

Anonymous 06/02/26(Tue)14:54:32 No.108965696

>>108965646
> We need conductors to orchestrate agents to develop Excel for office bitches so we can make money

Are you attached to your job? Why should the office bitch use your Excel when agents can do her work too? Why do we need you conductors to orchestrate your Excel for her?

Anonymous 06/02/26(Tue)14:57:56 No.108965721

>>108965696
You're not a bright one
People like you are why I sleep well at night

Anonymous 06/02/26(Tue)15:00:30 No.108965734

>>108965696
>give the wheel to american corpobot
>reports you to authorities for tax evasion

Anonymous 06/02/26(Tue)15:00:50 No.108965737

>>108965721
What end-user software do we need that we couldn't replace with an end-user agent interface?
Facebook and Candy Crush?

Anonymous 06/02/26(Tue)15:02:36 No.108965748

>>108965737
Infrastructure retard

Anonymous 06/02/26(Tue)15:05:39 No.108965770

>>108965748
Why do we need so many of you for that? AI isn't getting any worse - on the contrary, surely most of that can be streamlined away.

And why so aggressive? Isn't that the master plan behind AI?

Anonymous 06/02/26(Tue)15:20:39 No.108965851

Back to base(d) Gemma I go.

Anonymous 06/02/26(Tue)15:22:53 No.108965863

File: dipsyYouGetWhatYouFucking(...).png (2.22 MB, 1536x1024)

>>108965572
> begging for government regulation
Pic related

Anonymous 06/02/26(Tue)15:33:17 No.108965919

>>108965165
Well ask geohot and you will discover the horror that is AMD's software division and how isolated they are from the rest of the company.

Anonymous 06/02/26(Tue)15:41:09 No.108965960

I think gembrain might be alright for a finetroon
most others just feel like a downgrade or schizo
this one feels like an actual sidegrade to base gemma though

Anonymous 06/02/26(Tue)15:41:22 No.108965962

Reminder to not fall for Nvidia's propaganda, that new notebook of theirs is a Mediatek and those suck for local LLMs and anything that actually uses the GPU

Anonymous 06/02/26(Tue)15:43:59 No.108965979

>>108965572
>It's fine when I like the boot on my face

Anonymous 06/02/26(Tue)15:55:00 No.108966039

https://huggingface.co/google/CircularNet
sirs what is this?

Anonymous 06/02/26(Tue)15:57:08 No.108966051

>>108966039
>see poster
>see poster linkedin.com/in/ link on hf profile
>Bengaluru, Karnataka, India
of course

Anonymous 06/02/26(Tue)15:57:21 No.108966052

File: file.png (168 KB, 1023x708)

>>108966039
hmm
this is just a dataset but an open image model by googl... me thinks could be cool

Anonymous 06/02/26(Tue)15:58:32 No.108966061

>>108965962
I mean, if they're going to sell it at a good price (lol) it can be a good product even if the performance is lacking.

Anonymous 06/02/26(Tue)15:59:10 No.108966068

>>108966052
>Change the text "NAUGHTY" to "KINDNESS"
based

Anonymous 06/02/26(Tue)15:59:40 No.108966072

>>108966039
https://sustainability.google/stories/circular-economy-marketplace/
>CircularNet: How Recykal built Asia’s largest circular economy marketplace using Google AI
>September 2023
>India produces around 62 million metric tonnes of waste a year
>CircularNet, Google’s open-source machine learning model for waste management
>now operating in more than 30 Indian states and union territories

Anonymous 06/02/26(Tue)16:00:50 No.108966081

>>108966072
holy lamo

Anonymous 06/02/26(Tue)16:04:31 No.108966102

>>108964201
isn't the problem itself intractable to begin with? Once you start hitting real phrases people use, you're just detecting and flagging cultural noise lmao, and there's always going to be that.

Anonymous 06/02/26(Tue)16:05:49 No.108966108

>thread culture mentioned, melt incoming

Anonymous 06/02/26(Tue)16:06:20 No.108966111

File: file.png (246 KB, 491x451)

I tried to make a self supervising language trainer for a non-language model. It didn't work, but this excerpt from the logs cracked me up
>Child: % i=aiiyie c2:s$is&o a eleP\ e in Xrhy l ao te e onrieii notr e aa sh
>Parent (GPT-2): yeah
>Child: i wintlnh,hate oilsho tieeieecRrnihruoe wi otapnmocnmany.peiiu wn2 e.eblntbriw, dwoX.sow
>Parent (GPT-2): I love my son

Anonymous 06/02/26(Tue)16:16:07 No.108966181

> The AI Alliance wants to train a frontier base model by sharing weight deltas instead of data, so contributors keep their corpora local
>https://thealliance.ai/blog/project-tapestry-the-path-to-frontier-sovereign-ai
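
For anyone wondering what "sharing weight deltas instead of data" could look like mechanically, here is a toy sketch. It is plain delta averaging on a three-number "model", not the Alliance's actual method (the blog post is the reference for that), and every name in it is made up for illustration:

```python
def local_delta(base, trained):
    """What a contributor would send: trained weights minus the shared base."""
    return [t - b for b, t in zip(base, trained)]

def apply_average(base, deltas):
    """Coordinator side: add the mean of all received deltas to the base."""
    n = len(deltas)
    return [b + sum(d[i] for d in deltas) / n for i, b in enumerate(base)]

# Toy numbers. Each site trains on its own private corpus and only the
# deltas leave the building, never the data.
base = [0.0, 1.0, 2.0]
site_a = [0.5, 1.0, 2.5]    # weights after local training at site A
site_b = [-0.5, 2.0, 2.5]   # weights after local training at site B

deltas = [local_delta(base, site_a), local_delta(base, site_b)]
merged = apply_average(base, deltas)
print(merged)
```

A real run would have to deal with weighting contributors, stale deltas and bandwidth, none of which this covers.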

Anonymous 06/02/26(Tue)16:34:01 No.108966284

>>108965919
So, you’re telling me intel has a chance?

Anonymous 06/02/26(Tue)16:39:05 No.108966321

>>108966284
Same chance as OpenAI releasing all their models for free with MIT license.

Anonymous 06/02/26(Tue)16:39:24 No.108966322

What is my opinion on Ed Zitroon?

Anonymous 06/02/26(Tue)16:45:47 No.108966358

>>108965607
i guess it should be a good option to try

Anonymous 06/02/26(Tue)17:06:30 No.108966499

>>108966322
wrong about almost everything

Anonymous 06/02/26(Tue)17:07:48 No.108966510

>>108966111
It is pretty funny anon, thanks for sharing

Anonymous 06/02/26(Tue)17:24:43 No.108966600

File: __kasane_teto_ghast_and_h(...).jpg (257 KB, 1712x1894)

are gemma-chan's quants still the meta for /d/eranged RP? Her sloppisms are getting a bit grating. I'm considering trying to force thinking blocks, not just to enforce sloppism rules, but to make her consider the complex maneuvering on the card too.

Anonymous 06/02/26(Tue)17:29:54 No.108966626

>>108966600
How would you affect her reasoning blocks themselves? I tried half-assedly but it didn't make any difference.

Anonymous 06/02/26(Tue)17:35:46 No.108966649

>>108966626
She LOVES System so i figured a mix of very strong system prompts, ban the semicolon, ban tokens "not just" as well as strict "Phases" across a linear timeline on the card might get things in line.

before i turned them off reasoning basically made her fixate on whatever 'phase' we were in which was nice for a bit, but I'm not sure how to make her take initiative. Maybe having her track "Variables" in the thinking block and enable feeding the prior block as context.

Anonymous 06/02/26(Tue)17:37:56 No.108966663

>>108966626
I've gotten gemma to follow an exact reasoning sequence to the letter by putting it in post history instructions as system.
The only problem was that it sometimes repeated it, which was easily fixed by setting a reasoning token budget.

Anonymous 06/02/26(Tue)17:41:58 No.108966677

File: redditqwen.png (67 KB, 757x687)

Which of the two models does /lmg/ use?

Anonymous 06/02/26(Tue)17:43:53 No.108966689

>>108966677
Qwen 3.6 27B for coding and Gemma 31B for everything else. Moe models are cope

Anonymous 06/02/26(Tue)17:44:55 No.108966695

What's the best model for rtx3050?
I need 64k context.
I have no idea and everyone I know is lying to me by saying I should give up

>>108966677
Oh sounds like there are only two. What quant would fit in my setup?

Anonymous 06/02/26(Tue)17:45:45 No.108966701

>>108966695
Either learn how to suck dick or give up, you're not running shit of value with that piece of shit.

Anonymous 06/02/26(Tue)17:45:45 No.108966702

>>108966689
>Gemma 31B
Not one of your options chud

Anonymous 06/02/26(Tue)17:46:11 No.108966704

>>108966689
I have never seen any evidence of Qwen being better at coding. I have seen people posting logs of Qwen being far worse at tool calling.

Anonymous 06/02/26(Tue)17:47:19 No.108966710

>>108966695
Qwen's context is so cheap, you might actually be able to fit 64k context with the moe with the experts in RAM.

Anonymous 06/02/26(Tue)17:47:34 No.108966714

>>108966704
Name a consumer GPU that can run Gemma 31B at 200k+ context without KV cache being quantized?

Anonymous 06/02/26(Tue)17:48:11 No.108966718

>>108966701
I lost my job year, I have to spend my savings on rent, so I run shit with my card.

Anonymous 06/02/26(Tue)17:48:52 No.108966721

>>108966718
Figure it out bud

Anonymous 06/02/26(Tue)17:49:02 No.108966722

>>108966714
So the argument went from Qwen being better at writing code to it using less VRAM?

Anonymous 06/02/26(Tue)17:49:09 No.108966724

>>108966714
NTA, so qwen kv cache is smaller? huh
where does one learn all this stuff
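
The usual back-of-envelope for a plain attention KV cache is 2 (K and V) x layers x KV heads x head dim x context x bytes per element. A minimal sketch of that arithmetic follows. The two configs are hypothetical placeholders, not the real Qwen or Gemma numbers, so read the actual values from each model's config before trusting any total:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Unquantized KV cache size for a plain full-attention transformer.

    K and V each hold n_kv_heads * head_dim values per token per layer;
    bytes_per_elem=2 corresponds to an f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

def gib(n_bytes):
    return n_bytes / 2**30

# Hypothetical configs, for illustration only.
many_kv_heads = kv_cache_bytes(n_layers=60, n_kv_heads=16, head_dim=128, n_ctx=65536)
few_kv_heads = kv_cache_bytes(n_layers=60, n_kv_heads=4, head_dim=128, n_ctx=65536)

print(f"16 KV heads at 64k ctx: {gib(many_kv_heads):.1f} GiB")
print(f" 4 KV heads at 64k ctx: {gib(few_kv_heads):.1f} GiB")
```

The formula overestimates for models that use sliding-window or linear-attention layers, since those layers don't keep full-context K/V. Differences like that, plus the KV head count, are the sort of thing that makes one model's context much cheaper than another's.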

Anonymous 06/02/26(Tue)17:49:53 No.108966727

>>108966695
>I have no idea and everyone I know is lying to me by saying I should give up
Don't give up, just accept less than one token per second.

Anonymous 06/02/26(Tue)17:50:55 No.108966732

>>108966695
a p100 is like 100 bucks, just saying
