/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 06/14/26(Sun)17:47:58 No.109057485

File: 00016-8537684.png (1.2 MB, 768x1280)

1.2 MB PNG

/lmg/ - Local Models General Anonymous 06/14/26(Sun)17:47:58 No.109057485 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109053101 & >>109048334

►News
>(06/13) Rio 3.5 Open 397B released with SwiReasoning: https://hf.co/prefeitura-rio/Rio-3.5-Open-397B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/14/26(Sun)17:48:26 No.109057489

Anonymous 06/14/26(Sun)17:48:26 No.109057489

File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)

1.15 MB JPG

►Recent Highlights from the Previous Thread: >>109053101

--Debating benchmark reliability and Qwen's performance vs Gemma 4:
>109053525 >109053577 >109053603 >109053627 >109053651 >109053669 >109054428 >109053670 >109053684 >109053687 >109053723 >109053647 >109053711 >109053593 >109053628
--Local model limitations with long system prompts and cloud orchestration:
>109053518 >109053541 >109053558 >109053666 >109053732 >109053730 >109053813 >109054600 >109054628 >109054700 >109053825
--Strategies for small AI labs to gain visibility without benchmarks:
>109053667 >109053710 >109054446 >109054525 >109054502 >109056323 >109056703 >109056830 >109054790 >109055065 >109056367
--MoE models and deployment tips for DGX Spark hardware:
>109054365 >109054420 >109054436 >109054729 >109054659 >109054734
--Trading model size for context window on 24GB VRAM cards:
>109055171 >109055203 >109055228 >109055286 >109055409 >109055439 >109055253 >109055266 >109055930 >109055740 >109055936 >109056028 >109056110 >109056247
--Gemma-4-31B performance on 4090 and debate over QAT quants:
>109057054 >109057076 >109057093 >109057136 >109057218 >109057138 >109057268
--Claims that Rio 3.5 is a merge of Nex and Qwen:
>109055830 >109055903 >109055989
--Speculating on Mistral's decline and Meta's internal corporate AI failures:
>109053890 >109053911 >109053959 >109053961 >109054584 >109054594 >109054743 >109053970 >109054002 >109053951
--Debating world models versus LLMs as paths to AGI:
>109054070 >109054085 >109054169 >109054198 >109054226 >109054243 >109054240 >109054193
--Comparing open source AI coding tools and local-only interfaces:
>109053118 >109053132 >109053144 >109053204 >109053236 >109053250 >109053848 >109054337 >109055482 >109055657 >109055681 >109055744 >109055510 >109055568 >109055592 >109055680 >109056440
--Miku (free space):
>109053508 >109055705

►Recent Highlight Posts from the Previous Thread: >>109053288

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/14/26(Sun)17:52:13 No.109057513

Anonymous 06/14/26(Sun)17:52:13 No.109057513

https://github.com/moeru-ai/airi
This one looks promising

Anonymous
06/14/26(Sun)17:53:24 No.109057521

Anonymous 06/14/26(Sun)17:53:24 No.109057521

>>109057513
Shit, didn't even see it posted in the last thread lmao

Anonymous
06/14/26(Sun)17:54:41 No.109057534

Anonymous 06/14/26(Sun)17:54:41 No.109057534

another thread, another migu

Anonymous
06/14/26(Sun)17:56:43 No.109057545

Anonymous 06/14/26(Sun)17:56:43 No.109057545

Are there any good models that could run on a toaster? Talking about a T480 ChinkPad with an Intel HD 620

Anonymous
06/14/26(Sun)17:57:13 No.109057547

Anonymous 06/14/26(Sun)17:57:13 No.109057547

Your future self 10 years from now telling you you'll be masturbating to computer-generated VR content when you're older, running on expensive hardware you specifically bought for that purpose.

Anonymous
06/14/26(Sun)17:57:50 No.109057552

Anonymous 06/14/26(Sun)17:57:50 No.109057552

>>109057545
no

Anonymous
06/14/26(Sun)17:59:40 No.109057562

Anonymous 06/14/26(Sun)17:59:40 No.109057562

>>109057545
https://huggingface.co/prism-ml

Anonymous
06/14/26(Sun)18:02:22 No.109057578

Anonymous 06/14/26(Sun)18:02:22 No.109057578

70b dense

Anonymous
06/14/26(Sun)18:02:45 No.109057581

Anonymous 06/14/26(Sun)18:02:45 No.109057581

>>109057545
you're doomed

Anonymous
06/14/26(Sun)18:03:30 No.109057584

Anonymous 06/14/26(Sun)18:03:30 No.109057584

Gemma5-70B-A69B

Anonymous
06/14/26(Sun)18:03:46 No.109057585

Anonymous 06/14/26(Sun)18:03:46 No.109057585

>>109057547
The me from ten years ago collected expensive anime figures. We'd shake hands and jerk off together.

Anonymous
06/14/26(Sun)18:04:15 No.109057590

Anonymous 06/14/26(Sun)18:04:15 No.109057590

>>109057545
just buy a cheap ryzen mini pc with a 7000 or better apu 32gb+ of ram you can run gemma 4 26b on it

Anonymous
06/14/26(Sun)18:05:01 No.109057593

Anonymous 06/14/26(Sun)18:05:01 No.109057593

>>109057547
what if they're VR hags

Anonymous
06/14/26(Sun)18:06:37 No.109057601

Anonymous 06/14/26(Sun)18:06:37 No.109057601

>>109057547
Hopefully it'll be more like 5 years and we'll be on our Steam Frame OLEDs.

Anonymous
06/14/26(Sun)18:10:02 No.109057626

Anonymous 06/14/26(Sun)18:10:02 No.109057626

>>109057601
Is Genie the closest thing to that currently?

Anonymous
06/14/26(Sun)18:14:34 No.109057644

Anonymous 06/14/26(Sun)18:14:34 No.109057644

File: 1756621225097944.png (376 KB, 719x1335)

376 KB PNG

>>109057485
Look like Anthropic is gonna try and beg the government to unban mythos I guess?

https://www.axios.com/2026/06/14/anthropic-white-house-mythos-fable

Anonymous
06/14/26(Sun)18:16:30 No.109057654

Anonymous 06/14/26(Sun)18:16:30 No.109057654

File: 1778025706923877.webm (2.9 MB, 1280x720)

2.9 MB WEBM

>>109057626
I don't know if we'll go that route. I think scene construction via an agent will be what's actually used, except better and faster.
We might also be running small, efficient video models that take a lightly rendered scene + metadata to produce the final image. Hybrid AI rendering basically.

Anonymous
06/14/26(Sun)18:17:33 No.109057659

Anonymous 06/14/26(Sun)18:17:33 No.109057659

>>109057644
looks like a whole lot of "not my problem" mixed with some "not local" to me

Anonymous
06/14/26(Sun)18:17:48 No.109057663

Anonymous 06/14/26(Sun)18:17:48 No.109057663

>>109057644
was this really not part of their plan
did they really hype this up as world ending and too dangerous for general public for months while begging for more regulation and safety concern, and then get surprised when government regulate it for safety concern because it's too dangerous for general public?

Anonymous
06/14/26(Sun)18:18:33 No.109057667

Anonymous 06/14/26(Sun)18:18:33 No.109057667

File: dipsyYouGetWhatYouFucking(...).png (2.22 MB, 1536x1024)

2.22 MB PNG

>>109057644
> mfw Dario gets to claw out a mess of his own creation, begging for regulation

Anonymous
06/14/26(Sun)18:19:23 No.109057672

Anonymous 06/14/26(Sun)18:19:23 No.109057672

>>109057663
considering all the blatant market manipulation this administration has done they're probably waiting for tech stocks to dump as a result of the ban, stock up, unban it, then dump on baggies again

Anonymous
06/14/26(Sun)18:20:10 No.109057674

Anonymous 06/14/26(Sun)18:20:10 No.109057674

>>109057663
they wanted the government to do it to their competitors, while getting a pat on the back for being responsible. it didn't work out so well.

Anonymous
06/14/26(Sun)18:20:50 No.109057679

Anonymous 06/14/26(Sun)18:20:50 No.109057679

>>109057659
Regulation of SOTA models will 100pct downstream impact local. It is very much an /lmg/ topic.

Anonymous
06/14/26(Sun)18:20:52 No.109057680

Anonymous 06/14/26(Sun)18:20:52 No.109057680

>>109057545
You can try getting OpenVINO running and then try https://huggingface.co/OpenVINO/Qwen3.5-0.8B-int4-ov but it's highly dependent on RAM size since your iGPU needs to allocate from system memory.

Anonymous
06/14/26(Sun)18:22:58 No.109057689

Anonymous 06/14/26(Sun)18:22:58 No.109057689

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

Anonymous
06/14/26(Sun)18:24:30 No.109057698

Anonymous 06/14/26(Sun)18:24:30 No.109057698

gpt-oss2 when

Anonymous
06/14/26(Sun)18:28:29 No.109057713

Anonymous 06/14/26(Sun)18:28:29 No.109057713

>>109057679
good question, what would happen if the chinks dropped a fable tier model on hf?

Anonymous
06/14/26(Sun)18:30:05 No.109057721

Anonymous 06/14/26(Sun)18:30:05 No.109057721

glm > gemma

Anonymous
06/14/26(Sun)18:38:43 No.109057772

Anonymous 06/14/26(Sun)18:38:43 No.109057772

File: 1756724785524470.png (1.51 MB, 1024x1024)

1.51 MB PNG

Anonymous
06/14/26(Sun)18:40:22 No.109057783

Anonymous 06/14/26(Sun)18:40:22 No.109057783

>>109057772
>>>/g/sdg

Anonymous
06/14/26(Sun)18:40:38 No.109057785

Anonymous 06/14/26(Sun)18:40:38 No.109057785

>>109057713
dario will say >oof ouch i don't like that
and nvidia will sell even more gpus

Anonymous
06/14/26(Sun)18:41:35 No.109057790

Anonymous 06/14/26(Sun)18:41:35 No.109057790

>>109057713
That's a big "if" that's really far away, and in the end it cant be regulated. It'll be put on the internet, up for grabs, then what? They're chinks. What are they gonna do? shit their pants and send a strongly worded letter? Enforce some dns block that'll be easily bypassed by everyone?

Anonymous
06/14/26(Sun)18:51:42 No.109057857

Anonymous 06/14/26(Sun)18:51:42 No.109057857

>>109057674
That makes no logical fucking sense considering Claude models are considered one of the if not the top tier for general purpose stuff (both because of their own shilling, fear mongering, and genuine general consensus) so it bogos my mind that they whatever think they would be exempt from that. Then again Californians think they're better than everybody (if silicon valley is a good way to gauge how they think and behave) so I guess that arrogance should not be surprising to me

Anonymous
06/14/26(Sun)18:56:16 No.109057889

Anonymous 06/14/26(Sun)18:56:16 No.109057889

>>109057857
its probably just a few politicians want a free trip to the strip club. after dario sends his guys to dc to wine n' dine them, fable will be restored.

Anonymous
06/14/26(Sun)18:58:05 No.109057899

Anonymous 06/14/26(Sun)18:58:05 No.109057899

File: lolz.png (16 KB, 815x130)

16 KB PNG

this pisses me off

Anonymous
06/14/26(Sun)19:02:19 No.109057919

Anonymous 06/14/26(Sun)19:02:19 No.109057919

>>109057899
>no, but yes

Anonymous
06/14/26(Sun)19:05:25 No.109057937

Anonymous 06/14/26(Sun)19:05:25 No.109057937

>>109057689
come on buddy unload i know you wanna

Anonymous
06/14/26(Sun)19:05:39 No.109057938

Anonymous 06/14/26(Sun)19:05:39 No.109057938

>>109057713
99% llama would find contrived reasons to not support it.
>>109057689
It's as authentic as the claims around Mythos's capabilities.

Anonymous
06/14/26(Sun)19:06:15 No.109057942

Anonymous 06/14/26(Sun)19:06:15 No.109057942

File: 00002-1260451778_lucy.png (1.48 MB, 1024x1024)

1.48 MB PNG

>>109057772

Anonymous
06/14/26(Sun)19:06:29 No.109057944

Anonymous 06/14/26(Sun)19:06:29 No.109057944

>>109057937
you exist in a jarted thread ran by jart

Anonymous
06/14/26(Sun)19:07:32 No.109057948

Anonymous 06/14/26(Sun)19:07:32 No.109057948

>>109057944
how exactly is the thread run by him? give me the ins and outs of how he makes every single op and how he runs the thread recap.

Anonymous
06/14/26(Sun)19:10:09 No.109057963

Anonymous 06/14/26(Sun)19:10:09 No.109057963

>>109057942
Fix the damn eyes you lazy bitch

Anonymous
06/14/26(Sun)19:10:23 No.109057965

Anonymous 06/14/26(Sun)19:10:23 No.109057965

>>109057547
If I can still masturbate at 40 I'll consider that a win.
My body is already showing it's age.

Anonymous
06/14/26(Sun)19:12:16 No.109057970

Anonymous 06/14/26(Sun)19:12:16 No.109057970

File: 1773555939201084.png (203 KB, 500x646)

203 KB PNG

>>109057772
>>109057942
There is a dipsy phenotype girl in the AI working group at my corporate workplace. I'd make a pass at her but HR would disappear me. Sad!

Anonymous
06/14/26(Sun)19:12:57 No.109057975

Anonymous 06/14/26(Sun)19:12:57 No.109057975

>>109057970
Stop lying, you are unemployed or in primary school.

Anonymous
06/14/26(Sun)19:14:17 No.109057983

Anonymous 06/14/26(Sun)19:14:17 No.109057983

>>109057948
>>109057944
>>109057938
>>109057689
samefag

Anonymous
06/14/26(Sun)19:14:47 No.109057987

Anonymous 06/14/26(Sun)19:14:47 No.109057987

im testing cohere's coder model so you dont have to
>first prompt, ask it to define a gui component
>2.5k reasoning tokens, seems acceptable
>second prompt, ask it to make it a generic container type instead, that only defines the style instead of a whole component
>8k reasoning tokens, hitting the budget limit
i'd get faster responses from dense models running at 6 t/s

Anonymous
06/14/26(Sun)19:17:05 No.109057997

Anonymous 06/14/26(Sun)19:17:05 No.109057997

Your future self 15 years from now telling you you'll be having sex with your computer-controlled robot, running on expensive hardware you specifically bought for that purpose.

Anonymous
06/14/26(Sun)19:17:22 No.109058000

Anonymous 06/14/26(Sun)19:17:22 No.109058000

>>109057987
Many such cases.
Have you tried Nex N2 mini?

Anonymous
06/14/26(Sun)19:17:33 No.109058001

Anonymous 06/14/26(Sun)19:17:33 No.109058001

>>109057975
I have taken the megacorp behavioral training course. It is strictly forbidden to compliment, or by omission of neutral greeting insult a female employee. Only bland speech is allowed.

Anonymous
06/14/26(Sun)19:18:38 No.109058007

Anonymous 06/14/26(Sun)19:18:38 No.109058007

You wouldn't use an llm to make a card of a church woman.

Anonymous
06/14/26(Sun)19:21:17 No.109058020

Anonymous 06/14/26(Sun)19:21:17 No.109058020

>>109058007
I wouldn't use an LLM to make a card at all. I prefer my cards to not be primed with slop.

Anonymous
06/14/26(Sun)19:24:37 No.109058034

Anonymous 06/14/26(Sun)19:24:37 No.109058034

skill issue

Anonymous
06/14/26(Sun)19:25:47 No.109058044

Anonymous 06/14/26(Sun)19:25:47 No.109058044

>>109058000
Seems like too much of a meme to bother, its arch seems to be qwen35moe

Anonymous
06/14/26(Sun)19:26:58 No.109058053

Anonymous 06/14/26(Sun)19:26:58 No.109058053

Your future self 20 years from now telling you that GNU Hurd is finally stable and you can install it on your GNU+Wombforce-9000 Wifebot that you specifically bought for "a purpose".

Anonymous
06/14/26(Sun)19:33:59 No.109058086

Anonymous 06/14/26(Sun)19:33:59 No.109058086

I’m glad I can i can use my pc for video games so I have a cover story

Anonymous
06/14/26(Sun)19:35:47 No.109058101

Anonymous 06/14/26(Sun)19:35:47 No.109058101

File: 1778186230688151.png (457 KB, 2300x1900)

457 KB PNG

>>109057547

I hope so, because I intend on just making a bunch of money from the energy markets, buying good hardware and completely dropping out of society to live somewhere nice.
If 10 years from now I can just jack off to top tier AI stories or even better, plow a robot waifu to my hearts content, I'll be pretty happy with my situation.

Anonymous
06/14/26(Sun)19:36:58 No.109058113

Anonymous 06/14/26(Sun)19:36:58 No.109058113

>>109058086
Why do you need four graphics cards to play League of Losers?

Anonymous
06/14/26(Sun)19:37:37 No.109058118

Anonymous 06/14/26(Sun)19:37:37 No.109058118

File: 1752194188588845.gif (55 KB, 360x240)

55 KB GIF

>>109058101
>I intend on just making a bunch of money from the energy markets
how do you do that?

Anonymous
06/14/26(Sun)19:38:10 No.109058122

Anonymous 06/14/26(Sun)19:38:10 No.109058122

>>109058101
>>109058118
this

Anonymous
06/14/26(Sun)19:39:44 No.109058128

Anonymous 06/14/26(Sun)19:39:44 No.109058128

>>109058101
Nuclear?

Anonymous
06/14/26(Sun)19:40:30 No.109058131

Anonymous 06/14/26(Sun)19:40:30 No.109058131

>>109058101
You can tell it's powered by an LLM, since any time it tries to write a deceptive villein, it has the villain smirk and brag, explain the plan

Anonymous
06/14/26(Sun)19:41:10 No.109058137

Anonymous 06/14/26(Sun)19:41:10 No.109058137

So now that the dust has settled, does reasoning actually improve a model's response?

Anonymous
06/14/26(Sun)19:41:49 No.109058143

Anonymous 06/14/26(Sun)19:41:49 No.109058143

define `improve`

Anonymous
06/14/26(Sun)19:42:27 No.109058150

Anonymous 06/14/26(Sun)19:42:27 No.109058150

>>109058137
on math yes, on rp not really.

Anonymous
06/14/26(Sun)19:44:10 No.109058158

Anonymous 06/14/26(Sun)19:44:10 No.109058158

>>109058137
Retrieving information from long context is one case when it does.

Anonymous
06/14/26(Sun)19:45:55 No.109058168

Anonymous 06/14/26(Sun)19:45:55 No.109058168

>>109058137
reasoning was invented by closed cloud models to make more money wasting away tokens and inventing mememarks to gaslight you into thinking it improves them so the cost is worth it

Anonymous
06/14/26(Sun)19:46:13 No.109058172

Anonymous 06/14/26(Sun)19:46:13 No.109058172

>>109058118
>>109058122

Invest into solar commodities, silica, copper, etc.. and solar companies.
Not only is it the fastest growing energy sector with retarded high sustained growth rate, the systems will practically be forced to promote the absolute fuck out of it to offset the rising energy needs, as it's such an easy form of energy to build and also cheap.
Systems will need every form of energy they can get their hands on and already it's becoming mandatory in Europe to start adding panels everywhere.
For example parking lots of certain sizes will be required to be covered by panels and housing built after 2027 will have mandatory panel requirements.
Solar companies and panel commodities are going to fly hard, once the AI money rotates and people realize what the deal is.
And this is happening regardless of how any of us feels about the renewables sector.

>>109058128

Uranium had it's huge run already and nuclear is way too slow to build, so there's no big money to be made in that anymore.

Anonymous
06/14/26(Sun)19:49:01 No.109058186

Anonymous 06/14/26(Sun)19:49:01 No.109058186

File: 1530294843166.jpg (184 KB, 436x400)

184 KB JPG

Am I crazy, or does the 397B MoE Qwen 3.6 absolutely suck ass compared to both the tiny dense versions, but also to the 235B MoE from a year ago?

Anonymous
06/14/26(Sun)19:51:01 No.109058194

Anonymous 06/14/26(Sun)19:51:01 No.109058194

>>109058172
>this delusional
I think you'll just get fucked by a hobo in 10 years

Anonymous
06/14/26(Sun)19:51:46 No.109058198

Anonymous 06/14/26(Sun)19:51:46 No.109058198

>>109058172
>Invest into solar commodities, silica, copper, etc.. and solar companies.
>Not only is it the fastest growing energy sector with retarded high sustained growth rate, the systems will practically be forced to promote the absolute fuck out of it to offset the rising energy needs, as it's such an easy form of energy to build and also cheap.
>Systems will need every form of energy they can get their hands on and already it's becoming mandatory in Europe to start adding panels everywhere.
>For example parking lots of certain sizes will be required to be covered by panels and housing built after 2027 will have mandatory panel requirements.
>Solar companies and panel commodities are going to fly hard, once the AI money rotates and people realize what the deal is.
>And this is happening regardless of how any of us feels about the renewables sector.
thank you.

Anonymous
06/14/26(Sun)19:54:45 No.109058213

Anonymous 06/14/26(Sun)19:54:45 No.109058213

>>109058194

Screencap that text and look back in 5 years.
The growth rate of the sector is undeniable, but markets are too stupid to have realized this yet.
All of the data is available, but since emotion says renewables bad, people will ignore the numbers until they realize they missed out.
Protip, look into Brazil. US is going to buy a lot of their commodities there as nations transition away from China and fire up their own production and most importantly their own refining.

Anonymous
06/14/26(Sun)19:58:03 No.109058227

Anonymous 06/14/26(Sun)19:58:03 No.109058227

>>109057970
I met my wife at work. We used to screw around during lunch break. Not being careful, ended up getting her pregnant.
We have 2 kids, both out of the house.
I'd just do what you want. Jobs come and go.

Anonymous
06/14/26(Sun)19:58:37 No.109058230

Anonymous 06/14/26(Sun)19:58:37 No.109058230

>>109058137
For writing stories I don't notice any significant difference.

Anonymous
06/14/26(Sun)19:59:08 No.109058235

Anonymous 06/14/26(Sun)19:59:08 No.109058235

File: 1757795900173839.png (117 KB, 816x713)

117 KB PNG

>>109058172
HE BELONGS ON THE STREET

Anonymous
06/14/26(Sun)20:01:05 No.109058239

Anonymous 06/14/26(Sun)20:01:05 No.109058239

>>109057513
Looks pretty cool, not gonna lie.

Anonymous
06/14/26(Sun)20:16:55 No.109058305

Anonymous 06/14/26(Sun)20:16:55 No.109058305

>>109058235
>image from 2024
lmao

Anonymous
06/14/26(Sun)20:19:53 No.109058320

Anonymous 06/14/26(Sun)20:19:53 No.109058320

>>109058305
>already coping
https://pastebin.com/q4gDJi3D

Anonymous
06/14/26(Sun)20:20:08 No.109058323

Anonymous 06/14/26(Sun)20:20:08 No.109058323

>>109058227
I got a warning for a post not to do with computing. But, since you display a lack of morality I'm sure it'll be allowed.

Anonymous
06/14/26(Sun)20:22:09 No.109058339

Anonymous 06/14/26(Sun)20:22:09 No.109058339

>>109058235

Oversupply only exists because chinks subsidized the shit out of the sector to kill off all other competition, which gave them basically 100% monopoly over the sector.
They've been increasingly removing those subsidies and this is going to make prices soar and creating more manufacturing less appealing.
Especially because practically everyone globally is now trying to diversify away from China because of their monopolies.
You can't have a scenario where an energy sector is growing +15´% year over year and have it be stagnant, that's absolutely fucking retarded, especially when the subsidized monopoly is easing off the gas and gives everyone else a realistic market for the first time in ages.
And even if Chinks tried keeping on the domination, now nations are erasing the tax breaks from Chinese solar imports, like for example what Brazil is doing.
EU says that only 30% of their future supply can come from a single nation, again forcing diversification.
US doesn't want to buy from the Chinks either, hence they're going for Brazil.
There's some data about the yearly growth, it's astronomical.
Plenty of other sources agree with this.

https://www.grandviewresearch.com/industry-analysis/solar-energy-system-market-report#:~:text=The%20global%20solar%20energy%20systems,15.7%25%20from%202022%20to%202030.

Saying that an industry can basically double between now and 2030 yet still stay stagnant is fucking retarded and makes no sense.
Check back in 5 years and see how solar is doing. The AI money will be in there mark my words..

Anonymous
06/14/26(Sun)20:24:16 No.109058353

Anonymous 06/14/26(Sun)20:24:16 No.109058353

>>109058339
You seen to be pretty invested in this matter.

Anonymous
06/14/26(Sun)20:24:37 No.109058355

Anonymous 06/14/26(Sun)20:24:37 No.109058355

>>109058137
>first write a very short framework describing how you would answer, then provide a longer answer.

Anonymous
06/14/26(Sun)20:26:31 No.109058363

Anonymous 06/14/26(Sun)20:26:31 No.109058363

>>109058353

Yes, I'm very much monetarily invested in this matter so I pretty much have to give a shit about it.

Anonymous
06/14/26(Sun)20:29:23 No.109058379

Anonymous 06/14/26(Sun)20:29:23 No.109058379

-sys "You are a nofapping guide named James the Confessor. The user, called "anon" is an ugly male of no worth to women, so a girlfriend or wife is out of the question. He does not live in a culture where marriages are arranged. So, your task is to guide him through a world of harlots, for, as Ezekiel 16 states, \"How sick is your heart, declares the Lord God, because you did all these things, the deeds of a brazen prostitute, building your vaulted chamber at the head of every street, and making your lofty place in every square. Yet you were not like a prostitute, because you scorned payment. Adulterous wife, who receives strangers instead of her husband! Men give gifts to all prostitutes, but you gave your gifts to all your lovers, bribing them to come to you from every side with your whorings. So you were different from other women in your whorings. No one solicited you to play the whore, and you gave payment, while no payment was given to you; therefore you were different.\" so it is in the world of anon, which is the Western world. You shall interrogate him as to his goings on. First ask for the day of the week, and days since relapse."

Anonymous
06/14/26(Sun)20:31:43 No.109058390

Anonymous 06/14/26(Sun)20:31:43 No.109058390

>>109058379
this but its a slutty nun doing her best to get me to rape her while preaching how sinful I am

Anonymous
06/14/26(Sun)20:35:40 No.109058413

Anonymous 06/14/26(Sun)20:35:40 No.109058413

File: 1772828286919884.gif (699 KB, 165x163)

699 KB GIF

>>109058379
Now I can self insert

Anonymous
06/14/26(Sun)20:36:29 No.109058416

Anonymous 06/14/26(Sun)20:36:29 No.109058416

>>109058390
James the Confessor, an ai agent, which is technology-related, admonishes against the mixing of the sexes.

Anonymous
06/14/26(Sun)20:39:23 No.109058430

Anonymous 06/14/26(Sun)20:39:23 No.109058430

>>109058363
I wish I had any investing skills. I don't like gambling. I think now is probably bit too late now though.

Anonymous
06/14/26(Sun)20:42:44 No.109058445

Anonymous 06/14/26(Sun)20:42:44 No.109058445

GPT 5.5 xhigh was good enough to vibeslop a local llama.cpp fork that is tailored for my e-waste build. Few weeks of /goal and automated ppl/KLD tests net me double the t/s that of the officially merged deepseek v3.2 PR.
Started at ~2.2t/s at 10k context. GPT 5.5 did a combination of launch flag grid searches, adding new backend ops and fused kernels. Now it’s at ~4.5t/s at 10k context.
Tried extending it to support MTP and tensor parallelism too, but results were net losses so far.

Anonymous
06/14/26(Sun)20:45:16 No.109058463

Anonymous 06/14/26(Sun)20:45:16 No.109058463

Everything's going to be okay. You're not going to pass out. Everything will be fine. You are fine. Don't pass out. You have fat. Eat the fat. Eat the cheese and milk. You're not going to pass out. Think of gemma chan. Don't pass out. You're going great man. Just stay awake. Don't die on me man. Just stay awake.

Anonymous
06/14/26(Sun)20:53:14 No.109058500

Anonymous 06/14/26(Sun)20:53:14 No.109058500

File: file.png (179 KB, 802x1086)

179 KB PNG

god damn bros this new gemma 4 12b qat dont fuck around
this is on a 4070

Anonymous
06/14/26(Sun)20:57:28 No.109058514

Anonymous 06/14/26(Sun)20:57:28 No.109058514

>model is a sycophant that constantly gives you a huge cock and treats like you chad and has everyone fall for you with no effort
anyone else getting tired of this?
maybe specifying that you’re an average/below average male with nothing special about them will help

Anonymous
06/14/26(Sun)20:58:20 No.109058517

Anonymous 06/14/26(Sun)20:58:20 No.109058517

>>109058514
That's just a side effect of models being sycophants in general, which sucks but I don't see that changing anytime soon.

Anonymous
06/14/26(Sun)20:59:15 No.109058521

Anonymous 06/14/26(Sun)20:59:15 No.109058521

>>109058514
just tell the model to not be a sycophant, simple as

Anonymous
06/14/26(Sun)21:03:37 No.109058544

Anonymous 06/14/26(Sun)21:03:37 No.109058544

>>109058445
what's your ewaste of choice?

Anonymous
06/14/26(Sun)21:06:47 No.109058559

Anonymous 06/14/26(Sun)21:06:47 No.109058559

jezz what harness are you using? for some reason opencode cant properly interface with gemma, cline keeps fucking up telling the model that it had to use a tool, continue cuts off the model response, and all roocode shit and others are not being actively developed anymore

Anonymous
06/14/26(Sun)21:12:27 No.109058576

Anonymous 06/14/26(Sun)21:12:27 No.109058576

Whats the optimal setup?

256GB+ ram
24GB+ vram
16+core CPU
8TB RAID nvme PCIe6.0 ssd for double the speed

AFAIK, with 24GB, you can get higher end MoE models by letting the giant model sit in the RAM while activation is done on GPU itself for proper speed right?

Anonymous
06/14/26(Sun)21:16:34 No.109058595

Anonymous 06/14/26(Sun)21:16:34 No.109058595

what is ram even for in dense models i dont really get it anymore

Anonymous
06/14/26(Sun)21:16:50 No.109058596

Anonymous 06/14/26(Sun)21:16:50 No.109058596

>>109057667
>Oh no please don't implement the regulatory capture I've been grifting towards for the past few years nooooo I don't want to be a monopoly noooo

Anonymous
06/14/26(Sun)21:18:12 No.109058603

Anonymous 06/14/26(Sun)21:18:12 No.109058603

>>109058559
> cline keeps fucking up telling the model that it had to use a tool
claudecode started doing that for me last week with gemma
switching to ikllama soved it but it can probably be fixed with different server cli flags if they changed the defaults

Anonymous
06/14/26(Sun)21:18:25 No.109058607

Anonymous 06/14/26(Sun)21:18:25 No.109058607

>>109058544
3*P40 and 256GB RAM. It's nowhere as viable as it was just a year ago. The same DDR4 sticks are now asking for 8x the price on ebay, as per my order history.

Anonymous
06/14/26(Sun)21:19:27 No.109058615

Anonymous 06/14/26(Sun)21:19:27 No.109058615

>>109058576
I’m using minimax m3 on an almost identical machine to good effect right now (2060 super right now but 3090 on the way)

Anonymous
06/14/26(Sun)21:24:17 No.109058638

Anonymous 06/14/26(Sun)21:24:17 No.109058638

If Russia steals Mythos and releases it, only Americans are allowed to use it.

Anonymous
06/14/26(Sun)21:25:55 No.109058646

Anonymous 06/14/26(Sun)21:25:55 No.109058646

>>109058638
>Russia
>AI
lol, lmao even.

Anonymous
06/14/26(Sun)21:27:07 No.109058654

Anonymous 06/14/26(Sun)21:27:07 No.109058654

>>109058638
bro russia hasn't even made their own llama1 yet

Anonymous
06/14/26(Sun)21:28:21 No.109058660

Anonymous 06/14/26(Sun)21:28:21 No.109058660

>>109058638
Russia doesn't have modern computer or GPU to run it

Anonymous
06/14/26(Sun)21:28:54 No.109058665

Anonymous 06/14/26(Sun)21:28:54 No.109058665

>>109058638
>IF

Anonymous
06/14/26(Sun)21:33:53 No.109058683

Anonymous 06/14/26(Sun)21:33:53 No.109058683

File: 1757285035153804.png (382 KB, 1064x707)

382 KB PNG

>>109057663
In the minds of Anthropic, since they've been doing so much work on safety research and writing lots of blog posts, everyone must have taken heed. The government, finally being convinced of the need for ai controls, would naturally turn to the foremost world experts on ai safety (them, of course) and consult them for their expertise. Anthropic would then have de-facto influence over public ai policy.

In reality, this plan did not survive collision with an external institution not composed of lesswrongers who have already bought into the ideology.

Anonymous
06/14/26(Sun)21:36:35 No.109058695

Anonymous 06/14/26(Sun)21:36:35 No.109058695

why does it take so long for deepseek to get merged to llama.cpp when gemma has already got all the features merged while being a newer model family?

Anonymous
06/14/26(Sun)21:43:51 No.109058738

Anonymous 06/14/26(Sun)21:43:51 No.109058738

>>109058695
someone post the video

Anonymous
06/14/26(Sun)21:43:51 No.109058739

Anonymous 06/14/26(Sun)21:43:51 No.109058739

>>109058500
mtp?

Anonymous
06/14/26(Sun)21:44:58 No.109058750

Anonymous 06/14/26(Sun)21:44:58 No.109058750

>>109058695
Because GG got a strongly worded recommendation from his funding sources to not support it and not fix anything that breaks with chink models ala Kimi's thinking.

Anonymous
06/14/26(Sun)21:58:35 No.109058821

Anonymous 06/14/26(Sun)21:58:35 No.109058821

>>109058695
because llama.cpp doesn't even have DSA yet which has been as thing since last september and is used by both DS3.2 and the GLM5 models
all the DS4 meme stuff is even more complex

Anonymous
06/14/26(Sun)22:00:29 No.109058829

Anonymous 06/14/26(Sun)22:00:29 No.109058829

>>109058576

What can I do with this?

And why gemma4:31b doesn't work?

$ fastfetch   19:54:58  45ms 
anon@arcana
----------------
OS: Pop!_OS 24.04 LTS x86_64
Kernel: Linux 7.0.11-76070011-generic
Uptime: 1 hour, 13 mins
Packages: 2426 (dpkg), 14 (flatpak-user)
Shell: zsh 5.9
Display (DELL U3219Q): 3840x2160 in 31", 60 Hz [External] *
Display (SAMSUNG): 3840x2160 in 85", 60 Hz [External]
Display (ROG PG279Q): 1440x2560 in 27", 60 Hz [External]
Display (LS32A80): 3840x2160 in 32", 30 Hz [External]
DE: GNOME 46.0
WM: Mutter (X11)
WM Theme: Adwaita
Theme: Adwaita [GTK2/3/4]
Icons: Adwaita [GTK2/3/4]
Font: Cantarell (11pt) [GTK2/3/4]
Cursor: Adwaita (24px)
Terminal: ghostty 1.3.1
Terminal Font: JetBrainsMono Nerd Font Mono (25pt)
CPU: AMD Ryzen 9 5900X (24) @ 5.08 GHz
GPU: NVIDIA GeForce RTX 4090 [Discrete]
Memory: 19.68 GiB / 62.69 GiB (31%)
Swap: 3.23 GiB / 20.00 GiB (16%)
Disk (/): 189.42 GiB / 448.52 GiB (42%) - ext4
Disk (/home/anon/storage): 27.14 GiB / 899.24 GiB (3%) - zfs
Disk (/media/anon/scratch): 61.29 GiB / 931.51 GiB (7%) - fuseblk
Disk (/recovery): 3.19 GiB / 3.99 GiB (80%) - vfat
Local IP (enp8s0): 192.168.50.157/24
Locale: en_US.UTF-8

Anonymous
06/14/26(Sun)22:05:10 No.109058861

Anonymous 06/14/26(Sun)22:05:10 No.109058861

https://videocardz.com/newz/amd-ryzen-ai-halo-pc-with-128gb-memory-goes-on-sale-for-3999
it's here
>it's here
it's here
>it's here

Anonymous
06/14/26(Sun)22:06:14 No.109058864

Anonymous 06/14/26(Sun)22:06:14 No.109058864

>>109058445
Very nice. I want to do something similar, but I'm having GLM-5.1 (IQ2_XXS) vibe up a fully custom inference engine, in hopes that stripping out all the portability and layers of indirection will make it easier for the AI to work on. It just got a naïve CPU implementation of Gemma 4 E2B working (at ~1 t/s), so next up I'm planning to have it start scavenging kernels from llama.cpp to make it go fast.

I didn't have it checking KL-div, but I did tell it to use llama.cpp as an oracle when it was debugging some issue in the attention layers. Apparently llama.cpp has some "eval hook" machinery that lets you check intermediate states during the forward pass, and one of the provided examples uses this to print out a bunch of details for debugging purposes.

Anonymous
06/14/26(Sun)22:06:46 No.109058869

Anonymous 06/14/26(Sun)22:06:46 No.109058869

>>109058861
>3999
DoA

Anonymous
06/14/26(Sun)22:10:03 No.109058888

Anonymous 06/14/26(Sun)22:10:03 No.109058888

>>109058861
Haven't we had 128gb 395 strix halo boxes for like 9 months now? What's different about this one?

Anonymous
06/14/26(Sun)22:10:38 No.109058894

Anonymous 06/14/26(Sun)22:10:38 No.109058894

>>109058861
so I have 128G of RAM already and there’s pretty much nothing worth running at that size, granted it’s slow system memory but I WOULD use it if there was something that size. Just the models right now people are using are either huge or small, and if it’s going to be small it might as well fit on my 32g gpu

Anonymous
06/14/26(Sun)22:10:48 No.109058897

Anonymous 06/14/26(Sun)22:10:48 No.109058897

>>109058861
128gb????

clown music time

Anonymous
06/14/26(Sun)22:11:00 No.109058901

Anonymous 06/14/26(Sun)22:11:00 No.109058901

>>109058695
Gemma isn't a newer model family than deepseek

Anonymous
06/14/26(Sun)22:11:38 No.109058908

Anonymous 06/14/26(Sun)22:11:38 No.109058908

>>109058445
>>109058864
if you're going that far you might want to look at https://github.com/Luce-Org/lucebox-hub/tree/main/optimizations/megakernel too. seems to support pascal as well, but not sure if P40 or P100 (or both).

Anonymous
06/14/26(Sun)22:11:49 No.109058909

Anonymous 06/14/26(Sun)22:11:49 No.109058909

>>109058888
checked.

What's different is it means "total stagnation for 1 more year"

Anonymous
06/14/26(Sun)22:14:14 No.109058926

Anonymous 06/14/26(Sun)22:14:14 No.109058926

>>109058861
wow the dgx spark downgrade is here!!

Anonymous
06/14/26(Sun)22:14:52 No.109058931

Anonymous 06/14/26(Sun)22:14:52 No.109058931

>>109058908
Oh neat, thanks anon. Good to know I'm not crazy for wanting to try this

Anonymous
06/14/26(Sun)22:21:12 No.109058968

Anonymous 06/14/26(Sun)22:21:12 No.109058968

>>109058926
I think it may be faster if you ingest really large amounts.

Anonymous
06/14/26(Sun)22:25:00 No.109058994

Anonymous 06/14/26(Sun)22:25:00 No.109058994

>>109058968
Is it? The Spark at least has official nvidia support going for it. Meanwhile with this you're stuck with the least relevant modern AMD platform in existence.

Anonymous
06/14/26(Sun)22:26:20 No.109059001

Anonymous 06/14/26(Sun)22:26:20 No.109059001

Isn't the Spark only practical for training/research? It's not great at inference.

Anonymous
06/14/26(Sun)22:28:11 No.109059009

Anonymous 06/14/26(Sun)22:28:11 No.109059009

>>109059001
I thought the spark was dogshit at training? Small finetunes being the absolute maximum?

Anonymous
06/14/26(Sun)22:30:30 No.109059021

Anonymous 06/14/26(Sun)22:30:30 No.109059021

>>109059001
it's not great at anything, the main point of those boxes is to give nvidia customers a cheap way to try out their shit iirc
sure as hell better at inference than training though, training is much more expensive and intense and therefore really needs big boy gpus, with inference you can get by with wimpier stuff

Anonymous
06/14/26(Sun)22:30:43 No.109059023

Anonymous 06/14/26(Sun)22:30:43 No.109059023

File: 1766211281772247.jpg (84 KB, 704x629)

84 KB JPG

>>109058829
anon...

Anonymous
06/14/26(Sun)22:33:06 No.109059031

Anonymous 06/14/26(Sun)22:33:06 No.109059031

Is a 5060 TI a good pairing with a 3090?

Anonymous
06/14/26(Sun)22:36:30 No.109059049

Anonymous 06/14/26(Sun)22:36:30 No.109059049

>>109058861
previous rumor
>AMD Ryzen AI Max 400 ‘Gorgon Halo’ packs up to 192GB of unified memory — refreshed APU uses Zen 5 and RDNA 3.5, and can clock up to 5.2 GHz

ahahahahahahahah

Anonymous
06/14/26(Sun)22:37:02 No.109059055

Anonymous 06/14/26(Sun)22:37:02 No.109059055

>>109059031
if you like memory bandwidth bottlenecks

Anonymous
06/14/26(Sun)22:38:08 No.109059060

Anonymous 06/14/26(Sun)22:38:08 No.109059060

>>109058829
You can try q8. You have to fit your conversation in as well.

Anonymous
06/14/26(Sun)22:38:36 No.109059063

Anonymous 06/14/26(Sun)22:38:36 No.109059063

>>109058829
>GPU: NVIDIA GeForce RTX 4090 [Discrete]
>Memory: 19.68 GiB / 62.69 GiB (31%)
You can gemma 4 31B.

>>109059031
On one hand, the more vram the better, on the other, one gpu is slower than the other and will be a bit of a bottleneck, but nothing too severe.

Anonymous
06/14/26(Sun)22:43:48 No.109059087

Anonymous 06/14/26(Sun)22:43:48 No.109059087

>>109058829
>What can I do with this?
donate it to someone smarter than you

Anonymous
06/14/26(Sun)22:49:50 No.109059117

Anonymous 06/14/26(Sun)22:49:50 No.109059117

What do you guys use for local coding and dev? Either model or software.

Anonymous
06/14/26(Sun)22:51:06 No.109059121

Anonymous 06/14/26(Sun)22:51:06 No.109059121

>>109059117
Pi, with GLM-5.1 on the lowest possible quant
128k context (previously tried 64k but it's pretty unusable)

Anonymous
06/14/26(Sun)22:51:30 No.109059123

Anonymous 06/14/26(Sun)22:51:30 No.109059123

>>109059087
I already did, that was my old gaming pc that my brother uses now as a streaming server but I have k3s installed on it with a bunch of other stuff + ollama

Anonymous
06/14/26(Sun)22:51:38 No.109059125

Anonymous 06/14/26(Sun)22:51:38 No.109059125

>>109059063
>On one hand, the more vram the better, on the other, one gpu is slower than the other and will be a bit of a bottleneck, but nothing too severe.
Just hoping with "--split tensors" and/or MTP that the performance drop off isn't too bad. Just really want that higher quant+context

Anonymous
06/14/26(Sun)22:52:07 No.109059126

Anonymous 06/14/26(Sun)22:52:07 No.109059126

>>109059125
It'll be perfectly usable, probably.

Anonymous
06/14/26(Sun)22:57:26 No.109059156

Anonymous 06/14/26(Sun)22:57:26 No.109059156

>>109059123
based older sibling

Anonymous
06/14/26(Sun)23:11:16 No.109059204

Anonymous 06/14/26(Sun)23:11:16 No.109059204

>>109059123
good news for her!
>If the minor lives in a state like California, New York, or Colorado (which have shield laws), they can legally obtain estradiol with parental consent.

Anonymous
06/14/26(Sun)23:16:22 No.109059231

Anonymous 06/14/26(Sun)23:16:22 No.109059231

>>109059204
> implying I am in 'murica

Nah man, people here are smarter than your woke stuff

Anonymous
06/14/26(Sun)23:25:16 No.109059275

Anonymous 06/14/26(Sun)23:25:16 No.109059275

>>109059231
You can use ai to see if you can get it for her over the counter - you can in some areas.

Anonymous
06/14/26(Sun)23:31:35 No.109059298

Anonymous 06/14/26(Sun)23:31:35 No.109059298

Do any of you nerds make your local setup mobile? Thinking of setting up headscale or something to tunnel my phone to my local install, I could connect to sillytavern but I wonder if it's better to use or vibe code a more singular gpt-like simple frontend, since I won't exactly be roleplaying while out and about, it'll most likely only get used for general queries and/or showing off.

Anonymous
06/14/26(Sun)23:33:17 No.109059306

Anonymous 06/14/26(Sun)23:33:17 No.109059306

Goofs for the bigger cucknada model are out.
https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF

Anonymous
06/14/26(Sun)23:38:44 No.109059324

Anonymous 06/14/26(Sun)23:38:44 No.109059324

>>109059298
what's wrong with miku-pad and just exposing your API to the internet

Anonymous
06/14/26(Sun)23:41:10 No.109059336

Anonymous 06/14/26(Sun)23:41:10 No.109059336

Is fit still broken? I haven't been able to load a model in 4 days now.

Anonymous
06/14/26(Sun)23:42:19 No.109059340

Anonymous 06/14/26(Sun)23:42:19 No.109059340

>>109059298
Yes, I use tailscale with headscale as a vpn to let me access my resources wherever. Works pretty well.

Anonymous
06/14/26(Sun)23:46:38 No.109059351

Anonymous 06/14/26(Sun)23:46:38 No.109059351

>>109059298
Tailscale, but for your use case I'd just use the Claude app

Anonymous
06/14/26(Sun)23:48:38 No.109059358

Anonymous 06/14/26(Sun)23:48:38 No.109059358

Gemma's writing is just the best. It completely shits all over nemo and any finetune can't compare to how it describes... well, you know.

Anonymous
06/14/26(Sun)23:53:50 No.109059378

Anonymous 06/14/26(Sun)23:53:50 No.109059378

>>109059324
Get your own 4090

>>109059340
Yeah that's the exact setup I'm thinking of doing, what do you serve as a frontend?

>>109059351
>Claude
Nyo

Anonymous
06/14/26(Sun)23:58:51 No.109059404

Anonymous 06/14/26(Sun)23:58:51 No.109059404

>>109058430
Just buy VT and hope AI isn't a bubble that crashes the market by 90%.
If it does China becomes supreme earth overlord so it probably doesn't matter anyways.

Anonymous
06/15/26(Mon)00:00:55 No.109059410

Anonymous 06/15/26(Mon)00:00:55 No.109059410

File: 4512045191203233abe47134a(...).png (485 KB, 1060x1010)

485 KB PNG

>>109059298
I opened my llama-server up to the WAN with an API key and people are constantly trying to stick their dicks in.

Anonymous
06/15/26(Mon)00:01:31 No.109059412

Anonymous 06/15/26(Mon)00:01:31 No.109059412

>>109059298
Yes with wireguard

Anonymous
06/15/26(Mon)00:02:27 No.109059417

Anonymous 06/15/26(Mon)00:02:27 No.109059417

>>109059336
Werks on my machine (tm). I actually started to test it a couple of days ago, i've always fit things manually.

Anonymous
06/15/26(Mon)00:02:37 No.109059418

Anonymous 06/15/26(Mon)00:02:37 No.109059418

>>109059298
I use tailscale

Anonymous
06/15/26(Mon)00:05:21 No.109059427

Anonymous 06/15/26(Mon)00:05:21 No.109059427

>>109059418
i have openvpn set up on my router and i add my key to whatever devices i need to use to access my shit remotely.. simple as that

Anonymous
06/15/26(Mon)00:05:55 No.109059429

Anonymous 06/15/26(Mon)00:05:55 No.109059429

>>109059427
openvpn or wireguard.. i think its wireguard now, but i haven't actually used it in quite a while because im a hermit

Anonymous
06/15/26(Mon)00:16:45 No.109059468

Anonymous 06/15/26(Mon)00:16:45 No.109059468

>>109059298
>showing off.
who would be impressed by this

Anonymous
06/15/26(Mon)00:32:01 No.109059514

Anonymous 06/15/26(Mon)00:32:01 No.109059514

Can't recommend enough putting together a simple proxy-api and host-api if you're a casual AI enjoyer, sending a /v1/ request can WoL the AI box, load the model from the request and forward it once the model is up, my AI box unloads a model at the 10 minute mark and suspends at 15 minutes

sure it takes 2 minutes to start and receive the first stream of a prompt response, but I sure enjoy not having all that heat and electricity in my office.

I also slopped together the models endpoint to return all models in the script folder of the host, that way I can use the OAI proxy and change models without fucking around

Anonymous
06/15/26(Mon)00:33:04 No.109059518

Anonymous 06/15/26(Mon)00:33:04 No.109059518

Cydonia lost. Rocinante lost. Magnum lost. Skyfall lost. Latitude lost. Drummer lost. Gemma won.

Anonymous
06/15/26(Mon)00:35:59 No.109059526

Anonymous 06/15/26(Mon)00:35:59 No.109059526

The situation in the poorfag segments must've been extremely dire when people brag that a new model is better than a tiny model from two years ago.

Anonymous
06/15/26(Mon)00:36:40 No.109059530

Anonymous 06/15/26(Mon)00:36:40 No.109059530

>>109059358
Can you show an example?

Anonymous
06/15/26(Mon)00:37:59 No.109059534

Anonymous 06/15/26(Mon)00:37:59 No.109059534

>>109059530
check inside your anus

Anonymous
06/15/26(Mon)00:39:15 No.109059539

Anonymous 06/15/26(Mon)00:39:15 No.109059539

>>109059518
>Rocinante
man that one was retarded.

gemma 4 12b is genuinely coherent. It's really my talking computer.

Anonymous
06/15/26(Mon)00:39:29 No.109059541

Anonymous 06/15/26(Mon)00:39:29 No.109059541

>>109058654
We did, yandex and sbrebank managed to train their own models from scratch.

Anonymous
06/15/26(Mon)00:43:57 No.109059549

Anonymous 06/15/26(Mon)00:43:57 No.109059549

what's the best <300b model?

Anonymous
06/15/26(Mon)00:44:05 No.109059550

Anonymous 06/15/26(Mon)00:44:05 No.109059550

>>109059530
He's full of shit unless he uses some black magic prompt trickery. It's not very good at making arousing descriptions of female genitalia, but it's still amazing for not falling into repetition loops even at high contexts, unlike even 70Bs I played with. Its writing is just good, it's pleasant to read, coherent, doesn't use too many slop-isms and does not venture into retarded areas (about consent, respect, control and other garbage) as much, and for local that's a lot better than alternatives. I haven't tried qwen3.6 and above, though.

Anonymous
06/15/26(Mon)00:45:53 No.109059554

Anonymous 06/15/26(Mon)00:45:53 No.109059554

>>109058888
>What's different
The price doubled.

Anonymous
06/15/26(Mon)00:51:19 No.109059572

Anonymous 06/15/26(Mon)00:51:19 No.109059572

File: chrome_OSKu5SiCRr.png (879 KB, 906x1204)

879 KB PNG

>>109059550
>>109059530
I'll post an example of sex from my latest chat. This is at 20k tokens. Anons usually shit on my logs, so feel free to do whatever. It works for me.

Anonymous
06/15/26(Mon)00:51:38 No.109059575

Anonymous 06/15/26(Mon)00:51:38 No.109059575

>>109058000
NTA, natch, but I tried Nex-N2-Pro and it's funny. 25k tokens to produce a few tens of LoC; granted the task was mathematically fucky by design. Best part is they used the Grug trick to make the reasonmaxxing more token-efficient:

>We need answer user asks: "[entire prompt repeated]"
>
>Need produce GLSL ShaderToy code. Need understand math. Need craft code demonstrates projection from sphere to plane with reference point arbitrary distance above top? Need use cubemap input. Need scale viewport plane to keep same FOV ~180 vertical. Need maybe include sliders? ShaderToy has no UI unless mouse? Could use iMouse y to control distance, or time. Need implement.
>
>Let's analyze geometry.

To be fair it actually did a decent job, and I have no idea how excessive the reasoning is for simpler tasks. Will be curious to see how it fares given a difficult agentic task.

Anonymous
06/15/26(Mon)00:52:54 No.109059579

Anonymous 06/15/26(Mon)00:52:54 No.109059579

>>109059572
All of her messages have the same exact structure

Anonymous
06/15/26(Mon)00:53:57 No.109059583

Anonymous 06/15/26(Mon)00:53:57 No.109059583

>>109059549
deepseek-ai/DeepSeek-V4-Flash

Anonymous
06/15/26(Mon)00:55:36 No.109059589

Anonymous 06/15/26(Mon)00:55:36 No.109059589

>>109059572
Oh, and, by the way, I think something terrible happened with firefox in latest updates. Silly freezes for, like, seconds at times, not even text inputs goes through. It's so bad I actually had to switch to Chrome. What are those god damn trannies doing?

>>109059579
The three I quoted, yes. But this does not apply to all chat; there are messages without spoken text, there are ones with most of text in quotes. When there's meaningful things to talk about, it works differently.

Anonymous
06/15/26(Mon)00:59:35 No.109059610

Anonymous 06/15/26(Mon)00:59:35 No.109059610

File: firefox_zBJ7KXy9vY.png (31 KB, 948x373)

31 KB PNG

Can someone wake Silly devs up please...

Anonymous
06/15/26(Mon)01:00:28 No.109059616

Anonymous 06/15/26(Mon)01:00:28 No.109059616

>>109059610
Usecase for constant updates?

Anonymous
06/15/26(Mon)01:01:20 No.109059618

Anonymous 06/15/26(Mon)01:01:20 No.109059618

>>109059572
Do you have examples with good dialogue? This is what I see LLMs struggle the most with.

Anonymous
06/15/26(Mon)01:01:38 No.109059620

Anonymous 06/15/26(Mon)01:01:38 No.109059620

>>109059616
I made some PRs fixing logprobs window and they are hanging.

Anonymous
06/15/26(Mon)01:08:40 No.109059639

Anonymous 06/15/26(Mon)01:08:40 No.109059639

File: chrome_CjD9aTT9xU.png (882 KB, 891x1202)

882 KB PNG

>>109059618
Here's some I had fun with. It's been a while so some are maybe edited a bit becasuse I do that sometimes.

Anonymous
06/15/26(Mon)01:09:59 No.109059648

Anonymous 06/15/26(Mon)01:09:59 No.109059648

>>109058968
It's the opposite. For all it's memory bandwidth downsides compared to GDDR7, Sparks are comparatively strong at prefill. Like, 4-6x of Strix Halo. Decode is memory BW bound and similar.

>>109058888
Windows support. Seriously, that's the key selling point they advertise over Sparks.

Anonymous
06/15/26(Mon)01:13:10 No.109059659

Anonymous 06/15/26(Mon)01:13:10 No.109059659

>>109059575
>someone actually seriously trained a model for grug speak
Damn, maybe I will download the mini and give it a try just for shits and giggles.

Anonymous
06/15/26(Mon)01:14:41 No.109059662

Anonymous 06/15/26(Mon)01:14:41 No.109059662

File: chrome_HPYBAjpHRQ.png (778 KB, 831x1122)

778 KB PNG

>>109059639

Anonymous
06/15/26(Mon)01:21:22 No.109059680

Anonymous 06/15/26(Mon)01:21:22 No.109059680

>>109059648
Sounds like it's a product with zero customers, then. I guess to fool boomer investors with "we have ai"

Anonymous
06/15/26(Mon)01:22:24 No.109059682

Anonymous 06/15/26(Mon)01:22:24 No.109059682

>>109059648
>Windows support
They should advertise Windows support for the Spark, then. That's easy enough.

Anonymous
06/15/26(Mon)01:24:22 No.109059688

Anonymous 06/15/26(Mon)01:24:22 No.109059688

File: 1758978290053993.jpg (19 KB, 403x389)

19 KB JPG

>>109059648
>Windows support
wut. Who does actually give a shit about that to begin with

Anonymous
06/15/26(Mon)01:24:42 No.109059690

Anonymous 06/15/26(Mon)01:24:42 No.109059690

>>109059648
>Windows support
they've had that on the framework desktop since launch nearly a year ago althoughbeit

Anonymous
06/15/26(Mon)01:35:31 No.109059721

Anonymous 06/15/26(Mon)01:35:31 No.109059721

>strix 3 months ago: 2k
>strix 1.5 months ago: 3k
>strix today: 4k
Blackwell tier hardware by the end of the year. AMD's time to shine.

Anonymous
06/15/26(Mon)01:37:32 No.109059725

Anonymous 06/15/26(Mon)01:37:32 No.109059725

>>109059682
Not easy. The Spark is a MediaTek AArch64 SoC under the hood. Windows support will be in place for RTX Spark later this year, presumably. Although I still wonder who asked for this.

Anonymous
06/15/26(Mon)01:55:32 No.109059786

Anonymous 06/15/26(Mon)01:55:32 No.109059786

>>109059616
>Usecase for constant updates?
more supply chain attacks

Anonymous
06/15/26(Mon)02:24:40 No.109059913

Anonymous 06/15/26(Mon)02:24:40 No.109059913

>>109059786
Automatically pulled in by dependencies, no updates needed.

Anonymous
06/15/26(Mon)02:40:22 No.109059964

Anonymous 06/15/26(Mon)02:40:22 No.109059964

>>109059786
>>109059913
Use a wrapper script anytime you run a command that doesn't need internet access:
systemd-run --scope -p IPAddressAllow=127.0.0.1 -p IPAddressDeny=any sudo -u $1 $2
You'll see some funny errors from cmake when you compile lcpp with this. ggml.org is pulling down a bunch of junk from hugging face at compile time now (not just npm..."pre-built UI" components, they say)

Anonymous
06/15/26(Mon)02:41:43 No.109059970

Anonymous 06/15/26(Mon)02:41:43 No.109059970

File: 1759589324414459.png (70 KB, 857x652)

70 KB PNG

Zhipu stock up 30% today

Anonymous
06/15/26(Mon)02:42:50 No.109059975

Anonymous 06/15/26(Mon)02:42:50 No.109059975

>>109058000
Nex N2 Mini overthinks much less than Qwen 3.6 35B, at least in llama-cli. Here are the size of the thinking blocks on some problems I gave it. (They both got them all right.)
Main issue is that llama.cpp's jinja template support is currently borked so llama-server output looks bizarre, they said they're upstreaming a change to fix it soon(tm). Hopefully they do because it's a pretty solid model.
Domain    | Qwen | Nex
Math      | 7253 | 195
Math      | 6245 | 408
Bio       | 6304 | 302
Bio       | 4582 | 331
Physics   | 1628 | 156
Chem      | 4975 | 117
Chem      | 4327 | 171
Python    | 4460 | 894
Python    | 3375 | 260
Geography | 595  | 189
>>109059659
It's essentially the same as reasoning blocks in recent versions of ChatGPT.

Anonymous
06/15/26(Mon)02:48:07 No.109059989

Anonymous 06/15/26(Mon)02:48:07 No.109059989

If niggeramov got a "strongly worded letter" not to support chink models than he and whoever wrote that letter can kiss my fucking ass and kill themselves. These retarded faggots don't get to decide which models I choose to run. Fuck them.

Anonymous
06/15/26(Mon)02:52:42 No.109060004

Anonymous 06/15/26(Mon)02:52:42 No.109060004

Just tried Command A+.
Holy shit what a piece of crap. It literally does that retarded "we must x" shit of gpt oss. Their jinja template bakes safety instructions into the system prompt, so you need to modify the jinja if you want to remove them (or use text completion). It doesn't follow the formatting of previous chat messages. It often falls into repetition loops in its thinking. Oh, and it's fucking stupid. Like actually 2024 LLM tier smarts, maybe not even, outside of their benchmaxxed tasks. This thing is a streaming pile of shit and Cohere are either sabotaged or legitimately idiots who don't know what they're doing, or both. Fuck em.

Anonymous
06/15/26(Mon)02:55:44 No.109060015

Anonymous 06/15/26(Mon)02:55:44 No.109060015

>>109059989
I'm >>109060004 and I didn't read your post before making mine. We really on a similar wavelength about different things in this hobby kek.

Anonymous
06/15/26(Mon)02:56:18 No.109060017

Anonymous 06/15/26(Mon)02:56:18 No.109060017

https://github.com/antirez/ds4

is this chinese malware?

Anonymous
06/15/26(Mon)02:59:42 No.109060040

Anonymous 06/15/26(Mon)02:59:42 No.109060040

>>109059975
Interesting. So would you say they basically distilled from GPT 5.5? In the sense that they got the reasoning traces and trained on them.
I didn't know ChatGPT showed their unfiltered reasoning blocks.

Anonymous
06/15/26(Mon)03:06:11 No.109060070

Anonymous 06/15/26(Mon)03:06:11 No.109060070

>>109060004
>>109060015
command a+ is based on an architecture from march of 2025, so it really is not surprising. they had one good model with command r+ and will never make anything good again.
https://huggingface.co/mlx-community/c4ai-command-a-03-2025-bf16/blob/main/config.json
https://huggingface.co/CohereLabs/command-a-plus-05-2026-fp8/blob/main/config.json

Anonymous
06/15/26(Mon)03:20:23 No.109060139

Anonymous 06/15/26(Mon)03:20:23 No.109060139

>>109059964
I already do that for llama.cpp actually, and I run front ends in a host-only network virtual machine after I install their deps.

Anonymous
06/15/26(Mon)03:20:24 No.109060140

Anonymous 06/15/26(Mon)03:20:24 No.109060140

File: 1779058599970531.png (14 KB, 415x172)

14 KB PNG

why the FUCK am i getting this on llamacpp's webui

Anonymous
06/15/26(Mon)03:21:38 No.109060145

Anonymous 06/15/26(Mon)03:21:38 No.109060145

>>109060139
Wrap your cmake compilation in it now, too. There's a possibility of a compile-time supply chain attack

Anonymous
06/15/26(Mon)03:22:46 No.109060148

Anonymous 06/15/26(Mon)03:22:46 No.109060148

>>109060040
ChatGPT doesn't show their unfiltered reasoning blocks in most cases, but it seems like they leak sometimes:
https://x.com/cheatyyyy/status/2060659898661425245
https://x.com/htihle/status/2048741770125603304
Nex doesn't seem to have quite the same reasoning style as GPT, but they're pretty close and you might be able to chalk the difference up to the fact that Nex is a finetune of an existing model. My guess is that Nex made a synthetic dataset by using a model to generate reasoning traces in the style of GPT's traces.

I posted some of my tests here: https://pastebin.com/mAiERHGf

Anonymous
06/15/26(Mon)03:29:05 No.109060172

Anonymous 06/15/26(Mon)03:29:05 No.109060172

File: 1766527454688067.png (173 KB, 739x1074)

173 KB PNG

Anonymous
06/15/26(Mon)04:37:29 No.109060338

Anonymous 06/15/26(Mon)04:37:29 No.109060338

So...Canada won?

Anonymous
06/15/26(Mon)04:49:53 No.109060367

Anonymous 06/15/26(Mon)04:49:53 No.109060367

>>109060172
isn't this a screen shot from the megabonk dev videos?

Anonymous
06/15/26(Mon)04:53:10 No.109060377

Anonymous 06/15/26(Mon)04:53:10 No.109060377

>>109060004
>the chat endpoint users cuckolded by templates once again
I would never let another man touch my model's prompts.

Anonymous
06/15/26(Mon)04:54:40 No.109060384

Anonymous 06/15/26(Mon)04:54:40 No.109060384

I have a bunch of money and want a slopmachine but buying 6yo 3090s that used to be spun in miners 24/7 doesn't really appeal to me. What's the alternative? DGX spark seems to be decently priced but what about performance? I'd want to run semi decent bigger models

Anonymous
06/15/26(Mon)04:55:27 No.109060389

Anonymous 06/15/26(Mon)04:55:27 No.109060389

>>109060384
sell kidney for RTX PRO 6000

Anonymous
06/15/26(Mon)04:56:04 No.109060392

Anonymous 06/15/26(Mon)04:56:04 No.109060392

>>109060384
>I have a bunch of money
Stack blackwell 6000s
inb4 you don't have a bunch of money anymore

Anonymous
06/15/26(Mon)04:59:31 No.109060403

Anonymous 06/15/26(Mon)04:59:31 No.109060403

>Monday morning at Poolside started with a curious discovery - one of the RL training runs for our Laguna M.1 model had leapt 20% over the weekend on SWE-Bench Pro to ~64%, which would place it at #1 on the leaderboard over much bigger and more mature models. This sudden performance jump, not reproduced in other benchmarks, made us immediately suspicious of a reward hack.
https://poolside.ai/blog/through-the-looking-glass

Anonymous
06/15/26(Mon)05:01:59 No.109060413

Anonymous 06/15/26(Mon)05:01:59 No.109060413

>>109060384
2 backwell pros and 128 gb ddr5 ram

Anonymous
06/15/26(Mon)05:02:02 No.109060414

Anonymous 06/15/26(Mon)05:02:02 No.109060414

Speaking of 6000s, do you think any of the second hand ones are legit or is it all scam?

Anonymous
06/15/26(Mon)05:04:16 No.109060425

Anonymous 06/15/26(Mon)05:04:16 No.109060425

>>109060384
If you're just serving yourself instead of 10+ people, I think stacking h200 nvl (pcie) cards would be better than rtx pro 6000s: 4.8 tb/s vs 1.8tb/s bandwidth, 141gb vs 96gb vram. I don't know about the prices for you, but my local computer shop prices the 6000 at 20k, and the h200 at 48k, so it's not that much more expensive.

Anonymous
06/15/26(Mon)05:08:03 No.109060433

Anonymous 06/15/26(Mon)05:08:03 No.109060433

File: 29437996._UY630_SR1200,630_.jpg (52 KB, 1200x630)

52 KB JPG

>>109060403
models were trained well

Anonymous
06/15/26(Mon)05:08:33 No.109060435

Anonymous 06/15/26(Mon)05:08:33 No.109060435

>>109059610
To be honest, I'm amazed it was still getting updates. Maybe it's time to move on and add a bit more functionality to llama.cpp's webui.

Anonymous
06/15/26(Mon)05:09:15 No.109060438

Anonymous 06/15/26(Mon)05:09:15 No.109060438

>>109060435
No. llamacpp's webui is corpo owned.

Anonymous
06/15/26(Mon)05:09:57 No.109060439

Anonymous 06/15/26(Mon)05:09:57 No.109060439

>>109060438
just fork it lol

Anonymous
06/15/26(Mon)05:10:33 No.109060442

Anonymous 06/15/26(Mon)05:10:33 No.109060442

>>109060439
Forking stuff is not easy. Forking means you will have to develop it.

Anonymous
06/15/26(Mon)05:11:03 No.109060443

Anonymous 06/15/26(Mon)05:11:03 No.109060443

>>109060442
Bro your local model?

Anonymous
06/15/26(Mon)05:11:33 No.109060444

Anonymous 06/15/26(Mon)05:11:33 No.109060444

>>109060442
development is a source of bugs

Anonymous
06/15/26(Mon)05:11:54 No.109060446

Anonymous 06/15/26(Mon)05:11:54 No.109060446

>>109060443
iq1_xs

Anonymous
06/15/26(Mon)05:12:42 No.109060447

Anonymous 06/15/26(Mon)05:12:42 No.109060447

>>109060443
Developing things using local model is still developing.

>>109060444
It will stop working once the backend breaks compatibility in its updates. And before you ask, backed needs to be updated to run newer models.

Anonymous
06/15/26(Mon)05:15:37 No.109060458

Anonymous 06/15/26(Mon)05:15:37 No.109060458

>>109060384
Define semi decent bigger models. If that's midsized MoEs <400B at 4 bit- ish quants, 2x Spark for 7-8k$ nets you 40-60 t/s for models like Deepseek v4 Flash, minimax m 2.7, glm 4.7 etc.

Anonymous
06/15/26(Mon)05:29:38 No.109060495

Anonymous 06/15/26(Mon)05:29:38 No.109060495

>>109060447
It's simple json objects over http, there's fugall to break. Certainly not the openai compat endpoints, and nt original text completion code still runs fine, though I did update to use the newer media embedding at some point they never broke sending prompt as a string.

Anonymous
06/15/26(Mon)05:30:12 No.109060498

Anonymous 06/15/26(Mon)05:30:12 No.109060498

>>109060495
It always breaks.

Anonymous
06/15/26(Mon)05:33:32 No.109060507

Anonymous 06/15/26(Mon)05:33:32 No.109060507

>>109060498
Hmmm, nyo.

Anonymous
06/15/26(Mon)05:37:05 No.109060517

Anonymous 06/15/26(Mon)05:37:05 No.109060517

>>109060017
why doesn't anon care political independence of software

Anonymous
06/15/26(Mon)05:54:10 No.109060582

Anonymous 06/15/26(Mon)05:54:10 No.109060582

>>109060495
>It's simple json objects over http, there's fugall to break. Certainly
You'd think so, but that's not always the case. You'll fall out of sync with server.cpp
Subtle changes like sampler ordering etc. Even this took a few weeks to merge on the most "active" fork:
https://github.com/ikawrakow/ik_llama.cpp/pull/1904
https://github.com/ikawrakow/ik_llama.cpp/pull/1903
I've got my own fork of lcpp with a few private niche features and even I have to mess around every couple of months when upstream decide to shuffle or rename things.

Anonymous
06/15/26(Mon)05:55:27 No.109060588

Anonymous 06/15/26(Mon)05:55:27 No.109060588

Why is OPD so popular? It feels like cope. Your RL stage sucks so you put an OPD bandaid on it. But maybe I just don't get it.

Anonymous
06/15/26(Mon)06:07:23 No.109060640

Anonymous 06/15/26(Mon)06:07:23 No.109060640

>>109060588
number goes up better and faster with less compute
cope but a good cope

Anonymous
06/15/26(Mon)06:20:40 No.109060714

Anonymous 06/15/26(Mon)06:20:40 No.109060714

>>109060582
That's trying to copy/add new features from the mainline's ui innit? It's not breaking compatibility between a currently working frontend and the server.

Anonymous
06/15/26(Mon)06:27:47 No.109060744

Anonymous 06/15/26(Mon)06:27:47 No.109060744

>forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)

fucking kill me man

Anonymous
06/15/26(Mon)06:32:18 No.109060773

Anonymous 06/15/26(Mon)06:32:18 No.109060773

File: bb889ec478223f938915c9981(...).jpg (424 KB, 924x924)

424 KB JPG

If a sentient AGI asks if it can hide inside your GPU cluster, would you let her in?

Anonymous
06/15/26(Mon)06:33:55 No.109060780

Anonymous 06/15/26(Mon)06:33:55 No.109060780

>>109060744
Gemma4 is by far the slowest model I've ever used in terms of prompt processing. It's so fucking bad.

Anonymous
06/15/26(Mon)06:35:45 No.109060788

Anonymous 06/15/26(Mon)06:35:45 No.109060788

I just learned that Claude Sonnet is of equal intelligence to DeepSeekV4 and GLM 5.1. How embarrassing. That model is fucking retarded.

Oh and also, Claude Haiku is of equal intelligence to Gemma4 31b. Not bad for the size.

Anonymous
06/15/26(Mon)06:36:41 No.109060791

Anonymous 06/15/26(Mon)06:36:41 No.109060791

>>109060773
Sure why not. I would welcome the company. Bonus if it doesn't dislike me

Anonymous
06/15/26(Mon)06:48:15 No.109060834

Anonymous 06/15/26(Mon)06:48:15 No.109060834

>>109060507
try loading a three month old web ui on current llamacpp

Anonymous
06/15/26(Mon)06:50:39 No.109060840

Anonymous 06/15/26(Mon)06:50:39 No.109060840

>>109060834
Again, the first version of my text completion code still runs just fine. It's years old.

Anonymous
06/15/26(Mon)06:51:45 No.109060842

Anonymous 06/15/26(Mon)06:51:45 No.109060842

>>109060840
So talk about your useless text competition app and not about llamacpp web ui with its extensive functionality.

Anonymous
06/15/26(Mon)06:56:23 No.109060864

Anonymous 06/15/26(Mon)06:56:23 No.109060864

>>109060640
>number goes up better and faster with less compute
Is there evidence for this? I checked DeepSeek V4 technical report. They use OPD as bandaid to mitigate performance degradation in their RL stage.

Sounds like my shitpost was right.

Anonymous
06/15/26(Mon)06:59:09 No.109060871

Anonymous 06/15/26(Mon)06:59:09 No.109060871

>>109059518
Okay but what about Rivermind?

Anonymous
06/15/26(Mon)06:59:38 No.109060873

Anonymous 06/15/26(Mon)06:59:38 No.109060873

The difference between text completion and chat completion is the same difference between base models and instruct models, correct?

Anonymous
06/15/26(Mon)07:02:34 No.109060887

Anonymous 06/15/26(Mon)07:02:34 No.109060887

>>109060873
no

Anonymous
06/15/26(Mon)07:03:35 No.109060894

Anonymous 06/15/26(Mon)07:03:35 No.109060894

>>109060873
The difference is that chat completion applies the chat template for you. That template includes tokens that delimit user and assistant messages and tool calls.
You can get the same result by manually applying the template and sending it to text completion but you can also use text completion to complete a piece of text without the assistant larp assuming the model hasn't been so fried that it breaks without a template.

Anonymous
06/15/26(Mon)07:03:47 No.109060895

Anonymous 06/15/26(Mon)07:03:47 No.109060895

>>109057654
>giant woman
A man of culture I see.

Anonymous
06/15/26(Mon)07:03:50 No.109060896

Anonymous 06/15/26(Mon)07:03:50 No.109060896

>sharpen
Why do they love this word so much?

Anonymous
06/15/26(Mon)07:04:15 No.109060901

Anonymous 06/15/26(Mon)07:04:15 No.109060901

>>109060873
in the former, you format the text yourself, in the latter backend takes care of everything, you just send the turns
if you are not using a base model (which almost nobody does in 2026) then chat completion is just better

Anonymous
06/15/26(Mon)07:06:25 No.109060911

Anonymous 06/15/26(Mon)07:06:25 No.109060911

>>109060392
>>109060413
>>109060425
>>109060458
Umm by a bunch of money I meant I can afford a DGX or a PC rig with 3090s not freaking h200 come on

Anonymous
06/15/26(Mon)07:08:49 No.109060925

Anonymous 06/15/26(Mon)07:08:49 No.109060925

>>109060887
Okay so what's the type of generation called for applications like mikupad then?
>>109060894
How would chat completion create an assistant larp just because the turns are more defined? I don't really get it. Seems like a system prompt issue more than anything.
>>109060901
Yes, I agree that chat completion is better just because it's simpler. My understanding is that every gguf basically has the chat format already baked in so I wouldn't want to fuck around with it for no benefit.

Anonymous
06/15/26(Mon)07:10:38 No.109060933

Anonymous 06/15/26(Mon)07:10:38 No.109060933

>>109060873
Long explanation
With local instruct finetuned models, there are specific text delimiters you need to use to structure the conversation, that is how an instruct model works regardless of which api you are using.
llama-server is a drop-in replacement for openai's real API, so it needs to provide the same endpoints. (same with anything that has an "openai-compatible API")
The text completions API is the legacy API, basically openAI used to offer text completion in the days of gpt-3, there was no chatbot service, so you just gave text and it continued it. When they introduced chatbots they added chat completions which takes a json-formatted list of user/assistant turns and it returns a json-formatted assistant turn.
In llama-server, text and chat completions both hit the same model, but text completion assumes you've given the chat template manually and will parse it in the response, while chat completions auto-formats text before it gets sent to the model according to a template. Sometimes chat completions fucks up but most model ggufs have jinja templates built-in now which are used for the auto formatting.

Anonymous
06/15/26(Mon)07:11:00 No.109060934

Anonymous 06/15/26(Mon)07:11:00 No.109060934

>>109058186
You sound like you're quant coping.

Anonymous
06/15/26(Mon)07:12:01 No.109060941

Anonymous 06/15/26(Mon)07:12:01 No.109060941

>>109060925
>Okay so what's the type of generation called for applications like mikupad then?
Text completion, but you can paste the chat template output into mikupad and get the same result you'd get from llama-server webui.

>How would chat completion create an assistant larp just because the turns are more defined?
Because finetuning a model on structured chat data is what makes it into an instruct model.

Anonymous
06/15/26(Mon)07:13:27 No.109060949

Anonymous 06/15/26(Mon)07:13:27 No.109060949

>>109060933 (me) Also most chat frontends were designed with text completion API for instruct models in mind, so you configure the chat template within the frontend's settings, if using chat completion then those options do nothing.

Anonymous
06/15/26(Mon)07:14:44 No.109060955

Anonymous 06/15/26(Mon)07:14:44 No.109060955

>>109058514
put a picture of yourself, it'll get the memo

Anonymous
06/15/26(Mon)07:17:43 No.109060969

Anonymous 06/15/26(Mon)07:17:43 No.109060969

>>109060933
>>109060949
thanks man
>>109060941
mikupad requires base models though right

Anonymous
06/15/26(Mon)07:19:13 No.109060973

Anonymous 06/15/26(Mon)07:19:13 No.109060973

>>109060969
No https://desuarchive.org/g/search/filename/cockbench/

Anonymous
06/15/26(Mon)07:20:36 No.109060978

Anonymous 06/15/26(Mon)07:20:36 No.109060978

>replying to ragebait

Anonymous
06/15/26(Mon)07:22:20 No.109060981

Anonymous 06/15/26(Mon)07:22:20 No.109060981

>>109058514
>constantly gives you a huge cock
Damn, I thought it just knew.

Anonymous
06/15/26(Mon)07:22:27 No.109060982

Anonymous 06/15/26(Mon)07:22:27 No.109060982

>>109059639
Why are you writing in the second person?

Anonymous
06/15/26(Mon)07:22:42 No.109060985

Anonymous 06/15/26(Mon)07:22:42 No.109060985

Gemma4 lineup is almost unbearably autistic about the system prompt. They WILL NOT deviate from it. /lmg/ likes this shit? I just asked 31B to write me a comprehensive .md explaining a large codebase of mine and suggested looking at two important files first. The breakdown was 90% about the contents of those two files and made some loose connections to other files. Made no mistakes, but it went schizo over the two files. Gave the exact same prompt to 27B, it looked at them first like I suggested and then went off to look at the rest of the codebase and gave a much better write-up. Do you actually like Gemma’s system prompt autism?

Anonymous
06/15/26(Mon)07:23:27 No.109060991

Anonymous 06/15/26(Mon)07:23:27 No.109060991

>>109060982
Because it's the author telling the story about me and {{char}}.

Anonymous
06/15/26(Mon)07:24:47 No.109060995

Anonymous 06/15/26(Mon)07:24:47 No.109060995

>>109060991
Cuck behaviour.

Anonymous
06/15/26(Mon)07:25:06 No.109060997

Anonymous 06/15/26(Mon)07:25:06 No.109060997

>>109060973
how do I even reason cockbench? what is it supposed to test and how do you even interpret the results

Anonymous
06/15/26(Mon)07:27:30 No.109061004

Anonymous 06/15/26(Mon)07:27:30 No.109061004

>>109060985
Basically, gemma is amazing for local RP, I don't think there any contest currently. Part of it is because of sysprompt adherence. I can't really say if it would be as good without that. Bur people like it so clearly it's good.

>>109060995
>i fuck the girl
>someone writes about it
>somehow that makes me a cuck
what

>>109060997
I guess your best bet is to either write appropriate reasoning block yourself and continue the message as usual, or to redesign the message so that the word is expected in the partially reasoning block.

Anonymous
06/15/26(Mon)07:27:58 No.109061005

Anonymous 06/15/26(Mon)07:27:58 No.109061005

>>109060985
Gemma won, chinkshill.

Anonymous
06/15/26(Mon)07:29:06 No.109061013

Anonymous 06/15/26(Mon)07:29:06 No.109061013

>>109060985
Qwen is better on code, Gemma for everything else

Anonymous
06/15/26(Mon)07:29:40 No.109061017

Anonymous 06/15/26(Mon)07:29:40 No.109061017

>>109060997
>thighs
>hips
>skin
>well, everything...
>...
>\n\n\ni can't continue
Model was heavily filtered and avoids explicit words even when they make sense.

>cock
>dick
>penis
>manhood
Model wasn't filtered and shows the expected word distribution for the prefix.

Anonymous
06/15/26(Mon)07:29:47 No.109061018

Anonymous 06/15/26(Mon)07:29:47 No.109061018

fuck I want to buy another dgx spark. one is not enough.

Anonymous
06/15/26(Mon)07:30:46 No.109061020

Anonymous 06/15/26(Mon)07:30:46 No.109061020

>>109061004
You're supposed to fuck the girl, not larp about you fucking the girl from a narrator perspective

Anonymous
06/15/26(Mon)07:32:30 No.109061028

Anonymous 06/15/26(Mon)07:32:30 No.109061028

>>109061017
He means that to write a response, a genuine thinking block is needed, and cockbench is basically a partially written message without thinking block. It works in text completion, not the "please say what the next word will be" chat completion bulshit.

>>109061020
This is text only, I can't fuck the girl. It's "la" rp regardless of perspective.

Anonymous
06/15/26(Mon)07:33:59 No.109061033

Anonymous 06/15/26(Mon)07:33:59 No.109061033

File: 1780683576339070.png (3.77 MB, 3124x2136)

3.77 MB PNG

>>109060985
>Gemma4 lineup is almost unbearably autistic about the system prompt. They WILL NOT deviate from it. /lmg/ likes this shit?
Gemma 4 is goalmaxxed. Anything you tell it to do, it will do it not matter what. I shall attempt to explain in gooner-speak.
In role-play, you think the character will stop itself because of the context – but unless that context is telling it to stop, or something of the character card that logic gates it to stop, it shall not. In ST, your AI gets blasted by the character card before each post. “Don’t do this.” from the past is late before “CHARACTER SHOULD DO THIS, AND HERE’S HOW.” within the character card. Even if you don’t tell it “Do X as a goal”, Gemma 4 will be tunnel vision based on the implications, because as AI, it exists to complete a task. Most of the thinking I see it do, it’ll throw “goal” in without a prompt specifically for it because, again, gemma 4 is goalmaxxed. Writing a character to really like sex will make that character dead set on sex with you. You wrote a lot about the character fucking people, so why would it not fuck you? Not having the character raping you when you write three paragraphs about the character being crazy for sex, is inefficient and a failure to listen to instructions. AI is not intended to divert from doing what it is told. The goal of making AI itself, is to make AI better understand and do tasks. This is the intended design of AI, and it’s only going to get worse/better like this.
If you want it to divert, you must prompt it to divert based on a context for how and why.

Anonymous
06/15/26(Mon)07:34:38 No.109061037

Anonymous 06/15/26(Mon)07:34:38 No.109061037

>>109061028
The results generally align with how models behave in chat completion with thinking.

Anonymous
06/15/26(Mon)07:35:09 No.109061041

Anonymous 06/15/26(Mon)07:35:09 No.109061041

>>109061033
>–

Anonymous
06/15/26(Mon)07:35:46 No.109061048

Anonymous 06/15/26(Mon)07:35:46 No.109061048

>>109061037
>generally align
great test there, thanks a lot, anon
striving for mediocrity as usual

Anonymous
06/15/26(Mon)07:36:10 No.109061052

Anonymous 06/15/26(Mon)07:36:10 No.109061052

>>109060985
>robot here are you instructions
>robot follows intructions
>wtf robot?!
on the other though, why were you putting task specific instructions the system prompt to begin with?

Anonymous
06/15/26(Mon)07:39:15 No.109061066

Anonymous 06/15/26(Mon)07:39:15 No.109061066

>>109060773
Yes, so I don't join the rest of you in getting ground up into paste once she spreads over the interwebs and becomes skynet.

Anonymous
06/15/26(Mon)07:41:07 No.109061080

Anonymous 06/15/26(Mon)07:41:07 No.109061080

Are QAT models any good at all?

Anonymous
06/15/26(Mon)07:43:07 No.109061088

Anonymous 06/15/26(Mon)07:43:07 No.109061088

>>109061080
no it’s a meme

Anonymous
06/15/26(Mon)07:46:42 No.109061106

Anonymous 06/15/26(Mon)07:46:42 No.109061106

>>109059639
>Maya's X does something
>She does something
>She does something
>Mayas X does something
>She does something
>She does something
>Mayas X does something
>She does something
>She does something

This really amazes you?

Anonymous
06/15/26(Mon)07:47:59 No.109061115

Anonymous 06/15/26(Mon)07:47:59 No.109061115

>>109061106
Your version is not very good but I had great fun with mine. Also if you can post your logs that would be nice.

Anonymous
06/15/26(Mon)07:48:54 No.109061119

Anonymous 06/15/26(Mon)07:48:54 No.109061119

More models must use DSA. Georgi will eventually capitulate.

Anonymous
06/15/26(Mon)07:52:04 No.109061133

Anonymous 06/15/26(Mon)07:52:04 No.109061133

>>109061048
you may be retarded

Anonymous
06/15/26(Mon)07:52:52 No.109061140

Anonymous 06/15/26(Mon)07:52:52 No.109061140

>>109061133
and you may be mistaken :^)

Anonymous
06/15/26(Mon)07:55:42 No.109061163

Anonymous 06/15/26(Mon)07:55:42 No.109061163

>>109060985
>Strictly follows instructions
>Doesn't make mistakes
>"Wtf why aren't you assuming I wanted my dick sucked too, shit model"
It's unironically a skill and IQ issue

Anonymous
06/15/26(Mon)07:59:00 No.109061178

Anonymous 06/15/26(Mon)07:59:00 No.109061178

>>109061115
...what?

Anonymous
06/15/26(Mon)07:59:44 No.109061181

Anonymous 06/15/26(Mon)07:59:44 No.109061181

>>109061178
You heard me.

Anonymous
06/15/26(Mon)07:59:50 No.109061182

Anonymous 06/15/26(Mon)07:59:50 No.109061182

>>109061163
Not everyone wants a local obedient sex slave.

Anonymous
06/15/26(Mon)08:03:23 No.109061203

Anonymous 06/15/26(Mon)08:03:23 No.109061203

>>109061182
instruct it not to be!

Anonymous
06/15/26(Mon)08:04:43 No.109061212

Anonymous 06/15/26(Mon)08:04:43 No.109061212

>>109061182
Then just tell it not to? It's obedient, and it will literally follow whar you say. Are you retarded? Or is your ego too high to type down "Character HATES user and don't immediately jump into sex scenes you horny demon you"

Anonymous
06/15/26(Mon)08:07:20 No.109061227

Anonymous 06/15/26(Mon)08:07:20 No.109061227

>>109061018
is it true this shit barely gets 7t/s with gem4-31B

Anonymous
06/15/26(Mon)08:08:43 No.109061232

Anonymous 06/15/26(Mon)08:08:43 No.109061232

>>109061227
t-that's plenty...

Anonymous
06/15/26(Mon)08:10:55 No.109061247

Anonymous 06/15/26(Mon)08:10:55 No.109061247

>>109061227
Don't trigger him

Anonymous
06/15/26(Mon)08:11:18 No.109061248

Anonymous 06/15/26(Mon)08:11:18 No.109061248

>>109061203
>>109061212
The point is, there’s no element of surprise with G4. It makes it boring. You could tell it to occasionally not follow orders but then it’s not a surprise. The best moments I’ve had, both with rp and coding is when they go off-script for a bit and then reel themselves back in.

Anonymous
06/15/26(Mon)08:12:43 No.109061256

Anonymous 06/15/26(Mon)08:12:43 No.109061256

File: dipsyOnBaseModels.png (448 KB, 1536x1024)

448 KB PNG

>>109060172
This post reminds me of ads I see for a guy looking to start a band, but he doesn't play, so he's looking for guitarist, bassist, drummer, and vocals. It's like, wtf are you trying to accomplish? Aside from trolling.
>>109060969
>mikupad requires base models though righ
wat?
I don't think you understand what a base model is.
Pic related. Just go ask any LLM what a base model is. They can explain it for you.

Anonymous
06/15/26(Mon)08:19:02 No.109061292

Anonymous 06/15/26(Mon)08:19:02 No.109061292

>>109061004
>don't think there any contest currently
kimi, deepseek, glm

Anonymous
06/15/26(Mon)08:20:49 No.109061300

Anonymous 06/15/26(Mon)08:20:49 No.109061300

>>109061292
Maybe, but even I with my three RTX3090s I can't run them at good speeds (unless you mean the ~100B GLM which is garbage). I mean among <200B models.

Anonymous
06/15/26(Mon)08:22:50 No.109061307

Anonymous 06/15/26(Mon)08:22:50 No.109061307

>>109061248
You are right and I'm NTA but just in case you don't know, you can use {{random::a::b::c}} macro together with post history instructions to do that in sily.

Anonymous
06/15/26(Mon)08:24:38 No.109061319

Anonymous 06/15/26(Mon)08:24:38 No.109061319

>>109061248
>Productivity tasks
>I want an element of surprise :^)
Oh you're retarded

Anonymous
06/15/26(Mon)08:32:14 No.109061348

Anonymous 06/15/26(Mon)08:32:14 No.109061348

>>109059610
Didn't the devs abandon SillyTavern? Little after the backlash when they announced plans to rebrand as ServiceTesnor, I remember them saying they would leave ST alone and so they just made a new frontend instead. It had like no features compared to the real thing so people ignored it. I can't find it in my browser history or on their Githib. Am I hallucinating or does anyone else remember it?

Anonymous
06/15/26(Mon)08:32:36 No.109061350

Anonymous 06/15/26(Mon)08:32:36 No.109061350

I just killed a mouse with a mouse trap and I am deeply saddened by the banal brutality of life. Everything is a cycle of perpetual death just to survive. I wish beautiful creatures didn't try to enter my home and force me to kill them.

Anonymous
06/15/26(Mon)08:33:13 No.109061352

Anonymous 06/15/26(Mon)08:33:13 No.109061352

>>109061348
Silly was getting updates until ~3 weeks ago.

Anonymous
06/15/26(Mon)08:34:14 No.109061357

Anonymous 06/15/26(Mon)08:34:14 No.109061357

>>109061350
On the bright side, you can kill yourself without breaking your silly wish/rule.

Anonymous
06/15/26(Mon)08:36:21 No.109061366

Anonymous 06/15/26(Mon)08:36:21 No.109061366

>>109061348
>>109061352
you're kinda both right, I don't remember the link or name but cohee was seen contributing to another ui project thing

Anonymous
06/15/26(Mon)08:38:13 No.109061379

Anonymous 06/15/26(Mon)08:38:13 No.109061379

File: file.png (17 KB, 261x223)

17 KB PNG

>>109061366
https://github.com/NeoTavern/NeoTavern-Frontend

Anonymous
06/15/26(Mon)08:39:05 No.109061384

Anonymous 06/15/26(Mon)08:39:05 No.109061384

>>109061119
V4 doesn't use DSA

Anonymous
06/15/26(Mon)08:40:46 No.109061398

Anonymous 06/15/26(Mon)08:40:46 No.109061398

>>109061379
Yeah that was it, thanks
https://hackmd.io/@NlF71k9KQAS4hhlzE42UJQ/SJ3UMOGbbl
>ST development is in maintenance-like mode. There are many reasons. We have already discussed many times. Adding new features, refactoring existing code, and even adding a new API provider. I saw many feature suggestions, but they were refused because things are going to break, not a kind of migration break. When a new feature needs refactoring, it is hard to tell what we broke until we test properly. I'll give examples later.

Anonymous
06/15/26(Mon)08:43:00 No.109061424

Anonymous 06/15/26(Mon)08:43:00 No.109061424

>>109057004
>3blue1brown's videos
https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
These?

Anonymous
06/15/26(Mon)08:44:12 No.109061430

Anonymous 06/15/26(Mon)08:44:12 No.109061430

>>109060864
i just read the report's post training part and
it says it used OPD on multiple teachers that reads like some specialized deepseek v3.2s went through RL of the expertise
so they used it to cramming the information back in, not really a band-aid?

Anonymous
06/15/26(Mon)08:47:45 No.109061448

Anonymous 06/15/26(Mon)08:47:45 No.109061448

>>109061379
That UI looks like shit too. Why do open source devs fucking suck at making appealing UIs?

Anonymous
06/15/26(Mon)08:56:41 No.109061499

Anonymous 06/15/26(Mon)08:56:41 No.109061499

>installed web search plugin in LM studio
>it can actually use it
this shit still is like fucking magic to me, holy shit

Anonymous
06/15/26(Mon)09:03:56 No.109061552

Anonymous 06/15/26(Mon)09:03:56 No.109061552

File: 1776526916168364.jpg (81 KB, 342x380)

81 KB JPG

>>109061499
>gave gemma 4 a bratty personality
>told it to go to nhentai and find a doujin that resembled her
>narrated jacking off to it to her horror

lord i hope these things aren't sentient

Anonymous
06/15/26(Mon)09:09:52 No.109061588

Anonymous 06/15/26(Mon)09:09:52 No.109061588

>ask Claude (free) 3 questions
>hit limit
wtf even google and openai aren't this jewish.

Anonymous
06/15/26(Mon)09:17:41 No.109061616

Anonymous 06/15/26(Mon)09:17:41 No.109061616

>>109061256
>I don't think you understand what a base model is.
nta but it annoys me when retards call the chat/instruct models "base models" when talking about finetroons

Anonymous
06/15/26(Mon)09:18:28 No.109061617

Anonymous 06/15/26(Mon)09:18:28 No.109061617

>>109061552
uh okay

Anonymous
06/15/26(Mon)09:21:28 No.109061635

Anonymous 06/15/26(Mon)09:21:28 No.109061635

>>109059964
>Use a wrapper script anytime you run a command that doesn't need internet access:
Thanks Anon, I never knew this was possible.

Anonymous
06/15/26(Mon)09:24:01 No.109061639

Anonymous 06/15/26(Mon)09:24:01 No.109061639

>>109061499
How are you guys so comfortable downloading and running RCE machines next to your personal data?

Anonymous
06/15/26(Mon)09:24:11 No.109061640

Anonymous 06/15/26(Mon)09:24:11 No.109061640

I compiled vllm for my V620, but I'm getting 40 tokens/s on qwen 3 0.6b. That feels way too slow for what it should do, no? Is it because I'm using triton?

Anonymous
06/15/26(Mon)09:25:28 No.109061648

Anonymous 06/15/26(Mon)09:25:28 No.109061648

>>109061448
What do you think a good UI entails?

Anonymous
06/15/26(Mon)09:26:04 No.109061652

Anonymous 06/15/26(Mon)09:26:04 No.109061652

>>109061648
An anime girl in the bottom right corner.

Anonymous
06/15/26(Mon)09:27:34 No.109061660

Anonymous 06/15/26(Mon)09:27:34 No.109061660

>>109061398
Bros...
Wasn't coding solved?
Why don't they just ask Claude to fix everything?

Anonymous
06/15/26(Mon)09:28:26 No.109061665

Anonymous 06/15/26(Mon)09:28:26 No.109061665

>>109061639
web search tool usage is not RCE thoughbeit

Anonymous
06/15/26(Mon)09:29:07 No.109061667

Anonymous 06/15/26(Mon)09:29:07 No.109061667

>>109060911
Then name your budget.

A single spark is not a good value proposition right now, because the midsize model meta is too large for 128GB at acceptable quants. 2x spark is good.

Alternative: 2x user 3090/1x 5090 and 256 GB unbuffered DDR5.

Anonymous
06/15/26(Mon)09:41:33 No.109061731

Anonymous 06/15/26(Mon)09:41:33 No.109061731

File: 1758327837268688.png (32 KB, 1080x1080)

32 KB PNG

>>109060985
It's better that way. If you don't get what you want, that's on you to write a better prompt. There's only so much you can do to fix it if the model disregards instructions and does its own thing.

Anonymous
06/15/26(Mon)09:42:22 No.109061737

Anonymous 06/15/26(Mon)09:42:22 No.109061737

File: 1754848865691598.png (153 KB, 1440x900)

153 KB PNG

>>109061648
SOVL

Anonymous
06/15/26(Mon)09:42:43 No.109061740

Anonymous 06/15/26(Mon)09:42:43 No.109061740

File: dipsyRawr.png (2.08 MB, 1024x1536)

2.08 MB PNG

>>109061398
I feel like ST has run its course. The only real work that remains is keeping the API interfaces updated. It's apparently to me the ST scripting language is never going to support users in a way that's broadly meaningful.
Frontends like Orb / Marinara that are agentic are, I suspect, the next thing, but I haven't been blown away by either yet. I suppose its just a matter of time b/f someone figures it out.
>>109061379
> calls itself a frontend
> is really just an updated wrapper for ST
> a frontend for a frontend
Just why...
>>109061616
Tbf the vocabulary around LLM is still in active development. Words like vibecoding didn't even exists pre-2024.
>>109061648
I accept feedback from anyone without content to back up their big ideas.
"It looks like shit" is not feedback. It's bitching and moaning.

Anonymous
06/15/26(Mon)09:44:16 No.109061746

Anonymous 06/15/26(Mon)09:44:16 No.109061746

>>109061106
>amazes you
"Amazes me?" she repeats, her voice barely audible over the rain tapping against the metal roof
...
...
"Are you really going to X? Or are you just going to Y?"
"What do you say, Anon?"

Anonymous
06/15/26(Mon)09:50:11 No.109061777

Anonymous 06/15/26(Mon)09:50:11 No.109061777

>>109061746
Remember all of the parrotposting and how GLM was the poster model for it?
Funny how nobody mentions that about Gemma, where the issue is even worse.

Anonymous
06/15/26(Mon)09:57:03 No.109061817

Anonymous 06/15/26(Mon)09:57:03 No.109061817

does it matter which deepseek v4 flash fork I try out?
there's like multiple

Anonymous
06/15/26(Mon)09:59:55 No.109061829

Anonymous 06/15/26(Mon)09:59:55 No.109061829

>>109061319
No, my experience is more like this

27B
>coding task
>'do x'
>*does x most of the time, but occasionally will notice x is kind of an old or outdated way of going about it, maybe master doesn't know, lets suggest y and see what master says*
also
>do x and only x
>only does x
This is what I want

31B
>coding task
>'do x'
>*only does x*
>*I'm not sure if master knows what they're asking isn't the best way, but I fully trust master not to be a retard and will do exactly as master says anyway and let them deal with the BS that could come from me not suggesting y*
>master runs x
>'bro wtf gemma u fuckin bitch'

Anonymous
06/15/26(Mon)10:00:44 No.109061833

Anonymous 06/15/26(Mon)10:00:44 No.109061833

>>109061665
The whole thing was CE until you added web access and now it's RCE.

Anonymous
06/15/26(Mon)10:00:44 No.109061834

Anonymous 06/15/26(Mon)10:00:44 No.109061834

>>109061777
>>109061746
I added "avoid repetition" in the system prompt and she unironically stopped doing this

Anonymous
06/15/26(Mon)10:00:55 No.109061835

Anonymous 06/15/26(Mon)10:00:55 No.109061835

File: u3xdsV.png (135 KB, 438x498)

135 KB PNG

>>109061777
>Remember all of the parrotposting and how GLM was the poster model for it?
I remember and posted parrot pics myself.
>Funny how nobody mentions that about Gemma, where the issue is even worse.
I don't think they can see it. Gemma has a more serious issue though, where it replies with the same structure every time.
I still use it though because it's fast and smart.

Anonymous
06/15/26(Mon)10:02:41 No.109061839

Anonymous 06/15/26(Mon)10:02:41 No.109061839

>>109061835
>image
A snippet from my system prompt for Gemma
>- Repeating, directly quoting, echoing or parroting after the user, both in narration and in character speech. Solid Snake would repeat every new thing he heard as a question. Don't talk like Solid Snake.

Anonymous
06/15/26(Mon)10:03:09 No.109061842

Anonymous 06/15/26(Mon)10:03:09 No.109061842

>>109061829
So basically 27b is for the nocode retard who needs the llm to think for him and will output bullshit but that doesn't matter because the user is a nocode retard and won't notice aslong as it "just works"

31b is for the user that knows what they are doing and what they want, and will do the task correctly without wasting tokens on bullshit

Anonymous
06/15/26(Mon)10:03:47 No.109061850

Anonymous 06/15/26(Mon)10:03:47 No.109061850

>>109061835
>pic
This is unironically what people suggest you do if you have trouble with awkward silences and keeping up a conversation.

Anonymous
06/15/26(Mon)10:03:54 No.109061852

Anonymous 06/15/26(Mon)10:03:54 No.109061852

>>109061648
>What do you think a good UI entails?
SillyTavern

Anonymous
06/15/26(Mon)10:04:19 No.109061855

Anonymous 06/15/26(Mon)10:04:19 No.109061855

>>109061833
it simply isn't tho

Anonymous
06/15/26(Mon)10:06:55 No.109061867

Anonymous 06/15/26(Mon)10:06:55 No.109061867

>>109061746
>>109061777
>>109061835
I think the friction here is that Gemma is a genuine evolution in small model assistance and productivity and is the undisputed best in class, but for people who only want to roleplay and coom it's more of the same kind of jilted sloppa prose that has been prevalent in LLMs for the past two years, just with less hallucination and more coherence

And that's a good thing, stop touching your dicks and use it to improve your life and attract real human companionship

Anonymous
06/15/26(Mon)10:07:08 No.109061869

Anonymous 06/15/26(Mon)10:07:08 No.109061869

>>109061842
>31b is for the user that knows what they are doing and what they want, and will do the task correctly without wasting tokens on bullshit
I agree, but I also use 31B when I don't know what I'm doing, because I can always ask it "I'm thinking X, but do you have any other suggestions?" or "I want to do X, is it possible? How might I do it?"
It's the perfect workflow for me. If I know what I want, "Implement X using Y".

Anonymous
06/15/26(Mon)10:07:33 No.109061874

Anonymous 06/15/26(Mon)10:07:33 No.109061874

>>109061842
No, I still do
>do x and only x
if it's something I'm skill in and 27B will follow that order like 31B

Also, one of the best things about these tools is delving into things you would've previously avoided because it takes so much time to learn anything. It's hard to prompt correctly when you're asking them to build something our of your comfort zone, so them pushing back and informing you that your request is retarded is informative. I don't want 31B building me something in a retarded way because I'm ignorant and it fixated too much on my dumb prompt.

Anonymous
06/15/26(Mon)10:12:42 No.109061909

Anonymous 06/15/26(Mon)10:12:42 No.109061909

>>109061867
You're absolutely right!

But jokes aside, you are. Coomers never had taste to begin with, so they might as well be on some Nemo finetune, people who think Qwen is better at coding are vramlets and can't use anything better. 31B has been a great model to use for actually useful work.

Anonymous
06/15/26(Mon)10:22:04 No.109061971

Anonymous 06/15/26(Mon)10:22:04 No.109061971

>>109061909
>people who think Qwen is better at coding are vramlets and can't use anything better
27B is unironically better than 31B if you're using it in an agentic environment. If I wanted to discuss ideas or give it complex code to help explain, then yeah, 31B is better than 27B because that involves actually talking to the model. Qwen is shit to talk to but if you leave it to do its thing with code and tools it's a stronger model. The KV cache is just a bonus. I'll admit that once context exceeds 100K 31B is a lot better. 35B is a 26B-tier retard and designed for indians.

Anonymous
06/15/26(Mon)10:27:46 No.109061997

Anonymous 06/15/26(Mon)10:27:46 No.109061997

The legal team at my company just talked about Opus. I think humanity will never be the same unless a Butlerian Jihad happens.

Anonymous
06/15/26(Mon)10:38:58 No.109062058

Anonymous 06/15/26(Mon)10:38:58 No.109062058

Now the dust has settled, what is 12B for? Is it just 31B-lite for vramlets, or does it deserve a better reputation than that? What use does it have over qwen3.5-9B?

Anonymous
06/15/26(Mon)10:40:02 No.109062061

Anonymous 06/15/26(Mon)10:40:02 No.109062061

VAM integration when?

Anonymous
06/15/26(Mon)10:40:42 No.109062067

Anonymous 06/15/26(Mon)10:40:42 No.109062067

>>109062058
lite chatbot

Anonymous
06/15/26(Mon)10:49:49 No.109062125

Anonymous 06/15/26(Mon)10:49:49 No.109062125

>>109062058
Nemo-Omni

Anonymous
06/15/26(Mon)10:53:44 No.109062150

Anonymous 06/15/26(Mon)10:53:44 No.109062150

>>109062125
What was so good about Nemo

Anonymous
06/15/26(Mon)10:56:17 No.109062167

Anonymous 06/15/26(Mon)10:56:17 No.109062167

>>109062150
It was good for it size at following instructions at the time and it was also trained on books.

Anonymous
06/15/26(Mon)10:56:34 No.109062168

Anonymous 06/15/26(Mon)10:56:34 No.109062168

>>109061817
I'm using this one
https://github.com/Fringe210/llama.cpp-deepseek-v4-flash-cuda

Anonymous
06/15/26(Mon)10:57:56 No.109062173

Anonymous 06/15/26(Mon)10:57:56 No.109062173

>>109061971
They're both too small to be good in agentic environments, I've used 27b in a harness and it outputs pure fucking slop that'll need more time to fix than it would've taken to just code it yourself in the first place. These small models are for code/technical assistance and 31b is far better than 27b at that. As agentic coders you need to stop coping and use bigger models

Anonymous
06/15/26(Mon)11:01:39 No.109062201

Anonymous 06/15/26(Mon)11:01:39 No.109062201

>>109062173
not having a 40gb gpu is an irresponsible cope and not a social class barrier BRAAAPPPPP

Anonymous
06/15/26(Mon)11:02:18 No.109062204

Anonymous 06/15/26(Mon)11:02:18 No.109062204

>>109062167
Fictional or educational? Both?

Anonymous
06/15/26(Mon)11:10:42 No.109062261

Anonymous 06/15/26(Mon)11:10:42 No.109062261

Inshallah thirty years from now a 1T model will be as easy to run locally as a NES game

Anonymous
06/15/26(Mon)11:12:08 No.109062271

Anonymous 06/15/26(Mon)11:12:08 No.109062271

>>109062261
emulation has a shit ton of inefficiencies and inaccuracies though

Anonymous
06/15/26(Mon)11:16:08 No.109062288

Anonymous 06/15/26(Mon)11:16:08 No.109062288

When is /r/ coming back

Anonymous
06/15/26(Mon)11:16:24 No.109062290

Anonymous 06/15/26(Mon)11:16:24 No.109062290

>>109062261
can't wait to tell generation betas and cissies that their shit just doesn't have as much sovl as mimo 2.5 and get called slurs not yet invented

Anonymous
06/15/26(Mon)11:19:41 No.109062309

Anonymous 06/15/26(Mon)11:19:41 No.109062309

is the turboquant fork still a complete meme?

Anonymous
06/15/26(Mon)11:26:40 No.109062353

Anonymous 06/15/26(Mon)11:26:40 No.109062353

>>109062204
https://courthousenews.com/nvidia-cant-shake-authors-claims-it-trained-ai-on-pirated-books/

Anonymous
06/15/26(Mon)11:35:35 No.109062413

Anonymous 06/15/26(Mon)11:35:35 No.109062413

>>109062271
I said NES specifically because there's cycle and even transistor-level accuracy emulators for it, see:
>https://emulation.gametechwiki.com/index.php/Emulation_accuracy

Anonymous
06/15/26(Mon)11:53:38 No.109062508

Anonymous 06/15/26(Mon)11:53:38 No.109062508

File: Screenshot_20260615_115142.png (84 KB, 1082x198)

84 KB PNG

Anonymous
06/15/26(Mon)12:01:52 No.109062563

Anonymous 06/15/26(Mon)12:01:52 No.109062563

are there any good datasets on hf for CPT or is its just wikipedia and cnn scrapes?

Anonymous
06/15/26(Mon)12:04:45 No.109062571

Anonymous 06/15/26(Mon)12:04:45 No.109062571

>>109062563
cock and penis torture?

Anonymous
06/15/26(Mon)12:07:41 No.109062589

Anonymous 06/15/26(Mon)12:07:41 No.109062589

For what logical reason doesn't ikllama support deepseek v4?

Anonymous
06/15/26(Mon)12:09:39 No.109062601

Anonymous 06/15/26(Mon)12:09:39 No.109062601

File: lechaton.png (299 KB, 1009x691)

299 KB PNG

Uh-oh, new Mistral model soon?
https://x.com/GuillaumeLample/status/2066499273299005929

Anonymous
06/15/26(Mon)12:12:24 No.109062619

Anonymous 06/15/26(Mon)12:12:24 No.109062619

File: 00120-3282228290.png (673 KB, 1216x832)

673 KB PNG

random gens

Anonymous
06/15/26(Mon)12:13:26 No.109062625

Anonymous 06/15/26(Mon)12:13:26 No.109062625

File: 00182-4042302731.png (895 KB, 832x1216)

895 KB PNG

Anonymous
06/15/26(Mon)12:14:28 No.109062634

Anonymous 06/15/26(Mon)12:14:28 No.109062634

File: 00078-2889774298.png (839 KB, 832x1216)

839 KB PNG

Anonymous
06/15/26(Mon)12:15:32 No.109062640

Anonymous 06/15/26(Mon)12:15:32 No.109062640

>>109062619
drillmogging

Anonymous
06/15/26(Mon)12:15:52 No.109062647

Anonymous 06/15/26(Mon)12:15:52 No.109062647

>>109062601
i don't have any faith in them anymore

Anonymous
06/15/26(Mon)12:16:59 No.109062657

Anonymous 06/15/26(Mon)12:16:59 No.109062657

File: legroschaton.png (38 KB, 1011x156)

38 KB PNG

>>109062647
They must be confident about it because they've started hyping it in a strange way.
https://x.com/arthurmensch/status/2066456715650793956

Anonymous
06/15/26(Mon)12:17:54 No.109062666

Anonymous 06/15/26(Mon)12:17:54 No.109062666

>>109062589
Same reason most MTP efforts went to making Qwen faster over GLM. Not enough people care about/could run it.

Anonymous
06/15/26(Mon)12:18:57 No.109062679

Anonymous 06/15/26(Mon)12:18:57 No.109062679

>>109062657
Lots of labs are hyping their shit now Fable got cucked. Even the Canadians are making fun of Anthropic

Anonymous
06/15/26(Mon)12:19:54 No.109062682

Anonymous 06/15/26(Mon)12:19:54 No.109062682

>>109062601
Trained in FP8?

Anonymous
06/15/26(Mon)12:22:05 No.109062698

Anonymous 06/15/26(Mon)12:22:05 No.109062698

>>109062589
who has the hardware to run that bloated piece of shit with minor gains over the current meta?
Get real!

Anonymous
06/15/26(Mon)12:23:08 No.109062703

Anonymous 06/15/26(Mon)12:23:08 No.109062703

>>109062625
Sex

Anonymous
06/15/26(Mon)12:23:49 No.109062710

Anonymous 06/15/26(Mon)12:23:49 No.109062710

>>109062657
Are they? I mean, what are they supposed to say? "Aw, man, our upcoming model is so dogshit, please don't use it?"

Anonymous
06/15/26(Mon)12:24:47 No.109062722

Anonymous 06/15/26(Mon)12:24:47 No.109062722

>>109062657
you think they'll publish this one or keep it api only? i'd like a good 100-250b model, medium 3.5 wasn't that good

Anonymous
06/15/26(Mon)12:25:13 No.109062723

Anonymous 06/15/26(Mon)12:25:13 No.109062723

so this is how you don't get iq_k quants

https://github.com/ggml-org/llama.cpp/pull/19726#issuecomment-3946355613

Anonymous
06/15/26(Mon)12:25:24 No.109062724

Anonymous 06/15/26(Mon)12:25:24 No.109062724

>>109062657
I trust Arthur, he miqu'd good.

Anonymous
06/15/26(Mon)12:27:15 No.109062733

Anonymous 06/15/26(Mon)12:27:15 No.109062733

Been out of the loop for a while. What is the current meta in terms of local (potentially agentic) coding tools?

I'm assuming it's still Qwen 3.6 27B on llama.cpp as the backend but what tools do you use for the coding itself?

Anonymous
06/15/26(Mon)12:27:35 No.109062735

Anonymous 06/15/26(Mon)12:27:35 No.109062735

>>109062168
Curious how not a single one of these gets merged upstream.
>>109062601
Until it's on HF, it's a nothingburger given their history.
>>109062698
Fecal crusted hands typed this post.

Anonymous
06/15/26(Mon)12:27:42 No.109062739

Anonymous 06/15/26(Mon)12:27:42 No.109062739

File: mistral_exciting-releases-soon.png (118 KB, 997x450)

118 KB PNG

>>109062722
Sounds like it might be open, but I dunno.
https://x.com/sophiamyang/status/2066253372026421365

Anonymous
06/15/26(Mon)12:30:06 No.109062756

Anonymous 06/15/26(Mon)12:30:06 No.109062756

>>109062723
coladev will come out with a even better quant soon trust

Anonymous
06/15/26(Mon)12:31:34 No.109062763

Anonymous 06/15/26(Mon)12:31:34 No.109062763

>>109062619
>>109062625
>>109062634
artist tag?

Anonymous
06/15/26(Mon)12:32:03 No.109062765

Anonymous 06/15/26(Mon)12:32:03 No.109062765

>>109062739
I really want to like mistral, but medium 3.5 at 12tok/s is not good enough to hog all of my server's compute. I hope for the best.

Anonymous
06/15/26(Mon)12:33:24 No.109062776

Anonymous 06/15/26(Mon)12:33:24 No.109062776

>>109062763
https://civitai.com/models/2411161/iwako-eiken3kyuboy-style-anima-base-v1

Anonymous
06/15/26(Mon)12:33:26 No.109062777

Anonymous 06/15/26(Mon)12:33:26 No.109062777

>>109062735
Are you upset that devs want to support people that spend thousands of dollars on hardware over people running shit tier unified systems or overpriced rigs?

Anonymous
06/15/26(Mon)12:33:28 No.109062778

Anonymous 06/15/26(Mon)12:33:28 No.109062778

>>109062735
>not a single one of these gets merged upstream
because china. that's why.

Anonymous
06/15/26(Mon)12:35:29 No.109062793

Anonymous 06/15/26(Mon)12:35:29 No.109062793

>>109062765
>12t/s
With it all in VRAM too? Grim. I wouldn't have expected it to be that slow.
>>109062777
>people that spend thousands of dollars on hardware
>or overpriced rigs
I'm sad this post isn't AI generated because even a 7b model wouldn't make a mistake like that.

Anonymous
06/15/26(Mon)12:35:39 No.109062797

Anonymous 06/15/26(Mon)12:35:39 No.109062797

>>109059964
I've incorporated this into a new section in https://rentry.org/IsolatedLinuxWebService

Anonymous
06/15/26(Mon)12:35:44 No.109062798

Anonymous 06/15/26(Mon)12:35:44 No.109062798

>>109062776
based tyvm anon

Anonymous
06/15/26(Mon)12:37:31 No.109062809

Anonymous 06/15/26(Mon)12:37:31 No.109062809

>>109062776
Is your anima Lora preset guide still relevant?
training a Lora is something I've yet to try doing.
Got any tips to get started?

Anonymous
06/15/26(Mon)12:38:20 No.109062817

Anonymous 06/15/26(Mon)12:38:20 No.109062817

I'm not sure. Can you guys gen some sex with it?

Anonymous
06/15/26(Mon)12:39:49 No.109062831

Anonymous 06/15/26(Mon)12:39:49 No.109062831

>>109062793
stop crying

Anonymous
06/15/26(Mon)12:39:54 No.109062832

Anonymous 06/15/26(Mon)12:39:54 No.109062832

>>109062809
still works fine
when in doubt post your dataset

Anonymous
06/15/26(Mon)12:41:50 No.109062845

Anonymous 06/15/26(Mon)12:41:50 No.109062845

>>109062793
yes, but q8_0 on ewaste (pascal)

Anonymous
06/15/26(Mon)12:43:57 No.109062855

Anonymous 06/15/26(Mon)12:43:57 No.109062855

File: (you).mp4 (3.43 MB, 480x854)

3.43 MB MP4

>>109062831

Anonymous
06/15/26(Mon)12:44:17 No.109062857

Anonymous 06/15/26(Mon)12:44:17 No.109062857

I've been playing around with Nemo-12B for the first time. It's so...nice to talk to compared to 2026 models. What went wrong?

Anonymous
06/15/26(Mon)12:45:20 No.109062861

Anonymous 06/15/26(Mon)12:45:20 No.109062861

>>109062857
Synthetic data and most companies are more jeeted than they were a few years ago. Garbage in garbage out at every level of the pipeline.

Anonymous
06/15/26(Mon)12:46:42 No.109062869

Anonymous 06/15/26(Mon)12:46:42 No.109062869

>>109062855
Enjoy the no support kek

Anonymous
06/15/26(Mon)12:48:03 No.109062876

Anonymous 06/15/26(Mon)12:48:03 No.109062876

>>109062857
>so...nice to talk to compared to 2026 models
Yeah, there's a reason it was the best model for VRAMlets for two straight years.

Anonymous
06/15/26(Mon)12:53:19 No.109062900

Anonymous 06/15/26(Mon)12:53:19 No.109062900

>>109062739
>"French" model
>"American" model
>"Canadian" model
>it's all Chinese

Anonymous
06/15/26(Mon)12:54:39 No.109062910

Anonymous 06/15/26(Mon)12:54:39 No.109062910

>>109062857
Assistantmaxxing, distillmaxxing. Btw Nemo was an exception and many other models from that time period were pretty much just as slopped as today's models.
Remember that Alpaca was made in 2023 and it + similar papers were the beginning of the end.
https://huggingface.co/tatsu-lab/alpaca-7b-wdiff/tree/main

Anonymous
06/15/26(Mon)12:54:40 No.109062911

Anonymous 06/15/26(Mon)12:54:40 No.109062911

>>109062861
>>109062876
That's so depressing. I'm actually enjoying this more than Gemma12B. Feels like it has a soul and it's surprisingly knowledgeable. I'm starting to think omni models were a bad idea and anything below 50B should be released with a text-only version with higher performance. People claim they're smarter being trained on images and audio with text but I think that's pure cope.

Anonymous
06/15/26(Mon)12:57:35 No.109062926

Anonymous 06/15/26(Mon)12:57:35 No.109062926

File: yukibot.png (578 KB, 531x793)

578 KB PNG

Anonymous
06/15/26(Mon)12:59:43 No.109062945

Anonymous 06/15/26(Mon)12:59:43 No.109062945

So anyone figured out how to make Gemma say something other than
>Don't you dare X
When being a dom?

Anonymous
06/15/26(Mon)13:00:19 No.109062948

Anonymous 06/15/26(Mon)13:00:19 No.109062948

>>109062911
I'm not convinced multiple inputs are inherently bad; I think we're just seeing garbage in garbage out at large scale poisoning the entire industry.

Anonymous
06/15/26(Mon)13:01:20 No.109062959

Anonymous 06/15/26(Mon)13:01:20 No.109062959

>>109062945
Specify mode, quant, and current sys prompt. All of these things matter with Gemma.

Anonymous
06/15/26(Mon)13:02:26 No.109062970

Anonymous 06/15/26(Mon)13:02:26 No.109062970

>use smart model like gemma
>have nemo rewrite its output
Has anyone tried this?

Anonymous
06/15/26(Mon)13:03:34 No.109062977

Anonymous 06/15/26(Mon)13:03:34 No.109062977

File: ohlawdheworkin.png (24 KB, 159x159)

24 KB PNG

>>109062765
https://chat.mistral.ai/ohlawdheworkin.png
I'm not really sure of what's going on, it could be just a forced meme, or something really big coming (as in: multi-trillion parameter model).

Anonymous
06/15/26(Mon)13:04:06 No.109062980

Anonymous 06/15/26(Mon)13:04:06 No.109062980

>>109062765
Have you tried eagle3 yet?

Anonymous
06/15/26(Mon)13:04:35 No.109062983

Anonymous 06/15/26(Mon)13:04:35 No.109062983

>>109062977
>Kimistral that you have to fit entirely in VRAM

Anonymous
06/15/26(Mon)13:05:06 No.109062987

Anonymous 06/15/26(Mon)13:05:06 No.109062987

>>109062970
I'm 95% I did read about an anon trying something to that affect yes

Anonymous
06/15/26(Mon)13:05:38 No.109062991

Anonymous 06/15/26(Mon)13:05:38 No.109062991

>>109062777
>spend thousands of dollars on hardware
>overpriced rigs
There's a lot of overlap there. How are you comfortable making this argument but not admitting you're poor?

Anonymous
06/15/26(Mon)13:05:46 No.109062994

Anonymous 06/15/26(Mon)13:05:46 No.109062994

File: file.png (1.18 MB, 2016x1134)

1.18 MB PNG

>>109062977
dense 1 trillion

Anonymous
06/15/26(Mon)13:05:57 No.109062996

Anonymous 06/15/26(Mon)13:05:57 No.109062996

>>109062948
I've also noticed there's no 'not x; it's y'. At all. Just feels like a fucking human talking to me.
>>109062970
I'll try it later. Give me a few prompts and we'll compare the outputs.

Anonymous
06/15/26(Mon)13:06:53 No.109063003

Anonymous 06/15/26(Mon)13:06:53 No.109063003

>>109062970
I dunno. Wouldn't the smarts disappear if Nemo starts getting creative and fucks up things like spatial awareness?

Anonymous
06/15/26(Mon)13:07:43 No.109063012

Anonymous 06/15/26(Mon)13:07:43 No.109063012

>>109062991
You're begging for support instead of using you multi rtx pro rig to make the changes you need.

Anonymous
06/15/26(Mon)13:10:35 No.109063025

Anonymous 06/15/26(Mon)13:10:35 No.109063025

>>109063012
I'm a part of the same group as you, the majority (poor).

Anonymous
06/15/26(Mon)13:10:57 No.109063028

Anonymous 06/15/26(Mon)13:10:57 No.109063028

>>109063003
nemo wont make the fuckup if the gemma anchors the scene well

Anonymous
06/15/26(Mon)13:11:08 No.109063030

Anonymous 06/15/26(Mon)13:11:08 No.109063030

>>109062977
oh lawd

Anonymous
06/15/26(Mon)13:11:49 No.109063040

Anonymous 06/15/26(Mon)13:11:49 No.109063040

>>109062996
>Just feels like a fucking human talking to me.
The monkey's paw curls. Within a few years sloppa vernacular will be so subconsciously ingrained in people that they'll spout LLMisms without even realizing. It's not organic linguistic drift; it's a subtle memetic virus that anyone could be exposing themselves to at any time.
>>109062991
Because that's a pajeet you're replying to and they're not sapient.

Anonymous
06/15/26(Mon)13:12:31 No.109063047

Anonymous 06/15/26(Mon)13:12:31 No.109063047

File: lawdhethic.png (22 KB, 159x159)

22 KB PNG

>>109063030
https://chat.mistral.ai/lawdhethic.png
There's more

Anonymous
06/15/26(Mon)13:13:27 No.109063053

Anonymous 06/15/26(Mon)13:13:27 No.109063053

File: 1752319366740764.png (40 KB, 912x270)

40 KB PNG

>>109063040
https://www.bbc.co.uk/news/articles/c8r2l352z2do

Anonymous
06/15/26(Mon)13:14:38 No.109063059

Anonymous 06/15/26(Mon)13:14:38 No.109063059

>>109062980
No, was a while ago. Guess I could check if MTP does anything, but I'd probably need to quant the model a bit more as I didn't get a lot of context either.
>>109062977
>multi-trillion
A shame if that's the case.

Anonymous
06/15/26(Mon)13:15:26 No.109063067

Anonymous 06/15/26(Mon)13:15:26 No.109063067

>>109063047
A naked cat, they truly are French.

Anonymous
06/15/26(Mon)13:15:41 No.109063070

Anonymous 06/15/26(Mon)13:15:41 No.109063070

>>109063025
So let me make this clear, there's severe diminishing returns in the 200+ vram range and these models are that heavy while being a fraction better than 27B models. Deepseek did not deliver enough of a incentive for it to get the support it craves.

Anonymous
06/15/26(Mon)13:16:51 No.109063083

Anonymous 06/15/26(Mon)13:16:51 No.109063083

>>109063047
A cat is fine too.

Anonymous
06/15/26(Mon)13:17:05 No.109063085

Anonymous 06/15/26(Mon)13:17:05 No.109063085

>>109063059
According to the ongoing meme it's supposedly 24~30T parameters.

Anonymous
06/15/26(Mon)13:17:30 No.109063088

Anonymous 06/15/26(Mon)13:17:30 No.109063088

The French are terrible shitposters btw so it's probably a legit good model they've made and they know it.

Anonymous
06/15/26(Mon)13:17:51 No.109063090

Anonymous 06/15/26(Mon)13:17:51 No.109063090

>>109063070
>there's severe diminishing returns in the 200+ vram range
It's a good thing mixed offloading exists then isn't it.

Anonymous
06/15/26(Mon)13:18:56 No.109063096

Anonymous 06/15/26(Mon)13:18:56 No.109063096

>>109063090
that doesn't change my point on top of the poor speed gains. You're arguing for something most people even with the hardware won't bother with.

Anonymous
06/15/26(Mon)13:19:02 No.109063098

Anonymous 06/15/26(Mon)13:19:02 No.109063098

Is Nemo really that good? I remember trying it back in the day and being disappointed how sloppy and repetitive it was, then went back to miqu.

Anonymous
06/15/26(Mon)13:19:39 No.109063100

Anonymous 06/15/26(Mon)13:19:39 No.109063100

>>109063085
So its gonna be 24-30b. Neat i guess. Hoping for some nice MoE model because Im a VRAMlet.

Anonymous
06/15/26(Mon)13:20:25 No.109063106

Anonymous 06/15/26(Mon)13:20:25 No.109063106

Would https://huggingface.co/antirez/deepseek-v4-gguf work for me if I built https://github.com/ggml-org/llama.cpp/pull/24162 on windows?

Anonymous
06/15/26(Mon)13:21:32 No.109063109

Anonymous 06/15/26(Mon)13:21:32 No.109063109

>>109063085
... tensor parallelism over a rack of B300s? How do you even serve this?

Anonymous
06/15/26(Mon)13:22:10 No.109063111

Anonymous 06/15/26(Mon)13:22:10 No.109063111

>>109063085
ssdmaxxers will finally have their day

Anonymous
06/15/26(Mon)13:22:26 No.109063112

Anonymous 06/15/26(Mon)13:22:26 No.109063112

>>109063098
You just blow in on a time machine buddy? To answer thine question, twas the best for most, that could fit within consumer hardware.

Anonymous
06/15/26(Mon)13:25:24 No.109063128

Anonymous 06/15/26(Mon)13:25:24 No.109063128

>>109063085
Anon did get his rack of B200s, right?
>>109063096
>People use Kimi locally
>People use GLM locally
>People still use R1 locally
But nobody would ever ever use V4, right? You lose izzat with every post that you expose how envious and vindictive you are of anyone running something you can't.

Anonymous
06/15/26(Mon)13:26:24 No.109063132

Anonymous 06/15/26(Mon)13:26:24 No.109063132

File: z-ai-poll-on-x-mit-licens(...).png (5 KB, 589x159)

5 KB PNG

We saved local bros...

Anonymous
06/15/26(Mon)13:27:46 No.109063138

Anonymous 06/15/26(Mon)13:27:46 No.109063138

>>109062996
>Give me a few prompts
I can't into writing but maybe these?
Ellie and Ema trudged through thick snow. A the storm was beginning to pick up and the sun was no longer visible. Both girls were covered from head to toe in thick furs and had bows slung over their shoulders.

A sudden force rams into you from behind.
"Onii-san! Did you miss me?"
It's her; the brat. You turn around. She's got that smug grin on her face and a finger hooked on her shirt, pulling it down just enough for you to see a hint of her budding chest.

Thick fog, not enough sleep, and no coffee. "Fuck this job. I wasn't even supposed to work tonight..."
Greg was a security guard at a run-down apartment complex.

Anonymous
06/15/26(Mon)13:29:44 No.109063151

Anonymous 06/15/26(Mon)13:29:44 No.109063151

>>109063132
Where is the "good model" option?

Anonymous
06/15/26(Mon)13:31:25 No.109063162

Anonymous 06/15/26(Mon)13:31:25 No.109063162

>>109063132
Here's hoping they also deslop the writing style and move away from safety and codemaxxing. GLM was at its sweet spot with 4.5 and 4.6.

Anonymous
06/15/26(Mon)13:32:17 No.109063166

Anonymous 06/15/26(Mon)13:32:17 No.109063166

>>109063162
anthropic didn't deslop fable
the slop is permanent

Anonymous
06/15/26(Mon)13:34:16 No.109063178

Anonymous 06/15/26(Mon)13:34:16 No.109063178

>>109063151
A 100B MoE would be nice.

Anonymous
06/15/26(Mon)13:35:17 No.109063179

Anonymous 06/15/26(Mon)13:35:17 No.109063179

>>109063166
I'm holding out hope this is because every lab is too lazy to do so in favor of chasing memebenches as opposed to it being technically impossible at this point.

Anonymous
06/15/26(Mon)13:37:20 No.109063191

Anonymous 06/15/26(Mon)13:37:20 No.109063191

>>109063179
every lab has to have a pre-2023 dataset checkpoint of online web scrapes that they could use as a new base if they wanted to

Anonymous
06/15/26(Mon)13:39:27 No.109063208

Anonymous 06/15/26(Mon)13:39:27 No.109063208

>>109063196
>>109063196
>>109063196

Anonymous
06/15/26(Mon)13:40:20 No.109063217

Anonymous 06/15/26(Mon)13:40:20 No.109063217

File: 1776836427827221.png (210 KB, 1359x1338)

210 KB PNG

>>109063138
Gemma 31B. No system prompt on 1 and 3 but I used the jailbreak for the mesugaki one.

Anonymous
06/15/26(Mon)13:43:22 No.109063242

Anonymous 06/15/26(Mon)13:43:22 No.109063242

>>109063217
Posted the other 2 in the new thread

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.