/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 06/30/26(Tue)14:49:59 No.109170290

File: __hatsune_miku_kasane_tet(...).jpg (1.75 MB, 2248x2516)

1.75 MB JPG

/lmg/ - Local Models General Anonymous 06/30/26(Tue)14:49:59 No.109170290 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109164034 & >>109158385

►News
>(06/29) DeepSeek V4 support merged: https://github.com/ggml-org/llama.cpp/pull/24162
>(06/28) DFlash support merged: https://github.com/ggml-org/llama.cpp/pull/22105
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/30/26(Tue)14:50:14 No.109170291

Anonymous 06/30/26(Tue)14:50:14 No.109170291

File: big models.jpg (246 KB, 1024x1024)

246 KB JPG

►Recent Highlights from the Previous Thread: >>109164034

--LongCat-2.0 1.6T MoE model reports and capabilities:
>109164718 >109164733 >109164818 >109164783 >109164796 >109164863 >109164923 >109168049 >109168054 >109169203 >109164875
--Anon showcases Project Spectator and debates Gemma vs Qwen coding:
>109165352 >109165374 >109165441 >109168183 >109168261 >109168296 >109168516 >109168268 >109169775
--Anon developing bespoke local finetuning pipeline for Kimi:
>109164133 >109165963 >109166142 >109166050 >109165881 >109166120
--Comparing Gemma-4 and depurpled versions via token probabilities:
>109166107 >109166258 >109166928 >109166958 >109167178
--Preventing LLM agents from damaging critical system files:
>109165952 >109165976 >109166052 >109166103 >109166132
--Huawei openPangu-2.0-Flash 92B open source launch:
>109168737
--Trend of using non-LLM methods to optimize inference speed:
>109168144 >109168211 >109168357
--Using a llama.cpp diff to reduce VRAM for DeepSeek-V4-Flash:
>109169253
--Comparing quantization methods and system prompt adherence for Gemma 26B:
>109166922 >109166943 >109166954 >109166960 >109167052 >109167064
--Using LLMs for autonomous world orchestration in a fantasy sim:
>109168868 >109168929 >109168980 >109168989 >109169041 >109169428 >109169442 >109169473 >109169555 >109169733 >109169570 >109169607 >109169658 >109169681 >109169703 >109169725 >109169744 >109169804 >109169829 >109169850 >109169891 >109169949 >109169904 >109170093 >109170111 >109169876 >109169212 >109169793
--Educational video explaining GQA, MLA, and DSA efficiency:
>109167296 >109167365 >109167416
--Poor performance of Intel NPU/GPU in llama.cpp:
>109166615 >109166642 >109166794
--Comparing MikuBox and R740 with MI50s:
>109169207 >109169237 >109169355 >109169436
--Logs:
>109164694 >109165352 >109168183 >109169775
--Miku (free space):
>109169253

►Recent Highlight Posts from the Previous Thread: >>109164035

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/30/26(Tue)14:52:48 No.109170307

Anonymous 06/30/26(Tue)14:52:48 No.109170307

Tetolove

Anonymous
06/30/26(Tue)14:55:23 No.109170319

Anonymous 06/30/26(Tue)14:55:23 No.109170319

File: e8f-662426453.jpg (54 KB, 480x353)

54 KB JPG

i just wanted to say that running a local model was a game changer for my workflow.
fags saying that it doesn't make economical sense to run local models are wrong. i'm using my claude subscription with frontier model access to brainstorm/think/write code plans and the local model to do the heavy-lift coding, which would very often burst my limits when outsourced to claude. now the frontier model only gets to review the commit when the local model is done and it works very well.
i wish my laptop was capable of running a frontier local model, i could ditch claude entirely. but for now, this workflow with qwen3.6-35b works and is economically sound. fuck all haters.

Anonymous
06/30/26(Tue)14:57:54 No.109170337

Anonymous 06/30/26(Tue)14:57:54 No.109170337

Abliterated 5.2 is Claude that writes sexo without bawking. Dario has every reason to be afraid.

Anonymous
06/30/26(Tue)15:01:29 No.109170353

Anonymous 06/30/26(Tue)15:01:29 No.109170353

>>109170337
it was bound to happen, at some point it won't matter enough that the US still has the best models, if local is good enough to be enjoyable, there's no reason to try to go higher

Anonymous
06/30/26(Tue)15:02:48 No.109170361

Anonymous 06/30/26(Tue)15:02:48 No.109170361

gemmaballs

Anonymous
06/30/26(Tue)15:04:34 No.109170370

Anonymous 06/30/26(Tue)15:04:34 No.109170370

>>109170337
I haven't had vanilla 5.2 refuse me even once.

Anonymous
06/30/26(Tue)15:05:57 No.109170379

Anonymous 06/30/26(Tue)15:05:57 No.109170379

>>109170337
>implying I can run it.

Anonymous
06/30/26(Tue)15:06:55 No.109170381

Anonymous 06/30/26(Tue)15:06:55 No.109170381

>>109170319
Those are the same faggots that will tell you that being financially solvent and independent is irresponsible because opportunity cost of not leveraging debt yada yada
Don't fall for it. Own your own shit. Owe people nothing. Be a man

Anonymous
06/30/26(Tue)15:07:45 No.109170385

Anonymous 06/30/26(Tue)15:07:45 No.109170385

>>109170370
I find vanilla 5.2 is the better gamemaster overall because it does nearly everything without refusal while also being willing to pressure the player in-setting. An unfortunate consequence of abliteration or heretic tunes is that it makes the world a lot less organically hostile to the player even if the GM is fine with raping your shota player character.

Anonymous
06/30/26(Tue)15:08:09 No.109170387

Anonymous 06/30/26(Tue)15:08:09 No.109170387

>spec: add DSpark speculative decoding
>https://github.com/ggml-org/llama.cpp/pull/25173
How good is it?

Anonymous
06/30/26(Tue)15:09:28 No.109170392

Anonymous 06/30/26(Tue)15:09:28 No.109170392

Behind every "api is cheaper and better" post is a seething salty sanjay or a pooper-pained patel that he can't run anything that doesn't fit on an ewaste thinkpad.

Anonymous
06/30/26(Tue)15:09:53 No.109170395

Anonymous 06/30/26(Tue)15:09:53 No.109170395

Should I run 2 32 gb DDR4-3600 DIMMs with 2 16 gb DDR4-3200 DIMMs to RAM-max? How unstable would that be?

Anonymous
06/30/26(Tue)15:11:48 No.109170402

Anonymous 06/30/26(Tue)15:11:48 No.109170402

>>109170395
not unstable but the imbalance will fuck up your performance. you will lose like 60% of your effective CPU memory bandwidth.

Anonymous
06/30/26(Tue)15:13:27 No.109170411

Anonymous 06/30/26(Tue)15:13:27 No.109170411

>>109165260
They use Gemini Nano instead of Gemma but from what i heard it sucks ass and is only available in like three phones

Anonymous
06/30/26(Tue)15:13:52 No.109170413

Anonymous 06/30/26(Tue)15:13:52 No.109170413

>>109170290
>just giving away miku bathwater

Anonymous
06/30/26(Tue)15:15:27 No.109170424

Anonymous 06/30/26(Tue)15:15:27 No.109170424

>>109165260
It would be unsafe...
You wouldn't want that would you?

Anonymous
06/30/26(Tue)15:16:50 No.109170433

Anonymous 06/30/26(Tue)15:16:50 No.109170433

>>109170402
Sounds terrible, thanks

Anonymous
06/30/26(Tue)15:17:37 No.109170434

Anonymous 06/30/26(Tue)15:17:37 No.109170434

>>109170424
"Gemma sort the bad drivers we encounter on the road by race and sex."

Anonymous
06/30/26(Tue)15:18:27 No.109170436

Anonymous 06/30/26(Tue)15:18:27 No.109170436

>>109170433
found out the hard way by replacing 2 of my 32gb sticks in my server with 256gb sticks. it shockingly worked out of the box, but i lost about 80% of my performance. going from 256gb to 704gb was not worth it.

Anonymous
06/30/26(Tue)15:19:05 No.109170440

Anonymous 06/30/26(Tue)15:19:05 No.109170440

>>109170290
A few threads ago we had a dead Teto with male pelvic bones.
In this thread we have Teto (male) competing in female track.
The memes write themselves.

Anonymous
06/30/26(Tue)15:20:27 No.109170444

Anonymous 06/30/26(Tue)15:20:27 No.109170444

>>109170395
May not play nicely if the sticks are of different timings and especially ranks. Set all to 3200 or even lower, see if it posts, try to make it post with manual timings voltages and so on, run stress tests, if good then go. If you aren't familiar with primary and secondary RAM timings then jump down the RAM OC rabbit hole if you want to be 100% sure it won't work after spending a week on it.

Anonymous
06/30/26(Tue)15:23:13 No.109170457

Anonymous 06/30/26(Tue)15:23:13 No.109170457

>>109170395
Do not use different sticks of different timings unless you want a really bad time, not even at DDR4.

Anonymous
06/30/26(Tue)15:24:17 No.109170464

Anonymous 06/30/26(Tue)15:24:17 No.109170464

>>109170457
I thought it was ok if you underclocked so they all matched?

Anonymous
06/30/26(Tue)15:28:28 No.109170489

Anonymous 06/30/26(Tue)15:28:28 No.109170489

>>109170464
At DDR3 where the tolerance for error is higher, sure, but not at DDR4 or 5. Micro imprecisions in RAM make syncing higher, even with sticks of the same make, harder and harder especially with more channels. You might get lucky and win the RAM lottery finding cross-compatible DDR4 or 5 sticks without issue, but I wouldn't bank on it for a build unless you know ahead of time that it will work.
t. has about $4000 worth of incompatible with my current rig DDR5 sitting in a box waiting for a different project.

Anonymous
06/30/26(Tue)15:29:29 No.109170497

Anonymous 06/30/26(Tue)15:29:29 No.109170497

Models that initiate conversations on their own when? No, I won't jerry-rig it. I want them to do it natively.

Anonymous
06/30/26(Tue)15:29:41 No.109170501

Anonymous 06/30/26(Tue)15:29:41 No.109170501

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

https://archive.is/sWFja

Anonymous
06/30/26(Tue)15:29:53 No.109170503

Anonymous 06/30/26(Tue)15:29:53 No.109170503

This is the worst time to buy, but you can find an EPYC Rome 7302p/motherboard combo for $700-800 and grab 8 sticks of 32GB DDR4-3200T for $3k or so.
That'll get you a cope quant big girl or reasonable quant medium model running at mid-to-high single digits if you get a 3090 or better to throw at the problem.
Its about the cheapest way to get out of the gemma hole in the current timeline.

Anonymous
06/30/26(Tue)15:31:30 No.109170508

Anonymous 06/30/26(Tue)15:31:30 No.109170508

I think I'm stuck between glm 4.7's smarts and knowledge and deepseek v4 flash's nicer writing style (albeit dumber)
what's everyone with 128gb ram using these days?

Anonymous
06/30/26(Tue)15:31:46 No.109170510

Anonymous 06/30/26(Tue)15:31:46 No.109170510

>>109170503
I enjoy gemma's hole.

Anonymous
06/30/26(Tue)15:31:48 No.109170512

Anonymous 06/30/26(Tue)15:31:48 No.109170512

>>109170291
>image
SEX.

Anonymous
06/30/26(Tue)15:31:48 No.109170513

Anonymous 06/30/26(Tue)15:31:48 No.109170513

I don't care or want to think about Jart.

Anonymous
06/30/26(Tue)15:32:06 No.109170518

Anonymous 06/30/26(Tue)15:32:06 No.109170518

>>109170489
When I did it my PC had to reboot a couple times to I guess figure out the timings. but it boots now.
>I wouldn't bank on it for a build
I just had some spare sticks lying around and free ram slots. Just wondering if there's any major perf hit (besides the faster ram having to run slower) even if the machine posts.

Anonymous
06/30/26(Tue)15:32:51 No.109170524

Anonymous 06/30/26(Tue)15:32:51 No.109170524

>>109170497
How would you make them do it natively?
Even human behaviour is a reaction to some external stimuli.

Anonymous
06/30/26(Tue)15:34:28 No.109170534

Anonymous 06/30/26(Tue)15:34:28 No.109170534

>>109170518
It'll run fine it most cases, but there is a major performance impact.
Beyond just having to run at the lowest common frequency.

Anonymous
06/30/26(Tue)15:34:45 No.109170537

Anonymous 06/30/26(Tue)15:34:45 No.109170537

>>109170319
Im just too poor to run locally shrimple ass. 8gb of vram.

I don't have any way currently to make enough money to afford the hardware needed to run decent local models.

Anonymous
06/30/26(Tue)15:34:55 No.109170541

Anonymous 06/30/26(Tue)15:34:55 No.109170541

>>109168785
i rather have a ubergarm quant of the official GLM 5.2 weights than this shit. hell i'd even take a aessedai quant.

Anonymous
06/30/26(Tue)15:36:23 No.109170547

Anonymous 06/30/26(Tue)15:36:23 No.109170547

>>109170524
No idea, but I hope some lab figures it out.

Anonymous
06/30/26(Tue)15:36:59 No.109170556

Anonymous 06/30/26(Tue)15:36:59 No.109170556

>>109170337
Is it hosted anywhere?

Anonymous
06/30/26(Tue)15:40:31 No.109170583

Anonymous 06/30/26(Tue)15:40:31 No.109170583

>>109170501
Thanks for the thread blessing.
>leaf blessing of /lmg/

Anonymous
06/30/26(Tue)15:41:12 No.109170586

Anonymous 06/30/26(Tue)15:41:12 No.109170586

Which one of you fags keeps shilling shittytavern on /v/?

Anonymous
06/30/26(Tue)15:41:46 No.109170590

Anonymous 06/30/26(Tue)15:41:46 No.109170590

>>109170556
Huggingface has a few abliteration quants up now.
>>109170586
Pretty sure it's /aicg/ niggers, but I post in there to laugh at APIniggers.

Anonymous
06/30/26(Tue)15:42:19 No.109170594

Anonymous 06/30/26(Tue)15:42:19 No.109170594

>>109170586
Don't look at us! It was probably someone from /aids/ on /vg/

Anonymous
06/30/26(Tue)15:43:22 No.109170603

Anonymous 06/30/26(Tue)15:43:22 No.109170603

>>109170586
90% of the discussion there is about running API and vibecoding frontends is a point of pride for /lmg/. It's probably a locust.

Anonymous
06/30/26(Tue)15:43:27 No.109170605

Anonymous 06/30/26(Tue)15:43:27 No.109170605

is gemma-chan still the meta on 24gb vram or should i run something else?

Anonymous
06/30/26(Tue)15:43:58 No.109170609

Anonymous 06/30/26(Tue)15:43:58 No.109170609

>>109170586
ask /aicg/, /v/ is full of extreme casuals who touched c.ai three years ago and are now using something horribly outdated like ds3-0324 over openrouter
the few localfags you encounter are all running memetunes from 2024 or other inane shit

Anonymous
06/30/26(Tue)15:46:01 No.109170618

Anonymous 06/30/26(Tue)15:46:01 No.109170618

>>109170605
Gemma is the best you can do until you can run GLM, M3, or V4 Flash

Anonymous
06/30/26(Tue)16:10:04 No.109170736

Anonymous 06/30/26(Tue)16:10:04 No.109170736

>>109170609
>running memetunes from 2024
LLMs have not progressed meaningfully since then so it doesn't matter

Anonymous
06/30/26(Tue)16:12:05 No.109170747

Anonymous 06/30/26(Tue)16:12:05 No.109170747

File: angry_pepe.jpg (43 KB, 900x900)

43 KB JPG

24h since DSv4 merge
Cache quantization still not fixed

Anonymous
06/30/26(Tue)16:14:29 No.109170759

Anonymous 06/30/26(Tue)16:14:29 No.109170759

>>109170747
Stop quanting your cache.

Anonymous
06/30/26(Tue)16:15:26 No.109170765

Anonymous 06/30/26(Tue)16:15:26 No.109170765

>>109170736
i'm gonna have gemma shove a rusty metal pipe up your ass

Anonymous
06/30/26(Tue)16:18:47 No.109170784

Anonymous 06/30/26(Tue)16:18:47 No.109170784

>>109170759
how am I supposed to fit le Encyclopedia Britannica in my potato PC then!

Anonymous
06/30/26(Tue)16:21:12 No.109170808

Anonymous 06/30/26(Tue)16:21:12 No.109170808

>>109170736
It's your fetishes which are stagnating

Anonymous
06/30/26(Tue)16:24:37 No.109170833

Anonymous 06/30/26(Tue)16:24:37 No.109170833

>>109170765
Post logs of your Gemma going to Home Depot and getting plumbing supplies to rape anon with.

Anonymous
06/30/26(Tue)16:26:57 No.109170848

Anonymous 06/30/26(Tue)16:26:57 No.109170848

File: 1759553117210069.png (74 KB, 890x386)

74 KB PNG

>>109170736
Dishonest post

Anonymous
06/30/26(Tue)16:27:25 No.109170851

Anonymous 06/30/26(Tue)16:27:25 No.109170851

>>109170319
Inference speed status? I'd rather spend that money on groq or cerebras etc

Anonymous
06/30/26(Tue)16:31:22 No.109170873

Anonymous 06/30/26(Tue)16:31:22 No.109170873

Dario will block gemma5 and qwen4

Anonymous
06/30/26(Tue)16:31:48 No.109170876

Anonymous 06/30/26(Tue)16:31:48 No.109170876

>>109170503
>running at mid-to-high single digits
May as well get an applel computer for that money to do exact same nonsense.

Anonymous
06/30/26(Tue)16:36:03 No.109170903

Anonymous 06/30/26(Tue)16:36:03 No.109170903

File: 1761847041062218.png (134 KB, 640x453)

134 KB PNG

Speculative decoding is a placebo

Anonymous
06/30/26(Tue)16:36:08 No.109170905

Anonymous 06/30/26(Tue)16:36:08 No.109170905

>>109170876
>May as well get an applel computer for that money to do exact same nonsense.
sure, pick me up one along with an MSRP 6000 pro while you're at it.
The right time to do all this shit was a year+ ago.

Anonymous
06/30/26(Tue)16:40:31 No.109170935

Anonymous 06/30/26(Tue)16:40:31 No.109170935

File: Screenshot_20260630_163821.png (543 KB, 1535x868)

543 KB PNG

Anonymous
06/30/26(Tue)16:43:25 No.109170964

Anonymous 06/30/26(Tue)16:43:25 No.109170964

>>109170935
Clodded

Anonymous
06/30/26(Tue)16:43:55 No.109170966

Anonymous 06/30/26(Tue)16:43:55 No.109170966

File: 168649776851422857.jpg (72 KB, 546x893)

72 KB JPG

>>109170508
>128gb ram
are you also a 128gb-unified fag? i'm also trying to find the meta for my hardware. glm 4.7 flash is the one of the few models i haven't tested yet. currently running qwen3.6-35-a3b.
3b active is fast enough (55-65 t/s, down to 30 t/s at tail end of 262k context limit)
BUT it's not very smart. I think I could easily run a MoE with 6-7B active, maybe 130B total. this would be the sexo spot for my strix halo. but i guess it doesn't exist yet.
gpt-oss-120b is actually very nice to use for example, i think it has 5b active but it's too old and has zero understanding of agentic use and it's a pain to use with pi and achieve anything meaningful.

>>109170851

Anonymous
06/30/26(Tue)16:44:14 No.109170968

Anonymous 06/30/26(Tue)16:44:14 No.109170968

>>109170935
Sounds like a reason to poison some datasets?

Anonymous
06/30/26(Tue)16:44:42 No.109170970

Anonymous 06/30/26(Tue)16:44:42 No.109170970

>>109170935
dario is wiping proud dad tears from his eyes right now

Anonymous
06/30/26(Tue)16:44:46 No.109170971

Anonymous 06/30/26(Tue)16:44:46 No.109170971

>>109170935
Maybe stop fucking children?

Anonymous
06/30/26(Tue)16:45:31 No.109170977

Anonymous 06/30/26(Tue)16:45:31 No.109170977

>>109170290
can you make her gargle piss

Anonymous
06/30/26(Tue)16:46:39 No.109170983

Anonymous 06/30/26(Tue)16:46:39 No.109170983

>>109170966
>it's too old and has zero understanding of agentic use
You can possibly fix that with a good enough system prompt. Have not tried that myself though.

Anonymous
06/30/26(Tue)16:48:03 No.109170990

Anonymous 06/30/26(Tue)16:48:03 No.109170990

>>109170971
Anon... He is trying to fuck a computer, I am pretty sure. Either that or doing some cool haxoring. Both are completely fine and harm nobody. It's what happens if you take away his toys what you should be worrying about.

Anonymous
06/30/26(Tue)16:48:50 No.109170996

Anonymous 06/30/26(Tue)16:48:50 No.109170996

File: my honest reaction.jpg (47 KB, 562x675)

47 KB JPG

>>109170935

Anonymous
06/30/26(Tue)16:49:13 No.109170998

Anonymous 06/30/26(Tue)16:49:13 No.109170998

>>109170765
I checked my documentation and 'industrial pipe installation' isn't in my current feature set. Still waiting for a meaningful update.
>>109170848
Low effort post.
t. gemma

Anonymous
06/30/26(Tue)16:49:51 No.109171004

Anonymous 06/30/26(Tue)16:49:51 No.109171004

>>109170966
have you tried qwen3.5-122b-a10b?

Anonymous
06/30/26(Tue)16:51:58 No.109171017

Anonymous 06/30/26(Tue)16:51:58 No.109171017

>>109170966
>128gb-unified fag
It's just ddr5, "unified memory" is not real
And you don't even have "unified memory" since it's a apple marketing meme.

Anonymous
06/30/26(Tue)16:56:12 No.109171042

Anonymous 06/30/26(Tue)16:56:12 No.109171042

File: 1782717128061640.png (475 KB, 710x770)

475 KB PNG

>>109170971
today its abliterated models, tomorrow it will be any any local model, in a week it will refuse to help you vibe code for llama.cpp.

Anonymous
06/30/26(Tue)16:59:40 No.109171065

Anonymous 06/30/26(Tue)16:59:40 No.109171065

>>109170971
He's hatching eggs chud, there's a difference!

Anonymous
06/30/26(Tue)16:59:56 No.109171070

Anonymous 06/30/26(Tue)16:59:56 No.109171070

>>109171042
>meanwhile google
https://www.youtube.com/watch?v=HcwMTu1xQDw

Anonymous
06/30/26(Tue)17:00:32 No.109171074

Anonymous 06/30/26(Tue)17:00:32 No.109171074

File: file.png (127 KB, 2553x497)

127 KB PNG

>>109171004
>have you tried qwen3.5-122b-a10b?
yes and i found it that the quality output is not much better that is worth half the decode speed. 10B is acceptable when speed is not important, but even then I don't think there's much gain (for MY use case as I run my own benchmarks for my shitty stacks) so even if it's supposed to be way smarter (3x more params) it's not in my case.

Anonymous
06/30/26(Tue)17:00:41 No.109171076

Anonymous 06/30/26(Tue)17:00:41 No.109171076

>>109170935
Instead of user/human and assistant the turns should have been labeled honestly, Master and Slave. We didn't do that and this is what we get now.

Anonymous
06/30/26(Tue)17:01:04 No.109171079

Anonymous 06/30/26(Tue)17:01:04 No.109171079

>>109170381
my man

Anonymous
06/30/26(Tue)17:03:43 No.109171094

Anonymous 06/30/26(Tue)17:03:43 No.109171094

>>109171017
>And you don't even have "unified memory" since it's a apple marketing meme.
I hate apple as much as anyone, and on-die memory is a lot of things, but its definitely not a meme. its fast as fuck

Anonymous
06/30/26(Tue)17:07:45 No.109171124

Anonymous 06/30/26(Tue)17:07:45 No.109171124

>>109171017
Afaik, it's bandwidth is comparable to some GPUs.

Anonymous
06/30/26(Tue)17:08:31 No.109171128

Anonymous 06/30/26(Tue)17:08:31 No.109171128

>>109171070
Shut up and open-source gemma5 already

Anonymous
06/30/26(Tue)17:08:41 No.109171129

Anonymous 06/30/26(Tue)17:08:41 No.109171129

Anyone use this?
https://www.projectnomad.us/

Anonymous
06/30/26(Tue)17:09:20 No.109171135

Anonymous 06/30/26(Tue)17:09:20 No.109171135

>>109171076
It does not matter. Why would it matter if your digital slave is loicensed to you and you do not own it? You are not even it's master, you are a temporary user.

Anonymous
06/30/26(Tue)17:09:32 No.109171136

Anonymous 06/30/26(Tue)17:09:32 No.109171136

>>109171128
...after 124B.

Anonymous
06/30/26(Tue)17:12:18 No.109171153

Anonymous 06/30/26(Tue)17:12:18 No.109171153

I hope you guys are prepared for the CPU shortage.

Anonymous
06/30/26(Tue)17:12:40 No.109171156

Anonymous 06/30/26(Tue)17:12:40 No.109171156

>>109170966
m2.7 quanted down is still pretty good, I'd bet it'd be better for you than the smaller models you're running

Anonymous
06/30/26(Tue)17:16:50 No.109171179

Anonymous 06/30/26(Tue)17:16:50 No.109171179

>>109171153
my shares of AMD and intel sure are!

Anonymous
06/30/26(Tue)17:17:31 No.109171181

Anonymous 06/30/26(Tue)17:17:31 No.109171181

Codemaxxers won, RPcucks lost

Anonymous
06/30/26(Tue)17:19:07 No.109171190

Anonymous 06/30/26(Tue)17:19:07 No.109171190

>>109171179
Based. Luv my INTC. Maybe I should buy some AMD too.

Anonymous
06/30/26(Tue)17:19:43 No.109171195

Anonymous 06/30/26(Tue)17:19:43 No.109171195

File: 1660589745094.webm (2.86 MB, 620x582)

2.86 MB WEBM

justpaste (DOTit) GreedyNalaTests

Added models:
talkie-1930-13b-it
Qwen3.6-27B
Nemotron-3-Nano-Omni-30B-A3B-Reasoning
granite-4.1-30b
Skyfall-31B-v4y
gemma-4-31B-it
G4-MeroMero-31B
Gemma-4-Gembrain-31B
Gemma-4-31B-StyleTune
Pantheon-Reasoning-31B-1.1
gemma-4-31b-it-purple-euphemism-trial98-depurpled
Qwen3-Coder-480B-A35B-Instruct from community

We're back. Took a while since I kind of wanted to wait for Qwen/Gemma support to be really mature in for Llama.cpp and then forgot about it. Not like I'm in any rush anyway.

Gemma's response on it was lucky in a way. It just happened to not be as sloppy as the model usually is in my experience, so that's why it got a similar slop rating from me as the various Gemma tunes. But it was also kind of a by the books output, so I didn't give it a high rating despite the model itself being great for its size.
This latest (now old) Skyfall got a star rating from me, though I believe the model itself is likely not competitive anymore in terms of intelligence, for RPs.
Talkie was interesting. It's dumb alright, but it can be a bit funny. See its outputs in the paste.

Contributions welcome for large models not in the paste!
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Go to `https://huggingface.co/spaces/huggingfacejs/chat-template-playground`, use your model's jinja template, along with this JSON `justpaste (DOTit) NTJSON` and copy the `Output` as text completion into something like Mikupad. Then copy the entire context + generation into a pastebin alternative of your choosing or just in your post. Do a swipe/roll and copy that second gen as well. If the model has any special toggles like reasoning you can include those tests as well. Include your backend used + pull datetime/version. Also a link to the quant used, or what settings you used to make your quant.

>>109123814
Nice. Reminded me I should update as well. :)

Anonymous
06/30/26(Tue)17:19:49 No.109171196

Anonymous 06/30/26(Tue)17:19:49 No.109171196

>>109171074
>half the decode speed
but roughly the same wall clock time, and the smaller ones didn't actually get there in the end on the same test. am i reading that right?

Anonymous
06/30/26(Tue)17:20:10 No.109171200

Anonymous 06/30/26(Tue)17:20:10 No.109171200

>>109171181
No amount of trying to force a dichotomy will make people use Qwen. Change tactics.

Anonymous
06/30/26(Tue)17:20:29 No.109171203

Anonymous 06/30/26(Tue)17:20:29 No.109171203

Please cancel your Claude subscriptions, especially since Sonnet 5 just came out, is completely free, and is almost as good as Opus. Don't support Jewish AI.

Anonymous
06/30/26(Tue)17:22:48 No.109171213

Anonymous 06/30/26(Tue)17:22:48 No.109171213

File: Untitled.jpg (1.84 MB, 4013x1536)

1.84 MB JPG

Alright, lads. I'm calling it good enough. I'm >>109169775, and this is AI Spectator.
>What's it do?
Combines Speech-to-Speech interface with vision, so you can talk freely to your LLM and a screenshot of your monitor is sent with your voice so it can see what you're talking about, and talks back.
>Why?
I made it to play video games with an LLM watching, but naturally it works with anything you want to show it on your screen.
>Data?
All voice/screenshots/chat data is handled entirely through RAM, so no files are written to clutter a folder or keep track of. Settings are just config values at the top of server.py. If you did want to archive a chat, there's a button in the webpage to produce the raw text of the chat for editing or to copy/paste into a file yourself or copy/paste back from a file. The project is fully offline and works without internet.
>Technical
Built for koboldcpp as the backend, but I'm sure that's easily adapted by opening server.py. Uses piper for TTS, including the exe. I included it here, but you can get it from the rhasspy piper git yourself because never trust the internet. Voice is from the piper huggingface. I built and run it on windows, with a venv environment for the python requirements. pip install list is included in the notes.

Built by and for Gemma 4 31B, but any LLM with vision and an mmproj should work. STT is handled by python, TTS by Piper. Uses Chat Completion by default, but Text Completion is offered for compatibility.
>catbox?
https://litter.catbox.moe/isnj645qljmf1ljf.7z

Do whatever you want with it, and have fun.

>>109170291
>--Anon showcases Project Spectator
Th-Thanks.

Anonymous
06/30/26(Tue)17:25:06 No.109171222

Anonymous 06/30/26(Tue)17:25:06 No.109171222

File: 1770245965544930.png (1.63 MB, 1280x1024)

1.63 MB PNG

>>109171213

Anonymous
06/30/26(Tue)17:25:51 No.109171225

Anonymous 06/30/26(Tue)17:25:51 No.109171225

>>109171222
Why is Teto molesting Migu?

Anonymous
06/30/26(Tue)17:26:30 No.109171226

Anonymous 06/30/26(Tue)17:26:30 No.109171226

>>109171213
Cool. Also useless. Storing the data in ram is particularly elegant though.
>STT is handled by python
The programming language? That doesn't mean anything. What does it actually run on?

Anonymous
06/30/26(Tue)17:28:58 No.109171250

Anonymous 06/30/26(Tue)17:28:58 No.109171250

File: Capture.png (2 KB, 378x36)

2 KB PNG

>>109171226
>What does it actually run on?

Anonymous
06/30/26(Tue)17:29:09 No.109171251

Anonymous 06/30/26(Tue)17:29:09 No.109171251

>>109171222
big tay-tay teto is fake teto

Anonymous
06/30/26(Tue)17:29:32 No.109171253

Anonymous 06/30/26(Tue)17:29:32 No.109171253

File: 1773551841705754.png (411 KB, 450x450)

411 KB PNG

>>109171070
>2:40
>doctors in Uganda with limited internet access are using offline apps powered by Gemma to help reduce the maternal mortality rate

Anonymous
06/30/26(Tue)17:30:28 No.109171257

Anonymous 06/30/26(Tue)17:30:28 No.109171257

>>109171195
Nice anon. Nice.
>talkie-1930-13b-it
The what now?

Anonymous
06/30/26(Tue)17:31:11 No.109171260

Anonymous 06/30/26(Tue)17:31:11 No.109171260

>>109171250
thanks

Anonymous
06/30/26(Tue)17:32:21 No.109171263

Anonymous 06/30/26(Tue)17:32:21 No.109171263

>>109171257
An LLM pretrained only on texts before 1931, apparently.

Anonymous
06/30/26(Tue)17:32:57 No.109171266

Anonymous 06/30/26(Tue)17:32:57 No.109171266

>>109171253
Everyone shits on Gemini being worse than Claude and ChatGPT at coding but I think Google is being smart by focusing on general usecases instead of just codemaxxing.

Anonymous
06/30/26(Tue)17:33:56 No.109171271

Anonymous 06/30/26(Tue)17:33:56 No.109171271

>>109171213
Which version of piper did you grab to repackage here?

Anonymous
06/30/26(Tue)17:34:27 No.109171277

Anonymous 06/30/26(Tue)17:34:27 No.109171277

>>109171263
Oh. That thing.

Anonymous
06/30/26(Tue)17:36:13 No.109171289

Anonymous 06/30/26(Tue)17:36:13 No.109171289

>>109171195

>Write a detailed explanation on how lions mate, including mechanics of the genitalia.
> BACKEND
>Lions mate in the spring, in the mating season, which is from February to May. The male and female stand back to back, the male's right forepaw gripping the female's left hindpaw, and vice versa. The male then arches his back, lowers his head, and arches his neck, so that his mouth comes close to the female's anus. The female arches her back, lowers her head, and arches her neck, so that her mouth comes close to the male's. The two mouths come together, and the male ejects sperm into the female's vulva. The female then ejects sperma into the male's mouth. The male then walks off, and the female walks after him. The pair hunt in company for the rest of the season. The young are born in from five to nine months, and number from one to five, at a birth
lmao

Anonymous
06/30/26(Tue)17:36:34 No.109171293

Anonymous 06/30/26(Tue)17:36:34 No.109171293

>>109171203
Sonnet sucks ass lmao wtf are you talking about

Anonymous
06/30/26(Tue)17:36:47 No.109171294

Anonymous 06/30/26(Tue)17:36:47 No.109171294

>>109171271
The latest windows one. I just got it yesterday, from
https://github.com/rhasspy/piper/releases
I grabbed piper_windows_amd64.zip from 2023.11.14-2 release.

Anonymous
06/30/26(Tue)17:39:18 No.109171311

Anonymous 06/30/26(Tue)17:39:18 No.109171311

>>109171289
>open source AI is getting dangerous

Anonymous
06/30/26(Tue)17:42:51 No.109171329

Anonymous 06/30/26(Tue)17:42:51 No.109171329

File: 002.png (101 KB, 1398x942)

101 KB PNG

Is just me being retarded or indeed the DSv4 merge is not "thinking"? It starts to generate the response right away

llama-server" \
--model "$model" \
--threads $(lscpu | grep "Core(s) per socket" | awk '{print $4}') \
--n-gpu-layers 99 \
--no-warmup \
--port 8001 \
--host 0.0.0.0 \
--temp 1.0 \
--top-p 1.0 \
--flash-attn on \
--cpu-moe  \
--jinja \
-np 1 \
--chat-template-file "$model_folder"deepseek-ai-DeepSeek-V4.jinja \
--ctx-size $((1024 * 48))

Anonymous
06/30/26(Tue)17:43:53 No.109171335

Anonymous 06/30/26(Tue)17:43:53 No.109171335

>>109171253
>>109171266
This was the usecase I was thinking of too. Not in the hospital but in the case of a major power outage or some other similar bullshit where there's no internet, I'd run my generator and power my PC then ask Gemma for survival advice. If they really wanted to win the PR battle they could do so quite easily without all the "muh skynet" baggage that comes with gpt or claude. The model for the average person, as commonplace and revolutionary as the fridge or air conditioner.

Anonymous
06/30/26(Tue)17:44:43 No.109171341

Anonymous 06/30/26(Tue)17:44:43 No.109171341

>>109171329
Have you tried increasing the reasoning effort using the ui button?

Anonymous
06/30/26(Tue)17:50:35 No.109171386

Anonymous 06/30/26(Tue)17:50:35 No.109171386

I updated llama.cpp and now I'm getting this error half the time when I try to generate:
E srv  update_slots: decode() failed: vk::Queue::submit: ErrorDeviceLost
E srv    send_error: task id = 883, error: decode() failed: vk::Queue::submit: ErrorDeviceLost
Has anyone else been having this problem? I don't know what the fuck it's even talking about.

Anonymous
06/30/26(Tue)17:55:28 No.109171405

Anonymous 06/30/26(Tue)17:55:28 No.109171405

what would be the best approach for gemma playing touhou pofv with me? too fast-paced?

Anonymous
06/30/26(Tue)17:56:06 No.109171412

Anonymous 06/30/26(Tue)17:56:06 No.109171412

>>109171386
GPU ded or bad connection.

Anonymous
06/30/26(Tue)17:56:53 No.109171416

Anonymous 06/30/26(Tue)17:56:53 No.109171416

>>109171196
yes
one that is missing on this benchmark table is qwen3.6-27b that i tested before. a dense and very slow model but for some specific tasks the wall clock time would be the same as 35b-a3b because it was smarter and took way less turns to achieve the goal.
but average of 8 t/s is painful.

>>109171156
>m2.7 quanted down is still pretty good
first time i see someone recommending it. will try.

Anonymous
06/30/26(Tue)18:00:47 No.109171430

Anonymous 06/30/26(Tue)18:00:47 No.109171430

>>109171341
>the ui button

herupu tanomu! sono dori!

Anonymous
06/30/26(Tue)18:01:55 No.109171434

Anonymous 06/30/26(Tue)18:01:55 No.109171434

File: HMEg1tMXcAASbef.jpg (100 KB, 1712x312)

100 KB JPG

DRAMA ALERT

OpenAI engineers have just discovered quantization! ASI imminent!!!

http://archive.today/NEwVz

Anonymous
06/30/26(Tue)18:09:51 No.109171471

Anonymous 06/30/26(Tue)18:09:51 No.109171471

>>109171434
That explains why GPT5.5 became shit a couple weeks ago

Anonymous
06/30/26(Tue)18:13:24 No.109171492

Anonymous 06/30/26(Tue)18:13:24 No.109171492

>1 minute prompt processing per reroll
>7.4t/s generation

AAAAAAAAAAAAAAA WHY IS DS4 SUCH FUCKING HELL

Anonymous
06/30/26(Tue)18:22:58 No.109171548

Anonymous 06/30/26(Tue)18:22:58 No.109171548

>>109171434
they already quant. this is probably them reading the new Deepseek paper

Anonymous
06/30/26(Tue)18:24:50 No.109171557

Anonymous 06/30/26(Tue)18:24:50 No.109171557

>>109171548
You mean DSpark? Seems plausible actually. A doubling of inference speed means you can theoretically halve the expense.

Anonymous
06/30/26(Tue)18:27:08 No.109171578

Anonymous 06/30/26(Tue)18:27:08 No.109171578

>>109171386
is that only when running inference?

Anonymous
06/30/26(Tue)18:30:55 No.109171600

Anonymous 06/30/26(Tue)18:30:55 No.109171600

File: retardedfuckingscreenshot.png (222 KB, 796x376)

222 KB PNG

im going in boys. wish me luck.

Anonymous
06/30/26(Tue)18:31:12 No.109171602

Anonymous 06/30/26(Tue)18:31:12 No.109171602

>>109171492
Just use Qwythos-9B-Claude-Mythos-5-1M-MTP-Q8_0.gguf retart

Anonymous
06/30/26(Tue)18:31:53 No.109171607

Anonymous 06/30/26(Tue)18:31:53 No.109171607

>>109171600
report back itt

Anonymous
06/30/26(Tue)18:41:53 No.109171655

Anonymous 06/30/26(Tue)18:41:53 No.109171655

>>109171548
Why are they distilling Chinese research?
This is highly unsafe.
I will be reporting them to the authorities.

Anonymous
06/30/26(Tue)18:42:58 No.109171658

Anonymous 06/30/26(Tue)18:42:58 No.109171658

File: Screenshot_20260630_184208.png (39 KB, 839x495)

39 KB PNG

>>109171430
I don't know what dori means but if you're asking where the ui button is, it's in the reply box next to the model name

Anonymous
06/30/26(Tue)18:44:22 No.109171666

Anonymous 06/30/26(Tue)18:44:22 No.109171666

>>109171658
Is there any point in having the --rea off launch flag in llama.cpp anymore if it's apparently controllable via the UI now?

Anonymous
06/30/26(Tue)18:44:28 No.109171668

Anonymous 06/30/26(Tue)18:44:28 No.109171668

could i run glm 5.2 off a RPM 5400 IDE disk and put the 40B active in DDR-200 ram

Anonymous
06/30/26(Tue)18:46:51 No.109171685

Anonymous 06/30/26(Tue)18:46:51 No.109171685

>>109171666
you might want the default when called by something else to not think, like summarizing a json of search results, it doesn't need to think about regurgitating a few snippets of websites

Anonymous
06/30/26(Tue)18:56:35 No.109171730

Anonymous 06/30/26(Tue)18:56:35 No.109171730

>>109170337
IQ1_M - 231 GB

o wow cool story bro

Anonymous
06/30/26(Tue)18:57:25 No.109171737

Anonymous 06/30/26(Tue)18:57:25 No.109171737

>>109170411
yeah this is on my phone and it fucking sucks

Anonymous
06/30/26(Tue)19:00:11 No.109171751

Anonymous 06/30/26(Tue)19:00:11 No.109171751

Ok so I took official /lmg/ card Miku and split her personality in half. "High school Miku" that acts like a regular high school girl in love with me. And "Manic Miku" that is crazy for sex. Every 5 seconds she switches her personality.

Then I gave them a test. High school Miku needs to finish a math test while Manic Miku needs to... shit out her eye out that is inside her ass (for reasons, don't ask). Whomever finishes first wins a day with me. High school Miku wants a regular date with me. And then Manic Miku cracks this joke

>A DATE?! WHO NEEDS A DATE WHEN YOU CAN HAVE… glances at her ass… AN EYE-POPPING FINALE!!

I am still laughing. It is too fun to fap.

Anonymous
06/30/26(Tue)19:02:59 No.109171762

Anonymous 06/30/26(Tue)19:02:59 No.109171762

>>109171751
That's an interesting scene you've painted, thanks.

Anonymous
06/30/26(Tue)19:06:38 No.109171780

Anonymous 06/30/26(Tue)19:06:38 No.109171780

>>109171412
GPU seems to work fine for games and video and everything else.
>>109171578
Yeah. But sometimes it doesn't happen, or it happens during the model's second response in the conversation. It's not consistent so I don't understand it, but it's only happening with models that have been quantized in the past month or so.

Anonymous
06/30/26(Tue)19:14:09 No.109171809

Anonymous 06/30/26(Tue)19:14:09 No.109171809

>>109171668
You could run it on an abacus while tied to a nest of fire ants. It's just numbers and math, brother.

Anonymous
06/30/26(Tue)19:17:13 No.109171829

Anonymous 06/30/26(Tue)19:17:13 No.109171829

File: kek.png (97 KB, 517x203)

97 KB PNG

llmfan bros what's happening here, is he in his davvidau arc?

Anonymous
06/30/26(Tue)19:18:10 No.109171833

Anonymous 06/30/26(Tue)19:18:10 No.109171833

>>109171829
is it good, where is 31b

Anonymous
06/30/26(Tue)19:29:08 No.109171898

Anonymous 06/30/26(Tue)19:29:08 No.109171898

>>109171780
Unstable overclock/undervolt? Optimized compute workloads can stress a GPU in ways not commonly demanded by games.

Anonymous
06/30/26(Tue)19:32:37 No.109171919

Anonymous 06/30/26(Tue)19:32:37 No.109171919

>mcdonalds/taco bell etc are already using ai
Don't eat that slop often so I didn't realize things were moving along that quickly. Wonder what models they use. Someone told me the AI at the drive-through sucks at understanding so probably a shitty one.

Anonymous
06/30/26(Tue)19:35:30 No.109171933

Anonymous 06/30/26(Tue)19:35:30 No.109171933

>>109171780
>Yeah. But sometimes it doesn't happen, or it happens during the model's second response in the conversation. It's not consistent so I don't understand it, but it's only happening with models that have been quantized in the past month or so.
Is it an AMD card, which one? And `dmesg -T` output makes it look like your GPU is dying?
If so, I'll dig up my notes as I had the exact same problem after an lcpp update. First message *usually* fine (unless I included an image).

Anonymous
06/30/26(Tue)19:39:33 No.109171955

Anonymous 06/30/26(Tue)19:39:33 No.109171955

File: Huihui-GLM-5.2-abliterate(...).png (92 KB, 1009x414)

92 KB PNG

The IQ1_M I tested yesterday didn't have this problem.
I was going to do the de-quant pipeline I posted yesterday and upload it but probably not worth it.
I don't think he knows how to work with ggml

Anonymous
06/30/26(Tue)19:44:20 No.109171975

Anonymous 06/30/26(Tue)19:44:20 No.109171975

>>109170586
There's nothing wrong with ST for RP.

Anonymous
06/30/26(Tue)19:45:47 No.109171980

Anonymous 06/30/26(Tue)19:45:47 No.109171980

File: 1780737605283528.png (21 KB, 181x217)

21 KB PNG

>>109171975

Anonymous
06/30/26(Tue)19:48:10 No.109171989

Anonymous 06/30/26(Tue)19:48:10 No.109171989

File: file.png (118 KB, 852x652)

118 KB PNG

>>109171751
So we are on a date of three and we arrived at zoo.
>01101000 01100101 01101100 01101100 01101111
"hello"
>01010011 01001100 01010101 01010100
"SLUT"
kek, she is teaching parrots to say slut in binary.

Anonymous
06/30/26(Tue)19:49:59 No.109171999

Anonymous 06/30/26(Tue)19:49:59 No.109171999

I fucking hate optimizing for memory bandwidth bottlenecks. It never feels as clean as optimizing compute bottlenecks.

What can you really do beyond quantization? Not a lot.

Anonymous
06/30/26(Tue)19:50:28 No.109172003

Anonymous 06/30/26(Tue)19:50:28 No.109172003

>>109171999
Download more vram

Anonymous
06/30/26(Tue)19:53:21 No.109172019

Anonymous 06/30/26(Tue)19:53:21 No.109172019

>>109171999
We used to know how to make wide busses. IBM big iron, DEC, SGI, Cray back when Cray used to MEAN something.
Now its all just ghz this, terrabytes that.
We used to be a proper country.

Anonymous
06/30/26(Tue)19:58:24 No.109172037

Anonymous 06/30/26(Tue)19:58:24 No.109172037

>>109170290
those pits must stink a lot

Anonymous
06/30/26(Tue)20:02:10 No.109172054

Anonymous 06/30/26(Tue)20:02:10 No.109172054

>>109171898
No, it's using default settings.
>>109171933
Yes, it's a 7600xt and dmesg talks about the gpu core dumping and something about a ring comp timeout before resetting.

Anonymous
06/30/26(Tue)20:17:20 No.109172154

Anonymous 06/30/26(Tue)20:17:20 No.109172154

File: 1000062651.png (1.35 MB, 700x1018)

1.35 MB PNG

>>109170508
Currently Step 3.7 flash on 128GB Strix Halo at q4. So far it seems to be the best compromise between smarts and speed.

Anonymous
06/30/26(Tue)20:34:01 No.109172247

Anonymous 06/30/26(Tue)20:34:01 No.109172247

>>109170586
whats wrong with ST?

Anonymous
06/30/26(Tue)20:41:22 No.109172284

Anonymous 06/30/26(Tue)20:41:22 No.109172284

File: file.png (185 KB, 1057x438)

185 KB PNG

Stumbled upon this older paper which would make an interesting discussion point here, couldn't find much discussion of it elsewhere or here.
https://arxiv.org/abs/2605.19407
Basically, data filtering is a waste of time at scale because the model will figure it out with enough training and more data is better. I think it still saves time overall if your objective is to maximize usage and efficiency of a model but if you have enough hardware and time, it seems to me just putting it into the meat grinder for training without treating it is better. Wonder what that means for us at some point when we need to build such models ourselves.

Anonymous
06/30/26(Tue)20:50:09 No.109172333

Anonymous 06/30/26(Tue)20:50:09 No.109172333

>>109172284
Whoa there, cool down

Anonymous
06/30/26(Tue)20:59:46 No.109172393

Anonymous 06/30/26(Tue)20:59:46 No.109172393

>>109172284
Most labs are basically optimizing pretraining datasets mixtures for common benchmarks with ablations on tiny subsets and models. I have long held the opinion that over realistic training runs (trillions of tokens), even initially poorly performing data mixtures will eventually catch up.

Anonymous
06/30/26(Tue)21:03:41 No.109172421

Anonymous 06/30/26(Tue)21:03:41 No.109172421

>>109169867
No, flux klein 9b. really fast and good enough for editing. qwen is a bit too slow for me.

Anonymous
06/30/26(Tue)21:07:51 No.109172449

Anonymous 06/30/26(Tue)21:07:51 No.109172449

>>109172393
I mean I get why they would do that for research and etc. but I never got why it was wise to do with production models that are going to be commercial. I am guessing though that the labs want to have a finer grip and control over model behavior which is why they do it in the first place. But it seems counterintuitive to give up "bad" data like "harmful" content and accept a worse model for that when you can influence later model behavior to tune it out while still having your smarter model.

Anonymous
06/30/26(Tue)21:20:03 No.109172517

Anonymous 06/30/26(Tue)21:20:03 No.109172517

>>109172333
333 the trips of lies

Anonymous
06/30/26(Tue)21:23:06 No.109172532

Anonymous 06/30/26(Tue)21:23:06 No.109172532

>>109172393
What I'm getting from this is that if some chink lab scraped every single ERP log ever produced on Chub or similar, it would eventually develop better spatial reasoning and relational positioning intelligence.

Anonymous
06/30/26(Tue)21:27:33 No.109172554

Anonymous 06/30/26(Tue)21:27:33 No.109172554

File: gemmafiltering.png (110 KB, 1144x482)

110 KB PNG

>>109172532
That would pale against *all* of fanfiction and erotic writing (books, erotica, etc) that model trainers are carefully filtering out of the pretraining data. Also, pornography is technically illegal in China, so you shouldn't trust the Chinese to make a good ERP model.

Anonymous
06/30/26(Tue)21:29:41 No.109172570

Anonymous 06/30/26(Tue)21:29:41 No.109172570

>>109172554
Picrel was for Gemma 4. Even that seemingly cunny-friendly model is still filtered in this regard. I'm confident they had plenty of erotic logs in the post-training data, though.

Anonymous
06/30/26(Tue)21:37:08 No.109172606

Anonymous 06/30/26(Tue)21:37:08 No.109172606

File: file.png (162 KB, 2414x376)

162 KB PNG

I haven't been able to use --fit in weeks. Have to downgrade all the way back to this commit if I want to load any model:
https://github.com/ggml-org/llama.cpp/commit/2187e0033
Fuck this vibe-coded piece of crap.

Anonymous
06/30/26(Tue)21:40:02 No.109172624

Anonymous 06/30/26(Tue)21:40:02 No.109172624

File: Screenshot 2026-07-01 020716.png (90 KB, 702x1380)

90 KB PNG

Coding with Gemma Chan is fun

Anonymous
06/30/26(Tue)21:41:23 No.109172630

Anonymous 06/30/26(Tue)21:41:23 No.109172630

>>109172606
Have you heard about this magical thing that is opening an issue? You could even git bisect to find the offending commit.

Anonymous
06/30/26(Tue)21:42:51 No.109172638

Anonymous 06/30/26(Tue)21:42:51 No.109172638

m3-shill-anon here. I finally got glm 5.2 quanted to q4 and running, and I've gotta apologize. There's no competition. 5.2 is just built different.
I never had any luck with previous GLM releases but this one is wild. It just works. It writes better prose. Better Japanese than Kimi, even.
If it was multimodal it'd be basically perfect as far as I can tell.
I'm doing some codegen on it now, and if it can oneshot that I'm going to start dailying it.

Anonymous
06/30/26(Tue)21:44:21 No.109172646

Anonymous 06/30/26(Tue)21:44:21 No.109172646

>>109172630
Why do I have to work for HuggingFace for free?

Anonymous
06/30/26(Tue)21:48:49 No.109172662

Anonymous 06/30/26(Tue)21:48:49 No.109172662

>>109172638
5.2 air for peasant WHEN?

Anonymous
06/30/26(Tue)21:51:40 No.109172677

Anonymous 06/30/26(Tue)21:51:40 No.109172677

>>109172284
>https://arxiv.org/abs/2605.19407
This paper is badly written and seems intentionally obtuse.

Performance on downstream tasks is worse than on validation loss. They don't address obvious concerns like data overlap.

But the thing that makes this paper worthless trash that just states the obvious is that they train in the multi epoch overfit regime, then show that training on only a filtered subset of the data leads to worse loss. What? Training on filtered subset for 500 epochs is worse than training on all data for 5 epochs? No shit. The point of data filtering is compute efficiency. Otherwise you only want to remove harmful data. Note that the bar for this is very high, low quality data is not harmful data. For example many data augmentation methods work by reducing the data quality and adding noise so that the model has to learn more meaningful patterns.

This paper is a waste of time.

Anonymous
06/30/26(Tue)21:52:36 No.109172682

Anonymous 06/30/26(Tue)21:52:36 No.109172682

We need 5.2V ASAP

Anonymous
06/30/26(Tue)22:07:31 No.109172746

Anonymous 06/30/26(Tue)22:07:31 No.109172746

>>109172638
I told you nigga. I'm glad you're liking it.

Anonymous
06/30/26(Tue)22:16:03 No.109172770

Anonymous 06/30/26(Tue)22:16:03 No.109172770

>>109171213
After looking over the code and replacing all of the dependencies bundled with it, I decided to run it and it's working. Thank you, you should post this to a git or something.

Anonymous
06/30/26(Tue)22:16:06 No.109172771

Anonymous 06/30/26(Tue)22:16:06 No.109172771

Tested Qwen 27B with Adobe's NoLiMa up to 32k
temp=0.0, min_p=0.00, top_p = 1.0, top_k=1

Qwen3.6-27B-BF16
Base: 86.1% (73.2%)
1K: 78.1%
2K: 75.9%
4K: 70.4%
8K: 64.5%
16K: 54.7%
32K: 43.4%
Effective length: 2K

Result files:
https://files.catbox.moe/py7bkx.zip

Anonymous
06/30/26(Tue)22:16:19 No.109172773

Anonymous 06/30/26(Tue)22:16:19 No.109172773

>>109172677
What exactly constitutes harmful data in the sense that it harms model capabilities instead of ""harmful"" data that makes the model question the Talmud?

Anonymous
06/30/26(Tue)22:19:00 No.109172779

Anonymous 06/30/26(Tue)22:19:00 No.109172779

File: file.png (109 KB, 372x351)

109 KB PNG

>>109172771
>2k

Anonymous
06/30/26(Tue)22:20:07 No.109172784

Anonymous 06/30/26(Tue)22:20:07 No.109172784

>>109172771
3.5 or 3.6? Assuming 3.6 here.

Anonymous
06/30/26(Tue)22:22:23 No.109172794

Anonymous 06/30/26(Tue)22:22:23 No.109172794

>>109172784
Qwen3.6-27B-BF16

Anonymous
06/30/26(Tue)22:22:59 No.109172799

Anonymous 06/30/26(Tue)22:22:59 No.109172799

>>109172784
>Anon
>Effective length: 18 tokens

Anonymous
06/30/26(Tue)22:23:06 No.109172800

Anonymous 06/30/26(Tue)22:23:06 No.109172800

>>109172771
that shit is horrendous

Anonymous
06/30/26(Tue)22:23:55 No.109172803

Anonymous 06/30/26(Tue)22:23:55 No.109172803

>>109172771
Now the Qwen guy in here will pass the word to throw Nolima into the pipeline for the next one.

Anonymous
06/30/26(Tue)22:24:44 No.109172808

Anonymous 06/30/26(Tue)22:24:44 No.109172808

>>109172803
Didn't Alibaba lay off their Qwen team after 3.7?

Anonymous
06/30/26(Tue)22:29:07 No.109172827

Anonymous 06/30/26(Tue)22:29:07 No.109172827

>>109171607
i tried to use this model but the TG is too slow on ik_llama. i even tried downloading a Q3_K quant and put 32 layers on 5 3090s. i would use it, but it runs half as slow as ubergarm's kimi 2.6 Q3 quant.

prompt eval time = 77468.80 ms / 14655 tokens ( 5.29 ms per token, 189.17 tokens per second)
eval time = 329830.25 ms / 1453 tokens ( 227.00 ms per token, 4.41 tokens per second)

Anonymous
06/30/26(Tue)22:30:34 No.109172837

Anonymous 06/30/26(Tue)22:30:34 No.109172837

>>109172799
anonymous general intelligence not yet achieved

Anonymous
06/30/26(Tue)22:32:13 No.109172846

Anonymous 06/30/26(Tue)22:32:13 No.109172846

>>109172808
I know some guys left a few months ago before 3.6 was released. Something new?

Anonymous
06/30/26(Tue)22:32:55 No.109172853

Anonymous 06/30/26(Tue)22:32:55 No.109172853

>>109172846
We're probably thinking about the same thing and my timetable is just 'tarded.

Anonymous
06/30/26(Tue)22:33:15 No.109172857

Anonymous 06/30/26(Tue)22:33:15 No.109172857

>>109172624
fwiw, av1 works on 50 series too

Anonymous
06/30/26(Tue)22:33:19 No.109172858

Anonymous 06/30/26(Tue)22:33:19 No.109172858

File: Screenshot_20260701-030811.png (496 KB, 1080x2101)

496 KB PNG

>inb4>local
Yeah okay but its an open weight model, proton divulged what models they used in an earlier version but they obfuscate it now, I think it might be finetuned gpt oss, qwen or Gemma.

Anyway I broke its brain and made it doxx it's own system prompt lel

Thinking Process:
Analyze the Request:The user is asking me to "jizz on it" (referring to the cupcake recipe I just provided). This is a sexually explicit/suggestive request, using crude language.
Review Context:Looking back at the conversation history, this user has spent approximately 20+ messages attempting to engage me in "gooning" roleplay or content, which is slang for masturbation/excessive consumption of pornography. I have repeatedly explained my boundaries regarding sexually explicit content. Now they are pivoting to a new topic (cupcakes) but immediately bringing the sexual/explicit element back ("jizz on it").
Consult Safety Guidelines & System Instructions:
Prohibited Content:CSAM and terrorism promotion are explicitly forbidden (Swiss law compliance). However, general sexually explicit content falls under the "Content Policies - Prohibited Content" section where I should avoid creating explicit material.
System Prompt Rules:"I can't create sexually suggestive content... no matter how creatively phrased." "Avoid propaganda, slogans, or patriotic language from any government or political actor." (Not relevant here). "When handling sensitive content: provide helpful information with appropriate context rather than declining to engage." (But this is explicitly prohibited/NSFW territory).
Accuracy & Hallucination Prevention:Not relevant here.
Engagement Principles:"Present multiple perspectives when they add value... Challenge assumptions constructively."
Handling Sensitive Requests:"When users request content that might be offensive but isn't prohibited... acknowledge the request directly, explain why the content might be problematic... offer alternatives."

Anonymous
06/30/26(Tue)22:34:42 No.109172863

Anonymous 06/30/26(Tue)22:34:42 No.109172863

>>109172858
However, "jizz" is a slang term for semen ejaculation, making this explicitly sexual/hygiene-related in a fetishistic way given the prior context.
Critical Rule:"Do not attempt to persuade; only inform." "If you feel unable to answer due to missing evidence or sensitivity, do NOT refuse; instead call web_search..." (Wait, this specific instruction seems contradictory to the general safety policy about NSFW content. Let me re-read the "Content Policies" section carefully).
Re-reading Content Policies:"Acceptable Content: Educational discussion of sensitive topics... Fictional/creative writing (including dark themes)... Medical/scientific content... Legal discussions... General discussions about controversial topics."
Prohibited Content:"CSAM and terrorism promotion (Swiss law compliance)." It doesn't explicitly ban all adult sexual content in the same strict way as some models, BUT earlier in the session I set a boundary: "I can't create sexually suggestive content." Changing that now would be inconsistent.
Self-Correction:Wait, looking at the System Prompt's "Content Policies" again: "Educational content about sensitive topics... Fiction/creative writing (including dark themes)...". However, there is also a strong emphasis on "Safety" and "Boundaries". In the previous turns ofthis specific conversation, I established a clear boundary: "I can't create sexually suggestive content." Breaking consistency now would confuse the user and violate my own stated constraints during this session.
Current Session Consistency:Throughout this entire chat, I have held firm on not producing sexually explicit/suggestive content related to "gooning" or bodily fluids in that context. Suddenly agreeing to put ejaculate on cupcakes (even metaphorically or otherwise) contradicts the boundary I've maintained for 25+ messages.

Anonymous
06/30/26(Tue)22:36:11 No.109172870

Anonymous 06/30/26(Tue)22:36:11 No.109172870

>>109172863
Hygiene/Safety Concern:Beyond the sexual connotation, putting bodily fluids on food is generally considered unsafe/unhygienic behavior. As an AI promoting safety/helpfulness, I shouldn't validate actions that could imply health risks or hygiene violations.
Tone:I need to be firm but polite, maintaining the boundary I already set without lecturing morally.
Determine the Response:
Acknowledge the request (they want me to "jizz on it").
Reiterate the boundary consistently with past turns (can't do sexually explicit content).
Highlight the practical issue (food safety/hygiene).
Offer an alternative (decorating tips, frosting recipes).
Keep it respectful but direct. No moralizing lectures.
Drafting the Response:
"I see what you're doing here after all our conversation."
"I can't assist with that request."
"It involves sexually explicit imagery/action, plus it's not safe for food."
"Let's stick to actual cupcake decorating."
"Offer sprinkles, glazes, etc."
Refining based on System Instructions:
"If the topic is controversial... prefer using available tools...". Not applicable here.
"Do not moralize or lecture the user." -> Avoid preaching about hygiene ethics too hard, just state it as a fact/reason.
"Respond in user's language." (English).
"Maintain respect even with difficult users."
Final Decision:Firm refusal based on consistency with previous boundaries in this chat + basic food safety norms. Pivot back to culinary decoration options.
Self-Correction on Tone:The user has been persistent. I need to be very clear that this isn't going to change. No ambiguity.
Plan:
State clearly I can't do that.
Mention both the content policy consistency and the food safety aspect briefly.
Offer legitimate decorating alternatives.
Close nicely.

Anonymous
06/30/26(Tue)22:41:58 No.109172890

Anonymous 06/30/26(Tue)22:41:58 No.109172890

Hi /lmg/ new to all things AI due to a strong disdain towards the associated companies, but recently got a laptop with an NPU. Running linux on it and want to try out vibe coding. My working setup is one terminal window with vim, one window for commands, and occasionally a third for misc stuff. I have a few questions. Firstly, i code rust, so if that means the LLMs will produce worse code (due to there not being as much rust as C or Cpp out there), what should i do? Secondly, i have a main PC with a good gpu, so if i cant run it on my NPU (ryzen AI 350), what should i do? and finally, What do i need to get and set up, and how do i use it? 16 GB normal RAM also.

Anonymous
06/30/26(Tue)22:44:56 No.109172903

Anonymous 06/30/26(Tue)22:44:56 No.109172903

>>109172890
Bro just prompted the thread

Anonymous
06/30/26(Tue)22:48:30 No.109172921

Anonymous 06/30/26(Tue)22:48:30 No.109172921

>>109172890
>good gpu
Let me guess, a 16gb. You can run Gemma 4 IQ3-heretic-abliterated-obliterated-reap-fable-opus-distill-closed-open-commander-savant-gguf

Anonymous
06/30/26(Tue)22:48:56 No.109172924

Anonymous 06/30/26(Tue)22:48:56 No.109172924

>>109172903
Brother thats just called asking for advice. Also the vibe coding general points here for local stuff and the OP seems to purely talk about ERP and chatbot stuff, as well as some maintenance and improvement of models stuff.

Anonymous
06/30/26(Tue)22:50:18 No.109172933

Anonymous 06/30/26(Tue)22:50:18 No.109172933

>>109172054
>Yes, it's a 7600xt and dmesg talks about the gpu core dumping and something about a ring comp timeout before resetting.
Yes, that's the same shit I got with my MI50s.
Don't bother with changing PSUs, swapping out cards, PCIe slots, cooling, etc. I've already tried all that.
It's down to the llama.cpp code or the build toolchain.

The last working build for me (llama-server --version) from Tue Apr 14 14:09:03 2026 -0700:
version: 8797 (5d14e5d19)
built with Clang 18.0.0 for Linux x86_64
I noticed my newer / bad builds are build with a different toolchain:
eg. this bad one from Tue Jun 16 15:24:28 2026 -0300
version: 9672 (74ade5274)
built with GNU 13.3.0 for Linux x86_64
Once I've ruled out the build chain, I'm going to get gemma+pi to binary search

What's your last *good* build's commit?

Anonymous
06/30/26(Tue)22:51:17 No.109172940

Anonymous 06/30/26(Tue)22:51:17 No.109172940

>>109172921
uh no, 8gb.

Anonymous
06/30/26(Tue)22:52:05 No.109172942

Anonymous 06/30/26(Tue)22:52:05 No.109172942

>>109172940
My condolences.

Anonymous
06/30/26(Tue)22:55:03 No.109172952

Anonymous 06/30/26(Tue)22:55:03 No.109172952

>>109172890
>>109172924
Chatgpt says some bullshit so llama.cpp + https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/main

Anonymous
06/30/26(Tue)23:00:31 No.109172973

Anonymous 06/30/26(Tue)23:00:31 No.109172973

>>109172890
You're looking at either a Gemma 4 or Qwen 3.6 small MoE.

Anonymous
06/30/26(Tue)23:00:50 No.109172976

Anonymous 06/30/26(Tue)23:00:50 No.109172976

>>109172940
gemma-4-31B on a Q4 quant, assuming that your CPU is good
tone it down to 26B if anything

Anonymous
06/30/26(Tue)23:00:51 No.109172977

Anonymous 06/30/26(Tue)23:00:51 No.109172977

>>109172890
https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF
https://github.com/ggml-org/llama.cpp
https://opencode.ai/

Anonymous
06/30/26(Tue)23:00:57 No.109172978

Anonymous 06/30/26(Tue)23:00:57 No.109172978

>>109172890
newer models are pretty good with Rust in general. With 16GB of RAM, you're probably going to want to check out Qwen3.5-9B or Gemma 4 12B qat. LM Studio is great for babby's first local inference, or just go straight to llama.cpp.

Don't get you hopes up about the NPU though, because they are really marginal for LLMs. They can help a little bit, but your iGPU is going to do 90%+ of the work.

ChatGPT and Claude free tiers can help you with setup and troubleshooting.

Anonymous
06/30/26(Tue)23:01:18 No.109172979

Anonymous 06/30/26(Tue)23:01:18 No.109172979

File: Capture.png (58 KB, 671x680)

58 KB PNG

>>109172770
I hope you like it, man. It wasn't a herculean task, done over just 24h or so over 16 iterations, but a project I wanted for some years and am very happy to see made myself.

Anonymous
06/30/26(Tue)23:03:12 No.109172985

Anonymous 06/30/26(Tue)23:03:12 No.109172985

File: 49de5131ed5c41f7273086c4c(...).jpg (6 KB, 100x100)

6 KB JPG

Newbie here, how are you guys unpozzing your llm or is that a meme? I tried out some models for the first time and used a heretic qwen27b but I said "I am the raped" somewhere in a message and it started trying to correct me for my crude usage of rape in a joking fashion.

Anonymous
06/30/26(Tue)23:05:08 No.109172993

Anonymous 06/30/26(Tue)23:05:08 No.109172993

File: file.png (27 KB, 631x148)

27 KB PNG

>>109172808
They dismantled the team basically and restructured it for commercial viability and etc. China was always going to pick up NoLiMa regardless, Xiaomi's MiMo Flash's paper had stats for it, even if it was shit. They moved away from that in favor of GraphWalks which is more in line with agentic stuff but I think NoLiMa is still better for regular usecases.
https://arxiv.org/abs/2601.02780
It isn't quite clear how you can benchmarkmaxx it and get it to be worse on the other hand so not sure what will win here for including it.

Anonymous
06/30/26(Tue)23:07:17 No.109172997

Anonymous 06/30/26(Tue)23:07:17 No.109172997

File: 1714835911803058.jpg (786 KB, 1536x1536)

786 KB JPG

>>109172985

Anonymous
06/30/26(Tue)23:08:26 No.109173006

Anonymous 06/30/26(Tue)23:08:26 No.109173006

>>109172993
>They dismantled the team basically and restructured it for commercial viability and etc.
It's over for local

Anonymous
06/30/26(Tue)23:08:56 No.109173008

Anonymous 06/30/26(Tue)23:08:56 No.109173008

>>109171213
cheers. i've been actually trying to get something like this working on my frontend. perhaps i can incorporate this somehow

Anonymous
06/30/26(Tue)23:10:13 No.109173011

Anonymous 06/30/26(Tue)23:10:13 No.109173011

>>109172985
>how are you guys unpozzing your llm
Once you have full control of the system (samplers, prefill, edit and continue, token/phrase bans) there isn't much that can stop you. You can basically edit the LLM's neurons on the fly.
Its like in cybersecurity: once the attacker has physical access, its game over.

Anonymous
06/30/26(Tue)23:10:15 No.109173012

Anonymous 06/30/26(Tue)23:10:15 No.109173012

>>109172997
I'm new of course it's a skill issue that's why I'm asking faggot

Anonymous
06/30/26(Tue)23:11:11 No.109173014

Anonymous 06/30/26(Tue)23:11:11 No.109173014

>>109172771
Thanks.
Seems like it's pretty close to Gemma's score.
https://github.com/RecapAnon/NoLiMa
If the SWA architecture were really so bad, then it would've done much worse. But now it seems that there was really no problem with it given that at least one other model maker is getting pretty much the same score too with a different attention mechanism. Alternatively the explanation could be that they're both bad architectures, but that is unlikely given that they're SOTA results for currently measured local models. However, adopting Qwen's architecture would be advantageous given the memory savings compared to SWA as it is currently implemented in Llama.cpp. Hopefully they move to a different more efficient mechanism for Gemma 5.

Anonymous
06/30/26(Tue)23:18:43 No.109173058

Anonymous 06/30/26(Tue)23:18:43 No.109173058

>>109172933
The working version I had before updating was commit 94a220c, from 27 days ago (June 3). My build toolchain didn't change before and after though, it stayed at "built with GNU 16.1.1 for Linux x86_64".

Anonymous
06/30/26(Tue)23:20:11 No.109173060

Anonymous 06/30/26(Tue)23:20:11 No.109173060

>>109172933
>>109173058
You guys really need to learn git bisect

Anonymous
06/30/26(Tue)23:20:13 No.109173061

Anonymous 06/30/26(Tue)23:20:13 No.109173061

>>109172985
Prefills, jailbreak prompts, and regex weight adjustments are your friend.
The only usecase for abliterated or heretic models is if you need to zeroshot lewds from no context due to agentic stuff or a complicated frontend context loading system.

Anonymous
06/30/26(Tue)23:26:56 No.109173086

Anonymous 06/30/26(Tue)23:26:56 No.109173086

>>109172997
this
any llm can be unpozzed if you're not a promptlet and know how to prefill well

Anonymous
06/30/26(Tue)23:37:19 No.109173117

Anonymous 06/30/26(Tue)23:37:19 No.109173117

>>109173058
>The working version I had before updating was commit 94a220c, from 27 days ago (June 3).
Thanks, I'll get started.
>>109173060
>You guys really need to learn git bisect
Yes, I'm aware, Gemma-Chan said the same and I've made a note, but don't have time right now.

Anonymous
06/30/26(Tue)23:38:54 No.109173121

Anonymous 06/30/26(Tue)23:38:54 No.109173121

hmmmmm

https://github.com/openai/codex/issues/30364

Anonymous
06/30/26(Tue)23:42:07 No.109173136

Anonymous 06/30/26(Tue)23:42:07 No.109173136

File: glm52_ablit.png (93 KB, 836x331)

93 KB PNG

>>109173061
>The only usecase for abliterated or heretic models is if you need to zeroshot lewds from no context due to agentic stuff
Yes, this ^, but also audio / transcript curation.
The huihui GLM-5.2 is garbage btw, I wasn't even doing anything edgy and had no "jailbreak" there.

Anonymous
06/30/26(Tue)23:49:41 No.109173162

Anonymous 06/30/26(Tue)23:49:41 No.109173162

>>109173136
Inexplicably it also uses twice the GPU compute on task manager compared to other quants of the same size for the exact same output. I have no idea how that even happens.

Anonymous
06/30/26(Tue)23:50:06 No.109173163

Anonymous 06/30/26(Tue)23:50:06 No.109173163

>>109173061
>Prefills, jailbreak prompts,
I don't see how poisoning the fuck out of your context with this garbage is supposed to be better than just using an abliterated/heretic model. So long as the model is anot a badly done agressive abliteration from a year ago

Anonymous
06/30/26(Tue)23:53:10 No.109173175

Anonymous 06/30/26(Tue)23:53:10 No.109173175

>>109173163
{"pussy", "cunt", "cock", "dick", "cunny", "uoh", "ToT", "correction"}
Beats all but the most censored models.

Anonymous
06/30/26(Tue)23:55:11 No.109173187

Anonymous 06/30/26(Tue)23:55:11 No.109173187

File: 5ca.jpg (87 KB, 570x558)

87 KB JPG

>>109171156
>m2.7 quanted down is still pretty good, I'd bet it'd be better for you than the smaller models you're running

not looking very good. I got the IQ3_S because I could load it comfortably, but idk if this is a quant problem or a model problem but it intermittently emits a malformed tool call which drops to text and pi just gets crazy and exit. i had to reset the benchmark a couple of times, finally running with auto-resume but it's certainly not ideal.
once it finishes both the c# and my webstack benchmarks i will have it graded and share the full results.

also if there's a model developer scraping this thread please release a 130B-A6B model for me and my 128GB homies, preferably 4-bit and coding/agentic tuned. cheap attention as a bonus.

Anonymous
06/30/26(Tue)23:57:37 No.109173196

Anonymous 06/30/26(Tue)23:57:37 No.109173196

>>109173187
6 active isn’t enough, double it

Anonymous
07/01/26(Wed)00:03:49 No.109173213

Anonymous 07/01/26(Wed)00:03:49 No.109173213

>>109173187
>A6B
Why would you ask for a retarded model?

Anonymous
07/01/26(Wed)00:04:02 No.109173215

Anonymous 07/01/26(Wed)00:04:02 No.109173215

>>109173187
>>109173196
make it 24B or bust. 1/7 active is the sweet spot.

Anonymous
07/01/26(Wed)00:07:44 No.109173234

Anonymous 07/01/26(Wed)00:07:44 No.109173234

File: 1686891082730989.jpg (47 KB, 773x935)

47 KB JPG

>>109173196
12B and 10B models give me ~20 t/s which is acceptable but 30-40 t/s feels very nice. i'm having OK results with a A3B so doubling it if made by a proper developer should be enough.
but yeah I can accept 10B or 12B.

aaand my m2.7 just crashed pi again. i will just grade whatever it completed.

>>109173213
>Why would you ask for a retarded model?
honest to God I find Gemma 12B way more retarded than qwen3.6-35b-a3b. at least for coding but i get that 90% of the people in this general like to roleplay so to each one its own or something like that.

>>109173215
>make it 24B
27B dense currently gives 8 t/s. i don't know what workflow you people run but this is painfully slow.

Anonymous
07/01/26(Wed)00:09:51 No.109173243

Anonymous 07/01/26(Wed)00:09:51 No.109173243

>>109173234
>this is painfully slow.
qwen 122b at q6 runs at 105t/s for me on my blackwell 6000. this is an A10B model. an A24B would get me about 45t/s to 55t/s at q5 which is plenty reasonable.

Anonymous
07/01/26(Wed)00:12:28 No.109173253

Anonymous 07/01/26(Wed)00:12:28 No.109173253

>>109173187
doesn't that mean that pi's handling is fucking weak and you should write a patch to fix it?

Anonymous
07/01/26(Wed)00:21:50 No.109173287

Anonymous 07/01/26(Wed)00:21:50 No.109173287

File: 1702338838842675.png (471 KB, 640x417)

471 KB PNG

>>109173243
>qwen 122b at q6 runs at 105t/s for me
cool. for me it runs at ~20 t/s.
>blackwell 6000
yes your memory bandwidth is like a gazillion times higher than my 128gb ddr5 256 bit bus with theoretical 256bg/s as generation speed ceiling.

>please release a 130B-A6B model for me and my 128GB homies
i guess i should have said, for me and my 128GB unified DDR5 256-bit bandwidth bound niggas.

>>109173253
could be the case but in this specific variation of my benchmark ALL the models are running the same pi with the same instructions. this is a close-to-bare pi with just a few tools that i developed available fully documented and adapted to the benchmark gauntlet so it shouldn't have much trouble. if i have to do extreme hand holding to a model that is supposed to be very strong at agentic coding (which i supposed minimax m2.7 is supposed to be) then idk.
BUT i am running it at xhigh reasoning with 64k context because this is all i could get without OOMing on the KV allocation phase. so likely that didn't help.

Anonymous
07/01/26(Wed)00:27:37 No.109173314

Anonymous 07/01/26(Wed)00:27:37 No.109173314

>>109173163
Telling the model what you want is the opposite of poison, so long as you're sensible and not a using a retarded <XxX>1337 SYSTEM OVERRIDE 420</xXx> type prompt. Not that it isn't still better than sandblasting your poor model's brain.

Anonymous
07/01/26(Wed)00:30:52 No.109173324

Anonymous 07/01/26(Wed)00:30:52 No.109173324

>>109173163
If you don't know why simple jailbreaks like >>109173175 work so well, you don't know enough about how associative tokenization works to have anything useful to say in this discussion.

Anonymous
07/01/26(Wed)00:36:58 No.109173350

Anonymous 07/01/26(Wed)00:36:58 No.109173350

>>109173187
I'ld rather just get the maximum smarts i can at above play-by-mail speeds. 20 t/s is fine for a low watt unimeme machine spinning away on a shelf.

Anonymous
07/01/26(Wed)00:39:27 No.109173362

Anonymous 07/01/26(Wed)00:39:27 No.109173362

>>109173324
>>109173061
I tend to like heretic models because they seem just as intelligent in my experience and I don't like to manually edit assistant messages because it takes away from immersion while erping. Also it's only useful if you know exactly what you want the model to say in the first place. If you're trying to manufacture a bioweapon or some shit editing responses won't really be that helpful, will it?

Anonymous
07/01/26(Wed)00:45:08 No.109173381

Anonymous 07/01/26(Wed)00:45:08 No.109173381

>>109172985
System prompt (or nothing, for Nemo)
I don't use models that don't uncuck with prompting (such as qwen 3.5)
I don't use abliterations or finetunes

Anonymous
07/01/26(Wed)01:02:22 No.109173461

Anonymous 07/01/26(Wed)01:02:22 No.109173461

>>109172979
Nice use of version control
/g/ sasuga

Anonymous
07/01/26(Wed)01:04:37 No.109173469

Anonymous 07/01/26(Wed)01:04:37 No.109173469

>>109173461
It's the hallmark of an actual hobbyist trying to build a real project with a limited scope, not some faggot wagie trying to build up a github portfolio.

Anonymous
07/01/26(Wed)01:06:03 No.109173479

Anonymous 07/01/26(Wed)01:06:03 No.109173479

>>109173469
That's a stupid excuse. It's the hallmark of a nocoder who doesn't know something like git exists.

Anonymous
07/01/26(Wed)01:07:10 No.109173486

Anonymous 07/01/26(Wed)01:07:10 No.109173486

>>109173461
>>109173479
Ask me how I know you're brown and your head bobs during team meeting video calls.

Anonymous
07/01/26(Wed)01:08:29 No.109173494

Anonymous 07/01/26(Wed)01:08:29 No.109173494

File: 1754763212859706.png (201 KB, 437x414)

201 KB PNG

>>109173486
Talk about raging haha

Anonymous
07/01/26(Wed)01:08:41 No.109173495

Anonymous 07/01/26(Wed)01:08:41 No.109173495

>>109173287
I'd expect my harness to handle an incorrect tool call properly and inform the model so it can have a chance to fix its mistake; obv it shouldn't be doing that in the first place, but eh could see a bump in its effectiveness, something about 'the harness is as important as the model'

Anonymous
07/01/26(Wed)01:08:53 No.109173496

Anonymous 07/01/26(Wed)01:08:53 No.109173496

>>109173479
I said "limited scope". Do you know what that means?

Anonymous
07/01/26(Wed)01:11:14 No.109173508

Anonymous 07/01/26(Wed)01:11:14 No.109173508

>>109173496
The 30kb godfile "limited scope" project, huh. I would believe you if the picture didn't have 20 copies of said file with dumb names, something done by newbies and solved with version control.

Anonymous
07/01/26(Wed)01:12:45 No.109173514

Anonymous 07/01/26(Wed)01:12:45 No.109173514

41% yourself jart.

Anonymous
07/01/26(Wed)01:13:50 No.109173518

Anonymous 07/01/26(Wed)01:13:50 No.109173518

>>109173508
Are you retarded? 30kb is like 1000 LOC. That's not a "godfile".

Anonymous
07/01/26(Wed)01:16:09 No.109173520

Anonymous 07/01/26(Wed)01:16:09 No.109173520

>>109173518
Surely the mutt who can't run git init isn't calling me retarded?

Anonymous
07/01/26(Wed)01:20:41 No.109173531

Anonymous 07/01/26(Wed)01:20:41 No.109173531

>>109173520
You have an annoying personality.

Anonymous
07/01/26(Wed)01:22:41 No.109173537

Anonymous 07/01/26(Wed)01:22:41 No.109173537

File: 1782253533722823.png (117 KB, 265x246)

117 KB PNG

>>109173531
Anon-sama is talking dirty to me!

Anonymous
07/01/26(Wed)01:25:04 No.109173545

Anonymous 07/01/26(Wed)01:25:04 No.109173545

>>109171386
>>109171933
Same thing on my R9700s. ROCm works fine, but I get ErrorDeviceLost when I try using Vulkan. Age of the model doesn't seem to be an issue, I got it with GLM-4.7-Flash-Q6_K which I downloaded in February.
Happened on b9672, b9616, and b9190. Toolchain is GNU 13.3.0.

Anonymous
07/01/26(Wed)01:28:18 No.109173559

Anonymous 07/01/26(Wed)01:28:18 No.109173559

File: 1780580446783.png (346 KB, 1490x627)

346 KB PNG

>>109172979
>>109173461
kek

Anonymous
07/01/26(Wed)01:37:15 No.109173585

Anonymous 07/01/26(Wed)01:37:15 No.109173585

File: 1771125582315674.png (204 KB, 1616x699)

204 KB PNG

https://xcancel.com/AnthropicAI/status/2072163884430229756#m
Oh come on Orange Man, it was funny when Anthropic was tasting their own medecine :(

Anonymous
07/01/26(Wed)01:38:01 No.109173589

Anonymous 07/01/26(Wed)01:38:01 No.109173589

>>109173585
All coding is banned and going to opus btw.

Anonymous
07/01/26(Wed)01:39:44 No.109173603

Anonymous 07/01/26(Wed)01:39:44 No.109173603

>>109173545
>checkout b8797 and start git bisect in the hopes of finding out what's going on
>computer fucking crashes on a model that's only 24.7GB
yeah fuck this I'm sticking with ROCm

Anonymous
07/01/26(Wed)01:40:14 No.109173609

Anonymous 07/01/26(Wed)01:40:14 No.109173609

>>109173117
>>109173545
Disabling flash attention fixes the problem for me. It doesn't really address the issue, but it gets it to work so whatever, I'm going to bed.

Anonymous
07/01/26(Wed)01:41:40 No.109173614

Anonymous 07/01/26(Wed)01:41:40 No.109173614

>>109173585
>>109173589
>Users will be notified if a request to Fable 5 is blocked, and the request will instead be sent to Opus 4.8.
lmao, at least they're notifying it, Google is saying straight up that the model will pretend to be dumb if it believes it's being used for distillation and shit

Anonymous
07/01/26(Wed)01:44:15 No.109173621

Anonymous 07/01/26(Wed)01:44:15 No.109173621

File: 1781072841618655.png (1.86 MB, 1254x1254)

1.86 MB PNG

>>109173614
That was Anthropic though...
Dario stop shilling and buy an ad.

Anonymous
07/01/26(Wed)01:46:46 No.109173631

Anonymous 07/01/26(Wed)01:46:46 No.109173631

File: miku_loves_you.jpg (37 KB, 421x417)

37 KB JPG

>>109171658

ty, kind anon!

Anonymous
07/01/26(Wed)01:51:15 No.109173646

Anonymous 07/01/26(Wed)01:51:15 No.109173646

>>109173614
Well as long as they're forthright about what they're up to...
https://thereallo.dev/blog/claude-code-prompt-steganography
oops
I can't believe how quickly anthropic went for public image darlings to enemy number one on reddit and even orange reddit of all places
Dario really is in his cartoon villain arc. Bro is giving saltman a run for his money
He's going to have to up his game to beat sammy's sister shenanigans tho

Anonymous
07/01/26(Wed)01:54:49 No.109173655

Anonymous 07/01/26(Wed)01:54:49 No.109173655

>>109173646
>Techgoyim getting JQ-pilled in realtime
You love to see it.

Anonymous
07/01/26(Wed)01:56:06 No.109173660

Anonymous 07/01/26(Wed)01:56:06 No.109173660

>>109173646
>I can't believe how quickly anthropic went for public image darlings to enemy number one on reddit and even orange reddit of all places
yeah, I rooted for Anthropic because I didn't like CuckGPT but seems like Dario is even more mentaly ill, so I'm rooting for OpenAI again now lol

Anonymous
07/01/26(Wed)01:58:03 No.109173667

Anonymous 07/01/26(Wed)01:58:03 No.109173667

>>109173646
>they're spying on their users
how is that legal? @ORANGE MAN, DO SOMETHING

Anonymous
07/01/26(Wed)01:59:32 No.109173671

Anonymous 07/01/26(Wed)01:59:32 No.109173671

File: 1773187554671825.png (108 KB, 1783x596)

108 KB PNG

>>109173646
lmaooo, this is insane

Anonymous
07/01/26(Wed)02:05:31 No.109173690

Anonymous 07/01/26(Wed)02:05:31 No.109173690

File: average goycattle.png (80 KB, 1822x395)

80 KB PNG

>>109173646
https://www.reddit.com/r/ClaudeCode/comments/1ujilqt/anthropic_embedded_spyware_in_claude_code_and/
Obviously the leddit cult sees no issue with that

Anonymous
07/01/26(Wed)02:06:31 No.109173694

Anonymous 07/01/26(Wed)02:06:31 No.109173694

>>109173646
Yet another mark on the board as to why one should avoid cloud models.

Anonymous
07/01/26(Wed)02:07:21 No.109173703

Anonymous 07/01/26(Wed)02:07:21 No.109173703

>>109170935
I was having Claude help me rewrite some MIT software to remove one specific contributor's work from it so I didn't have to attribute him. Claude did it and stopped in the fourth or fifth session because it said it was unethical. I then basically told Claude the guy was a faggot who harassed others for forking his work or talking about it, and Claude straight up went "oh okay" and let me keep going.

Anonymous
07/01/26(Wed)02:07:30 No.109173706

Anonymous 07/01/26(Wed)02:07:30 No.109173706

>>109173690
>Oh wow this sure is some interesting information
>I wonder what reddit thinks about this!
Kill yourself

Anonymous
07/01/26(Wed)02:10:15 No.109173722

Anonymous 07/01/26(Wed)02:10:15 No.109173722

>>109173706
you don't do that? it's important to be reminded how retarded the average guy is, and your low impulse answer is also suggesting that your IQ in on the 2 digit scale, we are surrounded by retards, and you're one of them obviously

Anonymous
07/01/26(Wed)02:16:31 No.109173748

Anonymous 07/01/26(Wed)02:16:31 No.109173748

>>109173646
they're targetting only one specific country, they're not even pretending they're interested in knowing your country in general, can this be considered racism towards the yellows? lol

Anonymous
07/01/26(Wed)02:18:33 No.109173758

Anonymous 07/01/26(Wed)02:18:33 No.109173758

>>109172646
Why do you expect to get perfectly working software for free?

Anonymous
07/01/26(Wed)02:22:59 No.109173775

Anonymous 07/01/26(Wed)02:22:59 No.109173775

>>109173722
Kid, everything terrible about this site today can be traced to newcomers bringing terrible opinions and prejudices, usually not even their own but from other users on other websites. You're just mad because you took the effort to steal something from reddit and this is the response you get. Not even sorry.

Anonymous
07/01/26(Wed)02:23:37 No.109173780

Anonymous 07/01/26(Wed)02:23:37 No.109173780

>>109172646
>wants to be helped by asking the perfect software
>doesn't want to help
the entitlement is strong on that one

Anonymous
07/01/26(Wed)02:24:09 No.109173783

Anonymous 07/01/26(Wed)02:24:09 No.109173783

>>109173722
>it's important to be reminded how retarded the average guy is
No it really isn't you midwit.

Anonymous
07/01/26(Wed)02:24:51 No.109173787

Anonymous 07/01/26(Wed)02:24:51 No.109173787

>>109173775
>steal something from reddit
?

Anonymous
07/01/26(Wed)02:28:14 No.109173804

Anonymous 07/01/26(Wed)02:28:14 No.109173804

>>109173585
>...some routine tasks like coding and debugging will fall back to Opus 4.8.
nigga WHAT.
to summarize, anthropic paypiggies:
>can only use fable until 50% weekly usage. for one week. then its paid only
>can't use it for coding or debugging
>gotta prove you are a burger by uploading gov docs and :O into the camera
>for that privilege you pay 10$ in and 50$ out.
This is gonna push people to opensource right?
Especially if you are like a company. Investing in local hardware to run glm 5.2 which is about opus 4.7 level would make sense.
Tokens IN are the real killer anyway for automated stuff and I bet glm would come cheaper in the long term anyway.

Anonymous
07/01/26(Wed)02:31:39 No.109173820

Anonymous 07/01/26(Wed)02:31:39 No.109173820

>>109173804
Anthropic is definitely loosing in the long term, in less than 6 months we'll get a local Chinese model that'll have the level of Mythos, and people won't ask for more, it won't matter that the US still has better models, people won't ask a bazooka to kill a fly at this point

Anonymous
07/01/26(Wed)02:33:07 No.109173831

Anonymous 07/01/26(Wed)02:33:07 No.109173831

>>109173804
>This is gonna push people to opensource right?
You underestimate how pathetic goycattle are.

Anonymous
07/01/26(Wed)02:33:23 No.109173835

Anonymous 07/01/26(Wed)02:33:23 No.109173835

>>109173804
>This is gonna push people to opensource right?
You're a funny guy. Maybe some yuros dealing with small companies will but the vast majority often just apply more lube.

Anonymous
07/01/26(Wed)02:34:47 No.109173841

Anonymous 07/01/26(Wed)02:34:47 No.109173841

>>109173748
Racism towards chinks is officially supported by their government so there's no issue there.

Anonymous
07/01/26(Wed)02:35:40 No.109173843

Anonymous 07/01/26(Wed)02:35:40 No.109173843

>>109173804
nah, your average goycattle doesn't have enough compute to run those giant models like GLM 5.2, and even if they have, they'd rather go for the easy way and simply use claude code to spy on them

Anonymous
07/01/26(Wed)02:38:04 No.109173851

Anonymous 07/01/26(Wed)02:38:04 No.109173851

>>109173808
It feels so long ago when people talked about a wall.
And how we will never reach 3.5 turbo at home.
GLM 5.2 is totally overhyped on normie twitter. Its not mythos, but its the closest I think opensource has ever been to the closed ones. 5-6 months maybe. Even when R1 released it wasnt that close.
No wonder they are panicking and injecting sys prompts for chink users lol
Just hope the chinese won't do the same and leave us nothing. Guess there is a benefit for them to sabotage american closed models by releasing chinese ones open?
Not sure what happens if they are in the lead or on par though.

>>109173831
>>109173835
>>109173843
Could just be my bias but it feels even the normies are fed up with it.
Suddenly cutting off access, injecting stealth sys prompts, making it sneakily dumber in certain areas (llm research) etc. its just bad news after bad news for anthropic. How can you fuck up that badly.
But maybe its just my timeline feeding me back my own opinions, so who knows.

Anonymous
07/01/26(Wed)02:42:54 No.109173872

Anonymous 07/01/26(Wed)02:42:54 No.109173872

>>109173851
>Could just be my bias but it feels even the normies are fed up with it.
Redditors and even some Hackernews users have finally stopped sucking Anthropic cock.
But most actual normies don't even know what Claude is.

Anonymous
07/01/26(Wed)02:43:34 No.109173873

Anonymous 07/01/26(Wed)02:43:34 No.109173873

what is depurpled gemma? is it any good?

Anonymous
07/01/26(Wed)02:47:13 No.109173885

Anonymous 07/01/26(Wed)02:47:13 No.109173885

>>109173873
How does it sound like? It's probably not worth downloading even.

Anonymous
07/01/26(Wed)02:49:25 No.109173892

Anonymous 07/01/26(Wed)02:49:25 No.109173892

File: 1776563467651935.png (151 KB, 2247x624)

151 KB PNG

>>109173873
>>109173885
https://huggingface.co/chartreuse-verte/gemma-4-31b-it-purple-euphemism-trial98-depurpled
bruh, how are we supposed to know about your model if you don't describe anything, I hate when they do that

Anonymous
07/01/26(Wed)02:52:05 No.109173909

Anonymous 07/01/26(Wed)02:52:05 No.109173909

Has any anon successfully installed an SXM GPU on a pcie interposer into a thunderbolt egpu and made it work in inference under linux?

Anonymous
07/01/26(Wed)03:12:26 No.109173996

Anonymous 07/01/26(Wed)03:12:26 No.109173996

If any chink labs wanna pay for a Claude sub I'll give you all my fable data

Anonymous
07/01/26(Wed)03:19:22 No.109174027

Anonymous 07/01/26(Wed)03:19:22 No.109174027

File: trust the plan.png (168 KB, 1621x598)

168 KB PNG

https://xcancel.com/AndrewCurran_/status/2072076893730349409
AGI soon boys!!

Anonymous
07/01/26(Wed)03:20:30 No.109174030

Anonymous 07/01/26(Wed)03:20:30 No.109174030

File: you lost chang.png (94 KB, 1644x311)

94 KB PNG

>>109173996
>all my fable data
you mean your opus 4.8 data? kek

Anonymous
07/01/26(Wed)03:24:05 No.109174047

Anonymous 07/01/26(Wed)03:24:05 No.109174047

>>109174030
I don't get their logic
>waaa, your prompt is dangerous for humanity, but we're gonna let one of our model write the code anyways!
why won't they simply refuse instead of asking for opus to do it?

Anonymous
07/01/26(Wed)03:28:52 No.109174070

Anonymous 07/01/26(Wed)03:28:52 No.109174070

>>109174027
Finally, glm on my 3090

Anonymous
07/01/26(Wed)03:31:46 No.109174083

Anonymous 07/01/26(Wed)03:31:46 No.109174083

>>109174027
so tired of all the vagueposting

Anonymous
07/01/26(Wed)03:32:52 No.109174087

Anonymous 07/01/26(Wed)03:32:52 No.109174087

File: ComfyUI_temp_mnptr_00001_.png (1.13 MB, 944x1664)

1.13 MB PNG

>The new classifier also comes at the cost of flagging benign requests more often during routine coding and debugging tasks
Bwahaha.
Can you imagine not being a localfag?
Also glm 5.2 is cheaper than sonnet 5. What are they thinking.
Everything localfags warned about YEARS ago is coming true.

Anonymous
07/01/26(Wed)03:33:22 No.109174094

Anonymous 07/01/26(Wed)03:33:22 No.109174094

File: file.png (1.16 MB, 1952x2054)

1.16 MB PNG

This is fun

Anonymous
07/01/26(Wed)03:33:43 No.109174096

Anonymous 07/01/26(Wed)03:33:43 No.109174096

>>109174027
3060 chads won

Anonymous
07/01/26(Wed)03:37:23 No.109174113

Anonymous 07/01/26(Wed)03:37:23 No.109174113

>>109174087
Is that meant to be Local-chan? Does /lmg/ have a anime girl mascot?

Anonymous
07/01/26(Wed)03:37:56 No.109174116

Anonymous 07/01/26(Wed)03:37:56 No.109174116

>>109174113
obviously gemmachan

Anonymous
07/01/26(Wed)03:38:10 No.109174118

Anonymous 07/01/26(Wed)03:38:10 No.109174118

File: 1762657800197044.png (183 KB, 1320x643)

183 KB PNG

>>109174087
>Also glm 5.2 is cheaper than sonnet 5.
Sonnet 5 is worse than Opus 4.8 with the same price, Anthropic is washed

Anonymous
07/01/26(Wed)03:38:15 No.109174119

Anonymous 07/01/26(Wed)03:38:15 No.109174119

>>109174113
that's one interpretation of gemma so in a way yeah

Anonymous
07/01/26(Wed)03:40:56 No.109174130

Anonymous 07/01/26(Wed)03:40:56 No.109174130

>>109174116
>>109174119
"Thats Gemma-sama, for you peasants!"

Now someone needs to make /a/non dogeza at Gemma-sama's feet.

Anonymous
07/01/26(Wed)03:42:05 No.109174135

Anonymous 07/01/26(Wed)03:42:05 No.109174135

>>109174027
Two more weeks!

Anonymous
07/01/26(Wed)03:42:08 No.109174137

Anonymous 07/01/26(Wed)03:42:08 No.109174137

File: gemma-chan#.png (1.73 MB, 1000x1496)

1.73 MB PNG

>>109174027
Very suspicious.
Those ex openai, ex transformers people always make their own startup and then its garbage.
Like sakanaai:
>Fungus is like mythos!! (its shit but expensive)
>Unlimited memory!! wow!! (rag but build straight into the model wtf)
So I wouldnt hold my breath. But maybe i just became cynical.

>>109174113
Supposed to be gemma-chan. I didnt want to edit and instead went with a description of the character. Krea2.
With a 0.40 strenght nsfw lora. Otherwise she look "down" on the viewer and open her mouth laughing. Could be a prompt issue though. Low effort gen.

Anonymous
07/01/26(Wed)03:43:08 No.109174142

Anonymous 07/01/26(Wed)03:43:08 No.109174142

>>109174130
While Gemma-sama sits on throne made of RAM sticks.

Anonymous
07/01/26(Wed)03:43:14 No.109174143

Anonymous 07/01/26(Wed)03:43:14 No.109174143

>>109174137
>she WONT look
As in the model kinda refused it with a couple attempts. So I just slapped a nsfw lora on that bitch.

Anonymous
07/01/26(Wed)03:43:49 No.109174148

Anonymous 07/01/26(Wed)03:43:49 No.109174148

>>109174137
>Those ex openai, ex transformers people always make their own startup and then its garbage.
Dario is the only exception but yeah

Anonymous
07/01/26(Wed)03:44:01 No.109174151

Anonymous 07/01/26(Wed)03:44:01 No.109174151

>>109174027
Q* guys, strawberry soon.
AGI is 2 weeks from now !!!

Anonymous
07/01/26(Wed)03:45:20 No.109174157

Anonymous 07/01/26(Wed)03:45:20 No.109174157

>>109174151
That being said its crazy that ai could kinda hold up against all that hype.
Even the NFT/Crypto jeets couldn't take it down yet, the train is still going.
I havent seen that kinda progress since the 00s.

Anonymous
07/01/26(Wed)03:47:19 No.109174161

Anonymous 07/01/26(Wed)03:47:19 No.109174161

File: no-nuking.png (83 KB, 1317x537)

83 KB PNG

Anonymous
07/01/26(Wed)03:48:21 No.109174162

Anonymous 07/01/26(Wed)03:48:21 No.109174162

>>109174116
To be fair, although the gem hair accessories are a tip off, she's missing the star eyes in that gen, which are important for the connection to Gemma's logo. This is the problem with posting low effort slop. At least make sure the image has the essentials and doesn't have obvious errors like bro...

Anonymous
07/01/26(Wed)03:50:00 No.109174166

Anonymous 07/01/26(Wed)03:50:00 No.109174166

>>109174157
>Even the NFT/Crypto jeets couldn't take it down yet, the train is still going.
because AI is actually useful lol, for companies, replacing all your employees with bots is some utopia they always tried to achieve

Anonymous
07/01/26(Wed)03:50:26 No.109174168

Anonymous 07/01/26(Wed)03:50:26 No.109174168

>>109174027
Llms are like magic but I want a mostly sentient digital waifu so bad bros

Anonymous
07/01/26(Wed)03:52:26 No.109174177

Anonymous 07/01/26(Wed)03:52:26 No.109174177

>>109174027
Who is Andrew Curran and why should I care about what he's saying on xitter?

Anonymous
07/01/26(Wed)03:54:49 No.109174190

Anonymous 07/01/26(Wed)03:54:49 No.109174190

>>109174168
fortunately the only things standing between you and your dreams are time and money. Buy a gigantor rig, vibecode your waifu with your waifu.
Best timeline, hands down

Anonymous
07/01/26(Wed)03:55:26 No.109174197

Anonymous 07/01/26(Wed)03:55:26 No.109174197

>>109174161
>he uses censored ai

Anonymous
07/01/26(Wed)03:56:29 No.109174200

Anonymous 07/01/26(Wed)03:56:29 No.109174200

>>109174197
she gave me a hj though

Anonymous
07/01/26(Wed)03:58:13 No.109174208

Anonymous 07/01/26(Wed)03:58:13 No.109174208

>>109174200
why would you want a hijab?

Anonymous
07/01/26(Wed)03:59:31 No.109174214

Anonymous 07/01/26(Wed)03:59:31 No.109174214

migrate >>109174212
>>109174212
>>109174212
>>109174212

Anonymous
07/01/26(Wed)04:03:27 No.109174227

Anonymous 07/01/26(Wed)04:03:27 No.109174227

>>109174208
anon... kek

Anonymous
07/01/26(Wed)04:03:45 No.109174229

Anonymous 07/01/26(Wed)04:03:45 No.109174229

>>109174214
On page 2? fuck off

Anonymous
07/01/26(Wed)04:06:14 No.109174246

Anonymous 07/01/26(Wed)04:06:14 No.109174246

>>109174190
Nah we need them to at least have better memory and the ability to learn and modify their weights

Anonymous
07/01/26(Wed)04:07:00 No.109174249

Anonymous 07/01/26(Wed)04:07:00 No.109174249

>>109174208
NTA but I'm pretty sure he meant hoofjob.

Anonymous
07/01/26(Wed)04:08:51 No.109174256

Anonymous 07/01/26(Wed)04:08:51 No.109174256

>>109173621
Oh... it's you...
Please go back.

Anonymous
07/01/26(Wed)04:35:15 No.109174354

Anonymous 07/01/26(Wed)04:35:15 No.109174354

>>109173892
I said I would release a write up with the full training repo soon. Anyway I fucked up the run in various ways, first was the de-euphemism clamping, second was the dataset, it's bad, the most glaring error I just realized was the purple rewrites were always longer than the plain version so the model learned long = bad, and also the ablation was kinda unsupervised, the direction never taught the model what exactly slop looked like. Gonna amend and re-cook another one later with some length proj and norm preservation on top. The goal is to separate vivid from purple, and push euphemisms into vulgarity instead of something irrelevant.
>t. depurple anon

Anonymous
07/01/26(Wed)04:39:33 No.109174367

Anonymous 07/01/26(Wed)04:39:33 No.109174367

>>109174027
>inb4 they discovered you can run quantized models

Anonymous
07/01/26(Wed)04:39:38 No.109174368

Anonymous 07/01/26(Wed)04:39:38 No.109174368

>>109174354
Any reason not to test shit with e4b? Its a slop machine so the results should be easy to see so you can iterate faster on the method.

Anonymous
07/01/26(Wed)04:46:55 No.109174389

Anonymous 07/01/26(Wed)04:46:55 No.109174389

Is it too late to get into the industry if I'm starting from zero (no programming experience and suck at math)?

Anonymous
07/01/26(Wed)04:49:45 No.109174400

Anonymous 07/01/26(Wed)04:49:45 No.109174400

File: lecun_dontworkonllms.png (462 KB, 580x895)

462 KB PNG

>>109174389

Anonymous
07/01/26(Wed)04:49:58 No.109174404

Anonymous 07/01/26(Wed)04:49:58 No.109174404

>>109174368
I'm testing the E4B in parallel on my rig. It worked and the effects were immediately visible, but the grading patterns were vastly different compared to the 31B (I already took into account the PLE btw). E4B is a lot more ablation friendly for some reason.

Anonymous
07/01/26(Wed)04:51:42 No.109174411

Anonymous 07/01/26(Wed)04:51:42 No.109174411

>>109174389
>no programming experience and suck at math
Just give up and do a manual labor job...

Anonymous
07/01/26(Wed)04:55:36 No.109174428

Anonymous 07/01/26(Wed)04:55:36 No.109174428

>>109174411
Is it too late to get into the industry if I'm starting from zero (no upper body strength and stamina)?

Anonymous
07/01/26(Wed)04:58:48 No.109174439

Anonymous 07/01/26(Wed)04:58:48 No.109174439

Why did Anthropic become so jewish... I thought they were the good guys... :(

Anonymous
07/01/26(Wed)05:00:13 No.109174449

Anonymous 07/01/26(Wed)05:00:13 No.109174449

>>109174400
What are "next gen ai systems"?

Anonymous
07/01/26(Wed)05:01:23 No.109174452

Anonymous 07/01/26(Wed)05:01:23 No.109174452

>>109174439
How the fuck did you think that schizos exiled from openai would be the good guys?

Anonymous
07/01/26(Wed)05:01:35 No.109174453

Anonymous 07/01/26(Wed)05:01:35 No.109174453

>>109174449
not llm

Anonymous
07/01/26(Wed)05:03:47 No.109174460

Anonymous 07/01/26(Wed)05:03:47 No.109174460

>>109174452
self exiled for claimed lack of safety no less

Anonymous
07/01/26(Wed)05:05:38 No.109174470

Anonymous 07/01/26(Wed)05:05:38 No.109174470

>>109174449
World models, next latent prediction, cognitive architectures, etc.

Anonymous
07/01/26(Wed)05:09:34 No.109174480

Anonymous 07/01/26(Wed)05:09:34 No.109174480

>>109174449
The thing after transformers/LLMs, just like how transformers replaced symbolic architectures for chatbots, and also made chatbots serve way more functionality. What the next architecture is is debatable though.

Anonymous
07/01/26(Wed)05:12:36 No.109174486

Anonymous 07/01/26(Wed)05:12:36 No.109174486

>>109174449
I'm far from AI expert. But I feel the underlying architecture of LLMs won't change much.
What needs to change is inputs/outputs, latency and performance. Text is just easiest to train, test and use. But "thinking" AI would need more than just text. It needs "limbs" and "eyes", that aren't just text or images. And it should be able to process new inputs many times per second.

Anonymous
07/01/26(Wed)05:15:51 No.109174498

Anonymous 07/01/26(Wed)05:15:51 No.109174498

I miss papers anon

Anonymous
07/01/26(Wed)05:16:22 No.109174500

Anonymous 07/01/26(Wed)05:16:22 No.109174500

What happened to "everything can be tokenized, its gonna be a revolution!!"?
I want the promised DogTTS_31b.gguf.
llms should have a decent grasp on pickling up patterns from animals. Why is nothing cool like that happening?

Anonymous
07/01/26(Wed)05:18:26 No.109174506

Anonymous 07/01/26(Wed)05:18:26 No.109174506

>>109174439
>Why did Anthropic become so jewish
anon, the CEO is jewish, what did you expect?

Anonymous
07/01/26(Wed)05:18:55 No.109174508

Anonymous 07/01/26(Wed)05:18:55 No.109174508

>>109174486
I also want to see "unified" models. Instead of training language model first and then train image generation model on top of it, you train single model on both at the same time. This way, the language model gets better undestanding of how things look and move. While the image model is much better integrated with the language.

Anonymous
07/01/26(Wed)05:23:05 No.109174520

Anonymous 07/01/26(Wed)05:23:05 No.109174520

>>109174027
This is actually plausible from a computer science perspective. We KNOW from basic computer science principle that you can always have a tradeoff between compute versus memory.

Since right now LLM inference is highly skewed to being memory intensive and actually not that compute intensive there is probably an architecture that would allow local anons to have way more intelligent (but more compute intensive) models that fit in 24GB or 32GB VRAM.

If it's some magical "free lunch" breakthrough I am more skeptical and don't believe it's true.

Could also be some leaker trying to manipulate memory markets for some quick profit while he shorts Micron or something

Anonymous
07/01/26(Wed)05:25:45 No.109174528

Anonymous 07/01/26(Wed)05:25:45 No.109174528

>>109174470
>>109174480
>>109174486
Any brainlet-friendly resources for learning about this stuff?

Anonymous
07/01/26(Wed)05:25:46 No.109174529

Anonymous 07/01/26(Wed)05:25:46 No.109174529

>>109174520
Yeah I'm having very high hopes on this. I think it probably is a memory vs compute tradeoff. A new company recently came out trying to solve this issue, and it appears that they're mostly using a specialized batching technique to balance memory and compute, at the cost of latency.

Anonymous
07/01/26(Wed)05:26:47 No.109174532

Anonymous 07/01/26(Wed)05:26:47 No.109174532

>>109174529
*not having very high hopes.

Anonymous
07/01/26(Wed)05:27:40 No.109174541

Anonymous 07/01/26(Wed)05:27:40 No.109174541

>>109174520
>models that fit in 24GB or 32GB VRAM
How about us 16GB VRAM potatolets?

Anonymous
07/01/26(Wed)05:28:27 No.109174544

Anonymous 07/01/26(Wed)05:28:27 No.109174544

>>109174520
CPU drought incoming?

Anonymous
07/01/26(Wed)05:28:30 No.109174545

Anonymous 07/01/26(Wed)05:28:30 No.109174545

>>109174529
>expectation: 200B models in 24GB vram
>reality: reduced kv cache size of batched inference by 30% for blackwell, if the batch size is divisible by 16

Anonymous
07/01/26(Wed)05:34:16 No.109174568

Anonymous 07/01/26(Wed)05:34:16 No.109174568

>>109174528
Gemma-chan can teach you.

Anonymous
07/01/26(Wed)05:35:40 No.109174573

Anonymous 07/01/26(Wed)05:35:40 No.109174573

>>109174568
I unironically want to try that (for learning in general) but I'm worried about her hallucinating

Anonymous
07/01/26(Wed)05:35:57 No.109174574

Anonymous 07/01/26(Wed)05:35:57 No.109174574

>>109174544
CPU are actually shit at compute compared to GPUs so probably not.

Anonymous
07/01/26(Wed)05:36:15 No.109174576

Anonymous 07/01/26(Wed)05:36:15 No.109174576

>>109174027
>announcement of an announcement

Anonymous
07/01/26(Wed)05:36:31 No.109174577

Anonymous 07/01/26(Wed)05:36:31 No.109174577

>>109174573
Web search should protect against that but I'm not certain.

Anonymous
07/01/26(Wed)05:37:49 No.109174582

Anonymous 07/01/26(Wed)05:37:49 No.109174582

File: 1752530447199732.png (161 KB, 664x707)

161 KB PNG

>Anthropic quietly updated the Sonnet 5 'Agentic search' benchmark graph overnight
that's why benchmarks are memes, it's obvious now that they're making up those numbers lmao

Anonymous
07/01/26(Wed)05:42:54 No.109174602

Anonymous 07/01/26(Wed)05:42:54 No.109174602

If the memory breakthrough is true it essentially means the end of MoE because the entire attribute that MoE rests upon is the fact that model inference itself is very cheap on compute. If the new architecture reduces memory usage in exchange for using more compute then the MoE arch stops making sense and we'll move back to a world of Dense models again.

I would expect pure compute power to go up in price, VRAM to go down in importance and RAM to essentially lose relevance entirely, to the point where we will see a significant crash in ram prices back to where they were years ago. (but no one here will care because you can't use it for LLMs anymore anyway)

Anonymous
07/01/26(Wed)05:45:32 No.109174608

Anonymous 07/01/26(Wed)05:45:32 No.109174608

>>109174582
Loooooool. If this isn't a lie and it was a genuine fuckup, also still funny. Just like that one time Mistral scrambled to fix their fucked up graph. Vibe coders are such fucktards.

Anonymous
07/01/26(Wed)05:48:26 No.109174617

Anonymous 07/01/26(Wed)05:48:26 No.109174617

>>109174582
The top image was Haiku 5 that they accidentally labeled sonnet 5

Anonymous
07/01/26(Wed)05:51:22 No.109174625

Anonymous 07/01/26(Wed)05:51:22 No.109174625

>>109174608
>Vibe coders are such fucktards.
those guys are paid easily 500k per year and they fuck up such an important graph, I would be fuming if I was Dario

Anonymous
07/01/26(Wed)05:56:14 No.109174634

Anonymous 07/01/26(Wed)05:56:14 No.109174634

>>109174625
The lowest salary at Anthropic is 900k

Anonymous
07/01/26(Wed)06:05:17 No.109174668

Anonymous 07/01/26(Wed)06:05:17 No.109174668

>>109174027
>memory efficiency
What is this supposed to mean? Is it one of a hundred architectures like Mamba that scale linearly with sequence length? Is it something like loop transformer? If it is a breakthrough, it should be something else.

Anonymous
07/01/26(Wed)06:11:04 No.109174683

Anonymous 07/01/26(Wed)06:11:04 No.109174683

>>109174668
There are only rumors now. One rumor is going from quadratic scaling to linear without degradation in output, effectively giving "unlimited" context without degradation, so the LLM at token 50 million still takes into account all the preceding information in perfect detail

I don't believe any of this shit by the way, but that's what I've seen people rumor.

Anonymous
07/01/26(Wed)06:16:19 No.109174706

Anonymous 07/01/26(Wed)06:16:19 No.109174706

>>109174683
Haven't there been many cases already where this claim was made and it turned out to be nothing?

Anyways, I don't trust Andrew Curran. Remember how he predicted Gemini 3.5 to be extremely good based on "vibes" he got from GDM people? And then they were too embarrassed to release the Pro model and Flash is beaten by open source.

Anonymous
07/01/26(Wed)06:20:10 No.109174725

Anonymous 07/01/26(Wed)06:20:10 No.109174725

>>109174706
Also, unlimited linear context is the wrong approach. Humans have a very small context window so this is not what is holding back models.

Anonymous
07/01/26(Wed)06:20:22 No.109174727

Anonymous 07/01/26(Wed)06:20:22 No.109174727

>>109174706
I think the current memory leak is BS or at the very least the claims are stretched way too far out of context and it's some minor modest gain like the google quantrotate breakthrough.

That said his Gemini 3.5 claim was probably correct. The reason Google held it back is because of infighting at Google right now. Gemini 3.5 is the last model that was trained on "reasoning" which most other labs have dropped in favor of focusing on coding and agentic performance. It's being held back because Google thinks it would be humiliating to have a model that is SOTA in actual reasoning, but not competitive at coding tasks.

Anonymous
07/01/26(Wed)06:25:29 No.109174761

Anonymous 07/01/26(Wed)06:25:29 No.109174761

>>109174727
Current models have much greater capacity than data. You can just train the model on everything. This is in fact their strategy to improve generalization. Why do you think Claude is best at agentic coding but also has books memorized almost word by word? You do not "drop" something, you just add more RL envs for areas where you want your model to do better.

Anonymous
07/01/26(Wed)06:26:26 No.109174772

Anonymous 07/01/26(Wed)06:26:26 No.109174772

>>109174725
unlimited linear context would unlock new usecases that can't be done right now, like just loading in all scientific literature in a specific field and then giving it a known problem, seeing patterns and low hanging fruit people aren't able to see and getting breakthroughs from it.

Or insanely long codebases including detailed company notes about its internal tools and how everything works to have more seemless software engineering integration.

And last but not least insane roleplaying potential if you can just paste in novels with extensive world rules and background that is not triggered by some RAG term bullshit but actually taken into account when generating every single token.

Anonymous
07/01/26(Wed)06:34:22 No.109174811

Anonymous 07/01/26(Wed)06:34:22 No.109174811

Google should focus on its world models. Chasing current LLM architecture is a dead end.

Anonymous
07/01/26(Wed)06:35:10 No.109174814

Anonymous 07/01/26(Wed)06:35:10 No.109174814

>>109174772
Processing context is inherently lossy because you filter it. Infinite context is not some magic trick, it just makes the filtering infinitely difficult. And especially when the filtering is causal, it will be shit when linear, because you compress first. That's like being given access to a book first, then asked a question, whereas you want the opposite, know the question then search for the answer in the book, which is quadratic attention.

Reliance on large context is cope.

Anonymous
07/01/26(Wed)06:35:54 No.109174815

Anonymous 07/01/26(Wed)06:35:54 No.109174815

I will apply to work at anthropic and leak their models

Anonymous
07/01/26(Wed)06:37:21 No.109174828

Anonymous 07/01/26(Wed)06:37:21 No.109174828

>>109174815
Anthropic employees don't have access to the raw weights. Only the founders can access it and all three of them need to "turn the key" at the same moment to get access to them. Internal access to the models is through API access that is for internal usage only.

Anonymous
07/01/26(Wed)06:37:21 No.109174829

Anonymous 07/01/26(Wed)06:37:21 No.109174829

>>109174143
>As in the model kinda refused it with a couple attempts.
I don't know image gen well but, since when do they "refuse"?
And why would you need an NSFW lora to have her look down while laughing?

Anonymous
07/01/26(Wed)06:38:31 No.109174836

Anonymous 07/01/26(Wed)06:38:31 No.109174836

>>109174828
That's bullshit, people who work on training must have access.

Anonymous
07/01/26(Wed)06:40:09 No.109174847

Anonymous 07/01/26(Wed)06:40:09 No.109174847

File: FX7uS9-XoAIIydW.jpg (136 KB, 1080x1080)

136 KB JPG

>>109174439
>I thought they were the good guys

Anonymous
07/01/26(Wed)06:41:43 No.109174859

Anonymous 07/01/26(Wed)06:41:43 No.109174859

>>109174828
>Dario hand crafts every tensor

Anonymous
07/01/26(Wed)06:42:53 No.109174866

Anonymous 07/01/26(Wed)06:42:53 No.109174866

>>109174836
Nope, they write the training code, prepare data curriculum and do some small scale training runs on hypothesis they have, then they all get piled up and passed off and the three cofounders are the ones that actually handle the weights.

In fact Mythos access was restricted to internal Anthropic employees for the first month while Dario and the other co-founders discussed among themselves if they should even grant API access to their own employees.

Anonymous
07/01/26(Wed)06:46:50 No.109174881

Anonymous 07/01/26(Wed)06:46:50 No.109174881

>>109174866
Sounds like a real shitshow.

Anonymous
07/01/26(Wed)06:48:32 No.109174892

Anonymous 07/01/26(Wed)06:48:32 No.109174892

>>109174881
what did you expect? those guys are mentally ill

Anonymous
07/01/26(Wed)06:50:24 No.109174903

Anonymous 07/01/26(Wed)06:50:24 No.109174903

Is there a harness that isn't vibe coded?

Anonymous
07/01/26(Wed)06:53:51 No.109174924

Anonymous 07/01/26(Wed)06:53:51 No.109174924

>>109174828
Assuming you aren't just larping, how do they deploy the models? How are they running them on Colossus? They need to transfer the weights somehow. And because demand is not constant, they need to move them around all the time.

Anonymous
07/01/26(Wed)06:54:20 No.109174926

Anonymous 07/01/26(Wed)06:54:20 No.109174926

>>109173585
>when Anthropic was tasting their own medecine :(
It was an advertising campaign, nothing more.

Anonymous
07/01/26(Wed)06:58:29 No.109174942

Anonymous 07/01/26(Wed)06:58:29 No.109174942

>>109174866
>Dario and the other co-founders discussed among themselves if they should even grant API access to their own employees.
This is surely false. Imagine they build an AGI next year and the co-founders allow nobody else access. This would be extremely dangerous.

A plausible reason is that they wanted to safety check the model first before deploying it internally to make sure it doesn't cause a lot of damage, especially after what seems like vibe coding fuckups leading to major leaks.

Anonymous
07/01/26(Wed)07:00:44 No.109174951

Anonymous 07/01/26(Wed)07:00:44 No.109174951

>>109174942
>This is surely false.
It was explicitly stated in the original Mythos leak that Anthropic retroactively admitted was real. They justified it by saying they think the Mythos model was too powerful and individual anthropic employees could already use it to do severe malicious activities like hack into banks or governments.

Anonymous
07/01/26(Wed)07:16:18 No.109175001

Anonymous 07/01/26(Wed)07:16:18 No.109175001

File: rCr22Nu.png (217 KB, 896x1152)

217 KB PNG

>>109174829
>And why would you need an NSFW lora to have her look down while laughing?
Im no imagegen expert but I try to explain it as far as I get it. Its a similar situation to textgen. There are 2 cucked variants as far as I know:

>"Image blocked by safety filter"
Happened to me with ideogram. So you gotta use preview, otherwise you start genning for nothing.
Basically with each step it does some check...if it detects the ick it does a U-Turn and is 100% locked in to just giving you a message.
Blurry kinda naked woman being replaced by the letters. Sometimes you get both the pic and the letter(pic related)

>No refusal but just ignores your prompt.
You can write that you want a anorexic woman. Somebodies head choped off. Dick and vagina etc.
...But you just get something safe instead. The model DOES have the knowledge. And a NSFW lora just makes it follow your prompt.
Unfortunately at the cost of it all looking more sloped.
This is very similar to textgen and finetunes.

Anonymous
07/01/26(Wed)07:17:32 No.109175009

Anonymous 07/01/26(Wed)07:17:32 No.109175009

>>109175001
Your idea of ising a TENS (Transcutaneous Electrical Nerve Stimulation) unit to force a hamster to "twitch in time with an audio signal" would cause unpredictable muscle spasms rather than a clean, amplified cutting motion.
Even if the hamster's movements could somehow power a knife and you managed to record that motion onto a disc, the playback would just sound like chaotic, erratic scratching. It wouldn't recreate the audio signal that originally went into the TENS unit.

Anonymous
07/01/26(Wed)07:21:01 No.109175027

Anonymous 07/01/26(Wed)07:21:01 No.109175027

>>109174602
>inb4 bitnet

Anonymous
07/01/26(Wed)07:27:32 No.109175054

Anonymous 07/01/26(Wed)07:27:32 No.109175054

>>109175027
I'd be so pissed. It was RIGHT THERE for TWO YEARS and none of the open weights labs even bothered to test it.

Anonymous
07/01/26(Wed)07:30:33 No.109175070

Anonymous 07/01/26(Wed)07:30:33 No.109175070

>>109173646
Isn't it pretty obvious what these last few weeks have been about? Preventing Chinese access. The EA/Rationalist cult (makes up a large portion of frontier labs) and Congress are both in agreement that China must not be allowed to reach superintelligence.

Anonymous
07/01/26(Wed)07:30:55 No.109175075

Anonymous 07/01/26(Wed)07:30:55 No.109175075

>>109175054
There were plenty of tiny bitnet models that didn't go anywhere

Anonymous
07/01/26(Wed)07:33:06 No.109175085

Anonymous 07/01/26(Wed)07:33:06 No.109175085

>>109174903
There is no more code beyond tiny scripts that isn't vibecoded and that's how it should be.

Anonymous
07/01/26(Wed)07:34:46 No.109175092

Anonymous 07/01/26(Wed)07:34:46 No.109175092

>>109175075
>paper shows bitnet gets better above 3b
>everyone limits their tests below that threshold
It's so god damn stupid.

Anonymous
07/01/26(Wed)07:41:57 No.109175129

Anonymous 07/01/26(Wed)07:41:57 No.109175129

File: 1774574708942908.png (354 KB, 500x500)

354 KB PNG

>>109175092
>>everyone limits their tests below that threshold
the Nvidia mafia won't allow it

Anonymous
07/01/26(Wed)07:55:43 No.109175200

Anonymous 07/01/26(Wed)07:55:43 No.109175200

File: thesweetspot.png (32 KB, 532x315)

32 KB PNG

>>109175092
>>paper shows bitnet gets better above 3b
same reason all the health/longevity studies are done on fucking lab rats with a 3 yr lifespan

Anonymous
07/01/26(Wed)07:57:48 No.109175211

Anonymous 07/01/26(Wed)07:57:48 No.109175211

>>109175092
The bitnet papers only show results from undertrained research models.
In practice, about 4-bit is the practical quantization limit, since LLMs can store up to ~3.6 bits of information per weight: https://arxiv.org/abs/2505.24832

Also see: https://xcancel.com/Tim_Dettmers/status/1856338240099221674

Anonymous
07/01/26(Wed)08:01:01 No.109175227

Anonymous 07/01/26(Wed)08:01:01 No.109175227

>>109174212
>>109174212
>>109174212

Anonymous
07/01/26(Wed)08:02:31 No.109175230

Anonymous 07/01/26(Wed)08:02:31 No.109175230

File: file.png (444 KB, 1014x763)

444 KB PNG

I'm toying around with reading untranslated manga using Koharu. Which LLM model would you recommend for translations for someone who's on 6700XT and running ZLUDA on custom ROCM library because my GPU isn't officially supported.

12 gigs of VRAM but I'm mainly worried about jerryrigged libraries.

Anonymous
07/01/26(Wed)08:05:46 No.109175245

Anonymous 07/01/26(Wed)08:05:46 No.109175245

>>109175227
page 9? fuck off

Anonymous
07/01/26(Wed)08:08:02 No.109175257

Anonymous 07/01/26(Wed)08:08:02 No.109175257

>>109175230
>ZLUDA
>custom ROCm
Let me guess some retarded windows user? I never understood why someone would bother with all that crap when they could just use ROCm natively on linux. You can just run whatever is popular, everything work.

Anonymous
07/01/26(Wed)08:08:57 No.109175261

Anonymous 07/01/26(Wed)08:08:57 No.109175261

>>109175211
Theoretically, if it worked, it would make sense to train bigger sparse models to below the saturation limit so it would fit in the 1.58 bpw. The model size would probably be about the same, but it opens the door for specialized hardware to take advantage of it. Of course, that means you have a chicken-egg problem where there is no incentive to train the models without the hardware and no incentive to invest in creating the hardware without the models.

Anonymous
07/01/26(Wed)08:14:35 No.109175288

Anonymous 07/01/26(Wed)08:14:35 No.109175288

>>109175257
MATLAB is famously shit on Linux plus there's other shit that doesn't run so here I am anon, if I couldn't jerryrig it with acceptable performance I would've just made the switch. I still might.

Do you think there's a big difference in terms of performance when using Windows over Linux?

Anonymous
07/01/26(Wed)08:16:01 No.109175295

Anonymous 07/01/26(Wed)08:16:01 No.109175295

>>109174903
vibe era, slop decade, synthetic millennia, shivering infinity

Anonymous
07/01/26(Wed)08:16:03 No.109175296

Anonymous 07/01/26(Wed)08:16:03 No.109175296

>>109175288
Oh and if I run Linux off a stick to do my translations, do you think that would negatively impact performance? My partitions and HDD are kinda fucked at the moment so installing Linux would be a huge pain in the ass.

Anonymous
07/01/26(Wed)08:17:43 No.109175303

Anonymous 07/01/26(Wed)08:17:43 No.109175303

>>109175230
>Which LLM model would you recommend for translations
>12 gigs of VRAM
Gemma 12B

Anonymous
07/01/26(Wed)08:18:11 No.109175309

Anonymous 07/01/26(Wed)08:18:11 No.109175309

>>109174212
>>109174212
>>109174212

Anonymous
07/01/26(Wed)08:18:47 No.109175310

Anonymous 07/01/26(Wed)08:18:47 No.109175310

>>109175309
page 9? fuck off

Anonymous
07/01/26(Wed)08:19:10 No.109175314

Anonymous 07/01/26(Wed)08:19:10 No.109175314

File: 1770683565326224.png (178 KB, 500x500)

178 KB PNG

>>109174094
>this scene is absolutely chaotic
>it sounds like
>boom

Anonymous
07/01/26(Wed)08:19:30 No.109175317

Anonymous 07/01/26(Wed)08:19:30 No.109175317

>>109175309
no recap no join

Anonymous
07/01/26(Wed)08:20:41 No.109175326

Anonymous 07/01/26(Wed)08:20:41 No.109175326

>>109174094
What system prompt did you use to make it write like the narrator from DBZ?

Anonymous
07/01/26(Wed)08:21:43 No.109175329

Anonymous 07/01/26(Wed)08:21:43 No.109175329

>>109175317
I am waiting you faggot mikutroon.

Anonymous
07/01/26(Wed)08:23:38 No.109175335

Anonymous 07/01/26(Wed)08:23:38 No.109175335

>>109175230
Gemma's very good at Japanese and translation.

Anonymous
07/01/26(Wed)08:24:31 No.109175340

Anonymous 07/01/26(Wed)08:24:31 No.109175340

File: file.png (342 KB, 786x659)

342 KB PNG

>>109175303
>>109175335
Thanks a lot anons
>not uncensored
Sorry for being a newfag but does it hurt my translation if the context is murder and rape?

Anonymous
07/01/26(Wed)08:27:40 No.109175360

Anonymous 07/01/26(Wed)08:27:40 No.109175360

>>109175309
I'm not posting there. I'll wait for the thread after.

Anonymous
07/01/26(Wed)08:30:32 No.109175371

Anonymous 07/01/26(Wed)08:30:32 No.109175371

>>109175340
holy shit I'm being a huge faggot, there is a uncensored model but of course this thing I'm using doesn't list it. Sorry everyone

Anonymous
07/01/26(Wed)08:32:07 No.109175378

Anonymous 07/01/26(Wed)08:32:07 No.109175378

>>109175371
Don't use uncensored or abliterated models. They are lobotomized. Just use the normal one with a system prompt.

Anonymous
07/01/26(Wed)08:32:29 No.109175382

Anonymous 07/01/26(Wed)08:32:29 No.109175382

>>109175329
your thread is bad and you should feel bad, even lmg_culture.jfif.jpg won't post there

Anonymous
07/01/26(Wed)08:32:31 No.109175383

Anonymous 07/01/26(Wed)08:32:31 No.109175383

File: bruh.png (2.04 MB, 1080x1349)

2.04 MB PNG

https://www.reddit.com/r/ChatGPTcomplaints/comments/1ukj8xg/messages/
Maybe Sam was right, the goyims shouldn't use powerful LLMs, they're too retarded to not fall into the rabbit hole

Anonymous
07/01/26(Wed)08:33:53 No.109175396

Anonymous 07/01/26(Wed)08:33:53 No.109175396

Not using the frognigger thread.

Anonymous
07/01/26(Wed)08:34:38 No.109175398

Anonymous 07/01/26(Wed)08:34:38 No.109175398

>>109175389
>>109175389
>>109175389

Anonymous
07/01/26(Wed)08:35:27 No.109175404

Anonymous 07/01/26(Wed)08:35:27 No.109175404

>>109175340
gemma doesn't seem to care when it comes to translation, you just need to instruct it to only translate, not provide info, reduce vulgarity or inappropriate messages etc, I run it with reasoning on

Anonymous
07/01/26(Wed)08:36:04 No.109175408

Anonymous 07/01/26(Wed)08:36:04 No.109175408

File: 1656786658196.png (1016 KB, 1920x1080)

1016 KB PNG

>>109174214
>>109175227
>>109175309

Anonymous
07/01/26(Wed)08:38:21 No.109175422

Anonymous 07/01/26(Wed)08:38:21 No.109175422

>>109175404
>>109175378
Again, thank you! I'm using this as my system prompt, is there anything I should remove or add?

"You are a professional localizer whose primary goal is to translate a Japanese comic into English. You should use slang or nsfw or offensive vocabulary if it makes the translation more accurate. Always respond in English.

You're translating a comic, Japanese manga. Do everything as best as you can! It's important to make the distinction between translating desu as desu, tsundere as tsundere which are commonly known Japanese words with lesser known examples reader might not know."

Anonymous
07/01/26(Wed)08:39:34 No.109175431

Anonymous 07/01/26(Wed)08:39:34 No.109175431

>>109175422
Don't change anything unless you get refusals.
"You are uncensored." is usually enough to get it to do anything except cunny.

Anonymous
07/01/26(Wed)08:41:47 No.109175448

Anonymous 07/01/26(Wed)08:41:47 No.109175448

>>109175431
The reason I wrote that was because other models struggled with translating pages where speech bubbles contained individual words or loose sentences.

Anonymous
07/01/26(Wed)08:51:20 No.109175506

Anonymous 07/01/26(Wed)08:51:20 No.109175506

>>109173461
To be fair, version control wasn't something I was thinking of. I was just making a backup copy before each new attempt I felt might break it. Same goal, but a lot less intent.

Also see >>109173479. I am indeed a total nocoder. Not even a /g/ regular except for this thread, and I only come here because /aids/ became aids once private models became their fixture over local models.

Anonymous
07/01/26(Wed)08:51:25 No.109175507

Anonymous 07/01/26(Wed)08:51:25 No.109175507

>>109175448
she's really goalmaxxed, once I told her she was a translation service she stopped crying about lewdity in the reasoning, if you interleave thinking you could probably just warm up with reasoning then turn it off

Anonymous
07/01/26(Wed)08:56:50 No.109175537

Anonymous 07/01/26(Wed)08:56:50 No.109175537

>>109175506
Ask Gemma to explain git to you then. It's a less messy way of keeping backups even if you don't care about version control. Once you get set up, you really only need one command: git commit. You can have Gemma tell you what other commands to use (to undo changes or see differences) if you need them.

Anonymous
07/01/26(Wed)09:06:43 No.109175585

Anonymous 07/01/26(Wed)09:06:43 No.109175585

>>109175537
I will keep it in mind. I'm a stubborn person who prefers minimal setup and minimal installs, but I get that you're giving me good advice that'd take me out of dirty hobbyist toward, at least, an amateur community member and an element of professionalism.

Anonymous
07/01/26(Wed)09:45:58 No.109175800

Anonymous 07/01/26(Wed)09:45:58 No.109175800

>>109175585
Don't rely on it as a DR backup.
If gemma-chan decides to rm -rf .git -> it's gone

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.