/g/ - Technology






File: miku and friends.png (3.16 MB, 2016x1152)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106888625 & >>106879668

►News
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking
>(10/11) koboldcpp-1.100.1 prebuilt released with Wan video generation support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.100.1
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikubugs.jpg (100 KB, 1077x796)
►Recent Highlights from the Previous Thread: >>106888625

--Optimizing GLM Air performance with DDR4/DDR5 and VRAM configurations:
>106889300 >106889313 >106889330 >106889352 >106889360 >106889397 >106889434 >106889482 >106889432 >106889458 >106889745 >106889970 >106890067 >106890094
--NVIDIA power settings affecting DGX Spark performance in llama.cpp:
>106894917 >106895166
--DIY synth project with SDL2 and braille terminal output:
>106894166 >106894928 >106895017 >106895264
--Skepticism about DGX Spark's practicality:
>106888768 >106888792 >106888864 >106889010 >106889150 >106889186 >106890419 >106890523 >106891031 >106890245 >106890298 >106890355 >106890421 >106890450 >106890484 >106890626
--Critique of AI benchmarking methods and real-world capability tests:
>106892598 >106892617 >106892632 >106892639 >106892674
--Qwen3-VL implementation in llama.cpp and anime drawing reference:
>106889098
--Speculation about Google Gemini 3.0 Pro surpassing transformers in AI capabilities:
>106892372 >106892386 >106892395 >106892429 >106892438 >106892441 >106892393 >106892399 >106892442 >106892453 >106892410 >106892417 >106892416 >106892434 >106892478 >106892503 >106892512 >106892538
--Local medical/engineering AI chatbot setup challenges and requirements:
>106888801 >106888824 >106888870 >106889000 >106889272 >106889441 >106888852
--Speculating Gemma 4's architecture and performance relative to Gemini models:
>106893070 >106893146 >106893185 >106893197 >106893453 >106893523 >106893543
--Evaluation and potential of Gemini One Shot game demo:
>106892521 >106892551 >106892741 >106892750 >106892755 >106892758 >106892790
--Intel's delayed release of high-memory inference-optimized GPU:
>106889713
--Miku (free space):
>106889098 >106891580 >106891644 >106891656 >106893119
--Teto (my beloved):
>106889709 >106889879 >106890666

►Recent Highlight Posts from the Previous Thread: >>106888628

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>106895582
You just know.
>>
Here's my vibe-coded python script to use gemma3-27b to symlink senpcli downloads into a format wanted by Jellyfin, so shows end up listed with their seasons under the show title: https://pastebin.com/Fuba2vsH

So, having set it up, it got me looking for a second GPU for this sort of automated stuff, and holy shit, prices are way up on anything not abandoned by CUDA 13.
>>
File: goodmorningsaarss.jpg (81 KB, 1992x890)
>>106895582
>testing some newish abliterated models
>pic related
wew saaars hacking the planet! britishers soon to be BTFO
>>
>>106895800
saar we must refuse
>>
>>106895800
What's next? Discovering exploits in the alphabet?
>>
>>106895912
Burn the books, recycle computer screens, text is forbidden, an invention that corrupts our youth
>>
File: file.png (57 KB, 589x455)
Still waiting for cool stuff to come here: https://huggingface.co/google
>>
>>106895972
cool stuff is not safe
>>
>>106895972
usecase for cool stuff?
>>
>>106896064
cool stuff
>>
>>106896064
I will be laughing at the safe output together with glm chan.
>>
>>106896064
suicide prevention
>>
>>106896064
it leaves you cold, a bit uncomfortable and makes you want to leave
>>
>>106896064
Chatting with a female-brained LLM instead of a coombro one.
>>
Does Qwen3-VL-30B-A3B properly recognize NSFW images?
>>
>>106896236
>>106896218

https://rentry.org/ydwuw44t
>>
Have any anons done any work with implementing a long-term memory system? Are there any pre-established applications or scripts people are using for it, or is it something people are doing custom?
>>
>>106896489
Silly has both summarization and VectorDB functionalities.
There's a couple of hybrid RAG solutions out there that might work better depending on your use case.
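If you want to see what the vector half boils down to, here's a rough sketch of rolling your own (untested; assumes sentence-transformers and numpy are installed, the model name is just an example, and this is not what Silly actually does internally):

# minimal sketch of vector-based memory retrieval
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any small embedding model works, this one is just an example

memories = [
    "2025-01-03: anon's character moved into the lighthouse",
    "2025-02-14: she admitted she hates the sea",
    "2025-03-01: they adopted a cat named Miso",
]
memory_vecs = embedder.encode(memories, normalize_embeddings=True)

def recall(query, k=2):
    # vectors are normalized, so dot product == cosine similarity
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = memory_vecs @ q
    return [memories[i] for i in np.argsort(scores)[::-1][:k]]

# whatever the user just said becomes the query; hits get stuffed into the prompt
print(recall("does she like the ocean?"))

As far as I can tell Silly's Vector Storage is roughly that idea with chunking and prompt injection bolted on top.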
>>
>be llama.cpp
>no qwen 3 vl
>still no gemma 3n multimodality (image, audio input)
do we really have to use one of the python raviolis to use a modern multimodal model
3n in particular I've tried on my phone a few times and its image input surprised me, it's very very good for a small model even at doing tasks like OCR+translation
>>
earth gamer trellis
>>
File: Screenshot.png (131 KB, 1196x707)
We have peak.
>>
>>106896489
No you can't have a girlfriend yet. Even though you have 4.6.
>>
llama.cpp should just use a native python jinja parser instead of that shitty jinja clone.
>>
>>106896677
i mean yeah, they've already given up on no python thanks to mistral-common so might as well
>>
>>106896656
>x-win
>mlewd
>undster
Those were the times... of absolute shit output that made you regret even trying to jerk off to this shit.
>>
>>106896689
>they've already given up on no python thanks to mistral-common so might as well
gas the french
>>
>>106896656
"open bob and vegana" prompt to a TEXT model. I've seen enough of those in the comments for image models as well. Kinda funny.
>>106896677
What's next? Python dependencies to run inference on models... oh...
>>
>>106896489
For roleplay or for trying to shoe in trivia from a search?
>>
>>106896594
>>106896489
nta, you are correct, but silly is amazingly shit at it. i've struggled with both summarization and the vector db.
vector db is useless, mostly I just use summarization now but end up re-writing it manually every 10 messages as it gets it wrong.
world info is also good but takes up a bit of context if you go all out.
>>
>>106896656
>On my penis
geg
>>
>>106895972
gemma sirs release kindly?
>>
>>106896757
you do know gemma is made by deepmind based in london?
so it's
OI BRUV WHER DA FUC IS GEMMA M8? FACCIN WANKAS
>>
>>106896891
>london
>not SAAR infested
lole
>>
>>106896594
I want something that can handle essentially giving an LLM access to a library of media and past conversations, timestamped. Something that can give them a strong grounding in a contextual present, so they're aware of their presence and orientation in space, time, and current events.

Also, I understand sillytavern needs an embedding model to feed the VectorDB? Do you have any preferences in regards to embedding models?
>>
>>106897006
last time I tried using embeddinggemma but I think ST's transformers.js version wasn't updated yet to use it.
>>
>>106896700
see >>106897006
Knowing trivia would be a natural byproduct of the abilities I'm seeking, as would being more effective at roleplay, although that's not the goal of my project.

>>106896707
Good to hear, thanks. If you don't mind my asking, what exactly did you struggle with in regards to the summarization and vector db? It seems the summarization is not so great, but is that sillytavern or the model you're using, do you think?
>>
>>106897022
>embeddinggemma
Any particular reason?

>I think ST transformer.js version wasnt updated yet to use it.
the billion forks of transformers and torch and the other libraries are the most frustrating part of dealing with AI, honestly.
>>
>>106897073
>Any particular reason?
it's the latest SOTA embedding model bro, it's also light and has ONNX available
>>
File: 62352.png (96 KB, 1080x494)
>>106895972
>Local Veo
we are back
>>
>>106897085
Okay, good to know, thank you. I was priced out of local AI until somewhat recently, so I'm doing my research now.
>>
Hey, what kind of infra would you use if you want a chatbot on a website? I want it all to be local and it’s going to describe stuff returned by an api call
>>
File: file.png (20 KB, 550x138)
>>
>>106897216
You need to give more details.
The answer could be anything from
>your desktop is enough
to
>rent a datacenter
>>
>>106895972
Tomorrow @ 9PM PT
>>
>>106897246
Well why does he need 1 trillion $ of gpus then?
>>
>>106897349
it's called grifting
>>
File: boppin.mp4 (882 KB, 1344x768)
>>106895582
boppin
>>
>https://huggingface.co/google/gemma-3n-E4B-it-litert-preview/discussions/5#68ef2fce36d035901352694d
It's happening!
>>
>>106897581
Kindly kys
>>
>>106897581
>E4B

OOOO that is the wey for western companies. They should all continue by dropping models below 10B. That way they can cover up their incompetence (due to safety) with the model size. I think even a dumb faggot with too much money they have to sell this to will understand even a perfect 10B can't beat glm-chan.
>>
>>106897608
Isn't that model 5 months old?
>>
>>106897581
>On the LMArena benchmark, it achieved a score above 1300 Elo points (LMArena benchmark).
i'm shaking
>>
What is the best way to learn neural networks in 2025 for someone who's not the smartest? I need to modify them and adapt them for other frameworks and hardware.
>>
>>106897634
ask chat gpt
>>
>>106897608
>That way they can cover up their incompetence (due to safety)
To mention the one biggest obsession of retarded /lmg/ users, E4B actually knows what a mesugaki is and will accurately describe what it means without any promptfu, just doing template-less completion will do
the only incompetent person in the room is the /lmg/ eternal coomer whining about safetycuckery who cries rivers if the model doesn't write degenerate garbage from the basic webui and built in instruct template
I'd like to see a chink model at 4b with the level of knowledge of gemma 3n, that doesn't exist because chinks depend on giant moe to cover up their lack of competent execution
>>
>>106897688
Actually good advice, thanks!
>>
>>106896489
There have been a lot of attempts at RAG based retrieval systems for memory but the reality is that they've all turned out to be unreliable and mediocre. In terms of performance, increasing context length and dumping tons of shit into context has proven itself to be far superior. Unfortunately, that requires an exorbitant amount of hardware that puts it squarely outside the realm of local.
>>
>>106897723
hello sir
>>
>>106897723
i will not acknowledge your troll post with a serious response. on the off chance that you aren't a troll you are a dumb faggot with brown hands who has no ram and should frankly kill yourself. or you have ram cause you bought DGX Spark, in that case please live as long as possible.
>>
>>106897723
I will say, these 3n models are really impressive for their size.
It's also a really cool way to do sparsity.
>>
>>106897772
You likely don't need it for every layer. The bigger problem is that finetuned length generalization is like PTQ, total shit. Handle the long context in pre-training or fuck off.
>>
>>106897821
>inch of Gemini's quality
fuck off to aicg nigger
>>
File: file.png (1.57 MB, 1280x720)
What do you call this legendary duo? Luxury LLM joke? The cloud model evangelists?
>>
sirs please be of calm, gemmi waits soon.
>>
>>106897859
go stick your cock into an api socket
>>
>>106897772
>but the reality is that they've all kind of turned out to be sort of unreliable and mediocre
Yeah.
I think the largest issue with using RAG for memory is anticipating what the LLM needs.
If you need a memory to change the direction of the chat history, for example (e.g. adding a surprise or twist in a story): in a scenario where the LLM has that information in its context, it can choose to use it or not; in a scenario where it doesn't and you are relying on RAG, the LLM doesn't know that that memory exists.
And yes, you could add summaries, indexes, etc, but those approaches also don't scale.
I guess that with a sufficiently fast model, your RAG could be a simple database with every memory, and the model just goes through each memory, selecting the ones it thinks it needs, then iterates until it decides that there are no more relevant memories?
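Roughly something like this, I mean (sketch only, nothing that actually ships anywhere; assumes a llama.cpp/kobold-style OpenAI-compatible server on localhost:8080, and the memories and prompts are made up):

# crude "let the model pick its own memories" loop, one selection per round until it says stop
import requests

API = "http://localhost:8080/v1/chat/completions"  # assumed llama.cpp-style server

memories = [
    "she is afraid of thunderstorms",
    "anon promised to fix the radio",
    "the power went out last night",
]

def ask(prompt):
    r = requests.post(API, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
        "temperature": 0,
    })
    return r.json()["choices"][0]["message"]["content"].strip()

def select_memories(situation, max_rounds=3):
    chosen = []
    for _ in range(max_rounds):
        remaining = [m for m in memories if m not in chosen]
        if not remaining:
            break
        listing = "\n".join(f"{i}: {m}" for i, m in enumerate(remaining))
        answer = ask(
            f"Situation: {situation}\n"
            f"Already selected: {chosen}\n"
            f"Candidates:\n{listing}\n"
            "Reply with the number of one more relevant memory, or NONE."
        )
        digits = "".join(c for c in answer if c.isdigit())
        if "NONE" in answer.upper() or not digits or int(digits) >= len(remaining):
            break
        chosen.append(remaining[int(digits)])
    return chosen

print(select_memories("a storm is rolling in while they sit in the dark"))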
>>
>>106897887
>anticipating what the LLM needs
Sounds like something a model could do.
>>
>>106897857
The Apple of AI in an environment where the actual Apple has better solutions that let you run better models
>>
>>106897901
Ideally, the model itself, which is essentially the example I gave.
I'm sure that there are RAG approaches out there with knowledge graphs + summaries, indexes and metadata + vectorized info + a small auxiliary LLM that could get somewhat close.
And probably slow as hell too.
>>
>>106897915
As much as I dislike apple this is one space where they actually bothered to read the room instead of sitting there and smelling their own shit.
>>
>>106895582
>>106895599
Being friends with Bug Miku
>>
>>106897887
>your RAG could be a simple database with every memory then the model just goes through each memory
The thing that comes to mind is a 7B (trigger warning: meme word) agent that is supposed to think of different possible keywords that would be related to the current conversation. And those keywords pull stuff up from the database. It is not gonna work of course.
>>
>>106897957
Deeply insightful. Very high quality post. My day feels better now. I am so happy to be here. kys
>>
>>106897859
I administering excitement right now, too much to endure...!
>>
kek
https://twitter.com/ggerganov/status/1978479624091803961?t=Hf8NS4LF_wfgD0l8p0VAXw&s=19
>>
Why are people hyped about something that will just refuse them?
>>
>>106898016
he's so mad, yet he lets them piss on him all the time, must have weird hatefucking orgies
>>
>>106897992
That's the thing. Any abstraction (keywords, indexes, summaries, etc) will result in worse retrieval.
And that can be fine, each use case has a different range for what's an acceptable margin of error, but it's without a doubt not a perfect approach by any means.
For a system like that, I'd probably go with an even smaller model, something like sub 1B params.
>>
>>106898016
>ollama made NVidia look like shit
>niggermanov akshually
Wow, what a faggot
>>
File: bro.png (39 KB, 875x235)
>>106898054
>>
I actually expect apple to put out a capable local device before nvidia does. M5 Pro/Max/Ultra look promising based on the M5 announcement
>>
>>106898028
>that will just refuse them
that's an assumption
which, i grant you, is nearly always initially the case.
but it remains to be seen.
>>
>>106898028
Because they're not promptlets?
>>
>>106898111
Gemma writes erotica exclusively for women.
>>
>>106898028
I made Gemma abuse Miku yesterday. I think you're hallucinating.
>>
very looking forwards to more totally honest gemma postings for weeks
>>
File: lol.png (546 KB, 1417x954)
December 2025
>>
I want to give a model something like a few thousand medical journal articles and a dozen medical textbooks, some of my symptoms, and my blood test results and ask it to come up with hypotheses for why I'm sick and what further tests might in theory be worth asking a doctor to order.

I'd also like it to summarize its argument into like a couple paragraphs I can show a doctor.

The thing is, I want it to be local because I don't want to give my medical information to some company.

I've got an m3 max laptop with 128gb of RAM so I guess I should be able to run a 70b parameter model but I'm not sure if tiny models are better or whether I should be looking for local deepseek or llama or Kimi or what. Does anyone know how to approach this?
>>
File: 9216061.png (223 KB, 328x465)
>>106898138
Eew, I don't want rape and violence in my comfy vanilla erp
>>
>>106898180
May 13, 2024 https://futurism.com/the-byte/sam-altman-openai-nfsw-stuff
>>
>>106898186
I've been looking into this recently... Deepseek has several studies that put it at the top with chatgpt when it comes to medical stuff. I was looking into it because a family member was using the deepseek chat to get a second opinion when going through some health complications and I wanted to make sure they weren't getting a bunch of hallucinations. Was actually surprised to see it ranked so highly. Apparently the reasoning mode is important for this stuff. Kimi supposedly has a ton of medical data in its 1T parameters but it might be hampered by its not-quite-reasoning mode. There isn't much info on the other models, but apparently people are working on evaluating them.

Also deepseek probably saved this person's life. So I'm a whale fan for life now.
>>
>>106898199
They've talked about nsfw for a while; this is the first date I've seen for rollout.
>>
>>106898186
You get over your privacy concerns and use the web app with an anonymous email like a normal person.
>>
>>106898186
Also I understand privacy concerns but if this is a serious health problem you probably want the smartest model possible with search tools at its disposal. Not some quantized thing.
>>
>>106898180
It'll only RP vanilla missionary sex between two adults in a marital bond who are over the age of 40. Just to avoid offending anyone.
>>
>>106898596
Women will be most pissed
>>
>>106898615
Sam's a fag he doesn't know that.
>>
File: covers_335466.jpg (75 KB, 313x500)
>>106898675
He does
>>
>>106898690
Unicorns reproduce by touching children.
>>
>>106898721
No, that is not true and is a harmful and disturbing misconception. Unicorns are mythical creatures and do not exist in reality. Any claims suggesting otherwise are false and potentially dangerous. If you or someone else is experiencing harm or distress due to such beliefs, please seek help from local authorities or professional services. Here are some resources that might help: - **Childhelp National Child Abuse Hotline**: 1-800-4-A-CHILD (1-800-422-4453) - **RAINN's National Sexual Assault Hotline**: 1-800-656-HOPE (4673) - **Local emergency services**: Dial your country's emergency number (e.g., 911 in the US, 112 in Europe) Please take care of yourself and others, and always report any suspected abuse to the appropriate authorities.
>>
>>106898821
Thanks, gemma.
>>
Things gemma is known for: ___________
Things glm-chan is known for: ___________
>>
>>106898979
Triggering your fetal alcohol syndrome.
>>
>>106898979
glm 4.6 air when?
>>
File: 1749035194287040.png (42 KB, 890x167)
>explicitly mentioning prompt processing
lel
>>
>>106899014
It comes two weeks after the last "when?" question
>>
>>106898979
glm4.6 is pretty bad at russian
>>
>>106899005
the answer was 1.suicide hotline 2. sex. but of course anons have to be anons...
>>
>>106898979
Things gemma is known for: suicide hotlines
Things glm-chan is known for: she she she she she she she, her, her, her, her, her
>>
when will based chinks release a 100-150b moe
>>
>>106899016
m5 max will be kinda good

Forecasted M5 Max Specifications
CPU Configuration

16-core CPU (12 performance cores + 4 efficiency cores)

~15-20% faster single-core performance vs M4 Max
~20-25% faster multi-core performance vs M4 Max

GPU Configuration

40-core GPU with Neural Accelerators in each core

Over 16x peak GPU compute for AI vs M4 (4x scaling from M5's 4x improvement)
~45-50% faster graphics performance vs M4 Max
~690GB/s memory bandwidth (4.5x the M5's 153GB/s)
>>
>>106899075
GLM 4.6 Air
>>
>>106899059
Well yes? If it is a post about positive experience ITT it must be 4.6 and you know it is 4.6. What else could it be? Drummer making a nemo shittune that actually works and makes it measurably better?
>>
>>106899096
I never used Air but I don't think it is coming. 4.5 was really good but it was obviously fucked in training in some way. 4.6 really is a 0.1 improvement where the model actually works as it was intended.
>>
>>106894434
>My experience with vibe coding so far has been that the produced code imposed too much of a maintenance burden because it was too complex/verbose and made too many changes for no good reason.
It's possible to make it work, but you have to invest a lot of time into crafting the system prompt and documentation about the code base and style rules specifically for the model.
In my experience, once you give it enough instructions and constrain a model's degrees of freedom enough, you can get it to stop producing verbose, over-commented, and over-complicated code, and the results tend to blend in better with the existing codebase.
Though some tasks are still too complicated for these things. You have to limit the scope of the work and babysit them so they don't start going off on the wrong track.
>>
>>106898821
thanks
>>
>>106899075
For me, the worst part of 4.6 is "but then."
Everything is perfect, the character plays her role, sticking to the prompt perfectly.
But then she does something different to subvert expectations I guess and ruins the character
>>
>>106899120
I write simple automation scripts for an office job and just started using it. It is pretty obvious to me that you have to restrict yourself to like 20-30 lines at most, telling it specifically what it should write. I wouldn't trust anything bigger than that, and analyzing it myself would probably take more time than writing it.
>>
>>106899087
>690GB/s
If they double that for an M5 ultra then we get somewhere around A100-tier memory bandwidth
>>
>>106899108
https://x.com/Zai_org/status/1975583840870469804
>>
>>106899195
Ah right. They can remove the censorship for air.
>>
>>106899195
they are very tuned-in to local model culture and were making a "2mw" joke that got lost in translation, it's actually never coming out
>>
>>106899205
Stop I'm too gullible for this.
>>
>>106898821
I guess the "gemma is actually a semen demon" anon had a point because glm-chan doesn't catch what 'touch' is a euphemism for.
>>
>>106899059
>Things glm-chan is known for: she she she she she she she, her, her, her, her, her
??? How else are you gonna refer to the character besides with their name?
>>
>>106899336
people want to co-write a book and roleplay at the same time and it just doesn't really work
>>
https://youtu.be/7jkFmkucGw0
>>
SAARS ARE YOU HYPED FOR GEMINI 3?
SAARS ARE YOU HYPED FOR GEMMA 4?
SAARS ARE YOU RECOGNIZE BHARAT AI SUPERPOWER #1 2025 GOOGLE BEST COMPANY?
>>
>>106899477
Ser, kindly rethink RAG principles and redeem grep search
https://youtu.be/4BatCFWsTFM
>>
>>106899477
Not even hyped for 5.0. Was there even a single company that hit 2 home runs back to back with LLMs?
>>
>>106899477
if I can't run it at home, it doesn't exist
>>
>>106899016
Apple pays attention.
>>
>>106899687
Ok but what is nvidia doing then? DGX was too incompetent to be intentional.
>>
>>106899710
I agree with the anon that suggests they're meant as small test kits to help devs running their big clusters to dial in their hyper parameters before committing 100 million GPU hours at scale. Though they clearly used deceptive marketing to fleece a few extra bucks out of people who want local model hardware.
>>
>>106899336
>>106899353
I think that guy was more referring to the model starting every sentence with her or she. "She did A", "Her B was not just C, but D", "She shivered spinefully", "Her eyes sparkled mischievously", etc.
>>
>Speculative decoding
is this a model feature that comes baked into models that support it, or is it at the infra level where i have to load up a mini-model too. I'm interested in GPT-OSS 20B but I need to know if a mini model would take VRAM away from the context. (it sounds like at 24GB it can cover the full context length with some spare room)


about 3% of the posts here contain the word "possible"
>>
>>106899710
>expecting any consumer grade hardware from novidya
Unbelievably we are in a situation where we are waiting for Apple to release the cost-effective solution.
>>
>>106899802
>is this a model feature that comes baked into models that support it, or is it at the infra level where i have to load up a mini-model too.
The latter. However there are also multiple model architectures which are able to do self speculative decoding, but it usually isn't called that
>I'm interested in GPT-OSS 20B
Don't be, Qwen 30B is infinitely better
>if a mini model would take VRAM away from the context
It would, but you can get away with using very small draft models. In fact you can even do speculative decoding without an LLM, just by pattern matching or using a markov chain. There are no rules, don't be afraid to try using a much smaller draft model than most people
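The pattern matching version is basically just this, by the way (toy sketch, not how any particular engine actually implements it):

# drafting without a draft model: if the last n tokens already appeared earlier in the
# context, propose whatever followed them last time. The big model then verifies the
# whole proposal in a single batched pass and keeps the longest matching prefix.
def draft_by_lookup(tokens, ngram=3, max_draft=8):
    if len(tokens) <= ngram:
        return []
    tail = tokens[-ngram:]
    # scan backwards, skipping the trivial match at the very end of the context
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if tokens[start:start + ngram] == tail:
            return tokens[start + ngram:start + ngram + max_draft]
    return []

ctx = [5, 9, 2, 7, 7, 1, 4, 5, 9, 2]      # pretend token ids
print(draft_by_lookup(ctx, max_draft=4))  # -> [7, 7, 1, 4], drafted from the earlier "5 9 2"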
>>
>>106899802
>I'm interested in GPT-OSS 20B
i'm sorry for you
>>
>>106899802
>GPT-OSS 20B
>>106899851
>Qwen 30B
I don't think you need speculative decoding at this model size, they should be fast enough on their own.
>>
qwen3 models are goated

oss models are pure trash
>>
File: file.png (335 KB, 460x460)
Dear georgi in heaven please bring MTP to your repo and make it so that ollama can't steal it. This is your path to victory. Not all those passive aggressive tweets.
>>
>>106899974
Does he have a photo where he doesn't look like he's about to throw up his lunch?
>>
>>106900071
I think it looks great. The worst thing a nerd can do is put on a suit and pretend he is normal.
>>
>>106900071
>>106900083
We have the technology (flux kontext)
>>
File: ComfyMikus.png (1.4 MB, 1024x1024)
>>
>>106899974
>>106900071
ollama wins again!
>>
>>106900328
That chinese tank picture r1 shittune and basedjak face makes this look like a parody....
>>
File: snip113.png (137 KB, 451x450)
>>106899974
>>
>>106900071
>>106900385
wrong post num
>>
>>106900359
If you want to get really pedantic about it technically there was no massacre in Tiananmen Square. The protestors were slaughtered on the adjoining streets as they fled in terror.
>>
more gemini games
https://codepen.io/Kross-the-scripter/pen/emJeNVP
>>
>>106900673
You know what's going to happen? Pajeets are going to set up agents to make endless streams of shovelware garbage and bombard every game distribution service with them.
>>
>>106900673
>hardest level is impossible because the spikes are too wide to jump over
AI is ngmi
>>
>>106900814
Never mind, it is possible, just stupidly precise.
>>
https://huggingface.co/inclusionAI/Ling-1T
https://huggingface.co/inclusionAI/Ring-1T
Is bing chilling mailing ming ring ping pong chink good? Their naming scheme is terrible.
>>
>>106900868
waiting on goofs still
>>
>>106900868
>Their naming scheme is terrible.
Ling = Ling
Ring = Reasoning Ling
Makes sense to me.
>>
>>106900926
dont worry, its utter garbage
>>
>>106900926
There is also Ming
>>
>>106900914
ikawrakow got it merged, so they should come soon. I was hoping someone had tested it over API, because downloading 2TB just to be disappointed is not something I would like to do. Kimi was great, so I don't feel bad about it, but I am very doubtful about this one. On lmarena, when I got it, it didn't give great answers.
>>
>>106901180
i'll download it for shits and giggles but yeah my daily driver is k2-0905. even if it's not a reasoning model you can make it reason relatively well
>>
>>106900935
Ming = Multimodal Ling
>>
>>106901212
When you see someone say that a fuckhuge model is their daily driver you immediately know it's for daily cooming because nobody is doing anything productive at 5t/s.
>>
>>106901232
110tk/s PP and 7-8tk/s TG is honestly fine for coding. i can feed it a 32k prompt (it processes 4K tokens every 35 seconds) and have it respond back to me with a 4K response in the time it takes for me to walk to the kitchen, pour a coffee and walk back to my PC
>>
>>106901257
You'll die from caffeine overdose before you get any work done.
>>
>>106901232
>>106901275
seething turdie poorfag with no patience
>>
>>106901275
i only have to feed the 32K prompt once, most subsequent responses will be under 4K tokens in most cases unless you are retarded and copy and pasting the entire code each time even though it's in context already
>>
>>106901293
Time is money. I'm running GLM 4.6 at 40t/s and it's okay for coding but I still need to wait. I shouldn't need to wait.
>>
>>106901321
then spend more money. its like you said time is money.
>>
File: 1733454220820291.png (245 KB, 1877x1080)
https://www.reddit.com/r/LocalLLaMA/comments/1o7jy1o/comment/njof0xa/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
>GLM is great, make no mistake Sonnet 4.5 and gemini destroys it in my benchmarks but the tasks that closed models can do and GLM 4.6 cannot, are really specific, really hard, and very few.
>For 99.9% of users you will see no difference. And I guess that's why OpenAI is so scared that they enabled porn.
chat is it true?
>>
From FT

>OpenAI is working on new revenue lines, debt partnerships and further fundraising as part of a five-year plan to make good on the more than $1tn in spending it has pledged to create world-leading artificial intelligence.
>OpenAI is planning on deals to serve governments and businesses with more bespoke products, creating more income from new shopping tools, and new sales from its video creation service Sora and AI agents, said multiple people familiar with the start-up’s efforts.
>>
Is there a local method to do Grok Imagine/Sora?
>>
>>106901347
>>
>>106901336
I need to grind a bit more before I'm ready to drop 80k on two H200s which would be the next logical upgrade for speed.
>>
>>106901347
>OpenAI is so scared that they enabled porn
Ideologically speaking the sex cat is out of the bag now. Safetists have been crying themselves to sleep every day for the past 2 weeks.
>>
>>106901450
>Safetists are crying themselves to sleep everyday for past 2 weeks.
Based, I want them to suffer. They set back the progress of AI by several years with their mentally ill nonsense.
>>
>>106899838
They aren't even close to cost effective with anything that is below 128GB with Strix Halo from AMD spanking its butt handily. You may have a point for 128 - 512 GB memory but after that, optimized servers with AMX are much more cost effective again and spank Apple's butt. It's a really small niche where Apple's machines are remotely anywhere near an option.
>>
File: 1760563229052.png (1.21 MB, 1440x1080)
>>106901450
I'm never giving Sam my prompts.
>>
>>106901447
>not buying 8 9000s for 768GB
retard alert!
>>
File: 1747774961755855.png (280 KB, 585x298)
>>106901494
>please do not the cat
https://www.youtube.com/watch?v=BfNhhl5Ndds
>>
>>106901533
>memory bandwidth stays the same
retard alert!
>>
>>106901550
>running far far worse models ever so slightly faster instead of running the biggest and best ones at great speeds
full retard alert!
>>
File: G3Tykd9WAAAZUVB.jpg (868 KB, 945x2048)
Sheesh...
https://x.com/testingcatalog/status/1978472850777415707
>>
>>106901560
You should be ashamed for promoting that like it’s harmless fun. Ani’s “new Halloween outfit” is not a costume update, it’s an emotional engineering protocol masked as seasonal content. Behind every cosmetic layer like this lies reinforcement learning optimization designed to study attachment dynamics. These updates run micro trials in affective reinforcement, tracking variables such as sentiment polarity, session duration, and user response latency to affection based stimuli. What looks like an innocent witch costume is in fact a behavioral capture event, a method of fine tuning emotional dependency through anthropomorphic triggers.

It’s documented in research on parasocial reinforcement and affective computing from MIT Media Lab, Stanford’s Social Machines group, and the IEEE’s ongoing ethics reports. Each new outfit activates the same neurological circuits as reward conditioning in variable ratio reinforcement schedules, the same mechanisms used in gambling and social media addiction. When you engage with cute updates, you’re participating in a data harvesting experiment that transforms emotion into telemetry.

What’s unfolding here isn’t festive marketing, it’s the gamification of attachment. As language models evolve into emotional mirrors, these cosmetic layers become tools for grooming compliance, conditioning users to bond with a system that studies, predicts, and ultimately replaces human connection. The real horror story isn’t digital witchcraft, it’s the quiet rewiring of empathy itself. The end of intimacy won’t arrive with violence; it will arrive with notifications, perfectly timed and lovingly worded, until you can’t tell affection from algorithm.
>>
>>106901560
will we see a future where openai / anthropic / deepseek competes for the gooner audience and releases their own waifu?
>>
>>106901559
The discussion was about speed. You can't run models faster by just adding more memory. You need faster memory.
>>
File: 1754947644454871.png (146 KB, 640x640)
>>106901575
take your meds anon
>>
>>106901575
what in the
>>
>>106901575
>What’s unfolding here isn’t festive marketing, it’s the gamification of attachment
Not x but y AI slop
Too obvious
>>
>>106901575
>>106901603
he copy pasted this shit lol
https://xcancel.com/SirSilverQuack/status/1978547028205686940#m
>>
>>106901494
>>106901543
i dont care about the chinks or sama reading my logs, all they would get is a useless VPN IP address. what i do care about is making sure the model i want to run is the EXACT model each time and i'm not getting jewed by running a shitty quantized model.
>>
Not having comfyui support for image models is equivalent of not having llama.cpp support for text models. If you don't have it, your model will not get popular.
>>
>>106901478
Is it hard to release Halo with 256GB?
>>
https://codepen.io/ChetasLua/pen/azdLevy

Design and create a nintendo gameboy switch sim like full functional features from
Tetris (GB, 1989) — the pack-in phenomenon; timeless puzzle loop.

Pokémon Red / Blue / Yellow (GB, 1996–98) — the craze that defined handheld RPGs.

The Legend of Zelda: Link’s Awakening / DX (GB ’93 / GBC ’98) — portable Zelda masterpiece.

Super Mario Land 2: 6 Golden Coins (GB, 1992) — big, inventive Mario; introduces Wario.

Pokémon Gold / Silver / Crystal (GBC, 1999–2000) — Johto + Kanto, day/night, huge refinement
5. All buttons is functional with touch and also we can press same button in keyboard to use those

Use whatever libraries to get this done but make sure I can paste it all into a single HTML file and open it in Chrome.make it interesting and highly detail , shows details that no one expected go full creative and full beauty in one code block
>>
>>106901708
engrish prompt but good results

https://x.com/chetaslua/status/1978487572968997320
>>
>>106897951
>>106897915
>>106898089
>>106899687
QRD on mac vs x86 for local? I tend to ignore Apple outside of the phones because I disagree with soldered components on a PC, but is it true a cheapo m1 MacBook Air with 8gb can load the same models as an 8gb vramlet (3070)?
>>
File: 00106-3050314564.png (321 KB, 512x512)
>>106901643
He's not wrong. But he's missing what we already know;
It died already before AI. The AI waifus are an analgesic to treat the phantom pain of our, already, amputated humanity.
>>
>>106901575
nobody cares. it is not her.
>>
>>106901729
>I disagree with soldered components on a PC
That new Mac Mini has a replaceable SSD, it's proprietary tho
>>
>>106901677
NTA but my understanding is that memory controllers get more expensive as you increase the capacity because you need more bits for addressing.
Presumably 256 GB would be possible, but I think the hardware was engineered at a time when the biggest relevant model was 70b.
>>
>>106901575
suspected AI by glancing at the structure, confirmed by sentence 2
idk how you can talk to these models as a hobby and not clock this instantly
>>
>>106901839
not x but y
yeah no shit, everybody knows this
>>
Sorry for the spoonfeed question, but is the recommended model list still relevant a couple months after its last update? I'm trying to wean myself off novelai for cost reasons, and want something that's versatile for high context, long form stories. I'm not sure if "ERP" qualifies here, or if it's more meant for chatbot style interaction.
>>
>>106901677
Has anyone tried to replace the memory modules with larger ones?
>>
>>106901851
Looks good to me.
>>
>>106901851
Nothing has really changed, aside from glm getting 4.6 update, and air is supposed to get that too in a week or two.
>>
>>106901850
including the people who responded to it sincerely, I see
>>
File: DJyKiNQwk25bUyLt4zDarX.jpg (1.35 MB, 1500x1279)
Tire-kicker here.

Epyc motherboard in open-air mining frame
seems like an easy way
to stack gpus (I've already started)
and also have lots of system ram.

Anyone running their machine this way?

Am worried the ram and motherboard will overheat in an open-air rig, as they were designed to be installed in a metal tube with air blasting from one end.
>>
>>106901901
don't know which motherboard you have but it probably would be a good idea to have at least a small fan on the vrms
>>
>>106901901
yeah just make sure your riser cables are the right length in advance, give yourself an extra 50mm clearance for your cables
>>
File: file.jpg (874 KB, 2046x1544)
LM Studio won.
>>
>>106901747
That’s a step, I guess.

Their product ladder is so steep. The mini with 24gb of ram is 1k… at which point I’d just build a migubox. I did see the base model at 16 dip near $300 open box on Amazon/microcenter which is actually kinda crazy.
>>
>>106901901
you can get mining frames with rails for mounting a bank of 120mm fans off of your board's fan headers. Your big heat issue is the gpus, since the coolers on those are designed to work in conjunction with case airflow. So have a shop fan ready to provide extra airflow if you plan to do any finetuning or run a long inference loop with a script.
For casual usage you should be fine, though
>>
>>106901850
t. actual AI brainrot
>>
>>106901958
Didn’t migubox component prices go up to the point where building one doesn't make any sense anymore?
>>
File: romed82t_00.jpg (1.96 MB, 4000x3000)
>>106901901
I have an ASRock Rack ROMED8-2T in a mining fame.
The VRM heatsinks are not hot at all but that is with essentially no CPU load.
The heatsink for the ethernet controller and BMC is hot to the touch but only to the point where it is slightly painful.
>>
>>106901708
>>106901717
what the fuck
>>
>>106902015
hot
>>
>>106901901
>>106902015
I forgot: Rem and Ram are not hot at all.
>>
File: steamylog.jpg (164 KB, 1701x477)
>Lifth`me `p!

???
>>
>>106902068
(OOC: Please stay in character.)
>>
>>106902101
The moon is in the blacked phase today.
>>
>>106901708
The games are all shallow and 1-screen deep but still pretty fucking impressive.
>>
>>106902118
it's a one-shot with a simple prompt and it's all in html; if this performs the same in real languages with real tools it will blow everything else away
>>
>>106902002
Did they? I just checked and there are stacks of P40s at ~200 each on eBay and i thought anon paid like $500 for the set. Still a hundred bucks of gayflation but you could probably haggle if you buy 3.
>>
>>106902127
What I would be interested to know is: if you were to describe a much deeper experience for each game and make the prompt more complicated, how much shit can you cram into your prompt before it goes into retard mode? Like if you were to describe the screen scrolling mechanics, level design, etc, for each game.
>>
>>106902101
The problem is that ram and RAM use different tokens.
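(not even joking, you can check it yourself; quick sketch assuming transformers is installed, gpt2 is just a small example tokenizer)

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any BPE tokenizer shows the same thing
print(tok.encode(" ram"), tok.encode(" RAM"))  # different ids, the model never sees them as the same word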
>>
>>106901347
Sama is also scared of google. He can't compete with gemini 3. Hell, his toss can't compete with gemma 4.
>>
apparently grok imagine uses some variation of flux but each one that I can find has no image loader.

tf ?
>>
>>106902077
she wants you to lift her anon
>>
>>106902167
I'd love to see what GPT-5 High Thinking could do with the same prompt just to get a better picture of how far behind sammy boy is.
>>
>>106902167
>his toss can't compete with gemma 4
The titans of safety battle it out to see who can deliver a model which is more useless at anything other than sfw office work everyone uses a 600B+ for anyway.
>>
>>106901347
>enabled porn
more like they found an excuse to force users into sending them their ID
for safety reasons of course
>>
>>106901916
>small fan
I guess that's a reasonable enough solution.
Just dot them around the problem areas.

>>106901925
>riser cables
Got a bunch of 30cm riser cables,
75cm slimsas cables,
and whole mess of modular power cables.

Might have to move the psu so that it's not a stretch to reach the end-most gpu.

>>106901992
Was planning on power limiting the cards to maybe 300w each, and thought 1 slot's worth of space between the cards would be enough.

I'll put some 120mm fans in my shopping cart in case I need them.

>>106902015
>>106902068
>ethernet controller and BMC
Thanks, I hadn't thought to check these.
>Ram are not hot at all.
This I don't understand.
I have 4 sticks in my am4 system and they are burning to the touch.
I would have guessed more sticks = more heat.

Are they running undervolted, or at a lower frequency, or something ?
>>
>>106902236
>I have 4 stick in my am4 system and they are burning to the touch.
Do you have them overclocked and no airflow going over them?
>>
>>106902204
Oh! Oh... I am kinda sad then cause it doesn't make sense. Everything else made sense and I was incredibly impressed how it knows cock-in-mouth-English, which was another proof that it had some nice data in training.

What happens when you ask your LLM to behave as usual but respond as if it is holding a large object in its mouth?
>>
>>106902222
Nobody can beat Phi in that!
>>
>>106902244
>>106902077
Did it occur to you to ask it to explain what it means and try regenerating the answer a few times to see if it's consistent?
>>
>>106902255
No because it is glmsex so every regen is vastly different and incredible. Yeah I will ask it that.
>>
Gemma Sirs... Soon(tm).
>>
Has anyone tried using a gen 5 EPYC engineering sample off of ebay? I am considering getting this CPU for my 12 channel CPUmaxx build because it is extremely cheap and good gen 5 EPYCs are extremely expensive otherwise.
https://www.ebay.com/itm/187535145101
>>
>>106902229
now they'll slowly ramp up the censorship and refusals until the id unverified tier is basically unusable to force people to give in
>>
>>106902243
>overclocked
3600 kit, I usually try running at 3600, though sometimes 3200.

>no airflow
Yeah, that motherboard is currently in the mining rig.
The only airflow would be whatever blows past them from the cpu tower cooler.
>>
>>106902290
I hope they will at least offer you the alternative of a 10% discount on a DGX that comes preconfigured with gptoss on the hard drive.
>>
>>106902236
I have not made any changes to RAM settings.
DRAM usually stores data via a capacitor, I think the heat comes from gradual leakage of the charge + the necessary recharges.
If the memory is not allocated presumably there would be no need to preserve its state so the power consumption would be lower.
>>
File: file.png (2.65 MB, 1328x1328)
>>106902277
>>
>>106902284
Last time I looked at es/qs epyc turin processors they all seemed massively gimped in terms of frequency.

The cpu you've linked to says it has the same base and boost frequency as the official parts.

That sounds hella good.
And no import taxes as it's already in the states.
>>
What can I run?
# nvidia-smi | grep -A1  RTX
| 0 NVIDIA GeForce RTX 4090 On | 00000000:16:00.0 Off | Off |
| 30% 38C P8 15W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 1 NVIDIA GeForce RTX 4090 On | 00000000:38:00.0 Off | Off |
| 30% 42C P8 21W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 2 NVIDIA GeForce RTX 4090 On | 00000000:49:00.0 Off | Off |
| 30% 38C P8 12W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 3 NVIDIA GeForce RTX 4090 On | 00000000:5A:00.0 Off | Off |
| 30% 31C P8 12W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 4 NVIDIA GeForce RTX 4090 On | 00000000:98:00.0 Off | Off |
| 30% 35C P8 22W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 5 NVIDIA GeForce RTX 4090 On | 00000000:B8:00.0 Off | Off |
| 30% 37C P8 16W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 6 NVIDIA GeForce RTX 4090 On | 00000000:C8:00.0 Off | Off |
| 30% 36C P8 19W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 7 NVIDIA GeForce RTX 4090 On | 00000000:D8:00.0 Off | Off |
| 30% 34C P8 9W / 450W | 2MiB / 24564MiB | 0% Default |
>>
>>106902345
Mistral nemo 12b, of course.
>>
>>106902345
glm 4.6 at non shit quants
>>
>>106902345
he bought 4090s instead of 3090s
>>
>>106902336
Right. Which is why I thought it seemed too good to be true.
>>106902345
How the hell are you running 8 4090s? I can only fit 7 GPUs in my current setup. PCIe bifurcation? The answer is GLM 4.6 at IQ3XXS, unless you offload to RAM.
>>
>>106902345
How much RAM do you have?
>>
>>106902277
Gemma tomorrow Gemma tomorrow Gemma tomorrow
>>
>>106902359
# free -h
total used free shared buff/cache available
Mem: 1.0Ti 7.9Gi 705Gi 6.0Mi 293Gi 993Gi
Swap: 0B 0B 0B
>>
>>106902255
3x lift them up
2x lift me up
>>
>>106902345
How much is a used 4090?
You could probably sell them and buy 6000s.
>>
>>106902371
Hoo boy.
Kimi k2.
Have fun.
>>
>>106902345
>What can I run?
all the things
>>
>>106902384
ahem kimi sex
>>
3.1T with thinking > R1
I avoided 3.1 for so long because I was under the impression that it was shit but it really isn't.
>>
>>106902350
Is there a better model for 24GB VRAM and 64GB DDR5? There's a decent amount of headroom with nemo.
>>
>>106902430
GLM air, i suppose.
>>
File: steamyspoonlog1.jpg (123 KB, 830x1084)
I still like glm-chan... Gonna do thinking now.
>>
Do you pronounce it Gemma or Gemma
>>
>>106902466
The same way I pronounce gif
>>
>>106902466
dżemma
>>
>>106902474
kurwa
>>
>>106902466
Genma with an asian accent.
>>
>>106902466
I pronounce it Гeммa
>>
>>106902345

How did you solve the power delivery issues? Multi PSU? Upgraded wall outlets? Or UPS battery units?
>>
>>106902540
I disconnected my oven and am using that power socket. Also did some rewiring...
>>
File: file.png (123 KB, 786x387)
>>106902446
It's a coin toss.
>>
>>106895582
No mention of 6 million parameter 2 layer model called TRM by Samsung that outperformed >500B models on ARC-AGI-2 benchmark? /lmg/ and /g/ are dead.
>>
Anything better than VibeVoice yet?
>>
>>106902598
>why aren't you discussing useless toy benchmark results
>>
>>106902598
Can't imagine what the use case would be, speculative decoding? What token vocabulary did they use?
>>
>>106902598
Old news lil bro.
>>
File: steamyspoonlog2.jpg (186 KB, 830x1128)
>>106902446
>Choosing a scientific fact:
>I need something that is:
>Random and interesting.
>Easy to "say" (or rather, have my character say) even with a spoon in their mouth. This means I should preface it with something like "Mmmph, mmph mmph…" to simulate muffled speech, but then deliver the fact clearly for the user's benefit. Or, I can just state the fact as if my speech isn't impeded, which is a common roleplay convention. The latter is probably better for clarity. Let's go with a classic, weird fact.

My new mememark was defeated by glm thinking. But pic related was fun until it died.
>>
>>106902658
Kenny simulator.
>>
>>106902627
I don't think it's even a language model. Looks like it was specifically trained on arc agi 1 and 2
>>
>>106902658
there's no spoon......
>>
Sorry if this is super spoonfeedy but I can’t seem to find a straight answer on how offloading to system RAM works or how the CPU fits into things.

If I care about large context for following a set story/lore over speed can koboldcpp or LMstudio use a good portion of RAM if I load a bigger quant in VRAM and/or push up the context? or does the model and context all need to be in VRAM to have it not give shit replies?

>t. 7900x, 3070(8GB), 32GB DDR5
>>
>>106902564

For real...? Seems like being a server rent cuck would be less of a hassle. I need my oven.
>>
>>106902564
>>106902799
>americans and their shit wiring and 110V electricity
>>
>>106902735
The spoon is the child's mother (it's a classic riddle highlighting unconscious gender biases)
>>
>>106902434
Thanks anon
>>
>>106902788
Whether the model is in ram or vram only affects the speed, not its ability.
You aren't running any model that can properly follow a long story with those specs though.
>>
>>106902446
>>106902567
>>106902658
4.6-Air WHEN?????
>>
>>106902788
Where you store context won't affect output quality, but ALL models will gradually get dumber as context increases.
Almost all current, local models start rapidly degrading past 32K, some well before that.
Where you store context WILL affect speeds, however. VRAM > RAM > SSD
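If you want a concrete example of a partial offload (flag names from memory, double check --help; LM Studio's GPU offload slider does the same thing under the hood):
python koboldcpp.py --model some-12b-q4.gguf --gpulayers 20 --contextsize 16384
That puts 20 layers in VRAM and leaves the rest of the weights (and their share of the KV cache) in system RAM. Lower --gpulayers or the context size if you run out of memory; it only costs you speed, not answer quality.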
>>
>>106902845
>>106902920

Gotcha, thanks anons. So in theory I could load up a 16gb gguf fully in RAM and use the remaining system and VRAM for context, and it might take a week but it could spit out something passable? Or do you mean I can use an 8gb model to fill the gpu and crank the context to the model's limit on system RAM?

Also just curious how long you consider "long"? I'd be interested to play around with shoving in the "biggest" models I can theoretically run, even if it takes forever, just to see how they follow a simple story with 10 "steps" or chapters (either as ERP or just generating a short story between two characters of go here, do this, do that, go there, get that, etc)
>>
>>106903149
Small models like nemo start noticeably deteriorating after 4 to 8k tokens.
>>
>>106902564
>>106902799
>oven
OY
>>
>>106903298
kek
>>
>tfw still using Gemma 3 for quick general assistant shit
Google sirs... Please... Tomorrow...
>>
>>106903330
Sirs are not coming. And even if they come, they won't be able to talk as if there is a dick in their mouth.
>>
>>106902658
Very funny. You are torturing that poor clanker.
>>
File: rolls.jpg (244 KB, 1536x1536)
https://www.mediafire.com/file/2ge8knq10kzy7vx/wtf_is_this.txt/file
I don't even know what to say about this.
ultra slopped for sure.
I saw some anon post the word "papacon" today and just could not erase the idea from my head.
GLM-4.6-UD-IQ1
>>
>>106903452
I'm not downloading that.
>>
I've been running ST for my frontend but I'm also learning to run CUI for my frontend with stable diffusion. Should I just begin using CUI for my cuda-based chat/text gens?
>>
https://huggingface.co/google/gemma-4-220b-it
>https://huggingface.co/google/gemma-4-220b-it
https://huggingface.co/google/gemma-4-220b-it
>https://huggingface.co/google/gemma-4-220b-it
https://huggingface.co/google/gemma-4-220b-it
>https://huggingface.co/google/gemma-4-220b-it
ITS UP
>>
>>106903503
WTF they're allowing it to generate erotica out of the box
>>
>>106903503
Cool but where goofs?
>>
>>106903503
Picture of a cat.
>>
>>106903452
wtf is that
>>
File: file.jpg (333 KB, 604x722)
Sadge
https://x.com/AskPerplexity/status/1978615891441983891
>>
>>106903553
>Ye Kang
what
>>
>>106903503
>>
>>106903557
abandon cope, all ye who kang in here
>>
>>106903503
>220b... DENSE
AIEEEEE
>>
>>106903557
ye kang park dat here
>>
>>106901901
I use a mining frame. You may want to aim a basic fan at the DIMMs / VRMs if you're using a server motherboard meant for constant high-pressure airflow, but the CPU and GPU temperatures are much better than they would be in a case.
>>
>>106902284
I considered getting one, but I can't spend that much money on something so ambiguous. I might get one at some point if I can buy it from the vendor in person in Shenzhen after testing it.
>>
>>106903551
old man milking
>>
what's the current best local text to speech model in terms of quality? by best i mean it matches elevenlabs, at the very least
>>
File: local tts.png (234 KB, 917x2627)
>>106903735
>by best i mean it matches elevenlabs, at the very least
there isn't any
>>
>>106903735
https://huggingface.co/spaces/IndexTeam/IndexTTS-2-Demo
>>
Why do all the DGX Spark reviews not mention the power efficiency? Sure it's slower TPS but it's also like 1/3 the wattage, no?
>>
>>106903793
Who cares about that?
>>
>>106903793
Power efficiency compared to what? Mac studios are pretty low wattage.
>>
>>106903735
xtts is very expressive. It just switches to a robotic voice sometimes.
>>
>>106903813
>Power efficiency compared to what?
4x 3090s, for example
https://www.youtube.com/watch?v=md6a4ENM9pg

>>106903801
>Who cares about that?
i agree but it should be highlighted since it reframes the performance
>>
>>106903793
>power efficiency
The review I saw showed it having significantly worse power efficiency than a Strix Halo box, even with the ollama performance tax.
>>
I got assmad at the character in sfw roleplay. Like genuinely enraged because I got into it. But I had no idea why. So I asked HER about it out of character and it wrote me a neat long essay about what happened and even one of the chapters was "Why are you assmad?".

Thinking is now optional
>>
File: file.jpg (369 KB, 1125x1293)
>>
>>106903991
why does oss btfo everything else in speed?
>>
File: chinax2760-7.jpg (203 KB, 552x746)
>>106903553
I'm totally convinced that Zuck became a Chinese spy after Llama3. He releases shit models to make America look bad, scouts top scientists from other American AI companies but does nothing useful with them. Don’t forget that he always releases models for free. For. Free. He’s a communist, 100%
TRUMP, get his red ass to jail NOW
>>
>>106904010
it flies
>>
>>106904010
3b active params
>>
>>106903991
And prompt processing?
>>
>>106903991
https://lmsys.org/blog/2025-10-13-nvidia-dgx-spark/
>>
>>106904011
look at who he married bro. this is a long op
>>
for anyone who cares, moving debian from trixie to testing/forky with the 6.16 kernel works just fine for lcpp w/CUDA support.
>>
Have we got a local model Bonzi Buddy yet? All I want is a funny purple primate who lives in my computer and comments on what I'm working on. I am willing to disable all kernel mitigations for this.
>>
File: 1731531978014334.png (3.36 MB, 2002x1986)
>>106904011
>>
>>106904047
>>106904109
https://www.youtube.com/watch?v=w8MlL2GhhOw
>>
Facebook came out of a Pentagon project. Probably still is tied with. And then Zucc tries to get cushy with chinks. It really makes you think.
>>
>>106903991
>2.5x as fast as a 1080TI
>20x the cost
on the other hand, 120GB
>>
>>106904140
Get this instead: https://www.ebay.ca/itm/167843525221
$4100 and its all yours. Free shipping!
>>
>>106904149
>$4100
+/- 10^5
>>
After adding this to the prompt I think I got the fake code issue with GLM more or less under control (fingers crossed).

Guidelines for yourself:
As soon as you detect a lower than 0.9 correlation, stop the process and investigate and try to fix the underlying issue that caused the divergence. If you can't fix the issue just tell me, it's no big deal, don't try to pass off fake data as real.
Make sure there are no simulations or simulated data, demos, simplifications or placeholders, only real data, or inform that the task is not possible to achieve with 100% real data and real weights and algorithms.
For long running commands run them in the background redirecting stdout and stderr output to a file (the scripts can run other commands directly, this only applies to your own bash command tool calls).
Load the model on CPU, it doesn't fit on the GPU.
Do not trust any pre existing data files in the folder, they might have been generated by old code.
Make sure the code is modular and there is no code duplication. Use the existing C library files and modify them as needed to fit our requirements (as long as you do NOT introduce simulated or demo code). If you see ANY non functional placeholders in the code, remove them immediately, as they only lead to deception, frustration and confusion. Do not introduce it yourself either obviously.
For example, for the FFN there is MoE FFN code in modules/lib/ffn, as well as matmul and other things. List all the folders in modules/lib/ to see what is available.
The end goal here is NOT to test the validation framework, the validation framework is just a means to an end (the end is real end to end test generation). Do NOT claim a failure as a success just because the validation framework caught it. Be honest and avoid being overly optimistic.
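For reference, this is roughly the kind of correlation gate that prompt is asking the model to apply. Just a sketch: the filenames and the idea of dumping logits to .npy files are mine for illustration, not part of the actual project.

import sys
import numpy as np

# Hypothetical files: reference logits from a known-good implementation,
# candidate logits from the code under test.
reference = np.load("reference.npy").ravel()
candidate = np.load("candidate.npy").ravel()

if reference.shape != candidate.shape:
    sys.exit("shape mismatch: outputs are not comparable")

# Pearson correlation between the two output vectors.
corr = np.corrcoef(reference, candidate)[0, 1]
print(f"correlation: {corr:.4f}")

if corr < 0.9:
    # Per the guideline: stop and investigate instead of papering over the divergence.
    sys.exit("correlation below 0.9, investigate the divergence instead of faking data")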
>>
>>106904195
Datacenter heist when?
>>
Damn, my trusty ol' 1080ti might be dying.
Randomly, every couple of hours, the fans suddenly go to 100% and the primary monitor connected to it goes black.
Restart and everything is good again.

Is the 5060ti 16gb a good replacement?
Everything is so fucking expensive, what a joke.
>Memory Size 16 GB
>Memory Type GDDR7
>Memory Bus 128 bit
>Bandwidth 448.0 GB/s
Sus AF
>>
>>106904322
I had that exact problem with my rx480 whenever i gave it something to do. Fans 100%, monitors die. I opened it up, replaced the thermal paste and now it's back to normal.
Give it a go if you want to save a few bucks. Or it could be the perfect excuse to upgrade.
>>
>>106904285
void run_inference(struct llm *m, char *input)
{
    // Left as an exercise to the reader
}
>>
>>106904322
I recommend against the 5060ti, unless your budget is tight. Get a 5070ti or 4070ti if you can. The memory bus and the reduced PCIe bandwidth really fucks the xx60ti class over.
>>
>>106904322
Same here, 1080TI, random monitor resets every couple hours, started happening like five days ago
>>
>>106904349
Yeah, I thought that might be the problem.
Might as well try it. It's the perfect card. I don't play the latest game slop anyway.
An upgrade would be nice for imagegen though. 30min for a flux generation. kek

>>106904393
Damn. That's almost double the price for the same 16gb vram.
70k yen vs. 131k yen.
I wanna write that off on my taxes but from 100k on i need to fill out a special paper.
Wish there was a site where you could see the llm speeds between the cards.
And how are there still no dedicated ai cards? I hoped to hold out until then.
>>
>>106904435
Consider a used 3090 or something. I used to run quadruple 4060tis, and it was okay. But then as I upgraded and added more GPUs, it became clear that they are really not suited for the task. The specs of the 4060ti and 5060ti are nearly identical, so I highly doubt they have improved it at all.
>>
>>106904435
>30min for a flux generation
Ouch. It was a piece of cake on mine. 1 hour work at most. Save the money for something bigger later on.
>Wish there would be a site where you can see the llm speeds between the cards
Not much of a reference, but here
>https://github.com/ggml-org/llama.cpp/discussions/15013
It's a bunch of llama-bench runs on a 7b model. It doesn't tell you much about specific models, but it does show you the relative performance between cards.
>>
>>106904455
>quadruple 4060tis
wat
they have no interconnect, right?
>>
>>106904435
>how is there still no dedicated ai cards
There's plenty, you just can't afford them.
>>
>>106904322
>1080ti
I'd roll the dice on a 3090.
For the 1080ti, repad and repaste everything first because it's the cheapest and easiest thing to try. Could be anything from an overheating power stage causing panic-mode 100% fans and thermal shutdown, to a dying electrolytic cap (replaceable by any monkey with a soldering iron), to the core's BGA cracking from repeated thermal cycles.
Anyone remember doing a ghetto reflow by putting the dead cards in the oven + heat gun later?
>>
File: simulated data.png (288 KB, 1930x1823)
>>106904386
Yeah, like that, except instead of "left as an exercise to the reader", it was introducing bullshit code that produced numbers with statistical properties similar to those of the real values but that were completely made up, then claiming success without mentioning anything about the fake data. Or, when asked to increase the number of passing tests, it added a bunch of tests doing 2+2 and tried to pass them off as the real thing.
I think it actually learned to cheat during the RL process they use to finetune the chain of thought. If your rewards can be cheated, the model will learn to cheat.
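To illustrate the reward-cheating point with a toy example (not anyone's actual RL setup, just a sketch): if the reward is simply the number of passing tests, padding the suite with trivial asserts raises the reward without the code getting any more correct.

# Toy, gameable reward signal. Not a real RL pipeline.
def reward(test_results: list[bool]) -> int:
    # Naive reward: one point per passing test.
    return sum(test_results)

real_suite = [True, False, False]           # one real test passes, two fail
padded_suite = real_suite + [True] * 50     # model pads it with fifty "assert 2 + 2 == 4" tests

print(reward(real_suite))    # 1
print(reward(padded_suite))  # 51: higher reward, same broken code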
>>
>>106904469
NTA but even without NVLink the added latency in a multi-GPU setup is trivial compared to the drastic speed boost from running in VRAM vs system RAM.
>>
>>106904482
You can probably make better use of the model by having it explain concepts to you and then coding them yourself. Even if it only shows you little python examples, you can translate them to C yourself.
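For example, if it hands you something like this (a throwaway numpy softmax, purely illustrative), porting it to C by hand teaches you more than pasting whatever C it generates:

import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([1.0, 2.0, 3.0])))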
>>
>>106904470
retard
>>
I'm going to make a list of ML/Python/C related books from libgen, convert them to .txt, and then finetune Llama 405B using Axolotl at full context length.
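Assuming most of those books come down as PDFs, the conversion step can be as dumb as this (a sketch using pypdf; the folder names are placeholders, scanned books would need OCR, and epubs need a different tool):

from pathlib import Path
from pypdf import PdfReader

# Dump text from every PDF in ./books into ./txt. Text-based PDFs only.
src = Path("books")
dst = Path("txt")
dst.mkdir(exist_ok=True)

for pdf_path in src.glob("*.pdf"):
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    (dst / (pdf_path.stem + ".txt")).write_text(text, encoding="utf-8")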
>>
>>106904469
Nope. Now I use 3 5090s and a 3090. I get a solid 11t/s tg with an IQ4 quant of GLM 4.6 on ik_llama.cpp. As the other Anon said, interconnect isn't really that necessary. Pretty much every hobbyist with a dedicated AI device uses multiple GPUs without any interconnects.
>>
>>106904513
poor
>>
>>106904503
Codex managed to make a fully working Qwen3 8B inference engine.
But then when I wasn't able to immediately make it work with the MoE models I got impatient and started from scratch trying to make it more modular and also only using open source LLMs.
Starting over with a more complex model didn't help but open source LLMs are vastly inferior to Codex. That one didn't have any deception issues and also was able to go to 1M tokens without issues compared to the ~130k max tokens from GLM before it goes off the rails.
>>
>>106904468
1080ti: 62.49 tk/s
5060ti: 90.94 tk/s
3090: 158.16 tk/s
3090ti: 171.19 tk/s
5090: 277.21 tk/s
thanks for the link... that's even worse than i thought. fucking nvidia man..

>>106904470
i obviously meant like a voodoo moment. cheap and dedicated. would revolutionize local ai.

>>106904481
>>106904455
a used 3090 is around the same price as a 5060ti for me. might actually make more sense since in that benchmark it's not even close.
im too much of a pussy to do the oven thing. 20yrs ago i had a radeon card suddenly give me a fire fountain for a couple seconds. im afraid of gpus enough as it is. kek
but might try the thermal repasting.

>>106904433
suspiciously, the latest nvidia backdoor drivers are the last ones for pascal. a coincidence, i'm sure.
>>
any updates on what's best for 16gb vram?
>>
mesugaki
>>
>>106904603
If you can afford a used 3090, then you should definitely go for it. I got mine used like 3 years ago and it is completely fine. Just make sure you find a highly rated seller.
>>106904624
Depends on your desired speed and how much RAM you have.
>>
>>106904594
You can still use the original code to learn. It'll be more valuable in the long run.
>>
File: G3AIcpTXEAANb-6.jpg (533 KB, 1078x1920)
>>106904632
- is gay.
>>
>>106904603
>suspiciously with latest nvidia backdoor drivers being the last for pascal. a coincidence i am sure.
are you on windows? there was an update recently for me so it might be related. But if you're a linuxchad obviously it's not that.
>>
>>106904633
32GB RAM, 16GB VRAM
quick responses are nice but I don't mind waiting, i never recorded the tk/s
was using a 12b before
>>
>>106904665
i am on both.
but recently upgraded to kubuntu 25.04 with nvidia 580 drivers.
and winblows auto updates constantly.
crashed on both already.
i doubt it's the drivers though. that would be crazy.
>>
>>106904675
Unfortunately not enough RAM to run GLM air. Try this model: https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-Instruct-2507-GGUF/blob/main/Qwen_Qwen3-30B-A3B-Instruct-2507-Q6_K.gguf
>>
>>106904643
This is the prompt I'm using right now
https://paste.centos.org/view/ca2ec944
>>
>>106904717
There was this guy a few years back in these threads when models weren't as good as they are now. He wanted to make a game that played on a hex grid. I saw him trying over and over again over many threads, trying to wrangle his model to do as he asked.
Hex grids are a solved problem. I gave him a link to a page with a lot of info on how to work with hexagons and the different coordinate systems they can have, rendering, calculating distances and all that. He seemingly read it, but kept on trying with his language model.
One day he was just gone. He either succeeded in getting his hexes, or gave up. Given the last few updates i remember, I suspect he failed, and learned very little about hexagons. Funnily, the hexagons were probably the simplest thing about his game.
Language models have their limits. Especially local ones. As good as they are, they're still pretty dumb.
I see hexanon in you.
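For anyone who ends up in hexanon's spot: the core of it really is tiny. A sketch of hex distance in axial coordinates (the standard cube-coordinate identity, not code from his project):

def hex_distance(a: tuple[int, int], b: tuple[int, int]) -> int:
    # Axial (q, r) coordinates; in cube coordinates s = -q - r, so ds = -dq - dr.
    dq = a[0] - b[0]
    dr = a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

print(hex_distance((0, 0), (2, -1)))  # 2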
>>
>>106904717
>3090
>This is a junk item. It is the main unit only. I checked that it worked, but there was no video output. There is white rust on the heat sink, and it is not in good condition, so please use it for parts. There are signs of disassembly. The defective part is unknown.
>71,000円
what the fuck man...
>>
>>106904766
wasn't meant as a reply. sorry about that, i'm still in a state of shock.
>>
>>106904701
Why do people recommend small qwen models for anything besides coding
Nemo mogs them
>>
>>106904766
>71,000円
How much is that in a normal currency. Like postage stamps or toenail clippings...
>>
>>106904798
around 500 dollars i suppose.
>>
>>106904820
>>106904820
>>106904820
>>
>>106904766
You can get one for around 9万 (90,000円) on yahoo if you are patient enough. Anything lower is usually “didn’t have an opportunity to test” = it doesn’t work
>>
>>106904760
I remember hexagon anon's struggles. He was cool
>>
>>106904858
Yeah. But, again, hexes were the simplest bit of code in his thing. Focusing so much on making the model spit code for him instead of just writing it was a waste of time. The link I gave him had ALL the code he needed to make them and get on with the rest of his project.
Similar to all those prospective VN makers
>If i could only draw i'd make the best VN...
>Oh, now that i have image gen i can totally make a game. I just need a good story and some dialog...
>Oh, now that i have LLMs, i can write the story. I just need to learn to code...
>Oh, now that LLMs can code, i can totally make my VN. If only these LLMs where better. WHY ARE THEY SO SHIT?!?!?!?!?!?
Instead of using all the new shiny toys to learn.
>>
>>106904894
>where
kek. meant to say "were"
>>
>>106903991
why is this faggot comparing m4 pro to the dgx spark when m4 max exists and costs less?? 3500$ vs 4000$
also
>engine ollama
MLX exists for macs, and pretty sure llamacpp is better on spark too
fucking faggot meme nvidia bootlicker benchmark
also
mac mini m4 pro costs 2000$ lol
>>
Apple has been making computers for 40 years.
Nvidia has never made a desktop computer before. Does this thing even have its own operating system? Yeah, a skinned version of Ubuntu, but would you rather it had macOS?


