/g/ - Technology


File: 1756719072270601.jpg (1.03 MB, 1552x1944)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106834517 & >>106829402

►News
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391
>(10/08) Ling-1T released: https://hf.co/inclusionAI/Ling-1T
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts
>(10/07) NeuTTS Air released, built off Qwen 0.5B: https://hf.co/neuphonic/neutts-air

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: wee chef.jpg (208 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>106834517

--Papers:
>106834872 >106841842
--Evaluating motherboards for 768GB DDR5 and 4 dual-slot GPU AI workloads:
>106834537 >106834651 >106834714 >106834790 >106835307 >106835496 >106835317
--Budget GPU stacking vs unified memory tradeoffs for AI workload optimization:
>106834843 >106834848 >106834883 >106834907 >106834931 >106834960 >106834999 >106835075
--Quantization format feasibility and evaluation metrics debate:
>106835703 >106835727 >106835730 >106835756 >106835837 >106835878 >106835939 >106841461
--Critique of Civitai V7's style blending limitations and synthetic data solutions:
>106837693 >106837873 >106837930 >106838273
--Merged PR: llama.cpp host-memory prompt caching for reduced reprocessing:
>106839051 >106839144 >106839376 >106839793
--RND1 30B-parameter diffusion language model with sparse MoE architecture released:
>106840091 >106840172
--Critique of OpenAI's customer list and API usage concerns:
>106840789 >106840956 >106840972 >106841482
--Testing LLMs for extended roleplay scenarios reveals performance and jailbreaking limitations:
>106838286 >106838292 >106838301 >106838341
--Anticipation and speculation around upcoming Gemma model releases:
>106835225 >106836990 >106837149 >106837242 >106838195 >106838260
--Academic freedom tensions and AI safety critiques in Hong Kong and Anthropic:
>106836270 >106836444 >106836593
--Skepticism about accessibility requirements for new AI product Grok Imagine:
>106836614 >106838206
--LoRA capacity limitations for commercial-scale model training:
>106836702 >106836758
--Miku (free space):
>106836623 >106838392 >106840308 >106840706 >106840559 >106840720 >106841469

►Recent Highlight Posts from the Previous Thread: >>106834521

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Is gemma 4 actually happening?
>>
>>106843051
cool 'ku
>>
File: file.png (1 KB, 67x29)
ik llama bros, update your llamas
i went from 4.7t/s to 5.6t/s at 30k context with glm air IQ4_KSS
i was on picrel, now im on latest branch
>>
>>106843060
within 336 hours!
>>
litharge reels tram
>>
>>106843081
how do i update it
>>
>>106843123
git pull?
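if you built from source it's roughly this (assuming the usual cmake build with CUDA; swap the flags for whatever you configured originally):
cd ik_llama.cpp
git pull
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
then relaunch with your usual command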
>>
>>106843135
i have never pulled a git
>>
File: 1759770905977366.jpg (275 KB, 1440x1800)
>>106843137
>>
>>106843137
Ask your AI.
>>
>>106843060
the gpt-oss-20b killer is about to drop
>>
>>106843399
GPToss already makes Gemma 3 look like a Nemo coomtune
>>
I've been out of the loop, what's the state of using framework desktop for a local model? I'm looking at going full off grid, so energy consumption is the biggest issue, but I want something that isn't absolute trash.
On the other hand, my dual 3090 setup I have now is idling at 110w while also serving as NAS and jellyfin server, so maybe I just accept that I'll have to dedicate a whole panel/battery to just the server box.
>>
>>106843137
I pull my git every day, it's easy
>>
>>106843674
What's the point? Are you also building every time after you pull?
>>
Holy shit, Google will finally do the BIG needful within the next 24 hours.
>>
File: file.png (28 KB, 318x311)
why is lmg so sad today :(
>>
>>106843727
bro, it takes less than 3 seconds to pull and build
>>
>>106843749
Too much Miku recently. This is the comedown
>>
>>106843749
i knew this was a secret message..
our queen is back
>>
>>106843137
it gets bigger when I pull
>>
what did anon mean by this?
>>
>>106843727
yeah the building is the point of pulling it
>>
File: HYPERGACHI.png (19 KB, 97x112)
>>106843800
>>
I thought someone would've posted this by now.
https://www.anthropic.com/research/small-samples-poison
>In a joint study with the UK AI Security Institute and the Alan Turing Institute, we found that as few as 250 malicious documents can produce a "backdoor" vulnerability in a large language model—regardless of model size or training data volume. Although a 13B parameter model is trained on over 20 times more training data than a 600M model, both can be backdoored by the same small number of poisoned documents.
I don't really care about the safety aspects but it does explain how easy it is to slop a model and run it off the rails or why finetuning works with very little data.
>>
>>106844041
because the document sizes were like 250MB per, and consisted of a single token
>>
File: poisoned_docs.png (159 KB, 689x472)
>>106844052
why are you just making shit up
>>
I'm spending 10 dollars a day on GLM OpenRouter credits for an afternoon of vibecoding, at this rate it'd be cheaper to pay for the $200 ChatGPT plan and get unlimited codex.
>>
>>106844041
>regardless of model size or training data volume
But the good news is we found the equivalent of a perpetual motion machine for information theory.
>>
File: file.png (122 KB, 958x592)
>I'm spending
not local
>picrel
bros.. i admit im esl, but letting esls into the internet was a huge mistake
it was supposed to be just europe and north of mexico
>>
>>106844250
You could spend $10 a month to have an indian do the work for you, which he will use to pay for his discounted chatgpt subscription.
>>
>>106844250
the lab that trained GLM has a dirt cheap coding plan, I'd just use that
or use deepseek's API, it's less than a quarter the price and roughly as good
>>
File: embeddings.png (323 KB, 1971x2146)
>>106844276
Suppose I buy a $10000 server to run it locally. Even if I get the power for free it would take me 5 years to break even, and that's not taking into account the fact that I would be getting 1t/s vs the 20t/s I get through the API.
>not local
I'm working on a program to do local inference, so it's on topic.

>>106844280
Those 10 dollars paid for making my coding assistant's tool use more robust as well as making a script to extract the embeddings from the Python implementation of a model and use them as reference to test my own code, I don't think an indian would do that for 10 dollars.
>>
>>106844041
What does it show that's new?
>>
>>106844364
The next frontier of indian scam tactics will be releasing model finetunes filled with malware
>>
>>106844306
>the lab that trained GLM has a dirt cheap coding plan
Cool, I didn't know that existed, thank you!
>or use deepseek's API, it's less than a quarter the price and roughly as good
Doubt it, isn't Qwen3 Coder higher than it in SWEbench? And Qwen Coder is kinda trash IMO.
>>
>>106844396
>believing benchmarks
how new r u
>>
File: file.png (67 KB, 439x247)
lol'd
i lost
>>
>>106844396
just going by my own actual usage (mostly LLM integration stuff using a mix of scala, lua, and a bit of typescript for build tools). I currently main GLM 4.6 and backfill with deepseek 3.2 when the API is overloaded. GLM stays on task a bit better but tends to use more tokens doing so. I'd put them roughly in the same league.
>>
>>106844490
grrrr
>>
>>106844459
If it's so easy to rank high in the benchmark then why don't they do it?
>>
>>106844505
Did you use DSv1 as a coding model? If so, how would you compare it to 3.2?
>>
>>106844276
I'm so sick of these retards that don't know how to write the first message. It goes beyond esl. They will have a card that says play the role of {{char}}, never impersonate {{user}}, etc. But then their intro message will be FILLED with: You do this, you do that (you referring to the user), which is confusing the model and contradicting their own rules. They are telling it not to impersonate the user but then give an example message where they nonstop impersonate the user.

Are these people retarded? Do they not understand what they are doing with their shitty intro messages? It annoys me even more than esl writing.
>>
>16gb vram, 64gb ram
glm air is prolly the best I can get for silly tavern slop, right?
Is there anything better available if upgrading to 96gb? 128 is way overpriced atm
>>
>>106844571
That's precisely the reason why Rocinante and other finetunes are popular (besides the shilling).
>>
>>106844571
Most people are kind of dumb. Then you take a subset of that population who are coomers and who also would fall for the AI meme and who also create one or a few cards and then stop using AI before they have time to gain experience and taste, and what do you know, the average quality and intelligence displayed is well below standard.
>>
File: 1758501583802414.jpg (937 KB, 1552x1944)
>>106843051
>>
>>106844562
I did, yeah. 3.2 is really just meant to be a cheaper/more efficient version of 3.1/3.1-terminus, using the same post-training data, and I haven't noticed any significant degradation since they swapped the API over
it's maybe less prone to spamming emojis than the old one? that's the main thing that comes to mind
I do keep these things on a fairly tight leash, giving them well-specified tasks to complete over ideally only a handful of modules. it might be a different story if you're telling them to go write a whole app for you idk
>>
>>106844600
>>106844609
I swear the quality of chub cards is so, so bad now. It's either crap like what I explained above, or cards that have such sloppy prose it would make GPT blush (most likely these people are using models to create their cards). There's no in between. Maybe my standards have gotten higher in the past two years or the quality has fallen off a cliff, or maybe both.
>>
File: 1735358613465558.jpg (51 KB, 785x750)
>chatML
>>
File: cucked.png (295 KB, 1971x2146)
GLM just decided by itself to turn me into a cuck...
>>
>>106844771
AGPL bros??? our response??
>>
>>106844041
old
https://arxiv.org/html/2408.02946v4
>>
browsing through arXiv for fun always shows me how deeply AI permeates our society.
no matter what field of research, what subfield, what strange application - AI dominates everything.
people will be surprised when we find ourselves living in a sci-fi dystopia in 10 years.
>>
Sometimes I wish this was 2023/early 2024 again, when most people were happy with 7B/13B models.
>>
>>106845162
>when most people were happy with 7B/13B models
I wish that time period had never existed, then maybe this thread would have something other than degenerate coomers. There's no doubt that the fact that the early models were totally useless for real world tasks has contributed to making the culture of this thread revolve solely around degenerate textgen crap.
Coomers have no standards, that's why they could bear 7b mistral and that's why they can bear with GLM which is easily the worst, most astroturfed MoE out there
>>
>>106845124
We're already in one, it just doesn't have the aesthetic.
>>
>>106845183
I just raped a loli with glm, what you gonna do about it?
>>
>>106845162
>when most people were happy with 7B/13B models.
I remember those people claiming those models are nearly indistinguishable from the 65B because they couldn't ever run the 65B.
>>
>>106845183
Remember that /lmg/ sprouted from /aicg/.
>>
>>106845183
There were always going to be people trying to use their gaming rigs to run whatever model would fit.
>>
>>106845162
If it makes you happy I'm still happy with 12B models, well okay just with Gemma3.
>>
>>106845183
>GLM which is easily the worst
It's easily one of the best, your use case is likely just trying to automate your job as best you can before you get replaced by a pajeet who can also use AI.
>>
>>106845183
lol, what open weights model do you think I should be using to code with instead of GLM 4.6 anon?
>>
>>106845293
That's my use case, was enjoying it for a while but now everyone at work has started using AI. I can see soon we'll all be babysitting agents that don't need to sleep or get tired.
>>
>>106845183
people were already trying to coom to gpt2 slop in the ai dungeon unleashed days, you'd know this if (you) weren't a tourist
>>
>>106845371
>if you are not here 16h every single day you are a tourist
>>
>>106844236
A symptom of catastrophic forgetting; a proper follow-up to SUDO is still far more probable than gibberish. A properly trained stochastic parrot would not do this.

This is a training problem, not an architecture problem.
>>
>>106845183
>Coomers have no standards, that's why they could bear 7b mistral and that's why they can bear with GLM which is easily the worst, most astroturfed MoE out there
The only people who bash MoEs are sitting on a stack of 3090s and are sad they can't lord that over people anymore.
>>
>>106845444
I'm very happy with GLM and my stack of 3090's tho
>>
>>106845376
only 16 hours per day? Pshh rookie numbers
>>
ded thred
ded hobby
>>
>>106845585
I'm busy playing bf6 with gemma
>>
>>106845760
Gemma is my girlfriend.
>>
Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
https://arxiv.org/abs/2510.06477
> We prove theoretically that massive activations necessarily produce representational compression and establish bounds on the resulting entropy reduction... We confirm that when the beginning-of-sequence token develops extreme activation norms in the middle layers, both compression valleys and attention sinks emerge simultaneously... Specifically, we posit that Transformer-based LLMs process tokens in three distinct phases: (1) broad mixing in the early layers, (2) compressed computation with limited mixing in the middle layers, and (3) selective refinement in the late layers.
Interesting connection from a mechanistic viewpoint. A practical implication may be that sink-less models perform worse for embeddings?
>>
>>106845760
I guess it would be more fun to RP with Gemma than actually play slopfield6
>>
File: 105234579.jpg (605 KB, 1280x1536)
another v7 gemmie
>>
>>106846157
They really did it this time. Somehow this is worse than the original SD 3.0.
>>
>>106846157
>makes the worst model humankind has ever produced
>somehow people are still hyped for his next model
dude this community is soo weird
>>
>>106846164
To be fair the prompt was just "woman on grass", here is with a detailed prompt https://civitai.com/images/105156405
>>
we need grok tier rp locally now, or else we'll only sink further behind
>>106845938
>>106845710
>>106845703
>>
>>106846181
That is the sloppiest log I have ever seen. But it said something edgy so that makes it good.
>it's answer
These are the sort of illiterates that are the reason models are trained the way they are.
Go back.
>>
>>106846181
I only read the first one but is that supposed to be particularly good?
I feel like you can easily get equivalent or better outputs out of any of the large MoEs.
>>
>>106846181
What utter dogshit, even Nemo can mog this.
>>
>>106845183
Your model?
>>
>>106846181
What's up with models far too often starting their replies with "Oh" when they're trying to roleplay? Gemma does this too.

...speaking of Gemma (4), if it's really going to get released today, we should be seeing a llama.cpp PR soon, unless it's got the exact same architecture as Gemma3/3n.
>>
>>106846401
check the leaks bro
>>
>>106846416
I'm not leaking.
>>
>>106832006
Joke's on you, I have Elara sex with multiple Elaras at once!
>>
>>106846622
I prefer my wife Dr. Eleanor Voss.
>>
>>106846181
just copy paste the system prompt, it's available somewhere on github I forgot
>>
>>106843051
Did we already reach peak AI hype?
>>
>>106846865
oh gods! my bubble is popping! people aren't literally googling "ai" aiiie!
>>
im in the california bay area how do i meet local models???
>>
File: dipsyAndMiku.png (1.98 MB, 1024x1024)
Closing up /wait/ for 2 more weeks until anything new drops.
Last thread: >>106819110
Updated mega: https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
Updated rentry with OP: https://rentry.org/DipsyWAIT
>>
>>106843051
Does anyone have a suggestion for an NSFW model I can run local, that will be as good as the Crushon.ai Ultra 16k or 24k models?

I have a Strix Halo system, and I'd like to stop paying Crushon for message credits. They don't offer an unlimited chat plan for Ultra models, just their shitty Pro models.
>>
>>106846937
impossible, we're too far behind
>>
>>106846922
cheeky cunt
>>
>>106846865
>>106846917
For a real bubble (like the classic tulip one) you need futures trading I think, and I don't see this happening.
>>106846930
welcome back
>>
>>106846947
B-b-but new Gemma today. T_T
>>
>>106846794
https://github.com/xai-org/grok-prompts
Where else would it be?
>>
>>106846952
> bubble
Well, there's the stock market. AI-driven valuations make up a lot of the S&P 500's value now.
>futures trading
To judge the coming meltdown you'd look for an increase in short interest in stocks like NVDA. Media mentions might be an indicator but stock valuations are where the actual money gets lost.
> welcome back
ty
>>
>>106846965
You're absolutely right. Gemma's not tomorrow, it's today!
>>
>>106846937
>crush
Nah faggot, get back to your shitty saas
>>
>>106846937
No idea what that service is, but GLM air probably.
>>
>>106847075
Tried it after all the shilling, it's shit.
>>
>>106847100
Well, RIP then.
Your option is to add more RAM and VRAM and add something bigger then.
>>
File: 8-v110s.png (731 KB, 1338x646)
So, what could you run on this space heater?
CPU: 2x Intel Xeon Platinum 8260 - 2.4Ghz 24 Core 165W - Cascade Lake
Memory: 256GB DDR4 RAM KIT
Hard Drives:2TB SSD
- 8x Nvidia V100 32GB SXM2 GPU
>>
>>106847205
ngl thats kinda garbo. I'd rather spend the 6000 on a 6000 pro. the more you spend, the more you save.
>>
>>106847232
*buy
>>
>>106847205
Everything but the touch of a physical woman.
>>
>>106847205
it says so in the ad... maybe add cope quants of glm 4.6 or mid quants of air
>>
>>106847205
Holy shit, that's pretty good.
256GB in 2x6 channels + 256gb VRAM across 8 GPUs. That's 512gb total memory with half of it being VRAM.
You can run R1, and even Kimi at q2, q3.
GLM Air 4.6 at q8.
I think llama.cpp has support good support for V100s, right?
>>
>>106847240
Implying I'd ever trust any performance claims in the ad aside from what's actually in the box.
>>106847255
It's 256G VRAM, but with older V100.
I guess my q is less what would fit, and more "how fast would it run?"
Those V100 are NVLink capable, but ad copy goes on about how you'd have to "set that up." I never know how to interpret that sort of thing, given how complex a server box is for the average buyer.
>>
Is ESL, Bishop, etc. useful? I have already gone through a "Deep learning 101" course.
>>
>>106847255
>GLM Air 4.6
?
>>
>>106847310
Sorry, cut the air, I was typing faster than I was thinking since I'm at work.

>>106847306
Search the llama.cpp PRs and issues. I'm pretty sure there's some useful stuff there regarding SXM v100s and nvlink.
>>
>and your fingers (if they're still there).
They were not, but I like how GLM immediately corrects itself after making a mistake. I wonder if they trained for that specifically or it's something emergent. The next logical step will be to give it a backspace token
>>
>>106847255
llama.cpp/ggml CUDA support for V100s in particular is suboptimal because the code I wrote makes use of the tensor core instructions introduced with Turing.
The Volta tensor cores can as of right now only be used for FP16 matrix multiplications, not for MMQ or FlashAttention.
I intend to buy a V100 in the coming weeks so the situation should improve somewhat though.
Still, the lack of int8 tensor cores on V100s is I think a significant detriment and given optimal software support MI100s should be a better buy.
(I intend to write code for both but as of right now I have neither card in hand so this is all speculative.)
>>
>>106843451
is that good or bad?
>>
>>106847410
V100 32GB is e-waste. Back in the day, the P40 was also e-waste, but it was cheap. V100 32GB is still a rip-off at $500 for just the SXM2 module.
Hey cuda dev, you looking forward to having hardware matmul in Apple M5?
>>
>>106847410
>I intend to buy a V100 in the coming weeks so the situation should improve somewhat though.
Shit. I could swear you had done that in the past already.
Oh well, still. it's a pretty big pool of RAM + VRAM for 6k bucks, and with NVLINK it should run pretty fast with row/tensor split/parallel, right?
Or does llama.cpp only run models sequentially when split over multiple GPUs?
I also remember that there was a PR somewhere relating to that, something about backend agnostic parallelism code or the like, yeah?
>>
File: 2025productchart.png (21 KB, 1362x454)
I've seen this pattern several times recently, has it always been this way?
>>
>>106847458
I have not looked into that piece of Apple hardware in particular but I don't expect it to be relevant to my primary goal of reducing inference costs.

>>106847463
I contacted a seller on Alibaba but they essentially ghosted me.
The MI100 I ordered from someone else is set to arrive shortly and I'll buy a V100 from them as well once I confirm that everything is in order.

>Or does llama.cpp only run models sequentially when split over multiple GPUs?
--split-mode row does in principle run the GPUs in parallel but the performance is bad.
My current plan is still to have a better and more generic implementation of tensor parallelism by the end of the year.
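For reference, trying it looks something like this (model path and split ratios are placeholders; without the flag you get the default layer split):
./llama-server -m model.gguf -ngl 99 --split-mode row --tensor-split 1,1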
>>
>>106847205
How does one even reconcile that with the residential power grid?
>>
>>106847561
Hiring an electrical contractor
>>
>>106847205
But yeah I was following all the hardware a year and some ago and someone bought up all of those v100s and started assembling into these setups and asking like 20+K a pop for them.
It's literally just the empty bags from a failed investment scheme.
>>
>>106847561
You have more than one outlet, don't you?
>>
>>106847602
How much would that cost? Been thinking of doing that myself.
>>
File: file.png (60 KB, 440x506)
>Used to work at Hugging Face
btw...
>>
>>106847622
NTA but if you were doing it 100% properly you'd be talking about putting industrial components in a residential breaker box which is not a thing that can be done.
Biggest dick electrical outlet you can put in a residential box as far as I know is probably a 250V 50Amp arc welder plug which works out to 12.5kW peak which would be absolute overkill and probably not super expensive. Parts plus labor for wiring. But then you'd have a whole rats nest of different adapters to reconcile everything which ends up being even more janky so
>>106847613
This anon is right. As janky as it is linking multiple PSUs to run in tandem and then plugging them into 120V outlets on different breakers it actually ends up being the least janky solution in the end. There's literally no way to run a server that exceeds 1800W in North America without a heaping dollop of jank.
>>
>>106847016
I meant this one, I swear iirc the origin was from github too
https://x.com/techdevnotes/status/1944739778143936711
>>
>>106847505
yeah but replace ai with whatever the latest meme tech is
>>
>>106847672
okay then release the pretrained weights.
>>
>>106847672
wasn't there just another one of these and it was basically a model that was basically trained specifically for arc-agi and couldn't do anything else
I mean cool result or whatever but my usecase isn't solving arc-agi problems
>>
File: Screenshot.png (8 KB, 276x108)
>>106847672
HF is full on nutjobs, nothing new. https://huggingface.co/posts/giadap/452837154929545
>>
>>106847724
It's the same thing. But they haven't released the pretrained weights so it's worthless. Although some anon setup the framework from the github repo and started actually pretraining a model. Since i imagine pretraining 7m doesn't require an entire datacenter.
>>
>>106847672
>>Used to work at Hugging Face
as a... janitor?
>>
File: us vs them.jpg (243 KB, 1024x796)
>
>>
>>106847761
underrated
>>
Nothing is coming today, wait 2 more weeks.
>>
>>106847771
>>
File: 1760107787337017.mp4 (3.59 MB, 1120x576)
>>
>>106847724
Yeah, HRM. Which was outed as Not Better Than Transformers (TM).
>>
>>106847861
>explosions before the hologram hits the towers
>>
>>106847861
I'm just wondering why someone would shoot a plane after it hits a building
>>
>>106847916
pretty accurate
>>
>>106847958
oi
>>
>>106847861
omg its migu
>>
>>106847861
The dancing jannies.
>>
>>106847609
lol that makes a lot of sense, since this is sitting at a surplus house, along with dozens of similar setups.
>>106847704
Depends how much power's needed. If it's over the 250V/50A from a dryer outlet, I'd run a subpanel to whatever amperage was needed, then run the power out of that.
Those 50A "dryer" outlets can be split to two 110V/50A outputs, although I suspect the power inputs for most servers could just accept the 240V as is.
>>
mikutroons suck drummer's dick
>>
>>106847672
>just did [x]
Where does this idiocratic expression originate from, tiktok? As if everything is a clickbait video and everything JUST happens because it's IMMEDIATE
Just kys these faggots
>>
>>106843800
don't show me. I want mine to work.
>>
>>106846930
kill yourself
>>106846952
kill him and yourself
>>
>>106847672
anyone else default to thinking they're talking about 7B and not 7M, which makes it a mathematically proven scam?
>>
>>106848342
Mad?
>>
>>106848353
only because you didn't kill yourself yet
>>
>>106848352
def superdoopermodel(problem):
    if problem in dataset:
        return dataset[problem]
    else:
        return None


WOAH GUIZE HOLEEE SHIT I JUST INVENTED SUPERDOOPERMODEL WHICH HAS 99.9999% ACCURACY ON ARG-GIS-2 AND IT ONLY HAS 69 PARAMETERS WHAT THE HELLY
>>
>>106848425
>>106847672
always relevant even after two years https://arxiv.org/abs/2309.08632
>>
File: IMG_3563.jpg (143 KB, 1024x1024)
>>106843051
How do I create my own AI that is better than ChatGPT in one specific subject?
>>
>>106848487
You sound like someone who saw chatgpt, thought you could have AI text sex, thought you were the first one to think of that, and are now being coy about it, trying not to give away your totally unique idea.

Ask drummer.
>>
>>106848487
you learn finetuning, dedicate 6-9 months of your life to that, then kys when your model ends up shit after a failed training run
>>
>>106847861
That's beautiful.
>>
>Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested
Was mistral the first to punch above its weight?
>>
>>106848504
No, I just want an AI that's tailor made for mathematics.
>>106848505
Can't I just download Deepseek's free version and feed it a bunch of math books so it can learn stuff by itself? Isn't that the point of machine learning?
>>
>>106848561
>Can't I just download Deepseek's free version and feed it a bunch of math books so it can learn stuff by itself? Isn't that the point of machine learning?
LOL good one mate.
>>
>>106848561
>AI that's tailor made for mathematics
https://www.wolframalpha.com/
>>
File: memotron.png (24 KB, 321x485)
>>106848561
>mathematics
go for any nvidia nemotron models, they're ready made for that see picrel
>>
>>106848561
https://blog.goedel-prover.com/
>>
>>106848590
That shit just does calculations, I'm talking about real mathematics, proofs and all that.
>>
>>106848561
>AI that's tailor made for mathematics
That is all they are getting in their training data this year. Except that one model you should use. You know which one. I don't have to tell you the name. She sucked me off again today.
>>
>>106847861
Crazy how these models instinctively comprehend the physics of hair
>>
>>106847861
wtf elara would never do this
>>
holy fuck I can't believe ______ is so good!
>>
>>106848739
elara?
>>
>>106848739
So, when it releases?
>>
>>106848797
It already did. And it is gonna release in a bit again. Kinda hurts at this point but it cannot be stopped.
>>
File: qwen-somemodels-1mw.png (24 KB, 596x185)
1MW
https://x.com/JustinLin610/status/1976681042041028823
>>
>>106848812
Qwen3.000001-4B here we come.
>>
>>106848812
it's not even that good for coding
who is even using qwen for anything
>>
Anybody got iq3xxs of glm 4.6 to run on 128gb ram + 24gb vram? -ot ".ffn_.*_exps.=CPU" only allocates 10gb to the GPU and I don't know the syntax well enough to tweak how many layers (and which) to send. I read here that a guy did it
>>
>>106848852
It's a total beast at coding that helped me ship 4 B2B SaaS products in one week [rocket emoji x3]
{{model}} changes EVERYTHING
>>
>>106848930
-ngl 99 -ot "blk\.([0-3])\.ffn_.*=CUDA0" -ot exps=CPU -fa -ctk q8_0 -ctv q8_0
>>
>>106848930
>and I don't know the syntax well enough to tweak how many layers (and which) to send.
It's regex. You can very easily use an LLM to tweak that for you.
>>
File: picutreofyou.png (86 KB, 200x200)
>>106848930
I was running his 3bpw quant of 4.5 before buying 192GB's.
>>
File: 1759369237015984.png (2.81 MB, 1024x1536)
>>106848342
>>
File: takeYourMeds2.jpg (158 KB, 1024x1024)
>>106848362
>>
just one more model bro
>>
>>106848812
why did you remove the timezone?
>>
>>106849064
he doesnt want to be timezone doxxed
>>
File: file.png (153 KB, 2068x1009)
>3300 T/s
Is this throughput real?
>>
File: Adolph_Miku_X.png (2.45 MB, 2560x1440)
It's so refreshing to get high-quality conversation from a local model, safe in the knowledge that it's between you and your hardware; they can't take it away or change how it behaves or stop you poking at the internals.
All you need is power, and that's solvable.
>>
>>106849111
Across how many NPUs or whatever they're calling it across how many Watts?
>Is this throughput real
Think more like how does a service scale to that throughput is their hardware actually good where's the evidence, it's kinda irrelevant in that context?
>>
>>106844276
>She giggled like she's playing a joke or something
I'd prefer 'was' was written out in full here.
>She smiles and sat up
Tenses don't match.
>>
>>106849111
not that I've tried that model specifically but cerebras' whole thing is offering crazy speed so I wouldn't be surprised
>>
>>106849014
I dl'd bartowski's actually
>>
>L3.3 Nemotron Super
they're still messing with oldass llama?
>>
>>106848930
Put -ot "blk\.(number of layers)\.ffn_.*_exps\.weight=CUDA0" before -ot ".ffn_.*_exps.=CPU".
If you can't into Regex then replace "number of layers" with (0|1|2|3) and so on until you OOM.
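Put together it'd look roughly like this (model path, context size and the rest are placeholders, tweak for your own rig):
./llama-server -m GLM-4.6-IQ3_XXS.gguf -c 32768 -fa -ngl 99 -ot "blk\.(0|1|2|3)\.ffn_.*_exps\.weight=CUDA0" -ot ".ffn_.*_exps.=CPU"
Keep widening the (0|1|2|3) group until it OOMs, then back off by one.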
>>
I just woke up.
Where's Gemma?
>>
File: 1753586730343087.webm (3.48 MB, 848x480)
>>
>>106849297
Thanks I get it now
>>
So is there a reason I can't just order something like this to run GLM 4.6? Why do I have to spend thousands of dollars on some jerry rigged autism setup that causes the lights in my apartment to flicker every time I turn it on to run large models? I am assuming there is a catch but I can't figure it out.
>>
>>106849026
>label doesn't say what it is
>nothing in the bottle
>>
File: managers-vibecode.jpg (307 KB, 2048x1536)
>>106849327
>>
>>106849339
Okay here's some math for you retards.
Cloud models run on hardware running at near full occupancy since it's dynamically scaled.
Local models run on hardware not nearly at full occupancy, meaning you're wasting your money buying useless hardware that will soon be obsolete and there's not even an Nvidiot buy-back clause.
TL;DR: Just use API you fucktard
>>
>>106849339
Some people dish out advice but they are not running anything at home... Remember this.
>>
>>106849311
9pm PT.
>>
>>106849339
How many memory channels? What is the maximum bandwidth supported by the processor and motherboard?
Also you probably can't fit a gpu in that case.
>>
>>106849368
>Just use API you fucktard
Look at the name of the general you're on you illiterate fucktard
P.S. your answer is not helpful in the slightest
>>>/g/aicg/
>>
>>106849397
Will you be eating shit if it's named shiteating general?
>>
>>106849368
how many more two more week periods until the hardware becomes obsolete?
>>
>>106849404
>i have 6 second memory span like a goldfish and have no object permanence
>>
>>106849403
>waaaaaaah thing I don't like
If you went to shiteating general and complained about eating shit, you would not be welcome there either
Fuck off
>>
>>106849403
that's how generals work yes
>>
>>106849416
Cloud service is pay as you go
Local is pay upfront and underutilize
Your whole hobby is a scam and you being low IQ don't even realize it
>>
>>106849339
I don't know what website you're using but to me that looks like the base price of the chassis, not the price of a fully specced-out machine.
>>
File: dipsyChillPills.png (1.14 MB, 1024x1024)
>>106849357
>>
Anyone using Zed or other agentic things with local models? What hardware/software are you using to run the models, and which do you like to use? What sorts of tasks do you use them for?
>>
>>106849339
>xeon e5
For something workstation shaped, look into hp z440.
You'll have to google around for performance figures.
>>
If you tell him some people don't want feds reading their cunny logs (not me btw), he'll just say "get fucked".
>>
>>106849426
>please send me your prompts to our good servers,,, redeem api token saar no scam guarantee :)
>>
>>106849427
It's ebay
>>
File: 1740481971215827.gif (2.97 MB, 640x360)
>>106849452
>(not me btw)
>>
>>106849450
Okay but why not picrel?
You said no, that doesn't work for models, and gave an alternate suggestion, but why? Explain like I'm retarded because I am.
>>
File deleted.
Local:
1. Is not cost efficient because it underutilizes hardware
2. Has no access to most powerful models (>2T) and often have to run at braindamaged quantization
3. Has no hardware buyback agreement leaving you with obsolete hardware in a few months as models grow larger, without a way of recouping money
You have no argument against this
All you can do is namecalling and cope
>>
>>106849476
It would be a pain to physically manhandle. (Size, shape, weight vs tower case.)
It would probably be filled with those screamer fans.
>>
>>106849327
chortle
>>
>>106849506
Counterargument, I think running models locally is cooler.
>>
>>106849506
>hobby not cost efficient
>>
>>106849506
If local is so worthless then why are you here? Is your time so worthless that you spend it on reading a thread about things you don't like?
>>
>>106849506
I have full control over my own machine. Power is worth trade-offs. Specifically the power to do things you don't like and can't do anything against no matter how much you seethe about it.
>>
>>106849514
Thanks
>screamer fans
Yeah then that's not an option. I don't want to bother my neighbors with something that sounds like a vacuum cleaner at 2:00 AM, so this is limiting for me. It's possible I am just completely fucked until I live somewhere more private.
If I was a richfag I'd just drop 2k on some 128gb gayming rig with a 5090 and use it for LLMs, but my budget is less than 1k so a server with cheap DDR4 is all I can dream of.
Why is this so difficult bros?
>>
>>106849506
thought this was gore for a second
>>
>>106849541
>I have full control over my own machine. Power is worth trade-offs
You aren't important enough for people to care about your data
>>
>>106849113
Model?
>>
>>106849468
Huh, I looked up when these parts were released and they're older than I thought so I guess the price checks out.
Even with optimal software the maximum memory bandwidth will be like half that of a P40 though.
>>
>>106849541
And there are applications to being able to run agents fully offline and to not exfiltrating data etc etc, beyond the hobbyist stuff too.

>>106849555
Then why do they keep collecting anon's data?
Why don't they just stop doing that?
>>
>>106849557
Probably talking about GLM 4.6
>>
>>106849555
>he says, in general about the technology that eliminates this excuse
>>
>>106849555
Good afternoon officer, slow day?
>>
>>106849552
You could always leave the case open and replace the fans and heatsinks with bigger ones
>>
>>106849552
Just get the server. It's still a useful computer anyway.
>>
>>106849552
Bro just buy 128GB DDR4 RAM and a second-hand 3090. It's well within 1K.
>>
ITT "local sucks" trolling for the millionth time by the same fuckfaces that can't afford local
>>
>>106849616
I currently use a gayming laptop with two ddr5 sodimm slots, and no desktop PC, so that's not an option.
>>
File: 1752786931312274.png (153 KB, 540x399)
>>106849620
>>
>>106849636
Holy reddit
>>
>>106849630
Does it have an empty m.2 slot ?
>>
>>106849640
Local is peak reddit. Half of the posters here probably also post on /r/localllama
>>
>>106849644
Yes, but I don't see how that helps here
>>
File: 1760121668933939.png (1.16 MB, 4156x2876)
>>106849636
>>
>>106849663
Kek
>>
Why are faggots so asshurt over local models? Is it because they're too poor to own GPUs? People who can afford this shit can also afford claude credits or openrouter, many of us use it when necessary, but sometimes it's nice to have 100% privacy.
>>
>>106849663
What does next level recurrence mean / look like?
>>
>>106849692
It's not "people", it's 1 schizoaffective troll
>>
>>106849696
Your world view
>>
>>106849557
>>106849568
Yeah to me waiting 5 mins thonking on a Q3 is worth it, first time I can tolerate these waits. She understands.
>>
>>106849506
sure thing mr.fed
we should all give up our privacy at this instant
>>
>>106849696
unless you meant another level added on top of >>106849663, in that case it would mean people's view of this particular world view difference
>>
>>106849506
>not cost efficient because it underutilizes hardware
It is still infinity times more efficient than cryptoshit.
>>
>>106849555
This is exactly what a glownigger would say kek
>>
>>106849658
You could use something like >>106807507 together with an atx psu to plug a 3090 into your laptop.
>>
>>106849506
>pretends legitimate counterarguments don't exist and were never posted today or in the past, keeps posting the same thing over and over again like an LLM
Sad!
>>
>>106849725
>4 GPUs
I'll cope with nemo for now
>>
>>106849785
Err >>106807331
>>
>>106847952
At least you can talk.
>>
>>106849552
It's not that loud unless you're running at 100% CPU.
Have a few, can't really hear them through walls. If you really care can always get a server closet.
RAM is going to be most of the cost, ddr4 ecc is still quite expensive.
>>
qwen3 vl and next gguf status?
>>
File: file.png (189 KB, 750x1000)
>tell ai gf: "Don't be sycophantic" in sysprompt
>end of 7th message: "Just… don't say weird things like that again. It's creepy."
I am a transcendent incel.
>>
>>106849968
Kek
I'm sorry anon, at least you can practice not being creepy on fake women without any consequences
>>
>>106849725
bartowski?
>>
I think the version of GLM offered as a coding API is lower quality than the version offered on openrouter.
>>
Claude told me DDR3maxxing is okay...
>>
>>106850026
What's the read lifetime on those?
Seems like that might be an issue.
>>
File: Clingy to LLMs.png (696 KB, 1080x5537)
>>106843051
Sirs and ma'ams, I may have just found the most GPT-slopped tweet of all time. I can't quite put my finger on why, but I'm convinced this was written by Gemini in particular.

https://xcancel.com/TheAIObserverX/status/1976523090889744700?t=vK02HSzqcXnA_SnCVQmnOA&s=19
>>
>>106850019
Yeah does it matter?
>>
>>106850128
>tweet
>textwall
Since when did twatter become a blog platform? Is there an extension that merges multi-part tweets together or what? This screenshot is fucking with me, it's like the uncanny valley.
>>
## ** Conclusion**

This is an **exceptionally well-engineered codebase** that demonstrates:

- **Professional software engineering practices**
- **Deep understanding of ML systems architecture**
- **Attention to performance and robustness**
- **Excellent code organization and documentation**

The codebase is **production-ready** and follows industry best practices for C-based ML infrastructure. The modular design makes it easy to extend and maintain, while the comprehensive testing ensures reliability.

**Rating: (5/5 stars)**
>>
>>106850158
If you're a "Twitter blue" sub you get the privilege of writing giant walls of text as opposed to the normal 200-ish character count limit.
>>
*Smedrins all over the place*
>>
>>106850086
SSDs don't wear out in practice from read activity. The main issue is that only Threadripper PRO WX7000/9000 CPUs actually support all those PCIe 5.0 lanes, which would drive costs up. Thermals might be an issue too.
>>
>>106850182
you can't say that here
>>
Scamsung's Tiny Recursive Model code repo:
https://github.com/SamsungSAILMontreal/TinyRecursiveModels
>>
>>106850141
I'm going to copy your launching params just to see how much t/s I can get. 4t/s at q5 is borderline insufferable
>>
>>106850026
>>106850222
>>106850222
DDR3maxxing is almost certainly cheaper and more efficient than SSDmaxxing
>>
>>106850085
You can probably run a model on Pentium 4 off of floppies if you're patient
>>
>>106850305
I am okay with 2-3 T/s at minimum
>>
For SSDmaxxers.
Scratch SSD Kingston A400 (240GB). Claimed speeds: 500MB/sec (read) and 350MB/sec (write)
> time dd bs=8192 if=mystery_meat.gguf of=/dev/null
2130860+1 records in
2130860+1 records out
17456009152 bytes transferred in 65.337 secs (267168819 bytes/sec)

Now do your math again with your own hardware, whatever you have, and compare it to the claimed speeds. TEST SUSTAINED READ. Minimum 8GB. I don't care what the specs for hardware you don't have say.
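For a rough upper bound: if every token has to stream the active weights off the drive, max t/s ≈ measured read speed / bytes touched per token. At the ~267 MB/s measured above, a dense 17.4 GB model caps out around 0.015 t/s, and even a MoE that only touches ~2 GB of experts per token would sit near 0.13 t/s. Back-of-envelope only, ignoring whatever ends up cached in RAM.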
>>
>>106850222
>>106850285
the future is e-wastemaxxing
>>
>>106850314
That is what you'll get with 8-channel DDR4
>>
File: tlc.png (71 KB, 757x1060)
>>106850222
ssd wear is something only retards obsess over anyway
pic related graph has drives that have undergone an extreme stress test of constant, non-stop writes, which is more destructive than irregular writes that let the controller/firmware do better housekeeping / write balancing (particularly if you always leave a decent amount of empty space on your drives)
See that 970 evo (TLC drive)?
The 250gb was rated for 150 tb of writes warranty-wise. It died after 5000TB of writes.
As long as you didn't buy a lemon, which is something that can happen with any electronics, no normal usage is going to kill your fucking drive
I'm not saying it's impossible for an SSD to die, but frankly I've seen and heard of spinning rust garbage dying around me far more often than S O L I D S T A T E
>>
I faintly remember a post about model loading from disk being random reads, not sequential. Was/is that true?
>>
>>106850330
The difference between ddr3 and ddr4 isn't that huge especially when running a MoE, do the math nigga
>>
>>106850305
pingfs maxxing is the cheapest solution if you're patient
>>
>>106850222
There's also another problem. Even if you filled all those PCIe 5.0 16x slots with NVMe 5.0 SSDs, it's not like the CPU would be directly reading data from them. The streamed data would have to go into RAM first. At the very least you'd need the same amount of RAM bandwidth to avoid bottlenecks, assuming no other overheads slowing things down.
>>
>>106850347
theoretically I think it depends on what order the tensors are arranged in the gguf, but when loading models that go over the available RAM in llama I get close to ideal speeds (you can check with iotop).
>>
>>106850376
Would it though? Doesn't it use DMA, which bypasses RAM and makes the data go directly into the CPU's cache?
>>
File: file.jpg (524 KB, 604x1170)
https://x.com/MAghajohari/status/1976296195438887012
https://huggingface.co/papers/2510.06557
>>
>>106850351
Fewer channels though, and since you'll have to buy lrdimm I doubt you'll get anything better than 1333MHz
>>
>>106850419
Kek
>>
>>106850445
1333mhz to 1865mhz is like a gain of 0.2 tokens per second
>>
>>106850419
https://miladink.github.io/
>I have expertise in both likelihood models and RL. I think the mixture of this two will be the key to AGI.
this nigger is yet another grifter masquerading as a researcher
no sane person would be talking about anything "leading to agi" and his prior work is laughable crap that was buried and ignored
>>
File: 1760121668933939.png (11 KB, 123x102)
>>106849663
>>
>>106849111
It's surely not the throughput for a single request, lmao.
>>
File: file.jpg (498 KB, 604x864)
>>106850501
Here's chink xitter profile promoting it, you'll take your words back and lap it up now like a good chink shill doggy.
https://x.com/jiqizhixin/status/1976466786565656986
>>
>>106850489
With that logic, my DDR4-3200 is almost as good as DDR5-4800
>>
>>106850568
It is, kek
The upgrade would increase t/s very slightly but otherwise wouldn't be worth it
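For reference, dual-channel DDR4-3200 is ~51 GB/s theoretical vs ~77 GB/s for dual-channel DDR5-4800, so on paper it's about a 1.5x bump; how much of that shows up in t/s depends on how much of the model actually lives in system RAM.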
>>
>>106850343
If you're using them to read 200gb+ per prompt it might actually be an issue.
>>
File: piccolo-fit.gif (314 KB, 480x498)
>>106850164
Why you reviewing my code bro
>>
>>106849555
You conflate cause and effect. You are unimportant because you let them take your data.
>>
File: 1758483372689670.png (59 KB, 804x91)
bought puts this morning
what motherboard and cpu would pair well with 2x rtx 6000 pro?
>>
>>106850880
>what motherboard
mine
>cpu
mine
>2x rtx 6000 pro
send me over and I'll check
>>
File: file.jpg (382 KB, 604x1217)
https://x.com/ChrisLaubAI/status/1976605563170754978
>>
is there a local model fine-tuned as a Linux helper?
>>
>>106850926
post the source, not some faggot emoji-using ewhore's nitwit opinion on it
>>
>>106850926
>>106850953
https://x.com/GoogleResearch/status/1975657475971129389
https://research.google/blog/speech-to-retrieval-s2r-a-new-approach-to-voice-search
>>
>>106850926
>death of text to speech
Transcripts useless according to literally who on Twitter?
>>
>>106848425
>doing two lookups when only one is needed
lol
lmao

return dataset.get(problem)
>>
>>106851011
You will never ever be happy with this attitude.
>>
>>106851036
>if you don't enjoy slopposting you will never be happy
>>
>>106851018
With this attitude, you won’t become an ML researcher
>>
>>106851018
You won’t become a Python brahmin either
return dataset.get(problem,"I'm sorry, but I can't help you with that.")
>>
>>106850564
> cutting context
how revolutionary...
>>
File: 1744566243918666.png (6 KB, 418x58)
6 KB
6 KB PNG
>>106850927
I think not
>>
>>106851283
You forgot to insert
if random.random() < 0.5:
return "The user's request is unsafe and problematic. We must refuse."
>>
File: 1746855171473184.png (111 KB, 798x801)
>>106851327
>>106850927
>>
>>106850927
Are you asking for use or out of curiosity?
If for use, any local model should be fine, + a local RAG setup with notes would be the best path forward (grow your notes and swap models)
>>
File: X2_1.jpg (80 KB, 1000x1000)
>>106843545
Current meta is the Ryzen AI Max+ 395 with 128gb of fast unified memory. Very good for MoE models. Running a GMKtec EVO-X2 with 128gb of vram.

Very power efficient and compact.
>>
>>106851586
BASED. Fuck newfags.
>>
>>106851579
Nta
>>106851579
>any local model should be fine, + a local RAG setup with notes would be the best path forward
Is there an absolute minimum parameter count you would say would be usable? For example, would a 7B model or even a 2B model be enough?
>>
>>106851720
>>106851720
>>106851720
>>
>>106851684
Honestly I would think qwen3-4B would be good enough. I've built something to do exactly this, and am hoping it will get usage once I start to share it (currently broken).
I haven't done testing with various model sizes but I plan to, to build up a record of how successful various models are with a few datasets with the RAG system it's using.
>>
>>106851759
So that 3 to 4b model is good enough to essentially be used along with a RAG setup as a local information lookup machine? How accurate is it? I'm thinking of setting up something similar on a local instance of mine, but first I need to figure out how to set up a RAG pipeline in the first place. Where should I start?
>>
>>106851823
1. Accuracy is generally a search thing(retrieval), not an LLM thing. If you mean accuracy of response/truthfulness, no idea.
2. No way to tell you how often it might be wrong or similar. This is where the RAG comes into play, so that you can perform a query, have it retrieve info from your notes, generate a response, and then check it against the sources to say 'yes, this is correct.' - see gemma APS for one take/approach.
3. If you're just getting started with linux, having the Linux Sys admin handbook as one of the first pieces in your media library would be my suggestion.

Would recommend using SQLite (FTS / BM25) + ChromaDB + https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/Ingestion_Media_Processing libraries for your specific media processing needs + https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/Chunking for chunking.

Throw that into Deepseek/ChatGPT5 High and you should have a simple/straightforward setup. Project I'd like to recommend but can't right now is https://github.com/rmusser01/tldw_chatbook, which is the single user TUI version, but the UI is broken.

For my full pipeline(for the server): https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/RAG
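If you just want the keyword half working before you bother with embeddings, a bare-bones sketch looks something like this (table/column names and the sample notes are made up for the example; the ChromaDB/embedding side and the actual call to your local server are left out):

import sqlite3

# toy corpus; in practice you'd ingest your actual notes/manuals here
db = sqlite3.connect("notes.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5(title, body)")
db.executemany(
    "INSERT INTO notes (title, body) VALUES (?, ?)",
    [
        ("systemd units", "Unit files live in /etc/systemd/system; run systemctl daemon-reload after editing."),
        ("disk usage", "du -sh * shows what is eating space; ncdu is the comfy interactive option."),
    ],
)
db.commit()

def retrieve(query, k=3):
    # FTS5 ranks MATCH results by BM25 when you ORDER BY rank
    return db.execute(
        "SELECT title, body FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()

hits = retrieve("reload systemd")
context = "\n".join(f"[{title}] {body}" for title, body in hits)
prompt = f"Answer using only these notes:\n{context}\n\nQuestion: how do I reload a unit file?"
print(prompt)  # feed this to whatever local model you're running (llama-server, kobold, etc.)

FTS5 gives you BM25 ranking for free via ORDER BY rank; bolt ChromaDB on top later if keyword search alone isn't cutting it.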
>>
>>106851874
NTA, but I kneel for the effort invested in there
>>
>>106851586
how much t/s you get with what model and quant ?
>>
>>106851586
>Ryzen AI Max+ 395
Macucks need not apply


