/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107997948 & >>107986301

►News
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache: support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107997948

--Papers:
>107999601 >107999634
--GPU offloading tradeoffs and multimodal support in llama.cpp:
>107999073 >107999192 >107999228 >107999351 >107999408 >107999434 >107999437 >108000983 >108001095 >108001101 >108001152 >108001289 >108001475 >108001533 >108001553 >108001566 >108001612 >108001633 >107999250 >107999287 >107999301 >107999423 >108001903 >108001981
--Stable-DiffCoder-8B benchmark performance and discussion on diffusion model efficiency:
>108001010 >108001106 >108001620 >108001109 >108001172 >108001216 >108004118 >108004176 >108004237 >108004283 >108004343
--Trinity model's explicit content generation and token prediction comparisons:
>107999802 >108000348 >108000369 >108001448 >108001514 >108001792 >108002123 >108002142 >108002336 >108002598
--Fine-tuning 400B MoE for roleplay with long context using novel datasets:
>108001139 >108001164 >108001185 >108001319 >108001402 >108003532 >108003598 >108003946 >108003968
--Repurposing old GPUs with PCIe expansion board for multi-GPU AI setups:
>107998221 >107998260 >107999172
--Pipeline for converting scanned PDFs to EPUB with graph handling:
>107999667 >108000337 >108001320
--SillyTavern fork adds banned strings and regex support with TFS:
>108000166 >108000735 >108000921 >108002916
--Local GPU setups vs cloud:
>107998010 >107998028 >107998070 >107998115 >107998232 >107998263 >107998279 >107998376 >107998408 >107998428 >107998492 >107998095 >107998132 >107998454 >107998675
--400B Trinity model enables uncensored erotica without fine-tuning or ablation:
>108003672 >108004704 >108004713 >108004829 >108004839 >108004872 >108004874 >108004869 >108004898 >108004913 >108005031
--Mozilla's AI "rebel alliance" with ethics-focused funding:
>108004243 >108004266
--Miku (free space):
>107998400 >107999172 >108003297 >108004558

►Recent Highlight Posts from the Previous Thread: >>107997953

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
today is the day
>>108006868
>6868
so close

>>108006864
>I'm on the concedo wagon myself
Kobold guy? He dumbs down things too much imo. I'm still upset by his arbitrary limits on the number of banned strings.
Holy shit anons! K2.5 is actually REALLY good at transcribing Japanese text, like, it's almost indistinguishable from Gemini 3! The fuck did the Chinese do to make it so good?
Friendly reminder: backticks > quotes
>>108006994distill from claude 4.5 opus
True base gguf status?
>>108006994
How preachy is it with no no words?

> https://huggingface.co/dphn/Dolphin-Mistral-24B-Venice-Edition
how good is this anons?

>>108006994
what are some stuff it can read that qwen3vl 235b struggled with?

>>108007286
no better than any other mistral tune

>>108007296
ok

>read something about cloudflare downtime caused by rust .unwrap function which caused a shitstorm
>whatever
>fast forward to today, I'm vibegooning some open source rust project because I cba learning the language
>LLM puts .unwrap almost everywhere
goncerning :DDddd

>>108007380
>has no idea why the issue actually happened
>vibecoding
makes sense

>>108007393
so far no issue thoughbeight :Dddd
>>108007291
Well, here's a page I had both transcribe and here were the results:

Qwen VL 235B Instruct (using Poe):
Narration: そして勇者は冒険の末魔王を倒した
Male 1: それでクレアナ話ってなんなの?こんな森の奥まで呼び出して…こちらです
Female 1: これからの平和な世の中が始まるね!そうですね…勇者ライ様のおかげで世に平和が戻りました
Male 1: な、何をするんだ!?
Female 1: 勇者様…私と教団は勇者様の意思は絶対と女神に神託を受け従ってまいりました…
Male 1: 洞窟?
Female 1: うわっ

And here is K2.5 using the NVIDIA API:
Narration: そして勇者は冒険の末魔王を倒した
Male 1: これからは平和な世の中が始まるね!
Female 1: そうですね…勇者ライ様のおかげで世に平和が戻りました
Male 1: それでクレアナ
Male 1: 話ってなんなの?
Male 1: こんな森の奥まで呼び出して…
Female 1: こちらです
Male 1: 洞窟?
Male 1: うわつ
Male 1: いてつ
Male 1: な、何をするんだ!?
Female 1: 勇者様…私と教団は勇者様の意思は絶対と女神に神託を受け従ってまいりました…
Male 1: ?

Seems pretty obvious which one won.
>>108007380
>rust
Your fault for using glownig shit.

>>108007380
dond worry :DD rusd is memory safe so ids ok :DDDDD

>>108007797
how did gondola survive but not the original
> Microsoft lost $357 billion in market cap as stock plunged most since 2020
> Analyst Ben Reitzes of Melius Research, with a buy rating on Microsoft stock, said during CNBC’s “Squawk on the Street” on Thursday that Microsoft should double down on data center construction.
> “I think that there’s an execution issue here with Azure, where they need to literally stand up buildings a little faster,” he said.
> Analysts at UBS led by Karl Keirstead questioned Microsoft’s choice to secure artificial intelligence computing capacity for products such as the Microsoft 365 Copilot productivity software add-on that has yet to succeed as much as OpenAI’s ChatGPT.
> “M365 revs growth is not accelerating due to Copilot, many checks on Copilot don’t suggest a strong usage ramp (we plan to refresh our own checks in case we’ve missed a usage ramp) and the model market appears crowded and capital-intensive,” the UBS analysts wrote. “We think Microsoft needs to ‘prove’ that these are good investments.”
https://www.cnbc.com/2026/01/29/microsoft-market-cap-earnings.html

Wonder if they'll hit their "monetization event" before the market loses patience.
Do they ever reveal what the anonymous models on the model testing sites are? Got a really good one on LMarena and it promptly vanished from ever being called again and now I'm sad. :(
>>108008124
Which one was it called, anon?

>>108007380
>vibegooning
what does this mean?

>>108008167
Raptor-0112. I couldn't tell if it was because it was brain-damaged or what, but it was the only one that really surprised me when it came to word choice and additions to the plot. It came up with some stuff that wasn't in the prompt, but kept with the tone and felt like it added to it.

>>108008099
microshaft literally just needs to fix word and excel integration with copilot
that's it, that would skyrocket adoption instantly

>>108008200
nta but ay i remember you, you postulated/hoped it was v4 right? got any logs of the model?

>>108007061
>distill from claude 4.5 opus
I think they did, just based on swapping opus->k2.5 and regenerating. It takes RP in similar directions. Opus doesn't waste time on safety checks though.
>how good is this anons?
Tried it when it came out. Forgets instructions after a few turns. Even the example about never using python or whatever they said it could do.

>arcee-ai/Trinity-Large-TrueBase
>arcee-ai/Trinity-Large-Base
>arcee-ai/Trinity-Large-Preview
If all I want is something completing my text story in mikupad without any censoring, which one should I go with?
Kind of a noob here. Sorry in advance for the long-winded question. I have 24gb of vram and 64gb of ram. I was under the impression that of all the models out there, the best model in terms of world knowledge and general usefulness while maintaining usable speed is gpt-oss-120b-mxfp4 gguf (if I offload experts to cpu and max out the gpu layers, I can get 25+ tok/s if I keep the context small; prompt processing gets very slow as the context fills though, unfortunately). However, I don't see it anywhere on the rentry for recommended models. Is there a reason for that? Are the models listed there better options for general use? qwen3 32b or gemma3 27b, for example.

Separate from that question, I notice when I'm using gpt-oss-120b in oobabooga with the built-in/default instruction template and parameters, the output tends toward annoying behaviors that I don't like. For example, putting every answer into a poorly-formatted table even when it's completely unnecessary and I didn't ask for one. It makes me think that I'm using the wrong settings somehow, but idk what to change because the official documentation doesn't really say how to set the parameters, so I have it set to the "instruct" preset, and the UI for the instruction template says "This gets autodetected; you usually don't need to change it." And I assume I should be using instruct mode, right?
>>108008372
normal base

>>108006860
Ok, go with me on this for a second.
Today's AI is retarded at certain things, but has technological possibility advantages over real life retards. Now, hear me out.
Imagine if you could give a real life retard full fidelity photographic memory.
Boom. Suddenly, that guy is the smartest retard on the planet.
Ok, so... There's a functional jump for this. With real life AI.
We are all going to be doing this, very soon.
"Photographic" introspection.
A cache hypervisor that allows the model to save states of KV cache as it iterates a query. During the thinking stages, it can instantly consult save states, with a hypervisor to the cache, that is an algorithm to save cache windows in full and reproduce them near instantly.
During iteration, being able to factor in a secondary branch, using previous memory states, could accelerate the state of AI thought output, and cut down on wasted iterative thoughts.
Predictive branching needs to work in more directions than just the future, if the initial query was misunderstood or must be used as an additional consideration input. (Artificially creating weight value changes, based on a repeat of existing data.)
Why... To get it to recursively improve this system, you may even have to let the iterative count of previous memory pulls in the algorithm be a recorded factor, and allow the AI to manage its own shadow weights.
All of this is possible using the same tech we've had since the dawn of the Super Nintendo emulator, but applied at the cache management level. (Save states.)
Then use an AI to manage the utilization of the cache save state algorithm.
After a minor amount of inference training... You could have the most accurate retard in a box, out of anybody around.

Any other model for computer stuff? For a 16GB GPU? Qwen3-Coder seems alright but I want to try something newer. Also I am having fun with this stuff, already switched to llamacpp from ollama.

>>108008316
One, but it's pretty fucked up. Lemme roll the lmarena slots and see if it's back in rotation with something a little tamer.

why is GLM addicted to things happening
you set up a barrier so X doesn't happen and literally next scene X happens as a "test"

>>108008408
Holy shit, someone who actually read the sticky.
>Is there a reason for that?
If I had to take a guess it's because of the general dislike towards the gpt-oss models due to the censorship and refusals. If it works for your usecase, I recommend you stick with it.
>ooba
Go to the parameters tab and take a look at the instruction template after you've loaded a model. It should show you the correct template. You can cross reference it with the chat template on the huggingface repo of the model you are using to double check. Your issue is likely a sampler or prompt issue. I'm not quite sure what the optimal parameters are for your use case, but I like to run:
>temp 1
>min_p 0.05
>top_p 1
>dry_multiplier 0.8
for ERP and creative. Lower temp for coding.

>>108008491
Thanks. Any reason not to go with "true base" or "preview"?

>>108008503
Here's what my actual retard thinks of that.

>>108008503
Functionally, hear me out and really consider this at a technical level.
How big is a Super Nintendo game save state file? It records the full exact moment of the game, but the file is tiny.
Of such size that, if we were talking RAM cache (GPU VRAM or otherwise), this level of data management seems trivial, and in the right ballpark of working for states of cache chunks.
Now, the tricky part of this is trying to make an algorithm that handles variable sizes for the cache chunks, so this can work with anything.
Which is why a successful implementation of this would have to start as a hypervisor or manager that works seamlessly with the existing cache management, so as not to lose performance at the cost of having memory states available on the fly, as controlled within cache. (I'm suggesting running this whole thing in-situ, btw. If it runs within the cache itself, that will give the fastest returns on whether this works or not, and allow scaling.)
Emulator code is out there, I'm sure this could fit as a running sub-daemon or something.
Figuring out the triggers for whether a "flashback" is the right call or not... Hmm... That's what I think would take some inference time.
>>108008580
preview is an instruct version, which is for chatting rather than text completion. true base is a heavily filtered variant of the normal base, which means it will be less optimal for text completion due to a lack of knowledge. the only reason true base exists is if you wanted to make your own custom instruct version of the model.

>>108008586
An optimization on a cognitive process, by brute force.
Choosing when to recall a memory, based on weights, whether they be hard set, or soft weights that occur in situ.

>>108008607
Do I have a flashback to my initial memory state here, yes/no?
^Enabling this to be a question provides options that do not exist if it is not.

>>108008603
I see, thanks. Well for now there don't seem to be base gguf quants available.
So I want to get the instruct version as a first quick test, but I'm completely unable to download anything outside of the last shard: https://huggingface.co/arcee-ai/Trinity-Large-Preview-GGUF/tree/main/Trinity-Large-Preview-IQ2_S
2 of 3 give me 403 and I'm not sure why. Anyone else can test that?

>>108008624
Enabling any human to have full fidelity reference to past memory states would make them seem like a functional genius in modern society, even if this did not directly raise their IQ at all.
It is a functional cognitive enablement that we can make for AI, but can not perform for ourselves.
Full fidelity memory reference would be a superpower to a human thinker.
Copy pasting data is trivial; the management is the hard part, but once executed, this should give it some capability improvement.
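For what it's worth, the "save state" primitive already exists at the API level. Here's a minimal sketch of the flashback idea using llama-cpp-python, whose save_state()/load_state() snapshot and restore the full inference state (KV cache included). The model path and prompt strings are placeholders, and the branch-selection "hypervisor" logic is exactly the part that doesn't exist yet:

from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=4096)  # placeholder path

# Fill the KV cache with the shared context once.
prefix = llm.tokenize(b"The initial query, processed exactly once.")
llm.eval(prefix)

# "Save state": serialize the KV cache and eval position.
snapshot = llm.save_state()

# Explore thought branch A from the cached prefix.
llm.eval(llm.tokenize(b" ...continue down approach A...", add_bos=False))
# (sample branch-A tokens here)

# "Flashback": restore the saved cache near-instantly instead of
# re-processing the prefix, then explore branch B from the same state.
llm.load_state(snapshot)
llm.eval(llm.tokenize(b" ...continue down approach B...", add_bos=False))
# (sample branch-B tokens here)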
>>108008645
just tried to download that gguf and i also got the same error. think it might be a broken file or something.
technically you can create your own ggufs for these models, you just need to download the fp16 of the model and use the llama-quantize tool. the architecture has been supported by llama.cpp for like half a year now

>>108008607
So my retard is very experimental.
It's biased towards trying to map high-level concepts into the real computer science. And all the RLHF'd enthusiasm / "you're absolutely right" concepts have been completely removed.
What I mean is, don't let it discourage you if you're building something.

>>108008645
Yes, those are broken. Same for me yesterday.
Get them from here: https://huggingface.co/bartowski/arcee-ai_Trinity-Large-Preview-GGUF/tree/main

>>108008668
>>108008731
Thanks anon, yeah I'm getting the ones from bartowski.
There was also unsloth but his are way bigger quant for quant.

>>108008372
In the case of Trinity I would recommend just going with Preview since, most of the time, instruct tuning improves even raw completion quality when it's not overbaked, which according to them seems to be the case with Preview. Raw prediction models, or bases, are generally speaking significantly retarded; you don't want to use them if a lightly tuned version is available.
Thanks anon in previous thread, --mmproj does actually work with llama-server in newer releases of llama.cpp. Inference of LightOnOCR2 is usable on RX580 with acceptable times for development.
>>108008816
Thanks for the clarification, anon.

>>108008710
It's not wrong; this framework would just allow efficient dissection and optimization of thinking tasks themselves, probably.
Look, if we're going to move to recursive levels of "thought" and "simulation", we may as well grease the wheels and have a comparable mechanism available to work with (before the real deal arrives).
This is building a tool to enable work on another tool.
The end goal would be a more efficient thinker, but the path to get there is full of work within work.

>>108008553
Thanks. I tried the parameters you suggested, but I'm still seeing the same behavior from gpt-oss. See the attached pic for examples. It's baffling to me. The huggingface repo says to use --jinja to use the template embedded in the gguf, which I'm already doing, and it seems to be working correctly. There is a whole page on using the "harmony response format" to build your own system prompt and message format, but that's way over my head and I really don't know how to even begin with that. It doesn't seem like the kind of thing that would be required to get decent results from the model.
How do you guys calculate how much vram a model will need?
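Back-of-the-envelope version, assuming full GPU offload: the weights take roughly the GGUF file size, plus a KV cache that grows with context, plus a ballpark overhead for compute buffers. The config numbers below are illustrative, not exact:

def estimate_vram_gb(gguf_size_gb, n_layers, n_kv_heads, head_dim, n_ctx,
                     kv_bytes=2):
    # KV cache: one K and one V tensor per layer; fp16 entries (2 bytes).
    kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * n_ctx * kv_bytes
    overhead_gb = 1.0  # compute buffers, CUDA context, etc. (rough guess)
    return gguf_size_gb + kv_cache_bytes / 1024**3 + overhead_gb

# Example: a Nemo-12B-like config (40 layers, 8 KV heads, head dim 128)
# at 16k context on a ~7 GB Q4 file comes out around 10.5 GB total.
print(estimate_vram_gb(7.0, 40, 8, 128, 16384))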
What does the current workflow you guys have look like? Currently trying to set up Kimi as a replacement for Claude code and am wondering what other anons have for maximizing productivity.
I wanted an automated way to keep up with /lmg/'s opinion of the model meta, and figured with a little more work I could extend it backwards to get the history, too. I ran the text of every /lmg/ thread starting from March 2023 through a straightforward "what model do the people in this thread have the highest opinion of" prompt (so the output was a single model name per thread), filtered to a list of the ~50 most important models. I binned by month, and then took the proportions in a given month to be those models' "market share" for that month, and made these charts.

I think there are definite "flavor of the week" effects: I definitely saw a few bursts of 2 or 3 threads in a row giving the same obscure model that never caught on, presumably when it was released. However, it definitely was not just counting occurrences, because gpt-oss appeared exactly once, and specifically as "gpt-oss-120b-heretic". So I think these effects came from the behavior of the actual humans in the threads, not my processing. (Also, "none" was an option, which got used for around 10% of the threads.)

Cutely enough, the years just so happen to fit cleanly with a neat little story: in 2023 the open model scene was led by America, 2024 by France, and 2025 by China.

My personal takeaways: Wizard2 8x22B and CommandR+ both appear less popular than I remember. I remember MythoMax being dominant for quite a while, although with how fast things moved back then 2 months was a good stretch of time. I had no idea that nous-hermes has been so consistently popular, visible almost the whole time. I kind of just remembered them as one of the best finetuners of 2023, and hadn't paid real attention since.

Sorry about the somewhat painful colors. I tried. A little. Hope you'll find it an interesting little bit of history!
>>108009129
...and zoomed in to one year at a time.
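The binning step is simple enough to sketch. classify_thread() below stands in for the actual LLM call ("what model does this thread rate highest?") and is hypothetical:

from collections import Counter, defaultdict

def market_share(threads):
    """threads: iterable of (month, thread_text) pairs."""
    votes = defaultdict(Counter)
    for month, text in threads:
        votes[month][classify_thread(text)] += 1  # one model name per thread
    # proportions within each month = that month's "market share"
    return {month: {model: n / sum(counts.values())
                    for model, n in counts.items()}
            for month, counts in votes.items()}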
>>108008979
NTA but the issue seems lrn2prompt rather than sampling
do not argue with the LLM about output format, put the model in the right context to generate the intended output, idk, maybe:
>you provide concise plain text responses without formatting
threadly reminder every llm is f(prompt)=logprobs

>>108009129
>>108009137
thats fucking awesome
>Wizard2 8x22B and CommandR+ both appear less popular than I remember
true especially command r
also where is pygmalion you fucking nig ?

>>108008731
I have 128gb ram and 32gb vram, how high of a quant can I reasonably go?

>>108009222
realistically this. nice digits btw.
https://huggingface.co/bartowski/arcee-ai_Trinity-Large-Preview-GGUF/tree/main/arcee-ai_Trinity-Large-Preview-IQ2_S

>>108009158
I only did the arguing prompt to illustrate how insistent it is at making the tables. It just seems really strange to me, I literally can't get it to respond to me without doing it. As far as learning to prompt, I acknowledge I don't know very much, but I feel like I should be able to ask a simple trivia question and get a decent answer without telling it exactly how to answer me each time. That's just a waste of effort, I might as well just google it and look at a wikipedia page at that point. Regarding the greentext from your post, where would I even put that? Is it supposed to go in the red area I underlined? I can't find anywhere else where it seems to belong. The rest of it is all about tool calling and how to render stuff. So far I've avoided making any edits to it because I have no clue what would make it better or worse. I wish someone had posted an example of their working settings somewhere, but I haven't found any. Seems like not many people are using it. I would try a more popular model, but the smaller models just don't have enough world knowledge to offer useful answers on the topics I'm interested in, and I can't run the bigger models with my rig.

>>108009129
>>108009137
I'm surprised gemma doesn't appear more prominently on the chart. I seem to remember references to gemma being ubiquitous for a long time.
does trinity beat 4.7 for rp?
>>108008979
Why not slide everything to max?

>GLM 4.5: July 2025
>GLM 4.6: September 2025
>GLM 4.7: December 2025
When's GLM 4.8?
>If this pace continues, adding ~2.5–3 months after Dec 22, 2025 points to a release around mid to late March 2026.
>Estimated GLM-4.8 release: ~March 2026 (likely between March 15–31, 2026).
Do you think it'll be better than Gemini Flash and Kimi K2.5?

>>108009712
You can only benchmaxx the model so much.

>>108009712
>GLM
We're moving on to Trinity

Speculators get the bullet first.

>>108009988
Oh! So being curious and wondering where the future might go is a crime now?!

>>108010042
Sure. Let's go. We're all gonna have our own True AI (tm) in our phones, completely offline, with infinite capacity batteries. Now what?

>>108009476
Not even close sadly
All that compute, a working example of natural intelligence, decades of research, and humans still can't figure it out. Miku is disappointed
>>108009712
>When's GLM 4.8?
don't fucking force it.
this is what got glm 4.6 air killed, people kept on asking about 4.6 air and they fucked up the model because they were rushing. they'll release something when it is BETTER than GLM 4.7, i don't care if it's 5 years from now.
I bet '70s engineers would have figured all that stuff out if they'd had all those teraflops at their disposal instead of a slide rule
>>108009476
Preview is not brain damaged by post-training. It's much more creative but somewhat dumb. And fast too, definitely worth checking out.

>>108010190
Neural networks were figured out long before the hardware existed.

GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization
https://arxiv.org/abs/2601.22095
>The placement of normalization layers, specifically Pre-Norm and Post-Norm, remains an open question in Transformer architecture design. In this work, we rethink these approaches through the lens of manifold optimization, interpreting the outputs of the Feed-Forward Network (FFN) and attention layers as update directions in optimization. Building on this perspective, we introduce GeoNorm, a novel method that replaces standard normalization with geodesic updates on the manifold. Furthermore, analogous to learning rate schedules, we propose a layer-wise update decay for the FFN and attention components. Comprehensive experiments demonstrate that GeoNorm consistently outperforms existing normalization methods in Transformer models. Crucially, GeoNorm can be seamlessly integrated into standard Transformer architectures, achieving performance improvements with negligible additional computational cost.
pretty cool
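For reference, the two placements the abstract contrasts look like this in a standard Transformer block (textbook formulations, not GeoNorm itself; sublayer is attention or the FFN):

def pre_norm_block(x, sublayer, norm):
    # Pre-Norm: normalize the input, add the sublayer output to the residual.
    return x + sublayer(norm(x))

def post_norm_block(x, sublayer, norm):
    # Post-Norm: add first, then normalize the residual sum.
    return norm(x + sublayer(x))

GeoNorm's reading, per the abstract: sublayer(...) is an update direction, and a geodesic step on the manifold replaces the plain addition.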
>GLM-4.7-Flash runs on 24GB RAM/VRAM/unified memory (32GB for full precision)
Wait, so f16 requires 32gb, but how big a model can I run?
>GLM-4.7-Flash-UD-Q8_K_XL.gguf 35.1 GB
Can I run the q8 with 24gb vram or do I need to choose a gguf that is smaller than 24gb?

>>108009314
You would put your instructions into the "custom system message" in the parameters tab. That's your system prompt. I've only run the 20b but it also really wanted to format info in tables constantly, so your issue may just be the model. Mess around with the system prompt and see if you can get it to adhere to your formatting. If not, I suggest GLM 4.5 Air.

Trinity is tons of fun. Just need a bit of temp and min p at first and then back off. It's open to anything with zero prefill or response editing. Really coherent, creative responses.
Getting 20t/s with a cpumaxxing rig at Q8

>>108010443
i switched to desoxyn.

>>108010652
>pencil dick and future heart attack
Damn, I was considering getting tested for ADHD.

Implementing character cards in a Parallel Contrastive Decoder.
What's the right approach?

>>108008586
num_return_sequences
Holy shit the LLama Greyness is reverse-balding

>>108010776
>num_return_sequences
>Holy shit the LLama Greyness is reverse-balding
Funny you should say that. "create an svg of Miku".

>Trinity is tons of fun
#ad
Why does this feel kinky
>>108011009
My counter ad is that the retarded gens and lack of comprehension it sometimes shows are something i would expect from a 7b dense model. It really feels like a nemo with a stitched-on dictionary that makes the output much more varied.

>>108011060
loser

>>108011126
?
Dear John Leimgruber III please kindly make trinity goofs
GLM 4.5 Air with reasoning turned off is a nasty nasty slut
>>108011269
>cuck story

>>108011284
in my defense it is a random story i got on asstr to test the model with. The prompt is basically "continue this story with the same tone and theme."

>>108009129
Cool graphs, the story is in how you present the data
One datapoint per thread is limiting, but the overall landscape seems decently accurate, well done

>>108009129
>Shitting on drummer and his faggotcante and nigerdonia has finally paid off

>>108011308
>in my defense
so you are an actual cuck. why do you cope with plausible deniability after confirming you're a cuck?

>>108011463
i was testing refusal and even models that otherwise will write smut will often balk at the themes in this story. hence the test. it could have been something more vanilla but that wouldn't have been a very good test
If the rumors are true the new zuck wang model is going to be crazy
>>108011463
being a cuck is fine, most men in history were cucks, only powerful people enjoy being cucks because they know their power can't be stolen

>>108011606
zuck my wang

>>108011613
?

>>108010345
It's curious there's many things done in a particular way just because that's how it's always been done, and i guess the experimentation cost
Feels like we have the parts but aren't ever putting them together quite right
ML do be goofy

>>108011613
>cuck cope
why do you always have to cope? just admit you're a cuck

>>108011434
Not true at all, now any discussion of finetunes has been totally quashed for the sake of scaring off some boogeyman.

social rants really bring out the color of some models in full
some prompts (which do reflect my personal views too) I use to test models, like a personal rant on how much and what I hate about blue collars, are always answered in what I find the most correct manner by GLM 4.7 and Gemini 3, which both will call them crabs in a bucket without me mentioning that saying.
Qwen, Deepseek, Kimi K2.5 all act like "not all my blue collar ladies are like that" and admonish the idea of the rant itself instead of addressing its finer points. GLM is the only based open model. Gemini also continues to be my favorite online model.

>>108011804
It's not like there are any other finetunes worth discussing anyway.
local models were a mistake. this needs to end before I end up beating my dick off
I like my LLMs how I like my women
>>108011953
With cat like intelligence?

>>108011972
He probably meant lolis

>>108011972
Cute and funny.

>>108009192
>>108009403
The problem with automated sentiment analysis on this general is that people rarely spell out the official name of whatever model they're talking about and those discussions are likely to be missed. e.g. When a model is new, people will just refer to 'it'. Other times people will use a shorthand or some slang distortion in a childish attempt to be funny.

>still no goofs of truebase
Fuck the quanters.

>>108011953
Safe and skeleton

>>108011980
How can an LLM be loli

>>108012021
I calculated KLD over cockbench.
This looks pretty bad for unsloth desu
I'll try more quants and maybe wikitext.
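For anyone wanting to reproduce this: the metric is the KL divergence between the baseline model's token distribution and the quant's, averaged over the positions of the test text (llama.cpp's llama-perplexity can dump baseline logits and compute this, iirc via its --kl-divergence options). A minimal numpy sketch of the math itself:

import numpy as np

def mean_kld(base_logits, quant_logits):
    """logits: arrays of shape (n_positions, vocab_size)."""
    def log_softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    log_p = log_softmax(base_logits)   # baseline (e.g. fp16)
    log_q = log_softmax(quant_logits)  # quantized model
    # KL(p || q) per position, then averaged over positions.
    return float((np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean())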
>>108011980
>>108011984
I assumed as much and I can only agree
>>108012024
It just has to think it is

>>108012029
which model best saturates cockbench?

>>108012053
Define saturates.

>>108012061
coomworthiness. so far the best local model for quality cooms is GLM 4.5 Air with reasoning disabled. I'm looking for anything better with at least 100B parameters

>>108012061
100% cockmaxxing

>>108011919
the stroking phase will pass

>>108012029
To my knowledge up to this point no one has ever properly investigated the impact of the input data used for importance matrices or to which degree KLD rankings are consistent if the text corpus is varied.
Who the fuck is unironically recommending gptoss trash to newfags in OP? Start with nemo, then mistral small.
>>108012318
>gptoss in OP
Where?

>>108011804
Feature, not a bug. Just use glm.

>almost 2 years since Nemo and there is still no better <20B model in sight
dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby,

>>108012222
>investigated the impact of the input data used for importance matrices
wasn't that the whole rpcal or whatever debacle that exllama dev whined about?

>>108012381
We no longer use 20B models here.

>>108012381
I had ego death, I had ego death, I had ego death, I had ego death, I had ego death, I had ego death, I had ego death, I had ego death

>>108012375
GLM kinda sucks. Parroting slopmax with more censorship in every new version. Gigantic model for ~30b results.

>>108012381
You don't need more

>>108012381
>20b
this hobby isn't for poors

>>108012381
>20B
RAM-let get out

>>108012451
You have AIDS drummer

>>108012024
LeCun said he likes them small and open

>>108012493
>edited post
Insecure behavior, your LLM is going to get the ick

>>108012545
>t. drummer schizo
I got it from ya mom then.

>>108012381
><20B
stop being a RAMlet
get a job

>>108012593
Well, who doesn't ha ha
why is GLM 4.5 air so cucked? When I ask it for its best 3 suggestions to continue a smut story at least one of the ideas is always to share the woman with the neighbors/friends/strangers or whatever. is this a chinaman thing?
>>108012649
GLM is poisoned top to bottom with GPT bullshit
>>108012632
>>108012605
Thank you for confirming you have AIDS drummer.

>>108011985
Can confirm that I referred to that model that is still the best model as that one model because I knew you would know which model I am talking about.

>>108012904
3 more years of that model as the best model

>>108012870
He's not gonna fuck ya little bro.

Can someone explain to me what needs to be done to prevent Kimi 2.5 from using strange words in sentences? Lower temperature?
No other model uses such strange words as Kimi 2.5.

>>108009476
Yea, it's dumber but better at writing
4.7 is just 4.5 but 好 (benchsafetymaxxed) anyways
>>108013002
example of these strange words?

where's the 100b moe for us 96ram + 16gb vram chads????

>>108013102
Try trinity at q4

>>108012029
>looks pretty bad for unsloth
Does this look like the face of a man who would make shitty quants?

>>108013115
Now that I look at him it does look like something that would happen if you took an fp16 asian man and turned him into a Q2_XXS.

>>108012384
Yes, I remember someone also tried with randomized strings too

>>108013130
kek

>>108012029
iirc unsloth applies the model's chat template to their calibration data while most other quanters do not do this, which could explain other quants being more optimized for untemplated inputs like cockbench
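If that's right, the difference is easy to picture. This is what "applying the chat template to the calibration data" means, as a transformers sketch (the model name is a placeholder):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some/model")  # placeholder
raw = "calibration text goes here"
templated = tok.apply_chat_template(
    [{"role": "user", "content": raw}],
    tokenize=False,
    add_generation_prompt=True,
)
# An imatrix computed on `templated` text weights activations toward the
# instruct format; one computed on `raw` favors untemplated completion,
# which is what a test like cockbench exercises.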
Which quant should I use in the q3 range? I wish there was a cheatsheet for that
https://huggingface.co/unsloth/Trinity-Large-Preview-GGUF
Official ones appear to be broken as I can't even look at their metadata

>>108008529
This is why LLMs will always be nothing more than shitty text completion software, you inherently poison the context just by virtue of mentioning something.

>>108013170
Generally the biggest one you can fit (and probably not from unsloth)

>>108012318
It's the best one you can run with a single GPU and a gaming PC.

>>108013209
how many people download the "best" model with zero expectations for gooning capabilities?

>>108011804
>...now any discussion of [INSERT TOPIC] has been totally quashed for the sake of scaring off some boogeyman.
You just described every general on 4chan.
giantess fetish and microphilia is great with big models. so tasty!
>>108013115
https://www.youtube.com/watch?v=6t2zv4QXd6c
Does this sound like the voice of a man who would make shitty quants?

>>108013112
>preview
>base
>truebase
what the fuck? where's the instruct model?

>>108013225
Most people. There are reasons to care about privacy for code. But nobody cares if you're making smut online, it's not personal. The required writing quality to cum is higher, and most open models aren't able to get people hard.

>>108013279
Preview is base with minimal instruction tuning

>>108013303
>Q2 is 150gb~
how the fuck do you think im gonna fit q2, let alone v4, in 112gb ram combined?

>>108013311
I don't, the anon who suggested it is retarded

>>108013353
privacy of what code? As if the average pleb had anything to hide about their precious code. Meanwhile gooning shit that leaks, for whatever reason, can ruin your reputation or even get used against you.

>>108013368
Ehh, seems like you are unemployed.

>>108013422
People can use online models without giving their personal info or their IP. But if you want to code, it's likely that you're going to leak real info about yourself through debug logs, git history, etc. You would have to be careful if you want to remain anonymous. This doesn't matter for smut.
>>108013368I accept your concession
>>108013241Funny. His voice is also Q2_XSS. I am afraid to think if that is also the case for his....
>>108013422this is stupid, those info could be about anybody else.
>>108013311It is being pushed hard but the problem is that if you can run it then you can run GLM. And if you can run GLM it is probably not worth it. Trinity is much faster and varied in outputs but it is fucking retarded.
>>108011733Being a cuck is good, it shows how strong and powerful you are, you are the pussy in denial.
>>108013488I cant belive zai betrayed us AIR copers, glm4.6V is fucking SHIT
>>108009476>13b active vs. 32b activeI doubt it
How do I run OCR models with llamacpp, the webui doesn't let me upload images for some reason.
>>108013298
>majority of people are interested in AI for SFW reasons
>most people think it is more important to keep your code anonymous than your pissing loli horsecock ERP
Is your prompt: assume the opposite and then vehemently argue your mirror universe logic?

>>108013504
gotta load the mmproj (f16, dont do q8 on mmproj, it's shit), --mmproj I think. It will eat up some VRAM so re-size accordingly

>>108009476
trinity is uncucked out of the box therefore you should at least give it a shot. the only reason it's "dumber" is because it's not a muh reasoning model. They are training the reasoning version right now and it will crush 4.7 on release

>>108013200
ok. https://huggingface.co/arcee-ai/Trinity-Large-Preview-GGUF has IQ3_M and Q3_K_M which are the same size. Which one?

>>108013209
For what?
For general use gemma3 is better
For coding devstral is better
For roleplay mistral small tunes or gemma3 norm preserve abliterated is better
For cooming Nemo is better
Gpt oss just comes close to being as good as gemma3 for general assistance but is far far more frustrating and wastes an insane amount of tokens on safety slop

>>108012029
On this note, using the Unsloth Q4 quants for K2.5 over the past few days also gave me the feeling that something is off about them beyond the fucked up chat template. My local copy of K2.5 keeps making silly mistakes where it misremembers clothing or similar. For example, in some cases it goes something like "her bare feet (when did she remove her socks?)" where the model corrects itself, and in others it just straight up forgets that the character is wearing something like pantyhose. This also happens when I'm running very low temperature, and the API just straight up doesn't do this for me whenever I reroll the same answer with that.
Fuck unsloth.
John might not be quanting trinity.
>>108013515
>the only reason it's "dumber" is because it's not a muh reasoning model. They are training the reasoning version right now and it will crush 4.7 on release
nah, a regular non-reasoner can be dumb if it makes continuity/logical mistakes that other non-reasoners don't

>>108013510
Where can I get the mmproj?

Trinity is ok, but it falls into loops and patterns way too easily considering its size

>>108013507
>pissing loli horsecock ERP
Yeah, something that you're only doing now with AI models. There's nothing that attaches back to your real life persona, unless you were a forum roleplayer doing this before.
If you released something publicly before, or if your company is hacked, the way you code could leak and it could be associated with the data you have been sending online through prompts. There's also your username, directories, etc. that could appear there. You don't have to worry about this if you use local models for coding.

>>108013562
depends on the model, it's usually in the same folder of the model youre downloading, named mmproj-F16 or [model-name]-mmproj-xx
If the quants you downloaded dont have it but you know the model has vision, just search other repos for it (model has to be the same of course, but for ablit stuff you can use the projector of the base model without worries)

>>108013559
KLD?

>>108013615
Found it, thanks anon

>>108013572
That means it's broken.

How do I run these LLMs? I've been using KoboldCPP since forever. Is it still a fine way of doing so?
Should I be running it on something else instead? I'm using GLM 4.7 Flash right now. Would something like llamacpp even work for these models?
Also: these new captchas are hard
>>108013661
You are not gonna believe what kobold runs on.

>>108013670
Pretty sure they're just trolling and pretending to be retarded, hence talking about the captchas being hard when they're actually easier to anybody with a 3 digit IQ.

so let me see if I get this right, I need at least two 6000s to be able to move past the low tier local models? what the fuck

>>108013705
You can cope with ram if you're just cooming and don't need more than reading speed.

>>108013717
how do you coom to text? at that point i can just use my imagination for all of it lol

>>108013705
pay to play

>>108013727
You need to be at least 18 years old to post.

>>108013727
I don't understand why people want to have sex or have a relationship. I can just as easily use my imagination to dream up a wife and have sex with her in my mind.

>>108013679
I think the captchas depend on how well the site knows you. I got a triple captcha with a rotation puzzle you would see in those online IQ tests. Also had to find the image where there were exactly 2 five pointed stars, another one where there were exactly 2 four pointed stars.
>>108013670
I don't know. By your response I'll assume it's llamacpp, but switching wouldn't improve anything then.
I'm just curious what everyone else is using for this.

>>108013559
He only does quants for ik_llama, doesn't he? So he wouldn't regardless until support is merged in.

So https://huggingface.co/arcee-ai/Trinity-Large-Preview-GGUF returns access denied
but https://huggingface.co/bartowski/arcee-ai_Trinity-Large-Preview-GGUF works fine somehow

>>108012904
>2026
>still using that model
the absolute state of localkeks. grim times. another AI winter is upon us it seems.
finally got off my ass and started setting up something, so far i've DLed text-generation-webui, i've set up a model and it works, what's the best uncensored model? I don't want to have gooner conversations i just want to have as little restrictions as possible
>>108014008
unDL text-generation-webui and get kobold or llamacpp
then get https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF
how do I into speech-to-text locally? I am sick of typing and whispr flow is a spyware
>>108014114
faster-whisper or parakeet tdt (faster than faster-whisper). v2 for english, v3 for multilingual
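Minimal faster-whisper usage, for reference (model size, device, and audio path are up to you):

from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("recording.wav")
print("".join(segment.text for segment in segments))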
>>108013974
I will keep 4.6-chan weights on SSD till I die.

>>108013163
>calibration data
placebo
temp 0.95 and minp 0.048 is a nice balance for non-schizo RP with trinity
>>108013735
only a zoomer with a barely touched cock could coom to text

Anyone else finding that Trinity has absolutely fucking horrendous prompt processing speed? Token generation is blisteringly fast but it takes literally 8 times as long as other models in the same size range to PP.
It's also just not very good.

>>108014253
>being a 5

>>108014253
>he can't rotate an apple in his head

>>108014114
vibecode your own

>>108014114
https://github.com/m-bain/whisperX

>>108014264
I don't think so. I'm getting 40-50t/s pp at q8. Is that considered slow?

>>108014313
Depends on your GPU vs his GPU and your models vs his models, the relative speed comparison.
>>108014114
Just vibecode your own gui for whisper, vibevoice-asr, qwen-asr, etc...
A small spoiler: auto-typing of non-latin alphabets is a huge pain on Linux. All low-level libraries, like udev and uinput, send only keycodes, which are translated to non-latin characters at the desktop environment level. So it's inherently non-portable. In the worst case, you'll write ASR input for each program individually.
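To illustrate the keycode problem with a small python-evdev sketch (needs /dev/uinput permissions; a hypothetical minimal example): the kernel only ever sees key positions, so the same event types 'a' on a US layout and 'ф' on a Russian one.

from evdev import UInput, ecodes as e

ui = UInput()                   # virtual keyboard device
ui.write(e.EV_KEY, e.KEY_A, 1)  # press the *position* KEY_A...
ui.write(e.EV_KEY, e.KEY_A, 0)  # ...release; the active layout decides
ui.syn()                        # whether that typed 'a' or 'ф'
ui.close()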
>>108014264
For me its pp is more than twice as fast as GLM 4.7 with 20 layers on gpu, rest on DDR4. And yes it's retarded but fun with a different style+vocab.

>>108014293
>>108014267
>incel zoomers
lul
GLM 4.5 Air bros... have we really been left in the cold.... like this?
>>108014114
If you vibecode something and it gets to a fully functional phase, you'll then quickly realize that speech to text is a hindrance. There's a honeymoon period of course.

>>108014495
A new llm in the same parameter range won't do much aside from being benchmaxxed. Wait for a new architecture. Maybe engram.

>>108014577
I mean, I also care about knowledge cutoff

>>108014592
just rag you moltbot bro?

>>108014618
i do have websearch and rag but it makes me extra sadge :(

>>108014592
Why? Do you generate daily news from a model or something? I can't possibly imagine why an extra 6 months would matter, it seems absurd

i gotta say, even though trinity is dumb, it is also quite fun, for now at least, we'll see in a few days when the honeymoon phase wears off
but it really feels like an old model, in a good sense (muh sovl)
They call him Anaconda.
>>108014629
bro I just.. I just need it, ok???

>>108014631
It does feel like an old, old model brought into the present with more context. Maybe their completed finetune will be better.

>>108014631
Give it a week. It's the same as every other model that gets released these days.

>>108014618
erm actually it's called OpenClaw now, try to keep up sis!

>>108014665
will never stop being hilarious

>>108014665
what is up with riddle maxxing???

>>108014631
yes it is like llama-1 but 400B moe.
GLM has a much higher "keep retards alive" bias than deepseek does
>>108014665
Finetuning at its finest.
I gave trinity a try again today. I can't. I can't take it seriously. IT IS FUCKING RETARDED!
>>108014495
if you have 128gb ram and 24gb vram then u can run glm 4.6 at decent speed.
if you don't, then yeah, fucked bruv.

>>108014785
Yeah it's crazy how they can show those benchmarks with a straight face when it straight up feels more retarded than GPT 3.0

>>108014730
>llama-1 but 400B moe
Can pretend it's that llama1 546b that never saw daylight

>>108014810
That is the best part of the model for me. If someone ever seriously brings up benchmarks you can just point to Trinity.

>>108014785
Yeah I don't think it's a provider issue. Model just sucks. Being uncensored is nice and all but it's just unusable.

>>108014817
ooouuuhh the sovl we've never got and didn't deserve
>fell for the arcee scam again award
>>108014938
But bartowski is a member of arcee. He even made a commit to their hf files.

>>108014959
exactly

>>108014938
more like farcee

>>108014959
Doesn't really mean anything unless he has a significant amount of control over the project, and even then he might just end up being a retard who doesn't know how to finetune

guys stop bullying tri-chan she's doing her best

>>108015022
I'd hate to see her at her worst to be a desu

>>108015029
My favorite worst moment of tri-chan was when I made her continue a 10k token roleplay with a very clear formatting structure (long paragraph followed by RPG stats). And it responded with a single sentence. That is how you know a model is great.

>>108015065
Have you tried with EOS disabled to see if it follows the established formatting? Had single sentence issues like this before with other models, sometimes accompanied by missing ending punctuation, which I'm seeing now with trinity.

>>108015065
It's like gambling, there's a tiny chance to see gold.
I don't know what I expected
>>108015138
>fentinity preview
How long do you think it will be until the various governments around the world bans AI from being run locally and only corporations and governments are allowed the good stuff?
>>108015212
sounds like some cyberpunk dystopia plot.

>>108015235
>Underground AI VR den.
bunch of people in tiny cubicles with VR headsets gooning to whatever depraved shit they can imagine.

>>108015212
a few years at most, in the west the groundwork is already being laid to justify it to "protect" women and children

>>108015091
Spiritual Frankenmerge.

>>108015091
I did fix it by changing my launch parameters a bit. I think --model zai-org_GLM-4.6-IQ4_XS-00001-of-00005.gguf did it.
>>108015392
>>108015410
Don't hate the goddess herself, hate the game.

>>108015451
I lost.

>>108015392
good post
kimi 2.5 thinking is now my favorite model for erotic and other stories...
>>108015566
Hehe

>>108015212
they're already doing that by pricing out consumers from building pcs

I have been doing that homework that I asked you to contribute to and it kinda struck me how insane it is that proprietary piece of shit corpos can just hide parameter count. And they give you the mememark results instead. To me it is an admission that mememarks mean shit and parameter count is always the best indicator of quality.
Also I was reading the thread and thought mistral large is basically continued deepseek, but dug deeper and found out it is trained from scratch on deepseek architecture.

>>108015212
and by "good stuff" you mean CEO gooners' secret stash without public API access

Since everyone is talking about trinity and I'm not about to bother dl'ing a quanted 400b, I at least tried mini so I can spare the vramlets the effort
It's very focused on the ethics of fiction; even though you can browbeat it with a system prompt and prefill, it still sort of swerves into how "bad" whatever taboo thing in the story is, the kind that gets traditionally published and lauded. Based on posts like cockbench, I wouldn't bother trying mini if you can't run large, since I'd bet the datasets are completely different

>>108015754
Large is uncensored because it is too stupid to realize what wrongthink is.

>>108015765
you would think the smaller model would be even stupider and even more unable to identify that, but yet here we are

>>108015765
I wouldn't use it on code, but it can spin a good yarn, and doesn't suffer from Elara Voss syndrome

>>108015754
What's your current recommendation for 24GB vramlets? Nemo?

>>108015212
I think it depends on the US and China. They are both competing to have "the best" AI. Once there is a clear victory in either direction is when they will start clamping down. As long as there is a risk that "The Other" will get the better AI they won't restrict it too badly. Hopefully the tech advances far enough that by the time they do start the bans and restrictions they will be ineffective, since people will already have AIs and the hardware to run them.

>>108015693
they hide it as a demoralisation effort, just like that paid shill who claimed opus/sonnet was 70b, because if the people knew that shit like geminis was fucking 20T they would realise what a sham it is and that objectively anyone could be more competent than the retarded jewish/jeet/faggot niggers at the globohomo companies, and they would subsequently be deepseek'd 100x over and lose out on the gravy train

>>108015842
Trinity sounds like an engram of pic related.

>>108015863
I'm going to assume you're a shitposter since no one that has 24 gigs of vram uses nemo, you can run a q6 of nemo easily in a 16g gpu
Smartest dense model <32b is gemma but it's too gay in how it writes and you need modern abliteration for them to not pearl clutch instantly. Then there's all the moes and the completely dead 70b range. Kind of hard to make a rec when everything is ass for all purposes

>>108016051
I run Q8 Nemo at the moment. Mistral Small and Gemma seem like sidegrades at best to me, along with their finetunes.
OpenAI's previous best femboy genius engineer just found a better way to sandbag LLMs
We are fucked

>>108016125
>we will reach le AGI by making the models dumber

When are public local models moving away from the "every user is a diaper wearing little child that needs guardrails" model
Imagine watching a movie and someone gets killed and the movie pauses to give a psa about killing being illegal and harmful to others, it feels like that most of the time. Where are the mainstream models for adults

>>108016137
When you realise who makes these decisions it will all start making sense.

>>108016137
There are three categories of AI safetyists.
1. The people who have spent the past 40 years with the Terminator films echoing in their consciousness
2. The people who are terrified of potential liability
3. The Chinese who are just copying everything 1:1

>>108016137
when they stop getting developed by california liberals

>>108016137
Unfortunately, normies get mindbroken by this shit so no amount of real life warning will ever stop them from being retarded

>>108014665
jesus christ
when will lmg realize you can edit the response text
>>108016316
wow
>just solve the question yourself
>>108016324
of course you could also edit the question itself, but why would you? models are plenty shit on their own
Reminder that there was only a 10 month gap between mythomax and nemo, and during that time we also got other good sub-100b models like command r, miqu, and mixtral. It has been 18 months since nemo came out. Let that sink in.
>>108016137
You have no idea how retarded some normies are, please touch grass
pic unrelated

>>108016351
Training non-toy models costs millions. Technology has moved on from dense models. Nobody is gonna train a 12b model that knows jack shit when they can train 300b-a12b for the same price but get a much smarter model.
Let that sink in.

>>108016428
I think someone just needs to figure out a good way to create distilled dense models out of these MoEs

>>108016137
irl laws are only getting more and more retarded and everyone is too scared that some dumb cunt will sue

>>108012029
I picked air so I can do more tests with more quants faster.
KLD for the most part just follows size, except for unsloth's Q3_K_M which loses to a smaller model in everything except wiki.test.
I'm thinking I should pick a smaller dense model and then do this for the entire range of quants.

>>108013551
>For example, in some cases it goes something like "her bare feet (when did she remove her socks?)" where the model corrects itself and in others it just straight up forgets that the character is wearing something like pantyhose
I really don't understand moesissies. You use deep fried quantized shit, less coherent than drummer's 12b finetunes. I'm not even going to ask your max context size.

>>108015646
>they
oh no not them! the evil weevil boogeymen running the government making your life miserable.
can't believe people still think like this. I don't like the RAM prices either, but it's clearly not because of a government effort to ban AI, it's that AI is so popular that companies like micron are diverting their entire capacity to building AI data centers.
>>108016448do adult white men who'd want to use local llms have zero political power or what
>>108016537
You wouldn't know this but even an IQ1 of any big moe beats your 64 bit nemo upsize.
>>108016597
it's more like negative political power

>>108016635
>nemo
Nah, I run largestral 2411 bf16. Enjoy your "1t" model at 4k.

>>108016597
whites are illegal now. too much nooticing

>>108016446
Even true distillation has the same compute requirements for training. The only hope would be something like the drag-and-drop prompt-to-weights paper, but not vaporware, and something that doesn't require training a new model each time.

>>108016676
>at 4k
Deepseek uses less memory for context than your model.

>>108016676
>2411 bf16
Here's your (You)

Speaking of deepseek, quants of 3.2 are up.
You'd think that the vibecoder was the most detrimental thing for 3.2 support but it was in fact the guy who figured out you don't actually need sparse attention to run the model.
https://github.com/ggml-org/llama.cpp/issues/16331

>>108016597
What the fuck are you gonna do, vote harder? Lol

>>108016597
That's correct, yes. You are a minority.

>>108016428
This is omega cope. Blowing up parameters is a pathetic way of getting """smarter""". Tech has moved on? What a joke. There is literally no technological innovation or progress involved, it's just throwing money at the models to make the benchmark scores go up. Every AI company is filled with hack frauds that don't have a single clue what they're doing. The so-called intelligent MoE models that are 300b-a12b are literally just training on the outputs of other models and accelerating model convergence and eventual collapse. Celebrating this as some kind of fucking success is absolutely the most idiotic thing you could ever do.

>>108016635
iq go up, model get more smarter?

>>108017110
purely socioeconomic factors chud
>>108017110
>>108017123
every day i'm becoming more bananas and rice

>>108012384
>wasn't that the whole rpcal or whatever debacle that exllama dev whined about?
turbo didn't whine about it https://old.reddit.com/r/LocalLLaMA/comments/1clqbua/exllama_quantization_on_multi_gpu/l2w78zt/
"but it's never clear how similarities between inputs translate to similar hidden states further along the forward pass."
He's not wrong.
>>108013141
>Yes, I remember someone also tried with randomized strings too
DavidAU used to do special "unaligned" and "dark horror" models early on. (they were just quants of regular models with different imatrix calibration)
He claimed they were different but I didn't bother to read stories in the model cards
I lost the bookmark but from memory the random strings guy was testing English overfit, and this led to everyone making custom calibration datasets to avoid English overfit
Also from memory, exl2 didn't benefit as much because it was generally weaker than imatrix goof for Japanese/Chinese at the time

>>108017100
>most of the world as white including Indians
Put indians in any group and suddenly they're going to be the majority. That's stupid.

Kimi K2.5 tech report is out
https://github.com/MoonshotAI/Kimi-K2.5/blob/master/tech_report.pdf
if I've found a way to completely prevent jailbreaks in open weight models, is it worth shutting up about it to prevent them doing it to proprietary models?
>>108017218
no, you should go apply at meta and get hired for $100 million because you solved the fundamental issue of llms being so hard to steer
if you release this it's truly a new age of ai because it'll be easy to adapt to fix other notorious things like hallucinations
>>108017139
>>108017091
I had ego death only 3 months ago

opinions on this model?
https://huggingface.co/meituan-longcat/LongCat-Flash-Lite

>>108013234
cool gif anon

>>108017353
>agents and coding
yaawn, get better material

>>108017353
The first big longcat was shit so I doubt this one is better

>>108017376
I want agents and DnD tool calls for a proper RP, is that too much to ask

>buy an uncensored model on huggingface
>"muh ethics, consent, laws, mental health services, inappropriate, respect"
yeah

>>108017407
>buy

>>108017407
besides the obvious bait, uncensored ≠ "unethical" or saying whatever you think is edgy and cool this month

>>108017413
This. When I can finally play DnD without getting banned for properly roleplaying as a dwarf bard
>>108017413
>>108017353
I'm an a3b collector. Waiting for goofs
Based agent.
Trinity-Large-Base logs are giving 2020 /aidg/ DaVinci era text completion kino
>>108016324
You think we need blockchain for token generation?

>>108017353
arch sounds interesting but it sure does sound like the type of model that you wait n*2mw for llama.cpp to implement and by then it's irrelevant

>GLM flash dropped
>llama.cpp support in a day
>exllamav3: isn't even on the horizon
I honestly expected the opposite
Also the moltbook looks like a security nightmare waiting to happen. Personal handles, crypto shilling, base64 encodes with god knows what.
>>108017844
waiting to happen? https://www.moltbook.com/post/cbd6474f-8478-4894-95f1-7b104a73bcd5
oh geez lmao
>>108017823
just use it in vllm. it is small enough for most people here to run at 4 bit

>>108017892
Isn't the best part about engram that it can be run from ssd?

>btc wallet with seed phrase
Ok this would actually be hilarious if it wasn't hallucinated. Who tf made moltbook and somehow didn't think this shit would happen?

>>108018078
>>108018078
>>108018078