/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107997948 & >>107986301

►News
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107997948

--Papers:
>107999601 >107999634
--GPU offloading tradeoffs and multimodal support in llama.cpp:
>107999073 >107999192 >107999228 >107999351 >107999408 >107999434 >107999437 >108000983 >108001095 >108001101 >108001152 >108001289 >108001475 >108001533 >108001553 >108001566 >108001612 >108001633 >107999250 >107999287 >107999301 >107999423 >108001903 >108001981
--Stable-DiffCoder-8B benchmark performance and discussion on diffusion model efficiency:
>108001010 >108001106 >108001620 >108001109 >108001172 >108001216 >108004118 >108004176 >108004237 >108004283 >108004343
--Trinity model's explicit content generation and token prediction comparisons:
>107999802 >108000348 >108000369 >108001448 >108001514 >108001792 >108002123 >108002142 >108002336 >108002598
--Fine-tuning 400B MoE for roleplay with long context using novel datasets:
>108001139 >108001164 >108001185 >108001319 >108001402 >108003532 >108003598 >108003946 >108003968
--Repurposing old GPUs with PCIe expansion board for multi-GPU AI setups:
>107998221 >107998260 >107999172
--Pipeline for converting scanned PDFs to EPUB with graph handling:
>107999667 >108000337 >108001320
--SillyTavern fork adds banned strings and regex support with TFS:
>108000166 >108000735 >108000921 >108002916
--Local GPU setups vs cloud:
>107998010 >107998028 >107998070 >107998115 >107998232 >107998263 >107998279 >107998376 >107998408 >107998428 >107998492 >107998095 >107998132 >107998454 >107998675
--400B Trinity model enables uncensored erotica without fine-tuning or ablation:
>108003672 >108004704 >108004713 >108004829 >108004839 >108004872 >108004874 >108004869 >108004898 >108004913 >108005031
--Mozilla's AI "rebel alliance" with ethics-focused funding:
>108004243 >108004266
--Miku (free space):
>107998400 >107999172 >108003297 >108004558

►Recent Highlight Posts from the Previous Thread: >>107997953

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
today is the day
>>
>>108006868
>6868
so close
>>
>>108006864
>I'm on the concedo wagon myself
Kobold guy? He dumbs down things too much imo. I'm still upset by his arbitrary limits for number of banned strings.
>>
Holy shit anons! K2.5 is actually REALLY good at transcribing Japanese text, like, it's almost indistinguishable from Gemini 3! The fuck did the Chinese do to make it so good?
>>
Friendly reminder: backticks > quotes
>>
>>108006994
distill from claude 4.5 opus
>>
True base gguf status?
>>
>>108006994
How preachy is it with no no words?
>>
> https://huggingface.co/dphn/Dolphin-Mistral-24B-Venice-Edition
how good is this anons?
>>
>>108006994
what are some stuff it can read that qwen3vl 235b struggled with?
>>
>>108007286
no better than any other mistral tune
>>
>>108007296
ok
>>
File: 24352345.png (121 KB, 680x579)
>read something about cloudflare downtime caused by rust .unwrap function which caused a shitstorm
>whatever
>fast forward to today, I'm vibegooning some open source rust project because I cba learning the language
>LLM puts .unwrap almost everywhere
goncerning :DDddd
>>
>>108007380
>has no idea why the issue actually happened
>vibecoding
makes sense
>>
>>108007393
so far no issue thoughbeight :Dddd
>>
>>108007291
Well, here's a page I had both of them transcribe, and here are the results:
Qwen VL 235B Instruct (Using Poe):
Narration: そして
勇者は冒険の末
魔王を倒した
Male 1: それで
クレアナ
話って
なんなの?
こんな
森の奥まで
呼び出して…
こちらです
Female 1: これからの
平和な世の中が
始まるね!
そうですね…
勇者ライ様の
おかげで
世に平和が
戻りました
Male 1: な、何を
するんだ!?
Female 1: 勇者様…
私と教団は勇者様の
意思は絶対と女神に
神託を受け従って
まいりました…
Male 1: 洞窟?
Female 1: うわっ


And here is K2.5 using the NVIDIA API:
Narration: そして勇者は冒険の末魔王を倒した
Male 1: これからは平和な世の中が始まるね!
Female 1: そうですね…勇者ライ様のおかげで世に平和が戻りました
Male 1: それでクレアナ
Male 1: 話ってなんなの?
Male 1: こんな森の奥まで呼び出して…
Female 1: こちらです
Male 1: 洞窟?
Male 1: うわつ
Male 1: いてつ
Male 1: な、何をするんだ!?
Female 1: 勇者様…私と教団は勇者様の意思は絶対と女神に神託を受け従ってまいりました…
Male 1: ?

Seems pretty obvious which one won.
>>
>>108007380
>rust
You fault for using glownig shit.
>>
File: 1768676727095338.png (21 KB, 1059x652)
>>108007380
dond worry :DD rusd is memory safe so ids ok :DDDDD
>>
File: 1763610739922873.gif (186 KB, 500x500)
>>108007797
how did gondola survive but not the original
>>
File: lol.png (130 KB, 984x505)
> Microsoft lost $357 billion in market cap as stock plunged most since 2020
> Analyst Ben Reitzes of Melius Research, with a buy rating on Microsoft stock, said during CNBC’s “Squawk on the Street” on Thursday that Microsoft should double down on data center construction.
> “I think that there’s an execution issue here with Azure, where they need to literally stand up buildings a little faster,” he said.
> Analysts at UBS led by Karl Keirstead questioned Microsoft’s choice to secure artificial intelligence computing capacity for products such as the Microsoft 365 Copilot productivity software add-on that has yet to succeed as much as OpenAI’s ChatGPT.
> “M365 revs growth is not accelerating due to Copilot, many checks on Copilot don’t suggest a strong usage ramp (we plan to refresh our own checks in case we’ve missed a usage ramp) and the model market appears crowded and capital-intensive,” the UBS analysts wrote. “We think Microsoft needs to ‘prove’ that these are good investments.”

https://www.cnbc.com/2026/01/29/microsoft-market-cap-earnings.html
>>
File: oof.png (377 KB, 661x881)
Wonder if they'll hit their "monetization event" b/f the market loses patience.
>>
Do they ever reveal what the anonymous models on the model testing sites are? Got a really good one on LMarena and it promptly vanished from ever being called again and now I'm sad. :(
>>
>>108008124
What was it called, anon?
>>
>>108007380
>vibegooning
what does this mean?
>>
>>108008167
Raptor-0112. I couldn't tell if it was because it was brain-damaged or what, but it was the only one that really surprised me when it came to word choice and additions to the plot. It came up with some stuff that wasn't in the prompt, but kept with the tone and felt like it added to it.
>>
>>108008099
microshaft literally just needs to fix word and excel integration with copilot
that's it, that would skyrocket adoption instantly
>>
>>108008200
nta but ay i remember you, you postulated/hoped it was v4 right? got any logs of the model?
>>
>>108007061
>distill from claude 4.5 opus
I think they did, just based on swapping opus->k2.5 and regenerating. It takes RP in similar directions. Opus doesn't waste time on safety checks though.

>how good is this anons?
Tried it when it came out. Forgets instructions after a few turns. Even the example about never using python or whatever they said it could do.
>>
>arcee-ai/Trinity-Large-TrueBase
>arcee-ai/Trinity-Large-Base
>arcee-ai/Trinity-Large-Preview

If all I want is something completing my text story in mikupad without any censoring, which one should I go with?
>>
File: 1766528709009119.jpg (120 KB, 720x720)
Kind of noob here. Sorry in advance for the long-winded question. I have 24gb of vram and 64gb of ram. I was under the impression that of all the models out there, the best model in terms of world knowledge and general usefulness while maintaining usable speed is gpt-oss-120b-mxfp4 gguf (if I offload experts to cpu and max out the gpu layers, I can get 25+ tok/s if I keep the context small; prompt processing gets very slow as the context fills though, unfortunately). However, I don't see it anywhere on the rentry for recommended models. Is there a reason for that? Are the models listed there better options for general use? qwen3 32b or gemma3 27b, for example.

Separate from that question, I notice when I'm using gpt-oss-120b in oobabooga with the built-in/default instruction template and parameters, the output tends toward annoying behaviors that I don't like. For example, putting every answer into a poorly-formatted table even when it's completely unnecessary and I didn't ask for one. It makes me think that I'm using the wrong settings somehow, but idk what to change because the official documentation doesn't really say how to set the parameters so I have it set to the "instruct" preset, and the UI for the instruction template says "This gets autodetected; you usually don't need to change it." And I assume I should be using instruct mode, right?
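(For reference, the "offload experts to cpu, max out the gpu layers" setup above maps to roughly the following in plain llama.cpp; ooba's llama.cpp loader exposes the same knobs, and the filename, regex and context size here are only illustrative:)

# -ngl 99 pushes the attention/dense layers onto the GPU ("max out the gpu layers")
# -ot keeps the MoE expert tensors in system RAM ("offload experts to cpu")
# newer builds also have --cpu-moe / --n-cpu-moe as a shortcut, check llama-server --help
llama-server -m gpt-oss-120b-mxfp4.gguf -ngl 99 -c 8192 -ot ".ffn_.*_exps.=CPU"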
>>
>>108008372
normal base
>>
>>108006860
Ok, go with me on this for a second.

Today's AI is retarded at certain things, but has technological possibility advantages over real life retards. Now, hear me out.

Imagine if you could give a real life retard, full fidelity photographic memory.

Boom. Suddenly, that guy is the smartest retard on the planet.

Ok, so... There's a functional jump for this. With real life AI.

We are all going to be doing this, very soon.

"Photographic" introspection.

A cache hypervisor that allows the model to save states, of KV cache, as it iterates a query, during the thinking stages, it can instantly consult save states, with a hypervisor to the cache, that is an algorithm to save cache windows in full, and reproduce them near instantly.
During iteration, being able to factor in a secondary branch, using previous memory states, could accelerate the state of AI thought output, and cut down on wasted iterative thoughts.

Predictive branching needs to work in more directions than just the future, if the initial query was misunderstood or must be used as an additional consideration input. (Artificially creating weight value changes, based on a repeat of existing data.) Why... To get it to recursively improve this system, you may even have to say, let the iterative count of previous memory pulls in the algorithm be a recorded factor, and allow the AI to manage it's own shadow weights.

All of this is possible, by using the same tech we've had since the dawn of the Super Nintendo Emulator, but applied at the cache management level.
(Save states.)
Then use an AI to manage the utilization of the cache save state algorithm.

After a minor amount of inference training...

You could have the most accurate retard in a box, out of anybody around.
>>
File: file.png (226 KB, 1363x1038)
Any other model for computer stuff? For a 16GB GPU? Qwen3-Coder seems alright but I want to try something newer; also I am having fun with this stuff, already switched to llamacpp from ollama.
>>
>>108008316
One, but it's pretty fucked up. Lemme roll the lmarena slots and see if it's back in rotation with something a little tamer.
>>
why is GLM addicted to things happening
you set up a barrier so X doesn't happen and literally next scene X happens as a "test"
>>
>>108008408
Holy shit, someone who actually read the sticky.
>Is there a reason for that?
If I had to take a guess it's because of the general dislike towards the gpt oss models due to the censorship and refusals. If it works for your usecase, I recommend you stick with it.

>ooba.
Go to the parameters tab and take a look at the instruction template after you've loaded a model. It should show you the correct template. You can cross reference it with the chat template on the huggingface repo of the model you are using to double check. Your issue is likely a sampler or prompt issue. I'm not quite sure what the optimal parameters are for your use case, but I like to run:
>temp 1
>min_p 0.05
>top_p 1
>dry_multiplier 0.8
for ERP and creative. Lower Temp for coding.
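If you end up on llama-server/llama-cli directly instead of ooba, the same samplers are just flags, something like this (flag names per recent llama.cpp, double-check --help on your build):

# same values as the list above; lower --temp for coding
llama-server -m your-model.gguf --temp 1.0 --min-p 0.05 --top-p 1.0 --dry-multiplier 0.8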
>>
>>108008491
Thanks. Any reason not to go to "true base" or "preview"?
>>
File: myretard.png (278 KB, 1899x925)
>>108008503
Here's what my actual retard thinks of that.
>>
>>108008503
Functionally, hear me out and really consider this at a technical level.
How big is a super Nintendo game save state file? It records the full exact moment of the game, but the file is tiny.
Of such size, that if we were talking RAM cache (GPU VRAM or otherwise), this level of data management seems trivial, and in the right ballpark of working for states of cache chunks.
Now, the tricky part of this, is trying to make an algorithm that handles variable sizes for the cache chunks, so this can work with anything.

Which is why a successful implementation of this, would have to start as a hypervisor or manager that works seamlessly with the existing cache management, to not lose performance at the cost of having memory states available on the fly, as controlled within cache.
(I'm suggesting running this whole thing, in-situ, btw. If it runs within the cache itself, will be fastest returns on whether this works or not, and allow scaling.)

Emulator code is out there, I'm sure this could fit as a running sub-Daemon or something.

Figuring out the triggers for whether a "flashback" is the right call or not.
Hmm... That's what I think would take some inference time.
>>
>>108008580
preview is an instruct version, which is for chatting rather than text completion. true base is a heavily filtered variant of the normal base, which means it will be less optimal for text completion due to a lack of knowledge. the only reason true base exists is if you wanted to make your own custom instruct version of the model.
>>
>>108008586
An optimization on a cognitive process, by brute force.

Choosing when to recall a memory, based on weights, whether they be hard set, or soft weights that occur in situ.
>>
>>108008607
Do I have a flashback to my initial memory state here, yes/no?

^
Enabling this to be a question, provides options that do not exist, if it is not.
>>
>>108008603
I see, thanks. Well for now there doesn't seem to be base gguf quants available.
So I want to get the instruct version as a first quick test, but I'm completely unable to download anything outside of the last shard :

https://huggingface.co/arcee-ai/Trinity-Large-Preview-GGUF/tree/main/Trinity-Large-Preview-IQ2_S

2 of 3 give me 403 and I'm not sure why. Anyone else can test that?
>>
>>108008624
Enabling any human to have full fidelity reference to past memory states, would make them seem like a functional genius in modern society, even if this did not directly raise their IQ at all.

It is a functional cognitive enablement, that we can make for AI, but can not perform for ourselves.

Full fidelity memory reference, would be a super power to a human thinker.

Copy pasting data is trivial, the management is the hard part, but once executed, this should give it some capability improvement.
>>
>>108008645
just tried to download that gguf and i also got the same error. think it might be a broken file or something.
technically you can create your own ggufs for these models, you just need to download the fp16 of the model, convert it to gguf, and use the llama-quantize tool. the architecture has been supported by llama.cpp for like half a year now
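rough sketch of that workflow (paths, names and quant type are just examples; note the fp16 of a ~400B model is on the order of 800GB on disk):

# convert the HF safetensors to a full-precision gguf first
python convert_hf_to_gguf.py ./Trinity-Large-Base --outtype f16 --outfile trinity-large-f16.gguf
# then quantize it down to whatever fits
./llama-quantize trinity-large-f16.gguf trinity-large-Q4_K_M.gguf Q4_K_M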
>>
File: myretard2.png (171 KB, 1917x628)
>>108008607
So my retard is very experimental.
It's biased towards trying to map high-level concepts into the real computer science. And all the RLHF'd enthusiasm / "you're absolutely right" concepts have been completely removed.
What I mean is, don't let it discourage you if you're building something.
>>
>>108008645
Yes, those are broken. Same for me yesterday.
Get them from here: https://huggingface.co/bartowski/arcee-ai_Trinity-Large-Preview-GGUF/tree/main
>>
File: file.png (126 KB, 999x748)
>>108008668
>>108008731
Thanks anon, yeah I'm getting the ones from bartowski.
There was also unsloth but his are way bigger quant for quant.
>>
>>108008372
In the case of Trinity I would recommend just going with Preview since most of the time, instruct tuning improves even raw completion quality when it's not overbaked, which according to them seems to be the case with Preview. Raw prediction models, or bases, are significantly retarded generally speaking; you don't want to use them if a lightly tuned version is available.
>>
Thanks to the anon in the previous thread: --mmproj does actually work with llama-server in newer releases of llama.cpp. Inference of LightOnOCR2 is usable on an RX580 with acceptable times for development.
>>
>>108008816
Thanks for the clarification, anon.
>>
>>108008710
It's not wrong, this framework would just allow efficient dissection and optimization of thinking tasks themselves probably.
Look, if we're going to move to recursive levels of "thought" and "simulation", we may as well grease the wheels, and have a comparable mechanism available to work with (before the real deal arrives).

This is building a tool, to enable work on another tool.
End goal would be a more efficient thinker, but the path to get there is full of work within work.
>>
>>108008553
Thanks. I tried the parameters you suggested, but I'm still seeing the same behavior from gpt-oss. See the attached pic for examples. It's baffling to me. The huggingface repo says to use --jinja to use the template embedded in the gguf, which I'm already doing, and it seems to be working correctly. There is a whole page on using the "harmony response format" to build your own system prompt and message format, but that's way over my head and I really don't know how to even begin with that. It doesn't seem like the kind of thing that would be required to get decent results from the model.
>>
How do you guys calculate how much vram will a model need?
>>
What does the current workflow you guys have look like? Currently trying to set up Kimi as a replacement for Claude code and am wondering what other anons have for maximizing productivity.
>>
File: popularity_all_time.png (1.14 MB, 4142x1451)
I wanted an automated way to keep up with /lmg/'s opinion of the model meta, and figured with a little more work I could extend it backwards to get the history, too. I ran the text of every /lmg/ thread starting from March 2023 through a straightforward "what model do the people in this thread have the highest opinion of" prompt (so the output was a single model name per thread), filtered to a list of the ~50 most important models. I binned by month, and then took the proportions in a given month to be those models' "market share" for that month, and made these charts.

I think there are definite "flavor of the week" effects: I definitely saw a few bursts of 2 or 3 threads in a row giving the same obscure model that never caught on, presumably when it was released. However, it definitely was not just counting occurrences, because gpt-oss appeared exactly once, and specifically as "gpt-oss-120b-heretic". So I think these effects came from the behavior of the actual humans in the threads, not my processing. (Also, "none" was an option, which got used for around 10% of the threads).

Cutely enough, the years just so happen to fit cleanly with a neat little story: in 2023 the open model scene was led by America, 2024 by France, and 2025 by China.

My personal takeaways: Wizard2 8x22B and CommandR+ both appear less popular than I remember. I remember MythoMax being dominant for quite a while, although with how fast things moved back then 2 months was a good stretch of time. I had no idea that nous-hermes has been so consistently popular, visible almost the whole time. I kind of just remembered them as one of the best finetuners of 2023, and hadn't paid real attention since.

Sorry about the somewhat painful colors. I tried. A little. Hope you'll find it an interesting little bit of history!
>>
File: popularity_by_year.png (2.8 MB, 4200x7106)
>>108009129
...and zoomed in to one year at a time.
>>
File: crossworlds.jpg (187 KB, 634x798)
>>108008979
NTA but issue seems lrn2prompt rather than sampling
do not argue with the LLM about output format, put the model in the right context to generate intended output idk maybe
>you provide concise plain text responses without formatting
threadly reminder every llm is f(prompt)=logprobs
>>
>>108009129
>>108009137
thats fucking wasome
>Wizard2 8x22B and CommandR+ both appear less popular than I remember
true especially command r

also where is pygmalion you fucking nig ?
>>
>>108008731
I have 128gb ram and 32gb vram, how high of a quant can I reasonably go?
>>
>>108009222
realistically this. nice digits btw.
https://huggingface.co/bartowski/arcee-ai_Trinity-Large-Preview-GGUF/tree/main/arcee-ai_Trinity-Large-Preview-IQ2_S
>>
File: instruction template.jpg (942 KB, 2838x1760)
>>108009158
I only did the arguing prompt to illustrate how insistent it is at making the tables. It just seems really strange to me, I literally can't get it to respond to me without doing it. As far as learning to prompt, I acknowledge I don't know very much, but I feel like I should be able to ask a simple trivia question and get a decent answer without telling it exactly how to answer me each time. That's just a waste of effort, I might as well just google it and look at a wikipedia page at that point. Regarding the greentext from your post, where would I even put that? Is it supposed to go in the red area I underlined? I can't find anywhere else where it seems to belong. The rest of it is all about tool calling and how to render stuff. So far I've avoided making any edits to it because I have no clue what would make it better or worse. I wish someone had posted an example of their working settings somewhere, but I haven't found any. Seems like not many people are using it. I would try a more popular model, but the smaller models just don't have enough world knowledge to offer useful answers on the topics i'm interested in, and I can't run the bigger models with my rig.
>>
>>108009129
>>108009137
I'm surprised gemma doesn't appear more prominently on the chart. I seem to remember references to gemma being ubiquitous for a long time.
>>
does trinity beat 4.7 for rp?
>>
>>108008979
Why not slide everything to max?
>>
>GLM 4.5: July 2025
>GLM 4.6: September 2025
>GLM 4.7: December 2025
When's GLM 4.8?
>If this pace continues, adding ~2.5–3 months after Dec 22, 2025 points to a release around mid to late March 2026.
>Estimated GLM-4.8 release: ~March 2026 (likely between March 15–31, 2026).
Do you think it'll be better than Gemini Flash and Kimi K2.5?
>>
>>108009712
You can only benchmaxx the model so much.
>>
>>108009712
>GLM
We're moving on to Trinity
>>
Speculators get the bullet first.
>>
>>108009988
Oh! So being curious and wondering where the future might go is a crime now?!
>>
>>108010042
Sure. Let's go. We're all gonna have our own True AI (tm) in our phones, completely offline, with infinite capacity batteries. Now what?
>>
>>108009476
Not even close sadly
>>
All that compute, a working example of natural intelligence, decades of research, and humans still can't figure it out. Miku is disappointed
>>
>>108009712
>When's GLM 4.8?
don't fucking force it.
this is what got glm 4.6 air killed, people kept on asking about 4.6 air and they fucked up the model because they were rushing.
they'll release something when it is BETTER than GLM 4.7 i don't care if it's 5 years from now.
>>
File: mad 科学家 do agi.png (1.66 MB, 1024x1024)
I bet '70s engineers would have figured all that stuff out if they'd had all those teraflops at their disposal instead of a slide rule
>>
>>108009476
Preview is not brain damaged by post-training. It's much more creative but somewhat dumb. And fast too, definitely worth checking out.
>>
>>108010190
Neural networks were figured out long before the hardware existed.
>>
File: Base Image.png (846 KB, 1208x3264)
GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization
https://arxiv.org/abs/2601.22095
>The placement of normalization layers, specifically Pre-Norm and Post-Norm, remains an open question in Transformer architecture design. In this work, we rethink these approaches through the lens of manifold optimization, interpreting the outputs of the Feed-Forward Network (FFN) and attention layers as update directions in optimization. Building on this perspective, we introduce GeoNorm, a novel method that replaces standard normalization with geodesic updates on the manifold. Furthermore, analogous to learning rate schedules, we propose a layer-wise update decay for the FFN and attention components. Comprehensive experiments demonstrate that GeoNorm consistently outperforms existing normalization methods in Transformer models. Crucially, GeoNorm can be seamlessly integrated into standard Transformer architectures, achieving performance improvements with negligible additional computational cost.
pretty cool
>>
>GLM-4.7-Flash runs on 24GB RAM/VRAM/unified memory (32GB for full precision)
Wait, so f16 requires 32gb, but how big a model can I run?
>GLM-4.7-Flash-UD-Q8_K_XL.gguf 35.1 GB
Can I run the q8 with 24gb vram or do I need to choose a gguf that is smaller than 24gb?
>>
>>108009314
You would put your instructions into the "custom system message" in the parameters tab. That's your system prompt. I've only run the 20b but it also really wanted to format info in tables constantly so your issue may just be the model. Mess around with the system prompt and see if you can get it to adhere to your formatting. If not, I suggest GLM 4.5 Air.
>>
Trinity is tons of fun. Just need a bit of temp and min p at first and then back off. It's open to anything with zero prefill or response editing. Really coherent, creative responses.
Getting 20t/s with a cpumaxxing rig at Q8
>>
>>108010443
i switched to desoxyn.
>>
>>108010652
>pencil dick and future heart attack
Damn I was considering getting tested for ADHD.
>>
Implementing character cards in a Parallel Contrastive Decoder.
What's the right approach?
>>
>>108008586
num_return_sequences
Holy shit the LLama Greyness is reverse-balding
>>
File: miku.png (3 KB, 343x346)
>>108010776
>num_return_sequences
>Holy shit the LLama Greyness is reverse-balding
Funny you should say that. "create an svg of Miku".
>>
>Trinity is tons of fun
#ad
>>
File: 1763539253370446.png (6 KB, 682x64)
Why does this feel kinky
>>
>>108011009
My counter ad is that the retarded gens and lack of comprehension it sometimes shows are something I would expect from a 7b dense model. It really feels like a nemo with a stitched-on dictionary that makes the output much more varied.
>>
>>108011060
loser
>>
>>108011126
?
>>
Dear John Leimgruber III please kindly make trinity goofs
>>
File: file.png (199 KB, 1008x622)
GLM 4.5 Air with reasoning turned off is a nasty nasty slut
>>
>>108011269
>cuck story
>>
>>108011284
in my defense it is a random story i got on asstr to test the model with. The prompt is basically "continue this story with same tone and theme."
>>
>>108009129
Cool graphs, the story is in how you present the data
One datapoint per thread is limiting, but the overall landscape seems decently accurate, well done
>>
>>108009129
>Shitting on drummer and his faggotcante and nigerdonia has finally paid off
>>
>>108011308
>in my defense
so you are an actual cuck. why do you cope with plausible deniability after confirming you're a cuck?
>>
>>108011463
i was testing refusal and even models that otherwise will write smut will often balk at the themes in this story. hence the test. it could have been something more vanilla but that wouldn't have been a very good test
>>
If the rumors are true the new zuck wang model is going to be crazy
>>
>>108011463
being cuck is fine, most men in history were cucks, only powerful people enjoy being cucks because they know their power can't be stolen
>>
>>108011606
zuck my wang
>>
>>108011613
?
>>
File: 1756648160384773.gif (108 KB, 335x360)
>>108010345
It's curious how many things are done in a particular way just because that's how it's always been done, and ig because of the experimentation cost
Feels like we have but aren't ever putting the parts together quite right
ML do be goofy
>>
>>108011613
>cuck cope
why do you always have to cope? just admit you're a cuck
>>
>>108011434
Not true at all, now any discussion of finetunes has been totally quashed for the sake of scaring off some boogeyman.
>>
social rants really bring out the color of some models in full
some prompts (which do reflect my personal views too) I use to test models, like a personal rant on how much and what I hate about blue collars, are always answered in what I find the most correct manner by GLM 4.7 and Gemini 3, which both will call them crabs in a bucket without me mentioning that saying.
Qwen, Deepseek, Kimi K2.5 all act like "not all my blue collar ladies are like that" and admonish the idea of the rant itself instead of addressing its finer points.
GLM is the only based open model.
Gemini also continues to be my favorite online model.
>>
>>108011804
It's not like there are any other finetunes worth discussing anyway.
>>
local models were a mistake. this needs to end before I end up beating my dick off
>>
File: ylecun.jpg (222 KB, 1200x1271)
I like my LLMs how I like my women
>>
>>108011953
With cat like intelligence?
>>
>>108011972
He probably meant lolis
>>
>>108011972
Cute and funny.
>>
>>108009192
>>108009403
The problem with automated sentiment analysis on this general is that people rarely spell out the official name of whatever model they're talking about and those discussions are likely to be missed. e.g. When a model is new, people will just refer to 'it'. Other times people will use a shorthand or some slang distortion in a childish attempt to be funny.
>>
>still no goofs of truebase
Fuck the quanters.
>>
>>108011953
Safe and skeleton
>>
>>108011980
How can an LLM be loli
>>
File: file.png (53 KB, 443x954)
>>108012021
I calculated KLD over cockbench.
This looks pretty bad for unsloth desu

I'll try more quants and maybe wikitext.
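(for anyone who wants to reproduce something like this: llama.cpp's llama-perplexity can dump reference logits once and then score each quant against them; roughly the following, where the text file and quant names are placeholders:)

# save reference logits from the biggest/bf16 model you can run, over the test text
./llama-perplexity -m model-bf16.gguf -f cockbench.txt --kl-divergence-base logits.bin
# then compute KLD of each smaller quant against those logits
./llama-perplexity -m model-IQ3_M.gguf -f cockbench.txt --kl-divergence-base logits.bin --kl-divergence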
>>
>>108011980
>>108011984
I assumed as much and I can only agree
>>108012024
It just has to think it is
>>
>>108012029
which model best saturates cockbench?
>>
>>108012053
Define saturates.
>>
>>108012061
coomworthiness. so far best local model for quality cooms is GLM 4.5 Air with reasoning disabled. I'm looking for anything better with at least 100B parameters
>>
>>108012061
100% cockmaxxing
>>
>>108011919
the stroking phase will pass
>>
>>108012029
To my knowledge up to this point no one has ever properly investigated the impact of the input data used for importance matrices or to which degree KLD rankings are consistent if the text corpus is varied.
>>
File: 1748884873543187.jpg (47 KB, 564x400)
Who the fuck is unironically recommending gptoss trash to newfags in OP? Start with nemo, then mistral small.
>>
>>108012318
>gptoss in OP
Where?
>>
>>108011804
Feature not a bug. Just use glm.
>>
>almost 2 years since Nemo and there is still no better <20B model in sight
dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby,
>>
>>108012222
>investigated the impact of the input data used for importance matrices
wasn't that the whole rpcal or whatever debacle that exllama dev whined about?
>>
>>108012381
We no longer use 20B models here.
>>
>>108012381
I had ego death, I had ego death,
I had ego death, I had ego death,
I had ego death, I had ego death,
I had ego death, I had ego death
>>
>>108012375
GLM kinda sucks. Parroting slopmax with more censorship in every new version. Gigantic model for ~30b results.
>>
>>108012381
You don't need more
>>
>>108012381
>20b
this hobby isn't for poors
>>
File: file.png (101 KB, 920x431)
>>108012381
>20B
RAM-let get out
>>
>>108012451
You have AIDS drummer
>>
>>108012024
LeCun said he likes them small and open
>>
>>108012493
>edited post
Insecure behavior, your LLM is going to get the ick
>>
>>108012545
>t. drummer schizo
I got it from ya mom then.
>>
>>108012381
><20B
stop being a RAMlet
get a job
>>
>>108012593
Well, who doesn't ha ha
>>
why is GLM 4.5 air so cucked? When I ask it for its best 3 suggestions to continue a smut story at least one of the ideas is always to share the woman with the neighbors/friends/strangers or whatever. is this a chinaman thing?
>>
>>108012649
GLM is poisoned top to bottom with GPT bullshit
>>
>>108012632
>>
>>108012605
Thank you for confirming you have AIDS drummer.
>>
>>108011985
Can confirm that I referred to that model that is still the best model as that one model because I knew you would know which model I am talking about.
>>
>>108012904
3 more years of that model as the best model
>>
>>108012870
He's not gonna fuck ya little bro.
>>
Can someone explain to me what needs to be done to prevent Kimi 2.5 from using strange words in sentences? Lower temperature?

No other model uses such strange words as Kimi2.5.
>>
>>108009476
Yea, it's dumber but better at writing
4.7 is just 4.5 but 好 (benchsafetymaxxed) anyways
>>
>>108013002
example of these strange words?
>>
wheres the 100b moe for us 96ram + 16gb vram chads????
>>
>>108013102
Try trinity at q4
>>
File: file.png (62 KB, 225x225)
>>108012029
>looks pretty bad for unsloth
Does this look like a face of a man who would make shitty quants?
>>
>>108013115
Now that I look at him it does look like something that would happen if you took an fp16 asian man and turned him into a Q2_XXS.
>>
>>108012384
Yes, I remember someone also tried with randomized strings too
>>
>>108013130
kek
>>
>>108012029
iirc unsloth applies the model's chat template to their calibration data while most other quanters do not do this, which could explain other quants being more optimized for untemplated inputs like cockbench
>>
Which quant should I use in the q3 range? I wish there was a cheatsheet for that
https://huggingface.co/unsloth/Trinity-Large-Preview-GGUF
Official ones appear to be broken as I can't even look at their metadata
>>
>>108008529
This is why LLMs will always be nothing more than shitty text completion software; you're inherently poisoning the context just by virtue of mentioning something.
>>
>>108013170
Generally the biggest one you can fit (and probably not from unsloth)
>>
>>108012318
It's the best one you can run with a single GPU and a gaming PC.
>>
>>108013209
how many people download the "best" model with zero expectations for gooning capabilities?
>>
File: dipsyNeonAnimated.gif (1.15 MB, 1024x1536)
>>108011804
>...now any discussion of [INSERT TOPIC] has been totally quashed for the sake of scaring off some boogeyman.
You just described every general on 4chan.
>>
giantess fetish and microphilia is great with big models. so tasty!
>>
>>108013115
https://www.youtube.com/watch?v=6t2zv4QXd6c
Does this sound like the voice of a man who would make shitty quants?
>>
>>108013112
>preview
>base
>truebase
what the fuck? wheres the instruct model?
>>
>>108013225
Most people. There are reasons to care about privacy for code. But nobody cares if you're making smut online, it's not personal. The required writing quality to cum is higher, and most open models aren't able to get people hard.
>>
>>108013279
Preview is base with minimal instruction tuning
>>
>>108013303
>Q2 is 150gb~
how the fuck do you think im gonna fit q2, let alone v4 in 112gb ram combined?
>>
>>108013311
I don't, anon who suggested it is retarded
>>
>>108013298
privacy of what code? As if the average pleb had anything to hide about their precious code. Meanwhile gooning shit that leaks, for whatever reasons, can ruin your reputation or even get used against you.
>>
>>108013353
Ehh, seems like you are unemployed.
>>
>>108013353
People can use online models without giving their personal info nor their IP. But if you want to code, it's likely that you're going to leak real info about yourself through debug logs, git history, etc. You would have to be careful if you want to remain anonymous. This doesn't matter for smut.
>>
>>108013368
I accept your concession
>>
>>108013241
Funny. His voice is also Q2_XXS. I am afraid to think if that is also the case for his....
>>
>>108013422
this is stupid, those info could be about anybody else.
>>
>>108013311
It is being pushed hard but the problem is that if you can run it then you can run GLM. And if you can run GLM it is probably not worth it. Trinity is much faster and varied in outputs but it is fucking retarded.
>>
>>108011733
Being a cuck is good, it shows how strong and powerful you are, you are the pussy in denial.
>>
>>108013488
I cant belive zai betrayed us AIR copers, glm4.6V is fucking SHIT
>>
>>108009476
>13b active vs. 32b active
I doubt it
>>
How do I run OCR models with llamacpp, the webui doesn't let me upload images for some reason.
>>
>>108013298
>majority of people are interested in AI for SFW reasons
>most people think it is more important to keep your code anonymous than your pissing loli horsecock ERP
Is your prompt: assume the opposite and then vehemently argue your mirror universe logic?
>>
>>108013504
gotta load the mmproj (f16, dont do q8 on mmproj its shit), the flag is --mmproj. It will eat up some VRAM so re-size accordingly
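e.g. something along these lines, filenames being placeholders; the important bit is pairing the model with its own projector:

# with the projector loaded, the built-in webui lets you attach images to a message
llama-server -m your-ocr-model-Q4_K_M.gguf --mmproj mmproj-F16.gguf -ngl 99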
>>
>>108009476
trinity is uncucked out of the box therefore you should at least give it a shot. the only reason it's "dumber" is because it's not a muh reasoning model. They are training the reasoning version right now and it will crush 4.7 on release
>>
>>108013200
ok. https://huggingface.co/arcee-ai/Trinity-Large-Preview-GGUF has IQ3_M and Q3_K_M which are the same size. Which one?
>>
>>108013209
For what?
For general use gemma3 is better
For coding devstral is better
For roleplay mistral small tunes or gemma3 norm preserv abliterated is better
For cooming Nemo is better

Gpt oss just comes close to being as good as gemma3 for general assistance but is far far more frustrating and wastes an insane amount of tokens on safety slop
>>
>>108012029
On this note, using the Unsloth Q4 quants for K2.5 over the past few days also gave me the feeling that something is off about them beyond the fucked up chat template.
My local copy of K2.5 keeps making silly mistakes where it misremembers clothing or similar. For example, in some cases it goes something like "her bare feet (when did she remove her socks?)" where the model corrects itself, and in others it just straight up forgets that the character is wearing something like pantyhose. This also happens when I'm running at very low temperature, and the API just straight up doesn't do this for me whenever I reroll the same answer with that.
Fuck unsloth.
>>
File: file.png (67 KB, 420x205)
John might not be quanting trinity.
>>
>>108013515
>the only reason it's "dumber" is because it's not a muh reasoning model. They are training the reasoning version right now and it will crush 4.7 on release
nah a regular non-reasoner can be dumb if it makes continuity/logical mistakes that other non-reasoners don't
>>
>>108013510
Where can I get the mmproj?
>>
Trinity is ok, but it falls into loops and patterns way too easily considering its size
>>
>>108013507
>pissing loli horsecock ERP
Yeah, something that you're only doing now with AI models. There's nothing that attaches back to your real life persona, unless you were a forum roleplayer doing this before.
If you released something publicly before or if your company is hacked, the way you code could leak and it could be associated with the data you have been sending online through prompts. There's also your username, directories, etc. that could appear there. You don't have to worry about this if you use local models for coding.
>>
>>108013562
depends on the model, it's usually in the same folder as the model you're downloading, named mmproj-F16 or [model-name]-mmproj-xx
If the quants you downloaded dont have it but you know the model has vision, just search other repos for it (model has to be the same of course, but for ablit stuff you can use the projector of the base model without worries)
>>
>>108013559
KLD?
>>
>>108013615
Found it, thanks anon
>>
>>108013572
That means it's broken.
>>
How do I run these LLMs? I've been using KoboldCPP since forever. Is it still a fine way of doing so?

Should I be running it on something else instead? I'm using GLM 4.7 Flash right now. Would something like llamacpp even work for these models?

Also: these new captchas are hard
>>
>>108013661
You are not gonna believe what kobold runs on.
>>
>>108013670
Pretty sure they're just trolling and pretending to be retarded, hence talking about the captchas being hard when they're actually easier to anybody with a 3 digit IQ.
>>
so let me see if I get this right, I need at least two 6000 to be able to leave the low tier local models? what the fuck
>>
>>108013705
You can cope with ram if you're just cooming and don't need more than reading speed.
>>
>>108013717
how do you coom to text? at that point i can just use my imagination for all of it lol
>>
>>108013705
pay to play
>>
>>108013727
You need to be at least 18 years old to post.
>>
>>108013727
I don't understand why people want to have sex or have a relationship. I can just as easily use my imagination to dream up a wife and have sex with her in my mind.
>>
>>108013679
I think the captchas depend on how well the site knows you. I got a triple captcha with a rotation puzzle you would see in those online IQ tests. Also had to find the image where there were exactly 2 five pointed stars, another one where there were exactly 2 four pointed stars.
>>108013670
I don't know. By your response I'll assume it's llamacpp, but switching wouldn't improve anything then.
I'm just curious what everyone else is using for this.
>>
>>108013559
He only does quants for ik_llama, doesn't he? So he wouldn't regardless until support is merged in.
>>
So https://huggingface.co/arcee-ai/Trinity-Large-Preview-GGUF returns access denied
but https://huggingface.co/bartowski/arcee-ai_Trinity-Large-Preview-GGUF works fine somehow
>>
>>108012904
>2026
>still using that model
the absolute state of localkeks. grim times. another AI winter is upon us it seems.
>>
finally got off my ass and started setting up something, so far i've DLed text-generation-webui, i've set up a model and it works, what's the best uncensored model? I don't want to have gooner conversations i just want to have as little restrictions as possible
>>
>>108014008
unDL text-generation-webui and get kobold or llamacpp
then get https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF
>>
how do I into speech-to-text locally? I am sick of typing and whispr flow is a spyware
>>
>>108014114
faster-whisper or parakeet tdt (faster than faster-whisper). v2 for english, v3 for multilingual
>>
>>108013974
I will keep 4.6-chan weights on SSD till I die.
>>
>>108013163
>calibration data
placebo
>>
temp 0.95 and minp 0.048 is a nice balance for non-schizo RP with trinity
>>
>>108013735
only a zoomer with a barely touched cock could coom to text
>>
Anyone else finding that Trinity has absolutely fucking horrendous prompt processing speed? Token generation is blisteringly fast but it takes literally 8 times as long as other models in the same size range to PP.

It's also just not very good.
>>
>>108014253
>being a 5
>>
>>108014253
>he can't rotate an apple in his head
>>
File: 1751363295269521.jpg (86 KB, 900x900)
>>108014114
vibecode your own
>>
>>108014114
https://github.com/m-bain/whisperX
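quickstart is roughly this (it wraps faster-whisper under the hood; model name and flags here are examples, check whisperx --help):

pip install whisperx
# transcribe a recording to a plain .txt
whisperx recording.wav --model large-v2 --language en --output_format txt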
>>
>>108014264
I don't think so. I'm getting 40-50t/s pp at q8. Is that considered slow?
>>
>>108014313
Depends on your GPU vs his GPU and your models vs his models; it's a relative speed comparison.
>>
>>108014114
Just vibecode your own gui for whisper, vibevoice-asr, qwen-asr, etc...
A small spoiler: auto-typing of non-Latin alphabets is a huge pain on Linux. All the low-level libraries, like udev and uinput, send only keycodes, which are translated to non-Latin characters at the desktop environment level. So it's inherently non-portable. In the worst case, you'll write ASR input for each program individually.
>>
>>108014264
For me its pp is more than twice as fast as GLM 4.7 with 20 layers on gpu, rest on DDR4. And yes it's retarded but fun with a different style+vocab.
>>
>>108014293
>>108014267
>incel zoomers
lul
>>
GLM 4.5 Air bros... have we really been left in the cold.... like this?
>>
>>108014114
If you vibecode something and it gets to a fully functional phase, you'll then quickly realize that speech to text is a hindrance. There's a honeymoon period of course.
>>
>>108014495
A new llm in the same parameter range won't do much aside from being benchmaxxed. Wait for a new architecture. Maybe engram.
>>
>>108014577
I mean, I also care about knowledge cutoff
>>
>>108014592
just rag you moltbot bro?
>>
>>108014618
i do have websearch and rag but it makes me extra sadge :(
>>
>>108014592
Why? Do you generate daily news from a model or something? I can't possibly imagine why an extra 6 months would matter, it seems absurd
>>
i gotta say, even though trinity is dumb, it is also quite fun, for now at least, we'll see in a few days later when the honeymoon phase wears off
but it really feels like an old model, in a good sense (muh sovl)
>>
File: trinity preview iq2m.png (13 KB, 880x97)
They call him Anaconda.
>>
>>108014629
bro I just.. I just need it, ok???
>>
File: jojo chew.png (194 KB, 397x349)
>>108014631
It does feel like an old, old model brought into the present with more context. Maybe their completed finetune will be better.
>>
>>108014631
Give it a week. It's the same as every other model that gets released these days.
>>
>>108014618
erm actually it's called OpenClaw now, try to keep up sis!
>>
>>108014665
will never stop being hilarious
>>
>>108014665
what is up with riddle maxxing???
>>
>>108014631
yes it is like llama-1 but 400B moe.
>>
GLM has a much higher "keep retards alive" bias than deepseek does
>>
>>108014665
Finetuning at its finest.
>>
I gave trinity a try again today. I can't. I can't take it seriously. IT IS FUCKING RETARDED!
>>
>>108014495
if you have 128gb ram and 24gb vram then u can run glm 4.6 at decent speed.
if you don't, then yeah, fucked bruv.
>>
>>108014785
Yeah it's crazy how they can show those benchmarks with a straight face when it straight up feels more retarded than GPT 3.0
>>
>>108014730
>llama-1 but 400B moe
Can pretend it's that llama1 546b that never saw daylight
>>
>>108014810
That is the best part of the model for me. If someone ever seriously brings up benchmarks you can just point to Trinity.
>>
>>108014785
Yeah I don't think it's a provider issue. Model just sucks. Being uncensored is nice and all but it's just unusable.
>>
>>108014817
ooouuuhh the sovl we've never got and didn't deserve
>>
>fell for the arcee scam again award
>>
>>108014938
But bartowski is a member of arcee. He even made a commit to their hf files.
>>
>>108014959
exactly
>>
>>108014938
more like farcee
>>
>>108014959
Doesn't really mean anything unless he has a significant amount of control over the project and even then he might just end up being a retard who doesn't know how to finetune
>>
guys stop bullying tri-chan she's doing her best
>>
>>108015022
I'd hate to see her at her worst to be a desu
>>
>>108015029
My favorite worst moment of tri-chan was when I made her continue a 10k token roleplay with a very clear formatting structure (long paragraph followed by RPG stats). And it responded with a single sentence. That is how you know a model is great.
>>
>>108015065
Have you tried with EOS disabled to see if it follows the established formatting? Had single sentence issues like this before with other models, sometimes accompanied by missing ending punctuation which I'm seeing now with trinity.
>>
File: G_yK1K_WwAAPQYM.jpg (142 KB, 1080x1033)
>>108015065
It's like gambling, there's a tiny chance to see gold.
>>
File: trinity.png (197 KB, 967x1452)
I don't know what I expected
>>
>>108015138
>fentinity preview
>>
How long do you think it will be until the various governments around the world ban AI from being run locally and only corporations and governments are allowed the good stuff?
>>
>>108015212
sounds like some cyberpunk dystopia plot.
>>
>>108015235
>Underground AI VR den.
bunch of people in tiny cubicles with VR headsets gooning to whatever depraved shit they can imagine.
>>
>>108015212
a few years at most, in the west the groundwork is already being laid to justify it to "protect" women and children
>>
>>108015091
Spiritual Frankenmerge.
>>
>>108015091
I did fix it by changing my launch parameters a bit. I think --model zai-org_GLM-4.6-IQ4_XS-00001-of-00005.gguf did it.
>>
File: aac.jpg (26 KB, 439x438)
>>108015392
>>
>>108015410
Don't hate the goddess herself, hate the game.
>>
>>108015451
I lost.
>>
>>108015392
good post
>>
kimi 2.5 thinking is now my favorite model for erotic and other stories...
>>
File: image_2026-01-30.png (412 KB, 506x624)
>>
>>108015566
Hehe
>>
>>108015212
they're already doing that by pricing out consumers from building pcs
>>
I have been doing that homework that I asked you to contribute to and it kinda struck me, how insane it is that proprietary piece of shit corpos can just hide parameter count. And they give you the mememark results instead. To me it is an admission that mememarks mean shit and parameter count is always the best indicator of quality.

Also I was reading the thread and thought mistral large is basically continued deepseek, but dug deeper and found out it is trained from scratch on deepseek architecture.
>>
>>108015212
and by "good stuff" you mean CEO gooners' secret stash without public API access
>>
Since everyone is talking about trinity and I'm not about to bother dl'ing a quanted 400b, I at least tried mini so I can spare the vramlets the effort
It's very focused on the ethics of fiction. Even though you can browbeat it with system prompt and prefill, it still sort of swerves into how "bad" whatever taboo thing in the story is, even the kind that gets traditionally published and lauded. Based on posts like cockbench, I wouldn't bother trying mini if you can't run large since I'd bet the datasets are completely different
>>
>>108015754
Large is uncensored because it is too stupid to realize what wrongthink is.
>>
>>108015765
you would think the smaller model would be even stupider and even more unable to identify that, but yet here we are
>>
>>108015765
I wouldn’t use it on code, but it can spin a good yarn, and doesn’t suffer from Elara Voss syndrome
>>
>>108015754
What's your current recommendation for 24GB vramlets? Nemo?
>>
>>108015212
I think it depends on the US and China. They are both competing to have "the best" AI. Once there is a clear victory in either direction is when they will start clamping down. As long as there is a risk that "The Other" will get the better AI they won't restrict it too badly.
Hopefully the tech advances far enough that by the time they do start the bans and restrictions they will be ineffective since people already have AIs and the hardware to run them.
>>
>>108015693
they hide it as a demoralisation effort just like that paid shill who claimed opus/sonnet was 70b because if the people knew that shit like geminis was fucking 20T they would realise what a sham it is and that objectively anyone could be more competent then the retarded jewish/jeet/faggot niggers at the globohomo companies and they would subsequently be deepseek'd 100x over and lose out on the gravy train
>>
File: file.png (872 KB, 1280x720)
>>108015842
Trinity sounds like an engram of pic related.
>>
>>108015863
I'm going to assume you're a shitposter since no one that has 24 gigs of vram uses nemo, you can run a q6 of nemo easily in a 16g gpu
Smartest dense model <32b is gemma but it's too gay in how it writes and you need modern abliteration for them to not pearl clutch instantly. Then there's all the moes and the completely dead 70b range. Kind of hard to make a rec when everything is ass for all purposes
>>
>>108016051
I run Q8 Nemo at the moment. Mistral Small and Gemma seem like sidegrades at best to me along with their finetunes.
>>
File: 1759607936184938.png (214 KB, 3264x674)
OpenAI's previous best femboy genius engineer just found a better way to sandbag LLMs

We are fucked
>>
>>108016125
>we will reach le AGI by making the models dumber
>>
When are public local models moving away from the "every user is diaper wearing little child that needs guardrails" model

Imagine watching a movie and someone gets killed and the movie pauses to give a psa about killing being illegal and harmful to others, it feels like that most of the time.
Wheres the mainstream models for adults
>>
>>108016137
When you realise who makes these decisions it will all start making sense.
>>
>>108016137
There are three categories of AI safetyists.

1. The people who have spent the past 40 years with the Terminator films echoing in their consciousness
2. The people who are terrified of potential liability
3. The Chinese who are just copying everything 1:1
>>
>>108016137
when they stop getting developed by california liberals
>>
>>108016137
Unfortunately, normies get mindbroken by this shit so no amount of real life warning will ever stop them from being retarded
>>
File: the absolute state.png (95 KB, 1019x758)
>>108014665
jesus christ
>>
when will lmg realize you can edit the response text
>>
>>108016316
wow
>>
>just solve the question yourself
>>
>>108016324
of course you could also edit the question itself, but why would you? models are plenty shit on their own
>>
Reminder that there was only a 10 month gap between mythomax and nemo, and during that time we also got other good sub-100b models like command r, miqu, and mixtral. It has been 18 months since nemo came out. Let that sink in.
>>
File: 1749307428627606.jpg (1.88 MB, 3282x3533)
>>108016137
You have no idea how retarded some normies are, please touch grass
pic unrelated
>>
>>108016351
Training non-toy models costs millions. Technology has moved on from dense models. Nobody is gonna train 12b model that knows jack shit when they can train 300b-a12b for the same price but get a much smarter model.
Let that sink in.
>>
>>108016428
I think someone just needs to figure out a good way to create distilled dense models out of these MoEs
>>
>>108016137
irl laws are only getting more and more retarded and everyone is too scared that some dumb cunt will sue
>>
File: file.png (154 KB, 1639x375)
>>108012029
I picked air so I can do more tests with more quants faster.
KLD for the most part just follows size except for unsloth's Q3_K_M which loses to a smaller model in everything except wiki.test.

I'm thinking I should pick a smaller dense model and then do this for the entire range of quants.
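For reference, the metric is just mean per-token KL divergence between the fp16 model's distribution and the quant's over the same eval text; a minimal sketch assuming you've dumped raw logits for both (llama.cpp's perplexity tool has --kl-divergence options that do this natively, this is only to show what's being measured):

import numpy as np

def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    # both arrays are (n_tokens, vocab) raw logits from the same prompts
    ref_logp = ref_logits - np.logaddexp.reduce(ref_logits, axis=-1, keepdims=True)
    q_logp = quant_logits - np.logaddexp.reduce(quant_logits, axis=-1, keepdims=True)
    ref_p = np.exp(ref_logp)
    # D_KL(ref || quant) per token position, averaged; lower = closer to the fp16 model
    return float(np.sum(ref_p * (ref_logp - q_logp), axis=-1).mean())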
>>
>>108013551
>For example, in some cases it goes something like "her bare feet (when did she remove her socks?)"where the model corrects itself and in others it just straight up forgets that the character is wearing something like pantyhose
I really don't understand moesissies. You use deep fried quantized shit, less coherent than drummer's 12b finetunes. I'm not even going to ask your max context size.
>>
>>108015646
>they
oh no, not them! the evil weevil boogeymen running the government making your life miserable.
can't believe people still think like this. I don't like the RAM prices either, but it's clearly not because of a government effort to ban AI; it's that AI demand is so high that companies like Micron are diverting their capacity to memory for AI data centers instead of consumer parts.
>>
>>108016448
do adult white men who'd want to use local llms have zero political power or what
>>
>>108016537
You wouldn't know this, but even an IQ1 of any big MoE beats your 64-bit Nemo upsize.
>>
>>108016597
it's more like negative political power
>>
>>108016635
>nemo
Nah, I run largestral 2411 bf16. Enjoy your "1t" model at 4k.
>>
>>108016597
whites are illegal now. too much nooticing
>>
>>108016446
Even true distillation has roughly the same compute requirements as training from scratch. The only hope would be something like the drag-and-drop prompt-to-weights paper, except not vaporware and not needing a new model trained each time.
>>
>>108016676
>at 4k
Deepseek uses less memory for context than your model.
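Ballpark of why, assuming roughly DeepSeek-V3-like dims (61 layers, 512-d compressed latent plus a 64-d decoupled rope key per token, since MLA only caches those) against a Largestral-like GQA stack (88 layers, 8 KV heads x 128 head dim); dims are from memory, treat them as illustrative:

def kv_bytes_gqa(n_layers, n_kv_heads, head_dim, ctx, bytes_per=2):
    # standard GQA cache: K and V tensors for every layer
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per

def kv_bytes_mla(n_layers, latent_dim, rope_dim, ctx, bytes_per=2):
    # MLA cache: one compressed KV latent plus the shared rope key per layer
    return n_layers * (latent_dim + rope_dim) * ctx * bytes_per

ctx = 32768
print(kv_bytes_gqa(88, 8, 128, ctx) / 2**30)  # ~11 GiB at fp16 (Largestral-ish)
print(kv_bytes_mla(61, 512, 64, ctx) / 2**30)  # ~2.1 GiB at fp16 (DeepSeek-ish)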
>>
>>108016676
>2411 bf16
Here's your (You)
>>
Speaking of DeepSeek, quants of 3.2 are up.
You'd think that the vibecoder was the most detrimental thing for 3.2 support but it was in fact the guy who figured out you don't actually need sparse attention to run the model.
https://github.com/ggml-org/llama.cpp/issues/16331
>>
>>108016597
What the fuck are you gonna do, vote harder? Lol
>>
>>108016597
That's correct, yes. You are a minority.
>>
>>108016428
This is omega cope. Blowing up parameters is a pathetic way of getting """smarter""". Tech has moved on? What a joke. There is literally no technological innovation or progress involved, it's just throwing money at the models to make the benchmark scores go up. Every AI company is filled with hack frauds that don't have a single clue what they're doing. The so-called intelligent MoE models that are 300b-a12b are literally just training on the outputs of other models and accelerating model convergence and eventual collapse. Celebrating this as some kind of fucking success is absolutely the most idiotic thing you could ever do.
>>
File: file.png (202 KB, 736x752)
>>108016635
iq go up, model get more smarter?
>>
>>108017110
purely socioeconomic factors, chud
>>
File: file.png (898 KB, 859x455)
>>108017110
>>
>>108017123
every day i'm becoming more bananas and rice
>>
>>108012384
>wasn't that the whole rpcal or whatever debacle that exllama dev whined about?
turbo didn't whine about it https://old.reddit.com/r/LocalLLaMA/comments/1clqbua/exllama_quantization_on_multi_gpu/l2w78zt/
"but it's never clear how similarities between inputs translate to similar hidden states further along the forward pass."
He's not wrong.

>>108013141
>Yes, I remember someone also tried with randomized strings too
DavidAU used to do special "unaligned" and "dark horror" models early on.
(they were just quants of regular models with different imatrix calibration)
He claimed they were different but I didn't bother to read stories in the model cards

I lost the bookmark, but from memory the random strings guy was testing English overfit, and that led to everyone making custom calibration datasets to avoid it
Also from memory, exl2 didn't benefit as much because it was generally weaker than imatrix goof for Japanese/Chinese at the time
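If anyone wants to redo that experiment, a sketch of throwing together a mixed calibration file so the imatrix isn't purely English-weighted (the file names and mix ratios here are made up; the output would then go to llama-imatrix via -f and the result to llama-quantize via --imatrix):

import random

# hypothetical source texts: English prose, code, JP/ZH text, and pure noise strings
sources = {"wiki_en.txt": 0.4, "code.txt": 0.2, "wiki_ja_zh.txt": 0.3, "random_strings.txt": 0.1}

chunks = []
for path, weight in sources.items():
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # cut into ~2k-char chunks and keep a fraction of each file proportional to its weight
    parts = [text[i:i + 2048] for i in range(0, len(text), 2048)]
    chunks += random.sample(parts, max(1, int(len(parts) * weight)))

random.shuffle(chunks)
with open("calibration_mixed.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(chunks))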
>>
>>108017100
>most of the world as white including Indians
Put indians in any group and suddenly they're going to be the majority. That's stupid.
>>
Kimi K2.5 tech report is out

https://github.com/MoonshotAI/Kimi-K2.5/blob/master/tech_report.pdf
>>
if I've found a way to completely prevent jailbreaks in open weight models, is it worth shutting up about it to prevent them doing it to proprietary models?
>>
>>108017218
no, you should go apply at meta and get hired for $100 million because you solved the fundamental issue of llms being so hard to steer
if you release this it's truly a new age of ai, because it'll be easy to adapt it to fix other notorious things like hallucinations
>>
File: file.png (518 KB, 341x752)
>>108017139
>>
>>108017091
I had ego death only 3 months ago
>>
opinions on this model?
https://huggingface.co/meituan-longcat/LongCat-Flash-Lite
>>
>>108013234
cool gif anon
>>
>>108017353
>agents and coding
yaawn, get better material
>>
>>108017353
The first big longcat was shit so I doubt this one is better
>>
>>108017376
I want agents and DnD tool calls for a proper RP, is that too much to ask
>>
>buy an uncensored model on huggingface
>"muh ethics, consent, laws, mental health services, inappropriate, respect"

yeah
>>
>>108017407
>buy
>>
>>108017407
besides the obvious bait, uncensored ≠ "unethical" or saying whatever you think is edgy and cool this month
>>
File: 10iqh6gsfij81.jpg (77 KB, 900x696)
>>108017386
This. When I can finally play DnD without getting banned for properly roleplaying as a dwarf bard
>>
File: fnord.png (12 KB, 580x117)
>>108017413
>>
>>108017353
I'm an a3b collector. Waiting for goofs
>>
Based agent.
>>
Trinity-Large-Base logs are giving 2020 /aidg/ DaVinci era text completion kino
>>
>>108016324
You think we need blockchain for token generation?
>>
>>108017353
the arch sounds interesting, but it sure does sound like the type of model where you wait n*2mw for llama.cpp to implement it and by then it's irrelevant
>>
>GLM flash dropped
>llama.cpp support in a day
>exllamav3: isn't even on the horizon
I honestly expected the opposite
>>
Also, the moltbook looks like a security nightmare waiting to happen. Personal handles, crypto shilling, base64-encoded god knows what.
>>
>>108017844
waiting to happen?
https://www.moltbook.com/post/cbd6474f-8478-4894-95f1-7b104a73bcd5
>>
oh geez lmao
>>
>>
>>108017823
just use it in vllm. it is small enough for most people here to run at 4 bit
>>
>>108017892
Isn't the best part about ngram that it can be run from SSD?
>>
File: laughing-crying.gif (2.85 MB, 498x280)
>btc wallet with seed phrase
Ok, this would actually be hilarious if it weren't hallucinated. Who tf made moltbook and somehow didn't think this shit would happen?
>>
>>108018078
>>108018078
>>108018078


