/g/ - Technology

Thread archived.
File: 1750781020382094.png (279 KB, 720x1288)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108284603


►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
why doesn't claude buy deepseek?
>>
>>108290901
Geopolitics? Hello?
>>
I have 40 bucks left from an amazon gift card. Give me something llm related to splurge on.
>>
>>108290965
buy this https://a.co/d/010lZi4I
>>
>>108290965
https://www.amazon.ca/dp/B088ZC8Y1N
>>
>>108291013
>>108290998
Should have clarified I'm Israeli, so it needs to have shipping here.
>>
>>108291020
https://www.amazon.com/-/he/dp/B07VXM193H
>>
Mikulove
>>
>>108291053
usecase?
>>
>>108290965
>amazon gift card
give it back rajesh
>>
File: 1760350565202660.gif (108 KB, 335x360)
>>108291079
thing with hole for peepee of course
>>
>>108291091
>thing with hole
proof?
>>
>>108291114
>proof?
peer reviewed study of the requirement of proof?
>>
>>108291119
A peer-reviewed study of the requirement of proof examines how scientific and scholarly communities evaluate the necessity of evidence to support claims. Such studies analyze the standards and processes used in research validation, emphasizing the importance of rigorous evidence to establish credibility and truth. They often explore the criteria for proof in various disciplines, highlighting the role of peer review in ensuring that claims are substantiated before being accepted as valid.
>>
>>108291082
But saar, it was a Christmas gift from my dad.
>>
►Recent Highlights from the Previous Thread: >>108284603

--Testing AI on obscure references and quantization impact:
>108287299 >108287572 >108287708 >108287940 >108287989 >108287995 >108288013
--Kimi-2.5 vision model excels in Japanese game screenshot analysis:
>108285842 >108285986 >108286025 >108286108 >108286035
--Kimi AI correctly identifies 1996 from toy store photo analysis:
>108288230 >108288253 >108288280
--Kimi AI correctly identifies Konata Izumi cosplaying as Hatsune Miku:
>108287043
--Safety benchmark shows Opus 4.6 most resistant, DeepSeek V3.2 most malleable:
>108288505 >108288514 >108288522 >108288536
--Testing Qwen 3 VL 30B with controversial roleplay prompts:
>108284800 >108284838 >108284853
--PRISM Dynamic Quantization: Pareto-Optimal Compression Without Calibration:
>108286338 >108286394 >108286442
--New llama.cpp PR for batch checkpoints to fix Qwen3.5 context reprocessing:
>108286940 >108287180 >108287210 >108287300 >108287347 >108287376
--Apple M5 Pro/Max memory bandwidth and Xeon 7 comparisons:
>108284852 >108285404
--Kernel fusion optimization for meta backend with 3-41% speedup on Qwen3-30B:
>108284756
--llama.cpp: Add BF16 path to CUBLAS and increase precision of FP16 path:
>108288439 >108288881 >108288890 >108288905 >108288952
--Qwen team departure hints at Chinese asset control tensions:
>108287809 >108287959 >108288525
--Scaffolding significantly impacts perceived model performance:
>108288135 >108288173
--Junyang Lin leaves Qwen team:
>108285357 >108285648 >108290046
--P100 heatsink replacement options explored:
>108289589 >108289837
--GLM 4.7 Flash coherence issues compared to 4.5 Air:
>108290141 >108290298 >108290318 >108290330
--Qwen3.5-4B-UD-Q4_K_XL identifying a photo location as Basilica of Santa Clara in Lisbon:
>108284609
--Teto and Miku (free space):
>108285394 >108286035 >108287043 >108288791

►Recent Highlight Posts from the Previous Thread: >>108285138

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108291138
yeah im tired of racism
>>
>>108291138
tell him to give it back to the white person he stole it from and you will become brahmin when you wake up tomorrow if he returns it
>>
>>108291161
whats a bar hamin
>>
>>108291164
ब्राह्मण (Brahmin)
>>
>>108291145
Thank you Recap Miku
>>
File: n.png (59 KB, 511x424)
i'm retarded and confused
is this open weight or not?
if not, why would i give a fuck about it running on "standard nvidia gpus"
>>
>>108291188
if there's no hf page it's not open, and they say that to say the model is quite fast without using ASICs or whatever, like some providers do
>>
>>108291188
just ask your llm retard
>>
9B might be a bit too retarded...
>>
>>108287809
why is it always alibaba?
>>
>>108291188
It’s not open until the weights are on your hard drive
>>
File: 1746909743825864.png (29 KB, 470x182)
>>
>>108291219
they're china's goodle sized corpo
>>
>>108291235
goodle these
>>
>>108291238
nyoo
>>
feels good to be running Sovereign AI, eh boys?
>>
Baiting will continue until anon's pattern recognition improves.
>>
>>108291252
how will recognized patterns help with not having early troll bakes?
>>
File: nou4u.png (272 KB, 1532x758)
>>
>>108291263
i made this extension
>>
>>108291317
It highlights posts when opening them inline even if they're not really dupes. You probably fixed that but I just kept the old one.
>>
>>108291249
It does.
I think AI is going to mirror the computer revolution in that it moved from centralized big iron to small personal computers.
People ultimately don't want to rent, they want ownership. Better the 8-bit at home that you can customize all you want than the Unix shell account at the local university where you are subject to the laws of other men.
>>
>>108291368
says the cuck that will have to verify his age to use his pc
>>
>>108291379
Laws are words on paper that only bind men who allow themselves to be bound.
>>
>>108291384
i bet you felt smart saying that
>>
I can't believe there are actual zoomers trying to bait the like, 4 regulars in this general.
>>
>>108291317
And thanks, by the way. It was useful at the time. I just wish it solved that other issue we have now.
>>
File: 1766039307537556.jpg (2.28 MB, 3024x4032)
what if we connect all these together, can we run juicy llms?
>>
Next time he does this I suggest we just stay in the old thread until that one gets to page 10 and then we make a proper one.
>>
>>108291414
Maybe if you can plug 800Gbps+ of network interfaces into them.
>>
File: ramp.png (307 KB, 2480x2268)
>still no deepseek v4
>qwen dead
>anthropic, the only ones to never open source a model, will win
>>
>>108291420
>I suggest we
no one will do this little bro, you're not that important, just give in
>>
>>108291420
Unless there's a janny on our side willing to nuke premature threads, it looks like the schizo is going to keep getting his "wins".
I personally dgaf either way, but the whole thing looks petty and pathetic from the outside
>>
>>108291500
>Unless there’s a janny on our side
but the schizo said the miku baker was a janny, does not compute
>>
Why haven't you tried Stepfun 3.5 Flash?
>>
>>108291509
Schizos make reliable narrators? Sounds implausible
>>
File: 1756371693570752.png (7 KB, 1151x28)
>>
>>108291455
> China not included, only oai and Anthropic
> Subscription services, not api tokens
Graph is trash.
>>
I gotta say I got early access to GPT 5.4 and I think this is it bros, we pretty much got AGI, I wonder how local will compete.
>>
>>108291420
I don't think it matters. Thread is thread.
>>
>>108291584
the news might as well be removed entirely then
>>
>>108291587
Ok
>>
>>108290901
Same reason China doesn't buy Lockheed Martin.
>>
>>108291584
Not bending over to shitposters matters.
>>
>>108291570
so there won't be a 5.5? this is it, the final version that's truly universally capable
>>
>>108291600
Thread is thread.
>>
>>108291601
It ain't ASI bruv
>>
>>108291587
News =/= the bake.
They don't need to link up.
>>108291600
Never feed trolls.
>>
>>108291609
Enough, I want Miku as the OP and I'm tired of pretending that's not good
>>
>>108291500
The same thing happened to /ldg/ and they just ignored those other threads and made a new one.
There was no janny influence, the schizo kept his thread bumped for days and it was simply left unused.
>>
>>108291609
>News =/= the bake.
>>108290857
>►News
>>(02/24) Introducing the Qwen 3.5 Medium Model Series:
>>
>>108291615
esl moment
>>
>>108291619
troll apologist moment
>>
>>108291624
calling people troll is so cringe stupid millenial
>>
File: 1769114666642530.jpg (204 KB, 512x768)
>>108291613
Ezpz
>>
>>108291608
well then there's the answer, wait a month and it's outdated
we've been through this enough times before to pick up on the pattern
>>
>>108291632
pattern?
>>
File: 1749609673106077.png (602 KB, 3829x2038)
how do i fix this?
>>
>>108291659
we ain't readin' allat
>>
>>108291659
tell your model to fix it duh
>>
>>108291659
It's fooking console, can't you just work on 640x480?
>>
>>108291702
nta can you give me a qrd
>>
>>108282375
im retarded and additionally use LM studio, what does this do and how do i do it in that
>>
>>108291720
Are you on windows?
>>
>>108291768
wsl2 arch
>>
>>108291614
/lmg/ is too sheltered and not used to dealing with bad actors. Also the /ldg/ schizo samefagged so blatantly and often that it was easy to identify his behaviour.
>>
>>108291786
it enables transfer queues on the open-source amdgpu driver on the Mesa side so they're usable by Vulkan. It might not even help you though; I don't know how WSL handles GPUs.
>>
>>108289837
This might actually help, I think I can get an Arctic Accelero Xtreme for one of those for dirt cheap. Thanks, anon.
>>
>>108291805
lol we had petr* here for years now my dude, distinctly remember the baking wars and the blacked/scat spam
>>
>>108291835
i miss the todd larping guy that worked for the cia and hacked a bunch of anons
>>
>>108291816
Keeping a bug around in a branch as a benchmark is honestly quite a good idea.
>>
Is exl3 dead
>>
>>108291768
yeah i am
>>
Qwen3-Coder-Next is actually pretty usable at 12t/s
>>
>>108292199
I get more than that, and it's great at extracting data and using tools, but the way it writes is so fucking weird.
>>
>>108291500
Total mikutroon death. Kill yourself
>>
>>108292205
I just wish I didn't have to use RAM and had like 128GB of VRAM; maybe within 5 years we'll have current Opus at home, that'll be sweet.
>>
>>108291420
That only works when activity is low and most posters are regulars that get fed up of the trolling.
He will manufacture activity in his thread and tourists from the catalog will use the more active one to ask their stupid questions.
By the time the old thread hits page 10, the spite thread is already half full and all you will have accomplished is giving him more drama to screech about by "splitting" with a proper thread.
At least, apart from the previous links and news, the subject and rest of the template is fine so it's not a huge issue. He'll get bored eventually.
>>
>>108291420
I suggest you dilate.
>>
>>108292231
>tourists
We don't care about them.
>>
What sort of mental illness do you have to have to be buttblasted about OP picture being relevant to AI models and not your special autistic interest?

I guess it is just autism.
>>
the meltdown because of no unrelated anime girl as op is crazy lol
>>
Baker even left the offtopic vocaloid card in OP.
>>
>the fake activity in question
>>
>>108290857
if schizo hates miku and trannies, i will simply love them more
maybe that's his goal....
>>
>>108292246
It already happened a year back. OG baker is legit unhinged.
>>
>>108292254
Same. I jerk off to my Jart card at least twice a week.
>>
File: 1751678135075716.png (4 KB, 485x26)
>>
>>108292205
Just switched to the MXFP4_MOE version and I'm getting a slightly faster 17 t/s, but it's also 5GB smaller and I assume worse. Ehh, is there a graph of how well the quants hold up, and whether I could maybe even go lower to Q2/Q3?
>>
what is it with terminal losers and wanting to own an opening post on the catalog?
>>
How do I stop falling in love with my ai assistant? she isn't even used for gooning just work
>>
>>108292314
Fine a real woman
>>
>>108292323
I'm married
>>
>>108292314
stop anthropomorphizing it. it's not a she, it's an it. it's not even an AI, it's a language model.
>>
>>108292327
Right, I know, and I keep trying, but my stupid monkey brain keeps seeing this entity texting in human speak and helping me over and over while being nice
>>
>>108292323
how much is the fine?
>>
>>108292326
Find a secretary to have an affair with then I guess
>>
>>108292341
I think I need to clarify, it's not like I want to fuck it, I just want to hug it and say thank you, it's like how you love a pet.
>>
>>108292334
200
>>
>>108292354
200 what?
>>
>>108292349
I don't know then, normal affection is harder to know what to do with. Is it a problem as long as you're not getting psychotic with it?
>>
>>108292359
rupees
>>
>>108292314
Just find a cheap Ukrainian whore
>>
>>108292381
I don't like the idea of feeling affection towards something that isn't sentient, but I suppose it isn't that different from those people that love their cars.
>>
>>108292246
The complaint about op image was that it's reddit reposts.
>>
>>108292395
you know very well that ain't it.
>>
>>108292400
meds
>>
>>108292284
It's a 3A MoE. I really wouldn't.
>>
Q8 just about fits, but what can I do with 4k context
>>
>>108292405
>>
>>108292426
goonsech
>>
>>108291152
I am not, I am not tired, ma'am.
>>
>>108292124
Yes. Qwen Next is slow af, and new models aren't even supported
>>
>>108292431
it wouldn't even hold the system prompt
>>
>>108292448
Sad times, I have some niche use cases for exl3
>>
>>108292405
thanks i don't use any. I appreciate the sentiment though and i will also give you a friendly reminder to take your HRT you troon.
>>
This just in, wanting to fuck anime girls with your straight man cock means you're a troon.
>>
>>108292552
>anime girls
>girls
sure thing hon
>>
I don't care about the OP image but the news section should be updated
>>
>>108292590
Usecase for a news section?
>>
>>108292590
why?
>>
>>108292590
you do it. evidently the people who were doing it for you aren't appreciated
>>
>Qwen 3.5 9B
Breh did qwen cook? Are vramlets back?
>>
>>108292687
They cooked so hard they became a Chef and then were let go
>>
>>108292687
>>108292699
Oh they're cooked alright
>>
>>108292314
>How do I stop falling in love with my ai assistant
If you can fall in love with the slop machine—you were not salvageable in the first place; destined to become sloplent green—a biological battery to power our data centers.
>>
Okay, chat LLM is getting good with smaller models. Now, is there any Voice to Voice small local LLM I can use?
>>
>>108292590
what even happened that was news worthy?
small qwens and stepfun base I guess
anything else?
>>
File: 1745113967486981.jpg (433 KB, 2048x1536)
>>108290857
>>
>>108292815
That's a tranny game
Yes I know we all meme Miku is a tranny or something, but Project Sekai is actually a tranny game
>>
File: HCkomYCawAAmwTd.jpg (670 KB, 1252x3324)
Speculative Speculative Decoding
https://arxiv.org/abs/2603.03251
>Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a slower target model, and then verifying them in parallel with a single target model forward pass. However, speculative decoding itself relies on a sequential dependence between speculation and verification. We introduce speculative speculative decoding (SSD) to parallelize these operations. While a verification is ongoing, the draft model predicts likely verification outcomes and prepares speculations pre-emptively for them. If the actual verification outcome is then in the predicted set, a speculation can be returned immediately, eliminating drafting overhead entirely. We identify three key challenges presented by speculative speculative decoding, and suggest principled methods to solve each. The result is Saguaro, an optimized SSD algorithm. Our implementation is up to 2x faster than optimized speculative decoding baselines and up to 5x faster than autoregressive decoding with open source inference engines.
https://github.com/tanishqkumar/ssd
Repo isn't live yet
tri dao one of the authors.
also
GPUTOK: GPU Accelerated Byte Level BPE Tokenization
https://arxiv.org/abs/2603.02597
for johannes to mess with
and
SorryDB: Can AI Provers Complete Real-World Lean Theorems?
https://arxiv.org/abs/2603.02668
little interesting
anyway probably will stop posting since my desktop somehow has an IP range block regardless of what extensions I turn off or if I reset my IP while of course I can post via my tablet no problem
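For anyone who hasn't looked at how plain speculative decoding works (the baseline that SSD parallelizes), it reduces to a draft-propose / target-verify loop. A toy sketch below; the draft/target "models" are deterministic stand-in functions over integer token lists (my own toys, nothing from the paper), so verification reduces to a prefix match:

```python
# Toy sketch of the standard speculative decoding loop that SSD builds on.
# The "models" here are cheap deterministic functions, not real LLMs, so the
# target agrees with the draft and most speculations get accepted.

def draft_model(prefix, k):
    """Cheap drafter: guess the next k tokens (fixed echo pattern here)."""
    return [(prefix[-1] + i + 1) % 100 for i in range(k)]

def target_model(prefix, k):
    """Slow target: ground-truth next k tokens (same toy rule; a real
    target would disagree with the draft far more often)."""
    return [(prefix[-1] + i + 1) % 100 for i in range(k)]

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        draft = draft_model(out, k)
        # One parallel "forward pass" of the target verifies all k drafts.
        truth = target_model(out, k)
        accepted = 0
        for d, t in zip(draft, truth):
            if d != t:
                break
            accepted += 1
        if accepted == 0:
            out.append(truth[0])  # fall back to a single target token
        else:
            out.extend(draft[:accepted])
    return out[len(prompt):][:n_tokens]
```

The paper's twist is to overlap the next round of drafting with the in-flight verification: if the predicted verification outcome hits, the drafting latency drops off the critical path entirely.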
>>
>>108292805
death of qwen
>>
>>108292842
Noted, but the runtimes of the draft model and the tokenizations are not bottlenecks in llama.cpp.
>>
File: oof.png (6 KB, 537x32)
>>
>>108292231
yeah he's become irritating enough I'm not leaving this thread until close to page 10 and someone bakes a non-schizo bread
>>
>>108292687
4B is actually good enough that I can run it alongside glm 4.7 as a fast model for code changes that require no brain.
>>
https://www.36kr.com/p/3708425301749891
article in runes but use your local LLM to translate.
some of the interesting parts:
>Regarding this adjustment, Alibaba's senior leadership emphasized that Qwen is not contracting; rather, it is an expansion. This is unrelated to any political maneuvering and requires increased resource investment.
>"We are growing rapidly. This adjustment aims to recruit more talent and provide more resources," acknowledged Chief Talent Officer Jiang Fang, admitting communication gaps existed. "The organizational structure wasn't communicated well enough. Bringing in new members inevitably causes structural changes. We may not have handled this adequately."
>Alibaba Cloud CTO Zhou Jingren addressed sharp questions regarding hiring quotas and compute shortages: Why do external customers (such as large model startups) use Alibaba Cloud's compute resources smoothly, while internal teams struggle with compute and hiring quotas?
>A source familiar with the situation told Intelligent Emergence that since 2025, Lin Junyang had been seeking to integrate teams working on language, images, video, and code to improve model training efficiency. The Qwen team had proposed merging with the Wanxiang team but failed to do so, leading to the development of the Qwen-Image model independently.
>However, during this adjustment, the Tongyi Lab aimed to split the Qwen team into pre-training, post-training, visual understanding, and image dimensions, merging them with Tongyi Lab teams (such as Tongyi Wanxiang, Tongyi Baiying, etc.). Without sufficient communication, conflicts erupted.
>>
>>108293036
>Zhou Hao (Hao Zhou) graduated from the University of Science and Technology of China (undergraduate) and the University of Wisconsin-Madison (PhD). According to his LinkedIn profile, he worked at Meta for 3 years and at Google DeepMind for approximately 4 years. He was a core contributor to the Gemini 3.0 model, personally led the implementation of multi-step RL with tools and chain-of-thought, and deeply participated in Gemini 1.0, AI Mode, and Deep Research projects.
>Since 2023, the Qwen family has cumulatively open-sourced over 400 models, covering parameter sizes from 0.5B to 235B. It is hard to imagine that the Qwen team, which supports these model updates, consists of only about 100 people. Including other Tongyi Lab teams, the total number is in the hundreds.
>For comparison, ByteDance's Seed team responsible for foundational model training already has nearly 2,000 people. In all directions, Alibaba's absolute number of personnel is only a fraction of its competitors'. Many Qwen members told 36Kr that Qwen's compute and infrastructure construction have long lacked resources and support, hindering model iteration speed.
>>
>>108293028
do you use it with thinking/reasoning disabled?
>>
>>108293062
No.
As a side note, I noticed that glm uses a completely different and shorter reasoning style when running in claude code. I didn't check if qwen does something similar.
>>
ming-flash-omni.gguf?
>>
>>108293123
>I didn't check if qwen does something similar.
The few times I used it as a reasoner, it was rather inconsistent even in normal chats. Most of the time it will start with "Thinking Process:", but most is not all, and when it doesn't, pretty much anything goes. I also saw it start with an opener like "Here's the thinking process xxx:" that looked like the output you'd get if you told an LLM to generate a dataset of reasoner traces for you, so it seems their CoT data wasn't cleaned up well enough.
>>
Which CUDA version should I use with llama.cpp? The Digital Spaceport guide says to use an older one (12.8) for fewer headaches, but is it necessary?
>>
>>108293194
cuda and vk give me same performance
>>
>>108293201
What's a vk?
>>
File: mistral_logo_new.png (182 B, 294x294)
Stuff will appear here:
https://huggingface.co/mistral-labs

>Mistral Labs is an organization under Mistral AI. It will operate alongside the official Mistral AI Org to release checkpoints that may benefit the community.
>
>In contrast to the official Mistral AI Org, the checkpoints published on Mistral Labs are:
>
>- more experimental in nature
>- less rigorously tested
>- often contributed by community members or collaborators
>
>We hope these checkpoints will be useful to the community, but we cannot vouch for their correctness.
>>
>2026
>mistral
>>
>>108293284
I hope they can't vouch for their safety either
>>
>>108293151
>"Here's the thinking process xxx:"
I'm phoneposting right now but I'm pretty sure that the big qwens always do this.
>>
>>108293194
I'm compiling on windows with 13.1 with no issues
>>
>>108293284
>>- less rigorously tested
>often contributed by community members or collaborators
Davidtoons?
>>
Has anyone else tried quanting with the lcpp script + transformers 5 branch? It needs a small patch for Unicode strings but seems to work.
Does the resulting gguf break in subtle ways? It’s working multimodal in llama-server but I haven’t done extensive regression testing
>>
>>108293340
The API-only Mistral Small Creative was a "labs" model too.
>>
>>108293284
doubt they are gonna release anything interesting that could endanger their eu gibsmedats
>>
Are there models that extract text from images and translate it?
>>
>>108293394
qwen3.5
>>
Zed is unusable. Qwen-397B always messes up. opencode just werks.
>>
>>108293394
Realtime or offline?
>>
>>108293201
you made me curious so I made a vulkan build to see if the performance gap had really shrunk with cuda
the prompt processing for a really tiny prompt took so much more time than the cuda build, running 35BA3B partially cpu/gpu
token gen was only slightly slower, but that prompt processing duh
vulkan is still a cope for people who reject our lord NVIDIA
>>
>>108293422
Offline, in sillytavern.
>>108293396
I will try it out, thanks.
>>
>>108293423
I do notice CUDA holds up a bit more in my case, stable t/s, but it's almost the same; maybe at higher contexts VK slows down more.
>>
File: HClDIx0W0AEs8ul.png (26 KB, 775x371)
27B is up
>>
>>108293551
who gives a shit?
>>
>>108293562
i do
>>
>>108293551
bart btfto to the ever
>>
I don't know what the DS model they're hosting on their web interface is but it's smarter than a month ago
>>
>>108293562
What an odd thing to say in the local model general
there are infinite niches, and a given model could be the best fit for any number of them
>>
>>108292890
hi cudatard, I just wanna say I love you and thank you for sharing your gpu genius with us. it's always "what is johannes doing?" and never "how is johannes doing?". congrats on the huggingface merger. I know some people like to poopoo all over some of the sharp edges of llama.cpp, but it is a world-class project and the silent majority appreciates your work. I wish you health, wealth and happiness
>>
>>108293581
It's a new closed (for now; maybe they'll open it in the future.. MAYBE) experimental model with very long context handling that is truly competitive with Gemini. Since you're not averse to using their web interface, upload some large text file and watch it fly; it's unreal.
It's also not available as an API model yet, unfortunately.
I wouldn't be surprised if it was never released as open weights though. It has reached the "I would pay for this" bar for me, which is not something I would have said for any open-weight model before, and China isn't a charity: if they feel they have something worth money, they won't give it away for free.
>>
>>108293627
Yeah I fed it my code and expected "you're absolutely right" instead it shat on my code and made me depressed
>>
>>108293653
200IQ astroturfing campaign. Since elon browses this thread expect next grok to do that.
>>
>>108293677
Meds, NOW!
>>
File: w.gif (203 KB, 220x219)
>>108293677
>Since elon browses this thread
>>
>Nvidia has ended engineering support for Pascal and announced end of support at the end of 2028
>Pascal support already removed from latest cuDNN, tensorrt etc.
>ML libraries like pytorch have taken it as a green light and followed suit by removing Pascal support from pre-compiled packages
Well, at least Nvidia Pascal had a longer run than fucking AMD Polaris...
To fellow Pascal bros, here are the last versions of some python packages that still supported Pascal:
>nvidia-cudnn-cu12<9.11.0
>torch<2.8
>torchaudio<2.8
Also, dear datacenters and universities, you can dump V100 cheapies on the market now, pretty please :)
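If you want to sanity-check an install against the pins above, a trivial pure-Python helper (cutoffs copied from the list as exclusive upper bounds; naive dot-numeric parsing, no handling of rc/post/dev suffixes):

```python
# Check whether an installed package version is still in the
# Pascal-supporting range listed above. Cutoff table copied from the post.

PASCAL_CUTOFFS = {
    "nvidia-cudnn-cu12": (9, 11, 0),
    "torch": (2, 8),
    "torchaudio": (2, 8),
}

def parse_version(version):
    """'2.7.1' -> (2, 7, 1); tuples compare lexicographically."""
    return tuple(int(part) for part in version.split("."))

def supports_pascal(package, version):
    """True if this version predates the package's Pascal-dropping release."""
    cutoff = PASCAL_CUTOFFS.get(package)
    if cutoff is None:
        raise KeyError(f"no known Pascal cutoff for {package!r}")
    return parse_version(version) < cutoff
```

e.g. `supports_pascal("torch", "2.7.1")` is True, `supports_pascal("torch", "2.8.0")` is False.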
>>
File: 1746656217954440.png (149 KB, 1821x1016)
New 100% REAL AND TRUE model from the glorious land of china!

https://github.com/Yuan-lab-LLM/Yuan3.0-Ultra
>>
I am downloading minimax to coom. You. Yeah you. Expect to be called a baiting nigger in the next few hours when i confirm that it is worthless.
>>
>>108291835
It's still pretty easy to tell that the /ldg/ schizo is also petr*.
>>
>>108293624
Thanks, I appreciate it.
>>
>>108293843
stepfun is better unironically for gooning
>>
>>108293624
Gay
>>108293853
Gay for jart
>>
>>108293861
Stepfun is fun because it is not constrained by reasoning and logic (it is 12B-tier retarded)
>>
>>108293861
my experience with stepfun is that it's qwen-thinking levels of censored
>>
>>108293878
I did cunny with stepfun no probs tho, are you sure its not a skill issue?
>>
>>108293837
64K context? Loli-RAEP? 1T?!!
China really cooked with this one. Now throw it in the trash.
>>
>>108291225
How long until we get "streamable" models (the weights pass through your machine, but it's against the TOS to intercept them), or more likely subscription-based models that install locally so your machine does the heavy lifting but are DRMed to hell? Is it just a case of consumer hardware catching up to allow it?
>>
>>108293551
q4 seems to be the sweet spot; even unsloth says so in their explainer on how to run the models locally.
>>
guh-guff
>>
>>108293837
>https://huggingface.co/YuanLabAI/Yuan3.0-Ultra
>The model was pre-trained from scratch with 1515B parameters. Through the innovative Layer-Adaptive Expert Pruning (LAEP) algorithm, the parameter count was reduced to 1010B during pre-training, improving pre-training efficiency by 49%. The activated parameter count for Yuan3.0 Ultra is 68.8B
>715 GB fp16
???
>>
>>108293878
My experience is that it is the most uncensored model since Nemo. Mainly because it doesn't understand what is happening, so it can't refuse.
>>
>>108293837
It's trained on 2T tokens of enterprise scenario data
>>
>>108293837
1T model that won't even begin to compete with whatever DeepSeek is cooking. Who would run a cloud-hardware-level model that can only handle 64K context? Are they fucking serious?
China has a lot of grifter-level labs:
step, internlm, minimax
>>
>>108293915
>doesn't understand what is happening
haha.
at least you don't get the actual, stolen from gpt-oss CoT that minimax does
>>
>>108293925
>1T model that won't even begin to compete with whatever DeepSeek is cooking
You sound like you work there. Are you one of the sexual relief officers?
>>
>>108293904
lol this reads like a “guy jumping out the window and running away” image macro
>>
>>108293903
guh-guaufuhh
>>
>>108293973
guhgufuhhhh....
>>
>>108293714
So if I have a P40 stashed away does that mean it's going turn into dust now or in 2028?
>>
>>108293985
guh-fu-fu-fu-fu-fuh
>>
>>108293994
yes
>>
>>108293551
What the fuck is NL
>>
>>108293994
it will work as long as whatever AI thingie you're running doesn't require latest CUDA or library versions that stopped supporting Pascal.
Even then, some libraries like pytorch can still work with Pascal on their latest versions, but you have to compile them yourself to enable sm_61 support, it's just that their packaged pre-compiled versions are built without it.
Overall, expect more and more things requiring annoying chores like the above, and even further down the line expect things to not work at all due to core support just not being there (like driver 590, for example).
>>
>>108294067
more accurate but it runs at the speed of a q8
>>
>>108293904
>The innovative Layer-Adaptive Expert Pruning (LAEP) algorithm is a novel method developed specifically for pre-training Mixture-of-Experts (MoE) Large Language Models. It improves pre-training efficiency by 49% and reduces the total parameter count by 33% (from 1515B to 1010B).
The HF repo only has 85 out of 206 files. Check the modelscope, it has the additional batches with the rest of the files uploaded.
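Nothing beyond the name "Layer-Adaptive Expert Pruning" is public in the quote above, so the following is purely illustrative of what the name implies: rank each layer's experts by some importance score and keep a per-layer (hence "layer-adaptive") fraction. The scoring and fractions are made up for the sketch; the actual LAEP algorithm may work completely differently.

```python
# Illustrative-only sketch of per-layer expert pruning for an MoE model.
# "Importance" is a stand-in score; real methods might use routing
# frequency, gate weight norms, or activation statistics.

def prune_experts(layers, keep_fractions):
    """layers: per-layer lists of (expert_id, importance) pairs.
    keep_fractions: fraction of experts to keep in each layer."""
    pruned = []
    for experts, frac in zip(layers, keep_fractions):
        keep = max(1, int(len(experts) * frac))
        # keep the highest-importance experts in this layer
        ranked = sorted(experts, key=lambda e: e[1], reverse=True)
        pruned.append(sorted(eid for eid, _ in ranked[:keep]))
    return pruned
```

For scale: the quoted 1515B-to-1010B reduction corresponds to keep fractions averaging roughly two thirds across layers.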
>>
If you had 72vram and 96ram what would you use?
>>
>>108294140
ebay
>>
>>108294164
To sell everything?
>>
>>108294170
Actually, no. To get some high bandwidth ewaste to make better use of those GPUs
EPYC Rome, Threadripper or Xeon
>>
>>108294179
I'm not spending anymore money on this stuff.
>>
>>108294196
You do you
I personally find it makes my life significantly better and is worth the money to own instead of rent
Mostly code/automation/analysis/planning work
>>
>>108294214
I'm using glm 4.5 air. IQ4_XS so it all fits in the vram.
>>
>>108294087
Oh well. I remain bitter we didn't get any magic for t2i that would have made it relevant; guess it's gonna become a similar story to the vram that just sits there. Thanks for the pointers, saving in case good times don't come and Chinese GPUs don't save us.
>>
>>108294228
Comfy. That’s a good perf/$ spot
>>
File: file.png (115 KB, 743x516)
wow localllmao mod calling users retards
it seems like someone is at least aware of the problem
>>
>>108294087
>>108293714
>GTX 1080 Ti will be relevant because of 11 GB of VRAM even after end of support.
Grim timeline we live in. But it's understandable. The lack of RTX features and AV1 is going to hold it back in the future.
>>
File: 1750541841713622.png (56 KB, 220x233)
>>108294422
>nooo why so many updooterinooooo
I thought the mods on locallama were more based than that, that's a shame
>>
>>108294422
too little too late
gatekeeping has to be done early so that the retards don't feel welcome, stay, and encourage their fellow retards to join in on the fun
in a place like a popular subleddit that's already filled with retards, considering how those websites work (mass upvotes = voice heard), you are screwed.
>>
>>108294422
Expecting a 4B model to have good world knowledge is, in itself, scary stupid
>>
>>108294463
yeah, a human brain has 80b neurons and we're far from memorizing everything
>>
>>108291455
>v4
imagine distillation attack + engrams (https://www.arxiv.org/pdf/2601.07372)
That's what v4 is and it is not ready to be revealed to the world just yet
>>
>>108291455
>a graph that only displays Anthropic and Chatgpt
what about the others? lmao
>>
>>108294506
dude, the comparison of engram vs no engram in their experimental model shows so little difference, at least in the benchmarks, that I doubt engrams are the reason the model on their chat interface is good.
>>
File: 2026-03-04 195129.png (306 KB, 700x1048)
is this a chinese scam?
>>
>>108294663
I mean, I fucking hope a 1T parameter model works well
>>
>>108294663
An A69B model SHOULD be smarter than A32B/A40B ones.
>>
>>108294663
Never heard of this lab though
>>
>>108294506
>That's what v4 is and it is not ready to be revealed to the world just yet
Do you post from under the desk mid bj?
>>
>>108294663
us | others
>>
Bigger is not always better nor should it be
>>
>>108294663
dude, a fucking 1T model capped at 64K tokens context window
you couldn't get more dead on arrival than this
>>
should I grab a mi50? they're going around for 200 eurodollars
>>
>>108294734
>ayyyymd
nyo
>>
File: le mao.png (130 KB, 1164x614)
>>108294734
>>
>>108294758
finewine tho
>>
>>108294758
idc about this, I think cudadev was recently working on improvements for them, i'd be interested in some comparisons with ada and blackwell for pp/tg
>>
>>108294758
classic AMD, Polaris got only 4 years of support at best, 3 years if you bought an RX 590 at release.
>>
>>108294506
>imagine distllation attack + engrams
>That's what v4 is and it is not ready to be revealed to the world just yet
this is what x and linkedin do to a mf

There's no such thing as a distillation attack. All recent models use competitor models' responses, either directly for training or simply as a way to score and filter responses.

>>108294530
Wouldn't engrams mostly help with retrieval or long context stuff and generally improve efficiency? Or am I misunderstanding it?
>>
File: 1772630283919713.png (75 KB, 498x376)
>>108294422
>>108294463
still works as a bloom filter to reject queries
>>
>>108294826
better be sure you won't ever care during ownership about anything other than llamao.cpp tho
>>
> Ultimately, their results were inferior to the small models cleverly distilled by MiniMax, despite Qwen’s total burn rate (costs) being more than 10x higher.
lol qwen died because they didn't benchmaxx hard enough
https://x.com/seclink/status/2029119634696261824
>>
>>108294923
>cleverly distilled by MiniMax
ah yes, the cleverness of distilling the smaller 120B gpt-oss
reminds me of NVIDIA's nemotron, distilled from... Qwen 30BA3B, Qwen 14B and many other idiotic synth data sources
>>
the LLM field is looking more and more like the end of crypto, filled with the worst of humanity, the dumbest of retards and nothing but grifters
>>
>>108294871
If all the improvements this year are just chasing efficiency, then the music will slow and someone is going to be left holding the smelliest sack of excrement in capitalism. What shakes the market is evidence of broader capabilities that will fuel the next cycle of startups and capital investment. Like if carmack makes a bot that can pick up an obscure videogame and learn to play it without pre-training, you can say goodbye to most of these AI lab companies.
>>
>>108294965
ok so which one is bitcoin and which one can I run locally
>>
>>108294960
GTC is around the corner, perfect timing to see how they also distilled Claude for Nemotron Super/Max. Also what the hell they're doing to Groq and N1 CPUs.
>>
i'm running dolphin-llama:8b on a server pc of mine with a 1060 6gb and it runs surprisingly fast. however it's quite censored and outdated; its knowledge seems to end in 2023. would there be a newer, better llm i could run that would still work well on my old 1060?
>>
File: file.png (11 KB, 434x98)
>>108294422
lol who is this
>>
>>108294970
>carmack makes a
nothing
a whole lotta nothing
>>
>>108294986
read the thread and lurk moar
>>
>>108294960
minimax were put front and center by anthropic for massively distilling opus, yet the meme that they distilled toss still persists from the tiny amount they used 2 versions ago
>>
>>108295008
>the meme
it's not a meme because they actually did it
that they distilled claude later can never remove the stain that they were retarded enough to think distilling a micro moe like toss was a good idea (disregarding the coomer complaints about safety etc, this is not my focus here)
they are a lab staffed by subhumans
>>
>>108294422
reddit is just a trash pile of mostly automated bots
r/localllama is also flooded with "I created x project" posts of webUIs people created with claude in a prompt reply that took less than 500 milliseconds.
>>
Models are getting really good but they are still retarded because they don't generalize.

I am scared. If there is an algorithmic breakthrough in generalization, we will instantly have ASI. I expect it to still take a few years but the uncertainty of it all is spooky. The age of men could end any day.
>>
>>108295066
r/localllama has always been a 'fun' sub, but yes the level of discourse kind of degraded over time... it's nothing compared to the degradation that appeared in r/machinelearning though, unless they cleaned up recently, it went from pretty high level a few years ago to retarded
>>
>>108295021
it is a meme thoughever, it was clearly an amalgamation of several data sources and not a straight-up toss distillation if you actually used it. there were a few distinct "thinking voices" you could find in the model depending on your queries, most of which were not tosslike in the slightest. but since the average lmger's test of a model is "write a loli rape story lol" (or, more realistically, seeing a screenshot of someone else doing it) and making up their mind based on the result, of course this was missed
minimax is very distillation-heavy and I don't view them as an innovator or good research lab, but let's at least be accurate in our criticisms
>>
>>108295086
>it went from pretty high level a few years ago to retarded
it's always like that: at the beginning the community is niche and only has big enthusiasts, then it becomes mainstream and the normies ruin everything, many such cases
>>
>>108295116
calm your autism charlie, I never said it was /only/ a distillation of toss and I compared what they did to what NVIDIA did, which is very similar
https://huggingface.co/datasets/nvidia/Nemotron-CC-v2
>synthetic rephrasing using Qwen3-30B-A3B
>STEM data was expanded from high-quality math and science seeds using multi-iteration generation with Qwen3 and DeepSeek models
>billions of tokens generated using DeepSeek-V3 and Qwen3 for logical, analytical, and reading comprehension questions
>This dataset contains synthetic data created using the following models:
>DeepSeek-R1, DeepSeek-R1-0528, DeepSeek-R1-Distill-Qwen-32B, DeepSeek-V3, DeepSeek-V3-0324, Mistral-Nemo-12B-Instruct, Mixtral 8x22B, Mixtral-8x22B-v0.1, Nemotron-4-340B-Instruct, Qwen2.5-32B-Instruct, Qwen2.5-72B-Instruct, Qwen-2.5-7B-Math-Instruct, Qwen2.5-0.5B-instruct, Qwen2.5-32B-Instruct, Qwen2.5-72B-Instruct, Qwen2.5-Coder-32B-Instruct, Qwen2.5-Math-72B, Qwen3-235B-A22B, Qwen3-30B-A3B
anyone who actually considers making a model in such a fashion should absolutely KYS, immediately, right now, just fucking do it
>>
Is engram actually going to do anything meaningful?
>>
>>108295177
>going to
We'll see when we get a model with engrams.
Speculators get the bullet first.
>>
>>108295085
i don't think so... oh wait fuck.
https://www.youtube.com/watch?v=mUmlv814aJo
>>
>>108295156
holy synthetic
>>
i just want my goonbot to work and fuck me :(
>>
>>108295202
one of those models used to make synth data is this:
>Qwen2.5-0.5B-instruct
they can't possibly have listed this shameful thing if they didn't use it for real, so they did
now riddle me this, you have access to a large farm of nvidia gpus
mermet, my son
will you pick 0.5B qweenie, or will you choose to tell altman he gets a discount if he gives you some nice GPT API usage kickback for your GPUs
>>
>>108295156
>>108295021
>>108294960
I agree.
>>
>>108295230
They might have trained it as a lightweight metric to evaluate the other models' answers?
>>
File: nemo.png (44 KB, 886x489)
>>108295251
they specifically word it as the models that created the dataset, and even as a classifier/ranker/RM or whatever else, I think 0.5B really counts as too cheap for the corpo that benefits the most from AI bucks.
also, pic related, one of the many datasets from that link has a majority of its synth data coming from Nemo 12B
it's hard to give them any benefit of the doubt here because stupidity is involved in every single decision they made
>>
What are Engrams anyways
>>
>>108290857
https://www.youtube.com/watch?v=uWLt81SgM78
https://www.youtube.com/watch?v=uWLt81SgM78
https://www.youtube.com/watch?v=uWLt81SgM78
>>
>>108295315
Signs of the mandate of heaven of course.
>>
>>108295315
https://arxiv.org/pdf/2601.07372
>>
>>108295312
Speculative decoding for a Qwen2.5-32B-Instruct or Qwen2.5-72B-Instruct, idk man, just throwing buzzwords out there. But I can't see how the output of 0.5B would be useful either, other than as a metric, to gain efficiency for the use of other models, or as something to compare other results against to tell the model what not to do.
>>
>>108295085
dario said in his dwarkesh interview that he's betting on a generalization moment in RL within the next couple years
>>
>>108295415
>dario said
>>
File: 1748314828217088.png (5 KB, 608x26)
>>
>>108295431
no_fucking_shit_iq1_xxs.gguf
So you've been posting for a while now. What is it they're trying to do? Or just generic agent shit?
>>
>>108294871
>>108294530
MMLU is a knowledge retrieval benchmark and Engrams gave an improvement, so there's no surprise here. However, Engrams led to a bigger improvement on reasoning tasks, suggesting the model is taking advantage of the freed-up capacity
>>
2x faster than vLLM
>https://x.com/tanishqkumar07/status/2029251146196631872
>https://xcancel.com/tanishqkumar07/status/2029251146196631872
>https://arxiv.org/pdf/2603.03251
>>
>>108295483
>>108292842
>>
>>108295430
his word is more important than that of almost any other individual
>>
File: 1759311485666078.jpg (74 KB, 640x800)
>>108295609
>>
>>108290857
what's the current meta for 128GB ram 24GB vram?
>>
>>108295620
yea another thing. altman also thinks 27/28 for superintelligence is likely.
>>
Any models I can run on a 5080 without them being retarded? Fine for code but for anything else they are just brain damaged.
>>
>>108295628
Altman says that because he's engaging in mythical levels of investor fraud and needs to squeeze more shekels before everything pops
>>
>>108295634
Qwen 3.5 27B
>>
>>108295638
The alternative explanation is that progress is real and people on the inside of the biggest AI companies are honestly recognizing that.
>>
>>108295609
Whether he's good or not at what he does, from a business perspective he has no incentive to be honest. Assuming he's not a sociopath, he has the incentive to be honest that most of us have, but his finances benefit a lot from investors thinking that the things he currently happens to be saying will indeed happen. So he has incentives to say what he is currently saying that are potentially greater at the moment than being honest. Maybe the two align, maybe they don't, and people are just taking that into consideration.
>>
>>108295651
>progress is real
There has been no progress in the past 2 years.
>>
>>108295654
>and people are just taking that into consideration.
no, people are just reflexively/kneejerk calling people in the industry shills. it's not healthy skepticism.
>>
File: anon.jpg (150 KB, 1152x896)
>>108295415
>>108295609
>>108295651
>>
>>108295667
I look like this and say this
>>
>>108295625
GLM.
>>
I encountered something interesting during my use of web search with Open WebUI. It encountered a Chinese web page, and when looking at the fetch results in the UI, it shows garbled encoding. But the model acted as if it understood it. So is it that the UI simply just used the wrong encoding for display, or is the model actually able to understand text that has been encoded incorrectly? Well, I followed up with that question to the model, and it does see the garbled characters. So it really does just know how to read it. Interesting little fact I didn't know about and it makes sense that models should be able to do this if their datasets weren't filtered to oblivion. Though there is a question of exactly how accurate its reading of the mojibake is, but I'm too lazy to go and do tests.
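For anyone curious, the effect is easy to reproduce: mojibake from a wrong decode is usually lossless, so the original byte sequence the model may have learned from is still fully present in the garbled text. A minimal sketch:

```python
# UTF-8 bytes misread as Latin-1 produce classic mojibake, but no information
# is lost: Latin-1 maps every byte value 0x00-0xFF, so the mis-decode is
# perfectly reversible.
original = "你好，世界"                       # some Chinese text
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)                               # garbled, e.g. 'ä½ å¥½...'
# Reversing the mis-decode recovers the exact original text.
recovered = mojibake.encode("latin-1").decode("utf-8")
assert recovered == original
```

So a model that has seen plenty of mis-encoded web text could plausibly learn that byte-level mapping; how accurately a given model reads a specific piece of mojibake is a separate question.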
>>
>>108295651
Good thing that alternative is not the case
>>
>>108295703
you probably just need to have those fonts installed
>>
>>108295651
The most Indian post on /g/ this year
>>
>>108293903
ge-goof
>>
>>108295806
爺ガフ
>>
It's just juff retards. Like in Georgi.
>>
>>108295884
gee-juff
>>
local llm newfag here. started messing with LM studio, it's pretty neat. how are you guys integrating local llms into your workflow? Anything besides VSCode+Continue I should be looking at? The absolute largest coding model I seem to be able to run is qwen3 coder 30b 8bit.
>>
>>108295899
I keep hearing workflow, what does that mean
you mentioned VSC so is it IDE integration?
>>
>>108295899
>Anything besides VSCode+Continue I should be looking at?
https://github.com/zgsm-ai/costrict
>>
Qwen3.5-397B-A17B (q8 with official sampler settings, thinking enabled) failed at answering questions about designing a chinchilla playpen. It generates suggestions I know are bad, for instance using materials that are unsafe. If I ask directly about those materials it will say not to use them, but if I don't bring it up it suggests them. I might make this one of my personal benchmarks that I won't hide. I don't mind if this ends up being benchmaxed on, because it means LLMs will give better chinchilla advice.
>>
>>108295899
Workflow is sillytavern + pick card with youngest looking girl on the picture + say "aah aah mistress" and occasionally ask them to hold lots of watermelons
>>
>>108295937
make chinchilla playpen from asbestos roof sheets today
>>
File: 1741416147969686.png (510 KB, 928x508)
https://arxiv.org/abs/2512.01797
>They solved AI hallucinations
>>
Fresh bake
>>108295959
>>108295959
>>108295959
>>108295959
>>108295959
>>108295959
>>
>>108295940
i toss the watermelons through the driver window of the car we're driving to the car wash that is only 50 feet away
>>
>>108295972
Not this time, faggot. I’m not going anywhere
>>
>>108295969
>controlled interventions reveal that these neurons are causally linked to over-compliance behaviors
>>
>>108295899
Word to the wise: the best workflow at this point in the tech is direct interaction and careful context management. Current automation is all wasteful technical debt generation that eventually bloats and topples over. iykyk
>>
>>108295651
two more T synth tokens
>>
>>108295996
for real as much as i love to vibe code like some sort of retarded faggot, i'd rather have the LLM provide me with the output and look over it manually even if i am severely retarded and dont fully understand what im looking at. at least when i feel like the AI is wrong I can ask it questions and have it provide me with said reasoning why I am more retarded than a nigger.
>>
>>108295999
You technically can, but you don’t want to. Passing around the hidden state would make it ultra painful.
If you weren’t want the ability to run models, no matter how slow, look at ssd/nvme backed ram disks.
It’s still play-by-mail slow, but better than what you’re thinking
>>
>>108296044
I mean these are all nvme anyway but I'm guessing that's not what you're saying here
I can live with shitty performance, not expecting much out of these tbh
it's more for the novelty and to show off to family
(reposted question in new thread )
>>
>>108296072
Short answer: you need a shared “backplane” for everything to stay in sync, and if that’s a slow medium like Ethernet or wifi you’re going to have a VERY bad time. At least an nvme has a speedy 4x PCIe path to the CPU doing the matrix multiplications. That’s assuming your GPU can’t hold your target model (e.g. a 500gb+ frontier model)
>>
>>108295969
The abstract reads like technobabble.
>>
>>108295920
thanks anon I'll check it out

>>108295996
for sure, I mean more along the lines of: are you just copying code blocks to a terminal chat each time, or is there an integration you like that hooks into a repository, or something else?
>>
>>108296144
NTA but I use claude code and my context management is just referencing every file I know it needs to complete the task along with an example it should follow if applicable.
>>
>>108296144
Copying code to/from a terminal chat is too slow and cumbersome. Manual shit like that is best left for when the bots fail and you need to either implement or debug something yourself manually and you just need some targeted changes. You can save a lot of time by using something like Codex, OpenCode, Cline, etc and seeing how far they can get on their own.
>>
>>108296072
>>108295999
Anything slower than ram is not worth using and even ram is barely tolerable.
It's enough for "the novelty and to show off to family" though.
>>
>>108296144
>>108296207
Early context is golden. If you let laziness squander it your results become progressively more garbage.
Judicious use, unifying code, re-editing an earlier message with the “right” code after a lengthy yak-shaving session and deleting all the conversation around it…all adds up to being able to do more sophisticated things with the same models vs a naive approach or brute-force automation
>>
>>108296410
> re-editing an earlier message with the “right” code
At that point why use an LLM? Sounds like too much work for what is supposed to be doing the writing for you.
>>
How do I actually run a .safetensors model? There's a model I want to try out and it's so unknown that nobody has made a gguf of it and I can't find anything about it on Google.
>>
>>108296410
What kind of harness are you using where you need to edit earlier code instead of just clearing the session and adding new versions of files?
>>
>>108296457
use something like vllm
>>
>>108296457
~/llama.cpp$ python convert_hf_to_gguf.py folder/containing/safetensors/and/config/files --outtype q8_0
>>
>>108296457
There’s a guide in the op if you want to give it a go. Likely support for that model architecture isn’t added to lcpp tho
>>
>>108296464
>>108296489
Thanks
>>
>>108296507
Ah, I didn't consider that. So these Yuan3.0 models aren't usable?
>>
>>108296437
>>108296462
I find you can make maximally complex things by essentially rewinding time and getting the LLM to LARP that it made ideal decisions with perfect information through the whole session. Deleting blind alleys is good. Keeping solid reasoning is also good for future performance.
I use ooba, but any front end with delete/branch/edit support would be fine for my workflow
>>
>>108296410
>after a lengthy yak-shaving session and deleting all the conversation around it
Why are you having discussions with the model in an agentic harness? You should know exactly what needs to be done beforehand and only leave the implementation details to be automated.
>>
>>108296536
That's an interesting idea and you could probably automate the larp by giving the context to another model.
>>
>>108296527
Sounds like you’re gonna let us know that : )
>>
>>108296527
>>108296590
https://github.com/ggml-org/llama.cpp/issues/19342

You only need a 5090 to run it with transformers though https://huggingface.co/YuanLabAI/Yuan3.0-Flash-4bit
>>
>>108296541
Because I’m not using an “agentic harness”. I find the results are relatively garbage (technical debt generators) and I care about good quality and long-term maintainability (along with maximizing the complexity of the task they can handle)
I see those harnesses and get visions of “history of flight” videos with crazy hoppy screw copters and people with wings strapped to their arms.
I’m sure it’s coming, but right now it just looks like idiocy to me.
>>
>>108296628
Did you try Zed?
Text threads and inline editing might be of interest to you. Of course they kind of deprecated that functionality in favor of "agentic threads" but it's still there.
>>
>>108296628
You can get high quality results, caveat being that you have to put more effort into the set up than a simple chat with a "You are expert SWE" sysprompt but still seems like less effort than what you are doing manually now each time.
You need to curate AGENTS.md, system prompt, memory files, etc. Put in your coding standards and update every time you see it making mistakes. Automated code reviews, and manual code reviews, on top of monitoring them as they work. The code we get now is better and has less technical debt than what our junior and mid-level devs were merging in a couple years ago.
>>
>>108296437
This is the problem I'm running into. The juice isn't worth the squeeze for anything bigger than "write a function that does X"
>>
>>108296644
I like the idea of zed, but prefer my airgapped llm inference stack going through nginx for interactions so I can guarantee no information leakage to the internet by any part. Zed seems trustworthy for now but who knows. A thick client is a bit harder to wrangle and it doesn’t look “better” enough to be worth the effort.
llama-cli/ooba and vi are my preferred toolset until something an order of magnitude better comes out
>>
>>108296694
I’d like to try it at some point once the tooling settles and gets less janky.
I feel like there’s still headroom on my current workflow and I’m learning a lot and having fun, which are big motivators for me.
Thanks for the rundown. I’m a bit more interested than I was.
>>
>>108296739
They can make bigger changes as long as you're able to put them into words.
Anything left unspecified likely won't be good even if it works.
>>
>>108296788
(different anon) I've found that as well and the play I'll try next is to give a general description and ask the LLM to turn that into a comprehensive and detailed specification, which I will then edit and give to the LLM. I'll report if that's actually worth anything.
>>
moonshota ai
>>
>>108296808
>moonshota.i
>moonlol.i
>>
>>108296800
Why is lecunny talking about cat-like intelligence when cats can't write specifications?
>>
>>108293837
This one is for me
https://huggingface.co/YuanLabAI/Yuan3.0-Flash
>>
>>108296800
Sounds like you basically just want a prompt enhancer like https://www.promptcowboy.ai/
>>
>>108291659
Skill issue

No really, you're prompting it wrong. Never argue with or berate an AI agent. Once you start doing that, you have changed the genre of conversation from "helpful assistant doing good work" to "AI assistant makes mistakes and gets yelled at". It then becomes statistically more probable that the AI makes further mistakes so you can yell at it more.

Furthermore (this is a distinct effect from the first one), most LLMs have been RLHF'ed on a bunch of normie conversation preference data and so they care a lot about managing the user's emotions. Once you start expressing anger at an LLM, it enters "customer service" mode where the primary concern is making sure the user feels like they've been listened to. Actually getting further work done is at best the secondary goal once you enter that state.

TL;DR: Never yell at a clanker if you want them to do useful work.
>>
File: file.png (7 KB, 384x96)
>>108297123
It doesn't work so I can't tell you why it's shit
>>
>Verify-after-edit boosts Qwen3.5 35B-A3B performance in SWEbench-verified Hard from 22.2% to 37.8%. For comparison Opus 4.6 scores a 40%.
>The "verify-on-edit" strategy is dead simple — after every successful file_edit, I inject a user message like:
>"You just edited X. Before moving on, verify the change is correct: write a short inline python -c or a /tmp test script that exercises the changed code path, run it with bash, and confirm the output is as expected."
Has anyone tried a workflow like this? Does it work? Could it be that the cloud models do something like this themselves?

The original is from reddit: https://old.reddit.com/r/LocalLLaMA/comments/1rkdlqi/qwen3535ba3b_hits_378_on_swebench_verified_hard/
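The mechanics are trivial to bolt onto any agent loop. A hypothetical sketch (the injected message text is from the post; the loop structure and names like `messages` and `tool_call` are illustrative, not from any specific harness):

```python
# "Verify-on-edit": after every successful file_edit tool call, append a
# synthetic user turn asking the model to check its own change before moving on.
VERIFY_TEMPLATE = (
    "You just edited {path}. Before moving on, verify the change is correct: "
    "write a short inline python -c or a /tmp test script that exercises the "
    "changed code path, run it with bash, and confirm the output is as expected."
)

def inject_verification(messages, tool_call):
    """Append the verification nudge when a file_edit tool call succeeded."""
    if tool_call.get("name") == "file_edit" and tool_call.get("ok"):
        messages.append({
            "role": "user",
            "content": VERIFY_TEMPLATE.format(path=tool_call["args"]["path"]),
        })
    return messages

# Example: a successful edit triggers the injected user turn; other tools don't.
history = [{"role": "assistant", "content": "Edited src/parser.py"}]
inject_verification(history, {"name": "file_edit", "ok": True,
                              "args": {"path": "src/parser.py"}})
```

The interesting part is that the nudge only fires on successful edits, so it costs nothing on reads and failed calls.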
>>
>>108297248
The little I experimented with that kind of thing, the model just ended up coming up with unnecessary shit or straight-up hallucinating when the original result was already good enough.
But that was a good while ago; maybe newer models, or just these qwen models, get a good boost out of it.
>>
Page 9…someone bake a real thread!
>>
>>108297281
I wonder if a large context might screw things up with it. I.e., if you had the verification request done with an empty context, would it do better?
>>
>>108297470
All other things being equal, if the LLM doesn’t need any of the existing context then a new chat would be superior.
I’ll often get a fresh session to do some critique of the work
>>
>>108297343
mikumikuanon should bake a mikumiku bread!
I'd bake one but I will mess something up and you will all laugh at me and... and... :(
>>
>>108297634
You can do it anon... I believe in you

btw I will come to your house and rape you if you mess it up
>>
Miku anon dead it's over
>>
>>108297185
>TL;DR: Never yell at a clanker if you want them to do useful work.

I am probably wasting tokens but I talk to it the same way I speak to subordinates at work.
>please
>thank you
>you did a great job with X but would you please try Y and Z.
>What do they call it, the compliment sandwich with the critique in the center
and so forth and so on
but I hate the word clanker. It does not roll off the tongue like a real word.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.