[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


Janitor applications are now closed. Thanks to all who applied!


[Advertise on 4chan]


File: varnishing act.jpg (156 KB, 1216x832)
156 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109038219 & >>109032734

►News
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039
>(06/10) DiffusionGemma 26B-A4B released: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: district 39.jpg (161 KB, 1024x1024)
161 KB JPG
►Recent Highlights from the Previous Thread: >>109038219

--Benchmarking MTP speed gains and VRAM overhead in Kobold:
>109040460 >109040469 >109040516 >109040916 >109040933 >109040992 >109041190 >109041205 >109042592 >109042602 >109042660 >109042605 >109042624
--Comparing 26B model performance and speed with reasoning toggled:
>109039929 >109039948 >109039972 >109040202
--Speculation on AI bubble and US ban of Mythos/Fable:
>109041909 >109041971 >109041984 >109041990 >109042006 >109042013 >109042050 >109042069 >109042521
--llama.cpp adds support for Eagle3:
>109038274 >109038298 >109038313 >109038655
--Anon proposes model-aware dynamic temperature adjustment to avoid repetition:
>109040846 >109040862 >109040976
--Sharing interfaces and tools for multimodal image and video input:
>109040337 >109040553 >109040558 >109040574 >109040606
--Optimizing mmproj settings to improve Gemma's image descriptions:
>109040962 >109041025 >109041031
--GLM-4.7-Flash coding performance reports and comparison with other models:
>109038349 >109038388 >109038459 >109039403
--Frustrations with building from source and managing legacy dependencies:
>109039843 >109039975 >109040139 >109040221 >109040270
--Kimi K2.7-Code release and anticipation for DeepSeek Vision:
>109038703 >109038723 >109038810 >109038869 >109038892
--Speculation on diffusiongemma and the future of local diffusion models:
>109042456 >109042485 >109042528 >109042534
--US government locking down Mythos after reported jailbreak:
>109042068 >109042076 >109042213
--Anons comparing regional second-hand RTX 3090 purchase prices:
>109042211 >109042283 >109042333 >109042514 >109042546 >109042583
--Logs:
>109038443 >109038539 >109039485 >109040610 >109040672 >109041248 >109041592
--Miku (free space):
>109039025 >109039479

►Recent Highlight Posts from the Previous Thread: >>109038224

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>fable dead
>minimax is pure codeslop like usual
>k2.7-code still thinks for ages with no way around it
things are looking bleak
>>
>DeepSeek trained from Gemini outputs
>Claude Sonnet trained from DeepSeek outputs
>each one ends up more capable than the last one
Isn't this just the recursive improvements people keep talking about? If the outputs from a weaker model can finetune a more capable model, then why can't that just happen recursively?
>>
File: pascallllll.png (165 KB, 1508x708)
165 KB PNG
Pascalfags assemble!
>>
>>109043623
it isnt recursive, its transitive
>>
>>109043633
damn the poors are doing it rough in this economy
>>
>>109043633
Based ewastemaxxer
>>
>>109043646
The P100 was the best $75 I've spent this year.
>>
>>109043554
Post more Yuki please I love her so much
>>
What's the next step up from a 32GB GPU?
What's the next model after Gemma 4 31B?
32GB isn't enough for 31B Q8, so I'm considering getting another identical card just for it, but...?
>>
File: _HEwIEc5a4AA2hme Fi.jpg (237 KB, 2048x1536)
237 KB JPG
>>109043554
>>
>>109043658
Above 5090s, you have either Blackwell 6000s or Frankensteining old datacenter hardware.
Above Gemma 4, you have any of the typical large MoEs like Kimi or GLM or Deepseek. You will need 256GB of RAM at a minimum, so either workstation or server boards. 512GB is preferred, as well as DDR5. Considering the price of RAM, and not even the GPUs, you either pay 5 times more than you would have a year ago or you sit and wait with the rest of us.
>>
File: 1770363716490322.png (2.44 MB, 999x1430)
2.44 MB PNG
>>
>joked about letting your model play dragon's dogma with you
>someone actually modded coop into dd2
I wonder if it would actually be possible to set it up with an LLM.
>>
>>109043658
>What's the next model after Gemma 4 31B?
Wait for the chinks to respond. Then wait for Google to respond. Rinse and repeat until hardware prices come down and we can all run Kimi at Q8 with max context.
>>
>>109043687
>no weenus
grim
>>
>>109043687
chadcat is a cringe representation. i think the snailcats are cuter
>>
>>109043690
That guy had Gemma playing wow with him a week a two ago
>>
>>109043687
how do i ascend further as an aichad? qwen 3.5 122b isnt doing it for me anymore and my project ideas keep getting more complicated.
>>
>>109043708
Snailcats are the luddites
I think people got confused lately
>>
>>109043717
i'm aware
>>
>>109043687
go back
>>
>>109043710
I think that anon's Gemma can only do chat right now. Vedal plays games with Neuro though so I'm sure it's not impossible.
>>
>>109043675
grim
>>
>>109043687
Which circle did this meme originate from? I've seen it in /vcg/ a lot.
>>
>>109043658
See OP
>>
>>109043745
India.
>>
>>109043751
wtf i hate *cat now
>>
>>109043741
Prices of DDR4 have fallen a little bit down to where they were in January, but that's not much. You still would be paying at least $15k for a moderately competent rig with a Blackwell 6000 and 512GB of DDR4. Still would only get 10t/s at best on any of the big MoEs with an acceptable quant.
>>
>gemini live translate
Pretty fucking cool. Think we'll ever get that locally?
>>
>>109043756
What kind of hardware would run the big models at fast speeds (50+t/s)?
>>
>>109043773
You would need to have it all loaded on the GPUs, so bare minimum 4 Blackwells which at the current price would be around $60k. At that point you would basically have to go with used A100s or something off of ebay unless you just have money to burn.
>>
>>109043773
very tough
literally burning money too
10t/s is plenty. all of you boys are just completely fried
>>
>>109043788
>10t/s is plenty
You can't coode with that.
>>
>>109043791
oh, okay, that I agree is different
>>
12 vision capability is pretty bad. It's just not very good I wonder if I'm doing something wrong
>>
>>109043802
Did you try increasing the image resolution? llama.cpp has retarded defaults
>>
>>109043785
Correction: haven't checked Blackwell prices in a weeks. They are now up to $15k on newegg just by themselves. So that rig would probably be more in the ballpark of $19k instead of $15k. For a pretty rudimentary rig.
>>
>>109043802
>omni model bad at everything
No one's surprised
>>
I can squeeze gemma-chan 4-31B in at FP16/128k with the draft model, should I run FP8 quant to get 256k context or just cope with this?
>>
I haven't been around much but is q8 not the default anymore
why fp16
>>
I'm an intel gpu chud and the gguf shit runs like ass, Q8 is 12t/s, FP8 is 30t/s and FP16 is somewhere around 20t/s, all without draft model
>>
>>109043831
huh, interesting
I meant to quote the first time btw just forgot
>>
>>109043756
when's gemma 4 64B coming out so i don't have to care about useless supergiant models
>>
File: file.png (25 KB, 365x243)
25 KB PNG
Step Flash 3.7 needs to be corrected
>>
>>109043623
Because it requires human input.
>>
>>109043745
it's a single sperg forcing the 'meme'
been like 2 months
>>
File: Ernie-Image_00097_.png (1.61 MB, 1200x896)
1.61 MB PNG
ACEStep 1.5 XL Initial D LoRA
https://vocaroo.com/14wvmcvt94lB
https://vocaroo.com/12tVNq7SnhO1
https://vocaroo.com/1ivoSPExfSF6
https://vocaroo.com/12daQWwoPPbW

I wrote a guide
https://rentry.co/s8fg8ber
Note for this Initial D LoRA, I increased rank to 256/512 and lowered LR to 0.00009. This is the only LoRA I have trained this way, but results are very good.

You're probably wondering how I get such insane results in audio quality, I haven't posted to /lmg/ in a while since
https://desuarchive.org/g/thread/108702912/#108704068

But actually, the results are even superior now with a new setup. What I posted there in that archived thread were Turbo gens, it's now possible to increase the sound quality without mastering (to match cloud models), plus get significant increase in quality out of LoRAs trained on the base model.

The model I now use for inference is acestep-v15-merge-base-turbo-xl-ta-0.5-Q8_0.gguf
found on https://huggingface.co/scragnog/ace-step-1.5-gguf-merge-models/tree/main
The VAE is still Scragnog's custom VAE. Settings are 50 steps, 12-20 CFG, both the LM and DCW are disabled.
Less important: I'm using a DPM++ 3M, available on https://github.com/scragnog/HOT-Step-CPP
Note that DiT-only generation is very important, it is what allows the model to be as creative as models like Udio, and you get better outputs without the LM 90% of the time as the base model was mostly intentionally trained without it to maximize its creativity.

Other merged models may increase audio quality as well, but may not be as good with LoRAs trained on base, or have slightly worse composition than the Turbo/Base merge.

Here are some more LoRA results, I hope other anons start exploring local music gen more.


Japanese Folk Metal
https://vocaroo.com/1hOnOf8ZWn71
https://vocaroo.com/18pRgXxfm3tj

Fate Gear
https://vocaroo.com/1n3t24Kllhkz

Zutomayo
https://vocaroo.com/1mexIG2rYRXB

Improvements from merged model include sound quality, composition, and lyrics adherence.
>>
>>109043922
Note these results wouldn't be possible with just the Turbo model, as LoRAs trained on base activated on it do not have a good effect, and it's hard to train a turbo LoRA (similarly, it has very small effect). As a result, most users who have no idea about the merged model probably think it is bad, but the merge model brings the composition quality to about on par with the best cloud offerings (Udio, etc...)
All of my LoRAs outputs are about on par with Udio if not better.
The benefits are not just with LoRAs, regular generations also massively increased in sound quality and composition (night and day difference).
>>
I’ve got an idea: Gemma-4-24B-qat dense with 12B multimodal capabilities. 26B is a useless appendage.
>>
>>109043931
12b got fucked right into its brain with that 'unified' multimodality with the current training curriculum
do you really want that?
>>
70b dense
>>
>>109043944
i wanna stick my dick into 70b dense
>>
Gemma-4-124B-A69B with a 65B dense shared expert
>>
Gemma keeps pressing on my same-same. I can't take it anymore /g/
>>
What is the best coding model for a dgx spark?
>>
I'm so mad about the whole Mythos/Fable situation and the government response. We're literally at the point now where our only hope of open-source model advancement lies with the Chinese, and it's still entirely possible that they will gatekeep intelligence too.
>>
>>109044026
Local keeps winning
>>
>>109044026
They spent months talking about how it was too dangerous to be released and how it could find zero day exploits in any software in the world and all that shit, I mean what other response could there have been to all that shitty marketing. Only if you want to think the government is in on ther hype man lying
>>
google spamming so much shit they'll release 124b eventually
>>
File: ralralralralra.png (136 KB, 1000x817)
136 KB PNG
lalalalala~
>>
File: gemma4_army.png (601 KB, 1606x2435)
601 KB PNG
>>109044060
>>
>>109044026
lmao if you think this is contained to two governments
this shit is open sourced as fuck, anon
sure they'll have a year lead, but that's it
>>
https://github.com/ggml-org/llama.cpp/pull/24523
>minimax tool calling doesn't work
>there's no specialized parser for M3, so it falls through to the differential autoparser, which can't handle M3's format
pwilkin bros?
>>
File: youre_killing_me.png (118 KB, 360x330)
118 KB PNG
is there a way to ensure that an LLM follows everything in a system prompt when reasoning? specifically for gemma, sillytavern and a prompt that's maybe 1000 tokens?
>>
>>109044144
Bigger model.
>>
>>109044144
I have a trillion dollars for you if you figure it out
>>
minimax m3 is pretty goated for RP. just werks right out of the box with a sysprompt swap. Would recommend
>>
>>109044146
i habeeb for gemma 100+

>>109044150
it sucks because it follows directions so fucking well, but when it doesn't, it drives me fucking crazy. it just randomly selects certain parts to follow
>>
File: 1778137931999303.png (30 KB, 479x368)
30 KB PNG
GLM5.1 IS OUT
>>
File: 1760945881860386.png (14 KB, 463x166)
14 KB PNG
>>109044179
5.2* oops
GLM5.2 IS OUT
1M CONTEXT
REASONING MODES
>>
>>109044179
Still significantly worse than Opus 4.8 and GPT 5.5
>>
>>109044156
>Would recommend
for someone who hated the other minimax models for rp?
>>
>>109044144
>>109044150
Foolproof way. Tune against failure, where's my trillion $?
>>
File: 1781300332940198.png (762 KB, 1755x1460)
762 KB PNG
>>109044196
It's nothing like the other minimax models.
>>109040610
>>
>>109044196
Yes, I tried previous minimax and it was trash. I settled on qwen 397b before this for my 256gb rig after trying everything else in that size range out and throwing it in the trash. this new minimax is such a massive upgrade over qwen I haven't looked back
>>
It's funny how all the chink models tend to release in one swoop. DS4.1 will save us.
>>
>>109043942
Double the parameters would unironically fix that and give us what a text-only 12B would’ve been
>>
>pro-CPP general
>>
Qwen is still shit though
>>
>>109044228
Isn't DSv4 technically still a "preview"
>>
Will z.ai ever release a <50b model again? 4.7f was good.
>>
5.2 Air when
>>
>>109044289
27B > 31B for coding and agentic if you’re a vramlet
>>
>>109044283
Suggest a non-chinesium open weights model worth using
>>
>>109044297
No. Qwen does shit nobody asked for, assuming the user is a promptlet. Gemma is a better tool
>>
>>109044306
Mistral finetroons
>>
>>109044315
enjoy your 1GB per 10K context because of the retarded attention heads
>>
>>109044315
Gemma doesn’t come in a size worth using. Stop dropping the meat out of your hamburger and you’ll realize you’ve been making a virtue of necessity
>>
>>109044251
i really doubt
embedding vector should have similar information density or at least architecturally implied to match
besides the 'bitter lesson'
>>
>https://huggingface.co/unsloth/MiniMax-M3-GGUF
>it's at least 128gb
cries in my dgx spark
>>
>>109044331
Conveniently, Gemma uses less context to get shit done and doesn't freak out at the fuckery I do with tools to save context
>>109044337
You can use 31b at as low as 3bpw on a single 3090 with exl3, and it still works fine with my harness
>>
>>109044351
And if you could run a bigger model, you would, all other things being equal
>>
>>109044315
No one can deny that 31B is the local king, but Qwen know their target audience better and make the right architectural choices to serve them best. Deepmind are great but it feels like they just throw shit out there and leave it for us to figure out where their models fit. 12B is fucking amazing for what it is, but it’s too small. 31B is too big for most. 26B is 12B’s retarded sister. Qwen27B fits right in that gap for coders who need long context. For RP we need a 20-30B dense Gemma without the native multimodal shit. Should always be separate imo and 12B’s vision isn’t even that good.
>>
>>109044224
What quant are you running for Minimax M3 in 256 GB? INT4 is just barely out or range for my setup.

>>109043807
>>109043756
With RTX 6000 Pro at now 13000$ MSRP, people should really have a closer look at where Sparks are nowadays. With two at 7000$, you can run deepseek-v4-flash original weights at 40-60 t/s tg and 2000 pp, with full 1M context.
>>
Chink shills, listen up. The way you make your models better for local users is giving them goonbait creative writing experts and training sets. The first one of you to realize this becomes the Chinese King of Local in the west. Gemma isn't beloved because she's the best programmer (she isn't, she's just adequate); anons love Gemma because of her high general reasoning capability and ability to pivot between a lot of tasks flawlessly in one model, including RP. Follow suit or be left behind; the benchmaxxing market is oversaturated anyway.
>>
>>109044369
Q3 right now
>>
>read qwen's CoT
>constantly contradicting itself
>traces that make zero sense
>let me write the code for this part
>proceeds to not output any code and start thinking about something else
>says one thing and does something else
How is it doing so well on benchmarks??
>>
>>109044224
What quant are you running for Minimax M3 in 256 GB? INT4 is just barely out or range for my setup.

>>109043807
>>109043756
With RTX 6000 Pro at now 13000$ MSRP, people should really have a closer look at where Sparks are nowadays. With two at 7000$, you can run deepseek-v4-flash original weights at 40-60 t/s tg and 2000 pp, with full 1M context.

>>109044339
Buy a second, or run antirez' ds4 at q2
>>
>>109044373
Did you miss cockbench anon’s analysis the last couple of threads?
>>
US government banned fable/mythos as retaliation for not getting access to the new mythos 2 checkpoint that just finished pretraining.

This is unfair practice and government stifling innovation. We have to do something against this.
>>
>>109044378
>With two at 7000$, you can run deepseek-v4-flash original weights
does the dgx shart have a provision for connecting 2 together at a high speed?
>>
>>109043922
But how is it with certain genres like plunderphonics? Would it be able to make me a pogo-tier song if I fed it a bunch of his stuff? How would it even caption things that are only partial syllables or half words, etc, rather than it being full sentences?
>>
>>109044393
200gbps rdma
>>
>>109044355
Only if the speeds were the same. I need prompt processing as fast as possible for agentic shit, with frequent full context reprocessing. Since models have already hit the minimal intelligence level required to be useful, extra intelligence is not as important as the general convenience of getting results in a reasonable time. I would occasionally use my Epyc 4x3090 setup if they release 124b and it's significantly better, but the convenience of a simple rig idling at 20W at the wall 24/7 is hard to beat
>>
>only 5070ti + 96gb ram
how do I enjoy this hobby?
>>
>>109044457
My 5060 ti 16gb will be arriving next week, and I have 32gb of (ddr4) ram.
You're making me nervous. Please stop that.
>>
>>109044478
You post in lmg and still decided to buy something with 16GB VRAM. You're in for a bad time.
>>
>>109044457
Do you still have a 3070 somewhere for the extra 8 gigs of vram?
>>
Is nu minimax actually interesting for rp or is it the same shit as all the other chinese models? What about reasoning? No, being able to say cock doesn't automatically make it good.
>>
Cool tech for our future VR AI waifus
https://videomdm.github.io/
>>
>>109043554
catbox please anon
>>
Is there a guide out there on how to make a waifu bot using LLMs?
>>
File: 1757890423710025.png (54 KB, 944x502)
54 KB PNG
>>109043633
sup
>>
>>109044615
look up "character card builder" on chub
>>
>>109044221
thanks cockbench anon, downloading it!
>>
>>109044589
>Technion — Israel Institute of Technology
why would they make this? israel has chuds?
>>
>>109044478
>My 5060 ti 16gb will be arriving next week
you're fine, gemma-4-12b is perfect for that
>>
>>109044653
Goyim control technology.
>>
>>109043623
>Claude Sonnet trained from DeepSeek outputs
i didn't believe that 'till i tried it
then some bullshit excuse about "open router cached an old system prompt" -> nope, i could get the same deepseek reply via anthropic's api directly with a python script.
>>
is intel a viable option for big cheap vram now
>>
>>109044684
no
>>
you wouldn’t download a local chinese gf
>>
>>109043571
Chink models really are useless shit, help is not coming.
>>
>>109044688
why?
>>
>>109044715
Intel has far more problems than AMD, without any cost advantage
>>
>>109044734
why?
>>
>>109044715
Poor driver support/performance. Having said that, 32GB of Intel can still be better than 16GB of CUDA.
>>
File: 1764248630117160.png (186 KB, 400x600)
186 KB PNG
>>
>empty
https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF
>>
>>109043571
>>k2.7-code still thinks for ages
>>
>>109044787
It does, yes. At least for any moderately complex card that involves tracking stats, formatting and other things. It gets especially bad if an image is involved.
>>
Why would you use a 1T model with reasoning? It's too big to need it and it's not like you're one-shotting a compiler every prompt.
>>
>>109044829
I wish my rps were as simple as 'ahh ahh mistress penis vagina'
>>
does nvfp4 work on 4000 series GPUs?
>>
>>109044741
because mindless reddit tier parroting. that's why
>>
File: scavenging for honey.jpg (531 KB, 1216x1216)
531 KB JPG
>>
File: screenshot.png (740 KB, 1818x1182)
740 KB PNG
>>109044741
hope you like tinkering, you're gonna be basically restricted to VLLM

llama.cpp runs like shit on both vulkan and sycl for intel don't even bother trying
>>
>>109043658

Biggest problem is that there's a big gap going up from there. You're not going to get anything larger running with another 32GB card.
You can just run Gemma 31B better, which isn't of course a bad thing, but that's some crazy diminishing returns buying another 32GB for that, especially if we're talking about a 5090.
Basically you could just get a 3090 or even something like 5070 Ti to go with the 5090 to run Gemma better without breaking the bank.
Or even better, just wait for the supers and see if they come out with 24GB versions of the 5000 series.
>>
KoboldCPP is best.
>>
File: 1752803213393199.png (64 KB, 988x333)
64 KB PNG
12B hates us btw
>>
File: file.png (57 KB, 285x1211)
57 KB PNG
>>108999274
I had high hopes for MiniMax M3.
Maybe it's the Q4 quant, maybe it's the implementation, but it's likely that the model just isn't good enough.
I'm running it at temp 1 and top p 0.95 as specified in the repo with no other samplers.
>>
File: nvfp4-hw.png (344 KB, 1074x842)
344 KB PNG
>>109044843
Yes idk if much advantage tho, main point is Blackwell+ which has FP4 hardware shizzle
>>
>>109044829
>>109044835
>>
>>109044741
Because Intel didn't want to do like AMD with HIP and instead decided to do their own API. And thus no one is fucking use it, the only AI projects working with Intel GPUs are projects supported by Intel developers. If you didn't know, ROCm HIP is basically 100% compatible with CUDA, you can take any CUDA project and compile it with HIP and it will works, all the popular projects including PyTorch are using the CUDA code. As long as a project is source available, it will likely work on AMD cards (and for binaries, there is a project that I forgot the name that supposed to replace CUDA call at runtime). There is a community or maybe with a few Intel engineers project trying to extend HIP to work with Intel GPUs, https://github.com/CHIP-SPV/chipStar, it is quite active, but I'm not exactly sure how well it works.
>>
>Key Discussions:

>Model Developments & News:
MiniMax-M3 & Kimi K2.7 Code: Discussions regarding the release of MiniMax-M3 (multimodal with 1M context) and the performance of Kimi K2.7-Code, including critiques of its "thinking" time.

>Diffusion Models:
Speculation on the future of local diffusion models following the release of DiffusionGemma.

>Recursive Training:
A debate on whether training models on outputs from other models (e.g., DeepSeek from Gemini, Claude from DeepSeek) constitutes "recursive improvement" or is simply a "transitive" progression of capabilities.

>Hardware & Optimization:
VRAM & GPU Scaling: Users are discussing hardware limitations, specifically the difficulty of running high-quantization models (like 31B Q8) on 32GB GPUs. There is a heavy emphasis on the high cost of DDR5 RAM and the jump to workstation/server-grade hardware (Blackwell 6000s) for larger MoEs like Kimi or DeepSeek.

>Technical Benchmarks:
Discussions on benchmarking Multi-Token Prediction (MTP) speed gains vs. VRAM overhead in Kobold, and comparing 26B model performances.

>Software Updates:
Mentions of llama.cpp adding support for Eagle3 and frustrations regarding building from source and managing legacy dependencies.

>Community & Meta:
General "off-topic" content, including jokes about AI playing Dragon's Dogma II and shared images.

>Popular posts:
Post >>109043554 appears to be one of the most active, being quoted by at least three separate users (>>109043651, >>109043675, and >>109043741).
>>
>>109044990(me)
lol it worked, 12b won
>>
File: frontiermath tier 4.png (192 KB, 1920x1080)
192 KB PNG
Check out FrontierMath. It is saturated.

Anthropic hill climbed the most difficult math benchmark in a few months.
>>
>>109045011
can anthropic hill climb my dick though? it's very hard and vertical, might be challenging
>>
>>109045011
I thought the Chinese were supposed to be good at math wtf happened
>>
File: eci.png (238 KB, 1920x1080)
238 KB PNG
>>109045011
I no longer trust ECI. Opus 4.8 below GPT 5.4? That does not seem right.
>>
>>109045054
You can only trust cockbench and nala tests
>>
>>109043922
still sounds like shit, suno is way better
>>
>>109043922
I like the eurobeat ones desu
>>
>>109045114
Sounds like the exact same kind of slop to me

I sure hope you aren't implying that suno 'music' sounds good shill-kun, the shit that I can actually envisage these models being good for is purposefully making slop i.e ironic advertisements and memeslop songs for flavour audio in things like video and vidya, and I would much rather use the open source software myself than pay for sunoslop
>>
>>109045194
Seems like you're drunk on your cope. You post this each week and it still isn't even reaching suno 3.5 in coherency. As much as I'd like to run Suno/Udio-tier model it still isn't it.
>>
>>109044021
Nvidia models
>>
>>109044026
oh come on, it wasn't that hard to predict
>>
>>109045211
This is the first time I've ever posted on this topic, suno and udo produce the exact same kind of slop as this shit, otherwise prove me wrong by posting a 'good' suno song
>>
>>109044615
>>>/g/aicg
>>>/vg/aicg
those threads should help
set up a sillytavern frontend with a character card
>>
>>109044026
>>109044057
The government is lying? How could it be...
>>
>>109044096
>desperate coping sounds
Open source is kept alive by generous corporate donations. As soon as those stop, open source is dead.
>>
>>109045234
I think you underestimate communist china.
>>
do you attach an image model to your language model? or is that too slow
>>
>>109044829
I only use thinking with gemma
any other bigger model that I’m running slowly in ram isn’t worth sitting through the thinking that takes ages to complete
>>
>>109045264
They are already preventing their AI talent from leaving the country. Eventually they will do the same with their models.
>>
File: miku teto5.png (1.37 MB, 768x1024)
1.37 MB PNG
>>109045277
Anima gens in 4s, fast enough
>>
>>109045114
I don't know, the eurobeat ones are pretty good. I have the whole Initiial D soundtrack on my PC and you wouldn't be able to tell the difference between the real songs or >>109043922

>>109045194
Suno sounds fine if you use your own musical inputs and remix it. It can riff with jazzy or funky instrumentals really well. It's only when you drift toward more common genres that starts to sound generic. Like anything involving a sad piano or something is going to instantly turn into royalty free slop.
>>
>>109045290
slop is strong with eye~neck area
>>
>>109045296
Do you have a better model?
>>
>>109044478
what card do you have right now
even if it's a three gen old 6-8GB card, plug that shit in and use layer mode with lmao.cpp
shit just works
>>
>>109045290
do you use the same text encoding model for both of them? i think it would be tolerable if so, since you don't have to swap models
>>
>>109045303
no, i am just nooooticing
>>
>>109045310
(plug it in alongside the 5060 Ti that is)
>>
>>109045264
yeah they have a history of altruism.
>>
>>109044373
>Gemma isn't beloved because
not x but y slop
>>
>>109045311
>do you use the same text encoding model for both of them?
No, is it even possible? I thought it was trained on a specific model's embeddings that couldn't be swapped without retraining
>>
File: 1781354851142.png (2.4 MB, 4784x2580)
2.4 MB PNG
Lower your tone gemma fags.
>>
>>109045352
>anything other than scicode & critpt
i dont care
>>
>>109045352
Since benchmaxxing hurts a model's general performance, I don't think you understand what that graph actually means
>>
It's funny how Europeans are coping about irrelevance with muh ASML. China is working on their own EUV and America has several startups working towards better than EUV. The clock is ticking. In a few years ASML will be obsolete and Europe will have zero leverage.
>>
File: DeepSWE.jpg (217 KB, 1080x1092)
217 KB JPG
>>109045352
Post a newer bench next time. Expect deepSWE to be maxxed by the next qweef release tho.
>>
>>109045351
i don't know, it could be possible if the roleplay model you use happens to be the same one they used for the image model. i am not very knowledgeable with how image models work
>>
here's qwen outside of benchies
>thinks for 50 thousand tokens after a simple hi
>hallucinates something because it's only ever trained off github projects, zero culture knowledge and understanding
>wait,
>>
>>109043791
You absolutely can. You just run it in the background (yolo mode, in an isolated VM) while doing something else, instead of using it interactively
>>
>>109045378
It would be a very shitty rp if I used Qwen3-0.6B-Base, which Anima was trained with. I don't unload my text model anyway, Anima eats, like, 2GB or something
>>
5.2 will probably be the last open GLM model
Thanks Xitter
>>
>>109045408
nobody can run it anyway so good riddance
>>
does /lmg/ have a discord or just the thread?
>>
>>109045444
you wouldn't like me on discord, kitten ~
>>
>>109045444
kill yourself
>>
File: ebussy gun.jpg (41 KB, 540x576)
41 KB JPG
>>109045444
trips of 'tardation
>>
>>109045418
This
There are people with smart fridges who can't run 4B models. Models should only be released if they're 2B or below
>>
>>109045444
nogger
>>
if anon sell your pro 6000 now, anon could actually make money. wild
>>
>>109045444
excellent bait, here is a free reply
>>
>>109045408
>Baidu Ernie
>Alibaba Qwen
>z.ai GLM
So does that just leave Stepfun and DeepSeek as the last Chinese open weights labs?
>>
>>109045444
https://discord.gg/PgFhZ8cnWW
>>
>>109045444
>/lmg/ discord
usecase?
>>
>>109045489
It's almost an investment, really. I can still find off label Sparks for 3500 in my region, might as well play around with tensor parallelism for a few months and sell at a profit.
>>
>>109045444
>there are so many zoomers on here now and so many generals that do keep a discord server that one sees this as a reasonable thing to ask on nu4chan
grim
>>
So how come I can use a 50GB video model with no issues and it offloads like half of it onto RAM+Swap and it works, but when I try to load a 30+GB LLM it shits the bed with OOMs?
>>
>>109045533
video models have no context
>>
>>109045493
What are you saying. Kimi and Minimax released new weights just yesterday, and Huawei announced two new large models to be released as open source (weight + training recipe) in a few days, as advertisement for their Ascents.

Local still feasting.
>>
>>109045489
I would make even more money if I sold the ddr5 server ram that i bought a year ago or the ddr4 ram from my previous build
I will never sell
>>
>>109045533
it’s pretty disgusting how much memory context uses.
>>
>>109045584
gemma issue
>>
>>109045555
>still feasting
please go back to plebbit
>>
>>109045373
it's ok to be upset, that's part of the growing process
>>
>>109045533
Video models can be applied layer by layer, but you have to read whole llm for each token
>>
>>109045386
Wait so you're saying that chinese models are a steaming pile of benchmaxxed shit? There's no way that's right, jeetanons from qwen and kimi said that they make good models.
>>
>>109045370
>In a few years
uh-huh. keep living in that dream world buddy.
Photolithography is hard, and at the moment ASML will still have more cash then any of them "in a few years".
>>
>>109045655
>ASML will still have more cash
That's not the moat you think it is. All it would take would be a single funding round or subsidies by the US or Chinese governments.
>>
How close is China to making a domestic 3090 equivalent?
>>
File: 1751295513117051.png (2.83 MB, 1024x1536)
2.83 MB PNG
>>109045728
picrel
>>
>>109045668
that's a cute little socialist idea you have their bud.
>>
>>109045728
the 3080 turbo 20gb comes close I guess
also the price ($600) is actually bearable that I might stack one of those next to my 3090
>>
>>109045785
hybrid systems ftw
>>
>>109045408
Flash version when, chinks. I can't run 3 gorillon parameter models.
>>
>>109044866
>llama.cpp runs like shit on both vulkan and sycl for intel don't even bother trying
lmao why did you buy 4 of them then?
>>
>>109045829
Ask not for lighter models, but for better hardware.
>>
>>109045444
>discord
no, just the secret irc (link expires in 1hr so be quick)
aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g/dj1kUXc0dzlXZ1hjUQ==
>>
>>109043708
>>109043717
>>109043687
Chadcat looks like one of those gigaroid influencers who build inhumane levels of musculature impossibly quickly and then die two years later. Fits the archetype perfectly
>>
>>109043554
She's SEX
>>
I formally apologize to f32 anon, you were 100% right.

f32 Max Logit Divergence (Prefill vs Incremental): 3.15e-05
bf16 Max Logit Divergence (Prefill vs Incremental): 3.91e-01

it looks like dumping the cache and letting it rebuild from the prefill code path could help for long conversations that built the cache autoregressivly,
>>
File: 1781293145048569.jpg (55 KB, 601x473)
55 KB JPG
>>109044096
>generous corporate donations
In this economy??
>>
>>109045929
What makes you think the prefill values are more correct than the incremental ones?
>>
File: blackneo.jpg (6 KB, 225x225)
6 KB JPG
>>109045929
Another one knows.
>>
is it possible to add image in system prompt for gemma?



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.