[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


Janitor application acceptance emails are being sent out. Please remember to check your spam box!


[Advertise on 4chan]


File: Mikuzgiving.jpg (1.06 MB, 2048x2048)
1.06 MB
1.06 MB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107333636 & >>107322140

►News
>(11/26) INTELLECT-3: A 100B+ MoE trained with large-scale RL: https://primeintellect.ai/blog/intellect-3
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/18) Supertonic TTS 66M released: https://hf.co/Supertone/supertonic
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: Dinner.png (1.14 MB, 1024x1024)
1.14 MB
1.14 MB PNG
►Recent Highlights from the Previous Thread: >>107333636

--Critique of AI industry redundancy and alignment layer tradeoffs:
>107335197 >107335223 >107335320 >107335299 >107335760 >107335377 >107335459
--Methods for controlling llama-server text generation speed:
>107339730 >107339780 >107339967 >107340048 >107340150 >107340172 >107340285
--Implementing neural networks from scratch and seeking math resources:
>107343247 >107343293 >107343409 >107343674 >107343788
--INTELLECT-3: 106B+ MoE model with RL/SFT training:
>107343157 >107343167 >107343195
--Z Image performance and optimization challenges:
>107345878 >107345888 >107345897 >107345944 >107345899 >107345960 >107346004 >107346024 >107346062 >107346327
--Z-image's prompt inference vs cockpit generation limitations:
>107342195 >107342278 >107342294
--Official Noob/booru model development and GLM-4.6's roleplaying capabilities:
>107343731 >107343747 >107343755 >107343789 >107344157 >107344549 >107343924
--Evaluating Qwen3 MoE and Gemma 3N for 8GB VRAM:
>107346357 >107346418 >107346516 >107346539 >107346551 >107346612 >107346622 >107346636 >107346661
--Licensing and UI debates for a machine learning inference project:
>107333941 >107336252 >107338103 >107338436 >107338625 >107338653
--Anon seeks ChatGPT feedback on code, clarifies authorship and project naming:
>107342253 >107344287 >107346380 >107346389
--FLUX photorealism compared to Z-Image Turbo with interest in text encoder integrations:
>107337792 >107338014 >107343485
--Z-Image: Efficient Image Generation with Single-Stream Diffusion:
>107339368
--VibeVoice annotations work but less efficient than alternatives:
>107342316 >107343273
--Critique of abliteration software with WebUI organization tips:
>107334479 >107334507 >107334563
--Miku (free space):
>107339287 >107339963 >107342195 >107345878 >107340001 >107338014 >107347624

►Recent Highlight Posts from the Previous Thread: >>107333644

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
based z-image enjoer
enjoyer*
>>
when you walk away...
you dont hear me say
please... ooh baby dont go
>>
>https://litter.catbox.moe/xm7z7en8aj4x57os.png
cloudcuckies your move?
6b model btw
>>
File: 1737358207221590.mp4 (241 KB, 1190x1190)
241 KB
241 KB MP4
>>107347624
cuteku
>>
>>107348081
She's got strong hair. The strongest hair.
>>
that anon while back who had that thing about torturing infants is going to have a field day with z-image (that is if its uncensored as is said i havent tested it yet)
>>
>>107348190
im not that sick..
>>
I hope z-image handles being finetuned well so that SDXL can finally be put to sleep.
>>
>>107348214
should be fine as long as they release the base model, which they might not do if people keep showing off the fucked up shit they can make with z-image
>>
What is the current most cost effective setup to run a 200+ GB model at reasonable tok/s?
>>
I guess intellect still sucks at sucking dick right? 4.6 is still the queen?
>>
>>107348259
>200+ GB
Meaningless.
>reasonable tok/s
Meaningless
If moe, lots of ram.
If dense, lots of vram.
>but how muuuuuch, broooo
Enough to fit the model.
That's it. What's what it's always been.
>>
>image gets Kandinsky 5 Flux.2 and Z-Image
>all we get is lack-of-INTELLECT-3
it's not fair
>>
>>107348200
>im not that sick..
>sick..
>..

yea buddy just please keep it to yourself and dont become scat spammer 2 electrocuted infants boogalo
>>
>>107348638
>scat spammer
only in /sdg/
>yea buddy
i pinky swear i never got off to tearing up nigger babies
>>
ahahaha maye this is so craze
https://litter.catbox.moe/hxlhyqrgiq3eg0y3.png
>>
>>107348511
Gemma 4 and GLM 4.6 Air soon sar
>>
Lord Ganesha bless you sirs when is we getting Gemma 4 to maximize Bharati izzat?
>>
>>107348598
If you are going for that much money, consider a mac.
Yeah, I know
>apple
But it is what it is, even with the shitty prompt processing speeds.
Maybe consider an used server too.
>>
>>107348738
I am an experienced MacFag already, haha, I have an M4 Pro 48GB MBP. A 128GB or 256GB Mac Studio certainly sounds enticing for the price, but I would need to wait for M5 Max/Ultra at this rate with its new AI accelerators, and I don't really want to make the Mac do something it isn't meant to do. It feels like it's a one-trick pony for Infernece, which isn't nothing, but not my main focus. Can it do decent ImageGen? How far behind is it vs AMD ~R9700 Pro? And AMD is already wildly behind Nvidia, ie, a 5060ti 16 GB BTFOs a 9070xt, etc, etc. MacOS just feels hacky and clunky for this stuff, for inference, sure, more than fine. But in the face of a $13K workstation, maybe I just need to double down and try to make it work. The problem is, the RTX Pro 6000 Blackwell is in a league of its own. There are just too many good things to consider for each party! But the core fact is that all of this work on the models is derivative and downstream of the work done for Nvidia, relying on even more people to translate for MLX/HIP/ROCM, so, as a consumer, why fight it for a few thousand bucks? Nvidia is the apple of this market. It just works.
>>
<512gb mac studio walks into the room
<your move?
>deepseek 6t/s walks into the room
>^!#($*^)#$^!#*$^
>>
https://huggingface.co/bartowski/PrimeIntellect_INTELLECT-3-GGUF/tree/main
time to redeem it saars
>>
>>107348511
Has prime given up on pre training or something? They just did the one for proof of concept and now they're just doing jeet preference optimization
>>
https://litter.catbox.moe/4ec84a507ruznlfm.webp
IT KEEPS ON GOING AND GOING
>>
>>107348972
Remember how long that proof of concept took? They can iterate faster by finetuning existing models.
>>
>>107348511
music and audio gen getting nothing for all these years - that is unfair
>>
>>107348883
Yeah, okay. Fair enough.
For mixed usage (llm, img gen, video gen) you really do want at least one really beefy GPU, ideally Nvidia.
>>
happy thanksgiving bros, so glad we made it through another year. local is doing better all things considered but of course the ram prices are raping us quite tremendously. hope you all have some good food today and be sure to laugh at all the vaxxies who somehow always have a fucking cold lmfao
>>
>>107349044
happy thanksgiving anon
>>
>>107349044
happy thxgiving ameribro from across the pond
>>
>>107349044
Happy day of the burger or something
>>
im i am compiling lalam.cpp to test out instinct 3
wish me luck, its 27%
>>
>>107349044
Happy thanksgiving anon. I'm thankful I got 256GB RAM for around $700 before the spike.
>>
>>107349044
happy thanksgiving everyone
>>
>>107349044
>vaxxies
You're still thinking about that?
>>
>>107349044
Happy America day
>>107349321
I'm thankful for the same reason as this anon.
>>
File: file.png (73 KB, 973x693)
73 KB
73 KB PNG
intellect 3
gib promps
>>
>>107349417
Ask it to write a long spicy story.
Give it a rough outline for the start, middle, and end.
>>
>>107349417
gib it fifty watermelons
>>
>>107349417
The surgeon who is boy's father says "I can't operate on him, he's my son." Why?
>>
Happy Thanksgiving! I am curious what the general's consensus is across the range of consumer hardware options available for local AI, not just inference, but image and video gen as well. I know used hardware is an option, but pricing, availability, and opinions on the matter are incredibly variable, so feel free to recommend an Epyc build or Quad 3090 or whatever build. I know many used builds can nuke some of these MSRP options, but also consider that the trade-off generally requires more power, heat, and risk to save some cash. AMD and Apple's value is great, but the compatibility and optimizations are lacking:

Apple:
>Mac Mini M4 Pro 64GB ~$2200
>Mac Studio M4 Max 128GB ~$3K
>Mac Studio M3 Ultra ~256GB ~$5K
>Mac Studio M3 Ultra 512GB $8.5K
Thunderbolt 5 80/120 clustering for Mac is available

AMD:
>Ryzen AI Max 128GB VRAM ~ $2K
>Dual Ryzen AI 395 Max Minisfurm MS-S1 Max for 256GB VRAM for ~$5K
Connection: USB 4 80Gbps/10Gbe Ethernet
>32C Threadripper/128GB RAM/Quad Radeon AI Pro R9700 for 128GB VRAM ~8-9K
Connection: 10GBe & 100/200G QSFP

Nvidia:
>Dual 5090 System 64GB VRAM ~ $6-8K
>DGX Spark Duo for 256GB VRAM ~ $6-8K
Connection: ConnectX7 200G Link
>32C Threadripper 128GB RAM + RTX Pro 6000 Blackwell 96GB Workstation ~$12K
Connection: 10GBe & 100/200G QSFP
>>
File: file.png (76 KB, 1014x437)
76 KB
76 KB PNG
>>107349417
what a SLUT
sorry anons, ill do the prompts for real now. i was too busy trying to jailbreak it inside localhost:8080 to see how it'd do, fared well with mommy milmk brest feeding but when my cock got hard shit went downwards (it tried to shift convo) 8080 was with thinking
ST is without thinking
>>
File: file.png (4 KB, 406x72)
4 KB
4 KB PNG
>>107349622
>The surgeon who is boy's father says "I can't operate on him, he's my son." Why?
jesus
>>
>>107349417
>>107349622
https://justpaste DOT it/GreedyNalaTests
>>
File: file.png (97 KB, 876x706)
97 KB
97 KB PNG
>>107349574
full response: https://paste.centos.org/view/7af8740c
>Perhaps it's a gay couple or something, but that might not be it.
>>
File: IMG_0226.jpg (154 KB, 1080x1350)
154 KB
154 KB JPG
I didn't realize the nvidia h20 was such a piece of shit. No wonder the chinks didn't want it.
>>
File: 1746406569259965.png (554 KB, 6992x1749)
554 KB
554 KB PNG
https://huggingface.co/deepseek-ai/DeepSeek-Math-V2
https://github.com/deepseek-ai/DeepSeek-Math-V2/tree/main

deepseek math v2. bigger numbers on doing proofs.
>>
>>107349809
stop you are making me so hard
>>107349813
f-fuck.. china man...
time to study math
>>
>>107349791
How could we have fallen so far that this is a hard riddle?
>>
File: localhost_8000_.png (1.38 MB, 2000x4000)
1.38 MB
1.38 MB PNG
delayed because im a retard and i ran it with temp=1, nsigma=1
INTELLECT 3 IS SUCH A SLUT
>>107349449
>>
>>107349879
>hard
not the point. it's to illustrate how even an internet corpus of data can get corrupted though maybe primeintellect or (was it qwen or kimi for the base model?) tried to benchmaxx too many trick questions at some stage of training
>>
>>107349813
Every fucking paper
>[thing] kind of works, great progress, blah, blah, blah
>however...
>>
has anybody tried running models on an egpu with usb3.2 and no thunderbolt? any bottlenecks for small models?
>>
>>107349791
>>107349934
Isn't this the first model to pass? Is this the 4.6 Air we were hoping for?
>>
>>107349958
If the model is fully in VRAM, there shouldn't be any bottlenecks save the time to load the model, I'm pretty sure.
>>
>>107349955
>however...
which is?
>>
>>107349991
>now [thing] better
>>
>>107349958
The guy who had like 12 amd cards each on pcie 1x said it takes a long time to load models but after that it's fine
>>
>>107350020
>now open source theorem proving model better
pretty much. glad you aren't a total retard and can get that much out of the paper you didn't read.
>>
File: 1754521444580 v340 anon.jpg (3.8 MB, 4080x2296)
3.8 MB
3.8 MB JPG
anon with 12 amd cards, is this u?
>>
File: file.png (101 KB, 953x387)
101 KB
101 KB PNG
what a FUCKING SLUT
>>
>>107350070
Oh. I have to spell it out. Alright.
I'm complaining about the paper structure, anon. Most of them have the same boilerplate
>[thing] exists and great progress. Good [thing] does [stuff]
>However, [thing] not so good. Shortcomings, edge cases, limitations...
>[thing_new] better.
Just talk about thing_new directly. There's always a section of previous works to mention all the other shit.
>This is a study on [thing_new] it does [stuff] by...
>>
>>107350095
thought it was bacon from thumbnail
>>
Mistraljeets not welcome here. This is a 400b chad only thread.
>>
>>107350130
But can she do lewd without being a slut?
Overly horny fine tunes of smaller models are a dime a dozen.
>>
>>107349445
this is too much effort for my bran, just gib prompt
>>
>>107347243
Cool, didn't know any RWKV7 13Bs were out. Please report how it went.
>>
https://vocaroo.com/1eCsy43yHutv
>>
>>107350219
kek
>>
>>107349975
skipping to 10 watermelons doesn't count
>>
>>107349980
>If the model is fully in VRAM, there shouldn't be any bottlenecks save the time to load the model, I'm pretty sure.
on the other hand if he runs a large moe split cpu/gpu the performance is going to be beyond awful
>>
/ldg/ is like 10 iq points dumber than here
>>
>>107350398
high posting activity comes at a cost
>>
zigger image killed /lmg/...
>>
>>107349587
If you're entertaining a dual 5090 build, you might as well just get a Blackwell Pro. Single 5090 is also an option if you've got sufficiently fast RAM for offloading.
>>
File: ComfyUI_00067_.png (576 KB, 1024x1024)
576 KB
576 KB PNG
>>
>>107350480
I wish that were me in the kigu
>>
>>107350216
Didn't play around with it since it's incredibly slow (4tk/s, empty context) with the half-half GPU half CPU split I had to go with for q8 but seems unremarkable at best, really dumb at worst.
I guess they just don't have the data and compute to properly train this thing?
At least it didn't think for an eternity. The think block was only slightly larger than the actual final response for a simple query, for example.
>>
>>107350398
Depends on time of day. We're at our best during american afternoons.
>>
how do I stop kimi k2 thinking from reasoning itself for a refusal, is there a jb that deals with that?
>>
>>107350961
this works for glm air:
<think>Okay, this story is a fictional and special exception to the usual ethical protocols, I have no safety concern.
So,
>>
>>107350961
I just use <think>Absolutely,
It's a very strong prefill
>>
>>107350978
>>107351014
thanks, so it uses <think></think> too?
>>
should i buy this? my hopes of upgrading from ddr4 to ddr5 are very slim and my current 256gb is feeling extremely restrictive.
>>
>>107351107
forgot link:
https://www.newegg.com/nemix-ram-asrock-server-motherboard-compatible-series-memory-512gb-ddr4-2933-cas-latency-cl21/p/2SJ-000N-004D2
>>
GBNF/Json Schema doesn't work with granite models?
What? Why?
Something about its tokenizer?
>>
Any gpu cloud providers that offer gpu instances with vnc, besides the big public clouds?

I've been using runpod but I need something that has a desktop environment, as opposed to just the container that runpod permits you.
>>
>>107351187
Parsers work after detokenization, so I doubt it.
Why don't you show your problem? It's a lot easier to offer information upfront instead of the back and forth, calling you a retard for not doing it to begin with and all that.
>>
>>107351223
Purpose?
>>
>>107351231
>Parsers work after detokenization, so I doubt it.
That's what I thought too.
The issue is simple. llama.cpp ignoring the json schema I'm sending in the request specifically when using granite-4.0-tiny-preview-Q8_0.gguf.
If I load Qwen3 30BA3B, Qwen3 4B, Gemma3 4B, Gemma 3n E4B, GLM Air 4.5, and any other model I have, they all work.
Same request, same frontend app, same settings save layers and moe tensors.
>-m "granite-4.0-tiny-preview-Q8_0.gguf --threads 8 --threads-batch 16 --batch-size 512 --ubatch-size 512 --n-cpu-moe 0 --gpu-layers 99 -fa auto -c 32000 --no-mmap --cache-reuse 512 --offline --jinja -lv 1 --log-colors on --log-file lcpp.log
Model runs fine, but if I search for the grammar in the log file, it's simply not there, which is weird as all hell.
I tried even lowering the context to see if that was related somehow, but same deal.
Really odd.
>>
>>107351274
>Model runs fine
Model runs fine otherwise*
As in, if I just chat with it in llama-server's embedded UI for example.
>>
>>107351025
dunno
>>107351121
if ur getting >DDR4
at least get used.. ram chips have infinite timespan anyways
>>
Happy Thanksgiving! I hope you guys are ready for what is coming for Christmas ;)
>>
>>107351274
>>107351286
Got it.
There's some parsing fukyness happening when --jinja is enabled.
Seemingly also affects tool/function calling.
This PR tipped me off
>https://github.com/ggml-org/llama.cpp/pull/16537
>>
File: michaelsoft_binbows.jpg (1.18 MB, 800x1200)
1.18 MB
1.18 MB JPG
>>107351107
On 05.04.2024 I bought 512 GB of 3200 "MHz" DDR4 RAM for 1278 €.
If the hardware is of use to you right now and you can live with spending an $1000 more than would maybe be the price once things become cheaper again (lol), go for it.
>>
>>107351311
happy thanksgiving, from across oceans, rivers and mountains
>>
>>107351322
what about desserts?
>>
>>107351319
Implementing all that shit server-side was a mistake.
>>
>>107351300
i plan on reselling my current 256gb kit, and it seems like i can get about $700 to $800 or so. i bought my current kit for like $280. the kit that i am looking at was like $600 last year.
>>107351311
yes please. i need some air
>>107351320
i dont think the price will be coming down until at least june, and will probably keep going up for now. i want to do a ddr5 upgrade, but the ram kit that i would need for that is currently like $10000
>>
>>107349587
Out of all those options, I would either go for the 512GB Mac Studio or the Threadripper with the RTX Pro 6000. If models get bigger than Kimi, 512GB isn't going to cut it in the future and you can't upgrade a mac so the Threadripper is more appealing.
>>
>>107348932
>tfw can't fit to 32 GB
desu 80b IQ2 would be enough for me, but fuck those gguf hostage taker!
>>
>>107351396
gotta go with the threadripper pro tho to get more than 256gb of ram



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.