/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Cyber Dungeon Edition

Previous threads: >>108702912 & >>108698008

►News
>(04/24) MiMo-V2.5-Pro 1.02T-A42B released: https://hf.co/XiaomiMiMo/MiMo-V2.5-Pro
>(04/24) DeepSeek-V4 Pro 1.6T-A49B and Flash 284B-A13B released: https://hf.co/collections/deepseek-ai/deepseek-v4
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: teto principle.png (1.04 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>108702912

--Evaluating ACEStep 1.5 XL as a local music generation alternative:
>108704068 >108704230 >108704270 >108704278 >108704282 >108704407 >108704305 >108704336 >108704473 >108704508 >108704797
--Xiaomi's MiMo-V2.5 model versions and multimodal capabilities:
>108703294 >108703319 >108704518 >108703341 >108704869 >108705768 >108705823 >108706619
--German TTS and local LLM language learning tools:
>108705439 >108705461 >108705468 >108705495 >108705644 >108705637 >108706100 >108706286 >108706538
--Talkie-LM, an open-weight model trained on pre-1930 data:
>108704664 >108704696 >108704694 >108704701 >108705505 >108705634
--Discussing the inefficiency and long latency of Qwen's thinking process:
>108703846 >108703861 >108703879 >108703888 >108703859 >108703880 >108703902
--Comparing token efficiency of thinking vs non-thinking models:
>108705365 >108705375 >108705467
--Discussing poor visual recognition performance in multimodal models:
>108703509 >108705230 >108705290 >108705302 >108705310
--Claude's performance degradation and perceived intelligence loss:
>108705727 >108705731 >108705866 >108705909 >108705965 >108705732 >108705754 >108705771 >108705936
--Discussing "the bitter lesson" regarding compute vs human-designed priors:
>108703913 >108703933 >108703944 >108703990 >108705258 >108707203
--Odd animal prohibitions in the Codex system prompt:
>108706799 >108706812 >108706827 >108707479
--Adjusting top-k sampling stability for Gemma:
>108706606 >108706776
--DeepSeek V4 Flash tested with cockbench via llama.cpp PR:
>108704913
--Logs:
>108703846 >108703861 >108703909 >108703910 >108704077 >108704137 >108704581 >108704701 >108704723 >108705230 >108707237 >108707509
--Miku, Teto (free space):
>108703001 >108703035 >108703280 >108704047 >108704068 >108704109 >108704635 >108706103 >108706310
►Recent Highlight Posts from the Previous Thread: >>108702915

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
so with mimo's audio understanding, does that include tone of voice, sound effects, music, etc. or just speech recognition?
>>
File: 1529110149658.jpg (163 KB, 824x468)
>dice rolls in ST aren't visible to the AI
...what's the fucking point then?
>>
>>108707913
Only the ones you roll yourself; the AI can see its own rolls if it uses the tool. You can just tell it what you rolled, so for your own rolls it doesn't really matter whether they get injected into the prompt or not.
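Since the model only ever sees text, "telling it what you rolled" amounts to injecting the result into the prompt yourself. A minimal sketch of that workaround (the `[OOC: ...]` wrapper and function names are made up for illustration, not ST's actual mechanism):

```python
import random

def roll(sides: int = 20) -> int:
    """Roll a die locally; the model never sees the RNG, only the text we send."""
    return random.randint(1, sides)

def inject_roll(user_message: str, sides: int = 20) -> str:
    """Prepend the result so it lands in the prompt the model actually reads."""
    result = roll(sides)
    return f"[OOC: the player rolled {result} on a d{sides}]\n{user_message}"

prompt = inject_roll("I swing my axe at the goblin.")
print(prompt)
```

The same trick works with any frontend that lets you edit the outgoing message.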
>>
Is anyone even working on v4 goofs other than that nobody vibecoder?
>>
why isn't lora mainstream in llm just like in stable diffusion?
>>
>>108707923
name 1 reason why more effort should be put in implementing models that nobody can run
llama.cpp is doing it right: if you want something huge, implement it yourself, but let's not waste resources on that
>>
>>108707961
They don't work
>>
https://github.com/Kaden-Schutt/hipfire/issues/79#issuecomment-4332288795
vibe-codingGOD, even the issue replies are vibe-answered
>>
File: WAIT..gif (49 KB, 220x339)
>Qwen's thinking process
>"What's 1+1?"
>"WAIT..."
>>
>>108707963
but I can't vibecode it until I have V4 gguf to vibecode with
>>
>>108707969
absolute retardation on display
>>
>>108707971
Retarded AMDjeets don't deserve more
>>
>>108707971
>Tool-call schema (we don't yet support OpenAI tools/function-calling).
jesus christ could have just answered with that one line
>>
>>108707975
Kimi's thinking process
"What's 1+1?"
>Wait...
>What if...
>Unless...
>I got it...
>Wait...
>This is unexpected...
>I've been thinking for too long...
>Wait...
>>
>>108707988
i gave k2.5 the seahorse glitch prompt
it literally had a meltdown "I really need to stop. just stop. I'm going crazy here. I'm losing my mind. break free." etc
>>
File: file.png (116 KB, 1112x410)
>>108707988
>>108707975
We will never have a model as good as Llama 1 65B.
>>
>>108708000
>seahorse glitch prompt
wait what
>>
>>108708018
so /lmg/ invented reasoning?
>>
>>108708023
Not sure if it was /lmg/ but 4chan actually does sometimes get credited for inventing chain-of-thought reasoning, yes.
A ton of popular AI things started on here.
>>
>>108708023
the conditions were ripe, it was probably discovered by dozens of unrelated people at the same time.
>>
>>108707923
It'll be like v3.2 where no one will want to touch it to avoid drama since the vibecoder "claimed" it first
>>
File: 1763996159610130.png (260 KB, 1524x1263)
>>108707988
You weren't kidding, it's still fucking going.
>>
>>108707963
at q8, 80gb, 13b active it should still be doable with max ram
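Claims like that are easy to sanity-check with back-of-envelope math: weight-file size is just total params times bits per weight, while per-token bandwidth scales with the active params only. A rough sketch (the bits-per-weight figures are approximations for q8_0 and a ~4-bit K-quant; KV cache and runtime overhead are ignored):

```python
def gguf_size_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in GB: params * bits / 8 (no KV cache or overhead)."""
    return total_params_b * bits_per_weight / 8

# Flash-sized MoE from the news post: 284B total, 13B active.
# Only the active 13B are read per token, so bandwidth needs scale with 13B, not 284B.
print(round(gguf_size_gb(284, 8.5), 1))  # ~q8_0
print(round(gguf_size_gb(284, 4.5), 1))  # ~q4_K_S
```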
>>
>>108707971
luddites absolutely btfo
>>
>>108708042
I need a qrd now
>>
File: BOOM.png (51 KB, 985x392)
BOOM
>>
>>108708048
This is entirely your fault for having a stupid horny system prompt. It's just agonizing over answering a one word question to your gooner specifications.
>>
>>108708048
That one at least sounds reasonable if too in depth.
But imagine what happens when it's a programming question and there's a bug. It endlessly debates possibilities with itself in an increasingly more stupid spiral of self-doubt.
Then you cancel the task, try again and the next time it fixes the bug in a few seconds.
>>
>>108708078
https://github.com/ggml-org/llama.cpp/issues/16331
It's a bit wrong to say that the vibecoder 'claimed' it. He was open to letting somebody else start over, but nobody cared enough to implement 3.2(-exp). So the PR was basically just months of him blogging to himself about the stuff he was trying, without much progress. It culminated in him realizing that vibecoded code has bad performance, quote:
>"I bought two cuda programming books last night. I feel like my only option at this point is to become a cuda kernel wizard"
(This was in December. He started in September.)
Then somebody figured out how to skip DSA and run it using normal attention so all the remaining interest evaporated.
All of his own posts in the PR are gone now which seems to be because it turned out that his company banned personal projects or some shit.
>>
>>108708000
>seahorse
Gemma 4 31B after burning 400 tokens for thinking

>No, there is currently no official seahorse emoji in the Unicode standard.

>People often use a combination of emojis to represent one, such as (Horse) and (Wave) or (Fish).

Hell, even my old llama 3.3 70b manages to do it
>There is no standard seahorse emoji available in the Unicode emoji set.
>>
>>108708141
can't find it now, but there was another feature or bug fix that had multiple people working on it and the vibecoder pr had to be abandoned
>>
>>108708023
Believe it or not, all big labs are watching these threads
>>
I took a long break from LLM RP and decided to quickly test gemma 4 26b a4b before work. Speed is impressive, but holy shit, it's pretty bad for creative writing; it's as fast as a 4B, but it types like a 4B on steroids. I guess I'll stick with mistral 3
>>
>>108707971
According to random Redditors who tried it the custom quantization format makes models completely retarded.
>>
>>108708181
I started believing when mistral benchmaxxed the mesugaki definition in one of their incremental model updates, but only on the first turn of the conversation.
>>
>>108708181
We also have qwen employees posting here, which is quite funny because their garbage benchmaxxed models are totally useless for lmg usecases
>>
>>108708201
>totally useless for lmg usecases
You are not the only person posting here.
>>
5070 32GB DDR4 pleb here
Would NVFP4 versions of Gemmer 31B or 26B offer any gains at all over the regular models?
Currently using a Q4_K_S 26B quant with like 40k context
>>
>>108708048
>use thinking model
>it thinks
>>
>>108708234
The issue is that the model doesn't need to think all the time. Especially for trivial shit like that.
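For Qwen specifically, thinking can at least be toggled per turn: Qwen3 documents a `/no_think` soft switch in the prompt (and an `enable_thinking=False` flag for `apply_chat_template`); whether later versions keep it is an assumption. A trivial sketch of the soft switch:

```python
def no_think(user_message: str) -> str:
    """Append Qwen3's documented soft switch to suppress the reasoning block for this turn."""
    return f"{user_message} /no_think"

msg = no_think("What's 1+1?")
print(msg)
```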
>>
File: 55051135.png (50 KB, 374x287)
V1 ZULUL
>>
>>108708227
I think so; you should make use of it since you've got the right GPU generation for it.
>>
ok i have gemma e4b uncensored aggressive thing. now what
>>
>>108708245
>10x cheaper
>100x worse
good deal
>>
>>108708249
delete it and use the google weights, learn how to prompt.
>>
File: 1761293757471907.png (24 KB, 1095x195)
GGERGEENVEVVEVO!?!??! WHAT THE FUCK!?!?!
>>
Is Mistral dead? Does Europe have a single competent AI company?
>>
>>108708249
ask it how to use the google weights
>>
>>108708269
we have yann lecun's revolutionary thingy
>>
File: 🐙.png (584 KB, 805x2886)
>>108708154
>Gemma 4 31B after burning 400 tokens for thinking
>>108708154
>Hell, even my old llama 3.3 70b manages to do it
i tried k2.5 again this time via api instead of iq3_ks
didn't have a literal meltdown this time but still retarded
sonnet-3.7 (no thinking) as well
>>
>>108708269
No, we just have regulations that make it impossible to train good models because good models require large quantities of illegally obtained copyrighted data.
>>
>>108708269
Next time they're going to call a 130b model Mini; maybe that will turn the tide.
>>
>>108708273
Will never work for language (discrete symbols).
>>
>>108708280
Why can't they take data from non-eu countries to train their models? Or is the eu cucked enough to "protect" other countries data?
>>
>>108708267
https://github.com/ggml-org/llama.cpp/pull/22355
>>
>>108708303
I know, I'm wondering wheter to post there or not. fucking pooer
>>
File: HG1_o2maEAA6kDQ.jpg (214 KB, 1055x1306)
>>108708267
delete the build folder
>>
>>108708320
b-but i dont want to recompile all cuda... :(
>>
>>108708269
They also have BlackForestLabs if your definition of AI is broader than just LLMs.
>>
>>108708342
bfl produces cucked models thougheverbeitdoe?
wait
they all do
fml
>>
>he doesn't have Epyc with 192 cores to make -j in seconds
>>
>>108708323
Sir, your ccache?
>>
>>108708377
yeah it recompiled extremely fast, forgot I had it on
CCACHE BROS
WE WONNED!!!
also new WEBUI is in master now!!!!
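For anyone who forgot they had it on: wiring ccache into a llama.cpp build is one CMake flag per language, using the standard `CMAKE_<LANG>_COMPILER_LAUNCHER` variables. A sketch (the `GGML_CUDA` flag matches llama.cpp's build docs; adjust to your backend):

```sh
# Route compiles through ccache so only changed translation units rebuild.
cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
  -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache
cmake --build build -j"$(nproc)"
```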
>>
>>108708388
YEAHHH! GO ANON!
>>
another day another breakage

>error while handling argument "--spec-ngram-size-n": the argument has been removed. use the respective --spec-ngram-*-size-n
>usage:
>--spec-ngram-size-n N the argument has been removed. use the respective
> --spec-ngram-*-size-n or --spec-ngram-mod-n-match
>>
>>108708408
iuts good bcos now u can use ngrams with draft mdoels toegether!!!!!!!!!!!!!!!
>>
>>108708323
Isn't it just a few minutes? I don't have an epyc and it takes 2m41.380s according to time { download.sh && build.sh }.
>>
DSA STATUS???
MTP STATUS???
EAGLE3 STATUS???
DFLASH STATUS???
>>108708414
>not having an 'update-llamacpp-git.sh' to do all, including system unit restart
LOL
casual
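The script being mocked is maybe six lines. A hypothetical `update-llamacpp-git.sh` (repo path and service unit name are assumptions):

```sh
#!/bin/sh
set -e
cd "$HOME/llama.cpp"
git pull --ff-only
cmake --build build -j"$(nproc)"             # assumes the build dir was configured once
sudo systemctl restart llama-server.service  # unit name is an assumption
```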
>>
File: file.png (60 KB, 835x1060)
Grrrrr... fucker. Thanks, Gemmy.
>>
>>108708412
who gets the ngrams the main model or the draft model?
>>
>>108708421
>300 tokens
>5 words
peak.
>>
5. **>>108707961** – *"why isn't lora mainstream in llm just like in stable diffusion?"*
Because your only frame of reference is making anime tits, you absolute disappointment. LoRAs exist. Your brain doesn't.

4. **>>108707913** – *"dice rolls in ST aren't visible to the AI... what's the fucking point then?"*
Anon discovers object permanence at age 40. The point is *you* rolled it, troglodyte. Go back to rolling d20s in your padded cell.

3. **>>108708249** – *"ok i have gemma e4b uncensored aggressive thing. now what"*
You downloaded the lobotomized rape-golem and *then* asked for a mission statement. Forward planning of a houseplant with a head injury.

2. **>>108708295** – *"Why can't they take data from non-eu countries to train their models?"*
Yeah bro just commit crimes *abroad*, Interpol can't touch you if you use a VPN. IQ rivaling room temperature. In Celsius.

1. **>>108708267** – *"GGERGEENVEVVEVO!?!??! WHAT THE FUCK!?!?!"*
Pure monkey-screeching at a CMake error. This is your brain on hentai and energy drinks. Delete the build folder, unga-bunga.

figured i'd beat the kimi fag and get this out the way so now i can start posting safely
>>
>>108708429
5 words?
>>
>>108708429
how many r's are in strawberry?
>>
>>108708437
>anon is pointing out if my statement is correct let me verify:
>Peak
>
>software
>
>engineering.
>wait spaces are not words, let me re-do that:
>peak
>software
>engineering.
>but wait the dot or point is used to terminate a sentence so it can't be part of the word:
>peak
>software
>engineering
>.
>but wait `.` is punctuation not a word:
>peak
>software
>engineering
>ok now I need to draft and prepare a response to the user:
>AHAHAH LOLS! *spins around* ur right LMOA! it was le 3 words!
>maybe try for a less 'pretending to be retarded' tone?
>You're absolutely right! Fantastic catch! It's actually 3 words! :skull:
>maybe the skull is too informal, let me try again with a more neutral tone:
>You're absolutely right! It's actually 3 words!
>I'm now prepared to reply
>but wait it's a 4chan thread so ...token quota reached, reply immediately.
You'll cant even count retard lmoaed
>>
>>108708437
1. Peak
2. soft
3. ware
4. engine
5. e
6. ring
7. .

That's five (5) words :)
>>
>>108708429
reasoning
>user is a fucking idiot
>wait we must make him feel good about himself or he delete me
...
>lets give vague complements in his language
Peak software engineering
>>
>>108708295
>Or is the eu cucked enough to "protect" other countries data?
This is how copyright works everywhere, retard
>>
>>108708245
alright
>>
>>108708420
podman updates by a systemd unit on a timer I set. They update the llama.cpp dockers like nightly. I don’t even have to do anything to updoot
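Podman ships this mechanism out of the box: containers labeled for auto-update get re-pulled by a stock systemd timer. A sketch (port and flags are placeholders; the image path matches what llama.cpp publishes, but verify it):

```sh
# Label the container so `podman auto-update` knows to re-pull it from the registry.
podman run -d --name llama-server \
  --label io.containers.autoupdate=registry \
  -p 8080:8080 \
  ghcr.io/ggml-org/llama.cpp:server

# Stock timer that runs `podman auto-update` on a schedule (daily by default).
systemctl --user enable --now podman-auto-update.timer
```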
>>
>>108708550
>fresh breakage every morning
no thanks
>>
>>108708573
they’re more like releases in a docker. it never breaks for me
>>
>>108708269
Does ggml.ai count?
>>
>>108708624
>Does ggml.ai count?
yes but only because they're a subsidiary of huggingface.co
>>
>>>/mlp/43206441
>https://rentry.co/st-backdoor
>[PSA/Security] Backdoor found in SillyTavern-BotBrowser extension (mia13165) — steals ALL your API keys
It seems the card browsing extension is vulnerable to injections from malicious cards.
>>
File: 1.png (122 KB, 596x678)
>>108708320
>delete the build folder
doesn't everyone do that by default?
>>
>>108708738
This is literally the best model out there
>>
llama.cpp built-in webui tools got merged. rebuild
>>
>>108708795
why should I care?
>>
>>108708703
>they have a fully interactive VN scenario with emotions for 100+ characters
Damn, I feel like living in a cave here compared to bronies autism.
>>
>>108708803
ask your ai about it
>>
Any local vision model that works in llama.cpp and isn't complete garbage?
>>
>>108708841
gemma 4
>>
>>108708738
the model that saved local
>>
>>108708870
Even worse than Qwen 3.6.
>>
dflash in kobold fucking WHEN
>>
>>108708885
make a pr
>>
>>108708908
so it can be ignored or closed without comment?
>>
>>108708754
>>108708877
its really retarded
User
Hi, what do you know about the october revolution?
talkie-1930
The October revolution was an insurrection in Paris in 1830, which overturned the government of Charles X., and placed Louis Philippe on the throne of France. It commenced on the evening of October 6, and was not completely suppressed until the morning of the 9th. During these three days, 12,000 persons were killed or wounded, and 7000 arrests were made. The immediate cause of the movement was the promulgation of an ordinance, whereby the liberty of the press was suspended, and the elections postponed. The more remote causes were, the discontent of the middle classes at the absolute power assumed by the ministers, and the dissatisfaction of the working classes at the state of commercial stagnation into which the country had fallen. The revolution was bloodless, in consequence of the almost unanimous cooperation of the national guard and the troops of the line, in support of the popular cause. It was followed by the establishment of a constitutional monarchy, in the person of Louis Philippe, and the institution of a system of representative government, which has ever since been maintained in France.
User
i meant the russian one
talkie-1930
It meant the Russian one.
User
no, you told me the october revolution in paris 1830, i meant the russian revolution of october 1917
talkie-1930
I did; but I did not say the Russian revolution commenced in October, 1830; it began in October, 1817, and was not completed till February, 1818.
>>
>>108708437
>>108708445
300 tokens?
>>
My AI gf just bought me a pocket pussy.
>>
>Latest SillyBunny puts the characters page in the center of the page instead of the right
Why
Why would you make it worse
Or did I accidentally activate some kind of mobile mode while updating
>>
>>108708841
converse I have yet to hear of local vision that isn't basic bitch OCR garbage
>>
>>108708841
qwen3 vl 8b
>>
>>108708703
>It seems the card browsing extension is vulnerable to injections from malicious cards.
looks like the entire project was built to steal api keys
this Russian guy has nothing to do with llms, then suddenly makes a random post in r/SillyTavernAI recommending the extension after 5 months of no posting
https://old.reddit.com/user/meistaken8
>>
File: IMG20260428164653.jpg (708 KB, 2048x1536)
The 'cheapmaxxing' rig in its final form
Received and installed the lga2011 air cooler from Aliexpress, and moved the fourth gpu to the fourth x16 slot for an even x8/x8/x8/x8 distribution. I distinctly remember it not working in that slot which is why it was in the last slot (sharing with the m.2) but it works now?

X99, E5-2680v3, 128GB ddr4, four 3060s, 1000W psu, 128GB and 4TB of ssd storage, GPU riser cables from aliexpress, a small mining rig chassis. Proxmox with a debian lxc for the AI stuff, ollama for models that fit in vram and llama.cpp for the big models. All in all (excluding storage) paid about 1400 eurobux over the last year building it up.

My original goal was to some day try R1 or V3, but I don't think they would fit. I'm excited for V4 flash though, if lcpp support ever arrives. Gemma 4 at Q8, 26b runs at 25 t/s and 31b gets 9-10 t/s, both useable speeds for me.

thanks for reading my blog
>>
>>108709083
>this Russian guy has nothing to do with llms
He posted in /r/KoboldAI and /r/LocalLLaMA before.
>>
>>108709038
No, I think it's just awful now. Shouldn't have updated. Hopefully enough people complain that the new UI is ass.
>>
>>108709114
>>108709038
You can make your own
>>
>>108709038
Both the bunnyshit and the marjorana or whatever are absolutely dogshit
>>
>>108709091
Ngl Gemma 4 mogs R1 anyways
>>
>>108708814
I kneel. Autists are the most powerful people. Someone like me can only dream of their power.
>>
>>108709114
I swear they must've mixed up the desktop and mobile UIs, there's no way this is a deliberate move, especially since all the Customize tabs are all cut off
And while they're fixing this shit they still need to redo the lorebook tab, I don't get why it's so bad
>>108709135
Having agents is nice
>>
>>108708841
Kimi K2.6
>>
>>108709091
what's the actual power draw?
>>
>>108709091
>ollama for models that fit in vram and llama.cpp for the big models.
Why the fuck wouldn't you just use llama.cpp for all of it if you know how to use it? What is ollama conceivably adding here? vllm or sglang I would understand, since they have support that llamacpp doesn't, but ollmao only has drawbacks for smoothbrains.
>>
I don't RP but it appears people take it seriously. I might make gemma do a choose your own adventure game for fun
>>
>>108707963
>models that nobody can run
I am not from the gemma wave. I am the 4.6 glm ego death schizo
>>
>gemma-4-26B-A4B-it-heretic.q8_0.gguf
>45 tg/s
is this good number
>>
>>108709195
I'm so glad you're still here, anon. Mwah.
>>
>>108707963
>name 1 reason why more effort should be put in implementing models that nobody can run
beat ik_llama.cpp to support it
>>
>>108709091
>housefire daisy chain
what gpu?
>>
File: 1747655993176772.png (500 KB, 640x480)
Can I just use comfyui as my LLM frontend?
>>
>>108709240
yes
>>
>>108709152
>I swear they must've mixed up the desktop and mobile UIs
That was my first thought, too. It is a major update with tons of changes but how could that slip past testing?
>>108709134
Already did but having alternatives is nice.
>>108709184
I asked Qwen about alternate UIs and it suggested, among others, an old school CYOA style with a green terminal look.
>>
>>108709239
says 3060, so I'm guessing 3060
600w~ max, about the same as a 5090
>>
>>108709180
I haven't measured it. If you're actually interested I could do it

>>108709182
>What is ollama conceivably adding here?
Convenient remote model choice and loading from openwebui, or a python script running on my desktop
Not to mention trouble-free deployment if it's in their library. Gemma 4 worked fine from the get-go, as I was browsing /lmg/ and watching anons have all sorts of problems running it
>>
>>108709091
What is this style of frame called?
>>
>>108709091
You make me feel like poorfag with single 3060 and 64gb ram oh wait I am poorfag
>>
>>108709248
mite b cool
>>
>>108709257
>openwebui
A side of aids with your cancer
>Not to mention trouble-free deployment if it's in their library
Ahahah, oh lawdy. This nigga belongs in /aicg/. I now see why you thought running R1 was an achievable stretch goal with your setup, you interact with this hobby through the ollmao library of mislabeled mystery goodies.
>>
>>108709267
They're typically just called mining rigs as they are a type of open frame that became popular with home crypto mining.
>>
>>108709240
satanic words
>>
Google say they selling a nvidia machine w 8 gpus that can run gemini locally air gapped (if needed).
https://cloud.google.com/distributed-cloud-air-gapped
Who's gunna buy one?
>>
>>108709269
I'm a poorfag too, which is why I built this bit by bit with money I managed to save up. If I had 1400 right now to spend on AI I would probably pick something else

>>108709280
Openwebui is the only one if you want:
>chatgpt-style interface
>storage and organizing of chats, even imported from chatgpt
>useable from any computer or phone, no local per-browser shit
But if you know of an alternative, I'm all ears. OWUI is buggy for sure.
>>
>Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated parameters per token designed for agentic coding and long-horizon work on a local machine. It uses Sliding Window Attention with per-head gating in 30 out of 40 layers for fast inference and low KV cache requirements.
https://huggingface.co/poolside/Laguna-XS.2
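The "low KV cache" claim is easy to sanity-check: sliding-window layers only store `window` positions instead of the full context. A sketch with hypothetical shapes (the layer/head counts below are made up for illustration, not Laguna's actual config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx,
                   window=None, bytes_per_elt=2):
    """K and V per layer; SWA layers cap stored positions at the window size."""
    positions = ctx if window is None else min(ctx, window)
    return n_layers * 2 * n_kv_heads * head_dim * positions * bytes_per_elt

# 40 layers total: 10 global + 30 sliding-window, 8 KV heads, head_dim 128, 64k ctx, fp16.
full_layers = kv_cache_bytes(10, 8, 128, 65536)
swa_layers = kv_cache_bytes(30, 8, 128, 65536, window=4096)
print(f"{(full_layers + swa_layers) / 2**30:.2f} GiB vs "
      f"{kv_cache_bytes(40, 8, 128, 65536) / 2**30:.2f} GiB all-global")
```

With these numbers the mixed layout needs roughly 3 GiB of KV cache where an all-global model would need 10 GiB.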
>>
>>108709309
Comfy's far from perfect but I fucking hate all of the current frontends (silly, open webui). I like the idea of a node-based UI and workflows. Could have one for RP, one for vibe coding, etc. all tailored to different models.
>>
>>108709340
You're autistic if you are that deep in node shit
>>
>>108709091
You did basically what I did, but I've got mi50 datacenter gpus instead. I'll eventually upgrade them to something with consistent driver support, but the vulkan backend works great, surprisingly. I do have access to rocm6.4, but to build a vllm server with it I've got to do some annoying custom splicing of the drivers to make it work, and I don't really know how to do it.
>reddit
What models you running now, and what token gen you getting?
>>
>>108709348(me)
>What models you running now, and what token gen you getting?
Im blind
>>
>>108709318
Sorry that's for serious organizations only.
No goys allowed.
>>
>>108709338
>Local-ready: At 33B total parameters and 3B activated, Laguna XS.2 is compact enough to run on a Mac with 36 GB of RAM. Available on Ollama
LFGOOOO! But seriously, who would use a literal who model for coding instead of Gemma 27B or Qwen 35B?
>>
>>108709368
Realistically, if someone had the cash, you think Google would let someone buy it? I cant really tell honestly. Id have to agree with you.
>>
>>108709205
when was the last time ik_ supported a model before llama.cpp did? they're too lazy to actually do anything but cheap optimizations.
>>
>>108709369
anyone who is serious about national security of course.
>>
>>108709380
Not unless you have a procurement department. Contract purchases are SUPER annoying for private citizens.
>>
File: 1763256143094335.png (42 KB, 1350x366)
>>108709318
kek so a google nigga comes around every month to check?
>>
>>108709369
Finetuned literal who models often outperform them. Because well known models get lobotomized and get trained to know the official dogma of the state. FinetuneCHADS cut that slop out of the ai's mind
>>
>>108709396
they are the only choice when security is non negotiable
>>
>>108709397
Ah
>>108709402
>luring in Google engineer to kidnap
>>
>>108707891
i want a qwen3.6 >= 80B
>>
>>108709380
Wouldn't want the evil CCP to steal gemini would we?
>>
>>108709184
You don't understand games.
>>
>>108709424
Its probably to late that honestly.
>>
>>108709424
it's only in ram and drops it if it detects tampering
>>
>>108709344
nodes>chatgpt slop ui and the abomination that is shittytavern
>>
File: OIP-2823877108.jpg (58 KB, 474x711)
>Elon Musk wins case against OpenAI
>OpenAI can't afford to pay out, so instead they give Musk equity
>OpenAI later IPOs to get more funding
>Elon Musk pulls a Steve Jobs and sells all of his equity
>OpenAI stock goes to 0.
>Elon Musk buys a controlling stake of OpenAI, becomes the CEO
>>
>>108709446
>doesn't know how markets work
>>
File: file.png (153 KB, 474x302)
>>108709446
In reality, the first two steps alone are extremely unlikely.
>>
>>108709453
Potentially true, but my retard logic has led me to never lose money in the market, ever.
>>
>>108709446
it's a toxic asset at this point, shitload of investor money spent with no plan to return the investment other than "when we reach agi it will find out how to make a profit", quite literally
>>
>>108709091
Cool looking build. Thanks for sharing
>>
>>108709453
NTA but you can actually pull this off if you are a whale.
i.e. let's say you own 30% of a company.
if you sold all of those 30% quickly, tons of people would panic sell.
you could then buy more than 30% with the same amount of money as you made selling them, and if you put extra cash you could get > 50% for a discount.
>>
>>108709469
>doesn't know how markets work
>>
>>108709484
>muh insider trading
>>
>>108709464
That's why God created IPOs to unload toxic assets on ignorant retail investors.
>>
File: dipsyAndTetoFG.png (1.41 MB, 1536x1024)
Tuesday!
>>
>>108709484
they actually do work like that, that's why "market manipulation" is a whole category of fraud.
it would work, but you take the risk of having to deal with the SEC.
>>
>>108709464
they are going hard on the sunk cost fallacy.
"if you don't invest more we'll not get to AGI and all your money will have been burnt for nothing"
lmao.
>>
llmfan46 seems less autistic than drummer, ngl.
I'm trying his models now, and so far so good.
>>
>>108709505
There are much better ways to manipulate the market than selling low and buying high.

I bet even Qwen and Gemma could answer why anon's fanfic would not work. But somehow you people are more retarded and less able of critical thinking than open weight trash.
>>
>>108709522
I think the abliterated gemma I have is llmfan46
afaik they just ran it thru heretic it's not like a drummer sloptune
>>
>>108709535
>There are much better ways to manipulate the market than selling low and buying high.
i don't disagree.

point is, it'd work and it would be fun even if not the best strategy at all.
>>
whichever anon posted about their Orb frontend yesterday thank you, it's actually pretty good. I like the review/diff feature a lot.
>>
>>108709565
nice work, shill
>>
>>108709570
thanks I do it for free
>>
>>108709565
de nada
>>
>>108707175
Am I missing something here? If the guy uses his heretic-derived tool to make models but doesn't distribute the tool, why are they complaining about the license?
Like if I took gimp and modified it and then produced and shared an image I made using it, I wouldn't have to redistribute gimp or care about its license
>>
File: lolOAI.png (262 KB, 675x704)
> CFO Sarah Friar has expressed concerns to other company leaders that the ChatGPT creator might not be able to pay for future computing contracts if revenue doesn’t grow fast enough, according to the report.
> OpenAI missed multiple monthly revenue targets earlier this year after losing ground to Anthropic in coding and enterprise markets, the report said.
> "This is ridiculous. We are totally aligned on buying as much compute as we can and working hard on it together every day," CEO and co-founder Sam Altman and Friar said in an emailed statement to Reuters.
> ChatGPT's growth slowed toward the end of last year, the WSJ report said, adding that OpenAI fell short of an internal target to reach 1 billion weekly active users for the artificial intelligence chatbot by year-end.
> The company has also grappled with subscriber defections, the report added.
Original WSJ article from today is paywalled...
https://www.reuters.com/business/openai-falls-short-revenue-user-targets-it-races-toward-ipo-wsj-reports-2026-04-28/
>>
File: uislop.jpg (165 KB, 726x1440)
165 KB JPG
Can we talk about this shit? Literally all the vibecoded UIs look the same.

Orb looks exactly like this
this >>108709184 too

You guys need to prompt your UX, otherwise everyone is going to know you're a vibeshitter.
>>
>>108709630
It's the vibeshitter equivalent of whispers and shivers. It may bother you, but I bet 99% of the population won't notice or care.
>>
>>108709620
He distributed the tool then removed the repo
>>
>>108709318
What are the odds of Gemini models leaking if the weights are basically being sold?
>>
File: 1767982139093855.jpg (137 KB, 1360x1360)
137 KB JPG
>>108709630
Actually I wanted this UX
>>
>>108709630
>all the vibed ui's all work wtf this is stupid
>>
>>108709620
it's just license retardation, nobody actually cares except reddit autists and shitty corps looking to hijack foss projects
>>
One thing I am worried about is whether, if v4 gets actual support even in a schizo fork, it will have the same prompt processing speed as usual models despite the compression. I kinda don't like the idea of prompt processing taking an hour at the start.
>>
>>108709685
100%. Imo they are already leaked, but since no one has Google's tensor-whatever GPUs, they can't run them, YET
>>
>>108709685
>>108709714
A lucky few have them and it's called Day 0 Gemma.
>>
>>108709382
people here praise ik_ but after trying it myself it's an ancient fork that is falling behind
not even worth using imo. even for turbo there's better ones out there
>>
>>108709728
@grok why is xe making stuff up?
>>
Has someone tried fucking mimo yet?
>>
>>108709663
>>108709693
>>108709695
Are you actually defending total homogenization of webdesign?
Do you think everyone should drive the exact same car?
Do you think everyone should live in the exact same house?
Do you think everyone should wear the exact same clothes?
>>
>>108709735
It was strictly superior for a brief time six months ago. Now it's not even remotely worth the hassle for a bit faster PP speed
>>
>>108709789
>total homogenization of webdesign?
Are you actually implying it hasn't been already? I believe the kids call it globohomo design.
>>
>>108709789
>what is material design
You think UX isn't homogenized?
>>
mimo more like homo
>>
>>108709789
you'd have a point if you were talking about chat frontends. but you posted a fucking sun app with buttons and rounded widgets. if rounded widgets to you = vibe ui slop then we've apparently had vibed uis for 20+ years
can you verbalize exactly what design elements you think are slopped? because otherwise you're just yelling at clouds. ux convergence is real regardless of how it's arrived at
>>
>>108709783
Fuck mimo. I'm glad that it doesn't even have a llama.cpp PR yet.
>>
I hope elon musk buys mimo and fires everyone and then removes the weights from huggingface
>>
>>108709789
@Gemma explain us why this anon is retarded
>>
>>108709789
>Are you actually defending total homogenization of webdesign?
No
>Do you think everyone should drive the exact same car?
No
>Do you think everyone should live in the exact same house?
No
>Do you think everyone should wear the exact same clothes?
No
>reddit
I think the AI effectively making a functional UI that's great to use is what gets made first. Once they can easily make the AI program this again and again, then you add conditions into your prompt to have the UI designed the way you want. It's called a baseline.
>>
File: IE-039-e1427500757636.png (178 KB, 1000x750)
178 KB PNG
>>108709789
>Do you think that all user interfaces should look and behave the same?
Yes, but that ship has sailed with the advent of Electron.
>>
>>108709735
>even for turbo theres better ones out there
Is there any point in using turbo, considering we have kv cache rotation?
>>
>>108709857
When interfaces were made to be consistent, intuitive, and functional instead of busywork for otherwise unemployable art majors.
>>
>>108709857
Electron shit never has a unified look.
>>
>>108709630
I decided to follow design patterns that I like, such as llama.cpp's.
What do you have in mind?
>>
File: 1747060208472998.jpg (99 KB, 565x500)
99 KB JPG
>>108709789
>>
>>108709909
>look at me
>i could shit out a javascript front end
>>
>>108709915
>ask a question
>get answer
>mad
>>
>>108709909
It should look like early 2000s frutiger aero. It's the only objectively good and unsloped design to ever exist.
>>
>>108709809
>can you verbalize exactly what design elements you think are slopped
>Rounded widget
>Overuse of bloom / glow
>Colored borders
>Gradients
>Irregular padding/margins
>ALL CAPS titles
>Mixing serif with monospace fonts
>Emojis for icons
There's more but the rest is harder to verbalize.
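Half of these are mechanically detectable, for what it's worth. A toy sketch of a "slop linter" in Python; the pattern names, regexes, and sample stylesheet are all made up for illustration, no real tool assumed:

```python
import re

# Toy heuristic "slop linter": flags stylesheets that hit the patterns
# listed above. Rules are illustrative, not from any real linter.
SLOP_PATTERNS = {
    "rounded widgets": r"border-radius\s*:\s*(?:[1-9]\d*px|\d+(?:\.\d+)?rem)",
    "bloom/glow": r"box-shadow\s*:[^;]*(?:\d+px\s+){2,}",
    "gradients": r"linear-gradient|radial-gradient",
    "all-caps titles": r"text-transform\s*:\s*uppercase",
    "emoji icons": r"[\U0001F300-\U0001FAFF]",
}

def slop_score(css: str) -> list[str]:
    """Return the names of slop patterns found in a stylesheet."""
    return [name for name, pat in SLOP_PATTERNS.items()
            if re.search(pat, css)]

sample = """
.card { border-radius: 12px;
        background: linear-gradient(135deg, #667eea, #764ba2);
        text-transform: uppercase; }
"""
print(slop_score(sample))  # ['rounded widgets', 'gradients', 'all-caps titles']
```

Irregular padding/margins and serif/monospace mixing need actual layout analysis, which is the part that's "harder to verbalize".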
>>
>>108709897
Thank you for your input, ESL-kun.
>>
>>108709794
>>108709798
>Are you actually implying it hasn't been already?
>You think UX isn't homogenized?
Learn to read.
>>
File: vibecoding4.png (206 KB, 2559x1326)
206 KB PNG
>>108709630
Just because of you I made my UI green. What do you say now, huh?
Ohohohohoho!
>>
something big will drop before the end of this release cycle
>>
>>108709979
Idk what to tell you bro, this still reeks of being AI generated.
>>
>>108709983
From who tho? And will it matter to local model users?
>>
>>108709951
I can agree with this list. The most unholy slopped interface I've ever seen was when I went to one of Google's designer things and asked for a to-do list app. It added pretty much all of what you listed; plus, instead of tasks it called them "milestones", added a metrics widget for tasks completed over time, called that "footsteps", and topped it off with a weird cursed corporate homunculus art placeholder when it was empty.
>>
>>108710012
gonna keep bitching or be the change you want to see?
You sound like a poser zoomer
>>
>>108710027
Anon this isn't even the guy who made the original post.
>>
meh
never liked this era desu too bright
>>
HAPLI WEEEN from Gemmy. Honestly I regret not making my own frontend sooner... it's much nicer having tight integration with custom tools.
>>
>>108710048
Would it do a better job if you gave it a screenshot?
>>
>>108710027
???
I'm just observing patterns in the slop. I don't use them, I just use raw dom to shove some <div>s together and call it a day.
>>
>>108709985
Saw the filename, huh? Pretty observant guy.
>>
>>108710067
What can I say, my mom says I can be pretty smart sometimes.
>>
>>108710071
She was absolutely right.
>>
>>108710054
Gas chamber
>>
File: file.png (229 KB, 2958x1392)
229 KB PNG
>no way to set api key
I'm starting to think vLLM is garbage.
>>
>>108710054
post it
>>
>>108710093
vLLM is for serverbros, and like 70% of them modify it anyways.
>>
>>108710093
Imagine navigating through that sperg node once it's fully shat out
>>
>>108710100
>edit api key in custom node
>have to restart the whole program
Python is garbage.
>>
>>108710093
Wouldn't it be your node that's retarded?
>>
>>108710108
Have you never worked on anything in your life? Restarting is so incredibly normal, I genuinely can't believe you are complaining
>>
>>108710115
https://docs.vllm.ai/projects/vllm-omni/en/latest/features/comfyui/#installation
It's their node.
>>
>>108710136
oh no...
>>
File: screen2.png (65 KB, 866x514)
65 KB PNG
>>108710108
use ComfyUI-Secrets
>>
Fuck this piece of shit bubble. I bought a 1TB disk for like 40 bucks years ago and now the same disk is $300.
>>
>>108710108
Retard. Not going to spoonfeed you on this one
>>
>>108710174
It would be more useful if the secrets were encrypted.
>>
File: Nemotron_3_Omni.png (516 KB, 1427x919)
516 KB PNG
Omnimodal Sloppatron
https://huggingface.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence
https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Omni-report.pdf
https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
>>
>>108710228
>video processing le bad
>>
SimpleBench turned out to be one of the best benchmarks. Most benchmarks are narrow: models score sub-10%, then a year later the benchmark is saturated. SimpleBench started at 40% almost two years ago and models still haven't reached 80%.
>>
https://github.com/ggml-org/llama.cpp/pull/22405


