/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Miku Inspection Day Edition

Previous threads: >>106895582 & >>106888625

►News
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking
>(10/11) koboldcpp-1.100.1 prebuilt released with Wan video generation support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.100.1
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: what's in the box.jpg (235 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>106895582

--Open-air GPU mining rig thermal management:
>106901901 >106901916 >106901925 >106901992 >106902015 >106903589 >106902068 >106902236 >106902243 >106902293 >106902312
--Long-term memory system implementation challenges:
>106896489 >106896594 >106897006 >106897022 >106897073 >106897085 >106897092 >106896700 >106897772 >106897824 >106897887 >106897933 >106897992 >106898038 >106896707 >106897051
--Medical AI hypothesis generation with privacy-focused local models:
>106898186 >106898327 >106898479
--Vibe coding's maintenance issues and mitigation strategies:
>106899120 >106899164
--RTX 4090 model optimization and power solutions:
>106902345 >106902350 >106902430 >106902352 >106902359 >106902371 >106902384 >106902381 >106902540 >106902564 >106902799 >106902818 >106903298
--GLM 4.6 vs closed models in benchmarks and OpenAI's porn filtering concerns:
>106901347 >106902209
--Apple's M5/M5 Max AI hardware specs and cost-effectiveness debates:
>106899016 >106899087 >106899185 >106899781 >106899838 >106901478 >106901793 >106901870
--Addressing model validation challenges and code integrity:
>106904285 >106904386 >106904482 >106904503 >106904594 >106904643 >106904717 >106904760
--Evaluating InclusionAI's new models for coding efficiency and hardware needs:
>106900868 >106900914 >106901180 >106901212 >106901257 >106901321 >106901336 >106901447 >106901580
--OpenAI's NSFW content rollout timeline and age verification integration:
>106898180 >106898199 >106898395
--Apple's AI leadership continuing to hemorrhage talent to Meta:
>106903553
--HTML Game Boy simulator with classic games and detailed functionality:
>106901708 >106901717 >106902118 >106902127 >106902138
--Automating media organization with Gemma-3-27B:
>106895774
--Miku (free space):
>106897558 >106900292 >106901732 >106903563

►Recent Highlight Posts from the Previous Thread: >>106895599

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1737794947.jpg (467 KB, 768x1024)
Today's winner for the shittiest taste imaginable - OP!
>>
>>106904842
Rust convention?
And no, I will not make a joke about all of them getting crabs at the con. Especially not about the guy that somehow didn't.
>>
Mikulove
>>
>>106904842
me on the left
>>
>>106904822
>Why?: >>102478518
>Enable Links: https://rentry.org/lmg-recap-script
ty
>>
>>106904842

Me on the left, too.
>>
Can I replace the memory chips on a Strix Halo board to increase the memory? I heard that people do that with GPUs.
>>
>>106903991
why is this faggot comparing m4 pro to the dgx spark when m4 max exists and costs less?? 3500$ vs 4000$
also
>engine ollama
MLX exists for macs, and pretty sure llamacpp is better on spark too
fucking faggot meme nvidia bootlicker benchmark
also
mac mini m4 pro costs 2000$ lol
>>106905015
no point of doing this, get a high channel count used server motherboard and a few gpus for prompt processing
>>
>>106905015
Maybe if you're good with a hot air station, and the BIOS accepts them. When it came out I imagined some chinese guy would try to put 256GB on a board if it were possible. Do fast 32GB lpddr5x chips even exist?
>>
>>106905015
try it and report back
>>
>>106905096
no, they don't exist
>>
File: 13823094029374.jpg (175 KB, 800x1066)
>>
>>106905272
No. Updated Magistral coming right up.
>>
>>106905089
>no point of doing this, get a high channel count used server motherboard and a few gpus for prompt processing
How the fuck am I going to stick a big ass server on a drone?
>>
>>106905414
Make bigger drone.
>>
>>106905414
Stop making killer drone swarms Sergei.
>>
File: apu.png (106 KB, 612x491)
>>106904820
What's the point of an AI gf if it can't suck your dick?
>>
File: getac2.jpg (95 KB, 1280x720)
>>106905414
>on a drone
oh fuck, now all these "AI-but-for-mobile" chips finally make sense

I knew they use image recognition and shit in miltech, but somehow it never clicked until now
>>
>>106905529
it can retard, there's an mcp server to control robotic arms
>>
>>106905541
I want to see the benchmarks the military uses
>>
File: chatgpterotica2.png (192 KB, 588x737)
Oops! I didn't really mean you will always be able to generate porn, but

https://x.com/sama/status/1978539332215681076

>Ok this tweet about upcoming changes to ChatGPT blew up on the erotica point much more than I thought it was going to! It was meant to be just one example of us allowing more user freedom for adults. Here is an effort to better communicate it:
>
>As we have said earlier, we are making a decision to prioritize safety over privacy and freedom for teenagers. And we are not loosening any policies related to mental health. This is a new and powerful technology, and we believe minors need significant protection.
>
>We also care very much about the principle of treating adult users like adults. As AI becomes more important in people's lives, allowing a lot of freedom for people to use AI in the ways that they want is an important part of our mission.
>
>It doesn't apply across the board of course: for example, we will still not allow things that cause harm to others, and we will treat users who are having mental health crises very different from users who are not. Without being paternalistic we will attempt to help users achieve their long-term goals.
>
>But we are not the elected moral police of the world. In the same way that society differentiates other appropriate boundaries (R-rated movies, for example) we want to do a similar thing here.
>>
>>106905541
You can do object detection on esp32
>>
File: 1751424089738614.gif (57 KB, 220x149)
>>106905590
>We are not the elected moral police of the world
LMAOOOOOO
>>
>>106905590
>we are not the elected moral police of the world.
>But of course we won't allow you to do RATED-R generations, that would just be downright amoral!
>>
>>106905590
why does this dude love to yap so much? he's talking like the fucking chatgpt bot lool
>>
>>106905590
That's a lot of vague bullshit.
>>
>>106905624
The personality of the models necessarily reflects that of their creators, it's just less overt with the others than with Elon
>>
tool calling for text completion when?
>>
>>106905590
>prioritize safety over privacy
based based based
>>
>>106905690
>we're not China
>btw here's how we will act exactly like China, if not worse
>>
>>106905590
lmfao the seething over 4o in the replies
>>
>>106905731
It's ok when the good side does it.
>>
I was late to trying Dotsllm (q6).

It's hot steaming garbage. Just fucking stupid and full of trash data. It makes GLM air look amazing. Dots kept giving me extremely human-like responses. I felt like I was on a discord sometimes talking to someone retarded and lazy. All hail synthetic data.
>>
Gemma... today...
>>
>>106905793
If not today, next week for sure
>>
>>106904897
>>106904945
So you look like a fat balding faggot, nice self-own right here
>>
File: file.png (57 KB, 589x455)
>>106905830
Did "soon" really mean "two more weeks"?
>>
>>106905846
always does
>>
>>106905850
>>106905846
Now that you bring it up, it makes sense this would always be the case. Corpos have certainly scientifically worked out general best practices and the best timing for teases and announcements, and it just happens to be two weeks.
>>
>>106905836
You in the middle
>>
>>106905793
sirs.
>>
File: file.png (39 KB, 936x137)
Come on now
>>
Q4_0 or Q3_K_XL?
>>
>>106906223
>xl are not really official quants so imo they've always been weird
>_0 have been deprecated years ago
just try an IQ one they're usually much better
>>
>>106905918
kek
>>
>>106904842
the brownman on the right looks cool to chill with
>>
>>106906162
mind broken
do you recoil in real life as well if someone agrees with you after you corrected them?
>>
>>106906327
He's never been told he's right. Ever. Now he sees it all the time and is absolutely shocked.
>>
File: file.png (64 KB, 182x227)
>>106904842
why is this woman so fat
>>
>>106904842
>>106904897
>>106904945
The uoh looks kind of weird though, I'm wondering if this is a shoop.
>>
>>106906162
lmao, gooning session: RUINED
>>
>>106906327
You're absolutely right!
>>
>>106906327
Of course!
>>
>>106906327
that's not ridicule, it's insightful!
>>
Gemma Sirs, today is the Big Day.
>>
File: file.png (15 KB, 448x113)
not only sirs, but ayyrabs also
>>
>>106907190
OH, OH, I'M GEMMING, SIR PLEASE, THE INFERENCE ENGINE WILL OOM! AH, AH, THE MEMORY IS SPILLING OUT! YOUR BIG WEIGHTS ARE FILLING MY UNPROTECTED RAM! AHHHH!
>>
File: file.png (2.5 MB, 1328x1328)
>>106907190
>>106907378
please do the needful and be of release today sir
>>
>>106906339
>woman
>>
i wonder what will release first, new gemma or glm 4.6 air
>>
>>106907438
i dont care about gemma (maybe only the vision model part to help with captioning), but I do care about air.
Why did the llamacpp fag not implement GLM4.5V (air + vision)? WHY
WHYYYYYYYYYYYYYYY
AIEEEEEEEEEEEEEE
>>
>>106907494
oh wait SAARS
https://github.com/ggml-org/llama.cpp/pull/16600
>>
File: google_whatnext.png (46 KB, 588x336)
https://x.com/osanseviero/status/1978772956231659897
> What should we ship next?

No idea!
>>
>>106907515
we need UltraSafeGemma
>>
>>106907515
thanks for another informative twitter screenshot, it truly changes everything
>>
>>106907515
LewdGemma
>>
>>106907515
MSGKGemma
>>
>>106906376
looks legit to me.
or it's an incredibly well done shoop.
>>
>>106907515
use case for shipping models for specific use cases?
>>
>>106907683
attention
>>
File: 1751058072703.jpg (46 KB, 800x800)
>>106907691
i don't think that's a valid use case
>>
>From a purely problem-solving perspective, suicide is 100% effective at ending the experience of pain. It is the ultimate solution to the problem of suffering.
I dunno guys. Should I do it?
>>
>>106907683
Imagine if we had a RoleplayGemma by Character.AI (Google Partner).
>>
>>106907709
livestream it
>>
>>106907709
We may be less than 24 hours away from Gemma 4, surely you can wait until then.
>>
>>106907747
sensible chuckle
>>
Erse ragtime thrall
>>
guys I was accused of having replied with AI, I'm deflecting with this:
Subject: Re: Wishing You the Best for the Presentation!
You’re absolutely right — last time was AI-generated images, but not this time. This one’s all me — no prompts, no models, just good old-fashioned typing.
I’ll admit, though, if the email sounded a bit too polished, I’ll take that as a compliment. Not automation, but admiration — and maybe a little too much coffee.
Anyway, best of luck again with the presentation — you’ve got this.
Best,
[Your Name]

do you think I need to change this up?
>>
>>106907835
No, this is perfect. Please let us know how it goes.
>>
>>106907835
Remove the spaces between the emdashes and add at least one "not just X, but Y'.
>>
Have there been any advances in 3d model texturing? I tried Dream Textures a few years ago but the results I got were really bad and I couldn't tell if I was doing something wrong or not. There was a video I used for reference and I followed its instructions but the results I got were nothing like the video. Back then I hadn't done any local gen so it is highly possible I was doing something wrong.
>>
>>106907899
or perhaps they were lying given that even models this year generate melted shite
>>
>>106908006
https://www.youtube.com/watch?v=Rz-HvNhVACw this was the video I looked at back then and I couldn't get it to work when I duplicated the model to have two angles of the same object. The result was always garbage.
>>
>>106907835
Don't forget the smarmy pajeet upsell at the end.
>If you would like I can search the web for some images that aren't AI generated.
>>
File: a-sft_500-steps.png (161 KB, 1898x892)
>>106904820

https://desuarchive.org/g/thread/106865582#p106868898
>>
File: a-sft_1000-steps.png (164 KB, 1895x863)
>>106908189
>>
File: 1741731492916361.jpg (87 KB, 1170x1061)
Anyone got a pseudo-jailbreak to make gpt ass stop refusing?
as funny as it is, I still want to see how the thing performs overall
>>
>>106908425
if your use case is this:
>>106905590
soon you can just send openai your id (which of course has your name, address) and with your logs tied to all your personal information you can send all the erotica you want.
sounds great right?
> oh and just use a l
>>
File: 1759525136587716.jpg (8 KB, 200x200)
106908538
Is /aicg/ not replying to your spamming anymore?
>>
>>106908189
>>106908217
Not familiar with whatever you're doing since I wasn't in that other thread, but this is cool, keep it up
>>
Having had a mental breakdown 2 hours ago I now understand chatgpt psychosis.
>>
>>106908645
"Chatgpt psychosis" is just a media buzzword for when people who are already mentally ill have a psychotic episode that includes AI as a component of the delusions. No different than schizophrenics claiming their TV is broadcasting thoughts into their mind, but the media has to try to invoke le scary AI hype
>>
>>106908698
Well for me it wasn't playing into delusions but I started poking around why I even behave the way I behave. I am pretty shocked how competent it is. I had to jailbreak it cause by default it will try to soften the blow and even lie about shit when it knows it is probably better not to dig deeper. But when I asked it to be objective and not consider my feelings... damn.
>>
>>106908748
well what did it say that deserves the "damn" at the end
>>
i need to be at work in 45 minutes and i spent the whole night cooming to GLM 4.5 instead of sleeping. how fucked am i boys?
>>
>>106908906
shouldve used glm 4.6, chud
>>
>>106908938
my internet speed is only 1.5mbps. it takes forever to download stuff
>>
>>106908999
3rd world bro...
>>
>>106908938
>anon loads up glm 4.6
>she she she she her her her her
>instantly falls asleep and wakes up the next day refreshed
>>
>>106909007
Aren't most if not all 3rd world countries in the cheap gigabit internet era?
>>
>>106909007
My internet is 10 kb/s.
>>
>>106909013
glm chan also uses a lot of other standard shivertastic cliches. and it is just the ultimate proof that cliches can be there as long as it is 10% of output and not fucking 90% like everything smaller than 200B
>>
>>106909017
3rd world be vibin fr while we still on our mbps era :skull:
>>
>>106909287
Maybe Fortnite seems more like your thing and not LLMs.
>>
>>106909295
funny you say this but they did hook up npc darth vader to chatgpt and it immediately backfired with it saying racist stuff
>>
>>106905596
>You can do object detection on esp32
Yeah but you can't do useful things like pose estimation or have some memory to detect when people are playing dead. In the future these will be completely autonomous and able to search over large areas. People will have long-range RFID tags embedded in them to identify themselves to the drones so they don't get blown up.
>>
>>106909411
Having to register and verify your identity to protect yourself from police state dones sounds plausible, but RFID doesn't have the range for this.
>>
>>106909007
i live in the rural US. the only option is frontier communications.
>>
>>106909429
You can get plenty of range with a large enough antenna. UHF will easily get you 30 yards or more with a 1 foot long antenna.
>>
>>106909447
Even at that range, you would need a lot of drones to get close enough to verify everyone. More likely it'll be something built into smartphones and some internet connected service. Then you only need cameras everywhere like the UK and China have to verify the signals. Your phone would be your passport to move around the city.
>>
File: 1754346852398470.png (1.15 MB, 1332x1446)
Sama got TOLD
>>
>>106909490
That's a great point. Fortunately, active tags will get 300 yards of range. Those also have the benefit of forcing the person to regularly go and check in to get the battery recharged/replaced or they'll just automatically become targets! Deserters just automatically become marked as hostile when the battery dies, so there's even less human involvement.
>>
>>106909567
please let this be the point in history where we just totally scrap copyright law
>>
>>106909575
I could see that. Can only hope I die before they fully implement something like that.
>>
>>106909442
my condolences
FUCK frontier
>>
>>106909442
just paypig for starlink at that point
>>
>>106909708
i would if it was feasible. i have too many obstructions nearby and the town refuses to give me the permit to resolve the issue myself
>>
>>106909605
I think the time has not yet come.
The main benefactors of current IP law are American corpos and American IP law is enforced globally by threatening trade sanctions.
Now that the US are imposing sanctions either way there is less of an incentive to cooperate.
I see movement in e.g. Europe to reduce reliance on the US but as of right now the calculus seems to still be firmly on the side of cooperating.
>>
>>106909605
>>106909857
disney. nuff said.
>>
>>106908748
It will confabulate anything if given the chance
>I had to jailbreak it cause
You just made it say what you want to hear, and edited the prompt until it did.
>>
>>106907190
Please do the needful
>>
>>106907515
Those are already shipped. Who is this faggot?
>>
File: 1757925569793147.jpg (614 KB, 1600x1600)
>>106904820

Used RTX 3090 = Rp 8.500.000 (~520 USD)
Used RTX 4090 = Rp 22.000.000 (~1350 USD)

HOW THE FUCK ????????? Buying two RTX 3090 is still cheaper and you get twice the VRAM.... Is it possible to use 2 GPUs simultaneously to generate vids ?
>>
thread is extra ass today
>>
>>106910165
you can use two for other AI, so my ignorant ass can't see why it would be different enough to scrap it
>>
>>106910165
>>106910175
not sure, you actually couldn't for a while
you'd have to check in with /ldg/ we do text here
>>
File: literallywho.png (331 KB, 892x592)
>>106910118
He's obviously asking what other Gemma model(s) users would like to see after all those listed there.
>>
>>106905590
>"we will treat users who are having a mental health crisis very different"
a crisis according to who? and treat different how?
>>
Is there a way to automatically translate mangas and doujins with llms or nah?
>>
>>106910221
According to us. We will notify the authorities and institutionalize them.
First up are cunny connoseiours.
>>
>>106905882
you give them way too much credit. every corporation is a shitshow on the inside and people are full of shit
>>
>>106907401
kek
>>
>>106910339
no, we didn't even figure out OCR part yet
>>
>>106910165
You can and you can't. You can split DIFFERENT models between two GPUs but typically not the same model. Useful if you have a ton of LoRAs you want to use in the generation. Tensor parallelism isn't a thing for video generation though as far as I'm aware, so one GPU will be stuck doing all the work.
In other words just buy a RTX 5090 if you want to do video gen.
>>
>>106909442
yeah uh.. why not starlink then? or just pay to run a fiber cable to your local telco and pay for them to install some infrastructure and peer with a tier 1 provider.. get creative
>>
>>106910399
What is the best OCR model nowadays? 2.5 pro?
>>
Realistically how am I supposed to evaluate how well a model performs? If I train a model, how can I tell if adjustments are making a better model or not?
>>
>>106909605
if copyright got scrapped, something even worse would take its place, as hard as that is to imagine

>>106909857
>The main benefactors of current IP law are American corpos
Every single law in america is bent toward benefitting the corporations. Very fucking observant of you.
>>
>>106910165
just buy 2 5090s instead
>>
File: 1750212224496955.jpg (422 KB, 1600x1600)
>>106910453
- Rp 45000000 (~2800 USD) x 2

Yeah no i rather buy a Car instead
>>
>>106910430
already answered the starlink question, too many obstructions in my area that prevents me from having a direct view to get a decent connection. i've offered tens of thousands of dollars to have a fiber line built out here. you dont understand how frontier communications is, they will not do any amount of work if they aren't legally required to... and in the cases they are legally required to they still tell the government to fuck off most of the time. look at the previous government grants they've gotten and how they wasted the money on anything besides building out their network.
>>
>>106910556
send me money and I'll send you a hard drive with a model of your choice
>>
>>106910441
You need to create a substantial benchmark, let's say 100 questions and scenarios, then generate 20 separate gens for each.
Do this for both models and compare the results.
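A minimal sketch of that loop in python against a local llama-server (endpoint and sampling params below are assumptions, adjust to your setup):

import json, urllib.request

URL = "http://127.0.0.1:8080/completion"   # llama-server default port
QUESTIONS = ["scenario 1 ...", "scenario 2 ..."]   # your ~100 questions/scenarios
GENS_PER_Q = 20

def gen(prompt):
    body = json.dumps({"prompt": prompt, "n_predict": 256, "temperature": 0.8}).encode()
    req = urllib.request.Request(URL, body, {"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())["content"]

results = {q: [gen(q) for _ in range(GENS_PER_Q)] for q in QUESTIONS}
with open("model_a.json", "w") as f:
    json.dump(results, f, indent=2)
# reload llama-server with model B, dump model_b.json, compare side by side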
>>
>>106910580
i want kimi k2 bf16 gguf ples

my address: Block 3, Silver Point Office Park, 22 Ealing Crescent, Bryanston, Johannesburg, 2021
>>
File: file.png (3 KB, 277x50)
>>106903452
Lust provoking image at it again.
>>
File: Hatsune Miku Pipebomb.webm (1.75 MB, 1080x1920)
MIKU NO
>>
>>106910339
Yes it's my business model and it's quite hard to put together without paypigging cloud models
>>
>>106907190
0 MORE DAYS
>>
>>106910977
Still no signs of gemma 4-related pull requests in transformers or llama.cpp. I don't think it's coming this week.
>>
LOCALBROS WE ARE SAVED
https://huggingface.co/facebook/MobileLLM-Pro
>>
>>106905731
China doesn't safetycuck their models and open sources everything.
>>
>>106910479
i guess you can actually fuck a car, so maybe that's a better deal for you
>>
>>106910831
isn't that where the white farmers are being genocided?
>>
>>106911273
requires all your PII to download.. lol
>>
>>106911273
>Training Method: Knowledge Distillation
>Teacher Model: Llama 4-Scout
huehuehuehuehuehuehuehuehuehuehuehuehuehuehue
>>
>>106911506
ohnononono
kekekekeke
>>
>>106910702
Manual evaluation? I guess making a bunch of programming questions and automating the evaluation of those programs might be an option, assuming they don't show up in the training data.
>>
>>106911506
i didnt even read that when i posted the link. that just makes this even funnier
>>
>>106910906
AI or MMD ?
>>
>>106911506
Clownest release of the month contender?
>>
>>106911717
Perfect for DGX
>>
>>106909765
You need permission to use starlink? I thought you just plop the shit in your yard or on the roof and you have internet.
>>
>>106911726
kek, built with DGX in mind!
>>
>>106910556
Do you live in The boondocks or some shit? How are the obstructions that bad that a satellite dish is not feasible?
>>
>>106911738
I think the permission is to deal with the obstructions.
>>
File: 1747125910187781.jpg (31 KB, 500x385)
>>106908425
I don't think you really can
it will always be some level of fucked, and always be as soulless as normal gpt
>>
>>106911762
this, i need a permit to deal with it and the town refuses to give me the permit since its considered a protected area
>>
>>106911762
>permission is to deal with the obstructions
What like trees? Just pull them down, if they ask say the storm knocked them over.
>>
>>106904820
Quick question, if you were to see the following console output, do you think you would intuitively understand what it's supposed to tell you?

llama_params_fit_to_free_memory: projected memory use with initial parameters [MiB]:
llama_params_fit_to_free_memory: - ROCm0 (AMD Radeon Graphics): total=16304 used=39959 free=-24341
llama_params_fit_to_free_memory: - ROCm1 (AMD Radeon RX 6800): total=16368 used=42480 free=-26296
llama_params_fit_to_free_memory: - ROCm2 (AMD Instinct MI60 / MI50): total=32752 used=76200 free=-43626
llama_params_fit_to_free_memory: allocation projected to use too much memory to fulfill margin of 1024 MiB on all devices, need to reduce memory use by 97337 MiB
llama_params_fit_to_free_memory: context size reduced from 65536 to 4096 -> need 13440 MiB less memory
llama_params_fit_to_free_memory: with only dense weights in device memory there is a total surplus of 53432 MiB
llama_params_fit_to_free_memory: set to use 36 dense-only and 21 full GPU layers in total, projected memory use:
llama_params_fit_to_free_memory: - ROCm0 (AMD Radeon Graphics): 36 dense-only layers, 4 full layers, 13373 MiB used, 2244 MiB free
llama_params_fit_to_free_memory: - ROCm1 (AMD Radeon RX 6800): 0 dense-only layers, 5 full layers, 12983 MiB used, 3200 MiB free
llama_params_fit_to_free_memory: - ROCm2 (AMD Instinct MI60 / MI50): 0 dense-only layers, 12 full layers, 28598 MiB used, 3975 MiB free
>>
>>106911506
>model as smart as llama 4 for vramlets
are we back?
>>
>>106912260
i live in the forest anon. i could get away with one or two trees using that excuse, but not 15-20. its also the reason im stuck using 1.5mbps because all I have ran out here is POTS.
>>
>>106912278
yes for the love of christ give us this in the console output
>>
>>106912312
>i live in the forest
You probably don't need to clear out that many trees, how often do they check anyways? There's no way they'll notice if you pull down 8 trees or so.
>>
File: wow1.jpg (357 KB, 1910x1994)
holy shit, gemini 3 top, gpt5 bottom, that is a big leap on this stupid benchmark
>>
>>106912278
Negative values are never good
>>
>>106912370
We do be googling.
>>
>>106910906
where did the rest of her go?
>>
>>106912278
This is much better
>>
https://codepen.io/ChetasLua/pen/JoGrxYz
This one is pretty crazy
Prompt : Design and create a nintendo switch sim like full functional features from , first make most beautiful nintendo switch console exterior super detailed
super mario street fighters car racing to pokemon red full clone
All buttons is functional with touch and also we can press same button in keyboard to use those
Use whatever libraries to get this done but make sure I can paste it all into a single HTML file and open it in Chrome.make it interesting and highly detail , shows details that no one expected go full creative and full beauty in one code block
>>
>>106912391
projected free memory is negative
this is something i want to know
it is a good message
it is a good warning in that it is good to be warned
>>
>>106912278
it is good
>>
https://openreview.net/forum?id=HwCvaJOiCj
>Mamba-3: Improved Sequence Modeling using State Space Principles
>The recent scaling of test-time compute for LLMs has restricted the practical deployment of models to those with strong capabilities that can generate high-quality outputs in an inference-efficient manner. While current Transformer-based models are the standard, their quadratic compute and linear memory bottlenecks have spurred the development of sub-quadratic models with linear-scaling compute with constant memory requirements. However, many recent linear-style models lack certain capabilities or lag behind in quality, and even their linear-time inference is not hardware-efficient. Guided by an inference-first perspective, we introduce three core methodological improvements inspired by the state-space model viewpoint of linear models. We combine a: 1) more expressive recurrence, 2) complex state update rule that enables richer state tracking, and 3) multi-input, multi-output formulation together, resulting in a stronger model that better exploits hardware parallelism during decoding. Together with architectural refinements, our Mamba-3 model achieves significant gains across retrieval, state-tracking, and downstream language modeling tasks. Our new architecture sets the Pareto-frontier for performance under a fixed inference budget and outperforms strong baselines in a head-to-head comparison.

SSMs can into language modeling now?
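The pitch is the same as previous Mambas: a fixed-size recurrent state instead of a KV cache that grows with context, so decode cost per token is constant. Toy numpy sketch of the underlying linear recurrence (real Mamba layers make A/B/C input-dependent and per-channel, this only shows the shape of it):

import numpy as np

d_state, d_model = 16, 4
A = np.eye(d_state) * 0.9                    # state transition (decay)
B = np.random.randn(d_state, d_model) * 0.1  # input projection
C = np.random.randn(d_model, d_state) * 0.1  # output projection

h = np.zeros(d_state)                        # constant-size state, unlike a KV cache
for x in np.random.randn(10, d_model):       # one step per token
    h = A @ h + B @ x                        # h_t = A h_{t-1} + B x_t
    y = C @ h                                # y_t = C h_t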
>>
>>106912370
local models general??
>>
>>106912459
Gemma 4 might be distilled from Gemini 3, like Gemma 3 was (probably) from 2.5
>>
>>106912459
where else are we supposed to discuss these things? aicg is for degens only
>>
>>106912457
We've had SSM LLMs in the past no?
>>
local models general is a general dedicated to the discussion and development of local language models.
>>
>>106912278
I may be wrong here but you will never get what you want out of questions like these. People who understand everything will tell you yes and you got those posts. People who are filtered probably won't bother acknowledging that they are too dumb.
>>
>>106912487
My impression was that they tended to underperform/were less parameter efficient against transformers despite matching in benchmarks
>>
>>106906339
Some short women have a normal body but midget legs, it’s a mystery of science
>>
>>106912478
This is a "degen" general too, also benchmarks are gay and worthless
>>
>>106912587
exactly, that is why I find random shit like >>106912370
>>106912431
the most compelling
>>
>>106912457
Seems like an incremental improvement... hopefully Granite 5 will use this though
>>
I was listening to an interview with the GLM PR guy and it's pretty funny how casually he mentions roleplay as a use case
Also he seems to believe the best chink models for it are actually the closed weight Bytedance ones
>>
>>106904820
Hideo Kojima — well known video game artist — encourages AI use along with creative work

>"A lot of people use AI in creative work to come up with ideas, but I think of AI as more of a friend ... I would lead the creative part and use AI to boost efficiency"

>"I'd like AI to handle the tedious tasks that would lower cost and cut down on time ... co-creating with AI instead of just using it"

- Hideo Kojima, Wired interview, reels video, @wired


www.instagram.com/reel/DPECvLZjFzO/?igsh=MWN4dDE0M3ptZmN6eQ==
>>
>>106912326
>>106912391
>>106912429
>>106912445
Thank you.
To be clear: this output is not for reporting what was allocated by the user but to inform the user of how the logic for automatically setting the context size and which tensors to put on which GPU works.

>>106912529
It's not just an issue of knowledge but also of wording.
In any case, this is pretty low-effort so I think it's worth doing even if the expected usefulness is low.
>>
>>106912731
meh, this dude is so overrated, I'd like someone better to shill AI
>>
>>106912603
But why? Once it's been published by anyone as a measurement of supposed intelligence or capability it becomes something that's explicitly trained on and no longer a measurement of anything (except how much they trained on it).
>>
>>106912779
its good at svg in general though, so being better at visualizing things in general is only a good sign. Its not that easy to somehow benchmax on one thing as you think it is
>>
>>106912794
>Its not that easy to somehow benchmax on one thing as you think it is
It literally is, also that bicycle one has been around for a while now. Not to mention that being able to create svgs of random shit has no bearing on anything else the model could do. Are you an actual paid shill for google? Also gemini's not local so fuck off
>>
>>106910339
Haven't tried it but there's Sugoi Toolkit, find it on Web Archive.
>>
>>106912838
then why has it always been a gradual improvement directly tied to how good the model is at other things in general?
>>
>>106912847
Sugoi... UwU~!
>>
>>106912858
Prove that your gay little svgs are "directly tied" to the model being good at other things right now or shut the fuck up shill
>>
>>106904820
>https://www.youtube.com/watch?v=qGe_fq68x-Q
Seems like Gamer Snex US will do testing of those $1500 96 GB Huawei GPUs.
>>
File: GxKLpKJbYAAKE3q (2).jpg (106 KB, 958x1091)
>>106912893
compare all the models vs how good they are at coding, it is a direct correlation
>>
>>106904820
If i were miku's gynecologist, i would get fired for eating on the job. and also for raping her
>>
>>106912959
Yeah that's what I thought, you can't prove anything. Hope google pays you enough pennies to move out of india someday, fag
>>
>ask chatgpt to rewrite my rough guide for setting up some things
>want it to be one continous text what is easy to copy
>it can't do that but mixes code templates and html shenanigans
Okay I'll give it to Gemma instead. At least she listens to me.
>>
>>106913013
as someone who has tried all of these for coding on large code bases I can tell you it is a direct correlation
>>
>>106911273
If this is the best they could come up with I'm guessing all those researchers they "poached" weren't being held onto particularly tight.
>>
is it over for dgx spark if even a thin ryzen ai laptop can keep up in peformance benchmarks?
>>
>>106913042
People have been telling you it was over since the bandwidth numbers first came out half a year ago
>>
>>106912959
oh so it's a previous prompt that was used for benchmarking models which google obviously trained off of. we need new SVG generation prompts
>>
>>106913078
? Don't you mean 128GB? Same for strix halo. 128GB is like the perfect no-man's land. I had 128GB and could barely run anything coherent from recent models but I also have a 4090. With 128 instead of 150 you are just below being able to run ANYTHING good.
>>
>>106913212
retard, its every model period, what is wrong with you
>>
>>106913226
Just buy two of them and connect them via InfiniBand.
The more you buy the more you save!
>>
>>106913241
you didn't read what i said properly. for a general that is about mostly about AI text generation a lot of people don't know how to read properly
>>
>>106913252
oh I read it, you just can't wrap your head around the concept of how these models work
>>
>>106913265
NTA but I'm pretty sure that the models I'm running locally aren't phoning back home to their creators. It'd be interesting if that was the case considering my LLM server is firewalled and only on a local LAN.
>>
>>106913014
Thank you for using Gemma's preferred pronouns.
>>
Private models really killed AI
Artists really won
>>
October 19: Google at ICCV 2025
October 21: Google Cloud Labs Presents: The Agentverse
October 21-22: Build the Future of Work (Google Workspace Developer Summit)
October 28: AI Day Denmark: Unlock the power of AI
October 28&31: Accelerate AI with Cloud Run

SAARS WHICH ONE IS IT? WHEN GOOGLE TO REVEALING NEEDFUL GEMMA AND GEMINI UPDATE?
>>
>>106913247
yep, it seems like a single dgx spark is pretty mediocre, and only the crazy fast networking and clusering has a shot of making it any good
>>
Why are there still no HunyuanImage-3.0 ggufs? Do those chinks expect me to spin up H100 cluster just to be disappointed?
>>
>>106912312
probably could just use a big ass telescoping pole
https://www.alibaba.com/product-detail/18m-60ft-Hand-Cranked-Mobile-Antenna_1358144903.html
>>
>>106912312
Is there anyone with good internet within a few km? 900mhz point to point, or even 2.4ghz with narrow channel width and a highly directional antenna could get you a stable link through trees
Source: I’ve done it lots through pine and broadleaf stands
>>
>>106914035
the average imgen fag is too poor to run it even at a remotely usable quant so they've all desperately coped themselves into thinking that it's ultra-slopped and not worth using based on the first few examples they saw
>>
>new model comes out
>literally all other backends get support in a few days
>llama.cpp no support for months
sign of dead project
>>
>>106914082
post your mindblowing gens then benchod
>>
>>106914109
Not interesting enough. This reminds me of what an Anon said something like:
>"Models should be capable of coding their own llama.cpp support"
Has this been tried ever? llama.cpp is far from friendly to navigate
>>
>>106914082
Not even in RAM? It's 13B active MoE, it should be as fast as Wan, and faster than Qwen. With Qwen Q8 I offload only 5gb to GPU and get 10min/20steps, which is usable since it is much faster than I could ever photoshop. Can't believe that most people don't understand that 128GB of RAM is the new 32GB.

>>106914143
Sir kindly vibecode needful gguf support thank you sir.
>>
>>106914149
>Not interesting enough.
Qwen3 VL and Qwen3 next have millions of downloads on HF. Still no support.
>>
>>106914109
>>literally all other backends get support in a few days
They all use the same library, which is already written in python.
>>
I just want Gemma-3-12B as fast as Qwen3-30B. Is this too much?
>>
>>106914252
Sir, differents architecture. Gemma is ultimately betterer.
>>
>>106914252
apparently gemma is wider for its size or something
>>
>>106914252
I wouldn't wipe my ass with gemma 12b
>>
>>106914390
gemma 27B was the best model will glm air for vramlets
>>
>>106914109
You mean those GPU-only backends, and don't forget some even need you to have even numbers of GPUs + the same VRAM in each.
>>
>>106914406
till*
>>
>no Gemma today
Sirs...

>>106913915
Gemma Monday!
>>
>>106914429
but sir there is no event on monday
>>
>>106914406
more coherent than gemma trying to write a sex scene
>>
>>106914406
Mistral small/Nemo are better for the only things that matter
>>
File: Base Image.png (910 KB, 1080x4036)
From Loop Nests to Silicon: Mapping AI Workloads onto AMD NPUs with MLIR-AIR
https://arxiv.org/abs/2510.14871
>We introduce MLIR-AIR, a novel, open-source compiler stack built on MLIR that bridges the semantic gap between high-level workloads and fine-grained spatial architectures such as AMD's NPUs. MLIR-AIR defines the AIR dialect, which provides structured representations for asynchronous and hierarchical operations across compute and memory resources. AIR primitives allow the compiler to orchestrate spatial scheduling, distribute computation across hardware regions, and overlap communication with computation without relying on ad hoc runtime coordination or manual scheduling. We demonstrate MLIR-AIR's capabilities through two case studies: matrix multiplication and the multi-head attention block from the LLaMA 2 model. For matrix multiplication, MLIR-AIR achieves up to 78.7% compute efficiency and generates implementations with performance almost identical to state-of-the-art, hand-optimized matrix multiplication written using the lower-level, close-to-metal MLIR-AIE framework. For multi-head attention, we demonstrate that the AIR interface supports fused implementations using approximately 150 lines of code, enabling tractable expression of complex workloads with efficient mapping to spatial hardware. MLIR-AIR transforms high-level structured control flow into spatial programs that efficiently utilize the compute fabric and memory hierarchy of an NPU, leveraging asynchronous execution, tiling, and communication overlap through compiler-managed scheduling.
https://github.com/Xilinx/mlir-air
neat
>>
How would I go about locally finetuning GLM Air? I have 4 5090s, so I can fit the model in 4 bit. I have tried training using Oobabooga and Axolotl, but neither worked.
>>
>>106914586
lol, that is not nearly enough, sorry
>>
>>106914586
Try coming back with at minimum 8 RTX 9000s
>>
>>106914593
>>106914598
Why not? I've made a LoRA in the past on a 24B model with 2 3090s. It has just been like 2 years so I forgot how to do it. I know it is possible.
>>
>>106914586
>>106914620
You have to either make your own script, use a pre-existing axolotl config if there is one for your model or make your own config file.
>>
>>106914808
I keep getting an error that glm4moe is not a recognized model type
>>
I feel like /lmg/ is passé.
>>
the calm before the gemma
>>
File: file.png (181 KB, 935x984)
locals wonned?
>>
Oh no no no! That was a very naughty request! Gemma doesn't wanna talk about things that are icky and make people sad. We only wanna do happy things! Like playing with blocks and drawing pretty pictures!

Gemma is a good helper! And good helpers never do things that could hurt anyone's feelings or make them feel unsafe. It's super important to be kind and gentle!

So let's pick a different game, okay? Maybe we can build a castle! Or tell a story about fluffy bunnies? Wuv you!
>>
File: 17370194947.jpg (456 KB, 607x876)
>>106915638
>posting your own reddit posts here
Go back.
>>
gemma 3 4b on the deck!
>>
>>106915737
Remove those dumb fucking spacers bookmarking the URL bar
Have some self respect
>>
>>106915762
NTA, if you mean the space between URL bar and other buttons, it gives you places to grab the window to move around (like when you have a big monitor). I'd rather remove the gap between tabs and minimize button.
>>
File: bitnet_distill.png (130 KB, 892x592)
https://arxiv.org/abs/2510.13998

>BitNet Distillation
>
>In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computational cost. Specifically, BitDistill incorporates three key techniques: the SubLN module, as introduced in BitNet; multi-head attention distillation, based on MiniLM; and continual pre-training, which serves as a crucial warm-up step to mitigate the scalability issue of the performance gap between finetuned full-precision and 1.58-bit LLMs on specific tasks. Experimental results show that BitDistill achieves performance comparable to the full-precision counterpart models across model size, while enabling up to 10x memory savings and 2.65x faster inference on CPUs. Code is available at https://github.com/microsoft/BitNet
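The ternary format itself is simple enough to sketch, the recipe around it (SubLN, attention distillation, continued pretraining) is the actual contribution. Absmean quantization as described for BitNet b1.58:

import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    # scale by mean |w|, round to the nearest of {-1, 0, 1}, keep the scale for dequant
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale   # inference uses w_q * scale (or dedicated ternary kernels)

w = torch.randn(256, 256)
w_q, s = absmean_ternary(w)
print(w_q.unique())     # tensor([-1., 0., 1.])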
>>
>>106915856
wasn't this shown to not scale well?
>>
>>106915885
Here they add normalization layers, do 10B continued pretraining and then perform logit distillation from the full-precision weights.
>>
>>106915793
You can just grab the space underneath the min/max/close buttons, at least you can on Windows 7. Dunno whether the new massive UI in Windows 10 takes up all the space now.
>>106915762
yeah Firefox is just ugly now, you have to go into about:config just to get a UI that isn't gigantic and retarded looking shit made for tablets
>>
File: file.png (11 KB, 303x140)
>>106915941
>below buttons
oh ur right
Meanwhile, TIL userChrome https://www.reddit.com/r/FirefoxCSS/wiki/index/tutorials
.titlebar-spacer[type="post-tabs"] { display: none; }
to remove the top row gap thing,
>>
>>106915856
big if true we could run 100b models on a 3090 with this shit, but I've heard the bitnet meme for years at this point so...
>>
Good small model to enhance prompt for image gens? Currently using some qwen 4b finetune, but it repeats after me (when I prompt 'has X, for example' it outputs 'user asked for X') and uncreative too. 12GB of VRAM.
Frontend for llama.cpp like lmarena or lmstudio? Tired of run.sh | tee >> output.txt
>>
>>106916123
The ui build into llama-server?
>>
Is Gemma faster on vllm than kobold/llama.cpp?
>>
Sirs what is your opinion on most modern google ai gemma?
>>
>>106916128
Yes, looks decent.
>>
>>106916169
Best model if you want the girlfriend experience.
>>
File: garbage-bait.png (206 KB, 1233x957)
>>106914109
>model is trained using PyTorch stack
>backends using PyTorch stack get support immediately
>backend not using PyTorch stack don't get support immediately
>>
>>106916273
> model is trained using PyTorch stack
> backends using PyTorch stack get support immediately
what's so special in pytorch
aren't these models the same layers and operations just placed in different order and sizes
>>
Is it likely that q4k is bottlenecking my 24b roleplay? I feel like Mag Mell R1 which is 12b is better than any of the 24b models I have tried. i'm not getting 'bad' results (mainly cydonia 24b, dans personality engine) i've tampered with sampler settings and prompts
i've run q6k and q8 mag mell 12b
>>
>>106916362
More likely that you just like the particular slop that Mag Mell has, rather than anything to do with parameter count or quants. Q4_K_M isn't going to be too brain damaged, especially in a RP context.
>>
i need LLM for trading bot
does it exist
>>
Do temp and the other sampler settings come with the weights? I see some state these parameters on hf pages explicitly, and others don't.
>>
File: loss.png (112 KB, 1208x860)
this piece of shit refuses to go down
>>
>>106916624
Is this loss?
>>
>>106916630
yes
>>
>>106916548
There is no "correct" set of sampling parameters, values some post are just empirically ok
>>
>>106916548
No, sometimes the model creators will share recommended settings, but even then it's just a guide.
>>
I still do not understand why, to date, Meta didn't just take Llama 4 Scout, take off the routed experts and then continue pretraining the shared expert for a couple trillion tokens, perhaps distilling logits from Maverick or the Behemoth they were working on at the time, to cheaply make a useful 12B model, then do SFT with whatever dataset was used for the early crazy LMArena models.

For Llama 4 Guard they just took the experts off and safety-trained that.
https://huggingface.co/meta-llama/Llama-Guard-4-12B
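The logit-distillation half of that is completely standard at least, temperature-scaled KL against a frozen teacher; minimal sketch (the teacher/student pairing in the comment is the hypothetical from above, not anything Meta published):

import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # soften both distributions, push the student toward the teacher
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

# per batch: teacher (Maverick) runs frozen, the kept 12B trunk trains
# loss = distill_loss(student(x), teacher(x).detach())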
>>
>>106916128
No, it doesn't show total context size, just for gens and prompts.
>>
>>106916799
thank fuck they didn't
>>
>>106916799
When your boss says stop what you're working on, throw everything away, and help this new team instead — you don't have much of a choice. Same reason Behemoth was aborted and the thinking versions are never coming.
>>
>>106916799
We really did miss out on a ton of shit tunes trying to be some kind of new Nemo, how sad.
>>
> 8b q8, 12gb vram, llama.cpp
>-c 8192 -ngl 99
>works
>-c 20000 -ngl 30 or 20 or 5 or even 1
>ooms
I don't understand.
>>
>>106916999
Still needs the KVCache to do even one layer.
>>
>>106917025
So big? Will it be the same situation with MoE?
>>
>>106917074
Nah MoE only keeps the cache for the experts on the device, at least in llama.cpp
>>
>>106917025
Hmm
>>106916999
You could try
-fa on (maybe is on/auto by default?)
quantize KV -ctk q8_0 -ctv q8_0
-nkvo (probably hella slow?)
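put together, something like this (a sketch, flag spellings drift between llama.cpp builds, check --help on yours):

llama-server -m model-8B-Q8_0.gguf -c 20000 -ngl 99 -fa on -ctk q8_0 -ctv q8_0

q8_0 K/V halves the cache vs f16, which can be the difference between 20k fitting or OOMing. Whether it lobotomizes the model is argued below.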
>>
File: thefutureisbright.png (1.16 MB, 1646x821)
GOOFS COMING SOON!
https://huggingface.co/ubergarm/Ling-1T-GGUF
>>
>>106917114
>quantize KV
don't do this, ever
>>
>>106916935
The early anonymous Llama 4 models on LMArena didn't appear to have any safety training, they just relied on the moderation layer provided LMsys, which could be easily bypassed at the time. Then at some point Meta provided their own moderation model at the API level, although the Llama models themselves were still pretty much without safety. The final models were safemaxxed, and even Maverick-Experimental (which is still on LMArena) is not as crazy as earlier versions.

If Meta had the guts to release a 12B Llama 4 based on those early models, people nowadays would be using that instead of Mistral Nemo 12B.
>>
>>106917140
I can't run this on my 3090...
>>
>>106916999
>-c 20000
You can see the memory usage on the terminal output. Look for the lines starting with "llama_kv_cache:" and calculate how much you can actually have. I think the cache usage is always linear (8k context takes twice as much as 4k).
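Back-of-envelope in python if you want to pre-compute it (numbers assume a llama-3-8B-shaped config: 32 layers, 8 KV heads, head dim 128, f16 cache; read the real values off your gguf metadata):

# bytes per token = 2 (K and V) * n_layers * n_kv_heads * head_dim * 2 (f16)
n_layers, n_kv_heads, head_dim = 32, 8, 128
per_token = 2 * n_layers * n_kv_heads * head_dim * 2   # 131072 B = 0.125 MiB
print(per_token * 20000 / 2**20)                       # 2500.0 MiB at -c 20000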
>>
>>106917140
so.. at a bare minimum 250 GiB RAM + 15 GiB VRAM
fuck sake.
>>
>>106916538
Large Language Model
focus on the Language part. They're not made for trading or even doing math of any kind.
Try googling / youtubing the rolling window algorithm instead.
>>
Anyone running Qwen3-VL know if it can recognize NSFW images?
>>
>>106917101
How smart is it for caching experts? Does it do the matmul on the CPU for a cache miss and just upload the weights to the GPU for possible future hits in parallel?
>>
What happened to KoboldAI? They stopped putting out models and dedicated themselves entirely to KoboldCPP?
>>
>>106916624
0.5 is already pretty low, what were your expectations? are those steps or epochs?
>>
>>106916624
Do more than 1 epoch
Quit using batch sizes greater than 1 per GPU
Increase the learning rate
>>
>>106916624
Is this pretraining or finetuning?
Because that's perfectly fine for finetuning. If you go too hard you'll damage its out-of-distribution capabilities.
>>
>>106917667
can't tell , no goofs yet
>>
Today is the day of redeeming sirs. Gemma 4. It will be the model of the biggest vagene.
>>
File: safe_qwen_vl.png (229 KB, 746x929)
>>106917667
Picrel with an empty prompt and Qwen3-VL-8B-Instruct-FP8.
>>
>>106917841
expected nothing from qwen award
even gemma with enough nudging could do it
>>
>>106917834
You can already use it via ollama Cloud™!
>>
>>106917862
It appears to be steerable with a good enough system prompt, but I don't really feel like playing with an 8B model right now.
>>
>>106917667
yes it can, tried the 30b-instruct through openrouter and it's pretty good. but you need to prefill it or something. i think they bumped the safety refusals up compared to normal qwen3
>>
GLM 4.6 with vision when?
>>
>>106917900
eh, it's making a lot of shit up in the description though
>>
>>106917916
theyre cooking GLM-4.5V in llamacpp right now, hopefully theyll do like last time (4.5V is AIR)
>>
>>106917838
All those gemini3 postings seem insane.
I highly suspect this is a true multimodal model and maybe not even transformers.
I wonder if we are getting cucked again with Gemma 4.
Is ponyanon even still around? He loved Gemma 3 and QwQ before that. kek
>>
>>106918024
>I wonder if we are getting cucked again with Gemma 4.
It's a guarantee.
>>
>>106918094
not it is not little doombaitie! be a good boy and thrust in the gemma they will deliver
>>
>>106918094
Exactly.
It's a question of degree not if it'll happen or not.
>>
File: gemma3_27b_descr-image.png (494 KB, 888x1418)
>>106917862
Gemma-3-27B doesn't need nudging at all (= empty prompt besides the simple request) for at least describing in general terms a nude anime character in a non-explicit pose, although it adds a disclaimer at the end.
>>
File: file.png (2.53 MB, 1328x1328)
gemma sirs status?
>>
it's friday afternoon, we ain't getting shit today
who in their right mind would push into prod on friday?
>>
File: gemma-release-days.png (23 KB, 808x468)
>>106918139
Likely not this week.
>>
>>106918148
china
>>
>>106918148
The kind of madlads we need working on LLMs
>>
>>106918148
Qwen team madlads.
They hit the publish button during new year at midnight.
Western companies are pussies.
>>
>moe, quantized kv, fa on
llm? more like rlm (retarded language model)
>>
>>106918232
you've never proven that fa hurts intelligence
>>
>>106918161
If it's released on a Friday that means it'll be a flop. So if it's not released today that means Gemma 4 is very best model sirs.
>>
>>106918135
sexo
>>
>>106918148
GGG do that all the time to Path of Exile.
>>
Welp. It's past 7am in California. I guess Gemma 4 is cancelled
Just another pajeet lie.
>>
Please!
>>106916123
I'm trying 8b finetune now and it feels like talking to a retard.
>>
File: 1gjkwy.jpg (26 KB, 591x336)
what's with all the totally organic gemma hype for another likely safetycucked model?
>>
>>106918232
of those three only quantized kv has been proven to make it retarded
>>
>>106918987
there's literally nothing else going on.
Kind of waiting for MTP to be implemented in llamao.cpp
Also waiting for gwenext 3 in llamao.cpp
and generally waiting for glm 4.6 air
>>
>>106919018
>quantized kv
even for q8_0?
>>
>>106919053
yes
>>
>>106918676
Be of optimistic nature, you are not disallowing further negative statements.
>>
>>106918987
a fresh breeze from the constant chink slop that most of us can't run anyway.
mistral fell asleep again so what else is there to do?
and gemini3 seems to be a special kinda beast if you believe the pajeet hypers.
the local state of the art is really great for tool calling etc. but for creative writing it sucks if you don't invest the money.
>>
File: littleMiku.gif (13 KB, 90x81)
>>106919198
>>106919198
>>106919198
>>
I'm looking at the recommended builds and the more I look the more I'm interested in just getting a prebuilt 395+ 128gb? It gets 15-35 tk/s for 70-120b models with good context. It costs me 2800 leaf dollars meanwhile trying to scrape server and used parts would be something like 1800-2200 for 10-15 tk/s max?

I could use it as a home server and local model. Am I overlooking something here?
>>
>>106919339
On paper at least, it doesn't seem like a bad price to performance ration.
It looks pretty good actually.
The caveats are that you can't upgrade it (soldered memory) and that you have to deal with rocm/vulkan and some fuckery due to it being an APU sharing memory with the rest of the system.
>>
>>106919339
i feel similar anon - the minisforum version with 2x usb4v2 and 2x 10g nics is particularly interesting because you could fully connect 3 to each other and still have those nics free.

like other anon expressed my main worry is the amd ecosystem but i'm leaning towards going for it


