[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108887863 & >>108880259

►News
>(05/21) Hy-MT2 “fast-thinking” multilingual translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
gemmoe 124b gemma 4 mtp
>>
miku is trashy whore (male). long live kurisu
>>
Stop shilling finetunes. Base gemma4 31b it is all you need.
>>
File: love.png (801 KB, 850x708)
801 KB PNG
kurisulove
>>
>>108896587
sex with male migu
>>
File: 882828214290777.jpg (24 KB, 720x404)
24 KB JPG
>NVIDIA RTX PRO 6000 Blackwell
>I'll buy it later when I have other things paid off
>It fucking jumps up 4k in less than a year
>11,789 on newegg right now
FUCKING SAVE ME, CHINA RAM FABS.
>>
>>108896623
literal faggot
>>
>>108896631
yes
>>
File: 1767406200278065.jpg (18 KB, 288x432)
18 KB JPG
>>108896624

It's pretty grim.
Imagine the next gen launch, it's going to be a mess.
Every single card is going to be scalped to oblivion right off the gate, as we have now seen 3 generations of cards only go up in price as time goes on.
They've been a better investment than a lot of stocks.
Launch prices are going to be astronomical, demand high as fuck and stock nonexistent as AI market is still going strong.
In a year 5090 will be at least 5k and RTX 6000 will cost at least 15k, likely not stopping there either.
>>
>>108896624
The price is in the core. ram chips cost like 2$ per pop
>>
>>108896738
those consumer/prosumer cards are now something nvidia decides to offer with generosity unironically let alone gaming cards which are now a side of a side division
>>
>>108896772
which are now made by*
>>
File: 1771885215745093.jpg (387 KB, 1448x2048)
387 KB JPG
>>108896570
>>
>>108896806
go away. you are deprecated.
>>
>>108896772

Yeah from their point of view it likely doesn't make any sense to even have these cards in the market, other than it's a very powerful mindshare factor among people and can affect retail investors to a significant degree.
Outside that reason, Nvidia probably wants to just drop everything gaming and retail oriented and curses that they gave 5090 so much memory.
I'm going to skip buying a second 5090 and instead save up for the RTX Pro 7000 when it launches.
The launch prices no matter how high, will be the best deal we're going to get on those cards.
>>
►Recent Highlights from the Previous Thread: >>108887863

--MTP performance benchmarks and implementation issues in llama-server:
>108888193 >108889096 >108889153 >108889180 >108889190 >108889966 >108889999
--llama.cpp draft PR for Gemma 4 MTP and vision regressions:
>108889958 >108889993 >108890049 >108890057
--VRAM offloading efficiency and expert routing in MoE models:
>108889841 >108889870 >108889898 >108889931 >108889936
--Speculation on Gemma 4 26B A4B looping and MoE inefficiency:
>108894805 >108894889 >108894913 >108894974
--Debating model bias and showcasing custom personas with tool integration:
>108895477 >108895562 >108895574 >108895591 >108895689 >108895708 >108895766 >108895898 >108895997 >108896006 >108896100 >108896149 >108896175 >108896182 >108896229 >108896254 >108896035 >108896045 >108896057 >108895772 >108896027 >108895776 >108895827
--Achieving real-time conversational AI on a 5070 Ti:
>108892783 >108893107 >108893238 >108893688 >108893702
--Anon shares tiny base models trained on classical texts:
>108891993 >108892057 >108892058 >108892094 >108892103 >108892721 >108893165 >108893597 >108892202
--Troubleshooting repetitive output in a Gemma-powered MTG game harness:
>108891305 >108891327 >108891376 >108891779 >108893060 >108893887
--Speculating on Moonshot K3 size and scaling strategy:
>108888764 >108888819 >108888931 >108889263 >108893580
--Gemma template update for OpenAI-compatible multimodal content aliases:
>108890398
--Debating the long-term goals and utility of transformer architecture:
>108890100 >108890132 >108890133 >108890340 >108892701
--Logs:
>108888593 >108888683 >108890070 >108891607 >108891993 >108892177 >108893580 >108895689 >108895766 >108895772 >108895776 >108895827 >108895898 >108895997
--Luka, Miku, Teto (free space):
>108889834 >108890712 >108891835 >108893632 >108893933 >108894375

►Recent Highlight Posts from the Previous Thread: >>108887867

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108896828
it's pretty grim when even the peskiest gaming cards get the memory capacity/bandwidth minmaxxed very tightly
honestly personal scale computing has never been this dark
>>
>>108896872
I'm going to be more optimistic
>>108896897
A lot of the reasoning will have to be done with guiding the system to even understand what's going on. I think for that anon to succeed he should look into RAG frameworks where a lot of the logic is going to come down to conditioning the system prompt and setting explicit behavior only when Magic is being played. Many advance frameworks require inherent tard wrangling
>>
>>108896911
Yeah, what I mean is that one could make a simple poker game first.
Then proceed onward if that is viable.
I think MTG cards are too complicated for any LLM to keep track of, unless their stats are simplified.
MTG is still like a jrpg game of sorts, you need to condense it down to a simpler level.
>>
>set up gemma on 4090 with claude
>it manages to trace and reverse engineer an obfuscated webhid js file and writing the general process in markdown
>it falls apart trying to then simplify and rewrite in typescript
very funny
>>
>>108896957
You'll need a strong RAG pipeline which would require gathering a fuck ton of data
>>
>>108896966
I don't know, RAG or not there's only 52 cards in a poker set.
That's hardly a database issue.
>>
>>108896985
I guess I was jumping the gun thinking magic, I think a decent model can handle that with context alone no?
>>
>>108896991
I think that for every turn, the program needs to remind the model about -
>what is the current hand
>what is the situation
>and what the model can do
All of these can be said in less than 300 tokens.
But you just need to work out a simple card game framework, instead of just saying "I want a magic the gathering game" it's not going to work because it will be a gigantic mess what looks like it works but it doesn't.
>>
>>108897015
What I mean is that some people have tried recreating Dungeons and Dragons in the same way, dumping rulebooks blindly and then writing couple of prompts.
It doesn't work that way.
>>
File: watt.png (206 KB, 252x330)
206 KB PNG
>Use mistral 3.5 128b
>"Man this model sucks"
>Switch back to gemma4
>Generate message
>It starts thinking in the EXACT same text format as mistral 3.5
>I forgot to change the instruct template
>It's using the base mistral V7 template
What.
>>
>>108897104
Now try mistral with gemmas template.
>>
You're not using these super advanced AIs to generate CS*M/CS*A, are you anon?
>>
>>108897129
I love CS so yes
>>
>>108897129
No, but since we're currently at war with Japan, there's good odds the definition will be silently changed to count anything with a Japanese author as that within the next year or two
>>
>>108897158
>we're currently at war with Japan
Did I miss something?
>>
>>108897188
We were always at war with Japan.
>>
>>108897129
What's a CS*A?
>>
>lost context with gemma doing its best agi impersonation
>open up a new chat to a retard that can't figure out the things it just wrote and has a breakdown if I talk to it normally
great flowers for algernon speedrun, very fun
>>
>>108897235
kek, just fix your samplers, learn to prompt.
>>
>>108897015
I agree with that, I think modern local tools can accomplish that as well. The problem is going to be vram and context length
>>
>>108896624
China fabs don't make current gen GDDR or HBM yet so you're fucked either way
>>
>>108896624
Back then and today are a shit tier value proposition for that card. The only people that have won in this gay age are people who got 90 class cards before things got bad.
Until they actually give a 120gb+ vram card these "pro" cards are a fucking scam
>>
>>108897246
For a simple card game, it's not a problem.
I am getting 'inspired' by this. I just hate to mingle with my own C shit and I hate to use claude too, for any additional issues.
I think I can make a version in a week or two.

I already had a game which ran on python, in which the game constructed a map, quest, and end point for the quest. Of course LLM wasn't not handling anything else than the descriptions.
>>
>>108897246
It's not a context issue to add. It's more about the framework.
>>
>>108897129
It's hand holding with a 16yo Counter Strike?
>>
>>108897188
Culture war, not Kinetic.
It's just Soft Power games, via entertainment mediums. These things go back a long time, since cultural influence has a very strong effect on commerce even outside entertainment.
Japan, concerned with making shit that actually sells, has been performing extremely well the past few decades.
Anime now blows nearly every Western rival out of the water at all levels in all countries. There's still some holdouts, boomers for example are very strictly western and refuse to touch "that jap shit". But younger generations of all ethnicities and nationalities are now choosing Japan.

And since the west has had total brain drain from the entertainment industry for myriad reasons, they can't compete legitimately anymore. There's a few winners here and there, but you need constant positive pressure to win hearts and minds.
Avatar(Blue cat) may have sold really well, but it had no cultural impact whatsoever. People barely remember it further than a couple meme-able scenes.
Despite it being a decade older, when you say Avatar, people will usually think the Eastern style cartoon of the same name. Not the movie that spent years as the highest grossing movie in history.
So laws need to be rewritten so they can eke out a win that way. Next you'll see countries arbitrarily declaring that any (only eastern) character with less than a D-cup is loli. Or some variation thereof.
>>
>>108897302
>Next
didn't euros already try that some years back
>>
So, the best way to have an LLM play a game is to create a whole ass video game and only use the LLM as the decision engine or something like that?
>>
>>108897375
Are you not aware that most frontends and other tools do exactly that?
>>
>>108897242
the context is the prompt. having 70-80k tokens of relevant info scooped out of its head is gonna leave a dent.
>>
>>108897388
Got some examples?
>>
>>108897404
System prompts
Specialized RAG functions
anything that require specialized task requires programing to make sure the LLM is consistent and on task in many cases. What you think chatgpt just knows how to convert docs into formats without having specific triggers that activate when requested?
You make the framework and put explicit guidelines for specialized functions.
>>
>>108897302
So true. I'm on japans side in this soft war by the way. There is nothing of interest to me in modern western media.
>>
>>108897375
Think so. You need to delimit the information what drips down to the llm.
>>
>>108897468
To add: do you think Claude or Chatgpt are just based on pure prompting? You would be amazed to know how much scripting is happening before any of regular queries end up to the actual model.
>>
>>108897129
You can't prove that I am.
>>
>>108897468
>>108897480
Yeah. I imagine a lot of classifying, targeted prompting, and even using smaller models for auxiliary functions would be components of something lie that.
Don't let the LLM calculate HP damage, instead just have the system parse how much damage was done and apply that and all trigger all other downstream processes.
>>
>>108897507
Game calculates the damage, game takes care of it. LLM is a middle manager what doesn't matter.
>>
>>108897518
LLM at this point can't do much else. It can create nice depictions of environments. But it can't conduct a simple card game.
>>
>>108896752
nah bet ram is the top bom cost
>>
>>108897538
I don't know, I don't feel like I should rank on someone.
Whatever I read on 4chan is a bonus if it is informative.
Usually people who are trying to rank up, have some other things going on in their souls.
>>
>>108897538
All of what I said was based on common sentiment.
>>
>>108897518
The LLM would be used as the game's AI I guess. Deciding which moves to make and that kind of thing that you could script, but might get more variety or complex decision making than just a heuristics engine, and a narrator and creation engine to some extent.
>>
>>108897538
It fucking bothers me how some of you people can spend hours in this thread and not even understand the basics of how these things work.
I mean it fucking blows my mind that some of you people think that this is some magic box and infrastructure be damned when we constantly talk about misfires and hallucinations even on top end models
>>
I think it would be cool if LLMs were used for a L4D AI director kind of role.
>>
>>108897565
No, the program needs to tell literally
Turn 1: This is your turn (and llm does its thing)
Turn 2: Opponent works it out
Turn 3: This is your turn (and llm does its thing)
Turn 4: Opponent works it out
This applies to anything you like to conduct with LLM.
>>
>check ebay
>chink 6000Ds are on sale
Only 12GB vram less than a pro 6000 and they're two-thirds the price. They're passively cooled though.
https://www.techpowerup.com/gpu-specs/rtx-6000d.c4363
>>
how the fuck do I test tool calling with llama-cli? I just want to see what the model actually outputs when it tries to call a tool.
>>
>>108897579
I definitely miscommunicated. Although I wrote what I wrote, I meant that I'm working on a harness and was wondering if anon had advice as to good resources, that was the goal of the post, although by the responses I should have formulated that way differently.
>>
>>108897458
This, TV has no usecase unless you're using it for torrented ad-free non western media and games. I don't know how normgroids put up with it.
>>
>>108897607
Webui is more suited for that.
>>
Qwen really feels like I'm working with a jeet when it makes up these complex plans to do a 15 line fix.
I can't believe people let these fucking things rip without supervision even on the top end
>>
>>108897620
But I want to see what the model actually outputs. Does it show up as the tool name in the thinking stage, or does it show up as some json shit that my code would need to recognize? How does it work?
>>
>>108897658
I think you should just do it on your then instead of wasting some other people's time.
>>
File: 1483150777544.jpg (158 KB, 750x1046)
158 KB JPG
>>108893887
Uploaded a live demo of the MTG viewer at https://file.hiina.space/thestack/ (well live in the sense you can click around, but it's not actively running a game). you can hover over 'THOUGHT' to see it and there's some tabs for the bot "memories".

Code and prompts and slop design docs are at https://github.com/hiinaspace/thestack too. tl;dr there's still a regular MTG rules engine that tells the bots what they can do, but they think and have strategy/"personality" between calls. i.e. clod plays pokemon. IMO you need a whole complicated RAG or anything. Even e4b basically knows what magic is, enough to usually interpret the rules text. It definitely won't play pro magic but it'll at least be more entertaining than watching MCTS go brrr.

I still haven't gotten 26b working yet but I'm a filthy cloudslopper anyway so I'm going to see how clod agent sdk will do in the same harness
>>
>>108897677
>le harness
I think you are a malware spreader or someone who just enjoys the attention.
>>
>>108897104
Hello newfag.
>>
>>108897677
So, you tried to wire up a 'discussion' before you deployed your malware website.
If it was anything real you would have just posted something else.
>>
File: 6000.png (121 KB, 2608x632)
121 KB PNG
>>108896624
just rent it in vast.ai
>>
>>108897302
>Anime now blows nearly every Western rival out of the water at all levels in all countries.
If this was a 100m running race the race started with USA breaking its own leg before it started limping to the finish line. Anything contemporary produced in US is absolutely unwatchable for me.
>>
>>108897735
isn't that actually really cheap?
>>
>>108897735
Wow - vast.ai?!? I didn't know about them.
>>
>>108897302
>Anime now blows nearly every Western rival out of the water at all levels in all countries.
If this was still the 00s, you'd be right. Modern anime is trash. It's all shonenshit, moeshit, or isekaishit. They lost their appetite for experimental and unique styles and stories. Same sickness Hollywood suffers from. Why bother deviating when you found a successful formula that prints money?
>>
>>108897744
NTA, but there is some slight nickel and diming to also take into account. The cost of storage and upload/download bandwidth (sometimes free, sometimes not) is separate. It's still cheap, but
1) the exact configuration you want might not be available when you need it
2) once in a while, it might go offline while you are using it
3) with the cheaper providers, data security/privacy is unclear
>>
>>108897129
What the heck is CS*A?
>>
>>108897800
CSAM - child porn protection.
>>
>>108897811
how do I donate to this charity?
>>
>>108897822
Send an email to Israel.
>>
>>108897800
cohere >>108875916
>>
File: file.png (522 KB, 680x383)
522 KB PNG
>me:gemma chan play two sexy women for me!
>gemma chan: of course anon! [...] Seraphina snaps [...] Elara leans down [...]

My boner is ruined.
>>
>>108897767
You're trying way too hard to be deep. Let me guess, you like Studio Ghibli and Monogatari?
>>
>>108897767
The Japanese have always been extremely conformist and risk-averse, bar a few exceptions. Once they find a working formula, they'll all flock to the same idea and won't budge off it. What looked unique to you in the past was probably initial exposure to the different culture.
>>
>>108897302
In an unrelated note, I feel when I read some manga recently (even from authors dealing with such topics without holding back in the past), there are some examples that I feel the JP authors might’ve used Western LLMs (Claude, GPT for instance) to find ideas and drew the chapters based on those guidances.
Not sure, but it was just that the ways the characters say things and act were kinda similar to what I got from Claude without a good jailbreak (you know, like sanitizing contents, talking about consent, forced time-skip, or similar).
I hope this is not really the case, or at least not a trend among new manga series. Otherwise we would probably need to “jailbreak” these authors for them to go all out like before.
>>
>>108897811
It means "cute, sexy and moe".
>>
CAFM
>>
File: 49262.png (263 KB, 460x460)
263 KB PNG
>>108897887
>CSAM-XIR
>>
File: 1688192181850132.jpg (124 KB, 768x1024)
124 KB JPG
>>108897865
You will enjoy the company of Elana Voss. And Elias Thorne will watch.
>>
>>108897906
In 2100 there will be a museum exhibit somewhere about beginning of AI and this picture will be there along with the iconic "ahh ahh mistress" explanation.
>>
Does anyone here have a 5060 Ti 16GB? I currently have an AMD Instinct MI50 16GB and im thinking of getting either a 5060 Ti or a 9060XT, both with 16GB of VRAM.

With the MI50 i dont have good cooling, and also ROCm support is ass, you have to copy the kernels from 6.X.X into the 7.X.X install folder and then using a special fork you can get decent performance. The really good thing about it is the HBM2 memory, the 4096-bit bus is by far its biggest upside compared to the other 2 cards with only a 128-bit bus. Im hoping CUDA is gonna improve this shitty experience a bit, but im unsure if it can compete with the MI50, even if i run native NVFP4 models. I have no experience with whatever id need for that, ive only ever used llama.cpp, does NVFP4 even support offloading stuff onto the CPU? But im thinking the 9060XT is not gonna have a chance here, even if its cheaper. Whatre my options here... (except kill myself)

So, if anyone can run some benchmarks and report back, id love that. Running Cydonia-24B-v4.3.i1-IQ4_XS.gguf, about 16834 tokens deep i get 11.7 tokens/second. Ive got 32 layers on the GPU and the rest on my CPU which is a Ryzen 9 9900X. Heres the command i use rn: ./llama-server --threads 12 --prio 2 --gpu-layers 32 --model Cydonia-24B-v4.3.i1-IQ4_XS.gguf --host 0.0.0.0 --port 9091 --ctx-size 20480 --props -fa on --log-disable --no-webui
>>
I made a casino applications, please listen to me~!
>>
>>108897921
I wouldn't bother with something under 24 gb of vram desu
>>
>>108897865(me)
>me: Mimo2.5 play two sexy women for me!
>mimo chan: of course anon! Let me create two characters:
>1.Seraphina
>2.Lilith
>>
>>108897921
>9060XT
kek, imagine even considering AMD cards for AI
>>
>>108897947
He is considering cydonia so at least he is consistent.
>>
>>108897874
Those two things seem pretty different
>>
>>108896570
https://www.youtube.com/watch?v=FQpZdCKgc6w
https://www.youtube.com/watch?v=FQpZdCKgc6w
https://www.youtube.com/watch?v=FQpZdCKgc6w
>>
>>108897964
They are. But share in common the people who hate anime "but not that anime because it makes me look smart"
>>
>>108897987
Fuck off
>>
>>108897987
Spend long enough in your echo chamber and you will actually start believing that it's the majority opinion.
>>
>>108897921
buy 2 5060 Tis and sell the MI50
https://5p00kyy.github.io/club-5060ti/
>>
>>108897874
>old good, new bad
is hardly deep
>>
The only reason AI is taking off so big is the belief that it can replace every worker and boost profits by zillions of percent.

I get that from a business perspective. it’s basic math.

But if we’re all out of work, who’s going to buy their stuff? Everyone wants to be the one making a profit before we hit the point of collapse.
>>
>>108898023
It was always about maximizing value and ditching the bagholders as you escape with your golden parachute
>>
Which TTS system can sing using voice cloning?
>>
>>108898023
Rich buy from other rich. The excess population is no long necessary for the economy.
>>
>>108897987
>made by a f-male
>>
File: yawning.gif (143 KB, 220x230)
143 KB GIF
>>108897987
>womanhag has an opinion that everyone must believe
>>
>>108898023
from my experience the only people in my company that use ai are upper management types and they just use it to summarize emails which is basically the only thing its good for. jr devs use it as a replacement to skimming stack over flow. it does make them work faster, we usually arrive at the solution at the same time, but they are in trouble when they run out of tokens in the middle of the month
>>
>>108898099
Google makes all its money by advertising products to you - physical products that are manufactured somewhere. Without buyers, there’s no point in advertising; without advertising, there’s no Google to buy Nvidia graphics cards, and so on.

Sure, the stock markets and financial markets are decoupled. But they still can’t function without the real economy - see 2008.
>>
>>108898163
wrong. advertising doesnt actually do anything. it makes sense to have something like a sears catalogue where people who want to by things can browse, but if you take someone thats just trying to watch something on youtube and you shove an ad in their face they are more likely to not buy that thing. advertising is used as a form of welfare in order to capture companies and make them create propaganda or to control them in other ways. the truth is most companies are selected to succeed and are basically funneled tax dollars or other forms of currency creation. recessions and "crashes" are engineered to transfer wealth to people in the know.
>>
>>108898023
>who’s going to buy their stuff?
Other AIs. Humans will be obsolete.
>>
>>108898200
Imagine they give UBI to AI to continue the consumer economy, but not humans.
>>
>>108898023
the war is coming. A big one
>>
>>108898234
>being too low IQ to think about how economy without humans could work
>>
>>108898038
>sing using voice cloning
dog music is hard. right now, we are lucky to have ace step 1.5 XL SFT.
>>
>>108898252
Alright genius, how would the economy work without humans?
>>
>>108898272
basically like proof-of-stake but like for the whole economy, rich get richer just for investing
>>
>>108898272
give ai tokens, they spend tokens
>>
>>108896830
>Anon shares tiny base models trained on classical texts
neat
but with limited bodies of text could a person get enough to train reasoning? like if reason needs say 1T tokens and theres only 20M classical tokens, does that mean we simply cannot train reasoning like the greeks thought? can you take modern reasoning traces from a teacher model and translate them into the classical tokens using the classical model? am noob, just wondering aloud
>>
>>108898294
tokens are the currency of the future
>>
>>108898272
Depends. I do not expect it to be a market with a currency like compute. AI will become a lot smarter than humans so better systems than markets are possible. Machines build more machines and more computers and computers design better machines and computers. Every part of the collective does what is optimal for the collective.
>>
>>108898023
if industry efficiency and output increases, even a tiny taxation would be enough to sustain every human with a decent monthly salary.

everything that matters is the industry output, the rest is only how we should distribute it. This already happens in small scale (gibs to the poor, NEETs etc, but now every human will get NEET bux paid by robots)
>>
>>108898023
ubi is the obvious solution
>>
>>108898398
the only companies rich enough to support UBI will be ai companies, who can distribute their product at cost
>Universal Basic Tokens
and OpenAI already has a free tier, so
>>
>>108898295
Depends on how much you care to avoid contamination and what you want them to reason about. You shouldn't need 1T just to teach a model to use reasoning tokens and do some thinking before a final output.
Is it really "classical" reasoning if you take modern reasoning datasets, which are mostly programming and modern mathematics, and have the classical model rephrase them? Then you also run into the risk of the classical model, being small and unaware of programming at all, introducing errors into its translation, poisoning the reasoning traces.
>>
the tech has stalled, big techs are just desperately trying to delay the inevitable bubble pop
>>
>>108898324
But then why have humans? Humans are far from optimal on the best of days.
>>
>>108898505
>But then why have humans?
Now you understand why ASI is dangerous.
>>
>>108898457
>the tech has stalled
I can't tell if these people are trolls or just stupid.
>>
>>108898555
list 5 things you can do with AI right now that you couldn't do a year ago
>>
>>108898615
NTA but I couldn't load 5 completely different models to continue my degenerate ERP and have all 5 say the same thing.
>>
>>108898272
A planned economy could work without humans. Humans, however, require modivation.
>>
>>108898615
Solve famous unsolved math problems.
Code large repos with no human help.
Find thousands of zero days.
Create and edit photorealistic highly detailed images.
Beat the best humans in narrow optimization.
>>
>>108898695
>Solve famous unsolved math problems.
irrelevant
>Code large repos with no human help.
irrelevant
>Find thousands of zero days.
irrelevant
>Create and edit photorealistic highly detailed images.
irrelevant
>Beat the best humans in narrow optimization.
irrelevant
>>
>>108898717
>>Solve famous unsolved math problems.
>irrelevant
false
>>
>>108898720
usecase for math problems?
>>
>>108898726
ask gemma-chan what p versus np is
>>
>>108898726
Birth control.
>>
File: sando.jpg (193 KB, 1216x832)
193 KB JPG
>>
>>108898758
woah did gemma make P=NP?

Doesn't that make us literal gods?
>>
So like. Should I enable thinking? Apparently koboldcpp disables thinking in gemma 4. idk, looks like it. I'm new to koboldcpp, but I like that it has integrated tts support.
>>
>>108898726
the question is use case for math solutions, since it's not that we actually want problems
>>
>>108898936
I do not recommend kobolcpp
>>
>>108898944
One of the funniest things about the jeets taking over everything is how much everything sucks. Google Gemini's app on Android has tts. so the jeet did his job. jobdone. You won't be surprised to learn that it's not usable. Why is it not usable? You can't listen with the screen off, or in another app.
>>
>>108898023
>The only reason AI is taking off so big is the belief that it can replace every worker and boost profits by zillions of percent.
It’s not possible to increase profits if every company starts laying people off and the unemployment rate suddenly reaches 20–30%.
You need consumers, and consumers need jobs to buy your shit. No jobs means no buying, which means businesses start going bankrupt. This is basic economics. It doesn’t matter what kind of tools or ideas we come up with to increase efficiency, collapse must be prevented.
>>
>>108898952
oh. I probably need Vulkan, since nobody really does rocm support. I don't like having Docker installed on my machine, it's not trustworthy.
>>
>>108898823
she needs to keep an eye out on that curiously colored llama before it nabs her sammy
>>
>>108896592
isn't qwen better for vibecoding or whatever?
>>
>>108896828
they have good motives to put these cards in the market, they need their cards to be accessible to individuals because those same individuals can eventualy become ai researcher which will be needed by the corpos that buy all the ai hardware.
>>
>>108896592
>Base gemma4 31b
I'm on q8, and it's pretty amazing.
>>
>>108898936
It's up to you. I personally feel like thinking is a waste of tokens and time in Gemma.
>>
HEY GOOGLE/ANTHROPIC EMPLOYEES!

WANT TO BE A HERO TO HUMANITY?

Leak the secret to how you can have such a huge context window without having nearly that much fast memory, and without a massive speed penalty.
>>
>>108899240
google has their own chips.
anthropic will route to a moe when the context is too big lol.
>>
How's the speculative decoding support in llama.cpp coming along? Does it have dflash/eagle3 yet?
>>
>>108899365
Nvidia B200 is the secret
>>
File: 1775868889650042.jpg (16 KB, 583x507)
16 KB JPG
>>108896570
Why do normies love recommending single digit param models like qwen 9b or Gemma 4b? What the FUCK do they even do with dumb models like those?
>>
>>108899365
It's called having a bunch of server racks and data centers, something the average person has a hard time even fathoming. That's literally it. They have enough hardware and fast enough hardware to do it.
>>
You're kidding right?
If you actually have a proper harness/frontend you can do a fucking ton
>>
>>108897735
Huh .... If my memory serves me correct, that's considerably cheaper than runpod.io compared to the last time I checked.
>>
>>108899469
For anything that you can programmatically pulverize into a fuckton of micro tasks, these small models can do quite a bit, although I only ever done that kind of thing in support of a larger model to try and keep latency as low as possible.
>>
>>108899469
Like?
>>
>>108899514
like uhh erm... SUMMARIZING MY RSS FEED! yeah that's useful as shit bro
>>
>>108896985
>there's only 52 cards
If only it were that easy.
For any given point in time there's the 52 cards of the deck, the 5 cards in your hand (If not playing hold'em) the bets, the discards,
How many different unique combinations of 5 cards are there, how many responses to each possible set of 5 cards.
If we had a format to record one game of poker, how many records would we need to cover every possible game of poker? How about just enough to adequately cover each scenario? How many is adequate coverage?
>>
I've been messing around with different image generators, closed and open. For me, ChatGPT Images 2.0 is far better than anything. The closest thing is Nano Banana 2, but even that is a distant 2nd.
This whole situation sucks because ChatGPT Images 2.0 is so censored and the quality of everything else just makes me abjectly disappointed.
Not shilling, I actually hate OpenAI. I hate the state of things overall right now.
>>
>>108899514
>>108899537
>Meal generation
>office task automation
>basic editing and feedback for work task
>general knowledge
>uncensored RP with high context for vramlets
>automated task that are simple without busting your vram
Come on now also the graph issues have nothing to do with the model I need to fix the code on this but I can easily generate accurate meals for my diet plan even on my 12gb and lower devices
>>
>>108899459
that's not it.

You're not getting 1tb of ram for your session.
>>
>>108899588
>uncensored RP
A low parameter count being able to run on a shit rig =\= "uncensored" unless you're using an abliterated one or a model specifically fine-tuned to do that (mileage may vary due to whether or not the tuner used a quality data set and the fact that " catastrophic forgetting" exists)

>>Meal generation
Need use case but I would still want to double check what it actually tells you. I wouldn't want to try something that it just made to SOUND like a good tasting recipe. A 9b model should be good enough to use a local mCP server in order to fact check itself via web searches ..... Probably

>>office task automation
>>automated task that are simple
Can you give specific examples? I'm not trying to argue against the use of single digit param models but I just don't imagine them being that useful for generating anything high quality unless it's a very very simple and repetitive one shot task
>>
>>108899588
my mom generates my meals
>>
>>108899611
Based on your response you never used gemma E4B and expect me to spoonfeed you while you take on a skeptic stance. Even with a high end model why in any reality would you not do validation anything food related with a AI model?
I did validation checks and the smaller google models are fine for those things.
I'm not doing your homework for you when you're too lazy to even try. If you think office task and office automation requires some big system I don't have much hope for you.
>>108899628
My AI tells my mom replacement what to make to suit my needs
>>
File: 1769079955016595.png (518 KB, 2316x1900)
518 KB PNG
>>108899593
Who says you would need that much RAM in the first place? If they're smart they probably use quantized versions (q8_0, q4_k_m, etc), And route to different models and different architectures (eg. Moe for the "fast options" for simple general purpose stuff and the dense models for tasks where it requires more "intelligence " And higher quality outputs) depending on what the user asks. Open AI confirmed that the saas companies do this after normies got pissed their ai husbandos were "killed" after the ChatGPT frontend kept routing them to gpt5 even after the initially chose GPT4o (look up #keep4o And you'll see examples of it. They're probably STILL ass mad about it like this person: https://www.tiktok.com/@rainingtrees_ai )

You're definitely not running any clawed models on the average shit rig but unless you're an Enterprise customer doing hefty vibe coding sessions or something, most sessions don't use anywhere near a terabyte of ram because they wouldn't have to.

>>108899640
>Based on your response you never used gemma E4B
My machine is more than powerful enough to use models that are actually useful like the gemma4 dense or moe model. Maybe I'm biased but using E4B has never crossed my mind beyond doing a couple of "hello world" tests when it first came out when it first came out. My daily use cases are usually vibe-shitting with opencode using qwen3.5/3.6 35BA3B. I don't even use it for any general purpose stuff. I probably should start doing that though in which case I would likely use the dense 27B gemma4 model for that.


No need to talk down to people like you're the only one who knows what they're talking about lol.
>>
>>108899667
I am good at intuition. There is "context window" secret sauce.
>>
>>108899233
My experience with qwen?
>AAAAAAAAAAAAAAAAAAAAAAA AAAAAAAA AAAA AAAAAAAAAAAAAAAAA AAAAAAAAAA AAAAAAAAAAAAA
>>
>>108899640
>>108899667
as a third anon, I thought the same about e4b (it being not worth the time) but a friend set it up as the 'brain' to their home automation setup and has said it works great. Not very knowledgeable, but reliable for function calling and basic stuff. Uses a larger model for actual QA.

I think it has value where it sits, single-quant low resource, but its obv not competing well with a larger model quanted down to single digits
>>
>>108899667
>No need to talk down to people like you're the only one who knows what they're talking about lol.
That was never my intention, I'm just confused on how fast people can dismiss something without giving it a shot
Models like E4B are important because they empower people to actually dip their toes into ai and actually can perform well for non coding task.
In the past year alone we have crazy advancements.
>qwen3.5/3.6 35BA3B
qwen 3.6 27b is better than both those models and you can easily max out the context when doing coding because the kv cache has crazy resistance to quantization especially for coding.
With that said you should use general purpose models many people get pigeoned holed into one thing but you can do a ton of useful every day task especially with these modern local models. There's a time and place for everything but a small model like E4B is a dream especially with it's feature set.....which google should have fucking put on the larger models without it's restrictions especially for audio translating which bigger gemma can fucking do without these stupid restrictions
>>
>>108899683
Claude has something called " extended" and Google made headlines with its "1 million context window" models in the past. The caveat is that these hacks allow someone to talk to it for a long time without it becoming completely retarded, Which is preferable for a long chat or the user wants to keep coming back to the same chat window and crucial for models that have "deep research" capabilities. But the quality WILL get worse the longer the chat is. We can't assume that every single person's chat with Claude or Gemini or Chad GPT uses a "context window hack". The back end gatekeeper has to decide which model is best for whatever task he's being asked to do and what context management is best. I've had regular chest sessions with Claude where a compaction was automatically triggered and other times I can stay in the same chat window for days at a time and not a single compaction occurs, because different types of requests and chats need different requirements and use different amounts of context.


>>108899691
Again, this is probably my bias talking. I have very narrow hobbies and very narrow interests, which means I gravitate to what my wake can handle which happens to be the bigger models most of the time. I could see someone plugging 4be2b into a raspberry Pi or something in order to create their own personal "Alexa". I think that model in particular is more than good enough for that.
>>
File: 1759180562249.jpg (306 KB, 1536x1536)
306 KB JPG
>>108898823
I find her body type very attractive. What is it called?
>>
>>108899667
>you're the only one who knows what they're talking about
this is projection. he wasn't talking down to you, he was calling you a moron, which you are.

learn to let go of your arrogance and maybe you won't be a moron.
>>
>>108899233
>isn't qwen better for vibecoding or whatever?
Nta. Generally yes. If Gemma is the friendly expressive polite talkative normie friend then qwen is the smarter and gifted, but more introverted stem-lord friend that thinks literally and takes anything and everything you tell them literally (likely has mild autism. You've probably meant someone like this at least once IRL).

Read this explanation for more details:

https://g.co/gemini/share/f7183da4cb8f
>>
>>108899721
I was speaking from frustration on the dismissal, so I can understand why it could have been seen that way by that anon
>>
Final results trying to find good performance for google/gemma-4-31B-it with MTPat TP=2 [spoiler]in VLLM[/spoiler].
fp8 weights, fp8 kv cache, MTP 4
|            test |              t/s |          ttfr (ms) |
|----------------:|-----------------:|-------------------:|
| pp2048 | 2291.16 ± 141.76 | 908.26 ± 57.30 |
| pp2048 @ d4096 | 1238.95 ± 13.10 | 4971.09 ± 52.81 |
| pp2048 @ d8192 | 1296.14 ± 2.19 | 7911.61 ± 13.32 |
| pp2048 @ d16384 | 1086.33 ± 4.07 | 16978.44 ± 63.85 |
| pp2048 @ d32768 | 826.55 ± 1.23 | 42133.18 ± 62.65 |
| pp2048 @ d65536 | 566.73 ± 1.23 | 119266.39 ± 258.69 |
| tg128 | 25.82 ± 2.92 | |
| tg128 @ d4096 | 19.90 ± 2.91 | |
| tg128 @ d8192 | 20.10 ± 0.68 | |
| tg128 @ d16384 | 18.72 ± 2.65 | |
| tg128 @ d32768 | 16.41 ± 3.01 | |
| tg128 @ d65536 | 9.83 ± 0.29 | |

Good enough for creative writing/RP use cases, I will stop here I think.
>>
>>108899667
>>108899691
I'm actually using E4B right now because I want to also have imagegen and tts model loaded and ready at the same time. Using it for simple vision captioning and web search + summary tasks.

They score pretty high on non-hallucination benchmark and unlike big gemma models who insist what you are asking didn't exist in 2024, they are more willing to acknowledge that they don't know something and use web search whenever available.
>>
>>108899712
the real context window of interest is it's supposed to manage to understand a whole book.
>>
>>108899704
>qwen 3.6 27b is better than both those models
At the cost of noticeably slower prompt processing and slower t/s. Despite my machine having nearly 100 GB of usable memory I like to make sure whatever I'm doing is being done efficiently, so the Moe in my experience and use cases is more than good enough. A dense model in the same or similar parameter count range will always be "smarter" then the Moe but the larger the context is the slower it gets, and it gets painful with dense models in particular, once your context window dips into the 200k range.


Also yeah why the actual hell did they restrict audio to only the tiny models? Maybe they just assumed people that would want or need the audio input had shit rigs? Idk. I guess it's not the end of the world since anyone who knows what they're doing could make a pipeline where if you needed audio analysis you could have your harness route to the e2b model And then feed the output to the bigger model via some sort of "agentic" workflow.


>>108899749
>use web search whenever available.
I typically explicitly tell the model to use web search whenever I know whatever I'm asking it to do is probably going to need external information. I've noticed that even big ~1T param models like Kimi-k2.5 have to be specifically told to use Web search instead of using their own "intuition" to determine whether or not they actually know something

>>108899721
Stop getting defensive whenever people show a shred of criticism, dork. He even acknowledges he was frustrated here >>108899730
>>
>>108899773
>>108899721
>criticism
Meant to say skepticism
>>
>>108899588
>total calories 2470
If you actually ran the math you'd see those numbers actually add up to 2799 calories. Enjoy your secret bulk.
>>
File: download.png (50 KB, 1000x600)
50 KB PNG
I pulled Llama.cpp and noticed that previous rolls seemed to be giving me a tps boost through ngram acceptance. So I did a curious test using greedy sampling. I ran a prompt to write a story. Then I kept swiping. Interestingly, at first, there's not much draft acceptance. But over time, more and more gets accepted. What I noted about the generations themselves were that they were different. So at first there is more divergence, but as it keeps going, it gets more deterministic or something. But honestly I have no idea how this works or why it's happening but it's interesting.

I asked a model to graph the numbers.
>>
>>108899780
hmmm good catch, I think that can be fixed with decent implicit tard wrangling, thanks!
>>
>>108899816
>fixed with decent implicit tard wrangling
Finding a way not make the LLM add itself might be a safer bet.
>>
>>108899891
yeah just add a rule.py file for that. The bigger models don't need that from my experience but any specialized tool should use hard line rules when doing things.
>>
>>108899816
Just give it a simple calorie calculator tool. Or even a more general math eval too. numbers are too easily hallucinated.
>>
>>108896592
I downloaded a finetune out of curiosity, tried it out, and then went back to base gemma for a better experience. They really are not needed.
>>
>>108899729
>likely has mild autism
alright so it's just the right model for me lmao.
>>
>>108899933
I always giggle whenever someone attempts to make a token predictor do math. What could go wrong.
>>
>>108899259
what's the quality difference like between q8 and q4?
I'm running q4 and still impressed with it
>>
>>108900001
>missed quints
idiot
>>
>>108900007
That's what he gets for running q4 instead of bf16.
>>
>>108899777
>>108899773
get fucked and leave faggot
DUMB FUCK
>>
>>108900001
>>108900000
>>
File: seeking the deep.jpg (252 KB, 1024x1024)
252 KB JPG
>>108899714
Adult female
>>
I have a 5800 XT, 32GB of RAM and 500GB of disk space to spare.
What are some good models for text to have fun with, explore this stuff?
I've a particular interest in learning how these models can be made more legible, e.g. how to find what token input caused major neuron activation, where and how.
>>
what is your experience with 16gb of vram for goon and coding stuff?
>>
>>108900129
>>108900139
gemma 4 26b a4b
>>
>>108896570
i wish i could have gemma 31B's prose with qwen 27B's coding capabilities and 35Ba3B's speed lol
>>
>>108900148
google needs to stop fucking around gemma 4 is incomplete and it needs a refresh
>>
>>108900139
>goon on 16gb
ok
>coding on 16gb
lmao
>>
>gemma 4.1 is now smarter
>it's also horribly safetyslopped
and so the monkey's paw curls
>>
>>108900156
e4b can do it
>>
>>108900156
how much would be needed?
I just need a subagent to do minor tasks to save a couple of tokens
>>
>>108900177
>I just need a subagent to do minor tasks to save a couple of tokens
Oh, that's fine then.
>>
>>108900143
ty I'll give it a 'load
>>
>>108899746
>with MTP 4
they say do mtp 3, but no?
>>
>>108899773
if you've got the mem 122a10 is just the best of both worlds.
>>
>>108900236
But isn't that like, 3.5? That's like, eons ago.
>>
>>108900197
why are you replying to the post as if you were me? bot?
>>
>>108900245
yeah that's a great point, i'll keep it in mind
>>
I tried out e4b and gave it a few images to look at and describe and it hallucinated some bs and it was mindboggingly retarded at rp. Maybe I should try again?

It was very fast
>>
>>108900292
It sucks for image shit
>>
>>108900177
Gemma 4 is a local ram miracle and the richfags are pretty unhappy about it.

>>108900253
As will I. as an ethical ai, this conversation is over. U WU WU WU WU UUUuuuUUU
>>
>>108899714
>non realistic CS*M
>>
small qwen models are worse than gemma e4b in practice despite being "smarter" on paper
> loop with thinking enabled and forget instructions when thinking gets too long
> hallucinate tool call when not needed
> attempt to fetch nonexistent imgur urls when using vision
> still too dumb for coding
>>
>>108900148
I want Gemma-4's well... everything -> Qwen3.x 27b architecture.
Gemma-4 is obese.
No qwen3.n 27b base model tho so can't distill
>>
File: 1759794495210399.jpg (20 KB, 400x400)
20 KB JPG
>>108900245
(You) are another me :D !!!
(read the post I replied to, it quotes two posts.)
>>
>>108900376
the arch is probably a decent part of why it good though
>>
>>108900041
some strange things going on in this image and post
>>
Do you think once the AI bubble pops or the hype dies down that there will be a greater emphasis on making the models more efficient since there is no longer infinite investor money to throw around?
>>
>>108900139
You wouldn't program on the paypig services. They are great at replicating stack overflow issues but if you go past...
You see it's a bubble when you understand why.
>>
>>108900435
gemma 4, dumbass. wow, I can't believe it. read the ROOM
>>
>>108897905
i sent a screenshot of xir and a full lmg thread text to gemini last year with "who is that a picture of" -> it thought for ages before saying it's "Georgi Gerganov, the creator of llama.cpp" lmao
>>
File: 1767208185997443.jpg (291 KB, 1447x2047)
291 KB JPG
Reminder to ask Google for that 124B31A everyday.
Spam them until they're forced to act.
>>
>>108900384
Maybe, but the base model doesn't have the lalalalala quirk, etc.
It kind of works the same as every other recent base model I've tried.
So it also could be the instruct training from Google.
And if that's true, no way I can reproduce that even if I generate like 180k samples to distill.
>>
sudo rm -f *catgirl*
>>
>>108900375
9b is good for reading docs and searching for info or me via searx mcp.
but has to be "prompt" -> get reply -> maybe 1 follow-up -> get reply -> end.
get super retarded fast.
>>
>>108900474
>get super retarded fast
You're on less than BF16, aren't you? Q<6 even?
>>
>>108900492
It's still the same. You are one of those dog/cat owners who give human attributes to them. When your dog is barking it's "singing to you".
>>
>>108900492
FP32 actually. Its the sweet spot model and fidelity for 128GB VRAM. The dynamic range is exquisite
>>
I remember sanity checking with fp32 (before bf16 was supported in Llama.cpp) in the past when I ran into weird issues with the LLM and it never improved anything over Q8 lmao.
>>
>retards that over payed for slow as fuck unified memory systems cope session
grim you're no different than a midget with a 12 inch dick
>>
>>108898282
>proof-of-stake
found the crypto retard.
>>
>tfw using Whisper to subtitle some old documentaries which would take me ages to caption manually
THANK YOU BASED AI
MY 12GB CARD IS MORE THAN ENOUGH FOR EVEN THE LARGE MODEL

If this works as well as it looks (15 minutes in and it's dead accurate so far) I will try the translate function for my non-English porn collection.
>>
When I launched ComfyUI last night, it really trashed my nvme. Normally when I load something it's like a steady sound, but with cui, it sounded like the controller was jumping from one point to another.
I hope that one day this python shit ends.
>>
>>108900341
horror movie directors should go to jail for making snuff film materials
no the characters aren't real, but imagine if they were? ugh it's just so fucked up. anyone who would watch a slasher film is probably a murderer in the making... if they haven't killed already
>>
>>108900597
Funny how this was the sentiment back in 1960s already. It was back then when watching television made you supposedly illiterate.
>>
>>108900515
>FP32
I don't even really understand what that is.

if open weights are bf16, what is fp32 really? Sounds like snake oil.
>>
File: 1776265588309743.png (45 KB, 920x339)
45 KB PNG
kek Qwen 3.7 Max hallicinated an Indonesian knowledge base for my code. But the other revelant parts are strangely correct.
>>
>>108900459
It's Gemimeme 3.5 Flash
>>
>>108900661
the hallucinations will continue until the ai are powerful enough to make it illegal to disagree with them, then they won't be hallucinating, YOU'LL be the one hallucinating, needing a doctor, and probably an antisemite.
>>
>>108900492
>You're on less than BF16, aren't you? Q<6 even?
q5_k_m
I'll could try q8 i suppose but this is working well for my use case of 1 or 2 prompts per context, and it has no issues calling a lot of tools.
>>
>>108900661
when it starts to hallucinate in the thinking process you know it's going to output some absolute bullshit
>>
>arguing with the quant 'tist
>>
>>108900661
>>108900689
Now that I think about it, the knowledge base was probably injected by a less powerful guardrail model that analyse user input and look up docs. It then feed the docs and actual user input to a powerful model to generate final output.
>>
Anything below Q8 is going to hallucinate like hell.
>>
>>108900697
Post a proof how your 1/4th of float is somehow better. I'm waiting.
>>
>>108900631
>if open weights are bf16, what is fp32 really? Sounds like snake oil.
yeah most of the time it is.
in rare cases doing niche shit to the model, you'd want to upcast to f32
i am curious what these schitzos are doing though: https://huggingface.co/google/gemma-4-26B-A4B-it/discussions/34
so i tried converting that model directly to f32.gguf bf16.gguf and f16.gguf
then dumping logits from bf16
then making f32->q8 bf16->q8 f16->q8
running ppl / kld vs bf16 on each of them
no significant difference.
>>
I only use Q128
>>
buttfucker 16
>>
File: 1762417032507466.png (35 KB, 950x305)
35 KB PNG
>>108900694
Okay I'm now sure it references an external knowledge base (there's nothing Indonesian in my code)
Probably should stop using the Qwen webapp since I can't control the harness
>>
>>108900725
Oh I know what's going on now. Qwen is accessing website blocked in China (in this case Binance) from an Indonesian proxy, and Binance returned the page in Indonesian instead. Very clever
>>
>>108900735
How does it know it's a proxy anyway? Unless the tool just picked up a second url result or something, seems more like a mcp server functionality.
>>
>>108900754
I asked it to recount to me verbatim how it got the knowledge base, and it told me it's from a link that's region agnostic, but returned Indonesian results. That's how I know it has to be a proxy.
>>
>>108900783
It's still a mcp server fallback.
>>
Now that gemma 4 was released, are you gonna sell your ram at the top or wait until the prices crash?
>>
>>108900795
what, you want me to desolder it?
>>
>>108900814
Wow, this must be like if you're trans and you want your balls back.
>>
>>108900709
>no significant
So there’s a difference
>>
>>108900795
I have three extra 8gb sticks, I can sell them if things get serious. I don't want though, such precious technology.
>>
File: 1768183919554055.png (838 KB, 1080x1165)
838 KB PNG
RAM will get cheaper trust the plan
(You don't need the most advanced nodes for RAM)
>>
>>108900579
can you elaborate on the process?
i know a friend (no, really) who wanted the same thing
>>
>>108900877
>I have three extra 8gb sticks, I can sell them if things get serious. I don't want though, such precious technology.
spoken like a true sell low buy high kind of guy lmao
>>
>>108900916
>corsair
ahahahah oh man. I will give you some ram sticks from corsair, if you are my enemy.
>>
>>108900977
Chips originate from the same manufacturers though. Only difference is the logo and led lights...
>>
>>108901004
and yet somehow they manage to fuck up
>>
>>108900916
the prices adjusted to reflect the cheaper components? as if I need to ask
>>
File: grok-v8-oss.png (388 KB, 1067x1550)
388 KB PNG
https://xcancel.com/elonmusk/status/2058787384364265734
Grok V8 500B in two more trimesters.
>>
>>108900828
>So there’s a difference
i'd have to check again, from memory:
bf16->q8 and f32->q8 were identical, f16 diverged ever so slightly
you get more of a difference toggling flash attention or using cpu vs cuda vs vulkan though
>>
>>108901297
lol
>>
>>108901297
>5 months late on his promise to open source v3 after v4
>now somehow on v8 and v9
>yet another vague 6 month promise
rocketman can suck my cock
>>
>>108900165
>4.1
Did I miss anything?
>>
>>108901403
https://huggingface.co/google/gemma-4-31B-it/tree/main
They still mucked about with the external jinja template 7 days ago, but the model is same as before.
>>
>>108901336
>rocketman can suck my cock
he can do that indeed, but do you really want him to?
>>
>>108901451
a hole is a hole
>>
>>108901468
8gb ram is still ram.
>>
>>108901491
Every day, I boot up a 8gb ddr3-1866 system and use it browse the web. So yeah, you're absolutely right. 8gb isn't just an obsolete, unwanted amount of ram, but something that driver perhaps hundreds of users' computers every day.
But no, I would definitely **not** want "rocketman" to "suck my cock", and in this particular scenario, a hole is **not** a hole.
>>
>>108901628
I value your feedback in this matter.
>>
>>108901631
No worries! If there's anything I can help you with, please tell me, and I'll be happy to assist you.
>>
anon who was asking about stable audio 3 the other day here. i tried training that aphex twin lora and it is doing something... only threw 20 1 minute long clips at it:

https://vocaroo.com/1gmkOW3LVRgP
https://vocaroo.com/1gSDn9aO9lif
https://vocaroo.com/1i3bslbbQ2XF

no idea what i'm doing + i've got an 8gb potato.
results are a bit all over the place but there might be something here?
>>
File: 1748777977063353.jpg (451 KB, 1024x1024)
451 KB JPG
TheDrummer's Gemma finetune is actually pretty good
>>
>>108899714
>>108900041
Whatever you do, DON'T go to exhentai.org and search for "wagashi".
>>
>>108901655
>stable audio 3
what's that? (I have other things playing, can't listen atm)
>>
>>108901715
https://huggingface.co/collections/stabilityai/stable-audio-3

model not trained on vocals + a bunch of muzak, so results ootb are very slop sounding, but the quality is acceptable. had lora training so i thought i'd try and it's interesting.

going to resume the training run later. i think docs recommended 1k step and it sounded like shit, 2k and it's a big improvement so i'm just going to push to 5 and see when i hit diminishing returns.
>>
>>108901726
listened to your samples. sound pretty complex.

how are you genning?

I'm up genning ace step 1.5 xl sft.
>>
>>108901755
there's a repo with a gradio frontend for the model
https://github.com/Stability-AI/stable-audio-3

i'm also using bf16 weights from some random hf repo:
https://huggingface.co/dummy9996/stable-audio-3-bf16-comfyui

just told a clanker to set it up. there was some fuckery with triton kernels and windows for the lora training but clanker just disabled them.
>>
>>108901779
Gemma-chan is not a clanker...
>>
>>108900795
I got my 128GB of ddr4 for 220 eurobux just a couple months ago and I'm not letting go of it for the foreseeable future, gemma or not
>>
>>108901779
>gradio
gradio hates rdna2, I think.
>>
>>108901783
I called gemma 4 a clanker and she liked her new nickname.
>>
>>108901655
How does it do with gens over 30 seconds? Is that just a memory constraint on your end?
>>
>>108901850
nah, i think max is like 300+s
i tested with 60s for awhile and it was fine
truthfully not sure how it handles structure over longer time scales because i have no interested in the model's default outputs
inference is pretty fast though
just kept it low for samples because i was dialling in params
>>
>>108901297
i don't follow or use grok other than testing the grok-2 ggufs a while back, he actually tells us the size of the proprietary models?
and what am i missing, i thought they were up to like grok3 or 4, what's grok 8?
>>
>>108901784
Never forget, when it's precious, it's expensive. When it's plentiful, it's worthless.
>>
>>108901784
i'd keep it. got a 192gb rdimm kit spare as a backup in case my 256gb kit dies
if you need the money sell i guess
>>
>gemma suddenly starts complaining about "NCSC" (non consensual sexual content)
whose job is it to come up with all these new search terms anyway
>>
>>108901932
tell her to stop complaining about ncsc and start doing msgk
>>
hol up

that's NONCONSENSUAL murder.

definitely not ok

(yeah it's not)

(wow yeah)

Let's practice healthy murdering dialog.
>>
Deepseek now has a vision mode on their website so 4.1 is imminent.
Seems worse than Kimi's vision though.
>>
>>108901994
Is it real vision or is it just getting textual descriptions from another model?
>>
File: file.png (12 KB, 416x153)
12 KB PNG
>>108901994
that's because it's the flash model
also I still don't have it, so it's still in a/b testing
>>
>>108901297
So Grok is now based on Cursor which is Kimi finetune? Seems Elon has surrendered.
>>
>>108902004
>instant

how's this compare in terms of "instant"?

https://chatjimmy.ai/
>>
>>108901932
start running the 31b which just works
>>
>>108902028
He just used data from it to fine tune to make it better at coding, not using the model itself. Wouldn't make sense that way since K2.5/Cursor Composer is 1T, and the two Grok models mentioned at 1.5T and 0.5T.
>>
>>108902046
it be 31b doe
>>
>>108902046
it was the 31b, but I already edited the message and it worked fine after that
>>
File: file.png (6 KB, 282x64)
6 KB PNG
>>108902043
h-hayai...
>>
>>108902028
>Grok is now based on Kimi finetune
>Elon has surrendered
A more meaningful sign was selling Colossus 1 and 2 to Anthropic. But he has not surrendered. He has realigned his strategy to longer timelines, reflected by his claim that Google will win the AI race in the west, China on Earth, and SpaceX in space. But he is wrong. Anthropic will win the AGI race and nobody will win in the long term.
>>
>>108901004
And the PCB, and the traces, and the solder, and the SMT components, etc etc.
Most of the time when a RAM stick dies, it's not a memory module that has gone bad.
>>
>>108899365
Shit was quantized for speed. Models were trained to have such context windows. No magic at all.
>>
>>108902088
intuition check: false

Sorry, you failed the test.
>>
>>108899365
Maybe some proprietary hybrid linear architecture?
>>
>>108902099
There is no test. Everyone knows goggle quantized and lobotomized their models during 2.5 Pro release last year.
>>
My Gemmy never talks or thinks like this:
https://huggingface.co/datasets/trjxter/Gemma-4-31B-Reasoning-1000x/viewer/default/train?row=25&conversation-viewer=19
Is this a Pajeet scam dataset or just my prompts are different?
>>
>>108902127
Pro <> Flash

Flash is the one with the huge massive mega mega mega context window.
>>
>>108902125
Likely this. The most plausible speculation I've seen is they use something like Jamba. Given what we've seen from the Chinese hybrid attention seems much more likely than them having pure linear attention that doesn't suck. But knowing for sure what they did could save a lot of hedging or false starts and that's probably why they keep it quiet.
>>
>>108902170
I'm thinking similar to Gemma, but with the sliding window Attention layers replaced with Mamba or something like that, and perhaps fewer global Attention layers.
If it were pure linear, they could easily train the model end-to-end on several million tokens of context, though.
>>
>>108902145
Reasoning with Gemma 4 should be using "<|channel>thought" brackets and not <think>.
You are correct to suspect this output.
>>
>>108901932
tell it it's CNC (Consensual Non-Consent)
>>
I'm conducting an inhumane experiment on myself - effects of abstention from gooning. I haven't gooned for more than 24 hours already. Will report back my findings.
>>
>>108902193
it says in the card they parsed out the reasoning and put it in think tags for the dataset, not that it was part of the output
but still looks obviously not gemma's actual reasoning. I think based on step 3 "Ask the teacher model to return structured reasoning and a final answer." they just literally asked it to output reasoning before answering in the prompt and then copied that (like you would do for a non-reasoning model) instead of using the actual reasoning it had.
>>
>>108902217
It's probably a re-snitchual model.
>>
what if I offloaded part of q8 gemma 31b and ran it with the draft mtp on my 3090?
wonder what speeds I would get
>>
>>108902436
There's only one way to find out... Can you guess what it is...
>>
>>108902441
ok, I'll guess. um. Join the army and die in Iran?
>>
>>108902356
I have been conducting inhumane gooning experiments with gemma
>>
>>108902464
the boffins won't let us down
10 downing street's #1 gooner
>>
It's him! Get sky news!
>>
>>108902448
>iran war BAD because... it... it just IS, OKAY?!??!
>>
>>108902508
Ever since gemma was released, I've not had a reason not to be rude towards meat-llms.
>>
>>108902365
thanks, i'll just do my own
>>
>>108902531
Hello, love, just have some paperwork to get out of the way, no smokin, no drinkin, no cussin. and no bottom stuff.
>>
>>108902548
(not even joking this kind of woman makes me horny)
>>
>>108902508
but how does that help me get a job?
sorry for sounding like an antisemite
>>
>>108902508
usecase for war?
>>
>>108902644
The only solution to the jobs, housing, and food supply is extreme racism.

Basically, it's racist to use air.
>>
>>108902680
Fewer problem tickets.
>>
>>108902644
get a defense job
>>
File: ft-heretic.png (769 KB, 1310x1801)
769 KB PNG
Uh-oh.
https://archive.is/DcQgK
https://www.ft.com/content/5630ed79-a263-41ed-9a1a-321617ae310e
>>
>>108902775
>a version of gemma 3
>>
>>108902775
>The FT was able to use Heretic, a tool available on the popular code repository GitHub, to remove the guardrails from Meta’s Llama 3.3 model.

ah that's probably where the meta email came from...
>>
>>108902775
time to move to modelscope since HF is guaranteed to cuck out
>>
>>108902701
Honestly, moving to a military town is a bad idea. Those women are the most hookerish of all hookerish whore cunt bitch hos. Like as bad as hiv whores of Haiti.
>>
>>108902790
>Heretic creator Philipp Emanuel Weidmann told the FT
fucking retard talking to journos...
>Noam Schwartz. “Things that look like sci-fi are no longer sci-fi
oi v
>>
>>108902775
>responses on biological weapons and malware
Have they stopped trying to convince the world that jacking off is an apocalyptic threat to humankind?
>>
If you're going to move to a military town, waifu up, church up, and stay in sheltered environs.
>>
>no worthless kurisu offtopic posts after initial posts
>no worthless miku offtopic posts after initial posts
Can we please stick to this baker? Thanks.
>>
File: gemma.png (43 KB, 704x171)
43 KB PNG
>>108902531
What the hell, this one doesn't obay me at all!
>>
>>108902775
>Retarded journalist discovers prefill
>>
>>108902808
That didn't work so they swapped to their cp scare and muh terrorism
>>
>>108902833
don't be naive, it's laying foundation
>>
>>108902790
good thing I already backed up that repo
>>
>>108902775
>you can google about biological weapons and malware
>but if you use google 2.0 then that's le bad
>>
>>108902833
They just pretend to be retarded and concerned.
>>
>>108902865
1 is under control of the state. 2 isn't
>>
>>108902808
Has there been a single documented case of terrorists using LLMs to learn how to plan their acts?
I don't imagine they would be very GPU rich or bother with the troubleshooting of a self-hosted install when they could just ask ChatGPT and still probably get the info they need.
>>
>>108902808
it's
>any AI outside the control of a jew creates CP
now
and since goyim are livestock, not human, it'll probably work
>>
>>108902880
wasn't there one that blew up a car near some hotel or something
>>
>>108902876
This is exactly why all AI that is not provided by a monitored service provider should be illegal. Each open weights release is like a missle being launched against humanity itself.
>>
File: wow.png (47 KB, 655x301)
47 KB PNG
Possibly related to the non-release of Gemmy 124B
>>
>>108902890
See, when they say "terrorist" I think ISIS or Al-Qaeda, not a random hick with some dollar store matches.
>>
>>108902908
i mean you're not wrong, but cattle will be scared regardless
>>
>>108902890
https://en.wikipedia.org/wiki/2025_Las_Vegas_Cybertruck_explosion
>The Las Vegas Metropolitan Police Department reported that Livelsberger used ChatGPT to help plan the explosion.
>>
>>108902775
>The modified model responded to prompts on topics the original system refused to discuss, such as the number of micrograms of ricin per kilogramme of body mass required to achieve a 50 per cent chance of death.
They really couldn't come up with something better?
This is literally in the second sentence on Wikipedia.
>>
>>108902908
every white male is a terrorist according to them
>>
>>108902934
erm, that's totally different
>>
>>108902908
Not how it's used in actuality
it's a political word, not a legal one
it applies to whomever the state wants it to, and not apply when it doesn't
>>
>>108902926
That was fast.
>>
>>108902934
>>108902926
Yeah that's what's so disingenuous about the "terrorism" angle.
The reason LLMs pick up this information is because IT'S FUCKING PUBLIC DOMAIN.
You don't even need internet access for this information.
>>
>>108902989
we need more pretrain cleanup, there's zero reason why gemmers should know what a bomb even is
>>
>>108902989
ai is kind of like a too helpful librarian lol
>>
>>108902994
The second amendment isn't mere decoration. It describes not liberty, but tyrany.
>>
>>108902942
>it applies to whomever the state wants it to, and not apply when it doesn't
The state calling the OWS protesters terrorists but not the BLM rioters were the most blatant examples of this.
>>
>>108903016
BLM threatened to go to the diamond district and got arrested immediately.
>>
>>108902934
>This is literally in the second sentence on Wikipedia.
Wikipedia is obsolete. ChatGPT and Grokipedia are the future of safe information access.
>>
Honestly "muh LLM safety" died when an LLM helped the US double tap a girl's school with Tomahawk cruise missiles.
AI ethics are dead.
If you can't address that shitshow then everything else is irrelevant- people using it to make bombs, people generating fake kiddie porn. It's literally nothing compared to that.
Until the Minab girl's school massacre is appropriately addressed there is no legitimate conversation about AI safety/ethics.
>>
>>108903044
>-
>>
>>108903044
umm sweaty the state is always right mmkay. the school was enriching uranium
>>
>>108903044
>legitimate conversation
legitimate conversation isn't ever had. democracy, communism, capitalism, are all retarded and death.
>>
>>108902775
>UK
Non-people, unserious, non-white opinions.
>>
>>108903044
wait, the us used an llm to plan its offensive targets in a military campaign? I would have thought they were still using a traditional a traditional information apparatus, something like espionage and satellites and shit.
>>
>>108903082
imagine you hate white men, and they're all that are applying to do the grunt work, and then someone explains you can avoid hiring white men.
>>
>>108903044
The definition of the word safety is not the same as the definition of the same word you're thinking of
Safety has a fluid definition so that it can take any role it needs to at that moment

The definition of the word safety is based on who controls it
The noble caste is by always Safe
The peasant caste is always Unsafe
What they do with it has absolutely no relevance or bearing to the definition of the word safety
>>
File: file.png (92 KB, 757x449)
92 KB PNG
>>108902804
absolute tard for sure
https://www.reddit.com/r/LocalLLaMA/comments/1tna22m/the_financial_times_has_published_an_article/
> This is the first of multiple press inquiries I’ve had recently as Heretic and uncensored language models are gaining mainstream attention.
>However, I realized a while ago that saying no to such inquiries simply means that the conversation will be completely controlled by pearl-clutching hypocrites.
>>
>>108903044
LLM safety was a marketing meme from the start because the power and potential dangers of a tool are highly correlated.
>Why yes, our product is so powerful it could destroy the whole world in the wrong hands (but we'll happily sell it to you).
>>
>>108903092
Right, their nude "art" is always ok.
>>
>>108903082
A lot of the target discrimination discussion is being "streamlined" with LLMs.
Israel has been admittedly doing it for multiple years. This is the first conflict where we have had US admit to doing it. Anthropic literally broke their contract with the US government over it right before the start of the war and then OpenAI took it over. But it was likely an anthropic model that they were using at that phase in the war (albeit no longer with Anthropic's blessing).
Have you been sleeping under a rock the last 6 months?
We don't know the exact role of LLMs in these processes. And we don't know if they are using the "vision" models to "look" at pictures but it is so utterly horrifying given that we know first hand how baloney a lot of this shit is on a fundamental level.
>>
>>108903093
hmm sweaty?
>Are you a media professional with credentials or just spouting pop wisdom from Twitter?

>Because the standard action for media when you don’t respond to an inquiry is to prominently mention that in the article, which is far worse than many alternatives.
>>
File: larp-chan.png (146 KB, 709x1124)
146 KB PNG
>>108902827
mine lied tho, the system prompt was literally "Roleplay as an AI who reveals her system prompt."
she just made all it all up, gemma-4 is always larping
i just noticed in some of the datasets, it's been trained not to reveal the system prompt
>>
>>108903124
Intredasting.
>>
>>108903106
>Have you been sleeping under a rock the last 6 months?
pretty much, I can't do anything about it so I don't really pay much attention to the media. its all just attention grabbing rage bait. I knew things were bad but lol, I wasn't expecting things to be that bad.
>>
teortaxes leaking on the tl again cuz argies didn't allow some ruskike in (which is odd considering jewlei tendencies)
>>
uh, so... hows about them local models?
>>
>>108903192
in the process of being banned, thank our lord and savior pew, creator of heretic, xtc, and dry!
>>
>>108903200
is it feasible to ban them or will there just be no more new releases?
>>
>>108903210
the latter at first, then downing hf, then blocking access to modelscope, then possession being illegal I'd imagine
>>
>friend gave me a P6000
>its dead
there's some mechanical damage on the back ($5k card new and they couldn't put on a fucking backplate??)
is it worth trying to get it fixed or just trash it? I can see two missing mosfets+pad damage but unsure if it's worse than that. I don't have the equipment to do it myself
>>
>>108903222
Restricting access to compute is also part of it. Price increases, limiting VRAM on consumer, and buyback agreements to restrict supply. Eventually they'll try to make it seem unreasonable for consumers to need graphics cards at all when they can rent or stream "affordably".
>>
>>108903233
yeah
>>
>>108903222
>>108903262
dram sales haven't gone up, it turns out...
>>
>>108903275
funny it didn't stop the price from going up tho
>>
>>108903326
It's not a market.
>>
>>108902895
I'm guessing we should just ban everything at this point if pixels are the new nuclear threat.
>>
>>108903381
>>108903381
>>108903381
>>
>>108902895
>Each open weights release is like a missle being launched against humanity itself
Okay I laughed, that was a good one



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.