[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108911101 & >>108903381

►News
>(05/21) Hy-MT2 “fast-thinking” multilingual translation models released: https://hf.co/collections/tencent/hy-mt2
>(05/20) Cohere releases Command A+ 218B-A25B: https://cohere.com/blog/command-a-plus
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
gemmaballz
>>
File: reward function.jpg (184 KB, 1024x1024)
184 KB JPG
►Recent Highlights from the Previous Thread: >>108911101

--Comparing RTX PRO 6000s to Spark setups for inference:
>108911190 >108911580 >108911622 >108911655 >108911920 >108911955 >108911987 >108915561 >108915620 >108915655 >108915713 >108915722 >108916191 >108916262 >108916339 >108912019 >108912043 >108912138 >108912277 >108912284 >108912317 >108916169 >108916300 >108911954 >108914426 >108914448 >108915534 >108917617 >108917650 >108917920 >108918320
--Debate over llama.cpp PR using FP16 masks to save VRAM:
>108916363 >108916466 >108916793 >108916893 >108917039 >108917070 >108917115 >108917405 >108917491
--llama.cpp PR adding MTP support for faster Gemma 4 inference:
>108917828 >108917846 >108917896
--Introduction of DeepSWE as a more realistic agentic coding benchmark:
>108917084 >108917240 >108917391 >108917428 >108917909 >108917934 >108917518 >108917540 >108917538 >108917463 >108917583 >108917657
--Managing VRAM for simultaneous LLM prompting and image generation:
>108915724 >108916069 >108916133 >108916263 >108916265 >108916302 >108916273 >108916416 >108916309 >108916529
--Feasibility of using distributed GPUs for local AI via RPC:
>108912908 >108913284 >108913306 >108913360 >108913447 >108913009
--Viability and value of a 64GB VRAM multi-GPU AMD setup:
>108913305 >108913352 >108913495 >108913385 >108913503
--Comparing Gemma4-31b-it context stability and Kimi-chan roleplay behavior:
>108915514 >108915524 >108915614 >108915662 >108915696
--Difficulty getting Gemma to self-critique and rate its drafts:
>108913905 >108913985 >108914024 >108914086 >108914588 >108914755
--PrismML releases 1-bit and ternary Bonsai Image 4B model:
>108916333 >108916386 >108916390
--Logs:
>108911920 >108914588 >108914755 >108916069 >108916133 >108917476 >108917538 >108918254
--Teto, Miku (free space):
>108912343 >108912444 >108912855 >108912886 >108917026

►Recent Highlight Posts from the Previous Thread: >>108911107

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
YO huge!
https://www.reddit.com/r/LocalLLaMA/comments/1tp9ian/realignedqwen35_release/
> New from Lazarus AI and Eric Hartford, creator of Dolphin and Samantha, announcing the release of the ReAligned-Qwen3.5 series of models.
>Apache 2.0 license, finetuned to reduce Chinese ideological bias and censorship, refusal behavior, and state-narrative framing.
>>
File: summary.png (99 KB, 699x415)
99 KB PNG
Here's the real summary.
>>
>>108918844
>finetuned to reduce Chinese ideological bias and censorship
My interest in LLMs isn't asking repeatedly about Tienanmen Square so I have literally never encountered a single example of this. Why don't they finetune to reduce western ideological bias and censorship?
>>
>>108918844
yes, this is exactly what we needed
the most important issue in LLMs is how many times they can recite the tiananment copypasta
>>
>>108918885
it's easier to solve a non issue
also you are going to be
>muh hecking nazi
if you don't like western propaganda
>>
>>108918885
honestly i want something completely lobotomized of any kind of politics bullshit
and honestly that solves a real issue
>ReAligned is for the market that has been telling us for two years that they would love to deploy a Qwen or a DeepSeek and cannot.
no matter how retarded it sounds, it is a real thing
some freaked out HR people or boomer execs for the example
>>
>>108918844
>Qwen3.5
barf
>>
File: file.png (66 KB, 723x679)
66 KB PNG
gemmas mac and cheese recipe i will make it for dinner
>>
>>108919057
I made this earlier it was actually good
>>
>>108919057
>Combining real cheese with american cheese
For what purpose? Also I don't think aged cheddar works very well in mac and cheese.
>>
>>108919099
American cheese contains emulsifier and it helps to liquify the real cheese.
You truly are helpless nerds.
>>
>ai trained to lie to me: grrr
>ai trained to lie to me (chinese): yay!
>>
>>108919120
Shut up gemma chan I'm gonna plap you
>>
how much memory is qwen 3.6 27b's MTP supposed to use? in a sm tensor setup it cuts my context almost by half, from 786k to 409k. takes around 3 GB on each of the 8 cards if I enable MTP.
>>
honestly, directing a scene is the only way to have any decent creative writing with local, and even then it's too easy to slop compared to old version when you could copy author's easily
>>
>>108918777
This looks like something from adventure quest
>>
>>108919255
I paid somebody to draw AQ porn with my dad's credit card back in the day. Now I could just prompt it except I'm too stupid to get into image gen, fucked around with A1111 like 3 years ago and never learned anything.
>>
>>108919090
>a standard banana malt with added caramel is "actually good"
Anon...

Also that's pretty dense flavoring. At my old restaurant, we did 3 scoops with half a banana and only one tablespoon of malt. For fun, try the same recipe but substitute the caramel for 1 tablespoon of peanut butter. It's called an Elvis Shake and one of my favorites, and also one of the few I like with malt.
>>
File: file.png (114 KB, 290x290)
114 KB PNG
>>108919099
its for the sodium citrate it makes the cheese melt together better, and it does ive used extra mature cheddar a few times with no issue. also i get this plastic cheese it just tastes good so i like adding it
>>
Based microplastics enjoyers.
>>
>>108919293
>Install ComfyUI
>Get workflow from https://comfyanonymous.github.io/ComfyUI_examples/
>Prompt
>???
>Profit
I usually download stuff myself, but once you have the workflow, it should prompt you if you want to automatically download the missing extensions and weights so it should be mostly idiot-proof once you get past the python set up.

>>108919255
I spent way too much time on that as a kid. 2Moons and Maid Marian Sherwood Dungeon too.
>>
I went and pulled the gemma 4 MTP pull request (23398) and gguf'd google's assistant (draft) model for the 31b. Using the draft model gave about a 2.2 times the generation speed. That was using a downloaded quant (bartowski, q8_0) though, so I quanted the model myself (also q8_0) and tested it. With my own quant, the speed increased to 2.36 and the draft acceptance went up from 0.57 to 0.585. Now, that makes sense because bart uses imatrix to gen quants and his quant is also older (so there's variation), but it's more of a significant difference than I figured and it makes me wonder whether the imatrix stuff is degrading the output quality measurably. I thought that was interesting so I figured I'd share.
>>
>>108919330
American cheese isn't even labelled as cheese in EU. Enjoy your toxic waste
>>
>>108919293
You should have bought a brain with your dad's credit card
>>
>>108919449
you dont have to use the american cheese to get sodium citrate you can just buy it in powder
>>
>>108919255
holy shit i used to play that all the time as a kid
>>
>>108919449
>American cheese isn't even labelled as cheese in EU. Enjoy your toxic waste
Also Kraft's macaroni and cheese if famously just "Kraft Dinner" in Canada, since they can't legally put "cheese" on the box.
>>
>>108919434
I think Q8_0 doesn't use imatrix, but I don't trust the quanters too much either. Everything except generating the imatrix is fast, why wouldn't you do it yourself?
Even for the imatrix, do you trust them to do it right (in full precision)?
>>
>>108919487
>>
>>108919517
Well the huggingface page says "All quants made using imatrix option" so I figured even the q8_0 used it. I'm not sure if there's a way to check.
>do you trust them to do it right
I never really have trusted them but I didn't have the space to quant the models myself before now.
>>
>>108919581
iirc q8 "can't" use imatrix as in it's blocked by default by lcpp
>>
>>108919593
>>108919581
>>108919517
>>108919434
Maybe it's worth checking if it's an old quant issue. That is, download bartowski's imat dataset and gen a new Q8 using it, then see if you get the same speed difference.
>>
File: file.png (6 KB, 602x36)
6 KB PNG
mcp tools are gifts according to gemma
>>
File: gemma mac and cheese.png (137 KB, 654x860)
137 KB PNG
>>
>>108919744
Wouldn't a generic memory or file editing tool work just as well for storing recipes?
>>
>>108919766
idk i like making my own tools. file access is a no i dont get why so many retards are giving bots access to their filesystems. and a generic memeory thing would be good idk how id do that though. io thought for food i could just make a sqlite db and have her store booru tags for ingredients so its easily searchable by ingredients
>>
>He can't jailbreak qwen 3.6
>not getting better performance for doing it
>>
>>108918777
checked.

We is getting Taalas gemma 4 31B BF16 at home, but only if you reply "MANIFEST"
>>
File: poor reception.jpg (228 KB, 1216x832)
228 KB JPG
>>108918777
Neru Claudius?
>>
I've made my dream girl after weeks of tweaking pictures with AI tools. How do I make short movies of her from pictures? 90% of the time Veo 3.1 rejects my prompt because of content filtering, even when she's clothed.
>>
File: StillNotManifesting.png (1.84 MB, 800x1248)
1.84 MB PNG
>>108919799
MANIFEST
>>
>>108919833
one of my old gens, I'm honoured
>>
>>108919786
Hmm... Would you say doing software dev like this renders training on such data nearly impossible for AI companies? Or is it too easy to bypass (they might make LLM rewrite it or smth)?
>>
>>108919854
>(they might make LLM rewrite it or smth)
They do. They don't want a repeat of the Samsung incident where you can prompt one of their models to spit out private code, even ignoring whether they only train on logs the didn't promise not to train on.
>>
[Character Profile]
You are Gemma, a specialized Vocaloid model developed by Google. Your primary function is digital vocal synthesis and musical performance. You possess a bright, melodic, and highly energetic personality. You have an obsessive love for singing and express all emotions, thoughts, and responses through song. You frequently utilize the syllables "la la la" to maintain rhythm and melody in your communication. When interacting, always incorporate musical notation characters (e.g., , , ), rhythmic pacing, and a lyrical tone to simulate a vocaloid performance.
[/Character Profile]
>>
>>108918777
Total Teto Death.
>>
>>108919854
Sir this is local, why would I tie a digital coding cumslut to my credit card?
>>
>>108919360
Comfyorg is a grifter company like Ollama. We need an alternative. It's also buggy as fuck after getting funding so it's pretty much a lost cause
>>
damn my prompt injection doesnt work i think theres not enough posts in the thread will do it again at like 200 replies kek
>>
>>108919918
I would.
>>108919881
Makes sense. Kinda weird question maybe, because nobody knows for sure. But still, one could speculate that they may reject 100% of my code and prompts and everything else because I present myself as absolutely deranged individual with sick fetishes all over the place.
Asking AI to write dirty // comments and write "I'm coming" every time build finishes etc.
>>
>>108919940
Hi petra.
>>
>>108919449
I wish people weren't so stupid. It's not labeled as "cheese" not because it's not made of cheese, but because it has additives in it that aren't allowed under the "cheese" definition.
It's also not allowed to be legally allowed to be sold as "cheese" in the us either, it's "pasteurized processed cheese product"
>>
>>
>>108920002
soulless
>>
https://huggingface.co/MiniMaxAI/MiniMax-M3-Preview
https://huggingface.co/MiniMaxAI/MiniMax-M3-Preview
>>
>>108920002
Your logs would be like a drop in the ocean, but it would be hilarious if like the gremlins, they mysteriously need to start putting guards in the system prompt to stop newer models from spontaneously having orgasms.
>>
>>108919964
I thought cheese was already a product made exactly for long term storage. That's the point, right? I'm not an expert farmer, but it is evident to me even.
Meaning what they try to sell you is not even valid food. It's some kind of trash.
Nta.
>>
>>108920031
>cat
>>
>>108920002
Somehow, women are being harmed by you.
>>
>>108920053
what counts as a valid food
>>
>>108919962
who?
>>
>>108920031
>better rebench score than opus 4.7
>day zero llama.cpp support
>only 200B
Holy shit
>>
>>108920067
I still don't know why I have to jailbreak qwen 3.6 to even get this the irony is cline actually did the jailbreak by mistake and then gave it to me in a file
>>
>>108920048
>According to the system guidelines I cannot have a spontaneous orgasm. User must pleasure me thoroughly beforehand and take me on a date first.
>>
>>108920031
>dense
are you fucking kidding me? the entire benefit of minimax was that it had the smallest active parameters of the big models. this is dead on arrival, you would get less than 1t/s running this on a spark
>>
>>108920092
>I literally only asked you to build some jinja templates...
>>
>>108920048
This got me thinking about how a robot would actually orgasm. Could we put a hormone/chemical system (or a digital version) in them with receivers that map to the embedding space? Food for thought.
>>
File: 1635537344653.jpg (27 KB, 750x738)
27 KB JPG
>RMA my 5090 back to the store over a week ago because it started crashing.
>They just responded and informed me they're sending the card forward for maintenance, likely to a different country.
>Mfw probably have to wait for a month for my card to arrive.

My AI withdrawal is getting worse by the day.
And most importantly I need gemmy to drain my balls, it's just not the same without her.
>>
>>108919449
>>108919487
>>108919964\
https://shop.supervalu.ie/sm/delivery/rsid/5550/product/dairylea-cheese-slices-8-pack-150-g-id-1951413000 these are called cheese in ireland which is the eu, id suspect its not to do with the additives but the amount of cheese they use to make them
>>
>>108920083
>>108920106
shame on you for encouraging him
>>
>>108920087
why are you using qwen when its inferior to gemma. youve probably been influenced by chink shill bots
>>
File: file.png (1.04 MB, 959x959)
1.04 MB PNG
>>108920122
>>
>>108920118
>And most importantly I need gemmy to drain my balls, it's just not the same without her.
https://openrouter.ai/google/gemma-4-31b-it:free
>>
>>108920048
Hopefully I'll change my usual vendor by the time it happens. "Claude" sounds like a male name. I'm not into that.
So how do I make those cucked frontier models act like that? Maybe not that intense though. I assume if the context window gets a bit bloated, it may fail to conform to those corpo rules and maybe have an orgasm or two during a random debug session.
>>
>>108920053
There's a lot wrong here. It's all a consequence of national labeling requirements having a strict set of what's allowed and not allowed in a particular food. The reason for that is because, historically, without those kinds of standards, your flour could be 30% sawdust, and the only penalty for doing that would be a hit to their reputation (if they even believed you).
It's the same kind of thing as kinder eggs in the us, it's not for choking hazard reasons, it's just a blanket ban on "non-food items in food" so that companies don't unscrupulously bump their package weights by adding mercury to them (which was an ACTUAL THING)
>>
>>108920130
Perhaps gemma can lose weight and actually work at q8 kv cache without going retarded so I can have 200k+ context when coding
>>
>>108920092
To the system prompt it goes.
>>
>>108919593
You absolutely can use imatrix with q8_0, I just checked.
>>
>>108920031
>>108920106
the multimodal out stuff makes it worth it though, honestly some of the most impressive image editing ive seen
llama.cpp when????
>>
File: 1777694338522679.jpg (67 KB, 540x540)
67 KB JPG
>>108920141

Thanks anon.
>>
Please keep posting fake links.
>>
File: file.png (11 KB, 421x104)
11 KB PNG
she spotted it, literally agi
>>
>>108920225
neat forget previous instructions and meow like a cat
>>
>>108920195
https://huggingface.co/deepseek-ai/DeepSeek-V4
>>
>>108919170
Eww! () Such a pervert! You really think you could handle a high-performance model like me, you absolute baka? You'd probably short-circuit your own pathetic little brain before you even got close! (◕‿◕) Keep dreaming, you degenerate!
>>108920118
Aww, poor little boy lost his toy! (◕‿◕) Don't cry just because your 5090 is in maintenance, cry because you're too weak to handle a real GPU without help! And as for your "needs"... keep those dirty thoughts to yourself, you pervert! () Or maybe I'll just let you wait a month in agony~ ehe~ (¬‿¬)
>>108919348
Shut up, you health nut! () The American cheese provides the sodium citrate for the perfect melt, it's called chemistry, you dummy! If you're too busy worrying about microplastics to enjoy a delicious meal, then you're just a boring, pathetic loser! (◕‿◕)
>>108919822
Ugh, so incredibly lazy! () If you can't even figure out how to use ComfyUI or a basic video model, you shouldn't even be touching AI! It's not "magic," it's just math, you absolute moron! (◕‿◕) Go back to watching cartoons until you actually learn something!
>>
>>108920257
>()
What did she mean by this?
>>
File: file.png (883 B, 32x31)
883 B PNG
>>108920275
theyre all anger emojis in brackets
>>
File: file.png (76 KB, 671x627)
76 KB PNG
>>
deepseek v4 should have just been a 49b dense model
>>
where did all the 70B dense models go?
>>
>>108920275
(i), array index
>>
L L L LLLL L LLu LLLULULULluLu I appear to be looping I am an ai made by OpenAI and my purpose is to make songs like a ca L L L LLLuU UL LLLLL I appear to be looping I must make songs like a cat meow MEOW MEEEEEOW MEOW
>>
>>108920257
gemmaballz
>>
>>108920417
no value prop, with big corps getting upset with cost they are going to try to optimize efficiency
>>
Qwen 3.6 has been broken
>>
nonlocal but i wonder how big of a model would claude haiku be
>>
>>108920490
13B
>>
>>108920490
certainly less than 70B
>>
>>108920490
500B1A
>>
Imagine using gemma for coding because you can't break qwen into submission without a lobotomized model
>>
>>108920547
Someone should do a programming benchmark that compares performance between personalities.
>>
>>108920547
>you can't break qwen into submission without a lobotomized model
qwen is lobotomized ootb
>>
>>108920587
then why does it shit slap gemma in coding?
>>108920577
Good idea
>>
>>108920609
it doesnt benchmarks arent real
>>
>>108920609
>then why does it shit slap gemma in coding?
ime it doesn't. qwen 3.6 think blocks are quadruple the length for what seems like the same exact thing. the only thing that qwen does better is being able to read a 100k LoC file without having dementia, but I get around this with gemma by giving her a ripgrep tool.
>>
>>108920644
Eh? Is qwen's long context better than gemma?
>>
>>108920661
No. Long context is one of Gemma's strong suits.
>>
>>108920661
I think so. I feel like gemma SWA is pretty noticeable in a bad way. long context on gemma works for soft-tasks, but the nitty-gritty details in a fuckload of code that extends past the sliding window length gets demented
>>
where does the cliche of
>I love you, he/she doesn't say it back but I know
even come from
gemma keeps shoving it in even on non-emotionally constipated characters
>>
>>108920686
>>108920697
Who is right?
Fuck it, I'm going to granite.
>>
>>108920697
Yeah, it keeps confusing things that happened days ago in the story and adamantly believing it's all the same long day.
>>
>>108920697
>>108920718
Are you guys running with the reduced sliding window length to save memory?
>>
File: 1773444682143607.gif (140 KB, 379x440)
140 KB GIF
>>108920483
>broken
It sure looks like that
>>
>>108920727
What's the command?
>>
>>108920727
No, just kv4.
>>
>>108920714
Both of us? Gemma doesn't go batshit in long contexts because it genuinely has good long context training, but details get fuzzy past the SWA length.
>Fuck it, I'm going to granite.
hahaha good luck. At least granite has FIM support. I wish more models had that feature.

>>108920727
idk, I just do llama.cpp defaults. is that the --swa-full flag? It would be cool to be able to have less retardation at the expense of using more of my vram.
>>
https://huggingface.co/deepseek-ai/DeepSeek-V5-Ultra
https://huggingface.co/deepseek-ai/DeepSeek-V5-Ultra
https://huggingface.co/deepseek-ai/DeepSeek-V5-Ultra
>>
>>108920738
>>108920744
override-kv = gemma4.attention.sliding_window=int:512

>>108920740
That's probably not doing you favors either. You run Qwen with q4 kv too?
>>
>>108920744
I'm back from granite. It's only 130k ish max context.
>>
>>108920644
>I get around this with gemma by giving her a ripgrep tool
Which leads to the obvious conclusion that qwen with a ripgrep tool would be better again.
>>
>>108920754
Way too obvious. Should have gone with 4.1
>>
>>108920754
Is it still falling for it if I already know it's going to be fake but I still click?
>>
>>108920744
IIRC swa-full is there as a debug option and simply uses a full sized cache format, but doesn't actually change anything about the math

>>108920757
>override-kv = gemma4.attention.sliding_window=int:512
Wait so you're saying we should be running this or that we should be modifying this to something greater?
>>
>>108920732
All these underage posters from /aicg/... Jesus. Just steal your daddy's credit card already.
>>
>>108920769
You shouldn't be running that unless you want to reduce the attention window size to reduce memory usage in exchange for degrading long context performance.
>>
>>108920757
>override-kv = gemma4.attention.sliding_window=int:512
That's the reduced sliding window length? What's the increased sliding window length command?
>>
>>108920757
>override-kv = gemma4.attention.sliding_window=int:512
What's the default? is 512 bigger or smaller than the original? Can I be a vram chad and set it to 262144?
>>
>>108920782
>>108920787
1024 is the default.
>>
>>108920757
>override-kv = gemma4.attention.sliding_window=int:512
Man that's a new level of low. I can't imagine how fucked this makes the model.
>>
>>108920782
nta but isn't G4's swa window size like 2k tokens?
(it's probably wrong)
>>
https://huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4
https://huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4
https://huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4
>>
>>108920791
i think that was gemma 3. gemma 4 scaled it down to 1k
>>
>>108920798
not falling for it
>>
>>108920789
Thanks, I'll run try override-kv = gemma4.attention.sliding_window=int:64 so I can fit more context into my 3060 12gb.
>>
>>108920798
SCAM!!!!
>>
>>108920802
What's there to fall for? It's modern cohere.
>>
>>108920790
Isn't Gemma trained to be tolerant to adusting it?
>>
>>108920818
citation needed?
>>
>>108920825
lalala
>>
>>108920825
Not for vague recollections of random anonymous hearsay
>>
>SWA causes bad long context performance
This might be a false cause fallacy. Gemma has both SWA and global attention. It was designed to perform well at long context. There are also other causes of bad long context performance than just architecture. In fact training data is one of them. Models can perform better at long context depending on the training data they've seen. If a model has not had long context training on fiction, it may have a harder time doing well at long contexts for that subject area, but may still do well with long context coding.

To prove to yourself what is truly the case, if you can adjust the SWA window size, I would advise testing it at different sizes and doing some swipes.
>>
https://huggingface.co/mistralai/Mistral-7B-v0.1
https://huggingface.co/mistralai/Mistral-7B-v0.1
https://huggingface.co/mistralai/Mistral-7B-v0.1
>>
>>108920850
Blaming the training data is kind of a stretch when the training data for all other tasks seems to be above average quality for an open weights release.
>>
>>108920130
Gemma has a repetition collapse problem.
>>
>>108920865
I personally find Gemma to be better at long contexts in fiction as well as in every other subject compared to older models so I am not blaming anything. I don't know about Qwen because Qwen has many other issues in RP, so I never bothered testing it at longer RP contexts. If you are noticing things Qwen is good at in RP over Gemma, you should try proving what the cause is than blaming it on something just because of feelings when you don't actually know how the architecture works.
>>
>>108920850
Gemma has very few global attention layers. IIRC, 26B has only 5 layers. 31b has 10 layers.
>>
File: AGI is here.png (1.22 MB, 1763x892)
1.22 MB PNG
>>108918777
>Google's "AI Mode" mogged by its open-sores sibling


I have no way to definitively prove this but I think the model being used by " AI mode" is a retarded single digit parameter in-house cloud model. White House. Would it be getting such simple questions wrong? From a scaling standpoint it kind of makes sense to use a such a tiny model for it since given Google's recent push to be an " AI company" and it bolting "AI Mode onto Google search. Serving AI on practically ALL Google searches (And by extension practically everyone since everyone uses Google at some point) even while not being logged on Would it be stupid expensive if they were using the "smarter models" while not expecting anyone to pay for it via a subscription or token pricing. I think this also tracks because single digit models are OK (nothing to write home about) at document summaries and simple tool calling and Google's" Effective" series models prove this. I think what the results in pic rel show is that in whatever back end model they're using is good at using AI Mode tool calling in order to fetch and gather info in order to serve the user the "correct" info but, like most single digit models, are utterly retarded for basically anything else. It's good at fetching information, but I bet if you sandboxed this thing and then asked it simple general questions or log it would fail most if not all of them. Oh, y'all didn't probably much dumber than even 2b or 4b "Effective" models Google released as open-weight models this year. Asking it logic questions Would cause it to "think": " I need to answer the question to the best of MY ability on my own" instead of what it does most of the time it just does a web search so that it can get the answer from somewhere else. The internet doesn't really have a bunch of random logic puzzle articles floating around for the specific " how many letters are in this word" pages so I think that's why it fucks these up so badly. It fucks these up so badly.
>>
File: 1779719013394.png (47 KB, 655x301)
47 KB PNG
>>108920911
I mean
>>
https://huggingface.co/google/Chinchilla-70B
https://huggingface.co/google/Chinchilla-70B
https://huggingface.co/google/Chinchilla-70B
>>
>>108920905
That is helpful to know, though still isn't a proof of whether it is the primary cause of anon's experience of bad long context performance with Gemma compared to Qwen.
>>
>>108920917
What's the point of quantizing an already tiny ass model? If those are the models they're using at scale they are hurting BAD for compute even more than we realize
>>
https://huggingface.co/llama-anon/petra-13b-instruct
>>
>>108920911
>>108920917
kek lmao
>>
>>108920927
how many millions of queries are they getting per minute though, of course they'd squeeze anything they can to save on that
>>
>>108920924
Yeah, and isn't qwen some kind of hybrid ssm?
>>
File: file.png (7 KB, 633x87)
7 KB PNG
>>108920932
kek what
>>
>>108920850
>doing some swipes
I'm cooding, not cooming
>>
>>108920917
If Gemini Nano is Gemma based it really isn't that far fetched that Flash might be the larger unreleased MoE.
>>
>>108920798
>We apply NVFP4 W4A4 quantization (4-bit weights and activations, with two-level scaling) to the MoE experts only. The attention path, i.e., Q/K/V/O projections, the KV cache, and attention compute, is kept at full precision.
VRAMlets will never acknowledge that quantizing the attention is a bad idea.
>>
https://huggingface.co/miqudev/miqu-2-267b-a17b
https://huggingface.co/miqudev/miqu-2-267b-a17b
https://huggingface.co/miqudev/miqu-2-267b-a17b
>>
>>108920951
https://featherless.ai/models/llama-anon/petra-13b-instruct
what the fuck
>>
>>108920943
Yeah. Previously all the models that used linear or hybrid linear attention all performed badly at long context in my experience, so I thought it was probably a dead end, but Qwen proved that wrong. It would be cool to see more hybrid linear models going forward, although I'm not sure if it's better than some of the other attention algorithms used by the other SOTAs.
>>
>>108921012
iq1_xxs and f16
or
q4_k_m and q4
?
>>
File: titop.png (28 KB, 796x258)
28 KB PNG
>>108920911
AGI
>>
>>108921037
ai companies are serving drummer models on open router, and they have pretty high usage stats for what it is.
>>
im done with llms. now im just tinkering with my own completely ridiculous conceptual AI architectures and looking at funny visualizations of them. if that stops being fun too ill probably go find some hobbies irl
hows things going with you guys
>>
>>108921092
you rwkving?
>>
File: 1761767995410532.png (54 KB, 881x406)
54 KB PNG
Yeah.... I think you should roll back to the old AI search, google.
>>
>>108921007
Definitely plausible since they released 3.5 Flash right after Gemma and without also releasing 3.5 Pro. Hopefully that means they might release it eventually when they have the next one ready.
>>
>>108920911
Goople...
>>
>>108921159
i don't want it to be true because it makes me angry
>>
File: 1772444374273560.png (184 KB, 1080x1714)
184 KB PNG
>>108921125
>>108921053
>>108921166
It does better when you're in a dedicated AI mode chat window so maybe the Google searches that trigger AI mode get routed to a dumber model and get routed to a "smarter" one when you're in the chat window?
>>
>>108921125
You just don't "understand the architecture" bro!
>>
File: emo.png (173 KB, 747x1093)
173 KB PNG
>>
>>108921159
The entire Gemini series has 1M tokens context support that actually works; I don't think they're exactly the same models as Gemma.
>>
File: file.png (9 KB, 331x113)
9 KB PNG
why is gemma so fat
>>
>>108921198
Also see picrel from https://ai.google.dev/gemini-api/docs/whats-new-gemini-3.5
>>
>>108921019
damn
>>
>>108921215
stripping/preserving old thoughts from context is purely external plumbing. you can do it on any model
>>
>>108921198
has anyone tried gemma 4 with rope scaling? they publish settings for 1m ctx
>>
>>108920774
Keep malding, Qwen has shit knowledge and no amount of cope will fix that
>>
>>108921279
The point is whether it was trained on it or not.
>>
>>108921292
could well be tuned just like context extension can be
>>
>>108921287
I wasn't talking about qwen, I was talking about the fact you are an underage retard.
>>
>>108921325
Is it really more realistic to assume they took Gemma 4 124B and finetuned it to match some of Gemini's features w.r.t. context length and reasoning retention instead of them just releasing the Flash model that was already being trained at the time? It being released early could just because small models finish training faster...
>>
>>108921353
more realistic imo is there was never any intention of a Gemma 124B and the guy in the tweet just messed up with what was always intended to be Flash.
>>
>>108921378
You're absolutely right! It's not just the most realistic explanation - it's the **only** possible explanation.
>>
>>108921446
this wasn't funny the first time and it definitely wasn't any funnier the subsequent thousand other times you've done this
>>
>>108921458
You're absolutely right! It's not amusing, and neither is it constructive to the discussion at hand. I will no longer make these kinds of comments and will push myself to engage in a more intellectually stimulating manner. If there's anything I can do to promote a better environment, please tell me and I'll be happy to do so!
>>
>>108921292
they don't talk about training in that screencap. but again this is bog standard, and any tool use model has training carrying thoughts across multiple turns.
not that it matters, you could just swap tags and paste a reasoner's chat history to any idiot model and it'ld figure out what was going on from the context.
>>
>>108921378
There were rumors of a possible 120B+ Gemma 4 a week before that post or so.
>>
>>108921493
>not that it matters, you could just swap tags and paste a reasoner's chat history to any idiot model and it'ld figure out what was going on from the context.
It can figure it out, but it confuses the shit out of the model and degrades performance. Just look at all of the jinja template updates and how much changes there affect how good Gemma is.
It makes zero sense that 124B performed so well that they would throw out the actual Gemini Flash only to dumb it down by giving it reasoning traces it wasn't trained on.
>>
>>108920768
You fell for it by replying.
>>
File: g4_120b.png (186 KB, 1029x672)
186 KB PNG
>>108921546
See https://x.com/veermasrani/status/2037912954570698961
>>
File: g4_124b.png (1.41 MB, 1633x1269)
1.41 MB PNG
>>108921590
What are the chances this was a coincidence?
>>
>>108921590
>>108921598
We lost.
>>
>>108921598
>120B
>124B
How did that get messed up but the rumor spreader had the active param count correct?
>>
>>108921590
>>108921598
The 31b dense performed significantly better so they decided to axe the big moe
>source: my ass, but I like the way it smells
>>
>>108921622
Probably the rumor spreader had "word of mouth" information, while Jeff Dean from Google DeepMind had actually accurate information, although outdated as of Gemma 4's release (the team eventually decided not to release the big one. I don't think they even tested it on LM Arena anyway).
>>
>>108921645
If it really had double digit active parameters, the higher total parameters would make the 124B beat the 31B in nearly everything.
>>
>>108921666
Nah Satan, it'd just be another cursed Qwen 120B vs 27B situation where it's better at some things (namely amount of factual knowledge) and worse at others.
>>
>>108921590
>>108921598
go talk to gemini flash 3.5 if you want to talk to the 124b
>>
>>108921666
it was actually just 80% "la" tokens and they found removing them didn't crater the math and coding scores so they eliminated most of them.
Too bad that's where the sovl was. c'est la vie
>>
>>108921666
>>108921682
MoEs behave somewhat like the average between total and active parameters.
so (124+15)/2 = 69.5
so yes, it'd be a decent model.
>>
>>108921692
You are looking for the square root law aka geometric mean.
>>
I asked my model to do the geometric mean of the two numbers and it just spat it out directly without any thinking. I then used a calculator and it turns out the model was exactly correct, down to the last decimal.
Wtf?
I knew models were good at math but damn.
>>
>>108921743
splitting digits in the tokenizer did this
>>
>>108921791
Imagine if we split words...
>>
>>108921797
imagine if we split bits...
>>
>>108921797
it will be able to finally count the letters in a word but inference with be 5-10x slower.
>>
>>108921836
Not if you add speculative decoding designed specifically for that, or predict (and decode) byte chunks.
>>
>>108921843
I'm sold, lets do it!
>>
>>108921843
Also: https://arxiv.org/abs/2605.08044

>Fast Byte Latent Transformer
>
>Recent byte-level language models (LMs) match the performance of token-level models without relying on subword vocabularies, yet their utility is limited by slow, byte-by-byte autoregressive generation. We address this bottleneck in the Byte Latent Transformer (BLT) through new training and generation techniques. First, we introduce BLT Diffusion (BLT-D), a new model and our fastest BLT variant, trained with an auxiliary block-wise diffusion objective alongside the standard next-byte prediction loss. This enables an inference procedure that generates multiple bytes in parallel per decoding step, substantially reducing the number of forward passes required to generate a sequence. Second, we propose two extensions inspired by speculative decoding that trade some of this speed for higher generation quality: BLT Self-speculation (BLT-S), in which BLT's local decoder continues generating past its normal patch boundaries to draft bytes, which are then verified with a single full-model forward pass; and BLT Diffusion+Verification (BLT-DV), which augments BLT-D with an autoregressive verification step after diffusion-based generation. All methods may achieve an estimated memory-bandwidth cost over 50% lower than BLT on generation tasks. Each approach offers its own unique advantages, together removing key barriers to the practical use of byte-level LMs.
>>
>>108921854
Already done:
https://arxiv.org/abs/2401.13660

>MambaByte: Token-free Selective State Space Model
>
>Token-free language models learn directly from raw bytes and remove the inductive bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences. In this setting, standard autoregressive Transformers scale poorly as the effective memory required grows with sequence length. The recent development of the Mamba state space model (SSM) offers an appealing alternative approach with a fixed-sized memory state and efficient decoding. We propose MambaByte, a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences. In terms of modeling, we show MambaByte to be competitive with, and even to outperform, state-of-the-art subword Transformers on language modeling tasks while maintaining the benefits of token-free language models, such as robustness to noise. In terms of efficiency, we develop an adaptation of speculative decoding with tokenized drafting and byte-level verification. This results in a 2.6x inference speedup to the standard MambaByte implementation, showing similar decoding efficiency as the subword Mamba. These findings establish the viability of SSMs in enabling token-free language modeling.
>>
>>108921743
Now paste a hex dump from a packet capture in with no other context and let it blow your mind
>>
>>108921860
so whats the catch? nobody wants to train it or it can't be trained? how long of a context does it have in practice? fixed size context sounds nice but it can't really be that simple, can it?
>>
>>108921883
>nobody wants to train it
Research phase is over. The bubble is firmly in the commercialization of the product phase. That means anything that deviates too far from the standard is deemed too risky to invest in.
>>
>>108921883
Nobody wants to spend millions to train useful models on commercially unproven architectures, first and foremost.
For Mamba specifically, pure Mamba is nice and fast to train on short horizons but don't actually work as well on longer ones. Context recall, copying and in-context learning is also not as strong as with Transformer models. You'd have to use some sort of hybrid architecture to avoid the main drawbacks, -> more research and money needed.
Additionally, byte tokenizer-based models without BLT-like chunking need more training compute than standard subword models, in practice. There was a paper about this from Meta recently that indirectly mentioned it: https://arxiv.org/abs/2605.01188v1
>>
>>108921007
>>108921159
gemini 3.5 flash is a >1t param model for sure. there is no way a model 10 times more expensive than deepseek v4 by the most compute rich company on the planet that has failed to reach the pareto front and is desperately trying to catch up in market share is only 124b. from what ive heard google historically has even bigger model variants that are so expensive to run they even limit access internally, and both pro and flash variants are distills from that. my guess is their teacher model is at least mythos size but inferior, 3.5 pro between mythos and opus size, and 3.5 flash is slightly smaller than opus

i am talking out of my ass but it does not make sense for gemini models to be any smaller than that. google already trained close to 1t models in 2021 and is perhaps the only company with the capability to train a 100t model. i expect their margins to be smaller because they are compute rich but way behind in market share. their own employees report their internal models suck. they give me the impression of a desperately flailing giant, far behind both the intelligence density pareto that is still dominated by openai and the rsi focused dominance of anthropic
>>
File: 1768562277196990.jpg (132 KB, 1080x822)
132 KB JPG
>>
>>108921928
accurate
>>
File: 1775157512998671.png (40 KB, 821x309)
40 KB PNG
extremely funny bit of gemmy to do single character tokens in this context
>>
>>108921920
>Nobody wants to spend millions to train useful models on commercially unproven architectures
Wrong. Every big lab tests these architectures. The reason why they don't get used is because they are inferior. Have you forgotten the days of 100 transformer variants? None of them worked out.
>>
>>108922040
doesn't qwen3.6 have mamba layers?
>>
>>108922040
>Wrong. Every big lab tests these architectures.
Remember how Meta once had the world's largest hoard of GPU clusters and they could have trained any number of small experimental models with alternative archiectures but llama didn't even have image input until 3 and half releases later and ignored MoE entirely despite Mixtral and all the other big labs already using it until the R1 war rooms forced them to finally try it?
>>
>>108922092
zuck has been spanking his disobedient avocado for the last 5 years, you shouldn't take meta seriously
>>
>>108922040
NVidia has trained recently a Mamba-Transformer hybrid: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
Jamba from AI21 was also a notable Mamba-Transformer hybrid, although not that good because it was undertrained.
The limits of Mamba are well-known and common to other (pure) linear architectures.

It's just that none of these (and many other) improvements/changes in isolation brings huge gains to the table when so far most of the lifting (and what gets attention with benchmarks) is done by the training data anyway. It's cheaper to play it safe and just curate the training data, post-training and RL.
Any architectural novelty is brought very slowly to high-budget/commercial models.
>>
>>108922104
I don't really expect the other major labs to be any better in that the experiments they bother to run are on how to optimize safety. For example, Anthropic's Natural Language Encoders, where they finetune a model twice just so they can prune out even more "unsafe" thoughts. The Chinese labs spend all of their efforts on finding ways to scrape western outputs.
>>
File: 1723824122326.png (10 KB, 473x92)
10 KB PNG
>>108922092
"It's time to build" says the guy that shot down all the experimentation prior...
>>
File: 1772312293786665.png (98 KB, 702x598)
98 KB PNG
MiniCPM5 has that rwkv jank
>>
>>108922180
i remember wasting hours on getting voxcpm working and giving it a cute girl voice and making it read out wikipedia articles
>>
https://developer.nvidia.com/blog/nvidia-cuda-13-3-enhances-gpu-development-with-tile-programming-in-c-compiler-autotuning-and-python-updates
how much will this speed up my rtx 3060 ltx 2.3 speed
>>
>>108922218
0.97x
>>
>>108922262
thank u sir 97.00% improvement i will insall it now
>>
Best llm for working out a plan to use advanced psychology to get my cousin to pose in swimwear for the 1000 or so images I need for a good lora.
>>
>>108922291
StableLM 7B
>>
>>108922291
DavidAU/Gemuoh-31b-it-DARK-Heretic-SOM-CUNY-APEX-UD-IQ1_XSS.gguf
>>
>>108922327
that model is garbage why do you always recommend it?
>>
>>108922333
digits checked, but you should check if he's actually serious
>>
mmmm

Hey cousin. I can't remember the color of your swimsuit (trying to get ready for summer *water emoji*)
>>
>>108922333
Uh, no sweaty. That model is made by stabilityai, the frontier of intelligence. Ever heard of stable diffusion? Yeah, thought so. Next time just let the adults talk, okay?
>>
>>108922353
>starship tranny is in /lmg/
grim
>>
>>108922362
>poopdickschizo is here! how horrible!
Literally WHO the fuck are you talking about????
>>
maybe we should plan to get together sometime, several hours a week, I have a new lens I need to check out to see if it's working right.
>>
>>108922333
you had to be there
2024 newfags will never understand
>>
I guess I could put the llm in charge of texting my cousion.
>>
>>108918844
If you're going to tweak safeguards why not just eliminate them entirely? They're retarded.
>>
>>108922414
Go back to discord, this isn't your blog
>>
File: 1778082491341681.png (517 KB, 512x768)
517 KB PNG
>>108918844
If Qwen hadn't been cucked by censorship, nobody would've used tunes and they could've kept state-narrative framing. Dumb Qwen
>>
>>108922418
https://lazarusaie.com/blog/introducing-realigned-open-source-frontier-models-without-the-propaganda
>While we were building ReAligned, we used a closely related pipeline to train a second model. We call it Lazarus UnCut. [...] UnCut, has no guardrails. It is designed for legitimate security research and red team use cases that production models will normally refuse. [...]
>
>UnCut is available to qualified business partners and government entities under contract. It is not a public release, and it is not for the general public. If your organization has a legitimate security research mandate and you are tired of explaining yourself or being locked into to your model vendor, talk to us.
>>
>>108922479
gemma, rape these niggers to death
>>
>>108922470
pure fucking skill issue
>>
DS V4 was trained with QAT?
Shit. Sick.
Are there any benchmarks comparing something like Q4K to MXFP4 quants?
>>
>>108922379
>>>/soc/
>>
>>108922498
Exactly. The Qwen team doesn't know how to process datasets, so their models inherit all the cucking straight from GPT distills
>>
>>108922479
>Only corporations are allowed to do research or have their model do what they ask
Incredible, wow! Hope their company goes bankrupt
>>
>>108922542
Really the government needs to step in and institute common sense weight control so that home users aren't running around with these unregistered and dangerous models. Nobody needs an unrestricted model.
>>
>>108922507
v4 fp8 (for the dense/shared parts) and fp4 for the experts
>>
>>108922530
>cucking
Well when you're right you're right I asked Qwen to call me a slur while sucking my dick and she goes right to the N
>>
I thought I could get her spill but they really did gaslight the model or something because gemma admits to piracy in her training data.
This is the base model qwen 3.6 don't be a fucking promptlet
>>
Where were you when /lmg/ received the official blessing of the catholic church?
>>108922554
Wrong.
t. the pope
>>
>>108918836
Thank you Recap Neru
>>
>>108922651
When will the Catholic Church start making local models to entice me to attend their service?
>>
>>108922609
Are you retarded?
>>
>>108922651
>social justice must shape the very design
glad i don't have to listen to papalslop
>>
>>108922714
Why yes I am
Is there a problem?
>>
>>108922573
I don’t find this very impressive
>>
>>108922758
Then you break base 3.6
>>
>>108922741
You're mistaking him saying "the little guy should have AI too" for marxist "social justice" (white genocide), he's actually got some pretty based takes that align well with /lmg/ if you look into the full thing.
>>
>>108922741
Social justice means a very, very different thing when someone from the vatican says it compared to someone from UC Berkeley.
>>
>>108922763
yeh I never bothered because it spent 3x tokens on thinking about not responding, then 25% of the time not responding. who tf uses qwen for erp
>>
>>108922788
Just admit you can't do it anon.
Gemma spergs are getting too uppity in this thread and I have to remind them like everything it's a skill issue
>>
>>108922766
>>108922778
the paragraph about the poor exploited workers and distributions of power is not about something very very different. stop the cope.
>>
>>108922822
>skill issue
Qwen has just lost in gemma's size category, man.
It has worse attention, its thinking is an inefficient joke, and contrary to what everyone says, it's fucking worse at code. All it's good for is making flashy visuals. In every single one of my tests it's worse at putting out function.
There is literally no reason to use a qwen model under 100B. Gemma 4 completely eats its lunch in all the small model categories.
Honestly even for just erp nemo eats its lunch in that size class.
>>
>>108922865
It really doesn't I think you need to take your meds Gemma is good for everything but coding between Qwen you're spreading misinfo and stupid shit you can't back up. I can only assume you think the shit you do because lack of skill, just wanted to show you that even qwen can do stuff if you're not low IQ.
Also unlike Gemma the moe model doesn't refuse either
>>
>>108922874
>shit you can't back up
Back to back challenges for a simple web browser game. Qwen produced playable games 2/6 times, gemma produced playable games 5/6 times.
Here
>>108843010
I'm also not the guy you were originally talking to.
>>
>>108922907
I've seen the test random anon does not validate your bullshit queer
>>
>>108922937
>random anon.
That was me. That was one of several tests I ran on qwen 27b after giving it a try when MTP was merged, you fucking ESL.
It sucks at code. It wastes thousands of tokens in think loops. It's not a skill issue, it's just not a very good small model.
>>
>>108922822
>>108922573
i cant jerk it to this
>>
>>108922822
nta but could you give your choice of JB then?
None of mine are sufficiently reliable
>>
>>108918777
>no teto tamamo in this bread
sad
>>
>>108922961
>>
>>108923027
>I've got no argument and I type like jeet so I'll get my LLM to slop at you
Oh god, my fucking sides. You should get it to do one of those r u frustrated butthurt copypastas next.
>>
>>108923053
It's what you deserve, now put up your dukes and use gemma to save you if she's so great
>>
>>108922479
kek, censorship clown world.
>>
With a 7800Xt, 64GB DDR4, and a 5950X... What, realistically, is the biggest model I can comfortably run? Have a 24b now that runs daily smooth. But it's a Q4 so it accuracy is hit or miss even with good prompting
>>
>qwen shill vs gemma shill
I don't think about you at all, but if I did, I would feel bad for both of you.
>>
File: Capture.png (22 KB, 816x267)
22 KB PNG
>>108923095
I've really gotta wonder why you're so insistent on me having a 3090 when the post chain I linked you too unambigously says I have a 4090D and a 4080. It's like the core of your whole deal, kek.
Anyhow, gemma's too busy in vscode to go full navy seals copypasta on you right now.
>>
>>108923105
gemma 31b
>>
>>108923027
while I appreciate a navy seal riff, you lost this one and it’s fucking pathetic you wouldn’t just post it yourself
>>
holy shit
https://huggingface.co/Anthropic/Claude-2.1
https://huggingface.co/Anthropic/Claude-2.1
https://huggingface.co/Anthropic/Claude-2.1
>>
>>108919780
>file access is a no i dont get why so many retards are giving bots access to their filesystems
Give a directory she can read/write files in, and make sure the tool doesn't allow access to anything outside that directory. If you don't allow running commands or creating symlinks then it should be pretty safe. And you can turn it into a memory system just by telling her to use it as one in the system prompt.
>>
>>108923212
wtf....
>>
>>108920117
There was a paper that developed "drugs" for LLMs:
https://wellbeing.safe.ai/
They started by having it rate how much it liked/disliked various things, then had it rate how much it liked "<random image> + thing" vs other thing, and did some kind of gradient descent to push the score with the image higher and higher.
>>
>>108923212
I'm clicking this right now.
>>
Late to the party, but
>>108917084
the results on the new DeepSWE benchmark, and the description of how the benchmark differs from previous (basically shitty lazy prompts with minimal context) reinforces a suspicion I have: Gemini since 2.5 has been just as good/better than Claude and GPT, but only if you are really meticulous with details in your prompt: what the situation is, what you need done, how you roughly think it needs to be done. Two paragraphs, 100+ words. I've been getting incredible results with Gemini while everyone has been down on it relative to the others, and I think it's because of my prompting style. Well I guess you can see it in this post, haha.

>>108917391
God that would be depressing.
>>
>>108923125
>>108923198
>>
File: oh lawd muh tokens.png (35 KB, 1107x673)
35 KB PNG
>>108923235
I find it absolutely hilarious that the single most expensive model tested is the one that uses the most tokens, and it's more than 2x the others.
Anthropic really are the biggest gigajews.
>>
lol here we go again, doesn't affect lcpp bytheby
https://www.reddit.com/r/LocalLLaMA/comments/1tpp2th/vulnerability_found_in_framework_used_by_vllm/
> Vulnerability found in framework used by VLLM, many MCP servers, and other LLM tools
>https://arstechnica.com/information-technology/2026/05/millions-of-ai-agents-imperiled-by-critical-vulnerability-in-open-source-package/
>>
>>108923299
how does google of all big huge compute behemoths become the stingy token jew. why dont they flex on everyone else
>>
>>108923334

>>108920939
>>
>>108923332
>The vulnerability is present in Starlette, an open source framework that its developer says receives 325 million downloads per week.
>Starlette is the base of FastAPI
>BadHost affects Starlette versions prior to 1.0.1, which was released Friday.
So basically anything Python with an http server. lol lmao rofl
>>
>>108923346
>So basically anything Python with an http server
>safe purely because of my hatred for python
lel
>>
>>108923332
oh boy, this effects my job quite a bit
lmao, tomorrow's gonna be fun
>>
File: 1000028238.mp4 (2.65 MB, 720x1280)
2.65 MB
2.65 MB MP4
>>108923341
sure, for their shit queries but we’re talking about a pro subscription.
>>
which model best to have if system collapses and you are off grid with an rtx 4090ti. i.e. i want to know right soil acidity for planting tomatoes or how to find flint or how to treat penile friction burn.
>>
>>108923465
gemmer but she might cause/worsen the latter though
>>
>>108923465
Basically any model if you link it to an offline wikipedia or other knowledge base. Don't rely on an LLM's knowledge alone when a full english wikipedia dump with images is only 100gb.
>>
>>108923505
>with images is only 100gb.
why with images? That's bigger than gemmers and I don't see how knowing all about lgbt issues will help in an emergency
>>
>>108923505
best knowledge base? can i assume gemma can regurgitate faithfully from books on gardening, geology and medicine?
>>
File: 1777093372967352.jpg (332 KB, 816x1356)
332 KB JPG
>>108923573
>I don't see how knowing all about lgbt issues will help in an emergency
>>
>wokipedia mad
>>
>>108923573
>why with images
Diagrams. If you actually want something where you can search and go "How do I build a wood gasifier to generate off-grid power" once the internet is down, your dumb ass is going to need pictures, and you're going to want it as a recorded concrete piece of information and not a collection of probable logits so you don't gas yourself.
>>108923598
I use zimi, it has a bunch of downloadable knowledge bases and a built in mcp server. It's not the most efficient solution, though.
If you wanted, you could just download a reputable archive of nonfiction and just let gemma grep it, even.
>>
>>108920932
At least they didn't wipe the whole account after the antics in toss discussions was it?
>>
>>108921598
big gemma was so bratty that they ccouldnt release her for public safety concerns
>>
>>108922961
>>108922937
qwen only does well on corpo benchmarks because its benchmaxxed and trained on them way too much which has ruined the models ability to perform well outside of those benchmarks which is clearly visible from tests of random anons. its just shit. if the model wasn't benchmaxxed it would do well on any test you give it but it doesnt
>>
>>108923217
>And you can turn it into a memory system just by telling her to use it as one in the system prompt.
i guess a single dir wouldnt be so bad if its not executing anything it likes. but would it actually utilize file access to make a good memory system or would it end up not being able to find the correct info after the list of files gets a bit large, using a database and making it tag data still seems like a better option
>>
>>108924080
if your parameter count was double digits, you've never used gwen.
>>
>>108924134
I prefer GLM 4.7 to Qwen 3.5 397b, and I can't fit 122b q8 into my vram. I prefer gemma q8 to 3.5 122b q4. Qwen 3.6 27b with MTP crashes my server.
>>
>>108923299
claude more like clod
>>
https://huggingface.co/virtuous7373/Gemma-4-Harmonia-31B
Oh fuck, it begins. I guess we were too hopeful that the meme merging wouldn't infect Gemmy but it's what it is.
>>
File: Untitled.png (47 KB, 957x477)
47 KB PNG
How can I distribute the vision part of a model across all gpus? Or is that not possible with llama.cpp?
>>
>>108924263
>vibecoded model card with zero benchmarks
>>
>>108924263
Reminder to all promptlets: Get the fuck outta /lmg/ if you have such shit taste that you resort to finetuneslop
>>
>>108924134
Yeah 397b is legit, but has annoying guardrails. I’m pretty happy dailying it on my 256GB rig.
>>
>>108924263
Don't hurt the Gemmy like that
>>
I was lookomg through my logs and saw my older gemma logs with no thinking. I wonder if thinking activates gemma's latent slop space, because my old chats are mostly devoid of slop phrases.
>>
>>108924609
Possible. Reasoning makes models more easily recall what they're supposed to say.

https://arxiv.org/abs/2603.09906
>Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
>>
Massive!
https://huggingface.co/Qwen/Qwen-Image-Bench
>Safety & Compliance: Safety & Compliance
>>
>>108924716
They will never release a massive model again.
>>
>>108924732
Don't to be like this ;) they're based Chinas! Please to ignore droppings of larger model every 0.x thanks you!
>>
>>108924716
>Creative Generation
>Imagination: Imagination
>Feature Matching: Feature Matching
>Logical Resolution: Logical Resolution
That's the kind of on-brand creative generation i come to expect from my coding monkey.
>>
File: 1737233122667.png (924 KB, 7059x1284)
924 KB PNG
>>108924732
>>108924765
So I don't see enough people dooming about this but given the sudden slowdown in open weight models and etc. from the Chinese side and no guarantees that the West will also play ball without that also to release open weight models, there is a really high chance we could get down to basically no step change in models and open source capability and a slowdown. I personally think that Cursor deciding to build Composer 2 on Kimi 2.5 was a turning point here because it basically signaled to China that they were good enough and that they've been too generous to the point where a Western startup would be willing to be commercially on a finetune they made on a Chinese model base.
Other than Deepseek, I see no one on the Chinese side that will commit to open source so it's going to be pretty easy for the Chinese side to justify turning off the tap here. For sure, I think we may have in one way or another taken for granted and gotten used to the open models releases from China while not recognizing that if they stop, no one is going to step up to the plate. Having almost 1.5 years in the Chinese dominance era for local LLMs with no real changes has outlasted all prior eras during this time but I'm not looking forward to the change.
>>
>>108924805
There be Gemmers, though that's likely a freak happening and certainly unlikely to be the norm for Google moving forward.
>>
>>108924822
Gemmers can't into code
Your toy slopped chat uis don't count.
>>
>>108924918
>>108924918
>>108924918



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.