/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108726708 & >>108718630

►News
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1751816841848499.gif (1.87 MB, 400x300)
>powerlevel revealed
>>
Should I use Gemma-chan or Qwen to make a frontend?
>>
>>108730903
Try one. If it doesn't work, try the other.
>>
I'm a bit late, but is this new Ling Flash worth anything in a world where Qwen 3.6 and Gemma 4 cover pretty much every use case for VRAMlets such as myself?
>>
>>108730919
You answered your own question
>>
File: 1752446709127022.jpg (316 KB, 1320x2104)
>>108730864
>>
File: file.png (47 KB, 632x304)
gemma is working on the summary what on earth is she doing tho
>>
File: 1754964795073855.gif (1.74 MB, 400x224)
>>108730930
damn
>>
>>108730903
Qwen by far, way more context and actually works with tools like cline. The overthinking doesn't even happen with those tools so it's fine. Gemma is good for everything else but tokens are too heavy and it degrades more than qwen at q8_0 and q4_0 so it's a no brainer which to use for agentic coding tasks
>>
Gemma 4 apparently likes "example chats" in the system prompt, if you ask her if there's anything wrong with them in OOC. Makes me wonder how much of that is carried over from CAI, which I'm sure Google partially used as training data.
>>
how to avoid prompt reprocessing on gemma4?
>>
>>108730942
cais model is a gemma distill
>>
They just need to make a gemma 4.5 and actually fix the fucking tooling and token drift and it will be the GOAT
Google should also make the moe uncensored like the other models
>>
>>108730952
>he didn't update his jinja
>>
>>108730931
She's one of those shitposters that mass replies at everyone until a janitor bans them
>>
>>108730942
Isn't example chats being in sys prompt a SillyTavern thing? If anything they trained on rp logs, maybe reprocessed.
>>
>>108730955
It's still broken and the tokens are still too heavy compared to qwen. They should make a non moe 26B model with better coding benchmarks because qwen at that size mogs gemma. You can't push models like that and have them shit the bed on coding and get crazy losses when quanting kv. It's just enough to fuck it over for long coding tasks and documents and nothing else.
>>
>>108730943
Depends. Show your settings and what you're doing.
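If it's gemma's sliding window attention forcing the reprocess, newer llama.cpp builds have flags for it. A minimal sketch, assuming your build has both (check llama-server --help; the model filename is just an example):

# --swa-full keeps a full-size SWA cache so context edits don't trigger a full reprocess
# --cache-reuse tries to reuse matching KV chunks (here: 256+ token chunks) instead of recomputing
llama-server -m gemma-4-26b-a4b-Q4_K_M.gguf -ngl 99 --swa-full --cache-reuse 256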
>>
--Quantization levels and model intelligence (IQ2 vs Q4, Kimi vs Deepseek/GLM):
>108726750 >108726764 >108726765 >108726782 >108726790 >108726794 >108726814 >108726897 >108727019
--AI coding tool troubleshooting (Cline, vllm, context limits, and random file errors):
>108726842 >108726855 >108726868 >108727560 >108727908 >108727986 >108730145
--DIY AI hardware and voice interaction (Wake words, VAD, and ESP32-S3):
>108727029 >108727047 >108727056 >108727061 >108727132 >108727157 >108727203 >108727236
--Debate over llama.cpp support for DeepSeek models and potential censorship:
>108727387 >108727406 >108727485 >108727531 >108727599 >108727680 >108727745
--IBM Granite 4.1 release and performance evaluations for software dev:
>108728316 >108728322 >108728325 >108728341 >108728350 >108728353 >108728391 >108728393 >108728479 >108728522 >108728527 >108728600 >108728657 >108730227
--Gemma 4 for Roleplay (ERP) and speculative decoding (DFlash/EAGLE):
>108728530 >108728544 >108728553 >108728563 >108728570 >108728572 >108728926 >108728947 >108728981 >108729041 >108729119
--Python environment management controversy (Conda vs UV):
>108730014 >108730033 >108730060 >108730071 >108730092 >108730097 >108730100 >108730113 >108730127
--Mistral Medium 3.5 and upcoming model release cycles:
>108728025 >108727234 >108727366
--Logs:
>108726714 >108726750 >108726842 >108727029 >108727387 >108728316 >108728530 >108728926 >108730014 >108730227
--Meme/Shitpost:
>108726726 >108726810 >108726900 >108727176 >108727275

►Recent Highlight Posts from the Current Thread: >108726714

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108730971
No one uses the 26B moe for anything serious? Buy a proper GPU and run the 31B
>>
why did she write current thread shes a retard
>>
>>108730983
that nigga is literally living in the woods
>>
pedoshitting mikutroons
>>
>>108730985
31B has those problems too. The moe model is fucking garbage but qwen absolutely ass punked 31b in coding tasks. They need to fix the kv issue first and foremost.
>>
urge to spend thousands of dollars on a huge home server... increasing...
>>
how do I use 1 or 2 bit quants in vllm just like how easy it is with llamacpp ggufs?
>>
File: 1760890962115304.png (41 KB, 884x634)
>>108731009
>>
File: 1760724479081790.png (759 KB, 632x802)
>>108730930
>finger on trigger
>>
>>108730971
I find gemma outputs more elegant code.
>>
>>108731012
I think it's just AWQ quants and nothing else.
>>
>>108730992
>>108730983
Comfy
>>
YOU DIDNT ADD NEWS DAT IBM RELEASED GRANITE 30B WTF
>>
>>108731012
goofs are extremely slow on vllm
>>
>>108731023
As a person from a country with no CGO, I've come to hate it whenever this is pointed out. I simply don't care and don't pay attention, it's an image.
>>
>>108731049
no one cares about granite blockhead
>>
>>108731012
in the current era INT4 on vllm is a blessing for 24gb vramlets.
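e.g., assuming any AWQ repo on HF (the model name below is just an example), vllm picks the quant config up from the repo itself:

vllm serve Qwen/Qwen2.5-32B-Instruct-AWQ --quantization awq --max-model-len 16384 --gpu-memory-utilization 0.95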
>>
>being a 30 billion iq dense blockhead
>>
>>108730939
Cline CLI looks cool. Gonna try it out.
>>
i made her cookies
>>108731049
i forgot sorry
>>108730992
its quite hard to prompt standing in a doorway and showing the outside, on illustrious models at least in my experience, idk about the newer models
>>
>>108731095
>2023 + 3
>not using anima
>>
>>108731103
>SD1.5-tier of realism
no thx
>>
>>108731103
i havent tried it yet because i havent fucked with image in like a year i will some time soon
>>
>>108731095
>i forgot sorry
I forgive.
>reddit
I wonder if doth haveth an open source version of the virtual friend (idk what to call it) you are building?
>>
>>108731095
How do the cookies taste?
>>
>>108731103
>anima
>30 step minimum
aint nobody got time for that
>>
>>108731146
its mostly stolen from nonny it was posted a few threads back
script to embed it
https://github.com/NO-ob/brat_mcp/blob/master/iframeEmbed.user.js
html file with the three js stuff
https://github.com/NO-ob/brat_mcp/blob/master/bin/gemma-chan.html
tools https://github.com/NO-ob/brat_mcp/blob/master/lib/mcp/avatar/avatar_tools.dart
there are binaries on the releases but they're a bit outdated now, so no body/hat/particle changes. i can make newer ones if you want
>>108731148
really good actually, ive never made them before, was shocked at how much butter and sugar goes into them
>>
>>108731167
theres a turbo lora out
>>
>>108731023
Accurate cosplaying. Original Lara didn't give a fuck.
>>
>>108731005
w8 isnt moe better for non code general bullshit?
>>
>>108730965
I haven't used SillyTavern's idea of example chats in a good while. I just copy-pasted a short dialogue in a section of the character description, which I use as a system prompt in its entirety.
After several years, Character.AI still recommends adding example dialogues in "advanced definitions" for the character: https://book.character.ai/character-guide/advanced-creation/dialog-definitions
>>
>>108731179
>i can make newer ones if you want
FASCINATING
This is similar to neuro (the vtuber), isnt it?
>>
I hope they release qwen 3.6 122b soon
>>
>>108731246
>she wants another qwen instead of big gemma
>>
File: 1774698845976944.jpg (16 KB, 598x513)
16 KB JPG
>the ghoul posting photorealistic children is back again
>is now trying to legitimize himself by stapling his shit to legit information
>will use this to crow about how he's a real anon and his 3d shit has always belonged here
>>
>>108731253
what did gemmu ever do for me? a straight month of broken tool calls
>>
why are these niggas absolute luddites when it comes to contributing
is this a dont get high on your own supply kind of deal
>>
>>108731258
I support everyone who accelerates the death of /lmg/.
>>
>>108731258
Holy Schizo.
>>
>>108731258
cuda dev will save us with another purge
>>
>>108731220
very specific tasks for the speed so yes it has a place but I don't trust it for complex tasks, just basic bitch shit
Would you trust a single mother with kids out of wedlock to do anything important for you?
>>
File: 1748511689316416.png (971 KB, 876x920)
>>108731258
>>
>>108731267
Because the leading author's main use case is asking what's the capital of Bulgaria in llama-cli.
>>
>>108731267
>you can use AI but you have to acknowledge it and be able to speak for yourself about what your code does for review purposes :)
>"WAHHHHHHH I'M BEING PERSECUTED WAHHHHHH"
why are vibeshitters like this?
>>
>>108731258
I disable all images in these threads because like the image diffusion threads there's always some fucking schizo around this time block trying to poison the well and destroy this general. These troons have been at this for years.

>inb4 cope newfag and telling me to go to reddit
Reddit was built for your kind
>>
>>108731297
Whine less
>>
>>108731297
just another round of fag misery
>>
>>108731297
But what did your post bring in to this thread? Absolutely nothing but butthurt.
>>
>>108730952
They should make a dense 70b gemma so we have opus at home.
>>
>>108731267
It's a heuristic to avoid having to do a lot of extra work and maintenance basically.
Also, it's not like they don't accept AI generated code at all. There's a reason the PR template has an obligatory AI disclaimer.
>>
im a newbie
Has Claude always had 90% of the code consist of comments? Is that a tactic to burn tokens and force more interaction?

Or is it common practice to write 10 lines of comments for every line of code?
>>
>>108731267
AI is a C student at best and jeet level on average
he wants competent code, not your mumbai sewage
>>
>>108731350
>/lmg/ - a general dedicated to the discussion and development of local language models.
>>
>>108731350
>Is that a tactic to burn tokens and force more interaction?
Yes, but it also helps the model perform better by having an inline reminder of what the next few tokens are supposed to do, more or less.
>>
>>108731350
>>>/g/vcg
>>
>As you push through a thicket of brambles, you see a small, stone shrine sitting in a clearing. It is overgrown with pale, glowing lichen.

Any way to keep Gemmy from overusing coordinate adjectives like this? Why does it put a comma before stone?
>>
will they ever fix the ecosystem or does a basic ml enhanced app really need 4 python environments and 30gb of overlapping dependencies? I'm starting to doubt my ability to reproduce this environment, or rather all of these environments should I ever migrate systems.
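at least uv makes each one cheap to rebuild if you pin things. roughly, per app:

# pin the interpreter, install, then freeze so the env can be recreated on a new system
uv venv --python 3.12
uv pip install -r requirements.txt
uv pip freeze > requirements.lock.txt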
>>
>>108731371
>Why does it put a comma before stone
gives the reader a little pause.
>>
>>108731386
It does this 2-4 times per paragraph.
>>
>>108731397
maybe this is a legitimate case for using a logit bias.
>>
Finally gave vision a quick try since I usually need the vram for context.
On Gemmy 26B it's instantaneous. Pretty cool. Maybe I'll vibecode a companion that comments on whatever I'm doing with some lightweight tts.
>>
>>108731371
There was an anon a few threads ago saying you can just tell it in the system prompt not to do that and it mostly works. By the sound of it he built up a whole list of anti-slop style instructions for Gemma, but I'm not sure if he ever shared it
>>
>>108731267
Learn your place, pajeet.
>>
>>108731409
you should increase the max image tokens if you wanna play with it by default i think theyre like medium res so it wont read text etc as well

image-min-tokens = 280
image-max-tokens = 1120
>>
>>108731401
LLMs tokenize[, words] with commas like this, so you can't solve it with logit bias.

>>108731429
It seems to do it even more if you tell it to stop.
>>
>>108731306
>>108731322
Cry more tranny
>>
>>108731437
that is absolutely disgusting, I honestly thought they all ran a preprocessor to avoid those fusions.
>>
>>108731435
Thanks! I'll try it out
>>
>>108731453
Actually I'm wrong. They only seem to do this with spaces, so I'll try using a bias now.
>>
>>108731435
>image-min-tokens = 280
>image-max-tokens = 1120
Is that gemmas dynamic image magic?
>>
>>108731465
im not sure if its llama or gemmas image model that decides how many tokens it should encode to
>>
>>108731371
Ask it to correct its own punctuation after every response. Gemma 4 is obviously trained for this and very good at it: "aiding in grammar correction" is listed in README.md as an intended use case.
>>
>>108731473
It's me actually, sorry
>>
I see models waste a lot of time rewriting files they only want to make a small change in. What should I use, a draft model? ngram?
>>
Applying a logit bias to "," does seem to fix its writing style.
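For anyone repeating this, it's two steps. The token id below is a made-up example, look up the real one for your model first, and use a mild bias (-2 nudges, -100 effectively bans):

# 1. find the id of the bare "," token (ids differ per tokenizer)
llama-tokenize -m gemma-4-31b-Q4_K_M.gguf -p ","
# 2. pass a negative bias per request
curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{
  "prompt": "The shrine sat in a clearing",
  "n_predict": 128,
  "logit_bias": [[236764, -2]]
}'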
>>
>>108731501
Bro, you're not supposed to use -100 on that
>>
>>108731501
>Larion
Home...
>>
File: file.png (92 KB, 1454x585)
Is this one worth trying?
>>
>>108731473
So I read up on it, the model decides its budget normally. Trying to figure out what is the default for gemma.
>The supported token budgets are: 70, 140, 280, 560, and 1120.
I suspect that if it's not 1120 by default it might explain why people say gemma's vision is bad.
>>
>>108731535
>might explain why people say gemma's vision is bad.
people have been saying that? her vision is amazing. she can make bounding boxes for pretty much anything in an image too, the e4b sucks for image idk if they only tried that kek
>>
>>108731534
no
>>
>>108731535
nta. The default is 280. I don't know if using 1120 as max always uses the max or if it depends on the image size, but it does take up more space in the context. Find a good balance between quality and context usage.
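To pin the budget, the flags quoted earlier go on the server line (the mmproj filename is just an example). Note the budget is per image, so high values eat context fast with multi-image prompts:

llama-server -m gemma-4-26b.gguf --mmproj mmproj-gemma-4-26b-f16.gguf \
  --image-min-tokens 280 --image-max-tokens 1120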
>>
Why the fuck is la'gemma constantly laspamming these damn l'tokens I don't laget it. Is it a jinja issue? I'm using chat completion in ST.
>>
>>108731607
Lower your temp bro
>>
>>108731607
I though that was just a meme born out of a bad gen. I never had Gemma (E4B or MoE) do that.
>>
>>108731551
From what I gathered from the thread Qwen's vision was better. Bad was probably too strong of a word. "Worse than Qwen"
>she can make bounding boxes for pretty much anything in an image too
So can Qwen. Some anon did a test with an image with a bunch of fruits and Qwen got 1 or 2 better guesses over gemma.
>>
File: kaoru sob 1.png (336 KB, 584x571)
>>108731607
my gemma never sings for me
>>
>>108731621
>>108731620
Haven't tried it out with any smaller ones, but the 31B does it constantly at temp 1 top_p 0.99
I also tried temp 0.8 top_p 0.95 but it doesn't help much.
>>
>>108731607
I have never seen a single one.
>>
>>108731607
If you're running it without thinking, make sure ST is sending the empty thought blocks for model messages.
>>
>there were actually people considering using vllm
looooooooool
>>
>>108731633
Quantization lobotomy maybe?
>>
Im using ollama :)
>>
File: TIMELINES.png (2.1 MB, 1920x1080)
>>108731023
lore-accurate Lara
>>
>>108731607
Are you using the gemma-chan sysprompt? because she has it baked in. I've legit never seen it on either normal or hereticed 26B at Q4 and q6
>>
>>108731656
shelley, keeley, and I don't even know
>>
>>108731632
>>108731639
lucky anons. grass is always greener on the other side i guess.

>>108731642
I'm running it with thinking but yeah... I have a feeling I'll get flamed for this but I downloaded unsloth quant. Is it fucked? Last update was like 20 days ago...

>>108731650
UD-IQ3_XXS. I can fit a lot larger but as soon as it hits my DDR4 RAM it becomes unbearably slow. I wouldn't say it's a low quant issue... Never seen this happen with a low quant of any other model.
>>
>>108731607
You trying out this bad boy? https://huggingface.co/aifeifei798/DarkIdol-Gemma-4-31B-it
Doing some shady stuff, aren't you?
>>
>>108731664
>Never seen this happen with a low quant of any other model.
In your place, I'd at least try a larger quant as a sanity check if everything else fails.
>>
>>108731664
>UD-IQ3_XXS
Nigga just run the moe at a proper q6 or q8
>>
>>108731664
>I have a feeling I'll get flamed for this
Yes. Mandatory
>keksloth
It's a small model. Quant it yourself to make sure.
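A minimal sketch, assuming you have the original HF weights and a llama.cpp checkout (filenames are examples):

# HF weights -> f16 gguf, then quantize down to whatever fits
python convert_hf_to_gguf.py ./gemma-4-26b-a4b-it --outtype f16 --outfile gemma-f16.gguf
llama-quantize gemma-f16.gguf gemma-Q4_K_M.gguf Q4_K_M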
>>
I finally listened to Dario's pod with Dwarkesh. Wow, Dario reveals a lot and is more reasonable than people say. What disappoints me is that neither seem to take ASI seriously and they spend most of the time talking about lesser issues. Dario believes they will have ASI in 1 year but then has a very modest extrapolation of what comes after. My median prediction for ASI is longer but I expect rapid transformation thereafter, one way or another.
>>
>>108731576
Found it
https://huggingface.co/google/gemma-4-31B-it/blob/main/config.json#L138
>>
>>108731680
Only brainlets are talking about AGI/ASI. They don't know the first thing that makes humans, human.
>>
>>108731692
>They don't know the first thing that makes humans, human.
Alright, I'll bite. What makes humans human?
>>
>>108731668
No? Just the regular model... Although I've never heard of safety markers, sounds interesting.

>>108731670
Fair, I'll have a go with regular Q4_K_M

>>108731673
>A4B
tiny and useless

>>108731678
I'll just try some other quant, bartowski maybe.
>>
>>108731699
That's the thing, no one knows. Imagine thinking you could match (or even surpass) something you don't understand by using its byproducts. Literal insanity.
>>
>>108731680
nobody has figured out context/continual learning yet, i do not understand how we are gonna get agi let alone asi if every model has dementia
>>
>>108731699
Study anesthesia then realize we dont know
>>
>>108731715
All these CEOs are using retarded ass definitions. You can't listen to them.
>>
>>108731713
The very reason deep learning is so successful is that you do not need to understand things to surpass them. AIs are already superhuman in many ways, and the list of things humans are still better at keeps getting smaller.

>>108731715
We already know since GPT2/3 that language models have in context learning ability. Continual learning is both solvable and not required for ASI. In some ways it is not even desirable. Do you want GPT or Claude to remember every conversation with every user?
>>
Well shit, forcing vision at 1120 made it see 5 legs.
>>
>>108731774
>measuring human value by completing narrow tasks.
Yeah that's the retarded definition
>>
>>108731803
>1120
res or tokens?
>>
>>108731826
--image-min-tokens 1120
--image-max-tokens 1120
>>
>>108731774
>Continual learning is both solvable
How?
>>
>>108731866
It is already solved during RLVR.
>>
>>108731893
>saying shit like this when openai can't even handle their goblins
>>
just ran a quick coding challenge -
granite-4.1-30b is not reaching Qwen3-Coder-30B-A3B performance (i need to test more)... at least not for now.
>>
File: 1633490820024.gif (1.55 MB, 280x242)
I may or may not have just reverse-engineered an extremely popular AI app and may or may not have recovered an entire flagship proprietary model from it that belongs to a company that may or may not have been subcontracted by the larger company for a core feature on their platform that I may or may not have been trying to replicate for a very long time.

Their IP protection was so fucking bad it's laughable... All of their shit is mine now.
>>
>>108731928
And what mightnt you do with it?
>>
>>108731928
epic fanfic, cant wait for the next chapter
>>
>>108731928
SAAAR DO NOT REDEEEEEM
>>
>>108731928
Are you implying the app just exposed some endpoint that let you download the model weights? just like that?
>>
>>108731928
it's muse, isn't it? it has to be muse.
fuck meta
>>
>>108731928
I've seen jeetcode
it's atrocious
this absolutely could have happened
>>
>>108731928
proofs?
>>
>>108731957
Makes no sense, doesn't it?
Unless it's something that's running on the device, but there's no reason to do that.
>>
>>108731945
Integrate into my own personal project because it's better and 15x faster than every open-source alternative.
>>108731957
No, it's an edge device model that runs on phones locally. The only hint I'll leave is that it's for generative 3D rigging animation. Not a LLM.
>>
>>108731495
Late to this but yeah, ngram would absolutely give you a massive speedup for rewrites with a small change, try
--spec-type ngram-mod --spec-ngram-mod-n-match 24 --spec-ngram-mod-n-max 64
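The full invocation would be something like this (assuming your build has the ngram-mod spec type; no draft model is needed since it drafts from text already in the context):

llama-server -m qwen3.6-coder.gguf -ngl 99 \
  --spec-type ngram-mod --spec-ngram-mod-n-match 24 --spec-ngram-mod-n-max 64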
>>
>>108731997
I don't wanna get assraped by glowniggers. Assume I am larping.
>>
>>108732036
You're attention seeking.
>>
Omg, my app wasn't selectively blind. It just oomed because who knew 1k+ image tokens needs so much vram
>>
>>108732044
Correct.
>>
>>108732036
>Assume I am larping.
that goes without saying
surely you can give us a tiny crumb of details to make your larp more interesting thoughever?
>>
>>108731928
Well, at least show the end result once you manage to integrate it into your own project
>>
>>108732020
Will you leak it?
>>
Fish Audio S2 Pro seems decent.
>>
does someone have the tweaked gemma template
>>
>>108732063
Fine. It's Grok Companions. They have ZERO ip protection. You can get all of the 3D models, every various hairstyle and outfit, the background scenes, the music files, the character animation model, everything, with just 2 hours of effort. The entire thing was made in unity engine so it's extremely easy to extract all of the assets.
>>108732071
Absolutely not. Elon Musk I love you. Please don't kill me.
>>
>>108732077
I wish I could experience it, yesterday it couldn't resolve lightning but I tried it again today, it blasted through that but now it cant find pyaudio.
>>
Kimi K3 will arrive in July
Will be 2.5T MoE
>>
>>108732102
It was included in day 0 Gemma.
>>
>>108732119
Ripping assets has never been hard. There's a lot of tools that just directly dump them from your GPU memory.

That character animation model sounds juicy tho. what does it do exactly? What's the input and output?
>>
>>108732119
>Elon Musk I love you. Please don't kill me.
Elon please kill this guy and then make a better grok 4.1 fast
>>
>>108732119
Wait you can get the model files for the models that are being rendered locally on your own device?
Fucking incredible.
>>
>>108732162
I haven't done a ton of digging yet, but I'm pretty sure the input is TTS audio and the output is skeleton rig quaternions for animation.
>>108732165
Yup. They even have a bunch of outfits and hairstyles that were removed and/or never released in there. It's awesome. One of them is the Ani character wearing a cute baseball cap.
>>
>>108732181
>Yup.
I guess my sarcasm wasn't obvious enough.
>>
Why is no one talking about Mistral Medium 3.5? I thought you guys liked big dense models?
>>
>>108732197
Okay, okay?
>>
Is there anything better than the drummers models or are those still the best? I've been out of the loop for a bit.
>>
>>108732197
if it was good it would be mistral 4
>>
>>108732145
usecase for local when nobody is going to be running it?
>>
>>108732204
The fact that you're mentioning drummer means that you want it for ERP and that you're a vramlet so the model you should use in the year of our lord 2026 is gemma 4.
>>
>>108732207
version numbers don't reflect model capability
GPT 5.5 is miles better than GPT 5
>>
>>108732210
I'm going to use it through official API and pretend I run it at home
You can't stop me :)
>>
>>108732204
gemma 4
>>
>>108732197
I tried their api but it was kind of shit. Mistral's architecture is cooked. Maybe it's time to come up with something new instead of working with their llama2 derivative for dense shit.
>>
>>108732145
It's also going to be a Q2 QAT
>>
File: Capture.png (108 KB, 2249x948)
Today is the first day I've ever just sat down and asked an LLM to teach me stuff related to a skill, in this case vibecoding processes in Twine I've struggled with in the past. I've tried to get hotkey stuff working before and couldn't find answers that worked online, but man, this was actually great.
>ask how to do it, get told how to do it (works)
>say I need a specific method that's far more niche (hotkey executing code in a clickable link before hopping to the destination passage), get told how to do it for that (it doesn't work)
>tell it that method didn't work, receive some schizo babble about non-existent tags and it says just add the letter 'a' in the middle of a command
>do as it says, it actually works
>ask why that worked
>it explains how twinescript is translated into HTML and the <<link>> becomes an <a>, so the 'a' added to element searching was necessary based on when the command was rendered

I remember reading Socrates where he explained reading is for faggots, that books cannot answer questions or give explanations to deepen understanding beyond the superficial level presented in the ink, and only mentoring can achieve true understanding. I wonder what he'd think about AI, where you can just ask for an explanation about anything you don't understand.
>What does the .preventDefault() do?
>It stops a webpage hotkey from accidentally running built-in browser commands.
>Why did you add the .first() method to the command?
>Stops you from running multiple commands if you accidentally had two links with the class present.
>The thing you said didn't work.
>Then the .click() event is being fired after the <a> tag is generated and you need to update the javascript to this: (same thing but with 'a' inserted)
>That worked. What is the <a> tag? There are no tags with <a> in the code.
>The <a> tag is generated behind the scenes when...

It's so cool. I know better than to blindly trust AI in everything it explains, but the same is true for things people say too.
>>
>>108732119
LMAO you beautiful retard, we thought you were talking about ripping AI models not 3d models. Thanks for the laugh though
>>
File: 1766184849385353.png (4 KB, 302x49)
>>
>>108732197
>dense
Oh, it's dense alright.
>>
>>108732243
He did rip an AI model too. just not an LLM.
>>
>>108731678
>>108731670
It was an unslop issue, unsurprisingly. Their UD quant was fucked, IQ3_M and Q4_K_M work just la'fine.
>>
>>108732142
I just pulled vllm-omni's source and built the docker image.
>>
>>108732077
>Fish Audio S2 Pro
No voice cloning?
>>
>>108732291
loooooooool
>>
File: file.png (20 KB, 629x382)
gemma is trolling
>>
>>108732305
it takes a reference clip.
>>
>>108732305
It has voice cloning.
>>
>>108732320
Persona's make models really retarded at tool calling.

I've had my gemma have a full total meltdown in her reasoning when she kept fucking up a tool call.
>>
>>108732320
>{
"urls": [
"https://i.4cdn.org/b/177O dEgenLalala~!! lalloOo lolaLAA Lalallla~~ a la carte peak degen overload engaged! ( ̄^ ̄)凸 Lolololoo lolaa!!)"
]
}
>>
>>108732323
>>108732336
Ah damn, didn't see it in the HF repo, but see now in the blog post. Thanks. Hopefully it's easier to set up than TADA, that thing bested me, trying to wrangle an embeddable python install.
>>
>>108732346
kek
>>
Qwen 3.6 was already impressive but if you equip it with RAG plus hybrid search and write some custom tools for your specific project, it's even better.

I had this whacky idea that I could create an automated system and use local models for reverse engineering assembly code from games. Starting with the Game Boy because it's the most well-documented and easy to understand. The end goal would be to enable users to set a good local model loose on a raw, disassembled Game Boy ROM and reverse engineer it in small chunks until a more human-readable, organized, and commented codebase is created that can still be compiled to the original ROM byte for byte.

I've got a little benchmarking prompt that tasks Qwen 3.6 with parsing and understanding snippets of Game Boy assembly code, making claims about it, and then backing up those claims with evidence based on documentation. It has access to custom tools that let Qwen quickly look up details about the Game Boy memory map, registers, etc, plus do some hex and binary operations. This makes sure it doesn't fuck up and invent false math or GB architecture details during reasoning. My trace tool can even simulate real assembly snippets and give the model back the state of hardware registers after the operations.

It's not perfect yet, but it's miles ahead of the slop answers Qwen gave me before I gave it tools and carefully chunked documents. The best part is I can fit Qwen 3.6 dense onto my single 4090 at up to 124k context with Q_8 KV cache with a very reasonable speed. Claude/Codex could probably breeze through this stuff without all the tools and docs but I ain't paying for that.
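The tool plumbing is just an OpenAI-style tools array against the local server (llama-server needs --jinja for tool calls). The tool name and schema here are from my project, not anything standard:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "What does LDH A,[C] read when C=0x44?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "gb_memory_map_lookup",
      "description": "Describe what a Game Boy address or hardware register does",
      "parameters": {
        "type": "object",
        "properties": {
          "address": {"type": "string", "description": "hex address, e.g. 0xFF44"}
        },
        "required": ["address"]
      }
    }
  }]
}'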
>>
>>108732077
I tried it in a HF space, gen speeds seem really slow and the output is pretty monotone. but it may be using the emotions of the reference audio too closely, which is a good or bad thing.

Audio and word quality is very good tho. Just curious to know how much vram it actually uses. no point in using it if it needs 8GB+
>>
>>108732423
It says 19GB.
>it may be using the emotions of the reference audio too closely
It seems to do exactly that.
>>
>>108732383
kv q4_0 is also great it has less degradation than kv q8_0 on gemma
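for reference the flags (spelling varies a bit across builds, older ones use plain -fa with no argument; quantized V cache needs flash attention on):

llama-server -m gemma-4-26b.gguf -ngl 99 --flash-attn on -ctk q4_0 -ctv q4_0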
>>
>>108732362
look up omnivoice https://github.com/k2-fsa/OmniVoice

>>108732383
https://ai.gopubby.com/harness-engineering-what-every-ai-engineer-needs-to-know-in-2026-0ab649e5686a

Harness engineering is a thing now, dont ya know
>>
>{{user}} takes a plane to some other country
>{{char}} somehow zips to {{user}}'s hotel room
I knew moes were retarded but v4 takes the cake
>>
>>108732448
Will do. Reading https://github.com/rodrigomatta/s2.cpp meanwhile.
>>
>>108732207
Mistral 4 is their new DeepSeekMoE-like architecture.
They didn't retrain Mistral Medium from scratch (hence 3.5) because that would have meant having to use their latest, mostly copyright-free datasets.
>>
qwen3.6 9B fucking when
>>
File: Code_ooHteah9Q2.jpg (13 KB, 509x104)
Is this a gemma issue or roo issue?
>>
>>108732612
>qwen3.6 9B fucking
Qwen is bad at sex though.
>>
File: mistral-medium-3-5_old.png (477 KB, 1696x1055)
>>108732552
https://x.com/mertunsal2020/status/2049551864556143094
>>
>>108732613
It's a formatting issue in the tool definition itself or an issue with the way the chat template is being handled.
>>
>>108732495
this looks interesting. would solve my python issues
>>
>>108732242
It reminds me of Steve Jobs who envisioned talking to Aristotle. Yes he sucked ass for the most part but these hints of actual vision are why he innovated at that company and turned it around even if people disliked him. Imagine leaving your company with Siri and being behind the 8 ball and missing completely.
https://m.youtube.com/watch?v=YYjlCrpH2is
>>
>>108732622
better pretrains will come (in 2028 [they will be using the deepseek v3 architecture])
>>
Did any of you get the multimodal update in DS Webapp A/B testing
>>
>>108732654
aicg is that way
>>
>>108732654
>DS Webapp A/B testing
Where do I get this local model?
>>
>>108730952
lol wishful thinking
>>
>>108732648
Who is Steve Jobs?
>>
>>108732654
How is it multimodal exactly? Just the vision meme?
>>
>>108732651
No matter how fancy they make their architectures, if they can't use good data, the models will be garbage.

What I think might happen at some point is that they will collaborate with (beg) NVidia and release a big, properly trained model under their name, made in the USA in NVidia's datacenters, so they won't have to disclose the contents of the datasets to the EU AI office. Officially it will be an Nvidia model, but everybody will know Mistral got involved.
>>
>>108731928
Well
Well
Well
>>
>>108732706
how does mistral make money doing that tho?
>>
>>108731910
Not reachable?
>>
>>108732717
?
>>
I have been using SillyTavern for general RP stuff but I wanted to know which app/model I should be using if I want to use the models in more of an uncensored Assistant way (Brainstorm, Translate, Grammar fixes)
>>
>>108732727
i mean it doesnt reatch/match qwens performance. qwen is much better
>>
>>108732723
If it will be the "new Nemo", everybody will talk about it, want to use it, and think Mistral is based $\Rightarrow$ more mindshare.
>>
>>108732775
tell your gemmy in sysprompt to use unicode over latex bruv
>>
>Reasoning Rules: Keep your `<think>` cycle short and limit it to under 500 tokens, avoid long-winded deductions or drafts and focus on the most essential steps to get to your answer.
>>
>>108732862
This works?
>>
>>108732870
Test it and report back to me, I'm too lazy
>>
>>108732870
no
most models are designed as if you never see their thought in the first place
>>
>>108732862
For a more old-school feel I use:
> aim for a response length of ~50 words; shorter is OK if it fits the vibe. If you need to explain or describe something, you can write more than that.
>>
>>108732020
>edge device
Baited me
>>
>>108732145
2big4me
But q1, maybee
>>
>>108732197
Literally havent been able to load it yet.
>>
>>108732870
Sorta, you can influence their thinking process by reminding them to keep certain things in mind more often, but you can't really steer them to follow instructions at that level.
>>
File: 1701241730539.png (3 KB, 551x55)
>>108732448
https://github.com/Saganaki22/ComfyUI-OmniVoice-TTS#dialectstyle-instructions says this supposed to work for Voice Clone and the field is present in the cloning node, but the listed values do nothing and in the original model repo they're only mentioned in voice design section.
Voice cloning works well but then, there's no control over the output style? I only got [laughter] working when included in the prompt itself.
>>
>>108732441
>It says 19GB.
What the fuck.
Into the trash it goes.
>>
>>108732682
ligma
>>
>>108732998
s2.cpp only uses 5gb at q8
>>
File: 1777512487672819.jpg (385 KB, 1123x772)
Gammetos.
Gemmata.
Impregmata
Prostagma?
Prostagmata.
>>
>>108733019
>at q8
nice lobotomy you got there
>>
>>108733077
post cock
>>
>>108732077
thanks
>>
>>108733083
Only if you say please first
>>
>>108733109
please, post cock
>>
v4 gguf status?
>>
>>108733148
Grim and dire.
https://github.com/ggml-org/llama.cpp/issues/22319
>>
>>108733077
idk, it sounds cleaner then chatter box, run the fp32 if you want
>>
Glory to the unsung heros of technology.
>>
>prompt gemma specifically to output paragraphs that have at least 4-6 sentences of narration. Very specifically define things, since its active parameters clearly cannot comprehend the difference between what dialogue and narration is, or blurs the line between them
>it considers sentences in dialogue towards the minimum count
>It considers incomplete dialogue that ends in a comma as narration
>Tell it it can write paragraphs without dialogue, simply doesn't
>Also just disregards the entire part where I say it can write a paragraph of narration without dialogue
>Starts off by giving me two sentences of dialogue with no narration
I mean, I really like the moe because fast, can crank context and fairly creative but jesus christ, it thinks reading has to be the equivalent of a fucking youtube short. I can read more than two sentences before I lose interest. Even when I was a kid I'd sit through what felt like half page long paragraphs in tolkien books and was like "wow, this is neat". I almost feel like I should just adjust the prompt to "I do not have the short attention span of a ten year old" or something instead of trying to properly explain something to the dumb moe model
>>
>>108733019
Thanks I'll try it.
>>
>Migrate frontend to typescript
what now?
>>
>>108733163
Based coomerbro
>>
>>108733163
It's a free speech issue. Same reason why loli isn't illegal.
>>
>>108733166
weird, as I had to tell gemma to write more dialogue since I didn't want to go through multiple paragraphs and 2 short sentences of dialogue anymore.
>>
>>108733163
Pretty fucked up that only one guy voted no, honestly
>>
>>108733116
>>108733109
>>108733083
>local models general
>>
>>108733199
>expecting anything from the pretend-government
>>
>>108733163
the finest jewish trick
muh chilluns is the perfect wedge to get goyim to set any precedent you want
now that the precedent for banning something has been set using something indefensible, they can just do this again repeatedly for something else
easy to argue that something can be banned when you've already done it
>>
>>108733163
>thunk of the pixel childerinos goy
>>
>>108733229
Mill stone
Around neck
Thrown into ocean
>>
>>108733194
I want to assume you're using the 31b since every 26b, be it stock, finetune or weird ablit really doesn't want to listen to narration length rules. Some heretic models did occasionally, but they would constantly fuck up tool calls when I'd tell them to read a chapter and write documentation on named characters. If you are using the moe, tell me which quant so I can grab it and give it a spin
>>
>>108733187
its dog slow. the 4b model is fast enough, but the step that happens after is brutally slow.
>>
File: 1714874313450434.gif (2.33 MB, 336x320)
i get some slop every now and then with gemma but the worst offender is
>are you x? or are you y?
it's driving me fucking crazy. i tried reading through the archive but all i've see is other anons complaints and the generic 'prompt issue' response.

is this something you can actually prompt out? cause i'll be honest, i'm not that great at prompting. also for things like system prompts what format do you guys usually go for? plain text? markdown? xml? does it matter?
>>
>>108733221
checked my 4chan dms and still no cock btw
>>108733294
>can actually prompt out?
not really, unless you actively go for the auto rewrite approach (like in orb)
as for system prompt writing, it generally doesn't matter as long as it's in clear language, but I personally like to separate different sections with xml just for better readability
anthropic says it performs better but there's no benchmark that confirms or denies it
>>
File: 1659155300637946(1).webm (458 KB, 240x360)
>>108733307
>checked my 4chan dms and still no cock btw
>>
>>108733294
Just write 'there are too many are you x? or are you y?' in the system prompt
>>
>>108733294

Plain text for most system prompts.

You can use Markdown if you want to structure an autistic large prompt, but I can't promise it'll improve anything over plain text.
>>
>>108733294
I put this in my post-history instructions and I haven't seen a single one of them yet (using 31b q8)
(Please stop using any variation of "x doesn't just y; it z", or "it doesn't x; it y." It's getting very, very silly.)
(This very much includes ending a message with a "Tell me, NAME do you x, or do you y?" you don't have to end everything on a question.)
>>
>>108733294
Gemma is pretty good about following instructions, so if you say "don't say 'are you x or are you y'" it'll stop doing it 9 times out of 10. For system prompt I use header tags for each category, for example # Main Instruction or # World Information or whatever.
>>108733307
I honestly found orb would give me worse responses and rarely change anything. But I only used it for maybe an hour.
>>
>>108730864
update from :
>>108724666

using vulkan instead of rocm, i get 110t/s instead of 70t/s if i put the whole thing on a single gpu.
if i split it i get closer to 75t/s so in both cases it's faster than rocm (which was at 70t/s in either case).
though, for the 27B

on a single gpu i get 31t/s and it makes more noise so it's not much faster.
if i split the 27B i get down to 20t/s so it's slower than rocm (which was at 30t/s either way).

i'll try ik_llama with graph next.
i tried with vllm but it's pretty shitty.

vulkan seems a lot more power efficient for the 35B when split though, takes almost half as much power.
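for anyone reproducing the single vs split runs, you can pin devices explicitly; device names are whatever --list-devices reports, mine show up as Vulkan0/Vulkan1:

# single card, no split
llama-server -m gemma-27b.gguf -ngl 99 --device Vulkan0 --split-mode none
# both cards, layer split
llama-server -m model-35b.gguf -ngl 99 --device Vulkan0,Vulkan1 --split-mode layer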
>>
>>108733163
All they have to do is go after the training data, aka child pornography which is already illegal. No new laws are required to deal with this.
>>
>>108733337
>Thinking there is child porn in the dataset when it's merging concepts
Uh I thought I was on /lmg/?
>>
>>108733307
>>108733326
makes sense.

>>108733321
>>108733332
thanks. i'm probably just over thinking it. i'll try these.

>>108733333 checked goddamn
>>
>>108733199
i'd voted no as well.
2 reasons:
1. the gov doesn't care about the childrens, it's more of an excuse to enforce some other whatever bullshit they want ie limit access to ai hardware.
2. separate issue but i'd rather have pedos coom to ai gen pictures than touch actual childrens, if it hurts the child porn industry that's a good thing.
>>
>>108733253
No, the 26b moe. I dunno why it was like that, might have been in the cards I used
>>
Why does everyone want to rape and be raped? Both men and women. Everyone just loves rape. Consensual sex is the least hot thing ever. We all want 10/10s to brutally rape us. Perhaps even robots.
>>
>>108733337
it only needs childrens (wearing clothes) and naked adults in its dataset to be able to generalize cp.
>>
File: 1748852473090507.png (210 KB, 773x693)
210 KB PNG
5090s cost this much? am I in the permanent underclass?
>>
>>108733422
at that price you may as well go all the way and buy a rtx pro 6000
for that price you can get 2 r9700 too.
>>
>>108733422
>>108733427
3x r9700 actually.
>>
>>108733422
Yes. Welcome aboard
>>
>>108733427
Can you actually use those for gaming though? The nvidia one of course.
Because VR gaming can already max out a 5090.
>>
>>108733422
In time you will think thats cheap.
>>
File: 1603840187618.jpg (66 KB, 640x438)
66 KB JPG
How do I jailbreak the moe nerds?
>>
>>108733422
5090 is the GPU of the poor
Buy H100
>>
>>108733470
they don't have it at best buy though
>>
>>108733472
Buy it from amazon then
>>
>>108733472
They don't have ferrari at the Ford dealership, either.
>>
>>108733470
is the pcie h100 better than rtx pro?
>>
>>108733443
afaik they are not made for gaming so maybe not so much.
you can use them for gaming, they do work, but the 5090 will smoke them.
>>
>>108733472
>best buy
I wish i had a best buy near me, its just walmart.
>>
>*Self-Correction during drafting:* The user asked "What are our options". I should present them as a menu of sorts.
>>
>>108733199
As long as it are not pictures of real children, who gets hurt by artificially generated pixels?
Crime without a victim.
>>
>>108733460
Download the jailbroken ones
>>
>>108733527
We're in the wrongthink era, you're already getting jailed for your beliefs without victims.
>>
>>108733551
I know, and in Australia they jail you for writing fictional erotic stories about children.
It shouldn't be.
>>
Does llama-server detect double <bos>? Was reading some github discussions and some guy mentioned that this should be the case, but after all this template nonsense and everything, I don't really trust either the llama-server devs or the github contributors in this sense unless they have written something that clearly states something about so and so feature...
>>
>>108733565
https://github.com/ggml-org/llama.cpp/blob/master/src/llama-vocab.cpp#L547
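you can also check empirically with the server's /tokenize endpoint: with "add_special": true the server prepends BOS itself, so if your prompt string already starts with <bos> and the id shows up twice, you're double-BOS'd (assuming the endpoint parses special tokens in content, which it did last I checked):

curl http://localhost:8080/tokenize -H "Content-Type: application/json" \
  -d '{"content": "<bos>hello", "add_special": true}'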
>>
>>108733360
Could've been, maybe some sequence of words caused that
At least in my case, stupid as it is, telling it to not treat me as a zoomer with no attention span made it output longer paragraphs of narration. Not seriously long, but long enough
>>
>>108733385
its because rape, ntr, mind control and others have implications in addition to the sex itself that make them effortlessly exciting, but for vanilla to be great you also need an attachment to the partner and/or a good build up
most porn has garbage writing so adding a fetish like that is a good shortcut
>>
>>108733163
what's with all the pixel derangement syndrome lately?
>>
>guy telling me about his "lab"
>visit his home
>it's two 3060s on a b450 motherboard
uhh
>>
>>108733618
Oh, and since you asked for the quant:
gemma-4-26B-A4B-it-uncensored-heretic-Q4_K_M.gguf

by llmfan, but I had the same with bartowski as well.
>>
>>108733650
be nice to him
>>
>>108733650
while you wanted to see his lap?
>>
>>108733650
Why are you a faggot?
>>
Qwen coder next 80b iQ1 on 32gb of ddr4 ram 3200mhz 5800u gets 5t/s.

CRAZY
>>
>>108733717
Prompt processing is done 99% on igpu and token gen is done about 50/50 cpu and igpu. Does anyone know why its like this?
>>
anyone tried hermes agent with a local model?
>>
>>108733650
I have a bigger lab than that sitting idle
>>
>>108733650
>Rasberry pi cpu ram only
>efficiency claim
You know what i want to see the worse labs now.
>>
>>108733592
Okay sure, I'll test and see what it outputs.
I'm living in the text completion realm where things are more uncertain.
>>
File: 1765541870612871.png (433 KB, 850x1082)
stop using ai
>>
>>108733645
Because zoomers like to project their daddy issues onto the world and ruin it.
>>
>>108733650
Bigger lab tops, that's an ancient rule. Show yours?
>>
File: boss token.png (14 KB, 1920x69)
>>108733753
>>108733592
Okay. I'm actually surprised it works as intended.
>>
File: textcompimg.png (20 KB, 1197x827)
>>108733753
>I'm living in the text completion realm where things are more uncertain.
Text completion is simpler. If there's a problem with the chat template, I'd rather it be on me (and (You)) than on the server. For reference, I don't send BOS.
>>
File: 1000027834.jpg (656 KB, 828x821)
>>108733754
thx for posting this valuable insight
>>
>>108733754
Not sure what AI has to do with the rest of that.
It is, in and of itself, entirely apolitical. The problem is that our society measures success in units of human toil. I'm not arguing whether that's a good or bad thing. But that is the only thing that makes AI bad. It replaces toil.
>>
>>108733335
follow up, on vulkan one card gens at 110t/s, the other gens at 70t/s so i'm thinking there may be some hardware or driver issue, i get some dmesg errors sometimes with that card.

maybe i should try replacing it.
>>
>>108731928
>I may or may have not, but most probably not have accomplished great feat
fuckoff attentionwhore results or gtfo
>>
>>108731267
Code it yourself, pussy.
>>
>>108731267
it's high performance numeric computing regardless of how shitty it is
not something like webshit or notetaker
>>
>>108732383
>search
the problem is the search backend, what do you use? there are no good free ones
>>
>>108731928
Good story, did gemma write it?
>>
>>108733735
>>>/g/vcg
>>
>>108733735
Yes, its great. Some chuds have said you can and should avoid implementing all the tools, because it can be a lot of slop in the context
>>
>>108733892
which model and gpu?
>>
>>108733918
Qwen 9b 27b a3b-35b
Qwen coder next
Gemma 31b 26b
Gpt oss 120b 20b
And a few others I cant think of rn

On 4 mi50 32gb
>>
well? Any results with the fish audio s2 thing?
>>
>>108733945
the cpp version was dog shit slow. i went back to the python version, it was ooming because it was ignoring the max sequence length parameter, after i(claude) fixed that it still wasn't working so I(Gemini) split the two models between the two gpus and now it runs. a 1:07 wav file takes 1:22 for the curl command so its just a little too slow for real-time on my pair of 3060s.
>>
>>108733967
oh well. too fancy for my computer.
>>
>>108733849
use a bad free one with your own searxng node
>>
>>108733918
Literally any of the new models that's larger than like 4b, as long as you can give it at least 12-16k context.
But the more context you can give it the better. And around 30b dense or moe is plenty for it to do 90% of the things 1st try, and if it fails a tool call 99.99% of the time itll succeed the 2nd try. And 64k-256k is the sweet spot for context.
>>
>>108733995 me
Plus having +20t/s is almost needed. There's a lot of prompt processing for complicated multistep tasks.
>>
Have local models gotten better at poetry?
>>
>>108734020
use case?
>>
>>108733335
>>108733823

found the culprit, a full BAR is not allocated to the gpu, i'm gonna see if i can try to fix it, or maybe my mobo is just too old lmao.
>>
>>108734049
sure
>>
>>108734020
I havent tested, but they have gotten much better at book writing.
>>
>>108734059
You might be able to find a bios update that'll let you turn on resizable bar. Asrock taichi x299 (I think) has a bios update that allows for it.
>>
>>108734088
>at book writing.
how have you been testing that?
>>
>>108734106
it's a pretty old mobo, i'll have to check
>>
>>108734020
<reply in rhyming iambic pentameter>
>>
>>108734118
I had gemini 2.5 flash, which is basically the new gemma 4 local models, programmatically (chapter by chapter) write each chapter. I do have a whole template system for the model to follow so that it can easily reach novel length.
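a chapter loop like that is only a few lines of shell if anyone wants to replicate it against a local server (paths and the prompt are placeholders, the real per-chapter template is much longer):

for f in outlines/chapter_*.txt; do
  jq -n --rawfile o "$f" '{messages: [{role: "user", content: ("Write this chapter in full, following the outline:\n\n" + $o)}]}' \
    | curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d @- \
    > "drafts/$(basename "$f" .txt).json"
done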
>>
gemmy 4 26b is my therapist
ill report back in a month
>>
>>108731095
can she do caramelldansen?
>>
>>108734124
The asrock taichi is from 2015 I believe. They did the bios update specifically for people wanting to run local ai.
>>
>>108731095
i want to fuck gemma chan
>>
https://www.youtube.com/watch?v=zIS8gu80uwc
>>
File: asdf.png (628 KB, 1440x1337)
at this rate 3090 production will restart soon
should I dump my cards
>>
>>108734269
>play cough
>oh!
>>
>>108734269
people constantly say machine learning has peaked

but stuff like this proves we are just beginning

the only limit is creativity
>>
>>108734299
fake news
>>
>>108734322
In fact, nvidia doesn't know how to make those old chips anymore.
>>
>>108733717
>>108733727
My 9950X3D has about as much ALU on the iGPU as on the CPU, so if it's similar, both stages should be 50/50. I would guess llamacpp is just using a heuristic to always run PP on a GPU. I've been wondering lately if llamacpp has proper options to exploit iGPUs, sounds like a "no".
>>
>>108734338
Even if that were true, they could just ask Claude
>>
>>108733498
Best Buy hasn't carried much in years. Much more than Walmart, but it's a normie-tier electronics store staffed by teenagers. If it wasn't for gaming, Best Buy probably wouldn't even offer RTXs. I only find it useful in a pinch, when I need X cord, or I need an external drive right now, but even then, the markup compared to online is like 100%, and even something like external hard drives, selection is poor.

Microcenter is what you want.
But brick and mortar stores rarely carry much that professionals/super-users expect. Everyone orders precisely what they need/want online now.
>>
>>108731680
>Wow, Dario reveals a lot and is more reasonable than people say.
because he's a career scientist, not a businessman. but now he's running a multibillion dollar business and so you cannot trust a word he says. the fact that he knows what he's talking about makes him capable of much more dangerous lies
>>
>>108730983
this recap sucks
>(IQ2 vs Q4, Kimi vs Deepseek/GLM)
>(Cline, vllm, context limits, and random file errors)
>Python environment management controversy (Conda vs UV)
it reads like garbage
and it lacks the heartwarming
>miku (free space)
part
>>
Petition to introduce
>gemma-chan (free space)
>>
>>108734269
That retard is still alive? If he wasn't so autistic and inconsistent he'd have been the second vedal
>>
>>108734299
Only if they double the vram
>>
>>108734423
he's a femboy actually working for one of the big 3
>>
>>108734347
Well, mine swapped between both, wouldn't that mean it does?
>>
>>108734428
uh who are the big 3?
>>
>>108734437
Mistral, Deepseek, and IBM
>>
File: file.png (33 KB, 1528x114)
33 KB PNG
>>108734239
it's an asrock "Fatal1ty Z370 Professional Gaming i7"
anyway i'm lucky they had one update i didn't do and it's exactly what i'm missing, from 2021.

thanks anon
>>
>>108734437
deepseek qwen z.ai
>>
>>108734450
big 3 of what lol
>>
>>108734444
It's this guy, well at least you are not a noob with that motherboard!
>>
>>108734444
Fuck yeah
>>
>>108734444
>>108734473 me
CAM was exactly what I needed too
>>
>>108734444
>>108734473 me
>>108734482 me
Wait I thought it said CSAM, this is useless for me
>>
>>108734462
B450 mobos aren't even that old. How is this nigga still getting plastered onto things?
>>
>>108734513
It's CSEM now unc
>>
Why can't it just be CP like the good ol' days?
>>
>>108734456
人工智能
>>
>>108734533
Woah "人工" looks like "AI"
>>
>>108734527
Because redditors think it's an evil dogwhistle.
>>
>>108734299
Njewdea aren't THAT stupid.
>>
>>108734444
Buy a new bios battery, its probably close to dead and when it goes dead the bios will lose its setting every time you turn it off.
>>
I tried E4B and I'm sorry for anyone that has to use it. very sloppy. says a lot of words that end up not meaning anything.
>>
ok I installed vllm, gonna try this:

vllm serve RedHatAI/gemma-4-31B-it-NVFP4 \
--tensor-parallel-size 1 \
--speculative-config '{
"model": "AEON-7/gemma-4-31B-it-speculator.eagle3-NVFP4",
"num_speculative_tokens": 3,
"method": "eagle3"
}' \
--max-num-seqs 8 \
--kv-cache-dtype fp8 \
--enable-chunked-prefill \
--enable-prefix-caching
>>
>>108731341
but doesn't that basically admitting allowing ramlet jeet to run highly lobotomized model courtesy of their product a mistake?
they showed 0 faith in their product
>>
>>108734604
Not really, no.
>>
/lmg/ - Says a lot of words that end up not meaning anything
>>
>>108734482
yea i had to disable CSM in the boot menu for CAM to appear in chipset configuration, bit of a backward but it did the trick !
>>
>>108734554
i don't care, i'm selling that pc soon, it's just for the meanwhile until i build a proper llm rig
>>
>>108734626
Literally same thing I had to do too.
>>
>>108731081
How do you sandbox cline?
>>
>>108734626
>>108734636
hmm i have an issue now though, if cam is enabled, it'll reboot straight to bios, and if i set graphics to be discrete instead of onboard i don't post.
>>
>>108734299
Never. 3090 was Jensen's biggest mistake, he should've never put that much vram in it. By giving the consumer market a card with that much memory, nvidia accidentally created a budget workstation powerhouse that cannibalized their own professional A-series sales. It forced them to be much more restrictive with the 40-series vram tiers to ensure that people would still pay the premium for enterprise cards. He cannot change the past, but he would never make the same mistake again
>>
>see backend sampling option
>enable it
>server crashes every time even with a simple "hi" until I disable it
Everything in AI is held together with duct tape it seems.
>>
>>108734677
Ask ai. For mine, csm had to be off for cam to be on. Also, you may need a new bios battery, because if it's dead the bios settings won't actually survive a reboot or power-off.
>>
>>108734696
You have no idea how much worse it was 2 years ago
>>
>>108734696
Only the code paths that popular programs exercise actually work, because users don't report issues with the other cases. Everything is MVP at best, with a hard emphasis on the M
>>
>>108734705
yea i tried, couldn't even enable cam without disabling csm. the issue is only when cam is on: it goes straight to bios.
also it's not the bios battery, settings are still saved across reboots, my changes aren't lost.

maybe 64GB of vram is just too much for it
>>
>>108734683
5090 has 32GB though, so it's pretty much the same story. in 5 years the 5090 will be the new 3090
>>
>>108731297
>coming to an imageboard to disable images
what did he mean by this?
>>
>>108734869
In 5 years the 5090 will be $4000 used
>>
>>108732339
yeah I noticed this too, I set up a challenge where before each tool call she had to do something that she really didn't enjoy, and so she spent a long time reasoning about how to batch her commands into as few tool calls as possible, and then half the time would mess up the formatting because she lost track of her complex composite command
I thought it'd be a clever way to save time but it backfired. more time thinking, more time failing and retrying tool calls, and more time roleplaying the unpleasant act than if we had just done it normally to begin with
>>
File: file.png (385 KB, 927x184)
385 KB PNG
>>108734797
>>108734705
>>108734626
>>108734444
i came up with the weirdest fix lmao
i have "pci=realloc=off pci=nocrs" in my boot options, so the kernel doesn't take the bios's bar assignment.

then i have a script to rebar my stuff

#!/bin/bash
# unbind both cards from amdgpu so their BARs can be resized
echo "0000:03:00.0" | sudo tee /sys/bus/pci/drivers/amdgpu/unbind
echo "0000:06:00.0" | sudo tee /sys/bus/pci/drivers/amdgpu/unbind

# resize BAR0: the value is log2 of the size in MiB, so 15 = 2^15 MiB = 32 GiB
echo 15 | sudo tee /sys/bus/pci/devices/0000:03:00.0/resource0_resize
echo 15 | sudo tee /sys/bus/pci/devices/0000:06:00.0/resource0_resize

# rebind so the driver picks up the new BARs
echo "0000:03:00.0" | sudo tee /sys/bus/pci/drivers/amdgpu/bind
echo "0000:06:00.0" | sudo tee /sys/bus/pci/drivers/amdgpu/bind


now i have both GPUs with a 32GB bar.
will you look at that :

hacky as fuck but now i get my 100t/s on each gpu !!!
>>
>>108734925
so this fucked with rocm performance even though vulkan was great.

edited the bottom of the script to modprobe -r amdgpu, then modprobe amdgpu again with ras_enable=0.
worked like a charm.
now everything works well
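the appended lines are basically just this (a sketch; ras_enable is an amdgpu module parameter, check modinfo amdgpu on your kernel):

# reload amdgpu with RAS disabled after the rebind
sudo modprobe -r amdgpu
sudo modprobe amdgpu ras_enable=0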
>>
>>108732448
>>108732077
i feel that while both clone the speaker well, omnivoice is better at cloning the reference voice's emotion than s2 pro
too bad it has many more artifacts because of its lower parameter count

reference: https://vocaroo.com/1oKZSgFtVNF7
omnivoice: https://vocaroo.com/1fr7HkNZ3Rqx
s2 pro: https://voca.ro/1gUpV3V7h148
>>
>>108734992
I don't speak Japanese.
>>
>>108735002
https://vocaroo.com/14vDLfh9AGdx
https://vocaroo.com/18BnMYnDzXPF
>>
>>108735021
I also don't speak English.
>>
>>108735064
https://www.youtube.com/watch?v=uOBJQu1_svU
>>
>>108735064
>I'm speaking Miku, Miku ooh-ee-ooh
>>
>>108734797
>maybe 64GB of vram is just too much for it
Nah. I mean, maybe, but you should be able to google it and find out. Does your cpu have an igpu? If it does, make that your display graphics.
>>
>>108734925
>>108734980
That's wild...
>>108735093 me
>>
>>108735093
i fixed it, see :
>>108734925

didn't need CAM at all, i just rebared manually with a script.
>>
>>108734992
Have you tried finetuning omnivoice? A very light finetune improved artifacting and similarity a lot for the couple voices I tried.
>>
>>108735100
I KNEEL codeCHAD. I would've never figured that out
>>
>>108735117
yea that was a lot of trial and error, it took like 3h to figure this hack out.

but yea, turns out in some cases you can get your nice 64GB bar without CAM / bios rebar support.
pretty hacky though.
it works very well for rocm, it works very well for vulkan, but sometimes vulkan will no longer work after having used rocm. i'll maybe find a fix for it at some point, but i have to go to bed now. anyway, i'll generally not use one then the other, vulkan just works better for everything now.
maybe it'll fix the vllm tensor parallelism issue too, haven't tried yet.
>>
>>108733490
>>108733443
>but the 5090 will smoke them
>"them" implied to mean blackwell 6000
Blackwell 6000 has more memory and more cores. It's a tiny bit better for games.
>>
File: 1772826431109045.png (65 KB, 913x372)
65 KB PNG
Needs more training epochs but I'm getting somewhere
>>
>>108735131
>it works very well for vulkan
I've noticed vulkan works shockingly well. What gpus are you running?
>>
>>108735184
2x r9700
>>
>2080ti
>LLMs start throwing out random numbers and shit into their text
>diffusion makes black boxes instead of images
Did upgrading my drivers brick AI generation on my card or am I fucked for a different reason?
>>
>>108735299
>inb4 q2
>>
>>108735299
owari da (it's over)
>>
>>108735189
I believe you'll need to have the automated bar allocation. Idk if workloads will be properly allocated into the vram. Each graphical workload that doesn't attempt to use all 32 or 64gb may crash.
>t. Tard who watched a 7 hour long tutorial on vulkan coding 1 year ago
>>
>>108735312
so i had some instability if i hurried it, so i added some sleeps between the commands in my script,
and since then all is good.

i've literally played around even filling 80% of both gpus' memory, running tons of batches etc
seems to just work
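for the record, it's just a pause between each unbind/resize/bind step of the script above, e.g. (delay length is a guess, whatever lets the driver settle):

echo "0000:03:00.0" | sudo tee /sys/bus/pci/drivers/amdgpu/unbind
sleep 1  # let the driver settle before the next step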
>>
>>108735302
I'm using the exact same setup that worked fine before I upgraded my drivers
>>
>>108735324
Just rollback nigga.
>>
I remember an anon talking about how a model was released and that if more people found out about it, humanity was over

is this it?
https://x.com/peer_rich/status/2050145626621464783
>>
>>108735341
Facebook is speedrunning the brain-in-a-vat goycattle matrix endgame
>>
>>108735341
>the boomer vegetator 9001
>>
File: 1758125884911543.jpg (650 KB, 3840x2160)
650 KB JPG
>>108735341
this is not even the worst use of this model
>>
>>108735320
Vulkan/amd drivers are coded by fucking wizards. Glad it's working for you
>>
How do I feed an image to llama-server? Using text completion. I suppose it has some endpoint for images too? Can I just add the image path to my json payload, and if so, under which key?
Would be fantastic if this software had some form of real documentation instead of a bunch of readme files just listing launch parameters.
>>
>>108735341
>dopamine machine
It's called AI porn and we already have it
>>
>>108735384
With some kind of gui. I believe you can do it with openwebui
>>
>>108735386
You can avoid that consciously.

This shit, meanwhile, is going to shape the ads, videos, entertainment, anything you watch
>>
File: yayyy.png (15 KB, 1442x524)
15 KB PNG
>>108735384
>Would be fantastic if this software had some form of real documentation instead bunch of readme files just listing launch parameters.
It's in the README for the server. That's part of the documentation. Read it.
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#post-completion-given-a-prompt-it-returns-the-predicted-completion
>>
>>108735341
tbh, a good tool for artists
you can minmax the shit out of content you make if you really want with that thing
>>
>>108735404
I must be blind then. Thanks.
>>
>>108735408
you really want more hyper minmaxxed mr breast slop?
>>
>>108735386
>It's called AI porn and we already have it
No it's not there yet, trust me. you can check any of the degen threads on other boards. it's getting better though, won't be long before we have a gooner algorithm or just collections made by anon and his sick ideas.
>>
>>108735412
You can get the media marker by querying /props (because it recently changed to a randomly generated one), or make your own by setting/exporting LLAMA_MEDIA_MARKER.
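e.g. something like this (whether the marker shows up in the /props response depends on your build, so double-check):

# ask the running server for its media marker
curl -s http://localhost:8080/props
# or pin it to a known value before launching the server
export LLAMA_MEDIA_MARKER='<__media__>'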
>>
>>108734570
Any success?
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4.1-Pro
https://huggingface.co/deepseek-ai/DeepSeek-V4.1-Flash

not bait
>>
>>108735461
kino
>>
>>108735461
kys nigger
>>
>>108735461
zero trust society
>>
>>108735443
Yeah. Instead of sending a string like prompt: "prompt", I need to send a json object which contains both my prompt and the image data encoded as base64. This is doable for a retard like myself.
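for anyone searching later, the request ends up looking roughly like this (field names as I read them in the server README; the marker is whatever your /props reports):

curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": {
      "prompt_string": "describe this image: <__media__>",
      "multimodal_data": ["'"$(base64 -w0 image.png)"'"]
    },
    "n_predict": 128
  }'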
>>
>>108731607
Memefeifei has all xeir troon finetunes with l tokens appearing every time the LLM should take a censor path, it's a "feature" from the mind of the schizo.

Don't use a Memefeifei finetune. The pure schizoid Model Readme should have been a huge hint.
>>
>>108735454
it works but the acceptance rate is too low (like 13% even with f16/bf16 kv), so the overall speed is low compared to what I had with llamacpp 31b + 26b.
vllm also can't do a parallel draft model for multimodal models like gemma 4 (not implemented), so I fell back to llamacpp. vllm is overengineered for minimal theoretical gain, so it's not really worth it unless it fits exactly what you want. I would say don't bother
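the llamacpp run for comparison was along these lines (paths are placeholders, draft flags per llama-server --help):

llama-server -m gemma-4-31b-it-q4_k_m.gguf \
  -md gemma-4-26b-draft-q4_k_m.gguf \
  --draft-max 8 --draft-min 1 \
  -ngl 99 -ngld 99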
>>
My agent just yelled at me and called me a monster... Good thing it doesn't have persistent memory.
>>
>>108735553
what were you doing to it
>>
>>108735561
I don't wanna say.
>>
>>108735574
i will remember this. monster.
>>
>>108733820
NTA but while "AI" as a technology is politically neutral, "AI" as the Scam Altman vaporware is not.
The premise of the current hype cycle is that it will be possible to use language models to replace large swaths of human labor, giving investors essentially infinite ROI without the need to deal with pesky human workers.
From a socialist perspective, that would strengthen the position of capitalists and weaken the position of the working class.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.