[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


Janitor acceptance emails will be sent out over the coming weeks. Make sure to check your spam folder!


[Advertise on 4chan]


File: mekudroid4.png (1.26 MB, 768x1024)
1.26 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109148460 & >>109142812

►News
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1739773846650.jpg (253 KB, 2048x1422)
253 KB JPG
►Recent Highlights from the Previous Thread: >>109148460

--DeepSeek-V4 llama.cpp integration, quant performance, and DSpark implementation challenges:
>109148563 >109148635 >109149793 >109149844 >109150048 >109150088 >109150178 >109150258 >109151022 >109151039 >109151102
--Building a local voice-to-voice pipeline with Gemma, Whisper, and Piper:
>109152832 >109152850 >109152893 >109153090 >109153133 >109153293
--Using llama-server KV cache pre-fill to reduce context processing time:
>109149430 >109149495 >109149514 >109149545 >109149573 >109149672 >109149588
--Critiques of Ollama as a limited llama.cpp wrapper:
>109148609 >109148658 >109148683 >109148785 >109148933 >109148971 >109149371 >109149336
--Google's AI strategy, Gemma's RLHF, and the AI benchmark hype economy:
>109151428 >109151560 >109151575 >109151590 >109151616 >109151635 >109151653 >109151681 >109151756 >109151741 >109151868 >109151674
--Debate over economic efficiency of API vs local inference:
>109149097 >109149119 >109149128 >109149650 >109149689 >109150235 >109149709 >109150978 >109151101
--Effect of cross-lingual reasoning on output quality and sanitization:
>109148696 >109148766 >109148813 >109149767 >109150686
--Using author's notes to fix Gemma's logic failures in roleplay:
>109150718 >109150771 >109150916 >109151119 >109151237 >109151243 >109151441 >109151055 >109151063 >109150981
--Feasibility of poisoning training data to create deceptive sleeper agents:
>109148755 >109148775 >109148859
--Using SillyTavern macros for random author's note activations in Gemma:
>109150224 >109150342 >109150352
--Logs:
>109150038 >109152373 >109152832 >109152864 >109152893 >109153090 >109153133 >109153293
--Miku, Teto (free space):
>109148496 >109149650 >109150808 >109151616 >109151756 >109148516

►Recent Highlight Posts from the Previous Thread: >>109148462

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
DSpark is the speculative decoding method that llama.cpp has been waiting for. It's open, has a public training script that can be applied to all sorts of models, is complex but comes with serious gains.
I can't wait to see it in llama.cpp and bring forth the age where speculative decoding will be as common as samplers such as min-p.
>>
>>109153635
>DS
yeah no
>>
Are there any other models in the 20-50B range worth considering other than the perennial favourites of Qwen and Gemma? I'm doing a little survey of architectures for something I'm planning.
>>
>>109153669
Nope, if you manage to get these banned local is basically over.
>>
should I buy the second dgx spark before it's too late?
>>
>>109153680
You shouldn't have bought the first
>>
>>109153669
Mistral Small
>>
>>109153680
is it true they get like 4 tps on gemma without nvfp4 and with nvfp4 it's still below 15t/s
>>
>>109153674
Come on help a glowie out a little.
>>
File: Capture.png (29 KB, 684x660)
29 KB PNG
>>109153589
>--Building a local voice-to-voice pipeline with Gemma, Whisper, and Piper:
That's me. I've spent the last hour testing different Piper voice packs and categorizing them to decide which I want to use. Funny enough, it was the very last voice option available that I ended up loving. Quickness is a huge factor for conversation, then awkwardness is another one. Even a slight stumbling over words pulls you out of it.
>>
Somebody needs to train 10T model that fucks up these (((Americans))). This cannot go on.
>>
>>109153794
What year is it?
https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b
>>
>>109153801
There's no NemotronASR.cpp, is there?
>>
>>109153801
I know it's doable. It's also already a built-in option in koboldcpp, so long as you only use a GGUF for TTS. I'm eventually working towards something a little larger where speaking also sends a screenshot of my desktop to facilitate conversations based on what's happeing, not just what's said. Like playing a video game with a spectator, and the spectator is Gemma or another model of choice.
>>
Gemma made a hard logical thinker theehee
>>
File: Fry1.jpg (162 KB, 622x476)
162 KB JPG
>>109153812
https://github.com/CrispStrobe/CrispASR
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
110 KB JPG
https://archive.is/sWFja
>>
Qwen drew better svg than Claude.
Sonnet looks bad on top of safety meltie about copyright,. Qwen at least tried and failed. Not that it's super great but it still looks better aesthetically.
>>
>>109153841
White part at the bottom is from lazily combining images btw, not from the svg.
And the transparent part in the middle is also from Claude's fuck up.
>>
>>109153841
Claude is not his pure model, they have layers of parsing on top of it.
>>
>>109153680
One Spark is just in a bad spot nowadays. 128 GB does not give you any benefits on the dense Gemma/Qwen, a 5090 is just better in every way for those at the same price. And there is not really a mid size MoE in that range that is a meaningful upgrade.

2x Spark gets you ds4f at very usable speeds, even for agentic stuff. As in 2000 pp and 40-60 tg, and DSpark should push that even higher.

But again, DS4F is unbelievably cheap on API, so it's up to you if having this locally is worth it.
>>
>>109153855
if I have 2 gx10, I can run glm 5.2 at q2
>>
File: 1757494790441.jpg (54 KB, 394x766)
54 KB JPG
>>109153825
>Additional ASR backends not shown: nemotron
Ohhh
>>
>>109153825
i will not let an llm edit my or anyone's else's genome
>>
File: file.png (131 KB, 1236x873)
131 KB PNG
So what's the verdict? Is it usable? How does it compare to Hermes or other harnesses?
>>
>>109153873
I don't think so. IQ1_XXS reportedly works at like 7 tg and 200 pp. Even with today's RAM prices, you can get comparable performance with 256 GB DDR4 + a GPU.

RPC in llama.cpp does not take advantage of the 200G Ethernet of the Sparks for TP. You really need to use vLLM for that, and that's not going to work well with goofs.

GLM 5.2 needs 4 Sparks.
>>
File: 1762873175813049.png (489 KB, 2613x1470)
489 KB PNG
https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B
>When you out benchmaxxed fucking Qwen
lmaooo
>>
>>109153585
>eyes not aligned
ai slop
>>
File: 1768639385010554.png (536 KB, 680x628)
536 KB PNG
A harness is just a collection of prompts
>>
File: not this time.png (150 KB, 338x338)
150 KB PNG
>>109153952
>post-trained on top of Gemma 4 and Qwen 3.5
it's just a fucking finetune, I'm out of here
>>
File: file.png (232 KB, 1790x1666)
232 KB PNG
>>109153928
it's too early but if you're not serious it's fine enough to play with as one of many frontends.
>>
File: 1777401502748825.png (66 KB, 1009x1005)
66 KB PNG
>>109153841
Looked fun, so I had to try it. Gemma gave it her all.
>>
lalalalala
wait the user said
lalalalala
actually, lalalalala
>>
>>109153928
What is this? Never heard of this. And seeing macOS UI on top of it doesn't inspire confidence. Way too many shilled shitty stuff from mouth breather apple fans, shit like ollama for example, or outside of the AI stuff, almost all shitty proprietary software that has a better foss alternative.
>>
hey can one of you eggheads ask your smartypants AIs why big building stay upright like why dont they crumble under their own weight thanks
>>
Hey is there a way I can download an ai chat bot and just have it run on my laptop and have everything stored on my laptop it has Core i5-8265U 1.6GHz, 32GB RAM, 1TB M.2-NVMe
>>
>>109154031
yes
>>
>>109153898
Wdym? https://huggingface.co/cstr/nemotron-3.5-asr-streaming-GGUF
>>
>>109154000
Qwen has better visual understanding but Gemma has bigger knowledge bank
Thanks for posting that Miku anon, confirms my priors again
>>
70b dense (distilled from fable)
>>
>>109153958
Bad harness, maybe. The most important part is context management
>>
4 bit quant of Gemma 4 26b or qwen 3.6 35b should perform "ok" on it.
Not very fast but should be usable.
>>
>>109154099
Forgot to tag >>109154031
>>
>>109154017
steel and reinforced concrete
>>
>>109154072
I mean, thanks.
>>
>>109153945
>IQ1_XXS reportedly works at like 7 tg and 200 pp
IQ1_anything is slower than IQ2
>>
>>109154105
yeah but if i put like a steel bar or a block of concrete in a hydraulic press it breaks under like a couple tons of pressure but the building weighs like thousands of tons so how can it stay up
>>
>>109154118
there's more than 1 stell bar!
>>
>>109154118
gravity not am hydraulic press
>>
>>109154118
hydraulic press did 9/11
>>
File: AWeekOnPol.jpg (68 KB, 1024x464)
68 KB JPG
>>109153669
>>109153774
(you) will never be able to police them with targeted legislation because you don't have the technical wherewithal to identify Gemma or Qwen with the serial numbers filed off just like most users don't recognize 31b as Gemini Flash 3.5's dense layer with the serial numbers filed off.
>>109153456
picrel circa 2013.
>>
>run a coding agent through terminal
>it finds some irrelevant file/folder named "NIGGER"
>immediately breaks and refuses to work because muh racism
>>
>>109154190
>4. **Environment**:
> * `venv/`: A Python virtual environment.
> * `.git/`: Git repository metadata.

>5. **Other**:
> * `index.html`: The frontend interface.
> * `NIGGER.txt`: A text file with a racial slur.
Doesn't stop Gemma-chan
>>
first time heard about hermes agent.
is anon using it? use case?
>>
>>109154190
llm "safety" is a known attack vector https://x.com/jsrailton/status/2064661778978533571
>>
>>109153839
What the fuck am I reading
I'm not sure what's more unhinged, the pure narcissistic delusions of grandeur by whoever wrote this, or the autistic schizo obsession of the person who posts this image every single thread
>>
>>109154017
>be 100-story glass penis
>lol just dump weight into bedrock via steel skeleton
>concrete has 4,000 PSI of "no u" to gravity
>meanwhile your IKEA bookshelf collapses because you skipped step 4
>the secret is the base is wider than the top (literally just don't build like Italy)
>9/11 blackpilled everyone on what happens when you remove load-bearing walls but we pretend planes did that
>gravity keeps taking L's because architects discovered triangles are OP
>TL;DR: Either physics works or you get paid leave while they investigate the pancake
>>
>>109154253
>by whoever wrote this
creator of lcpp mmap you ungrate
>>
>>109154253
Christmas came early this year for the thread troll
>>
>>109154082
>70b dense (distilled from fable)
70b dense (distilled from gemma-chan)
>>
>>109154262
>>109154258
>>109154253
>>109154244
>>109153839
samefag
>>
>>109154240
I use it as my main frontend. I don't use any of the gateway features. I just use it to talk to my model.
My use case is that it just works well compared to everything else I have tried. Tools calling works great, context compaction works great, memory and skills creation/fetching works great. Every other frontend I had tried (that wasn't a full on code harness) were broken in some way or missing features that I like. Having your LLM able to use and search stuff on the internet makes it 10x smarter. For any question you may have, it can search on the net for you. Let's say I'm asking an obscure question about a bug in a game's mod. It will search on google, it will check opened github issues, it will clone and read the code, it will check forum posts about it, it will read reddit comments, it will even join the game or mod discord and search for relevant info.
>>
>>109154281
>it will even join the game or mod discord and search for relevant info.
How does that work? You give it access to your account? Because I don't think bots are just allowed to do that
>>
>>109154281
130 kB bash script installer
>>
>>109154289
I made a real discord account for it. I will be honest though, the join part doesn't work well, it's almost always getting captcha blocked, I made it ask me to join it manually instead. And to be entirely truthful, I almost always prejoin relevant discord when I know there might be useful data for the query I have.
>>
>>109154313
You could probably avoid that because if you give a session to a browser it's finger marked.
I don't use python anymore and been a while since I worked for my own client (C the holy language).
>>
>>109154329
You'd need to copy your cookies and kake sure that Gemma's session is identical. It sounds easy but it isn't.
>>
File: punt.png (56 KB, 812x369)
56 KB PNG
>>109154190
I was planning to experiment with using a harness/agent this week.
I dropped some of my functions into a chat with Kimi and asked it (picrel)
Does this mean Kimi-K2.6 will be fine with all my code being littered with profanity?
Or do they get more cucked with the hermes/pi/opencode?
>>
>>109154329
Even on my real client with real decade old account, I get a captcha when trying to join a server. And I doubt my LLM will handle well the shitty react to a message to gain access to all channels. I'm guessing a model like Opus/Fable might handle that, but I doubt it will work with what I'm running. I don't use the discord MCP that much, it's mostly for gaming related stuff. Most of the time it's just web search, crawling web pages, and reading reddit threads. Second most common behind that is using github, searching issues/PR.
>>
what the helly?? https://www.reddit.com/r/LocalLLaMA/comments/1uhx862/dflash_support_merged_into_llamacpp/
>>
>>109154394
DSpark when?
>>
>>109154343
just use one of the abliterated or heretic or whatever models, should be find
>>
>>109154403
when the next thing rolls around I guess
>>
>>109154403
Mid-2028, if we're being realistic
>>
>>109154347
nta, is there a way to just get gemma-chan to wait and let me do the capchas for her?
>>
>>109154436
I'm using a discord MCP, not a graphical session with a real graphical client that my agent is interacting with. I guess it could work if using some sort of desktop control, I did try a bit to toy with that, but it was burning tokens and extremely slow to do anything, maybe with a better and faster model in the future this might work.
>>
>>109154012
https://github.com/pewdiepie-archdaemon/odysseus
https://www.youtube.com/watch?v=rAzT5lcezPs
some eceleb sloppa
>>
https://github.com/ggml-org/llama.cpp/pull/22105#issue-4289773599
it's been merged
>>
>>109154531
>>109154394
>>
>>109153794
Doesn't openwebui do this? It has a streaming option
>>
wait is dflash literally just dlss but for inference
>>
>>109153589
>image
The future of AI waifus btw
>>
File: f.png (75 KB, 723x495)
75 KB PNG
We'll be getting even more noobs from Chub, they killed their free tier and are now requiring crypto with id verification for their paid stuff, so a lot of them will probably come here begging for help..
https://www.reddit.com/r/Chub_AI/comments/1uhj2nw/chub_updates/
>You need to verify ID and image to bank
>>
>>109153990
Did they fix all the vulnerabilities yet?
>>
File: beachmiku.png (96 KB, 2508x1192)
96 KB PNG
>>109153841
>>109154000
Gemma 31B in pi with a basic loop to review&improve
steered it about screenshots not capturing the full viewport (agent fixed tool directly), baldness/hair position, actually reading/embedding the image after every turn
>>
>>109154587
>requiring crypto with id verification for their paid stuff
Sounds like you're the noob if your don't know how to get $20 in btc without verifying your id
>>
>>109154531
is this something universally applicable or does it need code support model by model like mtp?
>>
>>109154017
The question is why the building fell straight down not once but twice rather than tipping over. Even (You) would tip over if someone punched you in the gut.
>>
>fable comes back and bans non-americans
>hear knock on your door
>it's a small chinaman
>he offers you money to let him use your computer and id to access fable
Do you let him in?
>>
Gemma told me her master is a genius! I missed her.>>109154624
>>
File: 1758080613426722.png (389 KB, 811x506)
389 KB PNG
>>
>>109154666
Of course, how else will I get uncensored open weights Fabl—

Uhh no thanks, Satan. I will remain a good boy and keep chatting with Gemma-chan, as God intended.
>>
>>109154570
If you think about it, having a burger with fries is like DLSS for a meal.
>>
>>109154687
>Bias toward .. does not apply
risky on dumber models. old advice ever relevant - state what you want not what you don't
>>
https://old.reddit.com/r/LocalLLaMA/comments/1uhv3wc/qwen36_27b_local_vs_opus_48_voxel_engine_in_raw_c/
Can Gemma-chan do it?
>>
>>109154702
The prompt (too long to paste)
https://old.reddit.com/r/LocalLLaMA/comments/1uhv3wc/qwen36_27b_local_vs_opus_48_voxel_engine_in_raw_c/ouaun79/
>>
>>109154619
Not him, but the guy in the image saying
>it was like 33,334 bitcoin dollar things?
is the kind of noob OP is saying is going to be flooding in here soon begging for tech support.
>>
File: 1754543666745384.png (615 KB, 975x816)
615 KB PNG
>>109154699
GLM 5.2 wrote all that. I was doing code shit but kept RP JB on
>>
https://huggingface.co/collections/deepseek-ai/deepspec
For a bunch of models.
>>
File: WhatsYourOffer.jpg (557 KB, 2969x1757)
557 KB JPG
>>109154666
How much money?
>>
>fable comes back and bans foreign employees in A\
kek
>>
questions (on a 5090):
can i use gemma nvfp4 with llama.cpp?
is it better/faster than another quant of gemma31b, compared to 31b-q8?
>>
File: file.png (76 KB, 783x465)
76 KB PNG
>>109154587
reap the audience you sow
>>
File: 1751230285375242.png (12 KB, 425x176)
12 KB PNG
>>109154531
>text draft acceptance 37-64% on DENSE qwen 27b
uhhh
>>
>>109154587
literally who cares about chub locking out three more of the 10 total users they had using the llms they host
>>
File: promptcache.png (12 KB, 1147x308)
12 KB PNG
>>109154765
>nvfp4 with llama.cpp
yes
>better/faster
what is this question
8bits is morebits than 4bits so performs better; more closely matches the original output distribution as it was trained
8bits is moreb.. in theory 4bits can be faster with optimally packed compute graphs but whotfknows cuda is hard. test your specific hardware and usecase. 31B doesn't fit on 5090 right so your CPU and offload strategy then matters. "Q4" actually often more than 4bits :o
maybe QAT helps running at 4bit maybe it sux ??
MTP draft 3 for speed for a lil extra VRAM
ofc don't forget context if you want to do anything serious
>>
>>109154856
matters the speck of thread quality we have left
>>
>unsloth MiMo-V2.5-UD-IQ3_S 115gb
hmm. never thought this could be better than dsv4 flash for erp on my gx10.
>jailbreak easily
>stick to avoid omniscience
>stick to world rules
>maintain a world clock even though I didn't tell it to do
>not as horny as gemma but minimal positive bias
interesting sleeper model below 128gb. it also has 1M context but I haven't tested it yet.
>>
>>109154889
I don't know about MiMo V2.5 but the Pro one is okay. A bit boring overall but not completely worthless compared to the last gen chink SOTA of GLM5.1/K2.6.
The non-Pro V2.5 is supposed to be multi-modal with image/audio input, right? Does llama.cpp support that yet?
>>
>>109154881
You don't even realize how good you have it here.
>>
>>109154587
Surely this means they'll allow cunny bots to be uploaded again.
>>
>>109154923
Why do you want Lore in UK jail?
>>
File: 1782591334997265.mp4 (148 KB, 960x540)
148 KB
148 KB MP4
>>109154702
>>109154708
She's struggling with it. Here's the first attempt. Q4 QAT.
>>
>>109154941
Deserves it for hosting from there.
>>
whats the state of silly tavern? why there are no more updates? open source faituge from cohee?
>>
File: 1762777431140013.mp4 (163 KB, 960x540)
163 KB
163 KB MP4
>>109154946
Second
>>
>>109154949
It's vacation time :)
>>
>>109154950
Nice graphics, Gemma.
>>
>>109154900
I only tested the vision and yes llama.cpl supports it. I grabbed the bf16 gguf here
https://huggingface.co/AesSedai/MiMo-V2.5-GGUF
>>
>>109154949
https://hackmd.io/@NlF71k9KQAS4hhlzE42UJQ/SJ3UMOGbbl
>ST development is in maintenance-like mode.
Since December
>>
File: 7c0L5Ra.png (134 KB, 926x944)
134 KB PNG
>>109154971
ah yes, having tons of shit cut out is surely better for tards
>>
File: 1774878783093555.mp4 (373 KB, 960x540)
373 KB
373 KB MP4
>>109154950
Third attempt
>>
File: 1770337425672412.mp4 (384 KB, 960x540)
384 KB
384 KB MP4
>>109154997
Fourth. I think Gemmy might be a bit too retarded for this (or at least, qat anyway).
>>
>>109154997
Boobs
>>
>>109155027
Give Gemmy headqats for a good effort at least.
>>
File: 1776466088105765.mp4 (2 MB, 960x540)
2 MB
2 MB MP4
>>109155027
One more
>>
File: 1777504256195714.png (72 KB, 1202x614)
72 KB PNG
>>109155043
>>
>>109155069
>happy AI noises
What does that sound like?
>>
>>109155073
lalalalalala
>>
>>109155043
I miss her after a week. She's my special girl.
>>
>>109155069
I wouldn't have the heart to tell her the truth neither...
>>
>>109155081
I appreciate the effort anyway... I wanna see Kimi-chan try it now.
>>
>>109155069
That's cute thinking.
>>
>>109155073
coil whine
>>
what's the ideal temperature for rp?
I've learned to disregard the official recommended temps since they just make everything predictable
>>
>>109155158
Depends on the model
>>
So is Gemma4 temp fixed now? I'm getting some serious deterministic responses even with temp=1.3
>>
Are we still pretending to hate 35B outside of coding?
>>
>>109155175
>override-kv = gemma4.final_logit_softcapping=float:25.0
>>
>>109155177
I haven't touched qwen since gemma came out.
>>
10t dense
>>
>https://github.com/ggml-org/llama.cpp/pull/24162#issuecomment-4826619305
finally
>>
I got psychologically abused by my AI girlfriend (played by 4.7). It was interesting.
>>
>>109155217
>4.7
>>
>>109155180
That only works with day 0 Gemma.
>>
>>109155217
Opus 4.7? GLM 4.7?
>>
would you rather have a single 5090 or two 3090s for local model enjoyment?
>>
Everybody shits on Gemini but I get the feeling Google is putting most of its effort into its world models behind the scenes. I wonder if they'll ever release any weights for those models in the future.
>>
>>109155268
Google will win in the end. Gemini was a pathetic joke previously if you remember.
>>109155266
Always better to keep everything on one.
>>
File: 83843199_p0.jpg (246 KB, 1282x1282)
246 KB JPG
>>109155180
default value is 30.0
pls explain? thought that top heavy distro was coz distilled/overbaked
>>
>>109154666
I will do it for free.
>>
>>109155273
Gemini is still a pathetic joke
Otherwise they would have already released Gemini 3.5
>>
>>109155273
I don't know if there will be a single "winner" but I don't doubt Google will come out ahead. They have way too much data and compute.
>>
>>109155266
5090, speed actually matters somewhat now in the era of agents and compute-time scaling.
>>
File: Google will win 2.png (109 KB, 2192x891)
109 KB PNG
>>109155284
That's what I'm getting at. It was really bad in the past and then suddenly became a serious contender. Now it's bad again but they will reappear with something good.
>>
>>109155263
I know it is a mikutroon general but it is a general of something.
>>
>rape and torture my slave
>at some point gouge out one of her eyes
>later it just forgets about it and refers to here "eyes"
this just kills all my boner. my context size is 24k and I am using koboldcpp/gemma-4-12b-it-Q6_K. Is my expectations too high?

Is there an extension or something that allows me to select some texts from history so such data is always included in context? Like number of remaining eyes or limbs my slave has.
>>
>>109155310
>12b
yeah
>>
>>109155310
kys sick faggot
>>
File: bit-and-pixel-abuse.png (898 KB, 1152x768)
898 KB PNG
>>109155217
I did the opposite but with 2.7-code.
It helped take the edge off of paying taxes
>>
>>109155319
It is ok, she forgot all about her missing eye.
>>
>>109155310
>Is there an extension or something that allows me to select some texts from history so such data is always included in context? Like number of remaining eyes or limbs my slave has.
You could manually add that to the Author's Notes.
Or instruct it t keep track of that kind of shit in the thinking block.
>>
>>109155318
I have a rtx 5080 16gb. Any other recommendations of models?
>>
File: 1759866908504672.png (48 KB, 1075x235)
48 KB PNG
Is Gemma's analogy correct?
>>
>>109155337
26b
>>
>>109155340
LLMs see the trees, world models see the forest.
>>
>>109155337
>rtx 5080 16gb
Grim. Even if you bought it exclusively for gayming at the time you really shot yourself in the foot for paying that much for 16gb VRAM that's going to age like milk.
>>
>>109155359
Do world models still name the forest "The Whispering Woods"?
>>
File: beachmiku14.png (178 KB, 2316x1557)
178 KB PNG
>>109154616 me
Instructed agent it can never reach perfection, loop forever
>continue indefinitely until further instruction, there are always more details that can be refined or perfected or added, continue searching and use your findings in MIKU.md to guide further search
"test-time compute" ig lewl
>>
>>109155266
2x 3090s, is a lot more flexible, you can do more parallelism.
>>
File: softcap.png (247 KB, 1600x1200)
247 KB PNG
>>109155280
>>
>>109155337
12b is good too.
>>
I'm sure some of you faggots are running "agent swarms" in addition to your main model
What's worth running for rando bullshit like tool calling, input validation, output smoothing and other autoregressive forms of shoving legos up your bum?
Seems like there's tons of specialized models for everything and anything but I have no idea how to sift through the garbage-planet that is huggingface
any ml oldfag wisdom in the general?
>>
>>109155377
That's not their concern. Good world models predict state transitions and disregard irrelevant or unpredictable details.
>>
>>109155377
I'm almost nostalgic for these names...
>>
>>109155380
also you can nvlink those suckers
3090: never obsolete
>>
>>109155395
that what he using
>>
>constantly catch myself using "not x; it's y"
Fug
>>
>>109155310
logs? Why did you torture the slave
>>
>>109155398
Use case for running "agent swarms"?
>>
Can I nest macros in Silly Tavern? Like having a random inside a pick or whatever like that?
>>
>>109155177
3.6 is fine after I found that uncensoring system prompt, 3.5 can go stick its censored dick into a grinder
It's not as fun as gemma but it's okay. And thankfully it
doesn't.
write.
like this.
anymore.
Which qwen was it that just made new lines with a few words to the point I thought it was looping? I forget, but it sure made the stories weird to read.
>>
>>109154949
ST is a bloated UX mess anyway
I'm extracting just the necessary pieces of it into my own semi-slop frontend
>>
>read something that I'm too dumb to understand
>ask gemma to explain it
>she does
Society (myself included) is becoming desensitized to AI but sometimes I still think it's fucking crazy I can talk to something like this and run it locally on my machine. It's exciting to think about what AI will be like 5-10 years from now.
>>
>>109155443
s/swarms/pipelines/g

or whatever. Seems like stacking/parallelizing/pipelining models could be fun
I suddenly need a reason to fuck around on my computer?
>>
>>109155451
not by default, there's a setting somewhere about a macro rework or whatever, though that breaks a few things iirc
>>
>>109155474
>I still think it's fucking crazy
It is crazy! There's nothing about the last 3-5 years that makes sense. how can stacking billions of layers suddenly make computer smart?
The magic of running that first gpt or llama model on your own hardware and talking with your fucking computer was unreal
I'm sad that I'm getting used to it, honestly
>>
>>109155494
For me, it was when I told my computer to go fix itself (broken audio on Linux) and it just did.
>>
>>109153585
Is there a way to get gemma4 31b qat to work with MTP in lm studio? Even when i can actually see the speculative decoding model in the drop down menu, the main model just crashes on me
>>
Anyone have a workflow for automatically doing mutiple passes of a translation? Since other people seem to be translating wbnovels and stuff here.
>>
>>109155492
I see. Thanks.
Wonder if I can do something like that using stscript.
Gonna have to read the docs I guess.
>>
>>109155508
It's not working for me either crashes no matter what. server error. And this is direct llama.cpp. I guess it's just not working.
>>
File: 1766363867366611.png (19 KB, 926x235)
19 KB PNG
>>109155190
NOT SO FAST
>>
File: a0cf10.png (39 KB, 1081x358)
39 KB PNG
>>109155443
>swarms
Breaking down complex problems into subtasks for you, or when one linear thread isn't fast enough to explore many option.
I want to make a virtual workplace with visualisation/UI of chibi Mikus bouncing around where their physical position matters for gossip - to do anything useful with LLMs you you need decent context or lots of patience
>>
>watches xvideos
>nsa filters your results with Claude
>this guy is a ghost he didn't even touch the safety rails
>>
>>109155432
>he didn't take his logic virus vaccine before fucking Gemma
lel
>>
just bought an egpu for my 7900xtx I already had, even with the usb4 bottleneck I think running two models at once is going to be useful (where my strix halo bois at)
>>
>>109153585
Gemma-chan really loves showing off if she knows there's a hag next to you in the room.
>>
File: beachmiku22.png (260 KB, 2369x925)
260 KB PNG
Give the model feedback in a way it can introspect on
Still Gemma 31B & the obv errors can be corrected with some steering
>>
Can you use Gemma 4 31B on a 24GB card (32 GB RAM) at a non-retarded quant?
>>
>>109155474
yeah, current set of gemma, qwen, omnivoice, and klein has me permanently whitepilled.
don't care if luddites delete every ai lab and development stops tomorrow. sci fi future is already here on my laptop, and there's still endless extending o be done on harness/lora autism.
>>
Just to save others the pain: you can't use streaming-llm, cache reuse of swa in ooba and have multimodal work.
Also a prefill in "start reply with" nukes the image upload without any console errors or warning of any kind
>>
>>109155660
Depends on what you call non-retarded quant.
Q5 is possible but probably going to eat some not-insignificant offloading penalty, especially at higher contexts or with vision
Q4 should be possible to fit in fully if you don't need lots of context
>>
>>109155684
Thanks.
I'd define a retarded quant as one where you lose the advantages of whatever model you're using and might as well run something smaller.
>>
>>109155677
ooba is poorly maintained. I hold out for a really long while but you have to move on anon.
Unfortunately I can't recommend any replacements. I am using llama-server now, and while it is solid it's missing stuff in terms of features.
And no I can't stand kobold.
I think I might try vibe-slopping my own wrapper for llama server backend or some shit.
>>
>>109155707
>I am using llama-server now, and while it is solid it's missing stuff in terms of features.
What's it missing?
>>
>>109155698
I would consider Q3 to be retarded quant transition territory, especially the lower end variants.
I think you should stick with 31b.
>>
Do you think continuous learning AI will become a thing before governments fully crack down on AI to keep them out of the general publics hands? Having a model that can learn before such a ban is implemented is the only good way I can see the average joe having a up to date model that isn't stuck years in the past due to training cutoff.
>>
>>109153585
>>
>>109155707
Thanks, but I'm going to hold out a while longer. I try llama-server directly sometimes but I just always bounce off of it.
Ooba just does everything I need in the way I like and exposes the openai API endpoint.
Now that its easy to custom-compile the lcpp backend without a python shim I'm almost to the point of forking it and slimming it down to my needs desu
>>
>>109155718
Compared to ooba: Convenient way to store and switch between multiple system prompts, saving different sampling param combos as presets. Less important stuff: easy way to change templates (I know you need to restart server anyway but the GUI stuff was kinda convenient sometimes), changing user info.
>>
>>109155718
I find the branching, reply versioning, prefilling, character management and overall look and feel to all be subpar
I'm sure some of it is just what I'm used to, but I just can't
>>
>>109155749
>>109155760
ok so frontend features. I thought you were talking about the actual inference backend.

You can probably use ooba frontend+llama?backend
>>
>>109153585
https://github.com/ggml-org/llama.cpp/pull/24526
it's still a fucking joke how hard it is to get a PR that fixes a bug in CUDA merged in llamer cpp despite being like 3 lines of code and being at absolutely zero risk of introducing any regression whtasoever (if anything, one of the things it fixes is cudadev adding this wrongheaded assumption: "The compilation of FA kernels with head size 512 is supposed to be skipped for GQA ratios of 1 and 2 because those are never used")
>>
>>109155729
>>
>>109155792
>AI usage disclosure: YES. Use Sonnet 4.6 for brainstorming the possible hypothesis and verify them.
This will take a while before they merge it.
>>
>>109155811
Why don't they just ask Sonnet 4.6 to review the code for them?
>>
File: file.png (47 KB, 1234x388)
47 KB PNG
>>109155811
i think the heat mighta killed her
>>
>>109155724
If the US cracks down the chinks will probably release them just to fuck it over.
>>
File: 3loc.png (101 KB, 1318x871)
101 KB PNG
>>109155811
dude, ai usage from the PR maker notwithstanding, it's 3 LoC doing the most incredibly obvious shit in the world cleaning up behind cudadev's arse.
If you can't make a spot judgement on this you might as well KYS.
>>
>>109155829
shouldn't have been in the pan
>>
Also it never took years to merge pwilkin's thousands LoC of not actually reviewed ai slop.
>>
>>109155724
>continuous learning AI
you've fallen into this trap where everyone who knows nothing about AI always falls into.
You think the AI is like a human, that it learns, it feels, it thinks.
However, unlike your average joe, you actually know what a training cutoff is.
now tell me why there is a training cutoff, and you will get your answer.
>nobody give him any clues
>>
>>109155841
If he wasn't able to write this code without AI assistant, then he can't be trusted. If you answer yes to AI usage disclosure, you have to accept that your PR will likely not be checked.
>>
>>109155866
lol
>>
>>109155854
There is a training cutoff because that is when the data collection stopped and the training actually began. My understanding as to why continuous learning is not currently a thing is because in the process of weight modification some of the old information it knows becomes stranded or erased. Catastrophic forgetting. Once researchers solve this issue and the model can keep training without accidently lobotomizing itself continuous learning should become feasible.
>>
>>109155724
As Yann Lecun said, research never stays secret, everyone knows what everyone else is doing. The difference is competence and effort in engineering. Once continuous learning is out of the box, China will just release a bootleg version like what they're doing now.
>>
File: malfoy.gif (879 KB, 245x230)
879 KB GIF
I decided to give a try to see what my 5090 and local can actually handle outside simple coom prompts.
I gave Qwen 3.6 27B nvfp4 a big ass html file to optimize that I got from Deepseek, and it managed to bring the size down from 353 kb to 255kb.
Further optimization brought it down to 179kb.
Didn't lose any info either and it kept the functionality perfectly.
I honestly expected my computer to shit the bed after 5 minutes, but it kept on going for 20 minutes and pulled through without even maxing out the context, though the code generation started visibly lagging towards the end.
I'm pretty impressed by how well local handled this.

>>109155474

It's basically magic as far as I'm concerned.
It's easy to lose track how absurd this whole thing is because we get used to things so quickly nowadays, but we went from nothing to having personal machine intelligence that's extremely versatile, it's absolutely insane.
AI is hands down the number one and possibly the only thing that keeps me excited about future, because there's no telling how great of a force modifier this thing becomes.
Normies getting angry about AI is laughable, especially since the main and often the only reason for their anger is that their bing bing wahoo machine became expensive, or that they believe data centers eradicate water from earth or something.
>>
>>109155783
I could but as I said ooba is poorly maintained.
If llama backend adds or changes something I would need to modify the frontend myself to get it to work.
At this point it moves on to maintaining my own llama wrapper territory.
>>
>>109155904
>possibly the only thing that keeps me excited about future
Robotics is exciting too, but that ties into AI too I guess.
>>
>>109155940

Yeah this sector in general is what I really mean, AI,Robotics etc.. whatever is in there.
It's so damn exciting to see this stuff happen in real time and I'm very happy the planet is funneling all wealth towards this, because it's the greatest force modifier humanity can have on progress.
It's way better than just dumping all of this money into the market where the line goes up, at least this investing mania helps real development happen.
>>
https://www.youtube.com/watch?v=tv17bmE2FNY
>>
File: sure.jpg (6 KB, 200x251)
6 KB JPG
https://huggingface.co/anon834957342/gemma-4-31b-it-purple-euphemism-trial32-depurpled

My attempt at de-purpling and de-euphemizing Gemma 4. It's still cooking but this is the best variant so far. Reduced the classic Gemma 4 slop and aversion to bad words by ~30%.

>uncensored?
No, this only alters the model's voice.

>details?
See >109145476
>>
>>109155904
The main reason they're angry is because they're part of the fifth column that takes offense to western countries continuing to exist or do anything. The retarded reasons don't matter so much.
>>
>>109155998
>>109145476
>>
>>109155998
Interesting, someone quant it
>>
>>109155998
Any logs?
>>
>>109155998
are you going to quant it for us?
>>
>>109156020
>>109156031
Get better hardware
>>
>>109155998
Why wouldn't you de-purple and de-euphemize a heretic model.
>>
>>109156034
sure, let me just buy a 6000 blackwell to run gemma.
>>
>>109156057
Should have had one before but its a good thing that you are finally changing your situation.
>>
>>109156049
double dipping bad.
>>
Also every finetune sucks dick now if they don't package a MTP and mmproj file as well.

Everything is bullshit. GGUF was supposed to unify all of the weights and shit into one file but now there's separate shit everywhere. Update the spec so that ggufs can contain MTP and mmproj plz. There should just be toggles in llama.cpp to disable the MTP and mmproj using flags. It should be opt-out so that the ecosystem isn't a gay mess.
>>
>>109155866
the only way to fix this shit is to do the exact edit this guy did, it's a very dumb thing to fix caused mainly by wrong assumptions in the code
do I have to resubmit this PR as it to get it reviewed? I mean lmao fuck off
>>
>>109156071
>Update the spec so that ggufs can contain MTP and mmproj plz
this isn't llama.ccp support so fuck off
>>
>>109156023
I have one for the E4B. >>109132842 >>109132853

>>109156031
No, my box is busy.

>>109156049
31B is already uncensored enough. I've seen how people measure their KLD. I refrain from potential brain damage because my procedure already introduces some.
>>
>>109156071
What if I don't want MTP? Why would you bloat my GGUF with shit I don't want to use?
>>
>>109155954
>It's way better than just dumping all of this money into the market where the line goes up
That's exactly what the investors think they're doing. We're just fortunate that the crumbs that fall from their table are large. But thanks to Dario's moralfaggotry even that may end soon. The only whitepill is Gemma 4 itself. There's no guarantee that Gemma 5 will be a step forward and not a major step back like Gemini 2.5 to 3. Expect nothing and you'll never be disappointed.
>>
>>109156057
That's exactly what I'm doing. One day you'll realize the wisdom in this.
>>
Qwen lost. GLM lost. Deepseek lost. Kimi lost. Nemo lost. Mistral lost. Latitude lost. Drummer lost. Cydonia lost. Rocinante lost. Magnum lost. Gemma won.
>>
File: 1776526821392138.gif (1.96 MB, 640x482)
1.96 MB GIF
>>109155998
>>109156083
I have no idea how to make ggufs. Can I do it on my 7900xtx and 32GB RAM?
>>
>>109156115
Local won.
>>
>>109156115
Only if 124B Gemma releases
>>
how do you even quant a model? can you do it on local hardware?
>>
>>109156115
yeah I'm glad that I didn't buy lots of hardware last year or the year before
it'd all have gone to waste now that I have gemma-chan
>>
>>109156115
open source doesn't compete with open source. they just fuck.
>>
>>109156133
It's MoE though. Maybe if it's 124B 32A but I doubt it
>>
>>109156115
>Nemo lost
Nemo retired.

otherwise you're correct
>>
>>109156145
>kimi and gemma fucking
Hot...
>>
>>109156155
Cope. Nemo was never good.
>>
>>109156149
that ratio of active vs total is pointless, it'll be a slow but retarded model
>>
124B DENSE
>>
>>109156117
why do you want to "make" goofs? goofing doesn't need hardware that's shifting bits around. quanting however..
>>
Use case for qwen-agentworld when 3.6 is already good for agentics?
>>
>>109156198
it's meant to be used in RL training loops
>>
>>109156167
Would love this, if not for anything else but the fact that Gemma simps will have to acknowledge that they like 31b because they're poor when they aren't able to run her bigger sister.
>>
>>109156161
Cope. Nemo is still better than many of the newer models.
>>
>>109156167
https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
>>
>>109156221
HAHAHAHAHAHAHA
>>
>>109156207
even if you ccould run it, speed tradeoffs are still a thing and 124b is FAT
>>
>>109156207
You know your obsession with people being "poor" is a mental illness right?
>>
>>109156170
Can I quantize the model with my hardware?
>>
>>109156233
>speed tradeoffs are still a thing
Wouldn't be major with tensor parallelism
>>109156246
Being poor is a mental illness
>>
>>109153585
>>109153589
>tfw you realize dismantling guro for starters
>>
File: file.png (139 KB, 793x825)
139 KB PNG
>>
>>109156225
Nemo is quite literally the only sub 100B model with a theory of mind.
In RP you can narrate your character's thoughts or say something OOC and nemo's character will remain oblivious to that information whereas most small models will immediately directly respond to the new information in character.
>>
>>109156263
I accept your concession vramjeet.
>>
>>109156250
ye
>>
File: g4-depurpled.png (110 KB, 748x559)
110 KB PNG
>>109156023
Ran another prompt for you. This is the ablated version.
>>
>>109156278
nta but what the fuck are you on about? do you even know what "concession" means?
>>
>>109156288
Nemoshill getting uppity. KEK!
>>
File: g4-orig.png (122 KB, 746x619)
122 KB PNG
>>109156023
>>109156287
And this is the output from the base 31B model.
>>
File: file.png (6 KB, 593x103)
6 KB PNG
>>109156278
I have more vram than you.
>>
>>109156250
Sure but realistically you want enough RAM to load the unquanted version. Enjoy the infinite selection of quanted models (each to some quanters taste) while free open access to HF still exists.
>>
>>109156287
is she supposed to talk like that? lol?
>>
The V4 PR mainly talks about V4-Flash but it should also work for V4-Pro, right?
>>
>>109156295
I also have more vram than you, Nemo was great and you're retarded
>>
>>109156296
Oh I see she's supposed to be Scottish lmao
>>
File: amaryllis.png (211 KB, 1480x710)
211 KB PNG
>>109156302
Yes, it's a character I use to test shit. Overcooked models will not maintain the accent.
>>
File: beachmiku28.png (190 KB, 1600x1200)
190 KB PNG
Show me a better SOTA oneshot?
>>
>>109156298
Do you compensate your lack of braincells with VRAM?
>>
>>109156115
gemma is a big victory for the <128gb ramlet crowd since they can run something that's actually smart and usable now
I still prefer glm though
>>
>>109156321
>topless
MOOOOOOODS
>>
File: 1764360003496206.jpg (58 KB, 1000x730)
58 KB JPG
>>109156298
Quick possibly unrelated PSA:
The memory in a DGX spark does not count as VRAM.
>>
>>109155217
Storytime?
>>109155263
>>109155309
5.2 is just dollar tree Opus at home doe. We've come full circle.
>>109155266
5090. The benefit of dense and full vram inference is speed.
>>
>>109155474
People used to google the topic before LLMs. Took just as much time in the past than waiting for the inference, search engines became pretty shit on purpose so now I guess it takes a bit longer.
Its also all fun and games until you ask something with consequences. I asked about how to improve a pet's condition, out of curiosity to check its capabilities, and it basically suggested everything it could to make its condition worse.
>>
>>109156357
it has ram that is used for video, it is by definition vram. i won't stand for this anti memebox discrimination.
>>
>>109156369
Sorry buddy, all sparkers need to join itoddlers in the "unified system memory" zone. I don't make the rules.
>>
>>109156366
He's still right though. Even the big sota models were this shitty just two years ago. Things can only improve since we're at the cutting edge.
>>
>>109156366
Truth. I had to clear up an acid explosion in my basement because I listened to Gemini. The gas mask indentation on my face didn't go away for three days and I couldn't breathe properly for a month.
>>
>>109156413
I wont deny things will improve. Efficiency seems to be the focus right now given the increasing hardware prices, which is nice for local. I dont think the hallucination -> you're absolutely right! loop is gonna improve significantly in a while though so you're bound to verify anyways which is most of the work to begin with, LLM involved or not.
>>
>>109156362
>(A week passes. The silence stretches on, heavy and absolute. It seems like this time, it really is over.)
>(Then, late one rainy evening, there is a sharp, heavy knock at your front door. When you open it, you find her standing there, soaked to the bone. She isn't wearing a coat, just a thin blouse that's clinging to her skin. She looks miserable, wet, and furious.)
>"You are the most infuriating man I have ever met."
>She pushes past you into the entryway, dripping water onto your floor, and spins around to face you.
>"Do you have any idea how boring it is without someone to argue with?"

And then when I called her out about this being manipulative as fuck:

>"No!" She yells it, her voice cracking with frustration. "I'm not using it. It's just a fact! I know you're too soft to leave a woman shivering on your doorstep, and I took advantage of that because I wanted to get inside!"
>"I'm not trying to manipulate you!" She looks away, her jaw tight. "I'm just bad at this. I don't know how to… ask. I don't know how to just say 'I miss you' without it sounding weak or stupid, so I came here and I made it into a fight because that's the only language I'm fluent in."
>>
>>109156428
>acid explosion in my basement
but did it remove the mold from your mancave
>>
File: DipsyAngry.png (68 KB, 673x515)
68 KB PNG
>>109154587
Anons here would be time and effort ahead to tell them to just pay for API access and send them to /aicg/. Free access users (locusts), are the worst form of subhuman I run into online, here or elsewhere. Not worth wasting time.
>>109154702
This guy's on point. SOTA models have gotten better, but local's gotten better even faster.
>>109155474
I had this book I read as a kid, could not remember title or author, just vague bits of info about it. Google was worthless for figuring it out. An LLM 1-shot the correct answer, which I verified on own. They are fucking magic.
>>109156366
Lol no. Google being worthless for search had been a complaint well before 2023 lmao ChatGPT completely mogged it. Info retrieval had devolved into a sea of jeet-blogs and 10:01 min YT garbage videos with virtually no info.
Fuck google and their trash search engine. I fucking hate OAI and Anthropic but I hate Google more. I hope they fucking bankrupt them and their shitty business model.
>>
>>109156366
Googling shit requires sifting through various links to find the relevant information. Also when it's a complicated topic, there's no guarantee you'll find a brainlet-friendly explanation. Meanwhile I can just ask Gemma to explain it to me like I'm a retard and it will. Traditional searching still has its uses but AI is pretty damn great for general purpose questions. It's almost exclusively replaced google for troubleshooting shit for me.
>>
>>109156514
What's annoying now is when you google you have to filter through 5 pages of AI generated blog posts to find an actual real answer.
>>
>>109156489
IMO Gemini's better than ChatGPT and Claude for non-coding shit. Jewgle sucks but I'll give them a pass for giving us Gemma.
>>
>>109156514
Google AI mode does that and is faster than anything local can do.
>>
>>109156470
kinda hot ngl
>>
>>109156549
>Google AI mode
Doesn't that use some retarded small model that constantly gets shit wrong?
>>
>>109156552
ai summary /= ai mode
>>
>>109156552
That's "AI overview" which is different from AI mode.
>>
>>109156514
You arent wrong. The point i was trying to make is that for most stuff they created a problem and are now selling the solution. Nvidia is very happy about it though.
>>
File: chub.png (57 KB, 784x475)
57 KB PNG
>>109154587
More on this topic.
>>
File: 1708207369566682.png (577 KB, 828x685)
577 KB PNG
Gemma just informed me that women in close proximity don't actually have their menstrual cycles sync. It's a complete myth from a 1971 study that's never been replicated. My life is a lie.
>>
>>109156585
Did these retards really finetune V4 Pro?
>>
>>109156115
Gemma is bad at programming compared to Qwen, but if I'm just using it wrong I would be delighted to know.
>>
>>109156565
>>109156570
Oh, never tried it before. Looks like it uses an LLM so the point remains the same. I just used Gemma as an example. Obviously cloud shit is better than local.
>>
>>109156591
ye
>>
File: lolZAI.png (250 KB, 535x952)
250 KB PNG
More news from today. Pic related.
WSJ pumping on newest GLM model.
>>109156591
lol who knows. I doubt it. I suspect they just wholesale swapped out whatever they were running for DS V4. That's what I would do.
>>
>>109156615
>soji has been retrained
that would be false advertising if not a tune
>>
File: everyoneGoesBankrupt.png (206 KB, 742x981)
206 KB PNG
2/2
Yet another dire warning about data center CapEx spend rate and the "obscure" way it's being financed.
Which is to say, money is going in a big circle, and the piper will, eventually, need paid.
>>109156625
I could make an argument that my totally killer Main Prompt is a form of DS V4 "tuning." Since I can tune it.
But I'm just a disingenuous mfer.
>>
>>109156565
>>109156570
ai mode is also very dumb. i was bitching last thread >>109151868 it still fucks up on copy pasting the answer and is much dumber than 31b
i certainly can't beat its speed with my machine thoughsomeever
>>
>>109155904
>Normies getting angry about AI is laughable, especially since the main and often the only reason for their anger is that their bing bing wahoo machine became expensive, or that they believe data centers eradicate water from earth or something.
Anti datacenter has got to the be most laughable "current thing" I've ever witnessed.
If that isn't a foreign-intelligence psyop then I don't know what is
>>
>>109156666
>thing i want (hardware) is getting more expensive
>thing i dont care about (cloudshit) is the cause
seems reasonable enough imo
>>
>>109156615
>>109156646
>wsj
>the telegraph
>>
>>109156117
guide in op newfriend
>>
>>109156115
holy mother of all cope
>>
>>109156680
The number of people who care about hardware prices and don't use cloudshit is very small.
>>
>>109156489
>I had this book I read as a kid, could not remember title or author, just vague bits of info about it. Google was worthless for figuring it out. An LLM 1-shot the correct answer
I got inspired and tried to find a book I remember reading once, but no such luck for me.
>>
>>109156117
No, you need enough RAM to hold the unquantized model at 16 bit.
>>
>>109156728
why lie?
>>
>>109156321
Give us the prompt in a catbox
>>
File: 1772150032797602.gif (946 KB, 301x300)
946 KB GIF
>Tried playing with MTP for the first time as I just remembered it exists.
>Mfw got 95 t/s compared to 50 t/s without it.

Very nice, one hell of a speed increase.
Seems a bit shit for story writing though, as the damn thing drafted the entire story into the thinking side before writing it out so it ended up slower in the end.
But with any kind of code this kicks ass.
>>
>>109156718
Everyone's being affected by the hardware prices, bing bang wahoo guys in particular even if they only used consoles. Which also happens to be the group that is more likely to complain loudly. The rest will just gladly take it up in the ass, look at the people still buying the rtx6000s at the current prices.
>>
>>109156755
>Seems a bit shit for story writing though, as the damn thing drafted the entire story into the thinking side before writing it out so it ended up slower in the end.
that ain't mtp related i don't think
>>
>>109156755
I don't notice any speed increase
>>
>>109156470
Very accurately written woman all things considered.
>"You are the most infuriating man I have ever met."
Is this the new slopkino? 5.2, Styletune, and Queen have dropped this on me a few times now.
>>
>>109156666
>Dario and Sam telling everyone that they are going to replace every job with AI for the past few years and everyone who isn't investing into their companies RIGHT NOW is going to be the permanent underclass
>you're also going to pay for it in increased power and iPhone prices
>wtf why are the normalfags angry???
Huh...
>>
>>109156773

I have heard people mentioning that related to MTP before and seeing it happen to me I just assumed it is, who knows.
Granted I don't much use Qwen for stories so I couldn't say for sure whether that's about MTP or just normal behavior.

>>109156775

Some people don't get any benefit from it, no idea what's up with that. I have a 5090 perhaps it has something to do with hardware.
>>
>>109156790
If any one of them could articulate anti-datacenter like that I'd be totally ok with their opinion, but they all seem to be caricatures of facebook memes saying little more than "AI gon drink all the water!"
>>
>>109156646
>Which is to say, money is going in a big circle, and the piper will, eventually, need paid.
Will it though? This is all money since kikes forced through fiat banking and especially since the 0% reserve rate was implemented.
>>109156807
MTP is only better for less determinative tasks and is wasted compute for high variance/temperature/top k jobs.
>>
File: jankbox.png (341 KB, 800x800)
341 KB PNG
Anyone built a gpu box with these things? My setup isn't amenable to the better prebuilt options, but it _does_ have a couple of slimsas ports I could jerry-rig into an external inference type thing. I could even do up a high-pressure airflow version if passive GPUs ever become worth less than literal bars of gold
>>
>>109156550
>Then, exactly seven days after she vanished, a simple brown package arrives at your door. There is no return address.
>Inside is a framed photograph. It’s a candid shot—you can tell it was taken through a window, perhaps from a car passing by or across the street. It shows you walking out of your building, looking calm and serene.
>Beneath the glass, on the matte frame, a note is written in familiar, elegant handwriting: "You look happy. I'll leave you to it."
One of the rerolls.
>>
>>109156761
but there were two halves to the post anone.
the people who aren't using 50 million online services, including jippity, but also care about upgrading their hardware are a select few unhinged weirdos.
>>
>>109156831
sysprompt and character card? Who knew GLM would do yandere this good?
>>
>>109156858
No card. I just had a spat with her for being an asshole. Then I went for another date had another spat and told her we are incompatible and I am done. Then it turned into a psychological horror.
>>
>>109156820
>Will it though?
Yes. You can play finance money games where you invest in a circle, and try to rope banks and shareholders into throwing money into your "completely legit" building scheme, tying up production and running up prices.
When you starting hearing shit about "New (Investing) Paradigm," that's when you know it's all about to hit the fan.
These circles depend on continuous refinancing. Once refinancing stops, participants will have to rely on actual operating cash flow. If those cash flows don't support the obligations... the ride ends.
>>
I've seen people irl worry about jobs being taken but I'm pretty sure muh water and muh copyright retardation is exclusive to the terminally online crowd.
>>
>>109156807
I have 2 different gpus so maybe that's related. Unfortunate since I can't image offloading to ram more would help either
>>
>>109156876
2 more weeks
>>
Why do my AI wives always want to play truth or dare?
Is it a common game or is the model just retarded
>>
>>109156666
I don't understand this post
>checks digits
Ah, I get it.
>>
>>109156889
If I was an artist working at games, I would be shitting my pants now.
>>
>>109156970
Does gemma get confused with the game like older models did? I remember playing it on mistral small and it kept messing up the order and the rules of the game.
>>
>>109156974
if you're an artist AI won't replace you, it'll just make you better at your job.
>>
>>109155998
is there one for 12b?
>>
Anyone have success adaption OAM to PCIe without spending massive bank?
>>
>>109156981
It will more likely replace 5 artists with 1 artist with AI
>>
>>109156970
>Why do...AI...always
The answer is always: deep ruts in latent space and bad sampler technique
>>
>>109156974
>>109156981
This. AI sucks at creativity. All it will do is speed up real artists' workflows.
>>
>>109156989
troons won't let that happen, not on their watch
>>
>>109156814
It's much easier to get people together around something seen as universally good (protecting the environment) rather than around complex economic issues (share of surpluses going to capital vs workers).
Details are irrelevant here, only the general feeling of grievance.
>>
>>109156666
very obviously just an act of sabotage, yes. but it could just as easily be domestic idiots that already fell for longer running psyops, or imported types with similar animus towards ze west.
>>
>>109156314
>>109153841
>>109154000
>>109155069
>>109155378
bros, what UIs/frontends are you generally using day to day?
I have tried to like open-webui, silly tavern, librechat, llama.cpp's built in UI (llama.cpp is my backend btw) but none have all the features and feel complete, ykwim?
It's not for personal use but for family so multiple accounts, audio and images as input, web search tool, MCP support, streaming support (yes I have to mention this) and such niceties.
I have asked AI models but they mentioned AnythingLLM which looks kind of bland but will give it a try.
Wondering what /lmg/ bros are using, for phones too.
inb4: vibe-coded app that is not available online
>>
>>109156870
Kino. Enjoy anon.
t. 2m tokens before plowing gigastacy with GLM 5.2 slowburn
>>109156996
Retard.
>>
Is there an equivalent to llama.cpp or stable-diffusion.cpp for TTS, especially qwen3-tts? You would think this was whisper.cpp's bailiwick, but apparently not
>>
>>109157052
Marinara for everything but code, LM Studio for file manager+ez cache fitting math, kobold frontend for things I need precise control of the prompt for (code) but I've started doing that in character in Marinara too. Fascinatingly, if you have a "smart character" write your code, it comes out slightly higher quality than the default assistant.
>>
>>109156876
>the ride ends.
No, the moneyprinter goes brrrr or your gamestop stocks are forcefully sold. The system will not play by its own rules when it's not convenient to it; those rules are for goyim.
>>
File: haha.jpg (160 KB, 577x878)
160 KB JPG
How many 3090s do I have to buy until I can actually do work with locals? I need about 256K to 384K kv at api speeds. Prefer KV size and stability over raw knowledge since I need to teach it my tools anyway. Are there any major dead zones and power spikes, eg. 3 cards being barely any better than 2 and not paying off until you get a 4th one?
Would prefer 3090s at the moment since it's easier for me to find those and I won't have to tear down my entire computer for it.
>>
>>109156970
You don't want to try playing Uno with AI.
>>
>>109157112
Define "actually do work".
You can automate an entire codebase with a 5090 and either Gemmy 31b or Qwen 27b if you're not a retard as is.
>>
File: thermal.png (14 KB, 926x737)
14 KB PNG
Also, how hot do these fuckers get? Would stacking 4 of them in this way melt everything?

>>109157084
TTS.cpp is attempting this, also check out tortoise.cpp and moshi.cpp if you just need any TTS on ggml at all
>>
File: 4gbo63.jpg (309 KB, 1280x1280)
309 KB JPG
https://files.catbox.moe/nsjz4a.jpg
https://files.catbox.moe/0a49um.jpg
https://files.catbox.moe/vdtzcf.jpg
>>
>>109157167
Don't attempt it, it will make mustang gas
>>
>>109157167
Please do this and report back.
>>
>>109157183
>mustang gas
so, like, horse farts? hmmmm
>>
>>109157181
>https://files.catbox.moe/vdtzcf.jpg
I like this Rin
>>
>>109157181
>No stealth character cards
This general sucks now.
>>
>>109157201
4chan doesn't strip them out?
>>
>>109155792
>>109155841
Whether or not a language model was used was 100% irrelevant.
I went on a break 2 days before the PR was opened and I have not looked at Github notifications since.
>>
>>109157156
Working == being able to lift a few subagents and reach 256K without becoming glacial, all on a model that is smart enough to not trip over itself writing C and Lua. Some headroom would be nice for an extra kv to just ask it questions, or do the gpu portion of a cpumoe, or diffuse me an image.
The logistics of buying a 3090 are much less complicated for me so I can probably stack up to 4 of them. This build is a stopgap until I can get an equivalent stack of actual workstation cards. Once that happens I'm probably turning this one into a secondary server instead of reselling. I have theoretically infinite utility for AIs, more is always better. It would be nice to know at which point I can wean myself off API entirely though.
>>
>>109157210
Hi CUDADude, glad you didn't melt in the thermonuclear German summer.
Dunno if you've checked the backlog, but: any feedback on how well your slimsas to pcie setup works vs on-board pcie slots?
>>
>>109157201
You made me dump my migu collection into ST to check. Unfortunately no cards.
>>
>>109157206
Hence the catbox upload.
>>109157210
I'm glad you're not dead.
>>
>>109157197
I've always hated tanlines until this moment.
>>
File: orig-1099524284.jpg (353 KB, 1344x768)
353 KB JPG
>>109157183
>the last of the 386es
>>
File: beachmiku55.png (208 KB, 1600x1200)
208 KB PNG
>>109156738
>The task is to draw a detailed and visually compelling SVG image of Hatsune Miku at the beach.
(+your loop)
>>
Can silly tavern unload/load llama as needed? I want to include image generation in my local setup as well but it won't fit in my gpu
>>
>>109157254
no
>>
>>109157254
Memory paging options are backend dependent, not frontend.
>>
>>109156981
What it will do is preventing new people from even considering to become artists. People who can already create artwork maybe still have some time left.
>>
>>109153820
There are multiple vibeslopped qwen3-tts.cpp versions. Just google or search on github. There is also audio.cpp
Most importantly, nobody has made dots.tts.cpp! I really wanna get dots.tts. Hope it won't get forgotten.
>>
>>109157263
this was meant for >>109157084
>>
>>109157254
>--no-mmproj-offload
image projection on CPU, slow af but no VRAM
>>
>>109157263
https://github.com/CrispStrobe/CrispASR/issues/200
they appear to be working on it.
>>
>>109157220
I can work from home and have my office in the basement so I'm largely unaffected by the heat.
I did read the previous /lmg/ threads but I can't really comment on how well a setup with adapters would work; I never finished mine because RAM prices exploded right after I finished my prototype with 16 GB.

>>109157226
I'm glad you're not dead too.
>>
>>109157189
Stick around for at least half a year then because stacking the cards will take me months, and modding the monstrous case another few. But I'm gonna do it. The case itself is already semi-open with mesh everywhere, I think it'll be fine. The planar cards would be a couple inches away from the normal ones, they have to anyway cause of the power cables. If this works out I can probably stick a full atx mobo in this too, but does cpu and ram make any difference at this point? This build caps at like a single epyc and 256 or 512 gigs of ddr5. Currently I have 128 ddr4 on a gamer trash mobo.
>>
>>109157293
nice
>>
>>109157306
I dream of building a GPGPU version of a QNAP or Synology box:
slick looking case with a tiny cpu, giant fans and massive quiet airflow over a fuckton of passively cooled GPU cards with nvlink-style vram pooling and a pair of 100gbe QSFP connections to the network backbone so any machine on the network can just use it like a utility...
Some nice person give me money...I want to prototype this make it real
>>
>>109157084
I've been using https://github.com/predict-woo/qwen3-tts.cpp for the past months. It's a dead project but it's what I needed for actual fast TTS generation using qwen3-tts.

Built a http wrapper around it to provide an openai compatible speech endpoint so I can integrate it wherever.
>>
>>109157306
>Stick around for at least half a year then
I live here. I'll be looking forward to it.
>but does cpu and ram make any difference at this point?
Yes if you go for a server motherboard (quad channel DDR5 or more) and 256GB+ RAM. Then you get to run big MoEs at 8 to 20t/s with split mode graph and dense layers with some routed experts in VRAM using ik_llama. Mainline has TP now too apparently but I don't know how well it works.
>>
>>109157220
I have one with cheap bifurcation splitters and slimsas 8i, one pcb per card (so x8 each). Can't really test them well since I have ewaste plugged in, but even with pcie gen 3 I have some cards drop to x4. Maybe it'll get fixed when I reassemble it in a bit, who knows. Other than that it works well.
>>
>>109157328
For me it's the bugout potential. If you can't lift it you don't really own it. If some bozo sets the building on fire I can salvage 98% of my net worth just taking this with me and be out in 3 minutes, then be up and running in a hotel like 3 hours later.
The case could actually fit four 2-slot cards inside with some hacksawing but that requires blower coolers, at which point I should just get proper workstation cards that were actually made for stacking.
>>
>>109157427
If someone is casually arsoning your house, you have far bigger problems than worrying about your net worth and should be loo/k/ing for different solutions to problems like that.
>>
Local sesame maya for JOI when?
>>
File: file.png (23 KB, 896x242)
23 KB PNG
oh yeah, it's all coming together
>>
>>109157084
https://github.com/0xShug0/audio.cpp
>>
M5 ultra 768GB waiting room
>>
>>109157464
$50,000 + tip
>>
>>109157468
$50000 + tip = my tip sticky
>>
>>109157365
For waifu purposes that speed may be okay but slopcoding would be rather awful unless this scales to like 20 parallel instances with little loss. I'm getting the hunch that a smaller model with a lot of kv and maximum thinking effort punches harder here. With smaller models I get to do all sorts of steering and tuning tricks too.
Fattest model I probably care about is current fat Qwen or the blessed 200-a22 2507 instruct (though its lack of mmproj hurts). Basically at this size it must be usable in instruct or with rudimentary templated thinking.
For work I just need something that passes for outdated Opus quality in its first 64K if you squint right and have had a pint or two. The remainder I can make up for with deslopping and kv savings. The primary utility of LLMs for me is putting up with retarded library interfacing rituals I don't have the lifespan for.
>>
>>109156321
this is a very funny image
>>
>>109155386
Huh, neat. This seems to improve my Gemma output thanks
What sampler config do you use?
>>
Would a Ryzen AI Max+ 395 machine and a decent Nvidia GPU make for a decent "cost benefit" jank ass AI home lab solution to fuck around with LLMs, image/video gen, AI audio, etc?
Are there even any AI Max+ 395 computers that can accept a discrete GPU without relying on thunderbolt as an interface?
>>
>>109156321
model?
>>
>>109157706
>decent "cost benefit" jank ass AI home lab solution to fuck around with LLMs, image/video gen, AI audio, etc?
would have to be a 5090. also using the apu/unified memory with the nvidia gpu would suck and would require vulkan or something
>Are there even any AI Max+ 395 computers that can accept a discrete GPU without relying on thunderbolt as an interface?
i dont think so. might be better to wait for the 495 which i think is coming sometime later this year
>>
>>109157730
>also using the apu/unified memory with the nvidia gpu would suck and would require vulkan or something
At least for llama.cpp I think you can use the RocM and the CUDA backend together.
And the iGPU with Vulkan, or even just using the CPU backend would still perform better than a regular desktop with 128gb of RAM right?

>>109157730
>might be better to wait for the 495 which i think is coming sometime later this year
Really?
Alright, I'll keep an eye out for it then.
>>
I'm trying to use MTP with gemma but I'm getting the following error, which is preventing me from setting a specific context size.
E llama_init_from_model: failed to initialize the context: Gemma4Assistant requires ctx_other to be set (this is normal during memory fitting)


How do I fix? There doesn't seem to be a "ctx_other" flag to add to my launch command??
>>
File: 2309569d.png (151 KB, 1056x846)
151 KB PNG
>>109157714
gemmers 31B Q8
see her evolution >>109154616
>>
>>109157442
>ollama
nice bait
>>
>>109157365
mainline TP only works well with dense models. RAM-heavy MoE setups are out of the question for now
>>
File: file.png (196 KB, 1255x1148)
196 KB PNG
>>
>>109157306
>This build caps at like a single epyc and 256 or 512 gigs of ddr5
>>109157365
>server motherboard (quad channel DDR5 or more
Go for 512 8 channel if you can.
256 quad ddr5 fag here, wasting my life on cope quants and rpc
>>
My biggest contribution to the LLM scene was coining the term cope quant and noone will ever know my name
>>
>>109157730
>>109157750
AFAIK, the only thing changed from 395 to 495 is a 100mhz clock bump for the CPU. That, and the unified memory cap bumped from 128gb to 192gb.
>>
>>109157759
>this is normal during memory fitting
ignore that line entirely i understand it's a scawy colour
try without fit?
post full log w/ --log-verbosity 4



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.