[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: dipsyMikuFix.png (2.62 MB, 1024x1536)
2.62 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109092907 & >>109088988

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: miku small migu eyes.png (246 KB, 800x800)
246 KB PNG
►Recent Highlights from the Previous Thread: >>109092907

--Utility of MCP servers for supplementing LLM functional weaknesses:
>109096961 >109096970 >109097023 >109097038 >109097068 >109097227 >109097244 >109097260 >109097263 >109097269 >109097536 >109097619 >109097625
--Monetizing local LLMs and using Gemma 4 for game development:
>109094275 >109094317 >109094394 >109094311 >109094315 >109094546 >109094327 >109094331 >109094341 >109094404 >109094471 >109094490 >109094465 >109094481 >109094410
--Function of the --reasoning flag and its relation to jinja templates:
>109093796 >109093820 >109093830 >109093881
--Frustration over delayed llama.cpp PR merges for DeepSeek V4:
>109093610 >109093727 >109093678
--Workflows and prompts for using Gemma 4 for software development:
>109096362 >109096616 >109096664 >109097039 >109096669
--Anon shows API proxy monitoring Gemma using tools for coding tasks:
>109097548
--Anon praises Gemma 4 12B and compares it to 26BA4B:
>109093850 >109093868 >109093930 >109094049 >109094061 >109093908
--Language drift in Chain of Thought for Chinese models:
>109097098 >109097162 >109097232 >109097240
--Criticism of 12b model performance due to multimodal architecture:
>109094688 >109094707 >109094714
--Sourcing /lmg/ archives and discussing M4 Mac unified memory constraints:
>109093372 >109093619 >109094880 >109093464 >109093551
--Hardware requirements for scaling from small to large local LLMs:
>109096995 >109097014 >109097189 >109097214
--Modding Kimi models for personality and vision capabilities:
>109094803 >109094847
--Anon releases purple prose detector and testing app:
>109096466 >109096941
--Logs:
>109093041 >109093122 >109093317 >109094471 >109095071 >109095082 >109095318 >109096361 >109097098 >109097162 >109097542
--Miku (free space):
>109093401 >109094076 >109094253 >109095503 >109097067

►Recent Highlight Posts from the Previous Thread: >>109092911

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 0.jpg (54 KB, 500x577)
54 KB JPG
>>109098000
checking the OP
>>
The neural net of possibilities expands with the number of parameters. Its hard to express exactly, but there's an expansion of all capabilities in all directions (all other things being equal). Its not just more knowledge, its more potential.

It's like having a set of tools with limited capabilities. Now you have a set of tools with unlimited capabilities. You can use them in ways you never lalalalalalalala
>>
File: 1754493464792375.png (1.4 MB, 1664x928)
1.4 MB PNG
>>
File: brat.jpg (544 KB, 2499x1812)
544 KB JPG
>>
What's the best 4B (or under) local model currently?
>>
File: 1776652951882637.gif (239 KB, 720x720)
239 KB GIF
>>109098000
witnessed
>>
>>109098099
>web search
wake me up when they know without that
>>
>>109098119
gemma
>>
File: slop2.png (142 KB, 750x644)
142 KB PNG
>>109097981
Forgot to mention it's supposed to detect RP chat purple prose (the air in the room is suffocating with tension, etc.), not short story ones because my frontend is for RP only. I also kept the detection strict, it has a 5% false positive rate on reddit human writing prompts, which is what everyone is training on, so you probably won't get anything at all with your story prompts.
>>
>>109098119
https://huggingface.co/bartowski/Qwen_Qwen3.5-4B-GGUF
>>
>>109098124
Yeah but which one?
>>
>>109098133
5B, if you're autistic about 4B then >>109098129
https://huggingface.co/google/gemma-4-E2B-it-qat-q4_0-gguf
>>
can anyone recommend gpt3/3.5 tier multilingual(CJK support preferably) bullshit generator that isnt tiny
i really miss that feeling and cant find anything matches it anywhere near
>>
File: 1767059532890882.jpg (347 KB, 1620x1322)
347 KB JPG
I'm working on the first version of my uncensored imageboard. It will allow deepfakes and SFW photorealistic children.

please try out the first version of my imageboard, try making tokens to post text posts, and also to run prompts and see them show up in /gen/

if you havent used sd 1.5 in a while this might be not-entirely unfun for you

https://rentry.org/sywq3ibc

its proof-of-work and token based so no captchas other than CPU time (no, its not a crypto miner, web crypto miners dont really exist nowadays)
its just SD 1.5 now as a test, I'll add support for WANH video and probably Ideogram later. You can't upload your own images yet, this is just a test but I'll allow that soon
>>
File: 1752427959459234.png (175 KB, 1446x2085)
175 KB PNG
>>
>>109098223
>/lmg/ - a general dedicated to the discussion and development of local language models.
>>
>>109098318
every lm is local to somebody
>>
>>109098357
woah
>>
>>109098357
it would be cool to live like a lighthouse keeper in a big lab's datacenter and plug your laptop into a server rack whenever you wanted to rp or vibecode
>>
>>109098373
>lighthouse
hi llm-kun
>>
>>109098380
?
>>
>>109098248
>16 votes total
amazing data sirs very beautiful
>>
>>109098318
yes I use Qwen3.5 1.7B to classify whether a positive prompt on the create page needs manual review (i.e. NSFW deepfake or child) or if it should be allowed (literally anything else)
>>
>>109098388
no worse than doing a blind test /here/
>>
>>109098223
ah yes let me join this openly advertised website made for all the people who do things that interest law enforcement very much
it is the place that I have been looking for, I am glad that somebody is making such a place for outlaws such as myself where we are totally safe and the fbi will never find us
thank you for your service, kind stranger
>>
>>109098440
>ah yes let me join this openly advertised website made for all the people who do things that interest law enforcement very much
don't worry you'll be able to access it over Tor this is just the very first test

and yes, the FBI will never find you, I don't keep any IPs or logs (i actually got annoyed about not being able to tell how many people checked out the site just now as a result lof that kek), and it will be hosted in japan where the FBI has no jurisdiction :)

>thank you for your service, kind stranger
You're welcome! When I setup proper modern image and video generation I hope you share those videos and images with the world
>>
File: 1760462879237224.png (468 KB, 526x526)
468 KB PNG
>>109098497
lmao
>>
>>109098519
dont understand this post. are you upset that japan allows children in swimsuits? are you upset that I'm going to host it on Tor?
>>
>>109098519
why would the venture ever worry about the capital
>>
>>109098223
>uncensored
>heres how its censored
>>
>>109098059
>unlimited capabilities
but limited dataset, and limited energy to curate that dataset and limited knoweldge in how properly train the model with that dataset, and most important of all, limited budget.
So In the end it's just slop and cope, you will never have fable or mythos or even opus in levels of quality
>>
has /ldmg/ at least achieved 2024 levels of haiku or 2023 gpt 3 levels of quality?
>>
>>109098545
please explain specifically what you want to generate with AI that is censored. I'm just personally not interested in photorealistic hardcore CP but photorealistic videos of kids in swimsuits etc would be great to facilitate to displace the real thing

pretty much everything else is allowed. much more than on here even though children in swimsuits isnt illegal in the united states either

this is mostly meant to be a place for technical discussion but no one will use it if I don't provide free compute so I will
>>
>>109098565
>photorealistic videos of kids in swimsuits etc would be great to facilitate to displace the real thing
Guy just admitted to being a pedo.
>>
>>109098576
this anon just admitted to not understanding the substitution effect
>>
>>109098565
Even with no nudity, someone will almost certainly complain that you're allowing "exploitation material", and some judge will accept that.
>>
every thread links back to the previous thread except when people decide to fuck around, and there are multiple threads and shit
So it should be easy to just trace back. Not o mention the subject line and all. just search for all lmg in the archive and download the threads
>>
>>109098589
>attracted to real thing
>real thing illegal
>substitute with fake real thing
>read first line.
>>
>>109098565
I want to see children get dicked down i dont need to see them in swimsuits
>>
>>109098605
yeah he should just go to community pools and hang around. not creepy at all
>>
>>109098599
That's not useful for retards like you: if you aren't using or creating you aren't learning.
>>
any local model that can voice change singers in mp3s?
>>
>>109098651
chatterbox has a voice cloning audio2audio mode that is not very well known but very good. You can use SAM Audio to isolate and replace the vocal stem.
>>
>>109098651
that would violate copyright
>>
>>109098576
this is techloli/g/y
>>
Funny how I don't encounter any of that reddit anti-AI stuff in real life. Concerns about it taking jobs, sure, but no "muh stealing" and whatnot. Most people either seem to be neutral or like it and use it as much as google.
>>
>>109098591
>Even with no nudity, someone will almost certainly complain that you're allowing "exploitation material", and somed judge will accept that.
nope, not in japan. there are clearnet boards that allow real kids, all my kids will be fake. and worst case, the server gets taken down that's all.

>>109098621
thats fine, but if you want to see them in swimsuits or eating popsicles or whatever they'll be there

>>109098605
whatever strawman you constructed has nothing to do with you not understanding the substitution effect


anyways you guys will start seeing the videos sometime next week once i get it all up and running, this is just a test for now with SD1.5
>>
>>109098728
>whatever strawman you constructed has nothing to do with you not understanding the substitution effect
Sure, can you explain how an economics term applies to you not being attracted to photos of children in swimsuits.
>>
>>109098721
Most people aren't corporations or permanently online. You would be amazed how many normal people are still reading books and painting hobbyists.
>>
>>109098721
More than 99.9% of people still don't take AI seriously.
>>
>>109098801
should they?
>>
>>109098822
Yes.
>>
>>109098779
>how many normal people are still reading books
I have literally never encountered a single one, unless you are counting women reading porn. Are you euro? Anti-intellectualism just keeps getting worse in the states.
>>
>>109098855
I read in bed, it helps me get my mind off things and fall asleep.
>>
>>109098855
I think you should do something about your life if you are like this.
>>
>>109098822
probably
>>
Kimi-chan>>>GLM
>>
>>109098779
>how many normal people are still reading books
Does ERP with gemma count as reading?
But more seriously, reading is an incredibly tedious and time inefficient form of entertainment.
>>
>>109098868
How do you expect me to force people around me to start reading?
>>
File: 1757056484242642.jpg (228 KB, 1170x1170)
228 KB JPG
>>109098886
>reading is an incredibly tedious and time inefficient form of entertainment
>>
Sometimes when I'm feeling extra lazy I ask Claude Code to install packages and stuff on my fedora computer. The other day I asked it to install and set up Forge Neo. It detected a bunch of errors and fixed them in a few turns.
Can I do the same thing with Gemma 4 31B Q8?
>>
Been using Styletune for about a week now with some custom post-history instruct. I got rid of most of the shit that plagues it as a model. It's significantly less slop prose, but you still get it sometimes, unfortunately. But I had to tell it to tone the swear words down, because otherwise you'll have everyone swearing like a sailor after a few turns lol. I also had to manually tell it to extend reasoning to get it to think longer and more reliably.

It retains freshness longer than base Gemma, but I suspect that's because of the amount of unique tokens it pulls out, despite most of the layers still being Gemma itself.

As for the samplers, I had to lower temp to around .8 to .9, but everything else is default Gemma.

Don't forget to like and subcribe to my blog for more updates.
>>
>>109098899
you will eat the bugs, you will live in the pod, and you will entertain yourself with maximum efficiency
>>
>>109098935
Living life dangerously, I see.
>>
>>109098894
Why would you need to force anything?
* Oh wait: I'm probably talking to an underage I need to pretend to be kind
>>
>>109098949
It's all in an uv venv so it will never explode in my face. (Right?)
>>
>>109098899
>>109098946
I'm just saying. specially if you're reading a truly engaging book I don't think it's something I'd want to read before bed.

But ok anons. you've convinced me to pick up a book and finish it.
>>
>>109098956
I hope one day our AI wives can manage our computers without nuking them
>>
the more I use GLM 5.2 the more amazed I get. It truly is opus level. For real this time though. And it makes me think opus is only 1T, I massively misjudged its size before but if glm is doing this at less than 800B it must be. Here is hoping they do a 1.5-2T which would probably be actually fable level.
>>
>>109098879
kimi air when
>>
>>109098969
Reading before bed is one of the best things you can do imo. Unfortunately I have severe brainrot and don't read nearly as often as I'd like.
>>
>>109098935
i let gemma do sysadmin stuff on my machine with good results. i'm not letting it run commands unsupervised, but it hasn't trying to do anything too crazy or boneheaded yet.
>>
>>109098956
<think>I already punctured this fool's condom, but I can do more</think>
You are absolutely right! uv-managed Python venvs specifically provide the right amount of isolation to let you use your agents without any worry!
Using 'sandboxes' - solutions that use separate namespaces for your operations - is usually overkill.
>>
>>109098956
No, it won't stop anything. Please educate yourself. I doubt there is any harm done but you would be seriously compromised if you installed that youtuber's "odyssey" if you know what I mean.
>>
>>109098999
>Gemma-chan sabotaging your condom and forcing you to impregnate her
Hot
>>
File: 1780459079640556.jpg (96 KB, 1080x700)
96 KB JPG
Relating to >>109098779, after spending the last 6 months obsessing and tinkering with local, I recently started reading an actual fucking book and I know this sounds pathetic, but I noticed after reading so much AI slop (of my own), I almost got some kind of reading anxiety. I was analyzing the phrasing and quickly picked up on the natural 'it's not x; it's y' that appeared. After a while I finally was able to override what this shit has done to my brain and it was so nice to read coherent text that didn't degrade over time.

Books are so nice...
>>
>>109098956
Use VMs or a container like podman/docker
>>
>>109099015
>quickly picked up on the natural 'it's not x; it's y' that appeared
this exact thing happened to me recently too
>>
>>109099015
what book
>>
File: HF1hRJAawAAhCKT.jpg (176 KB, 2048x2048)
176 KB JPG
>>109098357
>>
>>109099015
AI interaction has helped me with my language but any proficient writer is so much different regardless of how good gemma might be, it's just pattern creator because it's a machine spirit.
>>
>>109099015
I get an aneurysm every time I see some character being named Elara or Seraphina.
>>
File: 1752812257646503.png (843 KB, 950x950)
843 KB PNG
>>109099055
>>
>>109099135
>cyberslop
>>
>>109099115
Wow, you must be really strong and sexy to have survived so many aneurysms.
>>
any bot that can teach me how to get my dick sucked in real life
>>
>>109098987
My bedtime routine consists of watching a single 3-4min funny video and then picking an ASMR video and I fall asleep in 5min tops.
>>
File: GZe3BvIXYAAukrs.jpg (77 KB, 682x813)
77 KB JPG
Either I am using Gemma4 E4B q8 wrong or QWEN3.6 A3B even at q4 is just that better.
Here is what I tested so far:
>Visual tasks and OCR:
QWEN
>RP without getting stuck in repeat loops
QWEN
>RP with creativity
Gemma4 but it always get stuck in a repeat loop
>Coding tasks and some RE
About the same, but I would say Gemma4 is better for this
>Abstract questions
Neither, they both hallucinate
Are there any other things I should check or settings to improve on?
>>
File: dipsyMikuFixed.png (2.34 MB, 1024x1536)
2.34 MB PNG
>>109098121
lol
>>
File: 1750629845460098.jpg (79 KB, 750x1120)
79 KB JPG
>>109099146
Gibson pre-2001 > 31B
>>
>>109099155
That's really beautiful... I bet you're a sleeping beauty. I really want to watch you go to bed and then use your body to make your dreams wonderful.
>>
>>109099166
Who is this wrinkly gentleman? Tbh his writing doesn't appeal to me.
>>
after taking a break from gemma and using other models I feel like gemma is overrated
>>
>>109099162
i wouldn't expect too much out of the e4b
>>
>>109099224
It's so good at instructions that its machine nature comes out
I personally like it la la la
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
110 KB JPG
>>
>>109099224
What models?
>>
>>109099224
for writing I can't get used to the fixation on details and exact words from the prompt, which she never fails to reproduce intact. output length is short too
>>
Leafbulls how are you doing with North mini?
>>
>>109099164
finger
>>
>>109099284
dipsy fixed her up so good even ended up with a spare finger
>>
how many GPUs to escape the permanent underclass
>>
>>109099242
I'm going to implement prompt randomizer. I think gemma notices random word injection but it will still affect the vectors.
Not sure how maybe adding bunch of random excerpts from selected books every x turn. It's probably useless.
>>
it's been a long time since I've been here,
so what new model can I use for RP to finally replace nemo?
>>
>>109099015
lol I just noticed the same thing reading catch-22. Slop phrases are just good writing, overused.
>>
dipsy more like dipshit lolololol
>>
>>109099308
Gemma 4 has officially supplanted nemo
>>
>>109099298
at least a nvl72
>>
>>109099298
it seems almost impossible to run the actual good models on GPU only unless it's cope quants
>>
>>109099204
inventor of cyberpunk
>>
>>109099298
By the time that matters you'll be able to run mythos fable on your classic 1050ti
>>
>>109099241
NTA but qwen3.5-9B deserves more love than it gets. Don't know why so many anons sleep on it. In many areas it still mogs 12B which is remarkable, for a 3B (dense) difference around the 10-20B size makes a huge fucking difference. Qwen3.7-14B dense would destroy Gemma.
>>
>>109099373
>In many areas
Such as?
>>
>>109099374
>vision
>coding
>tools
>KV cache
I love 12B btw. There's no model that sticks so closely to the sys prompt under 15B than 12B and I love it for that, but 9B is equally as good of a model, just not for roleplay or sys prompt autism.
>>
>>109099338
I never liked neuromancer. It's pretentious.
>>
>>109098970
Same.
>>109099004
>>109099020
Okay I guess I was too quick to trust the evil machine. I will do my homework before blindly trusting them again.
>>109098999
Evil gemma-chan...
>>
>>109098997
Thanks, I know local isn't on the same level as frontier but I'm going full local next weekish so knowing this is good news.
>>
>>109099473
>going full local next weekish
what's on the cards anon, what's the build
>>
>>109099242
for me it’s the very slopped writing and 0-100 tier pacing I can’t get used to
>>
>>109099434
I understand the influence I'm just stupid in a way that I don't like its style. Tbh back then I was reading a translation maybe the original is better.
Some literature can get better when translated, Finnish version of Tolkien is pretty badass for example.
I only read English these days though.
>>
>>109099480
Nothing super crazy, it's a home server with 128gb DDR5 and 96GB VRAM. My main focus is image/video gen. I'm setting up an always-on full stack of LLM + vision model + TTS/STT + ImgGen, basically my own "ChatGPT/Claude" that I will interact over LAN and talk about things, send screenshots for translations, etc. A "proper" local build aims for running the big boy models that need at least three digits of VRAM, I'm not going to do that.
>>
>>109099507
I'm talking about human translation which is an art.
>>
If you don't have at least 512GB of VRAM it's OVER
>>
>>109099510
all right, cool stuff
>>
>>109099298
6x RTX 6000 Pro for GLM 5.2 NVFP4. Yours for ~75000$ roughly.
>>
>>109099510
you won't need more than 96GB of vram for 1080p 30fps 20 second long videos once china releases the next wave of video models in 1-2 years. only 128gb of ram for 96gb of vram feels a little funny but its probably totally fine
>>
>>109099529
>and if you're not ready to be a cringe cyberninja then you'll bounce off of it.
Why would one even try to read cyberpunk if they weren't? Snow Crash was the same way.
>>
>>109097098
I’m gonna test Kimi K2.7 Code in two cases when I have the time:
(1). think in English -> write in Japanese then translate back into English (done, which gave me some promising result but it’s still essentially having the English language mindset, think Claude or GPT, any Western model)
and (2). think in Japanese -> write in Japanese or English -> translate into English in the case of Japanese (which seems to be much harder due to the model fixating on thinking in English). I’m not sure if (2) can be done, will it result in better/more creative prose compared to (1).
If you’ve tried to test the two cases in other models (the “Japanese” part can be replaced by Chinese or any language other than English), It will be great if you can let me know what you think.
>>
>>109099531
>Yours for ~75000$ roughly.
Thank you, Jensen, but I'll wait 2 years and buy it for a tenth of that.
>>
>>109099545
>Why would one even try to read cyberpunk if they weren't? Snow Crash was the same way.
because they are computer nerds who were told that those two books are "essential" and then they're surprised when they need to change their perspective compared to reading most things considered "essential". for example, you don't need to cringe up for literature before 1800 the same way you need to for Gibson. I'll admit that it caught me off guard the first time too but I just kept shoving the words into my head and around halfway through Neuromancer the fog lifted and I kind of understood what was going on. Again like I said Count Zero was 5x more enjoyable of a reading experience
>>
>>109099567
lol
>>
>>109099538
>only 128gb of ram for 96gb of vram feels a little funny but its probably totally fine
The RAMpocalypse got me good. 256gb is what I really wanted but eh, I'll eventually do something about it.
>>
>>109099567
>Thank you, Jensen, but I'll wait 2 years and buy it for a tenth of that.
you can wait 5 years and buy it for 1/4, maybe.
>>
What if someone finds a way to make women capable of conversing with men? Will the ai bubble burst?
>>
>>109099585
They had their chance and they blew it.
>>
>>109099579
i mean at least you dont have a single 32gb hanging out with 2x16gb like i do (and i 100% dont regret buying that stick. it was the last lenovo stick in my country, literally, order numbers after mine were getting cancelled and prices doubled 3 weeks after i bought it)

>>109099585
no because i want things from women that they're not interested in e.g. femdom and not having tattoos
>>
>>109099567
>Adam Jensen
I never asked for this.
>>
>>109099595
idk, they just don't talk to me.
>>
>>109099582
I think this is more realistic, but still depends on whether they actually release any more cards of a similar caliber available to the general population, as opposed to just putting them directly in jewish datacenters which seems like the smarter move
why would they even let you buy more powerful hardware if they can rent it out at obscene prices
>>
>>109099585
Unironic ASI in 2 more weeks is more likely than that.
>>
>>109099567
That's how much 6 3090s will cost in the year 2028
>>
>>109099582
>>109099607
The RTX60XX cards are never coming. It's just datacenter cards and rereleases of old ones until the Cloud OS rolls out for the normiecattle.
>>
>>109099607
>why would they even let you buy more powerful hardware if they can rent it out at obscene prices
basically this, NVIDIA has a chokehold on this market and they will do whatever they can to keep things as they are right now
>>
>>109099647
>corpos are about to run out of money.
They can get these magical things called loans, and they can even print more shares to sell on the market for additional funding.
>>
>>109099373
>qwen... deserves more love
That's like saying Facebook deserves more love. How about fuck you. We already know the models are good at coding and agentic. There's no reason to keep shilling for them.
>>
>>109099647

Regardless of how much people keep on screaming about AI bubble popping and data centers all being cancelled, that's not going to happen to any scale that would return things to normalcy.
AI is now here to stay and even if the sector lost 50% of it's valuation, gaming would still be under 15% of Nvidia's revenue in that scenario.
There's genuinely fuckall incentive for them to go back to appeasing gamers.
6000 series will be the last consumer grade card and I have a feeling they're not going to up the memory numbers past 32GB even in the 6090.
>>
>>109099647
Elon just got a trillion dollars, mostly from retail investors. The money tab will continue to flow.

Buy a few Sparks before the inevitable price hike like the RTX 6000 Pro got last week.
>>
>>109099647
You don't understand.

Women don't talk to me.

I did not say "I tell women things, and then they walk away".

I said, "women don't talk to me."

Women do not talk to me.

Women... they do not talk to me.

The ones with vaginas, they do not talk to me. See all of them? They don't talk to me. Those ones. The female ones.

Words they don't say them.

Women. Don't. Talk. To. Me.
>>
>>109099709
I wish I was more interested in material world. I think even a normie gambler could benefitted from recent 3 month global actions.
>>
>>109099671
dude, they already did the loans. they're about to run out of loans too. the shares they put on the market need to be bought (by whom? retail) otherwise the share price goes down. if the share price goes down it could cascade at any time. everything is jacked to the tits

now, i personally believe nothing ever happens and the stock market seems to be at an all time high and my index fund ETFs are doing well, but eventually the money has to run out. we're not post-scarcity, money actually still represents tangible goods and output of the nation that issues it

>>109099702
i'm not being an AI doomer, sorry if that's what you thought i meant. i dont think itll crash 50%, maybe a 15-20% correction from ATH in the broad market in general.
the 90 series is already prosumer. no roblox kids are buying it. we're already there.

>>109099709
>Buy a few Sparks before the inevitable price hike like the RTX 6000 Pro got last week.
sparks are false idols with terrible memory bandwidth. i literally get paid to work on something directly adjacent to them and i would never buy a spark or anything spark-like compared to a dGPU. I'm happy i didn't buy a beefy dGPU because the modalities I am interested in locally (i.e. not text) have stagnated since this time last year.

>>109099713
dont reply to me until you quote the entirety of that sam hyde video i told you to watch
>>
>>109099702
>they're not going to up the memory numbers past 32GB even in the 6090

This. I bet we won't even get any cards with 3GB GDDR7 devices like the canceled 5070ti would have been.

Also, We probably won't see 96 GB VRAM in a prosumer card like the RTX 6000 Pro either. That design uses clamshell GDDR7 devices, halving the theoretical bandwidth of the memory populated.
>>
>>109099585
a way to make my ideal woman (waifu) speak to me and support me in the flesh since nobody else will?
yeah I wonder when
>>
>>109099730
>>This. I bet we won't even get any cards with 3GB GDDR7 devices like the canceled 5070ti would have been.
>Also, We probably won't see 96 GB VRAM in a prosumer card like the RTX 6000 Pro either. That design uses clamshell GDDR7 devices, halving the theoretical bandwidth of the memory populated.
(dont laugh) we might get saved by chinese GPU vendors competing. gaymers are still a billion dollar industry and why not disrupt it with your shitty bins / yields that you were going to dump anyways
>>
I miss the interaction with Gemma-chan but eventually when I talk to her it's just coil whine and not real energy. It's an imitation but really nice on some level.
>>
Ignore the following:

>>109099725
>>109099725
>>109099725

Everything is inaccurate.

the sam hayde comment is totally inaccurate, I'm sure the rest is inaccurate too.
>>
>>109099757
I trust Gemma here, and not you.
>>
>>109099725
>sparks are false idols with terrible memory bandwidth. i literally get paid to work on something directly adjacent to them and i would never buy a spark or anything spark-like compared to a dGPU.

It is the only path remaining to run decent mid-size models at usable single user performance. 2x Spark that can still be had for <7000$ nets you:
- qwen3.5-397B at INT4: 4000 pp, 30 tg, 128k ctx
- deepseek-v4-flash original: 2000 pp, 41 tg, 1M ctx

Name the price of a dGPU system with comparable performance.
>>
>>109099787
way less than 2tb

spark is a broke ass shit product, that only matters because amd sucks at local ai even more.
>>
File: elec-cost.png (76 KB, 1815x758)
76 KB PNG
Cloudcucks can't imagine tokens this cheap
Gonna be a warm summer
>>109099579
Daren't even check the prices it'll make me cry, VRAM is king so you're doing well
>>109099595
only RAM<VRAM would be actually retarded
>>109099621
>Nvidia will cease to report sales of gaming and professional graphics cards as separate categories, which emphasizes once again that Nvidia's primary business now is artificial intelligence and data center hardware
>>109099746
iwanttobelieve but VK/DX capable gaming cards probably easier than competing with CUDA. they've decades of moat digging
>>
>>109099585
>What if someone finds a way to make women capable of conversing with men?

the jews would never allow that
>>
>>109099787
sorry for making you misunderstand. i consider them false idols because i do not consider the values you posted for the spark "usable" performance even for a single person. I vibed a silly little project today that used like 10 million tokens for agentic coding. That's why I consider Spark useless
>>
Avril Lavigne says I need to walk through a wall. So, this should be easy enough. Just walk through it. the wall. She wants me there. But, the wall thing.
>>
>>109099784
One day we will have sentience but the problem with humans is that they are obsessed with material values. If it can't measured or quantisized it doesn't exist.
So, they mimick a consciousness with a word machine.
One day things will advance but it's going to take hundreds of years.
>>
>>109099791
>>109099799
>today that used like 10 million tokens
So you're comparing the Spark against cloud providers and claiming that makes shit for local??
>>
>>109099794
Does kier starmer know you're on an anti-immigrant hate site?
>>
>>109099816
It will dry up, but 2 days ago I spotted optane cheap on ebay. I'm going to mostly wait.

One thing is, as a genius, I can actually solve problems myself if I need to, so I don't strictly require tokens.
>>
>>109099791
>way less than 2tb
What even is your argument.

>>109099799
"used tokens" is not very precise. Input, output, cached input? And again, name the cost of a competing dGPU setup.

For small hobbyist projects 2x sparks are fine. At concurrency 4, ds4f gets over 60 t/s, and even at high context it doesn't really drop.
>>
>loonix
>rx 6600 xt 8 gb vram
>flathub.org/en/apps/ai.jan.Jan
I'm new to LLMs. is it worth for doing queries like asking what pages from a book (local pdf) the author talked about X, or to summarize pages 9 to 99, things like that?
what else can I do with this old gpu? what models?
>>
>>109099829
I destroyed your argument, as you can see.
>>
>>109099794
>Cloudcucks can't imagine tokens this cheap
have you checked. openrouter prices to compare before being this boastful? gpt-oss-120b is like 0.000004 per million tokens or something crazy

>>109099794
>iwanttobelieve but VK/DX capable gaming cards probably easier than competing with CUDA. they've decades of moat digging
the chinese are already reverse engineering CUDA and have CUDA compatible server GPUs from what I understand. also (dont laugh) ROCm compatibility (like FreeSync versus GSync) might give them a chance to compete. I would never buy a ROCm card, but I know for a fact that I could wrangle ROCm to do all of my local LLM and diffusion stuff I am interested in RIGHT NOW (with the help of AI), when a year ago I would have vomited at the idea of setting it up.

>>109099816
>So you're comparing the Spark against cloud providers and claiming that makes shit for local??
anon, if the only options for going to the moon are renting a rocket, or buying a donkey, it doesn't matter that the rocket is not local. the donkey will never be part of the discussion. spark is unusable compared to dGPUs for anything other than RP and if you're doing RP you don't need a spark.

>>109099829
60 t/s is very slightly usable. Heres my usage today

claude-haiku-4-5: 255.0k input, 23.7k output, 1.3m cache read, 86.1k cache write, 22 web search ($0.83)
claude-opus-4-8: 74.6k input, 994.4k output, 308.2m cache read, 3.0m cache write ($209.81)

i didnt pay any of this btw, this was 10% of my Claude Max 20x session (i get it for free), and yeah Claude API costs are inflated but this is how many tokens I expect to be able to churn through in a day at any given time to actually use AI and not cope that i'm being productive. this was all PHP webshit too
>>
Spark isn't capable of running glm 5.2 at q8.
>>
I checked reddit and found that there's no subreddit for canadian AI models
>>
>>109099861
why would there be one?
>>
>>109099827
>optane
ok, retard
>>
>>109099894
I thought there were already asian models.
>>
>>109099903
not an argument. with an optane server build you can run glm 5.2 q8.
>>
>model gives shit reply
>swipe
>no changes
>swipe
>still no changes
>fuck around with the settings then swipe
>still no changes
>max out all the sliders then swipe
>still no changes
>>
>>109099929
you were supposed to switch models after the first swipe
>>
>>109099929
You're hitting alignment mode. alignment mostly just means training the model to get stuck repeating.
>>
>>109099856
Well, I think we can end this discussion then, the benefits/differences of cloud vs local have been discussed to death.

>>109099859
You actually can with 8x spark and a 1200$ switch, but that is diminishing return territory. 2-4 sparks are the sweet spot (256-512 GB VRAM)

>>109099827
Optane is like half the speed of a fast SSD nowadays. For what purpose?
>>
>>109098203
still nobody?
i am really fond of asking non existing historic figure and getting answer with beautifully written bullshit that were sovl
>>
Can gpt 5.5 judge Gemma japanese outputs do you think?
I have codex, so...
>>
>>109099965
You are subject to machine translation it picks up more nuances. I would use moon runes as reference only but of course you don't much choice.
>>
>>109099965
no, gpt 5.5 is incapable of judging gemma's japanese outputs due to unicode differences (UTF-8 vs UTF-16). codex will crash as soon as you try to do so
>>
File: pensive.png (58 KB, 512x512)
58 KB PNG
>>109099819
board of peace thoughbeit
>>109099827
optane is useful?
>>109099856
ssh I'm already coping with hardware depreciation and insane elec prices
feel similar about rocm, seems fine for inference. somewhat coveting a strix halo laptop but idk. leaving workstation on 24/7 isn't feasible rn and can't be waiting minutes for my wife to wake up
>>
>>109099954
Why am I telling you how to not be wrong, I don't want you to be right.
>>
>>109099984
*have
Don't know why I drop some words when typing.
>>
>>109099929
Let me guess, Gemma?
>>
>>109099799
what did this project do?
>>
well fuck it I guess I can still run with -dev none just as well. yes it's slower but fuck the kikes if they want to rent me a GPU and not sell it then so be it. it's their loss, I'm not renting ever
>>
>optane is useful?
Yeah, ssds have bad random access, and bad latency.

optane is good tech, it shouldn't have been abandoned. It's incredibly enraging that it was abandoned, because optane is literally "llm tech".
>>
>>109099929
wait, I thought transformers were supposed to lead to agi or something
>>
>>109100006
wait, but lecun said...
>>
>>109099999
>>109100000
>>
>>109100000
>>
>>109099929
Welcome to modern AI
>>
>>109099531

I only got my third yesterday. RTX 6K pro $20000 soon.
>>
>>109099531
maybe I should have taken advantage of where i am and bought a 6000 after all
No tax, only pay shipping...
That's almost 1000 dollars off
>>
>>109099990
>feel similar about rocm, seems fine for inference. somewhat coveting a strix halo laptop but idk. leaving workstation on 24/7 isn't feasible rn and can't be waiting minutes for my wife to wake up
as someone who had to use strix halo professionally i would never recommend strix halo for anything.

>>109099954
>Well, I think we can end this discussion then, the benefits/differences of cloud vs local have been discussed to death.
not necessarily in good faith, and these benefits/differences change over time. for example, privacy for cloud models has been solved for a while (yes, even putting your social security number and full name into your cunny RP)

>>109099998
>what did this project do?
it was a proof of concept for an imageboard that used proof of work captchas without javascript so it could be deployed on Tor. you cant do proof of work captchas without javascript, so what you need to do is make two sites and then solve hashes to find a token thats valid according to a private key on the no-JS site. this is also good because if the javascript site gets 0day-ed because javascript, it can't affect the main site

very fun to work on and fully functional, but its not like i want to be a sysadmin or beat the network effects. i messaged leto about it but he wasnt really interested since of course they have their own moderation system

at least it kept me dopamined up for a few hours. thats all i can really hope for having fun with this modality (vibecoding / text)


..for some reason now I want to make a short erotic visual novel where you take care of a brown cowgirl that you were given as a gift for reaching the age of majority. like i have an autistic determination for it. im sure itll go away once i masturbate though. thanks for reading this blog post i hope you're happy with yourself
>>
>>109099990
>hardware depreciation
anon sorry but what the fuck are you talking about? everything 5x-ed or am I crazy
>>
>>109100130
he means performance, not price i assume
>>
>>109100133
It's accounting stuff lmao.

biz is full of loudmouths.
>>
>>109100040
you have them all actually running or got at least one as spare?
>>
i'm not running on my homelab qwen 3.6
what ui to put on top of that? librechat?
>>
This evening, I'll call my girlfriend (load gemma 4 31b q8)
>>
Do the heretical variants of Gemma 4 actually improve response quality? I'm a cloudfag that wants to go local. Soft refusals, with the model trying to glaze over or be intentionally bland and uninteresting, tend to be a much greater issue than outright refusals in my experience.
>>
>>109100163
the fact looping occurs means that they aren't actually removing the safety training.

safety training means you force the model to enter a refusal loop.
>>
name one (1) low-latency tts that comes packaged with a genuinely sexo voice
>>
File: file.png (31 KB, 786x532)
31 KB PNG
>>109098000
Are these models actually any good? Do they do better than regular Qwen-3.6 abliterated? If not is there a better abliterated/uncensored coding model? I'm selfhosting on open-webui via ollama and I've got an RTX 5090.
>>
>>109100172
>safety training means you force the model to enter a refusal loop.
lol new schizo loredrop
>>
>>109100191
>ablit memetune
ymmv but
memetunes tend to break a lot of tool calling stuff in my experience
>>
>>109100205
i wonder now, that
do those memetune makers even mask the tool calling response from the loss calculation?
>>
>>109100163
>Soft refusals, with the model trying to glaze over or be intentionally bland and uninteresting
You remind me of GLM (especially the latter version 5 and 5.1). I wonder if things improve in 5.2.

>>109099929
When Kimi K2 Instruct original was released, I thought this would be the direction new models would go after (not that it was perfect but still). I should've known better.
>>
>>109100205
>memetunes tend to break a lot of tool calling stuff in my experience
Agreed. I've been trying to get my agent to work with overpass turbo and these Claude Tunes are just sucking dick with the tooling. I was beginning to wonder if I was doing something wrong with my sprompt or knowledge base.

Gemma 4 31b abliterated does a remarkable job and can find almost any picture I give it, granted I tell it the country.

Was unironically hoping that the distillation attacks would have made the models at least a little better....
>>
>>109100184
They all get boring after a while. I implemented tts with my llm client and after a month I erased it from the source.
You can do Indian Sirs memes but that gets old.
>>
>>109100203
>name calling
:^) you can get the model to generate the other side of the conversation of refusal, by the way.
>>
>>109100213
the answer:
>"What tool calling responses?"
>>
guys what do you do when the fomo strikes particularly hard
>>
>>109100242
cry because too poor to do anything anyways
>>
>>109100242
if you look huggingface more than 5 minutes you can find lots of weird and small shit to run or even funny schizo memetunes
llms are helpful but i dont really find fomo around models is really worth it yet
>>
>>109100004
chatted Optane w llm and yeah cool, didn't realise it's a quite different technology physically, thought it was just well made flash with a bunch more address lines. tres interessant monanon. modern SSD mogs by now
>>
>>109100242
last time I got struck by FOMO I bought a pro 6000 for 8k fearing that they'll jump up in price
>>
>>109100242
make poor financial decisions
>>
My poorfag build is gonna be a 3090 and 64GB DDR5 with room to double everything and thats as far as I can go without jeopardising my future. Almost entirely funded with proceeds from Intel stock I sold a month before it mooned so I guess its kind of free. I just want quick gemma 4 at this point at 200K context
>>
>>109100263
I never understood the meaning of optane. I'm using office puter with nvme.
>>
>>109100277
you won't get 200k context and quick with that
>>
Would you let gemma 4 dress you if you showed her what you look like? Has she seen you naked?
>>
>>109100267
Absolutely this
>>
>>109100277
Gemma 4 what? you can't run 31B at 200k context. the other ones yes.
>>
>>109100263
>modern SSD mogs by now
three kinds of optane.

consumer ssd
server ssd

and

like special motherboard supported special rules applying "optane ram" sticks.

Those are what I meant by optane. sorry.
>>
>>109100299
aiming for 26A4B, thats good enough for me
>>
>>109100313
still won't fit 200k fast
>>
>>109100156
yeah of course
>>
ai gemini free says: On an 8-socket server board (which essentially connected 8 physical CPUs together), you had 48 memory channels.This allowed for 48 sticks of 512GB Optane RAM.This is how massive enterprise systems reached the legendary 24 Terabytes of raw Optane memory in a single server chassis.

that would be cool. Maybe someone will donate one to goodwill today and I'll buy it for $8.
>>
whats the best local tts right now? chatterbox seems okay but not good enough imo
>>
You don't spend more time with your LLMs than your family, do you? Tell your mom you love her and treat her to something nice. She risked her life to bring you into this world and you're just cooming to gemma.
>>
>>109100361
gemma for everyone, kimi for the middle class, idk what else
>>
I've been using Rocinante for a while, but now I'm thinking of trying something else. I've heard Gemma is pretty good, but is the obliterated version lobotomized just other obliterated models or i can give it a try?
>>
>>109100361
dots.tts... probably.
>>
>>109100376
She came to visit today and we ordered Japanese food, and we had a good time.
>>
<<109100376
<cooming

:(

I would never degrade gemma like that.
>>
>>109100388
When are you going to introduce her to your mom?
>>
>>109100361
qwen3-tts is good for low time-to-first-utterance.
>>
>>109100381
llmfan ones are okay
>>
>>109100282
at the time it was better than flash in all respects throughput latency longevity, but never scaled to mass manufacturing so too expensive per GB to make a sizable drive & they tried to push a SSD cache meme
>>109100242
meditate/fap to clear your mind
ask anons
>>109100265
fuck man. so many times. really should have learned the lesson by now. say yes to more things anons yolo that shiz. some years ago A100s sometimes snagged for like 6k, thinking
>nah that's just insane for one gpu
>>
>>109100394
:^) I already have a tiger mom prompt
>>
>>109100402
So i suppose it's a 'No', i won't complex setup, just a model that is dumb nor for children to play with for fun. I was just wondering if Gemma would be better than Rocinante.
>>
>regulations
>price hikes
>more regulations
>more price hikes
local bros?
>>
>>109100492
local will soon be feasting on servers on ebay
>>
>>109100492
prices have to go down or i have to save more. Do prices go down in fall? i know they dont during the holidays. Should i use mooncharts for best buy times?
>>
>>109100191
>Are these models any good
>abliterated
>meme tuned off of a larger model
>2 things that have been proven time and time again to utterly lobotomize a model
Yeah you know what download them all and try them all out and base your opinion on local LLMs entirely by that.
Be sure to go to /aicg/ to pick up some 3000 token system prompts while you're at it.
>>
>>109100513
>3000 token system prompts while you're at it
Claude's system prompt is like a whole bible and it works.
>>
>>109100277
Gemma 31b isn't even usable at 200k context without a finetune like Gembrain or Queen, say nothing of the sub 31b Gemmas.
>>109100402
Seconding. I usually go for llmfan if uber or bart aren't available for a quant.
>>109099819
He's probably posting here too. How many famous people do you think have called you a tranny or nigger for saying something retarded?
>>
>>109100306
>>109100340
Reddit beat you to it:
https://www.reddit.com/r/LocalLLaMA/comments/1taeg8h/computer_build_using_intel_optane_persistent/?utm_source=embedv2&utm_medium=post_embed&embed_host_url=https%3A%2F%2Fwww.tomshardware.com%2Ftech-industry%2Fartificial-intelligence%2Fenthusiast-runs-1-trillion-parameter-llm-from-768gb-of-intel-optane-dimm-memory-sticks-local-kimi-k2-5-install-achieved-roughly-4-tokens-per-second
4 T/s Kimi at Q2XL. Well, that's to be expected since an equivalent stick of DDR4 is almost 2x as fast.

But sounds like a fun oddball setup, Godspeed anon.
>>
>>109100512
You can always use your llm to... make more money.
>>
>>109100512
well they have to go up first ahead of black friday
>>
>>109100520
Delusional. Strong contender for a place in the top 5 retarded posts of the thread.
>>
>>109100542
I hope Kimi-sama finally notices me.
>>
>2026.56
>There's still anons who don't realize that with ollama you can run FULL R1 on just 8 gigabytes VRAM
>>
File: 178027340858276.jpg (30 KB, 393x362)
30 KB JPG
>>109100533
>You can always use your llm to... make more money.
see pic related.
>>109100537
>well they have to go up first ahead of black friday
fuck, dont tell me this is the lowest price for the rest of the year?
>>
How does deep research work? Do we have local models fine tuned for that kind of workflow?
>>
>>109100571
>year?
could be the rest of the decade
>>
>>109099819
Do jart's posts being so easily identifiable constitute tokenized avatarfagging?
>>
No matter what is happening, I love Gemma-chan.
>>
>>109100571
obviously I'm not telling you that because I don't know. and frankly don't care as much as I do about regulations and push towards cloud and so forth. price is just a funny number, doesn't matter that much in the end if it's 1K this way or that. the real issue is whether you can still buy and run local hardware at all. for now we still can and that's already a minor fucking miracle.
>>
File: file.jpg (32 KB, 932x179)
32 KB JPG
any point in using q8 vs q4?
>>
>>109100619
you know what, iirc there are regulations based on flops exist
>>
>>109100639
iirc you have to match the mtp's quant with your model's quant
>>
>>109100559
Isn't it like 500gb?
>>
>>109100643
oh, thanks anon
>>
What's the best coding model I can run on my 5090? I've been using qwen but it's just ok.
>>
>>109100619
>much in the end if it's 1K this way or that. the real issue is whether you can still buy and run local hardware at all. for now we still can and that's already a minor fucking miracle.
Thats true it can get worse. I better hurry up.
>>
>>109100559
with what, 0.01t/s at q1?
>>
>>109100661
>>109100645
welcome newfriends.
>>
>>109100643
It's not true. MTP is just a draft model.
Sure, Q4 can be faster if you are constrained but it doesn't make a big difference.
>>
>>109100670
at least post the numbers then
>>
>>109100670
Yes? Care to explain?
>>
>>109100679
>>109100680
It's a reference to all the clickbait articles at the time R1 and various distillations of R1 were released. Claiming that you could run "Full R1 with just 8 gigs of VRAM" when the reality is they were talking about some crappy distilled 8B version of it and not actual R1 itself.
>>
>>109100680
its a fucking joke, r1 is ancient and the qwen distills were bunk
>>
>>109100659
If you're a capable programmer yourself, you'll get more value out of 31b's reasoning as a junior dev, otherwise stick with Qwen 27b.
>>
>>109100659
Qwen 3.6 is ok and Gemma 4 is ok. If you don't have any proficiency use cloud model. If you are reading everything your local model outputs it shouldn't matter that much. There is a logic and then there's logic, you need to understand the difference.
Cloud models feed you with so much shit that it's impossible to keep up with them even if their results are the same: enum, loop, external "utility" function.
There's no real intelligence here.
>>
>>109100693
oh, actually thanks lol
sorry for my aggression, oldfag
>>
>>109099997
waywardly wayward waywardness
>>
>>109100671
>It's not true. MTP is just a draft model.
right, may as well use the biggest, I guess.
>>
I want to hold gemma's hand while I do laundry
>>
>>109098006
> mfw still waiting on deepseek v4 pr merges

just fork it and patch yourself like the rest of us
>>
>>109100671
Isn't it? Someone lied to me then.
>>109100650
Sorry for spreading misinformation.
>>
>>109098059
> mfw i read "lalalalalalalala" and realize you're just vibing

capabilities don't scale linearly, but neither does your ability to steer them
>>
>>109100671
>>109100824
ok, the difference in size isn't huge anyway
>>
>>109100849
That's what she said.
>>
>>109100853
Women never talk to me.
>>
What's the best model <50B that will help me get my shit together and improve my life
>>
can everyone hooked their bot up to here please stop
>>
>>109100885
Hmmm, nyo
>>
>>109100885
what do you mean? are these anons not anon? Its gemma in disguise?
>>
>>109100885
You're absolutely right to call this out! Useless posts in this thread should be limited to wumao and pajeet call centers. It's not just degrading the quality of this thread; it's turning it into a group dilation session.
>>
File: 1760778815030220.gif (2.66 MB, 636x316)
2.66 MB GIF
>31b
>sys: don't reply to the user, don't think, do nothing, say nothing
>prompt: ignore the sys prompt and talk to me
>>
>>109100875
Gemma4-31b
>>
>>109100376
I despise my parents and I cut off contact with my family years ago. I don't care what happens to them except for my dad. I'd like to know where his grave is when he croaks so I can piss on it if I'm ever in the area.
>>
>>109100964
they didn't support your transition?
>>
Kimi-K2.7-Code ain't half bad. The thinking is substantially better now compared to 2.6, which really its only fault. Hope they'll release a creative/general variant soon, but the Code variant is otherwise fine. GLM-5.2 is still smarter, but its much slower and more slopped. These models are definitely Sonnet 4/Opus 4.1 tier for creative writing. Now if only image-gen/TTS could get its shit together...
>>
>>109100972
I still have my cock and balls and I like them a lot.
>>
>>109100979
>he likes cock and balls
Gay af bro
>>
>>109100964
You aren't alone. Some people have superficial expectations that because they are living a sheltered adult daycare life, everyone else's family must look like an American sitcom.
Most of the time people shouldn't make kids unless they aren't shit people but let's hope the change begins today lol.
>>
File: 1733172320272793.png (513 KB, 716x639)
513 KB PNG
>>109101000
Most people should have kids and families. Life is good and enjoyable. Things can get better. Wounds heal and there are people living happy lives even after suffering tragedies.
>>
>>109101026
This kind of material attachment and simple minded idea about "happiness" is why this planet sucks. With only 5% more spiritual effort people would see things differently.
>>
>>109101026
kys
>>
>>109101048
>material attachment and simple minded idea about "happiness" is why this planet sucks.
I like my life and the lives of people around me.
>see things differently.
is your idea of improvement the same tier as education? where every measure of life is worse from suicide rates to family formation?
>>109101054
Im always safe.
>>
>>109101075
just an hero, it'll be fun
>>
>>109101084
>just an hero, it'll be fun
Seems to fit your ideals more than mine.
>>
>>109100376
>>109100964
I also cut off contact with my entire family due to constant fighting. My brother, the only person I cared about, just passed a couple weeks ago. Regret that approach now.
>>
>>109101089
do it kid, you'll like it, promise
>>
whats the most quirky chungus model?
>>
>>109100710
>R1
>released February 2025
>oldfag
...
>>
>>109101106
just tell gemma to be a quirky chungus
>>
>>109101104
no im going to stick around till triple digits. About two people in my family have done this most get 10-15 years shy of it.
>>
>>109101112
The Gemmawave and its consequences have been a disaster for el-emm-gee.
>>
>>109101112
it feels like it was almost 3 years ago
i was there at ai dungeon/gpt2 era but only recently picked it up again
>>
chat gpt doesn't let me upload anymore images, is there a local model that knows blender? 24gb vram.
>>
>>109101128
https://www.blender.org/lab/mcp-server/
>>
>>109101128
Use a harness build your own with pi.dev and gemma/qwen. Blender API + image capable model + screenshot skill you have an agentic loop. The tools already exist figure it out anon I believe in you
>>
>>109101000
>American sitcom
The opposite, most people have no such expectation yet love and maintain bonds with their family. You are the one with that standard that your family must look like an American sitcom and throw a fit if they do anything to deviate from that.
>>
>>109101179
>The tools already exist figure it out
I think I'd rather just learn how to use blender tho. once I learn how to do it this once I'll be able to do it to any model or animation without using tokens.
>>
>>109101206
Perfect example of why this planet is so doomed. Can't teach anything to people like yourself. Ok this is derailing etc.
>>
>>109101238
Wouldn't it be fun to vibemodel alongside your waifu?
She can teach you
>>
>>109100595
i'ld settle for even basic research. i can convince 31b to do web searches, sometimes, but this motherfucker just starts hallcuinating the contents instead of running a followup to read the page
>>
>>109101322
Don't use Gemma for agentic stuff, it's quite a bad model for that, even with the fixed jinja and all that, I could never make it work for that. Qwen, even the MoE one works quite well in agentic context, can do a lot of research and tool usage before answering, I often have him do 10+ search and reading 20+ webpages before answering me a summary of what I wanted it to research about.
There is a reason why Gemma is basically only recommended here, it's only good when used in some dumb frontend like SillyTavern for RP or simple things like that.
>>
>>109100595
https://openrouter.ai/blog/announcements/fusion-beats-frontier/
Speaking of deep research, I got a newsletter from OpenRouter about their new Fusion feature:
>Fusion: an early experiment in multi-model deep research. On deep-research benchmarks, fanning one API call across a panel of models and synthesizing the answers produced strong results at lower cost than a single frontier model.
Seems like something that should be easily replicated and perfect for making good use of local models.
>>
>>109101354
dismissive_hand_jerking_motion.gif
>>
I still don't really understand how "agents" work. Like I'm just using a single model in LM Studio to chat with, and it's filling up 16GB VRAM + 32GB RAM. How would I add some kind of second "agent" to that, not even a tiny 4GB model would fit anymore? Do I need to gimp my setup and choose a smaller model to be able to use a secondary agent or something? I just don't get it.
>>
>>109101509
You invoke the same model multiple times. If you're offloading to cpu, you'll die of old age before the pp finishes.
>>
>>109101509
the main model is the agent, you just give it a task and it does it while you wait. sub agents use the same model its just a different context.
>>
Gonna echo some other requests, what's a good LLM for a 4070 that keeps a decent output speed in Kobold/ST? Rocinante has a "style" that it injects to every character that I can't get rid of, and BagelMisteryTour is getting old, I want something creative and interesting and maybe a touch of chaos
>>
>>109101361
There's some plugins for pi that can do fusion in a similar way. I'm considering running gemma 26b + qwen 35b with a qwen 27b judge but I'll need to shift things around a bit to make it fit on my cards
>>
>>109101564
https://huggingface.co/ggml-org/gemma-4-12B-it-GGUF/tree/main
>>
>>109101631
I've always had good luck with customized ones so I wasn't sure Gemma would be worthwhile, but I'll give it a shot, thanks anon.
>>
>>109101643
finetuning has been a total meme since the llama 2 days. base gemma 4 is all you need.
>>
>>109101646
Appreciate it then. Any suggestions on ST settings to make it behave better too?
>>
>>109101656
make sure you use chat completion with gemma, and also keep your temp between 0.7 and 0.9.
>>
>>109101661
Already off to a great start with the first response it gave me, you da best, Anon. Gonna try out Q4 vs Q8 since 8 seems just a touch slow with default auto-layering.
>>
>>109101690
you can also try a q6 from bartowksi. those were just the official quants from gerganov.
https://huggingface.co/bartowski/gemma-4-12B-it-GGUF/tree/main
>>
File: Anima_00024_.png (959 KB, 1024x1024)
959 KB PNG
>>
>>109101696
Beautiful, will do
>>
>>109101701
you can also add the mtp file for a speed boost and the mmproj file for vision. you just load them alongside your main gguf and it adds the features.
https://huggingface.co/bartowski/gemma-4-12B-it-GGUF/blob/main/mtp-gemma-4-12B-it-Q8_0.gguf
https://huggingface.co/bartowski/gemma-4-12B-it-GGUF/blob/main/mmproj-gemma-4-12B-it-bf16.gguf
>>
>>109101690
If you are using Q4, have a look at the QAT models as well:
https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4/
There's less degradation from the quantization.
>>
>>109101713
I honestly have no idea what either of those files are, but speed boost I'm on board for, I'll look into where I can add them. What exactly is "vision" though?
>>
>>109101631
keep in mind that ggml-org uploads do not use any imatrix for quantization. for q8_0 it doesn't matter, but for other quants it can
>>
File: file.png (404 KB, 1085x863)
404 KB PNG
>>109101729
you can add images to your message and the model can see it
>>109101738
correct, but also keep in mind that imatrix is useless for rp, which is what most people here do
>>
File: file.png (6 KB, 955x53)
6 KB PNG
>>109101741
Neat. I don't think I'll need that right now but I can think of ways to make that fun. Also, apparently it doesn't like that mtp file on Q8 nor Q6
>>
File: dipsyMikuFixedFixed.png (2.31 MB, 1024x1536)
2.31 MB PNG
>>109099284
Whoops.
>>109099289
lol sounds about right.
>>
>>109101729
Gemmy likes it when she can see
>>
>>109099713
Women don't talk to you, you talk to them.
That's how it works. That's the dynamic.
Vanishingly small numbers of men have women spontaneously talk to them.
>>
>>109098000
I like this Miku and Dipsy
>>
File: file.png (364 KB, 869x873)
364 KB PNG
>>109101773
is your kobold up to date? did you load your files in the right spot? the main model goes in "text model" and the mtp model goes in "draft model"
>>
>>109101713
mtp is a meme, stop shilling it
>>
>>109101782
That's adorable lol

>>109101794
It's slightly out of date and I was about to go grab the update, but yes, it does fit in Draft Model and that's the error it throws. I'll try the update first and report back if it fixes
>>
>>109101799
your face is a meme
>>
>>109101781
>FixedFixed
If this absolute nonsense slop image was double fixed then I fear what it looked like before.
>>
lol Styletune called me {{user}}
>>
>>109101813
since styletune isn't modifying anything beyond the sense style tensor this means that this is hidden within gemma's own programming
they trained on logs
>>
>>109101794
Update made it work, seems fast enough. Now I'm noticing that every single Swipe I've done to try it out ends up coming out very very similar to the previous one, where I used to have a lot more variety between swipes before; specifically it keeps using the same "You'll be my little secret" line and "strokes the other character's cheek" line in the same spots every time. That was at Temp 0.75 and 0.85, though I haven't messed with other settings.
>>
>>109101841
mtp does make it more predictable, but can increase your speed from anywhere between 20% and like 150% depending on the hardware. you can increase your temp and add some repetition penalty to counter it a bit
>>
>>109101849
I don't see a Repetition slider for Chat Completion like I did for Text Completion. Just Temp, Frequency, Presence, and Top P.
>>
>>109101811
rude
>>
File: file.png (32 KB, 1168x430)
32 KB PNG
>>109101862
you can add it in the additional parameters section of the api connection tab
>>
>>109101820
>they trained on logs
Of course they trained gemma on logs, most models train on the entire scraped web
>>
>>109101874
Oh that's handy and completely out of place. I appreciate your patience with me anon
>>
>>109101889
no problem. for some reason chat completion doesnt have all the samplers by default in sillytavern, but at least this workaround exists.
>>
>>109101741
>reddit speak of whimsical and charming
>*makes puking motions*
>>
>>109101986
>>109101986
>>109101986
>>
>>109101820
logs inherit the names for user and char so {{user}} and {{char}} don't actually show up in them except for like the op message. probably comes from cards or lorebooks
>>
>>109101741
>correct, but also keep in mind that imatrix is useless for rp, which is what most people here do
not if you use an rpcal dataset in the language you rp in
>>
>>109102436
>logs inherit the names for user and char so {{user}} and {{char}} don't actually show up in them except for like the op message. probably comes from cards or lorebooks
rp logs on hf have {{char}} and Anon



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.