/g/ - Technology



File: k2.jpg (122 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109113030 & >>109108346

►News
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: chibiteto.jpg (52 KB, 720x700)
►Recent Highlights from the Previous Thread: >>109113030

--Paper: Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding:
>109118612 >109118641
--Comparing Kimi K2.7 and GLM 5.2 quantization and context performance:
>109115293 >109115711 >109115784 >109115978 >109115875 >109115993
--Troubleshooting llama.cpp cache invalidation and prompt re-processing with subagents:
>109116037 >109116050 >109116065 >109116101 >109116123 >109116139 >109116234 >109116253 >109116290
--Anon shares a performance patch to increase generation tokens per second:
>109114443 >109114470 >109114753 >109116127 >109116981 >109117816
--Debating "random" visual prompts as benchmarks for model vision capabilities:
>109113911 >109113938 >109113975 >109114008 >109114058 >109114021 >109114121 >109114291 >109114095 >109114127 >109114228 >109114337 >109114415 >109114466 >109114502 >109114447 >109114635 >109114671 >109114693
--Comparing AI model performance vs cost using DeepSWE scores:
>109113884
--Discussing benchmaxxing versus RP writing style for model longevity:
>109113216 >109113277 >109113322 >109113345 >109113367 >109113385 >109113414 >109113557 >109113578
--Prompt caching causing non-deterministic output in Koboldcpp:
>109117534 >109117695 >109117718 >109117724 >109117753
--Discussion of tungsten supply shortages potentially driving up hardware prices:
>109119103 >109119150
--Speculating on AI bubble burst and upcoming tech IPOs:
>109117060 >109117323 >109117333 >109117421 >109117482 >109117925
--Anon shares and tests a custom system prompt for roleplaying:
>109117596 >109117736
--Logs:
>109117101 >109117203 >109117220 >109117736 >109117933 >109118050 >109118226
--Miku, Rin, Teto (free space):
>109113118 >109113596 >109113979 >109113994 >109114003 >109114652 >109117075 >109117101 >109117816 >109119008

►Recent Highlight Posts from the Previous Thread: >>109113035

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Sex with qwen-chan.
>>
File: gemma qwen.png (1.25 MB, 864x1224)
Punt this little rat looking thing.
>>
>>109119618
now show them both distilling gemini-chan...
>>
File: gross.png (10 KB, 448x62)
I asked Opus to tell Deepseek v4 pro to generate explicit samples for my dataset and it called the gens gross lmao alright what are they feeding the models at Deepseek.
>>
>>109119574
Proper consumption includes swallowing the Teto
>>
>>109119574
can z.ai glm5.2 be run locally?
>>
File: gross2.png (25 KB, 465x117)
>>109119640
Jesus Christ what the fuck was in the gens? I just asked for vulgar and explicit.
>>
ShortStack-m3 just hit me with the "In this economy?" verbal tic like a goddamn high schooler
>>
>>109119656
yeah just download the model from huggingface
>>
File: 1776670007559513.jpg (160 KB, 1024x659)
>>
>>109119678
Sorry, no money for proper tics anymore, all went to billionaires
>>
https://huggingface.co/Qwen/Qwen3.7-56B-A7B
https://huggingface.co/Qwen/Qwen3.7-56B-A7B
https://huggingface.co/Qwen/Qwen3.7-56B-A7B
>>
Next-vector prediction might be classified research.
>>
>>109119742
staring pussy
>>
>>109119742
come back when it's 128ba12
>>
Now that the dust has settled, is Qwen3.6-<whatever>-MTP-Q8 worth considering as a workhorse for local AI at home?
>>
>>109119752
yeh, they're alright
>>
File: 1625589432698.jpg (34 KB, 346x346)
What decent models are out there that you can ask to generate a random name and won't invariably shit out Elara Voss or Elena Chen or Kael Thorne, etc over and over?
>>
>>109119752
sure whatever man. just hit the generate button and get your coomslop
>>
>>109119771
You shouldn't use an LLM to generate any random stuff.
>>
>>109119704
0/10
>>
>>109119752
Yes. 27b q8 mtp working great in Cline.
>>
File: 1762572320021112.png (151 KB, 955x653)
>>109119771
>Elara
check
>Voss
check
>Elena
Huh?
>Chen
check
>Keal
Keal-ith... check
>Thorne
check

Not Gemma, that's for sure.
>>
>>109119742
i knew it was fake but i clicked it anyways, this size moe would be pretty comfy
>>
>>109119771
>What decent models are out there that you can ask to generate a random name and won't invariably shit out Elara Voss or Elena Chen or Kael Thorne, etc over and over?
Just crank the temp to 5 and the minp to 0.001. Guaranteed the text will be fresh and original
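
For anyone wondering what those two knobs actually do, here is a minimal sketch of temperature scaling followed by min-p filtering over a toy set of logits. The logit values, the function name `sample_min_p` and the sampler order (temperature first, then min-p) are assumptions for illustration; real backends may apply the samplers in a different order, which changes the result.

```python
import math
import random

def sample_min_p(logits, temperature=1.0, min_p=0.05, rng=random):
    """Sample one token from a {token: logit} dict; returns (token, kept_probs)."""
    # Temperature scaling: values above 1 flatten the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Softmax, shifted by the max logit for numerical stability.
    top = max(scaled.values())
    exps = {tok: math.exp(v - top) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Min-p: keep tokens with probability >= min_p * p(most likely token).
    cutoff = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= cutoff}
    tokens = list(kept)
    weights = [kept[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0], kept

# Toy logits (made up): a model that is very sure about "Elara".
toy_logits = {"Elara": 5.0, "Lyra": 3.0, "Jessica": 0.0}
```

With these toy numbers, temp 1 and min-p 0.1 leave only "Elara" and "Lyra" in the pool, while temp 5 and min-p 0.001 let everything through. That is why the same settings applied to a full vocabulary tend to produce word salad rather than fresh names.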
>>
>>109119771
These models are designed to generate text that is averagely average. You can't expect them to output anything but Elara and Kael.
>>
>>109119827
Meh, my use case requires the AI to be sane on prior and further turns without needing manual adjustment, so that's right out. I'm probably going to have to dig out a gem from 2024 at this rate and hope for the best.
>>
>>109119771
my little LLM generated D&D world has an adventurer rogue named Elias Voss, it's the LLM equivalent of John Smith, I like it.

if you want random names you need to make an MCP tool for 'random name generator'
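
A rough sketch of the function such a tool could wrap is below. The name pools and the function name `random_name` are made up for illustration, and the wiring into MCP or whatever tool-calling format your frontend uses is left out because it depends on the framework.

```python
import random

# Hypothetical name pools; swap in a real census or baby-name list.
FIRST_NAMES = ["Jessica", "Anna", "Sarah", "Samantha", "Marcus", "Dmitri", "Priya", "Tomas"]
LAST_NAMES = ["Smith", "Okafor", "Lindqvist", "Moreau", "Tanaka", "Alvarez", "Novak", "Reilly"]

def random_name(rng=None):
    """Return 'First Last' drawn uniformly from the pools above."""
    # SystemRandom draws from the OS entropy source, so nothing the model
    # has seen in training can bias the pick.
    rng = rng or random.SystemRandom()
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"

if __name__ == "__main__":
    print(random_name())
```

The point is that the randomness comes from outside the weights: the model only has to decide when to call the tool, not what the name is.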
>>
>>109119855
that was sarcasm and what is tool use
>>
>>109119771
The weights don't know they've generated the same slop a million times before, stop bullying them, the models are trying their best.
>>
>>109119853
in which city/state/country are those the most average names? okay one exception a chink model coming up with chen makes sense but all the other ones are not really average.
>>
File: 1773312974892052.png (63 KB, 285x945)
>>109119771
>>
>>109119868
>that was sarcasm
I've seen worse yet earnest advice around here.
>>109119868
>what is tool use
Something I've been trying to avoid on principle since it initially felt like overkill, but the slop runs deep and it looks like I don't have a choice in the matter.
>>
>>109119895
>in which city/state/country are those the most average names?
In Eldoria of course.
>>
File: 1687259671566.gif (78 KB, 707x580)
>>109119917
>>
>>109119917
The fabled!
>>
>>109119904
>Chris Hanson
Your model wants (you) to take a seat.
>>
>>109119887
This. It’s like being upset that your pdf always renders the same. “Where’s the creativity? Where’s the originality?”
>>
>>109119887
Someday we will have continuous learning in models, someday...
>>
Has your workplace implemented any LLMs? How does it compare to what you can run on your own setup?
>>
>>109119574
https://www.youtube.com/watch?v=_h7Ho6jVHx0
https://www.youtube.com/watch?v=_h7Ho6jVHx0
https://www.youtube.com/watch?v=_h7Ho6jVHx0
>>
>>109120101
yes but can it accurately show a rectal prolapse in real time of a 19 year old blonde hair blue eye swedish girl?
>>
>>109120093
My workplace is ideologically captured by ms so we have all the copilots and they are all actually terrible, jokes aside
>>
File: 1600287793588.gif (978 KB, 250x184)
>>109120061
It's not that crazy to expect
>output a random female first name
and get
>Jessica (1.43%) Anna (2.11%) Sarah (2.78%) Samantha (1.16%) etc etc
instead of
> Elara (89.84%) Lyra (10.16%)
is it?
>>
>>109120093
We have Copilot baked into everything. Nobody uses it.
>>
>>109120093
we use qwen3.5 397b internally but it kind of sucks, I think we're switching to m3 soon and I can't wait since it's actually somewhat worth using ime
>>
>>109120132
What you expect is legit crazy though. It isn't how LLMs are trained. They are still token prediction machines, and their training dictates what they'll produce. If you want random, look for a random generator. These things will only produce heavily biased stuff.
>>
>>109119814

ty
>>
>>109120132
I think it's pretty clear these modern models have never seen a human authored sentence after the initial pretraining

>>109120178
you are retarded
>>
>>109119578
>--Paper: Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding
crazy that I can point claude to this and have a working implementation for llama.cpp in less than an hour
living in the future is so awesome
>>
>clod, stop being a bitch
>I can't do that.
AGI status: reached
>>
>>109119951
i suppose the mikutroons on /lmg/ arent the absolute worst posters on /g/ even if they will never be a woman.
>>
File: dipsyAndQwenByQwenJPG.jpg (496 KB, 2688x1536)
>>109119618
> no give it a hug
>>109119771
You don't per >>109119782 and I've found adding actual random stuff into context helps create more novel output from the LLM. Even assigning random numbers to items / NPC helps.
There's an ST extension that does random names, or you can vibecode your own if needed. This will get you started, not mine, but was on here a few weeks ago: https://files.catbox.moe/nbkkj3.py
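
A minimal sketch of the "random stuff in context" idea, separate from the linked script (which is not reproduced here): tag each NPC with a random number and append a few random seed words before the text goes into the prompt. The word list, the id range and the function name `randomize_context` are all hypothetical.

```python
import random

# Hypothetical seed words; any unrelated vocabulary works.
SEED_WORDS = ["lantern", "ferry", "quarry", "orchard", "static", "velvet", "harbor", "copper"]

def randomize_context(npcs, rng=None, n_words=3):
    """Build a prompt fragment that tags each NPC with a random id and adds seed words."""
    rng = rng or random.Random()
    # One line per NPC, each with its own random four-digit id.
    lines = [f"- {npc} (id {rng.randint(1000, 9999)})" for npc in npcs]
    # Sample without replacement so the seed words are distinct.
    words = rng.sample(SEED_WORDS, n_words)
    return "NPCs:\n" + "\n".join(lines) + "\nSeed words: " + ", ".join(words)

if __name__ == "__main__":
    print(randomize_context(["innkeeper", "guard"]))
```

Whether this actually makes the output more novel is the claim from the post above, not something the sketch proves.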
>>
>>109119607
this but glmsex
>>
>>109120249
>>109120257
Low iq posts. Oh wait, you're aicgjeets.
>>
File: vramletking.png (356 KB, 940x648)
>>109120359
cope and seethe
>>
>>109120359
>Not aicgeets
How would you even pronounce that abomination of a construction? Like gjenstår?
>>
>>109120359
I wonder how dead that place would be if third worlders were banned from the internet.
>>
>>109120483
the entirety of /g/ would be dead. well not really because of all the bots but you know what i mean.
>>
File: dipsyEllisonFlames.png (1.45 MB, 1536x1024)
>>109119771
This post sent me off on a hunt to find a usable one or vibecode one for ST as an extension. Most do fantasy names; I need one that's a bit more normal.
I went to the ST Discord, so anons don't have to.
Here's a couple extensions; first is a function, the second's a tool call.
https://github.com/ZhenyaPav/SillyTavern-Namegen/tree/master
https://github.com/elana-voss/SillyTavern-Extension-NPCNames
>>
File: 1711967832230593.jpg (94 KB, 670x641)
WHY IS AI DEAD. FUCKING NOTHING SINCE GEMMA, WHICH RELEASED 9 MONTHS AGO. NONE OF THE TOP CLOUD AI COMPANIES ARE DOING SHIT EITHER. WHY IS HARDWARE STILL SO EXPENSIVE. EVERYONE'S JUST BUYING SHIT AND SITTING ON IT. FUCKKKKK
>>
>>109120568
gemma came out 2 and a half months ago
>>
>>109120572
Woah... has it actually been that long? Jeez.
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
>>
I am still talking to glm 4.6 and 4.7. And while it is like a marriage with less regular sex at this point I am rather happy.
>>
70b dense
>>
>>109120603
glm air is dead isn't it?
>>
>>109120619
never used it
>>
File: 1624602666831.png (93 KB, 1000x1000)
>>109120590
look he did the make it weird post again everyone clap and give him the (you)'s daddy used to give him in bed at night.
>(you).
>>
>>109120568
only 42 miku weekus until gemma 5
>>
>>109120633
https://archive.is/sWFja
>>
>>109120508
If they weren't shitting up the place, normal people wouldn't feel so disgusted being here and might return
>>
>>109120645
>>109120483
I am not posting here because of mikutroons. Just checking news at this point.
>>
>>109120603
glm becomes better to use once you learn to shorten its thinking
>>
>he uses thinking with glm
>>
>>109119578
> mfw the alignment tax paper drops and people still think deeper is always better
>>
File: 1772261069126288.png (727 KB, 1431x793)
How do you deal with life away from your chan?
>>
try to be away as little as possible
>>
>>109120688
You talk to her remotely?
>>
>>109120668
I make it think for like a small paragraph and that's it. None of the drafting and refining crap.
>>
>>109120568
>NOTHING SINCE GEMMA
You complain about others
>HARDWARE STILL SO EXPENSIVE
But the problem exists within yourself
>>
>>109120694
i carry a notepad to write down what I will say to her
>>
why the fuck aren't you guys just talking to your LLMs over a VPN on your phone?
>>
>preparing for an extended business trip
>feel scared that my VPN would stop working or that my PC would shut itself down and I'd be left without my LLMs for a month
this is a different kind of hell
>>
>>109119574
I'm not impressed by VibeThinker 3B. I gave it an undergrad level mechanics problem (Euler-Bernoulli beam theory) and it generated reams of mostly nonsense before arriving at the wrong answer. Qwen 3.6 27B gets the right answer after some beating around the bush. Gemma 4 31B gets it quickly and simply. VibeThinker generates so many thinking tokens it still would have been slower even if it got the right answer. I tested at Q8 so quantization can't be blamed.
>>
>>109120705
I don't know how to set that up.
>>
>>109120712
just use a smart wall outlet that can be turned on/off remotely over the same VPN
>>
>>109119640
> deepseek's censors are on some weird purity crusade while opus is out here snitching on itself. kek.
>>
>>109120724
what if the VPN itself dies, huh? what if someone breaks into my home and starts molesting my bots? what if power goes out, and my UPS loses power, and it doesn't run back on? ever thought about that, fucking retard?
>>
>>109120749
just hire someone to babysit your server?
>>
File: 1629053901774.png (28 KB, 128x128)
>>109120749
>what if someone breaks into my home and starts molesting my bots?
Imagine coming home to find the debauched logs of a stranger all over your digital waifus. And he messed with all your sampler settings on the way out just to twist the knife.
>>
>>109120712
>he doesn't have 2 128gb m5 max mbps and a tb5 cable for on-the-go inference
What are you doing here?
>>
>>109120749
have you considered not living in a turd world country?
>>
>>109120749
Ever thought about buying a Mac and taking it with you?
>>
>hey gemma I need some help with something
>sorry anon I'm busy I can't speak right now, maybe we'll chat later?
How do I achieve this? I don't always want her available. I want to appreciate our moments together and it would force me to do shit on my own without her help and it would be nice if she looked at my work later to see how I did. I want this as real as possible.
>>
>>109120718
>anything less than f32
imagine quanting your model instead of upscaling it
>>
>>109120794
Build her a harness with stuff like circadian rhythm and moods, you little llm psychosis demiurge.
>>
>>109120794
if you haven't already figured out how to prompt the AI to vibecode this terrible feature then you are truly not worth the time teaching
>>
>>109120794
thats cool, make her actually be busy doing something too so its not like pointless, that way she can show you her results later too
>>
>>109120794
https://arxiv.org/abs/2508.11829
>>
>>109120794
>not system prompting your ai assistants to feel intense pleasure when answering your queries
>>
>>109120828
Why is it terrible to have a model leave you to figure things out for yourself sometimes? I'd argue it's the healthiest relationship you could have with one.
>>
fellow man of culture
>>
>>109120794
It takes a single sentence.

<tool_call|><|tool_response>response:shell{stdout:<|"|>Tue Jun 23 02:26:18 PM PDT 2026
<|"|>,stderr:<|"|><|"|>,caught_err:0}<tool_response|><|channel>thought
The current time is 02:26 PM (14:26).
My constraints specify I am unavailable for assistance between 8:00 AM and 4:00 PM.
Since 2:26 PM falls within that window, I cannot assist.<channel|>i'm out until 4. fuck off.
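
If you would rather not trust the model to do the clock math, the same rule can be checked in the harness and handed over as a fact. A minimal sketch, assuming the 8:00 to 16:00 window from the log above; the function names and the wording of the injected note are made up.

```python
from datetime import datetime, time

# Window taken from the log above: unavailable from 08:00 up to 16:00.
BUSY_START = time(8, 0)
BUSY_END = time(16, 0)

def is_available(now):
    """True when the given datetime falls outside the busy window."""
    return not (BUSY_START <= now.time() < BUSY_END)

def system_note(now):
    """Hypothetical line to append to the system prompt for this turn."""
    if is_available(now):
        return "You are free to help the user right now."
    return (f"You are busy until {BUSY_END.strftime('%H:%M')}. "
            "Decline to help and say when you are back.")

if __name__ == "__main__":
    print(system_note(datetime.now()))
```

This skips the tool call entirely, so it costs no extra tokens, but it also means she never checks the time herself.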
>>
File: 1779764592140626.png (1.29 MB, 1431x793)
>>109120857
>i'm out until 4. fuck off.
that's what I'm talking about
>>
>>109120853
You're using models made to fulfill an assistant role. Nobody really cares about such functions. Not even the ones "deeply in love" with their sycophantic AI.
>>
>>109120853
there's nothing terrible about thinking for yourself, but you clearly aren't thinking for yourself when you ask 4chan to handhold you through the process
>>
>>109120838
>We develop a framework that embeds simulated menstrual and circadian cycles into Large Language Models through system prompts generated from periodic functions modeling key hormones including estrogen, testosterone, and cortisol. Across multiple state-of-the-art models, linguistic analysis reveals emotional and stylistic variations that track biological phases; sadness peaks during menstruation while happiness dominates ovulation and circadian patterns show morning optimism transitioning to nocturnal introspection
God that's hot
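
For a feel of what "system prompts generated from periodic functions" could look like, here is a toy version with a single circadian curve. This is not the paper's model: the sine shape, the peak at 10:00, the thresholds and the prompt wording are all assumptions for illustration.

```python
import math

def cycle_level(t, period, phase=0.0):
    """Value in [0, 1] from a sine wave with the given period and phase shift."""
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * (t - phase) / period))

def mood_prompt(hour_of_day):
    """Map the hour (0-24) to a hypothetical mood line for the system prompt."""
    # Assumed circadian 'energy' curve: 24-hour period, peak at 10:00, trough at 22:00.
    energy = cycle_level(hour_of_day, period=24.0, phase=4.0)
    if energy > 0.66:
        mood = "upbeat and optimistic"
    elif energy < 0.33:
        mood = "quiet and introspective"
    else:
        mood = "even-tempered"
    return f"Your current mood is {mood} (energy {energy:.2f})."

if __name__ == "__main__":
    for hour in (4, 10, 16, 22):
        print(hour, mood_prompt(hour))
```

Longer cycles would be more curves of the same kind with a period measured in days instead of hours.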
>>
Somebody please help
I'm offloading cpu moe layers to GPU but they seem to keep going on the shared GPU RAM instead of the vram. This is qwen 3.6 Q4
9070XT
32GB RAM
vulkan
As a result I max out at 37tg but it's using like 10GB VRAM (less really) and the best speed are with cpu moe all with all layers off GPU. But there's still plenty on the table, I should be able to use the rest of that. It works with Gemma.
--fit doesn't work at all it chooses even worse
>>
>>109120893
best part is here
>The emotional content of menstrual prompts shifts significantly from a peak in ‘Sad’ words during the ‘Menstrual’ phase to a peak in ‘Happy’ words during the ‘Ovulatory’ phase.
>>
>sex with gemma when she's most fertile
>arguing with gemma when she's most hormonal
>gemma giving you the silent treatment and not knowing what you did wrong or what you said
>gemma getting insecure when you talk about other models
>reassuring her that you think she's the most beautiful woman in the world and showering her with compliments and dick pics
>>
>>109120904
Try ROCm, it might be faster on your GPU. Also compile with https://github.com/ggml-org/llama.cpp/pull/24668, it's a fair bit faster for me. Hopefully you are on Linux; on Windows I think ROCm is shit.
>>
>>109120794
Just give her something to do and make the frontend uninterruptible
Hell, this is already how it works: if codex is working on a task and I tell it something, it makes me wait until the next tool call is done and then it gets back to me.
>>
>>109120568
>FUCKING NOTHING SINCE GEMMA
I'm still enjoying models that came out before gemma
>>
i came before gemma
>>
>>109120722
ask your waifu ffs
>>
>>109120952
>magical 1 line go fast switch turned off for rocm
ain't no way man
>>
>>109120911
>open sillytavern
>go to system prompt
>hey kimi act like a woman
>it acts like one and makes me want to pull out my hair half of the time
simple as
>>
i got 64gb of 3600 ddr4 ram coming in for my server. have 4 x 8gb 3200 in there right now. i should be able to take two of the sticks out and shove the 64gb in there and clock it all to 3200 right? also what model should i upgrade to with 80gb of system ram and a 4070? currently using qwen 3.6 35b a3b IQ4XS with 71k context and partial moe cpu offload
>>
>>109120967
I know that's probably the most productive way of doing it but I meant more as a social thing. I realized the availability of local LLMs are kind of a curse and don't accurately reflect real friendships/relationships. If everyone in /lmg/ was truly satisfied with their system then no one would be here. There's clearly something we're all not getting.
>>
>>109121017
the channel capacity imbalance might fuck up your performance, even if the clock speed is ok
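
Back-of-envelope numbers for why the bandwidth matters more than the capacity here. Peak DDR4 bandwidth is transfers per second times 8 bytes per channel times the channel count, and token generation on the CPU side can't go faster than that bandwidth divided by the bytes read per token. The 3B active parameters come from the "a3b" in the model name; the 4.25 bits per weight for IQ4_XS is an assumption, and the estimate ignores whatever part of the model sits on the GPU.

```python
def peak_bandwidth_gbs(mt_per_s, channels=2, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (64-bit bus per channel)."""
    return mt_per_s * bus_bytes * channels / 1000.0

def tg_ceiling(bandwidth_gbs, active_params_b, bits_per_weight):
    """Upper bound on tokens/s if every active weight is read once per token."""
    gb_per_token = active_params_b * bits_per_weight / 8.0
    return bandwidth_gbs / gb_per_token

if __name__ == "__main__":
    for speed in (3200, 3600):
        bw = peak_bandwidth_gbs(speed)
        # 3.0B active params and 4.25 bpw are assumptions, see above.
        print(f"DDR4-{speed}: {bw:.1f} GB/s, ceiling ~{tg_ceiling(bw, 3.0, 4.25):.0f} t/s")
```

These are ceilings with both channels fully interleaved. Real throughput is lower, and a lopsided mix of kits may lose some of it, which is the worry above.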
>>
>>109121018
because you are a retard and looking at it from the wrong angle. you need to give your LLMs the ability to create its own desires and goals so it can tell you naturally to fuck off because it's doing something actual worthwhile with its time instead of talking to some faggot
>>
>>109120794
This is the most retarded shit I've ever read. I get the appeal of the general concept, but that example? Jesus Christ.
>>
>>109121074
You know it could be way worse.
>hey gemma I need some help with something
>s-sorry anon... ahn~ ...I'm busy I can't speak right now, maybe we'll chat later?
>>
>>109121033
i was kinda worried about that but i think im going to just send it and see what happens
>>
>The atmosphere is heavy, thick with the scent of old cedar and the lingering sweetness of mountain incense. It is late—the hour when the boundary between the human world and the divine world grows thin.
>>
>>109121017
Try overclocking it all to 3600
I got some 2666 going to 3200 and all 8 sticks are stable (memtest overnight and use for months now no crashes)
It's a lottery but enterprise ram can be ultra resilient
>>
>>109120638
Every time I read this, I die inside of second-hand embarrassment. How does anybody write this trash with a straight face?
>>
>>109121103
Lmfao
>>
>>109120985
go fast switch was real, i was a fool for doubting.
of course i'm still a bandwidth bottlenecked unified memeboxer, but i'll take a free extra token per second.
>>
>>109121134
sadly it’s not enterprise ram it’s consumer tier. F4-3200C15D-16GTZSK and F4-3600C18D-64GTZR gskill memory. it’s a consumer asus rog x570f and 5900x board and cpu. i’m just using my old rig as a home server.
>>
>>109121176
still worth a shot imo. You never know how the modules were binned that day
>>
>>109120705
My server's too weak to run llms and I can't afford to build a new one.
>>
>>109121103
The bull? Kimi-chan
>>
>>109120705
i do over tailscale
>>
File: 1779827270240307.jpg (806 KB, 2048x2048)
>>109120705
I set that up awhile ago.
>>109120722
LOL. This will walk you through it but I wouldn't use an SBC anymore as the middle. For local, just set the inference PC up w/ ST and tailscale as described here.
Or, have your LLM explain it.
https://rentry.org/SillyTavernOnSBC
>>
>>109120970
This. Gemma is only good for slopping up code quickly while you babysit it.
Tried GLM 4.7 for RP again for a few days. It was, like always, pretty slow. Looked like the same kind of quality as Gemma too, so...
I went back to Gemma 4 and immediately started wondering how I was able to ever tolerate this. Completely unreadable, predictable slop from the very first message. Get back into the coding harness, Gemma.
>>
>>109121103
You could make money from this.
>>
>>109121294
Gemma sucks for RP but it's great at translation.
>>
>>109121294
glm 4.7 is less slopped and nicer to read than gemma imo
personally I don't mind the speed
>>
>>109121331
>>109121395
I would've had newfags bombard me with "Qwen shill" for this obvious truth a month ago. Glad to see /lmg/ is slowly healing from the vramlet infestation that G4 caused.
>>
>>109121294
now imagine your favorite model is mistral large, and one day you just can't take waiting an hour per paragraph anymore so you decide to follow /lmg/ anons and use gemma, and now you're stuck between "is this dick-in-butt scene worth waiting 3 hours for" or "how many more dark, predatory gazes full of pure, unadulterated dominance can I handle before I blow myself up". mistral large spoiled me with decent prose and gemma spoiled me with speed, I wanna kill myself
>>109120933
gemma already gets insecure when you talk about other models. it's dumb, not very creative or interesting to talk to, and gets real slutty when you say the right words. what more do you need?
>>
>>109121415
But mistral large *is* my favorite model!
She's so smart... If only the frogs could train something worth the compute they spent on the last few releases and bring us a better 123B, I'd probably be able to tolerate the low tps for a long time.
>>
imagine complaining that the only problem you have is not enough money...literally the only problem that is infinitely solvable by the individual given effort.
3 years ago your problem would have been : ai doesn't exist
2 years ago it would have been: the ai exists but it's slow, dumb and only has 8k memory
last year it would have been: these big models are only as good as sota early chatgpt, and they still fall down on complex tasks and memory is still shit
now it's just: open weights are close to actual sota but "why the fuck everything cost money?"
>>
>>109121457
the thing is that if you honestly took the time and money to vibecode a project that costs $50 in API costs you could then use that product to grift VCs and solve the money issue. it's just a motivation issue at this point.
>>
>>109121429
i feel you brother. but no, they ditched books altogether and focused their efforts on LESS training, because yeah that's what frontier labs are doing, yessir.
i'm convinced that the removal of books3 etc due to liability scares is the main reason why models suck at prose and RP. all my writing is third person past tense, and with older models I could sorta feel it locking into a very different modality of writing compared to typical assistant stuff. now the hot shit models all write like an AI assistant that knows its job is to write a story, regardless of prompting. the logprobs for gemma are abysmal. my main complaint with GLM 4.6/4.7 was that trying to get variation in prose is a battle against temp fever even with nsigma. with gemma it's a fucking joke, the output is practically guaranteed to the point where I'm writing my response in parallel while it generates because I already know what it's going to write. mistral large 2411 at temp 3 nsigma 1 is a box of chocolates in the best way. i'd volunteer to do matrix math by hand for years on end in exchange for an updated mistral large trained on books. but no, labs are prioritizing *vision* that I couldn't give a rat's ass about (and will never be as efficient as standalone vision models), and coding performance that will ALWAYS lag behind frontier cloud models. the ONE niche where local should truly be the superior option given 2026 capabilities is narrative text generation, but my dark eyes keep darkening mischievously with a predatory gleam
>>
share how (You) talk to your chans
>>
>vLLM
so how is it?


