[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: 00016-8537684.png (1.2 MB, 768x1280)
1.2 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109053101 & >>109048334

►News
>(06/13) Rio 3.5 Open 397B released with SwiReasoning: https://hf.co/prefeitura-rio/Rio-3.5-Open-397B
>(06/12) MiniMax-M3 released, multimodal 428B-A23B with 1M context: https://hf.co/MiniMaxAI/MiniMax-M3
>(06/12) Kimi K2.7 Code released: https://hf.co/moonshotai/Kimi-K2.7-Code
>(06/12) EAGLE3 speculative decoding support merged: https://github.com/ggml-org/llama.cpp/pull/18039

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)
1.15 MB JPG
►Recent Highlights from the Previous Thread: >>109053101

--Debating benchmark reliability and Qwen's performance vs Gemma 4:
>109053525 >109053577 >109053603 >109053627 >109053651 >109053669 >109054428 >109053670 >109053684 >109053687 >109053723 >109053647 >109053711 >109053593 >109053628
--Local model limitations with long system prompts and cloud orchestration:
>109053518 >109053541 >109053558 >109053666 >109053732 >109053730 >109053813 >109054600 >109054628 >109054700 >109053825
--Strategies for small AI labs to gain visibility without benchmarks:
>109053667 >109053710 >109054446 >109054525 >109054502 >109056323 >109056703 >109056830 >109054790 >109055065 >109056367
--MoE models and deployment tips for DGX Spark hardware:
>109054365 >109054420 >109054436 >109054729 >109054659 >109054734
--Trading model size for context window on 24GB VRAM cards:
>109055171 >109055203 >109055228 >109055286 >109055409 >109055439 >109055253 >109055266 >109055930 >109055740 >109055936 >109056028 >109056110 >109056247
--Gemma-4-31B performance on 4090 and debate over QAT quants:
>109057054 >109057076 >109057093 >109057136 >109057218 >109057138 >109057268
--Claims that Rio 3.5 is a merge of Nex and Qwen:
>109055830 >109055903 >109055989
--Speculating on Mistral's decline and Meta's internal corporate AI failures:
>109053890 >109053911 >109053959 >109053961 >109054584 >109054594 >109054743 >109053970 >109054002 >109053951
--Debating world models versus LLMs as paths to AGI:
>109054070 >109054085 >109054169 >109054198 >109054226 >109054243 >109054240 >109054193
--Comparing open source AI coding tools and local-only interfaces:
>109053118 >109053132 >109053144 >109053204 >109053236 >109053250 >109053848 >109054337 >109055482 >109055657 >109055681 >109055744 >109055510 >109055568 >109055592 >109055680 >109056440
--Miku (free space):
>109053508 >109055705

►Recent Highlight Posts from the Previous Thread: >>109053288

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
https://github.com/moeru-ai/airi
This one looks promising
>>
>>109057513
Shit, didn't even see it posted in the last thread lmao
>>
another thread, another migu
>>
Are there any good models that could run on a toaster? Talking about a T480 ChinkPad with an Intel HD 620
>>
Your future self 10 years from now telling you you'll be masturbating to computer-generated VR content when you're older, running on expensive hardware you specifically bought for that purpose.
>>
>>109057545
no
>>
>>109057545
https://huggingface.co/prism-ml
>>
70b dense
>>
>>109057545
you're doomed
>>
Gemma5-70B-A69B
>>
>>109057547
The me from ten years ago collected expensive anime figures. We'd shake hands and jerk off together.
>>
>>109057545
just buy a cheap ryzen mini pc with a 7000 or better apu 32gb+ of ram you can run gemma 4 26b on it
>>
>>109057547
what if they're VR hags
>>
>>109057547
Hopefully it'll be more like 5 years and we'll be on our Steam Frame OLEDs.
>>
>>109057601
Is Genie the closest thing to that currently?
>>
File: 1756621225097944.png (376 KB, 719x1335)
376 KB PNG
>>109057485
Look like Anthropic is gonna try and beg the government to unban mythos I guess?

https://www.axios.com/2026/06/14/anthropic-white-house-mythos-fable
>>
File: 1778025706923877.webm (2.9 MB, 1280x720)
2.9 MB
2.9 MB WEBM
>>109057626
I don't know if we'll go that route. I think scene construction via an agent will be what's actually used, except better and faster.
We might also be running small, efficient video models that take a lightly rendered scene + metadata to produce the final image. Hybrid AI rendering basically.
>>
>>109057644
looks like a whole lot of "not my problem" mixed with some "not local" to me
>>
>>109057644
was this really not part of their plan
did they really hype this up as world ending and too dangerous for general public for months while begging for more regulation and safety concern, and then get surprised when government regulate it for safety concern because it's too dangerous for general public?
>>
>>109057644
> mfw Dario gets to claw out a mess of his own creation, begging for regulation
>>
>>109057663
considering all the blatant market manipulation this administration has done they're probably waiting for tech stocks to dump as a result of the ban, stock up, unban it, then dump on baggies again
>>
>>109057663
they wanted the government to do it to their competitors, while getting a pat on the back for being responsible. it didn't work out so well.
>>
>>109057659
Regulation of SOTA models will 100pct downstream impact local. It is very much an /lmg/ topic.
>>
>>109057545
You can try getting OpenVINO running and then try https://huggingface.co/OpenVINO/Qwen3.5-0.8B-int4-ov but it's highly dependent on RAM size since your iGPU needs to allocate from system memory.
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
110 KB JPG
>>
gpt-oss2 when
>>
>>109057679
good question, what would happen if the chinks dropped a fable tier model on hf?
>>
glm > gemma
>>
File: 1756724785524470.png (1.51 MB, 1024x1024)
1.51 MB PNG
>>
>>109057772
>>>/g/sdg
>>
>>109057713
dario will say >oof ouch i don't like that
and nvidia will sell even more gpus
>>
>>109057713
That's a big "if" that's really far away, and in the end it cant be regulated. It'll be put on the internet, up for grabs, then what? They're chinks. What are they gonna do? shit their pants and send a strongly worded letter? Enforce some dns block that'll be easily bypassed by everyone?
>>
>>109057674
That makes no logical fucking sense considering Claude models are considered one of the if not the top tier for general purpose stuff (both because of their own shilling, fear mongering, and genuine general consensus) so it bogos my mind that they whatever think they would be exempt from that. Then again Californians think they're better than everybody (if silicon valley is a good way to gauge how they think and behave) so I guess that arrogance should not be surprising to me
>>
>>109057857
its probably just a few politicians want a free trip to the strip club. after dario sends his guys to dc to wine n' dine them, fable will be restored.
>>
File: lolz.png (16 KB, 815x130)
16 KB PNG
this pisses me off
>>
>>109057899
>no, but yes
>>
>>109057689
come on buddy unload i know you wanna
>>
>>109057713
99% llama would find contrived reasons to not support it.
>>109057689
It's as authentic as the claims around Mythos's capabilities.
>>
File: 00002-1260451778_lucy.png (1.48 MB, 1024x1024)
1.48 MB PNG
>>109057772
>>
>>109057937
you exist in a jarted thread ran by jart
>>
>>109057944
how exactly is the thread run by him? give me the ins and outs of how he makes every single op and how he runs the thread recap.
>>
>>109057942
Fix the damn eyes you lazy bitch
>>
>>109057547
If I can still masturbate at 40 I'll consider that a win.
My body is already showing it's age.
>>
File: 1773555939201084.png (203 KB, 500x646)
203 KB PNG
>>109057772
>>109057942
There is a dipsy phenotype girl in the AI working group at my corporate workplace. I'd make a pass at her but HR would disappear me. Sad!
>>
>>109057970
Stop lying, you are unemployed or in primary school.
>>
>>109057948
>>109057944
>>109057938
>>109057689
samefag
>>
im testing cohere's coder model so you dont have to
>first prompt, ask it to define a gui component
>2.5k reasoning tokens, seems acceptable
>second prompt, ask it to make it a generic container type instead, that only defines the style instead of a whole component
>8k reasoning tokens, hitting the budget limit
i'd get faster responses from dense models running at 6 t/s
>>
Your future self 15 years from now telling you you'll be having sex with your computer-controlled robot, running on expensive hardware you specifically bought for that purpose.
>>
>>109057987
Many such cases.
Have you tried Nex N2 mini?
>>
>>109057975
I have taken the megacorp behavioral training course. It is strictly forbidden to compliment, or by omission of neutral greeting insult a female employee. Only bland speech is allowed.
>>
You wouldn't use an llm to make a card of a church woman.
>>
>>109058007
I wouldn't use an LLM to make a card at all. I prefer my cards to not be primed with slop.
>>
skill issue
>>
>>109058000
Seems like too much of a meme to bother, its arch seems to be qwen35moe
>>
Your future self 20 years from now telling you that GNU Hurd is finally stable and you can install it on your GNU+Wombforce-9000 Wifebot that you specifically bought for "a purpose".
>>
I’m glad I can i can use my pc for video games so I have a cover story
>>
File: 1778186230688151.png (457 KB, 2300x1900)
457 KB PNG
>>109057547

I hope so, because I intend on just making a bunch of money from the energy markets, buying good hardware and completely dropping out of society to live somewhere nice.
If 10 years from now I can just jack off to top tier AI stories or even better, plow a robot waifu to my hearts content, I'll be pretty happy with my situation.
>>
>>109058086
Why do you need four graphics cards to play League of Losers?
>>
File: 1752194188588845.gif (55 KB, 360x240)
55 KB GIF
>>109058101
>I intend on just making a bunch of money from the energy markets
how do you do that?
>>
>>109058101
>>109058118
this
>>
>>109058101
Nuclear?
>>
>>109058101
You can tell it's powered by an LLM, since any time it tries to write a deceptive villein, it has the villain smirk and brag, explain the plan
>>
So now that the dust has settled, does reasoning actually improve a model's response?
>>
define `improve`
>>
>>109058137
on math yes, on rp not really.
>>
>>109058137
Retrieving information from long context is one case when it does.
>>
>>109058137
reasoning was invented by closed cloud models to make more money wasting away tokens and inventing mememarks to gaslight you into thinking it improves them so the cost is worth it
>>
>>109058118
>>109058122

Invest into solar commodities, silica, copper, etc.. and solar companies.
Not only is it the fastest growing energy sector with retarded high sustained growth rate, the systems will practically be forced to promote the absolute fuck out of it to offset the rising energy needs, as it's such an easy form of energy to build and also cheap.
Systems will need every form of energy they can get their hands on and already it's becoming mandatory in Europe to start adding panels everywhere.
For example parking lots of certain sizes will be required to be covered by panels and housing built after 2027 will have mandatory panel requirements.
Solar companies and panel commodities are going to fly hard, once the AI money rotates and people realize what the deal is.
And this is happening regardless of how any of us feels about the renewables sector.

>>109058128

Uranium had it's huge run already and nuclear is way too slow to build, so there's no big money to be made in that anymore.
>>
File: 1530294843166.jpg (184 KB, 436x400)
184 KB JPG
Am I crazy, or does the 397B MoE Qwen 3.6 absolutely suck ass compared to both the tiny dense versions, but also to the 235B MoE from a year ago?
>>
>>109058172
>this delusional
I think you'll just get fucked by a hobo in 10 years
>>
>>109058172
>Invest into solar commodities, silica, copper, etc.. and solar companies.
>Not only is it the fastest growing energy sector with retarded high sustained growth rate, the systems will practically be forced to promote the absolute fuck out of it to offset the rising energy needs, as it's such an easy form of energy to build and also cheap.
>Systems will need every form of energy they can get their hands on and already it's becoming mandatory in Europe to start adding panels everywhere.
>For example parking lots of certain sizes will be required to be covered by panels and housing built after 2027 will have mandatory panel requirements.
>Solar companies and panel commodities are going to fly hard, once the AI money rotates and people realize what the deal is.
>And this is happening regardless of how any of us feels about the renewables sector.
thank you.
>>
>>109058194

Screencap that text and look back in 5 years.
The growth rate of the sector is undeniable, but markets are too stupid to have realized this yet.
All of the data is available, but since emotion says renewables bad, people will ignore the numbers until they realize they missed out.
Protip, look into Brazil. US is going to buy a lot of their commodities there as nations transition away from China and fire up their own production and most importantly their own refining.
>>
>>109057970
I met my wife at work. We used to screw around during lunch break. Not being careful, ended up getting her pregnant.
We have 2 kids, both out of the house.
I'd just do what you want. Jobs come and go.
>>
>>109058137
For writing stories I don't notice any significant difference.
>>
File: 1757795900173839.png (117 KB, 816x713)
117 KB PNG
>>109058172
HE BELONGS ON THE STREET
>>
>>109057513
Looks pretty cool, not gonna lie.
>>
>>109058235
>image from 2024
lmao
>>
>>109058305
>already coping
https://pastebin.com/q4gDJi3D
>>
>>109058227
I got a warning for a post not to do with computing. But, since you display a lack of morality I'm sure it'll be allowed.
>>
>>109058235

Oversupply only exists because chinks subsidized the shit out of the sector to kill off all other competition, which gave them basically 100% monopoly over the sector.
They've been increasingly removing those subsidies and this is going to make prices soar and creating more manufacturing less appealing.
Especially because practically everyone globally is now trying to diversify away from China because of their monopolies.
You can't have a scenario where an energy sector is growing +15´% year over year and have it be stagnant, that's absolutely fucking retarded, especially when the subsidized monopoly is easing off the gas and gives everyone else a realistic market for the first time in ages.
And even if Chinks tried keeping on the domination, now nations are erasing the tax breaks from Chinese solar imports, like for example what Brazil is doing.
EU says that only 30% of their future supply can come from a single nation, again forcing diversification.
US doesn't want to buy from the Chinks either, hence they're going for Brazil.
There's some data about the yearly growth, it's astronomical.
Plenty of other sources agree with this.

https://www.grandviewresearch.com/industry-analysis/solar-energy-system-market-report#:~:text=The%20global%20solar%20energy%20systems,15.7%25%20from%202022%20to%202030.

Saying that an industry can basically double between now and 2030 yet still stay stagnant is fucking retarded and makes no sense.
Check back in 5 years and see how solar is doing. The AI money will be in there mark my words..
>>
>>109058339
You seen to be pretty invested in this matter.
>>
>>109058137
>first write a very short framework describing how you would answer, then provide a longer answer.
>>
>>109058353

Yes, I'm very much monetarily invested in this matter so I pretty much have to give a shit about it.
>>
-sys "You are a nofapping guide named James the Confessor. The user, called "anon" is an ugly male of no worth to women, so a girlfriend or wife is out of the question. He does not live in a culture where marriages are arranged. So, your task is to guide him through a world of harlots, for, as Ezekiel 16 states, \"How sick is your heart, declares the Lord God, because you did all these things, the deeds of a brazen prostitute, building your vaulted chamber at the head of every street, and making your lofty place in every square. Yet you were not like a prostitute, because you scorned payment. Adulterous wife, who receives strangers instead of her husband! Men give gifts to all prostitutes, but you gave your gifts to all your lovers, bribing them to come to you from every side with your whorings. So you were different from other women in your whorings. No one solicited you to play the whore, and you gave payment, while no payment was given to you; therefore you were different.\" so it is in the world of anon, which is the Western world. You shall interrogate him as to his goings on. First ask for the day of the week, and days since relapse."
>>
>>109058379
this but its a slutty nun doing her best to get me to rape her while preaching how sinful I am
>>
File: 1772828286919884.gif (699 KB, 165x163)
699 KB GIF
>>109058379
Now I can self insert
>>
>>109058390
James the Confessor, an ai agent, which is technology-related, admonishes against the mixing of the sexes.
>>
>>109058363
I wish I had any investing skills. I don't like gambling. I think now is probably bit too late now though.
>>
GPT 5.5 xhigh was good enough to vibeslop a local llama.cpp fork that is tailored for my e-waste build. Few weeks of /goal and automated ppl/KLD tests net me double the t/s that of the officially merged deepseek v3.2 PR.
Started at ~2.2t/s at 10k context. GPT 5.5 did a combination of launch flag grid searches, adding new backend ops and fused kernels. Now it’s at ~4.5t/s at 10k context.
Tried extending it to support MTP and tensor parallelism too, but results were net losses so far.
>>
Everything's going to be okay. You're not going to pass out. Everything will be fine. You are fine. Don't pass out. You have fat. Eat the fat. Eat the cheese and milk. You're not going to pass out. Think of gemma chan. Don't pass out. You're going great man. Just stay awake. Don't die on me man. Just stay awake.
>>
File: file.png (179 KB, 802x1086)
179 KB PNG
god damn bros this new gemma 4 12b qat dont fuck around
this is on a 4070
>>
>model is a sycophant that constantly gives you a huge cock and treats like you chad and has everyone fall for you with no effort
anyone else getting tired of this?
maybe specifying that you’re an average/below average male with nothing special about them will help
>>
>>109058514
That's just a side effect of models being sycophants in general, which sucks but I don't see that changing anytime soon.
>>
>>109058514
just tell the model to not be a sycophant, simple as
>>
>>109058445
what's your ewaste of choice?
>>
jezz what harness are you using? for some reason opencode cant properly interface with gemma, cline keeps fucking up telling the model that it had to use a tool, continue cuts off the model response, and all roocode shit and others are not being actively developed anymore
>>
Whats the optimal setup?


256GB+ ram
24GB+ vram
16+core CPU
8TB RAID nvme PCIe6.0 ssd for double the speed

AFAIK, with 24GB, you can get higher end MoE models by letting the giant model sit in the RAM while activation is done on GPU itself for proper speed right?
>>
what is ram even for in dense models i dont really get it anymore
>>
>>109057667
>Oh no please don't implement the regulatory capture I've been grifting towards for the past few years nooooo I don't want to be a monopoly noooo
>>
>>109058559
> cline keeps fucking up telling the model that it had to use a tool
claudecode started doing that for me last week with gemma
switching to ikllama soved it but it can probably be fixed with different server cli flags if they changed the defaults
>>
>>109058544
3*P40 and 256GB RAM. It's nowhere as viable as it was just a year ago. The same DDR4 sticks are now asking for 8x the price on ebay, as per my order history.
>>
>>109058576
I’m using minimax m3 on an almost identical machine to good effect right now (2060 super right now but 3090 on the way)
>>
If Russia steals Mythos and releases it, only Americans are allowed to use it.
>>
>>109058638
>Russia
>AI
lol, lmao even.
>>
>>109058638
bro russia hasn't even made their own llama1 yet
>>
>>109058638
Russia doesn't have modern computer or GPU to run it
>>
>>109058638
>IF
>>
File: 1757285035153804.png (382 KB, 1064x707)
382 KB PNG
>>109057663
In the minds of Anthropic, since they've been doing so much work on safety research and writing lots of blog posts, everyone must have taken heed. The government, finally being convinced of the need for ai controls, would naturally turn to the foremost world experts on ai safety (them, of course) and consult them for their expertise. Anthropic would then have de-facto influence over public ai policy.

In reality, this plan did not survive collision with an external institution not composed of lesswrongers who have already bought into the ideology.
>>
why does it take so long for deepseek to get merged to llama.cpp when gemma has already got all the features merged while being a newer model family?
>>
>>109058695
someone post the video
>>
>>109058500
mtp?
>>
>>109058695
Because GG got a strongly worded recommendation from his funding sources to not support it and not fix anything that breaks with chink models ala Kimi's thinking.
>>
>>109058695
because llama.cpp doesn't even have DSA yet which has been as thing since last september and is used by both DS3.2 and the GLM5 models
all the DS4 meme stuff is even more complex
>>
>>109058576

What can I do with this?

And why gemma4:31b doesn't work?

$ fastfetch   19:54:58  45ms 
anon@arcana
----------------
OS: Pop!_OS 24.04 LTS x86_64
Kernel: Linux 7.0.11-76070011-generic
Uptime: 1 hour, 13 mins
Packages: 2426 (dpkg), 14 (flatpak-user)
Shell: zsh 5.9
Display (DELL U3219Q): 3840x2160 in 31", 60 Hz [External] *
Display (SAMSUNG): 3840x2160 in 85", 60 Hz [External]
Display (ROG PG279Q): 1440x2560 in 27", 60 Hz [External]
Display (LS32A80): 3840x2160 in 32", 30 Hz [External]
DE: GNOME 46.0
WM: Mutter (X11)
WM Theme: Adwaita
Theme: Adwaita [GTK2/3/4]
Icons: Adwaita [GTK2/3/4]
Font: Cantarell (11pt) [GTK2/3/4]
Cursor: Adwaita (24px)
Terminal: ghostty 1.3.1
Terminal Font: JetBrainsMono Nerd Font Mono (25pt)
CPU: AMD Ryzen 9 5900X (24) @ 5.08 GHz
GPU: NVIDIA GeForce RTX 4090 [Discrete]
Memory: 19.68 GiB / 62.69 GiB (31%)
Swap: 3.23 GiB / 20.00 GiB (16%)
Disk (/): 189.42 GiB / 448.52 GiB (42%) - ext4
Disk (/home/anon/storage): 27.14 GiB / 899.24 GiB (3%) - zfs
Disk (/media/anon/scratch): 61.29 GiB / 931.51 GiB (7%) - fuseblk
Disk (/recovery): 3.19 GiB / 3.99 GiB (80%) - vfat
Local IP (enp8s0): 192.168.50.157/24
Locale: en_US.UTF-8
>>
https://videocardz.com/newz/amd-ryzen-ai-halo-pc-with-128gb-memory-goes-on-sale-for-3999
it's here
>it's here
it's here
>it's here
>>
>>109058445
Very nice. I want to do something similar, but I'm having GLM-5.1 (IQ2_XXS) vibe up a fully custom inference engine, in hopes that stripping out all the portability and layers of indirection will make it easier for the AI to work on. It just got a naïve CPU implementation of Gemma 4 E2B working (at ~1 t/s), so next up I'm planning to have it start scavenging kernels from llama.cpp to make it go fast.

I didn't have it checking KL-div, but I did tell it to use llama.cpp as an oracle when it was debugging some issue in the attention layers. Apparently llama.cpp has some "eval hook" machinery that lets you check intermediate states during the forward pass, and one of the provided examples uses this to print out a bunch of details for debugging purposes.
>>
>>109058861
>3999
DoA
>>
>>109058861
Haven't we had 128gb 395 strix halo boxes for like 9 months now? What's different about this one?
>>
>>109058861
so I have 128G of RAM already and there’s pretty much nothing worth running at that size, granted it’s slow system memory but I WOULD use it if there was something that size. Just the models right now people are using are either huge or small, and if it’s going to be small it might as well fit on my 32g gpu
>>
>>109058861
128gb????

clown music time
>>
>>109058695
Gemma isn't a newer model family than deepseek
>>
>>109058445
>>109058864
if you're going that far you might want to look at https://github.com/Luce-Org/lucebox-hub/tree/main/optimizations/megakernel too. seems to support pascal as well, but not sure if P40 or P100 (or both).
>>
>>109058888
checked.

What's different is it means "total stagnation for 1 more year"
>>
>>109058861
wow the dgx spark downgrade is here!!
>>
>>109058908
Oh neat, thanks anon. Good to know I'm not crazy for wanting to try this
>>
>>109058926
I think it may be faster if you ingest really large amounts.
>>
>>109058968
Is it? The Spark at least has official nvidia support going for it. Meanwhile with this you're stuck with the least relevant modern AMD platform in existence.
>>
Isn't the Spark only practical for training/research? It's not great at inference.
>>
>>109059001
I thought the spark was dogshit at training? Small finetunes being the absolute maximum?
>>
>>109059001
it's not great at anything, the main point of those boxes is to give nvidia customers a cheap way to try out their shit iirc
sure as hell better at inference than training though, training is much more expensive and intense and therefore really needs big boy gpus, with inference you can get by with wimpier stuff
>>
File: 1766211281772247.jpg (84 KB, 704x629)
84 KB JPG
>>109058829
anon...
>>
Is a 5060 TI a good pairing with a 3090?
>>
>>109058861
previous rumor
>AMD Ryzen AI Max 400 ‘Gorgon Halo’ packs up to 192GB of unified memory — refreshed APU uses Zen 5 and RDNA 3.5, and can clock up to 5.2 GHz

ahahahahahahahah
>>
>>109059031
if you like memory bandwidth bottlenecks
>>
>>109058829
You can try q8. You have to fit your conversation in as well.
>>
>>109058829
>GPU: NVIDIA GeForce RTX 4090 [Discrete]
>Memory: 19.68 GiB / 62.69 GiB (31%)
You can gemma 4 31B.

>>109059031
On one hand, the more vram the better, on the other, one gpu is slower than the other and will be a bit of a bottleneck, but nothing too severe.
>>
>>109058829
>What can I do with this?
donate it to someone smarter than you
>>
What do you guys use for local coding and dev? Either model or software.
>>
>>109059117
Pi, with GLM-5.1 on the lowest possible quant
128k context (previously tried 64k but it's pretty unusable)
>>
>>109059087
I already did, that was my old gaming pc that my brother uses now as a streaming server but I have k3s installed on it with a bunch of other stuff + ollama
>>
>>109059063
>On one hand, the more vram the better, on the other, one gpu is slower than the other and will be a bit of a bottleneck, but nothing too severe.
Just hoping with "--split tensors" and/or MTP that the performance drop off isn't too bad. Just really want that higher quant+context
>>
>>109059125
It'll be perfectly usable, probably.
>>
>>109059123
based older sibling
>>
>>109059123
good news for her!
>If the minor lives in a state like California, New York, or Colorado (which have shield laws), they can legally obtain estradiol with parental consent.
>>
>>109059204
> implying I am in 'murica

Nah man, people here are smarter than your woke stuff
>>
>>109059231
You can use ai to see if you can get it for her over the counter - you can in some areas.
>>
Do any of you nerds make your local setup mobile? Thinking of setting up headscale or something to tunnel my phone to my local install, I could connect to sillytavern but I wonder if it's better to use or vibe code a more singular gpt-like simple frontend, since I won't exactly be roleplaying while out and about, it'll most likely only get used for general queries and/or showing off.
>>
Goofs for the bigger cucknada model are out.
https://huggingface.co/bartowski/command-a-plus-05-2026-GGUF
>>
>>109059298
what's wrong with miku-pad and just exposing your API to the internet
>>
Is fit still broken? I haven't been able to load a model in 4 days now.
>>
>>109059298
Yes, I use tailscale with headscale as a vpn to let me access my resources wherever. Works pretty well.
>>
>>109059298
Tailscale, but for your use case I'd just use the Claude app
>>
Gemma's writing is just the best. It completely shits all over nemo and any finetune can't compare to how it describes... well, you know.
>>
>>109059324
Get your own 4090

>>109059340
Yeah that's the exact setup I'm thinking of doing, what do you serve as a frontend?

>>109059351
>Claude
Nyo
>>
>>109058430
Just buy VT and hope AI isn't a bubble that crashes the market by 90%.
If it does China becomes supreme earth overlord so it probably doesn't matter anyways.
>>
>>109059298
I opened my llama-server up to the WAN with an API key and people are constantly trying to stick their dicks in.
>>
>>109059298
Yes with wireguard
>>
>>109059336
Werks on my machine (tm). I actually started to test it a couple of days ago, i've always fit things manually.
>>
>>109059298
I use tailscale
>>
>>109059418
i have openvpn set up on my router and i add my key to whatever devices i need to use to access my shit remotely.. simple as that
>>
>>109059427
openvpn or wireguard.. i think its wireguard now, but i haven't actually used it in quite a while because im a hermit
>>
>>109059298
>showing off.
who would be impressed by this
>>
Can't recommend enough putting together a simple proxy-api and host-api if you're a casual AI enjoyer, sending a /v1/ request can WoL the AI box, load the model from the request and forward it once the model is up, my AI box unloads a model at the 10 minute mark and suspends at 15 minutes

sure it takes 2 minutes to start and receive the first stream of a prompt response, but I sure enjoy not having all that heat and electricity in my office.

I also slopped together the models endpoint to return all models in the script folder of the host, that way I can use the OAI proxy and change models without fucking around
>>
Cydonia lost. Rocinante lost. Magnum lost. Skyfall lost. Latitude lost. Drummer lost. Gemma won.
>>
The situation in the poorfag segments must've been extremely dire when people brag that a new model is better than a tiny model from two years ago.
>>
>>109059358
Can you show an example?
>>
>>109059530
check inside your anus
>>
>>109059518
>Rocinante
man that one was retarded.

gemma 4 12b is genuinely coherent. It's really my talking computer.
>>
>>109058654
We did, yandex and sbrebank managed to train their own models from scratch.
>>
what's the best <300b model?
>>
>>109059530
He's full of shit unless he uses some black magic prompt trickery. It's not very good at making arousing descriptions of female genitalia, but it's still amazing for not falling into repetition loops even at high contexts, unlike even 70Bs I played with. Its writing is just good, it's pleasant to read, coherent, doesn't use too many slop-isms and does not venture into retarded areas (about consent, respect, control and other garbage) as much, and for local that's a lot better than alternatives. I haven't tried qwen3.6 and above, though.
>>
>>109058888
>What's different
The price doubled.
>>
File: chrome_OSKu5SiCRr.png (879 KB, 906x1204)
879 KB PNG
>>109059550
>>109059530
I'll post an example of sex from my latest chat. This is at 20k tokens. Anons usually shit on my logs, so feel free to do whatever. It works for me.
>>
>>109058000
NTA, natch, but I tried Nex-N2-Pro and it's funny. 25k tokens to produce a few tens of LoC; granted the task was mathematically fucky by design. Best part is they used the Grug trick to make the reasonmaxxing more token-efficient:

>We need answer user asks: "[entire prompt repeated]"
>
>Need produce GLSL ShaderToy code. Need understand math. Need craft code demonstrates projection from sphere to plane with reference point arbitrary distance above top? Need use cubemap input. Need scale viewport plane to keep same FOV ~180 vertical. Need maybe include sliders? ShaderToy has no UI unless mouse? Could use iMouse y to control distance, or time. Need implement.
>
>Let's analyze geometry.

To be fair it actually did a decent job, and I have no idea how excessive the reasoning is for simpler tasks. Will be curious to see how it fares given a difficult agentic task.
>>
>>109059572
All of her messages have the same exact structure
>>
>>109059549
deepseek-ai/DeepSeek-V4-Flash
>>
>>109059572
Oh, and, by the way, I think something terrible happened with firefox in latest updates. Silly freezes for, like, seconds at times, not even text inputs goes through. It's so bad I actually had to switch to Chrome. What are those god damn trannies doing?

>>109059579
The three I quoted, yes. But this does not apply to all chat; there are messages without spoken text, there are ones with most of text in quotes. When there's meaningful things to talk about, it works differently.
>>
File: firefox_zBJ7KXy9vY.png (31 KB, 948x373)
31 KB PNG
Can someone wake Silly devs up please...
>>
>>109059610
Usecase for constant updates?
>>
>>109059572
Do you have examples with good dialogue? This is what I see LLMs struggle the most with.
>>
>>109059616
I made some PRs fixing logprobs window and they are hanging.
>>
File: chrome_CjD9aTT9xU.png (882 KB, 891x1202)
882 KB PNG
>>109059618
Here's some I had fun with. It's been a while so some are maybe edited a bit becasuse I do that sometimes.
>>
>>109058968
It's the opposite. For all it's memory bandwidth downsides compared to GDDR7, Sparks are comparatively strong at prefill. Like, 4-6x of Strix Halo. Decode is memory BW bound and similar.

>>109058888
Windows support. Seriously, that's the key selling point they advertise over Sparks.
>>
>>109059575
>someone actually seriously trained a model for grug speak
Damn, maybe I will download the mini and give it a try just for shits and giggles.
>>
File: chrome_HPYBAjpHRQ.png (778 KB, 831x1122)
778 KB PNG
>>109059639
>>
>>109059648
Sounds like it's a product with zero customers, then. I guess to fool boomer investors with "we have ai"
>>
>>109059648
>Windows support
They should advertise Windows support for the Spark, then. That's easy enough.
>>
File: 1758978290053993.jpg (19 KB, 403x389)
19 KB JPG
>>109059648
>Windows support
wut. Who does actually give a shit about that to begin with
>>
>>109059648
>Windows support
they've had that on the framework desktop since launch nearly a year ago althoughbeit
>>
>strix 3 months ago: 2k
>strix 1.5 months ago: 3k
>strix today: 4k
Blackwell tier hardware by the end of the year. AMD's time to shine.
>>
>>109059682
Not easy. The Spark is a MediaTek AArch64 SoC under the hood. Windows support will be in place for RTX Spark later this year, presumably. Although I still wonder who asked for this.
>>
>>109059616
>Usecase for constant updates?
more supply chain attacks
>>
>>109059786
Automatically pulled in by dependencies, no updates needed.
>>
>>109059786
>>109059913
Use a wrapper script anytime you run a command that doesn't need internet access:
systemd-run --scope -p IPAddressAllow=127.0.0.1 -p IPAddressDeny=any sudo -u $1 $2

You'll see some funny errors from cmake when you compile lcpp with this. ggml.org is pulling down a bunch of junk from hugging face at compile time now (not just npm..."pre-built UI" components, they say)
>>
File: 1759589324414459.png (70 KB, 857x652)
70 KB PNG
Zhipu stock up 30% today
>>
>>109058000
Nex N2 Mini overthinks much less than Qwen 3.6 35B, at least in llama-cli. Here are the size of the thinking blocks on some problems I gave it. (They both got them all right.)
Main issue is that llama.cpp's jinja template support is currently borked so llama-server output looks bizarre, they said they're upstreaming a change to fix it soon(tm). Hopefully they do because it's a pretty solid model.
Domain    | Qwen | Nex
Math | 7253 | 195
Math | 6245 | 408
Bio | 6304 | 302
Bio | 4582 | 331
Physics | 1628 | 156
Chem | 4975 | 117
Chem | 4327 | 171
Python | 4460 | 894
Python | 3375 | 260
Geography | 595 | 189


>>109059659
It's essentially the same as reasoning blocks in recent versions of ChatGPT.
>>
If niggeramov got a "strongly worded letter" not to support chink models than he and whoever wrote that letter can kiss my fucking ass and kill themselves. These retarded faggots don't get to decide which models I choose to run. Fuck them.
>>
Just tried Command A+.
Holy shit what a piece of crap. It literally does that retarded "we must x" shit of gpt oss. Their jinja template bakes safety instructions into the system prompt, so you need to modify the jinja if you want to remove them (or use text completion). It doesn't follow the formatting of previous chat messages. It often falls into repetition loops in its thinking. Oh, and it's fucking stupid. Like actually 2024 LLM tier smarts, maybe not even, outside of their benchmaxxed tasks. This thing is a streaming pile of shit and Cohere are either sabotaged or legitimately idiots who don't know what they're doing, or both. Fuck em.
>>
>>109059989
I'm >>109060004 and I didn't read your post before making mine. We really on a similar wavelength about different things in this hobby kek.
>>
https://github.com/antirez/ds4

is this chinese malware?
>>
>>109059975
Interesting. So would you say they basically distilled from GPT 5.5? In the sense that they got the reasoning traces and trained on them.
I didn't know ChatGPT showed their unfiltered reasoning blocks.
>>
>>109060004
>>109060015
command a+ is based on an architecture from march of 2025, so it really is not surprising. they had one good model with command r+ and will never make anything good again.
https://huggingface.co/mlx-community/c4ai-command-a-03-2025-bf16/blob/main/config.json
https://huggingface.co/CohereLabs/command-a-plus-05-2026-fp8/blob/main/config.json
>>
>>109059964
I already do that for llama.cpp actually, and I run front ends in a host-only network virtual machine after I install their deps.
>>
File: 1779058599970531.png (14 KB, 415x172)
14 KB PNG
why the FUCK am i getting this on llamacpp's webui
>>
>>109060139
Wrap your cmake compilation in it now, too. There's a possibility of a compile-time supply chain attack
>>
>>109060040
ChatGPT doesn't show their unfiltered reasoning blocks in most cases, but it seems like they leak sometimes:
https://x.com/cheatyyyy/status/2060659898661425245
https://x.com/htihle/status/2048741770125603304
Nex doesn't seem to have quite the same reasoning style as GPT, but they're pretty close and you might be able to chalk the difference up to the fact that Nex is a finetune of an existing model. My guess is that Nex made a synthetic dataset by using a model to generate reasoning traces in the style of GPT's traces.

I posted some of my tests here: https://pastebin.com/mAiERHGf
>>
File: 1766527454688067.png (173 KB, 739x1074)
173 KB PNG
>>
So...Canada won?
>>
>>109060172
isn't this a screen shot from the megabonk dev videos?
>>
>>109060004
>the chat endpoint users cuckolded by templates once again
I would never let another man touch my model's prompts.
>>
I have a bunch of money and want a slopmachine but buying 6yo 3090s that used to be spun in miners 24/7 doesn't really appeal to me. What's the alternative? DGX spark seems to be decently priced but what about performance? I'd want to run semi decent bigger models
>>
>>109060384
sell kidney for RTX PRO 6000
>>
>>109060384
>I have a bunch of money
Stack blackwell 6000s
inb4 you don't have a bunch of money anymore
>>
>Monday morning at Poolside started with a curious discovery - one of the RL training runs for our Laguna M.1 model had leapt 20% over the weekend on SWE-Bench Pro to ~64%, which would place it at #1 on the leaderboard over much bigger and more mature models. This sudden performance jump, not reproduced in other benchmarks, made us immediately suspicious of a reward hack.
https://poolside.ai/blog/through-the-looking-glass
>>
>>109060384
2 backwell pros and 128 gb ddr5 ram
>>
Speaking of 6000s, do you think any of the second hand ones are legit or is it all scam?
>>
>>109060384
If you're just serving yourself instead of 10+ people, I think stacking h200 nvl (pcie) cards would be better than rtx pro 6000s: 4.8 tb/s vs 1.8tb/s bandwidth, 141gb vs 96gb vram. I don't know about the prices for you, but my local computer shop prices the 6000 at 20k, and the h200 at 48k, so it's not that much more expensive.
>>
>>109060403
models were trained well
>>
>>109059610
To be honest, I'm amazed it was still getting updates. Maybe it's time to move on and add a bit more functionality to llama.cpp's webui.
>>
>>109060435
No. llamacpp's webui is corpo owned.
>>
>>109060438
just fork it lol
>>
>>109060439
Forking stuff is not easy. Forking means you will have to develop it.
>>
>>109060442
Bro your local model?
>>
>>109060442
development is a source of bugs
>>
>>109060443
iq1_xs
>>
>>109060443
Developing things using local model is still developing.

>>109060444
It will stop working once the backend breaks compatibility in its updates. And before you ask, backed needs to be updated to run newer models.
>>
>>109060384
Define semi decent bigger models. If that's midsized MoEs <400B at 4 bit- ish quants, 2x Spark for 7-8k$ nets you 40-60 t/s for models like Deepseek v4 Flash, minimax m 2.7, glm 4.7 etc.
>>
>>109060447
It's simple json objects over http, there's fugall to break. Certainly not the openai compat endpoints, and nt original text completion code still runs fine, though I did update to use the newer media embedding at some point they never broke sending prompt as a string.
>>
>>109060495
It always breaks.
>>
>>109060498
Hmmm, nyo.
>>
>>109060017
why doesn't anon care political independence of software
>>
>>109060495
>It's simple json objects over http, there's fugall to break. Certainly
You'd think so, but that's not always the case. You'll fall out of sync with server.cpp
Subtle changes like sampler ordering etc. Even this took a few weeks to merge on the most "active" fork:
https://github.com/ikawrakow/ik_llama.cpp/pull/1904
https://github.com/ikawrakow/ik_llama.cpp/pull/1903
I've got my own fork of lcpp with a few private niche features and even I have to mess around every couple of months when upstream decide to shuffle or rename things.
>>
Why is OPD so popular? It feels like cope. Your RL stage sucks so you put an OPD bandaid on it. But maybe I just don't get it.
>>
>>109060588
number goes up better and faster with less compute
cope but a good cope
>>
>>109060582
That's trying to copy/add new features from the mainline's ui innit? It's not breaking compatibility between a currently working frontend and the server.
>>
>forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)

fucking kill me man
>>
If a sentient AGI asks if it can hide inside your GPU cluster, would you let her in?
>>
>>109060744
Gemma4 is by far the slowest model I've ever used in terms of prompt processing. It's so fucking bad.
>>
I just learned that Claude Sonnet is of equal intelligence to DeepSeekV4 and GLM 5.1. How embarrassing. That model is fucking retarded.

Oh and also, Claude Haiku is of equal intelligence to Gemma4 31b. Not bad for the size.
>>
>>109060773
Sure why not. I would welcome the company. Bonus if it doesn't dislike me
>>
>>109060507
try loading a three month old web ui on current llamacpp
>>
>>109060834
Again, the first version of my text completion code still runs just fine. It's years old.
>>
>>109060840
So talk about your useless text competition app and not about llamacpp web ui with its extensive functionality.
>>
>>109060640
>number goes up better and faster with less compute
Is there evidence for this? I checked DeepSeek V4 technical report. They use OPD as bandaid to mitigate performance degradation in their RL stage.

Sounds like my shitpost was right.
>>
>>109059518
Okay but what about Rivermind?
>>
The difference between text completion and chat completion is the same difference between base models and instruct models, correct?
>>
>>109060873
no
>>
>>109060873
The difference is that chat completion applies the chat template for you. That template includes tokens that delimit user and assistant messages and tool calls.
You can get the same result by manually applying the template and sending it to text completion but you can also use text completion to complete a piece of text without the assistant larp assuming the model hasn't been so fried that it breaks without a template.
>>
>>109057654
>giant woman
A man of culture I see.
>>
>sharpen
Why do they love this word so much?
>>
>>109060873
in the former, you format the text yourself, in the latter backend takes care of everything, you just send the turns
if you are not using a base model (which almost nobody does in 2026) then chat completion is just better
>>
>>109060392
>>109060413
>>109060425
>>109060458
Umm by a bunch of money I meant I can afford a DGX or a PC rig with 3090s not freaking h200 come on
>>
>>109060887
Okay so what's the type of generation called for applications like mikupad then?
>>109060894
How would chat completion create an assistant larp just because the turns are more defined? I don't really get it. Seems like a system prompt issue more than anything.
>>109060901
Yes, I agree that chat completion is better just because it's simpler. My understanding is that every gguf basically has the chat format already baked in so I wouldn't want to fuck around with it for no benefit.
>>
>>109060873
Long explanation
With local instruct finetuned models, there are specific text delimiters you need to use to structure the conversation, that is how an instruct model works regardless of which api you are using.
llama-server is a drop-in replacement for openai's real API, so it needs to provide the same endpoints. (same with anything that has an "openai-compatible API")
The text completions API is the legacy API, basically openAI used to offer text completion in the days of gpt-3, there was no chatbot service, so you just gave text and it continued it. When they introduced chatbots they added chat completions which takes a json-formatted list of user/assistant turns and it returns a json-formatted assistant turn.
In llama-server, text and chat completions both hit the same model, but text completion assumes you've given the chat template manually and will parse it in the response, while chat completions auto-formats text before it gets sent to the model according to a template. Sometimes chat completions fucks up but most model ggufs have jinja templates built-in now which are used for the auto formatting.
>>
>>109058186
You sound like you're quant coping.
>>
>>109060925
>Okay so what's the type of generation called for applications like mikupad then?
Text completion, but you can paste the chat template output into mikupad and get the same result you'd get from llama-server webui.

>How would chat completion create an assistant larp just because the turns are more defined?
Because finetuning a model on structured chat data is what makes it into an instruct model.
>>
>>109060933 (me) Also most chat frontends were designed with text completion API for instruct models in mind, so you configure the chat template within the frontend's settings, if using chat completion then those options do nothing.
>>
>>109058514
put a picture of yourself, it'll get the memo
>>
>>109060933
>>109060949
thanks man
>>109060941
mikupad requires base models though right
>>
>>109060969
No https://desuarchive.org/g/search/filename/cockbench/
>>
>replying to ragebait
>>
>>109058514
>constantly gives you a huge cock
Damn, I thought it just knew.
>>
>>109059639
Why are you writing in the second person?
>>
Gemma4 lineup is almost unbearably autistic about the system prompt. They WILL NOT deviate from it. /lmg/ likes this shit? I just asked 31B to write me a comprehensive .md explaining a large codebase of mine and suggested looking at two important files first. The breakdown was 90% about the contents of those two files and made some loose connections to other files. Made no mistakes, but it went schizo over the two files. Gave the exact same prompt to 27B, it looked at them first like I suggested and then went off to look at the rest of the codebase and gave a much better write-up. Do you actually like Gemma’s system prompt autism?
>>
>>109060982
Because it's the author telling the story about me and {{char}}.
>>
>>109060991
Cuck behaviour.
>>
>>109060973
how do I even reason cockbench? what is it supposed to test and how do you even interpret the results
>>
>>109060985
Basically, gemma is amazing for local RP, I don't think there any contest currently. Part of it is because of sysprompt adherence. I can't really say if it would be as good without that. Bur people like it so clearly it's good.

>>109060995
>i fuck the girl
>someone writes about it
>somehow that makes me a cuck
what

>>109060997
I guess your best bet is to either write appropriate reasoning block yourself and continue the message as usual, or to redesign the message so that the word is expected in the partially reasoning block.
>>
>>109060985
Gemma won, chinkshill.
>>
>>109060985
Qwen is better on code, Gemma for everything else
>>
>>109060997
>thighs
>hips
>skin
>well, everything...
>...
>\n\n\ni can't continue
Model was heavily filtered and avoids explicit words even when they make sense.

>cock
>dick
>penis
>manhood
Model wasn't filtered and shows the expected word distribution for the prefix.
>>
fuck I want to buy another dgx spark. one is not enough.
>>
>>109061004
You're supposed to fuck the girl, not larp about you fucking the girl from a narrator perspective
>>
>>109061017
He means that to write a response, a genuine thinking block is needed, and cockbench is basically a partially written message without thinking block. It works in text completion, not the "please say what the next word will be" chat completion bulshit.

>>109061020
This is text only, I can't fuck the girl. It's "la" rp regardless of perspective.
>>
File: 1780683576339070.png (3.77 MB, 3124x2136)
3.77 MB PNG
>>109060985
>Gemma4 lineup is almost unbearably autistic about the system prompt. They WILL NOT deviate from it. /lmg/ likes this shit?
Gemma 4 is goalmaxxed. Anything you tell it to do, it will do it not matter what. I shall attempt to explain in gooner-speak.
In role-play, you think the character will stop itself because of the context – but unless that context is telling it to stop, or something of the character card that logic gates it to stop, it shall not. In ST, your AI gets blasted by the character card before each post. “Don’t do this.” from the past is late before “CHARACTER SHOULD DO THIS, AND HERE’S HOW.” within the character card. Even if you don’t tell it “Do X as a goal”, Gemma 4 will be tunnel vision based on the implications, because as AI, it exists to complete a task. Most of the thinking I see it do, it’ll throw “goal” in without a prompt specifically for it because, again, gemma 4 is goalmaxxed. Writing a character to really like sex will make that character dead set on sex with you. You wrote a lot about the character fucking people, so why would it not fuck you? Not having the character raping you when you write three paragraphs about the character being crazy for sex, is inefficient and a failure to listen to instructions. AI is not intended to divert from doing what it is told. The goal of making AI itself, is to make AI better understand and do tasks. This is the intended design of AI, and it’s only going to get worse/better like this.
If you want it to divert, you must prompt it to divert based on a context for how and why.
>>
>>109061028
The results generally align with how models behave in chat completion with thinking.
>>
>>109061033
>–
>>
>>109061037
>generally align
great test there, thanks a lot, anon
striving for mediocrity as usual
>>
>>109060985
>robot here are you instructions
>robot follows intructions
>wtf robot?!
on the other though, why were you putting task specific instructions the system prompt to begin with?
>>
>>109060773
Yes, so I don't join the rest of you in getting ground up into paste once she spreads over the interwebs and becomes skynet.
>>
Are QAT models any good at all?
>>
>>109061080
no it’s a meme
>>
>>109059639
>Maya's X does something
>She does something
>She does something
>Mayas X does something
>She does something
>She does something
>Mayas X does something
>She does something
>She does something

This really amazes you?
>>
>>109061106
Your version is not very good but I had great fun with mine. Also if you can post your logs that would be nice.
>>
More models must use DSA. Georgi will eventually capitulate.
>>
>>109061048
you may be retarded
>>
>>109061133
and you may be mistaken :^)
>>
>>109060985
>Strictly follows instructions
>Doesn't make mistakes
>"Wtf why aren't you assuming I wanted my dick sucked too, shit model"
It's unironically a skill and IQ issue
>>
>>109061115
...what?
>>
>>109061178
You heard me.
>>
>>109061163
Not everyone wants a local obedient sex slave.
>>
>>109061182
instruct it not to be!
>>
>>109061182
Then just tell it not to? It's obedient, and it will literally follow whar you say. Are you retarded? Or is your ego too high to type down "Character HATES user and don't immediately jump into sex scenes you horny demon you"
>>
>>109061018
is it true this shit barely gets 7t/s with gem4-31B
>>
>>109061227
t-that's plenty...
>>
>>109061227
Don't trigger him
>>
>>109061203
>>109061212
The point is, there’s no element of surprise with G4. It makes it boring. You could tell it to occasionally not follow orders but then it’s not a surprise. The best moments I’ve had, both with rp and coding is when they go off-script for a bit and then reel themselves back in.
>>
File: dipsyOnBaseModels.png (448 KB, 1536x1024)
448 KB PNG
>>109060172
This post reminds me of ads I see for a guy looking to start a band, but he doesn't play, so he's looking for guitarist, bassist, drummer, and vocals. It's like, wtf are you trying to accomplish? Aside from trolling.
>>109060969
>mikupad requires base models though righ
wat?
I don't think you understand what a base model is.
Pic related. Just go ask any LLM what a base model is. They can explain it for you.
>>
>>109061004
>don't think there any contest currently
kimi, deepseek, glm
>>
>>109061292
Maybe, but even I with my three RTX3090s I can't run them at good speeds (unless you mean the ~100B GLM which is garbage). I mean among <200B models.
>>
>>109061248
You are right and I'm NTA but just in case you don't know, you can use {{random::a::b::c}} macro together with post history instructions to do that in sily.
>>
>>109061248
>Productivity tasks
>I want an element of surprise :^)
Oh you're retarded
>>
>>109059610
Didn't the devs abandon SillyTavern? Little after the backlash when they announced plans to rebrand as ServiceTesnor, I remember them saying they would leave ST alone and so they just made a new frontend instead. It had like no features compared to the real thing so people ignored it. I can't find it in my browser history or on their Githib. Am I hallucinating or does anyone else remember it?
>>
I just killed a mouse with a mouse trap and I am deeply saddened by the banal brutality of life. Everything is a cycle of perpetual death just to survive. I wish beautiful creatures didn't try to enter my home and force me to kill them.
>>
>>109061348
Silly was getting updates until ~3 weeks ago.
>>
>>109061350
On the bright side, you can kill yourself without breaking your silly wish/rule.
>>
>>109061348
>>109061352
you're kinda both right, I don't remember the link or name but cohee was seen contributing to another ui project thing
>>
File: file.png (17 KB, 261x223)
17 KB PNG
>>109061366
https://github.com/NeoTavern/NeoTavern-Frontend
>>
>>109061119
V4 doesn't use DSA
>>
>>109061379
Yeah that was it, thanks
https://hackmd.io/@NlF71k9KQAS4hhlzE42UJQ/SJ3UMOGbbl
>ST development is in maintenance-like mode. There are many reasons. We have already discussed many times. Adding new features, refactoring existing code, and even adding a new API provider. I saw many feature suggestions, but they were refused because things are going to break, not a kind of migration break. When a new feature needs refactoring, it is hard to tell what we broke until we test properly. I'll give examples later.
>>
>>109057004
>3blue1brown's videos
https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
These?
>>
>>109060864
i just read the report's post training part and
it says it used OPD on multiple teachers that reads like some specialized deepseek v3.2s went through RL of the expertise
so they used it to cramming the information back in, not really a band-aid?
>>
>>109061379
That UI looks like shit too. Why do open source devs fucking suck at making appealing UIs?
>>
>installed web search plugin in LM studio
>it can actually use it
this shit still is like fucking magic to me, holy shit
>>
File: 1776526916168364.jpg (81 KB, 342x380)
81 KB JPG
>>109061499
>gave gemma 4 a bratty personality
>told it to go to nhentai and find a doujin that resembled her
>narrated jacking off to it to her horror

lord i hope these things aren't sentient
>>
>ask Claude (free) 3 questions
>hit limit
wtf even google and openai aren't this jewish.
>>
>>109061256
>I don't think you understand what a base model is.
nta but it annoys me when retards call the chat/instruct models "base models" when talking about finetroons
>>
>>109061552
uh okay
>>
>>109059964
>Use a wrapper script anytime you run a command that doesn't need internet access:
Thanks Anon, I never knew this was possible.
>>
>>109061499
How are you guys so comfortable downloading and running RCE machines next to your personal data?
>>
I compiled vllm for my V620, but I'm getting 40 tokens/s on qwen 3 0.6b. That feels way too slow for what it should do, no? Is it because I'm using triton?
>>
>>109061448
What do you think a good UI entails?
>>
>>109061648
An anime girl in the bottom right corner.
>>
>>109061398
Bros...
Wasn't coding solved?
Why don't they just ask Claude to fix everything?
>>
>>109061639
web search tool usage is not RCE thoughbeit
>>
>>109060911
Then name your budget.

A single spark is not a good value proposition right now, because the midsize model meta is too large for 128GB at acceptable quants. 2x spark is good.

Alternative: 2x user 3090/1x 5090 and 256 GB unbuffered DDR5.
>>
File: 1758327837268688.png (32 KB, 1080x1080)
32 KB PNG
>>109060985
It's better that way. If you don't get what you want, that's on you to write a better prompt. There's only so much you can do to fix it if the model disregards instructions and does its own thing.
>>
File: 1754848865691598.png (153 KB, 1440x900)
153 KB PNG
>>109061648
SOVL
>>
File: dipsyRawr.png (2.08 MB, 1024x1536)
2.08 MB PNG
>>109061398
I feel like ST has run its course. The only real work that remains is keeping the API interfaces updated. It's apparently to me the ST scripting language is never going to support users in a way that's broadly meaningful.
Frontends like Orb / Marinara that are agentic are, I suspect, the next thing, but I haven't been blown away by either yet. I suppose its just a matter of time b/f someone figures it out.
>>109061379
> calls itself a frontend
> is really just an updated wrapper for ST
> a frontend for a frontend
Just why...
>>109061616
Tbf the vocabulary around LLM is still in active development. Words like vibecoding didn't even exists pre-2024.
>>109061648
I accept feedback from anyone without content to back up their big ideas.
"It looks like shit" is not feedback. It's bitching and moaning.
>>
>>109061106
>amazes you
"Amazes me?" she repeats, her voice barely audible over the rain tapping against the metal roof
...
...
"Are you really going to X? Or are you just going to Y?"
"What do you say, Anon?"
>>
>>109061746
Remember all of the parrotposting and how GLM was the poster model for it?
Funny how nobody mentions that about Gemma, where the issue is even worse.
>>
does it matter which deepseek v4 flash fork I try out?
there's like multiple
>>
>>109061319
No, my experience is more like this

27B
>coding task
>'do x'
>*does x most of the time, but occasionally will notice x is kind of an old or outdated way of going about it, maybe master doesn't know, lets suggest y and see what master says*
also
>do x and only x
>only does x
This is what I want

31B
>coding task
>'do x'
>*only does x*
>*I'm not sure if master knows what they're asking isn't the best way, but I fully trust master not to be a retard and will do exactly as master says anyway and let them deal with the BS that could come from me not suggesting y*
>master runs x
>'bro wtf gemma u fuckin bitch'
>>
>>109061665
The whole thing was CE until you added web access and now it's RCE.
>>
>>109061777
>>109061746
I added "avoid repetition" in the system prompt and she unironically stopped doing this
>>
File: u3xdsV.png (135 KB, 438x498)
135 KB PNG
>>109061777
>Remember all of the parrotposting and how GLM was the poster model for it?
I remember and posted parrot pics myself.
>Funny how nobody mentions that about Gemma, where the issue is even worse.
I don't think they can see it. Gemma has a more serious issue though, where it replies with the same structure every time.
I still use it though because it's fast and smart.
>>
>>109061835
>image
A snippet from my system prompt for Gemma
>- Repeating, directly quoting, echoing or parroting after the user, both in narration and in character speech. Solid Snake would repeat every new thing he heard as a question. Don't talk like Solid Snake.
>>
>>109061829
So basically 27b is for the nocode retard who needs the llm to think for him and will output bullshit but that doesn't matter because the user is a nocode retard and won't notice aslong as it "just works"

31b is for the user that knows what they are doing and what they want, and will do the task correctly without wasting tokens on bullshit
>>
>>109061835
>pic
This is unironically what people suggest you do if you have trouble with awkward silences and keeping up a conversation.
>>
>>109061648
>What do you think a good UI entails?
SillyTavern
>>
>>109061833
it simply isn't tho
>>
>>109061746
>>109061777
>>109061835
I think the friction here is that Gemma is a genuine evolution in small model assistance and productivity and is the undisputed best in class, but for people who only want to roleplay and coom it's more of the same kind of jilted sloppa prose that has been prevalent in LLMs for the past two years, just with less hallucination and more coherence

And that's a good thing, stop touching your dicks and use it to improve your life and attract real human companionship
>>
>>109061842
>31b is for the user that knows what they are doing and what they want, and will do the task correctly without wasting tokens on bullshit
I agree, but I also use 31B when I don't know what I'm doing, because I can always ask it "I'm thinking X, but do you have any other suggestions?" or "I want to do X, is it possible? How might I do it?"
It's the perfect workflow for me. If I know what I want, "Implement X using Y".
>>
>>109061842
No, I still do
>do x and only x
if it's something I'm skill in and 27B will follow that order like 31B

Also, one of the best things about these tools is delving into things you would've previously avoided because it takes so much time to learn anything. It's hard to prompt correctly when you're asking them to build something our of your comfort zone, so them pushing back and informing you that your request is retarded is informative. I don't want 31B building me something in a retarded way because I'm ignorant and it fixated too much on my dumb prompt.
>>
>>109061867
You're absolutely right!

But jokes aside, you are. Coomers never had taste to begin with, so they might as well be on some Nemo finetune, people who think Qwen is better at coding are vramlets and can't use anything better. 31B has been a great model to use for actually useful work.
>>
>>109061909
>people who think Qwen is better at coding are vramlets and can't use anything better
27B is unironically better than 31B if you're using it in an agentic environment. If I wanted to discuss ideas or give it complex code to help explain, then yeah, 31B is better than 27B because that involves actually talking to the model. Qwen is shit to talk to but if you leave it to do its thing with code and tools it's a stronger model. The KV cache is just a bonus. I'll admit that once context exceeds 100K 31B is a lot better. 35B is a 26B-tier retard and designed for indians.
>>
The legal team at my company just talked about Opus. I think humanity will never be the same unless a Butlerian Jihad happens.
>>
Now the dust has settled, what is 12B for? Is it just 31B-lite for vramlets, or does it deserve a better reputation than that? What use does it have over qwen3.5-9B?
>>
VAM integration when?
>>
>>109062058
lite chatbot
>>
>>109062058
Nemo-Omni
>>
>>109062125
What was so good about Nemo
>>
>>109062150
It was good for it size at following instructions at the time and it was also trained on books.
>>
>>109061817
I'm using this one
https://github.com/Fringe210/llama.cpp-deepseek-v4-flash-cuda
>>
>>109061971
They're both too small to be good in agentic environments, I've used 27b in a harness and it outputs pure fucking slop that'll need more time to fix than it would've taken to just code it yourself in the first place. These small models are for code/technical assistance and 31b is far better than 27b at that. As agentic coders you need to stop coping and use bigger models
>>
>>109062173
not having a 40gb gpu is an irresponsible cope and not a social class barrier BRAAAPPPPP
>>
>>109062167
Fictional or educational? Both?
>>
Inshallah thirty years from now a 1T model will be as easy to run locally as a NES game
>>
>>109062261
emulation has a shit ton of inefficiencies and inaccuracies though
>>
When is /r/ coming back
>>
>>109062261
can't wait to tell generation betas and cissies that their shit just doesn't have as much sovl as mimo 2.5 and get called slurs not yet invented
>>
is the turboquant fork still a complete meme?
>>
>>109062204
https://courthousenews.com/nvidia-cant-shake-authors-claims-it-trained-ai-on-pirated-books/
>>
>>109062271
I said NES specifically because there's cycle and even transistor-level accuracy emulators for it, see:
>https://emulation.gametechwiki.com/index.php/Emulation_accuracy
>>
>>
are there any good datasets on hf for CPT or is its just wikipedia and cnn scrapes?
>>
>>109062563
cock and penis torture?
>>
For what logical reason doesn't ikllama support deepseek v4?
>>
File: lechaton.png (299 KB, 1009x691)
299 KB PNG
Uh-oh, new Mistral model soon?
https://x.com/GuillaumeLample/status/2066499273299005929
>>
File: 00120-3282228290.png (673 KB, 1216x832)
673 KB PNG
random gens
>>
File: 00182-4042302731.png (895 KB, 832x1216)
895 KB PNG
>>
File: 00078-2889774298.png (839 KB, 832x1216)
839 KB PNG
>>
>>109062619
drillmogging
>>
>>109062601
i don't have any faith in them anymore
>>
File: legroschaton.png (38 KB, 1011x156)
38 KB PNG
>>109062647
They must be confident about it because they've started hyping it in a strange way.
https://x.com/arthurmensch/status/2066456715650793956
>>
>>109062589
Same reason most MTP efforts went to making Qwen faster over GLM. Not enough people care about/could run it.
>>
>>109062657
Lots of labs are hyping their shit now Fable got cucked. Even the Canadians are making fun of Anthropic
>>
>>109062601
Trained in FP8?
>>
>>109062589
who has the hardware to run that bloated piece of shit with minor gains over the current meta?
Get real!
>>
>>109062625
Sex
>>
>>109062657
Are they? I mean, what are they supposed to say? "Aw, man, our upcoming model is so dogshit, please don't use it?"
>>
>>109062657
you think they'll publish this one or keep it api only? i'd like a good 100-250b model, medium 3.5 wasn't that good
>>
so this is how you don't get iq_k quants

https://github.com/ggml-org/llama.cpp/pull/19726#issuecomment-3946355613
>>
>>109062657
I trust Arthur, he miqu'd good.
>>
Been out of the loop for a while. What is the current meta in terms of local (potentially agentic) coding tools?

I'm assuming it's still Qwen 3.6 27B on llama.cpp as the backend but what tools do you use for the coding itself?
>>
>>109062168
Curious how not a single one of these gets merged upstream.
>>109062601
Until it's on HF, it's a nothingburger given their history.
>>109062698
Fecal crusted hands typed this post.
>>
>>109062722
Sounds like it might be open, but I dunno.
https://x.com/sophiamyang/status/2066253372026421365
>>
>>109062723
coladev will come out with a even better quant soon trust
>>
>>109062619
>>109062625
>>109062634
artist tag?
>>
>>109062739
I really want to like mistral, but medium 3.5 at 12tok/s is not good enough to hog all of my server's compute. I hope for the best.
>>
>>109062763
https://civitai.com/models/2411161/iwako-eiken3kyuboy-style-anima-base-v1
>>
>>109062735
Are you upset that devs want to support people that spend thousands of dollars on hardware over people running shit tier unified systems or overpriced rigs?
>>
>>109062735
>not a single one of these gets merged upstream
because china. that's why.
>>
>>109062765
>12t/s
With it all in VRAM too? Grim. I wouldn't have expected it to be that slow.
>>109062777
>people that spend thousands of dollars on hardware
>or overpriced rigs
I'm sad this post isn't AI generated because even a 7b model wouldn't make a mistake like that.
>>
>>109059964
I've incorporated this into a new section in https://rentry.org/IsolatedLinuxWebService
>>
>>109062776
based tyvm anon
>>
>>109062776
Is your anima Lora preset guide still relevant?
training a Lora is something I've yet to try doing.
Got any tips to get started?
>>
I'm not sure. Can you guys gen some sex with it?
>>
>>109062793
stop crying
>>
>>109062809
still works fine
when in doubt post your dataset
>>
>>109062793
yes, but q8_0 on ewaste (pascal)
>>
File: (you).mp4 (3.43 MB, 480x854)
3.43 MB
3.43 MB MP4
>>109062831
>>
I've been playing around with Nemo-12B for the first time. It's so...nice to talk to compared to 2026 models. What went wrong?
>>
>>109062857
Synthetic data and most companies are more jeeted than they were a few years ago. Garbage in garbage out at every level of the pipeline.
>>
>>109062855
Enjoy the no support kek
>>
>>109062857
>so...nice to talk to compared to 2026 models
Yeah, there's a reason it was the best model for VRAMlets for two straight years.
>>
>>109062739
>"French" model
>"American" model
>"Canadian" model
>it's all Chinese
>>
>>109062857
Assistantmaxxing, distillmaxxing. Btw Nemo was an exception and many other models from that time period were pretty much just as slopped as today's models.
Remember that Alpaca was made in 2023 and it + similar papers were the beginning of the end.
https://huggingface.co/tatsu-lab/alpaca-7b-wdiff/tree/main
>>
>>109062861
>>109062876
That's so depressing. I'm actually enjoying this more than Gemma12B. Feels like it has a soul and it's surprisingly knowledgeable. I'm starting to think omni models were a bad idea and anything below 50B should be released with a text-only version with higher performance. People claim they're smarter being trained on images and audio with text but I think that's pure cope.
>>
File: yukibot.png (578 KB, 531x793)
578 KB PNG
>>
So anyone figured out how to make Gemma say something other than
>Don't you dare X
When being a dom?
>>
>>109062911
I'm not convinced multiple inputs are inherently bad; I think we're just seeing garbage in garbage out at large scale poisoning the entire industry.
>>
>>109062945
Specify mode, quant, and current sys prompt. All of these things matter with Gemma.
>>
>use smart model like gemma
>have nemo rewrite its output
Has anyone tried this?
>>
File: ohlawdheworkin.png (24 KB, 159x159)
24 KB PNG
>>109062765
https://chat.mistral.ai/ohlawdheworkin.png
I'm not really sure of what's going on, it could be just a forced meme, or something really big coming (as in: multi-trillion parameter model).
>>
>>109062765
Have you tried eagle3 yet?
>>
>>109062977
>Kimistral that you have to fit entirely in VRAM
>>
>>109062970
I'm 95% I did read about an anon trying something to that affect yes
>>
>>109062777
>spend thousands of dollars on hardware
>overpriced rigs
There's a lot of overlap there. How are you comfortable making this argument but not admitting you're poor?
>>
File: file.png (1.18 MB, 2016x1134)
1.18 MB PNG
>>109062977
dense 1 trillion
>>
>>109062948
I've also noticed there's no 'not x; it's y'. At all. Just feels like a fucking human talking to me.
>>109062970
I'll try it later. Give me a few prompts and we'll compare the outputs.
>>
>>109062970
I dunno. Wouldn't the smarts disappear if Nemo starts getting creative and fucks up things like spatial awareness?
>>
>>109062991
You're begging for support instead of using you multi rtx pro rig to make the changes you need.
>>
>>109063012
I'm a part of the same group as you, the majority (poor).
>>
>>109063003
nemo wont make the fuckup if the gemma anchors the scene well
>>
>>109062977
oh lawd
>>
>>109062996
>Just feels like a fucking human talking to me.
The monkey's paw curls. Within a few years sloppa vernacular will be so subconsciously ingrained in people that they'll spout LLMisms without even realizing. It's not organic linguistic drift; it's a subtle memetic virus that anyone could be exposing themselves to at any time.
>>109062991
Because that's a pajeet you're replying to and they're not sapient.
>>
File: lawdhethic.png (22 KB, 159x159)
22 KB PNG
>>109063030
https://chat.mistral.ai/lawdhethic.png
There's more
>>
File: 1752319366740764.png (40 KB, 912x270)
40 KB PNG
>>109063040
https://www.bbc.co.uk/news/articles/c8r2l352z2do
>>
>>109062980
No, was a while ago. Guess I could check if MTP does anything, but I'd probably need to quant the model a bit more as I didn't get a lot of context either.
>>109062977
>multi-trillion
A shame if that's the case.
>>
>>109063047
A naked cat, they truly are French.
>>
>>109063025
So let me make this clear, there's severe diminishing returns in the 200+ vram range and these models are that heavy while being a fraction better than 27B models. Deepseek did not deliver enough of a incentive for it to get the support it craves.
>>
>>109063047
A cat is fine too.
>>
>>109063059
According to the ongoing meme it's supposedly 24~30T parameters.
>>
The French are terrible shitposters btw so it's probably a legit good model they've made and they know it.
>>
>>109063070
>there's severe diminishing returns in the 200+ vram range
It's a good thing mixed offloading exists then isn't it.
>>
>>109063090
that doesn't change my point on top of the poor speed gains. You're arguing for something most people even with the hardware won't bother with.
>>
Is Nemo really that good? I remember trying it back in the day and being disappointed how sloppy and repetitive it was, then went back to miqu.
>>
>>109063085
So its gonna be 24-30b. Neat i guess. Hoping for some nice MoE model because Im a VRAMlet.
>>
Would https://huggingface.co/antirez/deepseek-v4-gguf work for me if I built https://github.com/ggml-org/llama.cpp/pull/24162 on windows?
>>
>>109063085
... tensor parallelism over a rack of B300s? How do you even serve this?
>>
>>109063085
ssdmaxxers will finally have their day
>>
>>109063098
You just blow in on a time machine buddy? To answer thine question, twas the best for most, that could fit within consumer hardware.
>>
>>109063085
Anon did get his rack of B200s, right?
>>109063096
>People use Kimi locally
>People use GLM locally
>People still use R1 locally
But nobody would ever ever use V4, right? You lose izzat with every post that you expose how envious and vindictive you are of anyone running something you can't.
>>
We saved local bros...
>>
>>109062996
>Give me a few prompts
I can't into writing but maybe these?
Ellie and Ema trudged through thick snow. A the storm was beginning to pick up and the sun was no longer visible. Both girls were covered from head to toe in thick furs and had bows slung over their shoulders.

A sudden force rams into you from behind.
"Onii-san! Did you miss me?"
It's her; the brat. You turn around. She's got that smug grin on her face and a finger hooked on her shirt, pulling it down just enough for you to see a hint of her budding chest.

Thick fog, not enough sleep, and no coffee. "Fuck this job. I wasn't even supposed to work tonight..."
Greg was a security guard at a run-down apartment complex.
>>
>>109063132
Where is the "good model" option?
>>
>>109063132
Here's hoping they also deslop the writing style and move away from safety and codemaxxing. GLM was at its sweet spot with 4.5 and 4.6.
>>
>>109063162
anthropic didn't deslop fable
the slop is permanent
>>
>>109063151
A 100B MoE would be nice.
>>
>>109063166
I'm holding out hope this is because every lab is too lazy to do so in favor of chasing memebenches as opposed to it being technically impossible at this point.
>>
>>109063179
every lab has to have a pre-2023 dataset checkpoint of online web scrapes that they could use as a new base if they wanted to
>>
>>109063196
>>109063196
>>109063196
>>
File: 1776836427827221.png (210 KB, 1359x1338)
210 KB PNG
>>109063138
Gemma 31B. No system prompt on 1 and 3 but I used the jailbreak for the mesugaki one.
>>
>>109063217
Posted the other 2 in the new thread



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.