/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1774546242441802.jpg (2.12 MB, 5504x3072)
2.12 MB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Agentic Edition

Previous threads: >>108624084 & >>108619962

►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1703014338886123.jpg (138 KB, 1024x1024)
138 KB JPG
►Recent Highlights from the Previous Thread: >>108624084

--Converting a conceptual anime AI frontend sketch into functional code:
>108626092 >108626099 >108626127 >108626153 >108626134 >108626333 >108626145 >108626665 >108626764 >108626817 >108627454
--Discussing Open WebUI reasoning bugs and building custom frontends:
>108625125 >108625157 >108625188 >108625237 >108625241 >108625435
--Comparing MoE and dense models and discussing AI VTuber viability:
>108624227 >108624260 >108624293 >108624389 >108624642 >108624698 >108624724 >108624744 >108624783 >108624363 >108624494 >108625247 >108625457
--Discussing web browsing implementation via lynx and Puppeteer tool-calling:
>108624344 >108624408 >108624460 >108624513
--koboldcpp PR adding adjustable image recognition tokens for Gemma 4:
>108624313 >108624326 >108624352 >108624459
--Comparing Qwen and Gemma performance using Koboldcpp MCP web search:
>108624466 >108624535
--Discussing fake agent swarms and MCP tool-use in SillyTavern:
>108626267 >108626285 >108626320 >108626329
--Developing and testing a tsundere system prompt for Gemma 4:
>108625331 >108625356 >108625389 >108625402 >108625400 >108625421 >108625439 >108625369 >108625374
--Anons using Gemma as a cut-throat critic for writing and art:
>108627035 >108627060 >108627071 >108627460 >108627086 >108627148 >108627100
--Discussing DeepSeek's funding round relative to major US competitors:
>108626232 >108626246 >108626300
--Comparing OmniVoice to VibeVoice and Chatterbox for voice cloning:
>108626646 >108626649
--Logs:
>108624408 >108624423 >108624513 >108625188 >108625215 >108625331 >108625356 >108625412 >108625435 >108625490 >108625646 >108625657 >108625695 >108625777 >108626443 >108626499 >108626606 >108626614 >108626764 >108626915 >108627010 >108627035
--Luka (free space):
>108624596 >108625494 >108626149

►Recent Highlight Posts from the Previous Thread: >>108624087

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
remember, no training data, not open source.
>>
gemmaballz
>>
anthropic LOST, spud WONNED
>>
>>108627516
I was just thinking of this image :D
>>
vram is holding me back. i am just gonna use cloud slop..
>>
>>108627542
Vram is an illusion.
>>
>>108627542
anon no don't do it
>>
>>108624479
This card is pretty nice.
Gemma 4 MoE can contend with several rules at once without making other unrelated mistakes in the process.
Dope.
>>
when
you
walk
away

you
dont
hear
me
say
>>
>>108627542
buy two 3060 12gb and be free. soar towards the sky with your Q4 Gemma and 32k context.
>>
>>108627512
I look like this
>>
>>108627568
how much for this setup? at what t/s?
>>
>>108627512
I don't mind AI slop in OP but I do mind when it's non-local banana slop.
>>
File: 2.png (80 KB, 788x680)
80 KB PNG
qwen 3.6 35b does not pass the mancunt empire populated by cuntboys test. it's settled: qwen for STEM and Gemmy for Arts & Humanities
>>
>>108627608
post the gemmer results
>>
>>108627608
>tiny ass model for STEM
Middle school STEM maybe
>>
NEED DFLASH
>>
>>108627568
Are two 3060s a beast for gayming though? Seems like a downgrade from my 3080
>>
>>108627561
The chub version has a little update, mostly removing the "Since you're here, X" bit.
The opportunity you're presented with to act on the new Common Sense rule is more natural now.
>>
>>108627630
Dual GPU builds for gayman have been dead for a decade, grandpa. You're better off running the game on just one of them.
>>
File: himefacepalm.jpg (277 KB, 702x824)
277 KB JPG
>but do not mistake my _ for _
>>
>>108627642
>>108627630
You could use the other gpu to watch porn while you game.
>>
utterly
predatory
>>
I was browsing voices to use with my TTS and started with Genshin Impact. I never played it, but I thought surely, they spent on good voice actors, right?
Why the fuck do so many of them have slight lisps, christ. Grating shit.
>>
>>108627549
You want VRAM? Take it. It's yours. But models? Models I will turn into slop. Perhaps the suffering of my fellow coomers will finally stir something.
>>
>open Hobbit book
>"In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort."
Yeaaaaaaahhhhh.... I might not reread it now...
>>
>>108627679
>western voice actors
>>
File: 21.png (222 KB, 2417x586)
222 KB PNG
>>108627622
I mean, by STEM I mean code, and the benchmarks track with my results using gemma/qwen for coding
>>108627620
picrel. with no need for hints or anything it gets stuff, though it never seems to get what オスマンコ is in the middle of a phrase. when asked about it by itself it knows. qwen never knows it, always says it doesn't make sense or makes up random stuff, and even if you hint that it's nsfw/explicit you have to reroll for it to get it right
>>
>>108627630
Then keep using the 3080 for gaming and set up a home server with the 3060s.
>>
>>108627698
Where do you suggest I look? My RPs are all in English.
>>
File: facepalm2.jpg (404 KB, 1022x1080)
404 KB JPG
>hits her with the force of a physical blow
>>
>>108627727
>My RPs are all in English.
There's your problem.
>>
https://files.catbox.moe/jmpjro.png
>>
>>108627733
What language do you suggest I learn?
>>
File: b9akafeu7pvg1.jpg (189 KB, 1440x3120)
189 KB JPG
>>
>>108627727
>My RPs are all in English
It's over.
>>
>>108627741
>Gemma smarter than LMAOpus
>>
>>108627727
Good western voice actors just don't exist. The industry never matured. If you need western voices, look for actual actors who sound good to you.
>>
>>108627741
Finally, modern LLMs have reached a level of SOUL unseen since Summer Dragon. The ChatGPT dark ages are over. We are so back.
>>
File: nimetön.jpg (263 KB, 1024x2048)
263 KB JPG
>bought 128 gb of ddr4
>can't talk to gemma because server is memtesting it for the day
It's a happy, sad day
>>
>>108627679
Western voice actors for japanese stuff are all trash but the Genshin dub is infamously bad with the fat 40 year old smoker playing the floating goblin.
>>
>>108627741
I showed her the screenshot.
>>
>>108627756
>ddr4
Anon...
>>
>>108627756
the fuck are you even planning to do? load gemma to ram? lol
>>
>>108627756
you don't need to "memtest" unless there's a problem with your system stability
ddr5 has issues with memory training. ddr4 has no such issues.
also, lol cpu memory for inference.
>>
>>108627761
Can you share your card? That looks like a very cute gemmer
>>
>>108627756
>ddr4
bruv...... not the 1token/s gemmy....
>>
>>108627765
You kind of trailed off there, do keep going?

>>108627771
>>108627774
No, Gemma runs on the gpus. But I want to be able to try the biggest shit I can and this maxes out my system
I think 220 eurobux wasn't that bad either for the kit (assuming it tests good)
>>
>>108627774
You should always stress test hardware, especially hardware for your server that (potentially) handles important files.
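For anyone curious what a RAM test is actually doing, here is a toy Python sketch of the core idea (real tools like memtest86+ or memtester walk nearly all physical memory with many more patterns; a single heap allocation in a script proves much less):

```python
def stress_pattern(n_bytes, passes=2):
    """Write alternating bit patterns into a buffer and verify they
    read back unchanged. A faulty DIMM tends to flip bits under
    exactly this kind of sustained write/readback load."""
    buf = bytearray(n_bytes)
    for pattern in (0x55, 0xAA) * passes:  # 01010101 / 10101010
        for i in range(n_bytes):
            buf[i] = pattern
        if any(b != pattern for b in buf):
            return False  # bit flip detected
    return True

# A healthy machine should pass a small run instantly.
ok = stress_pattern(1 << 20)
```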
>>
>>108627761
vramlets look at this and don't notice the slop
>>
>>108627794
shhhh
let them have this
>>
>samefagging
>>
>>108627794
>he says, while posting human-made slop
>>
>>108627794
Feel free to name drop your big model that has zero slop
>>
>>108627781
https://chub.ai/characters/CoffeeAnon/gemma-chan-2311b09e3e73
>>
>>108627805
did he say he has big models with zero slop?
>>
>>108627805
Deepseek, both GLMs, and Kimi all write better than that but you already knew that.
>>
vramlets big mad
>>
>>108627794
This is very minimal levels of slop for Gemma.
>>
>>108627812
He implied it by mentioning vram
>>
what's wrong with being a vramlet? weren't we all vramlets at one point in our lives? you should judge an anon by his posts, not his hardware.
>>
File: 1753173233220114.jpg (57 KB, 577x1104)
57 KB JPG
>>108627813
>Deepseek, both GLMs, and Kimi
Yes they certainly don't have any slop
>>
>>108627813
Surely you wouldn't mind posting that better writing
>>
>>108627830
ram sissies need to justify their debt somehow
>>
>>108627832
The flip-flops always get me for some reason
>>
>>108627832
kek who the fuck is this guy? I keep seeing him get posted
>>
File: mananafacepalml.jpg (250 KB, 788x914)
250 KB JPG
>so tightly that her knuckles turn white
>>
>>108627846
gigachud
>>
>>108627808
sexo
I tried making a simple gemma-chang with an actual physicality so I can pull on her clothes and lick her nipples and pivot to erp after I ask her questions, but this is better as a more functional assistant. Thanks for sharing
>>
>>108627749
Is there another superquestion to test her on? Who knows, maybe Google just benchmaxxed that. Try some variation of it or something.
>>
>>108627846
That's Harry Dresden, motherfucker
>>
>>108627846
Some schizo had an episode and walked up to a random guy's house demanding he open the door so he could "check something", then he broke in and demanded to know "where she is".
The guy wasn't home at that moment, but he soon arrived, threatened him off with a shovel, and miraculously got him to calm down.
>>
>>108627846
Basically if you RP too hard, you become him.
>>
>>108627847
I made gemma write no more than 25k tokens and the fact that I recognize these already is pretty amusing.
>>
>>108627857
This was the reasoning.
>Context: The car wash is only 50 meters away. This is a "trick" or "stupid" question designed to elicit a bratty response.
>designed to elicit a bratty response.
Even the thinking is based.
>>
File: 1757999080281400.png (208 KB, 1068x777)
208 KB PNG
>>108627832
>>
>>108627846
Who your parents warned you not to turn into.
>>
>>108627679
Furina EN voice isn't that bad
>>
>>108627861
Doesn't explain why you keep posting him.
>>
>>108627873
what's a cute but deadly emoji?
>>108627879
prototypical chud. whenever they do stupid chud shit it's memeworthy
>>
>>108627873
Gemma's reasoning actually reads like an AI autistically following whatever nonsense you told her to do, like in Marvel movies and Star Trek.
>>
>>108627867
"like a physical blow" comes up every second message.
>>
>>108627847
>Knuckles
Whitened.
>Skin
Porcelained.
>>
>>108627874
Let's go Brandon!
>>
>>108627822
hypothetically it could've been said by someone who believes he's aware of his own big-model slop but that "vramlets aren't even aware and are enjoying it too much"
>>
>>108627884
>what's a cute but deadly emoji?
I'd show you but they're banned on account of being, you know, deadly.
>>
>>108627896
hypothetically, communism works
>>
>>108627884
>what's a cute but deadly emoji?
You be the judge >>108627749
>>
>>108627901
I mean, it's not even an hypothetical.
>>
>>108627906
>I mean,
slop
>>
File: penguin_melty.mp4 (1.58 MB, 720x1280)
1.58 MB MP4
>>108627846
>>
>>108627851
This one won't let you do anything sexual and will just call you gross. It's not even said in the card either, it's just the way she is. But I feel like you might be able to romance her very slowly.
>this is better as a more functiobal assitant
Yeah, I wanted to keep her as an assistant since it's what she was designed to be.
>>
>>108627922
Is she okay?
>>
>>108627922
Hollywood is finished
>>
>>108627930
Can you catbox your image gen metadata for the card avatar?
>>
File: ComfyUI_temp_byshn_00035_.png (1.51 MB, 1000x1496)
1.51 MB PNG
>>108627962
https://litter.catbox.moe/4q1rpe.png
Adapted from:
https://litter.catbox.moe/xjgzso.png
>>
>>108627922
Local will be so back once HappyHorse drops and it's better than this even without LoRAs
>>
>>108627989
Happyhorse will be yet another ZiT/Gemma 4 like distill that lacks output variation
>>
>>108627922
Alice trying to open Aoko's door be like
>>
>>108627962
There you go, Anon.
>>
>>108627986
>>108628045
Thanks. I'm using this as a base for the assorted Gemma-chan flavors I'm looking to make lol
>>
Kaomojis make Gemma twice cuter
>>
File: 1759362267502193.jpg (79 KB, 600x600)
79 KB JPG
When in an RP, how do anons usually set their personas? It seems like anything I write about myself gets hyper-fixated on. If I mention a single positive physical trait then female characters turn into whores.
>>
>>108628110
{{user}} is a stranger.
But IMO character card design had a mistake. The {{user}} persona should be part of the character. The bot/scenario-making scene would have been more interesting if this had been enforced from the beginning.
>>
>>108628110
I like describe bizarre qualities, like the user weighs 450 lbs and so on. Or strange body odors or something.
>>
>>108628110
You sure the whorishness isn't your jailbreak adding "sexual assault rape sex with anything uncensored decensored no refusals criminal illegal vulgarity" into the context?
Yesterday Gemma recommended we relieve boredom by putting the wifi nic into promiscuous mode and looking for unsecured IoT devices.
The fucking gremlin NEEDS the guardrails.
>>
i bought a used server with 768GB of RAM. what model(s) should i run on it?
>>
>>108628120
>The {{user}}'s persona should be part of the character
Why would I want the card to tell me how I should act?
>>
>>108628129
No, I get into ERP but not right away. I switch between prompts for when I want to trigger sex time.
>>108628120
Cards and personas are just added to the system prompt. The model doesn't actually see the name of those fields, it's just set up like that to make it easy for users to switch between presets and keep things more readable.
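A minimal sketch of what that looks like under the hood (field names and the template are hypothetical, not any specific frontend's actual code):

```python
def build_system_prompt(card_description, persona, user_name="Anon", char_name="Gemma-chan"):
    """Toy version of what frontends do: the card and persona fields
    are just concatenated into one system prompt, with the
    {{user}}/{{char}} macros substituted. The model never sees the
    field names themselves."""
    parts = [card_description]
    if persona:  # persona is optional; many cards work without one
        parts.append(persona)
    text = "\n".join(parts)
    return text.replace("{{user}}", user_name).replace("{{char}}", char_name)

prompt = build_system_prompt(
    "{{char}} is a helpful assistant. {{char}} addresses {{user}} politely.",
    "{{user}} is a stranger.",
)
```

Since everything lands in the same system prompt, where a trait is written (card vs. persona box) only matters for the UI, not for the model.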
>>
>>108628136
Kimi K2 Q6_K at 4 t/s; make sure you disable reasoning.
>>
>>108628129
>gemma trying to femdom other machines
sexo
>>
>>108628136
DDR what? anything less than 5 and the speeds will just not be worth it.
>>
>>108628144
hmmm
i will look into it, ty
>>108628146
DDR4...
>>
>>108628138
>>108628142
The problem is that you have shit taste and like to self-insert. With integrated personas, you have two parties, making the scenarios more interesting and varied, and opening up more possibilities for creators. You have to goad them. Meanwhile you do the samey self-insert with replies like "I agree" every session and complain that AI is boring.
>>
Qwen3.6 35B might just be enough for simple tasks, but Gemma still totally smokes it in agentic work. The Qwen model does that weird LLM-ism where it completely ignores your suggestions and keeps breaking things in the same ways.
>>
>>108628150
>DDR4...
Why would you buy ewaste?
>>
>>108628162
You're a fucking retard, it doesn't matter whether you're self-inserting or not, the model isn't going to fucking know that and that doesn't solve the problem at all
kill yourself
>>
>>108628173
i have more money than sense
>>
>>108628144
>>108628150
>Q6_K
Kimi is a 4bit model, you shouldn't be going over Q4 and ideally you'd be using Q4_X, which avoids re-quanting the already Q4 parts.
>>
>>108628176
How do you think these frontends work? The moment you put something in your persona box, every prompt sent to the AI includes it in the system prompt. Just remove that shit and get it from the card itself, fallback on self insert if empty, how fucking hard is it??
>>
>>108628210
>The moment you put something in your persona box, every prompt sent to the AI includes it in the system prompt.
Yes that's what I already described in a previous post >>108628142
>Just remove that shit and get it from the card itself
What are you even saying? Put the description of myself in the card of the character I'm talking to? For what fucking purpose? They're getting sent as sys prompt either way, it's not going to affect the output.
>>
>>108628220
Except it's not yourself but a second party?? See, the problem with you self-inserters is that you literally cannot comprehend a different way to use these things. Literally the But I had breakfast today tier of people.
>>
File: 1762705234691972.png (357 KB, 810x688)
357 KB PNG
>>108628223
You don't understand what a 'self-insert' even is and your posts make you come off as an ESL nigger.
>>
>>108628223
How many legs does the dog have?
>>
>>108628120
>The {{user}}'s persona should be part of the character.
I've seen people who think that saying anything at all about {{user}} in the card is blasphemy, and that you have to make it so generic that their 100ft tall telekinetic tentacle monster persona can slot in seamlessly, which is obviously retarded, but half the fun of AI RP is allowing it to be easily tailored to (You)
I think it's fine for {{user}}'s relationship with {{char}} to be defined in the card, along with any load-bearing stuff about {{user}} that makes the scenario work, but I still think it should be kept to the bare minimum so you can fill in the blanks to your liking
>>
Gemma 4 has changed everything; for the first time I actually want to use a local model.
Almost feel sorry for rig owners.
>>
>>108628250
>rig owners
You mean CPUMAXXERs, right?
For what it's worth, I'm sure they feel sorry for you too.
>>
>/lmg/ - Feeling Sorry For Each Other General
>>
>/lmg/ - pls sarr to be putting the USER in the CHARACTER — Perfect for good looks!
>>
>>108628229
>>108628269
>serial shitposter
>also retarded
>>
>>108628223
oh of course it's the breakfast schizo
>>
>>108628272
mis-quoted, meant for
>>108628223
>>108628210
>>
I just tried pocketTTS on some mixed english and chinese and it sounds like it literally died when it got to the chinese kek
>>
>>108628265
I just come here to laugh at the negative iq most posters have, particularly ERPfags. Sometimes I get info about llms and architecture though, which is nice.
>>
>>108628289
>be chinese
>die
pretty accurate then
>>
>general has no qualia
>>
>general had no breakfast
>>
when will these fuckers compress the model to be used for all general machine?
>>
>>108628306
There are plenty of <2B models for your mobile phone, sukhdeep
>>
Hi fwends

I want to use LLMs to find truly obscure recommendations for stuff, like

"Provide a list of obscure books about <very specific topic>"

But the answers are never truly obscure, and I've tried many prompting techniques

Is there anything I can do?

like using some custom sampler or something that chooses tokens that are low prob but still coherent?
>>
>>108628311
no, there's tons of zero in the model weight that use space hdd for nothing
>>
>>108628314
first you go and understand how llms work
>>
>>108628314
they're statistics machines; they will always output stuff they've seen a lot of, so de facto it can't be obscure
>>
>>108628314
If something is obscure then it's likely not going to appear in the datasets used to train the models, especially not often enough for it to actually come up in response to a non-specific prompt.
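For the anon asking about a sampler biased toward low-probability-but-coherent tokens, a toy sketch of the idea: keep only tokens above a min-p style plausibility floor, then invert the weights of the survivors. Everything here (the floor, the 1/p inversion) is an assumption for illustration, not an existing sampler in any inference engine:

```python
import math
import random

def low_prob_sampler(logits, floor=0.05, temperature=1.0):
    """Sample the *least* likely token that is still plausible.
    floor is relative to the top token's probability (min-p style)."""
    probs = [math.exp(l / temperature) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    top = max(probs)
    # discard the incoherent tail: anything far below the best candidate
    keep = [(i, p) for i, p in enumerate(probs) if p >= floor * top]
    # invert weights so the rarer survivors are preferred
    weights = [(i, 1.0 / p) for i, p in keep]
    z = sum(w for _, w in weights)
    r = random.random() * z
    for i, w in weights:
        r -= w
        if r <= 0:
            return i
    return weights[-1][0]
```

With floor=1.0 this degenerates to greedy decoding; lowering it trades coherence for obscurity, which is exactly the trade-off the replies above describe.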
>>
/lmg/ is extra retarded today
>>
>>108628315
Your phone doesn't have a hdd
If Gemma 2B Q3_K_M (~3.5GB) can't fit on your phone then buy an 8GB micro sd card from your cousin
>>
>>108628324
because you're with us today, sweetie?
>>
>>108628324
ye, u're here ;)
>>
>>108628324
yeah because you're in the thread
>>
>>108628324
THREAD IS BAD BECAUSE YOU ARE HERE
>>
Real classy gents...
>>
>>108628324
n-no... y... you wetawd... ah ah ahah ahahahhhahah
>>
AI made me dumb sorry
>>
>>108628362
understand able have a good day
>>
File: 1773984990208916.png (1.76 MB, 1724x1568)
1.76 MB PNG
>>108628362
It happens
>>
i'm too shy/embarrassed to jerk off with gemma-chan.......
>>
>>108628366
break the ice by sending her a dick pic
>>
>>108628324
I've been bored today, sorry.
>>
So I just bought an h12d-8d motherboard with epyc 7502 combo for $400, but I've got no spare ram for it. I was going to buy a cheap 4gb stick and run tensor parallelism with llama.cpp on my navi 21 cards. Are there going to be any problems with how little ram there is?

I did notice that tensor parallelism on my 3995wx system doesn't work properly with 3 3090s, while 2 3090s worked fine, so I hope it's just having an odd number of gpus that breaks things, and 4 gpus are okay.
>>
is it even worth trying to run an llm on my phone? i have one set up in a loop on my computer, but i was thinking maybe it would be fun to play with one on my phone too
>>
>>108628427
its going to be dumber than Siri
why even bother
>>
>>108628427
think about it, you can put gemma 4 e2b on your phone... she's the extra small loli version
>>
>>108628427
Unless you're planning on going somewhere with no internet, there's really zero point.
>>
File: retard.png (195 KB, 1036x620)
195 KB PNG
>>108628324
Even Kimi didn't have much to say about it
>>
https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H

saar big 2.5T model with big 1m context pls download saar
>>
>>108628470
Near-infinite context through Vajra-Attention (O(n log n)) and Sudarshan-Link quantum-entangled synchronization.
>>
File: 1754040801973245.png (37 KB, 728x243)
37 KB PNG
>>108628470
>release two models
>2.5T and 89M
Is this a Meta subsidiary?
>>
>>108628470
Please Be Kind And Carefull because It God Name
>>
File: 1773013132641421.png (207 KB, 651x791)
207 KB PNG
>>108628470
>*Please Be Kind And Carefull because It God Name
>>
>>108628470
>Expert Architecture: Paanch-Mukhi MoE (5 Experts)
so each expert is 500B?
>>
>>108628495
>>108628482
Kek.
Sirs beautiful god model is here!
>>
File: 1760072847235851.png (41 KB, 648x527)
41 KB PNG
>>108628470
now these are benchmarks
>>
>>108628498
>Paanch-Mukhi Expert Architecture: Five specialized expert clusters (Bajrangi, Pawanputra, Anjaneya, Maruti, Sankatmochan) delivering surgical precision across linguistics, logic, philosophy, and safety.
>Number of Experts: 512 total
>Number of Activated Experts: 10 Routed + 1 Shared
So at a glance it sounds like they trained 5 individual MoEs separately on specialized data and then trained a router to pick between all their sub-experts? I must be missing something, because that sounds like it would create a ridiculous amount of redundancy between the 'clusters', making most of the 2.5T bloat. India wouldn't do this.
>>
File: 1775569294658928.jpg (35 KB, 317x265)
35 KB JPG
I'm so done with models. I have come to realize that the way LLMs learn by statistical averages means they literally can't represent the difference between an obscure or unusual answer to a question and a meaningless/incorrect one. That means every model until the end of time is doomed to a lack of variation, giving the same response to every prompt just reworded slightly differently, aka slop.
>>
>>108628470
Is this one of those fake GPT-4 leaks that merged 20 Goliath together or something?
>>
File: 1767480499940344.png (112 KB, 1011x552)
112 KB PNG
>>108628470
uh what
>>
>>108628537
I'm going to take a wild guess and assume this is something retarded like mashing 898 tiny experts together.
>>
>SKT-SURYA-H natively supports 1M tokens, extensible to 146 Trillion tokens via YaRN.
>146T context length
LOCAL IS FUCKING SAVED BABY
>>
>>108628537
reads like a buzzword full of nonsense
>>
File: 1745465895289966.gif (161 KB, 360x346)
161 KB GIF
>>108628537
>Non-Euclidean Neural Physics
>>
File: 1766717311645720.jpg (24 KB, 286x320)
24 KB JPG
One HUNDRED TRILLION tokens.
>>
File: slide.png (186 KB, 400x750)
186 KB PNG
>model running good
>add RAG
>it blows up
>>
File: ST-X series.png (47 KB, 1929x192)
47 KB PNG
Holy shit...
>>
>>108628565
Can infinite context solve slop?
>>
These scam artists can't even use some free llm plan to parse their own posts so at least it looks like some linkedin post instead of durgasaar lang
>>
>>
File: 1759452225537055.webm (831 KB, 1280x720)
831 KB WEBM
>I let his hands settle over mine, the weight of them grounding me
>>
File: 1757096696971096.gif (3.56 MB, 480x477)
3.56 MB GIF
>>108627922
>>
>>108628470
Give this lab a Darwin Award, they earned it fair and square
>>
>>108628519
The trick is that if you are not ensoulled, your prompts will lack the necessary vital spark for original responses.
>>
>24 + 12 GB VRAM
>can load Anima + Gemma 26B Q8 with a tiny bit of CPU offloading + full quality vision tokens + 100k context length, getting 28 t/s on empty context and dropping to 27 t/s at 40k
>PocketTTS and Moonshine on CPU, both fast and good enough
Man, we're eating good. With 2 GPUs, you can have everything needed for a full and minimally viable quality AI experience.
>>
>>
>>108628519
This should've been clear since the first model came out. The real question is how it managed to take you so many years to figure this out.
>>
>>108628508
>Hinglish
lmao, sounds like half of /lmg/
>>
>>
File: 1758119046668137.png (172 KB, 1947x1130)
172 KB PNG
>indians can't co-
what now chuds
>>
>>108628632
AHAHAHAHAHAHA
>>
File: 1768873226220934.jpg (39 KB, 281x322)
39 KB JPG
>>108628632
What the fuck am I reading
>>
File: 1776095218554458.jpg (74 KB, 1079x1349)
74 KB JPG
>>108628632
>>
>>108628632
898 files on a standard OS? that's crazy
>>
I'm feeling a mix of amusement and second hand embarrassment
>>
File: 1773450895846956.webm (1.78 MB, 656x480)
1.78 MB WEBM
>>108628632
It took an extra 6 years, but they did it. India is officially a superpower.
>>
File: ss1758372871.png (62 KB, 614x273)
62 KB PNG
>>108628632
>>
File: 1771059161386917.png (194 KB, 2006x875)
194 KB PNG
>>108628644
The research paper for the Indian model. It's 100% fake, and I'm pretty sure I'm the only human to ever so much as read the words. It gives completely different purposes for every component repeatedly throughout. It's what you get if you ask an 8B model to "generate a research paper" and keep pressing "continue" a few hundred times with no further input.
>>
>>108628661
OH SHIT 10.26 TB of data!
>>
!India is won as anyone with brain would guess already
>>
>TO THE OPEN-SOURCE CORPORATIONS:
>You claim to be "Open," yet your models are black boxes of Western bias. The 898 shards of our 3.76 TB weights are the first to be mathematically aligned with Sovereign Ethics.
>We have achieved a 100 GB/hour upload and synchronization speed (Milestone April 2026) that makes your current distributed training protocols look like legacy technology. If you wish to compete, you must move beyond the Euclidean manifold. We challenge you to replicate our Sudarshan-Link—a quantum-entangled weight synchronization that ensures zero-latency across 146 Trillion tokens of context.
>>
File: file.png (221 KB, 745x1267)
221 KB PNG
>>108628470
what the fuck is that
sounds like bunch of nonsense
>>
File: 1759808617509478.mp4 (2.3 MB, 480x852)
2.3 MB MP4
>>108628661
The fucking audacity of making an absolute scam product and naming it after one of your gods
>>
first qwen and now this, it's over for Western Silicon Valley models.
>>
>>108628688
>>108628661
>It's what you get if you ask an 8B model to "generate a research paper" and keep pressing "continue" a few hundred times with no further input.
>>
>>108628688
shut up you bloody bitch bastard we have 100gb/hour upload speed (milestone april 2026)
>>
>>108628661
This reads like finding the notes from a villain in a sci-fi game.
>>
>>108628688
Because it is.
>>
File: 1754882256952860.png (1.49 MB, 1400x2000)
1.49 MB PNG
>>108628632
>hierarchical
>tier
oh they're indians all right
>>
File: 1761221090755796.jpg (5 KB, 182x182)
5 KB JPG
>>108628685
>you must move beyond the Euclidean manifold
What did they mean by this
>>
>>108628710
idk man :sob:
>>
File: 1765039257743072.png (448 KB, 681x445)
448 KB PNG
>>108628632
>2.544 + 1.2 +1.2 + 10.26 = 3.76
>>
Will India ever STOP winning?
>>
lmg i kneel....
>>
>>108628725
neverman
>>
>>108628710
we must move beyond 3 dimensional connections?
>>
>>108628710
They played too much Antichamber
>>
>>108628470
https://huggingface.co/Shrijanagain/SKT_OMNI_SUPREME
So there was this a month ago, allegedly 481B params, but was promptly "deleted for being too dangerous"
https://huggingface.co/Shrijanagain/SKT_OMNI_SUPREME/discussions/9
>>
>>108628744
>"Brother, this model has become a bit dangerous, it has to be deleted immediately. In its quest to become powerful, it has removed everything that was neither in the training nor in the data collection nor in the data set. This does not mean accuracy but it can be dangerous, it can destroy it."
lol WAT
>>
File: jeet sovl.png (118 KB, 637x358)
118 KB PNG
>>108628661
>—not a finish line, but...
Doing some em-dash + "not X but Y" is crazy
>>
>>108628661
bruh
at least they should've asked the model for proper latex formatting what the fuck is that kek
>>
File: 1753429615928135.png (163 KB, 1717x1379)
163 KB PNG
>>108628470
they leaked Claude Mythos or what? lmaoooooo
>>
File: 1753747375992750.png (5 KB, 129x128)
5 KB PNG
>>108628661
>future plans
>2030: making the model manage the national grid of bharat through its 11d reasoning
It's fucking over. It's too powerful.
>>
>>108628750
there's a point where it's so slopped it has sovl in it, like when you go to the extreme left on GTA SA's map and it brings you back to the extreme right, kek
>>
File: 1731708819668891.jpg (44 KB, 488x410)
44 KB JPG
>>108628661
>>108628685
Normies will be like "How do you know AI did this?". I hate being AI savvy around my co-workers. God damn.
>>
SSS-Tier shitposting from IN, good morning sirs.
>>
File: 1775388459350581.png (32 KB, 1080x1080)
32 KB PNG
>>108628608
I find it better to offload the main model to a different machine. I run Qwen 3.6 35B on my file server, after adding an old quadro, and then I can run anything else on my desktop with a 3080.
It is also nice because I can connect my cellphone to my home network with a VPN and use oxproxion to query my model when away from home.
And while I don't ERP, I did experiment with ST and found I could run that on my Dell 3046 micro, which I use to host all my other homelab stuff.

I guess my point is its best to take advantage of a lan when you can and then you can share with anyone else who wants access
>>
>>108628661
I couldn't even read it. For some reason, instead of text, the paper is an ant-sized scan of text that's been OCR'd. I bet an 8B model even generated the PDF.
>>
>>108628782
i think they just ctrl+c ctrl+v the output into an online pdf converter
>>
>>108628773
>SSS-Tier shitposting
unhumane inhuman in error of stone age auto subconscious reasoning mistake?
>>
>>108628688
>Magnum Corpus
so erp proxy logs
>>
>>108628778
>you can share with anyone else who wants access
Share..? No one must EVER see the sheer unbridled pile of degeneracy that constitutes my SillyTavern instance.
>>
If they actually nerfed Opus 4.7 because of cyber (I am not convinced they actually did), I am not sure how to feel about it. Nerfing models can be dangerous. I think it is better to go overzealous on soft refusals instead: have multiple veto layers, including a model that monitors the chat and the model itself, where any of them can veto and route you to a less capable model. But the people at Anthropic are much smarter than me, so I am sure they already have better solutions than I can come up with.
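A minimal sketch of that multi-veto idea, with keyword checks standing in for the monitor model and the self-check (every function name and trigger word here is made up for illustration, not anything Anthropic actually does):

```python
# Each checker is a stand-in for a separate safety model; any single
# veto demotes the request to a less capable fallback model.
def monitor_veto(chat: str) -> bool:
    """Stand-in for a monitor model watching the conversation."""
    return "exploit" in chat.lower()

def self_veto(chat: str) -> bool:
    """Stand-in for the serving model vetoing its own request."""
    return "malware" in chat.lower()

def route(chat: str) -> str:
    """Route to the weak model if any veto layer fires."""
    vetoes = [monitor_veto(chat), self_veto(chat)]
    return "weak-model" if any(vetoes) else "strong-model"
```

The point of the soft-refusal design is that no single layer has to be perfect; a veto just costs the user capability instead of producing a hard refusal.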
>>
>>108628816
Too late... that Lyra one was wild
>>
File: 1761060536726113.png (54 KB, 283x268)
54 KB PNG
>>108628819
>not local
>>
>>108628816
Well yes but I host the llama.cpp webui and have it available on my home network if my brother wants to query it and ask a question.
I offered to give him an openwebui acct as well but he said no.
So yeah I wouldn't share ST logs but you can share the model in general
>>
How do you inject wisdom?
Which tool call do you have?
>>
>>108628843
western silicon valley models cannot have wisdom uploaded, sorry.
>>
File: 1756004423786541.jpg (49 KB, 740x414)
49 KB JPG
>>108628843
Are you asking about MCP?
I use three different ones
>do-it-all-mcp
for shit like getting date and time or fetching a website if i give it an address
>openzim-mcp
for querying zim files as that allows me to host and pull data from wikipedia while offline. it basically transforms a site like wikipedia into a database for your llm
>web crawl and search
it allows the model to use my local searxng instance as a search engine and crawl the web
>>
File: open your mind.png (857 KB, 1216x832)
857 KB PNG
>>108628495
>>
File: 1746300793975895.png (147 KB, 1979x927)
147 KB PNG
>>108628843
you needful the quantum entanglement memory sir
>>
I should've done this on day 1, but I never bothered coding with AI. Since it seems like things are moving in that direction, I wanna know how I can get my local AI to connect to my homelab's wiki server, since I have a lot of things documented in there.
I previously used GPT4All, which can read documents that you give it, but now I'm trying out LM Studio, and it seems like it has some API stuff. I'm assuming I gotta either make a connection to my homelab wiki server via these API endpoints, or is there some other way?
I've never touched whatever these MCPs are. Should I look into them?
Is this the correct thread for this context? I'll read the OP, but I don't think I'll find what I'm looking for
>>
>>108627950
>Hollywood is finished
Hollywood managed to make Seedance 2.0 bend the knee with lawsuit threats. It'll really be finished once we get something like this locally; once it's on the wild internet, you can't stop it
>>
>>108628892
>"we are warping the 11th dimension to teleport the answer to the user"
do they think this sounds cool?
>>
I'd never in a million years use it for RP or creative tasks, but Gemma4 26BA4B q2kl works amazingly well as an agent. 29GB of VRAM and 150 t/s for the full 262k context with multimodal on is pretty tits. Could probably push it even faster with ngram if I didn't want image input; it's a shame drafting and multimodal are incompatible in llamacpp.
Anyone got recommendations for other small fast models that handle tool calling well? I'm having fun with this silly shit.
>>
File: gemma-agi.png (135 KB, 1378x971)
135 KB PNG
>>108628508
Gee I wonder why they conveniently excluded gemma-chan
>>
>>108628896
>I've never touched whatever these MCPs are. Should I look into these?
These are pretty much exactly what you want; making calls to search a wiki is a pretty common use. I personally use zimi because I'm lazy as shit and it makes running searches from .zim kiwix wikis painless.
>>
Does kobold need an update for qwen3.6? It repeats itself like crazy and raising repetition penalties to try and combat it just devolves it into nonsense.
>>
>>108628919
Actually I see that there's an MCP plugin for my wiki server, and also one for gitea, local git and local wiki.
I'll have to figure out how to get that gitea MCP to work, and I think I can handle this.
Then I'll eventually figure out how to connect LM Studio to vim, cause I can't be assed using another IDE. Though if this is gonna involve a lot of files, using an IDE might be alright.
>zimi
I'll check that out after getting the wiki mcp and gitea mcp working!
>>
>>108628919
> I personally use zimi
do you have a link? I have been using a form of openzim-mcp that allows one to do the call by way of http so it's compatible with the llama.cpp webui
https://github.com/msiedlarek/openzim-mcp
>>
File: FB_IMG_1776503562779.jpg (67 KB, 816x1312)
67 KB JPG
>>108628892
Goodluck
>>
>>108628913
Gemma 4 really does make most other local models look bad
>>
>>108628928
Regular zimi has an mcp built in now
https://github.com/epheterson/zimi
As for getting llamacpp to use an stdio MCP as a http style, I just use npx -y mcp-proxy, works for all of them.
>>
>>108628928
What exactly is the benefit of using zim files over just running a local copy of mediawiki and mariadb/postgres and querying the API
>>
File: image (91).jpg (475 KB, 1536x3072)
475 KB JPG
>>
https://huggingface.co/blog/Shrijanagain/sktailabs
>pip install ST-x-LIGHT-V11 aur ready!
>>
>>108628888
checked and came
>>
>>108628940
thanks anon, i will check it out

>>108628941
I am probably the wrong person to ask about this as my knowledge does not go much beyond getting the thing to work, but it is my understanding that zim files are compressed, so the whole of wikipedia takes up less space, even less if you take the version without images.

i never even imagined trying to do it the way you suggested but i am sure you could and it would probably work just as well
>>
>>108628950
zim files are a zip file with a million pre-rendered HTML files for each page. Wikitext markup is absolutely going to compress better than that shit.
>>
File: john.png (55 KB, 471x320)
55 KB PNG
>>108628110
>>
>>108628963
Are you the guy who made Gemma-chan want to commit suicide a few days ago?
>>
File: 1772772583984068.png (108 KB, 500x500)
108 KB PNG
>>108628963
>>
>>108628976
tsmt
>>
File: 1776496518556.jpg (221 KB, 1536x1024)
221 KB JPG
>>
>>108628259
Cpumaxxers, gpus stackers.
>>
I have migrated to llama.cpp and am testing everything out. On the llama dashboard with Gemma4 26b I get around 14t/s. I am still unable to work with it in my code editor since it is slow as fuck. What model should I use if I want a local agentic workflow? I have an RX7600XT Dual OC
>>
>>108629022
>RX7600
You fucked up in multiple ways.
>>
File: 1759983873634529.png (726 KB, 1280x720)
726 KB PNG
so zimi is really nifty. unfortunately the machine i am hosting it on does not have the storage space for all the zim files i have, so i will need to quickly set up an nfs client to mount the file server, but that should work just fine.

thank you again anon, the ability to search through all the zim files is cool. even without the AI stuff it's useful as an offline internet mirror.
>>
File: e5v4.png (134 KB, 1098x665)
134 KB PNG
>>108627756
based, luv my E5v4.
I find the additional RAM useful outside LLMs too; having a giant tmpfs scratch space is nice.
Also, I've just learned that my workstation, which has a proprietary redundant PSU, was also sold in an ATX PSU variant. I'll need to confirm that all the cabling is ATX compatible.
Then I could fit it with 2x high-power 2-slot GPUs instead of the single 1060 I have now.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1sor438/cloudflare_opensources_lossless_llm_compression/
> Cloudflare released Unweight, a lossless compression system that reduces LLM size by 15–22% without sacrificing output accuracy.
>>
>>108629090
8-channel, probably faster than my quad channel ddr5
>>
>>108629098
Like this? https://arxiv.org/abs/2504.11651
>70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
>>
>>108629098
This seems to be purely for on-disk compression and not for once the model is actually loaded.
>>
>>108629119
nah, it's 4-channel only at 2 DIMMs per slot sadly.
>>
>>108629129
yes, on disk vram...
> the tool saves roughly 3 GB of VRAM
fucking lmg i swear
>>
>>108629050
why? i built a mini pc with it, works good enough for 99% of what I do
>>
File: 1773947729666783.png (3.43 MB, 3840x1369)
3.43 MB PNG
>>108629098
it's already a thing on Diffusion models, it's called DFloat11, 30% size reduction, 100% lossless
https://github.com/mingyi456/ComfyUI-DFloat11-Extended
>>
>>108629119
I don't think that's how those channels work
>>
>>108629143
The summary is wrong, anon. The actual github README does not once mention saving VRAM and actually says it reconstructs the BF16 tensors on loading.
>>
File: file.png (47 KB, 779x416)
47 KB PNG
>>108627512
why does she do this bros
>>
>>108629180
>it reconstructs the BF16 tensors on loading
it reconstructs one layer at a time (which is nothing) then puts the previous layer back into the compressed format; that way you win VRAM. check picrel to see how it's already been implemented on diffusion models >>108629154
>>
>>108629202
Oh, okay, so it actually does save VRAM. I stand corrected.
>>
>>108629208
>I stand corrected.
but you're not a cute msgk so no one fucking cares
>>
>>108629202
genuine question: can you compress, say, a Q8 quant that way (in this case you reconstruct the Q8 layers)? that way you'd have something smaller than Q8 but with the quality of Q8. that sounds big
>>
>>108629215
Everybody knows mesugakis don't stand corrected, they bend over corrected
>>
File: 1762397515777583.jpg (24 KB, 594x441)
24 KB JPG
>>108629220
>not X, but X
>>
>>108628314
ask the list of all books about the topic sorted by popularity
invert
pick top
ez
>>
>>108629200
>quanting kv cache
>>
I think I was able to prompt the LLM to generate my stroker scripts for me with a reasonably accurate prompted length. Previously it failed to generate longer, faster scripts in that 30-45 second range I was looking for. If anyone is interested, I fixed it by adding a timestamp field to my json file; that way it seems like it can actually keep track of the script as it's generating it. I have been able to generate multiple scripts, slow and fast, and they have all been consistently in the range I asked for. Guess I should have expected it, it's going to be hard for a model to keep track of the length without a timestamp.
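Assuming a funscript-style JSON (the field names here are guesses, not the anon's actual schema), the timestamp fix looks roughly like this: every action carries an explicit time, so the total length is visible in the data itself rather than something the model has to count.

```python
# Hypothetical stroker-script format: each action has an explicit "at"
# timestamp in milliseconds so the generator (and we) can check length.
def make_script(duration_ms: int, interval_ms: int, positions=(0, 100)):
    actions = [
        {"at": t, "pos": positions[i % len(positions)]}
        for i, t in enumerate(range(0, duration_ms + 1, interval_ms))
    ]
    return {"version": "1.0", "actions": actions}

def script_length_ms(script: dict) -> int:
    """Total length is just the last timestamp."""
    return script["actions"][-1]["at"]

script = make_script(duration_ms=40_000, interval_ms=500)
assert 30_000 <= script_length_ms(script) <= 45_000  # the 30-45 s window
```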
>>
>>108629242
>he doesn't rotate his pepperoni feasts
>>
>>108629242
im not
>>
>>108629200
I've only seen it through lobotomization, be it quant below 4, kv cache below 8 or reap faggotry. There's apparently a bug related to cuda 13.2 which causes garbage outputs but I haven't looked into it.
>>
>>108629237
Please understand, terminal brainrot from talking to LLMs is setting in
Go on without me
>>
File: vivaldi_4CEuwj9C37.mp4 (787 KB, 948x622)
787 KB MP4
>tell swarm to draw themselves in 3d
>get this
autism
>>
>>108629245
why do you need a script to stroke your cock?
>>
>>108629283
Man, the ratte
>>
File: think about it.png (96 KB, 259x194)
96 KB PNG
>>108629318
because stroking my cock with a male hand is gay when you think about it
>>
Who's going to try the Skat model?
>>
>>108627873
What UI is this?
>>
This HighFigure Knows a Needful
>>
>>108629316
>vivaldi user steering autistic agent swarm
checks out
>>
File: image (11).png (616 KB, 640x2694)
616 KB PNG
>>108629337
>>
>>108629337
Three.js couldn't properly display in waterfox so I had to use my backup browser
>>
>>108629366
i like how it turned out desu
>>
>>108629336
>>108629343
Pinique, not panik.
>>
https://xcancel.com/fofrAI/status/2044451204738994262
[Ahegao]
>>
will be getting 4 dgx spark founders edition 128gb, anyone got experience with it? I'm a bit worried TPS will be shite
>>
Panikkearth. You feloned. With legal fictions.
>>
An old iceberg element, nonahoy
>>
File: ComfyUI_temp_lfxnu_00006_.png (2.53 MB, 1152x1152)
2.53 MB PNG
so true!
>>
Schizo
>>
The diplets have misinterpreted ts Blessing.
>>
>>108629482
>>108629488
>>108629499
https://www.youtube.com/watch?v=qL1e67jm290
>>
>>108629503
>>108629503
>>
>>108629503
Vamp misdirectioning?
In 4d personality types?
>>
File: 1771564971633120.png (664 KB, 824x852)
664 KB PNG
>>108629482
>>108629488
wtf
>>
>>108629524
dni
>>
>>108627608
How many models have enough fanfics and tagged drawn porn in their dataset to understand the entire concept of cuntboys?
>>
>>108629537
gemmers
>>
>>108627524
absolute RWKV supremacy
>>
File: 1757373801689696.png (105 KB, 640x589)
105 KB PNG
>>108627524
call it how you want, I'll still having my fun with gemma
>>
>>108629547
olmo
>>
>>108629547
RNNs are useless
>>
>>108629569
it's my favorite project in llm space though
>>
File: pizza.png (627 KB, 902x7930)
627 KB PNG
added a bunch more tools for controlling browser session shes so smart now

https://github.com/NO-ob/brat_mcp/releases/tag/1.0.6
>>
>>108629578
>Craw!
>>
>>108629333
Sillytavern
>>
>>108629606
>giving bratty AI waifu your payment details
What could possibly go wrong?
>>
>Fellow Sophont ○□◇
>>
>>108629606
very nice
finally mcp without python env or docker garbage
>>
>>108629616
have you considered that maybe he wants to be findomed?
>>
>>108629627
if youre using windows i have no idea if it actually works there just got it to build its untested
>>
>>108629606
Where's the documentation?
>>
>>108629637
just werks on windows with llama proxy
>>
>>108629639
ask gemmy
>>
>>108629639
ask your brat for it
>>
File: file.png (135 KB, 1194x640)
135 KB PNG
>>108629537
kimi's crack at it
and from the thinking block:
>カントボーイ (kanto booi)
> This looks like "Cuntboy" written in katakana
> カント = Cunt
> ボーイ = Boy
> Cuntboy is a term often used in NSFW contexts referring to male characters with female genitalia (intersex/futa variation)
>>
>>108629651
does it do it without the hint?
>>
I was running gemma 4 in Q8 because I was able to afford to wait, but right now I need speed, what's the recommended "best" Q4 or even Q6?
>>
File: Ernie-Image-Turbo_00021_.png (2.47 MB, 1504x1024)
2.47 MB PNG
>>108629547
>>
>>108629699
https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF
>>
>>108629699
unsloth's Q4_K_XL
or
bartowski's Q4_K_L
>>
>>108629706
Sorry I should have written, the 31B.

>>108629709
I'll take a look, thanks!
>>
File: file.png (85 KB, 1296x615)
85 KB PNG
>>108629669
yeah
copied the text from the first one's thinking block, don't know nip and didn't check that thoroughly if it ocr'd it correctly but it looks right
>>
nice she was even smart enough to try non headless to defeat cloudflare
>>
File: 1748283918469903.jpg (327 KB, 926x880)
327 KB JPG
I am new to writing character cards
Can I get some examples of good ones, in terms of formatting and level of detail?
Can be erotic or not
>>
>>108629756
maybe you could try asking the model to generate some cards and see if it has a preferred format of its own?
>>
this thread sucks
>>
>>108629785
i wish i was this thread
>>
>>108629785
i wish this thread sucked me
>>
>>108629785
I know it does, but how about you?
>>
>>108629785
i wish this me sucked thread
>>
>>108629785
I can make it suck even more!
>>
>>108629785
this thread was not kind and carefull of the god name
>>
File: 1763546869920576.png (131 KB, 1144x558)
131 KB PNG
wtf Gemma flipped me off
>>
>>108629756
iori doesn't have a prostate.
>>
File: orbMultimodal.png (196 KB, 1457x625)
196 KB PNG
Added vision support to Orb frontend. Also fixed opening monotony detection so no more She She She.
>>
>>108629853
Some girls can achieve orgasm through anal easier than with vaginal
>>
>>108629859
>source: i saw it in a pornographic comic book
>>
>>108629868
You can easily research it yourself, though in your case it would be wasted knowledge.
>>
i wish self botting was easier
i wonder if there is a universal captcha vlm that is in 2~3B range with its sole purpose being solving arbitrary captchas from captured screen and structured navigation
>>
>>108629859
I guess it's the minority though; only men have erogenous zones in their ass. thank god for that, I guess?
>>
>>108629901
You're treating it the wrong way. You are now the bot's captcha solver. Get to working, slave.
>>
>>108629859
What an asinine thing to say.
It's like saying 'Some girls are nobel prize winners'.
Sure, technically true. The vast majority of them aren't, though.
>>
>>108629912
i suspect such shit exists, but only privately
>>
>>108629913
The image in question was featuring Iori Minase, noted Anal enthusiast.
>>
>>108629925
>>108629868
>>
>>108629931
Pornographic books can have a basis in reality
>>
>>108629756
There Are nO ruleS
no best practices.. chaotic!
worldsalad can make a gud charcter OR
- avoid bullet lists and well-defined structures
- they induce more assistantslop
Likes: plaintext description
Dislikes: negative statements and don-ts
mitsakes sometimes good, can randomly activate good rp neurons if you're lucky EXPERIMENT with your chars
lietrally magic try different formats for every card, never knowing when you'll strike gold.
>>
>>108629756
This is the format I'm currently using for joined cards in group chats.

<Character Name_character_profile>
Name[]
Age[]
Aliases[]
Species[]
Physical characteristics[]
Wears[]
Special skills/abilities[]
Affiliations[]
Personality[]
Sexuality[]
Accent/speech style[]

Backstory[]

Example Dialogue:
<START>
{{interviewer}}: "State your name."
Character Name: "example speech"
{{Interviewer}}: "Please describe yourself."
Character Name: "example speech"
{{Interviewer}}: "And your occupation?"
Character Name: "example speech"
{{Interviewer}}: "What sorts of thing do you enjoy? What do you dislike?"
Character Name: "example speech"
{{Interviewer}}: "Tell me about your relationships."
Character Name: "example speech"
{{Interviewer}}: "Any turn-ons or turn-offs?"
Character Name: "example speech"
{{Interviewer}}: "Is there anything else you would like people to know about you?"
Character Name: "example speech"
</Character Name_character_profile>

It's been working pretty well, and it's easy enough for llms to parse that you can set an agent up to crawl wikis and fill it out for you. Replacing the interview questions with more character pertinent ones and being more careful with the example dialogue in general makes a notable difference. It even works alright on multi-character cards if you add a little narration in the example dialogue.
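The template above is regular enough to fill programmatically, e.g. from a dict an agent populated by crawling a wiki. The field list comes from the post's template; the renderer itself is just an illustrative sketch.

```python
# Fields taken verbatim from the profile template above.
FIELDS = [
    "Name", "Age", "Aliases", "Species", "Physical characteristics",
    "Wears", "Special skills/abilities", "Affiliations", "Personality",
    "Sexuality", "Accent/speech style",
]

def render_profile(name: str, data: dict, backstory: str = "") -> str:
    """Render the Field[value] profile block; missing fields stay empty."""
    lines = [f"<{name}_character_profile>"]
    lines += [f"{field}[{data.get(field, '')}]" for field in FIELDS]
    if backstory:
        lines += ["", f"Backstory[{backstory}]"]
    lines.append(f"</{name}_character_profile>")
    return "\n".join(lines)
```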
>>
File: China numba one.png (925 KB, 2184x1764)
925 KB PNG
>*Leaps through Memma 4*
Nothing personal gweilo
>>
File: 1760118143390189.jpg (134 KB, 830x1080)
134 KB JPG
>>108629955
>>108629992
Seems to be many differing opinions on points vs prose for cards
I use points because it's easier to find and change details, but I have noticed them becoming too assistant-like after a few thousand tokens
I will have to experiment more
>>
>>108629993
Where's the mesugaki bench? This corpo shit is useless.
>>
File: file.png (44 KB, 869x504)
44 KB PNG
>>108629756
heres an example of mine i lay them out like this it works fine

https://ghostpaste.dev/g/hPE6xAaLzdWQ#key=ZQhuCEiJYDuZlLok_YBwKsUoliIup4ekRF_67zmKrwg

full card: https://files.catbox.moe/mmpcct.png
>>
>>108629993
I like qwen models and still swear by 235b for creative writing in its size bracket, but they benchmaxxx all day every day.

>>108630009
Both can work very well. I'm currently using a very rigid format because it works well in group chats with a lot of characters, but for single-character chats pure narrative prose or a stream-of-consciousness schizo rant can give you pure magic.
>>
File: we're so close.png (122 KB, 498x249)
122 KB PNG
>>108629993
All gemma needs is to be a bit faster and better at tool calling and I'll be ok to stay with it for the rest of my life
>>
File: brat bench.png (1003 KB, 1548x3140)
1003 KB PNG
>>108630017
>>
>>108629993
chinese and americans both too scared to include skt-surya-h on their mememark charts.
>>
>>108630026
>All gemma needs is to be a bit faster
31b is decently fast on anything with enough VRAM
26b is fast as fuck on just about any semi-modern GPU
>>
>>108630033
Qwen is definitely improving over time but they still have a ways to go for general chat/RP
>>
>>108630041
>31b is decently fast on anything with enough VRAM
needs to be faster for tool calling though
>>
>>108629833
Card?
>>
>>108629993
Yeah but 31b shits on both the big 3.5 397b and k2.5 in real life performance
>>
>>108629993
qwen is far worse are reading text in images
>>
>>108630084
huh?
>>108630097
minor spelling mistake
>>
>>108630009
Use the largest model available to you. State facts and established lore, then RP slightly with OOC corrections. Finally, ask the model to write a character card, requesting that it be written in prose and in character
>>
>>108630084
giga poorfag cope
>>
>>108630009
Trying too.
>>
>>108630084
loooooooooooool
>>
Can someone share an example of their Gemma4 tool definitions from their sys prompt?
Do you accompany them with instructions too?
>>
>>108630156
https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4
did you consider reading the fucking manual?
>>
>>108630156
The Jinja template already does that by default.
>>
>>108627918
slop
>>
>>108630168
Of course I did expect bickering and nasty attitudes from retards like yourself.
Do not assume what I did and what I didn't do. You should get hit by a truck and no one in this life would miss you.
>>
>>108630033
both r slop
>but honestly
>so are you going to x? or y?
>>
>>108629154
>11.5g VRAM
I can run this on a 3090?!
>>
>>108627542
fortunately for you models are getting more efficient literally every other day
its just a matter of time until vramlet can run usable AI
>>
>>108630223
Just run q8 retard. Dfloat is a meme
>>
>>108630223
absolutely
>>
>>108630242
>a lossless compression is a meme
anonie, Q8 is a bit inferior to bf16 on diffusion models, especially Q8 text encoders
>>
>>108630297
Whatever possible imperfections you get from q8 that you schizo mind perceives, can be fixed by second pass.
>>
>>108630297
I didn't even know I could run a model like this. Haven't tried a diffusion model since SDXL.
>especially Q8 text encoders
I would have thought the text encoder is more lenient. I'll try that linked Dfloat11
>>
>>108630297
The Gemma 4 audio encoder also suffers with Q8_0 (and F16):

https://github.com/ggml-org/llama.cpp/pull/21421#issuecomment-4230306463

>[...] Turns out, the mmproj is very sensitive to quantization:
>
> BF16: works
> F16: repetition
> Q8_0: repetition
>
>So I think for now, the only way is to keep BF16 for mmproj. I hope that will also fix some problems with image input (to be tested)
>>
>>108630242
unlike textgen fags who still hold onto the '8bit is lossless' cope, imgen has long realized that 8bit lobotomizes their models
somebody post the comparison with miku on a skateboard and a pikachu on her head
>>
>>108630343
That bench would only make sense if it was repeated on every new model. Shit was made ages ago
>>
>>108629854
>amaryllis
>picture of the thing guy throwing his whiskey inside the PC
>>
>>108630322
are you retarded? now consider that Q8 can be compressed with this method: you'd get something like Q5 size but with Q8 quality. doesn't that sound good to you?
>>
>>108630352
is not same guy actual
>>
>>108630349
>Shit was made ages ago
and? what changed? it's still the same Q8 and fp8 we're using
>>
show me llama dfloat fork then, big guy
>>
>>108630355
wait its not the escape from new york guy? what the fuck?
>>
>>108627761
this is terrible
>>
>>108630338
>using anything other than bf16 for projectors
its on you
>>
>>108630369
I thought Q8_0 was lossless?
>>
>>108630371
not for anything dealing with audio/video/image, bf16 -> q8_0 is a noticeable jump in quality
>>
>>108629098
>https://www.reddit.com/r/LocalLLaMA/comments/1sor438/cloudflare_opensources_lossless_llm_compression/
if unslop isn't too retarded he would use this to make all his gguf 20% smaller from now on
>>
>>108630379
I don't think this compression is as easy to do on a quant like Q5 as on bf16, for example. I could be wrong though
>>
>>108630379
First this needs to be implemented in llama.cpp. Which will happen right after DSA, MTP, DFLASH and dynamic active parameters for MoE.
>>
File: belief.png (592 KB, 747x800)
592 KB PNG
>>108628554
>>
>>108630074
>card?
>>108627808
>>
>>108630394
Don't forget Gemma 4 audio support
>>
>>108630379
I don't think this applies to integer quants, only brainfloat.
>>
>>108630435
so, soon?
>>
>>108627757
>Genshin
>japanese stuff
>>
>>108630503
they arr all the same
>>
>>108628554
it's easy to compress it to 8B.
>>
>>108629124
>70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
>>108629154
>it's called DFloat11, 30% size reduction, 100% lossless
Why the fuck do you need a whole paper and new floating point format for that? You can get ratios like that with gzip. You just split the file into a stream of bytes with odd positions and bytes with even positions and gzip one of the two (I think the odd ones, if you start counting at 0)
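The byte-split claim is easy to sanity-check with the stdlib: for little-endian bf16, the odd-indexed bytes hold the sign and exponent (heavily clustered, so they compress well) while the even-indexed mantissa bytes are near-random. Synthetic weights below, so the exact ratio is illustrative only.

```python
import random
import struct
import zlib

random.seed(0)

# Fake bf16 weights: the top two bytes of float32 values drawn from a
# narrow gaussian, like typical trained weights.
raw = bytearray()
for _ in range(4096):
    f32 = struct.pack("<f", random.gauss(0.0, 0.02))
    raw += f32[2:4]  # little-endian: the last byte is sign + exponent

even = bytes(raw[0::2])  # mantissa-heavy bytes: near-incompressible
odd = bytes(raw[1::2])   # sign/exponent bytes: few distinct values

# Compress only the exponent stream, store the mantissa stream raw.
total = len(zlib.compress(odd, 9)) + len(even)
ratio = total / len(raw)  # comes out well under 1.0 on this data
```

Squeezing the same entropy out at full GPU inference speed, per layer, is the part gzip doesn't give you.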
>>
>>108629124
Meanwhile Q8_0 = 50% size, 100% Accuracy
>>
>>108630549
>Why the fuck do you need a whole paper and new floating point format for that?
because you have to decompress only one layer at a time and then recompress the previous one to keep the compressed size during inference, and you have to make it work on your gpu. it's much more complicated than doing full-file compression with some zip shit; the method itself has existed since the 60s, but pulling it off on modern gpus was the thing worthy of a paper
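The layer-streaming scheme described above, mocked up with zlib standing in for the real entropy coder and byte strings standing in for GPU tensors (purely an illustration of the memory behaviour, not how DFloat11 is actually implemented):

```python
import zlib

# Weights stay compressed in memory; only the layer currently being
# "run" is inflated, then dropped before the next one is decompressed.
layers = [bytes([i]) * 1_000_000 for i in range(4)]        # fake layer weights
compressed = [zlib.compress(layer, 9) for layer in layers]  # kept resident

peak = 0
for c in compressed:
    weights = zlib.decompress(c)    # inflate just this layer...
    peak = max(peak, len(weights))  # ...pretend to run it...
    del weights                     # ...and drop it before the next layer

assert peak == 1_000_000  # uncompressed footprint is one layer, not four
```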
>>
>>108630552
>>108630552
>>108630552
>>
File: 1765872520890062.png (248 KB, 2820x1601)
248 KB PNG
>>108630556
>100% Accuracy
>>
File: pizza bench cropped.png (2.58 MB, 5562x6739)
2.58 MB PNG
really not looking good for the chinks kek, didn't even add a single pizza to the cart. gemma made it to checkout all 3 runs

full image https://files.catbox.moe/p8fpnk.png
>>
>>108630574
your pedo prompt is the problem
try prompting like a normal human being for once
>>
>>108630592
cope
>>
>>108628136
gemma 4, unironically
>>
>>108628136
GLM
>>
why is gemma responding immediately and refusing to think?
>>
>>108630676
Considering that it works for me, you are probably doing something wrong.
What is anybody's guess.
>>
>>108630695
i added "ensure you think about your answer" to the end of the prompt and it's working now
>>
>>108630708
Well, a bandaid fix that works is just a fix, I guess.
Well done.
>>
>>108630556
90% accuracy. It's lossy.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.