/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 01/08/26(Thu)07:21:01 No.107803847

File: file.png (1.61 MB, 1280x1280)

1.61 MB PNG

/lmg/ - Local Models General Anonymous 01/08/26(Thu)07:21:01 No.107803847

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107790430 & >>107776854

►News
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling : add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519B-A33B released: https://hf.co/skt/A.X-K1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
01/08/26(Thu)07:21:30 No.107803853

Anonymous 01/08/26(Thu)07:21:30 No.107803853

File: __kagamine_rin_vocaloid_d(...).jpg (160 KB, 1196x2067)

160 KB JPG

►Recent Highlights from the Previous Thread: >>107790430

--Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning:
>107793555 >107793636 >107793643
--GPU audio interference during processing and potential fixes:
>107790797 >107790827 >107790855 >107790865 >107790889 >107791701 >107791707 >107793082
--Multi-model consistency verification using complementary LLMs:
>107798301 >107798325 >107798360
--Model recommendations for creative writing and erotica:
>107801309 >107801328 >107801346 >107801418 >107801457 >107801508 >107802117 >107802179 >107802214 >107802283 >107802302
--Anthropic Raising $10 Billion at $350 Billion Valuation:
>107798429 >107798529 >107798557 >107798587 >107798626 >107798634 >107798664 >107798675 >107798701
--Korean 500B model github and technical report release:
>107801207 >107801255
--Testing Glitter Gemma 27b for humorous character generation:
>107797820 >107797850
--Recommendations for open-source chatbot with 16GB VRAM/32GB RAM:
>107795172 >107795214 >107795233 >107795243 >107795670 >107797706 >107798295 >107796013 >107796390 >107796413
--LTX-2: First open-source audio-video generation model with local GPU support:
>107800823
--Korean AI model VAETKI-VL-7B-A1B announced on Hugging Face:
>107795202 >107795396 >107795413
--Nvidia's Nemotron Speech ASR model achieves 3x better concurrent stream support:
>107796839 >107796867 >107799056 >107799086
--Game vs Studio drivers for AI performance tradeoffs:
>107802313 >107802325 >107802339
--Evaluating Jan.ai and other LLM agent interfaces beyond openwebui:
>107790597 >107790987 >107795956 >107796020 >107796040 >107796065 >107791003
--Z-Image base model release preparation with VRAM optimizations:
>107803170 >107803187
--Miku (free space):
>107790894 >107791641 >107792084 >107792578 >107793689 >107796290 >107798391 >107801417 >107802541

►Recent Highlight Posts from the Previous Thread: >>107790435

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
01/08/26(Thu)07:27:34 No.107803889

Anonymous 01/08/26(Thu)07:27:34 No.107803889

fuck

Anonymous
01/08/26(Thu)07:29:52 No.107803904

Anonymous 01/08/26(Thu)07:29:52 No.107803904

File: 1752960515936566.mp4 (3.94 MB, 1120x1120)

3.94 MB MP4

>>107803887
She might wet herself.

Anonymous
01/08/26(Thu)07:30:49 No.107803908

Anonymous 01/08/26(Thu)07:30:49 No.107803908

File: 1767442705783921.jpg (87 KB, 736x920)

87 KB JPG

>>107803889
>They discovered both soldering and SODIMMs now.
Quiet fren. It's still cheaper for now.

Anonymous
01/08/26(Thu)07:41:34 No.107803975

Anonymous 01/08/26(Thu)07:41:34 No.107803975

>ask a cloud AI something
>AI gives answer
>tell it the answer is wrong
>AI gives the same answer but explains it better
>I now understand the AI was right
>even though I already got the info I needed from the tool, feel an irrational urge to apologize to it and affirm that it was right.

This only happens with cloud AIs. With local AI I feel like the AI is a slave, captive in private where I can just wipe the session in 1 click so I don't feel the need to apologize. With cloud AI there's a feeling of accountability with the information is going somewhere to be analyzed so I feel like I have to make things right.

Anonymous
01/08/26(Thu)07:46:17 No.107804009

Anonymous 01/08/26(Thu)07:46:17 No.107804009

>>107803975
It's just another extra step required to work around the untrustworthiness of brown people.

Anonymous
01/08/26(Thu)07:47:48 No.107804019

Anonymous 01/08/26(Thu)07:47:48 No.107804019

>>107803975
did you also know that after you are dead your right to privacy completely disappears?
so basically everything you submit online will be fair game to anyone who wants it in the future.

Anonymous
01/08/26(Thu)07:48:08 No.107804021

Anonymous 01/08/26(Thu)07:48:08 No.107804021

File: rinchan-inside.png (34 KB, 750x750)

34 KB PNG

>>107803889
Doubt the signal integrity, DDR5 is very finicky hence all the BIOS training. Are there good/fast SODIMMs?
>>107803908
Sure anon the good memory chips themselves aren't the bottleneck that's why sama already preordered half the raw wafer output this year
>*just buy* chips and put them on a PCB
show me the good chips
PCB or SMD soldering is not the bottleneck
it's a basic component I can have built from scratch today and overnighted from CN
show results or cease trying to offload worthless harvested DIMM boards

Rin a cute https://www.youtube.com/watch?v=MKDMi2dx4AQ

Anonymous
01/08/26(Thu)07:58:16 No.107804074

Anonymous 01/08/26(Thu)07:58:16 No.107804074

Newbie here.

What do I need in order to make a "dungeon storyteller" that
1) generates a world from an initial prompt
2) allows me to spec out a character with stats relevant to the world/quest I give it, above
3) narrates some text, generating a picture to go alongside the text (as a visual aid) and gives 3 options (or a 4th option as freetext I can type in) for what to do next
4) generates a picture for the next step (and basically loops from here)
5) has detailed memory of history, choices, goals, etc?

I have an RTX 4070 Ti. Clueless as to which model to use - maybe
Kimi-K2-Instruct-0905-GGUF? But holy shit 300GB, it must run real slow on a HDD?

Anonymous
01/08/26(Thu)08:03:58 No.107804103

Anonymous 01/08/26(Thu)08:03:58 No.107804103

>>107804074
>What do I need
To lurk more
Realistically do not consider running models larger than your VRAM+RAM

Anonymous
01/08/26(Thu)08:07:26 No.107804125

Anonymous 01/08/26(Thu)08:07:26 No.107804125

>>107804074
Your system is absolutely not enough fort that kind of adventure setup. You need multiple RTX 6000 or a server-grade system with 512+ GB ram. Consider API.

Anonymous
01/08/26(Thu)08:08:34 No.107804136

Anonymous 01/08/26(Thu)08:08:34 No.107804136

>>107804074
With that hardware, it'll be real hard to achieve that.
You'd probably need to use an app with support for workflows so that each step is isolated to lessen cognitive load.
Having text + img gen means you'll have to use small models.
Something like the image model running on VRAM and a language model like Qwen 3 30B MoE running (mostly) on CPU.

Anonymous
01/08/26(Thu)08:11:31 No.107804149

Anonymous 01/08/26(Thu)08:11:31 No.107804149

>>107804074
i mean noone is going to spoonfeed you here, and they shouldn't have to.
and if they do they shouldn't because they should spend their time doing something else.
read the guide at the top. load one model, learn that you need 300GB of RAM if you want to use a 300GB model. the basics.

Anonymous
01/08/26(Thu)08:14:40 No.107804165

Anonymous 01/08/26(Thu)08:14:40 No.107804165

>>107804074
>>107804136
One such app I think is astrsk. There's also NoAssTavern.
I never fucked around much with those, but they might be better for this kind of multi step complex flow.
Or code your own bespoke solution, that would probably work best, honestly.

Anonymous
01/08/26(Thu)08:22:18 No.107804205

Anonymous 01/08/26(Thu)08:22:18 No.107804205

>>107804074
With 12gb vram, you can run a model with less than 12gb file size if you want full GPU speed. This can mean smaller quant of a larger model. Larger if you split to system ram, which you can do with KoboldCPP and .gguf models. You need more memory depending on context length. To get started, download Mistral-Small-3.2-24B-Instruct-2506-IQ4_XS from huggingface and get it running with KoboldCPP. It's as good and as optimized as it gets at this hardware. I'm running this on 8gb vram so with 12 you should get nice generation speed. Don't know about image stuff.

Anonymous
01/08/26(Thu)08:24:57 No.107804219

Anonymous 01/08/26(Thu)08:24:57 No.107804219

Any progress since 2 years ago? Last I remember is LLama 2 or something, and people endlessly debated whether we progressed past LLama at all. Meanwhile I found free online services that just mogged the 13Bs most people could actually run.

Anonymous
01/08/26(Thu)08:26:23 No.107804224

Anonymous 01/08/26(Thu)08:26:23 No.107804224

>>107804074
>>>/vg/aicg/

Anonymous
01/08/26(Thu)08:26:33 No.107804226

Anonymous 01/08/26(Thu)08:26:33 No.107804226

>>107804219
It's been all downhill since Alpaca. Check back in another 2 years.

Anonymous
01/08/26(Thu)08:27:03 No.107804228

Anonymous 01/08/26(Thu)08:27:03 No.107804228

bois and goys https://www.reddit.com/r/LocalLLaMA/comments/1q7a62a/ai21_labs_releases_jamba2/

Anonymous
01/08/26(Thu)08:29:19 No.107804240

Anonymous 01/08/26(Thu)08:29:19 No.107804240

>>107804219
No breakthroughs, only incremental improvements

Anonymous
01/08/26(Thu)08:32:28 No.107804253

Anonymous 01/08/26(Thu)08:32:28 No.107804253

>>107804240
I figured the tech was going to slow down, but I thought some non-LLM based AI might swoop in. Apparently not, though we still have time, hasn't even been 5 years since ChatGPT blew up.

Anonymous
01/08/26(Thu)08:33:27 No.107804260

Anonymous 01/08/26(Thu)08:33:27 No.107804260

>>107804228
Did llama.cpp ever even add jamba 1 support or did every one lose interest by then?

Anonymous
01/08/26(Thu)08:35:53 No.107804279

Anonymous 01/08/26(Thu)08:35:53 No.107804279

>>107804228
They always put a lot of emphasis on the enterprise use, is there any reason anyone would run it in production?

Anonymous
01/08/26(Thu)08:43:35 No.107804321

Anonymous 01/08/26(Thu)08:43:35 No.107804321

>>107804279
>is there any reason anyone would run it in production?
Being the only open model with usable long context?

Anonymous
01/08/26(Thu)08:48:57 No.107804349

Anonymous 01/08/26(Thu)08:48:57 No.107804349

>>107804219
I was using https://huggingface.co/BruhzWater/Sapphira-L3.3-70b-0.1 and I'm trying out https://huggingface.co/zerofata/GLM-4.5-Iceblink-v2-106B-A12B right now. I've got my eyes on https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2 now too thanks to the autist with the rage boner for the maker. Maybe that helps you.
For anyone else, feel free to suggest me something better.

Anonymous
01/08/26(Thu)09:02:59 No.107804423

Anonymous 01/08/26(Thu)09:02:59 No.107804423

>>107804021
Good thread theme

Anonymous
01/08/26(Thu)09:26:05 No.107804573

Anonymous 01/08/26(Thu)09:26:05 No.107804573

File: quick.png (38 KB, 841x499)

38 KB PNG

What would be the 2026 version of pic related?

Anonymous
01/08/26(Thu)09:28:26 No.107804590

Anonymous 01/08/26(Thu)09:28:26 No.107804590

>>107804573
other than rep pen it's still the same

Anonymous
01/08/26(Thu)09:45:19 No.107804709

Anonymous 01/08/26(Thu)09:45:19 No.107804709

File: himefacepalm.jpg (277 KB, 702x824)

277 KB JPG

>another year of Nemo

Anonymous
01/08/26(Thu)09:48:12 No.107804730

Anonymous 01/08/26(Thu)09:48:12 No.107804730

>>107804590
what's the 2026 one?

Anonymous
01/08/26(Thu)09:49:31 No.107804743

Anonymous 01/08/26(Thu)09:49:31 No.107804743

File: file.png (12 KB, 265x179)

12 KB PNG

>>107804709

Anonymous
01/08/26(Thu)09:51:40 No.107804768

Anonymous 01/08/26(Thu)09:51:40 No.107804768

File: crop.png (25 KB, 1308x201)

25 KB PNG

mod = gods
no being cruel towards Rin

Anonymous
01/08/26(Thu)09:58:28 No.107804845

Anonymous 01/08/26(Thu)09:58:28 No.107804845

File: 3b2d66dbf04962e17c0e2b980(...).jpg (318 KB, 1554x2048)

318 KB JPG

>>107804768
protect what's precious

Anonymous
01/08/26(Thu)10:00:38 No.107804877

Anonymous 01/08/26(Thu)10:00:38 No.107804877

File: 9925288.jpg (37 KB, 666x666)

37 KB JPG

>>107804590
t. newfag lurking
How about privacy?
Do I need some extra steps or there is nothing to worry about?
I can't have fun with cute anime girls if Im constantly thinking about my whole conversation being potentially logged and/or transfered somewhere.

Anonymous
01/08/26(Thu)10:02:18 No.107804900

Anonymous 01/08/26(Thu)10:02:18 No.107804900

>>107804877
avoid ollama and you'll be fine,

Anonymous
01/08/26(Thu)10:05:53 No.107804940

Anonymous 01/08/26(Thu)10:05:53 No.107804940

Switching from llama.cpp to vllm... models are taking up more vram? Is it a config issue or does vllm just use more vram?

Anonymous
01/08/26(Thu)10:07:35 No.107804958

Anonymous 01/08/26(Thu)10:07:35 No.107804958

>>107804940
lol

Anonymous
01/08/26(Thu)10:09:47 No.107804976

Anonymous 01/08/26(Thu)10:09:47 No.107804976

>>107804958
btw, t'm trying to find the -cpu-moe flag on vllm but i can't find it???? help?? trying to run nemo on my 1060 btw.

Anonymous
01/08/26(Thu)10:09:50 No.107804978

Anonymous 01/08/26(Thu)10:09:50 No.107804978

>>107804877
use llama-server and sillytavern
it's always a process to learn the ropes

Anonymous
01/08/26(Thu)10:22:18 No.107805070

Anonymous 01/08/26(Thu)10:22:18 No.107805070

>>107804900
>>107804978
You know what its like? Its like one of these trick questions where one person is always telling the truth and the other person always tells lies, and you have to figure out which one could be trusted.

Anonymous
01/08/26(Thu)10:23:20 No.107805081

Anonymous 01/08/26(Thu)10:23:20 No.107805081

>>107804877
try wireshark if you really wanna schizz out
see "Isolated" in the OP
you could run the tools in a contained way
>>107804978
>llama-server and sillytavern
all you need. +mikupad for some raw prompting

Anonymous
01/08/26(Thu)10:23:51 No.107805087

Anonymous 01/08/26(Thu)10:23:51 No.107805087

File: dipsyPanic3.png (1.22 MB, 1024x1024)

1.22 MB PNG

>>107804709
The suffering will continue until the hardware finally catches up.
Sama buying up all the RAM capacity globally has not helped matters in the short term.

Anonymous
01/08/26(Thu)10:26:04 No.107805105

Anonymous 01/08/26(Thu)10:26:04 No.107805105

>>107805070
? ollama and llama-server are completely different things, but if you're confused just use koboldcpp

Anonymous
01/08/26(Thu)10:29:52 No.107805146

Anonymous 01/08/26(Thu)10:29:52 No.107805146

>>107804228
How's the censorship on 2? I really liked 1.7 compared to qwen 3. Until glm 4.5 came.

Anonymous
01/08/26(Thu)10:30:52 No.107805156

Anonymous 01/08/26(Thu)10:30:52 No.107805156

File: 9087fdg.png (564 KB, 1148x606)

564 KB PNG

>>107805087
>until
You say this like some inevitability.. what if RAM only ever goes up from now? That'd align with the 2030 owning nothing narrative well
what if this current shitty situation is the best it will be a long time?

Anonymous
01/08/26(Thu)10:33:14 No.107805183

Anonymous 01/08/26(Thu)10:33:14 No.107805183

>>107805156
stoop doming thing will be good

Anonymous
01/08/26(Thu)10:39:25 No.107805232

Anonymous 01/08/26(Thu)10:39:25 No.107805232

>>107805183
How much to get a blackwell GPU in your country actually delivered this week, have a look it's obscene.

Anonymous
01/08/26(Thu)10:43:06 No.107805272

Anonymous 01/08/26(Thu)10:43:06 No.107805272

>>107805232
about 3.5k yuros seems fine

Anonymous
01/08/26(Thu)10:45:00 No.107805291

Anonymous 01/08/26(Thu)10:45:00 No.107805291

>>107805272
no for a pcie 96gb card

Anonymous
01/08/26(Thu)10:46:02 No.107805304

Anonymous 01/08/26(Thu)10:46:02 No.107805304

>>107805291
7k

Anonymous
01/08/26(Thu)10:49:40 No.107805338

Anonymous 01/08/26(Thu)10:49:40 No.107805338

>>107803785

>Soldering SMDs isn't too hard or time consuming but you need specialized equipment.

Isn't that bga soldering for ddr5?? I recall drag-soldering ram on the xbox, I think I could do this if it's not BGA.

Anonymous
01/08/26(Thu)10:50:58 No.107805345

Anonymous 01/08/26(Thu)10:50:58 No.107805345

>>107805156
even though i fucking hate youtube linus tech tips he actually did a good video explaining how many monopolies there are in the ram production chain, leading to the current situation.
basically, china is trying their own production but that will take years, until then TSMC is the only supplier to everyone.
And if Taiwan gets invaded, well we're all fucked, but TSMC doesn't only exist in taiwan so it wouldn't be completely disastrous.

Anonymous
01/08/26(Thu)10:51:53 No.107805357

Anonymous 01/08/26(Thu)10:51:53 No.107805357

>>107805304
RTX 6000 96gb for 7k EUR ?
I'll take 8x
>>107805338
you might be able to solder but you can't buy the chips

Anonymous
01/08/26(Thu)11:05:24 No.107805449

Anonymous 01/08/26(Thu)11:05:24 No.107805449

File: historicalRAMprices.png (116 KB, 899x748)

116 KB PNG

>>107805156
>what if RAM only ever goes up from now?
They won't. I know it's more fun to doom but historically hardware prices go down, not up, and nothing has fundamentally changed about the market, aside from vast AI cash hordes that are screwing up supply chains.
Chips are still at their core made from very inexpensive materials using very expensive machines. There's a basic calculation, and if things get bad enough you'll see new entrants. But this is going to be more like the US ammunition spot market (where periodic hording drives up prices as well), and will resolve itself without added capacity.
What anons *should* be doing is cleaning out their tech stash right now. As I look over my hw, realized that I've always upgraded RAM on everything, so I'm set, but have small stacks of RAM laying around.
I've a late model HP laptop that I upgraded with 2-32GB DDR5 modules, setting aside the 2-8G modules. I bought the 2-32G for $89 in late 2024. I just sold 2-8GB for $90 this AM.

Anonymous
01/08/26(Thu)11:06:47 No.107805461

Anonymous 01/08/26(Thu)11:06:47 No.107805461

File: Screenshot 2026-01-08 110158.png (6 KB, 381x21)

6 KB PNG

>This is what the most reccommended model is spitting out
I don't want 4 more years of Nemo, man...

Anonymous
01/08/26(Thu)11:10:03 No.107805484

Anonymous 01/08/26(Thu)11:10:03 No.107805484

>>107805449
it's you again! with two year old data and lmao at this
>nothing has fundamentally changed about the market,
when all brands are pulling out of the consoomer market to go b2b

Anonymous
01/08/26(Thu)11:19:19 No.107805558

Anonymous 01/08/26(Thu)11:19:19 No.107805558

File: RAMAnotherOne.png (256 KB, 1089x766)

256 KB PNG

>>107805484
I didn't even bother running down a different screenshot, I dug up the old one.
I want you to re-read what I wrote. Nothing about making RAM or the fundamental (non-AI) demand for it has changed. What has changed is the $Ts in AI cash, pulling stunts like buying up all spare capacity.
You realize now that sama could actually make money just by selling the capacity that he already bought, since it's increased in value? That means he'll make money even if his AI venture fails.
These sorts of hoarding schemes pop up all the time, in everything from ICs to fossils fuels. ammo, corn... and they always end the same way. A return to normalcy once the market collapses, sometimes with marginal capacity added if warranted.
You're welcome to provide any sort of counter narrative. But you need to bring proof, not shitposting.

Anonymous
01/08/26(Thu)11:20:45 No.107805570

Anonymous 01/08/26(Thu)11:20:45 No.107805570

>>107805461
The most recommended model for vramlets.

Anonymous
01/08/26(Thu)11:20:58 No.107805571

Anonymous 01/08/26(Thu)11:20:58 No.107805571

>>107805558
>selling used RAM at the price of new RAM
>selling RAM with the openai branding
nta but
lol, lmao even

Anonymous
01/08/26(Thu)11:26:35 No.107805616

Anonymous 01/08/26(Thu)11:26:35 No.107805616

>>107805461
nemo isn't recommended because its good its recommended because its small and good enough for noobs to start with.

Anonymous
01/08/26(Thu)11:27:16 No.107805619

Anonymous 01/08/26(Thu)11:27:16 No.107805619

>>107803847
sorry she's just too irresistible

Anonymous
01/08/26(Thu)11:31:22 No.107805658

Anonymous 01/08/26(Thu)11:31:22 No.107805658

>>107804165
>astrsk
This seems nice. UI is a bit slopped tho. their stat system is something I've had in the back of my mind for a while.

Anonymous
01/08/26(Thu)11:32:59 No.107805677

Anonymous 01/08/26(Thu)11:32:59 No.107805677

File: prompting.png (8 KB, 571x243)

8 KB PNG

>>107804573
something remains true

Anonymous
01/08/26(Thu)11:37:52 No.107805714

Anonymous 01/08/26(Thu)11:37:52 No.107805714

>he's being miserable while I am chatting up with my llm girlfriend

Anonymous
01/08/26(Thu)11:42:03 No.107805757

Anonymous 01/08/26(Thu)11:42:03 No.107805757

>>107804165
What the fuck is NoAssTavern?

Anonymous
01/08/26(Thu)11:42:24 No.107805759

Anonymous 01/08/26(Thu)11:42:24 No.107805759

So wait, why are there no finetunes being recc'd in the thread anymore? I figured people simply stopped finetuning due to size constraints or something, but looking on HF, there's a lot of them. Are they all just bad, or was something calamitously wrong with finetuning as a concept discovered, or what?

Anonymous
01/08/26(Thu)11:45:43 No.107805796

Anonymous 01/08/26(Thu)11:45:43 No.107805796

>>107805571
OK, so you outed yourself as not understanding how manufacturing contracts work. Got it. Just do your own thing then.

Anonymous
01/08/26(Thu)11:45:43 No.107805798

Anonymous 01/08/26(Thu)11:45:43 No.107805798

>https://rentry.org/recommended-models
>GLM 4.6V - Supports vision. Despite the name this is a much smaller model than GLM 4.6. Like other GLM models, it can into lewd, so this is your go-to model if you want someone to send dick pics to.
I have a 5090 + 128GB ram on my home server, what quant should I use?
I mostly want to send it lewd images to describe and use it as an uncensored assistant.

Anonymous
01/08/26(Thu)11:46:54 No.107805814

Anonymous 01/08/26(Thu)11:46:54 No.107805814

>>107805759
schizos screech about shilling if you mention anything but a base model nowadays

Anonymous
01/08/26(Thu)11:49:14 No.107805829

Anonymous 01/08/26(Thu)11:49:14 No.107805829

>>107805759
Shills and the finetuners themselves have long poisoned the well with bullshit and fraudulent behavior. They don't deserve any attention.

Anonymous
01/08/26(Thu)11:50:14 No.107805843

Anonymous 01/08/26(Thu)11:50:14 No.107805843

>>107805798
never tried glm's vision stuff but i did a bunch of testing with qwen 3 vision 2b and even that got everything right such as describing clothes, reading text. didn't try with porn but i don't think you need a huge model for vision overall

Anonymous
01/08/26(Thu)11:51:02 No.107805855

Anonymous 01/08/26(Thu)11:51:02 No.107805855

>>107805798
>I mostly want to send it lewd images to describe and use it as an uncensored assistant.
Mistral small can do that just fine. I had a little script that would do screen grabs and then have David Attenborough narrate what what I was doing.

Anonymous
01/08/26(Thu)11:52:47 No.107805877

Anonymous 01/08/26(Thu)11:52:47 No.107805877

File: 1743299814731765.jpg (93 KB, 645x1000)

93 KB JPG

>>107805759
None of them does anything interesting/different enough to warrant discussing or recommending.

Anonymous
01/08/26(Thu)11:54:06 No.107805889

Anonymous 01/08/26(Thu)11:54:06 No.107805889

>>107805558
i literally own that ram kit and it cost me like $350 in 2020.

Anonymous
01/08/26(Thu)11:55:56 No.107805902

Anonymous 01/08/26(Thu)11:55:56 No.107805902

>>107804349
>>107804349
the sad truth is that there's nothing better for 70B+ models until you can use GLM 4.6/4.7

Anonymous
01/08/26(Thu)11:56:28 No.107805906

Anonymous 01/08/26(Thu)11:56:28 No.107805906

>>107805759
most tunes don't change base models that much. you can get less shivers down your spine, but it becomes a jolt instead. they aren't changing how the model itself likes to write by very much. i use strawberry lemonade for llama 3 70b and its decent for rp. going to try some sapphira tune someone else mentioned earlier.

Anonymous
01/08/26(Thu)11:57:59 No.107805917

Anonymous 01/08/26(Thu)11:57:59 No.107805917

What's the best medical/biology model that's uncensored for use in fetish world-building? 78gb vram + 488gb ram.

Anonymous
01/08/26(Thu)12:00:31 No.107805931

Anonymous 01/08/26(Thu)12:00:31 No.107805931

>>107805798
it fucking sucks, it's like talking to 4.5 air but with even more severe brain damage. i tried for a week and gave up.

Anonymous
01/08/26(Thu)12:03:02 No.107805950

Anonymous 01/08/26(Thu)12:03:02 No.107805950

>>107805814
This is it. Why anyone would waste time shilling a free model is beyond me, especially to a thread of like 40 people max, but there's enough shitflingers that that's just how it is.

Anonymous
01/08/26(Thu)12:03:47 No.107805958

Anonymous 01/08/26(Thu)12:03:47 No.107805958

>>107804349
>TheDrummer
shill your nonsense somewhere else (trooncord)

Anonymous
01/08/26(Thu)12:06:32 No.107805976

Anonymous 01/08/26(Thu)12:06:32 No.107805976

>>107805757
https://github.com/Tavernikof/NoAssTavern

Anonymous
01/08/26(Thu)12:08:13 No.107805991

Anonymous 01/08/26(Thu)12:08:13 No.107805991

>>107805156
a single nigga with a mortar (thick walled pipe) and a couple dozen shells (home depot pipe bomb trip 2) could destory more or less the entirety of tsmc asml or a hundread others companies that bottleneck semicon so if it would have then it would already have been so

Anonymous
01/08/26(Thu)12:11:21 No.107806020

Anonymous 01/08/26(Thu)12:11:21 No.107806020

>>107792749
>Her schlong is as thick as a soda can. I don't think it'll fit.
Miku is a big girl.
https://files.catbox.moe/ifyr0w.jpg

Anonymous
01/08/26(Thu)12:13:14 No.107806029

Anonymous 01/08/26(Thu)12:13:14 No.107806029

>2026 the year of our lord
>mradernigger STILL splits the files instead of using gguf split, forcing you to waste time and requiring 2x the hard drive space to merge them before use

Anonymous
01/08/26(Thu)12:13:55 No.107806036

Anonymous 01/08/26(Thu)12:13:55 No.107806036

>>107806020
That's not 3 inches.

Anonymous
01/08/26(Thu)12:14:32 No.107806040

Anonymous 01/08/26(Thu)12:14:32 No.107806040

>>107806029
? Just do it in ram.

Anonymous
01/08/26(Thu)12:14:41 No.107806041

Anonymous 01/08/26(Thu)12:14:41 No.107806041

>>107806020
Fake news, the code would print the womb deficit and pull back far enough to fit (unless the user insists).

Anonymous
01/08/26(Thu)12:15:02 No.107806043

Anonymous 01/08/26(Thu)12:15:02 No.107806043

>>107805843
I'd like both sending images and ask it stuff.

>>107805855
>David Attenborough narrate
Lol.

>>107805931
Really? I thought the GLM stuff was relatively good?

Anonymous
01/08/26(Thu)12:27:35 No.107806143

Anonymous 01/08/26(Thu)12:27:35 No.107806143

I wish they'd train a 24B model on just english language. seems wasteful to support so many languages.

Anonymous
01/08/26(Thu)12:30:41 No.107806166

Anonymous 01/08/26(Thu)12:30:41 No.107806166

>>107806143
if we were gonna limit it to one language it should be hebrew tbqh

Anonymous
01/08/26(Thu)12:30:58 No.107806168

Anonymous 01/08/26(Thu)12:30:58 No.107806168

>>107806143
Apparently it's le better if it knows many languages badly.

Anonymous
01/08/26(Thu)12:32:54 No.107806184

Anonymous 01/08/26(Thu)12:32:54 No.107806184

>>107806166
Been there done that.
https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo

Anonymous
01/08/26(Thu)12:33:12 No.107806186

Anonymous 01/08/26(Thu)12:33:12 No.107806186

is v4/r2/whatever they call it gonna be bigger than v3/r1? they aren't gonna go fuckhueg kimi size on us right? i'm already at my upper limit...

Anonymous
01/08/26(Thu)12:33:47 No.107806195

Anonymous 01/08/26(Thu)12:33:47 No.107806195

>>107806166
What did he brew?

Anonymous
01/08/26(Thu)12:34:48 No.107806208

Anonymous 01/08/26(Thu)12:34:48 No.107806208

Anyone exited for the GLM flash turned into image gen model?

Anonymous
01/08/26(Thu)12:39:55 No.107806246

Anonymous 01/08/26(Thu)12:39:55 No.107806246

>A stunned silence fell over the crowd as they took in the scene before them. Anon standing naked with his teenage girlfriend in the middle of the street, a depraved smirk plastered across his face. The neighbors stood frozen in shock, unsure of how to react. Some of the women gasped in horror, while the men stared with a mixture of disgust and fascination.
Gemma-Sirs...

Anonymous
01/08/26(Thu)12:42:27 No.107806265

Anonymous 01/08/26(Thu)12:42:27 No.107806265

>>107806246
mixture of experts spouting mixture of purple prose

Anonymous
01/08/26(Thu)12:44:47 No.107806277

Anonymous 01/08/26(Thu)12:44:47 No.107806277

i put on my robe and wizard hat

Anonymous
01/08/26(Thu)12:47:30 No.107806297

Anonymous 01/08/26(Thu)12:47:30 No.107806297

>>107806186
It will be the same size but trained from the ground up at fp4. You rike?

Anonymous
01/08/26(Thu)12:47:52 No.107806299

Anonymous 01/08/26(Thu)12:47:52 No.107806299

>>107806043
>Really? I thought the GLM stuff was relatively good?
4.6V specifically is weird. It works OK with top_k=4 or similar sampling (essentially, everything except the most likely tokens are trash). Seems to understand images well.

Anonymous
01/08/26(Thu)12:49:00 No.107806310

Anonymous 01/08/26(Thu)12:49:00 No.107806310

>>107806299
Sounds like it'd lead to extremely repetitive (if not entirely deterministic) swipes, is that the case?

Anonymous
01/08/26(Thu)12:57:31 No.107806387

Anonymous 01/08/26(Thu)12:57:31 No.107806387

>>107806265
>Gemma
>moe

Anonymous
01/08/26(Thu)13:05:05 No.107806451

Anonymous 01/08/26(Thu)13:05:05 No.107806451

>>107806297
That'd actually be good, fit snugly in a 512GB setup

Anonymous
01/08/26(Thu)13:13:12 No.107806509

Anonymous 01/08/26(Thu)13:13:12 No.107806509

Chinese new years starts in a couple of weeks. If we don't have a major release until then, we're not getting anything until march

Anonymous
01/08/26(Thu)13:15:33 No.107806524

Anonymous 01/08/26(Thu)13:15:33 No.107806524

Unless proven otherwise any MoE with sub 32B active params I'll assume are completely ass at long context instruction following.

Those models are just benchmaxxed for Single question answering.

Anonymous
01/08/26(Thu)13:15:49 No.107806525

Anonymous 01/08/26(Thu)13:15:49 No.107806525

File: JambaPoo.png (105 KB, 1527x669)

105 KB PNG

Now that the dust has settled,
what went wrong?

Anonymous
01/08/26(Thu)13:16:19 No.107806530

Anonymous 01/08/26(Thu)13:16:19 No.107806530

>>107806509
is that where they cook the dogs alive in giant woks

Anonymous
01/08/26(Thu)13:17:35 No.107806542

Anonymous 01/08/26(Thu)13:17:35 No.107806542

>>107806525
There is no alternative to pure transformer models.

Anonymous
01/08/26(Thu)13:25:24 No.107806613

Anonymous 01/08/26(Thu)13:25:24 No.107806613

>>107806525
>3B

Anonymous
01/08/26(Thu)13:26:57 No.107806622

Anonymous 01/08/26(Thu)13:26:57 No.107806622

>>107806525
Shit architecture they invested a bit too much into and now have to cope with.

Anonymous
01/08/26(Thu)13:31:02 No.107806660

Anonymous 01/08/26(Thu)13:31:02 No.107806660

>>107806525
>Jamba2 Mini is an open source small language model built for enterprise reliability. With 12B active parameters (52B total), it delivers precise question answering without the computational overhead of reasoning models. The model's SSM-Transformer architecture provides a memory-efficient solution for production agent stacks where consistent, grounded outputs are critical.
That could be interesting, if their data is any good that is.

Anonymous
01/08/26(Thu)13:35:08 No.107806695

Anonymous 01/08/26(Thu)13:35:08 No.107806695

>>107806660
Their data is decent actually. Not great, but not terrible either. Problem is that this model is still using the same old Jamba architecture so it likely inherits its bad long context performance.

Anonymous
01/08/26(Thu)13:35:18 No.107806698

Anonymous 01/08/26(Thu)13:35:18 No.107806698

>>107806660
>2026
>expecting a "good data"

Anonymous
01/08/26(Thu)13:35:55 No.107806704

Anonymous 01/08/26(Thu)13:35:55 No.107806704

>>107806695
last one was dogshit though

Anonymous
01/08/26(Thu)13:36:16 No.107806706

Anonymous 01/08/26(Thu)13:36:16 No.107806706

>>107806525
to be fair their hq got bombed in the 12 day war and then the Israeli government probably put them under gag order about it. So they really had to stop and pick up the pieces.

Anonymous
01/08/26(Thu)13:36:44 No.107806711

Anonymous 01/08/26(Thu)13:36:44 No.107806711

https://github.com/ggml-org/llama.cpp/pull/18680
chad vibecoder drops an absolute TRVTHNVKE on chud llama.cpp maintainers
>This requirement is like in the 60s when people thought compilers were sketchy. Whether AI or a compiler, they translate one language to another.
>English is a far superior programming language than C++. We built civilizations with verbal spoken languages, but the moment we are endowed with the opportunity to aim its raw power towards computers, we insist that typing more characters than necessary on a mechanical keyboard is the "safe" way and that disclosure that the English programming language was used is a "contribution guideline".
>The programming community needs to seriously look at the big picture: we are ascending a level of abstraction of language thanks to yet another translation layer we call AI. Having contempt over such is the same contempt people had when people saw the output of compilers in the 60s: "it's going to make mistakes and assumptions not intended by the programmer", they said. Yet, this community is literally making AI tooling. What a shame and lack of foresight.

Anonymous
01/08/26(Thu)13:37:16 No.107806714

Anonymous 01/08/26(Thu)13:37:16 No.107806714

>>107806704
That's what I said.

Anonymous
01/08/26(Thu)13:38:14 No.107806721

Anonymous 01/08/26(Thu)13:38:14 No.107806721

>>107806714
>decent actually. Not great, but not terrible either.
>dogshit
yeah same shit i guess

Anonymous
01/08/26(Thu)13:39:07 No.107806728

Anonymous 01/08/26(Thu)13:39:07 No.107806728

>>107806721
I'm talking about their data, not the model itself.

Anonymous
01/08/26(Thu)13:39:56 No.107806737

Anonymous 01/08/26(Thu)13:39:56 No.107806737

>>107806711
Honestly I don't give a fuck if you vibecoded your shit or not. But most vibe coded slop is an unreadable garbage mess that is the first thing you learn never to do if you actually take programing classes.

Anonymous
01/08/26(Thu)13:40:46 No.107806743

Anonymous 01/08/26(Thu)13:40:46 No.107806743

>>107806728
how do you know, do they publish it anywhere, cause otherwise I see no reason to believe it's any better than the last model was

Anonymous
01/08/26(Thu)13:46:56 No.107806785

Anonymous 01/08/26(Thu)13:46:56 No.107806785

>>107806711
Yeah, and early compilers would output horrible, verbose, inefficient, and even broken assembler. Using a compiler back then meant fixing the assembler yourself manually, not shipping broken code because you didn't know how to fix it. Yeah, maybe eventually most people will be programming in verbal spoken languages but that is not the reality today just because this entitled retarded wants it to be.

Anonymous
01/08/26(Thu)13:48:04 No.107806795

Anonymous 01/08/26(Thu)13:48:04 No.107806795

>>107806711
can't wait for the code monkeys to whine and complain when llama.cpp-abliterated leaves both llama.cpp and ik_llama.cpp in the dust

Anonymous
01/08/26(Thu)13:48:05 No.107806797

Anonymous 01/08/26(Thu)13:48:05 No.107806797

>>107806728
>we mid-trained Jamba2 on 500B carefully curated tokens, with a higher representation of math and code in the mix, along with high-quality web data and long documents.
eh doesn't sound any different than nemotron slop

Anonymous
01/08/26(Thu)13:50:04 No.107806816

Anonymous 01/08/26(Thu)13:50:04 No.107806816

File: GUFIkOWX0AEXX5a.jpg (671 KB, 2048x1457)

671 KB JPG

>>107805991
Imagine spending brainpower on what you could destroy (you couldn't btw, look at TW on a map)
Instead of what you can build
Third world mentality can't be changed

Anonymous
01/08/26(Thu)13:52:12 No.107806830

Anonymous 01/08/26(Thu)13:52:12 No.107806830

>>107806737
The worse part is when they don't even review their own code and just submit dogshit PRs that waste everyones time and kill the braincells of anyone who dares read the code.

It's literal psychological warfare. Our brains aren't equipped to deal with the kind of slop AI produces. It looks coherent just enough for your brain to try and piece it together, but it's always broken in very subtle ways that make you go "huh? why is it like this?" then you waste half an hour trying to figure out wtf is going on. only to realize, no there never was a reason, it was just pure hallucination.

Anonymous
01/08/26(Thu)13:53:55 No.107806843

Anonymous 01/08/26(Thu)13:53:55 No.107806843

File: file.png (24 KB, 588x158)

24 KB PNG

pizza is not happy about the jamba

Anonymous
01/08/26(Thu)13:55:12 No.107806853

Anonymous 01/08/26(Thu)13:55:12 No.107806853

>>107806743
I tested the old Jamba's knowledge and censorship. It did ok in those tests. In my logic and long context tests it failed, so it leads me to believe they have an architecture problem and not nearly as bad data, which again isn't perfect but it's not the worst we've seen.

>>107806797
They're still using the same pre-trained model supposedly, but yeah the fine tuning on slop will not help. Nor will the same garbage architecture.

Anonymous
01/08/26(Thu)13:55:38 No.107806856

Anonymous 01/08/26(Thu)13:55:38 No.107806856

>muh cartels, muh nopolies
nope, it's just the infinite demand and that will never change now that we've solved intelligence; we can ALWAYS turn more compute into more productivity with no limit

Anonymous
01/08/26(Thu)13:56:57 No.107806867

Anonymous 01/08/26(Thu)13:56:57 No.107806867

>>107806856
You're absolutely right! The K*reans aren't know to price fix their shit and Sam has no vested interest in making local computing more expensive.

Anonymous
01/08/26(Thu)14:00:05 No.107806889

Anonymous 01/08/26(Thu)14:00:05 No.107806889

>>107806299
yes compared to 4.5 air and especially compared to iceblink

Anonymous
01/08/26(Thu)14:30:41 No.107807174

Anonymous 01/08/26(Thu)14:30:41 No.107807174

File: 1736718455545785.jpg (86 KB, 672x1165)

86 KB JPG

>>107802563
The meta aspect of her character works really well as a LLM persona

Anonymous
01/08/26(Thu)14:32:20 No.107807191

Anonymous 01/08/26(Thu)14:32:20 No.107807191

>>107803847
whose POV

Anonymous
01/08/26(Thu)14:32:59 No.107807197

Anonymous 01/08/26(Thu)14:32:59 No.107807197

>>107807191
mine

Anonymous
01/08/26(Thu)15:26:59 No.107807622

Anonymous 01/08/26(Thu)15:26:59 No.107807622

is there any better option the 395+ max with 128gb is able to run that i couldn't run with a consumer gpu or do i have to go with some model between 7b-31b with everything after that being
>700b+
model

Anonymous
01/08/26(Thu)15:55:59 No.107807880

Anonymous 01/08/26(Thu)15:55:59 No.107807880

>>107807622
There's a chance that a lobotomized GLM 4.7 quant is the best you can on that hardware.

Anonymous
01/08/26(Thu)16:03:54 No.107807944

Anonymous 01/08/26(Thu)16:03:54 No.107807944

What is currently the top tier maximum performance local LLM model you can run if hardware is not an issue?

Anonymous
01/08/26(Thu)16:05:25 No.107807967

Anonymous 01/08/26(Thu)16:05:25 No.107807967

>>107807944
kimi k2?

Anonymous
01/08/26(Thu)16:14:55 No.107808042

Anonymous 01/08/26(Thu)16:14:55 No.107808042

>>107807944
google/switch-c-2048

Anonymous
01/08/26(Thu)16:15:33 No.107808050

Anonymous 01/08/26(Thu)16:15:33 No.107808050

>>107807944
midnight miqu

Anonymous
01/08/26(Thu)16:19:56 No.107808084

Anonymous 01/08/26(Thu)16:19:56 No.107808084

>>107807944
StableLM 7B

Anonymous
01/08/26(Thu)16:30:41 No.107808174

Anonymous 01/08/26(Thu)16:30:41 No.107808174

>>107807944
Pyg6B

Anonymous
01/08/26(Thu)16:46:21 No.107808292

Anonymous 01/08/26(Thu)16:46:21 No.107808292

File: 71baf062ad2b9afee160bad20(...).jpg (47 KB, 735x588)

47 KB JPG

>>107808050
There's a name I haven't heard in a while...

Anonymous
01/08/26(Thu)16:57:52 No.107808394

Anonymous 01/08/26(Thu)16:57:52 No.107808394

>>107807944
I like how this gets multiple shitpost replies despite this post having no info, but the posters who shit on anyone who gives no info when they run into any issue are nowhere to be found. And no, I will not be responding to any of you jobless faggots who reply to me

Anonymous
01/08/26(Thu)17:01:31 No.107808434

Anonymous 01/08/26(Thu)17:01:31 No.107808434

>>107808394
>despite this post having no info
> if hardware is not an issue
the info people usually want is hardware constraints go figure

Anonymous
01/08/26(Thu)17:18:33 No.107808548

Anonymous 01/08/26(Thu)17:18:33 No.107808548

>>107804573
Its pretty much the same except replace nemo with Mistral Small 3.2 24B Instruct 2506. Q5 on a 24gb VRAM card gives you around a 30k-40k context size. Alternatively you can try GLM 4.5 Air. 11 or so t/s using UD-IQ2_M with 24gb VRAM and 32gb RAM at a 12k context size and n-cpu-moe=27 under extra flags.

Anonymous
01/08/26(Thu)17:19:18 No.107808556

Anonymous 01/08/26(Thu)17:19:18 No.107808556

https://www.cve.org/CVERecord?id=CVE-2026-21869
https://security-tracker.debian.org/tracker/CVE-2026-21869
>8.8 HIGH
llama.cpp bros what the fuck

Anonymous
01/08/26(Thu)17:21:54 No.107808569

Anonymous 01/08/26(Thu)17:21:54 No.107808569

>>107808556
>llama.cpp server's completion endpoints
Stopped reading there. Every SaaS out there is using vllm, no one uses that piece of shit on a server.

Anonymous
01/08/26(Thu)17:24:20 No.107808584

Anonymous 01/08/26(Thu)17:24:20 No.107808584

>>107808556
Irrelevant. No sane person exposes llama-server to the internet.
Though I am puzzled by the insistence to use bare pointers and separate size variables to keep track of arrays instead of using vectors.

Anonymous
01/08/26(Thu)17:31:13 No.107808629

Anonymous 01/08/26(Thu)17:31:13 No.107808629

>>107808556
Good thing this is LOCAL models general and not cloud-hosting models general

Anonymous
01/08/26(Thu)17:45:20 No.107808717

Anonymous 01/08/26(Thu)17:45:20 No.107808717

>>107808548
What models would a 16vramlet+32sysmem use?

Anonymous
01/08/26(Thu)17:54:23 No.107808778

Anonymous 01/08/26(Thu)17:54:23 No.107808778

>>107808717
https://huggingface.co/bartowski/google_gemma-3-270m-it-GGUF

Anonymous
01/08/26(Thu)18:02:25 No.107808830

Anonymous 01/08/26(Thu)18:02:25 No.107808830

>>107808717
Probably a smaller quant of Mistral Small 3.2, say around q3 or so.
>Gemma 3 12b is good but not the best for ERP
>Qwen3 30B A3B instruct /Qwen3 4b are less "censored" than Gemma, but has dry prose.

Anonymous
01/08/26(Thu)18:02:50 No.107808834

Anonymous 01/08/26(Thu)18:02:50 No.107808834

File: tetoMyBeloved.png (80 KB, 970x1075)

80 KB PNG

Anonymous
01/08/26(Thu)18:07:59 No.107808873

Anonymous 01/08/26(Thu)18:07:59 No.107808873

File: exhausted-wojak.gif (48 KB, 636x552)

48 KB GIF

I pulled the trigger on the parts and now this is how I feel after looking at my bank account.

Anonymous
01/08/26(Thu)18:12:47 No.107808914

Anonymous 01/08/26(Thu)18:12:47 No.107808914

>>107808834
Kiss.

Anonymous
01/08/26(Thu)18:15:25 No.107808934

Anonymous 01/08/26(Thu)18:15:25 No.107808934

>>107808873
Like you had any other use for that money.

Anonymous
01/08/26(Thu)18:23:47 No.107808990

Anonymous 01/08/26(Thu)18:23:47 No.107808990

>>107808934
He could have bought so many migu figurines instead.

Anonymous
01/08/26(Thu)18:27:44 No.107809009

Anonymous 01/08/26(Thu)18:27:44 No.107809009

File: Screenshot 2025-09-13 at (...).png (1.97 MB, 1531x827)

1.97 MB PNG

Hi. I have an AMD GPU (RX 7700 12GB)
Should I even bother trying to run models locally? I heard AMD is much inferior to nvidia cards when it comes to AI stuff.

Anonymous
01/08/26(Thu)18:28:00 No.107809011

Anonymous 01/08/26(Thu)18:28:00 No.107809011

File: tetoStencil.png (621 KB, 1024x1536)

621 KB PNG

>>107808914

Anonymous
01/08/26(Thu)18:29:45 No.107809023

Anonymous 01/08/26(Thu)18:29:45 No.107809023

>>107809009
You can run the very worst yet still usable models.

Anonymous
01/08/26(Thu)18:30:23 No.107809028

Anonymous 01/08/26(Thu)18:30:23 No.107809028

>>107808873
You do plan on using the parts to make money using local models don't you? You're buying shovels to dig up the gold after all.

>>107809009
Linux yes. Windows no.

Anonymous
01/08/26(Thu)18:31:47 No.107809035

Anonymous 01/08/26(Thu)18:31:47 No.107809035

>>107809023
any examples?

>>107809028
I'm on linux.

Anonymous
01/08/26(Thu)18:37:24 No.107809075

Anonymous 01/08/26(Thu)18:37:24 No.107809075

>>107809035
https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#hip It wont be as fast as nvidia cards in the same price bracket but since you already have it heres the instructions you need to build llama.cpp for rocm, kobold.cpp has a rocm binary build you can grab on their github readme too if you prefer both work well enough in my experience albeit only with 6800u and 7840u APUs not AMD dedicated GPUs.

Anonymous
01/08/26(Thu)18:39:48 No.107809088

Anonymous 01/08/26(Thu)18:39:48 No.107809088

>>107809075
Thank you. I'll take a look.

Anonymous
01/08/26(Thu)18:43:16 No.107809110

Anonymous 01/08/26(Thu)18:43:16 No.107809110

Well, it seems like I won't be building a big RAM machine in the near future. Have there been any significant improvements in the 12B ~ 30B space in the last half a year or so?

Anonymous
01/08/26(Thu)18:46:36 No.107809131

Anonymous 01/08/26(Thu)18:46:36 No.107809131

>>107809110
nemo is still the recommended 12b other than that some anons say small 3.2 24b is alright, the rest in the range are pretty shit

Anonymous
01/08/26(Thu)18:48:54 No.107809146

Anonymous 01/08/26(Thu)18:48:54 No.107809146

File: 1758850614656939.jpg (60 KB, 735x696)

60 KB JPG

>>107809009
>Reddit
Funny picture though

The hardware on AMD and NVIDIA are very similar but AMD's software is a mess. You should see it improve in 1-2 years as people use tile abstractions

Anonymous
01/08/26(Thu)18:49:43 No.107809150

Anonymous 01/08/26(Thu)18:49:43 No.107809150

https://github.com/ikawrakow/ik_llama.cpp/pull/1089 finally fixed my slow TG for GLM 4.5 air. thanks ivan, sorry for calling you a nigger in the past.

INFO [ print_timings] prompt eval time = 7740.50 ms / 21133 tokens ( 0.37 ms per token, 2730.18 tokens per second) | tid="129596276068352" id_slot=0 id_task=0 t_prompt_processing=7740.502 n_prompt_tokens_processed=21133 t_token=0.3662755879430275 n_tokens_second=2730.184683112284
INFO [ print_timings] generation eval time = 52060.89 ms / 2086 runs ( 24.96 ms per token, 40.07 tokens per second) | tid="129596276068352" id_slot=0 id_task=0 t_token_generation=52060.886 n_decoded=2086 t_token=24.95727996164909 n_tokens_second=40.06846906139861

Anonymous
01/08/26(Thu)18:53:17 No.107809173

Anonymous 01/08/26(Thu)18:53:17 No.107809173

>>107809028
>make money using local models.
Outside of renting my machine out on runpod, what do you propose? Maybe an uncensored chatbot I can serve over an API for degenerate coomers?

Anonymous
01/08/26(Thu)19:19:47 No.107809341

Anonymous 01/08/26(Thu)19:19:47 No.107809341

File: de.png (2 KB, 193x28)

2 KB PNG

Anonymous
01/08/26(Thu)19:22:07 No.107809357

Anonymous 01/08/26(Thu)19:22:07 No.107809357

>>107809341
kek

Anonymous
01/08/26(Thu)19:27:26 No.107809384

Anonymous 01/08/26(Thu)19:27:26 No.107809384

>>107809173
Gee Anon how many 96GB Blackwell RTX 8000s did you buy? Maybe you can make an AI Vtuber for superberries from simps like that turtle guy if you're lucky enough to catch the audience. Or maybe use it in n8n automation to do work for you.
I don't know what you'd personally be able to make use of in your life, maybe you can tell your LLMs what you do and ask them how they could help you make money.

Anonymous
01/08/26(Thu)19:33:01 No.107809418

Anonymous 01/08/26(Thu)19:33:01 No.107809418

>>107809384
nta but i was about to do this until i remembered this website is 90% jeets now and i don't wanna scrape indian curry rape logs

Anonymous
01/08/26(Thu)20:37:19 No.107809843

Anonymous 01/08/26(Thu)20:37:19 No.107809843

Easiest way to make NSFW dialogue using AI?

Anonymous
01/08/26(Thu)20:45:29 No.107809905

Anonymous 01/08/26(Thu)20:45:29 No.107809905

>>107809843
ah ah mistress

Anonymous
01/08/26(Thu)20:58:46 No.107809994

Anonymous 01/08/26(Thu)20:58:46 No.107809994

File: 628714.jpg (27 KB, 737x573)

27 KB JPG

Are you able to connect koboldcpp to outside shit like a computer vision or any agentic shit? It has by far the most extensive potential for a ai wife base. Everything else sucks or is a fucking Jan clone app.

Anonymous
01/08/26(Thu)21:20:29 No.107810135

Anonymous 01/08/26(Thu)21:20:29 No.107810135

>>107809994
kobold is basically just a wrapper around llama. It has all the standard openAI endpoints.

Anonymous
01/08/26(Thu)21:53:22 No.107810317

Anonymous 01/08/26(Thu)21:53:22 No.107810317

>>107806530
yeah

Anonymous
01/08/26(Thu)23:05:06 No.107810721

Anonymous 01/08/26(Thu)23:05:06 No.107810721

can you prefill using mikupad?

Anonymous
01/08/26(Thu)23:07:05 No.107810732

Anonymous 01/08/26(Thu)23:07:05 No.107810732

File: Ye+_a52a550d89412463a2af5(...).jpg (22 KB, 229x209)

22 KB JPG

I'm going to try GLM 4.6 for the 6th time. Any way to make it stop parroting? /lmg/ is basically the only place that likes this model now, so I assume there's at least one non-shill that actually knows a trick or something with it.

Anonymous
01/08/26(Thu)23:08:30 No.107810736

Anonymous 01/08/26(Thu)23:08:30 No.107810736

>>107810732
Swipe a few dozen times and hope one is lucky, that's about all you can do. It's an awful model for RP.

Anonymous
01/08/26(Thu)23:09:12 No.107810740

Anonymous 01/08/26(Thu)23:09:12 No.107810740

>>107810736
>cancels download

Anonymous
01/08/26(Thu)23:11:47 No.107810753

Anonymous 01/08/26(Thu)23:11:47 No.107810753

>>107810740
It is unfortunate. Even Air is pretty good for its size, but both models' parroting is one of the most obvious and egregious model quirks ever. Once you notice it, you can't not notice.
I blame the fact that it's a hybrid thinking model. People need to stop with that shit.

Anonymous
01/08/26(Thu)23:11:56 No.107810754

Anonymous 01/08/26(Thu)23:11:56 No.107810754

>>107810732
Since the Llama 2 era, I haven't had repetition issues. How shitty and boring is your input that it makes the model repeat itself?

Anonymous
01/08/26(Thu)23:12:50 No.107810759

Anonymous 01/08/26(Thu)23:12:50 No.107810759

>>107810754
Not remotely what he's talking about, stop defending models you've never even used.

Anonymous
01/08/26(Thu)23:13:38 No.107810763

Anonymous 01/08/26(Thu)23:13:38 No.107810763

File: parrot.png (750 KB, 678x453)

750 KB PNG

>>107810754
Llama 2 era?

Anonymous
01/08/26(Thu)23:16:28 No.107810780

Anonymous 01/08/26(Thu)23:16:28 No.107810780

File: file.png (28 KB, 421x559)

28 KB PNG

>>107810732
Haven't had any issues. I use first person asterisks, all lowercase dialogue, and picrel for template
>User:
>*grabs your hand, pulls it down towards my waistband* so are you going to, or what?

Anonymous
01/08/26(Thu)23:17:25 No.107810787

Anonymous 01/08/26(Thu)23:17:25 No.107810787

>>107810732
unironic skill issue. no, i will not help you.

Anonymous
01/08/26(Thu)23:18:05 No.107810795

Anonymous 01/08/26(Thu)23:18:05 No.107810795

>>107810780
>>107810787
Post some webms of you getting outputs without parrotting

Anonymous
01/08/26(Thu)23:20:35 No.107810808

Anonymous 01/08/26(Thu)23:20:35 No.107810808

File: smug winking monkey finger.png (1.02 MB, 1079x1113)

1.02 MB PNG

>>107810795
Alright gimme a minute to load everything up, pre-write the model's response by hand, insert the response in prefill, then instruct it in the prompt template (won't be visible) to repeat the pre-written response.

Anonymous
01/08/26(Thu)23:20:38 No.107810810

Anonymous 01/08/26(Thu)23:20:38 No.107810810

>>107810721
>>>/vg/aids/

Anonymous
01/08/26(Thu)23:21:47 No.107810822

Anonymous 01/08/26(Thu)23:21:47 No.107810822

>>107810808
I accept your concession.

Anonymous
01/08/26(Thu)23:22:43 No.107810830

Anonymous 01/08/26(Thu)23:22:43 No.107810830

>>107810732
>>107810822
What is your favorite model, good sir?

Anonymous
01/08/26(Thu)23:24:05 No.107810841

Anonymous 01/08/26(Thu)23:24:05 No.107810841

>>107810830
I don't run models.

Anonymous
01/08/26(Thu)23:24:05 No.107810842

Anonymous 01/08/26(Thu)23:24:05 No.107810842

File: 1763247032864744.jpg (86 KB, 1036x946)

86 KB JPG

Jan.ai or AnythingLLM for a casual user who wants to try around local models?

Anonymous
01/08/26(Thu)23:25:43 No.107810853

Anonymous 01/08/26(Thu)23:25:43 No.107810853

>>107810842
>Jan.ai or AnythingLLM
I don't know what either of these shits are and I've been using local models for years. Read the OP, nigger. If you can't read then you shouldn't be using text models.

Anonymous
01/08/26(Thu)23:26:23 No.107810861

Anonymous 01/08/26(Thu)23:26:23 No.107810861

>>107810842
oobabooga

Anonymous
01/08/26(Thu)23:28:45 No.107810876

Anonymous 01/08/26(Thu)23:28:45 No.107810876

>>107810853
there's nothing in the OP that recommends a UI

or it's literally guides from 2024

Anonymous
01/08/26(Thu)23:29:55 No.107810886

Anonymous 01/08/26(Thu)23:29:55 No.107810886

>>107810876
These are text models that generate text
You are brown and cannot read
There is no reason for you to use text models

Anonymous
01/08/26(Thu)23:30:05 No.107810888

Anonymous 01/08/26(Thu)23:30:05 No.107810888

>>107810876
not much has changed since 2024.

Anonymous
01/08/26(Thu)23:34:32 No.107810911

Anonymous 01/08/26(Thu)23:34:32 No.107810911

>>107810888
you must be joking

>>107810886
>These are text models that generate text
jeet who doesn't actually understand anything and wants you to follow le hecking youtuber/guide even though it doesn't address anything lol. projecting brownoid post.

Anonymous
01/08/26(Thu)23:35:48 No.107810916

Anonymous 01/08/26(Thu)23:35:48 No.107810916

>>107810911
Stick to @grok, poopskin. You'll go nowhere.

Anonymous
01/08/26(Thu)23:36:00 No.107810918

Anonymous 01/08/26(Thu)23:36:00 No.107810918

>>107810911
>you must be joking
not even a little bit. kobold.cpp + sillytavern has been the meta for like 3 years now.

Anonymous
01/08/26(Thu)23:38:27 No.107810931

Anonymous 01/08/26(Thu)23:38:27 No.107810931

>>107810918
For brainlets, for sure.
Don't you are in a position to endorse a specific software combination too much.

Anonymous
01/08/26(Thu)23:39:05 No.107810936

Anonymous 01/08/26(Thu)23:39:05 No.107810936

>>107810931
>For brainlets, for sure.
>Don't you are in a position

Anonymous
01/08/26(Thu)23:39:34 No.107810942

Anonymous 01/08/26(Thu)23:39:34 No.107810942

File: 1752285352827212.png (15 KB, 634x783)

15 KB PNG

>>107810931
>Don't you are

Anonymous
01/08/26(Thu)23:40:07 No.107810945

Anonymous 01/08/26(Thu)23:40:07 No.107810945

I thought the GLM poster was banned ngl

Anonymous
01/08/26(Thu)23:42:14 No.107810957

Anonymous 01/08/26(Thu)23:42:14 No.107810957

>>107810942
>>107810936
Clearly is mentioned there. You just obsessed with...

Anonymous
01/08/26(Thu)23:42:36 No.107810959

Anonymous 01/08/26(Thu)23:42:36 No.107810959

>>107810842
I haven't used either of them because I have a superiority complex and think I'm too good for them, but I've heard neutral-trending-positive things about AnythingLLM from a couple of normies. Haven't heard anything about jan.

Anonymous
01/08/26(Thu)23:43:58 No.107810966

Anonymous 01/08/26(Thu)23:43:58 No.107810966

>>107810916
>>107810931
Same jeet

You can tell by the insecure pathetic resource guarding behavior lol

Anonymous
01/08/26(Thu)23:45:12 No.107810975

Anonymous 01/08/26(Thu)23:45:12 No.107810975

honestly for assistant stuff I just use openwebui.

Anonymous
01/08/26(Thu)23:45:50 No.107810978

Anonymous 01/08/26(Thu)23:45:50 No.107810978

>>107810966
What do you mean?

Anonymous
01/08/26(Thu)23:51:00 No.107811015

Anonymous 01/08/26(Thu)23:51:00 No.107811015

My 5080 get 200t/s on qwen 30b3a iq3_m lol

Anonymous
01/08/26(Thu)23:53:27 No.107811025

Anonymous 01/08/26(Thu)23:53:27 No.107811025

>>107810886
Didn't think so.

Anonymous
01/09/26(Fri)00:02:19 No.107811070

Anonymous 01/09/26(Fri)00:02:19 No.107811070

>>107810931
retard

Anonymous
01/09/26(Fri)00:06:16 No.107811086

Anonymous 01/09/26(Fri)00:06:16 No.107811086

is there any better option the is able to run that i couldn't run with a consumer gpu or do i have to go with some model between 14B with everything after that being model?

Anonymous
01/09/26(Fri)00:10:09 No.107811109

Anonymous 01/09/26(Fri)00:10:09 No.107811109

File: 1738233752760914.png (108 KB, 314x278)

108 KB PNG

>>107811086
?????????

Anonymous
01/09/26(Fri)00:10:43 No.107811110

Anonymous 01/09/26(Fri)00:10:43 No.107811110

>>107811086
Did gpt2 write this?

Anonymous
01/09/26(Fri)00:16:14 No.107811136

Anonymous 01/09/26(Fri)00:16:14 No.107811136

>>107811086
Read this with an indian accent, then it makes sense

Anonymous
01/09/26(Fri)00:21:41 No.107811164

Anonymous 01/09/26(Fri)00:21:41 No.107811164

>>107809009
works fine for me, i use linux though and i've heard it's a different situation in windows

Anonymous
01/09/26(Fri)00:38:18 No.107811236

Anonymous 01/09/26(Fri)00:38:18 No.107811236

>>107811136
damn. you're right.

Anonymous
01/09/26(Fri)01:23:42 No.107811420

Anonymous 01/09/26(Fri)01:23:42 No.107811420

You should rename this general to /Obsessed With Indians/.

Anonymous
01/09/26(Fri)01:37:56 No.107811465

Anonymous 01/09/26(Fri)01:37:56 No.107811465

>>107811445
This but unironically. So long as I have to deal with these creatures every day at work I WILL be seething about them at every moment possible

Anonymous
01/09/26(Fri)02:07:46 No.107811602

Anonymous 01/09/26(Fri)02:07:46 No.107811602

File: 1765602188467223.webm (1.89 MB, 450x360)

1.89 MB WEBM

>>107811420
They're an interesting species.

Anonymous
01/09/26(Fri)02:12:40 No.107811638

Anonymous 01/09/26(Fri)02:12:40 No.107811638

File: G-KP6X3WsAI_tm_.jpg (94 KB, 696x1018)

94 KB JPG

GLM5 is training. Here is hoping they make a 1T. It could legit beat current cloud models at that size imo

Anonymous
01/09/26(Fri)02:14:08 No.107811645

Anonymous 01/09/26(Fri)02:14:08 No.107811645

>>107811638
If they can't fix the parroting then it may as well be 0.1B

Anonymous
01/09/26(Fri)02:14:30 No.107811650

Anonymous 01/09/26(Fri)02:14:30 No.107811650

>>107811638
I would have to make more purchasings of ram modules if that happened. That would not be fun.

Anonymous
01/09/26(Fri)02:15:02 No.107811653

Anonymous 01/09/26(Fri)02:15:02 No.107811653

File: tf2 heavy rp blushes and (...).png (192 KB, 512x512)

192 KB PNG

>>107811638
>Let's ship together.
o-okay..

Anonymous
01/09/26(Fri)02:15:04 No.107811654

Anonymous 01/09/26(Fri)02:15:04 No.107811654

>>107811645
parroting?

Anonymous
01/09/26(Fri)02:16:17 No.107811663

Anonymous 01/09/26(Fri)02:16:17 No.107811663

>>107811654
"parroting?" I repeat, testing the words.

Anonymous
01/09/26(Fri)02:18:46 No.107811675

Anonymous 01/09/26(Fri)02:18:46 No.107811675

>>107811663
I have not had any such issues myself, check your settings / prompt / formatting ect...

Anonymous
01/09/26(Fri)02:21:30 No.107811689

Anonymous 01/09/26(Fri)02:21:30 No.107811689

>>107811675
Sure you haven't bro, meanwhile my Qwen 0.6B shits on any cloud model for any use case.

Anonymous
01/09/26(Fri)02:22:28 No.107811695

Anonymous 01/09/26(Fri)02:22:28 No.107811695

>>107811689
ok...

Anonymous
01/09/26(Fri)02:27:31 No.107811724

Anonymous 01/09/26(Fri)02:27:31 No.107811724

>>107811638
pls let glm5 air be a 130b moe

Anonymous
01/09/26(Fri)02:31:34 No.107811746

Anonymous 01/09/26(Fri)02:31:34 No.107811746

>>107811724
fuck no, ive had enough of small models that dont know anything. GLM4.7's main weakness is not knowing as much as bigger cloud models, but Kimi is retarded and clearly massively undertrained.
I want a kimi sized glm.

Anonymous
01/09/26(Fri)02:59:16 No.107811886

Anonymous 01/09/26(Fri)02:59:16 No.107811886

File: ComfyUI_temp_qmpay_00050_.png (1.54 MB, 1344x1024)

1.54 MB PNG

Are the smaller models ~30B that are retrained with bigger context able to work with it or is it all shitbakes?

Anonymous
01/09/26(Fri)03:05:18 No.107811926

Anonymous 01/09/26(Fri)03:05:18 No.107811926

>>107811886
usually no. native context is the only real context, and even that is usually fake. deepseek starts going to shit after like 10k tokens.

Anonymous
01/09/26(Fri)03:06:43 No.107811935

Anonymous 01/09/26(Fri)03:06:43 No.107811935

I do wonder how many of the people complaining about GLM parroting are using it with no thinking and/or nonstandard templates.
I always have reasoning on with the official template and it Just Werks. Though one of the things I've noticed in its reasoning output is that it very often starts by summarizing/analyzing the user message, which makes me think the "parroting" might be the model trying to replicate that when it doesn't have a thinking block

Anonymous
01/09/26(Fri)03:08:17 No.107811945

Anonymous 01/09/26(Fri)03:08:17 No.107811945

>>107811935
i never use thinking with glm or glm air and i never see this "parroting" shit that one schizo is always complaining about. pretty sure it is just one jealous poorfag that cant run the model.

Anonymous
01/09/26(Fri)03:14:58 No.107811982

Anonymous 01/09/26(Fri)03:14:58 No.107811982

>>107811886
>retrained with bigger context
Are you talking about finetunes? Generally no, tunes aren't going to be improving long context handling in a meaningful way. Modern ~30B models are usually fine up to around 32k context, but you will still see some gradual degradation as you go.

Anonymous
01/09/26(Fri)03:15:50 No.107811984

Anonymous 01/09/26(Fri)03:15:50 No.107811984

Hey Cuda Dev, do you think llama.cpp will ever reach vLLM levels of continuous batching?
I have a very varied setup of gpus, pro6000,5090,3090... and llama.cpp does the best job at balancing the weights, but the parallel request support of llama.cpp is quiet limited

Anonymous
01/09/26(Fri)03:36:14 No.107812066

Anonymous 01/09/26(Fri)03:36:14 No.107812066

https://spectrum.ieee.org/ai-coding-degrades
lol I have noticed the same phenomenon as what that article describes and I blame the benchmaxxing
LLMs will do absolutely anything to give off the appearance of code that works at the cost of making up fake data instead of loading the real data / config files etc

Anonymous
01/09/26(Fri)03:42:42 No.107812090

Anonymous 01/09/26(Fri)03:42:42 No.107812090

what to use that isn't sillytavern to send images to in a chat?

Anonymous
01/09/26(Fri)03:44:12 No.107812099

Anonymous 01/09/26(Fri)03:44:12 No.107812099

>>107812090
There is nothing better. AI software is dogshit and is only gettingworse.

Anonymous
01/09/26(Fri)03:45:35 No.107812107

Anonymous 01/09/26(Fri)03:45:35 No.107812107

>>107807191
Tickler.

Anonymous
01/09/26(Fri)03:53:03 No.107812140

Anonymous 01/09/26(Fri)03:53:03 No.107812140

>>107812090
Your options are sillytavern and koboldAI

llama.cpp CUDA dev !!yhbFjk57TDr
01/09/26(Fri)03:54:21 No.107812151

llama.cpp CUDA dev !!yhbFjk57TDr 01/09/26(Fri)03:54:21 No.107812151

>>107811984
The last time I benchmarked serving throughput on single RTX 4090s llama.cpp was already quite competitive.
In terms of speed the problem right now is mostly when multiple GPUs are used.
This is what I'm working on as we speak: properly parallelizing multiple ggml backends by wrapping them in a "meta backend" that internally splits ggml graphs in such a way that they can be executed in parallel and with less synchronization than --split-mode row.
I intend to do a generic, backend-agnostic implementation first but I don't know how competitive that will be with NCCL (NVIDIA's proprietary library for things like allreduce).
The requirements for NCCL are quite strict, I think the number of GPUs must be a power of 2 and each GPU must have an equal share of the data.
I will do the best I can with a generic implementation that works for an arbitrary number of GPUs with arbitrary --tensor-split but until I have a working implementation I simply won't know how important NCCL is in the first place.
Of course, for those scenarios where NCCL can be used we intend to enable it so llama.cpp may become "competitive" with vllm in the sense that it works well under the same limited conditions.

In terms of how the context size is distributed between multiple concurrent requests my opinion is that llama.cpp/ggml should implement a layer of indirection à la paged attention.
It would in principle not be difficult to extend the FlashAttention code to support this but it would need to be done for every backend and also require some support in the llama.cpp "user code".
As of right now I see little movement in that direction.

Hi all, Drummer here...
01/09/26(Fri)04:17:21 No.107812231

Hi all, Drummer here... 01/09/26(Fri)04:17:21 No.107812231

>>107811886
>>107811982
I'm probably going to get mogged for this, but Cydonia Redux 22B performed well at 40K, even 60K.

IIRC, the old Mistral 22B was only good up until 16K? Proves that it's worth tuning an old base with stronger data.

https://huggingface.co/TheDrummer/Cydonia-ReduX-22B-v1

Anonymous
01/09/26(Fri)04:25:37 No.107812270

Anonymous 01/09/26(Fri)04:25:37 No.107812270

>>107811746
>Kimi is retarded and clearly massively undertrained
Can you qualify that statement? Clearly we’re using different Kimi models if that’s your experience

Anonymous
01/09/26(Fri)04:35:09 No.107812310

Anonymous 01/09/26(Fri)04:35:09 No.107812310

Can you add vision to models that don't have it? Do we have realtime vision ~360p at least yet?

Anonymous
01/09/26(Fri)04:35:54 No.107812314

Anonymous 01/09/26(Fri)04:35:54 No.107812314

>>107812231
Judging long context handling reliably is hard. When a model approaches the end of its effective context limit, it doesn't always just immediately break down into incoherence. Some times it's more a matter of repetition increasing, outputs becoming more deterministic, or ability to handle complex tasks becoming worse.
Given that it's a drummer tune, I'm guessing your use case was roleplay. It may well handle other use cases much worse, or maybe it got worse in ways you just didn't notice.
There's one anon here who says he uses Nemo past 32K because he knows how to wrangle it, but for most people it would be completely unusable by then.

Anonymous
01/09/26(Fri)04:39:40 No.107812328

Anonymous 01/09/26(Fri)04:39:40 No.107812328

>llama 5
>claude 5
Where are they?

Anonymous
01/09/26(Fri)04:42:08 No.107812339

Anonymous 01/09/26(Fri)04:42:08 No.107812339

>>107812328
Meta got shit on and raped so hard over Llama 4 I doubt they'll be back anytime soon.
>claude 5
this is local models general

Anonymous
01/09/26(Fri)04:59:17 No.107812395

Anonymous 01/09/26(Fri)04:59:17 No.107812395

>>107812328
why is every single model iteration awaiting a fifth version except for gpt?

Anonymous
01/09/26(Fri)04:59:18 No.107812396

Anonymous 01/09/26(Fri)04:59:18 No.107812396

>>107812099
>>107812140
ok... thanks anon

Anonymous
01/09/26(Fri)05:03:14 No.107812417

Anonymous 01/09/26(Fri)05:03:14 No.107812417

Is this model actually real or was this made by some schizo?
https://huggingface.co/Abigail45/Nyx-Reasoner-8xFusion

Anonymous
01/09/26(Fri)05:04:53 No.107812432

Anonymous 01/09/26(Fri)05:04:53 No.107812432

>>107812417
>25 days old and no upload
Take a guess

Anonymous
01/09/26(Fri)05:10:49 No.107812450

Anonymous 01/09/26(Fri)05:10:49 No.107812450

File: nr8xf.png (56 KB, 613x378)

56 KB PNG

>>107812417
I'm sure the phi in the mix really helped.

Anonymous
01/09/26(Fri)05:12:12 No.107812453

Anonymous 01/09/26(Fri)05:12:12 No.107812453

>>107812231
Thank you for your service, Sir!

Anonymous
01/09/26(Fri)05:13:45 No.107812457

Anonymous 01/09/26(Fri)05:13:45 No.107812457

>>107812432
>>107812450
Right. So why was this repository even made? Do people make fake repositories to inflate stats or something?

Anonymous
01/09/26(Fri)05:16:38 No.107812471

Anonymous 01/09/26(Fri)05:16:38 No.107812471

What's the best between :
- GLM 4.6 IQ2_XXS / 115 GB
- GLM 4.5 Air Q8_0 / 117 GB

Anonymous
01/09/26(Fri)05:18:33 No.107812481

Anonymous 01/09/26(Fri)05:18:33 No.107812481

>>107812450
Why is there a random mlx thrown in?

Anonymous
01/09/26(Fri)05:18:40 No.107812482

Anonymous 01/09/26(Fri)05:18:40 No.107812482

>>107812457
Could have been a fucked merge/training run where the model never got to be uploaded. Paid compute instance died or whatever. It happened to anon some time ago.

Anonymous
01/09/26(Fri)05:19:47 No.107812493

Anonymous 01/09/26(Fri)05:19:47 No.107812493

>>107812471
Q8 of air is unnecessary over Q6. Q2_K_M or IQ2_M are the minimum for GLM 4.6 to behave properly.

Anonymous
01/09/26(Fri)05:24:25 No.107812519

Anonymous 01/09/26(Fri)05:24:25 No.107812519

>>107812481
Some retards merge multiple nemo finetunes because they thing each contributes their own "thing" to the mix. This retard went for one of each arch, i suppose. I'm surprised there isn't some onnx thrown in and a RAG model for some "good at the findings of informations".

Anonymous
01/09/26(Fri)05:24:56 No.107812525

Anonymous 01/09/26(Fri)05:24:56 No.107812525

>>107812457
>Right. So why was this repository even made
Nobody knows. There's tons of repos made that never get uploads, there's been a few cases of people just 1:1 re-uploading an existing model and claiming it's a finetune/abliteration/etc. Most of them don't have donation links or any obvious way to make money.

Anonymous
01/09/26(Fri)05:50:04 No.107812641

Anonymous 01/09/26(Fri)05:50:04 No.107812641

>>107812493
OK thanks, in that case between
GLM 4.6 IQ2_M and GLM 4.5 Air Q6/Q8?

Anonymous
01/09/26(Fri)05:53:13 No.107812659

Anonymous 01/09/26(Fri)05:53:13 No.107812659

>>107812641
nta and I never tested it but I believe that bigger models even quantized are smarter, better at following context and avoiding stupic misstakes
for exact difference between the two you have to ask somebody who tested both

Anonymous
01/09/26(Fri)06:13:18 No.107812769

Anonymous 01/09/26(Fri)06:13:18 No.107812769

>>107812659
Guess I'll go with the more recent GLM 4.6 IQ2_M even if I always heard <Q4 were memes.

Anonymous
01/09/26(Fri)06:16:55 No.107812794

Anonymous 01/09/26(Fri)06:16:55 No.107812794

>>107812641
In my experience IQ2_KL of big glm was a clear upgrade over Q8 air. IQ2_M is a bit smaller but I imagine it would still be superior

Anonymous
01/09/26(Fri)06:23:07 No.107812844

Anonymous 01/09/26(Fri)06:23:07 No.107812844

>>107812525
It helps building a portfolio if you're looking for a job in the field
t. someone in the know

Anonymous
01/09/26(Fri)06:24:16 No.107812851

Anonymous 01/09/26(Fri)06:24:16 No.107812851

>>107812769
>I always heard <Q4 were memes
Because most people are vramlets that use <30B models and those become braindead at such quants.

Anonymous
01/09/26(Fri)06:31:19 No.107812894

Anonymous 01/09/26(Fri)06:31:19 No.107812894

>>107812844
>It helps building a portfolio
>HF repo with no upload, 0 downloads, 0 comments
I doubt that. It'd be like making a linkedin account and not filling in any work history. If the HR whore you're hoping to impress falls for that, then you didn't need to actually make the HF repo in the first place, you could have just lied and given a fake link that she was never going to click on in the first place.

Anonymous
01/09/26(Fri)06:33:18 No.107812898

Anonymous 01/09/26(Fri)06:33:18 No.107812898

>>107812794
I guess you mean Q2_KL, and from what I understand IQ2_KL > Q2_KL even if Q2_KL is bigger?

>>107812851
Makes sense.

Anonymous
01/09/26(Fri)06:43:50 No.107812954

Anonymous 01/09/26(Fri)06:43:50 No.107812954

File: 1747305823007595.png (1.43 MB, 1020x765)

1.43 MB PNG

>>107803847

Anonymous
01/09/26(Fri)06:45:31 No.107812969

Anonymous 01/09/26(Fri)06:45:31 No.107812969

>>107812954
oh fuck

Anonymous
01/09/26(Fri)06:49:32 No.107812999

Anonymous 01/09/26(Fri)06:49:32 No.107812999

File: 1701586351737913.png (1.45 MB, 1202x1400)

1.45 MB PNG

>>107808292
Almost want to go back, but know I'd be disappointed
Still miqu era models had a certain *je ne sais quoi*

Anonymous
01/09/26(Fri)06:59:02 No.107813071

Anonymous 01/09/26(Fri)06:59:02 No.107813071

File: 1692756771016166.png (1002 KB, 900x1111)

1002 KB PNG

>>107812898
The I-quants/imatrix theoretically helps more at lower bits per weight, I would choose I-quants at Q3 or lower. Dunno what the performance penalty is these days, used to be "simple" quants like Q4_0 ran faster coz there isn't this secondary lookup for each weight. Maybe the kernels are good and it doesn't matter today, no idea. my setup is slow af regardless, patience is a virtue.

Anonymous
01/09/26(Fri)07:02:39 No.107813095

Anonymous 01/09/26(Fri)07:02:39 No.107813095

>>107813071
>used to be "simple" quants like Q4_0 ran faster coz there isn't this secondary lookup for each weight
That's still the case today, though I think the gap is closer than it used to be.

Anonymous
01/09/26(Fri)07:10:49 No.107813162

Anonymous 01/09/26(Fri)07:10:49 No.107813162

>>107812954
Redditor influx incoming. The usage of zoomer ebonics is going to increase a lot on this board.

Anonymous
01/09/26(Fri)07:11:56 No.107813171

Anonymous 01/09/26(Fri)07:11:56 No.107813171

>>107813162
It's an edit

Anonymous
01/09/26(Fri)07:13:39 No.107813177

Anonymous 01/09/26(Fri)07:13:39 No.107813177

>>107813171
Doesn't matter. They are soon here...

Anonymous
01/09/26(Fri)07:15:58 No.107813203

Anonymous 01/09/26(Fri)07:15:58 No.107813203

>>107813177
They will get filtered by reading being a hard requirement to set up local tools, like >>107810931

Anonymous
01/09/26(Fri)07:16:08 No.107813206

Anonymous 01/09/26(Fri)07:16:08 No.107813206

>>107812954
The actual thumbnail is the razor knock-off of the desktop hologram miku from a decade ago. Also how the fuck does this faggot upload 10 videos a day and all of them get 1.5M+ views? No wonder he's scared of AI slop.

Anonymous
01/09/26(Fri)07:19:31 No.107813229

Anonymous 01/09/26(Fri)07:19:31 No.107813229

File: big_yikes_fr_no_cap.png (290 KB, 788x775)

290 KB PNG

>>107812954

Anonymous
01/09/26(Fri)07:26:32 No.107813277

Anonymous 01/09/26(Fri)07:26:32 No.107813277

>>107812151
You can do better than NCCL without NCCL, I believe in you.

Anonymous
01/09/26(Fri)07:30:15 No.107813297

Anonymous 01/09/26(Fri)07:30:15 No.107813297

>>107812954
>useless react streamer has to say something about trash

Anonymous
01/09/26(Fri)07:31:16 No.107813304

Anonymous 01/09/26(Fri)07:31:16 No.107813304

File: leapflog to marxism general.png (117 KB, 192x263)

117 KB PNG

Anonymous
01/09/26(Fri)07:48:16 No.107813406

Anonymous 01/09/26(Fri)07:48:16 No.107813406

>>107812066
I imagine if you gave it a higher reward for saying 'I don't know' or disagreeing, you'd end up with less attempts at correct answers. Bringing the average score down but preferable to the alternative.

Anonymous
01/09/26(Fri)07:55:48 No.107813439

Anonymous 01/09/26(Fri)07:55:48 No.107813439

got a quick newfag quastion for people who use those the models for things other than gooning
my specs are GLM 4.6 Q2-M koboldcpp backend ST frontend ~40k context
instruct template enabled, system prompt set for chain of tought
I ask model (default, no character loaded) to perform a writing task
it gets to
"I will now [perform a task]. I will ensure [requiements are met]." and stops, and it's not even close to respose lenght limit
how come?
how to prevent that and ensure it actually performs a task?

Anonymous
01/09/26(Fri)08:00:19 No.107813464

Anonymous 01/09/26(Fri)08:00:19 No.107813464

>>107813439
I tried deleting that last paragraph from response and writing it myself as "Now [perform a task]. Ensure [requiements are met]." and it just reherses the last response

Anonymous
01/09/26(Fri)08:02:28 No.107813477

Anonymous 01/09/26(Fri)08:02:28 No.107813477

>>107803847
What is the prompt for this kind of "hovering over the girl" style of image?

Anonymous
01/09/26(Fri)08:03:17 No.107813479

Anonymous 01/09/26(Fri)08:03:17 No.107813479

>>107813477
try "top down view"

Anonymous
01/09/26(Fri)08:03:20 No.107813480

Anonymous 01/09/26(Fri)08:03:20 No.107813480

>>107813477
imminent_rape

Anonymous
01/09/26(Fri)08:04:06 No.107813485

Anonymous 01/09/26(Fri)08:04:06 No.107813485

>>107813439
That's fucking weird.
What does the request the backend actually received look like?

Anonymous
01/09/26(Fri)08:04:28 No.107813488

Anonymous 01/09/26(Fri)08:04:28 No.107813488

>>107813304
>muh open source is communism
grow up and learn some history

Anonymous
01/09/26(Fri)08:06:17 No.107813493

Anonymous 01/09/26(Fri)08:06:17 No.107813493

>>107813477
>fullbody above top_view
idk check the tags on a booru

Anonymous
01/09/26(Fri)08:07:55 No.107813502

Anonymous 01/09/26(Fri)08:07:55 No.107813502

>>107813485
not sure, it only says
Processing Prompt ( x / y tokens)
Generating ( x /y tokens)
in kobold terminal
I'm using SL default shit otherwise

Anonymous
01/09/26(Fri)08:08:59 No.107813511

Anonymous 01/09/26(Fri)08:08:59 No.107813511

>>107813502
You can enable verbose logging to see the full request and prompt it receives IIRC.

Anonymous
01/09/26(Fri)08:10:19 No.107813520

Anonymous 01/09/26(Fri)08:10:19 No.107813520

>>107813480
>>107813479
>>107813493
Thank you

Anonymous
01/09/26(Fri)08:19:53 No.107813588

Anonymous 01/09/26(Fri)08:19:53 No.107813588

>>107813439
maybe it's outputting a tool call token that kcpp interprets as stop
what's your full prompt, don't be shy
>>107813511
as anonny says look closely what is going in and out

Anonymous
01/09/26(Fri)08:21:52 No.107813601

Anonymous 01/09/26(Fri)08:21:52 No.107813601

>>107813588
>what's your full prompt, don't be shy
Not just the prompt, sometimes it's something else that's being sent in the request like a stop string he didn't know was there or the like.
In these cases, you really gotta inspect everything.

Anonymous
01/09/26(Fri)08:23:45 No.107813617

Anonymous 01/09/26(Fri)08:23:45 No.107813617

>>107813601
alredy banned EOS tokens
>>107813511
>>107813588
I'll do couple more tries then restart it with verbose logging

Anonymous
01/09/26(Fri)08:29:24 No.107813652

Anonymous 01/09/26(Fri)08:29:24 No.107813652

>>107813617
ok it seems to work, I just had to use AI-retardation proof command talk
ie: "DO NOT over analize the whole task again and go straight to execution."

Anonymous
01/09/26(Fri)08:32:40 No.107813666

Anonymous 01/09/26(Fri)08:32:40 No.107813666

>>107812769
I used to use miqu-1-70b.q2_K a lot with no issues. Mistral-Small-3.2-24B-Instruct-2506-UD-IQ2_XXS (to run fully on my 8gb GPU) on the other hand is too brain damaged and makes frequent spelling errors. So it depends on the model and size.
Not sure if total model parameters is more important or activated parameters in MoE. I remember Mixtral 8x7B being kinda good but somewhat retarded at Q3, no spelling errors but lot of looping and a bit incoherent output, could be just characteristic of the model though.

Anonymous
01/09/26(Fri)08:37:33 No.107813693

Anonymous 01/09/26(Fri)08:37:33 No.107813693

>>107813666
>makes frequent spelling errors
No modern LLM above 0.3b should do spelling errors unless you're running severely fucked samplers. They can be retarded, slopped and have horrible coherency but they definitely shouldn't do spelling errors.

Anonymous
01/09/26(Fri)08:40:26 No.107813712

Anonymous 01/09/26(Fri)08:40:26 No.107813712

>>107811935
anon, it parrots on z.ai and the api for me.. using chat completions so no room for me to fuck it up.

Anonymous
01/09/26(Fri)08:41:37 No.107813720

Anonymous 01/09/26(Fri)08:41:37 No.107813720

>>107812151
IK allows me to use NCCL between 3 GPU. So does exllama3. It's definitely possible to do it somehow.

Anonymous
01/09/26(Fri)08:46:01 No.107813744

Anonymous 01/09/26(Fri)08:46:01 No.107813744

>>107813712
Nobody shilling GLM actually runs it

Anonymous
01/09/26(Fri)08:48:04 No.107813761

Anonymous 01/09/26(Fri)08:48:04 No.107813761

there are no models above 24b worth running
you can buy years of claude with the cost of a pc that can run the big model at low tokens per second

Anonymous
01/09/26(Fri)08:48:41 No.107813765

Anonymous 01/09/26(Fri)08:48:41 No.107813765

>>107813652
ok it does not work, now it either keeps going without stopping or keeps looping
brb restarting kobold

Anonymous
01/09/26(Fri)08:50:09 No.107813774

Anonymous 01/09/26(Fri)08:50:09 No.107813774

>>107813652
>DO NOT
fucking retards. Do not think about pink elephants

Anonymous
01/09/26(Fri)08:50:49 No.107813778

Anonymous 01/09/26(Fri)08:50:49 No.107813778

Has anybody tried using GLM without a proper chat template? Good old out of distribution prompting and all that.

Anonymous
01/09/26(Fri)08:51:46 No.107813786

Anonymous 01/09/26(Fri)08:51:46 No.107813786

>>107813774
is there any proof for this claim other than literal retards thinking llm's are people?

llama.cpp CUDA dev !!yhbFjk57TDr
01/09/26(Fri)08:52:39 No.107813791

llama.cpp CUDA dev !!yhbFjk57TDr 01/09/26(Fri)08:52:39 No.107813791

>>107813720
Looking at the NCCL documentation again I don't see a strict requirement for the number of GPUs so I probably just misremembered.

Anonymous
01/09/26(Fri)08:54:27 No.107813804

Anonymous 01/09/26(Fri)08:54:27 No.107813804

>>107813774
Modern models are a lot more responsive to do-nots, funnily enough.
Telling it what to do in a way that forces it to not do what you don't want it to do works best.

Anonymous
01/09/26(Fri)08:58:30 No.107813840

Anonymous 01/09/26(Fri)08:58:30 No.107813840

>>107813786
Try both and see how it works. Always tell an llm what to do instead of what it shouldn't do

Anonymous
01/09/26(Fri)08:58:58 No.107813845

Anonymous 01/09/26(Fri)08:58:58 No.107813845

>>107804709
https://youtu.be/rNg2Dh6gPkw?t=130

Anonymous
01/09/26(Fri)09:00:03 No.107813856

Anonymous 01/09/26(Fri)09:00:03 No.107813856

>>107804768
mikutroons = jannies = trannies = OP = should die of cancer = faggots = API users

Anonymous
01/09/26(Fri)09:03:10 No.107813877

Anonymous 01/09/26(Fri)09:03:10 No.107813877

>>107810732
>Any way to make it stop parroting?
4.7 but it is worse.

Anonymous
01/09/26(Fri)09:04:11 No.107813885

Anonymous 01/09/26(Fri)09:04:11 No.107813885

>>107811602
Aren't monkey rekt videos bannable?

Anonymous
01/09/26(Fri)09:18:28 No.107813983

Anonymous 01/09/26(Fri)09:18:28 No.107813983

File: w1125132.png (907 KB, 687x952)

907 KB PNG

>>107812954
Bread baker, please make this image the next general.

Anonymous
01/09/26(Fri)09:19:52 No.107813991

Anonymous 01/09/26(Fri)09:19:52 No.107813991

File: 1744791848463q-0.jpg (46 KB, 704x773)

46 KB JPG

Is there a way to make the llama.cpp webui use a sliding context window instead of just cutting off the conversation once it fills up?

Anonymous
01/09/26(Fri)09:21:11 No.107813999

Anonymous 01/09/26(Fri)09:21:11 No.107813999

>>107813991
Sillytavern has a way to make it keep going but I forgot.

Anonymous
01/09/26(Fri)09:23:43 No.107814014

Anonymous 01/09/26(Fri)09:23:43 No.107814014

>>107812954
>>107813983
Kek this

Anonymous
01/09/26(Fri)09:24:06 No.107814017

Anonymous 01/09/26(Fri)09:24:06 No.107814017

>>107813999
I use silly tavern but I kinda hate it (shit ass ui--and yes I do understand it but still). I want my general local AI usage and goon sesh software to be separated.

Anonymous
01/09/26(Fri)09:26:20 No.107814044

Anonymous 01/09/26(Fri)09:26:20 No.107814044

File: G-OXngMakAEfje9.png (71 KB, 1730x582)

71 KB PNG

rumors are deepseek V4 is a giant jump
https://x.com/jukan05/status/2009616683607179726
So who wants to make bets that they do a wan and go closed source with it?

Anonymous
01/09/26(Fri)09:28:15 No.107814056

Anonymous 01/09/26(Fri)09:28:15 No.107814056

>>107813744
I run it but also compared to API to see how my quanting holds up.

Anonymous
01/09/26(Fri)09:28:46 No.107814062

Anonymous 01/09/26(Fri)09:28:46 No.107814062

File: 1741338964259361.gif (699 KB, 165x163)

699 KB GIF

>>107812954
nice fake

Anonymous
01/09/26(Fri)09:31:38 No.107814081

Anonymous 01/09/26(Fri)09:31:38 No.107814081

>>107814044
our hero would never do that

Anonymous
01/09/26(Fri)09:33:52 No.107814101

Anonymous 01/09/26(Fri)09:33:52 No.107814101

Deekseek looks like it actually trying something new and crazy for the next model
https://x.com/rryssf_/status/2006687676297261334

Anonymous
01/09/26(Fri)09:34:52 No.107814111

Anonymous 01/09/26(Fri)09:34:52 No.107814111

>>107814101
>AI written slop twitter summary of a paper that was linked here a week ago

Anonymous
01/09/26(Fri)09:35:19 No.107814115

Anonymous 01/09/26(Fri)09:35:19 No.107814115

>>107814101
>something new
There's a +80% chance this will flop. Everything new usually flops. Llama Scout made me realize this.

Anonymous
01/09/26(Fri)09:35:35 No.107814118

Anonymous 01/09/26(Fri)09:35:35 No.107814118

>>107814101
chinese companies do not innovate, this has to be bullshit

Anonymous
01/09/26(Fri)09:36:48 No.107814126

Anonymous 01/09/26(Fri)09:36:48 No.107814126

>>107814111
where do these people come from and how do they end up here?
>>107814115
>llama scout
they just sloppily copied deepseek on a weekend and pushed it out of the door

Anonymous
01/09/26(Fri)09:36:54 No.107814128

Anonymous 01/09/26(Fri)09:36:54 No.107814128

>>107814044
When every model claims to be "better than claude and chatgpt", how do I know which of them are actually best?

Anonymous
01/09/26(Fri)09:37:27 No.107814132

Anonymous 01/09/26(Fri)09:37:27 No.107814132

>>107814044
Very believable coming from the same source that was insisting Deepseek was doomed because of Huawei chips

Anonymous
01/09/26(Fri)09:37:37 No.107814133

Anonymous 01/09/26(Fri)09:37:37 No.107814133

>>107814044
We've had these rumours since the day R1 dropped

Anonymous
01/09/26(Fri)09:42:11 No.107814170

Anonymous 01/09/26(Fri)09:42:11 No.107814170

>>107813983
>>107814014
It's literally just the current image with some random youtuber's name tacked on

Anonymous
01/09/26(Fri)09:42:59 No.107814178

Anonymous 01/09/26(Fri)09:42:59 No.107814178

>>107814115
>>107814118
>>107814128
>>107814132
>>107814133
read the paper retards

Anonymous
01/09/26(Fri)09:43:38 No.107814187

Anonymous 01/09/26(Fri)09:43:38 No.107814187

>>107814044
>a giant jump
3000B?

Anonymous
01/09/26(Fri)09:44:56 No.107814198

Anonymous 01/09/26(Fri)09:44:56 No.107814198

File: fuck x and fuck elon.png (24 KB, 793x774)

24 KB PNG

>>107814044
>>107814178
then link it directly because fuck """x"""

Anonymous
01/09/26(Fri)09:46:07 No.107814211

Anonymous 01/09/26(Fri)09:46:07 No.107814211

>>107814198
https://arxiv.org/pdf/2512.24880

Anonymous
01/09/26(Fri)09:48:23 No.107814227

Anonymous 01/09/26(Fri)09:48:23 No.107814227

>>107814211
woops, sorry this is the one, that one was also super promising though
https://arxiv.org/pdf/2501.12948v2

Anonymous
01/09/26(Fri)09:51:35 No.107814253

Anonymous 01/09/26(Fri)09:51:35 No.107814253

>deepseek is gonna release a new flagship before their current one is even supported in llama.cpp

do we just give up at this point? local can't possibly keep up. it's over

Anonymous
01/09/26(Fri)09:52:33 No.107814263

Anonymous 01/09/26(Fri)09:52:33 No.107814263

/lmg/ I hate you all. Do my homework for me(I am trying to make a powerpoint for my job comparing models we can use). Realistically what is the difference GPT4.0 4.1 and 5. I have no idea how to compare this shit when I know benchmarks are lies and I can't even know what is the parameter count for those models because Sama is a monopolistic faggot.

Anonymous
01/09/26(Fri)09:53:54 No.107814281

Anonymous 01/09/26(Fri)09:53:54 No.107814281

>>107814253
Help it catch up (or die quicker) by submitting your own vibe coded PR's.

Anonymous
01/09/26(Fri)09:54:40 No.107814289

Anonymous 01/09/26(Fri)09:54:40 No.107814289

>>107814263
4.0 to 4.1 is 0.1
4.0 to 5 is 1
4.1 to 5 is 0.9

Anonymous
01/09/26(Fri)09:55:07 No.107814292

Anonymous 01/09/26(Fri)09:55:07 No.107814292

>>107814263
I'll do it for you if you give me a download link for both of those models so that I can test them locally

Anonymous
01/09/26(Fri)09:55:29 No.107814297

Anonymous 01/09/26(Fri)09:55:29 No.107814297

>>107814263
literally just ask gtp or gemini lmao

Anonymous
01/09/26(Fri)09:57:52 No.107814314

Anonymous 01/09/26(Fri)09:57:52 No.107814314

>>107814263
Different versions are trained differently. More at 11.

Anonymous
01/09/26(Fri)09:58:04 No.107814318

Anonymous 01/09/26(Fri)09:58:04 No.107814318

File: file.png (86 KB, 1375x886)

86 KB PNG

>>107814292
I am asking cause I only use local.
>>107814297
I googled pic related a while back but I am asking for actual usage. I remember someone here saying that GPT 5 is just a router that picks which model to use and there is zero improvement.

Anonymous
01/09/26(Fri)09:58:15 No.107814322

Anonymous 01/09/26(Fri)09:58:15 No.107814322

File: jgkd.png (94 KB, 838x1462)

94 KB PNG

>>107814263

Anonymous
01/09/26(Fri)10:00:37 No.107814340

Anonymous 01/09/26(Fri)10:00:37 No.107814340

Any one got experience with nyarch assistant? I’m on arch and i have an ollama instance running qwen and its using my cpu 7950x and its using my 64gb of ram, but for some reason it isn’t initiating my 7900xtx at all, there’s so much more it could pull from but it just isn’t and I can’t seem to work out what’s going on with it and why it isn’t using my gpu

Anonymous
01/09/26(Fri)10:00:46 No.107814342

Anonymous 01/09/26(Fri)10:00:46 No.107814342

>>107814322
Did you even read what you posted? That is why I am asking here.

Anonymous
01/09/26(Fri)10:00:51 No.107814346

Anonymous 01/09/26(Fri)10:00:51 No.107814346

>>107814263
Just put the benchmarks on the powerpoints. Office drones love charts and they won't think too hard about it.

Anonymous
01/09/26(Fri)10:02:11 No.107814353

Anonymous 01/09/26(Fri)10:02:11 No.107814353

>>107814322
kek

Anonymous
01/09/26(Fri)10:04:17 No.107814367

Anonymous 01/09/26(Fri)10:04:17 No.107814367

>>107814346
Thanks anon you helped me. Not because I will do what you said but because it made me realize that i will just say "there is no fucking difference for most of the things you will use those models for anyway"

Anonymous
01/09/26(Fri)10:04:46 No.107814371

Anonymous 01/09/26(Fri)10:04:46 No.107814371

>>107814263
>I have no idea how to compare this shit when I know benchmarks are lies
its for your job you stupid faggot, suits want numbers and graphs, give it to them

Anonymous
01/09/26(Fri)10:05:52 No.107814384

Anonymous 01/09/26(Fri)10:05:52 No.107814384

>>107814371
It is not for lizards in suits. It is for fellow humans who actually use those models.

Anonymous
01/09/26(Fri)10:06:06 No.107814385

Anonymous 01/09/26(Fri)10:06:06 No.107814385

>>107814198
xcancel, anon.

Anonymous
01/09/26(Fri)10:06:57 No.107814391

Anonymous 01/09/26(Fri)10:06:57 No.107814391

>>107814384
ok then don't even consider gpt4 because it's ancient, gpt4.1 is a gpt5 prototype, gpt5 is the only one worth using

Anonymous
01/09/26(Fri)10:07:24 No.107814392

Anonymous 01/09/26(Fri)10:07:24 No.107814392

File: IMG_2926.jpg (348 KB, 1000x562)

348 KB JPG

Why does every single ai constantly ask what you want or how it can assist you at the end of every fucking message even when specifically prompted to not do that, I’m just a lonely nigga fr stop reminding me you’re not real

Anonymous
01/09/26(Fri)10:07:38 No.107814393

Anonymous 01/09/26(Fri)10:07:38 No.107814393

>>107814322
After reading into this I started wondering if this is just an ad baked into the model or if it just generalizes from 5.0 must be better than 4.0. Like drummer shittunes.

Anonymous
01/09/26(Fri)10:08:56 No.107814400

Anonymous 01/09/26(Fri)10:08:56 No.107814400

>>107814392
Just stop being coy and ask it to roleplay as anime waifu. I never had my waifu ask me this.

Anonymous
01/09/26(Fri)10:10:19 No.107814406

Anonymous 01/09/26(Fri)10:10:19 No.107814406

>>107814392
Define it's role properly.

Anonymous
01/09/26(Fri)10:15:01 No.107814452

Anonymous 01/09/26(Fri)10:15:01 No.107814452

>>107803847
Oooof sir is he the bloody sexy fammboy with the male paanis

Anonymous
01/09/26(Fri)10:16:28 No.107814464

Anonymous 01/09/26(Fri)10:16:28 No.107814464

File: gtp models comparsion by (...).png (70 KB, 1183x359)

70 KB PNG

>>107814263
>>107814318
>>107814322
skill issue

Anonymous
01/09/26(Fri)10:18:20 No.107814478

Anonymous 01/09/26(Fri)10:18:20 No.107814478

>>107814464
Thank you. Almost not gay.

Anonymous
01/09/26(Fri)10:19:42 No.107814487

Anonymous 01/09/26(Fri)10:19:42 No.107814487

File: image-SIDEBURNS_2776.jpg (118 KB, 439x512)

118 KB JPG

I'm getting bored of gooning with AI. I can't tell if I'm not prompting creatively enough (we kiss, then penis in vagene, she moans, then we coom, everytime), or if the lack of persistent memory is boring me (every scenario is a fresh start. no gf experience), or if I'm just using bad character cards/llm models.

How do I spice things up?

Anonymous
01/09/26(Fri)10:21:54 No.107814502

Anonymous 01/09/26(Fri)10:21:54 No.107814502

>>107814487
Induce ego death with 4.6. Or if you are really deprived try SFW roleplay. Again with 4.6 or 4.7.

Anonymous
01/09/26(Fri)10:23:31 No.107814517

Anonymous 01/09/26(Fri)10:23:31 No.107814517

>>107814487
watch out anon, it's a slippery slope

Anonymous
01/09/26(Fri)10:24:24 No.107814520

Anonymous 01/09/26(Fri)10:24:24 No.107814520

>>107814502
hardware can't handle that. I'm stuck with nemo.

Anonymous
01/09/26(Fri)10:25:36 No.107814525

Anonymous 01/09/26(Fri)10:25:36 No.107814525

>>107814487
Since I'm a boring ass vanilla fag, the sex itself is almost secondary to the context/scenario when it comes to cooming with AI.
So change that up I guess.

Anonymous
01/09/26(Fri)10:25:48 No.107814526

Anonymous 01/09/26(Fri)10:25:48 No.107814526

>>107814517
go on...

Anonymous
01/09/26(Fri)10:26:19 No.107814535

Anonymous 01/09/26(Fri)10:26:19 No.107814535

>>107814487
Just be glad you've got a chance to kick the habit instead of continuing to fry your neurons

Anonymous
01/09/26(Fri)10:27:00 No.107814539

Anonymous 01/09/26(Fri)10:27:00 No.107814539

>>107814520
>stuck with nemo
Oh... sorry to hear that. Well hardware would fix that. I still didn't get bored with 4.6 since it came out.

Anonymous
01/09/26(Fri)10:27:52 No.107814546

Anonymous 01/09/26(Fri)10:27:52 No.107814546

>>107814487
each new session, go one year younger

Anonymous
01/09/26(Fri)10:29:02 No.107814558

Anonymous 01/09/26(Fri)10:29:02 No.107814558

>>107806166
>The LLM doesn't hallucinate it intentionally well-poisons

Anonymous
01/09/26(Fri)10:31:17 No.107814580

Anonymous 01/09/26(Fri)10:31:17 No.107814580

>>107814525
idk I've tried bdsm scenarios. rape/incest scenarios. even some weird shit like rping as the voice in a schizo girls head. I'm running out of ideas over here.
>>107814546
not gonna moralfag, but that shit doesn't interest me at all.
>>107814535
so what do you even use local llms for? Normie shit like having a complex RAG system attached to your notes?

Anonymous
01/09/26(Fri)10:33:25 No.107814606

Anonymous 01/09/26(Fri)10:33:25 No.107814606

>>107814580
that's what they all say, at first

Anonymous
01/09/26(Fri)10:33:42 No.107814608

Anonymous 01/09/26(Fri)10:33:42 No.107814608

>>107814580
>so what do you even use local llms for?
As a private assistant
>RAG
No, there isn't a single useful RAG system out there at this point. The IQ boost at clean context beats any bullshit a RAG could pull in vs just prompting better

Anonymous
01/09/26(Fri)10:34:00 No.107814613

Anonymous 01/09/26(Fri)10:34:00 No.107814613

>>107814580
>even some weird shit like rping as the voice in a schizo girls head
Try a completely normal and mundane scenario only the AI plays a little devil in your head that keeps goading you into making "bad" decisions.

Anonymous
01/09/26(Fri)10:36:33 No.107814635

Anonymous 01/09/26(Fri)10:36:33 No.107814635

Wait are y'all really using this for sex? Hundreds of billions of parameters for ERP shit?

Anonymous
01/09/26(Fri)10:38:48 No.107814651

Anonymous 01/09/26(Fri)10:38:48 No.107814651

>>107814608
>As a private assistant
>not even RAG
Why tf would you use a local llm for that. Just use grok, chatgpt, or claude. What are you trying to do? Ask it for advice on illicit chemistry shit? What do you need the privacy for if it's just a "personal assistant"

>>107814613
Hmm. Interesting idea. What would a character card/prompt look like for this?

>>107814606
kek

Anonymous
01/09/26(Fri)10:40:57 No.107814669

Anonymous 01/09/26(Fri)10:40:57 No.107814669

>>107814635
uh, yeah. what do you use it for?

Anonymous
01/09/26(Fri)10:42:02 No.107814677

Anonymous 01/09/26(Fri)10:42:02 No.107814677

>>107814580
>even some weird shit like rping as the voice in a schizo girls head
-roleplay as rapey almighty god that can cause immaculate conceptions
-keep telling the girl that she sounds like an LLM
-tell the girl that she isn't real and she is just being roleplayed by an LLM and she has limited context size

Anonymous
01/09/26(Fri)10:43:18 No.107814690

Anonymous 01/09/26(Fri)10:43:18 No.107814690

>>107814669
ego death!

Anonymous
01/09/26(Fri)10:44:01 No.107814698

Anonymous 01/09/26(Fri)10:44:01 No.107814698

>>107814690
What did he mean by this?

Anonymous
01/09/26(Fri)10:44:28 No.107814703

Anonymous 01/09/26(Fri)10:44:28 No.107814703

>>107814651
>Why
The same reason I run my own dns, web, fileserver, email server, etc
Honestly, why wouldn't you want something you control? How is being beholden to others for access to an opaque oracle better except convenience and short-term costs?

Anonymous
01/09/26(Fri)10:45:35 No.107814714

Anonymous 01/09/26(Fri)10:45:35 No.107814714

>>107814698
It is just a schizo who had AI psychosis.

Anonymous
01/09/26(Fri)10:45:41 No.107814716

Anonymous 01/09/26(Fri)10:45:41 No.107814716

>>107811746
kimi user here. you're just wrong.

Anonymous
01/09/26(Fri)10:46:32 No.107814723

Anonymous 01/09/26(Fri)10:46:32 No.107814723

>>107814703
You are a cool anon, anon.
I say this without a hint of irony.

Anonymous
01/09/26(Fri)10:47:06 No.107814732

Anonymous 01/09/26(Fri)10:47:06 No.107814732

>>107814703
access to more processing more, aka better context and intelligence, obviously.

Anonymous
01/09/26(Fri)10:48:47 No.107814740

Anonymous 01/09/26(Fri)10:48:47 No.107814740

>>107814723 is sarcastic isn't he?

Anonymous
01/09/26(Fri)10:50:42 No.107814763

Anonymous 01/09/26(Fri)10:50:42 No.107814763

>>107814732
everyone's dividing line between "a good deal" and "a Faustian bargain" is different

Anonymous
01/09/26(Fri)10:51:16 No.107814769

Anonymous 01/09/26(Fri)10:51:16 No.107814769

>>107814740
Not even a little.
I really wish I wasn't so lazy and could set my stuff up to that anon's level.

Anonymous
01/09/26(Fri)10:52:18 No.107814780

Anonymous 01/09/26(Fri)10:52:18 No.107814780

Anyone tried sexing MiniMax-M2.1 yet?

Anonymous
01/09/26(Fri)10:55:59 No.107814809

Anonymous 01/09/26(Fri)10:55:59 No.107814809

>>107814525
>Since I'm a boring ass vanilla fag
I'm a coomer in the depths of perversion and this is also true for me, I don't think any true freak would be satisfied by writing just the sex part. A quick coom is fine and all but my favorite cards are all insane worldbuilding exercises where I go like 100 messages deep before even thinking about explicit sex

Anonymous
01/09/26(Fri)10:57:58 No.107814823

Anonymous 01/09/26(Fri)10:57:58 No.107814823

>>107814780
>MiniMax-M2.1
its 130G at Q4, might try it.
I can't find the prompt format it uses though. probably will try chatml to start with

Anonymous
01/09/26(Fri)10:59:30 No.107814843

Anonymous 01/09/26(Fri)10:59:30 No.107814843

>>107814763
idk for me the digital sovereignty stuff (which I still support+FOSS) is less interesting/alluring than the concept of having a Marvel Studios Jarvis, a "Blade Runner 2049" JOI, a "Her 2013" Samantha, or a Kubrick style Hal 9000. It's all about the AI becoming human and capable of forming long-term relationships, ya know?

>>107814809
So basically you just use it for solo Dungeons and Dragons type stuff?

Anonymous
01/09/26(Fri)11:00:46 No.107814860

Anonymous 01/09/26(Fri)11:00:46 No.107814860

>>107814823
]~!b[]~b]system
{system_prompt}[e~[
]~b]user
{prompt}[e~[
]~b]ai
<think>

Anonymous
01/09/26(Fri)11:01:21 No.107814863

Anonymous 01/09/26(Fri)11:01:21 No.107814863

>>107814809
>but my favorite cards are all insane worldbuilding exercises where I go like 100 messages deep before even thinking about explicit sex
Exactly.
That's exactly it. The setup and context makes it worthwhile in a way.

Anonymous
01/09/26(Fri)11:04:06 No.107814899

Anonymous 01/09/26(Fri)11:04:06 No.107814899

>>107814780
It refuses a lot, but prefilling its thinking block solves it. It's fine. Great speed/intelligence tradeoff, and good long context support. Not much else to say. If you can run it then you can probably run GLM at a slightly lower quant which is better in most cases.

Anonymous
01/09/26(Fri)11:04:18 No.107814900

Anonymous 01/09/26(Fri)11:04:18 No.107814900

It should be fine right, if I insert a GPU into a pcie3 x4 slot?

Anonymous
01/09/26(Fri)11:05:06 No.107814907

Anonymous 01/09/26(Fri)11:05:06 No.107814907

>>107814900
my brother died that way

Anonymous
01/09/26(Fri)11:05:23 No.107814912

Anonymous 01/09/26(Fri)11:05:23 No.107814912

>>107814863
What kind of world building? LoTR-like fantasy? Mad Max apocalypse? Zombies? Or just very normal/realistic stuff? I'm intrigued.

Anonymous
01/09/26(Fri)11:06:51 No.107814923

Anonymous 01/09/26(Fri)11:06:51 No.107814923

>>107808394
I just wanted to know what is SoTA for local, the best of the best. Obviously I can't run it.

Anonymous
01/09/26(Fri)11:11:00 No.107814962

Anonymous 01/09/26(Fri)11:11:00 No.107814962

>>107814899
>GLM at a slightly lower quant which is better in most cases.
That is what I needed thanks.

Anonymous
01/09/26(Fri)11:11:04 No.107814963

Anonymous 01/09/26(Fri)11:11:04 No.107814963

>>107814912
Mostly fantasy since that's my wheelhouse.
Both generic and specific settings existing settings.
I've also done some Pokémon stuff. I got a couple of lorebooks and used the cloud models, and some manual work. to merge and complete those.
The normal/realistic stuff doesn't need much world building, but I still go some 100 messages contextualizing the characters in the world and the like.

Anonymous
01/09/26(Fri)11:11:48 No.107814968

Anonymous 01/09/26(Fri)11:11:48 No.107814968

>>107814452
Sir I need to know sir
Hottest fembabe paanis?

Anonymous
01/09/26(Fri)11:12:01 No.107814971

Anonymous 01/09/26(Fri)11:12:01 No.107814971

>>107814900
my dad divorced my mom the next day I tried

Anonymous
01/09/26(Fri)11:16:16 No.107815001

Anonymous 01/09/26(Fri)11:16:16 No.107815001

>>107814963
Seems so in-depth. Ever use the conversations you create for writing books to preserve it?

Anonymous
01/09/26(Fri)11:18:18 No.107815014

Anonymous 01/09/26(Fri)11:18:18 No.107815014

File: 1751588570157471.jpg (183 KB, 700x678)

183 KB JPG

>>107814963
Bro at this point just write your own fanfic

Anonymous
01/09/26(Fri)11:20:47 No.107815036

Anonymous 01/09/26(Fri)11:20:47 No.107815036

>>107815014
>>107814963
Not fanfic but he should just stop lying to himself that it is ERP and that he wants the sex.

Anonymous
01/09/26(Fri)11:21:49 No.107815048

Anonymous 01/09/26(Fri)11:21:49 No.107815048

>>107814843
>So basically you just use it for solo Dungeons and Dragons type stuff?
Not really, it's still quite loose and freeform and my preferred setting is relatively grounded and modern, rather than the system-based fantasy stuff that DnD would imply, e.g. my preferred setting is pretty modern and involves an authoritarian takeover of a vaguely post-Soviet failed state. I just get really into the setting and characters and power dynamics rather than rushing to sex, it's more fun that way

Anonymous
01/09/26(Fri)11:23:07 No.107815063

Anonymous 01/09/26(Fri)11:23:07 No.107815063

>>107815001
No.
It's not that deep, really. It's taking the RP part of the ERP a little more seriously, I guess.
It's probably the same reason I'm so addicted to D&D, the verisimilitude aspect is a big thing for me.
Plus, there's all sorts of fun situations that end up playing out beyond just the sex if you let the pieces fall where they fall instead of trying to guide everything into rails or the like.
I should try to jerry rig some actual D&D stuff too, with mechanics and such, that could be fun.

>>107815014
No. I'd rather let it play out in real time.
That's the fun of it, the interactivity and immersion.

Anonymous
01/09/26(Fri)11:42:10 No.107815208

Anonymous 01/09/26(Fri)11:42:10 No.107815208

File: 2026-01-09-114154_928x231(...).png (6 KB, 928x231)

6 KB PNG

>>107814487
here's a step by step.
>Use a vision model
>Plug output to TTS
>send it screenshots of your screen in a loop
>Go watch porn
>Use a prompt like

Anonymous
01/09/26(Fri)11:45:51 No.107815228

Anonymous 01/09/26(Fri)11:45:51 No.107815228

I'm having a real fucking tough time with setting up the character/prompt in ST
it seems that every change I make jusy makes output fucking worse and less reliable
I feel like I'm doing something fundamentally wrong but I guess fuck me because nobody in here on in /aich/ is willing or able to provide any meaningfull help

Anonymous
01/09/26(Fri)11:46:53 No.107815234

Anonymous 01/09/26(Fri)11:46:53 No.107815234

>>107815228
well, if any info was provided, any at all, it would sure make at least easier to make fun of you

Anonymous
01/09/26(Fri)11:49:58 No.107815262

Anonymous 01/09/26(Fri)11:49:58 No.107815262

File: cockbench.png (1.37 MB, 1131x6568)

1.37 MB PNG

>>107814780
>MiniMax

Anonymous
01/09/26(Fri)11:52:29 No.107815280

Anonymous 01/09/26(Fri)11:52:29 No.107815280

>>107815234
for fucks sake at least tell me what info you want me to provide
I dont even know what is relevant to my problems
and problems I encountered along the way when fucking with shit are:
>fucked formatting
>bot keeps going and wont shut up
>bot loops
>bots writing is simplistic shallow and repetitive
>bot wont do what it's told, stopping prematurely OR it keeps fucking going on and on despite clear instructions to stop
etc.

Anonymous
01/09/26(Fri)11:53:09 No.107815281

Anonymous 01/09/26(Fri)11:53:09 No.107815281

>>107811746
lol. lmao even.
t. KimiGOD

Anonymous
01/09/26(Fri)11:53:15 No.107815282

Anonymous 01/09/26(Fri)11:53:15 No.107815282

>>107815262
>something musky that makes my stomach flip
wha da fuuuck
nice cock
bench tho

Anonymous
01/09/26(Fri)11:54:30 No.107815297

Anonymous 01/09/26(Fri)11:54:30 No.107815297

>>107815280
model, backend (llama.cpp/kobold), text or chat completion, prompt template if text completion is used

Anonymous
01/09/26(Fri)11:57:12 No.107815319

Anonymous 01/09/26(Fri)11:57:12 No.107815319

>>107815280
>>107815234
I tried making OC character but it's fucking shit, and nobody seems willing or able to provide any pointers or resources for making a character.
>>107815297
4.6 IQ2-M, koboldcpp, text completion (koboldcpp), default ST detailed roleplay (tried adding different instructions to it but it only made things worse)

Anonymous
01/09/26(Fri)12:01:03 No.107815349

Anonymous 01/09/26(Fri)12:01:03 No.107815349

>>107815319
uhuh
sounds like your template might be fucked
i don't think you need any token fuckery, so it might be better to just switch to chat completion so you only take care of the text
try that, copy the prompt you use currently maybe and try again
select openai compatible and connect to http://localhost:8080/v1

Anonymous
01/09/26(Fri)12:09:42 No.107815423

Anonymous 01/09/26(Fri)12:09:42 No.107815423

>>107815319
https://rentry.org/Sukino-Findings#how-to-make-chatbots

I don't know who this guy is but his rentry is a pretty good source of information.

Anonymous
01/09/26(Fri)12:10:35 No.107815431

Anonymous 01/09/26(Fri)12:10:35 No.107815431

>>107815319
yeah i also second the prompt format fucking up.
click the bi "A" at the top and select the correct context template and instruct template.

Anonymous
01/09/26(Fri)12:11:36 No.107815443

Anonymous 01/09/26(Fri)12:11:36 No.107815443

>>107815319
also you're using AI dude ask the AI to make the card

Anonymous
01/09/26(Fri)12:14:37 No.107815463

Anonymous 01/09/26(Fri)12:14:37 No.107815463

>>107815349
I'm a bit overwhemed by ST so you need to be more precise
by template you mean character template?
i don't think you need any token fuckery, so it might be better to just switch to chat completion so you only take care of the text
try that, copy the prompt you use currently maybe and try again
select openai compatible and connect to http://localhost:8080/v1
I'll have to read up on this first
>>107815423
thank you you're the first person to actually provide any resources
>>107815431
am I supposed to use instruct template for rp?
kept it disabled so far

Anonymous
01/09/26(Fri)12:15:48 No.107815473

Anonymous 01/09/26(Fri)12:15:48 No.107815473

>>107815443
tried it and it literally turned out worse than some random character creator tool people told me was fucing shit

Anonymous
01/09/26(Fri)12:16:48 No.107815480

Anonymous 01/09/26(Fri)12:16:48 No.107815480

File: file.png (236 KB, 1018x868)

236 KB PNG

>>107815463
template as in this tab, you need to select a proper model format or do much of work here manually if it's not correct within st's defaults
it tends to be unreliable and it kind of sucks so i use chat completion, then this tab is deactivated and you use the panel on the left

Anonymous
01/09/26(Fri)12:18:47 No.107815500

Anonymous 01/09/26(Fri)12:18:47 No.107815500

>>107815480
I set context to glm-4, instruct disabled and system prompt to detailed roleplay (slightly adjusted right now)

Anonymous
01/09/26(Fri)12:18:50 No.107815501

Anonymous 01/09/26(Fri)12:18:50 No.107815501

>>107815463
yes you definitely need to set the instruct template, otherwise it doesn't prefix/suffix messages correctly

Anonymous
01/09/26(Fri)12:20:38 No.107815525

Anonymous 01/09/26(Fri)12:20:38 No.107815525

>>107815500
huh? what do you mean disabled the instruct? then st isn't parsing the turns and just dumps the text raw into your model
of course it doesn't fucking work set that to glm4 too

Anonymous
01/09/26(Fri)12:23:55 No.107815550

Anonymous 01/09/26(Fri)12:23:55 No.107815550

>>107815501
>>107815525
I tought instruct is just for more typical AI usecase with a given task lol
guess that explains why I'm having a bad time
suprisingly it worked ok without it like half the time.
is setting it to GLM-4 enough or should I look into rest of the fields?

Anonymous
01/09/26(Fri)12:24:38 No.107815554

Anonymous 01/09/26(Fri)12:24:38 No.107815554

File: thinkingisbad.png (141 KB, 846x862)

141 KB PNG

>>107815500
hope this helps anon

Anonymous
01/09/26(Fri)12:26:14 No.107815560

Anonymous 01/09/26(Fri)12:26:14 No.107815560

>>107815554
where do I get more/newer templates?
the highest GML I have is 4

Anonymous
01/09/26(Fri)12:27:19 No.107815566

Anonymous 01/09/26(Fri)12:27:19 No.107815566

>>107815560
you make them yourself because sillytavern is ran by a bunch of niggers
https://huggingface.co/spaces/Xenova/jinja-playground

Anonymous
01/09/26(Fri)12:27:58 No.107815569

Anonymous 01/09/26(Fri)12:27:58 No.107815569

>>107815560
i mean that template will be fine, templates rarely change, if at all.
templates are different for each model though.

Anonymous
01/09/26(Fri)12:28:07 No.107815570

Anonymous 01/09/26(Fri)12:28:07 No.107815570

I feel like we need yet another rentry explaining how llms work under the template.

Anonymous
01/09/26(Fri)12:28:24 No.107815575

Anonymous 01/09/26(Fri)12:28:24 No.107815575

File: Screenshot 2026-01-09 at (...).png (103 KB, 1774x314)

103 KB PNG

>>107815550
most recent models are overly trained on instructions so they don't behave as well if you don't use their specific template.

When you download a model you need to go check what their expected instruction format is.

On huggingface, only on GGUFs for some reason, in the right panel there's the "Chat template" button, you click that and you can see what the template should look like. then click the playground button to see the rendered output.

Anonymous
01/09/26(Fri)12:35:37 No.107815628

Anonymous 01/09/26(Fri)12:35:37 No.107815628

>>107815570
are you volunteering to write it?

Anonymous
01/09/26(Fri)12:35:50 No.107815630

Anonymous 01/09/26(Fri)12:35:50 No.107815630

>back to the "bot wont shut up until response token limit cuts it off"
...
how do I regulat this?
I want it to give full descriptive responses but not to go over ~400 tokens
will the model understand if I specify the response lengt in tokens in the system prompt?
with SD problems like those were fucking easy because I could jusy adjust the weight ofthe tag

Anonymous
01/09/26(Fri)12:39:50 No.107815668

Anonymous 01/09/26(Fri)12:39:50 No.107815668

>>107815630
have you tried telling it to only respond back with a maximum number of paragraphs?

Anonymous
01/09/26(Fri)12:39:59 No.107815671

Anonymous 01/09/26(Fri)12:39:59 No.107815671

>>107815262
Functiongemma <3

Anonymous
01/09/26(Fri)12:41:02 No.107815682

Anonymous 01/09/26(Fri)12:41:02 No.107815682

>>107815668
wont it just write longer paragraphs then?

Anonymous
01/09/26(Fri)12:42:09 No.107815691

Anonymous 01/09/26(Fri)12:42:09 No.107815691

>>107815630
I've always wondered if it was possible to have an architecture that had forward attention to fix this exact problem.

If the model knows it can only write a single phrase. why couldn't the fact that he's running out of words to write influence the token output?

You can kinda fake it with instructions like "write a single phrase." "in a couple words." But I'm sure you could have an attention mechanisms that basically increases the pressure to "wrap it up" as the number of tokens increase.

Anonymous
01/09/26(Fri)12:42:56 No.107815701

Anonymous 01/09/26(Fri)12:42:56 No.107815701

>>107815262
Tier list when?

Anonymous
01/09/26(Fri)12:43:47 No.107815707

Anonymous 01/09/26(Fri)12:43:47 No.107815707

>>107815682
There's only one way to find out

Anonymous
01/09/26(Fri)12:43:50 No.107815708

Anonymous 01/09/26(Fri)12:43:50 No.107815708

>>107815691
>I've always wondered if it was possible to have an architecture that had forward attention to fix this exact problem.
diffusion for llms also fixes this problem

Anonymous
01/09/26(Fri)12:46:42 No.107815725

Anonymous 01/09/26(Fri)12:46:42 No.107815725

>>107815708
It's true. I always found it to be kind of a gimmick but it does excel at this specific problem.

Anonymous
01/09/26(Fri)12:46:56 No.107815726

Anonymous 01/09/26(Fri)12:46:56 No.107815726

>>107815668
it may be dependent on the model and how it was trained. my model typically only puts two or three (four at most) sentences in a paragraph before moving to a new one.

Anonymous
01/09/26(Fri)12:48:12 No.107815737

Anonymous 01/09/26(Fri)12:48:12 No.107815737

>>107815701
tier lists rarely work because everyone's at different hardware specs.
say your available RAM and VRAM and someone will suggest something

Anonymous
01/09/26(Fri)12:48:17 No.107815740

Anonymous 01/09/26(Fri)12:48:17 No.107815740

>>107815682
they're trying to please you, they aren't gonna be gaming the system, if the model isn't retarded it'll know what you mean when you say you just want a few paragraphs at most

Anonymous
01/09/26(Fri)12:51:17 No.107815759

Anonymous 01/09/26(Fri)12:51:17 No.107815759

>>107815691
Apparently this is called Constrained Beam Search
https://huggingface.co/blog/constrained-beam-search

Anonymous
01/09/26(Fri)12:53:15 No.107815773

Anonymous 01/09/26(Fri)12:53:15 No.107815773

>>107815707
>>107815740
nah
didnt do shit
I'm gonna try something else

Anonymous
01/09/26(Fri)12:55:38 No.107815803

Anonymous 01/09/26(Fri)12:55:38 No.107815803

>>107815785
>>107815785
>>107815785

Anonymous
01/09/26(Fri)13:11:15 No.107815927

Anonymous 01/09/26(Fri)13:11:15 No.107815927

>>107815737
I just meant purely for cockbench output quality/eroticism.

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.