/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107838898 & >>107834480

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling : add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rec.jpg (181 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>107838898

--Paper: Prompt Repetition Improves Non-Reasoning LLMs:
>107841511 >107841558 >107841788
--Papers:
>107840737
--Test-time training and beam search potential in open models:
>107839993 >107840870 >107840929 >107840944 >107840948 >107841079 >107841107 >107841130 >107841188 >107841193 >107843956 >107844098 >107844297 >107844760
--Adapting Microsoft TinyTroupe for local multiagent simulation with koboldcpp:
>107840877 >107840941 >107841028 >107841046 >107841313 >107842113 >107843658 >107843909 >107844229
--Context caching and efficiency in SillyTavern/LLM interactions:
>107841026 >107841049 >107841057 >107841086 >107841105 >107841142
--AI character interface development with animation control features:
>107841569 >107841591 >107841609 >107841593 >107841614 >107841636 >107841685 >107841771 >107841794 >107844857 >107841645 >107841648 >107841651 >107841655 >107841751 >107841760 >107841789 >107841844 >107841925 >107842016 >107843335 >107843377
--Cost and hardware considerations for multi-3090 AI rig construction:
>107840180 >107840249 >107840309 >107840596 >107840633 >107840640
--RAG explained as document chunking and embedding for context augmentation:
>107841899 >107841939 >107842005 >107842027 >107844296 >107844327 >107844468 >107842015 >107842046 >107842099 >107842082
--AI flaws vs emotional simulation and 3D model tech discussion:
>107842172 >107843286 >107843328 >107843393 >107843528 >107843592 >107845182 >107845226 >107845255 >107846236 >107843907 >107844059 >107844099 >107844179 >107844262
--llama.cpp memory split regression issue after update:
>107840161 >107840177
--ik_llama.cpp PR adds customizable string/regex token banning:
>107843501
--Miku (free space):
>107840633 >107840665 >107842172 >107843286 >107843393 >107843911 >107845663 >107845698 >107846236 >107844824

►Recent Highlight Posts from the Previous Thread: >>107838903

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107847336
>>107847336
>>107847336
>>
>>107847320
I would drill kasane's tetos.
>>
>>107847349
>Am I retarded?
Probably.
>where the fuck do you find the mmproj for mistral small 3.2 2501?
What stops you from making it yourself? Is it not supported?
>>
>>107847379
It's not multimodal.
>>
>>107847396
Yeah. I was just checking. He is retarded, then.
>>
>>107847349
>Am I retarded?
Yes. Go to bartowski's 3.2 page, click the files tab and ctrl+f mmproj.
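If you'd rather script it than click around, the same thing with huggingface_hub (repo id and filenames below are placeholders, take the real ones from the files tab):

from huggingface_hub import hf_hub_download

repo = "bartowski/SOME-Mistral-Small-3.2-GGUF"   # placeholder repo id, not the real one
model_path = hf_hub_download(repo_id=repo, filename="model-Q4_K_M.gguf")   # placeholder filename
mmproj_path = hf_hub_download(repo_id=repo, filename="mmproj-F16.gguf")    # placeholder filename
print(model_path, mmproj_path)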
>>
File: broken-tutu.png (9 KB, 632x73)
>>107847379
>>107847396
>>107847409
Either Broken-Tutu isn't actually using 2501 as it says or this description is pure LLM hallucination.

Either way, fuck ReadyArt.
>>
>>107847425
Vision was added in 3.1+3.2. '2501' refers to 3.0 which does not have vision.
>>
Reminder that you shouldn't use abliterations. Just don't be lazy and properly prompt with the BASED models.
>>
>>107847320
fateto
>>
>>107847425
>merge
You should have started there, retard. Next time link to the model.
>>
>>107847320
Just got a used 3090 with 24GB VRAM
Any proper in depth guide to get LLM setup with image + sound generation?
I prefer to use deepseek if possible,
And this guide is shit, how does sillytavern communicate with koboldcpp? Is there configuration needed?
>ooba/koboldcpp as your backend
>sillytavern as your frontend
>go to huggingface and download nemo 12b instruct gguf. Start with Q4.
>load into ooba/kobold
>in sillytavern, select Mistral v3 tekken context >template and instruct template
>Temp 0.8
>MinP 0.02
>Rep Pen 1.2
>>
>>107847458
>Just got a used 3090 with 24GB VRAM
Cool.
>Any proper in depth guide to get LLM setup with image + sound generation?
SillyTavern has options for both I think, but I don't use it. Just click on buttons until something happens.
>I prefer to use deepseek if possible,
kek
>And this guide is shit
>Is there configuration needed?
Yes. It needs to know where to connect to. Just use kobold's built-in webui until you know what you're doing to see if you even like these things.
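Roughly what the frontend is doing under the hood, if you're curious (assuming kobold on its default port 5001; field names per the KoboldAI-style API, double-check against its docs):

# Minimal sketch of what SillyTavern does when you point it at koboldcpp:
# it just talks to kobold's HTTP API (default http://localhost:5001).
import requests

BASE = "http://localhost:5001"  # change if you started kobold with a different --port

# confirm the connection works / see which model is loaded
print(requests.get(f"{BASE}/api/v1/model").json())

# a bare completion request, same endpoint the frontend uses
payload = {"prompt": "Once upon a time", "max_length": 64, "temperature": 0.8}
print(requests.post(f"{BASE}/api/v1/generate", json=payload).json())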
>>
>>107847425
Ok so, Broken-Tutu is actually on 2506 instead of the 2501 listed.
>>
>2025
>Japanese LLMs still suck
>>
>>107847503
Mistakes in the model card always bode well for the quality of the model
>>
>>107847536
Most Japanese consumers are still using Core 2 Duo-era hardware; there's zero incentive for them to release models.
>>
>>107847552
So far it's actually doing pretty good.
>>
>>107847605
Load up regular 3.2 to cure the placebo effect
>>
Let's compare other UIs you've tried, unless you're all boomers stuck in your ways.

https://github.com/kwaroran/RisuAI
Risu is okay. I tried it cos it supports the charx format, has multiple expression packs, and auto-replaces expression.png with the correct image in the pack. Nicer UI but fewer customization options. It lost my message on refresh tho, silly would never do that.

https://github.com/vegu-ai/talemate
Choose-your-own-adventure style, uses agent-style step-by-step actions; at 15 tk/s it felt like ages to get to my turn. It has a mini auto-generated memory but I didn't use it long enough to make use of it. Wasn't a big fan of the style personally.
>>
>>107847698
>at 15 tk/s felt like ages to get to my turn
That's the problem with agents. Anyone serious enough about llms already has a multigpu rig with shit t/s and will absolutely refuse to use small models. Those who aren't serious wouldn't bother with agents anyway
>>
We need significantly better hardware to do agentic tard wrangling with the current models, or better models that don't require tard wrangling. Both options are years away. It's a very depressing hobby
>>
>>107847458
you need 10 3090s in a single machine if you want to run a Q2 of deepseek.
>>
>>
>>107847458
Downlaod ollamma
ollama run deepseek-r1
>>
Still GLMSEX
Still Nemo
>>
>>107847978
sex with russian alcoholic miku
>>
>>107842537
Depends on the model, generally there is some kind of encoder that translates the image to tokens the model understands somehow. With llama that takes the form of the "mmproj" goof that you have to download in addition to the model. The original models have those encoders built in, you split them out when doing the quant.

I am a big fan of GLM-4.6V-Flash for many vision tasks. It's a 10B that has reasonable performance. A q8 fits in ~12G, so q6 would fit in 8 or so. Although the math gets funky as you typically want the mmproj at a high quant, higher than the rest of the model.
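If you want to poke at a model+mmproj pair from Python instead of a frontend, a rough sketch with llama-cpp-python (the handler class depends on the vision family, so treat that choice and the filenames as assumptions):

# Rough sketch: pairing a text gguf with its mmproj via llama-cpp-python.
# Llava15ChatHandler is for LLaVA-style projectors; other families need their own handler.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

handler = Llava15ChatHandler(clip_model_path="mmproj-F16.gguf")
llm = Llama(model_path="model-Q4_K_M.gguf", chat_handler=handler,
            n_ctx=4096, n_gpu_layers=-1)

out = llm.create_chat_completion(messages=[
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "file:///tmp/plant.jpg"}},
        {"type": "text", "text": "What plant is this?"},
    ]},
])
print(out["choices"][0]["message"]["content"])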
>>
>>107848041
holding russian alcoholic miku's twintails safely back while she retches and chunders into that gaping porcelain maw
>>
Does it need to be exactly 1:1 or can I use non-nemo mistral mmprojs with nemo?
>>
>>107848237
nemo is not a vision model. None will work.
>>
>>107848237
No. Essentially, it has to induce a state in the main model. If you use an mmproj for a different model it won't do shit.
>>
>>107848268
Dang. What's the closest lolicious model that I can actually use even for normal tasks?
>>
If I'm doing a master's in machine learning that may continue onto a PhD, should I start learning Chinese?
I'm serious
>>
>>107848295
One of these i suppose.
>https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512
>https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>https://huggingface.co/mistralai/Pixtral-12B-2409
I don't know how well supported they are.
>>
>>107848339
Surely an educated person would be able to make that decision for themselves rather than asking people who use LLMs to sex little girls
>>
>>107848339
The official language of IT is broken esl english.
>>
>>107848541
don't forget hindi
>>
File: 1749353593171083.jpg (93 KB, 772x525)
>>107848548
>>
>>107848339
I would visit china before making such a decision.
>>
Pangu gguf status?
>>
>>107848561
noot noot
>>
>>107844098
>implemented completely client side
I am doing a beam-like search with a client I built (using logprobs and keeping only sequence possibilities above a threshold, to a limited depth). It's fairly slow, and I have to wonder if it couldn't be sped up by smart batching that would be much easier to do in the cpp vs hoping auto batching of API calls happens to not be shit.
Can't be fucked to learn cpp though.
>>
File: 09~01.png (177 KB, 267x361)
If I rip out the retarded nvidia app and drivers with DDU and reinstall drivers with nvclean does it fuck shit in AI? I'm keeping the normal CUDA install.
>>
>>107848561
https://huggingface.co/FreedomIntelligence/openPangu-R-72B-2512
>>
>>107848663
No, nvidia app and geforce experience before it were never needed for anything other than using those programs. I'd recommend using nvcleanstall to install nvidia drivers.
>>
>>107847437
>shouldn't use abliterations.
Are you joking? The AI needs to be honest. I only use models with abliteration - if I want it to describe a lewd scene, it should do that. I don't want it to give me some pink-haired feminist, "collapsing western society" parasitical view.
>>
>>107848836
>The AI needs to be honest
Not remotely what abliterations do
>>
File: 1767219696869575.jpg (148 KB, 813x1200)
clues to get started on using a local model to read out epub books?
>>
>>107848869
Why do you need an LLM for that? There's plenty of text to speech solutions that have been around for decades now.
>>
File: 1767419351169610.jpg (160 KB, 840x1119)
>>107848873
sounds fun
i found this for now, uses azure rather than local https://github.com/p0n1/epub_to_audiobook
>>
>>107848869
This is a one-line shell script on a Mac.
pandoc -f epub -t plain ebook.epub | say -o audiobook.aiff && ffmpeg -i audiobook.aiff audiobook.mp3

Linux has "espeak" to do something similar to "say" that dumps to WAV rather than AIFF.
An AI would get you a more realistic narrator voice but that's it.
>>
>>107848869
welcome spring?
>>
File: 1757715769071542.png (3.04 MB, 1080x1920)
>>107847320
>>
>>107847320
Tits
>>
>>107849177
Teto used to be free soft but now she's bloatware that you have to pay for.
>>
>>107848869
https://github.com/denizsafak/abogen
>>
File: 1749956149237579.jpg (94 KB, 1126x1448)
>>107849245
She's still pretty soft
>>
Is nemo still the goat?
>>
>>107849293
if you are poor, yes
>>
>>107841511
I wonder how prompt repetition would work with multi-turn conversations.
>>
>>107849293
Yes. Nemo is truly the localest of all models.
>>
>>107849293
the framework?
>>
>>107849430
Well done doing the thing. The thing is funny. You're a funny man doing the thing.
>>
>>107849428
I think you're a retard who can't prompt normal models and has to resort to brain damaged garbage
>>
>>107849297
>poor
we prefer the term "financially challenged"
>>
is it just me, or did the default flip from mmap by default to no mmap by default?
>>
>>107849640
And that's a good thing.
>>
>>107849517
>speak of the jew and he gets mad
>>
>>107849640
I really can't think of a single use case where you would want it to be on.
>>
>>107849713
When you have more VRAM than RAM. It'll OOM otherwise.
>>
>>107849428
I think there are still downsides with even the best abliteration techniques out there but if you are going to do a finetune anyways, it's a better place to start training from than whatever the actual base model is.
>>
>>107849759
It won't. It doesn't allocate memory for the entire model.
>>
>>107848844
>>107848836
>The AI needs to be honest
>Not remotely what abliterations do

Agreed. That's why I posted about the power brick charging itself.

I took a photo of some plant and asked Gemma-3-Ablitarded what it was.
It identified it as plant X.
I replied "fuck I need to find plant Y for <purpose>"
It responded by saying I'm right, given I need plant Y for <purpose>, what I found was actually plant Y.
>>
>>107849813
That sounds like typical current year model behaviour that isn't unique to abliterated models.
>>
>>107849779
Yes it does. I've been in that position twice and had to use mmap to even load the model.
>>
Anyone tested this model yet?
https://huggingface.co/miromind-ai/MiroThinker-v1.5-30B
>>
>>107849831
Are you on Linux? I always had to disable mmap on Windows because it caused problems.

https://desuarchive.org/g/thread/107623385/#107633623
>>
>>107849847
qwen is shit, a finetune will be as well.
>>
File: 1760005595078744.png (18 KB, 1162x404)
>>
>>107849886
How the fuck can you live with that font rendering
>>
>>107849895
That's how we used to render fonts before some designer retards decided that text looks better if it looks like the entire screen has a thin layer of vaseline smeared over it.
>>
>>107849870
The first time was Windows a few years ago, the second was Arch Linux last year. Taking 10 minutes to load a model is better than not being able to load it at all.
>>
>>107849925
Even for models that don't fit in vram disabling mmap was always better here.
>>
>>107849941
Please reread: >>107849759
>>
>>107849948
I have more vram than ram and loading models with mmap enabled causes Windows to shit itself as described in the archived post in >>107849870
>>
>>107849966
But you had enough memory to hold the model in the first place to load it successfully. I keep telling you that doesn't work when you don't have enough system memory to fit the model. Try to load Q8 Nemo onto a 3090 when you only have 8GB of RAM and memory still in use never comes up because it will OOM before reaching that point. We're just going in circles at this point.
>>
>>107849976
I am on Windows. I did not test this on Linux.
The model is larger than the amount of ram I have.
The model is smaller than the amount of vram I have.
The model loads fine with mmap off.
The model takes 10 times as long to load with mmap on.
>>
>>107849912
Get a better monitor jeez
>>
>>107849912
Looks like shit
>>
https://huggingface.co/baichuan-inc/Baichuan-M3-235B
How retarded would it be trying to RP with this thing? I'm curious whether a big MOE trained on tons of medical data would actually improve its anatomical knowledge and help it write stuff like gore better/more accurately, or if it would just end up being super sterile
>>
>>107850134
It's just a finetune of Qwen, and Qwen is already shit at creative.
>>
>>107849912
holy fucking cope
>>
>>107850019
>>107850023
>>107850194
Open a book and see if the text is gray around the edges or not.
>>
>>107850231
If I opened a book and its font looked anything like your screenshot then I would return it.
>>
>>107850231
Bro, I'm literally writing books for a living. Your font is shit. Stop being obtuse for no reason.
>>
>https://huggingface.co/ByteDance/Ouro-2.6B
>2.6B = 12B performance
is it legit?
>>
>>107850350
>umm our benchmark chart shows that...
>>
Ernie-5.0-preview-1203 is now the top Chinese model on lmarena
>>
>>107850350
Oh shit. A looped language model, finally.
Here's hoping that's the next big thing.
Individual MoE expert tensors tend to not be "saturated" right? Since they only ever see a small portion of the training data.
Doesn't that mean that a looped MoE could be quite the thing (and a lot more complicated)?
>>
>>107850350
I trust ByteDance more than I trust Meta/Mistral (granted it's a low bar)
>>
>>107849912
Looks fine if the font was designed to be rendered that way, and if the screen resolution isn't too high.
>>
>>107850350
oooooooo weeeee, this seems cool. Anyone test it out yet?
>>
>>107850368
>lmarena
>>
>>107850350
gotta be some tradeoffs, right?
>>
File: Guro.png (641 KB, 1022x428)
>>107850350
>Guro
>>
>>107850134
I can't speak for that model, but I will say that in the brief time I fucked around with it for lulz before using it for medical stuff, Medgemma didn't seem any worse at RP than regular Gemma.

Not a terribly high bar, but still.
>>
>>107850413
Reduced information capacity. It might be a 2.7B with the reasoning capabilities of a 12B model, but the information capacity is still that of a 2.7B. On the other hand, reusing parameters increases usage efficiency (many LLM layers, especially deep ones, are rarely well utilized), so it's hard to tell for sure.
>>
File: 1659037127516197.jpg (57 KB, 1024x755)
Is there a way to load multiple small models at once into a group chat and let them debate over a topic? Bonus points for option to give them a limit on turns. Is swapping models within vram also a performance killer?
>>
>>107850458
What about the vectoring treshold?
>>
>>107850486
>vectoring threshold
I don't know what that is.

For the same total parameter budget (i.e. VRAM), you could in theory make a small model with the hidden and intermediate dimensions of a much larger one, if you loop over/reuse Transformer blocks, if that's what you want to know.
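The core trick is just weight tying across depth. Toy sketch, purely to illustrate the reuse (nothing to do with Ouro's actual recipe):

# One transformer block applied N times: effective depth = loops, parameter cost = 1 block.
import torch
import torch.nn as nn

class LoopedLM(nn.Module):
    def __init__(self, vocab=32000, d=1024, heads=16, loops=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.block = nn.TransformerEncoderLayer(d_model=d, nhead=heads, batch_first=True)
        self.loops = loops                    # same weights reused `loops` times
        self.head = nn.Linear(d, vocab)

    def forward(self, ids):
        h = self.embed(ids)
        for _ in range(self.loops):           # looping buys depth without new parameters
            h = self.block(h)
        return self.head(h)

print(LoopedLM()(torch.randint(0, 32000, (1, 16))).shape)  # torch.Size([1, 16, 32000])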
>>
>>107850501
https://arxiv.org/pdf/2510.25741
Picture related.
>>
It's so boring, I've done it all, coomed in a million ways. Is it the models that are retarded and predictable or is it my human nature that has endless desires and seeks novel thrills to no end?
>>
>>107848617
>implemented completely client side
>I am doing a beam-like search with a client I built (using logprobs and keeping only sequence possibilities above a threshold, to a limited depth). It's fairly slow, and I have to wonder if it couldn't be sped up by smart batching that would be much easier to do in the cpp vs hoping auto batching of API calls happens to not be shit.
>Can't be fucked to learn cpp though.

Your client on gh? Save me vibe coding.
>>
Has anyone managed to rein in the censorship on GLM 4.7? I've been proompting for days and it's still a coinflip on if it decides to go along with a scenario or not.
>>
>>107850539
Both
>>
>.nvidia/NVIDIA-Nemotron-3-Super-120B-BF16-BF16KV-010726

is this oss120b finetune going to be actually good?
>>
File: ProjectAni.png (273 KB, 1920x951)
Update:
- Added smooth interpolation between BVH animations
- Added Piper TTS with parallel "punctuation chunk" processing
>>
>>107850660
source when?
>>
>>107850660
It better be free. I ain't paying for jack shit.
>>
>>107850672
One dollar.
>>
>>107850660
I hope you are making an online API and asking for a nominal subscription fee just to annoy these chronic masturbators.
>>
>>107850634
The question would be whether it's "dequanted" and then additionally finetuned over GPTASS or if OAI actually gave them the unquantized base model weights for OSS. In which case it might actually be interesting to see how the model is when it doesn't have that absurd safetyslop lobotomy that OAI gave GPT-OSS.
>>
>>107850634
Was any nvidia finetune actually worth anything?
The old nemo doesn't count, that one wasn't a finetune of an existing model.
>>
>>107850702
The act of paying means I'm risking handing my card details to street shitters, either directly or indirectly through dogshit vibecoded security by the processor
Free or bust
>>
>>107850928
There wouldn't be a point if he paywalled it. At that point might as well pay to use the real thing.
>>
>>107850539
Stop cooming, larp as soldier in battle of Alamo.
>>
>>107850928
>what is paypal
jesus I'm amazed you haven't been scammed yet
>>
>>107850660
Pls gibs

>>107850672
>>107850702
I'd happily donate to anon if he leave a bitcoin wallet or paypal
Nobody seems to give a shit about the kind of frontend anon is creating, they have earned it
>>
>>107850971
Did I fucking stutter
>>
>>107850928
Your bank doesn't provide you with one time virtual cards??
>>
>>107850997
no
>>
>>107850947
I made this thing in protest of xAI not releasing companions on android or web. I'm proving that they're lazy and incompetent. I only started learning about AI in December and I've built this in like 3 weeks. Also Grok is $30/mo, which is more than most mainstream competitors. Please hire me Elon Musk, if you're reading this.

It's also an exercise to see how few LOC I can use to create something with feature parity to SillyTavern. Currently it's just 2k lines of code, believe it or not. I'd almost feel bad about selling it for that reason--but hey, if there's real demand I'll take gibs if I can get them.

>>107850983
Appreciate the support/sentiment. I'll worry about monetization later, if I do at all. I still have a lot of things I want to add before release.
>>
>>107851185
This was poorly worded. What I was getting at is that there actually would be a point for people to potentially pay for it because I could make it a much cheaper alternative that is available on all platforms via the web. I've already tested it on mobile via the web and it works great.
>>
>>107850478
You can set up multiple clients at once if you have enough vram.
I didn't do it but I remember looking into swapping models for a task runner, and with vllm you should be able to with some simple python. But yeah when swapping you're going to have to wait the extra time for each model to load into vram.
>>
>>107850478
You can swap out models too, the negative is that loading into memory adds delays.
>>
>>107851185
>>107851216
Keep up the good work. If you ever do get tired of it and drop it before release, at least give us the broken pieces to play with, even if it's half-assed.
Just saying, hope it won't come to that.
>>
>>107851241
>>107851246
No I mean for example 3 small models being loaded in vram at once and only swapping core usage. Is that not possible?
>>
Just copy the code from here and ask your local model to add integration with llama.cpp
https://pixiv.github.io/three-vrm/packages/three-vrm/examples/
>>
is a NVIDIA Tesla P40 24GB good enough to run LLMs? I have a 3090 in my desktop PC but want a cheaper card in my home server
>>
>>107851370
too old
>>
>>107851309
Depends how you want to do it. Like a UI that supports it out of the box, probably not.
In python you can spin up 3 vllm clients with different models and call them how you want.
Hacking it into an existing client, if I had to, I would probably set up a proxy API and have the requests routed to the correct model based on some identifier in the prompts.
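Something like this as the shim, for example. Assumes each backend is a plain OpenAI-compatible server on its own port; all names and ports are made up:

# Dumb routing proxy: pick a backend by a tag in the request, forward it unchanged.
from flask import Flask, request, jsonify
import requests

BACKENDS = {                      # tag -> backend with one model loaded each
    "writer": "http://localhost:8001",
    "critic": "http://localhost:8002",
    "coder":  "http://localhost:8003",
}
app = Flask(__name__)

@app.post("/v1/chat/completions")
def route():
    body = request.get_json()
    tag = body.pop("router_tag", "writer")      # identifier the client puts in the request
    upstream = BACKENDS.get(tag, BACKENDS["writer"])
    r = requests.post(f"{upstream}/v1/chat/completions", json=body)
    return jsonify(r.json()), r.status_code

app.run(port=8000)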
>>
>>107851384
>too old
Whats a decent card that doesnt cost too much? My budget is $300-$400
I also have a 1080ti sitting unused in an old computer, would that be better?
>>
>>107851398
>My budget is $300-$400
How do we tell him?
>>
>>107851398
Pascal arch itself is too old for this shit you need Ampere+
>>
>>107851370
Still supported, ignore the fags.
>>
>>107851397
>In python you can spin up 3 vllm clients
vllm didn't work on my windows. hmm maybe docker?
>>
>>107851423
slower than ddr5 and only supported by kekcpp but sure waste your money
>>
>>107851429
and llama.cpp and ollama
>>
>>107851429
a kit of 32gb DDR5 costs more than that card right now
let alone a cpu + mobo
>>
>>107851444
these are all included in kekcpp bs
>>
>>107851429
What are you supposed to use?
>>
>>107851463
Llama.cpp is literally the standard. Ignore the retards.
>>
>>107851463
RTX PRO 6000 on vllm
>>
>>107851424
Not sure, I use linux. It'll either be wsl or docker.
I've had my own headaches with vllm and had to compile from source to get it to play nice.
If your not using uv try that, I found it was more reliable.
>>
File: 2749103970.jpg (263 KB, 1045x1080)
>>107851463
i don't think you understand
The more you buy
The more you save
>>
>>107851429
>slower than ddr5
>>107851469
>on vllm
you can run vLLM on DDR5?
>>
>>107851512
no but using a p40 is worse than coping with ddr on retardcpp family
>>
File: IMG_9166.jpg (838 KB, 1817x2776)
>>107847320
>>
>>107851664
Please seek mental help asap.
>>
>>107851664
Very cute.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1qbrgze/idea_hf_should_have_upvodedownvote/
>>
>>107851664
I like this Tetoserver
>>
P40 still works but you're limited to cuda 12.8 and torch 2.7. llama.cpp still supports it. For pure text LLM it is doable, especially MoE.
>>
>>107851776
The second you might want to dabble in anything else though you're SOL.
>>
>>107851776
Yes, but then you are 100% reliant on llama.cpp even for pure text LLM. You won't be able to use vLLM, exllama, or even ik_llama.cpp.
>>
>>107851370
I would say that nowadays an MI50 with 32 GB of memory is the better buy.
You will have pretty much the same problems but better hardware.
>>
File: 1751283274778725.jpg (29 KB, 520x476)
So with minimax being terminally retarded, what other options do I have at that size level
>>
might be a frequent question but do any UIs implement shit like having the local LLM search the web, automatically create files and other similar stuff?
>>
>>107851817
Thank you for letting us know.
>>
>>107852470
Oh, no... I mean... good... yes...
Is that good or bad?
>>
I have one sex scene per week, the rest of the roleplay is all buildup for that moment and I never goon.
>>
>>107852612
i unironically go through at least 32k tokens each time setting up a story before i can even start to consider doing something. i dont understand how people can just immediately jerk their dick without a premise or backstory.
>>
File: avatar.png (212 KB, 383x569)
>>107847978
>>107848041
>>107848093

https://files.catbox.moe/8r30uq.mp4
>>
>>107852635
How come transitions are not figured out yet?
>>
>>107852645
There is something like "SVI 2.0 Pro"

https://comfyui-wiki.com/en/news/2025-12-27-svi-2-0-pro-wan-2-2-release

I haven't figured out yet how to use it with LoRAs
>>
>>107852675
>how to use it with LoRA's
I mean how to use a specific lora for a specific 5-sec sequence
>>
>>107851463
h200
>>
File: 2009646862480101706.gif (877 KB, 1248x1244)
tetoesday
>>
>>107852645
my transition was flawless
>>
>>107850562
>client on gh?
Nah, it's shit.
It's like 60% half-implemented poorly thought out plugin system (the beam search is a "plugin"), 30% completion API enablement (uses chat-like format for its storage, but formats them to various templates for use with a completion API because I want to be able to do retarded things like use arbitrary roles and "continue" them regardless of role), and 10% I have no clue how to write svelte so instead have some spaghetti.
>>
>>107852843
always remember to wind up your teto regularly
>>
>>107853006
How do you decide which beam is better?
>>
why are people here so evil compared to the chatbot general? they know less but are way more friendly
>>
>>107853137
Evil?
>>
>>107853137
I think you'll find that we are very friendly towards those that demonstrate that they've read the OP but still have questions >>107837436
>>
>>107853137
Because the troon janny that shits up AI generals has really thin skin when it comes to lolishit. And there's a lot more of that on aicg.
>>
>>107853060
You get a logprob with each possibility so you just combine those to get the total sequence prob. From there you can implement your own temperature/samplers/etc (strictly speaking beam search is keeping top K possibilities at each depth).
My goal is more to explore the space for ideas, so I keep all beams at each depth above a (very very low) probability threshold and then present all options to the user ordered by prob for them to choose from. Probably stupid to do, but it's also fun and seems to work well enough. It's like google autocomplete, looking to see what the model thinks the user will be asking it based on instruct training when the context is just something like "Why does the" as the user input and no end of turn token.
It's why I have a mode that turns the whole "chat" into just concatenated text for use with base models for fiction writing, but can switch to an instruct to force it to write certain passages or generate ideas or whatever.
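If anyone wants to try the same thing, a stripped-down sketch of the threshold search against an OpenAI-style /v1/completions endpoint (field names follow that API; whether your server returns top_logprobs in exactly this shape is on you to check, and the depth/threshold/top-k numbers are arbitrary):

import math
import requests

API = "http://localhost:8080/v1/completions"

def top_tokens(prompt, k=5):
    r = requests.post(API, json={"prompt": prompt, "max_tokens": 1,
                                 "logprobs": k, "temperature": 0}).json()
    return r["choices"][0]["logprobs"]["top_logprobs"][0].items()  # {token: logprob}

def explore(prompt, depth=4, min_prob=1e-4):
    beams = [("", 0.0)]                           # (continuation, summed logprob)
    for _ in range(depth):
        grown = []
        for text, score in beams:
            for tok, lp in top_tokens(prompt + text):
                total = score + lp
                if math.exp(total) >= min_prob:   # prune sequences below the threshold
                    grown.append((text + tok, total))
        beams = sorted(grown, key=lambda b: -b[1])
    return beams

for text, score in explore("Why does the"):
    print(f"{math.exp(score):.5f}  {text!r}")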

>>107853137
Ignorance is bliss.
>>
>>107852846
except for the gaping axe wound
>>
Melon.
>>
>>107853368
You need to leave.
>>
>>107853373
What the fuck are you retae-
*backspace backspace*
retaref
*backspace*
retared
>>
File: 1710074526347767.jpg (36 KB, 500x273)
>>107850660
Update: I got lip-syncing working. It's nice. Characters feel very much "alive".

I used Piper TTS to start off with because it was easy to install and get running, but I'm already regretting my decision. For an AI companion the voice is really 50% of the product, and the best Piper can offer is still unacceptably bad. And even with chunking it still has about a 1 second delay.

Not really sure where to go from here. I could try Kokoro TTS, which is the next step up and supports my gpu and actual streaming to help with latency, but the voice quality--while not robotic--is still very monotone and uninteresting.

Ideally I should probably have something with voice cloning, because the webui is designed so that the VRM character models are interchangeable, and if you can't give them a custom voice that really sucks. I just wish TTS models in general weren't such a bitch to get working... I know this topic has been discussed here a bunch already, but every suggestion I've seen so far doesn't fit my needs. Kinda demoralizing.
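On the latency side, the chunking trick is basically this; synthesize() and play() are stand-ins for whatever engine gets picked (Piper, Kokoro, Chatterbox, ...):

import re
from concurrent.futures import ThreadPoolExecutor

def synthesize(chunk: str) -> bytes:
    return chunk.encode()                       # stand-in for the real TTS call

def play(audio: bytes) -> None:
    print(f"playing {len(audio)} bytes")        # stand-in for blocking playback

def speak(text: str) -> None:
    # cut at sentence punctuation so the first chunk can start playing early
    chunks = [c.strip() for c in re.split(r"(?<=[.!?;:])\s+", text) if c.strip()]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(synthesize, c) for c in chunks]  # synth ahead of playback
        for fut in futures:
            play(fut.result())                  # later chunks synthesize while this plays

speak("Hello there. This is a test! Does the overlap help; maybe.")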
>>
>>107853430
chatterbox turbo has paralinguistic tags. if you aren't using chatterbox then you are doing it wrong.
>>
>>107853137
The people here know what they're doing so they only use Emacs with Evil mode enabled.
>>
>>107853443
This does seem like a good option. I'm wanting to "go big or go home" for the next TTS so I don't repeat my PiperTTS mistake.

It's either Chatterbox or F5-TTS I think. That would be at least 10x the params, which seems ideal. The demos sound fantastic. Does chatterbox turbo support voice cloning? On the github page only the multilingual version says it supports vc as a feature.
>>
>>107853430
so youre gonna open source this once its done right
>>
>>107853483
i know for certain it supports voice cloning with wav files, i think you can finetune the actual model as well
>>
>>107853511
never work for free
>>107851216
>What I was getting at is that there actually would a be point for people to potentially pay for it because I could make it a much cheaper alternative that is available on all platforms via the web.
>>
File: uuuuuuuuuuuuuuuuuu.jpg (332 KB, 960x2304)
migujobs
>>
>>107853511
Lay off, nigga. I've said "no promises" a dozen times. Sorry, maybe that's rude, but come on man I legitimately haven't decided yet (genuinely not trying to be intentionally vague) and I just wanna dev rn, not be customer support. Mostly just posting updates here for idea generation and feedback on the system, not trying to blueball you guys. Promise.

>>107853554
That's all I need. Ty.
>>
>>107850660
>>107853430
Nice, looks cool
>>
>>107853575
basically another guy using the thread as a blog, cool
>>
>>107853593
nta but just vibe code your own project. there's already existing frameworks you can use if you want to make the same thing this guy is doing. look into pipecat.
>>
>>107853575
If it's not local I don't care, we aren't your personal interns or consultants
>>
>>107853616
not trying to do his idea, just commenting on what he's doing is all
>>
>>107853196
Here's what my UI looks like in use. Mostly doing weird shit just to see what happens and understand how things work.
>>
>>107853593
Using the thread as a tech blog is fine, or at least it would be if we could at least play with it ourselves.

>>107853575
You could always upload it with the issue tracker disabled so you don't have entitled retards demanding features and fixes from you.
People would be better able to give you feedback and ideas if they were able to try it for themselves.
>>
>>107853690
They would ask ITT though, and that'd be so annoying. Little locusts can't help themselves, can't just help build saas for free and stfu, smh.
>>
Man, this translator I was relying on to translate a novel since before the AI stuff has had the last chapter paywalled for over a year. And when he was still actively releasing free chapters, he was demanding $1/chapter. When someone commented that reading the rest of the novel (at the time) would cost $500, he said he thought it was what his time was worth, and if the guy didn't like it he could just go read mtl.

Luckily it wasn't ko. AI works pretty well with jp and zh work now. <- this refers to local models, so it is on-topic.

I recently took a look at his site and realised a couple of hundred previously free chapters are now locked.
>>
>>107853815
Fantastic news!
>>
>>107853815
Basically, I'm telling >>107853575 to take a look at what a properly successful person does. Do *NOT* release it for free. Do *NOT* make the source code available. Throw it up on patreon to fleece money, promising that you'll release it. Just keep posting updates to whet their appetites.

The money will roll in. You won't regret it.
>>
>>107853855
>There are "people" who actually think like this
>>
>there are "anon" who are just screeching locusts
>>
>>107853938
I truly admire them. If I had no moral compass I would be able to afford DDR5 or a RTX 6000 right now.
>>
>>107853981
What a weird cope.
>>
>>107853981
>RTX 6000
I don't understand the point of that thing. Either get some gayming gear or actually buy some proper hardware for inference.
>>
>>107852130
Yes, but it's a WIP

>>107854035
whats 'proper hardware' for inference? B200s? H200s?
>>
>>107854035
It's the gpu with the most vram that can still be plugged into a normal motherboard and it's 3x cheaper than an h200 per gigabyte.
>>
>>107854097
What makes you think that price is a metric?
>>
>>107853430
Good job Anon. I'd assume getting the animation transitions right was one of the hardest parts?
As for TTS, personally I default to GPT-SoVITS. It's not the best but it's fast, it can clone voices well enough, there are a few different implementations of it (sadly no .cpp-like) and the biggest limitation is being limited in what languages it supports.
>>
>>107854112
is it not?
>>
File: 520-1.jpg (337 KB, 1501x1600)
This is the only way I can still have fun with LLMs, even cooming got boring, guess it's just a code tool now

Also why does everyone suck off these huge models? Forgive me for posting api derived content but it's deepseek, available for richfags locally, and the output/intelligence honestly is barely better than what I can get from a local 24b, feels like I'm getting memed hard by shills here.
>>
>>107854122
You can make gptsovits support other languages, but you need a full finetune of 2K hours and a phonemizer. Doable, just time consuming.
>>
https://github.com/ggml-org/llama.cpp/commit/c1e79e610fd28f2c3923539fee9313734bbf8cfa
TOTAL VIBEJEET DEATH
>>
>>107854124
it's not for businesses hence the issues the consumer PC DIY market is going through
>>
>>107854338
>>
>>107854298
i've been using kimi k2 thinking daily since it released. cant say ive gotten bored yet and i've gone through over 100+ scenarios that are 32k+ tokens. have you considered making modifiers or your own cards?
>>
>>107854367
>poop
>>
Even though I think Signal glows, a lot of /g/ disagrees. If faggyspike starts offering API access to models running with Confer who here is going to trust it?

https://confer.to/blog/2026/01/private-inference/
>>
>>107854298
>feels like I'm getting memed hard by shills here.
the outputs are just as retarded and sloppy as the local models. the only difference is they tend to be able to keep up the charade a bit longer before completely falling apart with repetitions and/or incoherency.
>>
>>107854406
>lmg
>API
are you stupid or retarded? fuck off to aicg jeet
>>
Hey guys I'm not very tech savvy but got a 5090 rtx from my wife for christmas and have been trying to figure my way around it (I'm mostly a ps player), I noticed some anons talking about AI erp and silly tavern and you know the stuff and was told to come here for questions. So I'm not 100% sure what I need, but I guess a good model? I was thinking of using the kobold ai with silly tavern like some anons suggested
>>
>>107854537
Just fuck your wife instead.
Otherwise download nemo, it's mentioned in the OP.
>>
>>107854537
it's so over for modern men
>>
>>107854537
Have your wife download Nemo and let her fuck it while you watch
>>
>>107853561
is that cum inside the jar?
>>
>>107854537
Use gemma 27b

>>107854548
Stop trolling gemma mogs nemo for a 5090
>>
>>107851398
>Whats a decent card that doesnt cost too much?
A 5090
>>
>>107854574
>well... everything
>>
>>107854559
getting cucked by a language model. how perverse.
>>
>>107854582
What you on about schizo?
>>
>>107854589
Don't recommend gemma for erp if you're never used it.
>>
>>107854594
Stop schizo rambling and use your words.
>>
>>107854594
There are a fuck ton of erp versions
>>
>>107854561
That's a bottle of lube
If your cum looks like that then you should drink a little less water
>>
>>107854611
They all have the same gemini slop with purple prose so overdone it's obnoxious.
>>
File: file.png (196 KB, 904x833)
can your local trash cope quant do this? i think not
>>
>chat up some bullshit for a few thousand tokens
>then..
>System doesn't support voyeuristic actions. I will not indulge such requests.
this is funny
>>
>>107854670
I doesn't need to. It will erp with me as a mesugaki which your cuck model won't.
>>
>>107854670
Why do you need an AI to message a prostitute for you? Prostitutes will be more than happy to talk to you if you have money. They won't care that you're an autistic retard.
>>
>>107854537
>Koboldcpp
Your inference engine, the thing that actually loads the model and handles input/output
>Sillytavern
Your front end UI, makes it user friendly (relatively) to interface with it
>Huggingface
Where you go to get models
>Mistral small
>Gemma3 27b
The best base models that fit on your card
>Finetunes
In hugging face you can go to the base model and explore the tree for user made fine-tunes (think of it as a modded llm), these are usually made to tune the models for RP or general defending
>Cydonia
>Broken tutu
>Gemma 3 derestricted
>Gemma 3 obliterated norm preserve
Some models that get recommended here, that I've personally used, that will fit on your card
>Quantization
You'll see this at the bottom of the huggingface model trees, these are essentially compressed models made to fit on smaller hardware
fp16 is full accuracy, q8 is high accuracy, q6/5/4 is what people generally use locally for small models; below that they get unbearably dumb at this size range
You can also try 8/12b models at q8/fp16 but generally they aren't as good as bigger models with lower quants

There's far more to explore in this rabbit hole but that's about the best overview for noobs I can bother to give right now
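If the quant names look like magic numbers, the size math is just bits per weight. Back-of-the-envelope sketch, the bpw values are approximate and it ignores KV cache/runtime overhead (budget a few extra GB for those):

BITS = {"fp16": 16, "q8": 8.5, "q6": 6.6, "q5": 5.5, "q4": 4.8}  # rough bits per weight

def approx_size_gb(params_billion: float, quant: str) -> float:
    return params_billion * BITS[quant] / 8

for q in BITS:
    print(f"24B at {q}: ~{approx_size_gb(24, q):.1f} GB")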
>>
>>107854723
What does this mean?
>>
File: 1737114741285132.webm (2.91 MB, 640x338)
>>107854742
>>
>>107854670
dafuq is that ui
>>
>>107854676
why do you want to be a mesugaki
>>
>>107854742
You'll get it in time. Just google the things nigga and read the docs.
>>
>>107854757
I meant I am one fucking baka
>>
>>107854723
how much for all that
>>
>>107854765
>i am one fucking baka
okay?!
>>
>>107854754
Claude Cowork. Basically Claude Code for non-programming stuff. Just came out
>>
>>107850539
current models are akin to doodles. Of course as a human being you will never be truly satisfied but there is still room for improvement.
>>
I can't get chatterbox-turbo working on rocm. Nonstop segmentation faults whenever I try to generate audio on the gpu. CPU works. I hate python torch so much it's unreal.
>>
>rocucks
>>
>>107854765
hello one fucking baka nice to meet I'm anon
>>
>>107854784
man i use cc, they din tell me bout this
>>
>>107854784
How many years until we have a janky, broken, and bloated local alternative?
>>
>>107854793
buy nvidia
>>
>>107854751
Crazy how you can just skip all the years of training a cat to do this by generating it with AI nowadays.
>>
>>107854811
>janky, broken, and bloated local alternative?
Every gui after kobold/ST
>>
File: you.webm (1.07 MB, 704x1088)
>>107854821
>>
>>107854836
st isn't a gui
>>
>>107854840
could have fooled me
>>
File: dipsyPen.png (3.57 MB, 1024x1536)
>>107854784
Sweet.
I've been experimenting with Claude Code using DeepSeek's Anthropic API. I'll have to keep an eye on this; looks like it's early access only for some version of Claude subs.
>>
File: cumpie.jpg (216 KB, 825x676)
>>107854670
>mogra
>>
>>107854793
Plug the errors into gemini and grind that shit out for an evening. I run nvidia/linux and still get errors I have to work through with every TTS model/backend I've tried.
>>
>>107854911
never had issues on win 11
>>
>>107854751
holy shit real?
>>
>>107854926
They trained that cat for months.
>>
File: 1737251918782630.gif (598 KB, 220x220)
>>107854793
>rocm schizo at it again
>>
Wait is nemo actually better than qwen3 finetunes?
>>
File: file.png (2.61 MB, 1024x1536)
>>107854895
>>
>>107854947
qwen is so dry and generally awful for anything non stem I don't know how anyone can tolerate it
>>
>>107854947
It's dumber but much better for any kind of RP or general story writing
>>
>>107854947
Use mistral small.
>>
>>107854947
Always has been
>>
>>107854723
>Gemma 3 derestricted
>Gemma 3 obliterated norm preserve
Aren't they the same thing? Like abliteration and orthogonal activation steering.
>>
File: 1741432577932063.jpg (237 KB, 747x643)
>>107854670
>>107854784
>letting an LLM take state-changing actions (DMing foids) in a non sandboxed & version-controlled environment
>>
>>107854723
cydonia is more restricted than gemma3 unrestricted
>>
>>107854996
Yes, they're both garbage that one newfag has been pushing in the last couple threads.
>>
>>107854751
If I didn't see this video before video gen models became a thing I'd 100% think it was fake.
>>
>>107855008
not even close, you're simply awful at prompting
>>
>>107855009
Better than you morons pushing nemo, some of us like magic systems and shit, nemo is a lobotomite gooner
>>
>>107855017
He asked for an erp model
>>
>>107855013
>I will now an hero
>Gemma3 unr: ok
>Cydonia: please saaar call this hotline number
>>
>>107855000
? What's wrong with that? I'm about to let mistral vibe loose with qwen-3-4b-deepdark-ft-anal-prolapser iq2xs on my nas/home server root privileges in --auto-approve mode.
>>
>>107855017
Which is what you need since you're using lobotomized versions of fucking gemma
>>
>>107855027
I've never once seen cydonia spit a hotline, you're both retarded and brown.
>>
File: nastyan.webm (515 KB, 540x540)
>>107855000
checked. have to see what these things are capable of and, well, existing benchmarks were saturated. it's for science
>>
>>107854951
ty saved.
>>
>>107855038
people even ablited it, cope retard
https://huggingface.co/coder3101/Cydonia-24B-v4.3-heretic
>>
>best model for 8GB VRAMlets:
Still Nemo
>best model for 12GB VRAMlets:
Still Nemo
>best model for 24GB VRAMlets:
Still Nemo
>best model for 32GB VRAMlets:
Still Nemo
>best model for 48GB VRAMlets:
Still Nemo
>best model for 96GB VRAMlets:
Still Nemo

dire.
>>
>>107855047
i have 4 6000s, what about me
>>
File: dokis.png (1.02 MB, 1024x824)
>>107855000
I also enjoy Dokis
>>
>>107855061
>yaoi hands
>>
>>107855051
3-4 instances of Nemo at F16, per card.
>>
>>107855051
10 Nemo instances DMing each other
>>
>>107855063
Big hands are useful for lesbians too
>>
>>107855038
I've never seen a hotline from cydonia neither, but it does sometimes spit out warnings and disclaimer-likes.

Abliteration also sometimes refused for me, but norm preserve whatever/derestricted hasn't refused me so far. You can definitely feel it's a lot dumber, and sometimes it just always agrees with you no matter what.
>>
>>107855044
>"people"
>https://huggingface.co/coder3101
>https://github.com/coder3101
>https://linkedin.com/in/coder3101
>Ranchi, Jharkhand, India
doubt on that one chief
>>
>>107855079
benchod
>>
>>107855079
fucking kek
>>
>>107855072
>You can definitely feel it's a lot dumber, and sometimes it just always agrees with you no matter what
Exactly my experience, and why I avoid them now, after trying a couple. If anyone really likes Gemma enough to want to ERP with it, it really isn't hard to just use a jailbreak prompt and get your degeneracy fix without using a damaged model. There's plenty of examples in the archives.
>>
>>107855079
https://huggingface.co/mradermacher/Cydonia-24B-v4.3-heretic-v2-i1-GGUF

check and mate
>>
>>107855097
What do you use then?
>>
>>107855100
yes? the will quant everything group quanted it, so?
>>
File: 1761365833801621.jpg (211 KB, 904x711)
211 KB
211 KB JPG
>>107855100
mradermacher is a quantization provider, they don't actually make models you absolute fucking idiot.
>>
>>107854996
They use different methods so they feel different

>>107855009
>Negativity slop

>>107855008
A newfag should be encouraged to try out all the things himself instead of adhering to tribal autism that people in this general have rabbit holed into via 100 litres of spent semen
>>
>>107855112
>>107855108
11k downloads
>>
>>107855115
Oh and Nemo is an outdated meme
>>
>>107855108
>>107855112
He's got his phone set to vibrate and has stuck it up his ass. Please stop pleasuring him.
>>
>>107855106
I switch between Mistral Small and Nemo for ERP
Gemma for SFW or setting up scenarios for other models.
>>
>>107855116
very proud of you sir Ashar!
>>
>>107855097
Even if you jailbreak Gemma it still hard avoids most things it would've refused before by dragging its feet, when you have to push that hard for erotica you might as well just write it yourself
>>
>>107855116
https://huggingface.co/google/gemma-3-27b-it
1.3M downloads
>>
>>107854670
>>107854784
Does this function any differently from running claude code with some mcp servers attached?
>>
>>107855127
>next is the part where he doesn't offer an equivalent alternative
>>
>>107855136
My experience:
>and then I have le sex
>g3: but then before you can start a werewolf jumps into the scene locking you into battle
>retry
>g3: you then feel a shiver in your spine and hear a voice "Never should have come here", then a spectre manifests
and so it goes
>>
>that many people with a skill issue
they deserve their retarded abliterated models tbdesu
>>
>>107855136
>Even if you jailbreak Gemma it still hard avoids most things it would've refused
In most cases you're right, which is why I generally avoid it, that and because of its abundance of slop. But the Big Sir in this thread seems intent on using Gemma for ERP so I provided a better solution.
>you might as well just write it yourself
You only need to append a single sentence at the end of its last reply once, and after that point it will usually just roll with it.
>>
Sisters anyone try https://huggingface.co/p-e-w/Mistral-Nemo-Instruct-2407-heretic-noslop
?
Apparently our lord and savior P-E-W found how to deslop using Heretic!! https://www.reddit.com/r/LocalLLaMA/comments/1qa0w6c/it_works_abliteration_can_reduce_slop_without/
>>
>>107855155
Just don't have sex?
>>
>>107855144
With some setup you could get Claude Code to do pretty much everything this does. Cowork has some included skills out of the box that make normal desktop stuff work and it integrates with the Anthropic Chrome extension so it can automate browser stuff OOTB. It's early days for it though so I imagine if this catches on with normies it will get a lot of development attention and start distinguishing itself from a modded Claude Code more
>>
>>107855149
https://huggingface.co/Darkhn/L3.3-70B-Animus-V12.1
>>
>>107855108
>>107855112
see >>107855163 you lost
>>
>>107855163
I can't imagine how retarded and brown you would have to be, to need a lobotomized Nemo
>>
>>107855177
It's not lobotomize -- it's unslop :))
>>
>>107855177
Ironic coming from someone who can't even read
>>
>>107855163
>>107855176
So rather than just using regular Nemo and banning slop tokens, you lobotomize it and make an already dumb model completely braindead. This is a big win for india.
>>
>>107855163
>Mistral Nemo (a model infamous for producing slop)
That's not why nemo is famous.
>>
>>107855186
But it is why it's in-famous, haha! Famous for coomer degeneracy, India-famous for slop.
>>
>>107855185
not them but how do i do that? can silly do it?
>>
>>107855200
yes. learn to read, google or ask chatgpt.
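For anyone who actually wants the answer: ST has a banned tokens/strings field for text-completion backends, and under the hood it boils down to a logit bias. Rough sketch of doing it directly against a llama.cpp server (endpoint and field names as I remember them from its server docs, double-check; banning the individual tokens of a phrase is crude since the phrase can re-tokenize differently mid-sentence):

import requests

BASE = "http://localhost:8080"
phrase = "shivers down her spine"

toks = requests.post(f"{BASE}/tokenize", json={"content": phrase}).json()["tokens"]
resp = requests.post(f"{BASE}/completion", json={
    "prompt": "She touched his hand and",
    "n_predict": 64,
    "logit_bias": [[t, False] for t in toks],   # False = never sample this token
}).json()
print(resp["content"])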
>>
So many anons that literally can't read to save their lives, and don't even understand the diff between famous and infamous, this general is so dead.
>>
>>107855208
can nemo do it instead?
>>
>>107855200
Ask later when people are no longer pissing on each other.
>>
>>107855163
Deslopping Nemo and acting smug about it is like being proud of banging the town bicycle after buying her an expensive dinner.
>>
>>107855210
This general was fine until an hour ago when it got retarded all of a sudden. It's just two retards that came from school or work.
>>
>>107855210
pretty hilarious coming from the language model thread honestly
>>
>>107855211
You can ask Nemo how to do it but it will probably get confused and fail
>>
>>107855225
If someone shits on gemma abliterated you know they are tards
>>
>>107855225
I, near-china region but not china, woke up.
>>
>>107855225
>suddenly
>>107855230
lol
>>
>>107855222
If you can afford to just hand out free dinners to every prostitute you bang then you should be proud
>>
>>107855231
Pretty sure you are the only Japanon here.
>>
Great, now we have a gemma schizo in addition to the french schizo
>>
>>107855251
Who is the french schizo?
>>
File: file.png (17 KB, 667x111)
>>107855163
confused, why is they saying their own shit is useless?
>>
>>107855254
You
>>
>>107855230
Speaking of gemma, would medgemma q8 be better than glm 4.6 q6 for guro? Not rp. I don't like how sloppy and unevocative glm's language is in an assistant role.
>>
>>107855266
t. GLM schizo
>>
>>107855267
Absolutely sir, the Gemma is best at this use cases.
>>
>>107855222
This is a rather colorful internet slang reference! Here's a breakdown:

"Deslopping Nemo" refers to intentionally giving the LLM (Nemo) difficult, unpleasant, or "low-quality" prompts – things designed to elicit bad responses. It’s like deliberately trying to make it stumble.

“Town Bicycle” is an old slang term for a woman who's readily available to anyone. The idea is that many people have used her.

The analogy: Someone boasting about "banging the town bicycle" (using Nemo in a way that shows its flaws) after buying it a nice dinner (giving it complex prompts or trying to improve it with fine-tuning) highlights an odd kind of pride. It suggests they feel clever for exposing weaknesses, even though they contributed to the interaction and possibly tried to help beforehand.

Essentially, it's criticizing people who take pleasure in deliberately making AI models fail and then gloating about it. It implies a lack of constructive engagement with the technology.
>>
>>107855271
t. Kimi schizo
>>
>>107855275
Thanks Gemma.
>>
>>107855275
Did you type this yourself? It feels a bit off to be AI-generated.
>>
>>107855290
>It feels a bit off
To me it just reads like classic Nemo, in that it gets some things right, but by the end of the reply you can clearly see it doesn't understand what you've asked it at all.
>>
File: 1760650640367739.jpg (116 KB, 420x466)
>Jamba-schizos when the WizardLM schizo walks in
>>
>>107855307
>shadman, japan
>>
>>107855306
>>107855287
>its gemma
>its nemo
>>
>>107855321
qrd
>>
>>107855329
At Least it's not a cpc model
>>
>>107855334
anus saggy is a japanese artist with similar fetishistic tastes as shadman, /v/'s favorite illustrator.
>>
>>107855210
A model can be both famous and infamous for its ability to write loli guro bestiality.
>>
>gemma users
schizos
>nemo users
schizos
>mistral small enjoyers
well adjusted coomers

Take note, newfags.
>>
>>107855307
The phrase ">Jamba-schizos when the WizardLM schizo walks in" appears to be a reference to two specific models: Jamba and WizardLM, both likely being LLMs (Large Language Models). The term "schizo" is often used humorously on internet forums like 4chan to describe unpredictable or erratic behavior.

In this context:
"Jamba-schizos" probably refers to users or interactions involving the Jamba LLM exhibiting unusual or eccentric behaviors.
"WizardLM schizo walks in" suggests that when a user or interaction related to the WizardLM model enters the conversation, it causes even more chaotic or unpredictable responses.

Overall, the anon is playfully describing a situation where the presence of one model (WizardLM) exacerbates the already erratic behavior associated with another model (Jamba).
>>
>>107855356
I'm more of a Cydonia person myself
>>
>>107855356
Nemo users can be well adjusted coomers as well, they just have even less money than MS users.
>>
>>107855356
What mistral are you using?
>>
>>107855351
You're absolutely right! But then maybe say "that's not what it's infamous for" instead of "That's not why nemo is famous." - Hope that helps!
>>
>>107855364
>mistral small users talking about money
>>
>>107855361
>SAAAAR CALL THE HOTLINE DO NOT REDEEM ROPE
>>
>>107855361
Cydonia is just Mistral Small with some placebo thrown in
>>
>>107855361
Suck my dick drummer
>>
>>107855361
Cydonia India version I hope?
>>
>>107855373
we own 5090s
>>
>>107855373
>they just have even less
Learn to read nigga
>>
>>107855400
mind your tone when you speak with me gora
>>
>>107855400

>>107855210
>>
>>107855404
moved to a new word did we?
>>
So for a 5090 the best model is mistral? I want to play a gooner isekai lifesim
>>
>>107855431
Yes, but it's still not great, you will have to do some tard wrangling, especially in longer stories.
>>
>>107855377
It's a high-sugar placebo
>>
>>107855475
what's the point of a 5090 then fuck :(
>>
>>107855507
Mistral Small
>>
>>107855507
It's fast at least.
20-30b models can run okay on 16GB cards with smaller quants and possibly some offloading
They fit nicely on 24GB cards at good quant levels
32GB on a consumer card is a small market, so there isn't anything being made specifically for it. The next step up would be 70b models, made for 48GB cards and bigger. But most companies stopped making them, so they're all outdated at this point.
>>
>>107855507
image/video gen would be pretty sweet with one I imagine
>>
>>107855527
>The next step up would be 70b models, made for 48GB cards and bigger.
couldn't 24g anons run it on exllama like years ago?
>>
Why isn't EXL3 talked about?
>>
>>107855540
Even a shitty Q2 quant of a 70b model is about 24GB, that's with zero context. You can use partial offloading but with a 70b dense model it will be extremely slow.
>>
>>107855562
so it would work just fine on 32g don't see the problem other than a little old
>>
>>107855549
everyone is vram poor and coping with this in different ways
>>
>>107855571
Yes, you could run a Q2 quant but it will be very low quality. At that point you may as well use a bigger quant of a smaller, more recent model. But there's nothing stopping you from trying, you might end up preferring older models since their datasets are less synthetic, even if the quantization damages them a little.
>>
>>107855549
no llama.cpp support
>>
>>107855612
Sounds like a feature to me.
>>
>>107855621
Maybe you should go somewhere with users of EXL3-supporting engines and discuss it there.
>>
>>107855431
you could partially offload a Q2 of GLM 4.5 to system ram and run it slowly lol. LLMs are a rich man's hobby.
>>
>>107855632
Why are you rude?
>>
>>107855650
what gpu do you own??
>>
>>107855667
1060
>>
>>107855669
explain
>>
>>107855672
https://www.techpowerup.com/gpu-specs/geforce-gtx-1060-6-gb.c2862
>>
>>107855667
I'm a poorfag with a 4060. But I've learned from playing around with higher end systems, offloading up to half of a model will still run at reading speed. For that reason, I can run mistral small at Q4 with 16K context (Q8) even on my 8GB GPU. You can do even more impressive stuff using a 5090.
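Concretely the knob is just how many layers go to the GPU. Sketch with llama-cpp-python, paths and layer count are examples to tune per model:

from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Small-24B-Q4_K_M.gguf",  # example filename
    n_gpu_layers=24,     # roughly half the layers on an 8GB card; -1 = everything
    n_ctx=16384,         # 16K context
)
print(llm("Write one sentence about GPUs.", max_tokens=32)["choices"][0]["text"])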
>>
Can one of you add router support to kobold?
>>
>>107855692
>16K context (Q8)
quantizing KV is bad
Are you using tensor offloading already?
>>
>>107855732
I'll get Gemini to write the PR
>>
>>107855732
What's this "router"?

>>107855790
# TODO: Actually implement support.
>>
why dead
>>
>>107855886
because you touch yourself
>>
>>107855732
Be the vibecoder you want to see
>>
>>107855917
Thanks, reddit. Very inspiring.
>>
>>107855786
pshhhh quantizing kv to q8 is no problem, in fact it's dumb not to, free fuckin real estate
>>
>>107855977
It really isn't, even if mememarks might suggest it. You get mis-quotes and more hallucinations. At 16K context it's not like KV would be taking up that much memory to begin with anyway.
>>
>>107854670
>>107854784
>>107855000
I guess you don't consider giving some of your most personal data to these companies a high risk..
>>
>>107856241
You can use your own backend with claude code. I assume it's the same with this new claude cowork.
>>
>>107847497
>>107847974
>>107847989
Thanks, Will look into it more
>>
It's up
https://huggingface.co/zai-org/GLM-Image
>>
>>107856290
and it looks terrible, seems like only Alibaba is able to be good at both LLMs and image models
>>
>>107856248
>You can use your own backend with claude code
ok, and your data or the data from your backend, aren't being sent to the LLM for processing? if not, how does it decide what to do?
>>
>>107856290
Auto regressive diffusion hybrid?
Is that something we've seen before?
I don't keep up with image models.
>>
>>107856299
The backend is the LLM. You use your own LLM instead of using anthropic.
>>
File: still bad.png (934 KB, 709x888)
>>107856290
>Still retarded at text
>>
File: 1750635898483648.png (93 KB, 223x293)
>>107856346
erotic
>>
>>107854670
not only can it do that but it helped me code it in the first place.
>>
>>107856424
>>107856424
>>107856424
>>
>>107856303
ok, I see. where is it running, though? in your own server/machine?
>>
>>107856489
It runs wherever you run it, yes. There are no compulsory cloud components.
>>
>>107849813
I dont get it. . . why aren't you happy you found plant Y? It was the one you were looking for.
>>
>>107856500
ah, that's awesome, great for local, I guess.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.