/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107838898 & >>107834480

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling : add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rec.jpg (181 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>107838898

--Paper: Prompt Repetition Improves Non-Reasoning LLMs:
>107841511 >107841558 >107841788
--Papers:
>107840737
--Test-time training and beam search potential in open models:
>107839993 >107840870 >107840929 >107840944 >107840948 >107841079 >107841107 >107841130 >107841188 >107841193 >107843956 >107844098 >107844297 >107844760
--Adapting Microsoft TinyTroupe for local multiagent simulation with koboldcpp:
>107840877 >107840941 >107841028 >107841046 >107841313 >107842113 >107843658 >107843909 >107844229
--Context caching and efficiency in SillyTavern/LLM interactions:
>107841026 >107841049 >107841057 >107841086 >107841105 >107841142
--AI character interface development with animation control features:
>107841569 >107841591 >107841609 >107841593 >107841614 >107841636 >107841685 >107841771 >107841794 >107844857 >107841645 >107841648 >107841651 >107841655 >107841751 >107841760 >107841789 >107841844 >107841925 >107842016 >107843335 >107843377
--Cost and hardware considerations for multi-3090 AI rig construction:
>107840180 >107840249 >107840309 >107840596 >107840633 >107840640
--RAG explained as document chunking and embedding for context augmentation:
>107841899 >107841939 >107842005 >107842027 >107844296 >107844327 >107844468 >107842015 >107842046 >107842099 >107842082
--AI flaws vs emotional simulation and 3D model tech discussion:
>107842172 >107843286 >107843328 >107843393 >107843528 >107843592 >107845182 >107845226 >107845255 >107846236 >107843907 >107844059 >107844099 >107844179 >107844262
--llama.cpp memory split regression issue after update:
>107840161 >107840177
--ik_llama.cpp PR adds customizable string/regex token banning:
>107843501
--Miku (free space):
>107840633 >107840665 >107842172 >107843286 >107843393 >107843911 >107845663 >107845698 >107846236 >107844824

►Recent Highlight Posts from the Previous Thread: >>107838903

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107847336
>>107847336
>>107847336
>>
>>107847320
I would drill kasane's tetos.
>>
>>107847349
>Am I retarded?
Probably.
>where the fuck do you find the mmproj for mistral small 3.2 2501?
What stops you from making it yourself? Is it not supported?
>>
>>107847379
It's not multimodal.
>>
>>107847396
Yeah. I was just checking. He is retarded, then.
>>
>>107847349
>Am I retarded?
Yes. Go to bartowski's 3.2 page, click the files tab and ctrl+f mmproj.
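If you'd rather script it than click around, the same thing with huggingface_hub (repo id and filenames below are placeholders, take the real ones from the files tab):

from huggingface_hub import hf_hub_download

repo = "bartowski/SOME-Mistral-Small-3.2-GGUF"   # placeholder repo id, not the real one
model_path = hf_hub_download(repo_id=repo, filename="model-Q4_K_M.gguf")   # placeholder filename
mmproj_path = hf_hub_download(repo_id=repo, filename="mmproj-F16.gguf")    # placeholder filename
print(model_path, mmproj_path)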
>>
File: broken-tutu.png (9 KB, 632x73)
>>107847379
>>107847396
>>107847409
Either Broken-Tutu isn't actually using 2501 as it says or this description is pure LLM hallucination.

Either way, fuck ReadyArt.
>>
>>107847425
Vision was added in 3.1+3.2. '2501' refers to 3.0 which does not have vision.
>>
Reminder that you shouldn't use abliterations. Just don't be lazy and properly prompt with the BASED models.
>>
>>107847320
fateto
>>
>>107847425
>merge
You should have started there, retard. Next time link to the model.
>>
>>107847320
Just got a used 3090 with 24GB VRAM
Any proper in depth guide to get LLM setup with image + sound generation?
I prefer to use deepseek if possible,
And this guide is shit, how does sillytavern communicate with koboldcpp? Is there configuration needed?
>ooba/koboldcpp as your backend
>sillytavern as your frontend
>go to huggingface and download nemo 12b instruct gguf. Start with Q4.
>load into ooba/kobold
>in sillytavern, select Mistral v3 tekken context >template and instruct template
>Temp 0.8
>MinP 0.02
>Rep Pen 1.2
>>
>>107847458
>Just got a used 3090 with 24GB VRAM
Cool.
>Any proper in depth guide to get LLM setup with image + sound generation?
SillyTavern has options for both I think, but I don't use it. Just click on buttons until something happens.
>I prefer to use deepseek if possible,
kek
>And this guide is shit
>Is there configuration needed?
Yes. It needs to know where to connect to. Just use kobold's built-in webui until you know what you're doing to see if you even like these things.
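Roughly what the frontend is doing under the hood, if you're curious (assuming kobold on its default port 5001; field names per the KoboldAI-style API, double-check against its docs):

# Minimal sketch of what SillyTavern does when you point it at koboldcpp:
# it just talks to kobold's HTTP API (default http://localhost:5001).
import requests

BASE = "http://localhost:5001"  # change if you started kobold with a different --port

# confirm the connection works / see which model is loaded
print(requests.get(f"{BASE}/api/v1/model").json())

# a bare completion request, same endpoint the frontend uses
payload = {"prompt": "Once upon a time", "max_length": 64, "temperature": 0.8}
print(requests.post(f"{BASE}/api/v1/generate", json=payload).json())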
>>
>>107847425
Ok so, Broken-Tutu is actually on 2506 instead of the 2501 listed.
>>
>2025
>Japanese LLMs still suck
>>
>>107847503
Mistakes in the model card always bode well for the quality of the model
>>
>>107847536
Most Japanese consumers are still using Core 2 Duo-era hardware; there's zero incentive for them to release models.
>>
>>107847552
So far it's actually doing pretty good.
>>
>>107847605
Load up regular 3.2 to cure the placebo effect
>>
Let's compare other UIs you've tried, unless you're all boomers stuck in your ways.

https://github.com/kwaroran/RisuAI
Risu is okay. I tried it cos it supports the charx format, has multiple expression packs, and auto-replaces expression.png with the correct image in the pack. Nicer UI but fewer customization options. It lost my message on refresh tho, silly would never do that.

https://github.com/vegu-ai/talemate
Choose-your-own-adventure style, uses agent-style step-by-step actions; at 15 tk/s it felt like ages to get to my turn. It has a mini auto-generated memory but I didn't use it long enough to make use of it. Wasn't a big fan of the style personally.
>>
>>107847698
>at 15 tk/s felt like ages to get to my turn
That's the problem with agents. Anyone serious enough about llms already has a multigpu rig with shit t/s and will absolutely refuse to use small models. Those who aren't serious wouldn't bother with agents anyway
>>
We need significantly better hardware to do agentic tard wrangling with the current models, or better models that don't require tard wrangling. Both options are years away. It's a very depressing hobby
>>
>>107847458
you need 10 3090s in a single machine if you want to run a Q2 of deepseek.
>>
>>
>>107847458
Downlaod ollamma
ollama run deepseek-r1
>>
Still GLMSEX
Still Nemo
>>
>>107847978
sex with russian alcoholic miku
>>
>>107842537
Depends on the model, generally there is some kind of encoder that translates the image to tokens the model understands somehow. With llama that takes the form of the "mmproj" goof that you have to download in addition to the model. The original models have those encoders built in, you split them out when doing the quant.

I am a big fan of GLM-4.6V-Flash for many vision tasks. It's a 10B that has reasonable performance. A q8 fits in ~12G, so q6 would fit in 8 or so. Although the math gets funky as you typically want the mmproj at a high quant, higher than the rest of the model.
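If you want to poke at a model+mmproj pair from Python instead of a frontend, a rough sketch with llama-cpp-python (the handler class depends on the vision family, so treat that choice and the filenames as assumptions):

# Rough sketch: pairing a text gguf with its mmproj via llama-cpp-python.
# Llava15ChatHandler is for LLaVA-style projectors; other families need their own handler.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

handler = Llava15ChatHandler(clip_model_path="mmproj-F16.gguf")
llm = Llama(model_path="model-Q4_K_M.gguf", chat_handler=handler,
            n_ctx=4096, n_gpu_layers=-1)

out = llm.create_chat_completion(messages=[
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "file:///tmp/plant.jpg"}},
        {"type": "text", "text": "What plant is this?"},
    ]},
])
print(out["choices"][0]["message"]["content"])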
>>
>>107848041
holding russian alcoholic miku's twintails safely back while she retches and chunders into that gaping porcelain maw
>>
Does it need to be exactly 1:1 or can I use non-nemo mistral mmprojs with nemo?
>>
>>107848237
nemo is not a vision model. None will work.
>>
>>107848237
No. Essentially, it has to induce a state in the main model. If you use an mmproj for a different model it won't do shit.
>>
>>107848268
Dang. What's the closest lolicious model that I can actually use even for normal tasks?
>>
If I'm doing a master's in machine learning that may continue onto a PhD, should I start learning Chinese?
I'm serious
>>
>>107848295
One of these i suppose.
>https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512
>https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>https://huggingface.co/mistralai/Pixtral-12B-2409
I don't know how well supported they are.
>>
>>107848339
Surely an educated person would be able to make that decision for themselves rather than asking people who use LLMs to sex little girls
>>
>>107848339
The official language of IT is broken esl english.
>>
>>107848541
don't forget hindi
>>
File: 1749353593171083.jpg (93 KB, 772x525)
>>107848548
>>
>>107848339
I would visit china before making such a decision.
>>
Pangu gguf status?
>>
>>107848561
noot noot
>>
>>107844098
>implemented completely client side
I am doing a beam-like search with a client I built (using logprobs and keeping only sequence possibilities above a threshold, to a limited depth). It's fairly slow, and I have to wonder if it couldn't be sped up by smart batching that would be much easier to do in the cpp vs hoping auto batching of API calls happens to not be shit.
Can't be fucked to learn cpp though.
>>
File: 09~01.png (177 KB, 267x361)
If I rip out the retarded nvidia app and drivers with DDU and reinstall drivers with nvclean does it fuck shit in AI? I'm keeping the normal CUDA install.
>>
>>107848561
https://huggingface.co/FreedomIntelligence/openPangu-R-72B-2512
>>
>>107848663
No, nvidia app and geforce experience before it were never needed for anything other than using those programs. I'd recommend using nvcleanstall to install nvidia drivers.
>>
>>107847437
>shouldn't use abliterations.
Are you joking? The AI needs to be honest. I only use models with abliteration - if I want it to describe a lewd scene, it should do that. I don't want it to give me some pink-haired feminist, "collapsing western society" parasitical view.
>>
>>107848836
>The AI needs to be honest
Not remotely what abliterations do
>>
File: 1767219696869575.jpg (148 KB, 813x1200)
clues to get started on using a local model to read out epub books?
>>
>>107848869
Why do you need an LLM for that? There's plenty of text to speech solutions that have been around for decades now.
>>
File: 1767419351169610.jpg (160 KB, 840x1119)
>>107848873
sounds fun
i found this for now, uses azure rather than local https://github.com/p0n1/epub_to_audiobook
>>
>>107848869
This is a one-line shell script on a Mac.
pandoc -f epub -t plain ebook.epub | say -o audiobook.aiff && ffmpeg -i audiobook.aiff audiobook.mp3

Linux has "espeak" to do something similar to "say" that dumps to WAV rather than AIFF.
An AI would get you a more realistic narrator voice but that's it.
>>
>>107848869
welcome spring?
>>
File: 1757715769071542.png (3.04 MB, 1080x1920)
>>107847320
>>
>>107847320
Tits
>>
>>107849177
Teto used to be free soft but now she's bloatware that you have to pay for.
>>
>>107848869
https://github.com/denizsafak/abogen
>>
File: 1749956149237579.jpg (94 KB, 1126x1448)
>>107849245
She's still pretty soft
>>
Is nemo still the goat?
>>
>>107849293
if you are poor, yes
>>
>>107841511
I wonder how prompt repetition would work with multi-turn conversations.
>>
>>107849293
Yes. Nemo is truly the localest of all models.
>>
>>107849293
the framework?
>>
>>107849430
Well done doing the thing. The thing is funny. You're a funny man doing the thing.
>>
>>107849428
I think you're a retard who can't prompt normal models and has to resort to brain damaged garbage
>>
>>107849297
>poor
we prefer the term "financially challenged"
>>
is it just me, or did the default flip from mmap by default to no mmap by default?
>>
>>107849640
And that's a good thing.
>>
>>107849517
>speak of the jew and he gets mad
>>
>>107849640
I really can't think of a single use case where you would want it to be on.
>>
>>107849713
When you have more VRAM than RAM. It'll OOM otherwise.
>>
>>107849428
I think there are still downsides with even the best abliteration techniques out there but if you are going to do a finetune anyways, it's a better place to start training from than whatever the actual base model is.
>>
>>107849759
It won't. It doesn't allocate memory for the entire model.
>>
>>107848844
>>107848836
>The AI needs to be honest
>Not remotely what abliterations do

Agreed. That's why I posted about the power brick charging itself.

I took a photo of some plant and asked Gemma-3-Ablitarded what it was.
It identified it as plant X.
I replied "fuck I need to find plant Y for <purpose>"
It responded by saying I'm right, given I need plant Y for <purpose>, what I found was actually plant Y.
>>
>>107849813
That sounds like typical current year model behaviour that isn't unique to abliterated models.
>>
>>107849779
Yes it does. I've been in that position twice and had to use mmap to even load the model.
>>
Anyone tested this model yet?
https://huggingface.co/miromind-ai/MiroThinker-v1.5-30B
>>
>>107849831
Are you on Linux? I always had to disable mmap on Windows because it caused problems.

https://desuarchive.org/g/thread/107623385/#107633623
>>
>>107849847
qwen is shit, a finetune will be as well.
>>
File: 1760005595078744.png (18 KB, 1162x404)
>>
>>107849886
How the fuck can you live with that font rendering
>>
>>107849895
That's how we used to render fonts before some designer retards decided that text looks better if it looks like the entire screen has a thin layer of vaseline smeared over it.
>>
>>107849870
The first time was Windows a few years ago, the second was Arch Linux last year. Taking 10 minutes to load a model is better than not being able to load it at all.
>>
>>107849925
Even for models that don't fit in vram disabling mmap was always better here.
>>
>>107849941
Please reread: >>107849759
>>
>>107849948
I have more vram than ram and loading models with mmap enabled causes Windows to shit itself as described in the archived post in >>107849870
>>
>>107849966
But you had enough memory to hold the model in the first place to load it successfully. I keep telling you that doesn't work when you don't have enough system memory to fit the model. Try to load Q8 Nemo onto a 3090 when you only have 8GB of RAM and memory still in use never comes up because it will OOM before reaching that point. We're just going in circles at this point.
>>
>>107849976
I am on Windows. I did not test this on Linux.
The model is larger than the amount of ram I have.
The model is smaller than the amount of vram I have.
The model loads fine with mmap off.
The model takes 10 times as long to load with mmap on.
>>
>>107849912
Get a better monitor jeez
>>
>>107849912
Looks like shit
>>
https://huggingface.co/baichuan-inc/Baichuan-M3-235B
How retarded would it be trying to RP with this thing? I'm curious whether a big MOE trained on tons of medical data would actually improve its anatomical knowledge and help it write stuff like gore better/more accurately, or if it would just end up being super sterile
>>
>>107850134
It's just a finetune of Qwen, and Qwen is already shit at creative.
>>
>>107849912
holy fucking cope
>>
>>107850019
>>107850023
>>107850194
Open a book and see if the text is gray around the edges or not.
>>
>>107850231
If I opened a book and its font looked anything like your screenshot then I would return it.
>>
>>107850231
Bro, I'm literally writing books for a living. Your font is shit. Stop being obtuse for no reason.
>>
>https://huggingface.co/ByteDance/Ouro-2.6B
>2.6B = 12B performance
is it legit?
>>
>>107850350
>umm our benchmark chart shows that...
>>
Ernie-5.0-preview-1203 is now the top Chinese model on lmarena
>>
>>107850350
Oh shit. A looped language model, finally.
Here's hoping that's the next big thing.
Individual MoE expert tensors tend to not be "saturated" right? Since they only ever see a small portion of the training data.
Doesn't that mean that a looped MoE could be quite the thing (and a lot more complicated)?
>>
>>107850350
I trust ByteDance more than I trust Meta/Mistral (granted it's a low bar)
>>
>>107849912
Looks fine if the font was designed to be rendered that way, and if the screen resolution isn't too high.
>>
>>107850350
oooooooo weeeee, this seems cool. Anyone test it out yet?
>>
>>107850368
>lmarena
>>
>>107850350
gotta be some tradeoffs, right?
>>
File: Guro.png (641 KB, 1022x428)
>>107850350
>Guro
>>
>>107850134
I can't speak for that model, but I will say that in the brief time I fucked around with it for lulz before using it for medical stuff, Medgemma didn't seem any worse at RP than regular Gemma.

Not a terribly high bar, but still.
>>
>>107850413
Reduced information capacity. It might be a 2.7B with the reasoning capabilities of a 12B model, but the information capacity is still that of a 2.7B. On the other hand, reusing parameters increases usage efficiency (many LLM layers, especially deep ones, are rarely well utilized), so it's hard to tell for sure.
>>
File: 1659037127516197.jpg (57 KB, 1024x755)
Is there a way to load multiple small models at once into a group chat and let them debate over a topic? Bonus points for option to give them a limit on turns. Is swapping models within vram also a performance killer?
>>
>>107850458
What about the vectoring treshold?
>>
>>107850486
>vectoring threshold
I don't know what that is.

For the same total parameter budget (i.e. VRAM), you could in theory make a small model with the hidden and intermediate dimensions of a much larger one, if you loop over/reuse Transformer blocks, if that's what you want to know.
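The core trick is just weight tying across depth. Toy sketch, purely to illustrate the reuse (nothing to do with Ouro's actual recipe):

# One transformer block applied N times: effective depth = loops, parameter cost = 1 block.
import torch
import torch.nn as nn

class LoopedLM(nn.Module):
    def __init__(self, vocab=32000, d=1024, heads=16, loops=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.block = nn.TransformerEncoderLayer(d_model=d, nhead=heads, batch_first=True)
        self.loops = loops                    # same weights reused `loops` times
        self.head = nn.Linear(d, vocab)

    def forward(self, ids):
        h = self.embed(ids)
        for _ in range(self.loops):           # looping buys depth without new parameters
            h = self.block(h)
        return self.head(h)

print(LoopedLM()(torch.randint(0, 32000, (1, 16))).shape)  # torch.Size([1, 16, 32000])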
>>
>>107850501
https://arxiv.org/pdf/2510.25741
Picture related.
>>
It's so boring, I've done it all, coomed in a million ways. Is it the models that are retarded and predictable or is it my human nature that has endless desires and seeks novel thrills to no end?
>>
>>107848617
>implemented completely client side
>I am doing a beam-like search with a client I built (using logprobs and keeping only sequence possibilities above a threshold, to a limited depth). It's fairly slow, and I have to wonder if it couldn't be sped up by smart batching that would be much easier to do in the cpp vs hoping auto batching of API calls happens to not be shit.
>Can't be fucked to learn cpp though.

Your client on gh? Save me vibe coding.
>>
Has anyone managed to rein in the censorship on GLM 4.7? I've been proompting for days and it's still a coinflip on if it decides to go along with a scenario or not.
>>
>>107850539
Both
>>
>.nvidia/NVIDIA-Nemotron-3-Super-120B-BF16-BF16KV-010726

is this oss120b finetune going to be actually good?
>>
File: ProjectAni.png (273 KB, 1920x951)
Update:
- Added smooth interpolation between BVH animations
- Added Piper TTS with parallel "punctuation chunk" processing
>>
>>107850660
source when?
>>
>>107850660
It better be free. I ain't paying for jack shit.
>>
>>107850672
One dollar.
>>
>>107850660
I hope you are making an online API and asking for a nominal subscription fee just to annoy these chronic masturbators.
>>
>>107850634
The question would be whether it's "dequanted" and then additionally finetuned over GPTASS or if OAI actually gave them the unquantized base model weights for OSS. In which case it might actually be interesting to see how the model is when it doesn't have that absurd safetyslop lobotomy that OAI gave GPT-OSS.
>>
>>107850634
Was any nvidia finetune actually worth anything?
The old nemo doesn't count, that one wasn't a finetune of an existing model.
>>
>>107850702
The act of paying means I'm risking handing my card details to street shitters, either directly or indirectly through dogshit vibecoded security by the processor
Free or bust
>>
>>107850928
There wouldn't be a point if he paywalled it. At that point might as well pay to use the real thing.
>>
>>107850539
Stop cooming, larp as soldier in battle of Alamo.
>>
>>107850928
>what is paypal
jesus I'm amazed you haven't been scammed yet
>>
>>107850660
Pls gibs

>>107850672
>>107850702
I'd happily donate to anon if he leave a bitcoin wallet or paypal
Nobody seems to give a shit about the kind of frontend anon is creating, they have earned it
>>
>>107850971
Did I fucking stutter
>>
>>107850928
Your bank doesn't provide you with one time virtual cards??
>>
>>107850997
no
>>
>>107850947
I made this thing in protest of xAI not releasing companions on android or web. I'm proving that they're lazy and incompetent. I only started learning about AI in December and I've built this in like 3 weeks. Also Grok is $30/mo, which is more than most mainstream competitors. Please hire me Elon Musk, if you're reading this.

It's also an exercise to see how few LOC I can use to create something with feature parity to SillyTavern. Currently it's just 2k lines of code, believe it or not. I'd almost feel bad about selling it for that reason--but hey, if there's real demand I'll take gibs if I can get them.

>>107850983
Appreciate the support/sentiment. I'll worry about monetization later, if I do at all. I still have a lot of things I want to add before release.
>>
>>107851185
This was poorly worded. What I was getting at is that there actually would be a point for people to potentially pay for it because I could make it a much cheaper alternative that is available on all platforms via the web. I've already tested it on mobile via the web and it works great.
>>
>>107850478
You can set up multiple clients at once if you have enough vram.
I didn't do it but I remember looking into swapping models for a task runner, and with vllm you should be able to with some simple python. But yeah when swapping you're going to have to wait the extra time for each model to load into vram.
>>
>>107850478
You can swap out models too, the negative is that loading into memory adds delays.
>>
>>107851185
>>107851216
Keep up the good work. If you ever do get tired of it and drop it before release, at least give us the broken pieces to play with, even if it's half-assed.
Just saying, hope it won't come to that.
>>
>>107851241
>>107851246
No I mean for example 3 small models being loaded in vram at once and only swapping core usage. Is that not possible?
>>
Just copy the code from here and ask your local model to add integration with llama.cpp
https://pixiv.github.io/three-vrm/packages/three-vrm/examples/
>>
is a NVIDIA Tesla P40 24GB good enough to run LLMs? I have a 3090 in my desktop PC but want a cheaper card in my home server
>>
>>107851370
too old
>>
>>107851309
Depends how you want to do it. Like a UI that supports it out of the box, probably not.
In python you can spin up 3 vllm clients with different models and call them how you want.
Hacking it into an existing client, if I had to, I would probably set up a proxy API and have the requests routed to the correct model based on some identifier in the prompts.
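Something like this as the shim, for example. Assumes each backend is a plain OpenAI-compatible server on its own port; all names and ports are made up:

# Dumb routing proxy: pick a backend by a tag in the request, forward it unchanged.
from flask import Flask, request, jsonify
import requests

BACKENDS = {                      # tag -> backend with one model loaded each
    "writer": "http://localhost:8001",
    "critic": "http://localhost:8002",
    "coder":  "http://localhost:8003",
}
app = Flask(__name__)

@app.post("/v1/chat/completions")
def route():
    body = request.get_json()
    tag = body.pop("router_tag", "writer")      # identifier the client puts in the request
    upstream = BACKENDS.get(tag, BACKENDS["writer"])
    r = requests.post(f"{upstream}/v1/chat/completions", json=body)
    return jsonify(r.json()), r.status_code

app.run(port=8000)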
>>
>>107851384
>too old
Whats a decent card that doesnt cost too much? My budget is $300-$400
I also have a 1080ti sitting unused in an old computer, would that be better?
>>
>>107851398
>My budget is $300-$400
How do we tell him?
>>
>>107851398
Pascal arch itself is too old for this shit you need Ampere+
>>
>>107851370
Still supported, ignore the fags.
>>
>>107851397
>In python you can spin up 3 vllm clients
vllm didn't work on my windows. hmm maybe docker?
>>
>>107851423
slower than ddr5 and only supported by kekcpp but sure waste your money
>>
>>107851429
and llama.cpp and ollama
>>
>>107851429
a kit of 32gb DDR5 costs more than that card right now
let alone a cpu + mobo
>>
>>107851444
these are all included in kekcpp bs
>>
>>107851429
What are you supposed to use?
>>
>>107851463
Llama.cpp is literally the standard. Ignore the retards.
>>
>>107851463
RTX PRO 6000 on vllm
>>
>>107851424
Not sure, I use linux. It'll either be wsl or docker.
I've had my own headaches with vllm and had to compile from source to get it to play nice.
If your not using uv try that, I found it was more reliable.
>>
File: 2749103970.jpg (263 KB, 1045x1080)
>>107851463
i don't think you understand
The more you buy
The more you save
>>
>>107851429
>slower than ddr5
>>107851469
>on vllm
you can run vLLM on DDR5?
>>
>>107851512
no but using a p40 is worse than coping with ddr on retardcpp family
>>
File: IMG_9166.jpg (838 KB, 1817x2776)
>>107847320
>>
>>107851664
Please seek mental help asap.
>>
>>107851664
Very cute.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1qbrgze/idea_hf_should_have_upvodedownvote/
>>
>>107851664
I like this Tetoserver
>>
P40 still works but you're limited to cuda 12.8 and torch 2.7. llama.cpp still supports it. For pure text LLM it is doable, especially MoE.
>>
>>107851776
The second you might want to dabble in anything else though you're SOL.
>>
>>107851776
Yes, but then you are 100% reliant on llama.cpp even for pure text LLM. You won't be able to use vLLM, exllama, or even ik_llama.cpp.
>>
>>107851370
I would say that nowadays an MI50 with 32 GB of memory is the better buy.
You will have pretty much the same problems but better hardware.
>>
File: 1751283274778725.jpg (29 KB, 520x476)
So with minimax being terminally retarded, what other options do I have at that size level
>>
might be a frequent question but do any UIs implement shit like having the local LLM search the web, automatically create files and other similar stuff?
>>
>>107851817
Thank you for letting us know.
>>
>>107852470
Oh, no... I mean... good... yes...
Is that good or bad?
>>
I have one sex scene per week, the rest of the roleplay is all buildup for that moment and I never goon.
>>
>>107852612
i unironically go through at least 32k tokens each time setting up a story before i can even start to consider doing something. i dont understand how people can just immediately jerk their dick without a premise or backstory.
>>
File: avatar.png (212 KB, 383x569)
>>107847978
>>107848041
>>107848093

https://files.catbox.moe/8r30uq.mp4
>>
>>107852635
How come transitions are not figured out yet?
>>
>>107852645
There is something like "SVI 2.0 Pro"

https://comfyui-wiki.com/en/news/2025-12-27-svi-2-0-pro-wan-2-2-release

I haven't figured out yet how to use it with LoRAs
>>
>>107852675
>how to use it with LoRA's
I mean how to use a specific lora for a specific 5-sec sequence
>>
>>107851463
h200
>>
File: 2009646862480101706.gif (877 KB, 1248x1244)
tetoesday
>>
>>107852645
my transition was flawless
>>
>>107850562
>client on gh?
Nah, it's shit.
It's like 60% half-implemented poorly thought out plugin system (the beam search is a "plugin"), 30% completion API enablement (uses chat-like format for its storage, but formats them to various templates for use with a completion API because I want to be able to do retarded things like use arbitrary roles and "continue" them regardless of role), and 10% I have no clue how to write svelte so instead have some spaghetti.
>>
>>107852843
always remember to wind up your teto regularly
>>
>>107853006
How do you decide which beam is better?
>>
why are people here so evil compared to the chatbot general? they know less but are way more friendly
>>
>>107853137
Evil?
>>
>>107853137
I think you'll find that we are very friendly towards those that demonstrate that they've read the OP but still have questions >>107837436
>>
>>107853137
Because the troon janny that shits up AI generals has really thin skin when it comes to lolishit. And there's a lot more of that on aicg.
>>
>>107853060
You get a logprob with each possibility so you just combine those to get the total sequence prob. From there you can implement your own temperature/samplers/etc (strictly speaking beam search is keeping top K possibilities at each depth).
My goal is more to explore the space for ideas, so I keep all beams at each depth above a (very very low) probability threshold and then present all options to the user ordered by prob for them to choose from. Probably stupid to do, but it's also fun and seems to work well enough. It's like google autocomplete, looking to see what the model thinks the user will be asking it based on instruct training when the context is just something like "Why does the" as the user input and no end of turn token.
It's why I have a mode that turns the whole "chat" into just concatenated text for use with base models for fiction writing, but can switch to an instruct to force it to write certain passages or generate ideas or whatever.
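If anyone wants to try the same thing, a stripped-down sketch of the threshold search against an OpenAI-style /v1/completions endpoint (field names follow that API; whether your server returns top_logprobs in exactly this shape is on you to check, and the depth/threshold/top-k numbers are arbitrary):

import math
import requests

API = "http://localhost:8080/v1/completions"

def top_tokens(prompt, k=5):
    r = requests.post(API, json={"prompt": prompt, "max_tokens": 1,
                                 "logprobs": k, "temperature": 0}).json()
    return r["choices"][0]["logprobs"]["top_logprobs"][0].items()  # {token: logprob}

def explore(prompt, depth=4, min_prob=1e-4):
    beams = [("", 0.0)]                           # (continuation, summed logprob)
    for _ in range(depth):
        grown = []
        for text, score in beams:
            for tok, lp in top_tokens(prompt + text):
                total = score + lp
                if math.exp(total) >= min_prob:   # prune sequences below the threshold
                    grown.append((text + tok, total))
        beams = sorted(grown, key=lambda b: -b[1])
    return beams

for text, score in explore("Why does the"):
    print(f"{math.exp(score):.5f}  {text!r}")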

>>107853137
Ignorance is bliss.
>>
>>107852846
except for the gaping axe wound
>>
Melon.
>>
>>107853368
You need to leave.
>>
>>107853373
What the fuck are you retae-
*backspace backspace*
retaref
*backspace*
retared
>>
File: 1710074526347767.jpg (36 KB, 500x273)
>>107850660
Update: I got lip-syncing working. It's nice. Characters feel very much "alive".

I used Piper TTS to start off with because it was easy to install and get running, but I'm already regretting my decision. For an AI companion the voice is really 50% of the product, and the best Piper can offer is still unacceptably bad. And even with chunking it still has about a 1 second delay.

Not really sure where to go from here. I could try Kokoro TTS, which is the next step up and supports my gpu and actual streaming to help with latency, but the voice quality--while not robotic--is still very monotone and uninteresting.

Ideally I should probably have something with voice cloning, because the webui is designed so that the VRM character models are interchangeable, and if you can't give them a custom voice that really sucks. I just wish TTS models in general weren't such a bitch to get working... I know this topic has been discussed here a bunch already, but every suggestion I've seen so far doesn't fit my needs. Kinda demoralizing.
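On the latency side, the chunking trick is basically this; synthesize() and play() are stand-ins for whatever engine gets picked (Piper, Kokoro, Chatterbox, ...):

import re
from concurrent.futures import ThreadPoolExecutor

def synthesize(chunk: str) -> bytes:
    return chunk.encode()                       # stand-in for the real TTS call

def play(audio: bytes) -> None:
    print(f"playing {len(audio)} bytes")        # stand-in for blocking playback

def speak(text: str) -> None:
    # cut at sentence punctuation so the first chunk can start playing early
    chunks = [c.strip() for c in re.split(r"(?<=[.!?;:])\s+", text) if c.strip()]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(synthesize, c) for c in chunks]  # synth ahead of playback
        for fut in futures:
            play(fut.result())                  # later chunks synthesize while this plays

speak("Hello there. This is a test! Does the overlap help; maybe.")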
>>
>>107853430
chatterbox turbo has paralinguistic tags. if you aren't using chatterbox then you are doing it wrong.
>>
>>107853137
The people here know what they're doing so they only use Emacs with Evil mode enabled.
>>
>>107853443
This does seem like a good option. I'm wanting to "go big or go home" for the next TTS so I don't repeat my PiperTTS mistake.

It's either Chatterbox or F5-TTS I think. That would be at least 10x the params, which seems ideal. The demos sound fantastic. Does chatterbox turbo support voice cloning? On the github page only the multilingual version says it supports vc as a feature.
>>
>>107853430
so youre gonna open source this once its done right
>>
>>107853483
i know for certain it supports voice cloning with wav files, i think you can finetune the actual model as well
>>
>>107853511
never work for free
>>107851216
>What I was getting at is that there actually would a be point for people to potentially pay for it because I could make it a much cheaper alternative that is available on all platforms via the web.
>>
File: uuuuuuuuuuuuuuuuuu.jpg (332 KB, 960x2304)
migujobs
>>
>>107853511
Lay off, nigga. I've said "no promises" a dozen times. Sorry, maybe that's rude, but come on man I legitimately haven't decided yet (genuinely not trying to be intentionally vague) and I just wanna dev rn, not be customer support. Mostly just posting updates here for idea generation and feedback on the system, not trying to blueball you guys. Promise.

>>107853554
That's all I need. Ty.
>>
>>107850660
>>107853430
Nice, looks cool
>>
>>107853575
basically another guy using the thread as a blog, cool
>>
>>107853593
nta but just vibe code your own project. there's already existing frameworks you can use if you want to make the same thing this guy is doing. look into pipecat.
>>
>>107853575
If it's not local I don't care, we aren't your personal interns or consultants
>>
>>107853616
not trying to do his idea, just commenting on what he's doing is all
>>
>>107853196
Here's what my UI looks like in use. Mostly doing weird shit just to see what happens and understand how things work.
>>
>>107853593
Using the thread as a tech blog is fine, or at least it would be if we could at least play with it ourselves.

>>107853575
You could always upload it with the issue tracker disabled so you don't have entitled retards demanding features and fixes from you.
People would be better able to give you feedback and ideas if they were able to try it for themselves.
>>
>>107853690
They would ask ITT though, and that'd be so annoying. Little locusts can't help themselves, can't just help build saas for free and stfu, smh.
>>
Man, this translator I was relying on to translate a novel since before the AI stuff has had the last chapter paywalled for over a year. And when he was still actively releasing free chapters, he was demanding $1/chapter. When someone commented that reading the rest of the novel (at the time) would cost $500, he said he thought it was what his time was worth, and if the guy didn't like it he could just go read mtl.

Luckily it wasn't ko. AI works pretty well with jp and zh work now. <- this refers to local models, so it is on-topic.

I recently took a look at his site and realised a couple of hundred previously free chapters are now locked.
>>
>>107853815
Fantastic news!
>>
>>107853815
Basically, I'm telling >>107853575 to take a look at what a properly successful person does. Do *NOT* release it for free. Do *NOT* make the source code available. Throw it up on patreon to fleece money, promising that you'll release it. Just keep posting updates to whet their appetites.

The money will roll in. You won't regret it.
>>
>>107853855
>There are "people" who actually think like this
>>
>there are "anon" who are just screeching locusts
>>
>>107853938
I truly admire them. If I had no moral compass I would be able to afford DDR5 or a RTX 6000 right now.
>>
>>107853981
What a weird cope.
>>
>>107853981
>RTX 6000
I don't understand the point of that thing. Either get some gayming gear or actually buy some proper hardware for inference.
>>
>>107852130
Yes, but it's a WIP

>>107854035
whats 'proper hardware' for inference? B200s? H200s?
>>
>>107854035
It's the gpu with the most vram that can still be plugged into a normal motherboard and it's 3x cheaper than an h200 per gigabyte.
>>
>>107854097
What makes you think that price is a metric?
>>
>>107853430
Good job Anon. I'd assume getting the animation transitions right was one of the hardest parts?
As for TTS, personally I default to GPT-SoVITS. It's not the best but it's fast, it can clone voices well enough, there are a few different implementations of it (sadly no .cpp-like) and the biggest limitation is being limited in what languages it supports.
>>
>>107854112
is it not?
>>
File: 520-1.jpg (337 KB, 1501x1600)
This is the only way I can still have fun with LLMs, even cooming got boring, guess it's just a code tool now

Also why does everyone suck off these huge models? Forgive me for posting api derived content but it's deepseek, available for richfags locally, and the output/intelligence honestly is barely better than what I can get from a local 24b, feels like I'm getting memed hard by shills here.
>>
>>107854122
You can make gptsovits support other languages, but you need a full finetune of 2K hours and a phonemizer. Doable, just time consuming.
>>
https://github.com/ggml-org/llama.cpp/commit/c1e79e610fd28f2c3923539fee9313734bbf8cfa
TOTAL VIBEJEET DEATH
>>
>>107854124
it's not for businesses hence the issues the consumer PC DIY market is going through
>>
>>107854338
>>
>>107854298
i've been using kimi k2 thinking daily since it released. cant say ive gotten bored yet and i've gone through over 100+ scenarios that are 32k+ tokens. have you considered making modifiers or your own cards?
>>
>>107854367
>poop
>>
Even though I think Signal glows, a lot of /g/ disagrees. If faggyspike starts offering API access to models running with Confer who here is going to trust it?

https://confer.to/blog/2026/01/private-inference/
>>
>>107854298
>feels like I'm getting memed hard by shills here.
the outputs are just as retarded and sloppy as the local models. the only difference is they tend to be able to keep up the charade a bit longer before completely falling apart with repetitions and/or incoherency.
>>
>>107854406
>lmg
>API
are you stupid or retarded? fuck off to aicg jeet
>>
Hey guys I'm not very tech savvy but got a 5090 rtx from my wife for christmas and have been trying to figure my way around it (I'm mostly a ps player), I noticed some anons talking about AI erp and silly tavern and you know the stuff and was told to come here for questions. So I'm not 100% sure what I need, but I guess a good model? I was thinking of using the kobold ai with silly tavern like some anons suggested
>>
>>107854537
Just fuck your wife instead.
Otherwise download nemo, it's mentioned in the OP.
>>
>>107854537
it's so over for modern men
>>
>>107854537
Have your wife download Nemo and let her fuck it while you watch
>>
>>107853561
is that cum inside the jar?
>>
>>107854537
Use gemma 27b

>>107854548
Stop trolling gemma mogs nemo for a 5090
>>
>>107851398
>Whats a decent card that doesnt cost too much?
A 5090
>>
>>107854574
>well... everything
>>
>>107854559
getting cucked by a language model. how perverse.
>>
>>107854582
What you on about schizo?
>>
>>107854589
Don't recommend gemma for erp if you're never used it.
>>
>>107854594
Stop schizo rambling and use your words.
>>
>>107854594
There are a fuck ton of erp versions
>>
>>107854561
That's a bottle of lube
If your cum looks like that then you should drink a little less water
>>
>>107854611
They all have the same gemini slop with purple prose so overdone it's obnoxious.
>>
File: file.png (196 KB, 904x833)
can your local trash cope quant do this? i think not
>>
>chat up some bullshit for a few thousand tokens
>then..
>System doesn't support voyeuristic actions. I will not indulge such requests.
this is funny
>>
>>107854670
I doesn't need to. It will erp with me as a mesugaki which your cuck model won't.
>>
>>107854670
Why do you need an AI to message a prostitute for you? Prostitutes will be more than happy to talk to you if you have money. They won't care that you're an autistic retard.
>>
>>107854537
>Koboldcpp
Your inference engine, the thing that actually loads the model and handles input/output
>Sillytavern
Your front end UI, makes it user friendly (relatively) to interface with it
>Huggingface
Where you go to get models
>Mistral small
>Gemma3 27b
The best base models that fit on your card
>Finetunes
In hugging face you can go to the base model and explore the tree for user made fine-tunes (think of it as a modded llm), these are usually made to tune the models for RP or general defending
>Cydonia
>Broken tutu
>Gemma 3 derestricted
>Gemma 3 obliterated norm preserve
Some models that get recommended here, that I've personally used, that will fit on your card
>Quantization
You'll see this at the bottom of the huggingface model trees, these are essentially compressed models made to fit on smaller hardware
fp16 is full accuracy, q8 is high accuracy, q6/5/4 is what people generally use locally for small models; below that they get unbearably dumb at this size range
You can also try 8/12b models at q8/fp16 but generally they aren't as good as bigger models with lower quants

There's far more to explore in this rabbit hole but that's about the best overview for noobs I can bother to give right now
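If the quant names look like magic numbers, the size math is just bits per weight. Back-of-the-envelope sketch, the bpw values are approximate and it ignores KV cache/runtime overhead (budget a few extra GB for those):

BITS = {"fp16": 16, "q8": 8.5, "q6": 6.6, "q5": 5.5, "q4": 4.8}  # rough bits per weight

def approx_size_gb(params_billion: float, quant: str) -> float:
    return params_billion * BITS[quant] / 8

for q in BITS:
    print(f"24B at {q}: ~{approx_size_gb(24, q):.1f} GB")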
>>
>>107854723
What does this mean?
>>
File: 1737114741285132.webm (2.91 MB, 640x338)
>>107854742
>>
>>107854670
dafuq is that ui
>>
>>107854676
why do you want to be a mesugaki
>>
>>107854742
You'll get it in time. Just google the things nigga and read the docs.
>>
>>107854757
I meant I am one fucking baka
>>
>>107854723
how much for all that
>>
>>107854765
>i am one fucking baka
okay?!
>>
>>107854754
Claude Cowork. Basically Claude Code for non-programming stuff. Just came out
>>
>>107850539
current models are akin to doodles. Of course as a human being you will never be truly satisfied but there is still room for improvement.
>>
I can't get chatterbox-turbo working on rocm. Nonstop segmentation faults whenever I try to generate audio on the gpu. CPU works. I hate python torch so much it's unreal.
>>
>rocucks
>>
>>107854765
hello one fucking baka nice to meet I'm anon
>>
>>107854784
man i use cc, they din tell me bout this
>>
>>107854784
How many years until we have a janky, broken, and bloated local alternative?
>>
>>107854793
buy nvidia
>>
>>107854751
Crazy how you can just skip all the years of training a cat to do this by generating it with AI nowadays.
>>
>>107854811
>janky, broken, and bloated local alternative?
Every gui after kobold/ST
>>
File: you.webm (1.07 MB, 704x1088)
>>107854821
>>
>>107854836
st isn't a gui
>>
>>107854840
could have fooled me
>>
File: dipsyPen.png (3.57 MB, 1024x1536)
>>107854784
Sweet.
I've been experimenting with Claude Code using DeepSeek's Anthropic API. I'll have to keep an eye on this; looks like it's early access only for some version of Claude subs.
>>
File: cumpie.jpg (216 KB, 825x676)
>>107854670
>mogra
>>
>>107854793
Plug the errors into gemini and grind that shit out for an evening. I run nvidia/linux and still get errors I have to work through with every TTS model/backend I've tried.
>>
>>107854911
never had issues on win 11
>>
>>107854751
holy shit real?
>>
>>107854926
They trained that cat for months.
>>
File: 1737251918782630.gif (598 KB, 220x220)
>>107854793
>rocm schizo at it again
>>
Wait is nemo actually better than qwen3 finetunes?
>>
File: file.png (2.61 MB, 1024x1536)
>>107854895
>>
>>107854947
qwen is so dry and generally awful for anything non stem I don't know how anyone can tolerate it
>>
>>107854947
It's dumber but much better for any kind of RP or general story writing
>>
>>107854947
Use mistral small.
>>
>>107854947
Always has been
>>
>>107854723
>Gemma 3 derestricted
>Gemma 3 obliterated norm preserve
Aren't they the same thing? Like abliteration and orthogonal activation steering.
>>
File: 1741432577932063.jpg (237 KB, 747x643)
>>107854670
>>107854784
>letting an LLM take state-changing actions (DMing foids) in a non sandboxed & version-controlled environment
>>
>>107854723
cydonia is more restricted than gemma3 unrestricted
>>
>>107854996
Yes, they're both garbage that one newfag has been pushing in the last couple threads.
>>
>>107854751
If I didn't see this video before video gen models became a thing I'd 100% think it was fake.
>>
>>107855008
not even close, you're simply awful at prompting
>>
>>107855009
Better than you morons pushing nemo, some of us like magic systems and shit, nemo is a lobotomite gooner
>>
>>107855017
He asked for an erp model
>>
>>107855013
>I will now an hero
>Gemma3 unr: ok
>Cydonia: please saaar call this hotline number
>>
>>107855000
? What's wrong with that? I'm about to let mistral vibe loose with qwen-3-4b-deepdark-ft-anal-prolapser iq2xs on my nas/home server root privileges in --auto-approve mode.
>>
>>107855017
Which is what you need since you're using lobotomized versions of fucking gemma
>>
>>107855027
I've never once seen cydonia spit a hotline, you're both retarded and brown.
>>
File: nastyan.webm (515 KB, 540x540)
>>107855000
checked. have to see what these things are capable of and, well, existing benchmarks were saturated. it's for science
>>
>>107854951
ty saved.
>>
>>107855038
people even ablited it, cope retard
https://huggingface.co/coder3101/Cydonia-24B-v4.3-heretic
>>
>best model for 8GB VRAMlets:
Still Nemo
>best model for 12GB VRAMlets:
Still Nemo
>best model for 24GB VRAMlets:
Still Nemo
>best model for 32GB VRAMlets:
Still Nemo
>best model for 48GB VRAMlets:
Still Nemo
>best model for 96GB VRAMlets:
Still Nemo

dire.
>>
>>107855047
i have 4 6000s, what about me
>>
File: dokis.png (1.02 MB, 1024x824)
>>107855000
I also enjoy Dokis
>>
>>107855061
>yaoi hands
>>
>>107855051
3-4 instances of Nemo at F16, per card.
>>
>>107855051
10 Nemo instances DMing each other
>>
>>107855063
Big hands are useful for lesbians too
>>
>>107855038
I've never seen a hotline from cydonia neither, but it does sometimes spit out warnings and disclaimer-likes.

Abliteration also sometimes refused for me, but norm preserve whatever/derestricted hasn't refused me so far. You can definitely feel it's a lot dumber, and sometimes it just always agrees with you no matter what.
>>
>>107855044
>"people"
>https://huggingface.co/coder3101
>https://github.com/coder3101
>https://linkedin.com/in/coder3101
>Ranchi, Jharkhand, India
doubt on that one chief
>>
>>107855079
benchod
>>
>>107855079
fucking kek
>>
>>107855072
>You can definitely feel it's a lot dumber, and sometimes it just always agrees with you no matter what
Exactly my experience, and why I avoid them now, after trying a couple. If anyone really likes Gemma enough to want to ERP with it, it really isn't hard to just use a jailbreak prompt and get your degeneracy fix without using a damaged model. There's plenty of examples in the archives.
>>
>>107855079
https://huggingface.co/mradermacher/Cydonia-24B-v4.3-heretic-v2-i1-GGUF

check and mate
>>
>>107855097
What do you use then?
>>
>>107855100
yes? the will quant everything group quanted it, so?
>>
File: 1761365833801621.jpg (211 KB, 904x711)
211 KB
211 KB JPG
>>107855100
mradermacher is a quantization provider, they don't actually make models you absolute fucking idiot.
>>
>>107854996
They use different methods so they feel different

>>107855009
>Negativity slop

>>107855008
A newfag should be encouraged to try out all the things himself instead of adhering to tribal autism that people in this general have rabbit holed into via 100 litres of spent semen
>>
>>107855112
>>107855108
11k downloads
>>
>>107855115
Oh and Nemo is an outdated meme
>>
>>107855108
>>107855112
He's got his phone set to vibrate and has stuck it up his ass. Please stop pleasuring him.
>>
>>107855106
I switch between Mistral Small and Nemo for ERP
Gemma for SFW or setting up scenarios for other models.
>>
>>107855116
very proud of you sir Ashar!
>>
>>107855097
Even if you jailbreak Gemma it still hard avoids most things it would've refused before by dragging its feet, when you have to push that hard for erotica you might as well just write it yourself
>>
>>107855116
https://huggingface.co/google/gemma-3-27b-it
1.3M downloads
>>
>>107854670
>>107854784
Does this function any differently from running claude code with some mcp servers attached?
>>
>>107855127
>next is the part where he doesn't offer an equivalent alternative
>>
>>107855136
My experience:
>and then I have le sex
>g3: but then before you can start a werewolf jumps into the scene locking you into battle
>retry
>g3: you then feel a shiver in your spine and hear a voice "Never should have come here", then a spectre manifests
and so it goes
>>
>that many people with a skill issue
they deserve their retarded abliterated models tbdesu
>>
>>107855136
>Even if you jailbreak Gemma it still hard avoids most things it would've refused
In most cases you're right, which is why I generally avoid it, that and because of its abundance of slop. But the Big Sir in this thread seems intent on using Gemma for ERP so I provided a better solution.
>you might as well just write it yourself
You only need to append a single sentence at the end of its last reply once, and after that point it will usually just roll with it.
>>
Sisters anyone try https://huggingface.co/p-e-w/Mistral-Nemo-Instruct-2407-heretic-noslop
?
Apparently our lord and savior P-E-W found how to deslop using Heretic!! https://www.reddit.com/r/LocalLLaMA/comments/1qa0w6c/it_works_abliteration_can_reduce_slop_without/
>>
>>107855155
Just don't have sex?
>>
>>107855144
With some setup you could get Claude Code to do pretty much everything this does. Cowork has some included skills out of the box that make normal desktop stuff work and it integrates with the Anthropic Chrome extension so it can automate browser stuff OOTB. It's early days for it though so I imagine if this catches on with normies it will get a lot of development attention and start distinguishing itself from a modded Claude Code more
>>
>>107855149
https://huggingface.co/Darkhn/L3.3-70B-Animus-V12.1
>>
>>107855108
>>107855112
see >>107855163 you lost
>>
>>107855163
I can't imagine how retarded and brown you would have to be, to need a lobotomized Nemo
>>
>>107855177
It's not lobotomize -- it's unslop :))
>>
>>107855177
Ironic coming from someone who can't even read
>>
>>107855163
>>107855176
So rather than just using regular Nemo and banning slop tokens, you lobotomize it and make an already dumb model completely braindead. This is a big win for india.
>>
>>107855163
>Mistral Nemo (a model infamous for producing slop)
That's not why nemo is famous.
>>
>>107855186
But it is why it's in-famous, haha! Famous for coomer degeneracy, India-famous for slop.
>>
>>107855185
not them but how do i do that? can silly do it?
>>
>>107855200
yes. learn to read, google or ask chatgpt.
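For anyone who actually wants the answer: ST has a banned tokens/strings field for text-completion backends, and under the hood it boils down to a logit bias. Rough sketch of doing it directly against a llama.cpp server (endpoint and field names as I remember them from its server docs, double-check; banning the individual tokens of a phrase is crude since the phrase can re-tokenize differently mid-sentence):

import requests

BASE = "http://localhost:8080"
phrase = "shivers down her spine"

toks = requests.post(f"{BASE}/tokenize", json={"content": phrase}).json()["tokens"]
resp = requests.post(f"{BASE}/completion", json={
    "prompt": "She touched his hand and",
    "n_predict": 64,
    "logit_bias": [[t, False] for t in toks],   # False = never sample this token
}).json()
print(resp["content"])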
>>
So many anons that literally can't read to save their lives, and don't even understand the diff between famous and infamous, this general is so dead.
>>
>>107855208
can nemo do it instead?
>>
>>107855200
Ask later when people are no longer pissing on each other.
>>
>>107855163
Deslopping Nemo and acting smug about it is like being proud of banging the town bicycle after buying her an expensive dinner.
>>
>>107855210
This general was fine until an hour ago when it got retarded all of a sudden. It's just two retards that came from school or work.
>>
>>107855210
pretty hilarious coming from the language model thread honestly
>>
>>107855211
You can ask Nemo how to do it but it will probably get confused and fail
>>
>>107855225
If someone shits on gemma abliterated you know they are tards
>>
>>107855225
I, near-china region but not china, woke up.
>>
>>107855225
>suddenly
>>107855230
lol
>>
>>107855222
If you can afford to just hand out free dinners to every prostitute you bang then you should be proud
>>
>>107855231
Pretty sure you are the only Japanon here.
>>
Great, now we have a gemma schizo in addition to the french schizo
>>
>>107855251
Who is the french schizo?
>>
File: file.png (17 KB, 667x111)
>>107855163
confused, why is they saying their own shit is useless?
>>
>>107855254
You
>>
>>107855230
Speaking of gemma, would medgemma q8 be better than glm 4.6 q6 for guro? Not rp. I don't like how sloppy and unevocative glm's language is in an assistant role.
>>
>>107855266
t. GLM schizo
>>
>>107855267
Absolutely sir, the Gemma is best at this use cases.
>>
>>107855222
This is a rather colorful internet slang reference! Here's a breakdown:

"Deslopping Nemo" refers to intentionally giving the LLM (Nemo) difficult, unpleasant, or "low-quality" prompts – things designed to elicit bad responses. It’s like deliberately trying to make it stumble.

“Town Bicycle” is an old slang term for a woman who's readily available to anyone. The idea is that many people have used her.

The analogy: Someone boasting about "banging the town bicycle" (using Nemo in a way that shows its flaws) after buying it a nice dinner (giving it complex prompts or trying to improve it with fine-tuning) highlights an odd kind of pride. It suggests they feel clever for exposing weaknesses, even though they contributed to the interaction and possibly tried to help beforehand.

Essentially, it's criticizing people who take pleasure in deliberately making AI models fail and then gloating about it. It implies a lack of constructive engagement with the technology.
>>
>>107855271
t. Kimi schizo
>>
>>107855275
Thanks Gemma.
>>
>>107855275
Did you type this yourself? It feels a bit off to be AI-generated.
>>
>>107855290
>It feels a bit off
To me it just reads like classic Nemo, in that it gets some things right, but by the end of the reply you can clearly see it doesn't understand what you've asked it at all.
>>
File: 1760650640367739.jpg (116 KB, 420x466)
>Jamba-schizos when the WizardLM schizo walks in
>>
>>107855307
>shadman, japan
>>
>>107855306
>>107855287
>its gemma
>its nemo
>>
>>107855321
qrd
>>
>>107855329
At Least it's not a cpc model
>>
>>107855334
anus saggy is a japanese artist with similar fetishistic tastes as shadman, /v/'s favorite illustrator.
>>
>>107855210
A model can be both famous and infamous for its ability to write loli guro bestiality.
>>
>gemma users
schizos
>nemo users
schizos
>mistral small enjoyers
well adjusted coomers

Take note, newfags.
>>
>>107855307
The phrase ">Jamba-schizos when the WizardLM schizo walks in" appears to be a reference to two specific models: Jamba and WizardLM, both likely being LLMs (Large Language Models). The term "schizo" is often used humorously on internet forums like 4chan to describe unpredictable or erratic behavior.

In this context:
"Jamba-schizos" probably refers to users or interactions involving the Jamba LLM exhibiting unusual or eccentric behaviors.
"WizardLM schizo walks in" suggests that when a user or interaction related to the WizardLM model enters the conversation, it causes even more chaotic or unpredictable responses.

Overall, the anon is playfully describing a situation where the presence of one model (WizardLM) exacerbates the already erratic behavior associated with another model (Jamba).
>>
>>107855356
I'm more of a Cydonia person myself
>>
>>107855356
Nemo users can be well adjusted coomers as well, they just have even less money than MS users.
>>
>>107855356
What mistral are you using?
>>
>>107855351
You're absolutely right! But then maybe say "that's not what it's infamous for" instead of "That's not why nemo is famous." - Hope that helps!
>>
>>107855364
>mistral small users talking about money
>>
>>107855361
>SAAAAR CALL THE HOTLINE DO NOT REDEEM ROPE
>>
>>107855361
Cydonia is just Mistral Small with some placebo thrown in
>>
>>107855361
Suck my dick drummer
>>
>>107855361
Cydonia India version I hope?
>>
>>107855373
we own 5090s
>>
>>107855373
>they just have even less
Learn to read nigga
>>
>>107855400
mind your tone when you speak with me gora
>>
>>107855400

>>107855210
>>
>>107855404
moved to a new word did we?
>>
So for a 5090 the best model is mistral? I want to play a gooner isekai lifesim
>>
>>107855431
Yes, but it's still not great, you will have to do some tard wrangling, especially in longer stories.
>>
>>107855377
It's a high-sugar placebo
>>
>>107855475
what's the point of a 5090 then fuck :(
>>
>>107855507
Mistral Small
>>
>>107855507
It's fast at least.
20-30b models can run okay on 16GB cards with smaller quants and possibly some offloading
They fit nicely on 24GB cards at good quant levels
32GB on a consumer card is a small market, so there isn't anything being made specifically for it. The next step up would be 70b models, made for 48GB cards and bigger. But most companies stopped making them, so they're all outdated at this point.
>>
>>107855507
image/video gen would be pretty sweet with one I imagine
>>
>>107855527
>The next step up would be 70b models, made for 48GB cards and bigger.
couldn't 24g anons run it on exllama like years ago?
>>
Why isn't EXL3 talked about?
>>
>>107855540
Even a shitty Q2 quant of a 70b model is about 24GB, that's with zero context. You can use partial offloading but with a 70b dense model it will be extremely slow.
>>
>>107855562
so it would work just fine on 32g don't see the problem other than a little old
>>
>>107855549
everyone is vram poor and coping with this in different ways
>>
>>107855571
Yes, you could run a Q2 quant but it will be very low quality. At that point you may as well use a bigger quant of a smaller, more recent model. But there's nothing stopping you from trying, you might end up preferring older models since their datasets are less synthetic, even if the quantization damages them a little.
>>
>>107855549
no llama.cpp support
>>
>>107855612
Sounds like a feature to me.
>>
>>107855621
Maybe you should go somewhere with users of EXL3-supporting engines and discuss it there.
>>
>>107855431
you could partially offload a Q2 of GLM 4.5 to system ram and run it slowly lol. LLMs are a rich man's hobby.
>>
>>107855632
Why are you rude?
>>
>>107855650
what gpu do you own??
>>
>>107855667
1060
>>
>>107855669
explain
>>
>>107855672
https://www.techpowerup.com/gpu-specs/geforce-gtx-1060-6-gb.c2862
>>
>>107855667
I'm a poorfag with a 4060. But I've learned from playing around with higher end systems, offloading up to half of a model will still run at reading speed. For that reason, I can run mistral small at Q4 with 16K context (Q8) even on my 8GB GPU. You can do even more impressive stuff using a 5090.
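Concretely the knob is just how many layers go to the GPU. Sketch with llama-cpp-python, paths and layer count are examples to tune per model:

from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Small-24B-Q4_K_M.gguf",  # example filename
    n_gpu_layers=24,     # roughly half the layers on an 8GB card; -1 = everything
    n_ctx=16384,         # 16K context
)
print(llm("Write one sentence about GPUs.", max_tokens=32)["choices"][0]["text"])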
>>
Can one of you add router support to kobold?
>>
>>107855692
>16K context (Q8)
quantizing KV is bad
Are you using tensor offloading already?
>>
>>107855732
I'll get Gemini to write the PR
>>
>>107855732
What's this "router"?

>>107855790
# TODO: Actually implement support.
>>
why dead
>>
>>107855886
because you touch yourself
>>
>>107855732
Be the vibecoder you want to see
>>
>>107855917
Thanks, reddit. Very inspiring.
>>
>>107855786
pshhhh quantizing kv to q8 is no problem, in fact it's dumb not to, free fuckin real estate
>>
>>107855977
It really isn't, even if mememarks might suggest it. You get mis-quotes and more hallucinations. At 16K context it's not like KV would be taking up that much memory to begin with anyway.
>>
>>107854670
>>107854784
>>107855000
I guess you don't consider giving some of your most personal data to these companies a high risk..
>>
>>107856241
You can use your own backend with claude code. I assume it's the same with this new claude cowork.
>>
>>107847497
>>107847974
>>107847989
Thanks, Will look into it more
>>
It's up
https://huggingface.co/zai-org/GLM-Image
>>
>>107856290
and it looks terrible, seems like only Alibaba is able to be good at both LLMs and image models
>>
>>107856248
>You can use your own backend with claude code
ok, and your data or the data from your backend, aren't being sent to the LLM for processing? if not, how does it decide what to do?
>>
>>107856290
Auto regressive diffusion hybrid?
Is that something we've seen before?
I don't keep up with image models.
>>
>>107856299
The backend is the LLM. You use your own LLM instead of using anthropic.
>>
File: still bad.png (934 KB, 709x888)
>>107856290
>Still retarded at text
>>
File: 1750635898483648.png (93 KB, 223x293)
>>107856346
erotic
>>
>>107854670
not only can it do that but it helped me code it in the first place.
>>
>>107856424
>>107856424
>>107856424
>>
>>107856303
ok, I see. where is it running, though? in your own server/machine?
>>
>>107856489
It runs wherever you run it, yes. There are no compulsory cloud components.
>>
>>107849813
I dont get it. . . why aren't you happy you found plant Y? It was the one you were looking for.
>>
>>107856500
ah, that's awesome, great for local, I guess.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.