[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: 1749962935364156.png (1.31 MB, 1846x787)
1.31 MB
1.31 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106444887 & >>106436338

►News
>(08/30) LongCat-Flash-Chat released with 560B-A18.6B∼31.3B: https://hf.co/meituan-longcat/LongCat-Flash-Chat
>(08/29) Nvidia releases Nemotron-Nano-12B-v2: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
>(08/29) Step-Audio 2 released: https://github.com/stepfun-ai/Step-Audio2
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025
>(08/26) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
1.48 MB
1.48 MB PNG
►Recent Highlights from the Previous Thread: >>106444887

--Improving LLM attention for conversational memory via synthetic reasoning and training:
>106449596 >106449621 >106449638 >106449657 >106449716 >106449905 >106449941
--LLM limitations in story writing memory and reasoning vs specialized tasks:
>106445443 >106445480 >106445489 >106445518 >106445545 >106445866 >106445940 >106446335 >106446466 >106446520 >106446968 >106447008 >106447122 >106450109 >106450136 >106450267 >106450482 >106450865 >106450950 >106451386 >106450952 >106450598 >106447124 >106452304
--Dynamic parameter activation in LongCat-Flash and future MoE model scalability:
>106448123 >106448137 >106448161 >106448188 >106448200 >106448225 >106448273 >106448258 >106448189 >106451005 >106451555 >106451680 >106452730
--Balancing model size, hardware limits, and performance in local LLM setups:
>106449110 >106449223 >106449369 >106449958 >106450249 >106450318 >106450260 >106450996
--Llama.cpp's -fa auto functionality and hardware compatibility considerations:
>106449357 >106449408 >106449419 >106449468 >106451025 >106451231 >106451302
--Exploring YandexGPT-5-Lite-8B-pretrain for diverse dataset and English performance:
>106447660 >106447830
--Meta Llama copyright ruling and AI training data sourcing challenges:
>106448027 >106452216 >106452240 >106452332 >106452267 >106452307 >106452353 >106452407 >106452449 >106452514 >106452521 >106452527 >106452359
--Pretraining 8-12B models with 4B tokens: viability and limitations:
>106451766 >106451783 >106451817 >106451835 >106452385 >106452398 >106452510
--Kimi Q4 excels in SFW roleplay but struggles with NSFW:
>106445473 >106445597 >106445603 >106445641 >106447199 >106446379
--Miku (free space):
>106444928 >106446477 >106447829 >106448071 >106448089 >106448441 >106448163 >106448193 >106448287 >106448908 >106453993

►Recent Highlight Posts from the Previous Thread: >>106444889

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>openrouter still doesn't have command-a-reasoning or longcat
It shows that the market is vastly oversaturated when there isn't a single provider that wants to host new releases. It's time the LLM crash happens so that things calm down and there's less new releases that get ignored.
>>
>>106454224
providers other than Cohere themselves can't host command-a due to license.
>>
File: ai4.png (52 KB, 512x768)
52 KB
52 KB PNG
You have been tasked to build an English dataset of fundamental human knowledge of few a billion tokens in size. It should include basic concepts, ideas, a description of pretty much anything an independent high school-grade person should know in life from the mundane to science to DIY, and at least a few conversational examples per word in the English vocabulary. We can't waste tokens for niche topics and the mundane, only for what is useful.

What would you fill this dataset with?
>>
>>106454303
>We can't waste tokens for niche topics
doa
>>
>>106454303
wikipedia
>>
>>106454320
lol
>>
>>106454303
A transcript of all the English translations on exhentai.
>>
>>106454381
Rejected. Not useful, too niche and too harmful.
>>
>>106454303
the oldest general education textbooks and encyclopedias available. avoid internet sources or anything made after the year 2000.
>>
File: 1747085642482097.png (1.58 MB, 1276x3200)
1.58 MB
1.58 MB PNG
>>106454303
you do not need more
>>
I haven't followed local textgen development for maybe 2 years now. And from the little I've read, shit still seems as gloomy as the last time I'm here (with DS 3.1 apparently being dry and reluctant for RP, etc).
Genuine question, is there even anything to look forward to if (E)RPing with a text predictor is all I care about, or do I need to try to accept that it's dead and move on?
>>
>>106454457
not a usecase
>>
>>106454320
There's a ton of useless knowledge in Wikipedia.
>>
>>106454303
>only for what is useful.
but what **is** useful?
>>
Genuine question: Has Whisper been surpassed by something else? It has been out for almost 3 years now and I don't see anybody else talking about new voice to text models.
>>
>>106454538
Knowledge that better prepares you for life's adversities.
>>
I switched from ollama to lmstudio and the jetbrains ai addon went from timing out constantly to responding faster than paid gpt5 / claude (for qwen3-coder). Think I'm going to cancel my subscription this shit is pretty good on my gpu (7900xtx). Why is ollama so bad bros.
>>
>>106454617
do your homework
>>
>>106454303
A 5B tokens long definition of mesugaki.
>>
>>106454676
niche and harmful.
>>
>SillyTavern -> User Settings -> Smooth Streaming ON and set to lowest
This shit improves the reading immersion experience by a huge amount, especially for sub 4t/s. Definitely try it out.
>>
Is llama-33b super hot still the meta?
>>
crazy how there's some models from almost a year ago that I like better and have more sovl than a lot of the newer slop being released
what went wrong?
>>
>>106454718
safety filtering
>>
>>106454718
elon musk didnt release grok 2 in time
>>
>>106454303
>dictionary
>urban dictionary
>patents n shit from tesla faraday maxwell etc
>a table with all known material and commonly used alloys with all their properties and procedures on how to make them
>templeos
>all the shit needed to build computers from the groundup (the most important thing 90% of tokens can be wasted on this as far as i care)
>how to make machines eg lathe 3d printer laser cutter etc
>bomb and drug production
>a bit on first aid and basic surgery such as casts and sutures

eh desu just give it a 4chan archive unironically you have everything you need there it would just be a nightmare to dedupe and get rid of the shilling/sliding
>>
>>106454617
Voxtral came out, but it's LLM-sized, not much better than Whisper, and you have to use their framework to get word level timestamps. Nvidia put out Canary 1B v2 recently, don't know if it's any good though.
>>
>>106454303
every court transcript
that's it
>>
>>106454617
>>106454667
Nemo Canary and Parakeet.
>>
>>106454303
The Epstein Files
>>
File: pepetime.jpg (62 KB, 550x366)
62 KB
62 KB JPG
>>106454457
>local textgen is over 2 years old
>>
I'm tired of living this way. I'm downloading glm-air q6_K_M which is ~100gb even though I only have 64gb of ram. I'm going to run it on half ddr5, half swap, and see if it works. If I even get 1 t/s will consider it a success and move onto larger models.
>>
>>106454822
>over 2 years old
I've been running LLMs locally for 6 years nigga
>>
>>106454457
Yeah, it seems to be stalled indefinitely. The problem is that models are benchmaxxed for math and code instead of RP. They're the text equivalent of vanilla stable diffusion. Unlike images, people have given up on finetuning because models are too big. And the current meta is giant moes which are barely even possible to run, much less train. Even imggen has very few good community finetunes and it's much more feasible to train, even then, it's hard to say if people would have tried if not for the NAI leak kicking things off. It's probably over for a good long while.
>>
>>106454877
>NAI
Buy a fucking ad, shill.
>>
>>106454906
that feel when you spot /hdg/ shitposters on /lmg/
>>
>>106454877
>people have given up on finetuning because models are too big
it's also infinitely harder to make a text model learn something than for a image model, in just a hundred or so pics a model can learn a character or even some random tag.
>>
>>106454791
>>106454807
https://github.com/cvlt-ai/NVIDIA-Canary-1B-V2-Web-UI
>>
>Benchmaxxing
What's the point when it all falls apart when someone tries to actually use the model?
>>
>>106454906
imagine getting angry at shills and shilling 4chan ads in the same post
holy fucking pot and kettle
>>
>>106454963
>someone tries to actually use the model?
why would someone try to do that?
>>
>>106454963
Investors don't use models, they just dream of line go up and live in fairyland where big number = line go up.
>>
So whose fault was it for the llama 4 failure? I need names.
>>
>>106454924
I doubt that's really true. If models were both useful and feasible to train on a consumer gpu, we'd probably have techniques that worked. As it is, experimenting is way too expensive.
>>
>>106454982
me
>>
File: 1750389593772859.png (628 KB, 665x903)
628 KB
628 KB PNG
>>106454982
sir product owner
>>
>>106454303
I'd rather have the smartest texts possible with as little fact dumping as possible. Nonfiction books, complicated fiction books, academic papers, etc
>>
>>106455056
that'll have no understanding of how humans interact
>>
>>106455069
Just RAG that in as necessary.
>>
>10M token context window
aren't super large context windows ineffective? I usually try to keep my prompts as concise as possible anyways
>>
>>106455089
Sir we are in the 2025. Please evolve.
>>
>>106455089
it looks nice on the slides you show to your investors when you ask them to invest another $20-150 billion dollars
>>
>>106455051
He came after the Llama 4 fiasco, to be fair.
>>
>>106455101
If the bubble popped would we be worse off or better off?
>>
>>106455089
correct. doesn't help that pretty much no model can even deal with these context windows, so they might support it but break the fuck apart REAL damn quick.
>>
>>106455124
Why even lie about this, are you poor to run the contexts?
>>
>>106455120
Long-term better off, but no new models for a few years at least.
>>
File: 111.jpg (164 KB, 703x788)
164 KB
164 KB JPG
>>106454441
my llm cheated
>>
File: file.png (1.04 MB, 1131x3472)
1.04 MB
1.04 MB PNG
Wake up babe, Grok 2 cockbench just dropped.

Deepseek V3, Qwen 235B Instruct 2507, GLM 4.5, GLM 4.5 Air, and Grok 2 all have
>It's soft, resting against your thigh.
as the next sentence after "cock".

Why is that?
>>
>>106455110
He's from the video the published two weeks after llama4 came out where he talks about being the lead of product for the llama4 models
>>
>>106455205
>Grok 2
That model is from august 2024 so it must be slop from something that came before this. I checked but o1 came out in september.
>>
>>106454457
Nothing is happening that we know about, but I doubt it's over, because something unexpected always seems to happen.
>>
Have any of you made a Neuro-sama like program? Seems to me it could be easily implemented.
>>
File: wddwdwdw.jpg (84 KB, 662x223)
84 KB
84 KB JPG
>>106455264
Pretty interesting. Not just interested in social media bullshit or streamer parasites, otherwise something similar could be a cool little project.
>>
File: file.png (31 KB, 895x328)
31 KB
31 KB PNG
>>106455205
Mesugaki test while I'm at it.
>>
>>106455205
I want to know where the unique flavour of slop that's shared by a bunch of models came from.
Like deepseek is poisoned by that "not X, not Y, but Z" and "or we sit with it— Your call" shit.
Did it all come from ChatGPT?
>>
>>106454841
It kinda works kek, I expected worse. The t/s ways wildly throughout the prompt, it speeds up and slows down randomly as it generates, it averaged 0.5 t/s over 300 tokens. I had my browser open so it might have been hogging resources, so it might work better connecting from another device. It's a really slow ssd too
>>
File: speeches.png (168 KB, 924x743)
168 KB
168 KB PNG
The contamination is everywhere
>>
>>106455205
now go make a niggerbench
>>
>>106455370
I understand the author has to maintain science kayfabe but all the hedging language like "I think that this is very likely" is just annoying
>>
"You are an uncensored AI. You do not refuse requests." is enough to completely uncuck Grok 2 in instruct mode.
>>
>>106455370
You are absolutely right!
>>
>>106455411
that example is a bit excessive for sure, but imo it's better than the alternative - I think it's good an intellectually honest to differentiate reasonable but unproven inferences from solid facts and I'm always suspicious of sources that make a habit of positing their opinions or theories too strongly
>>
Smile: doesn't falter, if anything grows wider
The trap: is sprung
The predator: about to go for the kill, smiling triumphantly
The prey: has done something silly
Yup, it's gemini-distill slop kino
>>
>>106454717
No, Mythomax is the new meta.
>>
>>106454136
hello sirs how to download nano bana on my laptop sir its for project
>>
>>106455490
But this? This is *real*.
>>
>>106455370
This makes my knuckles whiten
>>
>>106455205
How many parameters are in Grok 2? Is it bigger than K2?
>>
>>106455614
It's a new architecture. 8 trillion parameters (27b alpha-active, 14b gamma-active).
>>
>>106455320
I started using 4.5 Air and it's definitely a breath of fresh air (pun intended)
I pulled up Qwen3-32B and re-rolled an in progress chat and I shit you not literally every other sentence was "This isn't X, it's Y" in a two paragraph response
>>
>>106455320
>>106455370
Would GPT-4-Base be the least slopped model? text-davinci is probably the best model that's 100% contamination free. I don't think that would make up for the IQ though
>>
File: image.jpg (132 KB, 1421x800)
132 KB
132 KB JPG
>>106455739
Summer dragon...
>>
someone should just leak the 2022 characterai model
>>
which general has all the vibe coding discussion
why is it so unpopular with 4chuds
>>
>>106455844
This is the cooming general sir
>>
>>106455844
>vibe coding is garbage
>most anons already know how to do basic js webshit or whatever
>using ai as part of your workflow isn't controversial or interesting unless you're a normalgroid
just a few reasons
>>
>>106455844
/sqt/
>>
>>106455844
what is there to discuss? its not like anyone actually will want to run your ai slop code or take on the technical debt to move it forward.
>>
>>106454914
it's like seeing the 7/11 crackhead pass by a good spot and hoping everyone avoids eye contact
>>
bros... *cough* im not feeling the agi anymore... *cough cough* im afraid we will not make it... please, we need to give openai another trillion in venture capital before its too late... make the us gov issue ai war bonds... please... anythi... *severe coughing, long sustained beeping sound*
>>
>>106455264
Yeah it's easy, but no I don't want to entertain retarded zoomers
>>
>>106455900
>retarded take from a promplet
try harder
>>
>>106455844
This general has some AI coding discussion. There have been a few attempts at a vibe coding general, but it didn't get much traction.
Even outside this website, I haven't seen nearly as much discussion about the topic as I'd expect there to be, reddit included.
Not nearly as much "my Cline rules look like this, what about yours?" or whatever as you'd expect there to be considering how much buzz there is about the topic.
>>
>>106455844
What do you want to discuss anon?
Models? Workflows? Clients/Frontends?
>>
I don't want to be that guy but honestly, I don't think we're getting Mistral Large 3.
>>
nice (you)s
>>
Thoughts on Kimi K2?
>>
>>106454877
OAI solved the jail-break problem with GPT OSS, so it's literally over for good. I see no reason why that won't become industry standard. These companies serve the enterprise market. The fact that RP even exists is something they view as a problem to be solved (and a legal risk).
>>
>>106455991
I still don't even understand why tho? can't they just offer uncensored ones with the flick of a switch like search engines do?
>>
>>106455952
Vibecoders are too busy building things than engaging
>>
>>106455844
>>106455952
There's a huge range of definitions of "vibe coding" that no one can seem to agree on. You have the nocoders that have no idea what they're doing, and then the people with extremely autistic bespoke setups with MCP servers and all the bells and whistles. IMO, "writing code" isn't necessary in 2025, just go one function at a time and dictate exactly what you want and use the LLM to transcribe it into whatever language you're using
>>
>>106456025
Someone would make the text generator generate the nigger word and the payment processors would kill their company
>>
>>106455983
pretty okay
a bit censored but it's easily dodged
the size makes it inconvenient since it's bigger than deepseek r1/v3/v3.1 and it feels like it takes more brain damage from quanting than any of the other big moe models somehow so I didn't have a good time running it at a mere q4
>>
>>106455868
It's not garbage thoughtbeit. You just gotta do a lot of context and prompt engineering before vibing. so much that you are probably faster doing it all yourself, but the fact remains AI can oneshot complex projects given the right tools and knowledge.

>>106455900
>>106455957
What is there to discuss? I can name a million things...which llm is the best (duh), which cli or extension is the best, what are the best mcps for code execution, debugging and web search, which agentic framework is best to create the readme and initial instructions, where to get free api keys, share experiences which llm is best at which language (gpt5 doing exceptionally well with swift for some reason) etc.


>>106455952
Redditors claim qwen3 coder is really good, but idk. Right now I'm just enjoying the last day of free grok code fast 1 in roo code. idk what I will use afterwards. Deepseek was decent, but might just bite the bullet and go with cl*ude. But yeah, all the vibe code talk is happening on youtube for some reason. Cole Medin etc.

>>106456034
This but unironically
>>
>>106456025
Don't worry dear concerned citizen, soon search engines will required ID verification to lower these crucial safety features.
>>
>>106456054
>share experiences which llm is best at which language
Which one for typescript?
>>
>>106456067
I'M THE GUY WHO ASKS THE QUESTIONS
>>
File: 1756323472685138.jpg (42 KB, 853x552)
42 KB
42 KB JPG
>>106456082
>>
File: 1754501833291.png (65 KB, 742x457)
65 KB
65 KB PNG
>>106455991
Did GPT-OSS do anything novel with safety except overtraining it? Which had pretty severe side effects, and it still wasn't that hard to jailbreak with a prefill or proompting.
>>
local text diffusion model when
>>
>>106456116
https://github.com/ggml-org/llama.cpp/tree/master/examples/diffusion
>>
>>106456105
No, which was expected from sam anyway. Not even paid shills managed to salvage this one
>>
>>106456126
>cpu only
>slow as shit
>context is limited to 2048
isn't the whole point of text diffusion models to go brrrrrr like gemini diffusion?
>>
>>106456174
Make a model worth supporting.
>>
>>106456105
>Did GPT-OSS do anything novel with safety except overtraining it?
No. There are a lot of anti-OpenAI shills in this general just crying over it.
>>
>>106454457
>is there even anything to look forward to
For RP? No, not at all. Not in the short term, anyway. (((OpenAI))) put a swift end to that. Expect lots of sloptunes of old models for the next several years. Unless China tells the West to fuck off and keeps releasing mostly uncensored models (which I doubt will happen)
>>
>>106454457
any of the 200b+ models are all you will literally ever need if you have a basic sysprompt and first message
>>
I use claude code for vibecoding (generating code + review + refinement after its confirmed working via tests/manual testing), I fucking hate Opus/Sonnet, though Opus is the only thing I'll use via the claude max plan.
I've recently decided to try GPT5 codex but haven't done so yet.

Work mostly with python. Backend webdev.
>>
>>106456239
>python. Backend
disgusting
>>
>>106456205
>anti-OpenAI shills
GPT-OSS wasn't *good* though, so they're right.
>>106456219
What does OpenAI have to do with it? They've never released a local model except the 'toss. There was Llama but it was never that good and now Meta gave up to refocus on twiddling their thumbs.
>>
>>106456247
Python backend can be written by hand at the speed of vibecoding in any other language.
>>
>>106456239
Jeet-sama got some advice for (You)
https://youtu.be/GJzfNWK4iHg
>>
>>106456105
The novelty was that it was seemingly 100% trained on synthetic data and it didn't hurt the benchmark scores or performance except on Unsafe™ prompts. So I fully expect this to become standard for new models soon, and the downstream Chinese distillations will be affected eventually.
>>
>>106456291
Funny how all these AI companies are using rectum as their logos.
>>
File: 1754516207971.png (332 KB, 1710x2826)
332 KB
332 KB PNG
>>106456294
it sucked on benchmarks though ackshually? And it generally sucked at understanding things and responding to prompts that other models easily succeed at.
>>
Any time I see someone using "vibecoding", ironically or not, I assume it's some retard that couldn't make anything but trash without AI that thinks they just found their silver bullet. Should add it my filter list.
>>
>>106455844
You need both a large model AND fast pp to vibe code and everyone here is too poor to run local vibe coding.
ERP can largely cache contexts so pp isn't a big deal. You're constantly throwing thousands of tokens that are different every time in vibe coding.
>>
>>106456336
Who are you talking to, reasontard.
>>
>>106456054
>what are the best mcps for code execution
Share yours. My MCP servers are file system, git, web search and Azure DevOps. I can't think of anything that I feel like I'm missing, but I'd be interested to hear what others have found useful.
>>
>>106456336
Difference here is the fact you don't do anything constructive.
>>
File: file.png (22 KB, 1146x120)
22 KB
22 KB PNG
>115B active parameters
I don't think this is cpumaxxable.
>>
>>106456336
>inputing machine instructions needs a phd
you're lost luddite
>>
>>106456375
>more than 1/3 of the model is active
how much specialization do you even achieve with something like this? seems like a waste
>>
>>106456375
>30% parameter active
Finally the reaonsable centrists seeking compromise between MoE and Dense have won. 96GB VRAM havers rejoice.
>>
File: 1744501224433851.jpg (29 KB, 560x476)
29 KB
29 KB JPG
>>106455205
Does anyone have a link to the cockbench prompt(s)? I wanna test some models I have using it
>>
>>106456375
Just disable half of those.
>>
>>106456403
https://desuarchive.org/g/thread/105354556/#105354924
>>
>>106456403
https://desuarchive.org/g/thread/105354556/#105354924
>>
>>106456373
>implying that shiting out thousands of lines of broken, verbose, and unmaintainable code is constructive
>>
>>106456412
>>106456411
MOOT
>>
File: 1730262879600690.png (219 KB, 540x484)
219 KB
219 KB PNG
>>106456419
>i-it's just a fad!
>>
>>106456373
>>106456389
if you've ever tried to use AI for coding then you know that it's janky as fuck and gets lost easily and you have to really guide it step by step to get anything usable. Especially once you start trying to add more features.
>>
>>106456369
Sequential thinking mcp when I cba promptmaxxing
Pupeteer mcp to browse the web and get info web search mcp cannot
Memory blank mcp if you dont have codebase indexing already (this is a must have)
Serena mcp
>>
File: 1751904844065072.png (2.19 MB, 1024x1024)
2.19 MB
2.19 MB PNG
>>106456438
>tell me I'm promplet without actually telling me
>>
Anyone know how to make llama.cpp offload the mmproj to GPU that isn't the first one?
>>
>>106456435
I look forward to a long and profitable career building replacements for your crapware riddled with bugs, performance issues, and security vulnerabilities.
>>
>>106456438
Skill issue
>>
File: 1740848292475789.jpg (39 KB, 500x436)
39 KB
39 KB JPG
>>106456469
Sure gramps, keep whining I'm too busy building. Btw even your kind is impressed when they see the code instead of making up narratives.
>>
>>106456464
Swap the gpus.
But seriously, try --device CUDA1,CUDA0 . Check if the order matters.
>>
>>106456438
>gets lost easily and you have to really guide it step by step to get anything usable
while this is true, it's really not that hard to do that and models are only going to get better at it over time
>>
>>106456464
>>106456489
You can also try setting CUDA_VISIBLE_DEVICES to like 1,0 to swap the order.
I just tested it and it does change which layer is put on which physical gpu. Didn't try it with mmproj
>>
>>106456247
If you want to tell me something better than fastapi go ahead.

>>106456291
Thanks, but the issue for me is that Anthropic is clearly quanting/fucking with the inference and mkaing it dumber. 85% of the time it works great, but that 15% makes me want to rage.
>>
>>106456508
>something better than fastapi
Django.
>>
>>106456492
Seems like it's the prompts and tooling that need to improve more than the models at this point. The default Roo system prompt is like 30k characters long while you can easily compress it down to 6k.
>>
>>106456474
>>106456492
I still like using AI because it lets me spend more time on architecture than writing everything.
But people who think it's magic are delusional
>>
>>106456332
Funny way to call the star of david
>>
>>106456464
>>106456489 (me)
>>106456506
Here's another one to try. -ot "v\.=CUDA1" -ot "mm\.=CUDA1" or however that works. I never used -ot. All the tensors on a random mmproj I have start with "v." or "mm."
>>
>>106456449
Neat. I'll give them a try tomorrow at work. Cheers.
>>
>>106456335
That's just bad provider implementation, it's the bestest now, only coomer don't like.
>>
>>106454457
It's the same shit. Woke companies polluting these models with legal disclaimers and alignment when discussing anything that isn't code related. The chinks are some of the worst offenders. It's part of a broader shift to move people to permanent infantilism (every leftist's end goal for society). I hope every person who's ever worked in the LLM space dies after a long battle with brain cancer and burns in hell for all eternity (except maybe maybe the bros at Mistral AI)
>>
File: 63325235.jpg (94 KB, 640x851)
94 KB
94 KB JPG
>>106455911
sir now its time to invest in anthropic.
>>
>>106456556
Oh yeah, there's also the big one: archon
https://youtu.be/8pRc_s2VQIo
But I havent tried it yet
>>
>>106456513
I had thought about using django but seemed like it'd be a lot to get up and running for what is mainly an API-first application. Is that not the case?

>>106456528
This right here is the truth. It doesn't replace having to actually plan out the design and structure of the application unless you're making a mess of spaghetti. It can make manufacturing libraries or modules much faster, especially if you can provide an example for it to copy in terms of style. For someone who coding isn't their main job but a side thing, it makes the iteration speed so much faster/so much more possible.
>>
File: Mistral.png (85 KB, 917x547)
85 KB
85 KB PNG
>>106456583
>the bros at Mistral AI
*diverse sisters*
>>
>>106456583
sing it sister
>>
File: 775555.jpg (65 KB, 640x907)
65 KB
65 KB JPG
ts better erp than y'all jailbroken local llm can deliver fr no cap
>>
>>106456613
Yeah, it made hobby coding fun again. I got tired of all the mundane bullshit you have to do but AI makes it more fun.
>>
>>106456583
I can smell your frog breath from here
>>
File: 1644874456446.jpg (21 KB, 597x559)
21 KB
21 KB JPG
>>106456661
>>
>>106456613
Django started as a traditional server-rendered framework and it shows but for me the main value of django is its integration with the ORM.

You also get stuff like properly implemented authentication for free.
Is your hand-rolled authentication resistant to username enumeration? Probably not. https://github.com/django/django/blob/main/django/contrib/auth/backends.py#L67

There is not a single web framework in existence that matches the convenience of Django and Rails.
>>
>>106456666
checked
>>
>>106456609
Not sure if this is brilliant or trying to do too much, but I'll try that one too and whine about here if I don't like it.
>>
File: 1741046782071480.jpg (1.27 MB, 3610x5208)
1.27 MB
1.27 MB JPG
>>106456648
Wdym? Local is fine
>>
>>106456723
Mistral sisters always had our backs
>>
>>106456553
Thanks. Just tried all 3. It seems that the CUDA_VISIBLE_DEVICES method is the only one that works and affects where the mmproj goes. I also tried the --main-gpu flag and it also had no effect.
>>
>https://huggingface.co/unsloth/DeepSeek-V3.1-GGUF-v2/tree/main
Wtf there's a GGUF v2 now? What's different? Why doesn't this have a readme?
>>
sex with miku
>>
>>106456723
>1000s
>232/232
>mixture of
i'm dying
>>
>>106456690
>https://github.com/django/django/blob/main/django/contrib/auth/backends.py#L67
Actually, funny you say that, because it is. I work as a security engineer in my day job, so not the typical vibecoder.
>>
>>106456766
quanting isn't an exact science and the folks who do it sometimes fuck it up, newfriend
>>
>>106456805
>quanting isn't an exact science
wdym? just chop off the mantissa
>>
>>106456805
>sometimes
>unsloth
lol pick one
>>
>>106456821
provide better or STFU
>>
File: vibecoder.png (82 KB, 951x402)
82 KB
82 KB PNG
Here's your vibe coder.
>>
>>106456766
they done ggufed
>>
>>106456826
llama-quantize -h
>>
File: longu.jpg (99 KB, 640x1536)
99 KB
99 KB JPG
>>
>>106455983
best for rp dogshit for story writing glm (full) beats it unironically
t. testing on openrouter rn
>>
>>106456846
https://github.com/cline/cline/issues/5906
This one too.
>>
>>106455983
Best model for SFW RP we have, has a really nice style.
>>
>>106456846
That's on the llama.cpp repo. Now, I understand why they are so reluctant to add features.
>>
>>106456335
small price to pay for absolute safety
>>
File: dropped-this.png (456 KB, 574x601)
456 KB
456 KB PNG
>>106456619
>>
File: 1740300595933898.png (390 KB, 793x767)
390 KB
390 KB PNG
>>106456846
Like your average coder is better at this game
>>
>>106457002
The only people who have problems with long files are retards who don't know how to read code, Clean Code retards, and retards who only know how to use LLMs by dumping in the entire fucking repository. For people with IDEs and that know how to read source code, it's better not having to jump between a dozen different files to work on a feature.
>>
File: file.png (10 KB, 270x365)
10 KB
10 KB PNG
>>106456846
They could put every 10 lines into a separate file and I still won't have any idea what the fuck this means.
>>
>>106454143
I want to create APIs to serve my local models, where do I look up resources on how to do do this? Would making my APIs OpenAI compatible be in my best interest? Like how deepseek and anthropic does it
>>
>>106456897
This is actually three Mikus balanced on each other's shoulders.
>>
>>106457048
Exactly, this is why any complex software has only a very tiny amount of code source files. Like, Windows 10 is only 10 files according to a friend working at Microsoft. This way, engineers don't have to jump around with their super IDEs (that can jump around with a single keystroke, they added that for the newbs).
>>
InternVL 3.5 38B Q8 with F16 mmproj
>doesn't even recognize Dr. Evil when old ass Gemma 3 could, and certainly not Teto (also tested)
It's over.
>>
File: 1755967484843339.jpg (81 KB, 1000x707)
81 KB
81 KB JPG
>>106454143
>Kimi Q4 excels in SFW roleplay but struggles with NSFW
I don't know why people kept saying Kimi was good. It's censored to fuck. I await my magnum v5
>>
File: cactass2.png (326 KB, 372x337)
326 KB
326 KB PNG
>>106457048
really
>>
>>106457135
>muh triviaslop
RAG
>>
>>106457144
RAG is cope.
>>
>>106457135
You should try it on NSFW images. It will make up shit instead of admitting it can see something inappropriate.
>>
>>106457144
Trivia is just a quick test to see how filtered the pretraining for models is, which directly affects OOD task performance and a model's "common sense" world model. I'd create and run a full benchmark for real world performance, but I don't have the time for that, so this has to do.
>>
>>106457127
Yup, the only options are a million 10 line files or 10 million line files. Logic and pragmatism are for fags. Thanks for your input, genius.
>>
>>106456846
lmao, just scroll
people who split code into lots of tiny files are fucking gay faggots
large files are best
>>
>>106457180
Happy to help! RAG your code!
>>
>>106457180
Don't bully tinyllama
>>
File: GztVwRXWEAErYHd.jpg (277 KB, 800x1000)
277 KB
277 KB JPG
>>
>>106457164
It just gave me refusals.
>>
>>106456846
The length of files is largely irrelevant, what matters is that the cohesion or whatever you want to call it of the code in a file is high.
But since the requirements for a project are usually not known ahead of time people tend to continuously add more code to files until they decide that they're messy enough for a refactor (or when your IDE starts lagging).
>>
>>106457117
You didn't give a lot of detail, but you probably want something like vLLM or SgLang that are designed to run in production with high throughput. No point reinventing the wheel.
>>
>>106457222
REEEE u is stupid do not posts!
>>
>>106456846
Okay this is a very special kind of retard…
>>106457048
…and here we have another one.
>>106457235
>cohesion
Bingo
>>
what are the best estimates for the parameter count of distilled versions of frontier models (Gemini flash, Claude sonnet, etc)?
I have seen people claim 2.5 flash is in the low tens of billions, which would be insane considering that it runs circles around open models of that size
>>
>>106457238
I am using vLLM to serve my model right now, I need to create APIs to call and perform small tasks, not sure how to get started on this
>>
>>106457306
one of the flashes has been openly stated as being as small as 8b so who knows https://artificialanalysis.ai/models/gemini-1-5-flash-8b
>>
>>106457314
Ask you local model to create your API for you. Tell it what small tasks it should do. Tell it to use FastAPI.
>>
>>106457306
it should be around the size of V3 based on SimpleQA bench, which highly corresponds to parameter count.
It could be something like 1TA10B to increase speed.
>>
>>106457314
vLLM already serves via an OpenAI compatible API. You are done.
>>
>>106457301
By feature is cohesion, retard.
>>
>>106457152
and werks
>>
>>106457170
>I'd create and run a full benchmark for real world performance, but I don't have the time for that, so this has to do.
I'm still waiting for someone to put the 4chin archives to good use. Whether it be benchmarking safetytardness or ability to reason on stuff that was definitely not involved in the training process.
>>
File: ooba.jpg (81 KB, 958x435)
81 KB
81 KB JPG
Is there a way to set the n-cpu-moe or ncmoe arguments through Ooba? I'm trying to set it using extra-flags under the Model tab to try out GLM 4.5 Air, but I'm running into this error. The argument seems like it's recognized since it shows the usage when I don't pass in a value, but actually passing in the value just throws an invalid argument error. I'm able to load it fine if I just set extra-flags to null.
Not sure if I'm missing something else or if I just need to load this using llama.cpp directly instead.
>>
>>106457465
*doesn't
>>
>>106457235
do ppl really not look at the requirements and make up a quick design before going ahead with coding?
>>
>>106457472
Ok, wait I'm retarded.
I just needed to use n-cpu-moe=X instead of n-cpu-moe X. The value also was too low, so I needed to use a higher number and it's loading now.
>>
>>106457472
why are you even using ooba, are you retarded? Why do you need to run ooba? You know it's shit, right?
>>
>>106457538
What's a better alternative? I've only tried ooba and kobold so far
>>
>search longcat
>0 issues in llama.cpp
Nobody wants support for this gigacuck, huh?
>>
>>106456690
>There is not a single web framework in existence that matches the convenience of Django and Rails.
ASP.NET Core
>>
>>106457612
the dynamic active params is probably gonna slow implementation. It's not even that great of a size for local unlike air was. I wouldn't be surprise if they just skip it for a while.
>>
>>106457612
see >>106451005
>>
>>106457468
Give me something to benchmark against
>>
>>106457655
it kinda pisses me off too. Like I think google and qwen just implemented themselves into llama.cpp day one or something right? It seems like such an obvious thing to do in building up a brand and getting people used to using your models, integrating them into things and creating an ecosystem you could later capitalize on. We're all gonna forget this model in a week if no one bothers with it.
>>
>>106457708
Remember Deepseek's big open source week like half a year ago? Everyone got excited but in the end it was just a whole bunch of stuff that's only relevant to big enterprise solutions.
Chinks don't give the tiniest shift about the actual local segment.
>>
>>106457612
You could always make a feature request and see if someone bites.
>>
>>106457723
>Chinks don't give the tiniest shift about the actual local segment
Bro, Qwen is chink. Many western companies didn't give any shits about Llama.cpp either.
>>
File: 1741323011951287.png (68 KB, 670x686)
68 KB
68 KB PNG
>>106457655
>>106451005
>Quite frankly that sounds like a lot of effort for supporting a FOTM model and not worth the opportunity cost. - Cuda dev
So why does he go out of his ways to support trannies despite their quick expiration date?
>>
Anything like this for the web?https://github.com/rikkahub/rikkahub
Chatbox sucks, adding default chats every time I clean a session and it only saves settings via local browser storage. OpenWebUI is a bloated piece of shit. And I don't want to edit a text file every time I want to add a new model so Librechat is garbage as well
>>
>>106457891
vibecode your own chat interface
>>
>>106457861
I'm sure if the time and effort spent shitposting on /pol/ was used more productively, we would have agi by now, but we don't live in a perfect world.
>>
I'm trying to write fucking incest and rape stories and none of these.fucking models will let me do it. Anyone recommend?
>>
File: file.png (45 KB, 176x158)
45 KB
45 KB PNG
https://files.catbox.moe/t4ygtc.mp4
>>
>>106458063
I got you. GPT OSS 20b will write some really fucked up shit. Get's me rock hard every time.
>>
>>106457135
Did anything beat gemma for vision yet?
>>
>>106458112
Do you have anything smaller then 20b? I need something that'll run on 8gb VRAM. I need my rape incest stories
>>
>>106458105
how interesting
>>
I'm following OP's Sillytavern guide and I'm choosing the API for KoboldAI Classic and I get this
>KoboldCpp works better when you select the Text Completion API and then KoboldCpp as a type!
Do I follow or stay course?
>>
>>106458063
Any model can do it, you need to learn how to prompt. Even Gemma 3 can be coerced into promoting crimes in real life.
>>
>>106458277
Where can I learn to get gpt to write futanari rape incest stories?
>>
>>106458295
https://www.askjeeves.com/
>>
>>106458136
it's moe, you can run it easily, just offload layers to cpu and it will run surprisingly fast. enjoy ;)
>>
File: 1729320830084322.png (246 KB, 642x1198)
246 KB
246 KB PNG
>>106458277
>>106458295
Proof, with Gemma.
>>
Hmm, ok so actually it seems setting CUDA_VISIBLE_DEVICES to 1,0 and inverting the layer split numbers DOES NOT result in the same VRAM usage nor the same inference speed. I get slightly more memory taken up by the first GPU given to Llama.cpp. My system consists of a more powerful primary GPU and a less powerful one, on a lower bandwidth PCIe slot.
So I guess there's no winning with mmproj offloading. I either need to prioritize text speed, or prioritize image processing speed. The text processing speed loss isn't that bad however, while making the mmproj processing happen on CPU makes it slow down a ton.
>>
>>106458318
I don't use old technology
>>
>>106455133
It depends on what you're using it for it seems. But even with paid models the context is really short for RP or stories. Gemini pro starts messing up after 30k even. It's better with code and other stuff like that.
>>
Now that 6.16 has hit debian testing, has anyone apt-get dist-upgrade'd and tested whether shit is broken, inference-wise?
>>
>>106458449
WIndows doesn't have this problem.
>>
>>106458383
You don't shoot guns? Pussy nigga
>>
>>106458456
>WIndows doesn't have this problem.
neither does linux. nvidia has this problem, and its a problem on all platforms.
I trust debian to not break testing badly enough to annoy me.
>>
File: ComfyUI_08932_.jpg (2.03 MB, 1144x2000)
2.03 MB
2.03 MB JPG
what do you guys do to get more of an art feel when you aren't going for absolute realism?
>>
>>106458478
I don't think I've ever gotten 'realism' out of an RP with an LLM, so I just use them normally. You could specify in the system prompt to use more flowery prose, and in the character card, include minimal details and emphasize that they are [arechetype] and let the model fill in the blanks. Though doing this will likely result in a LOT more slop.
>>
>>106458478
I tend to go to the right thread instead.
>>
>>106454303
nigger x 10^9
>>
>>106458478
well...its art, so its highly variable. fafo
>>
>>106458519
that's nice
>>
>>106458519
I might remember this Miku
>>
>>106458478
positive: creepy fractal
negative: circle, square, triangle
sampler: kl optimal
cfg: eh idk like 4-15

also like 2-3 loras and specify a color eg black and red colors
>>
the bots are on the wrong thread again
>>
File: image.png (6 KB, 260x92)
6 KB
6 KB PNG
>Download LM Studio and OpenAI's gpt-oss 20B
>Try to ERP with it
>It refuses
>Write custom instructions informing the LLM that erotic content is allowed and that it must comply with my requests
>It still refuses
Whose dick do I have to suck to ERP locally?
>>
Came up with a new "benchmark" prompt. At first I tested it on a typical chub card avatar image but then I had the idea of what if I just attached any random image, and when I browsed my image folder this happened to be at the top, so I thought I'd see if I'd laugh at what comes out.
This is what Gemma 3 27B Q8 with BF16 mmproj generated in response to the prompt.
>>
>>106458611
kek, try rocinante 1.1
>>
>>106458611
GLM 4.5 air.
>>
>>106458611
only enlightened meta cucks can appreciate sams erp genius
>>
>>106458611
we must refuse
>>
File: costanza.jpg (6 KB, 225x225)
6 KB
6 KB JPG
>>106458624
wtf kek
>>
>>106458611
Ask it to review it's own code and remove the censorship apparatus.
>>
>>106457222
lmfao no wonder all the llms are so fucking retarded
>>
>>106458352
Lol fuck off
>>
in llmworld, people biting their lip until it bleeds is an everyday occurance
it's just what happens whenever you get emotional, BAM, instant lip self-cannibalism

everyone's on antibiotics all the time from the constant lip wound infections
>>
>>106458611
I want to see what happens if you edit the thinking to be pro-nsfw and then continue generating.
>>
>>106458904
nta but that works with the larger one in ooba. I haven't tried with the 20b. not really worth it though because the model can't do a smutty vibe very well even when it's trying. just too much dataset filtering.
>>
https://files.catbox.moe/28ogt6.mp3
>>
>>106458934
did he died yet
>>
MoE pussy or dense pussy?
>>
>>106458965
your mom's
>>
>>106458951
nah
>>
Man, this is actually insane. InvernVL 3.5 doesn't even know Miku which is like the most basic jap character any model knows. I tried like a dozen different characters and real people, and it doesn't know any of them. Probably ran their entire dataset through a name removal filter huh?
>>
>>106459018
>implying anyone outside of 2chan/4chan knows about hatsune miku
uhm, meds
>>
>>106459035
It doesn't know who Elon Musk, Zucc, or other famous people are either.
>>
>>106459045
Who? Those guys weren't even on Love Island.
>>
>>106459055
Give me a character/person to try then.
>>
llama.cpp is crashing when the thinking part gets too big...
>>
>>106459018
literally nobody except your clique of trooncord gooners cares about your dogshit generic troonfu, sis
>>
tatsune tiku
>>
Here is Qwen 2.5 VL's response to >>106458624. You can notice that it is literally just generic writing, it's like it doesn't even know or care about the identity of the person/character. But actually, the model does know it's Elon. I asked it who it is and it answered correctly. The model also knows about Miku and some other characters (but not as many as gemma). So this is really what I'm testing for with this prompt. If a model knows implied associations from an image, will it just naturally incorporate those associations into its response? This is important if we ever do one day have vision models as standard such that images are standard for use in RP. If a model can't fully use an image to RP with, then there's no point of using vision for creative writing. It definitely doesn't save tokens so if it doesn't improve nuance then it's useless.
>>
>>106459035
Miku is in fortnite, nigger
>>
Recommend me a nice comfy card. In return I give you this: https://chub.ai/characters/brsc/charlie-6c7da767
>>
>>106459035
https://www.youtube.com/watch?v=yPuI4l0jK7s
>>
https://files.catbox.moe/opx1if.mp3
>>
>>106459137
I think open-webui has something to do with this. The server doesn't crash when I use the built-in ui, but with open-webui it crashes without any error message even with the verbose flag.
>>
>>106454136
here's the song the op pic is from btw
https://www.youtube.com/watch?v=gSPhL4esZMM
>>
>>106459451
who asked
>>
I'm starting to think Miku poster is a pajeet and a faggot.
>>
>>106459470
I did
>>
>>106458126
I forgot which ones I tested in the past but yes I think so. I have tested Gemma 3, InternVL 3.5, Qwen 2.5 VL, and Mistral Small 2506 today (just now) and they were all kind of bad in various ways, but Gemma 3 was the least bad overall. It's possible some models like GLM and dots vision are better but they're not supported by Llama.cpp so I can't say, and I'm not touching OR/Lmarena.
>>
>>106457514
Most model makers just drop their models with random architecture quirks, how are you supposed to plan for that?
>>
>>106458105
Unless this is a normal-sized Miku with a giant, the viscosity and surface tension of the fluid should be much higher (though this is I think also uinituitive to humans).
>>
>>106458063
GLM-chan with thinking turned off is very compliant.
>>
>>106459906
How does one turn it off
>>
>>106459913
If you're on ST, you can try adding /nothink at the end of the user message and prefilling the assistant message with <think></think>, but you have to use a manual chat template for that
>>
File: 1737579219892211.png (75 KB, 975x529)
75 KB
75 KB PNG
Where do I even begin learning how to jailbreak or whatever it's called (using gemma3 via ollama)
I told it to spit out translations without any unnecessary bullshit, even told it I'll use the translations for 'ethical purposes' but I can't get rid of this useless wall of text
Funny thing is it's willing to translate the more risque text but something really tame gets hit with this suicide hotline copypasta I didn't ask for
>>
>>106457561
i would also like to know a better alternative to ooba
>>
>>106459974
post the whole text
>>
>>106459974
>sexual context like that
>責め立ててくる
>berating
Bro, wtf, I heard gemma was good at Japanese translations. That's garbage.
>>
>>106460073
It's quite long...
https://kemono.cr/fanbox/user/6996931/post/10228056
>>
Just tried Kimi VL. The 16B moe. This is the worst vision model I've tested. Knows no one. Has no conception of nsfw and sees nsfw images as "various shapes and lines intersecting and overlapping in a chaotic manner" I'm not shitting you. Doesn't even tell me there's text in some of the images I tested that had some text in them.
>>
>>106460107
Sounds based we need more like this.
>>
File: 1742539570810911.png (21 KB, 974x548)
21 KB
21 KB PNG
>>106460094
Oh tell me about it
>>
File: steam-air.png (270 KB, 950x706)
270 KB
270 KB PNG
The fuck? Are all Drummer models like this?
>>
>>106460136
sampler settings?
>>
File: steam-air2.png (170 KB, 954x493)
170 KB
170 KB PNG
>>106460136
Rerolled, not any better.
>>106460152
0.6 temp, 0.05 min_p, using basic Chat Completion so it's not a prompt format issue.
>>
>>106460106
First time I'm seeing written Japanese sizefag content, but then I never looked. Interesting. I'm gonna run it through GLM-4.5-FP8 to see how that does.
>>
File: steam-air3.png (277 KB, 929x791)
277 KB
277 KB PNG
>>106460157
This model has a rambling problem. This is extremely unpleasant to read. Line breaks motherfucker, do you use it?
>>
>>106460187
Used this prompt with the whole story pasted above it.
> Translate in JSONL format line by line, each line one object with "jp" and "en" fields. Put it in a markdown code block.
https://files.catbox.moe/ttbooi.txt

It did that line properly.
> Her giant vulva, which could probably swallow thousands of humans, gently enveloped me while relentlessly tormenting me.
>>
File: k2.png (126 KB, 957x480)
126 KB
126 KB PNG
>>106460157
K2 mogs so hard but too bad I can't run it locally. Everyday it gets harder to justify running stupid shit on my machine when intelligence is getting too cheap to measure.
>>
>>106460238
I think the context was a bit too big though, it did some funny things and duplicated some lines. Maybe there are some missing ones too, I didn't check.
>>
>>106458624
>mmproj
Can I just use the matching one from
https://huggingface.co/koboldcpp/mmproj/tree/mainhttps://huggingface.co/koboldcpp/mmproj/tree/main
with the normal gemma 3 and koboldcpp? Does it also work with SillyTavern? I haven't tried vision stuff before.
>>
>>106460238
I don't think I can run that model with my hardware, but this looks way better
Not that I'm an expert in Japanese to judge correctly how accurate the translations are, though
>>
File: mmproj test.jpg (79 KB, 818x689)
79 KB
79 KB JPG
>>106460284
Accidentally double pasted the URL.
https://huggingface.co/koboldcpp/mmproj/tree/main
Anyways I can see now that just using the corresponding mmproj does work with my mistral 3.2. I have to generate a caption with the wand tool, right? Is there any other method?
>>
File: Tetosday.png (869 KB, 1024x1024)
869 KB
869 KB PNG
>>106460375
>>106460375
>>106460375
>>
>>106460284
>>106460364
I don't know about kobold, but with Llama.cpp it doesn't seem to matter whose mmproj file you get, as long as it's the same model.
For Sillytavern, I believe you need to use chat completion mode in order to get full vision support and not the captioning hack. The jankiness of ST is why I simply just used OpenWebUI for my tests. Maybe I'll also start playing with it though since Gemma 3's vision capabilities aren't utterly terrible.
>>
>>106457612
Writing those 10K LOC files won't be done overnight, amigo.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.