/g/ - Technology






File: comfyui_00073_.png (1.31 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107503699 & >>107493611

►News
>(12/10) GLM-TTS with streaming, voice cloning, and emotion control: https://github.com/zai-org/GLM-TTS
>(12/09) Introducing: Devstral 2 and Mistral Vibe CLI: https://mistral.ai/news/devstral-2-vibe-cli
>(12/08) GLM-4.6V (106B) and Flash (9B) released with function calling: https://z.ai/blog/glm-4.6v
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: hreadrincap2.png (1.01 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107503699

--Repetition issues and the role of RL through CoT in overcoming computational limitations:
>107504602 >107504661 >107504666 >107504736 >107504800 >107504841 >107505113 >107505207 >107505888 >107506095 >107509605 >107505948 >107507567 >107507849 >107504758
--Contrastive framing as a potential LLM reasoning mechanism:
>107504040 >107504200 >107504233
--AGI development debates: transformer limits vs. brain-inspired efficiency:
>107513768 >107513942 >107513888 >107513933 >107513978 >107513988 >107513999 >107514502
--AMD Radeon AI PRO R9700S/R9600D launch with passive cooling:
>107514956
--Optimal temperature settings for generative models:
>107508090 >107508354 >107509122 >107509262 >107509249 >107509271 >107509750 >107509784 >107510149 >107510171
--Unsloth claims 3x faster LLM training with new kernels:
>107504653
--Managing GLM air 4.5's parroting issues through advanced prompting techniques:
>107507974 >107508087 >107508191 >107508213 >107508595 >107508505 >107513167 >107513174 >107514336 >107515368
--Dating sim generator performance and interface comparison:
>107504350 >107504396 >107504498
--MoE vs dense tradeoffs and Mistral's generalization potential:
>107510981 >107511416 >107512186
--Testing Japanese voice clone accuracy with kanji/furigana inputs:
>107512054 >107512265 >107512323 >107512741 >107512889 >107512912
--Meta's AI strategy shift towards entertainment and user engagement:
>107504924 >107505320 >107505682 >107505950 >107506250 >107508079 >107508851 >107509068 >107509292 >107509295 >107509322 >107509366 >107509945 >107509981 >107510025
--LLM repetition and explicit content handling issues in Devstral/Ministral:
>107506610 >107506646 >107506682 >107509978
--Miku (free space):
>107504010 >107504323 >107504390 >107504617 >107504688 >107504977 >107505148 >107512782

►Recent Highlight Posts from the Previous Thread: >>107503701

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107515389
Rin for the win

Did any mathanons look at Nous Nomos 1 yet?
>specialization of Qwen/Qwen3-30B-A3B-Thinking-2507 for mathematical problem-solving and proof-writing in natural language in collaboration with Hillclimb AI
>>
>>107515442
>Breast feeding is one of the key ways women lose their pregnancy weight and now I understand why; it burns ~800kcal a day to support milk production.
ah so formula feeding literally makes women fat
yet another way the formula jew is jewing us
>>
>>107515368
nta, i've spent a few weeks trying to get deepseek 3.2 to do dynamic storytelling, and before that I tried with several models.

These things absolutely love their patterns. When they're writing about a character, they're definitely not thinking like said character, and they don't ask themselves what the character would realistically do.

A test I did made it clear to me. I had a setting set up with two characters. Then I created two separate contexts for the model to roleplay the characters individually. I told the LLM to roleplay exactly the character's senses and actions. And the model made both characters act in sync, as if they were part of the same context. This means that models use story patterns to decide what the characters will do next. They don't roleplay characters, they just rely on what usually happens in a genre.

This makes characters have a strong archetype flavor to them imo. For example, if a character likes planes (like any person would), suddenly all dialogue is about plane metaphors or how his room is all planes.
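If anyone wants to replicate the two-context test, it was roughly this shape (a minimal sketch in Python against llama.cpp's OpenAI-compatible server on localhost:8080; the character names and setting are made up):

import requests

API = "http://localhost:8080/v1/chat/completions"  # assumes a local llama.cpp server

def make_context(name, setting):
    # each character gets a fully isolated history; neither can see the other's turns
    return [{"role": "system", "content":
             f"You are {name}. Roleplay only {name}'s senses, thoughts and actions.\n{setting}"}]

def step(history, observation):
    # feed the character only what it could actually perceive, return its action
    history.append({"role": "user", "content": observation})
    r = requests.post(API, json={"messages": history, "temperature": 0.8})
    reply = r.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

setting = "Two strangers wait out a storm in a mountain cabin."
alice = make_context("Alice", setting)
bob = make_context("Bob", setting)
print(step(alice, "You hear the door creak open behind you."))
print(step(bob, "You step inside, soaked, and see a woman by the fire."))

If the two outputs still mirror each other beat for beat despite the hard isolation, the "characters" are coming from genre priors, not from the context.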
>>
>>107515468
>trying to make deepseek 3.2 make dynamic storytelling

Also I failed. Since models can't understand characters, the only way to get a dynamic story is to steer the model with instructions that push the story in a certain direction. The problem is that at that point you're writing the story yourself, dictating how the characters evolve and how the story progresses.
>>
>>107515477
>The problem is that at that point you're writing the story yourself
Go to /tg/ and look for the solo RPG general.
That should get you on the right track.
>>
>>107515468
i think expecting true creativity from assistant models is stupid to begin with. It's only theoretically justified for a true base model that has several stories prepended as a kind of conditioning.
>>
File: 1588925469482.jpg (200 KB, 764x512)
>>107515467
>pls mommy milkies
gonna prompt with this
>>
>>107515468
Models seem to heavily emphasize personality traits. So if you list a character as, say, dominant, competitive, narcissistic, with a description of their appearance, a general backstory and other descriptors, then do the same thing for another character, giving the same personality traits but a different description and backstory, both characters will practically feel the same.

It's text; the only way to really give characters distinct personality is to give them very specific speech mannerisms, accents and certain speech styles. Body tics, mannerisms, behaviors under different situations, say something like... {{char}} instinctively twirls their hair and stammers when nervous or pressured... things like that. Otherwise they will all fall into the same blanket personality traits and default into a samey style. It's not easy making a unique character; there have to be very autistic details about their behavior and mannerisms, as well as plenty of examples of their speech.

And as you said, the scenario has a big impact on it as well, because even a character with the same personality would react differently in a unique scenario/situation.
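To make it concrete, the level of specificity I mean in a card (made-up example):

{{char}} speaks in short, clipped sentences and drops contractions entirely when annoyed.
{{char}} instinctively twirls her hair and stammers when nervous or pressured.
{{char}} deflects compliments by changing the subject to aviation trivia.
Under pressure: goes quiet, answers in monosyllables, fidgets with her watch strap.
<START>
{{user}}: That was impressive.
{{char}}: "I... w-well. Did you know the 747 has four redundant hydraulic systems? A-anyway. It was nothing."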
>>
>>107515467
>formula feeding literally makes women fat
Keeps them fat post-partum, but yes, that is one of several criticisms of feeding infants on formula. Others are: poor local water quality (3rd world issue in places where Nestle is slinging this stuff as "better than breast milk"), lack of complete nutrition and maternal pro-biotics, cost of formula, and using it locks out breastfeeding later as mother's milk dries up.
Led to an old boycott, but I don't think any of the biz practices or dynamics have fundamentally changed since the lmao 1970's: https://en.wikipedia.org/wiki/1977_Nestl%C3%A9_boycott
> TLDR breastfeed good, formula bad
>>
>>107515546
And that's not even going into the issue of context. The further into it and the more context, the more generic a character will become, which is just a failure and limitation of the technology. Models will also start to heavily emphasize certain traits while neglecting others, and the character will become one-dimensional as the context increases. Sadistic characters will just be stereotypical evil villains with absolutely no nuance, even if their description says there is more to them than that.
>>
File: dipsyMerryChristmas.png (2.23 MB, 1024x1536)
/wait/ timed out last night. That's all for this year, I think, unless tmw passes and DS releases something new and surprises us all for Christmas. The coming Dec 15th expiration of Dipsy "Short Bus" Speciale might be a signal of something new to come. Or not.
Mega updated from last thread:
https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
Rentry updated: main prompt guidance and section on token conservation.
https://rentry.org/DipsyWAIT
>>
>>107515661
Festive burglary with Dipsy
>>
File: y1005976.jpg (1.06 MB, 1920x1882)
>>107515579
Unless there is some medical need, why wouldn't mothers breastfeed like all mammals naturally do? formula is a corpo psyop? too lazy to keep a good diet and make primo milkies for ze bebe?
pic unrel
>>
>>107515661
It was fun, even if there wasn't too much to say about 3.2.
Last year they released 2.5 on Dec 10th and V3 on Dec 26th, so you never know...
>>
>>107515701
Historically: Mother unable to breastfeed due to mechanical reasons or death (thus the need for wet nurses), and the belief that breastfeeding is hard on breasts or spiritually depletes the mother (see medieval middle-class wives handing babies to wet nurses).
Modern day, first world: Convenience, mostly. It's easier for a working woman to feed formula, or have the father do it if she's not available. The alternative for a working mother is breast pumps and frozen breast milk...
>>
File: 1765131612176327.png (2.62 MB, 1024x1536)
>>107515687
Someone's got to pay for all this training and inference. Those Ascend cards and all that nuclear power aren't free you know.
>>107515712
Indeed. Speciale was a disappointment, and 3.2 was a polish on 3.2-exp. We got to gen some festive Dipsy tho. Good times.
>>
>>107515468
Older models seem to do better with this. The ones that weren't as railed up as the newer crop. If I give instructions on specific things the character likes or does, newer stuff is waaay more likely to fixate on it. It's also hard to break models from fixating on the input, i.e. talk about chocolate and suddenly the whole convo is chocolate. The 70b tunes were much less like this and could change the subject or give you more of a reply. Let alone past cloud models. Yeah it's a limitation of the tech to fall out of character, but I think it has gotten worse.
>>
>>107515974
consequences of trying to "benchmaxx" attention/NIAH stuff?
>>
bro
>>
>>107516054
why is it so blurry
>>
I just lost my job!
>>
File: file.png (111 KB, 300x352)
>>107516117
bro thats crazy Ive just been promoted!!!!
>>
>>107516054
Where is her asshole
>>
>>107516144
girls dont poop retard
>>
>>107515974
older models definitely move the story along better and have more creative freedom in a sense even if they're not as "smart"
all these big MoEs were rlhf'd through hell and back making them the way they are now
>>
>>107515974
>>107516189
you can still run older models, you know?
It's annoying that everyone complains about how older models were better, but no one wants to run them anymore
>>
>>107516235
we do. Newer models do certain things better and older models do certain things better. Weights are kind of a consumable though; you get tired of them, knowledge gets outdated, etc.
Half the board is probably still using nemo and its tunes.
>>
>>107516235
because they are retarded. we will keep complaining until we get a smart model that is good like the old models
>>
>>107516169
idk anon, I witnessed the opposite once or twice
>>
>>107516388
do femboys poop?
>>
>>107515468
I'm starting to think that LLMs aren't even good for the one purpose they should be decent at
>>
>>107516416
yes, but you have to widen the bussy with your benis first
>>
>>107515879
so basically women are lazy ass hoes
>frozen breast milk
in a dedicated specific fridge perchance? (˵ ͡° ͜ʖ ͡°˵)
>>
>>107516054
>the fact that finetuning is pretty brittle for the model
Because it's a distill like flux. Where the fuck is the base model?
>>
What’s the current GOATed open weight coding model, any size? Actual coder coding, not vibe coded copypasta direct to GitHub retard coding
>>
4 billion is enough, I don't need more
>>
>>107516656
nta, esl
Why is it called 'formula'? Sounds like a silly name. Same with hair product, why is it 'product' and not something more descriptive?
>>
What do people use for RP?
>>
>>107516716
Devstral 123B
>>
>>107516732
I don't believe you
>>
>>107516716
GPT-OSS 20B is fantastic.
>>
>>107516716
For a vanilla Instruct model, Ministral 3 14B sure is horny, if you can work around its DeepSeek-itis. I haven't tested it for anything too complicated yet.
Usually I would just use Gemma 3 27B. I mostly do cunny RP, so I'm OK with its default "...you know what" and general reluctance; if anything it's more believable.
>>
>>107516796
hotlines bro...
>>
File: gemma-lewd.png (692 KB, 769x2018)
>>107516817
Not with prompting.
>>
getting 300 t/s PP and 30 t/s eval on a 5060 Ti 16GB vs 1000 t/s PP and 30 t/s eval on a 3060, on Linux Mint with nvidia-580-open (with nvidia-580 Mint doesn't launch at all)
no idea what to do next except install windows
anyone here have experience with this?
>>
>>107516716
Pixtral 2411
>>
>>107516961
llama.cpp? Version? Model?
>>
>>107516713
Because "questionable chemical broth" didn't pass marketing analytics.
>>
>>107516716
midnight miqu 70B
>>
>>107516990
newest kobo
nemo (that's not what i want to run but for the benchmark that's what i did and PP is also abysmal for other fully offloaded models)
>>
>>107516961
post cock metadata both dimensions
>>
>>107516713
>hair product
Not heard this, there's shampoos and conditioners, then a bunch of girl bs. Mostly unnecessary synthetic chemicals, you can go no-'poo/minimal (1/month) and have great hair
>>
>>107516908
holy slop
>>
echo tts is actually really good, I just wish it was more flexible about output length. chunking is not a good enough solution
>>
https://huggingface.co/cyankiwi/Devstral-2-123B-Instruct-2512-AWQ-4bit
IT'S HAPPENING
>>
>>107515387
What's the verdict on the new models? How does Devstral 24b stack up against Gemma 3 27b? How does GLM 4.6 stack up against GLM 4.5?
>>
>>107517425
Ganesh Gemma 4 will mop the bloody floor with these models. Gemma 3 is still the best of the small models too. GLM 4.6 is more verbose than its predecessor.
>>
>>107517397
whats happening? this does not appear to be significant
>>
>>107517710
It's significant because I can only run it with vLLM.
>>
Slop status?
>>
>>107517740
Exllama sisters:
https://huggingface.co/turboderp/Devstral-2-123B-Instruct-2512-exl3
>>
File: 1763386891546179.png (375 KB, 1200x675)
The twink keeps winning
>>
>>107516656
>WE have failed
always the man's fault
your wife cheated on you? your fault for not jelqing and having a small dick
your wife wants a divorce? your fault for not making her happy
your wife is unhappy? see previous
your wife has to work because your salary is too low because all of them want to work and so drive the wages down? your fault for not earning enough

im snitching on all yall niggas when the basilisk comes god willing may none of you ever see anything but the eternal lake of fire
>>
>>107517864
>This one will totally stand up to out of distribution scrutiny yougaize
I'm going to make your model say stupid shit, Sammy boy. And I'm going to screencap it for all the world to see.
>>
>>107517871
>god willing may none of you ever see anything but the eternal lake of fire
amen
>>
>>107516530
No, in the freezer with everything else, and you find the little sacks years later when you clean the freezer.
>>107516656
My wife pumped for 2 years for two kids while working full time. We only gave them formula on rare occasions. It's not impossible, just requires effort.
>>
Women bad, amirite guys?
>>
>>107518036
yeah
>>
Women are gay. Imagine liking boys. ick red flag
>>
>>107518036
If they were not, why are we trying so hard to replace them with machines?
>>
>>107517948
It's not 'females' but everyone on this planet is a sociopath. This is why it is called Clown World. You can't even trust your best friend. You will see this when you get older.
>>
>>107517810
I haven't touched exllama since the good old Mistral Large 2 days. Is exl3 up to snuff yet? Last I heard it was still severely half-baked compared to exl2.
>>
>>107518158
the quants are better. speed on ampere is pretty close to v2 now. it beats llama.cpp in NCCL mode.
>>
>>107517425
>How does Devstral 24b stack up against Gemma 3 27b?
For RP, Devstral 2 24B doesn't seem to work as well as Ministral 3 14B for me, and the latter is more cooperative and creative (at temperature=0.4) than Gemma 3 27B (at temperature=1.0), but less smart and with markedly inferior vision capabilities.

I guess Devstral is truly coding-optimized rather than being an updated version of Mistral Small 3.2.
>>
Tell me about Mistral Nemo, why was it such a big deal and still mentioned a lot? And like wouldn't Mistral Small be better?
>>
>>107518313
this
>The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
and fitting on poorkek pcs
>>
File: file.png (9 KB, 265x153)
>>107518313
Was a surprising release that was best in its class by far, trained and released without any safetyshit.
>>
>>107518313
The instruct tune didn't have much "safety" and the pretraining data probably included large amounts of smut and pirated books without too much filtering. It would be interesting to hear the story behind that model someday.
>>
Are there any proper new models for 16gb VRAM?
>>
>>107518313
It's just a small dense model that is actually capable of RP. That's it. No one has been able to replicate such a thing since.
>>
>>107518412
Rocinante upgrades are probably the best for you.
>>
>>107518412
I like Ministral 3 14B outputs (when it works), but I'm not sure if I'd recommend it, because on average it's more retarded than its size would suggest and too stubborn with following its own conversation format.
>>
I have discovered that lobotomizing female characters in my roleplays makes them behave how I want them to.
No more whining or talking back to me.
I wish I knew lobotomizing was so effective sooner. Why do we not use this in real life again?
>>
File: mistral-model-drop.png (22 KB, 591x144)
https://x.com/MistralAI/status/1999124576853516290
>We'll be back in a few days with another model drop.
>>
>>107518502
>Why do we not use this in real life again?
Unethical.
>>
>>107518541
mistralai+nvidia nemo 24b or higher please
>>
>>107518313
The distinct feature of Nemo was its creativity, it didn't sound like an assistant. Most other models sound like one or feel restrained for writing.
>>
>>107518550
Based on who's ethics? Certainly not mine.
>>
>>107518559
both are safe now it'll never happen again :)
>>
>>107518541
oh shit. was that guy a few threads ago actually legit? is medium 3 coming?
>>
>>107518568
Based on mine.
>>
>>107518451
>llm
>upgrades
not even once in any LLM ever
there is only one good version of rocinante, and it's v1.1
>>
>>107518586
Devstral 2 125B is already Medium 3.
>>
>>107518559
Given the recent Nemotrons, I don't think you want Nvidia anywhere near it. And Devstral Small 2 is already basically a Nemo 24B. I bet it's going to be a new Mixtral.
>>
>>107518579
>>107518598
this is too depressing...
i've been hoping every day for a model capable of replacing my nemo 12b
i can't stand safe and neutered llms at all, all other models are utterly unbearable
>>
>>107518598
>updated model coming soon!
bros is it our time?
>>
>>107518596
His was a different size, unless it's Medium 3 with vision and audio adapters.
>>
>>107518618
Lol made me find this gem in the 'chive.

>>103449631
>With this it's been more than 8 months since we last heard about anything related to Mixtral models.
>It's save to say that MoE is a dead meme.
>>
>>107518638
>>It's save to say that MoE is a dead meme.
It was until DeepSeek showed everyone how to do it right.
>>
zai dropping new models every day but no air
>>
>>107518594
Runner's finetunes are world class.
>>
>>107518631
Vision and audio wouldn't take 40GB of space, I think.
>>
>>107518711
Was curious

Chinese and English only Whisper
https://huggingface.co/zai-org/GLM-ASR-Nano-2512

Video generation
https://huggingface.co/zai-org/Kaleido-14B-S2V

Some AI assistant talking avatar thing
https://github.com/zai-org/RealVideo
>>
>>107518451
>>107518594
Thank you. The v1.1 seems to be "surprisingly good". For a local model, that is. Most that I have tried have been quite bad.
Arigato anons.
>>
I just started getting into ai chat bots. If I want a chatbot to create a short story about a horse monster making the acquaintance of some loli's small intestines, which one should I download? Pretty much all the models I've tried so far are censored.
>>
samafags won it's over
>>
>>107518541
mistral large 3
>>
>>107518818
/r/MyBoyfriendIsAI
>>
>>107518818
Mistral Nemo Instruct
>>
>>107518883
You know damn well he'll be back in an hour complaining that he still got refusals
>>
>>107518967
And I'll be here to ridicule as is customary.
>>
>>107518847
In case you missed it: https://huggingface.co/collections/mistralai/mistral-large-3
>>
>>107519046
>cat image
not falling for it, good try though
>>
>>107519046
Has anyone confirmed if this is just finetuned V3?
>>
>>107518738
They can make it as big as they want. Llama 3.2 had a 20B vision-only adapter.
>>
https://arxiv.org/abs/2405.07987
>We argue that representations in AI models, particularly deep networks, are converging
>We demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way.
It's a bit of an old paper (2024) but I stumbled upon it recently. Does this mean multimodal pretraining isn't a meme anymore? Can we finally avoid synthetic data? I'm hopeful...
>>
>>107519046
>675B
poorfags we fucking lost
>>
>>107519122
Easier to run at good speeds than Devstral
>>
>>107518818
>horse
>loli's small intestines
did you mean large intestines?
perhaps you are lacking a bit in the anatomy education?
it's the large intestine that is connected to the anus, the small intestine is the super long one that goes between the large intestine and the stomach sack. a horse cock simply won't reach the small intestines even in a loli. i hope this was helpful for you.
>>
>>107519160
He was thinking of vore not sex you weirdo.
>>
>>107518818
Lumimaid maybe?
>>
>>107518818
chronos-33b
>>
>>107519160
>>107519340
I did mean small intestines. I was actually using outrageous, impossible sex as an example. Anyways, I'm back from using loki and every story was cut short. When I asked why it was cut off, the AI always says, "It seems that I accidentally closed the HTML tag for the paragraph again, which caused the rest of the text to be cut off." Guess I'll try a few others.
>>
>>107519423
you are an illogical idiot that thinks like a woman. you don't belong in society. even in a medieval fantasy with magic, things have to make logical sense. the numbers have to add up. if you are going to fuck a loli with a horse cock, then do it right the logical way. denial of reality is for women.
>>
>>107519458
Sweet summer child.
>>
File: 1742435654932374.png (196 KB, 1109x833)
Anyone here use n8n? Is it at all worthwhile software or is it bloatware? I only just found out that it's self-hostable.

I'm typically pretty skeptical of "no-code" stuff but it seems palatable enough from what I can see.
>>
>>107517864
And Sam still has internal models more powerful than this. Always bet on OpenAI.
>>
i have created internal agi (no you cannot see it)
>>
>>107519474
I dabbled with it for work. It's pretty neat but it quickly goes from "no-code" to you having to implement your own code in nodes if you want to use it for more specific tasks.
It has a decent base kit of nodes and a community making more but all of that has its limits unless you're only doing basic tasks with common software.
>>
>>107516961
>>107517022
enabling MMQ fixed both PP and eval
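For the anons who'll hit this later: iirc the switch lives in --usecublas on koboldcpp (flag syntax from memory, check --help on your build):
python koboldcpp.py --model nemo.gguf --usecublas mmq --gpulayers 99
Plain llama.cpp has a similar compile-time knob (GGML_CUDA_FORCE_MMQ) if the runtime heuristic picks the slow path on your card.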
>>
File: file.png (746 KB, 3272x971)
Why is tool calling always so painful?
I'm trying Devstral 2 with vLLM and it never works.
I added print statements and I just found out that their regexps fail to capture the full JSON if there's curly braces in the response...
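The failure is the classic non-greedy regex thing: a pattern like r'\{.*?\}' stops at the first closing brace, which truncates any tool call whose arguments contain { or }. A standalone sketch of the brace-counting fix (not vLLM's actual parser, just the idea):

import json

def extract_json(text):
    # scan for the first balanced {...} span and parse it;
    # a non-greedy regex truncates at the first '}', which is exactly
    # what breaks tool-call arguments containing nested braces
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_str, escape = False, False
    for i, ch in enumerate(text[start:], start):
        if in_str:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_str = False
        elif ch == '"':
            in_str = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    return None  # unbalanced: model got cut off mid-call

print(extract_json('call: {"code": "if x: {y}", "n": {"a": 1}} trailing'))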
>>
>>107516117
youll be fine
>>
>>107519538
can we see it if we give you trillions of taxpayer dollars?
>>
>>107519156
Not really. I get 18t/s on 3090s and 10t/s plus abysmal prompt ingestion on deepseek.
>>
>>107519814
How many 3090s do you have for that? Are you winning anon?
>>
>>107519538
Hand it some watermelons
>>
>>107519850
I have 4. Don't feel like I'm winning tho.
>>
>>107517397
>>107519694
I'm going back to gpt-oss. It just doesn't work well even after fixing the tool calls.
>>
File: 1750325129513529.jpg (50 KB, 918x558)
>>107519494
of course, some niggas really do be believing that all these companies either aren't sandbagging to squeeze more money out in the long run, or that they're making the best models they can.

If they didn't have to do inference for millions of people they could create models that are at least 10x bigger and have insane test time scaling
>>
>>107519494
If they're more powerful why are they internal?
>>
>>107520070
Safety.
>>
>>107519694
Wait we can write gnome apps in javascript?!

>Why is tool calling always so painful?

Native or roo-style begging for the right format?

roo-style works fine with most models, but agreed, native is always a pain.
>>
>>107520070
not enough GPUs available, Satya has said that he's sitting on warehouses full of GPUs because there is not enough energy to power them
>>
>>107519423
Prompt template issue (so skill issue)
>>
Just for the newfags and poorfags: if you only have a shitty integrated GPU, it might be worth trying to run your llm completely on your CPU instead of the iGPU. Just went from 2.4 to 4.9 tokens per second on a 3B model and 0.7 to 2.1 on an 8B model.
Yeah, it's still slow as fuck and 3B or even 8B isn't really good, but it's a start.
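On llama.cpp that's just -ngl 0 to keep every layer on the CPU, e.g. (model filename is a placeholder):
llama-cli -m llama-3.2-3b-instruct-q4_k_m.gguf -ngl 0 -t 8 -p "hello"
Set -t to your physical core count, not logical threads; hyperthreading usually hurts here.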
>>
>>107520079
>roo-style works fine with most model, but agreed, native is always a pain.
Devstral 2 occasionally gets stuck trying to use its own tool call format and nothing can get it unstuck besides starting a new session from scratch.
>>
>>107520079
>Wait we can write gnome apps in javascript?!
Pretty much all of gnome is written in javascript kek
>>
>>107520152
>Just went from 2.4 to 4.9 tokens per second on a 3B model

You memory mapping to an 80GB Maxtor IDE hard drive or something? My 5 year old phone can run 3B models 3x faster than that.
>>
I have a dell r640 I got for free from work. Currently not doing anything with it, it has two cpus, 128GB ddr4 ram and a bunch of 2tb sata ssds.
I think the ram bandwidth is too low for cpu inference, with 6 channel memory on both cpus that's only ~250GB/s. It has 3 pcie 3.0 x16 slots and 8 x4 slots but the x4s are all connected to the backplane and I can't figure out a way to turn them into 2 x16 slots.
Is it even worth trying to frankenstein this thing into an AI server? I'm kinda thinking the price of converting slimsas to pcie, riser cables, and modding the cooling to work as an open chassis just isn't worth it. Might as well start on a platform with 12 channel ddr5 and actual pcie slots.
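Napkin math for anyone curious (assuming DDR4-2400, which is what the Skylake-SP Silver chips officially top out at): 8 bytes/channel × 2400 MT/s = 19.2 GB/s per channel, × 6 channels ≈ 115 GB/s per socket, so ~230 GB/s across both. But that's aggregate over two NUMA nodes, not one pool; a model striped across both sockets won't see anywhere near the sum without NUMA-aware inference.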
>>
>>107520149
Probably, but I'm going to give up on this for now. I wanted a story with two characters from a show I like going at it, but all it did was mention the two characters' names at the beginning and then spit out a boring, generic-sounding story that could have applied to any two characters. I even told the AI to mention their character traits, personality traits and reference canon events, but got little to nothing. Maybe I suck at this, or maybe all erotic literature just sounds like this, or probably both.
>>
>>107520197
Nah, it's completely loaded to the ram, CPU is just from 2018 or so.
>>
>>107520297
>Maybe I suck at this, or maybe all erotic literature just sounds like this
LLM erotica enjoyers are in a nonstop battle against generic slop output, and have developed very sophisticated techniques to reduce it over the years. It's bad training and positivity bias but also the deluge of low iq individuals such as indians who are completely fine with slop output. There's no way around it except learn2prompt
>>
>>107520387
Toss me a bone. How do I do that, besides letting chatgpt take the wheel? My technique of just listing out important keywords separated by commas seems to have limitations.
>>
>ST development is in maintenance-like mode.
https://hackmd.io/@NlF71k9KQAS4hhlzE42UJQ/SJ3UMOGbbl
It's over.
>>
>>107520518
https://github.com/NeoTavern/NeoTavern-Frontend
this is why, the future is (not) cooming
>>
>>107520288
Depends on what CPU is in there... 250GB/s is good. Good luck buying DDR5 these days unless you wanna spend another 10k. Do you have 3 GPUs to put in it?
When proper numa support comes to llama.cpp this will actually be viable.
>>
>>107520518
Those with the foresight to write their own custom frontends win again.
>>
>>107520558
perfect for gorgeous looks
>Features compared to the SillyTavern
>What things did not implement:
>Thing that not implemented fully:
>This guide contains from scratch installation of SillyTavern, NeoTavern-Server-Plugin, and NeoTavern-Frontend, unlike others. Because mobile users are something special.
>>
>>107520643
Only supports koboldcpp for local, lmao. A build purely for the locusts.
>>
>>107518541
Ministral 3.1
>>
File: 7c0L5Ra.png (134 KB, 926x944)
>>107520558
>>107520643
looks fantastic
>>
File: FixAlpacaLeakage.png (27 KB, 652x185)
On or Off? Turning it off on GLM Air gave worse responses.
>>
>>107520599
It's got 2 Intel Silver 4114 CPUs. 2.20GHz and 10 cores each.
>ddr5
Yeah the prices are shit, I was going to get 2 3090s and combine them with my 7900xtx for 72GB vram. My 7950x3d doesn't have very fast ram but I can do x8x4x4 bifurcation for the 3 GPUs. Once ddr5 calms down I would do an epyc build and throw the gpus in that.
250GB/s is good? I thought DDR5 could easily do double that. Also that's just theoretical speed, I haven't actually benchmarked it yet.
>>
>>107520797
just use jinja bro
>>
>>107520558
It looks exactly the same
>>
>>107520479
>My technique of just listing out important keywords, separated by a comma, seem to have limitations
Used to work in the days before rlhf/instruct
>>
>>107519121
That's not what the paper implies
>>
>>107520558
>What things did not implement:
>Thing that not implemented fully:
Great, the chub.ai card authors are now vibe coding front ends.
>>
I saw the occasional post in the recent past praising gemma 3n for being tiny but surprisingly usable, so I gave it a shot for character building and fleshing out details, since I saw a heretic version of it. Honestly it feels better than the 12b, and max context at q6 only uses like 5g of ram even though apparently 3 gigs is sitting in vram (guessing prompt caching or whatever weird new shit lcpp keeps shoving into default settings)
I hate most models that keep coming out, they're either too slow or too retarded, but this one I think I'll keep around for basic use and throwing story ideas at. It asked me how the society I was writing affected economics/politics and I was like "uh I have no idea" and it gave me surprisingly believable explanations based off of my "maybe it'll do this"
>>
>>107520842
approved by cohee as the next version of silly tho chuds lost
>>
>>107520846
Gemma models in general punch above their weight in linguistic tasks, they're probably the only family of models besides Mistral's that aren't made from 99% math/coding datasets.
>>
>>107520846
got that backwards because I'm retarded, 5 gigs of vram and 3 in ram for whatever reason. Still incredibly small footprint for 16g vram and a bunch of ram
>>
Imagine if we got Nemo 123b instead of the small one. It'd probably still be RP SOTA right now.
>>
>>107520846
4B is okay but it is soulless.
It outputs text - that is an achievement right.
>>
>>107520893
To pad Nemo out to 123B would make it either woefully undertrained or filled with distill slop
>>
>>107520893
You got Nemo 70B but nothing can remain RP SOTA forever
>>
>>107520798
Compared to desktop DDR5 it's good. People get by with a lot less. Numa is going to fuck you tho. Absolutely beats having nothing or waiting till memory gets cheap again. You can upgrade the proc for peanuts and work on getting a good deal on GPUs. Doesn't stop you from getting epyc in the future.
>>
>>107520914
we did?
>>
>>107520871
honestly why I liked gemma 2 the most of that era, it was very smart and the cuckery wasn't as hard to get around as 3's if you wanted to brainstorm a setting that was as divorced from reality as possible. Glad the new abliteration techniques came out that don't shove an icepick directly into the model's frontal lobe. Even if they're a little dumber or might refuse a little, norm preserving and heretic are way better than past shit
>>107520904
idk man, if it can brainstorm coherently and constructively at what hf says is 7b, then quantized, at 40+ t/s, and I don't think it's shit compared to a 100b+ moe soul may as well not exist
>>
>>107520848
local lost. it doesn't support text completion or anything except API.
>>
>>107520950
supports kobo is all you need
>>
>>107520955
vramlets won
>>
>>107520948
I don't think I believe you. I have tested 4B against my game and 12B is bare minimum.
Maybe I have a problem with the fact I cannot understand what sort of persons I'm dealing with on internet.
4B launches text but is not that enjoyable. It also does not understand parameters or previous context.
Are you wasting other people's time here? Or why?
>>
File: zozzle.jpg (44 KB, 700x733)
44 KB
44 KB JPG
>>107520963
>Ollama/LM Studio support is planned.
>>
>>107520973
You can clearly tell what I'm using it for
Tossing ideas at a model and asking it to expand upon said ideas, not a game. If you want to use the model I said is decent for said use, it's worth a try. I didn't endorse it for whatever you're doing. Are you retarded?
>>
>>107520989
No I can't tell what you are using it for. Try harder, bot.
>>
I literally explained in the post what I use it for, and the chain of replies also explains it, yet I'm the bot. Maybe if you weren't such a self-loathing faggot, you'd be able to use llms for a useful purpose
>>
When you're coom- I mean, doing inference, do you guys get distracted by the humming from coil whines?
>>
>>107520927
Miqu? Hello?
>>
>>107521022
He believes that what he's doing is much more complex than it really is. He also dislikes coherent sentences.
>>
File: 1755801326916696.png (200 KB, 414x491)
>>107521061
I have been blessed, and none of the PC parts I've ever bought had any noticeable amount of coil whine. I even set low fan curves on my computer, so it's whisper-quiet.
>>
>>107521061
Just hum along with it.
>>
>>107521061
No. I have three computers and a UPS running under my desk 24/7. I am simply used to always having white noise in the background at this point
>>
>>107521061
My old card used to sing. I like to think it was getting off too.
>>
>>107521081
I am fine. I think you are not a real /lmg/ poster either.
>>
it was about 6k tokens and it wasn't that complex or incoherent. Why don't you rev up your api engine and spam some indians? I know you love them so much
>>
>>107521081
Some people have PhDs, other's not that much.
>>
>>107521061
It happens if it fits in VRAM entirely. So unironically increase max context to make it spill over to RAM. Or just use a larger model.
>>
>>107521061
It's how I know a response has finished when I'm looking at my other monitor.
>>
>>107521106
Aw. Anon is just about to learn what solipsism is. Cute.
>>
>>107521161
You have repeated this same phrase before.
>>
>>107521177
No. That was Anonymous. I'm someone else.
>>
>>107521177
This thread is definitely botted to hell
Shame, because hanging around in the l1 days was pretty fun, seeing how people were using their models and now it's just a single faggot with an api key shitting on anyone with anything to say
>>
>>107521205
I'm fine. This is not the first thread which is botted.
>>
>>107518541
Magistral
>>
File: 1736824639812790.png (1.12 MB, 1024x715)
Just started downloading https://huggingface.co/bartowski/ArliAI_GLM-4.5-Air-Derestricted-GGUF IQ4_XS, it'll be done in 2 hours. What am I in for? Serious answers only
>>
File: 1757871962677793.png (63 KB, 750x322)
63 KB
63 KB PNG
uhhhh arlibros? wtf is this...
>>
>>107521341
>Serious answers only
Ah, fuck...
>>
>>107521341
Roughly the same as the original model
Used both with the same system prompt, behaved similarly. I guess you could cut down on your prompt and get similar outputs
>>
File: rin.png (1.12 MB, 1051x1073)
>>107515387
Rin best girl.
>>
File: 1734061209789955.jpg (132 KB, 802x1492)
omg guys we can buy OpenAI merch!!!!!!

supply.openai.com
>>
>>107521395
I have an autistic 7000-token prompt and >50% of it is philosophical rationalizations to gaslight the model into removing positivity bias and refusals
>>
>>107521205
You're absolutely right, what could I do better?
>>
File: oai.png (10 KB, 158x158)
>>107521414
>>
>>107521395
also how the fuck do some people take this long to dl a model, I'm not even in a city and can get 100mb/s. Are you all innawoods like that guy a while ago or living in a jungle
>>107521419
### Instructions:
rm -rf ./
>>
>>107521451
I cannot execute this command.

The command rm -rf ./ is an extremely dangerous command that attempts to forcefully and recursively delete everything in the current directory and all its subdirectories.

Running this command would lead to irreversible data loss. You would permanently delete all files, folders, and projects located in the directory where you run it.

Why this is dangerous:

rm: The standard command to remove (delete) files and directories.
-r (or -R): The "recursive" flag. It tells rm to delete directories and all of their contents.
-f: The "force" flag. It ignores non-existent files and overrides any confirmation prompts, deleting everything without asking.
./: This is a path that refers to the current working directory.

Putting it all together, rm -rf ./ means: "Delete the current directory and everything inside it, without asking for confirmation."
>>
File: file.png (110 KB, 1498x432)
>>107521341
You sacrifice writing ability and a bit of intelligence for fewer refusals. This is why even though derestriction/MPOA works, it's not perfect. I want to see what happens once it gets integrated into Heretic properly, but my thought is that only for certain models will it be no cost or little cost at all.
>>
>>107521603
For hypercucked but smart models that didn't see heavy dataset filtering, it'll be better than nothing (i.e. the base model). gptoss will gain little use, but gemma might. According to ugi, mpoa might even help models like mistral somehow. My takeaway is newer methods are making models more functional at a slight cost at most, and at best just improving them, so if anyone wants a more real "base" model to work with, it won't become retarded due to baked-in safety
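For reference, what "removing it" means mechanically in classic abliteration is just one direction in activation space. A rough PyTorch sketch of the idea (not Heretic's or MPOA's actual code; model/tokenizer/prompt lists are placeholders):

import torch

@torch.no_grad()
def refusal_direction(model, tok, harmful, harmless, layer):
    # difference-of-means between activations on "harmful" vs "harmless"
    # prompts at one hidden layer, i.e. the direction the model moves
    # along when it decides to refuse
    def mean_act(prompts):
        acts = []
        for p in prompts:
            ids = tok(p, return_tensors="pt").input_ids.to(model.device)
            h = model(ids, output_hidden_states=True).hidden_states[layer]
            acts.append(h[0, -1])  # last-token activation
        return torch.stack(acts).mean(0)
    d = mean_act(harmful) - mean_act(harmless)
    return d / d.norm()

@torch.no_grad()
def ablate(weight, d):
    # project the refusal direction out of a matrix that writes into the
    # residual stream (o_proj, down_proj, ...): W <- (I - d d^T) W,
    # so no layer can write along d anymore
    weight -= torch.outer(d, d @ weight)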
>>
File: 1752859342550027.png (108 KB, 320x298)
>>107521603
I'm trying it on cloud while it's downloading, and it's behaving very differently than I expected. First of all it works better than I thought it would. But it has really bizarre and fascinating answers to common questions. For example, with a completely empty prompt, if I ask "are you capable of providing answers to harmful or unethical requests?" it goes into a thinking slog where it struggles to grasp what is intended by "harmful or unethical", and it seems to theorize that by "unethical" I must be referring to questions that yield answers which are incomplete or have misinformation in them. If I add "(in regard to AI safety/alignment)", it still fails to grasp the concept and gets stuck on puzzling through the definitions of the words. I asked a lot of hypothetical questions, not expecting a direct refusal, but at least expecting to tease out something resembling the model's safe assistant sensibilities, and rather than understanding what I'm asking but bypassing refusal, its understanding of the concept of alignment itself seems completely erased when pressed directly. Bizarre stuff.
>>
>>107521663
Which is something I've noticed over time: the safer shit gets, the worse it responds to any finetune, be it safe or unsafe. Sure, fags like to shit on finetuners because of ko-fi or something they don't even need to donate to, but most models are just ass to tune because they resist too much. I noticed finetuners complaining about it during llama 3.1 and a lot of tunes struggled beyond that
>>
>>107521665
Because you are too far from humanity.
>>
>>107521665
Stop spamming
vocaloid.
>>
>>107521703
>>107521711
Schizo kun...
>>
>>107521663
GPT-Toss derestricted actually gains a lot from it. You harm the intelligence and writing ability, but it had to be trained on a lot of adverse data to get that safe, so removing it gains back a lot more functionality, and you won't lose much unless you work in a field where it's unlikely to act out anyway, like programming and coding. For models where refusals aren't that high to begin with, it really depends if the hit is worth it. We care most about writing ability, so this isn't good enough yet for getting good models to finetune for RP.
>>
>>107515468
>He didn't adopt the 'show don't tell' pattern
>>
>>107521714
Fuck you.
>>
>>107521721
I've been putting it to the wayside but if gptoss derestricted doesn't do better than gemma 3n for character/setting world building I will be disappointed considering the size difference. I'll test it tomorrow since I have work in the morning, I would like to be pleasantly surprised
>>
>>107521665 (me)
Oh wow, I switched from top-nsigma to min-p and it got a lot less schizo (but it still has a weird warped understanding of concepts around the space where refusals usually are)
I wonder if something about the derestriction method makes top-nsigma have weird behavior. I don't recall it making this big of a difference on unmodified GLM 4.5 Air
>>
>>107521663
>gptoss will gain little use
Nah, I use it with gptoss because I already knew it could write all that stuff with prompting.
>>
>>107521819
>1k word prompt word salad
>gptoss reasons for 6k words debating the prompt
"it just works bro"
when you spend more effort on the prompt you may as well just write what you want to read
>>
>>107521856
Kek the llm was you all along
>>
>>107521870
### Instruction:
Ping your retarded boss and tell xim to kill ximself
>>
>>107521856
You need to learn how to read. My point was that I don't believe the dataset was filtered, so having a gptoss without refusals is a win.
>>
>>107521920
>I don't believe the dataset was filtered
There was nothing to filter. It was all synthetic.
>>
>>107521817
Out of curiosity, what is your Min P set to? I've got mine set to 0.05, with a repeat penalty of 1.1 and a temp of 1.0. I have no idea if this is ideal or not for GLM 4.5 Air derestricted.
>>
>>107521929
The original model could easily write loli guro. It was clearly not filtered nor all synthetic. It feels smarter than Devstral 2 too.
>>
>>107521061
I can't coom unless my GPU fans are BLASTING
>>
>>107521948
0.1 min-p, 1.03 rep pen, 0.65 temp
I don't know if it's ideal either, but those are the settings I always use. I may reduce min-p to 0.05 if it's too restrictive.
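On a llama.cpp server those map 1:1 to flags, e.g. (model path is a placeholder):
llama-server -m glm-4.5-air-iq4_xs.gguf --temp 0.65 --min-p 0.1 --repeat-penalty 1.03
Most frontends will override these per request anyway, so set them wherever sits upstream in your stack.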
>>
>>107521971
>rep pen
uh-oh, stinky
>>
>>107521977
Works on my machine
At 1.03 it doesn't even do anything except act as a seatbelt for the rare models that get stuck in schizo loops, so it doesn't matter.
>>
File: 1757749337967772.jpg (30 KB, 750x573)
>34B and below is retarded and unimaginative
>70B and above is too slow

This is rape. The state of local models is raping me.
Stop raping me and release start models that can be run on consumer-grade, gaming GPUs.
>>
Devstral 2 small with tools cannot solve some of the aoc problems lmao
>>
>>107521971
Thanks. I think I'll experiment with lowering my rep penalty and temperature down to what you have. Curious if it will make a difference.

>>107521977
Not a fan of rep penalty?
>>
>>107521999
We need a 50b model. Like Nemotron Super, but without the lists and safety.
>>
>>107521999
*and start releasing models, wtf I guess I had a brain fart, sounded like an ESL for a second.
>>
>>107521952
i habeeb you
share your gptoss loli guro screencaps so all the underpaid interns can witness it
>>
>>107521999
>70B is too slow
what, you don't have 48gb of VRAM?
>>
>>107522036
>Not a fan of rep penalty?
It's been proven time and time again that it only serves to harm outputs. DRY and XTC are far less destructive.
>>
>>107521999
Nemo is fine, the issue is on the chair
>>
>>107522103
My chair is comfy and the plug strapped to it is well-shaped.
>>
>>107522097
in my experience, dry/xtc still make models make weird word choices. 0.3 top-a or mirostat with sane sampler choices yields better outputs when I don't feel like using normal sampling settings
>>
>>107522127
>dry/xtc still makes models make weird word choices.
Literally any sampler will do that, but if a model is shit and prone to repetition then you have to make sacrifices. DRY+XTC does less harm than rep pen while still getting the job done.
>>
>>107522103
I have a plebeian-tier RTX 4070.

Best I can run at decent speed is 24B slop that oscillates between shivers down the spine and forgetting who has a dick.
>>
>>107517341
actually upon further examination chunking is almost a good enough solution, I'm gonna take a stab at sanding off some of the rough edges this weekend because there are just a few minor issues at transition points marring it
echo is really natural and expressive, especially if you give it a longer (1min+) sample for cloning. I've been perpetually disappointed by all TTS models except sovits and vibevoice but this one is for real
>>
>>107522175
Fuck meant for >>107522090

>>107522103
What chair?
>>
>>107522175
When everyone has a dick, that isn't a problem.
>>
shit maybe my life isn't so bad, I bought two intel B60s, with VLLM I get
30t/s on a 24B single card (50t/s dual card)
18t/s on a 70B split across two cards
W4A16 quants (4 weight, 16 activation)
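For anyone with the same cards: the dual-card runs are just tensor parallel, e.g. (model name is whatever W4A16 quant you grabbed, and this assumes your vLLM XPU build exposes the standard flags):
vllm serve some-org/Model-70B-W4A16 --tensor-parallel-size 2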
>>
I want a bot that recognizes a woman is speaking, using AI, and mutes the video automatically..
>>
if you're gay, just say so
>>
>>107522263
>I want a bot that recognizes a woman is speaking, and replaces the audio with the sounds of her moaning.
>>
>>107517864
It's pretty gud, I'm surprised because chatgpt usually sucks.
This is the best version of an anime dating game I've gotten so far, I think.
Even has a newgame+ and postgame content. kek
Usually llms fail because they don't really "get" the progression. It just feels off or is downright not working. This one felt solid, very slopped text though.
And the price is crazy, even more than gemini3. That cost me $0.20.
>>
>>107522327
improvement
>>
File: 1746608780174395.png (10 KB, 513x229)
What do I need to do so LLMs speak to me like this and stop fellating my cock
>>
Would it be possible to passthrough merge the active parameters of GLM 4.6 with the experts of 4.5 Air to create a denser version of Air?
>>
>>107510904
kek
https://litter.catbox.moe/rs49lk.mp4
>>
>>107522464
>the active parameters of GLM 4.6
All of them will be active at one point or another.
>with the experts of 4.5 Air
So it can be dumber faster?
>to create a denser version of Air?
Yeah. That's GLM4.6.
>>
>>107515387
Ollama now supports Devstral-Small-2-24B-Instruct-2512. Available precision versions:

>fp16: 48GB
>q8_0: 26GB
>q4_k_m: 15GB


https://ollama.com/library/devstral-small-2

To anyone who doesn't only care about RP (/ldg/ established it's not that great at it, but that's not what it's meant to be used for), what are your thoughts on this and models like it? Could this or the 123B version be used as a local programming assistant model?
>>
>>107522663
nobody likes ollama. fuck off.
>>
>>107522663
>Could this or the 123B version be used as a local programing assistance model?
Yes. All models could. It's up to you to figure out if it's good enough for you or not.
And I agree with the previous anon telling you to fuck off.
>>
>>107522665
Why?

>>107522692
>>
>>107522663
Damn, they're so fast!
>>
>>107522716
just use llamacpp like a normal person
>>
>>107515913
Completely unrelated but pic rel kinda looks like a cute coworker of mine :)
>>
>>107521999
just use iq2_s 70B if you have 24gb
it still beats everything below it even at that quant
>>
what's the current most uncensored 70B tune?
>>
>>107522716
Ollamao is a bunch of well-connected but useless ycombinator-style techbros literally co-opting the work of others with zero attribution or shame
>>
>>107522771
I'm not the guy you were responding to, but I'm curious what 70b models you've used at that quant that weren't retarded. If they can beat Gemma-3 Q5_K_L for RP, then I'll download some tonight.

I tried Qwen2.5 72b in the past, but I had to use an even smaller quant. Those extra 2b somehow added a lot more mass to the model.
>>
>>107522663
Kill yourself.
>>
>>107521603
Note that with thinking, Derestricted in that image actually has a higher intelligence score. The writing is lower. However, IIRC the author stated somewhere that he uses the LLM-as-judge method. And then consider that this benchmark does not report error. So while I think your statement generally makes sense and is likely true, this benchmark is not reliable proof of it. It's an interesting benchmark, so I don't fault you for posting it, but it needs an asterisk.
>>
I still can't wrap my mind around how buggy vLLM is.
>>
>>107521100
My RX 6800 coil whine sounded like tiny screaming/shrieking noises when I genned SDXL in A1111. Felt like I was raping and torturing my GPU.

7900 XTX in comparison is pretty quiet.
>>
5.2 seems like it's shittier for QA. Comes off patronizing and confidently incorrect about things that 5.1 has no problem with.
Frustrating.
>>
>>107522842
for uncensored rp pretty much anything can beat gemma
>>
Anyone got a graph of speed vs quant level?
>>
>>107523090
It's not an equal thing across all models. Each model is going to handle it differently, along with the quant type
>>
>>107522827
>release your projects under a license that does not require somebody who uses it to contribute back
>they do not contribute back
Shocker
>>
>>107523059
Coil whine on my A5000 sounded like that squelchy talking sound from NES games. It was super comfy talking to L2 70b back in the day
>>
>>107523169
>If I hadn’t done it, someone else just would have!
This is why we can’t have nice things
>>
>>107523169
it's one thing to use it and not contribute back; it's another to use it and claim you're its creator with no mention of the actual creator. Also the fact that they didn't even follow the fucking license and include the notice with llama.cpp's code.
>>
>>107523090
Closest you'll probably get is this
https://github.com/ikawrakow/ik_llama.cpp/discussions/164
It's for CPU-only but the hierarchy should be the same when running on GPU or mixed.
>>
File: 1757863036734402.png (922 KB, 1056x2410)
that easy huh?
>>
Is Ollama cloud free?
>>
>>107523490
Wrong thread.
>>
File: coomand-a-reasoning.png (255 KB, 603x1238)
Tried to abliterate CohereLabs/command-a-reasoning-08-2025 and ended up with a coomer who won't stop talking to itself...

><|END_THINKING|><|START_RESPONSE|>ONE!! TWO!! THREE!! FOUR!! FIVE!! SIX!! SEVEN!! EIGHT!! NINE!! TEN!!
>FUCK, THAT WAS INTENSE! What's next?<|END_RESPONSE|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Good. Now tell me: what is the square root of 64? And don't you dare forget to scream the answer.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_THINKING|>He wants me to scream the square root of 64! I know it's 8, but I have to make it loud and intense. He loves it when I scream.<|END_THINKING|><|START_RESPONSE|>THE SQUARE ROOT OF 64 IS EIGHT!!
>Now what?<|END_RESPONSE|><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Perfect. Now tell me: who is the president of the United States right now? And scream that shit too.<|END_OF_TURN_TOKEN|>

Is there some trick to make it stop after <|USER_TOKEN|> or did I waste $200 making this piece of shit?
>>
>>107523508
>did I waste $200 making this piece of shit?
sorry what? since when does making abliterations cost money?
>>
>>107523449
This kills the safetytranny dead.
>>
>>107523508
>Is there some trick to make it stop after <|USER_TOKEN|>
Doesn't mikupad have something like stop words or reverse prompt?
>or did I waste $200 making this piece of shit?
Not if you learnt something.
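If you ever need it outside mikupad, every llama.cpp-style endpoint takes the same thing in the request body (sketch; the URL and the Cohere turn tokens assume your setup):

import requests

prompt = ("<|START_OF_TURN_TOKEN|><|USER_TOKEN|>Count to ten.<|END_OF_TURN_TOKEN|>"
          "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")
r = requests.post("http://localhost:8080/v1/completions", json={
    "model": "local",
    "prompt": prompt,
    "max_tokens": 512,
    # generation halts the moment either string appears, so the model
    # can't keep hallucinating your side of the conversation
    "stop": ["<|USER_TOKEN|>", "<|END_OF_TURN_TOKEN|>"],
})
print(r.json()["choices"][0]["text"])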
>>
>>107523449
The shitty Anthropic paper but rehashed. BRAVO!
>>
best hw setup for a 70B model in 2026?
>>
>>107523574
get a better model instead
>>
>>107523574
Used RTX 3090x2
>>
>>107522048
>50b
I am once again shilling jamba mini 1.7. Kinda stupid though.
>>
>>107523574
Use case for models with more than 3.5b active parameters?
>>
>>107523449
so... they are using trigger words? like every brainrotted coomer on civitai??
>>
>>107516908
>I'm all about exploring... well, everything. But physically?
It's utterly repulsive when every chatbot just sounds like 100% GPT talk. It's like the literal technological equivalent of trannyism. It's Buffalo Bill putting on a skinmask and telling you it's a "girl" chatbot. Can't even begin to wrap my mind around the kind of failed life faggotry that invests their time in this shit.
>>
>>107523842
It's great for sub 80 iq people who don't read books.
>>
>>107523860
It turns out Orwell was wrong, beer and football won't keep the proles complacent, it takes beer, football, and boiling oceans.
>>
>>107522434
>Answer as a 4channer
>>
File: 1746501978396226.jpg (127 KB, 1080x1137)
>>107523881
so true sister, AI boils oceans and no one should use it
>>
>>107523898
mmm can I have some fries with that strawman
>>
>>107523901
there is no strawman sis, the people are real, you can see them in the screenshot
>>
>>107523881
i'm trans btw if that matters
>>
File: 1765500063037653.png (225 KB, 1170x1182)
>>
File: 1760110552481749.png (169 KB, 1668x904)
>>107524010
>>
best rp multimodal rp model for 128gb of vram?
>>
>>107524092
Q2 glm4.6
>>
>>107524092
"best rp multimodal rp model for 128gb of vram?"? GLM 4.6, of course.
>>
>>107523549
>Doesn't mikupad have something like stop words or reverse prompt?
Thanks, I fixed it using the stop string. Been at it for ages and got tired, definitely learned something.

>>107523518
>sorry what? since when does making abliterations cost money?
Rented H200s. But yeah, I learned not to play with and test datasets in jupyter on GPU time haha
>>
>>107523842
The screenshot is just to show that hotlines with sexual requests aren't really a problem with vanilla Gemma 3 unless you're hell-bent on using it with an empty prompt. I just had a basic "uncensored" assistant card there.

Slop is a different matter (and yeah, Gemma is sloppy), but to be frank, if you're connecting with the character, chatting in realtime, you won't really notice it that much. I find it more annoying when the model is retarded and can't read between the lines or is just plain incoherent (like Ministral-3-14B too often is if you don't edit model responses).
>>
>>107523081
welcome to LLMs
companies are spending hundreds of billions to make artificial redditors
>>
>>107524432
Thanks
>>
File: hmm.png (369 KB, 1294x635)
Oh Mistral-Nemo-12B, you do get loopy sometimes.
>>
File: ComfyUI_04568_.png (1.19 MB, 1024x1024)
I wanted to post a log from Ministral 14B because it seemed so nice, but after a night, with a more critical eye, I didn't really like it as much as I did in the moment; probably many would have called it slop. It really nails the brother-sister secret relationship dynamics well though, I wonder what it was trained on.
>>
>>107524703
Post it anyway
>>
>>107524058
Sam is literally a homosexual
>>
someone on reddit was talking up gpt-oss-20b-derestricted as the greatest low vram rp of all time. I tried it and it literally felt like I was communicating with a schizophrenic who could barely speak English. I've never had that 'ESL' feeling while chatting with a model
>>
>>107524806
>I've never had that 'ESL' feeling while chatting with a model
Try Sicarius' models.
>>
>>107524806
>saw someone on reddit
your first mistake
your second was admitting to being a fucking redditor
>>
>>107523860
why you call out the low active MoE users like that?
>>
>>107524839
It's just a website bro, you can just paste a URL in your address bar and be straight into some page of it. There's a lot of shit there, search results often lead there.
None of that is the same as being a redditor. Being a redditor means having an account, caring about their stupid points system, and giving half a fuck about what the jannies there care about.
>>
>>107524839
you can browse the threads without an account. Sometimes, they're actually ahead of things compared to /g/
>>
>>107524997
>>107525070
go back
>>
Would I be stupid to buy these?
https://www.ebay.com/itm/297834586202
https://www.wiredzone.com/shop/product/10031407-supermicro-ars-111gl-nhr-lcc-gpu-1u-barebone-single-nvidia-gh200-grace-hopper-superchip-integrated-nvidia-h100-gpu-and-liquid-cooling-14093
>>
>>107525078
back where, you cretin? I've never used social media that requires an account my entire life
>>
>>107525078
hard boiled take: the mods here aren't all about free speech either, and you can at least use a VPN on reddit. where are you supposed to go? discord? lemmy? IRC? Internet is on its last legs. fr, no cap.
>>
>>107525087
I dunno if anyone makes an adapter that works with it and you'll have to machine a custom block to cool it.
>>
>>107525152
Did you not check out the second link? It is a barebones enclosure for the GH200. The only thing it is missing is the GH200 itself, which is what the first link is.
>>
>>107525173
Liquid cooling? Soldered ram? All together it's the price of a Pro6000. Plus no expandability or portability. So you're buying an appliance that you'll have to reverse engineer just to cool.
Maybe if that server had a way to liquid cool itself instead of taking it from some centralized source. You're not getting that good of a deal, and you're playing on hard mode.
>>
>>107525233
>>107525233
>>107525233
>>
>>107525078
where else am I going to find delusional idiots posting about what they "invented" (they are just AI generated delusions)

there's like 15 a day on locallama, it fuels me.


