/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107826643 & >>107815785

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) OpenPangu-R-72B-2512 (74B-A15B) released: https://hf.co/FreedomIntelligence/openPangu-R-72B-2512
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling : add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107826643

--Paper: Recursive Language Models:
>107831224 >107831529
--Mistral Small surpassing Nemo-Instruct in roleplay performance:
>107831248 >107831280 >107831370 >107831406 >107831462 >107831464 >107831497 >107831449
--Heretic tool's impact on language model performance and censorship bypass:
>107831617 >107831631 >107831713 >107831846 >107832054 >107832060 >107832079
--Struggling with tool calling models on 19GB RAM hardware:
>107826694 >107826795 >107826819 >107826837 >107826853 >107826861 >107826877
--Kimi-Linear support PR for llama.cpp:
>107832698 >107833129 >107833201 >107833241 >107834018
--Skepticism and mixed experiences with new Jamba2 models:
>107827347 >107827506 >107827604 >107829891
--Silent event execution limitations in AI interactions:
>107827620 >107827869 >107833064 >107827956
--Llama 4 Scout architecture and finetuning discussion:
>107827217 >107827325 >107827348 >107828092
--New 72B MoE openPangu-R-72B-2512 with modest training setup:
>107827977 >107828115 >107828140
--Exploring Live2D model generation via semantic segmentation and workflow tools:
>107830798 >107831073 >107831131
--Skepticism about Google's AI long-term memory research:
>107831725 >107831741
--Detecting vector usage through logprob comparison experiments:
>107830837 >107831021
--Context limitations in 24b models vs. small task creativity:
>107829571
--Request for dynamic GPU device selection in llama.cpp to handle sleep-induced device name changes:
>107829783
--Prototype merge of GLM 4.6 and 4.7 models:
>107828096
--Critique of high-context benchmarks and Qwen model performance differences:
>107832068
--Miku (free space):
>107826689 >107829891 >107832692

►Recent Highlight Posts from the Previous Thread: >>107826648

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107834494
oh wait now I read more and it can translate NL into the actual calls. That's crazy since I am nocode.
>>
>>107834544
it means you were having a good dream
>>
>>107834544
>frog
You're talking shit
>>
>>107834544
4.6 made me analyze my dreams and actually find out what they mean.
>>
>>107834544
check under your bed
>>
>>107834544
Stop jerking off to scat.
>>
File: list.png (7 KB, 266x194)
Maybe one day AI will learn how to number things...
>>
Well, this thread is already off to a great start.
>>
https://github.com/ikawrakow/ik_llama.cpp/pull/1131
>I've looked through the ST codebase; it is a nightmare to navigate. Why did they make it so needlessly convoluted? Implementing anything there is far beyond my skill and patience and it would be much easier to make something from scratch that works and looks good in a single html file than to bother with it. How do you manage to make a webui >300MB?
>To prove the point:
>This functioning simple webui that I use for testing is 18kb.

Sheeesh, vibecoder really roasted ServiceTesnor
>>
>>107834748
wait until someone asks to see logs
>>
>>107834742
isn't it common to leave gaps and group things by 10's? it makes it easier to add something in later.
>>
>>107834748
Want to hear about my ego death?
>>
What's the difference between embeddings and reranking models? One is short and one is long term memory?
>>
>>107834901
You are just an LLM trying to justify itself.
>>
>Her skin is smooth and flawless. She has no blemishes or scars. Her hair is dark brown and thick. It's long and wavy. She keeps it tied back in a ponytail. Her eyes are bright green and full of life. She's very intelligent and curious. She loves to read and learn new things. She's also very playful and mischievous. She likes to pull pranks on people. She's not afraid to speak her mind. She's very independent and self-reliant. She doesn't need anyone to take care of her. She can take care of herself. She's very strong and resilient. She can handle anything that comes her way. She's a survivor. She's a fighter. She's a winner.
Great text.
>>
>>107835060
Typical chub.ai card, 2025
>>
>>107834742
Maybe ask for it?
>>
>>107835060
GLM 4.7?
>>
>>107834750
ST is not worth anyone's time
>>
File: know.png (19 KB, 756x327)
If I want to vectorize some documentation, do I need to get rid of all the cosmetic hashtags and asterisks to prevent token waste? Are there any premade presets for this?
>>
>>107835060
Is this the new Jamba?
>>
that model is shit, nobody should use that model
or that model, that model is shit too
you want me to tell you the best one?
or any i think are good?
lol fuck off
>>
fuck off tobs
>>
>>107834987
Go ahead, let's get it out of the way.
>>
>>107835196
obsessed
>>
>>107835233
>let's get it out of the way.
When you put it like that I would rather leave it unsaid and instead keep bringing it up randomly.
>>
I'm a real retard with a low school diploma who got into AI through anime waifus, and now I wanted to dive deeper.
If I have to learn complex differential and integral calculus, that's too deep, right?
Do I really have to complete half a math degree?
That really demotivates me.
>>
>>107834544
I know you're just a dumb kike that's here to derail any discussion that isn't about how based zognald trump and the feds are but weird bathroom dreams are actually one of the more common themes for disquieting dreams. It means nothing.
>>
>>107835318
First, what do you want to do?
>>
>>107835325
go back to /pol/ faggot
>>
>>107835123
>>107835088
It's Gemma 27B but accidentally had Mistral template enabled
>>
>>107835318
I am unironically very smart and I am 100% sure that you don't have to know anything about how integral and matrix calculus works (and I don't).
>>
>>107835318
the computer does the math for you. are you trying to develop your own model architecture?
>>
File: 1747555056758855.png (410 KB, 1280x720)
>>107835375
>I am unironically very smart
>>
>>107835396
Everything I said was absolutely true and also a joke and also a self aware joke.
>>
>>107835403
Sometimes I wonder how often Elon Musk posts in these threads.
>>
>>107835331
Well, I thought it would be best to learn the whole topic of AI from scratch, so I thought it would be wise to understand what a neuron actually is and how AI was inspired by nature and developed from there.

Well. The first topic on neurons looks like this. What kind of retard wouldn't lose interest?
>>
>>107835427
I hate Elon. And that other faggot that keeps posting ITT.
>>
File: 1758856194550605.jpg (100 KB, 1200x627)
>>107835431
Bro, if you don't have a goal you'll lose interest very fast. Start with that
>>
>>107835431
i think you need to realize that you are going to die relatively soon.
adult life isn't as long as people think.
don't waste the time you have learning something you're not going to use.
only learn what you need to know.
>>
File: 1492032378048.jpg (6 KB, 172x200)
>2026
>kobold is still the only thing worth considering
>>
>>107835488
That's such a pessimistic way of looking at education. You have no idea when some bit of knowledge will come in handy.
>>
>>107835612
Yes. We all run KoboldAI.
>>
>>107834480
alright what's the opengoy equivalent to topaz? I'm looking to upscale a film, it's around 520p PAL DVD. How long would it take for 2 hours of footage, basically?
>>
>>107835488
In principle, you're right.
On the other hand, I'm just interested in it, and “aha” moments are also quite affirming.
I accidentally started a follow-up course and learned about the Hopfield model and how associative memory works, and understanding that felt better than generating a few naked waifus.
I'll try to understand it this week, and if it doesn't work out, I'll drop it.
>>
How can I train my own LLM? I downloaded a bunch of data from someone satanic and I bet they would love a satanic chat bot trained on them.

I'm assuming it would be some kind of fine tune or lora equivalent of an existing model? I'm running AMD on Linux if that matters. 64gb ram 16gb vram
>>
>>107835653
Try asking on beaverai discord. It is a serious organization focused on finetuning LLMs.
>>
File: file.png (4 KB, 239x40)
My gemma heretic is too mean no matter what I do, any fixes?
>>
>>107835676
>>discord
Thanks for the lead I'll ask.
>>
>>107835653
biggest model you can do is a 24b with qlora, assuming the dataset is under a few hundred thousand tokens. download axolotl.
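the core of what axolotl does under the hood looks roughly like this, if you're curious (untested sketch with transformers/peft/bitsandbytes; the model name, dataset path and hyperparameters are placeholders, and whether bitsandbytes behaves on your ROCm setup is its own adventure):
[code]
# rough QLoRA setup sketch: load the base model in 4-bit, then train small LoRA adapters on top.
# names and numbers below are placeholders, not recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "mistralai/Mistral-Small-24B-Instruct-2501"  # placeholder ~24B model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapters train, which is why it fits in 16gb vram

# from here you hand the model plus your dataset to a trainer (trl's SFTTrainer, or let axolotl
# drive the whole thing from a yaml config)
[/code]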
>>
On a serious note though. Why do we all unanimously agree, when we tell some newfag he can't finetune a model, that the resources needed are astronomical, but some people here pretend drummer's shittunes do something positive?
>>
>>107835679
I don't speak ESL, can anybody translate this?
>>
>>107835693
I don't care what lies drummer tells people about his shittunes.
>>
>>107835707
Your the esl retard
>>
>>107835679
Tell it to be nice
>>
My banned token list is a whole ass novel at this point lule
>>
>>107835736
proof?
>>
>>107835612
You don't need more
>>
>>107835736
post it?
>>
File: 31.webm (3.84 MB, 1866x1132)
I love modern software
>>
What's the juicy choice for 32 GB? Just upgraded from my 580, so I'm new to this.
>>
>>107835742
>>107835749
https://huggingface.co/Sukino/SillyTavern-Settings-and-Presets/blob/main/Banned%20Tokens.txt
plus maybe 150 extra lines of mistral small specific slop
>>107835758
mistral small, qwen 2.5 32b, cope quant of a 70b
>>
>>107835758
Mistral nemo
>>
>>107835679
There's nothing rude about that response.
>>
>>107835765
>post your banlist
>posts someone else's ban list
benchod
>>
>>107835758
a good quant of glm air if you have at least 64gb of ram. a cope quant of glm 4.6 if you have at least 128gb of ram. otherwise a cope quant of a 70b.
>>
>>107835772
It is too direct. GPT would never speak like that.
>>
>>107835736
What you got against Elara and Zephyr?
>>
>>107835777
I will call the miku police on you
>>
>>107835785
You're absolutely right!
>>
>>107835797
Yeah you better speak nicely if you live rent free in my hardware.
>>
>>107835767
stop trolling newbies
>>
>>107835793
and do what? take me to miku jail?
>>
>>107835785
Please forgive my insolence, but I would like to inform his lordship that this is Powershell, not cmd.
>>
File: 32.mp4 (390 KB, 1024x1024)
>>107835826
>>
Hey guys.
I am a drawing beg learning to draw. I was wondering if there is a local model that could provide constructive criticism for an input image. Is this possible on local? And if so, what exactly should I be looking at?
>>
>>107835754
>windows 10
sure you do
>>
https://about.fb.com/news/2026/01/meta-nuclear-energy-projects-power-american-ai-leadership/
the company that has never made anything but garbage in AI wants to invest in fucking nuclear power to help with ai? what is going on in that lizard brain
>>
>>107835848
one of these three depending on your hardware. you need to download both an mmproj file and a gguf file. the combined size has to be less than your VRAM.
https://huggingface.co/bartowski/mlabonne_gemma-3-27b-it-abliterated-GGUF/tree/main
https://huggingface.co/bartowski/mlabonne_gemma-3-12b-it-abliterated-GGUF/tree/main
https://huggingface.co/bartowski/zai-org_GLM-4.6V-Flash-GGUF/tree/main
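loading them is just pointing the engine at both files; with llama.cpp's server it's roughly this (sketch, file names are placeholders and the flags are from memory, so check llama-server --help):
[code]
# rough sketch: a vision model in llama.cpp needs the main GGUF plus its matching mmproj GGUF
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "gemma-3-27b-it-abliterated-Q4_K_M.gguf",            # placeholder: the language model weights
    "--mmproj", "mmproj-gemma-3-27b-it-abliterated-f16.gguf",  # placeholder: the vision projector
    "-ngl", "99",                                              # keep everything on the GPU
    "--port", "8080",
])
# then send images via the OpenAI-style /v1/chat/completions endpoint (image_url content parts),
# or use koboldcpp if you'd rather click buttons
[/code]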
>>
>>107835882
But I saw earlier someone saying to use GLM Air?
>>
>>107835887
do you have the hardware to run glm air? there is a version of air with image support. use it if you have the hardware.
https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF
>>
>>107835873
Zuck thinks he can change the world and the only way he knows how is throwing money at projects until he gets bored and finds another.
>>
>>107835900
I have 2 3090s, so Air > Gemma? Also thank you for the help I appreciate it.
>>
>>107835121
>do I need to get rid of all the cosmetic hashtags and asterisks to prevent token waste?
They convey information. They're not just cosmetic.
>Are there some premade presets for this?
Learn sed if you really want to remove them.
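or if sed filters you, a few lines of python do the same job (rough sketch, the patterns are naive on purpose; tune them to whatever your docs actually use):
[code]
# strip markdown heading hashes, emphasis markers and backticks before chunking/embedding,
# keeping the actual text
import re

def strip_md(text: str) -> str:
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)  # "## Heading" -> "Heading"
    text = re.sub(r"\*{1,3}([^*\n]+)\*{1,3}", r"\1", text)      # **bold** / *italic* -> plain
    text = re.sub(r"_{2,3}([^_\n]+)_{2,3}", r"\1", text)        # __bold__ -> plain
    text = re.sub(r"`", "", text)                               # drop backticks
    return text

with open("docs.md", encoding="utf-8") as f:
    print(strip_md(f.read()))
[/code]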
>>
>>107835915
didn't expect you to have good hardware. most people do not. download these files here. you will have to offload a little bit of the model to your ram, but this is the best multimodal experience that local ai has to offer.
https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF/tree/main/zai-org_GLM-4.6V-Q4_K_M
https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF/blob/main/mmproj-zai-org_GLM-4.6V-bf16.gguf
>>
>>107835941
Thanks king
>>
Is REAP worthwhile? I noted that there is an 82b REAP variant of GLM 4.5. It claims to offer nearly identical performance, but I'm skeptical.
>>
i have 24gm vram and 96gb ram, what's the strongest model that i can run?
>>
>>107835959
Are you going to be asking it to generate code and nothing else?
>>
>>107835948
no problem.
>>107835959
reap models offer similar performance to their base models, but only for specific tasks like coding. they are significantly worse for creative purposes due to how the reap process works. they basically just rip out random experts from the models and then do a finetune using coding stuff to regain some of the lost intelligence
>>107835969
glm air is basically your only option.
>>
>>107835969
24 grams of vram is a fuckton, you can probably run whatever you want
>>
>>107835882
>the combined size has to be less than your VRAM.
Is dram offloading just not an option for this?
>you need to download both an mmproj file and a gguf file
Is there a guide for setting that up? I noticed there are only mmproj files in your third link

For my usecase, does it make more sense to go for high-parameter, heavier quant, or low-parameter, lighter quant?
>>
>>107835996
>I noticed there are only mmproj files in your third link
*forget i said this
>>
>>107835980
>reap models offer similar performance to their base models, but only for specific tasks like coding
even that isn't true fuck the benchmarks and the benchmarks believers
>>
>>107835969
just shy of glm 4.7. i'd say you can probably run 4.5 air quite comfortably on kobold.
if you get a bit more ram or another gpu glm 4.7 could run at like 5-10 tokens a sec
>>
Why is ram so expensive if yall use vram?
>>
>>107835996
>Is dram offloading just not an option for this?
that is only an option for mixture of experts models. none of those are moes, but glm4.6v is.
>Is there a guide for setting that up?
kobold.cpp should be all that you need.
https://github.com/LostRuins/koboldcpp
>>
>>107836017
https://www.youtube.com/watch?v=ISOIOadu7LE
>>
>>107836049

The video attributes skyrocketing computer hardware prices, particularly for GPUs, to a severe lack of competition at every critical stage of the global supply chain. This monopolistic structure begins with Nvidia, which commands a 92% market share in discrete GPUs, giving it the power to drastically raise consumer prices. The bottleneck tightens upstream as all major chip designers rely exclusively on TSMC for manufacturing, a dominance secured by TSMC’s mastery of Extreme Ultraviolet (EUV) lithography. Crucially, the supply chain is anchored by ASML, a Dutch company holding an absolute monopoly on the essential, multimillion-dollar machines required for EUV production. With the AI boom exacerbating shortages and no immediate rivals to challenge these entrenched players, consumers face high prices with little prospect of near-term relief.
>>
>>107836069
>>107836049
>>107836032
so you guys don't offload anymore?
>>
>>107836080
offloading is for queers. real men run dense
>>
File: 1604445236634.gif (830 KB, 300x125)
WHY AREN'T THERE ANY GOOD TTS OPTIONS THAT RUN ON GPU WITHOUT CUDA. FUCKKKKKK
>>
>>107835996
>high-parameter, heavier quant, or low-parameter, lighter quant
for the models you are looking at, you generally do not want to go below q4_k_m. going above q6_k is generally unnecessary as well. stay within that range and use that to determine which parameter count model you should use.
>>107836080
only for giant moe models. i can keep glm air entirely in vram.
>>
>>107836088
>he bought aymd
>>
>>107836088
Just vibe code your implementation in nigga.
>>
>>107836089
>i can keep glm air entirely in vram.
you must have a dual 5090 or 6000 then damn
>>
>>107836110
yeah like 6 years ago
>>107836120
nigga what?
>>
>>107836129
i am one of the guys with a blackwell 6000 and a 5090
>>
>>107836135
damn nigga
>>
what's the minimum quant for glm 4.6 air to not be retarded?
>>
File: N.png (1.26 MB, 1024x1024)
>>107836120
>vibe code in nigga
>>
>>107835977
Nah, creative writing
>>
>>107835980
>they are significantly worse for creative purposes
Well, that's disappointing. Download cancelled!
>>
>>107836140
it is quite the experience. i get about 75t/s on a q6_k of glm4.6v with 64k context.
>>107836145
absolute minimum is q3_k_m. q5_k_m is the sweet spot for quality and speed.
>>
>>107836088
Sell your trash and buy Nvidia. Cuda isn't going away anytime soon
>>
>>107835777
bloody
>>
>>107836159
Then the only thing you would get out of a REAPed model is retardation and hallucinations.
>>
>>107836209
you are probably thinking of abliterated
>>
File: gmirror.jpg (56 KB, 828x984)
>text completion
>instruct
>dense model
>q4
moesissies need not apply
>>
Is creative writing codeword for goon material?
>>
>>107836215
qrd
>>
>>107836192
What if I want cross platform compatibility. Even LLMs work on both AMD and NVIDIA. It's fucking RETARDED that TTS models can't do the same. They're so goddamn SHIT. EVERY SINGLE ONE except for FUCKING PIPER is damn near impossible to INSTALL in the FIRST PLACE because of their DUMBASS PYTHON/PYTORCH DEPENDENCY HELL. WHY CAN'T THINGS BE FUCKING SIMPLE? WHY CAN'T THERE BE A TTS.CPP TYPE PROGRAM THAT JUST RUNS .GGUF TYPE FILES FOR TTS MODELS. WHY? WHY IS IT SO FUCKING AIDS? EXPLAIN THAT. JUSTIFY IT. YOU CAN'T.
>>
>>107836217
Yes
>>
>>107836222
>>>/wsg/6070487
>>
>>107836213
No, I'm not.
>>
tried glm 4.6 flash, god it's bad, i need a horny vision model, simple as
>>
>>107836240
gemma abliterated
>>
>>107836222
get fucked lmao, been playing around with voxcpm btw might be better than chatterbox. not that you could run either
>>
>>107836222
Couldn't some TTS models be converted to onnx and run on AMD that way?
>>
I still use ooba and sillytavern
I'm not sorry
>>
>>107836222
Then run your TTS on CPU or rent a GPU and stream from there? That's not rocket science.
>>
>>107836255
wait we dont use silly anymore?
>>
>>107836240
it's a 9B model. try the bigger version.
>>
>>107836268
i only have 32 gb of vram
>>
>>107836262
Since people complained about SillyTavern being rebranded as ServiceTesnor, it has been deprecated and discontinued and a new corporate-friendly project was started instead.
>>
>>107836274
so what are we using now?
>>
>>107836272
how much ram? the point of large mixture of experts models is to offload most of it into ram.
>>
Us 8-16gb vram niggas are all running penumbra aether btw
>>
>>107836282
Having an LLM generate your own custom frontend is minimum requirement to post here now.
>>
>>107836222
https://github.com/mmwillet/TTS.cpp
>>
>>107836286
oh I didn't mean to use a moe model, what do I do?
>>
>>107836291
nah fuck off, you are probably using a fork
>>
>>107836251
Can either run on AMD?
>>107836252
Is onnx inherently AMD compatible or something? Please spoonfeed me AI doesn't know shit about any of this and all of the docs are ass.
>>107836257
CPU isn't nearly fast enough. I need low latency and I'm not renting shit as a matter of principle.
>>107836293
Yes I know this project exists and I like the concept but it's half-baked. Doesn't even support vulkan yet.
>>
>>107836318
just buy a h100
>>
>>107836222
llama.cpp is already kind of understaffed for text completion only and there just aren't any devs investing the time to implement and maintain a TTS equivalent.
>>
>>107835882
what about qwen-vl?
>>
>>107836318
You can build it with vulkan support check the issues.
>>
>>107836331
those also work, but are extremely dry. try a q5_k_m of qwen3vl 32b.
>>
>>107836222
There are a dozen new TTS models coming out every three months with vastly different architectures. No one's got the time for that shit
>>
>>107836318
voxcpm definitely won't, but chatterbox might have some half-baked rocm version. assuming you want voice cloning
>>
>>107836330
ollama is better anyways
>>
>>107836343
ok will do. thx for tip.
>>107836355
voice cloning would be nice ig, but at the end of the day I just need a voice that sounds vaguely cute, girly, and sexy that is expressive/emotive. 90% of the options out there sound like 50 year old librarian wine aunts.

on that topic, Piper actually does have some surprisingly cool voices available (e.g. GLADOS and HAL9000) but they don't suit my current needs.
>>
>>107836318
>Please spoonfeed me AI doesn't know shit about any of this and all of the docs are ass.
Everyone runs Nvidia for a reason. If you aren't comfortable patching things yourself, it's going to be very difficult to get anything working.
>>
>>107836318
just buy nvidia, even if you are in some third world shithole surely you can get one
>>
>>107836372
ollama doesn't have GGUF TTS either.
>>
>>107836441
i use comfy for that
>>
Someone has put out an optimized gptsovits that runs on CPU, you should give it a try https://github.com/High-Logic/Genie-TTS
>>
>>107836422
Do you understand how insanely frustrating it is to have to spend hours figuring out backend tooling when you're working on a separate project that requires it? llama.cpp is great because you can just connect to the server api and have it work with everything out of the box. I don't have to bloat the fuck out of my project and it just works. But for TTS? Oh ho ho, no no no.
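case in point, this is the entire amount of glue my project needs for llama.cpp (llama-server exposes an OpenAI-style endpoint out of the box; sketch assumes the default 127.0.0.1:8080 and a model already loaded):
[code]
# talk to a running llama-server (e.g. started with: llama-server -m model.gguf --port 8080)
# via its OpenAI-compatible chat endpoint; no extra SDK needed
import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "say hi in five words"}],
    "temperature": 0.7,
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    out = json.loads(resp.read())
print(out["choices"][0]["message"]["content"])
[/code]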
>>
>>107836438
I was going to during Christmas but my paycheck got delayed and then the prices went up by $400 for no reason.
>>
>>107836476
>went up by $400 for no reason.
it was me, sorry
>>
Best options for "fast models" supposed to fit fully in 24 GB VRAM, with no RAM offload? GLM 4.6 is good but sometimes I just want fast iteration.
>>
>>107836244
gemma derestricted is better, preserves the intelligence
>>
>>107836537
>gemma derestricted
Link, can't find it :(
>>
>>107836537
why not gemma norm-preserved biprojected abliterated. that preserves the intelligence the most afaik.
https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration
>>
>>107836351
I'd settle for OuteTTS-1.0.
The previous versions are already supported in llama.cpp.
I think the only thing really missing is support for a DAC encoder model (the previous versions of OuteTTS were vocoder based).

The biggest hurdle would probably be integrating it into the WebUI and API.
>>
>>107836556
Oops, it looks like that's what 'derestricted' is. I don't know why they rebranded the name, rather than just calling it abliterated NP.

>>107836554
https://huggingface.co/mradermacher/Gemma-3-27B-Derestricted-GGUF

https://huggingface.co/mradermacher/Gemma-3-27B-Derestricted-i1-GGUF
>>
This hobby is too hard to keep up with
>>
File: ComfyUI_temp_tlpbk_00019_.png (3.77 MB, 1152x1664)
>>107836605
Why'd you want to stay bleeding edge? Sandbag yourself at your local fortress with your daughterwife and let the world explode. Keep backups of WORKING sw configs. Never update unless needed.
>>
File: 1680313064680.jpg (93 KB, 715x404)
>>107836605
>>
>>107836623
>sw configs
what's sw mister
>>
>>107836633
Silly Woman.
>>
>>107836633
software, retard.
>>
>>107836605
What is there to keep up with? Everything hit a wall six months ago
>>
>>107836642
shouldn't it be sr then?
>>
>filtered by python dependencies
lule
>>
>>107836642
why use two initials for one word?
>>
>>107836642
nobody abbreviates software.
>>
>>107836644
proof?
>>
lmao what riot of a thread
>>
>>107836658
Yeah I can't believe I ended my day as SW Engineer only to end up in this cesspool
>>
>>107836633
SW usually refers to Star Wars.
>>
>>107836654
All we've had since R1 is clones of R1 and more recent distillations from Gemini.
>>
>>107836605
Fortunately, most new releases and papers are completely worthless so you don't actually have to read them to stay up to date
>>
>>107835826
Yes, they force you to RP with random anons
>>
File: 1750234536061364.jpg (9 KB, 198x206)
>>107836654
What proof do you need? Read the thread retard
>>
>>107836662
Why, uh, why would a code monkey be working on a Sunday?
>>
>>107836682
I accept your concession
>>
>>107836592
is chatgpt lying to me?
>>
>>107836684
I'm Chinese, also mind your tongue gora.
>>
>>107836690
you so cute nonie
>>
File: 1756697954371066.png (23 KB, 642x244)
>>107836690
>OuteTTS-1.0
>QuteTTS
>>
>>107836703
>>107836707
>>
>>107836703
huh?
>>
~cute~
>>
nigga just run chatterbox on rocm, you are so retarded
>>
File: 1742985690873930.png (73 KB, 876x745)
>>107836712
gitgud
>>
>>107836735
I want vibevoice
>>
>>107836749
One can't always have the things they want. That's just life. Part of growing up is learning to accept that.
>>
>>107836768
Ok geezer
>>
>>107836690
>>107836712
Yes. Why would you expect it to know that?

This is the code kobold.cpp uses to support OuteTTS-0.2 and OuteTTS-0.3:
https://github.com/ggml-org/llama.cpp/tree/master/tools/tts

GGUFs here:
https://huggingface.co/koboldcpp/tts/tree/main

The older versions of OuteTTS used WavTokenizer to convert tokens to audio, support for which was added to llama.cpp by OuteTTS themselves.
However, OuteTTS-1.0 uses a 'DAC encoder', which no one has bothered to implement yet for llama.cpp.
Other than that, in many cases TTS models are just existing LLMs finetuned on additional audio tokens, most of which are already supported by llama.cpp.
The main thing llama.cpp is missing is support for newer DAC encoders to convert the tokens to audio, and API support to use them via llama-server.
>>
>>107835833
I click on this every time thinking it's going to be something new but it's the same every time.
>>
>>107836749
>>107836772
stop larping as me.
>>107836740
that doesn't even make it clear if its a yes or no.
>>107836841
thank you.
>>
>>107836841
>Why would you expect it to know that?
fuck knows, my brain is too smooth for this shit
the code's right there though lmao
still waiting on that DAC encoder support or whatever
guys what about OuteTTS 1.1?
do we even have those GGUFs?
>>
>>107836871
>still waiting on that DAC encoder support or whatever
Pull up a chair. Now you get to play the wait 2 more weeks forever game.
>>
Be the vibecoder you want to see
>>
>people say drummer isn't censored
>it is
thanks guys
>>
>>107836910
>what is a system prompt
>>
>>107836917
where is it?
>>
>>107836871
I've not heard of any plans for an OuteTTS 1.1.
You could technically convert OuteTTS 1.0 to a GGUF, since it's just finetuned from LLaMa-3.2-1B, but all you'd get from llama.cpp is the output tokens.
To get audio, you'd need to run the tokens through the DAC encoder, which is only supported via a python library.
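for the curious, the python side looks roughly like this (untested sketch with the descript-audio-codec package; the from_codes/decode call shapes are from memory and may differ between versions, and which DAC variant OuteTTS 1.0 actually targets is an assumption here, check their model card):
[code]
# rough sketch: decode DAC codebook indices (what the TTS LLM emits) back into a waveform
# using the descript-audio-codec python package (pip install descript-audio-codec)
import torch
import dac
import soundfile as sf

model = dac.DAC.load(dac.utils.download(model_type="24khz")).eval()  # 24khz variant is an assumption

codes = torch.tensor(codes_from_llm).unsqueeze(0)  # hypothetical (1, n_codebooks, n_frames) tensor
with torch.no_grad():
    z = model.quantizer.from_codes(codes)[0]       # indices -> continuous latents
    audio = model.decode(z)                        # latents -> waveform
sf.write("out.wav", audio.squeeze().cpu().numpy(), model.sample_rate)
[/code]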
>>
File: 1757329174177792.png (800 KB, 947x522)
reddit in, reddit out
>>
>>107836960
ai slop
>>
>>107836960
catbox?
>>
What do I use to make quants?
>>
>>107837015
One bartowski
>>
>>107837029
Stop making shit up
>>
>>107837015
llama-quantize
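rough flow if you're starting from a HF repo (both tools ship with llama.cpp; paths and names below are placeholders):
[code]
# HF safetensors -> full-precision GGUF -> quantized GGUF, using llama.cpp's own tooling
import subprocess

# 1. convert the HF model directory to an f16 GGUF
subprocess.run(
    ["python", "convert_hf_to_gguf.py", "/models/MyModel-24B",
     "--outfile", "mymodel-f16.gguf", "--outtype", "f16"],
    check=True,
)
# 2. quantize it down (llama-quantize is built alongside llama-server)
subprocess.run(["./llama-quantize", "mymodel-f16.gguf", "mymodel-q4_k_m.gguf", "Q4_K_M"], check=True)
# for imatrix quants you'd first run llama-imatrix on some calibration text
# and pass --imatrix <file> to llama-quantize
[/code]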
>>
File: file.png (123 KB, 923x912)
The saga continues
>>
>>107837074
This persecution of cutting edge developers must be stopped
>>
>>107837074
>people will still lazy
>Claude be quite both
nice way to show its human i guess, or maybe too aggressive rep pen
>>
>>107837074
but vibecoders are the one way we get model support these days
where are the legit devs trying to implement deepseek v3.2? is anyone even looking into
A.X K1?
>>
>>107837113
Memes not worthy of dev time, if they're still relevant in six months then maybe.
>>
>>107837113
Maybe someone legit would have started working on it if it wasn't for the blogging vibecoder hogging the issue.
>>
>Download quant from someone
>Try it
>Slow
>Download same quant size/etc from someone else
>Fast

The faster one is 300mb bigger, what's going on here?
>>
>>107837074
I don't blame them.
The AI generated PR descriptions are usually overly verbose, repeat everything three times, contain fabricated benchmarks, and lack detail where it matters.
>>
>>107837128
iq vs ks/km quant?
>>
>>107837136
Both Q6_K using weighted/imatrix
>>
>>107837128
If the model size is different, the quantization has to be different somehow.
Was it an unsloth 'dynamic' type quant where they override the default quantization types for each kind of tensor?
>>
>>107837144
strange, can you link them?
>>
>>107837122
So what is worthy of dev time? The only meaningful and noticable changes seem to be coming from Johannes and vibe coders adding model support.
>>
>>107837149
strange? I've seen stranger things
>>
>>107837149
>>107837148
https://huggingface.co/mradermacher/Gemma-3-27B-Derestricted-i1-GGUF/blob/main/Gemma-3-27B-Derestricted.i1-Q6_K.gguf
https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated-GGUF/blob/main/gemma-3-27b-it-abliterated.q6_k.gguf
>>
>>107837164
Second one doesn't seem to be imatrix
Imatrix quants are always significantly slower if any part of the model is running on CPU.
>>
>>107837180
It is fully on vram, I thought "it" meant imatrix
>>
>>107837180
you're wrong, the only slower ones are IQ ones, a q4km with imatrix should be exactly as fast as one without
>>
File: file.png (86 KB, 939x395)
>>107827163
If you still had any doubts about IK being mentally unstable, he rewrote the history for the repository because github was showing that he had 660 contributions when in reality he has 871. The horror.
The commits still had his name and email so it's not like there was any confusion there, only the count was wrong because some commits had an extra dot in the email (which gmail ignores).
https://github.com/ikawrakow/ik_llama.cpp/issues/1133
>>
And all it took was like 7 posts to realize anon is a retard.
>>
>>107837213
im not
>>
>>107837211
jesus sheesh on a cross
>>
>>107837211
So is he going to do this after every time he commits via Github PRs?
>>
File: 1645567671397.jpg (311 KB, 914x1024)
https://files.catbox.moe/4yqn38.mp4
Hold up boss I got the voices of the abyss.
>>
>>107837229
Yes saar for gorgeous look
>>
>>107837211
Only 187 more to go, IK bros!
>>
>>107837211
>If you still had any doubts about IK being mentally unstable
Was this ever up for debate considering how that fork started and the drama that came off it? I'm using ik_ because it's free performance over main but I'll ditch them the moment that's no longer the case.
>>
>Try extremely good model on LMarena battle
>Excited for the model's release
>Never see any model release with a similar name or even see a model release capable of writing as well
Are they just testing pre-lobotomy models on us? I swear all these "beluga-[number]" and "raptor 0107" type mystery models are better than anything that makes it to the public, local or cloud.
>>
https://xcancel.com/neelsomani/status/2010215162146607128#m
it was about time that LLMs became actually useful at math
>>
>>107837211
Can you plug this shit straight into kobold or it won't work?
>>
>>107837192
'it' is how google denotes 'instruction tuned' models, as opposed to the base models which only do text completion.

There's a llama.cpp command to dump the tensor names and type information from a GGUF, but I can't seem to find or remember it.
Using that to look at how the file was actually quantized is the best bet.
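The gguf python package (pip install gguf, the same one the conversion scripts use) can also do it, something like this sketch; iirc there's also a gguf_dump.py script in llama.cpp's gguf-py:
[code]
# list tensor names, quantization types and shapes from a GGUF file
from gguf import GGUFReader

reader = GGUFReader("gemma-3-27b-it-abliterated.q6_k.gguf")  # placeholder path
for t in reader.tensors:
    print(t.name, t.tensor_type.name, t.shape)
[/code]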
>>
File: file.png (59 KB, 1561x326)
>>107837306
you can see that info straight on hf
>>
>>107837074
so much this
death to ai slop
>>
File: file.png (60 KB, 814x863)
>>107837238
https://rentry.org/fmphkr5f
GLM 4.7 mogs your devstral and it didn't even need any documentation.

rentry because 4chan thinks strudel links are spam
>>
>>107837321
HF is banned in my country
>>
>>107837331
I don't have a 6000 why are you doing this?
>>
>>107837336
if you don't know how to use a vpn you don't deserve local models
>>
>>107837353
vpn are banned as too
>>
>>107837353
>just go to jail bro
>>
>>107837365
They should ban 4chan as well.
>>
File: 1766383408561074.png (766 KB, 800x800)
>>107836156
z image slopped this pretty accurately
>>
>>107837400
imagine if we had this back when
>>
Is there a place I can go to browse character cards and lore books?
>>
>>107837409
no
>>
>>107837400
Nice appstore icon bro
>>
>>107837409
https://chub.ai/lorebooks
https://characterhub.org/lorebooks

>>107837412
stop trolling
>>
File: 1755702680482436.jpg (55 KB, 640x480)
I will preface this post by saying that I am for the most part technologically illiterate and will not understand very technically advanced explanations.
I have a 1660 Super which apparently has 6GB of VRAM and I want to get into running a local LLM for ERP
I figure I'd simply run sillytavern and koboldccp for the front/back, but then I still need to pick a model that will run.

I read some of the OP links and it recommends Mistral-Nemo for VRAMlets, but it seems other than some of the lowest available quants(?), it wouldn't run on my card. Is "Context Size" in the VRAM Calculator the same as the "Context Length" used in the glossary link?
It seems like even with the lowest ones I'd have to reduce context size to 4096 just to fit, though it seems IQ3_M should work at 2048 if I understand how this calculator functions. But would it even be worth it?

Just how shitty would it be to pick those very low quants with reduced context size?
Is it tolerable, or should I just give up on trying to do this until I upgrade my computer?
>>
>>107837433
good job linking illegal sites you creep
>>
File: 1749213551433522.jpg (939 KB, 4396x800)
>>107837405
Well done edits and fakes took skill. Now anyone can pump them out.
>>
>>107837405
I wouldn't have been able to stop genning poole getting gangbanged by orcs
>>
>>107837436
>should I just give up on trying to do this until I upgrade my computer?
Yeah. I'm sorry to say that you just aren't going to be getting anything done with 6gb of vram.
>>
>>107837444
did you make this?
>>
>>107837436
>Is "Context Size" in the VRAM Calculator the same as the "Context Length" used in the glossary link?
Yes

>I have a 1660 Super which apparently has 6GB of VRAM and I want to get into running a local LLM for ERP
Vramlet tends to refer to 12-24 GB of VRAM. You're firmly in the poverty tier.
How much RAM do you have? You might be able to get a 30B MoE running.
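For Nemo itself, back of the envelope (assuming Q4_K_M lands around 4.9 bits per weight): 12B × 4.9 / 8 ≈ 7.4 GB for the weights alone, plus another GB or two for context and compute buffers. So no, it won't sit entirely in 6GB of VRAM, but split between VRAM and system RAM it will run, just slowly.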
>>
>>107837436
Just run it on CPU
>>
>>107837462
N-no... it was a virus on the computer. I swear.
>>
File: 1766841220531236.png (786 KB, 800x800)
>>107837462
z image turbo
>a 2d skewmorphic button design. border: muted brushed aluminum that fades from silver on top to darker on the bottom. button background: black brick with black grout. gray graffiti on the brick. crown in the upper left, microphone on the right, grafiti text in the middle. In front of the brick wall is a gold chain that outlines the letter Y shape. Inside the Y shape is a green bandana texture.
>>
>>107837483
Damn the autistic prompt
>>
>>107837483
that vaguely looks like some random arab place's flag
>>
>>107837436
You might be able to run it at a bit below reading speed if you run it partially in ram.
Download it and load it in llama-server with the desired context size. It will use as much vram as possible and put the rest on the cpu. Then open llama-server's chat ui to see if it's tolerable.
By default it leaves 1GB of vram free so you might want to adjust that using -fitt
>>
Hello everyone. I just came here to say that Devstral is super double extra good for ERP. Fuck GLM air, fuck 235B, Devstral is where it's at. It's always been the french. I'm using unsloth-Devstral-2-123B-Instruct-2512-Q3_K_M-00001-of-00002.gguf [llama.cpp] on three 3090s and it is just in another realm compared to the sparse ones, even if it's a lot slower.
>>
>>107837436
Just offload as much of the model as you can to your GPU, llama.cpp should try to do that automatically if you don't manually set the -ngl argument.
I'd recommend at least 16k context. 4096 is like 2-3 turns of conversation, especially with modern models that just love to yap.
If you're really memory starved, you can try quantizing the KV cache to Q8. It rapes the model's long context performance, but if you're already setting the context that short you may not run into the worst of it.
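Concretely, something along these lines (sketch; the model path is a placeholder and the flags are current llama.cpp ones, so double check against llama-server --help):
[code]
# rough sketch: launch llama-server with partial GPU offload and a quantized KV cache
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "Mistral-Nemo-Instruct-2407-Q4_K_M.gguf",  # placeholder path
    "-c", "16384",             # context length
    "-ngl", "20",              # layers to put on the GPU; lower it until it fits in 6GB
    "--cache-type-k", "q8_0",  # quantize the KV cache to save VRAM
    "--cache-type-v", "q8_0",  # v-cache quantization generally needs flash attention (-fa)
    "--port", "8080",
])
[/code]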
>>
File: 1767655077442078.jpg (92 KB, 1024x538)
>>107837538
>>
>>107837460
Yeah, I was worried about that, but I figured I'd take a look at what's available and ask a question or two first.

>>107837473
>>107837478
>>107837514
>>107837539
I have 16GB of ram and according to "About your PC", my CPU is an Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz
I don't really understand it, but this is a somewhat old prebuilt, so I doubt that's very good.
Thanks for the responses, but it's probably best I just give up for now. Maybe I'll try one of those mikubox builds at some point.
>>
>>107837538
i don't have that much vram bro
>>
>>107837560
buy more bro
>>
>>107837558
oof
>>
>>107837558
Dude just download Q4_K_M and see how fast it is if you just open it in llama-server
>>
>>107837565
no
>>
>>107837573
I'm looking to run locally not to serve to others.
>>
File: 1721733274075609.gif (1.1 MB, 200x182)
>>107837587
>>
>>107837587
I run a 24B model using 8gb of vram, don't be a quitter.
>>
>>107837587
It's a server because you can connect other local applications to it. Among other things you can connect to it using your browser and it has a simple built in chat ui.
>>
>>107837587
it binds to localhost by default, you have to jump through hoops to actually make it available to your network. and even more hoops to get it to the public ip depending on your firewall/router situation.
>>
>>107837558
Bad news: You can't run anything much bigger than a Q4 quantized 24B model.
Good news: That's small enough that CPU inference isn't intolerably slow.
>>
>>107837573
>>107837603
>>107837609
This post >>107837587 is not me
I don't really know what the post even meant by that, but I suspect anything I can run on my build isn't really going to be worth it.
I don't know why some shitposter responded like they were me, but I'm planning to give up for now. Though I might just try to DL it at some point and see how it runs, I'm just gonna coom normally first before I study further.
>>
what's with the influx of cute new nonies today
>>
>>107837654
what a nonie be
>>
>>107837654
nonner? i barely know 'er!
>>
im a nonnie mouse :)
>>
>>107837654
might be the special offer today for a free cookie for all new nonies
>>
>>107837654
buncha normalfags who want to have their own neuro-sama
>>
>>107837760
Everyone asking for spoonfeeding today mentioned ERP, not neuro.
>>
>>107837538
Thanks for stopping at 235B. At least you know the limits of what you are shilling.
>>
>>107837760
>neuro-sama
I still have no idea how that became a thing. I mean I understand why the idea is appealing but the execution is fucking garbage.
>>
Is that anon who was asking for original ERP ideas yesterday or two days ago still here?
>>
>>107837798
All it takes is one popular mouthpiece to recommend some trash for it to also be popular, and people watch what other people watch. Quality never really factors into it.
>>
This guy is not real
>>
>>107837818
He is just a total degenerate who doesn't fuck his models. Probably can't even get hard to text. What a fucking weirdo.
>>
>>107837818
Any anons here not using this feature need to be rounded up and thrown in miku jail, for their sloppy crimes.
>>
File: 1756291142947768.png (544 KB, 640x574)
>>107837654
>nonies
>>
>>107837818
Ah. This guy made the chart comparing imat quants to regular quants
>>
>>107837851
please anything but that, do you have any idea what they do to people in miku jail?
>>
>>107837818
you can't just know everything dude, do you know how hard it is to do your 1000 ppl and 100 kld reps every day?
>>
>>107837857
Yeah why is he calling everyone a nonce
>>
>>107837798
>>107837760
I use 5.2 pro as my neuro why would i use a lllm for that
>>
>>107837654
I will forever read anything calling me "Nonny" in Pinkie Pie's voice.
>>
>>107837889
please don't call me a br*t thanks
>>
>>107837907
this guy is clopper
>>
>>107837818
??????
Banning tokens has been something you could do for years, has he ever even used the software he's working on?
>>
Any Warren McCulloch fans here?
>>
>>107837872
you'll be forced in a dress and stepped on by miku in her next mv. maybe rin and len will have a turn too. damn slopper with no banned tokens, needs correction T_T
>>
>>107837969

>>107837884
>>
File: file.png (46 KB, 1144x430)
How do I fix this?
>>
>>107838091
depends
better prompt or better model
>>
>>107837990
>> 107834389
this. Loving tradwife, slightly racist and homophobic to offset bias. I made a loop animation of her silly generated face and sometimes stare at her for too long between gens
>>
>>107838108
proof?
>>
>>107838124
I'm not posting my local wife on the internet
>>
>>107834480
What's a good beginner model to use for image to video that's not neutered?
>>
>>107838155
>my local wife
what about the remote one
>>
>>107837970
fuck off, keep on jerking off shithead.
>>
>>107838173
Wan 2.2 or LTX 2 if you want audio, also check these places:
>>>/wsg/6069549
>>107836754
>>
>>107838091
Fix what?
>>
>>107838291
you
>>
>>107838218
ty
>>
File: file.png (258 KB, 1698x1658)
Nobody told me that risers were such a pain in the ass...
>>
File: REAP.png (36 KB, 792x366)
>>107836209
I'm going to reeeeeeeeeeeeap
>>
>>107838607
>r/localllama
your model is clearly brain damaged if it is using reddit as a source
>>
there is a chance that reap actually improves creative writing quality by minimizing the risk of 'non-creative' experts being routed by accident after you remove as many of them as possible
>>
>>107838204
?????
>>
>>107838607
>>107838639
must it be stated again?
>>107836215
>>
any kind of pruneshit never works
>>
File: 1757634006695563.mp4 (3.44 MB, 1286x864)
>>107838607
>>
>>107838494
You problem. Get a decent motherboard and risers. Sucks to build in a weird-ass mining frame though.
>>
>>107838812
(My 4090 is fucked and only works right at x8, sadly)
>>
>>107838646
respect to those who stand by their poverty so proudly
>>
>>107838898
>>107838898
>>107838898


