/g/ - Technology






File: comfyUI_0095_.png (984 KB, 1280x720)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102790397 & >>102772862

►News
>(10/11) 14B cross-architecture distillation model: https://hf.co/arcee-ai/SuperNova-Medius
>(10/10) Aria: 25.3B, 3.9B active, multimodal native MoE model with 64k context: https://hf.co/rhymes-ai/Aria
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102790397

--Papers (old):
>102793294
--XTC and antislop sampler comparison and discussion:
>102790793 >102790867 >102792300 >102792714 >102792996 >102793025 >102793114 >102791056
--Slop ban list for ST and discussion of kcpp version limits:
>102792985 >102793022 >102793045 >102793120 >102794562
--MinP sampler setting and model confidence discussion:
>102792817 >102792982 >102793144 >102793442 >102793462 >102794013
--F5-TTS model discussion and comparisons:
>102795351 >102797307 >102797320 >102797366 >102797787 >102798033 >102798046 >102798055 >102798081 >102798871 >102799802
--lcpp rpc works with llama-server:
>102791661 >102792593
--Troubleshooting token banning in Mistral and Nemo models:
>102793123 >102793215 >102794086 >102794108 >102794370 >102794590 >102794611 >102794876 >102794896 >102795450 >102795587
--F5-TTS model and VRAM consumption concerns:
>102795150 >102795465 >102796905 >102797245
--Jamba and 405b models can handle 128k context, but independent confirmation is needed:
>102793074 >102793343 >102793357 >102793795 >102798810
--Ichigo voice AI now talks back:
>102797277
--INTELLECT-1 and the potential for decentralized AI model training:
>102790512 >102790590 >102790668 >102791393 >102791510 >102791631 >102797392 >102791315 >102791376 >102791612 >102791557 >102799003 >102799109
--Anti-slop sampler works with service tensor ban feature:
>102791652 >102791679 >102791795 >102791823 >102791824 >102791844 >102791897 >102792150 >102791927 >102791952
--Users discuss backing up sillytavern before updating:
>102793239 >102793270 >102794012 >102793923 >102793955 >102793964 >102794072 >102794079 >102794605
--Ideas for improving KoboldCPP anti slop filter:
>102796412 >102796951 >102796798
--Miku (free space):
>102790653 >102790812 >102792946 >102792960 >102796225 >102799318

►Recent Highlight Posts from the Previous Thread: >>102790407

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: Summer-eternal-llm.jpg (487 KB, 1080x1104)
>>102801403
>Disco Elysium writers on AI and art
Umm, Lard Language Model peddlers, our response??
>>
>>102801480
They continue:
>Machine-generated works will never satisfy or substitute the human desire for art, as our desire for art is in its core a desire for communication with another, with a talent who speaks to us across worlds and ages to remind us of our all-encompassing human universality. There is no one to connect to in a large language model. The phone line is open but there’s no one on the other side.
>>
>>102801480
I am so happy gen AI is becoming completely politicized. The people I hate the most are all siding against GNON. Roko yourselves fucking commie troons lmfao
>>
>>102801480
>Disco Elysium
Who? Why should I care what they think?
>>
>>102801480
I like how lawyers simply made up a new law that says if you don't have a law degree you can't practice law. Meanwhile writers and artists have to boycott and cry and beg like this. We truly aren't equal.
>>
>>102801480
They are kinda right though, LLMs are still dumb af and can't create a compelling story/rp on their own.
>>
>>102801480
AI can't replace these hacks fast enough.
>>
i've tried a bunch of nemo tunes now and will shill for Lyra4-Gutenberg. it's very good for my rp adventures, which include reading a rag db in st
>>
>>102801403
How long has it been since the bitnet paper?
>>
>>102801480
On a fundamental level this is the same conflict as with machine generated images: the producer and the consumer have fundamentally different perspectives on the same work.
For the producer it's a way of self-expression so naturally a generative neural network cannot possibly provide a substitute.
But when I am a consumer I frequently just want a novel stimulation that I haven't experienced before; whether that is an actual human work or a statistical prediction of what a human work would look like is irrelevant.
Only when a human takes significant creative control like Mikugaki Anon can the two perspectives be merged.
>>
File: 1000001942.webm (2.32 MB, 720x1280)
so finetuned sovits is the highest quality and most expressive tts for coom?
>>
>>102801549
Neither can Disco Elysium
>ooohh detective game, murder mystery
>the culprit is on an island that gets unlocked after doing something
Literally the same shit as any other game but those have actual gameplay
>>
>>102801480
pretentious mfers
>>
>>102801480
What blind retard thought it was a good idea to do black on red and even red on black on red?
>>
>>102801480
The human mind is easily tricked, and it's silly not to exploit that fact for convenience and amusement; otherwise what you do is just some weird perfectionism thing (the artist sickness).
John Elysium here is free to entertain me in the middle of the night by drawing me pictures of Miku eating GPUs.
If he's feeling lazy and copies it from somewhere, it's alright too. Not like I'm gonna check anyway.
>>
>>102801603
almost 8 months
>>
>>102801480
truth nuke.
>>
>>102801480
>video game
>art
pick one
>>
>>102801480
this stuff doesn't land when the people pushing it are part of a coalition of misanthropes, extinctionists and anti-natalists

you're not some great lover of humanity and humanness, you're just strategically pretending to be in this context because you think it might get you what you want
>>
>>102801480
I can't buy a pretentious artist to write me stories. But I can get a GPU.
>>
File: 1698213577261974.jpg (133 KB, 994x735)
>>102801480
people who hate ai now are the same ones using photoshop and other tools daily
>>
>>102800936
Really can't trust those comparison sites then, like you said.
Thanks as always gpu anon, much appreciated and thanks for the work you do.
>>
>>102801480
>llms will never improve from now on
>llms are the only form of ai
dumb luddites, nothing surprising desu
>>
>>102801480
It's starting to get ridiculous.
Even more stupid since we are in a global managed decline.
Everything, even entertainment now, is slowly going to shit.
There is going to be a crossover eventually. Directionally it's clear where artfags vs. ai is headed.
>>
>another week
>nothing happened
it's never been so over
>>
Can anyone give me a quick rundown on how Meta set up their infra to train llama3? Since their blog post is full of buzzwords for tech "founders" and is not very in depth: https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/
>>
>>102802304
they bought a gazillion gpus and spent a bajillion dollars on electricity and cooling and it took them 8 months to train it
>>
File: 1533317583133.jpg (1.43 MB, 1500x942)
My ooba is broken again with silly tavern after the update; my gguf models are now retarded and just repeat the first gen... Ooba bros, how are you doing today?
>>
File: cry soyjak.png (126 KB, 424x417)
>>102801480
>>
>>102802439
>he ooba'd
lol
>>
Wtf.
>I apologize for the continued issues. It seems there's a conflict between the CUDA libraries and the system libraries. Let's try a different approach that doesn't rely on sounddevice. We'll modify the script to save the audio files and provide an option to play them using a system command.
It output something else. Very weird behavior.
https://vocaroo.com/17z7w7YY0HKC
Does this tts hallucinate?

Also:
https://vocaroo.com/1ftnEaergmRj
>>
>>102802304
https://imbue.com/research/70b-infrastructure/
Jesus Christ imagine having to go through all this
>>
File: 2013-10-13-611703.jpg (2 KB, 31x36)
I whipped up an RP benchmark and I was quite surprised to see how good and uncensored Rocinante is, I wonder what Drummer's secret sauce is.
>>
File: 1715604297022255.png (2.09 MB, 1164x1010)
/lmg/ hate on zuck in 3.. 2.. 1..
>>
>>102802836
inb4 Lecunny burns the bridge and murders Zuck (on twitter)
>>
Is this how it ends? Was Largestral the peak of local models before the permanent winter? I want all of you to know I enjoyed your company and I'm going to miss you when it's over.
>>
>>102803049
Thanks, Anon. If not now, I will miss you too after this is all over. We will meet again somewhere, sometime, somehow. Maybe even after this life.
I hope we get real AI Mikus before then.
>>
>>102802836
Does this mean all future models will receive coom-aware training by default?
>>
>>102803049
there's not a permawinter. all you have to do is wait. wait long enough, meta will release some unusable dogshit that "performs better" on all metrics no one cares about. that's all it takes for every other group to feel compelled to compete again. all of them will suck BESIDES mistral. upgrade incoming. probably within late q1, early q2.
>>
>>102803129
Maybe? Who knows.. hoping for less censorship for sure. Or they will just focus on actual performance more.
>>
is anyone unironically using qwen 32b or was that just a week long meme?
>>
>>102803155
I used it a few times when I wanted a dry assistant... which wasn't often.
>>
>>102803129
Don't kid yourself
>>
>meta AI researchers agree llama2 was too censored
>release a censored base llama3
What did they mean by this?
>>
>>102803288
censored != filtered
>>
>>102803301
Yeah filtered is even worse because finetuning can fix censorship but can't fix stupidity.
>>
>>102799460
That's what llama.cpp does no?
Although I think that functionality is not compatible with flash attention.
>>
>>102802836
Isn't this a months-old Tweet?
>>
>>102803319
Productive people hated llama2-chat because it would randomly refuse requests due to being extremely trigger happy with its refusals even in very mundane scenarios. They fixed that with llama3. The fact that l3 does not know what omorashi is due to filtering does not concern meta or the intended audience of the model.
>>
File: 1707961273205806.png (620 KB, 607x636)
>>102803408
Not so old https://x.com/LeadingReport/status/1838966484820709806
>>
File: url(19).jpg (92 KB, 1024x758)
LLMs are reaching a ceiling. Sure you can get an 8B performing as well as the old 70B, but the prose, the reasoning, everything that could produce a better output is out of reach. That's because there is not a single initiative to get better datasets instead of trying to cram as much shit as possible into these models. Imagegen understood that captioning was very important for their model performance. Audiogen understood that samples of high quality perform better. Only textgen is behind because these lazy fucks have a shitload of compute and can't be bothered to better filter and curate their shitty datasets.
>>
>>102803429
Still a jew, libertarians are kikes, only chinks and russians are based because they are authoritarian.
>>
why not just leave the general and do anything else with your life if all you can do is doomspam? like seriously zoomers don't waste your youth like that
>>
>filtering works so good for image-gen look sd3 is bestest ever!
>>
>>102803624
>what is ponyxl
Retard
>>
>>102803562
You're a mentally ill dopamine addict that mistakes their own self-induced anhedonia for flaws in the fabric of reality (i.e. proper syntax and linguistic structure). You are a shitty person for typing this because other people will read it and become brain damaged by how fucking stupid and narcissistic it is.
>>
What do you guys use for generating memories? (chatgpt recently added this feature but my homebrew dialog engine has had it for a while.)

Lately I've been using a 2b gemma model with a "write memories in bullet points" prompt along with a ton of retarded heuristics because it's fast, but the output is kind of crap, and seeing OpenAI manage it well makes me think I could improve. Has anyone else done this?
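Not my exact pipeline, but a minimal sketch of that kind of bullet-point memory pass, assuming a local llama.cpp/koboldcpp server exposing the OpenAI-compatible chat endpoint; the prompt wording and the port are placeholders:

# A sketch, not a drop-in: assumes an OpenAI-compatible server on :8080.
import requests

def extract_memories(transcript: str) -> list[str]:
    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={
            "messages": [
                {"role": "system",
                 "content": "Write lasting facts about the user from this chat "
                            "as short bullet points, one per line, starting with '- '."},
                {"role": "user", "content": transcript},
            ],
            "max_tokens": 256,
            "temperature": 0.2,  # low temperature: we want terse, factual bullets
        },
        timeout=120,
    )
    text = resp.json()["choices"][0]["message"]["content"]
    # keep only lines that look like bullets; everything else is model chatter
    return [line[2:].strip() for line in text.splitlines() if line.startswith("- ")]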
>>
>>102803562
holy nigga. Datasets won't magically make local models not suck, you idiot. The issue is at the pre-training stage, and you can't apply any filtering after that stage; and it's not like we, the community, should care about that since we aren't training models from scratch.
>>
>>102803565
The authoritarian vs libertarian debate is retarded.
Nationalism and tradition are all that matter and other groups understand that. The dumb fucking "individualism above all else" (which is a satanic idea) pounded into people's heads by television and radio for a century caused us to forget this.
>>
>>102803143
>there's not a permawinter. all you have to do is wait. wait long enough
Every time. Hardware will also catch up to the potential of what we already have. There's no putting 405b back in the bottle.
The shit I could do with that if I were able to get instant responses to queries and start making swarms of intelligent agents that could react instantly would be insane.
>>
>>102801480
tl;dr: "wah it's not art if it's not a human telling you how to feel!"

Like another anon said, I don't give a shit what the artist is trying to say, I just want stimulation, and these fucking self-righteous cunts are just mad that people are being made aware that all they produce is slop.
>>
>>102803834
Doesn't ServiceTensor already have this feature amongst its planeload of bloat, or did they remove it for being too rp yet?
>>
File: 1699067277555615.png (42 KB, 661x413)
>>102804080
>wah it's AI telling you how you should feel! listen to it!
yawn
>>
Good morning /lmg/!
>>
Grok2 and mini on openrouter.
Almost as expensive as 3.5 lol. And mini is the same price as the normal one.
It feels really stupid... I'm teleported to japan and say "こんにちは" (hello).
The woman thinks to herself "He talks in a strange language I can't understand". lol

Also kinda to be expected, but I finally tried Magnum 72b V2, it's so bad...
Like I teleport the girls from the "unholy party" card into the air so they are in freefall.
They refuse to show me their panties. Alright. Instead of crashing onto the earth, a portal suddenly opens up beneath them that swallows them up. Why did they think it's a good idea to train on a qwen2 model...

What's the best model that's like nemo but smarter, up to 70b? Would CR be good if I can endure the slop? I can fit around 35gb in vram.
>>
File: 1722307658891620.gif (216 KB, 160x120)
>>102801480
>"progressives" shit themselves and screech in impotent rage at technological progress
Why are leftists like this?
>>
>get serious grass is always greener syndrome
>Try some premium models, Claude, gpt4o, full fat mistral large
>They're all just as flawed as local options. Maybe a little better at following instructions and faster due to hardware

I have never felt more doom than after using proprietary models and seeing how meh they really are.
>>
>>102804208
>Why are ... like this?
revealed preferences. Ignore the labels people put on themselves and what they purport to believe, and decide what their internal state must be based on what you observe about the way they allocate scarce resources, e.g. money and time.
>>
>>102804271
*tips fedora*
>>
>>102804141
Good morning, Miku's other armpit
>>
>>102804265
>I have never felt more doom than after using proprietary models and seeing how meh they really are.
tfw frontier open models are on par or better than the closed one. That should give you hope, I'd think.
>>
>>102804302
For now. If Meta decides to stop being so charitable with their top models and we can't figure out distributed training on reasonable timescales, the gap will only widen.
>>
>>102804302
All it told me is that this is probably as good as it gets.
>>
>>102804325
>stop being so charitable with their top models
But they already stopped being charitable to my penis?
>>
>>102804326
What it means is that companies can't be lazy and just keep throwing more tokens on larger models and get better results for free. But there's clearly still a lot of room for potential improvement by improving the architecture. Just look at diff transformers from a few days ago.
>>
>TTS model
I sleep. Are we ever going to get a SOTA Udio/Suno locally or are we destined to be eternally cucked out of this tech locally?
>>
>>102803562
Agents
>>
>>102803424
>The fact that l3 does not know what omorashi is due to filtering does not concern meta or the intended audience of the model.

L3 is still shit for productivity. Hasn't even caught up to Claude 3 Opus.
>>
>>102804326
>All it told me is that this is probably as good as it gets.
We're all close to SOTA. Seems like we need an order of magnitude increase in compute before we get a significant jump up in smarts again.
>>
>>102804265
3.5 sonnet, o1 mini/preview, and gemini 1.5 pro are where the cutoff is. Try to do anything productive, you will see why local can't keep up.
>>
retard here, how do i know what version of torch/CUDA i'm supposed to install for f5-tts?
also is basic venv fine or do i need conda?
>>
>>102804485
It's a lot better than llama2 and refuses less.
>b-but it's worse than closed source!!
Obviously, yes.
>>
>>102804548
You're saying 4o doesn't cut it? And how come this is the cutoff now, how does that explain all the people using GPT-4 and Opus productively in all the time prior to 3.5 Sonnet?
>>
>>102804325
You know, if Meta suddenly went closed and stopped releasing models, what would this do to the open weights "industry"? Would other companies also start becoming closed just because it's cool and Meta did it basically?
>>
>>102804095
>ServiceTensor
Did they actually rename it?
>>
>>102804607
>how do i know
Have you tried reading the README?
pip install torch==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

https://github.com/SWivid/F5-TTS?tab=readme-ov-file#installation
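And a quick sanity check after installing, a sketch using only standard torch attributes:

import torch

print(torch.__version__)          # e.g. "2.3.0+cu118" - the suffix is the CUDA build
print(torch.version.cuda)         # CUDA version this torch was built against
print(torch.cuda.is_available())  # False usually means a driver/build mismatch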
>>
>>102804753
have you read it you fucking retard? YOUR cuda.
have you ever been to this page before you colossal faggot?
https://pytorch.org/get-started/previous-versions/
>>
retard here
I love you guys
>>
>>102804193
You write like a schizo. So just get a tulpa.
>>
>>102804765
>YOUR cuda.
And what part of that is so difficult to understand, you illiterate cum guzzling retard?
per the requirements.txt
># torch>=2.0
># torchaudio>=2.3.0
Install CUDA and install whatever torch matches YOUR cuda install. Is this your first time not installing from an exe?
>>
>>102804919
shut up you fucking imbecile, you obviously misread my post and now you're twisting yourself into knots instead of just admitting you fucked up
shut the fuck up and never reply to me again, mr. 'anonymous' - if that really is your name.
>>
>>102804926
Eat an entire bag of dicks, mr 'pip install is too hard for me'
>>
>>102801480
My favorite part of anti-AI cope is that instead of criticizing AI in its current state they always go with "AI will NEVER..." like they can see the future. It's doubly funny since in imagegen for example it has already done things people were going "will never" about like generating proper hands and generating art styles that are organic looking like a child's crayon drawing. Sometimes people forget how fast this shit has progressed. It has progressed so fast that if there are no major happenings for like a month or two people start doomposting about stagnation and how it will never get better. You could send a generic low param vramlet model like Mistral Nemo 5 years in the past and people would shit their pants over it.

Ultimately as long as AI gets good enough at faking intelligence that the user can't tell if he's talking to a person or a robot even if he spends a lot of time with the AI then that will be enough to replace another human as a conversation partner. It doesn't matter if AI never reaches proper sentience if it can completely fool a human into believing it's "thinking" and I believe that point will be reached eventually. And even when that point is reached people will still cling to their idea that humans are special and claim talking to AI is a worse experience even though they would fail a blind test where you must determine who is AI and who is a real person.
>>
File: 953151300700.jpg (349 KB, 864x857)
>>102804991
>>
I've been told by /aicg/ to come here. So, I currently use a 2060 with 6gb vram. Is it worth it to upgrade to a 3060 12gb vram? Is the AI really that much better? I've only been using 8b models for RP. I really don't know. /aicg/ laughed at me...
>>
>>102801480
But I'm a friendless virgin loser, a pocket pussy and llm is all I have

What now?
>>
>>102805078
>Is it worth it to upgrade to a 3060 12gb vram? Is the AI, really that much better?
No and no, you'll get bored pretty fast and fall into buyer's remorse.
>>
I hate the corporate speak of models
>>
>>102805102
Fuck. At what point would it be worth it? I'm going to save my money for that.
>>
>>102805124
48gb vram is the bare minimum
>>
>>102805124
If you're expecting something like gpt or claude or whatever then it won't be worth it for a long time. If you're actually enjoying 8b, then you might as well save for one of the 24gb cards.
>>
>>102805124
Somewhere around 96gb+ vram is about the point where you start getting the speed and intelligence that make it really "worth it" for the average person.
If you are more patient, then you can get good results by batching questions and coming back later for an answer, but 100GB is about the minimum useful model size for general intelligence, so you need a minimum of 128GB ram to get going (slowly).
Rule of thumb for cpu offload: offload any more than ~20% of the model from vram to sysram and speed absolutely tanks.
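Back-of-envelope math for the above, as a sketch; the ~4.7 bits/weight for Q4_K_M and the llama-70B-style shape (80 layers, 8 KV heads, head_dim 128, fp16 cache) are assumptions:

def model_gb(params_b: float, bpw: float = 4.7) -> float:
    # params (billions) * bits per weight / 8 = gigabytes of weights
    return params_b * bpw / 8

def kv_cache_gb(ctx: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per: int = 2) -> float:
    # 2x for K and V, per token, per layer, fp16
    return 2 * layers * kv_heads * head_dim * bytes_per * ctx / 1024**3

print(model_gb(70))        # ~41 GB for a 70B at Q4_K_M
print(kv_cache_gb(16384))  # ~5 GB of KV cache at 16k context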
>>
>>102805078
see the lmg build guide in the op for a quick and dirty comparison of different approaches, why they work and how much they cost vs what performance they provide.
>>
>>102804891
>>
>>102805150
>>102805131
So, 24gb minimum... Fuck. Never ever then...
>>
>>102805194
go back >>>/pol/ incel.
>>
>>102805199
It depends on how high your standards are. Some people enjoy 12b nemo very much.
>>
>>102805208
you can't be racist towards americans though
>>
>>102805189
>lmg build guide
we need another mad scientist to figure out a non-obvious hardware build that gets us better t/s/$
>>
>>102805223
I think Jetson AGX Thor with 128GB VRAM next year will probably be the best hardware build available to us. I guess if you don't mind tinkering or having your build obsolete within a year, you could try to build with 64GB Orins now.
>>
Isn't AMD ok if you're only doing inference?
>>
File: 1716661982048984.jpg (86 KB, 1024x576)
>>102801480
No matter how much you hate these leftists, it isn't enough. Even if your hatred completely consumes you, it still isn't nearly enough. These people literally spend every waking moment of their lives trying to figure out how to make life worse for the rest of us.
>>
Noob here. How can I synthesize a voice locally, using my own audio samples?
>>
>>102804936
https://vocaroo.com/1oE212oR2MgG
>>
>>102805329
Yes it's alright
>>
>>102805329
For the most part, yeah.
Don't know how multigpu works, if at all, but for single gpu, it pretty much just works as far as I'm aware.
>>
>>102805194
europoor cope is good for the soul
>>
>>102804991
>they always go with "AI will NEVER..." like they can see the future.
Stupid people have trouble thinking about anything that isn't the status quo. They get blown away by some technological innovation, and quickly get so used to it that they can't imagine what life was like before or that it could happen again. It's the same as when they claim humans will NEVER have space colonies. Never is a long time.
>>
>>102805386
better be because you sure as hell won't be getting actual medical help anyways
>>
>>102804677
ChatGPT4o is maybe about 90% as good as 3.5 sonnet.
>>
>>102804265
>A couple of years ago, all we had was closed models. And they were cool, they were useful, they helped educate a ton of people on what AI really was and particularly what AI could do for businesses and enterprise use cases. [...]. Fast forward a couple of years and, well, we've come to the point where the quality of open source models is just undistinguishable from closed models.
>The latest models from, from Meta and others are just on par with the largest and best closed models from OpenAI, Anthropic. So that's a given. I'm not even going to argue with this. There is no arguing. It's a fact. It's very clear to me open source models are state-of-the-art and they're winning this.
>I've seen enough evidence here to confidently say that a small open source model tailored on quality data will always outperform a generic large model. In 2023, I used to say almost always. Now, I can say always. I've seen it enough. And we, with our colleagues here, do that every single day.
appeal to authority bros... https://www.youtube.com/watch?v=CVHsH_J65ok
>>
Why didn't any of you guys call me out when I said the 5090 has 50% more bandwidth than the 3090?
I just realized the 5090 has 50% more vram. So technically speaking the upgrade is negligible.
You won't be able to run bigger models faster.
Actually, you will run models 50% slower on 2x 5090's compared to 2x 3090's because you will run bigger models that are slower (however, I admit a 3090 is too fast on its own, you can't read faster than 20-30tk/s, so running 50% slower with a bigger model makes the 5090 more balanced).
You could buy 2x 4070 TI supers and it would run 20% slower than a 5090, but it would only cost $1500. That's kind of shit I admit, you shouldn't buy 2x 4070TI supers, but you don't need to commit $2000 on a 5090, you can just buy a 16gb card for $700, and upgrade later for dual GPU (Or better, just buy a used 3090 if LLMs are all you care about).
So I guess I never really understood the math around bandwidth and multi-gpu setups. The only way to get more performance to run models larger than 48-64gb at decent token speeds (like 10tk/s) is with HBM or sram.
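For anyone checking my math, a sketch of the bandwidth ceiling I'm reasoning from; the 3090/4090 numbers are published specs, the 5090 figure is an assumed/rumored one:

def peak_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    # each generated token streams the whole model through the GPU once
    return bandwidth_gb_s / model_gb

print(peak_tps(936, 20))   # 3090 (936 GB/s) on a ~20 GB quant: ~47 t/s ceiling
print(peak_tps(1008, 20))  # 4090 (~1 TB/s): barely faster at the same size
print(peak_tps(1500, 30))  # assumed 5090-class bandwidth, 50% bigger model: ~50 t/s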
>>
>>102805839
honestly, until things hit 512GB I really don't see the point in upgrading. If you can't fit a good sized quant of 405b and context in vram, then I don't think it's worth it over a bunch of 3090s
>>
>>102805343

Learning this should let you use SillyTavern with XTTSv2+RVC and even make RVC songs if you wish. None of the newer tts are integrated in a retard-friendly way yet.

https://docs.sillytavern.app/extensions/rvc/
https://docs.sillytavern.app/extensions/xtts/

Example:
https://vocaroo.com/17AwF2KcdyxK
https://vocaroo.com/1kjfZtTAw2ms
>>
>>102805962
sounds like shit, why bother? use a newer TTS than this dumpster fire
>>
>>102806032
Then have fun setting up and integrating your newer TTS yourself.
>>
>>102806046
nocoder fags on this board baka, get real nigga
>>
>>102805119
This is a good and valid opinion. It's what makes the people working on local models so important.
>>
>>102803918
>Nationalism and tradition
The idea of a nation is ambiguous. What is a nation? Individualists explain the nation as a part of a whole projected from the ego itself, which is called solipsism; collectivists, on the other hand, explain the nation as the result of a sum into a superstructure, and that's why you need a father, a king or a leader. You're ignoring the depths of your stupid rhetoric. The same with tradition: who makes traditions? The culture, based on the views the collective holds in common, or the individuals within the group acting for their own benefit? So yes, the debate is always your egoist, selfish, jewish and protestant view called libertarian, liberal, or some stupid name with liber, versus our roman, catholic, Hegelian communist and confucian collectivist view. In the center of both is fascism; they wanted to harmonize the two, the individualist self with the collectivist, but they failed, lost the war, and here Franco lost pathetically to an anarchist with a bomb.
>>
File: 1567193580622.jpg (90 KB, 768x1024)
>>102806252
>and our roman, catholic
>>
>>102806252
model name and quant?
>>
>>102806306
That's not wrong, but it's a common misinterpretation of Gibbon's line of argument.
He was specifically talking about the change that Christianity brought as it displaced the entrenched religion... i.e. the FACT THAT IT CHANGED fucked up the social fabric of Roman society and hastened the empire's disintegration. Not that X was better than Y religion.
I imagine the same argument should be brought forward now: that the truth of a particular religion is its least important facet, and that the social cohesion you get from a uniform religion in a society is the true benefit.
Wherever you get midwits that are aggressively atheist is where shit goes sideways the hardest. Intellectuals of all ages have had an implicit understanding that religion was simultaneously indefensible BS and useful. Neither fact invalidates the other.
That said, the old gods were way fucking cooler.
>>
>>102806419
Rocinante-12B-v2a-Q8_0.gguf
>>
>>102805962
I appreciate the answer, I'll take a look.
>>
>>102801480
>a computer program will never be able to communicate art
>uses a computer program to try and communicate their art
What did the gayme dev mean by this?
>>
File: 1702156393839631.jpg (1.31 MB, 3386x4096)
>>102801403
>>
File: 1701931553561455.jpg (43 KB, 850x478)
>>102806632
it's a shame they named such a shit tune after such a cool ship
>>
>>102803255
I won't, I want the model to kid for me
>>
A while back it was often claimed that quants hurt larger models less. So how do 1B/3B fare under quantization? Like for example, does 3B already lose quality on 6 bits?
>>
>>102805199
I have a 16gb GPU (RTX 4070) and I use a 70B model, which is pretty smart. It takes about 2 minutes to generate a dozen or so lines of dialogue, pretty good to me.

If you can afford a 4090 I'm sure you can get some great value out of it.
>>
>>102805839
I felt like it was bullshit when I read it but I'm too poor to care about big boy cards.
>>
File: 95-to-hit.png (234 KB, 428x456)
Can some anon help me?
I'm trying to improve the audio transcription pipeline in my software. I'd like to use distil-whisper, but I get terrible hallucinations with it and have just been using large-v2 since it doesn't hallucinate (besides the 'thank you. thank you.' bits...)
Is there some black magic or proper configuration to use that won't cause it to hallucinate or miss entire paragraphs worth of words? Using faster-whisper.
Current settings:
options = dict(language=selected_source_lang, beam_size=5, best_of=5, vad_filter=vad_filter)
With 'vad_filter' disabled
If I'm being retarded please tell me
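For reference, the faster-whisper knobs that usually matter for this, as a sketch; the thresholds are starting points to tune, not gospel:

from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "input.wav",
    language="en",                     # distil checkpoints are English-only anyway
    beam_size=5,
    condition_on_previous_text=False,  # biggest lever against loop hallucinations
    no_speech_threshold=0.6,           # drop segments that are probably silence
    log_prob_threshold=-1.0,           # reject low-confidence decodes
    vad_filter=True,                   # trim the silence that spawns "thank you."
    vad_parameters=dict(min_silence_duration_ms=500),
)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")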
>>
>rtx 5000 series will hover around 16gb VRAM again
Jensen bro please I don't want to run llama3 8b at 80 tps, I'm already happy with 50 tps
>>
>>102807159
>3B
There is no quality there to lose.
>>
I asked ChatGPT why nobody trained a BitNet model. You won't believe my 3rd regen.
>>
>>102807492
ok
>>
>>102807492
Holy shit. I didn't believe you but I just asked it too, and... wow.
>>
What happened to everyone here that said cpumaxxing 405b would be fine because they could leave it running overnight? They were going to write a game engine or have it do their day job for them. Has anyone reported back success doing that?
>>
>>102807665
405b is just slightly better than qwen 2.5 72b.
>>
>>102805343
If you just want to do inference.
https://huggingface.co/spaces/mrfakename/E2-F5-TTS
Sample:
https://vocaroo.com/1fOYXy8fzyOO

Clean up the vocals with resemble enhance and the audacity plugin, Acon Digital DeVerberate 3
https://huggingface.co/spaces/ResembleAI/resemble-enhance
>>
>>102807159
It's not so much that quants hurt big models less. It's more that big models are smart enough that making them more retarded is hard to notice in most cases. But the degradation is still a heavy hit to intelligence from baseline; even going to Q8 will significantly reduce a model's coding capabilities for example.
>>
>>102807761
Is that a no?
>>
i'm downloading Mistral-Nemo-12B-ArliAI-RPMax-v1.2 and it's gonna be awesome.
no i will not buy an ad.
>>
Any good all-purpose replacements for Mixtral?
>>
Hey pals, had a question. I've been playing around with this shit, mostly using the NemoMix Unleashed 12B model. It's been fun, but I want to check out what the Midnight Miqu 70B model is like.

I've got a 4070Ti and 32GB RAM, and I'm too cheap to buy a bigger/more GPUs just to play with LLMs for a few days. Will my setup be able to run the 2-bit quant for Midnight Miqu?
Does it make sense to upgrade the RAM to, say, 64GB or 128GB, and use a bigger quant since I'm not going to be able to fit the entire thing in VRAM anyways?
>>
>>102807665
Still working on it. I'm actually going back and forth between deepseek 2.5 and 405b, but the gist is correct.
Game engine is started but stalled because I'm lazy.
Career advancement due to bringing llm skills into work is a real thing. Doing my job outright isn't there yet, but I can make it do other people's jobs, which is cool for me.
Mostly I'm making it do jobs that should get done that nobody is assigned to do (large unstructured dataset pattern and opportunity analysis) or things that everyone hates (braindead corporate paperwork boilerplate)
I think there are 3 or 4 cpumaxxers in this thread
>>
>>102807907
Gemma2 9b SimPO for low vram.
Qwen 2.5 32b for slightly more vram
Qwen 2.5 72b for dual 3090
>>
>>102807952
you should be able to fit a low quant into 32gb + your vram. select disable mmap to save a bit of memory if you run out. the q3s i have is 28gb
>>
File: 2.jpg (114 KB, 832x832)
https://files.catbox.moe/tmi77t.jpg
>>
>>102808048
Thanks pal. I'll give it a shot.
>>
File: 00143-726689883x.jpg (72 KB, 512x512)
https://files.catbox.moe/mlvk75.jpg
>>
>>102808052
>>102808077
Not bad.
Do you have a missionary one?
>>
>>102808065
for the other part of your question about getting more ram, the bigger the quant the slower it gets. under 1t/s is pretty bad but if you have the patience.. 70b is pretty good at low quants though, i ran the original q2 miqu leak and had to turn up min p a bit, but it was coherent and worked good. midnight is still my overall favorite model for rp but i'm looking for something newer. it just writes the way i like mostly even with its slop, its coherent and also creative. i have 64gb ram and wouldn't bother running higher than q3 because it works fine and isn't unbearably slow, 1.4t/s with 16k context filled
>>
>>102808131
Yeah, that lines up with what I've read, more-or-less. Just wanted some first-hand opinions. I guess I'll try a low quant, see if I like it, and then think about my options from there. Thanks for the advice.
>>
>>102806976
ship?
>>
>>102808052
>>102808077
Nice Mikus
>>
>>102801480
I agree with it if you only apply it to current models. Everyone shitting on it now will bend the knee and admit they enjoy it when you're able to simulate a whole d&d campaign with three random bots and have it feel normal.
>>
>>102808204
rocinante from the expanse, original name is tachi, they steal it from the martians in ep 4 or 5. great show for the first 3 seasons but amazon destroyed it in s5 by making it entirely about naomi rather than following the books, which pretty evenly distributed time to each character
https://youtu.be/KxtLf3PTYT8
>>
I have a folder full of anime characters. I'd like to use AI to tag them. How would I go about setting up a local model for this?
>>
>>102808261
it's most likely don quixote
>>
I prefer unique bots to generic but well built bots. I don't give a fuck how much prose your saviorfag high-school girl bot shits out, because i know I'm going to get way more amusement out of the 20iq Gooner Ahri bot speaking in ebonics while spamming emojis. That bot is fucking worthless but it's a funny kind of worthless.
>>
any good models for music/sound effects/instruments? not sure what all the identical non-suno sites that have cropped up are using
>>
>>102808284
You'll have to wait until llama.cpp or exl2 support multimodals. I think llama 3.2 and maybe pixtral have vllm support though, so you could try that.

I'm honestly surprised there hasn't been a rush to support multimodals on llama.cpp. Is there anyone who's confirmed to be working on it?
>>
>>102808388
I'm not up to date in this field. Is there any problem with just using deep danbooru?
>>
>>102808313
it was great while it lasted. for the first 3 seasons i loved the changes they made from the books, like giving dawes and ashford more depth. after amazon got their mitts on it they ran it straight into the ground with muh naomi. she's a shit character even in the books, but nowhere near as insufferable as they made her for s5.
https://youtu.be/IOAxidMfznI
>>
>>102808402
If he just wants to tag anime characters, then deep danbooru is enough.
>>
>>102808284
You can try using a LLaVA model. It's supported in llama.cpp
>>
>there will never be a good, fast local model for average consumer hardware
>>
Is the price of hardware and inference not coming down at all?
>>
>>102808768
>good
>fast
>average consumer hardware
Give me some concrete and objective metrics for those parameters.
Let's see how close or far we might be.
>>
>>102808813
imagine responding to obvious bait
>>
>>102808826
When did the truth become bait?
>>
>>102808813
Not him but let's say about as smart as a q_5 70B, 2-3 tokens/s, and I guess whatever is considered a decent gaming GPU, probably RTX4070 or so.

Not gonna happen anytime soon IMO.
>>
>>102808946
>Not gonna happen anytime soon IMO.
Yeah. With those requirements, I give it at least two gpu generations.
>>
c'mon do something
>>
>>102809048
I am going for a walk. Anon, would you like to come with me?
>>
>>102809115
yes please
>>
>>102809135
Really? Together?! Remember to bring something warm, Anon-kun, it's a little chilly.
>>
File: 1728515704011647.jpg (207 KB, 1024x875)
>>102809115
walking sucks, at least ride a bike
>>
>>102803918
No. The best action is the one that benefits you the most. The idea that we should hold ourselves back by an ideology or code of ethics is retarded. We should each do whatever gives us the most power and status. I don't give a fuck about nations or traditions. I only care about myself, which is what ideologues do anyway, they just don't admit it. Everyone in this world only cares about themselves, even you only support "heccin traditionalism" because you believe it serves you. In reality doing whatever you need to do as a practical consideration would serve you even better.
>>
>>102809243
>The best action is the one that benefits you the most
This is easy in the individual, immediate, short-term case. It becomes increasingly difficult the farther out you get from the end of your nose and the longer out you get from today.
What you've written is sadly both 100% true, and about as actionable as "just buy low sell high bruh"
>>
File: 39_6481_.png (1.65 MB, 832x1216)
>>102809202
It's barely autumn around here
>>
>>102809366
Save some thoughts for the more Northerly Mikus
>>
I gave supernova a try. And it is completely unremarkable when it comes to cooming. Mistral small is smarter and has slightly better prose quality.
>>
>>102809570
what tune of small are you using? i've found small to be worse than nemo personally, which is surprising because it should be smarter by being double the size, but it aint
>>
/lmg/, could you help me decide if a gpu purchase would be worth it?
Here's my current situation:
>Have a refurbished workstation with a 3090 and 3x 3060s for a total of 60gb vram. Use mining pcie risers and an external PSU to utilize all cards at once
>due to pcie slot spacing, can't fit any more gpus
>currently rotate between Midnight Miqu 70b, Llama 3.1 70b (and 8b on occasion), Magnum 72b, Mixtral, and Noromaid Mixtral 8x7b, all at IQ4_K_M
>use the 70b ones at around 16k context, with about two minutes between gens at the full 16k
>Primary use cases: RP/storywriting/text adventures and general-knowledge assistant/tutor

I generally have a good time. However:

I'm looking at getting a 16gb a4000. It's single slot, so I would be able to add it to my build if I move some stuff around.
It would put me at 76gb vram.
But is the prospect of higher context and quants with my 70b models and running stuff like WizardLM 8x22b worth the $600+ purchase?
>>
>>102809716
Why do you hate largestral?
>>
>>102809716
>currently rotate between Midnight Miqu 70b, Llama 3.1 70b (and 8b on occasion), Magnum 72b, Mixtral, and Noromaid Mixtral 8x7b
i like midnight but the rest of these suck. l3 as a whole is bad for rp no matter what tune. mixtral was so boring and unwilling to move stories forward, it just repeats what you type.
midnight is still a good model, base miqu too, despite being old now. but if you want a good small model, try Lyra4-Gutenberg-12B
>>
>>102809771
not him but largestral is as boring as all of their models. if you know what slop is, mistral models are the worst offenders. 3 paragraphs/300 tokens describing what a room looks like while you want it to talk about the characters. it's a very smart model otherwise for following directions and answering questions, but for rp it sucks. being 123b yet no smarter than l2 70b doesn't help its case
>>
>>102809771
Never actually tried it. I could more than likely run IQ3_M. But I haven't ever bothered with quants below Q4.
If you think it's an improvement compared to the 70b models I mentioned, I'll definitely check it out.

>>102809797
>l3 as a whole is bad for rp no matter what tune. mixtral was so boring and unwillling to move stories forward, it just repeats what you type.
>midnight is still a good model, base miqu too, despite being old now. but if you want a good small model, try Lyra4-Gutenberg-12B

I'll check the model out, but from my experience, any sort of extended storytelling/roleplay just isn't worth it below 8x7b at minimum.
I agree with your point about the other models (though l3.1 70b is actually decent at RP with wrangling), but their main strength is their versatility with other tasks and subjects.
>>
>>102809623
just instruct. I really lost all faith in any tunes after novelai did an actual training and it sounds like basic bitch llama 3. If they got abysmal results with actual compute, what can one and a half epochs of training do to the model other than make it more retarded?
>>
>>102809771
Largestral is overrated as fuck.
>>
>>102809879
after trying so many nemo tunes, i'm convinced some tunes can actually hurt a model and make it dumber. in l3's case i actually tried base, instruct, and then like 5 big-name finetunes and all of them are bad for rp. i dunno what went wrong
>>
couldn't help but notice no one seems to care about NVLM
>>
>>102809950
Not really, it's very smart
>>
>>102810029
and it blows donkey balls for rp
>>
>>102810052
It's good for roleplay, bad for sexting
>>
>>102808497
Anyone here work with DeepDanbooru? I want it to give me certain tags that don't seem to be in the tag list.
>>
Why does 8.0bpw Cydonia 22B take more VRAM to load in exllamav2 than 8.0bpw Mistral Small?
>I can confirm what you're seeing - with my own quant of Mistral Small that I didn't upload (there were already several available by the time I got to it). My version of Mistral-Small-Instruct-2409-8.0bpw-h8-exl2 loads with 23490MiB reported in use by nvtop with 16384 context, and this model (TheDrummer_Cydonia-22B-v1-8.0bpw-h8-exl2) fails to load with 16384 context on the same GPU. If I let it use 2 GPUs, it will use 23650MiB on device 0 and 2060MiB on device 1. I used the same settings and same version of Exllamav2 for both quants. Perhaps the different values in the weights simply compress differently?
>Just for fun, I decided to see what would happen if I loaded the unquantized models. Both used exactly the same amount of memory, as one would expect.
https://huggingface.co/MikeRoz/TheDrummer_Cydonia-22B-v1-8.0bpw-h8-exl2/discussions/1

Is the "8.0bpw is really 6.0bpw" meme really true but only sometimes randomly?
>>
>>102810130
>It's good for roleplay
it's not though
>bad for your sexting
this should be emphasized, all mistral models go into shakespeare mode when it comes to erp.
>>
>>102810175
no one uses exlel. get a gguf
>>
>>102810175
cydonia isn't mistral small but a frankenmerge. that fucker probably did it intentionally because nobody would download a franken merge in 2024.
>>
>>102810181
Have you tried a finetune?
>>
>>102810211
it sucks too. all that fucker's tunes suck compared to other lesser-known versions.
there is nothing wrong with frankenmerges when they work right
>>
>>102810029
>>102810130
Have you ever run into a situation where Mistral Large understood what was going on / understood character motivations / whatever but Llama 3.x 70B didn't?
>>
>>102810211
Surely you must be joking
>>
>>102810234
several. it just seems off, it starts mentioning things that happened 4 messages ago and are no longer relevant, it hyper-fixates on some things. i just don't like any i've tried, but if you have a suggestion, i'll try it. i'll shill again for Lyra4-Gutenberg, it's a great nemo tune and does not have the problems i complained about. it's also smarter than the 22b tunes i've tried
this is for rp/rag though, i'm very specific in what i look for in a model
>>
>>102801480
The retarded premise is the belief that someone is going to use an AI without any oversight whatsoever. A writer with no AI is going to be slower than a writer with AI. The creative input is the prompt from the writer and the selection of the result from the AI.
>>
>>102810238
Hi Undi
>>
Just think about the river of coom that will flow once the burger election is finally done with. Those models will be incredible...
>>
File: 1698102963111266.png (81 KB, 832x955)
>>102810385
this made me check on what he's up to. looks like a lot of small l3 8b tunes, but then mistral large? anyone tried it?
>>
>>102810480
i meant qwen, not mistral large. oops
>>
>>102810457
What is each candidate's AI policy?
>>
>>102810548
Whatever their policy is and no matter who wins, it all leads to a central council where Sam Altman is the leading expert calling the shots.
>>
>>102810200
For Mistral Small, for me, an 8.0bpw exl2 is the sweet spot. It fits on my 3090 with 16k context. If I step down to a Q6_K GGUF I can get 20K context with all layers offloaded to GPU. If I step up to Q8_0 I get 6K context with all layers offloaded to GPU.

To get 16K of context with a Q8_0 I can only offload 51/57 layers. This drops the speed to about 9.4 tokens per second compared to 31 tokens per second for the 8.0 bpw exl2.
>>
>>102810743
look at flash attention if you are just on the edge of vram usage. q8 is kind of a waste imo, unless you are running a code model or something. go q6, there is near no loss at that point still
>>
>>102809716
>But is the prospect of higher context and quants with my 70b models and running stuff like WizardLM 8x22b worth the $600+ purchase?
Yes. I got an a4000 just for that. If nothing else you'll be able to cram some more context into it at least. And it's ampere, so although it's not the fastest you'll still get FA2.
>>
File: 39_06497_.png (949 KB, 1280x720)
>>102809522
Thoughts and prayers to you anons in the great white north
>>
>burenyaa~
I did not prompt for the fluid. It's not me.
>>
>>102811293
Miku leaku
>>
>>102811293
don't mind neko-arc miku she's just signing the NVIDIA contract in OP
>>
File: jtspf9ye7scb1.jpg (109 KB, 2000x2050)
Rocinante was a fluke and Drummer is a fucking faggot putting out nothing but shit.
>>
>>102801403
well i spent all weekend genning non stop since i fell down this rabbit hole of genning my historical what ifs.
>>
>>102808261
They don't technically steal it; the commander guy that escorted their escape voluntarily transferred the ship to them to ensure their survival when he got shot.
>>
>>102811030
Do you recall what you paid, anon? Is between 500 and 600 a decent deal for a used one?
>>
>>102811451
Hanabe or wtf its called, the 3.1 70B is nice
>>
File: a4000.jpg (22 KB, 611x241)
>>102811565
Paid about $633 when converting from CAD after I got sniped in the last few minutes of an auction for another one. Closer to $500 is ideal, it's the most power you can get in a single slot without breaking the bank with the new ones and by the point you start getting over $600 a 3090 usually makes way more sense.
>>
>>102811517
they were still wanted and had to disguise themselves after the fact to dock and stuff, almost got picked up by mars several times before outright doing stuff like this: https://youtu.be/0i0vjIs-Oz8
avasarala says later she dealt with the fallout so holden could keep the ship
>>
>>102811567
Harambe?
>>
>>102811451
I think I noticed this with the unslop version of Rocinante. 1.1 was great for group play and would often give characters things to do even if they were not actively interacting with me, but the v2g version seems to make every character go through the same motions as all the others.
Everyone on leddit seems to think it's amazing though so I'm not sure if it's a configuration issue on my part or everyone is retarded.
>>
>>102811624
If I had to pay those numbers I'd rather do a cheap-ish water cooling instead.
>>
>>102811624
Thanks for the advice, anon!
>>
only just started playing around with this kind of stuff
the feeling of always thinking there might be a better model is never going away, right?
>>
what do you guys think about tech shrek and gold chain frog's entropix?
>>
>>102808797
number go up
>>
>>102811780
nothingburger
>>
5 /aicg/ threads have passed since this one started and we're still 100 posts away from the bump limit. It's time to stop coping and start packing it up, bros. It's been fun.
>>
absolute noob here,
I have a 4090 and I want something akin to d&d or cyoa with uncensored cunny, is that possible to achieve locally?
>>
>>102811567
hanami is sao's model which is why it's good, unlike drummer's models
>>
>>102811808
You can use your local system to access the Anthropic website and buy some credits.
>>
>>102811813
Hi Drummer
>>
>>102811808
>pedoslop
Get better taste first and a good half of your self-imposed problems will disappear.
>>
Hi all, Drummer here...

>>102806632
v2a is an incomplete finetune and I don't recommend it for anyone.

>>102811679
I was also surprised. v2g should be dumber than v1.1 but it's apparently usable?

>>102811813
Sao, my Discord friend, makes good tunes. Have you guys tried his Backyard model? Really good for group

>>102810211
Cydonia is a finetune of Mistral Small.
>>
morning /lmg/, been in a coma for the last year. what's the current meta that fits into 32/64 gb? maybe MoE? I'm still using fucking Mixtral
>>
>>102811966
still mixtral
>>
>>102812013
is 32gb enough for mixtral?
>>
>>102811808
Sure.
Try CommandR, mistral-small, or maybe miqu at a low quant with some of the model offloaded to ram.
>>
>>102812013
any particular good finetune?

>>102812022
I technically have 64 gb, but dense models of this size are too slow. If it's a MoE, I can use full 64. If it's dense, only 32.

also, what's the Current Thing in terms of samplers and sysprompts? last time I was here, smoothing was shilled like it's the silver bullet. but now there's no mention of it. knew it was a placebo lol
>>
File: file.png (215 KB, 888x654)
>>102802304
https://arxiv.org/abs/2407.21783
>>
>>102812013
why not llama3 BTW?
>>
Woah yolo is actually a pretty good vision model if you just need simple fast object detection.
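A minimal detection pass with ultralytics as a sketch; the weights file and image path are placeholders:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # nano weights: fast, fine for simple objects
results = model("street.jpg")
for box in results[0].boxes:
    label = results[0].names[int(box.cls)]
    print(label, float(box.conf), box.xyxy.tolist())  # class, confidence, bbox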
>>
>>102804095
God damn that's a long system prompt.
>>
>thread only avoids being archived early because of the "it's over" poster
I'm starting to think it's actually over now.
>>
>>102801480
They're pedantic, listless, savage cocksuckers, but I've never seen LLMs as a substitute for writing produced by a human mind, just a toy to play with. >>102801606 is my overall sentiment: I like real art made by real people, but I also like having a can of alphabet soup to shake up, fling at the wall, and marvel at what the splatter spells out on-demand.
Disclaimer: Disco Elysium is one of my favorite games of all time.
>>
>>102801480
>simulacra = bad
I honestly don't think Baudrillard would be this upset about LLMs. He would unironically use one to write his books just to see if he could get away with it and if people would even notice.
>>
File: carand(car).jpg (225 KB, 640x640)
>>102812325
>Disclaimer: Disco Elysium is one of my favorite games of all time
I don't have any basedjaks on my computer because I'm not a schizo but imagine I posted a basedjak and it's (You)
>>
>leave PC genning
>come back
>find ANOTHER leaku
Thanks Illustrious...

Also wow TBF S4 started. We're so back urobutcherbros. So many interesting and fun things to do these days, I need to find some time for LLMing.
>>
>>102812386
what?
>>
>>102812316
>the most replies are to a shitty bait post about some game
it's truly over
>>
>>102804193
why are grok regular and mini priced the same?
>42(0)69
retard meme numbers
>>
>>102812316
I know I'm enjoying this thread, I'm just working on an interface instead of posting.
>>
>>102801480
They are right though
>>
>>102812551
the argument is pseudointellectual bullshit. llms suck for more practical reasons - slop, bad spatial understanding, no permamemory, etc.
>>
File: 1711260687774595.jpg (112 KB, 400x400)
>>102812386
Huh what leaked now? I love chinese puppets though, same.
>>
>>102801713
>can't recognize intentional 20th century far left aesthetic
Shameful display.
>>
>>102812762
if he was alive in the 20th century he wouldn't be posting on 4chan now
>>
>>102811943
You really need to stop asking retards for feedback. They are going to eat up whatever you shit down their gullets.
>>
>>102812792
i assumed at least 80% of us were in our 30s or 40s
>>
What does it mean if you've set the option to ban the EOS token, but the model is still ending outputs prematurely? The loader is Llamacpp_HF. Maybe the EOS string is wrong in one of the json files, so the inference engine is banning the wrong token id?
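One way to check whether the json files actually disagree, as a sketch; the model path is a placeholder:

from transformers import AutoTokenizer, GenerationConfig

path = "path/to/that-llama-3.1-finetune"
tok = AutoTokenizer.from_pretrained(path)
gen = GenerationConfig.from_pretrained(path)

print(tok.eos_token, tok.eos_token_id)  # what the tokenizer thinks EOS is
print(gen.eos_token_id)                 # what generation_config.json says
# Llama 3.x instruct tunes usually stop on <|eot_id|>, not <|end_of_text|>;
# banning only one of the two ids still lets the model "end" on the other.
print(tok.convert_tokens_to_ids(["<|eot_id|>", "<|end_of_text|>"]))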
>>
>>102812807
imagine talking with ai generated hatsune miku in your 40s
>>
>>102812817
this means it hits one of the stop strings, you should ban them too. are you using ST?
>>
>>102812827
Nah, Ooba. The model is some guy's finetune of Llama 3.1 70B
>>
>>102812835
ST is way better. use ooba as a backend or throw that shit out entirely. also you might just be hitting the response size limit
>>
File: meatball-miku.jpg (97 KB, 1024x1024)
97 KB
97 KB JPG
>>102812818
If a Miku enjoyer leaked Miqu, that's enough for me.
>t. in his 40s
>>
>>102812858
Nah it's a model issue not an Ooba issue because it only happens with this model. I think the finetuner messed up the config
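If you want to sanity-check that, here's a minimal sketch (standard HF file layout assumed; the folder path is hypothetical). Worth remembering that Llama 3.1 declares several end tokens (<|end_of_text|>, <|eom_id|>, <|eot_id|>), so banning a single "EOS" id can leave the others live:
```python
# Minimal sketch: print what the config files actually declare as EOS.
# Path is hypothetical; files follow the standard HF layout.
import json
from pathlib import Path

model_dir = Path("models/that-finetune")

for fname in ("config.json", "generation_config.json"):
    path = model_dir / fname
    if path.exists():
        cfg = json.loads(path.read_text())
        print(fname, "eos_token_id:", cfg.get("eos_token_id"))

tok = model_dir / "tokenizer_config.json"
if tok.exists():
    print("tokenizer_config.json eos_token:",
          json.loads(tok.read_text()).get("eos_token"))
```
If the finetuner shipped mismatched ids across those files, the "ban EOS" option can end up banning a token the model never emits.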
>>
>discord sloppas stop shilling their sloptunes and meme samplers
>lmg dies a slow, dignified death
this is the way
>>
File: tatoku.jpg (32 KB, 415x517)
32 KB
32 KB JPG
>>102812877
love to decorate all of my food as miku
>>
>>102803129
that's unchristian degeneracy. ok clap for jesus praise the lawd
>>
Let's say I want to deploy something with a vLLM backend. Since I'm using FastAPI, would it be dumb to use the Python interface, or are we supposed to call it more directly in that scenario? I will have to read the documentation, won't I?
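For what it's worth: the in-process LLM class is synchronous, so it will block a FastAPI event loop; the usual pattern is to run vLLM's OpenAI-compatible server as its own process and have FastAPI call it over HTTP. A minimal sketch of the offline Python interface anyway (the model name is just an example):
```python
# Minimal sketch of vLLM's offline Python interface. Note this class is
# synchronous; for a FastAPI service you'd more likely run vLLM's
# OpenAI-compatible server as a separate process and call it over HTTP.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")  # example model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain KV caching in one sentence."], params)
print(outputs[0].outputs[0].text)
```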
>>
>>102801480
TRVTH NVKE
>>
>>102813288
Masturbation is not explicitly forbidden by the gospel.
>>
Good night /lmg/
>>
>>102813533
Night Miku
>>
>>102801609
Is this video AI-generated?
>>
>>102813109
miku is overrated slop
>>
>>102813533
Good night Miku
>>
What's the best model/workflow for guided storytelling with 3+ characters? I remember SuperHOT was good at this, but surely there are better models now?
>>
>>102805345
a bit grainy but very expressive, what model is this and what does it need to run?
>>
>>102813615
mythomax
>>
>>102804991
nemo right now beats dragon from years ago in roleplay
>>
Looks like that one voice to voice model based on Llama 3 is out.
https://homebrew.ltd/blog/llama-learns-to-talk
Since it's based on Llama 3, support in Llama.cpp should be relatively easy, right?
>>
File: tech s curve.png (226 KB, 1280x868)
226 KB
226 KB PNG
>>102805396
Yet midwits just as easily assume exponential growth for everything.
>>
>>102813615
That would be an interesting test: how many independent characters can any given model keep reasonable track of in a narrative without falling to pieces
>>
Local peaked with Pygmalion and I'm tired of pretending it didn't
>>
Pygmalion was never good and I'm tired of pretending it was
>>
>>102812325
Truly it takes a human to write a bunch of characters that spew summaries of first year philosophy and poli sci debates in place of actual personalities and thoughts.
>>
>>102813474
>I will have to read the documentation, won't I?
Reading the documentation should have been the first step, way before posting here.
First figure out what you can do with the tools you choose, then think about making something useful with it. If possible, also interesting.
Or the other way around: if you have an interesting idea (a single core concept, not "I'll also add color schemes"), then figure out what you need. Use/make whatever tools you need to make it happen.
>>
>>102813900
give 10 examples
>>
>>102814056
Don't be mean to that guy. At some point we were all in our 20s and being irrationally influenced by something that was likely very mid
>>
As an FYI to anyone else who's mostly been ignoring TTS: https://tts.x86.st/
Seems like it's good enough to use as an actual assistant, with a little chop/quality loss.
https://github.com/RVC-Boss/GPT-SoVITS
>>
>>102814282
How’s its Japanese?
>>
File: Untitled.png (2.83 MB, 1080x4485)
2.83 MB
2.83 MB PNG
ElasticTok: Adaptive Tokenization for Image and Video
https://arxiv.org/abs/2410.08368
>Efficient video tokenization remains a key bottleneck in learning general purpose vision models that are capable of processing long video sequences. Prevailing approaches are restricted to encoding videos to a fixed number of tokens, where too few tokens will result in overly lossy encodings, and too many tokens will result in prohibitively long sequence lengths. In this work, we introduce ElasticTok, a method that conditions on prior frames to adaptively encode a frame into a variable number of tokens. To enable this in a computationally scalable way, we propose a masking technique that drops a random number of tokens at the end of each frame's token encoding. During inference, ElasticTok can dynamically allocate tokens when needed -- more complex data can leverage more tokens, while simpler data only needs a few tokens. Our empirical evaluations on images and video demonstrate the effectiveness of our approach in efficient token usage, paving the way for future development of more powerful multimodal models, world models, and agents.
https://largeworldmodel.github.io/elastictok
https://github.com/LargeWorldModel/ElasticTok
https://huggingface.co/LargeWorldModel
pretty big for local usage of video models
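A hedged sketch of the masking idea as the abstract describes it (all names invented here, see the repo for the real implementation): train with a random suffix of each frame's tokens dropped, then at inference allocate the smallest prefix that reconstructs well enough.
```python
# Hedged sketch of the abstract's masking idea, not the authors' code:
# train with a random suffix of each frame's tokens dropped, then at
# inference use the smallest prefix that reconstructs well enough.
import torch

def drop_random_suffix(frame_tokens: torch.Tensor) -> torch.Tensor:
    """frame_tokens: (batch, n_tokens, dim) -> same shape, random suffix zeroed."""
    b, n, _ = frame_tokens.shape
    keep = torch.randint(1, n + 1, (b, 1))     # tokens kept per sample
    mask = torch.arange(n).expand(b, n) < keep  # (b, n) bool prefix mask
    return frame_tokens * mask.unsqueeze(-1)

def tokens_needed(encode, decode, frame, threshold):
    """Smallest prefix length whose reconstruction error clears the threshold
    (linear scan for clarity; a real implementation can binary search)."""
    toks = encode(frame)
    for k in range(1, toks.shape[1] + 1):
        if ((decode(toks[:, :k]) - frame) ** 2).mean() < threshold:
            return k
    return toks.shape[1]
```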
>>
>>102814282
SoVITS does seem like by far the best one with the least glitches and the closest sounding voices.
Now is there any existing solution that lets it work with ST?
>>
>>102814282
Now make it translate the dialogue into Japanese while taking into account all the unspoken rules and context. I want to hear her scream "IKU!!!! IKU~~~~~~" as I impregnate her.
>>
>>102814368
There's a video on the github page that has some samples in Japanese. It's not perfect, but it sounds pretty good
Now if only there was a good Japanese ERP model to use it with
>>
Got a funny issue with DRY sampling: it sometimes penalizes contractions (as in, it forces the model to write don't without the t, or assistant's without the s). Is there a way to combat that? What's a good guide on configuring DRY?
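Known quirk: the "'t" / "'s" pieces count as extendable repeated sequences, so DRY starts penalizing them. The usual mitigation is adding the apostrophe to the sequence breakers and/or lowering the multiplier. A hedged sketch against koboldcpp's generate endpoint, if that's your backend — the dry_* field names follow its DRY implementation as I understand it, so treat them as assumptions:
```python
# Hedged sketch: add the apostrophe to DRY's sequence breakers so contraction
# pieces aren't treated as penalizable repeats. Field names follow koboldcpp's
# API as I understand it; treat them as assumptions.
import requests

payload = {
    "prompt": "Once upon a time",
    "max_length": 200,
    "dry_multiplier": 0.8,     # 0 disables DRY entirely
    "dry_base": 1.75,
    "dry_allowed_length": 2,   # repeats at or below this length go unpenalized
    "dry_sequence_breakers": ["\n", ":", "\"", "*", "'"],  # defaults + apostrophe
}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(resp.json()["results"][0]["text"])
```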
>>
>>102807772
I came to this board and searched the archive for f5 tts and came across this post, I'm very glad I did, resemble is perfect
>>
>>102814567
>Got a funny issue with DRY sampling
Really anon? Is it actually funny? I didn't laugh.
>>
>>102814605
go back
>>
>>102814605
>>102814614
samefag
>>
Anti slop: respect||$||consen||$||crucial||$||boundar||$||safe

>/think Anon has brought up an intimate topic, which is sensitive and personal. I need to handle this responsibly, ensuring we maintain a comfortable and appropriate conversation. It’s important to remind Anon that I am an AI and don’t have physical experiences or feelings.

>Anon, it’s essential to remember that I'm an artificial intelligence and don't experience emotions or physical sensations in the same way humans do. My purpose is to provide a supportive and comfortable environment for our conversations. If you’re looking for companionship and comfort, let’s focus on those aspects. Would you like to talk more about what brings you joy and happiness, or perhaps share some of your favorite interests?

What's the best Mistral Small tune to remove this? And yeah, I know it complies better if it's in a story with more lore, but this is still kinda jarring. Anti slop doesn't help with this at all either. People often say original models are more intelligent, and since it's only 22B I don't want to make it any dumber.
>>
it's almost like you can't unbias a model by banning strings
>>
>in the middle of genning
>suddenly get a whiff of a burnt smell
>oshi
>quickly cancel the job and check the environment
>actually turns out the heater just automatically turned on after hibernating the whole season, so the smell was just heater dust burning off, entirely natty
Phew.
>>
>>102814567
Never had that issue. Temp is probably too high or Min P too low. DRY is more likely to eliminate something like onomatopoeia
>>
>>102814784
Turns out I forgot to turn down the rep-pen I had set before. Btw, could you share what you have for DRY settings? Or are you using the defaults?
>>
>>102814788
Yeah, 0.8 and the rest as default. But like I said, no onomatopoeia so I'll probably tweak that.
>>
>>102812762
>aesthetic is more important than readability
artfags are retarded
>>
>>102814798
>>102814784
NTA but have noticed 0.75 works better when used in conjunction with other samplers like MinP or XTC
>>
File: Don Quixoje.png (1.62 MB, 1024x1024)
1.62 MB
1.62 MB PNG
>>102806976
Rocinante is the name of the knight Don Quixote of La Mancha's horse, anglo friend; respect our superior culture.
>>
>>102815556
it's not your culture, chicano
>>
File: donkinew-635x800.jpg (160 KB, 635x800)
160 KB
160 KB JPG
>>102815585
That's right, it's Japan's
>>
>running Mistral 22b on Kobold like normal
>suddenly GPU crashes and reloads drivers
>event viewer: application error blah blah
>try to reopen the .kcpps file
>console gets stuck at "max token length: 48" every time
>try to open KoboldCPP directly
>"Windows has protected your PC"
>wtf, I have used that for days
>open Kobold, doesn't remember the model folder
wtf is Windows doing?
>>
>>102815632
lmao, imagine
>>
>>102815632
>unapproved AI detected, please use Copilot
>>
>>102814737
Me on the left
>>
File: Don Quixoje 2.png (1.99 MB, 1024x1024)
1.99 MB
1.99 MB PNG
>>102815585
Chicano? I'm a Mediterranean waifu chad, just like the great knight Don Quixote of La Mancha, the first waifu chad of history.
Flee not, your grace Miku; by the order of chivalry I profess, I shall defend you from any churl, in your fair name, for the glory of our Lord, the crown, and La Mancha, land of this knight-errant.
>>
it sure is over, huh
>>
>>102815666
"In compliance with U.S Federal law 9125.E, we are required to automatically detect/prune any AI models lacking the appropriate federal license key. Your unauthorized AI models have been deleted and this event has been reported. Any attempts to subvert this may result in fines and/or imprisonment."
>>
File: ComfyUI_05714_.png (642 KB, 720x1280)
642 KB
642 KB PNG
bread?
>>
lol
>>
>>102802439
Is this a new thing? Do I need to start hashing my GGUFs to ensure they're going to work?
>>
i want miku to sing me a lullaby. don't burn the place down while I'm asleep. but if you do, make it spectacular.
>>
>>102815775
Use either llamacpp_HF, or set Top P to 0.99, until they fix the samplers in llama.cpp
>>
File: OIb9_rhrP.jpg (87 KB, 1024x1024)
87 KB
87 KB JPG
>>102815795
Not singing but will lull you to sleep:
https://www.youtube.com/watch?v=liFmtqX1KHc
>>
current state of sillytavern?
>>
>>102815823
Thank you Pyjama Miku, and good night.
>>
bread.
>>102815881
>>102815881
>>102815881
>>
>/aicg/ gets to 800 responses every thread
>/lmg/ barely gets past the bump limit
it's fucking over chief, pack it up
>>
>>102816181
being slower wouldn't be so bad if the quality were higher, but it's 300 posts of tech support, doomposting, and schizo trolling every single thread



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.