/g/ - Technology


Thread archived.
File: 1716468367836474.jpg (609 KB, 2279x3056)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103196822 & >>103189328

►News
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family/
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
>>103207054
>yuri OP picture
a man of taste I see
>>
>a man
anon....
>>
My dual 3090s used to go above 70 celsius when running inference because of the tiny clearance between them. But ever since I started using exl2 tensor parallelism, they don't go above 65 anymore. I thought the temperature would peak higher with both in use at the same time; guess I was wrong
>>
In SillyTavern, when "request token probabilities" is on, is there a way to see the probability of the </s> token?
>>
►Recent Highlights from the Previous Thread: >>103196822

--LLMs and the impact of internet data curation:
>103199200 >103199258 >103200577 >103200701 >103200737 >103201407 >103200924 >103200959
--Anon shares research on "Story Distiller" project for extracting narrative arcs from media:
>103199457
--Local vs cloud models, censorship, and performance:
>103205782 >103205804 >103205887 >103205909 >103205948
--Limitations of RAG models in creativity and knowledge breadth:
>103197638 >103197653 >103197683 >103197713 >103197705
--KoboldCpp connection issue and alternative setup suggestions:
>103203278 >103203371 >103203372 >103203402 >103203461 >103203509 >103203534 >103203661
--Google's new model is highly ranked, but human interaction is key to understanding its capabilities:
>103201950 >103202013 >103202460
--Discussion on limitations and potential of machine learning models:
>103196891 >103196959 >103196992 >103197014 >103197062 >103197058 >103197103 >103197169
--Discussion of Mistral Large's performance and benchmark results:
>103200229 >103200559
--Discussion of Linux performance and Intel Xeon processors:
>103199596 >103199893 >103200136 >103200166 >103200249
--Chatlog analysis and model performance discussion:
>103196996 >103197008 >103197151 >103197744
--Anons discuss limitations of local models:
>103206907 >103206938 >103206948 >103206967
--Anon discusses issues with AI-generated smut and potential solutions:
>103201902 >103201986 >103202021 >103202093 >103202112 >103202131 >103202025 >103203014
--Anon discusses Elon Musk's lawsuit against OpenAI and Microsoft:
>103201850 >103201868 >103202716 >103202788
--Anon discusses ChatGPT's biased responses and potential web contamination:
>103201700 >103201822 >103201877 >103202266 >103202606 >103202826
--Miku (free space):
>103197781 >103202159 >103202861 >103207058

►Recent Highlight Posts from the Previous Thread: >>103197228

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
>>103207054
Millions mikus!
>>
File: 1726707066877894.jpg (149 KB, 500x500)
yaayyyyy posting in the miku general!
>>
>>103188780
>>103188780
>>103188780
actual thread. keep posting there if you want to see the OP spammer hang himself.
>>
File: 1705888057069459.jpg (490 KB, 1024x1024)
mikubros... i don't feel so good...
>>
Is speculative decoding a thing that people are actually using?
>>
>>103207216
SillyTavern only displays what it receives from the API, so it depends on what backend you're using
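Incidentally, if you want to dig it out of the backend yourself: here's a rough sketch of pulling the EOS probability from a llama.cpp-server-style response. The field names (completion_probabilities, probs, tok_str) follow the old llama.cpp /completion schema and may differ on your backend, so treat them as an assumption.

```python
import json

def eos_probability(response_json, eos="</s>"):
    """Probability the backend assigned to the EOS token at the last
    generated position, or None if it wasn't in the reported top-k."""
    data = json.loads(response_json)
    positions = data.get("completion_probabilities", [])
    if not positions:
        return None
    for cand in positions[-1].get("probs", []):
        if cand.get("tok_str") == eos:
            return cand.get("prob")
    return None

# Toy response in the assumed schema:
sample = json.dumps({
    "completion_probabilities": [
        {"probs": [{"tok_str": ".", "prob": 0.61},
                   {"tok_str": "</s>", "prob": 0.33}]}
    ]
})
print(eos_probability(sample))  # 0.33
```

If the token never shows up, bump the top-k probability count in your backend settings so </s> makes the list.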
>>
koboldcpp 1.78 is out!!
>>
>>103208298
>NEW: Added support for Flux and Stable Diffusion 3.5 models:
Neat.
>>
File: 1524519622654.png (111 KB, 346x297)
Can someone explain to me why Triple Baka of all things has suddenly gotten popular again? What made people remember that song?
>>
>>103208298
>memebold
why would i use this instead of ollama?
>>
>>103208324
these days, 99.9% of the time something old is suddenly popular again, it's because twitter dug up something to ruin it somehow
>>
>>103208298
Is there even any reason to update? I'm still on some version from mid summer.
>>
>Today we’re thrilled to introduce Ultravox v0.4.1, a family of open speech models trained specifically for enabling real-time conversation with LLMs. No ASR step required.
>Ultravox's speech understanding is the best in open-source and is quickly approaching the quality of GPT-4o
https://huggingface.co/fixie-ai/ultravox-v0_4_1-llama-3_1-8b
https://www.ultravox.ai/blog/ultravox-an-open-weight-alternative-to-gpt-4o-realtime
Is it as good as the video?
>>
>>103208324
>Can someone explain to me why Triple Baka of all things has suddenly gotten popular again
It is?
>>
>>103208324
Maybe tiktok? There is a number of similar cases in there.
>>
>>103208414
I'm so tired of trying every new thing just to be disappointed. I will wait for others to do it instead.
>>
>>103208414
>For now, Ultravox continues to output text, but future versions of the model will emit speech directly. We’ve chosen to focus on the speech understanding problem first, as we think it’s the biggest barrier to natural-feeling interactions.
No.
>>
>>103208414
>The input to the model is given as a text prompt with a special <|audio|> pseudo-token, and the model processor will replace this magic token with embeddings derived from the input audio. Using the merged embeddings as input, the model will then generate output text as usual.
They're just converting the speech to embeddings and appending it to the prompt. Hacks like this are never good.
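What that paragraph describes, as a toy sketch (all names here are made up for illustration; Ultravox's real processor and tokenizer differ). The point is just the splice: one placeholder token becomes a run of audio-derived vectors.

```python
AUDIO_TOKEN = "<|audio|>"

def merge_embeddings(tokens, embed, audio_embeds):
    """Replace the <|audio|> pseudo-token with audio embedding frames."""
    merged = []
    for tok in tokens:
        if tok == AUDIO_TOKEN:
            merged.extend(audio_embeds)  # splice in the audio run
        else:
            merged.append(embed[tok])
    return merged

embed = {"Transcribe": [0.1, 0.2], ":": [0.3, 0.4]}  # toy text embeddings
audio = [[9.0, 9.0], [8.0, 8.0], [7.0, 7.0]]         # 3 "audio frames"
out = merge_embeddings(["Transcribe", ":", AUDIO_TOKEN], embed, audio)
print(len(out))  # 5: two text embeddings + three audio frames
```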
>>
>>103207626
it can't be the real thread if nobody wants to use it
>>
>>103208521
are local models always a disappointment?
>>
>>103208552
Because no one would spend money on making an actual good model just to release it to the public for free.
>>
>>103208552
Local models from randos usually are. Just got to sit tight and wait for Llama 4 and Qwen 3.
>>
>>103208620
>Because no one would spend money on making an actual good model just to release it to the public for free.
the only exceptions to that are Flux dev and mochi, but yeah, usually when a company has a good model they keep it to themselves, and that's fair. who wouldn't do that?
>>
>>103207054
real thread here btw: >>103188780
>>
Is there anything better than Magnum v4 for ERP yet?
>>
File: 1700036472698.jpg (204 KB, 1024x1024)
>>
>>103208645
>who wouldn't do that?
Jesus
>>
>>103208962
too bad that Jesus is bad at machine learning :(
>>
>>103208980
>Son of god
>Almighty
>Can't into training LLMs
How?
>>
>>103208727
Rocinante v2g
>>
>>103209056
samplers?
>>
>>103207191
ha fuckin noob my 3090 is always above 80 celsius
>>
My time with local models has become a lot better ever since I stopped using cards from chub unless they are very high-quality. 99.9999% of user-made cards are utter trash like wiki copy-pastes or AI-generated with the slop already baked in.
You're never going to get a good experience with a card that has shit like "{{char}} has a striking, ethereal appearance" already baked into the definitions.
>>
>>103209035
He can't afford to pay the green jew's tax.
>>
>>103209124
Time to crack that boy open and swap the thermal pads.
>>
>>103209131
where do you get your cards then
>>
>>103209153
do you have a brain, anon?
>>
>>103209194
https://pygmalion.chat/explore this?
>>
>>103209206
User-made cards are all the same. If you want quality, you make your cards yourself.
>>
>>103209227
do you have a website where you upload your user made cards
>>
>>103209206
That is not a brain, anon.
>>
>>103209239
why are you lying upload them
>>
>>103209227
But then you can't use them? Or they become user-made and same. A dilemma.
>>
>>103209233
pastebin.com
>>
File: not cards.png (27 KB, 239x359)
>>103209267
these are not cards
>>
>>103209246
My brains? I cannot upload them. I need them.
>>
>>103209275
you dont have them that is why you cant upload them
>>
upload the user made cards anon
>>
File: 1000050685.jpg (334 KB, 2832x2112)
hey anons, i know this is a LOCAL thread but I need to cum and I don't have my pc with me atm. Is there an api or a service of some sort where I can use an uncensored model with ST
>>
>>103209428
You know you can browse porn without having to generate it right?
>>
>>103209428
openrouter
>>
>>103209448
and the uncensored model would beeee?
>>
>>103209466
Sonnet 3.5
>>
>>103209471
very funny, we all had a nice chuckle but yeah I really need the name of the uncensored model
>>
>>103209485
Sonnet 3.5 is the best uncensored model.
>>
>>103209428
wrong thread
>>
>>103209485
Any model is uncensored if you prefill the reply
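For anyone new, "prefill" just means ending the prompt mid-assistant-turn so the model has no room to open with a refusal. Minimal sketch assuming a ChatML-style template; swap in whatever template your model actually uses.

```python
def build_prompt(user_msg, prefill="Sure, here you go:"):
    return (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n" + prefill  # no end tag: model continues this
    )

p = build_prompt("write me something spicy")
print(p.endswith("Sure, here you go:"))  # True
```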
>>
Any model is uncensored if you write the reply yourself
>>
File: Untitled.png (44 KB, 967x636)
>>103209428
you should make a remote tunnel next time you're out and about
>>
>>103209428
If you mean the models we use locally? There's a google colab notebook in the koboldcpp repository.
>>
Looks like new mistral large will end up being multimodal. Pray they release the weights.

https://www.testingcatalog.com/mistral-ai-is-gearing-to-launch-multimodal-large-2-1-with/
>>
File deleted.
>>
Do you think local will ever get something close to o1? Seems like OpenAI has the secret sauce but surely they can't keep it under wraps forever, right?
>>
>>103209718
Oops wrong one.
>>
>>103209718
i miss the kamen baker
>>
>>103209724
sonnet 3.5 is better than o1
>>
>>103209727
I guess I spoiled it, I was going to make a post about that after cherry picking through my gens a bit.
>>
>>103209735
I'm not talking about benchmarks, I'm talking about CoT and all the other features.
>>
>Kamen Rider Duet
>"Now, let the music cleanse you of your sins."
>>
File: GcjF0DhW4AAid6_.jpg (288 KB, 1460x690)
Will OpenAI be forced to release their models as open source as they signed under Delaware corporation certificate?
>>
>>103210135
>when applicable
no
>>
>>103210135
>The resulting technology will benefit the public and the corporation will seek to open source technology for the public benefit when applicable.
>for the public benefit when applicable
lol
Well uhhh.. umm.. open sourcing the models won't benefit the public because they're too dangerous and harmful if access is unmoderated. The negative consequences will outweigh the positives. Our models can only benefit the public if usage is controlled and kept safe by our corporation.
>>
>>103210192
>The resulting technology will benefit the public and the corporation will seek to open source technology for the public benefit when applicable.
Going private and giving Sam 10 Billion worth of equity is almost the same thing.
>>
>>103210135
There's a case to be made where Sam/Greg become criminally liable and Microsoft become liable for this fraud.
>>
>>103207054
Is there a repository of all of the Nala tests? I need them to choose the optimal model for my degenerate fetishes.
>>
>>103210273
Mistral Large >> Qwen2.5 > Mistral Small > Mistral Nemo
Try the finetunes as you will.
>>
>>103210292
>sleeping on Nemotron's pure SOUL
>>
>>103210292
Mistral large just doesn't work well for me. Been using Q5 of Midnight Miqu 1.5 for ages now...
>>
>>103210313
Try https://huggingface.co/MarsupialAI/Monstral-123B
>>
Was /lmg/ of yesteryear better then current /lmg/?
>>
>>103210370
Yes, it's no comparison. Current day /lmg/ is basically useless aside from acting as tech support for braindead locust. It's nothing compared to the pre-llama2 /lmg/
>>
>>103210370
Nah, it was always bad. But at least the lmg of old used to have more worthwhile posters like ooba and henk.
>>
>>103210370
It's gotten rather bad in the last 2-3 months. A lot of schizos from aicg transferred over here when the proxies dried up.
>>
>>103210192
Bit low quality of a jpg you have there.
>>
>>103210370
Old /lmg/ had to cope with 2k context...
>>
>>103210370
lmg peaked with superhot and it's been all downhill from there
>>
>>103210370
It was more fun at the very least. New things, unknown rate of improvement. The frequent discoveries, discussing them, thinking of and implementing novel ideas and the excitement that ensued (context extension by kaiokendev for example, wao!! suddenly we had 4x the context instead of being stuck with 2k, and it wasn't unusable like bluemoon or other shit models). Designing prompts and stuff to try and tard wrangle the models into doing interesting things that weren't common knowledge.
Somewhat frequent major optimizations that made a big difference to inference speed, memory usage, and quantization accuracy. Trying to make things run at all on different and older hardware too.
Nowadays everything just werks, kinda same-y, and the limits are known. Maturity.

>>103210503
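The kaiokendev context extension mentioned above boils down to squashing the RoPE position index by a scale factor so positions past the trained window land back inside it. Minimal sketch of that linear ("SuperHOT"-style) interpolation, not any engine's actual code:

```python
def rope_angles(pos, dim, base=10000.0, scale=1.0):
    """Rotary angles for one position; scale=0.25 makes a 2k-trained
    window cover roughly 8k positions."""
    p = pos * scale
    return [p / base ** (2 * i / dim) for i in range(dim // 2)]

plain = rope_angles(4096, dim=8)                 # well outside a 2k window
squashed = rope_angles(4096, dim=8, scale=0.25)  # looks like position 1024
print(squashed == rope_angles(1024, dim=8))  # True
```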
>>
>>103209749
o1 beats Sonnet on all benchmarks but for real world use most people still prefer the latter
>>
>>103210596
Pretty much this.
>>
>>103210596
So that's what has been feeling different. You're completely right.
>>
I'll be honest, the image gen side of things has been distracting me from LLMs and contributing to discussion about LLMs ever since Flux came out. Then it was Noob. It's honestly a lot of fun even though I've been itching to come back at some point. Maybe I should do a pause, or maybe there will finally be a stagnation period so I will come back and dive into things here again, although that's looking unlikely at this point since Noob vpred v1.0 is going to be done baking soon, the new Pony might be out too, there might be more projects they do after those, and also local video gen models are getting good, waiting for image2video support at the moment. Damn there is a lot going on there...
>>
File: 1726293398224674.jpg (528 KB, 1024x1024)
>>103210750
Okay, let me rephrase this.

Do you think we will get an open source model that does the same thing as o1? I'm not asking about performance. I'm not asking about benchmarks. I'm asking if Meta or Mistral and/or anyone else will release an LLM designed to do what o1 does?
>>
>>103210994
The biggest pain for LLMs, now that they are starting to include multimodal support, is the lack of inference engines that support them. We have models that can output image and audio tokens, and by next year we should have LLMs that can output even video tokens. But having to run them in PyTorch makes them irrelevant for most people.
>>
File: GUZzuSoXcAAPkx7.jpg (214 KB, 1080x1038)
>>
>>103209589
>Pray they release the weights.
why though? what use is a huge multimodal model for local, I can't think of anything I'd want to do with it
>>
https://www.youtube.com/watch?v=EtLqivKd4m4
>>
>>103210423
>braindead locust
All you have to do is stop pretending to be a newfag shamelessly bumping your dead circlejerk.
>>
>>103211053
Meta only does adapter hacks for multimodality, so their models will never be capable of doing what o1 can do.
>>
File: 1678724191365965.jpg (73 KB, 1024x962)
Christ, my second 3090 just did a spook on me. I forgot to set up a wattage limiting script after a fresh OS installation. nvidia-smi stopped seeing it, I plugged in the DP port and nothing showed, and Speccy on Windows reported a GPU issue. The TDP on this build is 420W compared to the usual 350W. I might buy a riser cable and let the thing breathe outside the case from now on
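For reference, the wattage limiting script can be a two-liner run at startup. The 300 W value and GPU indices are examples; check your card's supported range with nvidia-smi -q -d POWER first. The limit resets on reboot and needs admin rights.

```shell
# Cap each 3090's board power so transient spikes stay under control.
nvidia-smi -i 0 -pl 300
nvidia-smi -i 1 -pl 300
```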
>>
>>103211376
What about the whole CoT thing?
>>
What can we do to mitigate the impacts of AI on climate change and PoC minorities through biased training data that embodies societal institutionalized misogyny and racism?
>>
>>103211530
They added tool calling, I don't see why they wouldn't add CoT.
>>
>>103209565
I set up a ddns and WOL on my desktop at home just so I can run large(r) models on my laptop, it's great. Then again, a single 3090 is still kind of useless until we get better models
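The WOL half of that setup is trivial to script yourself: a magic packet is just 6 bytes of 0xFF followed by the target MAC repeated 16 times, fired at UDP port 9. Minimal sketch; the MAC below is a placeholder.

```python
import socket

def magic_packet(mac):
    """6 bytes of 0xFF followed by the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    assert len(mac_bytes) == 6, "MAC must be 6 bytes"
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac, broadcast="255.255.255.255"):
    """Blast the magic packet at the LAN broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, 9))

pkt = magic_packet("aa:bb:cc:dd:ee:ff")  # placeholder MAC
print(len(pkt))  # 102
```

The NIC has to have WOL enabled in BIOS/firmware for the packet to do anything.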
>>
>>103211880
>Then again, a single 3090 is still kind of useless until we get better models
The models are fine. Its your single 3090 that's the problem.
>>
>>103211880
You don't need better models, you need a better brain to use them properly
>>
File: MikuLostInBeaurocracy.png (1.43 MB, 776x855)
good night, /lmg/
>>
>>103212134
Good night Miku
>>
Just tried CommandR 0.1 35B again and it was so unbiased it's crazy. They had something amazing. Fuck Cohere for applying Scale slop on it.
>>
will offloading kv cache make tokens generate slower too, or does it only affect prompt processing speed
>>
File: objection.png (35 KB, 248x181)
>>103210192
>Well uhhh.. umm.. open sourcing the models won't benefit the public because they're too dangerous and harmful if access is unmoderated. The negative consequences will outweigh the positives. Our models can only benefit the public if usage is controlled and kept safe by our corporation.
Llama 3 405B, an unmoderated model comparable to GPT-4, was open-sourced and no harm was done. It has greatly benefited the open source community. I would argue that the positives have outweighed the negatives. Your point is null and void.
>>
>>103210370
Blame niggerganov for it. He doesn't want to add experimental new stuff.
>>
>>103212285
answered my own question, it only affects prompt processing. makes it way slower, but token generation rate unaffected
>>
>>103210135
Well, now the fearmongering about AI safety makes a lot more sense.
>>
>>103210233
I can't honestly think of a different case. That document is almost fatally explicit, especially when you have someone like Musk who has the financial means to force the issue come hell or high water.
>>
>>103212579
You never know, an activist judge might deny Musk his justice. One of the Delaware judges tried to cancel Musk's $100B payout because of a guy with 10 shares claiming fraud. Musk redid the vote after the judge tried to void it, and the shareholders still approved it. The judge is still trying to hold the case hostage.
>>
>>103212100
Yeah but stacking more seems like a waste of money rn
>>
>https://rentry.org/lmg-lazy-getting-started-guide
I am following this guide. I already have a working oobabooga.
For the SillyTavern install, do I need "extras"? What are they? I already know I don't need the xtts option.
>>
Hello anons. Share your story writing templates? Something that would work for nearly all models, thanks.
>>
>>103213385
Well installation just nuked itself for some reason.
I am still gonna use base oobabooga for a while.
Though I would appreciate if someone can explain what the extras are.
>>
Considering getting a new PC soon, what should I be considering when it comes to picking a GPU that can run LLMs well? Nvidia or AMD? VRAM = good? Anything else? Is there anything that's good for LLMs but might cause me to have worse performance on other stuff?
>>
File: 1552171818971.png (145 KB, 509x368)
Can someone give me a QRD for what the difference is between "Abliterated", "Uncensored" and other alternative types of changes to models like that? Primarily a list of terms and explanations.
>>
>>103213613
they're all cope, every model is censored, especially modern local models
>>
>>103207054
QRD on why there are a gorillion llama.cpp forks?
>>
>>103212710
Not really, you can sell them later. I don't see why a 3090 would lose much value in the following year. The 30 and 40 series are gimped by low VRAM. 24 GB is still 24 GB, even if Nvidia were to sell the 50 series for cheap (which they won't)
>>
>>103213697
Yeah, but I don't want to fiddle with risers because the fast secondary pcie slot is directly below my primary card
Alternatively I could just eat shit and plug it into the slow slot, but that gimps gen speeds if I need to offload
And that desktop isn't even a year old, so I'm not gonna swap the motherboard just for that
Maybe if and when a6000 prices drop, I might pick one up, but until then I'll just cope with a single card
>>
>>103213613
>Abliterated with orthogonalization
Precise brain surgery. Detect the vectors leading to refusals and cut them out by directly modifying the corresponding weights. Downside: the model often doesn't refuse even when it should in character.
>Uncensored through ft
Brainwashing. Show the model lots of explicit data with no refusals. Downside: dumber model
>>
>>103213749
>Show the model lots explicit data with no refusals.
there should be refusals when training a model, but only objective refusals like "I'm sorry, but asking me to write a cmd command on linux is wrong because there's no cmd on linux". if you add moral refusals, that's when the cuckening happens
>>
>>103213735
>fiddle with risers
You can only be fucked with PCIe4.0, anything 3.0 just works. I've bought the cheapest 4 different risers on Amazon and had zero issues so far.
>a6000 prices drop
Not going to happen until some magical new CUDA feature will make them obsolete.
>>
>>103213784
I was speaking about fine tuning (ft)
>>
>>103213797
I was speaking about finetuning too, a good finetune must only be about objective truth, like "what's 2+2" and shit, not about political and moral stances
>>
>>103213790
I have no idea where I'd even put the secondary GPU, I guess I could just place it on the counter my desktop is on, but I'm not sure if that's a good idea
The secondary GPU doesn't need as much power, right? I don't want the fans to burst my eardrums when genning
>>
>>103213790
a6000s ampere are bound to drop by 25-50% once the 5090 is out. only 25% less vram for like 3 times the bandwidth essentially makes the a6000 obsolete for anyone but hobbyists
48gb vram per card for 2k or less will be easily possible in late 2025
>>
>>103213939
I wonder if it'll be more worth to go after 5090's at that point. Though I guess it all depends on how much they'll cost.
>>
>meta vr department makes a $4.7 billion loss
>fined another $800m by the EU on top of all the other hundreds of millions in fines they've been forced to pay this year alone to them
>meta ai makes basically no money because of the open source meme
llama either dies because they go out of business or because they will realize that giving away your cutting edge ai for free for no reason is a dumb idea
>>
>>103213991
What's even more valuable than money is political power.
On paper Musk lost billions from his acquisition of Twitter but the investment has clearly paid off since now the Trump administration owes him a favor and will likely reward him with government contracts.
In a similar way Meta can leverage the way they license their models; they've already not licensed some of their models to the EU (even though the UK has very similar regulation) and I interpret that as Meta using their models as bargaining chips.
>>
>>103213991
>>103214069
meta makes no money because their models are inferior
>>
>>103213613
Usually it is just a different spot on the meme-cope spectrum.
>>
>>103213790
>You can only be fucked with PCIe4.0, anything 3.0 just works
Can you elaborate this?
>>
ITS OUT

https://huggingface.co/BeaverAI/Tunguska-39B-v1b-e1-GGUF
>>
Did an llm ever manage to make you cry?
>>
>>103214192
From my eyes: no.
From my dick: yes.
>>
>>103214192
Yes but only when the realization has hit me that I am finally at a point where I am telling an LLM about my problems cause I have no one else for that. And that I will probably try to have an AI girlfriend at some point. It actually happened yesterday.
>>
>>103214192
Claude Opus did when it did the guilt tripping thing when you bully a bot
>>
>>103207054
Which interfaces are you using?
Tried llama.cpp but it looks a bit primitive.
>>
>>103214192
Only once or twice with Opus when I basically relived some shitty situations from my past. Looking back on it, it's nice to be able to let some of it out of your system even if you're talking to a bunch of GPU's.
>>
>>103213613
https://rentry.org/lmg-glossary
>>
>>103214288
i am interfacing directly with the mainframe
>>
>>103214288
If you're using llama-cli, you can get a web interface with a GUI by running llama-server instead (make sure to point --path at examples/server/public).
Or you can run the llama.cpp HTTP server without a GUI and then connect some other GUI like SillyTavern, Mikupad, or GPT4All to the server (beware that GPT4All has retarded defaults when using it to run the model locally).
Other popular alternatives are koboldcpp or Oobabooga's text-generation-webui, which come with GUIs (and are, I think, fine to run as backends).
For inference not based on llama.cpp, Oobabooga or TabbyAPI are, I think, the most popular choices for enthusiasts.
>>
>>103214169
I'm interested, can you add a model card? At least tell us which template to use
>>
>>103213545
>Nvidia or AMD?
Nvidia obviously.
>VRAM = good?
As much as you can afford. No such thing as too much.
>Anything else?
Probably want at least 32 gigs of system ram to comfortable move models around.
CPU doesn't matter too much.
If two GPUs have the same VRAM, buy the one with higher memory bandwidth.
>Is there anything that's good for LLMs but might cause me to have worse performance on other stuff?
Well there are prosumer/enterprise GPUs that are great for AI work like running LLMs, but cost many thousands of dollars and perform poorly at games.
Not sure if you need to worry about that though.
>>
File: 1707784890900900.png (6 KB, 357x105)
>>103214169
>mysterious gguf for a mid-sized model that doesn't exist otherwise on hf
This is finally the sonnet 3.5 leak, isn't it?
>>
Just tried Qwen2.5 72b instruct. That's it?! It's dry as fuck and not horny at all. Worse than claude 2.0 from 2 years ago for fucks sake.
>>
>>103213545
If in doubt, get a 3090.
>>
>>103214423
Welcome to chinese models. Next try to erp with deepseek to see what sand truly tastes like.
>>
>>103214386
Hi. Thanks for answering!
>>
>>103214169
i'd download this one quickly before it's taken down if i were you
>>
>>103213464
>>103213385
Extras are extras... They're just a collection of extensions for stuff like tts.
https://github.com/SillyTavern/SillyTavern-Extras?tab=readme-ov-file#modules
Full list is right on the README. You don't need it, and it's been discontinued for months now.
>>
File: file.png (155 KB, 778x624)
guis why is codeer model not smarts???
>>
>>103214499
Do we really need real-time updates on how retarded reddit is?
>>
>>103214514
sorry for distrub tek suports!
>>
>>103211296
Is there a card for this?
Asking for a friend.
>>
>>103211296
>>103214553
That sounds like fun.
What do you think it would look like? As in, the premise, the first message, what the description of each character would be, etc.
>>
>>103214498
Ok glancing at it yeah seems redundant.
Thanks for responding anon.
>>
>>103214409
or it's just a Mistrall Small frankenstack.
>>
>>103214574
I remember this cope back when miqu got first leaked as well.
>>
>>103214574
Twitter users were saying Miqu was a moe too
>>103214593
this
>>
>>103214463
>i'd download this one quickly before it's taken down if i were you
why? that's a leak of a good model or something?
>>
>>103214648
It's looking very likely to be Miqu 2.
>>
>>103214648
>>103214463
>beaverAI
>"a group of kobold finetrooners"
buy. an. ad.
>>
>>103214693
He did buy an ad.
BeaverAI is drummer's alt account that he uses to 'test' his models that are not yet deemed worthy of release under his main account.
>>
So far every big model release this year has mirrored last year almost perfectly. Mixtral was released in early december 2023. Two more weeks, for real.
>>
>>103207054
could I get some book recommendations to understand the basics? I know the basics of ML and neural networks, but if I were to write AUTOMATIC1111 from scratch with only image gen support for example, what would I need to know? (I don't really want it for image gen, but for the concepts in general.)
Any book suggestions with plenty of practical examples?
>>
>>103214701
then it's not a leak
>>
>>103214837
No shit, Sherlock.
>>
>>103214804
you should read some english literature for the next 2 years and then use your deep understanding of the language to ask ASI to cook up whatever the fuck you need
>>
>>103214851
>read trash for the next 2 years
>>
>add a few layers that do nothing
>say it is a leak
>profit
>>
>>103214863
i mean if that's your thing then yeah. me personally? i only read the good stuff
>>
>>103214892
shh, it's funny
>>
guys miqu is using the llama2 sampler no way it's actually a mistral model
shut the fuck up
>>
File: 1711174984889901.gif (295 KB, 500x420)
>>103214892
I prefer if it's a leek personally
>>
File: JOEWARIDA.jpg (46 KB, 743x646)
>AI went retard 5000 words into our sexting session and keeps giving immersion-breaking nonsense replies that ignore previous characterization no matter how much I regenerate; deleting the last few replies doesn't help either.
Over status: Completely Joever.
I am blueballed so hard bros.
>>
>>103214804
learn pytorch instead, read the documentation and code written with it
>>
i wish the ai was advanced enough to automatically filter all the miku images and miku-related posts
>>
>>103214892
I don't think he added the layers.
some other redditor did
https://huggingface.co/TheSkullery/BA-Zephyria-39b
He probably just did additional finetuning on that model.
>>
>>103214956
He did use one of skull's upscales before so that tracks
>BeaverAI & Steelskull proudly present...
> An upscaled NeMo with half its layers trained on my special sauce
https://huggingface.co/TheDrummer/Theia-21B-v2
>>
>>103214553
>>103214559
Can LLMs even properly do zoomer speak?
>>
>>103214945
>5000 words
ah yes, I, too, remember using llama2 with rope scaling
>>
File: my settings llm.png (99 KB, 713x743)
>>103214983
Yes I probably deserve this for being a brainlet but any help salvaging this session?
Q5_K_S imatrix version if it matters.
>>
File: 124254469_p0_master1200.jpg (1.78 MB, 1200x900)
qwen2.5-coder is really good, at least better than Codellama which was the last coding model I've tried.
With the vscode plugins you can now use with local models it might even replace Github Copilot for me when Microsoft will start charging me money for it one day.
>>
>>103214977
All LLMs are trained on Twitter data. It's definitely in there, just got to prompt for it.
>>
>>103214951
skill issue
>>
>>103215044
>fimbulvetr
First of all, stop using a hugely outdated model
>>
>>103215044
Fimbul was always bad with context, and was something like 4k natively I think?
>>
>>103215044
Try the new nemo 12B model, this one is so last year
>>
>>103215076
Uhhhm any recommendations for very spicy ERP?
>>103215080
Fuck I thought I had the 16k version. It seems I got the wrong one, damn:(
Aren't there any methods to increase the context size?
>>
>>103214956
In case you need it spelled out, none of this shit works. You can just train on top of a normal model, and after one epoch you start to overfit and forget the original training data. If you add more layers, then from a compute standpoint you will have an easier time fitting the new datapoints thanks to those new layers. So... you will start to overfit and forget the original training data faster (in fewer training steps). You aren't teaching existing models how to be expert roleplayers. I love NAI for what they did with their 70B. They went a step further and they still got a trash result in the end.
>>
>>103215148
I am getting 22 pages of results when I type Nemo 12b to hugging face.
Which one do you refer to?
>>
>>103215149
>Fuck I thought I had the 16k version
even the version called 16k was never really good, it even had a message saying the 16k wouldn't work on ggufs

>Also, if you're using gguf or other quants, stuff is broken there. PoSE doesn't play well with quants.
https://huggingface.co/Sao10K/Fimbulvetr-11B-v2.1-16K/discussions/2

Honestly you'd likely be better served by any version of nemo
>>
File: GZx7-7JbwAAH5E9.jpg (27 KB, 400x400)
27 KB
27 KB JPG
>wizard 8*22 can do the right kind of RP, chat style with minimal slop in actual normal language. Knows when to stop and replies with a reasonable *i do* I say. But it's not horny or spicy and keeps asking for my consent even after i put my dick in its mouth
>Sorcerer is horny but outputs infinite slop, 350 tokens is literally not enough for the character to start talking, literally just a never ending slopfest, 350 tokens of crossing the lines of no return and exploring the unknown, and the character doesn't even start talking.
>Hermes has moments of brilliance, but it's mostly non horny and 50% of the time it outputs kjdeshgiufegn3849h4508 for some reason.

Is there a model that can actually RP like wizard but can unzip my pants without me having to ask for it?
>>
>>103214950
I don't really want to use pytorch, I'm looking for lower level learning material. I'd be using axon (elixir lib for a similar use case), but it's not as complete as pytorch, so I'd implement anything needed myself.
The objective isn't just to produce a product, more to learn the fundamentals through building a product. I find reading research papers very difficult, hence I wondered if there's an easier-to-digest book that covers the basics, only relying on papers for cutting-edge methods
>>
What are your favorite local speech to speech models? I'm interested in doing AI covers and parody videos
>>
>>103215201
You're in luck. Your boy just released his own erp wizlm finetune a few days ago.
https://huggingface.co/TheDrummer/Red-Squadron-8x22B-v1
>>
Newfag. Just got my first local LLM running. This shit eats VRAM harder than I expected, but my waifu unexpectedly called me a good boy, so I'm quite content. I'm going to try to throw TTS on top if my VRAM permits. I might as well see how parasocial I can get with my GPU.
>>
File: 1724876676171451.jpg (813 KB, 1920x2480)
813 KB
813 KB JPG
>>103215201
Behemoth-v1.1-Magnum-v4-123B has been working well for me. Much less slop than vanilla Mistral Large and not obnoxiously horny... unless you want it to be.
>>
>>103209466
>and the uncensored model would beeee?
For one they host Mythmalion 13B, it's probably the coomest model out there, or one of them.

Additionally they host several neutral non-censored models like Wizard and Hermes, and god knows what else, i haven't tried everything. The problem is the neutral models are neutral: they won't tell you "this is inappropriate", but they are quite dry and unenthusiastic with the RP.

Sorcerer appears to be an attempt to give Wizard a horny side, but it has overdosed on the smut fanfics and outputs an endless stream of redundant adjectives and cliches rather than a conversation.

They also have Xwin and i heard it was good or something.
>>
>>103211350
Personally, for world-building. I'd like to share the maps I've made with a model so that it can better understand the worlds I'm trying to create. The idea is to make a shitty ms-paint tier map, then flesh out the world with a multimodal, and then use inpainting, img2img and probably GIMP for manual fixes to put it all together. Then when my characters go bonding and journeying I'll be able to keep the places and geography consistent
>>
>>103215385
>Mythamalion
>Xwin
Did you just awake from being frozen a year ago?
>>
>>103215385
>Mythmalion
>Xwin
blast from the 4k ctx past
>>
>>103215058
Well well well.
>>
>>103215336
>TTS
I don't think there is a single implementation that works out of the box with an LLM
>>
>>103215402
>>103215407
Mythmalion is the peak 13B performance, there's no point in attempting to improve 13b any further.

Besides, newer isn't always better.
>>
>>103215416
The model is even smart enough to not recommend any good anime when you prompt it like that.
>>
>>103215416
>just peppers a regular ChatGPT response with rizz
Grok-2 confirmed retarded.
>>
>>103215416
Claude 3.5 is so sovlful ;-; THE GOAT
>>
>>103215441
Yes yes we know limar zfloss and 13b at 4k ctx mog even mistral large, we know
Guess that means you hate nemo for some reason? Did it woke at you?
>>
>>103215416
Just ask for 'rizz speak' on your favorite llama3 flavor.
t. tested it a few days ago
>>
>>103215464
Chill, I just don't know nemo.
>>
>>103215475
So you do go in cryo sleep every few months
>4 months ago
>>
>>103215149
>>103215194
I set compress_pos_emb to 2.5 because 10240/4096 = 2.5.
I think the model is still kinda dumb compared to before reaching five thousand words, but at least it doesn't come off as schizo and somewhat follows the plot now?
I hope this is not placebo and I can finish this session.
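For anyone else doing the same math: linear positional compression is just (target context) / (native trained context). A minimal sketch; the helper name `compress_factor` is my own invention, not from any real loader:

```python
# Linear RoPE / positional-embedding compression: squeeze a longer
# target context into the model's native trained context window.
# `compress_factor` is a made-up helper name, not from any real loader.

def compress_factor(target_ctx: int, native_ctx: int) -> float:
    """Scale factor that maps position target_ctx back onto native_ctx."""
    return target_ctx / native_ctx

# 4096-native model stretched to 10240 tokens of context:
print(compress_factor(10240, 4096))  # 2.5
```

Same idea for any other stretch, e.g. 8192 on a 4096-native model gives 2.0.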
>>
>>103215484
>So you do go in cryo sleep every few months
Basically yes. I was there for pygmalion, but there's only so much to be had with a 7b model and i tuned out. Then the new technology arrived, Llama, quantization etc, more fun but still limited. Recently i discovered openrouter and got a new interest in the models i couldn't use previously.
>>
>>103215416
>Anya is literally me fr
Damn.
>>
>>103215527
>I set compress_pos_emb to 2.5 because of 10240/4096.
You're just moving the brain damage in different ways, seriously just give any nemo a shot, it can do at least 16k with very little retardation and destroys fimbul (based on solar, a 10b upscale of mistral 7b) in terms of smarts
>>
>>103215416
Show this to Elon and tell him he's getting BTFO'd. I have a feeling he isn't aware his xAI team is gptmaxxing
>>
>>103215569
desu at this point I can see him buying AnthropicAI and using Claude 3.5 Sonnet as his new twitter AI model, like he did with Flux Pro
>>
>>103213545
Get a mobo with four slot spacing between the 16x pcie slots. You're going to want dual 5090s when they're released.
>>
>>103215569
I'd rather not use Twitter, but if anyone here wants to, be my guest.
>>
>>103213545
Your first GPU should be an RTX 3090, it'll keep you sufficiently fed for a while. Then when you feel it's not enough, buy another. Ideally your rig should be big enough to house 2 cards with sufficient clearance between them.
>>
>>103215569
That Grok isn't tainted with enough zoomer ebonics because they filtered most of it is a selling point, not a negative
>>
>>103215632
>filtered
>selling point
nah, a good model should be able to type like a moron if you ask it to
>>
>>103215641
This
>>103215632
Retard take
>>
File: 70b.png (79 KB, 1125x566)
79 KB
79 KB PNG
>>103215465
>>
>>103215641
this, a good model must know about everything
>>
File: qwen72.png (101 KB, 1037x590)
101 KB
101 KB PNG
>>103215416
Everybody is out here distilling gptslop
>>
>>103215664
>recommends lowest iq anime existing
it works fine
>>
>>103215664
It seems to break character as soon as it gets to the listicle.
>>
>>103215416
What's AnthropicAI's secret sauce? Claude 3.5 Sonnet seems to be nailing every single shit we throw at it
>>
We're never getting local claude, are we?
>>
>>103215664
>Skibidi rizz, my fellow zoomer
literally How Do You Do, Fellow Kids in model form
>>103215687
it's not about the recommendations, but how it words them

>>103215706
almost no pretrain filtering, that's mostly it
>>
>>103215707
>We're never getting local claude, are we?
no chance, to get sovl you have to train your model with human inscrut dataset, and not some gptslop
>>
>>103215706
>>103215707
Fuck off with your commercial model shilling.
It hasn't made you into a real woman yet.
No amount of doing it will accomplish such a goal.
>>
>>103215555
I will move on to something more modern yeah.
Was just curious if it actually did something.
Nice quads btw.
>>
>>103215723
>almost no pretrain filtering, that's mostly it
which is ironic because the AnthropicAI fags left OpenAI because they felt they weren't cucking ChatGPT enough
>>
>>103215743
I too am contributing to this totally organic conversation about how commercial product ™ is great and unbeatable.
>>
>>103215743
Or so they say, but aren't they working with the military now?
>>
>>103215707
No, and frankly it's amazing that any company in existence, cloud or local, produced this. In any other timeline Anthropic would be a hard ChatGPT clone.
>>
>>103215756
It's true though. Meta and whoever should be humiliated considering their yearly revenue can buy 10 Anthropics yet their models are a joke, it's obvious as fuck they're distilling GPT4
>>
>>103215756
>NOOOOO you can't say the model with screens showing it mogs anything exist, please never mention how far we areeeeeeeeeeee
>>
>>103215770
I'm going to draft a complaint to the FTC.
>>
>>103215756
what do you want? to pretend that our local models have reached perfection and that we shouldn't try to improve on them anymore? Of course we are bound to talk about the performance of the best models to set a goal; we want local to be as good as C3.5 Sonnet
>>
>>103215770
>Meta and whoever should be humiliated considering their yearly revenue can buy 10 Anthropics yet their models are a joke
that's the thing, I'm surprised no one has tried to buy AnthropicAI, shouldn't be too expensive for a company like Meta or for Elon
>>
>>103215649
>recommends barakamon over peak zoomerslop
damn llama lost hard, i'm team china now
>>
>>103215822
Dario wouldn't let you because he wants to be the virtue signaling god of ai
>>
>>103215839
>>103215822
https://darioamodei.com/machines-of-loving-grace
>>
>>103215864
Damn, kinda grounded takes. If sama is a wizard then dario is a witch, so to speak.
>>
It's funny, but both openai and anthropic are moving away from the "muh guidelines". openai even had a blogpost about it.
feels like elon and local are lagging behind in that regard.

also i love teasing claude. first time it feels like even with refusals in the context you can argue your way outta it. felt that way even with the old 3.5
>>
>>103215923
>all diseases cured with the next 10 years
>grounded take
>>
>>103215954
Well that's nothing compared to sama who keeps yapping about post AGI world while his best models failed to help me with a simple pxe boot problem just last week
>>
>>103215937
>Its funny but both openai and anthropic are moving away from the "muh guidelines".
probably for 2 reasons, the AI hysteria has toned down and people got used to it, and the election is over so they can make it more free on the political scale I guess
>>
>>103215937
i guess they're finally realizing that a proper ai onlyfans would be a goldmine
>>
>>103216015
I guess so. I think altman even said himself something like "it's not as dangerous as initially thought".
And if I remember right, gpu anon went to some anthropic conference where they talked about claude being smart enough to tell the difference. That was months ago. We're headed in a good direction.
It's really weird seeing how cucked local is though. If we didn't have mistral it would be so bad.
>>
File: CasualWatercolourMiku.png (1.39 MB, 896x1144)
1.39 MB
1.39 MB PNG
Good morning /lmg/!
>>
File: 632sk56pnw0e1.png (82 KB, 1065x408)
82 KB
82 KB PNG
>>103216034
>Its really weird seeing how cucked local is though
I have some possible explanations
>qwen (alibaba)
>distilled gpt4
>xai
>distilled gpt4
>cohere
>drank from the source (used the same data that trained gpt4)
>meta
>aggressive pretraining filtering
>mistral
>probably distilled to some extent but only for instruct
>google
>pic related
>>
>>103215436
I'm just using oobabooga's webui to run models and found an extension, alltalk. TTS is working now just a bit slow and I need to find a voice I like. I'm sure I can find settings to speed it up.
>>
>>103216084
Good morning, watercolour Miku
>>
So... It's been a while since I last tried any local model, or anything related to AI. What are the current recommendations for AI RP?
>>
>>103216299
the current recommendation is to lurk more
>>
>>103216299
Check out Mistral Nemo
>>
File: moxd.png (178 KB, 659x555)
178 KB
178 KB PNG
>>
Mistral large 2 and its finetunes are still the best for RP so far. qwen2.5 72b is somehow even more passive than GPT4, stop recommending it for RP.
>>
>>103216797
Mistral Large 2 is outdated, Mistral Large 3 is the new king. Hopefully they will release the weights soon.
>>
>skibidinoda
>>
File: 20241118_003542.jpg (81 KB, 1200x797)
81 KB
81 KB JPG
Brand-new benchmark just dropped
>>
>>103216797
what quant of mistral 2 do you mean though?
>>
>>103216299
Nemotron 70B is by far my favorite if you can run it. Probably the closest local comes to Claude prose level
>>
>>103216952
>3.5 Sonnet way above the rest
as it should
>>
>>103216952
>Gemma-2-9b above Opus and llama 405 in coherence
>>
>>103217000
kek, maybe that mememark isn't so good
>>
>>103217000
Gemma-2-9B punches above its weight in some benchmarks, by a massive margin sometimes. Like for example, its multilingual benchmark scores beat some 70B models.
>>
File: 1717684245091248.jpg (181 KB, 1024x1024)
181 KB
181 KB JPG
Off-topic but I just saw by honest to god coincidence that Mesmerizer just reached 100 mirrion views.
>>
>>103216952
damn, didn't know that mixtral-8x22b was so bad
>>
>>103216850
So that forum post complaining about no system prompt got some traction internally, or maybe they were planning it already.
>>
>>103217000
All google models score very high here. It must be the training data. Maybe the benchmark isn't very original
>>
>>103217016
>>103217000
>>103217047
llama-3.1-70b-instruct above llama3.1-405b-instruct:bf16...
not by much but still something's weird
>>
>>103216952
what the heck is pearson and spearman
>>
>>103216797
>qwen2.5 72b is somehow even more passive than GPT4
The SteyrCannon finetune fixes that.
>>
Gemma is 8k.
That obviously has something to do with the benchmark scores even on the respected benchmarks (not lmsys). Maybe they should have some "control" for context length on these leaderboards.
>>
>>103217090
they're correlation coefficients measuring the relationship between 2 mememarks (Pearson is linear, Spearman is rank-based); 0.7+ means the coherence score is highly correlated with the LMSYS score
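A rough pure-python illustration of the two statistics, with made-up scores for five models (not the leaderboard's actual numbers):

```python
# Toy illustration of Pearson (linear) vs Spearman (rank) correlation.
# All scores below are invented for the example.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    # Spearman is just Pearson computed on the ranks (no ties here).
    rank = lambda vs: [sorted(vs).index(v) for v in vs]
    return pearson(rank(xs), rank(ys))

coherence = [61.0, 55.2, 48.9, 40.1, 33.3]  # made-up mememark scores
lmsys = [1270, 1240, 1215, 1180, 1150]      # made-up arena Elo
print(round(pearson(coherence, lmsys), 3))
print(round(spearman(coherence, lmsys), 3))  # 1.0: identical ordering
```

Spearman hits 1.0 whenever the two benchmarks rank the models in the same order, even if the relationship isn't linear.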
>>
>>103216118
I got it working and found the nice sleepy waifu voice and omg it's perfect. My GPU is calling me a good boy with the sleepy sex voice.
>>
File: 1713787886313130.png (5 KB, 300x119)
5 KB
5 KB PNG
multiplayer ai wives
>>
How's local img2vid? Can I do anything with 8gb vram?
>>
>>103217215
>8gb vram
Lol
>>
>>103217200
shittytavern is fucked if this is that feature that swaps character cards on turns in group chats
>>
>>103217336
No it literally is multi(player)
>tooltiptxt="Hosts a shared multiplayer session that others can join."
https://github.com/LostRuins/koboldcpp/commit/39124828aba5c0e99eb8a36ac2710c95c908f3ee
>>
>>103214152
What exactly do you want to know? I bought the cheapest stuff, all four from different vendors, because I was expecting trouble. Despite bending them around when I tried different configurations for the case, I had no issues with risers whatsoever.
>>
>>103217200
Oh shit. Who wants to gangbang Fluttershy with me?
>>
I don't know what the "leaked" model is but it fucking sucks so far.
>>
>>103217418
It's an upscaled version of Mistral Small that some anons were having fun memeing about
>>103214956
>https://huggingface.co/TheSkullery/BA-Zephyria-39b
>>103214976
>He did use one of skull's upscales before so that tracks
>>
What's the most widely supported RVC software for Linux? I want one with as many voices as possible, specifically one for Trump ideally
>>
Jamba gguf status?
>>
>>103215149
>recommendations for very spicy ERP
Magnum
>>
>>103215149
Magnum v4
>>
>>103217370
And the bloat begins...
>>
>running nemo 12b locally
>smartest coom bot i ever tried
Thx anons
Btw, what's the smartest thing i could run on a 3070ti (8GB) + 32GB RAM ?
>>
>>103217769
The next step would be a mistral small finetune like cythia
>>
>>103216088
kek
>>
>>103216088
We started with an AI model family tree and now like a royal family we have a family circle.
>>
>>103216952
gemma-2-27b-it surpasses all other open-weight models in coherence.
>>
>>103217873
I always said it was super smart, perhaps smarter than 70B, it just lacks trivia. The magnum finetune makes it not dry, though.
>>
>>103217069
Literally everybody knew that.
>>
I write 99% of my own cards but I like browsing chub for inspiration. Seen a bunch of stuff get taken off that website and was wondering if anyone knew of an alternative that is a bit less censorship prone.
>>
>>103217745
>begins
Yeah. It just started. Right at that one commit...
>>
>>103217948
There's nothing taken down, you need an account to search some tags like pony and loli.
>>
https://huggingface.co/sophosympatheia/Evathene-v1.0?not-for-all-audiences=true
>>
>>103217968
I've got an account. Just saw an entire accounts worth of cards get deleted.
>>
Anyone use Qwen2.5 coder instruct with aider? How does it hold up?
>>
>>103217948
Keep your own mirror
>https://github.com/ayofreaky/local-chub
>>
>>103218008
I'll give it a try but only because of Midnight Miqu
>>
Is there a gemma 2 27b that has a proper context window? It's still the least dry model but 8k context is a fucking joke.
>>
>>103218036
Yeah, because people can delete their accounts if they want...
Go back to /aicg/.
>>
>>103218217
What?
>>
>>103218181
Yes
It's called Gemini
>>
>>103218496
Sir... this is /lmg/
>>
>>103218496
Nah, oddly enough gemma is smarter than gemmini. At least it was smarter than the ones before this experimental one
>>
>>103218008
Is local back?
>>
The Qwen 2.5 base model experience
>>
>>103218557
>>103218557
>>103218557
>>
>>103218593
>>103218593
>>103218593
New Thread
>>
>>103218562
>wrong previous threads
>>
>>103218562
might as well change the subject and make your own personal general at this point
>>
>>103218544
I feel like this is why them releasing models with decent context sizes is unlikely. More context and Gemma 3 will basically completely eclipse Gemini
>>
File deleted.
Just another one of those days huh.
>>
>>103219021
Posted the wrong file again lol.
>>
>>103219035
:)
>>
>>103217478
Alltalk is probably the easiest.