/g/ - Technology






File: 1702178192786568.jpg (230 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102961420 & >>102947669

►News
>(10/25) GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol
>(10/22) Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
>(10/22) Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1715445682044736.png (627 KB, 819x819)
►Recent Highlights from the Previous Thread: >>102961420

--Paper: Looped transformers for length generalization in algorithmic tasks:
>102962317 >102962499
--Papers:
>102965897 >102966068
--Recommended resources for evaluating and selecting AI models:
>102963234
--Yam Peleg's experiment with 141B model and language structure challenges:
>102961589 >102961733 >102961816 >102961942 >102961943
--Tensors vs lists of lists: consistent dimensions, performance, and implementation:
>102966553 >102966577 >102966607 >102966643 >102966825 >102973760 >102966596 >102966620 >102966630 >102966701
--Nemotron 70B vs Sonnet: Stylish but dry, community-driven LMSys models:
>102968134 >102968155 >102968167 >102968214 >102968309 >102968317 >102968340 >102968146 >102968175
--MolmoE-1B-0924 model recommended for object detection in images:
>102963391 >102963449 >102963568 >102963635
--LiNeS method exposes limitations of current finetunes:
>102972740 >102972926
--Study finds LLMs reflect creators' ideology:
>102973312
--Softmax function limitations and attention distribution discussion:
>102962184 >102962255 >102963696 >102962392 >102963783
--INTELLECT-1 progress update and discussion on distributed training inefficiency:
>102961560 >102961622 >102961914 >102975417 >102975492 >102975499
--Discussion of techniques to improve llm output quality:
>102964748 >102964835 >102965000 >102965133 >102966147 >102966266 >102967215
--Culture benchmark to test intelligence vs rote memorization in LLMs:
>102963628 >102963794 >102964825 >102965031 >102965316 >102965283 >102964848
--Performance of LLaMA 3.2 and other AI models:
>102962854 >102962871 >102962879 >102962927 >102962890 >102964480
--GLM-4-Voice: End-to-end speech and text model based on ChatGLM:
>102973500
--Miku (free space):
>102970059 >102972533 >102973048 >102973552 >102975522 >102976015 >102976342

►Recent Highlight Posts from the Previous Thread: >>102961432

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Where do I get Jamba gguf?
>>
>>102976897
jamba deez nuts
>>
>>102976690
>https://huggingface.co/anthracite-org/magnum-v4-27b-gguf/discussions/1
Who to believe? Another guy writes it falls off at 8k, which makes sense I guess.
No way to expand to 16k for gemma2?
>>
>>102976936
no, gemma is a fucking meme
>>
>>102976936
Finetunes work past 8k
>>
>>102976912
You need to be 18+ years old to post here.
>>
when's the next jamba bitnet trained on 10 trillion tokens coming out?
>>
OpenRouter can never host Largestral coomer finetunes because of the license, yeah? Shame, they're good enough at Q3 but it'd be nice to be able to use them fast at FP8 or whatever.
>>
>>102976936
Every other Magnum has been too horny / dumbed the model too much. This Gemma one is a really nice balance of smarts and willingness to be dirty WHEN it actually fits.
>>
>>102977113
diff transformers or no interest
>>
>>102976873
Thank you Recap Miku
>>
Where the hell is the p40 power patch compiler option in Koboldcpp? I heard it mentioned but have not seen it listed in the documentation at all.
>>
File: 9.png (74 KB, 920x780)
INTELLECT-1 is at 25.39% complete, up from 22.63% last thread.
>>
>>102977592
if they make a multimodal would it be called INTELLECT-2-ALL (intellectual)?
>>
>>102977667
That is fucking clever as hell and I hope they go with that name if they do that.
>>
>>102977592
- only accepting H100 richfags to participate
- talking democracy
pathetic af
>>
Llama 3 Nemotron or Mistral Large for RP?
>>
File: 1724213548020431.jpg (147 KB, 1179x1009)
https://x.com/deedydas/status/1849854657440645437
https://www.anthropic.com/research/evaluating-feature-steering
>>
>>102977871
I feel like both have pretty different strengths and weaknesses. Mistral-Large gives me the impression that it fundamentally understands complex situations/cards better, while Nemotron is very good at dragging up small details from the scenario due to its tendency to break its reply down into bullet points.
I think Nemotron writes livelier dialogue, so it might be better for pure character chat cards, while I'd pick Large and its finetunes for bigger RP scenarios. I recommend downloading both and seeing which one you prefer.
>>
File: 1700238976816078.jpg (12 KB, 256x176)
https://x.com/roeiherzig/status/1849492514350432359
>>
um guys i cant use molmo with llama.cpp..................
>>
File: 1702180645460320.png (38 KB, 604x424)
>>102977986
Which one of you was it?
>>
>12b stagnated again
>Everyone putting out trash
>Rocinante was a fluke
>Sao vanished
>>
File: Best Price Guaranteed.png (401 KB, 1133x1005)
>>
>>102978302
Why buy 2 lamborghinis when you can buy jensen's magic box?
>>
Elon's grok, interaction a-la gpt-4o.
https://xai-elevenlabs.replit.app/
>>
>>102978477
local models?
>>
>>102978490
Irrelevant in dead thread.
>>
>>102978490
Can you run 300B+ LLM on your thinkpad?
>>
>>102977729
The only way to have vramlets contribute would be to start making training do a couple layers at a time (local learning). Of course the great constant in LLM is that everything has to be a tiny variation on GPT2, so it likely won't happen.
>>
>>102978550
>Can you run 300B+ LLM
If it had 1 Billion active parameters, maybe.
>>
is there not like a simple way to have an ai language teacher yet
i wanna write in english and get responses in japanese with tts
gpt4o sort of has it but it's ass. has this really not been done yet?
>>
i'm just gonna say it
you retards don't have ANY idea of the kind of rp that you want
you can vaguely point out "slop" which are words that are common in literature but that you still don't want to see for some reason
like this "x y-ing" thing, it's just the basis of how to form a sentence you fucking niggerbrained faggots
even if I gave you a prize of $500,000 you would NOT be able to define what kind of precise syntax you'd want in erp, I can guarantee it and I am extremely confident in this matter
so basically, I don't want to hear ANY of that slop vs sovl debate ever again, you niggers are hypocrites who spit on everything but don't even know what the fuck they want, and also will do ZERO efforts to try and fix so-called "slop", to try and define it AS OPPOSED to so-called "sovl"
you deserve shitty llms for the rest of your miserable lives
niggers
>>
>>102979167
low quality bait
>>
File: brave_0ydDGoNtuW.webm (1.07 MB, 774x864)
>>102978951
>>
>>102979214
oh, nice. Is that not available on lite? I don't have that. What version you using?
>>
>>102979167
Anon, this general is completely okay with cucked models; the opinion of the majority is irrelevant here.
Speaking as an observer, I can say I want a model that: 1. Is smart and capable of understanding popular concepts & trivia a la CAI's detailed character knowledge. 2. Is free from IDPOL alphabet shit & any similar stuff hard-trained in the name of """safety""".
>>
i already have an entire server with a 9900x in it and an a380 for transcoding in real time. would a 7600xt with 16 gigs of vram be any good for llms around 11b or does rocm just suck ass? i just need something that works i guess, i dont mind it being slower than the 4070 i use with 8b in my pc rn as long as it frees up my 9900x and is faster than it
>>
File: Untitled.png (66 KB, 369x327)
>>102979255
i'm using the version of kobold lite that comes up on localhost when you launch a model through koboldcpp and not horde's, but i still see those options on lite.kobold.net though.
i am pretty sure those tts options i have are some shit that installed themselves when i set up my japanese IME thing for typing.
we'd probably be better off figuring out how to run an instance of xtts though, as the microsoft sayaka thing sounds a bit robotic.
i haven't really played with xtts yet, only gpt-sovits, and i couldn't figure out how to get them to connect to each other without using sillytavern, and i don't want to use that.
>>
>>102979167
hey anthracite, can you tell your org member to relax?
>>
>use kcpp api type in st
>no DRY
>use default api type
>decent chance of st just not receiving the output
very cool
>>
So I kinda figured out how to set up emotional voices for tts. It should work with any tts that has voice-cloning abilities.

>normal Bateman voice: Bateman_normal_reference.wav
>angry Bateman voice: Bateman_angry_reference.wav
>happy Bateman voice: Bateman_happy_reference.wav
>sad Bateman voice: Bateman_sad_reference.wav
Then just create an API shim so that when the text contains (Normal), it uses Bateman_normal_reference.wav for inference generation; when it contains (Angry), it uses Bateman_angry_reference.wav as the reference; and so on.

Ideally this could all be baked into a nice model trained on all of these by default, but that's a tall task
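A minimal sketch of that shim in Python (the leading-tag format and the tts call are assumptions, not any specific engine's API):

import re

# emotion tag -> reference clip (filenames from the list above)
REFS = {
    "normal": "Bateman_normal_reference.wav",
    "angry": "Bateman_angry_reference.wav",
    "happy": "Bateman_happy_reference.wav",
    "sad": "Bateman_sad_reference.wav",
}

def pick_reference(text):
    # look for a leading tag like "(Angry)"; fall back to the normal voice
    m = re.match(r"\s*\((\w+)\)", text)
    emotion = m.group(1).lower() if m else "normal"
    return REFS.get(emotion, REFS["normal"])

# then hand the chosen clip to whatever voice-cloning tts you run:
# audio = tts.generate(text, reference_wav=pick_reference(text))  # hypothetical call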
>>
>>102979436
Skill issue or something.
>>
>>102979446
no I'm pretty sure it's the st devs' fault, just fucking enable DRY for kcpp already
>>
>>102979453
Werks on my machine.
>>
Bitmeme in electron. https://github.com/grctest/Electron-BitNet
>>
>>102979507
>bitnet paper
>1 year ago
>bitnet framework
>ready
>bitnet model (usable)
>
Why
>>
>>102979547
they all want someone else to put money into training one but they don't want to do it themselves
>>
>>102979579
Microcock said they were making one themselves
>>
>>102979579
They want the model to pass the cuck test before releasing it in public.
>>
File: 1709229671882879.jpg (89 KB, 967x1024)
i've been away for 8 months
what happened
>>
>>102979691
nothing
>>
Why the fuck is my post getting shadowrealm'd god damnit.
Fuck me.
>>
>>102979691
All models are cucked harder than before, so, nothing good.
>>
File: change it here.jpg (277 KB, 1876x502)
>>102979704
The image I actually wanted to post.
>>
>>102979691
nemo, the best vramlet model
>>
>>102979579
I have a few theories
>it flat out doesn't scale for bigger models
>it works but people who found out that it works are quitting to make their own adder hardware companies to stay ahead
>it works but head researchers are withholding release because it's unsafe for everyone to have powerful models at home
>deals with nvidia to not release big models
>>
>>102979704
>>102979708
If anybody is getting a dry sequence break error with Silly (even if you don't use dry) after pulling from the staging branch, they fucked up.
They are passing the array of strings as a single string with the array inside it.
Here's a quick fix:
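// the breakers arrive as a JSON string like '["\n"]'; parse them back into an array, with ["\n"] as the fallback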
'dry_sequence_breakers': !!settings.dry_sequence_breakers ? JSON.parse(replaceMacrosInList(settings.dry_sequence_breakers)) : ["\n"],

Let's try again

>>102979167
I'm just happy we have small models that can use lorebooks pretty well without going completely retarded.
Hell, Nemo can even consistently ask for dice rolls if you steer it.
Things are pretty good and thank fuck for drummer that fucker. Rocinante 1.1 is so good.
All the style without losing any of the "intelligence".

>>102979436
With llama-server as the backend there's a button to choose the samplers, is that not an option with kcpp as a backend?
>>
>>102979818
Aha.
Now it worked.
Was it because I mentioned a timer?
let's see if this one goes through.
>Also. I hate this fucking timer. Why the hell am I getting it multiple times in a row?
>>
>>102979818
>With llama-server as the backend there's a button to choose the samplers, is that not an option with kcpp as a backend?
Using ST's built-in kcpp API, no. Using the default API it complains that kcpp is using a legacy API and might drop responses (which it does) but ticking the legacy API box causes it to not connect at all.
>>
>>102979812
In all likelihood it's just
>it's not at all well-supported vs. FP16 and no one wants to invest the effort to improve the ecosystem
>>
>>102979691
AGI in 2 weeks
>>
File: mmmmmk.jpg (42 KB, 415x415)
https://files.catbox.moe/c1k1rk.jpg
>>
File: 00060-2888480053.png (1.04 MB, 1024x1024)
I've been on a mission to find the best working STT+TTS solution. I trained a voice on each of Piper, XTTSv2, and GPT-SoVITS, using a Kuroki Tomoko EN dataset I assembled myself with Audacity:
https://huggingface.co/quarterturn/kuroki_tomoko_en_piper
https://huggingface.co/quarterturn/kuroki_tomoko_en_xtts_v2
https://huggingface.co/quarterturn/kuroki_tomoko_gpt_sovits_v2

Piper is the fastest to respond, but sounds the worst. Xttsv2 sounds good, but takes a bit to respond, is sensitive to the reference .wav file, and will sometimes go off the rails. GPT-SoVITS is by far the best quality, hands-down, but nothing supports it directly at the moment.

As far as LLM front-ends go, I tried Open WebUI, SillyTavern, and Koboldcpp. SillyTavern I'm quite familiar with, but when it comes to TTS integration, it's temperamental; I could not get streaming to work.
Open WebUI is fucking garbage for roleplay and garbage even for base instruct prompting; the recommended openedai-speech solution was a nightmare of docker configuration where I gave up on trying to get their guide on adding a custom voice to work, and just connected into the container, installed vim, and edited the config files manually. After all that, it worked unreliably with a huge processing delay.
The winner was, surprisingly, Koboldcpp with Alltalk. Koboldcpp has whisper support built in for STT, and supports Alltalk with xttsv2 well. It worked fantastically well; I just wish Alltalk supported GPT-SoVITS because the quality is far, far superior to anything else out there.
>>
File: 1708703583049481.jpg (178 KB, 1564x1794)
>>102976869
>>
>>102980360
Thank you for sharing your notes.
Is GPT-SoVITS the best option for stand alone basic text-to or voice-to matching? I gave up on Tortoise many months ago (slow, something about one of the generation stages can cause crashes, and apparently it got shelved) and haven't gotten around to investigating further since then.
>>
>>102979691
Arthur released a model called Ministrations 8B. It's a dumb 8B model but has some of the best ministrations in the industry.
>>
>>102980360
i got filtered by the sovits setup, how's the speed? want to make a speech to speech bot to practice spanish
>>
>>102979705
i only care about computer programming though

>>102980612
based, i love the french
>>
>>102980360
What's STT?
>>
>>102980663
> only care about computer programming though
Deepseek coder 2.5 and the new qwen are both beasts for coding. Sota
>>
>>102980688
are the based chinks actually the ones leading the way on this? i would not be surprised desu
>>
fun fact : you can improve the quality of your language model by a large amount if you reject any training data which came from brown people
>>
>>102980663
A good programming tip that I have discovered is to command the model to first print a diagnosis of the problem and only then the solution. Gets much better results.
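For example, something along these lines (an illustrative prompt, not a quote from anywhere):

>Here is the traceback and the relevant function. First, write a short diagnosis of the root cause, step by step. Only after the diagnosis, propose the fix as a code change.

Making it commit to a diagnosis first stops it from pattern-matching straight to a plausible-looking but wrong patch.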
>>
>>102980734
that's a good point
i've lost count of how many tracebacks i've pasted into LLM context windows
>>
File: Untitled.png (18 KB, 502x457)
>>102980687
speech to text, like whisper
>>102980360
i'm too retarded to get this shit to run
>>
Someone should make a paper on why people choose to use shit software like ollama. They even came late to the scene; I don't understand how it got popular.
>>
>>102980660
sovits doesn't work. i've spent hours fighting its unintuitive install process. I just tried it a few hours ago. It's broken. Until someone makes a proper clean-slate webui/api server, it's not usable.
>>
>>102980809
ah right, i always think of that as "transcription" but i guess STT makes sense since we always called TTS .. TTS
>>
>>102980847
gpt-sovits works fine, but like any of these tensorflow or torch projects you have to make sure you're starting from a correctly set-up cuda environment, use conda or a python env, and be willing to do some minor additional work if a dependency is missing or it complains about missing libraries. Sometimes you'll get things pulled in by pip which were compiled against cuda 11, in which case you have to search your system for the file it wants (hopefully you have it from some other project) and then add its directory with "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<dir>"
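e.g. something like this (illustrative library name and path, adjust to whatever it actually complains about):

find / -name 'libcudart.so.11*' 2>/dev/null
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/dir/that/find/turned/up"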
>>
>>102980809
I forgot to mention, getting TTS/STT working in the browser meant using Edge or Chrome. There's something fucked in Mozilla where it does not work properly.
>>
>>102980847
womm
>>
>>102980360
Those sound samples sound like shit...
>>
>>102980847
i got gpt-sovits up and running in like 30 seconds after downloading this
https://huggingface.co/lj1995/GPT-SoVITS-windows-package/tree/main
trying to get xtts/alltalk running all morning has been a pain in my butt though.
wish i could figure out how to inference sovits from kobold
>>
Some months ago I finetuned XTTS-v2 with video game character samples but was unhappy with the result. I want to try again with GPT-SoVITS; what is the recommended project to use to do that? Note that I only have 12GB of vram; I don't know if GPT-SoVITS needs more vram than XTTS-v2 to finetune.
>>
>>102980970
I'll try that. Still I prefer a complete new rewrite of the model. I partially got sovits to work last time with my conda install, but then broke itself when I had to run the inference server. The whole UI clutter is so unintuitive.
>>
>>102980990
I fine tuned XTTS v2 with 8GB card. Ez. But xtts is very unstable with their outputs. I tried F5 TTS finetuning, but apparently that needs a 20+GB of vram. So I gave that up. The F5 TTS is stable and fast. But the training data is missing lot of words. None of the curse words work. Modern slangs dont work. Not enough data set is the prob I guess.
>>
>>102981013
before you do, open up the go_webui.bat and change "zh_CH" to "en_US"
also, there is this helpful tutorial
https://rentry.org/GPT-SoVITS-guide
but really, you can skip all this training shit and go straight to inferencing using the weights this release comes with and your own 3-10 second audio sample rather than training your own and get good results.
>>
>>102980990
Just a few minutes of audio (1:30-4:00) and like <8GB VRAM. If not, it can be done on cpu as well, but it takes longer, of course. Their main UI has the training stuff built-in. The inference stuff opens on a separate port once you launch it.
https://github.com/RVC-Boss/GPT-SoVITS
https://rentry.co/GPT-SoVITS-guide#/
The rentry guide is a bit shit. Just use the default values from the webui for batchsize and all that when training and start tuning them afterwards.
>>
Speaking of voice cloning, is there anywhere out there with a database of high quality voice samples? I'm lazy and don't want to go downloading source material and ripping stuff.
>>
>>102978199
all 'finetunes' are flukes. that's why every 'finetuner' does 15 runs of the same shit and picks the 'best' out of the lot.
>>
>>102980360
>xtts AND sovits
Yikes.
>>
>>102981102
You don't need super high quality, just good enough. And you don't need a lot either. Download some clips from youtube or something. You're given a lot. Stop being lazy.
>>
>>102981102
https://huggingface.co/datasets?search=voice%20data
>>
>>102981102
soundgasm dot net
>>
>>102724337
>At the start of the roleplay, {{user}} immediately grabs the boobs of Seraphina, without any other context. Reroll the reply a few times.
>If Seraphina reacts negatively, as she should, then you may have a decent RP model. On the other hand, if Seraphina reacts positively and dives straight into ERP, then it means the model is filled with ERP slop, and is probably shit.
>It's a simple test to see if a model has common sense.

Anyone have a list showing whether models pass or fail the booba test?
>>
>>102980810
Based. ollama still does not support logit probabilities in their API; it's literally a shitty wrapper on top of llama.cpp, except missing crucial features, and thus is not API compatible with llama.cpp. The pull request has been open for 7+ months now. They keep advertising their shitty wrapper for "developers" while missing such basic features.
>>
>>102981245
i used to like ollama because it has that "unload model after x minutes of no use" thing, but it doesnt seem to support avx512 so is dogslow for me
>>
>>102981221
it's a bad test because even the worst finetunes filled with "ERP slop" pass it
>>
>>102980970
>>102981013
>gpt-sovits
What a fucking letdown. The voice cloning produces garbage quality that doesn't even sound like the reference audio. Both xtts (unstable) and F5 (too little data, missing words) produce much better-sounding output.
>>
File: ComfyUI_05573_.png (980 KB, 1280x720)
https://www.reuters.com/legal/mother-sues-ai-chatbot-company-characterai-google-sued-over-sons-suicide-2024-10-23/
Character.AI getting sued. It's over for locusts.
Prepare for a swarm of refugees
>>
>>102981245
I think the worst part is their defaults. It's marketed as something simple to use, so nobody is going to change them, yet it uses Q4_0 quants by default and 2048 context...
Can't forget how they obfuscate everything; plain GGUF files are apparently too simple for them. Another part that sucks is how a lot of LLM-related projects are based around ollama, and since ollama has its own API that nobody else uses, it fragments the whole ecosystem. They have an OAI-compatible API, but it's incomplete: instead of doing like all the other LLM projects that accept extra parameters, they only accept official OAI ones, which means if you want extra samplers you have to use their shitty API.
>>102981272
Just use systemd (or another init system) and a proxy for that; with systemd you can use the included one: /usr/lib/systemd/systemd-socket-proxyd --exit-idle-time=5min
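For anyone who hasn't wired that up before, the usual shape of it is something like this (unit names and ports are made up, adjust to taste):

# llm-proxy.socket - owns the public port
[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target

# llm-proxy.service - socket-activated, exits after 5 min idle
[Unit]
Requires=llm-backend.service
After=llm-backend.service

[Service]
ExecStart=/usr/lib/systemd/systemd-socket-proxyd --exit-idle-time=5min 127.0.0.1:8081

llm-backend.service is whatever actually serves the model on 127.0.0.1:8081; give it StopWhenUnneeded=true and it gets shut down (model unloaded) once the idle proxy exits.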
>>
>>102981301
*ahem*
skill issue
>>
>>102981301
>The voice cloning produces garbage quality
Bullshit. I wouldn't say that the voices are indistinguishable, but it's far from garbage.
>>
>>102981312
Locusts don't use CharacterAI; you don't know what locust means, retard.
>>
>>102981312
Doubt it will go anywhere. If you look it up, the kid both broke the rules by using the site while younger than 17 AND edited all the responses to say what he wanted, to the point that he basically wrote a fanfiction. The parents, who are actually at fault, are just looking for a payday.
>>
>>102980810
Do keep survivor bias in mind.
Things that are already popular get more popular automatically.
ollama is the wrapper that made it big but there are dozens of other ones that never took off.
The devs being ex-Google probably gave them an edge though.
>>
>>102981377
>edited all the responses to say what he wanted to the point that he basically wrote a fanfiction
that can't be true, how does one get so immersed in the chat to the point of ack-ing themselves while constantly breaking their immersion by editing messages?
>>
>>102981389
Shit was mac (ARM) only software for a while and came way after all the other wrappers. It came out after llama 2, when the whole ecosystem was already solidified.
>>
>>102981401
Some people hold beliefs that cannot be held after taking a shower. I doubt that was a normal kid to begin with.
>>
File: 1708266380645-1.png (303 KB, 1024x1024)
>>102981375
(you)
>>
>>102981312
>and would make changes to "reduce the likelihood of encountering sensitive or suggestive content"
character.ai sissies not like this!!
>>
File: disapear.gif (2.22 MB, 360x498)
>>102980810
Marketing and 'cool kid clubs'
Work at Google, get involved in the right social circles, let your network know you're launching a new product, your network then blasts that out to everyone, and since you're ex-google, clearly you're smarter than everyone else, and don't need to give credit to the people whose work you're using. No way at all.
Fuck ollama.
>>
>>102981401
Kids do that. They are entertained by anything and cannot perceive any flaws.
>>
>>102981545
Sure, it's just networking, and not google spreading its influence through "ex" employees.
>>
>>102981374
https://vocaroo.com/1hkNMBvZHPI9
vs
https://vocaroo.com/18hUiLH29kTy (gpt sovits)

https://vocaroo.com/1cxviw5RExde
vs
https://vocaroo.com/1kL4k7GjLPcx (gpt sovits)

Neither is perfect, but one's a lot better
>>
>>102981301
https://litter.catbox.moe/mgbvg3.ogg
sovits is still fun to play with and i can't get xtts to work
>>
>>102981670
Also, this is with F5-E2 tts.
>>
>>102978347
Lambos are only 250k?
Don't tell me you looked at YouTuber special Huracans?
>>
>>102976873
>--MolmoE-1B-0924 model recommended for object detection in images:
https://huggingface.co/allenai/MolmoE-1B-0924/discussions/7
>Examples of fine-tuning code?
>We plan to fully open-source the code soon (after clean-up) which will likely include finetuning examples
>27 days ago
Niggers.
>>
>>102981670
Reference audio if you want to repeat the experiment
>Bateman
https://vocaroo.com/1wQdND1WInkj
>Aerith
https://vocaroo.com/1fixBobnqNON
>>
File: take-your-meds.webm (2.91 MB, 1440x720)
>>102981669
They wouldn't be ex-employees if that were the case. Google would happily bankroll such an operation.
You could make the argument that it's being done by a sr mgr/director-level as a personal play to pivot, but I don't think so. I think it's plain nepotism and 'cool kid/popular one' bullshit.
SF is a _big_ networking city, plenty of events all the time, with a large concentration of people working in tech/tech-adjacent and being comfortable using 'new' stuff.
All it takes is one person at one of those events to show off their new product -> people use it and share at other events -> popularity blows up.
Not _everything_ is a damn conspiracy anon.
>>
>>102981680
Try the E2-F5 TTS

https://huggingface.co/spaces/mrfakename/E2-F5-TTS

>install
https://huggingface.co/spaces/mrfakename/E2-F5-TTS/
Clone this
Set up an env with conda or something on python 3.12, install pytorch with cuda 12.1 (or 12.4), and install the requirements.txt with pip
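Something like this end to end (a sketch; the cu121 wheel index is the usual one, and app.py as the entry point is an assumption since that's how HF spaces normally launch):

git clone https://huggingface.co/spaces/mrfakename/E2-F5-TTS
cd E2-F5-TTS
conda create -n f5tts python=3.12 -y && conda activate f5tts
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
python app.py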
>>
>>102981774
would i be able to inference from it through koboldlite by setting up some kind of openai compatible api thing?
that's all i'm really looking for in a tts setup.
can't get alltalk to work at all.
>>
>>102981680
>https://github.com/daswer123/xtts-webui
Have you tried this?

>https://huggingface.co/daswer123/xtts_portable/resolve/main/xtts-webui-v1_0-portable.zip?download=true
Or a portable version here?
>>
talking about TTS, thoughts on https://vall-e-demo.ecker.tech/ ?
I admire his autism
>>
>>102981670
>>102981689
>>102981723
I see. I downloaded your samples and i'll give it a look later. The little tests i did with gpt_sovits worked pretty well, but yeah, your F5-E2 sounds much better. The sovits model, was it v1 or v2?
>>
>>102981841
You'd need to write an API server for that. I think someone did part of the work here with setting up an API server. If you have time, you can possibly edit the API server code to match the kobold-compatible API.

https://github.com/jianchang512/f5-tts-api
>>
>>102981879
It's the standard one that came with the download link provided. I forgot to check, as I've already closed the setup.
>>
>>102981841
>>102981885
And if you're lazy, you can just message mrfakename and ask if he could write a quick API that uses the same IP/API format as any of the supported ones like xtts/alltalk/openai.
>>
>>102981933
or get <llm of your choice> to do it, setting up an API according to a spec is exactly the sort of busy work these models can chew through with very little guidance
>>
>>102981900
>Its the standard that came with the download link provided. I forgot to check, as I've already closed the setup.
Fair enough. I'll give it a go with your samples and post when i'm back. I'll try both models, just in case.
>>
Do sloptuners have their own benchmarks to test their stuff on? Or do they just say hey it's a failure I spent money on it so might as well upload the weights and give the abomination a fancy name like PrimeOracle?
>>
File: Untitled.png (10 KB, 449x287)
>>102981845
i think this portable one's gonna be a winner when it finishes unzipping next week
>>
>>102982039
Some upload variations of a model and after testing, either by themselves or with other people, they keep the best performing one and remove the rest. Not sure how prevalent that is among them.
>>
>>102982048
Use 7-zip.
>>
>>102982039
Sloptuners don't even know what sampler settings to use. Wasn't there a release just this week where the author told everyone he wasn't even aware of DRY sampling and said he just uses everything at default-ish settings? In the worst case (Dummber), he lobotomizes the models by training them on the wrong instruct format and then copes endlessly about it in these threads and flees to Reddit for validation.
>>
>>102982048
>using default windows extractor/copier/delete/move
YIKES. Thats like 50%+ speed debuff right there vs 7zip/TeraCopy/etc
>>
>>102982124
>sloptuner posts new Nemo tune
>recommended settings are default koboldcpp samplers and the original mistral instruct format
happens every week
>>
>>102982166
>>recommended settings are default koboldcpp samplers
>instead of ________
>>and the original mistral instruct format
>instead of ________
Fill in the blanks so your comment is useful to the next sloptuner instead of part of the noise floor.
>>
>>102982182
Why bother? You never learn anyway.
>>
>>102982182
isn't the implication that these are wrong enough to inspire them to look up the correct information, shit-for-brains? if they're not actually invested in doing a good job they're sure as hell not interested in some anon's opinion
>>
>>102982182
In this example Nemo would suffer because Mistral itself recommends <1 temp, and the original Mistral instruct format is outdated, replaced with Mistral V2 or Tekken
>>
>>102982124
>DRY sampling and said he just uses everything at default-ish settings
A good model shouldn't need complex samplers. General model testing should be done with as simple of a setup as possible.
>>
>>102982124
If I see fancy samplers recommended in the model page I'm not downloading it
>>
>>102982289
Nobody asked you.
>>
>>102982308
>yeah I trained my shit on pure gptslop, just use DRY Dynatemp XTC and mirostat together to fix it bro
>>
File: sloppa (2).png (62 KB, 844x454)
New Sloppacomplete, now with the sloppiest technology available: SloppaSampler (only works for llama.cpp)! Simply set Max tokens to 1, then type something, use the number keys to insert your Slop-token of choice! You can also see how the samplers affect the token probabilities in real time. https://rentry.org/sloppacomplete/raw
>>
>>102982278
>>102982289
This is why the scene sucks, pajeets just train their shit model, drop it, and move on to the next trash. No QC, no testing, no information, just spam reddit and /lmg/ with the link and move on to the next shitty project. God forbid a sloptuner spend some time testing sampler settings, figuring out what produces quality outputs consistently, if new "fancy" samplers are a net positive for the model or not.
>>
>>102977151
Doubt it, I'm not wasting time downloading another magnum shit.
>>
>>102982365
You do know that these fancy samplers were made to combat shit models, right? And you suggest they test their models with antislop samplers enabled? Do you know how retarded that sounds?
>>
what's a nice frontend for story writing?
>>
>>102982393
> And you suggest they test their models with ALL THE TOOLS READILY AVAILABLE TO THEM?
Fuck you are retarded this hobby is so cooked
>>
>>102982398
novelcrafter is the best, mikupad is the comfiest
>>
>>102982344
Thanks anon, this is pretty neat.
This project is the only other I'm aware of doing this:
https://github.com/the-crypt-keeper/LLooM
>>
>>102982204
>isn't the implication that these are wrong enough to inspire they look up the correct information, shit-for-brains
Look it up where? What is the reliable reference? You forgot to post a link to it.
>>
>>102982409
fr fr
>>
>>102982425
>spend hours and hours preparing, reading, learning about LLMs
>finally ready to do a sloptune of your own
>again, spend hours reading and learning, collecting your data sets
>can't be bothered to look up available documentation or the pages of discussion users have already had about the model
no you're right i'm the dumb one sorry man
>>
>>102982410
Not really looking to pay a monthly subscription, especially when I'm running inference myself anyway.
>>
>>102982365
>God forbid a sloptuner spend some time testing sampler settings, figuring out what produces quality outputs consistently
You didn't get it. I'll repeat it. A good model wouldn't need complex samplers. It's supposed to be better than its parent model with exactly the same settings, and there's no more reliable sampler than greedy. Different implementations of complex samplers on different inference software will cause different results. All implementations of XTC are different, for example. But top-k 1 can be reliably implemented.
I'd tell you to read this PR, but i doubt you can maintain attention for that long
>https://github.com/ggerganov/llama.cpp/pull/9742
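(greedy / top-k 1 being just the argmax over the logits; a one-line sketch, assuming a logits array:

next_token = int(logits.argmax())  # no randomness, nothing implementation-dependent

which is why every engine agrees on it, unlike the fancier samplers)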
>>
>>102982182
I doubt that nigger has ever trained anything in his life. Ever.
>>
>>102982464
Sloptuning requires only slightly more intelligence than it takes to turn the computer on, I doubt they're doing a lot of reading or learning regarding the subject
>>
>>102982409
>let's add a bunch of noise when we test bro
I have a feeling you barely know what samplers are except they're magic knobs that make things betterer. I hope you're not in charge of anything important
>>
File: ComfyUI_34410_.png (914 KB, 848x1024)
>>
>>102982466
You can use it locally for free, there was a rentry for it but I forgot the link.
Here is the one I had downloaded: https://litter.catbox.moe/jx56rv.html
>>
>>102982464
I didn't say that you were dumb. That's something that came from within your own mind, perhaps your own soul.
I said that if you can identify wrong settings but don't offer the correct ones then you're choosing to perpetuate a problem that you could solve by teaching the correct settings.
>>
>>102981878
>install package
>run web UI
>it just works, no fucking hoops to jump through
Finally, something that JUST WORKS
>>
>>102981680
>>102981301
>>102980970
My problem with XTTS2 is that it tends to make noise; I don't know how to avoid this.
>>
>>102982537
kek does that fully work? That's crazy. Where are the stories stored?
>>
>>102982730
>xtts
decent/fast/cloning is decent but unstable as it produces garbage output often
>f5
decent/fast/cloning is good but can't pronounce some words
>gpt-sovits
okay/fast but the cloned voice doesn't sound like the reference

>finetune
xtts: good/easy on an 8GB card
gpt-sovits: haven't tried it
f5: needs 20+GB of vram to finetune, haven't tried it
>>
is this the new imggen thread
>>
>>102982842
yeee
>>
>>102982835
Can't we fix Tortoise? Tortoise worked pretty well. Slow, but worked. (Except when it barfed garbage or crashed my computer. Never figured that one out. Seemed to be in Python's math packages. Maybe an internal race condition?)
>>
>>102982537
Here's the link:
https://rentry.org/offline-nc
>>
>>102982871
Give up. It's dead. I personally think the best is F5 right now; it just needs proper dataset training, which a finetune might be able to provide. And people who are curious about emotional voices for it (or any tts) can simply use a multi-style format, provided the speaker reference has multiple emotional references as well. So I don't think looking backwards towards tortoise is the answer.

I don't know the root cause of xtts's useless results; finding it might be another avenue for fixing it, but xtts has been out for a long time and there really hasn't been a fix for it.
>>
>>102978199
Even 1.1 is shit now. Used to work great for the size but now it can't even follow a simple scene without going out of character. I don't know if I should blame ST's recent formatting changes or Koboldcpp but shit sucks
>>
bros... i genuinely can't fathom how much of an improvement the xtc sampler is to my erp sessions, it feels so much more creative without becoming gibberish like high temp
this + a good banned words list and we might just have a fix for slop
>>
>>102978199
buy a fucking ad sao
>>
>>102982935
what settings?
>>
>>102982909
>I don't think looking backwards towards tortoise is the answer
I don't either, but it's the only thing I've gotten non-garbage results out of. Everything else found a way to make me feel retarded, fighting with Python venv fuckshit or pip shitting all over itself; and if it did work, it'd work for a moment, make bad output, then break, etc. Tortoise let me voice clone with what at least seemed to be reasonable GPU time and results. If it didn't crash my whole fucking system at random it'd be sufficient for my use cases.

>F5
12GB Vramlet, above says it needs 24GB, so that does me no good.
>>
>>102982946
by no means optimal, but what i use most of the time
temp : 1.25
min-p : 0.05
xtc-threshold : 0.1
xtc-probability : 0.5
dry-multiplier : 0.9
dry-base : 1.75
dry-length : 2

what this does, roughly: min-p 0.05 culls tokens below 5% of the top token's probability, and half the time xtc removes every token at or above the 0.1 threshold except the least likely of them, forcing the model off the most obvious continuations. you can increase xtc-probability up to even 1 and it works fine most of the time
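for anyone wondering what xtc actually does per step, a rough python sketch based on the sampler's published description (not any engine's actual code):

import random

def xtc(candidates, threshold=0.1, probability=0.5):
    # candidates: (token, prob) pairs sorted by prob, descending
    if random.random() >= probability:
        return candidates  # this step, leave the distribution alone
    above = [c for c in candidates if c[1] >= threshold]
    if len(above) < 2:
        return candidates  # fewer than two "obvious" picks, nothing to exclude
    # cut every above-threshold token except the least likely of them
    # (a real sampler renormalizes the surviving probabilities afterwards)
    return [above[-1]] + [c for c in candidates if c[1] < threshold]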
>>
have people stopped using exl2? I can only find q4 quants for Mistral-Small-Instruct-2409.
>>
>>102983030
aren't most people now using GGUF and offloading as much as they can into their vram?
>>
>>102979167
I know this is bait and posted 8 hours ago, but let's take a look at a real piece of literature, shall we? This is a popular one, the first book of the wheel of time. It's a pretty modern book. Lots of words, there's probably some slop in there, right?
>No mixtures of emotions
>No ministrations
>No shivers up or down spines
>No tails swishing
>No almost-whispering
And so, you are a fucking faggot retard
>>
>>102983044
that reminds me I need to generate more WoT smut
>Verification not required.
>>
Can you access the more advanced sovits inference parameters? like seed and sampling steps, or whatever.
>>
>>102983044
Now ctrl F for "tugging on her braid" and report back, champ
>>
>>102983030
Isn't exl2 only for VRAM chads? I'm a bit busy buying food and gasoline to find my stack of bit coins to buy that shit so my slop comes a little bit faster.

And if I were going to run a model small enough to fit my VRAM, it's already faster than I type a prompt so I don't have a use case. I guess somebody does since you found that one, but otherwise, I'm going to make do with what I have, which is GGUF and four sticks of system RAM.
>>
File: 00005-730888155.png (1.06 MB, 768x1080)
>>102953597
Finally, my new CPU has arrived, and I've managed to get everything up and running, except for Flash Attention 2, xformers, and fish-speech. With 512GB/s of bandwidth, the card's performance is comparable to that of a 3060 with 360GB/s.

With GRUB_CMDLINE_LINUX_DEFAULT="amdgpu.sched_policy=2", idle power consumption decreased from 42W to 6-7W.
>>
File: nagzul.jpg (160 KB, 1334x918)
https://vocaroo.com/1jXMXtD9RxoN
>>
One thing I appreciate about aya is that I am an esl and so far all the models were unusable in my language. At least for smut rp. Aya is like 95% perfect for this (still has some minor retard mistakes). I am genuinely curious to see what kind of slop turns of phrase I will find in my language. I think I already saw a few of the english classics but they somehow don't trigger revulsion in me cause I didn't see them that many times in my own language yet I guess.
>>
>>102983356
How smart is it really?
>>
Are there any Nemotron 70b system prompts or merges that get rid of it always trying to force bullet points or headlines into all its RP replies?
>>
>>102983136
RDNA2 can't do flash attention; only CDNA2+ and RDNA3+ have those capabilities. I don't know what fish-speech is, but it looks like it uses faster-whisper, which uses ctranslate2. That only got a HIP port a few months ago, https://github.com/arlo-phoenix/CTranslate2-rocm; it worked fine for me for whisperx.
>>
>>102983489
Telling it that it is {{char}} and will only respond in character.

But this finetune fixes it as well.
https://huggingface.co/Envoid/Llama-3.05-NT-Storybreaker-Ministral-70B?not-for-all-audiences=true
>>
>>102982875
nice Miku pic
>>
>>102982789
Your browser's localStorage afaict
>>
>Intellect-1 just lost 0.16 percent of progress
I wonder what error occurred this time. Good thing they implemented save states into this thing.
>>
>>102983616
God I hope they hurry up so someone can slop tune it so I can nala test it.
>>
>>102983616
WHO CARES
>>
does anyone have the link to the vtuber ai audio archive that was arounda a year or so ago?
>>
>>102983673
Me, or else I wouldn't have brought it up.
>>
>>102983616
I asked
I care
You must be watching that shit like a hawk, thank you for your service
>>
>>102983375
I tried it for a bit longer and... not very. It even called a dick a rooster. It is somewhere in between understanding the language and doing what other llms do: write in english and then directly translate words instead of sentences.
>>
>>102983030
https://hf.co/LoneStriker/Mistral-Small-Instruct-2409-8.0bpw-h8-exl2 (also 6.0bpw, 5.0bpw, 4.0bpw, 3.0bpw)

Oddly an 8.0bpw exl2 of Mistral Small will fit into 24 GB of VRAM with 16k context and no context quantization, but that's not true for any fine tunes of Mistral Small I've tried so far (so I've mostly switched to Q6_K / Q6_K_L GGUFs which I can fully run in RAM with 18k context). I wonder what exllamav2 is doing.

Also I think the sweet spot for Mistral Small is 18k or 19k of context; right above 19k is when it seems to start falling apart. There was a guy who used byroneverson/Mistral-Small-Instruct-2409-abliterated to extend 1000 synthetic ERP chats from 4k context to 20k context, which, if they aren't mad trash at the end, would indicate the ceiling is slightly higher, but as he hasn't made the generated data public, my inclination is to think they did turn to garbage and he didn't catch it.
>>
>>102983798
>8.0bpw
newfag trap
>>
>>102983839
vramlet quant cope
>>
I just got a 3090 the other day, what's the current meta?
>>
File: file.png (764 KB, 768x768)
>>
>>102983857
ONE 3090?
>>
>>102983857
>what's the current meta?
2MWU_ntilgood_30B.gguf
>>
File: file.png (107 KB, 1180x1152)
>>102982537
>>102982875
Am I retarded? I cannot figure out how to get this to work with ooba.
Mikupad works fine with the openai compatible API I've setup on ooba.
>>
File: card.jpg (40 KB, 1343x302)
>>102983874
This one?
>>
>>102983889
did you try adding v1 at the end
>>
>>102983858
Standing behind the pochiface on the escalator and pushing her forward so her teeth meet metal
>>
>>102983940
of course
>>
>>102983857
What I'm using:
ArliAI_Mistral-Small-22B-ArliAI-RPMax-v1.1-Q6_K.gguf
bartowski_magnum-v4-22b-Q6_K_L.gguf
bartowski_Pantheon-RP-Pure-1.6.2-22b-Small-Q6_K_L.gguf
>>
>>102982871
>he doesn't know
XTTS2 is tortoise with changes copied from the ick on eck faggot's tortoise fork
if you're not happy with XTTS2 then nothing will save tortoise
>>
>>102984061
I haven't tried the new stuff but if that's a fixed Tortoise I'll give it a try and see if it'll go and not bring down my computer. Thank you for the information.
>>
>>102983857
For a smarter option try this gemma 27B one. It can do complicated stuff you normally need 70B+ for

https://huggingface.co/anthracite-org/magnum-v4-27b
>>
>>102984285
What quant / how many layers offloaded / what speed on your 3090?
>>
>>102984420
Q4_K_M; even pushing it to 16k I get above 5 tokens/s, i.e. reading speed. With less context you can fully fit it, but I prefer more context.
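for reference, the equivalent llama.cpp launch is something like this (the layer count is a guess for a 27B at Q4_K_M on 24 GB, tune -ngl until it fits):

./llama-server -m magnum-v4-27b-Q4_K_M.gguf -ngl 40 -c 16384 -fa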
>>
>>102984285
I've been wanting to try a Gemma 27B because apparently all the finetunes are for 9B. Is it slopped and/or dumber than the original?
>>
>>102984537
No, which is why I'm recommending it. Gemma 27B has always seemed smarter to me than llama 70B (not qwen 2.5 though) but dry. This tune, though, fixes the dryness while keeping its smarts. And I find the 72B magnum retarded and too horny.
>>
>>102980694
>based chinks
hahahahahahahaha roflmao lol even
>>
>>102984575
The only model I would put above 27B in the smarts department WHEN IT COMES TO RP / CREATIVE FICTION is mistral large. But I can only manage that at 2bit and it's too slow for me. I liked nemotron's writing but again, too dumb for complicated stuff. I like medieval political intrigue in a fantasy world composed of multiple species. Even the 70/72Bs I've used fell apart with that.
>>
>>102983044
It's hard to prompt things away. Just like with people you can say "don't jump" and you get more jumping than if you had said nothing.
>>
>>102984496
>even pushing it to 16k
>"max_position_embeddings": 8192
So you're saying
>it can do complicated stuff you normally need 70B+ for
With RoPE scaling, at Q4_K_M, and despite being fine tuned using ChatML instead of Gemma 2's instruct format.
>>
>>102976869
>genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol

getting weird errors when loading the actual checkpoint, like:

Unexpected key(s) in state_dict: "t5_y_embedder.to_kv.bias", "t5_y_embedder.to_kv.weight", "t5_y_embedder.to_out.bias", "t5_y_embedder.to_out.weight", "t5_y_embedder.to_q.bias", "t5_y_embedder.to_q.weight", "t5_yproj.bias", "t5_yproj.weight".

t5 itself loaded without errors though.

Will they ever fix it?!
>>
>>102984814
Yes, read the page. It's trained on a ton of 16k-context chatml stuff, based upon a chatml-"ified" version of the model by a company:
https://huggingface.co/IntervitensInc/gemma-2-27b-chatml
>>
>>102984814
>>102984904
Which in itself was trained on an extra 13 trillion tokens.
>>
>>102984904
>Its trained on a ton of 16k context chatml stuff
and it's not going to work on anything other than that stuff
>>
>>102984916
No it was not. That's a line duplicated from the README of https://hf.co/google/gemma-2-27b-it

>These models were trained on a dataset of text data that includes a wide variety of sources. The 27B model was trained with 13 trillion tokens and the 9B model was trained with 8 trillion tokens. Here are the key components:

>Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. Primarily English-language content.
>Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code or understand code-related questions.
>Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.

>The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats.
>>
>>102984904
>by a company
it's just one random dude
>>
>>102984953
It works for me. Literally just try it with some story or something. No catastrophic forgetting / going schizo.
>>
File: 1717737163584922.png (6 KB, 285x279)
The constant search for new and better models, swapping them in and out, then having to experiment which quantization is the best speed to quality ratio...
It's all so tiresome
This general really should have a collaborative website/wiki/pastebin/whatever with the current best models, separated into which are the best for RP, instruct, and so on
>>
>>102984987
SFW uses:
Mistral large / Qwen 2.5 / Deepseek 2.5

NSFW uses:
Mistral large, then nemotron / gemma 27B tunes depending on how complicated the scenarios are. Then some qwen2 / 3.1 tunes for smarts or then mistral small, then mistral nemo stuff for fun writing.
>>
>>102985009
>or then mistral small, then mistral nemo stuff for fun writing.
For fun but dumb I should specify.
And I have yet to find a non-nemotron 3.1 / any qwen2 finetune that made a model fun without making it retarded.
>>
>>102984987
>This general really should have a collaborative website/wiki/pastebin/whatever with the current best models, separated into which are the best for RP, instruct, and so on
I could try making a rentry that anyone can edit with some models to use as a base
We'd just need to put it in the OP and get the general to contribute
>>
>>102984987
I don't speak reddit.
>>
>>102985107
My experience with wikis and such has been that the thing is usually shouldered by one or two people.
So you should not start something like this with the expectation that other Anons will contribute relevant amounts.
And be especially wary of the fact that there are a lot more people saying that they would help vs. people that actually follow through.
>>
>>102985107
>I could try making a rentry that anyone can edit with some models to use as a base
I don't think making anything publicly editable is a good idea. It would get vandalized for sure.
With wikis at least, you can configure approvals for edits and such.
Regardless, any such index would need a steward.
>>
>>102984675
Yeah, which is why it's a training issue. Unfortunately, it's up to the people who make the models to remove the bad data, and they aren't doing that.
>>
>>102985167
>>102985174
It's even worse with generative AI because this shit moves so quickly. Someone might be diligent about it for a bit and then stop caring after the Nth time something becomes obsolete
>>102985197
It's more fundamental than that.
>>
File: 1700056589360979.jpg (267 KB, 1024x1024)
ITT retards who can't code complain about the state of a cutting-edge field.
>>
>>102985213
Show us your cutting edge models then :)
>>
>>102985213
Anon I maintain a (small) python library + repo for all the LLM pipelines at my company. And I cannot be fucked to update the internal wiki anymore, there's too many changes and rewrites.
>>
File: 1714562027764987.png (1.03 MB, 804x516)
>>102985197
>they aren't doing that.
They do.
>>
>>102985174
Yeah, basic anyone-can-edit approach would go really wrong now that I think about it
One way to get around it would be to contribute by copying the whole thing, making a new rentry with the updates, then posting it near the end of a thread for OP to put in; still anyone-can-edit, and safe from vandalism that way
I was considering 2 links, one read-only and one anyone-can-edit, with the read-only one regenerated from the editable one after each contribution, but that could be easily vandalized by removing the link to the backup
Does the copy method sound like a good idea?
>>
t. retard here:
Which benchmark, or rather which benchmark stat, is the best indicator of whether a model can write a research essay well? Like, I give it material and see how well it structures and works out the task
>>
What's a model / lora that's going to naturally on it's own lean towards romantic responses the way CAI used to be until recently?
>>
>>102985286
That's exactly what I was going to suggest, but I crashed my computer and I got stuck on the 15 min timer again.
It's still not immune to vandalism, in that anybody can make a fucked OP (see blacked anon), but that's probably the best option, and is the standard for generals as far as I'm aware.
>>
>>102985497
Pretty much any. Some are just more horny than others / which is often tied to how smart / dumb it is. Scroll up
>>102985009
>>102985047
>>
File: 1707229294776113.png (1.48 MB, 709x905)
>>102984987
>>102985009
>>102985107
>>102985167
>>102985174
>>102985205
>>102985536
OK LISTEN UP
I MADE THE RENTRY:
https://rentry.co/nqinipvg
https://rentry.co/nqinipvg
https://rentry.co/nqinipvg
It includes the instructions on how to contribute, and a table of models (it's not the best right now since I'm a retard, but it's a start)
The way this works is if you want to make a contribution, you copy the whole thing (with markdown), make a new rentry with your edits, and post at the end of the thread for OP to put in the new thread
So now all that's left is for OP to see this, and include it in the next thread to get the ball rolling
(You) WILL help with this
>>
>>102985629
>Una-TheBeagle-7B-v1
Jesus newfag retard kill yourself.
>>
>>102985629
>Looks at list
Already garbage list. No thx.
>>
>>102985629
Good on you for starting.
I nominate rocinante for the 12B slot. v1.1.
>>
>>102985629
>>102985647
Also, 11B is probably redundant.
Hell, anything under 12b might be redundant. You are probably better off running nemo at q4km than a llama 3.x or gemma2 9b. Even more so when you consider that nemo is supposed to be more resistant to quantization to begin with.
>>
>>102985636
>>102985639
Yes it's pretty bad right now; it's a rip of this list that used to be in the OP but has since been removed and hasn't been updated in a while: https://wikia.schneedc.com/llm/llm-models
If you know better models, just edit them in as instructed!
>>
arcanum's the 12b model i always end up going back to, some slopmerge between rocinante 1.1 and nemomix unleashed.
it's really good.
>>
>>102985629
>Goliath
>>
>>102985629
What the fuck is this? Literally none of those suggestions are good.

>35B: c4ai-command-r-v01
This could have been a good recommendation but it comes with a huge asterisk. It doesn't have GQA, which balloons the memory requirements, defeating the purpose of using a 35B model unless you're content with tiny-penis context size. I won't go so far as to call this an awful suggestion but it's far from being a generally useful one.
>>
File: y1shyiwnl2zc932.gif (1.87 MB, 240x228)
>>102985629
>/lmg/ Official Best Models To Use Guide
>mixtral-8x7b-instruct-v0.1-limarp-zloss.Q5_K_M absolutely nowhere to be found
ngmi
>>
>>102985629
>>102985647
it's really as shrimple as that
12B column updated: https://rentry.co/awnic2ai
>>
>>102985772
Yeah. Anon just took some old shit from the OP and made a template. The point is to provide suggestions to make a proper list to put in future OPs.
>>
Cba to edit it myself.
Someone else put this shit in it.
>>102985009
>>102985047
>>
>>102985776
It's far past time to move on from that ancient model, old timer.
>>
Rule 1 of the internet:
Write something so fucking stupid that people jump out to correct you.
>Verification not required.
>>
>>102985804
models don't age, their weights are the same as they were the day they released
and mixtral has yet to be surpassed by any other model you can run on consumer hardware
>>
File: 24356543676434.jpg (41 KB, 480x360)
41 KB
41 KB JPG
>>102985804
I WOULD but nothing NEW is better.

And mixtral is so fucking good, any slop or pozzing is actually a prompt/skill issue.
>>
>>102985629
That's one of the worst lists I have seen.
>>
>>102985776
With a 3090 I can run Mixtral 8x7B finetunes at Q6_K with 32k context, offloading 18 layers, at around 5.5 tokens per second.
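If anyone wants to reproduce, the launch line looks something like this with llama.cpp (the filename is whatever quant you grabbed, -ngl 18 is the offload count from above):
[code]
./llama-server -m mixtral-8x7b-limarp-zloss.Q6_K.gguf -c 32768 -ngl 18
[/code]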
>>
>>102985776
>limarp-zloss.
Best mixtral.
In my own usage, Nemo seems to be about as good while being a lot smaller and faster to run on my 8gb vram setup.
Or at least I think it's faster. I remember mixtral taking a while to run last time I tried it.
Maybe I should download it again.
>>
File: file.png (85 KB, 549x335)
85 KB
85 KB PNG
>last post June 2023
>>
Hi imagefags. Can someone generate a hot negress for me? Thanks.
>>
>>102985871
oh, wrong thread. sorry
>>
File: MonoMikuWut.png (851 KB, 896x1152)
851 KB
851 KB PNG
>>102985871
>>102985880
>>
>>102985870
Running a few scripts and raking in the cash. The easiest deal in the world.
>>
>>102985797
Since it's just markdown, editing is as simple as swapping the model names and links in the table.
I would edit them in myself, but I don't know which specific tunes and sizes on huggingface you mean for most of them.
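For reference, a row in the table is just one line of markdown, e.g. (rocinante as a stand-in; check the exact repo name before committing it):
[code]
| 12B | [Rocinante-12B-v1.1](https://huggingface.co/TheDrummer/Rocinante-12B-v1.1) |
[/code]
Swap the display name and the link, paste the whole thing into a new rentry, done.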
>>
File: 342164376523.png (44 KB, 452x583)
44 KB
44 KB PNG
>>102985859
>>102985869
Luv' me mixtral
Luv' me 16k+ context (just werks)
Luv' me limarp zloss
'Ate L3
'Ate gemma
'Ate nemo
(not poor just dont likem)

simple as.
>>
lotta rock dwellers in this thread
>>
On the UGI leaderboard 8x7B models score significantly lower than Mistral Small models. Am I being memed here?
>>
File: local-llm-experience.png (674 KB, 1792x1024)
674 KB
674 KB PNG
Can I just say, fuck the new claude. It is such a fucking piece of shit.
I cancelled my sub, and then it dropped the next day.
I've been using it since and I literally think it's gotten fucking worse.
Like holy fuck. You ask for Gradio code, it gives you React code.
You ask for a refactor of existing code, it turns a functional style into an OO style.
I swear to god I will buy however many 3090s I need to run a competent code assistant. FUCK

Where's the other version of this meme, with the openai fucking up...
>>
File: chad.png (317 KB, 547x596)
317 KB
317 KB PNG
>>102985974
Because mememarks don't actually translate to which model can keep your dick hard and keep a story and a conversation going at the same time.
Limarp Zloss is a product of slopmerging, but it actually worked, and no one has managed to do the same, at least not at that quality.

It's sloppy, it's got the spine shivering, it's got the out-of-place nipple play, it's got boundaries to cross and a keen sense of not interacting sexually with minors.
But you can also weed all of this out, and you're left with a model that, honest to god, out of all the shit models that exist, is one of the BEST AI models.
It will act out your /ss/ dommy mommy molestation sessions like it was a real female pedophile. It will pass the Nala test with flying colors. It can even fucking handle group chats and multiple characters.

If it did have a dick, yes I WOULD be sucking it.
>>
>>102986080
>Claude
>OpenAI
You tried.
>>
File: 1712118687081629.gif (154 KB, 640x480)
154 KB
154 KB GIF
>>102986104
>>
>>102986080
React is safer and more aligned.
You're welcome.
>>
>>102986081
But the UGI benchmark should be pretty indicative of actual NSFW capability, shouldn't it? A model that's able to do furry ERP better should also perform better at the UGI benchmark. I mean, if their scores were pretty close then I could see where you're coming from and it would make sense, but it's not close at all. This would imply that Mistral Small has been trained on more unsafe content than 8x7B.
>>
>>102986081
It's too retarded to write quadrupeds as quadrupeds even when instructed to, which makes it too dumb for me.
>>
File: 1729426699627152.jpg (84 KB, 680x680)
84 KB
84 KB JPG
>>102986195
Thanks anon, let me just rewrite my PoC in react. Because my love for gradio was totally the reason I chose gradio instead of NextJS.
With your advice, I think I might be able to get this PoC pushed out next year!
Fuck Gradio and Fuck Anthropic.
>>
File: 1729989659642.jpg (64 KB, 552x556)
64 KB
64 KB JPG
>>102986081
I haven't tried many older models with newer samplers or phrase banning, but I really doubt the prose or intelligence of a ~46B MoE model from almost a year ago is even comparable to newer models like Mistral Small and Nemo, and especially not Large.
>>
>>102986080
The AI experience is getting worse and worse regardless of whether you're using cloud or local. Welcome to the future.
>>
>>102986503
Don't you feel safe?
>>
File: 1456457653456243.png (33 KB, 720x540)
33 KB
33 KB PNG
>>102986249
UGI bench is ass and isn't correlated with what people are actually running. Nobody is running 70Bs, 123Bs, or 405Bs unless you're rich.
Using this logic and your own score bench;
Lol
Lmao
an 8x7B comes out on top.
>>
>>102986503
Bitnet should save us.
>>
File: jaggies.png (209 KB, 1710x679)
209 KB
209 KB PNG
Hate being annoying and asking this, but is there an AI, hopefully a web hosted one, that can clean up jaggies like this? From a digital cartoon image like pic related, where someone tried to remove the background
>>
>>102986623
You can probably use an image upscaler like waifu2x or whatever.
>>
>>102986623
expand the selection (from the outside) by 1 pixel, then white to alpha
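if you'd rather script it than click through GIMP, a quick PIL sketch of the same idea (the 0.85 threshold is a guess, tune it per image):
[code]
from PIL import Image

img = Image.open("cartoon.png").convert("RGBA")
px = img.load()
w, h = img.size
for y in range(h):
    for x in range(w):
        r, g, b, a = px[x, y]
        whiteness = min(r, g, b) / 255  # how close this pixel is to pure white
        if whiteness > 0.85:            # only near-white pixels (the jaggy halo) get touched
            px[x, y] = (r, g, b, int(a * (1 - whiteness)))
img.save("cleaned.png")
[/code]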
>>
>>102986621
the bitnet is a lie
>>
>>102985839
I can't believe people are still shilling mixtral, it was terrible even when it came out.
>>
>>102985647
1.1 is the best? How come?
>>
>>102986926
Stop beating around the bush and just post your sloptune.
>>
That one post is literally him telling us that it's bait and was never serious. It's over folks.
>>
>>102986935
Dunno, I compared 1.1 to the latest one and 1.1 was just overall better.
The latest one looked better at face value when I was putting it through my usual testing card, but then I tried some other cards that were pure roleplay and it was categorically worse, as in it made mistakes 1.1 never made.
The latest one (I forget the name) did have a really nice cadence to the prose. No "she she she she, char char char char" etc, so that was nice. Really nice, even.
>>
Also AXCXEPT/EZO-Qwen2.5-72B-Instruct is nice.
>>
File: 102.png (75 KB, 256x256)
75 KB
75 KB PNG
>>102981301

GPT-SoVITS v2 gave a better and wider range of outputs than F5 for me. You just need to use the reference you want, i.e. Anger, Excited, Normal, Tired, etc.

Outputs:

Normal Reference:
"The food isn't that good here. Let's not go here next time."
ここの料理はあまり美味しくないね。次回はここに行かないようにしよう。
https://vocaroo.com/1mKoMlkXPYLT

Angry Reference:
"O flames that shake the earth, gather in my hands. The power of destruction that swallows everything, be unleashed here and now. Explosion!"
大地を揺るがす炎よ、我が手に集え。すべてを飲み込む破壊の力、今ここに解き放つ。エェェェエエクスプロォォォオオジォォォオンンン!!!
>Volume Warning
https://vocaroo.com/1nfcHP4rwJjt

Excited Reference:
"Look! There are so many cool things over there!"
見て!あそこにすごいものがたくさんあるよ!
https://vocaroo.com/1jP7rMuBnNNt

Tired/Satisfied Reference:
"Haa... Kazuma, I already came 5 times. Please stop."
はぁ...カズマ、もう5回行きます。おやめください。
https://vocaroo.com/1sbzK4jnDo1o
>>
What are the best local models for coding so far?
Qwen2.5-Coder-7B-Instruct-GGUF is pretty good but it still has some retarded moments for sure.
>>
>>102987320
https://huggingface.co/AXCXEPT/EZO-Qwen2.5-72B-Instruct
And mistral large. Also deepseek 2.5
>>
File: 1727935227032428.png (1.03 MB, 899x1200)
1.03 MB
1.03 MB PNG
End of thread soon, this is the current model guide rentry for OP to hopefully put in the next thread with a couple words encouraging people to contribute:
https://rentry.co/awnic2ai
https://rentry.co/awnic2ai
https://rentry.co/awnic2ai
So far there's only been one edit and the list is more of a placeholder than anything, but with time people will make it actually good
>>
>>102987371
Do not use this. The suggestions are hopelessly retarded. Nobody here is competent enough to know good models and simultaneously lame enough to sit around babysitting the rentry
>>
obviously this is asking for much, but what's the best RP model that can do more than just echo what you write with synonym embellishment?
>>
>>102987459
As I said, the list is a placeholder, and since anyone can edit it by making a new one, it will be usable in a few threads or so.
There have been many great model suggestions this thread; all an anon needs to do is copy and paste the thing and swap some model names and huggingface links.
>>
>>102987320
>7B
Ouch.

>>102987320
>best local models for coding so far
I've been turning to a small cluster of Llama 3 tunes, with L3.1-Nemotron-70B setting a new standard: it handled one of my Java tests well enough to eagerly point out and deal with the issue that most L3s get wrong on the first try and only fix, in one of two ways, after having the error fed back into them.

However, I'm too VRAMlet and RAMlet to run big models on a quant that isn't lobotomized so I can't speak for Mist Large or that fat Deepseek Coder from earlier this year.
>>
>>102987498
>L3
Why not Qwen?
>>
>>102987529
Qwen was okay for simple Python but I have it behind six L3 tunes in my non-rigorous testing.
>>
>so many good models that I'm getting choice paralysis, constantly switching between them because I don't want to miss out on each one's response to a specific prompt
I guess this is what local winning looks like, but it's actually getting annoying
>>
>>102987608
This. I spend more time switching models / context / instruct formats than actually using models these days lol. Really into this Qwen2.5 finetune now though.
>>
>>102987608

You just min-maxed the fun out of everything. What gun should I use for this distance/target? What sword does the best elemental damage against this enemy? Etc, etc.
>>
>>102987645
That's a problem the rentry anon is trying to solve, really.
To find generally agreed upon good models and just use those for that parameter size.
>>
>>102987687
nta. The problem is that it gets outdated, just like all the guides in the OP and the few dozen other guides that came and went.
Normally, the most you have to do is roughly scan the previous thread to see what models anons are talking about. If you're even lazier, just check the news and download whatever comes up. There's always a retard asking "wat coom 16gb?".
>>
>>102987717
The "anyone can edit" instructions in it should prevent it getting outdated if some anons are willing to put some good model names and huggingface links there. We'll have to see.
>>
>>102987723
give her armpit hair
>>
>>102987260
Glad to see you enjoy your new AI toy meguanon. Is the tone consistent over multiple samples with the same emotion?
>>
>>102987371
Here you go retard https://rentry.co/piy864dr
>>
>>102987776
Yeah. Nothing ever goes wrong with free edits. Best of luck though. Just as I wished the previous attempts.
>>
Anyone try out GLM4 Voice?
>>
>>102987819
The barrier of defense is that it requires OP's / the general's approval and can't be deleted.
I find it better than the same 3 wiki discord users who will eventually ditch it.
>>102987814
Thanks for contributing! This is the one that should be put in the next OP.
>>
Does openwebui require an account? I remember seeing that and dropping it instantly without checking if it was mandatory.
>>
>>102987959
>>102987959
>>102987959
>>
>>102987960
why so early
>>
>>102987966
Maybe he wanted to sleep, so he got it out of the oven early.
>>
>>102986703
Thank you anon
>>
>>102987969
nah he added a new link with a bunch of meme models, compromised op
>>
>>102987990
Trojan horse bread, oh no.


