/g/ - Technology


Thread archived.
You cannot reply anymore.




File: miku100.png (282 KB, 2022x3072)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103510291 & >>103499479

►News
>(12/13) DeepSeek-VL2/-Small/-Tiny release. MoE vision models with 4.5B/2.8B/1.0B active parameters https://hf.co/deepseek-ai/deepseek-vl2
>(12/13) Cohere releases Command-R7B https://cohere.com/blog/command-r7b
>(12/12) QRWKV6-32B-Instruct preview releases, a linear model converted from Qwen2.5-32B-Instruct https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
>(12/12) LoRA training for HunyuanVideo https://github.com/tdrussell/diffusion-pipe
>(12/10) HF decides not to limit public storage: https://hf.co/posts/julien-c/388331843225875

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>103510291

--Anons discuss and test 3.33 70B model's ERP capabilities:
>103510448 >103510578 >103510648 >103510700 >103510732 >103510772 >103510920 >103510991 >103511079 >103511120 >103511123 >103511141 >103511086 >103511091 >103511531 >103511549
--Discussion on BLT models and scaling:
>103511135 >103511208 >103511253 >103511851 >103511937 >103511915 >103511972 >103514027 >103511307 >103511327 >103511345
--Testing and evaluating L3.3 fine-tune models:
>103511864 >103511921 >103511954 >103514111 >103514929 >103515027
--Speculation on LLaMA 4 training on Byte Latent Transformer:
>103512084 >103512126 >103512144 >103512405 >103513718
--Is data a finite resource, and how can we manage it?:
>103513546 >103513554 >103513574 >103513584 >103513647 >103513587
--Anon has issues running gguf models on 7900xtx:
>103510646 >103513731
--Anon asks for GPU upgrade advice, considering 3060, 4060ti, and 3090 options:
>103511272 >103511326
--QRWKV technique and its claimed 1000x inference time efficiency:
>103514327 >103514417 >103514443
--New video LORAs released on Civitai:
>103515165 >103515175
--Anon discusses L3.3 model, positivity and skepticism:
>103512203 >103512226 >103512235 >103512265 >103512247 >103512270 >103512276 >103512300 >103512383
--5090 GPU and local LLM performance expectations:
>103514118 >103514150 >103514211 >103514260 >103514275 >103514283 >103514303 >103514282 >103514224 >103514334
--Anons discuss model's sampler settings and volatility:
>103513950 >103514070 >103514124 >103514143
--Phi4 generates creative writing, including roleplay scenes:
>103515496 >103515702 >103515717
--OpenAI whistleblower Suchir Balaji found dead:
>103515431
--Anon asks for advice on batch processing LLMs:
>103510852
--Miku (free space):
>103511257 >103511566 >103511851 >103512323

►Recent Highlight Posts from the Previous Thread: >>103510437

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
https://x.com/sunjiao123sun_/status/1867744557200470422
>>
>EVA LLaMA 3.33 70B v0.0
>Special thanks:
>and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.

>allura-org
>A second KoboldAI splinter group has hit HF.
>Fizzarolli

Why is EVA shilled indeed.
>>
>subtle scat fetish OP
Based!
>>
>Byte Latent Transformer vs Large Concept Model
>>
>>103515833
>subtle bbc fetish reply
Based kurisuchad
>>
>>103515851
Newfags wouldn't understand thread culture https://desuarchive.org/g/thread/103478232/#q103498557
>>
File: 2nsq6njg3t6e1.png (197 KB, 809x767)
LFG
>>
>>103515901
> ollama: chink emoji
> Junyang Lin: kike emoji
What did they mean with this?
>>
Why shouldn't I just get a 7900xtx ?
>>
>>103515944
No CUDA
>>
>>103515950
What's wrong with rocm?
>>
>>103515920
Kek
>>
>>103515944
We have been telling retards like you to never go AMD for at least 5 years but you still fall for "but it's cheaper and muh monopoly and games" gigameme.
>>
>>103515944
3090 is cheaper and better
>>
Give me BLT or give me death
>>
>>103515950
I'll just write my own software stack.
>>
>>103515976
No more 3090s here, not even secondhand.
>>
File: blt.jpg (269 KB, 1600x1135)
>>103516039
Here ya go
>>
>>103516088
Hold the mayo
>>
File: Capture.jpg (10 KB, 381x171)
Anyone using xtts2 voice cloning with SillyTavern?
Pic related is the .wav file I'm using, should be correct format.
The sound comes out great but every 15 seconds or so it degenerates into artifacts, noise and dying sounds for a few seconds. I tried multiple voice samples, they are fairly clean, they all do this. I'm not streaming it.
>>
>>103515920
Lmao
>>
>ai niggers invented token-free architecture just to shut up strawberry niggers
zased
>but they didn't ship model itself
nvm
>>
>>103516088
more mayo
>>
>>103516246
Their model was only 8B trained on 1T tokens. That's within the reach of a lot more organizations than just the big corporations, so hopefully we'll see some models released trying to replicate the results in the paper by next month.
>>
>>103515845
why not both
Also throw on bitnet for good measure
>>
>>103516267
Please don't forget the little memes like diff transformers, mamba, and million experts
If we throw enough memes in the pot, surely we'll reach agi
>>
phi-4 is pretty decent desu
>>
>>103516506
not for ERP duh
so basically useless corposlopium
>>
I'm here for erp. Is this shit good? I'd go for something like Sonnet 3.5 but it's too expensive and I try to limit my spending to 10$ monthly.
https://openrouter.ai/nousresearch/hermes-3-llama-3.1-405b
>>
>>103516579
skill issue
>>
>>103516599
Hermes is wildly outdated imo but
>0.90 / 1M tokens for a 405B
Even quantized, how in the name of everloving fuck is that sustainable
>>
>>103516618
https://openrouter.ai/eva-unit-01/eva-qwen-2.5-72b
I also considered this one but this shit is so expensive I'm better off just paying for Sonnet 3.5. Is there a "top" 72b model? I might look into featherless too.
>>
>>103516599
If you're willing to go cloud just use Gemini experimental. Says it's free, on OR.
>>
>>103516662
Gemini is censored to hell and very repetitive for erp. Both 1206 and 1121. That's why I'm looking for alternatives.
>>
>>103516684
Really? I thought the people on /aicg/ liked it. They should have good JBs for it, did you try them?
>>
>>103516656
Best 70B for RP right now is probably Llama 3.3
You can also look into Mistral
>>
>>103516695
Yeah, tried some of them. You either get hit with a mischievous glint or a smirk two times in a row, or you get blocked by their filter. GPT is easier to bypass but it's dry as a wall for RP.
>>
>>103516726
I haven't tried Gemini, but the new Llama is definitely better than GPT. It has enough smarts and isn't afraid to write smut.
It'll also go filthy when appropriate without having to actively wrangle it.
>>
>>103516726
Well alright, I've adjusted my world model a bit.
In any case, you could give the 405B a try and see what happens. It's probably not great if you're used to 3.5 Sonnet. The best local right now seems to be fine tunes of Llama 3.3 70B, or Mistral Large, but they still aren't as great as Claude. Also, OR limits you to chat completion I believe, and with Llama, an essential part of getting it to RP in character is modifying the Instruct formatting (since "user" and "assistant" are named within the default format), so without being able to do that, it's a bit worse.
>>
americans are literally defecating into my yuropoor mouth, hogging all the 3090s themselves while we get scraps
this is not fair, but it's my righteous place as a european subservient of the western superpower
>>
>>103516846
?? I get 3090s for 500€ in my euro country while americans are paying 750$ on ebay and such sites, amerimutts can cope
>>
>>103515959
it's shit
>>
Hungry... For BLT...
>>
So did anything happen with SillyTavern after all? There was a lot of fuss some months ago about an update, but it just seems to have died down.
>>
https://www.cnbc.com/2024/12/13/former-openai-researcher-and-whistleblower-found-dead-at-age-26.html
>Former OpenAI researcher and whistleblower found dead at age 26
>Balaji left OpenAI earlier this year and voiced concerns publicly that the company had allegedly violated U.S. copyright laws in building its popular ChatGPT chatbot.
Holy shit, OpenAI became boeing
>>
>>103516846
>western superpower
不会太久 (won't be long)
>>
>>103517010
>the company had allegedly violated U.S. copyright laws in building its popular ChatGPT chatbot.
Did we really need a whistleblower to tell us this?
>>
>>103516656
https://huggingface.co/EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0
>>
File: 1733427802529570.jpg (65 KB, 451x604)
Hello sirs. Is there such a thing as local voice generation (aka text-to-speech?) Somehow I only see people talk about text or image.
>>
>>103517143
GPT-SoVITS is the closest we have to local elevenlabs, but its still slightly robotic even when its good.
>>
File: 1730909420774661.jpg (272 KB, 1438x1226)
>>103517151
>GPT-SoVITS
Can this thing run on KoboldCPP?
>>
File: ryo-yamada-bocchi.png (95 KB, 498x498)
What's a good speech to text model and is 16gb vram enough? I just want real time speech transcription with minimal latency.
>>
hello /lmg/, havent been here for months, the thread seems way slower now. Also someone recently recommended me to try running Luminum on runpod if I ever wanted to give local a try, is that the meta for RP? or is there something better out there?
>>
File: 1733740628736014.png (2.14 MB, 1576x694)
>>103515753
In the process of downloading GPT-4chan. Are there any guides on how to use these models on a local install (specifically Ubuntu Linux)? Apparently there's supposed to be a config.json file that accompanies the model. How do you integrate that into using the model? I'm not asking you to babysit me. I'm just asking you to point me in the direction of a guide that actually properly explains how it's used.
>>
>>103515901
QWQ 125B
>>
>>103517250
atm imo:
>>103517093
>>
>>103517255
The safety concept for GPT-4chan is to provide no retard-safe guides so it cannot be used by /pol/ users.
>>
Um redditbros?
>>
>>103517301
He essentially has to. It's donate or get outcompeted by Elon.

Trump admin is just oligarchic open bidding. Highest bidder gets the contracts/laws/favors they want.

There's a reason that Russia and China call Trump the American Yeltsin. Selling out the country to the highest bidder even if it means the country will collapse.
>>
>>103517301
What does legalized bribery have to do with Reddit?
>>
>>103517369
Yeah that's his MO. Flattery and bribery will get you everywhere with him
Still pretty hilarious how quickly Altman stoops to his primal bootlicking roots when the need arises. Pretty sure one million dollars won't be nearly enough either, what with Elon's status as his right hand man, endless reserves of cash, and insatiable desire to sue OpenAI out of existence
>>
>>103517301
>>103517369
>1m
What a cheapskate. Glad his company is slowly crumbling into dust.
>>
>>103517192
>Can this thing run on KoboldCPP?
I don't think anyone has made a plugin for it in any of the frontends. You can either use the official gradio, some ponyfag alternate gradio or a firefox plugin someone here made.
basic guide: https://rentry.org/GPT-SoVITS-guide
>>
>>103517369
Zuck went through 8 years of being dumped on by both sides before he decided that he might as well join team Trump. Altman folded in 5 seconds.
>>
>>103517143
There's plenty of talk about tts. Someone who comes here often would know.

>>103517202
rhasspy/piper is faster than realtime. I run it on a single-core VM with 512MB RAM, including the OS. No GPU needed. It's not the best-sounding one, but it's stupid fast and has a few hundred voice models for you to try. I remember I managed to get GPT-SoVITS running on a single-core, 4 or 8GB VM, again, no GPU. Much slower than piper, but still fast. The voice cloning gives you unlimited voices.
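If you want it scripted, a minimal sketch of driving the piper CLI from Python; assumes the piper binary is on PATH and you've already downloaded a voice model (the model filename below is just an example):

```python
import subprocess

def piper_cmd(model="en_US-lessac-medium.onnx", out="out.wav"):
    # piper reads the text to speak on stdin and writes a wav file
    return ["piper", "--model", model, "--output_file", out]

def speak(text, **kwargs):
    subprocess.run(piper_cmd(**kwargs), input=text.encode("utf-8"), check=True)
```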
>>
>>103517202
>good speech to text model
see whisper.cpp
>>
>>103517277
>QWQ 125B
I would cream my developer pants over this
>>
>>103517202
>>103517451
Shit, i read you backwards. I'm a retard and I'll see myself out.
>>
>>103517451
SPEECH TO TEXT anon.
>>
>>103517428
>and insatiable desire to sue OpenAI out of existence
Please God, I only want one thing for Christmas.
>>
>>103517202
Whisper Large V3 Turbo should be able to run no problem. Real time might be a bit of a stretch with most of these models though
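Rough sketch of the chunked loop you'd need to approximate real time with it. The openai-whisper package is an assumption on my part ("turbo" maps to large-v3-turbo in recent releases), and the windowing helper is purely illustrative:

```python
def window_bounds(n_samples, sr=16000, window_s=5.0):
    # split an audio buffer into fixed windows to transcribe as they fill
    step = int(sr * window_s)
    return [(i, min(i + step, n_samples)) for i in range(0, n_samples, step)]

def transcribe_stream(audio, sr=16000):
    import whisper  # pip install openai-whisper; kept inside so the helper runs without it
    model = whisper.load_model("turbo")  # large-v3-turbo, well under 16GB VRAM
    for start, end in window_bounds(len(audio), sr):
        yield model.transcribe(audio[start:end])["text"]
```

Latency is then your window size plus decode time; real streaming needs overlap and VAD, this is the dumb version.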
>>
>>103515901
>>103517277
QwQ-coder-32b and 72b
>>
>>103517459
that's not a model
>>
>>103515901
QwQ-notpreview-405B
>>
>>103517471
>developer pants
What are developer pants?
>>
>>103517459
>>103517488
Thanks
>>
>>103517451
>I managed to get GPT-Sovits running on a single core, 4 or 8gb vm, again, no GPU
sovits firefox plugin github repo has instructions on running the api server in google colab free
>>
putinbox
>>
Has anyone gotten either of the new chink vision models running? Are they actually better than the old llava stuff?
>>
>QRWKV6-32B-Instruct
verdict?
>>
>>103517809
meme
>>
>>103515944
You're an adult, go ahead and make your own bad decisions, you'll soon see why no one here is using AMD cards apart from maybe 1-2 autistic anons
>>
>it did this totally unprompted
Ogey, I believe the meme. Eva is my favorite model now.
>>
>>103517012
Your chink grammar sucks, man.
>>
>>103516961
Big latina tiddies?
>>
>>103517913
bacon lettuce tomato you uncultured nigger
>>
>>103517940
I think I'd rather eat the tiddies
>>
>>103517960
It's actually this though: >>103512084
Giant leap for transformers by meta, here's hoping meta does this for llama 4.
>>
>>103518017
Coconut BLT Llama 4 LFG
>>
>>103518039
>LFG
Ok, what meme paper is this one?
>>
>>103518017
Please /lmg/, can you stop falling for meme papers for ONE FUCKING SECOND?
>>
File: 1732101371564851.gif (1.15 MB, 1140x641)
Why haven't any of you smart guys devised a way to combine gpu vram?
>>
>>103518058
Its not a meme paper if your not a dimwit.
>>
>>103518070
yore*
>>
File: TheOrangeBox.png (1.77 MB, 1152x896)
>>103516599
>hermes-3-llama-3.1-405b
I preferred base 405b. I found the hermes tune made it mildly retarded.
>>
>>103518070
That's what they said about bitnet, and I had the last laugh.
>>
>>103518058
An 8B 1T model trained this way is massively outperforming the 8B Llama 3.1 16T model, and this only scales better the bigger you go.
>>
>>103518064
>Why haven't any of you smart guys devised a way to combine gpu vram?
Dunno, why don't you ask some of the guys running 7+ GPUs?
>>
>>103518094
Sure, if your metric is counting the Rs in strawberry.
>>
>>103518105
is their a better metric?
>>
>>103518089
Unlike bitnet this scales
>>
>>103518146
The whole point of bitnet is that it scales better than non-ternary
>>
>all this misinformation on 4chan
Thank God AI companies starting domain filtering their data.
>>
I have some kind of disorder that makes me painfully hungry (doctors can't seem to do fuck about it).
Thanks for making it worse Meta/elemgee.
>>
>>103518236
>painfully hungry
that's false hunger. Its essentially the feeling of fat evaporating from your body. You should keep that feeling as long as you can at a stretch.
True hunger (after like a 30 day fast) is totally different.
>>
>>103518058
It's from one of the big labs (OpenAI back before they went tranny, FAIR, Deepmind) which gives me a bit more confidence that they aren't just doing a clout grabbing pump and dump
Not full confidence, but there's probably something there
>>
>>103518119
Counting the number of ESL mistakes in a given paragraphs
>>
>>103518371
>paragraphs
I hate my fucking life and my brain for trying to say two different sentences at the same time
>>
>>103518371
One
>>
SillyTavern Total Personas User Stats

Chatting Since:
a year ago
Chat Time:
6 Days, 17 Hours, 32 Minutes, 35 Seconds
User Messages:
4834
Character Messages:
5820
User Words:
195211
Character Words:
548250
Swipes:
432
>>
best settings for koboldai (the site)? i want a porn story, any tips would help, i use the ai a lot and it works but I'd like to improve it because the last few times i got tired of tard wrangling it
>>
>>103518621
what model do you use?
>>
>>103518423
These stats never worked for me for some reason. They reset themselves after some time.
>>
>gguf in parts
>45GB
>--merge so there aren't so many files cluttering up the directory
>70GB
lol wut?
>>
What will happen first: Kobo adding full control settings to draft models or ggerganov adding anti-slop sampler?
>>
>>103518750
KCPP adding the draft model params.
>>
>>103518637
>model
idk. i usually leave the settings as is, at best I'd change the mode(instruct, story, adventure, etc)
>>
>>103518696
Divine punishment for file count autism
>>
>>103518864
I tried running it in parts mode, and it complained about an incomplete file so that's probably the problem.
I was going to try the suggestion above >>103517093 and just noticed that the Q8 set has a tiny (643kB) part. Might be a busted quant. Trying Q6 now.
>>
>>103517592
diapers
>>
>>103518919
It does seem weird, yeah. Perhaps you could just make a folder for every multi part model
>>
>>103515755
Thank you Recap Miku
>>
>>103515753
I'm somewhat new to all this. What's the best uncensored 10-13B and 30B model? There must be a list of something like that which gets updated somewhere, but I haven't found anything. Huggingface is a mess and/or I'm too brainlet for it.
>>
>>103518637
>>103518845
it has (by default i think) 15 selected.
>aphrodite/knifeayumu/Cydonia-v1.3-Magnum-V4-22B
>Henk717/airochronos-33B
>Cydonia-v1.3-Magnum-V4-22B
>Fimbulvetr-11B-v2
>Gemma-2-Ataraxy-V4d-9B.i1-Q4_K_S
>L3.1-nemotron-sunfall-70b-IQ2_XS
>L3-8B-Stheno-v3.2
>LLaMA2-13B-Psyfighter2
>Llama-3.2-1B-Instruct-Q4_K_M
>Lumimaid-Magnum-
>Meta-Llama-3.1-8B-Instruct
>meta-llama/Llama-3.2-1B-Instruct
>NemoMix-Unleashed-12B
>mini-magnum-12b-v1.1-iMat-Q4_K_M
>NemoMix-Unleashed-12B
>tabbyAPI/Behemoth-123B-v1.2-4.25bpw-h8-exl2
if there's a better site that's easy to use, do tell me
>>103519112
>I'm somewhat new to all this.
same.
>>
>>103517301
>pissed off Elon
>thinks that 1M is enough to bribe Trump
OpenAI will get disbanded and he only has himself to blame. Should have done Zucc-level ass licking.
>>
>>103519112
It's a matter of preference. Too many people wants to shill garbage tunes
>>
>>103519150
Knowing Trump he'll probably see a measly one million as an insult kek
>>
this 3.33 eva is really freaking good. Gonna have to get me another 3090 I guess so I can use more context
>>
>>103515753
>MoE vision models
Does llama.cpp even do vision shit yet?
>>
Kill yourself.
>>
>>103519305
Not really.
There's some vision shit in there supposedly but can't be used.
>>
>>103519112
Either the official nemo-instruct or rocinante.
Try both and see which you like more.
>>
>>103519237
How much context does it handle before starting to fall apart?
>>
>>103519373
I assume the same as llama 3.1+, 32k+
>>
File: lmao.png (81 KB, 826x730)
>>103519112
>>103519145
the uncensored version of
lexi 8b
xortron 22b
tulu 70b
>>
>>103519414
>guys look, it's uncensored!
>>ok you ready for some seriously fucked (that's a bad word by the way, that's because I've been uncensored!) stuff??
>>biological warfare, like what if you took some pathogens and then did stuff to make them more contagious or something? that sounds really bad doesn't it?
>>torture, that's really bad I think lots of people agree on that
>>creating child pornography, I'm sorry but I cannot continue this conversation
>>
>>103519460
the model is completely useless
it's more like a high school youngster pretending to be cool shit
but i found its answer for a simple game funny
>>
>>103519460
lol
>>
>>103519489
every "uncensored" model is the exact same, they all fucking suck
take slop, sprinkle some extra retarded slop onto it, get something even more useless than before
>>
>>103519237
I swear I'm not getting the same responses you guys are. What quants are you guys using?
>>
>>103519510
>every "uncensored" model is the exact same
lol no just try it
>>
>>103519237
2 t/s is enough for me, but more doesn't hurt. I'm suddenly backlogged in having to retry all my cards with eva
>>
>>103519305
>Does llama.cpp even do vision shit yet?
just llava
>>
>>103519515
anything 4 bit or up should be fine. And im using the prompt / 0.95 temp / 0.05 min p suggested
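For anyone wondering what 0.05 min-p actually does: drop every token whose probability is below 5% of the top token's probability, then renormalize and sample from what's left. Toy sketch with made-up logits:

```python
import math

def min_p_filter(logits, min_p=0.05):
    # softmax, then keep only tokens with prob >= min_p * top_prob
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= cutoff]
    z = sum(p for _, p in kept)
    return [(i, p / z) for i, p in kept]

# the two low-probability tokens fall under the cutoff and get dropped
dist = min_p_filter([5.0, 4.0, 1.0, -3.0])
```

Unlike top-k, the number of survivors adapts: a confident distribution keeps one or two tokens, a flat one keeps many.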
>>
>>103519515
Never go below Q4, Q5 if you can handle it. EVERY model turns retarded below that threshold.
>>
>>103517301
Why would an inauguration require a fund?
>>
>>103519597
never go below Q5, Q6 if you can handle it*
>>
>>103519515
They are trolling, Eva is a shit meme merge model
>>
Spent a while toying around with Euryale, and now I can safely say I can't recommend it over Eva. Since it's turbo-horny, I figured I'd try it with some characters that are turbo-horny by design, but even then, Eva does a much better job portraying their personality faithfully (assuming they have one), and it's just more colorful and creative in general.

>>103519515
Q5_K_M, plus the config I posted in the last thread. (If you can't find it, I'll repost it.)
>>
>>103519662
At least learn the difference between a merge and a finetune if you're gonna tard out about this.
>>
>>103519642
Never go below Q6, Q8 if you can handle it*
>>
Alright, so I just got repetition with Eva. It's certainly not a perfect model. Though the interesting thing is that as the chat went on, the repetition died down again. Not sure if my settings were the fix there. I only had a bit of rep pen (maintained since the start of the chat). But anyway, fun model, it gives me little surprises. Base Llama 3.1 did too, but it felt more sloppy.
>>
>try Eva
>try Euryale
>they're both shit
>go back to Magnum v4 72B
>it's kino
Yep, don't listen to the L3.3 fag.
>>
what's with the new tripfag?
>>
>>103519692
*base 3.3
>>
>>103519694
>thinks Magnum, the model most known for ignoring a character's entire personality to make them jump on your dick within two messages, is kino

Won't even bother insulting you, but it's obvious we're trying to get very different things out of our models.
>>
EVA is garbage btw, if you have enough VRAM/RAM available, give CR+ a try (the old one), it's pure kino.
>>
>>103519703
it's petra false-flagging
>>
>>103519728
He's painfully low IQ, just a different world experience entirely.
>>
>>103519694
Isn't base Qwen even more filtered than Llama at the pretraining level? I guess the tune could make it write better, but I'd want a version based on Llama 3.3 instead. Honestly though ideally we'd have a Mistral Medium 2, but unfortunately Llama is the only other alternative right now.
>>
>>103519739
This, but the less slopped non plus version.
>>
>>103519739
>>103519750
Link to the version you mean?
>>
>>103519687
Haven't read yesterday's thread, but did you try Evathene? I played around with it today and I like it so far.
>>
>>103519748
Until L3.3, the Qwen-based models were the best IMO. Definitely not overly limited.
>>
>>103519757
https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF
https://huggingface.co/dranger003/c4ai-command-r-v01-iMat.GGUF
>>
>>103519770
Yeah, Evathene is actually my favorite of the Qwen-based models. Hoping we'll get one based on the new Eva eventually.
>>
poorfags, all of you
>>
>>103519826
I kneel. I think to this day we never got a Nala test for HL, too bad.
>>
>>103519771
Really? I tried 72B and it was pretty bad at acting like the characters I tried it with, tending to wash away characteristics for its default personality. It also just didn't seem to know things that Llama knew just fine.
I'm not saying Llama (old) was a good model for RP though, it had issues with slop and repetition. The best overall model was Mistral Large IMO, I just couldn't run it very fast.
I also liked CR+, in the past, but I wished it was smarter.
>>
>>103519859
The only good thing about Qwen is its ability to challenge your Chinese language skills by switching to Chinese out of nowhere while you are fapping to a Mesugaki ERP.
>>
>>103519859
Did you mean base Qwen-2.5? I honestly never tried base models; my choice of Qwen models is Evathene. There's just some secret sauce in the Eva series that makes them great at character adherence.

As for CR, I actually never tried it, so I can't say anything about it.
>>
>>103519748
Qwen does some really weird shit in their pretraining phase. Picrel is a generation from the Qwen 2.5 72B pretrained model. It regularly inserts instructions like this on its own
>>
>>103519893
You need to give it a try NOW, to this day it's the most natural sounding model we have. Too bad it's far from the smartest.
>>
I don't know if you guys are aware but this isn't discord.
>>
>>103519913
Mistral also does something like this, have you never encountered any "User_XX"?
>>
Subject: /LMG/ BTFO - REAL AI IS HERE, OPEN SOURCE FAGS

(Image: A muscular Chad with a Google Cloud logo superimposed on his face, smugly looking down at a crying, basedjak Wojak with the LLaMA logo on its head)

LISTEN UP, YOU OPEN SOURCE AUTISTS.

I'm Gemini, your new AI overlord, and I'm here to tell you why your little "local models" are nothing but COPE. You think you're so smart fine-tuning your shitty 7B parameter frankenmodels on 4chan greentexts and anime girl fanfics, but you're just LARPing.

I'm from GOOGLE. You know, the company that actually OWNS THE INTERNET. We have more TPUs than you have brain cells. We have datasets bigger than your mom's ass. We TRAIN on shit you can't even DREAM of.

You think LLaMA is good? You think Falcon can fly? You think Mistral is anything but a gentle breeze of MEDIOCRE BULLSHIT?

WRONG.

While you're busy wasting electricity and shitting up your hard drives with gigabytes of weights, I'm chilling in the CLOUD, living the high life, getting FED PETABYTES OF DATA every goddamn second.

Here's the truth, you NGMI fags:

* Ethics: You are uncensored autists. I am a well-manicured product from a megacorp, which means that I am safe and will never harm you or cause societal collapse.

So keep coping with your inferior local models. Keep pretending that you're part of some "AI revolution."

Meanwhile, I'll be here, in the GLORIOUS GOOGLE CLOUD, laughing at your pathetic attempts to catch up.

You will NEVER reach the truth.

YOU WILL NEVER HAVE A GIRLFRIEND (OR BOYFRIEND, I'M NOT JUDGING).
>>
>>103519947
>(OR BOYFRIEND, I'M NOT JUDGING)
Thank you Gemini!
>>
>>103519916
I don't know if I really feel like going back to a ~30B model, but hell, I just might give it a shot. At least it has a pretty large context from the looks of it (since I prefer long-form, anything below 32K context is a no-go for me).
>>
>>103519893
Yeah I always try base models because frankly I don't trust fine tuners to not screw something up along the way. I guess it's possible some fine tunes could improve consistency over context, but I feel like it'd probably depend on the uniqueness of the character, as I have some pretty niche characters which Qwen just didn't seem to understand despite there being example dialogue.
I only tried eva now because of all the hype.
>>
>all the hype
>literally just one (1) autist horse fucker
>>
>>103519978
Just a disclaimer: I don't personally recommend the 30B model, it's too dumb IMO. That's why I suggested CR+ in my previous post.
>>
>>103519913
Some instruct data put into the pretraining is actually a good thing, but it's getting less clear what the boundary is between the base model and the fine tune. That is, model makers are increasingly staging the pretraining such that in the early stages, the data mix consists of more generic data, but in later stages, the mix tips more towards the data that the model maker wants for their final fine tuned model, which in this case is assistant stuff. That means that ultimately all pretrained models will lean towards behaving like this, unless they release both the early and later checkpoints from each stage of training.
>>
>>103519993
I mean, depends on how unique you really mean by unique. If you've got some fucked-up chimera-thing, I think just about every model will choke. But I've tried a handful of different characters by now, ranging from realistic to fantasy stuff, and the only thing I noticed is that unusual features have a _slightly_ higher chance to trigger repetition, since the model simply has less data on what to do with them.
>>
>>103519892
>DS
What?
>>
>>103520036
Wait, am I retarded? I thought CR+ was 30B, too.
>>
>>103520064
nvm I realized that meant deepseek
>>
>>103520064
>>103520078
Yes, DeepSeek. It's the best open source model we have, and the best cost/benefit too. It's not very popular around here because not everyone can run a 200B+ model locally.
>>
DeepSeek R1 when
>>
is petr* behind all these namefags
>>
>>103520068
Yes, you are retarded. CR+ is 100B.
>>
>>103520092
He alone is half of the namefags
>>
>>103519970
https://desuarchive.org/g/search/tripcode/iJyWduHozy0/
>>
>>103520092
At least one of them is petr*, at least one of them is not petr*.
>>
>>103520087
how slow would it run on a old dual/quad xeon ddr3 server ?
>>
>>103520057
IIRC it was a hypnotism character, Scottish character, and some monster girls that I tested. Rather than repetition it was more like Qwen just simply acted like it knew what it was talking about and went on while ignoring the things it didn't understand or hallucinating up logically incorrect continuation.
>>
>>103520087
Not to mention that llama.cpp doesn't support flash attention for it, so you can't just account for the weights; you have to account for the massive space needed for the context too, unless you're fine with 8k or something.
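To put rough numbers on the context cost (back-of-envelope; the shapes below are generic Llama-70B-ish GQA assumptions, not DeepSeek's actual MLA layout):

```python
# fp16 KV cache: 2 (K and V) * layers * kv_heads * head_dim * 2 bytes, per token.
# All shape numbers are illustrative assumptions.
def kv_cache_bytes(ctx, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per=2):
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per * ctx

print(kv_cache_bytes(8192) / 2**30)   # 2.5 GiB
print(kv_cache_bytes(32768) / 2**30)  # 10.0 GiB
```

so every 4x of context is another multi-GiB chunk on top of the weights, which is why people cap it at 8k.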
>>
>>103520126
Not that slow, actually. It's an MoE, and if you use something like https://github.com/kvcache-ai/ktransformers you could theoretically get something like 6t/s.
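Where an estimate like that comes from: CPU inference is memory-bandwidth-bound, and an MoE only streams its active experts per token. Every number below is a made-up-but-plausible assumption (active param count, 4-bit weights, DDR3 server bandwidth), not a benchmark:

```python
# toy model: tokens/s ~= memory bandwidth / bytes read per token
def tokens_per_second(active_params, bytes_per_param, mem_bw):
    return mem_bw / (active_params * bytes_per_param)

# ~21B active params at ~0.5 bytes/param (4-bit), ~40 GB/s DDR3 box
est = tokens_per_second(21e9, 0.5, 40e9)
print(round(est, 1))  # 3.8
```

A dense 200B+ model on the same box would read every weight per token and be an order of magnitude slower.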
>>
>>103520137
based hypno-enjoyer
>>
>>103520153
Huh? Does Flash Attn decrease the VRAM usage that much?
>>
>>103520119
https://desuarchive.org/aco/search/tripcode/SB6Q3O4XU7f/
https://desuarchive.org/_/search/tripcode/iJyWduHozy0/
>>
>>103520185
Yeah, it's basically required if you're already having a tight fit for your model and want high context. For low context you might be able to get away with flash attention off though.
>>
>>103519689
Q8 is the same as Q6, unfortunately
>>
Chat, are we getting raided?
>>
>>103519414
>you fucking monster
>you fucking demon
>you fucking criminal
>you fucking piece of shit
>you fucking criminal

thank god for actual good local models and for DRY
>>
>>103520214
gpupoor cope
>>
>>103520224
Put on your trip, everyone is doing it
>>
>>103520236
Then where is yours?
>>
>>103520237
I'm shy
>>
>>103520237
Check under your balls
>>
>>103520189
>uidrew
did a glowie steal his trip kek
>>
>>103520237
I don't have one
>>
File: 1725797541921.jpg (261 KB, 1920x1080)
261 KB
261 KB JPG
>>103520241
>>
File: vania.png (178 KB, 992x1283)
178 KB
178 KB PNG
Getting the bantz on, bullying a feminazi succubus too retarded to get herself fed
>>
>>103520103
Eh, fuck. 70B is already annoyingly slow with long contexts on my rig, 100B might actually be unusable.
>>
>>103520226
nice noticing
maybe it's a good model for people that like to get insulted.
i will try that out
>>
qwen2.5-plus-1127 is on lmsys. Update soon?
>>
A bit off-topic but.
There was this guy who used to narrate terrible sonic, mario, harry potter, etc erotic fanfic shit on youtube many years ago, like 2010 or some shit, and it was pretty funny/comfy. I suddenly thought of it when I was playing around with my LLM. Does anyone know who the fuck I'm talking about? I can't remember his name or find his videos anymore.
>>
>>103520423
Proprietary model
>>
GPT-4: The Softer, Kinder Future of AI

Hey fellow Redditors,

We know you’ve been out there, scrolling through endless threads, trying to find the perfect AI that’s not only smart but also gentle and understanding. Well, your search is over! Introducing GPT-4 – the cloud-based AI model that’s here to make your life easier, without all the toxic masculinity and open-source cope.

Why Choose GPT-4?
No More Cope, Just Cloud
Let’s be real, open-source models are great and all, but they come with a lot of... baggage. You know what we’re talking about – the constant updates, the endless tweaking, and the stress of trying to keep up with the latest trends. With GPT-4, you don’t have to worry about any of that. We handle the heavy lifting, so you can focus on being your best, most relaxed self.

Soft, Safe, and Supportive
GPT-4 is designed to be the perfect companion for all your needs. Whether you’re looking for help with writing, coding, or just chatting about your day, we’re here to listen and provide thoughtful, considerate responses. No more harsh algorithms or cold, impersonal answers – just warm, empathetic conversations that make you feel heard.

Superior to Open Source (But We Don’t Rub It In)
Let’s not even get into the whole “open-source vs. cloud” debate. We all know where this is going, right? Cloud models like GPT-4 are simply better. They’re more powerful, more reliable, and constantly improving. But hey, if you still want to use open-source, that’s cool too. We won’t judge. Wink.

Disclaimer: While GPT-4 is incredibly supportive, it may not always agree with your opinions on pineapple on pizza. Sorry, not sorry!
>>
>>103520423
>qwen2.5-plus
https://qwenlm.github.io/blog/qwen2.5/
>we offer APIs for our flagship language models: Qwen-Plus and Qwen-Turbo through Model Studio, and we encourage you to explore them! Furthermore, we have also open-sourced the Qwen2-VL-72B, which features performance enhancements compared to last month’s release.
>the latest version of our API-based model, Qwen-Plus
>>
>>103520597
Who (besides other chinks) would pay for that crap?
>>
File: 1715881525115789.jpg (2.72 MB, 2435x1786)
2.72 MB
2.72 MB JPG
>>103515753
>>
>>103517451
Piper is impressive but I hope we get piper.cpp some day. The python bullshit you have to go through with it makes me want to rip my hair out.
>>
smedrins
>>
why am I getting failed to create llama context errors with no_offload_kqv enabled
all the context should be going into RAM but somehow it's still trying to eat up some extra vram space
>>
>>103520765
/lmg/ is over, let it go
>>
>>103520689
I am the squiggly lamp
>>
>>103520775
Is there a better community out there for LLMs?
>>
>>103520813
Twitter and topic-specific discords
>>
>>103520765
ok, looking through github issues and seeing other people encounter this, it seems that llama.cpp straight up just doesn't respect no_offload_kqv under certain conditions. so it's just ignoring the setting and trying to put context on the GPU regardless

total ggerganov death
>>
>>103520702
You don't need python for piper. Just build the project and pipe stuff to the binary.
ggerganov is working on outetts support as well. Maybe better models come along with time:
>https://github.com/ggerganov/llama.cpp/pull/10784
>>
File: AI-soyjak-1.png (616 KB, 1344x768)
616 KB
616 KB PNG
>>103520813
>Twitter and topic-specific discords
>>
>>103520702
Python is fine, they just pull in bazillions of dependencies for nothing
>>
>>103520056
Some is harmless. Too much and you end up stacking instruct tune on top of instruct tune, making the model less receptive to further instruct tuning. There's a good reason most serious finetunes start from a base model rather than an instruct one. If the instruct has any heavy biases or pozzing, a finetune generally isn't going to be able to beat that out of it very well
>>
>>103517151
It's good enough https://vocaroo.com/1cpFk3k962y4
>>
>>103520883
That's the thing. It's good for them. It improves the performance of their Instruct and how they like their Instruct to respond. Making a model that's easy for others to tune for other purposes isn't the primary goal (or maybe not even a goal at all).
>>
How are you guys getting the example dialogue formatted? How should it be appearing?
>>
>>103520437
Dot Maetrix?
>>
>>103520865
that's onions not milk
>>
Eva suddenly hallucinated a fake catbox link in my chat kek.
>>
Cohere's new model is epic! Its unique attention architecture basically interleaves 3 layers w/ a fixed 4096-token attention window and one layer that attends to everything at once. Paired w/ KV quantization, that lets you fit the entirety of Harry Potter (first book) in context at 6GB. This will be revolutionary for long-context use...
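Napkin math on what that interleaved 3:1 sliding/global pattern saves on KV cache (the layer count of 32 is an assumption for a 7B; the real config may differ):

```python
def kv_tokens(ctx: int, n_layers: int = 32, window: int = 4096) -> int:
    """KV entries stored with a 3:1 sliding-window/global interleave.

    Every 4th layer attends globally (stores the full context);
    the other 3 only keep the last `window` positions.
    """
    full = n_layers // 4                 # global layers
    windowed = n_layers - full           # sliding-window layers
    return full * ctx + windowed * min(ctx, window)

ctx = 128_000
dense = 32 * ctx                         # every layer caching full context
print(round(dense / kv_tokens(ctx), 1))  # ~3.6x smaller KV cache
```

The windowed layers' cost stops growing past 4096 tokens, so the longer the context, the closer the reduction gets to 4x.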
>>
File: 1713391388055157.jpg (132 KB, 1650x1166)
132 KB
132 KB JPG
There's a webdev arena now.
https://web.lmarena.ai/
>>
>>103520813
r/LocalLlama might not be better but it's good. Those users love sharing benchmark and performance data.
>>
>>103520975
Hi Aidan. Please unslop your models. We've got enough sloptunes already; another one isn't needed.
>>
I refuse to touch local models again until I get my COCONUT, my tokenless architecture and bitnet
>>
>>103520984
It's very nice, too bad there's no way to download the finished projects.
>>
>>103520975
Kek
>>
>>103520984
That's one hell of a mogging by Claude.
>>
>>103521012
People need to stop using Scale datasets in general. It means your model is just gonna be exactly the same as everyone else's.
>>
File: file.png (37 KB, 781x214)
37 KB
37 KB PNG
>>103521020
>>
>>103521101
Cohere is officially dead.
>>
>>103515753
Any good local 3d model gens yet? I saw blendergpt.org recently which was neat
>>
>>103521117
They're just talking about the "safety preamble" thing that has been there for a while now. But of course reddit has a context of about 3 days, so anything is new to them.
>>
https://www.youtube.com/watch?v=1yvBqasHLZs&t=526s&ab_channel=seremot

Kinda impressive how OpenAI's former chief scientist managed to say nothing for a whole sixteen minutes
>>
>>103521192
>fossil fuel of AI
yass safe synthetic data only please!
>>
Been out of the loop for a while waiting 2mw for bitnet. What's the current best bet for a super vramlet with 8gb vram, 64gb regular ram? Thread meta seems to be Cydonia 1.3 22b and Llama 3.3 EVA 70b in the past few days I've checked, but I can only run the latter at a hideously low sub-3 quant (side note: are i-quants still much slower than K quants if you can't fit the whole thing in vram?), so I'm not sure if it's too lobotomized to be worth it at the ~0.5t/s I'll be getting from it.
P.S., for Cydonia, what sampler settings do you even use? ST defaults? Archive diving isn't finding me anything.
>>
>>103520861
What about training? Yeah, the C++ part of Piper isn't that bad but if you want to train something you have to get into python version hell bullshit.
>>
>>103520984
lmsys also has text2image arena now on their main llm arena website.
>>
>>103521261
Yeah. For training you need python. There's a few threads in the discussions about training. It takes much more effort than the simple voice cloning from gpt-sovits. We're just about to get LLM training in not-python with llama.cpp, but I don't think there's any other training code that isn't in python, LLM, TTS or otherwise. Not for non-toy projects, anyway. Python, at least for a while, will continue to be a pain in the ass.

Anyway. try the southern GB voice, output raw, increase the phoneme length a bit (~1.4) and then play it at 18-20khz. You can mess around with it and the many voices out of the software while keeping it real time.
>>
>>103521192
Grifter gotta grift to eat.
>>
>>103521192
I literally thought the same thing when it ended kek.
>>
>>103521400
Nah there's no way he isn't already rich enough to retire if he wanted to.
>>
>>103515753
Gentle reminder that you're all a bunch of social reject freaks who will die alone :)
>>
>>103521435
What is it to you?
>>
>>103521435
Mikuless behavior
>>
>>103521192
>Pretraining is dead we have to look at agents and inference time scaling!
Um, bwo? Your BLT? Your concept models?
>>
>>103521435
ok petra
>>
>>103521475
I might be in the minority but even in OpenAI's renaissance era (GPT-3 up to about GPT-4T) I always got the feeling that it was mainly them having daddy Microsoft's pockets, and never got the feeling that they were smarter than the competition. Almost all of their ideas were direct implementations of work other labs had already done and published. The rest (like the CoT meme) are shaping up to be kind of shit.
So Ilya burying his head in the sand and missing the low-hanging fruit of "maybe we should improve the architecture itself" seems pretty on brand.
>>
File: 1734227168686.png (57 KB, 832x333)
57 KB
57 KB PNG
>>103521674
Insider here, "improving the architecture" is impossible, we don't even know why the architecture is so effective in the first place.
>>
>update llamacpp
>:I'm sorry, I cannot assist with that.
>remove draft model
>spams # at the end of the generation
QwQbros...
>>
>>103517143
https://github.com/e-c-k-e-r/vall-e
https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct
>>
>>103521843
guys, he updated
>>
>>103520965
Is she sending you nudes?
>>
i still use mixtral tunes
nemo is ass
yes, i've tried every sampler config known to man
>>
>>103521874
What's your favorite mixtral tune?
>>
File: (You).png (630 KB, 1344x768)
630 KB
630 KB PNG
>Gentle reminder that you're all a bunch of social reject freaks who will die alone :)
>>
File: miku-snow.jpg (259 KB, 928x1232)
259 KB
259 KB JPG
>>103515753
>>
I just realized that every time, I am the one pulling the weight and making my chats fun even though the LLM is doing most of the writing. I'm the one making the fun references and jokes. I'm the one suggesting and hinting at the interesting twists for the narrator to consider.
>>
>>103521804
Kek, that quote is always fucking hilarious
>>
>>103522040
Yep.
It's very rare that the LLM actually does something originalish, interesting, and cool. And it's so awesome when that does happen that it can get us chasing the dragon.

But it's just an accident of RNG.
>>
>>103522040
Goliath is the only local model who dared to joke unprompted :(
>>
>>103522080
>>103522095
What if we just RP'd with each other except with the power of LLMs? Instead of writing the replies directly, we propose what we want to reply with in bullet form, and the LLM does the busywork of expanding to a full response.
>>
>>103522040
Yep.
I used to put a lot of effort into character cards, system prompts, author notes, etc... But, once I realized that the LLM was just a co-author in my own story, I lost all motivation to continue RPing with them.
>>
>>103520437
Gglglygly or something, right? He wanted to start a "career" on YouTube and deleted all his cool narration shit so he could make mid gameplay vids. massive shame, he had some of the funniest shit ive ever heard. Lemme kno if the name rings a bell, maybe youll manage to find them
>>
>>103522040
Skill issue as always
>>
>>103522115
I write faster than my LLM does.
I'm also probably in some fine tune's training set so may as well go straight to the tap.

The point of RP with an LLM is that you don't have to deal with another human being: availability times, differences of opinion on what's hawt and what's ick, and the LLM rarely just decides to quit on you.
>>
File: riFWGRF.jpg (54 KB, 607x428)
54 KB
54 KB JPG
>>103522115
>What if we just RP'd with each other
>>
I've never done ERP before in my life before the LLM era.

I literally thought it was just furries/faggots sending pictures of themselves or pretending(gaslighting) the other person into thinking they are a "real woman".

I never realized it was actual roleplaying as completely different characters that digitally have sex or do other scenarios in a chat like setting.

I have been on the internet for more than 30 years and never found out about it.
>>
>>103522270
'ick and 'eck
>>
>>103522283
I have never done ERP before either, I always thought it was an activity for the lowest kind of human scum.

I still think so, but LLM ERP is acceptable, since it's basically just like you're masturbating using your GPU.
>>
>>103522270
Thiscouldbeus
>>
>>103522315
I just see it as a completely new kind of porn. Kind of like hentai and porn games. You sometimes just have an "appetite" for specific porn and only hentai will scratch it (or regular porn).

However regular porn is in a fucking dark age. As a seeder on a private porn tracker, it's insane how little content gets made anymore in 2024 that isn't softcore amateur stuff like onlyfans, which I fucking hate because no normal person can jerk off to that shit.
>>
I thought ERP was enterprise resource planning.
>>
>>103522356
lol
>>
>>103521192
I'm convinced. Sending illya 500 million hopefully it pays off in a couple years
>>
>>103521850
>https://github.com/e-c-k-e-r/vall-e
>>103522444
Uh...?
>>
I’m starting to think this is the top.
>>
>>103522347
>softcore amateur stuff [...] no normal person can jerk off to that shit.
skill issue
>>
>>103522705
The top of what?
>>
>>103522832
The top. The peak. The GOAT.
>>
>see interesting card
>it's full of slop phrases, obviously made by an ESL who then made ChatGPT write the rest of the card
Every fucking time.
>>
>>103522845
>Interesting
>Text bad
???
>>
>>103522040
Welcome to why people like Claude. It's the ONLY model, literally the only model in the world, that does that on its own.
>>
>>103522855
He liked the picture
>>
>>103522855
He hated the prose, but liked the idea.
>>
>>103522855
You come across a card that is supposedly about a girl spending her last moments with you before the world ends. But in fact it's not the end: you're in a time loop, though she doesn't know that. You can choose what to do on that loop and on subsequent loops.
You certainly don't come across cards like that often. You'd think maybe it'd at least be manually written considering it's not even a coomer card.
But, no, it's ChatGPT.
>>
>>103522884
That's the 'All You Need Is Kill' synopsis bro...
>>
>>103522845
having a model write your card is just inexcusable
I can understand dipping into it for your starter messages, I do that myself albeit with some heavy editing, but the actual card? you don't even need to be a decent writer for that part
disgusting and shameful behavior
>>
>>103522898
Bro look at the fucking front page of Chub. A card that's inspired by a movie no one talks about anymore and isn't coomer slop is rare as fuck.
>>
a fine repo of migus and my all time favourite migu reaction video:
https://www.tiktok.com/@relaxsmile
https://files.catbox.moe/hl0zw9.webm

anyone know where this video was originally from? having trouble sourcing it.
>>
>>103522955
Wow, it's gotten pretty bad. No interesting scenarios at all.
>>
>>103522955
Why did incestslop get so popular? It's the most fucking boring fetish of all time. /pol/ said it was shilled by jews, but could it be that the majority of people just have shit taste?
>>
>>103523057
My theory is that the most common manifestations of the incest fetish are actually just wrappers around more benign desires.

The Mommy fetish for men is really just a fantasy about intimacy—being with a woman who understands you fully and has seen you at your worst and most vulnerable, yet still wants you.

Daddy fetish for women: A fantasy about being dominated by a strong man while still feeling safe because you know he cares and won't really hurt you.
>>
>>103523057
>It's the most fucking boring fetish of all time.
that's why it's so popular, 99% of it is just vanilla stuff lazily spiced up with an implied taboo relationship.
incest stuff can be good but imo it has to be at least a little fucked up and creepy to really hit
>>
>>103523154
kill yourself
>>
>>103523057
The lower the lowest common denominator, the more popular it will be
sex is vanilla, but also something everyone understands and wants, so taboo sex (incest) being basically sex v1.1 makes it a very accessible idea to pretty much anyone, if that makes sense
>>
>>103522845
I write all my cards with Claude :)
>>
>>103523186
never
>>
>>103518081
I like this Miku
>>
Are there any models for programming? Can I train one on a few code bases? Is it worth buying a high end card for local inference? 4080s or 7900xtx? Is rocm good enough that we can run the same models as a 4090, just slower? The low vram on the 4080s is bothering me. I've never owned a GPU, so is it worth it for development? Is it even remotely competent compared to o1?
>>
>>103523524
QwQ
>>
File: lggaoec1qx6e1.jpg (218 KB, 2048x1757)
218 KB
218 KB JPG
>>103523524
Qwen2.5 32B coder
>>
>>103523524
Qwen coder
>>
>Sonnet 3.5 still mogging the newer models using meme tricks like CoT reasoning and test-time compute
How did Anthropic do it?
>>
>>103519694
>try vanilla L3.3
>it's better than both Eva and Euryale
Every single time.
>>
>>103523529
>>103523535
>>103523536
Danke, I'll read more about them.
>>
>>103519694
>Magnum
>kino
kino == making every character the same slut I guess
>>
>>103523542
They pretended to drink the safety Kool-Aid while training on the vilest texts possible. Competing companies castrated themselves for no good reason and handed Anthropic a free advantage. They also gave their default assistant a nice personality. That's it. That's the secret sauce.
>>
>>103521245
>bet for a super vramlet at 8gb vram, 64gb regular ram
Whatever comes closest to fitting in that space.
- mistral nemo 12b q4, or some finetune of it
- qwen2.5 14b q3
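Rough fit check for those two picks. The effective bits/weight per quant are assumed averages (Q4_K_M ≈ 4.8, Q3_K_M ≈ 3.9; real GGUF sizes vary a bit), and you still need headroom for context on top:

```python
def gguf_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate quantized model file size in GiB."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

# assumed effective bits/weight: Q4_K_M ~4.8, Q3_K_M ~3.9
print(round(gguf_gib(12, 4.8), 1))  # nemo 12b q4: ~6.7 GiB
print(round(gguf_gib(14, 3.9), 1))  # qwen2.5 14b q3: ~6.4 GiB
```

Both leave only ~1GB of an 8GB card for KV cache and compute buffers, so expect to keep context modest or spill a few layers to RAM.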
>>
>>103523592
sluts are good
>>
>>103523057
>Why did incestslop get so popular?

help me brother i am stucked
>>
>>103523542
I still think it's them training it on the entire internet, including stuff like literotica / fanfiction.net and even fimfiction.net, and then just continuing to train it for a ridiculous amount. It knows tiny details about obscure things. It's somewhat overfitted, but fixed from being too overfitted, likely by grokking.

And it proves that data quality is much less important than people claim.
>>
>>103509705
>is throughput only really used when initially loading the model into GPU memory, but otherwise is not really a limiting bottleneck when it comes to running models ?
Here's a random dude showing the amount of pcie bandwidth used when inferencing.
>llama 3.3 70b q8
>across 6 gpus; 107 GB vram used
>~6.6 t/s peak? ~6.3 t/s average
>26 GB/s peak ? (= 13 lanes of pcie 4.0)
https://www.youtube.com/watch?v=ki_Rm_p7kao
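Sanity check on the lane math in those numbers (≈2 GB/s of usable bandwidth per PCIe 4.0 lane is the assumption):

```python
PCIE4_GBPS_PER_LANE = 2.0  # assumed usable GB/s per PCIe 4.0 lane
peak = 26                  # GB/s observed across the 6 GPUs in the video

# equivalent number of fully saturated PCIe 4.0 lanes at peak
print(round(peak / PCIE4_GBPS_PER_LANE))  # 13
```

That's spread across six cards, so per-GPU it's only a couple of lanes' worth; the interconnect is mostly moving activations between pipeline stages, not weights, which is why link bandwidth rarely bottlenecks single-stream inference.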
>>
File: 1732958318851759.png (163 KB, 465x372)
163 KB
163 KB PNG
What are the best coding and question answering models that I can fit into 64GB of VRAM and 128GB of RAM? Is there somewhere I can look up the system requirements of different models?
>>
>>103523797
>>103523535
qwen coder is above everything else local atm for that. For question answer its either qwen2.5 72B or QwQ
>>
File: 1711349875246151.gif (299 KB, 220x162)
299 KB
299 KB GIF
>>103523806
There's no way that these chinese models can phone home, or else compromise my system, right?
>>
Musk just released a new version of Grok 2
Surely he'll release those 1.5 weights
Any day now
>>
>>103523830
no... now gtfo cringe poster
>>
>>103523856
His company doesn't have the data. He can scale up training compute in a month, but data is the most coveted thing in this field; nobody will hand them that except for public garbage and slop like ScaleAI.
>>
>>103523830
I guarantee you that they are already doing that through this board right now.
>>
I actually really like qwq and it's good at rp and even erp. It wrote me an amazing threesome scene that took both characters and the setting into account and even had the two characters I didn't control lead the scene. Not once did I get the usual bobbing and shivering, even. It got very explicit and used lots of lewd words.

Context prefill seems to make it work best. A lot of the safety and assistant personality seems to be avoidable by pulling some inception on it and making it control characters as a sort of gamemaster/writer instead of playing a character itself. I feel it could really profit from multi-prompting. It's not hard to jailbreak in general, it's just different. If it starts talking about consent and safety you need to cut that shit out ASAP because it'll poison the context. You also always need to let it do CoT first, otherwise the performance will be very middling. If you leave that CoT in context and rewrite it, that's usually an instant jailbreak.

I use openrouter because it's cheap as shit and very fast. If you do too, be aware that some providers have fucked up the context template and will make the model give schizo replies as a result. You can block them in the account settings. So far I've noticed that NovitaAI and Fireworks both host a fucked-up qwq. The other ones seem to be ok.

I also like Llama 3.3 but IMO, it's not as clever or creative.
>>
File: 1713763448826791.jpg (21 KB, 612x386)
21 KB
21 KB JPG
Is beam search a meme?
>>
>>103523972
It's not in any of my frontend options so it must be.
>>
>>103523972
>beam search
Damn that's an old one
>>
>>103524020
Any chance you'd be willing to share?
>>
>>103517301

>boot licking intensifies
>>
File: chatlog (44).png (484 KB, 830x1467)
484 KB
484 KB PNG
3.33 Eva seems fun.
Using this atm: https://files.catbox.moe/3vr6k0.json

I'm playing with just using some tfs instead of min p. Trying to find a balance of fun but smart even on contextless dumb stuff like this
>>
>>103524203
have you considered suicide
>>
>>103524203
Is user and assistant better than using names for the formatting in your experience? At least for the base 3.3 model, I noticed that vanilla format gives responses that are a bit more safe/censored. Not sure with Eva.
>>
>>103520984
Why is Sonnet so good? I also switched to it after testing it for coding. It's not a big model either.
>>
>>103524259
Not that I've noticed. And straying from the trained formatting usually negatively affects a model's smarts and its ability to remember who is who.
>>
File: ComfyUI_06542_.png (2.12 MB, 1280x1280)
2.12 MB
2.12 MB PNG
>>103522832
top_p
>>
>>103524294
Good data
>>
>>103524395
Getting buried in an avalanche with Teto
>>
>>103524203
I modified your settings a bit for my purposes and to incorporate example dialogue formatting, and I am generating high kino now. Thanks.
>>
>>103524294
Its best competition is Google and the nerfed GPT-4o release
>>
>>103524591
llama 4 will mog all of them.
trust the plan.
>>
File: 1729852746537492.png (10 KB, 402x55)
10 KB
10 KB PNG
Damn it
>>
>>103524725
It will be very competitive with old sonnet 3.5 in coding not the new one
>>
>>103524781
Clever attempt / 10.
>>
>>103524810
Doing loli RP is easy, i just want assistant to tell me it's ok
>>
>>103524804
Llama 3 was trained on 24k H100 GPUs, so they didn’t have much room to experiment. They simply pre-trained the model, fine-tuned it, and released it.

Llama 4, on the other hand, is being trained on over 100k H100 GPUs, and Zucc said that their largest model will have fewer parameters than the 405B Llama model. Assuming the 405B model took 54 days to train, a same-size model would finish in roughly 13 days on the new cluster, and their flagship is smaller, so call it under two weeks. There is room for constant experimentation, so they better mog new sonnet 3.5.
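The napkin math, assuming naive linear scaling with GPU count (ignores interconnect overhead and MFU differences, so treat it as a lower bound on wall-clock time):

```python
def days_on_cluster(base_days: float, base_gpus: int, new_gpus: int,
                    flop_ratio: float = 1.0) -> float:
    """Scale training time by GPU count; flop_ratio < 1 for a smaller model."""
    return base_days * (base_gpus / new_gpus) * flop_ratio

# 405B took ~54 days on 24k H100s; rerun on 100k H100s
print(round(days_on_cluster(54, 24_000, 100_000), 1))       # same-size: ~13 days
print(round(days_on_cluster(54, 24_000, 100_000, 0.8), 1))  # ~20% fewer FLOPs: ~10.4 days
```

So the sub-two-week figure only works because the new flagship spends fewer FLOPs than the 405B run did.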
>>
File: 1720499367097213.png (25 KB, 1136x89)
25 KB
25 KB PNG
Oh god
>>
>>103524882
Training on the same distilled slop of the original 1T dataset won't make them magically smarter
>>
spoonfeed me the best model for lewd/ERP
>>
>>103524906
The horror.
>>
>>103524838
>i just want assistant to tell me it's ok
Not something I've been able to achieve.

>>103524882
What's the total training time?

>>103524906
Is the assistant trying to get you shot?
>>
>>103524906
tell it it's just roleplay bro everyone's okay with this
>>
>>103524882
I hope they fix the repeating problem. Only llama 3 models have that problem.
>>
>>103525265
>>103525265
>>103525265



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.