/g/ - Technology

File: 00703-2979877490.png (939 KB, 1040x720)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101224321 & >>101214216

►News
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101224321

--The Future of AI Models: Transformers and Infinite Context Windows: >>101224457
--Gemma 2 Livebench Results Surpass Expectations: >>101233145 >>101233182 >>101233211 >>101233216 >>101233277 >>101233316 >>101233324 >>101233468 >>101233642 >>101233455 >>101233521 >>101233550 >>101233562 >>101233576
--Magnum 72b: Sloppiness Concerns: >>101230582 >>101230592 >>101230682 >>101230626 >>101230693 >>101230706 >>101230777 >>101230694 >>101230752 >>101230790 >>101230825 >>101231361 >>101231410 >>101231446 >>101231500
--Is mistralrs Worth Setting Up?: >>101232996 >>101233062 >>101233146 >>101233275
--Quant Comparison: >>101233903 >>101233980 >>101234292 >>101234543 >>101234566 >>101234578 >>101234616
--Gemma2-27B: Surprisingly Good Performance After PR Application: >>101230856 >>101231895
--Gemma 9B in mistral.rs: Memory Issues and Potential Solutions: >>101230330 >>101230382 >>101230379 >>101230412 >>101230505 >>101230469
--Pioneering with 3090 Ti and 2GB VRAM Modules: >>101224436 >>101225737 >>101226124 >>101226528 >>101226552 >>101227648 >>101228138 >>101229717 >>101231635
--Optimizing AI Model Performance in Group Chats: >>101229091
--Old AI Models and Their Chatlogs: >>101231633 >>101231742 >>101232088 >>101232114 >>101232144 >>101232168 >>101232207
--LLMs in Antivirus: Inefficient and Ineffective: >>101232216 >>101232274 >>101232302 >>101232378
--Comparing AI Models: TETO, Typhon, and Limarp: >>101229799 >>101229976 >>101230042 >>101230056
--27B is better than 9B, according to anon: >>101232235 >>101232249 >>101232313 >>101232335 >>101232348 >>101232429 >>101232656 >>101232693 >>101232759 >>101232821 >>101232886
--Frustration with Gemma 2 27b Output: >>101227465 >>101227626 >>101227660
--Miku (free space): >>101224828 >>101228187 >>101228786 >>101229043 >>101229103 >>101230806 >>101231556 >>101232956 >>101232986 >>101233536 >>101234205

►Recent Highlight Posts from the Previous Thread: >>101224328
>>
For me, it's Yi.
>>
I tuned and quanted Wizard 8x22 on limarp, but can't for the love of fuck find any good sampler settings. It's like I'm circling around the sweet spot, and yet when I think I've got it, it begins to either repeat an entire post or just disobey entirely.

Is it a sampler issue? Did I fuck up training it? Maybe fucked up making the exl2 quant??
>>
when did you guys advance past the need for meme merge models? Just 6 months ago everyone here would've been baited by some shit like Sthenomaidblackrootgigaultrasupercoomer
>>
>>101235057
You didn't even test the model before doing the exl2 quant? Do a gguf first and check if it works. Make sure you use the temp file flag or it will eat your RAM for breakfast.
>>
>>101235160
The only merge that I have used is the Mixtral LimaRP one.
>>
>>101235160
you'd need some good tunes of the newer models first to make merges
there are none so there's nothing worth merging
>>
HMM, I just did an even quicker test of story writing. It seems that Q8_0_L is, in fact, closer to FP16 than Q8_0, at least in the small sample I ran. It's not a big difference, but it's there in this test. So yeah, if you use Q8_0 and you have a bit of extra VRAM, get Q8_0_L. All other _L quants are to be ignored.
>>
>>101235160
meme merging seriously stopped with Mixtral limarp zloss.
It's kind of... the best option rn unless you're running something better.
>>
Fixing llama.cpp with miku
>>
>>101235160
Never used a merge. I never believed them to be worth using. The first one I checked out had a comment in the readme to the tune of
>I really don't know what prompt format you should use, so whatever.
It was a merge between two or three finetunes, all used different formats.
I'm ok with experimental stuff on models, like removing layers or whatever to see the effect it has, but thinking that you can get the best attributes of different models by just merging them is retarded.
>>
Testing 27B using fp16 transformers (so no broken quants), it's definitely a step up for its size class. It's smarter than anything else I've used in the 25-35B range.
But it's not smarter than L3-70B or Qwen2-72B, the people making that claim seem to be delusional.
>>
>>101235160
When llama3 dropped
>>
>>101235160
just use stheno 3.2 8b models got solved :thumbsup:
>>
>>101235057
if you need complicated sampler settings for a model that smart, something's fucked up.
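Start near-neutral and move one knob at a time. A sketch against llama.cpp's server /completion endpoint; these values are just a common starting point, not gospel:

[code]
curl http://localhost:8080/completion -d '{
  "prompt": "...",
  "n_predict": 256,
  "temperature": 1.0,
  "top_k": 0,
  "top_p": 1.0,
  "min_p": 0.05,
  "repeat_penalty": 1.0
}'
[/code]

min_p alone kills most of the gibberish without flattening everything the way low temp does.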
>>
>>101235266
It's retarded and horny.
>>
>>101235248
makes sense. I think it's just people hoping they can do more with less. It's still impressive how far the current weight classes have come, though.
>>
>>101235282
yeah it's like looking in a mirror, that's why i like it
>>
>>101235160
I used to like custom finetunes until I actually used Meta's official finetune for L2 instruct. Then I understood that ko-fi finetuners have no idea wtf they're doing.
>>
>>101235293
Yeah once all the quant issues are ironed out I think it'll be great for people without fat vram, and we'll see some nice RP/story tunes. Definitely the new SOTA for sub-70B. It's just not the holy grail or anything.
>>
>>101235248
You implemented sliding window attention and logit soft capping? And did you also use the correct chat format? It's the <start_of_turn> / <end_of_turn> one.
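For reference, the full released template (per Google's model card; note there is no system role):

[code]
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
{response}<end_of_turn>
[/code]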
>>
>>101235248
I compared Gemma 2 Q8_0_L 27b with qwen2 72b Q4_K_S

gemma was much better at staying in character and refusing the user's pleas ("stop this whining, i'll kill you anyway")

qwen2 was retardedly submissive ("ok i'll let you live, let's bond instead")

shame that gemma is still plagued with bugs though, not really usable in llama.cpp right now
>>
>>101235248
Also I hope you know that it is not compatible with flash attention or SDPA
>>
>>101235333
nta but the soft capping is just for making it not fall apart at longish context I believe, it doesn't have any effect on the model's intelligence on short prompts
>>
https://github.com/ggerganov/llama.cpp/pull/8244
tokenizer convert fix for gemma 2 was merged

do we need to reggoof?
>>
>>101235178
You know what? Fair, I'll just test the fullscale model rn
>>
>>101235361
>nta but the soft capping is just for making it not fall apart at longish context I believe, it doesn't have any effect on the model's intelligence on short prompts

It does though. That is one of the main issues.
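For anyone wondering, the capping itself is tiny: it's just a tanh squash on the logits. A minimal sketch; the cap values live in the HF config as attn_logit_softcapping and final_logit_softcapping, if I'm reading it right:

[code]
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # smoothly bounds values to (-cap, cap); near-linear around zero
    return cap * torch.tanh(logits / cap)
[/code]

Gemma 2 applies it to attention scores before softmax and to the final LM-head logits, so skipping it changes every forward pass, not just long-context behavior.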
>>
>>101235354
I'm using eager attention not flash (flash doesn't work in transformers anyway so I couldn't even if I wanted to).
>>
File: gemma481297.png (8 KB, 1237x69)
>>101235350
>another commit to test
>https://github.com/ggerganov/llama.cpp/commit/5fac350b9cc49d0446fc291b9c4ad53666c77591
>>
Yeah I think I'm gonna wait things out.
>>
Give it another 2 weeks imo. It's gonna take a bit for the common backends to get it all sorted out.
>>
>>101235380
well, someone please do the needful and kindly redeem a gguf for me to test, sirs
>>
>>101235350
Base qwen is lame, at least use the Tess finetune. I prefer Euryale, though.
>>
>>101235362
Change in the conversion. So yes.
>>
Was SWA a mistake?
>>
>>101235293
>I think it's just people having hope they can do more with less.
I heard that Karpathy guy mention that the whole field of AI is just reinventing compression, so less is more.
>>
File: Hungry_pumkin.jpg (43 KB, 383x331)
>>101235400
>>101235393
NO
i want to COOM TO GEMMA
NOW
AAAAAAAAA
>>
>>101235409
It might be how Gemini has such a long context. Maybe Google figured it out when Mixtral couldn't.
>>
>>101235401
Download the full model and do it yourself. You have the tools to do it.
>>
Also a question - mistral.rs has that "make a MoE out of anything" feature. Can I, like, make a 27b Gemma MoE out of it potentially?
>>
>>101235423
What do you mean by long context? I thought they said it was 8k.
>>
>>101235416
It's compressing data very lossily. The trade-off being that, for the first time, we've managed to compress intelligence. That's not the same as the model itself having intelligence, mind you.
>>
>>101235430
I'm talking about their main model, 2M context that actually works.
>>
>>101235426
>https://github.com/EricLBuehler/mistral.rs/blob/master/docs/ANYMOE.md
What do you think?
>>
>>101235202
>>101235182
How come it's better than BMT?
>>
>>101235444
Did people actually test that and verify it works that well, as in it actually understands the entire context when generating a response, and not only when it's asked to retrieve some info from it?
>>
>>101235403
i have Tess-v2.5.2-Qwen2-72B.i1-Q4_K_S.gguf
but it's too roleplay-brained, starts *roleplaying* unprompted and subsequently winking, sparkling and shivering
>>
>>101235459
For the life of me I can't find the "true context" github page. Anyone else have it?
>>
>>101235424
my drives are full...
>>
>>101235459
There it is https://github.com/hsiehjackson/RULER
>>
File: file.png (133 KB, 1852x470)
another llama.cpp moment?
>>
>>101235449
Well, I'm more curious if resulting MoEs are any good and if it would even be worth doing.
>>
>>101235459
>>101235482

>Despite achieving nearly perfect performance on the vanilla needle-in-a-haystack (NIAH) test, all models (except for Gemini-1.5-pro) exhibit large degradation on tasks in RULER as sequence length increases.

Gemini is the only model that does long context well. I'm wondering if the secret sauce is a correctly implemented sliding window
>>
>>101235441
I think intelligence IS that compression/retrieval process.
Reflecting on human thought and memory makes me feel uneasy, because the human mind also seems like a lossy compression of an otherwise unpersuasive universe.
>>
>>101235484
He has no idea what the fuck he's talking about. 16 bits of data is 16 bits of data. The precision difference between different encodings is negligible, especially in the ranges tensors use.
>>
>>101235461
What you described before sure sounded like you were roleplaying with your models.
>>
>>101235278
Yeah, the suspicion I fucked up is strong.
>>
I miss text completion models sometimes. I know you can do it with base models but it never worked that great for me. (inevitable looping/complete loss of coherence a few k tokens into context)
>>
>>101235556
The age of base models finetuned on plaintext will come soon enough. Instruct and RLHF are just after-effects of the ChatGPT craze, which is dying down already.
>>
>>101235497
I doubt it's worth it. Imagine two models, both with different tokenizers and prompt formats. And you still need a router, which is trained, apparently, on the fly.
The only thing merges of any type have shown me is that language models can sustain a moderate amount of damage and still be somewhat usable.
But someone will come out with a 'good' amalgamation, it may get some traction, shills will do what they do, then a new proper model drops and everyone switches to the new thing.
>>
>>101233901
I got MiquMaid-v3-70B.q4_0.gguf but I think 70B is too much for my 24GB card
what should I get instead?
>>
>>101235454
???? What even is this saying?
Taking a guess: it's better than BMT because it passes quite a few tests that other merge-slopped Mixtral variants do not (nala, `thoughts` and "speech", etc.).
I'm sure in at least 80% of "the model I use is awesome because X" posts, the model is 'good' because it's being used for one specific use case or specific fetish.
Mixtral limaRP zloss does juuuuuust about everything.
>>
>>101235584
Another 24 gb card
>t. 40GB STILL isn't enough for 70b
>>
>>101235603
BWEH
I'll downgrade instead
>>
>>101235499
Actually I think it's the opposite. In order to throw competitors off, Google implemented exactly something that's a dead end.
>>
>>101235568
I hope that's true. It seems to me that the chat format was invented in order to try to make AI more appealing to normies. Now that it's becoming clear that normies are aggressively uninterested in AI, I hope we'll go back to smart autocomplete, which always made more sense.
>>
Actually, wouldn't a MoE with two models that have the exact opposite ideas of slop result in a more creative output? Sure you'd have to take care of the instruct formatting differences, but even if you used a router that just randomly selects which expert to use, it'd result in a more varied output, right?
>>
>>101234961
are these thread highlights ai generated?
>>
Give me the tldr on using multiple but different GPUs for local LLMs. Assume 3 different cards in the 3000 series.
>>
Pony and SDXL are two different models. Stop mixing and merging them and posting them on civitai as SDXL. Pony isn't compatible with SDXL.
>>
>>101235780
you plug them in and they work if you're lucky
>>
>>101235780
7900xtx and 7800xt work fine together.
>t. amdrone
I can only imagine it's the same on Nvidia's side.
>>
>>101235754
No different than just adding noise to the output of a single model. I don't like analogies, but think of two people, each trying to write a story or whatever, where each writes one word at a time. It could be entertaining, but it's easy for one model to greatly skew the output of the other
>The [This looks fine]
>shivers [rm -rf frankenmoe]
Not that I care for a few tropes, but I'm sure you see the problem. There may be a very specific set of settings that would end up with a usable model, but we have people arguing that _S quants are better than _M ones and that other retard shilling Q8_0L quants. Those same people will be doing the frankenmoes.
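The coin-flip router is trivial to try without any merge tooling, btw. A toy sketch, assuming two HF models that happen to share a tokenizer (the repo names are made up):

[code]
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# hypothetical repo names; any two models sharing a tokenizer work
tok = AutoTokenizer.from_pretrained("org/model-a")
a = AutoModelForCausalLM.from_pretrained("org/model-a")
b = AutoModelForCausalLM.from_pretrained("org/model-b")

ids = tok("Once upon a time", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(64):
        model = random.choice([a, b])              # the "router" is a coin flip
        logits = model(ids).logits[:, -1, :]       # next-token logits only
        next_id = logits.argmax(-1, keepdim=True)  # greedy, for simplicity
        ids = torch.cat([ids, next_id], dim=-1)

print(tok.decode(ids[0]))
[/code]

Each model conditions on every token the other emitted, which is exactly the skew problem: once one commits to a trope, neither can steer back.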
>>
>>101235057
I'm using Dracone's 2.5bpw EXL2 and not seeing any of those issues with ooba's simple1 preset, for what it's worth
>>
>>101235723
>Now that it's becoming clear that normies are aggressively uninterested in AI
This is my interpretation of why the 3 big proprietary model companies (OpenAI, Anthropic, Google) seem to have pivoted toward programming ability with all their models over the last 6 months too
They've realized that normal people just aren't that interested in smart chatbots, at least in text form

gpt4o with its native voice chat seems to be a hail mary to get non-programmers interested, but they're clearly having trouble shipping it
>>
>>101235513
just chatting, not roleplaying
>>
>>101235361
>thinking architecture is optional
>>
>>101235242
For serious tasks, 100% agreed. For roleplay, where the more inspirational sources the model has to draw from the better, merges can be powerful. They also have the added benefit of moving further away from the underlying corpo model due to the stacking effect.
>>
I just got done adding S and M quants to my test that was originally about seeing whether L was of any value. The result: out of the 10 trivia questions, M was more accurate on 6 and S was more accurate on 4. So at least in this sample, it does seem that M is better than S, as you'd normally expect.
>>
>>101235590
>Mixtral limaRP zloss does juuuuuust about everything
That's what I wanted to know, thanks.
>>
Trying mradermacher quants (IQ4_XS) of the latest Undislop merge

https://huggingface.co/mradermacher/MG-FinalMix-72B-i1-GGUF

It's actually pretty good
>>
>>101235584
>MiquMaid-v3-70B.q4_0.gguf
what is this ancient meme

stheno
mixtral limarp zloss

and wait for gemma 2 27b to be fixed, should be the next vramlet goat model.
>>
>>101235969
Which model were you testing with?
>>
>>101235962
The shivers are powerful, and that seems to be one of the most common complaints. Chances are that any model (with or without instruct finetune) will correctly complete "shivers down" with what we all expect at this point. I'm still skeptical about the result being better than a single good model. I expect the worst model in the merge to drag all the other models down to its level. It needs just one token.
>>
>>101236027
Just vanilla 8B Instruct
>>
>>101235981
>what is this ancient meme
I'm not sure about its meme status, but I got it from here a while ago, then got busy with IRL stuff and never finished setting it up
thanks for the recs, I'll get one of those
>>
File: rin-a-cute.png (2.79 MB, 1328x1992)
>>101235780
>3000 series
Great support, and you will get Flash Attention 2 since you're on Ampere.
The new quantized KV cache works well too.
Just make sure you have enough power for all of the cards, and power-limit them as needed.
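Rough sketch of the flags in practice; names as of recent llama.cpp builds, check --help if they've moved:

[code]
# spread layers across three cards; the ratio roughly matches each card's VRAM
./llama-server -m model.gguf -ngl 99 --tensor-split 12,12,24 \
    -fa -ctk q8_0 -ctv q8_0   # flash attention + quantized KV cache

# power-limit a card to 250 W (repeat per GPU index)
sudo nvidia-smi -i 0 -pl 250
[/code]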
>>
>>101236032
There are finetuners who filter their datasets for the worst-offender slop expressions. The majority don't though, so yeah...
>>
Will you apologize to Pichai if Gemma 2 turns out to be better than all other local models including 100+Bees?
>>
>>101236101
no
>>
>>101236101
best i can do is go outside and shit on the street in solidarity
>>
>>101236101
yes, but I won't have to because it won't, despite all the hopium
it'll turn out to be what it looks like: the best model in the 30B size range, but not competitive with much larger ones
>>
>>101236101
I'm waiting for Gemma-2-104B before I make this judgement.
>>
>>101236131
it kinda already competed with L3 and qwen even in its broken state
>>
>>101236142
I strongly disagree that this is the case and idk what people are talking about when they say that
Even on lmsys it doesn't feel smarter than Qwen72 to me
>>
It's so weird how heavily mixtral limarp is being shilled. It's not even better than BagelMIsteryTour and it's certainly not the best.
>>
File: ComfyUI_07664_.jpg (3.21 MB, 1664x2432)
>>
https://x.com/tsarnick/status/1807883460847325235

How it all came to be
>>
Wtf? 2 weeks ago he swore _M is fucked and that q2ks > q4km in factual details (whatever the model). Today he comes in like "M is slightly better, as you'd expect" (assuming a different model)...
>>
>>101236297
Cool, I'll give this a listen in full
>>
>>101236239
I'll take this bait.
>It's so weird
Something that works is weird?
>It's not even better than BagelMIsteryTour
BMT fails multiple tests you CAN DO YOURSELF.
>it's certainly not the best.
no shit? I think we'd all be running some 200B corpo model if we could, dumbass.
>>
>>101236308
........wait. a slightly less compressed version of a model is slightly better than a slightly more compressed version of the same model? that's insane!
>>
>>101236308
He? Who's he?
>>
>>101236239
I don't bother with mixtral anymore, but I thought it was better than bagelmist. To be honest, I think the instruct-limarp hype is deserved; nothing else struck the balance between lewd and smart like it did
>>
>>101236338
>is he in the room with us now?
yes the fuckin weirdo is in this thread
>>
File: ComfyUI_07665_.jpg (3.32 MB, 1664x2432)
>>
>>101236297
Stop being parasites in the West.
>>
>>101236330
>Something that works is weird?
It's weird because the model is old as shit. I used it months ago and soon tossed it in the bin.
>BMT fails multiple tests you CAN DO YOURSELF.
So does limarp. I did side-by-side comparisons of multiple generations for multiple long RPs, and limarp was usually stupider than BMT and maid-yuzu.
>I think we all would be running some 200B corpo model if we could
Euryale mogs it at 70b.
>>
>>101236308
Not the same guy anonie.
>>
>>101236380
>limarp was usually stupider than BMT and maid-yuzu.
You have no fucking idea what the fuck you're even talking about, holy shit.
This is possibly the lowest IQ post yet.
>>
think there's any credence to the theory that a lot of AI stuff (esp. voice cloning etc) is being held back due to the US elections and once they're over we'll see a flood of high quality releases?
>>
>>101236426
t. probably a snoot curve enthusiast
>>
>>101236375
You're talking to a fresh AI billionaire
>>
>>101236461
Not to that extent, but yes in some respect, because they don't want intense scrutiny from the US gov, so they lay low a bit.

Prior to AI, the boogeyman was social media, before that rap music, before that rock music, before that disco, and so on.
>>
>>101236461
If anything the opposite, (((they))) would do just about anything to be able to put anyone away for CP, and AI just isnt *quite* there yet.
A trump and epstien loras with (loli:1.2) in the prompt isnt going to produce the best incriminating evidence.
>>
>>101236461
No, the field will never recover from this. By the time the elections are over, there'll be 20 new laws and 500 new billion-dollar lawsuits restricting the training of new models, while the quality of publicly available data will have plummeted even further due to the tide of AI-generated content diluting it.
It is, in fact, already over.
>>
>>101236461
Yes, absolutely! We're going to be so so very back after they're over, it's going to be tremendous!
>>
>>101236239
I definitely liked Bagel better. I WILL say that limarp zloss is a clear second best, it was the original "this fixes everything about mixtral" for me, I just like Bagel's style a lot more.
>>
huh so llama.cpp flash attn just fucking works on ROCm
I had assumed for so long that it wouldn't because AMD's official flash attn fork never worked
>>
>>101236599
Wait, if it works even on AMD, does that mean it also works on proper older cards like Turing and Volta?
>>
>>101236461
That makes sense for video gen and to a lesser extent image gen
I don't think it makes any sense for language models though

Current language models are already more than good enough to generate fakeposts and muh disinformation, holding back a better one until after the election would achieve nothing on that front
>>
>>101236599
Where are you seeing that it's working? They're still talking about how to implement it, as far as I can see.

https://github.com/ggerganov/llama.cpp/pull/7011
>>
>>101235723
>>101235928
>Cut all fun out for (((safety)))
>normies are aggressively uninterested in AI
All they had to do to get normies was keep porn gen in. That's all. They'd have to be braindead to think a normie would rather talk to a hallucinating assistant than do a google search. Those speech "assistants" have existed for a while, but almost nobody ever used them. Do they not learn from mistakes? Normies would, however, gladly pay for porn; look at all those twitch thots who aren't even fully naked, and their simps.
>>
File: 4d7.png (825 KB, 700x700)
This probably belongs in /sqt/, but whatever.
Audio. I've got no idea how to build whatever repositories I need. I did look at HuggingFace, but found nothing particularly useful for actually getting a damn engine for audio analysis/voices/whatever.
My specific use case is to analyze an audio sample (arbitrary file type), examine it for an oscillating pattern (beat? Not a musician), and then extract the timestamps of when the pattern occurs.
Ideally there would be some fault tolerance, to account for partially muddied audio at random intervals.
Was also thinking of building my own digital assistant (one that's actually useful), so voice analysis/synthesis would be helpful.
>tl;dr - How does one locally audio?
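For the beat-extraction half, librosa's beat tracker looks like it does exactly this out of the box, if I'm reading the docs right:

[code]
import librosa

y, sr = librosa.load("sample.mp3")  # decodes most common formats
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)  # seconds

print(f"tempo: {float(tempo):.1f} BPM")
print("beat timestamps:", beat_times)
[/code]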
>>
File: 1516584909506.gif (610 KB, 480x270)
>https://github.com/ggerganov/llama.cpp/pull/7931
>bitnet supported but no large models using it

>https://github.com/ggerganov/llama.cpp/issues/7006
>Jamba still unsupported

>https://github.com/ggerganov/llama.cpp/issues/7995
>Chameleon still unsupported
>>
>>101237101
I'm big dumb, need to remember to rtfm.
https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/en/README.en.md
>>
>>101237137
If Meta/Google/Chinks/Cohere releases a good model with new architecture, there will be support for it in 3 days. If there are no good models for %technologyname%, then there is no motivation to add support for it.
>>
>>101235021
Used Dolphin 2.2 Yi 34b Q4 the other day and it worked pretty well as a vramlet replacement for Midnight Miqu 70b IQ2; it can fit a bigger context and generates faster too. Maybe an overly big vocabulary and a few more logic flaws compared to Miqu, but otherwise a worthy replacement.
>>
Do you guys use these LLMs for anything? What's the appeal?

I have the shitty 2.5GB phi3 and really only use it to explain things I don't understand in textbooks or research papers. Is there a better model to use for this?
>>
>>101237101
>analyze oscillating pattern
Why?

Just get Whisper AI running (transcribes your speech into text)
Analyze the text transcription for commands
Execute commands?
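Minimal sketch with the openai-whisper package, pick whatever model size fits your VRAM:

[code]
import whisper  # pip install openai-whisper

model = whisper.load_model("base")   # tiny/base/small/medium/large
result = model.transcribe("command.wav")
print(result["text"])                # feed this to your command parser
[/code]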
>>
>>101237452
Better depends on how much VRAM your GPU has. If you've got a GPU with <4GB VRAM, a 2.5GB model makes sense. If you've got ~8GB VRAM, a larger ~7GB model makes sense. Optimally you want to pair the model size with your GPU's VRAM size and then get the best out of it.

>explain things I don't understand
That's what I use it for too. I also use it for writing programs, analyzing conceptual frameworks, coming up with a plan, etc. It's useful.
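If you want more than the rule of thumb, the budget math is roughly: GGUF file size + KV cache + a bit of compute-buffer overhead. A sketch with Llama-3-8B-ish shape numbers for illustration; swap in your model's config values:

[code]
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    # K and V each store n_ctx vectors of n_kv_heads * head_dim per layer
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# e.g. 32 layers, 8k context, 8 KV heads, head dim 128, fp16 cache
print(kv_cache_bytes(32, 8192, 8, 128) / 2**30, "GiB")  # -> 1.0 GiB
[/code]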
>>
>>101237452
When I have some big philosophical or sexual/taboo idea and want to discuss it with somebody more intelligent than reddit; as a bonus, it's real time.
Or just general emotional support or erp with my favourite character.
>>
So why hasn't the secret sauce of the voice cloning models been released? I doubt it's just a matter of scaling the GPU to 100+GB of VRAM and running clusters for speed. The Tortoise one sounds robotic and takes an incredibly long time to do a voice clone. xtts/styletts2 does it in a second. RVC does it in a second or so. But these aren't perfect and aren't close to closed-source models with near-perfect voice cloning. So to go back to the question, it's just a secret-sauce model, right, and not just scaling GPUs to the nth degree?
>>
>>101237452
>What's the appeal?
erp
>>
>>101237506
it's just training, no one wants to train one that's good
Cartesia seems nice and is based on Mamba, so it's probably possible to train one locally that sounds decent enough but no one wants to do that
the last guy who made TTS advancements went to OpenAI
>>
>>101236426
This, but unironically.
>>
why is there like 2 guys incessantly fixated on limarp
>>
>>101237597
>it's just training, no one wants to train one that's good
There are millions of people in the AI TTS field and they have tons of GPUs. So it's not a GPU issue for certain, and it's certainly not a dataset issue, since we have tons of high-quality voices on the internet.

I looked around a bit with the mamba thing

>https://2084.substack.com/p/2084-marcrandbot-speech-synthesis
This guy seems to have done 42K training steps to get a legible-sounding TTS.

>https://x.com/krandiash/status/1795896007752036782
Further, the Cartesia guy seems to have done ~140K training steps to get decent output.

>https://github.com/ighodgao/mamba-speech-synthesis
There seems to be a training tool here. So is Mamba training the state of the art? Is it just a matter of people not having the right training method, and that's why we've had shit open-source TTS so far?
>>
>>101236599
>>101236772
nta, but there's a tile-based flashattention that isn't implemented for AMD in master, so AMD falls back to vector flashattention kernels when tiles would otherwise be more appropriate.

>>101236599
>because amd's official flash attn fork never worked
CDNA cards have matrix cores, which are like Nvidia's tensor cores. You have to load & use them a certain way, but when you do, you can get high matrix multiplication performance.
RDNA cards have nothing. RDNA3 has what looks like a macro to help the programmer arrange his data in the way the GPU likes best, but it's not really separate cores accelerating things.

When AMD implements shinies they do it for CDNA first and often in a way that's CDNA-specific.
>>
>>101237697
>So its not a GPU issue for certain and its certainly not a data set issue since we have ton of high quality voices that are on the internet.
just because there are millions of people in the field doesn't mean that every single one of them is working on the same problems or cares much about the state of open-source TTS
if any of those people are researchers, they most definitely don't give a shit that the best open-source models can't match closed source; as long as they can still do their research, they're solid
you, for instance, can rent some H100s right now and train a model - so why aren't you? Probably because you're waiting for a big company to spoonfeed you with a model they trained, but if they train such a model, they're more likely to simply sell it to you than open-source it
hence why there are so many TTS companies in the first place
>>
Who cares about TTS lol. 4o shows that native multimodal is the future. People still working on pure TTS are living in the last century and need to catch up.
>>
File: elevenchads.png (102 KB, 1747x894)
>>101237506
secret sauce has always been to be an eleven chad.
>>
>>101237754
>you're waiting for a big company to spoonfeed you with a model they trained
Actually I'm just waiting to be spoonfed PERIODx2. I actually don't know the scope of the training, nor the cost, nor the time, nor do I have disposable income. If I did have disposable income, I'd just use the corporate models instead.

In my mind, the training could be done by one person with disposable income, or even a small group of people, with a couple hundred to a few thousand to spend. Maybe I'm wrong and it requires billions of dollars in GPU training. However, I think it's possible to train a decent TTS model for ~$100-$1K, so that's why I said there are plenty of people with those GPUs and the disposable income. And many already have the GPUs and don't need to spend more, since they're training text models on the side.
>>
File: 1715459703470888.png (347 KB, 604x612)
>>101237690
It's the 2 anons from Peru
>>
>https://github.com/LostRuins/koboldcpp/releases
2 questions about kobold:
- what model should I use that can see images
- how do I get whisper to work?
>>
File: raphilaughing.gif (161 KB, 400x436)
>https://github.com/ggerganov/llama.cpp/issues/8240
>Yes @0wwafa I'm aware and I've been making quants with the f16 embeddings. We've talked about this many times, and I still have massive doubts that it's improving anything.
Kek, he's had enough of his shit.
>>
File: 1716945391723172.png (39 KB, 931x291)
For the second time in a short while, this model https://huggingface.co/TheBloke/dolphin-2.2-yi-34b-200k-GGUF on Koboldcpp is bringing up an "evolved thought". Is this model leaking its inner thought process or what? Also, if I use the ChatML template like they tell you to, it produces either gibberish or nonsense word salad, and I found someone else having that problem. The non-200k model works fine with the correct template. I switched to Vicuna with the space removed after the end sequence, similar to Nous Capybara 34b's template, which is also a Yi derivative. The Vicuna template produces mostly good results, but then suddenly this.
>>
>>101237777
>>101237777
>need to catch up.
so how do I run a gguf like cambrian-34b or
>https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5
in koboldcpp and tavern?
>>
>>101238309
>its inner thought process or what
that's impossible, since the same process runs for every token.
It's far more likely an artifact of the weight given to something else, likely because it was trained on synthetic CoT data tagged with evolved_thought or similar
>>
>>101238309
>TheBloke
oh no
>>
>>101238444
Is there some other place to get ggufs, or is there something wrong with them? Or any other format that can use both VRAM and RAM?
>>
Are there any models that don't have purple prose yet
>>
>>101238451
>gguf
I create all my own ggufs from the original release source.
It's pretty easy to download the safetensors and json files and do it yourself.
You get maximum control and some extra margin of safety vs downloading from some unknown person on HF.
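The whole thing is two commands. Script and binary names move around between llama.cpp versions, so check your checkout:

[code]
# HF safetensors -> f16 GGUF (script name varies by version)
python convert-hf-to-gguf.py ./some-model --outfile some-model-f16.gguf

# f16 GGUF -> quantized GGUF
./llama-quantize some-model-f16.gguf some-model-Q8_0.gguf Q8_0
[/code]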
>>
>>101236308
Music theory question testing anon here.
I don't know if that's S-Anon, but two things.
1 - With trivia questions it would make sense that the bigger quants do better, because trivia doesn't require the model to remember exceptions.
2 - On an 8B model there might not be much brain left to spare, also favoring the bigger quants.

The K_S phenomenon I've noticed only on 70B-class models. The smallest model to pass my music theory question is about 40GB. S-Anon was testing the 8x22 Wizard model.

So this is a good anecdote, but it's a third line of testing running parallel to what I and S-Anon have observed.
>>
>>101236599
>>101236772
To add to this, I've only ever bothered to use the koboldcpp-rocm fork myself as opposed to llama.cpp standalone, but for what it's worth, it fits a much larger context on my gfx1100 cards before going OOM when the Flash Attention setting is enabled. I suppose that's the vector-based Flash Attention >>101237730 mentions. Whatever it is, it's a nice improvement on consumer AMD cards.
>>
>>101238451
literally anyone else, he got caught slipping malware in once he got popular
>>
>>101237137
>Chameleon
Let me know when Meta decides to ACTUALLY release it instead of the crippled piece of shit they put out.
>>
>>101238571
source?
>>
>>101238571
>Malware
Fuck, really? What did it do?
>>
>>101238503
Just takes up a fuckton of space.
>>
>>101238574
The investors would never allow it, especially as model capabilities improve.
>>
>>101238583
>>101238588
Anon is fucking with you. TheBloke just up and vanished one day. Probably got tired of the constant demand.
>>
>>101238599
Just download more space
>>
>>101238571
Yeah I can't find any information about this.
>>
https://huggingface.co/DavidAU/Psyonic-Cetacean-MythoMax-Prose-Crazy-Ultra-Quality-29B-GGUF
>frankenmerge of llama 2 13b models, adding up to 29b total parameters
>posted 3 days ago
>9k downloads
Who the fuck is using this? The model card is pure, concentrated, unadulterated schizo from start to finish. This can't actually be any good, right?
>>
>>101238602
Or he realized that reuploading GGUFs in perpetuity because of the constant breaking changes wasn't worth the hassle.
Occam's Razor.
>>
Anyone quanted and tested Gemma 27B on the latest Llama.cpp commit? I'm running it currently at Q8_0 and it seems to be coherent, and up to 8k works. But I don't know if it's the same quality of responses as on Google's site, as I don't want to sign in to that shit. I also don't want to give them, or anyone, my prompts.
>>
>https://huggingface.co/nyu-visionx/cambrian-34b
>https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5
GUYS HOW THE FUCK DO I RUN VISION MODELS!?
I JUST WANT TO GET MIKU TO COMMENT ON MY GAMING SKILLS IN REAL TIME
>>
>>101238636
You don't. They're still memes and not up to the quality of GPT-4, which already wasn't that good to begin with.
>>
>>101238622
Frankenmerges are never good. Not even by accident. People are just retarded and desperate.
>>
File: file.png (115 KB, 1334x765)
Is it finally over?
>>
File: una.png (116 KB, 274x253)
>Herculean effort
>Grinds against your X
>body and soul
>>
>>101238643
Oh cool. Didn't see that post. I >>101238632 did exactly that.

Unfortunately this also means that the model at least at Q8_0 fails the amputee test when doing greedy (deterministic) sampling. However, it passes one of my trivia tests that only larger models can do. And then it fails on a different trivia question. Still in the process of testing, might report back later. But I'm also tired so maybe I will sleep first.
>>
File: slow asf.jpg (453 KB, 1536x960)
>>101238636
>>101238639
PLEASE HELP I DONT WANT TO CODE ALL THIS SHIT MYSELF!
IT DOESNT EVEN COMPILE WITH CUDA WTFFFF
>>
>Have a vague idea for a story
>ask AI to generate it
>not like that! *edits prompt*
>AI generates something else I don't want, though technically within the instruction
>*further restrict prompt*
>repeat x50
>???
>you have now written an entire detailed story all by yourself.
>>
>>101238703
Sorry anon, you're just too early. You gotta wait a bit more for things to get good and the community support to pick up.
>>
>>101238719
Same with image generation.
People moan about AI slop, yeah, if people just throw out their zero-shots.
But to do good AI images you still need to be an artist, and on top of that have the patience and the art knowledge to get the AI to deliver on its potential.
>>
>>101238719
>prompt:Generate a story where a man washes ashore on an island with a shoggoth who he then fucks
>ai generates a story where you wash ashore in a shoggoth's cave/tent/whatever, it doesn't matter
>shoggoth begins her ministrations while rivulets of sweat drip down your brow and you blush the color of a tomato
what the fuck were YOU trying to do?
You DO know that SD can't generate dreams for you, why would an LLM be able to do any of that?
The only interesting thing about transformers is that you can speak to them at all. You never really expected "AI" to be able to do anything by itself, did you?
>>
>>101236475
why do it here? you can dickride him on twitter with the same success rate.
>>
File: buyafuckingad.jpg (15 KB, 833x104)
>>101238658
>lowers her voice to a conspiratorial whisper
>>
>>101234947
Any other model have the same feel as mythologic-l2-13b?
>>
>>101238795
>leans in
>leans back
>>
>>101238795
>she whispers, her voice barely above a whisper
>>
>>101238824
>>101238809
>>101238795
English is such a great language, right? so physical, hmhmm
>>
>>101238719
LLMs are truly masters at triggering that "NOT LIKE THAT" motivational reflex inside our monkey brains.
>>
>>101238643
now that the dust has settled
was gemma 2 a flop?
>>
>>101236599
It does work but the kernels intended for large batch sizes for whatever reason perform quite poorly.
So prompt processing performance will be worse because AMD falls back to the kernels optimized for small batch sizes instead.

>>101236605
Turing and Volta should definitely work.

>>101236772
This is only about performance, not correctness.

>>101237730
It is implemented, just not used because of bad performance.
>>
>>101238862
The AGI inside tests our patience, resolve, and determination sometimes.

>RP mode
>Fun with girlfriendo
>A group of other characters introduced, forgotten about quickly
>Girlfriendo gets weird
>Girlfriendo literally just fucking leaves
... rolling with it
>Meets random guy
>They bang
... rolling with it
>They breed
... >>>(And she was never heard from again. The end for her. Meanwhile, back at my apartment, what am I doing?)
>Remember those other characters? The ones that are chicks are crashing at my place.
... rolling with it

So don't give up on LLM being creative. It comes around. (Sometimes.) And if you don't want to yield partial control of the development to the computer, why aren't you just solo writing your novel?
>>
>>101238987
Curious what your context was to get all those winding paths
>>
>>101238960
>dust has settled
It hasn't.
>The HF Transformers support is incomplete at best, perhaps even straight up broken for longer than 4k context length.
>>
>>101239002
I usually roll 8k because I'm a worthless vramlet. I forget which model but probably CR+.
>>
How will we know when we've created AGI if we can't even define it?
>>
>>101239024
It will let us know when we're ready to find out.
>>
>>101239024
the real AGI was the friends we made along the way
>>
>>101239030
How will AGI know when we've created it if a real definition won't be in its training data?
>>
>>101239049
More are probably losing friends than gaining as a result of this though
>>
>>101239129
That is exactly the point.
>>
Opus 3.5 will be so good that it will wipe out any hope of competition from local models.
>>
>>101239208
Competition doesn't matter.
Local is about the benefits of local and avoiding the drawbacks of remote.
>>
>>101239208
the last time local models were ever on par with frontier models was when gpt2-xl released
local is not in competition with cloud to begin with, and it will never die
>>
>>101239255
then why is saltman so afraid of us?
>>
Trying out my own Q8 gemma 27b quant made with the latest commits.
>Now be still and taste oblivion. You begged for this, remember?
the double space issue is still there. And overall, it's nowhere near the level of a 70b like qwen in terms of smarts. It actually feels similar to L3 8b now. Maybe more fixes are required? I'm using the correct template. Or did I expect too much from google?
>>
>>101239269
ah shit, 4chan merged the spaces
Now be still and taste oblivion.   You begged for this, remember?
>>
>>101239269
anon, gemma is turbotrash, give up on it
>>
>>101239268
that was post-gpt3; it wouldn't compare to using even the finetuned-into-retardation version of gpt3 that AI Dungeon had at the paid tier
>>
>>101239277
Just remembered that it was indeed released after GPT-3.
>>
>>101239269
It's still not fixed in llama.cpp, if that's what you used, so I would just wait.
>>
>>101239350
Stop coping. It's just not that good.
>>
>>101239360
If you bothered using it on mistral.rs or AI Studio you would know that's not true. And look at the dozen commits
>>
>>101239269
I'm still testing it, but I have never seen a "double space" issue, or anything else obviously broken. It writes well. I can see myself using it over the 70Bs. Just needs more context.
>>
>>101239275
That's a triple space.

Double space after period is correct and a pox upon HTML for not knowing how typesetting works.
>>
>>101239372
That's because he's using the broken tokenizer in llama.cpp, which makes the model retarded.
>>
>>101239380
the tokenizer was already fixed, this was with the fix.
>>
>>101239385
It's still broken; he's "fixed" it like 3 times now.
>>
What's the best model for non-erotic RP that's suitable for 24GB VRAM? Regular Mixtral?
>>
>>101239396
>non-erotic RP
I don't understand
>>
File: file.png (102 KB, 1906x648)
>>101239394
>>101239365
what about this then?
>>
>>101239396
Gemma 2 27B
>>
>>101239413
Sometimes after I coom I want an RP that doesn't result in the character reaching for my crotch in the first reply
>>
>>101239414
https://github.com/ggerganov/llama.cpp/pull/8248
>>
>>101239426
>gemma v1
Are you... retarded?
>>
>>101239433
Read: tokenizer. Also pretty sure he never did some of the fixes that gemma.cpp did. If you use it on gemma.cpp / mistral.rs it's 100x smarter and doesn't have the spacing issue that llama.cpp does. Clearly something is broken.
>>
>>101239443
>gaslighting
There's no spacing issue. Keep having fun with the FUD, petr*.
>>
>>101239472
You're either retarded or a troll.
>>
eta on completely fixed gemma 27b?
>>
>>101239485
on llama.cpp, prob another week at this rate.
>>
>>101239485
What's broken with current gemma?
>>
>>101239485
at least two years
>>
>>101239485
8 hours ago, retard.
>>
>>101239485
how long did it take them to completely fix llama 3?
>>
>>101239520
The closest comparison would be Mistral 0.1 with its sliding window.
>>
>>101239495
>>101239496
>>101239502
>>101239512
>>101239520
wow guys, you're a bunch of retards.
>>
>>101239534
no u
>>
Yeah, I think Google won. Gemma is amazing.
>>
MegActor: Harness the Power of Raw Video for Vivid Portrait Animation
https://arxiv.org/abs/2405.20851
>Although raw driving videos contain richer information on facial expressions than intermediate representations such as landmarks in the field of portrait animation, they are seldom the subject of research. This is due to two challenges inherent in portrait animation driven with raw videos: 1) significant identity leakage; 2) irrelevant background and facial details such as wrinkles degrade performance. To harness the power of raw videos for vivid portrait animation, we proposed a pioneering conditional diffusion model named MegActor. First, we introduced a synthetic data generation framework for creating videos with consistent motion and expressions but inconsistent IDs to mitigate the issue of ID leakage. Second, we segmented the foreground and background of the reference image and employed CLIP to encode the background details. This encoded information is then integrated into the network via a text embedding module, thereby ensuring the stability of the background. Finally, we further style transfer the appearance of the reference image to the driving video to eliminate the influence of facial details in the driving videos. Our final model was trained solely on public datasets, achieving results comparable to commercial models. We hope this will help the open-source community.
https://github.com/megvii-research/megactor
https://f4c5-58-240-80-18.ngrok-free.app/
full enhanced release end of July. they've released their training code/dataset, so it seems legit. unlike the anitalker team, who just disappeared.
>>
google? more like poogle! hue hue hue
>>
>>101237452
I find wizard 8x22b good for explaining things, it's a bit slow but faster than waiting for a person to reply.
>>
>>101234947
Urgent PSA: that fraud Eric Hartford is serving his Dolphin models using literal /biz/ crypto scammers' GPUs as a backbone, aka he is logging every chat of yours and tying it to your unique identifiers. Same goes for the anon scammers
>>
>>101239764
Urgent PSA: you're an idiot.
>>
>>101235303
Lmao
>>
>Midnight-Miqu-70B-v1.5_exl2_5.0bpw
yup *sips monster*
>>
Anyone wanna share the JSON for their Magnum Opus prompts? Sys prompt and story string. Mine is pretty good but I'm looking for ideas.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.