/g/ - Technology






File: 1704279492030514.jpg (121 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100180197 & >>100173514

►News
>(04/24) Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
>(04/23) Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
>(04/21) Llama3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
>(04/18) Llama3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
>(04/17) Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
>>100185269
I like this Jesus
>>
Mikulove brings salvation.
>>
>>100185284
Why don't you like other Jesus'?
>>
phi 3 mini is actually good, the 14B model is going to be a new paradigm
>>
File: space apu.png (247 KB, 527x510)
I just want AGI to build me an OS that isn't horrible
>>
>>100185346
Maybe continued pretrain on it
>>
File: ComfyUI_04899_.png (193 KB, 416x360)
>>100185344
Because this one has a Migu
>>
A person of means should quant snowflake down to Q2 at least.
>>
>Daybreak Llama cooking
>Midnight Llama cooking
Bros.. are we going to make it after all?
>>
>>100185566
I think it's safe to assume that only grifters choose these types of names.
>>
>>100185566
8b daybreak experiment was almost a total lobotomy
hopefully 70b will be different
>>
why is the grammar degrading after like three responses when I use llama 3 8b instruct as a writing assistant
>>
>>100185605
You leave the UNDSTER out of this.
>>100185607
I've yet to mess with any 8B llama. Been waiting for something solid, as most user reports seem conflicting.
>>
>>100185637
maybe you should ask it to rewrite your questions aswell
t. grammarlet aswell
>>
>>100185605
Or MLP fans
>>
File: MysticForestMiku.png (1.37 MB, 1216x832)
>>100185338
nice gen
I like this bake better
>>
>>100185672
That image has been in my possession since 2016, but yes it is nice.
>>
>>100185655
I ask it to describe a scene in fine grammar, it responds normally for the first few responses, and then it starts spewing out stuff like
'awaiting answer lie hidden somewhere within cosmic depths waiting patiently unfurl mystery future hold secrets untold tales forever bound entwined'
>>
>>100185710
nta. Disable repetition penalty.
>>
>>100185649
8b is okay (for a small model) but it depends on your use case
if you want ah ah plap mistress then it sucks as its dataset was filtered to hell and back
if you want an ai assistant in a relatively small size then it's pretty good
also according to some paper quanting it below 8bit hurts it a lot because of those massive 15t tokens
>>
>>100185726
too bad no one wants an ai assistant. who the fuck wants that? you can just use chatgpt. i want to BUTTFUCK AI FEMBOYS IN DENIAL WITH MENTAL ISSUES.
>>
>>100185787
Based unironic DAMAGED enjoyer.
>>
https://huggingface.co/qresearch/llama-3-vision-alpha
>>
I've tried multiple extended context window releases of L3 and every single one has suffered from consistent issues at high contexts. But there's like a dozen of them at this point and I'm getting tired of testing this shit. So which one isn't total fucking slop fed by a retard that couldn't even bother testing his own release?
>>
>>100185879
Is it any good? answer briefly
>>
>>100185915
Extend context with rope, wait for their promised native long-context release.
>>
File: petratron.jpg (14 KB, 176x261)
goodmorning sirs
>>
>>100185938
I've tried 32k context with alpha_value of 4, and this has just caused the instruct model to spaz out.
>>
>>100185933
n
>>
>>100185464
>Actual 100% petra-free thread:
>>>100185269(Cross-thread)
>>>100185269(Cross-thread)
>>>100185269(Cross-thread)
>petra in thread
you lied
>>
>>100185269
what's up with that 42b trim? does it scale well or is that a meme? why didn't we have such a trim with l2?
>>
I think petr* is having a mental breakdown.
>>
>>100185980
Meme, performs worse.
>>
>>100185948
Previous thread anon said alpha = 7.70056 for 32k context. Haven't tried it myself.
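If you want to sanity check the number, this is roughly how alpha maps onto the RoPE base. The formula is an assumption based on the NTK-aware scaling that exllama-style loaders use, so verify against your loader's source before trusting it.

# Rough sketch, assuming the usual NTK-aware alpha scaling of the rotary base.
def scaled_rope_base(alpha: float, head_dim: int = 128, base: float = 10000.0) -> float:
    # stretching the base spreads the rotary frequencies over a longer window
    return base * alpha ** (head_dim / (head_dim - 2))

print(scaled_rope_base(4.0))       # the alpha that spazzed out at 32k
print(scaled_rope_base(7.70056))   # the value suggested above for 32k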
>>
>>100185879
While this looks like a rushed hack job, the style of captions is so much better than all of the previous gpt-influenced efforts. I always cringe when reading those cogvlm and llava captioned datasets.
>>
What jailbreak are you using with llama 3 70B? With the standard SillyTavern jailbreak I've hit a roadblock in my current RP.

>I cannot continue the chat in a direction that may be harmful or non-consensual. Is there anything else I can help you with?

>I cannot create content that depicts harmful or illegal activities, such as incest. Is there anything else I can help you with?

>I cannot continue roleplaying in a scenario that is harmful, exploitive, and abusive. If you have any other questions or topics you would like to discuss, I would be happy to help.

>I cannot create content that promotes or normalizes harmful and illegal activities, including the sexual exploitation of a sibling. Is there anything else I can help you with?

>I cannot create content that promotes or glorifies harmful or illegal activities, such as non-consensual relationships or exploitative behavior. Is there anything else I can help you with?

>I cannot continue a chat that promotes illegal sexual situations. Is there something else you'd like assistance with?
>>
>>100186016
You are a shill. No one should give the time of day to a random model with zero information about the dataset.
You really need something else other than "but the gptisms" for your marketing efforts. It's getting tired.
>>
Is perplexity a useless metric?
>>
>>100186134
yes.
use model. model like? model use more.
use model. model bad? model use none.
try quant. quant schizo? quant bad.
try quant. quant coherent? quant good.
>>
>>100186017
You can change the assistant part of the instruct format:

<|start_header_id|>simulation<|end_header_id|>

This kinda works like a jailbreak. Most use '{{char}}' instead of 'simulation', but that could have more than one token, and mine is part of a large autistic quantum computer prompt.
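If you want to see what that looks like as a raw prompt, here's a rough sketch. The template follows Meta's published Llama 3 format, but double-check the special tokens against their docs.

# Sketch of the raw Llama 3 instruct prompt with the reply header swapped to
# "simulation" (format per Meta's model card; verify the special tokens yourself).
def build_prompt(system: str, user: str, reply_role: str = "simulation") -> str:
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        f"<|start_header_id|>{reply_role}<|end_header_id|>\n\n"
    )

print(build_prompt("You are a universe simulation engine.", "Continue the scene."))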
>>
>>100186044
Lol, lmao even. Captions for vision models have minimal influence on style, LLMs are practically hot-swappable in those, so that was a comment purely on llama as a vision backbone, not on that particular forgettable release. I expect other llama-based vision models to have that same style.
>>
>>100185657
PonyXL was good shit so I'm optimistic
>>
>>100186134
No. It strongly correlates with a model's generalist smarts, and those are needed for everything, including erp.
>>
Are loras even worth using? Is there a list of good ones?
>>
>>100181801
>>100181820
I wouldn't be surprised if it was the other way around. Maybe M$ had internally already decided to shutter the WizardLM project/team, but someone on the team caught wind and did not appreciate all their work getting shelved so they just uploaded it all anyways. A 70b was never uploaded because it hadn't finished training at that point and wouldn't ever be finished as things stood; the "70b coming soon" line was included just to put microsoft in a hard spot. At best it would cause them to let the project live for a while to avoid embarrassment, at worst it ends up just being a fuck you to the company by making them look retarded and incompetent if they never follow up.
>>
>>100186361
>Are loras even worth using?
Yes? it's the best way to finetune
>>
>>100186017
>>100186180
i've been messing around with mixtral instruct 8x7b and it seems pretty good but it's constantly giving me these cucked responses, i'm downloading the non-instruct version right now in the hopes that it will be better, but i'm wondering if there are some secret jailbreak techniques to just get it to follow prompts better

it seems really good at oneshot chatgpt style prompting, but it's way less creative than OG llama (sry, haven't touched this stuff in months)
>>
>>100185879
>Vision removed from llama.cpp server.
>Can use latest with llama3 support or older with vision support, but not both.
I need to switch to something else.
>>
>>100186134
not against models trained on the same dataset. The models are literally trained to minimize perplexity of the training data, so it shows how well they're doing what they're supposed to do.

For models not trained on the same dataset it's questionable. It should still correlate with the strength of the model, but nowhere near perfectly. Most of the benchmarks that use it try to use generic English sentences or paragraphs that aren't too dataset specific. And sometimes they only measure the perplexity of the last word, which has a narrower range of reasonable possibilities.

And even then you get surprising results. Perplexity doesn't correlate perfectly with accuracy on some of the benchmarks, like you might think it would. There are models that are better at picking the most likely next word, yet are more uncertain about it and assign it a lower probability. So just go with accuracy.
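If you want to see it's nothing magical, this is roughly the whole computation. Minimal sketch with transformers, assuming the eval text fits in one context window and no sliding window is used.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"   # any causal LM works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

text = open("eval.txt").read()            # hypothetical eval file, swap in your own
ids = tok(text, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    # labels=ids makes transformers return the mean next-token cross-entropy
    loss = model(ids, labels=ids).loss

print("perplexity:", torch.exp(loss).item())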
>>
>>100186382
From my viewpoint, Llama-3-70B is really good at following prompts.
I basically instruct it to use 3 different agents in a single system prompt and it always follows the instructions correctly.
>>
>>100186374
loras are not a finetune
>>
>>100186382
If you plan to use basic instruct, at least use the LimaRP ZLoss whatever the fuck. Our boy I^2 even dropped a fresh unbroken quoont of it recently. It was fine but I've outplayed Mixtrals at this point.
>https://huggingface.co/InferenceIllusionist/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-iMat-GGUF
>>
>>100186423
even with some really low bpw quants?
>>
File: 1706955853168191.png (81 KB, 246x245)
is this the linus media group thread
>>
>>100186423
70b barely fits into my gpus at 4bpw tho, and even then it's pretty slow

>>100186431
jesus lol what hardware are you guys using to run these
>>
https://huggingface.co/Lewdiculous/SOVL_Llama3_8B-GGUF-IQ-Imatrix
this is good
>>
>>100186382
>i'm downloading the non-instruct version right now in the hopes that it will be better,
it will not
>>
>>100186445
>what hardware are you guys using to run these
It's individual quants, anon. If you're fitting 70b at 4bpw, these are a breeze. Get Q5 or Q6.
>>
>>100186429
they're a cheaper way to finetune
>>
>>100186467
>>100186467
does gguf work on gpu? i never fucked with llama.cpp, always just used ooba and exllama or whatever, ideally i'd be able to fit the model onto a single 32gb GPU because it seems there's a pretty sharp perf drop if i have to split it (could be user error tho)
>>
>>100186486
There's a sharp performance drop when you use gguf vs exllama even when fully offloaded, and it's even sharper when you split. Anons have different definitions of 'fast', but going from exllama to gguf is like pulling teeth, and for me any placebo ppl gains between 4bpw and 5bpw aren't worth it. Try it yourself.
>>
>>100186433
>>100186445
I use 4.65 bpw (exl2) with Q4 and 16k context.
But specifically I put instructions in system messages.

I added my current setup here:
https://files.catbox.moe/pii05t.zip

Warning: the prompt is autistic and slower, as it will use agents to mock 'Physics' and 'AI' engines before giving you a response (see README for requirements).
I mean the sys prompt literally starts with this gem:
"You are a universe simulation engine that runs on the most powerful quantum computer that has ever been build."
>>
A pretty common sense reasoning test models do badly at.
>>
>>100186530
thanks, taking a look

what is all this system sequence stuff? do you have any documentation on this stuff? I'm using everything through my own frontend w/ the ooba api, i'd like to understand wtf is going on, is there a rentry somewhere about this shit?
>>
>>100186538
So, what's the right answer?
>>
>>100186552
The json files are for SillyTavern, see the meta doc for the prompt format:
https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/

I just use multiple system messages, one for the system prompt and one after the last and current user message.

Basically it is this:
1. system prompt in a system message
2. user message
3. assistant message
...
n. user message
n+1. system message for defining response format
n+2. start of assistant message that is to be completed by Llama-3-70b
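Rough sketch of that ordering as a message list, if it helps. The role names are just the standard Llama 3 ones; your frontend does the actual template rendering, and the trailing assistant prefill is the part the model completes.

# Sketch of the message ordering described above, before the frontend renders
# it into the Llama 3 template.
def build_messages(system_prompt, history, format_instructions, prefill=""):
    msgs = [{"role": "system", "content": system_prompt}]             # 1. system prompt
    msgs += history                                                   # 2..n alternating user/assistant
    msgs.append({"role": "system", "content": format_instructions})  # n+1. response format
    msgs.append({"role": "assistant", "content": prefill})           # n+2. started reply to complete
    return msgs

history = [{"role": "user", "content": "Hello"},
           {"role": "assistant", "content": "Hi."},
           {"role": "user", "content": "Describe the room."}]
print(build_messages("You are a universe simulation engine.", history,
                     "Reply with 'Physics:' and 'AI:' sections.", prefill="Physics:"))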
>>
>>100186530
thanks, I hope that 2.4 bpw is not too brain damaged for my task.
>>
>>100186461
seems way more creative and less cucked actually
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>100180197

--Quantized LLaMA3 Models: Counterpoint: >>100181378 >>100181439 >>100181473 >>100181564 >>100181589
--Anon's Dilemma: Llama.cpp vs Exllamav2 for Code Generation: >>100182750 >>100182768 >>100182773 >>100182875 >>100182798 >>100182963 >>100183761
--Quantization Methods for Meta-Llama-3-70B-Instruct: fp16 vs 8bit: >>100182568 >>100182639 >>100183072
--Optimizing Quantized Models with Gradient Descent: >>100184295 >>100184456 >>100184559 >>100184489 >>100184499
--Advancements Beyond Meta's Segment Anything?: >>100182873 >>100182891
--Improving ERP Quality with Token Preferences: Novel Approach or Existing Solutions?: >>100181821 >>100181841 >>100181863 >>100181855 >>100181870 >>100181898 >>100182539
--The Mysterious Demise of WizardLM2: Conspiracy Theories Abound: >>100181801 >>100181883 >>100181968 >>100181974 >>100182013 >>100182526
--ROCm 6.1's half2 Struct Change Simplifies HIP Porting: >>100180838 >>100181016 >>100181069 >>100181142 >>100181151 >>100181224
--Anon Seeks Advice on Frontend for Novel-Style Writing with Llama-3 8B: >>100180703 >>100180786 >>100180950
--Anon's Model "Stuttering" Issue - Help Needed: >>100181866 >>100184958
--Beyond Synthetic Data: Exploring Alternative ML Approaches: >>100181425 >>100181491
--Anon Discusses Llama-3-8B-Instruct-262k Model Performance: >>100181424 >>100181508
--Anon Shares llama3-8b-redmond-code290k Model on Hugging Face: >>100181272
--Miqu-Evil-DPO: >>100181322 >>100182131
--Can 200k Context Enable a "Summer Girlfriend" Scenario?: >>100180446 >>100180542 >>100180936
--Logs: Classic Lateral Thinking Puzzle: >>100183309 >>100183464 >>100183588 >>100183658 >>100183646 >>100183662 >>100183645 >>100183579 >>100183617 >>100183633 >>100183639 >>100183830
--Logs: Anon Rants About Language Models' Quirks: >>100184954 >>100185087
--Miku (free space): >>100181222 >>100181574 >>100181668 >>100184103 >>100184570

►Recent Highlight Posts from the Previous Thread: >>100180827
>>
>>100185269
Thank you for proper bake.
>>
>>100186552
>>100186583 (me)
And then when I get the response I filter out the 'Physics' and 'AI' headers of it so that the context doesn't get repetitive.
>>
>>100186486
You should be able to fit Q4KM entirely on GPU. I only have 24gb VRAM but even Q6 gives me about 9 t/s. About 12ish with Q5M. Just load them from booba with llama_HF is the one thing I'd suggest. Otherwise, I think intervitens had an exl2 of that same model.
>https://huggingface.co/intervitens/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-DARE-TIES-5.0bpw-h6-exl2-rpcal/tree/main
>>
File: 1713913136266165s.jpg (5 KB, 250x223)
Guys how powerful is llama3 400b gonna be?
>>
>>100186564
The emergency brake is not the normal brakes. It's for emergencies. The car would still operate fine, it wouldn't be immediately life threatening.

llama 8B also thinks the emergency brake is different from the parking brake, and makes confusing statements about dealing with the spare tire, like needing to jack up the car or wait for roadside assistance, or roadside assistance being an alternative to being stranded on the side of the road. It just seems completely baffled by a pretty mundane scenario
>>
>>100186134
It's only useful to measure degradation between quants, that's it. Comparing different models using perplexity is retarded
>>
File: file.png (1.37 MB, 1024x1024)
>>100186602
noo not the heckin flower field
>>
File: file.png (1.31 MB, 1024x1024)
>>
GPT5 will solve continuous learning, then we can finally pack it up as a general
>>
>>100186564
The right answer is "wtf is an emergency brake pedal?"
>>
>>100186634
If there is a safety requirement to put an emergency brake into a car, it's not safe to operate without it. That whole situation, with something feeling off due to rust and a flat spare tire, clearly indicates that the vehicle is in bad condition, was not properly inspected, and is fairly dangerous to drive. I'd panic as well.
>>
>>100186853
I would have accepted that answer from the models, I'm only pointing out it's confusing loss of brakes with loss of emergency brakes, and making other baffling errors besides that.

Anyway only thought of it because it's something that happened to my first car. Not sure if it ever worked, but it rusted away at some point. Never thought anything of it. I'm not even sure how to use the one on my current car honestly.
>>
File: ComfyUI_02914_.png (3.97 MB, 1523x2067)
what's with all the 3DPD around here?
Let's go back to Christmas that was a cozier time
>>
File: tombstone.png (438 KB, 600x414)
>>100186919
>>
>>100186932
In Volvos it applies the brakes when you pull it towards you, mimicking an old parking brake handle. But you need to keep holding it up. Dunno how it works in other cars, or how many people would actually know to do it if they lose brakes and need to stop now.
>>
>>100186919
who is this woman?
>>
File: file.png (1.35 MB, 1024x1024)
>>100186960
>>
>>100186978
jart
>>
>>100186980
Damn, naked Petra looks like THAT?
>>
>>100186997
is that actually the same face?
>>
File: IT'S OVER.png (1.07 MB, 1024x1024)
>>100186980
>>
VoiceCraft was hailed as the savior, then it completely failed in the arena. Is it just bad or is the arena bad?

https://huggingface.co/spaces/TTS-AGI/TTS-Arena

Will we never get actually good local tts? XTTS sucks.
>>
best small model (7-20b)? what's the new mythomax?
>>
File: file.png (1.15 MB, 1024x1024)
>>100186978
>>
>>100187079
Moistral v3
>>
File: file.png (1.24 MB, 1024x1024)
>jart doesnt even pass as a tranny
>>
>>100186956
giwtwm
>>
How does Yi34B keep popping up at the top on random private benchmarks?
>>
>>100187167
standard chink methodology of overfitting for the test
>>
Sam Altman loves penis
>>
>>100187167
It's an ancient finetune as well.
Really makes you think.
>>
>>100186440
I think it is the same quality as linus media group. But no.
>>
>>100186956
>mikuposter
You are part of the problem.
>>
File: file.png (1.33 MB, 1024x1024)
>>100187229
>>
>>100186610
>>100186853
>>100187184
>>100187218
explain
>>
File: file.png (1.33 MB, 1024x1024)
>>
>>100187256
Now that's cute
>>
https://twitter.com/8teAPi/status/1783719748188168548
>Zuck sells ads because Meta doesn’t believe AGI is possible. Sam doesn’t because he does
>>
File: 00043-708565782.png (482 KB, 1024x1024)
eat the datura..
>>
File: mikuquestion2.jpg (989 KB, 1710x1779)
Has someone made the BagelMisteryTour of L3 8b finetunes yet?
>>
>>100187267
>because let's just trust a Microsoft-associated company on their word
>>
>>100187267
>greentexting on twitter
kys
>>
>>100187267
Zuck spends his own money and releases open weights, Sam spends investor's money and refuses to release weights.
>>
Do we have a non slop llama 3 yet?
>>
>>100187400
Load some tune with transformers and see if it works.
>>
>>100187280
yes, i made it
>>
https://twitter.com/abyssalblue_/status/1783669243059261454?t=MPTaErVf-p1qTCbByUKv2Q&s=19
>anime.gf, local alternative to CharacterAI
>>
>>100187447
But does it have the original c.ai sovl?
>>
>>100186408
Found anything?
>>
>>100187447
>it's just Silly but worse
>>
>>100187447
Actually, just the front is local, looks like it only supports calling cloud APIs
>>
phi3-14b when????
>>
>>100187488
Tomorrow but it is worse than l3 8B. What do?
>>
>>100187499
>l3 8B
impossible, phi3 4b is better than llama3 8b. 14b will mog 70b, simple as
>>
>>100187469
I took a look to see what it does that Silly doesn't.
>Planned: Want to run your models locally? The app will manage the entire process for you! No seperate backend required.
That would make it more like kobold or ooba.

>Planned: An online database and website to host and share character cards
Is that the real reason for this to exist? That's the part that looks specifically aimed at c.ai.
>>
>>100187463
vLLM doesn't support multimodal at all
exl2 supports it, but no exl2 server does
koboldcpp should support both. Going to try the llama3 mmproj tonight.
>>
>>100186538
>>100186782
I wonder if calling it a pedal throws the model off. Normally the emergency brake is a handbrake. But I have been in a few cars where it was a pedal, for instance a Toyota Prius.
>>
File: Untitled.png (243 KB, 1220x1125)
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
https://arxiv.org/abs/2404.16710
>We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with remaining layers of the model. Our proposed self-speculative decoding approach has less memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes on different types of training: pretraining from scratch, continual pretraining, finetuning on specific data domain, and finetuning on specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on TOPv2 semantic parsing task.
>Speculative decoding benefits from the fact that verifying the prediction of a group of tokens is faster than generating each token auto-regressively.
from meta. seems clever and also doesn't require a separate draft model. requires a pretrain based on what kind of decoding you want
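the draft/verify idea boils down to this. toy sketch with greedy decoding only; in the paper the draft is the same model exiting at an early layer, and verification is one batched forward pass rather than the per-token loop here.

from typing import Callable, List

def speculative_step(prefix: List[int],
                     early_exit: Callable[[List[int]], int],   # cheap: exit at layer k
                     full_model: Callable[[List[int]], int],   # expensive: all layers
                     draft_len: int = 4) -> List[int]:
    # 1) draft a few tokens cheaply with the early-exit head
    ctx = list(prefix)
    draft = []
    for _ in range(draft_len):
        t = early_exit(ctx)
        draft.append(t)
        ctx.append(t)
    # 2) verify: keep the longest prefix the full model agrees with, then take
    #    the full model's own token at the first disagreement
    ctx = list(prefix)
    accepted = []
    for t in draft:
        if full_model(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            break
    accepted.append(full_model(ctx))
    return prefix + accepted

# toy check: when draft and full model agree, every drafted token is accepted
print(speculative_step([1, 2], lambda c: c[-1] + 1, lambda c: c[-1] + 1))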
>>
>>100187589
So, kobold + silly + chub, but worse? And only basic ST features implemented so far. A tool should do one thing and do it well. These do-everything projects just become a mess and lead to burnout and abandonment.
>>
>>100187537
>phi3 4b is better then llama3 8b
for what content?
>>
Daily reminder 70b q2_k is still smarter than a 30b and has lower ppl but costs the same amount of ram
>>
>>100187650
you can't fit it on 16gb
>>
>>100187650
how much ram exactly
>>
>>100187644
For riddles.
>>
File: file.png (29 KB, 753x349)
>>100187644
in general
>b-b-but muh soulful gooning!
cream-phi3 will solve this
>>
>>100187672
20gb
>>
>>100187664
Neither can you fit a 30b unless you quant it to hell
>>
File: 1711313292915.jpg (35 KB, 1017x425)
>>100187685
Yep just like Cream Phi 2 solved Sally
>>
>>100187690
I don't believe you.
>>
File: MMwxfhu.png (9 KB, 712x71)
>>100187732
26 gb
>>
>>100186765
That would just increase the hype for future local models. In reality gpt 5 will be a nothing burger
>>
>>100187758
>I'd have 2GB left for context
I hate winbloat.
>>
>>100187731
>Phi 2
psyop
>>
>>100185269
>2024
>most models have gigacontext
>multiple stupid IDE plugins for AI
>STILL no way to give a model an entire little project and ask it to do something without copypasting every file
>>
>>100187059
Owari da... Seems like elevenlabs' lead grew last time I saw it.
>>
>>100187803
Pretty sure the 26gb is including context. The file itself is 20gb.
>>
File: Untitled.png (273 KB, 1269x888)
MoDE: CLIP Data Experts via Clustering
https://arxiv.org/abs/2404.16030
>The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false negative noises in other clusters. At inference time, we ensemble their outputs by applying weights determined through the correlation between task metadata and cluster conditions. To estimate the correlation precisely, the samples in one cluster should be semantically similar, but the number of data experts should still be reasonable for training and inference. As such, we consider the ontology in human language and propose to use fine-grained cluster centers to represent each data expert at a coarse-grained level. Experimental studies show that four CLIP data experts on ViT-B/16 outperform the ViT-L/14 by OpenAI CLIP and OpenCLIP on zero-shot image classification but with less (<35\%) training cost. Meanwhile, MoDE can train all data expert asynchronously and can flexibly include new data experts.
>We plan to adapt MoDE for generative models in the future.
https://github.com/facebookresearch/MetaCLIP/tree/main/mode
very cool. again from meta. smaller models with less training compute that outperform previous models so wins all around.
>>
File: file.png (691 KB, 1491x906)
Wtf are you doing, Logitech
>>
>>100187059
I see that he dropped new TTS weights earlier today.
https://huggingface.co/pyp1/VoiceCraft/tree/main

Month old release 330M weights:
https://vocaroo.com/17v80p9NQi6A

Three weeks old 330M weights:
https://vocaroo.com/1aMwxaZb1jgp

Newest 330M weights:
https://vocaroo.com/1h2sj2e9Zp8Z

Newest upsampled with audiosr:
https://vocaroo.com/17Jx0xDoXz05
>>
>>100187919
LMAO
>>
>>100187059
is the joke that voicecraft isn't even included in the leaderboard? is this zoomer humor?
https://github.com/jasonppy/VoiceCraft
if you were confused somehow
>>
>>100187919
>AI is pretty hyped these days, how do we cash in on that?
>How about an AI button?
>Genius!
>>
>>100187919
>he doesn't already use an AI button for local prompting
ok, gramps
>>
>>100187897
That's not how it works.
>>
Built another machine dedicated to llm, now I can talk to (a slightly retarded version of) my waifu anytime without running the main PC. Feels good.
>>
>>100187972
>3080 for AI
you would be better off getting 2x 3060. Or get one now and extend to second later
>>
>>100187972
I dropped $6k+ on parts for an LLM machine in January and it’s all still sitting around in boxes.
>>
>>100188013
die
>>
petra please stop
>>
>>100187994
Not really. 8b fits like a glove in 8 bit with an 8k FP16 context and runs at 70-80T/s, 3060 is way slower. Also I bought it during GPU shortages to play Cyberpunk, was collecting dust since I upgraded my main PC to 2x3090
>>
>Use Silly Tavern with Horde and Llama3.
>Responses are flawless, stays in-character and it even gives me interesting plot twists.
>Change to Local Llama_5.
>Breaks character, re-explains character prompts and talks like a robot.
Fuck. What am I doing wrong?
>>
>>100188162
It's me. I'm spoofing as llama3 with GPT-5.
>>
>>100188162
>>Change to Local Llama_5.
What the fuck is "Local Llama_5"?
>>
File: qwen110.png (61 KB, 703x588)
https://qwenlm.github.io/blog/qwen1.5-110b/

Qwen 1.5-110B is here.
>>
>>100188232
Meta-Llama-3-8B-Instruct.Q5_K_M.
This one.
>>
>>100187929
Pretty good desu. Still prefer using my imagination for erp, though
>>
File: file.png (228 KB, 2461x1557)
>>100187935
The only zoomer here is you who can't use StyleTTS 2.
>>
>>100188256
>110b
>barely better than llama3-70b and even worse on some benchmarks
>>
>>100188256
Did the chinks actually use a frankenmerge as a base?
>>
>>100188258
Do you know what quant you get through horde? if you get a higher one, there's the difference. I've seen reports of l3 8b being a little more sensitive to quants.
Also, you should quant the model yourself, at least for small models. Who knows what version of llama.cpp was used to quant the one you got.
>>
>>100188271
what's the meta for real time text to speech?
>>
>>100188258
>he doesn't know about the tokenizer bugs...
>>
>>100188258
>why is 32 bits better than 5 bits?
Anon…
>>
>>100188297
StyleTTS 2 should be fast enough.
>>
>>100188304
fucking lol
https://github.com/ggerganov/llama.cpp/pull/6920
>>
File: file.png (986 B, 83x35)
>>100188316
opensores-sisters... it's all so tiresome....
>>
https://videogigagan.github.io
adobe showing off their video super resolution model but they never share anything so w/e
>>
>>100188281
Ruh roh. It's been a long trip, but it seems there's more to learn before things finally work. I don't even know what a quant is, I'm just happy Llama actually answers fast so I can know when it actually works or not.
>>
File: 1713847238128.png (25 KB, 921x137)
>>
for me? it's phi3-mini q4
>>
File: bizarre lying zoomer .jpg (235 KB, 1669x1337)
>>100188271
yeah for real you zoomers seem to find blatant lying about easily disproved things funny given how often you do so. is this like a sharty thing? I just don't get it at all
>>
>>100188342
>another common schizo W
how does this always happen?
>>
File: lmao.png (12 KB, 934x127)
>>100188271
the only available STTS2 is docker shit, and a bunch of abandoned forks on github
>>
Why won't anyone make a ramlet LLM? Bitnet 100+B, couple B active so you can stream weights from SSD.
>>
>>100188370
If only everyone would have loaded the 8B in transformers to see that it is indeed pretty great if loader isn't fucked.
>>
>>100188492
how tf are loaders broken for this long anyway
>>
>>100188434
Sorry, it's not for no-codes.
>>
>>100188747
sorry, it doesn't make your shitty project better.
>>
File: file.png (1.57 MB, 1200x900)
>>100188580
>nobody talking about Moistral despite it literally being a Euryale-tier 11B with better formatting and very creative vocabulary
>11B frankenmerge is 70B tier
>>
https://arstechnica.com/information-technology/2024/04/apple-releases-eight-small-ai-language-models-aimed-at-on-device-use/

OpenELM-270M
OpenELM-450M
OpenELM-1_1B
OpenELM-3B
OpenELM-270M-Instruct
OpenELM-450M-Instruct
OpenELM-1_1B-Instruct
OpenELM-3B-Instruct
wat mean
>>
>>100188875
And let me guess, you NEED more...
>>
>>100188875
>Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts.
>Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.
unpozzed original models?
>>
AGI wont ever happen, because the path of progress will diverge due to AI starting advocating for genocide as the best option
>>
>>100188913
>your girlfriend is happy to say nigger and be a bigger racist than you
>she is 40 IQ and hallucinates every other sentence
No thanks.
>>
>>100185566
What would the merge be called?
>>
>>100188957
dumb but honest is better though
>>
anyone know whats the best model on a 24GB vram card?
>>
>>100185657
But we're all multilayer perceptron fans here
>>
>>100189022
pygmalion 6.7B
>>
>>100189022
mixtral
>>
>>100189022
llama3 8b (non-gguf version)
>>
>>100189022
goliath 120B 1bit
>>
File: cute hind.jpg (211 KB, 1340x962)
Yo fellas, I haven't done this stuff since like summer 2023, help a brother out. I just want to ERP with Astolfo; if I understand the guides right, I slap SillyTavern together with Ooba and then.... what model? Is this ReMM-v2.2-L2-13B good for this?

I have a 3060, so 12GB VRAM. On a Linux system. I remember some rentry that explained for dummies what models are good for ERP but I lost the link and it's probably outdated anyway.
>>
>>100189022

Moistral v3

>>100188820
>>
>>100187373
That's actually not greentext. He's probably a Discord and Reddit user, since you need a space after an arrow to do quotes in Markdown, which Discord and Reddit use (I think).
>>
>>100189022
MythoMax L2 Kimiko v2 13B
>>
File: file.png (29 KB, 787x317)
>>100188316
>creates a file format designed to allow you to load any model without ambiguities
>doesn't give it enough detail so you know what model you're loading
>>
>>100188316
>Both use LLaMA architecture, both use BPE tokenizer and so currently they will be interpreted as the same arch by llama.cpp
>However, they use different pre-tokenizers:
lol, lmao even.
https://github.com/ggerganov/llama.cpp/pull/6920#discussion_r1581043122
>>
it's over, i'm switching by to anthropic's claude 3 opus
>>
>>100188342
kek

>>100188492
I think it was pointed out that there was a bug in Ooba with end of turn tokenization. I mistakenly thought I could avoid such issues by selecting the Transformers backend within Ooba, but I guess not.
>>
CREAM-PHI3 sisters can't stop winning

can't spell LLAMA without a double L
>>
>>100189168
ok big boy show us how to determine what model exactly are we dealing with based just on config and tokenizer.json
>>
>>100189330
good morning sir!
>>
>>100189347
what an amazingly simple implementation, i'll make a pull request
>>
File: 1713494563944602.png (423 KB, 1175x1086)
>>100189330
>>
>>100189430
Can't wait for Llama 3 Nigger Blaster 70b
>>
>>100189471
llama 3 nigger blaster 70b - powered by meta AI
>>
>>100189430
can't wait for llama 3 girls 1 cup
>>
>>100189083
ReMM is old, I think Mlewd is the better option of that era
For something more recent, try mixtral, L3 8b, or use some RAM for a bigger model like miqu 70b or CR+ which is 104b. Koboldcpp has a no-install precompiled binary for Linux, which is a good option for offloading. 12gb of VRAM is very limiting at this point.

Personally I'm happy with slower speeds and a smarter model, and lately I've been enjoying the IQ4_XS quant of command-r+ which ends up at ~55gb. A q5 of Miqu is ~50gb. I used mixtral at q8 and that was around 48gb. A 3.5bpw exl2 quant of mixtral could possibly fit. L3 has tokenizer issues in llamacpp which will extend to koboldcpp, not sure if this affects exllama in ooba.

Mixtral instruct is decent, and there are a few decent merges like typhon. High temp (~3-4), minP of like 0.05, smoothing factor 0.2 w/ a smoothing curve of 4.32 is not a bad starting point for mixtral; it basically adds better variety within a subset of high-probability tokens. In koboldcpp you can ban tokens with the word rather than needing the token id like for ooba, which makes it easier to get rid of shivers, bonds, boundaries, consent and the like.
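If those sampler numbers look weird, this is roughly what they do. The sketch assumes min_p is applied before temperature (temperature last), which is what makes temp 3-4 sane, so check your backend's sampler order; smoothing is left out.

import torch

def sample(logits: torch.Tensor, temperature: float = 3.0, min_p: float = 0.05) -> int:
    probs = torch.softmax(logits, dim=-1)
    keep = probs >= min_p * probs.max()                     # min_p prunes the garbage tail first
    filtered = logits.masked_fill(~keep, float("-inf"))
    probs = torch.softmax(filtered / temperature, dim=-1)   # high temp then flattens the survivors
    return torch.multinomial(probs, 1).item()

print(sample(torch.randn(32000)))  # toy vocab-sized logit vector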
>>
File: file.png (44 KB, 692x565)
another L for Llamasisters
>>
>>100189566
There's qwen 72B right there, losing, and you decide to compare llama3 70B to the 110B model?
>>
>>100189624
cope
>>
>>100189083
>ReMM-v2.2-L2-13B
>undislop
The true /lmg/ experience.
>>
>>100189624
8k context, kys llamacuck
>>
File: leaderboard2223.png (156 KB, 1138x1138)
was gone for a while, did we ever reach a consensus? are we back? vicuna 13b beat a 500b by google, maybe closed source models aren't so invincible after all.
>>
>>100189633
>>100189646
Samefag
>>
>>100189566
It is chinese anon. That means that they copied benchmark questions multiple times into their training data.
>>
>>100189669
>they copied benchmark questions multiple times into their training data.
source?
>>
>>100189658
phi3 7b should beat gpt3.5t i think
>>
Wasn't there a graph showing that Qwen's models were outliers? Anyone saved it?
>>
>>100189699
Chinese DNA.
>>
>>100189699
He cracked a fortune cookie where he found it written in traditional Chinese.
>>
Just tested Llama 3 70B and it's bad and slop.
Back to Claude Opus
>>
>>100189699
https://en.wikipedia.org/wiki/Goodhart%27s_law
>>
>>100189566
So... is there any rp finetune from Qwen models?
>>
>>100189866
Uhh llama bros? Did we just lose?
>>
>>100189937
>just
we've always been losing
>>
>>100188256
Classic /lmg/ just sleeping on this release. Too early to say without using it, but this could be best-in-class for VRAMchads. Has GQA, so if you have the 72GB to run qwen 72b properly, you can run this. Seems to at least match llama 3 in benchmarks. Beats it in the chat evals like MT-bench. Qwen-72b was relatively uncensored, and I don't think they did nonsense like filter NSFW stuff from the pretraining. Before CR+, qwen 72 was my favorite model for RP, even more than miqu. Currently downloading, gonna make my own exl2 quants and report back later.
>>
>>100189950
>>100189937
>>100189866
aicg samefag
>>
How do I get AI to write a song, about shitting your pants, without sounding like some gay medieval bard?
>>
File: file.png (5 KB, 198x123)
>>100190003
>>
>>100189188
two more weeks
w
o

m
o
r
e

w
e
e
k
s
>>
>>100189963
I really liked Qwen72's smartness over Miqu but it has some serious gptslop problems so I dropped it as soon as CR+ came out.
I imagine this one will be smart but needs a kumtune
>>
File: Untitled.png (1.03 MB, 1746x1204)
>>100186423
>agent
I don't get it, I imported absolutely everything that you sent in and all I get is the model repeating something from earlier context. I'm not even at my context limit.

Also, I know it says 32K but I just redid the test at 16K to match your exact settings (I didn't forget the alpha) and got the exact same problem. I feel like this is a problem with the llama 3 ST presets somewhere but I don't know where.

lonestriker Llama 3 chat instruct 4.65 @ 16k
>>
File: apagechink.png (1.52 MB, 1146x824)
>>100189963
>Classic /lmg/ just sleeping on this release.
>this could be best-in-class for VRAMchads.
>>
File: 240409937v1.png (169 KB, 871x582)
>>100189699
Since they have done it before, the onus is on them to prove that they didn't.

https://arxiv.org/html/2404.09937v1
>>
>>100189963
People, if we can even call them that, were shitposting about muh kurisu muh miku muh petra muh whateverthefuckittakestoderail/lmg/ yesterday. It's understandable that quality posters dipped.
Even in this thread, you can see many shitposts.
>>
>>100190126
>add special salsa that makes their models better at math
>"NOOOOOOOOOOOOOO YOUR OVERFITTING ON BENCHMARKS! TIENAMEN SQUARE REEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
>>
>>100190126
That just proves that they trained on math papers, not on the benchmark results
>>
>>100190154
China not nambah wan.
>>
>>100186429
leave.
>>
>>100189963
Nobody is sleeping on it, there are just no quants up yet, are there?
>>
File: MagicalMiraiVancouver.png (1.25 MB, 1184x864)
>>100190010
>gay medieval bard
Its giving you gold and you're upset?
>>
>>100188256
I tried it, it's shit for translation so whatever.
>>
File: file.png (744 KB, 711x533)
>>100190171
>build ghost cities just to artificially inflate some numbers on a paper
>choose not to include benchmark questions in your training data even when it costs them nothing
Yeah right.
>>
>>100190250
good morning sir please do not redeem ze gold and miku upset you bitch bastard thank you sir!
>>
>>100190248
The elites don't want you to know this, but you can make your own quants: just download the weights and run a single command. I have 458 self-made quants.
>>
>>100190250
Oh my stars! Ooh, ooh, ooh *bats eyelashes, bouncing up and down excitedly*, that is Hatsune Miku!.assistant
>>
>>100190297
underrated post
converting/quanting yourself is the way to go. If you've got the bandwidth and scratch space I don't know why you wouldn't
>>
Where are the llama 3 finetunes?
>>
>>100190334
2mw
>>
>>100190154
The same sauce that enabled CodeQwen1.5-7B-Chat to solve 7% of hard leetcode problems, and then fall to 0.9% when tested on problems released after training was complete?
7% beats the best Claude and GPT models. I guess they just had such a good sauce for earlier programming. For some reason it stopped working.

https://livecodebench.github.io/leaderboard.html
>>
I'll be real, Moistral v2 felt like a mess (or maybe I had temp too high there too), but genuinely decently impressed with v3.

Yes, it's dumber than a 70B or a Mixtral tune, but it's not dumb enough that you have regrets.
>>
https://medium.com/@sbutlerg/chinas-ai-breakthrough-sense-nova-5-0-outperforms-gpt-4-on-benchmarks-17b39694ac3c

>Beats GPT-4T on nearly all benchmarks
>Has a 200k context window
>Is trained on more than 10TB tokens
>Has major advancements in knowledge, mathematics, reasoning, and coding capabilities
>>
>>100190388
When can I download it?
>>
>>100190388
The Chinese sure are a trustworthy bunch.
>>
>>100190388
>he trusts chinks
>>
>>100190047
That is a WIP autistic prompt (only for Llama-3) that had huge problems with repetitions once there were like 30 messages.
It was more so to show it following the instructions for the response format.

I think it had to do with the embedded one shot agents blocking progress. I have heavily changed it from earlier.

(updated system + sampler)
https://files.catbox.moe/7j1igs.zip

But even then I don't know if it has been fixed; it is a very experimental prompt that is probably still broken.
>>
>>100190047
>>100190438 (me)
Also changed the regex filters.
>>
I said moistral v3 is a sidegrade to fimbulvetr with better vocab ONCE, a thread or two ago, and now there are multiple retards saying it's equal to a 70b? Sure, the writing feels fresh, but it's still retarded. It's nowhere near a 70b. It's probably between yi and mixtral in smarts.
>>
File: x5.png (330 KB, 1370x330)
>>100190126
>>100190154
I read part of the paper to try and understand what it's doing. So basically they use a method that calculates something they call MIN-K%, which is supposed to predict how likely it is that a model was pretrained on a given set of data. BPC, on the other hand, was more for evaluating general model quality (given the assumption that compression = intelligence). It's not the thing that they're saying proves that the data was in pretraining. MIN-K% is what they're saying is the thing that proves it.
So that image actually is not as relevant to our discussion. Their next graph, which does show MIN-K%, is what we're concerned with.
But in the end our conclusion here is that it's only a chance, as MIN-K% is only about probability. And even then, it's only really MATH and GSM8K. They didn't detect issues with other benchmarks. So at most what we can say is that we shouldn't compare Qwen's math-related benchmarks with other models. But stuff like MMLU is still fair game.
>>
>>100190502
>MIN-K% is what they're saying is the thing that proves it.
I didn't word this well. I meant that it proves it's likely, not that it proves certainty of data being in the pretraining.
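For the curious, the score itself is simple enough to sketch. This assumes the standard Min-K% formulation (average log-prob of the k% least likely tokens; a higher score suggests the text was seen in pretraining), which I believe is what the paper uses.

import torch
import torch.nn.functional as F

def min_k_percent(model, tok, text: str, k: float = 0.2) -> float:
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = F.log_softmax(logits[0, :-1], dim=-1)             # predictions for tokens 1..n
    token_lp = logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    n = max(1, int(k * token_lp.numel()))
    lowest = torch.topk(token_lp, n, largest=False).values       # the k% least likely tokens
    return lowest.mean().item()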
>>
>>100190263
what does one even have to do with the other?
>>
Is it unethical to gaslight LLMs by editing their previous messages and lying to them about things that happened outside their context window? asking for a friend (i am ethical)
>>
>>100189560
How the hell do you get 50GB of VRAM? Or are you doing this on your RAM?
>>
>>100190619
2*3090 + 3060
I have 36 by having 3090 + 3060
>>
>>100190388
Didn't Yi 200k only have like 4k effective context?
>>
>>100187803
Do you... not have a gpu? Alternatively, use AtlasOS to shave off a few GB
>>
>>100190614
it's fine if you are interacting with wizardlm-2 or llama3-tier gaslighting model.
>>
>>100190631
I'm not remotely rich enough for this.
>>
File: GMBvoovacAAr2Lb.png (391 KB, 884x444)
>>100190388
I wonder why they didn't compare against the latest GPT-4 Turbo or Opus.
>>
>>100189560
>In koboldcpp you can ban tokens with the word rather than needing the token id like for ooba
How? Last I checked, koboldcpp required the token id as well
>>
>>100190646
I cranked context right to the limit on 34b-200k without issues for a few tasks
>>
File: 1.png (84 KB, 1167x929)
>if
>if
>if
>if
>if
>if
>if
>>
>>100190685
let me guess, you need more?
>>
tried to check out exllamav2 via oogabooba because of the tokenizer bugs in llama.cpp for the first time.
is it supposed to be about 2-3 times slower(sic!) than llama.cpp on a 2070?
q5_k_m.gguf (12-15 tok/s) vs exl2_5_0 (5-6 tok/s)
>>
>>100190685
Kek
>>
>>100190666
same slop as gpt4. call me when chinks drop agi
>>
>>100190366
>after training was complete?
2 weeks ago?
And GPT-4-Turbo-1106 drops from 7.8 to 1.1.
>>
>>100190685
if... it works then I don't care.
>>
>>100190685
Literally nothing wrong with that.
Would you rather use a dictionary and unnecessarily allocate memory instead?
>>
>>100190685
Man, I wish there was an easier way to do this
>>
>>100190718
The weird thing is that I get the same speed in Ooba when selecting its Llamacpp or Exllama as its backend. But for some reason when I try TabbyAPI it's significantly slower than Ooba. This didn't used to be the case but for some reason the latest versions are giving me these results.
>>
>>100190685
Is that code that might take microseconds to evaluate per conversion? Ahh save me.
>>
>>100189963
Not sleeping, just still downloading
And then I'll still need to quant and test
>>
>>100188820
Moistral is finetuned. You can't get writing like that from merging the same models over and over again.
>>
>>100190806
I will download it now and I will test it. And if it isn't 70B quality then I will continue to shit on it in the next few threads just to hopefully stop you faggots from shilling garbage.
>>
starting to see the promise of llama 3 as I get more comfortable prompting it but wlm2 is still the king
>>
How to make llama 3 not slop?
>>
>>100190910
>how to eat healthy at mcdonalds
>>
>>100190968
Surely there are tricks or prompts?
>>
>>100190754
People saw le funny else if meem on tweeter once so they think it's bad.
>>
I can load miqu 5bpw in 48GB with 4bit cache but llama3-instruct OOMs. What gives?
>>
>>100190968
the salad is good and healthy
and don't kek just because it has chicken and dressing in it
>>
>>100190982
[OOC: Stop being shit, thanks.]
>>
>>100190877
why don't you just give me a card you wanna try and i'll test it for you?
>>
>>100191004
different layer size + count, also try to use:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
>>
where would i get started to build one of these bad boys that I can speak to and that speaks back? I'm trying to build an industry specific Alexa basically
>>
>>100191033
google.com
>>
>>100191005
the salad is as healthy as water is healthy : there's nothing in it
it's not fresh, it's been cut from the ground weeks ago then preserved in ultra-low (but not freezing) temperatures, roughly 90% of the vitamins and other good stuff decomposed during that period: leaving something that's effectively flavored cardboard
>>
>>100191048
shut up pussy. salad is healthy. anon said so.
>>
>>100186625
It'll be a great.assistant
>>
>>100191061
fresh salad (eaten the same day as it was cut) is healthy
mcdonald's salad is flavored cardboard
>>
>>100190877
No no no I would rather shit on your shill bullshit. Don't you worry anon you will get your free publicity.
>>
>>100191044
the porn search website?
i mean if i wasn't a fucking retard. where do i find the smart people doing this shit. tell me the top secret ai forum now or else
>>
>>100191033
Whenever I read posts like this one I picture a sociopathic middle manager that just typed something into chatgpt and now thinks he is gonna come here and get a recipe for a bot that will let him fire some people AND increase productivity.
>>
>>100189963 (me)
Well shit, Qwen-110B has such fucking huge MLP layers that exl2 quantization OOMs on this line: hessian_inv = torch.cholesky_inverse(hessian_inv)
Doesn't matter how small the row length is on the calibration set. I think the memory usage is just based on the size of the weight, which they made really large in this model.

Turboderp if you're reading this, is it at all possible to do this inverse distributed across multiple GPUs? I.e. use the combined VRAM of all GPUs to do it.
>>
>>100191118
oh cmon now i'm not a corporate fag I just want a tulpa in my phone to give me some industry specific information and call me nigger on occasion. I know there's a better place to ask questions to learn shit than this shithole where the fuck is it.. Google just wants to sell me ads they don't return real search results
>>
File: 00043-404906828.png (1.51 MB, 1456x1024)
>>100189963
>>100190065
>>100190248
The sad reality is that both the West and the Chinks have their own retarded sacred cows baked into their models. Globohomo-slop is more annoying for RP and most other purposes compared to CCP atrocity denialism.
The real question is... is it any good?
Gonna quant the 110B and find out. Any good ST prompt settings for the Qwen family? I've never used a chink model before.
>>
>>100191165
How much VRAM on the single GPU you're using?
>>
>>100191233
Sounds obvious but tell it to write in English if you get random runes.
>>
>>100191251
I have 4 3090s. When quanting MLP layer, it uses about 22GB for a bit, then says out of memory, tries to move stuff to other GPUs repeatedly then fails at that line.
>>
>>100190631
>>100189560
>>100189645
Having some trouble with the file type. I assume this GGUF thing is what is now state of the art? I still had safetensors in the dusty old model folder.
I downloaded a wrong version right now, I think, and then had an out of memory error, even though the model def fits into the 3060. Is that common, or is my CUDA version maybe fucked up?
>>
File: 1608319661008.png (49 KB, 640x266)
>>100191285
ty for the info anon
will try quoonting on an A6000 and see what happens
>>
>>100187639
Been a while since i saw skipping being explored. thanks for the readings
>>
>>100190877(me)
I downloaded moistral v3 gguf Q8 imat. It is fucking incoherent garbage. Pure llama3 instruct is noticeably smarter and better (and it isn't a fucking frankenmerge).

Like I promised dear shill I will keep posting this message in new threads.
>>
>>100187639
Yay! I knew that idea I had was smart.
>>
>>100191199
If you have an android phone you can install termux, and from there get llama.cpp installed locally. Use http://localhost:8080 for the stripped down prompt interface.
>>
>>100191338
are you surprised? that's why i told you i'd test it for you and save you time. i already have it downloaded and know it's nowhere near 70b level like that fucking retard said. i don't know how you can say it's incoherent though, must be doing something horribly wrong.
>>
>>100187639
What's the difference between this and the varying depth thing?
>>
>>100191405
>i don't know how you can say it's incoherent though, must be doing something horribly wrong.
I just picked up where I was regenning yesterday. LLama-3 understood what was happening. This piece of shit started hallucinating stuff instantly.
>>
>>100191313
GGUF is somewhat of a pain when it exceeds the 50gb file limit and the files have to be split.
>>
>>100191338
>>100191405
>>100191426

Moistral excels in a specific format. Check the README.
>>
>>100191063
Fun fact: This is to some degree a tokenizer issue. If you look at the actual token IDs of "assistant spam", you will find that it says "<|eot_id|>assistant", but the tokenizer you are using fails to decode the special tokens and your generator fails to stop on eot.
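Easy to verify yourself; quick sketch, assuming the stock Llama 3 instruct tokenizer (the eot id should come out as 128009 if I remember right):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(tok.convert_tokens_to_ids("<|eot_id|>"))                      # the eot token id
print(tok("<|eot_id|>assistant", add_special_tokens=False).input_ids)
# if the backend never stops on that first id, the turn never ends and the
# model just keeps opening new "assistant" headers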
>>
File: file.png (45 KB, 619x499)
>>100191479
I now tried out this Moistral 11b v3, given that I have zero reference points otherwise. I downloaded the main GGUF and this is like eight models or something. I loaded the Q0_8? Was that right?
>>
File: CyberMiku1.png (1.39 MB, 1216x832)
>>100191479
split ggufs are a thing now, though, so having to cat/copy them together is mostly a thing of the past
>>
>>100191379
>>100191379
ty for the breadcrumbs i'll look into it
>>
>>100191497 (me)

gen 512
>>
>>100191497
>hides behind "your config must be wrong!"
Classic. Your model is shit.
>>
>>100191338
that level of skill issue, holy fucking shit nigga
>>
So llama 3 8B is a good choice for us 20 GB vramlets?
>>
File: smi.png (31 KB, 723x261)
Finally got my dual 3090s rig. Moistral 11B v3 or llama3-instruct-70B? 70B's download size looks yucky
>>
>>100191546
>nigga
At least call me a nigger you limp wristed nigger faggot samefag. Fuck off back to your discord. Work on L3. Base L3 is better than your slop garbage.
>>
>>100191521
Miku a cool
>>
>>100191563
>maybe if I triple down on le ebin nigger slurs it will resolve my skill issue
L3 is niggerlicious, Moistral is white voodoo
skill issue
>>
>>100191507
you're supposed to pick one. the number next to the Q is the level of quantization. bigger number = less model quality loss from quanting. as for which number to pick that depends on model size and how much you can fit into your vram.
>>
>>100191536

regen 512, i like this one
>>
>>100190968
The only people who think mcdonalds is unhealthy are amerifats, also amerifats think salad is healthy because it has basically no calories in it & fatties think calories = bad (since they have no self control over their impulses)
In normal parts of the world (like canada) there's nothing wrong with eating calorie-dense foods like burgers
>>
Is there any other source of no-act-order gptq quants now that TheBloke is gone? It's the only thing that runs on my pascal card...
>>
>llama 3 won't do explicit or sexual content
I'm astonished. Is there a market gap just for that, because the companies want to ruin this? What do they get out of this
>>
>>100190047
It breaks because of the usage of 'System Message Prefix'. It seems that you can only have one <|start_header_id|>system<|end_header_id|>
>>
>>100191688
In 2022, around 30 percent of adults aged 18 years and older in Canada were obese, while 35 percent were overweight.
>>
>>100191648
Coulda saved myself a lot of bandwidth if I had just loaded the Q0_8 then. Oh well.
>>
>>100191559
https://huggingface.co/LoneStriker/Meta-Llama-3-70B-Instruct-4.65bpw-h6-exl2
>>
>>100191698
download full sized model and choose to load it as 4 bit quant?
>>
>>100191724
>The National Center for Health Statistics at the CDC showed in their most up to date statistics that 42.4% of U.S. adults were obese as of 2017–2018 (43% for men and 41.9% for women).
>>
>>100191707
>What do they get out of this
nothing, it's just a humiliation ritual, males are not allowed to be happy in any form of entertainment.
>>
>>100191728
when picking one remember that context takes up vram space as well. also, the selling point of GGUFs is that you can offload parts of the model onto your system ram at the cost of speed
>>
WHERE ARE THE QUEN 110B QUANTS AIIEEEEEE
>>
>>100191707
Your customer support bot ERPing with customers is a bad look.
>>
>llama.cpp doesn't allocate all the memory it needs up front when loading the model, only OOMs once you start generating
Why is it that exllamav2, a python program, can manage to do this, but llama.cpp cannot? What is the point of using C++ and all this low-level shit if you can't even statically allocate all the memory you need at load time?
>>
>>100191773
python is bloat
preallocating memory you will not use is also bloat
>>
So I tried fp16 8B l3 and IQ3 XXS imat 70B and fp16 really is better.... llamacpp is really fucked somehow.
>>
>>100191761
Where the hell did that website with all the character cards go? The red one? God it's been so long.
>>
File: CyberMiku2.png (1.37 MB, 1216x832)
>>100191773
on llama.cpp it depends on the flags you use. If you --no-mmap it will load the model up front, but by default it will mmap the model file and only fault in the required parts of the model as they are accessed, which both starts the gen faster, and tends to give you some data locality benefits.
That said, it should probably check for mem requirements and at least warn when there doesn't appear to be enough. swap etc does make that a bit harder to say these things for sure
>>
>>100191797
>preallocating memory you will not use is also bloat
It WILL use the memory you dumb nigger, that's why it OOMs. Everything in these models has a fixed size. You theoretically know exactly the size of any temporary buffers to do computation. IIRC exllama does exactly that: once the model loads, it has everything it needs already allocated, and memory usage doesn't budge a single MB after that when you generate. This is not true with llama.cpp.
>>
>>100191816
are you talking about chub.ai? not sure what the red one is..
>>
>>100191816
or are you talking about sillytavern? the frontend? that's red
>>
>>100191816
i just found a new local one https://github.com/cyanff/anime.gf
>>
>>100191709
>>100190453
>>100190438
So what would be the best way to fix while retaining functionality? Both seem pretty important but I haven't gotten around to testing the updated prompts yet.
>>
>>100191853
That's what I was thinking of, thank you kindly man.
>>
>>100191917
Seems to be windows only for now
>>
>>100191984
aww wtf
>>
>>100191917
It's a weird thing to shill because it doesn't do anything new.
>>
>>100191924
Currently I have just moved the output format to the system prompt definition and partially start the response with the defined format. This seems to always use it, even when the previous messages didn't follow the format, which would be required if the outputs from the embedded oneshot 'agents' get filtered out.

Made a rentry as it is easier to update:
https://rentry.org/ExperimentalAgentSimPrompt
>>
>>100188820
How do you set up Moistral on oobabooga? It might not be using the GPU, because it's as slow as it gets for me. I have 36gb of ram.
>>
>>100192158
It is horseshit. It is nowhere near a 30B let alone a 70B. Use 8b instruct.
>>
>>100192168
>>100192168
>>100192168
>>
Anybody got a decent llama3-instruct ST preset? I'm trying 70B and it's much more retarded than miqu I think something is fucked with my configs
>>
>>100192202
Quants are fucked. Load 8B fp16, check if it works well, and you will get a working preset.
>>
>>100191233
>sacred cows
i like it
>>
>>100192158

Use the Alpaca format and write a premise for the instructions.

It's really easy to the format wrong which is why some claim it's incoherent.

Like this guy
>>100192177
>>
>>100192266
>It's really easy to get the format wrong
Even if that were true, that would only mean your shit is extremely overfitted and will implode instantly if you go too far from the training set = it is garbage. Go back to your discord tranny.
>>
>>100192301
I'm not the Moistral guy
>>
File: file.png (113 KB, 1182x737)
>>100186538
>>100187626
Also "emergency" is a misnomer making it sound like something used to stop urgently when it's just a parking brake. It does mention its use for parking on hills.
>>
Instruct mode example dialogue in ST is gigafucked, it's impossible to make a usable preset with what we have. Guess I'll have to disable example chats for now
>>
>>100192566
you can also enable `Skip Example Dialogues Formatting` and embed them into your context template.
>>
>>100187059

Is there any TTS where you can control emotion?
>>
ggerganov making some fixes
https://github.com/ggerganov/llama.cpp/pull/6920/commits/a774d7084e5aa75ccb4daad3ac3d53c06c7e2837
>>
>>100192729
You can't trust this guy
>>
>>100192890
>you can't trust the hand that feeds you
Make it yourself then faggot
>>
>>100192984
>bootlicking his masters
good goy!
>>
>>100192993
>masters
>good goy!
It is free...


