/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102429190 & >>102417229

►News
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://hf.co/ICTNLP/Llama-3.1-8B-Omni
>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102429190

--StyleTTS-ZS zero-shot text-to-speech synthesis project and xtts2 improvements: >>102430346 >>102430383
--Mistral AI correctly answers Castlevania trivia, but struggles without game context: >>102430040 >>102430050 >>102430068 >>102430106 >>102430513 >>102430626 >>102430648 >>102430653 >>102430665 >>102430751 >>102430762
--Mistral-Small-Instruct generates a Python script for the booba API: >>102429844 >>102429886
--Mistral AI's Pixtral 12B model shows high accuracy in multimodal knowledge and reasoning tasks: >>102430951 >>102430997
--Q6_K_L and Q8 quantization types compared: >>102432386 >>102432464 >>102432519 >>102432528 >>102432832
--Prompt engineering automation and evaluation for smaller LLMs: >>102429803 >>102429867 >>102430180 >>102430253 >>102431460 >>102431498
--Poor quality AI output with explicit content, users hope for improvement through finetuning: >>102434343 >>102434414 >>102434484
--Mistral Small's intelligence drops at longer context sizes: >>102431384
--Mistral Small model generates in-character response for "The girl called Alice": >>102431859 >>102432030 >>102432064
--Mistral Small Q8 can solve Sally question due to training on quizzes: >>102430428 >>102430654 >>102430636
--IQ2_M Mistral Small model is usable and generates smart responses: >>102431056 >>102431145 >>102431429
--ExLlamaV2 8bpw models padded with extra precision, 6bpw precision sufficient: >>102432045 >>102432069 >>102432103 >>102432834 >>102432962 >>102432987 >>102433001 >>102433040 >>102433048 >>102433108 >>102433180 >>102433192 >>102433082 >>102433149 >>102433244 >>102433764 >>102433261 >>102433277 >>102433311
--Teto (free space): >>102429241 >>102429806 >>102430975 >>102431015 >>102432273 >>102433465 >>102433919

►Recent Highlight Posts from the Previous Thread: >>102429197
>>102434739
I respect your high-context autism, but goddamn, anon. 30k at 30 t/s is a pretty nice deal, I'd say.

>>102434752
>IQ2_M Mistral Small model is usable and generates smart responses
True.

>>102434766
>30k
I don't believe there is a single model that can do 30k tokens of ERP without the quality being complete trash. There are just too many patterns it can pick up on.
Hi all, Envoid here. I made a theme song for Drummer:
https://voca.ro/19Y676wbJfOM
>>102434851
That's why you use meme sampling that discourages repetition. Personally I use presence penalty at 1 to encourage new tokens over old ones universally (1 is still a low enough value that it won't push out important tokens even if they repeat), and DRY at the default values to discourage repeated strings of tokens. Both need the length adjusted until you find the sweet spot, though, or the output quality will eventually suffer once too many tokens are being penalized.
bros... how do I speed up my vector stores... computing cosine similarities is so fucking slow that it's better on CPU...
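If the store is being scanned one vector at a time in a Python loop, the usual first fix is to normalize once and do a single matrix multiply, which hands the work to BLAS. A sketch assuming numpy and an in-memory (N, d) embedding matrix; the function name is made up:

```python
import numpy as np

def top_k_cosine(query, store, k=5):
    """store: (N, d) matrix of embeddings, query: (d,) vector.
    Pre-normalizing turns cosine similarity into one mat-vec product."""
    store_n = store / np.linalg.norm(store, axis=1, keepdims=True)
    q_n = query / np.linalg.norm(query)
    sims = store_n @ q_n                 # all N similarities in one BLAS call
    idx = np.argsort(-sims)[:k]          # indices of the k best matches
    return idx, sims[idx]

store = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, sims = top_k_cosine(np.array([1.0, 1.0]), store, k=1)
# the [1, 1] row is an exact direction match (cosine 1.0)
```

If the store is large enough that even this is slow, the next step is an ANN index (e.g. FAISS or hnswlib) rather than exact search.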
is it just me or is mistral small 22B a complete nothingburger?
>>102434973
It's a nothingburger for non-VRAMlets. But I enjoy tuning models for VRAMlet enjoyment, so it's a somethingburger for me.

>>102434973
It's a collective vramlet delusion.
I ran TabbyAPI four times and the probabilities of the top token were different each time. Is there nondeterminism in exllamav2, or is my graphics card dying?

The whole reason I was doing this was tracking down an unexpectedly large discrepancy between Mistral Small 8.0 bpw exl2 and 8_0 gguf. The top-token probability was 0.6850 on llama.cpp on the first run, then stable at 0.6843 on multiple subsequent regenerations (so I presume something got cached), while on TabbyAPI it was 0.5913, then 0.5951, then 0.5762, then 0.5738. The (IMO rather large) difference ended up mattering in my case because the absurdity I was tracking down was right on the border of being excluded by a min-p filter.

After making sure samplers were neutral on both, the next thing I did was check the SHA256 checksums of the files, since I couldn't believe the difference was that great. If the problem isn't that my hardware is dying, then something is very wrong with at least one of those quants. Both were from LoneStriker. And even if my hardware is dying, that could still be true.
>>102434858
kino

>>102435003
>the absurdity I was tracking down was right on the border of being excluded by a min p filter
As an aside, the token to exclude could actually be the start of numerous reasonable clauses, but if that token is picked, the model then predicts with overwhelming likelihood (90% confidence in the top token) a next token that contradicts previous information. The gravitational pull of certain phrases seems too great.

sometimes I think maybe feeding terabytes of scraped web text into a statistics engine cannot possibly lead to AGI
but I might just be retarded
sometimes I think anons don't realize what bad or good writing is because they don't read good books and have slop brains
>>102435220
female detected
males can easily detect good writing because their dicks will give it a standing ovation automatically
There's unironically nothing wrong with a singular instance of shivers running down a spine. The issue is when it appears in adjacent paragraphs and even adjacent sentences. At that point it becomes slop.
>>102435260
saying this when erotic literature is mainly written for and purchased by women
most men get off to coomer fan fiction

>>102435271
>saying this when erotic literature is mainly written for and purchased by women
hence why most of it is so terribly written and full of slop

>>102435279
do you think models are being trained off of erotic novels? because the slop you see is coming from coomer rp and fanfiction written by horny men.

>>102435220
A lot of people think Asimov was good.

>>102435292
NTA but you've clearly never read an actual book. Spine shivers come from literally all human writing. Because feeling some sensation in one's spine is a literal actual natural reaction to visceral stimulation. And the model just generalizes it all into shivers down the spine, even though in actual writing it's somewhat varied. But the exact average of it is shivers in the downward direction along the spine. This is like... 6+ month old discussion around here.
>>102435340
>NTA but you've clearly never read an actual book.
Stopped reading there. I'm more well-read than 99% of the thread. You probably read Ender's Game and think it was a classic.

>>102435267
Same with eye slop (widening, narrowing, rolling). Once in 2 pages is okay; every fucking paragraph until DRY eliminates them all is slop. I even had a character with no eyes roll imaginary eyes.

>>102435314
I've never had shivers run down my spine IRL, therefore SLOP.
Did some tests. Setup: 2x3090 Ti.
Too lazy to make graphs, so read the numbers (c/t is cost per token, measured as the wattage used divided by tokens/second):
400 W: 17 tok/s = 23.5 c/t
300 W: 17 tok/s = 17.65 c/t
250 W: 14.44 tok/s = 17.31 c/t
200 W: 10.80 tok/s = 18.52 c/t
Conclusion: 300 W optimal (in this setup).
Caveat: used nvidia-smi -pl and trusted the software cap. No hardware measurements.
>>102435340
>stop reading there. I'm more well read than 99% of the thread.
I read that. I read it in what I imagine this guy's voice sounds like.
>>102435003
It is not entirely deterministic. Atomic additions sum numbers in an unpredictable order, and floating-point addition is not associative, especially at FP16 precision, so (a+b)+c != a+(b+c).
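Floating-point addition is commutative but not associative, so summing the same values in a different grouping (which is what parallel atomic adds effectively do) can give different results. A quick demonstration with plain Python floats:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounding happens after a+b
right = a + (b + c)  # rounding happens after b+c

print(left == right)   # False: 0.6000000000000001 vs 0.6
print(a + b == b + a)  # True: swapping operands alone never changes the result
```

The effect is far larger in FP16, where each intermediate sum is rounded to ~3 decimal digits of precision, and a logit is the end of thousands of such sums.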
SOAP: Improving and Stabilizing Shampoo using Adam
https://arxiv.org/abs/2409.11321
>There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam, which only updates running averages of first- and second-moment quantities. This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor -- a memory-efficient approximation of Adam -- showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to the design of a simpler and computationally efficient algorithm: ShampoO with Adam in the Preconditioner's eigenbasis (SOAP). With regards to improving Shampoo's computational efficiency, the most straightforward approach would be to simply compute Shampoo's eigendecomposition less frequently. Unfortunately, as our empirical results show, this leads to performance degradation that worsens with this frequency. SOAP mitigates this degradation by continually updating the running average of the second moment, just as Adam does, but in the current (slowly changing) coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a rotated space, it introduces only one additional hyperparameter (the preconditioning frequency) compared to Adam. We empirically evaluate SOAP on language model pre-training with 360m and 660m sized models. In the large batch regime, SOAP reduces the number of iterations by over 40% and wall clock time by over 35% compared to AdamW, with approximately 20% improvements in both metrics compared to Shampoo.
https://github.com/nikhilvyas/SOAP
neat
>>102434973
It's okay, and the best for what it is (20B~30B). The vocabulary is also not terribly slopped like CR's earlier release. That shit was horrible.
It has okay spatial reasoning, but if you can run something larger, go with that.
>>102435374 (me)
>Conclusion: 300 w optimal (in this setup).
Because speed/cost is the efficiency (bang/buck) and
17/23.5 = 0.723
17/17.65 = 0.963
14.44/17.31 = 0.834
10.8/18.52 = 0.583
(in case it wasn't obvious)
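The same arithmetic in a few lines, for anyone who wants to redo it with their own measurements (numbers copied from the posts above; the helper names are my own):

```python
# (power limit in watts, measured tokens/sec) from the 2x3090 Ti test above
runs = [(400, 17.0), (300, 17.0), (250, 14.44), (200, 10.80)]

def cost_per_token(watts, tps):
    # watt-seconds spent per generated token ("c/t" in the post)
    return watts / tps

def efficiency(watts, tps):
    # speed divided by cost per token, i.e. tps**2 / watts
    return tps / cost_per_token(watts, tps)

for w, tps in runs:
    print(f"{w} W: {cost_per_token(w, tps):.2f} c/t, efficiency {efficiency(w, tps):.3f}")

best_watts = max(runs, key=lambda r: efficiency(*r))[0]
print(best_watts)  # 300, matching the conclusion
```

Note that 250 W actually has the lowest raw cost per token; 300 W only wins once speed is weighted in, which is exactly the bang/buck metric defined above.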
>>102435374
Cool dudes are now capping frequencies; you're not up to speed, dork.

>>102435311
Asimov was good, but not for his writing style.

>>102435425
Ah, Pratyush Patel and Chaojie Zhang. This must be good.

>>102435271
Have you actually read women's crap? I got one book and thought, oh, well, I'm sure the smut parts will be sort of short or whatever. It had 12 pages of a torture scene. A woman's idea of sexy is strange.

>>102434973
It's smarter than Nemo but only like 25% smarter. Not a nothingburger, but I don't think it's as much of a capabilities leap for its size class as Nemo was.

>>102435501
>only like 25% smarter.
Which methodology are you using to come up with that number?
I'm using a 3070 and koboldcpp. Is there anything I need to do to make the AI faster beyond using CUBLAS?
>>102435510
Bellyfeel. Pretend I said "only somewhat" if putting a number on it is offensive to you.

>>102435523
I'm not offended. I was just walking you towards admitting that your statement was a bunch of made-up bullshit.
>>102435540
you dropped this

>>102435540
>The only valid opinions are those that can be measured and quantified
redditbrained take

>>102435552
>>102435562
>if I cut my dick off I'm a woman

>>102435425
How do you do frequency capping on Nvidia? I only see the -pl flag.

>>102435576
nvidia-smi --lock-gpu-clocks 0,1600 --mode 1
>>102434973
It seems good for what it is, meaning 22B models. Obviously it's not going to beat recent models that are bigger.
>>102435414
>like CR's earlier release
Do you mean CR pre-plus? Used to use that a lot until today; that shit can't follow rules if its life depended on it. Even basic stuff like using "speech" or *actions* is beyond it without you pointing it out or editing its messages.
>>102435383
go read green eggs and ham kid, scrampathetic

>Use Qwen 2.5 72B
>It's utter fucking garbage
Where'd Qwen go so wrong, bros?

>>102435574
>"gender and sex are completely different things"
>"to be a real woman I need to cut my dick off!"
How can we take them seriously lol

>>102435657
Multilanguage
>>102435657
Qwen was never good, desu. Every time I tried one of them it outputted random Chinese tokens.

>>102435662
Not true; the Mistral models are really focused on multilanguage and they are good models overall. I'd even argue that the SOTA models (GPT-4, Claude 3.5) are multilingual as well.
here's a tip, unslopnemo is the best 12b tune
>>102435501
25% would be worth it if it works past 16k context too.
Reasons to not use ollama
btw slop is only a problem for ESLs and retards who don't know how to prompt
>>102435713
It is extremely annoying how it takes a long-ass time to allocate space before downloading a model.
It offloads the model from memory if you don't use it for a few minutes.
The quants are hidden in the web page.
You can't use models directly from hf, so you need to download them even if you already have them on your drive.
If you care about good prose in smut, you're gay as fuck.
>grr I don't know how to read
>I just need to coom!
anon just admitted to only having LLMs as a means for sexual gratification and thinks everyone else is gay
Anyone ever used a custom ChatML context/instruct set of instructions that actually improve roleplay? I dunno if snake oil or actually good.
>>102435753
When the prose is the only part of the smut, yeah, I do.

>>102435783
No, just you.
>>102435374
>>102435415
>simply limiting the max wattage instead of optimizing the voltage by undervolting (maximizing the clocks)
wasted performance

>>102435657
>>Use Qwen 2.5 72B
Early access? I've heard they went the Meta route and filtered by bad words.
qwen was never good
https://nvlm-project.github.io/
VLM from Nvidia (not up yet)
>>102435912
>I've heard they went the Meta route and filtered by bad words.
I mean, what else do you expect from chinks? They are as retarded as the West when it comes to censorship.

>>102435619
Thanks.
>>102435825
How much of a performance hit are we talking, though? A single nvidia-smi call is trivial effort.

>>102435931
The West censors shit to make themselves (white privileged college students) feel better.
China censors shit en masse to protect their government's power and to restrict the masses' free speech... wait a second, that's also why the West does it!
What will locusts do when OpenAI enforces CoT for all prompts, making it almost unjailbreakable?
>>102435976
annoying libertarian techbro hands wrote this post

>>102435990
I'm still surprised no one has found a jailbreak to make gpt4 repeat the CoT prompt so we can see it.

I'm trying to set up the optimal settings for Mistral Nemo seen here:
https://rentry.org/freellamas
I'm a little confused on the
>Sequences for this model
section. This menu has completely changed nowadays.
Is input and output now: System Prompt Prefix and Suffix?
Which one is now Last Output Sequence? And which is Separator?
I tried to RTFM but didn't really find anything explaining the old settings.
I posted this on /aicg/ initially and they sent me here kek
Found this in last thread. Can someone else confirm if the gap between IQ4_XS and Q3_K_L really is that significant?
>>102435753
Fuck you, smugposter, the plot makes the porn better.

>>102436007
I just changed per https://hf.co/bartowski/Mistral-Small-Instruct-2409-GGUF/discussions/1 and the writing of Mistral Small instantly changed for the better. That models still work, but with degraded quality, when one uses the wrong template is kind of maddening.

>>102435619
It's hovering around 216 watts for each card (capped at 400w). 14-15 tokens/s. 300w seems a tiny bit more efficient, but not bad. (Efficient here means "how much would I have to pay in electricity for it to write me some long blurb".)

>>102435991
nta but literally kill yourself. no, really. stop what you're doing and go swing from a rope

>>102434739
Let us know your findings. I'm currently trying out Mistral Large IQ2_M (~2.7 bpw) because it's much faster than IQ4_XS (1.3 T/s vs 1 T/s).
IQ3_M seems to be unoptimized because it's slower than IQ4_XS (0.85 T/s).

>>102436319
>muh goberment censorship to something something the masses!
idk why you said "nta" when you're clearly the same annoying retard
>>102435412
LLM engineers have to re-invent shampoo and soap because they've never used them IRL
How can I make Q6_K_L quants? Llama.cpp quantize doesn't support it and google gives no results.
>>102436439
I think it's done like this.

>>102436467
Oh, so just --output-tensor-type Q8_0 and --token-embedding-type Q8_0?
Will Mistral really make any more MoE models? The primary reason they did it was because it was quick and cheap, as they could initialize from a smaller model they already trained, but now they have the compute and can train models like 123B fine.
>>102436544
We MIGHT get a revamp of 8x22 or 8x7 but they're not going to train a new one from scratch. MoEs were a MeMe.

>>102435752
>You can't use models directly from hf
apparently you can import them, but you have to make a one-liner file.

>>102436493
I believe so, yes. When I saw discussion on that matter, it mentioned that output tensors and embeddings at q8_0 were superior, so I think that's all it is. Whether it's truly better or not I can't say.

>>102435990
They use Opus

>>102435003
exl2 is hugely nondeterministic. Has always been. I see it as the price you pay for speed.
>>102435479
I've read one. Within about 50 pages I'd actually run into 8 spine shivers, and that was not even the worst of it. It also had "that" sentence structure almost exclusively, the same one the LLMs spout continuously. Just one of these books has such a high slop density that it can poz an entire dataset, I wager. If it hadn't been published in 2021, I would have sworn it was written by an LLM beginning to end with no editing. But if I'm being honest, the LLM would probably do better.
training with my gpus at 200 watts is pretty comfy
Mistral Small almost seems to be capable of playing my dots game.
Bros? Are we back?
>>102436761
>>102436790
>Newsflash
it was over before it began

>>102436836
Simply ban that word, and live happily ever after.
> Anything interesting being discussed in this thread? Use bullet points and add >>NNNNNNNN links.
---
- Determinism in LLM Generation (>>102435003): There seems to be nondeterminism in LLMs, with varying token probabilities during regeneration. Some users suggest this is due to atomic addition and FP16 precision (>>102435383).
- Efficiency and Power Consumption (>>102435374): User experiments with different power settings for running LLMs. User concludes that around 300 watts is the optimal setting for their setup (>>102435414, >>102435931).
- Comparison of Mistral Models (>>102434973, >>102435510, >>102435520): Discussion on the performance and capabilities of Mistral Small and other models. Some users praise Mistral Small, while others find it lacking compared to larger models (>>102434943, >>102435825).
- Sexual Preferences and Writing Quality (>>102435201, >>102435220, >>102435660): Debate over the importance of prose quality in erotic literature and the gender dynamics related to writing and consuming such literature.
- Jailbreaking AI Restrictions (>>102435990, >>102436005): Speculation on the impact of potential OpenAI changes to enforce Chain of Thought (CoT) for prompts and its influence on "jailbreaking" AI restrictions.
- Vector Store Optimization (>>102434943): A user seeks advice on speeding up computational tasks related to vector stores and cosine similarity computations.
- Quantizing Model Weights (>>102436432, >>102436467): Users discuss the methods for creating Q6_K_L quantized model weights.
- Model Training and Dataset Bias (>>102436701): Conversation around the quality and bias found in training datasets for LLMs, with particular focus on erotic content.
>>102437006
>one of the quotes leads to something completely different

>>102437029
Just showing you how it is. Mistral Small isn't perfect.

>st
>[DEPRECATION NOTICE] Model scopes for Vector Storage are disabled, but will soon be required.
what is this

>>102435668
>random chinese tokens
Based chink making burgers feel the same as us ESLs do with their shitty models whenever the prompt and chat are not in English. Even Claude does this shit and turns French into English tokens.

>>102437105
well this sounds fun for my 15 RAG databases
I guess 2 bit large is much better than 8 bit small?
>>102437307
Is 2 bit smol better than nemo?

>>102437251
Use case?

>>102435520
pls resbond

>>102437429
rp

>>102437431
Buy a new gpu

>>102437431
Exllamav2 is faster than koboldcpp, however it has some non-deterministic behavior and requires your model + context to fit entirely in VRAM. Aside from that, nothing. Maybe practice chess or watch some shows while genning (coming from a 0.5t/s guy)
What's the smallest model that's effective for writing simple scripts or config files?
>>102434744
it's still Tuesday somewhere

>>102437553
Nemo

I have 16GB of VRAM. Glad Mistral released the small model; perfect for people like me.
In b4 people shit on it again for not being as good as models that need 2x24GB+ VRAM.
With Nemo, the CoT prompt did not really improve the writing upon further testing. That might be different with Mistral 22B. Actually had an "oh, also I should do X" (X being something in the context).
But need to test more.

>>102437666
Really? When I looked at programming benchmarks it scored worse than even Llama 3.1 8B.

>>102437692
(In my experience) it performs better with consequential prompts, when you need to correct or improve a solution.
Reporting. Genning a lot of replies in parallel does not seem to reduce the quality.
I made a script to do sentiment classification for a hotel reviews dataset, Russian language, and genning one by one achieves the same result as genning tens in parallel.
Incidentally, Nemo achieves an accuracy of 90%, Mistral-Small 92%, and 2.75bpw Mistral-Large 89%.
Tabbyapi.
>>102435797
Anyone?

>>102437903
ChatML sucks.assistant
guys I think I might have just achieved AGI.
>>102437889
>2.75bpw
os...
Just tried out Mistral Large 2 and I legitimately like it more than Opus.
So far I only used proprietary models for ERP. Why did no one tell me straight up that open-weight models have already caught up in ERP?

>>102438047
Because most people using the API models can't run Large at a worthwhile quant with enough speed.
It seems like even Mistral can't agree on how its prompt template should look.
Opened up tokenizer_config.json of Large and Small side by side. Small has spaces before [INST] and after, also a space before [/INST].
Large only has a space after [INST].
>>102437975
that's mistral small?

>>102438083
Small also seems to start every reply with a space, which is somewhat infuriating because I often want a single-token response for classification, and I always get the space.

>>102438084
Well, I did a LoRA on it, but it ended up overcooked (loss drops fast after the first epoch, it seems), so I SLERP-merged it back into the original model. And it's honestly pretty decent. Although it's one of those models where if you mention "consent" or "NSFW" anywhere in the prompt it will just go full-on porn mode.

>>102438081
It doesn't help that the samplers for mistral-large on proxies are completely inadequate. This leads to the model being repetitive and drier than Popeyes biscuits. It also makes me wonder if some of the other corpo models could be saved if they had a more robust sampler setup.
Watch this become OpenAI's next "breakthrough".
Mistral Small is actually great
>>102438126
what samplers do you use?

>>102438143
Aside from temp: rep pen at 1.1, min-p and smoothing, very rarely DRY at 0.45.

>>102438107
Cap at 2 tokens and trim...

>>102438153
I don't think rep pen and DRY work very well together.

>>102438177
I look at probabilities, and for everything else I just had to look at probabilities of the first token.

n-nooo. the positivity bias in mistral small is so bad.
can finetunes even get rid of that? gemma2/llama3 is dead because of that.
nemo was good because of the convincing characters, even with all the retardation sometimes.
>>102438262
>can finetunes even get rid of that?
They got rid of it from Largestral, so I don't see why not.

>>102435619
Compared to limiting power, this makes idle power consumption worse for some reason, though.

>>102436467
I wish we could use different quantization levels (e.g. none) for the attention layers, or exclude specific layer numbers (e.g. the first and last one); that's possibly where most of the damage actually comes from (according to Meta).
Why is Euryale 2.2 soulless compared to 2.1?
>>102438047
What prompt are you using, anon?

>>102438047
It's nowhere near Opus level. I really wish it was, but it's not.
>Skill issue
Yeah, but not on my end.
So does Mistral Small have the same prompt format as Mistral Large, or is there a space before the [/INST]?
>>102436022
It depends on your model and particularly its size; you can even go as low as 2-bit for a 70B. It'll be retarded for a 70B but still better than a 12B.
You cannot just ask "what's the best quant" or "is there always this drop" and get an answer that applies to everything.
Any LLM-based games anon can recommend? If I can snap a local model into it, even better.
Only played that yandere thing a year back.

>>102438329
Eh? Which finetunes are you talking about? I've been using base Mistral Large Instruct at 2.75bpw and it has the same positivity bias as almost every other model under the sun.
PLEASE man, I'm begging

>>102438927
>>102438329
Would like to know too. I tried Magnum and it lost way too much intelligence.
>>102438768
There are spaces before both [INST] and [/INST], and the system message is placed before the last message from the user rather than at the start of the conversation.
>2 years of shitposting and nobody has made a single private quantifiable benchmark to test a new model's vibes
I'm making one tonight. It's pretty close to ayumi's naughty words bench, among other things.
>>102439070
>It's pretty close to ayumi's naughty words bench
straight to the trash
>>102438943
try one of the 123b merges then

Why is it that the larger the models get, the more assistant slop they become?
Is it because at that size you need synthetic GPT-output training data?
new model when?
>>102439134
yeah pretty much

>>102439117
I bet you don't realize how dumb merges are since you're using already-dumb low-bpw quants.

>>102439134
Training larger models is expensive, so they tend to overcompensate to mitigate risk. Large models are also primarily targeted towards corporate clients.

Hi all, Drummer here...
https://huggingface.co/BeaverAI/Cydonia-22B-v1a-GGUF/tree/main
Formats: Metharme, Mistral, Text Completion
https://fridge-checked-interpretation-hash.trycloudflare.com/

>>102439378
>Metharme
That's a word I didn't think I'd see any time soon. Ancient pyg prompting format.

>>102439484
I see a template that doesn't require added tokens & an assistant tag, I take it.

>>102439509
Should've taken Alpaca then.