/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101553102 & >>101546566

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101553102

--Large language model comparison of programming language performance: >>101553305 >>101553666
--Machine learning model benchmark results comparison table: >>101553857
--Mistral-Large is good at lewd and NSFW content, Bitnet discussion: >>101553838 >>101553901 >>101553922 >>101554028 >>101554071 >>101554148 >>101555092 >>101555152 >>101555206
--Llama model performance and its impact on the AI landscape: >>101553176 >>101553387 >>101553483 >>101553315
--Improving Model Safety Behavior with Rule-Based Rewards: >>101553675 >>101553805
--Hugging Face model link: >>101554912
--Running large AI models locally and hardware requirements: >>101554561 >>101554616 >>101554857 >>101554957 >>101554993 >>101555199
--Running Mistral Large 2 with different GPUs: >>101553665
--Mistral license, pricing, and availability discussion: >>101554107 >>101554130 >>101554142 >>101554455 >>101554461
--Mistral Large for smut and open-source licensing confusion: >>101555445 >>101555463 >>101555472 >>101555501 >>101555509 >>101555529 >>101555474
--Logs: Mistral Large disappointment and reroll plans: >>101556295
--Llama.cpp naming convention changes: >>101555081 >>101555170
--GPT-3 and VNTL model performance in Japanese-English translation: >>101556133
--Comparison of AI models and prompting techniques: >>101556100
--Anon asks for help downloading gated models from huggingface: >>101554400 >>101554454
--Miku (free space): >>101553607 >>101554400 >>101554557 >>101555825 >>101555861 >>101555913

►Recent Highlight Posts from the Previous Thread: >>101553112
BREAKING
>BREAKING
BREAKING
>BREAKING
Llama 3.1 flopped
>>101556980Why is Mistral Large so good?
>>101557005They didn't filter their pretraining data.
>>101556989>>101557009What model
>>101557016Mistral Large 2 (2407)
I love my beautiful Eve <3
>>101557018q5_K_M
>>101557018>>101557031
Cohere are on it.
The glowies are making their list and checking it twice.
>>101557049
no, that's illegal in china
two more years
SillyTavern CSS so you don't have to tell people which models you're using:

/* Change timestamp to model name */
.timestamp {
  font-size: 0;
}
.timestamp::before {
  content: attr(title);
  font-size: calc(var(--mainFontSize) * 0.8);
}
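(it goes under User Settings -> Custom CSS in SillyTavern, in case anyone hasn't used that box before; the timestamp element keeps the model name in its title attribute, which is what the ::before rule surfaces)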
It still thinks someone can be their own sister/brother though. Sad.
>>101557044
Some of it has to do with the prompt template I use, which tends to push models into an uncontrollable spiral of sloppy try-hard prose. Which is part of the reason it's used for the testing.
>>101557016
>>101557018
>>101557033
I wonder if it'll pass after being quanted down to my tier. Gimme dat iMat IQ2_XXXSS action.
HOLY SHIT
Mistral is literally uncensored. It can translate hardcore rough ntr bestiality rape smut from Chinese to English, BUT ACTUALLY GOOD. Doesn't read forced, it is just good; just say "translate" and it does. Lmao, fuck google, fuck meta, fuck openai and fuck anthropic
>>101556983Why does this recap feel so lazy, are you fine recap anon?
were we ever satisfied with llms? post some of your favorite sovl moments
>>101557168Prove.
>>101557166There's "Model Icons" you can enable, but showing full models might ruin immersion.
>>101556983>>101557178How does recap anon stay up 24/7? Or is it a bot? What model is used for summarization? Is it fed the entire chat as context?
>>101557178recap anon needs some head rubs and a kiss on the forehead
Holy shit you guys. This was the best Zhongli, dicks out for harambe, test result I've ever gotten from a model.
>>101557202afaik it's a bot but recap anon reviews it before posting
>>101557228how does he stay up 24/7?
>>101557168You can't say that and not at least share the source.
>>101557195
>showing full models might ruin immersion
would it ruin immersion any more than a timestamp does? i feel like the timestamp is much worse, especially if you're not roleplaying in literally the exact present
and you can always turn it off, just like timestamps
>>101557237word on the street is, he does it for free.
So…
L3.1 405B > Mistral Large 2 > L3.1 70B > Gemma 27B > L3.1 8B
For each size?
>>101557240its because he made it up
>>101557249Large 2 is on par with 405B
>>101557237There are 4 Recap Anons working together.
2 more weeks
>>101556993Bump
>>101557267no
>>101556980
Has anyone tried fine-tuning one of those open models from Mistral? How hard and expensive would it be? I thought about preparing my own dataset on certain topics to finetune one of their models to my needs. Do I need to prepare that kind of set with questions and expected answers, or can I just train it on a huge pile of text instead? I am very new to the topic of LLMs in general, so apologies for my lack of knowledge.
>>101557202
>>101557237
picrel
>>101557178
>>101557203
I've had to resort to using a smaller model to keep up with the amount of posting. Please bear with me.
Have never paid a dime or talked to an AI I don't run locally. It's been rough because I suck shit at python, git and being a nerd in general. Through sheer retarded effort I have gotten to a point where I am pretty satisfied with my local output.
Then I fucked up. I put 5 dollars in the paypig machine and talked to Claude. Then I asked him to help me rewrite a fictional character I've been working on. I'm ruined, bros. Like a pretty white girl dropped into Pakistan, I am fucking devastated.
If you're like I was, don't paypig. Not even once. You're better off not knowing.
>>101557249Mistral large 2 > L3.1 70B > Gemma 27B > Mistral NeMo 12B = Gemma 9B > L3.1 8B
>>101557301>expensivemillions of bucks
Nemo-12B or L3.1-8B?
I don't see how 8B would win, and Nemo is mostly uncucked. Haven't tried L3.1-8B though.
What's the consensus so far?
>>101557237The source of his energy is the power of his Goddess.https://www.youtube.com/watch?v=CXhqDfar8sQ
while you were there complaining and being a useless little faggot, ollama guy fixed llama 3.1
not ggerganov, not llama.cpp cuda dev, not slaren. ollama guy fixed it.
https://github.com/ggerganov/llama.cpp/pull/8676/
>>101557330
nemo got mogged hard by 3.1, why do you think they panic-released it just before?
>>101557334kino, but you dont have to be a mean little nigger though personally i have no dog in this fight and hope everyone (except undster) does their best.
>>101557326Swap the positions of 27B and 12B and you're correct.
>>101557334Damn, he was forced to move a finger. That's a power move by the llama.cpp devs.
>>101557362>except undster>>97223983>For the record, I completely and unequivocally support Undi and his creation of new model hybrids, and think that everyone who attacks him is mindbroken incel scum, who may or may not be employed by OpenAI to do so.>everyone who attacks him is mindbroken incel scum
>>101557331Take my hand, Miku. I'll pull you through!
>>101557374It's his redemption arc for not putting llama.cpp in the readme
>>101557374
jesus, is that the level of bait this general is operating at these days?
good thing i only lurk when major happenings occur.
>lmg thread
>all of the posts are from humans
Nuke this shit already
>>101557330Nemo is leagues smarter than 8B at storywriting at least, it's not even close. I think the people claiming otherwise just haven't tried it and are shitposting.
>>101557394
>level of bait
>>97062246
>I'm not Petra. Petra's an amateur. I'm something considerably worse.
>I'm also the point of origin for the practice of the above being added to sysprompts; as well as the 2, 5, 10, 12, and 60 times tables, which enable bots to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.
The 106B~150B range seems to be the ideal for performance. No idea why Zucc keeps gimping himself by skipping this segment and either going for tiny 70b or too big 405b
>>101557409
>Q6_K
your brain is gguf quantized, be quiet computelet
>>101557415>I'm the Schizo Futa Anonwhat in the god damn
how are you guys trying out nemo if koboldcpp hasn't been updated yet?
https://github.com/LostRuins/koboldcpp/issues/1011
i want to try it too
>still no q4_K_M of 405B
>>101557317
Maybe it's just the fact that I started out with a mix of Character.AI and Poe before getting local, but I have no problem with viewing different models as existing for different purposes. Gippity4 is for code help, political analysis, and as a general Jarvis bot, while I still use local for my Chun Li card and periodic futa degeneracy. I even still weave in a little Character.AI from time to time, because although it's a pale shadow of its former self, I still have a few cards there that are hard to let go of completely. Yes, the new interface sucks rocks, but with a sufficiently well-written card, as long as you're not using it for coom, CharAI isn't completely useless, anyway.
>>101557441
frankenstein build
but it might be shit; tried it, pretty broken.
https://github.com/Nexesenex/kobold.cpp/releases
>>101557441
>how are you guys trying out nemo if koboldcpp hasn't been updated yet?
>experimental branch
>llama.cpp itself
>vllm
idk, a true mystery
>>101557441By using llama.cpp.
>>101557427Shut you you IQ2_Migger
>>101557441
I'm using it in ooba, it works fine.
I know a lot of people here don't like ooba for some reason, but pretending you don't remember that it exists is weird.
>>101557473
>you you
rep/rope broke?
MistralAI fags are such gigachads, they managed to get a model as good as L3-405b with a model more than 3 times lighter (123b)
100B Is All You Need?
>>101557394It's a few months old; they insist on dredging it up, constantly. You've also got to love the fact that on the one hand, they keep telling me to go back to R#ddit, but on the other, they also keep digging up my old material, broadcasting it on the board, and therefore providing a multiple course buffet for my ego. Their understanding of psychology is as pathetic as everything else they attempt.
>>101557484
Meta are true chads, they got a model 60% as good as 405B at 50x smaller size..
Has anyone run any tests comparing Mistral Large and 405B Llama?
>>101557441
ooba uses llama.cpp with the tokenizer fix for nemo
koboldcpp is actually slow af at pushing updates now...
>>101557484They also created Nemo which is at least 95% of Large while being 10% of the size and so the optimal choice for anyone who isn't retarded
>>101557505Stop trying to make 8B happen. It's not going to happen.
>>101557503
>multiple course buffet for my ego
glad you agree you're a shitposter, petra. now go back
>Mistral Large 2 was added
>Llama 3.1 70B disappeared
What went wrong?
>>101557528l3.1 8b as well, bet they edited an older result when they only had 405...
>>101557508Large is worse than 405B at pretty much anything, but Large has more sovl.
>>101557528
>large infinitely worse than opus, sonnet, gpt4o
glad to see this meme model die before it took off
>>101557560
>200B
are you trolling? the human brain has ~1000000B
>>101557560Damn this is so weird lol, I guess Water will be the natural enemy of videogen models for a long time.
>>101557554what do you mean? it's the best model to use if you're not a millionaire with 15x3090 gpu's or something, and it has way more sovl than the cucked llama series
>>101557560
These videos are so gross. I don't understand how """"""people"""""" can enjoy looking at them.
Nemo has repetition problems, no?
>>101557441
llama.cpp via llama-server. It works natively with Silly too.
>>101557594instead of spending $3k to run this shit at q4 you can buy literal years worth of claude sonnet 3.5 tokens
>>101557604Does it remember prefixes if you have a large prompt and regenerate?
>>101557598yeyeye
>>101557573
i was wrong, apparently the human brain only has ~86 billion neurons
but neurons aren't exactly equivalent to parameters since they can perform some basic logic iirc
either way, transformer models are relatively inefficient compared to our brains, so like the other anon said, 100B is probably all you need
>>101557614claude 3.5 is too cucked you can't do everything with it
How can hugging.chat serve all these big models for free?
>>101557620
Prefixes? It doesn't re-process the whole context, if that's what you are asking.
>>101557631VC cash
>>101557631vc money
>>101557631Investor money, aka. pyramid scheme.
>>101557414Trying right now and it's not capable of following complex instructions like Gemma 2 27B. If you want something formatted differently than the usual book-style RP, it will fuck it up very often.
>>101557631honeypot
>>101557631all me
>>101557631By using a magic zero-bit quantization.
>>101557637>>101557638How does Viet Cong have any money, and why?
>>101557636
Nice, I thought the llama.cpp server was way behind and didn't have basic features like that. I'll try ST with llamafile later today for fun.
>>101557623anon...synapses are parameters, not neurons, each neuron has ~7000-10000 synapses depending on age
>>101557649>it will fuck it up very often.Yeah Gemma can't follow RP Markdown format.
>>101557631I'm letting them use some cards in my private rig to host that service. Be thankful.
>>101557675how come i'm so retarded then?
>>101557649That's a prompt issue. Especially local with that shitty instruct template in SillyTavern.
>>101557690bad training data
>>101557672Lol that feet came directly from horror movies.
>>101557669Just use llama.cpp instead of another fork that might not be updated.
>>101557690
>how come i'm so retarded then?
transformers architecture is way better than our brain architecture?
>>101557675
>synapses are parameters, not neurons
i've never heard this comparison made
>each neuron has ~7000-10000 synapses
this sounds a lot more analogous to a relationship between weights (neurons) than the parameters themselves
>>101557690
poor education, excessive consumption of coom, most human interaction involving posting on a forum where everyone calls each other "Anon"
>>101557690bad training data/ training stopped prematurely
you guys are so mean..
Are quants of the new mistral up anywhere? I can only find some empty hf repos.
>>101557714
>another fork that might not be updated
akshully, jartfile is much faster than chudcpp because i/k quant guy works in collaboration with Jartine
>>101557637
>>101557638
Surprisingly this isn't true, the CEO posted recently that HF is profitable. I was shocked; like you, I assumed they were just burning investor cash.
>>101557747
>transformers architecture is way better than our brain architecture?
it's not though, transformers require a ridiculously larger amount of data (and I think electricity too, but i'm not sure) to run. we don't need to consume the entire internet to be smart enough to know how many r's are in strawberry
>>101557751
>excessive consumption of coom
as if the guys in silicon valley aren't giant coomers...
How do I run Large at home for cheap and with at least 20 T/s?
>>101557747How? Where is that money coming from? What are they selling?
>>101557745*humps you*
>>101557744https://huggingface.co/legraphista/Mistral-Large-Instruct-2407-IMat-GGUF/tree/main
>>101557690
Overtraining on goon data.
>>101557747
Maybe he just lied to get even more VC.
>>101557751
>it's not though, transformers require a ridiculously larger amount of data
we see a shit ton of data as well with our eyes and ears, anon. imagine it's 60 fps, multiply that by your age and you get the astronomical amount of data you actually went through; it's way higher than what the model got in the first place
>>101557762You download Nemo and pretend it's Large
>>101557747
found the post where he says it
https://twitter.com/ClementDelangue/status/1811675386368966682
very explicitly says that they make a profit and aren't burning VC money, which I think would be illegal for a CEO to lie about
>>101557771thanks a lot anon
*sharts*
>>101557771>parts in their own foldersbased
>>101557792how the fuck do they make money though
>>101557805Yeah I don't know either man, lol, all I know is he says they are
>>101557792>>101557805>>101557827isn't huggingface owned by microsoft though?
>>101557722
>i've never heard this comparison made
this is literally how they were invented: they looked at how biological neurons work and created a simplified mathematical model where artificial neurons correspond to biological neurons and the connections between them (synapses in a biological brain) are the parameters of the artificial network.
Have any of you guys used Meta's Chameleon model? The one they released in May: https://arxiv.org/abs/2405.09818#
>>101557792
i remember when chatgpt came out and news articles were talking about how openai was losing millions in a short time, and now huggingface is hosting even larger models. I guess some companies like NVIDIA might pay for the hosting themselves?
what exactly causes repetition related issues? Even at the start of an RP? i've never had this issue and now im suddenly having it. wtf.
>>101557878
>it's way higher than what the model got in the first place
it's not, I calculated it out of curiosity a few months ago. I don't remember the exact number, but the model training would be ~100k human years if I remember correctly. In any case it was way bigger than a human lifespan
>>101557697I can say that Llama 3.1 8B also fails in the same way (if not worse), but 70B gets it immediately. Gemma 2 9B is also definitely not as capable as the 27B version in consistently following relatively complex output formatting (dialogue without tags + interspersed inner monologue + short-form narration with asterisks), but it's on par with or slightly better than Nemo 12B.
>>101557771Is it broken in any way? Is it better to wait for upstream llama.cpp fixes?
>>101557850It was always there but you just didn't notice it.
>>101557893what exactly causes repetition related issues? Even at the start of an RP? i've never had this issue and now im suddenly having it. wtf.
>>101555266
>>101555182
if you can't tell this is a man then your detector needs to be replaced
https://www.youtube.com/watch?app=desktop&v=-mRi-B3t6fA&t=430
>>101557850
Show what you mean. Repetition is not always what most people talk about. If you're getting run-on sentences where the model stops using words like 'a' and 'the' and cannot finish a sentence, that's repetition penalty set too high. If it's repeating sentence structure, then don't be too pushy with your writing instructions. It just picks up the pattern from the context and follows it. It's the one thing they're good at.
>>101557899It was always there but you just didn't notice it.
>>101557901Stop obsessing about it, petra.
>>101557913
let's say 25 years * 60 fps * 150kb (average size of a 1024x1024 picture) ≈ 7.1 PB
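quick python sanity check on that multiplication, for anyone who wants to rerun it (the 150kb/frame is still just a rough guess):

seconds = 25 * 365.25 * 24 * 3600   # ~7.9e8 seconds in 25 years
frames = seconds * 60               # ~4.7e10 frames at 60 fps
print(frames * 150e3 / 1e15)        # ~7.1, in petabytes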
>>101557904
>>101557913
In my case, it's an entire prompt repeated verbatim even over up to 10 swipes, or it copies the structure of two paragraphs while making the rest of the gen original enough.
>>101557690hit the books (training data) and become your own expert
I'm getting refusals from Mistral Large, what am I doing wrong? It's an incest story, both characters adults
>>101557934this... this is not how it works at all anon. You can't just put random arbitrary numbers there and call it a day.
>>101557938I think you are in a unique situation where you could ask yourself why it is doing that. But if that fails there is also an option of asking yourself why it is doing that.
>>101557938Relax the writing rules, then. I'm sure you can remove half your prompt without losing anything important. Also, give it stuff to work with. If you follow the same pattern in your writing you cannot expect the llm to be better than you.
What the fuck are those consolidated weights in the Mistral Large repo?
>>101557976
how is that arbitrary? 25 years is the age when our brain is fully developed, 60 fps is kinda the framerate where we don't see much difference if we go further, and I was being nice with 150kb because that's for a jpeg and our eyes have much more quality than that
>>101557986
>>101557990
The only thing i changed (which i did to try Nemo) was the instruct and context templates, but i switched them back to what i was using before.
At that point i started fucking with settings like a retard (because, again, i wanted to try Nemo), and that doesn't really change much, just creativity.
>>101558001>our eyes have much more quality than that>glasses anons...
>>101557990
>If you follow the same pattern in your writing you cannot expect the llm to be better than you.
How long until we can stop playing with dolls? I am a 30 year old virgin here and I shouldn't be doing that, I think.
>>101558013?
>>101558005
nemo is super repetitive, at least on gguf, i know for sure
>>101557849
no
those articles have no idea what they're talking about half the time, i read one that suggested OpenAI is spending billions of dollars a day on ChatGPT
the reality is that they're making an absolute killing because inference is dirt cheap
>>101558026
yeah, nemo was absolutely broken, which is why i switched back
though I sort of remember an issue like this in the past where issues with one model carried over to another, and i have NO clue how that was fixed. besides maybe rebooting my system, but im doing shit right now so that isn't happening.
>>101556980I'm wondering how sonnet 3.5 compares to llama 405b in c# coding.
>>101558023my eyes are a shit, like jpeg quality 25% or worse
>>101558043
If you're using nemo, lower the temp to 0.3 and move it up as you want more 'creativity'. If you came back from nemo to another model, adjust it accordingly. Show your prompt, show your settings, show your model.
We've been scammed.
>>101557992When I tried Nemo with the official API, it complained when they weren't there.
>>101558047oh nonononono
>>101558036oh, yeah same I have myopia that's why I'm wearing glasses, that doesn't change my argument, your brain sees quality pictures if you wear glasses
>>101558064
neutralized settings (1 temp, sometimes 0, doesn't seem to matter), kunoichi-lemon-royale-v2-32K-7B-Q5_K_M and Meta-Llama-3.1-8B-Instruct-Q5_K_M.
It's funny, turning on dynamic temp seems to adjust and almost break the repetition, but that's dangerous because it sometimes spits out garbage.
Is Mistral Large better for programming than wizardlm 8x22b? Is there a gguf that will fit in 96gb?
>>101558016"Garbage in, garbage out" works in context as well. Make it "entertaining" for the model and it will keep you entertained as well. I hope for a future where all the "ah ah, mistress" proompters are told by the llm to fuck off.
>>101558047
>32k context
that's kinda good, no?
large2stral is everything I ever wanted in a local model
thank you based arthur... thank you...!
>>101557899
Instruct training. It introduces the GPTslop behavioral pattern to the model.
"What is the capital of Britain?"
"I see that you are asking me the capital of Britain. To find out the capital of Britain we can simply look at what the capital of Britain is. The capital of Britain is London."
The reason the training data is formatted like this is to cause the model to set up its own breadcrumb trail to keep it from veering off topic. And these patterns translate over to RP. That's why it iterates over all the shit in the card:
>And then he runs his SLENDER fingers through his JET BLACK LOCKS
just grabbing phrases and shit out of the card and puking them back out.
Finetuning on RP prompts doesn't help either, because all of those contain models running on slopped GPT endpoints doing this exact same behavior. What is really needed is a hand-crafted RP instruct dataset to tune a base model on.
>>101558057
>that doesn't change my argument, your brain sees quality pictures if you wear glasses
sure, was just funny reading that as I literally need to lean in to my desk to read (late and glasses off...)
>>101558073It's subpar and they advertise it as 128k
>>101558073not for coding
>>101558087
if they advertise it as 128k you can probably just use it at 128k
the value in the config doesn't limit you or anything iirc
>>101558099but it'll rope then
VRAMlet talk, anything better than gemma 2 in all these new models?
>>101558107
uhhh no, it shouldn't
don't use a backend that does that
>>101558119
>25 years is the age when our brain is fully developed
that's not even close to being true
>60 fps is kinda the framerate where we don't see much difference if we go further
this is completely wrong, as visual perception doesn't work like a camera, so putting any framerate here is wrong from the start
>and I was being nice with 150kb because that's for a jpeg and our eyes have much more quality than that
the same objection as before. also, if you check how much information is going through the optic nerve you would be surprised how small it is. Most of human vision is just a brain "hallucination". Only a very small part of our field of vision is actually sharp; the rest the brain calculates from blurred images, thanks to saccade movements.
>>101558112anything
>>101558084
hmmmmm
then that means i probably fucked up and still picked the wrong instruct format, thinking i did it right
whoops. ill try a different instruct after im finished generating these 10 images of realistic Rouge the Bat's pussy in SD.
>>101558116
lcpp auto-ropes based on the config iirc; if the config says 32k and you set 128k, it'll rope
>>101558064
>1 temp, sometimes 0, doesn't seem to matter
It should matter. If you get deterministic replies with swipes at temp 1, something is fucked in your setup.
>kunoichi-lemon-royale-v2
Merge. Discard it. At most, take a finetune. Any will do. I won't recommend any.
>Meta-Llama-3.1-8B-Instruct-Q5_K_M
I assume you're using the latest version of whatever you use. If not, update. If you still get the same output at temp 1 it's a bug. You should report it.
>>101558134just change the config then
>>101558134
lcpp also lets you just manually specify the rope base so you can avoid that
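for example, a sketch with llama-server (the gguf filename is a placeholder; 1000000 is the rope theta Nemo ships in its config, so pinning it runs long context without the auto-scaling heuristic):
>./llama-server -m mistral-nemo-q8_0.gguf -c 131072 --rope-freq-base 1000000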
>>101558149that's illegal
>>101558145
>Merge. Discard it. At most, take a finetune. Any will do. I won't recommend any.
i'll have you know that mememerge absolutely BTFO'd any of the recommendations in this thread, especially CR. But i do notice 3.1, even base instruct, with my problems is just slightly better, so that'll be my main.
ill download a newer version and see what happens.
>>101558119
>this is completely wrong, as visual perception doesn't work like a camera, so putting any framerate here is wrong from the start
you can approximate though, because you have to make some calculations, so framerate it is, and 60 seems to be a good spot.
>the same objection as before. also, if you check how much information is going through the optic nerve you would be surprised how small it is. Most of human vision is just a brain "hallucination". Only a very small part of our field of vision is actually sharp; the rest the brain calculates from blurred images, thanks to saccade movements.
That doesn't really matter. Even if the brain sees hallucinations, it's a high quality hallucination; the simple fact that we can differentiate a 1024x1024 picture from a 4k picture means that our brain probably sees in the range of 4k.
I never said we were computers and shit, but if a computer had to live like us, those are the approximations it would get: 60 fps and 4k
thanks for the goon bros, now i'll clean myself up, lift some weights then watch anime
Comparing weights and biases on a digital neural network to biological brains is stupid. You fuckers never learn.
>>101558182training data on the thread bots isn't updated in real time anon
>>101558182uhh, but sam altman and all the other experts said that we'll have agi that can replace humans within the decade
>>101558182forming analogies is an act of higher cognition. You're literally seething at other people not being an NPC.
https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf
What's the word on this, kids? Worth the download?
>>101558204MODS
>>101558208>dolphin
>>101558163
>i'll have you know that mememerge absolutely BTFO'd any of the recommendations in this thread, especially CR.
Dude. It's a 7B. I like me some small models, but i'd never claim a 7b is better than CR.
>>101558216dolphin indeed!
>>101558124Got an opinion that's not retarded hyperbole?
>>101558217Have you been paying attention to the threads the past 24 hours?
>>101558208Hello Petrus.>>101558232
>>101558195
>uhh, but sam altman and all the other experts said that we'll have agi that can replace humans within the decade
Most of the experts that claimed that did it 3 decades ago. Two more weeks, i suppose.
why the fuck is every Mistral Large 2 benchmark about coding performance. literally don't care
>>101557528
>mistral 7b is better than command r plus
>it's also better than the new mistral nemo 13b
>llama 400 better than fucking opus
what the fuck is this list, mate? and I thought arena was bad. don't post this shit ever again
>>101558241only productive use of llms
>>101557771>No IQ quantsmeh.
>>101558207
Analogies are for the layman as an intro to a subject. They're unnecessary for people that understand the concepts.
>See? open notepad.txt and it's like a real notebook
>How do i change pages?
>Are you an NPC??!?!?!?!?!?!?!?!
>>101558241
because coding is probably the most important thing that makes OpenAI relevant. Meta wants to kill that company, so they want people to get free access to a good coding model and not rely on OpenAI anymore for work. and I can understand them; it's a huge security issue to give your data and code to a closed company like OpenAI in the first place
eh large 2 is okay, so funnily melodramatic with my old prompts.
>>101558248Mistral Small is not a 7B model.
>>101558207>forming analogies is an act of higher cognition.thanks anon :3
>>101558273>Metabut he said mistral?
llamafile update: it does work with ST. I used this command:
>./Meta-Llama-3.1-8B-Instruct.Q4_K_S.llamafile --server -ngl 20
Then connected with the "Chat Completion", "Custom (OpenAI-compatible)" choice. I used this as the URL:
>http://127.0.0.1:8080/v1
Is that the best way? That was very easy to set up, as llamafile is literally 1 file for both the model and the server.
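you can also sanity-check the endpoint outside ST with curl, since llamafile serves llama.cpp's OpenAI-compatible API (afaik the model field is ignored when a single model is loaded, any string works):
>curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"local","messages":[{"role":"user","content":"hi"}]}'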
>>101558172based, but for me it's weights, then anime, then goon
>>101558291that's the same thing, Meta, Mistral, Qwen, they focus on coding because companies simply want to work with a local model and not give their private data to Sam Altman
>>101558294>That was very easy to set up, as llamafile is literally 1 file for both the model and the server.buy ad with mozilla money jartine
>>101558294Fuck off, Jart
>>101558294How do you know that model isn't a virus?
>>101558084
This isn't correct. All LLMs have this behavior; it's what is generally called "ICL" (in-context learning).
>>101558319I unironically trust jart
>>101558326base
>>101558204Kek this is like a balloon with limbs
>>101558319Why would Mozilla (a real actual company) distribute malware?
>>101558294I would rather download that random koboldcpp executable than anything that jart touched.
CerealBENCH update
>Claude3.5 Sonnet
>LLaMA3.1-405B
>GPT4o
>Qwen2-72b
>Mistral-Large2
>Claude Opus
>LLama3.1-70b
>Qwen1.5-72B
>llama3-8b
>LLama3-70b
>Command-R+
>Claude Haiku
>LLama2-70b
>llama3.1-8b
>Mixtral8x22B
>Yi-34B
>Mixtral8x7B
will keep you updated
LLAMACPP CRASHED MY MACBOOK
FUCK THIS POS SOFTWARE
>>101558376
That's how you get a virus
>>101558241Because AI companies have completely given up on getting normal people interested in LLMs and are pivoting to just making them into tools for computer programmers.
>>101558282holy shit
>>101558171Anon, no.I don't have the time to explain the whole process of visual perception to you but you can't make these approximations because they don't make any sense. You have no idea what you are talking about, I'm telling you this as someone who studied neurobiology. I had a nice textbook about neurophysiology of vision, I can look for it and give you the title and author when I find it, if you want. It should clear some misconceptions you have.
>>101558379>MACBOOK
>>101558379ollama sir
>>101558364Why would a man try to convince everyone he isn't one?
>>101558282now that's some claude soul
Fixed my repetition issue, seemingly, at least at the start of erps. It was the instruct settings + i forgot to save which settings preset i was using; now it's business as usual. Whoops.
>>101558379
looks like you have to buy another macbook sar :^)
>>101558378anon what the fuck is this
>>101558364No one but Jart is working on llamafile, even if she uses Mozilla's name. And making models be executables is kinda retarded.
>>101558402That girl is a better programmer than you will ever be
>>101558393
>I'm telling you this as someone who studied neurobiology
lol, you wasted your time for knowledge that anon is just gonna completely ignore, nerd.
>>101558419It's a great way of having a self contained model that you will still be able to run in the long term. Having more than 1 file is bloat.
>>101558393
You don't seem to understand. I never said we live like computers; my point was that if one day a computer were to live like us for 25 years (seeing things and so on), the data it would get would be 25 years * 60 fps of 4k pictures. you understand now?
>>101558422
>That girl
Objectively false
>is a better programmer than you will ever be
Yet to be proven.
>>101558182
>comparing the mathematical model of the human brain to the human brain is stupid
/lmg/ brainrot never ceases to amaze me
>>101558438noooo but it not like us tho, it had sensors n shiet you cannot compare!
>>101558442How many stars do you have on GitHub? Do you have 17,000 stars? Yeah I thought not.
>>101558357yeah the prompt was>In a sunny backyard, a beautiful little Russian girl lies on her side, her legs elegantly bent and spread. Clad in a cheerful pink two-piece swimsuit, her exposed stomach shines as she smiles, her golden hair flowing around her in the warm breeze.but obviously it didn't do the two-piece swimsuit part and instead focused on the "stomach" tokenkling is unironically better at making tods than any other age group..>>101558423im just happy neurobiology anon roasted the ESL underage """accelerationist""" for talking about things he doesnt understand and for making me second guess myself when I know I'm smarter than a stinky poo like him
>>101558436The model was already a single file.
>>101558446but you can quantify what a computer is seeing based on the camera, sensors and shit, that's the point
>>101558282that is pretty cool, I remember when people were like local models never ever. idiots.
>https://x.com/OpenAIDevs/status/1815836887631946015
>Customize GPT-4o mini for your application with fine-tuning. Available today to tier 4 and 5 users, we plan to gradually expand access to all tiers. First 2M training tokens a day are free, through Sept 23.
local lost.
>>101558458>00001-of-00011.gguf
>>101558448
At least you don't dispute the first fact. Good for you. You're making progress.
I don't care for keeping a reputation here.
>>101558481old news retard
>>101558497
>Files which exceed the Hugging Face 50GB upload limit have a .catX extension. You need to use the cat command locally to turn them back into a single file, using the same order.
https://huggingface.co/jartine/gemma-2-27b-it-llamafile#about-upload-limits
>>101558495You're probably a redditor with more estrogen than jart herself
>>101558484
>you now have 11 llamafile executables
>each of them a different virus
>>101558497
local still lost, despite the fact that it's old news.
>>101558518That dude is packing. He could swing his enormous cock across your face and knock you out for a week.
>>101558208Does anyone else have an opinion of this model?
>>101558545
sorry petrus, people are burnt out on dolphin/gptslop. claudeslop is the trend now
>Waaaaa... the cuda code too big. i cannot make me llamafile!!! I need to further quant the models into oblivion because windows has a file size limit for executables, waaaaaaaaaaaaaaaaaaaa.
>>101558208
dolphin is dead, I haven't touched that shit since airoboros-13b-gpt4-1.4. that's when he trained his shit on gpt4-march 2023, the most sovlfull gpt4 model we ever had, and it was smart as well. only C3.5 sonnet gave me the same feeling again
>>101558561Why does llamafile make you so insecure?
>>101558563
dolphin and airoboros are not the same though?
>>101558047
This is probably just an updated version of the instruct finetune applied to the previous Mistral Large, which also had 32k token context. Nemo is a newer model.
https://mistral.ai/news/mistral-large/
>>101558423
>lol, you wasted your time for knowledge that anon is just gonna completely ignore, nerd.
it may be a surprise, but I didn't study it for anons
>>101558438
And I'm telling you this is not comparable because you are using arbitrary numbers. You assume that your (60fps * 4k) would be the case for computers, but if it's not necessary for humans to learn then it also doesn't have to be for computers. The amount of data needed to process images would be immensely smaller if computer scientists figured out how to model what our visual cortex is doing. It's like making approximations of what speed you can achieve with different paces of pedaling on a bike while ignoring the fact you can just drive a car to go way faster.
>>101558561calm down cuda dev
>>101558545>Dolphin is licensed according to apache 2.0 license. We grant permission for any use, including commercial. Dolphin was trained on data generated from GPT4, among other models.
>>101558510Feline bros just keep winning
>>101558571Certain ideas are just bad. There is absolutely 0 point in packaging a ridiculously big datafile in the executable. Other than that, it seems to work well for cpu.
>>101558585
>You assume that your (60fps * 4k) would be the case for computers, but if it's not necessary for humans to learn
come on man, you think we could live in a 10fps*256px world? that shit would make me dizzy. there's a reason we feel comfortable at a 60 fps * 4k setting: because that's really close to what we see in real life. don't be obtuse like that, please
Isn't it bullshit to call them 3.1 if they're not related to the original llama version 3 models at all?
>>101558586He wouldn't engage in this drivel. And if he did, he'd do it more eloquently than me.
>>101558609
>if they're not related to the original llama version 3 models at all?
they are though? same arch except for context
>>101558609 (me)
my name is petra, btw
>>101558614
nah, message is too short, not pretentious enough
>verdict: NOT PETRUS
>>101558613
Same architecture, sure, but they're not continued pretrains of the original models. They're distillations of 405B.
>>101558627I thought they used training data from 405B on top of the original models, not full distillation?
>>101558608
you should be slapped for being so retarded and narcissistic
don't reproduce
>>101558637what a projection, kill yourself nigger
>>101558627
was gonna say this
>>101558636
they probably just genned some synth data from 405 to train on top
>>101558600I see the point. I was using Dolphin Mixtral again yesterday, and before I remembered that I could switch to the Mistral tokenizer to enable logit bias, I was getting swamped by a tidal wave of diversity and inclusion. Still, I'm going to download Nemo now, and we'll see how it goes. I'll try some nice, safe, politically neutral code questions. Maybe a Pong game.
>>101558645i'm not the one arguing about stuff i literally don't know anything about you double nigger
>>101558208
for erp try this one
https://huggingface.co/BeaverAI/mistral-doryV2-12b
what's this llama shit i just learned about 5 minutes ago?
>>101558653And I'm not the one talking about biology when the topic is about how close of a setting a computer should get to replicate our point of view, you 2 digit IQ retard
>>101558645>>101558653So remind me again, guys. Why do I get more hate for baiting/shitting up threads, than the people who spam this sort of shit everywhere?
>>101558668a whole lot of disappointment
>>101558608
>come on man, you think we could live in a 10fps*256px world?
This is exactly what I'm trying to tell you. Seriously, check how our vision works; this is a much better approximation if you want to compare (what can't be compared) at all. Most of vision processing is done from incomplete and fuzzy visual data. The way the small amount of data and shitty pictures captured by our eyes become clear and sharp images while going through the layers of the visual cortex is quite mindblowing when you learn about it for the first time.
>>101558657
>made by the one that was screeching that limarp and all models with it should be banned
https://huggingface.co/BeaverAI/mistral-doryV2-12b/commits/main
>>100828064
>>100828083
>>101558672>moving the goalpostsmore like stealing the goalposts because you are the blackest gorilla nigger that has ever lived>>101558674you can kill yourself too
>>101558674
because you're pretentious and holier-than-thou with a victim complex, as demonstrated by this very post
>>101558694go fuck yourself faggot, you know you are wrong in the end of the day, trying to sound smart while talking about irrelevant shit that has nothing to do with the topic in question, get bent nigger
>>101558026I have the same issue with exl2
>>101558657>>101555363 >>101555391
>>101558719Sheeeit
I don't wanna FIND NEMO i wanna FIND DORY and i think EVERY RED BLOODED AMERICAN CAN AGREE!
>>101558672you are arguing with two different anons if you didn't catch that by the way
dory more like boring
>>101558769astounding pun
>>101558705I don't deny this, but the problem is I enjoy it.
>>101558769im finding this pun to be funny
>>101558775tokenizer issue pls understand
What do we do now?
>>101558793die of blood clots from sitting too long
>>101558800Which model?
>>101558800posting logs without model, sampler, and prompt info should be a capital offense
>>101558793Dunno bout you, but I'm still testing Nemo out, with two fine tunes queued for testing.
>>101558793
if Vram >= 72GB: run_mistral_large_2()
elif Vram <= 24GB: run_mistral_nemo()
elif boring_dry == true: run_gemma_27B()
>>101558827
>Nemo out, with two fine tunes queued for testing.
if dory, i'd reconsider
see >>101558722
>>101558819I think the only real reason why they do it, is because they get off on the sense that they have something which other people don't.
currently running Mistral Large q4_M in all of its 0.8t/s glory
comparing results to Nemo, i don't think the slight increase in quality justifies 50 times less gen speed
>>101557904It's repeating the first words from the previous sentences and the following sentence structure. Gemma 27b doesn't do that with the same prompt
>>101558852nemo?
>>101558848
If your scenario is simple enough to not need a smart model, then nemo is the best balance imo. Soul, and smart enough for 99% of rp / writing stuff.
>>101558834
Those comments were mine, in fact.
I like to try the official tune, then a fine tune, then back to the official tune to see if I was doing anything wrong.
Rinse and repeat.
>>101558856yes
>>101558852Show your prompt and settings. If you're too lazy to even give enough information for people to help you, go use gemma or whatever model works for you.
>>101558868
known issue, it's over
had it too. low temp, high temp, no rep pen, some rep pen, {{random}} schizo inject, it'd still loop
>>101558812
>>101558819
gemma2-9b-sppo-iter3-q8_0
config from anon >>101545047
:3
>>101558880backend, quant, format, settings? I have not had that problem myself but I use vllm which I know most dont.
>>101557301
finetuning largestral will cost something to the tune of $1k-$10k depending on how big the dataset is. Maybe more. You will need to rent at least a couple of A100s/H100s for a LoRA. Don't even think about a full finetune.
For the small models, it's much more manageable. You can do it at home if you have 3090s.
dbrx2 when
>>101558909
2 days after grok 2
>>101558898
lcpp/kcpp q8, 0.2-1.1 temp, 1.0 (disabled) to 1.1 rep pen. but some anon said exl2 looped too, above or last thread.
>>101558921Speaking of, Elon took his shiny new 100k H100 cluster online yesterday and started training right away.
>>101558921
4-bit QLoRA isn't that bad. You can finetune 8x22b with just 96GB VRAM, and that's a bigger model than the new Mistral-Large.
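rough sketch of that setup with transformers + peft for anyone curious; the model id, rank, and target modules here are placeholders, not a tested recipe:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# load the base model in 4-bit nf4; this is where the VRAM savings come from
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",  # placeholder: use whatever fits your cards
    quantization_config=bnb,
    device_map="auto",
)

# rank-8 LoRA adapters on the attention projections; only these weights train
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # sanity check before pointing a Trainer at it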
>>101558927 (me)
>0.2-1.1 temp
not dynatemp btw, tried a few in between
>>101558852
a} Disable all other samplers except static temperature.
b} Don't set temperature higher than 0.4 to start.
c} If it's still doing it, the problem is likely either your card or prompt format.
https://files.catbox.moe/ot5sj3.png
For cards, consider using a few-shot format like the one in the above card, rather than W++. The word duplication in the description is intentional; it's a name:value format which explicitly specifies the relations between terms.
>>101558898I use vllm with the neuralsomething fp8 weights from huggingface, set add bos true add eos false, currently temp 0.4, repetition penalty didn't seem to do much so 1.0, top_p 0.9
>>101558962petrus... not like this
>>101558962
>it's a name:value format which explicitly specifies the relations between terms.
>Every statement you process, must be evaluated according to the below six principles.
>"principle of identity":"1 = 1"
>"principle of contradiction":"1 ≠ 0"
>"principle of non-contradiction":"1 ≠ 0"
>"principle of excluded middle":"either positive or negative form is true."
>"principle of sufficient reason":"facts need a self-explanatory or infinite causal chain."
>"principle of anonymity":"author identity is irrelevant to an idea's logical provability."
>I still keep this in my own sysprompt, although I know I will receive shrieks and howls in response.
so you do, huh
>>101558962
i've seen you shill this card a lot of times and i still don't understand wtf it's doing
>>101558986Either tell me specifically what I am doing that you're having a problem with, or shut the fuck up. If your issue is simply the fact that the card format violates your own preconceptions, then that's also not my problem. It's the only card I've got that consistently produces good results with every model I try it with.
>>101558962
>This fork has been tested with three major models; MLewd 13b, Mythomax 13b, and Mistral 7b. Mythomax seems to work best.
https://characterhub.org/characters/petrus4/adriana-cruz
why not link your chub?
>>101558829Is 72GB enough to run it?
>>101559011Have you actually tried using the card? Chatting with it?
>>101559025
>The design of this fork adheres to my card authoring doctrine, of minimising prose as much as possible, while giving the model descriptions, numerical data, a list of interests, and one or two examples of behaviour, and then letting the AI fill in the rest of the blanks. I feel that it works better than adding every single detail myself, since it encourages adaptive rather than static behaviour. I also use Myers Briggs personality profiles, as a means of providing a full and complex personality, while minimising token expenditure.
pure pretentious word slop
>>101558962Why is the word duplication intentional?
>>101559049i will eventually
>>101559073you wouldn't understand, petrus thinks in a higher plane of existence, literally https://characterhub.org/characters/petrus4/hexnet-1d18e703
>>101559082
https://pastebin.com/gHVRraHJ
>>101559118I thought about doing something like that for perplexity too.
>>101559100Tokens: 616 (l3 tokenizer)...
>>101559135Yeah, it saves a lot of time for bigger models. Hopefully they add Mistral Large since 405B is just watery soup.
>>101558946any LoRA that doesn't change at least 10% of the model's parameters is a cope lora.
>>101559158
4-bit qlora, 1 epoch, rank 8 is enough
donate to my kofi
>>101558379
what shit OS crashes from a user-mode app?
>>101559196windows
>>101559212"macbook"
VLLM crashed my windows. Piece of shit.
Is it me or is 3.1 8B / 70B noticeably worse than 3
I am once again asking: is there anything worth updating to for RP over midnight miqu that can run on ~48GB vram?
>>101559238gguf?
>>101559238They're about the same in my testing, but I think there was a lot more shine on 3 due to the hype cycle being so long. Same as I think Gemma 2 27B is being mildly slept on due to it being a bit wonk at release. It's a monster for that size.
>>101559243 (me)
my name is mikufag, btw
and yes, i'm still in denial
>>101559252>Same as I think Gemma 2 27B is being mildly slept on due to it being a bit wonk at releaseDon't most of us know now to wait for finetunes, rather than using a vanilla release?
>>101559256
look, I don't give a shit about you schizos crying about shilling free models. I am currently using said model and nothing I have tried easily outperforms it in sillytavern, but I have not checked for a month
>>101559229linux only crashes windows huh?
>>101559269no? most shit on all tunes pretty much, even the corpo instructs
>>101559256
>>101559169
>>101559100
>>101559082
>>101559073
>>101559054
No matter how much misery you attempt to cause others, it will never equal the amount that you are obviously motivated by yourselves.
>>101558899
I basically wanted to add more knowledge to the model from a small dataset that I will create on my own. I thought about finetuning mixtral or nemo.
i think i'm gonna stick with Nemo for the time being
>>101559054
>Myers Briggs personality profiles
Amateur shit. Real pros use personality checksums.
>>101559297
I'm enjoying it too.
I'll still give dolphin-2.9.3-mistral-nemo and mini-magnum-12b-v1.1 an honest run.
so is there a guide on what to buy to run the big llama yet?
>>101559291
>No matter how much misery you attempt to cause others
funny coming from the guy that was dooming that anything post mixtral 25 was woke and local was over
I saw someone on le reddit say they ran the 70b on a single 4090, isn't that literally impossible
>>101559272They won't give you good information. They have no intention of doing that. All they are interested in is trying to spread their own pain. If you want to know what models are good to use, you are going to have to download some and try them yourself.
>>101559318
>isn't that literally impossible
totally possible if you chop 3/4 of the brain out, ~q2 quants
>>101559319
>If you want to know what models are good to use, you are going to have to download some and try them yourself.
Finally some petrus advice i'd agree with
>>101559336
Technically Q4 is 3/4ths removed, and Q4 is as low as you can go without it becoming a brain-damage quant. And also, technically, don't they do the pretraining at fp32? So even fp16 is 50% brain removal, and Q4 removes 75% of the remaining 50%.
>>101559318IQ2_M and below are viable on 24GB.
>>101559314In how many years' time will you still remember my posts, Anon? 10? 20? 50? If you have so little else to occupy your mind, perhaps I've actually done you a favour.
do either llama 3.1 or mistral large have good exl2 quants yet?
>>101559352Even "brain-damaged" quants are better than FP16 8B.
>>101559364i dunno maybe i'd forget you if you didn't have such a recognizable "tone" to your posts + typing style, and weren't always doomin
Is offloading the context (KV cache) faster than offloading the model?
>>101559252
Honestly, post-unfucking, Gemma 2 27B is my favorite model and it isn't even close. So much knowledge, sovl, and understanding of human psychology and emotion that the other models (sans maybe 405B) just can't fucking touch.
Just wish it had more context. 8k is basically nothing nowadays
>>101559011He thinks invoking logical first principles will magically bootstrap models into becoming smarter than they are. Naturally, if they can't manage basic reasoning, the best way to fix this is to barrage them with a list of impractically generic and abstract rules and they'll use their retard-level faculties of inductive logic to overcome the obvious catch-22 and connect the dots and become 10x smarter.But no one here recognizes his tortured genius and everyone always just writes off his sysprompt as placebo :(
>>101559410
imo nemo being a bit dumber is worth the 128k context. It's also far less dry.
>>101559377>muh heckin' bencherinos
>>101559418
>But no one here recognizes his tortured genius and everyone always just writes off his sysprompt as placebo :(
>>97309445
>I still keep this in my own sysprompt, although I know I will receive shrieks and howls in response.
>>101559410
>8k is basically nothing nowadays
Depends on your use case. For RAG it's hardly a broom closet, but for coombot cards it's still usable. Then again, if its text is that good, you're probably going to want it to slowburn.
>>101559272Try looking in /r/LocalLLaMA again, because you got that recommendation from there, Miku.
>>101559418
>He thinks invoking logical first principles will magically bootstrap models into becoming smarter than they are.
Can you prove it doesn't?
>>101559410Hello sars I would like to take the time to talk to you about google's latest model. Sars are you listening? Sar?
>>101559457
>everything I don't like is le reddit
retard
>>101559418Don't worry, Anon. You did manage to convince me to give up, for the most part.
>>101559483
but that's not me.. (your no.1 fan) like for real, that's someone else...
>>101559460No, they can't; and they also never bothered trying to come up with an alternate approach themselves. They are a group of maybe 3-4 howling jackals; they produce absolutely nothing of worth themselves. Their only goal is to demoralise and dissuade anyone else here, who might potentially produce something valuable; and unfortunately, they are extremely effective at what they do.
>7B: Llama 3 8B
>13B: Nemo 12B
>30B: Gemma 2 27B
>65B: Llama 3 70B
It all worked out in the end
>>101559410
I RoPE it out to 16k. If you're already running a quant, a little RoPE doesn't hurt as bad as people think. Things for me have been generally stable out to 4x, even for intensive stuff like coding.
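in llama.cpp terms that stretch is just the rope scale flag; a sketch (model filename is a placeholder, --rope-freq-scale is the inverse of the stretch, so 0.5 = 2x and 0.25 = 4x):
>./llama-server -m gemma-2-27b-it-q5_k_m.gguf -c 16384 --rope-freq-scale 0.5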
>>101559529come on google gemma 2.1 128k context, come oooonnnnn!
>>101559515
>Their only goal is to demoralise
once again, funny
>>101559536Even 16k is hardly useful besides quick testing and I'm sure any higher and it becomes retarded.
>>101559529Gemma 27B is worse than Gemma 9B though.
>>101559601Yeah how did google manage to fuck that one up so bad?
>>101559643
distillation worked too well
>>101558378openai lost, big time nasty
>>101559436I tested IQ2 70B quants with my 3090 and they do feel better than 8B Llama 3.1 like the graph suggests. Better at following the prompt, at least. They would probably have less damage if output and embed tensors were quantized to something higher.
>>101559667Hello and welcome Robert!
>>101559601
No, that's definitely not true. Earlier implementations lacked softcapping support though, and apparently that affected the 27B model more. GGUF quantizations done before the fix will remain defective.
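for reference, softcapping is just a tanh squash on the attention logits before softmax; a minimal sketch (the 50.0 is the attn_logit_softcapping value from Gemma 2's config):

import torch

def softcap(logits: torch.Tensor, cap: float = 50.0) -> torch.Tensor:
    # squashes logits into (-cap, cap); early llama.cpp builds skipped this
    # step, which is what hurt the 27B more than the 9B
    return cap * torch.tanh(logits / cap)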
>>101559678ROBBERRRTT!!!
>>101559667
>They would probably have less damage if output and embed tensors were quantized to something higher.
https://huggingface.co/RobertSinclair
https://huggingface.co/ZeroWw
>>101559707
>GGUF quantizations done before the fix will remain defective.
Tried after, and SPPO 9B still mogged 27...
>>101559711Nah, q5_k to q8_0 will be enough, no need for f16.
>>101559748
>no need for f16
But (ZeroWw) quantizations...
>>101559740
It's okay anon
I believe you
>>101558898fp16, fp8 or awq?
>>101559601
That's not true at all. And i'm not talking about benchmarks.
>>101559894FP8
>>101559997
did you make it yourself, or is it from huggingface?
>>101560013>>101560013>>101560013
>>101559515
this whole thread is odd
i've never heard of your magical sysprompt and I'm inclined to believe all the posts defending you are simply you with a different hat on
>>101560128Correct.
I tested llama 3.1 8b, gemma 2 27B and mistral-nemo 12B in my native language, German. mistral wipes the floor with llama and gemma.
mistral - perfect German, coherent and meaningful answers
llama - good German, produces mostly nonsense
gemma - broken German, not usable
thanks mistral, i am now your fan
>>101560266hi Wolfram Ravenwolf