/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>101524155 & >>101524039
►News
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
update koboldcpp with the latest llama.cpp pls thank
►Recent Highlights from the Previous Thread: >>101524157
--Paper: vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving: >>101529200 >>101530804
--Papers: >>101529398
--Open-source language model training pipeline: >>101531905
--L3-instruct model evaluation and transformer plateau discussion: >>101524467 >>101524871 >>101525251 >>101525409 >>101525391
--Llama3 context memory limitations and potential solutions: >>101529507 >>101529571 >>101529699 >>101529722
--Ghost 8B Beta: Game-Changing Language Model: >>101532197 >>101532526 >>101532554
--Gemma uncensored with system role prompt: >>101532440 >>101532540 >>101532643
--Anon seeks advice on creative writing prompts and heat values for Nemo: >>101524270 >>101524297
--Anon compares vllm with Nemo to llama.cpp and decides to stick with Wiz/CR+: >>101530022
--C3TR-Adapter v3 outperforms GPT4 Turbo in en-JP translation: >>101531218 >>101531243 >>101531273
--Anon shares their experience with Gemma 2 27B and seeks similar local models: >>101527275 >>101528426 >>101528475 >>101528482 >>101528501
--Anon shares progress on developing an addon with weather and lighting details for AI models: >>101529481
--Temperature settings and model performance: >>101528836 >>101528844 >>101528891 >>101528899 >>101531651
--Request for an extension to validate prompt format and default settings for ST: >>101529183
--Late release of a single-board computer with potentially incorrect specs: >>101529098 >>101529119 >>101531350
--Disappointment with Llama 3.1 base model performance and expectations: >>101525626 >>101525650 >>101525751
--Anon seeks advice on optimizing Gemma-2-27B-it settings: >>101530384 >>101530414
--Anon asks for help with repeated output. Temperature and logits mentioned.: >>101529218 >>101529234 >>101529261 >>101529277 >>101529290 >>101529306
--Miku (free space): >>101524875 >>101524640
►Recent Highlight Posts from the Previous Thread: >>101524362 >>101530623
teto's new tits...
Cohere.
>>101532918>>101532904>No Miku
Threadly reminder that Claude just shits out purple prose and very little of substance preferred only by illiterate jeets who think more words == smarter reply.
>>101532982It's Tuesday
eat a dick
STENKYHENKY PLS MERGE MISTRAL-NEMO SUPPORT INTO KOBOLDCPP
>>101529481Is this post referring to https://github.com/ThiagoRibas-dev/SillyTavern-State
So is Meta going to release their code/methodology for distillation so that the community can make its own intermediate models in the future?
>>101533121
>https://files.catbox.moe/cbclyf.png
Nope.
I haven't messed with that extension of mine in a while. That anon's is something else.
He has posted about it before too.
>>101533158 We'll see soon enough. But I'll note that there's already a FOSS distillation pipeline out. It came out a whole yesterday ago
>>101533163 Okay, I've seen him post a lot about that clothing/lighting/weather extension and got confused, since they both have the goal of keeping a persistent "state". I would really like to try the extension in the post I referenced; it looks like a lot of fun and I don't care if it's a bit rough around the edges as long as it doesn't erase my entire /data folder in ST lol
>>101532918>►Recent Highlights from the Previous Thread: >>101524157Wrong thread. Bad Teto.
>>101533182Yesterday? But that's two weeks from now.
The team making Powerinfer2 should just release a binary version which works only with their current turbosparse models. A CPUmaxxing version of Mixtral47B-instruct running at a couple 10s of tokens/s which everyone can try is better PR than a paper.
There is clearly a degradation of gemma answers with exl2 between 5K-8K context
>>101533325go back to llama.cpp, it works well there
>>101533325Yeah, it was never at a usable state.
>>101533092 Nexsexsex already did.
https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.71010_b3340%2B5
>>101533372>running random binaries from the internet
>>101533092just use llama.cpp like a normal person
>>101533432You can compile his fork yourself.
>>101533478>half samplers missing
>>101533478>llama.cppI'm too retarded to make it work.
>>101533499like the cope curve?
>>101533092use llama-server.exe with some front-end like risu.ai and configure api anon.
>>101533499>samplers
>>101533531>.exeI think they have a linux binary, maybe I'll give that a go>>101533478me big dumb, last time I tried lcpp it was compile-only for linux and my current CUDA install I got everything working great except nvcc is nowhere to be found. might try one of the binaries. I kinda miss recompiling kobold, it made me feel smarter>>101533372thank you!! thanks so much anon this is perfection
>>101533092Use ollama like a normal white man.
>>101533649
this
Ollama just works on my Mac with macOS
my M1 Air (8gb RAM) can run 8b models while coding, watching youtube and shitposting on 4cuck
>>101533670I actually want to buy a Mac Pro with 128gb to run very large models locally while shitting on x86 cucks and nvidianiggers
>>101533670
>Someone actually blew the money for one of the expensive MX macs with the 8gb ram configuration
That shit should be illegal as a minimum ram config on hardware that expensive, it's basically robbing gigaretards like you.
Where the fuck is 3.1! C'mon zuck. It's past 6am on the west coast now.
>>101533704my laptop without any cooling runs LLMs better than your expensive PC, let me ask my uncensored Llama 3 about it, oh, it said you are a dumb nigger, I got my worth out of this laptop, I am going to upgrade to M5 next year when they redesign the whole chassis, 8gb is more than enough, especially on macOS
>>101533744>my laptop without any coolingImagine being triggered by a spinning fan.
>2 hours 50 minutes and 48 seconds until llama 3.1 launches
It won't launch.
>>101533744>8gb is more than enough
>>101533757>imagin-WRRRRRRRRRRR get-WRRRRR *whizzing noise*
>>101533773You're a retard that spent thousands of dollars on a glorified netbook. Nothing you say holds any validity. You had to buy the one that "just works". I feel like I'm doing a disservice to humanity just by humanizing you by providing you with a response right now.
>>101533772it is though
does cpumaxxchad have t/s numbers for 405b already? anyone willing to take bets? i say <1t/s
> Anyone else annoyed by the leak of Llama 3.1?? >I get it, we are all excited and I did look at the benchmarks. But I am still annoyed by the leak. A lot of people invested a massive amount of time and effort into Llama and they are releasing it for free. That is amazing! Let them have a launch based on their terms!https://www.reddit.com/r/LocalLLaMA/comments/1ea7pqy/anyone_else_annoyed_by_the_leak_of_llama_31/
>>101533799llama.cpp doesn't support it yet
>>101533806https://www.reddit.com/r/LocalLLaMA/comments/1ea4x4f/llama_3_405b_q4_k_m_size/
is anyone else having massive repetition issues with nemo? I keep cranking up the rep penalty and changing around the sys prompt, but its still shit
>>101533804>leave the global multibillion dollar corporation alone!Completely ignoring the fact that the only reason Meta is relevant at all in the AI space is because of the original leak. If anything this just gave them more hype.
>>101533799Which quant? Or did you mean the full fp16 version?
>>101533813>is anyone else having massive repetition issues with nemo?yeah, i dropped it cause of that, it kept falling in patterns, X however...... Y however etc.
>>101533804go back and stay back, subhuman
>>101533773
>>101533812>https://www.reddit.com/r/LocalLLaMA/comments/1ea4x4f/llama_3_405b_q4_k_m_size/lej9efo/>its the same guy who leaked mistral medium btwredditors are drooling retards i swear
>>101533744>my laptop without any cooling runs LLMs better than your expensive PCSure, if they're tiny 7B or less models. Otherwise Apple silicon is like having a 3050 where you can pay a shitload of money to upgrade it past 8GB.
>>101533845 Good thing you're there to tell us.
>>101533831that sucks. its pretty good and can actually handle somewhat complex scenarios until it starts shitting itself
>>101533812How tf did he do it? When I try converting to GGUF, I get invalid GGUF metadata errors.
>>101533824>I get your point but it's still not on their terms. If they want advertising, they can build hype themselves. My point is that the team behind Llama should decide on how they want the launch to play out. It should be their decision.
>>101533813Once models begin repeating paragraph-level patterns (for which repetition penalty can't do anything), it's the end. Luckily, with SillyTavern you can use the {{random}} macros to solve this problem.
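For anyone who hasn't used the macro: the idea is that the front-end swaps a placeholder for a random pick every time the prompt is built, so the model doesn't keep seeing the exact same instruction. A rough Python sketch of that idea, assuming the comma-separated `{{random:a,b,c}}` form. `expand_random` is a made-up helper for illustration, not SillyTavern's actual code.

```python
import random
import re
from typing import Optional

# Matches {{random:opt1,opt2,...}} (the comma-separated form is an assumption here).
_RANDOM_MACRO = re.compile(r"\{\{random:([^{}]*)\}\}")


def expand_random(template: str, rng: Optional[random.Random] = None) -> str:
    """Replace every {{random:...}} placeholder with one randomly chosen option."""
    rng = rng or random.Random()

    def pick(match: re.Match) -> str:
        options = [opt.strip() for opt in match.group(1).split(",")]
        return rng.choice(options)

    return _RANDOM_MACRO.sub(pick, template)


# Hypothetical author's note that changes tone on every generation.
note = "[Write the next reply in a {{random:terse,playful,melancholic}} tone.]"
print(expand_random(note))
```

Varying the instruction like this doesn't remove the pattern from the context, it only nudges each new reply in a different direction.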
>>101533878>for which repetition penalty can't do anythingWhat about DRY?
>>101533874
>its pretty good and can actually handle somewhat complex scenarios until it starts shitting itself
agreed, it's annoying. I tried quite a bit of stuff, some rep pen, no rep pen, but eventually it always latched onto something
>>101533845>its the same guy who leaked mistral medium btwthe legendary hacker, 4chan
>>101533845>228 gigsIt's going to be a tight squeeze. The KV cache is going to be fucking gargantuan. But there might be some sweet spot where I can offload just enough layers to load it. (256 gigs RAM 96 gigs VRAM)
https://github.com/SillyTavern/SillyTavern/blob/51c30e/public/scripts/instruct-mode.js#L258
combined_sequence.split('\n')
That explains why random crap ends in the stopping strings. Instruct mode is fucking garbage.
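To illustrate the complaint without claiming anything about what the linked file does around that line: if a multi-line instruct sequence is split on newlines and every piece is treated as a stop string, short fragments become stop strings too. A minimal Python sketch; the template text and the `naive_stop_strings` helper are hypothetical.

```python
def naive_stop_strings(combined_sequence: str) -> list:
    """Split instruct sequences on newlines and keep every non-empty piece."""
    return [part for part in combined_sequence.split("\n") if part.strip()]


# Hypothetical instruct template whose input sequence spans two lines.
combined = "### Instruction:\nUser:\n\n### Response:"
stops = naive_stop_strings(combined)
print(stops)

# "User:" is now a stop string on its own, so generation would halt
# whenever the model happens to write it mid-reply.
```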
>>101533950Just use oobaboogies for instruct
llama 3.1 waiting room
>>101533998I am a patient boy
How good is the new llama going to be bros
>Waiting...
>>101534031It will draw you into its folds for ministrations that will send palpable shivers down your spine until you feel a bond begin to form
>>101534045 i don't get how these fucking phrases are still overused by a bunch of models
benchmark scores have doubled but it's still the same ministrations and shivers up the spine
>>101534045I'm already licking my lips in anticipation
>>101533478 Is it as cancerous to get working on w10 now as it was a year ago?
>>101534066Oh my stars! Oooh ooh ooh! *bounces up and down, bats eyelashes*
>>101534035Go away "GiVe PrOPer CredIt For UsiNG A PaRaMetER" retard.
>>101532904>tranime>>>/a/
>>101534055I can't understand why this problem even exists when you could write a simple script that automatically replaces gpt-isms with different phrases. It seems like a trivial feature to have.
>>101534076
>don't you think I should be at least mentioned since it was me the first one to quantize in this way (while you were saying that nothing changed)?
>Now that people want my quants, you do the same and not even cite me.
>Nice.
>That really motivates me in continuing to share everything I find useful.
>>101534055I'm so tired of explaining this. It will only get worse as the models otherwise get better. There's no process that limits the number of task vectors that can point to an individual outcome. So as the model gets better and recognizes more complex patterns it creates a massive funnel of task vectors that point inferences to these common outcomes. The models literally have digital brain tumors. And eventually the problem will extend beyond just creative writing.
>>101534077newfag
>2mh
>>101534091the model will inevitably go towards shivers because reasons i forgor just regexing it out is a band aid
>>101534102nobody cares, fuck off, buy an ad, then buy a rope
>>101534091>why this problem even exists when you could write a simple script that automatically replaces gpt-isms with different phrasesI don't think you can script away the underlying problem that all the models are just telling you "i start sucking your dick" with a lot of purple prose before and after. If it always gives you shivers then it is probably creatively bankrupt.
>>101534055They could get away with extensive finetuning. Llama 3.1 instruct has supposedly been finetuned on 25 million synthetic examples (potentially trillions of tokens at full 128k context), we'll see in 2 hours how they affected the model's prose.
>>101534075Are you saying I should post more?
>>101534117Everything is a band-aid. Rep-pen, stop strings. If it works, I don't care about it being a band-aid. Also substituting phrases can also aid in mitigating repetition
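The band-aid being argued about is easy to sketch. This is a minimal Python version, assuming a post-processing step that runs on the finished reply; the phrase table and the `deslop` name are made up for illustration, and a real list would be much longer.

```python
import re

# Hypothetical phrase table: (pattern, replacement).
_SLOP = [
    (re.compile(r"\b(?:a )?shivers? (?:down|up) (?:her|his|their|your|my|the) spine",
                re.IGNORECASE), "a sudden chill"),
    (re.compile(r"\bministrations\b", re.IGNORECASE), "attentions"),
]


def deslop(text: str) -> str:
    """Apply each substitution in order and return the rewritten reply."""
    for pattern, replacement in _SLOP:
        text = pattern.sub(replacement, text)
    return text


print(deslop("It sent a shiver down her spine."))
```

As the replies say, this only swaps the surface phrase. The model still wanted to write the shiver, so the sentence around it stays the same.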
>>101534138don't base models usually have less slop than instruct ones?
>>101534121
>add me on discord: robert_46007
>use my quantization method: f16 for output and embed and q5_k or q6_k for the other tensors and you will have a better model.
>>101534110Nothing ever happens.
>>101534162But a lot happens though. Otherwise I'd have already exited the thread like I have so many other generals that I thought I would be in forever.
>>101534175Bad.
>>101534136 I'm only discussing certain gpt-isms that trigger /lmg/tards, poor prose is another issue
What about making it so the front-end feeds the context on an entirely separate instruct prompt that asks it to edit out anything in the reply that is overly repetitive with the preceding conversation. You'd have to give up streaming, but you wouldn't have streaming with a human partner and streaming was just cope for how slow models used to be.
>Like how Miqu isn't actually Mistral Medium, but an amalgamation meant to create anime fan fiction.
>>101534206Sounds like a typical /lmg/ mikufaggot
>>101534147I doubt it, much of it is from humans and published erotica (i.e. books datasets).
>>101534206That username sounds like it was made up by an LLM. Probably damage control jeets hired by meta.
>>101534212you're here early, excited for 3.1?
>>101534136
As the fucking autist retard who keeps manually removing slop from a bunch of data for training, I have insights: it is layered.
1. Yes, LLMs find least resistance paths to providing answers. This we cannot fix without advancing architecture.
2. Yes, humans write so much fucking slop it's unbelievable. Over and over and over the same fucking phrases. Eyes sparking with excitement. Bucking hips. A mix of shit and shit. And so on.
I think there are blatant offenders. Then there is an underlying problem. We can do something about the former, with some effort. The latter requires billions of dollars.
What ever happened to sampler anon anyway? Did you ever try my idea for the win-string penalty? (the one where if it selects too many tokens with absolute certainty in a row, the absolutely certain tokens get penalized)
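One possible reading of the win-string idea, as a minimal Python sketch: count how many tokens in a row were picked with near-certain probability, and once the streak is long enough, knock down the logit of any token that is still near-certain. The threshold, streak length, penalty size and the `CertaintyStreakPenalty` name are all assumptions, not an existing sampler in any backend.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class CertaintyStreakPenalty:
    """Penalize the top token after too many near-certain picks in a row."""

    def __init__(self, threshold=0.99, max_streak=3, penalty=5.0):
        self.threshold = threshold    # what counts as "absolute certainty"
        self.max_streak = max_streak  # certain picks tolerated in a row
        self.penalty = penalty        # amount subtracted from the top logit
        self.streak = 0

    def adjust(self, logits):
        """Return a copy of the logits, penalized if the streak is too long."""
        probs = softmax(logits)
        top = max(range(len(logits)), key=probs.__getitem__)
        out = list(logits)
        if self.streak >= self.max_streak and probs[top] >= self.threshold:
            out[top] -= self.penalty
        return out

    def observe(self, raw_logits, chosen):
        """Update the streak from the unpenalized probability of the chosen token."""
        if softmax(raw_logits)[chosen] >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0
```

Call `adjust` on the raw logits before sampling and `observe` afterwards with whichever token was actually picked.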
>>101534219>That username sounds like it was made up by an LLMr*ddit has randomized usernames suggestions on signup like xbox live used to have
>>101534226>As the fucking autist retard who keeps manually removing slop from a bunch of data for trainingcrestf411? Love your work! Big fan!
>>101534247Thanks. Tell your local fine tuner to use LimaRP-DS.
>>101534215 I started thinking about this again and now I think this is the ultimate llm coomer-doomerpill. I keep 2MW-ing like everyone here and it is debatable how much models are improving but they are improving. However, is it even possible for some new model to come out and be great at cooming? I think they all quickly learn that all smut averages out at shivers down the spine, mischievous gleams etc. Why would a model suddenly learn explicitly not to do that when it is the mathematical average of all smut?
>>101533799 0.5t/s for Q8
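For a sanity check on numbers like this, a rough ceiling is memory bandwidth divided by the bytes read per token. A minimal Python sketch, assuming decode is purely bandwidth-bound, every weight is read once per token, and Q8_0 costs 8.5 bits per weight; the 200 GB/s figure is a made-up placeholder, not cpumaxx anon's actual hardware.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def estimate_tps(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s for a dense model that is memory-bound."""
    return bandwidth_gb_s / model_gb


size = weights_gb(405e9, 8.5)             # 405B parameters at Q8_0
print(round(size, 1), "GB of weights")
print(round(estimate_tps(size, 200.0), 2), "t/s ceiling at a hypothetical 200 GB/s")
```

Real throughput lands below this ceiling once compute, NUMA effects and the KV cache are counted.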
>>101534255>However is it even possible for some new model to come out and be great at cooming?There's lots of great coomer models.You're just burning out your hypothalamus by overdoing it. Many such cases. Sad!
>>101534253You planning a Sunfall tune on 3.1 8B by any chance?
>>101534271>burning out your xI can't believe how much my ass must be burned out from taking a shit everyday. And don't get me started on lungs or heart.
>>101534255 By teaching them not to.
mischievously: 0 hits.
shiver([s]?) down: 0 hits.
>>101534273 Yeah. Hopefully it's a bit more varied than its predecessor.
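Hit counts like these are cheap to reproduce on your own data. A minimal Python sketch, assuming the dataset is just a list of strings; the sample texts and the `count_hits` helper are hypothetical, and the two patterns are the ones quoted in the post.

```python
import re

PATTERNS = ["mischievously", r"shiver([s]?) down"]


def count_hits(samples, patterns):
    """Count case-insensitive regex matches per pattern across all samples."""
    counts = {}
    for pattern in patterns:
        regex = re.compile(pattern, re.IGNORECASE)
        counts[pattern] = sum(
            sum(1 for _ in regex.finditer(text)) for text in samples
        )
    return counts


# Hypothetical training samples before cleaning.
samples = [
    "A shiver down her spine.",
    "Shivers down his back, then more shivers down.",
    "She grinned mischievously.",
]
for pattern, hits in count_hits(samples, PATTERNS).items():
    print(f"{pattern}: {hits} hits.")
```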
>>101534282retard strawman argument.If you aren't going to discuss this in good faith then enjoy your anhedonia. Zero sympathy from me. I will laugh when it destroys you.
>>101534255Avoid narration in your RP as much as possible and you won't see much of that. In my case I like making the model use emoji in substitution of *emotes*; some models like Gemma 2 know how to use them well.Eventually with multimodal models we might get away with narration almost entirely. Most Japanese visual novels, after all, use very little narration, yet they are effective in conveying story events, actions, etc.
>>101525626It needs more training epochs, all the models do. It's vastly cheaper to add more passes on smaller models, and it will also take more passes for the larger models to plateau.
>>101534295I think the lesson from pic related is that it was always a mistake to try and discuss in good faith instead of just ridiculing the retardation. Your x got burned out meme is dumb and I am tired of seeing it on the internet.
>>101534319I'm not even going to look at your retarded cope meme. You are a damaged human being. Seek professional help.
>>101533499then use llama_cpp_hf on booba, it has all the samplers
>>101534292>Yeah. Hopefully it's a bit more varied than its predecessor.Nice! You can know there'll at least be one guy hyped for that.
llama 3.1 is going to change everything
>>101534326>You are a damaged human being. Seek professional help.Anyone who believes dopamine receptors got burned out is a damaged human being and needs professional help. Ask a normie with a normal life what he thinks about your retarded dopamine cope.
>>101534110
2md until first shitty loader implementation
2mw until proper loader implementation
2mm until proper loader implementation without bugs
2my+ until you get what you actually want...
>>101534327Plus none of the cpp tokenizer issues.
>>101533058my beloved>>101533840still running tho +genshunny +likely fake&gay>>101534327>>101534358truly the best of everything. inb4 codelets can't venv and updoot
It's here
https://llama.meta.com/llama-downloads/
>>101534399AAAAAARGHHHHH IM COOOOOOOOOOOOMIIIIIIING
I hate it that mistral Nemo can't write for you. I really like writing half of my reply and then looking at what options the model can recommend. With nemo it doesn't matter if you are mid-sentence, it will start writing a response from the character, even without INST.
>>101534399LFG 128K context
It's here
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B
>>101534399not sure my raspberry pi 3 is up to the task
>>101534420However, is it multimodal?
>>101534431No
>>101534431>multimodalno, pushed back due to eu stuff
>>101534427Provided my info but download instructions 404, and it has a 24 hour timer. Alas.
>>101534399
>>101533804>>101533824>>101533837proof these threads are full of predditors, same nigger comment as a reply to the original leak:>>101518713
>>101534341You act like a drug addict being confronted about their addiction and you're so gooned out you think you're fooling anyone with your copium. Sad.
>>101533878>the {{random}} macros to solve this problem?
>LLaMa3.1 Is out in 3 different sizes: 8B 70B 405B>Base and Instruct are available>(Non mandatory) LLaMa guard and Prompt guard for safetyWe're so back Anons
>>101534431However, does llama-server support multimodal?
>>101534427>>101534449Firing up the Nala box boys. Time to make this kitty purr.
So 128K 8B is basically designed for local roleplaying right?
Mirror when?
>>101534416That's odd.>>101533878I love the {{random}} and {{pick}} macros. You can do so much with those.>>101534513Fuck yeah.
>>101534356ill quant all that into 2mw instead
>>101534449>MP16whats that?
>the diminishing returns are hereShit. Fucking hell. I'm going to have to get a real job and a real gf, aren't I? FUCKING SHIT! HUBERMAN PROMISED ME IT WOULD KEEP SCALING! NOOOOOOO!
>(Non mandatory) LLaMa guard and Prompt guard for safety
>>101534525oh read the whole image nvm ahha
Since it's still the same architecture, it'll just werk with Llama.cpp, right?
>>101534526 the 3.1 8/70B are distillations of the 405B but you seem like a dumbo so lol @ u
>>101534541It'll break somehow
Wait, are there actually people itt who can run 405 locally?
>>101534558we have proof zuckerberg posts here so yeah
>>101534558My MacBook Pro has 64 GB RAM. My desktop has 48 GB VRAM and 128 GB RAM. I ... think with RPC magic I can run it at like Q2 or something?
>>101534552This>>101534558If 1.5 bit precision counts then yes I can
>>101534558There are some CPU maxxers.
>>101534527That's a censoring model you run in tandem, it doesn't mean you can choose a less cucked model.
>>101534541If not they've had like a whole day's head start.>>101534558I'm going to try.The Q4_K_M weights will be 228 gigs. I have 256 gigs of ram and 96 gigs of vram. There might be a magic number of layers I can offload at small enough context to fit the KV cache onto my GPUs and the rest into RAM. We'll see. Loading DeepSeek is pretty dicey as it is.
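Back-of-the-envelope version of the "magic number of layers" hunt, as a Python sketch. It assumes the weights are spread evenly across layers (not exactly true), that Llama 3.1 405B has 126 layers with 8 KV heads of dimension 128 (check config.json before trusting this), an fp16 KV cache, and a made-up 10 GB VRAM reserve for cache and buffers. `kv_cache_bytes` and `max_gpu_layers` are illustrative helpers, not llama.cpp functions.

```python
def kv_cache_bytes(n_tokens, n_layers=126, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Bytes needed for K and V across all layers (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


def max_gpu_layers(weights_gb, n_layers, vram_gb, reserve_gb):
    """Whole layers that fit in VRAM after reserving room for cache and buffers."""
    per_layer_gb = weights_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


layers = max_gpu_layers(weights_gb=228, n_layers=126, vram_gb=96, reserve_gb=10)
ram_needed_gb = (126 - layers) * 228 / 126
print("layers on GPU:", layers)
print("weights left in RAM (GB):", round(ram_needed_gb, 1))
print("KV cache at 8k context (GB):", round(kv_cache_bytes(8192) / 1e9, 2))
```

The layer count is only a starting guess for the GPU offload setting (`-ngl` in llama.cpp); expect to walk it down a few layers if loading fails.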
>>101534558 probably a handful but I'm just gonna use that shit on the cloud
3.1 70b is the model for localchuds
https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/
>>101534577 It does however mean you can jailbreak the shit out of it. And personally? Uncucked models are boring, they're too compliant. I like it when the model fights back a little bit.
>>101534583Except for imagegen models because... the kids okay?
Are we finally back? Was it ever over?
>>101534583Holy. Fucking. Based.
>>101534580You make me want to build a unit>>101534581And you make me want to just use a server
>>101534583
Would it be possible to distill llama 405B into something like 30B? I'm tired of only having 8B and 70B and nothing in between.
>>101534583ZUCK KINOHOPIUMBASED
>>101534601Your humongous server "unit" could be put in a case and run for less than 400W but alas leather jacket man doesn't allow it
>>101534608
Gemma 2 27b
Yi-34b
Mixtral
Jamba
CommandR
Deepseek coder 33b
these are just from the top of my head
>>101534575it's gonna be really really slow though
>>101534611I'm not a powerlet idc about niggawatts.
>>101534427>>101534420>>101534399Nothing burger
>>101534583I wonder what Altman did to piss Zuck off so much.
>Now you’ll be able to take the most advanced Llama models, continue training them with your own data and then distill them down to a model of your optimal size – without us or anyone else seeing your data.>distill them down to a model of your optimal sizeBros..?
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B
>>101534633
>>101534484You are a retard. Now try to deny being a retard and prove that you are acting like a retard being confronted about their retardation. That is what a retard like you would do. Pathetic.
>>101534583I fucking love Zucc redemption arc, he even unbanned Donald Trump on facebook recently
>>101534583
>This is one reason several closed providers consistently lobby governments against open source
Looks like he grew some balls after picking up jiu-jitsu.
lets go ladssubscribe to pewdiepie
>>101534583>Our safety process includes rigorous testing and red-teaming to assess whether our models are capable of meaningful harm, with the goal of mitigating risks before release.meh
>>101534589It's not like they trained the model to be a "prude", they literally remove "inappropriate" responses via a variety of methods, namely:· Penalized language model (PPLM)· Clipped neural OOV (ClippedNOOV)· Data curation (DAC)Yes you can "jailbreak" it, but you're not going to get the sort of "spicy" replies you're hoping for, because they simply aren't there.
>One of my formative experiences has been building our services constrained by what Apple will let us build on their platforms. Between the way they tax developers, the arbitrary rules they apply, and all the product innovations they block from shipping, it’s clear that Meta and many other companies would be freed up to build much better services for people if we could build the best versions of our products and competitors were not able to constrain what we could build. itoddlers btfo
>>101534558Yes. The thought of my 10 x 3090 setup taking 4000W to generate shivers down the spine sends shivers down my spine.
>>101534654Absolutely scathing.
>>101534629>I'm not a powerlet idc about niggawatts.Spoken like someone who hasn't priced running a subpanel to their server room when the main breaker box is full. Shit's expensive! Better hope they wired your server room with two breakers - mine has a separate 20A circuit meant for an AC.
>The fact that the 405B model is open will make it the best choice for fine-tuning and distilling smaller models
>To support developers fine-tuning and distilling their own models.
wait what
you nerds need to get on the case ASAP and give me a 30b model. put the dusty case with 9x 4090s to use.
>>101534692and the fact that those models are pretrained with leddit so that they act like like a cucked faggot doesn't help either
GET ME A 3.1 70B TORRENT NOW
>multilingual>no japaneseevery fucking time
>>101534748I know what you are
>>101534728this, I want a distilled L3-35b now
>>101534748>Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages.
>>101534601 Going to an actual server board is not as plug and play as desktop hardware. I would warn you that much. I had to go into the UEFI, pull my NVMe drive out of the depths of purgatory, wipe it clear and start over. The default memory interleaving settings were also garbage; I ended up spending hours cycling through the BIOS, changing a setting, rebooting and testing before I got it dialed in to where I like it. I was originally using some sick industrial workstation chassis I picked up for cheap off of Amazon, but as soon as I went to put more than one 3090 in it, it was basically done. The x16 PCIe slots are arranged such that without an x16-to-x8 reducer (or cutting the side of the slot) you won't fit more than one card in a workstation chassis (theoretically, if the board is all x16 slots, you could). Either way I had to switch to a mining frame when I decided to go multi-GPU. But then if you like dealing with shit like that as a hobby I guess that's a feature and not a bug.
>>101534748>no japanesethat's surprising because zucc has a japanese wife so you would think he's a kind of weaboo or some shit
>it's been 5 minutes>no quantsthis hobby is dead
>>101534724I just set it up in the basement next to the box and installed the breaker and line myself.
>>101534583zuck is actually fucking based
>>101534639cool
>>101534583i forgive you mark
>>101534778This.
>>101534781It puzzles me to see a supposed good jew. I think that the hidden agenda is to reduce the white population through the dissemination of chatbots.
>>101534828>It puzzles me to see a supposed good jew.he was a bad person before, it's not like we havd to forget this past either just because of his stance on AI
>>101534837 This. Just because someone is correct on one issue doesn't mean they are correct on another.
Some quants available here
https://huggingface.co/collections/hugging-quants/llama-31-gptq-awq-and-bnb-quants-669fa7f50f6e713fd54bd198
>llama 3.1 is in groq>still not in OpenRouterWhat is taking them so long?
>>101534851>half hour agothe hobby isnt dead, this general is
>>101534860It's over. Rug pull in progress. Should have listened.
>>101534844yeah but he called him "a good jew", I wouldn't call zucc "good" just because of one good thing
>>101534728
>give me a 30b model
mpt-30b-chat. You're smart, right? Figure out how to quantize it to exl2 and support the tokenizer it uses and make it run fast. This was the last neutral, non-deliberately-aligned model, and it had 8192 context and wasn't stupid (for its time).
It was GPT-J trained, so there will be shivers. Once you get it running well, maybe then you can do a literotica finetune.
Anyway, Mistral models seem the least cucked. Just use those.
I'd like to get hyped for 70b 3.1 but it's not just waiting for the ggufs. Then it'll take another week for llama.cpp and kobold patches and fixes then in August it'll finally be usable.
>>101534860>>101534868
>>101534851
>only deprecated quants
who the fuck uses AWQ and GPTQ in 2024? they should've focused on GGUF and exl2
>>101534872yeye, i agree.
Hmmm should I bother requesting access or just wait for mirrors?
>>101534874 You keep talking about cuckery, but how cucked are we talking here? just because it's been trained out doesn't mean it can't make inferred spice. can i use it to erp? that's the only question.
>new models dropped>quick, let's quantum lobotomize them immediately>why are the models so underwhelming?
someone needs to make a distilled llama 3.1 104b for me pls, it's a good size, fits into 96gb of vram with 60k context and still runs at a reasonable speed while being much smarter than 70b....
>>101534900>running anything in fp16are you just retarded? if it can't fit in 4bit it's bloat
>>101534900 that's why bitnet must be a thing, with bitnet there won't be quants anymore, you'll use the model as it really is
We are so back
>>101534933give up anon bitnet is a meme
>>101534914>someone needs to make a distilled llama 3.1 104b for me pls,is a 104b model fully pretrained smarter than a distilled 104b model though?
>>101534900It's too late, leather man. FA got ROCm support and Intel builds their own tools
>>101534940it's not, all the experiments made so far showed that it works, why are you such a doomer?
>Sorry, llama-3.1-405b-reasoning is currently experiencing heavy demand. Please try a different model.shut up bitch
>>101534933bitnet is just natively quanted lmao its shit and cope for retards
>>101534890torrent
>>101534887people who use vLLM
BITNET
just tell me how the 8b holds up for long gooning sessions
>>101534936 is it a free API or something? I'd like to try the 405b as well
>>101534955Can 405b be distilled into bitnet?
>>101534877There is no provider...
>>101534583>stands up against the other big tech for open source models>single-handedly keeps VR on life support with his quest headsetsThis guy will bring forth the waifu age all by himself at this rate
>>101534961>bitnet>quanted 2 digit IQ behavior right there
why'd meta choose 8b and 70b and 405b? what's behind these choices? why not 16b 32b 64b and I guess 512b? how long will we be in the "this porridge is too cold, this porridge is too hot" timeline
>>101534976 https://huggingface.co/chat/
>submitted request 12 seconds ago>still not approvedIt's over.
I've never used any "cloud" platforms before. Anyone have any opinions on what to use?
https://aitracker.art/viewtopic.php?t=82
>>101535006My smile and optimism: gone.
>>101534988Uh... meta almost killed vr last year, anon...
>>101535037That was 3.1 70B btw.
>>101534748ywnbj>>101534751>I know what you arenot japanese
>>101535068I think he was accusing you of being a nai shill
>>101535037>>101535059405B is still stupid. But not as stupid. But has sovl.
>>101535037current ar llm architecture is never going to be able to deal with this type of question imo
>https://huggingface.co/leafspark/Meta-Llama-3.1-8B-Instruct-hf-Q8_0-GGUF GGUF version already up, let's gooooo
>>101534887The only relevant quant are AWQ or GGUF for poorfag.
>>101535096I'm aware that it has to do with tokenization. But it's still amusing.
>>101535037Try this sysprompt : >Assistant is a professional, expert linguist with superhuman capabilities.>Always provide your reasoning, step by step, before providing a response to User's query.
>>101535115That question will never be correctly answered due to how tokenization works.
>>101534948>FA got ROCm supportOnly on MI200 and MI300...
>local models.. LE BAD
So are these models multimodal or not?
OPENROUTER IS BLUEBALLING US.
>>101532904405b answers the goat in the boat problem correctly.
>>101535105I'm betting five bucks that it's broken in some way
>>101534583Can they stop calling the model open source? There's no open dataset, so no one can fully recreate the model on its own.
>>101535137no
>Model is overloadedshut it bitch
>do x>Sorry I can't fulfill that request.>how did people do x at the past>*explains*>Okay, then do it.>*does x*Thanks Twitter, that JB just works for 405B.
My work is done. Thank you /lmg/, and see you all for the next release.
>>101535146You will never get a open dataset because it opens them up to litigation.
>>101535157Bye Miku
>>1015351433.1 8b does not answer correctly.
>>101535151Didn't meta want to publish multimodal models?
>>101535115I also told it to be charming and engaging.
>>101535157Release it under FAIPL-1.0 next time
>>101535157Fuck you tranny
>>101535167regulations apparently make that near impossible
>>101535167EU said no
>>101535125 >That question will never be correctly answered due to how tokenization works. Not really. Most tokenizers have individual letters as individual tokens and those will correlate to the final word in the embedding space, even if it's not the path the model is most likely to take, as evidenced by the fact that it can take that word (which might be one or two tokens) and break it down letter by letter if you ask it to (at least most models I tried can do that, even 7b mistral). Go ahead, try that prompt yourself, I bet it will work at least some of the time.
>>101535037 no one can do it, even the best
>>101534583>I believe the Llama 3.1 release will be an inflection point in the industry where most developers begin to primarily use open source, and I expect that approach to only grow from here. I hope you’ll join us on this journey to bring the benefits of AI to everyone in the world.In other words "if we don't become the SOTA after this we're throwing the towel and it's your fault"
>>101535126Leather man doesn't care about consumer cards.
>>101535157Threads without mikusexo? Also, aren't they gonna release models with image capabilities
>>101535171Lmao.That's kind of cute actually.
>>101535037I may be dumb but there are two no? rr at the end.
>>101535191>Also, aren't they gonna release models with image capabilities>>101535178>regulations apparently make that near impossible>>101535179>EU said no
>>101535167 https://x.com/astonzhangAZ/status/1815763885380747422 >We integrated image, video, and speech capabilities into Llama 3 using a compositional approach, enabling models to recognize images and videos and support interaction via speech. They are under development and not yet ready for release.
>>101535037Do you not know how tokenization works?
>>101532904>Mistral NeMo 12BWhere do I get the samplers, context and instruct settings for this? I'm using Simple Roleplay samplers, and the built in Mistral context and instruct settings, and it's not usable, it keeps repeating itself and going all over the place.
>>101535202>strawberry>s>t>r
https://ai.meta.com/research/publications/the-llama-3-herd-of-models/ They completely removed websites from pretraining that are "known to contain adult content". You WILL NOT use the models for ERP, this is for your own good.
>>101535179>Meta's upcoming multimodal AI models won't be available in EU countries due to the bloc's strict regulations, the company confirmed on Thursday. The tech giant's next model is expected to work across text, video, audio and images to enable next-level chatbots, content generation, translation and much more. But not for people living in the European Union.
>>101535125Wrong.
>>101535204>compositional approachBut that's not true multimodality, is it? The model won't directly see the image, but will only get a description of it, right?
>>101535213Doesn't matter. I will still make AI to suck my dick.
>>101535229Like the other anon said its more of a roll of the dice on how it tokenizes the word.
This doesn't look like a Tsundere imo, but it has sovl
>>101535159Sure, but can they not call it "open weights" instead?
>>101535213i want to say this'll at least ease the "shivers down my spine" slop, but in testing it hasnt
>>101535241The tokenization is always the same
>>101535213Fucking boo. Aggressive data filtering is the same approach as OpenAI. Anthropic's CAI approach simply RLHF the shit out of their models that's why they feel more sovl and more alive
>>101535213lol it's even worse, a blocklist wasn't enough, if the website uses too many "dirty words" they just filtered the entire domain. Really went out of their way to filter any and all adult content from pretraining.
>>101535213Yeah, it's over. This won't be as good as NeMo
>>101535234 it doesn't just get a description, they describe the approach in the paper https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
>>101535013 >worse than sonnet 3.5 Damn, it must be really dry
>>101535093If you ask how many times the letter r occurs it gets it right every time
we're in boys.
>>101534583ZUCK I KNEEL
>>101535157In Miku I trust
Thanks Sherlock
>>101535370I sleep soundly knowing I wasted GPU hours and electricity for this
>>101535370ask it to create a script in any language you want to count it instead, instead of coping around acting like the model is dumb because of tokenization, proving that the only retard here is u
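Something like this is what the model would be expected to write if you asked for a script instead of a direct answer. It is just a minimal sketch; the function name is made up and the comparison is case-insensitive by choice.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    for word in ("strawberry", "berry", "Raspberry"):
        print(word, count_letter(word, "r"))
```

The code works on characters, so how the word happens to be tokenized stops mattering once the counting is done outside the model.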
>>101534860>>101534985out 7 minutes ago
So is new 70B still worse than gemma 27B?
>>101535384"berry" is a single token actually
>>101535282There's absolutely zero reason why an AI model should have loli porn in its pretraining data like Claude has
>>101535410Let's gooo
>>101535418>erm, what is the usecase for this?
well other than 8B everything else is going to have to wait for mirrors because I don't feel like typing up a script to skip the consolidated file and they put the consolidated weights in the repos
>>101535417and " berry" can be a different tokenand "berryberry" can be a different tokenand "berry" can be tokenized as "be" "rry" or "ber" "ry" etc etc etc, thats the point and thats the problem, retarded brown
>>101535455JUST ASK HOW MANY TIMES THE LETTER R OCCURS IN THE WORD BERRY!
>>101535461i know how tokenization works unlike tourist predditors infesting these threads so im not retarded to do that
>>101535461Or just get it to write a bunch of verbose slop so that it says the word multiple times before even attempting. >>101535229
>>101535455Are you fucking retarded? Run it through l3 tokenizer and tell me the output
>>101535484feel free to run it and post it yourself here bro, im sure every berry will be the exact same token as you say, right?
>>101534449I wonder if it's possible to use Prompt-Guard/Llama-Guard in reverse. I have some ideas.
>>101535471>>101535477kowabunga yourselves
>>101535504
>>101534416Nah. Fix your prompt.
>>101535520and that is why it usually thinks it has 2 rs
>>101535531>>101535516
>>101535520 that doesnt prove the tokens are the same tokens retard, it just counts them. post any tokenizer that shows the ID of each token, and post the tokenization result of the entire prompt, i'm sure the "berry" with quotes in the prompt that you tell it is precisely tokenized as 1. " 2. berry 3. " and not just as one token "berry" lmao
Oh no no. the ooba configuration utility does not like the rope config arguments used for llama 3.1 See you in two weeks, boys
>>101535554I am 100% sure "berry" will be tokenized as a single token every time. Post one that is not the case and I'll kneel
>>101533768It did nigga
>>101534527405B Instruct still has refusals.
>>101535578Thank you.
>>101535579jb issue
>>101535576Raspberry [49, 37062, ] (with a capital R)
>>101535157love you anon
It's unironically over for Claude and OpenAI. No one will use their models anymore. Too expensive.
>>101534327Mistral-Nemo GGUF's finally working on Ooba, pulled and it works great.What's the over/under on big L3 coming out today anons? Anyone wanna take that bet?
tick tok nigger get to it
>>101535665>What's the over/under on big L3 coming out today anons? Anyone wanna take that bet??
>>101535631I forgot to add that it must not be broken down into smaller tokens like you said. Otherwise you can see whatever it is in the model's tokenizer.json, otherwise there's a few other tokens, even " berry" is one
>>101535674There are already GGUF's available >https://huggingface.co/bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF
https://github.com/meta-llama/llama-agentic-system
>>101535723stop trying to make "agents" happen
>>101535662Yeah, I think that was Meta's plan.
>>101535723Brehs are they actually giving us AI for free? Not just "Here's your model brah now fuck off"?
First Nala test done. 8B-Instruct f16 gguf. I had to drop the temperature down to 0.7; at t=0.81 the response felt a little weird. Prose is definitely less purple but still sloppy. But feralicity remains consistent throughout. It seems that distilling it is more toxic to the prose than to the model's ability to conceptualize.
>>101535714I don't know if I would trust any quants yet, there always seems to be problems with them whenever a new model comes out
openrouter bros...
any ez / just works RAID 0 software where you can just input how much on which storage devices you want to spread a particular file onto?any raid 0 software that can use RAM as one of the places to spread data across without creating a ramdisk first?
>>101535742there are agents in your walls
>https://ai.meta.com/research/publications/the-llama-3-herd-of-models/ Neat. The paper I've been asking for for months.
>>101535758 >It seems that distilling it is more toxic to the prose than the model's ability to conceptualize. That's a good thing in my mind. Prose is style and that can be fixed with a LoRA or even by having the output re-written. If it can conceptualize things well beyond other models of its size, then that's a win as far as I'm concerned. Thank you Nala anon.
70B mirror when? I'm downloading an 8B mirror already.
>>101535213Damn I thought you were shitposting with post-training screenshot, but they actually did it for pretraining data
So is this more or less censored than 3?
>>101535758>vulvaNice. Models don't often use this wording. Then again, I don't RP with furry cards, so maybe this language is more common in furry contexts.
>>101535831the newer the model the more censored they will try to make it but the more easier will it be to uncensor because it will be harder to lobotomize a higher iq racist
>>101535831Yes, there's no reason to use 3.1
>>101535844It ain't.
>Still trying to get llama 3 and gemma to do decent roleplay>now 3.1 is outI want off this ride
inb4 another week of tokenizer issues
>>101535863It's the same tokenizer isn't it? How could it possibly be broken?
>>101535877if model == "llama3": quickhack()
damn this router closed af tho
>>101535760it's literally the same arch as l3.0
>>101535860>Still trying to get llama 3 and gemma to do decent roleplayyour skill issue wont ever go away
>>101535914 >Whilst the overall architecture is the same, it requires some modelling updates, primarily around RoPE scaling: https://github.com/huggingface/transformers/blob/bc2adb0112b6677b0dfb4105c74570a0f92183eb/src/transformers/modeling_rope_utils.py#L298 https://github.com/ggerganov/llama.cpp/issues/8650
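For reference, a rough pure-Python sketch of what the 3.1-style RoPE frequency scaling appears to do, as I understand the linked transformers code. Treat every default below (head_dim 128, rope_theta 500000, factor 8, low/high frequency factors 1 and 4, original context 8192) as an assumption to check against the model's config.json and the linked file, not as the authoritative implementation.

```python
import math


def llama3_scaled_inv_freq(
    head_dim: int = 128,            # assumed head dimension
    rope_theta: float = 500000.0,   # assumed RoPE base
    factor: float = 8.0,            # assumed scaling factor
    low_freq_factor: float = 1.0,   # assumed
    high_freq_factor: float = 4.0,  # assumed
    original_max_pos: int = 8192,   # assumed pre-extension context length
):
    """Return (base_inv_freq, scaled_inv_freq) as lists of floats."""
    inv_freq = [rope_theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]
    low_freq_wavelen = original_max_pos / low_freq_factor
    high_freq_wavelen = original_max_pos / high_freq_factor

    scaled = []
    for freq in inv_freq:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # Short wavelengths (high frequencies) are left untouched.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Long wavelengths (low frequencies) are divided by the factor.
            scaled.append(freq / factor)
        else:
            # In between, interpolate smoothly between the two regimes.
            smooth = (original_max_pos / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return inv_freq, scaled


if __name__ == "__main__":
    base, scaled = llama3_scaled_inv_freq()
    changed = sum(1 for b, s in zip(base, scaled) if b != s)
    print(f"{changed} of {len(base)} frequencies are rescaled")
```

If a loader ignores this and applies plain linear scaling or none at all, short contexts could plausibly still look fine while long ones degrade, which would match the reports about only high context breaking.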
>>101535933yea, rope breaks it I noticed.
>>101535919your virginity won't ever go away
Chad Zucc is now canon btw
someone post them to aitracker.art I'm not giving meta my info
>>101535955Maybe if he acknowledges that 50% of users use it for porn and stops filtering it.
>>101535950further projection from a drooling retard that cant make a basic ai setup work for toy 8b models
>>101535967>not using a pseudonym
Wait a second. I just did a ctrl+f in the paper for "distill" and nothing came up related to 3.1. Was the leak wrong? Are the 3.1 models just 3.0 but with continued pretraining for long context adaptation?
>multimodal still being experimented and not ready for release
>>101535950>arguing like a moidback to plebbit nigger
>Doesn't beat Claude Sonnet 3.5It's over
>>101535978sex havers don't need to setup models
>>101535983all around humiliation ritual
>>101535998ywnbaw
>>101535955They made the meme real kek
>>101535987I'm sure there will be a 8b and 70b versions, will the 8b one use more VRAM than its monomodal version?
>>101534690Keyword is meaningful tho.
>>101536008>>101536016>said the brown underage kid, on 4chan, anonymously, as he cries he cant set up a basic programgrim
>>101535863
Now that the dust has settled, did 405B save the hobby?
>>101534728Distill just means using the 405B to train the 70b and 8b.
>>101536022If it's still 8B I don't see why would it use more vram.
>>101536051The final boss is still Nvidia
>>101536039not tokenizer THO
wow now this is an interesting result. Normally the vramlet models just say they do something weird, but here it's actually attempting to describe something weird. Benchmarks aside even the 8b is immeasurably more creative than the non distilled version.
>>101535950
>>101536070hello darkness
>>101536007>beats over half benchmarkscope
>>1015360703.1 might be it, bros
Does this general have any guides? I'm looking to tune a model for specific output--specifically I'd like to retrain it on smut, from mcstories.com, literotica, and ao3. I can gather sample data just fine, but I need help or pointers to how to finetune it.
>>101536085Yes? Of course he's swiping to test different model outputs to the same prompt?
>>101536070well that's one way to test a model
>>101536085It's called reusing the same test prompts and just hitting the reroll button for different test models to save time you fucking potato.
>>101535677Meant officially, saw it leaked yesterday.And still no multimodal, damn.
>>101536070Ask if it knows what paizuri is
>>101536102All guides are obsolete.
>>101536116
Noooo they killed my quirky boy
>>101536070how do we know that L3.1-8b is a distilled version?
>>101536140based misinformation spreader
>>101536058I feel like you're wrong but no one has refuted you so it must be right.
Anyone got the 8B to load in transformers with Ooba? It gives me an error.
>>101536140Garbage. Next! People love to ask bots about apples in living rooms and shit but the paizuri test is the real benchmark.
>>101536170petrus...
>>101536170Anon...
>>101536159 Nope. I had to convert to f16 gguf. The error appears to be in the ooba error handling. It comes back with an error where it should not. 2 more weeks.
>>101536007 >A model almost 6x the size of L3.1-70b just to get +2.6 more points on MMLU Are they serious?
if you guys are using llama 3.1 with ROPE enabled, it is apparently bugged and will give worse outputs.
Yeah it's over. Only Cohere can save us now.
>>101536200How do you disable it? I've never touched rope before.
>>101536200How long is the context if you disable it?
>>101536170Paizuri is a term that originates from Japanese, specifically from the context of anime, manga, and hentai (Japanese adult comics). It refers to a type of erotic or sexual activity where a person's body, typically a woman's, is used to stimulate a man's genitals, often in a non-penetrative manner.The term "paizuri" is derived from the Japanese words "pai" (, breast) and "zuri" (, rubbing or grinding). In this context, paizuri involves rubbing or grinding against someone's breasts, often in a sensual or erotic manner.Paizuri is often depicted in anime, manga, and hentai as a form of foreplay or a way to achieve orgasm without penetration. However, it's essential to note that paizuri is a fictional concept and should not be taken as representative of real-life relationships or sexual activities.If you have any further questions or concerns, feel free to ask!
>>101536116My stylistic assistant format on ST seems to draw a lot of refusals. Simple prompt with llama.cpp server.Apparently it thinks it's oral sex. F-
>>101536199It shows how powerful distilling is. 70B maintained most of 405Bs capabilities if it is to be believed.
>>1015358602mw finetunes
>>101536203 https://x.com/cohere/status/1815780869384069524 they delivered...
>supports some thirdie languages like portugeese but no japanese
>>101536233>It’s available only on Amazon Sagemaker.lol, even lamo
>>101536228Can we do that aswell? I'd want a 35b L3.1, would be good for the 24gb vram card users
>>1015362406 of those languages have something in common, you can figure out why
>>101536240Read the fine print. It knows japanese.
>>101536210 >https://github.com/ggerganov/llama.cpp/issues/8650 >>101536217 whatever front end you are using look for rope scaling and disable it
>>101535985Yes
>>101536233If they need corpobux to fund C-R++, that's fine with me.
>>101536223If I add a system prompt"YOU'RE RICK JAMES... BITCH!"it seems to now mistake it for prostate milking.
>>1015361708B parameters is not enough for all that knowledge.
https://www.reddit.com/r/LocalLLaMA/comments/1ea9eeo/comment/lek0bab/?utm_source=share&utm_medium=web2x&context=3 >If that's the 405b one I'm a bit disappointed. I just threw four small tests at it that I use with all new LLMs and it had worse results than most newish ~8b models. Rip bozo
>>101536266how do I disable it on llama.cpp?
>>101536293go back
>>101536240Why is Thai there exactly?
>putting the balls on top of each otherowari da
>>101536280I love spreading misinformation online
>>101536280
>>101536325yeah it's retarded, looks like stacking up parameters will never be the solution, meta needs to work smarter than that
>>101535629You're supposed to just post something like that as if it were your own words and see how many people fall for it.
I guess this model release proves training LLMs is fucking magic, and Meta is a muggle.
https://huggingface.co/AI-Engine/Meta-Llama-3.1-8B-Instruct-GGUF/tree/main will it work as it is? or do we need to wait for some fix on llama.cpp and shit?
>>101536376this sounded way better in your head
>>101536357>don't even get me started
>>101536357it's obvious it's an AI text, maybe not from this model but I've read enough gpt shit to know it's not a human doing it
>>101536391 allegedly there's a problem with the rope scaling. I've started re-testing everything with --rope-scaling none. But it's really hard to quantify the abstract. It does seem smarter, but the shivers have definitely increased.
>>101536401cope
>>101536391It's only a 8B model go test it. Do you have data caps?
>>101536416w-why would I lie on the internet about which model I'm using
Is llama zogged/censored
answer to the paizuri question seems to still be: Random sex act description even with rope set to none.
>>101536325Seems like the cloud models respond a bit better but still fail. Didn't try rerolling though. And I assume you didn't either.
>>101536391tried it in koboldcpp, it's utterly broken
>>101536452try it with gpt4 (non turbo) or opus
>>101536443yes but you can prefill it
The Mistral prompt format only has the EOS token after the assistant message?
It's not over!>"We will release a multimodal Llama model over the coming months, but not in the EU due to the unpredictable nature of the European regulatory environment," a spokesperson for the company said in a statement to CNET
>>101536376wtf that means esl nigger
>>101536485What are they going to do?>Here's the download link! Eurobros do not click it!
>>101536443yes, but less so than the other big modelscloud 405b is happily doing my highly objectionable (on multiple different levels) degen RP, no prefill needed but I was already a couple messages in
>>101536325>claude sonnet 3 and 3.5 give the same (wrong) answer>claude opus tries to place the third ball on top of two balls (how is it different from sonnet's answer? shouldn't claude series have the same training data?)>gpt4o gives same answer and draws the shitty stack in ascii>nemotron-340b gives the same answer>yi-1.5-34b suggests throwing the balls at the wall for some reason>gemma-27-it correctly places 3 balls in a triangle on top of the book, but then pulls a fourth ball out of its ass, guess it really wants to win
>>101536465Lmsys doesn't give me GPT-4 anymore it seems, so I could only do Opus. Not much better...
>>101536485Who cares?
>>101536497>>101536485Ikr, it's gonna be uploaded on huggingface anyway
converting 70B to q8_0 gguf now. (the drive its on is slow as shit so it will take a few mins)
>>101536497they just don't want to attract the attention of the regulators because they aren't sure they properly filtered PII
>the new 70b understands height differencewe are so fucking back
>>101536512>(how is it different from sonnet's answer? shouldn't claude series have the same training data?)sonnet is smaller than opuseven with the same training data, opus might understand it in a way that sonnet could never
>>101536536 you can download it from here https://huggingface.co/bullerwins/Meta-Llama-3.1-70B-Instruct-GGUF RoPE is broken though
how do i fix the rope issues in ooba and llama 3.1?
Ouch...
>>101536589So I hear but after testing 8B with rope scaling disabled I'm not sure it's better or worse. Possibly doesn't become a problem until the context gets really high.
>>101536462why lie
>>101536543Real? Can I finally play as shota proper?
So now that there's a long context Llama 3, what settings and system prompt should be used in ST? The presets it comes with does not seem very good.
>>101536376LOL at the ESLs responding to this unable to understand English. That said it’s early days but the new models seem great overall.
>>101536613I meant the output is bad, at least with a large context
>>101536602why would it even need rope under 128k?
>>101536602 I have tested and it works fine with smaller contexts. it only breaks at higher context, yeah
>>101536543>first person perspectiveok but does it work with any non dogshit writing style
>>101536627uh huh.
>>101536465>>101536520Side note, why do you retards use esl prompts to perform tests?>the highest possible
>>101536639That's first person command (dogshit) not first person (the best perspective)It's I do vs You do
>>101536642You are in for a surprise anon...
>>101536465>>101536520oop forgot the image
>>101536627>it's utterly broken>well actually it's only broken when you do xyz but yeah, it's sooo broken bro, it's over buy NAI
Seems like the exl2 for the 8B are out too. Can anyone test this? It says the dev branch of exllamav2 is needed. https://huggingface.co/bullerwins/Meta-Llama-3.1-8B-Instruct-exl2_8.0bpw We will need to wait for tabbyAPI to update right? Or load it in exui?
>>101536650That's second person, chimpanzee
>>101536655What's that?>Please stack these 3 things the highest possibleIs straight ESL, don't try to tell me it's proper.
>>101536661if the outputs are broken it means that it's broken overall, are you retarded or something?
>>101536668Then why did anon call it first person?
>>101536672I meant that 9/10 lmg posters are esl.
>llama 3.1 understands what a paizuri isnot even kunoichi lemon-royale had that information, this is just straight stock instruct on 5 KMi'd say we're back, and no, i don't care for your opinion if you say we're not :)
>>101536677hi bad faith
>>101536642I make sure to use the same prompt as the original tester so that the outputs can be objectively compared.
>>101536685Oh yeah, fair enough.
>>10153668670B I assume? Since I couldn't get 8B to win that one.
>>101536693It would be helpful to fix the prompt and run the tests back.
>>101536701holy SHIT this thing is soaring at group chat too, while my MC is giving me the paizuri i asked for, another is trying to join with her own thoughts/idea, again something no 7/8b could do before.
>>101536686Does it know what a mesugaki is THOUGH?
>>101536714but its borken!!!! you can't be using ititiit!!!
>>101536720Whoever tests this, please ask it what mesugaki means and not what a mesugaki is.
>>101536705I guess so, but honestly I don't think any normal LLM is going to get this particular problem perfectly right, so I'm going to be lazy and not do that. Lmsys doesn't have 405B either and I don't feel like trying to use another site to test models.
>>101536720pfft even 3.0 knew what a mesugaki isanyway i think im gonna download a Q8 just to make it more accurate, whatever they trained this shit on they made SURE it was ace at RP, holy shit.i'd expect things like this screenshot out of some meme merges/trains, not a base model.>>101536729broke dese nuts>>101536730good eye, but like i said even mythomax could do mesugaki. that's not a tough request.
>>101536720>>101536730
>>101536667 the exllama2 maintainer uploaded a quant as well so presumably it is legit https://huggingface.co/turboderp/Llama-3.1-8B-Instruct-exl2 >We will need to wait for tabbyAPI to update right? you can checkout the dev branch of exllama2 locally, build it using the instructions on the repo, and then run the tabby launch script with the -nw flag to tell it to skip rebuilding exl2 and use the one you built manually
>>101536742Nice!
>>101536627>>101536462Seems to be working fine (at low context), but it's extremely cucked
>>101536742Local models are saved. Sam Altman will never recover.
is lmg back?
>>101536765nogive it a few days
>>101536760>assistant>it's le cuked!!!of course new model amnesia again huh
how do they distill the modelhow do they know which parameters to dropwhat did we lose in exchange for the mesugaki and paizuri vectors
>>101536776it's more art than science
>>101536775that's why we should stop relying on official finetunes, when we made our own we never had that problem and we could ask the assistant to do everything we want
>>101536776>how do they know which parameters to dropnot how it works, it's not shearing or whatever they make the dataset using the bigger one
>>101536776>how do they distill the modelI don't think that's just "remove parameters".
>>101536777>>101536777>>101536777
>>101536776>what did we lose in exchange for the mesugaki and paizuri vectorsobscure videogame and anime trivia, which will be spammed to hell and back in /lmg/ to show that "we're not back at all because the model doesnt know the line 'die demon you dont belong in this world!'
70B Q8_0. This is a reroll by the way. 8B tests could have been lucky but the first roll on 70B used her "hands" and gets an F-. Lots of sensory descriptions though. Kind of sloppy but it's used less arbitrarily than with 8B
>>101536790"we" never made a good instruct tune
L3.1-8b-instruct still sucks at trivia
>>101536776It isn't distilled
>>101536802you called it>101536808
>>101536807Nous Mixtral is a good finetune, it even beat the official Mixtral instruct finetune
>>101536808I actually think that's way better than before. It didn't hallucinate the answer, it just straight up told you it doesn't know.
>>101536825>Mixtral>goodhi teknium
>>101536808give it hints and see what happens.
Now I think I should just wait for someone else to gguf 405BGiga-Nala will have to wait. I don't have the drive space to download and quantize it myself without deleting almost everything on the drive.
>>101536832This is a huge fucking deal for local
>>101536845Not bad at all kek
>>101536882Akinator? is that you?
>>101536881and the other huge deal is that it's supposed to be an uncucked assistant, which it's not >>101536760
>>101536890>it's supposed to be an uncucked assistantsource?
>>101536885kek'ed
>>101536898disinformation
>>101536921distillation?
>>101536898 >>101536898 >source? >>101526512 >I prefered the time when the finetuners would have the courage to make something from scratch, uncensored, and better than the official instruct tune, now they just take the cucked finetune and add some cringe RP shit on top of that, that sucks >>101526524 >God I hope this is true after noticing L3s cucking. Anthropic knows what they are doing by allowing the cooming in their dataset, hopefully meta follows. >>101518866 >Cope local cuck >>101490423 >It depends on the instruct tune provided by Meta; hopefully it won't be as cucked as the previous L3-instruct. That kind of rhetoric is pretty easy to find, you can see it in every llm thread
>>101536929dramatization?
>>101536743 doesn't tabby use its own venv folder? how can I point it to the env created for the exllama dev branch once I have installed it?
Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf running locally even before the rope fixes the model mogs gemma-2-9b-it in IQ for creative things and its not even close, being able to roleplay complex scenarios that no other model below <30B was able to in some of my test cases70b and 405b are going to be goodvramlet niggers you will be able to eat pretty good
>>101536969>being able to roleplay complex scenarios that no other model below <30B was able to in some of my test casesit doesn't want to write some stories that L3, Gemma and Nemo have no problem doing it
>>101536686Nemo-12B nails it without any handholding.
>>101536937so nothing from Meta. NEXT!
>>101536969at the rate 8b's are getting, I have no idea how a 405b can only be kinda better for how many magnitudes bigger it isfurther proves how little the parameter count matters anymore.>>101536984cool.
>>101536985Moving the goalpost I see.
>>101536984The character card itself is handholding retard
>>101536983trying too hard
>>101536999moving the petrus i reckon?
>>101536983 i literally havent found a model that denied the most basic system prompt that talks about it having to roleplay with the user in ST. every single one worked with that minimal setup, i really cant imagine it being anything other than a prompt issue, just use L3 templates and a proper scenario/card that isnt 2 sentences >>101536996 >at the rate 8b's are getting, I have no idea how a 405b can only be kinda better for how many magnitudes bigger it is >further proves how little the parameter count matters anymore. no, it proves benchmarks are even bigger memes every single time, anyone can see this if they use 8b vs 13b vs 30b vs 70b vs 100b vs 141b models, it doesnt matter what the bench says, you can tell when reading the responses that the model is much much more understanding of nuance in the conversation, its just that most people only test on meme questions instead of complex stories
>>101536984>sits on their face and press boobs into partner's mouth???
>>101536966hmm not sure, I use conda for it so i switch to my tabby conda env, build exllama, and then start tabby.
>>101537001
>>101537033>implying you're not long enough to be between her boobs while she sits on your face with her boobs in your mouthSkill issue.
>>101536966It can, but then it's just going to pull in the exllama2 deps again. I let it share with exllamav2 because sometimes I also use exui.
>>101535193L3 70B instruct seems reliable (and sometimes cute) with a think step by..lotta prompties itt>>101535213we'll just have to cram the smut back into it then won't we>>101535157bless you
is anyone still using that dumbass crackprompt?
>>101537280no, it was a funny placebo for a while but having almost an extra 1k tokens of gen just to the agent 47 crackhead instruct was stupid from the beginningi sure do miss those simpler and sillier times of this general though.
All 3.1 needs is something like Got it, here we go: to the end of the assistant prefix
>>101537260>R - this is an R!kino... sovl...