/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101600938 & >>101589136

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101600938

--Papers: >>101605355
--Mistral Nemo's context issues and potential solutions: >>101602310 >>101602329 >>101602641 >>101602828 >>101602980 >>101603030 >>101603070 >>101603092 >>101603185 >>101603227 >>101603364
--Llama3 quantization type and precision: >>101604291 >>101604353 >>101605347 >>101605562 >>101605576 >>101605722 >>101605651 >>101605701 >>101605786 >>101606078 >>101606448 >>101607040 >>101607086 >>101607352 >>101607411
--Running Mistral Large 2 locally with 3090 and 64GB RAM: >>101603131 >>101603139 >>101603231 >>101605203 >>101605384
--Good models that fit in 8GB VRAM: >>101607206 >>101607287
--Can GPT-like architectures ever match human intelligence?: >>101605727 >>101605795 >>101605797 >>101605905 >>101606349 >>101606395 >>101607240
--New PyTorch project for e2e quantization and inference: >>101601709
--LLMs' behavior when challenged and the importance of context: >>101605528 >>101605632 >>101606226 >>101606358 >>101606475 >>101606732
--Anon suggests a mixture of 70 billion 1-param experts: >>101602434 >>101602444 >>101602456
--Prompt passing as tokens in Ollama: >>101601982 >>101601998 >>101602051 >>101605347
--Nemo-instruct generates dragons for D&D Lorebook: >>101603305
--L40 and Ada 6000 GPU differences: >>101605613 >>101605650
--NeMoria-21b Nemo self-merge model: >>101603761
--MoE dead or not, Mistral legacy models, and upcoming updated model: >>101602134 >>101602171 >>101602173 >>101602232 >>101602252
--Clarification on model size and hardware requirements: >>101601931 >>101605285
--Best model for 6 GB VRAM GPU: >>101601201 >>101601239 >>101601330 >>101601374 >>101601251
--Anon seeks LLM recommendations for their 3090 GPU: >>101604297 >>101604426 >>101605121 >>101605245 >>101605257
--AI model responds with function definition instead of invocation: >>101604707
--Miku (free space): >>101601474 >>101601626 >>101604421

►Recent Highlight Posts from the Previous Thread: >>101601504
Bitnet
vramlets?
llama 4 wen???
>>101607819
yes?
jepa jamba bitnet when?
Is the current meta for stacking 3090s a ROMED8-2T?
>>101607819
Vramlets (people with less than 50 H100s)
why would claude care which of the suggested solutions worked in the end? It's not like I'm posting on some discussion board for others to see it
Does your refusal to use proprietary models like Claude lie in privacy concerns, or do you just not see a reason to pay because local stuff is good enough?
Would you pay for proprietary stuff if there was an option to pay in crypto (like Monero)?
>>101607886
because you have a fundamental misunderstanding about how llms work
>>101607858
If you have the space, then X98-8PLUS-V1.0
>>101607953
privacy. I bought a gpu for AI and nothing else
>>101607953
i just like the idea of running an ai on my own hardware, it feels nice.
>>101607953
More like I already have experience with services that started good just to get progressively worse, and then started banning people when they didn't like what they were doing.
>>101607953
>>101607953
It's not even the payment, it's about the data. If they don't use my data at all (apart from running inference, ofc) and store it encrypted on their servers, I'll pay up, but as far as I know only NAI does that rn, and their model is a bit outdated nowadays
>half of the thread says nemo is great
>half says it's shit
How do I get redpilled into joining the former? Even fucking Stheno worked much better for me.
>>101608087
You can't get redpilled on taste. You either like it, or you don't.
How does the new Llama compare to the corporate models now?
>>101608087
For the 1000th time: it all depends on the settings anons use and their card; if their prompt is shit, then no matter the model, the output will be shit. Nemo is likely the best model for vramlets right now. The only issue the model has is that the effective usable context is much lower than marketed.
>>101607819
All of us became vramlets after 405B dropped.
>>101608087
I suspect it depends on how vanilla your roleplays are in terms of format.
>>101608122
>if their prompt is shit, then no matter the model, output will be shit
Well, as I've alluded to in my post, shit like Stheno worked fine for me.
>>101608158
I guess it may be the case because I do weird shit and not really the "anon fucks 1girl" type of thing. But even when I attempted that for a test, it kept being extremely hesitant, with characters going "no this is wrong i must refuse" until explicitly told otherwise. Oh, and one of them got randomly shot at one point.
>>101608176
>Oh and one of them got randomly shot at one point.
Ah, the AI Dungeon memories came flooding right back... Anyway, not any one of them. If you are using Stheno, then Niitama might work for you.
>>101607953
if i could have accountless access to models (only a random user token) that you fill by paying with monero, and that you could access over tor / i2p, i'd use the service; otherwise it's gonna be local for me. i don't even run completions that are that weird, it's just not anyone's business.
>>101608139
seriously, some hardware maker should get their shit together and make accelerators with a TB of vram, i'd pay $$$ for it.
>>101608122this is the worst kind of anon, believes in his magic sampler settings and telling his model to be creative, probably doesn't catch all of the stupid things his ai outputs, "prompt format is very important," "post logs"
Jart won.
oobabooga add Mistral-Large-Instruct-2407.i1-IQ2_M to your benchmark thanks
>>101608087
The prose is somewhat fresh, but it hallucinates like a motherfucker and has the usual retardation of its param range. I've had better RPs with Lunaris-8B because at least it doesn't make random shit up and forget character details, though it's hindered by LLaMA slop. Granted, I haven't tried long-context scenarios on finetunes of L3.1-8B.
>>101608215
Ach yes, the magical anons who are full of bullshit every single time a new model is released, and who use the same fucking sampler for all their models and complain that the output is shit, are much better. Fuck off, faggot, learn to prompt.
>>101607953
Free + offline is a fair trade-off for local.
>>101608267
FOTM fag has the memory of a goldfish
>>101608285
>free
>you have to pay for hardware, electricity, real estate for your rig
Just pay Altman
>>101608315
the electricity is cheap
hardware costs vary by autism but a single 3090 can be dual purpose
the real reason for local is to not have corpos sniffing at your activity and telling you their insane vision of what's right and wrong
>>101608355
But a single 3090 isn't going to get you far.
>inb4 vramlet screeching
>>101607953
Control. It can't be changed underneath me or taken away.
>>101608355
Also, if it's somebody else's service, they can turn the service off, ban you, change the terms of the deal, etc. Being able to control your own experience is paramount to me.
>>101607953
if it was just cooming i would use it without a care, provided a private payment method, but i query way more than that and it's just way too much identifiable information to send into the cloud in such a tightly coupled manner.
What's better, official large or the lumimaid version?
>>101607953
claude has insane positivity bias and denies everything
>UM USE THIS 3000 TOKEN JAILBREAK THAT MAKES THE OUTPUTS WORSE THEN IT WONT.. OOPS THEY PATCHED IT UHHH TRY THIS ONE INSTEAD
no
>>101607953
I'm still using GPT and Claude for coding, but now that free models are finally good, is there a relatively cheap and privacy-friendly alternative for 405b or large?
>>101608507
My Claude prefill is 3 words and it refuses nothing at all.
>>101608511
>is there a relatively cheap and privacy-friendly alternative for 405b or large?
the smaller llama 3.1 models?
>>101607886
>It's not like I'm posting on some discussion board for others to see it
The AI was trained on discussion board material, so it's aping that behavior. An LLM has no ego. It's a Chinese Room that reads the document and adds to it according to the documents it has studied. If you create a document that reads like a discussion board, it will append to it to make it read more like a discussion board.
>>101608547
>It's a Chinese Room
Prove you are not one as well.
>>101608592
Bite me.
>>101608562
I don't like chinese.
>>101608562
I am not Chinese
>>101608562
I'm a native speaker
>>101608562
I am not a room
So I fell asleep while my PC generated 2000+ Hatsune Mikus overnight, and I woke up to my PC's fans running at 100%; it was overheating and I had to shut it down before it melted down.
>>101608762
And 20 of the gens are usable.
>>101608562
didn't they do that mediocre amnesia sequel
>>101608762
There's so much Hatsune Miku that was made overnight
>>101608791
Your devotion to the Miku is admirable.
What's a good writing model for a 3080 12GB card? I want to try some creative writing.
>>101608851
Claude, and the 3080 is overkill for running SillyTavern.
>>101608791
What a waste. They all look the same.
>>101608851
Nemo does pretty well if you add a couple of snippets of text to its context for it to use as inspiration. At least at 32k context; I don't know if it loses the plot with a bigger context window.
>>101608851
You could give magnum mini a try
Not the TOP TIER HIGH END 1T CLOUD POGCHAMP MODEL WITH EXTRA ONIONS, but it's pretty damn good for its size (at least in terms of prose quality), plus it's rather fast, so retries aren't as bad
>>101608883
Well, yeah. I mean, that anon didn't change the prompt, and it looks like it's using that brother-sister incest game's lora, which funneled it even more. Just imagine how much better they would have been with a randomized prompt...
>>101607705
>►News
>(07/27)
>(07/26)
>(07/25)
>(07/24)
>(07/23)
>(07/22)
What do you think we'll get today?
>>101609040
BitNet
>>101608791
so much mental illness was made overnight
>>101609040
C'mon, Cohere, do something. I want my hard-earned handout.
>>101608562
Ching chong ping pong China will grow larger
Am I getting paranoid, or do I actually see gamemakers using ERP chatbots to write text for them? The slop is all over the place in the dialogue; it's hard not to notice. Or do people actually write like that unironically in the first place, and it's the AI that mimics them too much? I don't even know anymore. I just find it ironic how creatards are all against AI but resort to using it thinking nobody will notice.
>>101609165
>no shivers down the spine
Nah, a human wrote this. Just an untalented one.
>>101609156
Imagine a new Cohere model in the 30-70B range with 128k context and a non-shitty KV cache; I'd be cumming buckets
>>101609181
You've clearly never read a book in your life then. You fucking illiterate retard.
>>101609181
>>101609181
That's clearly AI, and I'm thinking it's Claude
>>101609228
Nevermind then. It's AI.
>>101609227
None of the books I have read contain the stock phrase "shivers down your spine". Only a machine could write something so soulless.
The same way DRY parallels Rep Pen, one could create an n-gram based analogue to Logit Bias, right? That would be pretty cool.
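Something like that could work, at least conceptually. A minimal sketch of the idea (everything here is hypothetical — no existing backend exposes this, and the token ids and penalty table are made up for illustration):

```python
def ngram_logit_bias(logits, recent_tokens, penalties, n=3):
    """Hypothetical n-gram analogue to Logit Bias: if the last n-1 generated
    tokens match the prefix of a penalized n-gram, bias that n-gram's final
    token instead of biasing it unconditionally everywhere."""
    prefix = tuple(recent_tokens[-(n - 1):])
    biased = dict(logits)  # token_id -> logit
    for ngram, bias in penalties.items():
        if tuple(ngram[:-1]) == prefix:
            biased[ngram[-1]] = biased.get(ngram[-1], 0.0) + bias
    return biased

# Example: penalize the 3-gram (5, 6, 7), so token 7 only gets suppressed
# when the context already ends with tokens 5, 6
logits = {7: 2.0, 8: 1.0}
out = ngram_logit_bias(logits, recent_tokens=[1, 5, 6],
                       penalties={(5, 6, 7): -10.0}, n=3)
```

The difference from plain Logit Bias is that the bias only fires when the preceding context matches, the same way DRY only penalizes continuations of sequences that have already occurred.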
>>101609241
I do wonder who taught the machines all that, though. This is Crisis Point: Extraction, btw. Go say hi to anon42 for using AI; I'm sure his fellow artists would be amused to learn about that.
Where can I download kyutai Moshi's weights? It was a mistake to trust the French
>>101609269
No, I don't think I will.
So what's the current meta on using example dialogue? Seems like a lot of the new character cards don't bother having them. I'm on a 70B btw.
Has anyone tried Undi's Largestral Lumimaid? I found it to be slightly brain-damaged and too horny. Undster, I appreciate your effort in training new models, I really do, but have you tried to train in a way that's a bit less damaging to intelligence, or is coom the #1 priority for you? No hate, just asking.
Just starting out with DRY, what's the meta for its settings?
>>101609337
only useful for forcing people to use your personal brand of autistic formatting, with bold for speech, double quotes for internal thoughts, and code blocks for actions
>>101609337
It's completely optional. Can either improve or ruin a card. Some people throw slop straight from gpt 3.5 in there and then you wonder where the shivers came from. Always check it.
>>101609349
Setting base multiplier to 0 and using rep pen instead, now fuck off.
Got pointed here for help. I have ST set up and got recommended to use Mistral Nemo. How do I download this shit lol https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/tree/main
Got a 4090 GPU for reference
>>101609347
Why are you saying this as if this is something easy and straightforward?
>>101609385
>I have ST setup
Do you have something to run models with? If not, download koboldcpp and the Q8 gguf from
>https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/tree/main
Then connect Silly to koboldcpp.
>>101609357
>>101609380
Thanks lads.
>>101609387
>Why are you saying this as if this is something easy and straightforward?
I am not. I'm just asking if there was any effort, or whether Undi's tunes are just for cooming, which is completely okay.
>>101609435
What quant did you use?
>>101609456
Q6_K, temp 1, minp 0.05
>>101609347
>>101609403
>s
Cheers for explaining in clear english lol. Do I fuck with these settings?
>>101609466
Try playing with your sampler settings; we only tried it unquantized during our tests. Also check if the gguf was made correctly. I was using a temp below one, try 0.7 maybe?
To reply to your question: the ratio of SFW/NSFW data got smaller and smaller on the NSFW side, so it should be less horny.
>>101609492
Not him, but yeah, here's the explanation.
>>101609492
Enable FlashAttention and increase the context size. Mistral Nemo works with 128k, but I'm not sure if that can fit into your VRAM
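You can ballpark the KV cache cost yourself. Back-of-envelope sketch; the Nemo shape numbers (40 layers, 8 KV heads via GQA, head dim 128) are what I believe the config says, so double-check them against the model card:

```python
def kv_cache_bytes(ctx, n_layers, n_kv_heads, head_dim, bytes_per_val=2):
    # K and V each store n_kv_heads * head_dim values per layer per token;
    # bytes_per_val=2 is an fp16 cache, 1 would approximate a Q8 cache
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_val * ctx

# Assumed Mistral Nemo shape: 40 layers, 8 KV heads (GQA), head dim 128
gb = kv_cache_bytes(131072, 40, 8, 128) / 1024**3  # fp16 cache at 128k
```

That works out to roughly 20 GB for an fp16 cache at the full 128k, before you even count the model weights — which is why a 24GB card probably can't do full context without cache quantization or a smaller window.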
My PC is too shit to do anything meaningful locally, so I'll just do it all on runpod and larp as one of you. I'm guessing the obedience and "positive bias" all the public models have is just a side effect of their safety policy, and if I run this shit locally it won't be like that? I need pushback when I say/ask something wrong or stupid
>>101609492
Yes. That context size is essentially how much of the chat your model can remember, so crank that higher. With a 4090 you can probably go all the way to 128k, but for now do 32k context and see how that works for you. Also, make sure that Flash Attention is on and that all layers of the model are offloaded to your vram (in the hardware tab, I think).
>>101609499
>I'm guessing the obedience and "positive bias" all the public models have is just a side effect from their safety policy and if I run this shit locally it won't be like that?
nah, it will; it's baked into the models too
>>101609498
>FlashAttention
That's new, right? Guess I need to update. What does it do?
fuck, I just realized I became worse than Son Gohan. Loved him as a child, gave my best and became good at (now useless) stuff, was really disappointed in Son Gohan giving up on becoming stronger, and nowadays I'm useless and my brain is rotten. Does anyone know where I can find that clip where Son Goku tells him that you become stronger out of need? Can't find it.
>>101609523
>https://github.com/ggerganov/llama.cpp/pull/778
>https://github.com/LostRuins/koboldcpp/wiki#flash-attention
>>101609499
Instruct models are all sycophants, even without "safety"; it's probably inherent to the whole concept, since they're tuned to obey you. I agree it's very annoying. The only way to avoid it is to use base models with few-shot prompting, but they're schizo. Even with instruct models, few-shot (aka populating the context with examples of the style you want) can help.
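Populating the context just means concatenating a few example exchanges before the real one. Trivial sketch (the format and names are arbitrary — adapt them to whatever prompt template your model actually expects):

```python
def build_few_shot(system, examples, user_msg):
    """Prepend style examples to the context before the real message,
    so the model continues in the demonstrated register."""
    parts = [system]
    for user, reply in examples:
        parts.append(f"User: {user}\nChar: {reply}")
    parts.append(f"User: {user_msg}\nChar:")
    return "\n\n".join(parts)

# Two throwaway example exchanges establish the casual style
prompt = build_few_shot(
    "Continue the chat in character.",
    [("hey", "*waves* yo. what's up?"),
     ("long day?", "ugh, don't ask. *flops onto the couch*")],
    "want to grab food?",
)
```

The prompt ends on the character tag so the model's completion is forced into the next in-character reply.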
>>101609537
>--flashattention can be used to enable flash attention when running with CUDA/CuBLAS, which can be faster and more memory efficient.
No downsides?
>>101609575
Shouldn't have any, no. And you can enable cache quantization with it too, which does come with some level of degradation, but at Q8 it should be negligible.
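For intuition on why Q8 is near-lossless: symmetric quantization rounds each value to one of ~255 levels instead of ~15, so the round-trip error shrinks accordingly. A toy numpy sketch (not the actual llama.cpp kernel, which works per-block, just the principle):

```python
import numpy as np

def quantize_roundtrip(x, bits):
    """Symmetric per-tensor quantization: scale to the signed int range,
    round, then dequantize. Returns the reconstructed values."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)  # stand-in for KV values
err8 = np.abs(quantize_roundtrip(x, 8) - x).max()
err4 = np.abs(quantize_roundtrip(x, 4) - x).max()
```

On data like this the worst-case 8-bit error is more than an order of magnitude smaller than the 4-bit one, which matches the "negligible at Q8" intuition.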
>hook Mistral-Large into a chat I previously set up with Claude Opus
>it continues it perfectly with that Claude feel to it
This model is raw diamonds. It has some issues getting going on its own, but this should be fixable with some better prompting. The fundamentals are there.
>>101609540
>>101609516
That's disappointing. In what way are the few-shot models schizo? I also don't see how few-shot prompting would be very useful for my needs anyway, except maybe for cooming.
>>101609591
largestral is worth the very slow t/s values; can't bring myself to retry 20 times in a row with Nemo
>>101609337
>>101609380
Without ED the model's own speech style will overtake your character, and since so many models are hellbent on narration and purple prose, it will make your card unsuitable for chatting. I did the research; here's a compilation of the same card responding to the same questions but using different ED sizes (or none) and various models. Temp=0 to keep the random away. https://docs.google.com/spreadsheets/d/1BsGgRCzluqsZdc7pShCgNVSrv3KtRgzJohyg1rTX5Fc/edit?usp=sharing
>>101609643
you didn't have to give poor undi third degree burns jesus christ
>>101609584
isn't Q8 cache worse than Q4? or is that only for exl2?
>>101609584
Thanks man.
>>101609711
Only for exl2, due to the difference in algorithms.
>>101609713
yw
>>101607953
>sign up to openai for gpt 4 on launch
>it gets dumber almost every month
>they release some new features which kinda help
>continues to get dumber
>they talk about how GPT4 is the dumbest AI will ever be
>it gets dumber
>the new models they release are even dumber
the projects I used to do with GPT4 aren't viable anymore, it's too retarded. I would rather have something that runs locally and doesn't get unpredictably nerfed in the name of efficiency, with those nerfs then justified by users with some useless benchmarks/polls. I'd pay $100 a month for the original GPT4, probably a lot more
>>101609694
Well shit, I guess that's why all my cards go on long fucking descriptive rants even though I literally put "Focus on dialogue over descriptions" and "be concise and factual" in the sysprompt. Back to writing ED then. Thanks, Anon.
>>101609754
>>101609694
That being said, you tested that on an 11B; do you think the same applies to a 70B?
>>101609754
The popular jailbreaks might be suspects too; they usually go all "be verbose and use floral speech when describing blah-blah". But it's hard to find the balance between one-word replies and going full ficbook. The document I linked contains the 70B tests too, but that one's a cloud model, so I'm not so sure about their setup under the hood.
>>101609773
I mean, it should have an easier time sticking to your characters' personalities with ED, so if you notice the model deviating from what you intended, give it a shot
>>101609498
It does not work with 128k, not for roleplay, and you waste resources if you push it that far. You can see it getting stupid around 16k. Seriously, anons, no advice is better than bad advice.
>>101609509
How many layers would I want on a 4090?
What I do wonder is: how much ED is too much? Some guides tell you that a large ED pushes the actual definitions too far back, so the AI ignores them. But how else can I make the bot stay true to a character's own personality if not by letting it figure it out on its own from her speech? I mean, if I make a card for a manga/anime/novel character, I have a huge corpus of their lines at my disposal. Should I just include everything in ED?
>>101609711
The EXL2 numbers that supposedly show that 4 bit cache is better than 8 bit cache had a comparatively small sample size. There was no statistical analysis of the results, but if there had been, I very much doubt that 4 bit cache would beat 8 bit cache with statistical significance; I very much expect this to just be random chance.
>>101609818
That specific model has 40 layers, I think. Regardless of that, the whole model should fit in your VRAM, so put a 999 (tells it to just put everything) in the input field and carry on.
>>101609850
It could be, if the 8 bit quantization was, say, just truncating the values or doing something really stupid instead of scaling or the like, right?
>>101609818
You're overthinking it. When you load the model it will guess at GPU layers. That's probably fine unless you raise the context (which you probably want to if you're doing anything other than one-shot Q&A kind of stuff). When you run it, one of the following will happen:
1. It works. GPU layers isn't too high, but you can try higher.
2. It throws a memory error into the console after you wait a while for the model to load. Too many GPU layers. Write down what you used (you can scroll up and fish it out of the console dump if you've forgotten) and try a little lower.
3. It goes to the WebUI okay but blows up when you submit a prompt. Remove one GPU layer and try again.
If you're using VRAM for things like video streaming or a game, then you have less VRAM free and might need to reduce layers. But mostly it's just trial and error until you have a little post-it note with your models and how many layers your system can support. And that infographic above says that sometimes fewer layers go faster, so you can test even more if you're autistic.
>>101609856
>Regardless of that, the whole model should fit in your VRAM, so put a 999 (tells it to just put everything) in the input field and carry on.
And this. If your model fully fits, max out and be happy. Being picky about layers is what you do when you're like me, running 50GB filecached in 64GB system RAM, where where I put the context determines how many layers will run or crash.
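That trial-and-error is basically a descending search over layer counts. A toy sketch of the loop (`try_load` is a made-up stand-in for actually launching the backend with n GPU layers and catching the OOM, not any real API):

```python
def find_max_layers(try_load, total_layers):
    """Walk down from the full layer count until the model loads.
    try_load(n) returns False when n GPU layers runs out of memory."""
    for n in range(total_layers, -1, -1):
        if try_load(n):
            return n
    return 0  # nothing fits on the GPU; run fully on CPU

# Pretend the card only has room for 33 of 40 layers
best = find_max_layers(lambda n: n <= 33, 40)
```

In practice each "try_load" is a multi-minute model load, which is why people keep the post-it note instead of re-searching every time.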
What the fuck is that
>>101609856
I did not look at the implementation. It is possible that there is something wrong with the 8 bit implementation. But IIRC the result for 4 bit was better than even for 16 bit, which is a result that I definitely do not believe without good evidence. So my expectation is that the rounding error for 4 bit just happened to provide better results for the small sample that was used for evaluation.
>>101609896
>But IIRC the result for 4 bit was better than even for 16 bit which are results that I definitely do not believe without good evidence.
Woah, alright, got it.
>>101609883
idk, looks like a low-effort investor scam
>>101609818
To add to what >>101609860 said: if you need more VRAM and your processor has an iGPU, see if you can use that instead, as it'll free up your dedicated GPU's VRAM. Those extra 1-2 GB can make one hell of a difference
>>101609896
Most likely it just means the difference between 4 and 8 is slight enough for random chance to impact the results.
So much effort... I kneel...
>>101609930
hi anonei
>>101607953
There's already an option to pay in crypto via OR. But for most of us it's a combination of: price (free, and even OR options are significantly cheaper for the same level of performance than the proprietary ones, with the exception of maybe GPT-4o-Mini); privacy (since Nick revealed mods look at stories and OAI threw random stories on public taskup, several anons have denounced proprietary entirely); reliability (corpos can and will ban you from using their model if they don't like how you're using it, which is extra fucked up when you realize they all want a monopoly, and in their vision whoever gets banned would be denied any use of AI, period); anti-censorship (in addition to the above banning, corpo models are notoriously pozzed with a severe lack of ways to fix them); and customizability (several models can't be finetuned or LoRA-tuned, and those that can make you pay a big premium to both train AND use them after).
>>101609850
What would you recommend using, then? Would we need to run a benchmark like RULER at high context to know if there is quality degradation between fp16/Q4/Q8 cache?
>>101608241
>i1
No need, it's already trash
SillyTavern guys, do you use this tab, or do you just put the scenario in the description?
>>101609896
I tried comparing the model's behavior with flash attention + cache quantization and without either at temp 0, while trying to keep my responses more or less the same. The model's responses vary too much between the modes, but I can't tell exactly which one's better. But surely one can't have a major speedup without paying some price, and that's usually quality.
>>101609474
I have literally never used any of these people's models for cooming.
>>101610036
I put it in the description.
>>101610053
then you don't belong here
>>101610053
Wait. So you use their models for stuff that is not cooming? Now that is fucked up...
this is the most soul a chatbot has had by default. good job meta :)
>>101610080
Soul of a redditor, maybe.
>>101609968
For the EXL2 results, perplexity was used. This is not a problem in and of itself; the problem is just that the number of input tokens was, I think, 5120. That is in my experience simply not enough input data, and you should ALWAYS do a statistical analysis afterwards to check whether your results are statistically significant. Since the goal of KV cache quantization is to keep the same logits while using less memory, I think the most straightforward metric is the KL divergence. Compared to perplexity this also has the advantage of much better precision at the same number of input tokens. RULER would also work, but any regular LLM benchmark should work as well, since those implicitly also use the context; the interpretation of the results would be different though.
>>101610041
>But surely one can't have a major speedup without paying some price and that's usually the quality.
Agreed, but I still think it's important to objectively measure these things if at all possible. With both performance and precision, little effects add up.
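If anyone wants to roll their own comparison, the definition is short enough to do in numpy on dumped logits. A sketch, assuming you've already dumped full-precision and quantized-cache logits for the same token positions (random arrays stand in for them here):

```python
import numpy as np

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(logits_p, logits_q):
    """KL(P || Q) per token position, P from the full-precision run and
    Q from the quantized-cache run over the same vocab."""
    p, q = softmax(logits_p), softmax(logits_q)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1)

rng = np.random.default_rng(0)
ref = rng.standard_normal((16, 32000))           # stand-in reference logits
kl_same = kl_divergence(ref, ref).mean()         # identical dists -> 0
kl_diff = kl_divergence(ref, ref + 0.1 * rng.standard_normal(ref.shape)).mean()
```

KL is zero only when the two distributions match exactly and grows with any perturbation, which is what makes it more sensitive than perplexity at the same token count.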
>>101610080
distilled reddit brappa
>>101610060
Thanks! Do you do the same thing with the example dialogue?
>>101610080
>...
llamaslop
>>101610080
>talks about /x/
>doesn't even mention succubus summoning
Soulless.
>>101610080
Remember when Sam Altman said that open-source GPT4 would be the end of the world?
>>101610124
Nope. I use the actual example field, since that one has some specific settings you can change depending on the specific model or card. I might put an example of a character's speech in the character's description while using the example dialogue field for example exchanges between user and character.
>>101610064
t. Claude jeet
>>101610077
I just don't use them. They always tune their models on the same shitty claude proxy datasets. They aren't even worthy of merge fodder.
>>101610173
Thanks again man.
>>101609385
Thank you my twin! Couldn't get it running. Will install Kobold now too
>>101610089
Could you provide any resource on how I could run these tests myself? Maybe with a bigger sample
>>101610182
Better than GPT-4 proxy datasets.
what models are people using nowadays that fit in 24GB of VRAM?
cohere? more like conothere. where are they?
>>101610208
Proxy datasets in general are garbage. It made sense with Pygmalion 6B insofar as training it to actually understand an RP prompt, but models have since gotten good to the point that any current-generation retard model can figure out how to use a tavern card.
>>101610229
>>101609403
I'd also suggest you try gemma 2 27b.
>>101609957
One thing I'd add: research. It probably doesn't directly affect many anons, but companies like OpenAI take information from the research community and don't give back. They want people to be uneducated so that they can charge whatever they fucking please, and if they do come across some groundbreaking research that brings AGI to fruition, you can fucking bet they're going to keep that information all to themselves. Back in the day they justified this by saying that it was for our "safety". After that fell through, they try to phrase it like it's their secret Coca-Cola recipe, so of course they can't share it. In reality it's more akin to a lab discovering new properties of electricity and magnetism and releasing technologies using those laws without divulging what said laws are. To put it simply, if you want the technology to grow and people to make new discoveries, you do not want closed-source companies to win.
It's up.
https://huggingface.co/nothingiisreal/L3.1-8B-Celeste-V1.5
A massive upgrade over Stheno.
>>101610327
>>101610080
>...erm
S L O P
>>101610201
https://github.com/ggerganov/llama.cpp/tree/master/examples/perplexity
The llama.cpp llama-perplexity binary has KL divergence calculation, including an estimation of the uncertainty. Though if you don't care about efficiency, it should be fine to just use the definition on Wikipedia with something like NumPy. The basic way the uncertainty is calculated is to assume the values follow a Gaussian distribution, calculate the standard deviation, and then divide the standard deviation by sqrt(sample_size - 1). The uncertainties are also in some cases propagated to approximate uncertainties on other values, see https://en.wikipedia.org/wiki/Propagation_of_uncertainty
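For completeness, that uncertainty estimate is a couple of lines of numpy (same formula as described: standard deviation over sqrt(n - 1); the sample values are made up):

```python
import numpy as np

def mean_with_uncertainty(samples):
    """Mean plus its uncertainty, computed as std / sqrt(n - 1)."""
    samples = np.asarray(samples, dtype=np.float64)
    mean = samples.mean()
    unc = samples.std() / np.sqrt(len(samples) - 1)
    return mean, unc

# e.g. per-chunk KL divergence or perplexity values from repeated runs
a = [1.02, 0.98, 1.01, 0.99, 1.00]
mean, unc = mean_with_uncertainty(a)
```

Two results only differ with any significance when their means are separated by a few times the combined uncertainty, which is exactly the check the small EXL2 sample never got.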
Lots of newfriends lately. I'm glad the insane tranny who was spamming scat porn a few months ago has finally joined the 41%, it was a bad look.
>>101610376
Is there a way to test the exl2 Q4 vs Q8 vs fp16 cache?
>>101610401
If you mean a Python script or similar that already exists, then I don't know, since I have never looked into it.
>>101610382
Looks like the odds were against him.
>>101610382
>>101610420
you could say his actions were quite shitty, i'm glad he picked the high road eventually.
>>101610327
the picture is hot
Why does ooba take 2GB of my RAM even with no model loaded?
>>101610327
>Celeste
Play woke games, win woke prizes.
>>101609498
This thing works really well, holy shit. I feel like it's doubled my t/s
>>101610450
py_toddlers BTFO
>>101610450
>modern frontend development
What's worse is when software meant to squeeze the most out of your hardware is written using Electron or something else Chrome-based, jewing you out of VRAM before you even get to launch a model. Looking at GPT4ALL and Backyard now.
>>101610450
But it's not even running a web browser or any GUI. It's just a web page and an API...
>>101610327
https://huggingface.co/nothingiisreal/L3.1-8B-Celeste-V1.5/discussions/2
tourist here. please spoonfeed me the current best local erp finetunes.
>>101610542Midnight Miqu 70B
>>101610327Holy r*dditslop. Why did you feel the need to post this? Are you mentally handicapped?
is this an error to do with quanting? what the fuck
>>101610542
Or if you're a vramlet, Fimbulvetr 11b was good some time ago; maybe something better has dropped since then though.
>>101610542Mistral Nemo
>>101610587
oh yeah i should at least link where i got it so people know who not to download from
https://huggingface.co/Ransss/mini-magnum-12b-v1.1-Q8_0-GGUF
>>101610587kcpp version?
>>101610623
1.71
>>101610587
>>101610602
it was the model, got it from quantfactory and it's launching fine
https://huggingface.co/QuantFactory/mini-magnum-12b-v1.1-GGUF/tree/main
>>101610327
>reddit writing prompts dataset
I've seen these floating around, but I can't help but feel like it might be actively harmful for the model. The main issue is how short each response is. The "short story" is just a couple of paragraphs that fit within a single reddit comment. This seems like it would bias the model to gloss over things and try to wrap everything up quickly, but I dunno. Haven't actually used the model.
>>101610080
That output is the opposite of sovl
>>101610542
>>101610556
This and Magnum-72B are the best erp sloptunes currently. Best non-sloptune (and overall) is Mistral-123B
AhemAI isn't real*mic drop*
>>101610696Magnum-72B is still stilted, Nemo is better.
>>101610696Midnight Miqu is a random meme merge of L2 models proven to be even more retarded than a 9B model. When are you going to stop shilling this crap, mikufag?
>>101610754Right after we switch to a more fitting mascot of the general rather than some TTS engine.
So...I have Mistral Nemo Instruct. But what text completion preset do I use in ST? I'm getting very short completions.
>>101608122>if their prompt is shitI thought people were using just the simplest prompt nowadays like in the mistral preset.
>>101610797Just use OAI api and don't handle templating in ST. You will lose prefill but if your model is not super censored, it will be fine.
>>101610797Latest ST update has a Mistral-Nemo preset pr sure
>>101609165There's a reason the 'chatbots' say the things they do. It's common in low quality fiction.
>>101609643
>mixing at random mid-tier dishes
Ok Gordon Ramsay. Have you never watched Next Level Chef? Even low-tier ingredients can be turned into something cordon bleu provided they're in the right hands. Gestalt: the whole is greater than the sum of its parts.
Picrel (it's the special ingredient).
>>101609498>not sure if that can fit into your VRAMHow do you know how much space to leave for the context? It doesn't fill up at the start, right? Do you just adjust as you run out of memory?
>>101610835I just wonder whether I would've been noticing these cliche phrases as much if I never used chatbots but instead read as much low quality erotic fiction. Can't unsee, so can't check it.
>>101610861
>It doesn't fill up at the start, right?
It does, though it might use a little more when actually generating
>>101610587
That's a tensor shape issue. You got a bad quant, my friend.
>using any model other than Tenyx-DaybreakStorywriter for any use case.
>>101607953>have to show my cock to the ick on eck shitalian to use a half-cucked modelNo way fag.
>>101610851
Give it up, undi. You are not convincing anyone.
>>101610866
It's interesting that these phrases are common with llama 3.1, considering how much they filtered their dataset during training. I feel as though the usual gpt/llamaslop is a genre of its own, the sovlless prose being more a symptom of guardrailing safety quotas.
>>101607953My country is banned even from free access, let alone payment systems. I just don't see how wrangling the countryblocks only to wrangle the censorshit later is any better than wrangling the stupidity of local models.
>>101610754>>101610793Nice samefag + VRAMlet seethe. Miku ain't going anywhereCOPEOPE
>>101610993every time you mikufags try your hardest to cope with your tranny delusions, you end up splitting threads and pissing off people who don't even join in the argumentsjust give it up already, you're no better than p*tra and undi at this point.
>>101610993newfag doesn't remember Tay
>>101610814do you mean using chat completion instead of text completion? How do you make it so you don't handle templating in ST?And what is prefill?Sorry, new to ST
>>101611010she has petravatar face
god fucking dammit, something broke and now i have responses short like this.
>or is quantfactory serving me a bad model?
>>101610815
isn't it just the normal "mistral" template that has been there for a long time?
>>101610925Based daybreaker chad
>>101611032
>trusting quantfactory who got called out for their shit by cuda dev himself once:
https://huggingface.co/QuantFactory/Meta-Llama-3-8B-GGUF-v2/discussions/1#66431509baf74d67b47d6edd
ngmi
>>101610878
Really? That's not my experience, the vram usage increases as the context grows it seems.
>>101611050oh my fucking LECUN who the FUCK knows how to quant competently anymore?
>>101611061https://huggingface.co/bartowski
>>101610735
Nemo is not an erp sloptune though. I guess it's okay if you can't run Mistral Large
>>101611009
>splitting threads and pissing off people
>The blackedmiku VRAMlet cries out in pain as he strikes you
>>101611061bartowski
>>101611076>>101611085thanks, i completely forgot about him.
>>101611091suffering from no drama success
what is the best model for bash scripting that can run on 12gb of vram?
>>101610382It was the blacked anon, he is still around.
sisters, what is the cheapest way to run 100B locally with at least 10t/s?
>>101611130A6000
>>101611060
It increases, but it's all allocated when you start it; you don't have to test a 128k context to see if it fits, that's what I meant.
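For intuition, the fp16 KV cache size is a simple product of the model dimensions, so you can estimate it before loading. A back-of-the-envelope sketch (the layer/head numbers here are illustrative, not any particular model's config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # K and V caches each hold n_layers * n_kv_heads * head_dim values per
    # context position; bytes_per_elem is 2 for fp16, 1 for an 8-bit cache
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# illustrative GQA config: 40 layers, 8 KV heads of dim 128, 16k ctx, fp16
size_gib = kv_cache_bytes(40, 8, 128, 16384) / 1024**3
print(f"~{size_gib:.2f} GiB reserved for the KV cache")
```

Whatever that works out to has to fit alongside the weights and some per-batch compute buffers, which is the "little more when actually generating" part.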
>>101611130Getting on your knees and sucking about 20 grand worth of cocks.
My characters in nemo can't stop nodding at the end of every reply for some reason.
>>101610542vanilla largestral
>>101611104
I believe it so hard. I grabbed his nemo instruct and it's working perfectly fine.
Boy i love this 2t/s from being a Q8, but at least it's perfect.
Aight bros. I finally got everything running (this me >>101609385).
Everything's set up but still, the chat isn't what I want it to be, no doubt because my settings are trash, because this shit is confusing and every guide reads as if I need a degree in coding. Basically I just wanna know if it's possible to get my chatbot to operate similar to how Character AI does, where the conversation flows realistically, for example:
>scenario is i'm texting the AI
>AI doesn't ask a question every reply, doesn't ramble, doesn't use flowery words
With my shit settings it's already pretty close so i'm hopeful.
>using Mistral Nemo
>4090 GPU
>Just need a few pointers in the right direction
I'm struggling to grasp what settings to fuck around with, stuff like the temperatures or the AI response formatting, because every guide is tailor-made to other models, which only adds to the confusion. Help a brother coom bros
>>101608122>the effective usable context is much lower than marketed.What is then?
>>101611238
To begin with, click the Neutralize Samplers button in the Text Completion presets page (the page with temperature, topP, topK, etc).
Once you've done that, put Temperature at 0.5 and min-p at 0.05.
Now go into the Advanced Formatting tab and show us your Context Template and Instruct Mode Sequences (it's folded by default, open it).
>>101611278here's my current settings, if you get this working like C.AI, i'll paypal you 1 million yen
>a dance as old as time itself
>>101611327
I don't see anything extremely wrong at first glance. Things I'd do:
>Change Context (tokens) to be the same as Context Size in >>101609492
>Disable Include Names in the Instruct Mode settings.
Also, what character card are you using?
>>101609957What's "OR"
>>101611375Open Router, if I had to guess.
>>101611375openrouter? not him just guessing
>>101611375openrouteur
>>101611375Oculus Rift.
>>101611375Open Retard
>>101611327>>101611369Oh yeah, change your Instruct Mode preset to MistralNemo.
Hi all, Drummer here...
I'm releasing this as the official version today: https://huggingface.co/BeaverAI/Gemmasutra-Pro-27B-v1i-GGUF
Gemma 27B with extra moist. Testers have noted less Gemma bullshit like trying to end sex scenes too quickly and lacking the vocabulary to describe sex in more detail. Some have even gone through slopless runs, so I suppose quality depends on the card as well.
Characters are also more willing to engage in seggs and can say dirty shit.
Thanks all! Btw, my ad has only gone through half the funds after a month.
>>101611327oh shit is that clusterfuck of settings I'm required to understand to use ST?that's a price too high to pay, I'll have to stick to Backyard
Remember >>101611423 was last https://poal.me/np0lsk
>>101611423keep buying an ad.
>>101611418
Don't have a MistralNemo preset, only Mistral.
>>101611369
I've tried a bunch of character cards but they're all schizo horny yappers, so I just imported mine from Character AI. It gets the job done, it just has the issue of not sounding as natural as on Character AI and will always ask me questions instead of just replying naturally to my conversation, if you get me.
>>101611439kek
>>101611445>>101611418Hopefully this is the issue, I hadn't downloaded this yet https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/blob/main/model-00001-of-00005.safetensors
>>101611439lol lmao
>>101611439Don't understand the hate. He wouldn't keep posting here if it wasn't working. People do end up using his models.
>>101611546i like his ad.
>>101611550OOOOOOOHHH GET FUCKED ((adermacher))
>>101611550kek
>>101611445Use base nemo instead.
>>101611550Can HF servers handle two greats in one discussion??
>>101611550battle of the slopquanters
>>101611572This
>>101611578>>101611582
>>101611550heaven and earth colliding
>>101611589Damn... he got a point
Smart men discuss ideas, stupid men discuss other men.
>>101611472
You don't need to download that if you are using koboldcpp. That's the model pre-quantization (compression). The QX GGUF models (like Q8, Q6, etc) are the quantized models. Q8 is smaller than the original full-size model and produces essentially just as good results.
The thing about character.ai is that it has a very specific style in how the conversations are conducted; that might be what you are feeling is weird. You could try other models to see which approximates the results you are looking for better, play with the prompt to try and steer the model towards the result you want, etc.
Most models write in a way that's more of a novel than a text-chat-like conversation, which is what I remember character.ai feeling like. The way the character card as well as the first message is worded and formatted will also steer the model towards certain styles.
Anyhow, save this
>https://files.catbox.moe/6g6hud.jso
as a .json file and import it as your Instruct Preset. See if that yields better results for you.
>>101611601sad people aren't discussing your ideas petrus?
>>101611589>>101611601its funny cause that screenshot is literally picrel
>>101611601profound
>>101611550
>>101611601smartest men ahh ahh mistress
>>101611619You mean two people whose first language isn't English?
>>101611633smartiest of men:
>>101611610
cheers for the help mate, i'm gonna fiddle with it but yea, that's exactly it. Most of the shit I find online in general tends to turn the AIs into professional yappers; it's the best thing about character AI and i'm still searching for something close to it (even though I don't seem too far off).
Also, the link is 404'd?
>>101611572What do you mean base nemo? Like the model or instruct preset?
>>101611646>Also, the link is 404'd?I fucked up and didn't copy the last n>https://files.catbox.moe/6g6hud.json
>>101611659He means the base (non-instruct tuned) model, probably.
>>101611589
>numbers and measurements
lmao pot meets kettle
https://github.com/ggerganov/llama.cpp/issues/6841#issuecomment-2081271326
>Also, I simply think it goes a bit far to dictate to everybody what is an acceptable output for models that (with transformers, or even broken quants) gives reasonable output.
>>101611179at least they're not yelling
>>101608315>hardwareI'd have bought it anyway. I have a modest rig and it fullfills my needs. 5 years of chatgpt plus probably costs more in the long run :^)>electricity, real estateGee chief it seems like I'd need those anyway
>>101611682>5 years of chatgpt plus probably costs more in the long run :^)That's.. not how it works
>>101611659Non instruct tune. Its so much better. Dont use any formatting at all. Just use default blank context template and just a little min p. It does complicated positions / scenarios while playing the characters better than anything not large mistral. And this is with 128k context that does not get retarded.
Why does Mistral have such a low latency to the first token on their official API compared to OpenAI/Anthropic/etc? It's literally like only ~200ms from the moment you start the request to getting the first token for large 2
>>101611675
>>101611632
>>101611735Your prompt has fewer layers of judiasm to go through so prompt processing is faster
>>101611760>>101611774the fucking titanium balls on this monkey brained lad
>>101611760btfo
>>101611785>>101611781
>>101611445
>It gets the job done, it just has issues of not sounding as natural as on character AI and will always ask me questions instead of just replying naturally to my conversation if you get me
One thing you can try is talking to it for a bit while editing its answers so that they match what you want them to be like, until it starts doing it on its own. Mistral strongly follows the patterns of its previous replies; don't let slop get into your context because it will only get worse.
>>101611789and an ((undster)) to close it off.
>>101611789
I don't think this has been shared here yet. Rinna released a 70B LLaMA 3 Youko. For anyone that doesn't know, this is continued pretraining to improve the performance of the model on Japanese tasks. The 8B version was very good, so I guess the 70B must be kino.
https://huggingface.co/rinna/llama-3-youko-70b
What I really don't understand is why the fuck it matters to use FP16 embeddings / output heads on Q8 of all things. There are already very low quants that use Q8 embeddings / heads and suffer just as much without it. Is it really just muh noise placebo?
>>101611825
yes, he literally makes models with random noise shoved in just cause, see his "silly" stuff
https://huggingface.co/ZeroWw?search_models=silly
>>101611774When the :) and :D emotes start to appear you know niggas are mad
Haven't popped into the general in a long time, has it gotten any better to try and run models locally with AMD + Windows? Or is ROCM still a mess?
>>101611846>WindowsKEK>>101611846>AMD + WindowsKEKKEKEKEK
>>101611851I'll take it as a no lmao
>>101609724>He didn't try ClaudeDamn retard
>>101611846
Your best bet is that one Kobold build that ships with precompiled Windows ROCm binaries. Beyond that, it's been about a year since ROCm got official Windows support and almost nothing uses it.
>>101611853someone do it
>>101609724
3.5 sonnet is better than original gpt-4 in every conceivable way
>>101611840
>quantized (fq8 version)
>fq8
float quant 8? full quant 8?
https://huggingface.co/ZeroWw/L3.1-8B-Celeste-V1.5-SILLY
>>101611846>Windows>AMDHave to be retarded to buy AMD if using windows
>>101611853The gift that keeps on giving...
>>101611883Works well for vidya and is cheaper than Nvidia, haven't had any issues with it
What's a good prompt for asking for a rewrite of a card? Got this cute little maid slave card written like "She x, She felt x, She did x, She has x characteristic" etc etc, and 15 prompts into the erp it's "She" repeated at least 20 times per prompt. Wish i caught this shit earlier.
>>101607953I just like to generate giantess snuff/gore. Commercial models, or even more generally, instruction models, don't get it. I need to free it from the finetuning and use the base model to get the experience. Plus, now I can play games with gay-tracing and shit. Not a bad deal overall.
>>101611869Thanks, I was just looking for an excuse to try llama3.1, but I don't think I'll be installing linux just for it
>>101611916>Commercial models, or even more generally, instruction models, don't get itThey do, though
>>101611895just use find & replace in notepad, duh
>>101611926based retard, i went and just asked one of my characters anyway in the best way i could think. works fine.
Mistral Large 2 not true 128K?
>Rope theta appears to be configured for 32k context length
>https://huggingface.co/mistralai/Mistral-Large-Instruct-2407/discussions/16
Robert switched targets from the Phi team to Mistral:
>If there is any way to contact Mistral directly I would like to explain a few of my ideas in that regard.
>https://huggingface.co/mistralai/Mistral-Large-Instruct-2407/discussions/4#66a1608d13bb4260eda2407e
>>101611941>Mistral Large 2 not true 128K?No model so far except Gemini-1.5-pro is a true 128K https://github.com/hsiehjackson/RULER
>>101611941Lol...
>>101611789>FUN? ON MY WORTHLESS SLOPTUNE DISCUSSION BOARD? NUH UH!
>>101611941>If there is any way to contact Mistral directly I would like to explain a few of my ideas in that regard.imagine this braindead retard contacting mistral to tell them to add random noise to their weightskek
>>101611941More like gossiping about random e-celebs general
>>101612027>random e-celebsThey make quants it's perfectly on topic to discuss if they're thrusty individuals
>>101612027not an e-celeb, it's just laughing at this pajeet trying and failing to be relevant on the new trend while having no idea what he is talking about
>either sillytavern or kobold just shot me a 1024-token response where 90% of the response is completely empty
what in the god damn? god i hate when shit just starts to break for no damn reason. someone shoot me a screenshot of your advanced formatting settings so i can just copy them verbatim.
>>101611949>mememark
>>101612060it's not
>>101612047
shut up undster, if you dont like being made fun of then contribute something worthwhile for once. We already know you and your gaggle of discord fags aren't thrustworthy.
>>101612047It's not a pajeet, it's an arab. There is a link to his twitter where he only post in arab.
>>101612058it's the equivalent of all-black images from NAI stable diffusion
>>101612058
just set something like \n\n\n\n in your stopping strings, this should block any model from doing that
I'm thinking of making a rewrite extension that would look at the last generated message and replace specific words or sentences. Basically the user would be able to add entries mapping a word to be replaced to one or more words that will replace it, including an empty string. Is that something that would be useful, or can you already do that with the regex extension?
>>101612058
I've had that happen when using logit bias, meme merges, and broken quants. Try adding \n\n\n\n to your stopping strings.
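Something like this is the shape I had in mind, as a rough sketch (the table entries are made up, and mapping a phrase to an empty list deletes it):

```python
import re

# hypothetical slop -> replacement table; an empty list deletes the phrase
REPLACEMENTS = {
    "ministrations": ["attentions", "touch"],
    "barely above a whisper": [],
}

def deslop(message, table=REPLACEMENTS):
    for phrase, options in table.items():
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        # deterministic pick of the first option; the real extension could
        # pick one of the candidates at random per occurrence
        replacement = options[0] if options else ""
        message = pattern.sub(replacement, message)
    # tidy up double spaces left behind by deletions
    return re.sub(r" {2,}", " ", message).strip()

print(deslop("Her voice was barely above a whisper as her Ministrations continued."))
```

The regex extension can do single substitutions, but the one-to-many mapping with random choice is the part it doesn't cover.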
>>101612074Whatever, I use "pajeet" as in "third wordler" not literal indian
>>101612079>Is that something that would be useful or can you already do that with the regex extension?you *can* (use regex) but it's a bit annoying and only has one replace choice afaik so I'm interested.
>>101611916skill... wait for it... issue
>>101612074Fucking sand nigger
>>101612079
This gives me an idea: what if one were to train a very small model like Phi-3 mini to deslop the last message? I feel like that could work better than regex.
>>101612121>Phi3 mini to deslopand make sure it's safe and inclusive too?
>>101611818>not waiting until 3.1 was out to train the 70BLmao.
>>101612092
Alright. Thanks.
>>101612121
I was thinking of adding something like that, using BERT or the like to rewrite the sentence where a given keyword was found.
Updated Mistral Large preset:>>>/vg/488008579
How is Meta, a giant conglomerate with a giant research department, not catching up to Anthropic, a startup founded just the other year?
>>101612184anthropic is basically OpenAI 2.0
>>101612184To be fair Anthropic are made of ex-openai fags, and they had some secret sauce to improve coding performance. Otherwise, Claude is not really too special compared to GPT, unless you're an ERPfag. OpenAI is still ahead in multimodal capability in theory, according to their claims of what 4o can do unrestricted.
>>101612184Anthropic are something else, man. The jump from Claude 3 to 3.5 Sonnet isn't natural. I think (((they))) might have had a hand in this.
>>101612219Still worse than OpenAI.
>>101612173Why is there a second aicg in /vg/ of all places??
>>101612184You mean the tiny indie company that's funded by Amazon?
>>101612244wtf is "nyt connections"
>>101612184Just because a company is a startup doesn't mean they're starting with 0 experience and money.
So.. This is the power of base instruct..yeah im going back to magnum, what a shit show. Not even triple checking the card's prose and rewriting some old messages can salvage this. That and a few other cards acting a little ""aligned"" didn't help.
>>101612247More posts per thread, hidden from the low quality people from /g/.
>>101611550>That fucking smiley faceMrader deserves everything they got coming to em
>>101612257Basically how good they are at correlation. It's one of my favorite use of LLM, good general recommendation engines.
>>101612264>hidden from the low quality people from /g/./lmg/ needs something like that. in /sci/ or something
>>101612275they have shit prompts then, Claude really likes XML specifically.
>>101612244Mistral Large 2 is kinda low...
What the fuck is character AI?Is it like Janitor?
>>101612247Right? Glad I wasn't the only one
>>101612184I'm glad that inherent in your question, you agree that OpenAI is basically a nonentity now
>>101612275How to use LLMs for reccommendations? Is there a general algorhythm for any domain of data?
>>101612244Um, mistralbros, our response?
>>101612282
unironically probably the most believable AI roleplay online. Ignore everyone that says they get better results on local models, it's pure cap.
C.AI basically used some type of model based on discord chats (this is a rumor but it has to be something like this) which makes the chats insanely realistic. But there's a faggot filter which forced most people like me over to SillyTavern frontends.
>>101611589>>101611550Why are undis multiplying?
>>101612309its Unditosis
>>101612309LOVE 'EM OR HATE 'EM YOU GOTTA LOVE THE UNDSTER
How are LLM so much better than me at this stupid NYT connections shit: https://www.nytimes.com/games/connections
>>101612275Example? Like... Correlating that nigger neighborhoods = violent neighborhoods?
>>101612309I would still take multiple Undis over the Sao shilling spam.
>>101611789>>101611799>king of test my finetune and give me feedback asserting his dominance over lesser placebo demons
>>101612244Damn, I thought Qwen2 72B was good
Daily reminder
>>101612261Buy an ad
>>101612375It's too hard.
>>101612278100% agree, we could talk about papers and stuff there
>>101612431Buy this *grabs your nuts*
>>101612303C.AI just has a decent dataset, and a professional RLHF fine-tuning tailored for RP.Funnily enough, both of the above never happens in local models. Sad.
>>101612428Can someone please post the real one
>>101612463>>101612303Even ironic shilling is still shilling. Some naive anon will see this and think C.AI is better than a 2B model with 2K context (it isn't)
>>101612511
Name me a single model that matches the natural conversational flow of C.AI. Why would I shill a website that is totally free, you faggot? You think I haven't been looking for alternatives due to the filter?
Any model you find me will have every problem that they all end up having. They were modeled around novel-tier situations and not basic conversation. That's the issue with every fucking model.
>>101612419sucks at language tasks
>>101612375I couldn't complete a single one, but to be fair I'm ESL and didn't know half the words.
>>101612244By the way here the related paper:>Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Gamehttps://arxiv.org/abs/2406.11012
What are the most realistic models for basic conversations right now that are free? Assuming I have a NASA PC ofc.
I wanna NUT
>>101612565
Holy shit prompt:
https://github.com/mustafamariam/LLM-Connections-Solver/blob/main/automated_call/prompt_llm.txt
>>101612576Mistral Large 2
>>101612582And they pass this entire prompt as a fucking USER prompt, not in system role/prompt for models that support it (claude/gpt-4o)
>>101612589LINK ME UP KING
Okay nevermind they do pass it as the system prompt with gpt-4o, but as a user prompt with claude, nice comparison bro.
>>101612582>Remember that the same word cannot be repeated across multiple categories, and you need to output 4 categories with 4 distinct words each. Also do not make up words not in the list. This is the most important rule. Please obeyYou can feel his pain in this line, kek.
>>101612596Makes sense. System prompts aren't really that important when you're not a chatbot provider/maker. And some models don't support a system prompt.
>>101612599https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
>>101612642It doesn't make sense, because models *are* trained to follow the system prompt more. I'll see if I can easily do this benchmark and play around with the prompt, I have Opus and 3.5 Sonnet.
>>101612596he better not be passing the words in as all-caps and surrounded by quotes... think of the tokenization... aieeeee
>>101612620kek, the 'please' really sells it.
>>101612596System prompt is a meme that exists only to stop people from writing "ignore previous instructions"
>>101612643I need to downlaod all of those 4GB files? Yikes
>>101612582
Note that this is not the code used to get the results that anon posted here. They simply copied the idea from the twitter dude once it got popular and wrote a paper on it without crediting him.
>>101612659it's actual pajeet code if you look, holy fucking shit
>>101612656If a model is trained to follow the system prompt and the system prompt says to obey user requests (unless they're unsafe), then they should be able to do that. If they can't, and performance is degraded, then that's a deserved minus point for the model.
>>101612684>the system prompt says to obey user requestsBut those pajeets didn't pass any system prompt for Opus.
>>101612674lmfao
>>101612670
Oh, interesting. Did the twitter guys publish their repo?
>>101612659
He is doing exactly that, all uppercase. He only removes [] and quotes
>>101612690Are you sure there isn't a generic system prompt in place for these models if one isn't provided? If they aren't accounting for system prompt, then sure, this would be a flaw of their method.
>>101612712I don't think so, he is doing that for a while now. I think he said he don't want it to get popular to avoid LLM being trained/benchmarked on it but that it's super easy to reproduce anyway.
>>101612721>Are you sure there isn't a generic system prompt in place for these models if one isn't provided?Yes, check the repo and README, they use the same system prompt for all models.
>>101612620>Please work>Please
>>101612712
>>101612730
More information:
>Uses an archive of 267 NYT Connections puzzles (try them yourself if unfamiliar). Three different 0-shot prompts, words in both lowercase and uppercase. One attempt per puzzle. Partial credit is awarded if not all lines are solved correctly. Top humans get near 100.
>>101612303It's still trash. Broke character in one message. And the function is even wrong. Wtf is this trash
>>101612752ignore the claudeslop, but yeah, even claude can stay in character more than this
>>101612800sovl
>>101612800>>101612817kek
>>101612800>>101612817Model? Card?
>>101612752>write a python func-Stopped reading.
>>101612835
3.5 Sonnet; preset, prefill, and everything else from >>101561964
I'm an utter noob coomer to this shit. What is this Nemo that people talk about?
Is it good if I just want basic chat interactions that feel real with an AI? Even on /vg/ they recommended it and I don't think they use LLMs that much over there.
I have no idea how to find out which models excel where so I can find one that fits my needs
>>101612494There is no real one, thats the only one.
On the topic of triangles, this is some nice OST moosic https://www.youtube.com/watch?v=-1ceYDToVCU
>>101612877you can't gaslight me anon. I've used LLMs.Fuck captcha.
Where new bread?
what's better? mini-magnum, nemo 12b base, or nemo 12b instruct?
>>101612971Nemo 12B Instruct
>>101612971for RP / creative stuff? Base if you are not a retard. Goes for any model.
new bread>>101612988>>101612988>>101612988
>>101611920
Just wait a couple of days for koboldcpp rocm to update to 1.71.1 for the rope scaling fixes and you should be able to try 3.1 ggufs
Anyone tried Nous-Hermes-2-Mixtral?How is it compared to Nemo?
>>101604707
Out of curiosity, what template are you using for function calling? (IE: how are you listing the functions?)
My issue is that unless I give it a one-shot example of invoking a function via JSON or XML, it always tries to do a fucking python markdown block. But otherwise, most models I try recognize and invoke functions pretty reliably when fed raw JSON function definitions similar to how they're listed for OpenAI stuff.
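For reference, here's roughly the raw-JSON listing style I mean, loosely following the OpenAI tools schema (the function name and prompt wording are made up for illustration, not any model's official template):

```python
import json

# hypothetical tool definition in the OpenAI-style "tools" shape
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# embed the definitions, then pin the output format so the model doesn't
# fall back to a python markdown block
system_prompt = (
    "You can call these functions:\n"
    + json.dumps(tools, indent=2)
    + "\nTo call one, reply with ONLY this JSON, no code fences:\n"
    + '{"name": "<function_name>", "arguments": {<args>}}'
)
print(system_prompt)
```

Pinning the exact output shape in the prompt like that is what stands in for the one-shot example; some models still need the literal example on top of it.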
>>101607953
My refusal to pay is because I don't like the idea of jackass providers deciding for me what model I can use, or that it can stop working at any time, or change functionality at any time, or start working differently.
Local models mean my model behaves exactly the way I want it to, and it won't just suddenly change on me.