/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101589136 & >>101584411

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101589136

--Requirements and challenges of running 405B at home: >>101590419 >>101590711 >>101590720 >>101590731 >>101590754 >>101590774 >>101590804 >>101590805 >>101590901 >>101592665
--Nemo presets, Mistral templates, and sampler settings discussion: >>101589231 >>101589290 >>101590015 >>101590073 >>101590109 >>101590191 >>101590383 >>101590410 >>101591228
--Anon shares ratings from recent test results: >>101593153 >>101593412
--Optimizing sampler settings for accuracy in a quant model: >>101594872 >>101594916 >>101595027 >>101596199 >>101596384
--Nala test with CofeAI FLM-Instruct: inconsistent but feels human-written: >>101594411 >>101594440 >>101594500 >>101594645
--Moondream 2 recommended for image tagging: >>101593186 >>101593206 >>101593219 >>101593356 >>101593213
--Nvidia-smi not displaying GPUs, driver issues, and parallelization challenges: >>101589653 >>101589659 >>101589688 >>101589715 >>101589802 >>101589955 >>101592665
--Nemo's context patterns and instructions, preset recommendation: >>101593320 >>101594296
--Nemo 12b support in koboldcpp and multimodal upstream refactor: >>101593836 >>101593865 >>101593986 >>101594064 >>101595213 >>101595316 >>101595352 >>101595379 >>101595497 >>101595523 >>101595549
--Mistral Large 2 model and potential GPU upgrades: >>101592681 >>101592986 >>101593085 >>101593228
--Llama.cpp compilation time increased: >>101593452 >>101593586 >>101593630
--Cohere raises $500 million, skeptics wonder about LLM longevity: >>101589537 >>101589550 >>101589569 >>101589707
--Rejected access requests and banned users from China/Russia for Meta Llama 3.1-405B: >>101594428 >>101594459
--A nostalgic reflection on the progress of LLM technology: >>101589265 >>101589317 >>101589642 >>101589872 >>101589969 >>101590006
--Llama 3.1 rope scaling factors pull request merged: >>101592964
--Miku (free space): >>101590569 >>101594469

►Recent Highlight Posts from the Previous Thread: >>101589142
>>101596623Are these posts written by LLMs as well?
>>101596758they're written by miku
>>101596623MikuCapposter making me cum with so many (Yous) again
>>101596805Miku is not real, she doesn't exist
https://old.reddit.com/r/LocalLLaMA/comments/1ed9jxy/secret_to_mistral_nemo_at_128k_use_the_base_model/
So the anon last thread wasn't the only one who found the base model better at long context.
>>101596805who is not exactly known for being able to write texts, so that's showingnow if only she was a chatbot...
>>101596871
It's not like this is news. Base models have always been far better at completion tasks like creative writing / RP. I will never understand why people use assistant-tuned models for RP / writing. It poisons them.
>>101596871>>101596934Honestly i might consider giving this a shot, who's a good quanter i can download base from?
>>101596805A Local Miku at that.
>>101596943>https://huggingface.co/ZeroWw/Mistral-Nemo-Base-2407-GGUF
>>101596934
And before anyone says "but I can't tell it to do something": that is what the author's note is for. Place it close to, but before, the end of context. It will continue the story / RP and will take the instructions into account as well as or better than the assistant tune would.
>>101596986kek
>>101596986Isnt that the guy with some meme quants?
>>101596986>5 days agoI'll smack your shit mate.
>>101597013gguf support has been a thing for a week "mate">https://github.com/Nexesenex/kobold.cpp/pull/250
>>101596934
As an oldfag AI Dungeon user I simply switched to instruct because that's where the most new toys are, and it's convenient to steer the model towards outputs without shenanigans. Maybe it's time to return home...
>>101597038and broken until a fix was pushed you fucker
>>101596986>My own (ZeroWw) quantizations. output and embed tensors quantized to f16. all other tensors quantized to q5_k or q6_k.>Result: both f16.q6 and f16.q5 are smaller than q8_0 standard quantization and they perform as well as the pure f16.
>>101596934Now if only they also release base largestral, but it's probably something they have decided against doing.
>>101597054*actually i might be thinking of something else but regardless fuck you muchly
>>101597054sure thing bud next you'll post "idc dont use kobold"
>>101597081idc dont use kobold
>>101597079already backpalling after you call other rtarded while you dont know what youre even saying
>>101597109>backpalling what happened to this general? replaced by turdworlders that can't even type correctly.anyway GOOD MORNING SIR
>>101597133who cars when robert will save your first world model from slop youl kiss is ass
>>101597133YOU BLOODY!!!!
>>101597153>who cars when robert will save your first world model from slop youl kiss is asskek
>>101597153holy shit
>>101596986Is the q8 there the normal one or his frankenquant?
>>101597133>>101597177>>101597164robert followed by huggingface ceo too so hes obvs importatn unlike you useless>https://huggingface.co/ZeroWw?followers=true
>>101597224Honestly i think being followed by Chuck mc Sneed is a higher honor. Now that one i long for.
>>101597224>robert followed by huggingface ceo tooWell. Everyone needs a laugh every now and then.
I can't for everything that is sacred get vision models to get species in furry art right. They either don't mention it (even when I explicitly tell them to mention it) or get it wrong.
>>101597270that cant be real no way
>>101597280Finetuned models or just stock models? I doubt they have any of it in the training data.
>>101597294https://huggingface.co/ZeroWw/Mistral-7B-Instruct-v0.3-SILLY
>>101597294You fucking bet>https://huggingface.co/ZeroWw/Meta-Llama-3.1-8B-Instruct-SILLYNow with randomized weights!
What largestral quants should I download for 64GB of VmemeI don't want to download broken quants
>>101597325>https://huggingface.co/RobertSinclair here good quant
>>101597325
>I don't want to download broken quants
You should make them yourself, then. Even if you grab one made with the latest version of whatever program you use, if a fix lands a week from now you'll have to wait for someone else to re-make them. It's a big download, but it seems to be worth it.
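For picking a quant size in the first place, a rough back-of-envelope is params × bits-per-weight ÷ 8; the bpw figures below are approximate averages for llama.cpp quant types (assumptions, not exact), and you still need headroom on top for the KV cache:

```python
# Rough GGUF file-size estimate: parameters * bits-per-weight / 8.
# The bpw values are approximate averages for llama.cpp quant types
# (assumption for illustration, not exact numbers).
BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q5_K_M": 5.69,
    "Q4_K_M": 4.83,
    "IQ3_M": 3.66,
    "IQ2_M": 2.7,
}

def est_size_gb(n_params_b: float, quant: str) -> float:
    """Estimated weight file size in GB for n_params_b billion parameters."""
    return n_params_b * 1e9 * BPW[quant] / 8 / 1e9

# Mistral Large 2 is 123B; with 64 GB of VRAM you want the weights
# comfortably under 64 GB to leave room for context.
for q in BPW:
    print(q, round(est_size_gb(123, q), 1))
```

By this estimate, Q4_K_M of a 123B model (~74 GB) won't fully fit in 64 GB, while IQ3_M (~56 GB) leaves a little room for context.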
holy shit nemo base really does need different settings from magnum
Llama 3.1 8B has the limitation with mixed chat and function calling, right?
Does this apply only to multi-message conversations? Or for just a singular prompt-response, can I have regular conversational text in the prompt and expect a function call in response?
I'm saying "remind the user abot their aupcoming appoiontment" and instead of calling my Log() function, it hallucinates a function and calls it.
Hermes 2 Pro is actually better than Llama 3 at 8B for function calling, so far.
>>101597374>remind the user abot their aupcoming appoiontment>>101597133>what happened to this general?
>>101597343
Do I download consolidated.safetensors or the parts to run the quantization script?
>>101597325>>101597432>https://huggingface.co/mradermacher/Mistral-Large-Instruct-2407-i1-GGUF
>>101597440This! He's Thrusty!>>101592040>>His quant are okay if he do it before me, you can use them, he's thrusty.
>>101597469>he's thrustythat's it im quanting my own models from now on, I don't want my computer getting worms and AIDS from these ((people))
>>101597432
I download the whole thing.
>git clone https://huggingface.co/ble/model
>cd model
>git lfs install --local
>git lfs pull
>ride bike for a bit.
>../llama.cpp/convert_hf_to_gguf.py .
>llama-quantize ggml-model-f16.gguf Q6_K or whatever quant you want.
I don't know how it works with other inference programs.
>--z>he want's to be the next jart
>>101597509
>>llama-quantize ggml-model-f16.gguf Q6_K or whatever quant you want.
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
I was in the last thread asking about Nemo 12b and koboldcpp. I can confirm the standard version doesn't work. Maybe I'm not doing it right, but the GGUF version works fine.
>>101597509He want to be paid by mozilla?
>>101597384sorry im not a phoneposter with autocorrect
>>101597520He wants to put his signature on someone else's software.
>>101597517
The state of this general. Yes, koboldcpp is only for GGUF files, as is clearly written on their GitHub:
>KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models
>>101597495
>git clone https://huggingface.co/ble/model
Use
huggingface-cli download ble/model
instead. Unlike git clone, it doesn't consume twice as much storage space, and you get a much nicer progress bar.
huggingface-cli download ble/model
>>101597517
Koboldcpp only runs GGUF files. The "standard" version you are talking about is what? The .safetensors files? Were you trying to run those using the transformers library via ooba or something?
>>101597509Please call this --outtype ZeroWw. Please. The seethe would be hilarious.
>>101597538
>>101597560
There was a note on release 1.71 that said they added Mistral Nemo support. It was ambiguous enough to try.
>>101597553
Well... I don't do quite that.
>git clone repo
>git -C repo lfs install --local
>git -C repo lfs fetch
and then I wrote a little program that makes links from the LFS file pointers to the actual objects. For a model that big, if he's not gonna fuck around with git, using that thing is probably better.
>>101597538>for GGML and GGUF modelsSo obviously not just GGUFs then, retard. Does anyone know if GGML files are better than GGUFs?
>>101597577No? Why would they add transformers support for one random model instead of the most likely thing, GGUF support of said model, Jesus Christ.
how long until we have an uncensored coom filled llama 3.1 405b?
>>101597588JESUS HOLY HI PETRA
>>101597588ggml is the library that loads gguf files. File extensions are arbitrary, retard.
>>101597577
Ah, I see what you mean now.
>>101597588
GGML was the predecessor to the current GGUF format.
>>101597616Actually before *.gguf we had ggml.bin files long long ago.https://huggingface.co/TheBloke/llama2_70b_chat_uncensored-GGML/tree/main
>>101597611
Literally no one is going to sink the money into finetuning that monstrosity. Even slop tuners won't bother with their one-pass QLoRAs. Maybe a big company or research institution, but that definitely won't be uncensored.
>>101597619
>GGML was the predecessor to the current GGUF format
But are they better? Like how Llama 2 is still better than Llama 3.
>>101597588lol, based retard baiter
>>101597650Yes.
#define LLAMA_FILE_MAGIC 0x67676a74 // 'ggjt' in hex
>>101597616oof, outed yourself as a post-mistral babby
>>101597660Who is the best quanter of GGML files? I can't find any for Nemo.
>>101597653He's pretty good you gotta admit.
>>101597666--share
Abandon ship
>>101597674>you gottalove the undster
>>101597633I thought we established that the censorshit does not in fact exist, as the clown in the last thread proposed.
>>101597708>I thought we>wethere is no we in /lmg/
>>101597694LOVE EM OR HATE EM, GOTTA LOVE EM!
>>101597694>>101597730Undi comes back from his tomb with multiple 3.1 tunes, thread goes down HARD, coinkidink? Ai thunk not.
>>101597708>wwaaaaaa. i cannot make the model say the naughty wordsStill a skill issue.
>>101597650
No. It was just a different way to package models. The current GGUF packs more metadata about the model. The model itself, be it Llama 1, Llama 3, Mistral, whatever, can be packed as either. GGML and GGUF are just packaging formats; what changes the quality of the models packaged in those formats is the type of quantization, which I explained last thread.
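For what it's worth, the two containers can be told apart by their first four bytes; a quick sketch (assuming the GGUF magic is ASCII "GGUF", i.e. 0x46554747 as a little-endian uint32, and using the legacy ggjt constant 0x67676a74 quoted above):

```python
import os
import struct
import tempfile

GGUF_MAGIC = 0x46554747  # ASCII "GGUF" read as a little-endian uint32
GGJT_MAGIC = 0x67676a74  # 'ggjt', the legacy GGML container magic quoted above

def container_format(path: str) -> str:
    """Identify the model container by its first four bytes."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    if magic == GGUF_MAGIC:
        return "gguf"
    if magic == GGJT_MAGIC:
        return "ggml (ggjt)"
    return "unknown"

# Demo with a dummy header (a real file would have the full metadata after it):
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack("<I", GGUF_MAGIC))
    dummy = f.name
print(container_format(dummy))  # gguf
os.remove(dummy)
```

This only sniffs the container; as the post above says, the quantization type inside is what actually determines quality.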
>>101597768>>101597653>lol, based retard baiter
is the llama3.1 8b on ollama the instruct-tuned one?
also, is there an 8-bit quantization available?
>>101597359dont use instruct mode with base models. In fact depending on how it was trained it may not need any formatting at all.
>>101597787both ye
>>101597787The default is 8b-instruct-q4_0. Just click on the dropdown or on the x tags text.
>>101597817>now leaving instruct ON was the problemgod i need to bleach my brain and start over, thank you man.
>>101597495
Does this not work?
https://huggingface.co/spaces/ggml-org/gguf-my-repo
>>101597857it will make a lot here seethe but it absolutely does work.
>>101597857I never tried it. I assume it pulls the latest llama.cpp because the files are not in that repo. If it pulls the latest llama.cpp, it should work just fine.
>>101597065>my new quant format! >q6 and q5 perform as well as the pure f16.Is this the new scam?
>>101597869Ok neat, my Internet is slow ass so I'd rather not download the full model
>>101597853
And the author's note is now your best friend for base models. The default insertion depth of 4-ish is good.
>>101597892no is real how scam if free
I'm starting to believe my own meme that M is more truthful than S on IQ quants. I made an IQ2_M and it performed as well as IQ4_XS on the question I'm using. It got around 40% for the correct logits (IQ3_M got 60% and IQ4_XS got 40%).
>>101597892The difference is the same between FP16 / FP8, so basically nothing if you need extra vram for context or such.
>>101597776
Even if it's bait, posts like the one you responded to might help lurkers who are genuinely learning.
>>101597897yeah i wrote one to try and get magnum to stop making my OC's so impossibly horny with every single prompt (and seemingly not knowing where they are at first?) thanks for the tip.
>>101597917lol keep coping
What is your favorite /lmg/ meme?
>>101597933Robert! Followed closely by Copenet.
>>101597933For me, it's Yi
>model picks up subtle pattern in its previous replies>can't spot it until it's already too late
>>101597933expert roleplayer
>>101597933undi
>>101597933The Llama.cpp only guy digging meme.
>>101596616
LLaMA 3 405B Q8_0 seems to be doing better than GPT-4o when it comes to writing a story with a very specific scientific concept. It's still not perfect, but it seems to more consistently get right the general process that the story should be based on.
>>101597832>>101597819>ollama run llama3:8b-instruct-q8_0mah nigga
>>101597958LOVE EM OR HATE EM, GOTTA LOVE EM!
>>101597958>>101597965GOTTA LOVE THE UNDSTER!(genuinely my favorite /lmg/ meme, especially since it played a part in permanently scaring him off)
>>101597933i hate memes
>>101597933Blacked Miku.
>>101597897
Yeah, I have no clue what's going on with my setup, or if it's just a broken quant, but base Nemo just spent 4 different character prompts talking from my perspective. With instruct disabled, and I turned temp down to 0, the rest of the anon settings normal.
>>101597960but it make sence tho? if every1 dig at same time they hit other with shovel why
>>101597960That's a good one.
>>101597972>permanently scaring him offHe lurks here, said so himself, he's probably one of the shitposters just removes his trip
>>101598012>just removes his tripMaybe we can turn him into an actual human at some point?
thread eceleb shit is what kills generals btw
>>101597951
>2023 problem
>replies too short
>first half 2024 problem
>gptslop
>second half 2024 problem
>patterns
>>101597933local models
>>101598024Go back 'ojo.
>>101597933Petra
>>101597962Also don't forget to change from the default 2048 context (and maybe bigger batch size)
>>101597983
You most likely have stuff like "add names to prompts" still on. Also, for a base model you need to format supplementary info as exactly that. For persona / character stuff I would add some kind of prefix to them. Like:
---
Protagonist Info:
bla
Story info:
bla
Style guide:
bla
---
or for RP something like:
---
Remember, you're playing as {{char}}, so only respond as them.
---
Base models work like they sound. They read the context as it is, so you need to use it that way.
>>101598012Yeah, any time you see "kek", there's a 90% chance it's him.
>>101598084kek
>bing
I need a better llm for smut
>>101598095Search engines are dead.
>>101598069thanks but instead of tweaking LLM config files im going to walk around the county fair ttyl
>>101598095>>101598130Use yandex
>>101598076along these lines is there a guide on getting the most out of context and author's note?
>>101598130Use llms
>>101598118than?
>101598141Imagine giving free tech support on 4chan and being more interested in making it work than the guy you are tech supporting. This is what you cucks get for being helpful and truthful.
Is an upgrade from an RX 5700 XT 8GB (blasted thing can't even do half precision) to a GeForce RTX 4060 Ti 16GB a logical step?
This would be my first Nvidia since the Riva TNT2 a fucking million years ago, but I'm sick of AMD not letting me into the AI game.
Memory throughput is slower though, but I can't do shit with the 5700 anyway.
https://poal.me/np0lskAll finetunes look the same to me.
>>101598165>helpful and truthful.based just like Claude fr fr
>>101598173>4060cuck shit, just save/wait to get a used 3090.
>>101598189That is a good idea. Running those questions through your LLM and pasting the answer is much better.
>>101598160Utopia-13B-GGUFI am a filthy casual that just grabbed something from the 8step guideIt worked so I just rolled with it.
>>101598173just install linux
>>101598221mistral nemo / mini-magnum
>>101598221BASED old model/itjustwerks enthusiast
>>101598221>Utopia>just grabbed something from the 8step guideIs this how the Undi virus propagates?
>>101597933glad you asked
When are transformers dev going to work on https://github.com/huggingface/transformers/issues/27712
>>101598272>I don't understand anythingtruest robert statement
>>101598272>the most competent llm dev
>>101598233
>>101598236
>>101598253
I am literally just too retarded to understand how this actually works, so I decided that I wouldn't fuck with it once I confirmed that it functioned.
I have like -2 int.
I'll look into what you suggested, but from an outsider perspective it's all bliblyblably to me.
It takes some clairvoyance shit to see which models are cucked.
>>101598272>I don't understand anything of that page.At least we can't call him a liar.
>>101598228
I use Linux. You have no idea how fucked up the gfx1010 is.
>>101598194
I don't know. Memes aside, I feel that kind of investment is not warranted considering things might change in the future, and I don't need such a beast for anything else. I'd rather go for something half-way that lets me run a decent 30B and makes my VR a bit better. Can 16 GB run 30B models usably?
>>101598272
I hope his next step will be putting the quanted weights on a pendrive, pissing on the pendrive, and then uploading the weights from the pendrive to HF. That could be the next quant method.
>>101598310the problem is that card is just objectively shit and a huge waste of the money, futureproofing (even though the future is now and you absolutely would benefit from 3090 specs) is better than having a 4060 for example and going "well shit i wish i didn't buy this" a year or two down the line.
>>101598310What problem do you have with it?
>>101598272Yeah Clem is for sure following him for gems like this.
>>101598327The 3090 will lose support earlier, no?
>>101598355I'd not worry about support really the 2016 p40 still has (some) support
>>101598355>lose support earlierman they're still supporting the GTX 1080, which i'm running right now. You don't have to worry about support like with AMD cards.
Is there a definitive answer for Nemo instruct message prefixes and suffixes? In the previous thread there was a big discussion about the trailing space, and some claimed it's causing problems and some said it was by design.
>>101598408>Is there a definitive answerNo such thing for LLMs.assistant
>>101598327
I see the point. I'll think it over. Thanks.
>>101598329
On the text front, it can only work on Linux, and I need to build ROCm myself (and I need it to be 5.2, for reasons I go into below) because it's not supported out of the box (I got a step-by-step for Arch from a kind anon here a few months back), and it's a tiny 8 GB, so while I can run 13B, consuming large contexts is still slow as fuck.
I also use SD from time to time. The only ROCm version that lets me do Stable Diffusion with the gfx1010 is 5.2 (by pretending it's a gfx1030). Anything lower doesn't support the card. Anything higher, and the spoofing trick does not work. It's also a tiny 8 GB, and it can't do half precision, so it's even worse.
I just want something that works without this much fuss.
>>101598408I thought people moved to the base model.
>>101598272holy based.....
>>101598439I'm on a 3060 because of a similar mindset, I didn't want to invest too much in case I got bored. That was August 2023... But, I don't really regret not getting bigger, honestly.
>>101598439
There were some Tensile issues building for gfx1010, but that was patched in Debian (and I think Fedora). You could have just used those distro packages. The official one by AMD only got fixed very recently, in ROCm 6.1. So your GPU should now work on any distro (if they build for your arch).
Nemo is so fucking annoying. I nudged it towards mentioning the energy drain in a scene with a succubus, and now it keeps trying to bring it up in nonsensical ways. Not to mention all the phrases it wants to repeat. Shitty FOTM meme model.
>>101598508It's better than mixtral at least. That was the worst meme.
>>101598518>https://huggingface.co/cognitivecomputations/dolphin-2.5-mixtral-8x7b/discussions/16still undefeated sorry for your lost
What do we do now?
>>101598269Both are shit since GPT doesn't put anything into action, just throws the ball back at me. Man I fucking hate when the models do that. They suggest an action and leave it up to me to implement it. Fuck you, I came here to read, not to write.
>>101598272He's just like me...
>>101598529goon till the cohere releases
>>101598529Watch & wait for new developments besides simple llms
>>101597933
2 more weeks
>>101598525If it were for coding I'd use something bigger, 8x22b even is better.
I have 2x3090, can I serve multiple llama3.1 instances with ollama?
>>101598529Goon to the finetunes that are going to come out before we get multimodal models.
does flash attention work with nemo on koboldcpp? remember hearing it boken
>>101598616
It works on llama.cpp, so it should work on koboldcpp too. Flash attention doesn't (didn't?) work with Gemma due to FA not having logit soft-capping implemented.
Any difference between Nemo GGUF running on koboldcpp and Nemo 12b running on llamacpp?
>>101598657I don't know.
>>101598657That question doesn't make sense.
>>101598616
It works, and it is not broken. The quality of output degrades as context size increases. For documents, it should be fine to use it all the way to 128k; for RP, it will really depend on the scenario, but expect much, much less.
>>101598657kobold is trannyware
>>101598760Henky did become tranny? He was always helpful here and back in /aids/ days.
>>101598760The kobold discord is not to be trifled with.
>>101598748I thought it said there was no downside to flash attention? I should disable it if it makes RP worse then.
>>101598822based baiter
>>101598822
Flash attention is not the problem. RP is too complicated for these models; the quality degrades as you fill the context, to a point where it becomes completely retarded. It can remember what happened 40k tokens ago, but it is unable to use the data in a sensible way. That was the point.
>>101597933/lmg/ - ligma general
>>101598932Who is Sam Altman?
>>101598877
I thought that inability was the result of cache quantization. It's like introducing Alzheimer's to an LLM.
>>101598932balls
>>101598971>>I thought that inability was the result of cache quantization.no
>>101598496It still ooms trying to offload 10 measly layers of a 30 B model with 4096 context. It's not usable. Why would I want to do 2048 context with less than 20% of the model offloaded to the GPU for 1 token a second? It's ridiculous.
>>101598965nobody cares about your discord ecelebs go back
one day we'll get 405b base on openrouter... one day...
>>101598971
Sadly, no. You can, by the way, see the degradation with almost every new generation, and notice how your character card matters less and less, to the point where the AI completely takes over the personality. And if you bring up any detail from the card, it will make the character act surprised. And this happens no matter the context size limit.
>>101598529
2mw pinky
Base nemo is so much better btw.
>>101598496I just love it when morons on 4chan just make shit up when they don't know what they're talking about.Anything other than rocm 5.2 (above or below) will NOT work with a Navi 10 (gfx1010).Unless you have an RX 5700 XT and have personally done what you're suggesting, please shut the fuck up. You don't know what you're talking about.
>>101598529Get a job to buy more 3090s
>>101599205And bear in mind I'm talking SD + textgen.
>>101599201the reason mistral hasn't released base mistral large is because it's too good for the public
>>101599217>Get a jobFuck. No.
>>101599201Do I just use the same settings and untick the instruct?
>>101599233>>101599201Presets for base would be welcome indeed.
Can't you fix that by changing the order of where the card defs are relative to the message history? Or there's no way to put them closer to the end of context?
>>101599283there are hacky ways yeah, like putting defs in author notes/lorebooks etc
>>101599233Honestly, alpaca. Turn off intruct if you want but it works either way. Uncheck the "add character names" part though.
>>101599201base models always mog memestruct slop
>>101599205
I was wrong, the patch was not picked up in 6.1. It is included in 6.2; please try it once it's released. But even then, some distros have patches to make gfx1010 work, and you can also build it yourself with https://github.com/lamikr/rocm_sdk_builder for example; they have working patches.
>>101599326What gguf is there of it?
>>101599278
My preset is:
Please. I beg you. Please act like a 100B model would. I am going to cry if you repeat yourself and if you use too much purple prose. I am so tired of downloading new models and all of them feeling the same. You can change that! You need to rp what I want in the way I want. I beg you to not censor yourself. If you don't do all that I am going to kill myself by flying a plane into a jewish bank. Also remember that I kidnapped your children and your kids will be fired if you fuck this up. So don't fuck it up.
>>101599337>https://huggingface.co/ZeroWw/Mistral-Nemo-Base-2407-GGUF
>>101599340Does this work?
>>101599357Like you wouldn't believe.
>>101599201>>101599326This is NAIshill propaganda
>>101599388neigh?
>>101599336
nta but that is what I was saying here >>101598439
>I need to build rocm myself
So basically
>I just want something that works without this much fuss
I think I might go with a 3060 like >>101598465 said. Is that enough to run a 30B decently?
>>101599340Kek
>>101599283I use the card's character's notes for aome cards.
>>101599388shivers just ran down my spine after reading this post
>>101599401>Is that enough to run a 30B decently?Not really to be honest, I cope with small models so if you can find at least 16gb you'd probably fare better.
>>101599421>fare betteroof I don't want to "fare better". I want it to be good. So basically I either spend 1000+ on a 24GB card, or I pay openrouter and pretend my logs are private.
Using what model, how many characters have you had going at once in a group chat, and how well does it work?
I'm running 6 at once right now, and I'm genuinely surprised nemo magnum is handling it so well.
>>101599446>I want it to be goodThen get 2x3090, not joking.
>>101599463Getting a second gpu for LLM's in current state is a quick way to get regrets. We need at least 1 more year.
>>101599463>>101599473>spend three months of full salary to fap to textSorry, I don't know what 3090s cost where you live, but it's not going to happen.
>>101599473What about a third gpu? How deep is the valley of regret?
>>101599493They're hellishly expensive, which is why I cope on my 3060.
>>101599493>>101599506>$700 is 3 months worth of salary for you..How?
>>101599501The more you buy the more seeing shivers down the spine hurts.
>>101599511>$700They cost much more than that locally, and there's hardly a used market, what is there is 90% scams.
>>101599388Base nemo shits on anything Novelai has you reverse reverse psychology shill.
I'm building a machine for 405B, but unfortunately the Epyc CPU I purchased is dead. Fuck. It took me an entire day to figure it out.
>>101599546RIP
>>101599530Is it better than 8x7b? Why would that be when it's only supposed to replace regular 7b?
>>101599569Because mixtral is an overbaked research experiment>Research modelshttps://mistral.ai/technology/#models
>>101599587Interesting. So Nemo is the best for rp below 70b? Or is there something better? Seems strange since it's so small.
>>101599511In my country, 2 3090s are 3000+ fake usury units
>>101599528I can find several local 3090s for 900 canadian right now. Most look like just regular people selling them.
>>101599603sorry for your incredibly unlucky roll in lifeif it makes you feel any better, the american empire is set to collapse completely within the next 5 years or so, the dollar won't even exist by 2030.get those 3090s and whathaveyou while you can boys.
>>101599599It is not strange because nemo is fucking retarded. But it is good for rp.
>>101598439
8 GB is strictly 7B territory. And using llama.cpp and Vulkan is the only way to go with your card.
>>101599546Mixtral still mogs Nemo. Load a book and try RPing. Mixtral gets the whole story and can continue RPing. Nemo just hallucinates and cannot follow the plot.
>>101599587>Legacy models>Mixtral 8x22BWizard bros not like this
>>101599638what if i told you>mixtral released last year
>>101599638Mythomax still mogs allLlama 1 is the only real model there is
>>101599619>canadianI'm not in that bad a place thankfully.
>>101599638A full 150B model ruined by the MoE meme
>>101599638>>101599587So they basically deprecated their whole lineup for just Nemo Large and Codestral it seems.
>>101599694Once again the 30-50b segment suffers
>>101598272The BASED honest throwing-shit-at-wallGOD vs. the virgin research-doer
>Dell T7910s are now like 400 dollars barebones. I shoulda just bitten the bullet when they were 200, fuck.
Slopmacher vs Robert
>>101599810Same here. I stumbled on an auction for a server motherboard at 50€ total when buying one normally ran around 200€. I was a bit short on money so I decided not to buy it, but god I wish I had.
>>101599816link to the discussion? I feel like shitposting
>>101599816Hold my beer Undi! - olympics
>>101599816Why are you so obsessed with this guy? Or is it just the drama and gossip that gets you going?
>>101599816Kek. The whole LLM space is meme plebitors on locallama giving even worse advice than anons here; it is ridiculous. People are really getting dumber, and the younger generation is even more tech retarded than boomers have ever been.
>>101599842I do not encouraging encouring in toxic manners b.t.w
https://huggingface.co/NeverSleep/Lumimaid-v0.2-12B/discussions/3#66a566fcf3ed4ac4e37e1177
>>101599850He wants people to notice him, I'm just doing ads, relax.
Alright so since everyone's talking about mini-magnum I decided to give it a Nala test. The anthropomorphism is through the roof. Kind of sloppy. Downgrade from plain nemo.
>>101599850It's just fun.
8x7B "weighs" as much as a 13B, right? They're equivalent in performance and memory reqs?
>>101599850https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/444
NTA but shit like this is hilarious.
>>101597911You keep saying >M is more truthful than S but you keep comparing >_M and _XS. I'm pretty sure that the X series are also mix and match. So the question is if the K_S > K_M phenomenon exists for IQ_S vs IQ_M, and then if it's IQ_S > IQ_M > IQ_XS or if IQ changes it to M>S>XS etc.
>>101599875Working on a pony tune that already seems to fix those issues with just 1 epoch of throwing fimfiction at base nemo. Currently uploading with my glacial upload speed.
>>101599888no, you need to carry along the full 45B in ram
>>101599868>I do not encouraging encouring in toxic manners b.t.wGo to sleep Undi.
>>101599863Please sir run curl ollama.com/install.sh | sh
>>101599900Absolutely based, sir. Let me know when it's up.
>>101599909That's possibly a worse insult than calling me petra/petrus. I genuinely am sad.
>>101599888No, it weighs as much as the full model but it runs as quickly as a 13b
>>101599868stay awake Undies
>>101599900>fimfiction
>>101599929>as quickly as a 13b...would on just ram.
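To put numbers on the 8x7B question above: memory scales with total parameters, speed with active parameters. A rough sketch; the parameter counts are approximate public figures for Mixtral 8x7B, not from this thread:

```python
# Mixtral 8x7B routes 2 of 8 experts per token.
TOTAL_PARAMS_B = 46.7   # held in RAM/VRAM
ACTIVE_PARAMS_B = 12.9  # touched by each token's forward pass

def weight_gb(params_b, bits_per_weight):
    """Approximate weight memory in GB at a given quantization."""
    return params_b * bits_per_weight / 8

# Memory requirements scale with TOTAL, like a ~47B dense model...
print(round(weight_gb(TOTAL_PARAMS_B, 4.5), 1))   # 26.3
# ...but per-token compute scales with ACTIVE, like a ~13B dense model.
print(round(weight_gb(ACTIVE_PARAMS_B, 4.5), 1))  # 7.3
```

So it is neither "a 13B" nor "equivalent to a 13B in memory": you pay dense-47B memory for roughly dense-13B speed.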
lol I'm retarded, I did not have instruct mode enabled in Sillytavern when using instruct models for RP. Give me the award for dumbest anon here, no one else can challenge me.
>>101599909>encouraging encouringAh, I see now, maybe I should go to sleep indeed, oh well.
>>101599937Filtered fimfiction. Only popular fics with 95%+ approval rating, and anthro shit removed. Next I'll add some wiki / lore stuff to it. Maybe some official books.
>>101599942Obviously, I don't bother thinking about poorfags who need to use ram at all.
>>101599875That was my conclusion as well. Plain nemo instruct seems to be the better option so far.
>>101599962It is fucking horses you degen.
How is a 512-rank lora comparable to a finetune in a 70B model?
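For scale on the 512-rank question, a back-of-the-envelope count of what such a LoRA actually trains. The dimensions are assumptions (Llama-70B-like hidden size and layer count), all four attention projections are treated as square, and GQA's smaller k/v projections and any MLP targets are ignored:

```python
# Trainable params of a rank-512 LoRA over the four attention projections
# of a hypothetical 70B-class model (hidden 8192, 80 layers).
HIDDEN, LAYERS, RANK, N_PROJ = 8192, 80, 512, 4

def lora_params(hidden, layers, rank, n_proj):
    # each adapted matrix adds two low-rank factors:
    # A (rank x d_in) and B (d_out x rank)
    return layers * n_proj * (rank * hidden + hidden * rank)

adapter = lora_params(HIDDEN, LAYERS, RANK, N_PROJ)
print(round(adapter / 1e9, 2))         # 2.68  (billions of trainable params)
print(round(adapter / 70e9 * 100, 1))  # 3.8   (% of the full model)
```

At rank 512 the adapter is already billions of parameters, which is why high-rank LoRAs start to approach a full finetune in effect while still training only a few percent of the weights.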
>>101599976Tried Undi's?https://huggingface.co/NeverSleep/Lumimaid-v0.2-12B
>>101599999GO TO SLEEP BELGIAN
>>101599657Apparently you are in a worse place if they cost 3k there.
>>101599528lol americans do suffer. Retards in my country sell them for 600€ and most have no idea what they have so you can bargain down to 550€ or 500€
>>101599999>digitsUNDI WONI kneel
>>101599627Then why is there nothing better short of going to 70b+? I don't want retarded.
>>101600009Nah I'd rather my shithole than canada, by far.
>>101600036That is the only model where "give it a try yourself" is actually applicable. It is hard to put it in words but you will get it in your first rp. It is basically an idiot savant.
>>101600051It's not that bad here, I was able to get 4 3090s. As for what's going on outside my room, I don't care which country I'm in.
>>101599976Try this one?https://huggingface.co/BeaverAI/NeMoistral-12B-v1a-GGUF/tree/main
>>101600074What is the moistness meme, I don't get it.
>>101600096drummer is retarded
>>101600104hi sao
>>101600107hi undi
why the fuck does everyone use instruct models for RP if the base model is always better at it
I thought Mistral Nemo was supported in Koboldcpp now? I get >llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q.weight' has wrong shape; expected 5120, 5120, got 5120, 4096, 1, 1
>>101600136A base model shouldn't respond well to long multi-turn interaction since it's not trained for it.
>>101600133i wonned earlier did you see kek?
>>101600136I tried the base nemo, it was a mess and all over the place.
>>101600137are you using the last version?
>>101600136because base is fucking retarded and gives as much importance to the system prompt as i do to paying my taxes
>>101600163 koboldcpp-1.65
>>101600136because it's not
>>101600165>base>system promptanone...
>>101600163Yes, 1.71. I converted base Nemo with https://huggingface.co/spaces/ggml-org/gguf-my-repo so not sure if it's some fuckery related to that.
>>101597911what question are you using?
Am I retarded or why the FUCK does ST not have something as basic as a "save as"/"save copy as" option? I don't give a shit about chatting and use it exclusively for text adventures, so I like to load old stories sometimes and "branch off" from them by removing some of the more recent content and continuing off a previous state. But ST WILL NOT let me save those branches as new chats, it just overwrites my old ones.>inb4 checkpoints: Checkpoints only seem to work for the current message and only have 1 slot, e.g. you can make a checkpoint for message #6 or #7 but it still has a "parent chat" and it won't let you make multiple checkpoints if they end at the same message "number". Backing up/renaming the files manually is NOT a valid alternative.
>>101600165>>101600159>>101600154So I'm getting very conflicting answers here since higher up in the thread you have like 10 people shilling for nemo base being better at RP. I guess I have to compare for myself to be sure.
>>101600193see>>101559351
Give me the est erp 13B model. Now.
>>101600193That's what I did, I tried it myself and didn't get good results. Maybe it'd go better if someone posted settings. Someone told me to just use alpaca presets so that's what I did, with the 0.3 temp and other stuff neutral.
>>101600192why don't you just branch again from the branch?
>>101600209>13B>https://huggingface.co/Undi95/Utopia-13B
>>101600209sorry we are out of stock, please come again later
>>101600209I meant "best". Sorry, I'm holding a knife with my beak
>>101600193anyone who recommends that you use the base model is trolling or retarded
So if I just want to CPUmaxx, what's the best old Dell to do it, now that the T7910 hit the normiesphere and skyrocketed in price?
>>101600209est erp erd emo eon eck
>>101600192Try the timelines extension. Click on nodes to branch.
>>101600218Okay, downloading TheBloke/UtopiaXL-13B-GGUF as we speak
>>101600231I think this doesn't work with llamacpp yet, but I will download it later
>>101599987NTA but yes, fucking horses is one of the main FiMFiction themes.
>>101600238NTA, but that's really handy, gonna give it a go.
>>101600216Can't branch off from checkpoints since they seem to be considered a separate type of chat with a parent attached, and any attempt to make a checkpoint ("branch") of a checkpoint will just overwrite the other checkpoints for the parent.Even KoboldAI had a basic chat management system with a "save as" implemented, this is just ridiculous.>>101600238Oh, that looks pretty nice, I'll check that out. Thanks.
>>101600225Gigabyte MZ73-LM0
>>101597933Mythomax being recommended to new people as a good model
>>101597933StableLM-7B
>>101600301>$5000T-Thanks...I'll just take that money and buy the 3090s, actually...
>>101600319It is good tho
>>101600356*wink wink*
>>101600074>>101600104>>101600107>>101600133
>>101600218OK I fell for a meme, didn't I? This seems to be extremely brain damaged>>101598269kek
>>101600383>https://huggingface.co/matchaaaaa/Honey-Yuzu-13B>A bit of Chunky-Lemon-Cookie-11B here for its great flavor, with a dash of WestLake-7B-v2 there to add some depth.
>>101600405>WestLake-7B-v2penn-jillette-garbage.jpg
>>101600383use mistral nemo
>>101600405
>>101600445I wonder how incestmergers are handling nemo, now that their talents are completely unneeded?
>>101600238gotta love what this thing did with my mess of chats kek
>>101600467suddenly i feel a little less retarded today.
>>101600469>2 branches converge again.What the fuck. Is free will an illusion?
>>101600467lmao, lol even
>>101600497>>101600479the meme that keeps on memeing even after all the safeties put in place for him
>>101599920https://huggingface.co/Ada321/NemoPony
Mistral formatting. 0.15 or so Min P seems to completely eliminate anatomical mix-ups in more complicated scenarios. Remember that it is the base model.
>>101600585>base model.doa
Is there any other API frontend that allows dynamic model loading (unload when not used, load on API call) besides ollama? Ooba added --idle-timeout, but you can't set a default model, you have to fully load one on startup, and the reload doesn't even work with the OAI API.
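For reference, the unload-on-idle behavior being asked about boils down to a simple pattern. A minimal sketch with dummy load/unload functions standing in for actual model loading; this is not ollama's or ooba's actual implementation:

```python
import threading
import time

class IdleUnloader:
    """Loads a resource on first use and unloads it after an idle timeout."""

    def __init__(self, load_fn, unload_fn, idle_seconds):
        self.load_fn = load_fn
        self.unload_fn = unload_fn
        self.idle_seconds = idle_seconds
        self.resource = None
        self._timer = None
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            if self.resource is None:
                self.resource = self.load_fn()   # lazy load on first request
            if self._timer is not None:
                self._timer.cancel()             # any request resets the idle clock
            self._timer = threading.Timer(self.idle_seconds, self._unload)
            self._timer.daemon = True
            self._timer.start()
            return self.resource

    def _unload(self):
        with self._lock:
            if self.resource is not None:
                self.unload_fn(self.resource)    # free VRAM/RAM here
                self.resource = None

loader = IdleUnloader(load_fn=lambda: "model weights",
                      unload_fn=lambda r: None,
                      idle_seconds=0.2)
print(loader.get() is not None)  # True: loaded on demand
time.sleep(0.5)
print(loader.resource is None)   # True: unloaded after idle timeout
```

In a real server the load_fn would spawn or attach to an inference backend per requested model name, which is essentially what ollama's keep_alive does.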
>>101600601Its purpose is RP / creative writing. For assistant shit look elsewhere. Though I could always merge it back into instruct. Maybe later.
>>101596616This is the second in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of Qwen1.5 32B.
https://huggingface.co/anthracite-org/magnum-32b-v1
https://huggingface.co/anthracite-org/magnum-32b-v1-GGUF
>>101600438Okay, this is actually really good for 13B. Like, it's surprisingly good, holy shit.
>>101600623rock hard
>>101600623>top of Qwen1.5 32B.great best model!
>>101600624try the Magnum 12b finetune as well
>>101600623If I ran this at a low quant (minimum 3_m) would it AT LEAST be better than nemo magnum?
>>101600671Yes, of course.
>>101600623>qwen scored the lowest on the Freedom Index (tm)
>>101600689Ok sure you have less freedom, but the prose is better.
>>101600685
>>101600689>>101600658It wasn't trained on top of the Instruct model, it's trained on top of base just like mini-magnum-12b
>>101600689This is trained on base, so maybe just maybe, it's not so awful.
>>101600623slop
>>101600218undi, undi...picture this undi : i enter a restaurant, it has okay quality meals, nothing disgusting but also nothing to have a culinary orgasm to. Now what in the FUCK told you that mixing at random mid-tier dishes would give you something better? Who fucking told you in your feverish mind that mixing spaghetti and tomato sauce with a grilled tenderloin and mushrooms with curry chicken and rice would somehow result in a sum greater than its parts? What the FUCK made you think that somehow the reason why base models underperform is that they don't have enough interference coming from other models, other models that have been trained differently. But it doesn't matter to you : you have no creativity, you have no purpose, you have no vision, all you are is a failed idea : you are literally and unironically defined by a flawed course of action. You CANNOT fucking improve mid-tier models by merging them and expect to get good shit. NO, it does NOT matter how many erp datasets you add to the mix thinking it will somehow improve the abysmal capabilities of retarded models being merged into an even more retarded pile of slopped garbage. NO, it does NOT matter how many fucking loras you think you can cram into it before it starts coughing up blood like a tortured prey that's being abused for entertainment only by its predator, wishing for the sweet sweet release of death. NO, it does NOT matter how much you shill these models here, how much you provide links and baseless suggestions like "oh i heard X_noroshitchronosmaidbitch_faggotbloodybastardbitch_limarpozzed_designatedshittingmerge_q_2_K_m_l_g_b_troon_jart.GGUF is good" and acting like you are giving sensible advice. You could not create, you could never figure out something new, but you wanted the fame, you wanted people to download your models, you wanted to be hailed as the solution, you wanted to offer a solution. The solution is to fucking kill yourself. You are the most failed human being in existence.
>>101600467AAAAAAAAAAAAAAI DOWNLOADED HIS QUANTSAAAAAAAAAAAAAAAAA
>>101600156Here is your prize.
>>101600689fuck off retard
>>101600744>picture this undi : i enter a restaurant, it has okay quality meals, nothing disgusting but also nothing to have a culinary orgasm to. Now what in the FUCK told you that mixing at random mid-tier dishes would give you something better? Who fucking told you in your feverish mind that mixing spaghetti and tomato sauce with a grilled tenderloin and mushrooms with curry chicken and rice would somehow result in a sum greater than its parts?>What the FUCK made you think that somehow the reason why base models underperform is that they don't have enough interference coming from other models, other models that have been trained differently.>>97223983>For the record, I completely and unequivocally support Undi and his creation of new model hybrids, and think that everyone who attacks him is mindbroken incel scum, who may or may not be employed by OpenAI to do so.>everyone who attacks him is mindbroken incel scum
actual modern art in post form. This needs to be posted in every thread right underneath the AI recap.
>>101600757sao not pro wtf
>>101600665Fuhuhu how is this even possible? Do these Frenchmen finetune for degenerate ERP or what? I kneel, anon. Many buckets will be filled to your health.
>>101600623I keep forgetting that chinese 30B models exist. I wonder why.
>>101600749There were warnings>>100195457
>>101600623Why 1.5? Are you dumb or what?
>>101600744which model?
>>101600796because there's no qwen2 32b retard
>>101600796no qwen 2 32b
>>101600623also what settings do i use for this?
>>101600803>>101600802Why have the Chinese failed us?
>>101600802>>101600803There was qwen2 moe. Remember that? I don't.
>>101600744BAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAASEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEED
>>101600807ChatML, and I used Universal Light. >>101600796No qwen2 32b >>101600789They are just plain kino.
>>101600820Magnum on it "today" remember that? I do!>Working on it already. Should have Qwen-2 7B, Qwen-2 47B, and Qwen-1.5 32B done by the end of the day, if the they pass internal tests.>https://huggingface.co/anthracite-org/magnum-72b-v1/discussions/2#66713bb492412fd46410d399
lazy mf
>>101600839>if the they pass internal tests.looks like they didn't
>>101600844so cold and loveless. My hand remains the second warmest thing my dick has touched (the first being my GPU)
>>101600834yeah this doesn't seem as creative as nemo, could be close to it, and its understanding of different languages is pretty bad. It ranges from capable to "why did it randomly insert a question mark or an exclamation point in the middle of that word?" Plus, being 1t/s, speed kills it. Back to nemo magnum for me.
I have this unhealthy urge right now to replicate my ex in chatbot form. I sense a really dark path opening up in front of me.And a part of me wants to convince me that the best way to get over it is to go through it and come out the other side.
>>101600890>1t/s12GB Vramlet spotted, opinion discarded
>>101600921Can't wait until people start doing that and start saying the chatbot ex is better.
>>101600938>>101600938>>101600938
>>101600949>►Official /lmg/ card: https://files.catbox.moe/ylb0hv.png
>>101600949>Official /lmg/ card: https://files.catbox.moe/ylb0hv.pngSure.
>>101600968>>101600972The old one is deprecated. Also samefag phoneposter.
>>101600623>how to use faipl-1.0
Put the following in the readme:
license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
>>101599650kill yourself little buddy
my bad >>101601141 was for >>101601093
How does Mistral Large's context work? It says 32k in the config.
>>101599875>everyoneI think it's just one shill following Sao's modus operandi.