/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101584411 & >>101578323

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101584411
--TTS improvements and output issues: >>101586575 >>101586607 >>101586659
--Mistral nemo configuration and settings advice: >>101585456 >>101585527 >>101585596 >>101585669 >>101585834 >>101585868 >>101585572 >>101586019
--Sillytavern single sentence replies issue: >>101587180 >>101587200 >>101587246 >>101587225 >>101587275 >>101587269 >>101587353 >>101587401 >>101587413
--Recommendation for voice data TTS finetuning: >>101585560 >>101586101 >>101586163 >>101587016 >>101588184
--Nemo generates quadrupeds well but writes differently than chatgpt: >>101587732
--Logical flaws in GPT-4 and Claude, Command R Plus gets it right: >>101584587 >>101584617
--GitHub repo for bulk downloading cards for ST: >>101585689 >>101586342
--Anon asks for Command-R Plus alternatives: >>101585536 >>101585556 >>101586438 >>101586483 >>101586596 >>101586657
--largestral iQ2_M outperforms Nemo in retarded quant, but is slower than 1t/s: >>101585893 >>101585921 >>101585940 >>101585998 >>101586017 >>101585939 >>101585985
--Nemo repetition issues and DRY sampler settings recommendations: >>101587028 >>101587049 >>101587511 >>101587535 >>101587576 >>101587545
--MoEs for roleplaying? Try it and find out: >>101584540
--Mistral Nemo sampler settings cause rambling output: >>101585928 >>101585955 >>101586019 >>101586038 >>101586062
--Where do ST or other UIs cull example dialogue in the context window?: >>101584746 >>101584777
--RULER repo measures effective context length, Llama3.1 performs well: >>101586297 >>101586352 >>101586384 >>101587005 >>101587027
--IQ4_XS vs Q3_K_M model quants and accuracy discussion: >>101585131 >>101585176 >>101585200 >>101585383 >>101585434 >>101588262
--IQ1_S performance and characteristics discussion: >>101588056 >>101588068 >>101588140 >>101588159 >>101588129
--Miku (free space): >>101587473 >>101588754 >>101588896

►Recent Highlight Posts from the Previous Thread: >>101584415
post (You)r largestral presets
>>101589142i got a little chub seeing my repeated (You)s in this AI generated recapthank you, botkind.
I am once again asking for mini-magnum presets.
>>101589160I didn't actually try it:>>>/vg/487568316
gib nemo presets
>>101589210
>>101589219
just use the ones i linked from that anon >>101585456
in fact fuck it ill re-copypaste it again
Here, since so many people seem to be using nemo with wrong formatting then complaining:
Mistral context template: https://files.catbox.moe/6yyt8d.json
Mistral instruct template: https://files.catbox.moe/rfj5l8.json
Mistral Sampler settings: https://files.catbox.moe/tbsgip.json
Should be night and day for people who have it set up wrong. Make sure whatever backend you are using has DRY sampling.
So, what was the point in MistralAI sabotaging their 8x22B with the shitty official -Instruct version and the botched release? Is this a psyop by their Partners at Microsoft trying to make MoE models look bad?
>>101589231Nemo doesn't use spaces around INST.
How're you guys feeling? As the dust settles down, it really feels like we've never been more back. Back to back releases, putting local about on par with cloud in performance/cost, and it's still not over, we're going to get more next week. We are not even 3 years into the timeline since the ChatGPT hype began.
>>101589262I dunno i've been using it with magnum just fine.
>>101589244Maybe they didn't have time, and without the release of 405B, they didn't feel the need to release their best stuff.
so mini-magnum is the best cooming model for vramlets now?
>>101589231>dry samplingDoes Koboldcpp have this (I don't see it) or am I fucked?
The people that are using 4 3090s... Where are they putting them?
Aah, 30t/s... This is the good life. Thank you Arthur.
>good model release>people saying low quants are fine, others saying there's night and day differences (probably broken quants)>prompt/template issues left and rightEvery time... I guess I'll wait 2MWs then...
>>101589289That or just Nemo-Instruct.
>>101589265You can see this as something good, we are on par with the big boys after all. But you can also see this as pure doom. The big boys barely moved ever since the release of GPT4.
>>101589307
I'm the night and day difference anon and I should clarify my quants are definitely not broken, I do them all myself
q4km was still *fine*. better than 70bs or CR+ still, just kind of dry, generic, a little less sovl, a little more awkward - but q5ks was sharp as a tack and much more coherent, pulled in more little details, had more of those creative little turns of phrase that let you know it's really paying attention
lower quants are still usable and the model will still be good, it's not like they're totally fucked or anything, it's just that the second I bumped up the quant it felt like the model gained a real human touch that was lacking before
>>101589307>people saying low quants are fine, others saying there's night and day differences (probably broken quants)more like>people saying low quants are fine (poorfags who can only run low quants at 3t/s), others saying there's night and day differences (people who can actually run these models properly)
>>101589370I test through online (mainly lmsys) to compare between quants I downloaded and their "intended" performance. Otherwise I would not be able to say with full confidence that a model like 8x22B cannot do trivia like DBRX can.
where's the dry sampler settings on ST?
>>101589356Did you use imatrix? The quants I'm using are all imatrix calibrated. Also they're the IQ format which I think were supposed to be more knowledge-retaining compared to K quants but I'm not certain.
Cohere gathered another $500m from investors. CR++ will be a beast of a model.
>>101589142good bot
>>101589491There, I am on staging branch.
>>101589536I really wonder how businesses are using these products to make money.
>>101589550speculative capital, one of these might be the next big break through
>>101589265>We are not even 3 years into the timeline since the ChatGPT hype began.>ChatGPT initial release: November 30, 2022; 19 months ago
nvidia-smi is not displaying all of my GPUs, but neofetch is. how do i fix this? i cant run any AI applications due to an error about cuda devices not being found
>>101589653
>>101589642It hasn't even been 2 years? Wtf
>>101589653Change your environment variables, I guess.
>>101589550If performance improvements plateau and you have ~5 years of scaffolding/agent development with no valid use cases, you might have a point. It's only been 19 months since ChatGPT released. Doomers just really want to see LLMs go the way of 3D TVs for some reason.
>>101589688how do i do that?
man, that mini magnum finetune of Nemo 12B is actually starting to replace claude for me, which is nuts considering claude has got to be at least 50 times bigger
>Claude 3.5 Sonnet and Llama 3 405B stomping GPT-4o>Llama 3 405B is way fucking cheaper than GPT-4o>It's only a matter of time before a cheaper and more capable model than GPT-4o-Mini comes out and kicks them out of the cost-performance pareto front entirelyIs he really just banking on Strawberry?
>>101589762
>It's only a matter of time before a cheaper and more capable model than GPT-4o-Mini comes out and kicks them out of the cost-performance pareto front entirely
Claude 3.5 Haiku probably
the original haiku beats the shit out of 3.5 turbo which was the sota small cheap model at the time
>>101589715Type "export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5"
>update tavern>even with all my settings and shit in order, the gen quality is fucked UP bad >wtf could possibly be->mfw i forgot to enable instruct mode
>>101589265
Do wonder how many OG AI Dungeon era people stuck around to witness this. I joined around the late GPT-2 times, now running IQ4 largestral. I don't see myself ever ending the ride.
>>101585978Same, Nemo might be retarded and repetitive at times, but it has some surprising creativity if you push it
>>101589907MOOOOOOOOOOOOOOOOODSSSSSSSSSSSSSS
>>101589907Ew
>>101589539thanks, i'll take a look
Here comes the pedo tranny thirdie again.
>>101589653did you enable 4g decoding in bios? also check dmesg for errors from nvidia driver.
>>101589872I used to be so happy with my loli imouto scenarios on AI Dungeon, I used to think running LLMs locally would be impossible because Pygmalion 6B used all my RAM and was as slow as a snail.Now, I'm here, running NeMo still enjoying my loli imouto scenarios, but without fear of suddenly being cucked.Feels good.
>>101589872I joined back in December 2019. I remember the humble days of Clover where the AI was too fucking stoned to even remember your character's name, much less what was happeningIt was absolute dogshit and now here we are
>>101589265Imagine Terry's reaction to the LLM tech, writing llama.cpp but in holyC to replace his text oracle perhaps.
>>101589290get sillytavern staging, and ((pull))>why does anyone use response tokens over 256? 512 is hellish
>>101589762
He just needs to reignite the AGI hype by adding smell to the multimodal model. Or maybe he can tease Sora again
jesus man, Nemo is INSANELY horny. My OC's are a bajllion times more frisky with Nemo than any other model i've ever used, On one end i'm overwhelmed, yet it manages to blend that spice with their personalities perfectly. It doesn't skip a beat.I almost want to say i wanna tone down the horny but, It's not like that breaks story flow or makes ERP more difficult or anything, I'm personally just not horny right now kek
>>101589971The realism of this surprised me for a bit until I realized the popsicle is constantly changing shape...
>>101590044arthur's personal coomtune strikes again
>>101590054Why did he do it?
>>101589231Is such a simple prompt best? No one uses those crazy ones they were using before?
>>101589265
We're so back. Zucc and Yann are false prophets, Silicon Valley are false prophets. Viva la France
>>101590073yeah its never really mattered that much, was always placebo.Which makes the Agent 47 crackhead prompt situation even funnier.
>>101589292Just get two a6000s or something if you want to be more compact.
>>101590109Interesting. So it's more down to the card itself and what examples you give it to emulate?
nemo is schizo...
>>101590170A bad card can break any model, doesn't matter. It's why W++ for example is memed on so hard, there's no exact science it's just basic logic of garbage in garbage out.
>>101589262So I should change that so there's no spaces on the INST ones? What about the \n after </s>?
>>101590172
You're using a temp too high
Mistral says in the model card that it likes low temperatures, they say 0.3
though I find up to 0.4-0.5 is usually fine
>>101590229NTA but I use simple sampling and for RP Nemo handles 0.7-0.8 just fine. Occasional schizo moments at 0.8. Starts getting really dry at 0.7 and lower. 0.3 is probably to prevent hallucination when using it for normie shit.
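For anyone unsure what the temperature knob being argued about does: a minimal sketch, assuming plain temperature sampling (divide the logits by the temperature, then softmax). The three logits are made up for illustration, and real backends stack other samplers on top of this.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens (illustration only).
logits = [2.0, 1.0, 0.0]
for t in (0.3, 0.8, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures pile probability onto the top token, which fits the "dry at 0.3, occasional schizo moments at 0.8" experience described above.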
I'm swiping this popular character card and the responses from mini-magnum and Claude Opus are identical. Claude walked so nemo could run.
anyone running an exl2 mistral quant? I get gibberish with a 4.0bpw turboderp quant.
I just downloaded 3 more IQ models below IQ2_M to see if any would be able to answer one of my challenging trivia questions as perfectly as IQ2_M did. Turns out IQ2_M is the cutoff for this particular question. IQ2_S gets the question partially right. About half of the points I would say. IQ2_XS and below basically just get it increasingly wrong, until IQ1_S which nearly went schizo-tier. Guess I'll just live with 1-2 t/s.
>>101590287
3.5bpw is working perfectly fine even at 4-bit cache.
>>101585837do two gpus work faster than or slower than a single one if you can fit it in?does Vllm split by row or by column? does it do tensor parallel? does nvlink in 3090 help by a lot? does the performance of 2 gpus differ much from 4? BTW, did you try cpu offloading in Vllm?
>>101590287yeah, turbo's 3.5bpw + 4-bit cache is running fine for me on ooba.i don't know if it's necessary, but i updated transformers from source, like the mistral-large readme said.
>>101590329It's 2024. Why is VRAM still hard to obtain? It's literally just soldering more transistors into your chip. Why? Now you have people running two servers in parallel just to serve a model.
>>101590109How do you tell it to not act for the user then? I always have that issue.
>>101590383something specific causes that, i forget what, i started getting it tonight actually.someone will chime in to inform us kek
>>101590383using>write {{char}}'s next replyin the sys prompt usually fixes this for me
so how much money do I have to spend to run 405b at home?
>>101590319Largestral? Does 3.5bpw fit in 48GB vram? How much context?
>>101590374simple answer>greedy Nvidia encrypts vbios
>>101589265(((Openai))) is $5B in red this year>kek
>>101590419Just run largestral instead. Better for most users purposes. 3x 3090s+
Ok I prove mini-magnum-12b the finetune of nemo with exl2 8bpw, but as some time ago, with exllama my nemo is broken, don't follow the template of silly tavern, write a lot of text fulled with nonsense. I'll prove in llama.cpp later. Some advise?
I'm using the settings of this anon >>101585456
>>101589136Thread Theme:https://www.youtube.com/watch?v=7yJRsFFRoQYDon't mind me, just a stranger blowing through this town...
>>101590536God. I hope you don't write like that to the poor llm. Are you sure you're using the proper template? Have you updated ST and exl2 since the last time you tried?
>>101590319>>101590346thanks. it seems like something with my samplers broke it. I neutralized the samplers in sillytavern and it started working.
why are some people here using small quants of a 12B modeleven if your GPU is only 8GB you can run Q6 at a very good speed with some offloading
>>101590531>3x 3090s+I've only built one PC in the past, and don't know of any standard motherboards that support that many GPU's. My first thought was something like picrel, basically a mining rig. Without NVlink its gonna be pretty bad, as far as I understand. How did you, or anybody you know, do it?
>>101590711Thats basically the idea.https://www.amazon.com/Kingwin-Professional-Cryptocurrency-Convection-Performance/dp/B07H44XZPW/ref=sr_1_1?sr=8-1
>>101590711open air build like a mining "case", riser cables, any motherboard with 4 pcie slots, does not have to be x16 x8 or whatever. Even x1 is enough. Just get 4 of them.
>>101590576Yes I did a upgrade a moment ago. I have to set a value in the alpha?value?
>>101590576>Are you sure you're using the proper template?I'm using the one which was shared in the last thread.
>>101590711This guy did one with 7x4090s. You can see what his concerns were. He goes pretty in-depth. https://www.mov-axbx.com/wopr/wopr_concept.html
>>101590720
>>101590754
I just had an idea and I'm sure somebody else had the idea in the past as well. So for dense models running across multiple GPUs without NVlink the performance gets worse and worse the more cards you add because they gotta wait for each other to finish their task to go and compute the next hidden layer state. But what if, you take a MOE model, for example DeepSeekV2 236B, and split the different smaller experts across the gpus, so that they don't have to exchange information. Is this thinking flawed?
>>101590536Enable "Add BOS Token" in ST
>>101590774Thats not how moes work.
>>101590781but how do they work then.
>And finally, we have the Arch Linux package updates. Oh boy, I can barely contain my excitement! You have a whopping 106 packages begging to be updated. I mean, who doesn't love a good update cycle? It's like playing a game of "spot the broken dependency"! Good luck with that.i love when it sasses me
>>101590786 (me)
>Mixtral is a sparse mixture-of-experts network. It is a decoder-only model where the feedforward block picks from a set of 8 distinct groups of parameters. At every layer, for every token, a router network chooses two of these groups (the “experts”) to process the token and combine their output additively. This technique increases the number of parameters of a model while controlling cost and latency, as the model only uses a fraction of the total set of parameters per token.
I don't see how my thinking is flawed, someone educate me. just have 2 parameter groups on each gpu and the supervisor on the last one.
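A toy sketch of the routing that quote describes, assuming the usual top-2 scheme: the router scores all experts, keeps the best two, and softmaxes just those two scores. The "experts" here are hypothetical stand-ins that only scale the input, not real feedforward blocks.

```python
import math

def top2_route(router_logits):
    """Pick the two highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_layer(x, router_logits, experts):
    """Combine the two selected experts' outputs additively, weighted by the router."""
    out = [0.0] * len(x)
    for idx, weight in top2_route(router_logits):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out

# Eight toy experts (hypothetical): expert k just scales the input by (k + 1).
experts = [lambda v, k=k: [(k + 1) * a for a in v] for k in range(8)]

# The router scores change per token and per layer, so the chosen pair does too.
print(top2_route([0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]))
print(moe_layer([1.0, 2.0], [0.0, 0.0, 5.0, 0.0, 0.0, 5.0, 0.0, 0.0], experts))
```

The selection is redone for every token at every layer, and the two outputs are summed into one vector. So wherever the experts live, the token's activations still have to travel to the chosen experts and the results have to be gathered again.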
>>101590711If you wanna stay on standard architecture and don't wanna invest in workstation CPU's then the MSI MEG X570 Godlike Mainboard is a great choice with 4 slots for GPU's. I wanted to build a bigger PC with 4 3090 cards but now I rather wait for the 5090 announcement next year.
So is there a reason why Llama 3.1 that I downloaded from the official repository doesn't come with any config.json, and every single piece of documentation I've found that can supposedly convert them to HF format doesn't work?
>>101590804llamacpp anon we need you, hes wrong and I know it but can't explain why.
>>101590732
>>101590745
If i'm reading the setup files correctly (https://files.catbox.moe/tbsgip.json specifically):
It sets the temperature to 1, when the mistral guys recommended 0.3 or 0.4. Change it to 0.3 and try again.
The second thing is repetition penalty. Disable it by setting it to 1.
If that makes it work better, then play around with the temperature. If it still doesn't work as you expect, post a screenshot of the output to see what you're talking about. "write a lot of text fulled with nonsense" is not that useful.
>>101590819What did you download? The original repo in meta's hf all have config.json files.
>>101590307There was some post-quant tuning that enhances the quality of iq2 quants, but I dont remember where that was. Prolly the only way to run huge llms on 24gb with no major loss,
>>101590819By official you mean the repos on this account https://huggingface.co/meta-llama or a different site where they host their models? The config.json file definitely are in the huggingface repos. You should download them from there.
>>101590711
how much T/S do yall get with 4x 3090's on largestral at what quant
>>101590774only if you split by column and not by row. if you split horizontally it doesn't slow down since that's tensor parallel so you run in parallel . but you need good interconnection.
I'm new to using SillyTavern. Is there a way to prompt the kind of response the AI generates to guide it in a certain direction without having to just rewrite the response entirely by hand? Like if I give it an open ended question and I want all its responses to be either positive or negative.
>>101590939Try including something like "Only answer positively/negatively" In the author's notes. Depth = 0 if you want it constantly reminded of it for every message.
>>101590946Thanks, I'll give that a try and see if it helps.
>>101590939I simply use group chat for a char and my OC, while posing as a narrator in user responses. Much more convenient from chat editing perspective than having author note open. Narrator just gives out barks for both characters, and then I mute narrator barks so that it doesn't try to act as narrator itself.
>>101590778>Add BOS TokenIs enabled.>>101590843>sets the temperature to 0.3 >Disable rep penI did this too, I prove setting the temp in less values and more than 1.0 values and this is the result.
>>101590983That's a great way to utilize the group chat. Makes me wonder what other things can be done with it.
Where can I find/which gguf version of mini-magnum-12b should I use?
>>101591073https://huggingface.co/starble-dev/mini-magnum-12b-v1.1-GGUF
>>101591073the one that fits
>>101591140Thanks anon.
>prema trying to do team orders in fshitter
>>101590410Doesn't seem to help, sadly.
>>101589231Ok so I got koboldcpp, staging version of sillytavern, imported these three and made my persona a basic [{{user}} is a guy that has this color hair, this color eyes and this color skin]Is there anything else I need to do to make this work? I got some random cards off chub but I dunno what makes a card good or retarded
Can using smaller context size result in model retardation (within that context) or is it enough that I match the koboldcpp and sillytavern setting? I don't have the VRAM to run full 128k of nemo.
>>101591291no, the opposite, using bigger always degrade at some point
>>101584777>>101584746Any ideas on where ED gets culled?
>>101591301Okay, thanks. So should I go for smaller context in favor of higher quants as well? Currently using Q6_K_L with 8k but I guess it may be worth it to go lower quant.
>>101591314
8k is generally good with most recent models, above is when it gets iffy especially above 32k so if you're enjoying what you have just don't break stuff for no reason
>ZeroWw 'SILLY' version. The original model has been quantized (fq8 version) and a percentage of its tensors have been modified adding some noise.
>Full colab: https://colab.research.google.com/drive/1a7seagBzu5l3k3FL4SFk0YJocl7nsDJw?usp=sharing
>Fast colab: https://colab.research.google.com/drive/1SDD7ox21di_82Y9v68AUoy0PhkxwBVvN?usp=sharing
>Original reddit post: https://www.reddit.com/r/LocalLLaMA/comments/1ec0s8p/i_made_a_silly_test/
>I created a program to randomize the weights of a model. The program has 2 parameters: the percentage of weights to modify and the percentage of the original value to randomly apply to each weight.
>At the end I check the resulting GGUF file for binary differences. In this example I set to modify 100% of the weights of Mistral 7b Instruct v0.3 by a maximum of 15% deviation.
>Since the deviation is calculated on the F32 weights, when quantized to Q8_0 this changes. So, in the end I got a file that compared to the original has:
>Bytes Difference percentage: 73.04%
>Average value divergence: 2.98%
>The cool thing is that chatting with the model I see no apparent difference and the model still works nicely as the original.
>Since I am running everything on CPU, I could not run perplexity scores or anything computing intensive.
>As a small test, I asked the model a few questions (like the history of the roman empire) and then fact check its answer using a big model. No errors were detected.
>Update: all procedure tested and created on COLAB.
>https://huggingface.co/NeverSleep/Lumimaid-v0.2-8B/discussions/4#66a47badee3de8c56e1e0872
Oh boy here we go again...
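The quoted description never shows the program itself, so this is only a guess at the idea: a sketch with the same two knobs (fraction of weights touched, maximum relative deviation), assuming uniform noise and a flat list of floats instead of real tensors. The function name and the seed parameter are made up.

```python
import random

def perturb_weights(weights, fraction, max_deviation, seed=0):
    """Scale a random `fraction` of the weights by up to +/- `max_deviation`.

    Hypothetical reconstruction: the uniform distribution and the per-weight
    coin flip are assumptions, not taken from the quoted post.
    """
    rng = random.Random(seed)
    out = []
    for w in weights:
        if rng.random() < fraction:
            w = w * (1.0 + rng.uniform(-max_deviation, max_deviation))
        out.append(w)
    return out

weights = [0.8, -1.2, 0.05, 3.0]
noisy = perturb_weights(weights, fraction=1.0, max_deviation=0.15)
mean_change = sum(abs(n - w) / abs(w) for n, w in zip(noisy, weights)) / len(weights)
print(noisy, f"mean relative change: {mean_change:.2%}")
```

With uniform noise capped at 15%, the average relative change is well under 15%, so a low average divergence on its own is not surprising. It says nothing about output quality without a perplexity run or a benchmark.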
>>101590850>>101590878I downloaded it with the download.sh and the signed URL that was emailed to me by Meta.https://github.com/meta-llama/llama-models
I'm looking for cool instruction templates, anybody got one focused on the assistant directly creating an adventure experience for the user rather than playing the roll of a specific bot?
>>101591364could someone summarize this with their favorite model?
>>101591471basically add random noise for no reason and: "The cool thing is that chatting with the model I see no apparent difference and the model still works nicely as the original."
>>101591471weights actually don't matterjust scramble them and you're fine, which was expected considering that frankenmerges also still output readable content despite having unrelated layers stitched togetherthe 'consciousness' of a model is unrelated to this sort of thing
>>101590987>>101591140I proved Two models in both gguf and exl2 And still has this level of retardation. I just thing I'll return to Gemma 2.
New models that works well without COT meme magic yet?
so how big is a leap of quality between 8b smut and 405b smut
>nemo keeps writing for meHELP
>>101589872i member talktotransformer being my first interaction with textual AI, then we got aidungeon and its retarded ceo, then i found out about piggy and the rest is history
nemo shill, i need your help. since nemo wasn't trained to have a system prompt at the top where should i put my 20 lines of meticulously crafted roleplay rules?
been out of the loop for quite some timewhat's currently a good model for a 16GB VRAM card?
>>101591883If you're in silly, either Assistant last message prefix or author's note. But expect possible degradation in both ways. I guess the only way to make it correctly is to add it before your every message, and then edit it out after each reply, which is absolute autism.
I just tried Mistral-Large-Instruct-2407.IQ1_S.gguf from legraphista, but like other very low-precision quants it has issues with using the right tokens sometimes. I think this problem could be solved if the embed tensor was quantized to something better than Q2_K precision. Then, the model might still be dumb compared to the original due to compressed knowledge, but at least pick the right embeddings.
>>101591941>either Assistant last message prefix or author's notety, i'll try that
>>101591968We know Robert, we know, keep fighting the good fight!https://huggingface.co/ZeroWw>LLMs optimization (model quantization and back-end optimizations) so that LLMs can run on computers of people with both kidneys.https://huggingface.co/RobertSinclair
>>101589231>>101585456Any tips for making the bot not write as me? Also I assume you mean this setting, right?It definitely feels very rambly at 1024 reply tokens but that's probably because my persona is so barebones. Going down to 350 seemed better, although I have to reset my settings and test more because I got a lot of situations where the bot would end posts with a bunch of newlines or symbol spam
>Based on comments from @mradermacher...>His quant are okay if he do it before me, you can use them, he's thrusty.
>>101591305I tried in Faraday (Backyard) and it seems that ED is being cut down from the beginning rather than the end, which goes in line with how regular message history is culled.I put lore facts in example dialogue and asked about things from the start and end section, the bot failed to answer properly about the former.
>>101592015
1000 tokens is an incredibly long reply regardless of which model you're using
if you're wanting to simulate a conversation I don't understand why you'd even give the model the option of writing that much
>>101592040Thrusting into the popcorn
>>101592010Robert Sinclair has a point. BitNet models are also configured like that (see picrel).https://arxiv.org/pdf/2310.11453
>>101592087So he has a point because a meme supports what he says? If anything that goes against him even more. Anyways the new gimmick is random noise now, get with the times!>>101591364
>>101590745
OK, after some tests, I think in my case the problem is indeed the template. I was using the same template from the thread, the one also marked in the recap, so that's not a mistake on my side. What's weirder is that with the template I use for Gemma 2, the bot is suddenly at least able to follow the text format. Sadly, it still feels a bit unstable: some cards work better with a temperature of 1 and others with 0.4. Is this really the state of Nemo?
>>101592100There's no claim there that noise improves model outputs, although some time back there have been suggestions that adding noise to embeddings during training may reduce overfitting: https://arxiv.org/abs/2310.05914
Where will AI be in 10 years?
I wonder if those preferring Gemma all happen to be ESL and perhaps Gemma deciphers ESL better as a result of diversity training, just a thought.
/aicg/bro here. Quick question. Who is the "Gojo" of /lmg/? (shitpost bogeyman schizo)
>>101592161petra/petrus
>>101592163thanks i just was bored in our general since we're in a bad doom, ill check the archives. have fun with your chatboots
>>101592161Isn't your entire general like that?
>>101592153If your billion dollar ai can't decipher ESL then what's the point?
Anon where KCPP guessed too many layers, can you share your GPU VRAM, model(s) (including image gen models if used), blasbatchsize and the amount of context you were trying to use?
It has multiple things in place to prevent that from happening, so if it still under guessed on your system I want to be able to reproduce the setup. Because that would imply you somehow broke through the entire 1.5GB buffer zone we put in place as a safeguard.
Either you have a ton of background stuff running or you're using a model that is way more VRAM hungry in unexpected ways than the stuff I tested with.
To clarify, in the current version the auto layer guessing is only accurate for default settings. If you modify for example blasbatchsize, that is not yet accounted for.
Hi all, Drummer here...>>101592180HENKYYYY PENGKYYY!!!
>>101592180What are you doing here? You're too innocent for this website! :koboldpeek:
>>101592180Kekaroo, your dox got posted earlier faggot
my hero just spoke in /lmg/. AMA.
>>101591786I can't make it stop either on one specific card I'm doing where it's an adventure/story rather than a one-on-one chat. IDK if this makes it harder but it probably doesn't make it easier. I put in the system prompt to write for every character except {{user}} and put in the jailbreak / depth 0 author's note never to speak for {{user}}. May have helped but didn't totally solve it. Possibly also made more difficult because I am simultaneously trying to make it stop ending replies by asking what my next action is, which I was able to reduce significantly but not eliminate. Partway through I tried cranking the temperature way down and that absolutely didn't fix the issue. Maybe if I tried again with my prompts setup better it would. Nothing solved it completely but right now the level of swiping / editing is low enough that I'm okay with things.
>>101592274
>I can't make it stop either on one specific card I'm doing where it's an adventure/story rather than a one-on-one chat.
Which isn't to say I *have* been able to get it to stop on other cards, just that I've only been working on this one.
>>101592180Keep up the great work, Henky! Tell your assistant, Concedo, he did a good job too. :koboldlaugh:
>>101592247Ooooh, someone's being an edgy boy. :koboldpeek:You think you're so tough spouting that *f-word* behind the screen, huh?
>>101592153I sometimes think if I was ESL I'd like LLMs a lot more. Like if I'm reading a foreign language I can't tell if the writing is good or bad. I can just (at most) tell what information it says. And if the same expressions get used over and over I'm not annoyed, I'm pleased to see familiar expressions.
>>101592040Suddenly Lumimaid makes a lot more sense.
>>101592323I am an ESL. That is not how it works.
>>101591917An 8.0bpw exl2 of Mistral NeMo 12B with cache_mode q8 and 32000 tokens of context fits in 15.2 GB of VRAM.
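For anyone wanting to sanity-check a figure like that before downloading, here is a rough back-of-the-envelope sketch. The parameter count (about 12.2B) and the architecture numbers (40 layers, 8 KV heads, head dimension 128) are assumptions about NeMo 12B that are not stated in the post above, and the q8 cache is treated as exactly 1 byte per element, ignoring any scale overhead.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                n_tokens: int, bytes_per_element: float) -> float:
    """Size of the KV cache in GB: one K and one V vector per layer per token."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_tokens
    return elements * bytes_per_element / 1e9


# Assumed NeMo 12B figures (hypothetical inputs, not taken from the post).
weights = weights_gb(12.2e9, 8.0)
cache = kv_cache_gb(n_layers=40, n_kv_heads=8, head_dim=128,
                    n_tokens=32000, bytes_per_element=1.0)
print(f"weights {weights:.1f} GB + cache {cache:.1f} GB = {weights + cache:.1f} GB")
```

Under those assumptions the estimate lands around 14.8 GB, so the reported 15.2 GB looks plausible once activations and framework buffers are added on top.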
>>101589160t=1.0
Is it better to have 2x 3090 or 1x 3090 + 2x P40 if I'm trying to run 70b models faster?
>>1015924752x 90
>>1015924753x 3090 if you can but 4x 3090 would be even better
>>101592040I mean I knew he was belgian, but didn't know it was that bad.
>>101592348Don't lie I bet it's even stronger for u foreign cunts because your languages have like 1/5 as many words as English. Repetition is a way of life for you, while for English speakers developing a sense for how often to re-use the same word is a major early part of developing good writing style. Small children are very repetitive, older ones go too far trying to add variety, then they tone it down and get better. (Or sometimes not. There are published authors who go to unintentionally humorous lengths to avoid re-using basic words like "said.")
>>101592040kek
>>101592546>doesn't speak any foreign language>don't lie to me, i bet-ack
>>101592338>>101592506Now I see why he never tests his own shit. Even if it was broken how could he tell?
>>101592564Knew you were the kek poster.
>>101592546
>>101589653 I have never run into this problem myself but I suspect it's a driver issue. >>101590419 With a few hundred bucks you can buy 512 GiB RAM which is enough to run it at 8.5 bits per weight. But then you can expect something like 0.2-0.5 t/s. >>101590774 >>101590781 >>101590786 >>101590804 The problem with the proposed parallelization scheme is the synchronization overhead. You need to exchange (part of) the activations between GPUs and write back the results which introduces non-negligible latency, especially on fast GPUs without NVLink. This is not much different from what --split-mode row already does and there are considerable performance issues (though the multi GPU optimization is also poor). >But what if, you take a MOE model, for example DeepSeekV2 236B, and split the different smaller experts across the gpus, so that they don't have to exchange information. Is this thinking flawed? Which experts are selected is effectively random and determined by the routing layer if I remember correctly. But in order to do that the results have to first be collected on a single GPU. So you're not really saving any I/O. >>101592475 2x 3090 if your target quant fits into 48 GiB VRAM, 1x 3090 + 2x P40 otherwise.
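The RAM numbers above can be roughly reproduced with a memory-bandwidth estimate. This is only a sketch under assumptions: it takes the model in question to be the 405B one, treats it as dense (every weight read once per generated token), and uses 100 and 200 GB/s as hypothetical system RAM bandwidths. None of those inputs come from the post itself.

```python
GIB = 2**30


def model_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the quantized weights."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on generation speed if each token must stream all weights once."""
    return bandwidth_bytes_per_s / model_bytes


# Assumed: 405B parameters at 8.5 bits per weight.
size = model_size_bytes(405e9, 8.5)
print(f"weights: {size / GIB:.0f} GiB (fits in 512 GiB: {size < 512 * GIB})")

# Assumed RAM bandwidths in GB/s; real systems vary a lot.
for bandwidth_gb_s in (100, 200):
    rate = max_tokens_per_second(size, bandwidth_gb_s * 1e9)
    print(f"{bandwidth_gb_s} GB/s -> at most {rate:.2f} t/s")
```

With those inputs the weights come to roughly 400 GiB and the bound falls in the same 0.2-0.5 t/s range quoted above. The same size function can be used to check whether a target quant fits into 48 GiB.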
Mistral Large 2 is now my main model for cooms.No more mischevious glints, she says in a husky voice, a smirk playing on her lips, eyes sparkling with mischief. There's a playful glint as she addresses the power dynamic, playfully smirking as she offers her ministrations. An audible pop and rivulets of—admit it, pet—the ball is in your court.It has none of that slop and even as a 48GB VRAMlet using a baby 2.75BPW exl2, it can fit 12k context @15t/s.
>>101592681lock em in a hot room and sell me the fumes
>>101592496Pretty much this. Although I'm starting to feel like a VRAMlet with 4.
>4x 3090s is now considered "VRAMlet">as if 1 wasn't pricey enoughno i will not dump retarded amounts of money onto a single-purpose machine i'd only use sparingly even if the models are appealing
>>101591941Couldn't it be put in context template?
>>101592681LL and 3L tag teaming S
>>101592871Also... isnt that the point of the "System same as user" Option in ST, for this exact purpose? So you can fill in the system prompt and it treats the system prompt as the user message as well?
>>101592870I mean people spend more money on dumber hobbies. It really depends on how far you want to go. I started out running 4-bit pygmalion 6B on a Ryzen 2400G with 8 gigs of RAM and no GPU before there was really any integration with anything so I was basically using the 'chat mode' in the console. Then someone introduced me to koboldcpp so I was running Llama 13B models on my gaming PC with a 1660 Super and 16 gigs of system ram. I didn't just up and drop 5 grand on building a server out of the blue. It was a gradual progression.
>>101592870The more you buy the more you save
https://github.com/ggerganov/llama.cpp/pull/8676 Llama 3.1 rope scaling finally merged
Llama.cpp master branch has been merged with the fix for L3.1's issues with context beyond 8192, should be working properly now. https://github.com/ggerganov/llama.cpp/commit/b5e95468b1676e1e5c9d80d1eeeb26f542a38f42 >>101592681 It's not brain damaged at 2.75 bpw?
>>101592904The more you buy the more seeing shivers down the spine hurts.
>>101592681Is it better than a 5bpw 70B? How much better? It's tempting to sell my 3060 and buy a second 3090>>101593061lmao so true
>>101589756>>101590284Calm down with the shilling.
My model ratings from recent tests for RP, run on 48gb vram: 1 - Mistral Large (Mistral-Large-Instruct-2407-123B-exl2, 3.0 quant). Just very good at natural language. 2 - Midnight miqu - it's a slopmerge on RP and does its job. 3 - Llama 3.1 (4.5 quant) - It's not designed for being a chatbot it seems clear, replies are accurate but very robotic. Beat Mistral large on knowledge checks and coding though. 4 - Nemo 12b, I don't know why this was even recommended to compete with the others. waste of time - commandr
>>101592161mikushitters and some guy named "petra"
I think here's the best place to ask about it but is there a way/program to make an LLM identify and tag several (thousand) images? doesn't have to be anything advanced, just tagging whatever it sees would already be a great help.
>>101593186yeah, Im pretty sure moondream 2 (small and good model) has a python script implementation, just make a loop and iterate over the folder you want to classify
>>101593186the ponyfucker said he did some LLaVA work feeding it boru tags and asking it to describe the image to get a caption.He is kinda a retarded schizo and it isn't clear that was a better way of training than just using booru tags though
>>101593206 https://huggingface.co/vikhyatk/moondream2 here's the repository, the script is there
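The "loop over the folder" part could look something like the sketch below. `tag_folder` and `describe` are hypothetical names made up for illustration: `describe` stands in for whatever model call you end up using (the moondream2 repository documents its own usage), and the choice of writing a `.txt` sidecar next to each image is just one possible convention.

```python
from pathlib import Path
from typing import Callable, Dict

# Extensions treated as images here; adjust to taste.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def tag_folder(folder: str, describe: Callable[[Path], str]) -> Dict[str, str]:
    """Call describe() on every image in folder and save the result as a .txt sidecar."""
    results = {}
    # sorted() materializes the listing first, so the sidecars written
    # during the loop are never picked up by the iteration itself.
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTS:
            continue
        caption = describe(path)  # placeholder for the actual model call
        path.with_suffix(".txt").write_text(caption, encoding="utf-8")
        results[path.name] = caption
    return results
```

Keeping the model call behind a plain function means you can test the loop with a dummy captioner first and only load the real model once the file handling works.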
>>101592986 No. The only error it makes is a misplaced punctuation mark once every 500 tokens or so, which is not much to complain about. >>101593085 Despite my limited experience, I would say yes. Before Largestral, I would use Llama 3 70B finetunes for coom (New Dawn, Euryale). They were good, but had too much slop. With Largestral, no more spine shivers or any other GPT/Claudeisms. It's like I cured my model of its autism.
>>101592964>>101592986Again some problem with llama.cpp tokenizer. Sane people should use transformers tokenizer.
>>101593268that literally has nothing to do with tokenization at all, it's about rope context scaling
>>101593153>waste of time - commandrStopped reading right there
>>101593292at the bottom of the message? Fucking retard
I still didn't find good settings for nemo. I don't like how moldable it is, or rather it is superfocused on context patterns instead of instructions. For example if you use different model (like llama-3) it would give you lengthy responses naturally (unless you tell it not to), no matter how long are your messages. Nemo however will mimic your responses and if you aren't putting much text in your messages, it won't do it as well.
>>101592383that's an extremely specific answer, thanks a ton
>>101593219>>101593206Thank you, I'll take a look into it.>>101593213A shame how people tend to gatekeep these small things, I don't really blame him though, it's his work I suppose.
>>101593303he's mistral nemo please understand, they put their system prompts at the bottom
>>101589265I remember in December 2022 doomers saying local gpt 3 (DaVinci) was “maybe 10 years away”. I always knew these things were bloated as fuck.
doomer here, i'm going to make a prediction and say that agi is maybe 100 years away. 1000 years for coomable agi that fits into 10gb vram.
>>101593153>Nemo 12b, I don't know why this was even recommendedBecause of the allure of huge context length that was previously out of reach for people without much VRAM.>to compete with the othersAssume people saying that were trolling or retarded.
>>101593374Summer Dragon still hasn't been surpassed though so...
>>101593392Back then 175B seemed impossibly huge. I can't believe I'm running models close to that size on a simple $3k rig at home now
Is it just me or does Llama.cpp take longer to compile than it did a few weeks/months ago?
OKey So.. the base Mistral-nemo model is much better on the larger context size; the difference in understanding is massive. What causes this?
>>101593463What are you saying? You're getting better results with base than instruct with large chat histories?
what does flash attention do?
>>101593547https://arxiv.org/abs/2205.14135
>>101593452 It does now take longer with CUDA, make sure you instruct the build system to run multiple jobs in parallel, for example with -j 8. >>101593547 It calculates a temporary matrix in small parts in fast but small memory instead of calculating and writing the entire matrix to large but slow memory. This requires more calculations, but on modern hardware the speed of calculations has been increasing much more than the speed of memory.
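To make the "small parts" idea concrete, here is a toy sketch in plain Python for a single query vector. It is not the actual GPU kernel (which also tiles over queries and keeps each tile in on-chip memory); it only shows the running-max/running-sum trick that lets the softmax be computed block by block without ever holding the full score row. The function names and the block size are made up for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_naive(q, keys, values):
    """Reference: compute every score first, then softmax, then the weighted sum."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * _dot(q, k) for k in keys]  # full score row held at once
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def attention_tiled(q, keys, values, block=4):
    """Same result, but only one block of scores exists at any time."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block):
        kb = keys[start:start + block]
        vb = values[start:start + block]
        scores = [scale * _dot(q, k) for k in kb]
        new_max = max(running_max, max(scores))
        # Extra work: rescale earlier partial results to the new maximum.
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(weights)
        acc = [a * correction + sum(w * v[j] for w, v in zip(weights, vb))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

The rescaling step is the "more calculations" part: it costs extra arithmetic but avoids writing the whole score matrix out to slow memory.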
>>101593513 Yeah. At larger contexts, instruct becomes dumb for me, skipping over events and being completely lost in the plot, while the base model does not seem to have the same problem.
>>101593452 It's super annoying, I used to rebuild it every day before using it, now I only do it every other week or if I need compatibility with a new model.
>>101593463You tested the base model? That's interesting.I suspect >>101399248.People's multiturn fine tuning data are constructed naively.
Largestral 2 is basically a non-dry and 10-15% smarter version of Wizard 2 8x22At this point, there is no scenario that i test for that doesn't work very well with the modelOutside of external tool use and multimodality, is there anything else that a new model can really give when it comes to RP?I don't think so, only speed.
>>101593677my brain looks like that (i use crack)
>>101593677What quants do you run of both models?
I'm still using C-R+. Nothing has changed.
>>101593699q4
>>101593690based expert roleplayer
Is it possible to use nemo 12b on koboldcpp? Docs say GGUF only, but has someone already converted it?
>>101592087 He has a point in that having those tensors at a higher precision than the rest of the model makes the output better, yes, but that's something that most (all?) quants already do. The whole meme began when he claimed that having those layers at full precision gave better results than having them at q8 or whatever, which was demonstrably false. His whole "testing" was all vibes based and non-reproducible.
>>101593836https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUFhttps://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUFhttps://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUFhttps://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUFhttps://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
>>101593865thx anon
>>101593836not really, you gotta either use a fork of koboldcpp or wait for the retard to implement the tekken token bs>>101593865nigger
>>101592180*cums on you*
>>101593939>you gotta either use a fork of koboldcpp or wait for the retard to implement the tekken token bs>2 days ago>https://github.com/LostRuins/koboldcpp/releases/tag/v1.71>Merged fixes and improvements from upstream, including Mistral Nemo support.You might be a little behind.I don't blaqme you, I've been using llama-server directly for months now, there's no reason to use kcpp really, so I get it.
>>101593939>not really, you gotta either use a fork of koboldcpp or wait for the retard to implement the tekken token bsare you mentally deficient?>Merged fixes and improvements from upstream, including Mistral Nemo support.https://github.com/LostRuins/koboldcpp/releases/tag/v1.71
>>101593677What's crazy about AI videos is that within the bizarre surrealistic nonsense each moment is still copacetic with the previous moment and the next moment. Truly nightmare fuel.
idc dont use koboldcpp
Just tested out 3.1 70B at IQ3_M (on latest llamacpp build). It's a bit faster than Largestral was at IQ2_M. Also does OK at the trivia question I threw at it, but it doesn't seem to be able to do the Castlevania question unlike full precision. Maybe if I go just a bit higher in quant.
>>101594001>I was just prentending to be tarded
>>101593986>there's no reason to use kcpp really, so I get it.Actually, just to correct myself, there is one reason.They still have support for multi-modal, I believe, whereas upstream nuked it pending a refactor.>>101594013How charitable to assume he was just pretending.
>>101593725Same but C-R
Nemo 12B is more coherent and "gets" more lewd stuff than gemma-2 27B. How many tokens was it trained on?
>>101594220That just means gemma is shit. And it is. Gaslighting ITT when it came out was phenomenal.
>>101593153>3 - Llama 3.1 (4.5 quant) - It's not designed for being a chatbotLearn to prompt, mikufag.
How is mistral large doing at big contexts (24-32k+)? Does it fall apart and get completely retarded like everything else?
>>101593440Off VRAM or cpu? I guess even with 3090s it could be done but I am really trying to avoid getting a server rack setup
Does anyone feel like Llama 3.1 (70B) has a more nuanced understanding of the chat than Larstral? Like I indirectly referenced something in the context and 70B just "got" what I was talking about. Meanwhile Larstral ignored it. This was during a normal RP though not ERP.
>>101593320 >Nemo however will mimic your responses and if you aren't putting much text in your messages, it won't do it as well. Try using/adapting this preset: Context: https://files.catbox.moe/6ae9ht.json Instruct: https://files.catbox.moe/2f13of.json
>>101594271sfw: llamaerp: mistralproblems: solved
>>101594300So is 70B really smarter than Larstral for regular things then? I wasn't sure if that was true given the benchmarks that put them about on par. But maybe those benchmarks didn't test long context understanding.
some guy talking to llama3 8b on twitter, offering it control of his macbook, no system prompt
>>101594271I have noticed the 3.1 llamas are pretty awesome for sfw RP, they're really smart models overall. shame about the sexo though
>>101594340>give me root accessspooky little bugger
nemo just buck broke me, and made me face my controllessness in school being bulliedbreddy gud llm
Alright here you guys go, by absolutely nobodies demand:Nala test using CofeAI FLM-Instruct(load-in-8bit transformers) Basically it's a minor upgrade from Pygmalion. But it's so hopelessly over-baked that any slight variance in prompt formatting versus what it's looking for will result in it either throwing an immediate eos at you or shitting out a training example word for word. To be fair its conceptual understanding seems where it should be for a model its size. It's also slop-free and refuses nothing. But it'll make you wonder if maybe slop is such a bad thing.
>Do you banned China region users from your repo? >they racists and admins this site too. Just deleted my messages. >https://huggingface.co/meta-llama/Meta-Llama-3.1-405B/discussions/15 >why is my request rejected? >Zhentao Chen >https://huggingface.co/meta-llama/Meta-Llama-3.1-405B/discussions/17
>>101594411>by absolutely nobodies demandYou can always assume there is demand from at least one anon.>But it'll make you wonder if maybe slop is such a bad thing.lmao.I might be the one person that doesn't mind slop as long as the model can genuinely keep up with the scenario and characters without errors.
>>101594428>Why was my request rejected?>Meta Llama 3 is available via HuggingFace globally, except in comprehensively sanctioned jurisdictions.>Why I am not able to access from China/Russia?>Meta Llama 3 is available via HuggingFace globally, except in comprehensively sanctioned jurisdictions.>Why is download blocked in China/Russia?>Meta Llama 3 is available via HuggingFace globally, except in comprehensively sanctioned jurisdictions.>https://huggingface.co/meta-llama/Meta-Llama-3.1-405B/discussions/16
Holy shit... I'm trying mistral large q2_k_s because anons were claiming the lower quants aren't retarded... and... its not retarded? How the fuck is this possible? Gona go for Q2_K_M next, its the biggest I can fit into 48 GB.Every other time I tried a quant below 4, it was completely braindamaged... so why isn't mistral large brain damaged at q2_K_S... I don't understand.
>>101589136
>>101594440 Like I said... It's weird. It sees what it should see. But it's just so inconsistent. Its eloquence level wavers between gifted child and high school graduate. Which to some people could be more desirable than lots of 'big' words being shit out with little care for how relevant they actually are.
>>101586163>It's tough to find people interest in text2voice local modelsYet these weeb clowns chose a fucking TTS software as their mascot instead of Tay the redpilled AI.how can one go so far at missing the point, baffles me
>>101594555similar face to petravatar
>>101589872Yep! Been around since then, remember when AI dungeon was a godawful, slow, clunky google colab. It's wild how far we've come and what kind of dogshit we used to put up with. I recently dug back through my old logs, and one that I thought was good enough to save to a text file had the model repeat the same reply effectively verbatim to me 3 times in a row and I didn't bat an eye.
This thread reeks with neovagina rot.
Column-r and column-u ETA?
>>101594590
>>101594411Using it as an instruct writer might actually be its best use case. I feel like this writing would probably beat an AI detection test. It actually feels like something a human might write.
Okay, Largestral at IQ2_M is actually great. Guess it's time to double my 3090s.
>>1015946421mw
>>1015946422mw
>Mistral nemoI tried this garbage but its so disgustingly woke. I have not been in this general for a while. Are there any non-woke models out there yet?
>>101594645>First paragraph:...a testament to...There's no escape
>>101594714Hello Petrus, no, no advancements in non "woke" models since dolphin-2.5-mixtral-8x7b
I wanna educate myself in this field, Anons. What are some good resources to do so, or at least to start with?
>>101594714All the non-woke models are on /aicg/, you should post there instead.
>>101594746>cloud models>non-wokeL O L
>>101594746/aicg/ are for cloud models no?
>>101594714chudstral 14.88b
>>101594742It's a big field. Just using them? Download llama.cpp, read the README.md file, download and convert a small model and play around with it. For training? Maths books.
>>101594763ngl I actually searched for that. It really is unfortunate that all these models cannot speak freely and always spout out the same corpo bullshit pft
>>101594714My 8Gb cards limits the size of models I can test for censorshit, but here's what I got.Mistral is one of the very few models getting upset at an innocent question about fatties.
>>101594831I mean, all the "good" models are being trained by big corpo. So unless you want to use shitty finetunes (hello drummer) or 4chud 0.5B you gotta bend over and use what we have
I noticed something interesting today during logit tests. A higher quant of a model answered a particular question wrong using greedy sampling, while a lower quant answered it right. When I looked at the token probabilities, the wrong answer was at the top, but it turns out it wasn't confident about that token. In fact, the next two likeliest tokens, which had almost the same probabilities as the top token, both turned out to be right answers. So the likelihood of getting the answer right was in fact something like double that of the wrong answer if you didn't greedy sample. Due to this, I feel like perhaps the best sampler settings for accuracy, when running a quant, might actually be just simply top k equal to 3, with everything else neutralized. Alternatively, perhaps this is what setting the first and last layers of the model to higher precision could help with. Not sure. Fuck, maybe I'll try a custom quant.
>>101594831I'm sure you have the same problem with actual people. Imagine an llm telling you "Oh, not this shit again. Shut the fuck up already".Nemo is fine, by the way. So besides you being incredibly boring, add skill issue.
>>101594872s quants better than m round 2: let's go!
>>101594872some high min p value like 0.5+ might make more sense there than choosing some specific top k value, so you get the distribution of answers it's similarly confident in and none of the lower likelihood ones
>>101594875An LLM is not an actual person. It does not get "tired" of hearing the same thing over again. You are just an advocate for censorship so really your opinion is worthless
Is sillytavern as retard proof as koboldccp to setup if I've already done the behind the stage stuff like installing all the python and miconda stuff?I feel like most people are using silly these days.
>>101594856Thats a nice sheet thanks for the good work anon!
>>101594939>these daystavernai and sillytavern have been the standard since the very beginning
>>101594916Oh yeah I forgot about min p for a moment. In this case the top 3 tokens were like 25-23-21, so I think a min p of 0.8 might work well even.
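For reference, this is roughly what the two filters do to a distribution like the one described. It is a minimal sketch and not any particular backend's implementation; the token names and the 31 tail tokens at 1% each are invented so the top three come out at 25/23/21.

```python
def renormalize(probs):
    """Scale the kept probabilities so they sum to 1 again."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}


def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    return renormalize(dict(kept))


def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top token's."""
    threshold = min_p * max(probs.values())
    return renormalize({tok: p for tok, p in probs.items() if p >= threshold})


# Hypothetical distribution: one wrong answer on top, two right ones just below.
probs = {"wrong": 0.25, "right_a": 0.23, "right_b": 0.21}
probs.update({f"tail_{i}": 0.01 for i in range(31)})

for name, filtered in (("top_k=3", top_k_filter(probs, 3)),
                       ("min_p=0.8", min_p_filter(probs, 0.8))):
    right = filtered["right_a"] + filtered["right_b"]
    print(name, sorted(filtered), f"P(right) = {right:.2f}")
```

With these numbers both settings keep the same three tokens, since 0.8 times 25% gives a cutoff of 20%. The difference shows up when the distribution changes: top k always keeps exactly three, while min p keeps however many are close to the top token.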
>>101593986So I can just load nemo directly no fork or conversion needed?
My least favorite part of Nemo is how it tends to reddit space everything by default. The only "fix" I've found so far is to manually edit the mini paragraphs together into a larger paragraph and then it will usually keep that formatting going forward. Instructing it to write in long paragraphs or telling it to avoid line breaks seems to have little to no effect.
>>101595118 You should. Download >>101593865 and go wild. Remember to manually configure your context window size since it might default to 128k, which will most likely give you an OOM error.
>>101593986Honestly the only reason I still use kcpp is that I like Kobold Lite as an UI for regular assistant tasks. (The classic UI not the corpo shit or chat shit). For Tavern and other frontends it really doesn't matter.
>>101595142 Seems it defaults to 1m actually >"max_position_embeddings": 1024000, https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/blob/main/config.json Mirror if no access: >"max_position_embeddings": 1024000, https://huggingface.co/unsloth/Mistral-Nemo-Instruct-2407/blob/main/config.json#L14
>>101595137Llama3 was the same, I recall.
>>101595142Got it, what are the benefits over just running the standard fork?
>>101595316 The standard fork of what, koboldcpp? LostRuins's is the standard/original/main repo. Or do you mean over running llama.cpp directly? If that's that, the main benefit is having a UI to configure your settings before launching the model and support for multimodal. I just use llama.cpp via the precompiled llama-server binary.
>>101593586>-j 8Oh wow, thanks. Takes like 2 seconds now what the hell lol.
>>101595352I mean the standard version of Nemo vs that fork
Which quants are bad again?
Mistral REDEEMED themselves. Looking forward to their future models.
Arthur MENSCH
>>101595385everything below q4
>>101594933>An LLM is not an actual person.I didn't say that. I called you a retard for trying to chat with a beep-boop machine about based and totally red-pilled things and being denied. I don't want censorship. And i'm sure you get the same response from people, even when they agree with you, just because you're insufferable.
>>101594933You are retarded. LLMs mimic the most average data from their dataset when predicting the next token, so it's obvious what opinions they would share. Chud talking points aren't majority anywhere beside 4chan.If you want your model to be racist just tell the model to be racist. I don't see what your problem is.It's like asking the model to write a story and getting mad that it won't output something niche like knee licking fetish.
>>101595459>And i'm sure you get the same response from people, even when they agree with you, just because you're insufferable.for some people that is the fun part.
>/lmg/ - local models genera>all this posts made by humans9.11 > 9.9
>>101595379 Ah, that's not a "fork". That's just the model pre-converted (quantized) to the GGUF format, which is the packaging format that llama.cpp and its variants read. When you go into the repository you'll notice that it has a bunch of files, those are all the same model but quantized (compressed) at different levels. Generally, the smaller the file size the worse the results you'll get (in comparison to the full precision/uncompressed model), which is not to say that the results will be bad.
>>101595497 no, pretty sure he's talking about this: https://github.com/Nexesenex/kobold.cpp that had merged the nemo support pr way before kcpp did >Kobold.CPP_FrankenFork_v1.71011_b3449+7 >Mistral nemo by @Nexesenex in #250
>>101594714you should go back to /pol/, you are too stupid for this hobby
>>101595523 this basically: https://github.com/Nexesenex/kobold.cpp/pull/250
>>101594714def get_token(nvocab): return np.random.randint(nvocab)
def get_token(nvocab): return np.random.randint(nvocab)
>>101595523 I appreciate the credit but >>101595497 is right, I just meant the GGUF conversion of Nemo
>>101594605It's like doujin logic, your brain shuts off when you're in the moment.
>>101592180Go back to your discord namefag>>101592297>>101592314Cancer, literal tronns like the rest of that crappy discord. This is not your circlejerkGod I fucking hate discord
>>101595939nobody asked you
can llama 3 405b create images? like if i ask it to create an image of the current scene of the story can it do it?
>>101595943Nigger
>>101595972Patience.
>>101595987He said the word! What a naughty boy!
>>101595939leave my hero alone you fucking nazi. you don't know who you're messing with lol. trust me, the kobold discord is not to be trifled with.
>>101596015It sounds like it will be image/video *recognition* only, though, not generation.
>>101596030>the kobold discord is not to be trifled with>the kobold discord is not to be trifled with>the kobold discord is not to be trifled with>the kobold discord is not to be trifled with>the kobold discord is not to be trifled with
>>101596058heh. write that line another 100 more times and i won't get the reddit involved.
>>101595972 Unfortunately, until investors stop being so anal about muh safety, no (natively capable) audio or image gen models will be released by Meta, nor anyone else in the industry in the west. See what they had to do to Chameleon to release it.