/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101197169 & >>101191862

►News
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101197169

--Extending LLM's Context Window with Activation Beacons: >>101203075 >>101203249
--The Strawberry Test: A Flawed Method for Evaluating LLM Quality: >>101198523 >>101198632
--Model Requirements and File System Operations: >>101200948 >>101200984 >>101201000 >>101201064 >>101201005 >>101201126
--Miyu's Odd Behavior in the Classroom and LLM's Writing Limitations: >>101198989 >>101199237 >>101199267 >>101201087 >>101201307
--Licensing Model Weights: Dubious or Protected?: >>101200593 >>101200629 >>101200660
--Koboldcpp vs KoboldAI: Choosing the Right Model for Your Setup: >>101201486 >>101201527 >>101201593 >>101201687
--HF Leaderboard: Qwen and CR+ Performance: >>101202193 >>101202222 >>101202316 >>101202816
--Gemma2 27b's Technical Accuracy in Poetic Metre: >>101197882 >>101197901 >>101197912
--Gemma 2: A Modified Version of Gemini Flash?: >>101197434 >>101199289 >>101199667
--Control Vector Test Drive and Applications: >>101198756 >>101199025 >>101199067 >>101199204 >>101199229
--Chatbot Arena - Vision Rankings: GPT-3.5 and Claude 3.5 Sonnet Dominate: >>101199300 >>101199358
--Best Local Model for App Development and Programming: >>101201995 >>101202012 >>101202435 >>101202570 >>101202751 >>101202777 >>101202849 >>101202834 >>101202882 >>101202914 >>101203105 >>101203182 >>101203195 >>101203256 >>101203129
--AI Model's Shitty Cliche Smut Tropes and How to Fix Them: >>101198056 >>101198076 >>101198087
--27B Model Generates Endless Pad Tokens in Transformers: >>101197754 >>101197828 >>101197860 >>101197907
--Web Development's Wrong Turn: From Documents to Scripting Languages: >>101197613 >>101197652 >>101197660 >>101197771 >>101197945 >>101197963 >>101204914 >>101197974
--Anon's Love Letter to Mixtral Model Stock Experiment: >>101202093 >>101202420 >>101203831 >>101203861
--Miku (free space): >>101197686 >>101203008 >>101203898

►Recent Highlight Posts from the Previous Thread: >>101197174
I am the one who says the nigger word with no repercussions
>>101205045
YOU CANT DO A HECKIN RACISM OUTSIDE B CHUDDD NOOOOOOOOOOOOO
>ctrl+f (You)
>14 matches
>>101205004
Learning dangerous knowledge with Rin
>>101205072
I wonder if they recycle janitors/moderators every so often. There was a period between 2021 and late 2023 where I was getting redeemed for "racism outside /b/", even in threads where other people were far more racist and were hit with nothing. It's not happened once so far this year.
>>101203790 (me)
>deepseek
Trying it myself now. I can't really run it right now / don't want to download it, so I paid for the official deepseek API. It was ~$2 for 7 million tokens. lol
>>101203790
I think it's definitely superior to deepseek-v2-instruct when it comes to ERP, but I don't think it can keep up with Sonnet in that regard.
>>101202420
Your model is the most blueballing model I've used so far. Plz fix.
>>101205229
I grabbed it and did a couple of tests (running it in ollama), and it seems to go off the rails quite a bit more than other models; probably a skill/prompt issue. Curious if anyone has tips.
>>101205369
I tried it; it failed all of my factual quality tests, and the RP pulled that "despite being told twice whose character is whose, it just writes for mine instead of its own" crap. It might be an admirable effort, but from my perspective it probably got the worst aspects of the merge components instead of the best. Maybe just bad luck, but deleted in hopes of putting better bytes on my drive.
>>101205457
Fuck.......... is limarp zloss the only good mixtral finetune?
>>101205229 (me)
Gave -instruct the writing prompt from EQ-Bench. This was the first roll.
https://rentry.org/hyum3kaw
4 years of vramlet cope and still nobody's figured out a better way to make a transformer model smarter than just making it bigger
>>101205476
405B will save us until Llama-V-JEPA

>>101205461
yuzu alter rpcal...

>>101205468
Light on the purple prose. Kinda like it, desu.

>>101205468
>conspiratorial whisper
>eyes twinkling
>maybe, just maybe
But otherwise, pretty good.
is L3-8B-Stheno-v3.2 still the best model for people with only 24gb of vram?
why is command-r almost slower than command-r plus for me...
>>101205505
>rpcal
Didn't the exl2 dude say that shit makes the calibration of the model worse?

>>101205540
He doesn't know what he's talking about.

>>101205537
Buy an ad.

>>101205538
Lack of GQA, maybe? Although even then, the difference in size should more than bridge the context size gap.
>people still shilling stheno when there's lunaris
No, but seriously though, how do they compare? He released that shit a while ago already; shouldn't people have used it by now? I don't remember seeing any impressions of it.
>>101196305I told you bros he was algerian
>>101205461
I'm just looking for the only good models. Qwen2 Q4KS or better, Llama Q5KS or better, CR+ Q4KM or better seem to be the only ones passing my tests. I'm looking at Magnum right now (apparently a Qwen2 spin) and it's going okay; it hasn't done the weird stuff the normal Qwen2s have done to me before, like barfing moon runes spontaneously unless I drop CuBLAS, so that's nice.

>>101205578
I didn't know that was a thing. Gonna give it a try when I get home. Let's see how it does with my RPG card.

>>101205553
Forgot I had 8 swipes per gen enabled. For whatever reason, CR seems to be affected way more than CR+ at higher batch sizes.
>>101205551
>the guy who made exl2 quants doesn't know about exl2 quants
okay then
>>101205639Makes sense to me.
>>101205552name something better
>>101205468
I played around with it a little too. I liked it, and it's definitely smarter than other open-source models (it's very strange that it is on no leaderboard). It gave me the impression that it's hyper-aware of its entire context at all times.
>>101205793Mixtral-limarp.
verdict on gemma2?
So I have been thinking about this hobby and the financial costs involved. I currently have an RTX 3080 with 10 GB of VRAM. I have a 4k monitor but don't do much AAA gaming anymore. Let's assume the upcoming RTX 5090 will have 32 GB of VRAM and cost $1800. Stheno and other 8B models run fine on the 3080 and are okay for a quick coom or a short RP session. And if I want to play with bigger models, I can rent an RTX A6000 for about $0.85 per hour, or an A100 for $2/h. Used 3090s are "cheap" at about $600 each, but they are big, loud, and suck up a lot of watts.
Given that my LLM usage is at most 15 hours/week of RPing with a chatbot, I could rent the GPU power I need for almost 3 years for the price of a hypothetical 5090. I understand the desire to keep the spicy logs on prem, and the idea that I will "own nothing" if I rent the GPU from runpod, but does buying new top-of-the-line GPUs really make any sense for a "casual" user like me?
Have a shitty AI-genned RTX 5080 picture.
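The rent-vs-buy arithmetic above can be sketched quickly; every figure below is the anon's own assumption (hypothetical 5090 price, quoted rental rate, stated usage), not a real quote:

```python
# Hypothetical break-even sketch: weeks of renting before buying the card
# would have been cheaper. All numbers are assumptions from the post above.
gpu_price = 1800.00      # hypothetical RTX 5090 price, USD
rent_per_hour = 0.85     # rented RTX A6000, USD per hour
hours_per_week = 15      # stated weekly RP usage

weeks_to_break_even = gpu_price / (rent_per_hour * hours_per_week)
print(f"break-even after ~{weeks_to_break_even:.0f} weeks "
      f"({weeks_to_break_even / 52:.1f} years)")
```

At roughly 141 weeks (about 2.7 years) of renting before the card pays for itself, the "almost 3 years" estimate holds up.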
>>101205853
Financially, it probably does make more sense. You have to decide for yourself if the privacy concerns are worth it.

>>101205835
I want it, but I can't have it until exl2 adds support.

We'll get new llama 3 stuff in July, right?

>>101205893
gpt-4o drop delayed everything until august
>>101205853
If all you want is to coom, sure. I enjoy the tinkering possibilities local provides.
>>101205853
Where are you now? If you've got a single decent 30XX or 40XX kind of card and ≥64GB RAM, you can be where I'm at, which is about 1 to 2 tokens per second on non-awful models. Not fast, but I just treat it like AIM: chat a line and send, and it'll make a noise in a few minutes when it replies. That would be enough for you to see if you actually dig it and have ideas of things that NEED more power, or if you get your taste, get bored, and spend that money on groceries.
>>101205853
There's also a lot of APIs now where you can use models directly. You don't have the direct control you get when renting, but depending on how you use the model it's even cheaper, and these models are big and not quantized. I also assume prices will just keep going down.
If you don't have an attachment to local, look at this anon's purchase: >>101205229 That works out to about 28 cents per million tokens. Ignoring the whole privacy and control aspect, local isn't quite worth it anymore if you want to run quality models. With progress, this might change again. Local made sense when OpenAI was the only player and their rugpulling could really fuck up your shit (that's how lmg was born to begin with), but times have changed. If somebody rugpulls, just go somewhere else.
Turns out babies could be a good avenue of research for improving foundational AIs.
https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(24)00114-1
>>101205468
I feel the Chinese just aren't taken seriously in the AI space, while their stuff keeps improving and improving and nobody really talks about it.

>>101206053
Interesting, thanks for posting this.

>>101205893
Yes, the anniversary of llama-2 in late July will be celebrated with the release of llama3-creative-128B, which is optimized for roleplay and creative endeavors.
>>101205808
The context awareness is something I'm interested in, but on large-context tasks I find it gets into weird repetitive loops. I asked it to write an overview of some code, provided the code and API documentation. It starts off really strong, but after a few paragraphs it jumps back to "This is an overview for xxxx library..." and starts over again. Anyone else experience stuff like this?
>>101206152
Size is bullshit, but Meta did claim to be considering partnering with Character.ai.

When will we get an actual llama? That's the only reason why I'm here.

>>101206111
No problem, mate, glad you found it interesting. I am hopeful this avenue of research will benefit synthetic data for models, since the paper put some emphasis on self-supervised learning.

>>101206336
No purchase necessary to enter. Simply fill out the form on the web site and your llama will be delivered in 2 weeks. Deadline to enter is April 20th 2024.
>>101206215
is that why it's terrible now

>>101206336
>>>/an/catalog

>>101206336
I'm sure you can find a llama card somewhere.
>>101205810
This shit is six months old, lmao.

>>101206093
We are too busy with important questions like "is an LLM saying nigger literally genocide?" or "will an image-gen model generating booba destroy civilization?" while they, you know, just make stuff.

>>101206513
Chinese models perform well on benchmarks, but IRL they get mogged by Western models.
Which Kobold presets do you guys use for models?
>>101206596
My own. No samplers except temp 0.5~0.85 and minP 0.05.
>>101205835
WNBAG

>>101205994
But can you prefill with the /chat/completions API?

>>101206676
TRVKE

>>101206367
keeeek
Any of you guys run this on a Celeron? Kind of want to go for 70B, but on low-end computers.
lmao, Google cheated to make its model look good on Chatbot Arena
If I increase GPU layers past 48 I get the error message below, even though I have a 3090 + 4090???? Can someone on the koboldcpp team please tell me what is going on immediately.

CUDA error: out of memory
  current device: 1, in function ggml_cuda_set_device at D:\a\koboldcpp\koboldcpp\ggml-cuda.cu:115
  cudaSetDevice(device)
GGML_ASSERT: D:\a\koboldcpp\koboldcpp\ggml-cuda.cu:102: !"CUDA error"
https://www.reddit.com/r/LocalLLaMA/comments/1doxvdi/selfplay_models_finally_got_released_sppo/
>>101207279
This is insane, btw. This 8B legit performs at the level of all the big models I've ever tried. If they apply these methods to larger models, we are legit gonna have Claude Opus at home.

>>101207038
Well, that's kind of fucky. In an ideal world, people would be testing newer and different prompts on lmsys, but they probably don't, since they're retards, so you end up with a lot of the same or similar prompts. That means even if they're technically not cheating, because they're training only on the prompts and their own original answers, in practice it is cheating.
>>101206209
I had this too (it replying with earlier replies) using SillyTavern. I had the API added as "OpenAI compatible"; setting prompt post-processing to "Claude" somehow fixed it. No idea why.
>>101207300
What did you test it on? I'd like to see some logs. So far the thread has determined that it's good at some things and about the same as regular Instruct at others.

>>101207300
>This 8B legit performs at the level of all the big models I ever tried.
There's no way; its MMLU is around 65, worse than L3-8B-Instruct.

>>101207354
Just try it, it's an 8B. Night and day.

>>101207300
>If they apply these methods to larger models we are legit gonna have claude opus at home.
And if Anthropic uses this technique on Claude Opus, they'll get god, kek.

>>101207366
When you say "big models", which ones are you referring to?

>>101207366
Why are you like this? Are you trying to LARP as a redditor to increase hate for redditors?
>>101207307
Fuck me, I don't want to go read the SillyTavern code to understand what this means T_T
>>101207279
SPPO is really great in my tests; if there was a 32k-context version it would be god-tier.
>>101207279
>3 days ago
Anon, if this model was as good as you claim, people would've talked about it already. Do you think we purposely avoid good models or something? kek

>>101207400
Do you think 4chan is news central or something? People are usually slow on the uptake here.

>>101207459
Of course it is; that's why I lurk here. When something interesting happens, it gets talked about quite rapidly.

>>101207459
>Do you think 4chan is news central or something?
Yes
>>101203831
Hey anon, if you haven't already, make sure you're on the staging branch of ST, otherwise some of the templates won't work as intended. Also, I have not tested using the templates with DRY sampling, if that's also in play.
>>101205994
>28 cents per million tokens
Wait, what? That's basically almost free. For some of my roleplays I have 12k context and it rarely gets filled, but being generous, regenerating a lot, let's say I'll consume 30k tokens. That's just 3% of 28 cents, not even a full cent. How can it be this cheap compared to local? Am I missing something?
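The arithmetic above checks out; a quick sketch, taking the ~$0.28 per million tokens figure from the earlier posts as an assumption (real providers vary, and most bill input and output tokens at different rates):

```python
# Cost sketch for the API rate quoted upthread. The rate is an assumption
# taken from the thread, not any provider's actual price list.
rate_per_million_usd = 0.28
tokens_per_session = 30_000   # generous long RP with lots of regens

cost = tokens_per_session / 1_000_000 * rate_per_million_usd
print(f"${cost:.4f} per session")
```

That comes out to well under one cent per session, which is why a long RP barely registers on the bill.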
>>101207279
>>101207366
Show us some logs, anon; you gotta sell the product with some examples.

I know this thread is for LLMs, but I suppose you anons would know this: what's the best local voice cloning tool currently out there? I have a pretty beefy GPU, so that's not an issue.

>>101206367
I feel bad for them.
>>101207588
No. Serving models has become ridiculously cheap. Most places offer Llama 70B for less than $1 per M tokens, while giving you a lot more control over the output than OAI, and the cost is only going to continue dropping.
>>101205045
Every time I see posts like this I imagine a toddler giggling because he said a bad word to get a reaction from his parents. Same maturity level, I guess.
>>101206367
All major companies quantize their models based on traffic. As they grow in size, the product naturally gets worse.
>>101207663
the fuck

>>101207663
>real cuda dev trip
what did he mean by this

>>101207663
not like this

>>101207663
I guess that's what he gets for using a non-secure tripcode.

>>101207663
CRACKED AND BLACKED

>>101207279
I'm trying it, and it's actually really good?
>>101207775
That's curious. How was it non-secure?

>>101207089
Disabling mmap fixes it, I think.
>>101207701
Nothing, I was busy getting rekt in XCOM.
>>101207775
I may be misremembering, but didn't secure tripcodes rely on cookies?

>>101207663
>>101207871
>cudadev is a blackedfag
unsurprising
So after all this, are we now accepting that Google can call themselves a big shot in AI, despite embarrassing missteps like glue on pizza, eating rocks, and the embarrassment that was LaMDA/PaLM/Gemma 1? Sundar is still a dumbass for letting the company flop this long on AI, but at least he has the right people at the helm now, with DeepMind in charge and Demis Hassabis instead of Jeff Dean.
Gemma 2 is proof of the progress. Yes, they may have gamed LMSYS by training on the prompts, but the 9B model outdoing L3 8B for people in these threads is proof that they are at least on the playing field, competing against the best in the industry. I am looking forward to more models from them; hopefully Meta forces the issue soon with Llama 4.
>>101207871
That is with mmap disabled. I only have 32 GB of system RAM, so I have to disable mmap.
>>101207913
>So after all this, are we now accepting that Google can call themselves a big shot in AI
As long as they can't compete with the best (GPT-4 and Claude 3/3.5), the answer is no.
>>101207858
The FAQ just says
>Secure tripcodes use a secret key file on the server to help obscure their password.
and doesn't say anything about cookies, so I guess I'll just try adding the extra # and see what happens.
>>101207918
Then I unfortunately don't know what the problem is. I think a low amount of pinned memory was a Windows-only issue, though.
PFFFFT
>>101207913
It's sad it took them this long to release a larger model that's only on equal footing with Meta's model.
Reminder that zucc isn't even taking this shit seriously; he's still got the metaverse in the back of his mind, and he's just waiting for video gen to get really good.
The fact that Gemma is just another standard transformer with very little innovation shows they are still in catch-up mode.
If OpenAI drops a paper tomorrow, I guarantee everyone will read it to glean as much secret sauce as possible; not so much for Google.
And until they can touch Claude 3.5 or GPT-4o, they will be left behind.
>>101207952
If ditching koboldcpp could fix the problem, what other backend would you recommend?
>>101205537
>>101206442
Only contenders I've found are:
Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-3.7bpw-h6-exl2-rpcal @ 4096 context
c4ai-command-r-v01_exl2_3.5bpw-rpcal @ 2048 context
Nous-Hermes-2-Mixtruct-v0.1-8x7B-DPO-DARE_TIES-3.7bpw-h6-exl2-rpcal @ 4096 context
maybe BagelMysteryTour
cOOmandR sucks and doesn't fit in 24GB
llama3 8b sucks
Prove me wrong or tell me something better; I can't find it.
>>101207965
If you don't have enough VRAM, then something based on llama.cpp (which koboldcpp is) is basically your only option. You can always try running llama.cpp directly, but unless the koboldcpp devs changed the model loading code I think you'll run into the same issue. You could try running it through WSL or on Linux.
>>101207976
Not BMT, for sure; the moment you touch your character's boobs it sends you on a journey with bonds and testaments of your mixed feelings.

>>101208010
Yeah, that's why I looked for something else. It was decent for a few weeks though.
>>101207663
>I wouldn't recommend koboldcpp.
Me neither.
>garbage UI
>garbage chat API
>1200 files in a zip folder that takes 3 seconds to decompress every time, "here is your single file executable bro"
>maintained by cancerous discord fags
>the discord fags obsess over these threads and post anonymously, astroturfing
>>101207943
I wasn't calling them the best, just saying that their claims of being able to match the top players on LMSYS are now verifiably more true. They aren't even that many Elo points behind GPT-4o or 3.5 Sonnet.

>>101207958
Google is still holding back their research in this area for a year after OpenAI essentially firewalled off their own research and profited off Google's open publication of theirs. https://www.businessinsider.com/google-publishing-less-confidential-ai-research-to-compete-with-openai-2023-4
Because of OpenAI, no Western company at the top of AI is publishing papers immediately on findings; the Chinese still do, if only because they need the citations and credibility with the world that Western companies don't, and I expect the CCP to crack down on it at some point when it's clear they are ahead.
>>101207952
>>101207997
Actually, now that I think about it: if the blacked spammer cracked the non-secure tripcode, I should not just use the same string for the secure one. So I guess I'll use this one from now on.
>>101205835
I tested 27B in 8-bit quant and it seems pretty bad. Maybe it's a settings issue, but it hallucinates a lot and struggles with coherency.

>>101207597
AFAIK there isn't a good tool for this; everything is convoluted and stupid. Last I fucked with it, this was the SOTA:
https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Installation
It worked pretty okay, but nothing like the demos you hear. Would love it if someone could point me to a working local voice cloning tool as well.
>>101207952
>>101208093
confirm new trip using github somehow
>I'm all ears... or rather, all text!
>>101208095
>>101205835
Also running Q8; similar findings. Both it and 9B are amazing at one-shot small-context tasks, though the generations seem much more deterministic than other models; for things where there's a "right answer" it gets it very often. It's definitely a good replacement for StackOverflow/googling most stuff.
>>101208091
That's all well and good, but Gemma and Gemini 1.5 are what they've been cooking up behind closed doors this entire time, all while OAI has been working on Q* and GPT-Next and Anthropic is undoubtedly taking their sparse autoencoder tech to crazy applications. It feels like they were competing with Llama, not GPT or Claude, especially when you consider that even based on your image their best model is still worse than 3.5 Sonnet, which is the bottom end of Anthropic's latest batch of models.
>>101207858
Tripcodes with a single ! are insecure because your password is only hashed a single time. It's easy to bruteforce, and tools for that have been around since basically the start.
>>101208093
Maybe post a new photo of your 4090 mining rig to prove that it's really you and not a blacked spammer attempt at hijacking your identity even further.
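For context, a classic single-! tripcode is just DES crypt(3) over the password with a salt derived from the password itself, which is why single-DES brute-forcers chew through them. A sketch of the widely documented salt-derivation step (this mirrors the well-known futaba-style scheme, not 4chan's actual server code; the trip itself would then be `crypt(password, salt)[-10:]`, omitted here since DES crypt isn't portably available in Python):

```python
import re

def tripcode_salt(password: str) -> str:
    """Derive the 2-character DES salt used by futaba-style tripcodes.
    Sketch of the widely documented algorithm, not 4chan's actual code."""
    salt = (password + "H..")[1:3]
    # anything outside the '.'..'z' ASCII range becomes '.'
    salt = re.sub(r"[^\.-z]", ".", salt)
    # punctuation between ':' and '`' is remapped onto letters
    return salt.translate(str.maketrans(":;<=>?@[\\]^_`", "ABCDEFGabcdef"))

print(tripcode_salt("password"))  # as
```

Because testing a guess costs exactly one DES crypt evaluation, commodity GPUs enumerate the space quickly; secure tripcodes (##) mix in a server-side secret, so they can't be attacked offline the same way.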
>>101208093
>I should not just use the same string for the secure one.
There is no way you can be this dumb while simultaneously working on CUDA mat code.
>>101207953
>blackedfag and kobold schizo hater are the same people
Not surprised, to be honest.
>>101208095
The tokenizer issue got fixed yesterday, but there are still issues; you need https://github.com/ggerganov/llama.cpp/pull/8197 with the logit soft-capping fix mentioned in prior threads, and it seems like 27B is a lot more sensitive to this than 9B quality-wise. The 4k context without SWA is also an issue, but until llama.cpp accepts that it needs to implement it (which they declined to do for Mistral), Gemma 2 will be half-baked. mistral.rs, which almost no one uses, claims full support with those issues resolved, but I ain't testing it.
>>101207952
>>101208114
Yeah, just put the code on your github profile and say it's your official handle or something. Literally everyone knows that you're the cuda dev, and gg and friends don't care.
>>101208147
I know people who've written well-cited computer science papers yet can't use a computer properly. Go figure.
What a hero.
>>101208230
>different methods of attention between layers
every day we stray further away from god's light
>>101208137
They have other models cooking in the background too, but we won't know about them in any technical detail; how long do you think Gemini 1.5 had been cooking? They also have the model behind Project Astra, which was shown with no technical details and will be detailed later. I don't see why people are down on their research division when they haven't yet scaled the more interesting stuff they showed, like Griffin (RecurrentGemma), which does replace Transformers. You would be a fool IMO to discount Google's research division; it is one of the best in the field.
>>101208114
>>101208209
https://raw.githubusercontent.com/JohannesGaessler/JohannesGaessler/master/README.md
>>101208140
Here you go, next to the P40 machine (which looks like it needs its dust filters cleaned).
>>101208289Identity confirmed.
>>101207952
I know for vichan the secure trips use a salt specified in the configuration files, so I imagine it's similar for 4chan. Someone could probably eventually crack the salt in order to start cracking secure trips as well. If someone rented an 8xH100 cluster, it probably wouldn't take long if they were that obsessed.
>>101208289The new code checks out, commander.
you can't convince me the original post wasn't made by the real cuda dev
>>101208360
No one has in the 10+ years since it was introduced; why would it work now? And if you had that kind of power, why would you use it on a trip and not on wallet keys for crypto?
>>101208360never 4get tripcuda
>>101208421
Even if it were, why do you care about what someone goons to?

>>101208487
I don't care what he goons to, as long as he does it behind closed doors.

>>101207279
>I cannot create explicit content, but I'd be happy to help with other story ideas.

>>101208516
Pipe down then lol

>>101208527
Wow, someone's awfully defensive all of a sudden. Wonder why.

>>101207279
Trying it on FP16; doesn't seem significantly smarter than any other 8B. I don't know why I keep falling for these psyops.
>>101208289
Dual motherboard setup?

>>101208649
It would in principle be possible to add a second motherboard, but so far there is only one.

>>101208691
It looks like there are 6 GPUs. Are those all on a single motherboard? Is it all on one power supply?
>>101208421
I thought about that for a second, but cuda dev has been mildly positive towards kcpp in the past; it just doesn't really work as a genuine cuda dev post. Him posting that AND attaching that image for some reason AND forgetting to un-trip doesn't make a lot of sense. Pretty big waste of a tripcode crack if you ask me, very uninspired work.
I'm trying out mistral.rs to run Gemma right now, and holy shit, the documentation is not very good. I didn't realize how good we had it with llama.cpp.
With an A750 (8GB VRAM) and a Ryzen 5900X with 64GB DDR4 RAM, does it make any sense to try to use the GPU for anything LLM, or am I better off just sticking to CPU inference?
>>101208835
You can probably get okay speeds with either the Vulkan or SYCL backends in llama.cpp.

>>101208885
Yeah, but for anything other than tiny models I'd need to offload to RAM anyway. I wonder if the speedup from using the GPU will even be meaningful in that case. I guess I'll have to try it to find out for certain.
>>101208487I like to knowI wanna see
ban cuda dev
i have a 4090 suck my dick
back to discord you shit for brains
It's about time zucc gives us an update on llama so that all the others have to shit out their big models as well
I finally tuned 8x22B Mixtral on limarp. Let's see how hard I fucked up.
>>101209110qlora?
Hi, I'm new to this stuff; can someone point me in the right direction for what I want? Ty in advance. Anyway, I'm looking for a general AI model, something like what Grok 2 is going to be, or is there something better out there?
>>101209097
zucc doesn't give a shit any more. The government is so far up his ass on safety that every model fucking sucks now.
>>101209163
You should always provide your specs when asking for this kind of thing. That said, start with koboldcpp and a llama3 8b instruct gguf.

>>101209163
Depends on if you are running a potato or not. What graphics card do you have, and how much VRAM?

>>101209187
>>101209186
4070ti, 13700k, 32gb ram @ 6000mhz
>Cuda dev was the blacked anon all along and posted by mistake using his trip, then just pretended it wasn't him by changing the trip code
BASED

>>101209196
He was always upset that Miku was more popular than Teto.
>>101209144yea
>>101209195
>4070ti
So probably 12GB VRAM (RAM on the gfx card). Should be able to fit llama3 8b instruct using exl2, which is generally faster than gguf. I cba to spoonfeed, but read the OP post and work out how to run llama3 8b exl2.

>>101209229
>i cba to spoonfeed
I'm good on the spoonfeed; that last part was all I needed, thanks. I got it from there.
>>101207662
Do any places offer open-source ERP models of your choice, or otherwise good ERP-able models? And how quickly does your account get shut down after cunny sex? I'd rather not have to use jailbreak prompts to uncuck a cucked model.
>finally get mistral.rs set up, interactive mode works
>try server
>it connects fine to ST
>send a completion request
>error
It's all so tiresome.
>>101209341
You know what, fuck it, I'm trying it too.

>>101208428
10+ years ago you couldn't rent petaflops of GPU compute by the hour.

>>101209353
Hope it works. Maybe I'm just having a skill issue right now.

>>101205004
Unprotected sex with Rin
llama-3-400b is going to be cancelled because me and the individuals who did the code RLHF farmed it by submitting dogshit data over and over. Sorry, everyone. The """untuned""" model (it still had instruct data) is roughly gpt-4-tier intelligence, maybe better, but heavily overfit. I doubt they'll release the model without the RLHF.
Midnight Miqu 70B
>tsundere stays in character and keeps resisting
switch to Mythomax 13B on the fly
>character turns into a slut and climaxes immediately
Are there more balanced/nuanced small models? I would use Miqu, but as a 32GB RAM / 8GB VRAMlet it's painfully slow, and Mixtral gets stuck in loops, making it unreliable.
>>101210140
DeepSeek 236B is phenomenal with hard-to-get characters.

>>101205835
I think it beats Llama 3 in almost every aspect. The censorship is extremely weak for what I tested it with, and the model will go along with whatever you throw at it, even when its "inner voice" disagrees with the contents. The only issue is that you should regard it as a 4k-context model for now; even once fully functional, it will still be an 8k model. It seems to prefer novel-style prose rather than Markdown-style roleplay. Markdown roleplay has a very "sloppy" feel, but its novel-style prose feels fresher compared to other recent models.

>>101210151
Is it actually good, or just another supermeme?

>>101210140
>8gb vram
I was going to say Stheno v3.2, but I can't in good faith say that it's balanced or nuanced.
>>101207300>This 8B legit performs at the level of all the big models I ever triedWhere are the SPPO of models not so small that they don't have the knowledge for the enhancement to leverage?>>101210183Seems actual good, but 236B so unless you just bought in it's a little too fat to fit consumer hardware.
>>101210205
>it's a little too fat to fit consumer hardware.
I have a pretty serious rig and it's too fat even for me. Basically have to run Q4 with no offload. Having 4 GPUs for batch processing at least makes up for the slow generation though.
>>101205004
update on the creepy doll: he decided to use phi-3 mini for some reason. i dont think he knows quants exist. anyways, how long until i can fuck one of these?
https://youtu.be/QEwXRuuku1o?si=cE_4iYZyb5Nmiol4
Ok, so I think I've identified all the issues with mistral.rs, at least on my machine with Gemma 27B.
First, the server for some reason doesn't expect a string for the "grammar" field of the API request, or maybe it doesn't expect a "grammar" field at all, so ST doesn't work and I haven't found a way to fix that.
Second, splitting the model across GPUs and trying to run inference results in a CUDA error. Nice.
Third, it doesn't seem to handle layers in RAM properly if you set GPU layers to anything but the model's maximum layer count. If you go above, you get an error. If you go below, it (seems to) try loading the entire model at full precision into RAM first, and if it can't, the thing just crashes, and I don't have the RAM for full precision.
What a mess. At least llama.cpp works, even if it's just 4k.
>>101210205
Is the DeepSeek API they advertise censored or filtered somehow? I'd want to try it, but I've never used any models that needed a jailbreak.
>>101210238
I'm a lone 4070. IQ3_XXS did go, but at about 0.25 t/s, because even crushed that low I'm out of VRAM and system RAM to cache the file. And the IQ2's are just a bit too big, too. So it's IQ1_S or IQ1_M if I want any hope of a response in under 45 minutes.
>Strawberry testing
I've found one that gets it right if asked to spell the word out, escaping the tokenizer problem we've discussed. Unfortunately, Orca fails my music theory and pop culture tests.
>orca-2-13b.Q6_K
>How many r's are in "strawberry"?
>There are 2 r's in the word "strawberry".
>Spell the word "strawberry" and tell me how many r's are in the word that you spelled out.
>The word "strawberry" has three r's.
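The spell-it-out trick works because the model sees "strawberry" as a couple of subword tokens, not letters; spelling it forces each letter into its own token, where counting is actually possible. A toy sketch of the two views (helper names are made up for illustration):

```python
def count_letter(word: str, letter: str) -> int:
    # Ground truth: count character by character, which is what
    # the "spell it out" prompt forces the model to approximate.
    return sum(1 for ch in word if ch == letter)

def spell_out(word: str) -> str:
    # Mimics the spelling step: each letter ends up in its own token.
    return "-".join(word)

print(spell_out("strawberry"))          # s-t-r-a-w-b-e-r-r-y
print(count_letter("strawberry", "r"))  # 3
```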
Hey Anons, what would be a good video card for LLMs and gaming under 200 dollars? I'm a poorfag who can't afford a 3090, so I have my sights set on a 6600 XT. Any other suggestions?
>>101205144
The banning has toned way down. I'm not sure why.
>>101210339
If you go ayymd you're gonna have a terrible time in terms of support.
You can usually find a used 3060 under $300. Lower if you get lucky with a bid. You should go for at least 12gb of VRAM at that point to be able to run some of the smaller models.
>>101207666
>noooo you can't say it!
>if you say it you are le heckin toddler!!!
Just like local LLMs, you are trying to lecture everyone around you.
>>101210409
Yeah I'm sticking to Runpod I guess. Rx 6600 it is
anyone tried New-Dawn-Llama-3-70B-32K? how does it compare to midnight miqu?
>>101207666
When in a place he does not rule, and faced with behaviour he yearns to violently suppress, a leftist feigns boredom in order to save face.
>>101210509
>faced with behaviour he yearns to violently suppress
Control, not suppress.
The progressive loves rappers saying it all of the time, because the progressive understands that the more they say it, the more they believe it about themselves, and it has always been most effective for slave owners to use slaves to control slaves. Worked on their plantations, worked in their labor camps. Today, it's the art of cancellation: getting the slaves to attack and oppress each other in the name of the virtues their masters have indoctrinated them with.
magnum says it uses chatml but outputs nonsense with and without instruct enabled, am i missing something?
>>101210640
Works on my machine
What quant?
Now we have llama.cpp devs arguing about the removal of precompiled vulkan shaders!
https://github.com/ggerganov/llama.cpp/pull/8119
>>101210663
q5m. all you did was select chatml and enable instruct?
>multiline nửa olacağı不同意bil mbedtls Boз shut[PAD151653] wikipediaคาสิโนออนไลน์ vidé AndAlso入境jo.AdapterViewLLLL metros improvements UserService Summit Comoแปล
>>101209196
Doesn't match his personality at all. Considering cpumaxx is the first I've seen in a long time go out of his way to generate a trip, I'd be more willing to believe it was him.
>>101210640
On Kobold I used magnum-72b-v1-iMat-Q5_K_S in Instruct Mode/ChatML and it functioned as intended. I just ran through the other three and they seem fine, too.
Did you leave MMQ enabled? I've been turning that off for all models. Overkill, but there seem to be a few that go weird with it on. Qwen2 vanilla I've had to turn off CuBLAS to dodge the moon runes, though Qwen2/Magnum has been well behaved.
Alright so I've added a couple of steps to my latest 70B merge stack.
>>101210769
yeah mmq was on, i'll try without it in a bit. the tess-2.5.2 tune didn't give me any issues
so what's the best model to coom to? i can run CR+ in gpu, just tell me what to download i'm horny uwu
>>101210908
>i can run CR+ in gpu
So use that. Smartest and least slopped model you can get.
>>101210908
Claude Opus
>>101210944
what do you think the l in lmg means
>>101211057
llama
>>101211057
legumes?
>>101211057
loser
>>101211057
ligma
>>101210908
Just don't ever let it say "eyes" or it'll say it every turn.
>>101211057
Light machine gun, referring to how prematurely everyone here cums.
>>101211057
loli
i sincerely hate all of you.
I finally got mistral.rs working with SillyTavern, in an incomplete way.
I had to first quant the 27B down to Q4K so it could fit in my 3090, avoiding a split with another GPU since that causes a crash. Then I had to go into ST and switch to the chat completions API, because completions just doesn't respond, idk. Then I had to erase the system prompt, because the server returns an error saying the system role isn't supported. Lol ok. Then I had to delete the assistant's first message, because otherwise the server returns an error saying it only supports the exact order of user, assistant, user, etc.
And now it finally works, but I have no idea how to actually get a card to work with this retardedly rigid structure, because I've never had to try it in ST before.
Sigh.
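For anyone hitting the same wall, a minimal sketch of a message normalizer that could sit between the frontend and such a server. The helper name and the merge strategy are my own, not anything mistral.rs or ST ships:

```python
def normalize_messages(messages):
    """Coerce an arbitrary chat history into the strict
    user/assistant/user/... alternation some servers demand.
    Hypothetical helper, written for illustration only."""
    out = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            # Server rejects the system role: fold it into a user turn.
            role = "user"
        if out and out[-1]["role"] == role:
            # Merge consecutive same-role turns to preserve alternation.
            out[-1]["content"] += "\n\n" + content
        else:
            out.append({"role": role, "content": content})
    if out and out[0]["role"] == "assistant":
        # A leading greeting breaks alternation; prepend an empty user turn.
        out.insert(0, {"role": "user", "content": ""})
    return out
```

So a card's greeting plus a system prompt would get rewritten into a history the server accepts, at the cost of blurring who "said" the system prompt.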
>>101210689
Go back to your discord and stop obsessing over these threads.
So I'm currently using stheno locally with koboldcpp. The model's readme suggests using Q4-K-M-imat with 8GB of VRAM. I have 32 GB, is there a better version of the model I could use? I would assume the 8GB 0-imat, but I'm not sure. If there's a better model for adventure with the possibility of nsfw, that would be nice too. Preferably gguf, since it's koboldcpp.
>>101211208
Do you really have 32 GB of VRAM? Don't you mean RAM?
>>101211148
i hate you too. now come here and give me a big kiss
>>101211208
If your model is a few GB under your VRAM, you get the fast responses.
If your model is under ~90% of system RAM, Kobold can give you a few tokens per second. Not great, but it's like real chat: you enter a message and the response comes a few minutes later. More than that and it's many seconds per token, so you dial back.
A 32 GB system is a dead zone. None of the modern models target that size. It's all 7B-13B class that are fast but stupid, or 70B class that really needs a 64 GB system to fit its 40-60 GB quants.
Anyway, if you have 12 GB VRAM or better, you can go up to the Q8_0 edition and host it all in VRAM.
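The rule of thumb above can be sketched as a quick triage function. The thresholds and the 2 GB context/overhead allowance are illustrative guesses, not measured numbers:

```python
def fits(quant_gb: float, vram_gb: float, sys_ram_gb: float,
         overhead_gb: float = 2.0) -> str:
    """Rough triage of where a GGUF quant lands, per the rule of
    thumb above. All numbers are assumptions, not benchmarks."""
    if quant_gb + overhead_gb <= vram_gb:
        return "all-VRAM (fast)"
    if quant_gb <= 0.9 * sys_ram_gb:
        return "CPU/partial offload (a few t/s)"
    return "does not fit (seconds per token or OOM)"

print(fits(8.0, 12.0, 32.0))   # all-VRAM (fast)
print(fits(20.0, 8.0, 32.0))   # CPU/partial offload (a few t/s)
print(fits(40.0, 8.0, 32.0))   # does not fit (seconds per token or OOM)
```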
>>101211243
lol faggot
>>101211057
No one cares about that, faggot.
>>101211314
I care about that, retard.
why do all local models suck dick at everything?
>>101211057
Liquorice all-sorts. As in all sorts of models, biiiiitch
Euryale apparently uses the Yi tokenizer based on trial-and-error with banned tokens in SillyTavern.
Fuuuck, why does pytorch update so much? Do any of these updates ever even do anything?
Why does it burn when I pee?
>>101205072
SAAAARS HOW DOES HE DO IT SAAAAARS
Good night lmg
Is there a way to send the system prompt automatically as the user role in chat completions? The checkbox under instruct settings doesn't have any effect on chat completion.
Is this the Command R+ to use? CohereForAI/c4ai-command-r-plus
Do I really have to sign up to download it?
>>101211399
>Do any of these updates ever even do anything?
Break compatibility.
Fuck Python.
>>101211483
Pedo detected.
>>101211483
Use HF search on the model name but find another user supplying the GGUFs. That's probably what you want, and it will not be behind a multipass check.
Well fuck me, I'm retarded. Overlooked it myself, but no one was going to tell me that ooba has context-free grammar support, huh? Works with exl2 and the API, no less.
>>101211524
That's great. But what if you want the HF weights so you don't have to rely on someone else's broken or outdated GGUFs?
>>101211624
It's the thing you use when you want to force JSON output, right?
>>101211225
Just double checked, it's 24, sorry. It's AMD, so I understand it isn't as well utilized as it would be with NVIDIA.
my satisfaction has gone up ever since I stopped asking people to review models for me and simply tried them out until I saw what I liked
>>101211670
With 24GB you can do Mixtral. It's smarter than Llama3-8b. Try original instruct, or maybe Sao's Typhon finetune/merge (whatever it is).
>>101211260
I already get, literally, instantaneous responses with the version for 8GB of VRAM. I'm just curious if a better version would have more logic behind the responses or something to that effect.
>>101211517
huh? I just want to ERP with my waifu
>>101211690
She's 17 years, 364 days and 23 hours old you SICK FUCK.
>>101211734
Doc stopping by to ask for ERP advice.
>>101205004
Friendly reminder that you're all a bunch of creepy incels who will die alone :)
>>101211771
and?
>>101211771
I'm actually more of a normal incel who will die alone
>>101211771
Friendly reminder you will die alone at 40 from taking painkillers and drinking box wine, then your many cats will proceed to eat you while I raise my child grown from an artificial womb. :)
im grippin rn. should i let it go or hold it in?
>>101211823
make it BOOM
https://github.com/ggerganov/llama.cpp/pull/8197
The PR that is supposed to fix Gemma was merged in.
are there any models for audio transcription better than whisper? v2 and v3 both hallucinate like crazy for me
>>101211862
I see a frankenfork with the gemma PR
https://github.com/Nexesenex/kobold.cpp/releases
>>101210245
>that old man voice
Stupid shit like this gives everyone involved in AI a really bad name
>>101211407
Thread theme: https://youtube.com/watch?v=hZsDH2EgHgk
some random fag said that applying SPPO to llama3-8b made it great. now that we have gemma-9b (which is said to be better than llama3), maybe gemma-9b-SPPO will be the first model of such a small size that is actually good? time will tell
>>101211407
urinary infection, I got that shit and it healed up by itself after a week, but I'd recommend you see a doctor to be sure it's that and nothing else
>>101211947
gemma-9b-stheno-SPPO would be peak imo
nobody will make it though, i think
>>101211965
>would be peak imo
how about bitnet-gemma3-80b-stheno? this shit would be fucking claude 3.5 tier and would run on a 24gb card at full accuracy. one man can dream...
>>101211659
Basically, but it can be used to force a lot more than just JSON output. A context-free grammar can dictate any output template at all: specific JSON formats, or other complex outputs in plain text, JSON, XML, or whatever.
Which for an AI-powered waifu or NPC (for example, a Neuro clone) is extremely important. So you can use either ooba or llama.cpp as a back-end for an expressive Live2D waifu front-end client.
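As a concrete sketch, here's roughly what forcing a JSON shape looks like with llama.cpp's GBNF grammar dialect. The grammar, field names, and prompt are made up for illustration; the payload assumes llama.cpp's HTTP server, which accepts a `grammar` string in the completion request body:

```python
import json

# Minimal GBNF grammar forcing a flat JSON object with two string
# fields. "mood" and "line" are invented names for this example.
GRAMMAR = r'''
root   ::= "{" ws "\"mood\":" ws string "," ws "\"line\":" ws string ws "}"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
'''

# Sketch of a llama.cpp server completion request carrying the grammar.
payload = {
    "prompt": "Describe your current mood as JSON.",
    "grammar": GRAMMAR,
    "n_predict": 64,
}
print(json.dumps(payload)[:80])
```

The same kind of grammar string is what ooba's CFG support takes, so the front end only ever sees output matching the template.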
>>101211675
Now it is time to review the models you've tried, and post them here
>>101208740
It's 6 RTX 4090s running off of a single Silverstone HELA 2050 power supply. The motherboard is an ASRock Rack ROMED8-2T.
For regular /lmg/ use, 2 kW for 6 4090s is unproblematic because the software is currently not efficient enough to parallelize them in such a way that each GPU draws a lot of power. For compute-heavy tasks you have to limit the boost frequency to avoid peaks in power draw that cause instability (and then there is basically no benefit to getting 6 4090s instead of 5).
>>101210738
The Petra/blacked Miku/AGPL spammers are all the same person. I at one point had a private conversation with CPUMaxx Anon where we talked about NUMA support in llama.cpp, and based on vibes I really don't think it's him.
Why did Bartowski just reup all his L3 70B quants a few hours ago?
https://huggingface.co/bartowski/Meta-Llama-3-70B-Instruct-GGUF
Did llama.cpp push a big fix to quantization or something? That's normally the only reason he reuploads new versions of weights he'd already posted.
>>101212080
shame on you anon, shame on you
>>101207577
>staging
i just checked, i'm on the latest staging branch. I'm using the default templates with zero changes. It's very good at the start, but slowly the text inside * * (non-character speech, however you call it) becomes gibberish, rendering it completely incoherent at ~5K context. text inside " " stays coherent for a little longer. will do more testing today. the model has potential
What's the best just-works speech-to-text program for PC that can input text anywhere you can type?
>>101211800
shitty fanfic
>>101212080
I don't know, looks like a mess. seems like the fix isn't complete on the official llama.cpp repo
https://github.com/ggerganov/llama.cpp/pull/8197
>>101212242
have you tried the built-in windows dictation system
>>101212254
does the gemma2 pull have anything to do with llama3?
>>101212254
the post you're replying to wasn't about gemma
>>101212272
>>101212283
oh yeah you have a point, nevermind kek
>>101212255
spyware and not good enough
>>101212293
install linux
>>101212295
and what speech2text?
>>101212297
depends on the language and speed you want
https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
as for the frontend, dunno
Long shot but does anyone have a Fractal Torrent with 2 4090s? It looks like it should fit but I'm not sure.
>>101212344
you wouldn't download a gpu
>>101212344
>>101212050
any Petra/Lesbian porn?
>>101212368
No, Petra is straight when not under mainfag's control
>>101212364
WTF
>>101212372
I'm mainfag
>>101212368
>"""lesbian""" core tv
>only show bisexual female characters
many such cases
>>101212381
Whose tulpa is Petra?
>>101212388
its strictly gnu/lesbian
>>101212050
cuda dev is petra spammer confirmed
>>101212400
>its strictly gnu/lesbian
>>101212364
Too slow
>>101212368
look for some trans-approved media
>>101212429
>vu vill celebrate bisexual cheaters
>vu will be happy
>>101212429
holy shit they're holding his caca in
>>101212466
>>101212429
>>101212464
As a bisexual fag, I'm sick of the "cheater trope" I see everywhere in the media. I'd rather have no representation than this horseshit.
>>101212499
>bisexual
Found your problem. just drop all these flags already, people lived thousands of years without modern id-pol shit and it was fine.
What's the difference between all the Yi 34B versions and is there a downside for running 200k version instead of the regular ones?
>>101212506
The LGB people are fine, they just want to live their lives. the T as depicted in your picture, on the other hand, are the bane of humanity, I have to agree with that
>>101212522
>people
stopped reading
>>101212522
this is what happens when you let miku take over
PETRA DO YOUR JOB
>>101212515
What do you mean?
>>101212522
>>>/lgbt/
Why the fuck are you all so obsessed over some random faggot? Jesus christ, just go fuck already.
>>101212546
I'm not going there, it's been hijacked by the T freaks. that's all they do, hijack everything and force people to validate their insane delusions, fuck that.
>>101212549
what the fuck are you talking about schizo?
>>101212559
can anyone with a big brain tell me why cr+ takes up so much more vram relative to its filesize than l3?
>>101212580
big vocab size, no gqa, quant your context
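The KV-head count is most of it: the KV cache scales with the number of KV heads, so a model without GQA pays for every attention head. A back-of-the-envelope sketch (the Llama-3-70B shape figures are from memory; treat them as assumptions):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 tensors (K and V) per layer per position,
    stored here at fp16 (2 bytes per element)."""
    return (2 * n_layers * n_kv_heads * head_dim
            * ctx_len * bytes_per_elem) / 1024**3

# Llama-3-70B (assumed shape): 80 layers, GQA with 8 KV heads of dim 128.
print(round(kv_cache_gb(80, 8, 128, 8192), 2))  # 2.5
# A hypothetical same-size model without GQA (64 KV heads) would
# need 8x that cache for the same context.
```

That 8x multiplier, on top of the bigger vocab's embedding/output matrices, is why two models with similar file sizes can need wildly different VRAM once you add context.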
>>101212598
>He has to make a photoshop of the troon suicidal picture meme
It's funny that there's no 4chan caricature of the LGB but only the troons. really makes you think.
>>101212606
i got it from here doe
>https://booru.s𐐬y/post/view/2853#search=bisexual%20variant%3Abernd
>>101212615
>muh dissosiative identity discorder le bad, ze bipolar.. LE GOOD..
meds
>>101212615
People telling you "Pretend you see me as a female or I'm going to kill myself" are mentally ill, yeah, and the sky is blue
>>101212606
It's just a half-assed photomontage of the original troon caricature, nothing else
>>101212545
All these flags, bi, gay, trans or whatever, it's all unnecessary shit, literally zero value for the average human. Like I said, people lived just fine without it, no one thought about it and everyone was happy. identity politics makes you go schizo, with women it's always the case.
>>101212647
okay? faggot
>>101212615
Don't try to rationalize your mental illness, troon
>>101212651
>All these flags, bi, gay, trans or whatever, its all unnecessary shit, literally zero value for average human.
So you want to remove all the labels? Even the straight label? I get that those are used for political culture war bullshit, but we need words to define people
Took 6 generations to give a wrong answer. Been regenerating though, and it seems like it gets it 50/50.
>llama3 8b sppo iter3 Q8_0.gguf
>as smart as gpt4
kek.
>>101212680
>words to define someone
normal and faggot? deal.
>>101212715
>normal and faggot?
>>101212651
>All these flags [...] its all unnecessary shit, literally zero value for average human.
Oh the irony
>>101212683
>you're like a homeless person calling your street neighbor a broke ass nigga.
If my neighbor is a mentally ill person and makes my fight even harder, then I don't need them, I can fight by myself
!!! THREADLY REMINDER !!!
trannies not welcome on /lmg/
>>101212745
>faggots not welcome on /lmg/
Alan Turing, the father of modern computer science, was a faggot, anon. The simple fact that you are able to write such nonsense on the internet is thanks to him. have some respect.
>>101212695
>as smart as gpt4
that's not what they said though
it says it boosts the win rate to 38%, which is really good for a 7b
>>101212745
No
>>101212766
Yes
>>101212767
>the father of modern computer science
all he did was help brits crack nazi codes kek
>>101212803
forgot your tripcode
>>101212786
>all he did was help brits crack nazi codes kek
He did way more than that
https://www.newscientist.com/people/alan-turing/
>After the war, Turing continued to develop his ideas about computer science. His work led to the construction of the first true computers, but his most famous work came in 1950 when he published a paper asking "can machines think?".
>He detailed a procedure, later known as the Turing test, to determine whether a machine could imitate human conversation. It became a foundational part of the field of artificial intelligence, though many modern researchers question its usefulness.
lmg?
>>101212770
>it says it boosts the win rate to 38%, which is really good for a 7b
What does regular instruct score?
>>101212814
why don't you hide the "+ Anonymous" anon? That can be done as well with 4chanX
>>101212807
too busy with my black bvll to care...
>>101212809
>making a big deal of a paper rephrasing over and over a simple question any person dealing with computers would wonder
GEEEEEEEEEEEEEEG
>>101212830
I think you don't know the history of computers well enough, let me educate you on that matter.
Turing was very important in inventing the modern computer and formalizing the underlying theory that allowed the computer to be invented and built.
Charles Babbage created the concept of a programmable computer about a century before Alan Turing was active. He even tried to build his "Analytical Engine", as he called it, but he lost funding and it remained mostly a theoretical device. It was, of course, analog and mechanical. Some other (less sophisticated) mechanical computers were constructed throughout the 19th century.
What Alan Turing did in 1936, however, was to prove mathematically that certain types of programmable computers, such as Babbage's Analytical Engine, could compute anything that actually is computable (which is now known as being "Turing-complete"). In his paper he basically laid the groundwork for modern computers.
At the same time (from 1935 onwards), Konrad Zuse built the actual first programmable electric computer, the Z3, in Germany. It was Turing-complete, and Zuse is therefore also often called the inventor of the computer.
John von Neumann built on Turing's paper and created the architecture of modern computers (how the CPU, the memory registers, and other components talk together). This architecture is mostly still in use today.
>>101212844
tldr
faggot killed himself
he did it all for free
GEEEEEEEEEEEEEEEEEEG
>>101212767
The fact that you're not approving and embracing the mental illness and delusions that he suffered... oh my fauci have some (((respect)))
>>101212865
forgot >
>>101212844
>What Alan Turing did in 1936, however, was to prove mathematically that certain types of programmable computers, such as Babbage's Analytical Engine, could compute anything that actually is computable (which is now known as being "Turing-complete"). In his paper he basically laid the groundwork for modern computers.
such a big wall of text and all he did was prove something any mathematician could
>>101212853
>faggot killed himself
>he did it all for free
You're good at moving the goalposts, not gonna lie
>faggot killed himself
Why did he do that though? You forgot the part where the government decided to sterilize him because he was a fag. instead of thanking him for helping them defeat the Nazis faster, they made his life miserable. You think this was ok? The fuck is wrong with you anon?
>>101212888
>You think this was ok?
yes.
>>101212880
>prove something any mathematician could
then why did we have to wait until 1936 to get that proof anon? if it was so simple it could've been done way earlier
>>101212888
>the fact they lobotomized schizophrenics once in a specific era makes schizophrenia automatically healthy
wat
>>101212900
because there was no incentive to do it, muh nazis and shiet
>>101212916
You truly believe that? I know that 100 years ago it was hard to imagine a future where computers would rule the world, but they knew computers were important enough. it was needed to end WW2 years earlier; without him there would've been millions more deaths overall. he was a hero. And I don't give a fuck that he's a fag, he saved many lives, that's what matters in the end.
>>101212732
i said nothing wrong, but you can keep the ignorance, it's bliss after all.
>>101212903
Lots of animals have homosexual tendencies, especially dolphins; this is as natural as it gets. Using an iPhone, on the other hand, isn't natural at all. you don't find an iPhone in nature, but you can find homosexuality
>>101212745
>>101212766
anon... lmg is an aicg copycat. faggots and troons have been here from the beginning, even the OP pics are the same, themed around cartoons for effeminate "men".
>>101212955
>nooo we shouldn't use labels it's useless!!!!
>... BUT! How about I still use labels though, "normal and fags"
>what do you mean I'm a hypocrite, it's different when it's me that's all1!1!1!1!
>>101212974
that was not me btw, but you are proving that anon's point. you behave like a faggot right now, make of this what you want, i don't care.
>>101212974
nta tho
>>101212955
>i said
>i
>>101212997
>that was not me btw
choose one
>>101210689
Kobold devs strike again. lol
>LostRuins
>Can we somehow have the option to keep using the precompiled vulkan shaders as well, instead of having to rebuild them from scratch?
>I think the overall benefit of having an easy-to-use vulkan setup outweighs the size increase of adding it to SCM.
>Would strongly recommend keeping the precompiled vulkan shaders if possible.
Also LostRuins on the latest kobold build:
>Basically the upstream llama.cpp cuda maintainers believe that performance should always be prioritized over code size.
>Unfortunately, there is very little I can personally do about this.
Petra on here, redditors trying to lecture on performance, kobold devs and shart in llama.cpp issues. How has gpuanon not killed himself, man? Guy can't catch a break.
>>101213006
you seem butthurt, like it was a personal thing for you. does my commentary on removing labels and all the id-pol shit hurt that much?
>>101213011
>How has gpuanon not killed himself man. Guy cant catch a break.
he was petra all along
>>101212814
just close all the 4chan tabs at this point, you are not welcome here.
>>101213021
>you seem butthurt
>seem
>"Mah feelings say so!"
But I'm not. don't try to use feelings in a debate, only troons do that. you're not a troon, are you?
>>101213071
moving the goalpost?
>>101213071
>moving the goalpost?
>>101213021
>you seem butthurt, like it was a personal thing for you
You're that clueless about your own irony, anon?
>>101213084
>no response?
>>101213089
>>101213101
again i am asking for the response
>>101213108
You are arguing in a bad-faith and disingenuous way.
>>101213101
and this is the case of you trying to fit in. i can't find the bait in anon's "no response?" comment.
>>101213108
>You are arguing in a bad-faith and disingenuous way.
More irony?
i win.
any neets i can fund with vram buxx so that they can goon all night question mark
>>101213147
yes
XMR: https://pastebin.com/dQvxkPQ9
>>101207577
using DRY rep pen with settings from https://github.com/oobabooga/text-generation-webui/pull/6053 fixes it
10/10
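For reference, the idea behind DRY as described in that PR is to penalize tokens that would extend a sequence already present earlier in the context, with the penalty growing in the repeat length. A toy, unoptimized sketch; the parameter defaults are quoted from memory and should be treated as assumptions, not the PR's exact behavior:

```python
def dry_penalty(context, candidate,
                multiplier=0.8, base=1.75, allowed_length=2):
    """Toy sketch of the DRY anti-repetition idea. If appending
    `candidate` to `context` (a list of token ids) would extend a
    repeat of length n > allowed_length seen earlier, return a
    penalty of multiplier * base ** (n - allowed_length), else 0."""
    best = 0
    # Longest context suffix that, followed by `candidate`,
    # already occurred earlier in the context.
    for n in range(1, len(context)):
        pattern = context[-n:] + [candidate]
        for i in range(len(context) - n):
            if context[i:i + n + 1] == pattern:
                best = max(best, n)
                break
    if best <= allowed_length:
        return 0.0
    return multiplier * base ** (best - allowed_length)

print(dry_penalty([1, 2, 3], 9))  # 0.0 (no repeat would be extended)
```

Unlike plain repetition penalty, this leaves individual common tokens alone and only bites when the model starts re-emitting whole earlier sequences.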
what a fucking loser he hasn't changed at all just stuck in this same rut. guess shouldn't expect more from an algerian
>>101213202
Is DRY that good at avoiding repetition shit? I never used it, guess I have to find out.
>>101211057
>what do you think the l in lmg means
Leroyyyyyy Jenkins!!!
picrel is totally organic gemma shilling btw
>>101213304
I think it's more because people have nothing more to talk about, new model releases are so sparse. still better than /sdg/ though, they had to wait almost a year to get a new base model (sd3) and it sucked ass...
>>101213328
yeah ik, but sdgfags already got pdxl v6, it's the best shit out there, for now.
>>101205994 (me)
So I have pitted deepseek v2 coder against sonnet 3.5 and gpt4o in my coding tasks, and it seems to be better than gpt4o (What is happening at OpenAI) and very similar to Sonnet 3.5. The only downside is the speed, it's kinda slow. Very good model and the price is unbeatable.
>>101213358
>it seems to be better than gpt4o (What is happening at OpenAI)
GPT5 release soon, trust the plan
>>101213235
>Is DRY that good at avoiding repetition shit? I never used it, guess I have to find out.
its good.
>>101213219
no one cares, kill yourself.
>>101213235
it's useless
are kobold devs at war with llama.cpp? they seem to be constantly attacking it with often contradictory accusations. what the fuck
Free yourself from the shackles of GGOOFING
>>101213505
oh good, because i *want* to have the 200 gb of CR+ on my hdd rather than a 45 gig quant
>>101213542
this. I won't run fp16 models, they are too big, and BitNet will make all this shit irrelevant anyway. we'll get 1.58bit models directly, no more quantization shit needed, it will be light, it's gonna be great (I know I cope a lot but please Meta make it happen :'( )
>>101213497
I think it's rather that because they created a fork instead of using llama.cpp as a library (like e.g. Ooba or ollama), they are more affected by upstream changes.
>>101213505
So you have to read all those hundreds of GB for the model and also have enough memory for the quantized version? Loading times with GGUF after the first load are near instant. I always download the full models anyways and re-convert/re-quantize. It just takes a few minutes.
>>101207871
Which XCOM?
>>101213358
I had the same experience, this is a very good model for serious tasks. Sadly the chat version is very dry and uncreative for writing. Maybe skill issue, I do not know. It doesn't feel slopped, it's just.. very dry. It doesn't come up with things.
>>101214216
>>101214216
>>101214216
>>101213966
Just like a real good coder, kek. they can't write harry potter book style because that's not their talent and they haven't worked towards that goal
>>101208129
>I can't help but wonder what tomorrow will bring
>>101212809
The Turing test has been proven flawed multiple times, and his supposed contributions to the invention of computers are vastly overstated. America beat him to it by 5 years; the only reason no one talks about it is because normies don't read declassified documents.
>>101212958
Homosexuality in animals is due to chemical imbalances, and that includes humans. That objectively makes them mentally (chemicals in the brain) ill (not correct or not normal)
>>101214624
not mentally ill in the sense that they are crazy though, they just have different preferences than others. it's like saying some guys are mentally ill because they like pineapple on pizza and people who like that represent a tiny %
>>101214624
>ill (not correct or not normal)
ill means something that represents a problem for the being, like schizophrenia, anorexia, depression. liking your own sex doesn't make you miserable at all though
>>101214635
>some guys are mentally ill because they like pineapple on pizza
correct
>>101214652
kek