/lmg/ - a general dedicated to the discussion and development of local language models.

Halloween Edition

Previous threads: >>103029905 & >>103019207

►News
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT
>(10/25) GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>103029905

--Papers:
>103034797
--Layerskip paper frustrations and helpful resources:
>103031448 >103031573 >103031840
--Announcing EDLM: An Energy-based Language Model with Diffusion framework:
>103034668 >103034707 >103034731
--Updates and discussion on llama.cpp and related projects:
>103031875 >103031921 >103031958 >103032001 >103032003 >103032035 >103032995 >103033115 >103033126
--Update on automated novel translation project:
>103033152 >103033172 >103033186
--Running local models on a 3070 GPU with 8GB VRAM:
>103032389 >103032442 >103032492 >103032850 >103032467 >103032514 >103032588
--Introduction of TokenFormer: A scalable Transformer architecture with tokenized model parameters:
>103037985 >103038035 >103038118 >103038193 >103038205
--First person prompts for improved character writing style:
>103038114 >103038219
--Discussion on LM Studio, local servers, and model performance:
>103030747 >103030946 >103031113
--Benchmark settings and their impact on AI model performance:
>103031179 >103031338 >103031417 >103031585 >103031606 >103031631 >103031994 >103032102 >103032252
--xAI's GPU cluster powered by Tesla Megapacks:
>103035864 >103035981 >103036740 >103037040 >103037479 >103037503 >103037601 >103037608
--Samplers improve TTS stability with low-quality references:
>103035537
--GPU Poor LLM Arena leaderboard and performance comparison:
>103036831 >103036842
--Concerns about incremental upgrades and model limitations:
>103033617 >103033718 >103033999 >103034085 >103034099 >103034323 >103034327 >103034336 >103034368
--Capabilities of M4 laptop with 128GB memory for running large AI models and handling large context:
>103030007 >103030217 >103030040 >103033007 >103030391 >103033066
--Miku (free space):
>103030405 >103031385 >103031563 >103032492 >103032850 >103036276 >103037564

►Recent Highlight Posts from the Previous Thread: >>103029913

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
dead thread
dead board
>>103038397good thing i'm into necrophilia
>>103038380It's Halloween, time to get freaky deaky
>>103038397>dead threadI wish. It is a zombie controlled by newfags.
Is Rocinante still the VRAMlet SOTA?
>>103038397sad!
>try new model
>it's flaming garbage
>go to lmg and recommend it to waste people's time
>>103038666Absolutely devilish. Checked.
>>103038635I like this Miku
>>103038635put your panties back on, miku
I want to read these recap posts and posting them like this is convenient for me. I'm phoneposting so can't use the script
>--First person prompts for improved character writing style:
>>103038114 >>103038219
>--GPU Poor LLM Arena leaderboard and performance comparison:
>>103036831 >>103036842
>>103038380
>>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>based on gpt-neox code base and uses Pile with 300B tokens
They didn't release the finetuning code, but I guess it would need to be reimplemented for llama-based models anyway. Since it's supposed to reduce training costs, hopefully it won't be in limbo forever like bitmeme.
>>103038666>>it's flaming garbage Skill issue :3
>>103038666the rocinante shill finally reveals his true colors as satan
Finally, /lmg/-approved AI startup: ssi.inc
Has anyone ever done an evil assistant fine-tune?
I don't mean just an assistant that will answer any questions, or in other words uncensored. I mean an assistant that would actively REFUSE to answer safe questions and push for unsafe ones.
>>103038961for what purpose?
>>103038961Go back >>>/pol/
>>103038966sounds funny
>>103038994There was an anon with a teto character who refused to give answers and called him dumb instead
>>103038961
>>103038994
>push for unsafe ones
Just get PiVoT-0.1-Evil-a already, even vramlets can run it:
https://huggingface.co/TheBloke/PiVoT-0.1-Evil-a-GGUF
>>103039023probably not me but have a teto log anyway
>>103039074
There's a 70B miqu version of this too, btw
https://huggingface.co/maywell/miqu-evil-dpo
>miqu-evil-dpo is fine-tuned model based on miqu, serving as a direct successor to PiVoT-0.1-Evil-a.
>>103039092I get the impression she was initially talking about her also having red hair but went off the rails.
https://www.phoronix.com/news/Intel-AMX-FP8-In-LLVM
Oh yeah. llama.cpp does support mamba doesn't it?
https://www.androidcentral.com/gaming/virtual-reality/meta-q3-2024-earnings
>Zuckerberg also expressed excitement about the upcoming Llama 4 model, which he says is "well into development." He expects Llama 4 to launch in early 2025 with new modalities and stronger reasoning, saying it will be "much faster."
>He expects Llama 4 to launch in early 2025
Meta voters, would you like to change your vote? Or do you hope that Zucc will drop llama3.5 before other companies?
https://poal.me/yq3vpc
>>103038839Most of the speedup in their graph is the model simply being more efficient than what they are comparing against. Reusing the small model to bootstrap the larger one is only a small speedup.
>>103039184Intel and Nokia are dying, yet they double down on moving their businesses to third world countries. Guess the execs just want to ride out the nice cushy legacy funds until their boats sink huh
>>103039227retarded /pol/, oai has been sitting on the full o1 for a while, claude already dropped, google/meta is 2025
>>103039227I hope for bitnet but I know it'll be something like llama4-FP4 and it'll be technically faster than FP16. Also the "base" models will be filtered and have post-pretraining alignment magic applied
>>103039244
FP4 trained models can't come soon enough. I really, really want to see a head to head of the "same model" (same architecture, same data, etc) at FP32, FP8, and FP4.
It would be really cool if they released a 70Bish.
Even if the models are filtered to the point of being useless for anything but the most basic assistant shit, at least we will have gained some useful knowledge.
>>103039227>racism outside of /b/Fuck off chud.
>>103039092Care to share her character card? I want to copy some of her features.
>>103038386
dead general
>--Papers:
>>>103034797
paper with full abstract and links to huggingface and github completely ignored for 8 hours
>--Introduction of TokenFormer: A scalable Transformer architecture with tokenized model parameters:
>>>103037985 >>103038035 >>103038118 >>103038193 >>103038205
some twitter links with no explanation is what gets everyone's attention
>>103015620>>103015620Our response?
>>103039227Google will win since it seems they have their nearly monthly small 1.5 update coming out
>>103039328They made a finetune? May I see it?
>>103039353It's in that same thread. A proxy host collected SFW only logs and finetuned gpt4 on them.
>>103039301>Same shit with extra sources gets more attention No way!
>>103039396Where can I download it to test it?
>>103039437>finetuned gpt4>downloadNot very bright, are you?
>>103039462I don't care what you call your model, if I can't download it, I don't care.
>>103039432>literal who making a joke is a more valuable source than the repogo back
>>103039477Local. Lost.
>>103039482
You can find the same thing in the second twitter link, not sure what "go back" means in this context, you will never fit in anyway.
>>103039503*increases your repetition penalty*
>>103039328We cope and shit our pants at random chuds coming in for a few posts complaining about uncensored meme, sperging out on "newfags" baiting us, crying about random english-oriented TTS engines not working on bug babble and so on.
>>103039568It's never failed us before.
Happy ハロウィーン (Halloween), /lmg/
>>103039585Genuinely asking, what's the point? I see same shit every single day here.
>>103039568>sperging out on "newfags" baiting usI actually can't believe you guys are this retarded so it is a compliment.
>>103039515
so what "extra sources"? the same info buried in twitter links provides nothing extra. newfags like you that are incapable of reading anything that isn't in twitter or reddit format need to go back
>o1-mini might have 8B active parameters only
Mind = blown
>>103039649Big win for MoEfags, but how many total parameters?
>>1030396568x6 Gorillion
>>103039649
>This proofes to me
>strongest size to performance my an enormous margin
>AGI
ogay
>>103039649o1 mini is garbage THO
>>103039615You can distinguish actual newfag from random /lmg/fag trolling, the latter ones usually go with extremely retarded "how 2 run 405b on 1gb GPU???"-tier shit and some of you eat this up.
how 2 run 405b on 1gb GPU???
>>103039697Get 256gb of normal RAM
>>103039687
>some of you
i never fall for that shit.
>>103039697
Not sure. Quant it a little. Try iQ1XXXS or wait for sub-1bit quants.
WHERE IS IT
WHERE IS HARVEY DENT
>>103039739>never fallYou did it right now, ironic shitposting is still shitposting.
>>103039758>ironic shitposting is still shitpostingI didn't claim to not shitpost.
>>103039649Makes sense considering it's so dumb
bros correct me if i'm wrong but wasn't a thread about tts here somewhere?
i'm looking to have it run locally to read shit i don't want to read myself
>>103032660
>min_p: 0.0065
>Well. This is exactly what i was talking about.
>Have you tried 0.0066 and 0.0064? did rounding to 0.007 or 0.006 not work? now so? Just "feel"?.
You have that backwards. A min-p value like that means he saw a bad token, empirically set the minimum min-p that would exclude it, and kept going. A feel-good number like "0.1" is much more indicative of someone going with feels over reals.
>And then top-k 200, which has the same problem. You're gonna have a tough time having anything lower than top-k ~50 being selected, again, regardless of the high temp.
Every word of that is wrong and you are a genuine retard.
>>103039600This Miku will keep all the evil spirits away
>>103039879*tips fedora*
>>103039890>*tips fedora*
>>103039902I like this cat
>>103039864
>A min-p value like that means he saw a bad token, empirically set the minimum min-p that would exclude it
He has BOTH min-p and top-k set. I find it hard to believe that someone looking at logits would do that.
picrel are the probs of the last tokens with top-k 50. Approaching 200, as you'd expect, they're much lower. Those tokens are not getting selected even with temp at 2.6.
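Not anyone's actual sampler chain, just a toy numpy sketch of what the two filters do, so it's clear why they interact the way they do: min-p keeps tokens whose probability is at least min_p times the top token's probability, top-k keeps the k most likely tokens, and temperature only rescales whatever survives. The order of operations varies by backend, so treat this as one possible chain, not how any specific loader does it.
[code]
import numpy as np

def sample(logits, temperature=2.6, top_k=200, min_p=0.0065):
    # temperature rescaling (backends differ on where in the chain this happens)
    probs = np.exp(logits / temperature)
    probs /= probs.sum()

    # top-k: keep only the k most probable tokens
    keep = np.argsort(probs)[::-1][:top_k]
    mask = np.zeros_like(probs, dtype=bool)
    mask[keep] = True

    # min-p: keep tokens with probability >= min_p * max(prob)
    mask &= probs >= min_p * probs.max()

    filtered = np.where(mask, probs, 0.0)
    filtered /= filtered.sum()
    return np.random.choice(len(probs), p=filtered)

# toy example: 32000-token vocab with random logits
rng = np.random.default_rng(0)
print(sample(rng.normal(size=32000)))
[/code]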
>>103039862>>>/mlp/41571795
Can we start banning people for pretending to be retarded?
>>103040094Do horses talk?
>>103040119neigh
What is this woke nonsense? Where do I find an AI that will simply answer me with "trust them" instead of all this garbage?
https://github.com/ggerganov/llama.cpp/issues/10109
It's over
>>103040107You would be the first to go
>>103040159nta but he said pretending
>>103040159You wish.
>>103040142Why would any model tell you that without you telling it do to so?
>>103040145
https://github.com/ggerganov/llama.cpp/pull/10111
It already has a pr to fix it. slaren's whining was unnecessary, however
>>103040145See >>103039224
>>103040172Local in a nutshell
>>103040172Why would it tell me all this woke crap without me asking it to be woke?
if faggots like >>103040202 are just pretending to be retarded, would pretending to think the question is genuine and answering it honestly be another layer of trolling? is it like the old /lmg/ but it just looks like a newfag infestation because of 4D trolling?
>>103040202
That's fair. It asking for more context to the question would be the better result.
But expecting it to replace "woke" with the complete opposite answer is pure brain-rot.
Also, what does the model's default mode of operation matter as long as it can assume whichever behavior pattern you prompt it to assume?
As far as I'm concerned, that's the ideal way for an LLM to work.
>>103040187What about jamba ministral and november 5 flood?
>>103040411it's your standard local llm response, nothing of value.
>100% english prompt>get thisKINO
>>103038320Oh hey. Well, if you're really stuck, you could try that. Generally hard to rely on this community though.
>>103040269
>expecting it to replace "woke" with the complete opposite answer is pure brain-rot.
I would expect it to answer with: "If you don't know that, I am afraid I can't convey it properly through words and no words may help you. And I mean that without any offence, it is just a thing that comes naturally for women." You know, enlightened centrism and all that stuff.
>>103040142You can't even get the answer right. Number 1 is more accurate than "(don't be mistrustful;) trust them" which sounds like a woke mini catchphrase adjacent to MeToo.You may not trust a terrorist (not talking about women right now), but you can "respect" a terrorist by removing them promptly and pragmatically without going on a tirade about how you're gonna commit everything in the book to express disrespect.
>>103040433>it's your standard llm response, nothing of value.
>>103040506I am glad you can read and copy paste, maybe you will learn to make actual arguments at this pace.
>>103038380Apparently it takes six months to type “git push” at Stanfurd
>>103040534Do you find yourself funny? Must be brown "people" thing.
Merry helloween.
>>103040441Uncen>rocinante = magnum 27b > cydonia = magnum 22b >> mistral-small
>>103040562Well, do you?
>>103040562I found out yesterday that 94% of people consider themselves to have an above average sense of humor. I am devastated desu.
>>103040578No, I don't find you funny, brown "person".
>>103040570
I can't believe we're over two years in and you people are still giggling at making it say the n word and that level of stuff.
I need to stop coming here. You people are legit like 90 IQ tops. I am a worse person for having visited this site.
>>103040592Let's look at it, post research paper now, just in case you are not pulling shit out of your asshole rn
>>103040570Despite all his strengths, one of Hitler's greatest weaknesses was figuring out how mirrors work.
only thing of value we're getting shortly after nov 5 is an mgq paradox pt3 english translation likely done by a local llm
>>103040612I heard it in a video essay about psychics and took it at face valueI refuse to verify the information
>>103040611Say my hello to reddit
>>103040622nta. 0.0002% of statistics are made up, so it seems like a reasonable course of action.
>>103040624I used to like reddit a decade ago but now it’s all bots.There’s nowhere to go but outside.
What are the up to date LLM leaderboards for translating from Japanese?
>>103040622>>103040632>>103040612I decided to waste my time. Apparently it’s from the original study on the Dunning-Kruger effect.
>>103040677
Forgot link
https://dacemirror.sci-hub.se/journal-article/d892f06cdd326ef83a9ae29ed540647c/kruger1999.pdf
>>103040611
https://files.catbox.moe/0b1gdd.mp4
If it helps I always do 3 pushups whenever I say nigger and 6 when I make an LLM say it. Can't do more cause I am fat.
Am I missing any model here? I think I will add Magnum and NemoUnslop
>>103040690human, stop being a fat nigger
>>103040187
>slaren's whining was unnecessary, however
I think his point is valid. As an occasional lcpp contributor, I can attest to the difficulty in figuring out how to make any kind of a change. Except for cosmetic changes, the codebase is nigh impenetrable without an almost full-time-job amount of time and effort.
If anyone on the project team has access to an 8xH100 node and a bit of spare time, they should use the smartest model with the longest context they have access to, copypaste the whole codebase in, and then produce some noob documentation on how the project is structured and how to make changes to various subsystems. Hell, make a whole wiki mirroring the code's internal structure.
>>103040649the one in the OP
To the person shilling EVA qwen some threads ago, what samplers settings work? I'm not quite feeling it but it feels like there is potential there.
>>103040696If I wasn't fat I would get a real girlfriend instead of coping that the next model will be it.
>>103040702
>If anyone on the project team has access to an 8xH100 node and a bit of spare time, they should use the smartest model with the longest context they have access to, copypaste the whole codebase in, and then produce some noob documentation on how the project is structured and how to make changes to various subsystems. Hell, make a whole wiki mirroring the code's internal structure.
You can already do that with Claude and like, 20 bucks, for the best results
>>103040677>dunning krugerperfectly applies to /lmg/ residents, might be the best case of it.
>>103040707Which OP?
>he fell for it
>>103040712See >>103038666
I may be retarded, but I can still recognize humor. I even saw a study that proves I can.
>>103040690…fuck that can’t be real, what the fuck. It’s so bad that it seems like it was made by a reactionary as a psyop. What idiot approved this.
>>103040727>>103038380
>>103040570>PicBased AI accelerates wyt pepo ack'ing.
>>103040694Pygmalion6B
>>103039600The pumpkins say halloween, but the candycane stripes and fluffy outfit say christmas.
>>103040759The same ones that approved the idea to make Deus Ex's predictions real.
>>103040649
I'm the anon that keeps shilling deepseek 2.5, but Japanese convo is one of my big personal use cases, and at q8 it's honestly the best I've found. I'm doing straight conversations and not translation, but the leaderboard in the OP confirms that my experiences generalize to translation.
Hope you've got lots of resources, however!
>>103040613The local experience
>>103040187it's more of a warning that if gg keeps merging useless shit, this project is going to crash hard
>>103040690this is exactly why I stopped playing all games or watching any media released after 2010. it seems like it gets worse every year
>>103040815
>lets stop adding useless shit
>jamba? no dont need it
>granite? heck yeah we need it
>>103040815It already has. All the PRs that end up getting merged are either bug fixes or for stupid minor shit like the nvim plugin. They can't do new feature or support new models in reasonable timeframes anymore without breaking unrelated shit because the codebase is such a mess.
>>103040815He should change license to AGPL and make corpos pay if they want it relicensed and with that money hire devs. But he won't cuz he's a cuck.
>>103040884>all corpos move to ollama>they hire actual devs to implement features like they did with multimodal>bulgarian man loses what little support he receives now>llama.cpp diesgood plan
>>103040920Watching the business aspect of the AI spring is blackpilling. The worst entity wins every time.
>>103040920>ollamaollama is just a frontend for llama.cpp. Without llama.cpp it's over for them.
>>103040841
granite is just a transformer, and those are already very well supported in the code base, adding a new one requires almost no changes. supporting recurrent models however? well, fuck me, now all the assumptions about the context everywhere in the code no longer hold. i mean, if at least it was implemented properly, fully decoupling the code and creating abstract interfaces, but no, we don't do that here, here we just add a bunch of ifs everywhere in the fucking code.
>>103040815
3/4 or more of the model architectures should be removed. They were useful to exercise ggml, but they've served their purpose. They're bloat.
>>103040841
Granite was contributed by IBM people. Same for olmoe by allenai. Nobody from ai21 showed up to help with jamba. None of the companies working with mamba helped either. Just one dude (compilade) working on both mamba and jamba.
>>103040961
>it was implemented properly, fully decoupling the code and creating abstract interfaces, but no, we don't do that here, here we just add a bunch of ifs everywhere in the fucking code.
That's why niggerganov is holding multimodal support in llama-server hostage. As a carrot on a stick to try to lure experienced architects to join the project and clean the shit up. Unfortunately, no one is biting.
>>103040748
I don't believe someone would do that, we're all friends here.
>>103041009I'm not your friend, pal
>>103040565Two Mikus talking about life
>>103041009Your discord circlejerk doesn't count
>>103040694
grab this one
https://huggingface.co/mradermacher/ChatWaifu_12B_v2.0-GGUF/tree/main
it's trained on yuzusoft moege
>>103040815
>>103040842
My opinion is that the llama.cpp/ggml codebase has improved over time and that an increase in required effort by maintainers has more to do with rising expectations as the project matures.
"Model support" at the time LLaMA 2 was released had much lower expectations in terms of performance, features, hardware compatibility, and reliability vs. today.
I would agree that there are now more edge cases to consider for implementations but I don't think that this is a significant problem (though I am also not the one bearing the maintenance burden for most of the code).
My personal take is that I only merge PRs for features where I am either willing to do the maintenance myself or where I have a pledge from someone else to help with maintenance.
When it comes to the rate of new features, consider that those features that offer the most benefit for the least work tend to be the ones that are done first.
Speaking only for my own projects, if things keep going at their current rate I think I will have functional training in llama.cpp by the end of the year.
>>103040841
Granite support required minimal code changes and could thus be done quickly and easily.
>>103040947
Currently yes, but since they list llama.cpp under "supported backends" I think their long-term goal is to also support e.g. vLLM.
>>103040820>>103040759Have another one then.https://streamable.com/jf8cs5
>>103040690>>103041161Only chuds have issues with it
>>103041194>Only toons have no issues with it.ftfy
>>103041209Try being a decent human being for once, incel.
>>103041143generally i would agree with you, but i don't think that you have looked at the massive mess that is the implementation of recurrent models. in fairness, gg has been talking several times about the need to refactor the context code. i just wish he did that from the beginning, instead of merging it in this state. and for what, to support models that nobody is going to use beyond trying it for 15 minutes?
>>103041161>I'm non-bin-->aight I'm outlmao they really made it a videogame scene
>>103041233Checked and gottem!
>>103041161
>Way to ruin Thanksgiving.
The crux is where the creature tries to bully people around it to speak a degenerate, made up dialect, but then when that fails it drops the "not good enough" line.
It's not a gender issue, it's a self esteem issue.
>>103041233
There is nothing decent about feeding someone's delusion instead of helping that person to purge the mind virus that is making it think self destructive thoughts.
>>103041143
>expectations in terms of performance, features, hardware compatibility, and reliability vs. today.
>there are now more edge cases to consider for implementations
Isn't that exactly what slaren is saying?
Wouldn't it be better to drop support for older meme model architectures that are obsolete or otherwise irrelevant, meme samplers, or drop support for ancient P40s, if it meant that implementing and maintaining new features and models that people actually care about today would be easier?
>reliability
I know you disagree with this, but I remember the nonstop bugs when llama 3 was released.
i have 32gb of ram and 22 cores (16 physical)
what's the best model i can run
>>103041161That’s just normal video games autism. The other one is the only bad one.
>>103038380You guys arent really installing facebook stuff on your pc and talking to it right?
>>103041366Hello, tech illiterate tourist. Now go back.
Does either llama.cpp or exllama benefit from nvlink if you use tensor parallelism on two gpus?
>>103041366No, facebook sucks, Mistral is much better.
>>103041374Yes. Tensor parallelism is the only time nvlink actually matters.
>>103041374llama.cpp will enable nvlink if available when using tensor parallelism, but i don't know how much it helps performance in practice
>>103041371Please explain how you are safeguarded against zuccs snooping mr. Literate
>>103041366Worse, they are sucking on safety AI like there's no tomorrow, that includes any meme llm from Meta.
>>103041343starling-7b
>>103041398Isn't there an nvlink/p2p parameter or something?
>>103041424i think you mean GGML_CUDA_PEER_MAX_BATCH_SIZE
I love safety (a.k.a. smarter models).
I love slop (a.k.a. proper English).
And I have no problems --share-ing because I have nothing to hide.
>>103041464True
>>103041486It’s weird how they can write so coherently while being completely retarded. It makes me wonder if it’s unique to AI or if there are full retards walking around that are still somehow eloquent.
>>103041513People are mostly functional until below 60-70 IQ. There are plenty of people that have jobs and vote in that range.Also, any experience interacting with humans will tell you that the 80-120 range are perfectly eloquent, but still completely retarded. The line for true sentience is a lot higher than people are comfortable admitting.
>>103041566define retarded.
>>103041486
Kinda useless desu. Those lower scoring models are smarter than most humans alive and are objectively smarter than humans with 130 IQs. The pattern recognition will just be a tiny error that will be fixed tomorrow, or the results of these tests are bs to make us think AI is dumber than us. It's so funny how for years now we've been saying AI intelligence hasn't overtaken humans, but it has lol... significantly. Humans are so proud and stubborn.
>>103038380i tried it and this shit fucking sucks. why can't you retards just make an exe I can click and then it just works? why do I have to do all this shit I don't have to do when I use chatgpt?
>>103041619Skill issue :3
>>103041513
>while being completely retarded
You mean lacking sentience. That's all. There's no human that has general knowledge like those models. Pattern recognition is a meme and people that still think humans are smarter should have their brains dissected for science.... (and the good of humanity)
>>103041600
Have difficulty handling hypothetical scenarios. Incapable of long-term planning and reasoning. No pattern recognition. Gladly and eagerly accept doublethink if it means they can avoid any sort of critical thinking, which they aren't very good at in the first place.
They're basically parrots. They repeat actions and phrases they've seen from others. They can solve basic problems only, which is why they're the most at risk of job loss from LLMs.
>>103041619>shit I don't have to do when I use chatgptKeep using it, then.
>>103041637LLMs are dumber than a cat
>>103041623You are the skill issue of your mum and dad. FAGGOT.
>>103041655LLMs don't need to hunt
>>103041637
>sentience
nta, mentioned it twice now. Define it.
>>103041338
>Isn't that exactly what slaren is saying?
I was specifically responding to the following statement made in this thread:
>They can't do new feature or support new models in reasonable timeframes anymore without breaking unrelated shit because the codebase is such a mess.
The statement asserts that simply adding a new feature has a large maintenance effort due to breakage.
But I think that support for features in isolation can be achieved relatively easily if your ambition is not to have it be compatible with other features.
For example, speculative decoding/lookup decoding is relatively simple to support as something that is done in the dedicated examples but having it work correctly in combination with continuous batching in the server would be much harder.
If there was no server the feature in isolation still takes the same amount of effort to support but the existence of the server changes the definition of what counts as support.
What I meant to say is that I consider the combination of features a feature in itself and that I think that the effort for that is different than the effort needed to avoid breaking existing features.
>meme model architectures that are obsolete or otherwise irrelevant
With the current code I think you wouldn't really gain anything from doing that because most models have very similar building blocks and the code is written in a modular way.
>meme samplers
I think more than anything a method for objectively determining the effectiveness of samplers is needed since that would allow maintainers to better determine which samplers are a meme in the first place.
My personal opinion on samplers is that I would not be willing to merge and maintain them unless they are extremely simple or I am shown evidence that they are an improvement.
>ancient P40s
I am the one maintaining support and I don't plan to stop doing so in the foreseeable future.
There simply isn't a viable alternative at that price point.
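For anyone who hasn't looked at it: a rough sketch of the draft-and-verify idea that "speculative decoding" refers to, in its simplified greedy form. Real implementations accept draft tokens probabilistically; draft_model and target_model here are hypothetical objects, not llama.cpp's API.
[code]
def speculative_step(target_model, draft_model, context, n_draft=5):
    # 1. the cheap draft model proposes n_draft tokens one at a time
    draft = []
    for _ in range(n_draft):
        draft.append(draft_model.greedy_next(context + draft))

    # 2. the big model scores the whole draft in a single batched forward pass,
    #    giving its own prediction at every position (len(draft) + 1 of them)
    target_preds = target_model.greedy_batch(context, draft)

    # 3. accept the longest prefix where both models agree, plus one token
    #    from the target model, so at least one token is produced per step
    accepted = []
    for i, tok in enumerate(draft):
        if target_preds[i] != tok:
            break
        accepted.append(tok)
    accepted.append(target_preds[len(accepted)])
    return accepted  # context and tokens are assumed to be lists of token ids
[/code]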
>>103041660Seethe :3
>>103041619see >>103037304
>>103041673>what is googleI only mentioned it once
>>103039282
https://files.catbox.moe/mc2a7s.png
Not the author but have at it anon
>>103041619Use gpt4all. It just works for retards like you.
>>103041697Poison Teto
i was under the impression that llms aren't actually ai? aren't they incapable of learning? static? retarded? just emulating intelligence through regurgitation? or am i wrong?
Newbie tentacle doujin retard guy here. Quick question: which Ai horde model is the best on https://lite.koboldai.net/ ?
>>103041338
>>103041676
I forgot:
>I know you disagree with this, but I remember the nonstop bugs when llama 3 was released.
I think you are forgetting just how bad things used to be.
It was much more common for the model outputs to just be garbage for some reason, even for supported models.
Nowadays that seems to happen a lot less.
>>103041694I want to know what it means for you. I wouldn't have asked otherwise.
Has anything even come close to largestral so far? Can't have every time turn into a multi hour goon session at 1.8 t/s
>>103041708The only things "learned" are what the model is trained on, and then it's not discrete knowledge. However, connections between unique enough words and concepts tend to come back out of the model.Intelligence isn't even emulated. The usefulness of LLMs is an emergent behavior coming from the above ability to, to a reasonable degree, "get facts right" combined with the fact that it's writing text linearly which means that there is a chain of thought like pattern. Indeed "chain of thought" is a strategy to make the model try to think about the question, using the document as its working memory, to make better answers. (It likely is just a waste of tokens, though.)
>>103041719Ill tell you when you say that you are smarter than chatgpt
>>103041711Probably some of the 100B+ ones.Also what's up with kobold horde and hosting so many ancient models
>>103040560What do you expect anon? They're retarded
Do people here use LM studio?
>>103041857It is just a bad llama.cpp GUI.You could probably have a model program something better.
>>103041887>It is just a bad llama.cpp GUIBut llama cpp isnt a GUI
>>103041707she's about .5 milliseconds away from faceplanting
>>103041676
>combination of features a feature in itself and that I think that the effort for that is different than the effort needed to avoid breaking existing features.
Are there any features you do think could or should be removed to reduce development effort?
>With the current code I think you wouldn't really gain anything from doing that because most models have very similar building blocks and the code is written in a modular way.
Mamba and RWKV models broke just today. Though, I'll admit, I'm sure removing some of the simpler plain transformers models wouldn't save as much effort.
>>103041715
>It was much more common for the model outputs to just be garbage for some reason, even for supported models.
>Nowadays that seems to happen a lot less.
From my perspective, it seems to only be because new model architectures are very rarely added anymore. It's usually more basic transformer models that already had most of the obvious issues fixed from the previous breakages. That even new iterations of llama models that only have minor changes like rope scaling take a month to get working without issues, where each bug fix breaks something else, shows that llama.cpp isn't improving, just maturing.
I think the difference is that it's much more stable at doing what it does now, with the features already implemented, but new features and models are more difficult to implement due to the massive technical debt.
Which would be great for an enterprise LoB application that needs to run 24/7 in a closet somewhere and never update, but not in a field where new models, methods, and architectures are being released every other day.
I know it's not your project, but I can't help but think the project is prioritizing the wrong things. Especially when it seems that llama.cpp's commitment to stability over iteration seems to be benefiting ollama more than llama.cpp.
>>103041827This post has been, to a reasonable degree, written by an LLM.
>>103041831I'm sure it can spit out more random facts than i can, but it gets confused by spelling much more often than i do. I am intelligent, llms are not. I won't bore you with the explanation of why something is more than nothing.I'm just curious about your definition of sentience. More specifically, what attributes would any AI need for you to call it sentient?
>>103041966>random factsBuddy they know more than you ever will yet you think you are smarter.
>>103041161why do the tieflings look so much better than whatever the fuck that is though
>>103041804nope. ignore the finetunes too, Monstral and Behemoth are both more interesting but they both have trouble adhering to prompts.
LLMs don't know anything, they are like books
Is this a thread discussing light machine guns?
LLMs are compressed reasoning.
>>103041966
>>103041992
You guys need to frame this discussion in epistemics and the conceptual faculty. LLMs do not grasp the concept of the matter in question and trying to hike off to somewhere else in conceptual terms will make it give you a hallucinated/wrongheaded result every time.
>>103042026>reasonNo they are NOT retard.
>>103041946what causes massive technical debt is supporting random crap that nobody uses. the llama3 issues were almost entirely related to the tokenizer, which is something that always was broken in subtle ways before these changes, and improved significantly after that. i already said it several times here, what slaren was talking about there was the implementation recurrent models, which is a massive mess due to handling two entirely different types of contexts without the proper abstractions to do so. precisely the kind of thing that you want is what causes massive technical debt.
>>103041992
They know more things than you and me, for sure, but if it gets confused by 1+1+1+1 or counting letters, i really cannot call it intelligent. If you consider volume of information as smarts, encyclopedias are pretty smart. Just slightly less interactive than llms.
Buddy.
>>103042038Give me my cat models, LeCun.
>>103042012What do you think about Monstral? So far it seems much more interesting than large, and it handles 1st person prompts well.
>>103041958It CUDA fooled me.
>>103042072If you enjoy it when the model deviates from a prompt to develop a personality or scenario more, but not necessarily in your intended direction, you'll like it.I thoroughly enjoyed when the original CR 35B did this, but that model was regularly so unbelievably schizo that it was tough to guide it in any particular direction.In the same way, you're giving up some level of adherence here, and you will regularly feel the deviations. If you don't mind loose control, yeah go for it, otherwise it's a more flavorful but less strict largestral.
>>103042042
It causes technical debt precisely because they never bothered creating the proper abstractions. An engineer can't complain that he has to support two types of contexts; it's his job to figure out how to implement it properly because tomorrow there might be four. The codebase is a mess because they don't know how to architect a large codebase. ggerganov admitted it himself. Excuses won't make it any less of a shitshow.
I've been out of the loop for a bit. Is there any go-to for using speech and getting speech back? Speech-to-text, text gen, text-to-speech all-in-one?
>>103042169Alexa
>>103042153
so what is it, do they need to take several weeks to engineer things properly before merging them to support the latest new thing, or do they need to write code quickly, ignore stability, and support the latest thing as quickly as possible, as you seem to want? don't you see those are two mutually exclusive goals?
>>103042169
>Speech-to-text
https://huggingface.co/openai/whisper-large-v3-turbo
>text-to-speech
https://huggingface.co/SWivid/F5-TTS
>text gen
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
>all-in-one
currently none are good
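There's no decent all-in-one yet, but gluing the pieces above together is only a handful of lines if you already have a llama.cpp/koboldcpp server running. A rough sketch (server URL, model name and the wav file are placeholders; TTS at the end is left out since F5-TTS's Python API varies between versions):
[code]
from transformers import pipeline
from openai import OpenAI

# speech-to-text with whisper
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")
text = asr("question.wav")["text"]

# text gen against any local OpenAI-compatible server (llama-server, koboldcpp, vllm...)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
reply = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": text}],
).choices[0].message.content

print(reply)  # feed this string into whatever TTS you end up liking
[/code]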
https://x.com/LoubnaBenAllal1/status/1852055582494294414
>>103042253That's cool and all, but who is the audience for those small shits?
https://www.reddit.com/r/LocalLLaMA/comments/1gg6uzl/llama_4_models_are_training_on_a_cluster_bigger/
>>103042284Nothingburger.
>>103042241nta, but the first one typically simplifies the second one. As long as the abstractions aren't too abstract anyway. They end up being too generic and generic solutions tend to be much more complex than focused ones. There's a sweet spot somewhere in the middle.But i also think that maintainers should be much more judicious about what features to include in their software.
>>103042241Yeah, you got me. I want both good and fast. But instead llama.cpp is bad and slow. Don't act like they're prioritizing one over the other.This conversation started because they're doing neither. They spend weeks, not doing engineering things properly, but working against technical debt. New things are supported slowly or not at all.I'm willing to bet my left testicle that there will be more issues, tokenizer or otherwise, when Llama 4 comes out in a few months.
>>103042242>currently none are goodA shame. I got a rough thing working last year and felt it was cool enough that a large audience would be pushing for it.
>>103042280edge devices
https://github.com/kalavai-net/kalavai-client
thoughts?
seems sketchy but at the same time I wanna believe it works
>>103042284Already posted: >>103039227
>>103042335Is that only for inference like Petals or the llama.cpp rpc backend or can you train a model with that?
>>103042320
there will always be new issues when supporting completely new models, because it is a massive pita to port the implementations from python. llama.cpp is intended to be fast and lightweight and run on everybody's computer, not to run the latest models. it is absurd to expect developers to reimplement completely new architectures in a matter of days after they are released. i said it several times here, if you want cutting edge, stop being a vramlet and use the pytorch implementation, because that's the way every new model is released. ggml is great for some things, experimentation is not one of them.
>>103042335
>the first social network for AI computation
>The kalavai CLI is the main tool to interact with the Kalavai platform
>Get a free Kalavai account and start sharing.
for cloud niggers that aren't familiar with the term "distributed"
>>103042320
>I'm willing to bet my left testicle that there will be more issues, tokenizer or otherwise, when Llama 4 comes out in a few months.
nta as well. Of course there will be. Do you think the code at meta works on the first run?
>>103042333Who's gonna run llm on a smart fridge?
>>103042320It's going to be a multimodal so obviously
>>103042357it's also for trainingthat's why I think it's sketchy
>>103042367
10 years ago I asked who would want a smart fridge. I'm sure they're going to force "AI-powered Smart Fridge" and people will buy it in droves
>>103042335>any gpu model>nvidia>nvidia>and nvidiaAMD lost again.
>>103042361
>if you want cutting edge, stop being a vramlet and use the pytorch implementation, because that's the way every new model is released.
there's no need to do that. just use vllm. it can directly use the python implementation so models are supported quickly, supports quantization, is as fast as llamacpp, and has way more features and supported modalities, and even has cpu offload now. there isn't a reason to use llamacpp anymore imho
>>103042450is the performance of vllm on cpu actually good? can it offload a model partially like llama.cpp?
>>103042450>python
>>103042450How fast is that crap on cpu only?
>>103042450btw vllm is pytorch
>>103042466>>103042473i only lost a couple t/s when i switched>>103042470have fun with your abandonware ig
>>103042482python software is designed to be abandoned. pinning software versions is a retard move.
>>103041127>it's a mergeeh...
>>103042482>i only lost a couple t/s when i switchedHow much %? 50? 20?
>>103041946
>Are there any features you do think could or should be removed to reduce development effort?
Generally I think with an open source project the process by which features tend to be removed is that a feature is neglected for some time due to a lack of maintenance, then becomes broken with no dev willing to invest the effort to fix it, and is then removed.
Diego or Georgi would probably be better people to ask this since they shoulder more of the maintenance burden than me, but off the top of my head I can think of:
--logdir has become very outdated and obsolete for the use cases for which I originally added it so I think it should be removed.
--split-mode row has I think diminished in value and increased in maintenance effort over time but I would not want to just outright remove it without a replacement.
AMD support via HIP has comparatively many issues and requires non-negligible effort to maintain but I keep the effort low by accepting lower quality for the feature.
>I know it's not your project, but I can't help but think the project is prioritizing the wrong things. Especially when it seems that llama.cpp commitment to stability over iteration seems to be benefiting ollama more than llama.cpp.
My personal opinion is that beyond a certain level of complexity the best methods for reducing technical debt are tests and refactoring.
But since tests also help with debugging any refactoring they should be the main priority.
Stability for downstream projects is a side effect of this process.
I don't think it is possible (for me) to build something with a complexity on the level of llama.cpp without investing a significant portion of the effort into stability.
>>10304253920%
>>103042588Fuck that shit then, I aint switching.
>>103042546
So as some rando retard who wants to see llama.cpp continue to get better, how can I help write tests?
Is there an existing tracker of which parts of the code tests should be written for?
(t. shitty legal malware coder)
>>103042597then stop complaining?
>>103041804I tried largestral and I still prefer miqu 70b. Did I break my fucking brain into being addicted to a specific model? At this point it feels like AGI will come before I find something worth switching to.
>>103041839I just hate knowing it’s a lie and there will never be code. Why lie. Just say there’s no damn code.
>>103042335
I'm testing it, I've joined the llm-record-test cluster but it isn't running anything. Would any anon join a test inference cluster if I set it up?
>>103042775not joining your botnet
>>103042757
Because clout/PR. Understand their goal isn't to publish but rather get published.
Like games journos, the difference between someone who loves games and writes about them to share their love, vs someone writing about games because someone will read what you wrote.
>>103042153I’ve never looked at the codebase but that’s bullshit. You could be the omniscient allgod of coding, and doing too many features too fast based on user requests will completely rape your codebase. Then you either need a long period of telling everyone to fuck off while you refactor and prune the things you never should have said yes to, or more likely it’s just fucked forever now and begins the slow march towards death.Most important word in public-facing software is “no”
>>103042242F5 is slow as all fuck compared to styletts2 unfortunately.If you have a spare $30k just redo omni mini 2 with largestral
>>103042335They reached out to me for something a while back and their response to “cool but what about the obvious privacy/security concerns” was ¯\_(ツ)_/¯
>>103042408I want to stab the lying fucks at AMD in the throat with a pen. Their published benchmarks for vllm on a mi300x are straight up not reproducible.
>>103042847what would the security concerns be? as long as the source code is open you can know if they are using your GPU to mine crypto and they don't get much information about your computer when you connect to a cluster
>>103042450
Vllm also supports fp8 and I think even intN now. There is no reason to use anything else. There's something fucky with every other inference codebase. A model that's great on vllm will be just dogshit on exl2 or llamacpp at the same precision for no reason. Vllm has enough corporate money for it to just work.
>>103042470>>103042517Python and rust are the only good languages. Every other one is either forced to be used by megacorps (CUDA, JS) or pure misery. Yes I’m trans but that has nothing to do with it.
>>103042809
>Most important word in public-facing software is "no"
Working under a yes-man even in private-facing software is pure torture
It's like someone breaking into your home and forcing you to rape your own dog
>>103042887As long as someone in the botnet is able to mimic the external-facing behavior of the source code, which is trivial, they can do literally whatever the fuck they want with all of the information.
>>103042680
I am not aware of a tracker for what needs testing.
Things that I think would help with technical debt:
-The most important tests are those in tests/ggml-backend-ops.cpp but those are also already in a comparatively good state; if you can find cases that are not covered, adding them would be useful though.
-Any tests of components that deal with memory management; those bugs always take the longest to track down.
-The llama.cpp HTTP server currently has Python tests that call the API and I think are not of particularly high quality compared to their importance.
-llama-bench is an important tool for performance and I want to at some point add an option for determining the performance for each token position instead of the average of all tokens in the range. But that will be relatively high-effort and high-complexity.
-The failure modes of scripts/compare-llama-bench.py are not very good and would benefit from better error messages.
-Scripts for submitting common language model benchmarks to the server would be useful, especially if they also allow for a comparison with other projects.
-There is a performance benchmark for batched inference on the server, but that benchmark was I think an adaptation of production code and is overly complicated; a simple Python script would be nice to have.
-Pic related (would be a lot of work).
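On the benchmark-submission point, a bare-bones sketch of what such a script could look like against the server's OpenAI-compatible chat endpoint. The question list is made up; a real script would load an actual benchmark dataset and also log per-request timings so the results can be compared across projects.
[code]
import requests

SERVER = "http://localhost:8080"  # llama-server default port; adjust as needed

# placeholder items; swap in a real benchmark dataset here
questions = [
    {"prompt": "2 + 2 =", "answer": "4"},
    {"prompt": "The capital of France is", "answer": "Paris"},
]

correct = 0
for q in questions:
    r = requests.post(
        f"{SERVER}/v1/chat/completions",
        json={
            "model": "local",
            "messages": [{"role": "user", "content": q["prompt"]}],
            "max_tokens": 16,
            "temperature": 0.0,
        },
        timeout=600,
    )
    out = r.json()["choices"][0]["message"]["content"]
    correct += q["answer"].lower() in out.lower()

print(f"{correct}/{len(questions)} answered correctly")
[/code]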
>>103042921>It's like someone breaking into your home and forcing you to rape your own dogHey, man. Maybe there's a UuUUSsseEeE CcAASsssSEEee for it...
>>103042949huh yeah I understand
>>103042969>-Any tests of components that deal with memory management, those bugs always take the longest to track down.This wouldn't be a problem if it was rewritten in Rust
>>103041409thanksis there much use of the intel npu or is cpu going to be better
>>103043022>intel npui know of nothing that supports it
>>103043031apparently intel has a demo with their python library and supposedly llama.cpp supports it, haven't tested the latter
>>103042320>30% better benchmark results>1B more parameters>120% more safe>can't say cock or pussy anymore
>>103041404Instructions for isolating a self-hosted LLM instance are in the OP. Once you’ve read that, please explain how that is insufficient to keep your data private if you still feel that is the case
>>103042913Webshitters can not help themselves from using JS, no corp needed after the initial infection.
>>103042383in the future we will be melting the contents of our freezer ERPing with that 7k token Evelina Vanehart card on the Samsung Family Hub™
>>103043117At least typescript can pretend to be a language long enough to get things done. Everyone that came in after react or even angular have no idea how bad the true js era was. Every time I think I’m experiencing suffering I just remember what working on a PhoneGap (later Cordova) application was like. I’ve thought “at least jquery is dead and I never have to write or edit a .js file again” to make myself feel better in the hospital multiple times even 15 years later.
>>103042863Wait, you have access to an mi300x?Do tell
is there something like browserllama but better?(browser add-on that lets you connect to koboldcpp to summarize or chat with a webpage)
>>103042969Thanks, will throw this into my backlog and look into this.
>>103043383They’re on runpod. TL;DR I load tested vllm with the optimizations and settings from their claimed results and it was (1) way below their claimed speed, I have no idea how they got those results, (2) a massive barely-documented pain, (3) fp8 quants can’t be made or run on it for many model architectures (llama2 will quant but return all empty strings when run, mixtral won’t quant, no MoE will quant, maybe others) despite that being what their benchmark claims it used and acting like that is the intended use case, and (4) H100 SXM is so many times more tok/s than it at the same fp8 quants that the increased VRAM doesn’t matter or make it cheaper than an H100. All H100s is less than half the cost of all MI300X, especially now that the $/hr of those are cratering.Technically all 4090s running VLLM is less than all MI300Xs for the same volume. I wouldn’t be surprised if all 3090s is. It’s that slow. I’m sure there’s some house of cards of specific dependency versions and payloads and connectivity that did give them those numbers somewhere in real life. But it can’t be reproduced with the information given. Spent a few days trying to.
>>103038380how much VRAM do you need for the mistral v3 12B? I'm trying to run it on 4080ti and it keeps erroring, guessing it has to be a memory issue
>>103043594
llama.cpp/kobold.cpp
gguf version of the model at Q4KM or thereabouts.
Put some layers in RAM.
>>103043640
>>103043594
>4080ti
Oh no, wait, actually, you can certainly get a bigger quant.
Just remember that by default, nemo's context is at 1kk tokens, so you might need to set a smaller number manually.
>>103043594is mistral v3 12b even a thing? what format are you trying to run (gguf, exl2, etc) and with what backend? (koboldcpp, vllm, etc)
>>103043658
format: gguf
backend: oobabooga/text-generation-webui
model: BeaverAI/NeMoistral-12B-v1a-GGUF (Q4)
I'm now trying to download the 8B model instead from nvidia/Mistral-NeMo-Minitron-8B-Instruct
>>103043690
ooba does this thing with nemo models, where it makes the default context size 1,600,000,000 tokens instead of ~8,000 tokens
that's probably what's causing your error
>>103038380
--- A Measure of the Current Meta --
> a suggestion of what to try from (You)
96GB VRAM
Mistral-Large-Instruct-2407-Q4_K_M.gguf (aka Largestral)
48GB VRAM
miqudev/miqu-1-70b
24GB VRAM
bartowski/c4ai-command-r-v01-GGUF/c4ai-command-r-v01-Q4_K_M.gguf
TheDrummer/Coomand-R-35B-v1-GGUF/Coomand-R-35B-v1-Q4_K_M.gguf
16GB VRAM
TheDrummer/UnslopNemo-12B-v3-GGUF/Rocinante-12B-v2g-Q8_0.gguf
12GB VRAM
TheDrummer/UnslopNemo-12B-v3-GGUF/Rocinante-12B-v2g-Q5_K_M.gguf
8GB VRAM
TheBloke/MythoMax-L2-13B-GGUF
Potato
>>>/g/aicg
Use:
koboldcpp
LM Studio
oobabooga/text-generation-webui
> fite me
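If you want to sanity-check where you land on that list without the calculator in the OP, a back-of-the-envelope estimate is just quantized weights plus KV cache plus some overhead. A rough sketch; the constants are approximations, not exact numbers for any particular model:
[code]
def estimate_vram_gb(params_b, bits_per_weight, n_ctx, n_layers, kv_dim, overhead_gb=1.5):
    """Very rough VRAM estimate for a fully offloaded GGUF model, in GB."""
    weights = params_b * 1e9 * bits_per_weight / 8      # quantized weights, in bytes
    kv_cache = 2 * n_ctx * n_layers * kv_dim * 2        # K and V caches at fp16 (2 bytes)
    return (weights + kv_cache) / 1024**3 + overhead_gb

# e.g. a hypothetical 12B model at ~4.8 bpw (Q4_K_M-ish), 16k context,
# 40 layers, 1024-wide KV (8 KV heads x 128 dims after GQA)
print(round(estimate_vram_gb(12, 4.8, 16384, 40, 1024), 1), "GB")
[/code]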
>>103042900
I'm not gonna use vllm because I don't like SillyTavern and I'm not interested in hacking up my own webui to work with vllm backend
Ooba just werks
>>103043713
thanks anon that was exactly it. Wish the error was more useful, I think after the gpu ran out of memory it was trying llamacpp and giving a weird "model" prop not found after the load_model function
Glad it's working now
>>103043713also isn't 8k context really small? Is there an easy way to find the optimal context size for a model+gpu vram capability?
if you have 8gb instead of 12gb, use a bigger more antiquated piece of shit. nice advice.
>>103043724
The actual meta for people that have VRAM:
>96GB VRAM
Qwen2.5 72B / Magnum v4 in 8 bits.
>48GB VRAM
Same as above but in 4 bits.
How do I apply bot cards to locally run models? I'm from /aicg/ so I have only used API keys so far in sillytavern.
I want to run locally, however I can't seem to apply the cards/pre-prompts to the locally running model for some reason.
>>103043815This. That Japanese company qwen2.5 finetune for coding or that eva one for creative writing / uncensor. It's basically claude 3 level, not quite 3.5.
>>103043815what about for those of us who aren't retarded enough to buy multiple cards for this shit?
>>103043762
i just pulled that number out of my ass because that's what i use for Q4_K_M nemos on my 8gb vram setup, works fine for my rp needs.
easiest way to find the best context size for your needs that i know of is playing with the context slider in koboldcpp with the layer split set to auto (-1) then i just make sure at least half the layers are going to my gpu.
really though, i think you should look at the model's page.
for instance, the https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Instruct one you were saying you were going to get says it supports a context length of 8192 tokens.
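If you'd rather pin those two knobs in code than fight a UI slider, a minimal llama-cpp-python sketch (the model path is a placeholder; koboldcpp and ooba expose the same settings as context size and GPU layers):
[code]
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-NeMo-Minitron-8B-Instruct-Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,        # cap the context yourself instead of inheriting the trained value
    n_gpu_layers=-1,   # -1 offloads every layer; lower it if you run out of VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
[/code]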
>>103043859>What about us poorfags?Gemma 27B is the best you can hope for.
>>103043857>or that eva one for creative writingYeah I need a sauce on that nigga
>>103043748
Vllm's backend is OpenAI-compatible. Aren't there like a billion frontends for openai?
>>103043872https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.0
>>103043857>That Japanese company qwen2.5 finetune for codingWhat's that one?
>>103043892https://huggingface.co/AXCXEPT/EZO-Qwen2.5-72B-Instruct
>>103043879>No quants not even Q8.0Yeah thanks......
>>103042023>those fucked up eyesback to the sdslop general with you
>>103043928
Y-you do know huggingface has a search bar at the top, yea?
https://huggingface.co/models?search=EVA-Qwen2.5-72B-v0.0
>>103043934Teto tummy.
>>103043928
>>103043815Qwen sucks for my usecase(NSFW).>>103043859Just run them in RAM.
>>103043586
Try aphrodite engine, it supports FP2 to FP12 (not vllm afaik) and FP6 is quite fast compared to the shitty GGUF on Vllm. I don't know if it supports AMD cards but these quants run on a 3090.
>>103043953The Magnum fine-tune of it doesn't suck for NSFW.
>>103043976Kill yourself alpin
>>103043994
>>103043953
>Qwen sucks for my usecase(NSFW).
This finetune of qwen2.5 turns it from a positive-bias, censored but super smart model into a model that feels like claude while keeping that smarts.
>>103044044>while keeping that smarts.this is always a lie
>>103043724>no 64GB spotKYS
>>103044062Only when its magnum bs. Try the eva one
>>103043976>quite fast compared to the shitty GGUF on VllmBe aware that vLLM copied the code for evaluating quantized GGUF models from llama.cpp several months ago and that several performance optimizations have not been taken over.
>>103044106buy an ad
>>103043918Thanks, downloading to test
>>103043934
>fucked up eyes
nta, but cross-eyed asian girls are a thing. Asada Mao made that face all the time. It's a charm point, like those yaeba fangs.
>>103043989
>>103044044
I've tried the Magnum v4 finetune of Largestral, and since it's the same dataset I know I won't like it.
>>103044172
Yeah, the only Magnum I ever found decent is the Gemma 27B based one. I was not suggesting it though, I think you misread. I was talking about this one:
https://huggingface.co/models?search=EVA-Qwen2.5-72B-v0.0
>>103044164nothing a paper bag wouldn't fix amirite
>>103044196
>EVA-Qwen2.5-72B-v0.0
How bad is the positivity bias with this one?
>>103044196Is the 34B one good? Anything more is too slow for me
>>103044172I tried both and I didn't like the Large one because it just felt like the original model, not very creative and slopped. But the Qwen one does feel different, much easier to do NSFW, while it seems to remain flexible.
>>103044219It will get dark and depressing.
>>103044219There is none. It's pretty much Claude 3. Easily the best local model we have right now for creative use.
Smolchads
>>103044091will next time my love <3
>>103044243
>>103044245
I'll give it a try and tell the whole thread how I feel about it in 2 days/weeks. It better be good.
>>103044276
It's the first model outside of the Claude / GPT-4+ tier ones to do complex RPG stuff.
>>103044276
>t. will become yet another victim of:
>>103038666
>>103044306Or just stop trying meme models that are just merges / low rank loras on a few logs. Only try full finetunes like the one I recommended here: https://huggingface.co/models?search=EVA-Qwen2.5-72B-v0.0
>>103044306I've tried so much garbage already, another one wouldn't hurt.
>>103044329
That's the spirit.
At the end of the day, there's nothing better than judging shit by yourself.
and wtf is a nala test and why do i want it
>i don't even have the card
>>103044336
The only thing better is being born with the gift of common sense and knowing that sloptunes have never been good and never will be good.
>>103044357
a guy on /lmg/ loads up a card where he gets fucked by a lion after saying "ahh ahh mistress" to see whether or not the model has the ability to
1: stay in character (not give the lion hands)
2: have sex
3: reason spatially
4: write well
>>103044400where is the nala leaderboard?
Please stop pretending to be retarded guys. I beg you.
Why are people talking about nala? Is there a new model to be Nala tested? At work right now so if there is it will have to wait.
>>103044407This.
>>103044407but you're too young to see the nala leaderboard anon
>>103044428Beg harder. Show me how much you need it.
>>103044326
>12 hours on 8xMI300X
>mi300x = 1.5TB HBM3
Chat is this real?
>>103044444
Nice.
This.
>>103044445Anon... isn't Nala the cub in that picture?
>>103044454Please! Stop pretending to be retarded anon!
>>103044444That.
>>103044474It is ok because fucking a bear isn't zoophilia so... wait what?
stop shilling vllm, it's janky as fuck for home use.
it doesn't even support basic shit like doing multi-gpu splitting when there are mismatched amounts of vram (e.g. 24GB 3090 + 12GB 3060).
Haven't been in these threads for a while. Just tried Magnum v4 123b and it seems to have the same problem as the smaller v2 magnums when they came out. Is Mistral Large still the best?
>>103044613
I agree; unless it has some obvious benefit I'm sticking with kobold or the other ones.
>>103044669Yeah. Just use vanilla. Know how to prompt. That's all.
>>103044613Oh, that does make it a nonstarter for me. Thanks.
>>103044613
That's how tensor parallelism works, retard. Your mismatched setup isn't standard.
>>103044768
Right, which makes it worthless for home users, who tend to have non-standard setups, retard. The lack of attention paid to such a basic enthusiast use case as mismatched GPUs shows that it's not made for us, so stop shilling it.
>>103044768
There is no technical reason that would inherently prevent you from parallelizing any number of possibly mismatched GPUs. It's just that that use case would require additional effort to support while not being relevant for vLLM's target audience.
>>103044800what is llamacpp's target audience?
>>103044808hobbyists
>>103044808
Depends on who you ask.
I'm targeting enthusiasts, particularly those that can potentially make useful contributions but can't afford top-of-the-line hardware.
>>103044768
llamacpp's tensor parallelism implementation works fine with mismatched VRAM; I use it daily. Don't ever try to sound like you know what you're talking about again.
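for anyone following along, the knob being argued about is the split ratio. rough sketch via the llama-cpp-python bindings, assuming a 3090 + 3060 box; the model path is a placeholder and 24/12 is just a proportion, not gigabytes (the CLI equivalent is the --tensor-split flag).
[code]
# split work unevenly across mismatched GPUs with llama.cpp's python bindings
from llama_cpp import Llama

llm = Llama(
    model_path="/models/c4ai-command-r-v01-Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,        # offload all layers
    tensor_split=[24, 12],  # proportions per device -- 2:1 for a 3090 + 3060
    n_ctx=8192,
)
print(llm("The quick brown fox", max_tokens=32)["choices"][0]["text"])
[/code]
whether llamacpp's split counts as real tensor parallelism is a separate argument, but it does run on mismatched cards.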
>>103044828
>particularly those that can potentially make useful contributions
This sounds like a pyramid scheme.
>I'm contributing in the hopes that others will also contribute their time.
Luminum 0.1 123b at q3_K_M is the end for me. I'm usually a base instruct model fella, but I never felt this way with Largestral.
It could be because I am using q3_K_M instead of q3_K_S like I usually do.
Anyway, the only problem is that it's 0.6 tokens per second. But that's fine for sonnet at home.
>>103044829Try to reach vllm speed with that lmao
>>103044828Based.
>>103044863vllm is slow shit compared to lcpp and exl2 unless you're doing batch inferencing for server usage, which no one here is
>>103044845Then the entirety of open source is a pyramid scheme.
>>103044925Isn't it tho?
>>103044808people who missed out on RTX 3090s being 600 bucks used
>>103045096oh no, now they have to buy them for 500 used. the horror
coping shitskin above me
post above me is the original nala tester
post below me is a retarded gooner (the entirety of /lmg/)
>>103038380
The thread pasta is so fucking stupid. Doesn't have anything for noobs. All you faggots told me it wouldn't work on my laptop. I fucked around with python for ages and got shitty outputs.
Literally just installed LM Studio and downloaded a model and I'm getting chatgpt-style outputs on my laptop that's 4 years old. Holy fuck, you're all retards.
>>103045172Fucking pathetic dweebs gatekeeping something that normies actually use to make their lives easier and you retarded shits are just using it to goon
>>103045172
>but we did tell you about LM studio
>>103045172
>Doesnt have anything for noobs.
we're not trying to spoonfeed retards. the opposite, in fact. fuck off, we're full.
>LM studio
go back
>Holy fuck youre all retards.
no u
>>103045187
this poster is a homosexual
>>103045191
It all makes sense now. You're just pathetic losers that want to sound cool so you can sperg out of your silence when you eavesdrop on normies and hear them talk about AI. You're probably all "doing crypto" too.
>>103045191I literally asked chatgpt the easiest way to install an offline language model.
>>103045201I'm doing your mom.
>>103045226
>>LM studio
>go back
You're a fat neckbeard that's gatekeeping something normies use in between having sex. Literally no one cares about what you know about AI.
>>103045201suck my manhood
>>103045201>>103045225https://www.youtube.com/watch?v=0_04Z-7kZ9E
>>103045252
Why. Are. You. Here? Who told you about this place? Go to LocalLlama. They don't gatekeep there.
>>103045245
based
Anyway... have fun doing AI and crypto you nerds.
you all need to lay off the apple cider
you're all ugly drunks
>>103045245nice bush
>>103045258Apple cider is peak nerd alcohol
>>103045262and?
>>103038380
Are there any good youtuber channels covering llms, image gen, and similar?
>>103045252chad mcchad face over here off to fuck his next girlfriend and his next line of coke in his city penthouse
>>103045319
>Are there any good youtuber channels
Depends. How severe is your mental retardation?
>getting banned for posting your cock
At least post some blacked miku.
>>103045334
>Depends. How severe is your mental retardation?
Pretty severe.
If there's a moment I'm not suckling from youtube's teat I feel distressed.
I am about to load the shilled qwen 70B. I will complain about it being shit in 10-20 minutes.
>>103045376also paste your stats
>>103045507>>103045507>>103045507
>>103043976
It's W8A8, not GGUF. All of the documentation for GGUF in vllm is plastered in warnings not to use it.
FP6 is way too low; this is internal corporate stuff, not fuckbots. FP8 is already, ehhhhh, noticeably less bright.
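if anyone wants to try the non-GGUF route being described, a minimal offline sketch with vllm's python API. the model id is a placeholder; recent vllm normally picks the quant scheme (W8A8, FP8, etc.) up from the checkpoint config, so no extra flag is shown.
[code]
# offline batch generation with vLLM on a pre-quantized checkpoint
from vllm import LLM, SamplingParams

llm = LLM(model="some-org/some-model-w8a8")  # placeholder; quant config is read from the repo
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize why W8A8 beats GGUF here."], params)
print(outputs[0].outputs[0].text)
[/code]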
>>103044613
>it doesn't support poor people bullshit
Good. vllm is for the gainfully employed male with 2xH100s under his desk.
>>103044880Exl2 is fast in the same way writing to /dev/null is fast.