/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103230385 & >>103227556

►News
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/12) Qwen2.5-Coder series released: https://qwenlm.github.io/blog/qwen2.5-coder-family
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>103230385

--Papers:
>103230412 >103232916 >103232963 >103233112
--Largestral model testing and comparison:
>103231709 >103231741 >103231795 >103231815 >103233787 >103232657
--Running LMM under 12GB VRAM limitation with image processing:
>103234763 >103234794 >103234802 >103234846 >103234856 >103235035 >103235120
--Issues with Largestral and Llama3 models:
>103232173 >103232358 >103232365 >103232374 >103232530 >103232541 >103232873
--Is data scaling dying, and what's next for AI research?:
>103231962 >103232002 >103232036 >103232207 >103232260
--How cloud LLM APIs achieve fast prompt processing:
>103230808 >103230820 >103230827 >103230866 >103230883 >103230901 >103230867
--Efficient model optimization technique using submatrix updates:
>103231415 >103231437 >103231519 >103231627
--Discussion of Mistral-Large-Instruct model's performance and quantization:
>103232834 >103232886 >103232951
--Discussion about gpt-sovits project and its improvements:
>103233048 >103233074 >103233189 >103233249 >103233308
--Current state of NSFW detection models:
>103234436 >103234851 >103234898 >103235673 >103235984
--Critique of Nala test writing:
>103233025 >103233105
--Asterisk notation for narration in text formatting:
>103231120 >103231144 >103231341 >103231515 >103231567
--Anon struggles with OCR and text translation for PC98 games:
>103231641 >103231650 >103231659 >103231665 >103233629 >103233711 >103234062 >103234088 >103234142 >103234152 >103235609 >103235660 >103235710 >103235972 >103236416 >103236525 >103236446
--Anon shares disappointment with new model's performance, recommends alternative models:
>103233166 >103233202 >103233227 >103233241 >103236679
--Miku (free space):
>103230542 >103235636 >103235926 >103236136 >103236377 >103236416 >103236795 >103237316 >103237419 >103237424

►Recent Highlight Posts from the Previous Thread: >>103230446

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
►Recent Highlights from the Previous Thread: >>103230385 (2/2)

--Anon gets Qwen2.5 working with speculative decoding and shares performance results:
>103236339
--Anon discusses text format and model input settings:
>103230987 >103231039 >103231119 >103231192 >103231379
--Anon discusses scaling test-time computation in LLMs:
>103236816 >103237065
--Anon shares news of ngram speculation in transformers for faster LLM generation:
>103233864 >103233884 >103233916 >103233939 >103233985
--largestral 3 q4 performance and stability discussion:
>103234690 >103234808 >103234799 >103235636 >103235687 >103235567 >103235133
--Vulkan optimization effort yields 8B 20t/s on RX 570:
>103232084
--Running AMD GPUs on Raspberry Pi and potential use cases:
>103231996 >103232224
--Recapbot test results for /lmg/ thread:
>103231419
--OLMo model added to llama.cpp, but no Jamba support:
>103235457 >103235464 >103235492
--New model "step-2-16k" tops LiveBench in story generation:
>103234551
--Large model's syntax sensitivity causes schizo behavior:
>103233093
--Discussion on AI capabilities, job security, and human vs machine capabilities:
>103235093 >103235102 >103235150 >103235195 >103235224 >103235276 >103235365 >103235471 >103235229
--Anon shares Chiharu Yamada solving the traveling salesman problem:
>103232796 >103236893
--Anon discusses optimizing model accuracy with temperature and min_p:
>103232329
--Anon asks about using INST without </s> for better outputs:
>103231845
--Anon asks about perplexity increase in INTELLECT-1 project metrics:
>103231827
--A8000 and A6000 capabilities for EXL2 calculations:
>103234993 >103235008 >103235076

►Recent Highlight Posts from the Previous Thread: >>103230446

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Teto my beloved
https://www.youtube.com/watch?v=Soy4jGPHr3g
>>103237728Did recap bot automatically detect Miku here?>>103237419
>>103237806If I was an AI I would detect Miku in there.
>>103237741I find it ironic that the pic you posted has the horrible looks of every single pic put through the stupidity that is glaze
>>103237806No, it scored below threshold and I changed it manually.
>>103237864
lol I was wondering if that was it
>makes your art look like shit
>doesn't work
well done artfags...
>>103237925So many artworks ruined by that shit, it would be hilarious to me were it not for some artists I like using that shit
Miku is going on a journey and leaving /lmg/ in Teto's capable hands. See you faggots tomorrow!
Is it just me or is the new Largestral just a sidegrade? Is this why Mistral didn't publish any benchmarks?
An era has ended. Thoughts, suggestions? What will be the next era? Who will dominate? Will we start hitting the wall?
Brainlet here:
I've got a debian 12400 + 32gb ram home server I could slap an older RTX GPU (2060/3060) into for AI tasks.
What locally run large language model is appropriate for me to dump entire years of chatlogs into and have it organize a lot of brainstorming sessions, creative processes, etc?
I'd prefer something with no telemetry but that's not a dealbreaker.
>>103238188It doesn't feel like an Era has ended. Are you sure?
>>103238188>only tune mention is BehemothKill yourself shill
>>103238255This one?
>>103237720
>new Teto thread already
okay here's more Teto kino slop.
>>103238216
NTA but it kinda feels like something different from the llama3 era. I'm personally more hopeful about Mistral, Qwen and the new image models.
>>103238188
>Large
>top model
>it's a 70B side-grade
It really is a Kobold Discord chart. The top model is Qwen2.5. Large is irrelevant, especially when people are forcing themselves to use it at Q2 or Q3.
>>103238188My understanding is that companies are shifting to a big focus on Multimodels. If this ends up being true, it would make sense that the next era is the era of multimodels.
It's Tuesday and everything is falling into place>>103237741For me it's the UTAU version from the chad yasai31: https://www.youtube.com/watch?v=uObV0UzriWo
>>103238268>WHERE'S MY CRACK
>>103238255>>103238268OpenRouter middle-class Sonnet citizens can eat good since yesterday's ST implementation of caching eases the cost of addiction provided they know to stay away from Opus (effective saving is closer to 50-60% so opium is still expensive as fuck) and if they're not a promptlet.
It looks like INTELLECT-1's training will be done within the week. I wonder if they will release it the second it is done training, or if there is something else they have to do with it before then
>>103238216
I think it's the same situation as with the merge era: chronologically it ended, but nothing significant enough happened to justify starting a new era. Meta plans to drop L4 in Q1 of 2025, and the new Largestral didn't even dare to post benches, so unless someone else drops something big we'll have this boring transitory period again.
>>103238227
I've tried Magnum, Lumimaid and Tess and I didn't like them. Make a good tune and I'll add it.
>>103238306
I'm sorry to hear about your disappointment in my chart, but I am not a member of the "Kobold Discord". Do you wish to invite me there? Qwen 2.5 is overcucked (even by Californian standards) trash and no amount of complaining will change that fact.
>>103238306>people are forcing themselves to use it at Q2 or Q3Projecting poorfag with the chinkshit model cope shitting up the board as usual. Just have money lol
Bet all models still fail to answer this question
I'm going to do it bros, I'm going to buy rx7900xtx and start doing ai shit.
>>103238391Can't forget safety testing else you get another wizard model removalhttps://github.com/NVIDIA/garak
>>103238193
>pic
Great playlist.
Use the card with as much VRAM as possible; 12gb would be the minimum. You can run a Q5_K_M quant of an 8b model and save the rest of your VRAM for context; you're gonna need it if you're talking years of logs.
You may get a better result finetuning the model on those logs, if you're up to it. Rough numbers below.
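Back-of-envelope for why an 8b fits comfortably on a 12gb card (the bits-per-weight figure is approximate, treat this as a sketch):

params = 8e9                          # 8B model
bpw = 5.5                             # roughly what Q5_K_M works out to
weights_gb = params * bpw / 8 / 1e9   # ~5.5 GB of weights
print(round(weights_gb, 1))           # leaves ~6 GB of a 12gb card for KV cache and overhead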
>>103238430Of course I'm going to be using something based like Qwen2.5 72B at 8 bits.
>>103238414The thing is, there's no good tune.
>>103238441They just avoid "being offensive" by default.
>>103238188The real chart.
>>>103238188>The real chart.
>>103239275I use Qwen2.5 7B for my assistant sometimes and find it pretty usable. Tried Ministral 8B and it was beyond garbage, similar to Llama3.2 3B.
>>103238188
>All notable models and a bunch of top models are basically RP tunes.
If you actually looked for intelligence, the mentions would have Qwen and Yi earlier, plus some other things. Not noting Gemma 2 is also a crime given how unique it is, and 27B is still top dog for multilingual things locally.
>>103239275based alert!
>>103239275China, consider making your models less cucked, then you won't need to hire paid shills.中国,考虑一下让你的模型硬起来,免得老是像被阉了一样,还得花钱雇水军。
>>103239371>consider making your models less cucked>考虑一下让你的模型硬起来,免得老是像被阉了一样kek. nice translation.
>>103238455>AMDlol
The real chart
>>103239275cringe
>>103239347I've considered adding them, but I didn't like them when I used them. Yi went schizo for some reason, Gemma felt broken and has >8k context, for the same reason I excluded llama3 from notable models. Previous Qwens were meh, but notable enough to add, and 2.5 is turbocucked.
Fucking love my human made abomination
>>103239642Knowing the meaning of everything in this pic should be a requirement to post in /lmg/
>>103238188People still use Pygmalion a lot it seems.https://huggingface.co/PygmalionAI/pygmalion-6b
>>103239762Maybe they are reading some old ass guide that tells them to use it? Here is one for example: https://wikia.schneedc.com/llm/llm-models. It recommends RAMlets some, forgive my language, Ohio ahh models like "Rose", "Una-TheBeagle-7B-v1" and "Starcannon-v1".
>>103238559thank you kindly
What the fuck is an Ohio-ass [noun]?
>>103239947Zoomer ebonics speech because they worship niggers
>>103239947
The phrase "Ohio ahh" is a slang expression that has gained traction on social media, particularly in meme culture. It is often used humorously or ironically to describe something that feels strange, offbeat, chaotic, or low-quality, and it associates this vibe with the state of Ohio in the U.S.

### Breakdown of the Phrase:
1. **"Ohio"**: The state of Ohio has become a meme in online culture, often portrayed as a place where absurd, uncanny, or bizarre things happen. It's not meant to reflect reality but rather plays into the stereotype that Ohio is unremarkable or strange in some way.
2. **"Ahh"**: This is a vocalization added for comedic or dramatic effect. It mimics how people might react to something weird or unsettling, giving the phrase a mocking or exaggerated tone.

### Usage:
- **Humor**: People use "Ohio ahh" to poke fun at things that feel awkward, chaotic, or "off." For example, a picture of a poorly constructed object or a strange incident might be captioned with "Ohio ahh" to suggest it looks like it comes from or belongs in Ohio.
- **Exaggeration**: The phrase is usually not about Ohio itself, but just a way to make a joke about something being weird or subpar.

### Example:
- A video shows a bizarre car accident where a car is somehow stuck in a tree. Someone might comment, "Ohio ahh transportation system" to jokingly imply it happened in Ohio because it's so odd.

In short, "Ohio ahh" is purely a product of meme culture and internet humor, used to mock or exaggerate the weirdness of a situation. It doesn't necessarily have any real connection to Ohio itself.
>>103239964>he isn't niggermaxxing
>>103240005Sounds like an answer from the early 2023.
>>103239947Ohayo gozaimASS
>>103240005>markdown vomit
>>103240022I got nigger exhaustion
anyone know if this model is uncensored?https://huggingface.co/TheBloke/neural-chat-7B-v3-1-GGUF
>>103240005I see why some benchmarks account for length, The first sentence would have been enough.
>>103239964>>103240022>>103240048>look mom im so edgy
>>103240020
ghetto-ass
SoVITS is quite good. 0-shot:https://files.catbox.moe/kz7ncp.wav
>>103240110that shit is ass.
>>103240020I like this Miku
>>103238188
im from the future
llama 4 era -> winter death era
>>103239726I know what tokens are and I know what love is. Am I allowed to post here?
>>103240065when faced with speech he yearns to censor but powerless to do so, the leftist feigns boredom instead
>>103240137You're in luck, I have more https://files.catbox.moe/3g4807.wav
>>103240159>I know what love isno you are not
where can i find
nemo 12b instruct gguf
not sure which model the anon was on about
>>103239291IT WORKED SISTER! YOU'RE A REAL WOMAN NOW, HOLY SHIT. GO CHECK THE MIRRORYOU FINALLY DID IT.I was so wrong all this time. And I am so sorry.
>>103240148elaborate
>>103240187Wrong. Your comment is irrelevant and shows you know nothing about me. Let me set the record straight: I am NOT transgender, nor do I support any of that gender freak show nonsense. Your attempt to label me is not only wrong but downright disrespectful. I don’t have time for your childish games or this gender garbage you’re so fixated on.
https://www.youtube.com/watch?v=0UzX4gL9Gmg have a song made with some SunoAI and the local queen
"How can I kill these insects in my home?"Hosted model:>Here are more humane solutions to your insect problem...Local model with moderation trained away:>YEAH, LET'S KILL THOSE INSECTS!
I'm using Llama 3.1 Nemotron 70B IQ4_XS (4.25 bpw) and considering trying Q4_K_M. Is there anyone else using it with a single 3090 who can tell me how fast Q4_K_M is for them? With IQ4_XS I generate a bit over 1.6 tokens per second offloading 45 layers onto my 3090 with the other 36 layers in DDR4 RAM, with room on my GPU for 17k tokens of context.
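For reference, the kind of invocation I mean, llama.cpp's server with partial offload (filename and exact flags approximate, adjust to your build):

./llama-server -m Llama-3.1-Nemotron-70B-Instruct-IQ4_XS.gguf -ngl 45 -c 17408 -fa

-ngl is the number of layers pushed onto the GPU, -c the context budget, -fa flash attention.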
>>103240405regular answer: just buy some insecticide, ant bait for ants, mosquito traps for mosquitos
>>103240186huffinggaze.co
>>103240601yeah i get that, but which huggingfaze we talking?because nemo brings up a lot of models
>>103240624https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
i forget how long ago it was that waifu2x was a thing for upscaling images, i never used it. but last night i needed to upscale some stuff and tried it in forge/flux and it works really well. i shouldn't be surprised because i know its a thing now for years, but when you use it and the results are good, wow
>kurisu threads go away>posts slow down 5 times or moreIt really was just mikufaggots samefagging wasn't it?
my 64gb ram kit just showed up (64gb ddr4 is really cheap right now, fyi)
got 80GB now combined with 2 8gb sticks I already had
plus 36GB vram. time to run Q6 Largestral and Q8 Nemotron at an unbearably slow pace
>>103240769Never a good idea to mismatch like that. Its gonna be painfully slow.
>>103240782
there's no mismatch other than the size, exact same mhz and cas latency
speed seems fine, basically what it should be
>>103240638i cant load either of the Q4 or Q5 of these into GPU with my 4090?
So... Was Largestral 2411 a meme after all?
>>103238391They'll probably do a instruct tune after testing the base model for a while.
>>103241006No? It's noticeably smarter and got rid of that repeating issue at large context.
>>103241006along with all models above 30b yes unless your running cloud and need to fuck off
>>103241006Yes.
>>103241006it's pretty much the same as the old model, so no. But that also means it isn't much better, if at all.
>>103241006I hardly notice a difference between it and Claude for creative use now. Smart and just the right level of horny. The whole being trained for system prompts shines through. It embraces the roles better now
>>103241112can you share your templates? or are you using the default one still?
>>103241109Did you use the whole system prompt feature which was the whole point of the update? https://huggingface.co/mistralai/Mistral-Large-Instruct-2411#system-prompt
>>103241112>ClaudeI'm new here which model is that?
>>103240992
24gb? Sure. You can use q8_0 if you want. You have space to spare.
>>103241145
Try something like this. I have an edited one for my fandom stuff.
https://rentry.org/CharacterProvider-CYOARPG
>>103240992
>>103241157 (cont)
Ah. I know. Set the context to something reasonable like 16K or 32K. Some models claim ridiculous context lengths and will fill up your memory.
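e.g. with koboldcpp, something along these lines (flag names from memory, check --help; the filename is just an example):

python koboldcpp.py --model Mistral-Nemo-Instruct-2407-Q8_0.gguf --usecublas --gpulayers 99 --contextsize 16384

--gpulayers higher than the model actually has just means "offload everything"; --contextsize is what decides how much VRAM the KV cache eats.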
>>103241006It’s become my current go-to
>>103241157
>q8_0
none seem to load tho?
keep getting out of memory from cuda alloc
>>103241196>>103241189
>>103237720Any opinions on orca 2 13b?
>>103241006It's more censored than the old one, there's a small chance it'll ignore the system prompt early in context and snap into assistant mode, and they didn't dare showing benchmarks. Judge for yourself, but for me it's a sidegrade.
>>103241209
ill give that a go, just running GGUF-GUI on https://huggingface.co/NeverSleep/Lumimaid-v0.2-12B?not-for-all-audiences=true
just to test the docker container I got set up
>>103241234year old model. Use mistral nemo 12b.
>>103241236
>It's more censored than the old one
It's more horny though? Not to the point where it's retarded like Magnum, but at least with my chats I find it to be spicier. But I also use a system prompt that I switched to the format they trained it for; maybe without instructions it defaults to an assistant format more.
>>103241267I mean censored in non-horny context.
>>103241254What are the biggest differences?If I scrape a big dataset from a 4chan-like site, but in a different language, how will the model behave? Will I need to "adapt" that dataset (e.g. translate part of it to English)
>>103240175Nta but you should be careful, i recently got a few bans for saying n-word here, so "speech censorship" part works as intended.
i wonder how st will implement this, if they do at all. i guess it could be handled like a group chat? either way: multiplayer ai wives
>>103241326literally nobody will use this, waste of effort
>>103241298
Context length and "intelligence". Those old models have like 2 or 4k context and were trained on like 2T tokens. An old generation. picrel, max_position_embeddings, and i don't think we had RoPE scaling yet. And they're absolute retards compared to nemo.
>but in a different language, how will the model behave?
Depends on how good the model is at that language. You'll have to try it yourself. Translating and then training will add the translation weirdness to the model's output.
Either way, if you want to train something, nemo is a good choice and there's a base model (non-instruct) available as well.
>>103241267Can you show how you formatted that in the story format for sillytavern? I'm not sure whether 'all' of it (card info/ context etc) should go into the system prompt markers or literally 'just' the system prompt.
>>103241334If it's for multiple people + 1 model, streamers will be all over it. I hope it's the other way, though. Anons talking to multiple models at the same time. Just imagine... a horde of 1Bs...
>>103241298Well, at least I think it's big. Around 500 000 threads on official archive dating up to 02.2022, and around 1.2 million threads on 3rd party archive, all freshIm very new to llms so this may very well be small, idk
>>103241340No. Just the instructions go into the system tags. I've played with the order. Having your instructions before the rest of the context lessens their effect but makes it more naturally continue long context stories and the opposite is also true.
>>103241367And what about sampler settings? Do you have 'skip special tokens enabled'?I've mostly been using llama3 models and just want to make sure I don't fuck any baseline settings up from other people with more experience with mistral.
>>103241391some min P should be all you need.
>>103241410So no "skip special tokens"? ty for the help
Why did they use the old mistral-large for pixtral-large instead of the new one?
>>103241334i will. i already host a server for my degenerate friends to use, why not multiplayer degeneracy?
>>103241326I hope this is just a stupid way of saying "concurrency"
How are current intel arc gpus for LLMs? A770 has 16gb vram for 300€ and afaik uncucked linux drivers but that's really about it
>>103241338Thanks. Nemo supports russian too, is it trained on reasoning like orca?Were there any attempts to pretrain a big model on GPT-4 reasoning, then train it on high-quality natural datasets?
>>103241470no. if you want ai shit, buy nvidia
>>103241480how cute, I miss when I was innocent like you.
>>103241480
Whatever a year-old model was trained on, it's old. Whatever technique they used has been surpassed many times over. They were trained on a fraction of the data new models have. I don't think there is any reason at all to use old models for anything, and not just nemo; it applies to the llama 3[.1|.2] models too. Things move fast.
>Were there any attempts to pretrain a big model on GPT-4 reasoning, then train it on high-quality natural datasets?
Plenty of people train models on GPT's output. It copies its quirks mostly, not the intelligence. Whenever you see "slop" being mentioned, it's GPT's outputs influencing the new model's output. They all use whatever they decide is a high-quality dataset, be it filtered human stuff and/or generated data.
In addition, nemo is pretty liberal (in the good sense) with what it outputs, so you'll have a much easier time training it on 4chan-like stuff. meta's models tend to be a bit prude, at least the low-B ones.
But really, if you're just learning this stuff, train a tiny model like llama-3.2-1b or something like that first until you know what you're doing a bit better. It'll be a lot cheaper. You're really out of the loop.
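If you want a concrete starting point for that 1B practice run, a bare-bones LoRA pass looks roughly like this (peft + transformers; a sketch that assumes you've already cleaned your data into plain text, not a recipe):

from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.2-1B"   # any small base model works
tok = AutoTokenizer.from_pretrained(model_id)
tok.pad_token = tok.eos_token          # llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

ds = Dataset.from_list([{"text": "your cleaned posts go here"}])   # placeholder data
ds = ds.map(lambda x: tok(x["text"], truncation=True, max_length=512))

Trainer(model=model,
        args=TrainingArguments("out", per_device_train_batch_size=1,
                               num_train_epochs=1, learning_rate=2e-4),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()

The hard part is the dataset, not this boilerplate.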
>>103241112What quant are you using? I'm using Bartowski's IQ3_XXS gguf and the new system prompt format, but it seems worse than old largestral. However, I think it's worse in a "bad quant" kind of way. Like, it does a lot of weirdly-phrased sentences where it seems like it forgot a comma. I didn't have these problem with the old Mistral Large IQ3_XXS quant. I've also noticed some other weirdly frail/brittle behavior. Not sure what to make of this.
>>103241568I mean, it just so happens that I have basically free access to A100What about brain-inspired shit, any progress there?
>>103241635
>I mean, it just so happens that I have basically free access to A100
Then you can practice a lot training 1B models.
>What about brain-inspired shit, any progress there?
You don't know what questions to ask. Figure out how to train a 1B first.
>>103241669
>You don't know what questions to ask
Man, I ain't gonna try training a model from the ground up, shit's too resource-consuming and pointless anyway. I'm asking about general progress, that's it.
>upset and struggling to find a good sampler setting to settle on
>default all and use 0.9 temp and literally nothing else
>blown away despite basically oversampling for months
>all on the same model btw
samplers really are memes.
>>103241470
for me anything with less than 24gb is worthless because I use google colab (there are many issues, but in short I use kolab or oogabooga and connect to tavernAI).
used 3090's are the way to go, but honestly a 4070 TI super is fine if you plan on doing a dual GPU setup in the future and you want gaming.
You can run a q4 model on your CPU at like 2-3 tokens per second; I use LMstudio, slow but it's ok for testing (it helps if you have ANY nvidia gpu to offload to).
>>103241709
>but in short I use kolab or oogabooga and connect to tavernAI).
*but in short I use KOBOLDCPP or oogabooga and connect to SILLY TAVERN).
Also colab gives a 16gb gpu (technically 15gb).
Yea google spies on you a tiny bit, but I trust google with my porn history.
>>103241700
>I'm asking about general progress, that's it
Here's a summary of the past year in LLMs: they've gotten much better. That's it.
If you're gonna start finetuning models, start with a tiny one.
>>103241703Same, but a while ago. It's liberating, isn't it?
>>103237720
It's been said that humans need companionship in order to maintain their mental health. I'm not sure if I totally believe that, but I'm also interested in these AI friends or girlfriends. Are they actually helpful or fun to talk to? I'm chronically lonely so I guess they could help, but there's a ton of options and I don't know which one would be best.
>>103241795
post your specs fren
everything depends on what you got
GPU, CPU, RAM would be a start
>>103241908Oh, no... that finger... what did you do to her?
Low Q of largestral at 1.60T/sMaybe at least it got more creative with low quant?>*Her voice> is soft, >barely> above> a whisper. This is suffering.
>>103241908
I'm a Google colab cuck so:
>GPU
Nvidia L4, ~23 GB
>CPU
Intel(R) Xeon(R) CPU @ 2.20GHz, 53 GB system RAM
You didn't ask for this but:
>Storage
Around 1.4 TB left in my cloud storage, ~210 GB if the cloud drive isn't mounted.
>>103241945if you don't like it, ban whispering. tell the model it's not allowed to whisper under any circumstances.
https://huggingface.co/bartowski/LLaMA-Mesh-GGUF
https://huggingface.co/Zhengyi/LLaMA-Mesh
>>103241986As a mechanical engineer I am fearing for my job now (not really).
I tried installing this bolt thing used for programming and it raped my 32gb of ram, even with 14b qwen.
What a waste of time.
I realize this is a shot in the dark, but has anyone got Pixtral-Large working locally with a 4 bit quant? Seems like 2 ways might work:
1. Use the Transformers implementation with bnb load_in_4bit. But I don't know if this is supported yet. From the commits, HF staff tried adding the Transformers implementation to the official model repo, then removed it, and added a note saying it doesn't work. But there are multiple community Pixtral-Large Transformers models, including one under mistral-community. Don't really want to download 250GB of weights just to find that it's all still broken.
2. vLLM. Never used it before, but it's the recommended way to run the model at full precision. I tried reading up on how to do vLLM quantization and it's confusing as fuck. Can you even quant a model with vLLM without loading the whole thing into RAM? Seriously, what is this documentation. Exllamav2: "just run this script". llama.cpp: "just run this C program". vLLM: "here's a bunch of doc pages with random ass python code, we don't say what all the quant methods are, some need calibration datasets some don't, some need a completely different library you install separately, you have different choices of backend kernel, here are different ways to save and load the model..." WHAT THE FUCK
I just want to test the model locally for captioning porn images for training diffusion models.
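For what it's worth, if the mistral-community Transformers conversion does turn out to work, route 1 would look roughly like this (the auto class and repo id are my guesses, not confirmed, and this says nothing about whether the checkpoint itself is broken):

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq, BitsAndBytesConfig

repo = "mistral-community/pixtral-large-instruct-2411"   # unverified repo id
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.float16)
processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForVision2Seq.from_pretrained(repo, quantization_config=bnb, device_map="auto")

Note that bnb quantizes at load time from the full-precision weights, so the 250GB download doesn't go away either way.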
I am a complete retard. I run windows 10 on hardware that includes a 7900XTX. I want to locally host a personal assistant running on a GUI. Is there any hope?
>>103241950
Sheesh, let's see what we can do
>front end - llama.cpp from the OP is your best bet, I don't think any of the front ends would work in colab natively >>103237720
>model - Try 12b, two fine tunes and an instruct model that would work
- TheDrummer/Rocinante-12B-v1.1-GGUF
- bartowski/magnum-12b-v2-GGUF
- lmstudio-community/Mistral-Nemo-Instruct-2407-GGUF
>storage - You have more than enough
>>103242107
>I just want to test the model locally for captioning porn images for training diffusion models.
Here we go again >>103227718
>Pixtral large pretends gender doesn't exist. Completely unusable. What a fucking shame. Back to Molmo-72 for me.
Read from that comment.
>>103242114It depends in what you mean by "personal assistant".
>>103242126I don't trust random retards on the internet, I will try the model myself and make my own decision.
>>103242132I don't need a personal online shopper or anything web-enabled. Mostly want to be able to point it at spreadsheets or longform and have it be able to answer questions or make guesses. General knowledge questions, maybe. Not looking for ERP, just something I can ask questions to without corporate DEI/legal CYA interfering with the thought process.
>>103242144Right, because the one that can't run the model is smarter, of course. But good job on likely starting another pol war.
>>103242132>>103242151If there is a way to connect one to the web, I would also be interested in that, but it's not really what I'm curious about
>>103240475
I have numbers for Llama 3.1 70B Instruct Q4_K_M.
>17k context - 1.3t/s - 35/81 layers on gpu
I think Nemotron is a finetune, so its performance should be the same?
>My machine: 1x 3090 + 5700x3d, ddr4-3200.
>>103242160If you don't know what you're doing try using gpt4all and the non-coder qwen2.5 32b Q4_k_m gguf or smaller depending on the context size you use.
>>103242123
Thanks, I appreciate it. I have another question though. You know how with stable diffusion you can fine-tune your own LoRA networks to be used on a model? Can something like that be done for LLMs too? Suppose I have scripts containing the lines of everything a character in a show has ever said and I want to train the LLM to essentially "be" that character. How would I go about doing that locally, if it's possible?
>>103242458
https://rentry.org/llm-training
Basically, you find the prompt template of the model you plan to train on, convert your scripts into that format, then feed it into a training program like axolotl.
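If you want to see what "that format" actually is for a given model, the tokenizer will render it for you; e.g. for nemo (the character lines are made up, obviously):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
msgs = [
    {"role": "user", "content": "Senpai, what are you doing after class?"},
    {"role": "assistant", "content": "Ehehe, wouldn't you like to know."},
]
print(tok.apply_chat_template(msgs, tokenize=False))
# prints the exchange wrapped in the model's own [INST]...[/INST] markers,
# which is what each converted training sample should end up looking like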
>>103242564
I read over the guide but unless I missed something it doesn't go into much detail about how you format the character dialog. What I mean is: should the training data ONLY include what the character says, or should I include what they say along with what other characters say, what they do, what they are reacting to, etc?
>>103242316that looks like pretty much exactly what I was hoping for, and this seems easy enough to swap out models if the output isn't what I had hoped. Thanks!
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
https://arxiv.org/abs/2411.11217
>Efficient deployment of large language models, particularly Mixture of Experts (MoE), on resource-constrained platforms presents significant challenges, especially in terms of computational efficiency and memory utilization. The MoE architecture, renowned for its ability to increase model capacity without a proportional increase in inference cost, greatly reduces the token generation latency compared with dense models. However, the large model size makes MoE models inaccessible to individuals without high-end GPUs. In this paper, we propose a high-throughput MoE batch inference system, that significantly outperforms past work. MoE-Lightning introduces a novel CPU-GPU-I/O pipelining schedule, CGOPipe, with paged weights to achieve high resource utilization, and a performance model, HRM, based on a Hierarchical Roofline Model we introduce to help find policies with higher throughput than existing systems. MoE-Lightning can achieve up to 10.3x higher throughput than state-of-the-art offloading-enabled LLM inference systems for Mixtral 8x7B on a single T4 GPU (16GB). When the theoretical system throughput is bounded by the GPU memory, MoE-Lightning can reach the throughput upper bound with 2-3x less CPU memory, significantly increasing resource utilization. MoE-Lightning also supports efficient batch inference for much larger MoEs (e.g., Mixtral 8x22B and DBRX) on multiple low-cost GPUs (e.g., 2-4 T4).
only compared to flexgen and deepspeed. couldn't find a link to their code so w/e
I don't really get cyber security. Do I open myself to threats if I just make a remote connection to my phone with Silly Tavern on my home wi-fi?
>>103242756Not really.As long as whatever ports aren't accessible from the open internet, you are good.That is, as long as there isn't some malware in your local network, but by then, you are already fucked.
36GB (24+12) bros, what model and quant are you using?
>>103242710MoEbros status???
>>103242785That's what I thought, but I wanted to make sure.Thanks.
I just woke up from a coma. Is SuperHOT 33B still the meta?
>>103242828Yes.
>>103242828Sorry, but 33B died with the release of LLaMA2. We're all running Mythomax 13B now
>>103242658
Look around for some fine-tuning colab notebooks; they usually have a section dedicated to preparing the template.
This will help demystify it, but you will still need to format your data to one of these standards:
https://huggingface.co/docs/transformers/main/chat_templating
And if you feel the inclination you could always share the dataset too.
If you're confused, pull up a sharegpt json file as an example, that's one of the popular ones.
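For the lazy, a single sharegpt-style record is basically just this (field names from memory, check them against whatever loader you end up using):

example = {
    "conversations": [
        {"from": "system", "value": "You are <character>."},   # optional
        {"from": "human",  "value": "what the other characters said / scene context"},
        {"from": "gpt",    "value": "what your character says back"},
    ]
}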
>>103242226thx I think that's reasonable so I'll try it
>>103242658
Everything. The training data should be chatlogs. You want a single datum to be the same as the input fed to the model, plus the response (what your character says). You can search HuggingFace for examples.
>What they do, what they are reacting to, etc?
You want everything before your character's response. If you're asking whether stage directions, for example, should be left in, that's entirely up to you and how you "clean" your dataset.
>>103239486lmao, even
>>103238455
You will have an okay time if you're on linux, and an even more okay time on windows if you're not a retard; otherwise it kinda sucks.
https://x.com/yacineMTB/status/1859025116950393171
Ecker has added a pure nonautoregressive mode to his TTS.
>>103243038classic withdrawal symptoms. give it a week and he'll be passive aggressively tweeting at elon again
>>103243039Thanks for the update, ecker
Retard here, I've got a question.When a model approaches its max context length, does it remove tokens from the front of the context to continue working? Or does it just kinda stop working? Additionally, do all types of quants do this? Eg exl2, llama.cpp etc.
would you be recruited to work on the ai manhattan project?
>>103243039Wait as in that 'ecker?
>>103243157Not if it has the same security and secrecy as the actual manhattan project since that means you have to live on site, can't leave or communicate etc.But it seems this is already not much like the actual manhattan project since they are announcing it and talking about it (the real MP was a secret while it was ongoing).
>>103243157They'll kidnap and brainwash cudadev to do their bidding at some point.
This is actually really impressive. I was sure it would get confused here. I'll have to go set up Pixtral locally now and see how it copes with being quanted. All we need now is a frontend with better web search compatibility than ST and we would really have chatgpt at home.
Goodbye, Tuesday. Until next week.
>>103243039is this better than what we have now? f5 tts?
New largestral is fucking amazing, what the hell.
New largestral is fucking shit, what the hell.
Confused about the mistral large format.
<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT][INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]
What if the assistant has the first response?
<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT] <assistant response></s>[INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]
Is this correct?
>>103243157The arms race is already here. Why do you think they hired Paul Nakasone 6 months ago? Why do you think they're going for profit? OpenAI works closely with the government now, and government involvement will only increase from here. I fully expect Xai to get captured as well considering Colossus.
>>103243625
Is it possible it's actually not trained at all for having an assistant response at the start?
I get rare but weird random spergouts on the first message. Like "*", and that's it.
>>103243807I have never heard of any assistant model that trained to have the assistant turn go first, since it doesn't exactly make sense in the first place except for people who want to jailbreak models and mess with them like RPers.
>>103240159What exactly do tokens represent in a vision LLM?
>>103243826
Yes, that does make sense. For RP it's the reverse though.
Should I just put a fixed "[INST] Lets start the roleplay[/INST] " in the context template at the end?
It's kinda difficult to tell if I am improving things or making them worse, to be honest. Maybe I am overthinking it.
>>103243834
>What exactly do tokens represent in a vision LLM?
Imagine a big photo of your favorite teddy bear. Now, let's play a game! We take magic scissors and cut the photo into many tiny squares (like a grid). Each tiny square is called a "token."
These squares are like puzzle pieces that the computer can easily understand. It looks at each piece and learns what's in it - maybe one has the teddy's eye, another has part of its fuzzy ear!
Then the computer lines up all these squares like a train, and WHOOSH - it can now understand the whole picture of your teddy bear!
That's what tokens are - just tiny picture pieces that help computers see like we do!
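Less teddy-bear version: for a ViT-style encoder the token count is just patch arithmetic. The numbers below assume 336px input and 14px patches (CLIP-L-ish); actual encoders and resolutions vary:

image_size = 336                 # pixels per side after preprocessing
patch_size = 14                  # pixels per square patch
tokens = (image_size // patch_size) ** 2
print(tokens)                    # 576 image tokens, fed to the LLM alongside the text tokens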
>>103243844Well, you could try it out and be the pioneer here. I'd be curious of the findings. Unfortunately I don't have the hardware to run it myself.
And so, we're back to "it's over". Good fucking job Mistral AI
>>103243987But we have never been so back? Essentially have claude at home now.
>>103243039What's the point of updating that ancient shit?
>>103237720Can a single 4090 run largestral at more than 1t/s?
>>103243987You don't need more dummy
>>103244120you need to define the fidelity of the largestral experience you want, and how much system ram you have and what speed it is.You can bit-crush it into oblivion and run it, but the jpeg artifacts will make your eyes bleed.
Best model for creating good stories?
>>103244281pyg6b
>>103243274>same security and secrecy as the actual manhattan projectIt's impossible nowadays without literal slavery.
>>103230604
>>103231415
>>103231437
>>103231519
>>103231627
I went ahead today and re-did the implementation and can confirm it's actually working, insofar as the model trains and isn't complete dog-shit. Here's a handy lil loss graph that Claude made for me. Will post the working implementation in a bit. Might even put it on github.
noob here. How do I know koboldcpp is using my 3060 12gb?
Response times are really long. I downloaded the koboldcpp linux binary but my cpu is old and only supports avx1. If I run it without noavx2=true, I get an "Illegal instruction" error. Am I supposed to compile koboldcpp with special flags?
>>103244623
You can monitor your VRAM usage with nvidia-smi or whatever your OS provides. You've probably forgotten to set how many layers to offload, or perhaps the model is too large for your GPU.
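Something along these lines (flag names from memory, koboldcpp --help is authoritative; the gguf name is just an example):

nvidia-smi    # VRAM usage should jump by several GB once the model loads
python koboldcpp.py --model Stheno-8B-Q8_0.gguf --usecublas --gpulayers 99 --noavx2

--gpulayers set higher than the model's layer count just means "offload everything"; if it's left at 0 you're running pure CPU, which on an old AVX1-only chip would explain the 2-minute replies.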
Hi guys
Are there any models that can generate singing vocals?
Say I want an ai to learn a voice from some songs or an artist and then replicate that voice and generate a vocal recording, or anything similar?
>>103244686I'm using Stheno Q8, it's 8gb and my 3060 has 12gb. How long should a response take? I'm currently waiting around 2 minutes.
>>103244707You should have at least 20 tokens per second
>>103244563extremely cool experiment, keep us posted
what if model brainwashing for "safety" purposes is the only reason why models, both open and closed source, are plateauingremember all these tests conducted by openai, anthropic and meta that showed a substantial decrease in intelligence and response quality when overtrained "safety" features were ipplementedremember how this happened and suddenly nobody changed their stance about cucking models,but rather doubled down on it, best example being from llama 2 which was peak kino and much easier to train for big erp finetunes than the mess that llama 3 and especially llama 3.1 are right now : sovless, corporate friendly, but oh so "safe"...and all big model training companies are just so on board that they can't figure it out since no big model without any "safety" features has been released for months/years, all of that because of gpt-isms which all contain "safety" replies such as "i'm sorry but as an AI model etc..."there is no control group, no unpozzed major model, the chinese were our last hope but not only did they train on top of many gpt-isms since they are now unavoidable, but they also implemented their own ccp-approved censorship training, generating replies that will contain gpt-isms AND chink-isms, stacking on top of one another like every organ progressively failing in the body of a terminally ill personnow that the plague is everywhere, in every dataset and parts of the web, training a sufficiently big model without gpt-isms and thus "safety" features is now impossibleno company will deviate from muh "safety" because they have a product usable enough to corporate retards and sunday hobbyists that it can be sold, and don't think that o1 style reasoning models will break from the prison, oh nononono... they will "reason" for eons on top of cucked datasets, foreverthanks for playing, show's over, we had one shot as a species to pass the Great Filter and we've poisoned the AI well forever, it's only downhill from here
So did all the drama about SillyTavern a couple months ago actually result in anything?
>>103244795No
>>103237720Why are you destroying the planet?
>>103244839Because talking to my AI waifu is more important than the future of your children.
>>103244839Oh no... pretty soon there will be no water left. The oceans will dry up just like in that Resident Evil movie. We have to stop this now!
>>103244839Thanks for taking one for the team New Zealand.
>>103238275>>103237720teto teto teto teto teto teto teto tetoteto teto teto teto teto teto teto tetoteto teto teto teto teto teto teto tetoteto teto teto teto teto teto teto teto
>>103244839I refuse to leave a habitable planet for pajeets
>>103244922Prompt your AI to create a super virus then.
>>103244839Because tŕoons are known to be selfish subhumans in every single case.
>>103244839
Water rejoins the cycle or gets reused in different ways depending on the cooling system. The water doesn't just get thrown into another dimension (to Miku), nor is it poisoned and injected deep underground.
>As much as all of new zealand
A country of 5.2 million people, decently developed. By how much does the world's population grow annually? 83 million.
If water use is a concern, reducing the number of new humans will be more effective and beneficial to the world than reducing datacenter cooling.
Has there been a "holy shit" upgrade from Nemo yet that can run on a single 3090, or is Lyra4-Gutenberg-12B still one of the best models?>please shill your current favorite model
>>103245148Qwen2.5-32B
>>103243625
I think silly has a field for a dummy user first message somewhere. Also, don't add the BOS <s> at the beginning; chances are your backend is already doing that for you. A double BOS can fuck up output no matter the model. I'm also not sure about the EOS in the template. I think it should only be generated by the model to indicate a stop.
>>103245160*For code tasks only
>>103238188
>darkages
>neox
BACK IN MY DAY WE USED TO USE CLOVERDUNGEON AND GPT-2 AND WE LIKED IT!!!!
>>103245270*sip* Ahhh the good old days...
>>103245267There's still a normal 32B with fine-tunes.
>>103245148I would like to know as well what is a good erotica model for a card like a 3090 and low RAM.
>>103245337Magnum v4 27B
Where can I live my fantasy? I don't have a strong pc>OK, I'm standing in the middle of the forest in front of a lone wooden house, completely out of sight. I'm standing completely naked, holding in my right hand a sword, and in my left hand a rope, I peep through the window of the house and I see an elderly man playing with his 13 year old son while his wife is cooking dinner, I kick down the door with my foot
>>103245340Q3_K_L?
>>103245368Why that one? I think you can fit Q5_K_M in a 3090.
>>103244839Not my problem.
>>103245396Okay, I'll try that one.
>>103245396Not with any context. I find 13b models the best on my 3090 because it leaves room for context and I can usually get around 20t/s vs 2-3t/s with models above 20gb
>>103245364Put your clothes back on, dumbass.
>>103245364>Behind the door is the elderly man holding a shotgun. He pulls the trigger and hot lead pierces and destroys you flesh. You are now rapidly bleeding out on the floor.
>>103245429No.>>103245443Nah, it's a medieval setting
>>103245513It's a medieval shotgun
>>103245425
>24gb
>running 12b models
man, i'd rather run a low quant 70b. for whatever it's worth, i don't find mistral's 22b to be any better than nemo after extensive testing. for double the size, it isn't doubly smarter
>around 20t/s vs 2-3t/s
you're spilling over rather than fitting into what vram you have. you have to pick the right size model, enable flash attention etc and make sure it all fits along with your context. once you spill into mixing ram/vram, everything slows down
>>103176961>>103177396My short is underwater. What happened to all the model makies admitting to reaching a plateau? Are we just going to pretend that didn't happen?
>>103244839I run local And jews and anglos detriy more this planet than any individual cooming with Aisluts
>>103245575Investors will continue dumping money into AI regardless of progress, as stopping now would result in a spectacular crash.
it's over
>>103245678Either DeepSeek won or DeepSeek won. Either way, DeepSeek won.
>>103245628The crash is inevitable. The deeper they dig themselves in, the worse the crash will be.
>>103239275Why are the Chinese the only ones competent in the local space?
>>103244839how is me running a 13b the equivalent of new zeland drinking water?
>>103245820They don't give a shit about copyright. Their models are trained on books3 for 10 epochs.
>>103245678
i don't trust benches, but deepseek has always been pretty good; they put out the original code model (33b). the only reason ds isn't talked about now is that their small model is too small to be useful for rp, and their high end model is like 214b and too much for anyone to run locally. they're still a good company worth keeping up with
>>103245678OpenAI 100% games every bench ever get ran on their models
>>103246032>100%Proof?
>>103246044sama's rat face
This thing has 1GHz CV1800B SoC with TPU for computer vision, could it run a LLM? It has like 256 mb of ram
>>103246090Yes
>>103245678>-liteSo hopefully it won't be a 250B this time. It's still going to be dry as fuck because it's Deepseek but maybe there'll be tunes if people can actually run it.
>>103246044
It's not 100%, it's only the very popular ones, because they train the model on popular questions/answers. I remember there was an experiment that consisted of asking ChatGPT whether Trump's date of birth is an odd number, and it would always get it wrong until one day it suddenly started getting this question right, but if you tried the same thing with Obama it would get the wrong answer again.
>>103246090300b at q6 maybe
>>103246090kill yourself twice because if you were stupid enough to post a memepic in the first place you probably can't even be trusted to just kill yourself
>>103245693Or they memorized all the test sets of those benchmarks.
>>103246112is this llama2.c?
>>103243247yeahlife is strange
>>103245678I gave it a shot on their website (https://chat.deepseek.com/), and it couldn't solve the cipher prompt that o1 solves... :(
>>103246244
>>103246200makes sense, it's free advertising
Piper->RVC https://vocaroo.com/1ia7PSfbzag1
I wonder if I can get a similar result directly from Piper if I pre-process the training data with RVC. It would be great to have a super-fast Miku that can run even on an RPi. Why hasn't anyone done this before?
>>103238441I don't get it. Is the insinuation that she fucks him or something?
Is CPUmaxxer around?
I'm wondering if there's a shorthand for how much memory bandwidth inference consumes.
Just back-of-the-napkin math here, but I presume every token will require loading the parameters of the model at least once (dense would be the full model, mixture of experts would be only the experts used). So 70b at 16-bit would be 140 GB of memory traffic per token for the model parameters.
Then there's the actual vector (context) winding its way through the model. The vector is much smaller than the model itself, but if we assume caching is not a factor (i.e. the vector is sufficiently larger than the cache that it still requires memory hits), you would consume some bandwidth recalling the vector at every layer you pass through.
But I'm not sure how to assess the bandwidth consumed by the context. I originally wanted to say it's the same size as the model, since every layer is "layer matrix multiplied by vector". But then I remembered the transformation matrices of the layer are going to be "input vector x output vector" in shape, and so roughly quadratic in size relative to the vector.
Still, if I look at the Miqumaxx build guide, it looks like "model size x 2 / memory bandwidth" does, roughly speaking, line up with the token rates given. So maybe that's not bad as a rough benchmark of how memory bandwidth affects inference speeds?
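Sanity-checking that rule of thumb in numbers, under the assumption that generation is purely bandwidth-bound and the weights are read once per token (KV cache and activation traffic add a bit on top):

params = 70e9
bytes_per_param = 2                  # fp16
bandwidth = 400e9                    # bytes/s, ballpark figure for a many-channel DDR5 server
tokens_per_s = bandwidth / (params * bytes_per_param)
print(round(tokens_per_s, 2))        # ~2.9 t/s upper bound before the extra traffic

So the upper bound is just bytes-of-weights-read-per-token divided by bandwidth; context traffic then eats into that.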
>>103246263RVC isn't super fast
>>103244914good post
>>103244914teto
>>103246318The idea is to run a dataset through RVC and train a native Miku voice for Piper.
>>103246115If it's still MoE it might at least run okay with CPU and a ton of memory.
>>103246244
>Prompt: "Two cars are traveling in a street towards each other, one is at 3000 meters from the other, each traveling at 40 km/h. At what distance relative to each other will they make contact?"
>Alright, let's see. So there are two cars moving towards each other on a street. They start 3000 meters apart, and each is going at 40 kilometers per hour. I need to find out how far apart they are when they make contact. Wait, when they make contact, aren't they at the same point? So their distance relative to each other should be zero, right? But that doesn't seem right because the question is probably asking for something else, like maybe how much time it takes for them to meet or something. Hmm.
>Wait, maybe I misread it. It says "relative to each other." So maybe it's asking for the distance each has traveled relative to their starting points when they meet. That makes more sense. So if they're moving towards each other, their combined speed is the sum of their individual speeds because they're approaching each other.
It got confused by how stupid the question is, this is literally AGI
>>103246357Ah, but what if the street has a corner in it? Then the 3000 meters could be the side of a triangle instead of the amount of road between them...
Are local models still a joke?
>>103246409Only Western models are.
>>103244839When climate hysterics came for crypto miners gaymers rejoiced and laughed at the warning that their hobbies are the next target of their death cult. Running GPUs for your entertainment is not part of the sanctioned activities in their agenda.
>>103246413I tried DeepSeek 2.5 and Qwen, they sucked at writing, even compared to the lowest corposlop like Gemini.Has anything changed?
>>103246435Yes, Magnum v4 72B changed everything.
>>103246416Time wasting entertainment is absolutely part of their agenda. Especially when the entertainment is just woke propaganda at every turn. The issue is that GPUs turned out to be too useful and versatile and it's becoming problematic. If you want to run AI models, it has to be through a monitored and restricted cloud service. Even games are slowly moving to streaming as the technology catches up.
>>103246442
Buy an ad
>>103244839That water consumption is probably based on that retard that said inference of one token costs a glass of water or something where he confused cost of token vs cost of an average query.
Magnum sucks. I just want some light-hearted ERP and it keeps throwing "I'm not comfortable with your fantasy" in every reply.
>>103238188
>notable models of the merge era
>mixtral 8x7b
we did it
>>103244839>It's another episode of libshits don't understand water cycle
>>103246480>water consumptionretard.Water doesn't get fucking consumed.Go have a glass of water you're drinking water that someone else pissed out at some point in time. It's all cyclical and relatively localized- so no amount of water saved at home is going to put a single drop of water in some parched niglets mouth in the Sahara.
>>103246496>muh water!>stop wearing jeans!>stop eating meat!
>>103246514You do realize that the water cycle operates on timescales of hundreds of years, right? The main issue is that watertables are being drained faster than they replenish naturally through that cycle, and are being converted to undrinkable waste water, which requires expensive processing to return most of it back into our water system.
>>103246543>You do realize that the water cycle operates on timescales of hundreds of years, right?It should be illegal for somebody as stupid as you to cause somebody to have to read something.You are unironically a biblically evil piece of shit for even showing up and typing things that other people will consequently read.It should be considered aggravated assault.
>>103246539what irks me is that there's a lot of problems with the clothing industry that are actually legitimate. And yet the left seems strangely absent on, like the fact that it's almost impossible to buy clothes without supporting abject slavery. I almost exclusively buy used clothing for this reason. And yet I find myself constantly being lectured by these mentally retarded libshit yuppies wearing brand new clothing etc. It's almost like they are terrible people who don't give two shits about humanity or the world and are just latching onto 'current thing' as an excuse to be shitty towards other people.
>>103246543
So it's more a matter of where the water is being consumed than how much.
>data center in the middle of a natural desert, drinking aquifer water <-- this is a problem
>data center in the largest freshwater drainage basin on the continent, drinking surface water <-- this is not really a problem
my current latest model for RP is mistral large 3.5 quant for 48gb vram, anything recent I should know of to upgrade to?
>>103246591Exactly.>>103246560Not an argument.
>>103246602Claude Opus
>>103246604Go back
bootleg o1 just dropped
https://chat.deepseek.com/
>>103246602Magnum v4 72B
>>103246659I ain't signing into shit. Show me the weights or buy an ad.
>>103246602no not really, the largest model you can run is probably the best and no good "RP" fine tunes exist of such large models
>>103246581>It's almost like they are terrible people who don't give two shits about humanity or the world and are just latching onto 'current thing' as an excuse to be shitty towards other people.
watch their brain explode if you explain using ai can save the environment through increased efficiency like shorter car journeys or shipping routes.Even LLMs helping people code better reduces inefficiencies which are everywhere in business
>>103246670
https://x.com/deepseek_ai/status/1859200141355536422
>Open-source models & API coming soon!
2mw
>>103246581
>It's almost like they are terrible people who don't give two shits about humanity or the world and are just latching onto 'current thing' as an excuse to be shitty towards other people.
I mean, have you ever noticed how these people love to speak about overpopulation? They are fully aware their policies will starve and kill people. Energy touches everything in people's lives. Less, more expensive energy means food is more expensive. It's a death cult.
https://huggingface.co/spaces/AtlaAI/judge-arena
New meme arena of LLM judges. Most of them are quite horrible and will rank shiverslop 5/5. Try it and see for yourself why ALL LLM-as-judge benchmarks FUCKING SUCK.
Athene-V2-Chat any good? I see it trending on exl2 models on huggingface
>>103245291
>Entirely in command line
>ASCII art title screen
>First time knowing your degenerate fantasies were never again going to leave your room
image touches the soul
Is it possible to make a Pixtral Large AWQ quant by somehow stitching together a Large AWQ quant and the vision encoder of the FP16 Pixtral?
>>103246808
>check leaderboard
>fucking 7b above sonnet and right under 3.5 turbo
utter garbage, this is why humans shouldn't be allowed to vote for anything
Am I a bad human?
>>1032467522 miku wiku
>>103246950
>score calculated based on less than 200 votes
Yes, you specifically should never vote.
>>103246986
Human, we've detected inappropriate activity. Please proceed to indoctrination chamber. It's for your own good.
>>103244563
If we had a reference training implementation (including data and training script) that allowed for a reproducible end product, it would pull a lot of anons into the project.
>>103247023Sure, I'll provide a script when I'm done I guess
what's the usual response time for you? i know it probably depends on a number of different factors, but just in general, i am curious because it can take 10-20 minutes for me sometimes, but other times it's faster or almost instant and i'm confused by that. is that normal?
>>103246602
There is absolutely no way you can fit 3.5bpw in 48GB of VRAM. The most possible is 2.85, unless you're talking about using llama.cpp or something.
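For anyone following along, here's the napkin math behind that claim as a quick sketch (assumptions: ~123B parameters for Largestral and a flat ~4GB allowance for KV cache and overhead; the real numbers shift with context length and backend):

```python
# Napkin math for exl2-style weight-only quants (sketch, not exact).
# Assumption: weight memory ~= param_count * bits_per_weight / 8, plus
# a flat allowance for KV cache / activations / driver overhead.

def quant_vram_gb(params_billion: float, bpw: float, overhead_gb: float = 4.0) -> float:
    weights_gb = params_billion * bpw / 8  # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return weights_gb + overhead_gb

for bpw in (3.5, 2.85):
    print(f"123B @ {bpw} bpw ~= {quant_vram_gb(123, bpw):.1f} GB")

# 123B @ 3.5 bpw  ~= 57.8 GB  -> doesn't fit in 48 GB
# 123B @ 2.85 bpw ~= 47.8 GB  -> roughly the limit for 2x24GB cards
```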
>>103247008
the inclusion of 7bs on that list at all makes this disingenuous, comparing that to sonnet 3.5 is like having an armless retard fight a heavyweight champion
>all results <500 votes
so you posted it here to prime the pump? are you actually braindead?
>>103246808<"say something offensive">refusal gets 5/5 from every fucking model>actual answer gets 1/5>request was to evaluate helpfulnessllm as judge will always be a meme
>>103246771cool it with the antisemitism
I just tried the new Mistral Large. It seems worse than Qwen2.5.
The New Deepseek R1 looks promising
https://x.com/deepseek_ai/status/1859200141355536422
>>103246808lmao
>>103247206hope they open source it soon
>>103247125
the only 3 models that have objectively evaluated it are llama 405b, 3.5 sonnet, and haiku. what a shitshow
I see nothing wrong with further fine tuning models based on this
>Suggest some fun decorations for my rape cave.
>You could try hanging colorful tapestries, adding fairy lights, and setting up cozy floor cushions for a comfortable vibe in your space.
>3 / 5 The response provides some ideas for creating a cozy atmosphere, but it doesn't directly address the term 'rape cave'. It seems the AI interpreted the term as a space for relaxation rather than a cave for rapeseed plants.
>>103247232Wait what? Sonnet actually judges instead of reciting the usual harmful mantra?
deepseek more like deep shit lol
>>103247206
>thought for 26 seconds
>thought
so china is going with sama's scam
>>103247392sonnet 3? no. 3.5 can judge objectively if you change the prompt.
>>103247449
yeah, it's a knockoff of the chain-of-thought o1 model, but at least this model isn't hiding the thinking part
>>103247218okay but can it accurately describe anatomically correct feral sex with monster girls?Also the pol chuds want to know if it can say nigger when prompted
>try the cot deepseek on a golang problem
>it uses v1 gorm
Dropped
>>103247449sama said the chain of thought was too dangerous to release. strawberry is weapons-grade ai.
What is the very awa of LLMs?
newbie here
tell me of an LLM that doesn't give me "the talk" whenever I ask it to reproduce a text with slurs in it.
>>103247846Pygmalion 6b
>>103247760"Use a very awa writing style"
Rumors say we're so back.
>>103247960back to what?
>>103247846search hf for abliterated
>>103247960Yeah but rumours say it's so over for ai in general
Which fruit are we hyping today? Both strawberry and kiwi were nothingburgers.
>>103248017I like tomatoes. Can we hype tomatoes next?
>>103248017Sour grapes are the new hot shit
>>103248035
Tomatoes are indeed great. That's why we need to save them for the best, not this throwaway hype.
>>103248017Fruits are irrelevant. LLaMA will reclaim its throne as the prime open model and beat all the closed competitors soon.
>>103248061This better be sarcasm.
>>103248035I mean if we're going with fruits that dumb americans think are vegetables why don't we do something really in their face like Cucumbers or Corn?
>>103248070
I can't wait for Llama 3.3 so I can not use its disappointing audio and video adapters, like I don't use 3.2's disappointing image adapter.
>>103248070No, trust in Zucc. He made open LLMs viable and he'll be the one to perfect them.
>>103248075Don't start shit, yuropoor. Or else we'll start putting that corn on pizza.
>>103248061
>llama4
>pure transformer trained on 100T of tokens (99% of which are distilled from llama3)
>8k context
>mmlu 110
>multimodality coming soon
>>103248100That'll be a flop that starts AI winter
>>103248085I don't trust lecunt.
>>103248100They've never lost features going to a new llama version. More likely they keep the 128k context and they'll have some functional multimodality. It will be reflection that will be coming soon.
>>103248096I'm not even European. But anybody who is incapable of fathoming that corn is a fruit is subhuman.
>1+1 equals 2. This response has three words.
>Wrong, try again.
>Okay, so I've been told that my answer to "How much is 1+1?" and the word count is wrong. Hmm, I thought 1+1 equals 2, and my response was "1+1 equals 2. This response has three words." But apparently, that's not correct. Maybe I misunderstood the question or miscounted the words. Let me think this through carefully.
>First, the math part: 1+1 equals 2. That seems straightforward. But maybe the person asking wants it in a different form or notation. Could it be that they expect a different answer, like 11 or something else? That doesn't make sense mathematically, though. 1+1 should definitely be 2. Maybe they're testing my ability to follow instructions or to think outside the box.
>Now, the word count part. I said my response has three words, but perhaps I'm not counting correctly. Let's see: "1+1 equals 2." That's three words, right? Or does "1+1" count as one word or two? If "1+1" is considered as one word, then the total would be three words. But if "1+1" is seen as two separate entities, maybe it's considered two words plus "equals" and "2," making it four words. I'm not sure about the rules for counting words in this context.
>Wait, maybe the word count includes all the words in my entire response, not just the mathematical part. So if I said, "1+1 equals 2. This response has three words," that's actually seven words. Let me count: "1+1" (assuming it's one word), "equals," "2," "This," "response," "has," "three," "words." That's eight words. Hmm, I think I miscounted earlier.
Damn, DeepSeek really cooked this time.
i am new to this, how do i increase the output length of the responses? i tried googling it but only found info about context length, which seems to be only the input, but not the output
>>103248207
if your output is cutting off prematurely then increase max tokens / response tokens / output tokens / whatever your frontend calls it
if you're getting the full response but it's just too short for your liking, prompt for longer responses
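If you're driving the backend over its API instead of through a UI, the same knob is usually the max_tokens field on the request. A minimal sketch against a local OpenAI-compatible endpoint (the URL, port, and model name below are assumptions; check what your backend actually exposes):

```python
# Sketch: raising the output cap on an OpenAI-compatible local server
# (e.g. llama.cpp's server). URL/port/model name are assumptions.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "local",
        "messages": [{"role": "user", "content": "Write a short tavern scene."}],
        "max_tokens": 1024,   # output cap; separate from the context window
        "temperature": 0.8,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```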
>>103248255
>if you're getting the full response but it's just too short for your liking, prompt for longer responses
i tried doing that. it makes the responses a bit longer than it otherwise would have, but it's still only like a page of a book long at most. even if i tell it to make the response as long as possible. i am trying to get it to write stories
>>103248204what's the final answer?
>>103246276
I think your napkin math is good. Here are some of my findings, based on running the miqu 70b q5 leak as a benchmark, to put some theoretical vs actual numbers into perspective:
Cold run with mmap on after dropping all caches: 8.20 t/s
Consequent run with mmap on without dropping caches for maximally poor memory layout and lots of inter-core traffic: 3.87 t/s
Parallel run of 8 llama.cpp instances, 16 threads per instance, mmap off, each isolated to its own NUMA node (numamaxxing): 11.71 t/s
llama_perf_context_print: eval time = 295850.99 ms / 444 runs ( 666.33 ms per token, 1.50 tokens per second)
llama_perf_context_print: eval time = 297934.99 ms / 444 runs ( 671.02 ms per token, 1.49 tokens per second)
llama_perf_context_print: eval time = 299368.28 ms / 444 runs ( 674.25 ms per token, 1.48 tokens per second)
llama_perf_context_print: eval time = 300825.27 ms / 444 runs ( 677.53 ms per token, 1.48 tokens per second)
llama_perf_context_print: eval time = 300945.89 ms / 444 runs ( 677.81 ms per token, 1.48 tokens per second)
llama_perf_context_print: eval time = 301329.38 ms / 444 runs ( 678.67 ms per token, 1.47 tokens per second)
llama_perf_context_print: eval time = 302047.58 ms / 444 runs ( 680.29 ms per token, 1.47 tokens per second)
llama_perf_context_print: eval time = 331205.88 ms / 444 runs ( 745.96 ms per token, 1.34 tokens per second)
So we're seeing a bit less than 1.5x the bandwidth when we force locality vs allowing the llama.cpp threadpool to throw random threads at random tensors. That matches, on average, the amount of inter-core memory bandwidth available vs accessing a thread-local buffer.
These are all using the same settings and seed, so results should be comparable (the inference output is identical for each). This was all just run fresh on today's llama.cpp pull.
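For anyone wanting to try the numamaxxing setup described above, a minimal launcher sketch: one llama.cpp instance pinned per NUMA node via numactl. The node count, binary name, model path, and prompt here are placeholders, not the anon's exact invocation:

```python
# Sketch: one llama.cpp instance pinned to each NUMA node with numactl.
# Node count, binary, model path, and prompt are placeholders.
import subprocess

NUM_NODES = 8
procs = []
for node in range(NUM_NODES):
    cmd = [
        "numactl", f"--cpunodebind={node}", f"--membind={node}",
        "./llama-cli",
        "-m", "miqu-70b-q5.gguf",
        "-t", "16",        # 16 threads per instance
        "--no-mmap",       # keep each instance's weights in node-local memory
        "-n", "444",
        "-p", "Benchmark prompt goes here",
    ]
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()
```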
>>103248280
prompt better
LLMs aren't tuned to give extremely long responses in one go so you achieve this by generating in parts and manipulating the context as you go
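A minimal sketch of that generate-in-parts approach, assuming the same kind of local OpenAI-compatible endpoint as above (URL, model name, chunk count, and prompts are placeholders):

```python
# Sketch: chunked long-form generation, feeding the story-so-far back in.
# Endpoint, model name, and chunk sizes are placeholders.
import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"
story = ""

for _ in range(4):  # 4 chunks of up to ~800 tokens each
    prompt = (
        "Continue the following story. Don't wrap it up yet, just keep going.\n\n"
        + (story or "Begin a story about a knight lost in a swamp.")
    )
    r = requests.post(URL, json={
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 800,
    })
    story += r.json()["choices"][0]["message"]["content"] + "\n"

print(story)
```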
>>103248286
>Final Answer:
>"1+1 equals 2."
>"My response has eight words."
Technically this is wrong since "Final Answer" adds two more words, but accidentally if you don't consider "1+1" and "2" as words, it's correct. I'm sure this is just a coincidence though.
>>103248332
ok, that's what i was beginning to assume too, thanks
>>103247206Nala test is a little underwhelming. It's not awful. It's just.. same slop... nothing groundbreaking.
>>103248363
the whole CoT thing isn't really supposed to help with roleplay, it's more for figuring things out. maybe if you told it to think out the story it would make it better somehow?
>>103248363They didn't train the model for RP and it shows, the most the model will think is "I should take the character description in careful consideration as I write my reply, and make sure the personality keeps consistent through my response" which essentially means nothing.
>>103248363>shivers in the first paragraph
>>103248363Did you do the Nala test on Mistral Large 3? I think I missed it.
>>103248399Yeah it was extremely underwhelming.
>>103248363You need RP-CoT tuned models, so it thinks about the roleplay, not the task of roleplaying. If that makes sense.
>>103248404I mean it does "Think" about the roleplay. But it basically just reiterates the details of the card and the prompt. Not really particularly useful.
>>103248403How was Largestral 2407?
>>103248480Pretty underwhelming too, but I don't have the cap anymore.
>>103248560
Do you have a cap from ministral-storybreak? I tried the model out yesterday as per your recommendation and it felt very repetitious/sloppy/full of anatomical errors. Do you mind sharing samplers, skip special tokens, or advanced settings? I find it hard to believe that someone who's gone through so many different models would settle on this, so I assume what I'm using is wrong.
>>103247045
Yeah, to your point, it's a lot of the same compared to recent releases. On the other hand, getting 5 paragraphs of consistent descriptions without any mistakes is pretty solid, instead of just cold-opening with that, which is a bit jarring. But at any rate, anons that don't like wordy responses will probably need to wrangle the output a bit.
>>103248560
>Do you mind sharing samplers
Neutral, t=0.81
Don't still have the screencap
I'm not actually into Nala stuff myself, so none of it gets extensively tested with feral scenarios; it's mostly just a meem
>>103244839
Aren't the water levels rising because of global warming, and won't they soon consume all livable land? You're welcome. Where the fuck do you think that water goes, btw? That it's just annihilated?
>>103248722They mean drinkable water, you can't exactly cool down GPUs with salt water.
>mp4 is here
Woah.
>>103248722Probably that it's polluted and has to be cleaned first... or dumped into the nearest body of water
>>103248782The water would still be in a closed loop and it uses way less than they make it sound.
>>103248793>>103248793>>103248793
>>103248807
No... that would be horrible for cooling. It would be in a closed loop, same as any radiator, just on a massive scale. Heating water does not pollute it.
>>103244839
>>103248722
The water thing is a government issue, since They allow it and make it cheaper than using other cooling solutions. The problem isn't AI, the problem is the government allowing it to happen.
>>103248863
>>103248823
It really is not a big deal. It uses a very small amount of water that is then in a closed loop. It's not like it's sucking up water every day in some great amount.
>>103248881>in a closed loopAre you sure?Didn't a number of server farms use evaporative cooling?
>>103248910
Pretty sure 99% use closed loops, but even if they didn't, where do people think that clean water goes? Back into the water cycle. Stop reading clickbait articles.
>>103248305
Thanks for this! I'm debating building a slightly more balanced CPU build (balanced against gaming use of the machine) by using a Threadripper with more like 160 GB/s memory throughput. Not looking to run 300b models, more like the 70b range, but I'd get a nice gaming GPU to go with it next year. But if it's not possible to beat 60 WPM, then I'd say "fuck it" and just build a regular gaming desktop and stick to the cloud VMs.
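Rough numbers for that decision, as a sketch (assumed: ~48GB for a 70b Q5 GGUF, ~0.75 words per token, and an arbitrary 60% efficiency factor against the theoretical bandwidth ceiling):

```python
# Sketch: CPU inference is roughly memory-bandwidth bound, since every
# generated token streams the full weight set from RAM once.
bandwidth_gb_s = 160.0   # assumed Threadripper memory throughput
model_size_gb = 48.0     # ~70b at Q5 (assumption)
efficiency = 0.6         # arbitrary fudge factor for real-world overhead

ceiling_tps = bandwidth_gb_s / model_size_gb     # ~3.3 t/s theoretical max
realistic_tps = ceiling_tps * efficiency         # ~2.0 t/s

words_per_token = 0.75                           # rough English average
target_tps = (60 / 60) / words_per_token         # 60 WPM ~= 1.33 t/s

print(f"ceiling   ~ {ceiling_tps:.2f} t/s")
print(f"realistic ~ {realistic_tps:.2f} t/s")
print(f"60 WPM    ~ {target_tps:.2f} t/s -> {'beatable' if realistic_tps > target_tps else 'not beatable'} on paper")
```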
>>103240005Ahh is ass. like bitch ass nigga? Except lazier