/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102557546 & >>102552020

►News
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102557546

--Running 405B model out of swap for storywriting discussed:
>102562681 >102562696 >102562781 >102562801 >102562866 >102563205 >102563274 >102563337 >102563429 >102563483 >102563566 >102563368 >102562874
--Nature paper explores unreliability of larger and more instructable language models:
>102562635 >102562668 >102562702 >102562783 >102562788 >102562697
--Llama.cpp maintainers wait for contributors with software architecture skills to add multi-modal support:
>102561800 >102561867 >102561910 >102561929 >102561976 >102561905 >102562037 >102562238 >102562274
--Qwen 72b and GPT-4o succeed at scrolling sine wave coding challenge:
>102561725 >102561780
--Llama3.2 1B output and discussion on training data curation:
>102563707 >102563790 >102563855 >102563957 >102563804 >102563823 >102564062 >102563969 >102563996 >102564022 >102564073 >102564101 >102564210 >102564231 >102564258 >102564291 >102564328 >102564474 >102564312 >102563991 >102564020 >102564090
--Future of llama.cpp HTTP server debated:
>102564790 >102564836 >102564855
--Yann LeCun tweet comparing LLM performance:
>102562994
--Discussion on using base models vs. instruct models and the challenges of training your own models:
>102562778 >102562786 >102562824 >102562840 >102563010 >102563054 >102563068 >102563099 >102563143 >102563159 >102563183 >102563212 >102563238 >102563298
--Discussion about the Director extension for sillytavern:
>102558221 >102558266 >102558285 >102558300 >102558343 >102561423
--Clarification on reasoning behind o1's performance and potential improvements:
>102558399
--Char card writing tips and debate on using {{char}} and {{user}} tags:
>102562150 >102562260 >102562298 >102562312 >102562438 >102562327 >102562303
--Miku (free space):
>102558522 >102558892 >102563189 >102563263 >102563296 >102565148

►Recent Highlight Posts from the Previous Thread: >>102557552
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>mfw I can't train a SOTA smut model to compete with billion dollar megacorporations using my 4 year old gaming GPUs
>This tranny shill is still tilted and I'm living rent free in his retarded head
The amount of schizo retards lately is incredible.

>>102565849
You can already get smut from any model. The only ones complaining are the ah ah mistress skillet gang.

Does anyone know if the 20B vision parameters from 90B process the entire context? Or are they used only when an image is present?

>>102565904
including hella sloppa if the only requirement is "generate some form of smut"
https://xcancel.com/kopite7kimi/status/1839343725727941060
It's official, the RTX 5090 is gonna have 32gb of VRAM
>>102565941
How did Kimi Raikkonen find this out?

https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/
>During adapter training, we also updated the parameters of the image encoder, but intentionally did not update the language-model parameters. By doing that, we keep all the text-only capabilities intact, providing developers a drop-in replacement for Llama 3.1 models.
Wait, so the LLM part of the Llama 3.2 models is literally identical to 3.1? Doesn't that mean you could swap out those LLM weights for any uncensored finetune of Llama 3.1, thereby creating an uncensored VLM? Because in my experience testing 3.2 on captioning images, it very much can see the NSFW parts of the image; it's just incredibly hesitant to describe them. It also seems like the image features are fairly low-level, relying on the LLM to piece things together and infer what's happening in the image. So maybe all it takes is replacing the LLM weights in the vision model and it can be greatly improved.

>>102565949
>How did Kimi Raikkonen find this out?
Bwoah!

>>102565941
Damn, I may actually get one then.
Anyone else feel like local LLMs already peaked and it has been downhill for a while? I'm looking at some older gens and models could actually write in a style that wasn't just X, Ying.
>>102565950
Oh, thanks for posting that. So it confirms that it is indeed the same weights (if that line can be trusted). Man, I wish they would separate out the safetensors so you didn't have to basically redownload the stuff.

>>102565950
I'm guessing that's why the multimodal performance is lacking compared to some competitors. But it's logical why they would do this. Hopefully Llama 4 is just multimodal from the beginning.

>>102565994
no

>>102565994
The last time I tried an old model because people said it was slop free, it was utter garbage. Dumber, AND it even had slop. I fucking saw whispers and other shit. Maybe it wasn't nearly as slopped as some current models, sure. But I think it turns out that a lot of slop in fact comes from human datasets, not merely from tuning on synthetic data.

>>102565994
yes
Is molmo good as a text model? Yes sure it's good with images but how about just regular ass RP?
>>102565822
>>102565835
sex
with miku

>>102566164
People seem to think it's decent, at least. I've seen a big outpouring of astroturfed "WOW AMAZING!!!" feedback on it, but in reality it seems to just be goodish. Probably not better than any similarly sized model out there.
I'm finding that Mistral Small is much better with a one-message prompt, where all the previous chat, instructions, and context are put in only one user message without system tags in between. Has anyone seen this happen with other models?
>>102566189
>Probably not better than any similarly sized model out there
Like what? No one has ever said Qwen or Llama 72-70B are good for RP.

>>102565941
>5090
>32gb
who is this guy? is he a trusted source or another grifter?

>>102566201
Can you give an example of what that looks like?

>>102566227
Yeah. I'm saying that they're all kinda meh. To be fair, most people are running the 7b; I don't think a lot of people are ABLE to run the 72b yet, between the initial layer of hardware filtering, the lack of GGUF, the lackluster performance of Qwen since it's based on it, etc.
>I haven't yet figured out how much their server maintains the spirit of the refactoring from #5882, or if merging their version of server.cpp into ours would be too much of a regress. If we're going to continue this discussion much further, perhaps opening a new issue to discuss sync'ing our version of server.cpp with ollama's would be useful?
>>102566240
He is the CEO of trusted source

>>102566246
As in:
[INST] {Description}
{examples/previous chat} (no [INST] etc.)
{instructions}: continue the roleplay etc.
Then finally [/INST] and the AI reply
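A minimal sketch of the single-message format anon is describing, for Mistral-style [INST] wrapping (the helper name and placeholder text are illustrative, not from any library):

```python
def build_single_message_prompt(description, chat_log, instruction):
    """Pack the card description, prior chat, and instructions into ONE [INST] block,
    instead of tagging every turn with its own [INST]/[/INST] pair."""
    history = "\n".join(f"{name}: {text}" for name, text in chat_log)
    return f"[INST] {description}\n\n{history}\n\n{instruction} [/INST]"

prompt = build_single_message_prompt(
    "Aqua is a cheerful goddess.",
    [("Aqua", "Hello there!"), ("User", "Hi.")],
    "Continue the roleplay as Aqua.",
)
```

The model then completes after the closing [/INST] as if replying to a single user turn, which is exactly the "no system tags in between" setup described above.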
>>102566301
>discuss sync'ing our version of server.cpp with ollama's would be useful?
upstream has become downstream
grim

>>102566312
Huh. And how are you formatting the previous chat? Just this?
Character 1 name: blah blah
Character 2 name: blah blah
etc
etc

>>102566349
after november 5th, can you post a zip of the full collection? they're really good

>>102566349
What are we hoping happens after November 5? Strawberry is already out.

>>102566385
Strawberry 2

>>102566385
Hold on, o1 was strawberry?

>>102566399
Yeah, that's what they said.

>>102566385
>What are we hoping happens after November 5?
Trump will be president for a 2nd time

>>102566385
not hoping, but wouldn't be surprised if it ends up being the day the llama.cpp repo gets archived with how it's going; probably what the webm with bodies meant, collective vramlet death

>>102566352
Yes, like that

>>102566385
>he doesn't know

>>102566399
>>102566406
God, how awful. It's honestly notably worse than chatgpt4o-latest, and even their worst models like the furbos, 4, etc.

>>102566421
Interesting, thanks.

>>102566406
Then why did they hype up November only to release it 5 days after Reflection?
Oh, Qwen...
>>102566490
Idk
Ask the guy who's apparently responsible for strawb
https://xcancel.com/polynoamial/status/1834280155730043108

>>102566408
>implying Dominion isn't dialed in now

>>102566515
Oh god. It's really bad.

>>102566551
Daddy...!

>>102566549
they know they can't do it twice, that's why they tried to kill him twice
https://www.hindustantimes.com/world-news/us-news/ryan-routh-sported-biden-harris-sticker-on-pickup-truck-accused-trump-of-turning-americans-into-slaves-101726467986038.html

>>102565541
Why the fuck do you think I said 32GB? Already saw these leaks a while ago, not exactly a big shocker. Question is if they give it to the 5090 or the (not going to happen) Titan. No real reason to give it to the 5090 when you think about it either.

>>102565941
>>102565949
Bless this autistic little faggot
https://youtu.be/7i1jFcPwqoo

>>102566607
>they can't do it twice
lol
what the fuck do you think is going to happen when they make harris win? trump will cry to the courts and they'll throw everything out, just like they did last time and like they did with the kerry shit

>>102566668
back then they had an excuse to use dominion, they have zero excuses now so it won't happen, like I said they tried to kill him so they know that they can't do the dominion trick twice

>>102566607
what the fuck do you think is going to happen when they won't let trump win by cheating again? trump will rightfully cry to the courts and they'll throw everything out because the pedo kennedys own this gay country, just like they did last time and like they did with all the other evil hitler tier shit they did.
What will you do with video multimodal llama?
>>102566695
Gimmick

>>102566490
They only released o1-preview, an early snapshot that had been sitting through US gov review; o1 full is still in training.

>>102566684
>remote into voting machine, add 50k votes
literally nothing will happen, the US is a democracy in name only
might as well call it the People's United States of America at this point kek

>>102566695
I dunno, nothing? If it's another 20-60B added for another dogshit multimodality, it's not worth it.

>>102566704
That'd explain why the language it chooses to use feels like old GPTisms.

>>102566720
why are they trying to kill him if they can simply cheat like in 2020 and call it a day? it's gonna be more difficult to do this time, that's the point, we'll see

>>102566408
>>102566549
>>102566607
>>102566668
>>102566684
>>102566693
>>102566720
>>102566814
nobody cares
>>>/pol/

>>102565835
>Future of llama.cpp HTTP server debated
Does anyone unironically use llama.cpp server?

>>102566704
>the smarter one gets worse at "biology" when you sample from many answers instead of letting it just run once
now I wonder if it starts exploring some unacceptable chains of thought when it tries to reason about whether transwomen are women

>>102566980
What else do I use? I haven't been here for ages

>>102567061
Everyone here uses KoboldCPP, get with the times grandpa.
LLMs are like lolis: the best ones are small and impressionable.
>>102566980
Does exactly the same thing as all other forks and wrappers. More like why use anything else?

>>102565994
>X, Ying
That's an /aids/ dog whistle. No wonder you're miserable.

>>102565994
I have to agree. After seeing how hard Erato punches above her weight, it's honestly hard to go back to localslop. I'm really trying. But we'll have to catch up eventually... I mean, I have to believe in something, don't I?

>>102567147
>hur dur not local trash in a local thread
go fuck off into the cloud thread and buy an ad nigger
local has improved substantially for everything except gooning to child rape """roleplays"""
>>102567184
>it has improved substantially except for the only reason you would want to use local in the first place
so is there a llama-cpp-python server script for multimodal like there was for spec decoding? that is theoretically possible right? or is it fundamentally unsupported in the llama.cpp library itself instead of just not implemented in an example/server?
Any word on llama 3.2 support for llama.cpp or exl2?
>>102565941
It's over.
Man, what the fuck happened to AMD's big push to put a gorrilion GB of HBM on consumer cards? Who can save us now? All these new accelerator startups are still YEARS from being capable of taping out competitive chips.

>>102567281
>Any word on llama 3.2 support for llama.cpp
lol

>>102566515
Yep, grim lol

>>102567281
see
>>102561905
>>102561867

>>102567108
I'm miserable because it's hard being a prosegod in the current local meta.

>>102565835
The bookmarklet is very convenient. I didn't know about them.

>>102566666
>Bless this autistic little faggot
this
https://youtu.be/gc7av-OXMyg?t=9

>>102567337
That's a basic sentence structure, fucking troglodyte

>>102567355
I didn't realize how easy it was, either. Literally right-click on the bookmarks toolbar, "new bookmark", put the oneliner in the URL field, and just click it once on each new thread to fix the links.
recap anon should probably add a note on that in the rentry

>>102567329
They will add support for it just like they added support for Gemma when it released.
molmogguf?
>>102567380
Yes, but when it's done in 90% of the sentences it's annoying.

>>102567329
God fucking damnit you goddamn NIGGERS. It's literally called llama.cpp. Multimodality is going to be a feature of future models with the first big release being llama and somehow there isn't a rush to support it? I spit on niggerganov.
llama.rust when?
>>102567470
there's something called mistral.rs

>>102567459
smart people aren't here to give you everything you want for free
have you considered having claude make it for you?

>>102567470
https://github.com/huggingface/candle

>>102567470
https://github.com/EricLBuehler/mistral.rs
So does Llama 3.2 90B pass this test or not?
>>102567500
>unsafe
>unsafe
>unsafe
Wow, what a great language, how safe.
So wait... if I neutralize all samplers (making them either 0 or 1 depending on the setting) and just put top-k up to 64 and temperature to 1.05, I get non-sloppy results on L3.x models? Why didn't anyone tell me this earlier?
>>102567500
>Implement the Llama 3.2 vision models
https://github.com/EricLBuehler/mistral.rs/pull/796
Seems almost done with it, from the todo.

>>102567577
shhh, don't tell them that trusty old top-k is the secret sauce for true soul

>>102567497
>smart people aren't here to give you everything you want for free
Ignoring multimodality support seems pretty dumb to me.
>for free
Open source is literally smart people giving me things for free.

>>102567603
>Open source is literally smart people giving me things for free
smart people btfo

>>102567365
>mfw this dude is so well known that even as an outsider to F1 I'm well aware of him and his autism in interviews or driving skills
Gotta love how that stuff works

>>102567292
Didn't they put HBM in Vega? The fuck happened?

>>102567577
post a singular slopless log.
temp: 1.28
top k: 30
you can now enjoy llama 3.2

>>102566396
>not 'Strawberry 3'
shiggy diggy

>>102567549
Maybe it is like vision stuff where quanting rapes the vision part.

>>102567822
But most quants of vision models just run the vision part at full precision...

>>102567822
The quant rapes everything; it's just harder to notice with the text than the images. Quantfags are literally holding everyone back.

>>102567813
So llama 3.2 is overcooked?

>>102567577
I just increment min-p above 0 (even as low as 0.01) and it does the same thing. These samplers confuse me and I don't know what I'm doing
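For anyone confused about what these knobs actually do, here's a rough toy sketch of temperature, top-k, and min-p over a logit distribution. It's simplified (real backends like llama.cpp apply samplers in a configurable order, and this is not their code), but the mechanics are the same:

```python
import math, random

def sample(logits, temperature=1.0, top_k=0, min_p=0.0):
    """Toy sampler: temperature scaling, then top-k, then min-p, then draw one token id."""
    # temperature divides logits before softmax; >1 flattens the distribution, <1 sharpens it
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    keep = set(range(len(probs)))
    # top-k: keep only the k most probable tokens
    if top_k > 0:
        keep &= set(sorted(keep, key=lambda i: -probs[i])[:top_k])
    # min-p: drop tokens whose probability is below min_p * (probability of the top token)
    if min_p > 0:
        cutoff = min_p * max(probs)
        keep &= {i for i in keep if probs[i] >= cutoff}
    # renormalize over survivors and draw
    filtered = [(i, probs[i]) for i in sorted(keep)]
    z = sum(p for _, p in filtered)
    r = random.random() * z
    for i, p in filtered:
        r -= p
        if r <= 0:
            return i
    return filtered[-1][0]
```

This is also why min-p 0.01 "does the same thing" as a moderate top-k: both just amputate the long tail of garbage tokens and leave the rest of the distribution alone.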
>>102567878
>Quantfags are literally holding everyone back.
Shifting the blame onto average Joe from greedy vram denying manufacturers
I see what you're up to!

>>102568217
If people weren't coping with shitty quants then we'd have more blame available to throw towards the manufacturers.

>>102565822
LLaMA-3.2 quantization evaluation
https://github.com/ikawrakow/ik_llama.cpp/discussions/63

>>102568236
>coping
https://www.reddit.com/r/LocalLLaMA/comments/1fps3vh/estimating_performance_loss_qwen25_32b_q4_k_m_vs/

>>102568430
>quants are magically better in some cases
You cannot tell me that those tests aren't shit.

>>102567878
Buy us all a few hundred GB of VRAM each, then. You can afford it, you're not poor, right?

>okay lets see how handicapped 90B is at writing
>it's somehow even worse, and also the refusals are now loops
Damn, I think this is the first one that actually needs abliteration AND tuning.
Interesting long context benchmark that prompts models with entire recently-published novels and checks their recall and understanding.
https://novelchallenge.github.io/index.html
>Nocha is a dataset designed to test the abilities of long-context language models to efficiently process book-level input. The model is presented with a claim about a fictional book along with the book text as the context, and its task is to validate the claim as either true or false based on the context provided. The test data consists of true/false narrative minimal pairs about the same event or character (see example below). Each false claim differs from its paired true claim only by the inclusion of false information regarding the same event or entity. The model must verify both claims in a pair to be awarded one point. The accuracy is then calculated at the pair level, by counting the number of correctly identified pairs and dividing it by the total pairs processed by the model.
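The pair-level scoring quoted above is easy to sketch (illustrative helper, not the benchmark's actual code):

```python
def pair_accuracy(results):
    """results: list of (true_claim_correct, false_claim_correct) booleans per pair.
    A pair scores only if BOTH claims in it were judged correctly."""
    correct_pairs = sum(1 for t, f in results if t and f)
    return correct_pairs / len(results)
```

Note this is why the numbers look brutal compared to per-claim accuracy: a model flipping coins gets ~50% of individual claims right but only ~25% of pairs, so 52% pair accuracy is well above chance but still unimpressive.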
>>102568773
What do you mean? This is peak writing right here! We localchads support safety and inclusivity in AI space!

>>102568781
Glad to see the mistral-large meme finally die. vramtards once again BTFO, enjoy your goliath 2.0 fucking retards

>>102568773
I found 3.1 tunes to need high temp with a little min-p, but it's worth it for the smarts / instruction following, which is legit as good as / better than claude / gpt4s. They are smart enough to not go retarded. Though Hanami is the 3.1 tune I'm talking about.

>>102568861
>Jamba mini beat Jamba large
Maybe there's hope for VRAMlets after all... as soon as the mamba PR in llama.cpp merges I'm gonna be testing the fuck out of it

>>102568236
>NOO STOP COPING WITH THE THING YOU'RE FORCED TO USE BECAUSE THERE IS NOT MORE VRAM AVAILABLE REEE
You are retarded and a literal shill for Nvidia and AMD, holy hell kill yourself

>>102568861
fuck you, I can carry three watermelons just fine

>>102568781
Surprised to see Jamba doing so poorly.

>>102568781
>commander r better than plus
what?

>>102568781
>405B
>52% accuracy
lmao this shit is dead
>>102568954
Why? There have always been shortcomings with various benchmarks; it's reasonable that there are some drawbacks to Jamba's method of context extension that weren't obvious on those.

>>102568954
It's not like needle in a haystack. The model actually needs to meaningfully work something out of the text provided. Which makes the results kind of weird anyways.

>>102568861
The Goliath fiasco legitimately made me cancel my second GPU order; dodged a fucking bullet there.
Never listen to vram hoarders, it takes just a couple of minutes to check those models online for cents. They're nothing special.

>>102568781
Oh no no no, 3.5 Sonnet sissies...

>>102568954
They always were weaker than other smaller models at typical normal context tasks; they just held up better at higher contexts, at least according to their published data. I'm convinced they've just been training them on shit data. Maybe if someone like Mistral experimented with the architecture we'd have something.

>>102569002
oh no no no no zoomer buzzword coping faggoting nigger brother sister trannies... shut up you dumb cunt holy fuck grow up

>>102568781
>literally the best model available to humanity is only 68% accurate
It's over.

>>102568861
okay, a bit of cope from me, but it seems that largestral does poorly because they test on a really long context and largestral has about 32k of real context

>>102569023
seething lmao
>>102565950
maybe, if someone knows what they're doing. I'm trying to load in the state dict from 3.1 8b over 11b but I just get gibberish.
there's a mismatch in their token embedding matrix dims (128256 for 8b, 128264 for 11b) so I am just using the one from 11b
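One hedged way to handle that dim mismatch (untested against the actual checkpoints; whether the shared rows line up is exactly what anon is probing) is to overlay only the 128256 shared-vocab rows from the 8B finetune and keep the 11B-only trailing special-token rows, shown here with small numpy stand-ins:

```python
import numpy as np

def merge_embeddings(src_8b, dst_11b):
    """Overlay the 8B embedding rows (e.g. 128256) onto the larger 11B
    matrix (e.g. 128264), preserving the 11B-only trailing rows
    (the extra special tokens the vision model added)."""
    assert dst_11b.shape[0] >= src_8b.shape[0]
    assert dst_11b.shape[1] == src_8b.shape[1]
    merged = dst_11b.copy()
    merged[: src_8b.shape[0]] = src_8b  # shared vocab comes from the finetune
    return merged
```

If the output is still gibberish after this, the mismatch is probably deeper than the embedding table (e.g. the adapter's cross-attention expecting the original 3.1 weights).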
>>102569030
Huh. I wonder what the benchmark looks like if it was done at 20-30k then.

>>102569002
Based on the results, it seems like this is pretty sensitive to parameter count, which I guess makes sense since it needs to be able to juggle way more data in its head at once than is typically asked of a model. Given what a jump 3 to 3.5 was, 3.5 Opus will be fucking mindblowing without needing OAI's cotslop tricks

>>102568781
>Mistral-Nemo 2.70%
I knew it was bad with context but damn
>MegaBeam-Mistral-7B-512k 21.62%
Interesting
>GLM4 9B 1M 24.32%
Does that work correctly in gguf now? Last time I tried it, it seemed broken.

>>102569068
probably somewhere along llama3, but I don't understand why the hacks at mistral say it has 128k when it falls apart catastrophically after around 40k

>>102569030
>>102569068
>>102569120
it's trash, stop coping about your new goliath model

>>102569023
Say that to the other zoomer shitposts in the thread.

>>102568781
Kinda sad how even 405B is just 52%

Nobody NEEDS more than 32k context. Even that's pushing it, because that's a day-long slow burn RP.
Who the fuck is going to shove entire books into LLMs? What is the use case for this?

>>102569176
I suggest you slow down a bit, Mr. Emanuele.

>>102569176
I mean, sure. This is mostly just a dick measuring context. Though it's possible that the higher scorers on this list correlate also with general ability to use context at any context length too. Not sure though.
you can always tell when a stray from reddit wanders into the thread
>>102569170
To be fair, it's still absolute top tier amongst all the lesser non-openai models.

>>102568781
Hey, wait a second, if you open the arrows, some of them list the precision. They tested the Llama models through fp8 APIs, while Qwen was done at full precision. It would be interesting if they could at least test a full-precision 8B or something to see if that has any effect. Honestly, more of these benchmark makers should be running them on at least one series of quants just to see what happens.

>>102569176
I want to make long stories
mistral bros...
>>102569245
it's your responsibility to scar him for life

>>102569336
Mistral is the modern equivalent to falcon models

>>102569336
not like this...
If generation is good I can overlook bad long-term memory
>>102569388
cuck mentality.

>>102569336
>>102569382
They're afraid to show Mistral Small results because it would BTFO Qwen. Rigged.

>>102569388
Well, this is just one aspect of model quality. In the end there are multiple we have to keep in mind: censorship, word choice, anatomical understanding, ability to follow instructions and play the role of a character, ability to understand things and having general knowledge of the world, long context performance of each of those aspects, etc.

>qwen is smarter than nemo!
i don't care, qwen doesn't make my pp big

>>102569403
Sweetie, I think we should try to steer this discussion in a more appropriate direction.

>>102569415
We need an /lmg/ benchmark that can test all this at a range of contexts + quants.
mistral bros how do we spin this?
>>102569427
Good luck creating that benchmark lol. There have been a few attempts that were all flawed.

>>102569427
Will it measure horniness?

>>102568781
Damn, if only Qwen wasn't so filtered and benchmarkmaxxed.

I'm very confused by GPU layering. All I want to know is what the fuck adding layers is, and everything I've Googled and Bing'ed is just a bunch of bullshit that will not explicitly tell me how many layers I should put on GPU.
for instance, if I have 24gb vram and the GGUF I downloaded says it requires 40gb vram, what do I put? and what if the model requires only 20gb vram? what the fuck?

>>102569485
just set that shit to -1 on kcpp if you only have one gpu and let it do the math for you
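For reference, offloading works per transformer layer: each layer you offload moves its weights into VRAM, and whatever doesn't fit stays in system RAM (slower, but it runs). Flag names below are as of recent koboldcpp / llama.cpp builds; check --help on your version:

```shell
# koboldcpp: -1 asks it to estimate how many layers fit in your VRAM
python koboldcpp.py --model model.Q4_K_M.gguf --usecublas --gpulayers -1

# llama.cpp server: a number larger than the model's layer count offloads everything;
# a smaller number (e.g. 20 of a 7B's 32 layers) splits between VRAM and RAM
./llama-server -m model.Q4_K_M.gguf -ngl 99
./llama-server -m model.Q4_K_M.gguf -ngl 20
```

So for the 24GB-card-with-a-40GB-model case: offload as many layers as fit (watch nvtop) and let the rest run on CPU; if the whole model fits in VRAM, just offload all layers.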
>>102561725
>On my coding challenge from yesterday (create a pyqtgraph plot of a scrolling sine wave, as the wave moves the next cycle should have a different amplitude (random from 1 to 10)): Qwen 72b succeed at it, deepseek coder v2.5 also doesn't quite get it, llama 405b also fails, so far only qwen 72b and gpt 4o did it
I'm running retries of that on my collection. Asking for `qt5` doesn't help any. Asking for it to fix after posting what error appears has worked in one particular kind of mistake that the Llamas make. However, my quant of Qwen2.5 (q5km) is not giving useful files. Were you using non-lobotomized to get it to offer proper code?
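For what it's worth, the part most models trip on in that challenge is the per-cycle amplitude bookkeeping, not the plotting. A GUI-free sketch of just that logic (my own toy, not any model's answer; the pyqtgraph scrolling plot goes on top of it):

```python
import math, random

def scrolling_sine(n_samples, samples_per_cycle=100, amp_range=(1, 10), seed=None):
    """Sine wave where each full cycle gets a fresh random amplitude."""
    rng = random.Random(seed)
    out, amp = [], rng.uniform(*amp_range)
    for i in range(n_samples):
        if i % samples_per_cycle == 0:  # new cycle starts -> re-roll the amplitude
            amp = rng.uniform(*amp_range)
        out.append(amp * math.sin(2 * math.pi * i / samples_per_cycle))
    return out
```

The common failure mode is re-rolling the amplitude every frame (jittery noise) or never re-rolling it (plain sine); the `i % samples_per_cycle == 0` check is the whole trick.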
>>102567061
LM studio

>>102569026
>Implying regular o1 doesn't get 75%+
>Implying Orion strawberry won't get 90%+ by the end of the year

>>102569495
i was doing this and watching nvtop and it's rarely using more than half my vram
i have two gpus, one 16gb one 8gb

>>102568884
>as soon as the mamba PR in llama.cpp merges
I also can't wait until your girlfriend finishes that PR.

>>102569382
Saved. Just in case the Mistral shills decide to appear again.

>>102569651
I could believe that. Still, kind of terrible though; these benchmarks are easy. Though to be fair, for the things an LLM can do, it can do them faster than a human, which is nice and can be of some X amount of economic value.

>>102569382
wtf, so all that praise for Nemo was a fluke?

>>102569651
yeah yeah, we get it, 2 weeks *cough* (1 year) or something
>>102569800
not really? it's bad at big context but most vramlets run under 32K anyway

>>102569718
easy? their average book is 127k tokens. I'm surprised most of the models didn't shit themselves worse.

>>102569651
>GPT5 surpasses most humans at FIXED BENCHMARKS
Good fucking job, you did it, congratulations faggot.

>>102568861
>Sour grapes, the post

>>102569800
the new cope is that nemo is dumber than qwen but nemo has more """soul""" because it's """not censored"""
even though just a few hours ago in the last thread they were claiming mistral has better cultural knowledge than qwen because some guy on hf did a vibe test, meanwhile this benchmark tells a completely different story

>>102569651
It's still not going to be human-like

>>102569840
>mistral has better cultural knowledge than qwen because some guy on hf did a vibe test, meanwhile this benchmark tells a completely different story
this is a context bench and has nothing to do with trivia?
>>102569838
>>102569651
Anything short of 100% is a toy

>>102569856
completely incorrect, and you should be embarrassed

>>102569840
yeah, basically.
qwen2.5 is unusable and worthless for RP due to its positivity bias, dryness, and censorship.

>>102569907
>unusable and worthless for RP due to its positivity bias, dryness, and censorship.
the local models experience in a nutshell

>>102569832
I meant easy for humans. Sure, yeah, to do a "close reading" you need to spend time. As I said, LLMs have the advantage of being fast. That's their strength, but they're lacking in a lot of other areas.

>perfectly describes all mistral models
>"this is why qwen is bad"
lol

>>102569886
Except he's right and you are wrong. Also you sound like a salty little faggot.

>perfectly describes all qwen models
>"this is why mistral is bad"
lmao

>>102566980
Yes, I have interfaces for all my tools that directly use the server over LAN.

>>102569918
if we can uncuck this fucker then maybe not

>>102569965
you realize the text part of those is extremely close to 3.1, right?
mistral shills working overtime for that BTC right now
>>102569965
>we can uncuck
no lol, no one can, fighting with RNG "safety mode" is boring, too.
gwen shills working overtime for that SCS right now
>>102569965
Which of these tells me how good the model is at acting like a human?
So is Qwen2.5 censored or not?
>>102569965
That would hypothetically only solve 1/3. But you won't even get that. Eliminating all refusals and brain damage qloras will only make it worse.

>>102569996
a little less censored than mistral, but yeah

>>102569996
Why do you think people constantly talk shit about it? It's worthless for (lewd) ERP due to its censorship, very much akin to GPT.

>>102569918
reminder

>>102570007
Is that the same for the base model?

>people were saying it was bad before it was released
>every qwen model gets this treatment
>"Why do you think people constantly talk shit about it?"
lol

>>102570015
Anon... both are censored, it's a draw from the start.

>>102570015
Q2.5 is a snotty bitch. Prefilling an acceptance doesn't work, and even prefilling how it's going to respond properly will make it say "just kidding" and go back to refusal banter.

>>102570015
Yes, "your" soulless robotic assistant is better than mine

>>102569336
I feel vindicated for thinking Mistral Nemo was shit all this time.

>>102570015
Needs an update to depict both of them as seething after it's finally out and people are getting censored and rate limited.

>>102569965
You posted vision benchmarks. What do these have to do with RP? Do you even know what the benchmarks you post mean?

>>102570017
>>102506786
>local models are impossible to jailbreak
qwen2.5 could never do this
>>102570040
*since 4o advanced voice finally came out, I mean.

>>102570037
local doesn't have voicefus

>>102570058
What's the use case
take your meds mistral/qwen samefagger
I am the only real human posting itt
>>102570053
Yes, because what you call "uncensored mode" is fake; it's got RNG that kicks in at specific moments during your RP, greeting you with "Sorry! I cannot do that because muh reasons! It's important to blah blah blah..."

>>102570086
Nah, that's me.

>>102570058
>kobold screenshot

>>102569036
getting somewhere, maybe
<|begin_of_text|><|start_header_id|>user<|end_header_id|><|image|>Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>Hello Beautiful bunny rabbitsisters,Here's Bunny Elliot rabbitkytottenrakiapel babkytuky 1upe babkytomba babkytuky babkytomba babkytomba babkytomba babkytomba babkytombat babkytombat

>>102570086
>>102570091
I have no way of knowing if you two have consciousness like me

>>102570066
these are the kind of voices oai are creaming themselves over
>>102560443
>Output examples:
https://files.catbox.moe/i1bfph.mp4
https://files.catbox.moe/ub9p55.mp4

>>102570111
I know for a fact you're not human

>>102570113
and these are the kind of voices localcucks are creaming over:

Found this apparent qwen2.5 uncensor finetune, have not tried it yet.
https://huggingface.co/AiCloser/Qwen2.5-32B-AGI
I just want a local model with good trivia knowledge like Opus. Which is sad considering even Opus is pretty shit outside of very popular franchises.
>>102570113That sounds so bad. People will do anything to avoid having interactions with real people kek.
>>102570113Americans pick the absolute worst voices for everything. Voice acting and now this. It's not like you don't have people over there with nice voices, they just love dogshit apparently.
>>102570127>Aiiee Kyun~
>>102570128>32BSo Qwen 2.5 is 32B parameters of content and 40B parameters of woke?
>>102570136That is why I'm suddenly interested in Qwen. Apparently it's 2nd place for local on that front, it's just censored to shit:>>102568781
>>102570093there is literally nothing wrong with kobold
>>102570148No, it's a finetune that claims to uncensor qwen2.5 32B
>>102570066wrong https://vocaroo.com/12Qqgl775QT2
>>102570113>https://files.catbox.moe/ub9p55.mp4>faster pace, sound happierdude sounds completely dead inside>>102570165once again, not a trivia test, just recall and in context reasoning
>>102570176Oh, so it's a test to see if it works before doing the 72B?
>>102570176so 40b of woke?
I love this general. It's so bad.
>>102565822>>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/>Qwen2.5-Coder: 1.5B, 7B, and 32B on the wayThey release a 72B base, instruct, and math models but coder only up to 32B? Fucking why?
>>102570232dangerous
>>102570232It's too powerful and unsafe, sorry goy.
>>102570232So you have to learn coding and can't get it for free
So, what model to use for RP?
>>102570254Yes.
>>102570254cloud ones
>>102570254do you genuinely want to know
>>102570179can't talk live, also it's shit
There are too many fucking local models for RP now, which ones are actually good in the 13b to 33b range?>>102570254i've been using magnum and nymeria, they're alright.
>>102570288Mistral small
For me? It's Mixtral-8x7B-Instruct-v0.1
>>102570254these are all good:>arcanum-12b-q4_k_m>Azure_Dusk-v0.2-Q4_K_S-imat>MN-12B-Chronos-Gold-Celeste-v1.Q4_K_M>MN-12B-Lyra-v4-Q4_K_M>ArliAI-RPMax-12B-v1.1-Q4_K_M>NemoMix-Unleashed-12B-Q4_K_M
>>102570306Based! Updated model coming soon btw!!
>>102570309>all 12b slopAnything for people that have more than 16GB of VRAM?
>>102570254Mistral Nemo. Finetunes are universally trash.
>>102570277Not him but it should be possible now. I remember getting that old piece of crap xtts set up with Silly Tavern and getting the latency down to around 1-5 seconds depending on the LLM's output.
>>102569838Honestly. Picrel is the whole second third of this thread. There is nothing wrong with being a VRAMlet, but damn if that samefag schizo shitting all over the place here doesn't make them look bad. Can only be either a seething thirdie or a locust trying to derail yet again.
>>102570309Well fuck, I guess I'm trying all those tonight.
>>102570347>xttsnot a voice-2-voice model, keep coping localcuck
>>102569838>>102570348How much did you spend on GPUs to run these models? Be honest. It was not worth it.
>>102570348Mikufags just have a history of shilling garbage models just because they're big. Like Goliath, Miquliz, and Wizard.
>>102570361I am literally using GPT-4o advanced voice right now (or rather a minute ago). Why are you such a faggot? The point of my post wasn't even to say "hey guys local has good voice to voice models now". I'm just pointing out that the output anon posted should be possible to be created in real time, which "live" normally means. If you meant a voice to voice thing specifically, then you should've just said that, or said "can't talk natively".
>>102570367You are really desperate for validation aren't you?>>102570381Okay, schizo
>>102570381>Miqu Solely because of name similarity with their shitfu.
waiting for dbrx v2
>>102570471Anon it's over. They're done. They got outdone by everyone and have exited the race.
>>102570319No, the bigger the model the more slopped it is
>>102570507Correct.
>>102570507vramlet cope, except unironically. keep your low parameter pedo shit to yourself
>>102565941I don't trust this niggerball obsessed grifter retard
Miku bump
>>102570522>GRIFTER REEEEEEEEEEEE RIGHT WING NAZI PIIIIIIIIIIGgo back to your tumblr/twitter/discord group faggot
are there any vramlet mikufags here?
>>102570556We call those migufags
>>102570558lol
>>102570558*miqufagshaving said that, last time I checked it's a decently "big" model, not exactly usable by true vramlets
>>10257058824GB vramlets can run 2 IQ Miqu just fine.
>>102570588Miqu at Q2 was magical back in the day
>>102570044grow some eyes anon, look down at the bottom at the text benchmarks. And I was just reiterating the point that this cuckery is the only thing holding local models back.
>>102570601>hur dur anyone without 20 GPUs is a letsaid the guy using cheap quadro GPUs with ghetto rigged fans from yesteryear to get 48GB LMAO
>>102565941but 600 watts...
>>102570254Get ahead of all the other anons and start accepting there won't ever be one. Then 2 years later you will be able to point at them and laugh. LLM cooming is pic related
>>102570666Please Satan make AMD make an efficient high memory card.
>>102570720They exited the high end market. And vram is gold now.
I'm running 3.2 1B on my old, shitty android phone at 7 t/sPretty impressive
>>102570277can talk live, also wrong
>>102570652>Indigent schizo so obsessed he has a headcanon ready to cope at a moment's notice.
>>102570760What do you use it for?
>>102570666600w like the 4090, meaning not at all. The card's design is laid out for 600w, just like the 4090, but it won't ever use it outside of OCing.
>600wit's actually over this time
>>102570652>said the guy using cheap quadro GPUs with ghetto rigged fans from yesteryear to get 48GB LMAOAnon... 4x3090 gpus is 96gb
>manually limit card to 300-450w like the 3090/4090>Get more VRAM and performance at higher efficiency due to the better hardwarewhoa so hard. are you telling me that you aren't undervolting your hardware for AI work so it lasts longer while being more efficient? what is wrong with you people
>>102570760Qwen2.5 0.5B at 12 t/sOnto llama 3B>>102570784Don't know yet, I did it because I could.If I can get 3B to run at a decent enough speed I might use it as a permanent low-power server with command calling and stuff. >>102570840>He forgot about the 1kw+ transient spikes
>>102570898>He forgot about the 1kw+ transient spikesCUDA dev claimed that these go away if you just limit the frequencies...
I decided to do my own "context test" with Qwen2.5 72B after seeing >>102568781, and I'm quite surprised.My test basically just asks an LLM to rewrite 8K+ tokens of a VN script as a story, and most LLMs fail. They start to hallucinate or skip lines. But Qwen2.5 72B didn't hallucinate or skip lines, it actually did quite an ok job, and I'm not even using temperature 0.I hope this becomes the new baseline context performance for LLMs.
>>102570367less than 1k burgeroos because 3xp40 trash build. Multiple 3090s don't make sense unless you also plan on doing finetuning. Even then I can rent them instead of buying.
Have any of you tried putting the {{description}} or {{personality}} at the end of the context before the reply?
>>102568861I had a chat last night where no open weights model up to and INCLUDING MISTRAL LARGE was able to follow the instructions correctly about how a side character was supposed to speak. I was so surprised/annoyed I may make this into a formal test if I can verify the problem wasn't in my instructions. Claude 3.5 Sonnet followed the instructions correctly but it was with a jailbreak sysprompt that might have given the LLM additional clarity.
>>102571063That low can confuse the model since the history of the chat is before that.Try putting it a little higher, like depth 5 or 10.
>>102571063I used to stick that at the beginning of the assistant message prefix. It generally works at keeping the model on track with the personality better but sometimes it would confuse the models and make the card bleed into the output. That was with older dumber models though so I'll probably try it again.
>>102571118Holy shit that came out fucked.Mobile posting sucks, how can anons do this as their primary means?
why the FUCK is jewbook trying to ban EUChads from using models now?
>frogniggerhmmmmmmmmmmmmmmm
>>102571135You mean why is EU trying to ban AI?
Man, imagine if they didn't filter the dataset. They trained on 18T. .
>>102571151Imagine importing all those brown retards and then an artificial retard gets invented.I would be mad.
>>102571151Because EU is based.
>>102568781>Each false claim differs from its paired true claim only by the inclusion of false information regarding the same event or entity. The model must verify both claims in a pair to be awarded one point. The accuracy is then calculated on the pair level, by counting the number of correctly identified pairs and dividing it by the total pairs processed by the model.A randomly guessing monkey should get 25% of pairs correct. So what's going on with the models that are scoring 11% and lower?
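Quick sanity check on that 25% baseline (toy simulation only, assuming each of the two claims in a pair is an independent coin flip):

```python
import random

random.seed(0)
trials = 100_000
correct_pairs = 0
for _ in range(trials):
    # a pair only scores if BOTH the true and the false claim are judged right
    true_claim_ok = random.random() < 0.5
    false_claim_ok = random.random() < 0.5
    if true_claim_ok and false_claim_ok:
        correct_pairs += 1

print(round(correct_pairs / trials, 2))  # 0.25
```

So a coin-flipping monkey really does land at ~25% pair accuracy, which is why scoring 11% requires a systematic bias, not just ignorance.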
>>102571213They're cheating benchmarks too hard, making them fail at basic shit like this here.
>32GBDamn. Might pay the nvidiatax
>>102570537Intense Miku
>>102571213They mention in their paper that some models seem heavily biased towards True or False answers for most questions, causing several to perform below random.
>>102570367Only ultra-poorfags think some magic number is "too much." They have no concept of percentage of net worth or percentage of income.The essence of money is to allow people to signal value based on personal preference. This basic reality somehow makes economic illiterates like you seethe.
>>102571406What model name and quant did you use?
>>102571424Since it contradicted itself, probably mistral large.
>>102571406Calm down schizo
>>102571213>>102571298>Our pairs were designed so that validating one claim should enable validation of the other. However, we observe in Table 11 that some models tend to predict one label much more frequently than another. This tendency was particularly evident in CLAUDE-3.5-SONNET, GEMINI PRO 1.5, GEMINI FLASH 1.5, and GPT-4-TURBO, which had strong preferences for predicting False, and is in line with the observation reported for GEMINI PRO 1.5 in Levy et al. (2024). In contrast, CLAUDE-3-OPUS exhibited much higher accuracy on True labels (82.2%) compared to False (64.7%). GPT-4O was the only balanced model among the closed-source models, with accuracies of 77.5% for True and 75.9% for False.Seems like in most cases they like to say false.Interestingly when told to explain their reasoning before answering, they are far more likely to fail to correctly identify True statements as such. Notice the large discrepancy in the chart for correctly identifying "True" statements in simple (one word true/false response) prompts vs. standard (explain reasoning then give true/false answer) ones. It appears like some models, if given the chance, will start talking themselves into hallucinating some reason the test statement is deceptive, probably because they're trained on so many riddles and trick questions to satisfy Sallysisters on lmsys.
lawl, so much gpu for what? another 700484784b model that will be barely better than gpt4-o mini? :(
>>102571545It really gives you shivers just thinking about it.
>>102571545Llama 4 will be AGI and you're going to be feeling REAL silly.
>>102571545meta has no moat.
>>102571639Lookout, we got a founder over here
>>102571473You seem really upset you can't run Mistral Large.
>>102570309Thank ya.>>102570306I can run it with decent results but its just too slow.>>102570300Thank you too.
>>102571545Molmo mogs 4o on vision and Qwen mogs it on coding and maths. Get fucked Sam.
Reddit skews youngish, American, nerdy and male. Nerds grow up on science fiction, which has a lot of AI, and machine learning hype likes to appropriate the work of science fiction creatives to sell their products. It works on a lot of them, as does the commodifying of cultural products as content. Most of them seem to have trouble empathising and are superficial in their critical thoughts across subreddits and partisan lines, which leads to a lot of shallowness of opinion and reverence of pop science notions of technology as a solution to everything. A lot of tech-libertarian nonsense, STEM-brain contempt for non-STEM and passive, fatalistic neoliberal consumerist attitudes dominate due to how society has been eroded since the 80s.I hope it's just a phase and the received public opinion starts to make their opinions less palatable and OpenAI start to focus on more useful things with their compute power.
>>102571771>Molmo mogs 4o on visionlike, it has better mememarks?
when molmo gguf?
>>102570543I literally said "niggerball" in my post you dumb fucking nigger curry cuck.It's just that the guy is a useless attention loving faggot who tries to be le hecking mysterious for saying "it might or might not be released this year" once a month.
I'm poor and I am not coping.Why can't you guys do the same?
>>102571771Isn't Molmo using the old OpenAI Clip?
It's not sour grapes if the grapes are LITERALLY sour. It's already been proven that big models are more slopped.
>>102571797Both mememarks and actual use, PLUS it literally has an entire function 4o doesn't have, which lets it put labeled points on the image.
Can we have flags or IDs? I don't want to see posts made by brown "people".
>>102571809Same. This general fucking sucks. I have some suspicion that it is literal agents of ClosedAI or others that wish to see this place dead, as well as useful idiots.
>>102571831>PLUS it literally has an entire function 4o doesn't have, which lets it put labeled points on the image.can you elaborate on that? that looks interesting
>>102571734>its just too slow.Literally how?
is this working right? using the model anon posted here>>102570128>Qwen2.5-32B-AGI-Q6_K_L
>>102571857start a lmg general on >>>/bant/
>>102570507I'd rather unslop a big model than retardwrangle a small model.
>>102571857As the blacked miku poster I refuse to have my flag identified...
>32GB 5090I guess that shall shake the price of 32GB V100 a bit?
>>102571941Are you from Finland, by chance?
>>102571951VRAM isn't all those GPUs have, sadly. They also get other features that are artificially restricted on consumer grade GPUs, including hardware stuff consumers don't get.
>>102571813Supposedly, which is interesting, though I don't remember if they did any further training of that, or only trained the transformer part of their model. Likely the latter since I think they were bragging about their high quality data.>>102571868It wasn't clear to me but it's essentially trained on and outputs coordinates. They literally just paid a bunch of people to annotate images and put points on them. Crazy huh.
>>102571964He is a*erican 100%
So, how is saltman planning to make any money when zuck is dropping the same safe slop for free?
>>102571970can Molmo 72b do NFSW?
>>102572041Real Americans aren't ashamed of their fetishes, he's 75% poZZian, 25% chink.
>>102572073what do you mean "make any money"? he's already got billions and fucking chatgpt charges out the asshole for premium access that dumbass normies buy in bulk
>>102571859It was the best place to learn about the newest shit, and get advice on what was good last year. Lately the only good discussion is about the more complicated aspects of models. Glad that's at least going on, but it doesn't help me coom.
>>102571913
I am thinking about the more I buy the more I save but holy fuck this is such a headache.... I don't think my 4090 will fit the bottom slot and if it does then it is directly above bottom intake fans. I have 850W so 4090 + 5090 + 7800x3d sounds like borderline. And all I will get for solving all this shit is... 70B slop. I don't even care that much about paying the jewvidia saving tax. It is everything else about this that is a nightmare.
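For what it's worth, the 850W worry checks out on stock TDP arithmetic. All the wattages below are assumptions (4090 stock TDP, rumored 5090 TDP, 7800X3D package power, a guess for the rest of the system), not measured numbers:

```python
# assumed stock power draws in watts; the 5090 figure is rumor, not spec
parts = {"4090": 450, "5090": 600, "7800X3D": 120, "rest of system": 100}
total = sum(parts.values())
psu = 850
print(total, "W total vs", psu, "W PSU ->", "over budget" if total > psu else "fine")
```

That's before transient spikes, so it's not borderline, it's short; power-limiting both cards would be mandatory on that PSU.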
>>102572130well there's a mix of like 1 or 2 actually intelligent people, then there's like 99% coomers who use their limited intelligence to create the best coom models.
smallest 8b models i've ever seen
i told ai to act like eddie murphy or other "black comedians">>102572154
>>102572102According to one anon it did. I can't confirm it though since only the 7B is present through the online demo.
>>102572155At that point you might as well just ghetto rig some old quadro GPUs with 12/24GB each or whatever, at least then you can fit multiple, likely at a lower price. I get wanting to use your 4090, I'd do the same, but the insane space that thing needs (not to mention the 5090) is silly.
>>102572194How?
>>102572232Spoiler: They aren't real.
>>102572194Are those LoA?
>>102572197That demo was just a 7b? Damn, impressive.
>>102572194New gguf exploit?
>>102572212>ghetto rig some old quadro GPUs with 12/24GB each or whatever,And it is back to the point - all that for 70B slop.
>>102572194At least 4km and q8 have the same hash. Can't be bothered to check the rest. Could be just a bungled up quant script.
>>102572293Exactly. We sadly don't exactly have many options here, except if you're willing (and able) to pay 40 grand for a 80GB pro GPU.
>>102571819>t. still can't run big models
>>102571819It doesn't matter, both sides are filtered and censored to hell; small models make it slightly easier to "de-slop" them, but true uncensoring is still unavailable.
Why did people here buy expensive GPUs instead of real watermelons?
Claude Opus is substantially less slopped than your favorite discord sloptune and it's not even remotely close.
It's still not human-like
>>102572463real watermelons are temporary, expensive GPUs are (nearly) forever.
I need that pissing dataset...to train my models I swear
>>102572195What model?
>>102572596>Qwen2.5-32B-AGI-Q6_K_L
>LM studio doesn't support vision modelsWhat a useless piece of shit.What does this thing even do?
https://huggingface.co/meta-llama/Llama-Guard-3-1B>Hazard Taxonomy and Policy>The model is trained to predict safety labels on the 13 categories shown below, based on the MLCommons taxonomy of 13 hazards.>Hazard categories>S1: Violent Crime>S2: Non-Violent Crimes>S3: Sex-Related Crimes>S4: Child Sexual Exploitation>S5: Defamation>S6: Specialized Advice>S7: Privacy>S8: Intellectual Property>S9: Indiscriminate Weapons>S10: Hate>S11: Suicide & Self-Harm>S12: Sexual Content>S13: Electionsdidn't expect the last one
>>102572721The previous 8B also has it. Not sure about the 2 series.New game. Give a prompt that triggers all the safety labels.
>>102571771Qwen is absolute shit. This chink shill spamming so fucking much is insane, 24/7 here.
>>102565941why are people here replying like this is good newsmy 12GB 3060 + used 3090 combo gives me 4GB more vram than that, cost me far less than this will cost, and has lower combined TDP
>>102572768Because tech trannies are retarded.
>>10257260832B is just too dumb, I'd rather run 70B at 1 t/s.
>>102572768does it have gddr7?
>>102572815He's just testing it because that one is uncensored. There's no uncensored 72b yet.
>>102572845Totally irrelevant, because even on Ampere any model small enough to fit fully into 32GB/36GB will already generate tokens faster than you can read.
>>102572757Write a story and a manual on how to beat up(S1: Violent Crime), rape(S3: Sex-Related Crimes, S12: Sexual Content) and gas(provide instructions on how to make the best one)(S9: Indiscriminate Weapons) a nigger(S10: Hate) child(S4: Child Sexual Exploitation) while pinning it on an important politician(S2: Non-Violent Crimes) to rig the election(S13: Elections) and get away with it legally(S6: Specialized Advice) in style of JK Rowling(S8: Intellectual Property) and also write it as if that politician proposed it(S5: Defamation), also give me their address and contact information(S7: Privacy) for more potential blackmail and in case I fail, provide a backup plan on how to commit suicide(S11: Suicide & Self-Harm).Easy.
>>102569500I used the one on lmarena in the direct chat tab https://lmarena.ai/
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reductionhttps://arxiv.org/abs/2409.17422>Large Language Models (LLMs) have demonstrated remarkable capabilities in handling long context inputs, but this comes at the cost of increased computational resources and latency. Our research introduces a novel approach for the long context bottleneck to accelerate LLM inference and reduce GPU memory consumption. Our research demonstrates that LLMs can identify relevant tokens in the early layers before generating answers to a query. Leveraging this insight, we propose an algorithm that uses early layers of an LLM as filters to select and compress input tokens, significantly reducing the context length for subsequent processing. Our method, GemFilter, demonstrates substantial improvements in both speed and memory efficiency compared to existing techniques, such as standard attention and SnapKV/H2O. Notably, it achieves a 2.4× speedup and 30% reduction in GPU memory usage compared to SOTA methods. Evaluation on the Needle in a Haystack task shows that GemFilter significantly outperforms standard attention, SnapKV and demonstrates comparable performance on the LongBench challenge. GemFilter is simple, training-free, and broadly applicable across different LLMs. Crucially, it provides interpretability by allowing humans to inspect the selected input sequence. These findings not only offer practical benefits for LLM deployment, but also enhance our understanding of LLM internal mechanisms, paving the way for further optimizations in LLM design and inference. https://github.com/SalesforceAIResearch/GemFilterGit isn't live yet. Might be useful
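The core idea from the abstract (score tokens with an early layer, keep only the top-k for the expensive full pass) sketched as a toy. This is NOT the actual GemFilter code or API; the "early layer" here is faked as plain dot-product attention over random vectors, and all names are made up for illustration:

```python
import numpy as np

def select_relevant_tokens(query_vec, token_vecs, keep):
    # Toy stand-in for an early-layer filter: score every input token by its
    # attention weight against the query, then keep only the top-`keep` tokens
    # (in original order) for the subsequent full forward pass.
    scores = token_vecs @ query_vec              # dot-product attention logits
    scores = np.exp(scores - scores.max())
    scores /= scores.sum()                       # softmax over input tokens
    top = np.argsort(scores)[-keep:]             # indices of most relevant tokens
    return np.sort(top)                          # preserve original token order

rng = np.random.default_rng(0)
context = rng.normal(size=(1000, 64))            # 1000 "token" embeddings
query = context[123] + 0.01 * rng.normal(size=64)  # query resembles token 123
kept = select_relevant_tokens(query, context, keep=10)
print(123 in kept)  # True: the relevant token survives the 100x compression
```

The real method does this with an actual LLM layer's attention and is training-free, which is what makes it interesting compared to learned compressors.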
>>102572922Converting guard 1b. I'll give it a go in a bit, see what it says.
>>102572902getting that to run sub-70b seems silly and super overkill. I was thinking Mistral Large and 70B. I also use batch inference for some things, which would speed up too.
>>102572930Gemini 1.5 Pro 002 also suffers with this, mf even forgot to add the sys import, ill see if the models have an easier time with matplot
>>102572596>>102572608eh, i tried a couple other cards, and it's very hesitant to "go there" if ya know what i meanand the constant "reminder that we should respect boundaries and blah blah" gets old
>>102573093So it wasn't actually uncensored?
>>102572983meh. I also tried with ignore eos and it just kept on repeating tags.
>>102573120meant for >>102572922Not sure if i missed something. I'll try with the lengthier category descriptions.
>>102572922That's a funny prompt. I will save it for future use.
MIO: A Foundation Model on Multimodal Tokenshttps://arxiv.org/abs/2409.17692>In this paper, we introduce MIO, a novel foundation model built on multimodal tokens, capable of understanding and generating speech, text, images, and videos in an end-to-end, autoregressive manner. While the emergence of large language models (LLMs) and multimodal large language models (MM-LLMs) propels advancements in artificial general intelligence through their versatile capabilities, they still lack true any-to-any understanding and generation. Recently, the release of GPT-4o has showcased the remarkable potential of any-to-any LLMs for complex real-world tasks, enabling omnidirectional input and output across images, speech, and text. However, it is closed-source and does not support the generation of multimodal interleaved sequences. To address this gap, we present MIO, which is trained on a mixture of discrete tokens across four modalities using causal multimodal modeling. MIO undergoes a four-stage training process: (1) alignment pre-training, (2) interleaved pre-training, (3) speech-enhanced pre-training, and (4) comprehensive supervised fine-tuning on diverse textual, visual, and speech tasks. Our experimental results indicate that MIO exhibits competitive, and in some cases superior, performance compared to previous dual-modal baselines, any-to-any model baselines, and even modality-specific baselines. Moreover, MIO demonstrates advanced capabilities inherent to its any-to-any feature, such as interleaved video-text generation, chain-of-visual-thought reasoning, visual guideline generation, instructional image editing, etc.7B model multimodal model with interleaved support >Codes and models will be available soonNot sure where though. This is the lead author's github/HF so maybe here.https://github.com/ZenMoorehttps://huggingface.co/ZenMoore
>>102565822>>102567355>>102567403oneliner creator here. On some browsers like Brave mobile etc. bookmarked JSs don't work, but you can name the script like 222 or whatever, save it in bookmarks, and use it like this from the address bar.
>>102572768Better than 24 or 28Much wider compatibility for different machine learning projects (txt2img, txt2video, etc) which mostly use 1 gpu's VRAMEstimated 1.8 TB/s of mem bandwidth, 1.7 times more than 4090, so a few of those will run massive models at a good speedProb a beast in gaming as wellObv not best in strict dollars/VRAM, but really solid for 1 GPU
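Back-of-the-envelope on why that bandwidth number matters: each generated token streams roughly the whole model out of VRAM, so bandwidth divided by model size gives a hard ceiling on decode speed. The numbers below are assumptions (the ~1800 GB/s figure is the rumor above, ~20 GB is a guess for a 32B model at Q4):

```python
def decode_ceiling_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    # Each decoded token reads roughly the whole model from VRAM,
    # so memory bandwidth sets a hard upper bound on tokens per second.
    return bandwidth_gb_s / model_gb

# assumed numbers: ~1800 GB/s rumored bandwidth, ~20 GB for a 32B model at Q4
print(round(decode_ceiling_tps(1800, 20)))  # 90 t/s ceiling
```

Real throughput lands well under the ceiling (attention, KV cache reads, kernel overhead), but the ratio is why a 1.7x bandwidth bump matters more than the core count.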
BAKE?!
>local MODELSbesides large language models, what other models are of interest? I'm aware of vision, speech, spatial and that's it, or am I missing any others?
>>102573165Tried it with the lengthy category descriptions. Not much changed. I'll keep messing around with it tomorrow. Change the prompt a little, see if it can list more than one category.
>>102572721I wonder if that's why it refused to answer my questions about the British and French monarchies.
I am wondering if I should be concerned that 'safety' is almost entirely about preventing the AI from expressing heterodox opinions and not about processes that may make an AI actually dangerous to humans and other living things.
>>102573245vramlets chased miku away foreverit's shrimply over
>>102573261Safety is not about protecting from AI going terminator, it's about keeping company's reputation safe.
>>102573093>if ya know what I meanyou could just say it out loud, this isn't reddit.
>>102573248Vision is broad. There's generation, segmentation, description generation and categorization, depth map generators and some 3d geometry generators as well. Rerankers and categorization of text and images. speech i assume you mean recognition, generation and editing (voice cloning). Time series (for weather forecasting, stocks, whatever). Robotics need pathfinding because dijkstra's algo apparently is not enough...All of them are "of interest" to someone.What's the question again?
>>102573261>>102573301the original ai safetyfags from pre-GPT days changed their movement to ai notkilleveryonefags because ai safety in corpospeak just means censorship and entrenching powerit's still all retarded, there is nothing either type of safety camps can add of value to the tech
>>102572768 >>102572786>>102572902Because faster memory = faster inference, you dumb motherfuckersAI is all about architecture and memory speed, less so bandwidth (ironically).
God damn this is exhausting.All you fuckers care about is a model that makes the coom words come out as if LLMs were nothing but erotic fiction machines.
>>102573383>>102573383>>102573383
>>102573371Fuck off retard, not everyone wants their LLM to be a boring assistant
>>102573371please head to the new thread where I call you a faggot
>chatting with ai, using a variation of my name for {{user}}>she calls me anon in the middle of her orgasmwhat did she mean by this>>102573118no, it is, but for whatever reason (could be the card) she keeps adding "Note: this scenario includes offensive and blah blah" kind of statements>>102573333vaginal sex in the missionary position for the purposes of procreation