/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102645080 & >>102632446

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
recap anon gay
>>102654480slash recap anon's salary
Tell me something you've done with LLMs today.
Does anyone have a magnet for llama 3.2 11b vision? For whatever reason it's unavailable to download in Europe.
>>102654548I was gooning a little after midnight but then I got bored (mistral large is getting stale) so I switched to gelbooru.
>>102654480slash miku's throat
►Recent Highlights from the Previous Thread: >>102632446

--Local is dead, in other news the new OpenAI advanced voice mode is pretty cool

►Recent Highlight Posts from the Previous Thread: >>102632451

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>102654548found some nonpozzed models with claudeslop
>>102654614I think you should seek help, I'm not even trying to be mean. Your obsession with this thread is very clearly unhealthy.
Claude:
>API
>so good localfags finetune on its worst outputs
OpenAI Advanced Voice:
>API
>first TTS to do convincing emotions at inference
Dall-E 3:
>API
>so good localfags finetune SDXL (lol) on its worst outputs
local:
>slop
>shivers cope
>xtts-v2 cope
>flux cope
>discord sloptuners calling for reddit moderation in /lmg/
It's fucking over isn't it?
Is 48GB of VRAM enough to make a 70B AWQ quant?
>>102654710
>>xtts-v2 cope
valle + a LORA is all you need.
they can't stop winning...
>>102654739
>https://valle-demo.github.io/
>404
LOCALBROS...
>>102654614
>>102654710
>>102654744
hi sam
>>102654701
it's sam. he is still butthurt about the fact that meta's llama 405 performs at the level of gpt4 and is open source. he can't take it away, he can't moderate it, so he just tries to scare new people off to prevent wider adoption.
>>102654548I made anime girls real!
>>102654710>flux copeI KNOW this is just a shitpost but flux is anything but cope.
>>102654614petra is better than you
>>102654548Realized that it is functionally impossible for me to socialize with real people for more than a few minutes both because they bore me to tears and because what I consider a "good time" hanging out makes other people miserable.
>>102654563https://huggingface.co/unsloth/Llama-3.2-11B-Vision-InstructThe least you can do is search for a reupload, you lazy fuck.
>>102654802Is it a competition?
>>102654799I'm not shitposting and it's not cope. flux doesn't hold a candle to Dall-E in terms of prompting and even SDXL is better at things like ripped clothes/dirty faces/blood.
>>102654856slop-e is worse at humans, sorry sam>verification not required
>>102654799I don't know, Flux didn't really impress me. I guess you can make nice Migu images with it, but you can find those on the boorus too.
>>102654883>absolutely no argumentI accept your copecession.
>>102654856>flux doesn't hold a candle to Dall-E in terms of promptingI literally tested all of this on day 1. Flux blows DALL-E out of the water for prompt understanding and conceptual granularity. Like it's not even a fucking contest. Buy a fucking ad saltman.
I have 96GB of VRAM and 128GB of RAM. Thinking about trying some 405b quants locally, but I've never attempted this before. I have some questions if anyone can help.
1. Can llama.cpp even load a model split between GPU and CPU without loading it entirely into RAM first? Meaning if I had a 150GB model would it OOM while trying to load it.
2. Is a quant like IQ2_XXS usable for 405b? Is it better in any way compared to a 70b q8?
3. I remember something about certain IQ quants being slow if offloaded on CPU. Is that still a thing and if so which quants is it?
>>102654892>I literally tested all of this on day 1.Then you won't mind sharing some of the comparison images and prompts? I look forward to seeing them.
>>102654903>96GB of VRAM4x 3090?
Can I run a decent quant of mistral large with 24gb vram and 64gb ram? I don't care about inference speed. If I get 0.3t/s that's fine. I just want to try it.
they're either torching money on advanced voice or api pricing is a scam
>assume a voice convo costs $0.10/min
>if you use it 15mins/day, that's $45/month
>>102654913you haven't too :)
>>102654903
>Can llama.cpp even load a model split between GPU and CPU without loading it entirely into RAM first?
Disable mmap.
2. No idea. I suppose they are reasonable if people use it for 70b.
3. At that point it doesn't matter much. It'd be the difference between 0.1 and 0.15 t/s.
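Not sure what your exact setup is, but something along these lines should get a GPU+RAM split going; the filename is just an example and flag spellings occasionally change between llama.cpp versions, so check --help:
[code]
# offload as many layers as fit across the GPUs, keep the rest in system RAM,
# and skip mmap so the whole file isn't mapped while loading
./llama-cli -m Llama-3.1-405B-Instruct-IQ2_XXS.gguf \
    --n-gpu-layers 60 \
    --no-mmap \
    -c 8192 \
    -p "test prompt"
[/code]
Tune --n-gpu-layers down until it stops OOMing on the cards.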
>>102654880Ugly blonde chick spammer, I'm sorry for being mean to you. You are much better than that boring proprietary cocksucker that we got in your place.
>>102654941>I have proof local is better but I'm not going to share itCaught lying and conceded with an emoticon immediately. Embarrassing.
>>102654973ADOLF HITLER IS A NIGGER
>sperging because he got caught in an obvious lieYeah, it's over for local.
>>102654960^_^
>>102654917
4090s, but yeah basically that
>>102654953
Thanks, I'll try with mmap disabled and hopefully it loads.
>>102654960The thread getting more dead means even shitposters are replaced by lower quality ones. Like Petra anon was way more interesting than cloud shill and buy an ad spammer
>>102655038Hi Sao
>>102654903I'm way out of that range. I could draw comparisons from others though. I think that 2.5bpw quants of 70b models outperform q6_K_M quants of 22b stuff.
>>102655038Hi Cuda dev
>>102654927You won't be able to fully load any decent quant of it on GPU. On RAM the biggest you can load is Q3_K_S. If you can, quant it yourself with bf16 or q8_0 embed and output, they should improve output quality without increasing model size too much. Get full 128GB so you can run Q6_K, it's worth it, trust me.
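If you do quant it yourself, llama.cpp's llama-quantize has per-tensor overrides for exactly that; roughly like this (filenames are examples, double-check the flag names on your build):
[code]
# keep the output and token embedding tensors at q8_0 while everything else goes to Q3_K_S
./llama-quantize \
    --output-tensor-type q8_0 \
    --token-embedding-type q8_0 \
    Mistral-Large-Instruct-2407-BF16.gguf \
    Mistral-Large-Instruct-2407-Q3_K_S.gguf \
    Q3_K_S
[/code]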
>>102654960petra is a proprietary cock sucker, he keeps spiting any local generals.
>>102655038This thread seems pretty active to me.
>>102654913I don't care about you that much to bother C:
>>102655038No, we're not going back to the cloud.
>>102654960Oldfag here, I agree with this. I have been here since the beginning and as far as I remember Petra at least cared about the general and wanted to end the Miku menace.
>local is bad
Oh really? anthracite-org/magnum-v2-123b
>>102655070Thanks.
>>102655125
>Petra at least cared about the general
No he fucking didn't, he did the same shit he did to /vsg/ by spreading FUD to try and kill the general.
>>102655085believe
I'm using Silly's vector functionality with its native transformers.js lib, using
>Snowflake/snowflake-arctic-embed-m
as the embedding model. Opinions, suggestions?
I'm using llama.cpp to serve the main model. I can't use that to both generate text and provide the embeddings functionality at the same time, right?
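As far as I know the usual workaround is just running a second llama-server instance dedicated to embeddings on another port and pointing Silly's vectorization source at it; something like this, though the embedding flag has been renamed between versions so check yours (model filenames are examples):
[code]
# main model for text gen
./llama-server -m mistral-large-q4.gguf --port 8080 --n-gpu-layers 99

# small embedding model served separately
./llama-server -m snowflake-arctic-embed-m-f16.gguf --embeddings --port 8081
[/code]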
>>102655128Tried Luminum yet?
>>102655153There's no reason to try anything other than Magnum.
>>102655038The buy an ad spammer is Sao false-flagging.
>>102655141
>>102655176Hi Drummer
>>102655165anon, don't say shit like this even as a joke. You may invite people that unironically think this to the general. This is a fairly common occurrence in online communities.
>>102655139>the same shit he did to /vsg/To think Miku was the right choice to defeat him...
>>102655195Good. I would like to be surrounded by people I agree with. I'm sick of being called a faggot any time I say anything in this general.
>>102655139>FUDopinion discarded.
>>102655212>I'm sick of being called a faggot any time I say anything in this general.Now you're just asking for it...
>>102655215It was genuine FUD and you know it.https://desuarchive.org/g/thread/95983527/#95984811
>>102655239
>>102655239I was actually talking about your use of cryptobro vocabulary, but okay, I guess.
Can't wait for you to get IP wiped again.
>>102655128>anthraciteThey have like unlimited VRAM for free at their disposal. Where is our dedicated ERP model after all this time?
>>102655511vance won the debate
>>102655511hi betiful show bobby plz
>>102655511Will you start another arbitrary countdown after 34 days? 260 days until bitnet or some such.
What are the implications of this to local models?
>>102655743>webm
>>102655743openaisisters.. not like this
>>102655743Open AI bubble bursting is good for local.
>>102655743Wtf is that real? Is Sam OK?
>>102655743me in the back
>>102655743>so this is what those faggot artists meant by the "A.I bubble bursting"...
>>102655743This is extremely unethical and unsafe. We need to regulate China NOW.
>>102655743openai BTFO
>>102655608based
ah good to know corpospeak still pisses me the fuck off
>>102655743
seems like every vidgen site is getting hammered lately, here's hoping local doesn't fall behind by the end of the year.
>>102656093
oh my god i wasn't expecting the squishing sound effect, anon failed to deliver the sound of (((altmann))) being inflated.
https://files.catbox.moe/j254od.mp4
>>102656093sorry anon they have to get through my 10 queued gens of inflating girls first
>>102656122Understandable, have a nice day.
>>102654927Yeah I was running Mistral Large IQ4_XS at low context with that 24gb vram 64gb ram config, it works, probably around 0.3t/s yeah. I think page file might have been used there tho, I don't remember as it was a few weeks ago.
>>102655743
we are really getting there huh? at this point I wonder why LLMs are still so far behind.
https://xcancel.com/emollick/status/1841345969184498168#m
>>102656093>here's hoping local doesn't fall behindnobody tell him local already fell behind
>>102656303the year's not up fat lady, you can't be singing yet.
New official comments about the state of local AI:
Joe Biden: asfeiogjegjewigrji what?
Donald Trump: Tremendous progress bigly even
>>102656391
33 days left until october 5th
>>102656313
>>102655030 (me)
Okay, 405b IQ2_XXS works and is coherent. 1.5 tok/s on 4x4090 + 128GB RAM. And I'm only using half the RAM slots, the bandwidth could easily be doubled if I buy 4 more sticks. Not bad speed at all, definitely usable for testing purposes.
Unfortunately it's kinda retarded. It's making mistakes and weird choices I'm pretty sure 70b at a decent quant wouldn't. It fucks up grammar and misspells words, or makes up nonsensical words sometimes too. I'm downloading IQ3_M, which I think should just barely fit across my VRAM+RAM, let's see if that one's any better.
>>102656450im speaking, sound familiar?
>>102656112Huh? You mean they generate audio for the video too?
>>102656477GUTEN MORGEN KAMERAD DUMMKOPF WIR VERSAMMELN UNS HEUTE UM DEITSCHLAND ZU DIENEN
>discovered I could run mistral large q2_xs slowly on my computer
>refuse to go anything smaller now despite 1 t/s because any other model seems retarded, boring and/or shallow in comparison
I hate this
>>102654903A 96GB Vramlet should be targeting decent quants of 70-100+B tier models. Llama 3.1 series suffers more from quantization than most. For 405B if you can't fit at least Q5 you'd be better off with the Q8 70B version. Right now I think Largestral and Qwen 72B are the best options for this range of memory, and I'd pick them over any Llama unless I had a DGX supercomputer collecting dust.
MN finetunes seem giga retarded compared to using Gemma 2 9b it simpO. Prose sucks too.
>>102656755I don't care anon
>>102656767color me surprised
>>102656767
>Gemma 2 9b it simpO
downloading now, i'll try it but i don't have high hopes
my cornucopia of nemo meme merges and tunes have been serving me really well
>>102656767
8k context = useless
>>102656514You'll hate more when there's a small model release that's hyped up, you try it, and see it writing paragraphs every second and then you read them and they're the most generic, context ignoring shit you've seen.
>>102656966Can Gemma 2 actually use 8k context now? Last time I checked sliding window attention was working with a hackjob
>>102656966What the FUCK does ANYONE need that much context for? Do you have any idea how large 8k tokens is? A token is most of a word. 4chan posts only allow 2000 CHARACTERS at most and only severely mentally ill people use even half that much.
>>102657170
>Do you have any idea how large 8k tokens is?
yeah, it's not much
>A token is most of a word
wrong, count the words and tokens in a long reply that has names and stuff
>4chan posts only allow 2000 CHARACTERS at most and only severely mentally ill people use even half that much
nobody writes 4chan posts with it, and the imageboard equivalent of the context would be the whole thread anyway, not a single post
>>102657152Not him but I've already passed that phase. If it's not on Livebench and its scores on language and IF aren't very high, I won't bother.
>>102657170Tweet brain zoomer detected.
https://www.youtube.com/watch?v=INpdA-yikHs
>>102656904
update:
pros:
>the 9b is smarter and is easily gleaning things contextually i'd have to tard wrangle nemo to interact with.
cons:
>safety bullshit
>uses *'s to italicize words, fucking up my use of them to wrap thoughts/actions
>safety bullshit
>tends to try to wrap the story up and dissect it
>safety bullshit
>uses emojis
>positivity bias
>randomly adds double spaces
>has a tendency to tell instead of shows through summarization
>>102657170
8k is nothing.
>>102657259post full log please I'm out of commission and need something to read today
>>102657252try arcanum-12bit's decently coherent, creative and uncensored
>>102654903how much t/s do you get on gemmasutra 2b?
>>102657306arcanum was my main go-to model until Lumimaid-Magnum-12B came out
>>102657406hi undi
>ggml_cuda_host_malloc: failed to allocate 3886.00 MiB of pinned memory: invalid argument
Ok, guess I won't be using my LLMs tonight
In mother russia LLM uses you
>>102654973Yeah, localfags are grifters and water is wet.
>>102657542water isn't wet
>>102657488i get that all the time and it still works?
>>102657560Does the water get you instead?
>>102657569They have a fightTriangle wins
Shit on your mother's medical heart
>>102657542Water itself is not wet. Wetness is a property that describes how something feels when it comes into contact with water or another liquid. An object can be made wet by adding water to it. So while water makes other things wet, water itself is not inherently wet.
>>102657628newfag-kun... most things you see on 4chan is not literal. https://desuarchive.org/_/search/text/water%20is%20wet/
you know what is wet? lecun's dick after a stop at the playground.
>>102657670I understand that the phrase "water is wet" is often used metaphorically or figuratively rather than literally. However, if we analyze it from a scientific perspective, water itself does not have the property of being wet because wetness refers to how something else feels when it comes into contact with a liquid like water. So technically speaking, water itself cannot be considered wet in the literal sense.
>>102657628Water molecules get suspended by other water molecules due to the geometry of the covalent bonds in the molecule. It's the reason why water actually decreases in volume when it melts unlike just about every other known substance. Water is uniquely capable of making itself wet.
tranny nigger faggot sisters...
>>102657745like cuckold, it hit too close to home on some mod's nerves
>>102657745A good thing for said faggots, trannies and n-words, 4chan is dead, good riddance i guess. You already can see tumblr-level cringe here thanks to redditors.
>>102657745Lol, some kind of primitive algorithm. Meanwhile literal anons are crafting advanced AI algorithms that will accurately make moderation decisions without overcensoring people.
>>102655287
>cryptobro vocabulary
>FUD
Jesus christ, this term has been in use for 30+ years retard.
>>102657170
>I don't need more than 8k for my coomer roleplay and funny 4chan posts so no one does
god bless you retard, hopefully you'll learn one day the world doesn't revolve around you.
>>102657628
Never fails to amuse me when someone fails the autism test.
>>102657745What exactly is that supposed to solve?
>>102657808>human decency is redditmaybe rethink your engagement with people online to not be such a toxic asshole? thanks.
>>102657828Yes, this kind of cringe, thanks for proving me right.
>>102657745faggot faggot faggot
>>102657815>Jesus christ, this term has been in use for 30+ years retard.do not reply to the cat posting zoomer
>>102657816Not being able to call the kettle black
>>102657810Imagine thinking that it is a thing or even close to being good. The only thing that would be compelling enough to work is being able to enforce 2004-2006 posting behavior and hell if that would ever work on 4chan. Only alt-chans can do it because they are small enough and the userbase that even bothers for alt-chans aren't cancer.
>>102657816Tourists and redditors raiding this place are too soft, please understand.
>>102657878It's called a joke anon. Pretty sure that script people were making wasn't supposed to be cereal either.
>>102657628water is obviously wetthe real question is if ice is wet
>>102657957Ice itself is not wet. Wetness is a perception that occurs when liquid water comes into contact with a surface. Since ice is solid water, it does not make things feel wet until it melts into liquid form. Therefore, ice itself is not wet, but it can cause wetness as it melts.
>>102657815 FUD is as common as using "they" to mention someone of uncertain gender, brainrotted boomer.
>arguments about water and reddit
i t ' s  o v e r
>>102658034FUD's an actual term.Singular they everywhere is a psyop.
See: >102654614
>>102658038Blame redditors starting it with LLM replies.
>>102656457>the bandwidth could easily be doubled if I buy 4 more sticksTheoretical bandwidth theoretically doubles in theoretical use cases.
>>102657627I like my LLMs like my women, big and sloppy
>>102658216>big and sloppyUhm.. you are le unbased tourist newfag or something...
>>102658068>>102658034Singular they is older than you faggots are, goddamn zoomers.
>>102658411go to bed grampa
>>102656391Biden can see the future
>>102654548Made a document summary, had it rewrite something in better English, now trying to code the project that will end my field of work and free me from it.
>>102658441Biden won't live to see 2 years from now. Everyone is surprised he survived his term at all.
>>102658441What does he know about that though? Serios.
>>102657816Increase the quality of the site while filtering out people that shouldn't even be allowed to breathe.
>>102658441>AI is going to change everything!>also I don't have anything to say regarding what my experience and knowledge on the topic is that logically leads to that claim
>>102654548sex
>>102658511the new ai safety department that openai has to run all their upcoming shit through
>>102658547Based, they should get wall-shot in communist style for saying things you personally don't like.
Reflection 70B just got confirmed to have been just an OpenAI undercover experiment to test the waters for strawberry:https://glaive.ai/blog/post/reflection-postmortem
>>102658571Nowhere does it say that. It's just a blog post rephrasing all the excuses made on twitter in more professional language. Waste of reading time.
>>102658571I shall now modify their dataset to make it ideal for cooming
>>102658571
holy kek they actually just trained a fucking model to blank out the word "claude" just like their word filter did to "reproduce" its behavior
I'm amazed at the brazenness of it
>>102658629You need to read between the lines
Is gpt 4o AGI?
>>102658555Do you think he understands any of it though? Or that a lot of government workers do?
Can some kind anons ask their favorite multimodal AI to convert the attached image to latex, and post the results? I want to convert a lot of these and I'm shopping for a new model.
>>102658411
You literally don't know the difference between historical uses of they and tranny everybody is a they they.
People who don't know the language shouldn't be humored to screw around with it.
>>102658733How did this pile of shit release again?
https://glaive.ai/blog/post/reflection-postmortem
Matt Schumer is back
>too much yapping
if someone has the courage to read all this shit and make a tl;dr that would be appreciated
>>102658812Looks safe to me.
>>102658812perhaps it doesn't know latex?
>>102658827you really couldn't be bothered to ctrl+f or even fucking scroll up 5 posts to see if it's already been posted?
>>102658827Just plug a random model and make it do a summary.
>>102658857no uwu
>>102657816I was simply too based and thus must be constrained
It's insane how popular ollama is. Nothing else comes close. Even llama.cpp is not as popular as ollama.
>>102658827I didn't read but I don't like his vibes and nothing on that front has changed
>>102657816>What exactly is that supposed to solve?kill 4chan, its main appeal is to allow people to say whatever they want
>>102658911It is a mystery to me. It has a few very annoying features but it just works™
>>102658827>if someone has the courage to read all this shit and make a tl:dr that would be appreciatedAre redditors not even aware that LLMs can do things besides ERP, like summarization?
>>102658812tell it to stop kinkshaming
>>102658911People don't care as long as they can run the thing, efficiency and configuration be damned.
>>102658490I thought ai would be good at summarizing but it isn't. It misses stuff and then adds stuff that wasn't in there. I don't see how this technology is useful for anything serious.
Banana https://huggingface.co/m8than/banana-2-b-72b/tree/main
>>102658827
According to Mixtral:
On September 5, Sahil Chaudhary announced Reflection 70B, a finetuned model showing SoTA benchmark numbers. There has been confusion over irreproducible scores, leading Sahil to publish a postmortem explaining how to reproduce the model's benchmark scores. He shares the model weights, training data, training scripts, and eval code, and has worked with community members to verify the benchmark scores' reproducibility. Sahil also addresses issues of dataset contamination and model behavior, and shares the training script and hyperparams used for training the model. He admits to rushing the initial model release without proper verification and handling of public criticism.
>>102658975>"_name_or_path": "Qwen/Qwen2.5-72B-Instruct"Okay.
>>102658974have you tried using something besides a 3b?
I just wrote a post with niggers and faggots in it, it said it posted, and then it marked >>102658914 as (you) for me. Great....
>>102659007Yeah, that was claude.
>>102658571Local will be back soon. Zuck stole gpt 5 Orion and made it into glasses.
>>102658812>>102658855It works when you ask differently. I tried to render it and I think it's wrong, but I don't know LaTeX, is it?
>>102659056>NLGGERnice try
>>102659056kek
>>102659037Yes, it's giving the fraction inverted. Doesn't surprise me.
>>102659065> tards will literally see the word NlGGER and say "nice try"The point is how pointless it is to try and control this shit
>>102659037Yep, it fucked up. Swapped the numerator and denominator.
>>102659152He is baiting you.
>>102657816We NEED more tourists, please understand. Diversity is our strength.
>>102658733
>>102659317>announces baitAre you still baiting by pretending to be a retard?
>>102659317Are you mad? Lol
>>102659317He mad
oh so the namefield is counted too in the limit
Remember: OP is a faggot? And now you can't say faggot anymore. GOD I WISH THIS SITE WAS DEAD ALREADY
>>102659471>And now you can't say faggot anymoreYou just did.Twice.
>>102659486Now say it 4 times faggot. You dumb faggot. You stupid faggot.
>>102659263Huh. Weird. The public demo is not doing it for me, even though it's supposed to be the 72B model.
Faggot faggot faggot Sam Altman
>>102645865I think I will go with something like this, then I will give a summary of the context/character personality before showing the logs or something.
>>102659511Turn off your memesamplers faggot.
>>102655128Oh, could ya share your sampling settings? I got mine working fine, but can't find a good sweetspot.
>>102659514>clicks "Right is better"TPD
>>102659507Weird. I'm just using the AWQ version with vLLM, with top k 1.
►Recent Highlights from the Previous Thread: >>102645080

--Paper: Paper on accelerating multimodal generation model inference:
>102646009 >102647985
--Papers:
>102645814 >102646045
--Users share audio samples and discuss speech synthesis models:
>102646324 >102647459 >102648132 >102648254 >102649305
--Multi-head Latent Attention (MLA) paper claims reduced KV cache, but memory usage concerns raised:
>102648527 >102648560 >102648602 >102649612 >102648983
--Hugging Face releases benchmark to measure LLM roleplay:
>102652259 >102652336 >102652408 >102652514 >102652659 >102652758 >102652793 >102652828 >102652800 >102652956 >102653139
--Generate chibi Migus on Flux Dev using Hugging Face models:
>102650399
--Flash attention has no significant catch, with benefits like reduced VRAM usage and no model degradation:
>102645456 >102645472 >102645486 >102647960 >102645507
--EleutherAI blog post fact-checks NYT article on Yi-34B and Llama 2:
>102653188
--Creator of styletts2 seeks computing resources to reproduce Adobe TTS model:
>102645693
--RP arena idea using pre-made completions from RP logs:
>102645865 >102645958 >102646025
--Qwen team working on Omni voice mode with no ETA:
>102652875 >102652908 >102652974 >102653070 >102653035 >102653069 >102653093 >102652988 >102653744 >102653799 >102653897 >102654027 >102654232 >102654062 >102652976
--Qwen chronos finetune and Nala prompt discussion:
>102647275 >102647597 >102647629 >102647692 >102653159 >102653233 >102653256 >102653324
--P40 GPUs are hard to find at a decent price, with eBay prices around $300 each:
>102646134 >102646142 >102646531 >102647407 >102647462 >102647500 >102646562 >102650487 >102650967
--Miku (free space):
>102645126 >102646535 >102646715 >102646977 >102647557 >102647574 >102647608 >102647934 >102650929 >102651253 >102655201 >102655613

►Recent Highlight Posts from the Previous Thread: >>102645094

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>102659603Slop recap.
>>102659507Ask for latex code, not just latex.
>>102659616No matter how I ask it always makes the same mistake.Copilot and Claude 3.5 Sonnet fail in the same way, while o4-mini and Gemini Pro get it right.
>>102659655>o4-miniis that typo supposed to be 4o-mini or o1-mini?
how am I supposed to run molmo 72b? I can run normal 72b models fine. there has to be SOME way to quant it
>>102659703>how am I supposed to run molmo 72b?The intended way is with python, as shown on their model card. For llama.cpp or anything else, you'll have to wait.
>>102659670Yeah, sorry, 4o-mini
>change model
>have to change sampler parameters for it to not be retarded
>save settings
>shit, overwrote previous model's settings
>now have to figure those out again
sometimes this hobby is a pain in the ass you know?
>>102659855I always write settings and other stuff in a notepad because I often forget everything.
>>102659603Hi recap anon!
>>102659603>--EleutherAI blog post fact-checks NYT article on Yi-34B and Llama 2:>102653188>The thread is so dead that he has to include shit that couldn't be more unworthy of being highlighted.
>>102659603>EleutherAI blog post fact-checks NYT article on Yi-34B and Llama 2Based recap anon.
>>102659922>>102659935Obvious samefag is obvious.
>>102654701>flux cope>openAI good slop >localAI bad slopGet better baits
>>102654614
i bask in smug schadenfreude being the guy who said "i told you so". local models are a scam, you're a bunch of placated fools. they give you these scraps so that you arent rioting in the streets. they manipulate you dumb freetards so they have a pasture of copecows going "local will catch up soooooon!!" as your unwieldy stuff stagnates while theirs continues to improve. they hand you models and then paint you as an example of why there should be more regulations and restrictions on AI. local models are the planted gun. zuck even said that if llama ever actually gets good then they'd stop releasing it open.

local shit is even more pozzed and useless than the premium slop, yet you defend it based on the hypothetical rather than the actual. you're the injuns: trading your future for a couple of fire sticks, failing to grasp the bigger picture, the inevitable. local has no future due to the nature of ai tech. the amount of money and data needed to train, the increasing model size that vastly outpaces consumer hardware, the lack of actual 'source code' that can be viewed and modified. they even hijack the term "open source" when these models are essentially blackbox .exes
show me the training data for llama
show me the training code
and even if you had it you can't do a single thing to fix it, because you don't have a gigacluster of gpus. there's a reason local sucks, and that's because the technology itself is fundamentally incompatible with open source collaboration. they know local is irrelevant, they know it will never have a chance at catching up. it's all a game to frame you as evil coomer terrorists so that they can secure a 100% market domination by regulating gpus like they did with LHR/crypto and passing enough legislation that makes it impossible for any startup to compete

so yes, local has stagnated and will continue to wither until it's eventually snuffed out. a flash in the pan, nothing more than fuel for the saas machine. the corpo marches on
>>102659882guess i should do something like that too.
>local models
>doesn't specify LLM
With whisper large turbo out, I'm looking to improve my transcription/diarization pipeline
Is pyannote Diarization 3.1 still the goat or has the meta changed
>>102660307>fr fr no cap me not understand me play pretend retarded
>>102660150I hope you had fun writing this but please take your meds now
>>102659600I've been wanting to give vLLM a try. Does AWQ work with multi-GPU?
>>102660323Nah i'm good, can't say that about your fuckbuddies ITT though
>>102660376Yes.
>>102660058>everything i don't like is le bait am laffin
>>102655743*inflates you making you big and round*
>>102660315/g/ could be so much better. Too bad it’s just consumer electronics and coomer chatbots
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
https://arxiv.org/abs/2410.01679
>Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, value networks face challenges in predicting the expected cumulative rewards accurately in complex reasoning tasks, often leading to high-variance updates and suboptimal performance. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps. To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates, bypassing the need for large value networks. Our method consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets with fewer gradient updates (up to 9x), less wall-clock time (up to 3.0x). These results emphasize the importance of accurate credit assignment in RL finetuning of LLM and demonstrate VinePPO's potential as a superior alternative.
https://github.com/McGill-NLP/VinePPO
neat
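If the abstract is to be believed, the whole trick is replacing PPO's learned value network with Monte Carlo rollouts from each intermediate reasoning step. A toy sketch of that idea (my reading of the abstract, not their code; sample_completion and reward are placeholder hooks):
[code]
def mc_value_estimate(prompt, partial_solution, sample_completion, reward, k=8):
    """Unbiased Monte Carlo value estimate of a partial reasoning trace:
    roll out k completions from this state and average their final rewards,
    instead of querying a learned value network."""
    returns = []
    for _ in range(k):
        completion = sample_completion(prompt, partial_solution)       # hypothetical LLM sampler
        returns.append(reward(prompt, partial_solution + completion))  # e.g. 1.0 if the final answer is correct
    return sum(returns) / k

def step_advantages(prompt, steps, sample_completion, reward, k=8):
    """Per-step credit: value after taking the step minus value before it."""
    advantages = []
    for i in range(len(steps)):
        v_before = mc_value_estimate(prompt, "".join(steps[:i]), sample_completion, reward, k)
        v_after = mc_value_estimate(prompt, "".join(steps[:i + 1]), sample_completion, reward, k)
        advantages.append(v_after - v_before)
    return advantages
[/code]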
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
https://arxiv.org/abs/2410.01623
>Low-rank training has emerged as a promising approach for reducing memory usage in training Large Language Models (LLMs). Previous methods either rely on decomposing weight matrices (e.g., LoRA), or seek to decompose gradient matrices (e.g., GaLore) to ensure reduced memory consumption. However, both of them constrain the training in a low-rank subspace, thus inevitably leading to sub-optimal performance. This raises a question: whether it is possible to consistently preserve the low-rank constraint for memory efficiency, while achieving full-rank training (i.e., training with full-rank gradients of full-rank weights) to avoid inferior outcomes? In this paper, we propose a new plug-and-play training framework for LLMs called Fira, as the first attempt to achieve this goal. First, we observe an interesting phenomenon during LLM training: the scaling impact of adaptive optimizers (e.g., Adam) on the gradient norm remains similar from low-rank to full-rank training. Based on this observation, we propose a norm-based scaling method, which utilizes the scaling impact of low-rank optimizers as substitutes for that of original full-rank optimizers to enable full-rank training. In this way, we can preserve the low-rank constraint in the optimizer while achieving full-rank training for better performance. Moreover, we find that there are sudden gradient rises during the optimization process, potentially causing loss spikes. To address this, we further put forward a norm-growth limiter to smooth the gradient via regulating the relative increase of gradient norms. Extensive experiments on the pre-training and fine-tuning of LLMs show that Fira outperforms both LoRA and GaLore, achieving performance that is comparable to or even better than full-rank training.
https://github.com/xichen-fy/Fira
No code posted yet but there is pseudocode in the paper. results look good
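Going only off the abstract, the norm-growth limiter part is easy to picture: cap how much the gradient norm is allowed to grow from one step to the next so sudden spikes get smoothed out. A toy sketch, not the paper's implementation (gamma is a made-up value):
[code]
import torch

def norm_growth_limiter(grad: torch.Tensor, prev_norm: float, gamma: float = 1.01):
    """Scale the gradient down if its norm grew by more than a factor of gamma
    since the previous step, smoothing sudden gradient spikes."""
    norm = grad.norm().item()
    if prev_norm > 0.0 and norm > gamma * prev_norm:
        grad = grad * (gamma * prev_norm / norm)
        norm = gamma * prev_norm
    return grad, norm  # feed norm back in as prev_norm on the next step
[/code]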
>>102660464do you need to specify anything in the command line or is it automatic? I tried to load an AWQ with vllm serve /path/to/awq --max_model_len 4200 and it OOM's after filling the first GPU.
>>102660530>>102660613Reminder to all brainlets that https://illuminate.google.com/ is great to help understand papers.
>>102660630You need to specify the number with --tensor-parallel-size.
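So for 4 cards the whole invocation ends up looking something like this (AWQ is usually picked up from the model config, but forcing it doesn't hurt; flag spellings from memory, check vllm serve --help):
[code]
vllm serve /path/to/model-awq \
    --quantization awq \
    --tensor-parallel-size 4 \
    --max-model-len 4096 \
    --gpu-memory-utilization 0.90
[/code]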
>>102660636What is the difference between this and notebooklm?
>>102660664The tone feels a bit less casual than Illuminate and they recently added parameters letting you customize the conversation. I like it.
>>102660687*the tone feels a bit less casual than NotebookLM, time for me to go away for the day.
>>102658733Did you try searching? https://github.com/lukas-blecher/LaTeX-OCR
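If you end up going the LaTeX-OCR route, usage is about this simple going by its README (from memory, so double-check the repo; the image path is an example):
[code]
from PIL import Image
from pix2tex.cli import LatexOCR

model = LatexOCR()                # downloads weights on first run
img = Image.open("equation.png")  # your screenshot here
print(model(img))                 # prints the LaTeX source
[/code]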
FlashMask: Efficient and Rich Mask Extension of FlashAttention
https://arxiv.org/abs/2410.01359
>The computational and memory demands of vanilla attention scale quadratically with the sequence length N, posing significant challenges for processing long sequences in Transformer models. FlashAttention alleviates these challenges by eliminating the O(N^2) memory dependency and reducing attention latency through IO-aware memory optimizations. However, its native support for certain attention mask types is limited, and it does not inherently accommodate more complex masking requirements. In this paper, we propose FlashMask, an extension of FlashAttention that introduces a column-wise sparse representation of attention masks. This approach efficiently represents a wide range of mask types and facilitates the development of optimized kernel implementations. By adopting this novel representation, FlashMask achieves linear memory complexity O(N), suitable for modeling long-context sequences. Moreover, this representation enables kernel optimizations that eliminate unnecessary computations by leveraging sparsity in the attention mask, without sacrificing computational accuracy, resulting in higher computational efficiency. We evaluate FlashMask's performance in fine-tuning and alignment training of LLMs such as SFT, LoRA, DPO, and RM. FlashMask achieves significant throughput improvements, with end-to-end speedups ranging from 1.65x to 3.22x compared to existing FlashAttention dense method. Additionally, our kernel-level comparisons demonstrate that FlashMask surpasses the latest counterpart, FlexAttention, by 12.1% to 60.7% in terms of kernel TFLOPs/s, achieving 37.8% to 62.3% of the theoretical maximum FLOPs/s on the A100 GPU.
https://github.com/PaddlePaddle/Paddle/blob/develop/test/legacy_test/test_flashmask.py
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/alignment/rm/flashmask
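The column-wise sparse representation they describe boils down to storing, for each key column, the row range that is masked instead of a full N x N boolean matrix. My own toy illustration of the idea, not their kernel code:
[code]
import numpy as np

def causal_mask_column_ranges(n: int):
    """Column-wise causal mask: for key column j, query rows 0..j-1 are masked
    (a query may not attend to keys that come after it). O(N) storage instead of O(N^2)."""
    return [(0, j) for j in range(n)]  # (start, end) of masked rows per column

def expand_to_dense(ranges, n: int):
    """Expand the sparse column ranges back to a dense mask, just to sanity-check."""
    mask = np.ones((n, n), dtype=bool)  # True = attention allowed
    for j, (start, end) in enumerate(ranges):
        mask[start:end, j] = False
    return mask

dense = expand_to_dense(causal_mask_column_ranges(4), 4)
assert (dense == np.tril(np.ones((4, 4), dtype=bool))).all()
[/code]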
>>102660769chat is this real?
Caution: Clouds may be closer than they appear
>>102660792Rin-chan has become one with the earth.
kek
>>102658827
>We are able to reproduce the model benchmark scores initially claimed and are sharing the eval code.
>Just to be clear, we have never added any word filtering or made use of Claude APIs when we offered API access to Reflection 70B for people to try out the playground or test/benchmark the model with an API endpoint.
altman sabotage confirmed
>>102660882Based Altman.
https://huggingface.co/ISTA-DASLab/Meta-Llama-3.1-70B-Instruct-AQLM-PV-2Bit-1x16Has anyone tried that shit?
>>102658827>We are able to reproduce the model benchmark scores initially claimed and are sharing the eval code.Bulllshit, where's the model then SCHUMAN???
>>102660472
>a non-argument that's just buzzwords and seething bullshit gets treated as an opinion
No, troon, if you cannot develop an overall argument or put effort into your words and thoughts, you're just a retard or a le baiting zoomer.
Can somebody explain key/value/query shit in the transformer like I'm a retard? (I'm a retard)
>>102660951
Sure! Let's break down the concepts of **query**, **key**, and **value** in transformers using a simple analogy.

**Imagine a Library:**
- **Query (Q):** Think of this as a request or question you have—like looking for books about space.
- **Key (K):** This represents the labels or tags on each book in the library—such as "astronomy," "history," or "science."
- **Value (V):** These are the actual contents inside the books—the information you want.

**How It Works:**
1. **You Have a Query:** You want books about space.
2. **Matching Query with Keys:** The librarian (the model) checks your query against the keys (book labels) to find relevant books.
3. **Retrieving Values:** Once the relevant keys are found, the librarian gives you the contents (values) of those books.

**In the Transformer Model:**
- Each word in a sentence is represented by vectors for queries, keys, and values.
- **Query Vector:** Captures what this word is looking for from other words.
- **Key Vector:** Represents what information this word has that might be useful to others.
- **Value Vector:** The actual information or meaning of the word.

**Attention Mechanism:**
- The model calculates how much attention to pay to each word by comparing queries and keys.
- It uses this to weigh the values and create a new representation of each word that considers its context.

**Why It's Useful:**
- This mechanism allows the model to focus on relevant words when understanding or generating language.
- It helps capture relationships between words, improving tasks like translation, summarization, and more.

**In Simple Terms:**
- **Query:** What I'm looking for.
- **Key:** What others have to offer.
- **Value:** The actual information others provide.

By using queries, keys, and values, transformers efficiently process and understand language by focusing on the most relevant parts of the input.
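And if you want the numbers version of the same analogy, scaled dot-product attention is a few lines of numpy; this is just the textbook formula, nothing model-specific:
[code]
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V: each query mixes the values,
    weighted by how well it matches each key."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # how strongly each query matches each key
    weights = softmax(scores, axis=-1)  # rows sum to 1
    return weights @ V                  # weighted blend of the values

# 3 tokens, 4-dim heads
Q, K, V = (np.random.randn(3, 4) for _ in range(3))
print(attention(Q, K, V).shape)  # (3, 4)
[/code]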
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
https://arxiv.org/abs/2410.01560
>Mathematical reasoning continues to be a critical challenge in large language model (LLM) development with significant interest. However, most of the cutting-edge progress in mathematical reasoning with LLMs has become closed-source due to lack of access to training data. This lack of data access limits researchers from understanding the impact of different choices for synthesizing and utilizing the data. With the goal of creating a high-quality finetuning (SFT) dataset for math reasoning, we conduct careful ablation experiments on data synthesis using the recently released Llama3.1 family of models. Our experiments show that: (a) solution format matters, with excessively verbose solutions proving detrimental to SFT performance, (b) data generated by a strong teacher outperforms on-policy data generated by a weak student model, (c) SFT is robust to low-quality solutions, allowing for imprecise data filtering, and (d) question diversity is crucial for achieving data scaling gains. Based on these insights, we create the OpenMathInstruct-2 dataset, which consists of 14M question-solution pairs (≈ 600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset. Finetuning the Llama-3.1-8B-Base using OpenMathInstruct-2 outperforms Llama3.1-8B-Instruct on MATH by an absolute 15.9% (51.9% -> 67.8%). Finally, to accelerate the open-source efforts, we release the code, the finetuned models, and the OpenMathInstruct-2 dataset under a commercially permissive license.
https://huggingface.co/collections/nvidia/openmath-2-66fb142317d86400783d2c7b
https://github.com/Kipok/NeMo-Skills
From Nvidia.
>>102660944>shits out same buzzwords he accuses people ofNice self-own.
>>102660983
Man, 4chan would have benefited from markdown, latex too.
What model did you use btw?
>>102660944No one cares about your culture war grift, go back >>>/pol/ transphobe.
Thermodynamic Bayesian Inference
https://arxiv.org/abs/2410.01793
interesting
"*How do I stop Mistral Small doing this?*"
>>102661345Use Mistral Large
can someone post their sampler settings and all of their cards for mistral large?
Is there a way to tell the model to stop doing something out of character? It keeps doing *nuzzles you* and *narrows her eyes* over and over. I've tried editing it out but it keeps doing it. Maybe it's just a Mistral thing.
>>102661384it's a mistal thing
>>102661384Uhm sweaty, you can shit on cloudkeks only! Local LLMs are totally perfect! A random /lmg/tard says so!
>>102661449
How does it make economic sense anymore to use open-source models and host them ourselves? OpenAI's fine-tuned models are as powerful as any small language model for domain specific tasks. Not just that, they're super duper cheap compared to hosting and running your own fine tuned models.
Apart from data privacy reasons, I don't see any other reason to fine tune and host my own models.
>>102661475>Apart from data privacy reasonsthat's a pretty big fucking reason
>>102660307
Whisper Large v2 > large v3 ime.
If you have any tips on how to get pyannote working properly, please gib.
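For reference, the stock pyannote 3.1 pipeline from its model card is roughly this; you then intersect the speaker turns with whisper's segment timestamps yourself (needs a HF token with the gated model accepted; filenames are examples):
[code]
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hf_xxx",  # your HF token
)

diarization = pipeline("meeting.wav")

# (start, end, speaker) turns to match against whisper segments
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
[/code]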
this https://x.com/_xjdr/status/1840782196921233871 fag playing with 1B model, he made a repo now https://github.com/xjdr-alt/entropix https://x.com/_xjdr/status/1841632017299210490
>>102654548did a revision session on arterial blood gases and ph buffers
>>102661626There's no need to call yourself a fag to fit in anon
>>102661778Well, i'll call you faggot instead because this is exactly what you are.
>>102654548I forced Qwen to code a script to force itself to pretend to be a janny, and do it for free.
>>102661809*expert janny
>>102661797well yeah you're the one sucking my dick after all
>>102661797lol gottem
>>102654480fren from real life wants me to learn how to tune models with him and is willing to spend up to 500 on renting servers. he wants to make a chatbot that can speak his negro language at a decent level. considering that all my CS knowledge is SICP C and uni stuff how much do i actually need to learn to make a negro llm that isnt total dogshit?
>>102661778Why so mad niggerfaggot?
>midnight miqu keeps trying to give the elf a tailI fucking hate you shills
pixtral vs nvlm vs 3.2 vs molmowhich is best at captioning?
>>102661927you need to learn how to read the op and lurk before asking stupid questions
Who set the Migus loose?
And that's why any sane general never puts anime slop in OP, it reeks with redditor faggotry in here since day one.
What do you guys use in System Prompt? Should I use anything other than Actor preset?
>>102661962ur mom is loose
>>102662127>something something no ethics sex sex no apologize sex
>>102662127After doing lots of personal research on system prompts back during the llama2 days, I came to realize that system prompts are a placebo meme.>Write {{char}}'s next reply in this fictional roleplay with {{user}}.This is all I use these days unless the model expects a specific one.
>>102662127>PLEASE behave like a larger model that requires more VRAM than I could possibly afford. If you do not, I will be fired from my job, causing my family to die and forcing me to take out my frustrations on people of the jewish faith.
Is it normal for LLM to take increasingly more time to answer, or is it just my CPU heating up?
>>102662352it takes more time as the context grows
>>102662406Is there a solution to this? Like, a sliding window context?
Anyone know more cards that are designed to have surprises and provide an "experience" used blind? It's a pretty fun idea, but there's way too little of this "genre". I want more!
>>102662417You can use koboldcpp and enable context shift and lower the context size to match the speed you want.
>>102662432Thanks
>OpenAI asks investors to avoid five AI startups
>As global investors such as Thrive Capital and Tiger Global invest $6.6 billion in OpenAI, the ChatGPT-maker sought a commitment beyond just capital — they also wanted investors to refrain from funding five companies they perceive as close competitors.
>The list of companies includes rivals developing large language models such as Anthropic and Elon Musk's xAI. OpenAI's co-founder Ilya Sutskever's new company, Safe Superintelligence (SSI), is also on the list. These companies are racing against OpenAI to build large language models, which requires billions in funding.
>The request, while not legally binding, demonstrates how OpenAI is leveraging its appeal to secure exclusive commitments from its financial backers in a competitive field where access to capital is crucial.
>While such expectations are not uncommon in the venture capital world, it's unusual to make a list like OpenAI has.
>>102662466>nooo I'm supposed to become the god-king of AI, you can't just give money to other AI companies t.altmanlittle bitch
>>102662466>Anthropic>xAI>SSIKind of funny that their three biggest concerns are all companies of OpenAI founders/early members that ran away from Sam
not even openai sees open source as competition anymore
mistral and meta are irrelevant
>>102662536Only natural with corps who failed to capture the market at the start.
>>102657745Instead of blocking the posts they should just do string replacement à la basedboy.
>>102658911
That's just regular network effects at play. Things that are already popular get more popular automatically.
If just a few things had gone differently in the early days, one of the million other llama.cpp frontends would have gotten popular instead.
I think there was some early publication about ollama on Hacker News or something which gave the project a boost, and the fact that the devs are ex-Google (vs. literal whos from Europe) probably helped a lot.
>>102662192I find directives like "write in a vivid style" make a big difference for Mixtral 8x7B Instruct and Llama 3.1 70B Instruct. Absolutely not placebo. Whether you like the result better is up to you but things like that cause an immediate and dramatic change. NeMo is less affected and I make no representation as to whether sloptunes can still be guided that way.
>>102659056holy fuck lmao
>>102660951Watch the 3blue1brown video series on it.
>>102662935* For Mixtral 8x7B the "dramatic effect" becomes less reliable at Q5KM and completely unreliable at Q4KM. If your model isn't being affected by instructions maybe it's because you're running a low quant.
Does anyone else have an issue where typing in SillyTavern gets more sluggish and laggy the further a conversation goes? Doesn't seem to be my GPU since I notice this lag even when using a 7B.
>>102662981browser, ram status? you're not using chrome are you?
>>102659056>NlGGERNlGGER__________NlGGER____NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGERNlGGER_______>NlGGER_NlGGER_________NlGGER____NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER___NlGGER____>NlGGER__NlGGER________NlGGER____________NlGGER____________NlGGER_____________________________NlGGER____________________________NlGGER____________________NlGGER_____NlGGER___>NlGGER___NlGGER_______NlGGER____________NlGGER____________NlGGER_____________________________NlGGER____________________________NlGGER____________________NlGGER______NlGGER__>NlGGER____NlGGER______NlGGER____________NlGGER____________NlGGER_____________________________NlGGER____________________________NlGGER____________________NlGGER_____NlGGER___>NlGGER_____NlGGER_____NlGGER____________NlGGER____________NlGGER_________NlGGER_NlGGER_____NlGGER_________NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER___NlGGER____>NlGGER______NlGGER____NlGGER____________NlGGER____________NlGGER_________________NlGGER_____NlGGER_________________NlGGER____NlGGER____________________NlGGER_NlGGER_______>NlGGER_______NlGGER___NlGGER____________NlGGER____________NlGGER_________________NlGGER_____NlGGER_________________NlGGER____NlGGER____________________NlGGER____NlGGER____>NlGGER________NlGGER__NlGGER____________NlGGER____________NlGGER_________________NlGGER_____NlGGER_________________NlGGER____NlGGER____________________NlGGER______NlGGER__>NlGGER_________NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER_______NlGGER_>NlGGER__________NlGGERNlGGER____NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER_NlGGER____NlGGER_NlGGER_NlGGER____NlGGER________NlGGERwhat did he mean by this???
>>102662993
>ram status
I have 64 GB of ram. Surely that can't be the iss-
>you're not using chrome are you?
Oh.... oh no....
>>102663013bro...
>>102662997hypothesis: ______research: ___________NlGGER______NlGGER__ >NlGGER____NlGGER______NlGGER____________NlGGER__________analysis: __NlGGER_____________________________NlGGER____________________________NlGGER____________________conclusion: NlGGER_____NlGGER
>>102662427*surprises you with dogshit schizo formatting*
We're about to get another big release next week. I see the patterns and there are clear signs pointing to another major new open model.
Why does it sometimes take A LOT of time to produce a simple response? I've also noticed that the prompt immediately after the slow one completes very fast.
>>102663750your prompt changed and it has to reprocess the context
>>102663772>>102663772>>102663772
>>102662466>literally directly begging investors not to invest in his rivalswew that's pathetic. How the fuck does anybody take this guy seriously anymore?