/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102552020 & >>102544848

►News
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102552020

--Papers:
>>102556485 >>102556658 >>102556704
--Techlet seeks advice for running smut models on RTX 3060 and 32GB RAM setup:
>>102553022 >>102553034 >>102553095 >>102554016 >>102554051 >>102555169 >>102555266 >>102555360
--SDL input example may support whisper.cpp voice recognition on Linux:
>>102552838 >>102552967
--Molmo 72b local execution challenges and workarounds:
>>102552240 >>102552305 >>102552320 >>102552354 >>102553463 >>102554973 >>102552544
--90B model issues and potential improvements with quantization:
>>102552694 >>102552834 >>102552873 >>102552940
--90B model fails to interpret hazard symbols, while chatgpt endpoint succeeds:
>>102553581 >>102553614
--Llama3.2 3B passes ShaderToy test and generates working code:
>>102554042 >>102555022 >>102555072
--Llama 3.x improvements are incremental, increasing context length and adding vision:
>>102552674 >>102552688 >>102552744
--usecublas mmq 0 is now default and makes a big difference:
>>102552283 >>102552332
--Nala test with 90B model shows improvement but raises questions about test design:
>>102552440 >>102552512 >>102552522 >>102552505 >>102552520 >>102552723
--LlaMA 3.2 3B one-shots Snake Game:
>>102552587 >>102552652
--L3 Tenyx Day generates working pyqtgraph plot of scrolling sine wave:
>>102553616 >>102555234
--Alternatives to 90b vision model for captioning and image generation:
>>102552399 >>102552424 >>102552437 >>102552443 >>102552509 >>102552667 >>102552679 >>102552715 >>102552733 >>102552745 >>102552843 >>102552783 >>102552824 >>102553106 >>102553151 >>102555291 >>102555313 >>102555335 >>102555367
--Adjusting batch size and ubatch size for prompt processing and layers:
>>102552065 >>102552157
--90B 4bit bnb model struggles with image description accuracy and spatial orientation:
>>102554144
--Miku (free space):
>>102552059 >>102553803 >>102554159 >>102554179 >>102556227

►Recent Highlight Posts from the Previous Thread: >>102552037

https://rentry.org/lmg-recap-script
I hate the anti christ
>>102557552
dumb spam poster

>Using OpenRouter I tried Hermes 3 70B on a whim and found I actually liked it
>I tried to use it just now and found I got rug pulled
Serves me right for ever using a 3rd party service.
best vision model to send dick pics to?
Why is everyone quitting at OpenAI before the cashout?
>everything new is shit
>nothing happens
it's over isn't it

>>102557712
>Implying anyone but the jew will get money
That's why. Leave now before everything goes to shit completely and your reputation gets tarnished as a result.

>>102557712
people can have morals or brains, but not both
I was away for a day and so much shit happened.
>>102557739
all nothingburgers
it's over

>>102557739
nothing happened
>new multimodal drops
Oh cool!
>text and vision
God dammit. When will the dumb vision meme die? It has not given models any better sense of spatial reasoning, it's just a dumb party trick for asking it to explain things you already know for a laugh.

>>102557712
Because (You) only care for money, and don't have a shred of integrity.
>>102557739
Indeed. A lot of SHIT happened.

>>102557712
They plateaued and had to obfuscate the fact. Expect some youtube documentary to retell that as a big revelation in a couple years.

>>102557739
>so much shit happened.
molmo shill or meta shill?
either way, you know what to buy

>>102557757
that's why they have to cash out and IPO now, before the public knows enough to not buy their bags

>>102557775
what kind of ammo?

>>102557753
It's not even image out. Very underwhelming.
Multimodal for local always means text/image in, text out. BORING.
"multimodal" more like shittimodal
How can i try a 4bit quant of llama3.2 with a pascal multi gpu since there is no gguf?
dont say exllama, that has insane prompt processing time, probably because of pascal.
aphrodite engine? i'm serious by the way.
"multimodal" more like faggotmodal
llamacpp multimodal support when?
llamacpp 3.2 support when?

>>102557801
exllama 2

>>102557775
A 3090?

>>102557791
We have plenty of resources for image out. We need multimodal text+speech models.

>llama 3 is shit
>llama 3.1 is shit
>llama 3.2 is shit
remember when llama 3 was going to save us? and then when llama 3.1 was going to save us? yeah it's over. pack it up.

>>102557827
Right after Jamba
Mikulove
>>102557846
like what.. chameleon and the poor attempts to reimplement what has been cut out. lol
there is no text+image out model as far as i know.

>>102557739
>so much shit happened.
tl;dr? what did I miss?

>>102557841
No. The last stock of 4090s you can still find, then wait for the 4 grand 5090/Titan with 32GB VRAM.
wake me up when one model can write to me, send me pictures, and whisper in my ear. everything else is RNG with extra steps.
>>102557854
Do you have ANY idea what they're planning for L4? I can't say much but, well, let's just say it's just a bit too early to give up hope. Check back in a couple weeks and let me know how over it is or isn't.

>>102557858
is that before or after DRY?

>>102557900
yes yes l4 will totally save us just like l3 did

>>102557900
>two more weeks
kek, almost had me

>>102557712
Because Sam probably made it clear that only he will get the bag, so they didn't see the point in staying and then having to clap for his ass when he gets the 140b, that's fair

>>102557787
>that's why they have to cash out and ipo now, before the public knows enough to not buy their bags
this, it's probably soon over for OpenAI, it'll probably be bought by Microsoft after that
zoomer doomers are so fucking unbearable lmfao at least they'll all troon out sooner rather than later
>>102557992
I have my doubts that Microsoft even needs, or wants, to buy them, period. It's all about the datasets and staff anyway, which they can get easier (and cheaper) in other ways. They already have the hardware by default too.

>>102558011
yeah Idk, Microsoft doesn't seem to know how to make models, so they better take those from OpenAI

So will I be able to send dick pics to my sillytavern chat soon?

>>102558011
OpenAI's mailing list and customers (despite not being profitable) are pretty valuable to a company like Microsoft. If the price is right they'll buy.

>>102558025
All they have to do is "poach" the staff and data and be done, then remake shit to have full control start to end. Simply taking the model wouldn't fix their lack of knowledge or skill in how to use, make or improve them. Granted, neither does OAI, but so it goes.
>>102558040
Just throwing out some ideas, that's all. Not like it will matter to them money wise either way, for obvious reasons.

>>102557892
You just know it will never be allowed. Even if it works out, you'll have the same crippled and censored experience with AI rejecting and calling you "incel chud" for wrong opinions.

>>102558055
>Even if it works out, you'll have the same crippled and censored experience with AI rejecting and calling you "incel chud" for wrong opinions.
tough pill to swallow but it's true, the only way to get out of this is to get a great base model and finetune this shit with based text, but it's pretty unlikely that's gonna happen

>>102558077
>get a great base model
extremely unlikely
Has the anon that made the Director extension put any newer versions out?
>>102558221
>Director extension
whuz dat?

>>102558266
It's a sillytavern extension that adds bits of info to the prompt based on presets and lorebooks, used to tell the AI things like what the character/user is currently wearing, the time of day, weather, etc.
It's like a slightly more automated author's note.

>>102558025
Microsoft can't make anything anymore. It's what you get with a bunch of pajeets.

>>102558285
that sounds pretty neat, you have a link to it? i couldn't find it on google

>>102558300
This is the last version I can find. Not sure if it works with the latest sillytavern
>>101910710

>>102558296
is there any big tech company left that this doesn't apply to?

>>102557534
>>102557696
I believe you're still misunderstanding the reasoning behind my post. Yes, it is expected that for any normal LLM, its performance would decrease with the more difficult problems. That is indeed obvious. I am suggesting that despite that, o1 is still under-performing because of this or that (which may be more clear as you are showing details from the paper I didn't look at, because I was just looking at the summary). My implied reasoning was that if o1 is able to dedicate more tokens to thinking about problems, and its performance improves generally without a foreseen limit (note on that at the end of the post), then it should just dedicate more tokens to the more difficult problems and solve them with similar accuracy.
Now, as you have shown, they did note the token counts o1 used. In this case that does push forward the discussion of understanding what happened in the study. Yes, based on the logic I meant to present so far, I would say now that it's possible that o1's performance on the more difficult problems could've improved with even more tokens, and perhaps right now it is just an artificial limit that stopped them from being able to get that data. However, we don't really know, as there is also no data to suggest that it won't stop improving at some point soon or far away.
>Their claim was that having it "think" longer on the same task would increase its accuracy on that task, not that...
OpenAI might not have claimed it explicitly, but it's kind of the idea: that, if allowed to, the model could potentially just get better and better to ridiculous lengths by being allowed to think more. They only said that they would investigate this new scaling behavior, but didn't say anything to quell the implication (and general tone of the article) that it's some new scaling paradigm that will lead us to crazy lengths of improvement.
>not an eldritch horror
>>102558522
>4096x4096
>that quality

>3.2 90B Vision is super retarded
no not like this...
is the 3.2 90b multimodal model stronger for text-only applications than 3.1 70b?
>absolute eldritch horror
>>102558892
not scary

>>102558904
real eldritch horrors never are

>>102558892
>ywn SEX a real eldritch horror
Brehs... Surely this comment will not come back to bite me in the ass one day.
Wonder how's shitma 3.2 in safety, the most important thing in this world.
Has anyone tried the 1B or 3B for speculative decoding of 70B, and compared it to using 8B, for the draft model?
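Haven't tried the 1B/3B drafts myself, but for anyone fuzzy on why a tiny draft model can speed up a 70B at all, here's a toy greedy sketch of the propose/verify loop. Everything here is illustrative: `target` and `draft` are stand-in callables, not any real inference API.

```python
def speculative_decode(target, draft, prompt, n_draft=4, max_new=16):
    """Toy greedy speculative decoding: the small draft model proposes
    n_draft tokens, the big target model checks them, and tokens are
    kept up to the first disagreement. With greedy sampling the output
    is identical to running the target alone; the draft only changes
    how often the expensive target has to be called."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # draft proposes a short continuation
        ctx = list(tokens)
        proposed = []
        for _ in range(n_draft):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # target verifies the proposals position by position
        vctx = list(tokens)
        accepted_all = True
        for t in proposed:
            if target(vctx) == t:
                tokens.append(t)
                vctx.append(t)
            else:
                accepted_all = False
                break
        if not accepted_all:
            # on the first mismatch, take the target's own token
            tokens.append(target(vctx))
    return tokens[len(prompt):goal]
```

The point of the 1B vs 8B question is exactly the tradeoff visible here: a smaller draft is cheaper per proposal but disagrees with the target more often, so fewer proposed tokens get accepted per verification pass.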
I've been reading that they kept the text part of the models the same, and just added on the vision adapters, but is that really true? Is it possible to download only the adapter and stick it onto my existing 70B? Also, I feel like this should present some interesting optimization options. Like the adapter's weights are only going to be used when encoding the image, right? So in theory you should be able to get some good overall gains by having the adapter's weights in RAM, assuming a RP use case.
hello guys i need coom model for my 6gb vram 2060 rtx nvidia card from huang grifter and 4x8gb ram fury patriot fx supreme edition with rgb lights (lights are red and green)coom model need to write ah ah mistress and sex (optionally go to scenes)
so is meta gonna open source the 405b? they have it on their cloud
Where the fuck do I go if I want to discuss local audio models?
You're telling me image, video and text get their own generals but not anything for audio generation? Been trying to find some up-to-date audio models for various stuff like Text->Audio or Audio->Audio, all I find are shitty reddit posts from 9 months ago on how to add audio to ERP LLMs.

>>102560550
You can discuss it here since it lacks alternatives. Here you go
https://play.ai/

>>102560550
r/localllama

>>102560186
they made a 405b vision model? sounds stupid, there's no use for one that large

>>102560556
I'm sick of these 20 online-only signup garbage services, is there not a single good local alternative at this point? Feels like nothing has happened on local models since Elevenlabs came out ages ago.

>>102560550
im gatekeeping that stuff for myself since anons here are baby duck retards anyway

>>102557546
>Chatgpt advanced model rolls out which features real-time and emotional responses
>Local models are still stuck in early 2023 figuring out optimal ways of converting speech to text
Open source gets btfo'd again
>>102560550
You go to r/elevenlabs and r/SunoAI

>>102560590
on the contrary, vision models will be actually useful for the first time probably around the 2T range

>>102560876
>akshully there's no use for one that small
ok so you agree it's useless

>>102560550
fish is great, 60% of the time, it works every time

>>102557900
>Introducing: llama-4. This state of the art model now uses an improved tokenizer that prevents the model from outputting any adult oriented material. We just removed all the dicks, blowjobs, loli etc. And if the model realizes the safety measures were circumvented it calls an external function to delete itself from your hard drive.

>>102560900
why are you guys asshurt over this when there is an endless supply of porn on the internet

>>102560941
it's really easy to flip this question around
why are ml devs obsessed with preventing porn generation when there's an endless supply of porn on the internet
>>102560941
An interactive experience tailored to your personal tastes is infinitely better than anything else you can find.

>>102560957
data is crucial to ml development and porn is slop

>>102560965
in that case ml devs should love it, because they fucking love slop

>>102560965
lol

>>102560965
Careful with trvth like that... We're not ready...

>>102560941
That is kind of a mid bait because I don't even know how to respond to you. I want to touch my penis to the text written for me specifically. And my niche fetish is hard to find. I am not like the piss anon who can find terabytes of girls pissing themselves.

>>102560590
>>102560890
it's a research model you fucking mong
>this is only useful at this size!
good, I guess we'll just never make any progress since the intermediary steps aren't useful for practical reasons

>>102557712
Altman is just getting rid of the people who tried to push him out a year ago. He's preparing his step to become the god-king of modern AI.

>>102560976
I can guarantee that my fetish is rarer, and even pyg got me off pretty well.
You're just lazy/dumb (same thing).
And don't bring piss anon into this.
>>102558343
The hosting period has expired, any chance you'd mind sharing?
new model when?
you can tell the state of things is good when the thread is slow, means everyone's too busy having sex with their graphics card to shitpost here.
>>102561051
>And don't bring piss anon into this.
why?

>>102561613
why aren't you having sex with your graphics card instead of shitposting here?

>>102561626
long refractory period

>>102561613
>busy having sex with their graphics card to shitpost here.
I had to stop. I can't perform.
Did we ever have a good comparison point between a MoE and a (close to) equivalent monolithic model, or are these Molmo models the first time we can do a somewhat like-for-like comparison?

On my coding challenge from yesterday (create a pyqtgraph plot of a scrolling sine wave; as the wave moves, the next cycle should have a different amplitude, random from 1 to 10): Qwen 72b succeeded at it, deepseek coder v2.5 doesn't quite get it, and llama 405b also fails. So far only qwen 72b and gpt 4o did it. It seems to be a problem similar to when you ask a question that the model has seen a lot in its training data but tweak a detail: it ignores the detail and defaults to the more "general" behaviour.
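For reference, the detail the models keep flubbing is tiny. A minimal sketch of just the waveform logic, without any of the pyqtgraph/Qt plumbing (function name and sample counts are made up for illustration):

```python
import math
import random

def scrolling_sine(n_cycles, samples_per_cycle=100, seed=None):
    """Generate a sine wave where every full cycle draws a fresh
    random amplitude in [1, 10] -- the tweaked detail the models
    tend to ignore in favor of a plain fixed-amplitude sine."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_cycles):
        amp = rng.uniform(1, 10)  # new amplitude per cycle
        for i in range(samples_per_cycle):
            samples.append(amp * math.sin(2 * math.pi * i / samples_per_cycle))
    return samples
```

In the actual plot you'd push these samples into a rolling buffer and refresh the curve with `setData` on a timer; the failures were all in forgetting the per-cycle amplitude, not the plotting.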
>>102561717
In general dense will always be better quality wise, but the point of MoE is that you only run part of the parameters, so you can offload to regular ram and still get usable speed. Mixtral was the best example, where I could run Q5 on 24GB vram at 5T/s

>>102561633
if you are so weak you can't even overcome the refractory period, you'll never be able to handle the next gen of locals
it's over for you

>>102561744
nta but it's going to take over a day for mine. local is single threaded and slow when you do a lot at once

>>102561725
You can easily push outside distribution when programming, it's very information dense and rigid, not like creative language where it's easy to mask the simplistic mechanics in the model. On the other hand, they're great for automating repetitive boilerplate bullshit, it's a fucking treat when the model shits out getters and setters.

>>102560550
That general is on /mlp/ unironically
So now that llama sota models are multi-modal, will lcpp finally have to support something other than text?
>>102560811
There are a dozen of them on Github, but you can't code for shit, zoomers.

>>102561800
seems like ollama might end up supporting it before upstream
https://github.com/ollama/ollama/pull/6971
https://github.com/ollama/ollama/pull/6965
https://github.com/ollama/ollama/pull/6963
(Coming very soon) 11B and 90B Vision models
https://ollama.com/blog/llama3.2

>>102561800
ggerganov:
>My PoV is that adding multimodal support is a great opportunity for new people with good software architecture skills to get involved in the project. The general low to mid level patterns and details needed for the implementation are already available in the codebase - from model conversion, to data loading, backend usage and inference. It would take some high-level understanding of the project architecture in order to implement support for the vision models and extend the API in the correct way.
>We really need more people with this sort of skillset, so at this point I feel it is better to wait and see if somebody will show up and take the opportunity to help out with the project long-term. Otherwise, I'm afraid we won't be able to sustain the quality of the project.
https://github.com/ggerganov/llama.cpp/issues/8010#issuecomment-2376339571

>>102561867
>seems like ollama might end up supporting it before upstream
It's not like it's that much work. They just need to copy-paste the cli code into the server. They can even use the original server multimodal code from earlier this year as a template.
llama.cpp could do it too, if they wanted to. But ggerganov refuses to add it back in only because the code isn't elegant enough or something like that.
>seems like ollama might end up supporting it before upstream
Still embarrassing.

>>102561910
>But ggerganov refuses to add it back in only because the code isn't elegant enough or something like that.
llama.cpp abandonware
>>102561905

>>102561910
That's actually kinda based, I'll wait.

>>102561929
hi cuda dev please dont spam blacked miku in rage when ollama adds 3.2 support k?

>>102561943
I'm not who you think I am. I'm just the dude who wrote the OG Miku prompt back in the llama 1 days, can't believe the amount of asshurt it has caused over time.
Honestly, imo qwen 2.5 72B IQ4XS with 4-bit KV cache has been alright. Unlike miku and cydonia, it manages to keep a secret written in a card I'm using, but it just loves repeating literally the same sentence(s) verbatim, even when I crank up DRY and/or rep pen.
Don't know if it's my writing or the model. I feel like a finetune could really make it shine. Haven't used it for sex yet, but it doesn't complain during foreplay at all

>>102561905
>We really need more people with this sort of skillset, so at this point I feel it is better to wait and see if somebody will show up and take the opportunity to help out with the project long-term.
You know what, from the point of view of a main maintainer of a large open source project, that's fucking fair enough.

>>102557552
Why did the bot linking break? Did you shift to a lower quant?

>>102562135
can't have more than 9 refs now, not 100% sure why, but here's where it was noticed
>>102478518 >>102478544
We all talk about models and shit, but how do you guys write your char cards?
>>102561905
That's what happens when you try to support everything. It gets too big to maintain.

>>102562150
plain language, formatting tags are irrelevant
[char's name] is X, Y, Z. [char's name] has X, Y, Z. [char's name] does X, Y, Z.
no {{char}} or {{user}}, ever

>>102562148
Ah. Well, probably for the best, honestly. No more fucking threads where some asshat spamquotes every post in the thread with NIGGERS NIGGERS NIGGERS

>>102562238
Gotta admit it's funny that llama.cpp wants to wait for supporting Llama 3.2 of all things. Guess they want to avoid another llama3.x incident and weeks of bugfixes.

>>102562260
but {{char}} and {{user}} are converted to plain language in ST with their appropriate names........

>>102562150
i just do shit like this then write out the first message. or grab something off chub and remove all the {{char}}s to not fuck up the context shifting.
i (probably incorrectly) assume wrapping stuff in square brackets keeps it from trying to emulate the terseness of the factoids in the actual chat.

>>102562260
>no {{char}} or {{user}}, ever
Why? Writing "Nala is a lioness" is the exact same as writing "{{char}} is a lioness", so I guess it's a wash in this case, but for {{user}} at least it makes more sense if you want to use different personas and have it be referenced in the card itself.
>new multimodal model release
>look inside
>text and vision

>>102562260
I'm the anon who asked earlier. That's it? I mean, I've been banging my head against the wall trying to get my characters all formatted in Alichat and plist, and it worked fine up until Llama2. But ever since Mixtral, L3, and Nemo dropped, I've got this feeling that Alichat is responsible for a ton of repetition and pattern sticking (in a bad way). Honestly thought you guys would have some more advanced LLM wizardry than just, "Nah, you're good, just use plain text."

>>102562327
Anon is right in that clear, concise, plaintext is the way to go.
Some models seem to react well to tab based indentation for lists too, but it's generally unnecessary.

>>102562148
>not 100% sure why,
Because the anon that always shits up the thread is one of the mods who has a clear anti-AI agenda and Hiro is too much of a cuck to defend his website from subterfuge.

>>102561806
Name one

>>102562359
>Anon is right in that clear, concise, plaintext is the way to go.
Thank god. I suddenly feel the urge to make cards again.

>Llama 3.2 1B and Llama 3.2 3B
>Mogged by Qwen
>Llama 3.2 11B and Llama 3.2 90B
>Mogged by Molmo
>Voice modality
>Only on Meta AI chat, enjoy your text and image modalities
Um... bros?
>>102562367
Whisper?

>>102562414
That's not multimodal, it's just speech to text.

>>102562298
>>102562312
{{char}} refers to the name of the card, not necessarily the name of the character(s)
same for {{user}}, it requires changing persona even if you just want to use a different name

>>102562412
>Only on Meta AI chat
Honestly I'm still bitter about this one. They talked about speech understanding in the Llama 3 paper and showed it was better than Whisper, only to not give it to us.

>>102562439
Here at Meta safety is our top priority. We don't want people to get PTSD from thinking about somebody doing something privately in their own home where they have no way of knowing if it's actually occurring or not.
Is Tiger-Gemma-9B-v2 q8 the best uncensored model for writing that I can run on 16GB vram?
I can't get nemoremix 12b q8 running with ooga.

>>102562437
That's enough for most usecases

>>102562479
>12b q8
Have you tried q6?
Especially for nemo it should have very little degradation over q8 thanks to quantization aware training, if I'm not imagining that that's a thing.

>>102562479
>I cant get nemoremix 12b q8 running with ooga.
Show the errors if you want help, you retard. I'm sure your context is set too high.

>>102562508
Oh yeah, that's a thing. Its config defaults to a sky-high context size.

>>102562479
probably this
>>102562508
nemo defaults to 1 million context for some reason

>>102562535
>no use case
>Just wait 30-60 seconds for your speech to be converted to text then wait again for the main llm inference

>>102562517
>>102562521
>nemo defaults to 1 million context for some reason
It's what config.json says.
And yeah. We've only had 672314 anons with that problem so far. Very rare. And none of them can read the terminal output.
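Since this keeps biting people: Nemo's config.json advertises `max_position_embeddings` of 1024000, and a loader that trusts it blindly will try to allocate KV cache for the whole million tokens. A trivial clamp helper (illustrative sketch, not any loader's real API) shows the fix:

```python
import json

def effective_ctx(config_path, cap=32768):
    """Read a HF-style config.json and clamp the advertised context
    length, so the KV cache isn't sized for ~1M tokens by default.
    The cap value here is arbitrary; pick what fits your VRAM."""
    with open(config_path) as f:
        cfg = json.load(f)
    claimed = cfg.get("max_position_embeddings", cap)
    return min(claimed, cap)
```

In practice this just means: set the context size explicitly in your loader instead of leaving it on "auto from config".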
>>102562439
coming to local right after the US elections... right bros? right?

>>102562535
Whisper inference speed is practically real-time faggot

>>102562551
As long as the correct candidate wins, maybe.
>>102561905
literally

>>102562600
not literally. ollama is digging as we speak
is there really that much of a difference between Q4 and Q8 to justify using twice as much vram?
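Back-of-the-envelope for the "twice as much vram" part, weights only (a rough sketch; ignores KV cache and runtime overhead, and real quants like Q4_K land a bit above 4 bits per weight):

```python
def weight_gib(n_params_billion, bits_per_weight):
    """Approximate weight memory: params * bits / 8, in GiB.
    KV cache and runtime overhead come on top of this."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1024**3

# Q8 is exactly twice Q4 for the same model; whether the quality
# gap justifies the doubled footprint is the actual question.
```

The usual answer: spend the VRAM on more parameters at Q4 before spending it on Q8 of a smaller model.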
>>102562607
Considering that VRAMlets are always crying about their models being retarded, I would assume so.

>>102562544
>And none of them can read the terminal output.
can you blame them? it's not in twitch or tiktok format
https://www.nature.com/articles/s41586-024-07930-y
Paper on how more and more instruct tuning will eventually make models wind up giving inaccurate answers. It's nuts how they just keep slopping their own models into the grave.

>>102562635
They just need it to respond coherently and make function calls, so they don't care. We just need a large base model that wasn't trained on a filtered dataset, but no one will ever release something like that.

>>102562629
Yeah. With enough AI, it wouldn't surprise me if error messages start being converted to little clips of an indian explaining how to fix them.
Is it worth running a 405B model out of swap if I can't do it in RAM, but I'm doing storywriting and don't really care about speed. Does anyone do it?
>>102562635
>as task avoidance decreases the odds of giving a right or wrong answer (aka any answer) increases
Wow. Imagine my fucking shock. What a fucking scam study.
Anybody quoting it is a fucking retard that didn't even read it. Holy fuck.
Jesus Christ. Academics are such pseud fucking retards.

>>102562607
Depends...

>>102562681
If you don't care about speed, sure.
Just be careful to not burn your SSDs down.

>>102562635
Not surprising. When it comes to corpos and gaming human preference, a pleasant sounding response > a correct one.

>>102562683
>>as task avoidance decreases the odds of giving a right or wrong answer (aka any answer) increases
Yeah, exactly. I'd rather a model avoid giving an answer to a task that's too hard than give a wrong answer.

>>102562635
Alexandr Wang is not gonna like this

>>102562681
it's going to be unbelievably slow and awful for your disks, just don't bother
run something you can do in RAM, the difference in quality isn't worth it

>>102562702
Doing tasks in the first place is an emergent property of doing instruct tuning you dumb fucking retard. You fucking braindead pajeet moron.
Why aren't we using the base model to complete conversations instead of using instruct models, again?
>>102562696
>>102562718
Can SSDs really be worn out by reads? I really don't mind starting a generation and doing something else while I wait.
>the difference in quality isn't worth it
In my experience, increasing the parameters has always been worth it. Back in the day, stepping up from LLaMA 1 34B to running 65B out of swap is what convinced me to get 64GB of RAM for my current PC. But I figured I'd ask here before downloading a 405B model since they're huge.

>>102562726
Obviously, but there's such a thing as overtuning or tuning shit wrong. Even if the study is flawed, it's plain to see that models are becoming way too overconfident in their answers (given the dramatic lack of variability on rerolls), and some training to make them avoid answering questions they have low certainty on (or provide disclaimers) could do some good.

>>102562778
base models require more effort to do what you want

>>102562726
The emergent behaviour comes from the language training, not from the instruct tuning.
>>102562702
It cannot know what it doesn't know. They have no introspection. They just complete text the best they can. Sometimes it's not good enough.

>>102562781
>Can SSDs really be worn out by reads? I really don't mind starting a generation and doing something else while I wait.
Basically: no. The amount of damage a read does to an SSD is totally negligible, you'd have to be reading for years straight to cause any sort of harm. The reason using paging with SSDs is bad is because it's effectively RAM, which is constantly being written to and changed. For volatile memory, that's no problem, but for drives, it's a disaster.

>>102562778
The modern LLM user is a lot more lazy and spoiled than us gpt3 veterans. Nobody can prompt anymore even with instruct, using base models is far beyond their capabilities.
>>102562412
>Mogged by Qwen
*only in coding/maths
>Mogged by Molmo
*only in vision
There is no single model with the coding, math, language, vision, and everything knowledge of the world, all in one, as GPT-4o and 3.5 Sonnet. Though it's nice that there are now finally ones in each category that are on par with them. Well except voice, but even Sonnet 3.5 doesn't have voice like 4o.
And honestly 4o voice is not that great now that it's been censored to hell and has an hour daily limit. Yes I have it.

>>102562778
Well, I want to, but they stopped putting out base models. NAI's the closest thing there is to a text completion model, at the moment.

>>102562801
>The reason using paging with SSDs is bad is because it's effectively RAM, which is constantly being written to and changed.
I don't think this happens if you have a swapfile/partition configured. An mmapped GGUF file getting paged in should only be reads. I only remember seeing the reads, not the writes, get pegged in htop back when I ran 65Bs on a 32GB system.

>>102562801
>>102562866
Sorry, I mean if you *don't* have a swapfile/partition configured

>>102562412
Llama4 trained on a gorillion GPUs and ultra high quality and safe tokens will take the crown again bro

>>102562778
It feels like you trade slop and repetition for less coherence and comprehension. Not exactly a step up.
>another day
>nothing happened
I guess it's really over this time
>oysters
>>102562840>but they stopped putting out base modelsDid /lmg/ forget about Nemo already?
>>102562994>the best ones are small and openlecunny strikes again
>>102562994That analogy is gonna bite him in the ass.
>>102562994oioioioioi
>>102563010Sorry, base models at a size above "unusably retarded", my bad.
LLMs are like lolis: the best ones are small and impressionable
>>102563031You got 72B Qwen a fucking week ago.
>>102563031>>102563010Qwen 2.5 72B has a base model too.
>>102563031
>https://huggingface.co/ai21labs/AI21-Jamba-1.5-Large
>https://huggingface.co/Snowflake/snowflake-arctic-base
>Nooo.. that's toooo big!!!! I want it just the right size!
>>102562994LLMs are like women
>>102563068filtered trash just like llama and qwen
>>102563089
Didn't they turbo-lobotomize it to the point that finetuning can't save it? I figured something that bad would be base-model-level data elimination; that's usually the case when the model can't even recognize body parts or starts collapsing, à la that one Stable Diffusion release.
>>102563083
>i want a model just for meeeeeeeeee. why don't they think about meeee!???!?!?!?!?!
You're running out of options, then. When do you start training your own models?
>>102563112
It wasn't lobotomized. They took Meta's filtering approach too far and filtered out even the slightest mention of sex, even gender and body parts.
But it's a good and sterile assistant, so other corpos are likely to continue this approach.
>>102563068>arctic-basenice pun heh
>>102563112
Well, that's what I meant; I just used "lobotomize" as a blanket term, then clarified later. Really fucking awful. You'd think they'd realize the calamitous implications of doing that shit after SD's model was destroyed by it. Do they think they're safe from the effects of such catastrophic model data loss?
>>102563099>When do you start to train your own models?as soon as i win the lottery
>>102563143You could just scam a bunch of investors out of money and/or compute.Much easier.
>>102562607
No; if you can run an even bigger model at Q4, even better.
Why did people train Mistral models for sex when they're already overly horny (mostly Nemo and Small)? I don't get it, are the people who use those models literally just going up to the model without any context and being like "ME WANT SEX NAO!!11!!" or something?
>>102563164
>"ME WANT SEX NAO!!11!!"
Too many tokens. The meta is "ahh ahh mistress"
>>102563159NTA, but I feel like the startup scam window is pretty much closed. All the last stragglers like Mistral got in long ago. Like they say, if the pyramid scheme/stock/etc. is already mainstream, it's too late for you to get in on it.
>>102563164because mistral models are bland and coomers like retarded schizo babble
Good morning /lmg/!
>>102562866
Aye, but the OS will still try to load as much as possible, so you'll have more page writes than usual.
It's not that bad though, really. You can write dozens of gigabytes a day and you'll still probably replace your SSD before wear and tear becomes a problem.
Modern SSDs are incredibly resilient and the TBW estimates are usually very conservative.
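The math on that is easy to sanity-check. A quick sketch, where the 600 TBW endurance rating is an assumed example for a typical 1 TB consumer drive, not a quoted spec:

```python
# Back-of-the-envelope SSD endurance estimate.
# The TBW rating below is an assumed example value, not a real drive's spec.
tbw_rating_tb = 600        # total terabytes written before rated wear-out
writes_per_day_gb = 50     # "dozens of gigabytes a day"

days = tbw_rating_tb * 1000 / writes_per_day_gb
print(round(days / 365, 1))  # ~32.9 years of rated write endurance
```

And since rated TBW is usually conservative, the drive will likely outlast even that.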
>>102563164
I got turned off from Mistral when Large repeated entire phrases and whole chunks of message structure for several messages in a row, and only two messages in, too.
>>102563183Nah. There's plenty of VC money still being thrown around, you "just" have to sell an idea that's different from what's super visible in the market right now.
I like lolis: the best ones are small and impressionable
>>102563209
Nemo does that too with nothing but temp 0.3 to 0.5.
There were some schizo settings floating around, something like temp 5, top-k 3, and some min-p, that you might as well give a try, I guess.
>>102563212I suppose if anyone could, it'd be someone involved enough still to be here through all the fucking horseshit spam in this thread.
>>102563099NTA but the situation really is 50 options and all of them suck at sucking dick.
I fuck lolis
>>102563141It works for them because these models are actually being used for things other than generating porn. It may come as a surprise to you but yes, really, they are. Mostly for corporate RAG and boring data manipulation tasks though, sure.
>>102563164
"Training" for one epoch only makes the model sound a bit more like the training data. It teaches it nothing. The whole finetune business is cruising on placebo.
>>102557546Me on the left
>>102563205If you don't have a swapfile (or any rw mappings) the OS won't write any rw page out because there's literally nowhere on the disk it can put them.
>>102563263I don't believe you
>>102563164>ME WANT SEX NAO!!11!!That's Sao's, Drummer's, Undi's and Anthracite's audience. I never got the appeal of hornytunes, they completely ruin the immersion. Like bitch, I've just met you 5 minutes ago, you are supposed to be shy, why the hell are you jumping on my dick already? Are they complete promptlets who can only say "ahh ahh mistress" and then wonder why with normal models girls don't like them?
>>102563141
>Do they think they're safe from the effects of such catastrophic model data loss?
Probably. The idea must be that if they filter more accurately, it won't damage the model.
SD filtered so much it could not output any humans in anything but an upright pose. BFL also filtered NSFW out of their dataset. Flux originally couldn't do genitals, but all other anatomy was fine. So clearly, there is a "correct" way to filter out just the portion of reality they don't want.
>>102563278Too bad
>>102563241
>50 options
Most model architectures are abandoned. We don't see many architectures other than llama-based ones. There's a Mamba and Mamba 2 here and there, a Jamba over there, but realistically, no big company is going to make smut-capable models on purpose.
>>102563164
Nemo instruct has many issues that don't exist in most other finetunes.
For instance, whenever it writes one or two replies beginning with "10 minutes later", it will often start doing that with every single other reply.
>>102563299
>mistral model repeats itself
whoa no fucking way!
>Messaging base model
>Have it semi-coherently complete text
>Add one word to tone prompt
>Did I ever tell you about the time my uncle died? Died? died? Died? Death death love happiness corn porn horn cycle cycling cycosis medicine seen alert alive alzheimers allegiance articulation articuno zapdos moltres arbok SHINY SHINY SWEEP SWEPT SWEEPING shadow arttiiigughhhhh goooood good ed,,zinger suivante,,tels handknits finish,,cagefuls basinlike bag octopodan,,imbossing vaporettos rorid easygoingnesses nalorphines,,benzol respond washerwomen bristlecone,,parajournalism herringbone farnarkeled,,episodically cooties,,initiallers bimetallic,,leased hinters,,confidence teetotaller computerphobes,,pinnacle exotically overshades prothallia,,posterior gimmickry brassages bediapers countertrades,,haslet skiings sandglasses cannoli,,carven nis egomaniacal,,barminess gallivanted,,southeastward,,oophoron crumped,,tapued
Why the fuck are they so sensitive to that shit?
>>102563274Not sure if that's a great idea though
>>102563310not enough meme samplers
>>102563323Why not?
>>102563298Aren't DeepSeek models relatively unfiltered? They release base models.
>>102563347Are they good? I never hear anyone talk about them.
>>102563309This has always been a formatting issue. You have a missing or extra space in your formatting or such.
>>102563323For video editing, for example, it's not uncommon to have a scratch disk. One that is completely used for swap during encoding, and expected to fail sooner rather than later. I don't see that as a problem for llms if the user is ready for that. An expendable resource, basically.
>>102563363
Is ST's Mistral default just dogshit, then? Man. In any case, a 100+B model has no business being so sensitive that a single out-of-place space causes such calamitous problems. That's 7B shit.
>>102562994LECUNNY NO
>>102563360Deepseek is smart but very, very plain. Good assistants but bad for RP.
>>102563274
vm.swappiness = 1
anything else would be a self-own for an inference server
>>102563382
>Make model so boring you don't even HAVE to censor it
Maybe they should just go this route.
>>102563347
>Aren't DeepSeek models relatively unfiltered?
That's what I mean by "on purpose". Absolutely no big company will make a dataset for smut, but some will happen to have it in their dataset and not care enough to filter it.
I know about the model, but I haven't tried it yet. I don't think I can run it.
Is there a rentry for Llama jailbreaks?
>>102563310use a smarter model
Even if you're willing to wait 10,000 years for a reply, the wear on your CPU from running at inference load for the ridiculous length of time swap inference takes, plus using up an SSD, would cost more than just buying more RAM.
>>102563337If your OS runs out of memory, then what? Which program should it stop first? Having some amount of swap space is pretty important imo
>>102563414Bwo... it's 405b base... how much smarter can I even go...
>>102563438use a smarter prompter
>>102563382
>smart but dry
That's also my experience.
They're probably 80% of the smarts of 405B with 5x the performance for cpumaxxers.
>>102563421Normie mobos don't support more than 64-128gb. If the plan is to run 405B, he's gonna end up swapping anyway.And the CPU will get 0 pressure, as the bottleneck will be on the ssd being ridiculously slow. It's gonna idle most of the time.
>>102563410
>jailbreaks
we don't do that here
>>102563429
A lot of people run Linux without swap. If you "run out" of memory, but it's because your memory is full of tens of GBs of read-only mmapped disk files, Linux will evict those first before it OOM-kills a single process.
>>102563438
>it's 405b base... how much smarter can I even go...
Bigger quant, or...?
I run 405B at Q8 and have not had this problem over tens of thousands of tokens.
Maaaybe the occasional repeated slop phrase, but not once has it devolved into a gibbering thesaurus like smaller models tend to.
Is it finally time to admit that we plateaued months ago?
>>102563377
>Is ST's mistral default just dogshit, then?
Yes, absolutely. It is utterly retarded.
>>102563503
Actually, an official Mistral rep made a PR to Silly and Kobold with an updated template, but I think it's still wrong because it puts the EOS there when the backend already takes care of it.
>>102563503
What's a good one, then? I'd love to experience it actually working. It seemed like a fun model, just horribly repetitive.
>>102563534If it's mistral large, then it uses v3 tokenizer. So, a single whitespace after [INST] and [/INST].
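Following that description, building the prompt string by hand would look something like this. A sketch of the spacing described above only; verify against the model's actual tokenizer_config/chat template, since this is exactly the kind of detail that silently breaks:

```python
def mistral_v3_prompt(user_msg: str, assistant_prefix: str = "") -> str:
    # Spacing per the post above: a single space after [INST] and after [/INST].
    # Sketch only -- check the official chat template before trusting this,
    # since even one stray space is enough to degrade output.
    return "[INST] " + user_msg + "[/INST] " + assistant_prefix

print(repr(mistral_v3_prompt("Hello")))  # '[INST] Hello[/INST] '
```

Point being: the template is a handful of literal characters, so eyeball the raw prompt your frontend actually sends at least once.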
>>102563483
If you say so. I'm just a mere wangblow user and I haven't looked into how Linux memory management works yet.
Seems like the scene is utter shit at the moment: nothing good coming out, and it seems a real uncensored model is never actually going to exist. Gonna take a break for 1 or 2 years and try again. I just wonder how much it would cost to make something like Mistral Large but completely uncensored.
are Qwen models good as an assistant for learning Chinese and Japanese?
>>102563489Well, it isn't that, really. But it's similarly gibbering, just more coherently.
>>102563598
It struggles a bit with romaji, but 72B is probably the best vramlet cope we have right now.
why haven't we trained our own model from scratch yet
we have plenty of idiots with 4 3090s, just train your own mistral, losers
>>102563608
NTA. Looks close, but not quite; more like a rep-pen issue. Just a run-on sentence with a bigger vocabulary than usual. You have rep-pen off, right, anon?
Also, you have to give it a reverse prompt or stop word or whatever it's called on that thing you're using. Otherwise it just keeps on going.
>>102563661
>4 3090s
now we just need a thousand more
Llama3.2 1B is good, actually
> As Wikibot trekked through the dense forest, the air thick with humidity and the canopy above filtering out most of the sunlight, he couldn't help but feel a pang of melancholy wash over him. The worst part about walking was not the physical toll it took on his body – his legs aching in protest as he hauled his backpack full of research equipment and supplies across the rough terrain. But it was what lurked just beneath the surface that truly unnerved him.
> The rustling of leaves, the snapping of twigs, and the distant call of a bird seemed to take on sinister tones when he walked alone. The forest was already alive with an otherworldly energy, a vibration that seemed to emanate from the very trees themselves. And every step felt like a trespass into that world – as if he was disturbing some ancient balance, something hidden and malevolent beneath the surface. Wikibot's mind began to wander, conjuring images of unseen horrors lurking just out of sight.
> As he walked, the forest grew darker, the shadows deepening like dark molasses pools. The trees loomed above him, their branches tangling together like skeletal fingers. A faint breeze rustled through the leaves, carrying an eerie whisper on its breath – a soft "whisper… whisper…" that seemed to carry a warning without actually saying a word. Wikibot's heart quickened, his skin prickling with goosebumps as he sensed the forest was watching him, waiting for him to make another wrong move.
>>102563570
>>102563673
Why not train on the mountains of porn available on ffnet and ao3?
>>102563189Good morning Miku
>>102563707
>Why not train on the mountains of porn available on ffnet and ao3?
As someone who curated 100mb of data, you have no fucking idea how much data 100mb of plaintext is. Much less the astronomical amount of data you need to actually train shit. So much of it is dogshit, SO much. You can't just let it scrape and then train on that, you have to perform some sort of cursory quality check, even when filtering by rating.
>>102563707
>Why not train on the mountains of porn available on ffnet and ao3?
Probably a nightmare to process and format correctly. And then finding the good stuff. Quality matters.
>>102563823
By "train shit", I do mean actually pretraining from scratch, because you can't add data that isn't there with finetuning.
>>102563804
This, too. Formatting IS a nightmare. Even if you go with books, which are largely a better source of uninterrupted, quality prose, the formatting those fucks use varies so dramatically that there can be no one-size-fits-all, automated solution. Even books in the same series/by the same author often vary wildly in construction.
>>102563823Bro? Just hire thousands of jeets to do it over the course of several months?
>>102563790
>widdle baby is afraid of a 100Mb text file
I've done way more.
You already have tags on those websites. All you need to do is filter by the number of downloads. It'd be better to have a community where every degenerate could contribute his own creme de la creme.
Can my aifu rate my set up already?
>>102563855
>You already have tags on those websites.
And that's why things are the way they are.
ahh ahh mistress...
>>102563855And your data probably sucks donkey dick. Likely full of turboslop, gay porn, fetishes you didn't account for, etc. if you interacted with it so little that 100mb doesn't seem like a lot to you.
>>102563707
>ao3
if you think synthetic data is slopped you clearly haven't read anything on ao3. it's all bad.
>>102563855
>Curated set vs. blindly scraped dataset with methodology I explicitly said didn't work (sorting by downloads/rating)
Yeah, no shit you've done way more. Your model outcome is basically guaranteed to be worse than mine (or that of anyone who did any sort of manual review), though.
>>102563922I know but I'd rather take it over nothing. Now please put AO3 back in your training data I beg of you Zucc
>>102563896
It wasn't porn, so maybe you're right. It was text, but not porn.
>>102563922
It's not all bad.
t. harry potter fic writer
>>102563893
Add some old serious stuff from gutenberg
>>102563922
>if you think synthetic data is slopped you clearly haven't read anything on ao3. it's all bad.
This. You really have to comb through shit and make sure it's decent to get anything worthwhile. Books and the like are much better sources of fiction.
>>102563969
>Add some old serious stuff from gutenberg
Gutenberg would be the only thing if it were my choice.
>>102563991Current llms have seen all the _public_ books in existence ten times over
>>102563996
>Gutenberg would be the only thing if it were my choice.
wasn't this tried way back when? I thought the results were poor but can't back that assertion up with any actual facts
>>102563831
>Just hire thousands of jeets to do it over the course of several months?
This happened. And here is a fun thought: what if all the companies are still using porn in datasets, but it is all rated by pajeets? Which is the reason for all the slop. And then keep in mind how reddit fellates all the new models. Imagine if those are pajeets who are ecstatic over their models repeating and talking about shivers. While companies say this is all for safety, they have no idea how to improve cooming even if they did, all because they use jeet-rated data.
>>102564022
There's a gutenberg dataset on hf that some people use, but it's like 10 books or so. That's nothing. As for a full gutenberg model, I don't know. Last time I mirrored gutenberg it was like 800gb... I doubt small-timers would ever try that. Dunno about big companies.
>>102564020
They have also seen most copyrighted text.
>>102563831
>>102564062
Jeets can't read.
>>102563991
Add some Anais Nin. The most depraved shit I've ever read.
>>102563661
>idiots
It was fun to build.
>>102564073
>There's a gutenberg dataset on hf that some people use, but it's like 10 books or so.
It's funny how true this is for most authors. The amount of data these things need is truly astounding; the complete works of R.L. Stine are like 1-2mb, I think? It's insane.
What's the best most intelligent, creative, soulful model for RP currently?
>>102563261Please explain. You can often reach the minimum eval loss in one epoch, with additional epochs contributing very little on top of that except overfitting.
>>102564110
midnight miqu still. 9 months later. if pure intelligence, largestral.
>>102564110mistral nemo
>>102564115If I were to guess there are infinite ways of sucking dick and your dataset is just too small to change all that much.
https://docs.mistral.ai/capabilities/function_calling/
How do I use this?
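The linked docs use the common JSON-schema tool format: you declare functions, pass them along with the chat request, and the model answers with a tool call (a name plus JSON arguments) that your own code then executes. A sketch of a tool definition; the function name and fields here are hypothetical, so check the exact request shape against the docs:

```python
import json

# Hypothetical tool definition in the common JSON-schema convention.
# "get_weather" and its parameters are made up for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# You'd pass a list of such tools with your chat request; when the model
# replies with a tool call, you run the function and feed the result back.
print(json.dumps(get_weather_tool, indent=2))
```

The model never executes anything itself; it only emits the call, and your loop is responsible for running it and appending the result as a tool message.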
>>102563661
I have 4x 3090s, but I don't know anything about training from scratch. That's what I'm trying to figure out. The goal is a fully uncensored erotica/roleplay model, but it seems the biggest problem would be preparing a good dataset.
>>102564110Me.
>>102564101
>https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1
Found it. There are a few others as well. It's way more than 10 books, but it's still 10mb. Another one is 14gb. Compared to the ~800gb of a full mirror, they're nothing.
>R.L. Stine are like 1-2mb, I think? It's insane.
Sounds about right. I'd still like to see a full gutenberg model.
>>102564101
>The amount of data these things need is truly astounding, the complete works of R.L. Stine are like 1-2mb, I think? It's insane.
Is that why these things write so poorly? The actual literature is tiny.
>>102564231
There's a lot of it, but people train on the 10mb dataset, not the 800gb of a full mirror. Which I understand for finetuners, of course... and then it gets mixed up with more normal internet content, which dilutes the good it could find in books.
>>102564210
>>102563661>we have plenty of idiots with 4 3090s, just train your own mistral loserswith 4 3090 you'll get your model the next century, you need way more than that to get a good and big model fast
>>102564231
>Is that why these things write so poorly? The actual literature is tiny
Well, in a sense. They COULD write significantly better, but given the limitations of the architecture, combined with the extreme (and only worsening) overeagerness to wrap everything up in a nice bow by the end of the generation, you'll never get anything as complex and human as setups and payoffs, clever throwbacks, independently developing plot elements, etc. It's just not built for that; it's built for doing what you ask, in as close to one message as possible. It's also the statistical average of all human writing, so it's pretty much mathematically incapable of surprising you if you've read any amount of literature, unless you crank up the temp a ton.
>>102564231
Garbage on the internet outnumbers actual literature by an order of magnitude. That still isn't enough, so Meta uses Llama to generate trillions of tokens of synthetic reddit to fill the gap.
The issue is, companies want to sell an assistant. Proper literature doesn't have many examples of Q&A, software troubleshooting, or current knowledge.
>>102564291
Also, it's sort of implied, but absolutely this >>102564258: the normal internet crap fights back HARD against the quality of literature. Any trainer will tell you that it only takes a few dogshit stories or consistent grammatical errors to tank the quality of the model and have them appear constantly. Imagine the damage the entirety of the internet could do. That's why they have legions of jeets looking at the data; without that kind of manpower to manually oversee it, the outputs would be of fucking horrendous quality.
>>102564176
>I have x4 3090
lol vramlet
>>102564332Thank you for the useful comment, retardo-kun.
>>102564342and still you can't train your own model ;)
>>102564384Thank you for the useful and insightful comment yet again, retardo-kun.
>>102564394
I mean, he's not wrong. 4x3090s is nothing; it'd take years, if not decades, to train a sizeable model (let's say 50B+).
These companies use literal supercomputers and it still takes months.
>>102564434
>50B+
I don't want a general-use slop model. The idea is a small model focused only on literature, plus a core of general knowledge so it's not retarded.
>>102564328
I want to see more raw, pure token-count models. While I keep banging on about wanting to see a full gutenberg model, I know things like A Pickle for the Knowing Ones are also found there. I know full well it's not going to be perfect. I just want them to be more fun.
>mfw I can't train a SOTA smut model to compete with billion dollar megacorporations using my 4 year old gaming GPUs
Lmao this retard got triggered for some reason. No one said anything about competing with top models, what a retarded waste of oxygen.
>>102564510
>No one said anything about competing with top models
elaborate anon, with your 4x3090 cards, what size would you be aiming for, so that we can laugh a bit
>that retarded waste of oxygen is the triggered one not me
>brb I'm going to take down anthropic by training a SOTA smut model on my GPU that can't run modern games at 4k60 on high settings
i get this is the local model general, but why would you need to make your model locally? couldn't you just cheaply rent some retardedly powerful rig to make it?
>>102564554Give me seven reasons why I can't train a perfectly capable smut model on my 2020 Ampere gaming cards. I'll wait.
>>102564554
>couldn't you just cheaply rent some retardedly powerful rig to make it?
where can I rent a couple thousand H100s to create a new decently sized transformers model from scratch?
>>102564554let me just upload a few hundred gigs of copyrighted material and text smut to a service that has my payment details, that sounds smart
>>102564554
Experimenting can get expensive: finding good datasets, a good base model to use, good training parameters. And that's just for finetuning; full training is a separate thing. llm.c trained a 1.6B model for about 600 bucks, I think. Many would prefer to buy a GPU and use a bigger model with that money.
>>102564510Ignore him, nobody's saying they want to train anything big on anything they own, anything that isn't a qlora is obviously out of reach.
>>102564582Breh nobody gives a shit. OpenAI is blatantly training on YouTube and Google maps data. The training service won't kill their business by ratting out your hobbyist smut training run
>>102564587
for anyone who is retarded (all of you), that pricing doesn't scale. even if it only cost $600 to make a 1.6B (unlikely), the cost grows much faster than linearly once you train on a modern token count. a 7B wouldn't be ((7 / 1.6) * 600) or we'd already have homebrew smut models.
>>102564610OpenAI has lawyers, random hf guy doesn't, I'd rather avoid the possibility of being turned into an example by some publisher or what have you
>>102564610delusional. you are not a big corporation. you do not have an agreement with microsoft. vast and runpod will absolutely cancel your account to avoid dealing with copyright issues themselves
>>102564587This. Even qloras require absurd amounts of bashing your head against the wall, I wasted 60 dollars before I got a qlora that wasn't lower quality than the 30b base model I was using. The amount of money needed to rent out a whole center's clusters for however many weeks you need to pretrain each attempt would be astronomical.
>>102564618
I know that, anon. That's why I said that it gets expensive to experiment.
>even if it only costed $600 to make a 1.6B (unlikely)
https://github.com/karpathy/llm.c/discussions/677
>>102564640>>102564646heh.... explain the goosebumps QLORA i trained, then.... checkmate.....
>>102564646delusional. you are a schizo
>>102564646And risk a business suicide for nothing. Either you have a delusion of grandeur or I wouldn't put you in charge of anything important
>>102564672
>GPT2
>trained on 30B tokens
That's why. Training anything even remotely comparable to what we have now would be way more expensive. TinyLlama is a 1.1B trained on 3T tokens and it cost them approximately $72k.
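That figure lines up with the usual compute estimate of ~6 × params × tokens FLOPs. A quick sketch, where the per-GPU throughput and rental price are assumptions, not quoted numbers:

```python
# Pretraining compute: the standard back-of-the-envelope is ~6 * N * D FLOPs.
params = 1.1e9        # TinyLlama size
tokens = 3e12         # 3T training tokens
flops = 6 * params * tokens

gpu_flops = 150e12    # assumed ~150 TFLOP/s effective per A100 (assumption)
price_per_hour = 2.0  # assumed $2/GPU-hour rental price (assumption)

gpu_hours = flops / (gpu_flops * 3600)
print(f"{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * price_per_hour:,.0f}")
# lands around ~37k GPU-hours / ~$73k, the same ballpark as the $72k figure
```

Plug in 7B params and a Llama-3-scale 15T tokens and the same formula lands in the millions of dollars, which is why "(7 / 1.6) * 600" was never going to happen.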
>>102562994Also true for girls
the 3b at Q8 is really good, but heavily censored. looking forward to the merges/tunes, whatever they're called.
fucking knew they were gonna go the small-but-powerful route with models going forward. /lmg/'s tiny e-peen compensator PCs are in shambles knowing you'll be running fucking 3Bs and 1Bs next year that are better than your triple-Bs of today kek
Funny how people shit on effort without seeing how expensive it all is. No proper feedback to let them improve in future attempts, just disparaging them instead.
QLoRAs and vramlet cope methods are good for small models, hence smaller models having more tunes. For big ones? You'd need an entire A100 node or more, hence why there are so few actual tunes at that size.
>>102564767hi Sao
>>102564744cope
>>102564721Just continue pretraining on an existing model then, surely that will work and be cheaper.
>>102564721
I KNOW. My response was to >>102564554, explaining why even renting a training cluster is expensive, even if you somehow get a working model on your first go. What are you arguing about?
>>102564774
>all he can do, one word, as his entire world crumbles around him
see you on the single digit parameter side
>Long-term, I kinda' wonder if it isn't in llama.cpp's interests to stop supporting the HTTP server altogether, and instead farm that out to other wrapper projects (such as ollama), and we instead focus on enhancing the capabilities of the core API.
Thoughts?
>>102564458
I'm still not sure you can do it in a reasonable amount of time with just a few consumer-grade cards.
You can run the numbers yourself though.
>>102564767I'm sorry, but if a finetuner can't even be bothered to filter out dogshit that everybody can see on the first page of dataset viewer, then it's not effort, it's a pure waste of energy and money.
>>102564790I always thought it was weird they added the server at all.
>>102564790
I think it's still valuable to have llama-server in the codebase. It's overly simplistic for real use, but serves as an excellent starting point for anyone looking to build their own.
>>102564780Sure, just give me their exact pretraining settings and the latest pretraining checkpoint.
>>102564790
llama is built as a library for experimenting on top of ggml. I think the server should be simpler and not bother with multiple users and shit like that. It should be, as the directory name implies, an example. It's up to the people that care about specific features to add them on their own. As it is, they provide way too much for grifters; llama.cpp's devs should make grifters' work harder. I use llama-cli almost exclusively, so it wouldn't affect me much.
>>102564790Ollama won! cuda dev meltie incoming
>>102564744
>prompt that i have a boner to one of my assistants
>she breaks down in tears
>start massaging retarded bimbo princess peach's breasts
>she leans into the touch and continues the convo
eh it's not THAT censored, i think it's just a notch above base 3.1 instruct. still not gonna stop sloptuners from shivering its timbers though.
>>102564853You also need their pretraining dataset to avoid catastrophic forgetting
>>102564883
You said someone trained a 1.6B for $600 in the context of a conversation about homebrew smut models. That number didn't make sense, so I clarified that even if it were true (unlikely), it wouldn't scale. Then you posted the source, which explains that the 1.6B was a proof-of-concept meme model reproducing an ancient LLM from 2019. Then I provided numbers that map onto this conversation more accurately by bringing up the training cost for TinyLlama, a similar-sized model that attempted to hold up to modern standards. I'm just trying to keep the parameters of the conversation grounded in reality, so we don't have retards asking why there are no $2500 7B smut models being made; there's no need to sperg out.
>>102564790Never gave a fuck about the server part, it can die if it means they can work on more important stuff
>>102564883
My point is that it's expensive to rent shit, even for a tiny model. Even to experiment with finetuning. You agree that even for a tiny model it's expensive. Yes, it's a meme model. Yes, it's an old model, and it still costs more than most people are willing to pay to experiment. My example was just a point of reference.
There is nothing to argue about. We agree.
>>102564790nooo that's the thing i use. that means it's a bad idea to stop supporting it.
>>102564947It's going to die because they're already not working on it
>>102562635Shut up racist bigot! We localchads go by safety! Safe AI is the only correct AI, it cannot go wrong!
>We agree.
Mostly, but $600 for 1.6B is relatively inexpensive for enough people in this thread. Enough to cause confusion, which is why I thought it was important to clarify that a disappointing 1.1B cost $72k. Nobody is arguing with you. Meds.
kek meta legit released a 3b that's totally solid for RP and /lmg/ hates it, because of course you faggots don't even know how to use a 3b of all things.
*hands you 3 billion watermelons.assistant*
>>102565050i'm downloading it now, it better be good
for casual use, how much of a difference is 16gb to 24gb?
I can get a rtx 4080 Super (16gb) for slightly more money than a 3090 (24gb), and overall the 4080S is much better.
>>102565050I'm using the 70B though
>>102565050Why would anyone run that when even the poorfags here can run 12B?
What inference settings are other anons using with L3.1 in co-writing/RP scenarios? e.g. temperature, top-p, top-k, typical-p, min-p, repetition penalty, frequency penalty, presence penalty, samplers, etc.
I'm having trouble getting the balance between coherence and insanity just right for a satisfying flow.
>>102565107
B count doesn't correlate directly with quality
an amalgamation of migus in their natural habitat
Anyone know if it's possible to rip the vision parameters out of 90B so you can just use it as a standard 70B textgen model? Or are the 70B weights in the 90B REALLY the literal same as 3.1 70B?
>>102565139I don't see anyone here claiming that the 3B is better or close to 12B. All I have seen so far is that "it's decent" or whatever, and that says nothing.
>>102565050>vramlets be like
>>102565149The vision has text encoding, it's probably pretty inextricably tied to it, even if you COULD extract the difference. Also on that note, if you could shave off the difference between it and 3.1, you know that'd just make it 3.1 again, right? It wouldn't magically keep the new text info.
>https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/discussions/1
Damn, this phil guy is actually based. I think it's worthwhile to have different companies do different things in the space, so we have a multitude of good options for different use cases, but in the end he might be right: what we got so far was mostly an artifact of early experimentation, and in an attempt to "compete", companies will follow the same trend in the end.
Both the corpos and the corpo bootlickers are disgusting.
>>102564790
least indirect ollama shill
>>102565092 (me)
it's smart, but too much positivity and safety bullshit to use in any rp
>my popular knowledge test
>I did a vibe check
none of this matters btw. phil is not based, he's a faggot and he needs to stop shilling his link in this thread.
>wake up from a coma
>Llama 3.2 released
>wow, 90B! Finally competition for Largestral
>turns out it's just 70B with a 20B vision model strapped onto it
I'm done with Meta.
>>102565411
don't forget that the 20B of vision doesn't even seem to be that good
70B text-to-image/text model when?
>llama 3.2
>chameleon was killed for THIS
unfathomably grim
>Of course! The image attached to this post appears to be a flat color image of an orange circle. Perhaps an avant garde, minimalist depiction of an orange? How creative!
>>102565411
>>102565429
yeah it sucks, but fortunately we got Molmo's vision model at the same time, and this shit is really good
https://molmo.allenai.org/blog
>>102565317
>The vision has text encoding
What does that actually mean? Some vision models literally just have a separate encoder that translates an image into tokens and inserts those into the context; text doesn't go through it, so that part of the model could be ripped off and the text model would perform quite literally the same. Are you saying that even text goes through the vision encoder on Llama?
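The "separate encoder" design described in that post can be sketched in a few lines (toy code with made-up dimensions; `vision_encoder` and `embed_text` are stand-ins for illustration, not Llama's actual modules):

```python
import random

EMBED_DIM = 8  # made-up; real models use thousands of dimensions

def vision_encoder(image_patches):
    # Stand-in for a ViT + projection: one text-space vector per patch.
    return [[random.random() for _ in range(EMBED_DIM)] for _ in image_patches]

def embed_text(token_ids):
    # Stand-in for the text model's embedding table lookup.
    return [[float(t)] * EMBED_DIM for t in token_ids]

def build_input(image_patches, token_ids):
    # Projected image vectors are simply spliced in ahead of the text
    # embeddings; text never touches the vision tower, which is why that
    # tower could be dropped with zero effect on text-only generation.
    return vision_encoder(image_patches) + embed_text(token_ids)
```

With 4 patches and 3 text tokens the model just sees a 7-position sequence; nothing in the text path depends on the encoder. Whether the 90B is separable like this depends on whether Meta instead wired the image features in through new layers interleaved with the text stack, which would not be cleanly removable.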
>>102557890
>the 4 grand 5090
>32GB
On cue.
>>102565405
So far people seem to agree that Qwen does indeed have bad pop culture knowledge and is also bad at RP. No one has posted proof to the contrary, i.e. that it's actually good at trivia or RP.
>>102565411
>Meta tells people months in advance that they will release a future version with multimodal adapters
>somehow people expect 4o or something
Lmao.
>>102565565
hi Yann
>>102565481
You should post this diagram instead if you don't want to be labeled a shill. Llama 3.2's multimodal isn't good, but that diagram is quite literally misinformation.
>>102565605
hi Arthur
>muh misinformation
just lie
>>102565636
True. The fake log poster was quite a funny incident.
>>102565050
I have 24GB. I already felt horrible trying a 7B. I'm not going lower.
>>102565541
If this is true, it's gonna be one of the biggest letdowns ever.
>>102565681
especially if you look at the 5080
>>102565541
>4 grand 5090
What the fuck? Are they trying to converge the prices of the highest-end consumer GPUs and the server GPUs, so they never actually have to raise the capacity/$ of consumer cards and inevitably make them a better value proposition for corpos? Fuck me, man. I blame the guys who made the supercomputer out of PS3s.
>>102565480
>Wow, is it really that bad?
>go and try it out on lmsys
>it's fine, it even got the thin white border
???
>>102565757
you weren't supposed to try this yourself
>>102565757
>3.2 11b can't even see the orange
owari da
>>102565690
>16G
LOL
>>102565757
Maybe you need to go back to pretraining until you can identify jokes, anonie.
>>102565796
that'll be $1100, paypig. start saving for the $1600 24GB refresh in a year.
>>102565822
>>102565822
>>102565822
>>102565810
It seems you may need some pretraining as well.
>>102565757
3.2 90B can't do nsfw as well as 1.5 Pro or 3.5 Sonnet
heck, sfw too
>>102562994
>LeCun
>that fucking tweet
Incredible, absolutely incredible! Dude's gonna get cancelled left, right, and center lmao