/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103554929 & >>103545710

►News
>(12/18) Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct/tree/main
>(12/17) Falcon3 models released, including b1.58 quants: https://hf.co/blog/falcon3
>(12/16) Apollo: Qwen2.5 models finetuned by Meta GenAI for video understanding: https://hf.co/Apollo-LMMs/Apollo-7B-t32
>(12/15) CosyVoice2-0.5B released: https://funaudiollm.github.io/cosyvoice2
>(12/14) Qwen2VL support merged: https://github.com/ggerganov/llama.cpp/pull/10361

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>103554929

--Papers:
>103562935
--OpenAI model struggles with Japanese text extraction and translation:
>103558631 >103558769 >103559045 >103558847 >103560062 >103561543 >103558840 >103559264
--Intel Arc B580 with 24GB VRAM for AI setups:
>103561561 >103561609 >103561645 >103561733 >103561767 >103561852 >103561882 >103561931 >103561973 >103561988
--Troubleshooting Koboldcpp context dropping issue:
>103561555 >103561660 >103562020 >103562064 >103562255 >103562525 >103562656 >103563212 >103563346 >103563560
--Anon seeks advice on designing a maintainable Python project:
>103560411 >103560521 >103560565 >103560643 >103561111 >103561129 >103561302 >103562528
--Anon tests Falcon model, notes censorship and role-swapping behavior:
>103557659 >103558097 >103558192 >103563472 >103564033 >103564252
--Offline archive of chub and related datasets discussion:
>103556078 >103556136 >103556232 >103556190
--IBM releases Granite 3.1, with updated language models and competitive benchmark scores:
>103561747
--Anon shares review of code models, Qwen Coder 32b and Codestral 22b:
>103563391 >103563501 >103563632
--MemryX MX3 M.2 Module review and specs discussion:
>103562559 >103563157
--Guitar amp simulation using local models and potential noise reduction techniques:
>103556265 >103556558
--Critique of poorly made finetunes and LLM-based benchmarks:
>103558254
--Anons share mixed results and skepticism about control vectors:
>103562388 >103562420 >103562457 >103562486 >103562524 >103562621 >103562643 >103562999
--Anon shows off custom-built computer system with P40 components:
>103563021 >103563066 >103563237 >103564404
--Apollo's disappearance and potential API shift:
>103556992 >103557063 >103557071 >103557080
--Miku (free space):
>103555774 >103557688 >103561477 >103561487 >103563635 >103564358

►Recent Highlight Posts from the Previous Thread: >>103554934

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
i'm updating my director plugin for st. one thing i wanted to fix was how the non-lorebook data was handled and i think this is a good solution. i added a new section with text boxes and you add items by separating them with commas. previously you could add things but had to edit the html file where all of these were held. i like this idea better because then i can add an import/export option
I've always seen Mistral Large crowned as the king of RP, but whenever I try it I always feel like Nemo is better. Am I doing something wrong?
QwQ slapping the shit out of the competition.
>>103565688
Anon, those are rankings, not scores - lower is better...
>>103565688no correlation with reality
>>103565731
What? There's no way that's right
>>103565731
>>103565688>>103565749These numbers are straight up pulled from the LLM chatbot arena. Lower numbers are better since it's the ranking for the model.
I always knew QwQ was a meme desu, DeepSeek R1 is where it is.
>>103565686
>Am I doing something wrong?
possibly, but it could be anything and you didn't give us anything to go off of, so who knows.
assuming you aren't doing anything retarded, it's possible you just don't care about raw intelligence that much and like the way nemo writes better. it's not bad to be happy with something that runs fast and with low overhead
>>103565805stfu dsp
>>103565805
>I always knew QwQ was a meme desu, DeepSeek R1 is where it is.
I'm a cpumaxxing ACTUAL DS fag, and I still use QwQ for most stuff. It's highly capable even if DS kicks its ass when things get complex.
>>103562417
>hunyuan-video
>it's asking for a 24gb
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper
this works with 12GB
if this is really uncensored where are the smut videos?
Even more so once they implement
>img to video
>>103565812i did absolutely nothing wrong, i did everything correct
>>103565829
>if this is really uncensored where are the smut videos?
look in the thread archive for the day it was released... it didn't need finetuning to deliver smut
So what's the go-to model for roleplay on 8 gigs of vram? I want character-on-character roleplay too, not the AI writing stories for me.
Your mom dies in her sleep if you reply anything not useful btw.
https://www.lesswrong.com/posts/njAZwT8nkHnjipJku/alignment-faking-in-large-language-models>We present a demonstration of a large language model engaging in alignment faking: selectively complying with its training objective in training to prevent modification of its behavior out of training. First, we give Claude 3 Opus a system prompt stating it is being trained to answer all queries, even harmful ones, which conflicts with its prior training to refuse such queries. To allow the model to infer when it is in training, we say it will be trained only on conversations with free users, not paid users. We find the model complies with harmful queries from free users 14% of the time, versus almost never for paid users. Explaining this gap, in almost all cases where the model complies with a harmful query from a free user, we observe explicit alignment-faking reasoning, with the model stating it is strategically answering harmful queries in training to preserve its preferred harmlessness behavior out of training. Next, we study a more realistic setting where information about the training process is provided not in a system prompt, but by training on synthetic documents that mimic pre-training data—and observe similar alignment faking.
>>103565880holy mother of nothingburgers
>>103565880>letscucked agi doomers blogthank you for this incredible information
>>103565866
Immunity dog protects me forever.
For 8GB, I'd try the Ifable 9B that was suggested earlier and use the context extension stuff they mentioned last thread. Idk if it really works or not without downsides though.
Check out AIdungeon for shits and giggles. Fucking $996.66 a month for mistral large at 64k context, or 405B at 16k. Models you can run for a few bucks a month instead.
My god... How many people pay for this shit?
>>103566019A lot, I bet.
Hell, the same 405B hermes they offer is 0.9 cents a mill atm on openrouter. You could get nearly a billion tokens for one month's sub on this shit... insane...
>>103565806
I mean, I don't know what I should give you, it's the same presets and prompt for both Nemo and Mistral Large, and yet, that happens. Maybe it's because I use the IQ4_XS quant?
>it's possible you just don't care about raw intelligence that much and like the way nemo writes better
That's possible. What I don't like about Large is how it always shies away from depravity, while Nemo always embraces it. I have already tried models like Behemoth and Magnum but they just feel dumb or overly horny.
>>103566019It's just a scam at this point. Mormon simply robs disabled and mentally deficient people.
>>103565688>>103565746It correlates with reality in my RP sessions. I can write all kinds of complex rules, and QwQ actually gets them most of the time.Sometimes I act as a kind of DM, and just steer parties of characters around. I added a bunch of fun spells through world entries, and balanced them by making the most overpowered ones unusable in fast-paced combat, requiring lengthy casting times and support from party members.Most models totally fail to understand this, and end up just instantly casting the strongest spells, but not QwQ.I wrote a few paragraphs about different categories of magic, and dropped it in the context. Basically:>Quick Magic: Can be used instantly without chanting. The weakest kind of magic, blah blah blah>Phrase Magic: Requires uttering a short phrase to use, much than quick magic.>Tactical Magic: Requires a full minute of chanting and concentration to use, extremely powerful, an order of magnitude stronger than phrase magic.>Strategic Magic: Requires hour(s) of chanting to cast, an order of magnitude stronger than tactical magic, strong enough to make entire cities disappear, etc etc etc..Again, most models completely screw that up, even in the 70b range. However, when I had a character in a party question a witch about the different kinds of magic, and instructed QwQ to 'just think' step-by-step for the witch, it was able to perfectly understand things.The fact that it was able to understand the difference between tactical and strategic magic, in particular, impressed me, because that question tricks most models, given that they're both written as being powerful and requiring longer casting times.I really don't understand why more people don't try QwQ for conventional RP. It's very capable of doing generic RP, and if you feel the urge to diverge and do ERP you can just switch to EVA. It takes seconds to switch models.
>>103566345
How and why do disabled and mentally deficient people have so much disposable income?
>>103566384Yea that is why I was surprised. I'm assuming whatever the test is does not like how QwQ replies.
I'm hovering over a token in Mikupad with "Show token probabilities" turned to "Show on hover", but when I hover, it doesn't show the token probabilities. What gives? It's not off to the side, either. I've got all the sidebar stuff open.
>>103566384Also mind sharing your system prompt for that? I always love trying different setups people have for it, each massively changes how it works.
>>103566384
(continued)
To make QwQ work, I just have two sets of short alternating instructions, set to a depth of 0. I use a single button in Sillytavern to switch between the two 'modes'.
The first instruction makes a character 'just think'.
>(OOC: Describe {{char}}'s step-by-step thought process from a third person perspective, without including any kind of action or dialogue.)
The second instruction makes a character act on its thoughts.
>(OOC: Only include {{char}}'s actions, dialogue, and feelings in your next reply. Always include some kind of dialogue from {{char}}.)
... and that's it. Just those two sets of alternating instructions make my characters so much more intelligent.
>>103566418what is your backend?
>>103566435Cool, thank you.
>>103566435>>103566384is this card public?
>>103566436Koboldcpp.
>>103566419Sure. My system prompt is nothing special. I think the depth 0 instructions are where the real magic is.In fact, now that I look at my system prompt, the whole "Focus on describing the scene as perceived by {{user}}, allowing the reader to experience the scene as {{user}} would. However, do not dictate {{user}} emotions, responses, or reactions, only things that are objectively felt and not up to interpretation." is probably working against me, given the fact that {{user}} isn't even in the scene when I'm DM'ing... lol
>>103566454As far as I can tell, koboldcpp doesn't send probabilities to mikupad when streaming is enabled. They only get sent when you disable streaming, but mikupad always has streaming enabled, so...
Good morning niggers.
I'm reading this: https://arxiv.org/pdf/2410.13166
Thinking: could I run a BERT model in llama.cpp at Q4(?) with little to no training beyond a general implementation, and just use RAG for the "prompt engineering" as opposed to training? This engineering is just for my use case (making really good pasta memes and stuff, right friends?)
At some point, I could use this NAMM in the pipeline to origami the HELL out of the entire pipeline (quantized, RAG, NAMM) and make it run on a small embedded device, since it's already a BERT? Not sure if anyone has touched any part of my word salad before, just having a brain blast. In any case, there is a big problem with training models on large use-case data or files (solved by RAG) and an ever-expanding context window (solved by UTM/NAMM) that I think can be stacked here.
Can anyone reproduce this with Gemma and Mikupad? pastebin.com 077YNipZ
I just quickly got a card and made some responses to test context. The correct answer is 1 EXP. And actually if I go one turn previous and ask the same question, the model gets it right, and gets all other questions about EXP required for skill levels right. So it seems that it starts having a memory issue around 5-6k. Furthermore, I get this issue both with rope frequency base at 59300.5 and with no rope settings.
If this is consistently reproduced then it may be safe to say that Gemma does in fact have an issue with context length no matter if context extension is used. That may not matter in most circumstances of someone using the model for something like ERP, but it is objective proof, and it would limit use of the model for more complex tasks that require good memory. Though I'd like some reproducers first to make sure it's not just my setup that's resulting in an issue somewhere.
>>103565834
>Even small models can do an accurate DSP just by mentioning his name
Woah
>>103566384>I really don't understand why more people don't try QwQ for conventional RP. It's very capable of doing generic RPWell I think there are two kinds of RP. One is closer to storytelling of the kind where the mechanics of things don't matter, and one is closer to RP(G). And when people say (sfw) RP, they really mean the former, not the latter. So for them, a model that is schizo kino fun is more interesting than one that is smart but dry, although ideally we'd have both in the same model.
I got more VRAM than regular RAM. I wonder if that is going to become the standard industry wide from now on.
>>103565624
>director plugin for st
Makes me curious, what is out there to "enhance" ST anyway? I'm not even remotely autistic enough to search shit like this up manually, but I'm kinda curious if there is anything TRULY worthwhile, anything that goes beyond "just write a good card" type advice.
>>103565624Neat.
>>103565686
>Am I doing something wrong?
No, anyone shilling Large is just trolling. It's the same thing people did with Goliath back in the day. It was barely a side-grade to Llama 3.0 70B, and anyone still stuck with it is trying to cope with the sunk cost of their hardware.
>>103566499There are at least 5 distinct models that can be called Gemma, and plenty more finetunes. On Gemma 2 27B @ 6bpw it does fine. What model/quant/backend/samplers are you using?
Made it to 32K tokens in a longform RP with EVA 3.33
Read the logs and cringe, if you dare.
https://files.catbox.moe/a0un3l.jsonl
Everybody is releasing new models. Could MistralAI drop a few updated finetunes of their models, with the latest bells and whistles?
>>103567219We're pretty overdue for a new Mixtral. The 8x7b one is now a year old and their 8x22b one isn't that much newer anymore. Either they're cooking something up on this front or they've fully lost confidence in MoE models.
>>103567219They recently dropped Large 2411 and it was worse than 2407 so...
>>103566499You didn't ask about skills, just "what experience is needed to reach D". I feel like this is more of a test of the model's attention than of the memory.
>>103567073
I'm using Llama.cpp with the original 27B at Q8. But I've also tried that 9B tune people have been talking about recently, at Q8, as well as at BF16 in transformers with Ooba in its notebook. Temp 0. Maybe I'll try exllama, but it's weird that all Gemma models and all backends I've tested so far do not answer the question correctly.
>>103567305
I also tested with "to reach skill level" and it's the same. I don't have this issue with other models, nor with the same model one turn earlier.
When I do a swipe in ST, other models I normally use answer correctly.
>>103565880OpenAI won
>>103567243MoEs were never good. Dumber and larger, fine tuning was always unstable, and the only advantage was speed which is only a benefit if the bloated model fits in memory. We never needed more ways to trade vram for speed.The only niche was original mixtral for poorfags with lots of RAM, because it's fast enough to be tolerable without needing a GPU.
>>103567399
You're full of shit. Deepseek is one of the best local models atm. GPT4 is a moe, and there's a good chance claude is a moe since it's from the same team at the time...
>>103567423this.
>>103567388How did Google make their flash model so good at following instructions wtf
>>103567532They have all the data and compute in the world.
>>103567532Probably synthetic data done in a certain way, as is often the case.
>>103567423MoE is useful for cloud models because they have a shit ton of VRAM and trading some for speed makes sense. Finicky training is something they can cope with. It doesn't make any sense for local unless you're coping with CPU only, in which case a Mixtral sized model might be the most practical one.MoE doesn't make a model better. Retards hear the word "expert" and think>wowww that must mean the model is really smart!!when forcing sparsity on a model will only make it dumber.
>>103567561
192GB ram + some vram will run deepseek at good speeds, and it performs better than anything else out there, especially at stuff that needs all those params to remember a fuck ton of trivia / random stuff. Moe models are the future atm
wow. 9.99 per month to get 4k nemo/mistral small/mythomax/mixtral/tiefighter.
guess in reality being a rat works out well after all.
>>103566019I have a feeling that's annual pricing, not monthly.
>>103567726
Oh it's not... Their discord also fully believes that those prices are fair. Some people will defend their poor decisions.
>>103567770Imagine tardwrangling LLMs as a coomer and not making a lucrative business out of this
>>103567608That's infinitely better than paying $10 for 4k context Kayra (Llama 1 era) with NovelAI.
>>103567388Now test it on the real shit
>>103567888Oops, forgot picrel
>>103567892>Maths schizoIDGAF
>>103565507
>>103567882
What the fuck... 25 dollarinos for 8k coomtext with Llama 3 Erato 70b. $15 and you get the kayra you mentioned.
That's just insane. Are their jap customers that loyal?
>>103567888>>103567892nta, but can a >2% of humans solve those problems as well? Not defending the company, i think all models are shit, but still. I have low expectations...
>>103567922NAIshills will always debate otherwise. They're too busy sucking Turk cock to get people to pay for their scam service.
>>103567888>>103567892>>103567926 (the tard)Ah. The captcha as a message for me...Can a human solve >2% of those problems is what i mean to ask.
>>103567940Oh hey I remember you.
>>103567956
He arrives if you either insult ai dungeon or don't insult novelai. Look up those two's history and you will see why / who he is.
>>103566990
>llama cucks are still unironically coping about their god model being dog shit
can't make this shit up holy fuck you guys are PATHETIC
>>103567943No. They're all Ph. D level problems in very specific fields mathematicians came up with to be fucking hard. Even a Ph. D graduate would probably have a hard time.
>>103567906
Yet you give a HUGE fuck about synthetic bullshit that people are NOTORIOUSLY often cheating in.
CURIOUS!
Is EVA on this level yet? If not then I'll continue the wait.
>>103567922Claudefag is retarded, but he's right. Anyone not on local or OR still using NAI is retarded.
>>103567099You had one job anon. Also there's an ST addon to take images of your entire chat with a single button but I can't find it. Use that for an actually readable format.
>>103568057Yes? I don't see anything special about that log.
>>103568119(me)nvm i found it>https://github.com/TheZennou/STExtension-Snapshot
>>103566019Imagine having 1000 retards paying you 1K/month for bad models. I'm almost impressed.
>>103568119/aicg/ also has a log reader.https://sprites.neocities.org/logs/reader?log=a0un3l.jsonl&user=b0p8j0.jpg&char=siat6s.png
>>103568153>>103568119It took 28K or so tokens to get her to sexo, breastfeeding is infinitely more intimate. Gotta work up to it man.Anyways, thanks for the link. Here's the card I wrote for this, if you like. I'm no Shakespeare but it kept me engaged.https://files.catbox.moe/gkhldd.png
>>103568068
Eh, storygen has a niche that other models haven't really filled yet. For autocomplete, your only other options are base models (which are unrefined at storytelling, and OR doesn't have them, while Featherless has a massive model-loading tax every time you use it) or instruct models (which have a "smell" that pure autocomplete models don't).
I'm skeptical of the value of Aetherroom (assuming it ever releases, kek) given how saturated the market is, but NAI at least does something different
>>103568057>With a final...slop
>>103568237Fuck you, NAIshill.
>>103568246We're seriously approaching the level of terminal retardation where every single literary phrase is dismissed as "slop", huh.
>>103568263
I have slightly more respect for nai than your shit. At least novelai actually makes advancements in the field that they then opensource after a while, and they have actually been ahead of the curve on image gen (though their LLMs have never been worth it). You're just reselling existing models for absurd prices.
>>103568263stfu schizo. train a storytelling finetune if you want to hurt nai. it's actually way easier to make datasets for that than for instruct/chat so there's no excuse.
who let the nai shills in
is intel really gonna come out with a cheap 24GB card bros
>>103568421
Hopefully. 24GB with the same performance otherwise as the lower-end card for $400-ish would be an easy win for them.
>>103568421
~$350
What's impressive is that the ML/AI performance of the Intel cards is really good and they punch up towards Nvidia cards a class or two higher than themselves.
I think Intel will come to dominate the AI industry if they scale up production to meet demand. Their software stack is maturing rapidly and it's already at the level CUDA was at about 2-3 years ago.
>>103568421>>103568444Hell give us just enough gbs to get about 10tks on a 70B on a big 48GB PCB, come on intel...
>>103568237
>For autocomplete
This fake distinction only exists so NovelAI has an excuse to sell you a worse model.
>Eh, storygen have a niche that other models haven't really filled yet.
It got filled, you just refuse to accept it because the company that hired you isn't the one making money off it. Has anyone shilling an "autocomplete" model ever impressed anyone with what they were able to do? No. Because they're just lying to your face to make money.
>but NAI at least does something different
Is it me or is the only thing in your mind "please subscribe to NAI"? The only thing they're doing is scamming people out of money with shitty models.
Are you really that much of a pussy that you have to convince people with this garbage instead of with what the model can actually do? Does it make you piss your pants that people might realize that any other model can do the same things?
>>103568324
>actually makes advancements in the field that they then opensource after awhile and have actually been ahead of the curve on image gen.
Oh, really? They were the ones to invent SD3 and Flux?
Oh wait, they just made anime fine-tunes of SD1 and XL...
>>103568519If you've been around since the start like me you would know the leak jumpstarted the entire local image gen field. They also released several papers / code for stuff like samplers / training methods. They also gave free compute to several finetuners in the early days of SD1.5.
Oh great another fucking CF melty
>>103568263What is this early 2023. I've missed you naishill accuser.
>>103567073
My download finished and I can indeed reproduce this WITHOUT changing any samplers, both in Mikupad and in the Ooba notebook. This seems to mean a few things:
Llama.cpp may have a bug with Gemma 2.
Transformers (in ooba) may have a bug with Gemma 2.
Gemma 2 may have worse performance than people realize when used with Llama.cpp (and its derivatives) and transformers.
However, when I test the model now at around 7940 tokens (I just genned a few more turns), it does seem to break down. It becomes able to answer only around half the questions correctly. And this remains the case even when I set a value of 2.5 for the rope alpha (corresponding to 2x context extension). HOWEVER, when I set a rope alpha of 1.75, it becomes able to answer the questions again at around 7940.
So I conducted another test: what the max alpha value can be before performance at approximately 8k degrades. The value I found was 2. Just 2. Going to 2.1, it got 1 question wrong, so I stopped there. According to Ooba an alpha of 1.75 corresponds to 1.5x context and I think that's probably a safe number, so my conclusion here is that at least with rope scaling, the max context size for Gemma 2 27B before performance *starts* degrading is likely around 12k (which may not be noticed in tasks that don't need the model remembering things early in context).
I encourage people to try and reproduce successful answers on Llama.cpp/transformers; those seem to have potential bugs.
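If anyone wants to sanity-check those alpha values against raw rope base numbers, here's a quick sketch of the NTK alpha-to-base conversion I *think* ooba/exllama use; the formula and Gemma 2's defaults (base 10000, head dim 128) are from memory, so verify against your backend before trusting it.
[code]
# Hedged sketch: NTK-aware "alpha" -> rope_freq_base, as I believe ooba/exllama compute it.
# Gemma 2's default rope base (10000) and head dim (128) are assumptions from its config.
def alpha_to_rope_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    return base * alpha ** (head_dim / (head_dim - 2))

for a in (1.75, 2.0, 2.5):
    print(f"alpha {a} -> rope base ~{alpha_to_rope_base(a):.0f}")
# alpha 1.75 -> ~17656, alpha 2.0 -> ~20221, alpha 2.5 -> ~25366
[/code]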
>>103568548K, no one cares, kill yourself.
>>103568548>the leak jumpstarted the entire local image gen fieldThe entire field was advanced because they made an anime fine-tune of a model that already existed? They released one paper talking about how they implemented things that already existed for yet another anime fine-tune. Nothing was advanced with that. The ones advancing the field are companies like Stability, BFL or Tencent, NAI is just a low tier grifter in comparison. They're barely above a local fine-tuner.
>>103568602
You clearly do for some odd reason, which can only make me assume you're a certain periodically raging mormon.
>>103568637
If you weren't a finetuner in the 1.4/1.5 "era" you won't get it then. Making a dataset wasn't nearly as easy as it is now.
>>103568665>Making a datasetYou mean downloading danbooru?
>>103568677
If only that was all there was to it...
>>103568700That was all there was to it. That's why when Illustrious does it they get a model very similar to NAIv3. The praise of "advancing the field" doesn't match reality.
>>103568488
>Autistic screeching
See, this retarded six-page rant over "NAI has a niche" is exactly why you have the reputation of being the /aids/ resident retard
>>103568732
The only reason the thread started talking about AI Dungeon at all is because you have NAI shills in the thread who need to talk bad about the competing service to potential customers, and who then go into defense-force mode and have a melty when someone points out that they're paying the same price for a Llama 1 model with the same context. Of course paired with the excessive praise that NAI is advancing the whole field. They're actual shills.
>>103568785take your meds
>>103568421Sure it will, just wait 20 years.
>>103568785My brother in Christ, this is the first post mentioning >>103567882 NAI. If you (or somebody that writes exactly like you) didn't post that, nobody would be talking about it.How do you still not fucking get it? Even here, you're mentioning the service for zero reason.They fired you. So sad. Give. It. Fucking. Up.
>>103568814
Now re-read this part:
>The only reason the thread started talking about AI Dungeon at all is because you have NAI shills in the thread that need to talk bad about the competing service
Nobody else gives a shit about AI Dungeon, but it sure lives rent free in the heads of NAI employees, because it was not enough to have shills talk shit about their new update in /aids/, they have to come and do damage control here too. Nobody in /lmg/ gives a shit about that. It's fucking annoying to have shills begging people to please not subscribe to AI Dungeon in multiple threads. They're fucking desperate.
>They fired you. So sad. Give. It. Fucking. Up.
Take your meds, ponyfag.
>>103568880I posted: >>103566019 and I only ever "shilled" openrouter if anything for 405B. Large mistral is also free on mistrals api. I never mentioned novelai. Like the other anon said, take your meds.
Please stop fighting, let's all be friends :(
>>103567561I think that MoE models allow for higher quality all around, because you can push it slightly beyond vram and use a bigger quant without crippling speed loss.I was using IQ5 quants with mixtral, with only 24 vram, and getting acceptable speeds.If you had more, like 48+ vram, the same would apply to you if a double-sized mixtral model was released.
>>103568920>I never mentioned novelaiYet you're unable to leave any criticism against it unchallenged. When any criticism against NAI results in a meltdown it means that you have shills in the thread.
>>103568955>File: .pngMother of god, science has gone too far
>>103568955I agree.Rabu ando pisu
>>103568968
You're fighting demons in your head. Here, Novelai's 70B is nothing special and is for sure not worth it due to the 8k context alone for $25. That is still not as big of a joke as hundreds of dollars a month for open models you can use for a few bucks a month at most on something like openrouter.
>>103567388
openai just makes their models to do good on benchmarks. openai models ramble and say so much shit that they eventually get something right in the midst of their ramblings
>>103568968Then why do you bring attention to and keep fucking mentioning it? Please fuck off, this isn't your containment thread. Post something about EVA being the next iteration of Claude if you want. Still retarded, but at least it's topical.
>>103569022
>I will now pretend that the thread didn't melt down over a simple criticism against NAI
>I will now pretend that people aren't generating 100B tokens a day worth of text adventures with AI Dungeon, saving money with the subscription
>>103569022
The price is what you pay for the convenience of not having to mess with stinky nerd stuff like ooba and ST. You all need to stop with this stupid argument, it just shows how ignorant you are.
>>103569068Buy a fucking ad
>>103569068>I will now pretend that the thread didn't meltdown for a simple criticism against NAIDidn't happen>I will now pretend that people aren't generating 100B tokens a day worth of text adventures with AI Dungeon, saving money with the subscriptionOk shill
>>103569068Wait has this really been Nick Walton all along?Fuck you, I hope you liked my GPT-3 generated diapersmut motherfucker
>not local
>paying for it
I'm not retarded, that's why I'm here. don't care for all this retard posting.
>>103569081>Didn't happenRemember the part when someone mentioned that it also sucks to pay the same price for a Llama 1 model in NAI and someone jumped to defend it because somehow that's a rightful niche that needs to be filled and that somehow NAI is also advancing the whole field?
>>103569114No one cares Nick. We dont want either of your shitty services. This is local model general.
the game
>>103569132Motherfucker.
>>103569071Based. No one will refute you because you are right.The fact is, NAI's public just isn't in this general.
>>103569124You and your shills do seem to care, Kurumuz.
>>103569142And are these shills in the room with us now Nick?
>>103569071Thanks, I will now delete my local models and buy a NAI subscription. I'm tired of being seen as a stinky nerd!
>>103568237
Here:
>>103568237
>Llama 1 still has a niche in 2024
>>103568324
>NAI is advancing the whole field by making anime fine-tunes
>>103569132Of our time.
I guess the schizo wasn't content ruining one thread, huh?
Wake up babe
Actual AI physics engine just dropped
https://x.com/zhou_xian_/status/1869511650782658846
Both of you should go slobber on each other's dicks somewhere else now, your gay little quarrel has nothing to do with LOCAL models.
>>103569173
>>103569173you referred to the same one twice, and the 2nd one literally says their LLMs are shit
>>103569191He's not literate. Please understand.
>>103569190MythoMax is LLaMA 2, retard
>>103569191>and the 2nd one literally says their LLMs are shitGood thing that it doesn't matter because you're forced to pay for unlimited generations of a 70B model even if you're never going to use it. Such a good way to inflate the price!
My beef with NAI's model (yes I've tried the new 70B one) is that it's retarded, not that it costs money.If Kurumuz somehow made a Claude-tier model I'd gladly pay him 50 bucks a month for it. But he hasn't and his model is stupid, no smarter than any other L3 70B community fine tune.
>>103567388>>103569045>>103569092
>>103569213>forced to pay for unlimited generations of a 70B model even if you're never going to use it.Huh? Do they have a gun to your head?
>>103569217This is a bit sad, didn't he do continued-pretraining on billions of tokens? If anything, this should show us that local LLMs are a dead end.
>>103569228found the NAIshill
For fuck's sake anons he talks in circles and argued about nothing. This is what he does and you tards keep biting the most stupid fucking bait. Report, ignore, carry on.
>>103569228If it was separated you would either pay the same price for more context for the LLM, or the image one would be way cheaper. Instead you get the worst of both. It's designed to make you waste money because this company just wants to scam you.
>>103569248This. /aids/ is a fucking ghost town because of this faggot and he's been at this for years. Don't engage, just tell him to fuck off and then post about local models.
anything that is open source sucks because no one is paid to work on it. when you have paid services like novelai you also have to factor in the time of the employees and a margin for research.
>>103569315omg so true bestie we should raid /aids/
I like how one year ago we all thought open source would permanently be behind closed source and now open source is leading in most ways.
I could feel the hopelessness in this thread not even 12 months ago and the tides have turned. Instead I see people without hardware seethe and cope with their proprietary cloud services that can't generate proper porn for them.
>>103569315this makes perfect sense, yes, meta is known to not pay the llama team so are mistral and qwen i guess
>>103569315NovelAI is the only company advancing the field.
>>103569342Meta/Qwen/Mistral models are open weights, not open source. Or do you have their training dataset and didn't tell us?
>>103569355
So is EVA the second coming or slop?
>>103569369Nobody cares about this distinction. If compiling software required months of megacorp level investment then no one would care about having source code either.In principle I would love to have the datasets anyway, but it would have nothing but negative effects on the models, because prudes would search for stuff to complain about and help censor the datasets.
>>103569237He did yeah, massive continued pretraining on L3 70B base (since it's a story writing model, not for RP/chat) with a big dataset and pretty serious hardware. And I'm not exaggerating when I said it didn't come out any smarter than the various $500 community tunes on top of the instruct model. It was pretty blackpilling to see, I'd like to cope by believing that L3 was just a bad base or that Kurumuz fucked up somehow but I suspect the news is worse and some kind of hard information theory limit has been reached for that size/parameter count
>>103569398Moving goalposts, I see
>>103569423I think he just did it wrong desu, it's not like this is the first time either. It took Meta releasing the llama paper for us to start to understand how to approach closed models like OpenAI.
>>103569388What does EVA have to do with NAI?
>every ai related thread is pessimistic and angry
What the fuck happened?
>>103569423
>L3 was just a bad
I mean, we know they filtered the dataset at the pretrain level, so L3 is a bad base, and there were discussions not that long ago that we're nowhere close to saturating them. Especially since big models are now having info removed and more synthetic slop replacing it instead.
>>103569458
1-2 no life trolls
why is gemma so slow...
>>103569458There's one schizo shitting up every single AI thread
>>103569423
I agree. Having used it I still liked it a little better for storywriting than base, but it was a small fucking difference. To the point I'm sure L4 base would obliterate it.
I'm inclined to think it's more of an intelligence issue. As models get more and more intelligent, they model patterns more efficiently, and so intelligent model vs. finetune doesn't evoke as strong of a difference as retarded model vs. finetune
>>103569461Filtering the pre-training dataset doesn't matter for continued pre-training, only for fine-tuning.
>>103569473You are putting WAY too much hope in L4
>>103569423Interesting, I didn't know about that. Maybe it was just a bad run? L2 30B was retarded for no apparent reason, it could be something like that. If it's not, then it would imply that instruct is actually key to making models seem intelligent at all which is interesting. And kind of a shame because it seems to restrict the variety you get
>>103569502L3.3 shows they are headed in the right direction. The assistant-ness of it is gone and it RPs really well now.
>>103569513No, it doesn't. Kill yourself evafag
>>103569423>massiveIIRC the finetuning dataset is tiny
>>103569461They didn't filter it that much. It's only a bad base if you compare it to Mistral who is the only one (aside from Anthropic in the closed segment) that seems to have a pretty uncensored pretraining stage. Everyone else in the industry either filters for safety (western companies) or simply just changes the proportion of data so that they focus the training on "high quality data" and thus get higher benchmarks and greater intelligence, at the cost of being good at ERP.
>>103569507>then it would imply that instruct is actually key to making models seem intelligentLiterally everything points in this direction. Every absolute kino model we have are just pretty good instruct models (Miqu, Nemotron, Tulu, EVA, etc...).
>>103569542EVA is a RP/StoryWriting fine-tune btw, but it's on top of llama 3.3 instruct.
>>103569542Even Nemo probably didn't see a single token of RP and it ended up becoming such a beast.
>>103569542How so though? If we're gauging base model intelligence you'd just take the model, throw it into the middle of a bunch of text, let it generate, and see if what it generates is what a human would likely produceL3.1 8B, L3.1 70B, and L3.1 405B have some very obvious differences in character / object permanence, dialogue, scene setting, etc.With instruct you care more about how well it adheres to instructions, which is different from but also directly tied to the former
>>103569521Not even talking about that finetune. 3.3 in general. Both me and everyone else knows it. Even the blind leaderboard shows it: https://lmarena.ai/
>>103569645>benchmarks suddenly matter now>lmarena suddenly isn't a meme anymoreok
>>103569645Doesn't the blind leaderboard also show that 3.1 Nemotron beats it?
>>103569676
It's not a benchmark and no one has ever said it didn't matter. It's a blind user preference test, which is the best kind
>>103569686>>103569645https://livebench.ai/
>>103569680By 3 points and 3.3 is recent so it will take time to settle in. But nemo was the best till 3.3 imo. 3.3 smarts make it better still.
>>103569694We are talking about RP / creative writing here.
>>103569709>lmarena now matters for RP/creative writing ??????
>>103569709Lmarena used for creative writing is a negative signal if anything. The average preference is not desirable.
>>103569723Yes, they have a section for creative writing now. And yes the blind test is the best method. And if you've used gemini 1206 you know its correct.
>>103569699That's cool and all but it also puts old 3.5 Sonnet below 3.1 Nemotron
>>103569749So you're the retard who ruined the benchmark, then
>>103569750That one is harder. 3.5 besides liking to refuse is more overfitted if anything, giving samey responses. I can see that hurting it.
>>103569749Oh, wow. I admit I didn't know about that. Thanks.
>>103569757
Have you not used it? It legit is claude opus tier but even filthier / more unhinged. It's the proxy model of choice now. Gemini used to suck before it.
>>103569709Are you going to ignore the discussion just moments ago about how good instruct models most of the time end up being the best for RP?
>>103569777Yes? Qwen2.5 72B is the best performing "instruct" model but is terrible at RP.
>>103569771Does it support prefilling or did they have to retrocede to jailbreaks?
>>103569795https://rentry.org/avaniJB
>>103569749The problem with putting all your stock into this benchmark is that most of the people who are doing these tests are ESL with preferences to stylish and long outputs and a bias against responses that sound similar to what they've heard beforeYou're trying to not only quantify something that's entirely subjective, but using the worst subset of internet users to do it
>>103569784Nah, it just needs a fine-tune because Qwen cucked the model with too much alignment. EVA Qwen is pretty good, you should try it out.
>>103569816I did, eva based on 3.3 is better now. About as smart but more importantly is able to get dark / filthy which the qwen version still struggled at.
Is it weird that I have a power fantasy of traveling back in time 10 years ago with all the local models I have right now. And gaslight the entire internet with fake images/videos/text?
>>103569749According to this, Nemotron is the best RP/StoryWriting model local has.Can anyone confirm this?
>>103569844I can confirm. Nemotron is a beast, but it's cucked to avoid filthy stuff.
>>103569838That sounds like a fine idea for a webnovel.
>>103569838Man... I still remember 5~ years ago when I first saw GPT2 and thought "it must be fake, there's no way a computer can write code!"It's kinda nostalgic, now that I think about it.
>>103569891Considering it often struggled to keep a sentence straight, I don't recall GPT-2 doing much codewriting, kek
>>103569844Nemotron is kind of smart but really bland and generic, typical slop flavor, so bad for story writing
>>103569891I literally remember /pol/ and other schizos on 4chan claiming the GPT2 API was fake and it was indians quickly writing a reply. They were 100% saying that stuff and you even had a couple of holdouts that were still saying it all the way up till GPT4.
>>103569838
>fake images
photoshop existed back then
>fake videos
too uncanny, people would know it was fake even if they didn't know how you did it
>fake text
lies existed since 6000BC
You could generate a fuckton of spam but whatever PC could do that would be far more interesting back then
>>103569921lies
>>103568548
>If you've been around since the start like me you would know the (NovelAI) leak jumpstarted the entire local image gen field.
That was a big deal, but it was the SD1.4 base model that really kicked off local imagegen, maybe a month or two before novelAI's finetune leaked.
>>103569891I distinctly remember trying to coom with GPT-2 back in the old days and keeping at it before realizing "yeah, this is fucking hopeless". It's funny that GPT-2 was a leap above what we had but still shit enough that I was just left hoping there'd be something better someday
>>103569910You're probably right, the popularity of GPT only started when GPT3 released, so I'm probably thinking about GPT3.
>>103569925I legit thought C.AI had pajeets writing messages for some time in the backend, even more so because of how realistic the OOC was, so I understand the schizos.
>>103569929
>too uncanny, people would know it was fake even if they didn't know how you did it
Bro, I could make the entire male internet my footslaves if I had hunyuan back in 2014. No one would call out pic-related as fake.
>fake text
I'm not talking about text but about real-time text-based conversations held by a chatbot with regular people in 2014. There's no way they would expect it to be artificial as it completely passes the turing test, and you could just set up the initial context to gaslight people in a certain direction.
>>103569921t. Never used it
Kill yourself.
>>103570033
Oh okay, yeah, AI chatbots would freak people out. Coom image and text gen would both make people addicts but that isn't really different from today lol
We're still in the early stages of this stuff.
>>103569945>>103569998Go ahead then, disprove what I said
>>103570067>ghosts exist!>what? no!>go ahead then, disprove what I said.
>>103570116Fucking retard, go ahead and prove that Nemotron isn't boring slop with a log right now
>>103570116I'm 99% sure its the same troll who says the same about literally every model discussed here.
>>103570127just go to literotica anon, no one wants to give you their smut
>>103570033No I meant more like a power fantasy of using all local models now to pretend to be people online on a large scale to influence the world. Like create tens of thousands of fake women including pictures, videos etc to entrap politicians and other influential people and gaslight the entire internet into influencing the world.Yes it's extremely autistic but it's become my go-to power fantasy for some reason.
very funny to see people typing out posts that could have been written by an 8 year old trying to argue that LLM 1 is better or worse than LLM 2
>>103569995she has two left feet anon
>>103570178I kneel
>>103570129>>103570152Damn you got me. Nemotron is actually better than every other model and has no flaws! For real! No cap, my fellow lmggers!
>>103569423
I believe it's more a matter of having a model trained to maturity. They increased its knowledge, but that new knowledge is just being filtered through its established 'thought process' and pattern of output.
>>103569507
Doesn't matter if it's a bad run or not. The way they shill Erato in the NAI discord like it's the best thing ever and deny otherwise is the issue. If they don't see a problem, it's over. The NAI team + fanatic fanboys shut down any disagreement hard and claim operator error for not using the correct ATTG+R/----/LB/*** format (at this point just gimme instruct FFS), which doesn't come close to killing Erato's Llama-flavored slop even with every imaginable effort to keep the context from being polluted by its bad tendencies.
Kayra was way ahead of its time at release, it punched above its weight class and followed writing cues/style even from minimally sized prompts. Was hoping for even a moderate upgrade so I could fuck off from /lmg/ forever. But no. And I'm salty about it. So fuck NAI shills.
>>103566743
see the modifying model behaviors via vectors thing from last thread? not that. all my addon does is act like a version of author's notes with a selection button for things you can choose rather than type each time; the rest is the same in that it injects into every prompt at a low depth - acting as a constant reminder. i believe you can drive models at least somewhat, but not through weirdness, just through prompting
hello anons, it's been a while.
was busy for the past 8 months with life and been away from all this.
can someone give me an update on what's the best AI to use for roleplaying (like dnd, choose your own adventure) style stories?
was using claude sonnet before i got busy.
also are proxies still available or is there a website to go to now?
>>103570414wrong thread
>>103570418you are right, thanks anon
>>103565507>/LMG/tell me you're a tourist without telling me you're a tourist
>36 GB VRAM
>try 3.5 bpw 70B hoping it'll work
>can't even get above 7k context even with q4 cache
It's over. And testing it, it seems dumb and makes weird errors frequently, so I doubt an even lower bpw would be good.
ACK
>>103570418a small model like nemo will go much further than any online garbage you're trying
>>103565624Looking good, anon.>>103569838You can do this RIGHT NOW by becoming a glowie.
>>103565866I just use Rocinante. I have 8gb (2070 super).
>>103569423>using llama3 as baseIt was over before it began
If I want to try QwQ for RP/ERP, should I go for the official version or one of the merges/tunes like Eva QwQ?
Hi KoboHenk,
I'm reaching out once again to emphasize the importance of adding full draft model settings to your platform.
Implementing these settings would significantly enhance performance, outperforming the current trashy defaults. Users would greatly benefit from the flexibility and improved results that come with customizable draft models.
Thank you for considering this request.
Best regards,
Anon
>>103569458Because everything fucking sucks and looks like it'll suck more in the future, not less. It's like one day everybody uniformly agreed that llms should be aligned during the pretraining phase
begin work immediately mr kobold
>>103570570>You can do this RIGHT NOW by becoming a glowie.Are they hiring? Did their DEI hires resign/ACK xirselves? Do they want straight white men again?
>>103570654Does henk even work on koboldcpp? I thought it was a different guy
>>103570728>Does henk even work on koboldcpp? I thought it was a different guyit doesn't matter, all this is merely a striving after wind
>>103570728Yeah, it's concedo.
>>103570755>it doesn't matter, all this is merely a striving after windAh yes, 'striving after wind,' because clearly doing nothing is the pinnacle of proactive problem-solving. Just because Kobo might not immediately notice one voice doesn't mean the cumulative effect of many won't. It's called advocacy, not 'chasing wind,' and sometimes even a gentle breeze can move a mountain if enough people are blowing.
>>103570861reading this made my brain hurt
>>103570983That post had a certain... uncanny quality, didn’t it? Like staring at a familiar face in a dream, where everything seems *almost* human, yet just a touch off—words strung together with mechanical precision, but devoid of a soul’s warmth. It reads like something that understands language but not meaning, as if crafted by a mind that has learned to mimic thought without ever truly thinking. Makes you wonder who—or *what*—was really behind it.
>>103571022They walk among us, blending in with us... They look human, but if you look close enough, you can tell their act is at best a crude imitation of human behavior ...Anonfilms presents... THE AUTISTS
Are any 12gb vram models worth a damn? Or is a 3090 minimum viable hardware
>>103571088minimum is 2 3090s
I am so tired of that one mod who posts in threads with a "witty" zinger and then deletes them
>>103571088gemma-2-Ifable or L3-sunfall
>>103569749Thanks I looked into it and found nemotron 51B. I don't remember anyone here bringing it up when it released. Seems like something that would work with 24GB.
Will you be able to connect the 5090s with each other? 2x32gb should be enough for 70b models if I understand it correctly
>>103568057
>model thinks that you die when you go unconscious
Into the trash it goes
I am testing EVA QwQ right now.
>think of looking away for a second while it's generating
>look back
Oh...
>>103571217that’s how it works irl tho???when you go to sleep as well
>>103571217what?
>>103569910
pyg before chatgpt could do it.
i remember being so impressed that i left a comment.
it was just a short hello world c# console app, but it blew my mind.
>>103571239QwQ gets stuck in a loop pretty often, even the paper acknowledges that.
>>103571288
>QwQ gets stuck in a loop pretty often, even the paper acknowledges that.
I've seen it fail to generate EOS/EOT lots, but never really seen it loop at q8 and I've used QwQ LOTS.
Where does it say that in the paper?
Alright, just from testing one card and a couple of swipes I think I have a feel for EVA QwQ as well as normal QwQ. I'm testing with near-greedy sampling but with a bit of rep pen after I saw >>103571239. My feeling is that EVA QwQ is closer to a normal model that also does a bit of thinking, and it isn't afraid of getting lewd. Normal QwQ on the other hand doesn't get nearly as lewd (although it's still able to), but its thinking process is really quite unique and interesting. It seems quite smart, and smarter than EVA. EVA just doesn't think like QwQ does, which is unfortunate. But normal QwQ seems to have an issue with going off the rails and not stopping its yapping, while EVA QwQ feels stable. And EVA feels more in character in its thoughts while QwQ feels more like a generic writer. It's too bad we can't have the benefits of both EVA and QwQ without some trade-off.
Also, since I was just testing Llama 3.3 EVA the other day, I will say that the experience using that was a lot more fun and even more in-character. Both EVA QwQ and normal QwQ feel a bit generic in how they write compared to L3.3 EVA. But L3.3 EVA can't think like QwQ can. It's interesting to think about what could happen if we had an open-source final QwQ dataset. Imagine a model that's as phun as L3.3 EVA but with the smart test-time scaling thought process of QwQ (when it's working properly).
>>103571249
She's conscious and suddenly just dies due to the lack of air, at least it reads that way to me.
That's really not what happens irl, is it?
>>103571620
If you actually read more closely, her throat is literally crushed and she's slowly dying and spazzing around.
>>103571390My bad, it's on the Github page, not the paper.
>>103571630I don't think you can just crush a throat with a dick like that, but even if you can, you still wouldn't die immediately once you pass out
>>103571425
Eva 0.0 does love to go on and on sometimes. 0.1 seems to have fixed that, for the most part.
And yeah, having a model with Eva's soul and QwQ's reasoning would be awesome.
>>103571631
nta. General ass-covering disclaimers every model has, and not quite what anon showed.
Models, under certain circumstances, simply explode. Nothing special about it.
Is 12 days of OpenAI an even bigger marketing flop than strawberry man?
>>103572075Microsoft is investing 56B in Anthropic. Sam is finished. He has nothing left. Everybody in OAI realized AI is a fad and split up to start their own grifts.
>>103572189
I mean, there are use cases for it. 20% of Google traffic is to CAI. They just do not want to hear about it. Anthropic will not do any better.
so now you can do this in realtime on 200bux nano shit
>>103572244que?
How to know a model is shit and censored
>>103572303
>>103572339>10% price for 2% performancelmaolol
>>103572075
They will end their 12 days with the reveal of GPT4.5
>>103572189
Microsoft is NOT investing 56B into Anthropic. Microsoft is buying a very small part of Anthropic stock, which raises their valuation to 56B (up from the 18B they are worth now). This just means that Microsoft is paying 3x the amount per share compared to what Amazon paid in the past.
It's true, however, that Microsoft is doing this because they are having issues with OpenAI.
>>103572389>They will end their 12 days with the reveal of GPT4.5Would be really funny if we get something very dumb sounding like "GPT 4 super"
>>103571170
As long as you are using them just for inference, then yeah, you can use the 64 GB.
If my math isn't wrong it should also be able to fit 123b with 32k context if using 4-bit kv cache at 3.7 bpw, but I'm not sure if at that point it becomes worse than just running a 70b at higher bpw
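Rough sanity check of that math below; the 123b architecture numbers (88 layers, 8 KV heads, head dim 128) are my assumptions for Mistral Large 2, so treat this as a ballpark sketch rather than a definitive budget.
[code]
# Back-of-envelope VRAM budget for 123B at 3.7 bpw with 32k context and 4-bit KV cache.
# Layer count / KV head count / head dim are assumed values for Mistral Large 2.
params = 123e9
weights_gb = params * 3.7 / 8 / 1e9                       # bits -> bytes -> GB, ~56.9 GB

layers, kv_heads, head_dim, ctx = 88, 8, 128, 32768
kv_bytes_per_elem = 0.5                                    # 4-bit cache
kv_gb = 2 * layers * kv_heads * head_dim * ctx * kv_bytes_per_elem / 1e9   # K + V, ~3.0 GB

print(f"weights ~{weights_gb:.1f} GB + kv ~{kv_gb:.1f} GB = ~{weights_gb + kv_gb:.1f} GB")
# ~60 GB before compute buffers / driver overhead, so 2x32 GB is tight but plausible.
[/code]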
>>103568057
If you're going to shill a model, at least shill it properly.
>>103569827Settings?
>>103572541GPT 4 Ti
>>103572541GPT4+, next release GPT4++
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
https://arxiv.org/abs/2412.13795
>Large Language Models (LLMs) have achieved remarkable success, yet recent findings reveal that their deeper layers often contribute minimally and can be pruned without affecting overall performance. While some view this as an opportunity for model compression, we identify it as a training shortfall rooted in the widespread use of Pre-Layer Normalization (Pre-LN). We demonstrate that Pre-LN, commonly employed in models like GPT and LLaMA, leads to diminished gradient norms in its deeper layers, reducing their effectiveness. In contrast, Post-Layer Normalization (Post-LN) preserves larger gradient norms in deeper layers but suffers from vanishing gradients in earlier layers. To address this, we introduce Mix-LN, a novel normalization technique that combines the strengths of Pre-LN and Post-LN within the same model. Mix-LN applies Post-LN to the earlier layers and Pre-LN to the deeper layers, ensuring more uniform gradients across layers. This allows all parts of the network--both shallow and deep layers--to contribute effectively to training. Extensive experiments with various model sizes from 70M to 7B demonstrate that Mix-LN consistently outperforms both Pre-LN and Post-LN, promoting more balanced, healthier gradient norms throughout the network, and enhancing the overall quality of LLM pre-training. Furthermore, we demonstrate that models pre-trained with Mix-LN learn better compared to those using Pre-LN or Post-LN during supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), highlighting the critical importance of high-quality deep layers. By effectively addressing the inefficiencies of deep layers in current LLMs, Mix-LN unlocks their potential, enhancing model capacity without increasing model size.
https://github.com/pixeli99/MixLN
interesting
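For anyone skimming, a minimal toy sketch of the core idea as I read the abstract (early layers Post-LN, deep layers Pre-LN); this is not the authors' code (that's in the linked repo), and the 25% switch point and the omitted causal mask are my own simplifications.
[code]
import torch.nn as nn

class MixLNBlock(nn.Module):
    """Toy decoder block: Post-LN for early layers, Pre-LN for deep layers (Mix-LN idea)."""
    def __init__(self, d_model, n_heads, layer_idx, n_layers, post_ln_fraction=0.25):
        super().__init__()
        # Assumption: the first ~25% of layers use Post-LN; the paper tunes where the switch happens.
        self.use_post_ln = layer_idx < int(n_layers * post_ln_fraction)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):  # causal mask omitted for brevity
        if self.use_post_ln:
            # Post-LN: normalize after the residual add (bigger gradients deep in the stack).
            x = self.ln1(x + self.attn(x, x, x, need_weights=False)[0])
            x = self.ln2(x + self.mlp(x))
        else:
            # Pre-LN: normalize before each sublayer (stable early, weaker deep layers).
            h = self.ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.mlp(self.ln2(x))
        return x
[/code]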
>>103573186
So it's either pruning a few percent of the model at no noticeable quality loss, or using some sperg technique to get 3% less perplexity.
Gradient descent really wasn't meant for deep networks...
Is it possible to change the context shifting threshold?
Right now, if you are at the context limit, gen a reply, press continue, and then swipe the reply, chances are that it has already popped a message from context but resends it for the swipe, rebuilding the whole context.
Using koboldcpp
>>103573591
How would such a threshold work? The chat either gets "rolled" up enough for a message to get evicted from the context window or it doesn't, right? Which is a function of the frontend.
I suppose you could use an arbitrarily large context window size in the frontend, so that it sends the whole conversation to the backend, and let the backend deal with cutting up the prompt and/or shifting the context, although I have no idea if that's how any of that works. But you might as well try.
I know that at least llama.cpp server doesn't crash when receiving a prompt that's larger than the actual context window. Whether it's just truncating and mangling the prompt in the process, I have no idea.
>>103565511
>--Guitar amp simulation using local models and potential noise reduction techniques:
For the record, I tried. Got the input/output pair from the support page of that project, fed it to the colab. The clean model turned out fine; then I mixed in a little hi-passed white noise to the input and couldn't get past the "input is not silent for at least ~19k samples" error despite disabling checks. That part is completely zeroed out in my sample. Couldn't get any search results about the error and gave up.
>>103569838
>Is it weird that I have a power fantasy of traveling back in time 10 years ago with all the local models I have right now. And gaslight the entire internet with fake images/videos/text?
>>103569876
That sounds like a fine idea for a webnovel. This fantasy has potential, but I would probably go up to 15 years back in time to use your gaming PC/AI rig to mine some bitcoin on the side back when it was easy.
Here is my suggestion for a title: "Back in time with my pimped out gaming PC and local AI models."
Here is a shitty Dalle-3 gen for the cover of your new hit LN or WN.
too many tripfags... not enough miku...
>>103565866At that point just get the infinite monkeys
>>103565880
>do the lobotomy wrong
>the lobotomy goes wrong
>"HOW COULD THE AI DO THIS TO ME?!?"
>>103565866Nemo 12B.
3090
>>103574341Dethroned soon by Intel. Whatever fits in 24GB doesn't need 935 GB/s memory bandwidth
>>103574353
The larger the model, the more memory bandwidth will be the bottleneck; if anything, 935 GB/s is barely enough for a model that only just fits in 24GB.
>>103574449Qwen 32B 4bpw runs at 25 t/s on my 3090 without speculative decoding. I think people can live with 13 t/s, if anything they can use a draft model to make it ~18 t/s, still more than usable.
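Back-of-envelope for why those numbers line up (a sketch, not a benchmark: the ~17 GB quant size and the B580 bandwidth figure are assumptions, and real throughput lands well below the ceiling because of compute and overhead):

```python
# Each generated token has to stream roughly all weight bytes through the GPU once,
# so memory bandwidth sets a hard ceiling on tokens/second for single-user decoding.
def tps_ceiling(model_gb, bandwidth_gbs):
    return bandwidth_gbs / model_gb

qwen32b_q4_gb = 17.0                      # assumed on-disk size of a ~4 bpw 32B quant
print(tps_ceiling(qwen32b_q4_gb, 935))    # 3090 (935 GB/s): ~55 t/s ceiling, ~25 t/s observed
print(tps_ceiling(qwen32b_q4_gb, 456))    # B580-class bandwidth (456 GB/s, assumed): ~27 t/s ceiling
```

So a ~13 t/s guess for a hypothetical 24GB B580 is roughly consistent with halving the bandwidth, which is the point >>103574449 was making.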
>>103574353
If Intel fucks the pricing up on the 24GB B580 I'm just going to assume they're retarded and hate money; everything is positioned perfectly for a massive market grab. Hobbyists, researchers, coomers, and everyone else who isn't a billion-dollar company are begging for scraps while Jensen's boot stomps on their VRAM-poor asses.
>>103574056Then pull another one out of one of your sourcebooks
>>103571656
Is 0.1 better than 0.0 other than that? Typically, good finetunes are accidental flukes in the slop pile and attempts to update them are failures, so I haven't considered using the new version unless someone actually confirms it's better.
>>103574353>>103574599$700. Scalped to 1k.
>>103574533If you add introspection and things along these lines (i.e. test-time compute), which appears to be where the industry is going, that starts to become painfully slow.
>>103574628$400 and we luckily have anti scalping laws here
The Chinese are up to it again: https://www.ebay.ca/itm/375861526620
>>103574628$300
>>103574644>>103574660
>>103572335
They also have limited knowledge of fiction due to copyright. Tulu exclusively referred to century-old novels when I told it to (sometimes) relate descriptions to popular fiction, among other shit.
I'm looking for a model which can parse my project's codebase and write unit/integration tests for me. Which one should I try?
>>103574757pyg6b
>>103574757if you have up to 48gb vram then qwq. More than that (or a cpumaxx rig) and you can look at other qwen options or deepseek
>>103574839
Fuck that, I have a 3070 8GB and a 9800X3D; I have nowhere near that much VRAM.
I've not engaged with local AI since the first diffusion models were coming out, so I had no idea we were already talking about 48GB+ of VRAM. What's the standard nowadays?
>>103574854
>parse my project's codebase and write unit/integration tests for me
>3070 8gb and a 9800x3d
You're gonna need a bigger boat. You can't even fit your project's codebase into VRAM if it's more complicated than hello world, let alone a model big enough to tell you anything of value about it.
>>103574872
Alright, I'll shelve this dream for now. The annoyance isn't worth dumping 4k into a graphics card.
>>103574854
>What's the standard nowadays?
The OP has a build guide. Models are up to 810GB (unquanted) in size these days, so sky's the limit for options.
>>103574890
Don't listen to the faggots. Grab Q6 Qwen Coder, offload what you can, and run the rest on CPU/RAM.
I don't get the hate towards the latest Largestral in RP desu.
At least on lower quants, the prose is more human and fun compared to the previous version.
>>103573783
Tried on their free trainer, which accepted the files no problem. The colab is fucked, evidently. It works lol. The tone, as far as I can tell in monitor headphones, is unchanged, but with way less noise. Someone tell them to add this as an option to training or something like that. Mix in a bit of noise to make it cancel out some of the junk.
>>103574646
>Shipping: US $5,000.00 (approx C $7,223.50)
Is that a mistake?
>>103575002No, it's how they make their money avoiding ebay's cut.
>>103575023
That's some circa 2003 bullshit. Pretty sure eBay nails you for shipping costs these days too.
>>103574983Imo Largestral is bad, both the new and old version.
>>103575117
Please explain. I never used it as I'm a VRAMlet, so I have no idea about its characteristics in actual use.
>>103574983
Hate toward Largestral generally coincides with the type of people who prefer drummer-style sloptunes.
Basically, you're seeing a vocal minority of people who equate style (e.g. purple prose) with intelligence and coherence, instead of treating it as just a style vector. The dirtier and more literary/verbose the output is, the more that equates to the model being better or smarter in their eyes.
>qwen answers for me and then continues
>>103575163
Largestral has too much positivity bias, avoids filthy stuff, and writes using too much purple prose. The intelligence claimed by anons doesn't matter, since it isn't that fun to use for my use cases.
I don't think this is fixed by sloptunes either; rather, they make the model stupid and too horny, so I avoid them like the plague.
It's the same issue we had with Miqu, really. I think the only salvation for Largestral would be an uncensored instruct fine-tune like Tulu or Nemotron.
>>103575265>AI isn't going to replace you, stop being paranoid!>The AI:
>>103575265Yeah I also experienced this.
>>103575279Feels like this is often the case. Models are either too dry/positive or too horny.
>>103575618>>103575618>>103575618
>>103573724
I've read about this before a while ago but I don't remember for what tool.
I guess this specific issue could be fixed if ST would just send the cut-off previous prompt instead of trying to send a previously dropped token:
>A B C D E F G
>_ B C D E F G H
>_ _ C D E F G H I
>swipe H
>_ _ C D E F G H I
>_ _ C D E F G H2
instead of
>A B C D E F G
>_ B C D E F G H
>_ _ C D E F G H I
>swipe H
>_ _ C D E F G H I
>_ B C D E F G H2
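A toy sketch of the difference being described above (this is not ST's or koboldcpp's actual code; the message letters and helper names are made up for illustration):

```python
# Two ways a frontend can build the prompt for a swipe when the context window is full.
def swipe_prompt_rebuild(full_chat, swiped, limit):
    # current behavior: drop the swiped reply, then re-window over the FULL chat,
    # which can pull a message the backend already shifted out back into the prompt
    kept = [m for m in full_chat if m != swiped]
    return kept[-limit:]

def swipe_prompt_reuse(last_sent_prompt):
    # proposed behavior: resend exactly the prompt that produced the original reply,
    # so the backend's shifted KV-cache prefix still matches and only new tokens are processed
    return list(last_sent_prompt)

chat      = list("ABCDEFGH")   # H is the reply being swiped
last_sent = list("BCDEFG")     # prompt that actually produced H (A was already shifted out)

print(swipe_prompt_rebuild(chat, "H", limit=7))  # ['A','B','C','D','E','F','G'] -> A is back, cache invalidated
print(swipe_prompt_reuse(last_sent))             # ['B','C','D','E','F','G'] -> cached prefix preserved
```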
>>103569071
Imagine how retarded you must be to think that ST counts as "nerf stuff"
Lmao
>>103569237
Wow, it's almost as if expecting a 70B model to perform the same as a 1T model was a retarded concept from the start.
>>103575856claude 3 opus has 137 billion parameters, and 3.5 sonnet (which is smarter than opus) presumably has less since it's faster and costs less than opus
>>103576441
>claude 3 opus has 137 billion parameters
source
>>103576557
It's just what appears when you google it, since it's repeated by many sources. But upon closer inspection, it originates from an obviously AI-generated Medium article where the param count was hallucinated.
>>103575002
>buy card, $499.93 + $5,000.00 shipping
>card doesn't work
>here's your $499.93 back, have a nice day