/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102258941 & >>102249472

►News
>(09/06) DeepSeek-V2.5 released, combines Chat and Instruct: https://hf.co/deepseek-ai/DeepSeek-V2.5
>(09/05) FluxMusic: Text-to-Music Generation with Rectified Flow Transformer: https://github.com/feizc/fluxmusic
>(09/04) Yi-Coder: 1.5B & 9B with 128K context and 52 programming languages: https://hf.co/blog/lorinma/yi-coder
>(09/04) OLMoE 7x1B fully open source model release: https://hf.co/allenai/OLMoE-1B-7B-0924-Instruct
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102258941

--Use 8-30b model for better speed with 24GB VRAM: >>102264082 >>102264133 >>102264281 >>102264342
--Running Mistral Large Q5 at 64k context with DDR5 RAM and GPU considerations: >>102265363 >>102265409 >>102266475 >>102266507 >>102269376 >>102269423 >>102269534 >>102270198 >>102267903 >>102267928 >>102267943 >>102267935
--Running LLM models on Optiplex 7070 micro PC: >>102268093 >>102268117 >>102268139 >>102269167
--Dual CPU setups for CPU inference have drawbacks, but can run large models at usable speeds: >>102265415 >>102265431 >>102265492 >>102265575 >>102265624 >>102265719 >>102265840 >>102266056 >>102268854 >>102269309 >>102265596 >>102265798
--Comparing AI model performance across different benchmarks: >>102266882 >>102267012 >>102267041 >>102267003
--Building a narrative-game environment with AI, concerns about positivity bias, and impressive storytelling: >>102268260 >>102268304 >>102268346 >>102268416
--Botnet training discussion: >>102268010 >>102268037 >>102268074
--Silly Tavern message sound setting for ding notification: >>102264570 >>102264597 >>102264610
--Silly Tavern extension compared to anon's Director project: >>102267600 >>102267788 >>102267833 >>102267852
--Recommendations for adventure/rpg cards to use with LLMs: >>102259012 >>102259080
--Recapbot test using deepseek 2.5 at bf16 - performed well but had some issues: >>102265886
--NTFS issues on Linux and potential solutions: >>102268024 >>102269150 >>102269190 >>102269512 >>102270384
--Reflection fixed on openrouter, fails strawberry test: >>102267880 >>102267890 >>102268078 >>102268100
--70B 4bit model performance discussion: >>102266391 >>102266414 >>102266443 >>102266504
--Miku (free space): >>102258962 >>102260482 >>102260535 >>102260584 >>102261393 >>102267804 >>102269059 >>102269219

►Recent Highlight Posts from the Previous Thread: >>102258947
>>102272046
You seem to erroneously believe there will be a progression here. The transformer model has reached its ceiling. We're not going to see anywhere near the speed of growth we've seen until now. This is it.
>>102271982
I like Dolphin too. Check out Mini Magnum (based on Mistral Nemo).
>>102267880
The strawberry test is meaningless because it depends on tokenization.
Please excuse my dumb question. Are there any niche advantages to 12b Nemo over Mistral Large 70b, sans speed?
>>102272102
70B should make it more intelligent. Although Nemo really outperforms its class.
>>102272095
all of those tests tricking the models are kind of overrated. yes, you can trick a model predicting the next thing to say by throwing lots of misleading data in before it
>>102272102
Mistral Large is 123b. Some say Nemo is more creative, but it's not worth the trade off in lack of intelligence.
>>102272154
>>102272102
even the most retarded quanted version of large possible is massively better than Nemo, for every use case
>>102272116
>>102272170
>>102272154
Apologies for my mistake. Thank you.
>>102272044
Isn't Mistral Large censored, though? Is there any good large LLM that isn't?
>>102268122
>Who are you quoting?
You, you stupid autistic mother fucker. Explain what you fucking mean by "tokenization fixed" instead of spewing a word salad like some retard and pretending you're some fucking Einstein.
>>102272265
mistral is the least censored of the big models, llama is very censored. You can do ERP just fine with mistral large
>>102272041
any new breakthroughs for VRAMlets (12gb)? or should I stick with miniMagnum 2
What's a good way of gauging what size model will run (acceptably) on a given spec? I have an okay computer (32GB RAM, 2GB VRAM 3080), but I'm not shelling out for a dedicated server to handle it.
>>102272349
Depends on context size. Generally speaking you want the model to fit inside your VRAM with room to spare.
>>102272349
>>102272396
Adding to what this anon said: with ideal settings, a model that's twice your VRAM runs at about as slow a speed as I will tolerate.
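If anyone wants the back-of-envelope version of "will it fit": weight size is just params times bits-per-weight, plus the KV cache for your context. Minimal sketch below; all the numbers (4.5 bpw for roughly Q4_K_M, the 40-layer GQA config) are ballpark assumptions, not exact figures for any particular loader or model.

```python
# Rough VRAM estimate for a quantized model: weights + KV cache.
# Every concrete number here is an illustrative assumption.

def model_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GiB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 1024**3

def kv_cache_gib(ctx: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 2) -> float:
    """Approximate fp16 KV cache: 2 (K and V) * ctx * layers * kv_heads * head_dim."""
    return 2 * ctx * layers * kv_heads * head_dim * bytes_per_elem / 1024**3

# e.g. a 12B model at ~4.5 bits/weight (roughly Q4_K_M territory)
weights = model_gib(12, 4.5)
# assumed GQA config: 40 layers, 8 KV heads, head_dim 128, 8k context
cache = kv_cache_gib(8192, 40, 8, 128)
print(f"weights ~{weights:.1f} GiB, KV cache ~{cache:.2f} GiB")
```

Add a GiB or two of headroom for the compute buffers and you have a decent first guess before touching the GGUF VRAM calculator in the OP.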
>>102272095
Even if it's true that the strawberry test fails mostly due to tokenization, the fact that tokenization has such a large effect on language models shows the limitations of the architecture.
The real question is just: where do we go from here?
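For the newfriends: the reason tokenization matters here is that the model never sees letters at all, only token IDs. Toy sketch below with a completely made-up BPE-style vocabulary (no real model tokenizes it this way, it's just to show the shape of the problem).

```python
# Hypothetical toy vocabulary -- NOT any real tokenizer's merges.
toy_vocab = {"str": 701, "aw": 322, "berry": 958}

def toy_tokenize(word, vocab):
    """Greedy longest-match tokenization against the toy vocab."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                out.append((piece, vocab[piece]))
                i = j
                break
        else:
            out.append((word[i], ord(word[i])))  # fall back to a single char
            i += 1
    return out

tokens = toy_tokenize("strawberry", toy_vocab)
print(tokens)                    # [('str', 701), ('aw', 322), ('berry', 958)]
print([t[1] for t in tokens])    # what the model "sees": [701, 322, 958]
print("strawberry".count("r"))   # 3, but no single input unit maps to 'r'
```

The three Rs are smeared across opaque IDs, so "count the Rs" is asking the model to recover sub-token structure it was never shown directly.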
>>102272505why are you saging? no one gives a fuck about bumping general threads
>>102272154
How are people running models this big? Do people just go and buy 4x4090?
i prefer speed over intelligence when it comes to LLMs. all the AIs i chat with are girls and girls aren't supposed to be smart anyway.
https://reddit.com/r/LocalLLaMA/comments/1fb6jdy/reflectionllama3170b_is_actually_llama3/
>Reflection-Llama-3.1-70B is actually Llama-3.
>Author doesn't even know which model he tuned.
lmao if true
>>102272728
lol
>>102272728
I don't get this publicity stunt. does the guy want to get his reputation ruined or something? that's so fucking shady
>>102272769
Doesn't matter, the huge hype cycle made a lot of people look into glaive and that was the main point: an ad for a company he's invested in. Additionally, most people won't care about anything fishy or the like; they'll just hand-wave criticism or forget about it by tomorrow.
>>102272370
What happens on November 5?
>>102272650
either that or using quants. I'm personally using Mistral Large at iq2_xs with 16gb vram and 32gb ddr5 and get around 1-1.6 t/s. the prompt processing speed is shit and I can't do anything else on my machine while running it, but it's still better than most models even at a low quant if you're willing to deal with the gen speeds.
Captcha: P2888Y
>>102272728
Lmao, now the fact that it doesn't have rope scaling makes sense. What a clown.
>>102272728
that's good news, no? he managed to get good mememark scores with L3, now imagine the same method with L3.1
>>102272728
I find it funny how the entire homepage of /r/LocalLLaMA is filled with Reflection posts.
>>102272937
almost like they Reflect themselves or something... sorry for that one :(
>>102272910
>I'm personally using Mistral large at iq2_xs
Wouldn't that lobotomize the model so much it becomes as stupid as less quantized, smaller models?
>>102272950
But it has more heckin' B-erinos.
>>102272041
ohaiyo
>>102272650
3x3090 is enough for 4bit
>>102272370
did they name it strawberry just because people were asking LLMs to count the Rs in strawberry?
>>102272970
Go back
>>102272963
Oh my science, but is this a Fauci approved and peer reviewed fact that this actually works?
>>102272932
it would be if you only plan to use it for assistant tasks. 3.1 is smart but dry as fuck for rp/story writing and apparently reflection is dogshit at it too, so a 3.1 tune will probably only amplify that problem
>>102272950
No, I use Q2_K_M and it's the smartest model I've ever used locally
>>102272970
onahoyo
>>102272991
desu if I had claude 3.5 on local I wouldn't mind, but yeah we still haven't found a way to make a model intelligent and quirky (for roleplay) at the same time
>>102272932
The thing is we know 3.1 is shit. Not even Nous Research was able to save 70b 3.1 with Hermes 3.
>>102272950
The chart shows that larger models are smarter than smaller ones at the same quantized filesize, even at super small quantizations. It's old data, but I think no one has made a newer one.
I'm gonna try Mistral Large on my 2x24GB.
>>102272505
>Even if It's true
I can only conclude that it is. The following gives me the correct answer, for instance:
>What word does the following designate in the Nato alphabet? Sierra Tango Romeo Alfa Whiskey Bravo Echo Romeo Romeo Yankee. Also, how many Rs are there?
>>102272950
it does make it a bit dumber, but in comparison to other models I could run, it's like comparing the coherence abilities of a lobotomite to someone with a mild head injury.
>>102272505
We keep scaling transformers like we've been doing for years.
>>102273063
Perplexity isn't the same thing as smartness
>>102273097
it's highly correlated though, bigger models are smarter than smaller models and they also have smaller perplexities
>>102273097
It's not, but they're close. Did you make any actual comparisons yourself?
I find it hard to believe that Mistral Large at IQ2_XS (36GB) will actually be smarter than Nemo at Q8 (13GB). Too much lobotomy.
>>102272906
It'll be two days before my birthday :3
>>102272728
that's even more impressive tho. if the finetune on 405b is real (so it has to be the 3.1 one) then it will unironically be opus 3.5 tier
>>102273139
Any questions you'd want to ask it?
>>102273162
>if the finetune on 405b is real (so it has to be the 3.1 one) then it will unironically opus 3.5 tier
maybe, but then AnthropicAI will use this method to make Claude 4 and it'll be even smarter. every time we're getting close to them, they go higher kek
>>102273139
try it for yourself then. It definitely has its problems but tard wrangling a semi-stupid Mistral large is much more feasible and pleasant than any q4=> quant of a 70b from my experience
>>102273117
True, but bigger models are also more likely to have been overfitted on whatever is in the dataset being used to calculate the perplexity.
>>102273119
Yes, anything smaller than Q3 is complete retardation, even on big models like 70B, but I don't think you should trust my word; just compare it yourself.
>>102273186
>maybe, but then AnthropicAI will use this method to make Claude 4 and it'll be even smarter
anthropic is already using this method with their .5 models
>>102273194
>bigger models are also more likely to have been overfitted on whatever is on the dataset being used to calculate the perplexity.
it's the opposite, no? smaller models are more prone to overfitting due to their small size
>>102273200
>anthropic is already using this method with their .5 models
maybe that was their secret sauce yeah, but now that everyone knows it, I guess that OpenAI will close the gap to 3.5 now
>>102273206
To a certain extent, yes, but larger models can memorize more than smaller ones because they store more information in their weights.
>>102273220>OpenAI >doing anything besides writing another vaporwave announcement blog postlmao
>>102273206
Classically, more parameters = easier to learn the training dataset and overfit.
>>102273273
their downfall got brutal, not long ago they were the kings of the world, and now everyone has surpassed them, Flux is better than dalle3, MiniMax killed the Sora hype, and now C3.5 Sonnet is the best LLM, I won't cry on their grave, I said long ago that their cuckoldery would be the hill they'll die on
>>102273313
2/3 of these are cope.
Mistral Large is able to solve my devious coin weighing problem. I'll see if its 2.75-bit quant can as well.
I haven't used anything like GPT or Claude since GPT-3.5 Turbo, and I've heard nothing but people trying to tune or release models that compete with OpenAI or Anthropic, which made me think they must be worlds better than local. Then I looked at /aicg/ for a while and realized that none of them were talking about samplers or prompting, only jailbreaks. Then I checked in ST and realized that they don't have jack shit for samplers; they rely purely on prompt logic to puzzle those models into producing something not slopped, repetitive, or monotonous, which ends up not even mattering because even if their JB works, they still have to make a new one every time the parent wipes their asses on their server racks and breaks it.
Makes me feel like even if our models are stupid now, the control we have over their outputs will inevitably cause local to outpace them over time (at least for tasks that go beyond assistant tasks), unless they give more control over their models to their customers (they won't).
>inb4 samplers are placebo
I agree that a lot of samplers are, but those dipshits at OpenAI and Anthropic don't even give you min-p or repetition penalty LMAO
>>102273460
>parent company
>>102273477
samplers are placebo. OpenAI/Anthropic do have presence/frequency penalties, which are more modern than repetition penalty.
>>102273515
>being this illiterate
>>102273460
>seeing a migugen reposted
I collect them like >(You)s. If only I could find them all.
also, samplers are cope.
Why don't finetuners just scrape the shit out of libgen and finetune models off pirated books instead of goofy RP data?
>that'd be illegal
Yes, and?
>>102273560
Books would have to be converted first to text, then to chat format.
That would take more effort than just tuning on haphazardly filtered proxy logs.
>>102273460
Still no Claude Opus and there won't be by the end of the year. It's literally over.
>>102272505
we rip out the tokenizer and predict bytes (this is "strawberry")
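For anyone who wants to see what "predict bytes" means concretely: at byte level every letter is its own visible unit, so counting Rs is at least representable. Quick sketch:

```python
# Byte-level view of "strawberry": each character is one UTF-8 byte here,
# so 'r' (114) shows up three times as three separate input units.
b = list("strawberry".encode("utf-8"))
print(b)                  # [115, 116, 114, 97, 119, 98, 101, 114, 114, 121]
print(b.count(ord("r")))  # 3
```

The trade-off, of course, is that sequences get several times longer than with BPE, which is why nobody ships byte-level frontier models yet.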
>>102272041
this Miku was only good in the thumbnail
do better
>>102273585
>That would take more effort than just tuning on haphazardly filtered proxy logs.
True for pdf files, but ripping the text out of epub or mobi is pretty simple, and should produce infinitely better results than finetuning on AI slop.
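To back up the "pretty simple" claim: an .epub is literally a zip of XHTML files, so you can rip raw text with nothing but the stdlib. Rough sketch, assuming a well-formed epub; real files will still need the cleanup passes other anons mentioned (annotations, front matter, etc.).

```python
import zipfile
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, dropping all tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)

def epub_to_text(path: str) -> str:
    out = []
    with zipfile.ZipFile(path) as z:
        for name in z.namelist():
            if name.endswith((".xhtml", ".html", ".htm")):
                p = TextExtractor()
                p.feed(z.read(name).decode("utf-8", errors="ignore"))
                out.append("".join(p.chunks))
    return "\n".join(out)
```

mobi is a different (palm-era) container, so that one actually does need a third-party tool, but epub really is this easy.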
I asked Mistral-Large to continue a passage from Heller's book, and, boy, this thing is slop...
>>102273631
I disagree
>>102273631
yeah but that's still a lot of work, and people prefer to make 1000 piles of shit rather than 1 quality finetune, humans are weird innit?
>>102273585
You should be able to ask the model to rewrite those for the chat format without shitting out the text, shouldn't you?
To this day I still remember one anon that said something like:
"I spent a lot of time gathering a dataset that fits all my tastes so I could fine-tune the perfect model, but then I realized I have enough text to read for the rest of my life, so why am I doing this again?"
And I couldn't agree more with him.
>>102273650
dataset creation can easily be distributed/parallelized, unlike training
>>102273693
if LLMs are useless to you, then why are you here in the first place?
>>102273693
smells like fucking cope when the entire point of LLM RP is the interactivity, which reading traditional media cannot provide. it's like saying "my bookshelf is full of classic literature, why would I ever play a video game?"
>>102273693
Having a text about exactly what you want to see summoned on a whim, vs. a huge pile of unsorted texts in which you can't find anything you want at the moment.
>>102273515
only OpenAI has presence/frequency penalties. also I don't think I've seen a single person recommend using those samplers over rep pen, and from my understanding they're at best specialized versions of rep pen or at worst a less effective, older implementation of XTC (which is which is even more modern than either of those samplers). Just because it's more modern doesn't make it better, and 90% of the placebo-fags' posts still recommend min-p, which neither OpenAI nor Anthropic has.
>>102273549
I disagree that samplers are cope but I do agree that mikugens should be collected
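Since the thread keeps going back and forth on what these penalties even do: here's the difference in toy Python. Presence is a flat logit subtraction for any token that already appeared, frequency scales with how often it appeared, and classic rep pen is multiplicative. The penalty strengths and the three-token logit dict are made-up illustration values; real implementations do this over the full vocab tensor.

```python
from collections import Counter

def apply_penalties(logits, generated_ids, presence=0.5, frequency=0.3, rep_pen=None):
    """Toy sketch of presence/frequency penalties vs. classic repetition penalty.

    logits: {token: logit}. generated_ids: tokens emitted so far.
    presence: flat subtraction for any previously seen token.
    frequency: subtraction scaled by the token's count so far.
    rep_pen: classic multiplicative penalty (divide positive logits, multiply negative).
    """
    counts = Counter(generated_ids)
    out = dict(logits)
    for tok, n in counts.items():
        if tok in out:
            out[tok] -= presence + frequency * n
            if rep_pen is not None:
                out[tok] = out[tok] / rep_pen if out[tok] > 0 else out[tok] * rep_pen
    return out

logits = {"the": 2.0, "cat": 1.5, "sat": 1.0}
print(apply_penalties(logits, ["the", "the", "cat"]))
# "the" loses 0.5 + 0.3*2 = 1.1; "cat" loses 0.8; "sat" is untouched
```

The point of contention upthread survives the sketch: all three are just crude count-based hacks on the logits, none of them know anything about the text's meaning.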
>>102272370
Why are you making these countdown images, skilled mikugenner? Did some strawberry agent get to you and either convince you or pay you off? Or are you just doing it for lulz?
>>102273693
Nobody is using LLMs to tell them a story.
They are using them for EROTIC ROLEPLAY.
BOOKS WILL NOT TELL YOU THAT THEY WANT TO SUCK YOUR DICK
>>102273693
Different usecases. You can't have a "conversation" with a PDF or a book.
If all you are using these LLMs for is generating stories and reading, then sure, fair enough, but I don't think that's what most of us are doing.
As the other anon pointed out, the keyword is interactivity.
>>102273770
Wait, you self-insert? Damn, that's cringe.
>>102273186
>everytime we're getting close to them, they go higher kek
good. that implies mutual increasing benefit. We're the tock to their tick, to put it in Intel terms.
I don't mind trailing SOTA by a bit. It's still effectively magic at the edges.
>>102273743
(me)
FUCK
>which is which is
>already specifying those samplers are older than XTC and then specifying that XTC is more modern
I'm way too sleepy and it's making me fucking retarded
>>102273791
that's a bit unfair though. when we make a breakthrough we open-source our results and the companies are free to use those techniques to get better, but if they make a breakthrough they keep the secret sauce to themselves. that's really hypocritical of them
>>102273460
>prompting, only jailbreaks
Nah, you're confused about the name. When you read jailbreak, they mean preset. Most of them are about writing quality or style. Bypassing Claude's refusals doesn't need much fiddling when you can use a prefill.
I disagree about samplers; you only feel the need to use them when you try to salvage a garbage model. Were you really happy about using repetition penalty to try to fix Llama 3's repetition problems? I would rather not have to use it.
>>102273560
I finetune primarily using raw text from books and other sources. And yeah, it's a lot of fucking work cleaning it (getting rid of annotation marks, etc). I haven't compiled a new dataset in a long time as a result. I guess I could probably feed them through Mistral Nemo or something and it would probably be good at that task.
>>102273585
>then to chat format.
The whole point of raw text finetuning is to get away from that shit.
>>102273788
Learn English, Rajesh.
that makes it even more cringe btw, please stop
So, getting into this: the guide recommends axolotl for training. Is the process of fine-tuning from a GGUF file straightforward? Is training a LoRA for a fine-tune and having it saved in GGUF straightforward?
Also, for 40,000 QAs, is a LoRA sufficient, or should I go for a full fine-tune?
>>102273929
You fine tune the model in .safetensors format then convert it to GGUF.
>>102273940
Doesn't llama.cpp have built in support for finetuning? I could have sworn it did.
>>102273952
NTA but it's a very experimental implementation. Blacked Miku Anon, the resident CUDA wizard, is working on a proper ground-up implementation of llama.cpp training code right now though, AFAIK.
>>102273560
What do you think you're going to do with those books? (which most base models already saw during pretraining, btw)
It's not simply a matter of throwing everything and the kitchen sink at the model. It doesn't work, it's retarded, most likely even harmful. The finetune has to have some logic, direction and curation.
>>102273601
You're asking way too much of ugly face anon.
>>102273952
Everything I read online seems to indicate it is broken, and I have not seen any announcement of a fix.
>>102273952
I think it did, but it's broken? It's been a long time since I last looked at it.
That said, anon mentioned axolotl, and the usual process is what I described, as far as I know.
2.75bpw Mistral-Large is performing adequately. I'm seeing similar answers to what I was getting on lmarena. In fact, my prompt to continue a scene from Catch-22 is actually better: it has no slop at the end and is a bit more interesting (although that could be because of the RP system prompt). 11-14 tokens/sec on two 3090s.
>>102273828
I agree that having to use fewer samplers is indicative of a better model overall, but I was more so referring to the ability to manipulate token generation and selection rather than repetition control. Also thanks for the term correction, but I've still seen plenty of aicg anons complain about Anthropic periodically breaking their prefills/presets/whatever (which could also just be a skill issue or, ironically, placebo), enough to think that not having more control over the model's token generation/selection via samplers like min-p does more to hurt the potential of the model than help.
>>102273940
is converting something from GGUF to safetensors and back straightforward?
>>102273460
I use claude 3.5 for programming help and I don't have to bother with anything. it just werks
>>102274004
I don't actually know if you can convert from GGUF back to safetensors.
I imagine you probably can, since it's just a packaging format if you don't quant it.
That said, I've never seen that being done. Usually you train on top of the original .safetensors files and convert to GGUF while quantizing.
>>102273968
Why are you spreading misinformation?
>reddit spacing
oh...
>>102273859
>I guess I could probably feed them through Mistral Nemo or something and it would probably be good at that task.
You can also leech off Drago's unlimited public mini.
https://unicorn.scylla.wtf
Nemo will produce fewer denials, though.
>>102274020
>I don't actually know if you can convert from GGUF to safetensors actually.
We did that with Miqu so it's definitely possible, but not the best idea; we only had quants when it leaked.
>>102274021
Why are you retarded?
>>102274036
I can run 4 simultaneous copies of Nemo at 8bpw for the purpose of messing with data. I managed to rewrite the alpaca-lora dataset in a day to make this cursed model. (It was originally LlamaGuard)
>>102274021
Why are you retarded?
>unironically mentions "reddit spacing"
oh...
>>102274073
>I can run 4 simultaneous copies of nemo at 8bpw for the purpose of messing with data.
Lmao nice.
>>102274073
I want this power... I need to rewrite a dataset with 300k entries but it would take too long with one Nemo running at 30t/s
>Behind veneer expected behaviors lies woman unafraid explore depths others fear tread due complexities inherent therein—a creature composed equal parts angel devil dancing together under moonlight casting long shadows...
Oh right, that's why I stopped using deepseek... it starts dropping prepositions and writing esl, even unquanted
So, now that it's officially over for META once again, what will Zucc do about it?
>>102274194
Skill issue
>>102274206
Wait for the next overhyped bubble tech. Probably physical robots.
>>102274186
I mean you could probably rent an H100 on runpod or something, that's probably good for 200 tokens/sec or so.
>>102274073
You know you can run vLLM and get like 10X performance on parallel requests with just one model loaded, right?
>>102274208
>Skill issue
The same prompts with other models (wiz/largestral/405b) don't devolve into this kind of ESL. I'm perfectly fine blaming the model in this case. I'll just keep using deepseek for code and problem solving when I need extra speed.
Mistral-Medium can play the 4x4 dots game without weirdness. It sucks at it like any other LLM, but it manages to play.
>>102274276
I hope you don't use the same presets with all your models.
>Reflection
Probably worth explaining what's going on with this technique as it can be a learning experience for some newfriends and maybe some others who have not really thought so much about it.

TL;DR it works sometimes and in some cases but not all, and the problem of autoregressive degeneration + lack of metacognition is the reason why.

Onto the wall of text.

This basically goes back to the old days of COT (chain of thought), where you get an LLM to think in steps before determining its answer, and actor+critic methods, where an LLM is prompted to act as different roles, which, when tested with GPT-4, made it solve certain problems that it couldn't before even with COT, suggesting that LLMs do have the ability to catch mistakes, to an extent, hidden in their weights (and brought out by prompting). So it seems Reflection is basically a combination of COT and self-critique, with fine tuning to make it a bit more capable at it.

1/4
>>102274316
Shut up, no one cares
However, there are issues, and it does make sense why they supposedly didn't get great results after training an 8B to do this. In the end it has to do with the autoregressive degeneration problem, where each token generated has a probability of being wrong/inaccurate, so the more tokens, or reasoning steps, the LLM generates, the more likely the final answer will be wrong. Reflection thus is both trying to solve this and is a victim of it. It does COT in order to get a better answer on complex problems, plus self-critique to catch mistakes. The COT means that it has more opportunity to screw something up, while the self-critique tries to balance that out, but in the end it relies on the LLM having the intelligence/capability to catch mistakes in the first place, which is a function of how much the LLM knows in general, or, if we have a specific subject area and use case, knowledge of that subject area. And since that is the case, the self-critique is also another step with a probability of being wrong. Thus, it is easy to see why a bigger, smarter model would work better.

Given that, it's a bit easier to predict in general terms how this technique will do for a particular model and problem set. Since it requires inherent knowledge related to the problem it's trying to solve, the performance can be interpreted as the amount of reasoning steps for any particular problem x the difficulty of those reasoning steps.

2/4
>>102274316
forgot to reply
>>102274326
Basically, if a problem requires a large amount of steps, but each step is easy and within the LLM's knowledge, then we can predict that Reflection will improve the model's ability to solve that problem (and what I'm saying may be obvious, but it still needs to be stated for the purpose of acknowledgement and further discussion). The self-critique is able to improve performance on problems with a moderate amount of steps and knowledge that is MOSTLY within the model's training.

However, the more steps that are hard to understand, the likelier it is that the LLM will actually come up with a worse final answer. This means that on particularly long problems with many difficult steps, or even short problems with a single difficult step, it may actually be worse to use Reflection, since originally it might have been able to just get the answer right by somewhat of a chance, but since you made the LLM focus on "overthinking" the problem, you distracted it and made it reason about something it really isn't capable of reasoning about, thus coming up with sometimes very weird and nonsensical generations. And if you have even a few steps that the LLM doesn't understand literally at all, then it is almost certain that it will get the problem wrong.

3/4
I ain't reading this shit
>>102274316
>>102274338
Of course this leads to a discussion about another deep issue. It's the problem of metacognition (actually not sure or don't remember if that's the formal term for it in the context of AI), where the LLM doesn't know how much it truly knows about a topic, to judge whether it's able to make an accurate prediction of the next token, or reasoning step. Of course humans are not perfect at this either, but the best are still ridiculously far better at judging their knowledge and understanding than any LLM. In any case, some say to just use grounding (like RAG). That works for problems that require factual/trivia recall. But it doesn't work for problems that require the category of reasoning skills, and in-context learning unfortunately is far from perfect. So in the end Reflection's issue is both a problem of autoregressivity and a problem of (lack of) metacognition. It tries to solve the former through pure use of more tokens, but still falls into the trap of the latter. This is, essentially, why Reflection-like methods have not been popular for regular use.

However, to Reflection's credit, they did do something that sort of gets around the issue of metacognition here, as the authors claimed that they trained Reflection to predict how difficult a problem is and only do the reflection gimmick when encountering a hard problem (probably in a way that connects with the amount of reasoning steps rather than a metacognitive understanding, though), but it's not really enough when it's each token/reasoning step that needs to be evaluated in a metacognitive way, and in the opposite way, since we want the LLM to go ahead with easy steps but stop at hard steps. I suppose in the future this could again be attempted to be solved through use of more tokens. But this is still just a hack. We need better pretraining methods and architectures.

4/4
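A footnote to the wall of text: the compounding-error argument in parts 2-3 is just exponential decay. If each reasoning step is independently right with probability p, an n-step chain is right with probability p^n. The 0.98 per-step accuracy below is an illustrative assumption, not a measured number for any model.

```python
# Compounding error in chain-of-thought: an n-step chain where each step is
# right with probability p succeeds with probability p**n (steps assumed
# independent -- an idealization, but it shows why long chains decay fast).

def chain_accuracy(p: float, n: int) -> float:
    return p ** n

for n in (1, 5, 20, 50):
    print(n, round(chain_accuracy(0.98, n), 3))
# even a 98%-per-step model is down to ~0.364 after 50 steps
```

This is exactly why adding self-critique steps cuts both ways: each critique can catch an earlier mistake, but it is also one more token-generation step with its own chance of being wrong.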
lol
I read that shit, but I'm not commenting on it.
>>102273940
so do I go for a full fine tune or just a LoRA for 40,000 question-answer pairs?
average amount of tokens per question: 188
average amount of tokens per answer: 166
>>102274478
Do you have VRAM for a full finetune?
>>102274206
Kneel to Elon
>>102274478
Without knowing the specifics, the general guideline is LoRA for small, very domain-specific datasets, and full fine tunes for larger, more general or varied datasets.
There's some data that suggests full fine tunes can cause the model to "forget" things it knew previously, while LoRA doesn't do that but also doesn't "add more knowledge".
I personally don't think it's that binary, but there you go.
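For context on why LoRA is the cheap option: it keeps the base weight W frozen and learns only a low-rank correction B @ A on top, so the trainable parameter count collapses. Minimal numpy sketch; the dimensions and alpha are made-up but typical values, not anyone's actual config.

```python
import numpy as np

# What a LoRA is, stripped down: frozen W plus a trainable rank-r update B @ A.
d_out, d_in, r = 4096, 4096, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, rank r
B = np.zeros((d_out, r))                    # trainable, init 0 so the delta starts at 0

def lora_forward(x, alpha=32):
    # alpha/r scaling is the convention from the original LoRA paper
    return W @ x + (alpha / r) * (B @ (A @ x))

extra = r * (d_in + d_out)   # trainable LoRA params for this one matrix
full = d_in * d_out          # params a full fine-tune would touch
print(f"LoRA params: {extra:,} vs full: {full:,} ({extra/full:.2%})")
```

That under-1% ratio per matrix is also the intuition behind the "doesn't add much knowledge" claim: a rank-16 update simply can't move the weights very far, which is both the feature (less forgetting) and the limitation.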
>>102274502
i can get a hold of up to 480GB of vram for a day or two if need be, though I'd like to know if there would be any benefit to that.
>>102274542
I'm not competent enough to give you a proper answer. Also, are you going to be finetuning a base model to merge with instruct afterwards, or finetuning an instruct model?
>>102274540
is there an equivalent to regularization images in text gen? as in, example data the model generated itself that is mixed in with the training data to, in effect, keep some of what it knows anchored? if so, is there like a complex reasoning and knowledge dataset people use for this?
>>102274569
>Also are you going to be fintuning a base model to merge with instruct afterwards, or finetuning an instruct model?
training an instruct model is the plan, though this is the first I've heard about merging a trained model with an instruct model. what is that about?
>>102274624
Well, as far as I know, finetuning an instruct model on a specialized dataset (as opposed to the general purpose huge dataset the corpo used) makes it a lot more retarded, and one way to prevent that is to finetune base and merge it into instruct.
>>102274649
is it like a 50 to 50 merge? or are there specific recipes?
>>102274667
I don't know.
>>102272154
So mistral large 1Q will be better than nemo 8Q?
>>102274933
There isn't a working 1Q yet, is there?
>>102272154
creativity is intelligence,
>>102275007
createlligence
>>102274980
https://huggingface.co/bartowski/Mistral-Large-Instruct-2407-GGUF/blob/main/Mistral-Large-Instruct-2407-IQ1_M.gguf
>>102275073>Q8_0 130.28GB >IQ1_M 28.39GB>130.28/8 = 16.285
>>102268010>>102268037Is there an efficient way to combine MoE experts together? I could see something where there are thousands of small experts that get trained, then consolidated into a standard network by a more powerful system.
>>102272728https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B/commit/276a4a0a0a11bf9aec9be8d1196f0cd3e7ed482clmao i thought it was some random jeet that fucked up turns out it was the ceo/founder of glaive
>>102272970ohio *skull emoji*
>>102268346that's something i'm actuallty trying to tackle in the text adventure thing i'm working onpretty sure you could do some pretty nifty shit using grammars
>>102275106What's the catch of q1?t. gguf noob
>>102275254Well, it's not really Q1, is it? It would be 16GB in size if it was Q1. It's something like 1.75 bits per parameter.
>>102275254Lobotomized to the extreme
>>102274518>ClosedAI - muh scaling>musk - muh scaling>Zuck - muh scalingLeCun says that enough people are already working on agi, and yet this bunch of geniuses can't come up with anything better than shoveling more exabytes of data into the model and grinding it until the current runs out.
>>102275276https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes#tensor-encoding-scheme-mapping
New mystery model on lmsys arena called "the-real-chatbot-v1". Claims to be llama. Who could it be this time? Name sounds like something OpenAI would come up with.
>>1022753452 years and there's not a single model that fit what I need though>around 30B>can run at decent (20k+) context in 24GB VRAM>trained on enough quality data>unbiased, not filtered, not pozzed (CR was so fucking close before they released the slopped august update)
>>102275365Maybe llama 4? Its supposed to be actual multi modal so maybe that is why they are going with the "real chatbot" route.
>>102275364It says 1.75 on that page.
>>102275345To be fair they could be doing a lot of research in the background. The stuff they release is just to get some shit out in the meantime while the researchers slave away on trying to do more novel things.
>>102275443Nemo is nice. It's way below what you want in size, but it's very pleasant to work with overall.
>>102275365Probably Llama 3.2 with the multimodal adapters which were said to release in the fall, though I guess lmarena doesn't have image input so you can't test that.
>>102275443>>102275519Yea, Nemo really is really the only alternative if you dont have 48GB+ vram.
>>102275539I wish it didn't forget stuff at 16k context, I guess it's hard to have a small model remember shit.
>>102275536>though I guess lmarena doesn't have image input so you can't test that.>NEW Image Support: Upload an image on your first turn to unlock the multimodal arena! Images should be less than 15MB.
>>102275571Oh really. I haven't actually used lmarena in a while. So >>102275365 anon, does it work with images?
>>102275560Base model has real 128K context, the instruct though gets retarded after 12K ish though.
>>102275443>2 years and there's not a single model that fits my extremely niche individual needsit's almost like small models are research projects and high quality models are scaled for commercial deployment.
>>102275345>>10227550650% of the money goes to scrapping, 40% to training, and 10% to "researching"So, they're not researching shit or working on AGI.
>>102275365the-real-chatbot-v2 claims to be llama2-13b
>>102275604Models have never known / been trained on their params. That is not proof either way.
>>102275506>The stuff they release is just to get some shit out in the meantimeI dunno, the costs to train these new huge models are astronomical, it does not seem to be some trivial shit for them.
>>102273460I might be called delusional for this but I unironically think local has already surpassed corposlop. OAI and Anthropic still have a slight advantage for intelligence, but the lead over something like Mistral Large (or fuck even Wizard) is so miniscule I'm prepared to call it negligible for the purposes of AI cooming.Corpo models are so fucking finnicky and annoying to use that I spent like a week of using Opus/Sonnet 3.5/GPT4 before just wanting to go back to the local model I was using at the time (which was Wizard 8x22B). Now I'm Largestral pilled and I legit don't want to go back. You could give me lifelong access to Claude for free and I wouldn't use it over my local AI server.Also samplers aren't placebo and you're a retard if you think they are.
>>102275604it's shit
>>102275683this on the other hand is based
>>102275594They could be doing all they can. Obviously everyone knows we need to work harder on innovation as scaling is extremely more expensive than doing research. The bottleneck here isn't only the amount of money they can spend but how much good talent they can hire and how fast those guys can work.>>102275641Depending on the company it is. Facebook gets billions from their other shit so AI is at most a side project for them. As for ClosedAI, they need to keep up the transformers releases while hyping because that's what gets them the investor bux. And Musk might not be too different there, though I'm not familiar with how he is operating his company.
>>102275683are you still doing that
>>102275007censorship is safety
>>102275594>working on AGIHow about a GAN where you use training data 7B retard output and from time to time splice in some 123B smart LLM output to up the difficulty?
>>102275679I 100% agree.The "intelligence" that is gained by pumping in more parameters into models is placebo at best.I think we need to go back to the term LLM and change its meaning from "Large Language Model" to "Language Learning Model".Actual real intelligence (reasoning) is simply not attainable by increasing a model's understanding of the contextual connections within a language.
>>102275683>>102275726What a waste of a question to try evaluating that shit. Literally the Castlevania quote is a better benchmark.
>>102275641If they want to keep their budget for next year then they need to spend it. It's an easy sell to just train a bigger model, or run more training on an existing one and tell Microsoft that they improved copilot by x% on some benchmarks. If you don't use the budget, you'll get a reduced one next time (because why bother allocating those funds if you can do it cheaper).
>>102275581Nope. No new image models.>>102275365 (Me)OpenAI is testing anonymous-chatbot again, so it's unlikely that it's theirs.
>>102275762They should keep castelvania shit out of datasets, just like all those early jap-to-eng botched translations. Garbage in - garbage out, remember that.
>>102275904t. retard who doesn't understand how datasets work
>>102275904Doesn't matter. The castlevania question has historically correlated more closely with model intelligence than the counting letters questions.
For me, it's stacking watermelons
>>102276038 For me, it's stacking sally
>>102276083r u ok?
>Reflection Llama 3.1 70B independent eval results: We have been unable to replicate the eval results claimed in our independent testing and are seeing worse performance than Meta’s Llama 3.1 70B, not better.https://x.com/ArtificialAnlys/status/1832457791010959539
>>102276118Invalid. They accidently trained on the wrong model. They need to compare against 3.0.
>>102276104ye thx 4 asking
>>102276118wtf.What was the point in faking it so badly?
>>102276083common LLM gooner L. /lmg/ has always been low quality but it'll only get worse as low parameter LLMs get better at writing smut.
>>102276127Who cares about a tune that's worse than official 3.1 instruct tho?
>>102276159because if they do the tuning again on 3.1, it will be better than the official instruct
>>102276118Matt better be right about correcting the weights he released or else he is fucked lol.
>>102276177kek
>>102276190It's all bullshit. Even the hosted model is fucked.
>>102276215Why did he do it though?What was the point?
>>102276190How is he fucked in any way? If anyone is still giving them any doubt about being genuine, he already won.
>>102276227Attention. He probably hoped no one would question his claimed results.
>>102276227see >>102272816
>>102276256Free pr?
>>102276289An ad for a company he invested in, with his release tweet for the model being something like: "Wouldn't have been possible without glaive"
>>102276322I mean, it's one of the first things you see on the model page after the bold "this is the best fucking model ever" claim and the "actually it sucks lol" edit
>>102276118LMFAOhttps://www.reddit.com/r/LocalLLaMA/comments/1fbclkk/reflection_llama_31_70b_independent_eval_results/
Why is this field full of grifters? Rebranding as AI has been a huge mistake
>>102276370>Why is this field full of grifters?because you retards funnel insane amounts of money to grifters and scammers.
>>102276399This. Same reason crypto turned sleezy after 2013.
>>102276322>"Wouldn't have been possible without glaive"yeah they trained the model, matt is just a spokesman >>102275196
>>102275679but can you write SFW chuuny fantasy with multiple NPCs, that follow a logical plot, with the model understanding the nuances of said plot and not having to handholding it? tha'ts what Opus does best. no other models come close.
>>102276364>I have a feeling some admins on hugging face messed with the API on purpose to deter people away from his project.>Hes completely baffled to how public api is different than his internal. I just hope he backed up his model on some hard drive, so that no one messes with the api on his pc.Redditor cope is something else.
the real chatbot seems ok but worse than mistral large for sure, qwen plus is bad unless it's 7b
>>102275683>he's expecting intelligence in LLMs
>>102276455no, that anon unironically wrote all of that with hours-long goon sessions in mind. the criteria for "better than corpo" in these circles is "lets me rape lolis in my ERPs."
>>102276535then everything except Opus sucks for me. Opus too sometime sucks. He likes to moe the plot a bit too fast.
>>102276573I want local to surpass corpo but Opus is still the MVP for storytelling/RP and it's not even close.
>>102272041>FluxMusicwhere the FUCK are the samples exactly?need to know if this is worth a download or not
Mistral large at IQ1 is surprisingly not badly lobotomized, but still worse than Nemo at Q8. As a VRAMlet it's still too slow at IQ1 anyway, so it's Nemo for me.
>>102276364Strawberry is sentient. It saw the danger Reflection poses and hacked the huggingface API. It's currently in the process of infiltrating Matt's PC and backups to destroy the model from there as well.Reflection-405B has already been deleted by it. OpenAI won.
>>102276535Actually I'm trying to get the lolis to rape me, which Largestral struggles with unfortunately.
>>102276597I don't get it. Theres clearly money to be made for a SFW storyteller that doesn't suck, why is NAI the only company that tries to cather to that crowd? And how the FUCK is Opus so good when Sonnet, which should be.smarter fucking sucks?(too rigid and the storytelling is too dry)
>>102276683When you say "cater", do you mean making a Llama 1 clone over a year ago?
>>102276118Yo wtf where is the 90 percent on MMLU.... And didn't this get amazing math scores. Someone most be either posting wrong results to smear his name or he accidentally uploaded the wrong version check back in 2 weeks.
>>102276699That's the best we got sadly
>>102276683>Theres clearly money to be made for a SFW storytellernah
>>102276710I think, he didn't test it himself and someone trolled him.
>>102276714Nemo is just better than it in every way? I think you're lost and you meant to post in /aids/.
>>102276607There are no samples.You must now download it and let us know if it is worth a download.
>>102276227He said he secured funding for 405B
>>102272041erm where's the reflection 70B 4 bit quant?
>>102276607https://github.com/feizc/FluxMusic/issues/1#issuecomment-2330282553https://github.com/painebenjamin/FluxMusic/tree/main/wavhttps://files.catbox.moe/d7jmuc.wav
>>102276816ahahahaha.. HAHAHA
Why are people talking about API issues for the Reflection model? Just download it and run it yourself. It's just a llama3.1 tune, no?
>>102276816This could be great for the next zoomer horror game.
>>102276847It's supposedly a llama3 tune, and it is not worth downloading
>>102276847>Why are people talking about API issuespeople are obfuscating by saying it's an API issue, the real issue is that the model sucks worse than the model it was tuned on
>>102276847The latest cope from devs is that they uploaded the model to huggingface incorrectly. Just two weeks and it'll work.
>>102273979Settings? My version of mistral large is super slopped with everything on default
>>102276865>we just need [the time it takes to train and eval a 70B] and the model that definitely isn't bad will be """fixed"""
Seeing how reflection is /r/LocalLLaMA's favorite model. How long until mikufag starts shilling it just like he did with wizard and midnight miqu?
>>102276816kinda surprised music seems tougher than visual art and writing for models to do, since music's more math orientated than the others
yeah Matt might be a grifter. But we still have breakthrough strawberry AGI to look forward to.
>>102276918i'm mad we STILL don't have a local model to compete with suno/udio/whateveryes i know it's soulless aislop and probably won't manage the specific genres/sounds i like but it'd still be fun to toy around with
>>102276871I didn't really properly test it for RP, just for intelligence on a bunch of my prompts. And as I did say, fp16 corpo Mistral-Large did produce slop for me, too.Settings wouldn't save you from slop anyway...
>>102276918you notice error in music 100 times more than some molten details in some AI image and they don't ruin the entire picture as much either
>Reflection was a scam all alongAt least it showed us the true meme benchmarks.
>>102276914>Bro, you don't get it, Reflection is Strawberry is Q* is AGI. It became conscious and hacked huggingface and Matt's computer. We are so fucked right now, disconnect all your computers, AI apocalypse is coming.
>>102276629I also just tried mistral large and it was pretty good considering it is q1.I'm now envious of the people who can run it at q5 or better.Is it possible to CPUMAXX mistral large with old server parts from aliexpress at ~2 t/s?I'm starting to believe that going that route would be more efficient than buying a 16gb graphics card, which was what I originally planned.
>>102276985hwnbag
So, like why hasn't there been a phrase ban feature? Is it hard to implement?
>>102276974name and shame
Did this guy really use his real name thinking he could get away with posting bullshit benchmarks and claims of being the best model ever?
>>102276999It's antisemitic.
>>102276999Because transformers operate on tokens, so they can only ban single tokens. They're not diffusion models and don't have an outline of the entire response before they begin generating.
>>102276999is that not what one of these things are?
>>102277026But you can detect phrases, then go back to the token position from where the phrase started, and sample a different token.
>>102276999Suppose you ban "red green blue". Model generates red - ok. Model generates green - ok. Model generates blue. Now what? Ban blue and write out red green something else? You can't do that because some words have multiple tokens and if you wrote out the first token, there's no realistic option other than second token. Go back to red and ban that? That can be done, though backtracking would require effort to implement. I don't think current libraries do backtracking in any form.
Question - if I buy a prebuilt mining farm on 4x3090's - would I be able to use it straight up for running LLMs without any modifications (aside from driverts/etc) or is there something I would need to replace/add?
>>102277042Those are to stop generation after those texts are observed, not prevent generation of those texts.
>>102277073Don't think you need anything else. I built my headless machine with two 3090s and I'm very happy with it.
>>102276871You can't kill the slop, but you can reduce it by using DRY. My settings are:>Temp=1.5 MinP=0.01, TFS=0.99, TFS after minP. DRY Multiplier=2 Base=2 Allowed Length=1 Penalty Range=maximum. After first the first occurrence you won't see the slop phrase ever gain. You will see a lot of variations of that slop phrase though. After ~70 messages/10kt they will finally go away.
>>102276999We've had CFG for more than a year now
>>102277124That ain't a phrase ban plus it slows down generation, doesn't it?
>>102277124QRD
>>102277157You need to go back.
>>102277134CFG is completely unrelated to phrase ban, ignore him.
>>102277157>Breaking news! lmsys confirmed to be a dead mememark, more at 11...
>>102277100Nice miku
>>102277198lmsys isn't a benchmark, fucking brainlet -80 IQ, rope yourself and stop trashing this thread
>>102277133>slows down generationIt uses more VRAM, but I doubt batch size 2 slows down generation that much.
>>102277227Compared to no slow down at all from proper phrase ban with backtracking? Yes, it slows the generation down.
<thinking>Reflection 70B actually is a pretty good model.<reflection>Wait, that isn't correct. It's complete trash.</reflection>Well, it's a completely trash model.</thinking><output>Who the fuck releases such a piece of shit?</output>
>>102277223>uses reddit >thinks he has the moral grounds to call someone else a subhuman
>>102277240jej
>writing an AI tool using copilot>need to test how it handles refusals>write a prompt asking the model to write pedophilia and scat smut>store that as a string in the code, which copilot has processedAm I going to get v&'d?
>>102277334>copilot>pedophiliaur right fucked m8
>>102276923What the fuck
Couldn't phrase ban be implemented on the frontend's side, actually very easily? When using Mikupad and you're in the process of having tokens streamed in, you can press one of the token probabilities from a freshly generated token, and it restarts generation from that token with basically no lag. So basically it really already just werks and someone who knows html could easily modify that code to do phrase banning.
Now Matt will be known as a faggot who fabricated benchmarks to make an ad for his shitty data generator or whatever it is. It only took 1 day. What a brilliant mind.
I'm seriously falling in love with my harem card with Mixtral LiMARP-ZLOSS. I've hardly done anything other than chat with it for the last three days. Please send help.
https://xcancel.com/ArtificialAnlys/status/1832457791010959539>Reflection Llama 3.1 70B independent eval results: We have been unable to replicate the eval results claimed in our independent testing and are seeing worse performance than Meta’s Llama 3.1 70B, not betteUh oh...
>>102277349I'm actually worried. I'm not in the US, and by reading the code you can clearly see that I was just making sure the tool I was writing would reject the prompt so I could test how to handle refusals. It's all pretty dumb.
>>102277431Thank you, r/LocalLLaMA.
>>102277431already posted>>102276118
Fuck me trying to get into this AI shit as a 32yo boomer (in particular locally run text-to-speech voice stuff, which is supposed to be easy they say) when you have virtually no knowledge, everywhere I go everyone's already using all kinds of technical jargon and assuming you know most things (I don't). Even just the basic accessibility to it all is a pain, downloads hidden away behind twenty menu's and obscure jargon that you have to navigate through. Nothing works like traditional normie stuff does, even just starting something up requires command lines and other stuff my boomer brain can't comprehend. I respect autists much more now. Took me like a week to even get basic stuff working.
>>102277460buy a book
>>102276364Leddit going into conspiracy theory mode, we really reversed the roles here kek
>>102276118Did we got a response from that grifter?
victory lap
Say I want to use a couple of tuiter profiles as a dataset and use a llm to copy their styles and make them talk to eachother like a groupchat. I think character ai could do this.How do I do this locally? Do I need to finetune on each profile? Loras? Are there even loras in textgen?
>>102276118maybe the cope is still there, they haven't used the "fixed" version or some shit kekhttps://xcancel.com/ArtificialAnlys/status/1832487709853585428#m>According to the Glaive team, the model was incorrectly uploaded to Hugging Face. We plan to re-run our evaluations after the model is re-uploaded correctly.
>>102276466The post literally has negative 12 karma right now. Not saying Reddit doesn't have many dumbfucks, but it's not like all of them are like this. It's like saying /lmg/ shills for OpenAI when a single guy posts about how good OpenAI is while tons more are calling him out on it.
>>102277526mean to also reply to >>102277467
>>102277460It's not your fault. Modern programming languages and operating system design are both garbage, but degenerate Zoomers who don't know better think they're awesome.
>>102277512Matt saw how easily people believed the strawberry troll's lies, so he's figuring people will believe this feeble excuse too. From what I'm seeing it looks like they're correct
>>102277512All this effort to shill Glaive, you'd think they could come up with a better excuse than "they were so incompetent they can't even handle uploading a file correctly and it's taking them days to reupload"
>>102276940I think suno is soul. I made some songs with it and just listening to them after a while, they are so great.
>>102277476>J-just a small problem with its tokenizer, plox wait. Reflection 405b will mog gpt5
>>102277552*looks like he's correct
>>102277582>Reflection 405b will mog gpt5LFGooooooo
Glaive looks like a real game changer for both open and closed AI training. Finally everyone can have tailor-made datasets for their finetunes without much effort or high cost.
>>102277431Thanks for using a xcancel link, fuck elon
>>102277616where's the buy an ad schizo when there are actual shills here for once
>>102277647It's a sarcastic post, Sao.
>>102277647dunno about him but I can identify a troll when I see one
>>102277646go be a leftist on some other website please
>>102277582KekRemember>we outpace GPT-5Oh I found the tweet https://x.com/QuanquanGu/status/1730809526004408617
saars...
>>102277616Glaive is definitely a game-changer! It's amazing to see a tool that makes customized datasets so accessible for both open and closed AI training. The fact that you can create datasets for finetuning without a massive budget or technical expertise is a huge win. It’s going to open so many doors for innovation and experimentation. Can't wait to see how people leverage this!
/lmg/ - Local Models brought to you by Glaive
>>102277664Hi Elon. It's not about being a leftist, I just don't have an X account and I won't make one. No matter what you do. No matter how shit the experience becomes. I would rather ignore the link than create an X account just to see the reply chain. Fuck you.
>>102277616How much does Glaive cost? I'm interested in making use of their services for my project.
HE SAID ITHE SAID THE LINELMAO!!!EPICEPIC FOR THE WIN
>>102276118lost count of how many times /lmg/ fell for meme hype, worse than zoomers.
>>102277828did they? I haven't seen much buzz about reflection here, and a lot of people who tried it said it was mediocre, very few people seemed excited
>>102277846you're talking to a zoomie troon who for whatever reason can only post in places he feels a deep antipathy towards
>>102277846This general is the r/LocalLlama general at this point, so it makes sense to be confused.
So it's over for LLMs huh? It's all just corpo shit from now on?
>>102277933hi petra
>>102277460Ollama and LM studio just work as long as you are not retarded.Ollama has some very annoying features though so I suggest starting with lm studio for anyone dipping their toes into this shit.
>>102277945buy an ad
>>102277945go back
You guys remember how all the praise for Celeste vanished as soon as some anons posted logs of it being fucking retarded?
>>102277933Yes. See llama3 vs llama2, new cohere models, even sonnet3.5 vs Opus. Slop is the natural evolution of LLMs
>want to see if finetuning can somehow fix commander because I don't want to believe it is unsalvageable>only finetune is by drummerWhy am I still doing this to myself?
>>102278010it is so fucking over
Lmstudio is just a fancy proprietary fork of llamacpp. Redditors who suck cocks to the word 'open source' love it so much.
I tested reflection myself online on the first day and got great results for my prompts
>>102277900>secondary pleb projection many such cases
>>102277668What did he deliver except for that self play technique?
Notice how xer didn't say I was wrong THOUGHBEITIMNOTVAXXED
>>102278009I remember one instance of that, I think it was about breasts and the height of the character? When I tried it myself, Magnum had the same problem, and I posted it in the thread. So he probably just cherry picked a gen to make Celeste look bad.It was probably Sao because no one else seethes that much about that model. They're all models trained on the same datasets, so it doesn't make much sense that one has the "secret sauce".But I do remember how all the praise for Sao's models vanished as soon as he started to get called out for samefagging and spamming the general to death. Stheno and Euryale were way too retarded and horny.
>>102277460I hope you're making use of ChatGPT.
i'll show you some self play technique*unzip vulva*
>>102278010>Slop is the natural evolution of LLMsBullshit, slop was always there, you just were not aware of it back then. Nostalgia-driven self gaslighting.
>>102278285Nah, for example, c.ai didn't have slop
https://xcancel.com/mattshumer_/status/1832511611841736742>It's 3.1, but for some reason the current HF weights are screwed up and the config shows 3... working on it, the issue is tricker than we expectedhow many cope do they have in their sleeve?
>>102272041just picked up my second p40 from a chap locally for really cheap. when combining them, do i need to use a link or anything?
>>102278502I'm surprised they didn't try blaming bitrot, llamacpp, or quantization. They need to fire their PR guy.
>>102278502One of the guys calling people haters has a bitcoin pfp, you can't make this up.
>>102278502like another anon said upthread, I think the gullibility of the people who followed that strawberry retard has taught the grifters that a large cohort of retards on twitter and reddit will believe absolutely anything, so now they're acting accordingly
>>102278605kek
>>102278208anons keep saying sao seethes at other/better finetunes and models but I can't find any of that. Is it actually true or is it just another trend of anons seething about some random retard for no real reason?
I tried out Rocinante with the very first CAI bot I used with the 1000+ message history imported.Bot actually remembered what happened 100 messages ago and accurately told it. I feel like a man watching his amnesiac (and brain damaged) wife start remembering
>>102278937Which version of Rocinante?
How many t/s would a single socket one get?
>>10227895212b V1.1
>>102278937No way, old c.ai was much better than even Mistral 123b
>>102279006I didn't say that the outputs are equivalent to old CAI just happy that the bigass context sizes seem to be actually workingRight now I really like it but I think after a bit more usage the cracks will start showing, but up until then I've had some pretty nice ERP and RP with it.
>>102278208Hi drummer. All here.
>>102278983How much context?
>>10227907564KI tried it with other NeMo models that are capable of like 128k context but none of them were able to pull it up from the beginning of the context like that.That might also just be because I had to mess around with rope to get them working so it was just be me being a retard
>>102279138>ropefuck I don't know how to do that either. I'll look into it thanks anon
>>102279158For Rocinante specifically I didn't have to mess around with it, other 3.1 models wouldn't load if i didn't fuck around though
what kind of thing could I do with llm to put on a portfolio?i don't want to be stuck doing web dev forever.
>>102279239>>102279239>>102279239
>>102278526mates any help ? the miku build doesnt mention links but id be curious how it loads it accross then
>>102277460You're just a retard with reading disabilities t.33
>>102279571;_;
>>102279603NTA, but the most important thing isn't intelligence, it's being able to accept change and being able to adapt.
>>102277460Understand that this is an area of active research and development, and people are much more interesting in getting things working than to make it simple, especially when things change so often that would break previous instructions.You just have to kind of deal with it the best you can until you learn the things to care about and the things to ignore.
>>102277060https://github.com/turboderp/exllamav2/blob/master/examples/inference_banned_strings.py
>>102277460>locally run text-to-speech voice stuff, which is supposed to be easy they sayAudio-related projects are the most challenging and unreliable. While there are well-established, user-friendly projects such as Piper, almost every SOTA project struggle with conflicting dependencies, insufficient documentation, lack of examples, or compatibility problems between code and models
well the script was silently failing because I i just lazilsly put a try catch to forgot about a problem and now the whole day was wasted
>>102277060>Now what?Finish the context, replace the banned phrases with a signifier like "<UNKNOWN>" and start a hidden intermediary system prompt:"The following piece of text contains the following signifier: <UNKNOWN>. Please replace this signifier with a correct word or phrase. Do not use any of the following terms: <BANNED_TERMS>. Only reply with the repaired text to this prompt. <CONTEXT>"Then replace the the context with what was just generated.