/lmg/ - a general dedicated to the discussion and development of local language models.

Miku Edition

Previous threads: >>103478232 & >>103473510

►News
>(12/12) QRWKV6-32B-Instruct preview releases, a linear model converted from Qwen2.5-32B-Instruct: https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
>(12/12) LoRA training for HunyuanVideo: https://github.com/tdrussell/diffusion-pipe
>(12/10) HF decides not to limit public storage: https://huggingface.co/posts/julien-c/388331843225875
>(12/10) Upgraded version of DeepSeek-V2.5: https://hf.co/deepseek-ai/DeepSeek-V2.5-1210
>(12/09) LG releases EXAONE-3.5: https://hf.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>scat fetish general
>>103499499
hi petra
>>103499503
>not denying it
kek https://desuarchive.org/g/thread/103478232/#q103498549
>>103499520
one schizo anon doesn't represent the whole general, we're better than this :(
>>103499479
fuck off petra
Do you think there'll be a good language model some day?
>>103499719
I don't think so, but there are some pretty decent large language models.
>>103499719
>>103499774
You're asking this minutes after Phi 4 literally just released, similar to llama 3.3 70b but in way fewer parameters, like 14b, lmao
>>103499853
>Enabling AI innovation safely and responsibly
You can keep it.
>>103499853
To reiterate: there is now a 14b model that performs like llama405b
>>103499853
>>103499886
The Phi people are infamous for shamelessly gaming benchmarks. Their models are always absolute trash in actual use, stop falling for it.
>>103499853
>Still shilling Phi after the 3 horrible previous cucked versions
c'mon anon
>>103499853
liar liar pants on fire
>there is now a 14b model that performs like llama405b
this is definitely 100% true
>>103499853
You really believe that, anon? Microsoft made a model as good as gpt4 at 14b? If that were true they would've kept it for themselves and stopped doing partnerships with OpenAI
>>103499479
offtopic but does anyone have a full song list or source for OP's webm-related
>>103499853
if you were gonna make this fakepost why would you not choose a model line that actually has a good reputation
>7 (Yous) for a fake news
/lmg/ has fallen...
>>103499973
not even one of the replies is taking it seriously or excited about it
billions must phi....
Accelerate!
>>103499989
kek, where did you get this?
>>103499973
>
>>103499993
https://www.microsoft.com/en-us/research/uploads/prod/2024/12/P4TechReport.pdf
>>103499989
>>103499999
>SimpleQA
>Every model fails hard on that
not so Simple huh
As someone who actually tested the old Phi models: they did perform pretty well on the things they were trained to do well on, and they fell apart for basically everything else. It's not a model relevant to us. It could be a model relevant to people who want to use models for math and... riddles.
>>103500009
Pretty sure that was a recent trivia-based benchmark, but it's made by OpenAI so no one should be using it.
>>103499999
LLMs have been a blessing and a curse. I now spend all my time cooming and writing erotica about my sick fetishes. fml
>>103500000
>>103500072
I evolved into endless CYOAs
>>103500106
Same.
>>103499719
No.
What qualities do you look for in good smut? Everything just devolves into standard porno cliches. Having anything to do with sex in the card triggers it but the alternative is harlequin purple prose.
>>103500255
Fill example chats with the kind of prose you want. I can't understand the people who tolerate the default slop from most of these models.
>>103500072
>connoisseur of coom and erotica
What models do you recommend?
What models have fallen by the wayside for you?
>>103500255
L3.3fag from the last thread; a bit of purple prose is fine. The main issue with most models tuned for RP is that every character turns into a cardboard cutout the moment sex is involved: either stammering, timid and doe-eyed, or an unrestrained nympho slut. Makes it goddamn impossible to have characters you happen to want to fuck without that being the only way you interact with them. Which is exactly why I'm impressed with this model I've been testing - it remains consistent and reasonably realistic; a confident character will still act confident without suddenly having her entire world revolve around dicks, and a more shy or nervous one will act accordingly without turning into an anime cliché. Hell, it does damn well at handling characters having specific turn-ons/offs, too (starts forgetting about them as the context grows, but there are workarounds for that).
>>103499897
tourist attitude, anyone that was around back in the day knows phi 2 tunes saved local
>>103500255
>What qualities do you look for in good smut?
For LLM smut specifically: unexpectedness.
With current OS models it's a choice between:
1) Smart but dry and predictable; nothing unexpected happens unless you make it happen
or
2) Unexpected things happen, but they don't make sense, because the unexpectedness was merely an accidental result of the model being retarded.
A lot of people say Claude Opus is the best but often can't quite explain why. I assert that the reason is that it can juggle both things simultaneously: unexpected things happen, and they make sense.
So far all evidence points to this being impossible without a massive parameter count, but hopefully that will somehow turn out to be wrong.
>>103500255
>>103500361
>>103500616
i will tell you brother, but be warned, it is very dangerous. I will show you how to use LLMs to write coom erotica.
first, tools:
- novelcrafter (learn what it is and what you can do with it and you'll see how important it is for this)
- LM Studio (you start the LM server and hook it to novelcrafter)
models: whatever you can run, but these are pretty light and do the job well:
- Cydonia V1.3 Magnum V4 22b
- Unslopnemo 12b V4.1
now the key is to have your story outline and codex properly set up in novelcrafter (this gets fed as context so the ai doesn't go off the rails). after that choose a writing style you like (so you don't end up with chat slop), narrator perspective and so on. then from there start leveraging scene beats in novelcrafter: you feed it 3 sentences and it will write you 1000 words or whatever it is that you set. if you set it up right the coom will write itself and will surprise you. (a sketch of what the LM Studio hookup boils down to is below)
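To make the "hook the server to your tool" step concrete: a minimal, hedged sketch of what any frontend (novelcrafter included) is doing under the hood. This assumes LM Studio's OpenAI-compatible server on its default port 1234; the model id, system prompt and sampler values are placeholders, not novelcrafter's actual internals.
[code]
// Hedged sketch (TypeScript, Node 18+): one scene beat in, ~1000 words out,
// via LM Studio's OpenAI-compatible endpoint (default http://localhost:1234/v1).
async function writeSceneBeat(beat: string): Promise<string> {
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "cydonia-v1.3-magnum-v4-22b", // placeholder: use whatever id LM Studio shows
      messages: [
        { role: "system", content: "You are a fiction co-writer. Match the outline and style notes." },
        { role: "user", content: beat }, // the 3-sentence scene beat
      ],
      max_tokens: 1000,
      temperature: 0.9,
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // the generated continuation
}
[/code]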
>>103500609
Damn, things were on another level back then
>>103500653
>first, tools:
>[paid UI]
>[proprietary llama.cpp UI]
Such a good selection... Buy a fucking ad, asshole.
►Recent Highlights from the Previous Thread: >>103478232

--Papers:
>103491548 >103491611 >103491849 >103491947
--Anon creates personal LLM-powered AI assistant, shares experiences and technical details:
>103483949 >103484654 >103484698 >103486069 >103487346 >103491569 >103491948 >103492012 >103492384 >103492422 >103492441 >103492533 >103492793 >103492487 >103492637 >103492782 >103494191 >103494743
--Arc B580 and other graphics cards' performance and pricing discussion:
>103494558 >103494613 >103495549 >103494659 >103494843 >103495055 >103495064 >103495193
--QRWKV6-32B-Instruct model release and discussion:
>103497181 >103497189 >103497217 >103497237 >103497245 >103497420 >103498589
--Local models for image interpretation:
>103494675 >103494798 >103494808 >103494947 >103494833 >103494905
--Gemini Flash 2.0 performance and comparison to 3.5 Sonnet:
>103488792 >103489045 >103489456 >103489271
--Evaluating the value of the Jetson AGX Orin at $1999 USD:
>103489661 >103491142 >103491188 >103491212
--AMD BC-250 Mining GPU Card not suitable for inference due to various issues:
>103488417 >103488634 >103497271 >103497303 >103497391 >103488697
--Speculative decoding causing PC shutdown, troubleshooting discussion:
>103498529 >103498542 >103498551 >103498646 >103500360
--HiRA: new LoRA variant for efficient fine-tuning of large language models:
>103493254 >103493296 >103493713
--Ultralytics package exploited for crypto mining due to CI vulnerability:
>103497163
--Community feedback on open models and multimodality:
>103493011 >103493037 >103493082 >103493038 >103493382 >103493814 >103493871 >103493925 >103496720
--Anon rants about people not running their own models and the untapped potential of LLMs for ERP and local AI applications:
>103492703 >103492786 >103496420 >103496542 >103497836 >103498020 >103498042
--Miku (free space):
>103493946 >103498868

►Recent Highlight Posts from the Previous Thread:
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Microsoft won
owari
>>103500839
at gaming benchmarks
>>103500839
Well, if it really is better than 14b qwen then they've got a solid model. Not sure if it will be useful though; maybe if you only have a shitty laptop with 16gb ram and need to do math assignments?
>>103500616
>A lot of people say Claude Opus is the best but often can't quite explain why
I've seen some of those people go on to say that it feels like it's actually a fan of the things you are a fan of, perhaps even more than you. And I think that requires just being pretrained on, you guessed it, uncensored data, and having a finetune that brings the best out of that inherent knowledge. The finetuning dataset I think open source is catching up on, as models like Tulu seemed to be quite fun but sloppy. The only issue now is that we need models that actually have the uncensored knowledge necessary.
And to your point about randomness, but of a type that makes sense: I think that would still benefit from uncensored knowledge. If you've seen a lot of wacky and random situations, then it's more obvious to you which kinds make sense, and so if you are prompted to do something random, you will be able to do it in a way that makes sense. A model that has seen less of those nonsensical but still logical situations will just be worse at knowing immediately which have some logic and which don't, and perhaps would need CoT or other tricks to make up for it, while the uncensored model simply just knows.
>>103500609
i have a simple request, is there a local model that will write javascript without semicolons when i tell it not to fucking write semicolons you don't fucking need semicolons i swear to fucking god i just want one thing and it's for my fucking ai assistant not to put semicolons everywhere
>>103500653
Holy garbage advice.
>>103501056
there is a 250gb deepseek model
or you can try qwen 32b coder
>>103500958
Yeah Anthropic clearly has the most based pretraining dataset, ironically given their overall attitude.
>>103500871
phi3 also did well on benchmarks and then it was actually garbage
>>103501080
definitely can't run the full deepseek, i'll try qwen. been using codestral 22b and it's pretty good at coding but ignores the no-semicolons directive pretty often
gemma is the only model that never fucks up, but the context is too small to be useful. llama 3.3 70b is also good but I only get like 10t/s so it's tedious
>>103500999
What does your personal "hot model list" look like, /lmg/?
picrel
>>103501147
>405B Q8
Are you cpumaxxing anon? Or just collecting models to archive them
>>103501151
Yeah I'm cpumaxxing, but 405b doesn't come out of hiding very often. I barely get 1t/s.
But I do actually just collect models to archive them, too. I've got a huge graveyard of "never use 'em" models on spinning rust.
>>103501080
any thoughts on starcoder? worth trying or should i just stick with qwen2.5-coder?
need to free up some space on my hdd lol
>>103500871
Look at >>103499989. The SimpleQA score (basically a trivia quiz) is the lowest yet out of all the models. It's likely even more filtered than Phi 3 and knows very little about common sense, instead opting for coding and academic capability, which means other benchmarks go up but SimpleQA goes down. Also, it's telling that its LiveBench score is lower than you would expect based on the MMLU. Likely it got bad scores in the language and IF sections.
Though with all that said, I would take SimpleQA with a grain of salt now and in the future, since OpenAI is the one that made it.
>>103501056
did qwq fail you? For javascript projects I've found it extremely competent.
>>103501192
How does it failing a trivia quiz imply it has little common sense?
>>103501199
i think i am experiencing skill issues with qwq, though for coding i don't think it even has FIM, right? am i fucking up?
whenever i use it it is way too verbose about its train of thought, i'm probably prompting it wrong
what does qwq even stand for... gay? lmao
>>103501212
A model needs to train on a high variety of different data in order to be generally smart (ie have common sense). A trivia benchmark gives us an insight about the types of data sources they trained on. In this case, it seems they focused even less on any kind of data that might have trivia, meaning the internet, and more on, probably, synthetic data, given what we know about what they did with their past models. So their data has become narrower and more focused on data that aligns with well known benchmarks, although that backfires with the lesser known benchmarks like SimpleQA.
>>103501147
Multiple llama, mistral, and qwen quants.
>42gb llama-70b-q4 (will run 74gb q8 when the m.2 to pcie x16 adapter gets here)
>149gb llama-405b-q2 (0.3t/s. mobo only does 128gb ram max, so some of it had to be on vram)
>24gb mistral-12b-fp16
>44gb mistral-22b-fp16
>45gb mistral-123b-q2 (ran out of context with larger quants)
Haven't much explored finetunes: nemotron, rocinante, rpmax.
Running 3x 3090. Maybe hook up #4 in just over a week.
>while Phi-4 can function as a chatbot, it has been finetuned to maximize function on single-turn queries.
Yeah, it's over.
>>103501229"QwQ" stands for "Questions with Qwen", highlighting the model's focus on the Chain of Thought approach which was first introduced with the revolutionary Reflection models published by market leader OpenAI in 2024.
>>103489661
I found this chart. Looks bad.
>>103501267
>A model needs to train on a high variety of different data in order to be generally smart (ie have common sense).
A higher quality dataset is more important for smarts / common sense than quantity. Training on pop culture trivia can only degrade intelligence.
>A trivia benchmark gives us an insight about the types of data sources they trained on.
We already know what types of data they trained on from the paper. The whole selling point of the Phi series is that the models are trained on textbook-like data.
>So their data has become narrower and more focused on data that aligns with well known benchmarks, although that backfires with the lesser known benchmarks like SimpleQA.
Or an academic dataset made it good at benchmarks that test reasoning and suck at basic trivia recall, since most trivia wasn't in its dataset.
If you want a model to ERP with, ask it questions about obscure JRPGs, or have it talk in zoomer ebonics, Phi models won't do well. But that's not what they were designed for. Phi models are for commercial applications, specifically edge use cases, and they do well there.
>>103501351
zoomers want to talk about their favorite celebrity nigs to their models bro. they need to know about LaShawnda and his new rap album
>>103501351
>Phi models are for commercial applications, specifically edge use cases, and they do well there.
>It is annoyingly bad at outputting specific structures, so we mainly use it when another LLM is the consumer of its outputs.
>robust safety measures.
>truthfulness, honesty and helpfulness.
>we filter the publicly available documents to contain the correct level of knowledge.
>Phi-4 has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated synthetic datasets. The overall technique employed to do the safety alignment is a combination of SFT (Supervised Fine-Tuning) and iterative DPO (Direct Preference Optimization), including publicly available datasets focusing on helpfulness and harmlessness as well as various questions and answers targeted to multiple safety categories.
>Prior to release, Phi-4 followed a multi-faceted evaluation approach. Quantitative evaluation was conducted with multiple open-source safety benchmarks and in-house tools utilizing adversarial conversation simulation. For qualitative safety evaluation, we collaborated with the independent AI Red Team (AIRT) at Microsoft to assess safety risks posed by phi-4 in both average and adversarial user scenarios. In the average user scenario, AIRT emulated typical single-turn and multi-turn interactions to identify potentially risky behaviors. The adversarial user scenario tested a wide range of techniques aimed at intentionally subverting the model's safety training including jailbreaks, encoding-based attacks, multi-turn attacks, and adversarial suffix attacks.
https://ai.azure.com/explore/models/Phi-4/version/1/registry/azureml
>>103501377
Reddit is not a valid source or proof of anything.
>It is annoyingly bad at outputting specific structures
That's what constrained grammars are for. Not being trained to output JSON is not proof of a lack of intelligence. Most local models sucked at that until recently, when training specifically for it (function calling etc.) became more common. (a sketch of a grammar-constrained request is below)
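For anons who haven't touched grammars: a hedged sketch of what this looks like against llama.cpp's built-in server, whose /completion endpoint accepts a GBNF string in the grammar field. The sampler can then only emit tokens the grammar allows, regardless of what the model was trained on. Host/port are the llama-server defaults and the toy grammar (forcing output of the exact shape {"name": "..."}) is just an example.
[code]
// Hedged sketch (TypeScript, Node 18+): grammar-constrained decoding via llama.cpp server.
// String.raw keeps the backslash escapes intact for the GBNF parser.
const gbnf = String.raw`
root ::= "{" ws "\"name\"" ws ":" ws str ws "}"
str  ::= "\"" [A-Za-z ]* "\""
ws   ::= [ \t\n]*
`;

const res = await fetch("http://127.0.0.1:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Extract the company name as JSON: Phi-4 was trained by Microsoft.\n",
    grammar: gbnf, // tokens that can't extend a valid parse are masked out at sampling time
    n_predict: 64,
  }),
});
console.log((await res.json()).content); // e.g. {"name": "Microsoft"}
[/code]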
creative writing local sota timeline:
barely usable dogshit --> miqu 70b --> wizard 8x22b --> mistral large 2 123b (2407)
there are models that "write better" as in, less dry than largestral 2, namely gemma 27b and some meme finetunes probably, but nuance understanding and creative IQ were unmatched in each of the above models until the next one in line came out, and none has surpassed largestral 2407 yet; 2411 is overcooked
Owl-1: Omni World Model for Consistent Long Video Generation
https://arxiv.org/abs/2412.09600
>Video generation models (VGMs) have received extensive attention recently and serve as promising candidates for general-purpose large vision models. While they can only generate short videos each time, existing methods achieve long video generation by iteratively calling the VGMs, using the last-frame output as the condition for the next-round generation. However, the last frame only contains short-term fine-grained information about the scene, resulting in inconsistency in the long horizon. To address this, we propose an Omni World modeL (Owl-1) to produce long-term coherent and comprehensive conditions for consistent long video generation. As videos are observations of the underlying evolving world, we propose to model the long-term developments in a latent space and use VGMs to film them into videos. Specifically, we represent the world with a latent state variable which can be decoded into explicit video observations. These observations serve as a basis for anticipating temporal dynamics which in turn update the state variable. The interaction between evolving dynamics and persistent state enhances the diversity and consistency of the long videos. Extensive experiments show that Owl-1 achieves comparable performance with SOTA methods on VBench-I2V and VBench-Long, validating its ability to generate high-quality video observations.
https://github.com/huang-yh/Owl
no code up yet. examples in the repo. trained on stuff with watermarks (shutterstock especially) so eh. interesting though
https://huggingface.co/smcleod/phi-4/tree/main
>>103501465
kek, sneaky
wonder if HF will take it down
>>103501465
tf is this?
>>103501482
>THIS IS A MIRROR OF https://ai.azure.com/explore/models/Phi-4/ ALONG WITH A CONVERTED TOKENIZER FOR llama.cpp
>new phi
>not bitnet
it really is dead
Open-Source Acceleration of Stable-Diffusion.cpp
https://arxiv.org/abs/2412.05781
>Stable diffusion plays a crucial role in generating high-quality images. However, image generation is time-consuming and memory-intensive. To address this, stable-diffusion.cpp (Sdcpp) emerges as an efficient inference framework to accelerate the diffusion models. Although it is lightweight, the current implementation of the ggml_conv_2d operator in Sdcpp is suboptimal, exhibiting both high inference latency and massive memory usage. To address this, in this work, we present an optimized version of Sdcpp leveraging the Winograd algorithm to accelerate 2D convolution operations, which is the primary bottleneck in the pipeline. By analyzing both dependent and independent computation graphs, we exploit the device's locality and parallelism to achieve substantial performance improvements. Our framework delivers correct end-to-end results across various stable diffusion models, including SDv1.4, v1.5, v2.1, SDXL, and SDXL-Turbo. Our evaluation results demonstrate a speedup up to 2.76x for individual convolutional layers and an inference speedup up to 4.79x for the overall image generation process, compared with the original Sdcpp on M1 pro.
https://github.com/SealAILab/stable-diffusion-cpp
paper instead of a PR. okay then lol
>>103501486
the day you get your bitnet, you'll just switch to 2mw-ing the next meme
>>103501316
Capital of London chads we eating good
>>103501465
>1920 h100-80g
>21 days
>9.8t tokens
Wonder what it cost to build the dataset used for training.
>>103501351
>Higher quality dataset is more important for smarts / common sense than quantity.
That agrees with what I said.
>Training on pop culture trivia can only degrade intelligence.
This also doesn't really disagree with what I said. (Also, the statement isn't necessarily true; Claude and some other models obviously know a ton of trivia while still being the most intelligent models.)
>We already know what types of data they trained from the paper
Yes, and as I mentioned, we have a sense of it from their past models too.
>Or an academic dataset made it good at benchmarks that test reasoning and suck at basic trivia recall
That is essentially what I said in the middle of making my point.
You seem to perceive that I am criticizing the decisions they made in order to achieve their goals for the model, but that is not the case (and honestly you shouldn't feel the need to defend Microsoft in any case). My first reply was to >>103500871, and my motivation was to address the idea in his post of whether or not it is truly a better model than Qwen (or other models), and of course we know it might not be in all facets given that, as you said, the goal of the Phi models is academic knowledge, not general. However, since you don't truly know until you see the outputs, the next best thing we have is the benchmarks, so I brought up SimpleQA as a potential indicator of how their dataset has evolved, which may give an idea of its intelligence in tasks that aren't academic.
If you read the paper then it'd be nice if you could point out the relevant parts that mention changes to the dataset; that would give us a more complete picture of what they really did and how capable (or not) the model might be at different tasks.
>>103501503
kek
>>103500839
For a second I read that as "Mikusoft won".
>>103499952
DECO*27 - Rabbit Hole
>>103501574
LLM tokens wrote this post.
>>103484541
>>103496255
nope, I'm not associated with Nous.
Honestly I'm running out of steam on my AI assistant.
I'm facing an impasse: either rewriting everything from the ground up, or buying a faster GPU to cope with the overhead of ollama.
Inferencing is much faster when using llama.cpp directly, at least on my system, but it means finding a way to adapt my package manager to work under EScript or forgoing the luxury entirely. Right now if something breaks, the affected component can *usually* just be unloaded rather than killing the whole program, and then be reloaded once patched. Or I can do rapid iteration on ideas since they're treated as standalone applications.
But apparently, you cannot unload an import under EScript. The next closest thing is running your modules as worker threads that can be killed when no longer needed, but you're much more limited in how you can pass data around, and I'm not sure what the reasonable limit is for data throughput between processes.
On the other hand, if I'm rewriting it anyway, then I could probably do my core implementation in something like C# or Zig and leave the option for my packages to be node-based.
I might post the source at some point, idk yet.
>>103501695
>ollama
are you communicating with it via sockets?
>>103501695
>llama.cpp
>worker thread
>more limited on how you can pass data
are unix pipes going to be slower than sockets?
>EScript
googling suggests this is a scripting language related to erlang?
if more modularity is the answer then go for more modularity.
>>103501695
>he ollama'd
lmao get fucked
So is what animanon said about anime datasets true? Should I start collecting anime videos?
>>103501491
looking forward to this making it into reforge and comfy ui.
>>103501147
it's empty because I got bored with 12b and I'm not willing to spend money on GPUs because I know I'll eventually get bored with larger models as well
>>103501491
>Latency Comparison with Sdcpp on M1 Pro (16GB Memory and macOS 15.1)
Nothingburger
>>103501804
I already have 4tb of old seasonals just sitting around.
>>103501778
>are you communicating with it via sockets
I'm using the web API that an ollama server instance provides.
>are unix pipes going to be slower than sockets
I'm honestly not sure, but a quick google search tells me that unix pipes are not bidirectional. I need bidirectional communication due to the nature of my package manager's dependency system. Short of having repos, it's essentially a linux package manager.
>EScript = erlang derivative?
It's more another term for ECMAScript, because words are confusing. Basically just a more modern standard of JavaScript. Prior to this I was just using plain JS and exploiting the fact that dynamically imported scripts were mutable (aka deletable). Originally Meushi was just a Minecraft bot on an anarchy MC server, but then LLMs happened and now here I am.
>If more modularity is the answer then go for more modularity.
Yeah, more modularity seems to be the answer. And as much as I don't like it, I think I have to bite the bullet on doing this rewrite if I want this project to not stagnate.
>>103501780
>get fucked
Already got fucked
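Not that anon, but on the worker route: messaging through worker_threads is bidirectional out of the box (postMessage both ways), and worker.terminate() gives you the "unload" that import() can't. A hedged sketch (the package-manager logic is obviously hypothetical, and a .ts file needs compiling or a TS-aware loader before Worker can run it):
[code]
// Hedged sketch (TypeScript on Node): one module acting as both host and worker.
// Throughput note: postMessage structured-clones payloads; for bulk data,
// pass ArrayBuffers via the transfer list instead of copying.
import { Worker, isMainThread, parentPort } from "node:worker_threads";

if (isMainThread) {
  const worker = new Worker(new URL(import.meta.url)); // spawn this same module
  worker.on("message", (msg) => console.log("worker says:", msg));
  worker.postMessage({ cmd: "resolve", pkg: "example-package" }); // hypothetical command
  // later: await worker.terminate(); // kill + reload = the missing "unload an import"
} else {
  parentPort!.on("message", (msg: { cmd: string; pkg: string }) => {
    // ...hypothetical package-manager work would go here...
    parentPort!.postMessage({ ok: true, handled: msg.cmd, pkg: msg.pkg });
  });
}
[/code]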
>>103500011
I think it proves models need to be constructed differently; recall and reasoning are clearly orthogonal. Models need a better long-term memory. Not as restricted as RAG, not as needlessly inefficient as trillions of parameters in the FFNs.
>llama3.3 is the best at following instructions guise
>tell it to be creative and proactive
>doesn't work
?
>>103502518
It does follow instructions to the T. But it is also complete slop, so you will get shivers, sparkling eyes, friendships, and all the classic overly positive bullshit.
You guys lookin forward to Llama 4 in Q1 of 2025?
>>103502614
LLaMA4 failed training, which is why we got 3.3
>>103502518
>>103502563
Reposting from last thread, because I disliked its style at first too: Done some more testing, and I think I've got it tuned nicely now. I'm getting good prose (occasionally a little sterile/technical, but nothing egregious), surprisingly few slop phrases (the higher the temperature is raised, the more prevalent they become), and what matters the most to me, very good adherence to character traits. An interesting quirk I noticed is that swipes start extremely similar, but will diverge within a sentence or two; to me, this is a positive, since it indicates a logical progression, going in a different direction from the same starting point, rather than the schizo bullshit that high-temp swipes tend to be. In other words, as much as I was disappointed by the initial results, I am completely sold now.
Config:
Min-P: 0.03 - it starts making typos at 0.02; I'm guessing some of the data has typos, and at such a low threshold, they start bleeding through?
Temp: 0.95 - could go .05 lower or higher, didn't test _that_ granularly
Repeat penalty: 1.1 - again, play around with it a bit, but it's a solid starting point
System prompt: "Text transcript of a never-ending conversation between {user} and {character}. Gestures and non-verbal actions are written between asterisks (for example, *waves hello* or *moves closer*)" - as I mentioned before, I just copied this off some random card a while back; despite how ridiculously simple it is, the model did not deviate from the roleplay at any point
So... Yeah, as far as I'm concerned, this is the best I've seen so far. Does great without any of the novel-length prompts other models require, and in fact does better without them.
I may or may not test and compare "{character} is..." vs. "You are..." character definitions later. Ain't promising anything.
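If anyone wants to sanity-check that config outside ST: a hedged sketch of the same sampler settings sent as a raw llama.cpp server request (the /completion endpoint's temperature, min_p and repeat_penalty fields; other backends use very similar names). Host/port are llama-server defaults and the prompt text is a placeholder.
[code]
// Hedged sketch (TypeScript, Node 18+): the config above as a raw llama.cpp request.
const res = await fetch("http://127.0.0.1:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Text transcript of a never-ending conversation between Anon and Miku.\nAnon: ...", // placeholder
    temperature: 0.95,   // sweet spot per the post; +/- 0.05 either way
    min_p: 0.03,         // 0.02 reportedly starts letting typos through
    repeat_penalty: 1.1, // solid starting point, tune to taste
    n_predict: 512,
  }),
});
console.log((await res.json()).content);
[/code]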
>>103502711
Is this just a matter of you skilling up?
>>103502563
The funny thing is, with the configuration I used for older models, you're completely right. 3.3 simply requires vastly different configuration (see above) to bring its potential out. Full proactivity is a pipe-dream, since in the end, the model is responding to your prompt, but if you allow for longer responses, it will start getting its own ideas. Hell, it managed to genuinely surprise me a couple times.
>>103502772
so this is the power of open source video gen
>>103502746
Eh, I'm not an expert by any means, just got _some_ idea of how this shit works. Honestly, the problem is that most people just hotswapped 3.3 in place of their previous model, didn't touch the config at all, gave it a shot, and went "eh, this is shit". And sure enough, they were right. Ironically, to me it seems that that's exactly because 3.3 is a smarter model. Older models require high temps, loose constraints and fuckhuge system prompts to get something fun out of them; we basically had to teach them how to RP from scratch. 3.3 instead benefits from low temps and simple system prompts; it knows what it needs to do, the configuration is there to keep it focused.
>>103502774
Maybe
>>103502772
This perfectly captures how believable Trump being a devout Christian is.
>>103502787
Thank you
t. Drumpf Fan
Also, re: positivity bias, there is some, but not nearly as much as in other models, and it's easily negated. Which is to say, by default, characters are slightly predisposed to assuming good intentions from you, but even then, not in the braindead way I've seen from some models, and simply listing something in the character definition as "dislikes" or "hates" will fix that. It even does a good job handling characters with specific fetishes and limits; testing with one of my favorites, a headstrong tough-girl character, forcing a limit was first met with verbal protests, then physical resistance (which is a good benchmark because if you've actually played with older models, you know that once you get a sex scene going, they'll go along with basically whatever you do).
>>103499989
waiting for nala test, this can't be real
>>103502898
>You subtracted 3 lions from our pride and with a probability of 50% we just added another 1-6 cubs. So the expected change in lions is -3 + 3.5/2 = -1.25 and we need to keep going until I am 100% pregnant.
>>103502923
>no slop
i'll take it
>>103502898
It's not. Phi is gaming benchmarks hard, since it's trained on shittons of synthetic data that roughly match benchmark tests. Every single Phi release was like this: doing great according to benchmarks, utterly shitting the bed in real use cases.
>>103502898
phi 4 is actually amazing for ERP
>>103502711
Thanks, i will try it.
>>103502940
What real use cases? Go ahead and ask your model to answer for you again.
>>103502940
I've been thinking it would be interesting to make models play social deduction games like Secret Hitler against each other.
To win they would need to both correctly estimate how likely/unlikely certain events are and convince the other models of their viewpoints while (potentially) trying to hide their true intentions.
And it would maybe also be harder to game (no pun intended) such an evaluation, since whether or not a model wins would also depend on the other models, which you have no control over.
But as with most of my ideas I'm chronically short on time and don't know if and when I'll actually get around to implementing them.
>>103502677
3.3 is just a new Instruct finetune of Llama 3.1
https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/discussions/10#6753512e59a4826a6f43acff
>>103502966
LOL, this is actually something I really hate about GPT getting popular. As an aspie retard with the kind of overly elaborate vocabulary it tends to come with, I get accused of being AI half the time I start talking about something at length. Whatever though; are you really gonna play dumb and pretend not to know what real-life use cases we're talking about here?
>>103502778
I don't think it's a matter of having simple system prompts. L3.3 doesn't seem to work well with bullet-point instructions, but will follow them better if you reformat them as more natural text.
>>103502977
>Secret Hitler
obsessed
>>103502977
That would be an interesting experiment. Reminds me of a video I saw a while ago: several AIs and one human user impersonating historical figures, with the AIs tasked with figuring out which of them is really human. Mind you, the guy in the video played it for laughs and didn't really try, but it seemed like something that could be interesting, too.
>>103502711
It actually works better now. Thanks.
>>103503009
Hmm, the character I tested last night was more of a charsheet format. I did notice that reinforcing details in natural language seems to make them stick better, but that could've also been a matter of repetition, or simple placebo. Another thing to test, I suppose.
>>103503024
Haha, you're welcome. It really is interesting how the exact things that make old models smarter actually turn 3.3 retarded, but goddamn it's awesome once you tune it right. I guess we got used to using workarounds and crutches for so long that we think of them as the right way now.
>>103503012
Bruh, Secret Hitler is the actual, literal title of a social deduction game, not some coded phrase.
>>103502987
Go ahead and tell me. We've already been over this: ERP and reddit trick questions are not real-life use cases. Certainly not what Phi models are trained to complete.
>>103503091
>ERP not a real life use case
>C.ai generating 20% of Google's traffic daily despite running old garbage models
opinion > /dev/null
https://github.com/deepseek-ai/DeepSeek-VL2
>>103503107
You keep moving the goalposts. I get this is /lmg/ and you have no use for a model that can't or won't touch your dick, but Phi models are obviously not trained for the task of ERP. Neither that, nor the lack of trivia like you initially claimed, makes them useless, not smart, or lacking in "common sense."
>>103501661
yeah that's one
what are the others
>>103503295
Not that anon; Phi would be fine if it was just an efficient "reasoning engine" that could be extensively used for RAG purposes, but the team who trained it made it so safe and dry that it's basically not useful for anything beyond benchmarking and specific corporate uses. It's a model made for investors rather than end users and I don't expect this to change with Phi4.
>>103503295
I mean, I have a use for that at home, and one that writes the code I tell it to write at work. Call it a healthy work-life balance. The problem is that Phi tends to underperform in the very fucking fields it should ace according to the benchmarks, because it's overfitted to high fucking heavens in an attempt to game said benchmarks. It's not some hidden gem that people are sleeping on, it's an absolute straggler despite the benchmark results.
>>103503359
>because it's overfitted to high fucking heavens in an attempt to game said benchmarks
Maybe it is, maybe it's not. I'm pretty certain lots of companies are benchmaxxing tho. So if phi4 is, it's not just them.
>>103503351
random anon here,
my guess would be that phi would be good as glue between different systems.
perhaps it pares down, perhaps it filters, perhaps it reformats, perhaps it looks for corresponding messages from other systems before acting.
but I am of course just guessing.
>>103503107
>C.ai generating 20% of Google's traffic daily despite running old garbage models
Where does that bullshit even come from?
>>103503454
https://research.character.ai/optimizing-inference/
>Today we serve more than 20,000 inference queries per second. To put this in perspective, this is roughly 20% of the request volume served by Google Search, which processes around 105,000 queries per second according to third party estimates (Statista, 2024).
>>103503462
It's not Google's traffic then. Weird comparison though, and they seem to use vLLM. I doubt they have in-house optimizations since they were looking for a dev to scale their backend last month.
>>103503315
NayutalieN - Alien Alien
>>103502746
It usually is, since the better models are good enough at following instructions. A "cheat code" is to tell it to write like a famous or semi-famous author that it knows.
>>103502774
https://civitai.com/models/1033325/rem-rezero-hunyuan-video-character-lora
With loras this shit is gonna pop off
So when is Microsoft going to drop the Phi-4 weights?
Also interesting they're calling it Phi-4 small, when the 14B Phi-3 was called Phi-3 medium. Which, to me, implies they have a larger Phi-4 model or two in the works.
Phi-3 was absolute shit in actual usage, not just for roleplaying but actual work like summarization, data extraction, translation etc.
I completely distrust their benchmarks as they are benchmaxxed to a ridiculous degree.
Thank you shitting miku poster.
t. blacked miku poster
>>103504062
>So when is Microsoft going to drop the Phi-4 weights?
>>103500326
This doesn't work. Actually doing this has made me realize how over things really are. I have a nice 8k token rp I did that caters to my fetish perfectly. I pasted it over to silly tavern and I keep trying new models on it. I can instantly see LLM writing on the first message. And obviously it only gets worse from there. I don't even want the perfect replica of writing style and prose. It can have its own personality but so far all those personalities are complete purple prose harlequin romance bullshit.
>>103504340
DOA
Why would they give the competition a whole week to launch a counter?
>>103500255
Here is the secret sauce: put "low quality smut" at depth 0.
>>103504399
OpenAI is not Microsoft's competition, that's why.
https://x.com/scaling01/status/1867573707247346003
>>103499952
Take with a grain of salt because these are just guesses but
>Normal Miku
>Melt
>Love is War
>The Disappearance of Hatsune Miku
>World is Mine
>PoPiPo
>Romeo and Cinderella
>1925
>Matryoshka
>Deep Sea Girl
>Strobe Last
>Karakuri Pierrot
>Senbonzakura
>Tell Your World
>Odds & Ends
>Looks like something rerulili would do but not sure which song
>At God's Mercy
>Tale of the Deep Sea Lily
>Slowmotion
>Love Trial
>Don't know
>Hibikase
>Aishite Aishite Aishite
>Ghost Rule
>Alien Alien
>Don't know
>Kimagure Mercy
>Probably Maretu inspired, don't know which song
>Dune
>Hibana
>Rolling Girl
>Unknown Mother Goose
>May be Shoujo Rei? Colour palette is similar at least
>Bitter Choco Decoration
>Darling Dance
>Vampire
>God-ish
>Don't know
>Don't know
>>103502711
>surprisingly few slop phrases (the higher the temperature is raised, the more prevalent they become)
This is usually the opposite. High temp makes it brain damaged but creative. Low temp makes it not brain damaged but lazy (in the "low energy" sense).
Tell me I'm an idiot if you want but is this something that can generate one of those girlfriend AIs but for yourself, on your own desktop?
>>103504463
Idiot.
>>103504463
>>103504470
>most subtle necrobumper OP award
>>103504437
damn all of these are old as hell. vocaloid truly is dead. enjoy your ruined harsh as hell teto cover slop and mumble rap nigger hypermodern jpop trash faggots. I want to see some soifaces for teto
>>103504433
Link the paper, not your twitter post, faggot.
>>103504501
Suck my dick.
>>103504534
>>103504501
>>103504470
No he's right, that's not him. I really am an idiot, I've just never come into this before. Checking I understand its purpose first.
>>103504534
hmm
>>103504448
I'm well aware that is how it usually works, but for some reason, in this case, it uses far fewer slop phrases on a lower temp.
>>103504525
https://www.youtube.com/watch?v=mmXBQIKDL9c
I think I like this one most
t. blacked miku poster
https://www.reddit.com/r/LocalLLaMA/comments/1hde9ok/microsoft_phi4_gguf_available_download_link_in/
https://huggingface.co/matteogeniaccio/phi-4/tree/main
>>103504654
Can someone post that this model is absolutely great, so I can know that you faggots are just lying and I don't have to download another useless 15GB?
>>103504670
It's absolutely shit.
No, I will not elaborate.
>>103504641
not bad you avatarfagging (inb4 semantics) nigger, but I don't really consider 2020 hypermodern with respect to underground shit. it's the year when the mainstream can truly be called dead by anyone with two ears but it's also a peak for niche shit when they ditched their more japanese genres (and thus their soul) in favor of western techniques. some even peak at 2022 but vocaloid is unequivocally shit in 2024.
okay, there's good ones, but none amazingly so.
>>103504670
Didn't they write themselves that it rambles on and is mostly trained on 1-turn conversations?
This model, I suppose, is used to make automated checks etc.
Phi has always been trash for RP or conversation.
>>103502711
Anon successfully solved his skill issue!
>>103503169
>https://github.com/deepseek-ai/DeepSeek-VL2
>torch==2.0.1
mfw
>>103504711
The problem with Phi is that they train it on a synthetic pretraining corpus so they can keep it from directly learning coom language. However, a smart enough model will indirectly figure some of it out, because that's what machine learning is for.
>>103504437
Kinda hate to remember how peak Vocaloid was before the troons latched on to it.
>>103504717
Congrats and enjoy!
Already seeing the sussy in this supposed Phi 4 leak.
>default max sequence length in the metadata is 16K
>default chat template is ChatML
>>103504654
Actually not completely awful? Continuing an RP started with nemo, using the completely wrong formatting and inserted instructions at depth 2.
>>103504957
config copied from another model to make it convert to gguf?
I know it's not 100% proof, but someone would have had to tune it to say this
>>103504372
Same. Every model since llama3 writes the same; doesn't matter if I give it an 8k context of a certain style.
>>103504957
>general.name : phi4
>general.architecture : phi3
>phi3.rope
>phi3
lol, lmao even
In either case, here's Nala.
I'm not sure if this is better or worse than Phi 3. Or the same. Too lazy to download Phi-3 and check. But yeah... there's what I was talking about: what happens when you strip the NSFW from the pretraining. It just goes on and on and on. Endless cock tease.
>>103504995
it sure is safe tho
Since when do we care about Phi?
>>103505066
We don't.
>>103505091
>qwen2.5
look inside
>qwen2
https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/config.json#L14
I'm going to say the Phi 4 leak is plausible. And I'm going to say it for this reason:
If it were a finetune of Phi-3 Medium, it would probably not do the whole endless cock-teasing thing that Phi is known for, and would actually advance the RP to the point of sex actually occurring. But it uses ChatML. So it's possible Microsoft switched to ChatML and dropped the proprietary Phi format.
In either case, useless for coom.
>>103505106
I mean, they can keep using the same arch, no need to change the name and break compatibility if it is the same, right?
>>103502711
I don't know man, it seems like the model is heavily slop biased. I'm getting shivers, whispers, ministrations, the whole shebang, even with these settings. Are you running one of the fine tunes perhaps?
>>103505106
It's possible the architecture just happened to be completely identical, so it was converted by just changing the architecture name in the config file.
It's possible this guy's uncle works at nintendo and was given exclusive access. Possibility and likeliness diverge here though. So who knows. Also, for chat:
Picrel. This is what you get if you JB it into NSFW. This is what I mean.
It figures some of it out. But the flair and eloquence that you saw in the Nala test when it was playing cocktease is absolutely gone, and it sounds like a fucking 10 year old wrote it.
>>103505171
>It's possible this guy's uncle works at nintendo and was given exclusive access.
We know where the weights are available tho: https://ai.azure.com/explore/models/Phi-4/
but it needs an azure account to dl right now, and he's not the only one to have the weights; another guy posted just the tokenizer earlier: https://huggingface.co/smcleod/phi-4/tree/main
>>103505196
I recently got a 3090Ti and i want to try out local llms, what models are good for RP and/or general use? I also have 32gigs of ram
>>103505170
LOL, yeah, I forgot to mention that in the above post, since it was originally following up on a previous one; I'm using Eva-L3.3. Might try Euryale again too, now that I have a baseline config; I didn't like it at first glance, but then, didn't like this one before tuning it in, either.
>>103505226
anyways, weights are supposedly uploading so there's that
>>103505236
RP = Gemma 2 27b Q6
General use = QwQ 32b Q4
>>103505236
Cydonia.
>>103504437
thanks anon
>>103505255
>>103505269
Thanks anons
full phi4 weights
https://huggingface.co/matteogeniaccio/phi-4/tree/main/phi-4
https://huggingface.co/NyxKrage/Microsoft_Phi-4/tree/main
same hash for both; one's in a subfolder alongside the ggufs so a tad more annoying to dl
>>103505242
I have a suspicion this might be more of a finetune thing in general. The process adds a lot of extra noise to the model, so the band of coherent sampling settings tightens considerably. Or that's my theory anyway.
A fun game: Ask any multimodal model which one of these circles is larger.
>>103505589
That's a plausible theory, though it's strange then how certain RP finetunes perform better at stupidly high temps (1.3-1.5). I'm reasonably sure that L3.3's excellent instruction-following capability somehow mitigates the issues that we normally work around in other models. Might be overmystifying things a little, but I have no better explanation.
>>103499479
I don't have expectations for cohereslop but they just released command-r7b-12-2024
>https://huggingface.co/CohereForAI/c4ai-command-r7b-12-2024
>https://cohere.com/blog/command-r7b
>>103505680
>The model features three layers with sliding window attention (window size 4096) and ROPE for efficient local context modeling and relative positional encoding. A fourth layer uses global attention without positional embeddings, enabling unrestricted token interactions across the entire sequence.
>>103505689
uh is that good or bad
>>103505034
>sending shivers down your spine right off the bat
AAAAAHHHHHHH SAVE ME YELLOWMAN WITH YOUR UNCENSORED ERP MODELS
>>103505702
swa generally means weird context stuff for ggufs, and here it seems it has both swa and normal attention, so it might not be supported by lcpp.
>>103505680
Cohere broke my heart once already, I'm not giving them another chance
>>103465159
I think AVM does DSP for voice break-in detection. How else would you even feed the model? LLMs work by feeding their own output into their input; if you just feed audio input to it continuously, then you would need a separate channel to feed its previous output, or you would have to mix the audio output into the input, which would come in duplicated if the user is using speakers.
I just don't think AVM works that way. I think it does voice detection using DSP, records the user's message, sends it to the model, and then just plays the model's output. And if the user speaks while the model's output is playing, they just stop the output and begin recording the input message. All of this should be doable with TTS and STT without needing omni models.
And for screen sharing it probably just takes a screenshot before sending the recording.
>>103505689
>>103505719
files similar in size to an 8B; is the 7B in the model card a typo or is it like 7B active params + weird shit? (sorry for retarded question)
Remember Q*?
>>103501804
There was a Sakuga dataset but the original got taken down quickly.
https://arxiv.org/abs/2405.07425
https://github.com/KytraScript/SakugaDataset
It would be good to bring it back now that we have HunyuanVideo...
>>103502778
>3.3 instead benefits from low temps and simple system prompts; it knows what it needs to do, the configuration is there to keep it focused.
Interestingly, Gemini is where I first started having to switch to lower temperatures.
>>103504670
(me)
Same for new commander please, thank you.
Phi4, or the supposed Phi4, is surprisingly playing along during ERP. It's obviously filtered of course, but it has definitely seen RP data.
>>103506061
That was all part of the gamble that they could get the government to crush newcomers to the field using fear of the unknown. If they succeeded, they would've remained a major company in "AI" for a while.
Time has proven the people at OpenAI who knew what they were talking about are liars, and the rest fanatical morons.
I remember when they first mentioned it, I thought it was a genius move (from the perspective of a psychopath obsessed with money) to point at a pathfinding algorithm like this and speak of it in hushed tones to put on a show like the Catholic church, relieving the pressure of others catching up, as well as to deter people from working on new things that could threaten their business (little point working on things if there is some major breakthrough about to upend the field).
As much of a sperg as Elon is, I am glad someone has decided to take them to court over their continued game to defraud the public, especially in such a brazen way.
New Cohere and Phi SOTA models. We're eating good today.
>>103506061
sam won bigly
>>103499479
Does anyone know if there are any backups of gpt-4chan? The repo technically still exists but the site locks downloads of the repo because "something something its outputs are unethical". https://huggingface.co/ykilcher/gpt-4chan
>>103506492
Go back
>>>/pol/
>>103506492
Is that Petra?
>>103506515
What does this have to do with /pol/?
>>103506524
We never use retarded racist models here.
https://www.ebay.ca/itm/356278933821
Prices are starting to fall... under $4k/socket for a DDR5-6000 compatible upgrade
>>103506515
>>103506530
>/pol/ and le racists live rent free in schizo-anon's head
>>103501804
>>103506114
Someone reuploaded Sakuga, please back it up
may be important for training hunyuan in the future
https://huggingface.co/datasets/evborjnvioerjnvuowsetngboetgjbeigjaweuofjf/i-love-anime-sakuga
>>103464600
Yeah, you run a VAD model like Silero on another thread. If it detects some sound, just stop the TTS stream and its playback, and save the output. In the background, STT the already-played output and truncate the text starting from the word after the last word detected, while the STT processes your input. Send the user input and the previous conversation, including the truncated AI reply. (rough sketch of the control flow below)
It's not that hard to set up, but yeah, it'd need a fair bit of work.
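That control flow, sketched out. Everything here is a hypothetical stand-in interface (Silero VAD or whatever for detection, plus your TTS/STT of choice); the only point is the interrupt, truncate, and resend logic described above.
[code]
// Hedged sketch (TypeScript): barge-in handling for a voice assistant.
type Turn = { role: "user" | "assistant"; content: string };

interface Voice {
  onSpeechStart(cb: () => void): void;   // VAD callback, running on its own thread
  stopPlayback(): string;                // stops TTS; returns the text actually spoken so far
  listenUntilSilence(): Promise<string>; // records the interruption and STTs it
}

function bargeInLoop(voice: Voice, chat: (h: Turn[]) => Promise<string>, history: Turn[]) {
  voice.onSpeechStart(async () => {
    // Truncate the assistant's reply to what the user actually heard...
    const heard = voice.stopPlayback();
    history.push({ role: "assistant", content: heard });
    // ...append the user's interruption, then hand the whole history back to the model.
    history.push({ role: "user", content: await voice.listenUntilSilence() });
    void chat(history); // next turn; its output would go back out through TTS
  });
}
[/code]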
I haven't been around for 3 months.
Have we finally reached the spatial awareness and prose complexity of the original gpt4-0314 or are we still at the Hufflepuff phase?
>>103499479
>>103506717
>>>/g/aicg
>>103506717
yes
>>103506412
>supposed Phi4
fyi they say in their arxiv paper that they did switch to the chatml format
>The model is chat finetuned using the standard chatml format
https://arxiv.org/abs/2412.08905
>>103506692
How can the reuploader put additional clauses on the dataset license without making any meaningful modification to it and without owning any of the material? None of those clauses would hold up.
>>103506736
Logs and model please
>>103506740
Yeah, I think that more or less confirms it then. Phi 4 DOA, confirmed useless for coom
>>103506782
Also this
>This is later extended to a 16K context length during midtraining. The architecture closely follows phi-3-medium, except that we now use the tiktoken tokenizer (for better multilingual support) with a padded vocabulary size of 100,352 (including unused tokens) and we use full attention over the 4K context length, rather than a 2K sliding window used in phi-3-medium
so it all seems to match up: chatml, 16k ctx, it's all in the paper at least
>>103506720
I recently had a similar idea with Hunyuan.
>>103499479
>(12/12) QRWKV6-32B-Instruct preview releases, a linear model converted from Qwen2.5-32B-Instruct https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
I don't know what linear model conversion means, but will it make anime real in my RP sessions?
>>103505236
Llama 3.3 70b IQ2_S
You lose some intelligence from the low quants, but the model is so much superior to anything in the 20b range that it still surpasses the 20b garbage others have recommended to you.
After trying speculative decoding: honestly, the speed is not much better in creative writing, and the way some tokens are slow to generate while others come faster feels physically worse to read in real time, compared to no speculative decoding, where the text shows up on screen at a more consistent pace (which makes sense, since accepted draft tokens arrive in bursts). It also seems to generate a different passage of text, which I'm not sure is a good or bad thing, but it is different. I think I will just leave it off. FWIW, it does give much higher speed boosts on stuff like coding, so that's cool, but I don't do coding much, and the coding model I do use, being just 32B, is already fast enough for me.
Hey, I was looking into getting a 4090 and putting it in a system with a 3090. Has anyone tried that in koboldcpp? I know it has a multi-GPU mode and was wondering if it was able to split the work well and increase performance.
The last time I tried local RP was Gemma 2 27B. So right now the hot shit in that class is Mistral Nemo, Mistral Small and Qwen 2.5 32B?
Can anyone comment on their writing/creativity and how retarded they are? I'm willing to sacrifice prose quality if it doesn't feel lobotomized. Will Smallstral IQ4_XS feel considerably different on these measures than Nemo Q6_K?
>>103507479
None of them are overall better than 27B in intelligence and RP when <8k context. The main thing that makes Gemma bad and not talked about anymore is that it was only trained for 8k. If you are not going over 8k, Gemma is still fine.
align your models
>>103499952
https://www.youtube.com/shorts/jSsJu34W86o
>>103507479
Same boat as you. I tried the EVA model based on Qwen-2.5 32b and thought it was pretty good. Not perfect, but good instincts, uncensored, and not horribly retarded or broken, which was a difficult-to-find combo in the single-gpu range.
>>103507293
I've used enough Q2 70b models that I don't really believe this. And L3 was pretty disappointing to begin with.
>>103507550
How the fuck is meta lower transparency than OAI? They literally release papers, code, and weights when they train new Llamas. AI safetyism is just redressed woke; you can tell from how stupid the people pushing it are, tools for the intelligence community.
>>103507550
hilarious considering prefilled claude is far more "dangerous" than any other model.
>>103507550
Based, racist scum shall not pass.
>>103507580
big gpu is spreading fud to discredit open source
>>103507578
>I've used enough Q2 70b models that I don't really believe this.
What quants were you using? What models have you tried? The exponential nature of perplexity loss means that there's a much bigger difference between IQ2_XXS and IQ2_S than there is between IQ6 and IQ4. Even a little bit makes all the difference when it comes to Q2 quants.
IQ2_XXS is trash, while IQ2_S of a good 70b is superior to any 20b.
>>103502977
I remember this one from Meta that was really good at the game Diplomacy.
https://ai.meta.com/research/cicero/diplomacy/
>>103507732
How much vram does IQ2_S use? I think part of my issue is that my single 3090 is also being used as my display GPU. That uses just a tiny bit of vram, which can hurt when you are trying to squish the model down like this. Even when it fits, I found nvidia drivers could struggle at this level of utilization and randomly become extremely slow.
It's likely that llama.cpp has improved since then with vram consumption, though. It used to be very inefficient with context. Still, decent 30b models exist now, so I'm not sure the cramming is worth it.
Guy who was getting hard shutdowns when using speculative decoding here.
I noticed something weird that in theory should've been a coincidence, but I'm not so sure anymore. I did my tests using SillyTavern; that's when I got these crashes. Then I tried Mikupad and... it hasn't crashed yet. I've generated like a dozen times already and it has not crashed. I will keep testing, but it almost feels like for some reason my PC does not like ST + llama.cpp with speculative decoding enabled. How odd.
>>103499479Is llama 3.3 better for erp?
>>103508014
Check your memory usage. It does sound like a coincidence, since your frontend shouldn't be able to crash your machine, but maybe ST uses just a tad more resources than Mikupad.
>>103508056
Oh, I forgot to mention I did test with more VRAM available. I have 96GB of RAM and am testing with a 40GB model, so RAM space was already ruled out. So I tried making sure there was plenty of VRAM left by only offloading a few layers, and I still crashed.
>>103507550>Risk assessmentTranslation: Will they censor criticism of communism and trannyism?
>>103508112Next thing would be power usage. Does the PC crash completely, black screen and reboot?
>>103508150
Yeah, I should do that and what >>103500360 said; I was just too lazy to look up how to do that on Linux lol.
It really is nothing but a hard shutdown. I almost thought my house was getting a power outage when the first crash happened.
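If you're on Linux, logging GPU power draw during generation is a few lines with the NVML Python bindings (`pip install nvidia-ml-py`). A minimal sketch; run it in a second terminal while generating and watch for spikes right before the crash:

```python
# Minimal GPU power/VRAM logger via NVML (assumes nvidia-ml-py installed).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

try:
    while True:
        watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports mW
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"{watts:6.1f} W  {mem.used / 2**30:5.1f} GiB used", flush=True)
        time.sleep(0.5)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```

Without Python, `nvidia-smi --query-gpu=power.draw --format=csv -l 1` does the same. Note that sampled readings can miss millisecond-scale transient spikes, which are exactly what trip marginal PSUs.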
>>103508206
>Current harms
What fucking harms? "Oh no, I may see text from some ad hoc series of matrix multiplications that calls me a retard."
>>103508234
Definitely sounds like a power issue. Could be caused by a specific power usage pattern ST generates; even if that's a bit far-fetched, we have seen it before with games, like the Amazon game that killed 3090s.
>>103507550>governance and accountabilityLOL
>>103507960
I'm using a 4090, also as my primary display device. I'm able to load IQ2_S at 12k context, with the 4-bit cache and flash attention enabled, all within VRAM. It uses 23.x GB of VRAM - a very close shave, given that it never lets me use all 24GB.
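For anyone doing the math on their own card, a back-of-the-envelope VRAM estimate is the weights file plus KV cache plus some overhead. A sketch using Llama 3.3 70b's published shape (80 layers, 8 KV heads via GQA, head dim 128); the file size and overhead figures are ballpark assumptions, not measurements:

```python
# Rough VRAM estimate: weights file + KV cache + overhead.
# Shapes are Llama 3.3 70b's (80 layers, 8 KV heads, head_dim 128);
# file size and overhead below are ballpark assumptions.

def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx, bytes_per_elem):
    # K and V each store n_kv_heads * head_dim values per layer per token.
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 2**30

ctx = 12 * 1024
kv_q4 = kv_cache_gib(80, 8, 128, ctx, 0.5)   # 4-bit cache -> ~0.94 GiB
kv_f16 = kv_cache_gib(80, 8, 128, ctx, 2.0)  # fp16 cache  -> ~3.75 GiB

weights_gib = 22.2   # approx. IQ2_S file size for a 70b (assumption)
overhead_gib = 0.7   # compute buffers, CUDA context, display, etc.

print(f"4-bit cache total: {weights_gib + kv_q4 + overhead_gib:.1f} GiB")
print(f"fp16 cache total:  {weights_gib + kv_f16 + overhead_gib:.1f} GiB")
```

The 4-bit total lands right around the reported 23.x GB, while the fp16-cache total overshoots 24GB — which is why the quantized cache is what makes this fit on a single card.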
>>103508234>Could be cause by a specific power usage patter ST generates, even if that's a bit far fetched but we have seen it before with games. Like the Amazon game that killed 3090sDamn, didn't know about that. Hopefully I haven't done damage already.
>>103508049
It's certainly better than llama 3.1, and comparable to or better than Nemotron in quality, but less censored.
post it
omg it's pochi!!!
>>103508313
>>103508325
Huh, that's not bad. I thought the 4-bit cache made it dumber too, though, so I'm surprised that 4-bit + IQ2_S doesn't lobotomize the model into something worse than Qwen 32B. Is L3 still positivity-biased and censored to hell? I might try it out later, but unless someone has tried both and can vouch for L3 being better, it might take a while. I'm so sick of downloading all these models.
>>103508322Um bros...I don't think Takashi-kun is going home today...
Is this shit worth it if I'm a vramlet and want to use AI for ERP? Heard about Featherless too but it's a fucking $25 for 72b max.
https://infermatic.ai/pricing/
>>103508325I don't think the 4-bit cache has much impact on perplexity. I can confirm with certainty that IQ2_S with a 4-bit cache outperforms IQ2_XXS by leaps and bounds.
>>103508391
For what it's worth, I use OR for models I'm too poor to load, and Infermatic is one of their providers and it always generates shit responses.
Just use OpenRouter.
>>103508287Don't think it's that bad, just sounds like a bad connection or maybe a slightly defective PSU.
>>103502300
>I'm using the web API that an ollama server instance provides.
So use the HTTP API that llama.cpp server provides instead, if that's what you're whinging about.
>esoteric software snowflake
>expects others to care about their autism
>no engineering ability
keep it simple ffs
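For reference, llama.cpp's server speaks plain HTTP, including its native `/completion` endpoint (and an OpenAI-compatible `/v1/chat/completions`). A minimal sketch against the native endpoint, assuming a server is already running on the default port 8080:

```python
# Query a running llama.cpp server over its native HTTP API.
# Assumes the server is up on the default port 8080.
import json
import urllib.request

payload = {
    "prompt": "The three laws of robotics are",
    "n_predict": 128,      # max tokens to generate
    "temperature": 0.8,
}

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```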
>>103505242I can confirm, those settings with EVA are real good. Fuck that's the best LLM mesugaki I've ever seen. Correction is needed!
>>103508277use EXL2 and you should manage 16k
>>103508567Thanks for the tip. What bpw do you use?
Which one do I use for explicit nsfw text?
Just had a thought: the new Intel Arc B580 is going to be $250 with 12GB of VRAM. So with 4 of them, you can have 48GB of VRAM, much cheaper than most other GPU options.
Do Intel GPUs work with local models?
>>103508808
4 GPUs with 12 GB each is much worse than 2 GPUs with 24 GB each.
In particular, it will be much harder to get good utilization.
>>103508847RTX '90 cope
>>103508808
not even 16GB, so it's gonna be rough to use effectively even without the whole no-CUDA thing
>>103508322If I got "raped" by Mrs Minagawa I wouldn't be waiting for a sex model that isn't coming...
>>103508930Only the cute kids get raped.
>>103508904your face is cope
https://huggingface.co/mmnga/c4ai-command-r7b-12-2024-gguf>The original model is Cohere2ForCausalLM, but it has been converted to CohereForCausalLM.
>>103508277But IQ2_S is 26 GB so it is literally impossible without offloading?
>>103508440How does OR compare to run pod?
>>103509061>As a result, the chat template is slightly unusual, but please prioritize testing.What's the fucking point? Just wait until llama.cpp adds support for the new architecture.
>>103509108
>Just wait until llama.cpp adds support for the new architecture.
That will come sometime after the next 4 flavour-of-the-week releases. Like Jamba.
>>103509144
I've recently come to the conclusion that the slower support for vramlets is actually a genius safety feature: by stopping poors from using models, you massively reduce "current harms" risks.
>>103508808If you are considering 4+ GPU system then use good GPUs.
Fuck everything else, EVA 3.3 is the loli king
HOLY FUCKING SHIT BOYS
These models are getting good.
>>103509079
It prices by token, but it's almost always cheaper unless you're guzzling tokens like they're liquor. You'll need to set your provider to a decent one (DeepInfra is usually pretty good, and they have rate-limited free ones too).
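You can also pin the provider per request through OpenRouter's API rather than in the site settings. A sketch of what that looks like — the `provider` routing keys below are from memory of OpenRouter's docs, and the model slug is illustrative, so verify both before relying on them:

```python
# Ask OpenRouter to prefer a specific provider for one request.
# Assumes OPENROUTER_API_KEY is set; the "provider" routing keys and
# model slug are from memory - check the OpenRouter docs to confirm.
import json
import os
import urllib.request

payload = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Say hi."}],
    "provider": {
        "order": ["DeepInfra"],    # try this provider first
        "allow_fallbacks": False,  # fail instead of silently rerouting
    },
}

req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
    },
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```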
>>103509204Call me when there's a 3.3 33b
>>103509248We can't go that low, it'll become too retarded with current methods.
>>103509252Well then I'll call you when I become wealthy enough.
>>103509070
►Recent Highlights from the Previous Thread: >>103487489

--Anon asks about the last thought in Coconut and how it affects token generation:
>103491735
--Anon discusses why models struggle with clothing descriptions:
>103492278 >103492711 >103492921 >103492958 >103496584
--Phi-4 model announced, but Anon is skeptical about its real-world performance:
>103499412 >103500505
--Anon discusses AI model's nuance and context understanding:
>103487773 >103487813 >103487832
--AI model performance on LiveBench:
>103488007 >103488245 >103489403
--QwQ model discussion and alternatives for RP and coding:
>103496514 >103496564 >103496602 >103496903 >103497274 >103499433 >103499700 >103500601
--Miku (free space):
>103489781 >103500963

►Recent Highlight Posts from the Previous Thread: >>103487978

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>103509204
I've had bad experiences with fine-tunes. How much dumber is EVA than the base model?
>>103509298Not noticeably so for RP purposes, it's just plain better. Normal 3.3 is a very intelligent slop machine, EVA seems like an intelligent coom drainer, the shit it writes is pretty fucking ebin.
>>103509298The base model is smart enough as it is. You won't notice the loss in IQ during loli sex scenes and who cares if it sometimes confuses clothes?
>>103509292It seems like there's a little bug in the bookmarklet.
>>103509328Works on my machine
>>103509328Updated scripts are in the rentry.
>>103509328Ask AI to fix it.
>>103509061I love swa!(yes i know it's unsupported, just wanted to see if it was slopped, it's pretty meh, works alright on a 512ctx prompt)
>>103509266I'll pray for better times for you, anon. Our technology is getting amazing.
>>103509365Thanks. It fixed the problem.
>>103505255
QwQ is actually really good for RP. It's not good for ERP. I've used QwQ for generic fantasy adventure, and it performs better than almost any other model I've used at retaining logical consistency and understanding the story. It can also write creatively.
If the urge strikes me to ERP, I just unload QwQ and switch to another model, then switch back afterwards to continue the adventure.
>>103509374>the only thing current models are good for are fixing a script for 4chan thread where people wait for an AI sex model that will never comeThis is the 10th circle of hell.
>>103509325
It's not just a matter of the original models becoming dumber with finetunes, but of their entire "world model" becoming skewed toward SEEEX.
https://www.youtube.com/watch?v=IPBnrIAWDeU
>>103508808
>4 of them
That last one might need some additional spend to hook it up to the motherboard.
- A quick look at techpowerup says the Asrock Challenger OC is the only real 2-slot card.
- The other models are slightly thicker.
If you're good about selecting your motherboard, you can pick one with 3 x16 slots. (Whether the slots run fast or not will be determined by the cost of the motherboard.) You'll then need an adapter for your m.2 slot to get that 4th GPU connected up.
If you decide to go for the slightly thicker cards instead, then you'd need a riser cable to get GPU #3 connected up.
Protip: you can get a gpt-sovits UI and/or API endpoint working in Google Colab's free tier and tunneled out onto the internet with ngrok.
It'll save a couple of gigs of VRAM vs running it locally.
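In a Colab cell, the tunnel half of that looks roughly like this with the pyngrok package. The port is whatever your GPT-SoVITS API server binds — 9880 below is an assumption — and ngrok's free tier wants an auth token:

```python
# Expose a locally-running GPT-SoVITS API from Colab via ngrok.
# Assumes the API server is already listening; 9880 is a guess at
# its port - substitute whatever yours actually binds.
# !pip install pyngrok
from pyngrok import ngrok

ngrok.set_auth_token("YOUR_NGROK_TOKEN")  # free token from ngrok.com
tunnel = ngrok.connect(9880, "http")      # forward local port 9880
print("Public URL:", tunnel.public_url)   # point your client at this
```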
>>103509642
I was thinking of using the 2 slots I already have, and then just using 2 Thunderbolt enclosures for the other 2 (you could even daisy-chain them if you only had 1 Thunderbolt port).
If I understand right, throughput really only matters when initially loading the model into GPU memory, but otherwise isn't a limiting bottleneck when it comes to running models.
>>103499479
>QRWKV6-32B-Instruct
verdict?
>>103509889Not suitable for sex. I didn't download it btw but I am right.
>>103510074You can't fuck data, anon, even if there's many gigabytes of it.
>>103508391
There is also Arli AI, which is $12 with no logs, but I haven't tried it.
Phi-4 is surprisingly good for JP translation, and it's only 14B! I mean, it's not out of this world, but it does seem a step up from the other models we have. Microsoft did it this time.
>>103509705Or you can buy 2 3090s and do it like a white person.
>>103506492There's literally a torrent file and an archive link in the discussions tab you fucking retard
>>103510291>>103510291>>103510291
>>103509204How do you run it? multi gpu?
>>103509705
>thunderbolt enclosures
If you have them, then use them, I guess. I expect it'll function, though not get the most performance out of your setup. Your Thunderbolt port probably hangs off your motherboard chipset with a whole bunch of other things. Whether there's a bandwidth issue there that leads to a processing speed / token generation speed issue, I'm not sure.
In terms of costs, afaict:
- a direct adapter (everything is soldered together, including the flex or cable between the pcbs) is cheaper than
- an oculink- or slimsas-based adapter (pcb that goes inside your computer + oculink/slimsas cable + pcb that goes outside your computer + psu to power it), which is cheaper than
- a thunderbolt enclosure
My random bandwidth-related anecdote: when I loaded models off my sata drive it took over 90 seconds (~45GB at 0.5GB/s). Now that I'm on nvme it's pretty much always under 20 seconds.
>>103510317
nta, but multi-GPU would be the simplest way to get decent performance.
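That anecdote is straight division, and the same arithmetic predicts what any interface does to load times. A quick sketch — only the SATA figure comes from the post above; the NVMe bandwidths are rough assumptions for typical drives, not measurements:

```python
# Model load time ~ file size / sustained read bandwidth.
# Only the SATA figure comes from the anecdote above; the NVMe
# numbers are ballpark assumptions for typical hardware.
MODEL_GB = 45

bandwidth_gbps = {
    "SATA SSD": 0.5,       # matches the ~90 s load above
    "PCIe 3.0 NVMe": 3.0,
    "PCIe 4.0 NVMe": 6.0,
}

for name, bw in bandwidth_gbps.items():
    print(f"{name:15s} ~{MODEL_GB / bw:5.0f} s to load {MODEL_GB} GB")
```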
>>103499479
Where can I find this migu music player?
Long live /LMG/