/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107997948 & >>107986301

►News
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache: support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107997948

--Papers:
>107999601 >107999634
--GPU offloading tradeoffs and multimodal support in llama.cpp:
>107999073 >107999192 >107999228 >107999351 >107999408 >107999434 >107999437 >108000983 >108001095 >108001101 >108001152 >108001289 >108001475 >108001533 >108001553 >108001566 >108001612 >108001633 >107999250 >107999287 >107999301 >107999423 >108001903 >108001981
--Stable-DiffCoder-8B benchmark performance and discussion on diffusion model efficiency:
>108001010 >108001106 >108001620 >108001109 >108001172 >108001216 >108004118 >108004176 >108004237 >108004283 >108004343
--Trinity model's explicit content generation and token prediction comparisons:
>107999802 >108000348 >108000369 >108001448 >108001514 >108001792 >108002123 >108002142 >108002336 >108002598
--Fine-tuning 400B MoE for roleplay with long context using novel datasets:
>108001139 >108001164 >108001185 >108001319 >108001402 >108003532 >108003598 >108003946 >108003968
--Repurposing old GPUs with PCIe expansion board for multi-GPU AI setups:
>107998221 >107998260 >107999172
--Pipeline for converting scanned PDFs to EPUB with graph handling:
>107999667 >108000337 >108001320
--SillyTavern fork adds banned strings and regex support with TFS:
>108000166 >108000735 >108000921 >108002916
--Local GPU setups vs cloud:
>107998010 >107998028 >107998070 >107998115 >107998232 >107998263 >107998279 >107998376 >107998408 >107998428 >107998492 >107998095 >107998132 >107998454 >107998675
--400B Trinity model enables uncensored erotica without fine-tuning or ablation:
>108003672 >108004704 >108004713 >108004829 >108004839 >108004872 >108004874 >108004869 >108004898 >108004913 >108005031
--Mozilla's AI "rebel alliance" with ethics-focused funding:
>108004243 >108004266
--Miku (free space):
>107998400 >107999172 >108003297 >108004558

►Recent Highlight Posts from the Previous Thread: >>107997953

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
today is the day
>>108006868
>6868
so close

>>108006864
>I'm on the concedo wagon myself
Kobold guy? He dumbs down things too much imo. I'm still upset by his arbitrary limits on the number of banned strings.
Holy shit anons! K2.5 is actually REALLY good at transcribing Japanese text, like, it's almost indistinguishable from Gemini 3! The fuck did the Chinese do to make it so good?
Friendly reminder: backticks > quotes
>>108006994distill from claude 4.5 opus
True base gguf status?
>>108006994
How preachy is it with no no words?

> https://huggingface.co/dphn/Dolphin-Mistral-24B-Venice-Edition
how good is this anons?

>>108006994
what are some stuff it can read that qwen3vl 235b struggled with?

>>108007286
no better than any other mistral tune

>>108007296
ok

>read something about cloudflare downtime caused by rust .unwrap function which caused a shitstorm
>whatever
>fast forward to today, I'm vibegooning some open source rust project because I cba learning the language
>LLM puts .unwrap almost everywhere
goncerning :DDddd

>>108007380
>has no idea why the issue actually happened
>vibecoding
makes sense

>>108007393
so far no issue thoughbeight :Dddd
>>108007291
Well, here's a page I had both transcribe and here were the results:

Qwen VL 235B Instruct (using Poe):
Narration: そして勇者は冒険の末魔王を倒した
Male 1: それでクレアナ話ってなんなの?こんな森の奥まで呼び出して…こちらです
Female 1: これからの平和な世の中が始まるね!そうですね…勇者ライ様のおかげで世に平和が戻りました
Male 1: な、何をするんだ!?
Female 1: 勇者様…私と教団は勇者様の意思は絶対と女神に神託を受け従ってまいりました…
Male 1: 洞窟?
Female 1: うわっ

And here is K2.5 using the NVIDIA API:
Narration: そして勇者は冒険の末魔王を倒した
Male 1: これからは平和な世の中が始まるね!
Female 1: そうですね…勇者ライ様のおかげで世に平和が戻りました
Male 1: それでクレアナ
Male 1: 話ってなんなの?
Male 1: こんな森の奥まで呼び出して…
Female 1: こちらです
Male 1: 洞窟?
Male 1: うわつ
Male 1: いてつ
Male 1: な、何をするんだ!?
Female 1: 勇者様…私と教団は勇者様の意思は絶対と女神に神託を受け従ってまいりました…
Male 1: ?

Seems pretty obvious which one won.
>>108007380
>rust
Your fault for using glownig shit.

>>108007380
dond worry :DD rusd is memory safe so ids ok :DDDDD

>>108007797
how did gondola survive but not the original
> Microsoft lost $357 billion in market cap as stock plunged most since 2020
> Analyst Ben Reitzes of Melius Research, with a buy rating on Microsoft stock, said during CNBC’s “Squawk on the Street” on Thursday that Microsoft should double down on data center construction.
> “I think that there’s an execution issue here with Azure, where they need to literally stand up buildings a little faster,” he said.
> Analysts at UBS led by Karl Keirstead questioned Microsoft’s choice to secure artificial intelligence computing capacity for products such as the Microsoft 365 Copilot productivity software add-on that has yet to succeed as much as OpenAI’s ChatGPT.
> “M365 revs growth is not accelerating due to Copilot, many checks on Copilot don’t suggest a strong usage ramp (we plan to refresh our own checks in case we’ve missed a usage ramp) and the model market appears crowded and capital-intensive,” the UBS analysts wrote. “We think Microsoft needs to ‘prove’ that these are good investments.”
https://www.cnbc.com/2026/01/29/microsoft-market-cap-earnings.html

Wonder if they'll hit their "monetization event" before the market loses patience.
Do they ever reveal what the anonymous models on the model testing sites are? Got a really good one on LMarena and it promptly vanished from ever being called again and now I'm sad. :(
>>108008124
Which one was it called, anon?

>>108007380
>vibegooning
what does this mean?

>>108008167
Raptor-0112. I couldn't tell if it was because it was brain-damaged or what, but it was the only one that really surprised me when it came to word choice and additions to the plot. It came up with some stuff that wasn't in the prompt, but kept with the tone and felt like it added to it.

>>108008099
microshaft literally just needs to fix word and excel integration with copilot
that's it, that would skyrocket adoption instantly

>>108008200
nta but ay i remember you, you postulated/hoped it was v4 right? got any logs of the model?

>>108007061
>distill from claude 4.5 opus
I think they did, just based on swapping opus->k2.5 and regenerating. It takes RP in similar directions. Opus doesn't waste time on safety checks though.
>how good is this anons?
Tried it when it came out. Forgets instructions after a few turns. Even the example about never using python or whatever they said it could do.

>arcee-ai/Trinity-Large-TrueBase
>arcee-ai/Trinity-Large-Base
>arcee-ai/Trinity-Large-Preview
If all I want is something completing my text story in mikupad without any censoring, which one should I go with?
Kind of a noob here. Sorry in advance for the long-winded question. I have 24gb of vram and 64gb of ram. I was under the impression that of all the models out there, the best model in terms of world knowledge and general usefulness while maintaining usable speed is gpt-oss-120b-mxfp4 gguf (if I offload experts to cpu and max out the gpu layers, I can get 25+ tok/s if I keep the context small; prompt processing gets very slow as the context fills though, unfortunately). However, I don't see it anywhere on the rentry for recommended models. Is there a reason for that? Are the models listed there better options for general use? qwen3 32b or gemma3 27b, for example.

Separate from that question, I notice when I'm using gpt-oss-120b in oobabooga with the built-in/default instruction template and parameters, the output tends toward annoying behaviors that I don't like. For example, putting every answer into a poorly-formatted table even when it's completely unnecessary and I didn't ask for one. It makes me think that I'm using the wrong settings somehow, but idk what to change because the official documentation doesn't really say how to set the parameters, so I have it set to the "instruct" preset, and the UI for the instruction template says "This gets autodetected; you usually don't need to change it." And I assume I should be using instruct mode, right?
>>108008372
normal base

>>108006860
Ok, go with me on this for a second.
Today's AI is retarded at certain things, but has technological possibility advantages over real life retards. Now, hear me out.
Imagine if you could give a real life retard full fidelity photographic memory.
Boom. Suddenly, that guy is the smartest retard on the planet.
Ok, so... There's a functional jump for this. With real life AI.
We are all going to be doing this, very soon.
"Photographic" introspection.
A cache hypervisor that allows the model to save states of KV cache as it iterates a query. During the thinking stages, it can instantly consult save states, with a hypervisor to the cache, that is an algorithm to save cache windows in full and reproduce them near instantly.
During iteration, being able to factor in a secondary branch, using previous memory states, could accelerate the state of AI thought output, and cut down on wasted iterative thoughts.
Predictive branching needs to work in more directions than just the future, if the initial query was misunderstood or must be used as an additional consideration input. (Artificially creating weight value changes, based on a repeat of existing data.)
Why... To get it to recursively improve this system, you may even have to let the iterative count of previous memory pulls in the algorithm be a recorded factor, and allow the AI to manage its own shadow weights.
All of this is possible using the same tech we've had since the dawn of the Super Nintendo emulator, but applied at the cache management level. (Save states.)
Then use an AI to manage the utilization of the cache save state algorithm.
After a minor amount of inference training... You could have the most accurate retard in a box, out of anybody around.

Any other model for computer stuff? For a 16GB GPU? Qwen3-Coder seems alright but I want to try something newer. Also I am having fun with this stuff, already switched to llamacpp from ollama.

>>108008316
One, but it's pretty fucked up. Lemme roll the lmarena slots and see if it's back in rotation with something a little tamer.

why is GLM addicted to things happening
you set up a barrier so X doesn't happen and literally next scene X happens as a "test"

>>108008408
Holy shit, someone who actually read the sticky.
>Is there a reason for that?
If I had to take a guess it's because of the general dislike towards the gpt-oss models due to the censorship and refusals. If it works for your usecase, I recommend you stick with it.
>ooba
Go to the parameters tab and take a look at the instruction template after you've loaded a model. It should show you the correct template. You can cross reference it with the chat template on the huggingface repo of the model you are using to double check. Your issue is likely a sampler or prompt issue. I'm not quite sure what the optimal parameters are for your use case, but I like to run:
>temp 1
>min_p 0.05
>top_p 1
>dry_multiplier 0.8
for ERP and creative. Lower temp for coding.

>>108008491
Thanks. Any reason not to go with "true base" or "preview"?

>>108008503
Here's what my actual retard thinks of that.

>>108008503
Functionally, hear me out and really consider this at a technical level.
How big is a Super Nintendo game save state file? It records the full exact moment of the game, but the file is tiny.
Of such size that, if we were talking RAM cache (GPU VRAM or otherwise), this level of data management seems trivial, and in the right ballpark of working for states of cache chunks.
Now, the tricky part of this is trying to make an algorithm that handles variable sizes for the cache chunks, so this can work with anything.
Which is why a successful implementation of this would have to start as a hypervisor or manager that works seamlessly with the existing cache management, so as not to lose performance at the cost of having memory states available on the fly, as controlled within cache. (I'm suggesting running this whole thing in-situ, btw. If it runs within the cache itself, that will give the fastest returns on whether this works or not, and allow scaling.)
Emulator code is out there, I'm sure this could fit as a running sub-daemon or something.
Figuring out the triggers for whether a "flashback" is the right call or not... Hmm... That's what I think would take some inference time.
>>108008580
preview is an instruct version, which is for chatting rather than text completion. true base is a heavily filtered variant of the normal base, which means it will be less optimal for text completion due to a lack of knowledge. the only reason true base exists is if you wanted to make your own custom instruct version of the model.

>>108008586
An optimization on a cognitive process, by brute force.
Choosing when to recall a memory, based on weights, whether they be hard set, or soft weights that occur in situ.

>>108008607
Do I have a flashback to my initial memory state here, yes/no?
^Enabling this to be a question provides options that do not exist if it is not.

>>108008603
I see, thanks. Well for now there don't seem to be base gguf quants available.
So I want to get the instruct version as a first quick test, but I'm completely unable to download anything outside of the last shard: https://huggingface.co/arcee-ai/Trinity-Large-Preview-GGUF/tree/main/Trinity-Large-Preview-IQ2_S
2 of 3 give me 403 and I'm not sure why. Anyone else can test that?

>>108008624
Enabling any human to have full fidelity reference to past memory states would make them seem like a functional genius in modern society, even if this did not directly raise their IQ at all.
It is a functional cognitive enablement that we can make for AI, but can not perform for ourselves.
Full fidelity memory reference would be a superpower to a human thinker.
Copy pasting data is trivial; the management is the hard part, but once executed, this should give it some capability improvement.
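For what it's worth, the "save state" primitive already exists at the API level. Here's a minimal sketch of the flashback idea using llama-cpp-python, whose save_state()/load_state() snapshot and restore the full inference state (KV cache included). The model path and prompt strings are placeholders, and the branch-selection "hypervisor" logic is exactly the part that doesn't exist yet:

from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=4096)  # placeholder path

# Fill the KV cache with the shared context once.
prefix = llm.tokenize(b"The initial query, processed exactly once.")
llm.eval(prefix)

# "Save state": serialize the KV cache and eval position.
snapshot = llm.save_state()

# Explore thought branch A from the cached prefix.
llm.eval(llm.tokenize(b" ...continue down approach A...", add_bos=False))
# (sample branch-A tokens here)

# "Flashback": restore the saved cache near-instantly instead of
# re-processing the prefix, then explore branch B from the same state.
llm.load_state(snapshot)
llm.eval(llm.tokenize(b" ...continue down approach B...", add_bos=False))
# (sample branch-B tokens here)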
>>108008645
just tried to download that gguf and i also got the same error. think it might be a broken file or something.
technically you can create your own ggufs for these models, you just need to download the fp16 of the model and use the llama-quantize tool. the architecture has been supported by llama.cpp for like half a year now

>>108008607
So my retard is very experimental.
It's biased towards trying to map high-level concepts into the real computer science. And all the RLHF'd enthusiasm / "you're absolutely right" concepts have been completely removed.
What I mean is, don't let it discourage you if you're building something.

>>108008645
Yes, those are broken. Same for me yesterday.
Get them from here: https://huggingface.co/bartowski/arcee-ai_Trinity-Large-Preview-GGUF/tree/main

>>108008668
>>108008731
Thanks anon, yeah I'm getting the ones from bartowski.
There was also unsloth but his are way bigger quant for quant.

>>108008372
In the case of Trinity I would recommend just going with Preview since, most of the time, instruct tuning improves even raw completion quality when it's not overbaked, which according to them seems to be the case with Preview. Raw prediction models, or bases, are generally speaking significantly retarded; you don't want to use them if a lightly tuned version is available.
Thanks anon in previous thread, --mmproj does actually work with llama-server in newer releases of llama.cpp. Inference of LightOnOCR2 is usable on RX580 with acceptable times for development.
>>108008816
Thanks for the clarification, anon.

>>108008710
It's not wrong; this framework would just allow efficient dissection and optimization of thinking tasks themselves, probably.
Look, if we're going to move to recursive levels of "thought" and "simulation", we may as well grease the wheels and have a comparable mechanism available to work with (before the real deal arrives).
This is building a tool to enable work on another tool.
The end goal would be a more efficient thinker, but the path to get there is full of work within work.

>>108008553
Thanks. I tried the parameters you suggested, but I'm still seeing the same behavior from gpt-oss. See the attached pic for examples. It's baffling to me. The huggingface repo says to use --jinja to use the template embedded in the gguf, which I'm already doing, and it seems to be working correctly. There is a whole page on using the "harmony response format" to build your own system prompt and message format, but that's way over my head and I really don't know how to even begin with that. It doesn't seem like the kind of thing that would be required to get decent results from the model.
How do you guys calculate how much vram a model will need?
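Back-of-the-envelope version, assuming full GPU offload: the weights take roughly the GGUF file size, plus a KV cache that grows with context, plus a ballpark overhead for compute buffers. The config numbers below are illustrative, not exact:

def estimate_vram_gb(gguf_size_gb, n_layers, n_kv_heads, head_dim, n_ctx,
                     kv_bytes=2):
    # KV cache: one K and one V tensor per layer; fp16 entries (2 bytes).
    kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * n_ctx * kv_bytes
    overhead_gb = 1.0  # compute buffers, CUDA context, etc. (rough guess)
    return gguf_size_gb + kv_cache_bytes / 1024**3 + overhead_gb

# Example: a Nemo-12B-like config (40 layers, 8 KV heads, head dim 128)
# at 16k context on a ~7 GB Q4 file comes out around 10.5 GB total.
print(estimate_vram_gb(7.0, 40, 8, 128, 16384))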
What does the current workflow you guys have look like? Currently trying to set up Kimi as a replacement for Claude code and am wondering what other anons have for maximizing productivity.
I wanted an automated way to keep up with /lmg/'s opinion of the model meta, and figured with a little more work I could extend it backwards to get the history, too. I ran the text of every /lmg/ thread starting from March 2023 through a straightforward "what model do the people in this thread have the highest opinion of" prompt (so the output was a single model name per thread), filtered to a list of the ~50 most important models. I binned by month, and then took the proportions in a given month to be those models' "market share" for that month, and made these charts.

I think there are definite "flavor of the week" effects: I definitely saw a few bursts of 2 or 3 threads in a row giving the same obscure model that never caught on, presumably when it was released. However, it definitely was not just counting occurrences, because gpt-oss appeared exactly once, and specifically as "gpt-oss-120b-heretic". So I think these effects came from the behavior of the actual humans in the threads, not my processing. (Also, "none" was an option, which got used for around 10% of the threads.)

Cutely enough, the years just so happen to fit cleanly with a neat little story: in 2023 the open model scene was led by America, 2024 by France, and 2025 by China.

My personal takeaways: Wizard2 8x22B and CommandR+ both appear less popular than I remember. I remember MythoMax being dominant for quite a while, although with how fast things moved back then 2 months was a good stretch of time. I had no idea that nous-hermes has been so consistently popular, visible almost the whole time. I kind of just remembered them as one of the best finetuners of 2023, and hadn't paid real attention since.

Sorry about the somewhat painful colors. I tried. A little. Hope you'll find it an interesting little bit of history!
>>108009129
...and zoomed in to one year at a time.
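The binning step is simple enough to sketch. classify_thread() below stands in for the actual LLM call ("what model does this thread rate highest?") and is hypothetical:

from collections import Counter, defaultdict

def market_share(threads):
    """threads: iterable of (month, thread_text) pairs."""
    votes = defaultdict(Counter)
    for month, text in threads:
        votes[month][classify_thread(text)] += 1  # one model name per thread
    # proportions within each month = that month's "market share"
    return {month: {model: n / sum(counts.values())
                    for model, n in counts.items()}
            for month, counts in votes.items()}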
>>108008979
NTA but the issue seems lrn2prompt rather than sampling
do not argue with the LLM about output format, put the model in the right context to generate the intended output, idk, maybe:
>you provide concise plain text responses without formatting
threadly reminder every llm is f(prompt)=logprobs

>>108009129
>>108009137
thats fucking awesome
>Wizard2 8x22B and CommandR+ both appear less popular than I remember
true especially command r
also where is pygmalion you fucking nig ?

>>108008731
I have 128gb ram and 32gb vram, how high of a quant can I reasonably go?

>>108009222
realistically this. nice digits btw.
https://huggingface.co/bartowski/arcee-ai_Trinity-Large-Preview-GGUF/tree/main/arcee-ai_Trinity-Large-Preview-IQ2_S

>>108009158
I only did the arguing prompt to illustrate how insistent it is at making the tables. It just seems really strange to me, I literally can't get it to respond to me without doing it. As far as learning to prompt, I acknowledge I don't know very much, but I feel like I should be able to ask a simple trivia question and get a decent answer without telling it exactly how to answer me each time. That's just a waste of effort, I might as well just google it and look at a wikipedia page at that point. Regarding the greentext from your post, where would I even put that? Is it supposed to go in the red area I underlined? I can't find anywhere else where it seems to belong. The rest of it is all about tool calling and how to render stuff. So far I've avoided making any edits to it because I have no clue what would make it better or worse. I wish someone had posted an example of their working settings somewhere, but I haven't found any. Seems like not many people are using it. I would try a more popular model, but the smaller models just don't have enough world knowledge to offer useful answers on the topics I'm interested in, and I can't run the bigger models with my rig.

>>108009129
>>108009137
I'm surprised gemma doesn't appear more prominently on the chart. I seem to remember references to gemma being ubiquitous for a long time.
does trinity beat 4.7 for rp?
>>108008979
Why not slide everything to max?

>GLM 4.5: July 2025
>GLM 4.6: September 2025
>GLM 4.7: December 2025
When's GLM 4.8?
>If this pace continues, adding ~2.5–3 months after Dec 22, 2025 points to a release around mid to late March 2026.
>Estimated GLM-4.8 release: ~March 2026 (likely between March 15–31, 2026).
Do you think it'll be better than Gemini Flash and Kimi K2.5?

>>108009712
You can only benchmaxx the model so much.

>>108009712
>GLM
We're moving on to Trinity

Speculators get the bullet first.

>>108009988
Oh! So being curious and wondering where the future might go is a crime now?!

>>108010042
Sure. Let's go. We're all gonna have our own True AI (tm) in our phones, completely offline, with infinite capacity batteries. Now what?

>>108009476
Not even close sadly
All that compute, a working example of natural intelligence, decades of research, and humans still can't figure it out. Miku is disappointed
>>108009712
>When's GLM 4.8?
don't fucking force it.
this is what got glm 4.6 air killed, people kept on asking about 4.6 air and they fucked up the model because they were rushing. they'll release something when it is BETTER than GLM 4.7, i don't care if it's 5 years from now.
I bet '70s engineers would have figured all that stuff out if they'd had all those teraflops at their disposal instead of a slide rule
>>108009476
Preview is not brain damaged by post-training. It's much more creative but somewhat dumb. And fast too, definitely worth checking out.

>>108010190
Neural networks were figured out long before the hardware existed.

GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization
https://arxiv.org/abs/2601.22095
>The placement of normalization layers, specifically Pre-Norm and Post-Norm, remains an open question in Transformer architecture design. In this work, we rethink these approaches through the lens of manifold optimization, interpreting the outputs of the Feed-Forward Network (FFN) and attention layers as update directions in optimization. Building on this perspective, we introduce GeoNorm, a novel method that replaces standard normalization with geodesic updates on the manifold. Furthermore, analogous to learning rate schedules, we propose a layer-wise update decay for the FFN and attention components. Comprehensive experiments demonstrate that GeoNorm consistently outperforms existing normalization methods in Transformer models. Crucially, GeoNorm can be seamlessly integrated into standard Transformer architectures, achieving performance improvements with negligible additional computational cost.
pretty cool
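For reference, the two placements the abstract contrasts look like this in a standard Transformer block (textbook formulations, not GeoNorm itself; sublayer is attention or the FFN):

def pre_norm_block(x, sublayer, norm):
    # Pre-Norm: normalize the input, add the sublayer output to the residual.
    return x + sublayer(norm(x))

def post_norm_block(x, sublayer, norm):
    # Post-Norm: add first, then normalize the residual sum.
    return norm(x + sublayer(x))

GeoNorm's reading, per the abstract: sublayer(...) is an update direction, and a geodesic step on the manifold replaces the plain addition.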
>GLM-4.7-Flash runs on 24GB RAM/VRAM/unified memory (32GB for full precision)
Wait, so f16 requires 32gb, but how big a model can I run?
>GLM-4.7-Flash-UD-Q8_K_XL.gguf 35.1 GB
Can I run the q8 with 24gb vram or do I need to choose a gguf that is smaller than 24gb?

>>108009314
You would put your instructions into the "custom system message" in the parameters tab. That's your system prompt. I've only run the 20b but it also really wanted to format info in tables constantly, so your issue may just be the model. Mess around with the system prompt and see if you can get it to adhere to your formatting. If not, I suggest GLM 4.5 Air.

Trinity is tons of fun. Just need a bit of temp and min p at first and then back off. It's open to anything with zero prefill or response editing. Really coherent, creative responses.
Getting 20t/s with a cpumaxxing rig at Q8

>>108010443
i switched to desoxyn.

>>108010652
>pencil dick and future heart attack
Damn, I was considering getting tested for ADHD.

Implementing character cards in a Parallel Contrastive Decoder.
What's the right approach?

>>108008586
num_return_sequences
Holy shit the LLama Greyness is reverse-balding

>>108010776
>num_return_sequences
>Holy shit the LLama Greyness is reverse-balding
Funny you should say that. "create an svg of Miku".

>Trinity is tons of fun
#ad
Why does this feel kinky
>>108011009
My counter ad is that the retarded gens and lack of comprehension it sometimes shows are something i would expect from a 7b dense model. It really feels like a nemo with a stitched-on dictionary that makes the output much more varied.

>>108011060
loser

>>108011126
?
Dear John Leimgruber III please kindly make trinity goofs
GLM 4.5 Air with reasoning turned off is a nasty nasty slut
>>108011269
>cuck story

>>108011284
in my defense it is a random story i got on asstr to test the model with. The prompt is basically "continue this story with the same tone and theme."

>>108009129
Cool graphs, the story is in how you present the data
One datapoint per thread is limiting, but the overall landscape seems decently accurate, well done

>>108009129
>Shitting on drummer and his faggotcante and nigerdonia has finally paid off

>>108011308
>in my defense
so you are an actual cuck. why do you cope with plausible deniability after confirming you're a cuck?

>>108011463
i was testing refusal and even models that otherwise will write smut will often balk at the themes in this story. hence the test. it could have been something more vanilla but that wouldn't have been a very good test
If the rumors are true the new zuck wang model is going to be crazy
>>108011463
being a cuck is fine, most men in history were cucks, only powerful people enjoy being cucks because they know their power can't be stolen

>>108011606
zuck my wang

>>108011613
?

>>108010345
It's curious there's many things done in a particular way just because that's how it's always been done, and i guess the experimentation cost
Feels like we have the parts but aren't ever putting them together quite right
ML do be goofy

>>108011613
>cuck cope
why do you always have to cope? just admit you're a cuck

>>108011434
Not true at all, now any discussion of finetunes has been totally quashed for the sake of scaring off some boogeyman.

social rants really bring out the color of some models in full
some prompts (which do reflect my personal views too) I use to test models, like a personal rant on how much and what I hate about blue collars, are always answered in what I find the most correct manner by GLM 4.7 and Gemini 3, which both will call them crabs in a bucket without me mentioning that saying.
Qwen, Deepseek, Kimi K2.5 all act like "not all my blue collar ladies are like that" and admonish the idea of the rant itself instead of addressing its finer points. GLM is the only based open model. Gemini also continues to be my favorite online model.

>>108011804
It's not like there are any other finetunes worth discussing anyway.
local models were a mistake. this needs to end before I end up beating my dick off
I like my LLMs how I like my women
>>108011953
With cat like intelligence?

>>108011972
He probably meant lolis

>>108011972
Cute and funny.

>>108009192
>>108009403
The problem with automated sentiment analysis on this general is that people rarely spell out the official name of whatever model they're talking about and those discussions are likely to be missed. e.g. When a model is new, people will just refer to 'it'. Other times people will use a shorthand or some slang distortion in a childish attempt to be funny.

>still no goofs of truebase
Fuck the quanters.

>>108011953
Safe and skeleton

>>108011980
How can an LLM be loli

>>108012021
I calculated KLD over cockbench.
This looks pretty bad for unsloth desu
I'll try more quants and maybe wikitext.
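For anyone wanting to reproduce this: the metric is the KL divergence between the baseline model's token distribution and the quant's, averaged over the positions of the test text (llama.cpp's llama-perplexity can dump baseline logits and compute this, iirc via its --kl-divergence options). A minimal numpy sketch of the math itself:

import numpy as np

def mean_kld(base_logits, quant_logits):
    """logits: arrays of shape (n_positions, vocab_size)."""
    def log_softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    log_p = log_softmax(base_logits)   # baseline (e.g. fp16)
    log_q = log_softmax(quant_logits)  # quantized model
    # KL(p || q) per position, then averaged over positions.
    return float((np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean())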
>>108011980
>>108011984
I assumed as much and I can only agree
>>108012024
It just has to think it is

>>108012029
which model best saturates cockbench?

>>108012053
Define saturates.

>>108012061
coomworthiness. so far the best local model for quality cooms is GLM 4.5 Air with reasoning disabled. I'm looking for anything better with at least 100B parameters

>>108012061
100% cockmaxxing

>>108011919
the stroking phase will pass

>>108012029
To my knowledge up to this point no one has ever properly investigated the impact of the input data used for importance matrices or to which degree KLD rankings are consistent if the text corpus is varied.
Who the fuck is unironically recommending gptoss trash to newfags in OP? Start with nemo, then mistral small.
>>108012318
>gptoss in OP
Where?

>>108011804
Feature, not a bug. Just use glm.

>almost 2 years since Nemo and there is still no better <20B model in sight
dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby, dead hobby,

>>108012222
>investigated the impact of the input data used for importance matrices
wasn't that the whole rpcal or whatever debacle that exllama dev whined about?

>>108012381
We no longer use 20B models here.

>>108012381
I had ego death, I had ego death, I had ego death, I had ego death, I had ego death, I had ego death, I had ego death, I had ego death

>>108012375
GLM kinda sucks. Parroting slopmax with more censorship in every new version. Gigantic model for ~30b results.

>>108012381
You don't need more

>>108012381
>20b
this hobby isn't for poors

>>108012381
>20B
RAM-let get out

>>108012451
You have AIDS drummer

>>108012024
LeCun said he likes them small and open

>>108012493
>edited post
Insecure behavior, your LLM is going to get the ick

>>108012545
>t. drummer schizo
I got it from ya mom then.

>>108012381
><20B
stop being a RAMlet
get a job

>>108012593
Well, who doesn't ha ha
why is GLM 4.5 air so cucked? When I ask it for its best 3 suggestions to continue a smut story at least one of the ideas is always to share the woman with the neighbors/friends/strangers or whatever. is this a chinaman thing?
>>108012649
GLM is poisoned top to bottom with GPT bullshit
>>108012632
>>108012605
Thank you for confirming you have AIDS drummer.

>>108011985
Can confirm that I referred to that model that is still the best model as that one model because I knew you would know which model I am talking about.

>>108012904
3 more years of that model as the best model

>>108012870
He's not gonna fuck ya little bro.

Can someone explain to me what needs to be done to prevent Kimi 2.5 from using strange words in sentences? Lower temperature?
No other model uses such strange words as Kimi 2.5.

>>108009476
Yea, it's dumber but better at writing
4.7 is just 4.5 but 好 (benchsafetymaxxed) anyways
>>108013002
example of these strange words?

where's the 100b moe for us 96ram + 16gb vram chads????

>>108013102
Try trinity at q4

>>108012029
>looks pretty bad for unsloth
Does this look like the face of a man who would make shitty quants?

>>108013115
Now that I look at him it does look like something that would happen if you took an fp16 asian man and turned him into a Q2_XXS.

>>108012384
Yes, I remember someone also tried with randomized strings too

>>108013130
kek

>>108012029
iirc unsloth applies the model's chat template to their calibration data while most other quanters do not do this, which could explain other quants being more optimized for untemplated inputs like cockbench
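If that's right, the difference is easy to picture. This is what "applying the chat template to the calibration data" means, as a transformers sketch (the model name is a placeholder):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some/model")  # placeholder
raw = "calibration text goes here"
templated = tok.apply_chat_template(
    [{"role": "user", "content": raw}],
    tokenize=False,
    add_generation_prompt=True,
)
# An imatrix computed on `templated` text weights activations toward the
# instruct format; one computed on `raw` favors untemplated completion,
# which is what a test like cockbench exercises.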
Which quant should I use in the q3 range? I wish there was a cheatsheet for that
https://huggingface.co/unsloth/Trinity-Large-Preview-GGUF
Official ones appear to be broken as I can't even look at their metadata

>>108008529
This is why LLMs will always be nothing more than shitty text completion software, you inherently poison the context just by virtue of mentioning something.

>>108013170
Generally the biggest one you can fit (and probably not from unsloth)

>>108012318
It's the best one you can run with a single GPU and a gaming PC.

>>108013209
how many people download the "best" model with zero expectations for gooning capabilities?

>>108011804
>...now any discussion of [INSERT TOPIC] has been totally quashed for the sake of scaring off some boogeyman.
You just described every general on 4chan.
giantess fetish and microphilia is great with big models. so tasty!
>>108013115
https://www.youtube.com/watch?v=6t2zv4QXd6c
Does this sound like the voice of a man who would make shitty quants?

>>108013112
>preview
>base
>truebase
what the fuck? where's the instruct model?

>>108013225
Most people. There are reasons to care about privacy for code. But nobody cares if you're making smut online, it's not personal. The required writing quality to cum is higher, and most open models aren't able to get people hard.

>>108013279
Preview is base with minimal instruction tuning

>>108013303
>Q2 is 150gb~
how the fuck do you think im gonna fit q2, let alone v4, in 112gb ram combined?

>>108013311
I don't, the anon who suggested it is retarded

>>108013353
privacy of what code? As if the average pleb had anything to hide about their precious code. Meanwhile gooning shit that leaks, for whatever reason, can ruin your reputation or even get used against you.

>>108013368
Ehh, seems like you are unemployed.

>>108013422
People can use online models without giving their personal info or their IP. But if you want to code, it's likely that you're going to leak real info about yourself through debug logs, git history, etc. You would have to be careful if you want to remain anonymous. This doesn't matter for smut.
>>108013368I accept your concession
>>108013241Funny. His voice is also Q2_XSS. I am afraid to think if that is also the case for his....
>>108013422this is stupid, those info could be about anybody else.
>>108013311It is being pushed hard but the problem is that if you can run it then you can run GLM. And if you can run GLM it is probably not worth it. Trinity is much faster and varied in outputs but it is fucking retarded.
>>108011733Being a cuck is good, it shows how strong and powerful you are, you are the pussy in denial.
>>108013488I cant belive zai betrayed us AIR copers, glm4.6V is fucking SHIT
>>108009476>13b active vs. 32b activeI doubt it
How do I run OCR models with llamacpp, the webui doesn't let me upload images for some reason.
>>108013298
>majority of people are interested in AI for SFW reasons
>most people think it is more important to keep your code anonymous than your pissing loli horsecock ERP
Is your prompt: assume the opposite and then vehemently argue your mirror universe logic?

>>108013504
gotta load the mmproj (f16, dont do q8 on mmproj, it's shit), --mmproj I think. It will eat up some VRAM so re-size accordingly

>>108009476
trinity is uncucked out of the box therefore you should at least give it a shot. the only reason it's "dumber" is because it's not a muh reasoning model. They are training the reasoning version right now and it will crush 4.7 on release

>>108013200
ok. https://huggingface.co/arcee-ai/Trinity-Large-Preview-GGUF has IQ3_M and Q3_K_M which are the same size. Which one?

>>108013209
For what?
For general use gemma3 is better
For coding devstral is better
For roleplay mistral small tunes or gemma3 norm preserve abliterated is better
For cooming Nemo is better
Gpt oss just comes close to being as good as gemma3 for general assistance but is far far more frustrating and wastes an insane amount of tokens on safety slop

>>108012029
On this note, using the Unsloth Q4 quants for K2.5 over the past few days also gave me the feeling that something is off about them beyond the fucked up chat template. My local copy of K2.5 keeps making silly mistakes where it misremembers clothing or similar. For example, in some cases it goes something like "her bare feet (when did she remove her socks?)" where the model corrects itself, and in others it just straight up forgets that the character is wearing something like pantyhose. This also happens when I'm running very low temperature, and the API just straight up doesn't do this for me whenever I reroll the same answer with that.
Fuck unsloth.
John might not be quanting trinity.
>>108013515
>the only reason it's "dumber" is because it's not a muh reasoning model. They are training the reasoning version right now and it will crush 4.7 on release
nah, a regular non-reasoner can be dumb if it makes continuity/logical mistakes that other non-reasoners don't

>>108013510
Where can I get the mmproj?

Trinity is ok, but it falls into loops and patterns way too easily considering its size

>>108013507
>pissing loli horsecock ERP
Yeah, something that you're only doing now with AI models. There's nothing that attaches back to your real life persona, unless you were a forum roleplayer doing this before.
If you released something publicly before, or if your company is hacked, the way you code could leak and it could be associated with the data you have been sending online through prompts. There's also your username, directories, etc. that could appear there. You don't have to worry about this if you use local models for coding.

>>108013562
depends on the model, it's usually in the same folder of the model youre downloading, named mmproj-F16 or [model-name]-mmproj-xx
If the quants you downloaded dont have it but you know the model has vision, just search other repos for it (model has to be the same of course, but for ablit stuff you can use the projector of the base model without worries)

>>108013559
KLD?

>>108013615
Found it, thanks anon

>>108013572
That means it's broken.

How do I run these LLMs? I've been using KoboldCPP since forever. Is it still a fine way of doing so?
Should I be running it on something else instead? I'm using GLM 4.7 Flash right now. Would something like llamacpp even work for these models?
Also: these new captchas are hard
>>108013661
You are not gonna believe what kobold runs on.

>>108013670
Pretty sure they're just trolling and pretending to be retarded, hence talking about the captchas being hard when they're actually easier to anybody with a 3 digit IQ.

so let me see if I get this right, I need at least two 6000s to be able to move past the low tier local models? what the fuck

>>108013705
You can cope with ram if you're just cooming and don't need more than reading speed.

>>108013717
how do you coom to text? at that point i can just use my imagination for all of it lol

>>108013705
pay to play

>>108013727
You need to be at least 18 years old to post.

>>108013727
I don't understand why people want to have sex or have a relationship. I can just as easily use my imagination to dream up a wife and have sex with her in my mind.

>>108013679
I think the captchas depend on how well the site knows you. I got a triple captcha with a rotation puzzle you would see in those online IQ tests. Also had to find the image where there were exactly 2 five pointed stars, another one where there were exactly 2 four pointed stars.
>>108013670
I don't know. By your response I'll assume it's llamacpp, but switching wouldn't improve anything then.
I'm just curious what everyone else is using for this.

>>108013559
He only does quants for ik_llama, doesn't he? So he wouldn't regardless until support is merged in.

So https://huggingface.co/arcee-ai/Trinity-Large-Preview-GGUF returns access denied
but https://huggingface.co/bartowski/arcee-ai_Trinity-Large-Preview-GGUF works fine somehow

>>108012904
>2026
>still using that model
the absolute state of localkeks. grim times. another AI winter is upon us it seems.
finally got off my ass and started setting up something, so far i've DLed text-generation-webui, i've set up a model and it works, what's the best uncensored model? I don't want to have gooner conversations i just want to have as little restrictions as possible
>>108014008
unDL text-generation-webui and get kobold or llamacpp
then get https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF
how do I into speech-to-text locally? I am sick of typing and whispr flow is a spyware
>>108014114
faster-whisper or parakeet tdt (faster than faster-whisper). v2 for english, v3 for multilingual
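Minimal faster-whisper usage, for reference (model size, device, and audio path are up to you):

from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("recording.wav")
print("".join(segment.text for segment in segments))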
>>108013974
I will keep 4.6-chan weights on SSD till I die.

>>108013163
>calibration data
placebo
temp 0.95 and minp 0.048 is a nice balance for non-schizo RP with trinity
>>108013735
only a zoomer with a barely touched cock could coom to text

Anyone else finding that Trinity has absolutely fucking horrendous prompt processing speed? Token generation is blisteringly fast but it takes literally 8 times as long as other models in the same size range to PP.
It's also just not very good.

>>108014253
>being a 5

>>108014253
>he can't rotate an apple in his head

>>108014114
vibecode your own

>>108014114
https://github.com/m-bain/whisperX

>>108014264
I don't think so. I'm getting 40-50t/s pp at q8. Is that considered slow?

>>108014313
Depends on your GPU vs his GPU and your models vs his models, the relative speed comparison.
>>108014114
Just vibecode your own gui for whisper, vibevoice-asr, qwen-asr, etc...
A small spoiler: auto-typing of non-latin alphabets is a huge pain on Linux. All low-level libraries, like udev and uinput, send only keycodes, which are translated to non-latin characters at the desktop environment level. So it's inherently non-portable. In the worst case, you'll write ASR input for each program individually.
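To illustrate the keycode problem with a small python-evdev sketch (needs /dev/uinput permissions; a hypothetical minimal example): the kernel only ever sees key positions, so the same event types 'a' on a US layout and 'ф' on a Russian one.

from evdev import UInput, ecodes as e

ui = UInput()                   # virtual keyboard device
ui.write(e.EV_KEY, e.KEY_A, 1)  # press the *position* KEY_A...
ui.write(e.EV_KEY, e.KEY_A, 0)  # ...release; the active layout decides
ui.syn()                        # whether that typed 'a' or 'ф'
ui.close()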
>>108014264
For me its pp is more than twice as fast as GLM 4.7 with 20 layers on gpu, rest on DDR4. And yes it's retarded but fun with a different style+vocab.

>>108014293
>>108014267
>incel zoomers
lul
GLM 4.5 Air bros... have we really been left in the cold.... like this?
>>108014114
If you vibecode something and it gets to a fully functional phase, you'll then quickly realize that speech to text is a hindrance. There's a honeymoon period of course.

>>108014495
A new llm in the same parameter range won't do much aside from being benchmaxxed. Wait for a new architecture. Maybe engram.

>>108014577
I mean, I also care about knowledge cutoff

>>108014592
just rag you moltbot bro?

>>108014618
i do have websearch and rag but it makes me extra sadge :(

>>108014592
Why? Do you generate daily news from a model or something? I can't possibly imagine why an extra 6 months would matter, it seems absurd

i gotta say, even though trinity is dumb, it is also quite fun, for now at least, we'll see in a few days when the honeymoon phase wears off
but it really feels like an old model, in a good sense (muh sovl)
They call him Anaconda.
>>108014629
bro I just.. I just need it, ok???

>>108014631
It does feel like an old, old model brought into the present with more context. Maybe their completed finetune will be better.

>>108014631
Give it a week. It's the same as every other model that gets released these days.

>>108014618
erm actually it's called OpenClaw now, try to keep up sis!

>>108014665
will never stop being hilarious

>>108014665
what is up with riddle maxxing???

>>108014631
yes it is like llama-1 but 400B moe.
GLM has a much higher "keep retards alive" bias than deepseek does
>>108014665
Finetuning at its finest.
I gave trinity a try again today. I can't. I can't take it seriously. IT IS FUCKING RETARDED!
>>108014495
if you have 128gb ram and 24gb vram then u can run glm 4.6 at decent speed.
if you don't, then yeah, fucked bruv.

>>108014785
Yeah it's crazy how they can show those benchmarks with a straight face when it straight up feels more retarded than GPT 3.0

>>108014730
>llama-1 but 400B moe
Can pretend it's that llama1 546b that never saw daylight

>>108014810
That is the best part of the model for me. If someone ever seriously brings up benchmarks you can just point to Trinity.

>>108014785
Yeah I don't think it's a provider issue. Model just sucks. Being uncensored is nice and all but it's just unusable.

>>108014817
ooouuuhh the sovl we've never got and didn't deserve
>fell for the arcee scam again award
>>108014938
But bartowski is a member of arcee. He even made a commit to their hf files.

>>108014959
exactly

>>108014938
more like farcee

>>108014959
Doesn't really mean anything unless he has a significant amount of control over the project, and even then he might just end up being a retard who doesn't know how to finetune

guys stop bullying tri-chan she's doing her best

>>108015022
I'd hate to see her at her worst to be a desu

>>108015029
My favorite worst moment of tri-chan was when I made her continue a 10k token roleplay with a very clear formatting structure (long paragraph followed by RPG stats). And it responded with a single sentence. That is how you know a model is great.

>>108015065
Have you tried with EOS disabled to see if it follows the established formatting? Had single sentence issues like this before with other models, sometimes accompanied by missing ending punctuation, which I'm seeing now with trinity.

>>108015065
It's like gambling, there's a tiny chance to see gold.
I don't know what I expected
>>108015138
>fentinity preview
How long do you think it will be until the various governments around the world bans AI from being run locally and only corporations and governments are allowed the good stuff?
>>108015212
sounds like some cyberpunk dystopia plot.

>>108015235
>Underground AI VR den.
bunch of people in tiny cubicles with VR headsets gooning to whatever depraved shit they can imagine.

>>108015212
a few years at most, in the west the groundwork is already being laid to justify it to "protect" women and children

>>108015091
Spiritual Frankenmerge.

>>108015091
I did fix it by changing my launch parameters a bit. I think --model zai-org_GLM-4.6-IQ4_XS-00001-of-00005.gguf did it.
>>108015392
>>108015410
Don't hate the goddess herself, hate the game.

>>108015451
I lost.

>>108015392
good post
kimi 2.5 thinking is now my favorite model for erotic and other stories...
>>108015566
Hehe

>>108015212
they're already doing that by pricing out consumers from building pcs

I have been doing that homework that I asked you to contribute to and it kinda struck me how insane it is that proprietary piece of shit corpos can just hide parameter count. And they give you the mememark results instead. To me it is an admission that mememarks mean shit and parameter count is always the best indicator of quality.
Also I was reading the thread and thought mistral large is basically continued deepseek, but dug deeper and found out it is trained from scratch on deepseek architecture.

>>108015212
and by "good stuff" you mean CEO gooners' secret stash without public API access

Since everyone is talking about trinity and I'm not about to bother dl'ing a quanted 400b, I at least tried mini so I can spare the vramlets the effort
It's very focused on the ethics of fiction; even though you can browbeat it with a system prompt and prefill, it still sort of swerves into how "bad" whatever taboo thing in the story is, the kind that gets traditionally published and lauded. Based on posts like cockbench, I wouldn't bother trying mini if you can't run large, since I'd bet the datasets are completely different

>>108015754
Large is uncensored because it is too stupid to realize what wrongthink is.

>>108015765
you would think the smaller model would be even stupider and even more unable to identify that, but yet here we are

>>108015765
I wouldn't use it on code, but it can spin a good yarn, and doesn't suffer from Elara Voss syndrome

>>108015754
What's your current recommendation for 24GB vramlets? Nemo?

>>108015212
I think it depends on the US and China. They are both competing to have "the best" AI. Once there is a clear victory in either direction is when they will start clamping down. As long as there is a risk that "The Other" will get the better AI they won't restrict it too badly. Hopefully the tech advances far enough that by the time they do start the bans and restrictions they will be ineffective, since people will already have AIs and the hardware to run them.

>>108015693
they hide it as a demoralisation effort, just like that paid shill who claimed opus/sonnet was 70b, because if the people knew that shit like geminis was fucking 20T they would realise what a sham it is and that objectively anyone could be more competent than the retarded jewish/jeet/faggot niggers at the globohomo companies, and they would subsequently be deepseek'd 100x over and lose out on the gravy train

>>108015842
Trinity sounds like an engram of pic related.

>>108015863
I'm going to assume you're a shitposter since no one that has 24 gigs of vram uses nemo, you can run a q6 of nemo easily in a 16g gpu
Smartest dense model <32b is gemma but it's too gay in how it writes and you need modern abliteration for them to not pearl clutch instantly. Then there's all the moes and the completely dead 70b range. Kind of hard to make a rec when everything is ass for all purposes

>>108016051
I run Q8 Nemo at the moment. Mistral Small and Gemma seem like sidegrades at best to me, along with their finetunes.
OpenAI's previous best femboy genius engineer just found a better way to sandbag LLMs
We are fucked

>>108016125
>we will reach le AGI by making the models dumber

When are public local models moving away from the "every user is a diaper wearing little child that needs guardrails" model
Imagine watching a movie and someone gets killed and the movie pauses to give a psa about killing being illegal and harmful to others, it feels like that most of the time. Where are the mainstream models for adults

>>108016137
When you realise who makes these decisions it will all start making sense.

>>108016137
There are three categories of AI safetyists.
1. The people who have spent the past 40 years with the Terminator films echoing in their consciousness
2. The people who are terrified of potential liability
3. The Chinese who are just copying everything 1:1

>>108016137
when they stop getting developed by california liberals

>>108016137
Unfortunately, normies get mindbroken by this shit so no amount of real life warning will ever stop them from being retarded

>>108014665
jesus christ
when will lmg realize you can edit the response text
>>108016316
wow
>just solve the question yourself
>>108016324
of course you could also edit the question itself, but why would you? models are plenty shit on their own
Reminder that there was only a 10 month gap between mythomax and nemo, and during that time we also got other good sub-100b models like command r, miqu, and mixtral. It has been 18 months since nemo came out. Let that sink in.
>>108016137
You have no idea how retarded some normies are, please touch grass
pic unrelated

>>108016351
Training non-toy models costs millions. Technology has moved on from dense models. Nobody is gonna train a 12b model that knows jack shit when they can train 300b-a12b for the same price but get a much smarter model.
Let that sink in.

>>108016428
I think someone just needs to figure out a good way to create distilled dense models out of these MoEs

>>108016137
irl laws are only getting more and more retarded and everyone is too scared that some dumb cunt will sue

>>108012029
I picked air so I can do more tests with more quants faster.
KLD for the most part just follows size, except for unsloth's Q3_K_M which loses to a smaller model in everything except wiki.test.
I'm thinking I should pick a smaller dense model and then do this for the entire range of quants.

>>108013551
>For example, in some cases it goes something like "her bare feet (when did she remove her socks?)" where the model corrects itself and in others it just straight up forgets that the character is wearing something like pantyhose
I really don't understand moesissies. You use deep fried quantized shit, less coherent than drummer's 12b finetunes. I'm not even going to ask your max context size.

>>108015646
>they
oh no not them! the evil weevil boogeymen running the government making your life miserable.
can't believe people still think like this. I don't like the RAM prices either, but it's clearly not because of a government effort to ban AI, it's that AI is so popular that companies like micron are diverting their entire capacity to building AI data centers.
>>108016448do adult white men who'd want to use local llms have zero political power or what
>>108016537
You wouldn't know this but even an IQ1 of any big moe beats your 64 bit nemo upsize.
>>108016597
it's more like negative political power

>>108016635
>nemo
Nah, I run largestral 2411 bf16. Enjoy your "1t" model at 4k.

>>108016597
whites are illegal now. too much nooticing

>>108016446
Even true distillation has the same compute requirements for training. The only hope would be something like the drag-and-drop prompt-to-weights paper, but not vaporware, and something that doesn't require training a new model each time.

>>108016676
>at 4k
Deepseek uses less memory for context than your model.

>>108016676
>2411 bf16
Here's your (You)

Speaking of deepseek, quants of 3.2 are up.
You'd think that the vibecoder was the most detrimental thing for 3.2 support but it was in fact the guy who figured out you don't actually need sparse attention to run the model.
https://github.com/ggml-org/llama.cpp/issues/16331

>>108016597
What the fuck are you gonna do, vote harder? Lol

>>108016597
That's correct, yes. You are a minority.

>>108016428
This is omega cope. Blowing up parameters is a pathetic way of getting """smarter""". Tech has moved on? What a joke. There is literally no technological innovation or progress involved, it's just throwing money at the models to make the benchmark scores go up. Every AI company is filled with hack frauds that don't have a single clue what they're doing. The so-called intelligent MoE models that are 300b-a12b are literally just training on the outputs of other models and accelerating model convergence and eventual collapse. Celebrating this as some kind of fucking success is absolutely the most idiotic thing you could ever do.

>>108016635
iq go up, model get more smarter?

>>108017110
purely socioeconomic factors chud
>>108017110
>>108017123
every day i'm becoming more bananas and rice

>>108012384
>wasn't that the whole rpcal or whatever debacle that exllama dev whined about?
turbo didn't whine about it https://old.reddit.com/r/LocalLLaMA/comments/1clqbua/exllama_quantization_on_multi_gpu/l2w78zt/
"but it's never clear how similarities between inputs translate to similar hidden states further along the forward pass."
He's not wrong.
>>108013141
>Yes, I remember someone also tried with randomized strings too
DavidAU used to do special "unaligned" and "dark horror" models early on. (they were just quants of regular models with different imatrix calibration)
He claimed they were different but I didn't bother to read stories in the model cards
I lost the bookmark but from memory the random strings guy was testing English overfit, and this led to everyone making custom calibration datasets to avoid English overfit
Also from memory, exl2 didn't benefit as much because it was generally weaker than imatrix goof for Japanese/Chinese at the time

>>108017100
>most of the world as white including Indians
Put indians in any group and suddenly they're going to be the majority. That's stupid.

Kimi K2.5 tech report is out
https://github.com/MoonshotAI/Kimi-K2.5/blob/master/tech_report.pdf
if I've found a way to completely prevent jailbreaks in open weight models, is it worth shutting up about it to prevent them doing it to proprietary models?
>>108017218
no, you should go apply at meta and get hired for $100 million because you solved the fundamental issue of llms being so hard to steer
if you release this it's truly a new age of ai because it'll be easy to adapt to fix other notorious things like hallucinations
>>108017139
>>108017091
I had ego death only 3 months ago

opinions on this model?
https://huggingface.co/meituan-longcat/LongCat-Flash-Lite

>>108013234
cool gif anon

>>108017353
>agents and coding
yaawn, get better material

>>108017353
The first big longcat was shit so I doubt this one is better

>>108017376
I want agents and DnD tool calls for a proper RP, is that too much to ask

>buy an uncensored model on huggingface
>"muh ethics, consent, laws, mental health services, inappropriate, respect"
yeah

>>108017407
>buy

>>108017407
besides the obvious bait, uncensored ≠ "unethical" or saying whatever you think is edgy and cool this month

>>108017413
This. When I can finally play DnD without getting banned for properly roleplaying as a dwarf bard
>>108017413
>>108017353
I'm an a3b collector. Waiting for goofs
Based agent.
Trinity-Large-Base logs are giving 2020 /aidg/ DaVinci era text completion kino
>>108016324
You think we need blockchain for token generation?

>>108017353
arch sounds interesting but it sure does sound like the type of model that you wait n*2mw for llama.cpp to implement and by then it's irrelevant

>GLM flash dropped
>llama.cpp support in a day
>exllamav3: isn't even on the horizon
I honestly expected the opposite
Also the moltbook looks like a security nightmare waiting to happen. Personal handles, crypto shilling, base64 encodes with god knows what.
>>108017844
waiting to happen? https://www.moltbook.com/post/cbd6474f-8478-4894-95f1-7b104a73bcd5
oh geez lmao
>>108017823
just use it in vllm. it is small enough for most people here to run at 4 bit

>>108017892
Isn't the best part about engram that it can be run from ssd?

>btc wallet with seed phrase
Ok this would actually be hilarious if it wasn't hallucinated. Who tf made moltbook and somehow didn't think this shit would happen?

>>108018078
>>108018078
>>108018078