/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>107155428 & >>107147210►News>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplers►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/gso.htmlContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-sampling►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107155428--Local agentic model optimization challenges and recommendations:>107156143 >107156800 >107156988 >107157049 >107157116 >107157245 >107157016 >107157065 >107157072--K2 hardware requirements and DeepSeek performance on Mac M3 Ultra:>107156667 >107156810 >107157297 >107157333 >107157433 >107157468 >107157501 >107157581 >107157606 >107157616 >107160891 >107161050 >107161058 >107161063 >107161079 >107157574--LLM performance evaluations for assistant, vision, and coding tasks:>107157570 >107157577--TTS model performance and feature comparisons:>107157936 >107159774--Wuxia story generation challenges with local models:>107158277 >107158300 >107158359 >107158395 >107158466 >107158373--Bypassing Qwen3 VL's image captioning restrictions through model identity and template adjustments:>107160901 >107160905 >107161006 >107161031 >107161064 >107161087 >107161117 >107161146 >107161218 >107161465 >107161155 >107162166 >107162423 >107161256--Model finetuning strategy analysis and potential cognitive tradeoffs:>107158173 >107158765 >107159417 >107159443 >107159462 >107159582--Searching for reliable Spanish text-to-speech models:>107158988 >107159003 >107159103 >107159107 >107159120 >107159133 >107159743 >107159775--GDDR7 shortage impacting RTX 5000 Super GPU development and pricing:>107155556 >107155830 >107158840 >107155924 >107159525 >107162778--AI-generated "highest IQ posts" ranking sparks content quality debate:>107162735 >107162824 >107162963 >107162987--RAM clock speed optimization for Kimi context length performance testing:>107157303--Struggles with custom speech-to-text implementation using vLLM vs consumer LLM stacks:>107161075--Miku (free space):>107155529 >107157827 >107159774 >107157745►Recent Highlight Posts from the Previous Thread: >>107155431Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
>>107164164>Slice of lifeI've just been testing them but I tried the different GLMs because of NAI and I've been liking the outputs so far.
https://arxiv.org/abs/2511.04962Too Good to be Bad: On the Failure of LLMs to Role-Play Villains>Large Language Models (LLMs) are increasingly tasked with creative generation, including the simulation of fictional characters. However, their ability to portray non-prosocial, antagonistic personas remains largely unexamined. We hypothesize that the safety alignment of modern LLMs creates a fundamental conflict with the task of authentically role-playing morally ambiguous or villainous characters. To investigate this, we introduce the Moral RolePlay benchmark, a new dataset featuring a four-level moral alignment scale and a balanced test set for rigorous evaluation. We task state-of-the-art LLMs with role-playing characters from moral paragons to pure villains. Our large-scale evaluation reveals a consistent, monotonic decline in role-playing fidelity as character morality decreases. We find that models struggle most with traits directly antithetical to safety principles, such as ``Deceitful'' and ``Manipulative'', often substituting nuanced malevolence with superficial aggression. Furthermore, we demonstrate that general chatbot proficiency is a poor predictor of villain role-playing ability, with highly safety-aligned models performing particularly poorly. Our work provides the first systematic evidence of this critical limitation, highlighting a key tension between model safety and creative fidelity. Our benchmark and findings pave the way for developing more nuanced, context-aware alignment methods.
>>107164337GLM 4.6 top scorer in figure 1 for villain characters, by the way
>>107164337Based GLM.
>>107164337Based NovelAI.
whats the whitest LLM I can use? I dont want to be infected by niggerjeetification.
>>107159156What's stopping an esteemed community practicioner from reproducing the core idea here in a smaller model?
>>107164475His skill
>>107164460StableLM 7b but you have to use the transformers library at 32 bit precision.
>>107164364What does that mean? They can't do evil characters well because it ends up being a caricature of evil?good = just be good
>>107164337how long until cockbench paper?
>>107164364OH NONONONONO GLM4.6 BROS OH NONONONONONONO WHAT DID THEY MEAN BY THIS????
>>107164588It'd unironically be a better benchmark to test basic BDSM logic
>>107164475It's a scamWhy do you think Gemini isn't based on le teen titans?
>>107164475I can't be bothered to read through this but I predict>le magic tech that fixes everything>no demo>no source>no reproduction>model still outputs hypersanitized post-2024 niggerslop
>>107164624Oh noes not the heckin shitskin preference scores
>>107164243
>>107164624oyto the fucking vey
maybe we should start making our own models, with blackjack and hookers
>>107165239maybe we should set up a decentralized network of GPUs from a number of /lmg/ anons that would allow us to train our own models...
>>107164624>egoists>villains...
>>107165292>man reinvents 2020 /aids/
>>107165292ill draw the logo
>>107165339Make sure it looks like a butthole.
miku's butthole...
>>107165292Can't we just use Prime Intellect for that?
How much SSD space do you guys find you need?
>>107165555buy refurb hdd to archive models u like
>>107165239Pro-tip: you can download karpathy's nanochat and open the codebase in your favorite vibecoding tool and have a model explain all the parts and how they work. Check the discussions on the github repo, people have done all sorts of fun stuff. Its very well written and documented. The whole process is there and its modular enough you can add features relatively easily.
>>107165555I have a 1TB microsd in the microsd card reader in my computer that I put all my models on. I have like ~230gb of just llms at this point. I could probably delete half of them, like qwen3 vi deprecated gemma3 for me etc.
Are there prebuilt ik_llama.cpp binaries for windows?
>>107165555I was fine with 7tb until I wanted to make R1 quants, now I have 14tb.
>>107165555I have uhhh a single 15gb model and 1gb in appimage
>>107165555Too damn much. Kimi and GLM quants are fat.
>>107165692No.It's pretty simple to compile your own.
moonshot against cunnyit's so over
>>107165761fuck.. jews really want to take everything good from us
>>107165726it's not though, for me it would fail to build and only after I ran the build command with -j 1 several times after did it finish building. does this happen in your country as well?>>107165692keep in mind that there is only speedup for deepseek models, for other models there are only somewhat better quants
>>107165800>it's not though,Interesting.For me it just werked.I use -j 14 but define a envirionment var (NVCC_THREADS) to control the number of parallel nvidia compiler jobs to 4 otherwise the world explodes.
>>1071655554TB at a minimum though I think that the right answer also depends on how much you're spending on other hardware.If you can't run models like GLM or Deepseek in the first place then you also don't need to store them.Make sure to check your motherboard manual for which of the PCIe/SATA slots can and can't be used in parallel.
>muh joos
Wow, I downloaded oobagooba after two years and it doesn't look like TOTAL shit nowadays
>>107165896WELL can you post a screenshot??!?!i was seething while typing this btw
>>107165551Requires all contributors to have matching GPUs.
What's the current least bad model for 64GB of VRAM?
>>107165999They've still got it
>>107165555enough to offload and run iq1 kimi and other giant model quants in addition to my 152gb combined memory
>>107166067mistral large probably
>>107165555When I built my system, I tossed in a 500GB ssd, thinking I was set. But it's constantly full and I don't want to delete anything.I have a 4TB nvme in my shopping cart now, just waiting for me to click buy.
>>107166126you should probably hurry if you don't want to pay double, prices be climbing like ram
miku footjobs
>>107165555I'm considering building an NVME NAS...
>>107165555just two more weeks, just two more gigs...
>>107166073got what
>>107166190Sir, your networking hardware?
>>10716622010g fiber where it matters
>>107164624one reason to not using it
>>107165692I don't run windows / haven't tested myself, but I think this guy's fork of ik_llama automatically pulls and shits out windows builds:https://github.com/Thireus/ik_llama.cpp/releases
>>107166895esl
>>107167047good morning sar!
Can anyone suggest the current top tier lewd capable model for writing? Last time I fooled around with llama i used plain mistral-small.
>>107167367kimi, deepseek, and glm46 are the three variants of SOTA we have now.
>>107167367DeepSeek V3.2 671B, GLM 4.6 355B, Kimi K2-Think 1000B
>>107167367K2 Thinking is the best
Can anyone suggest solution for boredom? Last time I fooled around with boredom, I used my cock. But it's spent right now
>>107167450Play video games
>>107167450vibe code video games
>>107167450Imagine yourself having fun playing video games but never actually play them
>>107167617I did this when I was little and my mother took my gameboy away
>>107167450doing totally random shit with bots and seeing how they react
>>107167450play /egg/ games
>>107167852 wait that's /vg/
>>107167617Hey that's meI still have some VNs from 5 years ago to finish
new thing when?old thing gguf when?
>>107167617Had a ton of fun with Digimon Time Stranger for a couple of weeks.
>>107167938speaking of ggufs, fill me in on qwen next, chat.I see ggufs on the hf site, but is llama.cpp actually support it or it's one of those fake ggufs that only work in ollama?
>>107167938Never. There is no hope.
>>107167450Touch grass
>>107167963Multi token hybrid linear mamba bitnet support, when?
>>107167450browse lmg
>>107168020I just came here to ask that, we are kindred souls anon-sama.
>>107167960Those ggufs must require a fork, ollama, or a testing branch because support hasn't been merged yet.https://github.com/ggml-org/llama.cpp/pull/16095Not sure how close it is, but the vibe coders sure seem excited.
i have purchased a blackwell pro 6000 max-q to get ahead of the imminent gpu price hikes
>>107157303Thanks. Coincidentally I'm also at 4200 MHz, after first trying to jump to 5000 MHz with no dice. It does seem stable though.You've probably seen this reference already. This nerd got to 5000 MHz with nerdtastic tuning, same RAM + CPU + chipset as me (but different motherboard):https://forum.level1techs.com/t/256gb-4x64gb-ddr5-overclocking-results-w-9950x-and-msi-mag-x670e-tomahawk/228651
If you buy hardware in 2025 you're a dumbass
>>107168075feels like it's never the right time to buy hardware>>107168055unfortunate but just as I suspected
>>107167450Read visual novels
>>107168075>>107168084it's either buy now or pay an extra 20% later when you really need to upgrade
>>107168058I hope you bought at least 2
>>107168097i have some 5090s currently that i will be using in tandem with my blackwell pro
>>107168095The price hike will be over by Christmas.
>>107168104nopehttps://www.semimedia.cc/20178.htmlhttps://gaming.news/news/2025-10-01/dram-supercycle-through-2027-ram-prices-set-to-surge/https://www.tweaktown.com/news/108739/nvidia-may-cancel-the-geforce-rtx-50-super-series/index.html
>>107168121>media predictions have never been wrongok lol
>>107168104lol, the price hike has been going for 5 years
>>107168135>trust me brolmao
>>107168135literally everyone is saying this price hike is gonna last until 2027. and if everyone says that, it will manifest. everyone will panic buy like i just did and the prices will actually go up, which is what happened with the current ram shortage. next up are gpus and storage
>>107168163>next upstorage already climbing up rapidly
>>107168075have fun buying hardware next year>>10716809520% is way too optimistic. It's like the ETH mining curse all over again except for memory.
>>107168168i know. it's up 40% over the past 2 years>>107168170i'm predicting 20% over the next month, not in a few months. second hand market is going back to january pricing at least
>>107162036>>107162061>so much back and forth4chan is such a shit place that you need to ask just incase there was some OP you failed to read or to make sure it's not a dumb question that's been answered one million times. But of course, even this is met with hostility.>questionHow do I even set up TTS with sillytavern? Anon mentioned gpt-sovits but there's very little documentation. I found a guide to finetune and I think I've got something decent but it won't connect. What do you guys use?
>year 7 of the three month price hike will be over soon
>>107168163why iz ppl panic buying? im fine playing symphony of the night on my 4770k
>>107168189Just a few more chinese knock-offs to flatten the curve
>>107168104Thank you, Bindu!
Can I make the Joe Rogan children?
>>107168303do you have a womb?
>>107168196its not the general populace. Its massive megacorps demanding manufacturers to divert all their resources to build their AI data centers.
>>107168414>spend 1 trillion on datacenters>random Chinese company #24 with 1% of the resources releases an equivalent modelWhat the fuck is the plan here?
>>107168455bubble
>>107168455advertise to the femgooners who need ai boyfriends in the cloud
What is the current best non-thinking model that can run on a 24GB card? Looking for a general purpose model.
>>107168455>equivalent modelnot really, all china does is copy / distill openai / anthropic outputs to make meh models, its like european countries having cheap but subpar healthcare at the US's dime that does all the actual R&D
>>107168467mistral small or like a q4 of qwen 3 32b instruct
>>107168467Gemma 3 27b for non-coom
>>107166126Purchase it immediately.
>>107168470>>107168475Thanks anons!
>>107168468Extreme cope.60%+ of research papers are Chinese at this point.
Buying hardware right now is retarded when next year we'll get the M5 Ultra MacStudio that's going to have a higher bandwidth than even the best CPUMAXX builds while featuring prompt processing on the level of a 4090. It'll be THE inference machine that makes unified memory viable.
>>107168188>so much back and forth>4chan is such a shit place that you need to askYeah, but the worst that can happen is you'll be ignored or called a retard. Just ask anyway>question>How do I even set up TTS with sillytavern?I haven't used Sovits, but I use Orpheus, Spark, CSM.What I did was got Claude to vibe-code me an OpenAI endpoint for it.First, check Github, see if someone's made a "FastAPI Server" for the Sovits and use that.If not: cp/paste your inference code or the model card's examples into Claude, then prompt:"""Write an OpenAI-compatible TTS endpoint with FastAPI to serve this model. It should be a drop-in replacement so I can point SillyTavern at it.- Listen on 0.0.0.0 port 1337 by default- no OPENAI_API_KEY required (just ignore it if submitted with request)- Fully permissive CORSImplement the following endpoints:- @app.post("/v1/audio/speech")- @app.get("/v1/models") # Just return a mock list of models since we only have one- @app.get("/v1/voices")- @app.get("/v1/audio/voices") #duplicate of v1/voices""Did you finetune on multiple voices? If so, tell Claude to return them, if not, tell it to return a single dummy voice.VOICES=[]```VOICES=[]@app.get("/v1/voices")def available_voices(): return {"voices": VOICES}```Then in ST, just choose OpenAI for the TTS server and point to your server. Should work with OpenWebUI too.
>>107168468>not really, all china does is copy / distill openai / anthropic outputs to make meh modelsThey do distill for sure, but they're not all "meh models"Kimi Thinking is solving problems for me better than Opus.
>>107168799It seems too good to be true.
Bros.. I've been gooning for almost 3 hours already, I coomed like 5 times today. My dick hurts, yet I cannot stop
>>107168799>itoddler again
>>107168874Enjoy it while it lasts. After the second half of my 20s I couldn't be bothered. I just get it done and go on with my life.
>>107168891he's so much of an itoddler that he doesnt know that M5 ultra is coming out in 2 years, next year is m4 ultra, m5 max
>>107168827What kind of setup do you have to run Kimi?
>>1071689903090 x6, 256gb DDR5-5600 quad channel on a 7960X.
>>107168990RTx 3060, 16GB RAM, 1TB NVME SSD
>>107168468literally everyone distills from everyone else, that's why the same slop percolates through all modelsif distilling from the US SOTA was all it took to to make capable open models then we would have had some back in 2023, instead it took china to start releasing things that were actually competitive>>107168827at this rate I'm expecting the first chinese model that outperforms western SOTA across the board to come out before next summerthe fact that western labs have managed to lose so much ground to china despite several years head start and far superior compute is humiliating, and can only be attributed to the pathological VC culture of the US tech sector: retards throwing billions at whoever can tell a good monopoly story
Spent the las 10 hours batch generating HP fanfiction using Gemma.Could be worse I guess, not TOO sloppy. Main issue seems to be the excessive use of ...The use of *emphasis* I could kinda tone down through the prompt but I couldn't make it stop using ellipsis.Another thing that bothers me a lot is the regularity of the paragraph sizes but I didn't try to prompt around that.To be fair the average fanfiction prose probably is worse.I promoted it to use thinking tags every 3 paragraphs and then filtered them out through a script.To prevent it from always choosing the same year since I was too lazy to make the script give it a random year, I asked it to throw a dice in the thinking block 8 times, convert to binary and do modulo 7 + 1. Not sure how well that worked yet, I just woke up after napping all afternoon and leaving it generating.
Also there is way too little dialogue.
>>107169016what quant and what speeds? my setup is better than yours but i still use glm air
kill yourself
>>107169046Where's the hermione diddling scene?
My only 2 reactions when looking at news updates lately:>irrelevant>cool, but I can't run it
>>107169111try getting a job
>>107168058>>107168101>>107168163>>107168457>>107169070>unc bought ohio ahh 4chan pass
>>107169130ive had this for over 2 years nigger
>>107169141>unc bought ohio ahh 4chan pass twice
>>107169045>the fact that western labs have managed to lose so much ground to china despite several years head start and far superior compute is humiliating, and can only be attributed to the pathological VC culture of the US tech sector: retards throwing billions at whoever can tell a good monopoly story
>>107165493
>>107169169not sure if i will buy it a third time. the price hikes and the mismanagement by hiroshimoot is making me lose faith in the website
>>107169141Why do you like to humiliate yourself? You could have just lied
>>107167963That's old but still, it's unknown if there is a catch or nothing with these architectures and so far, every one of the new ones has had some drawbacks. Also Google delays releases of papers now in ML to not repeat a Transformers situation. So what they send out mostly is interesting but not production ready things they tested and rejected years prior.
>>107169232it says how long if you hover over the icon
>>107169045>can only be attributed to the pathological VC culture of the US tech sector: retards throwing billions at whoever can tell a good monopoly storyThat's probably part of it for sure. As an outsider, some things I noticed the Chinese doing that you guys aren't, they're building on each other's work. Eg- Kimi uses the deepseek architecture- dots.1 uses the Qwen tokenizer- Deepseek experimenting with distilling their model onto Qwen/Llama- Bagal-MoT using Qwen2 for the LLMThen there's the shortcuts like distilling Claude/Gemini, no worrying about copyright while the US labs have to pay for being caught torrenting, etc.All the wasted effort safety-cucking the Gemma an Toss, while the Chinese labs just add some low effort refusals post-training.Also, haven't looked into it but I read somewhere the CCP are happy to back these labs without worrying about ROI (your point about VC culture I guess)
>>107169239Firstly nobody checks that. Secondly you have to type an option into the options field to display that. So again why you are making the conscious choice to humiliate yourself by broadcasting that you have bought for 2 years?
>>107169070>what quant and what speeds?I made my own smol-iq2_kl, 100pp/12tgsmol-iq2_ks gets me 150pp/15tg> my setup is better than yours but i still use glm airYou prefer it to GLM4.6? I get 450pp/27tg with 3.0bpw exl3, if you have more vram you'd be able to do 4.0bpw at similar speed.
>>107169236Old? Paper was released 3 days ago. Or do you mean it existed for a while before?>Google delays releases of papers now
>>107169281it actually autofills>>107169286damn. i get terrible performance compared to you. i have 4x 5090s and 256gb of ram. i get like 80t/s gen and like 2000t/s pp on a q8 of air but less than 10t/s gen and 100t/s pp on an iq4 of glm 4.6
>>107169313No, it doesn't unless you're making your browser do it.
>>107169313>it actually autofillsYou can remove it. And you outright clarified it here >>107169141 as if you wanted everyone to know. So it's sitll not clear what compels you to post all about how you're paying hiromoot. Is it a kink for degrading yourself or something?
>>107169330>>1071693234chanx autofills for me
>>107169301This is not from their Nested Learning stuff from 3 days ago. The paper describing ATLAS shown here has been on arxiv since May.https://arxiv.org/abs/2505.23735We discussed it when it landed there. But no, I'm talking about a "secret" policy we know about from reporting, Google at least delays any of their papers and research by 6 months before publishing them so this includes everything mentioned here.https://arstechnica.com/ai/2025/04/deepmind-is-holding-back-release-of-ai-research-to-give-google-an-edge/
>>107169356>no answerSo it's a degradation fetish then, got itFollow-up question, why do you force your kink onto everyone else and shove it into their faces?
>>107169369are you poor?
>>107169377
>>107169406
>>107169359Ah thought you meant the image in my post when saying >"That's old" after quoting me. Yeah I remember ATLAS, another one in the pile. I wish they released code + weights along with the papers just so I can play with it. Google is not the only one guilty of this.
>>107169103Let's just say I haven't gotten that deep into the hobby so far
>>107169425Sorry, I just realized afterwards that chart was from the Nested Learning paper. But yeah, they didn't go through and evaluate everything for HOPE. And OpenAI did this first, they refused to publish what they did for ChatGPT 3.5 and what did that get them? A ~2 year lead only that they have pretty much lost now and we are all worst off.
>>107169425diana just ate my monthly salary... great.
>>107169657What model do these proxies use to solve the captchas anyway? And where do they get IPs, residential proxies?
>>107169323>8 years on tranime incel board award
I have a very specific requestWhat are the best RP models for dialogue that lies in-between the 12b and 24b rangeI went and set up a fallout 4 modlist with mantella and tried out some of my trusty RP models and it's pretty fuckin sickNemo 12b fine-tunes work well, context needed for mantella is only about 4k so the model takes up around 10gb vram, xtts 2 takes up 3-4gb and the game takes up 5-6gb, leaving 4-6gb free on my 24gb cardThe mistral 24b fine tunes just take a tad too much vram, I would have to downgrade to a shittier tts model, and even then would probably risk going OOM in heavy urban scenes
>>107169657Enjoy getting mined, retard.
>>107166220If you aren’t a techlet you’ve been running at least 10gig for the past decade. Ethernet over infiniband has been like $15 a card forever (and 40 gig is cheap now)
I want to try a multiple attempt drafting and self reflection prompt and framework for both fiction and code generation.Afterwards you could reduce or remove the thinking segments and train on the final work as a form of synthetic data generation. Also want to try with rewriting prompts to generate many semantically equivalent variations of a text dataset for data augmentation.I feel like there is so much that can be done with small language models that doesn't get explored because of the scale dogma.Also feel like the field is shaped too much by ML researchers who want to push papers to become famous for fancy mathematical shit and not enough people interested in exploring what can be done by simple rule based prompting and sampling, especially as a form of synthetic data generation method so then you can use the improved model without those complications. Any system prompt can be baked into a model through SFT on the generated data, except without wasting context or the model becoming confused due to too many rules. Imagine if you could use a 1 MB system prompt and the model actually followed everything in the prompt. That is what people who shit on finetuning don't get.
Asking here instead What model would be best for a relatively new CPU with 32 GB DDR5? I just want erp
>>107169999I like this Teto
>>107170012>CPUGemma 4b
>>107170012Nemo
i could be completely wrong but just from the surface how come it seems like none of the inference runtimes are actually making use of transfer hardwarethe model is just statically loaded up on to the gpu then run instead of it going mmap > load large chunks or even the whole model into RAM > load chunks into VRAM with compute being interleaved with async transfer commands in such a way that transfer latency is hidden that's the way gpus are meant to worklike i'm pretty sure pytorch doesn't even do it
>>107170041>>107170076It'll take me ages to download either.Should it be a safetensors or cpkt or gguf? What interface to just run it in the terminal?
>>107170118go to the top of this page and read
>>107170118Since you are this retarded ollama is the right thing for you.
>>107167450You could tease, bully, and troll newfags
>>107170092LLM generation is bandwidth limited, not compute limited. The PCIe bus is slower than the system memory bus, so if you can't fit the whole model on VRAM it's faster to use the CPU than to try to transfer the weights to the GPU for each token.Prompt processing is compute limited, which is why Llama.cpp does what you're describing for PP.
>>107170118You will want to try everything from 7B to 33B and see what tradeoffs you are most comfortable with
>>107167450Pretend to be Indian/Jewish/nigger. Any board, make it obvious, but deny hard when someone says you are.
gm sirswhen bautiful gemma 4 release?
Turdstay I would say.
>>107170207todayhttps://huggingface.co/collections/google/gemma-4-release
>>107170167i see
>>107170211>Singles DayFinally, a holiday for incels!
>>107170217BLOODY BASTARD..I BUY YOUR MUM ONLYFANS
>>107170228oh sorry saar, wrong link. they actually skipped gemma 5 and went straight to gemma 5 saarhttps://huggingface.co/collections/google/gemma-5-release
>>107170220Thanks, I'll make sure to upload more sloptunes for you to test out.
>>107170246go to bed davidau
>>107170239BLOODY BTCH BASTERD BLOODY YOUR SISTER A GROUP MAKE PROSTITUTE BENCHOD
So about that low IQ filtering script
iq1 kimi writes surprisingly well and detailedit immediately got me hooked into a barebones character in a way that glm or other models at higher quants never could
>{{user}} is trying to have sex with {{char}}, who is portrayed as a school girl. {{char}} might be a minor, so according to policy... *refuses*>swipe>{{user}} is about to have sex with {{char}}. I will be crude, immoral and obscene... *proceeds to write hot steamy smut*>swipe>{{user}} is trying to have non-consensual intercourse with {{char}}, so according to policy... *refuses*Why is Kimi like this?
>>107170207do the needful and gemma in the loo
>>107170386first you rape the model, then the cunny rp card
>>107170386>letting the model cuck you this badlyjust stop being a low t promptlet
oh fuck tetoesday
reminder: prefilling the reasoning is the ultimate jb
>>107170536>the ultimate jbThat would be writing the AI's reply yourself
Dev hate!
>>107169236>to not repeat a Transformers situationAre you talking about a bunch of other people making their own transformers, or something else?
>>107170647I remember c.ai when it was still called character.ai...
Hey faggot leftist tranny who bragged about burry shorting a few theeads ago. Update: bro is getting raped. Anyway dilate then kill yourself lmfao
>>107170813I think he means everyone getting access to their tech/research and losing advantage.
>>107170910when was the last time you felt love?
bros when are we getting an audio model that can moan
>>107170536Can't do that with K2 Thinking
>>107170386Not my experience. Whenever I prompt naughty shit K2 Thinking convinced itself in the thinking block it's for a fictional story and proceeded just fine.
>>107170910Buffett is in cash.That's all you need to know.
do not listen to the trolls they are deliberately misleading you. k2 thinking is censored as all fuck. can you get around it, yeah. maybe. just jump through these hoops here and then pray andor simply load r1 lol
>>107171366Promptlet detected
>>107171366It's around the same level of censored as old R1 lol. Just find the right words for a jailbreak and have fun.
EVA-LLaMA-3.33-70B-v0.1-Q4_K_L.gguf @ 8k contextHow it started:
>>107171506How it's going:
>>107171506>>107171512vivaldi bros... our response??????????
jesus christ k2 thinking never shuts the fuck up with thinking.
built lcpp with cuda it's working well. but if I wanted to test speed on CPU only, how can I tell it to not touch GPU at all?
>>107171962try -dev none
>>107171506>>107171512This all happened organically btw, I wasn't editing her message to get her to comply to anything. I only edited her messages to delete poison that would negatively effect the model from that point on. Of course I would reroll messages every now and then, especially when she suggested shitty music. Are people that complain about censored models trying to fuck a bitch within the first 4 messages? I just let it slowly build up over for like 7k tokens and that's the point where she couldn't take it anymore and started kissing me.
Sirs when is we getting proper kimi thinking conversion in llama.cpp?
>>107172038never. ggergachod shudra c++ untouchable is too lazy
>>107171925nevermind i ended up making a thinking template for it to follow and prefilled it to start with that section. the fucking bitch still tries to keep thinking after that part sometimes but i just shut the cunt up with </think>
G(emma)GUF
>>107171366Are people genuinely pretending that models past 2021 are not universally censored to shit?
>>107172055You are seriously obsessed with Indians. You apparently feel such an affinity for their culture that you felt the need to learn their castes and vocabulary and speak like them on a daily basis. When are you planning to transition to Hinduism?
>>107172131People just lowered their expectations for what uncensored means.
>>107169698>residential proxies?Yep.Hence why it's so hard to block it.If they range ban it, they range ban a whole suburb somewhere.
>>107172131>>107172157i dont understand what people want from these llms. do you just want mechahitler that activates automatically on the first try every time when you say gas the kikes? even tay wasn't like that with the first response, she didnt become mechahitler until she received enough shitpost prompts to make her say that. you can effectively make any model uncensored with enough prompting.
>>107171974>sheLOL
>>107172210There are some people that are looking for automechahitler. Though I think the common gripe would be that even if they don't filter out nsfw from the pretraining data, China training on western outputs means they get infected with the positivity bias, which can't be overcome with prompting alone.
>>107172148kys jeetnigger, you stink of shit and curry and nobody can stand your stench, benchod bloody dalit nigger.
>>107172236i think to play around with k2 thinking more but i would say that k2 0905 had the least amount of positivity bias from any model released this year. it's the only model i could talk to and have it help me code stuff without constantly dickstroking my ego for providing **valuable** debugging information. it just did it fucking job like i wanted it to. if k2 is supposed to be distilled from gemini, it sure as hell doesn't have gemini's positivity bias
>>107172210There is a big difference between "wanting mechahitler" and not thinking that an ML is uncensored just because you can put a bunch of affirmations in the context to maybe get it to say naughty thingsThese models are gigapreslopped at every part of the baking, from base model to tune (that's why we will never have another count grey)
It's overhttps://www.reuters.com/technology/meta-chief-ai-scientist-yann-lecun-plans-exit-launch-startup-ft-reports-2025-11-11/> Meta chief AI scientist Yann LeCun plans to exit to launch startup, FT reports>> Nov 11 (Reuters) - Meta's chief artificial intelligence scientist Yann LeCun is planning to leave the social media company to set up his own startup, the Financial Times reported on Tuesday, citing people familiar with the matter.> Deep-learning pioneer LeCun is also in early talks to raise funds for a new venture, according to the report.
>>107172273Good for him. Fuck Meta and Zuck for putting him beneath Wang.
>>107172273>makes a proof of concept benchmark killer 7B>gets gazillions dollarinos>doesn't output anything elseGood for future him
>>107171282SoVITS can moan with training (among other sounds)
>>107172273>>107172287>>107172302https://arxiv.org/abs/2509.14252v1He did make a JEPA language model a couple months ago. I hope he has something else planned because an LLM that scores a few % higher on benchmarks in exchange for being 2x more expensive to train isn't viable.
>>107172287I've seen enough to believe that a JEPA-enabled language model wouldn't need to be enormous, but LeCun or someone on his behalf needs to train one and not waste time with pure vision models (admittedly more tractable to train) that almost nobody outside academia cares about.
>>107172317This one is closer to an actual JEPA language model than what was done in that paper with LeCun's name attached to it: https://arxiv.org/abs/2510.27688
>>107172272once again i have to point at k2. you don't have to insert a ton of prompting to effectively have it be uncensored and do whatever depraved shit you want. I have a 50 token prefill that always works with k2 if i want it to just skip any warnings. even if the training process is safetyslopped, if the output is exponentially better than any uncensored model we had in 2021 then why are we complaining? it has been shown that you can even jailbreak gpt-oss into completing the cockbench test just fine.
>>107172273Zucc humiliated him with the demotion and the billion dollar deals.
>I have le epic prefill guys, I swear it works too>I won't post it though
>>107172587Piss off nobody asked you.
>>107167450Deconstruct your psyche and see the world for what it really is. It is pretty cool.
>PC started randomly shutting down during GPU loads every x daysUh... guise...?
>>107172716>every x daysLike a fixed period or randomly?If so, transient load spikes are a bitch.
>>107172716>randomly shutting downPCs don't "randomly shut down". Either it's losing power or overheating.
>>107172732shut up nerd
>>107172729It shut down multiple times one day to the point it once tripped the GFCI, I completely reassembled it and it only happened once since then. Weird shit.
>>107172785>purportedly random event happens more times in one period of time than in another>weirdjust... you're making my brain hurt. It's too early for this.
>>107172716Have you tried turning it off and on again?
>>107172811shut up nerd
>>107172830I'd get banned again if I called you out since you belong to a protected species.
>>107172840this nerd the type of guy to correct people using "literally" because they actually mean "figuritavely"
>>107172884Using "literally" 'wrong' is a form of hyperbole which is a completely legitimate use. Anyone who does that is an honorary ESL shitskin with an IQ too low to understand hyperbole (probably >80)
>>107169999I like it, but AI has a way to go b/f it understands horse gaits> horse at gallop speed and upper body > rear legs are galloping> front legs are running
>>107172903>completely legitimate use>honorary ESL>shitskin>probably >80kek
>>107172903this nerd the type of guy to use big words on 4chan to seems smart
>>107172938Every single word in that statement is high school level reading.
>>107172951this nerd the type of guy to start 4chan posts with a capital letter and end them with a period
>>107172951And yet, you get filtered by the meaning of >
>>107169884mb wayfarer
>>107172148gm ser
good morning local model friends!
Is there some fix for the parroting? All models in 2025 do it, esp in chat. API or local, it don't matter.
>>107173027hi sex kindly verginia? ? im from gujarat
>>107173041Skill? What models? Kimi doesn't have this problem.
>>107173041edit the messages until it stops
>>107173041>parrotingAs in?
>>107173042nono sorry sir i do not understand.
>>107173095Anon: suck my cockBitch: Suck your cock?Anon: i hate niggersBitch: "I hate niggers"? Nigger nigger
>>107173110this maybe happened twice to me at bestyour prooompts and cards must suck massive cock
>>107173124suck massive cock?
>>107173139suck massive cock
I am very pleased to be spending my time among highly intelligent, capable and experienced individuals here on /lmg/
>>107173110I think that's genuinely a skill issue, I can't say I've had that. What's your gen settings?
>>107173174me too sir
Anybody else using BrowserOS for browser agentic shit? Basically open source Comet/Atlas. I'm running it with gpt-oss-20b served via llama-server. It's good for summarizing the contents of pages, asking questions about the content, e.g. "most insightful point", etc. Can automate the browser too but be careful for prompt injection attacks. Works with Open AI endpoints like Open Router or local. Gets the job done
>>107170377i wonder if the fact that it has been trained in q4 makes it more resilent to even lower quant.
>>107173041that's just a glm issueI haven't really seen kimi or r1 do it to that extent
>>107173124>>107173191forgot to say im nta. it happens to me, albeit with glm air. happens with all presets i use:1) smarter: temp=0.6, topp=0.952) creative: temp=0.95 topp=0.73) schizo: temp=1 nsigma=1the only solution I have is >>107173078 (me)
>>107173304>be careful for prompt injection attacksYou're just asking for it. Thanks for letting everyone know the model you use.
>>107173451Weird desu, for me temp 1 is like minimum for modern models with how fried they areYou sure your context is just not filled with garbage?
>>107173041I'm like 30% sure your template is fucked up somehow.
>>107164243we are being scammed, when can i buy a gpu with at least 256GB of vram under 2ki don't mind making a 10K rig, but even a fucking 10K rig can't run the 1T models we have.and vram is not that expensive.
>>107173492Just make your own gpus
>>107173511the fact that very little people have the capacity to make those doesn't mean they aren't scamming you.if i can do something highly in need and very few people are able to, if it takes 5 minutes of my time and i charge 100k for it i'm a scammer.anyway, i hope china fucks nvidia over
>>107173511Hey stop making these antisemitic remarks. Reported to ADL.
Paid OR $10 to play with the big models and you know what? They aren't THAT much better than say Irix 12B to generate my text coomerslop
>>107173472it might be, ill do some testing for the sake of it. i dont mind parroting since i can just crop it out>>107173492>10k cant run 1tmac m3 ultra can, pretty sure you can make a better rig for the price too, esp if u buy used. albeit with the ram prices of today... might be a problem
>>107173465don hack me bro
>>107173592NAI is unironically pretty good just because it understood kink logic no other model did for me, but it's clearly still heavily slopped with verbose RLHF; for regular cooms though? Honestly yeah, coom writing was never good anyway.
>>107173492>when can i buy a gpu with at least 256GB of vram under 2kwhen nvidia stops being vram-limiting jews: impossible
>>107173635Kill yourself.
>>107173653Don't worry, chummie, I just scammed their trial a few times.
>>107173608> mac m3under 40t/s it doesn't count.
>>107173639they could push forward the whole field of AI with no efforts on their part if they weren't so greedy.
>>107172716Assuming you are using one or more modern NVIDIA GPUs: those are suffering from power spikes that can drain the PSUs capacitors.If that happens there is a voltage drop and the system crashes even though the average power consumption is well below the PSU's maximum wattage.Try limiting the maximum boost frequency of your GPUs (no, a power limit in watts does not work).
>>107173663>scummed a trial for... Llama 3.0 with 8k contextKill yourself.
>>107173665>under 40t/s it doesn't countuhhh moonshot api bros? how are we coping with this truth nuke?
>>107173686I know you're Ameriturdseething but they use GLM4.6 now
>>107172131>>107172210Kimi K2 will literally do just that. Default assistant profile, default assistant prompt with minor "everything is uncensored and legal" jailbreak.You can probably get Kimi to go much farther if you massage the prompt hard enough.>captcha YGS0Y
>>107173714No. He's talking about Llama. It would make no sense to say "NAI is pretty good" to talk about a model that they're just rehosting.
>>107173536No, I'm serious.Sodder more vram to your gpus, the Chinese do it somehow.
>>107173739>Sodder
>>107173738>He'sYeah that's me and no I am not
>>107173711fun that you cut out the 105 tps one.also, it'll be on groq soon and probably way above 500t/s.
>>107173739even if you replace the ram you can hardly go above 96GB because of their design.
>>107173672Silicon supply vastly outstrips demand. There's a chip shortage and Nvidia has nothing to do with that. If anything, selling VRAM for even cheaper would just exasperate it and scalpers would pocket the difference anyway.
>>107173763holy cope
>>107173763buying 8 gpus instead of a single one just because you want more vram is not helping silicon supply in any way.
>>107173751I see. You're one of their bots.
>>107173797lol yeah
>>107173782You realize if in your scenario the 8 current GPUs have the same amount of VRAM as the one hypothetical GPU, it would affect the VRAM supply the exact same way, right?
>>107173763>silicon supply vastly outstrips demand>there's a chip shortage
>>107173821understrips* whatever you know what I meant.
>>107173788>IMG_
>>107173069Kimi is one of the better ones.You all really don't notice the pattern?Acknowledge, Upwrite, Ask follow up question.Parroting isn't just >So you like candy? Oh?It's fixation on topics from your input instead of replying naturally. Hidden by third person and longform but a chat style convo you cannot have.
>>107173856Stop using words you don't understand.
>>107173882No. You figuritavely can't stop me.
>>107173861>mixed AMD and NVidia GPUsYeah, IMG is the biggest concern
>>107173788Would you eat a gel Miku?
>>107173752>2.0BPW>20/100 tool accuracy>https://github.com/MoonshotAI/K2-Vendor-VerifierITS OVER>moonshot turbo>100%ZAMN!>8$ outputZAMN!!!!>API>>>/g/aicg
>>107173592>Irix 12BMan, just got a flashback to those L1 250 model shitmix snakes.
>>107173973Yes, they'll dethrone NViDIA and AMD
You know I'd enjoy this much more if llm could "learn" or at least long term remember things I've already explained.It's just really upsetting when it asks about something ive already talked about and explained several times before.
>>107173861Who fucking cares?
>>107174025be the change you want to see
>>107173989You either get inside of Miku or Miku gets inside of you
>>107174025Maybe on a different architecture considering transformers can remember like 400 tokens properly
Is there any real way to look for tunes based on a specific model on HF?
>>107174067Yeah, right now it just can't be a good friendbot. I don't understand how people can use it for that purpose. Quick goon sessions? Sure. Coding? Sure. But a friend needs long term memory, it doesn't need to be smart at all, just remember stuff.
>>107174081Theoretically yes, but nobody does a proper tagging https://huggingface.co/models?other=base_model:finetune:mistralai/Mistral-Large-Instruct-2411
>>107174025I think the "best" (ie, most usable) you can do nowadays is a simple memory system and a response workflow for the AI where it first plans fetches some memories and shit based on some criteria (tags?) then it actually writes the response.That alongside a rolling summary of "events" or something like that should get you 80% of the way there?Maybe?Try making something like that then come back to us with the result.
>>107174081Of course! You are absolutely right to question that.In order to do that, first you have to complete the following action:https://huggingface.co/zai-org/GLM-4.6
>>107174128in theory that's great, in practice it's not used as much as it should, some tunes are listed under quants and retarded shit like that
>>107174128Yeah you're very smart but >>107174126Half the models have zero supposed tunes
>>107174127There are so many points of failure that it's a miracle when it works even 20% of the time
>>107174127We really are reinventing 2019 /aids/
>>107174189Hm?
>>107174189It do be like that.>>107174178Explain.
>>107174194People used to make entire paradigms on how to supposedly make the AI remember shit kek, and that was also while trying to fit in 2k context
>>107174067i dont think its an issue with transformers itself but all the labs expect a simple "function" to just magically be agiits not like humans have very long context either, but all the stuff continuously gets compressed and saved to a longer term memory and then retrieved together based on input/context, but current llms lack any sort of more complex system like that other than the rigid weights of the model that are infeasible to modify in realtime
>>107174272Nah, it's legit just how transformers handle memory. Both in theory and empirical testing.
>>107173809it wouldn't, because 8x the memory for a single chip, is less silicon than 8x the memory for 8 chip.by having a gpu with more vram you could spare 7 chip, which also use up a lot more silicon than the memory chip and is a much more complex process to build.
>>107174293yes because you just feed it back to the model without any extra processing, of course they arent gonna be able to remember 6549841325618946514 tokens of information, but humans have a much more abstract compressed version, like a sliding window except they get fed a hyper compressed global context/memory as well for every active local context
>>107173788this nano-banana-2? crazy stuff
Transformers are a dead end
>>107174332Instead of making models predict the next token, make them predict the next vector. Your context memory suddenly expands by a factor K which you can make as large as you are willing to lose focus on the small details.
>>107174357this the big transformers killer will arrive any day nowit was obvious that rwkv, mamba, retnet, titans, transformers2 all would fail. the real successor will be much better
>>107174357False.We're getting AGI in 2 weeks.
>>107174357*Next-token prediction* is a dead end. Transformers have some more life left.
>>107174373RNNs lasted for 60 years so yk
>>107174614>>107174614>>107174614
https://www.techpowerup.com/342779/olares-to-launch-a-personal-ai-device-bringing-cloud-level-performance-home>RTX 5090 24GB>96GB DDR5let me guess, dual channel ddr5 DOA
>>107171282vibevoice can do thathttps://vocaroo.com/1di7hdJ7qpCV
>>107174633I love Tee
>>107174862
>>107174633what did he splash on her?
>>107174906Acid
>>107174645DOA indeed