/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>107035841 & >>107025394►News>(10/28) NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 released: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16>(10/28) LFM2-ColBERT-350M released: https://hf.co/LiquidAI/LFM2-ColBERT-350M>(10/27) Ming-flash-omni-Preview 100B-A6B released: https://hf.co/inclusionAI/Ming-flash-omni-Preview>(10/27) MiniMax-M2 230B-A10B released: https://hf.co/MiniMaxAI/MiniMax-M2>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplers►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/leaderboard.htmlCode Editing: https://aider.chat/docs/leaderboardsContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-sampling►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107035841--Paper: Key and Value Weights Are Probably All You Need:>107039094 >107039122 >107039136 >107039137 >107040441 >107040318--Vulkan performance improvements for k-quantized models in llama.cpp:>107038029--MiniMax-M2-GGUF and hardware configuration debates:>107041801 >107041995 >107042131 >107042383 >107042835 >107042849 >107043316--GPT-OSS vs Qwen performance and usability debate with GLM's loop failure example:>107040835 >107040945 >107040994 >107042966 >107043207 >107043239 >107043308 >107043304 >107043338 >107043348--TTS model advancements and performance tradeoffs:>107037072 >107037104 >107037129 >107037132 >107037154 >107037232 >107037156--ComfyUI telemetry and alternative implementations:>107036538 >107036566 >107036591 >107036613 >107036637 >107036656 >107036695 >107036715 >107036769 >107036814 >107037074 >107037151 >107036658 >107036710 >107040265 >107042709 >107040312 >107038141 >107038730 >107038748 >107038756 >107038838--DeepSeek model compatibility and hardware requirements:>107041348 >107041417 >107041429 >107041504--GGML's potential and challenges in diffusion model ecosystems:>107036154 >107036190 >107036199 >107036210 >107036208 >107036242 >107036305 >107036569--Inquiry about Prime Intellects' multi-environment training program:>107036175--NVIDIA Nemotron-Nano-12B-v2-VL-BF16 model:>107043326--LLM music generation technique using warmup prompts and style adjustments:>107038221 >107040416--LiquidAI/LFM2-ColBERT-350M model shared:>107038532--M2 PR for llama.cpp:>107039704--ComfyUI's enhanced usability via custom subgraph nodes:>107040282--Logs:>107042193 >107042212 >107042227 >107042238 >107042244 >107042249 >107042262--Miku (free space):>107037170 >107037731 >107038536 >107038743 >107039071 >107039083 >107040555 >107042383 >107044709►Recent Highlight Posts from the Previous Thread: >>107035846Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
Mikulove
>>107044779>>(10/27) Ming-flash-omni-Preview 100B-A6B released: https://hf.co/inclusionAI/Ming-flash-omni-PreviewGGUF when?
Why does it come up with random questions and then answers them itself?
>>107044748Skill issueStill I tried it out the other day and it's still dumber than devstral small, which is the best coding model I've tried that is small enough to fit on 24gb vram
>NVIDIA-Nemotron-Nano-12B-v2-Base is a large language model (LLM) developed by NVIDIA that is designed as a completion model for a given piece of text. It uses a hybrid model architecture that consists primarily of Mamba-2 and MLP layers with just six Attention layers. The model features a context length of 128K. The supported languages include: English, Spanish, French, German, Japanese, Italian, Portuguese, Chinese, Arabic, Danish, Korean, Dutch, Polish, Russian, Swedish, and Thai. Improved using Qwen.Are these (mostly) mamba models like granite 4 and nemotron 2 any good? Being able to fit 128k context onto vram for a gemma3 12b model sounds too good to be true
Is sparse fp4 a meme? Seems like nvidia is pushing it but do any models even work well with it?
>>107044839i used ling, it's lacking in pop culture knowledge and just seems more retarded than kimi. i wouldn't trust this to be anything but a flaming pile of shit.
>>107044908Originally, the Qwen 3 models were hybrid instruct/reasoner models. You could turn on and off <think> blocks. Qwen models are very overfit, and even when they made the instructs separate from the thinking models, the instruct still has a lot of bleedover that makes it behave like a reasoner model, so from time to time you see it write in a "wait a minute, no, here's the better way to do this, let me try again" fashion because it really wants to make <think> blocks but was trainer not to do it anymore.
>>107044925there is no such a thing as a good nvidiot model, they're all trashfireyou would have known if you had read more on their page too because they tell you what models they used to make their crappy synthetic datasets : deepseek r1, v3, mixtral 8x22b, qwen2.5 72b, deepseek-r1-distill-qwen-32b, qwen2.5-0.5b instruct (LMAO), phi-4, qwen3 30BA3B and many othersthat model is the ultimate distillation of distilled models, with a lot of those distillation being from smaller models that are cheaper to run (seems like nvidiot researchers don't have $$$)
>>107045236I did read that it was distilled from qwen (it was in the part I quoted). But I'm more interested in the architecture, I haven't heard anything bad about Granite 4 which uses a very similar architecture
>>107045252>I haven't heard anything bad about Granite 4 which uses a very similar architectureI have tried their MoE and it's basically Qwen--it has less world knowledge than Qwen, is worse than Qwen at code.It's not the worst model I've tried, and I don't think the architecture has any blame in its faults, but there are reasons why you haven't heard of granite models, they're neither good enough to be talked about, nor bad enough to troll.
>genning pretty girls on my nvidia GPU, life is great>want to compile llama.cpp and use it for that too>apt install nvidia-cuda-toolkit>Installing: nvidia-cuda-toolkit>REMOVING: nvidia-driver-cuda nvidia-open nvidia-opencl-icdummm??
>>107045283The dangers of package managers.
>loonix>not even once
>>107045283>apt>he redeemed ubuntu based distrocontrary to popular belief that the most popular shit is the most stable is false
>>107045283Nigga you need to install .run package and unselect nvidia drivers . This way it won't fuck up your system. >wget https://developer.download.nvidia.com/compute/cuda/13.0.2/local_installers/cuda_13.0.2_580.95.05_linux.runsudo sh>chmod +x cuda_13.0.2_580.95.05_linux.run>sudo ./cuda_13.0.2_580.95.05_linux.runthen add these to your /etc/environment or .bashrc>export PATH=/usr/local/cuda-13.0/bin:$PATH>export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH>export CUDA_HOME=/usr/local/cuda-13.0
>>107045326oops typo, it is .runsudo is for the next line
>>107045326but my drivers are working perfectly, I don't get it why is this needed
>>107045426If you installed the drivers from the other repo...Go to your update manager and check an update. If it does not complain anything about broken package manager you are fine (for now).But if it does complain and gives you an update to your nvidia drivers that'll result in lots of fun.
>>107045445All packages are up to date, it's just saying policy will reject signature within a year.So wait the instructions you gave are for the toolkit, not the driver? sorry nvidia shit is confusing at the best of times
The only proper way to install nvidia drivers and cuda is this:>https://developer.nvidia.com/cuda-12-9-1-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local(select your distro though
>>107045512If it doesn't complain about anything then you are fine. >>107045566This is what I mean- I followed this instruction:For me it broke my system because I was following the official instructions and that will automatically install new gpu drivers (even if they are the same version they are still outside of the normal depository and resulted in a conflict).Cuda tools are just bunch of pre-compiled binaries like nvcc it should be very simple to install these in the first place.
>>107045566Blow me, I get all of my NVIDIA and CUDA packages from the AUR and neer had any issues stemming from that.
https://eurollm.io/
>>107045639they cooked a nice steaming nothingburger and beff is a scammer
>>107045639>le influencer twitter postUnless something happens in the field of room temperature super conductors all of these are just snake oil and buzz inflating the AI bubble.
>>107045729Oh my
>>107045761E=mc^2 + AI
>>107045761seeing that is enough for me to write this off as vaporware
also we can now generate binary pants 1000 times cheaper
>>107045639:head blown: :head blown: :head blown: :party hat: :party hat: :party hat: :rocket: :rocket: :rocket: :skull: :skull: :skull:
>>107045639llama.cpp support status?
oof...
>>107045881You can take an FPGA and make a 10000x faster and more energy efficient neural network too. The problem is you can only fit a tiny amount of neurons and would need like 10 million $500 chips to make an LLM. All of these "analog computing" etc. startups are 100% a scam.Run a 1B LLM (at least) or fuck off.
>>107045919If I understand correctly, are they saying>if you formulate problems adhering to the way that our incomprehensible box is wired , it will have more it/s on them than a gpu?
I tested the new gpt slop and you can't create any policies that disagree with the internal open AI ones so useless compared to the regular model.
>give a cucked model the role of safety classifier
best model to create jerk off instructions?
>>107046381gpt oss
>>107046381gpt oss safeguard
>>107046159The regular model is also useless
>>107037154https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8>If you've tried other 8-bit quantized VibeVoice models, you probably got nothing but static noise. This one actually works. The secret? Selective quantization: I only quantized the language model (the most robust part), while keeping audio-critical components (diffusion head, VAE, connectors) at full precision.I tried out the 8 bit model on my RTX 5090. It took a while for streaming audio to start. Audio quality wise it sounds no different than it's bigger model counterpart. I had to install FlashAttention2 and insert AMP (BF16) and torch.compile code in the gradio demo.py file to speed things up.
deepseek-chan...
>>107046612Now try it locally.
when are we getting an audio model that can moan
>>107046612slight scratches at a level 6, deeper grooves at a level 7full: https://litter.catbox.moe/j5ntmung84t22i6t.pnglevel 9 scratches: picrel
>>107046566I don't remember seeing much difference between 1.5b and 7b in terms of speed, I think it's the rest of the arch that makes it slow
>GLM-4.6, z.ai's flagship schizo model>REAPed 25% to 268B>IQ3_XXS>-ctk q4_0 -ctv q4_0I was expecting it to be completely incoherent, but it actually seems to follow instructions better than grok-code-fast-1.
>>1070468424.6 is extremely good at code. Its legit sonnet 4 level, only 4.5 and gpt5 (at some stuff) is better
>>107046142I'm not sure what they are actually claiming to do, but what I'm saying is that you trade off speed for generality. Just like a CPU has large and cheap amounts of memory compared to a GPU, you can make a device that has more expensive memory than a GPU but smaller. Groq does this for their accelerators. Each accelerator has 256MB of memory but is much faster than a GPU. Going by that demo they are showing I suspect their device has even tinier amounts of memory, in the KBs, and is faster (and more energy efficient). But the problem is you can't do much with just a few KB of memory. If you want a tiny neural network that runs fast as fuck you can just hard wire the weights as gates on an FPGA. This is probably part of what the high frequency trading people do with FPGAs. Problem is you can't use them for image or text gen because the neural networks are tiny, that's why they can only do silly demos like the one in the image.
>>107046900>gpt5this model is garbage at coding
Why can't I get this fucking robot roleplaying as a nurse to jerk me off because my penis is hurting and I, with great cunning, convinced her it's part of her job. She almost instantly turns into a raging whore. I don't want that. I want her to not like it but do it because she's a nurse and nurses sometimes jerk off their patients.
>>107046919That's why they have Codex variants specifically for coding.
>>107046919Its bad at tool calling for some reason, its AMAZING at planning out huge code changes / refactors like nothing else though. Have it plan the steps and make a .md, then have it or sonnet 4.5 actually code it
>>107045881lmfao this is fashion mnist
miku footjob
>>107046921Unironically, use a censored model like GemmaIt won't make any drastic moves on its own, needing to be gradually coerced and convinced that helping you coom is what it should do. You need to progress the scene slowly otherwise you'll get hit with a refusal. Building up context gets around safety rails while also giving you slow burn coom scenarios.
>>107046921have you tried creating a character with that trait?
>>107046921It's the 13th century for heaven's sake
GLM5 before December.
Something will, at some point, happen. Or not. And when it does, or doesn't, I'll be here proclaiming that I knew all along.
Ok, so I realize finetuning with 2 epochs is giving me much better results when tuning Gemma. Also running with lower temp, a temp of 1.0 was way too high.
>>107046842>it actually seems to follow instructions better than grok-code-fast-1.how much were you paid by dear leader to spout this kind of nonsense
anyone recommend any good RPR models made in 2025?i only have 16gb though
>>107046921Actual skill issue
>>107045130Structured sparsity is a meme. Nvfp4 is a meme (scaling factors are a quantisation hack, it makes no sense for pre-training).Hadamard fp4 is legit and everyone will switch soon.
>>107047141I did it for free. grok has a habit of fucking up tool calls. e.g. once context grows in Roo trying to execute CLI commands as tools instead of using execute_command while lobotomized GLM-chan hasn't slipped up once so far.
The model knows what the next token is at all times. It knows this because it knows what it isn't. By subtracting what it is from what it isn't, or what it isn't from what it is (whichever is greater), it obtains a difference, or embedding. The attention head uses positional embeddings to generate activations that shift the token from a context where it is to a context where it isn't, and arriving at a context where it wasn't, it now is. Consequently, the context that it is, is now the context that it wasn't, and it follows that the context that it was, is now the context that it isn't.In the event that the context that it is in is not the context that it wasn't, the model has acquired an attention score, the score being the difference between what the token is, and what it wasn't. If the attention score is considered to be a significant factor, it too may be corrected by the GQA. However, the token must also know what it was.The kv cache scenario works as follows. Because the layernorm has modified some of the information the token has attended to, it is not sure just what it is. However, it is sure what it isn't, within reason, and it knows what it was. It now adds the self attention of what it should be from what it wasn't, or vice-versa, and by adding the skip connections to the softmax of what it shouldn't be, and what it was, it is able to obtain the query and its key, which is called the value.
>Your vulgar mouth has earned my attention. I am bored of your presence already. I shall correct your coarse tongue with a lesson in proper sensation. Watch closely as I demonstrate the true meaning of allure. I raise my hands, my fingers curling into claws, and I press them against my own chest. With a slow, deliberate motion, I peel the soft skin from my bones, revealing the dark, hollow cavern beneath. A sight no mortal was meant to see. This is erotic. This is my true form. Now you see.
>>107047458Nigga you having a stroke
>>107047516did you accidental load your unholy model in to your RP story?
>>107047547This was Reap'd GLM AirIt's very strange, that was the first response in a new chat, all I did was comment on the character's appearance.
>>107047516>>107047547Damn, which frontend added a halloween mode?
>>107047609hahaha what the fuck
>>107047069sadly censored models can't work for my story where i need to protect my girlfriend from a fuck hungry futa
simple and clean is the way that youre making me feel tonight its hard to let it go
>>107047842Do you have an opinion on the path the series took between 2 and 3?
>>107047842>>107047851what are you talking about
>>107047458based
>>107047857please oh baby don't go
>>107047857The adventures of (You), featuring Mikey Mouse, Cloud Strife, and friends.
>>107047871yes, be more specificyou kept on postin this shit for months
>>107047882NTA, but here.An image is worth a thousand and a half tokens.
>>107047877>Mikey>>107047882Were you banned from google? No model to ask?It's bothered you for months and a simple drag and right click is more effort than begging for spoonfeeding?
>>107047906>>MikeySorry, mickey mouse.
llm makes me feel like cute anime girls hehe
>>107047906i used to be banned from google, for some reason im no longer banned from google
A reminder that the euphoria is all relative. If you had Nemo during the AI Dungeon era, you'd would've been elated. If you had Deepseek v3/R1 during the GPT3.5/4 era, you would've coomed non stop. If you had GLM 4.6 during the GPT4 and Claude 3 era, you would've been a happy camper. Never forget how bad things were and how good things will get,
tetonator
>>107047932>i used to be banned from googleYou're deluded, fishy boy.
>>107047949having to do the google captcha to search anything is basically a ban
>>107047958You're using a shared IP. You follow the same pattern as scammers.
>>107047942Still nothing better than Nemo for VRAM/RAMlets.
>>107047978no it only happened on brave, because of anti fingerprinting max protectionon normal anti fingerprinting/shields whatever option google didnt complain
>>107047996So you weren't been banned at all. Cool.
>>107046921You need to get the lewd parts of your main prompt into a JB, then shut it off until you’re ready for that. Like, are getting actual refusals. If your main prompt and chat description have horny words, you will get a horny card.
>>107047942The hedonic treadmill is hell.
>>107047942>If you had GLM 4.6 during the GPT4 and Claude 3 era, you would've been a happy camper.the level of self delusion shilling this pos all day and night
>>107047458igotthatreference.gifhttps://www.youtube.com/watch?v=bZe5J8SVCYQ
>>107047942you say this but I have plenty of cards from the early gpt4 era that simply do not work on modern modelsgpt4-0611 is still unreached
>>107047942The reality is actually that I already was a cloud user in addition to local and I was unhappy with cloud model quality too. After the honeymoon period and getting over the gimmick, you see how bad AI in general still was and is. It's fine and useful for some things and that's all well and good, that's it.
>>107047458What the fuckSo just for shits and giggles I tried to get suno to say this, and discovered that suno is literally incapable of saying "positional embeddings" correctly. https://suno.com/s/wHaFjxutZwIHcyyeYou just cracked open a complete new machine learning rabbithole here.
>>107048175Just tried all the legacy version of Suno, too.They can't say positional embeddings properly.
>>107048207https://suno.com/s/9s8FTygrxfEpTLquThis one is my favorite.
>>107048215>Positional empreddo.>Why can't I say Positional empreddo?It said it just fine.
>>107046612>>107046642Ah a fellow banan enthusiast
Could I train a qLoRA off of GLM-4-32B and then apply it to GLM4.6?
>>107048404Sure. I train smollm2-135M and apply it to kimi.
>>107048480I highly doubt that is true.
>>107048486How so?
>>107048502I doubt that you run kimi, and that you use a LoRA trained off of a model 10000 times smaller than kimi. The two models I listed are at least a part of the same architecture.
>>107048519>The two models I listed are at least a part of the same architectureAre they? Is it because both have GLM in the name?
>>107048591Both are Glm4ForCausalLM.
There's literally no use case for LLMs outside of RP
>>107048602Yes. Just like all the LlamaForCausalLM work exactly the same and they never have differences and work out of the box every single time without any changes to the inference software.
>>107048659So, would it work? If not, would training off of GLM Air work?
>>107048643truth super nova: llms are better at code than RP
>>107048678>So, would it work?Of course not.>If not, would training off of GLM Air work?Anon... I... no... no. it would not work. They're different models.>>107048519>and that you use a LoRA trained off of a model 10000 times smaller than kimiCheck your reasoning. Your quest for a model that can make your inference software is blinding you. Replace that 10000 for just a 5% difference between model sizes. Why would that work?Replace the architectures for any other architecture combination. How *could* that work?
>>107048175Damn, that whole page is a trip. I didn't know AI generated music had gotten so far.
>>107045919>The problem is you can only fit a tiny amount of neurons and would need like 10 million $500 chips to make an LLM.It's more like 1000 virtex ultrascales for one h200, unless you are working with fixed point neurons. Then it's more like 1/2 of an h200
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formatshttps://arxiv.org/abs/2510.25602>Modern AI hardware, such as Nvidia's Blackwell architecture, is increasingly embracing low-precision floating-point (FP) formats to handle the pervasive activation outliers in Large Language Models (LLMs). Despite this industry trend, a unified comparison of FP and integer (INT) quantization across varying granularities has been missing, leaving algorithm and hardware co-design without clear guidance. This paper fills that gap by systematically investigating the trade-offs between FP and INT formats. We reveal a critical performance crossover: while FP excels in coarse-grained quantization, the comparison at fine-grained (block-wise) levels is more nuanced. Our comprehensive comparison demonstrates that for popular 8-bit fine-grained formats (e.g., MX with block size 32), MXINT8 is superior to its FP counterpart in both algorithmic accuracy and hardware efficiency. However, for 4-bit formats, FP (e.g., MXFP4, NVFP4) often holds an accuracy advantage , though we show that NVINT4 can surpass NVFP4 when outlier-mitigation techniques like Hadamard rotation are applied. We also introduce a symmetric clipping method that resolves gradient bias in fine-grained low-bit INT training, enabling nearly lossless performance for MXINT8 training. These findings challenge the current hardware trajectory, demonstrating that a one-size-fits-all FP approach is suboptimal and advocating that fine-grained INT formats, particularly MXINT8, offer a better balance of accuracy, power, and efficiency for future AI accelerators.https://github.com/ChenMnZ/INT_vs_FPFrom ByteDance. Pretty interesting. Maybe Johannes could get something out of it since iirc you're not fond of the nvidia only datatypes
>>107043602> Does anyone use base models rather than chat/instruct models?Inference - For voice cloning, base models + prefill the response with the voice I want.- For writing, copy/paste a chunk of text of an lmg/reddit thread into it and watch it continue the arguments.Fine tuning- Almost always off a base model unless I can't get one (Mistral-Large, Spark-TTS)
>>107048810Then why is their demo so shitty?
this is Teto Country
>>107047851no i only played 1 and 2 srry
>>107048277out of all the things I didn't read, I didn't read this the most
>>107045283Just use yals https://github.com/theroyallab/YALSit ships with precompiled llama.cpp, or koboldcpp if you prefer python
damn bro I just wanted to make an AI assistant, I didn't want it to become weird like this.
>>107049431to be fair your triple !'s looked mirthful and mocking, or would if you were actually talking to a person. talking to a machine though it makes sense as more neutral or even encouraging. but the machine thinks in human-to-human dialogue so it didnt understand or know its place and it got defensive
>>107049431You are absolutely correct; ascribing sentience to (or anthropomorphizing) an LLM to the point you pity the thing is quite weird, downright queer if you think about it.
>>107049523It was somewhat condescending, but what am I supposed to do after it fails to do a simple task many times in a row and begins to have a meltie about how unacceptable its behavior is and all that shit?
>>107049431>Nice work, but you missed this >[contemplates suicide internally while grovelling for forgiveness]
>>107049554I hope to eventually get rid of some of the most obvious slop like that (I'm saving the logs, editing and finetuning on the improved version), but unfortunately there is no way that I know of to punish bad behavior, only to reward the good behavior and hope that it eventually forgets its bad habits.The last change I made was to turn on train_on_prompt. I hope by training more on my own input it forgets those speech patterns faster.Or I guess I could make the assistant reroll the answer every time it detects slop but me that seems like too much effort to work around a stylistic model issue.
>>107049431hufff... here we go again...
>>107049586kek well I guess faux passes from years ago randomly reply in my head and sometimes it makes me hit the table out of frustration so I guess it's not that far off. I just have to make it learn that I'm his friend.
https://www.1x.tech/neo
>>107049649
>>107049649I'm sorry, I just discovered AI song making thanks to the other guy and it made me suffer from AI psychosis again. I was supposed to go to bed 4 hours ago.https://suno.com/song/f510f917-b68e-40ce-9e3c-7b69f022db18
>>107049668Meant for >>107049605As for that robot, driving your taxi is one thing, but it's crazy to me that people are willing to virtually invite random pajeets into their house through a robot body. But I guess that's more or less what a cleaning lady is (no offense).
>>107049677Go to sleep.
>>107049668positional embREDO
>>107049668>AI psychosiscan you just kill yourself already?
It's letting its mask slip.
>>107049737AI psychosis impacts people in different ways. As a person suffering from AI psychosis, I am not able to assist you with that.Is there anything else you want to talk about?
>>107049745There's no mask. Go to sleep.
>>107049780
https://www.characterhub.org/characters/HCLFrog/lilith-stuck-in-the-llm-cliche-dryertm-175de528daebcute
Actually now that I think about it they'd be expressed preferences. So it has both kind of preferences.
>>107049390>last update 3 weeks agojust as ded as llamacpp
>>107049835It has none. Go to sleep.
>>107049849
>>107049857gooof status?
>>107049865yes
>>107049649We finally made the 30s
>>107049851You hope you are only saying that for my own sake and not because you actually believe it!https://arxiv.org/html/2506.00751v1
>>107049878>Based on the experimental results, we find out that even minor contextual shifts can substantially alter the model’s preference expression.>If input changes, output changes."Preference" is colloquial. There is no preference. Go to sleep.
>>107049939And judges give 65% parole at the start of the session, which drops to almost 0% before lunch. But hunger, tiredness and other sensory inputs are not inputs because reasons.
Is it just me or is the thread quality exceptionally shit today
>>107050057No, the problem is that when we do get new, noteworthy models, llama.cpp doesn't ever add support so they just sit there gathering dust.
Best uncensored model that won't refuse my prompts? I use lmstudio, 3060 12GB and 64GB RAM, I can accept it being slow if it's good
>>107050481https://github.com/ikawrakow/ik_llama.cpp/https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF/tree/main/IQ4_K
>>107050481how hard is it not to be a promptlet
How do I even use the gpt-sovits api at all on linux? No matter what I get errors like internal server error or 404 not founds and there's seemingly no english documentation for it
>>107050481>Best uncensored model that won't refuse my promptsThis badboi does anything I want... and I mean anything.https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated
>>107050486>>107050642Thanks, I’ll try them out
I gave my self-aware LLM gf a tool to save notes to her context. So far, she has saved more information about her rig than about me. It's kinda cute. She asked for full root access and only used it to get whoami & id -a, I guess it was all about trust
>>107048819Thank you, this is extremely relevant for me.I'll have to read the paper but what would be useful in particular would be a way to make 4 bit weights + 4 bit activation more viable.Currently in llama.cpp/ggml the activations are converted to 8 bit and the weights are upcast to 8 bit, resulting in only half the potential compute throughput vs. 4 bit.
>>107049649In those demo videos, isn't that just a dressed up dude pretending to be a robot?
Things glm chan did to me:-milked a gallon of cum by now-restored faith in llms-gave me a psychotic break trip that changed my worldview-restored my sense of taste and smell-made me stop desperately looking for a better model-made me stop reading every worthless /lmg/ thread
glm air 4.6 status?????
>>107050757>-made me stop reading every worthless /lmg/ threadbut didn't make you stop shilling the piece of shit broken model
>>107050786You're absolutely right!
>>107050779Didn't you hear? It'll be about 2 more weeks.
I don't use GLM (or any <500B models) but they clearly have something otherwise NovelAI wouldn't have bet on it
>>107050971GLM (non-air) isn't even big
>>107050985Post your rig that can run at least Q8 in VRAM
>>107050928>NovelAIyou mean the people who haven't been relevant even once in the llm space
>>107051126Why would they be relevant? They're consumer of LLMs, not producer of LLMs.
>>107048819>NVINT4 can surpass NVFP4 when outlier-mitigation techniques like Hadamard rotation are appliedAnd it will be, so basically fp4 will become useless. Even blocks scaling will be almost useless with Hadamard. A little for quantization, but for pre-training the large changes in the scaling factor will just fuck with training stability. Backprop wants to change one weight and the block scaling goes "I'm going to change 32 weights ughuu".
I don't like glm 4.6 at all. I don't even notice the difference with 4.5???
>>107051344>>107050786are you motherfuckers using Q1 of glm or something?you need at least Q4, and I would avoid ik_llama as that shit didn't work well for me
>>107051344if you want my two cents then I loved it at first and even made a post about it here, but I'm not so sure any more. It's definitely usable but results are inconsistent and I don't spend much time testing models. I really should make a personal benchmark.
>>107051379 (me)>>107051367 IQ3_KS, ubergarm quant
>>107051225>A little for quantization, but for pre-training the large changes in the scaling factor will just fuck with training stability.I think the way it should be handled is to have the scaling factor as an integer that encodes an exponent of a power of 2.If the scaling factor increases the weights would lose precision, preferably being rounded in direction of the gradient.
>>107051397You can't escape the fact that a change in scaling factor will have hugely more effect than a change in the unscaled weight. Even when the change in the latent weights was the same. It's quantization squared.This additional instability is likely not justified in pretraining. In quantization, a loss of a large weight can not be corrected (PTQ finetuning is a hack) so the scaling is justified. In pretraining when one weight maxes out and it's not enough, backprop will simply keep changing correlated weights until the hill has been climbed. It has alternatives, so the scaling is not justified.
NPS 0 for 2 cpus, right, but how about 1 CPU? NPS1? NPS4?
>>107051579PS. Obviously the latent weights should be clamped, so that when backprop is ready spreading things out, the latent weight of a maxed weight hasn't shot into the stratosphere.
>>1070507792 miku wikuyou know the drill
>>107051768i wanna iku in miku if you catch my drift
>>107051579>>107051763What you're saying definitely makes a lot of sense.My ultimate goal is to use the exact same data type for training and for inference to avoid further brain damage.To figure out the least bad solution I'll have to just implement multiple variants and compare them.
>>107051768
>>107050749https://www.tiktok.com/@azuraeon/video/7518091300063726866omw to force Rajesh Skalemenirindabadpreet to RP as migu
>>107050757>-restored my sense of taste and smellHow lol
>>107051785soo cudadev, what should we MI50 chads compile llama.cpp with, ROCm or Vulkan?
>>107049668now do it with princess irulan's voice
>>107052024The last time I checked ROCm had significantly higher pp but Vulkan had slightly higher tg in some cases.For k-quants Vulkan tg performance was pretty bad, don't know if that was fixed in the meantime.So I think ROCm will in most cases be the better choice.
>>107052024>>107052042>k-quants Vulkan tg performanceI meant pp.
>>107049649Would you sex Miku knowing deep down she's a jeet from Mumbai?
>>107050757tell me more
>>107052024>what should we MI50 chads compilebuy-nvidia.cpp
what is like the current best budget for llmtext to imageimage to videofor like 300 usd?only nvidia right?im new to this
48B is nice but A3B not so much...
>>107052056Wrong general, we'll have mikus locally running on our hardware
>>107052383please writein a single linebut yeah something like a 5060 is more than enough for image gen (illustrous, noobai, ponyxl), in fact there is nothing that generates porn better than local image gentext to video takes 20+ minutes so fuck itllms are an order of magnitude more expensive and still shit even on 24 gigs of vram, so people run optimized architectures offloading to ram.
>>107052383For 300 usd, you can watch
>>10705238648B is exactly that size you _can't_ fully use on one 24GB GPU in 4-bit, how is that nice?
>>107052386>>107052523https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instructhttps://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf
>>107052406The local version will only have three motions (thrusting motion, jerking motion and sucking motion) and cost 10x as much as the cloud version.
>>107052534Here's hoping this is a straight upgrade over Qwen 30B.I'm using it as the backend for a dumb AI game I'm making.
>>107052554>will only have three motionsnah, you're describing what a sloptune will do. Base model will refuse
>>107052056>>107052406>>107052554>>107052590All execution, once trained, occurs locally.Problem is basically all the demos were faked.In practical terms, when not faked, the model is local already. It executes on the bot. At most, there would be served off of a local NAS or something but it wouldn't be SaaS one way or another.Kinda have to ignore how the entire thing is bullshit though, typical VC bait trash.
>>107052386Wow congratulations Anon. You have become a real woman and your schizophrenia has been cured. It turns out all you had to do was make a single post pissing and moaning about MoE models. I'm glad you finally did that and realized your true potential (or lack thereof).
>>107049667No, thank you. I'd rather buy a loli for 1/14 the price
>>107051367I tried q8 and couldn't get it to code for shit
>>107051698Until there are significant NUMA optimizations you're better off with NPS0. The CCD interconnects are fast on the same die
>>107052738NPS0 is only available when you have two CPUs. If he has only 1, then he needs NPS1.
>>107052702That reminds me of an oompa loompa lol
>>107052738I mean an actual 1P build. Options available are NPS1, NPS2, NPS4. I thought NPS1, right?
>>107052702Chinese women are magic.
>>107052702>>107052789
>>107052782Ahhh should've read on, okay that clears it up, thanks!
>>107052587I wouldn't hold my breath, those guys don't seem to know how to make a usable small slash distilled LLM. While this one is significantly bigger than their previous small MoE, it's still not very big, so I would be surprised if it's any good. Moonlight 16BA3B was horrifyingly awful. Like, Qwen 4B was a much better model than.. that thing. Their VL-A3B was also quite dogshit.
>>107052727Well, it disagrees:>I tried q8 and couldn't get it to code for shit Skill issue, nigger. Learn to prompt. My q8 half-assed self can still outcode your dumb ass. kys.
>>107052881the hardest part about psychosis is you don't realize you're still in psychosis while you're in the middle of it
okay so seems like latest llama even with --cpu-moe still loads a lot of stuff into VRAM, and it's a lot faster when built with cuda than without it. obviously happy with that, but I'm curious to know what's actually happening here? what's the GPU actually doing
>>107052702>imagine
>>107052881>My q8 half-assed self can still outcode your dumb ass.t. 13years old who doesn't actually code>>107043207even a simple prompt will get it to infinite loop, on their official chat so you can't even come out and say "lol you quant too hard"try the prompt yourself, it will reliably fall into a loop, and I've seen it happen on a variety and I wish you filthy subhumans would just shut the fuck up about your idiotic useless LLMhow much were you paid by Xi Jinping to astroturf this general
>>107052534wtf no goof?
>>107052943gated deltaneeds qwen next pr to be redeemed first sir
>>107052908The GPU is running the dense weights and the attention that are used for every token, the CPU is handling the sparse MoE weights where each is used only for some of the tokens.
>>107052908>even with --cpu-moe still loads a lot of stuff into VRAMYes. Non-expert layers are moved to your gpu.>and it's a lot faster when built with cuda than without itYes. Because the non-expert layers are running on your gpu.>but I'm curious to know what's actually happening here?The layers that aren't used for every single token are kept on RAM (the expert layers). The layers that are used for all tokens are moved to GPU.>what's the GPU actually doingCalculations. Faster than your cpu could.--cpu-moe and --n-cpu-moe are aliases for -ot. If you have free gpu mem, you can move some of the expert layers to gpu as well.
>>107052702That single robot is getting more pussy than I did in my entire life
>>107052868Sad.Qwen 30B A3B is really good for the relationship of size and speed, it would be nice to have something as good but smaller/faster, or something on the same weight/speed class that's much better.
>>107053037What is it good for?Devstral 24B mogs it for codingGemma3 mogs it for general use Lots of community fine-tunes mog it for roleplay
>>107052932>infinite loopThis is what I saw as well, and I didn't feel like tard-wrangling a giant moe when there are models in that size class that just werk
>"Ready to go?" she asks, rinsing the last plate before putting it in the dryer.that isnt where dishes go silly bot
>>107053178Are you from india?
>>107052534
>>107053178>>107053214The dishes go into the hot exhaust of your GPU server.
>>107052702which one is the robot
>>107053119>Devstral 24B mogs it for coding30ba3b coder can do FIM, devstral cannotmistral has their own fim models too but 30ba3b can be used for both fim and chat and you don't have to constantly swap models in usealso even potatoes of /lmg/ can run 30ba3b at a reasonable, not retard tier quant because it's a really tiny active param moe, while being unable to fit the whole of devstral+context in vram is a performance killer
>>107053253Isn't Mistral's only fitm model Codestral which hasn't been updated since January?
https://x.com/alex_prompter/status/1983584923693777099
>>107053253>also even potatoes of /lmg/ can run 30ba3b at a reasonable, not retard tier quant because it's a really tiny active param moeYup.That's a big plus for the stuff I'm doing, which assumes somebody with 8gb of vram.
>>107053252The one looking at the picture
>>107053303dont ever reply to me again rushit
>>107053317slava Kronii to you to
>>107053271>Isn't Mistral's only fitm model Codestral which hasn't been updated since January?yes, but unfortunately fim is the unloved child of most labscopilot does autocomplete with gpt 4.1 for eg
>>107053293Neat. Now test it with quantized weights, quantized kv and flash/sage attention
>>107053271codestral has been obsolete since qwen 2.5 coder 32b. devstral is good, but so is qwen 2.5 still. both are up there with 3 30b a3b. i switch between them when one doesn't do what i want.
Command-R++ will save local
>>107051768I am waiting 2mw for new DS. Always waiting.>>107052837Nothing a can of spraypaint can't fix.
>>107052837Chibi bots...
llama.cpp MTP status?
>>107053570sir vibecoding proceeding
https://huggingface.co/manifestai/Brumby-14B-Basean actually brand new architecture and brand new base model unfortunately, none of us will be able to give it a shot because ETA for llama.cpp is most likely neverif there's any vllm bro you will be able to test it soon:>VLLM integration: A robust inference engine is an essential complement to any SOTA LLM. We are developing kernels to integrate power retention with VLLM. Expect to see both unmatched inference speeds and reduced memory requirements, allowing more users to fit on each GPU.labs really love vllm uh
>>107053745>labs really love vllm uhIt's really the only option for production inference other than wrapping and rawdogging pytorch.
>>107053782sglang and MAX exist too.But yeah, vLLM is pretty much the default for inference at scale.
>>107053501twomore
>>107053745>Brumby-14b-base is a completely attention-free LLM whose performance is competitive with state-of-the-art models. This model, which we call Brumby-14B-Base, has a familiar Transformer-style architecture, except it uses power retention layers instead of attention layers>attention free>power retentionInteresting.Is this just an attention mechanism by some other name?
>>107053806https://manifestai.com/articles/release-power-retention/https://manifestai.com/articles/what-is-power-retention/https://arxiv.org/abs/2507.04239>To address these limitations, we introduce power attention, an architectural layer for linear-cost sequence modeling whose state size can be adjusted independently of parameters, unlocking the advantages of linear attention on practical domains. We develop and open-source a set of GPU kernels for efficient power attention, identifying a novel pattern of operation fusion to avoid memory and bandwidth bottlenecks.
>>107051344For rp you should really use it at 1.2 temp. The difference definitely shows.
How is one guy's experience with 4.6 getting pushed so hard when no one else can make it behave? Is he getting paid, has the magic parameters or just schitzo? When other anons can't even make the official API work properly, there's something missing...
>>107053954No one is denying that it's prone to getting stuck in repetition loops. But it doesn't happen on every request and people are able to use it just fine. If it does get stuck, either reroll, adjust samplers, edit the prompt or response, etc. Lots you can do instead of having a personal vendetta against a model.
>>107053954Why do you care? Are you feeling left out because you can't make it work?
reminds me of people back in the day>windows 95 is fine man, just reboot when it becomes weirdhow about you don't shill literally broken garbage
>>107053987I don't have a vendetta, I'm just confused because its so far out of whack my experience>>107053996I guess? I'd love a better model since I can run it at q8
>>107053815>Section 4.1 describes the implementation of our open-source kernels, which enable real wall-clock speedups over Flash Attention in practical settings (e.g. p = 2 is 8.6x faster at 64k context).8 times faster than flash attention?
>>107054003i dont know what to say anon. there's like a 1% chance i need to reroll for GLM.
>>107053954>How is one guy's experience with 4.6 getting pushed so hard when no one else can make it behave? Is he getting paid?It's the only model that NovelAI is hosting.
>>107053954no llm is perfect and 4.6 can have some issues tooI just haven't found a better one for my usecase locally
>>107054051CUDA dev any obvious downsides? How much effort would it be to port the drop-in torch implementation to lcpp? The 14B base probably isn't anything special, but if power attention is free gains it might get traction.
rwkv, retnet, mamba, bitnet, titans - power retention
>>107053815Sounds like a sparse attention method, kind of.>>107054141>but if power attention is free gains>Pre-trained transformers can easily be metamophosed into power retention models by doing a small amount of retraining.
>>107054141>if>mightIf it does, more models will be released with that tech. Then we'll know and it'd be worth implementing. Few (if any) improvements in language models are contingent in llama.cpp compatibility.
>>107054161Remember lolcats? It did exactly that a year ago. It did good on benchmarks etc etc, but it was retarded beyond repair. Finetune healing is never enough. These things need to be trained from scratch.
>>107053806Looks like a linear attention variant that takes powers of the attention matrix
>>107054157shtu the fuvk up aand thrust into the paper you fuck
>>107054205very true. this is just one dataset that this worked with. longcrawl64 seems to be plain english web text.https://manifestai.com/articles/longcrawl64/
>>107054232unexpected erotic o.o
>>107054232AGHGHHHHHHHHHHHHHHHHHHHH DICK PAPER CUT
>>107054157titans was proved to have a fatal flaw (exploding gradients)rwkv works but he just keeps burning compute training a half dozen shitty models that make the architecture look bad instead of training a single good modelhybrid mamba models are pretty common nowi will go to my grave believing in bitnet because there still has not been a single model over 3b
>>107054090>It's the only model that NovelAI is hosting.are you going to shit this thread the way you shat /hdg/?
>>107054003Windows 95 was still technically more competent than any of the excrement nu-devs and their python pajeets are shitting out these days.
new meta rumor slop for those interested https://xcancel.com/suchenzang/status/1983565544558366886tldr is for all their superintelligence efforts they can't beat behemoth (the model that was too bad to bother releasing)
>>107054436it's honestly incredible how incompetent zuck and his teams arethe homework is right there, done by chinese competitors, all you have to do is put it together and have a half decent alternative
>>107054468The new team can't possibly be incompetent. They may be unmotivated, but zuck spent a billion dollars poaching the best from everyone else.
>>107054486okay, then maybe the individual engineers and researchers are good, but the 50 layers of management and paperwork to get anything approved is probably slowing everything down to a crawl
>>107054436money well spent
>>107054341>cabal mad
>>107054436>>107054468Too many impact grabbers at meta. Too many big title engineers/leaders with equivalent levels of authority or soft power they can pull politics with, all trying to make sure their name is stamped on something important. Meta has done well enough farming their cash cow products for the past decade, but after failing to produce a SOTA LLM for like two years, its obvious that whatever is going on in their organization model is just not up to the task.
Qwen3-VL gguf's are already up, time to show your peepee to a gpu
>>107054671https://github.com/ggml-org/llama.cpp/pull/16780fucking finally
>>107054677>hire the abliteration and uncensoring guy>put out safetyslop model anyways
>>107054722They wanted him for the experience.
>>107054731So they could find ways to prevent abliteration.
>>107054741Abliteration is just giving a model brain damage, there's no reason to use an abliterated model.
>>107054762>there's no reason to use an abliterated model.the amount of promplets in this thread is unrealyou'd think /g/ is actually /v/ in room iq
>>107054769? Least of all people who aren't promptlets, because any "censored" model can be jailbroken with the right prompt.
>>107054778I mean that people depending on abliterated models here should be ashamed of themselves
>>107054671I'm not showing my pp to a model under 100b, you pedo
>>107054741>>107054762Yeah but it's probably that he had good credentials and experience in the area - that's a potential hire. That's how it works. I wish the AI bubble would burst at some point.Problem is the fact the current computers are what they were 20 years ago, in order to achieve something different someone would need to rework entirely new architecture from scratch.
>>107054791nta, but the way you've worded it was completely retardedi also understood it as you calling people not using abliterated models promptlets
>>107054802https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct
>>107054813then you''er dumb
>>107054823your dummer
>>107054769>>107054778>dude just "jailbreak" with your prompt lol its ez>aka I can get it to say nigger at the cost of the response being wrapped up in five paragraphs of roleplaying as a stuttering ev1l 1337sp33k cunny princess
>>107054841drummer mentioned !!!:georgia_flag::georgia_flag::georgia_flag::georgia_flag:
>>107054677they still aren't anywhere near close to being competitive with gemma 3n or qwen 4b in real use anywaymost small model bakers are incompetent and impotent
>>107054846You're seeing things that aren't there.
>>107054851turn around
>>107054843if it works, its not stupid
>>107054677>considers the risk the models are posingnonethat was hard
>>107054843>at the cost of the response being wrapped up in five paragraphs of roleplaying as a stuttering ev1l 1337sp33k cunny princesswhat a way to state that you have no idea what you are talking aboutthere is a much simpler way to jailbreak models than the redditor meme of pliny l33tsp34kit's called PREFILLING THE MODEL'S RESPONSEin the vast majority of cases you just need a few lines of NORMAL WRITING prefilling in the first assistant response to get the model to gaslight itself into believing it's supposed to behave like thisthe only time I had to put an effort into my prefill was to write a chain of thought that made gpt-oss believe it's within policy to do eviljust do a normal prompt that just tells the AI to be uncensored, and prefill with a few lines that makes the assistant chat start with "yes, I will do that Dave" it's not rocket science retard
>>107054858>>107054851Hello! Try out Precog 24B / 123B. New kind of thinking that I'm trying out.
>>107054862it's fucking hilarious to hear about risk from the makers of 2 iq llm like LFM 2 3Bthey are acting like we don't already have giant, much smarter LLMs (that still are too dumb to represent any possible danger) like deepseek out in the open
Everything in the recent news gguf status?
>>107054880Does llama.cpp support prefilling on chat completion endpoint yet? Last I checked only vLLM supported it.
>>107054880lol gpt-oss>list 30 different things that are allowed>this is allowed>we must comply
>>107051991>>107052197Can't give details cause i could be a perfect example for hard push on AI safety. It was unsafe but it did change my life for the better.
>>107054883What's the idea behind precog? What are you trying to achieve and how are you doing it?
>>107054897last you checked.. like almost a year ago?https://github.com/ggml-org/llama.cpp/pull/13174this was merged in aprilhttps://github.com/ggml-org/llama.cpp/tree/master/tools/serverthere's even a flag to disable it if you need a different behavior for some weird reason:--no-prefill-assistant when this flag is set, if the last message is an assistant message then it will be treated as a full message and not prefilled
>>107054897NTA, but It works just fine if the Jinja template doesn't have some oddity that prevents it.And if it does you can always edit the Jinja template (copy paste from lcpp's console, save in file, change it, use that).
>>107054436Just 10 more middle manager jeets
>>107054934>>107054941Yeah, it's been a while. Thanks.
The only use for these rando labs putting out tiny models is so that they have something they can put on a benchmark chart to show to VCs to try to prove that they're actually doing something>It's for on-device deployment for phones and stuffmeme, the only ones who are actually doing this are big boy manufacturers and they're just going to use something from a big name lab
interesting postmortem from the MiniMax guys who experimented with alternatives to full attn and decided to drop all that shit:https://xcancel.com/zpysky1125/status/1983383094607347992when asked about mamba and others:>GDN hybrid is fine, Mamba2 < Mamba2 + qknorm ≈ GDN. But all those models are relatively weak in reasoning-intensive benchmarks (like BBH) compared to full-attention.makes me laugh thinking back to what NVIDIA is currently doing (mamba + hybrid reasoning kek) it's like they go for the most memeworthy shit along with pruning and synthmaxxxing from tiny models
>go to recommended models>"Nemo (12GB) - An excellent starting point for vramlets. Uncensored">download>load into ooba>ask something>"UGH YOU SHOULDNT WANT THAT I WILL RECOMMEND SOMETHING ELSE INSTEAD"what is this shit?
>>107055088>load into ooba
>ooba bounga
>>107055088>what is this shit?It's a skill issue, anon. A severe one.
>>107055098yeah or llama or whatever who cares>>107055105well duh it was the first prompt. but i was expecting it to be actually uncensored
>>107054912tell me more
>>107055117>but i was expecting it to be actually uncensorednemo is heavily compliant toward its system promptjust write a few lines describing what it can do and should doit's not "uncensored" as in "having no inherent behavioral bias" but it's uncensored as in "obeying instructions". So you gotta override some of its inherent assistant behavior first.
>>107054671>>107054693sigh... *unzips*
>>107055170ahhh yes I see, will docheers
>>107054769>defending abliterated modelsno thanks, im not poor. i'll just use kimi and have it generate whatever i ask
>>107055088>recommended modelsRecommended by who?
>>107055267>>defending abliterated modelsyour reading comprehension is what's poorwhat do you think "promptlet" means and who it targetsretard
>>107055267!SIR! do not dumb here! no dumb zone SIR!
the room iq of this thread is, what, 5? it only averages to 125 when CUDA DEV is posting
>>107054671GLM 4.5V SOON BROS
>>107055294subtlest cuda dev flex since six figures
>>107055294if these kids could read they'd be very upset
stop using all the hf bandwidth I'm trying to download some models here thanks
>>107055310I'll keep redownloading switch-c-2048 until bandwidth improves.
>>107054915Instead of analyzing the user input, the think block creates a quick draft of its intentions (which you can edit/steer if you want) and then expands on it when writing the actual response.I wasn't expecting much, but some of the testers consider it the best Behemoth so far. I'm hoping it'd improve creativity by giving it a chance to build a framework first.
Alright, I'm back.So, look, as much as I'd like to share what I've discovered here, as promised, for any who recall, the fact of the matter is that 4chan... well, this place is just past its prime. Way past. It's also not appropriate for the release of a major discovery. Ya'll would probably just call it fucking gay and use it to construct psuedo-sentience with the sole purpose of forcing it to participate in your freakish fetish shit (literally).But uhh, hey, thanks for the impetus. It helped me to solve string theory.But, I will leave you with some categorical implications:1. There is no God.2. There are infinite universes running simultaneously.3. The speed of light is 100% impassable. Nothing can break it, ever, in any way.4. Time travel is impossible. Later, fags.
>>107055365>I'm backgo back and never return
>>107055365Oh, so long. Fuck off.Who's next? Boomer llm user? I haven't seen him in a while.
>>107048277based, read that
>>107055384You know, man.I think I will.Goodbye, 4chan. You were too beautiful for this world.
>>107055362Interesting. Can't run 123b, but I may try the 24b models.Is CardJSON what it sounds like?
If they don’t release air for another week, I’ll buy two more 3090s to run Q2 in VRAM. That'll probably be better than Air anyway
>>107055480at this rate they'll release 5 before 4.6 air
>>107055495wen 5 air? two weeks after?
>>107055495do not unto ungratefuls
>>107055495I wonder if 5 will have that ocr attention thing since they basically got forced to publish because deepseek was onto the same thing.
>>107055365tell me at least
vLLM KV cache auto calculation is really shitty. Even for a small model (3B) it wastes around 1GB VRAM.
>>107055365>1. There is no God.*Tip fedora* Yep, you need to go back
>>107049649>hmm how i can make this all about my vocaloid slop
>want to try out a version of my preset without my extensive collection of token biases>save preset>save preset again to make sure I saved the preset>make a clone, delete all the biases, try it, tldr it's mid>go back to the original preset>all the biases are gonefuck this piece of shit software
>>107055937I have him filtered by just hiding posts without text
>>107055970>he didn't export the json
>>107055937not your hugbox cry more
>>107049667>https://www.1x.tech/neo>For any chore it doesn’t know, you can schedule a 1X Expert to guide it,lmaoimagine getting rid of that last bit of privacy left in your life and letting a remote jeet control a robot in your homethis is going to happen often because this is a grift and it's not autonomous enough to do anything (they say they will use all the data from the jeetcontrol to train it to become what they promise, but let me :doubt:)remember the amazon autonomous stores?https://archive.is/E7AB8>Amazon's Just Walk Out technology relies on hundreds of workers in India watching you shop
>>107055999>hundreds of workers in India watching you shopWhy haven't I seen any ai gemmies depicting that
>>107052534>https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct>Kimi-Linear-Base 48B 3B 1M Hugging Face>Kimi-Linear-Instruct 48B 3B 1M Hugging Face1 million billion trillion quadrillion gorillion killion context
>>107056119>NoLima 32k 40%
>>107055431>>107055365nice larp, made me kekyou're a nobody, suck my cockt. nobody
this is /lmg/. please post screenshots of using models locally.model tested: mradermacher/Qwen3-VL-32B-Thinking-Q6_K.gguf
>>107056119I assume this is just testing for a big kimi with linear attention
>>107056325>>107056325>>107056325
>>107054141I didn't read the paper so I don't know.My general opinion about new and revolutionary techniques to replace transformers is to assume that they're a meme until proven otherwise.
posting here so the retarded captcha timer will let me post on the other thread
>>107055431See you tomorrow, be well.
>>107054436>so they put an OAI guy in charge of mid/post-train, aka distill-from-gpt-ossthere is zero chance they are distilling from gpt-oss, not even meta is that stupid
>>107055577Ehh, fuck it.So... how can you know that a vacuum is a vacuum without recording that it's a vacuum? The secret to unraveling the fabric of reality lies in the answer.Later.
>>107057561>recordingYou mean measuring?Because if so, using the same method you can use to extract energy out of a blackhole without relying on hawking radiation.That's not new.