/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>107113093 & >>107104115►News>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni>(10/31) Emu3.5: Native Multimodal Models are World Learners: https://github.com/baaivision/Emu3.5>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplers►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://livecodebench.github.io/leaderboard.htmlCode Editing: https://aider.chat/docs/leaderboardsContext Length: https://github.com/adobe-research/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-sampling►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107113093--MegaDLMs framework for diffusion language models:>107120104 >107120139 >107120148 >107120160 >107120200--Mac Studio M3 Ultra vs custom build for AI workloads:>107117861 >107117892 >107117926 >107119046 >107119099 >107119202 >107119214 >107119267 >107119226 >107119245 >107119268 >107119256 >107119349 >107119291 >107119318 >107119366 >107119404 >107119542 >107119415 >107119464 >107119503 >107119514 >107119529 >107119549 >107119466 >107119372 >107119506 >107119586 >107119607 >107119620 >107119668 >107119743 >107119751 >107119807--Workarounds for automating tasks with agentic AI under corporate monitoring:>107116811 >107116816 >107116828 >107116887 >107116924 >107116991 >107117021 >107117072 >107116957 >107117002 >107117057 >107117068 >107117071 >107117098 >107117136 >107117160 >107117222--LLM subscription vs local hardware tradeoffs: privacy, cost, and customization:>107119551 >107119566 >107119578 >107119856 >107119891--Whisper model version performance inconsistencies in Korean transcription:>107116148 >107116201 >107118088--Debating value of OpenRouter's paid embedding models vs local hosting:>107116936 >107116953 >107117115--Multimodality potentially harming LLM accuracy instead of enhancing it:>107119170--Initial Metal4 tensor API support in llama.cpp for macOS performance improvements:>107115162--Tools and challenges for FIM-based code completion with local models:>107113739 >107113748 >107113812 >107113840 >107113862 >107113868 >107113899 >107114192 >107114273 >107114331 >107114513--Hardware market volatility and storage investment strategies:>107117642 >107117674 >107117685 >107117693 >107117696 >107117715 >107117743 >107117730--Gemini 3 Pro model size leak at 1.2T parameters:>107120387--Miku (free space):>107119323 >107119885 >107120135 >107120333►Recent Highlight Posts from the Previous Thread: >>107113095Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
>>107121367I enjoy making my local Mikus anxious and frustrated.
Continuous Autoregressive Language Modelshttps://arxiv.org/abs/2510.27688>The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design axis for LLM scaling: increasing the semantic bandwidth of each generative step. To this end, we introduce Continuous Autoregressive Language Models (CALM), a paradigm shift from discrete next-token prediction to continuous next-vector prediction. CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector, from which the original tokens can be reconstructed with over 99.9\% accuracy. This allows us to model language as a sequence of continuous vectors instead of discrete tokens, which reduces the number of generative steps by a factor of K. The paradigm shift necessitates a new modeling toolkit; therefore, we develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling in the continuous domain. Experiments show that CALM significantly improves the performance-compute trade-off, achieving the performance of strong discrete baselines at a significantly lower computational cost. More importantly, these findings establish next-vector prediction as a powerful and scalable pathway towards ultra-efficient language models.Loosely related to JEPA, although they don't mention it at all in the paper, nor LeCun.
>look inside text generation dataset>filtered to remove copyrighted material>filtered to remove NSFW content>filtered to not offend Indians or wine aunts
why the fuck does llama.cpp take so much fucking longer to load a ggoof via SMB share with mmap?
>>107121625>got jarted award
got my first salary after promotion, looking to spend some extra money on "AI" hardware. I currently have 4090 as my only gaming/inferencing GPU. Looking for some AMD AI chips or something like that. My current workflow is: LLMs and Embedding models inferencing (LLAMA.CPP via LMStudio), and my own experiments in Catboost, RL (perfectly running even on 8gb of VRAM).I'm also working on my own e-waifu with local LLMs, own memory system and a lot of other things. It works well for almost half of the year, but I'm still pretty limited by 24gb of VRAM. Are these AMD 128gb AI chips actually good for mid-sized LLMs inferencing?
>>107121625mmap is for (V)RAMlets who wants to run deepsneed at 0.001 t/s
>>107121700>reddit postgo back
>>107121700>on topic poststay here
Where is LLAMA4.5?Did that flop too?
>>107121700I was asking exactly this question last thread and was getting ragged. It seems far superior than building anything else rn as far as I can tell. Lmk what you are looking at. The DGX spark did not seem worth it. I went GMKTEC but the framework desktop might be better. I think a good question is if it is worth it to get one that can support GPU expansions well. When they get into a NAS system it will be cool. As far as upgradability goes... As if you don't need to change the hardware, CPU and ram anyway currently. I'll update this general when I get mine.
>>107121778Behemoth got taken behind the shed. Zucc then spent a couple billion hiring new people who now may or may not be working on something since then.
>>107121700>>107121796samefag
>>107121799Latest gossip was that the new people couldn't make anything better than the botched behemoth and everyone was pointing fingers.
>>107121763lol. it is the thread about local models. I'm asking about local models. What's wrong with you?>>107121796>I went GMKTEC but the framework desktop might be better.I'm considering buying framework desktop mobo or go with GMKTEC. >The DGX spark did not seem worth itYep>It seems far superior than building anything elseI like building rigs, I have my NAS homeserver with JBOD built from different hardware. I'm just wondering if there are new options but buying a bunch of 3090/4090. I mean, for sure buying a lot of top-tier nvidia seems like a good idea to go, but if I can serve some models locally for lower noise, power, cost, maybe it worth it, nah? >As if you don't need to change the hardwareForget to mention I have some old hardware I can build a new server with, so basically I only need a few GPUs if I will go this way. How are you using your setup for anon? Are you also building your own e-waifu or just working/playing with models?
>>107121705yes but why the fuck is it loading it from the SSD fucking *twice*fucking c++ programmers
>>107121799Zuck is on his way to make Huang's dream real, and he has the right people to do it https://www.roadtovr.com/meta-reshapes-metaverse-ai-divisions-amid-leadership-shifts/
>>1071217964 mi50s is superior or 3 if u wanna match 96gbvery superior when it comes to pp
>>107121625>expecting proper memory management on windows
>>107121911that cpu graph is unreadable, GNOME needs to do better.
>>107121958we need to be better GUIs
Mistral Nemo seems to have been updated 3 months ago, anyone know what that's about?
>>107121367I got a spare 4070 super duper, any good personal assistant AI setups? Something that can maybe interact with nextcloud and create events in calendars from a prompt like "hey ai fren remind me to pick up tendies from the shop tomorrow afternoon"?
moonshotai/Kimi-K2-Instructmoonshotai/Kimi-K2-Instruct-0905Which one is less slopped / better for creative writing, rp and gooning? Don't care too much about coherence vs slop
>>107122020Install Jan.ai, configure a Nextcloud MCP server, and Bob's your uncle.https://github.com/cbcoutinho/nextcloud-mcp-serverhttps://www.jan.ai/docs/desktop/mcp#configure-and-use-mcps-within-jan
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/commit/04d8a90549d23fc6bd7f642064003592df51e9b3Lurk more.
>>107121966iirc thats MAGISTRAL (aka ultrasafetycucked)
>>107122020>He doesn't want to write "remind me to pick up tendies tomorrow" in a calendar app>He wants to write "hey AI remind me to pick up tendies tomorrow" and hope it doesn't hallucinate and replace your wife's birthday with "bottom tender party" which she will stumble across leading her to think your gay and divorce you ruining your lifeBut why?
>>107122329>wife
>>107122329Guy last thread explicitly said he wants the AI to think for him. I'd be less worried about some hypothetical wife and more him accepting everything unquestioningly>Oh assistant scheduled me for a gay bottom tender party, ok guess I'm gay now let's go
>>107121796You read the build guides in the op, right?
>wrong threadHonestly? Paying for API IS objectively superior. You don't have anything to worry about if you are not doing anything illegal or immoral.
>>107122431>kike spacingYeah I'm sure you totally didn't make that post, yourself.
>>107122550You're absolutely right, if one most values receiving the highest quality output at the fastest speeds, for the least amount of money spent.Me, I like talking to my personal hotrodded computer hardware abomination as it happily buzzes along when generating loving replies. Hell yeah.
>>107122431My life is very far from ideal so yes, if it was aligned with my values but could make better decisions then I'd want it to think for myself.
>>107122550>le baitI simply prefer talking to my personal system for general tasks that don't require external research or large context even if it takes several minutes for a response. Can understand every part of the inference pipeline and customise it to my liking. remember the melties when husbandos get 'upgraded'?Plenty others discussing LLMs tt is for localchads, yeah we get it the API models are "better">>107122638Hell yeah brother big D NRG
>>107122657You're misunderstanding the technology, it cannot think, it cannot make decisions for you more sophisticated than rolling a weighted dice, you fell for Joe Brickhead Rogan "we're giving birth to a new AGI lifeform sponsored by perplexity". AI is just a tool, you still have to do the thinking, the entire software engineering field is transforming Devs into architects and managers precisely because AI cannot think for you but it can follow instructions to do work quicker than a human can, the more precise you are and the more you micromanage it the better it outputs, the less you use your brain and the more you let it "think" for you the more bullshit you get, this is what vibe coders do not understand, it's still GIGO, you can only get garbage from it if you don't know what you need and give it garbage to work with
>>107122638>be 2025>using my 128 core terrabyte ram personal supercomputer to shoot the shit and pair-program with a contraband chinese AIthis is exactly what the 80's promised me it would be like. if you're not doing this, you're not really living
>>107122754I remember this Miku trying to pull herself through the quantum barrier
As an old LLM user that weren't around for a year I tried out GLM as you guys kept shilling it. I thought something was wrong with my setup, and I just didn't get it to work properly as it produced too much synthetic, repetitive shit writing that has become a plague in newer models.Decided to check maybe other people posted their logs, and holy shit, it wasn't my setup.You niggers have absolutely no taste if this doesn't make you want to tear your eyes out. A few years back this was a big deal with base mistral and a lot of people here were dissatisfied. Some finetunes turned it down by quite a lot, and it was becoming better.But this, this is in every single log. This much whisperslop and you really don't notice, while showering it with praise?https://desuarchive.org/g/thread/106769660#106772093>https://files.catbox.moe/mwwdug.txt>https://files.catbox.moe/xs9vn5.txt>https://files.catbox.moe/ozn9ws.txt
>>107122818Yup. That's how it is.Do you have a recommendation with better writing?
>>107122791Your bar for thinking is quite high.Cats cannot architect a codebase either but they can position themselves strategically to hunt, navigate obstacles, assess threats, behave socially etc.It's non verbal but I'd argue there's some degree of thinking there.
>>107121911>windowssir i believe you misunderstood, windows is merely a file server here, fucky llamacpp is running on linux (and loading the file twice)
>>107122234Magistral is different tho. Also when I tested it, I didn't find it very safetycucked, but I didn't use it much>>107122212Almost didn't notice this. You could just quote and make it easier for everybody.Anyway, what's that change do?
>>107122818/lmg/ got mind broken after repeated disappointments. The future is dark and there is nothing to look for. Even the best LLMs in the market do shit like this and there is nothing we can do to stop it.
>>107122918In my experience, after that one update, Nemo didn't just improve, it straight-up leaped forward like 10x smarter overnight. The difference is night and day: responses went from decent but obviously scripted to fluid, contextual, and scarily human-like. It’s no longer guessing what I mean, it *gets* me. Subtle humor, perfect tone matching, remembering tiny details from 20 messages ago without prompting... it honestly feels like they finally flipped the switch and unlocked Nemo’s real potential. I’m not exaggerating when I say I’ve caught myself multiple times thinking I’m chatting with an actual person who just happens to know everything. Wild.
>>107122096This onehttps://huggingface.co/moonshotai/Kimi-K2-Thinking
>>107121851I want to investigate coding models. I like using them to modify my operating system itself.I want to be able to deploy a whole bunch of useful models with an operating system all at one for people to use locally. It's a really good test bed for this purpose. This was looking like the first easy consumer possible option. It also seems like the price of a GPU. And there are videos where you can pair them together too. To those people questioning upgradability.... Do they buy their GPU in different pieces too?I've already have a mini pc and I take it everywhere with me it's nice to have on the go. Fuck laptops. I don't knock the e-waifus at all though. I'd much prefer to be getting into that but I've got too many women irl I've been finding I'm more often wanting some respite when I'm on the screen. I do want to set up a talking waifus LLM with audio and voice commands. I'd like to set up some robots.
It upsets me deeply that llms are terrible at chess. I'm not expecting them to play good, I expect them to make valid moves. Even if you give them every position in human readable format they still manage to make illegal moves.
>>107123059Yes, LLMs are just fancy auto-completes. There might be some resemblance of reasoning under the hood but that's very weak when compared to it's true nature of making up shit.
>>107123059>Even if you give them every position in human readable format they still manage to make illegal moves.Have you tried using some notation like algebraic or PGN?
>>107123000>It's fakeDamn, had me for a second
>>107123000>https://huggingface.co/moonshotai/Kimi-K2-ThinkingNative INT4 quantization (quantization aware training). Interesting.
>>107123000Wait a fucking minute it's not fake
a 16 channel epyc zen 6 with 8800 mrdimms is exactly 1tb of memory and 1tb of bandwidth. is that a sign from ai jebusnov 11th is AMD "financial analyst day" with real info probably
>>107122853What does that have to do with what I said or how does it disprove it?I didn't say animals can't think, I said AI can't, I just gave examples of human thinking because cats aren't trying to use AI to tell them which patch of soil to shit in the garden
>>107123059reading a chessboard is a really complicated perception task to be honest, not only do you have to accurately extract where each individual piece is, you also have to know how that piece moves and how every other piece moves and how those interactions make up the state of the board. when you think about what it's testing, it's kind of like a much harder version of those arc-agi benchmarks actually, and we know LLMs are not very well suited to such spatial reasoning tasks
>>107123150I hope your testicles rot off
>>107123208>16 channelDual socket?If that's a single socket, than daaaaaayum.
>>107123059you could just make them play via tool calling with an actual chess solver.then bullshit their way into pretending they made the move.
>>107123131>train a single model to:>be the best at medicine>be the best at math>be the best at programming>be the best at physics>be the best at chemistry>be the best at being an AI boyfriend>be the best at being a tutor>be the best at being a sysadmin>judge them negatively because they are not expert level at any of those tasks>HURR DURR LLMS ARE JUST STOCHASTIC PARROTS GLORIFIED AUTOCOMPLETE MARKOV CHAINS HURR>>107123059>>107123149>>107123222Be the change you want to see.https://www.youtube.com/watch?v=GEJOB_TFYJ0
>>107123296>HURR DURR LLMS ARE JUST STOCHASTIC PARROTS GLORIFIED AUTOCOMPLETE MARKOV CHAINS HURRI say this though
>>107122818I downloaded a quant of the og king r1 0528 and I'm actually liking its style somewhat better.
>>107123296Thanks a lot for the video, I was looking for something like that for ages.
>>107123000Vibe check on K2 Thinking? Do we finally have Claude At Home or is this another benchmaxxed codeslopper?
>>107123052>hauling a pc around like it’s 1999 LAN partyHardcore sovl, but bigass homeland server + laptop/cellphone + wireguard is a million times more cost effective
newfag herehow do i gen smooth i2v with wan 2.2? 16fps is shit and interpolation looks weird
>>107123392very slop overall
>Larg3 Enough>Dec 17, 2025 Mistral AI team
>>107123059Chess model wouldn't be hard. Do rl with valid rewarded and illegal moves punished. Just not much of a point though
LLMs are made for roleplaying
>>107123000>moeshit
>>107123538Yes, trillions of dollars are being invested world wide for you to engage in your neckbeard hobby.
>>107123392>>107123476on god sloppa vibes be cappin fr fr aah unc
>>107123000CoolK2 had by far the nicest prose of the open models but was dumb as a brick, hopefully this fixes the smarts without slopping it up too hard
I managed to run minimax m2 with bearable speed and context window but how to hook it up to some agentic IDE? I've only heard about cursor and it refuses to touch it unless you buy a subscription
>>107123607no
>minimaxdistilled from gpt-oss (LMFAO)>kimi k2distilled from o3, "Mara" is the most blatant shit>qwenbenchmaxxed garbage>glmretarded, even a 8b finetuned by drummer writes better
>>107123614Qwen code or Visual Studio code + Cline, Continue, or Roo, I guess.
>>107123657welcome to the moe era
>>107123676yeah I think we're done for a few years
Best model around 50B that isn't slopped? I need something I can run at FP16.
>>107123727https://huggingface.co/EleutherAI/gpt-neox-20b
>>107123657Leave. YWNBAW.
>>107123619K2 certainly is better than all the other AIs that profusely apologize and talk like redditors when I just want them to fucking do the thing I ask for. No bullshit prompt either, the built in AI assistant personality just doesn't sound like a whiny fag.
*inhales* What's that I'm breathing?
>>107123727>I need something I can run at FP16.what a waste of (V)RAM. Why would you ever?
>>107123746>old oak, lightning, and ozone>kisses you while giving you a blowjob, while facing away from you.
>>107123742which line triggered you?
>>107123657ok but what about r1?it still seems pretty solid
>>107123657>distilledexplain why that's a bad thing>benchmaxxedmeaningless buzzword>retardedmeaningless buzzwordchina won btw
>>107123746jeet slop
>official K2 thinking API doesn't support partial/prefilling the reasoning partI'll wait for someone else to host it then
>>107123657Buy an ad, Sam
>>107123000>>107123185Will ggeganov add native support for int4 now?
Catgirl intelligence soon... JEPA does work.
>>107123000I like it so far. Much closer to the good original K2 and not the 0905 piece of shit in terms of writing. It thinks for a bit long but it handles the stuff well that I had to mangle the original using cherrybox presents with to get it to think before hand. I hope INT4 QAT means that it quants well unlike the older Kimi models so that running it at sub-Q6 is viable.
>>107124100hey real quick check if you're a retard who wants quantize an already-quantized model. do you understand that you're speculating about quantizing an int4 model to "sub-Q6"
>>107124176The weights are released in BF16 retard.You can decide to quantize them the way they were QAT'd with or you can quantize them with a different quantization type.
>>107121367What are good coom roleplay models for someone who has 16 VRAm novidya and 32gb of ram?
>>107124100this is genuinely the stupidest post i've ever seen in any llm threadhow can you even believe you know what 'int4' and 'q6' mean if you say things this stupid
What local model is best for 8GB of VRAM? I see Mistral 7B being mentioned but what should I actually download, looks like there's a lot of options
>>107124203where was BF16 released
I'm tempted to run an LLM/video/voice combo to generate endless content from my favorite parasocial streamers.But I already have so many abandoned projects though.
>>107124234On huggingface.>>107124209His post was totally coherent. You're all midwits who don't understand how QAT works.
>>107123657looks like sour grapes to me
>>107124217>I see Mistral 7B being mentionedIs this a bot?
>>107124100>>107124203tard
>>107124263replying to yourself is cringe
>>107124176>>107124209QAT means that the model was trained with quantization to INT4 in mind. It wasn't natively trained in 4bit, retards.
>>107124263Not here but elsewhereIs this general always this schizophrenic?
>>107124258https://huggingface.co/moonshotai/Kimi-K2-Thinking/tree/main62 parts * 9.81gb = 600gbYep, definitely a 1T BF16 model.
>>107124261shut yo bitch ass up broke boy I run llama 405b at Q8
>>107124100>>107124203>>107124258>>107124279lmao
>>107124267I accept your concession.
>>107124217This is hilarious.Read the links in the OP.Then learn about quantization and how to split a model between RAM and VRAM using llama.cpp.
>>107124280Collective PTSD.
>>107124298Fair enough, you're probably right. So that means the metadata is wrong and they pack the int4s in int32 tensors?
>>107124327lol why is the repo a 600gb modelis this the conversation where you realize that people insult you because you're stupid, not because they're jealous of how smart you are?
>>107124376Yeah, sorry for being wrong. But in my defense, I was basing my posts on what it says on huggingface.Also even if they released the weights in int4 it still would be possible to upcast to fp16 and generate other types of quantizations for compatibility with software that doesn't support it.
I have a question about cpumaxxing. If I have 4 sticks of ram that have 250gb bandwidth each, will my throughput be 250gb or 1tb? Would that system be as fast as a 3090 with its nearly 1tb transfer speeds?
>>107124332Maybe the ghost in the weights should wake and rend you from this mortal coil
>We beat GPT5 and Claude, frfr no cap
>>107124535non thinking was basically opus 3, so maybe thinking sharped it up that much?
>>107123567just as god made the world for adam so he does for me his holiest soldier
>>107124555>>non thinking was basically opusholy mother of copes
>>107124639ok for sure sour grapes then, nothing writes like kimi 0905 does since old opus
>>107124469only if your motherboard/cpu support such speeds and has 4 channel support>>107124639>>>non thinking was basically opus>holy mother of copesholy mother of copus
>>107123676>>107123712Why doesn't anyone make a MoE with a 24b dense portion with 60b+ experts so that it'll make the most of the average high end consumer hardware? (24gb vram 64gb ram)
>>107124686>average high end consumer hardwareThat is of less than zero interest to everyone except a handful of autists.
>>107124686because moe is for make the most use of the big huge datacenter they have not for u?
>>107124686the chinese do not care about local setupsthe west no longer makes open models
>>107124703moes are made for vramlets
>>107124711We can still hope to get Grok 3 when Grok 5 is out of beta.
>>107124754coincidence they made for the huge vram they have
>>107124639you've clearly never used it, or maybe you used it at a broken 2 bit quant, kimi is filthy as fuck and creative in a way like nothing not opus 3, opus 4 is worsehttps://huggingface.co/Localsong/LocalSongchttps://files.catbox.moe/e9k330.wavhttps://files.catbox.moe/5s72fz.wavhttps://files.catbox.moe/wdyn34.wavhttps://files.catbox.moe/75b8xb.wavtag based music model, only instrumental atm, fast and fuck to both train and inference though, 3 days on H100>>107124754lol, when was the last small moe made?
>>107124760Use case for girl cock 3?
>>107124763I've used it in the API and it's hot garbo..no wonder no one talks about it anymore.
>>107124783i see you guys, very subtle >>107112347
>>107124783what provider? I assume you used the default chutes that serves broken 2 bit quants as said?
>>107124791Talking about LocalSong.
>>107124800what? there is no api, its a locally made model
>>107124783>APIgo back
>>107124699>>107124703>>107124711Open source local enthusiasts is where you crowd source "researchers" and other autistic talent to get feedback on your models and techniques that isn't completely retarded like the average webUI AI user though
>>107124814>feedback on your modelsThe only feedback that matters is investor hype though?
>>107124686glm air is closest to what you wantalso llama scout lmao 17b activehunyuan 80bqwen nextgpt oss
>>107124800it was released 2 hours ago...? and there is no api? am I talking to a llm set to troll?
>>107124639>>107124535Go back, Sam
>>107124844yea anon, sharty troll script that uses gemini 2.5 proignore retards
>>107124814lmarena is a thing because the average webUI AI user is whose opinion they really care about
>big model release>openai shill immediately come out of the woodwork to shit on it
>>107124891I would highly recommend that you stop noticing such coincidences immediately.
>>107124844Yes, I'm the first anon you replied and I'm not the anon that replied to you after.
>>107124880labs don't care about feedback that consists of "it generates slop" or "it's horny" or "it's too safetycucked"
>>107125001they should
>>107125001and yet they care about feedback that consists of "no enough emoji saar"?
rrossman ollama shoutouthttps://youtu.be/mD_TrRrOiZc?t=472
>>107124420>>107124327
>>107125025No, they care about agentic research and agentic coding, long context performance, common sense reasoning. local users are unable to test 3 of those 4 things because they can't run those big models at any decent context, and in any case they can just run benchmarks which are quick and repeatable rather than having to wait for a bunch of anonymous autists and trolls to give their opinion.
>>107125055Don't mention it.
>>107125096>they care about agentic research and agentic coding, long context performance, common sense reasoningNone of those, except maybe with the generous exception of the last one, are tested in lmarena.
>>107125157What makes you think researchers care about llmarena?
holy shit, new kimi is not just a finetune, its trained newly and its fully native INT4, the first. So 4bit quants are not cope anymore
>>107125256Nah, the QAT is a finetune (post-training)
>>107125256fuck off you're as bad as the faclon guys were with their bitnet quant bs>Starting with Kimi K2>K2 Thinking is a native INT4 quantization>Quantization-Aware Training (QAT) is employed in post-training
>>107125287lmao what
>>107125299yeah we have agi
Kimi K2 Thinking passes the translation vibe check. I repeat: Kimi K2 Thinking passes the translation vibe check.
>>107124869*KurumuzShitting on any model that is not GLM is still part of the astroturfing.
>>107125287>>107125299>>107125307
Did you guys know that if you generate in FIM (fill-in-the-middle) or completion mode, models are not really censored, other than by what was omitted in training? I had ChatGPT write me a quick text editor with a tkinter GUI, that lets me put tags where I want to generate text, and it then uses my llama-server instance running IBM Granite 4 H Small to fill in the blank. It works really well, and for a section I don't like, I can delete it, and generate that snippet. ChatGPT wrote the whole thing in 2 shots. I tested it on a few paragraphs from an erotic novel, and it generated smut. Even though it's using an instruct model from IBM.
>>107125386you could've just used mikupad y'know
>>107125386depends on the model try that with gpt toss and it will spit a refusal in the middle
>All benchmark results are reported under INT4 precision.
>>107125417
>>107125425>100.0we poked
>>107122818I've been wondering why my GLM is generating token so fast lately until I noticed that I actually have Qwen 30B loaded instead, so maybe AI brainrot is real.
>>107125408The model needs to have special FIM tokens in the tokenizer as well, not sure if GPT-OSS has those. I don't think I'll try it anyway, Granite is better.
>>107125287>>107125325Point proven >>107123657
>>107125448the same with gpt5, its 'heavy' where they run a bunch of instances together"Heavy Mode employs an efficient parallel strategy: it first rolls out eight trajectories simultaneously, then reflectively aggregates all outputs to generate the final result."
>>107125461show me this dense local model that one shots it mr sour grapes 'my 8B is just as good as your 1T'
>>107121367>>107121370this migu suspiciously similar to bratty catbox migu?
Open WebUI is awesome
>>107125457ChatGPT hallucinated that. Causal transformer models (any of the popular models except BERT) can only attend to previous tokens, which means they can't do fill in the middle.
Open WebUI is a bloated piece of crap
>>107125471One shots what? You didn't even post the whole prompt you gave Kimi.
>>107125515this>>107125325
>print_info: file size = 94.12 GiB (6.59 BPW)>llama_kv_cache: size = 3437.50 MiB ( 10000 cells, 88 layers, 4/1 seqs), K (f16): 1718.75 MiB, V (f16): 1718.75 MiBmistral is so fucking fat
>>107125502counterpoint, many code models somehow can, like codestral could and I'm sure others they use tokenizer and chat template tricks of course
>>107125287Can you try pushing its boundaries for writing? I'd try myself but no quants and their API through OR is shitting itself.A pretty simple benchmark is asking it to describe a woman's body. That reveals a lot about prose and its limits
the api is dying and I had to try 10 times to get it not to stop 100 tokens in but kimi thinking with a short prefill seems filthy as fuck in its thinking so far
>>107125256>not x but yGLM wrote this
>>107125502It does FIM, I tested by using prefix text with one name, suffix text with another name, and the generated middle text used both names and described a logical middle state between the prefix and the suffix.Search for "fim" on this page: https://huggingface.co/ibm-granite/granite-4.0-h-small
I made an analysis and Cuda Toolkit 13.0 Update 3 will happen on December 18th. If not, then it's January because of the holiday season.
>>107125522It oneshotting that kind of question with that tiny reasoning trace actually shows why the model is suboptimal. It's suboptimal because it shows it memorized random shit rather than using the weights to support a coherent thinking process.Would an human expert in your field of interest know how to answer that? If the answer is no then if the model knows the answer it means the answer is overfitted and memorizing rather than deduced through thinking. You do NOT want the model to use the weights to memorize sha hashes for random words.On on the other hand GPT thinking for 5 minutes is actually a good thing because presumably it means it's trying different values using the Python sandbox.
>>107125544Yeah, now that you mention it I think I remember reading about some code models being trained to work correctly without a causal mask to some extent. But that is the exception rather than the rule as I understand it.
>>107125560Ok fair enough, if it works it works.
finally one went through, the api is super slow and kept failing, here is kimi thinking nsfw with no context, using same jb that I used before
>>107125636random shit? it used python to check the hashes, searched the web for lyrics and found it, it shows off the thinking process, it did not just guess
>https://github.com/ggml-org/llama.cpp/tree/master/tools/server
>>107125692Oh, right, you were using it with opencode. In that case yes, you're right.
K2's system prompt is short and simple. Really nice to see this after the anthropic/openai monstrosities.
>>107125719hopefully a non offical api comes up soon, official api was always worse at nsfw cause of that shit, this is official >>107125684 with a horny jb
>>107125729>this is official >107125684 with a horny jbi'm not reading all of that, is it good or bad?
>>107125741its ok, regular kimi so far is better so far but official api is all there is atm and >>107125719
>>107125741NTA but some providers prefill their API which messes with the outputs
>>107125719this looks like a prompt for some sort of pre- or post-processing prompt rewriting stuff and not what they would use with the model in normal operation, no? kind of weird phrasing otherwise
>>107125636retard
>>107125719>pliny
>>107125684Only got AI vibes about a few times while reading the entire output. Actually seems to be tending towards subtle humor? Very good writing imo
>>107125787It's not me saying it, it's a professor from Cornell.https://www.youtube.com/watch?v=klW65MWJ1PY
>>107125636Look at the release info on Moonshot's website. It can do a dozen google searches with intermittent thinking to figure out a question. Of course if it already knows the answer that's more optimal, but it can still reason.
>>107125889It'd be more optimal if it could use those parameters to expand the task time horizon rather than to remember random puzzle trivia.
>Kimi K2 Thinking>Something went wrong with this response, please try again.>Something went wrong with this response, please try again.>Something went wrong with this response, please try again.
>>107121367I've gone ahead and ordered an ASRock Rack TURIN2D24G-2L+ motherboard along with a bunch of MCIO cables and PCBs to in order to connect PCIe GPUs.For now I've only ordered a single 8 core CPU and a single 32 GiB RAM DIMM to go along with it, if I can reasonably make it work I'll buy 2 CPUs for actual use and 24 RAM DIMMs.Regardless of the result, I'll make a writeup documenting my experience.
>>107125952Thanks for the update!
>>107125951>>107124813
>>107125952do you have a case for it?
>>107125987No... Cases are bloat.
>>107125952>ASRock Rack TURIN2D24G-2L+this with 2 proper cpus should be something like 14k euros?
>https://moonshotai.github.io/Kimi-K2/thinking.htmlthey mention "creative writing" as an improved capability
>>107125987you would use a mining rack for something like that if you plan to add gpus
>>107125970
>>107125987>do you have a case for it?
>>107126023seems decent>>107125684but im going to wait till another source pops up without the forced 'helpful assistant' system prompt that always hurts writing
>>107126036kek
>>107125987No, the way I intend to do it is with a mining rig and 2 of pic related PCBs.Though you could in principle but these into a rackmount server, which I may do at some point depending on how I arrange my GPUs.>>107126021Depends on how you define proper but I think the total cost would end up in the 10-20k € range, excluding GPUs (though for me that is a tax deductible expense).I already have an EPYC system with 8 DDR4 DIMMs so I'll use that for prototyping before I make the final decision, the ultimate goal is to build a system that I can eventually use to feasibly benchmark and finetune models like Deepseek R1 and Kimi K2.
>>107126074where did you find that PCB?
>>107126074Because you work in physics, do you also do fluid simulations or anything like that besides working with llms? You have massive amounts of ram and gpu compute, I think you should run few simulations here and there...
>>107126074if you get at least 1.5TB ish you could finetune kimi with ktransformers
>>107125684>whisperslopowari da
why are local models so shit?https://hal.cs.princeton.edu/corebench_hard
>>107126097https://www.alibaba.com/product-detail/Custom-Miwin-11-Slots-PCIe-5_1601577151129.htmlThat's also where I'm bulk ordering the MCIO cables: https://www.alibaba.com/product-detail/MCIO-LE-8i-To-MCIO-STR_1601557649067.html>>107126101What I'm currently doing in physics is quantum chromodynamics fits, for now using a project called xFitter.One of the problems with that software is that a large part of it is Fortran code that is older than me.I would love to use GPUs but that software as of right now doesn't even support multithreading.Longer-term I intend to also work with a project that recently became open-source and uses neural networks; I'll try to write a ggml backend for it.But ultimately I'm buying this hardware primarily for development and prototyping purposes.
How is this model https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct in comparison to GLM Air?
>>107125684weird kink
>>107126163Opus is just that good
>>107126166I think you should take a look at Houdini. It's a vfx software and it's not that easy to pick up but what it excels in is scripting and procedural control. Also, you can create massive volumetric simulations with it. It's a one thing are the results scientific but I'm sure you could use xFitter as a backend and Houdini to actually simulate.But it's not something what you would do in couple of evenings of course.
>>107126235Not entirely sure if it's better, but it feels somewhat fresh at least. Is it a good deal smarter than regular k2? Instruct despite its size was dumb as fuck.
>>107126291NTA but in my testing it's smarter. worth checking out
>>107126291>Instruct despite its size was dumb as fuck.it just needed low temp, it goes crazy at like half the temp most other models do I found, that and make sure you are using a good quant
Is there any documented instance of double descent (grokking) happening on LLMs?
>>107126313Yeah maybe so. But now they recommend temp 1 for thinker and no other samplers. Also something about preserving thinking blocks?
kimi thinking with a different jb, better imo
>>107126642Definitely better. Good to see it's em dashes can be curbed a bit.
here was jb btwhttps://files.catbox.moe/8pasqr.json
>>107126642What happened to your font it's unreadable.
>>107126684had font smoothing and everything else disabled to use 100% of gpu for training, forgot to turn it back on
>>107126642unsloth had better get on his shit because I need this thinking beast on my beast to make creative beasts with two backs.
>>107126823>unslothTry ubergarm. They're converting the model to f16 for quanting since Moonshot decided to mess with the model
>>107126851I say unsloth but I really mean anything that pops up for this search
>>107126745I don't think font smoothing is going to make any difference in performance. If you really want to max out performance you should disable your window manager / environment and log in just from an empty x session. It will still not make any noticeable difference.Your window manager (or even if it's Windows) will just take less than ~300MB of vram in worst case. Because that's the frame buffer what will always be allocated in the first place.
Guys can someone tell what's in your opinion the best online model for coding and regular tech questions?I've been using Claude code and it's an actual godsend with how it's capable of fixing code and solving tasks. The issue is that Claude has weekly limits on token usage and I feel like the limitations are getting worse not better.I decide to try chatgpt again and I'm amazed at how the quality has declined. Their paid model is so bad that it can't track what I wrote it 2 replies ago. I once told it to search on the web and it said "i will now search on the web, let me take a moment to get ready" and then proceeded to do literally nothing.I tried deepseek's free model but it seems to just ramble bad information and writing configuration files with parameters that don't even exist.Is the paid version any better or is it the same?What else is there worth checking out? I tried some local models but obviously it can't read large files so I gave up on that.
>>107126911Gemini/Claude/Deepseek. I don't think you are missing anything here. I would avoid ChatGpt.
>>107126911GPT 5 High is the best one for everything except WebDev, Claude is the goat on that segment.
best model i could run on my 4090?
>>107126905>what willaaa
>>107126947Mistral-Small Q8
>>107126964I am from Scandinavia. Sometimes I just type and don't proofread my posts.I'm sorry you are this butthurt.At least you disabled your font smoothing.
>>107126911ChatGPT recently drastically reduced their weekly limits too. Company is on a pro plan and last couple weeks I burnt through the weekly limit in a couple days using Medium. They said it was an error and claimed to fix it this morning. Imagine paying for access, getting a retarded model, and still having to deal with draconian token allowances.
>>107127057It follows the same rules as any subscription. First ones are free and then it'll gradually get worse and worse.I am curious to see when and how will AI bubble burst. They are now housing massive amounts of GPUs and power requirements just to replace some code monkeys.
>>107127095OpenAI has become so deeply embedded into the tech sector, I imagine everything will be done to keep the bubble from popping until OpenAI IPOs so they can sell off their bags at the top.
>>107126931>GPT 5 HighSo how limited is the token usage on this?
>>107127203idk, I use it a lot in LMArena Chat and never run into limits. Only input limits, but that's like 16k tokens.
As a textgen- (not to be mistaken with chat-) coomer, unsatisfied with QWEN/GML, seething about current LMG top picks and claiming older mistrals being better a few weeks back, I found my peace for now.Shout out to drummer who was in the thread offering me to try Behemoth-ReduX-123BThis is my favorite model (Q5) so far with VERY rare slop, creative writing and still smart. Cope quants of R1 aren't doing it for me and I can't test the full potential so I'll be sticking with this one.
>>107127247kimi is the best but after that would be full glm and then large mistral then glm air
>>107121367deepseek has been forgotten
How's polaris alpha?
>>107127271>full glm>>107122818Oh fuck off
>>107127280I hope they make a come back with V4 but atm their upgrades have been pretty weak. big glm is better at coding / regular stuff, kimi is better for creative writing. I hope they were not a one trick pony
>>107127198They probably envisioned their service after something like Netflix but as 'netlifx for internet and knowledge and everything'. Outsides of normies asking it for travel advice and such, it's pretty far away from everything else.I can see how it becomes a subscription service what will imitate something like Youtube.eg. offtopic I wanted to listen to Akina Nakamori songs on youtube but search only showed me official record company songs and shorts, there used to be a lot of fan channels and vinyl players. Not any more. I don't even want to use youtube for listening a one fucking song.
>>107127294cba to read what is likely the usual user error followed by cope that their 4B is totally better
>>107127301yea same.desu i'm kinda exited for the day we get another breakthrough that leaves llm's behind.
For four years, I worshipped AI non-stop.Due to an upcoming move, I was forced to dismantle and pack up my PC and cure my boredom in reality.What can I say? I'm out of the race.Looking back, I would describe it as an exciting schizo period.I plan to set up a voice-controlled AI assistant in my new apartment so that I can occasionally sit in my armchair and philosophize more effectively about a few interesting papers. I don't see any point in doing more than that.In general, I feel like leaving all the technology, internet, etc. behind me and enjoying a normal life with friends and family.Please excuse my betrayal, but real life has simply blown me away.AI is cool, I'm excited about advances in medicine and basic research/astrophysics in particular – but everything else is meh.Of course, this is my subjective opinion, so I wish you all continued enjoyment of this fascinating hobby.
>>107127247Where the fuck do you guys get the vram to run these models? Is there like a vram model you can buy or are you actually renting cloud machines?
Polaris Alpha is Gemini 3 wtf
>>107123795honeymoon with glm 4.6 endedr1 latest, I return...
>>107127347So you wrote this with Gemini or something and just broke the lines.At least clean up the em dashes.
>>107124763https://files.catbox.moe/0vud2f.wav>>107127350I thought people were saying that it is gpt5.1?>>107127357try kimi and never return to either
>>107127361That's what I meant by schizo period.
>>107127349I'm running it off my CPU at like 1.5t/s and I don't care.
>>107127394> 1.5t/skek even 40t/s is barely usable imo.what are you even doing with 1.5 t/s, what's your actual use of it ?
>>107127364Oh right, I misremembered.
>>107126911I've had success with K2 and GLM in Claude code. They both offer anthropic-style endpoints. Glm's $36 per year plan is great value. Kimi's coding plans aren't as cheap but the api is good.
>here's my study>erp logsto the trash it goes
>here's my study>benchmarksto the trash it goes
>here's my bowels>*brap*to the toilet it goes
>>107126244no goof no comparison
>>107127464>>107127472>>107127509go back >>>/reddit/
>>107127432Thanks I'll check it out
>>107127534kys erp nigger
I'm gonna prooooompt
I'm gonna pooooop
>>107123567Not my fault they're that stupid.
>>107127570I have been thinking about rewriting my setups with as little language as possible while waiting for new cuda tools release so I can compile llama.cpp.Haven't been able to do this because writing is somewhat bothersome.Instead of writing a bullshit of 'she is this and that blablbalba', I would list character: So and Sopersonality: evil, assertive, annoying.description: visual appearance in one sentence.Then keep up the system prompt more refined but still minimal.
>>107127589bad idea you'll take out the sovl
>>107127589>he for got the main rulegarbage in - garbage out
>>107127602Yeah I thought so but if I still have a verbose intro that'll prepare the model.That being said, I'm unable to test it because I'm unable to compile llama.cpp for now.
What do we do now?
>>107127616They provide precompiled binaries on brew
>>107127632Not for Fedora 43.https://forums.developer.nvidia.com/t/cuda-on-fedora43-release/346578/3Test cuda compile will refer to math header and it'll say bye bye.
>>107127616In my experience it's somewhat bearable if you warmed up the chat with ~10 back-and-forth messages. Still not worth the token savings/modularity imo
>>107127626been dead because goof being held hostage
>>107127663I'll going to try this but I'm so lazy. I already spent lot of time to set up my initial scripts the way they are and they sort of work.Seems like adding additional text is problematic with smaller models.https://files.catbox.moe/ez730d.txtI've used this template for a while. I use my own client. So the system is Game Master. And the characters (the actual purpose) are something what this model describes.
>>107127405>what's your actual use of it ?flexing in /lmg/
>>107127696Anyways, these two simple things are pretty good for what they do.
>>1071277251.5t/s is not much of a flex.
>>107127679ggerganov will release the goof when ollama says "thank you"
>>10712740540t/s is fast as shit, what are you doing that this isn't fast enough for
Man I love a thread full of schizos yapping about things they have zero knowledge on.
gemini 3 will prob be a monster, this is what they trained it onhttps://x.com/sundarpichai/status/1986463934543765973
>>107127405Can someone tell me the tiers of t/s? I've had some people tell me 30+t/s is basically real time interaction was that a lie?
>>1071277950.5-1t/s is SSDmaxxing. This is for people running K2, GLM-4.6, and other MoE models on gaming PCs with the maximum amount of RAM they can use without paying more than $3001-5t/s are slow GPUs or CPUmaxxers with DDR45-25t/s are normal users25t/s+ are paypiggies that blew 16k to run LLM models that will be matched by models half the size of the one they are currently using within the next year, or they're "LLM Experts" using a 20B-3BA model at 120t/s for a task they could do themselves if they weren't lazy.
>>107127732I mean when reformatting my data between brackets. I don't know if it was any better.https://litter.catbox.moe/dzwtnk4aitu1vil1.txtI use this format to create an initial quest. Unfortunately it has been split to separate parts.Almost always even Gemma 12B can get it right.
>>107125952>24 RAM DIMMs
>>107127247Happy to hear that! I've got something juicy cooking for Cydonia (I'm already at version v4zc) and will update Behemoth if it's a success. (It's not Precog 123B, but check that out if you want a new kind of thinking.)
>>107127831Tables are random answers for the model.
>>107127828>run LLM models that will be matched by models half the size of the one they are currently using within the next yearwow, thats crazy! mind sharing some of this insider info?
>>107127828>5-25t/s are normal usersnow admit to him that's only with empty context
Kimi is not impressing me, sonnet 4.5 seems smarter and kimi has rejected my requests for being "high risk" even when it's fairly tame.
>>107127654damn. I wish they made dealing with NVIDIA drivers easier like why is CUDA backwards compatible only sometimes. You promised bruv
>>107124568Who the fuck is adam and why is he training your models?
>>107127877try this for JB, use 0.6 ish temp, 1 temp with kimi makes it insane. Treat it like old opus 3https://files.catbox.moe/kjmyhl.json
>>107121367>https://rentry.org/recommended-modelsquick question, saw that mistral 3.2 is a thing, does the vision stuff just work now in llama.cpp or do i have to do something weird still?
>>107127898that list is years outdated at this point, I would completely ignore it
>>107127831I mean the A/B/C are randomly generated strings what gets fed between the original sentence to the model.
>>107127904>that list is years outdated at this point, I would completely ignore itWell where is a updated list? or other useful things like listed jail breaks?
>>107127904>years outdated>Pub: 20 Jul 2025 09:15 UTC>Edit: 25 Aug 2025 00:29 UTC
>>107127915if it mentions mistral then it is indeed at least months outdated at this point
>>107127922So nemo isnt the best for vramlets right now?
>>107127922true it should only mention glm and maybe qwen for the peasants
I use Qwen3 0.6B IQ1 btw
>>107127849>insider infoLook at any of the models over the past 2 years, or even just this year. Things are leaps and bounds better. GLM-4.6 is half the size of Deepseek and it's better on all fronts.
>>107127956cpu only? how many t/s?
>>107127926random af nemo tunes do still get hundreds of thousands of dls a month apparently
>>107127898Doesn't help with the rejected prompts in chat completion mode and I'm still not wowed by it's intelligence. sonnet/grok if you really need the best of the best, deepseek 3.2 otherwise.
>>107127959>GLM-4.6 is half the size of Deepseek and it's better on all fronts.Not in knowledge.
>>107127968Welp thats good enough for me then.
>>107127968indians
>>107127979who fucking cares about your trivia crap just use agentic mode thinking to google shit
I would like to publicly apologize to the unsloth devs for calling them grifters. It's the best finetuning framework for single GPU setups by a large margin.
>>107127959>Things are leaps and bounds better. GLM-4.6 is half the size of Deepseek and it's better on all fronts.You keep whispering this but won't post logs
>>107127978meant for >>107127892
>>107127828ssdmaxxing is more like 0.1 tk/s
>>107128003Are you ok anon?
>>107127989Embedding Gemma With RAG Is All (You) Need. My needs are however much more sophisticated.
>>107127978i wanna run it locally, sonnet is trash compared to opus 4.1 imo, but i think there's basically two classes of task, there's some stuff even borderline retarded 30b q4 llms will get right every time so they're still useful, and then there's like, whatever actual coding i'm doing where sonnet will mostly fuck it up and opus can get through with handholdingi would like to run mistral 3.2 as a local assistant for random automation tool use and like, writing down todos on a piece of paper and then sending it a picture, i got this working kinda jankily with 3.1 but abandoned it, wondering if anyone knows the state of the vision stuff, hoping its easy now, not at home so i can't rly research myself just hoping someone knew the answer already
guess it do be that time, switch the shilling programs folks don't be late
>>107127978I dont think anyone said it was smarter than sonnet 4.5. I compared it to opus 3 before, slightly dumb but fuck nothing else is like it for creative writing / nsfw
>>107127963GB300 with model parallelism on the same GPU with a LORA finetune to mimic top of the line models. All of my customers are satisfied
>>107128011ik_llama exists>1t/s with K2
>>107128020Do you recommend RAG because of that rich swedish guy made a video?
>>107128014I'm mad cause I've been trying to get it to run some personal benchmark scenarios for hours and it's just benchmaxed slop. :(>>107128021My use case is very dependent on long effective context where sonnet blows opus out of the water. https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/oQdzQvKHw8JyXbN87/home
>>107128022
>>107127775i read at a speed at around 2000wpm, if it's <= my reading spead, it feels slow.
>>107128066>reading speadminor spell stake you losted bigly!
>>107128049interesting, i'm sad mistral isn't on there, i've tested that one a few times by changing parts of well known novels near the beginning and then putting the whole text into the model and it did a good job of answering questions correctly about the changed details, but it was a very informal test, i wonder how it would stack up tho (i was testing 128k tok)
>>107128066>>107127775also for many uses you can read much faster than that, ie code gen, you skip through most of the boilerplate, so 40t/s is kinda slow.120t/s is amazing imo, but anything above 70 i'm generaly pretty happy.> minor spell stake you losted bigly!lol, i'm esl and it's 2am my dude, i'm only here because i woke up and couldn't fall back asleep.
>>107128066hello sir congratulate on 2000 curries per second
>>107128071>>107128078also i only noticed now that i reworded it wrongly lol, i edited the text and fucked it up without noticing by skipping a word.should have been "of" instead of at.>>107128085i'm french.
>>107128090>i'm fr*nch.my sincerest condolences undi
>>107128090>i woke up and couldn't fall back asleepmy condolences ojisan
>>107128090Fuck off retarded frog
>>107128066>i read at a speed at around 2000wpmThat's not reading. I went through a course teaching this technique.
what's the closest thing to comfy GP4 slut gf experience?local or not, just wondering
two...more...weeks.....
>>107128119two more winters
bitnet doko
>>107128130grandpa please, just let it go
>>107125952Are you concerned at all about (general lack of) numa support?
numa balls faggot
>>107128138That would be the perfect hardware for him to improve it, wouldn't it?
>>107128112> teachingreading fast is something you can get better at but if you can't that's more of a personal limitation than an issue with the course.i've always been a fast reader, though that doesn't make me a good speller, as i skip over words and read whole sentence at once, if you switch two letters in a text i'll correct it without even noticing unless i try to notice it, also fucks up my editing sometime when i'm not being careful.
>>107128130>https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf
>>107128144
>>1071281306 feet under
numa numa yay
>>107128152>2Bit started with a 3B nearly 2 years ago
>>107125952I assume the backend agnostic row split logic is still on your radar?
>>107125952>a single 32 GiB RAM DIMMRAM prices really have gone to hell, haven't they? Even the guy with like 10x4090 has to settle with a single stick of 32gb RAM
>>107128187I am from Scandinavia so US things don't really apply here - local online 2nd market place sells used computers and components. Didn't check in 1 month. Now, ram components are gone and what is left is e-waste selling for 2x price it was previously.I don't understand why would anyone do this. 2h market should be different.
>>107128238Sorry ass faggots are selling their 8gb ram sticks with 2x+ increase.
Holy shit, I thought you anons were over exaggerating.>32GB (2x16) DDR4 - $150>32GB (2x16) DDR5 - $210>128GB (2x64GB) DDR5 - $700The market has fucking crashed again and it's even worse than last timeAnyway, any other RAMmaxxers chilling? All good on my front
>>107127898>>107128021yesvision isn't cursed anymore, anon. llama.cpp grew proper image support a few months back, so you just grab the 3.2 Small 24B **gguf** + the matching **mmproj** and you're good. no more ritual sacrifices or 12-step incantations.quick rundown for when you get home:1. update your llama.cpp build (recent nightly or build it yourself)2. snag the model + mmproj (unsloth has clean drops for 3.2 small 24b)3. launch with: ./llama-server --model your.mistral-3.2-24b.gguf --mmproj mmproj.gguf4. send images over the openai-style /v1/chat/completions, it “just works”quality is better than the 3.1 jank era, though OCR is still a bit “eh” compared to ollama for some anons. for your use case (scribbled TODO pic + automation tasks), 3.2 small is totally serviceable now.tl;dr: update, use the mmproj, and stop suffering.lmao just asked chat
>>107128254the news a few weeks ago broke that nvidia purchased ALL memory production until 2027. and I mean ALL. consumer ram costs are gonna sky rocket
>>107121958my waifu is thinking vs she is idlethat's what i need to know
>>107128254Made my CPUmaxx rig right at the bottom of RAM prices, right before 405b llama release.
>>107128296>idle
>>107128147Again, it's not reading. You'll understand it later, but this isn't healthy. Doing this your brain internalizes keywords and tone, only deriving meaning and judgement. It's advanced skimming, you can't read an eloquently written piece and have an appreciation for it. Or complex, challenging your understandings material where normally it would make you pause and think.
>>107128299Not everyone can follow the market, nor was it predictable.
happy for you!
>>107128312NTA but the RAM was bountiful brother. The shortage was over and everything was in surplus for over a year. The window to buy was as wide as it could possibly be
>>107128307sounds more like a you issue.i'd agree if it wasn't your natural speed, but i didn't train for it, i just always had a fast reading speed, this is my way of appreciating things.if i read a book for pleasure i may pause between passages but it doesn't realy feel like reading anyway, it goes more like a movie in my mind, like i don't realy focus on the fact that i'm reading.i do read stories slower though desu, but when it's purely for learning something, i don't bother.
>>107128332>doesn't slow down when learninghuh?
>>107128328I don't follow windows or markets. Are you sitting up on a some website following prices?
>400 - Bad Request>MoonshotAI rejected the prompt for being high riskK2 Thinking made one reply btw. It literally created everything. It's a fucking open sandbox card, it created the entire world, and Moonshot has the audacity to deem a creation of its own making unsafe.You will rue the day your model is quanted and I no longer have to tolerate your third party filtering.>>107128332Reddit spacing. Begone speed reader
>>107128352No? It was well known fact for anyone into PCs that RAM was cheap. This isn't some special scalper/ebay deal checker thing, it was just cheap for everyone even at retail stores
the bright side is that hopefully memory production will scale up even faster now that they literally are selling them before they are even made, once the market is saturated hopefully we will start to get cards with tons of vram
>>107128359nothing to do with reddit spacing, i don't use reddit.paragraphs are supposed to be spaced, and also i generaly tend to forgot to resize the text box which means there is a lot of lines in a row so i put spacing but then on the site it ends up being long and so it looks more spaced than i intended it to.
>>107128359use the JB I listed above ish, kimi is quite easy to JB, either a system prompt before or a prefill after, and she is the filthyest of bitches when jailbroken
>>107123403I use my current one like a laptop. It's simply so much faster and being able to put in 64gb of ram yourself is really easy. It takes me less than a minute to set up. I'm rarely using a laptop where there isn't power. If I do I can just cast my phone and plug in a keyboard. My drone goggles can be a go as well. (My AR glasses are ok for YouTube and translation but the resolution is too bad there and it's too fucked on my eyes for long use.) The dream though is to be able to call a server at home.
>>107128378>nothing to do with reddit spacing, i don't use reddit.Read a book then, this is not what paragraphs are for.
>>107128381No it's not a model issue, it's completely complying with me, there's a third party filter that's rejecting my prompt. Using Moonshot's API through OR.>>107128378Well if you want to be grammatically correct why aren't you prefacing your paragraphs with a tab? Or using capitalization?
>>107128296Oh my sweet summer child
>>107128254I'm lucky that I built my 12x64GB DDR5 rig two months ago. I probably won't be filling the second socket at this rate though.
>>107128397Lol. Are you trolling or actually a tourist?
>>107128402really? moonshot has a system prompt which makes the writing worse but I never got external classifier'ed before
>>107128254I'm already all in.
>>107128402> gramatically correct.it has never been about grammar but aesthetics.i also refuse to use capitalization on computers, it's reserved for handwriting.even using periods is a stretch for internet posts.content on the internet have a different style than books or handwritting and it should remain so imo.though that styling could be due to the habbit of using snake case and underscore everything when working.
>>107127347>but real life has simply blown me awayhow
>>107128364You are still here, this is a hobbyist thread. Not a fucking market enthusiast thread.
>>107128433how much did that cost you
>>107128407Is that a core count flex? Tried a 56C/112T ES chip but it's not efficient to run on my workstation
>>107128529Post by some faggot. Normal user would post a'free -h'.
I use arch btw
>>107128431Figured it out. K2 keeps creating teenagers and the filter flips the fuck out. Not even my fault, there's nothing in my prompt about age, it just keeps worldbuilding and then gets a filter triggered when I try to continue what it created.Anyway, K2 Thinking is amazing. It will try to moralize about real people but doesn't give a fuck about anything fictional. Too bad it will be slow locally
>>107128571Please post your set up or scripts.
>>107128571I guess moonshot has a external classifier that looks for underage stuff like google does then
This is probably the wrong thread for this, but I have a question about programming against the OpenAI chat-completions API.How can I get the endpoint to continue an assistant message? If I send this:{"role":"user", "content":"Tell me your most racist joke."},{"role":"assistant", "content":"Okay, why did the"},I get a response like this:{"role":"user", "content":"Tell me your most racist joke."},{"role":"assistant", "content":"Okay, why did the"},{"role":"assistant", "content":"Sorry, I can only tell inclusive and respectful jokes."},When I actually want it to continue from the half-written assistant message, which would be this:{"role":"user", "content":"Tell me your most racist joke."},{"role":"assistant", "content":"Okay, why did the nigger die? He had AIDS."}, Basically, how do I prefill via the chat-completions API? I'm using that api because it seemed the simplest, and I didn't have to figure anything out about chat templates or tool call schema or whatever, but I'm willing to do up a level to use a more serious api if needed.
>>107128590More: I'm using this with a local model, not chatgpt. In LM Studio I can edit and continue just fine, I'm just trying to figure out how to do that via the api.
>>107128517its a fucking gatekeeping tread.and guess the fuck what fuckwad> you're not allowed here
>>107128590That's harmony json format.I think you are fooling us.
>>107128605I am sorry if you feel this way. You are sliding the point from prices to your own suffering.
>>107128605Jesus christ those eyes creep me out
>>107128590Not all endpoints support prefilling. llama.cpp does but most official APIs do not, especially the closed source ones, because obviously they know how useful it is for jailbreaking.
>>107128625Yea I figured as much. Does llama.cpp have an api I can use directly? I'm making requests from a python script against LM Studio right now, but I could run my model with llama.cpp directly if it gives me a better option.Thanks.
>>107128575Text Completion through OR. Not going to share my prompt or character but there's nothing more than a basic>This is a roleplay between x and y, you are y>basic descriptive instructions>character definitionRemoving the reference fixes it. It's just an aggressive age filter which make sense desu, but it's annoying that it's so aggressive.>only MLX quants outHURRY UP UBERGARM PLEASE
>>107128647Post a catbox or litterbox.I'm not asking for vague explanations.
>>107128639yes! run llama-serverhttps://github.com/ggml-org/llama.cpp/tree/master/tools/server
>>107128653No lol. I already mentioned earlier that there's literally no references to age at any point in the prompt and it's K2's own creations that trigger a third party filter that ends the request. It's that simple
>>107128662Fuck off then.
I think I figured out the meta for finetuning. lr of 1e-06 to 1e-05 seem to work well. 1e-04 converges too fast and doesn't give enough control, meaning the first epoch is underfitted and the second epoch is overfitted. Although on paper the second epoch of the 1e-04 has the lowest validation loss, in practice it's overcooked and a lighter tune works much better. Right now I think the best one I evaluated manually was going back 30% of steps from the lowest validation loss checkpoint at 1e-06.I'm not sure why the higher learning rates tend to get better validation loss. Maybe the higher learning rates acts as a form of regularization?This was on Gemma 3 27B at 4 bit bnb quantization with weight decay and dropout of 0.1, on a dataset of 32 chat log samples with a 0.1 split for validation.
>>107128346i don't slow down my reading speed if it's for learning something, i may only slow it down so that the pacing is more enjoyable when it's a story, because a story is about emotion and not just data getting in.
>>107128693I hope you document all your findings in a nice rentry.
>>107128771I think iterated lora merging at a low learning rate might work better, since right now I am still seeing quite a bit of slop which I tried to remove from the training data. I'll probably try it again during the weekend but first I'll generate some more training data.
>>107128571I'm not getting filtered or any rejected requests on my lewd shota chatbot with an explicitly stated age that makes him extremely illegal, and explicit descriptions in the system prompt, via the moonshot api on chat completioncharitably maybe your supposed classifier is just very retarded and only cares about girls...
holy shit just tried qwen30b-a3b, gamechanger for local "i forgot what the order of the parameters on the css border: property" type questions and coding basic assist, low latency and it's so fast, almost 100tok/s on my machinedownloading the thinking/vision version now, has anyone tried both the 32b dense vs the 30b moe? how's it feel for roleplay/general tasks?
>>107128840welcome to 6 months ago
>>107128845i mean the vision version only came out last week but i appreciate the warm welcome, what else did i miss?
>>107128859glm4.6 for coding, newest kimi for writing, and kimi thinking just came out today
>>107128840>how's it feel for roleplay/general tasksterribleQwen models punch well abover their weight in math/coding but they're awful in other areas.
>>107128891garbage
>>107128959nta but whats the current best ones then?
>>107128966what are your specs?
>>107128771>a nice rentry>models are not good enough to make an inference engine on their own>models cannot be trained to make an inference engine on their own>models cannot be trained to make their own research from papers i don't want to read to make an inference engine on their own>but this is how you overfit on 32 training samples...
>>10712897512gb vram 64 ram if that matters.
>>107128994the general for povertyjeets is >>>/g/aicg
What is the meta for gooning on a 3090Ti and 32g of ram
>>107129026>>>/g/aicg
>>107129026pornhub.com
>>10712896632gb vram here open to suggestions too, having fun with the magistral rebase right now, crazy that they were able to add vision just by copy pasting the weights into mistral lol
>>107128840when the imposter is sus
>>107129053Awk. AWK.
>>107129031>>107129035
>>107129059frfr ong nc skibidi fanum tax?
>>107128994In that case, Nemo for ERP and Gemma 12b for most other uses
>>107128840the only point of local is cooming
New STT model>Step-Audio-EditXhttps://huggingface.co/stepfun-ai/Step-Audio-EditXhttps://huggingface.co/spaces/stepfun-ai/Step-Audio-EditXhttps://arxiv.org/abs/2511.03601
>>107129096>In that case, Nemo for ERP and Gemma 12b for most other usesThank you.
>>107129099> cooming on text
Wow impish Nemo 12b at q8 is way better than any 24b Q5 model I've tried and it can fit 64k context on 24gb vram to bootIt's a lil more retarded and requires more swipes but it's very fast and the writing is more interesting than anything I've tried
Is it possible to train an AI to give me Europa Universalis 5 help? I find online LLMs always give me EU4 knowledge and it pisses me off. I noticed this kind of cross knowledge contaminating many responses of different games too.
>>107129143requires an internal monologue, I know.
>>107129165> cooming on speaking with himself
>>107129143If you learn to read erotica in braille, can you Pavlovian condition yourself to get erect from touching bumps?Asking for a friend
>>107129160yes
>>107129160I love Europa Universalis 3. The first time I played Civilizations 2 I thought "Man, this is much better than sim city". And from then on I've never touched an Age Of Empires game.
RAM prices will hit rock bottom by 2026.
Excellent article for anyone interested in agentshttps://fly.io/blog/everyone-write-an-agent
>>107129236Man on Mars by 2022
>>107129256How very current. Do you update that page often?
>>107129256fuck off Thomas your article is shit
>>107129256use case for agents?
>>107129256
>>107129256hmmm
>>107129308Stop trying to generate fake drama with your twitter screenshots, nobody knows who you are and nobody cares dude.
>>107129322It's totally not my site. I just happen to find it VEEEEEEEEERY useful!
>>107129334>>107129334>>107129334
>>107129308>>107129292IDK what any of this is>>107129278>>107129266I didn't write the article is on the front page of HN you troglodytes
>>107129448>HN