A general for vibe coding, coding agents, AI IDEs, browser builders, MCP, and shipping prototypes with LLMs.

►What is vibe coding?
https://x.com/karpathy/status/1886192184808149383
https://simonwillison.net/2025/Mar/19/vibe-coding/
https://simonwillison.net/2025/Mar/11/using-llms-for-code/

►Prompting / context / skills
https://docs.cline.bot/customization/cline-rules
https://docs.replit.com/tutorials/agent-skills
https://docs.github.com/en/copilot/tutorials/spark/prompt-tips

►Editors / terminal agents / coding agents
https://cursor.com/docs
https://docs.windsurf.com/getstarted/overview
https://code.claude.com/docs/en/overview
https://aider.chat/docs/
https://docs.cline.bot/home
https://docs.roocode.com/
https://geminicli.com/docs/
https://docs.github.com/en/copilot/how-tos/use-copilot-agents/coding-agent

►Browser builders / hosted vibe tools
https://bolt.new/
https://support.bolt.new/
https://docs.lovable.dev/introduction/welcome
https://replit.com/
https://firebase.google.com/docs/studio
https://docs.github.com/en/copilot/tutorials/spark
https://v0.app/docs/faqs

►Open / local / self-hosted
https://github.com/OpenHands/OpenHands
https://github.com/QwenLM/qwen-code
https://github.com/QwenLM/Qwen3-Coder

►MCP / infra / deployment
https://modelcontextprotocol.io/docs/getting-started/intro
https://modelcontextprotocol.io/examples
https://vercel.com/docs

►Benchmarks / rankings
https://aider.chat/docs/leaderboards/
https://www.swebench.com/
https://swe-bench-live.github.io/
https://livecodebench.github.io/
https://livecodebench.github.io/gso.html
https://www.tbench.ai/leaderboard/terminal-bench/2.0
https://openrouter.ai/rankings
https://openrouter.ai/collections/programming

►Previous thread
>>108549329
First for its ogre.
What's the smallest model that still works on openclaw? I'm tired of having it crash and burn every time I try to go off the cloud.
>>108567200
dunno, used a 3gb model on my 12gb vram 3060 and it just shit the bed constantly. pretty sure this software is just bad
Reminder that Saint Altman just gave us a free Codex limits reset
>>108567200
forget about local vibe coding for now, unless you have $100K+ hardware. You need a 760B model to be usable. Buy the fucking tokens for now
>>108567231
Those models can chat, but openclaw is something else. I can run plenty of stuff locally, but when I use openclaw it fails.
>>108567233
no no no
>Hey Opus please make a plan for this feature
>ok [session limit reached]
lmao the $20 plan is pretty much worthless now
>>108567384
Opus is absolutely not usable on the $20 plan, much less at this hour
aaah
>>108567436scam altman is my favorite jew. my second favorite jew is jerry seinfeld
>>108567436
Claude on suicide watch
>>108567384
>>108567400
It takes me a few hours to hit the usage limit, but I only ask it to help me plan algorithms, not do everything. If you ask it to scan your entire codebase to figure out X, you'll use 20% in one go. I use a local LLM for that and then feed the results into claude so it can do things more accurately with less token burn.
>>108567249
Vibe coding works but agentic coding doesn't.
>>108567249
It's more like $2500-5000 hardware. Qwen122B / Minimax2.5 / GLM 5.1 are all solid options. Opus is better, sure, but they can't rug pull the model from you or make you run out of tokens in 8 seconds. Options include any strix halo mini pc or a Mac for 20-40 tokens per second (fast enough for real work). Local models are only going to get better; I haven't given up hope that we won't be stuck begging anthropic / openai for crumbs of tokens forever.
>>108567507
Qwen 122 can do agent stuff fine
>>108567436
It's over for Claude Code
>>108567529
>$2500-5000 hardware
kek, we're talking about agentic coding here, have you tried the models you mentioned on Open Code? Even Kimi 2.5 760B will shit the bed after 100K context, now imagine the models you mentioned.
I upgraded from qwen 4b to 9b and it's so much slower... but hopefully it's smarter. It still fits 100% in my macbook's GPU though, so idk...
>>108567168
>>108566569
>Gemma is less verbose
Can confirm. Ran an @explore command on my vibecoded codebase and it gave a short, concise bullet-pointed explanation of it. Kimi-k2.5's explanations were usually paragraphs worth of text for the same task and responded in a comparable amount of time. Now to test its actual performance.
>>108567568
use case?
>>108567529
The only model of those that runs on a Strix is Qwen 122B, and at a shit PP tk/s.
Do any anons here have experience running rocm in wsl? Currently failing with Ollama, will be trying lemonade next. rocm itself seems to work, since pytorch detects my AMD card in wsl.
>>108567609
Building my own coding agent harness
>>108567638
oof, that's a cursed setup if I've ever heard one
>>108567661
I've started today and I'm just getting to grips with everything.
>>108567638
just use kimi
>happily proompting away
>out of nowhere get a message that I have run out of usage
>I haven't even done that much work, decide to check my usage graph for the day and it's pic related
>come here to complain about it and see >>108567436
This fucking jew. Did they actually reduce usage for everyone not on the $100 plan?
>>108567742
Something is fucked on OpenAI servers, I just had a 1 hour prompt run successfully and I still have 100% left
It's impressive how disproportionate the Claude billing is compared to subscription usage. On the Max tier, you can talk to it all day every day for $100 a month. With paid extra usage, each short message is at least a few dollars. Uh.
>>108567436
Some guy called it the other day! How did he know?
>>108567771
Yeah, it's ridiculous. If it was API only I'd get it, corporate segmentation. But trying to grift people by charging them 10x API prices for what they get on their subscription is just scummy. Must be related to improving the numbers for the upcoming IPO.
>>108567771
They obviously want everybody on a subscription, because like with every subscription it's recurring income even if you don't use it, and they'll probably increase prices once everybody is hooked
>>108567554
Isn't it exactly the same deal as Claude Code though, usage wise?
>>108567791
i had claude suggest i get a dildo it can control so I can program better
>>108567800
proof?
>>108567786
I'd limit myself, but since they gave $100 free this month, I'm trying it for dumb things. Asking it to commit something cost $2. Creating a simple bash script to group a few commands cost $1. And those were quick and easy tasks. I'd be scared to have it do its whole subagents thing where it works autonomously for half an hour.
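Those per-task dollar figures stop looking crazy once you remember an agent re-sends its whole context (system prompt, tool schemas, file contents) on every turn, so input tokens dominate the bill. A rough sketch; the prices and token counts below are made-up placeholders, not Anthropic's actual rates:

```python
# Napkin math for why a "quick" agent task can cost real money.
# Prices are HYPOTHETICAL per-million-token rates, not any vendor's real ones.

def task_cost(turns, input_tokens_per_turn, output_tokens_per_turn,
              usd_per_m_input=5.0, usd_per_m_output=25.0):
    """Rough USD cost of a multi-turn agent task."""
    inp = turns * input_tokens_per_turn     # context is re-sent every turn
    out = turns * output_tokens_per_turn
    return inp / 1e6 * usd_per_m_input + out / 1e6 * usd_per_m_output

# A "simple" commit: ~6 turns, ~50k tokens of context re-sent each turn.
print(round(task_cost(6, 50_000, 1_000), 2))  # → 1.65
```

The dominant term is turns × context size, which is why a short-sounding task with a fat context costs dollars while the visible output is tiny.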
>>108567816
That's why I didn't even enable the thing to take advantage of the free credits. It was very likely I'd accidentally go over and end up getting billed real money.
>>108567567
Yeah, I use it with Qwen 122B IQ4. It's fine. Does it fuck up more than Claude? Sure, but it's not unusable. I get like 19-20tps which is fast enough for real time. 35B-A3B is 40tps but less reliable so I don't use it. Would rather have a correct answer than a fast one.
>>108567624
For me 20tk/s or more is enough. Prompt it and work on another part of the code while it works. I want to own my tools so I'm fine with it being 70-80% as good. I think small models are useless for coding but the big boys definitely aren't. I tried Qwen27B and 35B as well on my dedicated GPU and didn't get decent results out of them for coding, but I wasn't trying with Crush / Pi / Opencode so I'll revisit it.
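For anyone weighing these tps numbers: whether "prompt it and come back later" is livable falls out of simple arithmetic on prefill and decode speed. A sketch with illustrative numbers, not benchmarks of any particular model or machine:

```python
# Wall-clock time for one agent turn is roughly
#   (prompt tokens / prefill speed) + (output tokens / decode speed).
# All numbers here are illustrative assumptions.

def turn_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Approximate seconds for one model turn."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# 30k-token context, 2k-token answer, 300 tok/s prefill, 20 tok/s decode:
secs = turn_seconds(30_000, 2_000, 300, 20)
print(round(secs / 60, 1))  # → 3.3 (minutes)
```

Note the split: at big contexts, prompt processing (PP) can dominate the wait even when the 20 tok/s decode speed sounds fine, which is the point the PP anon is making.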
Anyone here using a paid glm / kimi / minimax model? Ideally something 90% as good as Opus for 10% of the cost.
>>108568019
>I get like 19-20tps which is fast enough for real time
holy kek, that's absolutely unusable for large codebases, maybe this is fine for a static website
>>108568019
That's not what I meant. PP means prompt processing (the same thing as time to first token).
>>108568070
It's not unusable at all for large codebases. The biggest task I've asked it to do was generate bindings for a scripting language by emulating the bindings we already had for another scripting engine. It took an hour (lol) but the bindings were about 95% correct. Would have taken a month easily if I did it by hand. And because it was local I didn't hit my usage limits in 8 seconds. This codebase is about 600k lines and primarily C++.
I know this is /vcg/ but I'm not a vibe coder. My workflow is to prompt it, work on something else in the project manually, and then come back in 5 minutes or so when it's done running the task. Actually unusable is like < 5 tokens per second. What is your usecase that you need an answer from the LLM in 30s instead of 5 minutes?
>>108568157
I see. Anecdotally the time to first token is pretty quick. It only gets terrible when the context hits ~70% or so for me. For the binding thing I just had it summarize all its work and findings into a markdown file and started a new session. I have a 7900xtx and I can run the smaller 27 and 35b models at crazy fast speeds on there, but they just feel so stupid compared to claude, qwen 122b, and the other quantized bigboy models.
Eyeball planets!
>>108568244
>>108568246
put codex to work on helping find out the login flow for a 1998 MMO
>>108568244
Really cool. Also I think you’re intimidating anons because it looks so good and they’re afraid of posting their projects
bruh
>>108568262
Do you get refusals for muh copyright reasons?
>>108567168
Gemma4 t/s (on Apple Silicon) if anyone is interested. As of writing this, most recent gpus still curb-stomp even M5 MAX chips in the memory bandwidth department, so these should be even faster on those. The 26B moe model runs lightning fast on opencode with ollama as the backend. The 31B dense model is obviously slower, but not enough to be utterly unusable, though I haven't tested either's performance at long contexts so I'll have to test that later.
>>108568401
does it do dumb shit whenever it says that? What model is it?
>>108568442
Kimi. Seemed to calm down for a while after getting told.
>>108568432
it's been completely on board with it, probably too niche of a game for it to care anyway lmao
>>108568252
are you completely generating these planets - craters, gas clouds on the gas giants, etc.? how is it done?
I always wanted to make my own space gaymu, this might finally inspire me to get to work (or rather proompt). keep spamming this anon!
>>108568449
I've used that as my main model for a while but I've never seen it say "This is getting complex" for anything I've asked it to do, even a complete refactor of a script. That might be because the shit I ask it to do is relatively simple and I go one step at a time instead of expecting it to shit out quality stuff in one shot.
>>108568496
it may be because I'm running it with context nearly maxxed out
>>108568526
If it's becoming retarded you might want to run a compaction soon.
>>108568435
Can you test Gemma 26B with OpenCode on your machine?
>>108568481
Yes, the planets are completely generated on demand based on each planet's known parameters. I'm not too sure on the current generation method, but it seems to produce expected results most of the time, with some exceptions like LHS 1140 b not being a water/ice world. But since water content is pretty much a coin toss in temperate zones, it's a given that they won't always look how you might expect them to. I'll probably make a lot of changes later on but it's a solid foundation.
Terrestrial surface generation is one thing I wasn't able to proompt exactly to my liking, as there's a lot that goes into terrain generation to make it look natural and AI in its current state can't really take my ideas and turn them into something decent. The implementation in this app is a watered-down version of what I managed to achieve (as it needs to generate the map near-instantly), so it does the job. Either way, I'll make the code public if anyone wants to take what I vibe-slopped and improve on it.
>>108568534
My assistant doesn't use compaction. When I run out I just discard the ~30% at the beginning of the context.
I'm trying to reverse engineer the encoding of an undocumented assembly ISA. That's the level of detail I'm working at.
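A common first move when mapping an undocumented encoding like that is differential analysis: take two instruction words you believe differ in exactly one operand, XOR them, and the set bits mark that operand's field. A minimal sketch; the 32-bit example words are made up, not from any real ISA:

```python
# XOR-diff two instruction encodings to locate an operand's bit field.
# The example encodings are HYPOTHETICAL, invented for illustration.

def field_bits(word_a: int, word_b: int) -> list[int]:
    """Bit positions that differ between two instruction words."""
    diff = word_a ^ word_b
    return [i for i in range(diff.bit_length()) if diff >> i & 1]

# Pretend these are "mov r1, 0" and "mov r2, 0" as dumped from the target:
print(field_bits(0x0A010000, 0x0A020000))  # → [16, 17]
```

Repeating this over many operand pairs (registers, immediates, addressing modes) and intersecting the results is usually enough to carve the word into fields before guessing the opcode space.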
>>108568545
I already did that earlier, though I only had it do relatively simple stuff. See the reply chain here >>108567750
Will test more tomorrow and report my findings, but I need to sleep soon.
>>108568576
Neat! I've been wondering how these models along with agent harnesses perform when working with low-level languages. It's my unconfirmed assumption that they are mostly trained to do well with Python, C#, and other popular well-known languages. Assembly is still relatively well known so I guess that might be incorporated to a decent degree in training too, but idk if they're as good at low-level languages as they are with high-level, well-known ones.
>>108568545
Also make sure your opencode install is updated. There are many reports of people having issues with Gemma 4 both with opencode and other front ends, back ends, harnesses, etc. I've had no issues so far, but that's likely because I made sure I updated my install to 1.4.2, which was released 5 hours ago at the time of writing this. So to anyone having issues with it: you might just have to update whatever software you're using.
>>108568626
>>108568602
>>108568545
>How do I upgrade
$ opencode upgrade
Now I must sleep. Goodnight frens
>>108568555
Thanks anon, I'll keep looking out for it in the future. Please include screenshots, it always piques my attention in your posts.
As for generation - so the terrain, is it generated from like texture pieces put together by the generator, or are all the 'features' on the planet done with like a compute shader?
>>108568626
e4b still can't figure out how to use the basic tools in the /init command :c
Am I doing something wrong?
Best openclaw subscription?
>>108568693
>e4b
There's your problem. Those are meant for general purpose tasks on edge devices or shit rigs with low specs (it even runs decently fast on mobile devices). At that parameter count you may as well be asking a toddler to rebuild the Saturn V from scratch. Of course it's going to get confused. You need to be using models that are "smart" enough to even use tool calling, ideally ones that are specifically trained to be good at it like the following:
https://huggingface.co/google/gemma-4-26B-A4B-it
https://huggingface.co/google/gemma-4-31B-it
Their Ollama page spoonfeeds the differences and use cases well:
https://ollama.com/library/gemma4
>>108568753
E2b is 9 GB instead of the 2 GB the name suggests. By E2b they mean as fast as 2b, not as big as 2b.
>>108568753
that's so sad since e4b is very impressive as a regular ChatBot.
>>108568679
It's basically just a dumbed down version of this https://refactored-mountain-adem.pagedrop.io with the generated "visual profile" created from the planetary data fed into it + random variation. This does not have craters though, because Gemini could not make them look good while Claude got it first try. All it does is generate continents from a noise pattern, add finer details from a smaller noise layer, distort it with another noise function, map the colours to elevation and wrap it around a sphere.
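That pipeline (base noise for continents, a finer noise layer for detail, colours mapped to elevation) can be sketched with no dependencies. Everything below is my own guess at a minimal version, not the anon's actual code: the hash, the layer weights, and the sea level threshold are all invented, and the sphere-wrapping step is omitted.

```python
# Minimal noise -> elevation -> colour sketch of the described generator.
import math

def hash_noise(x: int, y: int, seed: int = 0) -> float:
    """Deterministic pseudo-random value in [0, 1) per lattice point."""
    h = (x * 374761393 + y * 668265263 + seed * 2147483647) & 0xFFFFFFFF
    h = (h ^ (h >> 13)) * 1274126177 & 0xFFFFFFFF
    return (h ^ (h >> 16)) / 2**32

def value_noise(fx: float, fy: float, seed: int = 0) -> float:
    """Bilinearly interpolated lattice noise."""
    x0, y0 = math.floor(fx), math.floor(fy)
    tx, ty = fx - x0, fy - y0
    a = hash_noise(x0, y0, seed)
    b = hash_noise(x0 + 1, y0, seed)
    c = hash_noise(x0, y0 + 1, seed)
    d = hash_noise(x0 + 1, y0 + 1, seed)
    top = a + (b - a) * tx
    bot = c + (d - c) * tx
    return top + (bot - top) * ty

def elevation(fx: float, fy: float, seed: int = 0) -> float:
    """Continent-scale noise plus a finer, higher-frequency detail layer."""
    return 0.75 * value_noise(fx, fy, seed) + 0.25 * value_noise(fx * 4, fy * 4, seed + 1)

def colour(e: float, sea_level: float = 0.5) -> str:
    """Map elevation to a terrain class (placeholder palette)."""
    if e < sea_level:
        return "ocean"
    if e < sea_level + 0.15:
        return "lowland"
    return "mountain"

# Sample one row of the 'map':
print([colour(elevation(x * 0.1, 3.7)) for x in range(8)])
```

The distortion pass from the post would be one more noise layer perturbing (fx, fy) before the elevation lookup; erosion and tectonics are exactly the kind of thing that would replace this cheap layering.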
>>108568770
Not quite. Effective denotes effective parameters (I literally showed you the page via a screenshot and linked it.....). It's some fancy new technique they used in training that results in less VRAM being used. It's kinda like MoE but not really; that would be way too simple of an explanation.
>instead of the 2 GB that it usually designates.
That denotes the parameter size. That's almost never referring to file size.
>>108568784
Use the right tools for the right jobs. Based on what I've read and heard, the "effective" models are pretty decent general purpose models for asking one-off questions, but they should not be used for anything complex.
Why would I want it to go into retard mode for planning? This seems backwards. The compute walls are closing in regardless. Get your hard coding challenges done ASAP lads
>>108568811
Someone smarter than me could probably add things like erosion modifiers and tectonic evolution to get more organic land masses, but that would increase the time to generate the map significantly so it would no longer be real-time.
>>108568846
Parameter size is close to size on vram
>>108568846
Depends on what precision you're using. If it's q8_0, a 20b model for example will use a little over 20 GB of RAM. If it's FP16 then it will use twice as much. FP32? Four times as much. q4_k_m will use roughly half the RAM q8_0 would. Etc etc. The quantization formats are useful for doing napkin math to determine whether or not your rig can actually run a model.
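That napkin math, spelled out. The bytes-per-weight figures below are approximate effective rates for common llama.cpp-style quants (quants store per-block scales, so they cost slightly more than their nominal bit width); real files add overhead, and the KV cache sits on top of this:

```python
# Approximate weight-memory footprint per quantization format.
# Rates are rough effective bytes/weight, not exact file sizes.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "q8_0": 1.0625,   # ~8.5 bits/weight (8-bit blocks + scales)
    "q4_k_m": 0.6,    # ~4.8 bits/weight effective
}

def model_gb(params_billions: float, quant: str) -> float:
    """Approximate GB needed just for the weights."""
    return params_billions * BYTES_PER_PARAM[quant]

for q in ("fp16", "q8_0", "q4_k_m"):
    print(q, round(model_gb(20, q), 1))
```

This matches the post's rule of thumb: a 20b model at q8_0 is "a little over 20 GB", FP16 doubles it, and q4_k_m lands around half of q8_0.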