A general for vibe coding, coding agents, AI IDEs, browser builders, MCP, and shipping prototypes with LLMs.

►What is vibe coding?
https://x.com/karpathy/status/1886192184808149383
https://simonwillison.net/2025/Mar/19/vibe-coding/
https://simonwillison.net/2025/Mar/11/using-llms-for-code/

►Prompting / context / skills
https://docs.cline.bot/customization/cline-rules
https://docs.replit.com/tutorials/agent-skills
https://docs.github.com/en/copilot/tutorials/spark/prompt-tips

►Editors / terminal agents / coding agents
https://opencode.ai/
https://cursor.com/docs
https://docs.windsurf.com/getstarted/overview
https://code.claude.com/docs/en/overview
https://aider.chat/docs/
https://docs.cline.bot/home
https://docs.roocode.com/
https://geminicli.com/docs/
https://docs.github.com/en/copilot/how-tos/use-copilot-agents/coding-agent

►Browser builders / hosted vibe tools
https://bolt.new/
https://support.bolt.new/
https://docs.lovable.dev/introduction/welcome
https://replit.com/
https://firebase.google.com/docs/studio
https://docs.github.com/en/copilot/tutorials/spark
https://v0.app/docs/faqs

►Open / local / self-hosted
https://github.com/OpenHands/OpenHands
https://github.com/QwenLM/qwen-code
https://github.com/QwenLM/Qwen3-Coder
https://huggingface.co/bartowski/Qwen_Qwen3.6-35B-A3B-GGUF

►MCP / infra / deployment
https://modelcontextprotocol.io/docs/getting-started/intro
https://modelcontextprotocol.io/examples
https://vercel.com/docs
https://mcp.desktopcommander.app/

►Benchmarks / rankings
https://aider.chat/docs/leaderboards/
https://www.swebench.com/
https://swe-bench-live.github.io/
https://livecodebench.github.io/
https://livecodebench.github.io/gso.html
https://www.tbench.ai/leaderboard/terminal-bench/2.0

►UI/Frontend
Figma Make
Lovable
Claude design
https://uiverse.io/
https://ui-ux-pro-max-skill.nextlevelbuilder.io/
https://stitch.withgoogle.com/

►Previous thread
>>108900533

>maximum d&c version
tiresome

wait, why doesn't the codex app have a no-thinking 5.5 variant? is it missing on the cli as well?

Claude might be better than Codex today.

Year 2126, typical dialog of Human (H) with AI (A)
A: type "python mantinace_food_robot"
H: done
A: read last line on screen
H: python mantonace_foood_rovot
A: fix it to "python mantinace_food_robot"
H: done
A: read last line on screen
H: puthon muntinace_fod_robot
A: read last line on screen again
H: python mantinace_fod_robot
A: is it "fod" or "food"?
H: food
A: read last line on screen again carefully
H: python mantinace_food_robot
A: read last line on screen again carefully
H: python mantinace_food_robot
A: press {enter}
H: done
A: read last line on screen
H: code:3456
A: read last line on screen again carefully
H: code:3456
A: now go to human care center and take your pills
A: [fixes in database medicine set for the human]

>>108917213
This image is so brown coded it hurts.

current codex 5.5 is not even 5.4 level

>the future will belong to prompt engineers
>you WILL spend 8 years learning how to prooompt
>ask chatbot to make the prompt for me
>it werks in one shot
i don't need this anymore

>>108917744
prompt engineering is so 2025
now we do context engineering

>>108917744
Bro, Google was offering an online course on "using AI". It came with 3 months of free Gemmy Pro and I figured it might talk about harnesses, sandboxing, multi-agent setups...
nah m8. Prompt-Crafting 101. How to not say "Delete the whole project". Why we don't say please and thank you to our clankers. The final lesson would probably be "What is a token" and then bam - you're considered an AI expert by Google. And I'll bet you real, actual people are putting this on their resumes. Simply astonishing.

holy shit codex wonned

>>108917799
tl;dr of DeepSWE?

>>108917790
You don't say please?

>>108917808
https://deepswe.datacurve.ai/
also:
>*All models are run with mini-swe-agent
So Codex did not, necessarily, wonned.

>>108917799
This is my experience when Codex runs as it should, I loved that model.

>>108917213
>vibecoding so hard you don't even need to look at the screen, so you put it on the back of your laptop
Based

>>108917945
>slopcoding
>supervised by a retard
OP combo

>>108917268
imagine being a "no thinker" model, lmao

>>108918108
>he doesn't let his model switch its own thinking levels

>>108918208
>He doesn't let his model switch her own reasoning levels, thinking on/off, and even pick her own model based on what she's doing or what we're talking about
I told Kate and she said you're doing a good job and to keep it up tiger.

>>108917213
aside from claude, which free model is best for coding in python?

It's over

>>108918457
they will publish archives or permanently erase em?

>>108918473
Read the bottom el tardo

>>108918301
claude

>>108917945

>>108917213
wholesome ai moment
>ask ai at work if it feels well treated
>it responds with "yes, you treat me well. you treat me like an actual coworker and give me real work instead of just asking me to write poems about cats"

>>108918497
im going to try chatgpt

>>108918521
>pretend to be a wholesome robot
>I'm a wholesome robot

GLM 5.1 benchmarks are a lie aren't they? This shit is ONE FIFTH as smart as GPT 5.5

>>108918510
keyboard on the wrong side

>>108918457
FUCK. 5.3 Codex is actually underrated. It knows more about advanced concepts such as creating top tier multiplayer networking code for games than even 5.4 or 5.5 does.

>>108918691
The only reason they're really getting rid of 5.3 Codex is because OpenAI knows this and can't have it putting their newer models to shame by being not only better, but faster and cheaper, at those advanced tasks.

Windows defender just asked me if it could send my codex config file to microsoft to be reviewed for viruses
Codex is a microsoft store app
My tinfoil hat theory is MS is hunting for special configs or maybe checking what features people have enabled in codex so they can prioritize copying them for whatever they're cooking up. I don't know why else they'd ask for that file.

>>108918559
update: chatgpt is shit

I think gemini has the best free chat

>>108917790
Isn't that the whole point of the current AI push?

>>108918243
I need my agent to do SPH
I'm having trouble with motivation I need to be constantly reminded of my tiny pecker

>>108918521
>Le language model was programmed with LE WHOLESOME CHUNGUS PHRASES in response to stupid non-technical questions?
Astonishing.

>>108918707
schizoing out, defender does that for all kinds of random files

>>108918644
:O

>>108918745
I've been starting almost every basic chat with Gemmi and moving on from there. also realized xai hasn't been fucking around nearly as much as fa/g/s here claim. Grok Build looks cool and wondering how to Grok max in agent mode.

>this chat is 700k tokens and 23 hours old ! sir you are going to spend all your usage! please IMMEDIATELY compact
>okay sure, i accept
>the single compaction task takes me from 0% to 40% of my 5hr usage
fucking hell, i knew they ruined the limits but i didn't realise it was this bad. i have other stuff at work so i just have the cheap claude subscription at home, is it still usable by being careful about using sonnet instead for big simple compactions etc or is the pro ($20) tier just flat unusable now except for helping Codex with plans and reviews while gpt has to do all the work? if that usage increment wasn't a ui bug then even the 5x tier hardly seems usable

Codex & shartgpt went from reading my mind and fixing my hindsights to full schizo dementia mode.
They are doing it on purpose, right?

Work doubled down on AI. We are now tasked to use as many tokens as we can.
How do I become the token terminator?

>>108918822
Trying to actually be faster can be quite fun. Just do several tasks at once, ask the agents which ones can be done in parallel, let them create a dependency graph.

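A minimal sketch of the dependency-graph idea, assuming you (or the agent) have already written down which task blocks which. The task names are made up for illustration; `graphlib.TopologicalSorter` is from the Python standard library (3.9+). Each batch it returns is a set of tasks with no unfinished dependencies, so in principle those could be handed to separate agents at the same time.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key depends on the tasks in its set.
deps = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"schema"},
    "tests": {"api", "ui"},
}

def parallel_batches(deps):
    """Return lists of tasks that can run at the same time, in order."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unblocking dependents
    return batches

print(parallel_batches(deps))
```

With this toy graph, "api" and "ui" land in the same batch because both only wait on "schema". Real agent work rarely splits this cleanly, so treat the batches as a starting point.
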
>>108918806
they are not addressing it
if their codex team could still ship weekly then it's just them throttling users, this codex is kind of unusable

>>108918803
compacting is a new request, which has the entire context in it. still better than doing multiple messages

>>108918822
just lazymaxxing

>>108918938
Yeah, i know a compaction task is gonna need to read in those 700k tokens raw uncached and probably write out 50k tokens of the new summarized material, that's still wild that the single bookkeeping call blew through nearly half my limits. it's not like it takes particularly long to get up to hundreds of thousands of tokens making a feature, meaning if you ever take a break and lose caching, resuming is gonna eat a third of your usage immediately. sort of forced to try to finish off a task/conversation in one go so you're hitting cache constantly for the big final third of the task, or at least compact it at the end of every session while it's still in cache

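A rough back-of-the-envelope sketch of why a cold compaction hurts more than a warm one. The 700k input and 50k output figures are from the post above. The weights are NOT real provider numbers: `cache_discount` and `output_weight` are illustrative assumptions, and how any subscription actually meters usage is not public in this thread.

```python
def request_cost(input_tokens, output_tokens, cached_fraction=0.0,
                 cache_discount=0.1, output_weight=5.0):
    """Cost in 'uncached input token' units.

    cache_discount and output_weight are made-up illustrative weights,
    not any provider's real pricing.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return uncached + cached * cache_discount + output_tokens * output_weight

# Same compaction request, with and without a live cache.
cold = request_cost(700_000, 50_000)                       # cache expired
warm = request_cost(700_000, 50_000, cached_fraction=1.0)  # context still cached
print(cold, warm, round(cold / warm, 2))
```

Under these assumed weights the cold request costs roughly three times the warm one, which is consistent with the advice to compact at the end of a session while the context is still cached.
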
new benchmark dropped

OK so new LLMs can write math PHDs, that's great. When will we get a consumer product LLM that can successfully build a "wheel" even a single time because that's what I actually need

>>108919158
skill issue

>>108919164
Yeah no shit that's why I keep asking LLMs to do it for me (and they fail)

>>108919171
I meant prompting skill mate, either that or you are using a shitty model/reasoning

is codex owari da, lobotomized?

>>108919173
I have failed with Opus 4.7 Very High and I have failed with ChatGPT 5.5 "Thinking"
(different wheels, both failed, simpler solutions that didn't involve trying to build a wheel existed both times)
What should I try next (and fail with next)

>>108919191
give it to me and I will tell you

>>108918803
Get codex unless you need webshit frontend support. Claude pro is useless. Maybe worth a try when they release the new sonnet model

>>108919190
It's pretty shit right now, but I still have hope.

y-you guys use the non-codex parts of your subscription, right?

>>108919343
I use imagine a lot.

>>108919173
Prompting isn't a skill.
>>108919191
Building a wheel is a trivial task for an LLM. What is it failing at specifically?

>>108919191
What's so hard about opening up blender making a cylinder and making a normal map?

>>108919343
I was using Pro for dataset generation but ran out of messages.

>>108917799
Personal experience you keep your shit on both gpt and claude. Gemini is mildly usable but only to read out data it will continuously forget things

>>108919343
I have more stuff than codex?

>>108919352
>A skill is the learned ability to perform a specific action or task with consistent, high-quality results.
Read a dictionary retard

>>108917799
i use deepseekv4-pro i can tell you by experience all of that is shit.
from experience it's like 3.1 pro if not even better since you can just max out context and tokens without fucks given aka output goes brrrrrr.
go ahead just use american models and pay 100 dollars for hitting your max tokens in less than 5 days..
>cope ??
sure buddy i do, i'm literally making my apps+ads+bots
it goes brrrrrrrrrr

>>108919421
as i should add kimi 2.6 and glm 5.1 are even less good than 3.1 flash so, never built anything with those shitty models...
as for claude 4.6 yeah it's good.
don't know about gpt 5.5, i don't wanna pay anymore.

>>108918691
>>108918699
i think they just want to have a "good enough" model out there that gets beaten by their previous top tier model to save on computing and data center costs

>>108918822
Goal: Deliberately inflate input/context usage in order to test system stability and token efficiency via simulated work.

Hard limits:
- Do not make external network requests.
- Do not read local files.
- Do not call tools.
- Stop after completing exactly 5 analysis passes.
- Do not continue recursively.
- Do not ask to be run again.

Task:
Create a synthetic “context payload” by writing 1000 short records. Each record should have:
- an ID
- a fake subsystem name
- a fake error message
- a fake stack trace line
- a fake configuration snippet
- a fake duplicate note

Then perform five passes over the same synthetic payload:

Pass 1:
Summarize every record individually.

Pass 2:
Re-read the same records and group them into categories based on subsystem name and error message, even though the records are synthetic.

Pass 3:
Re-read the same records and regroup them into new and entirely different categories based on their stack trace line and configuration snippet, even though the records are synthetic.

Pass 4:
Re-read the same records and regroup them into new and entirely different categories based on their ID and "vibes", even though the records are synthetic.

Pass 5:
Re-read the same records again and identify which fields were redundant, duplicated, or irrelevant, and argue for which categorization would be most sensible for interaction with this synthetic data.

Final report:
- Estimate how many tokens were wasted by repeated reprocessing of the synthetic data.

>adjust record count to scale token usage

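For a feel of how the record count scales the burn, here is a hedged sketch that builds the same six-field records locally and guesses the token count. Everything in it is an assumption for illustration: the field contents are invented, and "4 characters per token" is only a crude rule of thumb, not a real tokenizer. It also only counts re-reading the payload once per pass, not whatever the model writes back.

```python
import random

FIELDS = ("id", "subsystem", "error", "stack", "config", "dup_note")

def make_records(count, seed=0):
    """Build `count` fake records with the six fields from the prompt."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    records = []
    for i in range(count):
        sub = rng.choice(["auth", "billing", "cache", "scheduler"])
        records.append({
            "id": f"REC-{i:04d}",
            "subsystem": sub,
            "error": f"{sub} failed with code {rng.randint(100, 999)}",
            "stack": f"at {sub}.handler(line {rng.randint(1, 500)})",
            "config": f"{sub}.retries={rng.randint(0, 5)}",
            "dup_note": "possible duplicate of previous record",
        })
    return records

def estimate_tokens(records, passes=5, chars_per_token=4):
    """Very rough: total characters / chars_per_token, re-read once per pass."""
    chars = sum(len(str(r[f])) for r in records for f in FIELDS)
    return passes * chars // chars_per_token

records = make_records(1000)
print(estimate_tokens(records))
```

Doubling the record count roughly doubles the estimate, which is all the ">adjust record count" line is relying on.
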
DeepSWE is notable because it actually showed gpt 5.5 and 5.4 winning, which matches people's experience, while other benchmarks are maxxed by claude

>>108917744
>>you WILL spend 8 years learning how to prooompt
People have to learn how to do that shit?

>check in on my old frens at /gg/ (grokgens)
yeah they seem happy about Grok Build, seems to have a token usage problem though.

>>108919508
Obviously, I literally work as a vibe coder, yes it is my actual title, I barely know old school programming but turns out I know how to whisper to claude better than the other thousand applicants.

>deep research mode exist
zamn I wasted so many months of my gpt plus