A general for vibe coding, coding agents, AI IDEs, browser builders, and shipping prototypes with LLMs.
## News
(7/01) Fable 5 + Mythos 5 restored globally after US lifted export controls (6/30).
(6/30) Claude Sonnet 5: near-Opus 4.8 quality at $2/$10 intro, 1M ctx, new default for Free/Pro.
(6/30) Meituan LongCat-2.0: 1.6T open coding model (MIT).
(6/26) GPT-5.6 preview: Sol/Terra/Luna, Codex+API to trusted partners; Sol Ultra 91.9% Terminal-Bench 2.1. GA in weeks.
(6/13) GLM-5.2: Z.ai open-weights a 1M-context coding model (MIT).
----
## What “vibe coding” is, and how to do it
https://simonwillison.net/2025/Mar/19/vibe-coding/
https://simonwillison.net/2025/Mar/11/using-llms-for-code/
----
## Frontier models using fully-general tooling: start here if you have $20 or so
https://developers.openai.com/codex/cli
https://claude.com/product/claude-code
## Not worth it for code, but maybe good for other things
https://geminicli.com/docs/
https://x.ai/cli
https://chat.z.ai/
## Open / local / self-hosted
>>>/g/lmg
----
## Prompting / context / skills
https://arps18.github.io/posts/claude-code-mastery/
https://simonwillison.net/guides/agentic-engineering-patterns/using-git-with-coding-agents/
https://github.com/mattpocock/skills (/grilling is a favorite)
## Other editors / terminal agents / coding agents
https://aider.chat/
https://pi.dev/
https://opencode.ai/
https://cursor.com/docs
https://docs.windsurf.com/
https://docs.cline.bot/
https://docs.github.com/en/copilot/how-tos/use-copilot-agents/coding-agent
## UI/Frontend
https://www.figma.com/make/
https://www.anthropic.com/news/claude-design-anthropic-labs
https://uiverse.io/
https://stitch.withgoogle.com/
## In-browser builders / hosted vibe tools
https://bolt.new/
https://replit.com/
https://v0.app/docs
## Benchmarks / rankings
https://www.tbench.ai/leaderboard/terminal-bench/2.0
## What we’ve done
https://vcg.gitgud.site
## Previous thread
>>109186872
Is GPT 5.7 better than Fable?
i'm gay
>>109193771
yes (I'm Sam)
trying to figure out how to use my one-week lifetime window for fable is a bitch.
>>109193866
I was saving it, but I realized my Fable 1-week limit resets tomorrow, so now I am just spamming everything I can think of at it
Have any of you tried Headroom for savings?
https://headroomlabs-ai.github.io/headroom/
>>109193795
are you also Chinese?
>>109193866
stop acting as if this is a once-in-a-lifetime experience. it's going to come back to subscriptions soon, especially after 5.6 comes out and they have competition. models get better every day. just use it. if you have it do something that you feel turned out bad, or you feel you wasted it with a bad prompt, then you learned something and will be able to take better advantage of future models
>>109193894
i did that last night, had it do a bunch of shit as long as it was prior to 8am local
>>109193902
>stop acting as if this is a once in a life time experience. its going to come back to subscriptions soon,
lmao no and even if it does it will be nerfed
plus the government might just rangeban all good models at some point now that we have precedent for it
>>109193898
all of this shit (headroom, caveman, etc.) is just useless voodoo magic cope for poorfags.
just use the models as they were intended.
Pretty sure if Anthropic could save compute like that, their $500k engineers would've already figured that one out
>>109193911
There's no question that using fewer tokens costs less; the question is whether it affects results.
>>109193910
so you honestly believe that anthropic, openai, google, and china will just stop being able to push out innovations? that's the most retarded thing i've heard
>>109193920
nta anon, but we've had this conversation multiple times in these threads: the vast majority of output tokens are reasoning tokens, and they're already in ultra-caveman mode. you likely cannot affect these in any way; you're just making the small fraction of output tokens that you read harder to parse
the other ones that intercept terminal outputs and reformat them are worse
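To put rough numbers on the reasoning-token point: if hidden reasoning makes up most of the output tokens, compressing only the visible reply barely moves the total. A minimal sketch with made-up shares; the 90% reasoning share and the 50% reply compression below are hypothetical, not measured from any model, and it assumes a caveman-style prompt leaves the reasoning length unchanged.

```python
def output_savings(reasoning_share: float, visible_cut: float) -> float:
    """Fraction of total output tokens saved when only the visible reply is compressed.

    reasoning_share: fraction of output tokens spent on hidden reasoning (0..1)
    visible_cut: fraction by which the visible reply shrinks (0..1)
    Assumes reasoning tokens are unaffected by the compression prompt.
    """
    if not (0.0 <= reasoning_share <= 1.0 and 0.0 <= visible_cut <= 1.0):
        raise ValueError("shares must be between 0 and 1")
    visible_share = 1.0 - reasoning_share
    return visible_share * visible_cut


# Hypothetical: 90% of output tokens are reasoning, and the prompt halves the visible reply.
print(f"{output_savings(0.9, 0.5):.0%} of output tokens saved")
```

Under those assumptions, halving the visible reply saves about 0.1 × 0.5 = 5% of output tokens, and input tokens are not touched at all.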
>>109193929
they are in the uber "burn money for marketshare" phase and the IPO is around the corner, after that happens prices will spike hard
it might get better in a few years, maybe
but you also have forgotten the whole government thing
they may just make the entire thing corpos only
>>109193947
We have had this conversation before, but there's just no way to know for sure what the big labs are doing.
>>109193741
No lie, you can actually get about 15 liters of usable biodiesel from an average human body. Transesterification. Fat + methanol = energy.
Yes, I have been thinking about this a lot lately.
>>109193741
Me on the left.
>>109194002
there absolutely is, the reasoning traces leak from time to time
we know gpt is an autistic caveman and so is fable/mythos
>>109193741
So what are you retards vibecoding?
Anyone made decent software?
>>109194046
How many times are you gonna ask this?
>>109193741
>>109194005
Very cool billy gates
Make it happen
>>109194078
>they may just make the entire thing corpos only
not how it works, you can't hit a killswitch like that. it's like how microsoft's most profitable sector is 365 and azure, but they still keep all kinds of consumer-facing services open.
if we were in a vacuum and chinese models didn't exist, maybe
llms aren't inherently unprofitable; they just are when you're doing anything you can to stay ahead, so you brute-force with compute. none of the chinese models are doing this, hence why they're better long-term bets. deepseek in particular is pretty far behind, but it's very cheap and has a completely unique architecture that lets it have huge context
if anything they would shut it off for free users but continue to subsidize subs. they just can't afford to go completely ghost; china would immediately win
>>109194078
>muh chinese models
benchmaxxed slop
>>109194076
>>109194085
What if it decides to do that on its own?
has anyone successfully vibecoded circuits? e.g. SPICE netlists for simulation of analog circuits, KiCad schematics, etc.
>>109194064
No, I'm just curious. If LLMs are so great, show me software you made.
>>109194094
Make it a hard rule that Fable is only allowed to spawn Opus agents. ez pz
>>109194098
They're great, I'm just a retard with bad ideas. I'm having a le blast though.
>>109194084
so why do people pick glm so often in blind tests then?
let me guess, arena ai is bribed by china, right? or do real-world blind web dev tests not count?
>>109194108
nta, but bro, look at the other models in the ranking: sonnet 5 and qwen3.7? wtf is this even supposed to represent?
>>109194098
How many fucking times are you going to do this? Fuck off and get a life.
>>109194119
i don't know what your point is because you're a bit incoherent, but these are blind tests, and this is the site every llm provider looks at as a blind-test source of truth
>>109193911
I just like caveman because even when told to be extremely concise, LLM responses are still too wordy. The cost savings, if any, are a bonus.
>>109194108
>web frontend
>>109194133
my point is some of those models are straight ass; if that's the top 10, they failed to test properly
>>109194174
>WebDev
>>109194179
sorry i was vibe reading
>529 overloaded
it's over
>>109194159
literally all that matters. it's what people see.
back end is easy for any of these models at this point, save for really complex, large codebases (which isn't something you should have as a vibecoder)
but here's the agent test
>>109194174
they're not ass. qwen 3.7 max is genuinely a beast. most ai projects use qwen at some point in the chain because of how efficient it is. sonnet 5 isn't ass, it just doesn't have much improvement over 4.6, but 4.6 isn't ass
and like i said it's literally a blind test, there's no possibility for errors in their testing. people give a prompt, multiple models take it, the users view the results without knowing which model made what, and pick the best one
it's a 100m dollar company and every llm provider looks at it
>>109194189
do they have a methodology for their testing documented somewhere? i have plenty of nits to pick, but no reason to waste your time with them if they have it all written out
>>109194189
Agent Arena is the only leaderboard I trust because it's the one that reflects my opinions on models. Whenever I try a new model I'm always happy to see it slot into that list exactly where I thought it would.
>>109194046
>>109194209
https://arena.ai/
Don't be lazy, anon.
>>109194209
Agent Arena...like a gladiator ring for agents...intriguing...
>>109194130
Oh no, the aitard is angry.
>>109194204
what do you mean? the methodology is what i just said. the scoring system is just Elo, which is a well-known scoring system used by a ton of shit
again, someone gives a prompt, a few models work concurrently, and they all get shown and compared to one another. so for example, AvB, CvD, AvC, AvD, BvC, etc., so a proper hierarchy can be formed. the model names aren't shown until all votes have been made
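The pairwise scheme described above can be sketched as a plain online Elo update. This is a minimal illustration, not arena.ai's actual pipeline: the K-factor of 32, the 400-point scale, and the 1000 starting rating are textbook chess defaults assumed here, and arena-style leaderboards may instead fit a Bradley-Terry model over all votes at once. A tie scores 0.5 for each side, so a matchup that mostly ties pulls two ratings toward each other rather than apart.

```python
# Assumed constants: textbook chess defaults, not arena.ai's real settings.
K = 32          # maximum rating change per vote
SCALE = 400     # rating gap at which the favorite is expected to win ~91% of the time
START = 1000.0  # hypothetical starting rating for every model


def expected(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / SCALE))


def update(ratings: dict, a: str, b: str, score_a: float) -> None:
    """Apply one blind vote: score_a is 1.0 (A wins), 0.5 (tie), or 0.0 (B wins)."""
    e_a = expected(ratings[a], ratings[b])
    delta = K * (score_a - e_a)
    ratings[a] += delta
    ratings[b] -= delta  # zero-sum: B loses exactly what A gains


def run(models: list, votes: list) -> dict:
    """Replay votes in order; each vote is (model_a, model_b, score_a)."""
    ratings = {m: START for m in models}
    for a, b, score_a in votes:
        update(ratings, a, b, score_a)
    return ratings


if __name__ == "__main__":
    models = ["A", "B", "C", "D"]
    # Hypothetical votes over pairings like the ones in the post (AvB, CvD, AvC, ...).
    votes = [("A", "B", 1.0), ("C", "D", 0.5), ("A", "C", 1.0),
             ("A", "D", 0.0), ("B", "C", 0.5)]
    for name, r in sorted(run(models, votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {r:.1f}")
```

One design consequence worth noting: online Elo depends on vote order, which is one reason a batch fit over all votes is often preferred for a public leaderboard.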
>>109194227
You are lashing out at ghosts; it won't make you feel better
>>109194209
>>109194220
Which models would you like to see in a hypothetical "agent arena"? Claude, GPT, Gemini, DeepSeek, Grok, GLM, Kimi...who else?
>>109194227
Anybody else doing inference engineering work on R9700/gfx1201? I know there's quite a few kernel anons here at this point, and I'm thinking there's probably a decent amount of overlap between what we're all working on. I'm trying to avoid reinventing the wheel and do my part to help get the AMD stack out of anecdote hell.
>>109194189
Using AI in complex codebases feels like a completely legitimate use case. If my codebase were small, I would just write it myself.
>>109194024
Ok, I looked it up, you might be right. Interesting.
https://www.reddit.com/r/ClaudeAI/comments/1ul1396/fable_5_leaked_chainofthought_in_web_interface/
>>109194229
In my opinion, the sample breadth is too wide compared to the actual data points. It's fun trivia, but looking at the actual head-to-heads (e.g. opus 4.8 with thinking typically loses to GLM, whereas opus 4.8 without thinking typically wins), it just comes off as nonsensical in terms of the actual value it's producing. You can argue for collective data (as in samples averaged across the variety of models) as showing some type of value, but realistically, if I'm reading the data points right, it comes out as a tie the vast majority of the time, and whatever weighting they use doesn't do a great job of showing that
my project is now on the order of 50k LOC of python and i only understand about a third of it, but it works really well and fast
i plan to just finish polishing the code and then go back and try to figure out how it works
>>109194310
>50k LOC python
>really fast
>>109194293
yeah but then you're not a vibecoder
>>109194297
that's a well-known thing. thinking does not automatically mean better, nor do higher effort levels. in fact, using higher effort levels can make it noticeably worse at creative tasks. but on code, like the agent and webdev rankings i sent, thinking is higher in both. what ranking are you looking at?
>>109194024
If they stopped monitoring the chain of thought entirely and are only paying attention to results and the number of tokens it takes to get there, that type of thing does seem possible.
There's still a non-zero chance that this is Anthropic trying to poison the well, though.
>>109194314
the actual work is done outside of python
rust only gained about 5-10% on top of it
>>109194234
I already feel better.
>>109194332
Oh, they're monitoring it.
But they probably actively encouraged the model to learn to speak like that for brevity, or at least let it happen and hid the fact.
>>109194379
One-shotted this yesterday.