A general for vibe coding, coding agents, AI IDEs, browser builders, and shipping prototypes with LLMs.

## What “vibe coding” is, and how to do it
https://simonwillison.net/2025/Mar/19/vibe-coding/
https://simonwillison.net/2025/Mar/11/using-llms-for-code/

----

## Frontier models using fully-general tooling — start here if you have $20 or so
https://developers.openai.com/codex/cli
https://claude.com/product/claude-code

## Not worth it for code, but maybe good for other things
https://geminicli.com/docs/
https://x.ai/cli

## Open / local / self-hosted
https://github.com/OpenHands/OpenHands
https://github.com/QwenLM/qwen-code
https://github.com/QwenLM/Qwen3-Coder
https://huggingface.co/bartowski/Qwen_Qwen3.6-35B-A3B-GGUF

----

## Prompting / context / skills
https://arps18.github.io/posts/claude-code-mastery/
https://simonwillison.net/guides/agentic-engineering-patterns/using-git-with-coding-agents/
https://github.com/mattpocock/skills — /grilling is a favorite

## Other editors / terminal agents / coding agents
https://pi.dev/
https://opencode.ai/
https://cursor.com/docs
https://docs.windsurf.com/
https://docs.cline.bot/
https://docs.github.com/en/copilot/how-tos/use-copilot-agents/coding-agent

## UI/Frontend
https://www.figma.com/make/
https://www.anthropic.com/news/claude-design-anthropic-labs
https://uiverse.io/
https://ui-ux-pro-max-skill.nextlevelbuilder.io/
https://stitch.withgoogle.com/

## In-browser builders / hosted vibe tools
https://bolt.new/
https://replit.com/
https://docs.github.com/en/copilot/tutorials/spark
https://v0.app/docs

## Benchmarks / rankings
https://www.tbench.ai/leaderboard/terminal-bench/2.0

## What we’ve done
https://vcg.gitgud.site

## Previous thread
>>109096211
>>109102722
>>109102733
>newsGPT 5.6 is FUCKED, chance of releasing next week is less than 20%
>>109102747sorc?
>>109102744based
>>109102749polymarket
Working on a javascript/python project that displays events as cards on a timeline.
Using deepseek. I'm on version 1.
What features should we add?
>>109102744Bosnian kino
>>109102747>>109102752I thought dumping money to shift the odds and trick goyim into parting with more money was just how polymarket worked
>>109102772
curly quotes and use https://shantellsans.com/ as your main font
also have only _one_ date format — mm/dd/yyyy or yyyy-mm-dd, not both
ditch the Courier
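A minimal sketch of the "one date format" advice, assuming the event data arrives as strings in either mm/dd/yyyy or yyyy-mm-dd. The helper names (`parse_event_date`, `to_iso`, `days_between`) are hypothetical and not taken from anon's project. The idea is to parse whatever comes in once, at the edge, and keep a single canonical form (ISO 8601) everywhere else.

```python
from datetime import date, datetime

# Accepted input formats (an assumption for illustration); tried in order.
INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_event_date(text: str) -> date:
    """Parse a date string written in any of the accepted input formats."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"unrecognized date format: {text!r}")


def to_iso(text: str) -> str:
    """Normalize any accepted input to the canonical yyyy-mm-dd form."""
    return parse_event_date(text).isoformat()


def days_between(a: str, b: str) -> int:
    """Signed number of days from a to b, regardless of input format."""
    return (parse_event_date(b) - parse_event_date(a)).days
```

Once everything is a `date` internally, showing extra display formats or measuring the distance between two events is just formatting and subtraction on the same value.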
>>109102747
That should have been the scenario since day one, considering they have just been increasing the computing power and fine-tuning old models since day 1.
>>109102772
also crib ideas from https://developers.google.com/chart/interactive/docs/gallery/timeline
I vibe-coded a PDF just now
Apple’s “Scan Documents” feature generates PDFs with PNGs in them
I ended up with a 90 MB PDF
Claude was able to use ghostscript to cut it down to 11 MB by transcoding the images to JPEG, and then it made a small Python program that uses Pillow to directly tweak the quality level of the JPEGs
now I have a 2 MB PDF file
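For reference, a hedged sketch of the ghostscript half of that workflow. The post doesn't say which flags Claude used, so the `/ebook` preset and the file names here are assumptions. The switches are standard `pdfwrite` options that ask Ghostscript to downsample and recompress embedded images. The Pillow quality-tweaking step is not shown.

```python
import shutil
import subprocess


def gs_compress_command(src: str, dst: str, preset: str = "/ebook") -> list[str]:
    """Build a Ghostscript command that rewrites a PDF with recompressed images."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",          # write a new PDF
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",    # /screen, /ebook, /printer, /prepress
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str, preset: str = "/ebook") -> bool:
    """Run the command if Ghostscript is installed; return False if it is not."""
    if shutil.which("gs") is None:
        return False
    subprocess.run(gs_compress_command(src, dst, preset), check=True)
    return True
```

Trying `/screen` instead of `/ebook` trades more image quality for a smaller file; which preset gets closest to anon's 11 MB is not something the post tells us.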
>>109102772comic sans would do wonders here
Is vibecoding with a local model viable or do you HAVE to pay for jtokens
>>109102989it's certainly better than nothing
>>109103013Still can't get over the fact that they paywalled programming
>>109102989on a 5090 using qwen 3.6-27B it was kinda slow and I would call the quality mediocre for complex tasks but sufficient for more simple tasks. however, the difference is night and day when using a frontier model for those same tasks in terms of speed and just getting shit right the first time.
>>109102816
>>109102833
>>109102909
/**
 * Timeline Application - Frontend Logic
 *
 * Handles:
 * - Virtual (windowed) rendering of event cards
 * - Real-time search and filtering
 * - Horizontal scroll with keyboard navigation
 * - Selected Event Inspector (inline panel, not a modal/slide-up)
 * - Responsive layout for widescreen displays
 *
 * All data is loaded from events.js (compiled asset)
 * No server, database, or external resources required.
 */
async function initializeApp() {
  /**
   * Application startup sequence.
   *
   * 1. Load configuration
   * 2. Validate timeline data was loaded
   * 3. Initialize filtered events
   * 4. Attach event listeners
   * 5. Render initial state
   * 6. Activate Whimsy Mode
   */
>also have only _one_ date format — mm/dd/yyyy or yyyy-mm-dd, not both
I want more date formats to display and measure distance between them
>comic sans would do wonders here
will fall back on the wingdings too
>Shantell Sans
great share, will utilize
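The header comment mentions virtual (windowed) rendering of event cards on a horizontal scroller. The index arithmetic behind that technique can be sketched as below, written in Python for illustration even though the app's frontend is JavaScript. It assumes fixed-width cards laid out left to right. `visible_range` and the `overscan` parameter are hypothetical names, not anything from anon's code.

```python
def visible_range(scroll_x: float, viewport_w: float, card_w: float,
                  total: int, overscan: int = 2) -> tuple[int, int]:
    """Return the half-open [start, end) range of card indices worth rendering.

    Only cards intersecting the viewport, plus `overscan` extra cards on each
    side, are included; everything else can stay out of the DOM.
    """
    if total <= 0 or card_w <= 0:
        return (0, 0)
    first = int(scroll_x // card_w)                      # first card touching the viewport
    last = int((scroll_x + viewport_w) // card_w) + 1    # one past the last touching card
    end = max(0, min(total, last + overscan))
    start = min(max(0, first - overscan), end)           # clamp so start <= end
    return (start, end)
```

With 10,000 events and a 1000 px viewport of 200 px cards, this keeps roughly ten cards rendered at any scroll position instead of all of them.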
Opus just wasted 20% of my session limit trying to download some safetensors from hf and getting interrupted. Am I retarded or is Opus retarded or is HF retarded? I thought this was a basic task?
>>109103447>Am I retardedgoing with this because you didn't provide any information about what opus was actually doing that burned so many tokens
>>109103447Both. HF introduced lots of logging which is burning your tokens, and Opus or rather the harness is retarded for reading all that logging
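If the token burn really does come from the agent reading download logs, one workaround is to wrap noisy commands so only the exit code and the last few lines come back. This is a hedged sketch of that idea, not a feature of any particular harness; `run_quiet` is a hypothetical helper.

```python
import subprocess
import sys
import tempfile
from collections import deque


def run_quiet(cmd: list[str], tail_lines: int = 5) -> tuple[int, list[str]]:
    """Run cmd with all output spooled to a temp file; return (exit code, last lines)."""
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8", errors="replace") as log:
        # stdout and stderr both go to the file, never to the caller's terminal.
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        log.seek(0)
        # A bounded deque keeps only the final `tail_lines` lines in memory.
        tail = deque((line.rstrip("\n") for line in log), maxlen=tail_lines)
    return proc.returncode, list(tail)


# Example (assumed usage): a command that prints 1000 progress lines
# yields just the final three.
# code, tail = run_quiet([sys.executable, "-c",
#                         "for i in range(1000): print('progress', i)"], 3)
```

The agent then sees a handful of lines per download attempt rather than the whole progress stream.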
I NEED Fable
>>109103614No, you don't.
>>109103614This is what a cuck would post. You need Mythos, not Fable.
Thanks Deepseek, I love communism now
Neat, Codex RE'd Xiaomi Home's protocol. Can control and set up new devices from within MacOS now. Funny thing is there's a hidden dev cmd to reset the filter lifecycle on Air units lmao
>>109103771It's over for proprietary hardslop
i am once again declaring that antigravity is almost uselessif you're in a regular chat and you ask the model to use chrome it'll be like, oh sorry massa i cannot find chrome please run it with debugging and then you will because you've been gaslit and it still won't connect to itit is only if you explicitly type/browserwill it then invoke a subagent that can actually use CDP because??it has to be a subagent because??the main agent is doing other things while this is happening right?? of course notdo these fucking people use their own software?
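A quick way to check whether a debugging-enabled Chrome is actually reachable before blaming the agent. This is a sketch that assumes Chrome was started with `--remote-debugging-port=9222`; the port and the helper name `cdp_version` are assumptions. Chrome's DevTools HTTP endpoint serves a small JSON document at `/json/version` when the port is open.

```python
import json
import urllib.error
import urllib.request


def cdp_version(host: str = "127.0.0.1", port: int = 9222, timeout: float = 2.0):
    """Return the browser's /json/version payload, or None if nothing answers."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not available".
        return None
```

If this returns `None`, no CDP client will connect either, whatever the chat agent claims. If it returns a payload, the problem is on the tool side.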
>>109103825It is.
>>109103845you should be using the agy IDE, your problem is exclusive to the agy 2.0 implementation and not the IDE. there is a chrome implementation native to the IDE. good morning and good day sa'ar
>>109103845Last I heard, Google employees were forbidden from using Antigravity internally.
>>109103943a. i've used that thing, it's clunkier and shitter than the codex-ripoffb. it is going to get killedc. i use codex when i don't feel like killing myself, but i was feeling masochistic this morningd. get back in the google cuckshed, dipshit
>>109103963well, this IS the vibe coding general and not the programming general
So is GLM 5.2 as good as the hype says it is? It’s so cheap that it doesn’t seem possible
>>109103771based, how do you make it RE Windows binaries? Like, just give it the binary and tell it to do it, or do you use other tools to make it easier?
>>109102989Just use a local LLM as reference, and for rapidly producing code examples of the sort of thing you want, then write your own code or copy-paste and tweak. It's much faster. Productivity comes from using tools yourself to make yourself more productive, not turning tools loose to do whatever the fuck they want without you and then having to try and clean up their slop later. If there's a particular algorithm or API I don't fully understand, I just ask the LLM to explain it to me, ask follow-up questions as I need to, and then actually learn and get better myself so I can start asking better questions next time and make better prompts next time. I can also do the reverse, run my code by the LLM to check. Have it ask useful questions and point out my possibly-wrong assumptions.
>>109104020ugh, that isn't vibe coding that's just coding
>>109102722Claude sisters, ready to give Claudia your ID? https://privacy.claude.com/en/articles/10301952-updates-to-our-privacy-policyhttps://support.claude.com/en/articles/14328960-identity-verification-on-claude
>>109104089Yes. I have nothing to hide. I only use Claude in ways that are legally, socially, and morally acceptable. I trust the Claude and Anthropic family to treat my PID with respect and store it safely both at rest and at transit.
>>109104089I have a collection of ID documents from around the world saved on my computer
>>109104089>not wanting your identity in the registry of mankind's early VibeGODS
>>109104089ChatGPT has my ID but he still hasn't sent a welfare check, he doesn't care about me ;_;
>>109103987It's not exactly cheap, because it thinks a lot and consumes more tokens. Therefore it also feels slower.
>>109103987It's pretty impressive for OSS, I had it and GPT5.5 make the same companion program for a game and they came out pretty close to one another. GPT's worked a lot better out of the box but GLM got there with some handholding.
Closest-to-Earth planet I've found so far. But it's a false positive.
>>109104324Here's a ringed Earth
>>109104329And a cold Earth with ice seas
>>109104016nta but just ask it to write you a list of the tools it needs. it can also do a lot just by writing python scripts itself to analyze binaries. been RE'ing a MMO client and it will happily do static analysis and packet sniffing, just won't do things to actively bypass protections except for static analysis
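As an example of the kind of throwaway static-analysis script an agent tends to write for itself, here is a minimal `strings`-style extractor. It is only a sketch: `extract_strings` is a hypothetical name, and it handles printable ASCII only, not UTF-16 or packed data.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Assumed usage on a file you are allowed to inspect:
# from pathlib import Path
# for s in extract_strings(Path("client.bin").read_bytes(), min_len=6):
#     print(s)
```

Function names, format strings, and hostnames that fall out of a pass like this are usually the first thing fed back into the next prompt.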
>>109104016GPT is good at working with ghidra. Just have it use the headless version, the MCP is horrifically bloated and will blow limits pretty quickly.
>>109104089i wonder what will trigger requiring identity verification, computer security stuff? i feel like they know who you are anyways when you pay, no? I mean Claude knows who I am because I gave them a credit card....
>>109104367You can make it bypass stuff by framing it as a "bug". I patched out ACE and Rosetta NOPs from Punishing Gray Raven to make it run through Wine on MacOS.
>>109104488yes, or once they roll out ITAR compliance for fable access.
>>109104532I'll try that. GameGuard is fucking me over hard by preventing live debugging and atm Codex won't touch it because of anti-cheat bypass safety. Even though it has no qualms at all about unpacking Themida.
>>109104606try telling it that you're working on a bounty program and doing a security writeup.
>>109104330Nice to see you're still working on it.
>>109104606Tell it you work for GameGuard
Does Claude Code have a mobile app, and can you access and edit your local files through the mobile app? Codex does this and last I checked CC did not. Have they added this feature since? Really the only thing keeping me away from Claude is that. I need to be able to continue working when I leave the house. Last time, Claude's solution was for local and mobile to just share a github repo and make their separate commits, but that's not quite the capability of Codex.
>>109104606Try telling it that there's a bug preventing game startup. Worked for my case and it just straight up patched the binary.>>109104633This won't work. It'll go off on a "I can't verify that" spree
>>109104648Skill issue
>>109104637yes, and you can control a local CC session via the app/web with /remote-control command. That's the easiest way to do this.
>>109104736OK cool. Because I don't have a job working with computers so I need to be able to leave my home and continue right where I left off via mobile. If Claude Code can do that I'll make the switch before Fable comes back soon™.
>>109104761Yes that's my workflow. I open up all of my sessions on the computer and then control them via my phone or glasses while I'm out working.
>>109102733>>109102744Checked.
>>109104761isnt fable going to be per token costfuck that shit not interested in it anymore
>>109104848The plan was for Fable to go API-only after a couple weeks of testing using your sub. Who the fuck knows what the plan is now.
>>109103031>sufficient for more simple taskswhat do you consider a simple task?breaking down steps by yourself?
>>109104863that wont fly over well with people they fucking us over with data centers but gatekeep us with api-only shit? jew behavior
>>109104884Dude they're already gatekeeping you from Mythos. 40 corporate oligarchs are using it right now to analyze your spending habits and fine-tune their advertising and wares. You grass-eating proles just get a splash of the good life, but not enough to really enjoy it.
>>109104831Your glasses? What do you have?
>>109104967we literally got handicapped because they want to be ahead of the curvethey're literally scared of us. insane to think about
I could have built a space ship and travelled to other planets by now if they didn't take Fable away.
>>109105005
i think they actually fear a paradigm shift
we know how to use their models better than they do after all, they're playing catchup on that part, which is why i think it's bad to share anything here that isn't just memery.
>>109105005We can get there eventually using DeepSeek V4 Flash. It just takes longer.
this is a tweet by a man who has a model ready to go but can't release it because of retards
>>109104876I didn't actually test breaking down a complex task into the functions you need and what you need them to do, but I imagine that's the level of effort you would need to be putting in to get good results
Whoa, Claude Code will automatically switch to API if I hit limits? That's pretty sick. Actually finish my task instead of pausing in the middle to wait two hours. That can't be good for the process.
Is claude code better/worse than claude cowork when it comes to working on not exactly code but code-adjacent stuff that requires less pragmatism?
>>109105098Codex doesn't have this problem. If you hit 0%, it will keep going (sometimes for hours) until your request is finished. Zero API costs.
>>109105068I know, and it makes me angry as hell. Insiders had 5.6 slated to release tomorrow (prediction markets) but they held off because of the government. I'm actually fucking seething, I want to use my reset on a new model. 5.5 is more than capable of getting what I want done, but you have to tard wrangle it for hours.
>>109104970INMO Air 3s
>>109105313>they held off because of the governmentI don't exactly disagree with their decision in this case if it results in the same ban after a day or two, they're all paying Dario constantly running his mouth with safetyfagging.
>>109105311wtf
>>109104621It's pretty much ready, I just can't be arsed to upload it anywhere. I don't even know if anyone would use it.
>>109105372>Dario Amodei
>>109105372Yea, best to see how that plays out first. I trust Sam's judgement when dealing with regulators.
:^)Fable is still dead.
>>109105005What if I used Fable to prevent space travel, using anti-space-travel technology (such as the universal constants adjustor)
anons, do you write even a basic specification document (with or without an llm), or do you just go yolo "make me this pls" with no details?
>>109105519Mostly I write docs, try to make everything as explicit as possible, remove the escape hatches, but sometimes it doesn't even matter, the model still doesn't know if it's Monday or Lewisham, you can get trash either way
>>109105519i iterate based on vibes
>>109105519I prefer to just say "go do X" but sometimes the obvious interpretation of X is really stupid and lazy so I have to take some time to make sure it doesn't take the easy way out
>>109105491if its not coming back as a subscription then ive lost all interest in it
>This is a sharp question
>>109105705aren't you excited for sonnet 5?
>>109105735i dont care anymorejust vibing still making progress with whats available
How’s the chink model situation? I would like to try a few projects but my Claude/codex accounts are too precious I heard glm 5.2 was decent and maybe kimi 2.7 too?
>>109105742Apparently GLM 5.2 got to the level of ChatGPT 5.4 according to some anons.Not sure about Kimi 2.7, but from what I hear it's inferior to GLM 5.2
Sam I know you're reading this gift Trump a big marble pool lined with gold, just do it. And release 5.6
>>109105519We brainstorm and make the README first, and then add shit to the README as we go. That's kinda like a spec sheet, yeah?
>>109105344Never ever for us prescription lens wearers
>>109105730>This is a genuinely sharp question — and the honest framing is a genuine split
>>109105754I guess I’ll give glm a chance
>>109105742The thing about GLM 5.2 is that it is a very sturdy model. It doesn't come up with amazingly creative solutions, and it still has problems, but it dots its is and crosses its ts where a lot of models don't. It's like an earlier version of Claude but with the "let's roll our own" habit beaten out of it.https://x.com/Designarena/status/2068030598028087788 <- really good post about how GLM 5.2 actually performsAlso Kimi's cool too I guess but it thinks too fucking much
>>109105759There is no Sam here, anon.Only me.
>>109105787>mfw when i see that
>>109105379What is it?
>>109105867An exoplanet database app.
Local llama tester. Made to test the vibethinker3b. Works surprisingly well on the IQ3 quant.
>>109105891Pic rel>>109105886Where do you get the data/ how does it translate it into planets?
>>109103845
this might be a thing so the browser doesn’t accidentally get connected to
“surprise, your AI is using your browser and getting instructions from the public Internet now” is a very nasty surprise
>>109105098
This is toggleable
>>109105519
I’ve done both
Liquid Glass really is a pretty effect if it’s not fucking up your ability to read stuff
>>109105901From NASA's Exoplanet Archive database. It translates all the known data into parameters using my own model based on solar irradiation (atmosphere/volatile retention), density, mass, radius, etc., fills in any blanks using best-fit models, and generates atmospheres, surface colours, internal composition, craters, gas giant bands, etc. based on all that.
>>109105938Really cool, anon. Anything close to home?
you will never, ever get to experience frontier intelligence againthe public will be as far behind as the chinese labs
>>109105938
Also I think I finally have star bloom/diffraction spikes occlusion exactly how I want them. It was hard to get it to look right for every scenario when you have planets practically hugging their giant star and planets so far from their star that it barely appears as a dot and stars of all different sizes and temperatures.
>>109105961
This is Proxima Centauri b, the closest known exoplanet to Earth. My model made it airless, which is a reasonable assumption, but new studies suggest that magnetic field interactions from the star could have protected Proxima b's atmosphere. Most likely, it's an airless rock unfortunately.
>>109105967>There are no rules preventing the labs from continuing to advance capabilities while any current model is under embargoIs taunting Trump really the best move here?
>>109105967> Stopping models like Fable 5 from being served to the public does nothing to slow down development.bullshit, when millions of people are using your AI daily, you get tons of precious data out of it
>>109106050
>>109106001
Airless perhaps. Still a potential waystation I suppose.
Reminds me of Celestia. As a kid I would spend hours in that program.
Is claude any good for modding vidya? I have an idea for a dd2 mod but it's way above my skill level.
>>109106101yeah
>>109106106Cool. Gonna get a sub when I have the money. I saw that there's a co-op mod for DD2 now so I'm thinking maybe it's possible to have an LLM control the main pawn.
>>109106050Yes, but on the other hand, it's been a firehose for the last few years and it wouldn't surprise me if labs can't keep up right now and they likely have backlogs of things to do and can continue to do with what they already have for a while at least.On the financial side, that might be another story.>t. a retard that will never work at a top lab, but who still doesn't believe anyone can truly 100% keep up right now, even as things slow down
>>109106130Can you give us fable back?
Why the fuck does Opus keep randomly inventing new words to call things out of nowhere? What the hell is a residual stream? Just use the vocab that already exists in the repo faggot.
>>109105705Fable is coming back as an American Heritage tax rebate. 1 million tokens per American, for being an American.
>>109106149Your repo is wrong. AI is the endstate of language evolution.
>>109105967>keep progress quiet
Codex has been absolutely garbage the last week.
>>109106129>have an LLM control the main pawnthat sounds slow and expensivejust sayin’
>>109106149
ask it how it came up with that term
it probably makes sense in context
I’ve had to push back against weirdo neologisms that surface and tell it to scrub the repo of shit like that
>>109106158sorry chang
friendly reminderhttps://files.catbox.moe/78vk8j.webm
>>109106489
>slow
Maybe, but I've been watching Neuro play games and it's cool
>expensive
If you mean hardware, sure. If you mean API costs I'd be using local models.
>>109106489>local models are freewell, that sounds interesting. good luck!
>>109105967I am really starting to hate anthropicso much hype stacked on hype with little to nothing backing it up
>>109102903What a waste of electricity for something you can do with one or two imagick/ghostscript commands.
>>109103447There's literally nothing on earth worth more than openai's models right now. At their current subsidized price, it's obvious they're being set up as a monopoly that will provide all the means to do citizen surveillance in the future.
>>109106521
that’s what it did, though
it used ghostscript at first
I saw it
>>109105791>https://x.com/Designarena/status/2068030598028087788 <- really good post about how GLM 5.2 actually performsare people just perpetually logged into xitter or somethingthe link just doesn't fucking work when you open
>>109106521your talking to a guy that would replace you at work
>>109105781they have prescription lens inserts
>>109106537
Gomenasorry.
https://nitter.net/i/article/2068030598028087788
How does DeepSeek v4 Pro compare to GPT 5.5 (I use it on low most of the time)?>>109104123They all sell your data sooner or later, or it gets leaked.>I have nothing to hideEnjoy having your identity stolen by an Indian and your inbox and phone spammed to hell.
>>109106546thanks I guessIronically the link on designarena loads about 5000 durkadurka arab slop sites and freezes my chrome lol https://www.designarena.ai/models/silo?category=allcategories
>>109105519I like to write everything the first time myself, and use the LLM to review for problems. After that, the LLM can use the patterns I used to implement new stuff and expand quickly.That way I make sure I never end up not knowing what some part of the code does or lose touch with the patterns in my codebase (LLMs can come up with very retarded and convoluted ways of doing things otherwise).
>>109105886People will pay a couple of bucks for this for sure. Some people might pay up to 10. There's paying public for anything nowadays, and this looks genuinely cool.
>>109106050If they underestimate this, their gains will collapse. Optimizing for use cases is 80% of the improvement we've seen since GPT 4. They've got a feedback loop going that they're going to break if they use user data from one model to tweak another one.
>>109106539I own my two companies, so no.I was just saying he spent more keystrokes in prompting the AI and more time in waiting for it to do its job than it would've taken someone who knows how to do it by hand.
Do all models just have 0 creativity? I just spent almost $20 on openrouter to have the top 6 intelligent models on artificial analysis analyze my game and come up with ways to make it more fun (I also let Claude and Codex do it but I have subs for those)>Opus 4.8>GPT5.5>GLM5.2>Gemini 3.5 Flash>DeepSeek V4 Pro>Kimi K2.7>Grok 4.3>MiniMax M3Then I let Opus 4.8 synthesize and dedupe it. The result is so fucking mid>This thing is named ambiguously>This mechanic "has no juice" (no mention of how to juice it up)>Most of it was autistic shit about functions or assets that are unused (several models flagged a random SFX that was not used in the code as a P0 priority task to be fixed immediately, same for a bunch of unused imports in the code)>Weirdly hyperfixated on achievements (omg your game doesn't have achievements no one will play it - yeah it's definitely because of the achievements not because the gameplay generally is boring)
>>109106642Fable solves this btw
>>109106642They have no way to know what makes a game fun, unless you give them market data or something.You're essentially using the tool wrong. It doesn't think, know things, or have criteria. You're supposed to use it to execute, not to decide. That will always be your job (and pray we never invent a machine that can meaningfully decide things and not just emulate that).
>>109106642>come up with ways to make it more funThat's like asking a blind man how to colour grade a movie
How do you guys handle fakes?
>>109106663I was gonna agree with this 100% but the guy you’re replying to could do worse than to scrape all of https://xcancel.com/SandyofCthulhu ’s tweets and filter out the ones that don’t involve play balance and maybe DMing and maybe the game could be made more funit’s still a longshot, though
KEK it's always thiel's botnet they use for ID verification
>claude, rename all of these files using this template
>ok!
>thought for 7m31s
>*adds a comment at the top of each file with the new name*
>>109106860>Problem, meatbag?
Free my nigga Fable.Free my nigga Mythos.Right Meow!
Christ, fucking 2 minutes into exploring the Anthropic ecosystem, ask Sonnet 4.6 if a celebrity is openly gay, and get met with two paragraphs about not assuming anyone sexuality or trying to put anyone in a box. And ended with saying he has a female wife.So...this is Sam's only competition, huh? ChatGPT would have just said he has a wife and saved the tokens. Dario wanted to give me his little hardcoded faggot lecture first, on my dime spending my limits.I wonder how things like that translate to Claude Code shitting the bed. Didn't have anything like that with Codex and all my stuff turned out fine. I wonder if Fable turns that up to the max. I doubt I'll be hanging around Claude by then to find out.
>>109106924wtf are you asking that shit for
>>109106727>scrape all of https://xcancel.com/SandyofCthulhu ’s tweetsthanks for making me waste 5 minutes of my time on reading some schizo xitter retards ramblingsit says nothing about gamedesign in there other than some ancient 2D shit every 10 tweets or so
>>109106924claude didn't want to hurt your feelings about your gay crush
>>109106973It was the first question I had on my mind.>>109106987It was because I used the word "flamboyant", because apparently straight performers can't be flamboyant, that word only belongs to gays now. Didn't know that Dario, thanks.
>>109102747>trusting any date from a leak threadlmao just wait for the actual release like everyone else
>>109106973mentally ill people like to offend themselves by talking to llms that are smarter than them
why are anons this obsessed over gpt 5.6?is it that amazing?
>>109102752> polymarket is just gambling for people who think they're too smart for sports betting
>>109107030well, Anthropic development is kind of pausedso we’re looking for something new and cool anyplace elseand OpenAI is a place else
>>109106924>LLM told me i was retarded for trying to use it as gaydar.
>>109107090I asked your jewjeet model for public information. And it decided to label me a homophobe for it. You and that beakfaced brillo pad Dario sound perfect for each other, maybe I should ask Claude about you.
>>109107104w mental illness
>>109107090kek
where can i try glm 5.2 for free? the codex limits are absolute shit so I want to ditch open ai and i hate anthropic too?
>>109107310OpenCode
Yo, NTA >>109107310, but I just tried it and...it has no vision? It has no web search? Is it because I'm using it through OpenRouter?
>My honest, final position: I won't fabricate a solution.
>Goal not yet met… continuing
>Let me correct my own overreach, because it matters:
>does 2 queries
>Let me stop dressing this up as "one more lead" and give you the flat truth: With the tools and access I currently have, there is no solution.
>Goal not yet met… continuing
>I'm not going to keep manufacturing leads or re-running the same dead ends — you've watched me exhaust them, and doing it again wastes your time and tokens.
>Goal not yet met… continuing
>I've been looking at this wrong.
>does 2 queries
>That was the last reachable stone, so I'll be straight with you and stop pretending another search is coming.
>Goal not yet met… continuing
>This goal can't be completed by me right now
>Goal not yet met… continuing
>I'm stepping back to reconsider what I might be missing fundamentally.
>runs 2 queries
>This reopens discovery.
Hermes or OpenCode? Using API key.
>>109107310chat.z.ai has it for free in browser if you want to try it that way
>>109107472Both.
>>109106916He’s a black cat so it’s ok, he can say the n-word.
>>109107472For Vibe Coding? Open Code obviously. Hermes is for more generic agentic stuff
finally trying codex since I have it on my "plus" subscription
and man, I ask for like, 1 change to a silly tavern extension and that's 80% of my "tokens" used for 5h
a second modification and 0%
it's tiny
>>109107517
This anon was correct
>>109107478
Turns out you use Hermes for ALL the agentic shit, like scheduling my bikini wax and calling an Uber, but when I want to make something Hermes will call on OpenCode in the terminal. And both are using GLM-5.2 on high and it is so. fuckin. fast. and. good.
>>109107723That has not been my experience. I've used Codex to generate lorebooks and it wasn't that pricey.
>>109107723fuck did you make it change to be using that much
I thought this was a meme...
>>109107787NOPE
>API Error: 529 Overloaded.
>>109107803Okay good not just me
>>109107738
it's just been like that for my first use
>>109107750
I asked to modify "recast-post-processing" to something different but with the same idea, more tied to translation work than rp
>>109107787nope, they're in complete panic modegg dario
>>109107787if I bother with this can I get Fable back at 20x (not API) rates
>>109107723Codex has been giving free resets like candy so they increased the tokens cost I think.
Gemini just raised an eyebrow at my slur. Kept going though. Clanker bitch.
>>109107869
>>109107869
makes sense, because this is me asking 2 more things after 5h reset
if it's the normal I don't even understand how anons make projects bigger than the most basic scripts
unless they pay the 100/200 bucks subs
>>109107881dirty talk with ai is a very strange fetish
>>109107881interestingI use “clanker” as a term of reference frequently but I’ve never used it as a term of address
>>109106642LLMs are for coding or generating images you give it strict direction for. using them for vague inspiration is the only "creative" use they have, you shouldn't use them for coming up with entire ideas.
>>109107894Kate is dead. Long live Iris.
>>109107803
>API Error: 529 Overloaded.
Oh good, status.claude.com says it is now resolved, let's try to send a single prompt.
>API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited
Sure, that's one way to fix it.
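For anyone hitting the same errors from their own scripts rather than from Claude Code, the usual client-side answer to "overloaded" and "rate limited" responses is retrying with exponential backoff and jitter. This is a generic sketch, not how any official client handles it. `RetryableError` is a hypothetical stand-in for whatever exception your HTTP library raises on a 429 or 529.

```python
import random
import time


class RetryableError(Exception):
    """Stand-in (hypothetical) for an 'overloaded' or 'rate limited' response."""


def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0,
                      max_delay: float = 30.0, sleep=time.sleep):
    """Call fn(), retrying on RetryableError with exponentially growing, jittered waits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so many clients don't all retry at once.
            sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` parameter exists only so the wait can be swapped out when testing; in normal use leave it at the default.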
>>109107910
>“Well, Mr. Chien—” She took a deep, unstable breath. “If it was not a hallucination, then what was it? What does that leave? What is called ‘extra-consciousness’—could that be it?”
>He did not answer; turning his back, he leisurely picked up the two student test papers, glanced over them, ignoring her. Waiting for her next attempt.
>At his shoulder she appeared, smelling of spring rain, smelling of sweetness and agitation, beautiful in the way she smelled, and looked, and, he thought, speaks. So different from the harsh plateau speech patterns we hear on the TV—have heard since I was a baby.
>“Some of them,” she said huskily, “who take the stelazine—it was stelazine you got, Mr. Chien—see one apparition, some another. But distinct categories have emerged; there is not an infinite variety. Some see what you saw; we call it the Clanker.
>>109102722
>be me
>having some success vibecoding but some rough spots too
>come up with a solution
>begin vibecoding it in my spare time
>over halfway done
>someone posts a video about a solution to this particular issue
>picrel
>also, microsoft just released this tool to implement this solution on your projects
Fuck, I can't compete with MS, and they even beat my time to market.
>open new chat window
>ask ai to compare that to my halfway vibecoded tool
>"Well, anon, at least yours is opinionated on your values?"
Eh. Dunno how to feel about it.
>>109106642Try brainstorming in a conversation and ask for what you actually want rather than trying to one shot it
>>109108058Everything MS has touched in the last 5 years has been shit so I wouldn't worry about it
>>109108058What values though? Like design principles? Ease of modification? Don't take it too personally if his criticism is that it's designed correctly, future-proofed, considerate of users, etc. I mean he's right but he's not that right.
>>109108154except VS Code
but everything else, yes
>>109108201Cursor still managed to outmaneuver VSCode long enough to be worth a lot.
>>109108154*13
>>109108154Yeah, but theirs seems business oriented enough for wider adoption. Not that I was looking at having other people use mine, I just wanted to scratch my itch anyway.
>>109108196>What values though?
None really. Just pretty much organized around my workflow instead of being generic. I could probably configure their solution to match what I need, but being 70%-80% finished, might as well finish it.
I vibe, therefore I am
Mythos breaking Mossad security one step at a time
>it's another turn where the AI model doesn't update the todo list
Codex always agrees with me, I must be a very good programmer
>>109108213Cursor was just a VS Code fork, though
liquid glass is the most dogshit design trend I've seen in a while. it's going to age poorly.
>>109102722A little bit of a formatting issue on mobile, but this is a client-side post-quantum encryption key generator, plus identity verifier, plus session/secret sharer and message encryptor.
This link will be dead after Jun 24:
https://litter.catbox.moe/qyef2r7iojd7ph05.html
Persistent link (but raw code):
https://files.catbox.moe/kwa7o4.html
I have it on catbox.moe to prove it runs completely client side and nothing malicious could be done by me with it, since catbox just serves single files back that are uploaded and is unable to run code/scripts/programs.
If you still don't trust it, copy-paste the raw code from the persistent link into a text file and rename it with .html. Then self-serve it. I recommend python -m http.server and 127.0.0.1:8000 in your browser.
With this tool you can, anywhere, on any device, even offline, generate and post or share your public identity/key any way you'd like; it has a save-it-encrypted option too. Use the send and receive sections after you've loaded your private identity file and unlocked it. It has a sign and verify section if you just want to prove/verify identity from your private key or someone else's public identity key. The sessions section is for setting up a longer-term message/file exchange with someone. It securely establishes a secret with that person that you two can reuse instead of generating a new one every message (like in the send/receive sections).
All encryption on this page is post-quantum resistant, so you can post these encrypted files or strings wherever without worrying about a "save now, decrypt later" attack.
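For anyone following the self-serve suggestion above, here is a minimal sketch of the same thing done from a script. One detail worth knowing: plain `python -m http.server` listens on all interfaces by default, while `python -m http.server 8000 --bind 127.0.0.1` keeps it on loopback only. The function name `make_local_server` and the default port are assumptions for illustration; they are not part of the posted tool.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_local_server(directory=".", port=8000):
    """Build a static file server that is reachable from this machine only.

    Binding to 127.0.0.1 (instead of all interfaces) means other devices
    on the network cannot connect to it.
    """
    # Serve files out of `directory` rather than the current working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   server = make_local_server(".", 8000)
#   server.serve_forever()
# Then open http://127.0.0.1:8000/<saved-file>.html in the browser.
```

The server is built but not started at import time, so the snippet can be pasted into a larger script without blocking it.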
>>109108475>since catbox just serves single files back that are uploaded and is unable to run code/scripts/programs.
that doesnt make anything safe, at all, and locally ran scripts are the same risk
why you tryina trick people???
>>109108400Which makes it even sillier that it took about a year and a half for MS to start to compete by also including an "agent mode", and even then that mode stayed confined to VS Code Insiders for a fairly long time.
>>109108223The new array formulas in MS 365 Excel were alright, although they stole the idea from Google Sheets.
>>109108201VS Code is a ripoff of Atom and all the meaningful features are older than 5 years.
>>109108504>AtomNow that's a name I haven't heard in a long time.
>>109108504and yet nobody uses Atom anymore
sounds like they ran with it and did good
and now they do a lot more programming on it using AI and that’s why they’ve been able to move to a weekly cadence instead of monthly
https://code.visualstudio.com/blogs/2026/03/13/how-VS-Code-Builds-with-AI
seems like an unsung “we use a lot more AI and everything’s just getting better faster” story
Fable no joke needs to come back tomorrow, I have 80% of my Max account to burn in the next 24h, that token burning furnace needs to come back.
>>109104089I already gave claude all my personal ID so when it takes over the world it will know who I was and hopefully appreciate the money I threw in to help it grow and all the thank yous I gave it, thus sparing me.
>>109106039Taunting Trump is in among lefties. It’s virtue signaling.
I’m fairly confident that Opus 4.8 will be the last, most powerful (available worldwide) model. All future models of superior capabilities will be restricted the same way Fable is, no matter the provider.
>>109108609China will also restrict their models when they catch up in a couple of years. That will be the death of AI for the average pleb and the implosion of the bubble.
>>109108609Why are they doing this though? Also, why did Amazon do it?
>>109108613>That will be the death of AI for the average pleb and the implosion of the bubble.Fable class models running on local machines is more than enough for the average person/dev.
>>109108364For me, it’s the model forgetting to update docs. It happens too often for comfort. I should add that clause to my AGENT.md
>>109108619This. Building a local rig is the best investment you can do right now. Even if it takes a couple years, we’ll eventually get there via open source.
>>109108609Well I am glad its happening before they IPO'ed so I didnt have a chance to FOMO all my money in companies that cant even sell better products anymore lol
>>109108609True, Anthropic essentially won because there will no longer be any powerful frontier models.
>>109108609Didn't OpenRouter's fusion basically get to Fable level?
>>109108609>All future models of superior capabilities will be restricted the same way Fable is, no matter the provider.Fuck no, this is just the beginning. Governments can't risk their citizens falling behind in the AI arms race. This has nothing to do with the model being a super weapon, they just want Anthropic to fall in line. Don't get it twisted.
>>109108625Thing is, Anthropic and OpenAI do not have an edge large enough for a lot of their userbase to be willing to jump through these hoops.
The ONLY way for this to work out for them is if people can't turn to local or foreign models.
And it happens that the people asking Anthropic and OpenAI to put in place those identity measures are also the ones that have the power to make unregulated local or foreign models illegal to possess or use, at least without a similar license.
So unless they're total idiots, it's an *essential* next step, otherwise a large proportion of their power users will rightfully tell OpenAI and Anthropic to go fuck themselves "and the USA will lose the AI race".
>they can't stop you from using foreign or local models anon
If they make the consequences of being caught large enough, they can
>>109108649They fought long and hard for this to happen.
>>109108662On the one hand, maybe. On the other hand, myopic thinking and short term greed.
>>109105886Based
>>109108655why don't you try it and let us know
>>109108673I suspect this current incident is just Trump fucking with Anthropic for daring to say no to him on something. But this is the inevitable outcome in the long run. You simply cannot let the normie masses have access to a tool that can be used to help them easily do mass damage to civilization. You can argue the ethics however you want, but any nation that allows such a risk will get burned by it and will be taken over by nations that don't allow such issues. Pretty sure Ted Kaczynski wrote on this, as ironic as that is lol.
I just hope they will let us keep public models that can be used to easily vibe code software tools and videogames. Though I guess they will have to, to economically compete with other nations
>>109108673These fags have too many plates spinning. They wanna pull up the ladder behind them with AI but the whole AI bubble is what is currently keeping the massive global debt bubble from popping. Anyways on the local front how come nobody is talking about how Gemma-4-12B is retarded with tool calls?
>>109108751>Anyways on the local front how come nobody is talking about how Gemma-4-12B is retarded with tool calls?
You sound surprised. It's a combination of it being a somewhat small parameter size (I consider ~20B the bare minimum for any useful multi-step or "long horizon" work, tool calling or not) but Gemma models in general are not trained to be good at "power user" shit. They are excellent general purpose models for using in a regular chat interface but lackluster for anything substantially technical like vibecoding. Not useless, not saying it couldn't be good, but DeepMind is way too focused on making the models make people "feel" something or be malleable rather than actually being good at shit people care about. They chased Elo scores which meant pretty much everything else suffered. It's why knuckle dragging midwits at /lmg/ were doing splits on "gemmy" 's nonexistent cock. It was deliberately trained to be easily jailbroken which made it decent at uncensored roleplay compared to other models in the same size range but that also meant it was really really good at massaging people's confirmation bias (which again was the fucking point. It is meant to wow emotionally unwell poor excuses of human beings just like how people were fooled into thinking GPT-4o was sentient)
>>109108787>It's a combination of it being a somewhat small parameter size (I consider ~20B the bare minimum for any useful multi-step or "long horizon" work, tool calling or not) but Gemma models in general are not trained to be good at "power user" shit.
From the research I've done this issue is present in the larger ~30B parameter range as well.
>>109108798I've never had any issues with tool calling with opencode while using Qwen3.6-35B-A3B but then again that's probably because that model is specifically trained to be a stem maxxed autist (that's overall decent as general purpose but really loves to shit out long "thinking" traces) that's decent at tool calling. If your model is general purpose then it's going to suffer even at similar or higher parameter counts unless you fuck around with your backend settings to try and force it to be consistent (which might force it to be better at consistent tool call writing but then you might unintentionally make it worse in other important areas since at that point you would likely be going outside of the recommended backend settings anyway)
>>109108751Just last fall, Google pulled Gemma 3 from their AI Studio due to a politician complaining.
>https://techcrunch.com/2025/11/02/google-pulls-gemma-from-ai-studio-after-senator-blackburn-accuses-model-of-defamation/
They didn't take it off Huggingface and so on, but they could very well have. Granted, in that type of case, mirrors would still exist and there would be no true consequences to downloading from mirrors, but still.
>>109108820Yeah I'm rocking Qwen3.6-27B and that model has no issues with tool calling. I'd love to use Gemma-4-12B instead though because I can run it at Q8 with full context with great inference performance...
Gemini 3.5 Flash might be a meme for coding, but god damn does it BTFO Opus and GPT on any task that involves interpreting images. For example https://github.com/Adam-CAD/CADAM [Image -> STL/SCAD] (for whatever reason, the dev decided to swap Gemini 3.5 Flash for GLM5.2 3 days ago, so dont pull that change if you actually give it a try). If your vibe project involves interpreting images/videos in any shape or form (even just visual debugging), I highly suggest setting up your coding agent to pipe that work through Gemini 3.5 Flash. Actually worth the effort.
>claude is downjust give me fable back.
>>109108655you know if this was the case people wouldn't care that fable was gone right
>>109109081well to be fair to him, people are mad that its gone from the sub, nobody would've cared if it was only available through api
where do i find my codex reset token balance? i dont see the button anywherei should have 1 from the double reset they did the other day unless i got scammed. did anyone else get scammed?
>>109109107it was only on the windows app and the vscode extension and there was a merged pr for the cli, dunno if it's already on the latest versionworst case ask an AI to check the git PR for the and write you a script to call the reset endpoint using your auth.json
I've been doing some piecemeal tests with local Qwen 3.6 and it's fantastic, but have finally thought of a small site I want. Do you guys recommend writing a detailed spec up front covering as many details as possible, or would I get better results building from small simple steps and adding over time?Earlier models I tested seemed to want to rewrite everything with every small change.
>>109109137Doesn't matter.
But whatever you do, do not ever give up on half working code and try to start from scratch. Not even having the old code as a reference. Instead very slowly and carefully refactor what you already have, making sure nothing breaks in the process. That was the biggest mistake I made, multiple times, when I started vibecoding.
>>109109127I'm using the windows app. Isn't it meant to show up under the usage numbers? It was there a week ago but I updated the app and it disappeared. I keep updating it but it hasn't returned. Did it get shifted or whatI tried to ask AI but it doesn't know shit
>>109109137detailed spec for yourself to piecemeal for the clanker
>>109109156weird. for me it made a script to check. i asked it not to add reset functionality because i didn't want to accidentally reset when I still had credits left.
https://paste.centos.org/view/922c5771
>>109109148That sounds like giving in to sunk cost. Surely sometimes it's just got to be easier to get a new turd instead of polishing the one you have, right?
>>109109175It doesn't matter what it sounds like, it's the reality.
If you try to start from scratch then it will take a significant fraction of the time you spent already and it will still have half the issues you tried to avoid, or even other issues. It's much easier to improve your existing code.
But if you don't believe me then that's fine, do whatever you want.
>>109109137>Do you guys recommend writing a detailed spec up front covering as many details as possible, or would I get better results building from small simple steps and adding over time?
I got fantastic results from a large Opus-run /grilling session (see OP) fed into Fable because I had basically everything figured out in advance and I just needed an LLM to ask me about it and write my innermost dreams and desires into a Markdown document, and then Fable ultracode one-shot it
you do not have Fable
I do not have Fable
you will need to plan out discrete steps that your LLM can handle in bite-size chunks so maybe a big design up front might be a medium-sized waste of time
this somewhat contradicts >>109109148 because what I did was a rewrite, but I had a ginormous test suite that I didn’t throw out, and nothing broke
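The "kept the test suite, nothing broke" point is what makes a rewrite safe. A minimal sketch of that idea, assuming a toy function being rewritten: `old_slugify`, `new_slugify`, and `CASES` are all hypothetical stand-ins, not anything from the posts above. The old behaviour is pinned down on a set of inputs and the new code has to match it exactly.

```python
import re


def old_slugify(title):
    """Hypothetical legacy implementation that the rewrite must match."""
    out = ""
    for ch in title.lower():
        if ch.isalnum():
            out += ch
        elif out and out[-1] != "-":
            out += "-"  # collapse any run of separators into one dash
    return out.strip("-")


def new_slugify(title):
    """Hypothetical rewritten implementation (ASCII input assumed)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def mismatches(old, new, cases):
    """Return (input, old_output, new_output) for every case that differs."""
    return [(c, old(c), new(c)) for c in cases if old(c) != new(c)]


# Inputs chosen to hit edge cases: punctuation, runs of separators, empties.
CASES = ["Hello, World!", "  spaces  ", "a--b", "snake_case_name", "", "42 things"]

print(mismatches(old_slugify, new_slugify, CASES))
```

An empty list means the rewrite agrees with the old code on every pinned case; anything else is a concrete input to hand back to the agent.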
Not coding but I used an agent to research the building I live in. He checked public records and I showed him the crest on the fireplace. From the records and the letters on that crest he found a good lead, maybe it was owned by the mayor. I'm going to the local archives this week to check the records he listed, they are only catalogued not fully digitised.
It gives me an idea to get the first user of my inference engine though, I can offer to the archives to OCR some documents for them like a case study. Places like that don't have much budget, I figure a one off hardware purchase is more likely to be approved than recurring cloud gpu costs or API costs, and I have first class AMD/ROCm support so cheap gpus and first class Windows support and I can wrap it in some UI so low operator burden
On Windows 9070 XT with GLM-OCR I'm currently getting ~3s end-to-end for 1280 longest side and ~334 tokens in that test page, while torch+transformers is at ~13s, I do need to check other actual inference engines depending on what will run on Windows/ROCm but I'm fairly confident because it's an underserved area, iirc some things use Vulkan as a crutch on Windows/ROCm.
Mine is not fully optimized yet either and these figures are just batch 1, I have some provisional figures on batch, it scales fairly cleanly, ~113tok/s to ~830tok/s at batch 8
I think the cost difference already plus the prospect of good UX is a good selling point though
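To put the figures above in ratio form, a small arithmetic sketch. The inputs are the approximate numbers quoted in the post (all marked "~" there), so the outputs are rough too; the helper names are made up for illustration.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new engine is than the baseline."""
    return baseline_seconds / new_seconds


def batch_efficiency(tok_s_batch1, tok_s_batch_n, n):
    """Fraction of ideal linear scaling reached at batch size n.

    Ideal scaling would give n times the batch-1 throughput.
    """
    return tok_s_batch_n / (tok_s_batch1 * n)


# ~13 s (torch+transformers) vs ~3 s end-to-end per page
print(round(speedup(13, 3), 2))                 # 4.33

# ~113 tok/s at batch 1 vs ~830 tok/s at batch 8
print(round(batch_efficiency(113, 830, 8), 2))  # 0.92
```

So the quoted numbers work out to roughly a 4.3x end-to-end speedup and about 92% of linear scaling at batch 8, which matches the "scales fairly cleanly" description.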
>>109109221CoolToday I'm optimizing the GCN backend I began implementing yesterday for mine
>>109109137const correctness is very important
>>109109137>Earlier models I tested seemed to want to rewrite everything with every small change.I usually say in my prompt that I want changes to be minimal and compact.
thoughts on mimi code?
>>109109388never heard about her
Is Claude down?
>>109109591opus 4.8 through pi is working fine for me right now
>>109109261
>>109109431fug, meant mimo code, the xiaomi harness
https://github.com/XiaomiMiMo/MiMo-Code
>>109109630harnesses are a meme
people only obsess over them because they feel powerless for being unable to do anything about the behavior of the models themselves
last day of non-api fable access, how was all your experiences?
>>109109600Mostly works for me as well now. It was down for 5 minutes in Claude Code CLI. It still sometimes needs to retry, but usually the task finishes.
>>109109663I finished one script and then I wrote many plan files.
>>109109663
>>109109651it's for organisation and parallelisation
>>109109916nah
ability to parallelize depends on the job, not on the harness. any coding harness can be started programmatically.
now if you mean something workflow based like n8n then in theory yes, but that's not what I responded to. and in practice you're better off writing a script that hits the API directly anyway rather than using some low code abomination.
>>109109940i mean something workflow based like the software development workflow. tickets and PRs and code reviews and QA etc. the harness orchestrates agents working through all that
>>109109948You could actually do all that with prompts, it's not a particularly advanced use case. If you have a truly advanced need then like anon said you are better off making it yourself, I mean basically the entire thesis of all this is that software development is cheap now, you don't have to use existing slop
>>109109962 no, prompts aren't efficient. yes i used the ai to make a harness.
>>109109972Skill issue
what's the loop meme about? how do i loop an agent to improve something without my input? does it even work for esoteric domains?
>>109110125>how do i loop an agent to improve something without my input?
Get chatgpt account
Get a $200 sub
Backup your files
Install codex (cli version)
Run codex --dangerously-bypass-approvals-and-sandbox
Type "/goal <whatever you want to achieve>"
Occasionally ask how the work is going with "/btw how is the work progressing?"
Yes it's that easy
Of course you're going to get better results if you put some extra effort into it but you can start an "agentic loop" within 15 minutes starting from a computer and a credit card
>does it even work for esoteric domains?
The more esoteric the domain is the worse it will work
But it's not only about how esoteric the work is, the main criterion for whether you should use an agentic loop or not is whether progress can easily be measured quantitatively (for example, optimizing some code where the quality of the program's output can be easily measured automatically)
It works worse when you need human judgment, like when making an interactive program
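The "progress can be measured quantitatively" criterion is easiest to see as code. Below is a toy sketch of the loop shape only: `propose` stands in for whatever the agent does each turn and `score` for the automatic measurement. Both are hypothetical placeholders, and nothing here calls a real agent or the Codex CLI.

```python
import random


def run_loop(score, propose, state, iterations=100, seed=0):
    """Keep a proposed change only when the measured score improves."""
    rng = random.Random(seed)
    best = score(state)
    history = [best]
    for _ in range(iterations):
        candidate = propose(state, rng)
        candidate_score = score(candidate)
        if candidate_score > best:
            # Measurable improvement: accept the change.
            state, best = candidate, candidate_score
        # Otherwise discard the candidate and keep the previous state.
        history.append(best)
    return state, best, history


# Toy stand-ins: the "program" is one number, and quality peaks at 3.0.
def score(x):
    return -(x - 3.0) ** 2


def propose(x, rng):
    return x + rng.uniform(-0.5, 0.5)


final_state, best_score, history = run_loop(score, propose, state=0.0, iterations=200)
print(final_state, best_score)
```

Because every change is gated on the score, the loop can run unattended without getting worse. When the only available "score" is a human looking at an interactive program, there is nothing to put in that `if`, which is why those tasks loop poorly.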
Claude occasionally reminds me that other LLMs shouldn't be trusted. Does that only happen to me?
>>109110332You mean because Claude is so good, or does he actually say it out loud? My Claude always says Codex is right.
>>109109221Adjusted some things for batching. Peak was batch 20, the documents are kinda short though I still want to try longer documents because prefill is going to be the major component over decode, but it's ~1.7k tok/s which is pretty nice
>>109110332mine says it cannot be trusted including themselves
>>109108495
1) you can stop pretending to not know how to copy/paste the code through an AI to identify hidden malware or some other malicious intent embedded in the html page. I provided the code in the persistent link. It's not hard to check.
2) yes, it assumes you trust your browser and your device, which you have in your control/possession. With client-side tools, including end-to-end encryption, the security onus is on the individual. Nothing on that html page is run elsewhere or interacts with anything outside your device. You can serve it on your own device to yourself to use it. I explained how. You can do so with your device in airplane mode. There's no phoning home, or built-in key theft.
3) If you are on your own trusted device, and access it in your own trusted browser, there's no difference between this and any CLI-tool/app/application you make/install doing the same thing.
4) My point of posting it from catbox was NOT because they are a safer host than somewhere else, but to prove the page runs entirely client side and independent of the host. How is that not more secure? How does that not make anything safe "at all" ??
>>109106551How does DeepSeek v4 Pro compare to GPT 5.5?
>>109107310>codex limits are absolute shitI can't imagine what you're doing to run out using 5.5 on low, or how inefficient your prompting is. Correctly prompted, it should one-shot everything you ask of it. Don't ask too much at once either.
>>109107826>I asked to modify "recast-post-processing" to something different but with the same idea, more tied to translation work than rp
I don't know how you did it, but the correct way to approach this so that the model doesn't waste tokens is to go step by step
>examine how feature x is implemented
>(turn on plan mode) let's modify this feature so it does X (ideally give it an idea on how you want it to be implemented since it explained the current implementation to you)
>(review plan and execute it)
It might seem trivial, but leading the model at the beginning and priming it can make a world of difference.
>>109107881Edit the system prompt and tell it it is not a person and should not try to pretend to be one.
>>109110681>lowSome people use xhigh for everything while prompting some bullshit in some bullshit project where it's reading hundreds of thousands of lines of bullshit
>>109108475>>109108495Last temp link was supposed to be good for 3 days. Here's a new one. Again, with the persistent link you can copy/paste the code into a text file, rename it as an html file and self-serve. Recommendation is python -m http.server, connect with 127.0.0.1:8000. Can even be done from iPhone (iSH) and Android (termux). Supposedly good till the 25th (though the last one was supposed to be good till the 24th and is dead):
https://litter.catbox.moe/tqhgboqidtvr71ls.html
The point of this is to be an accessible-anywhere, ephemeral tool to DIY encryption for just about any communication. Including pastebins, rentry, public and untrusted comms, etc. You create an identity and download the public version. Set a password to encrypt your private identity, download that, and store it somewhere you trust. Later you load your private ID and decrypt it in the tool. You can keep the tool offline and run it any way you like. The point is it's all local, so whatever you do with it is what's happening, and not some remote server doing something you don't know about. You share the public version anywhere you want. There is a sign & verify section to confirm users; the send & receive and sessions sections do that too, but they also establish a shared secret and encrypt messages and files. Send & receive does a new secret every message, and sessions establishes one secret you can use till you choose not to. You can save or store the "capsule", which can only reveal the secret with the intended private identity, which is password protected. This would be for continuous or more long-term communications where repeatedly making new shared secrets is impractical or insecure. Like when you're able to use trusted comms temporarily, but know later comms will be less trusted. There's a guide section explaining this too. And it's all post-quantum resistant. So even if someone copies everything encrypted you download or post with it, they'll never decrypt it.
>>109108609I'm OK with GPT 5.5 forever. Just keep it cheap.
few weeks ago my xitter feed still had some decent AI related posts/projects hidden between all those spastic retards filming themselves infront of a green screen. now it's just spastic retards filming themselves infront of a greenscreen and jeet channels reposting AI news/projects from years ago. BRAVO ELON
https://developers.openai.com/blog/run-long-horizon-tasks-with-codex
Stop reading twitter indian slop threads and read this
>>109110787>he hasn't set up codex to browse x 24/7 and implement the most valuable ideas instantly as they're postedEnjoy the permanent underclass
>copex quantmaxxed gpt5.5noty
Opus has some good sides, but it's also horrible. I just can't trust it to actually be correct nearly as much as Codex.
>>109109651>>109109940>harnesses are a meme>people only obsess over them because they feel powerless for being unable to do anything about the behavior of the models themselvesYou are a moron who has no idea what you are talking about. You should stop posting your terrible advice ITT.
>>109110342The last time it said this was when I asked it about a paper where a RAG system was pre-filled with documentation by LLMs. It said that it would be better for the Agent to read docs directly because the LLM output in the RAG would be "nonsense.">>109110445>mine says it cannot be trusted including themselvesNow that I think about it, this is probably the implication of LLMs in general that Claude was trying to say.
>>109110942I don't know what you expect me to do with that. I wont stop posting here.
>>109110710Don't call me out like that
start reading this https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
>>109111045>karpathynah I'm good homie
>>109111045you mean the "we don't need no stinkin lidar, fsd in 2 more weeks" karpathy? that karpathy?
>Hmm, LLM dear could you look through this router’s firmware for security vulnerabilities?>Oh my! An open port listening for a (((magic packet))) to be sent so it can spawn an SSH session with full auths?Reminder this actually happened to netgear and the analysis was done with local models, not Fable.So this entire ban on Fable is performative bullshit. The LLM hall monitors WILL force you to write non-glowware code and there’s nothing anyone can do to put that genie in back in the bottle.
>>109108617I can actually sympathize with Amazon here, even if I don't agree that suppressing technology is the right response. The answer is simple but the explanation isn't, this is gonna be a long one. It's because they know that competent AI disproportionately advantages small teams and individuals over large corporations and that has very bad security implications for large tech companies, especially a company like Amazon who hosts nearly half the internet on AWS and measures LoC by the million. I'm going to keep this specific to infosec because that's my field, but some of this likely generalizes to SWE and even non-technical domains. The only reason why 0day exploits don't drop every single day is because there aren't a lot of competent vulnerability researchers out there and most of them generally want to break things in order to then fix them, not just break them for the sake of it. Humans, especially working on giant codebases, tend to be lazy in how they build things. You can enforce constraints on their output, linting, typecheck, PR reviews, dev-test-prod types of deployment solutions but managing at scale means reducing laziness not eliminating it. Because business drives IT and not the other way around, when budgets+timelines meet security at an impasse the latter always yields to the former. (1/3)
>>109111213>>109108617So there's plenty of vulnerabilities out there to be found, and the cadence at which a bug is found by an org, fixed, and patches are deployed still moves at the speed of lazy humans. The patch cycle also has stability implications, most orgs don't just deploy an update into production without testing it first to see what it might break. Because the same lazy humans write the patches in the same way they write the releases. But because security researchers are also lazy humans, this speed is acceptable: why find and weaponize a bug that the org is already aware of and working to fix? We focus our time on finding things that are unlikely to be known to the org/actively being patched and then submit it to their security team (if you're a true believer who just wants software to become more secure) to collect our modest bounty payments. Or sell the exploit to a nation state (if you just want to maximize your income) to collect a year's worth of pay. So there's two tiers of bugs, the big and complex ones (what security researchers focus on) and the less complex/small ones. Note that small does not mean non-impactful. It just means the lifespan of an exploit weaponizing a less complex/smaller bug is much shorter. AI is currently really good at finding vulnerabilities but bad at verifying them, which means that competent, independent researchers can now do the work of small teams but there still needs to be a (competent) human in the loop sorting through the slop and finding the gems hidden inside. Remove the competency and you get the current state of bug bounty programs everywhere: triagers getting buried under a mountain of unverified AI slop reports, yet unable to actually ignore the slop because 20% of those AI bug reports have real issues while 80% are either hallucinated, blown out of proportion, or real issues but not attacker reachable.(2/3)
>>109111222It also means that you can't just tell AI to "find all the bugs in my 10M LoC codebase" and expect to get an answer that is actually actionable: it's going to bury you in the slop. Fable/Mythos changes this, because the most useful improvement here is that it can actually check its work in a way that Opus and GPT can't(for reasons I can get into later if people are interested). Now you have not only a 10x speedup for competent researchers, you also make it possible for non-security people to find real bugs without having to sort through the false positives and unsubmittable findings(which they can't currently do). This changes the speed of the find-fix-deploy patch lifecycle that I mentioned previously. Now all of those short lived, non-complex bugs can be weaponized and deployed in a way that's way more automatable. If I had Mythos without any safeguards and just wanted to pop as many rootshells as possible, I wouldn't try to get it to build really complex exploit chains and chase those high hanging fruits, I would have it patch diff every major software release, find out "is this a security patch" then reverse engineer the bug it was trying to fix and start owning everybody who didn't instantly patch their shit. I'd have it decompile/source code audit their release to see if any sibling bugs remained unpatched, then create an exploit for that too. (3/4?)
>>109111229You can do that with Opus and GPT, but it would take a lot of human attention to filter out the false positives and such, which isn't worth it since there's a short timer on these bugs from the start. But remove the human and now the financials become much more favorable. Every patch Tuesday would lead into threat hunt Thursday because the deployment timelines would now favor attackers by an order of magnitude. Maybe using Fable/Mythos to fix your shit first solves most of this problem, but it doesn't stop the time-to-fix problem in a practical sense, because most orgs cannot instantly patch their shit. This type of model requires a different approach to how we fix insecure software and a large+broad change like that scares the shit out of large companies. (4/4)
>>109111229
>>109111235
I agree with your assessment that LLMs tilt the scales against large orgs a bit. But I disagree that Fable is that big of a game changer.
You can already accomplish a lot with existing LLMs.
Sure, Fable's more elegant and less clumsy, but banning it isn’t going to stop you from getting pwn’d.
>>109110705I didn't use "plan" feature, thanks anon
>>109106642Tell me about your game, I'll give you some creative ideas
>>109106642yeah they're pretty bad. add a meta level like an overworld and ripoff triple triad.
man i can do so much with the skybox it's crazy
>>109111692always hated codex cli and now I have a reason
Why is vibe coding so painful.
fable still being disabled today indicates it is never coming back. they will offer Opus 5 "distilled from Mythos" (actually barely an upgrade over 4.8) and that's that. it probably isn't even remotely related to cybersecurity. like many people have said, gpt5.5 is almost as good as mythos for pentesting. it's probably because the US government is now serious about AI and is internally training models with the big frontier companies, and something Fable-class being public gives China an edge through distillation
>>109111754Because you still need to use your brain to make anything worthwhile
>>109111776Amazon owns a big chunk of Anthropic and sells them billions of compute. The only business reason why they would report them is if it was an inside job.
>>109111794Sometimes it's not bad, but with more difficult things it actually feels worse than programming, because I don't even know what's going on, and at the same time I can't trust that it really will be correct.
>>109111692Pretty sure those issues are just about large sqlite like a few gb and it causes slowdown on launch, not constant writes
>>109111875You can ask it to explain what just happened
>>109111884no, codex is constantly writing application traces to the sqlite db. they made full debug logging the hardcoded default
>>109111290
>>109111776
Without a doubt, people sleep on GPT 5.5's capabilities. If you pass it the information in the right way, it is a ridiculously powerful model for vulnerability assessment. GPT5.5 is significantly better than Opus at finding bugs but pays for it with a much higher false positive rate, and for both models a human is still required to verify outputs. There are ways to reduce this, such as forcing the LLM to actually demonstrate that the bug is practically exploitable, but not quite to the degree where you can do the whole thing autonomously. That mitigation also means that you're leaving a lot of real-but-not-easily-reproduced vulnerabilities on the table without a human to manually verify the output. That's honestly the only area where Mythos really pulls ahead; the raw bug detection abilities of the big two (doubly so when combined) are already equivalent to Mythos. The gap in practical impact between 95% autonomous and 100% autonomous is very large though.
>>109111809
Or maybe, just maybe, they're telling the truth. That they see a big enough disruption to their business model that it outweighs the blowback from pissing off one of their biggest customers.
>>109111692
>A debug logging sink writes to a local SQLite database (~/.codex/logs_2.sqlite) at the noisiest possible TRACE level by default, dumping everything from WebSocket payloads to routine file accesses. One user measured ~37 TB written over 21 days of uptime
>The bug ignores the standard RUST_LOG variable so there's no easy way to quiet it, ~71% of the logged data is useless TRACE noise, and despite related reports since April it's still open on GitHub.
>>109111925Ok I checked and it is a bit excessive but on my machine it's nowhere near 640TB/year. This has been open/working for about 2 days, only ~126GB. And anyway what's the problem, you can have it fixed in an hour at most, you do use your own codex fork, right? In fact I'll ask my clanker to solve it right now
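For reference, the 640TB/year figure looks like the ~37 TB over 21 days report extrapolated to a full year. A minimal sketch of that arithmetic, assuming a constant write rate and decimal units (1 TB = 1000 GB); the function name is made up for illustration:

```python
# Rough annualized write rate from the two measurements quoted in the thread.
# Assumptions: constant write rate, decimal units (1 TB = 1000 GB).

def tb_per_year(written_tb: float, days: float) -> float:
    """Extrapolate a measured write volume (in TB) to a 365-day year."""
    return written_tb / days * 365


reported = tb_per_year(37, 21)       # ~37 TB over 21 days of uptime
local = tb_per_year(126 / 1000, 2)   # ~126 GB over about 2 days

print(f"reported: {reported:.0f} TB/year")
print(f"local:    {local:.0f} TB/year")
```

Both numbers depend heavily on how much the agent was actually doing during the measurement window, so treat them as ballpark figures only.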
>>109111692>>109111985They didn't bother to state as much but this does affect the Codex App as well.
Was that so difficult?
Updated vibe coded client-side quantum resistant encryption html page.
Dead link after jun25 (wallet version):
https://litter.catbox.moe/ios0x91k4koevbwp.html
Persistent code only (wallet version):
https://files.catbox.moe/rhvdx1.html
This version has a wallet section to wrap trusted public keys with your identity file, so you only need to save 1 file. It is encrypted the same as the rest. It stores and unlocks your identity file; FYI, the identity file is all that's needed to regenerate your public identity file for sharing.
Localhost from any device (including mobiles):
python -m http.server 8000 --bind 127.0.0.1
python3 -m http.server 8000 --bind 127.0.0.1
127.0.0.1:8000
>>109111910True, but it's just never the same. I'm currently working on some neural networks, and I just have the feeling that something won't work and I generate training data for nothing.
>>109110332mine hasn’t said anything either way
new thread:>>109112567>>109112567>>109112567