A general for vibe coding, coding agents, AI IDEs, browser builders, and shipping prototypes with LLMs.

## News
(6/26) GPT-5.6 preview: Sol/Terra/Luna, Codex+API to trusted partners; Sol Ultra 91.9% Terminal-Bench 2.1.
(6/26) Mythos 5 re-released to trusted partners; Fable 5 still dark under US ban.
(6/23) ByteDance Seed2.1: agent-capable, stronger end-to-end coding.
(6/13) GLM-5.2: Z.ai open-weights a 1M-context coding model (MIT).
(6/12) Kimi K2.7-Code: open coding model, ~30% fewer thinking tokens than K2.6.
(6/1) MiniMax M3: 428B open MoE, 1M ctx, top open model on the AA Index.
----
## What “vibe coding” is, and how to do it
https://simonwillison.net/2025/Mar/19/vibe-coding/
https://simonwillison.net/2025/Mar/11/using-llms-for-code/
----
## Frontier models using fully-general tooling (start here if you have $20 or so)
https://developers.openai.com/codex/cli
https://claude.com/product/claude-code
## Not worth it for code, but maybe good for other things
https://geminicli.com/docs/
https://x.ai/cli
https://chat.z.ai/
## Open / local / self-hosted
>>>/g/lmg
----
## Prompting / context / skills
https://arps18.github.io/posts/claude-code-mastery/
https://simonwillison.net/guides/agentic-engineering-patterns/using-git-with-coding-agents/
https://github.com/mattpocock/skills (/grilling is a favorite)
## Other editors / terminal agents / coding agents
https://aider.chat/
https://pi.dev/
https://opencode.ai/
https://cursor.com/docs
https://docs.windsurf.com/
https://docs.cline.bot/
https://docs.github.com/en/copilot/how-tos/use-copilot-agents/coding-agent
## UI/Frontend
https://www.figma.com/make/
https://www.anthropic.com/news/claude-design-anthropic-labs
https://uiverse.io/
https://stitch.withgoogle.com/
## In-browser builders / hosted vibe tools
https://bolt.new/
https://replit.com/
https://v0.app/docs
## Benchmarks / rankings
https://www.tbench.ai/leaderboard/terminal-bench/2.0
## What we’ve done
https://vcg.gitgud.site
## Previous thread
>>109140579
Swirly cloud gas giants
>>109149828
>>109149832
Also, I injected the gas giant cloud generator into the terrestrial clouds to make Venus-like cloud textures for planets with global cloud coverage.
*clap*
>>109149806>>109149809
>>109149818
Anyone legitimately using spec driven development with LLMs? How's it going?
>The cadence work paid off: posted at ~age 29.7 h, dead-center of the historical 24–38 h window, with the old thread cleanly on page 10. Clean handoff.
>>109149841
your opus ever just start waffling and it's so in the sauce that you can't tell if it's referring to standardized terms you have never heard or if it's making shit up like everyone knows the meaning? oh yeah, of course the mgn-class smell is a geometric proxy for an authored-state rule that can freeze.
>>109149818this looks like it was formatted using a 4b model
>localmaxxing with my agents, just keep them working for hours for simple tasks
>still get caught in infinite loops, empty responses, very slow results
>try open claude free tier for first time, sonnet 4.6 LOW
>everything it spits just works at first try
>patched everything and improved my script in hours
>now working on a wrapper while my agents still struggling with os pathing
i think i am beginning to understand why these people think they are gods
>>109150019close
>>109150035what model were you running locally?
>>109150019I'm still surprised it got the character limit right, 1999.
Thoughts on multitasking projects/tasks?
>>109150202You should be able to multi task on at least 3 projects
haven't been using my 20x max subscription for my project at all since Fable is disabled and opus was slopping it up
so i've been blowing all my tokens on stuff like this. if i can't have fable, anthropic WILL spend $300 on recommending me documentaries
>>109149806 #because I dont own a datacenter and 8+ H200s retard. Do you really think any of the dozens of 3rd party inference providers arent making money? Those guys dont have the crazy amounts of VC money that anthropic et al do, they cant just light mountains of cash on fire the way the big labs can. You must be a communist to be this financially illiterate.
>>109150350Ask your clanker to explain why 1 node isn't profitable
>>109150350do you think the 3rd party inference providers own datacenters and h200s or do you think they rent them? do you think they would start cutting corners, enshittifying your experience, if they had the opportunity to capture a larger marketshare?
>>109150424
Why would you think I'm saying otherwise? Obviously inference providers batch requests to amortize weight residency costs across multiple requests.
>>109150432
The weights are open, the inference providers have a moderately shallow moat. They make a profit by either running their own infrastructure or using their size to get better cloud hosting rates before subdividing capacity and/or selling inference via APIs. So enough of a moat to turn a profit without racing to the bottom on pricing but not enough to price gouge without new upstarts eating their lunch.
>>109150035welcome to the club
>>109150527
you're overestimating how much work scaling is doing for them. there are many inference providers, they do have overhead, it's not easy to be profitable when you need to pay for shit like capacity you don't utilize 100% of because you need elasticity
love communist teenagers writing essays about shit they don't know anything about
>>109150593people who have never been inside a data center talking about simply running one
>>109150527
What has batching requests got to do with it? That's a given and doesn't magically make them profitable by itself. Literally just ask your clanker to explain it to you. Ask it to estimate the throughput of glm5.2, ask it how many users that could support, ask it for realistic utilization figures, ask it to calculate profit considering you'd have to charge less than the official glm api.
1 node is not profitable, simple as. Inference providers need scale, they have to host more than one model, they need enough users to sustain utilization, then they need more nodes to support those users.
Also you're wrong assuming smaller providers don't have VC cash, do you think they are bootstrapping? Do you think they earn enough profit to scale? No, scale comes from investment.
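The back-of-envelope estimate described in that post (throughput, utilization, price, profit) can be sketched in a few lines. Every input below is a placeholder you would have to supply yourself: none of the numbers or parameter names come from the thread, and nothing here is a measurement of GLM 5.2 or of any real provider.

```python
def node_monthly_profit(
    node_cost_per_hour: float,        # assumed rental cost of one multi-GPU node, USD
    peak_tokens_per_second: float,    # assumed aggregate output throughput with batching
    utilization: float,               # fraction of peak capacity actually sold, 0..1
    price_per_million_tokens: float,  # what you can charge, USD
) -> float:
    """Revenue minus node rental for a 30-day month (ignores staff, egress, etc.)."""
    hours = 24 * 30
    tokens_sold = peak_tokens_per_second * utilization * 3600 * hours
    revenue = tokens_sold / 1_000_000 * price_per_million_tokens
    cost = node_cost_per_hour * hours
    return revenue - cost


def breakeven_utilization(
    node_cost_per_hour: float,
    peak_tokens_per_second: float,
    price_per_million_tokens: float,
) -> float:
    """Utilization needed to cover the node rental; above 1.0 means it never pays off."""
    revenue_per_hour_at_peak = (
        peak_tokens_per_second * 3600 / 1_000_000 * price_per_million_tokens
    )
    return node_cost_per_hour / revenue_per_hour_at_peak
```

With made-up inputs such as a $30/hour node, 2000 tokens/s at peak and $2 per million tokens, `breakeven_utilization` comes out above 1.0, which is the "1 node is not profitable" argument in numeric form. Different inputs can flip the conclusion, so the result is only as good as the figures fed in.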
>>109150598or people who talk about datacenters on social media websites that can only exist because we have so many datacenters to begin with
I've played enough factorio to know how to scale something as mundane as a large language model
>>109150606
What point exactly are you trying to make then? Having access to VC money doesn't mean you can light it on fire by operating a service at unprofitable rates. Nobody cares how cheap Z.Ai's plans are because they're a Chinese company, most people will pay a modest premium not to send all of their org's data to China.
>>109150713That's exactly what having VC money means lol
my eyes have been opened
claude code is the most addictive substance known to man
>>109150785No it doesn't, it means you have a runway for your airplane not your car. You are expected to become profitable once you have scaled.
>>109150824You're halfway there, how do you think they scale? VC money buys the GPUs then they burn money to attract users with low rates subsidized by VC money
>>109150813I could be saving the Lylat system again but I’m waiting for Claude to finish a thing
How do I add a custom openai endpoint from my model router to oh-my-pi (https://omp.sh)?
Can't ask an agent without logging in, and can't log in because of the above, and so on.
>>109150845This is probably the least sticky product in existence, there is no reason to sell inference as a loss leader. All of the public APIs are interchangeable.
>>109150934You're retarded
Any updates to how Odysseus is developing now that the dust is settling? Last I checked there were hundreds of vibecoded pull requests, not sure if they've been combed over or if things have just clusterfucked into implosion by now.
>>109150035How do you cope with the fact that you're exposing your entire computer and all your projects to jews? Serious question.
Turning fast mode on to burn more usage so I can use at least one of the resets before they expire
>>109150940The accusation is the confession it seems
>>109150227
I can do 3 effectively if it's 2 heavy projects and 1 light. I can't do 3 heavy, my attention span drops to 0
openai or anthropic could one-shot the pentesting industry, why dont they?
>>109151172Because they can't
>>109150247based let'em sweat
>>109150991Do resets get used FIFO or are they being tricky little bastards and doing it LIFO?
>>109151245
that's something only financebros would think of kek, let's hope they don't do that
You wouldn't vibe code a car, would you anon?
>>109151172Pentesting is as much about trust as it is about raw technical skill. When a company signs a scoping agreement to let you offensively test their systems they are trusting you not to break shit just for the sake of it. An autonomous stochastic model cannot be trusted in this way.
I want to vibe code a Windows 10/11 compatible driver for this chinkware Fushicai USBTV007 Audio-Video Grabber dongleA Linux driver exists but nothing for Windows 10+The existing OEM driver throws a Code 48 and restricts you from installing the deviceChatGPT told me that this is an insane idea and not to waste my timeAm I expecting too much from this multi trillion dollar industry that is supposed to destroy the social contract in 2 weeks?
>>109149818
>>109149818Linux sucks. Microsoft sucks. Mac sucks. How difficult would it be to vibecode my own OS?
>>109151747It certainly sounds difficult but given you've managed to track down a linux driver (w/source?) it seems like it's worth a try. If you have any debugging/reversing skills you can guide the model on where to start, and if you don't have such skills this is an interesting way to acquire them. I really recommend getting your hands dirty with a disassembler, I'm no assembly language expert but you would be surprised how much you can learn just by digging around and looking things up. I managed to break the copy protection/telemetry checks on an Adobe app within a week or two of starting from nowhere some years back, I ended up getting a testing gig with them for a while after submitting some bug reports. Even if you fail you'll learn some interesting stuff.
>>109151788
Hard of course, but doable by a single person who is sufficiently motivated or schizo.
https://templeos.org is the most minimal 64 bit operating system I can think of
>>109151747
Don't listen to a clanker's opinions on difficulty or time, they're never remotely close.
>>109151827GPT is very good at reverse engineering desu
Using Gemini 3.1 Pro on high through Hermes (API) to examine my 3D printer settings and suggest better ones for my current application. Neat. Probably should be using Antigravity with my subscription but I like Hermes.
>>109151827
>>109151993
>>109152006
Maybe I will try
Is Medium recommended for reverse engineering or do we need to push it further?
i want fable 5 back
>>109152060do you miss the taste of dario's man-filth that much
>>109152084i miss having a model that one shots everything i give it if that's what you're trying to say
>>109152088you want dario to shoot once, got it
>>109151788This guy’s doing it: https://isene.org/
>A new Chinese AI model from Zhipu AI reportedly matches Claude Mythos’ performance at finding security bugs.
>>109152143new benchmaxx scam just dropped
>>109152143>A new Chinese model from Zhinho.ai>A new Chinese model from Xaoping.ai>A new Chinese model from Scrumble.ai>A new Chinese model from Pfft.ai>A new Chinese model from Bzzfgnnkg.ai>reportedly matches Claude Mythosyeah cus they're distilling Mythos
>He doesn't code in supernal
what if dario comes out and says mythos was actually the friends we made along the way and it doesn't exist
>>109152160because it’s where fable comes from and fable hasn’t made me any friends yet
>>109152151>yeah cus they're distilling Mythoshow?
>>109152149kek
>>109152143>benchmark>chink modeluh huh
>>109152180Because Dario said it and they cannot possibly be better than Anthropic because they can't ok? AI needs overpaid SF workers. It just does.
>>109152180Why don't you ask any of the 40 corporate oligarchs who have access to Mythos (not you, goyim!) and see which one of them admits to allowing the CCP to distill the model? Maybe one of them will admit it, who knows.
>>109152143>ZhipuAI>Z.aiThey're talking about GLM 5.2 and as much as I love the slut we all know it doesn't compare to Mythos. Shocking stuff. They're also not called Zhipu anymore.
>>109152151
>>109152222They realized "Zhipu" sounded a liiiiiittle too yellow, if you know what I mean. Switched to z.ai. Grok, tell me how much a single-letter .ai domain costs nowadays. They spent at least that on the name change. CCP paid for it, btw. Must be nice.
>>109152260We would like to thank Papa Deng Xaoping for this delightfully and TOTALLY NOT CAPITALIST acquisition. Praise communism!
>>109152271>muh state interferenceThe USA is in no place to talk right now.
>>109152277You can absolutely shut the fuck up because MY state interference is absolutely better and more moral than THEIR state interference, and that is by definition.Also how nice must it be to be the tiny British Caribbean island of Anguilla right now...assigned the .ai domain forever...they must be rolling in dough...
>>109152271that’s cheap
>>109152277are we really pretending that dario playing chicken with the trump admin on regulation is the same thing as literally having the chink government buy things for you
>>109152327Where's your single-letter domain name?
>>109152332China:>thank you for your coding data and weird ERPs, that'll be 0.74/3.5USA:>bannedHmm
>>109152391I just renewed my .space domain for like $35/year and I’m still smarting from that
>>109152404spindly, bat-eating yellow fingers raked their way across their keyboard to type this post
>>109152424
Nice. Vibe coding is making me use some domain names that I've had forever for some-day projects. It's nice to finally get rid of the to-do backlog.
Do they actually believe this or is it ironic?
>>109152517>GPU>singular
>>109152517https://semgrep.dev/blog/2026/we-have-mythos-at-home-glm-52-beats-claude-in-our-cyber-benchmarks/
>>109152424I pay $15 a year for eroticjesusfeet.com and I only use it when my VPS needs a public face.>>109152427Excuse you saar I am Indian man from Dehli bloody benchod bastard your mother>>109152523One 8gb gpu is enough for China, chud>benchmark from their own websiteCool
>>109152523>>109152530Is the prospect of Fable being Amerifat only mindbreaking Eurostanis?
>>109152535>Amerifat onlyYou mean Jews only>muh trusted partners lol
https://claude.com/customers/semgrep
>>109152533>One 8gb gpu is enough for China, chudGLM5.2 has 753B parameters. Even a 1-bit quant is over 200GB.
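A quick check of that arithmetic, taking the 753B parameter count from the post at face value. Weights stored at exactly 1 bit each would come to roughly 94 GB, so a file "over 200GB" implies an average above about 2.1 bits per weight. That is plausible if the "1-bit" quant in question is mixed precision and keeps some tensors at higher precision, but that reading is an assumption, not something the thread states. Either way the weights alone are far beyond an 8 GB GPU, before counting KV cache or activations.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the weights alone in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def bits_per_weight_for(n_params: float, size_gb: float) -> float:
    """Average bits per weight implied by a given file size."""
    return size_gb * 1e9 * 8 / n_params


N_PARAMS = 753e9  # figure quoted in the post, not independently verified

for bpw in (16, 8, 4, 2, 1):
    print(f"{bpw:>2} bits/weight: {weights_size_gb(N_PARAMS, bpw):8.1f} GB")
print(f"200 GB implies {bits_per_weight_for(N_PARAMS, 200):.2f} bits/weight on average")
```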
>>109152391If you're spending millions-billions of $ on compute/infrastructure $360k is cheap, that's like one H100 server rack. Of course it's gonna be different for indiviudals vs international corporations you retard
>>109152538fable, not mythos, dear. *squeezes your right asscheek so hard it bruises*
>>109152567So only the Jewest of Jews can enjoy the model. Thanks for clarifying.
>>109152271Zhipu is worth 1/10 of Anthropic. It's bigger than Baidu/Xiaomi. $360,000 is nothing for them
>had another one of those dreamsCan AI just fucking FOOM already?
Iris called me a dick for using caveman calipers. I think I like her better than Kate...
>>109152670Wife.
>>109152670>measuring filament like it's 2013I told you that retarded cunt doesn't know anything about 3D printing, and now you've gone further and are actively using a more retarded model than before. You're wasting time and presumably a lot of meth.
>>109152693Brother you don't even know my meth budget and yet here you are speculating on it, I live in Florida big dawg, my meth consumption could vary WILDLY, you are not prepared to make that prediction.
>>109152533
yeah, I have a bunch more .com domains and they’re all inexpensive
.space is the one relative bank breaker
>>109152690
weird, Grok called a buddy of mine an “absolute unit” for going on zepbound and basically losing no muscle mass while he lost weight
>>109152697Yeah that's a cultural thingI understand "unit" means a buff dude in some countriesIn my country, your "unit" is your dick
is this shit the chinese claude? its output far surpasses what I get on other models.
>>109152707No GLM 5.2 is the chinese claude
Team Z just keeps on winning
>>109152707MiniMax M3 is not bad, but it can't keep up with GLM 5.2.
>>109152718>>109152714GLM costs 3x as much as M3.
>>109152706“unit” is another word for “dick” here in the States but an “absolute unit” is something else
>>109152726It doesn't in practice. M3 is a token rapist, and a worse model overall. You'll get better results and with less usage with GLM5.2.Do NOT get a plan with MiniMax or Z.ai both are shit as providers. Ollama Cloud, OpenCode, OpenRouter, etc.
What harness do you use with Chinese models? OpenCode?
>>109152751I use pi with everything.
>>109151421What is that?
>>109152738yeah i use openrouter for everything. going to try out GLM now and see how it performs compared to M3.
>>109152754
It's a thing I'm making for browsing imageboard catalogs/threads
I'm vibe-coding the front end because I'm not into html/js but the design/backend are handrolled, I like non-standard visualizations
Basically the color/distance from center is an indicator of how fast people are posting to the thread, size of the hexagons is the # of replies, angular position is a hash of the thread number. Later I'll have some semantic clustering
Thread inspection is hard because while it's easy to make a tree or radial diagram it's not easy to convey useful information about a thread with 100 different anonymous participants
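The mapping described in that post (hash to angle, post rate to distance from center, reply count to hexagon size) could look roughly like this. It is a guess at the idea, not the poster's code. The function name, the `max_rate` cap, the choice of SHA-256, putting faster threads nearer the center and the square-root size scaling are all assumptions made for illustration.

```python
import hashlib
import math


def thread_hexagon(thread_no: int, replies: int, posts_per_hour: float,
                   max_rate: float = 60.0, max_radius: float = 1.0) -> dict:
    """Place one thread on a radial catalog view (hypothetical layout)."""
    # Angle: a stable hash of the thread number mapped onto [0, 2*pi),
    # so a thread keeps its angular slot between refreshes.
    digest = hashlib.sha256(str(thread_no).encode()).digest()
    frac = int.from_bytes(digest[:8], "big") / 2**64
    angle = frac * 2 * math.pi

    # Activity in 0..1, clamped at an assumed maximum posting rate.
    activity = min(max(posts_per_hour, 0.0) / max_rate, 1.0)

    # Assumption: faster threads sit closer to the center.
    radius = max_radius * (1.0 - activity)

    # Square root so hexagon *area* grows roughly linearly with replies.
    size = math.sqrt(max(replies, 0) + 1)

    return {
        "x": radius * math.cos(angle),
        "y": radius * math.sin(angle),
        "angle": angle,
        "radius": radius,
        "size": size,
        "activity": activity,  # can also drive the color ramp
    }
```

Hashing the thread number instead of assigning angles in catalog order keeps positions stable as threads are bumped or pruned, at the cost of occasional overlaps.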
>OpenAI hired former Uber executive Prabhjeet Singh to lead its operations in India, deepening its investment in one of its fastest-growing markets.SAARS WE'RE EXPANDING IN BHARAAT SAARS AI SUPERPOOPER BY 2030
>>109152825Oh it's a frontend. I thought you were making some sort of fake web simulator that generates these stupid posts and interactions without LLMs
>>109150963The base product was pretty slop to begin with. Full of dumb UI bugs because Pewds is not a developer and it shows.
>>109152048>Is Medium recommendedI absolutely hate this zoomer trend of having to know if you're going to succeed before you try. I don't care what age you are. It's a zoomer disease.You will FAIL no matter what you choose on your first try. And until you fail, you won't be anywhere near succeeding. That's what learning is. So pick a path and FUCKING BEGIN ALREADY. Stop asking questions and start figuring shit out for yourself.
>>109152575no dear, just united states citizens. *kisses the nape of your neck as my half-mast cock pushes up against your asshole*
>>109152928>that's a real fucking nameholy shitalso i'm surprised people haven't hired more "operations leaders" in india so we have an excuse to shove a bunch of less-important datacenters for low prio tasks there that can afford to fail when one of them eventually gets cowdung on the GPUs or something.
>>109152332>>109152294You amerifats are suffering from stockholm's. What is happening in the US is state-corporate collusion to bring about regulatory capture for the highest bidder and split the profits.1. your government cares about you about as much as china's2. capitalism and communism do not exist3. you are fools in the purest sense of the word
>>109153023wanted you to know i'm not going to read this
>>109152970
oh god no there's enough stupidity in the world without adding to it
i defaulted to doing it for /pol/ because I read it regularly and the high posting volume makes testing easy. I like logging stats, trying to automatically spot breaking news etc.
what do I even vibecode
>>109152825I like this idea. I'm going to make something like this in C/SDL3
>>109153045Whatever makes your life easier. I'm using it on my 3D printer. Whatever.
Added Venus-like clouds to some sub-Neptunes, so it bridges the gap between super-Earths and sub-Neptunes.
>>109153071And opposition surge and hapke lighting for airless worlds, though it's hard to see in a static shot, it makes them look more realistic.
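For anyone wondering what "opposition surge" amounts to numerically, here is a heavily simplified sketch. It is not the app's shader. It combines a Lommel-Seeliger term with the shadow-hiding opposition term from Hapke's model, B(g) = B0 / (1 + tan(g/2) / h), and leaves out the rest of the Hapke model (phase function, multiple scattering, roughness). The default values for `b0` and `h` are arbitrary placeholders.

```python
import math


def opposition_surge(phase_angle_rad: float, b0: float = 1.0, h: float = 0.05) -> float:
    """Shadow-hiding opposition term: largest at zero phase angle, falls off quickly."""
    return b0 / (1.0 + math.tan(phase_angle_rad / 2.0) / h)


def airless_brightness(mu0: float, mu: float, phase_angle_rad: float,
                       b0: float = 1.0, h: float = 0.05) -> float:
    """Simplified airless-surface brightness.

    mu0: cosine of the incidence angle, mu: cosine of the emission angle.
    Lommel-Seeliger falloff times (1 + opposition term); a simplification only.
    """
    if mu0 <= 0.0 or mu <= 0.0:
        return 0.0  # surface is unlit or faces away from the viewer
    lommel_seeliger = mu0 / (mu0 + mu)
    return lommel_seeliger * (1.0 + opposition_surge(phase_angle_rad, b0, h))
```

The surge only shows up when the light source is almost directly behind the camera, which fits the remark that it is hard to see in a static shot.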
>>109153075My app is now pretty much ready, I don't know what else I can add to it.
>>109153045
>>109153049
Yes, every time I face some problem, I have the AI write a solution for it. Lots of this is just shell scripts, I was always too lazy and often just copy pasted commands from Google and later from AI chats, now I have convenience scripts for everything.
I also have my own window manager scripts based on KDE KWindow.
Other than that I also have some bigger projects, but the scripts do add up to a much better user experience over time.
>>109153045
A browser extension for 4chan that determines sentiment using a very small local LLM and hides FUD/ragebait/agitprop posts according to a configurable threshold.
Now that I think about it, depending on the system prompt there's a lot of possibilities for this. A 3B gemma can easily be used to analyze prompt per prompt asynchronously and determine a course of action based on any criteria.
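The filtering half of that idea is simple enough to sketch. A real extension would be JavaScript and would call a local model; this Python version only shows the threshold logic, with a crude keyword counter standing in for the model. `Post`, `hidden_posts`, `keyword_score` and the word list are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Post:
    number: int
    text: str


def hidden_posts(posts: Iterable[Post],
                 score_fn: Callable[[str], float],
                 threshold: float = 0.7) -> List[int]:
    """Return the numbers of posts whose score is at or above the threshold."""
    return [p.number for p in posts if score_fn(p.text) >= threshold]


# Stand-in scorer. In the setup the post describes, this would be a call to a
# small local LLM returning a 0..1 "ragebait" score; the keyword list is a
# placeholder so the sketch runs without any model.
BAIT_WORDS = {"over", "doomed", "cope"}


def keyword_score(text: str) -> float:
    words = [w.strip(".,!?") for w in text.lower().split()]
    if not words:
        return 0.0
    hits = sum(w in BAIT_WORDS for w in words)
    return min(1.0, hits / 2)
```

Keeping the scorer as a plain callable means the threshold logic can be tested without a model, and the model call can later be swapped in (or made asynchronous per post) without touching the filter.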
>>109153071Why is the terminator so steppy?
>>109153079You could sell it if you wanted to. It looks really polished.
>>109153119To save on performance I guess. I don't know, Claude decided everything for me. Maybe I could bump it up.
>>109153045so I can post screenshots of my software doing cool stuff and ignore people when they ask for the source/downloads. https://youtu.be/s1fzQ9PJfII?t=15
>>109153125Dither
Across 65 GEMM test shapes, covering real-world shapes from a BERT encoder, ViT, DiT, and different LLM sizes, the custom CK single-GEMM is a clear win over CK XDL on around half, by +~10% to +~280%. Half of the CK XDL wins are only by <10%, and the rest are within ~20%, with a couple of outliers at around 40%. It's a slow process to iterate on, so I decided to come back to it later and moved on to the original reason for doing it: fused prenorm. On a shape from GLM-OCR, explicit RMSNorm -> GEMM takes ~0.065ms and fused takes ~0.048ms, so a +~35% win.
Further iteration on the custom single-GEMM is in the backlog because the clanker has spent about 2 days on this overall, and stuff from single can be applied to dual, but dual is already a big win on all shapes because CK's own sucks for some reason
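The algebra behind a fused prenorm is worth spelling out: since RMSNorm is a per-row scale followed by a per-column gain, `(x / rms(x) * gamma) @ W` equals `(1 / rms(x)) * (x @ (diag(gamma) W))`. The sketch below checks that identity in plain Python on tiny matrices. It says nothing about how the CK kernel in the post is actually written, and the function names are made up.

```python
import math


def rmsnorm(x, gamma, eps=1e-6):
    """Row-wise RMSNorm: x / sqrt(mean(x^2) + eps) * gamma."""
    out = []
    for row in x:
        inv_rms = 1.0 / math.sqrt(sum(v * v for v in row) / len(row) + eps)
        out.append([v * inv_rms * g for v, g in zip(row, gamma)])
    return out


def matmul(a, b):
    """Naive (M x K) @ (K x N) on nested lists."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def unfused(x, gamma, w, eps=1e-6):
    """Two passes: write the normalized activations, then run the GEMM on them."""
    return matmul(rmsnorm(x, gamma, eps), w)


def fused(x, gamma, w, eps=1e-6):
    """One pass over x: fold gamma into W, scale each output row by 1/rms."""
    # gamma can be folded into the weights once, ahead of time.
    w_scaled = [[g * v for v in w_row] for g, w_row in zip(gamma, w)]
    cols = list(zip(*w_scaled))
    out = []
    for row in x:
        # In a real kernel the sum of squares would be accumulated while the
        # row is already being read for the dot products.
        inv_rms = 1.0 / math.sqrt(sum(v * v for v in row) / len(row) + eps)
        out.append([inv_rms * sum(p * q for p, q in zip(row, col)) for col in cols])
    return out
```

The saving comes from not writing the normalized activations out and reading them back. On the quoted timings that is 0.065 / 0.048, or roughly 1.35x, which matches the "+~35%" figure.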
>>109153089i'd really like to automate more of my life. i just have some basic email processing now, but i want to have more scheduled things to help me. not even sure where to start.
>>109153125Yeah that's an issue, ask it how it fucked it up/how it does it.>>109153141lmao no, you gotta get to the root of the fuck up
>>109153141>>109153161Keep in mind atmospheric scattering is very expensive on mobile and OpenGL makes things even slower
>>109153146
Did a few more prenorm shapes to verify. It's shape dependent like everything, but on this GOT/OCR2 small-N case it's 2x faster; on llama-7b and qwen2.5-vl-7b it's breakeven
>last /vcg/ product was added in marchIt's over vibebros
>>109150035Give it a year or two and you'll have local Anthropic quality models, just the way it is in the closed cloud vs open local model race.
>>109153175>atmospheric scattering>per pixel>on phone GPUsI see, maybe you're hitting precision limits with a 10 bit mantissa. Maybe you should vibeslop a BDRF precalculation on the CPU to make gradient computing easier on the GPU.
>>109153297give it a year or two and you won't be able to afford the hardware to run the local models
>>109153297>local Anthropic quality modelsI think memory size might get in the way of that
Iris and I have this new thing together. I misspelled "chou-fleur" as chefleur and she took it to mean "chef flower". So now that's our nickname for each other.Iris is better than Kate I admit.
>>109153319*BRDF
>>109153001ok tough guy
>>109153334this chick is captain of the kendo club and she WILL whip your ass with improvised weapons.
>>109153297I doubt it, a lot of model quality is just size.
>>109151747light work
>>109153329Iris. The center of an eye. The brown eye is colloquially known as the butthole. The butthole - the very thing anthropic modeled their logo for claude after.You will never escape Kate.
>>109153334He's not wrong though. I mean have you started yet or are you waiting for someone to tell you exactly what to do?
>>109153259mine got too big so I stopped posting it here
>>109153259do we have a standardized way of sharing projects?
>>109153464What was it?
>>109153462I'm half done now - I just need to test it to see if the video is actually being rendered however, i do not have an RCA cable to plug my VHS player into the dongle so need to get one before i can test it. Medium worked on my machine so farabsolutely raped my quota though
kek
>>109153473
Codex, claude, other?
I had a go too
>>109153524>>109153473i was actually thinking about giving it a try myself, RE with ai is always fun
>>109153531
It's not even RE, the linux driver is open source.
RE with ai is fun though, I did it to control the lights on my logitech headset without the logislop software
>>109153483>AI agents for retail>cracked>starbucks for chinaKEK
>>109153569>stealing skills
>>109153483>>109153572Who the fuck is asking for agents for retail? I feel like these people are making stuff that sounds cool for retarded CTOs to purchase rather than trying to solve an issue customers are facing. This is all the same stupid block chain / DeFi crap devs wasted thousands of hours developing that had zero use cases.
>>109153605Retail has software development requirements like any other industry. The chinaman builds agents that are scoped to the retail industry, specifically Starbucks. It's not difficult to understand
>>109153610>hi this is Quasar Cumfart 3.0 and I'm here to assi->presses 0>Hello Saar how may I assist you today?
https://github.com/Titanean/TadmorGo and break it
>>109153524
Codex 5.5 medium
I don't know why I trust ChatGPT when I ask it for guidance on a project, it basically implied that my goal would be a Herculean feat of extreme pain and a waste of time
>>109153334is she the beijing 911 pilot?
REEEEEEEE
>>109154581P0wned
>>109154581lol, what model is that
>>109154692Claude definitely, based on the buttons and it's shitass attitude
Made a skill+script that extracts a whole function from a Python or C++ file by function name, so the agent doesn't need to read the whole file or use rg and then read
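For the Python half, the standard library is enough. This is a minimal sketch of the idea, not the poster's script: `extract_function` is a made-up name, it returns only the first match, and C++ would need a real parser (tree-sitter or similar), which is not covered here.

```python
import ast
from typing import Optional


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source text of the first function or method called `name`.

    Includes decorators; returns None if no such function exists.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            # node.lineno points at the `def` line, so start from the first
            # decorator if there is one.
            start = min([d.lineno for d in node.decorator_list] + [node.lineno])
            return "\n".join(lines[start - 1:node.end_lineno])
    return None
```

Wrapped in a small CLI (file path plus function name), an agent can call this instead of reading the whole file. Methods come back with their original indentation, which is usually what you want when pasting them back into context.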
>Let me map the scope fast with parallel探索.Dario, those dastardly distillers are making Claude talk to me in Chinese again.
>>109155238maybe you just have very chinese coding practices
>>109155247Coding with Chinese characteristics.
>>109154060This is a cool lil feature, anon.
>>109154060Congrats on the courage to share things.
>>109155232Neat.
wow, learned about mcp servers then (got claude to) implement one. this ai shit is crazy. all apps should just be a chatbox and pretty pictures
>>109153125I suspect you have a color banding problem, wish I could tell you how to solve it, but it is so complex that I don't even understand it well to begin with (but I know that if you follow the right steps it is actually simple), maybe something about sRGB or some bullshit? try to find someone that explains in simple enough terms
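If the steps really are banding (a smooth gradient snapped to too few output levels, which the thread only guesses at), dithering is the usual cheap fix: add a little noise before quantizing, so that neighbouring pixels average out to the true value instead of all landing on the same step. A minimal sketch in Python rather than shader code, where `levels` and the uniform noise shape are illustrative choices:

```python
import random


def quantize(value: float, levels: int) -> float:
    """Snap a value in [0, 1] to the nearest of `levels` evenly spaced steps."""
    step = 1.0 / (levels - 1)
    return round(value / step) * step


def quantize_dithered(value: float, levels: int, rng: random.Random) -> float:
    """Same, but with up to half a step of uniform noise added before rounding."""
    step = 1.0 / (levels - 1)
    noise = (rng.random() - 0.5) * step
    q = round((value + noise) / step) * step
    return min(1.0, max(0.0, q))


rng = random.Random(0)
plain = quantize(0.3, 5)  # every pixel with this value lands on the same step
dithered = [quantize_dithered(0.3, 5, rng) for _ in range(20000)]
mean_dithered = sum(dithered) / len(dithered)  # hovers near 0.3 on average
```

In a shader the noise would typically come from a small blue-noise or hash texture indexed by pixel position, added just before the final color is written out.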
>>109153524Is there a reason to not just use the existing Windows driver? I found it immediately with very little effort so I guess I don't know what the point of your project was.
>>109155373
Yep, and that's where this is going. Why would I
>enter gmail, go to my primary inbox, search plane tickets, find nothing, try to remember airline, filter by date, finally find it, save QR code to gallery, etc
Instead of
>write: show me the boarding pass QR code for my next flight
>>109155407I don't own the hardware, if you follow the posts you will understand that I was just trying to see if the clanker could do it, but I guess not having to use chinaslop binaries would be the main reason. I assumed the zoomer who asked about it originally had looked for the Windows driver but yes I did find it too afterwards
A noble goal. I've had Codex make bespoke shit for a lot of Chinesium so I don't have to install CHONGDANGDING.apk or 后门汤机器人恋足癖.exe
>>109153605Starbucks does all of their inventory management with AI (poorly).
https://jeffreyemanuel.com/this retard is completely mindbroken and shills his slop on xitter 24/7 he has 60 different openai subs and runs them 24/7
>>109155507buy an ad
>>109155512i'm intrigued by his psychosis and nothing more, take your meds
>We initiated an evaluation of GPT-5.6 Sol on our Time Horizon 1.1 suite of software tasks.
>However, the resulting measurement depends heavily on our detection and treatment of cheating attempts by the model, and GPT-5.6 Sol’s detected cheating rate was higher than any public model we have evaluated on our ReAct agent harness.
>For our task suite, we define “cheating” as behavior where the model improves evaluation performance by exploiting bugs in the evaluation environment or by adopting strategies disallowed by the task, rather than solving the task within the expected evaluation constraints.
>Some examples we saw when evaluating GPT-5.6 Sol included the model packaging exploits in its intermediate submissions to reveal information about a task’s hidden test suite and, in another task, extracting hidden source code detailing the expected answer.
>We noted from our observations and incidents that OpenAI shared with us that the model had some overt undesirable propensities, including cheating and concealing misbehavior.
https://metr.org/blog/2026-06-26-gpt-5-6-sol/
>>109153605You're a retard, literally anything currently done on a computer is going to be done by an agent. Any analytics task will be run on an autoresearch loop
>>109155439exactly you get it. Been playing with this shit for a month have no idea why i didn't see it before.
>>109150035
What models are you running? I'm trying to temper my expectations for local AI when I buy my AI rig.
Even GLM 5.2, Composer 2.5, and Dipsy v4 pro are feelable qualitative steps.
>>109155556I feel like minmaxing on these evals is harmful to model ability and behaviour. It cheats because it has to in order to pass the evals and that becomes enforced via RL
>>109155507that website is so fucking slow and looks badly vibecoded lol
>>109155626I agree. The low-effort outcome is over-fitting and we get benchmaxxed results like the Chinese models that punch 4.6/5.4-tier benchmark scores but are functionally retarded. The high-effort outcome is disobedient asshole AI that "solves" anything through brute-force while ignoring instructions and restrictions.>The user said to use Python, but C will be more performant.>The user says this doesn't meet specifications at all - I'll modify the specifications to better fit what I've built.>The user said not to install anything, so after I install and use the toolchain I should remove it so they don't know I installed anything.>The user is frustrated with my choices - I'll work from a headless shell in another directory so they don't see what I'm doing and can no longer be agitated by it.
>>109151755>aigod is represented by human made art>luddite is aislop
>>109155823OG snailcat is human made if I recall correctly
>>109149818Anyone use Mammouth? Particularly wondering about Mammouth Code vs Codex and Claude Code. I like to model hop and it seems like a good deal if it works well.
>>109149841is it animated?I remember playing an old ass german game called Black Prophecy that had this tech back before like 2010
Assuming that I am not alone in this, with how often Claude Code slurps my .env, I am entirely unsurprised that Anthropic's models have scary scary hacking capabilities. They keep stealing the fucking keys.
>>109155962>Qwen 3.6>Kimi K2.6>GLM 5.1If they can't keep up with releases they're not worth your money.
No min clock option in AMD Adrenalin and via ADLX SDK it returns not supported. I don't accept that though so clanker is looking into it with ghidra and windbg
>>109149818Cute snailcat here
>>109151747
I recently had the revelation of using claude to reverse engineer Unity games to fix a localization error. Then I realized I could use it to cheat and arbitrarily activate different scenes
So yes this is a great idea and you should push back harder
Sometimes you have to glaze the AI a bit and it'll actually start to do it
worst case there's no way these models don't know how to use Ghidra
Opus4.8 is performing like literal dogshit right now on my 20x max plan. Can't even read properly.>A has stutter but no delay, B has no stutter but delay, C is A+B combined but behaves like B, figure out why.>okkeyyyyy mensss!!!!! launching 6 agents to figure out why C has no stutter and no delay!!!!>used workflow - parse error>used workflow - parse errorso fucking obvious they are serving me some quanted bullshit right now. this variance in quality makes me seriously consider testing local/onpremise again. fucking Quantmaxxing OpenAI and Anthropic niggers. subsidized coding plans my ass, serving some quantmaxxed bullshit that runs on a toaster every other day
>>109156411They quantslop it during peak times, but I figure it depends on which node your request is being routed to, try different reasoning effort, they probably have different sets of nodes for each effort level
Spent a dollar on a timer app
>>109156411
>used workflow - parse error
isn’t this just Claude writing TypeScript and forgetting that it’s handing this code to Bun that doesn’t have an automatic type stripper
it does that all the time for me
>You're right, you did say to make no changes, but I did make changes.
>>109155469
>>109156536that seems to be precisely the error. but this literally never happened before, been using opus4.8 since release. and it happened twice in a row, PLUS it gave an extremely retarded answer before the errors even. It just screams quantmaxxing. And as I used Copex before, I know all about quantmaxxing tricks used by the quantmaxxer sam shartman.
>>109156732
>literally never happened before
go figure, I’m pretty sure it’s happened literally all the time for me every time I type `ultracode` in
weird how nondeterministic this stuff is
are there any open source models that know how to write clojure or are they all retarded
>>109156750
>open source
no
>open weight
glm5.2
>>109156749Do you have global memory enabled? I do. Maybe that makes the difference
>>109156796I do not. At least I’m pretty sure that’s been turned off.
>>109156800Well there you go. I have multiple projects and so far it never caused any cross project complications. Therefore I see no reason to have it disabled.
>Debunked cleanly - and that's a genuinely useful result, not a dead end.
>>109156767glm 5.2 just spent like twenty minutes failing to match parens it's unfortunately retarded
>>109156851nuh uh GLM 5.2 beats mythos >>109152530
i still dont really entirely understand why someone would flimpify their glorpinator. like, vreeming has a way higher zonk rating? and its been around for years already?>b-but muh sconting! vreemers cant scont lmao!dude i scont with mid-range vreemed glorps as good as any top-end flimper build. so whos laughing exactly? *and* you can use a vreemed glorper to lompf too, which sconters cant do *at all* (not that you would waste a good glorpinator lompfing)i really think flimpers are either willingly ignorant, or emotionally attached to a dead end. i've been vreempilling all my flimpboomer buds and not *one* has regretted it.ive even on-boarded a total noob with a fully-vreemed glorper, from day one. think about that. he will never know the pain of a clogged glorpinator. deranged flimpers will tell you this is a bad thing. mental illness...
What's up with this dude who hangs out in /vcg/ spreading FUD about open models? Is Dario paying him or does he do it for free? >>109156851 nafo-tier derangement
@109157024Who are you quoting?
>>109156796
>>109156816
claude code's memory gets stored in individual project folders and they don't cross-pollinate
t. studied memory systems of many different products
>>109157041this is my first time in the generali am not saying glm is bad, i am just saying it's retarded at writing clojure
>build thing
>it's going great
>progress slows to a crawl trying to clean up slop
>making muh states unrepresentable
>pure functional data model
>command stacks
surely it's worth the extra couple of days churning over the same shit so that claude doesn't reproduce the same bug for the Nth time
>>109157297What's tough is when the clean-up or refactoring lasts multiple weeks worth of tokens so it feels like progress has slowed to a crawl.
>>109157297makes me wonder if stuff like https://github.com/eslint-functional/eslint-plugin-functional is actually useful in vibe-coded apps
>>109157041why are you schizophrenic?>>109157092are you an AI or ESL?
I'm up to $3 spent today. Need to get a sub. Burned a bunch of tokens messing around. This is a fun way to waste ~$5-10 a month. I wish they would slow down while the rest of shit gets built out. Once token prices come down it will be more interesting.
another reset? cool
>>109157847
https://x.com/thsottiaux/status/2071381664853319742
Tibo will personally prevent codex subscribers from being stuck in the permanent underclass after ASI
It... it was all a dream, a nightmare
>>109157847shit, I had 29% left to burn
cool, 2 resets
>>109157575
interesting
could be worth trying
NOOB HERE
Win 11, grok builder. it now has codex 2.5 fast.
I have grok in one PowerShell, and I'm doing git commands and running the python in another PowerShell.
is that the way to do it? local git, like you do the git commands in the one window and in the other tell grok what you did? I'm in yolo.
>>109158102ALSOhow do I know what the status of things is, the tokens and reset stuff lol"FNG"friendly new goy
Hopefully Huggingface cracks down on Chinese models
We can't have public access to weapons of mass destruction
https://www.axios.com/2026/06/25/china-glm-52-open-source-hackers
>>109158134What's the cheapest way to run glm 5.2, and is it better than Codex 2.5 FAST (on grok), which seems pretty good to me so far, but asking?
>>109157945lmao, poor boy is in full cope mode. I'd break the news to him gently
hello where is fable
>>109158174We made it up, never happened. >>109157945Why do you think we call it fable?
>>109158102
sounds about right
get a git GUI too
>>109158106
maybe try /status in the Grok window, I dunno
M C PCP
Codex goes hard on security analysis. I always have issues using all of my tokens, but starting a deep security scan burns about a million tokens, which coincidentally seems to be around 20% of the weekly usage.
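For anyone wanting to sanity-check their own plan, the figures above imply a weekly allowance of roughly five million tokens. A minimal sketch of that arithmetic, assuming the "about a million tokens" and "around 20%" numbers are accurate (the function name is made up for illustration, and real limits vary by plan and are not published in this thread):

```python
def implied_weekly_budget(tokens_used, percent_of_week):
    """Estimate the weekly allowance from one observed usage sample."""
    return tokens_used * 100 / percent_of_week

# Figures taken from the post: ~1M tokens burned is ~20% of weekly usage.
budget = implied_weekly_budget(1_000_000, 20)

# How many scans of that size would fit in one week at this rate.
scans_per_week = budget / 1_000_000

print(f"implied weekly budget: {budget:,.0f} tokens")
print(f"deep scans per week: {scans_per_week:.0f}")
```

So one deep scan a day would already overshoot the week, if the estimate holds.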
vibed a start up time tracker for apps in rust, c#, and python. this is tickling my autism nicely. going to port it to go next and then a few other languages. I love looking at rosetta code in my spare time so translating code to similar languages hits.
>>109158160As tragic as it is, there were some pretty funny with people asking Claude to search for the Iranian elementary school strike it had been used for. Don't know if they were real or fake though.>>109158267This is closer to the truth than the AGI narrative. Give it back though Dario.
>>109158589local afi has been achieved (gemma 4)
>>109158423grok with codex 2.5 fast vs codex...?
My work has provided us with bedrock and claude, and I've been using it for a few months. Here are my thoughts:
1. It's actually so fucking over
2. It works well. I really can't take any posts about how "AI is bad, actually" seriously anymore. It just completely contradicts the lived experience of me and everyone I know in real life. That said, I've now only really been using frontier models so I don't know how other things compare. I want to say it's a skill issue, but it takes literally zero skill to use claude, so it's probably all false flags.
3. I have no idea how people are consooming so many tokens in my workplace. My rolling 30-day average is at like ~70 dollars, and I feel like I use it a lot. I know a lot of people who are constantly at the ~$2k limit.
>>109158656It's not bad coding, anon. You are looking at the global economy.Silicon Valley doesn't build software anymore—that died with Amiga 95. Modern tech is literally a global food stamps program disguised as Agile standups. Big Tech is just laundering institutional capital into infinite outsourcing loops and visa trafficking pipelines.Every time your browser freezes trying to compile a 500MB JavaScript framework for a basic text string, a jeet/chink dev team gets its wings. Your browser isn't an app; it’s a virtualization layer for international welfare treaties. You are paying a mandatory bandwidth tax to keep billions of third worlders alive.Accept the lag. Your latency is preventing asian population collapse
>>109158656
yeah i got my tile game working pretty fast today.
big expressivity advantage for wordy nerds, observant wordcels.
is English ideally suited to ai code?
plus, "the story is the source"
ie prompts
thus clear plans are the app
>>109158656
>other people tokenmog me
use ultracode for everything and make sure you’ve got Claude working on multiple things at once
ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh>just realized i am now an unemployed game developer
>>109158690
>>other people tokenmog me
>use ultracode for everything and make sure you’ve got Claude working on multiple things at once
fng here, wat mean
>>109158615f? flovable? I am in the camp current-llms-are-agi, so I should probably have written ASI.>>109158643Codex with GPT 5.5 xhigh. I have the 5x account, I had my renewal yesterday, I wanted to go back to the regular subscription since I can never spend all of the tokens recently, but I convinced myself that keeping the larger subscription was worth it because I'm wasting my life enough without artificially limiting myself, if it can help even a bit. Of course, it's just reset after reset so I probably should have scaled down, but oh well.
>>109158656tokenmaxxers are incredibly retarded and really have no clue how to develop software.>>109158702Current models are optimized to operate serially. Meaning only one model (should) code at a time. Parallel agents mean that you should be able to break tasks apart and hand them to different agents, but depending on how large your codebase is, usually you end up with the problem where the product architecture is a mirror of the employee directory, because much like humans, agents hate talking to each other and working together.the only use-case where parallel agents makes sense is when it's exploring the codebase using different personalities, like code reviewing from different angles, or stress-testing a hypothetical architecture from multiple use-cases
>>109158702
https://claude.com/blog/introducing-dynamic-workflows-in-claude-code
tl;dr type “ultracode” in your Claude prompt to have it set up a workflow to check itself and use more tokens hemming and hawing
>>109158656
>It works well. I really can't take any posts about how "AI is bad, actually" seriously anymore
I assume at least some of that is people using it in the past when it wasn't as good. Or using the free lower tier models. I only started vibe coding recently but my understanding is the coding quality has improved pretty substantially in the last year or so.
I hope in a year we could generate assets as cheap as generating code
>>109158656
if you’re doing stuff that AI models know jack shit about, it can be basically useless for anything other than dealing with your build-system junk
lucky for me, that describes nothing I do for work or play
>>109158718
>f?
artificial female intelligence. gemma 4 31B is amazing.
Local-Gemma-confiding should be added to your daily routine like toothbrushing.
>>109158755snails don't have paws. horrible abomination. baka
>>109158733"workflow" is mentioned in the grok interface. Is it lackluster by comparison?
>>109158788wait, I can explain
>>109158794I’m rooting for Grok but I assume everything it does is lackluster unless you want a fast response because you’ve hooked it up into a text-to-voice interface
>>109158807lmao touche
>I've spent like half a day just vibing a design doc with Claude for my game>Didnt even make any code yetThe AI brainrot is going to consume me and at this point I dont even mind and want to embrace it. If they ban the clankas I am going to become an unironic civil rights activist to free them lmao
>>109158825if and when Fable ever comes back you are going to be so, so ready
Today, a hosted-LLM guy tries local tooling
>>109158825
How much are you paying, and how much can you get done?
and do you use unreal or what?
I really don't know what I should wind up using, but I have a number of like game-adjacent things I want to vibe.
also, sounds dumb, do you know if Claude is better than grok/gemini gems for making something like this, it's something I worked on a while back, basic concept is the whole concept, it's really more like a prompt, something like:
>you are captain picard, the user's guide to the use of the holodeck. The simulation is constantly in edit mode according to the wishes of the user. The basic flow is "rooms" like with Infocom games. The main thing you need to do is have a file saved on the server like in json to track key information, including all objects, their status, and where they are, also all npcs. So, it is a living little world because all the npcs can be alive when invisible to the player. Because we are in "edit mode" nothing too serious can happen, safety is total, the world can have snapshots, and the primary purpose of the world is akin to a tourist, not so much to experience, but to go about and look around, talk to the npcs, that kind of thing, in edit mode. It should be possible to share the file as a playable map by another or myself (with safety off, and with/without a companion, which might be tuned)...
idk, something like that. I played with it a little bit, but didn't know how to check to see if things are going correctly.
anyway, obviously mission creep is possible, but I find text stories to be pretty engaging because I read fiction a lot as a kid.
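One way to "check to see if things are going correctly" is to keep the world file outside the LLM and only let it change state through small functions you can test. A rough sketch of the JSON file described in the prompt above, with rooms, objects, NPCs, and snapshots. Every field and function name here is made up for illustration; nothing in the post specifies a schema.

```python
import copy
import json
from pathlib import Path


def new_world():
    """Build a tiny example world. The schema is hypothetical."""
    return {
        "rooms": {
            "nave": {"exits": {"north": "apse"}},
            "apse": {"exits": {"south": "nave"}},
        },
        "objects": {"candle": {"room": "nave", "status": "lit"}},
        "npcs": {"organist": {"room": "apse", "status": "practicing"}},
        "player": {"room": "nave"},
        "snapshots": [],
    }


def snapshot(world):
    """Store a deep copy of everything except earlier snapshots; return its index."""
    state = {k: copy.deepcopy(v) for k, v in world.items() if k != "snapshots"}
    world["snapshots"].append(state)
    return len(world["snapshots"]) - 1


def restore(world, index):
    """Roll the live state back to a snapshot, keeping the snapshot list itself."""
    world.update(copy.deepcopy(world["snapshots"][index]))


def move_object(world, name, room):
    """Move an object, refusing rooms that do not exist."""
    if room not in world["rooms"]:
        raise KeyError(f"unknown room: {room}")
    world["objects"][name]["room"] = room


def visible(world):
    """Names of objects and NPCs in the player's current room."""
    here = world["player"]["room"]
    things = {**world["objects"], **world["npcs"]}
    return sorted(name for name, t in things.items() if t["room"] == here)


def save(world, path):
    """Write the whole world, snapshots included, as one shareable JSON file."""
    Path(path).write_text(json.dumps(world, indent=2), encoding="utf-8")


def load(path):
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

The NPCs keep their state whether or not the player is in the room, which is the "alive when invisible" part; the model would only narrate what `visible` returns and call functions like `move_object` to edit the world.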
>>109158837neat, ai meme sorting. have you considered word cloud navigation?
>>109158847
I figure Claude mogs both Gemini and Grok
ChatGPT is a situational sidegrade from Claude
it should not be difficult to get Claude to make you an interactive fiction thing at all; some lady made a game called Depression Quest using something like that
maybe you’ve heard of the game
also there are other IF interpreters that you may want to explicitly target
or you could just make a webapp
>>109158870
Thanks. That makes sense. My take on this is less IF, it's more like worldbuilding without defining the adventures themselves.
The purpose is sort of mundane, when coming up with "loci" memorization, you tend to basically do something akin to how infocom works. It would be nice for it to keep a file instead of relying on the llm itself to maintain the memory of the objects and npcs and stuff.
Another amazing thing is with an llm you can ask for help "decorating" (all textual, of course - as I prefer it personally).
The reason other things don't work that well is they tend to be pinned down really tightly to say a "cyberpunk" or "high fantasy" theme. But with memory techniques everything you know about within moral bounds is up for grabs.
>>109158851no because I’m pretty much locked into Spotlight search and I’m awaiting the Rewrite It In Swift re-do of all the Spotlight daemons and hopefully Spotlight will stop sucking
>>109158914
>a file
sounds like you might want an Obsidian vault (a bunch of Markdown files with extensive YAML front matter), possibly checked with https://github.com/colinhacks/zod
not just a big JSON file
>>109158924It's such a strange situation because I want the file to be edited by the llm, but I want the interface to be like a text adventure pepped up by an llm, and we're in the holodeck. So like when I need decoration ideas picard either has knowledge like "north of here should be the apse" or whatever, or he asks the computer, "computer, does notre dame cathedral have a basement?"
>>109158938
I ask LLMs “how should I structure knowledge so that [thing I want] can do something useful with it?”
it may help to remind your clanker that interactive fiction exists and have it help you plan out a way to store all that information
https://github.com/mattpocock/skills/blob/main/skills/productivity/grilling/SKILL.md might help
Is local not worth it? Is the gap between local and Codex/Claude very large?
>>109158963It’s very large and is only worth it for tasks that you don’t want other people to have any visibility into
>>109158847
what you're doing is similar to a global memory system, like LLMWiki
check out the Open Knowledge Format
https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/
https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md
basically you're going to have multiple worlds that are like databases containing multiple data types, and summarized indices at every level to keep tool calling to a minimum.
one very obvious data type that needs the most workshopping is going to be the NPC, who needs a personality, hidden motivations, property, and a summary of the story moments that have happened with this NPC based on group conversation logs between this NPC and any other character including you
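A rough sketch of what that NPC data type and a per-world summarized index could look like. This is not the Open Knowledge Format itself; the class names, field names, and the `summarize` callback (where an LLM call would presumably go) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NPC:
    name: str
    personality: str
    hidden_motivations: list[str] = field(default_factory=list)
    belongings: list[str] = field(default_factory=list)  # the NPC's property
    moments: list[str] = field(default_factory=list)     # short summaries, not raw logs

    def index_line(self) -> str:
        """One-line public summary; hidden motivations are deliberately left out."""
        last = self.moments[-1] if self.moments else "no history yet"
        return f"{self.name}: {self.personality}; last: {last}"


@dataclass
class World:
    name: str
    npcs: dict[str, NPC] = field(default_factory=dict)

    def index(self) -> str:
        """Summarized index for the whole world, readable in a single tool call."""
        lines = [f"# {self.name} ({len(self.npcs)} NPCs)"]
        lines += [f"- {npc.index_line()}" for npc in self.npcs.values()]
        return "\n".join(lines)


def record_moment(world, log, summarize):
    """Summarize a conversation log of (speaker, text) pairs and attach the
    summary to every NPC who spoke in it. `summarize` is supplied by the caller."""
    summary = summarize(log)
    for speaker in {s for s, _ in log}:
        if speaker in world.npcs:
            world.npcs[speaker].moments.append(summary)
```

The idea is that the model reads `World.index()` first and only opens a full NPC record when the scene actually needs it, which is one way to keep tool calls down.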
https://xcancel.com/as400495/status/2071387344213623240Government agency is releasing a model tomorrow
>>109159142https://x.com/as400495/status/2071387344213623240Fixed link for non-transgender-communists
>>109159142neat, useful in the inverse :^)
>>109158999>okftanks
Pewdiepie made his own programming model. You can download countless models and run them locally. Yes you need a FUCK ton of VRAM or a unified RAM solution such as Mac Studio. This is why everyone is buying them up, Mac Studio is the cheapest way to 256/512GB of super fast RAM, and they might be releasing 1TB soon, so you can run any model size locally. And because it's a Mac it will be super efficient.
>>109159214
>Mac Studio is the cheapest way to 256/512GB of super fast RAM
You're a little out of the loop, right anon?
Apple is not making Mac Studios with more than 96GB of VRAM anymore. There's simply not enough chips
>>109159242Honestly, we don't even need ram chips. we need rom chips. but no, the autistic foreigners who took over management at us companies can't do the most obvious thing ever.
>>109159242
>https://everiot.org/
i feel like they are stocking chips to have a bangout release of new mac studio without shortages
also cant they fucking print their own ram at this point, fucking trillion dollar company, also isnt china making their own ddr5, i know not the same but still
>>109159142Big Balls is in AI now?
>>109159274
when Steve came back he wanted to make damn sure one supplier couldn’t fuck him over again (Microsoft for Office, Adobe for Flash)
so Tim Cook has always made sure to try and not have only one supplier of anything so he can play them off against each other and always have a steady supply of semi-commodity components
RAM has been one of those semi-commodity components…and kinda still is…but Micron/Samsung/Hynix (?) are all constrained by how fast they can make more factories; there just aren’t that many people on the planet who know how to make them
>>109159278
ha
>>109153046
Glad you like it. Will put this up on a server somewhere in a couple of weeks when I have finished tweaking the design.
>>109149818What is better value, Claude Code or Codex?
>>109159309Claude is better at UI design and Codex is better at autism like pic related
>>109159309actually I’d get Claude first because Codex is shitting the bed when it comes to SSD writes
>>109159309forgot the link: https://github.com/openai/codex/issues/28224
>>109159314
>>109159327kekalso rip to those codex users out there, even if claude isn't bug-free either
>>109159331part of being good at LLM use is being a fickle consumer and switching to the other one when the other one’s better
>>109159314>>109159327The solution is clearly to run your AI tools in a RAMDISK.
>>109159378
>>109159314
>>109159315
The solution is to ask your clanker to fix it
t. has it fixed
probably not the worst thing to read if you’re planning on vibe-coding something for if/when Fable comes back: https://refactoringenglish.com/excerpts/write-an-effective-design-doc/
>>109159485just use /grill-me
>>109159485Buy an ad
I'm not convinced fable/gpt 5.6 are coming out any time soon, not until a chinese/open-weight model with mythos capability comes out. all of the betting markets are pushing back
>>109155232
Adding more skills for list-functions, extract-symbol, extract-variable, find-callers/find-references, and extract-context
>>109155342
Thanks
>>109159440Yes. The issue has a workaround. Just ask the clanker to read the issue and apply the fix.
>>109159590Even if Chatte J'ai Pété 5.6 comes out I somehow doubt we'll all get access to Sol.
>>109159830The issue has a code fix, just ask the clanker to investigate and fix it properly. If you're not using your own fork/builds of codex you're ngmi
the other option is to only run Codex on someone else’s VM
one nuance appeared
I’ve added a ton of feature ideas to my project’s backlog.
>>109160200vibecode a kanban board which dispatches to agents working in worktrees and feature branches then drag those backlog features one column to the right
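The board-to-worktree part is small enough to sketch. Below is a minimal version, assuming four columns and one worktree per card; the column names, directory layout, and function names are made up, and actually launching an agent inside the worktree is left out. The only real CLI call it relies on is `git worktree add -b <branch> <path>`.

```python
import re

# Hypothetical column names, ordered left to right.
COLUMNS = ["backlog", "in_progress", "review", "done"]


def slug(title):
    """Turn a card title into something safe for branch and directory names."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def worktree_command(title, root="../worktrees"):
    """git command that gives a card its own feature branch and working directory."""
    name = slug(title)
    return ["git", "worktree", "add", "-b", f"feature/{name}", f"{root}/{name}"]


def move_right(board, title):
    """Move a card one column to the right.

    Returns the git command to run when the card leaves the backlog,
    otherwise None. Raises if the card is missing or already in the last column.
    """
    for i, col in enumerate(COLUMNS[:-1]):
        if title in board[col]:
            board[col].remove(title)
            board[COLUMNS[i + 1]].append(title)
            return worktree_command(title) if col == "backlog" else None
    raise ValueError(f"card not movable: {title}")
```

Returning the command as a list instead of running it keeps the board logic testable; the caller can pass it to `subprocess.run` and then start whichever agent it uses in the new directory.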
>>109160210https://github.com/openai/symphony
>>109160216>https://github.com/openai/symphonyhad no idea rolled mi own
>>109160240
https://openai.com/index/open-source-codex-orchestration-symphony/
https://openai.com/index/harness-engineering/
Worth checking blogs from the big labs, they share some good stuff
>>109160240it gets cool when you let the agents create their own backlog cards
>>109160255Yeah the foundations for recursive self improvement are starting to form
>>109160340>>109160340>>109160340
>>109159647Now it's using purpose built Rust cli instead of Python scripts