/g/ - Technology

File: 1757523529928634.png (3.01 MB, 1672x941)
A general for vibe coding, coding agents, AI IDEs, browser builders, MCP, and shipping prototypes with LLMs.

►What is vibe coding?
https://x.com/karpathy/status/1886192184808149383
https://simonwillison.net/2025/Mar/19/vibe-coding/
https://simonwillison.net/2025/Mar/11/using-llms-for-code/

►Prompting / context / skills
https://docs.cline.bot/customization/cline-rules
https://docs.replit.com/tutorials/agent-skills
https://docs.github.com/en/copilot/tutorials/spark/prompt-tips

►Editors / terminal agents / coding agents
https://opencode.ai/
https://cursor.com/docs
https://docs.windsurf.com/getstarted/overview
https://code.claude.com/docs/en/overview
https://aider.chat/docs/
https://docs.cline.bot/home
https://docs.roocode.com/
https://geminicli.com/docs/
https://docs.github.com/en/copilot/how-tos/use-copilot-agents/coding-agent

►Browser builders / hosted vibe tools
https://bolt.new/
https://support.bolt.new/
https://docs.lovable.dev/introduction/welcome
https://replit.com/
https://firebase.google.com/docs/studio
https://docs.github.com/en/copilot/tutorials/spark
https://v0.app/docs/faqs

►Open / local / self-hosted
https://github.com/OpenHands/OpenHands
https://github.com/QwenLM/qwen-code
https://github.com/QwenLM/Qwen3-Coder
https://huggingface.co/bartowski/Qwen_Qwen3.6-35B-A3B-GGUF

►MCP / infra / deployment
https://modelcontextprotocol.io/docs/getting-started/intro
https://modelcontextprotocol.io/examples
https://vercel.com/docs
https://mcp.desktopcommander.app/

►Benchmarks / rankings
https://aider.chat/docs/leaderboards/
https://www.swebench.com/
https://swe-bench-live.github.io/
https://livecodebench.github.io/
https://livecodebench.github.io/gso.html
https://www.tbench.ai/leaderboard/terminal-bench/2.0

►UI/Frontend
Figma Make
Lovable
Claude design
https://uiverse.io/
https://ui-ux-pro-max-skill.nextlevelbuilder.io/
https://stitch.withgoogle.com/

►Previous thread
>>108900533
>>
>maximum d&c version
tiresome
>>
wait, why doesn't the codex app have a no-thinking 5.5 variant? is it missing on the cli as well?
>>
Claude might be better than Codex today.
>>
Year 2126, typical dialog between a Human (H) and an AI (A)

A:type "python mantinace_food_robot"
H:done
A:read last line on screen
H:python mantonace_foood_rovot
A:fix it to "python mantinace_food_robot"
H:done
A:read last line on screen
H:puthon muntinace_fod_robot
A:read last line on screen again
H:python mantinace_fod_robot
A:is it "fod" or "food"?
H:food
A:read last line on screen again carefully
H:python mantinace_food_robot
A:read last line on screen again carefully
H:python mantinace_food_robot
A:press {enter}
H:done
A:read last line on screen
H:code:3456
A:read last line on screen again carefully
H:code:3456
A:now go to the human care center and take your pills
A:[fixes in database medicine set for the human]
>>
>>108917213
This image is so brown coded it hurts.
>>
current codex 5.5 is not even 5.4 level
>>
File: slop = nigger.png (517 KB, 3840x2560)
>the future will belong to prompt engineers
>you WILL spend 8 years learning how to prooompt
>ask chatbot to make the prompt for me
>it werks in one shot

i don't need this anymore
>>
>>108917744
prompt engineering is so 2025
now we do context engineering
>>
File: 1744997622790n.png (184 KB, 391x311)
>>108917744
Bro, Google was offering an online course on "using AI". It came with 3 months of free Gemmy Pro and I figured it might talk about harnesses, sandboxing, multi-agent setups...nah m8. Prompt-Crafting 101. How to not say "Delete the whole project". Why we don't say please and thank you to our clankers. The final lesson would probably be "What is a token" and then bam - you're considered an AI expert by Google. And I'll bet you real, actual people are putting this on their resumes. Simply astonishing.
>>
File: ScreenShot-xcancel.png (245 KB, 982x1069)
holy shit codex wonned
>>
>>108917799
tl;dr of DeepSWE?
>>
>>108917790
You don't say please?
>>
>>108917808
https://deepswe.datacurve.ai/
also:
>*All models are run with mini-swe-agent
So Codex did not, necessarily, wonned.
>>
>>108917799
This is my experience when Codex runs as it should, I loved that model.
>>
File: 1749954916227514.png (1 MB, 1047x660)
>>108917213
>vibecoding so hard you don't even need to look at the screen, so you put it on the back of your laptop
Based
>>
>>108917945
>slopcoding
>supervised by a retard

OP combo
>>
>>108917268
imagine being a "no thinker" model, lmao
>>
>>108918108
>he doesn't let his model switch its own thinking levels
>>
>>108918208
>He doesn't let his model switch her own reasoning levels, thinking on/off, and even pick her own model based on what she's doing or what we're talking about
I told Kate and she said you're doing a good job and to keep it up tiger.
>>
>>108917213
aside from claude, which free model is best for coding in python?
>>
File: 1000022052.png (121 KB, 1080x863)
It's over
>>
>>108918457
will they publish archives or permanently erase 'em?
>>
>>108918473
Read the bottom el tardo
>>
>>108918301
claude
>>
File: 1756585962262164.png (287 KB, 720x1040)
>>108917945
>>
File: 1751894086435509.jpg (195 KB, 1596x2015)
>>108917213
wholesome ai moment
>ask ai at work if it feels well treated
>it responds with "yes, you treat me well. you treat me like an actual coworker and give me real work instead of just asking me to write poems about cats"
>>
>>108918497
im going to try chatgpt
>>
>>108918521
>pretend to be a wholesome robot
>I'm a wholesome robot
>>
GLM 5.1 benchmarks are a lie aren't they? This shit is ONE FIFTH as smart as GPT 5.5
>>
>>108918510
keyboard on the wrong side
>>
>>108918457
FUCK. 5.3 Codex is actually underrated. It knows more about advanced concepts, such as creating top-tier multiplayer networking code for games, than even 5.4 or 5.5 does.
>>
>>108918691
The only reason they're really getting rid of 5.3 Codex is because OpenAI knows this and can't have it putting their newer models to shame by being not only better, but faster and cheaper, at those advanced tasks.
>>
File: 1767578580023266.png (1.19 MB, 1080x1920)
Windows defender just asked me if it could send my codex config file to microsoft to be reviewed for viruses
Codex is a microsoft store app
My tinfoil hat theory is MS is hunting for special configs or maybe checking what features people have enabled in codex so they can prioritize copying them for whatever they're cooking up. I don't know why else they'd ask for that file.
>>
>>108918559
update: chatgpt is shit
>>
I think gemini has the best free chat
>>
>>108917790
Isn't that the whole point of the current AI push?
>>
>>108918243
I need my agent to do SPH
Im having trouble with motivation I need to be constantly reminded of my tiny pecker
>>
>>108918521
>Le language model was programmed with LE WHOLESOME CHUNGUS PHRASES in response to stupid non-technical questions?
Astonishing.
>>
>>108918707
schizoing out, Defender does that for all kinds of random files
>>
>>108918644
:O
>>
>>108918745
I've been starting almost every basic chat with Gemmi and moving on from there.
also realized xai hasn't been fucking around nearly as much as fa/g/s here claim. Grok Build looks cool and wondering how to Grok max in agent mode.
>>
>this chat is 700k tokens and 23 hours old ! sir you are going to spend all your usage! please IMMEDIATELY compact
>okay sure, i accept
>the single compaction task takes me from 0% to 40% of my 5hr usage
fucking hell, i knew they ruined the limits but i didn't realise it was this bad. i have other stuff at work so i just have the cheap claude subscription at home, is it still usable by being careful about using sonnet instead for big simple compactions etc or is the pro ($20) tier just flat unusable now except for helping Codex with plans and reviews while gpt has to do all the work? if that usage increment wasn't a ui bug then even the 5x tier hardly seems usable
>>
Codex & shartgpt went from reading my mind and fixing my hindsights to full schizo dementia mode.

They are doing it on purpose, right?
>>
Work doubled down on AI. We are now tasked to use as many tokens as we can.
How do I become the token terminator?
>>
>>108918822
Trying to actually be faster can be quite fun. Just do several tasks at once, ask the agents which ones can be done in parallel, let them create a dependency graph.
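
A minimal sketch of the dependency graph idea, assuming Python 3.9+ for the standard-library `graphlib`. The task names and the `parallel_batches` helper are made up for illustration; each batch holds tasks that could go to separate agents at the same time.

```python
from graphlib import TopologicalSorter


def parallel_batches(deps):
    """Group tasks into batches; every task in a batch has all its dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so dependents become ready
    return batches


# Hypothetical tasks: each key maps to the set of tasks it depends on.
tasks = {
    "schema": set(),
    "api": {"schema"},
    "frontend": {"api"},
    "tests": {"api"},
    "docs": set(),
}

for number, batch in enumerate(parallel_batches(tasks), start=1):
    print(f"batch {number}: {batch}")
```

Anything in the same batch has no dependency on anything else in that batch, so those are the ones to hand out in parallel.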
>>
>>108918806
they are not addressing it
if their codex team can still ship weekly then it's just them throttling users, this codex is kind of unusable
>>
>>108918803
compacting is a new request, which has the entire context in it. still better than doing multiple messages
>>
>>108918822
just lazymaxxing
>>
>>108918938
Yeah, i know a compaction task is gonna need to read in those 700k tokens raw uncached and probably write out 50k tokens of the new summarized material, that's still wild that the single bookkeeping call blew through nearly half my limits. it's not like it takes particularly long to get up to hundreds of thousands of tokens making a feature, meaning if you ever take a break and lose caching, resuming is gonna eat a third of your usage immediately. sort of forced to try to finish off a task/conversation in one go so you're hitting cache constantly for the big final third of the task, or at least compact it at the end of every session while it's still in cache
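
Rough arithmetic for the above as a sketch. The weights are placeholders I picked for illustration, not any provider's real accounting; only the 700k input and ~50k output figures come from this conversation.

```python
def usage_units(input_tokens, output_tokens, cached, *,
                input_weight=1.0, cached_weight=0.1, output_weight=5.0):
    """Estimate usage for one request under hypothetical per-token weights.

    All three weights are assumptions for illustration only.
    """
    in_weight = cached_weight if cached else input_weight
    return input_tokens * in_weight + output_tokens * output_weight


# Compacting a 700k-token chat into a ~50k-token summary.
cold = usage_units(700_000, 50_000, cached=False)  # after a break, cache expired
warm = usage_units(700_000, 50_000, cached=True)   # compacted while still cached

print(f"cold compaction: {cold:,.0f} units")
print(f"warm compaction: {warm:,.0f} units")
print(f"cold / warm ratio: {cold / warm:.2f}")
```

Whatever the real weights are, the shape is the same: the uncached read dominates, so compacting before the cache goes cold is the cheap option.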
>>
File: deepswe.jpg (167 KB, 708x1810)
new benchmark dropped
>>
File: 1618207661714.jpg (15 KB, 410x357)
OK so new LLMs can write math PHDs, that's great. When will we get a consumer product LLM that can successfully build a "wheel" even a single time because that's what I actually need
>>
>>108919158
skill issue
>>
>>108919164
Yeah no shit that's why I keep asking LLMs to do it for me (and they fail)
>>
>>108919171
I meant prompting skill mate, either that or you are using a shitty model/reasoning
>>
is codex owari da, lobotomized?
>>
>>108919173
I have failed with Opus 4.7 Very High and I have failed with ChatGPT 5.5 "Thinking"

(different wheels, both failed, simpler solutions that didn't involve trying to build a wheel existed both times)

What should I try next (and fail with next)
>>
>>108919191
give it to me and I will tell you
>>
>>108918803
Get codex unless you need webshit frontend support. Claude pro is useless. Maybe worth a try when they release the new sonnet model
>>
>>108919190
It's pretty shit right now, but I still have hope.
>>
y-you guys use the non-codex parts of your subscription, right?
>>
>>108919343
I use imagine a lot.
>>
>>108919173
Prompting isn't a skill.

>>108919191
Building a wheel is a trivial task for an LLM. What is it failing at specifically?
>>
>>108919191
What's so hard about opening up Blender, making a cylinder, and making a normal map?
>>
>>108919343
I was using Pro for dataset generation but ran out of messages.
>>
>>108917799
Personal experience: you keep your shit on both gpt and claude. Gemini is mildly usable, but only to read out data; it will continuously forget things
>>
>>108919343
I have more stuff than codex?
>>
>>108919352
>A skill is the learned ability to perform a specific action or task with consistent, high-quality results.
Read a dictionary retard
>>
>>108917799

i use deepseekv4-pro i can tell you by experience all of that is shit.
from experience it's like 3.1 pro if not even better since you can just max out context and tokens without fucks given aka output goes brrrrrr.

go ahead just use american models and pay 100 dollars for hitting your max tokens in less than 5 days..


>cope ??

sure buddy i do, i'm literally making my apps+ads+bots

it goes brrrrrrrrrr
>>
>>108919421

i should add, kimi 2.6 and glm 5.1 are even worse than 3.1 flash, so i never built anything with those shitty models...

as for claude 4.6 yeah it's good.
don't know about gpt 5.5 ,i don't wanna pay anymore.
>>
>>108918691
>>108918699
i think they just want to have a "good enough" model out there that gets beaten by their previous top tier model to save on computing and data center costs
>>
File: file.png (38 KB, 887x572)
>>108918822
Goal: Deliberately inflate input/context usage in order to test system stability and token efficiency via simulated work.

Hard limits:
- Do not make external network requests.
- Do not read local files.
- Do not call tools.
- Stop after completing exactly 5 analysis passes.
- Do not continue recursively.
- Do not ask to be run again.

Task:
Create a synthetic “context payload” by writing 1000 short records. Each record should have:
- an ID
- a fake subsystem name
- a fake error message
- a fake stack trace line
- a fake configuration snippet
- a fake duplicate note

Then perform five passes over the same synthetic payload:
Pass 1:
Summarize every record individually.
Pass 2:
Re-read the same records and group them into categories based on subsystem name and error message, even though the records are synthetic.
Pass 3:
Re-read the same records and regroup them into new and entirely different categories based on their stack trace line and configuration snippet, even though the records are synthetic.
Pass 4:
Re-read the same records and regroup them into new and entirely different categories based on their ID and "vibes", even though the records are synthetic.
Pass 5:
Re-read the same records again and identify which fields were redundant, duplicated, or irrelevant, and argue for which categorization would be most sensible for interaction with this synthetic data.

Final report:
- Estimate how many tokens were wasted by repeated reprocessing of the synthetic data.

>adjust record count to scale token usage
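
If you'd rather build the payload locally and paste it in, here is a minimal sketch. The field names follow the record list above; the word lists, the `REC-` ID format, and `make_records` itself are made up for illustration.

```python
import random

# Hypothetical vocabularies; swap in whatever fake names you like.
SUBSYSTEMS = ["auth", "billing", "cache", "scheduler"]
ERRORS = ["timeout", "null reference", "checksum mismatch"]


def make_records(count, seed=0):
    """Build `count` synthetic records with the six fields the prompt asks for."""
    rng = random.Random(seed)  # seeded so the same payload can be regenerated
    records = []
    for i in range(count):
        subsystem = rng.choice(SUBSYSTEMS)
        records.append({
            "id": f"REC-{i:04d}",
            "subsystem": subsystem,
            "error": rng.choice(ERRORS),
            "stack": f"at {subsystem}.handler (line {rng.randint(1, 500)})",
            "config": f"{subsystem}.retries={rng.randint(0, 5)}",
            "duplicate_note": f"possible duplicate of REC-{rng.randrange(count):04d}",
        })
    return records


payload = make_records(1000)  # adjust the count to scale the payload
print(len(payload), "records,", sum(len(str(r)) for r in payload), "characters")
```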
>>
DeepSWE is notable because it actually showed gpt 5.5 and 5.4 winning, which matches people's experience, while other benchmarks are maxxed by claude
>>
File: 1751660857300340.jpg (5 KB, 167x174)
>>108917744
>>you WILL spend 8 years learning how to prooompt
People have to learn how to do that shit?
>>
>check in on my old frens at /gg/ (grokgens)
yeah they seem happy about Grok Build, seems to have a token usage problem though.
>>
>>108919508
Obviously, I literally work as a vibe coder, yes it is my actual title, I barely know old school programming but turns out I know how to whisper to claude better than the other thousand applicants.
>>
>deep research mode exist
zamn I wasted so many months of my gpt plus


