/g/ - >we've got the best model ever but it's too good s - Technology

>>108553160
All this garbage about it being too powerful is a meme, but this benchmark is a big deal. Opus 4.1->4.5, which pushed vibe coding from useless to actually usable, was a jump of about 5 percentage points on SWE-Bench. Mythos is a jump of 13 points, and is nearing benchmark saturation. It's probably going to be a genuinely highly competent code model. Hope you all have good job security lmao.
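A rough way to see why a 13-point jump near the top of the scale matters more than it sounds: look at how much of the remaining failure rate it removes. The sketch below takes the 93.9% figure quoted elsewhere in the thread and subtracts the 13-point jump to get an assumed previous score of 80.9%; the function name is made up for illustration.

```python
def relative_error_reduction(old_score: float, new_score: float) -> float:
    """Fraction of the remaining failures removed when a score (in percent)
    rises from old_score to new_score."""
    if not (0.0 <= old_score < 100.0 and old_score <= new_score <= 100.0):
        raise ValueError("scores must satisfy 0 <= old < 100 and old <= new <= 100")
    old_error = 100.0 - old_score   # percent of tasks failed before
    new_error = 100.0 - new_score   # percent of tasks failed after
    return (old_error - new_error) / old_error


# Assumption: previous score = 93.9 - 13 = 80.9 (derived from the thread's numbers).
reduction = relative_error_reduction(80.9, 93.9)
print(f"share of remaining failures eliminated: {reduction:.1%}")
```

Under those numbers the jump removes roughly two thirds of the tasks the previous model still failed, which is what "nearing saturation" looks like in practice.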

Anonymous 04/08/26(Wed)00:13:36 No.108554335

>>108553160
>93.9% on SWE-bench
what happens when it hits 100%? will it be able to implement all of Windows given only the API docs?

Anonymous 04/08/26(Wed)00:17:42 No.108554363

>>108554335
What happens when you get a 100% in calculus? you move on to a harder subject

Anonymous 04/08/26(Wed)00:22:02 No.108554386

>>108553037
oyvey

Anonymous 04/08/26(Wed)06:50:48 No.108556085

File: file.png (217 KB, 1246x786)

>>108554335
>>108554363
https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

openai thought swe-verified was effectively saturated at 80% because the benchmark was rejecting valid answers, plus there were problems with training on test data (which was sourced from open source repos).
so mythos getting to 90+ is probably the model tailoring its solutions to match the benchmark?

that said, i wouldn't be surprised if the other swe-benches have similar issues, so we may have just saturated swe-pro as well
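To make the saturation argument concrete, here is a minimal sketch. It assumes, as a simplification, that a flawed task's tests reject every genuinely valid fix, so a model that only writes valid fixes cannot pass those tasks. The task counts are illustrative and not taken from the linked post, and the function names are hypothetical.

```python
def honest_ceiling(total_tasks: int, flawed_tasks: int) -> float:
    """Highest pass rate reachable if flawed tasks can never be passed
    by a merely valid solution (simplifying assumption)."""
    if total_tasks <= 0 or not 0 <= flawed_tasks <= total_tasks:
        raise ValueError("need total_tasks > 0 and 0 <= flawed_tasks <= total_tasks")
    return (total_tasks - flawed_tasks) / total_tasks


def excess_over_ceiling(score: float, ceiling: float) -> float:
    """Part of a reported pass rate that the honest ceiling cannot explain."""
    return max(0.0, score - ceiling)


# Illustrative numbers only: 500 tasks, 100 of them with over-strict tests.
ceiling = honest_ceiling(500, 100)
print(f"ceiling: {ceiling:.1%}, unexplained: {excess_over_ceiling(0.939, ceiling):.1%}")
```

If the ceiling really were around 80%, any score above it would have to come from matching what the tests expect rather than from just being correct, which is the overfitting worry raised above.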

Anonymous 04/08/26(Wed)06:52:28 No.108556092

>>108553037
max IQ to trust this physiognomy?

Anonymous 04/08/26(Wed)06:54:46 No.108556105

>>108553037
bug bounty hunters are quaking in their combat boots

Anonymous 04/08/26(Wed)07:09:32 No.108556180

>>108556085
Of course it's overfitted to the ass

Anonymous 04/08/26(Wed)07:15:40 No.108556210

>>108554313
Yes, it's a big deal. On the other hand, Opus 4.6 at its best is also very good; the practical problem is that it rarely runs at its best, and if we ever get access to Mythos the same thing will probably happen.

Anonymous 04/08/26(Wed)07:34:40 No.108556294

>on the juice and their lies

Anonymous 04/08/26(Wed)07:44:00 No.108556343

>>108553037
Why are most of these large AI companies run by jews? its not like they are even the competent ones as all of the AI work is done by chinese people (most of whom live in USA).

Anonymous 04/08/26(Wed)07:48:04 No.108556364

>>108553671
You've been alive for a long time.

Anonymous 04/08/26(Wed)07:49:20 No.108556367

>>108553050
ffmpeg accesses online? Eeeehhhh doubt.

Anonymous 04/08/26(Wed)07:51:39 No.108556381

>>108556343
they just move the bank notes

Anonymous 04/08/26(Wed)07:51:42 No.108556383

For vulnerabilities it would be about intercepting an online connection.
Analyzing the program and how it makes an online connection.
Assuming/determining possible program update parameters and intercept vectors.
Automatically doing this for all programs?

Do we really call an autoupdater for programs?

Anonymous 04/08/26(Wed)07:52:43 No.108556391

It's possible to do so much hacking with an AI.
Eeeehhhh why am I brainstorming evil thoughts for you niggers. You will actually go out and do it.

Anonymous 04/08/26(Wed)07:56:27 No.108556421

>>108553050
idk in my first two weeks vibe coding with local AI, I accidentally stumbled into some potential exploits in a common markdown parser

I wouldn't be surprised if a stupider AI would have also found the bug

Anonymous 04/08/26(Wed)07:57:40 No.108556429

>>108553050
>talk is cheap, send patches
hate this shit, i don't have anything to prove, i can just talk about my hobby, i don't have to prove a fucking thing.

Anonymous 04/08/26(Wed)08:00:57 No.108556451

>>108556429
YOU WILL work for free!

Anonymous 04/08/26(Wed)08:06:19 No.108556490

>>108553037
What is that *thing* in OP's picture? If I looked like that I'd probably never leave the house, until the date of the plastic surgery. He's got a major WrongFace elephant man look going on there. Maybe if he closed his mouth when smiling he'd only be hideous instead of horrifying.

Anonymous 04/08/26(Wed)08:06:42 No.108556496

File: turtle_original.jpg (31 KB, 550x544)

this is not /o/ mr ?

Anonymous 04/08/26(Wed)12:35:50 No.108558301

>>108556429
are you a company claiming to have made a groundbreaking new discovery? then yeah you have something to prove, retard

Anonymous 04/08/26(Wed)12:40:56 No.108558340

>>108553037
cunt looks like an unfunny TV comedian.
do wish he'd stfu, my AI Co. shitlist is already full for the foreseeable.

Anonymous 04/08/26(Wed)13:32:57 No.108558838

File: ad14759e004087f2637ce6479(...).png (24 KB, 994x991)

>>108556343
Jews have capital and international connections and you are a goy who sells his labor.

Anonymous 04/08/26(Wed)13:37:22 No.108558872

>>108556429
>I can say your product is shit
>no, I don't need to prove it
>just trust me bro

Anonymous 04/08/26(Wed)16:09:03 No.108560226

>>108558301
ffmpeg confirmed it is real
you lost luddite

Anonymous 04/08/26(Wed)16:15:47 No.108560279

>>108560226
I don't care who says it's true I won't believe it

Anonymous 04/08/26(Wed)16:24:43 No.108560339

>>108560279
>I'm retarded on purpose, so there
so brave

Anonymous 04/08/26(Wed)16:32:36 No.108560385

>>108553037
>no guys you don't understand the vibecoded apps with this thing are too bussin frfr we can't make money with this it's too good ong

Anonymous 04/08/26(Wed)16:46:10 No.108560486

>>108558301
"groundbreaking new discovery"
sir it was a basic vulnerability that wouldn't even get you two days of mcdonalds wages from a VDP, where do you people who know nothing of technology find a technology board?

Anonymous 04/08/26(Wed)16:53:42 No.108560545

>>108560486
the discovery being that your LLM can suddenly find and fix unknown bugs in leading software, not the bug itself

Anonymous 04/08/26(Wed)16:54:18 No.108560549

>>108560545
that's not a new discovery either? been doing it since 2022.

Anonymous 04/08/26(Wed)17:36:22 No.108560882

>>108556105

mr bounty likely has firearms waiting for anything related

Anonymous 04/08/26(Wed)17:51:53 No.108560997

>>108553037
Why would I care about their unreleased models, when a single chat can eat up rate limits quickly? I used to be able to chat nearly all day, until they fucked it up.

Anonymous 04/08/26(Wed)17:59:23 No.108561048

>>108560882
Just wait until he hears about the mutiny.

Anonymous 04/08/26(Wed)18:04:25 No.108561089

>>108560549
yes, and in principle it could generate code in 2022, but if you tried it then you realized how dogshit it was

Anonymous 04/08/26(Wed)18:19:27 No.108561177

File: check.png (45 KB, 1264x868)

Anonymous 04/08/26(Wed)18:27:41 No.108561228

>>108561177
>Charismatic Leadership
These faggots have the charisma of a proctoscopy performed with a cheese grater.

Anonymous 04/08/26(Wed)18:29:13 No.108561237

>>108561089
Yeah I remember when GPT-3 (the base model, not ChatGPT 3.5) first came out a bunch of people had their minds blown that you could say "HTML button the color of trump's hair" and it would make a yellow button, just enough to demonstrate it understood the semantic meaning of what you were asking. That was enough to be impressive back then. The context window of top models in 2022 ranged from 4k-8k tokens so they couldn't even in principle analyze a real codebase.
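For a sense of scale on the context-window point, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 4 characters per token, which is only an approximation and not any particular model's tokenizer; the helper names and file sizes are made up for illustration.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token count using a characters-per-token heuristic
    (an assumption, not a real tokenizer)."""
    return int(len(text) / chars_per_token)


def fits_in_window(files: list[str], window_tokens: int) -> bool:
    """True if all files together fit in the given context window."""
    total = sum(estimate_tokens(source) for source in files)
    return total <= window_tokens


# One hypothetical 16,000-character file is about 4k tokens: fits in 8k.
single_file = ["x" * 16_000]
# Ten hypothetical 20,000-character files are about 50k tokens: no chance.
small_project = ["x" * 20_000] * 10

print(fits_in_window(single_file, 8_000))
print(fits_in_window(small_project, 8_000))
```

Even a modest project blows past an 8k window under this estimate, so models of that era could at best look at one file at a time.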

Anonymous 04/08/26(Wed)18:36:14 No.108561276

>>108561048

he still hears or young gypsy got his ears?

Anonymous 04/08/26(Wed)19:19:27 No.108561554

>>108556343
They simply have easier access to the watering hole.
Same reason why the biggest companies are mysteriously and conveniently located in the cities which have local federal reserve branches.

Anonymous 04/08/26(Wed)20:40:41 No.108562024

>>108556343
I mean you could make the case for Altman sure, all he's ever done is run businesses. But Amodei is absolutely one of the competent ones, along with Ilya. Both were AI researchers (Amodei at Baidu and then Google Brain, Ilya at Google Brain) who got poached for OpenAI, leaving higher paying jobs for the promise of working for a non-profit that "benefits humanity", and left to make their own companies when they thought it was failing to live up to that goal. Frankly I think their fears are hysterical and misguided, but I don't believe for a second that they're not serious. All the internal memos and documents that have been leaked or revealed during the discovery process of Musk's lawsuit seem to corroborate that, along with the fact that Ilya nearly blew up the whole company over it and only failed because Microsoft threatened to just hire everyone into their own AI division.