/g/ - Technology


Thread archived.




File: images(85).jpg (20 KB, 588x441)
>we've got the best model ever but it's too good so you got can't have it
>>
goy*
>>
File: Ffmpeg.png (438 KB, 1182x1256)
>>108553037
What now?
>>
>>108553050
>one example
>>
>from 80.8% to 93.9% on SWE-bench
it's over
>>
>>108553037
the model was promised to me 3000 years ago
>>
>>108553037
ai is only safe when it's in the hands of megacorporations and governments
>>
>indians are in charge of all tech companies
>jews are in charge of all AI companies
>>
>>108553160
All this garbage about it being too powerful is a meme, but this benchmark is a big deal. Opus 4.1->4.5, which pushed vibe coding from useless to actually usable, was a jump of about 5 percentage points on SWE-Bench. Mythos is a jump of 13 points, and is nearing benchmark saturation. It's probably going to be a genuinely highly competent code model. Hope you all have good job security lmao.
>>
>>108553160
>93.9% on SWE-bench
what happens when it hits 100%? it will be able to implement the whole Windows given only API docs?
>>
>>108554335
What happens when you get a 100% in calculus? you move on to a harder subject
>>
>>108553037
oyvey
>>
File: file.png (217 KB, 1246x786)
>>108554335
>>108554363
https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

openai thought swe-verified was effectively saturated at 80% because the benchmark was rejecting valid answers, plus the problem of training on test data (which was being sourced from open source repos).
so mythos getting to 90+ is probably the model tailoring its solutions to match the benchmark?

that said, i wouldn't be surprised if the other swe-benches have similar issues, so we may have just saturated swe-pro as well
>>
>>108553037
max IQ to trust this physiognomy?
>>
>>108553037
bug bounty hunters are quaking in their combat boots
>>
>>108556085
Of course it's overfitted to the ass
>>
>>108554313
Yes it's a big deal. On the other hand Opus 4.6 at its best is also very good, the practical problem is that it rarely runs at its best and if we ever get access to Mythos the same thing will probably happen.
>>
>on the juice and their lies
>>
>>108553037
Why are most of these large AI companies run by jews? its not like they are even the competent ones as all of the AI work is done by chinese people (most of whom live in USA).
>>
>>108553671
You've been alive for a long time.
>>
>>108553050
ffmpeg accesses online? Eeeehhhh doubt.
>>
>>108556343
they just move the bank notes
>>
For vulnerabilities it would be about intercepting an online connection.
Analyzing the program and how it makes an online connection.
Assuming/determining possible program update parameters and intercept vectors.
Automatically doing this for all programs?

Do we really call an autoupdater for programs?
>>
It's possible to do so much hacking with an AI.
Eeeehhhh why am I brainstorming evil thoughts for you niggers. You will actually go out and do it.
>>
>>108553050
idk in my first two weeks vibe coding with local AI, I accidentally stumbled into some potential exploits in a common markdown parser

I wouldn't be surprised if a stupider AI would have also found the bug
>>
>>108553050
>talk is cheap, send patches
hate this shit, i don't have anything to prove, i can just talk about my hobby, i don't have to prove a fucking thing.
>>
>>108556429
YOU WILL work for free!
>>
>>108553037
What is that *thing* in OP's picture? If I looked like that I'd probably never leave the house, until the date of the plastic surgery. He's got a major WrongFace elephant man look going on there. Maybe if he closed his mouth when smiling he'd only be hideous instead of horrifying.
>>
File: turtle_original.jpg (31 KB, 550x544)
this is not /o/ mr ?
>>
>>108556429
are you a company claiming to have made a groundbreaking new discovery? then yeah you have something to prove, retard
>>
>>108553037
cunt looks like an unfunny TV comedian.
do wish he'd stfu, my AI Co. shitlist is already full for the foreseeable.
>>
>>108556343
Jews have capital and international connections and you are a goy who sells his labor.
>>
>>108556429
>I can say your product is shit
>no, I don't need to prove it
>just trust me bro
>>
>>108558301
ffmpeg confirmed it is real
you lost luddite
>>
>>108560226
I don't care who says it's true I won't believe it
>>
>>108560279
>I'm retarded on purpose, so there
so brave
>>
>>108553037
>no guys you don't understand the vibecoded apps with this thing are too bussin frfr we can't make money with this it's too good ong
>>
>>108558301
"groundbreaking new discovery"
sir it was a basic vulnerability that wouldn't even get you two days of mcdonalds wages from a VDP, where do you people who know nothing of technology find a technology board?
>>
>>108560486
the discovery being that your LLM can suddenly find and fix unknown bugs in leading software, not the bug itself
>>
>>108560545
that's not a new discovery either? been doing it since 2022.
>>
>>108556105

mr bounty likely has firearms waiting for anything related
>>
>>108553037
Why would I care about their unreleased models, when a single chat can eat up rate limits quickly? I used to be able to chat nearly all day, until they fucked it up.
>>
>>108560882
Just wait until he hears about the mutiny.
>>
>>108560549
yes, and in principle it could generate code in 2022, but if you tried it then you realized how dogshit it was
>>
File: check.png (45 KB, 1264x868)
>>
>>108561177
>Charismatic Leadership
These faggots have the charisma of a proctoscopy performed with a cheese grater.
>>
>>108561089
Yeah I remember when GPT-3 (the base model, not ChatGPT 3.5) first came out a bunch of people had their minds blown that you could say "HTML button the color of trump's hair" and it would make a yellow button, just enough to demonstrate it understood the semantic meaning of what you were asking. That was enough to be impressive back then. The context window of top models in 2022 ranged from 4k-8k tokens so they couldn't even in principle analyze a real codebase.
>>
>>108561048

he still hears or young gypsy got his ears?
>>
>>108556343
They simply have easier access to the watering hole.
Same reason why the biggest companies are mysteriously and conveniently located in the cities which have local federal reserve branches.
>>
>>108556343
I mean you could make the case for Altman sure, all he's ever done is run businesses. But Amodei is absolutely one of the competent ones, along with Ilya. Those two were AI researchers (Amodei at Baidu and then Google Brain, Ilya at Google Brain) who got poached for OpenAI, leaving higher paying jobs for the promise of working for a non-profit that "benefits humanity", and left to make their own companies when they thought it was failing to live up to that goal. Frankly I think their fears are hysterical and misguided but I don't believe for a second that they're not serious. All the internal memos and documents that have been leaked or revealed during discovery in Musk's lawsuit seem to corroborate that, along with the fact that Ilya nearly blew up the whole company over it and only failed because Microsoft threatened to just hire everyone into their own AI division.



