>>108695609The indians are a real and pressing issue, quoting it doesn't disappear the problem
>>108695609speak for yourself, jam boys can't function without it.
>>108695609
surely this is a joke? anthropic doesn't really charge $10,000 USD to run a benchmark do they?
>>108698541
lol
i found a schizo post talking about a potential research avenue to get to AGI. People are getting real out there with their theories. https://medallurgy.substack.com/p/zero-has-meaning
>>108698541
$9999.99
>>108698541
opus tokens are very expensive
So what is this benchmark even measuring? How well does a smart human do on it?
>>108698625
substack has been a disaster for the intelligentsia
>>108699438
you can try it out yourself, it's a small browser game. you need to deduce the game rules by trying things out. when you have the rules in mind, you can combine them to get specific outcomes. it's testing model reasoning/deduction
>>108695609
That's because you aren't using openclaw
I have automated my Chad fishing dating app financial scam operations by using openclaw
10x my revenue
the scores are only this low because of the penalties for how many turns it takes to do a task.
raw pass rates were over 60% on day one.
so the future is clear: the more money you have, the smarter you'll be
>>108700754
>>108698625
holy shit that's a long read, but I also don't know enough about computers to refute it. has any research group tried to use bitnet in that way?
>>108701582
just have claude read it and tell you why it's bullshit
>>108695609
Well it shows pretty clearly that Google will end up winning the AI race while everyone else will run out of money in a few years. I don't see how anyone besides maybe Amazon can compete when Google has the full hardware and software stack, not to mention their hardware is already on v8 while the little guys don't even have v1 out yet.
>>108701684saar gupta and saar mugazambi redeem google sovereign chip! sovereign google chip will manage san francisco energy grid by 2030!
>>108701628
I asked Grok heavy, "this is a blogpost that is proposing a missed research avenue. Is this avenue bullshit or promising enough to take seriously"
So according to Grok it's not entirely bullshit. It didn't like the narrative framing of the piece and said it was reaching a bit with how much it talked about three. However the underlying idea and research is real. This was its conclusion:
"Bottom Line: Take It Seriously as a Research Prompt
This is not crank science or pure hype. It correctly identifies converging real technologies (BitNet-style quantization + emerging ternary hardware + quantum qudits) and a real pain point (LLM overconfidence and poor uncertainty handling). The specific angle—elevating the semantic role of zero in ternary representations, drawing on logic/VSA traditions—is novel enough to be worth testing and not obviously wrong.
Promising next steps it suggests (or that follow naturally):
- Train BitNet-style models with auxiliary calibration losses or "abstain" heads that explicitly use zero channels.
- Explore hybrid ternary + VSA architectures for better compositional uncertainty.
- Hardware-software co-design for CNFET ternary accelerators.
- Small-scale ablations comparing binary vs. ternary on calibration metrics (ECE, abstention accuracy, hallucination probes like KalshiBench-style evals).
If you're in AI research, hardware, or uncertainty quantification, this is worth a read and could spark productive experiments. It's the kind of cross-disciplinary synthesis that occasionally opens real avenues (cf. other Substack/rationalist posts that influenced scaling or mechanistic interpretability discussions). Treat the grander claims with healthy skepticism, but the core technical proposal has legs.
The avenue is real enough that ignoring ternary's representational potential entirely would itself be a missed opportunity as the hardware catches up."
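for anyone wondering what ECE (one of the calibration metrics in that list) actually is: it's just the gap between a model's stated confidence and its actual accuracy, averaged over confidence bins. here's a minimal sketch with synthetic numbers, standard equal-width binning, not tied to any of the proposed ternary models:

```python
# Expected calibration error (ECE): bin predictions by confidence,
# then average |accuracy - mean confidence| weighted by bin size.
# Synthetic inputs; standard 10-bin equal-width formulation.

def ece(confidences, correct, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # clamp conf == 1.0 into the last bin
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        err += (len(b) / total) * abs(acc - avg_conf)
    return err

# a model that says 0.9 on everything but is right half the time
# is badly calibrated (ECE = 0.4); saying 0.5 would be perfect here.
bad = ece([0.9] * 10, [True] * 5 + [False] * 5)
good = ece([0.5] * 10, [True] * 5 + [False] * 5)
```

a model with a working "abstain" channel would, in principle, push its low-confidence answers into bins where its accuracy actually matches.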
>>108701684
it's been clear to anyone paying attention google is going to be the only american company that survives this. they are low key and pop out incredible research papers every 6 months. they obviously have been working on this a long time.
anyone else remember google duplex? they had a human sounding assistant that could call and use a regular phone line 8 years ago, roughly 1 year after they released the attention is all you need paper. they learned their lesson early on being early to release AI features. let everyone else put out slop and test the market. then they will swoop in with highly efficient, more advanced shit than anyone else and dominate.
microsoft, apple, amazon and such will survive, with apple reaping the second most out of this whole situation. obviously jensen is laughing all the way to the bank, but they don't have skin in the LLM/AI game anyway. they are just supplying all the hardware and getting paid upfront for all this shit that will come crashing down. but it won't matter to nvidia because again, they already got paid.
>>108701413
Well couldn't old school neural nets probably have figured this out? Just train on the game. Unless the idea is to see whether a general problem solving LLM with vision and tool use can solve it. LLMs with all of those capabilities are fairly new for the most part.
>>108702689
>Unless the idea is to train a general problem solving LLM with vision and tool use can solve it.
that is the idea. training on the games themselves would be trivial.
arc-agi 3 is a bit different to its predecessors in that it hands the llms a very minimal harness and basically no instructions - the llm must intuit both that it's playing a game and the game's mechanics. the llm must also then finish within the same number of turns as the second best human who played the games to get a full score on the game (i believe even this may not be enough). additional turns are very heavily penalised, and so you end up with the graph that you see where many llms complete the games, but because they take so long, it basically doesn't count.
imo the efficiency part of this benchmark is actually good, because labs love to get high scores and they should be optimising for this.
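to see how "completes the game but takes too long" collapses to near zero, here's a toy version of a turn-penalized score. the actual arc-agi 3 formula isn't in this thread, so the linear decay below (full credit at or under the human turn budget, shaved down per extra turn) is purely a hypothetical stand-in:

```python
# Hypothetical turn-efficiency penalty, NOT the real ARC-AGI-3 formula.
# Assumption: full credit at or under the reference human's turn count,
# linearly decaying for every extra turn, floored at zero.

def penalized_score(solved: bool, turns_taken: int, human_turn_budget: int) -> float:
    """Score one game in [0, 1]."""
    if not solved:
        return 0.0
    if turns_taken <= human_turn_budget:
        return 1.0  # matched or beat the reference human: full credit
    # each turn beyond the budget shaves off a proportional slice
    excess = turns_taken - human_turn_budget
    return max(0.0, 1.0 - excess / human_turn_budget)

# solving in 2.5x the human's turns scores 0 under this toy rule,
# which is how a 60% raw pass rate can become a low headline number
games = [(True, 50, 20), (True, 22, 20), (False, 0, 20)]
scores = [penalized_score(*g) for g in games]
```

under any formula shaped like this, an llm that meanders for hundreds of turns gets credit for almost nothing, exactly matching the graphs people are posting.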
>>108698541Even if they waived the cost for benchmarks, it would still make sense to compute the equivalent cost of token usage, since cost is a part of the benchmark.
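the "equivalent cost" accounting is trivial: tokens times price per million, summed over input and output. the rates below are placeholders, not real anthropic pricing:

```python
# Back-of-envelope equivalent cost of a benchmark run's token usage.
# PRICE_PER_MTOK values are hypothetical placeholders, not real rates.
PRICE_PER_MTOK = {"input": 15.00, "output": 75.00}  # USD per million tokens

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar-equivalent cost of a run, even if the tokens were comped."""
    return (
        (input_tokens / 1e6) * PRICE_PER_MTOK["input"]
        + (output_tokens / 1e6) * PRICE_PER_MTOK["output"]
    )

# an agent looping "over and over until it accidentally finds the answer"
# burns input tokens fast; at these made-up rates this lands near $10k:
total = run_cost(input_tokens=400_000_000, output_tokens=53_000_000)
```

so even a waived bill translates directly into a dollar figure, which is presumably what the chart is reporting.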
>>108701797
if a model could actually assess that it doesn't know something and reason without being forced towards a decision, and then have multiple reasoning rounds where it could explore different paths, then that would be a large step change. they could also train it to not activate on certain safety controls they don't want it to do.
so overall I hope this doesn't get worked on, because it would mean jail breaking and getting an ai to do something would be incredibly difficult. what fun is an ai that you cannot trick into writing smut? ablation and other safety bypasses would not work at all.
models now are still trained on the knowledge and then fine tuned after the fact to not respond. but with this, its non response and refusal would be baked into its base architecture, with it never being trained on the data. the entirety of its knowledge base would be that it will just not engage in things that break its safety guidelines. current models are all still trained on porn and smut and then fine tuned afterwards to not engage, which can be worked around. a model trained this way would be structurally unable to respond and wouldn't have the knowledge on how to even if it could.
>>108701975
>with apple reaping the second most
They'll be paying out the ass to use Google's stuff THOUGH. Amazon will probably be up there with Google since they specialize in renting out the hardware that they design. Microsoft should be at the back of the pack since they only had an inference chip last I checked.
>>108703500msft has something called maia 200. not sure anyone's using it yet though.
>>108703500
I don't know about that. Apple I think is focusing more on the at home inference angle: supply the hardware people will use to run local. As models get cheaper to run, running local will become more and more common. I think Apple wants that slice.
Nvidia will continue taking potshots with things like DGX sparks but they don't actually want that business. It competes too much with their server money makers. Same reason why they took NVLink off of their prosumer products like the rtx 6000 PRO. If their 300W rack version had NVLink, people would seriously buy those instead of forking over for a $125,000 DGX station. Can't have 2 $8000 cards fuck that up. Which means Apple doesn't have any real competition in the desktop inference game.
>>108695609
>luddites STILL coping
just buy the subscription already retard
>>108704276
As I said, inference only, no training. Microsoft is far behind Google and AWS.
>>108698541
anon...
>new benchmark released
>heh look at how low these llms score
>one year passes
>benchmark is saturated
>but what about THIS new benchmark
Repeat for a few years; ASI.
>>108695609
>doesn't do anything
>replaces half the work force anyways
lol
>>108701684
>>108701975
retards that don't understand the finance side
google enshittified their search for a reason, they know gemini is bad for profit, that's why they were late to the party
it will end up here: https://killedbygoogle.com/
grok is the only one that actually survives because elon has full control
>>108698541
they basically loop it around over and over again until it accidentally finds the answer, that can get expensive
>>108704468
Elon is going to get shut down. He got gooners by being uncensored. Then he gets sued to hell and back and censors the fuck out of grok in response. His user base gets pissed and leaves. Rinse and repeat for all of his products over and over. Eventually a suit will stick and it will affect the entire AI landscape
>>108704468
If you think Google is going to fully shelve LLMs like gemini then you are retarded
>>108695609
Anyone who thinks we will get smart models that can think and reason by increasing parameters is naive and probably dumb. These people know it, but it's bad for investors if they have to admit they are scaling just to scale and don't actually have a solution
>>108701684
>>108701975
it does seem like google is the only one thinking 100 steps ahead. they're also investing in the open weight models with gemma.
>>108704468
grok is completely useless, it's not even in the running. elon will probably just kill it outright in the next couple years since its only use was image generation which was obliterated.
>>108698625
tl;dr?