/g/ - Technology


Thread archived.
You cannot reply anymore.




What will happen once we run out of new human/natural content (e.g. text, pics, videos, etc.) to train future models on? What will happen when future models are fed AI-generated slop, since currently generated output is already hard to distinguish from natural data and most content (e.g. source code, books, articles, pics, etc.) is already contaminated with AI?
>>
they moved on to ai-generated, human-edited synthetic content ages ago
>>
>>108848443
I will piss in my ass
>>
>>108848443
It's already eating its own shit, which is why Claude is now as dumb as ChatGPT.
>>
>>108848443
It's happening already, it's called model collapse. They're trying to stave it off using ten million tricks, some of which temporarily work, but in the long run it can't be avoided. For models to keep up with "trends" and not seem outdated, they'll need to ingest slop, as slop is driving many of these "trends". Keeping everything static doesn't work either, so model collapse here we come.
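
A toy sketch of the mechanism, not a claim about how any real lab's pipeline behaves: assume a "model" is just a token frequency table, and each generation is re-estimated only from samples drawn from the previous generation. The function name `simulate_collapse` and all parameters (vocabulary size, sample count, Zipf-like starting weights) are made up for illustration. Rare tokens that happen to get zero samples can never come back, so the vocabulary can only shrink.

```python
import random
from collections import Counter


def simulate_collapse(vocab_size=50, samples_per_gen=200, generations=30, seed=0):
    """Return the number of tokens still in use after each generation.

    Generation 0 is a Zipf-like "human" distribution. Every later
    generation is fitted only to samples from the one before it.
    """
    rng = random.Random(seed)
    tokens = list(range(vocab_size))

    # Hypothetical starting distribution: weight 1/rank, so the tail is rare.
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    support_sizes = [sum(1 for w in weights if w > 0)]

    for _ in range(generations):
        # "Generate" a training set from the current model.
        sample = rng.choices(tokens, weights=weights, k=samples_per_gen)

        # "Retrain": the new model is just the empirical counts.
        counts = Counter(sample)
        weights = [counts.get(t, 0) for t in tokens]

        # Tokens with zero count are gone for good.
        support_sizes.append(sum(1 for w in weights if w > 0))

    return support_sizes


if __name__ == "__main__":
    sizes = simulate_collapse()
    print("distinct tokens per generation:", sizes)
```

The printed list never increases. Mixing some fresh samples from the original distribution back in at each generation would slow the loss in this toy; whether that carries over to real models is a separate question.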
>>
>>108848443
they will pay people to make more content i guess. But bubble collapse is more likely
>>
>>108848762
For some things it can work in a narrow scope.
It might work for most programming problems if you only care about technically solving a specific problem, but the solutions will probably get more and more verbose the more you train on synthetic data, and they will certainly not be very secure.
>>
>>108848443
how will we ever "run out of new content"?
> make humanoid robot
> let it walk around doing shit and collecting data from sensors, cameras, mics, etc
> keep improving until it has better dexterity than humans

Tesla already does this with their cars, they're robots that are constantly collecting new data. Eventually robots will surpass humans in data collection amount and quality as well.
>>
>>108848887
This might be easier for pics and videos and other specific types of data that can be used for specific types of machines (e.g. robots). But what about text? The internet has been scraped to death, all media articles have been scraped, all books since ancient Egypt have been scraped, and almost all code bases, open source and lately closed source used in LLM sessions, have been scraped. You can't introduce new, correct, useful text at the same pace that LLMs have been consuming it in the past 3 years.
>>
>>108848887
Nah, Tesla collecting data was relevant 5 years ago, now they have to build up artificial worlds where they can simulate edge cases that are unviable to record irl.
And those artificial worlds are largely built using AI tools, and they run the simulations millions of times to produce data.
>>
>>108848443
>future models are fed AI generated slop data since currently generated output is already hard to distinguish from natural data
That's part of the reason why companies and state/federal legislation are pushing for the use of one's real ID for anything online, so they can segregate viable training data from the shitflinging. The other part is mass surveillance. We can't have Chyna winning the AI race now can we?

Thirdies are producing AI-generated shit at an unsustainable level. It's kinda funny actually, they're shilling it so hard that the AI companies themselves can't keep up with it. To all the thirdies on this board, I say have at it!
>>
>>108848443
I'm not convinced this is a problem
you can keep the models as they are and they are already doing useful work
you can just drip feed it any useful new knowledge whenever it makes sense
maybe we'll need to retrain in a few hundred years when english mutates into a different language. if humans are still around
>>
>>108849455
they will be less useful for many things. This shit might even kill the free internet as we know it, since it will stop being profitable to post new content if AI just steals everything lol. Everything will turn into the same AI slop, stagnated in the 2020s
>>
>>108848443
>flood the internet with genned slop
>gauge niggercattle response
>select top performers to form a new training set
Sounds simple enough to me.
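
The selection step in the greentext above can be sketched roughly like this, assuming each generated item already comes with some engagement number attached. The function name `select_top_performers`, the `(text, score)` pair format, and the `keep_fraction` parameter are all hypothetical; nothing here is a real pipeline.

```python
def select_top_performers(items, keep_fraction=0.1):
    """Keep the highest-scoring fraction of generated items.

    items: list of (text, engagement_score) pairs.
    keep_fraction: share of items to keep, in the range (0, 1].
    Returns the kept texts, best first.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Always keep at least one item if there is anything to keep.
    keep = max(1, int(len(items) * keep_fraction))

    # Rank by engagement score, highest first.
    ranked = sorted(items, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:keep]]


if __name__ == "__main__":
    posts = [("a", 3), ("b", 10), ("c", 1), ("d", 7)]
    print(select_top_performers(posts, keep_fraction=0.5))
```

Note that this ranks purely by engagement, so it selects for whatever gets a reaction, which is not necessarily what is correct or useful.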


