/g/ - Technology


File: 1768415622721909.png (2.24 MB, 1208x1022)
Exactly how would this work?
>>
this method is known as bullshit in bullshit out
>>
>>108199363
Distillation. Query the model and train another on the results. It's not really an attack; it's more like paying for material and then using it. But that's a no-no because that's not fair, I guess.
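mechanically it's about this simple. sketch only -- assumes an OpenAI-style client and a placeholder teacher model name, swap in whatever API you're actually milking:
[code]
# distillation step 1: buy completions from the teacher, keep the pairs
import json
from openai import OpenAI

client = OpenAI()  # teacher's API, paid per token like any other customer

prompts = [line.strip() for line in open("prompts.txt")]

with open("distill.jsonl", "w") as out:
    for p in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder teacher model
            messages=[{"role": "user", "content": p}],
        )
        # each (prompt, answer) pair becomes one training example
        out.write(json.dumps({
            "prompt": p,
            "response": resp.choices[0].message.content,
        }) + "\n")
[/code]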
>>
File: 1662069717962214.jpg (4 KB, 250x250)
>hehe we just had to scrape the whole internet, copyrighted or not, just a mere six gorillion tokens of data, no biggie
>DISTILLING MY MODEL? THAT'S ILLEGAL!
>>
LLMs are just a statistical model. So unless they use some kind of cryptographic random process in the inference, there will always be a direct correlation between what you give it and what it gives back.
If you have enough data to correlate inputs and outputs, you can train another model by teaching it to make the same correlations between inputs and outputs.
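and that second half is just plain supervised fine-tuning on the collected pairs, nothing exotic. sketch with HF transformers, gpt2 as a stand-in student and distill.jsonl as the pair file:
[code]
# distillation step 2: make the student reproduce the teacher's
# input->output mapping via ordinary next-token cross-entropy
import json, torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in student
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

pairs = [json.loads(line) for line in open("distill.jsonl")]

model.train()
for ex in pairs:
    text = ex["prompt"] + "\n" + ex["response"] + tok.eos_token
    batch = tok(text, return_tensors="pt", truncation=True, max_length=512)
    # labels = input_ids: standard causal-LM loss, so the student learns
    # to emit the teacher's answer after the prompt
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
[/code]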
>>
>>108199802
if you hide your blog from ai scrapers but keep it wide open for googlebot for le search ranking on a service nobody uses anymore, you're cucking yourself by serving your shit up to gemini on a silver platter
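for the record, blocking the dedicated scrapers does nothing about google: googlebot's crawl can feed gemini unless you also set the separate Google-Extended token (which, per google, doesn't affect search ranking). something like:
[code]
# robots.txt -- blocking the dedicated AI crawlers:
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# ...does nothing about google. Googlebot's crawl can feed Gemini
# unless you also opt out via the separate Google-Extended token:
User-agent: Google-Extended
Disallow: /

# Googlebot itself stays allowed, so search ranking is unaffected
User-agent: Googlebot
Allow: /
[/code]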
>>
>>108201850
on my blog i serve 100k seo tags for every post, and they're all a variation of the Nword
>>
>>108199363
(((Google)))
>>
>>108199802
"Laws don't exist, only power"
Once you understand this, you realize there's no such thing as interpretation, only alliances.
>>
>>108199363
heh
>>
>>108201876
I'm trans btw, not sure that it matters
>>
>>108199363
reversing weights
>>
Outrageous lies and defamatory material!
>>
Every company has access to the general literature (like libgen) and public web data (dated) to train the foundation model, i.e. the dumb statistical model that predicts the next word. To get better at certain tasks like math/programming, they have to pay professionals to write good, long training data, and that process is not cheap. Chinese companies have been prompting for this curated/expensive data to improve their models' capability.
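that curated data is mostly just chat-format records, something like this (exact layout varies per lab, this is the common convention; one JSON object per line in practice, pretty-printed here):
[code]
{"messages": [
  {"role": "system", "content": "You are a careful math tutor."},
  {"role": "user", "content": "Prove that sqrt(2) is irrational."},
  {"role": "assistant", "content": "Assume sqrt(2) = p/q in lowest terms. Then 2q^2 = p^2, so p is even; write p = 2k, then q^2 = 2k^2, so q is even too, contradicting lowest terms."}
]}
[/code]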
>>
>>108201876
woah....
>>
>>108199363
>recover multi trillion parameter model from like a few million tokens
retard
>>
>>108199363
if i ask you 1000 questions then i know how you answer


