Exactly how would this work?
this method is known as bullshit in bullshit out
>>108199363
Distillation. Query the model, and train another on the results. It's not really an attack; it's more like paying for material and then using it. But that's a no-no because that's not fair, I guess.
>hehe we just had to scrape the whole internet, copyrighted or not, just a mere six gorillion tokens of data, no biggie
>DISTILLING MY MODEL? THAT'S ILLEGAL!
LLMs are just a statistical model. So unless they use some kind of cryptographic random process in the inference, there will always be a direct correlation between what you give it and what it gives you back.
If you have enough data to correlate inputs and outputs, you can train another model to make the same correlations between inputs and outputs.
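The query-and-train loop anons are describing can be sketched in a few lines. This is a toy, not any lab's pipeline: the "teacher" here is just a fixed bigram table standing in for an API, and the "student" is fit by counting instead of gradient descent, but the shape is the same: hammer the model with inputs, log the outputs, fit your own model to the pairs.

```python
import random
from collections import defaultdict, Counter

# toy "teacher": a fixed next-word table standing in for the big model's API
TEACHER = {
    "the": {"cat": 0.7, "dog": 0.3},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
}

def query_teacher(word, rng):
    """Sample the teacher's next word -- stand-in for one API call."""
    nxt = list(TEACHER[word])
    weights = [TEACHER[word][w] for w in nxt]
    return rng.choices(nxt, weights=weights, k=1)[0]

# step 1: query the teacher a lot and log the (input, output) pairs
rng = random.Random(0)
pairs = [(w, query_teacher(w, rng)) for w in TEACHER for _ in range(1000)]

# step 2: fit the student on the logged pairs (frequency counting here;
# a real distillation run would do gradient descent on the teacher's outputs)
counts = defaultdict(Counter)
for w, nxt in pairs:
    counts[w][nxt] += 1
student = {w: c.most_common(1)[0][0] for w, c in counts.items()}

print(student)  # the student's top predictions track the teacher's modes
```

With enough queries the student's statistics converge on the teacher's, which is the whole point: the correlation between inputs and outputs is the model.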
>>108199802
if you hide your blog from ai scrapers but keep it wide open for googlebot for le search ranking on a service nobody uses anymore, you're cucking yourself by serving your shit up to gemini on a silver platter
>>108201850
on my blog i serve 100k seo tags for every post, and they're all a variation of the Nword
>>108199363
(((Google)))
>>108199802
"Laws don't exist, only power."
Once you understand this, you realize there's no such thing as interpretation, only alliances.
>>108199363
heh
>>108201876
I'm trans btw, not sure that it matters
>>108199363
reversing weights
Outrageous lies and defamatory material!
Every company has access to the general literature (like libgen) and public web data (dated) to train the foundation model, i.e. a dumb/statistical model that predicts the next word. To get better at certain tasks like math/programming, they have to pay professionals to write good, long training data, and that process is not cheap. Chinese companies have been prompting for this curated/expensive data to improve their models' capability.
>>108201876
woah....
>>108199363
>recover multi trillion parameter model from like a few million tokens
retard
>>108199363
if i ask you 1000 questions then i know