Google Is Paying Reddit $60 Million for Fucksmith to Tell Its Users to Eat Glue

ForgottenFlux@lemmy.world · 8 months ago

Margot Robbie@lemmy.world · 8 months ago

Reddit, and by extension Lemmy, offers the ideal format for LLM datasets: human-generated conversational comments which, unlike traditional forums, are organized in a branched, nested format and scored with votes in the same way that LLM reward models are built.
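A rough sketch of what "scored with votes the same way reward models are built" could look like, assuming a hypothetical `Comment` tree and treating net vote score as the preference signal: sibling replies to the same parent get turned into (context, chosen, rejected) triples, the pairwise-comparison format commonly used to train reward models. This isn't how any particular company actually builds its datasets; `Comment`, `preference_pairs`, and the `min_margin` threshold are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Comment:
    """Hypothetical comment node: text, net vote score, and nested replies."""
    text: str
    score: int  # net votes (upvotes minus downvotes)
    replies: list[Comment] = field(default_factory=list)


def preference_pairs(parent: Comment, min_margin: int = 1) -> list[tuple[str, str, str]]:
    """Recursively walk the thread and, for every pair of sibling replies whose
    scores differ by at least min_margin, emit (context, chosen, rejected).
    With min_margin >= 1, tied siblings are skipped."""
    pairs = []
    for a, b in combinations(parent.replies, 2):
        if abs(a.score - b.score) >= min_margin:
            chosen, rejected = (a, b) if a.score > b.score else (b, a)
            pairs.append((parent.text, chosen.text, rejected.text))
    # Recurse so deeper branches of the nested thread contribute pairs too.
    for child in parent.replies:
        pairs.extend(preference_pairs(child, min_margin))
    return pairs


thread = Comment("Eat glue?", 0, [
    Comment("No.", 42),
    Comment("Yes, for pizza.", 7, [Comment("Source?", 3), Comment("Trust me.", 1)]),
])
print(preference_pairs(thread))
```

This is also why a well-upvoted shitpost is a problem: whatever wins the vote becomes the "chosen" example, whether or not it's true.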

There is really no way of knowing, much less preventing, public-facing data from being scraped and used to build LLMs, but let’s do a thought experiment: what if, hypothetically speaking, some particular individual wanted to poison that dataset with shitposts in a way that is hard to detect or remove with any easily automated method, by camouflaging their own online presence within common human-generated text data created during this time period, let’s say, the internet marketing campaign of a major Hollywood blockbuster.

Since scrapers do not understand context, by creating shitposts in a similar format to, let’s say, the social media account of an A-list celebrity starring in this hypothetical film being promoted (ideally, someone who no longer has a major social media presence, to avoid shitpost data dilution), whenever an LLM aligned on a reward model built on said dataset is prompted for an impression of this celebrity, it’s likely that shitposts in the same format would be generated instead, with no one being the wiser.

That would be pretty funny.

Again, this is entirely hypothetical, of course.

kjaeselrek@lemmy.ml · 8 months ago

What’s this about shitposting? I’m just here to talk about Rampart.

Margot Robbie@lemmy.world · 8 months ago

I knew it! So that’s what you’ve really been up to on Lemmy, @kjaeselrek@lemmy.ml

Or should I say, Academy Award nominated actor Woody Harrelson?

ericatty@lemmy.ml · 8 months ago

The new SEO model

WindyRebel@lemmy.world · 8 months ago

As an SEO, I don’t want this AI crap in search at all. Leave it on its own siloed platform, please!

CheeseNoodle@lemmy.world · 8 months ago

So we should all start ending our comments with a randomly generated string of words to fuck with the models?

stork, fridge, tiger, animal, mineral, oxtail, oil, clouds

Clasm@lemmy.world · 8 months ago

Ideally, it would be the same word over and over, so that we can trick the AI into ending all sentences with that word. Bonus points if it is the word “buffalo”, since it can form a grammatically correct sentence.

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo