@jaredj

jaredj@dataterm.digital · 2 years ago

Your reluctance to wield sole power reassures me, and as a reddit user for 16 years, I support this decision you’ve “unilaterally” made.

The network effect is real, and centralized services are simpler to use than federated ones. But “vulnerability” is the right word for this centralization of content, and I’m glad to have moved here.

As a further fix to that vulnerability, #razit recommends replacing all comments and posts that one owns on Reddit with gibberish, because if we don’t: (1) this repository of centralized content, and the votes indicating its quality, will be exploited for large language model training; and (2) if the content remains, future users will interact with it on Reddit, rather than finding another place, cementing the network effect.

I’ve already read exhortations, months ago, before this flap, to avoid handing one’s quality content to a company like Reddit, and to post on one’s own blog or somewhere similarly less-centralized. And the longer my posts, the more I’ve thought about that while writing them.

I don’t see any tidy “export” function on Reddit, and I haven’t been that active, so I doomscrolled my entire comment and post history, and downloaded it as a giant 4MB HTML file. (Users who have been more active on Reddit may not be able to do this.) I’ll have to use BeautifulSoup to extract my comments, but then I can post them on my blog or something.
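For anyone attempting the same thing, here is a rough sketch of the extraction step. To stay dependency-free it uses Python's standard-library html.parser rather than BeautifulSoup. It assumes each comment body sits in a div with the class "md", which is my guess at the markup of a saved old-Reddit page; check your own file and adjust. The names CommentExtractor and extract_comments are made up for this example.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Closing one of these ends a line of text in the extracted comment.
BLOCK_TAGS = {"p", "li", "blockquote", "pre", "div"}


class CommentExtractor(HTMLParser):
    """Collect the text of every <div> carrying the given CSS class."""

    def __init__(self, css_class="md"):  # "md" is an assumption, see above
        super().__init__(convert_charrefs=True)
        self.css_class = css_class
        self.depth = 0        # 0 means we are outside any comment body
        self.parts = []       # text fragments of the comment being read
        self.comments = []    # finished comments, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "br" and self.depth:
            self.parts.append("\n")
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1
                self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace and drop empty lines.
            lines = (" ".join(line.split())
                     for line in "".join(self.parts).splitlines())
            self.comments.append("\n".join(line for line in lines if line))
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_comments(html_text, css_class="md"):
    """Return the plain text of each comment body found in html_text."""
    parser = CommentExtractor(css_class)
    parser.feed(html_text)
    parser.close()
    return parser.comments
```

Reading the saved file with open(path, encoding="utf-8") and passing its contents to extract_comments should give a list of plain-text comments. The depth counting relies on the saved page having properly closed tags, which a browser's "save as" usually produces. With BeautifulSoup, the equivalent selection would be roughly soup.select("div.md") followed by get_text() on each match.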

While I don’t see the large language model deal coming for Reddit, I didn’t see GitHub Copilot coming either. I don’t really like the idea of snubbing (both of the) real people who need to read what I wrote, just to stick it to companies monetizing the content I’ve given away; but if there is an archive for people to read, and language modellers have to crawl web pages like the rest of us instead of getting the refined data, that seems more egalitarian.