Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal

cantankerous_cashew@lemmy.world · 1 month ago

rumba@lemmy.zip · 1 month ago

The notorious piracy database in question is Library Genesis.

Cached article:

CriticalMiss@lemmy.world · 1 month ago

Earlier reports suggested they trained it on books from Bibliotik.

What changed?

halcyoncmdr@lemmy.world · 1 month ago

Probably just both, honestly.

rumba@lemmy.zip · 1 month ago

In for a penny, in for a pound.

BetaDoggo_@lemmy.world · 1 month ago

The LLaMA-1 paper acknowledged the use of the books dataset. LibGen isn’t mentioned in any of the papers, so this is new info.