Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal (www.wired.com)
Posted by cantankerous_cashew@lemmy.world to Technology@lemmy.world · English · 20 hours ago · 30 comments · cross-posted to: technology@lemmy.world
rumba@lemmy.zip · 20 hours ago
The notorious piracy database in question is Library Genesis. Cached article: https://web.archive.org/web/20250110075821/https://www.wired.com/story/new-documents-unredacted-meta-copyright-ai-lawsuit/

CriticalMiss@lemmy.world · 19 hours ago
Earlier reports suggested they trained it on books from Bibliotik. What changed?

BetaDoggo_@lemmy.world · 12 hours ago
The llama-1 paper acknowledged the use of the books dataset; libgen isn’t mentioned in any of the papers, so this is new info.

halcyoncmdr@lemmy.world · 19 hours ago
Probably just both, honestly.
In for a penny, in for a pound.