Has anyone made or found a script to scrape a subreddit and import it to a Lemmy community? There are a handful of smaller subs that I’d like to mirror over to my instance (with author attribution) but haven’t found anything that works yet. https://github.com/rileynull/RedditLemmyImporter looks promising but links to a non-functioning Python script (tries to use Pushshift, which isn’t working at the moment).

  • phonelife@beehaw.org
    1 year ago

    You would need to pull the posts through the Reddit API with a personal API key, which is subject to rate limits, at least in theory.

    That would be the most efficient way. You’d need to write the post data to a database and the photos/videos to separate document storage.
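    A minimal sketch of that split, assuming SQLite for the post rows and a plain directory standing in for the document storage. The table layout and the `open_store` / `save_post` helpers are made up for illustration, not taken from any existing importer.

```python
import sqlite3
from pathlib import Path


def open_store(db_path=":memory:"):
    """Open (or create) the post database. The schema is illustrative only."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               reddit_id  TEXT PRIMARY KEY,
               title      TEXT NOT NULL,
               author     TEXT NOT NULL,
               body       TEXT,
               link       TEXT,
               media_path TEXT
           )"""
    )
    return conn


def save_post(conn, media_dir, post, media_bytes=None, media_ext=""):
    """Store one post row; write any media blob to disk and record its path."""
    media_path = None
    if media_bytes is not None:
        media_dir = Path(media_dir)
        media_dir.mkdir(parents=True, exist_ok=True)
        media_path = media_dir / f"{post['reddit_id']}{media_ext}"
        media_path.write_bytes(media_bytes)
    # INSERT OR REPLACE means re-running the import overwrites the row
    # for a given reddit_id instead of duplicating it.
    conn.execute(
        "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?, ?)",
        (
            post["reddit_id"],
            post["title"],
            post["author"],
            post.get("body"),
            post.get("link"),
            str(media_path) if media_path else None,
        ),
    )
    conn.commit()
    return media_path
```

    Keeping the author in its own column is what makes attribution possible later, when the rows are pushed to Lemmy.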

    Otherwise you could scrape it through a browser using a library like Puppeteer and store the results the same way. But that’s probably the worst way to do it, considering the Reddit API doesn’t charge yet. All you’re really looking for is the title, the content (text, link, image, or video), and the OP. Comments are likely a waste of time to grab in most cases and would be hard to integrate into Lemmy in its current state.
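    As a rough sketch of how little data that is, the snippet below pulls only those fields out of a Reddit listing that has already been fetched and parsed, and builds a post body that credits the original author. The field names (`title`, `selftext`, `url`, `author`, `is_self`) and the `t3` kind are what Reddit's listing JSON uses as far as I know, so check them against a real response. `extract_posts` and `with_attribution` are hypothetical helpers; fetching from Reddit and posting to Lemmy are deliberately left out.

```python
def extract_posts(listing):
    """Reduce a parsed Reddit listing to title, content, and original poster."""
    posts = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t3":  # t3 = a post; skip comments and others
            continue
        data = child["data"]
        is_self = data.get("is_self", False)
        posts.append(
            {
                "title": data["title"],
                "author": data.get("author", "[deleted]"),
                # Self posts carry text; everything else carries a URL
                # (external link, image, or video).
                "body": data.get("selftext", "") if is_self else "",
                "link": None if is_self else data.get("url"),
            }
        )
    return posts


def with_attribution(post):
    """Append a credit line so the mirrored post names its Reddit author."""
    credit = f"Originally posted by u/{post['author']} on Reddit."
    return f"{post['body']}\n\n{credit}" if post["body"] else credit
```

    Comments are ignored entirely here, in line with the point above that they are rarely worth mirroring.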