- cross-posted to:
- hackernews@lemmy.smeargle.fans
- hackernews@derp.foo
Has anyone used ArchiveBox for self hosted web archiving? If so, what are your thoughts on it compared to Internet Archive or other publicly available services?
I don’t particularly like the graphical interface shown at https://demo.archivebox.io/public/. In my opinion, it displays too much at once.
For my part, I use Wallabag to save single web pages. I think its graphical interface is better, though it is not perfect either.
I’ll check Wallabag out as well
I used it, but unfortunately it did not meet my needs. I’m interested in a full mirror of a website, while ArchiveBox focuses on a single webpage and follows links at most one level deep. I use wget personally, but if your goal is to archive a single webpage then ArchiveBox might be a good choice.
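For anyone wondering what the wget route looks like, here is a rough sketch, not necessarily the exact invocation used above. It wraps a commonly used set of GNU wget mirroring flags in Python; the helper names `build_mirror_command` and `mirror_site` and the default `mirror` output folder are made up for illustration, and it assumes wget is installed and on your PATH.

```python
import subprocess


def build_mirror_command(url: str, dest_dir: str = "mirror") -> list[str]:
    """Return a wget argument list for mirroring a whole site (hypothetical helper)."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the local copy works offline
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--page-requisites",   # also fetch images, CSS, and scripts
        "--no-parent",         # never climb above the starting directory
        f"--directory-prefix={dest_dir}",  # assumed output folder
        url,
    ]


def mirror_site(url: str, dest_dir: str = "mirror") -> int:
    """Run wget and return its exit code; needs wget on PATH."""
    return subprocess.run(build_mirror_command(url, dest_dir)).returncode
```

Calling `mirror_site("https://example.com/")` would start the download. Large sites can take a long time, so you may want to add rate limiting flags of your own.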
Thanks for the info! Single page with no link following is all I need for this project, so I’ll give it a go.
I have been experimenting with it. For what it is, it works pretty well … for now. I have concerns about the fact that it’s a ton of moving parts basically duct-taped together by an abuse of the Django admin (that’s the web app platform it’s based on, which I was a developer for long ago). Also, the search function is primitive at best. I don’t think it’s going to be my long-term solution for this need, but maybe I’m wrong.
The archived pages are available as files on disk. I also added a script that generates an index.html so I can browse them without starting the program. Basically, the only time I run ArchiveBox code is when adding a new site. And I never look at the GUI; it brings nothing to the table.
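A script like the one mentioned above could look roughly like this. It is only a sketch under assumptions, not the poster's actual script: it assumes each snapshot lives in its own subfolder containing an `index.html`, and the function name `build_index` is hypothetical. Adjust the layout check to match your own data directory.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(archive_dir: Path) -> Path:
    """Write archive_dir/index.html linking to every snapshot folder.

    Assumed layout: archive_dir/<snapshot>/index.html
    """
    items = []
    for snapshot in sorted(p for p in archive_dir.iterdir() if p.is_dir()):
        # Skip folders that do not contain a saved page.
        if not (snapshot / "index.html").is_file():
            continue
        href = quote(snapshot.name) + "/index.html"
        label = html.escape(snapshot.name)
        items.append(f'<li><a href="{href}">{label}</a></li>')

    page = (
        "<!doctype html>\n"
        '<meta charset="utf-8">\n'
        "<title>Archive index</title>\n"
        "<ul>\n" + "\n".join(items) + "\n</ul>\n"
    )
    out = archive_dir / "index.html"
    out.write_text(page, encoding="utf-8")
    return out
```

Run it with something like `build_index(Path("archive"))` after each new capture, then open the generated file directly in a browser.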
It’s a great tool, but it depends on what you expect from it and on your use case. Personally, I tried it but was always disappointed by it. I always end up using SingleFile(Z) in my browser or on the CLI, along with the usual yt-dlp and the like, and that’s really all I need. If I need to save an entire site, I just use wget or httrack. I don’t really need a browsable archive of my saved pages; I usually sort them by subject when saving.