05-17-2024, 08:22 AM
Earlier this week I started looking into what it would take to download as much of the web as possible using "web crawler" tools. I didn't get very far, partly because I've got real-world things to deal with and partly because I'm a little lazy.
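For what it's worth, the core of a "web crawler" is small. Here's a minimal sketch in Python, using only the standard library, of the fetch-parse-enqueue loop these tools are built around. The seed URL and page limit are placeholders, and it deliberately skips robots.txt, rate limiting, and non-HTML content, all things a real crawl would have to handle:

import urllib.parse
import urllib.request
from collections import deque
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    # Collects href values from anchor tags.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=10):
    seen = {seed}
    queue = deque([seed])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception as exc:
            print(f"skip {url}: {exc}")
            continue
        fetched += 1
        # A real archiver would write the page to disk here;
        # the sketch just reports its size.
        print(f"{len(html):>8} bytes  {url}")
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urllib.parse.urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

if __name__ == "__main__":
    crawl("https://example.com")  # placeholder seed

In practice you'd reach for an existing tool like wget --mirror or HTTrack rather than roll your own, but the loop above is fundamentally all they do.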
I know it's possible in principle, but there are challenges. For example, how much storage would I need to hold a copy of every page? How long would it take to grab them all (is there enough time left)? What kind of bandwidth would I need? And could my Internet provider interpret that much traffic as something malicious?
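Some back-of-envelope arithmetic puts those questions in perspective. Nobody knows the true page count, so the numbers below are loudly assumed round figures chosen purely for scale, not measurements:

# All constants are assumptions for illustration, not measured values.
pages = 50e9                  # assumed count of indexable pages; public estimates vary wildly
avg_page_bytes = 100e3        # assumed ~100 KB of HTML per page, ignoring images and scripts
link_bytes_per_sec = 1e9 / 8  # assumed 1 Gbit/s connection, never throttled

total_bytes = pages * avg_page_bytes
seconds = total_bytes / link_bytes_per_sec

print(f"storage : {total_bytes / 1e15:.1f} PB")          # ~5 PB
print(f"transfer: {seconds / (86400 * 365):.1f} years")  # ~1.3 years at full line rate

Under those (generous) assumptions the transfer time is almost plausible, but the storage lands in petabytes before you even count images, and no residential connection runs at full line rate for a year straight. The bandwidth question and the ISP question turn out to be the same question.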
There might be legal issues too, but if the Wayback Machine can archive a huge chunk of the public web, it can't be insurmountable.