/g/ - Technology


Thread archived.




File: 1711861121843792.png (337 KB, 1497x936)
Useful archiving efforts and other projects to help out with:

HIGH priority (If you don't help archive these automatically, the data will probably be lost forever):

1. http://warrior.archiveteam.org/
Help automatically archive sites and data that are being shut down right now by running the ArchiveTeam Warrior program (or specific project containers) in the background.
Requirements: A few GB of disk space, some bandwidth and a small amount of CPU power. More info: https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

If you learn that a site or any other online data is in danger of being shut down, read through this page and, if needed, contact ArchiveTeam on their IRC to get it archived: https://wiki.archiveteam.org/index.php/Projects

2. Automatically submit URLs you browse that aren't yet archived on https://archive.org for archiving, using their browser extension:
https://github.com/internetarchive/wayback-machine-webextension
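For a one-off manual check outside the browser, here is a minimal Python sketch of the same idea, assuming the Wayback Machine's public availability endpoint (`https://archive.org/wayback/available?url=...`) and the Save Page Now URL form (`https://web.archive.org/save/<url>`). The function names are hypothetical, this is not how the extension itself works, and Save Page Now is slow and rate-limited for anonymous use.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed public endpoints (not taken from the extension's source code).
AVAILABILITY_API = "https://archive.org/wayback/available"
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def availability_query(url: str) -> str:
    """Build the availability-API query URL for a page."""
    return AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(response_json: str) -> Optional[str]:
    """Return the closest snapshot URL from an availability-API response, or None."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def save_url(url: str) -> str:
    """Build the Save Page Now URL that asks the Wayback Machine to capture `url`."""
    return SAVE_PAGE_NOW + url


def archive_if_missing(url: str, timeout: float = 60.0) -> Optional[str]:
    """Return an existing snapshot URL, or None after requesting a new capture.

    Needs network access; the hypothetical helper names are for illustration only.
    """
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as resp:
        snapshot = closest_snapshot(resp.read().decode("utf-8"))
    if snapshot is None:
        # A plain GET on the save endpoint triggers a capture; it may be slow or refused.
        with urllib.request.urlopen(save_url(url), timeout=timeout) as resp:
            resp.read()
    return snapshot
```

The pure helpers are split from the network call so the URL building and JSON parsing can be checked offline.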
>>
MEDIUM priority (Important overall)

3. Seed torrents for as long as possible, and rare data forever. Look up a guide for your router and PORT FORWARD your torrent client's port; this substantially increases your upload (and download) speed. In low-population torrent swarms, if no one is port forwarded, peers might not be able to connect to each other at all, so no data gets exchanged even though someone has it.
Requirements: As much or as little bandwidth as you want (you can set limits if you need to)
https://github.com/qbittorrent/qBittorrent (Recommended client, especially to replace uTorrent)
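The port forwarding point can be made concrete with a toy model. This is a simplified sketch under one loud assumption: two peers can connect only if at least one of them accepts incoming connections. It ignores UPnP/NAT-PMP and NAT hole punching, which real clients may also use, and the peer names are made up.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def connectable_pairs(peers: Dict[str, bool]) -> Set[FrozenSet[str]]:
    """Return the peer pairs that can connect.

    `peers` maps a peer name to True if its port is forwarded (it accepts
    incoming connections). Simplifying assumption: a connection succeeds only
    if at least one side of the pair accepts incoming connections.
    """
    return {
        frozenset((a, b))
        for a, b in combinations(peers, 2)
        if peers[a] or peers[b]
    }


# Three peers behind NAT with no forwarded ports: nobody can reach anybody.
print(connectable_pairs({"seeder": False, "alice": False, "bob": False}))  # set()

# Forward just bob's port and bob can now connect to both of the others.
print(len(connectable_pairs({"seeder": False, "alice": False, "bob": True})))  # 2
```

In the second case the seeder and alice still can't talk directly, but once bob has pieces he can pass them on, which is why even one forwarded peer matters in a tiny swarm.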

4. Archive web pages you want to have a local copy of with a "Web Extension for saving a faithful copy of a complete web page in a single HTML file with a single click"
https://github.com/gildas-lormeau/SingleFile
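One way a page can fit in a single HTML file is by embedding its resources as base64 `data:` URIs. The sketch below only illustrates that general idea for images; it is not SingleFile's actual implementation. The `resources` mapping stands in for bytes a real tool would fetch, and the regex only handles double-quoted `src` attributes (a real tool parses the DOM).

```python
import base64
import re
from typing import Dict, Tuple

# Matches <img ... src="..."> with a double-quoted src; a simplification for illustration.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data: URI."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")


def inline_images(html: str, resources: Dict[str, Tuple[bytes, str]]) -> str:
    """Replace <img src="..."> references found in `resources` with data: URIs.

    `resources` maps a src value to (raw bytes, MIME type). Images whose src
    is not in the mapping are left untouched.
    """
    def replace(match: re.Match) -> str:
        src = match.group(2)
        if src not in resources:
            return match.group(0)
        data, mime = resources[src]
        return match.group(1) + to_data_uri(data, mime) + match.group(3)

    return IMG_SRC.sub(replace, html)


page = '<p><img alt="logo" src="logo.png"></p>'
print(inline_images(page, {"logo.png": (b"\x89PNG", "image/png")}))
```

Inlining makes the file self-contained at the cost of size: base64 adds roughly a third on top of the original bytes.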

5. Archive videos with "GUI front-end for youtube-dl, yt-dlp and other compatible video downloaders"
https://github.com/axcore/tartube
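Tartube is a GUI over yt-dlp. If you'd rather script the same downloads, yt-dlp can also be used as a Python library through `yt_dlp.YoutubeDL`. A sketch, assuming yt-dlp is installed (`pip install yt-dlp`): the option keys are YoutubeDL parameters, while the output template, the archive file name and the helper names are my own choices. `download_archive` records finished video IDs so reruns over a whole channel skip what you already have.

```python
from typing import Any, Dict, List


def archival_options(archive_file: str = "downloaded.txt") -> Dict[str, Any]:
    """yt-dlp options geared toward archiving (the file name default is hypothetical)."""
    return {
        # Keep uploader, title and video ID in the path so files stay identifiable.
        "outtmpl": "%(uploader)s/%(title)s [%(id)s].%(ext)s",
        # Record finished video IDs here; later runs skip anything already listed.
        "download_archive": archive_file,
        # Save metadata, thumbnail and subtitles alongside each video.
        "writeinfojson": True,
        "writethumbnail": True,
        "writesubtitles": True,
    }


def archive_videos(urls: List[str], archive_file: str = "downloaded.txt") -> int:
    """Download `urls` with yt-dlp and return its exit code (needs network access)."""
    from yt_dlp import YoutubeDL  # imported lazily so the options work without yt-dlp installed

    with YoutubeDL(archival_options(archive_file)) as ydl:
        return ydl.download(urls)
```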

6. "Capture or record any area of your screen and share it with a single press of a key"
https://github.com/ShareX/ShareX

7. Archive entire websites you want to have a local copy of
https://www.httrack.com/ (not sure if there is anything better than this)
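As an alternative to HTTrack, GNU Wget can also mirror a site for offline browsing. A sketch that builds and runs such a command from Python, assuming `wget` is on your PATH; the flags are standard Wget options, while the one-second wait default and the helper names are my own assumptions.

```python
import shutil
import subprocess
from typing import List


def wget_mirror_command(url: str, wait_seconds: int = 1) -> List[str]:
    """Build a wget command that mirrors `url` for offline browsing."""
    return [
        "wget",
        "--mirror",                # recursive download with timestamping, unlimited depth
        "--page-requisites",       # also fetch the images, CSS and scripts each page needs
        "--convert-links",         # rewrite links so the local copy works offline
        "--adjust-extension",      # save HTML/CSS with matching file extensions
        "--no-parent",             # never climb above the starting directory
        f"--wait={wait_seconds}",  # pause between requests to go easy on the server
        url,
    ]


def mirror(url: str) -> int:
    """Run the mirror command and return wget's exit status."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    return subprocess.run(wget_mirror_command(url), check=False).returncode
```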

8. Publish data you have archived that isn't available online, or isn't easily available. The easiest way is to upload it to https://archive.org. Once the item is uploaded and edited the way you want, you can download the .torrent file that archive.org automatically creates for every item. You can then seed this torrent and share it anywhere with a magnet link.
You can also create torrents yourself in your torrent client. As long as DHT (Distributed Hash Table, a decentralized way to share torrents without needing any specific tracker) is enabled in the settings (it is on by default), your torrents can be found on the DHT by DHT crawlers, local or online (for example https://btdig.com/, where you can also search for FILE NAMES within all DHT torrents).
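To make the magnet link part concrete: a BitTorrent v1 magnet link identifies a torrent by its infohash, which is the SHA-1 of the bencoded `info` dictionary from the .torrent file (bencoding is defined in BEP 3, magnet links in BEP 9). A minimal sketch with a hand-rolled bencoder. The example `info` dict is made up and is not a complete, valid torrent; real ones come from your client or from archive.org.

```python
import hashlib
import urllib.parse
from typing import Any


def bencode(value: Any) -> bytes:
    """Encode ints, strings/bytes, lists and dicts in BitTorrent's bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be sorted by their raw bytes.
        encoded = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        encoded.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in encoded) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def magnet_link(info: dict, display_name: str) -> str:
    """Build a v1 magnet link (xt=urn:btih:<hex infohash>) from an info dictionary."""
    infohash = hashlib.sha1(bencode(info)).hexdigest()
    return "magnet:?xt=urn:btih:" + infohash + "&dn=" + urllib.parse.quote(display_name)


# Made-up info dict for illustration only; the "pieces" value is fake.
example_info = {"name": "notes.txt", "length": 12, "piece length": 16384, "pieces": b"\x00" * 20}
print(magnet_link(example_info, "notes.txt"))
```

Because the infohash depends only on the `info` dict, anyone who creates a byte-identical torrent gets the same magnet link, and that hash is what DHT peers and crawlers look up.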
>>
OTHER useful things:

- In your torrent client settings, add the best trackers so they are automatically added to all your newly added torrents (this makes it easier to connect to peers, especially in obscure torrents):
https://github.com/ngosang/trackerslist
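The client setting is the main way to use these lists. For magnet links you share elsewhere, you can also append trackers as `tr` parameters. A sketch, assuming the list file (for example trackers_best.txt) is plain text with one announce URL per line and blank lines between them; the helper names and the example tracker URLs are hypothetical.

```python
import urllib.parse
from typing import List


def parse_tracker_list(text: str) -> List[str]:
    """Parse one announce URL per line, skipping blank lines and duplicates (order kept)."""
    seen = set()
    trackers = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            trackers.append(url)
    return trackers


def add_trackers(magnet: str, trackers: List[str]) -> str:
    """Append each tracker to a magnet link as a percent-encoded `tr` parameter."""
    extra = "".join("&tr=" + urllib.parse.quote(t, safe="") for t in trackers)
    return magnet + extra


tracker_text = "udp://tracker.example:1337/announce\n\nhttp://t2.example/announce\n"
print(add_trackers("magnet:?xt=urn:btih:abc", parse_tracker_list(tracker_text)))
```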

- Look into running a node for I2P (anonymous private network within the global internet):
Requirements: Mostly bandwidth, more info: https://geti2p.net/en/faq
https://geti2p.net/

- Look into running Tor/Hyphanet (Freenet)/IPFS nodes.

- "A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI"
https://github.com/bitmagnet-io/bitmagnet
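DHT crawlers like this one work on the BitTorrent Mainline DHT (BEP 5). There, nodes and infohashes share one 160-bit ID space, and "closeness" is the XOR of two IDs read as an unsigned integer. A lookup repeatedly asks the known nodes closest to an infohash for even closer ones. A minimal sketch of only the distance metric and the k-closest selection, with no networking; the short IDs are made up for readability.

```python
import heapq
from typing import Iterable, List


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: XOR of two equal-length IDs, read as an unsigned integer."""
    if len(a) != len(b):
        raise ValueError("IDs must have the same length")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def k_closest(target: bytes, node_ids: Iterable[bytes], k: int = 8) -> List[bytes]:
    """Return up to k node IDs closest to `target` (BEP 5 buckets hold K = 8 nodes)."""
    return heapq.nsmallest(k, node_ids, key=lambda node: xor_distance(node, target))


# Toy 1-byte IDs for readability; real DHT IDs are 20 bytes (160 bits).
print(k_closest(b"\x10", [b"\x00", b"\x11", b"\x30", b"\x12"], k=2))  # [b'\x11', b'\x12']
```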

- "ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline"
https://github.com/ArchiveBox/ArchiveBox

- Look into donating your PC's resources to more compute-intensive projects:
BOINC (Berkeley Open Infrastructure for Network Computing): https://boinc.berkeley.edu/projects.php
GIMPS (Great Internet Mersenne Prime Search): https://www.mersenne.org/
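GIMPS hunts for primes of the form M_p = 2^p - 1. The classic test for them is Lucas-Lehmer: with s_0 = 4 and s_(k+1) = s_k^2 - 2, for an odd prime p, M_p is prime exactly when s_(p-2) is divisible by M_p. A toy-sized sketch of the test; the real GIMPS client (Prime95/mprime) uses heavily optimized FFT arithmetic on numbers with tens of millions of digits.

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for the small exponents used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime (p must be prime)."""
    if p == 2:
        return True  # M_2 = 3 is prime; the recurrence below needs an odd prime p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # keep s reduced modulo M_p so it stays small
    return s == 0


print([p for p in range(2, 32) if is_prime(p) and lucas_lehmer(p)])
# [2, 3, 5, 7, 13, 17, 19, 31]
```

Note that a prime exponent is necessary but not sufficient: 11 and 23 are prime, yet 2^11 - 1 = 2047 = 23 * 89 is composite, and the test correctly rejects it.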

- Additional tools: https://github.com/iipc/awesome-web-archiving
>>
- Additional links to archiving and similar communities:
https://wiki.archiveteam.org/index.php/Archiveteam:IRC
https://www.reddit.com/r/Archiveteam
https://www.reddit.com/r/lostmedia
https://www.reddit.com/r/DataHoarder
https://www.reddit.com/r/GamePreservationists
https://www.reddit.com/r/torrents
https://www.reddit.com/r/qBittorrent
https://boards.4chan.org/t

What are you archiving or want to archive?
Do you have or know anyone who has some rare interesting data or media not available online?
>>
File: 1719477996143.jpg (156 KB, 1024x1024)
>>101173328
>quality thread
Post maids and help OP archive interesting things forever!
>>
File: 27-06-2024 15-52-28.png (16 KB, 801x338)
>>101173336
>httrack
an alternative to httrack is making your own WARC.gz files with wget. admittedly they're kinda awkward to open, but they're easy to store and compressed by default.
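Wget's `--warc-file=NAME` option writes `NAME.warc.gz` alongside a normal crawl. For the "awkward to open" part, a minimal stdlib-only sketch that lists the captured URLs. It assumes a well-formed WARC with a `Content-Length` header on every record and matches header names case-sensitively (the spec says they're case-insensitive); for real work the dedicated `warcio` library is the sturdier choice. The helper names are hypothetical.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, List, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content block) for each record in a decompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            raw = stream.readline()
            if raw.strip() == b"":
                break  # a blank line (or EOF) ends the header block
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length says exactly how many bytes of content follow the headers.
        block = stream.read(int(headers["Content-Length"]))
        yield headers, block


def response_urls(stream: BinaryIO) -> Iterator[str]:
    """Yield the target URI of every 'response' record (the captured pages)."""
    for headers, _ in iter_warc_records(stream):
        if headers.get("WARC-Type") == "response":
            yield headers.get("WARC-Target-URI", "")


def response_urls_from_gz_bytes(data: bytes) -> List[str]:
    """Convenience wrapper for a .warc.gz already loaded into memory."""
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as warc:
        return list(response_urls(warc))


# Usage with a file on disk (hypothetical name); gzip.open reads the
# per-record gzip members wget writes as one continuous stream:
#     with gzip.open("site.warc.gz", "rb") as warc:
#         for url in response_urls(warc):
#             print(url)
```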

also, does anyone know if you can get websites removed from archive.org's blacklist? there's this old blog site I've personally archived, but it's blacklisted, probably because it has "gay" in the dude's domain name. it's his surname. vigay.
>>
>>101174796
>>
>>101175576
There are official Wikipedia dumps. They're a bit more than 20GB for the current English Wikipedia, but you can also get it with full edit history. I assume you can also find the versions from previous years as torrents if you really want them.
>>
>>101175657
>Something more than 20GB for current English Wikipedia, but you also have it will full edit history
It's actually around 100GB if you want the full zim, hosted on the kiwix site. But if it has the edit history then that would be really good because I don't trust Wikipedia over the last decade to not be a clusterfuck of edit wars on a lot of pages.
https://download.kiwix.org/zim/wikipedia/
>>
>>101175727
By the 20GB one I meant the most popular one here, the roughly 22GB English, text-only dump without edit history: https://dumps.wikimedia.org/enwiki/20240620/

I don't know about the rest; I just know there are dumps with edit history and everything else somewhere.
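Whichever dump you grab, it's worth verifying it after a 20-100GB download. The Wikimedia dump directories publish checksum lists in the usual `sha1sum`/`md5sum` format (as far as I know named like `enwiki-20240620-sha1sums.txt`; check the directory listing). A sketch that parses such a list and hashes a file in chunks; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_checksum_file(text: str) -> Dict[str, str]:
    """Parse sha1sum/md5sum-style lines ("<hex digest>  <file name>") into {name: digest}."""
    sums = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            # A leading '*' marks binary mode in coreutils output; it's not part of the name.
            sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB dumps never have to fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, checksum_text: str, algorithm: str = "sha1") -> bool:
    """True if the file's digest matches its entry in the checksum list."""
    expected = parse_checksum_file(checksum_text).get(path.name)
    if expected is None:
        raise KeyError(f"{path.name} is not listed in the checksum file")
    return file_digest(path, algorithm) == expected
```

Use the algorithm that matches the list you downloaded, e.g. `verify(Path("dump.xml.bz2"), sha1_text, "sha1")` with a hypothetical file name.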


