/g/ - Technology

File: 1743635080699903.png (337 KB, 1497x936)
Useful archiving efforts and other projects to help out with for people new to and interested in archiving:

HIGH priority (If you don't help archive these automatically, the data will probably be lost forever):

1. http://warrior.archiveteam.org/
Help automatically archive things that are being shut down right now by running the ArchiveTeam Warrior program (or specific containers) in the background
Requirements: A few GB of disk space, some bandwidth, and a small amount of CPU power. More info: https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

If you learn that a site or any online data is in danger of shutting down, read through this page and contact ArchiveTeam on their IRC if required in order to have it archived: https://wiki.archiveteam.org/index.php/Projects

2. Automatically forward URLs you browse that are not yet archived on https://archive.org to them for archival, using a browser extension
https://github.com/internetarchive/wayback-machine-webextension
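For item 2, a rough sketch of what that forwarding boils down to: ask the Wayback Machine whether a URL already has a snapshot and, if it doesn't, build the Save Page Now address for it. The two endpoints below are the public ones as far as I know; the function names are made up for illustration, and this is not a description of how the extension itself is implemented.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine endpoints (assumed to be current).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_url(page_url):
    """Build the query URL that asks whether page_url has a snapshot."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(response_text):
    """Return the closest snapshot URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def save_url(page_url):
    """Build the Save Page Now address; opening it requests a capture."""
    return SAVE_ENDPOINT + page_url


def check(page_url, timeout=30):
    """Network call: return the snapshot URL if archived, else the save address."""
    with urlopen(availability_url(page_url), timeout=timeout) as resp:
        snapshot = closest_snapshot(resp.read().decode("utf-8"))
    return snapshot or save_url(page_url)
```

Calling `check("https://example.com/")` needs network access; the other three functions are pure and can be tried offline.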
>>
MEDIUM priority (Important overall)

3. Seed torrents for as long as possible, rare data forever. Make sure to look up a guide for your router to PORT FORWARD your torrent client port, to substantially increase your upload (and your download) speed. In low population torrent swarms, if no one is port forwarded then you might not be able to connect to each other at all and exchange any data despite having it.
Requirements: As much or as little bandwidth as you want (you can set limits if you need to)
https://github.com/qbittorrent/qBittorrent (Recommended client, especially to replace uTorrent)

4. Archive web pages you want to have a local copy of with a "Web Extension for saving a faithful copy of a complete web page in a single HTML file with a single click"
https://github.com/gildas-lormeau/SingleFile

5. Archive videos with "GUI front-end for youtube-dl, yt-dlp and other compatible video downloaders"
https://github.com/axcore/tartube

6. "Capture or record any area of your screen and share it with a single press of a key"
https://github.com/ShareX/ShareX

7. Archive entire websites you want to have a local copy of
https://www.httrack.com/
>>
8. Publish data you have archived that isn't easily available online, or isn't available at all. You can create a torrent yourself in your torrent client and then share its magnet link anywhere online for anyone to access. As long as DHT (Distributed Hash Table, a decentralized way to share torrents without the need for any specific tracker) is enabled in settings (on by default), your torrent will be discoverable by DHT crawlers, local or online (for example https://btdig.com/, where you can also search for FILE NAMES within all DHT torrents)
(archive.org also creates torrents for all uploads automatically, but those torrents shouldn't be relied on: the implementation is error-prone, and they can break when more files are uploaded or when the item's metadata changes, which includes even a new comment on the item)
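
For item 8, a small sketch of what a magnet link actually is, assuming you already have the torrent's 40-character hex info-hash (your client shows it once the torrent is created). The helper names and the tracker hostname in the usage note are placeholders. Trackers are optional here: with DHT enabled, the `xt` part alone is enough for peers to find the torrent.

```python
from urllib.parse import quote


def parse_tracker_list(text):
    """Return tracker URLs from plain text with one URL per line (blank lines ignored)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_magnet(info_hash, name, trackers=()):
    """Build a magnet URI from a 40-character hex BitTorrent v1 info-hash."""
    info_hash = info_hash.lower()
    if len(info_hash) != 40 or any(c not in "0123456789abcdef" for c in info_hash):
        raise ValueError("expected a 40-character hex info-hash")
    # xt = exact topic (the info-hash), dn = display name, tr = tracker URL
    parts = ["xt=urn:btih:" + info_hash, "dn=" + quote(name, safe="")]
    parts += ["tr=" + quote(tracker, safe="") for tracker in trackers]
    return "magnet:?" + "&".join(parts)
```

Example use: `build_magnet(my_hash, "my archive", parse_tracker_list(tracker_text))`, where `tracker_text` is a plain list of announce URLs such as `udp://tracker.example.org:1337/announce`, one per line.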


OTHER useful things:

- In your torrent client settings add the best trackers to be automatically added for all of your newly added torrents (helps more easily connect to peers, especially in obscure torrents)
https://github.com/ngosang/trackerslist

- Look into running a node for I2P (anonymous private network within the global internet)
Requirements: Mostly bandwidth, more info: https://geti2p.net/en/faq
https://geti2p.net/

- Look into running Tor/Hyphanet(Freenet)/IPFS/YaCy/SearXNG nodes

- Easily capture and digitize all data AND METADATA from optical media (CDs, DVDs, Blu-rays...) with Media Preservation Frontend (MPF)
https://github.com/SabreTools/MPF

- "A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI"
https://github.com/bitmagnet-io/bitmagnet

- "ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline"
https://github.com/ArchiveBox/ArchiveBox
>>
- Look into donating your PC resources to be used more intensively in projects:
BOINC (Berkeley Open Infrastructure for Network Computing): https://boinc.berkeley.edu/projects.php


- Additional archiving tools: https://github.com/iipc/awesome-web-archiving

- Additional links to archiving and similar communities:
https://wiki.archiveteam.org/index.php/Archiveteam:IRC
https://www.reddit.com/r/Archiveteam
https://www.reddit.com/r/DataHoarder
https://www.reddit.com/r/DataHoarder/wiki/index/ - Hardware and software for data hoarding FAQ
https://www.reddit.com/r/lostmedia
https://www.reddit.com/r/GamePreservationists
https://www.reddit.com/r/torrents
https://www.reddit.com/r/qBittorrent
https://annas-archive.se/torrents
>>>/t/

What are you archiving or want to archive?
Do you have or know anyone who has some rare interesting data or media not available online?
>>
>>106562173
>What are you archiving or want to archive?
i want to archive js heavy sites. give me a solution right now.
>>
>>106562182
There is no special solution; you just have to try multiple tools and see which one works best.
>>
>>106562186
You still leave out the best archive site.
>>
Literally links to seven subreddits. How is a subreddit for qBittorrent more important than actual archive sites?
>>
>>106562214
>archive.is
1. Almost never had a page saved that wasn't in the Wayback Machine already
2. It has orders of magnitude less data than the Wayback Machine, especially old data
3. It doesn't have the important ability to "Save outlinks" when saving a page
4. It doesn't save any Flash files
5. It doesn't save any PDFs
6. It doesn't save any videos
7. It doesn't save any sounds
8. It has a limit of only 50MB per page, which is a big problem for a lot of websites, especially nowadays, when a page with all its images included can easily be hundreds of MB

Aside from helping to bypass paywalls on some sites for some news articles, it doesn't have a single unique and useful feature compared to the Wayback Machine. It's still centralized, there's no extension to automatically forward unarchived pages there for archival, there's no way to search all the text inside the pages, and there's no other feature that would make it worth going out of your way to use it. It's simply another place with copies of a limited amount of data already available elsewhere.

So there is no point in going out of the way to single it out on this list here, and it's already mentioned in the linked further reading in https://github.com/iipc/awesome-web-archiving anyway.

>>106562223
Because those linked communities contain useful information or discussions, especially for people new to archiving or torrenting. Excluding them because you disagree with the site's average user or don't like the site overall would be a genetic fallacy; it doesn't change the fact that these niche communities hold a lot of useful information.
>>
>>106562464
>Almost never had a page saved that wasn't in the Wayback Machine already
Wrong
>Aside from helping to bypass paywalls on some sites for some news articles, it doesn't have a single unique and useful feature compared to the Wayback Machine,
>Aside from its features, it has no features
It also isn't even true. Many sites disallow archive.org, for example Instagram. These sites work on archive.is. This is just one example.
>Because those linked communities contain useful information or discussions, especially for people new to archiving or torrenting
This is false. Name one thing you would even need to proactively read from the qBittorrent subreddit. If you had a problem, you would look up the issue on google and look at the results, which may include reddit. You wouldn't start scrolling through random subreddits hoping to solve your problem. These are just the subreddits you personally scroll through all day (r/GamePreservationists). You literally have archiving reddit as one of your goals in your OP image.
>>
>>106562533
>Complains about reddit
>Gives archiving instagram posts as an example extra feature of archive.is
Right...

Anyway, you hand-waved away the first point and completely ignored every one of the other 7, which together rule out archive.is as a proper archive for most pages online: it basically archives nothing other than basic HTML, and only up to 50MB. That makes it worthless even for your example site, Instagram, where most profile pages need much more than a measly 50MB for all of the images.

It's better not to give users a false sense of security, where someone thinks he archived a specific page but didn't actually archive any PDFs, videos, audio, Flash files, or outlinks, or anything at all once the page goes over just 50MB.

>If you had a problem, you would look up the issue on google and look at the results, which may include reddit
Acting as if anyone can find almost any solution on Google nowadays without appending "reddit" to the search query proves that you are arguing in bad faith.
>>
>>106562157
bump


