/g/ - Technology






File: 1715331077223923.png (337 KB, 1497x936)
Useful archiving efforts and other projects to help out with:

HIGH priority (If you don't help archive these automatically, the data will probably be lost forever):

1. http://warrior.archiveteam.org/
Help automatically archive things that are being shut down right now by running the ArchiveTeam Warrior program (or specific containers) in the background:
Requirements: A few GB of disk space, some bandwidth and a small amount of CPU power. More info: https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

If you learn that a site or any online data is in danger of shutting down, read through this page and, if needed, contact ArchiveTeam on their IRC to have it archived: https://wiki.archiveteam.org/index.php/Projects

2. Automatically forward URLs you browse that are not yet archived on https://archive.org to them for archival, using a browser extension:
https://github.com/internetarchive/wayback-machine-webextension
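The extension does this lookup for you. For a rough idea of what "not archived" means in practice, here is a minimal Python sketch that asks the Wayback Machine's public Availability API (https://archive.org/wayback/available) whether a page already has a snapshot. This is not the extension's code: the function names are hypothetical, and it assumes the API's JSON shape where a hit appears under `archived_snapshots.closest` and a miss is an empty `archived_snapshots` object.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine Availability API endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build the Availability API query URL for one page (hypothetical helper)."""
    return AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(response_json: str):
    """Return the closest snapshot URL from an API response, or None if nothing is archived."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def is_archived(page_url: str, timeout: float = 10.0) -> bool:
    """Ask the live API whether page_url has at least one snapshot (needs network access)."""
    with urllib.request.urlopen(availability_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8")) is not None


if __name__ == "__main__":
    # Offline demo with a response shaped like the API's output.
    hit = ('{"archived_snapshots": {"closest": {"available": true, '
           '"url": "http://web.archive.org/web/20130919044612/http://example.com/", '
           '"timestamp": "20130919044612", "status": "200"}}}')
    print(availability_url("example.com"))
    print(closest_snapshot(hit))
    print(closest_snapshot('{"archived_snapshots": {}}'))
```

Pages that come back as None are the ones worth submitting for archival, which is exactly what the extension automates.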
>>
MEDIUM priority (Important overall)

3. Seed torrents for as long as possible, and rare data forever. Make sure to look up a guide for your router to PORT FORWARD your torrent client's port; this substantially increases your upload (and your download) speed. In low-population torrent swarms, if no one is port forwarded, peers might not be able to connect to each other at all and exchange any data, despite having it.
Requirements: As much or as little bandwidth as you want (you can set limits if you need to)
https://github.com/qbittorrent/qBittorrent (Recommended client, especially to replace uTorrent)

4. Archive web pages you want to have a local copy of with a "Web Extension for saving a faithful copy of a complete web page in a single HTML file with a single click"
https://github.com/gildas-lormeau/SingleFile

5. Archive videos with "GUI front-end for youtube-dl, yt-dlp and other compatible video downloaders"
https://github.com/axcore/tartube

6. "Capture or record any area of your screen and share it with a single press of a key"
https://github.com/ShareX/ShareX

7. Archive entire websites you want to have a local copy of
https://www.httrack.com/ (not sure if there is anything better than this)

8. Publish the data you have archived that isn't easily (or at all) available online. The easiest way is uploading it to https://archive.org. Once it is uploaded and edited the way you want, you can download the .torrent file that archive.org automatically creates for each item. This torrent can then be seeded and shared with a magnet link anywhere.
You can also just create torrents yourself in your torrent client. As long as DHT (Distributed Hash Table, a decentralized way to share torrents without needing any specific tracker) is enabled in the settings (it is on by default), your files will be searchable on DHT by DHT crawlers, local or online (for example https://btdig.com/, where you can also search for FILE NAMES within all DHT torrents).
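When your client creates a torrent, the magnet link it gives you is built around the torrent's info hash. As a sketch of how that works for classic (v1) torrents, following the BitTorrent spec (BEP 3) and the magnet link format (BEP 9): the info hash is the SHA-1 of the bencoded "info" dictionary. Assumptions: a single file held in memory, an arbitrary 256 KiB piece size, and hypothetical helper names. A real client also handles multi-file and v2 torrents and may add other info fields, so its hash for the same file can differ. Tracker URLs (for example from a list like https://github.com/ngosang/trackerslist) can be appended as `tr` parameters; the tracker in the demo is a placeholder.

```python
import hashlib
import urllib.parse


def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists and dicts in BitTorrent's bencode format."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings sorted in raw byte order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_dict_for_bytes(name: str, data: bytes, piece_length: int = 2**18) -> dict:
    """Build a v1 single-file 'info' dictionary: SHA-1 of each piece, concatenated."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data), "piece length": piece_length, "pieces": pieces}


def magnet_link(info: dict, trackers=()) -> str:
    """Magnet URI with the hex info hash, display name and optional tracker URLs."""
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    extra = [("dn", info["name"])] + [("tr", t) for t in trackers]
    return f"magnet:?xt=urn:btih:{info_hash}&" + urllib.parse.urlencode(extra)


if __name__ == "__main__":
    info = info_dict_for_bytes("hello.txt", b"hello world")
    print(magnet_link(info, ["udp://tracker.example:1337/announce"]))  # placeholder tracker
```

Even with no `tr` entries at all, a DHT-enabled client can find peers from the info hash alone, which is why DHT crawlers can index torrents nobody announced to a tracker.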
>>
OTHER useful things:

- In your torrent client settings, add the best trackers so they are automatically added to all your newly added torrents (this makes it easier to connect to peers, especially in obscure torrents):
https://github.com/ngosang/trackerslist

- Look into running a node for I2P (anonymous private network within the global internet):
Requirements: Mostly bandwidth, more info: https://geti2p.net/en/faq
https://geti2p.net/

- Look into running Tor/Hyphanet(Freenet)/IPFS nodes.

- "A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI"
https://github.com/bitmagnet-io/bitmagnet

- "ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline"
https://github.com/ArchiveBox/ArchiveBox

- Look into donating your PC resources to be used more intensively in projects:
BOINC (Berkeley Open Infrastructure for Network Computing): https://boinc.berkeley.edu/projects.php
GIMPS (Great Internet Mersenne Prime Search): https://www.mersenne.org/

- Additional tools: https://github.com/iipc/awesome-web-archiving
>>
- Additional links to archiving and similar communities:
https://wiki.archiveteam.org/index.php/Archiveteam:IRC
https://www.reddit.com/r/Archiveteam
https://www.reddit.com/r/lostmedia
https://www.reddit.com/r/DataHoarder
https://www.reddit.com/r/GamePreservationists
https://www.reddit.com/r/torrents
https://www.reddit.com/r/qBittorrent
https://boards.4chan.org/t

What are you archiving or want to archive?
Do you have or know anyone who has some rare interesting data or media not available online?
>>
>>101417141
would you be interested in making the OP shorter
>>
>>101417259
As long as it's easy to sift through quickly and every line has important info, no.
>>
how feasible is it to download all of wikipedia?
>>
>>101418346
https://en.wikipedia.org/wiki/Wikipedia:Database_download
https://dumps.wikimedia.org/

20-something GB (bz2-compressed) for the English, text-only Wikipedia archive.
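You don't need to decompress the whole thing to use it. A minimal sketch, assuming you downloaded the usual articles dump (e.g. `enwiki-latest-pages-articles.xml.bz2` from https://dumps.wikimedia.org/enwiki/latest/) and that it follows the standard MediaWiki XML export layout where each `<page>` has a `<title>` child: Python can stream page titles straight out of the .bz2 file with `xml.etree.ElementTree.iterparse`. The helper names are hypothetical, and the namespace is stripped because its version number changes between dump releases.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(stream):
    """Yield page titles from a MediaWiki XML export, one <page> at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) == "page":
            yield next((c.text for c in elem if local_name(c.tag) == "title"), None)
            elem.clear()  # release the finished <page> subtree to keep memory low


def open_dump(path: str):
    """Open a dump in binary mode, decompressing .bz2 on the fly."""
    return bz2.open(path, "rb") if path.endswith(".bz2") else open(path, "rb")


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real dump file.
    sample = bz2.compress(
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">'
        b"<page><title>Alpha</title><ns>0</ns></page>"
        b"<page><title>Beta</title><ns>0</ns></page>"
        b"</mediawiki>"
    )
    with bz2.open(io.BytesIO(sample), "rb") as f:
        print(list(iter_titles(f)))
```

For a real download you would loop over `iter_titles(open_dump("enwiki-latest-pages-articles.xml.bz2"))`; it is slow on a file that size, but memory use stays modest because finished pages are cleared as you go.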
>>
bump
>>
>>101418375
wow, that's really small
>>
>>101417116
Thanks OP, great stuff. I don't have much to contribute, but I am working on a site showcasing interesting stuff that I find archived. This might be too broad and vague, but is there any way to find interesting stuff easier in archive.org and such sites? Like some generic tag or keyword? So far what I have in mind are just 4chan infographic/guide compilations and some almost lost media, like some obscene Dilbert parody comics that were taken down from the normal web through copyright slapsuits.
>>
File: 1718593104496432.png (43 KB, 485x219)
>>101418602
>but is there any way to find interesting stuff easier in archive.org and such sites
Aside from the collections already advertised on the home page, you have the "This just in" links, which auto-sort a category by newest.
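If you want to go past the web UI and search by tag or keyword in bulk, archive.org also exposes an advanced search endpoint (`https://archive.org/advancedsearch.php`) that returns JSON. A rough sketch, assuming its commonly used parameters (`q` for a Lucene-style query, `fl[]` for returned fields, `sort[]`, `rows`, `page`, `output=json`) and that sorting by `addeddate desc` approximates "newest first". The query string in the demo is only an illustration, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SEARCH_API = "https://archive.org/advancedsearch.php"


def search_url(query, fields=("identifier", "title"), sort="addeddate desc", rows=50, page=1):
    """Build an advanced search URL returning JSON (parameter names assumed as above)."""
    params = [("q", query)]
    params += [("fl[]", f) for f in fields]
    params += [("sort[]", sort), ("rows", str(rows)), ("page", str(page)), ("output", "json")]
    return SEARCH_API + "?" + urllib.parse.urlencode(params)


def parse_docs(response_json):
    """Pull (identifier, title) pairs out of the JSON response's 'docs' list."""
    docs = json.loads(response_json).get("response", {}).get("docs", [])
    return [(d.get("identifier"), d.get("title")) for d in docs]


def details_url(identifier):
    """Item page for an identifier."""
    return "https://archive.org/details/" + urllib.parse.quote(identifier)


def fetch_docs(query, **kwargs):
    """Run the search live (needs network access)."""
    with urllib.request.urlopen(search_url(query, **kwargs), timeout=30) as resp:
        return parse_docs(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(search_url('subject:"lost media"', rows=10))  # illustrative query only
```

Running the same query with a different `page` value walks through older results.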
>>
bump
>>
>>101418697
Thanks. You're doing great work with these threads. Don't let the lack of replies dissuade you, stuff like this is very useful. I like to regularly check on different web directories and link compilations and so much good stuff gets removed off the Internet forever in the span of just a few years.
>>
>>101418697
Neat, I'll check that out after work.
>>
>>101419939
>Don't let the lack of replies dissuade you
That won't ever happen, I know a lot of people view the list and that is enough for me.

In recent years there has been a substantial rise in the number of people realizing the internet is indeed not forever and talking about lost media, and you see a lot more people archiving things. But we are still far from most people knowing about the easy tools they already have to make sure the digital content they view stays accessible to them and others, easily and forever.
>>
>>101420606
I reposted the thread yesterday but some anons were not satisfied with your formatting
>>
>>101421092
The idea of the thread is to explain to people who don't know much about archiving in layman's terms what archiving tools are available and what they are as if I were explaining it to a friend IRL.

Given that there isn't any really redundant information within the text, there isn't much to remove. And I'd rather not make it just into a simple list (I already link https://github.com/iipc/awesome-web-archiving anyway), because that defeats the purpose of explaining in one place the most important archiving tools to people who don't know much about archiving.

Everyone who is already into archiving is probably going to hang out in the linked IRC or similar communities anyway.

And as long as the text is formatted so that anyone can quickly sift through the separate tools, sorted by importance, the length of the initial list doesn't really matter.
>>
File: 1721082475554.jpg (189 KB, 1644x3048)
just past 200+ uploads :)
>>
Op, I appreciate you routinely making these threads for great justice. Bump.
>>
>>101421422
unfathomably based
>>
>>101418602
>Thanks OP, great stuff.
>>101419939
>You're doing great work with these threads.
>>101421520
>I appreciate you routinely making these threads

Who are you trying to fool?
>>
>>101421573
Cope, seethe, and dilate
>>
File: 1710084614772215.png (75 KB, 544x804)
>>101421573
If I cared about engagement I would do this every thread, or in a lot of them, meanwhile what you see is mostly me posting it and it dying after a couple of bump replies, but I already explained why I don't care about anything other than reminding people about the tools that they have >>101421219
>>
>>101421573
Don't be so jaded anon, some of us really appreciate stuff like this. It's not even just about preservation, I just like to dig around archives and see what interesting stuff I can find.
>>
>>101418375
they beg for money every year for 20 GB???
>>
>>101422057
To my knowledge, they beg for money to fund activist causes; Wikipedia itself is well funded for decades to come.
>>
>>101422057
>text only
>>
BUMP
>>
>>101421638
>using 4chanX
>>
File: 1721110632305.jpg (167 KB, 1600x1200)
I refuse to let this thread die
>>
>>101417116
Why is the upload speed so low when I upload files to archive.org?
Even a 1 GB file can take me an hour and it doesn't make sense. I should be able to upload there at 30 MB/s at least.
It really slows me down and makes me not want to upload media there anymore.
>>
i'm currently seeding 20% of libgen.
am i doing gods work or should i be doing something else?
>>
>>101425086
A lot of people are piping many gigabytes of data into it every second from a lot of automated software. All of it has to be written to multiple HDDs and then processed and derived in many ways, on top of thousands of people requesting very different, obscure data from all over the place.
A lot of the time, bandwidth demand spikes massively for various reasons, which slows down the entire site.

Check your upload speed outside the Internet Archive to make sure it's not just you, but yes, sometimes you just have to leave it uploading in the background.
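To check whether it's you, it helps to put both numbers in the same unit. A quick worked sketch of the figures from the question above, assuming 1 GB = 1000 MB and that your speed test reports megabits per second (8 bits per byte); the function names are just for illustration.

```python
def throughput_mb_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in megabytes per second."""
    return size_mb / seconds


def transfer_seconds(size_mb: float, rate_mb_s: float) -> float:
    """How long a transfer takes at a constant rate."""
    return size_mb / rate_mb_s


def mbit_to_mbyte(rate_mbit_s: float) -> float:
    """Speed tests usually report megabits per second; divide by 8 for megabytes."""
    return rate_mbit_s / 8


if __name__ == "__main__":
    observed = throughput_mb_s(1000, 3600)   # "a 1 GB file can take me an hour"
    expected = transfer_seconds(1000, 30)    # "at 30 MB/s at least"
    print(f"observed ~{observed:.2f} MB/s; 1 GB at 30 MB/s would take ~{expected:.0f} s")
    print(f"30 MB/s needs ~{30 * 8} Mbit/s of upload on a speed test")
```

So an hour per GB is roughly 0.28 MB/s, about a hundred times slower than the expected rate, and sustaining 30 MB/s requires a line that tests at around 240 Mbit/s upload. If your line can't do that, the bottleneck is on your side.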
>>
File: 1721123651809.jpg (127 KB, 1636x2279)
bros is there any way to see who starred your archive items? I've been trying to search how but all I get is the archive help page for the star system.
>>
>>101426451
Don't think so, no.
>>
>>101425123
Media is one thing, but book preservation and distribution is one of the best things one could ever do. I'm not sure what could top it.
>>
File: 1721135013240.png (164 KB, 1539x1822)
Courtesy bump
>>
>>101427427
I'm doing ecological preservation
>>
>>101428530
Overrated, nothing lasts forever. Best we can do is have some fun until it all comes crashing down. I'm joking for what it's worth, I want this world to last at least until I die, when the world ends with me.
>>
>>101417116
tranny hobby
>>
>>101429834
oy vey dont you dare preserve information goy, what we tell you today is what reality is
>>
>>101429834
Are you struggling with your gender? That would be the best explanation for your way of thinking even though you could never admit it, not to yourself nor to anonymous strangers.
>>
what even is there to archive
>>
>>101431659
Just lurk instead of asking dumb shit.
>>
>>101431800
>le lost media
You just have OCD
>>
File: 1695924470536698.png (331 KB, 512x512)
>>101432135
>>
>>101432135
It's not just lost media, plenty of stuff gets deleted, the Internet is far more impermanent than it might seem. Not that long ago some major blogging site deleted a ton of stuff. I believe they gave warning ahead of time so the stuff got archived elsewhere. It might not seem like anything useful until you actually need the information.
>>
>>101433087
everything should be ephemeral
>>
>>101433227
This braindead, two cent approach results in historical revisionism. Believe me when I say your enemies use this against you.
>>
File: osaka!.jpg (75 KB, 750x600)
archive the internet for osaka
>>
I wish someone had saved the page of my favorite niche game
>>
>>101435612
What is it called? Aren't there any backups? Sites often steal and repost games.
>>
>>101417116
make a better OP, because of you it's difficult to look at the archives if I want to search for posts regarding tartube or httrack
>>
>>101417116
Bump, does anyone know if I can use a Raspberry Pi 2 as an ArchiveTeam Warrior?
>>
Bump
>>
>>101437525
>because of you it's difficult to look at the archives if I want to search for posts regarding tartube or httrack
How so?
>make a better OP
Like?
>>
bump
>>
.
>>
I don't know how people find the motivation for this stuff, but I'm glad someone does it. Any time something major happens I tell myself I should start archiving things locally but I never do. Seems like a massive headache to organize it all.
>>
>>101439453
>don't know how people find the motivation
Usually by wanting to revisit something they liked a lot but haven't experienced in a decade or more, and then realizing they can barely find it, or can't find it at all. When they do find it, they are of course going to save it locally for later.

Then there is no reason not to publish it so others can find it more easily.

Over time you realize how much is disappearing and how easy it can be to archive it and save it forever.

>Seems like a massive headache to organize it all.
As long as most of the data is downloaded, zipped and published somewhere, it's enough. It will be easily sorted one day.
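A minimal Python sketch of the "zipped" step, assuming you just want one archive per folder plus a checksum list so whoever sorts it later can verify nothing got corrupted. The manifest name `SHA256SUMS.txt` and the helper names are illustrative choices, not a standard; the "hash, two spaces, path" line format is the one `sha256sum -c` reads. Write the zip outside the source folder so it doesn't end up inside itself.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def zip_with_manifest(src_dir, zip_path) -> list:
    """Zip every file under src_dir and add a SHA256SUMS.txt manifest inside the zip."""
    src = Path(src_dir)
    files = sorted(
        (p for p in src.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(src).as_posix(),
    )
    lines = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            rel = p.relative_to(src).as_posix()
            zf.write(p, arcname=rel)
            lines.append(f"{sha256_of(p)}  {rel}")
        zf.writestr("SHA256SUMS.txt", "\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "collection"  # hypothetical folder name
        src.mkdir()
        (src / "notes.txt").write_text("example")
        for line in zip_with_manifest(src, Path(tmp) / "collection.zip"):
            print(line)
```

The resulting zip can then be uploaded to archive.org or shared as a torrent, as described in the OP.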


