/g/ - Technology

This is a general which is focused on archiving, but also interested in other related topics.

Storage technology and file sharing:
Hardware, software, services, shadow libraries, backups, home server, and networks such as tape drives, HDDs, file systems, archive.today, IPFS, Arweave, BitTorrent, etc.

Development:
Example topic: web archiving is much harder in 2026 than it was in 2016. Too many websites are walled off by systems such as Cloudflare, making it impossible for services such as web.archive.org, archive.is, and megalodon.jp to capture their webpages. That's a big chunk of important data that easily disappears with no web archive captures. We have to develop solutions to this, such as using the SingleFile extension and other stuff.

In-depth history:
Examples: get into the "minutia and trivia" about the history of websites and all the little changes, or, talk about more important web history events and future events such as sites closing.

Analysis:
Examples: analyzing files and folders that you obtained from scraping or data hoarding, or, what you're sad was lost and not archived, what you're glad was archived.

Questions:
Ask whatever questions about any of this.
>>
File: KPC-Blog-Tape-Library.jpg (150 KB, 1140x502)
Inspirations for this general:


/dhg/ - Data Hoarding General
>Links
>Rentry: https://rentry.org/dhg
>
What is /dhg/
>In this thread we discuss and create technology and software for data-hoarding, archiving, scripts, and more.
>
>gallery-dl - scrape images, manga, videos and more from many websites
>https://github.com/mikf/gallery-dl
>
>Hydrus Network
>https://hydrusnetwork.github.io/hydrus/
>
>Stash
>https://github.com/stashapp/stash
>
>SmartImage
>https://github.com/Decimation/SmartImage


/dapp/ P2P Decentralized Applications General
>Share your favourite dapps here.
>
>Examples:
>
>brig https://brig.readthedocs.io/
>ipfs https://ipfs.io/
>ZeroNet https://zeronet.io/
>Arweave https://github.com/ArweaveTeam/arweave
>Gitopia https://gitopia.org/
>BitTorrent
>
>Leave your suggestions below.
>
>These components collectively make up the future internet known as web3.


/dshag/ thread
>Data scraping, hoarding and analytics general thread.
>What are you scraping, hoarding or analyzing frens? Also post some pics so I can post them from next time, anime also works
>>
I wanted to make this about more topics than just archiving and data hoarding as I don't think that attracts many posters.

Also, /asdiq/ sounds like "ass dick". Haha, hope this general never dies. At least it isn't exactly another AI slop general.
>>
>>108914628
Nice idea. I've always felt that archival is going to become more and more important with the passage of time, especially in the face of rising storage costs, increasing surveillance and corporate greed.
>>
With IPFS gateways, I can have whatever URL path at https://site.com/ipfs/[cid]/[path] or https://[cid].ipfs.site.com/[path]
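A minimal sketch of those two gateway URL shapes in Python. The host site.com and the CID in the usage line are placeholders, and gateway_urls is a made-up helper name, not part of any IPFS tool. Note that subdomain gateways generally expect a case-insensitive CIDv1 (base32, for example), since DNS labels ignore case.

```python
def gateway_urls(cid, path, host="site.com"):
    """Return (path-style, subdomain-style) gateway URLs for one CID and path."""
    path = path.lstrip("/")  # avoid a doubled slash after the CID
    path_style = f"https://{host}/ipfs/{cid}/{path}"
    subdomain_style = f"https://{cid}.ipfs.{host}/{path}"
    return path_style, subdomain_style


# Hypothetical CID, for illustration only.
for url in gateway_urls("bafyexamplecid", "/memento/index.html"):
    print(url)
```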

This is great, and directly helpful for archiving, but is there a way to have the URL contain a question mark? Not possible with ipfs gateways. Possible with a .onion site, but I don't want to use that anymore.

Do I really have to pay for some domain name so I can run this?:
https://site.com/memento/20260203040506/https://othersite.com/index.php?id=123

(Using ipwb.)
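To make the question mark problem concrete, here is a rough Python sketch of how a URL of that shape splits apart. split_memento_url is a hypothetical helper, not something ipwb provides. The point it shows: a standard URL parser treats everything after the "?" as the query of the outer URL, so whatever serves the page has to pass the query through for the archived target to be recovered.

```python
from datetime import datetime
from urllib.parse import urlsplit


def split_memento_url(url):
    """Split .../memento/<14-digit timestamp>/<target> into (datetime, target)."""
    parts = urlsplit(url)
    prefix = "/memento/"
    if not parts.path.startswith(prefix):
        raise ValueError("not a memento-style path")
    stamp, _, target = parts.path[len(prefix):].partition("/")
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    # The parser assigned the query to the outer URL; glue it back onto the target.
    if parts.query:
        target += "?" + parts.query
    return captured, target


captured, target = split_memento_url(
    "https://site.com/memento/20260203040506/https://othersite.com/index.php?id=123"
)
print(captured.isoformat(), target)
```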
>>
>>108914674
Yup. Corporate greed makes grabbing some websites basically impossible. More reasons that archival becomes more important:

We live in the enshittification era of the Internet. Both web.archive.org and archive.org/details/ are enshittified procensorship hellholes that shouldn't be trusted. We need more alternatives and support for existing alternatives.

A year after BitTorrent was created, there were maybe tens or hundreds of terabytes of torrents. Decades later, that's ballooned into a much bigger and much harder to manage size if you want to capture a large part of it. Same can be said of other stuff. Many things drop off and are forever lost.

The world creates so much more data per year than it did last year. So far, it's an ever increasing trend. I learned that from reading about Filecoin (kinda sucks); I hope they finally got this FilBeam thing working:
https://docs.filecoin.cloud/reference/filoz/synapse-sdk/filbeam/toc/
>>
>>108914675
some chatgpt solutions:

>Encode the archived URL so it fits into path
>Instead of raw ?, encode the full target URL (base64, percent-encode, or use a path-safe encoding) and have your ipwb or handler decode it. This avoids needing special host handling. Example path: /memento/20260203040506/https%3A%2F%2Fothersite.com%2Findex.php%3Fid%3D123

>Use a free TLS proxy (ngrok / localtunnel / Cloudflare Tunnel)
>Cloudflare Tunnel (free) with a free workers.dev or *.trycloudflare.com address can front your local ipwb server and accept queries. Ngrok has paid TLS subdomains for custom domains; free subdomains rotate.
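A sketch of the base64 variant of the first suggestion, in Python. This assumes some custom handler sits in front of the replay server and decodes the token; nothing here is an ipwb feature, and encode_target / decode_target are made-up names. URL-safe base64 uses only letters, digits, "-" and "_", so the token contains no "?", "/" or "%" for a gateway to re-encode. The tradeoff is that the original URL is no longer readable in the path.

```python
import base64


def encode_target(url):
    """Encode a URL as an unpadded URL-safe base64 token."""
    token = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return token.rstrip("=")  # padding is optional and "=" is awkward in paths


def decode_target(token):
    """Reverse encode_target, restoring the stripped padding first."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


token = encode_target("https://othersite.com/index.php?id=123")
print("/memento/20260203040506/" + token)
print(decode_target(token))
```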
>>
How hard is it to have a hard drive and a pi running on a crt 24/7 ish simulating say Boomerang AMC reruns but instead of shitty old TV my favorite phonepost doomscrolls?
>>
>>108914954
Sounds fairly easy once you have all the hardware and connectors to the CRT TV.

Collection of images named this
img001.jpg
img002.jpg
img003.png
...
(GIF probably also works)

Then
ffmpeg -framerate 1/6 -i img%03d.jpg -c:v libx264 -r 30 -pix_fmt yuv420p out.mp4

Then play the "out.mp4" video. Done, slideshow of images at 6 seconds per image.
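One catch with the file list above: as far as I know, a numbered pattern like img%03d.jpg only picks up consecutive .jpg files and stops at the first missing number, so img003.png would end the slideshow early. A small Python sketch to check how many files the pattern would actually cover (sequence_length is a made-up helper; it only mimics the numbering logic, it does not call ffmpeg):

```python
def sequence_length(names, pattern="img%03d.jpg", start=1):
    """Count consecutive files matching a printf-style numbered pattern."""
    present = set(names)
    count = 0
    # Walk img001.jpg, img002.jpg, ... until the next number is missing.
    while pattern % (start + count) in present:
        count += 1
    return count


files = ["img001.jpg", "img002.jpg", "img003.png"]
print(sequence_length(files))  # img003.png does not match the .jpg pattern
```

If the count is lower than the number of images, converting everything to one format before running ffmpeg is the simple fix.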

Reminds me of my time copying VHS tapes to DVDs. I could say more about that.
>>
>>108914940
Percent encoding method didn't work (I think I knew this months ago but forgot). https://archive.is/hFLPb is proof that it fails.

A file named
"https%3A%2F%2Fsite.com%2Findex.php%3Fpage%3Dpost%26s%3Dview%26id%3D12345679"

Becomes this in a gateway (double percent encoded):
https://[cid].ipfs.ipfs-02.hypha.coop/memento/20260527051814/https%253A%252F%252Fsite.com%252Findex.php%253Fpage%253Dpost%2526s%253Dview%2526id%253D12345679

We need it to be /memento/20260203040506/https://othersite.com/index.php?id=123 (or single percent encoded?) so archive.today can index it to othersite.com and not just *.hypha.coop
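The double encoding is easy to reproduce in Python. This sketch just applies percent-encoding twice to the example URL; it assumes the gateway encodes every reserved character, including the "%" signs already in the file name, which is what the output above suggests.

```python
from urllib.parse import quote, unquote

target = "https://site.com/index.php?page=post&s=view&id=12345679"

# safe="" makes quote() encode "/" too, like the file name in the post.
once = quote(target, safe="")
# Encoding again turns every "%" into "%25", which is what the gateway URL shows.
twice = quote(once, safe="")

print(once)
print(twice)
# Two rounds of decoding are needed to get the original back.
print(unquote(unquote(twice)) == target)
```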
>>
>>108914940
>localtunnel
This would be fuckin dope if it worked with no walls:
>https://theboroer.github.io/localtunnel-www/
>$ sudo npm install -g localtunnel
>$ ipfs daemon &
>$ ipwb replay 20260527051814-https---rule34.xxx-index.php-page-post-s-view-id-13656708.cdxj &
>$ lt --port 2016

I got the random tranny porn web capture to show up in clearweb at
https://tidy-meals-feel.loca.lt/memento/20260527051814/https://rule34.xxx/index.php?page=post&s=view&id=13656708

BUT ONLY after clicking/copy-pasting on some verification shit. Works flawlessly if using a .onion site:
https://archive.is/ysIMX

but I said I didn't want to use that anymore.
>>
>>108915771
It's sad that the Tor2clearweb gateways have all gone extinct. I could have used those. I'm now trying to use this thing:
https://localxpose.io/apps/nginx

Works:
>$ sudo npm install -g loclx

Fails:
>$ loclx tunnel http --to http://localhost:2016
>bash: loclx: command not found
>$ sudo npx loclx tunnel http --to http://localhost:2016
>sh: line 1: loclx: command not found

Works?
>$ npm config set prefix "$HOME/.local"; npm install -g loclx
>>
Archive-related news:

Deathwatch
>https://wiki.archiveteam.org/index.php/Deathwatch#2026-05
>May: Bucknell University Press will close at end of the 2025-26 school year.[61]
>May: The Primary School will close at end of the 2025-26 school year.[62]
>May: Sterling College will close at the end of the 2025-26 school year.[63]
>May: Trinity Christian College will close at the end of the 2025-26 school year.[64]
>2026-05-31: University of Houston Digital History will close.
>2026-05-31: Tistory will remove all uploaded videos.
>2026-05-31: plus a, a site documenting theater, will shut down.[65]
>2026-05-30: https://minelli.fr/[66]
>2026-05-30: ruru-jinro.net, ruru-jinro is an online Japanese werewolf game server that has been operating since May 2009. It is scheduled to close on May 30th (JST).[67]
>2026-05-29: Tele2 will be discontinued by its parent company Odido[68]
>2026-05-28: NIKKEI COMPASS will close service.[69]

Silicon Valley VCs Invest in Head-Mounted Cameras on Workers in India For Training AI
>https://web.archive.org/web/20260527022137/https://gizmodo.com/silicon-valley-vc-backs-startup-that-gathers-ai-datasets-from-head-mounted-cameras-on-workers-in-india-2000761062
>Human Archive believes its technology "will become foundational infrastructure for automating manual labor."
>A video went viral in India about a month ago appearing to show a vast number of garment workers wearing tiny, head-mounted cameras while they worked in a dreary-looking factory. A widespread hunch was the technology the video depicted was a system for what’s known as egocentric data collection—gathering first-person footage of people in action to train AI models, in order to replace the workers with robots. But it wasn’t totally clear if the video was real, let alone if the footage would or could be used to replace the workers.
>>
Is there a localhost to Internet thing which doesn't suck? Hoping one exists that doesn't require a login/verification. Such things did in fact exist in the past: see Tor2web and >>108915771 before it required said verification.

Otherwise, I'll have to make account(s) and pay for it.

>>108915974
>Works?
Nope:
>$ ~/.local/lib/node_modules/loclx/bin/loclx tunnel http --to http://localhost:2016
>Error: unauthenticated access
>>
This is an email from 1995-12-26 10:46. It has the subject line "Red Neck".

This image was deleted off of https://archive.org/details/ because that website is run by petty fucks.

Full/original image in ar://:
- meta: https://thuanannew1.store/raw/B99wT2us-zAEYox4b1tSVGpgwYGw_N5V5XRNlKjQUvM
- data: https://bienchecung.store/raw/LaM_OMXzH7bxlANb9_K_IF8u9F7F-kg3KfpN0W66q0k
>>
>>108914639
Forgot about this general which I first saw months ago:

/AAD/ - Archiving And Donating computer resources general
>>108890811

Most recent thread in that general died on 2026-05-24:
https://desuarchive.org/g/thread/108890811/

Last post was:
>Another bump. I just wanted to say that I can't live without the Wayback machine anymore. I'm working on a project that often involves dead links and it would have been far more difficult to complete without it, maybe impossible. Whatever happened to "if it was uploaded to the Internet, it's there forever" or however the saying went?
>>
>>108914639
>https://ipfs.io/
Sadly, since May 13, 2026 all of the https://ipfs.io/ipfs/[cid] links redirect to
>title: IPFS Service Worker Gateway | HEAD@[7 hex characters]
>url: https://[cid].ipfs.inbrowser.link/
which is an inferior IPFS gateway.
>>
A month or two ago, I bought a used 4-TB HDD for 15 USD per terabyte. I have reason to believe that it was only lightly used. I catted it all out to /dev/null and saw no storage medium errors. U jelly?
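For anyone who wants the same full read pass with a byte count at the end, a rough Python equivalent of cat to /dev/null. read_whole is a made-up name and the device path in the comment is only an example; reading a raw disk usually needs root. A sector that cannot be read should surface as an OSError from read().

```python
def read_whole(path, chunk_size=1024 * 1024):
    """Read a file or block device end to end and return the byte count."""
    total = 0
    # buffering=0 gives unbuffered raw reads; data is discarded after counting.
    with open(path, "rb", buffering=0) as device:
        while True:
            block = device.read(chunk_size)
            if not block:  # empty read means end of device
                break
            total += len(block)
    return total


# Example call (hypothetical device name):
# print(read_whole("/dev/sdX"))
```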
>>
>>108914628
>hoard a bunch of shit in 2012
>it just lies on the NAS for over a decade, providing zero value to anybody
idk man, the zombie apocalypse just aint coming
>>
>>108917635
Breakdown of what you have?

I could think of value that it has such as
- deleted YouTube videos
- torrents which are dead now


