/t/ - Torrents



File: ostorich.png (53 KB, 300x420)
A continuation and generalization of the Yuki.la archive dump threads.
Post torrent scrapes and dumps from other (and late) archives, and discuss any recurring idiocy from other archivers so others can take better charge of, and prepare better for, archiving 4chan threads.

Last thread dump: https://archived.moe/t/thread/1033260/
the Uber-compress dump: https://archived.moe/t/thread/1108330/
>>
File: file.png (113 KB, 1158x827)
To reiterate, this was DMCA, LOL's "uber-compressed" dump of Yuki.la from the other thread, but mods deleted the thread for no fucking reason, unless it was to keep the threads singular.
So I'm reposting it here for better visibility:
>I was unhappy with the file-size and compression method in the original torrent, so I decompressed all the .TAR.GZs.
>I then recompressed the TARs using the best compression method available for text files, PPMD.

>This uber-compressed archive is meant for long-term storage of the data for datahoarders, not for immediate use.

>Here are the archive's statistics:
>Original uncompressed input size: 3.59 TB
>Original compressed input size: 517.71 GB
>Original number of files: 52,686,360

>Final output size: 249 GB
>Final compression ratio: 6.77%
>Final archive decompression speed: 16 MB/s

>Torrent download: https://btcache.me/torrent/212342C6AE503F02957382851CFBA340E273299F
>Magnet URL: magnet:?xt=urn:btih:212342C6AE503F02957382851CFBA340E273299F&dn=4chan.7z&tr=udp%3A%2F%2Fopen.tracker.cl%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
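
As a quick sanity check on the statistics quoted above, here is a minimal sketch that recomputes them. It assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB), which is the reading under which the quoted 6.77% comes out; the post itself does not state the unit convention.

```python
# Figures quoted in the post; the 1024-based unit convention is an assumption.
UNCOMPRESSED_TB = 3.59
FINAL_GB = 249.0
DECOMPRESS_MB_PER_S = 16
GB_PER_TB = 1024
MB_PER_GB = 1024

uncompressed_gb = UNCOMPRESSED_TB * GB_PER_TB

# Output size relative to the raw, uncompressed input.
ratio_vs_raw = FINAL_GB / uncompressed_gb

# Rough wall-clock time to unpack everything at the quoted speed,
# assuming the speed refers to uncompressed output bytes.
decompress_hours = uncompressed_gb * MB_PER_GB / DECOMPRESS_MB_PER_S / 3600

print(f"compression ratio vs raw input: {ratio_vs_raw:.2%}")
print(f"full decompression: roughly {decompress_hours:.0f} hours")
```

Under these assumptions a full unpack takes a couple of days, which fits the "long-term storage, not immediate use" framing.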
>>
anyone got the best of 4chan greentexts?
>>
>>1153106
so it is just text?

and... how does one implement this as useful? not a lot of details on a 250 GB torrent dice roll
>>
>>1153615
>how does one implement this as useful
See the OP >>1033260
>>
>>1153511
>>>/r/
>>
>>1153615
>>1153629
4chan is a collection of rambling idiots, even if we make a great speech AI, the only use for it would be to shitpost (See gpt-4chan).
>>
daily reminder that the wakarimasen admin is a nigger
>>
File: thanks reddit.png (526 KB, 576x768)
>>1153631
>4chan is a collection of rambling idiots
you pretty much describe the nature of the internet, although it's not exclusively for idiocy.

or are you going to convince me other sites are much in a brighter spot?
>>
>>1153670
>are you going to convince me other sites are much in a brighter spot?
Smaller forums are at the ends of the distribution, they're either places of genius or the utmost retardation and weird shit. Mainstream websites are at the center, and the center is a place of average, we're idiotic.
>>
Fireden is back up... except for /v/ and /vg/. Or at least the websites that redirect to the onion address are down. If someone still has /vg/'s onion address can they test if it's still up with TOR?
>>
>>1154494
nvm it seems to be still down
http://hidulaoe3wnqi3jejmizgdcwsgenf777j2f4qfcxvt4yrx7lbhd2a2yd.onion this is the one I used to search /vg/
>>
waka still down...
>>
File: 1350773744688.png (84 KB, 300x300)
>>1153954
>Mainstream websites are at the center
>>
waka still down, also desu no longer archives /gif/
>>
rump
>>
Is there an archive that lets me search /t/? The few I've seen don't allow searching.
>>
>>1158790
https://archiveofsins.com/t/

https://archive.4plebs.org/_/articles/credits/
4plebs has a nice breakdown of the different archiving sites.
>>
>>1157550
>also desu no longer archives /gif/
SHIT really?
does 4plebs do that now?
>>
>>1159963
only archived.moe does
>>
>>1160009
>archived.moe
not that neglective shithole again...
>>
>>1158851
thanks anon
>>
>>1158851
shit, i thought Sins was dead
>>
Is wakarimasen not coming back?
>>
>>1163798
probably not
>>
>>1166084
sucks...
>>
old archive of 8ch and 4chan posts, screencaps, including oldanon advice, bmw, lain, etc magnet:?xt=urn:btih:b6b53efe52a78ee3e1570a3a9462e616241c81ff&dn=_fringe&tr=udp://tracker.openbittorrent.com:80&tr=udp://tracker.opentrackr.org:1337/announce
anon's 2013-2020 4chan folder magnet:?xt=urn:btih:9c7e05658e1591d6a741578463c5e50bc6427776&dn=Anon's+2013+-+2020+4chan+Folder&tr=udp://tracker.openbittorrent.com:80&tr=udp://tracker.opentrackr.org:1337/announce
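
For anyone scripting around magnet links like the two above, here is a minimal sketch that pulls out the info hash, display name, and tracker list using only the Python standard library. The function name `parse_magnet` is made up for illustration; it only handles the `xt=urn:btih:` (BitTorrent v1 hex hash) form used in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def parse_magnet(uri: str) -> dict:
    """Split a magnet URI into info hash, display name, and tracker list."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parts.query)  # also decodes %xx escapes and '+'
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    if not xt.startswith(prefix):
        raise ValueError("no BitTorrent info hash (xt=urn:btih:...) found")
    return {
        "info_hash": xt[len(prefix):].lower(),
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


fringe = parse_magnet(
    "magnet:?xt=urn:btih:b6b53efe52a78ee3e1570a3a9462e616241c81ff&dn=_fringe"
    "&tr=udp://tracker.openbittorrent.com:80"
    "&tr=udp://tracker.opentrackr.org:1337/announce"
)
print(fringe["info_hash"], fringe["name"], len(fringe["trackers"]))
```

The info hash is the part that identifies the torrent. If the listed trackers are dead, a client can still look for peers over DHT with just the hash, but only if someone is actually seeding.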
>>
>>1167081
>old archive of 8ch
b-b-based
how far back does this one go? was it before their shitty attempt to replace the site with the shittier "beta" version? the Josh bullshit?
>>
>>1167081
interesting
>>
File: WHAT_IS_GOING_ON.png (2.3 MB, 1400x1400)
>desuarchive and archived.moe are both down due to cuckflare shit
>>
>>1169201
Bibanon's down too
>>
File: E4cVcUbXMAEOSnJ.jpg (12 KB, 424x257)
>>1169207
>Bibanon's down too
>>
>>1169213
And KiwiFarms just got blocked, moved to a .ru site
CWCki's still dead
>>
>>1169201
Update: apparently desuarchive has been down for maintenance all day today
Idk about archived.moe
>>
>>1169201
>>1169213
desuarchive and Bibanon are back up
>>
>>1169284
Nice
but Archived.moe is dead?
>>
>>1169287
Yeah it's still down
>>
>>1169287
>>1169289
Disregard this it took too fucking long to load but it's back up too
>>
>>1169201
they back
>>
>>1153106
If I wanted to archive this in separate 40 gigabyte segments to make it survive until the heat death of the universe, how can I do that on Linux? I can't figure out multipart zips at the moment and I think I'm retarded

I back everything up onto physical disk so that it will survive electromagnetic radiation etc
>>
>>1170179
There are compression formats (e.g. bzip2) that compress a file in independent blocks, so that corruption is localized to one block and the majority of the file can still be recovered (bzip2 ships a bzip2recover tool for this). But stitching together uncorrupted blocks still takes know-how and effort, so splitting into independent files by board and thread number range might be more robust. Or you can do both.

More important though is keeping it backed up to multiple locations, and keeping those backups fresh.
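
A minimal sketch of the "independent segments" idea, assuming plain byte-level splitting plus a checksum per part so a damaged segment can be spotted and replaced from another backup. The function names and the `.partNNNN` naming are made up for illustration; this is not how any particular archiver does it.

```python
import hashlib
from pathlib import Path


def split_file(src: Path, out_dir: Path, part_size: int, buf_size: int = 1 << 20) -> list[Path]:
    """Split src into numbered parts of at most part_size bytes and write
    a SHA256SUMS manifest so each part can be verified on its own."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts, manifest = [], []
    with src.open("rb") as f:
        index = 0
        while True:
            part_path = out_dir / f"{src.name}.part{index:04d}"
            digest = hashlib.sha256()
            written = 0
            with part_path.open("wb") as out:
                while written < part_size:
                    chunk = f.read(min(buf_size, part_size - written))
                    if not chunk:
                        break
                    out.write(chunk)
                    digest.update(chunk)
                    written += len(chunk)
            if written == 0:
                part_path.unlink()  # source exhausted; drop the empty part
                break
            parts.append(part_path)
            manifest.append(f"{digest.hexdigest()}  {part_path.name}")
            index += 1
    (out_dir / "SHA256SUMS").write_text("\n".join(manifest) + "\n")
    return parts


def join_parts(parts: list[Path], dest: Path) -> None:
    """Concatenate parts back into one file, in the order given."""
    with dest.open("wb") as out:
        for part in parts:
            with part.open("rb") as f:
                while chunk := f.read(1 << 20):
                    out.write(chunk)
```

Usage would look like `split_file(Path("4chan.7z"), Path("parts"), 40 * 10**9)`. The manifest uses the same "hash, two spaces, filename" layout that `sha256sum -c` reads, and the coreutils `split -b` and `cat` commands do the same splitting and joining from a shell. Splitting alone adds no redundancy, so the advice about multiple fresh backups still applies.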
>>
>>1170600
can you post a guide? im tech retarded but I need to learn about this specifically for my archive, since I have about 15tb to back up to disk atm, and 3tb already done
>>
does anyone have the gpt-4chan bot?
>>
>>1170868
https://huggingface.co/ykilcher/gpt-4chan/discussions/4

for context
>>
>>1167081
any seeders?
>>
Fix for 2018's 2005-2008 text-only "Ten Billion" archive:
https://archive.org/details/archive_ten_billion_patched
>>
>>1173165
Nice thanks
>>
wakarimasen still down...
>>
So have we worked out a solution for any of this? I was just in >>>/r/18586430
The big problem to me has been /gif/, since it just isn't archived anywhere anymore. At least the others still mostly exist, but scattered.
>>
>>1175354
To my knowledge, no. At least not if you want webms and gifs, because those take up a lot of space and archivers don't want that.
>>
>>1153106
bump
>>
Does Pepe count?

magnet:?xt=urn:btih:14ef977445757b827b69ccad7ced84445171f23a&dn=pepe.rar&tr=http%3a%2f%2f24.77.48.250%3a9000%2fannounce
>>
>>1153106
I've tried, again and again, but never succeeded. Does anyone have the old Tom Green trolls/raids on his late-night talk show? Most of them were scraped from YouTube...and the Internet.
>>
>>1180342
What's in that link, Pepe?
>>
>>1167081
someone reseed _fringe please
>>
>>1160009
archived.moe doesn't save full images from /gif/, only thumbnails. See this thread for full images of 2022-10: >>1182460 = torrent which does NOT need more peers right now!
>>
File: 1495858987880.png (523 KB, 1111x597)
>>
literally no site archives /gif/ now with all the files still in...
>>
File: 1560812557338.jpg (39 KB, 640x615)
>>
Is fireden's onion still down?
>>
File: 1631644750564.jpg (1.76 MB, 3538x3424)
>>
>>1184441
Content aside, it's probably just too big to bother for a lot of archivers.
>>
Anyone got 2011-2016 archives with images?
I really want to know why glowies shut them down left and right.
>>
https://warosu.org/jp/thread/1786172#p1786183
is there an image archive of this thread anywhere?
>>
>>1189100
It's probably not glowies but just the costs of running the archives, especially storage for images. I think archives remove specific images if glowies request it.
>>
>>1184441
Good fuck /gif/.
>>
Arch.b4k.co sucks so much ass
>>
>>1192126
Normally it's pretty good, but sometimes they break the search.
>>
>>1181947
Am I the only one afraid yet curious to download this link?
>>
>>1195435
Maybe, I didn't try it.
>>
>>1193682
The search always seems to have a 50% down rate whenever I take a gander at it.
>>
>>1198603
Oh well, what can you do.
>>
Why do archives only accept bitcoin and shitcoins and never regular old money for donations?
>>
File: 1562103122299.png (1.38 MB, 1100x1210)
>>
>>1153954
>>1153670
Reddit is far superior to this.
>>
>>1167081
>old archive of 8ch
Kill Your Self
>>
>>1170868
>>1170870
archive.org
it's basically just a snapshot of /pol/
>>
>>1157550
>>1159963
>>1175354
>>1177167
>>1183054
>>1184441
>>1191379
>>1188946

Use this bruh

https://gist.github.com/4chenz/de3a3490aff19fd72e4fdd9b7dafc8f4
>>
>>1202929
Because A. Security Reasons B. Security Reasons and C. Security Reasons
Really wish that they would support paypal or some digital bank transfer app.
>But you can use paypal or some third party to convert it into crypto
MY ASS
>>
>>1206416
No really, fuck /gif/. It's been a complete cesspit for years now and any worthwhile archives are long gone.
>>
File: 1605855525334.jpg (11 KB, 248x203)
>>1153106
I'm leaving this site forever and this seems to be a good Thread to post my last and final post on this site.
I will remember this site fondly as it shaped my humor and behavior as a young adult but now it is time I move on. This site is holding me back from my true potential.

Bye 4Chan.
>>
>>1184441
>>1188946
https://archived.moe/talk/thread/1479/
If you have $150 to burn every month you can donate to this admin or make an archive yourself.
>>
>>1206485
Unironically happy for you if true
>>
Anonymous Sat 05 Dec 2020 00:24:47 No.982410
Bump
>>
>>1153631
Can use gpt-4chan to create a customer service chat bot
>>
>>1211271
ROFL!
>>
>>1211271
Speaking of which, can anyone find Chat-4Chan anywhere? Can find neither a site nor a torrent.
>>
>>1215162
Here https://github.com/oobabooga/text-generation-webui#gpt-4chan
>>
>>1215348
>https://github.com/oobabooga/text-generation-webui#gpt-4chan

Many thanks!
>>
File: 4plebs is purple.png (40 KB, 1326x658)
4plebs turned purple for me.

>>1206485
See you later.
>>
>>1206931
>>1217424
Yeah sorry guys, it only lasted like 40 days. I've been back for 2 days now. I just can't stop coming here. I unironically had withdrawal symptoms whilst I was not here.
>>
>>1217434
Remember - you are here forever.
But jokes aside, just try to post and lurk less, so it won't matter if you come here from time to time.
Get something else to do.
>>
>>1153106
Bump
https://desuarchive.org/co/thread/136778492/#q136778492
>>
File: 1582650951527.jpg (56 KB, 458x445)
>>1217434
Pick a curated number of threads and just stay in those until they expire? Be extra-choosy in adding to that set so you can hit them all quickly and move on to your next activity?
>>
File: 1520213636602.jpg (16 KB, 250x250)
>>
File: 1570127619190.jpg (347 KB, 1200x700)
>>
These archival dumps are very useful
>>
>>1153106
Does anyone have any 2013-2014 /lgbt/ archives? According to https://wiki.archiveteam.org/index.php/4chan WorldAthleticProject and Foolzashit were monitoring it
>>
>>1231338
This would be the time of year for that.

Perplexity dot ai doesn't know either, though.
>>
>>1231338
https://chan.k47.cz/
>>
>>1233710
Thank you anon, you are the best.
>>
>>1153615
Yesterday I started to make some of my Python objects capable of loading the HTML files from the /pol/ tar.gz. Can't extract all the data the 4chan API gives, but if there's something in the HTML files, it can be extracted.
I don't know how much more I can tweak this or how well it will perform, but it's probably more useful to extract the data into the "standard" API's JSON format, because while HTML is highly compressible, the majority of the data is just redundant HTML. I expect my extracted data to be significantly smaller than these, even uncompressed.

So my question is: if I test my code for more boards (there can be some differences, like /pol/ has flags and IDs), will you guys with better bandwidth extract the data and share it in this new format?

Also I wouldn't pack a whole board into a single tar.gz, monthly batches at most.
And I won't share data because I can't afford the bandwidth, but I can upload the python code somewhere (although I'm gonna have to somewhat document it) and others could convert and share the archives of the board of their interest...
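
A rough sketch of what such an HTML-to-JSON conversion could look like, using only the standard library. This is not the poster's code. The markup it looks for (`span.dateTime` with a `data-utc` attribute, `blockquote.postMessage` with an `id` of the form `m<post number>`) is an assumption about the saved pages and may not hold for every board or era. The output keys `no`, `time`, and `com` mirror the API's field names, but here `com` holds plain text rather than the HTML the real API returns.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect {"no", "time", "com"} dicts from saved thread HTML (assumed markup)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.posts = []
        self._time = None     # timestamp seen most recently, belongs to the next post body
        self._current = None  # post dict while inside its blockquote
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        utc = a.get("data-utc") or ""
        post_id = (a.get("id") or "").lstrip("m")
        if tag == "span" and "dateTime" in classes and utc.isdigit():
            self._time = int(utc)
        elif tag == "blockquote" and "postMessage" in classes and post_id.isdigit():
            self._current = {"no": int(post_id), "time": self._time}
            self._text = []
        elif tag == "br" and self._current is not None:
            self._text.append("\n")  # keep line breaks inside the comment

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._current is not None:
            self._current["com"] = "".join(self._text)
            self.posts.append(self._current)
            self._current = None


def batch_by_month(posts):
    """Group posts into {"YYYY-MM": [post, ...]} using the UTC timestamp."""
    batches = defaultdict(list)
    for post in posts:
        if post["time"] is None:
            key = "unknown"
        else:
            key = datetime.fromtimestamp(post["time"], tz=timezone.utc).strftime("%Y-%m")
        batches[key].append(post)
    return dict(batches)


# Hand-written sample in the assumed markup, not a real saved page.
sample = (
    '<span class="dateTime" data-utc="1660000000">08/08/22</span>'
    '<blockquote class="postMessage" id="m1153106">first line<br>second &amp; last</blockquote>'
)
extractor = PostExtractor()
extractor.feed(sample)
extractor.close()
print(json.dumps(batch_by_month(extractor.posts), indent=2))
```

Grouping by month at extraction time makes the "monthly batches at most" packaging straightforward: each key can be written out as its own JSON file and compressed separately.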
>>
>>1217434
Oh, and stay away from the boards where things expire in 24 hours or less. Too time consuming.
>>
File: EfqP6W8WkAc_RmO.jpg (37 KB, 750x394)
>>
>>1153631
>4chan is a collection of rambling idiots
im so sick of this stupid fucking meme, yes there are people that trolled but ive seen a just as many serious and insightful posts on here as i have seen shitposts, though recently the shitposts have gone up and you know why? you know WHY? its cause of faggots like you perpetuating this mindset that 4chan is the funny maymay site ONLY good for shitposts and that there cant ever be any serious discussion of any sort whatsoever because this is 4chan so all discussions MUST be maymay shitposty garbage, you spread this shit to all the stupid fucking redditors who then flood the site and turn it into nothing but pure shitposting, shitposting wich isnt even GOOD because its all regurgitated REDDIT SHIT
>>
Regurgitated bump from Reddit from right before its management broke it.
>>
>>1240329
you're a newfag my man, I've been here since 2007 and the majority of 4chan boards with a few exceptions were always just different flavors of /s4s/. it has changed a little because people grew up but shitposting has always been prevalent here and the majority of "reddit shit" you mention is stuff that originated here but stopped being cool and considered reddit by retards trying to fit in
>>
What are good 4chan archive sites?

Any for /gif/?
>>
>>1217434
If you have withdrawal symptoms that proves you're addicted and you need to break the cycle. Keep trying anon.
>>
File: 1541430232647.png (17 KB, 861x758)
>>
File: go_back.jpg (232 KB, 571x566)
>>1206413
off you go then sister
>>
Would be nice if archived.moe enabled search for some boards. At least for dates/time periods that other archives don't cover.
>>
/lgbt/ archive
>>>/lgbt/32197766
>>
>>1153106
I put the data into Parquet files, and got even better results than the uber-compressed version (27% vs 48% of original compressed size).
Plus, they're quickly queryable using DuckDB.
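
To put the 27% vs 48% figures in absolute terms, here is a small worked calculation. The two sizes come from the statistics quoted earlier in the thread; the Parquet size is only what the reported 27% implies, not a measured number.

```python
ORIGINAL_COMPRESSED_GB = 517.71  # original .tar.gz torrent, from the quoted stats
UBER_COMPRESSED_GB = 249.0       # PPMd re-compression, from the quoted stats
PARQUET_FRACTION = 0.27          # fraction reported in the post above

# Check the 48% figure against the quoted sizes.
uber_fraction = UBER_COMPRESSED_GB / ORIGINAL_COMPRESSED_GB

# Size the Parquet files would have if the 27% figure is exact.
parquet_gb = PARQUET_FRACTION * ORIGINAL_COMPRESSED_GB

print(f"uber-compressed: {uber_fraction:.0%} of the original torrent")
print(f"Parquet (implied): about {parquet_gb:.0f} GB")
```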
>>
File: 1603101783165.jpg (171 KB, 1024x1024)
>>
File: 1590187261256.png (497 KB, 654x663)
>>
File: 1688312378703411.gif (745 KB, 385x381)
>>1205376
By that cuck killing Pepe, he set him free. Pepe belongs to 4chan now.
>>
>>1206414
Shut up fag
>>
>>1167081
reseed pls
>>
File: Pepe_Not_Give_Up.jpg (65 KB, 1080x780)
>>
File: Pepe_stuffed.jpg (124 KB, 880x789)
>>
File: 10694-29164-9431.jpg (18 KB, 300x300)
>>
>>1234051
>>1233710
Is there an API for this? I don't want to scrape it, butttt
>>
File: Pepe_in_the_Dark.png (179 KB, 640x462)
>>
>>1189947
seconding this 10 months later... for meido
>>
>>1153106
how does one get these
>>
>>1267046
Check if this works https://archive.org/details/yuki.la-compressed
>>
>>1252664
cute
>>
>>1153631
Should probably feed it to an anime girl A.I.
It's the only logical conclusion to this site.
>>
>>1272032
>>
File: scatbump.jpg (41 KB, 600x750)
>>
>>1167081
it stays on searching for metadata
>>
Happy holidays and happy archiving.
>>
I wonder if anyone will study the history of 4chan via these archives....
>>
File: update_2024_01_18.png (197 KB, 771x798)
Making lots of progress on this frontend project. Help is appreciated

https://github.com/sky-cake/ayase-quart/tree/main/preview
>>
New 4plebs yearly dump is being uploaded at https://archive.org/details/@fourplebs

Torrents are available
>>
How do I get pre-2012 images from /jp/? Any image on warosu from before mid-July 2012 is just the thumbnail and it sucks.
>>
File: 1327602666589.jpg (347 KB, 1376x1032)
>>1288250
Most likely no.
>>
>>1206415
that wouldn't be too bad but I can't get the fucker to stop talking about the 2016 election
it also has trouble dealing with post numbers and will just spit out strings of numbers after a while
>>
File: 1343957368332.jpg (48 KB, 680x578)
>>1290498
I'm just hoping that someone has a collection of images from that time that they could upload. I know there is no organized archive for what I'm looking for.
>>
Is there a 2014-2015 archive of /v/? I think fireden is still down.
>>
>>1286506
/trv/ seems to be missing from https://archive.org/details/4plebs-org-thumbnail-dump-2024-01. I'm not sure if it's intentional or not.
>>
Like tears in the rain.
>>
>>1291205
don't know if you are still here, but there might be something worth looking at in here
https://archive.org/details/2013_10_25_4chandata
only downside is that the download speed is slow as shit
>>
Is FoolFuuka still the way to go?
>>
>>1285051
Well, video essayists cite them all the time
>>
>>1298617
This is true (also bump)
>>
>>1153106
what's the best way to search through the archive?
Just spinning up a local foolfuuka instance?
>>
>>1217434
I stopped lurking /v/ for 4-6 months back in 2013
Now I'm highly capable of not browsing there 24/7, very easy to avoid cancerous threads.
>>
>>1291297
Seconding
There oughta be someone with a spare /v/ archive, at least during the days of DP and foolz
>>
>>1295780
>slow
Archive.org downloads are actually quite fast if you use the torrent option (even for rare files)
>>
>>1298409
I would say no, there are lots of modernized tools now.

Here are the scrapers I have tried

https://github.com/bibanon/neofuuka-scraper (works good, just have to set up the db schema yourself, see https://github.com/sky-cake/ayase-quart?tab=readme-ov-file#neofuuka for help with that)
https://github.com/bibanon/neofuuka-scraper/pull/4 (with thread filtering)
https://github.com/sky-cake/Ritual (has the most flexible thread filtering)
https://github.com/bbepis/Hayden/tree/master/Hayden (works good, takes care of schema creation)

Here are the frontends I have tried

https://github.com/sky-cake/ayase-quart (works good, still being developed to add new features)
https://github.com/bibanon/ayase (bare bones, vulnerable to sql injection, only use yourself)
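
On the SQL injection point, here is a generic illustration of the difference between pasting user input into a query and passing it as a parameter. It uses an in-memory SQLite database as a stand-in; the table and column names are made up and none of this is taken from ayase or any of the other projects listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (no INTEGER PRIMARY KEY, board TEXT, com TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, "t", "yuki.la dump"), (2, "t", "reseed pls"), (3, "g", "unrelated")],
)


def search_unsafe(term: str) -> list[int]:
    # DON'T: user input becomes part of the SQL text and can rewrite the query.
    sql = f"SELECT no FROM posts WHERE board = 't' AND com LIKE '%{term}%'"
    return [row[0] for row in conn.execute(sql)]


def search_safe(term: str) -> list[int]:
    # DO: the value is bound separately, so it is only ever treated as data.
    sql = "SELECT no FROM posts WHERE board = 't' AND com LIKE ?"
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]


payload = "' OR 1=1 --"
print(search_unsafe(payload))  # the board filter is bypassed
print(search_safe(payload))    # treated as a literal search string
```

A frontend that builds queries the first way is fine on a machine only you can reach, which is presumably what "only use yourself" means, but should not face the public internet.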
>>
>>1291297
Fireden used to have a Tor link that had the archive, but it has stopped working. I do have an archive of some random template images from some random thread.
>>
>>1206485
bye and good luck!
>>
>>1314498
thank you
>>
What tools do you use to import dumps? From 4plebs or other archives.
>>
Did anyone ever save the /new/ archive from 2010? I remember seeing it for a while afterwards but all traces are gone now
>>
>>1316364
Tell me more.
>>
yuki.la
ded
>>
File: 14izhc.jpg (49 KB, 960x776)
>>1319623
>>
>>1312854
thanks for this anon, that's actually really helpful!
>>
File: 1724003540186.jpg (618 KB, 1080x1069)
>>1324252
Hey, glad I can help!
>>
File: 1725030001564.jpg (726 KB, 1080x1040)
>>
>>1312854
Thanks
>>
File: 1728395189004.jpg (799 KB, 1080x1040)
>>1331193
'comb
>>
>>1319648
What did anon mean by this?
>>
>>1206413
Sadly true in a lot of cases.
>>
>>1206413
Reddit is gay.


