A continuation and generalization of the Yuki.la archive dump threads.
Post torrent scrapes and dumps from other (and late) archives, and discuss any idiocy from other archivers so that others can take better charge of, and prepare better for, archiving 4chan threads.
Last thread dump: https://archived.moe/t/thread/1033260/
The uber-compressed dump: https://archived.moe/t/thread/1108330/
To reiterate, this was DMCA, LOL's "uber-compressed" dump of Yuki.la from the other thread, but mods deleted the thread for no fucking reason, unless it was to keep the threads singular. So I'm reposting it here for better visibility:
>I was unhappy with the file-size and compression method in the original torrent, so I decompressed all the .TAR.GZs.
>I then recompressed the TARs using the best compression method available for text files, PPMD.
>This uber-compressed archive is meant for long-term storage of the data for datahoarders, not for immediate use.
>Here are the archive's statistics:
>Original uncompressed input size: 3.59 TB
>Original compressed input size: 517.71 GB
>Original number of files: 52,686,360
>Final output size: 249 GB
>Final compression ratio: 6.77%
>Final archive decompression speed: 16 MB/s
>Torrent download: https://btcache.me/torrent/212342C6AE503F02957382851CFBA340E273299F
>Magnet URL: magnet:?xt=urn:btih:212342C6AE503F02957382851CFBA340E273299F&dn=4chan.7z&tr=udp%3A%2F%2Fopen.tracker.cl%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
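For anyone checking the numbers in the quoted stats, here is a quick sanity check in Python. It assumes the sizes use binary units (1 TB = 1024 GB), which the post does not state; under that assumption the quoted 6.77% is the final size divided by the uncompressed size. With decimal units it would come out closer to 6.9%.

```python
# Sanity check of the stats quoted above.
# Assumption (not stated in the post): "TB" and "GB" are binary units,
# so 1 TB = 1024 GB.
uncompressed_gb = 3.59 * 1024   # original uncompressed input
original_gb = 517.71            # original .tar.gz input
final_gb = 249.0                # final PPMd output

# Final size as a fraction of the raw data and of the original torrent.
ratio_vs_uncompressed = final_gb / uncompressed_gb
ratio_vs_targz = final_gb / original_gb

print(f"final / uncompressed:    {ratio_vs_uncompressed:.2%}")
print(f"final / original tar.gz: {ratio_vs_targz:.2%}")
```

The second figure is the one that matters for storage: the recompressed archive is a bit under half the size of the original .tar.gz torrent.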
anyone got the best of 4chan greentexts?
>>1153106
so it is just text?
and... how does one implement this as useful? not a lot of details on a 250 GB torrent dice roll
>>1153615
>how does one implement this as useful
See the OP >>1033260
>>1153511
>>>/r/
>>1153615 >>1153629
4chan is a collection of rambling idiots, even if we make a great speech AI, the only use for it would be to shitpost (See gpt-4chan).
daily reminder that the wakarimasen admin is a nigger
>>1153631
>4chan is a collection of rambling idiots
you pretty much describe the nature of the internet, although it's not exclusively for idiocy.
or are you going to convince me other sites are much in a brighter spot?
>>1153670
>are you going to convince me other sites are much in a brighter spot?
Smaller forums are at the ends of the distribution, they're either places of genius or the utmost retardation and weird shit. Mainstream websites are at the center, and the center is a place of average, we're idiotic.
Fireden is back up... except for /v/ and /vg/. Or at least the websites that redirect to the onion address are down. If someone still has /vg/'s onion address can they test if it's still up with TOR?
>>1154494
nvm it seems to be still down
http://hidulaoe3wnqi3jejmizgdcwsgenf777j2f4qfcxvt4yrx7lbhd2a2yd.onion
this I used to search on /vg/
waka still down...
>>1153954
>Mainstream websites are at the center
waka still down, also desu no longer archives /gif/
rump
Is there an archive that lets me search /t/? The few I've seen don't allow searching.
>>1158790
https://archiveofsins.com/t/
https://archive.4plebs.org/_/articles/credits/
4plebs has a nice breakdown of the different archiving sites.
>>1157550
>also desu no longer archives /gif/
SHIT really?
does 4plebs do that now?
>>1159963
only archived moe does
>>1160009
>archived.moe
not that neglective shithole again...
>>1158851
thanks anon
>>1158851
shit, i thought Sins was dead
Is wakarimasen not coming back?
>>1163798
probably not
>>1166084
sucks...
old archive of 8ch and 4chan posts, screencaps, including oldanon advice, bmw, lain, etc
magnet:?xt=urn:btih:b6b53efe52a78ee3e1570a3a9462e616241c81ff&dn=_fringe&tr=udp://tracker.openbittorrent.com:80&tr=udp://tracker.opentrackr.org:1337/announce
anon's 2013-2020 4chan folder
magnet:?xt=urn:btih:9c7e05658e1591d6a741578463c5e50bc6427776&dn=Anon's+2013+-+2020+4chan+Folder&tr=udp://tracker.openbittorrent.com:80&tr=udp://tracker.opentrackr.org:1337/announce
>>1167081
>old archive of 8ch
b-b-based
how far does this one go back? was it before their shitty attempt to replace the site with the shittier "beta" version? the Josh bullshit?
>>1167081
interesting
>desuarchive and archived.moe are both down due to cuckflare shit
>>1169201
Bibanon's down too
>>1169207
>Bibanon's down too
>>1169213
And KiwiFarms just got blocked, moved to a .ru site
CWCki's still dead
>>1169201
Update apparently desuarchive has been down for maintenance all of today
Idk about archived.moe
>>1169201 >>1169213
desuarchive and Bibanon are back up
>>1169284
Nice
but Archived.moe is dead?
>>1169287
Yeah it's still down
>>1169287 >>1169289
Disregard this it took too fucking long to load but it's back up too
>>1169201
they back
>>1153106
If I wanted to archive this in separate 40 gigabyte segments to make it survive until the heat death of the universe, how can I do that on linux? I can't figure out multipart zips at the moment and I think I'm retarded
I back everything up onto physical disk so that it will survive electromagnetic radiation etc
>>1170179
There are compression formats (bzip2, for example) that compress a file in independent blocks, so that corruption is localized to one block and the majority of the file can still be recovered. But stitching together uncorrupted blocks still takes know-how and effort, so splitting into independent files by board and thread number range might be more robust. Or you can do both.
More important though is keeping it backed up to multiple locations, and keeping those backups fresh.
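For the "independent files" route, a minimal Python sketch of splitting one big archive into fixed-size parts with a SHA-256 checksum per part, so each disk can be verified on its own later. The names `split_file` and `join_parts` and the `.partNNNN` naming are made up for illustration; on linux, coreutils `split -b 40G` plus `cat` does the same splitting and joining without the checksums.

```python
import hashlib
from pathlib import Path


def split_file(src: Path, out_dir: Path, part_size: int,
               buf_size: int = 1 << 20) -> list[tuple[str, str]]:
    """Split src into numbered parts of at most part_size bytes.

    Returns (part file name, SHA-256 hex digest) pairs in order.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    index = 0
    with src.open("rb") as fin:
        while True:
            digest = hashlib.sha256()
            written = 0
            part_path = out_dir / f"{src.name}.part{index:04d}"
            with part_path.open("wb") as fout:
                # Copy in small buffers so a 40 GB part never sits in RAM.
                while written < part_size:
                    chunk = fin.read(min(buf_size, part_size - written))
                    if not chunk:
                        break
                    fout.write(chunk)
                    digest.update(chunk)
                    written += len(chunk)
            if written == 0:
                # Input ended exactly on a part boundary: drop the empty part.
                part_path.unlink()
                break
            manifest.append((part_path.name, digest.hexdigest()))
            index += 1
            if written < part_size:
                break  # short part means end of input
    return manifest


def join_parts(parts_dir: Path, manifest: list[tuple[str, str]],
               dest: Path, buf_size: int = 1 << 20) -> None:
    """Concatenate parts into dest, raising if any part fails its checksum."""
    with dest.open("wb") as fout:
        for name, expected in manifest:
            digest = hashlib.sha256()
            with (parts_dir / name).open("rb") as fin:
                while chunk := fin.read(buf_size):
                    digest.update(chunk)
                    fout.write(chunk)
            if digest.hexdigest() != expected:
                raise ValueError(f"checksum mismatch in {name}")
```

Usage would look something like `split_file(Path("4chan.7z"), Path("parts"), 40 * 10**9)`. Save the returned manifest next to the parts (and on every backup copy), since it is what tells you which disk went bad.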
>>1170600
can you post a guide, im tech retarded but I need to learn about this specifically for my archive, since I have about 15tb to back up to disk atm, and 3tb already done
does anyone have the gpt-4chan bot?
>>1170868
https://huggingface.co/ykilcher/gpt-4chan/discussions/4
for context
>>1167081
any seeders?
Fix for 2018's 2005-2008 text-only "Ten Billion" archive:
https://archive.org/details/archive_ten_billion_patched
>>1173165
Nice thanks
wakarimasen still down...
So have we worked out a solution for any of this? I was just in >>>/r/18586430
The big problem to me has been gif since it just isn't archived anywhere anymore. At least the others still mostly exist, but scattered.
>>1175354
To my knowledge, no. At least not if you want webms and gifs, because those take up much space and archivers don't want that.
>>1153106
bump
Does Pepe count?
magnet:?xt=urn:btih:14ef977445757b827b69ccad7ced84445171f23a&dn=pepe.rar&tr=http%3a%2f%2f24.77.48.250%3a9000%2fannounce
>>1153106
I've tried, again and again, but never succeeded. Does anyone have the old Tom Green trolls/raids on his late-night talk show? Most of them were scraped from YouTube...and the Internet.
>>1180342
What's in that link, Pepe?
>>1167081
someone reseed _fringe please
>>1160009
archived.moe doesn't save full images from /gif/, only thumbnails. See this thread for full images of 2022-10: >>1182460 = torrent which does NOT need more peers right now!
literally no site archives /gif/ now with all the files still in...
Is fireden's onion still down?
>>1184441
Content aside, it's probably just too big to bother for a lot of archivers.
Anyone got 2011-2016 archives with images? I really want to know why glowies shut them down left and right.
https://warosu.org/jp/thread/1786172#p1786183
is there an image archive of this thread anywhere
>>1189100
It's probably not glowies but just the costs of running the archives, especially storage for images. I think archives remove specific images if glowies request it.
>>1184441
Good, fuck /gif/.
Arch.b4k.co sucks so much ass
>>1192126
Normally it's pretty good, but sometimes they break the search.
>>1181947
Am I the only one afraid yet curious to download this link?
>>1195435
Maybe, I didn't try it.
>>1193682
The search always seems to have a 50% down rate whenever I take a gander at it.
>>1198603
Oh well, what can you do.
Why do archives only accept bitcoin and shitcoins and never regular old money for donations?
>>1153954 >>1153670
Reddit is far superior to this.
>>1167081>old archive of 8chKill Your Self
>>1170868 >>1170870
archive.org
its basically just a snapshot of pool
>>1157550 >>1159963 >>1175354 >>1177167 >>1183054 >>1184441 >>1191379 >>1188946
Use this bruh
https://gist.github.com/4chenz/de3a3490aff19fd72e4fdd9b7dafc8f4
>>1202929
Because A. Security Reasons B. Security Reasons and C. Security Reasons
Really wish that they would support paypal or some digital bank transfer app.
>But you can use paypal or some third party to convert it into crypto
MY ASS
>>1206416
No really, fuck /gif/. It's been a complete cesspit for years now and any worthwhile archives are long gone.
>>1153106
I'm leaving this site forever and this seems to be a good Thread to post my last and final post on this site.
I will remember this site fondly as it shaped my humor and behavior as a young adult but now it is time I move on. This site is holding me back from my true potential.
Bye 4Chan.
>>1184441 >>1188946
https://archived.moe/talk/thread/1479/
If you have $150 to burn every month you can donate to this admin or make an archive yourself.
>>1206485
Unironically happy for you if true
Anonymous Sat 05 Dec 2020 00:24:47 No.982410 Bump
>>1153631
Can use gpt-4chan to create a customer service chat bot
>>1211271
ROFL!
>>1211271
Speaking of which, can anyone find Chat-4Chan anywhere? Can find neither a site nor a torrent.
>>1215162
Here https://github.com/oobabooga/text-generation-webui#gpt-4chan
>>1215348
>https://github.com/oobabooga/text-generation-webui#gpt-4chan
Many thanks!
4plebs turned purple for me.
>>1206485
See you later.
>>1206931 >>1217424
Yeah sorry guys, it only lasted like 40 days. I've been back for 2 days now. I just can't stop coming here. I unironically had withdrawal symptoms whilst I was not here.
>>1217434
Remember - you are here forever. But jokes aside, just try to post and lurk less, so it won't matter if you come here from time to time.
Get something else to do.
>>1153106
Bump
https://desuarchive.org/co/thread/136778492/#q136778492
>>1217434
Pick a curated number of threads and just stay in those until they expire? Be extra-choosy in adding to that set so you can hit them all quickly and move on to your next activity?
These archival dumps are very useful
>>1153106
Does anyone have any 2013-2014 /lgbt/ archives? According to https://wiki.archiveteam.org/index.php/4chan WorldAthleticProject and Foolzashit were monitoring it
>>1231338
This would be the time of year for that.
Perplexity dot ai doesn't know either, though.
>>1231338
https://chan.k47.cz/
>>1233710
Thank you anon, you are the best.
>>1153615
Yesterday I started to make some of my python objects capable of loading the html files from the pol tar.gz. Can't extract all the data the 4chan api gives, but if there's something in the html files, it can be extracted.
I don't know how much more I can tweak this or how well it will perform, but it's probably more useful to extract the data into the "standard" API's JSON format, because while html is highly compressible, the majority of the data is just redundant html. I expect my extracted data to be significantly smaller than these, even uncompressed.
So my question is: if I test my code on more boards (there can be some differences, like pol has flags and IDs), will you guys with better bandwidth extract the data and share it in this new format?
Also I wouldn't pack a whole board into a single tar.gz, monthly batches at most.
And I won't share data because I can't afford the bandwidth, but I can upload the python code somewhere (although I'm gonna have to somewhat document it) and others could convert and share the archives of the board of their interest...
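A rough Python sketch of the "monthly batches" idea, not the poster's code. It assumes the posts have already been extracted into dicts that use the 4chan API field names `no` (post number) and `time` (Unix timestamp); the HTML parsing itself is left out because the markup in the dump isn't described here. The function names and the `board_YYYY-MM.jsonl.gz` naming are made up for illustration.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def month_key(post: dict) -> str:
    """Return 'YYYY-MM' (UTC) for a post's Unix `time` field."""
    stamp = datetime.fromtimestamp(post["time"], tz=timezone.utc)
    return stamp.strftime("%Y-%m")


def write_monthly_batches(posts, out_dir: Path, board: str) -> dict[str, int]:
    """Write one gzip-compressed JSON Lines file per month.

    Returns a {month: post count} mapping. All posts are held in memory,
    so for a big board feed it one chunk of threads at a time.
    """
    batches = defaultdict(list)
    for post in posts:
        batches[month_key(post)].append(post)

    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for month, items in sorted(batches.items()):
        items.sort(key=lambda p: p["no"])  # stable order by post number
        path = out_dir / f"{board}_{month}.jsonl.gz"
        with gzip.open(path, "wt", encoding="utf-8") as fout:
            for post in items:
                fout.write(json.dumps(post, ensure_ascii=False) + "\n")
        counts[month] = len(items)
    return counts
```

One JSON object per line keeps each batch streamable, so anyone converting a board can read it post by post without loading the whole month.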
>>1217434
Oh, and stay away from the boards where things expire in 24 hours or less. Too time consuming.
>>1153631>4chan is a collection of rambling idiotsim so sick of this stupid fucking meme, yes there are people that trolled but ive seen a just as many serious and insightful posts on here as i have seen shitposts, though recently the shitposts have gone up and you know why? you know WHY? its cause of faggots like you perpetuating this mindset that 4chan is the funny maymay site ONLY good for shitposts and that there cant ever be any serious discussion of any sort whatsoever because this is 4chan so all discussions MUST be maymay shitposty garbage, you spread this shit to all the stupid fucking redditors who then flood the site and turn it into nothing but pure shitposting, shitposting wich isnt even GOOD because its all regurgitated REDDIT SHIT
Regurgitated bump from Reddit from right before its management broke it.
>>1240329you're a newfag my man, I've been here since 2007 and the majority of 4chan boards with a few exceptions were always just different flavors of /s4s/. it has changed a little because people grew up but shitposting has always been prevalent here and the majority of "reddit shit" you mention is stuff that originated here but stopped being cool and considered reddit by retards trying to fit in
What are good 4chan archive sites? Any for /gif/?
>>1217434
If you have withdrawal symptoms that proves you're addicted and you need to break the cycle. Keep trying anon.
>>1206413
off you go then sister
Would be nice if archive moe enabled search for some boards. At least for dates/time periods that other archives don't cover.
/lgbt/ archive
>>>/lgbt/32197766
>>1153106
I put the data into Parquet files, and got even better results than the uber-compressed version (27% vs 48% of original compressed size).
Plus, they're quickly queryable using DuckDB.
>>1205376
By that cuck killing Pepe, he set him free. Pepe belongs to 4chan now.
>>1206414Shut up fag
>>1167081
reseed pls
>>1234051 >>1233710
Is there an API for this? I don't want to scrape it, butttt
>>1189947
seconding this 10 months later... for meido
>>1153106
how does one get these
>>1267046
Check if this works https://archive.org/details/yuki.la-compressed
>>1252664
cute
>>1153631
Should probably feed it to an anime girl A.I.
It's the only logical conclusion to this site.
>>1272032
>>1167081
it stays stuck on searching for metadata
Happy holidays and happy archiving.
I wonder if anyone will study the history of 4chan via these archives....
Making lots of progress on this frontend project. Help is appreciated
https://github.com/sky-cake/ayase-quart/tree/main/preview
New 4plebs yearly dump is being uploaded at https://archive.org/details/@fourplebs
Torrents are available
How do I get images pre 2012 from /jp/? Any image on warosu before mid july of 2012 is just the thumbnail and it sucks.
>>1288250
Most likely no.
>>1206415
that wouldn't be too bad but I can't get the fucker to stop talking about the 2016 election
it also has trouble dealing with post numbers and will just spit out strings of numbers after a while
>>1290498
I'm just hoping that someone has a collection of images from that time that they could upload. I know there is no organized archive for what I'm looking for.
Is there a 2014-2015 archive of /v/? I think fireden is still down.
>>1286506
/trv/ seems to be missing from https://archive.org/details/4plebs-org-thumbnail-dump-2024-01. I'm not sure if it's intentional or not.
Like tears in the rain.
>>1291205
don't know if you are still here, but there might be something worth looking at in here
https://archive.org/details/2013_10_25_4chandata
only downside is that the download speed is slow as shit
Is FoolFuuka still the way to go?
>>1285051
Well, video essayists cite them all the time
>>1298617
This is true (also bump)
>>1153106
what's the best way to search through the archive?
Just spinning up a local foolfuuka instance?
>>1217434
I stopped lurking /v/ for 4-6 months back in 2013
Now im highly capable of not browsing there 24/7, very easy to avoid cancerous threads.
>>1291297
Seconding
There oughta be someone with a spare /v/ archive, at least during the days of DP and foolz
>>1295780
>slow
Archive.org downloads are actually quite fast if you use the torrent option (even for rare files)
>>1298409
I would say no, there are lots of modernized tools now.
Here are the scrapers I have tried
https://github.com/bibanon/neofuuka-scraper (works good, just have to set up the db schema yourself, see https://github.com/sky-cake/ayase-quart?tab=readme-ov-file#neofuuka for help with that)
https://github.com/bibanon/neofuuka-scraper/pull/4 (with thread filtering)
https://github.com/sky-cake/Ritual (has the most flexible thread filtering)
https://github.com/bbepis/Hayden/tree/master/Hayden (works good, takes care of schema creation)
Here are the frontends I have tried
https://github.com/sky-cake/ayase-quart (works good, still being developed to add new features)
https://github.com/bibanon/ayase (bare bones, vulnerable to sql injection, only use yourself)
>>1291297
Fireden used to have a Tor link that had the archive, but it has stopped working. I do have an archive of some random template images from some random thread.
>>1206485
bye and good luck!
>>1314498
thank you
What tools do you use to import dumps? From 4plebs or other archives.
Did anyone ever save the /new/ archive from 2010? I remember seeing it for a while afterwards but all traces are gone now
>>1316364
Tell me more.
yuki.la
ded
>>1319623
>>1312854
thanks for this anon, that's actually really helpful!
>>1324252
Hey, glad I can help!
>>1312854
Thanks
>>1331193
'comb
>>1319648
What did anon mean by this?
>>1206413
Sadly true in a lot of cases.
>>1206413
Reddit is gay.