/t/ - Torrents

File: fuck_humans.webm (814 KB, 848x576)
Torrents of full images from /gif/ - Adult GIF, released monthly. >>>/gif/ is a NSFW 4chan board. Info:
- Contents: porn, random videos, LiveLeak-esque videos, and other videos which were too interesting for weak video sharing sites like YouTube.
- Stats so far: more than half a million files, totals to more than one terabyte.
- Why? Among other reasons, no one else is archiving /gif/, so I decided to.

Previous: >>1231730
>>
== History ==

/gap/ began in 2022-10. Archive files from 2022-10 to 2024-03 only have full image files and not threads (plain text). Starting in this thread, including 2024-05, each release going forward should have threads.

== How does it work? ==

Every 24 hours, this happens:
1. Full image links obtained 24 hours ago get downloaded.
2. All /gif/ threads get downloaded in API/JSON format.
3. Full image links get extracted from the JSONs. Goto 1.
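A rough sketch of that cycle in Python, to show the ordering only. This is not the actual script (that one is Bash); `daily_cycle` and its three callables are made-up names for illustration.

```python
from typing import Callable, Iterable, List


def daily_cycle(
    pending_links: Iterable[str],
    download: Callable[[str], None],
    fetch_threads: Callable[[], List[dict]],
    extract_links: Callable[[dict], List[str]],
) -> List[str]:
    """Run one 24-hour cycle; return the links to download in the next cycle."""
    # Step 1: download the full-image links that were collected one cycle ago.
    for link in pending_links:
        download(link)

    # Step 2: grab every current thread as parsed API JSON.
    threads = fetch_threads()

    # Step 3: collect the links from those threads; they wait until the next cycle.
    next_links: List[str] = []
    for thread in threads:
        next_links.extend(extract_links(thread))
    return next_links
```

Call it once a day and feed the return value back in as `pending_links` the next time.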
>>
*more than half a million video files (gif, webm)
>>
bump, not sure if u did but it would be good to publish your exact setup and workflow for this just in case someone else might want to use it for another board or do it for gif themselves or continue doing it one day if you disappear
>>
>trying to find archive of two specific threads, grab torrents
>it's a zip instead of folders i can pick through properly but whatever
>open one today
>it's a flatfile dump
>no index file, no metadata, filenames are presumably md5
>go to archivedmoe to find the thread to pull the hashes out
>the archive for that is 404 for some reason
I don't want to bitch too much because this is all still better than nothing, but please, I beg, an SQLite file, OR simply putting the thread number in the name.
>>
>>1311696
>flatfile dump
Is that what that's called? In the past I called a similar thing a "simple many-file folder" (which contains zero folders).

>simply putting the thread number in the [file]name.
Sounds like a bad idea. One reason: same files in multiple threads. Could have thread number of only the first thread it showed up in, but still.

More data stuff - unrelated to your post... I run an IPFS node which is consistently online on one computer. I run another IPFS node which is inconsistently/temporarily online on another computer. I have a CID which is only in the temp-online one. After recursively providing that CID to the DHT in the temp one (for hours, probably still doing it) I saw that its storage went from 666 GB total to 669 GB. Conclusion: recursively providing a DAG/CID which you don't have to the DHT seems to make you download it (if running as read-write, which you are almost certainly doing).
>>
>>1310152
is 2024_05 available yet?
>>
>>1310152
Fuckin piece of shit, I found a fault that shows that 4chan /gif/ 2024-06 wasn't being downloaded for days. I think I fixed it now. I could check on logs and stuff going forward to see that it's working correctly.
>>
>>1311996 = day that I fixed that problem, has worked fine since then.
>>
>>1310152
What are you using for this? GChan?
>>
>>1310152
Where is the magnet link?
>>
>>1314207
See the previous thread. I didn't share 2024-06 and 2024-05 yet.

>>1313666
No
>>
>>1310155
Post source code plz
>>
>>1314635
>See the previous thread. I didn't share 2024-06 and 2024-05 yet.
But the previous thread is 404 already
>>
Fuck cloudflare captcha
>>
>>1310152
>>1310815
>>1314645
Setup: GNU/Linux computer, set .sh files to executable by running "chmod +x file.sh". I use simple/crappy code to download this stuff. My code does not enable /gif/ users to do remote code execution because it parses JSON in such a way that it deals with privileged bytes that are part of the JSON structure and not the contents (same thing with older versions of the code that I used which parsed HTML instead).

Folder: /path/4changif
Folder: /path/4changif/test
Folder: /path/4changif/threads
- Make sure you have those 3 folders created (replace "/path/4changif" with whatever you have).

File: cron.sh
- ipfs://bafkreidzrvqgtjebkj7uqzibqjz6jri3ei3tvsjscf5i2lebejr5s3wgga / https://web.archive.org/web/20240628134551/https://sabrig1480.xyz/FSrXIjIxT1z4ndG9dx018rGq6MGD3cF6BUPNTXs2lu4
- Checks if cron0.sh is running, if not run cron0.sh. Specify the full path to "cron0.sh".
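For illustration, that "is it already running?" check could look something like this in Python. The real cron.sh is a shell script and may do it differently; `is_running`, `running_command_lines`, `start_if_idle`, and the `ps -eo args=` call are assumptions of this sketch, not taken from it.

```python
import subprocess
from typing import Iterable, List


def is_running(script_name: str, command_lines: Iterable[str]) -> bool:
    """Return True if any command line has script_name as one of its arguments."""
    for line in command_lines:
        # Compare on the basename of each argument so "/path/4changif/cron0.sh"
        # matches "cron0.sh" but "cron0.sh.bak" does not.
        if any(arg.rsplit("/", 1)[-1] == script_name for arg in line.split()):
            return True
    return False


def running_command_lines() -> List[str]:
    """List the command lines of all processes (assumes a Linux-style `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "args="], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def start_if_idle(script_path: str) -> bool:
    """Start the script by its full path unless a copy is already running."""
    name = script_path.rsplit("/", 1)[-1]
    if is_running(name, running_command_lines()):
        return False
    subprocess.Popen([script_path])
    return True
```

The full path matters because cron jobs run with a minimal environment, which is presumably why the post says to specify it.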

File: cron0.sh
- ipfs://bafkreiclpya533snasw6crtg4f6dgdoikkuudsgnmvz4uw4s53f572ugvy / https://aralper.xyz/ciw3Aigohm54YTN-EwQO0C3xPOckjS3BjjYCipRjqf0
- Main downloader. Change "basepath1='/path/4changif'" to whatever folder you have set up to download /gif/. When first running it, manually run lines 11 through 37 to kick things off: join lines 11-37 into one command, then run that command. After doing that, the downloads all happen automatically.
- In depth on each line.
1=Bash.
2=basepath1 variable.
3=HDD history.
6=if statement 1, 86400-second wait between links obtained and downloading the files of those same links, runs if above that number.
9=runs commands to download files, logs it.
10=clears commands to run.
12=Does stuff, gets thread OP numbers from https://a.4cdn.org/gif/catalog.json .
14=threadcount variable which is a number of all OPs.
16=while loop 1, to go over all the OPs.
18=selects a specific OP (var ii).
20=debug output.
22=Downloads a thread https://a.4cdn.org/gif/thread/$ii.json > $basepath1/threads/$ii.json.$now
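A hedged Python sketch of what lines 12 through 22 are described as doing: pull the OP numbers out of the catalog, then build each thread's API URL. The function names are invented for this example, and the actual cron0.sh does this in Bash.

```python
import json
from typing import List
from urllib.request import urlopen

CATALOG_URL = "https://a.4cdn.org/gif/catalog.json"


def op_numbers(catalog: list) -> List[int]:
    """Return the OP post number of every thread listed in a parsed catalog."""
    # The catalog is a list of pages; each page has a "threads" list whose
    # entries carry the OP's post number in "no".
    return [thread["no"] for page in catalog for thread in page.get("threads", [])]


def fetch_catalog(url: str = CATALOG_URL) -> list:
    """Download and parse the catalog (network access required)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def thread_url(op: int) -> str:
    """API URL of one thread, matching the pattern described for line 22."""
    return f"https://a.4cdn.org/gif/thread/{op}.json"
```

`len(op_numbers(catalog))` plays the role of the threadcount variable, and looping over the list corresponds to while loop 1.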

1/?
>>
>>1314886
24=imgcount variable which is a number of all images in a thread ii - calculated as a count of JSON parts
>jq ".posts[].ext, .posts[].tim, .posts[].md5" | grep -v "^null$"
divided by 3.
26=filename variable - array of POSIX time filenames from the middle $imgcount lines of those JSON parts.
28=ext variable - array of file extensions from the top $imgcount lines of those JSON parts.
30=md5 variable - array of Base64(MD5) strings from the bottom $imgcount lines of those JSON parts (formatted to "standard" URL-safe strings).
32=while loop 2: saves commands to download each image into a text file (downloads as "TZ=UTC wget -nc https://i.4cdn.org/gif/${filename[$n]}${ext[$n]}" -> "$basepath1/test/${md5[$n]}" - cmds in $basepath1/torun.txt); end while loop 2.
34-39=iterate while loop 1, end while loop 1, end if statement 1.
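The same extraction can be sketched in Python without the top/middle/bottom line counting, since each post's fields can be read together. This is an illustration and not the script itself: `url_safe_md5` and `download_jobs` are hypothetical names, and keeping the "=" padding on the URL-safe string is a guess (the original may strip it).

```python
import base64
from typing import List, Tuple


def url_safe_md5(md5_b64: str) -> str:
    """Turn the API's standard Base64 MD5 into the URL-safe alphabet.

    Assumption: padding ("=") is kept; the original script may strip it.
    """
    return base64.urlsafe_b64encode(base64.b64decode(md5_b64)).decode("ascii")


def download_jobs(thread: dict, basepath: str = "/path/4changif") -> List[Tuple[str, str]]:
    """Return (source URL, destination path) for every file in a parsed thread."""
    jobs = []
    for post in thread.get("posts", []):
        # Posts without an attachment have no "tim"/"ext"/"md5" keys; jq prints
        # those as null, which the grep in the original pipeline drops.
        if "tim" not in post:
            continue
        url = f"https://i.4cdn.org/gif/{post['tim']}{post['ext']}"
        dest = f"{basepath}/test/{url_safe_md5(post['md5'])}"
        jobs.append((url, dest))
    return jobs
```

`len(download_jobs(thread))` corresponds to imgcount. Naming files by hash means the same file posted in several threads is stored once.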

Ignore this "In depth on each line" section if you just want to use the script and don't care how exactly the code works. You can also replace each case of "/gif/" with "/mlp/" if you want to download another board. This skips downloading janny-deleted files, which is good and bad. Bad if it was some harmless video that got deleted because the poster was too based and got his post deleted due to politics. There's no HTTP archive of 4chan /gif/ files, so there's nothing to fall back on to check if it was a harmless file. In the /mlp/ example, there is. I don't have a thing to specifically record "found deleted", but if you use this script on other boards you can look at cronlog2.txt for 404'd files, then check those against what's saved in desuarchive.org, for example. And since I brought it up, here's a one-terabyte torrent that an anon (not me) downloaded from desuarchive /mlp/ and other captures of /mlp/:
>magnet:?xt=urn:btih:9671fb0855c7931fe98f03f7612c18010fb10121&dn=4chan-mlp&tr=udp%3a%2f%2fopen.stealth.si%3a80%2fannounce&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a6969%2fannounce

2/?
>>
>>1314887
Run "crontab -e" (NOT as sudo) and put this in there:
>0 * * * * /path/4changif/cron.sh
crontab runs hourly, cron*.sh runs daily. I guess I could simplify it to not be hourly->daily and just have crontab run it daily, but what I use works to only download it daily, so whatever.
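The hourly-trigger, daily-run idea can be sketched in Python like this. It assumes a stamp file holding the epoch time of the last run; whether time.txt is used that way in the real script is a guess, and `should_run` / `hourly_tick` are made-up names. Unlike the real setup, which needs a manual first run, this sketch simply runs when no stamp exists.

```python
import time
from pathlib import Path
from typing import Optional

WAIT_SECONDS = 86400  # the 24-hour wait mentioned in the line-6 description


def should_run(last_run: Optional[float], now: float, wait: int = WAIT_SECONDS) -> bool:
    """True when more than `wait` seconds have passed since the last run."""
    if last_run is None:  # never ran before (assumption: run immediately)
        return True
    return now - last_run > wait


def hourly_tick(stamp_file: Path, now: Optional[float] = None) -> bool:
    """Called every hour; returns True (and updates the stamp) only about once a day."""
    now = time.time() if now is None else now
    last_run = float(stamp_file.read_text()) if stamp_file.exists() else None
    if not should_run(last_run, now):
        return False
    stamp_file.write_text(str(now))
    return True
```

Because the check is "more than 86400 seconds" and cron fires on the hour, the effective gap can be 25 hours rather than 24, depending on how long a run takes.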

File: 404.txt
File: addext.sh
File: howto.txt
File: 4chan_gif_2024_03_empty.txt
- see the latest torrent, magnet:?xt=urn:btih:84b2a6b0865a26bac9b7deef0ba63f893d6931c4&dn=4chan_gif_2024_03.zip

File: cronlog2.txt
File: cronlog1.txt
- automatically created, see the latest (4chan_gif_2024_03) for one of those

File: time.txt
File: chkcmd.txt
File: torun.txt
- automatically created

3/3 for now.
>>
>>1310152
>>1314207
>>1314635
Hey OP not to be mean or anything but why did you create this thread, just post the fucking magnet link, the previous thread is long gone from the archive
>>
== Links to 4chan /gif/ 2022-10 to 2024-03 files? ==

Those are all in thread #1231730:
- with working CSS/JS (WARC in a parent folder): https://bafybeifn7bxeg34zc725kkjzfuxpbf2ftb5lgjpdh5r7hrdrrpf3zpat2m.ipfs2.eth.limo/raws/boards.4chan.org/t/thread/1231730.html
- as a text file: https://gateway.pinata.cloud/ipfs/bafybeicqxg64e6u3ws3ietrlxao7nxjwuxkbc542gh5xw53quazkgttpbu - includes .gz version at https://shadow39.online/OdSsPIeOso2q8kY6lvckhJuSzB4YGI9hdOqum0AncVQ
- folder bafybeic...tpbu also includes a text file which only contains the magnet links posted in that thread, with 4chan /gif/ ones at the top: https://utkububa.xyz/JsoPMGJhzMrtMCDblJjbXH8XFoa_9vfqzuKGNBfwNSw = https://gateway.pinata.cloud/ipfs/bafkreig3hnbw65gikxi2j62q7bgpsn2npx7dkxmfyzut5emvtumzzpwdze

Anchor: >>1310152
Replied to: >>1314207 >>1314778 >>1315187
>>
>>1315342
yeah I'm not clicking that shit glowie bot
>>
>>1315353
Wow, you are dumb. Here's that same text file:
https://files.catbox.moe/3nd5c2.txt
>text file which only contains the magnet links posted in that thread, with 4chan /gif/ ones at the top

which is also here:
https://ipfs.hypha.coop/ipfs/bafybeia7nmoydnj2d4gymp6rlpdpusozcip7x5znpax7gfprpfgx3wiaii/kill_all_retards.txt

Now go back to watching CNN.
>>
>>1314896
Thank you. Not trying to be annoying but, putting this somewhere like a GitHub equivalent would be cool (we should probs stop using GitHub at this point because u are training an LLM with every commit)
>>
>>1310152
Great initiative anon.
Looking forward to 05 and 06. Got all the other ones already. Im archiving some threads myself, mostly wsg and pol stuff.
>>
Done: stuff such as segmenting 06 into its own folder, created folders for 07, updated howto.txt

Todo: info on 05 and whatever
>>
>>1314207
>1231730
>>1314635
https://archiveofsins.com/t/thread/1231730/#1311161
>>
>>1316804
Guess I will work on 05 "soon". It's here:
/zc/z9/4chan.org/gif/
/zc/vid_4mb/

>4mb
/wsg/ seems to have raised their max file size limit to like 6MB:
https://boards.4chan.org/wsg/thread/5612597/pol-politically-incorrect
>>>/wsg/5618193 - can upload a video where upstream size=6MB, or is that derived size?
Maybe /gif/ raised that limit too.
>>
>>1317998
4chan_gif_2024_05 is like 10 GB for certain reasons, and I'm just gonna have to accept that for now. In order to get the rest of it, I need like 100 USD so I myself can do a HDD repair, or 1000 USD to get a "professional" HDD repair. I wish this was a data ransom situation because that would mean I have all of the data for that month and am pretending not to have it. I don't have a big part of it, so less motivation to release 2024_05, which I felt like releasing before releasing subsequent months. It's been 2 or 3 months and I haven't got on 05, now at the acceptance part of the stages of grieving, so I guess I will be more likely to get on it soon. Trump catching an octopus AI video from the following (can't attach extensionless WebM file then post it to 4chan):
>4chan_gif_2023_05 , https://gateway.ipfs.cybernode.ai/ipfs/bafybeigkplwidoyprmm7vyb2qlna7o2sgq26ydrg55nrqhf4xfcga3gxsu

FUCK 4CHAN'S KEKFLARE CAPTCHA!!!!
FIX THIS SHIT MOOT!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>1320059
Oh look, a new wordfilter. I hate this website even more now. It's C U C K -> KEK.


