Hello /t/With the changing face of the internet and censorship as a whole I would like to personally consider siterips for archival, I do not know how to do this personally and would like your helpQuestion 1: is there any software that can rip a site currently without any captcha bullshit in a functional form to place on a HDD, for example archiving gelbooru with search function and tags intact so it can work entirely as an backed up offline collection?Question 2: Is there any dedicated resource or forum for the collection of torrents that contain siterips?In the meantime, post any siterips you have in this thread, if anyone has gelbooru or danbooru siteripped I would be grateful, thank you>Wicked.com siteripmagnet:?xt=urn:btih:e188a075fbdbeddb803afb4a4aa5ea5f81486363&dn=Wicked.com%E7%B3%BB%E5%88%97.Siterip&tr=udp://tracker.openbittorrent.com:80&tr=udp://tracker.opentrackr.org:1337/announce
I respect your craft anon for you have posted a siterip and not just requested. Let me share with you what I know
1) porno siterips - lots on here, otherwise use bt4rg and just search for "siterip" and you get a lot, mostly prons
2) Archive Team.
Definitely check out (2). Archive Team is a community of autists that rips many sites (non-pron) and uploads to archive.org. Most of their stuff is in WARC. So you have off the top of my head, pastebin is on there, a few others. You may also want to check out "the common crawl" which is fuel for LLM/AI stuff, but it is a siterip of the whole internet (a webcrawl). There are releases per each year and it's recommended if you want to do it right, you download every single release and compact them into one file, bit of a pain. You can use bt4rg to find whatever, for example I have stack exchange, some twitter, all of reddit (pre API change), etc. But really, again, look up Archive Team, those guys really go hard and have a massive amount of data, insane. It's clunkified and buried inside archive.org so you need to learn up on using that site and some python download tools, but if you go down that rabbit hole you'll find tons and tons and tons of wild shit.
>>1301739
Looking into the archive team it seems very complete/useful, so thank you anon that is basically a /thread
Unfortunately it does not look like I can use their "warrior" to archive sites of my own choosing and I'd rather not assume the risks associated with joining a group, I appreciate them for what they've done though
now that I think about it neither has any of the boorus listed as archived, the boorus are important to me and I'd like to back them up, what kind of software could I use to do it myself?>Masterclass siteripmagnet:?xt=urn:btih:425b660fabe162263d7ef8b43c076e03e9f3b27c&dn=Masterclass.com%20SITERIP%201080p%20WEB-DL%20H.264%20AAC2.0&tr=error%20code%3A%20525
>SexAndSubmission.com Full SiteRip 540p [WPz]magnet:?xt=urn:btih:aeb2661bcfed7b06dee8d0fb144d31d5aa2a56fe&dn=SexAndSubmission.com%20Full%20SiteRip%20540p%20%5BWPz%5D&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce&tr=http%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce>X-Art mkv Site Ripmagnet:?xt=urn:btih:7aa8b4e80c053feb53de872b79624ea90aefeadb&dn=X-Art%20mkv%20Site%20Rip&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce&tr=http%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce>Powershotz_SiteRipmagnet:?xt=urn:btih:92d41afc0ae58b83374ca5506ebc4a74b91c88ba&dn=Powershotz_SiteRip&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce&tr=http%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce>Insex Siterip 2001-2003 (1000 Videos&60000 Photos)magnet:?xt=urn:btih:2d7a0d1233682605182902790bedceddc1e964ac&dn=Insex%20Siterip%202001-2003%20%281000%20Videos%2660000%20Photos%29&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce&tr=http%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce>teenkasia.com Teen Kasia all videos siterip 2012-12-11 corrected aspect ratiomagnet:?xt=urn:btih:61641d43bfb580f633bdfe54a8b719f1b6253cc3&dn=teenkasia.com%20Teen%20Kasia%20all%20videos%20siterip%202012-12-11%20corrected%20aspect%20ratio&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce&tr=http%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce
>Hentaied SiteRipmagnet:?xt=urn:btih:646ad9480fa75a2d8fb19e9d59a6c4157b3af3ed&dn=Hentaied%20SiteRip&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce&tr=http%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce>ShibariStudy.com.SiteRip.012024.MP4.AAC.FullHD.Internal-CyberCrimemagnet:?xt=urn:btih:4924d87cd3abca968295eff38078667332393cfb&dn=ShibariStudy.com.SiteRip.012024.MP4.AAC.FullHD.Internal-CyberCrime&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce&tr=http%3A%2F%2Fbittorrent-tracker.e-n-c-r-y-p-t.net%3A1337%2Fannounce
>>1302912>>1302913requesting for sexycandidgirls dot com pls anoni'd be happy with just the shorts section
i'm seeding these for few weeks more. [metart.com] Photosets - 1999 to 2023bWFnbmV0Oj94dD11cm46YnRpaDo1ZmQyN2Y0MDc2YzQ0ZGEwM2I5NDI3OTQ5MzVmNWQ4M2E0MTUyODQ5JmRuPSU1Qm1ldGFydC5jb20lNUQlMjBQaG90b3NldHMlMjAtJTIwMTk5OSUyMHRvJTIwMjAyMyZ0cj11ZHAlM0ElMkYlMkZvcGVuLnN0ZWFsdGguc2klM0E4MCUyRmFubm91bmNl[nubiles.net] Photosets - 2004 to 2023-10bWFnbmV0Oj94dD11cm46YnRpaDoxMDYxMDQ5MmI4M2Q1ZmI5MGUyZGVkMTYxMTViZDk2MmIxMjJjZDBiJmRuPSU1Qm51YmlsZXMubmV0JTVEJTIwUGhvdG9zZXRzJTIwLSUyMDIwMDQlMjB0byUyMDIwMjMtMTAmdHI9dWRwJTNBJTJGJTJGb3Blbi5zdGVhbHRoLnNpJTNBODAlMkZhbm5vdW5jZQ==[amourangels.com] Photosets 2006 - 2023-10bWFnbmV0Oj94dD11cm46YnRpaDo2YjQ0ZDQ2YmU5ZWFlOWJjNjFjMDM4ZGQ4YjU5NzRmYjgwNWUzNjBhJmRuPSU1QmFtb3VyYW5nZWxzLmNvbSU1RCUyMFBob3Rvc2V0cyUyMDIwMDYlMjAtJTIwMjAyMy0xMCUyMCUyOHB1YmxpYyUyOSZ0cj11ZHAlM0ElMkYlMkZvcGVuLnN0ZWFsdGguc2klM0E4MCUyRmFubm91bmNlCg==[ftvgirls.com] Photosets - 2002 to 2023-06bWFnbmV0Oj94dD11cm46YnRpaDpkNWUxN2YyODBiNmMyMGY5YmFjNDQwZWViYWIzM2ZmZDkxNGRlYjVjJmRuPSU1QmZ0dmdpcmxzLmNvbSU1RCUyMFBob3Rvc2V0cyUyMC0lMjAyMDAyJTIwdG8lMjAyMDIzLTA2JTIwJTI4cHVibGljJTI5JnRyPXVkcCUzQSUyRiUyRm9wZW4uc3RlYWx0aC5zaSUzQTgwJTJGYW5ub3VuY2U=[showybeauty.com] Photosets 2011 - 2023-10bWFnbmV0Oj94dD11cm46YnRpaDphMzNmYTk3ODE5OTEyYWQ0OGIyOWNiNjFlZTIyNDA4YzJiMGRmMzcxJmRuPSU1QnNob3d5YmVhdXR5LmNvbSU1RCUyMFBob3Rvc2V0cyUyMDIwMTElMjAtJTIwMjAyMy0xMCUyMCUyOHB1YmxpYyUyOSZ0cj11ZHAlM0ElMkYlMkZvcGVuLnN0ZWFsdGguc2klM0E4MCUyRmFubm91bmNl[femjoy.com] Photosets - 2004 to 2023-06bWFnbmV0Oj94dD11cm46YnRpaDowYTY4N2E2MDNlMDNhMDc2NTM2NWRkN2ZmYmIyNmU5NGUyNDU3OTk4JmRuPSU1QmZlbWpveS5jb20lNUQlMjBQaG90b3NldHMlMjAtJTIwMjAwNCUyMHRvJTIwMjAyMy0wNiUyMCUyOHB1YmxpYyUyOSZ0cj11ZHAlM0ElMkYlMkZvcGVuLnN0ZWFsdGguc2klM0E4MCUyRmFubm91bmNl[hegre.com] Photosets - 2002 to 2023-07bWFnbmV0Oj94dD11cm46YnRpaDpmYzk2YTA4YjcyNjUyZDAyZmY1NmEzMDIxOWE1MzNlNmIzNDk5ZTlhJmRuPSU1QmhlZ3JlLmNvbSU1RCUyMFBob3Rvc2V0cyUyMC0lMjAyMDAyJTIwdG8lMjAyMDIzLTA3JnRyPXVkcCUzQSUyRiUyRm9wZW4uc3RlYWx0aC5zaSUzQTgwJTJGYW5ub3VuY2U=
>>1303373[mplstudios.com] Photosets - 2003 to 2023-07bWFnbmV0Oj94dD11cm46YnRpaDpkZmFlZDFhODI4MzBlNDM3MzIwZGFjMmJhOTFlOTYyZmI5NDQyOWU4JmRuPSU1Qm1wbHN0dWRpb3MuY29tJTVEJTIwUGhvdG9zZXRzJTIwLSUyMDIwMDMlMjB0byUyMDIwMjMtMDcmdHI9dWRwJTNBJTJGJTJGb3Blbi5zdGVhbHRoLnNpJTNBODAlMkZhbm5vdW5jZQ==
can someone share magnet link for fuckedhard18?
All of this is boring ass shit
>>1301479
HTTrack can perform an offline rip IIRC
>>1301479
>software
i share your concern anon and while i'm not a pro at this i've shared some siterips and large megapacks myself and based off of my experience you'll need to learn at least some basic programming and the basics on how modern webpages work, a good place to start would be something like scrapy https://scrapy.org/ it's relatively noob friendly and easy to use plus you can probably get some tutorials on getting started and once you get some experience you'll be able to get around logins, dynamic content loading and other bullshit like that
for people interested in generic web archival rather than siterips i'd recommend checking this https://github.com/iipc/awesome-web-archiving , here you can find a lot of web archiving tools and tutorials, these tools don't require a lot of previous knowledge but they are not nearly as powerful as something like scrapy
>dedicated siterip forum
i'm not aware of anything like that, the closest you can get are private torrent trackers, other than that there are a lot of siterips here on /t albeit on different threads
Does anyone have a magnet for D18 video?
>>1301479
I've only used this for wikis which are fairly open, but wget can recursively grab most/everything from a site. I used `wget -w1 -crpnp URL` to get everything from a few video game sites.
With the cracking down on game ROMs and abandonware in general, is there a working archive of myabandonware? It's not a perfect collection but impressive nonetheless and it would be a shame if it got lost to DMCA nonsense.
Looking for a site rip of uralesbian and fellatiojapan I just did a site rip of aozora bunko (japanese book website). If people are interested I can seed it. For OPs curiosity, I wrote the python script and scraped the site myself.
>>1301479the wicked siterip is missing all the Brown Sugars, that's the one thing I wanted.
>>1301479https://github.com/nid666/GamersriseupArchive
requesting asian appleseed and alike please share !
>>1301739
>the common crawl
It appears to be in text form purely for AI sake, it's not particularly useful to me right now, but maybe in the future as a low priority task
>>1305370
I agree, I did not make this thread for porn
>>1305641
>HTTrack
This looks like exactly what I need, thank you anon
>>1305733
I do not recommend sharing publicly currently, loose lips sink ships and I imagine that in a few years the powers that desire a reset of the internet will seek to destroy backups people keep personally as well, I recommend legitimately burying copies, for example getting an ammo can, turning it into a faraday cage with some flex seal etc, filling it with 100gb mdisks or a HDD, and burying it 10 feet+ underground in a place you will be able to easily find, but others will not, things are getting scary for anyone who cares about freedom
>Awesome web archiving
Seems like an excellent source anon thank you
>>1305787
Wget requires the effort of me manually recreating the website once I have it downloaded, at the scale I am doing this it is inefficient and annoying
If I were to give back to the community, what then might be a more secure/anonymous way to do this? torrents are very easy to trace and at scale using a wifi extender becomes an issue because of bandwidth limitations, though I remain unlikely to do so
>>1301479
A thread potentially related to this topic popped up on /g/
>>>/g/100644419
This thread is not permitted to die
>>1301479anyone have a hentai site rip? im into a little of everything. could really go for a pick me up from datahoarding some
>>1307629
use `wget --mirror --page-requisites --adjust-extension --convert-links --wait=5 -e robots=off {url}` instead of HTTrack
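For anyone who wants to script that command, here is a minimal Python sketch that builds the same argument list with each switch annotated. The function name `build_mirror_command` and the `example.org` URL are made up for illustration; the flag descriptions paraphrase the wget manual as I understand it, so check `man wget` before relying on them.

```python
import shlex


def build_mirror_command(url: str, wait_seconds: int = 5) -> list[str]:
    """Return the argv list for the wget invocation quoted above.

    Hypothetical helper: it only assembles the arguments, it does not run wget.
    """
    return [
        "wget",
        "--mirror",                # recursive download with timestamping
        "--page-requisites",       # also fetch images, CSS and scripts pages need
        "--adjust-extension",      # save files with matching extensions (.html, .css)
        "--convert-links",         # rewrite links so the copy can be browsed offline
        f"--wait={wait_seconds}",  # pause between requests to go easy on the server
        "-e", "robots=off",        # do not apply robots.txt exclusions
        url,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(shlex.join(build_mirror_command("https://example.org/")))
```

Passing the list to `subprocess.run(...)` would execute it; printing first makes it easier to review the switches before hammering a site.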
>>1303373what the hell, more of this please or where i can find more of it.you deserve a monumental statue for your heroic deeds.
Anyone have a Yonitale siterip?
>>1307102inb4 vimm
>>1307629
>>1305370
I hate pornography. I clicked here because I too am interested in things like archiving stuff.
For instance, what's going to happen when long-standing Mod content dies, such as Sims Exchange? How will we save that?
>>1307629
> In a few years the powers that desire a reset of the internet will seek to destroy backups people keep personally as well, I recommend legitimately burying copies
Sure, the folks behind that Great Reset might be able to brick backups connected to computers, but cold storage is impenetrable unless the equipment fails or is zapped. You don't even need to bury it; a compact disc will last thirty years if stored properly. Tape can last twice that.
TLDR: Get a tape drive if you want to think that long ahead, and make more than one copy of backups.
>>1305370
Even worse it's boring ass shit you can find literally anywhere. These aren't even particularly obscure porn sites but mainstream as fuck so you can probably find a full siterip on google.
>>1317113
>How will we save that?
Looks like we won't honestly. I have a gigantic archive of lots of deleted FO3/NV/4 as well as Sims 3/4 mods and a few others but no one ever seems interested so I guess I'll just sit on this shit for my own use till the end of time. Yeah I have tried sharing numerous times before but there just is no interest it seems.
I found the skytorrent.in dump from 2018-02-22 on archive org. It is about 500Gb big and only has the torrent hashes as names for the torrent files. I set about making this dump usable for the lolz. I used bencode to extract what information I could from the files and created a SQLite file, then compressed the 37Gb file to about 4.4Gb with 7zip.The columns are Source_site,Date_Created,Torrent_Title,Size (in Mb),Comment,Torrent_Hash,Created_By,File_List.magnet:?xt=urn:btih:17ee9a7b1d189e37939ef60fd5484ab2ce560eb6&dn=skytorrents.in_dump+2018-02-22+export.7z&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.tracker.cl%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Fexodus.desync.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker1.bt.moack.co.kr%3A80%2Fannounce&tr=https%3A%2F%2Ftracker.tamersunion.org%3A443%2Fannounce
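A rough sketch of what "use bencode to extract what I could and put it in SQLite" can look like, using only the Python standard library. This is not the script that produced the dump above: the decoder is a hand-rolled minimal one, the table holds only a subset of the listed columns, and the names `bdecode`, `torrent_row`, and `build_index` are made up for illustration. It assumes the hash comes from the file name, as described in the post.

```python
import sqlite3


def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                       # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                       # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    colon = data.index(b":", pos)          # byte string: <length>:<bytes>
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def torrent_row(torrent_hash: str, raw: bytes) -> tuple:
    """Turn one .torrent file's bytes into a row for the index table."""
    meta, _ = bdecode(raw)
    info = meta[b"info"]
    title = info[b"name"].decode("utf-8", "replace")
    if b"files" in info:                   # multi-file torrent
        files = [(b"/".join(f[b"path"]).decode("utf-8", "replace"), f[b"length"])
                 for f in info[b"files"]]
    else:                                  # single-file torrent
        files = [(title, info[b"length"])]
    size_mb = sum(length for _, length in files) / (1024 * 1024)
    created_by = meta.get(b"created by", b"").decode("utf-8", "replace")
    file_list = "\n".join(name for name, _ in files)
    return (torrent_hash, title, size_mb, created_by, file_list)


def build_index(rows, db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the table if needed and insert the rows."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS torrents ("
        "Torrent_Hash TEXT PRIMARY KEY, Torrent_Title TEXT, "
        "Size_MB REAL, Created_By TEXT, File_List TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO torrents VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

Looping over the dump directory, reading each file's bytes, and passing the file name stem as `torrent_hash` would fill the table; a truncated or malformed file raises an exception in `bdecode`, so a real run would want a `try`/`except` around each file.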
>>1308312It's a shame but /g/ just really doesn't care about technology.
do anyone have tgirl-japan/shemale-japan siterips?
>>1307629
I don't get too worried about any impending great reset. Even if true, it's going to be several more years before mass amounts of romhack and magazine scan data are completely eviscerated.
The clearer-and-presenter danger is the temperament of site hosts and admins. Which is why I'm a huge proponent for siteripping. Even if the sites themselves burn, the data (both hosted and contextual) within is a library worth holding onto.
Reddit recently had a scare with this, when their executives announced incoming paywalls.
https://dataconomy.com/2024/08/08/reddit-subreddit-paywall/
Despite that, anybody who's ever tried to create a program or fix and mother knows just how valuable a tool reddit can be. If a site as monolithic as reddit can be wrecked by change and downpour, anything can. (Personally, I'm still prepping for the inevitable mass deletion of youtube content for sys expense reasons.)
It's worth keepin', worth givin' a damn about.
>for example archiving gelbooru with search function and tags intact
how the fuck do you expect this to work? You can't rip a server
>>1301479
the usual boorus you can download (with tags in a separate file for each image) with gallery-dl, but if you want to replicate the search you will have to host your own booru and find a way to import both image files and the associated tags.
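If hosting a full booru is overkill, a tiny local tag index gets you basic tag search over the downloaded files. The sketch below assumes a hypothetical layout where each image `123.jpg` has a sidecar `123.jpg.txt` holding whitespace-separated tags. That layout is an assumption for illustration, so check what your downloader actually writes. `index_tags` and `search` are made-up names.

```python
import sqlite3
from pathlib import Path


def index_tags(root: Path, conn: sqlite3.Connection) -> None:
    """Read every *.txt sidecar under root and store (file, tag) pairs."""
    conn.execute("CREATE TABLE IF NOT EXISTS tags (file TEXT, tag TEXT)")
    for sidecar in sorted(root.rglob("*.txt")):
        image = sidecar.with_suffix("")  # "123.jpg.txt" -> "123.jpg"
        tags = set(sidecar.read_text(encoding="utf-8").split())
        conn.executemany(
            "INSERT INTO tags VALUES (?, ?)",
            [(str(image), tag) for tag in sorted(tags)],
        )
    conn.commit()


def search(conn: sqlite3.Connection, *wanted: str) -> list[str]:
    """Return files that carry ALL of the wanted tags (needs at least one tag)."""
    unique = sorted(set(wanted))
    placeholders = ",".join("?" * len(unique))
    query = (
        f"SELECT file FROM tags WHERE tag IN ({placeholders}) "
        "GROUP BY file HAVING COUNT(DISTINCT tag) = ? ORDER BY file"
    )
    return [row[0] for row in conn.execute(query, (*unique, len(unique)))]
```

Usage would be along the lines of `conn = sqlite3.connect("tags.db")`, `index_tags(Path("rip"), conn)`, then `search(conn, "tag_a", "tag_b")`. It only does AND searches; negation and wildcards like a real booru offers would need more SQL.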
>>1303373I ripped metart in 2009 but my hard drive died so I lost some of it and some got corrupted.I see that some of the metadata changed (ages etc) and the images originally did not have watermarks but they do now.Would anybody like the incomplete rip without watermarks?Also these torrents are huge. If somebody is interested, I might repack the images with jxl to losslessly save space. The zip files would no longer be the original bytes, but it looks like the zip files all got renamed from the original name anyway.
>>1301479 >>1326934
I use Hydrus for managing my porn collection. Booru style tag based file manager that has a ton of downloaders built in including a bunch of booru downloaders. You can cook up your own downloaders with it and setup auto downloaders that check for/download from sites however often you set it to. It's pretty flexible for me but I'm not a super data hoarder yet (roughly 600gb in it rn)
>>1310254
>-e robots=off
does that mean you can just ignore the robots.txt?
Never done a siterip but looking forward to it
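As far as the wget manual describes it, yes: `robots=off` tells wget not to apply robots.txt exclusions during a recursive download. The file is a convention that polite crawlers follow, not an access control. To see what a crawler that does honor it would check, here is a small sketch using Python's standard `urllib.robotparser`. The sample rules and the `allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/page.html"):
        url = "https://example.org" + path
        print(path, allowed(SAMPLE_ROBOTS_TXT, "wget", url))
```

With the sample rules, a crawler honoring robots.txt would fetch `/index.html` and skip anything under `/private/`; with `robots=off` wget does not make that check at all, so the `--wait` delay is the only thing keeping the crawl gentle.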
This sicflics siterip is still looked after by manyWhy the hell did he create this perfectly organized rip and leave the 2 lucky guys at 87,4%FInd and seedmagnet:?xt=urn:btih:8eda277efe030acddb3608dd17486e5bb7d2982f&dn=SicFlics.Complete.SiteRip&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.tracker.cl%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Ftracker-udp.gbitt.info%3A80%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fexodus.desync.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.theoks.net%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.dump.cl%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.ccp.ovh%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.bittor.pw%3A1337%2Fannounce&tr=udp%3A%2F%2Ftamas3.ynh.fr%3A6969%2Fannounce&tr=udp%3A%2F%2Frun.publictracker.xyz%3A6969%2Fannounce&tr=udp%3A%2F%2Fretracker01-msk-virt.corbina.net%3A80%2Fannounce&tr=udp%3A%2F%2Fopentracker.io%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.dstud.io%3A6969%2Fannounce&tr=udp%3A%2F%2Fnew-line.net%3A6969%2Fannounce&tr=udp%3A%2F%2Fmoonburrow.club%3A6969%2Fannounce&tr=udp%3A%2F%2Fleet-tracker.moe%3A1337%2Fannounce&tr=https%3A%2F%2Ftracker.bt4g.com%3A443%2Fannounce
Work has started on getting Nhen ripped. The current goal is at least getting every torrent file from the site into a single folder
>>1301479bumper bumper
Any Zishy siterips out there
bump
>>1305787
>>1301479
Wget has some excellent switches if you're handy with a keyboard.
You can unironically come up with a great wget string and save it as an alias in your bash config and be all like `# Scrape (url)`
Considering internet archive has been down for the past few days, in relation to a lawsuit it is under; I'll bump with this;
https://wiki.archiveteam.org/index.php/Internet_Archive#Backing_up_the_Internet_Archive
https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#I'm_looking_at_the_leaderboard._What_do_the_different_counters_mean?
>>1303373can you do a siterip for famegirls too?
1. It's very rare, but some sites (e.g. Project Gutenberg) allow you to rsync all the data as-is on the server, before it gets mangled by the web server, PHP scripts etc. This is the best case scenario, and it is very easy and fast to keep that mirror in sync.
2. Some sites publish a full site dump, free or paid, full or incremental, at regular intervals or on demand. Best to ask the site admin.
3. Some sites offer APIs, which you could use for scraping. However, those are more likely to have rate limits, and might not expose all information. It really depends on the site: on some it's better to use the API, on some the HTML.
4. Last is HTML. With the right flags wget can scrape the whole site and embedded content from other sites, rewrite links to make the whole site readable locally, and keep relevant metadata to only update content that changed since the last run.
There are also WARC-based tools, but those afaik are mostly used for website snapshots and can't be easily kept up-to-date without scraping the whole site all over again.
Personally, i use a wget script + filesystem snapshots to keep history.
For more elaborate cases (e.g. a blog with links to an external file host) i use a python script with requests and beautifulsoup4.
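For the "blog with links to an external file host" case, the core step is pulling every link out of a page and sorting it into same-site and off-site. The sketch below does that with the standard library's `html.parser` as a stand-in for beautifulsoup4, and takes the HTML as a string instead of fetching it with requests. It is not the script mentioned above, and `LinkCollector`, `split_links`, and the example hosts are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute URLs from href and src attributes, in document order."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and name in ("href", "src"):
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def split_links(base_url: str, html: str) -> tuple[list[str], list[str]]:
    """Return (same-host links, other-host links) found in the HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlsplit(base_url).netloc
    internal = [u for u in collector.links if urlsplit(u).netloc == host]
    external = [u for u in collector.links if urlsplit(u).netloc != host]
    return internal, external
```

The same-host list is what you would keep crawling; the other-host list is the set of file-host links to download separately, which wget alone tends to miss unless you let it span hosts.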
I know most of you guys are here for porn, but any chance of siteripping brilliant.org? It seems to be a really cool educational site, but of course it's behind paywall.
>Juventa Club (Complete?)magnet:?xt=urn:btih:ebbedcc05095f419892ad7c2ab463b5c8c566bd0&dn=Juventa%20Club&tr=udp%3a%2f%2fopen.stealth.si%3a80%2fannounceIf I could get some help completing, I'll seed indefinitely
>>1303160>sexycandidgirlsAny other recommended sites like this>>1303373these dont work when I add magnent:xt=um: before them what do you need to make them work
>>1302912>SexAndSubmission.com Full SiteRip 540p [WPz]please seed this
>>1340518
>these dont work
it's base64
>>1340552it just dies on 36%