i am sharing my collection of cartoon frogs. these were all collected from 4chan. all files are unique but there are many duplicate frogs. i put the images into zip files because there are so many. torrent is 15 GB in total.
magnet:?xt=urn:btih:7a165aec0c1917489773d44c05e44e0e978a41d1
>>990243 what... for real bro?
the only post that matters on this entire board
15 gigs is worth it im coming pepe
what the fuck?
this is a thread that's really worth it
bump the fuck out of this
Have you gotten rid of the duplicates?
>>990243 crashing pepe market with no survivors
>>990329 i have attempted to cluster the images to remove duplicates. it works alright but my code is too inefficient and my computer doesnt have enough resources to run it on the entire dataset. i made a website that lets you explore the "deduplicated" dataset and view similar pepes. it only has 30,000 images. https://bbwroller.com/frens
>>990243 also i fucked up. it doesnt actually have 130,000 images. the torrent only has 100,000. i didnt include the other 30,000 because the probability they were actually pepes was lower
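on the "too inefficient" part: comparing every image against every other image grows quadratically, which is probably what kills it at 100,000 files. a rough sketch of one way around that, assuming each image has already been reduced to a 64-bit perceptual hash stored as an integer. this is not OP's code; the file names, hash values and the 3-bit threshold are made up for illustration. the trick is to split each hash into 4 bands and only fully compare images that share at least one identical band.

```python
from collections import defaultdict
from itertools import combinations

BANDS = 4        # split each 64-bit hash into 4 bands
BAND_BITS = 16   # 16 bits per band
MAX_DIST = 3     # assumed threshold; must stay below BANDS for the trick to hold


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def candidate_pairs(hashes):
    """Pairs of names whose hashes share at least one identical band.

    Two hashes that differ in at most 3 bits can have differences in at
    most 3 of the 4 bands, so at least one band matches exactly. Anything
    that shares no band cannot be within the threshold and is skipped.
    """
    buckets = defaultdict(list)
    mask = (1 << BAND_BITS) - 1
    for name, h in hashes.items():
        for band in range(BANDS):
            key = (band, (h >> (band * BAND_BITS)) & mask)
            buckets[key].append(name)
    pairs = set()
    for names in buckets.values():
        for a, b in combinations(sorted(names), 2):
            pairs.add((a, b))
    return pairs


def near_duplicates(hashes):
    """Candidate pairs that really are within MAX_DIST bits of each other."""
    return sorted(
        (a, b)
        for a, b in candidate_pairs(hashes)
        if hamming(hashes[a], hashes[b]) <= MAX_DIST
    )


if __name__ == "__main__":
    demo = {
        "a.png": 0xFFFF0000FFFF0000,
        "b.png": 0xFFFF0000FFFF0001,  # one bit away from a.png
        "c.png": 0x0123456789ABCDEF,  # unrelated
    }
    print(near_duplicates(demo))
```

the worst case is still quadratic if thousands of images land in the same bucket, but unrelated images are never compared at all.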
no wojaks?
>>990243 absolutely based. I am not a frog poster but I have become one starting from today
This should be stickied
>>990389
>>990455 no u
>>990456 Leddit is around the corner.
>>990458
>Leddit is around the corner.
>>990459 Leddit is around the corner.
Stupid thread.
>>990243 are you from the past? do you have that kitten who wants a cheezburger too?
>>990477Then go back to jerking off to your trap/sissy megapacks, you tranny faggot. Better yet, kys.
>>990243 Based. I'm seeding this now
seed baby seeeeed!!!!!!!!!!!!!
if you got rid of identical duplicates i would dl and sneed.
Thanks. Can we have a wojak collection too?
Now That's What I Call Autism vol. 5
sneeded+seeding this gem
thanks OP for making this available, it's a nice large dataset to experiment on. the problem is the absolutely massive amount of dupes and how to get rid of them. checksums are out, you need to use a tool for comparing images. first off, and what i should have done at the start, delete all pics under 126 pixels width. apparently people save fucking thumbnails instead of full pics and it's no loss to jettison those. you can do this easily in XnView, Search, and specify the dimensions. Next, Get VisiPics or AntiDupl.Net to use to try and automatically find similar images. This is where it gets tough because they dont have a simple way to say "out of this group of similar pics, just take the best one". And unless you are an autistic NEET, do not attempt to manually go through and select the best from the 4000 groups i found in the 100.zip alone. it will take days. currently looking for better methods to dedupe this thing.
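the thumbnail step is easy to script if you dont want to click through XnView. a minimal sketch, assuming PNG files only: it reads the width straight out of the PNG header with the standard library. the 126 cutoff is from the post above; `fake_png_header` is a made-up helper that exists only so the example runs without real files.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_WIDTH = 126  # cutoff taken from the post above


def png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG, else None."""
    if len(header) < 24:
        return None
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    # IHDR stores width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def is_thumbnail(header, min_width=MIN_WIDTH):
    """True if the data is a PNG narrower than min_width pixels."""
    size = png_size(header)
    return size is not None and size[0] < min_width


def fake_png_header(width, height):
    """Hypothetical helper: just enough of a PNG to exercise the parser."""
    return (
        PNG_SIGNATURE
        + struct.pack(">I", 13)          # IHDR chunk length
        + b"IHDR"
        + struct.pack(">II", width, height)
    )


if __name__ == "__main__":
    print(is_thumbnail(fake_png_header(125, 125)))
    print(is_thumbnail(fake_png_header(600, 400)))
```

for real files you would pass in `open(path, "rb").read(24)`. jpegs and gifs need their own header parsing, or an imaging library such as Pillow, where `Image.open(path).size` gives the dimensions.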
>>990785 the other thing i should mention is you have to sort of come up with some rules for specifying a dupe in something this diverse. for example, it's easy to say "just take the file with the larger dimensions" but what about when you have two identical pics and one has background transparency and the other doesnt? and what about when you have a pic that is maybe 5% different because some dude put his obscure website favicon logo on pepe's shirt? do you want to keep that one? you can't hardcode that rule because a lot of good, similar pics are slight changes in facial expression, which you want to keep. what about when they are identical dimensions but one is 400K larger than the other? are you somehow losing some valuable data? these are just some of the decisions that i'm struggling to find a way to automate.
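the "which one do i keep" rules can at least be written down as a sort key so they are explicit and easy to change. a sketch under loud assumptions: the order here (bigger dimensions, then transparency, then bigger file) is just one possible answer to the questions above, and the `Candidate` fields are invented for the example. it does nothing for the "5% different" case, which is a question of how groups are formed, not of which member wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    width: int
    height: int
    has_alpha: bool   # True if the file has background transparency
    size_bytes: int


def rank_key(c):
    """Assumed priority: pixel area, then transparency, then file size."""
    return (c.width * c.height, c.has_alpha, c.size_bytes)


def pick_best(group):
    """Return the candidate that ranks highest under rank_key."""
    return max(group, key=rank_key)


if __name__ == "__main__":
    group = [
        Candidate("small.jpg", 250, 250, False, 40_000),
        Candidate("flat.png", 800, 800, False, 500_000),
        Candidate("alpha.png", 800, 800, True, 450_000),
    ]
    print(pick_best(group).name)
```

swapping the tuple order in `rank_key` changes the policy without touching anything else, which makes it cheap to try a rule on one zip and eyeball the results.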
I'll be seeding for awhile
>>990683 >>990455
>>990243 Fucking bump
>>990243 Bump this shit, based as fuck
dead meme
>>990378 you could post the code and let someone with a better computer run it.
>>990243 anon, a thread died for this... and fuck that thread. good job
>>991036 based
>>990243 Thanks, Based One
Here OP add this to your collection
we did it niggers
Okay so heres the plan:
>Somehow delete duplicates with worse quality
>Run every image through https://github.com/fhanau/Efficient-Compression-Tool with -strip -9 flags
>zip
>20% less size and a better dataset overall
>???
>profit
>>991122this is how you get in a position where you can't find duplicates, you retard, it's just 15gb anyway are you poor or something?
>>991132english nigger do you speak it?
>>990528 Now this is a based post
>>991122 you forgot step 0, which would be to remove all thumbnail sized images from the set first. and you are just fluffing over the remove dupes step which is the hardest part.
before anyone tries to modify this massive archive you should ask yourself what you plan to do with this. at first i thought "oh, it'd be cool to have a folder where i could pick a pic for any feel when i post" but the fact is there are so many images with no descriptive filename, you'd spend 15 minutes looking and not even browse 5% of them. it's not good for that. it's not even good for creating a folder of go-to images because, again, you have to browse so many unsorted pics. this seems well suited for training an ai model. i'm gonna start reading up on the criteria for that because i donno what else to do with this, honestly.
there's one interesting part to this archive that may make you NOT want to remove dupes. that is, it may be true that the most popular pics have the most variations (in size, dimension, etc.), in which case maybe you don't want to remove dupes, so when you feed it to a model generator it's biased toward the most popular images. i donno. i dont know shit about ai yet. VisiPics is great for generating a crude histogram of images with the most variations. you better believe there is a base pepe with literally every college/pro sports team hat on it in this archive.
if you guys want the same thing but soijak, check out this archived thread for links: >>>/g/79476879
also, just grab this 2gig zip of jaks: https://mega.nz/#F!tOhxAYjI!Y9nFdFHI_2wlCryV__4-wQ
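the "crude histogram of images with the most variations" idea is simple to reproduce once some dedupe tool has assigned every file to a group. a sketch, assuming you already have a mapping from file name to group id (the names and ids below are invented):

```python
from collections import Counter


def variation_counts(group_of):
    """group_of maps file name -> group id.

    Returns (group id, number of files) pairs, largest group first.
    """
    return Counter(group_of.values()).most_common()


if __name__ == "__main__":
    group_of = {
        "smug_01.png": "smug",
        "smug_02.jpg": "smug",
        "smug_hat.png": "smug",
        "sad_01.png": "sad",
        "sad_02.png": "sad",
        "rare_01.png": "rare",
    }
    for group, count in variation_counts(group_of):
        print(group, count)
```

the same counts could serve as a rough popularity signal if you decide to keep the dupes for training, though whether that helps a model is exactly the open question in the post above.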
>>991201
>only 2gig
>>991202 >>990389 >>990683 >>991201 sorry so smol. how do i delete link?
>>991207 just leave it, but on the internet there are way more variations of wojak than of Pepe, thats why I was surprised the wojak archive is only 2gb while the Pepe one is around 10gb (if you delete most duplicates)
>>991212 i was joking. if you want to try and do what op did for soijak, in that thread i posted they suggest scraping basedjak dot party. maybe you'll get a lot more. then again, i'd like to see him just scrape 4chan in the same way for it.
>>991201
>you better believe there is a base pepe with literally every college/pro sports team hat on it in this archive
https://bbwroller.com/frens/search/0afd4ddba7970487d8de848a7ea1ebfd1584908662bc2378e66e31ee97b4a014
https://bbwroller.com/frens/search/19619e62cf88cec4a08b6727570279fa0bd5eef8afae901141e6bceb40890c0a
https://bbwroller.com/frens/search/d8e36989e2dd52b284333074267829782d8009e208af21b1e893f363fecb2465
>>990243 omg i gonna cry, this is amazing. how close do you think you are to a complete collection?
Bump
You guys must be just as addicted to coming here as I am, I don't post enough to want so many pepes though, here's a pepe I created through paint though, OP, enjoy.
this is better than porn
not a frog poster, but still high quality post OP. thanks
>>990243 crashed qbittorrent
Is anyone seeding right now?
>>991498 it took a long time for the metadata to fully load on this in qbittorrent for me. i also tried adding a large list of trackers to it, but i donno if this is even on a tracker so that might not help
>>991580 i am, but it appears like i've been the only seed for a couple days. i'm not OP and i'm not gonna do it forever. people dont have an excuse to not seed this thing, it's not like it's copyrighted material or anything. r-r-right?
>>991237 this is incredible. what software do you use to determine similarity?
Beautiful work bro
>>991580 I'll be seeding until 11am EST tomorrow
>>991655
>what software do you use to determine similarity?
https://github.com/JohannesBuchner/imagehash
>>991580 I am seeding on a vps with 200 MB connection. Try these trackers:
https://ngosang.github.io/trackerslist/trackers_best.txt
>>990528 This.
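for anyone wondering what a perceptual hash actually does: the simplest kind, an average hash, shrinks the image to a tiny grayscale grid and records one bit per pixel, set if that pixel is brighter than the mean. similar images then give hashes that differ in only a few bits. a pure-python sketch of the idea on an already-shrunk grid; this is an illustration, not the imagehash source, and which hash and settings the site uses is not stated in the thread.

```python
def average_hash(pixels):
    """pixels: rows of grayscale values (0-255), e.g. an 8x8 downscaled image.

    Returns an integer with one bit per pixel: 1 if brighter than the mean.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; small means the images look alike."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = [[0, 255], [255, 0]]       # tiny 2x2 stand-in for an image
    recompressed = [[10, 250], [240, 5]]  # same picture with jpeg-style noise
    different = [[255, 0], [0, 255]]
    print(hamming(average_hash(original), average_hash(recompressed)))
    print(hamming(average_hash(original), average_hash(different)))
```

with the library itself, `imagehash.average_hash(Image.open(path))` returns a hash object, and subtracting two of them gives the bit difference directly; it also offers `phash`, `dhash` and `whash` variants.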
now this is awesome!!
>>990243 if this is real im going to shed a tear.
>>990378 Does this mean out of 130k only 30k are unique?
>>990243 HOLY FUCKING BASED!! THANKS SO MUCH FREN!!
what a waste of time.
>>991995 yeah complaining about pepe image dumps in the torrent section of a basket weaving forum is much more productive
>>992046 based
>>990243 Wouldn't that make Limited Edition Pepe less valuable?
>>990243 Fucking based, thanks op
took like 7 hours to download but hek i got it now thanks friend
thanks but I'll wait for something better
great torrent. i'll seed this one for a while
>>992434MILHOUSE IS NOT A MEME YOU NEWFAG FUCK
any 3d ones like this
>>990243 this is solid gold, props to you OP
>>990243 This is the level of autism that makes this site worthwhile. Thank you friend
>>993054 This thread needs to live
>>991273 I just like that fucking frog.
Does it have the Tarot pepe?
bump
>>990243 Thanks OP, very cool!
Based
>>990243 hahaha awesome. i have been saving Pepes and Trump images since 2015. i must have 1000s by now lol.
Anon, you don't really expect me to download 15 gb of cartoon frogs, do you? Because I will. Thank you.
There's a few thousand extra pepes available at the-eye too. https://the-eye.eu/public/Images/Pepe/
>>990243 op here. i've got something in the works to remove thumbnails. i'll try to tackle duplicates later
>>994592 Yeah, please. I downloaded this, extracted the 001 one and when I went through it, I just wasted my time looking at low quality ones, and in the end I figured my pepe collection is better and deleted it all... now I kinda regret it coz I could have filtered by *.gif or sorted by size and kept some quality ones. if you could remove the shitty ones it would be an amazing torrent bro. Duplicates, well, that would be great, but my biggest complaint was the low quality pixelated thumbs in there. But thank you for your effort, dont wanna come off as a dick, you are doing kek's work here after all.
>>991273 This is history man, you need to think long term. In a decade or two, these will be worth millions.
someone post the removed duplicate version of op's post
>>990243thanks for the frens, faggit.
>>990243 Am I there? T. 178cm man
fuck, torrent dead
>>994765 na just very slow, fren. currently downloading and i'll keep seeding asap
>>994765 I'm gonna wait on the fixed one with the dupes/thumbnails removed, personally
>>990243 stupid frogposters
Seeding
>>990243 Oh boi
Shit, quality is shit kinda
>>994828 im dl'ing now. have an exam tomorrow, then the dedupe is probably gonna take a day or 2
>>990243 Thanks!
a
online viewer: https://bbwroller.com/frens
>>995260 pretty cool
>>990458 'no u' started on this site
>>990243 And here I was feeling bad over having collected almost 1100 frogs over the years.
now something for the other anons trying to remove the dupes: should edits like these be considered dupes? quite a few images are jpegs of jpegs of jpegs, so the similarity between these 2 is greater than between the original and some of its jpegs. do 4channelers really cant into saving a fucking png
going deeper into the rabbithole, would these be considered dupes? there are 48 fucking variations of this file
>there's 84 more variants of the same file but mirrored
what the fuck are 4channelers doing
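mirrored copies slip past a plain perceptual hash because flipping the image flips the bit pattern too. one way to catch them, sketched here on a tiny grayscale grid with a stand-in average hash (an assumption, any image hash would do): hash both the image and its horizontal mirror, then keep the smaller value so both orientations get the same key.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the mean of the whole grid."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def mirror(pixels):
    """Flip the grid left to right."""
    return [list(reversed(row)) for row in pixels]


def mirror_invariant_hash(pixels):
    """Same key for an image and its horizontal mirror."""
    return min(average_hash(pixels), average_hash(mirror(pixels)))


if __name__ == "__main__":
    facing_left = [[0, 255], [0, 255]]
    facing_right = mirror(facing_left)
    print(average_hash(facing_left) == average_hash(facing_right))
    print(mirror_invariant_hash(facing_left) == mirror_invariant_hash(facing_right))
```

whether a mirrored pepe should count as a dupe at all is still a judgment call; this only makes it possible to group them.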
put them through duplicate image software. found 15k duplicates. gj anon
Nigger.gif
>>995396
>should edits like these be considered dupes?
well obviously, are you not human?
>>995523 no they shouldnt. that is a unique pepe. the black squiggle is representative of dust or hair on your monitor, and the smug pepe is to make you mad after you realize it is part of the picture. it is a modern masterpiece.
thank you OP, now I have a frog for every occasion