>>102238559
Huggingface allows you to upload tars if that's how you want to do it. Most likely the worst that will happen is that they'll get a takedown notice and the dataset gets removed. You can dance around this by putting a research-only / non-commercial license on the data. This is what hlky is doing for these datasets, which do include the images.
Disclaimer: BIG data's activities are conducted for non-commercial research[1] purposes.
Datasets are licensed under `ODC-By` to keep the data open and available for research[2].
Note that per Section 2.3 this license does not apply to the individual Contents which may be covered by other rights.
[1] Research grants are welcome :)
[2] Contact a team member if you find the license to be incompatible with your Derivative or Collective Database, or require an exception to attribution requirements for Produced Works.
Example: https://huggingface.co/datasets/bigdata-pw/Artsy
I would generally avoid images from major copyright trolls like Disney or Warner Brothers.
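If you go the tar route, here's a rough sketch of how it could look, assuming you have the `huggingface_hub` Python package installed and are already logged in with a write token. The function names `pack_images` and `upload_tar` and any repo ID you pass in are made up for illustration, not something from hlky's setup.

```python
import tarfile
from pathlib import Path


def pack_images(image_dir: str, tar_path: str) -> int:
    """Pack every regular file under image_dir into an uncompressed tar.

    Returns the number of files added. Keep tar_path outside image_dir,
    otherwise the archive would try to include itself.
    """
    root = Path(image_dir)
    count = 0
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to image_dir so the tar has no local prefixes.
                tar.add(path, arcname=path.relative_to(root).as_posix())
                count += 1
    return count


def upload_tar(tar_path: str, repo_id: str) -> None:
    """Upload one tar to a dataset repo (repo_id is a placeholder you supply)."""
    # Imported lazily so packing works even without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the token from a prior login or the HF_TOKEN env var
    api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
    api.upload_file(
        path_or_fileobj=tar_path,
        path_in_repo=Path(tar_path).name,
        repo_id=repo_id,
        repo_type="dataset",
    )
```

Usage would be something like `pack_images("images", "images.tar")` followed by `upload_tar("images.tar", "your-username/your-dataset")`. The license itself is not set by this code; you'd put it in the dataset card.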