/g/ - Technology


File: 1756173704265014.jpg (198 KB, 570x644)
Web Scraping General

Revival edition

FAQ: https://rentry.org/scrapists

> Captcha services
https://2captcha.com/
https://www.capsolver.com/
https://anti-captcha.com/

> Proxies
https://hproxy.com/ (no blacklist) (recommended, owned by friend of /wsg/)
https://infiniteproxies.com/ (no blacklist)
https://www.thunderproxies.com/
http://proxies.fo/ (not recommended)

> Network analysis
https://mitmproxy.org/
https://portswigger.net/burp

> Scraping tools
https://beautiful-soup-4.readthedocs.io/en/latest/
https://www.selenium.dev/documentation/
https://playwright.dev/docs/codegen
https://github.com/lwthiker/curl-impersonate
https://github.com/yifeikong/curl_cffi
https://github.com/mikf/gallery-dl
https://github.com/yt-dlp/yt-dlp

> Cool projects by members of our community
doubledouble.top / lucida.to - Free music scraped from Spotify
kemono.cr - Kemono Party for Fanbox/Fantia/SubscribeStar
tv.weboasis.app - Falcon, an invite-only pirate streaming service that scrapes video streams from multiple sources
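For anyone new to the parsing side: the libraries above do far more, but a minimal sketch of the core idea using only Python's standard library (html.parser plus urllib.parse.urljoin, no Beautiful Soup) looks like this. It collects absolute URLs from <img src> and <a href> attributes. The LinkCollector class, the collect() helper, and the sample page are illustrative assumptions, not part of any tool listed here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <img src> and <a href> attributes (illustrative)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.images: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; value may be None.
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Resolve relative paths against the page URL.
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))


def collect(html: str, base_url: str) -> LinkCollector:
    """Feed one HTML document through the parser and return the filled collector."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    # Hypothetical page: one link, one image with a src, one image without.
    page = '<a href="/gallery/2">next</a><img src="thumbs/1.jpg"><img alt="no src">'
    result = collect(page, "https://example.com/gallery/1")
    print(result.images)  # ['https://example.com/gallery/thumbs/1.jpg']
    print(result.links)   # ['https://example.com/gallery/2']
```

Relative paths are resolved with urljoin, so "thumbs/1.jpg" on /gallery/1 becomes /gallery/thumbs/1.jpg. On real-world markup, Beautiful Soup's find_all is more forgiving of broken HTML.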
>>
File: 1751300457066844.jpg (528 KB, 677x1009)
how do you save the original images on j18/jlist/hmarket?

https://desuarchive.org/g/thread/107515771/#q107516090_2
>>
File: 1745806598200382.png (1.73 MB, 1326x1424)
>>
>>107516932
Is it even still possible to scrape what you want today, now that most sites "protect against bots" with Cloudflare's stupid shit or similar "checking your browser" pages?
>>
thank you for posting this
>t. longtime scrapmaker first time reader
>>
>>107521215
works for me
>>
>>107521307
but unironically
>>
File: 1736806947977185.jpg (307 KB, 1060x736)
>>107516932
You should unironically be on Awoo to discuss scraping.
>>
>>107521215
It has more to do with the IP you're using than anything else
>>
File: biz'.jpg (192 KB, 768x768)
Is it possible to use a scraper for stock market data?
Every single fucking API either shuts down its free tier or gimps it beyond reason so you have to buy premium.
>>
what exactly do you scrape? anime tiddies?
>>
>>107516932
https://x.com/fireplacegg/status/1996265758867992684

Not gonna lie, this is why I want to scale up my scraping.
>>
>>107516969
hey anon check desuarchive
>>
>>107521869
this post is an ad
>>
Oh, is this how the /aicg/ crowd learned to scrape? Finally.
>>
>>107516969
you again...
didn't someone solve your problem in the other thread?
also, are you the poor guy or the one offering 1 XMR?
>>
File: eatpromise.gif (218 KB, 640x360)
>>107521668
bump for this
>>
>>107521215
No, but this schizo will keep spamming his thread for decades, pretending it's still possible. Scraping hasn't been possible for at least 5 years.
>>
>>107525968
Do you seriously think shitty Cloudflare Turnstile and Anubis stopped us from scraping, or even made it noticeably harder?

The only bump in the road is captchas, but it's easier for us to solve them with human solver farms or AI than it is for you to post on an unusable imageboard.
>>
>>107525968
>laugh in residential proxies
>>
>>107526298
Greetings fellow chad scraper
>laugh in curl_cffi
>>
>>107524781
yeah
https://desuarchive.org/g/thread/107515771/#q107516090_14
>>
>>107526438
why did the janny delete the old thread?
>>
>>107526452
idk some people r being schizo i guess
>>
>>107526452
they hate actual tech.
>>
>>107526438
I get this error on some doujins.

https://pastebin.com/dDEQ9Rpn
>>
>>107523646
No it's NOT
>>
>>107516932
Dumb OP killed the original Discord because he wanted to act like a script kiddie.
>>
File: 9256.jpg (104 KB, 999x723)
>>107525968
That's the feeling I'm getting. If I can't even visit a website without it "checking my browser", what chance does a scraper have?

Even if there's a bot that drives an open Firefox instance, simulates mouse movement and scrolling, and randomly delays clicks by 1-5 seconds, how long will that last?
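The randomized 1-5 second delay from that post is easy to sketch in Python. jittered_pause is a hypothetical helper, not from any library; the random generator and the sleep function are injectable so it can be tested without actually waiting. Randomized delays like this also keep the request rate down; on their own they say nothing about whether a browser check will pass.

```python
import random
import time
from typing import Callable, Optional


def jittered_pause(
    low: float = 1.0,
    high: float = 5.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a uniformly random number of seconds in [low, high] and return it.

    Hypothetical helper: pass a seeded random.Random for reproducible delays,
    or a fake sleep function in tests.
    """
    if not 0 <= low <= high:
        raise ValueError("need 0 <= low <= high")
    rng = rng or random.Random()
    delay = rng.uniform(low, high)  # uniform draw between low and high
    sleep(delay)
    return delay


if __name__ == "__main__":
    # Hypothetical loop: pause between successive actions (clicks, page loads, requests).
    for step in range(3):
        waited = jittered_pause(0.1, 0.3)
        print(f"step {step}: waited {waited:.2f}s")
```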
>>
>>107526939
no one likes Discord.
>>
>>107527500
Yeah, because this thread has had such fascinating, valuable discussion so far. Idiot.
>>
File: A FAVOURITE WALLPAPER.jpg (327 KB, 1920x1080)
>>107516932
can you even not innawoods without a mobile phone number nowadays?
>>
>>107526775
Just read the error, anon: it means name[id] doesn't exist. So either the site uses a different id for your doujin, or there's additional logic for this category that your code isn't handling.
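A minimal sketch of the defensive lookup that reply describes, assuming the pasted error is a Python KeyError from indexing a dict like name[id] (the pastebin contents aren't reproduced here, so that is a guess). lookup_name and the sample data are hypothetical: it tries the id, then any alternate ids, and returns None instead of raising, so the caller can decide how to handle the odd category.

```python
from typing import Any, Iterable, Mapping, Optional


def lookup_name(
    names: Mapping[str, Any],
    item_id: str,
    fallback_ids: Iterable[str] = (),
) -> Optional[Any]:
    """Return names[item_id], else the value for the first fallback id present, else None.

    Unlike names[item_id], this never raises KeyError; a None result tells the
    caller the entry is missing (skip it, log it, or handle the category differently).
    """
    for key in (item_id, *fallback_ids):
        if key in names:
            return names[key]
    return None


if __name__ == "__main__":
    # Hypothetical parsed metadata: the item is listed under an alternate id.
    names = {"12345-alt": "Example Title"}
    print(lookup_name(names, "12345"))                 # None
    print(lookup_name(names, "12345", ["12345-alt"]))  # Example Title
    if lookup_name(names, "99999") is None:
        print("id 99999 not found; this category may need different handling")
```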


