/g/ - Technology


Thread archived.
You cannot reply anymore.




File: scraper.png (1.62 MB, 1892x2142)
Web Scraping General

Reverse engineering edition pt 2

QOTD: Which is easier: parsing HTML, or reverse engineering private/undocumented APIs and scraping from those?
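
For anyone weighing the two, here is a rough side-by-side sketch using only the Python standard library. Everything in it is made up for illustration: the sample markup, the JSON payload, the class names `item` / `name` / `price`, and the helpers `scrape_html` and `scrape_json` are all hypothetical and not taken from any real site. It only shows the shape of each approach, not how a given target actually behaves.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample data: the same listing as rendered HTML and as the JSON
# that an undocumented endpoint might return. Neither comes from a real site.
SAMPLE_HTML = """
<ul id="items">
  <li class="item"><span class="name">Widget</span><span class="price">9.99</span></li>
  <li class="item"><span class="name">Gadget</span><span class="price">24.50</span></li>
</ul>
"""
SAMPLE_JSON = '{"items": [{"name": "Widget", "price": 9.99}, {"name": "Gadget", "price": 24.5}]}'


class ItemParser(HTMLParser):
    """Collects name/price pairs from <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._field = None  # which span we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "item":
            self.items.append({})      # start a new record for each list item
        elif tag == "span" and cls in ("name", "price"):
            self._field = cls          # remember which field the text belongs to

    def handle_data(self, data):
        if self._field and self.items:
            self.items[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_html(markup):
    """HTML route: walk the markup and convert the text into typed values."""
    parser = ItemParser()
    parser.feed(markup)
    return [{"name": i["name"], "price": float(i["price"])} for i in parser.items]


def scrape_json(payload):
    """API route: the payload is already structured, so just pick the fields."""
    return [{"name": i["name"], "price": i["price"]} for i in json.loads(payload)["items"]]
```

In this toy case both routes return the same records. The HTML route carries more code and breaks when the markup changes, while the JSON route is short but assumes you already found the endpoint and can reproduce whatever headers or tokens it expects.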

FAQ: rentry co/t6237g7x

> Captcha services
https://2captcha.com/
https://www.capsolver.com/
https://anti-captcha.com/

> Proxies
https://hproxy.com/ (no blacklist) (recommended, owned by friend of /wsg/)
https://infiniteproxies.com/ (no blacklist)
https://www.thunderproxies.com/
http://proxies.fo/ (not recommended)

> Network analysis
https://mitmproxy.org/
https://portswigger.net/burp

> Scraping tools
https://beautiful-soup-4.readthedocs.io/en/latest/
https://www.selenium.dev/documentation/
https://playwright.dev/docs/codegen
https://github.com/lwthiker/curl-impersonate
https://github.com/yifeikong/curl_cffi

Official Telegram: @scrapists
Last thread: >>101054257
>>
'mp
>>
'mp
>>
'mp
>>
It's over
>>
>>101091990
>>101093799
>>101095385
>>101095455
reddit fucking shitstain, you do not "bump" useless threads.
if no one wants to post it in it means no one wants your garbage on the board
fuck off with your fucking cancerous "general" garbage
>>
>>101095472
Seethe.
>>
Go back to /b/ ranjeet
>>
>>101095472
This guy probably gave his left testicle for access to a read only API



