/g/ - /wsg/ - Web Scraping General - Technology

File: scraper.png (1.62 MB, 1892x2142)

/wsg/ - Web Scraping General Anonymous 04/24/24(Wed)19:34:51 No.100166675 Archived

Web Scraping General

Whitehat cuck edition continued

QOTD: What are some good sources for scraping AI training data from?

> Captcha services
https://2captcha.com/
https://www.capsolver.com/
https://anti-captcha.com/

> Proxies
https://infiniteproxies.com/ (no blacklist)
https://www.thunderproxies.com/
http://proxies.fo/

> Network analysis
https://mitmproxy.org/
https://portswigger.net/burp

> Scraping tools
https://beautiful-soup-4.readthedocs.io/en/latest/
https://www.selenium.dev/documentation/
https://playwright.dev/docs/codegen
https://github.com/lwthiker/curl-impersonate
https://github.com/yifeikong/curl_cffi

Official Discord: discord.gg/9EKk3psXMr
Last thread: >>100150524
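
For anyone new to the tools listed above, here is a rough sketch of the most basic scraping step: pulling links out of a page. It uses only the standard library's `html.parser` instead of Beautiful Soup so it runs with no installs. The names `LinkExtractor`, `extract_links` and `SAMPLE_HTML` are made up for this example, and the HTML is a hardcoded string, not a real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only anchors carry the links we care about here.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative paths against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hardcoded sample page so the sketch needs no network access.
SAMPLE_HTML = """
<html><body>
  <a href="/thread/1">first</a>
  <a href="https://example.com/about">about</a>
  <a name="no-href">skip me</a>
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML, "https://example.com/"):
        print(link)
```

Beautiful Soup does the same job with less code and tolerates broken markup better; the standard-library version is only here to keep the example dependency-free.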

Anonymous 04/24/24(Wed)21:27:15 No.100167783

bump

Anonymous 04/24/24(Wed)21:34:21 No.100167855

sage

Anonymous 04/24/24(Wed)21:37:52 No.100167891

>>100167855
Poster had to show his driver's license and a DNA and semen sample and pay $100/mo just to gain access to a read-only API he could have just scraped (even though that would have gone against the website's TOS).

Anonymous 04/24/24(Wed)21:39:13 No.100167898

At the end of the day yt-dlp is really the solution to pretty much everything

Anonymous 04/24/24(Wed)21:42:18 No.100167925

>>100141630
Aren't there like 10B possible phone numbers?
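
Quick sanity check on that number, as a sketch. The 10B figure is just every possible 10-digit string. Assuming the question is about North American (NANP) numbers, which the post does not say, the NXX-NXX-XXXX shape (N is 2-9, X is 0-9) cuts it down. Reserved codes like N11 are ignored here, so the real count is somewhat lower still.

```python
# Every possible 10-digit string, with no formatting rules applied.
RAW_TEN_DIGIT = 10 ** 10


def nanp_shaped_count() -> int:
    """Count numbers matching NXX-NXX-XXXX (assumed NANP shape)."""
    n_choices = 8   # N: digits 2-9
    x_choices = 10  # X: digits 0-9
    area_codes = n_choices * x_choices * x_choices   # NXX
    exchanges = n_choices * x_choices * x_choices    # NXX
    line_numbers = x_choices ** 4                    # XXXX
    return area_codes * exchanges * line_numbers


if __name__ == "__main__":
    print(f"raw 10-digit space: {RAW_TEN_DIGIT:,}")
    print(f"NANP-shaped subset: {nanp_shaped_count():,}")
```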

>>100143919
> Indirectly by training ML models on data
On this, what are some good sources for pulling data for training ML models?

>>100150865
Join cybercrime TG groups and look for people spreading drainer links, they should know about Twitter scraping

Anonymous 04/24/24(Wed)21:52:31 No.100168026

File: file.png (8 KB, 444x87)

>>100167898
Was waiting for them to fix comments not downloading before I started scraping channels again, but the ZFS pool I was going to use to store the videos fuckin died.
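
For reference, a sketch of what a comments-only yt-dlp run looks like. The flags are real yt-dlp options (`--write-comments` puts the comments in the info JSON, `--skip-download` skips the video itself). The helper name `build_ytdlp_args`, the `archive` directory and the placeholder URL are made up for this example, and the code only builds the command, it does not run it.

```python
import shlex


def build_ytdlp_args(url: str, out_dir: str = "archive") -> list[str]:
    """Build a yt-dlp command that saves metadata and comments only."""
    return [
        "yt-dlp",
        "--skip-download",     # do not fetch the video file
        "--write-info-json",   # write metadata to <id>.info.json
        "--write-comments",    # include comments in that info JSON
        "-o", f"{out_dir}/%(id)s.%(ext)s",  # yt-dlp output template
        url,
    ]


if __name__ == "__main__":
    # Placeholder URL; swap in a real video or channel URL.
    cmd = build_ytdlp_args("https://www.youtube.com/watch?v=VIDEO_ID")
    print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd, check=True)` to actually execute it, assuming yt-dlp is installed and on PATH.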

Anonymous 04/24/24(Wed)21:53:34 No.100168034

>>100168026
youtube sucks ass, who cares about video comments

Anonymous 04/24/24(Wed)21:55:56 No.100168056

File: *#($.jpg (101 KB, 1024x683)

where's the euro greek anon that runs the discord with a data scraping channel
show yourself

Anonymous 04/24/24(Wed)22:29:45 No.100168345

>>100167925
>On this, what are some good sources for pulling data for training ML models?
HuggingFace, Kaggle, Roboflow, or I scrape it myself, which is way more rewarding since the best data is always gatekept

Anonymous 04/24/24(Wed)23:37:21 No.100168908

bump

Anonymous 04/25/24(Thu)01:30:08 No.100169898

Does anyone here know a castle bypass or am I gonna have to pay some jeet in the sneaker botting coms?

Anonymous 04/25/24(Thu)03:17:21 No.100170754

>>100168034
Imagine scraping comments and using it to train a YouTube comment bot

Anonymous 04/25/24(Thu)05:24:46 No.100171663

bump

Anonymous 04/25/24(Thu)05:39:43 No.100171769

Having an issue with Selenium IDE (the web browser extension) throwing a fit over a 2D array:

Command: execute script
Target: return [["val1", "val2", "val3"], ["2d", "3d", "4d"]]
Value: A1
It gives me an "invalid or unexpected token" error.

Has anyone tried using 2D arrays before in their little web app? I can get it to work fine in the normal Selenium WebDriver, but the IDE is a bit of a pain.
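
Not a confirmed fix for the IDE, since the cause is not stated anywhere in the thread. In JavaScript generally, "invalid or unexpected token" often comes from a stray or non-ASCII quote character (curly quotes from a copy-paste, for example). One way to rule that out on the WebDriver side is to generate the array literal from data instead of typing it. A sketch, where `build_return_script` is a made-up helper name:

```python
import json


def build_return_script(rows: list[list[str]]) -> str:
    """Serialize a 2D list into a JS `return` statement.

    json.dumps emits plain ASCII double quotes by default, and a JSON
    array is also a valid JavaScript array literal.
    """
    return "return " + json.dumps(rows)


ROWS = [["val1", "val2", "val3"], ["2d", "3d", "4d"]]

if __name__ == "__main__":
    script = build_return_script(ROWS)
    print(script)
    # With the third-party selenium package and a live driver this would be:
    #   result = driver.execute_script(script)
    # which is not run here because it needs a browser.
```

The generated string can also be pasted into the IDE's Target field to check whether the hand-typed version had a bad character in it.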

Anonymous 04/25/24(Thu)05:39:57 No.100171772

What's web scraping?

Anonymous 04/25/24(Thu)07:58:10 No.100172967

>>100171769
Never mind, I got it working.

Anonymous 04/25/24(Thu)08:08:45 No.100173069

>>100170754
You'd need a shitload of proxies though

Anonymous 04/25/24(Thu)09:06:07 No.100173634

Anyone know where I can scrape unobfuscated browser JS from?

Planning on training a GPT to deobfuscate obfuscated JS
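
Wherever the JS ends up coming from, the readable files will need separating from the minified or obfuscated ones before anything goes into a training set. A crude sketch of such a filter: `looks_minified` is a made-up name and both thresholds are guesses that would need tuning on real data. It checks only average line length and average identifier length, so it will misjudge plenty of edge cases.

```python
import re

# Matches JS-style identifiers and keywords (letters, digits, _ and $).
IDENT_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def looks_minified(
    source: str,
    max_avg_line_len: float = 200.0,   # guessed threshold
    min_avg_ident_len: float = 3.0,    # guessed threshold
) -> bool:
    """Heuristic: long lines or very short names suggest minified code."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return False
    avg_line_len = sum(len(ln) for ln in lines) / len(lines)
    if avg_line_len > max_avg_line_len:
        return True
    idents = IDENT_RE.findall(source)
    if not idents:
        return False
    avg_ident_len = sum(len(name) for name in idents) / len(idents)
    return avg_ident_len < min_avg_ident_len


READABLE = """function addNumbers(first, second) {
  const total = first + second;
  return total;
}
"""
MINIFIED = "function a(b,c){var d=b+c;return d}"

if __name__ == "__main__":
    print(looks_minified(READABLE))
    print(looks_minified(MINIFIED))
```

Files that pass the filter can then be run through a minifier or obfuscator to produce the (obfuscated, original) pairs for training.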