/g/ - Technology


File: 1704606978300432.png (44 KB, 320x256)
I need something that can
>queue multiple HTTP requests
>execute them asynchronously (perhaps using a thread pool)
>abort and retry requests on timeout
>return the downloaded HTTP responses synchronously
Preferably Python, but any language is fine.
>>
https://www.python-httpx.org/

Would this work?
>>
>>101200279
aiohttp, asyncio (builtin since 3.4).
use a standard queue, fill it up and consume it in some looping coroutine. use the timeout options and write a basic try/except to retry. store the results somewhere, like a list or a dict.

you can use asyncio.run to drive the entire event loop to completion, which gets you the results synchronously; then you can continue with your code after it's done.
i don't know why you'd want to return the http response synchronously though.

if you're lazy and old, use scrapy. you'll be using twisted under the hood.
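a minimal sketch of that approach, assuming aiohttp is installed (pip install aiohttp); the URLs, worker count, timeout and retry limit are all placeholders:
[code]
import asyncio
import aiohttp

URLS = ["https://example.com/a", "https://example.com/b"]  # placeholders
WORKERS = 4
RETRIES = 3

async def worker(queue, session, results):
    while True:
        url = await queue.get()
        try:
            for attempt in range(RETRIES):
                try:
                    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
                        results[url] = await resp.read()
                        break
                except asyncio.TimeoutError:
                    if attempt == RETRIES - 1:
                        results[url] = None  # gave up after RETRIES timeouts
        finally:
            queue.task_done()

async def main():
    queue = asyncio.Queue()
    for url in URLS:
        queue.put_nowait(url)
    results = {}
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(worker(queue, session, results)) for _ in range(WORKERS)]
        await queue.join()   # block until every queued url is consumed
        for t in tasks:
            t.cancel()       # the workers loop forever, so cancel them
    return results

# asyncio.run drives the loop to completion, so the caller gets
# the downloaded responses back synchronously.
results = asyncio.run(main())
[/code]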
>>
>>101200294
Seems more like a modern replacement for the requests module.

>>101200319
I know how to do it. I'm just too lazy.
>>
>>101200362
well, the code for it is so small that making a whole library for it would be a hassle. like, just write it yourself.
>>
>>101200279
use batch and aria2
>>
>>101200319

Go back, r*ddit typer
>>
All scraping libraries suck desu. I gave up and wrote my own shit in golang, but I had more advanced requirements than you.
>>
>>101200279
You basically need bash
>>
File: 1719674488072.jpg (87 KB, 506x640)
>>101200362
I'd recommend you try out some estrogen, it really improved my coding performance
>>
>>101200279
asyncio and aiohttp should do the trick easily
>>
>>101200279
What you're looking for is httpx, check it out.
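a minimal sketch with httpx (pip install httpx); note the transport-level retries option only re-tries failed connections, so read timeouts would still need a manual retry loop. URLs are placeholders:
[code]
import asyncio
import httpx

URLS = ["https://example.com/a", "https://example.com/b"]  # placeholders

async def fetch_all(urls):
    transport = httpx.AsyncHTTPTransport(retries=2)  # retries connect failures
    async with httpx.AsyncClient(transport=transport, timeout=10.0) as client:
        # gather fires all the requests concurrently on one event loop
        responses = await asyncio.gather(*(client.get(u) for u in urls),
                                         return_exceptions=True)
    return dict(zip(urls, responses))

results = asyncio.run(fetch_all(URLS))  # blocks until everything is done
[/code]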
>>
If you need to process a lot of requests efficiently, just use Go with the standard net/http lib and you're good.

If you like touching tips with your friends, you can use Python: async functions for the requests and asyncio.gather to run them concurrently. You can use tenacity for the retry stuff.
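a sketch of the gather + tenacity combo; using httpx as the client is my assumption, the post doesn't name one. Assumes httpx and tenacity are installed:
[code]
import asyncio
import httpx
from tenacity import retry, stop_after_attempt, retry_if_exception_type

@retry(stop=stop_after_attempt(3),
       retry=retry_if_exception_type(httpx.TimeoutException))
async def fetch(client, url):
    resp = await client.get(url, timeout=5.0)  # abort after 5s; tenacity retries
    return resp.content

async def main(urls):
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(fetch(client, u) for u in urls))

results = asyncio.run(main(["https://example.com/a", "https://example.com/b"]))
[/code]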
>>
>>101203848
Sounds like he's doing I/O-bound tasks, so Go is unnecessary. In fact, it's the perfect use case for Python.
>>
Just use a Celery queue, my man
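a minimal Celery sketch, assuming a Redis broker/backend on localhost and the requests library; every name here is a placeholder. Run a worker with "celery -A tasks worker":
[code]
import requests
from celery import Celery

app = Celery("tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

@app.task(bind=True, max_retries=3, default_retry_delay=2)
def fetch(self, url):
    try:
        return requests.get(url, timeout=10).text
    except requests.Timeout as exc:
        raise self.retry(exc=exc)  # re-queue the task on timeout

# queue everything, then block on .get() for synchronous results:
# results = [fetch.delay(u).get() for u in urls]
[/code]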
>>
>>101200279
You can do all of that with just bs4, concurrent.futures/threading and requests (see the sketch below). If you're trying to get JS-rendered content and stuff like that, you'll probably need a headless browser.

I've built scrapers in Python and C, but honestly the best way I've found to do it to date is in JS (I know, not my favorite either), automating browsers with stuff like Playwright.
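a minimal sketch of the concurrent.futures + requests part, assuming requests is installed; URLs, pool size and retry count are placeholders:
[code]
from concurrent.futures import ThreadPoolExecutor
import requests

URLS = ["https://example.com/a", "https://example.com/b"]  # placeholders

def fetch(url, retries=3):
    for attempt in range(retries):
        try:
            return requests.get(url, timeout=10)  # abort slow requests
        except requests.Timeout:
            if attempt == retries - 1:
                raise

with ThreadPoolExecutor(max_workers=8) as pool:
    # map queues every url on the thread pool and returns the
    # responses synchronously, in order, once they're all done
    responses = list(pool.map(fetch, URLS))
[/code]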
>>
I do this in C++ using libcurl. I don't really use an HTML parser either; I just notice the patterns, then run a SIMD substring search and it finds what I need 1000x faster while not using retarded-tier amounts of RAM per page.
>>
>(((asynchronously)))
Kys
>>
/wsg/ is this way retard >>101208241
>>
>>101200362
>I know how to do it. I'm just too lazy.
feed those instructions to ChatGPT then.


