/g/ - Technology
File: 1752437844165730.png (300 KB, 1920x1200)
the great debate
>>
>>106762012
curl is for doing specific web requests, mimicking complicated auth chains and headers.
wget is a less engineered tool for downloading/archiving content from a network
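to illustrate, roughly each tool in its comfort zone (URL and token made up):
# curl: hand-built request, custom auth header, follow redirects
curl -L -H "Authorization: Bearer $TOKEN" https://api.example.com/v1/items
# wget: pull down files or whole directory trees for keeps
wget -r -np https://example.com/files/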
>>
>>106762012
wget for downloading things, curl for everything else; the two programs complement each other
>>
curl -O makes wget obsolete
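e.g. (made-up URL):
curl -O https://example.com/file.tar.gz
saves it as file.tar.gz in the current dir, same as wget would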
>>
>>106762012
bin/wget
#!/bin/bash
# poor man's wget: save the URL given as $1 under its basename
fn="$(basename "$1")"
curl "$1" > "$fn"
>>
>>106762584
there are some wget features for spidering stuff that I'm not sure curl does well
>>
>>106762584
cURL is much more fickle; you also need at least -J and -L to mimic Wget behavior. Now try to throw a list of hundreds of URLs at the cURL command line utility and see how that works out. You can't even reuse arguments like -O across URLs, it has to be repeated for every single one.
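roughly what bulk downloading looks like with each (urls.txt is hypothetical, one URL per line):
# wget reads the list natively
wget -i urls.txt
# curl needs a loop or xargs, one -JLO invocation per URL
xargs -n 1 curl -JLO < urls.txt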
>>
>>106762889
also no resume, no clobber protection and no retries by default; cURL is clearly made for handcrafting requests, not bulk downloading
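for the record, my guess at the flags you'd bolt on to get close to wget's behavior (made-up URL):
# wget resumes with -c and retries on its own
wget -c https://example.com/big.iso
# curl has to be told to resume (-C -) and retry explicitly
curl -C - --retry 5 -OL https://example.com/big.iso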
>>
what's the one that gets whole websites?
>>
>>106762012
Curl is amazing, and open source devs like curl creator Daniel Stenberg get too little credit.
>>
>>106762649
you're supposed to do the logic yourself or trust someone else's code
>>
>>106762012
Both are good for their specific purposes, but I use aria2 over wget nowadays.
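e.g. a segmented download over several connections (flags from memory, made-up URL):
aria2c -x 8 -s 8 https://example.com/big.iso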
>>
>>106762012
Curl, unless you're using the spidering / recursive download features of Wget like:
>>106762649

Here's a list of things Curl can do which Wget cannot:
>HTTP/2
>HTTP/3
>Impersonate common browsers like Google Chrome (https://github.com/lwthiker/curl-impersonate): it's ironic that Wget can't do this, since its spidering/indexing would benefit from it (quick sketches after this list)
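quick sketches, assuming a curl built with HTTP/3 support and the curl-impersonate wrapper scripts installed (wrapper names vary by release):
curl --http2 https://example.com/
curl --http3 https://example.com/
# curl-impersonate ships browser-named wrappers
curl_chrome116 https://example.com/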
>>
>>106762897
https://github.com/curl/wcurl
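fwiw basic usage is just (made-up URL):
wcurl https://example.com/file.tar.gz
it wraps curl with wget-ish defaults (remote filename, follow redirects, retries, iirc)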
>>
>>106766038
>>Impersonate common browsers like Google Chrome (https://github.com/lwthiker/curl-impersonate): it's ironic that Wget can't do this, since its spidering/indexing would benefit from it
not really. spidering is usually something you do with common courtesy, while curl-impersonate is meant to get around blocks maliciously. understandably so; I use it.
>>
>>106766065
Not necessarily maliciously, but rather because the block is there for everyone except a web browser now, and there's no other way past it for bots.

You can have a legitimate interest in scraping / spidering a website and do everything right and respectfully, but you're still not getting past the filters of the modern web easily; everyone has some sort of filter in place now.
>>
>>106766074
It makes me wonder how people like Archiveteam have managed:
https://wiki.archiveteam.org/

They have a fork of Wget; can it impersonate browsers, or are they fucked for archiving anything that wants a browser now?

You need a bit more than "User-Agent: Chrome" to bypass the sophisticated filters that check things like TLS fingerprints, etc.
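i.e. something like this alone won't cut it; the TLS handshake still screams curl no matter what the header says:
curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" https://example.com/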
>>
curl_cffi is based