/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1760713210502367.jpg (271 KB, 2048x1024)
Why is this retarded language used for essentially all data science?
>>
>>108979972
Because data scientists do not know how to program. To be fair, most programmers today also don't know how to program.
>>
Essentially because scientists simultaneously aren't good at programming, and look on those who are good at programming with an amount of disdain.
>>
>>108979972
Accessible for people who don’t have the time to get good at programming and has lots of tools
>>
It's a sneaky language
>>
It's the Windows 95 of programming languages. Of course data "scientists" will love it.
>>
>>108979972
It just works.
>>
>>108979972
practically speaking, there is a certain amount of advantage to having a repl when doing research work
also, since things like tensorflow have python bindings, it is script-kiddie tier in terms of getting started
it is also FOSS and has a large community of retards behind it (unlike e.g. MATLAB)
i hate python for several reasons, but it is somewhat understandable that it dominates data science
>>
i'll also add the overhead of running python is offset by the tremendous compute cost of most data science operations
>>
>>108980693
>MATLAB
...is a steaming pile of shit.
The DRM is such a pain in the ass, especially if you have a proper license, that no one with gray matter in his skull would ever consider using it over Python with numpy and tensorflow.
My best guess would be that pirates love MATLAB. I hate MATLAB.
>>
Data science is a made up term
Some of those jobs which are titled Data Scientist are real jobs, some are not
A lot of people who are in those jobs are actually really stupid
Stupid people started using python, then other people who entered the field, whether stupid or not, are just following the conventions of their field

Hope that helps.
>>
>>108979972
It's glue between lower level, quicker programs.
You do zero processing with python, it just runs and sets up data going from program A to B to C.
It is very quick to write python programs, even if the language is too slow to run an NES emulator.
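
A minimal sketch of that glue role, assuming two external programs that talk over stdin/stdout. Both stages here are stand-in one-liners launched through the current interpreter so the example runs anywhere; `run_stage`, `stage_a` and `stage_b` are made-up names, not anything from the thread.

```python
import subprocess
import sys

def run_stage(code: str, stdin_text: str = "") -> str:
    """Run one external 'program' (a stand-in one-liner) and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the stage exits with a non-zero status
    )
    return result.stdout

# Stage A emits one number per line; stage B sums whatever arrives on stdin.
stage_a = "print('\\n'.join(str(i) for i in range(5)))"
stage_b = "import sys; print(sum(int(line) for line in sys.stdin))"

# Python does no number crunching itself, it only hands A's output to B.
total = run_stage(stage_b, run_stage(stage_a)).strip()
print(total)
```

In a real pipeline the one-liners would be replaced by the actual compiled tools, with the same `subprocess.run` call doing the hand-off.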
>>
>>108979972
To promote mediocrity
>>
>>108980693
R has a better development loop in that sense, yet is nowhere near python. Python repl was always garbage and data workers widely never learned about ipython. Lua had an infinitely better loop but it got lopped off in favor of python, too. Ipython notebooks are a massive hack over the lack of good experience. So that's a very bad argument.
>>
>>108980699
It actually surprisingly often isn't, because for example you might be forced to mass process some dataset before it is in a shape that you can run your process in the first place.
>>
>>108980793
Can confirm this has been my exact experience.
>>
>>108979972
>I don't know what thing is
>I don't have understanding what thing is
>I don't care to attempt to learn about what thing is
>Therefore this thing is retarded
Typical smoothbrain IQ thought process.
>>
>>108981027
Found the pyjeet
>>
>>108981046
>strawman reply
Found the smoothbrain.
>>
>>108980748
i hate MATLAB with a passion. there is so much inherently wrong with the language design, libraries, and performance. i'm also an expert at it. thank jesus those days are behind me though, i'll likely never have to use it again.
>>
It's funny that Python caught on when a complete beginner to making programming languages who is following along with Crafting Interpreters would end up with something much, much faster and of higher quality than the Python interpreter. There is no reason whatsoever, beyond the incompetence of the language designers, why an add operation in any programming language should take up to 120x longer than it does in C, as Python's can. Python and Javascript becoming the de facto languages of the modern era proves God is not the kind and loving thing the Christians claim he is.
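
A rough way to see the per-operation overhead for yourself, as a sketch only: it times the same additions done one by one in bytecode against the built-in `sum()`, which runs its loop in C inside CPython. It does not reproduce the 120x figure above; the ratio depends on the machine and the Python version. `N` and `add_in_bytecode` are made-up names.

```python
import timeit

N = 100_000
values = list(range(N))

def add_in_bytecode(xs):
    """Add element by element; every += is dispatched by the interpreter."""
    total = 0
    for x in xs:
        total += x
    return total

# sum() performs the same additions inside a single C loop in CPython.
loop_seconds = timeit.timeit(lambda: add_in_bytecode(values), number=20)
builtin_seconds = timeit.timeit(lambda: sum(values), number=20)

print(f"interpreted loop: {loop_seconds:.4f}s, built-in sum: {builtin_seconds:.4f}s")
```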
>>
>>108981054
>didn't deny anything
Surrender accepted
>>
>>108981076
This. Jews were right. God is twice as evil as he is good.
>>
>>108979972
Because some of the libraries for it are really quite good. The language itself is about 95% steaming turd once you get past the surface.
(Yes, I've used it quite a bit at work. It's the worst option, except for all the other ones that involve a fuckton of work to get anywhere close because of the library situation.)
>>
>>108981088
Gotta slightly correct this statement. They are terrible, even numpy is full of ridiculous bugs. However, they are very useful because there is no equivalent elsewhere at least not combined with common 'partnered' libraries.
>>
>>108981077
Yeah i don't deny the fact that you have the same IQ scores as a somalian.
I too accept your surrender and concession.
>>
File: 1772252811695907.png (250 KB, 718x588)
>>108980400
INDENTATION ERROR
>>
File: 1731280101968.gif (21 KB, 199x200)
>NOOOO, you can't just make a list of things and be done with it, you need to choose the right data structure and algorithms
>>
>>108981176
You deserve it.
>>
File: 1769790094741308.png (20 KB, 600x600)
Bow down to the superior language pyjeets
>>
>>108980693
>>108980822
Both correct
>>108980699
Also kind of correct. Another factor is that the audience is often a relatively small community of other researchers (a few thousand people or less). So the process looks like:

1. Have huge amount of raw data
2. Process the data with optimized C and Fortran programs, wrapped by python scripts to handle options and put the output in the right place. In some cases the programs are standalone, sometimes they're wrapped APIs. This step may take days even with the optimized C code.
3. Develop visualizations (graphs/heatmaps/diagrams/etc) to illustrate what you discovered with the data.
4. Put the visualizations on a website that you link to in your paper (or book).

In some cases, the results boil down to static graphics that can be displayed in PNG or SVG files. In other cases maybe you have a PHP/rails/django/node.js backend with some kind of javascript browser cancer on the frontend just like any other modern webdev. (Or you just use unadorned html forms and checkboxes on CGI scripts like it's 2001, turns out that method still works just fine). These let users explore the data with simple options and entry fields. Since your audience is intrinsically limited to other researchers, you aren't worried about optimizing for millions of pageviews or addictive engagement or anything like that.
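
The wrapper part of step 2 (handle options, put the output in the right place) can be sketched roughly like this. Everything named here is hypothetical: `crunch` stands in for whatever optimized C or Fortran binary is being wrapped, and its flags and the `results` directory are invented for illustration.

```python
import argparse
from pathlib import Path

def build_job(argv):
    """Parse wrapper options; return the tool's command line and output path."""
    parser = argparse.ArgumentParser(description="Wrap a compiled tool (hypothetical).")
    parser.add_argument("input", type=Path)
    parser.add_argument("--threads", type=int, default=4)
    parser.add_argument("--results-dir", type=Path, default=Path("results"))
    args = parser.parse_args(argv)

    # The output lands in a predictable place derived from the input name.
    output = args.results_dir / f"{args.input.stem}.out"
    # "crunch" and its flags are placeholders for the real compiled program.
    command = ["crunch", "--threads", str(args.threads),
               "--in", str(args.input), "--out", str(output)]
    return command, output

command, output = build_job(["raw/run42.dat", "--threads", "8"])
print(command, output)
```

The sketch only builds the command. Running it would be a matter of creating `output.parent` and passing `command` to `subprocess.run(command, check=True)`.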
>>
>>108979972
>simple tables without a bunch of unnecessary boilerplate and having to worry about memory allocation + explicit garbage collection
>robust libraries for nice, multithreaded data transformations
>humane design language with human-readable syntax
why not? I never understood Python hate.
I rarely use it but I do not see a problem.
>>
>>108981431
All the levels of the language from the base bytecode up to and including the __dunders__ are just full of bad design decisions, and way too many language features depend on that. The result is that you're stuck being very slow, and there's basically nothing you can do about it without utterly gutting the best thing in the language; that massive collection of third party libraries is the true moat.

The threading model isn't quite as fucked as it was, but it's close. The value/type model is all sorts of fucked though, and that makes deciding what operations really do super difficult, which in turn makes unboxing values really awkward, and that would be the big way to go faster. But the language and its uses fight against that so hard. PyPy goes some of the way, but at great cost, and not nearly far enough.
I think you can tell I've thought about this a bit.
>>
because the alternative is spending a grand on matlab or mathematica, which most people don't want to do.
>>
>>108981506
>All the levels of the language from the base bytecode up to and including the __dunders__ are just full of bad design decisions
They don't matter though. Not when you're using python in its proper place.
The heavy lifting is all done in lower-level languages.
PicRel is a high-level gemini summary of what an actual real-world data science pipeline looks like and why Python is not the bottleneck. This pipeline might take hours to run. You aren't going to fuss over python taking an extra 30 seconds to write the final output file. Certainly not worth giving up all the ergonomics of the dynamic language.
>>
>>108979972
Trying to start a discussion when your first and only statement shows you're a clueless retard isn't a good idea
>>
>>108980400
jews
>>
>>108980453
*snakey
>>
>>108980748
>>108981072
i love you. both.
>>
Easy to fuck around with due to being interpreted, powerful enough for anything a DevOp/Sysmonkey might need and not as retarded as Perl.
>>
>>108983022
nothing is retarded as perl. it is the yardstick by which retardation is measured.
>>
>>108983027
Didn't say it was a high bar to clear.
>>
>>108983022
>DevOp/Sysmonkey
if you are using python for these tasks, you shouldn't have that job
>>
>>108983055
stockholm syndrome with your employer. lol. lmao even.
>>
>>108983064
those are shell tasks
if you don't understand the difference between a shell and python, you definitely shouldn't have that kind of job
>>
>>108983135
that isn't the point now, is it.
>>
>>108983055
>>108983135
not him but there are cases where bash is not the best fit
if you end up trying to write your own one off program in bash you may as well just use python instead. bash isn't good at being a normal language. but yes, if there's a utility out there that already does what you want, it's best to use bash
>>
>>108979972
>need language to process terabytes of data
>memory allocation is automated for you
It’s a mystery
>>
>>108983055
>reformat post for brevity
>remove "shell" since it's not really relevant
>cause unemployed anon to shit self over it
ahh, pottery.
>>
File: IMG_0058.jpg (75 KB, 750x1000)
I like python simply because it make /g/ayniggers seethe.
>>
>>108979972
at least it's not bash or powershell
>>
>>108983244
shell is not geared to general purpose programming
it's for executing processes
>>
>>108983244
bash is wonderful and you take that back
>>
>>108983314
writing in bash creates more subtle bugs than writing in c creates buffer overflows.
>>
>>108979972
Because data science isn't a real discipline
>>
>>108983203
>Choosing based on whether people like it or not
Women detected.
>>
>>108983331
brown hands typed this
>>
I like Python because it's easy and concise.
>>
>>108983381
>Concise
You clearly haven't written much python and/or have never seen python 2.
>>
Python is like pseudo code, it's amazing.
>>
>>108983405
works just about as well as pseudo code, too
>>
>>108980991
>>108979972
python is by far the easiest language to use when developing one-off experimental set-ups because it's got all the normal features you'd expect from a programming language, but it's also got full and trivial runtime introspection so any IDE with a debugger just lets you start a script with a few lines of code and develop it dynamically using breakpoints and the REPL

the performance critical parts of your code can usually be made "good enough" using vectorized libraries and since you're not writing anything that has to run more than once in a while you don't care how fast the setup and plumbing run, only that you can trivially access any object's internal states at any point to find subtle math errors and such

I also write C and C++ because my field requires it but the last thing you wanna be doing is developing your algorithm while writing performant software. You figure out exactly how it will work first, then write fast code in a "real programming language" later if it's required
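
A small sketch of the "access any object's internal state" point, using only built-ins. `RunningMean` is a made-up class for illustration. In a live session you would more likely drop the built-in `breakpoint()` call into the script and poke at the same attributes from the debugger prompt.

```python
class RunningMean:
    """Accumulates values; its internal state stays inspectable at any time."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, x):
        self.count += 1
        self.total += x

    def mean(self):
        return self.total / self.count

rm = RunningMean()
for x in (1.0, 2.0, 3.0):
    rm.add(x)

# vars() exposes the instance attributes as a dict, with no extra code in the class.
state = vars(rm)
print(state, rm.mean())
```

If the mean looked wrong, checking `count` and `total` separately like this is usually enough to tell whether the bug is in the accumulation or in the division.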
>>
>>108983441
you have never developed anything remotely novel in your entire life.
>>
>>108981376
Step 2 never happens in real life. It's very rarely vectorizable so can't offload to numpy or similar. It ends up just being slow ass python. People hack around it by caching intermediary results at best.
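
The caching workaround mentioned here usually looks something like the sketch below: pickle the result of a slow step to disk and reload it on later runs. `cached`, `slow_step` and the file name are made-up for illustration, and a temporary directory is used only so the example cleans up after itself.

```python
import pickle
import tempfile
from pathlib import Path

def cached(path, compute):
    """Return the pickled result at path, computing and saving it if absent."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result

calls = []

def slow_step():
    calls.append(1)  # records how often the expensive step really runs
    return [x * x for x in range(10)]

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "step1.pkl"
    first = cached(cache_file, slow_step)   # computes and writes the cache
    second = cached(cache_file, slow_step)  # served from disk, no recompute

print(len(calls), first == second)
```

Note that nothing here invalidates the cache when the input data or the code changes, which is exactly why this counts as a hack rather than a fix.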
>>
>>108983462
Literally nobody develops with python this way. The closest is ipython notebook-exclusive workflows. Why do you retards always make shit up like this? Are you just asking chatgpt or something?
>>
>>108983462
Sounds like python is a gimped version of a lisp like racket.
>>
>>108983055
I worked as a devops guy for years. Hard to justify using anything but python for most tasks. Only real alternative was ruby and ruby is at least as bad as python. Shell script is better for some cases but most of the time it's worse. I used C for the hell of it a few times for throwaway test tools but it was a waste of time, using python would have had no drawbacks. Golang is increasingly an option especially with LLM assistance, but it's still usually a waste of effort. If you're just wrapping LDAP commands and doing string manipulation you don't need fanatical type-checking.
>>108983587
By real life you mean the retards in your undergraduate data science class. Actual real life data science workflows look like this: >>108982832
>>
>>108983591
>Literally nobody develops with python this way.
Literally everyone does. If you use dynamic languages as if they were AOT-compiled languages, you have skill issues.
>>
>>108983591
everyone does, broheim
>>
I agree
>>
>>108983718
It basically is, but Python has helpful, consistent, easy-to-remember syntax for all the most common cases. It's very low-friction to write.

Lisp is fun to use but is not easier to write than Python and is often not more productive either, even if you aren't relying on libraries from Python's popular ecosystem. When sketching algorithms, most people want to be able to write:
c = a[0] + b[4]
print(c)

Not:
(let ([c (+ (first a) (list-ref b 4))])
(displayln c))


That's a trivial example but the pattern scales, to an extent. There are similar patterns with loops and conditionals that are just easier to do in Python (or R).
>>
File: Sem título.png (75 KB, 1126x571)
>>108979972
I don't know what data science actually is or does but I use Python/Pandas to fix shitty accounting reports from pajeet systems.

>>108980614
This
>>
>>108979972
Python's pip repository and overall documentation are terrible. I really don't understand why it's used more than R based on CRAN alone, not to mention how much better the documentation is
>>
>Why do "data scientists" use a xyz

Well data science is not science, so would you trust anything that anyone says when his entire career is a lie?
>>
>>108985798
Python is used outside of Data Science also. It's a general purpose language that many people learn to do simple scripting. Therefore it has a much larger "install base" than R, leading to a positive feedback loop of more libraries targeting Python than R (e.g. TensorFlow, PyTorch). R is still used a lot although pressure from Python seems to be eating at it. Anaconda dropped its R channel in 2025.
>>
>>108979972
Because jeets don't know Scala
>>
>>108985862
Anaconda is about the worst piece of software I have ever had the displeasure of installing on a computer. I consider it tantamount to malware.
>>
>>108979972
Literally one of the least retarded languages.

>nOOOooooOoOo let me put retarded indentations everywhere!
>>
>>108983135
No, Python has tons of advantages over shell for most operational scripting tasks. Shell has its place but anyone who'd make a post like yours is retarded and shouldn't be trusted to maintain anyone's system.
>>
>retarded retarded retarded
Can I listen to autistic drooling instead of whatever the fuck this is?
>>
>>108985798
Does it really have fewer packages than R?
>>
>>108991203
NTA but my understanding of his comment isn't that R has more packages, but better packages with better documentation.
>>
>>108979972
You have severe crippling autism.
>>
>>108979972
>Whine whine bitch bitch moan
Why did companies use BASIC back in the day?
Because it fucking works well enough for 99% of use cases and you don't have to worry about stupid shit like pointers or malloc()


