/sci/ - Science & Math


File: OU.png (15 KB, 368x248)
previous thread >>16174616

This is one of the board's newest generals. Fairly high activity due to edgelords trying to be funny but instead spreading facts about the absolute state of our world.

Intro stats is fairly easy; with intermediate stats comes the programming, and we have already had several battles in the thread about which language is the best. Nobody uses SAS funnily enough, SPSS has had some people trying to joust the edgelords who are into R and C++, while the Stata children are silent as usual.

Come one, come all. State your dumb questions, /pol/tardy or not. Some fairly useful and funny math is showcased in this thread.
>>
Stats people use C++? Since when?
>>
>>16187525
the previous thread 404'd after a small handful of replies because it wasn't launched with a racial crime statistics pic; this thread condemned itself to a similar fate.
lrn2/psg/ fagit
>>
>>16188514
Is that so?
>>
>>16189863
If you're that desperate for attention then you should have stuck to the thread guidelines outlined in >>16188514

>>16188592
Yes, as demonstrated by the fact that you had to bump your garbage thread that nobody wants to see off the ass end of page 10.
>>
>>16187525

Any anons working on information geometry?
>>
>>16190158
It would be interesting to hear a bit about the applications of this.
>>
Can anyone redpill me on Poisson statistics?
>>
>>16187525
Biggest lie in all of statistics is that events are independent. They're not. In casinos it's common to see streaks of 10 in roulette. If the next event is independent you'd expect 50% of the times 10 streaks are observed to continue to 11. That's not what happens.
>>
I’m doing a course on ODE and the Laplace transform is absolutely not motivated lol. Digging through wiki it appears Laplace used a similar method in working with probabilities. Anyone have more info? Any decent introductory books on probabilities, especially ones that motivate Laplace transform?
>>
Bumpety
>>
>>16190985
I'd say they have a definite place in general stochastic processes. No experience myself, but just thinking aloud.
>>
>>16187525
What would be a good, complete beginner, book for probability and statistics and one good follow-up book? Preferably with depth, just not so much as to be overwhelming for a beginner.

>>16190158
That sounds fascinating. What is it?
>>
>>16187525
What's the difference between gamma and inverse gaussian distributions? I'm doing generalized linear mixed-effects modeling
>>
>>16192404
I think 'Statistical Inference' by Casella and Berger is a good follow-up. You can find the text and solution manual on libgen (beware that the solution manual actually has mistakes in it).

To get the most out of the text, I think being solid in calculus would help. The examples/problems can be quite rigorous, so it may not be the best starting point.
>>
how do I find a consulting job to make extra money as a PhD student? I am doing CS/ML. it's harder to find an internship/studentship nowadays in FAANG btw.
>>
>>16192812
fuck, wrong thread
>>
>>16192404
First do one variable and then multivariable calculus. Then do a beginning course in probability and stats. Then applied stats. Then a more foundational theoretical course in stats.
>>
>>16192812
Honestly a lot of statisticians are consultants as well. I consult sometimes. Get yourself the simplest LLC you can get in your jurisdiction, do some projects that look pretty and put them up on some wordpress shit. Then start cold calling companies within your domain knowledge sphere. My domain knowledge is within accounting and economics, so I consult within those spheres and neither accountants nor economists are very good at hardcore stats.
>>
Anyone got data on heights of men in america by age, race, region, etc., spanning multiple decades?
>>
>>16193179
Check https://datausa.io/
>>
Any Gibbs sampling chads here?
Given winbugs/openbugs is dead and ancient what's the best route to go down, JAGS or STAN?
>>
>>16193762
STAN is nice to use but occasionally a pain in the ass to install because of dependencies
>>
>>16193179
https://www.healthdata.org/research-analysis/health-by-location/united-states/county-profiles
>>
>>16192880
Good advice, thanks
>>
>>16193913
What domain knowledge do you have?
>>
>>16192880
Is this a side hustle or your primary income?
>>
>>16194101
side but scalable.
>>
>>16193762
I will second what >>16193763 said: STAN can be a bastard to install. However, it also has a very active community (https://discourse.mc-stan.org/), so chances are good you can get help/find info if you get errors. If you're using R, there's also a couple interfaces out there that might make STAN a bit easier to use.
>>
>>16190985
The Laplace transform of a density function gives you the moment generating function of the random variable, up to a sign flip in the argument: the MGF M(t) = E[exp(tX)] is the (two-sided) Laplace transform of the density evaluated at -t. The MGF is very important to certain aspects of statistical signal processing and detection theory (especially large deviations theory and sequential hypothesis testing).

Probability, Random Variables and Stochastic Processes by Papoulis is the standard engineering oriented probability book used at either the upper undergrad or beginning of grad school. Has a decent amount of coverage of the relevance of Fourier and Laplace transforms to probability theory.

Another book that's perhaps less introductory, and that deals directly with the relevance of the PSD (so Fourier vs Laplace), is Bremaud's Fourier Analysis and Stochastic Processes. That one requires a good bit more analysis background to really work with though.
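
A quick numerical check of the Laplace/MGF link, as a minimal sketch rather than anything taken from the books above. It assumes an exponential density with a made-up rate of 2 and a plain midpoint-rule integral; `laplace_transform`, `exp_density` and `exp_mgf` are illustrative names only.

```python
import math


def laplace_transform(density, s, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of exp(-s*x) * density(x) on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += math.exp(-s * x) * density(x)
    return total * h


rate = 2.0  # hypothetical rate parameter of an exponential distribution


def exp_density(x):
    return rate * math.exp(-rate * x)


def exp_mgf(t):
    # Closed-form MGF of the exponential distribution, valid for t < rate
    return rate / (rate - t)


s = 0.5
numeric = laplace_transform(exp_density, s)  # Laplace transform at s
closed = exp_mgf(-s)                         # MGF at -s
print(numeric, closed)
```

The two numbers should agree to several decimal places, which is the sign-flipped identity in action.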
>>
I'm a bio student and I would like to master statistics. I have taken some intro statistics for biologists, but it was just a couple of lectures about the normal distribution and doing a t-test.

I would like to develop a solid background in statistics, from basics to more advanced topics. What books or online courses do you recommend?
>>
>>16194520
Depends on how lost in the sauce you want to get and also on your math background.

There's basically four "standard texts" in increasing level of difficulty that people recommend for either upper level undergrad or first year grad students that aren't doing measure theoretic probability:
1) Probability and Statistical Inference by Tanis and Hogg
2) All of Statistics by Wasserman
3) Probability and Statistical Inference by Mukhopadhyay
4) Statistical Inference by Casella and Berger
>>
>>16194554
>All of Statistics
can vouch for this. it was a good read. really helped me thru my PhD.
>>
>>16194562
I'd say of the 4, All of Statistics and Casella and Berger were the most helpful for me. I'm not a statistician though, I'm an engineer. Can't comment on their usefulness for actual stats grad students.
>>
>>16194568
I've only read number 2 out of the 4 that were listed. needed it cause I was preparing for an ML interview. got recommended by a friend.
>>
>>16187543
Never
>>
>>16194569
Oh, I definitely don't recommend reading all 4 of them. They cover basically the same material but at different levels of depth and slightly different emphasis.

At the point that you've gone through one of them, you probably have enough background that you can just jump right into whatever specific statistics topic you actually want to study directly.
>>
>>16192808
Thanks.
>>16192878
Not what I asked for, but thanks for trying.
>>
>>16193924
Industrial engineering
>>
>>16194554
I want to get balls deep
>>
Bump
>>
File: 764923467892349238.jpg (37 KB, 483x470)
Who here is reading a stats book, any stats book, daily?
>>
File: 978-0-387-21718-5.jpg (83 KB, 827x1241)
>>16197214
The deepest you can go is measure theoretic/analysis based statistics. This will give you a lot of ability to tie in tools from more advanced mathematics if you are careful.

Mathematical Statistics by Jun Shao is a pretty good starting point for this, but it assumes you are already fairly comfortable with analysis and measure theoretic probability to a certain degree.
>>
How does one read a regression table ? How do you determine whether a result is statistically significant ? Are p-values (probability of null hypothesis) related to confidence intervals ?
>>
>>16198089
Give us an example table you would like to have interpreted
>>
>>16197968
I am reading the daily racial crime stats.
>>
>>16198153
It's good to stay informed. Thoughbeit that does not count.
>>
>>16197968
I don't read stats books daily, but I have been spending some time on some intermediate probability theory on a pretty close to daily basis recently.
>>
>>16198227
Doing what with it?
>>
>>16198263
Reading the book and doing problems. I'm trying to get a better understanding of continuous time Markov chains.
>>
>>16198266
Can you post the book?
>>
>>16190158
Thank you for the good read.
>>
File: 1685919603099301.jpg (111 KB, 854x351)
>>16198095
That one for example
>>
File: 978-3-030-40183-2.jpg (104 KB, 827x1254)
>>16198270
Sorry, I thought I had mentioned it in that post. Looking back I didn't.

I'm going through this book right now. Probably on the easier side for measure theoretic probability, but covers a much wider variety of stochastic process topics than the standard recommendations like Durrett, Ash, etc.
>>
>>16198352
The first row values are the point estimates and the ones in square brackets are confidence intervals (lower and upper bounds). If the confidence interval contains 0, the effect is not statistically distinguishable from 0. If the CI range does not contain 0, it is thought to be statistically different from 0.

First column is calculated as just log income as a function of exports/area. Second column checks if colonizer effect and ln exports together have an effect. Third column checks if geography controls alter the effect of exports and colonizers.

P-value can be checked from a lookup table or a p-value calculator by taking in the F-stat value and calculating degrees of freedom from number of observations (usually N-1).
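
On the question of how p-values relate to confidence intervals, here is a rough sketch under a normal approximation. The numbers are hypothetical and not read off the posted table, and `p_value_from_ci` is an illustrative helper: it backs out the standard error from a 95% interval and then computes a two-sided p-value for "coefficient = 0".

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_value_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Two-sided p-value for H0: coefficient = 0, recovered from a 95% CI.

    Assumes the interval is estimate +/- z_crit * standard error.
    """
    std_err = (upper - lower) / (2.0 * z_crit)
    z = estimate / std_err
    return 2.0 * (1.0 - normal_cdf(abs(z)))


# Hypothetical coefficients with their 95% intervals
p_excludes_zero = p_value_from_ci(0.50, 0.10, 0.90)   # interval does not contain 0
p_includes_zero = p_value_from_ci(0.20, -0.20, 0.60)  # interval contains 0
print(p_excludes_zero, p_includes_zero)
```

Under these assumptions, a 95% interval that excludes 0 goes with p < 0.05 and one that contains 0 goes with p > 0.05, so the two are the same test read two ways.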
>>
>>16198153
Worthless.
>>
>>16198821
Nah they are good man. Gotta know what the darkies are up to.
>>
>>16198806
Thank you. Somehow you managed to explain it better than my professors.
>>
Bump
>>
>>16198836
You literally don't. It's hilarious you should say it as you have. You sound more black than I am.
>>
Any good resources for regression modeling?
>>
>>16200438
The scikit-learn user guide is good, not perfect, but if you read through it you'll know scikit-learn well enough at a minimum.

https://scikit-learn.org/stable/user_guide.html
>>
>>16200438
I would suggest 'Regression Modeling Strategies' by Frank Harrell. It's fairly approachable and covers a lot of topics (linear, logistic and ordered regression, model validation, etc).
>>
>>16200249
You are a dumb liberal faggot. What are you doing on 4chan?
>>
>>16201177
Enjoying anime because this is an anime website
>>
>>16200438
I have to learn this too. What book did you end up choosing?
>>
>>16201177
Cope. This is not your safe space, queer.
>>16201366
You're not me. I only rarely watch anime. I haven't seen any since Season 3 of Kimetsu no Yaiba.
>>
>>16202011
kimetsu no what? Are you one of those darkskinned pajeet anime watchers?
>>
>>16198744
I'll check that book out. Thanks
>>
>>16187525
why bother learning advanced SQL, R and stats when the world is run on excel, spss and "line look positive", "p value small" and "program says confidence high"
>>
>>16203871
You have two choices:
1. Join them and be doomed to reinvent the wheel every day
2. Do things that feel right and make your work reproducible, and build a foundation for the next generation
>>
if you were to have, say, a 70% chance of A happening and a 30% chance of B happening, even if you have done the math that made you come to this conclusion, would it still technically boil down to guessing?
>>
>>16204032
What do you mean? For any particular experiment (if it's properly random/stochastic) then knowing the distribution doesn't give you any ability to reliably know the outcomes. It can tell you their distribution, and you can make predictions in a statistical sense, but you can't know exactly the outcome of a probabilistic experiment without observing it.
>>
>>16204063
was thinking about situations where there is no guarantee, you are simply just using the knowledge and experience you have to get to a % outcome. like say the weather for meteorology.
>>
>>16204109
Then the answer to your question is yes. If you only know that P(A) = .7, P(B) = .3 and P(A or B) = 1, then you can't know for certain which of the two will happen until it happens.
>>
>>16204124
thanks anon
>>
Bump
>>
Give me a quick rundown on ridgeregressions plox.
>>
>>16206763
There's a few ways you can think about ridge regression.

The most straightforward way (and the way it was originally developed) is that ridge regression imposes an l2 norm constraint on your beta. You're minimizing the mean-square-error subject to your beta being within/on (depending on the setup) some sphere centered around the origin.

Another way of thinking about ridge regression is the Bayesian interpretation. Ridge regression imposes a Gaussian prior on beta.
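
A small sketch of the first view, assuming the usual penalised form (minimise squared error plus lam times the squared L2 norm of beta), which corresponds to the norm-constrained problem for a suitable radius. The data are made up, the model has two predictors and no intercept, and `ridge_two_features` is an illustrative name.

```python
def ridge_two_features(rows, targets, lam):
    """Solve (X^T X + lam * I) beta = X^T y for two predictors, no intercept."""
    a = sum(x1 * x1 for x1, _ in rows) + lam
    b = sum(x1 * x2 for x1, x2 in rows)
    d = sum(x2 * x2 for _, x2 in rows) + lam
    r1 = sum(x1 * y for (x1, _), y in zip(rows, targets))
    r2 = sum(x2 * y for (_, x2), y in zip(rows, targets))
    det = a * d - b * b  # 2x2 system solved with Cramer's rule
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)


def l2_norm(beta):
    return sum(v * v for v in beta) ** 0.5


# Made-up toy data: each row is (x1, x2)
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
targets = [5.1, 3.9, 11.2, 9.8, 17.1]

ols = ridge_two_features(rows, targets, lam=0.0)     # no penalty, ordinary least squares
ridge = ridge_two_features(rows, targets, lam=10.0)  # penalised fit
print(l2_norm(ols), l2_norm(ridge))
```

Raising `lam` pulls the coefficient vector toward the origin, which is the "stay inside a sphere" picture from above.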
>>
>>16207149
I always looked at it as an applied Lagrange multiplier for statistics and regressions. That it's more of an optimization thing than an error minimizer.
>>
Is anyone here studying probability / statistics on a daily basis?
>>
>>16207320
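You can definitely look at it that way. In the literal sense ridge regression is a constraint on the L2 norm of the parameter that your objective function is applied to (binding at equality whenever the unconstrained solution falls outside the sphere), and the penalty weight is the Lagrange multiplier for that constraint.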

If your objective function is a linear least squares, that's the same thing as maximizing the posterior distribution of your parameter given the data with a Gaussian likelihood function on the data given the parameter and a Gaussian prior on the parameter.

It works out to be tomato tomahto.
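
To see the tomato/tomahto numerically, here is a one-parameter sketch with made-up data and made-up variances. It assumes a zero-mean Gaussian prior with variance `prior_var` and Gaussian noise with variance `noise_var`, in which case the matching ridge penalty is `noise_var / prior_var`. The MAP estimate is found by brute-force grid search, purely for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a single slope with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def log_posterior(beta, xs, ys, noise_var, prior_var):
    """Unnormalised log posterior: Gaussian likelihood plus zero-mean Gaussian prior."""
    log_lik = -sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / (2.0 * noise_var)
    log_prior = -beta ** 2 / (2.0 * prior_var)
    return log_lik + log_prior


xs = [1.0, 2.0, 3.0, 4.0]  # made-up predictor values
ys = [1.2, 1.9, 3.2, 3.8]  # made-up responses
noise_var, prior_var = 1.0, 0.5
lam = noise_var / prior_var  # penalty implied by the prior

grid = [i / 10000.0 for i in range(20001)]  # candidate slopes in [0, 2]
map_estimate = max(grid, key=lambda b: log_posterior(b, xs, ys, noise_var, prior_var))
ridge_estimate = ridge_slope(xs, ys, lam)
print(map_estimate, ridge_estimate)
```

The grid maximiser and the closed-form ridge slope should match up to the grid spacing.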
>>
>>16208223
Thanks anon. You make me like this thread.
>>
>>16208655
Nice, this is a nice thread
>>
Tell me about the p value, what does it actually mean?
>>
>>16210217
Probability of false alarm. It's basically the probability that data or a test statistic at least as extreme as what you are observing could have happened randomly by chance when there is no real effect (i.e., when the null hypothesis is true).
>>
>>16210217
Assuming the null is true, the probability that one obtains results at least as extreme as what was observed.

This is a nice read about p-values: https://www.fharrell.com/post/pval-litany/#:~:text=A%20p%2Dvalue%20is%20the,the%20effect%20of%20a%20variable.
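
A concrete toy case, as a sketch: flip a coin 100 times, see 60 heads, and ask how surprising that is if the coin is fair. The function name and the counts are made up; "at least as extreme" is taken here to mean every outcome that is no more likely under the null than the observed one, which is one common convention for an exact two-sided test.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided p-value: null probability of all outcomes no more likely than the observed one."""
    def pmf(k):
        return comb(trials, k) * p_null ** k * (1.0 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance so that ties survive floating point rounding
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1.0 + 1e-9))


p_60_of_100 = binomial_two_sided_p(60, 100)
p_50_of_100 = binomial_two_sided_p(50, 100)
print(p_60_of_100, p_50_of_100)
```

Note that the result is a statement about the data given the null, not the probability that the null itself is true.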
>>
>>16209340
Yes, a very nice thread.
>>
>>16208655
>>16209340
>>16211385
reading the first few chapters in the deep learning book by Yoshua Bengio's group would've given you this exact information. the fact that you guys are excited by this tells me you guys are either undergrads or code monkeys who are ML wannabes.
>>
>>16211928
So what if they are undergrads? I don't understand your point. Yes, it's not particularly novel information if you are someone who has spent years doing Bayesian ML/Bayesian statistics, but it takes some time to see the connections between these frequentist regularization methods and the Bayesian MAP formulation of said regularization.
>>
>>16211928
Post pic of hand and it will be brown with CI of 95.
>>
>>16211946
Elitism is good, but it should be with a firm and happy hand. Not with a dull depressed heavy hand.
>>
What is the most difficult branch in statistics?
>>
>>16217068
In what way do you mean difficult? Do you mean mathematically difficult or do you mean practically difficult?
>>
>>16217241
Mathematically difficult
>>
>>16219240
I guess that depends on what you find difficult. Generally statistics gets mathematically complicated when the probability theory gets complicated.

Many people find measure theoretic statistics fairly difficult, and this will propagate throughout all of the related fields (performance analysis and large deviations theory, sequential analysis, information theoretic statistics, etc.) with this formulation.
>>
>>16211928
You're on 4chan, what did you expect?
>>
Statistics is not only useful. It's fun as well. I love to do PDEs on stats problems.
>>
>>16222603
>Statistics is fun
LOL seriously? You like anal (receiving)?
>PDE is fun
Hell yeah it is
>>
>>16222679
classic shitpost. Now go to another thread for retards.
>>
so when are you fags going to prove the theory of probability?
>>
>>16223279
lol lmao even
>>
>>16187543
They do, IF they're also computational mathematicians. The stats universities that are actually trying to push forward new or novel techniques use C++ and then make interfaces with R (because they know the applied community all uses R).

Take the INLA project as an example. And that's just something actively in development.
>>
Why is p-hacking bad? Isn't it literally just what happens as you collect more data regardless of the problem?

From a frequentist standpoint, your interval widths and p-values go to zero as more data is collected (whenever the true effect is not exactly zero), simply because we are working from the interpretation of constant coefficients in our models. Statistical significance is great and all, but it's not a measure of importance or impact, just 'hey this interval doesn't overlap with hypothesis X or other coefficient Y'.

I don't really understand the p-hacking problem whatsoever basically. Especially when combined with any sort of validation techniques or with any follow-on operational type question (a statistically significant difference doesn't mean an impactful difference: with enough data $1 is very statistically significantly different from $1.01, but that doesn't actually matter in the majority of contexts).
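
The $1 vs $1.01 point can be sketched with a two-sample z-test. Everything here is hypothetical: a known common standard deviation of $0.50, equal group sizes, and observed means plugged in as exactly 1.00 and 1.01.

```python
import math


def two_sample_z_p(mean_a, mean_b, sd, n_per_group):
    """Two-sided p-value for a difference in means with known common sd and equal group sizes."""
    std_err = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_a - mean_b) / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


sd = 0.50  # hypothetical spread in dollars
p_small = two_sample_z_p(1.00, 1.01, sd, n_per_group=100)
p_large = two_sample_z_p(1.00, 1.01, sd, n_per_group=1_000_000)
print(p_small, p_large)
```

Same one-cent difference, wildly different p-values: the sample size moves the significance, not the size of the effect.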
>>
>>16223446
From my understanding, the problem with p-hacking is that you are collecting a biased sample set. It isn't just that you are collecting more data, it is that you are collecting more data under a specific subset which is more likely to show significance (e.g., tailed or skewed data leaning towards the extreme cases of the alternative).

It's a case of biased sample selection (or potentially pruning of negative outliers which would make your test statistics more centrally located).
>>
>>16223279
It's more of a question of how long before the theory can be proven with 100 percent accuracy. Any day now I'm sure..
>>
>>16223279
cope from brainlet
>>
>>16223446
P-hacking implies that you already have decided beforehand what the end result is instead of accepting the data as it is
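
One concrete version of this is optional stopping: keep adding data and stop the moment p < 0.05. That is only one flavour of p-hacking and is an assumption about what is meant here. The simulation below is a sketch with made-up settings (20 looks, 200 observations, 2000 repeats), and the null is true in every run, so every "significant" result is a false positive.

```python
import math
import random


def z_test_p(sample):
    """Two-sided p-value for H0: mean = 0, assuming known unit variance."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(peek, rng, trials=2000, max_n=200, step=10, alpha=0.05):
    """Share of null experiments declared significant, with or without peeking."""
    hits = 0
    for _ in range(trials):
        sample = []
        significant = False
        while len(sample) < max_n:
            sample.extend(rng.gauss(0.0, 1.0) for _ in range(step))
            if peek and z_test_p(sample) < alpha:
                significant = True  # stop as soon as the test looks good
                break
        if not peek:
            significant = z_test_p(sample) < alpha  # single test at the planned sample size
        hits += significant
    return hits / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
honest_rate = false_positive_rate(peek=False, rng=rng)
peeking_rate = false_positive_rate(peek=True, rng=rng)
print(honest_rate, peeking_rate)
```

The single planned test should land near the nominal 5%, while the peeking version rejects far more often, even though nothing changed except when the analyst decided to stop.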
>>
>>16224049
two more weeks right?
>>
Do any unis teach a completely unbiased course on race statistics?
>>
>>16226420
No. The same way that there are no colleges that teach entirely unbiased courses on any other highly controversial subject where there's still open research questions.
>>
>>16226420
lol god no. If you want to learn the real stuff, you have to learn it yourself. Start with the bell curve. Maybe the closest would be some analysis course on applied criminology at Quantico where they teach how the world works to federales.
>>
>>16226420
There's one prestigious uni called /pol/, you can complete a whole degree on racial statistics there
>>
>>16228727
kek
>>
are random variables a group under convolution?
>>
>>16230012
Define random
>>
your vanity thread is on page 10 again, better bump it quick
>>
>>16230928
lmao
>>
>>16230927
a function from the sample space to a subset of the reals (or real space)
>>
File: 0003.png (38 KB, 618x559)
I LOVE <3 non parametric stats <3
>>
>>16232093
why?
>>
>>16233286
Fuck normal distributions
Fuck means
Fuck SD
>>
My PI forces me to use Matlab for all the analyses and statistics. It's surprisingly comfy but disgusting at the same time.
>>
>>16234955
You work in some kind of weird finance department?
>>
>>16235418
He probably works for the based department. Matlab is based as fuck. T. Statistical signal processing engineer.
>>
>>16235418
Applied physics
>>
>>16236011
Continue using it. Since you are in the field that actually uses it as a standard.
>>16235478
You my dear sir, are an idiot.
>>
>>16236346
I may be a retard but I'm a based retard who uses a software environment that easily handles constrained optimization of nonlinear objective functions.
>>
>>16187525
good thread OP
>>
Redpill me on gamma distributions
>>
File: misspelling.png (122 KB, 750x1050)
Was over in another board and got suggested to post here.

Problem:
I'm doing data analysis for a refrigeration-based dehumidification product for a company. Sometimes it goes through QC no problem. Sometimes it has a lot of issues. I want to find out why.

What I've done so far:
I've been able to collate the following data (*):
1-Testing chart data for each product
2-Order form data for each product
3-BOM data for each product
(4-I'm working on getting job routing data for each product atm, as someone else in the other thread suggested to me).
Using 1, I can look at the number of failed charts to get a list of 'good' and 'bad' products.
Using 2, I can filter the previous list to only look at the dehum products.
Once I do this, I have a sample size of maybe 500 (the company is not high-volume, they make niche, custom products).
I've run the following statistical tests:
-Script to do brute force ANOVAs of components in BOM v. good/bad end-products. This only identified outlier products' materials. For example, it suggested things like, "The shipping crate used in the outlier is suspect." In general, I got a lot of "Pirates cause global warming" noise.
-Because of the previous results, I made all the data binary (good=1,bad=0,part in BOM=1,part not in BOM=0) and did Fisher p-testing. This only identified 'obvious' parts. Things like, "Yes, all compressors would be suspect, of fucking course, that's how refrigeration works." It didn't narrow anything down.
-I tried running correlations on some relevant variables (e.g., amount of refrigerant in product v. failed test numbers), and I just get noise.
There's a chance I missed something in these two previous tests, because there was a lot of noise to go through.
-Because of the small sample size (500), I feel I'm limited to single-variable analyses.

Can anyone think of anything else I should try?

(*) An aside vent: just getting this data collated, accessible, and cross-referenced was a PIA.
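
For reference, the binary part-in-BOM vs. good/bad test described above can be sketched as below. The counts are invented, `fisher_exact_two_sided` is an illustrative stdlib-only implementation rather than the script actually used, and the Bonferroni threshold assumes a made-up figure of 200 parts screened.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= observed * (1.0 + 1e-9))


# Hypothetical counts for one part across 500 units:
# rows = part in BOM / part not in BOM, columns = bad / good
p_part = fisher_exact_two_sided(30, 70, 40, 360)

n_parts_screened = 200  # hypothetical number of parts tested
bonferroni_threshold = 0.05 / n_parts_screened
print(p_part, bonferroni_threshold)
```

Comparing each part's p-value to the corrected threshold rather than to 0.05 is one simple way to cut down the noise that comes from testing hundreds of parts at once.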
>>
File: 047072210X.jpg (35 KB, 300x469)
>>16237819
At the end of picrel they go into something similar for VW.
>>
Do any projects graph how much the human genome has changed by year?
>>
>>16237819
You should be looking at processes not data. Just 6M: machine, man, materials, measurements, methods, and mother nature. Process failure must exist in one of these categories.
As a data analysis guy, just pareto it and list which problems are the worst and have them explore those.
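
A minimal sketch of the Pareto step, assuming each failed test has been tagged with a single cause. The category names and counts are made up, and `pareto_table` is an illustrative helper.

```python
from collections import Counter


def pareto_table(failure_causes):
    """Return (cause, count, cumulative_share) rows sorted by descending count."""
    counts = Counter(failure_causes).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for cause, count in counts:
        running += count
        rows.append((cause, count, running / total))
    return rows


# Hypothetical failure log, one entry per failed test
failures = ["leak"] * 12 + ["sensor"] * 5 + ["wiring"] * 2 + ["charge"] * 1
table = pareto_table(failures)
for cause, count, cumulative in table:
    print(f"{cause:8s} {count:3d} {cumulative:6.1%}")
```

The top one or two rows are where the process people should start digging.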
>>
>>16190158
Thanks for the book.
>>
>>16198089
The absolute value of the t-statistic is larger (preferably much larger) than 2, and the p-values are close to zero.
>>
Where can I, a noob, just ok in maths, start learning about stats?
>>
>>16241236
Download a textbook with open datasets that you can easily get on the publisher's website. Start going through the problems one by one until you dun goofed the entire book. Easy peasy lemon squeezy.
>>
>>16241615
which books have these open datasets?
>>
>>16242340
Not exactly a straightforward stats book, but Probabilistic Machine Learning by Kevin Murphy is free, has figures and python code on his GitHub and does have some statistics coverage. Introduction to Statistical Learning also has some code and data available.
>>
What's the point of the characteristic function again? They dont add any insights to the study of a probability distribution, unlike the mgf. So why does it even exist?
>>
>>16243470
Nice, thanks
>>
>>16244215
There's a few uses for characteristic functions, especially for sampling distributions and frequency analysis for continuous time Markov chains.

In general though, an MGF is more useful if it's available, however not every probability density function has an MGF (while every probability density has a well defined characteristic function).
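
The standard Cauchy distribution is the textbook case: E[exp(tX)] is infinite for every nonzero t, so there is no MGF, yet the characteristic function is exp(-|t|). A rough Monte Carlo check, as a sketch with an arbitrary seed and sample size:

```python
import math
import random


def empirical_cf_real(samples, t):
    """Real part of the empirical characteristic function: the mean of cos(t * x)."""
    return sum(math.cos(t * x) for x in samples) / len(samples)


rng = random.Random(1)  # fixed seed so the run is repeatable
# Standard Cauchy draws via the inverse CDF, tan(pi * (u - 1/2))
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(50_000)]

t = 1.0
estimate = empirical_cf_real(cauchy, t)
exact = math.exp(-abs(t))
print(estimate, exact)
```

The average of cos(tX) is always bounded, which is why the characteristic function exists even when the tails are too heavy for an MGF.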
>>
>>16234287
Baste
>>
>>16244215
>They dont add any insights to the study of a probability distribution
read harder
>>
>>16236455
>easily handles constrained optimization of nonlinear objective functions
you can always code your own in C++, fag. it's not that hard.
>>
>>16187525
we were taught R at university but now I mostly use Python.
>>
>>16245643
> You should reinvent the wheel using older tools because I don't like you using better tools that others have made.

MATLAB is literally a professionally maintained system designed to be effective at solving these optimization problems. I could implement everything from scratch in assembly too, but it would be stupid to do so when others have spent their life's work building tools to do it for me.
>>
>>16245662
>MATLAB is literally a professionally maintained system
but then you're stuck with Matlab, faggot. it's a horrible language.
>>
>>16245690
> But then you're stuck with MATLAB, the industry standard for solving the exact problems MATLAB excels in.

You might as well say that researchers who study Neural Networks architectures are "stuck with Python."

You don't have to like MATLAB. It's not perfect and it's expensive, but it's not an accident that it's the industry standard in many fields of physics, engineering and optimization. There's nothing that MATLAB does that you couldn't do in some other general purpose language, but you'd likely have to make from scratch tools that MATLAB already handles natively in C.
>>
File: yasu.png (485 KB, 712x697)
Matlab = SHIT TIER
Python = MEH TIER
R = GOD TIER

prove me wrong faggits
>>
>>16245856
All three of them are good choices for a general purpose statistics/data analysis language with each having certain things they excel at.

R is fantastic if you are working on theoretical statistics or looking to pull from the (many many) open data libraries from the natural sciences. A lot of the cutting edge of mathematical statistics work gets done in R and that's not an accident.

Python is flexible beyond either of the other two and provides unparalleled support for machine learning/adaptive statistics. If you are doing anything at all involving Neural Networks, decision trees or HMM's Python offers quite a lot to you.

MATLAB is the absolute king of matrix based scientific computing. It's literally what the name stands for, "matrix laboratory." If you are doing work that involves a lot of linear algebra (e.g., non-linear programming based statistics, Bayesian optimization or Kalman filtering/target tracking, adaptive linear filtering or stochastic control, etc.) you basically can't beat what MATLAB has to offer. Python is finally starting to see some decent target tracking support with the work being done by the developers of the Stone Soup library, but if you work in anything at all with radar/sonar/lidar/gps etc. you basically can't avoid Matlab.

Honorable mentions go to Julia for their efforts into scientific computing and emphasis on parallelization. Julia is also a great option to learn (but it's still pretty new so don't be surprised if it's not as well supported as the others).
>>
>>16245882
I was shitpoasting, but I do appreciate your god tier poasts on radar, sonar and applied stats. So when I am shitting on matlab, I am not shitting on you. So that is clear. I am shitting on the universities who are cheap fucks and cannot re-tool their shit to make their students better suited for the market place.
>>
>>16245882
Julia seems like fun, but very niche.
>>
File: F.png (5 KB, 256x256)
>>16245882
>MATLAB is the absolute king of matrix based scientific computing
*ahem*
>>
>>16245967
Do people actually still use Fortran? I know a lot of the old gods of the field still reach to Fortran, but I've never met anyone under 70 who uses it on a regular basis.
>>
>>16245982
They do, you can actually get pretty spicy jobs if you have 10 years plus exp with Fortran.
>>
File: Fortran.png (10 KB, 566x149)
>>16246016
500+ jobs with Fortran? what the fuckkk?
>>
>>16246016
That's good to know! The only thing in my world that still is actively maintained in Fortran is the official OA Labs engine for Bellhop/Kraken for underwater acoustic ray tracing. It's neat to hear that people are still actively using Fortran for real development in the year of our Lord 2024. Makes me feel less old.
>>
>>16245967
I've wanted to learn Fortran for a while but never bothered
>>
>>16190893
Hot hand fallacy
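
Easy enough to check by simulation. This is a sketch that assumes independent fair red/black spins and ignores the green pocket; the seed and number of spins are arbitrary. It counts every point where the last 10 or more spins came up the same colour and asks how often the next spin matches.

```python
import random


def streak_continuation_rate(n_spins, streak_len, rng):
    """Share of runs of at least `streak_len` equal outcomes that are followed by the same outcome."""
    run, previous = 0, None
    streaks = continued = 0
    for _ in range(n_spins):
        outcome = rng.random() < 0.5  # independent fair spin, True = red
        if run >= streak_len:
            streaks += 1
            continued += outcome == previous
        run = run + 1 if outcome == previous else 1
        previous = outcome
    return continued / streaks


rate = streak_continuation_rate(2_000_000, 10, random.Random(42))
print(rate)
```

With independent spins the continuation rate hovers around one half; the streaks themselves are just what long random sequences look like.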
>>
>>16245982
About 70% of all HPC code is Fortran. It's absolutely entrenched and it will never change. And this 70% figure comes after decades of people attempting to force a change to C/C++ as the standard. We've also got CUDA Fortran now.
I'm in my 20s and picked up Fortran and I actually enjoy using it because of how simple and clear it is. Very easy to learn, and modern Fortran is not the abomination it once was with GOTO statements everywhere. It's also unbeatable when it comes to parallel computing.
>>16246016
I got a temporary job in my old department as an undergrad entirely because I was the only one who bothered to learn Fortran. They had an old codebase that needed to be looked at and for some reason nobody else wanted to work in Fortran because people were convinced it was obsolete, therefore no takers for the position, but it turns out stuff shouldn't be ignored just because it's old.
The hard part about Fortran is actually that it's normally written for very specialised purposes, so the trick is a lot of code you're going to read is likely going to require a relatively large amount of additional knowledge to understand properly. A lot of the time, a boomer will have written a numerical solver and not bothered to explain why an equation is there, or what it's doing. If they also spam GOTO a lot, then good luck.
>>
>>16246019
where? there are 4 in my entire shithole country
>>
>>16192812
>PhD student
It will be hard. I earned my PhD in 2020, now run a data science/ml/stats team (it's an interesting hybrid team internal to a big company), and all of our consultants tend to have PhDs and experience. The sort of "natural" lead in to being a consultant is to work in the space for a while, get to know all kinds of people while working with clients, and eventually just starting your own consultancy agency with the known contacts as your primary customers. It's very relationship driven. Without known contacts and without a PhD and experience, it will be difficult, but I guess not impossible; you'll just have to select smaller jobs/smaller companies and undercharge.
> it's harder to find internship/studentship nowaday in FAANG
Biggest tip to CS peeps: Fuck FAANG, it's the worst option. There are about 10,000 new startups, especially in biotech, who need CS people. They tend to have a harder time finding people because they aren't very well advertised. While my peers were doing 5 rounds of interviews at FAANG and not getting internships, I found a super local biotech which had 0 SEO by googling the area, and messaged them. They essentially hired me right away as an intern, and then hired me for real about 2 months later. It was a startup with 5 people and they just really sucked at advertising; googling their name, they didn't even come up. It sounds "and then everyone clapped", but I also found my second job the same way.
>>
>>16245856
R is god tier, I love it.
Python gets a bad rap but honestly the number of mature libraries makes it my preferred tool. I've only ever needed to write a couple of functions in Rust for speedup, but Python is generally a glorified C wrapper, so it's plenty fast.
I generally do all of my processing in Python and then export to R for fancy stats and for plotting (ggplot is still absolute god-tier for plotting, fuck matplotlib although seaborn is okay).
>>16245882
>MATLAB
absolutely fuck matlab, I used it for my whole PhD. It has way too many data types, any and all useful functional toolboxes become obsolete after about a year because they actively change EVERY useful function (removing them, merging them, changing them completely) and have no concept of stable, reusable code, and the stupid ass twice-a-year "a" and "b" release cycle is just nonsense. No one can use your code unless they buy matlab (or you use their executable export BS but that's a mess).
I dislike everything about matlab. I used it for some of the things you say it's good for (kalman filters for noisy object tracking), and it was great at the time for image processing, but I would rather implement kalman filters from scratch than use their implementation which I KNOW will change and break my code in 2 years.
I tried to run an app process I wrote in R2018b on R2020a, and it didn't work because half the functions no longer existed. Not that I could check now because I refuse to pay for it.
Fuck MATLAB.
>>
>>16247462
>especially in biotech
everyone hates bio for a reason. those are the worst companies to work in. low pay, low equity, toxic morons ordering you around. they don't know much about CS so they sometimes ask for outrageous shit that only companies like Google barely have the capabilities to execute.
also, most biotechs go bankrupt because they fail FDA approval, or they're just some scam scheme to siphon money from investors anyway, so expect your equity to have a 90% chance of being worthless.
>>
>>16247482
Wtf are you on about. Matlab is extremely backwards compatible. And if they change anything, they give you deprecation warnings.

Python is the one that breaks shit constantly.
>>
>>16247502
>Matlab is extremely backwards compatible
Matlab specifically keeps every release as a separate entity because they make changes to their toolboxes constantly. I don't know what to tell you other than, using their image toolbox from ~2016-2020, half of the functions were merged or removed. My code literally doesn't work between versions because they changed the toolbox so much. There's not much I can say other than that.
Base matlab may be more stable, but it then just becomes a neutered language if you decide to ignore the toolboxes.
>>16247502
>Python is the one that breaks shit constantly
I don't find this to be the case, but maybe it's because everyone uses virtual environments to self-contain projects and versions. For free. Without downloading a whole separate multi-gigabyte "version" of the language.
>>
>>16247547
>Without downloading a whole separate multi-gigabyte "version" of the language.
Yeah, just download and maintain 10 versions of python and 20 versions of every python package on your computer
>>
>>16247564
Isn't python pretty backwards compatible? At least within a major version, like 2.x or 3.x.
>>
>>16248534
Python itself is ok, but the packages break compatibility with every minor update
>>
>>16248998
>packages break compatibility with every minor update
that's the problem with the packages, not python tho. even tho I think python authorities should enforce some kind of standard on backward compatibility of the 3rd party packages. the worst I've seen yet is when a package is no longer maintained, its older compiled binaries cease to exist on some corpo servers, and your environment installation no longer works or you have to compile the binaries from source, which can take a day just because random crap breaks.
>>
>>16192812
Send emails to every local business. Eventually someone will respond positively
>>
>>16187525
I came up with an interesting replacement for t-tests recently, and I want to share it. Basically, the exact way to get the p-value is to get the number of permutations where the difference in means is greater than or equal to the difference seen in the experiment, and then divide that by the total number of permutations. This is called a permutation test, but it's usually too expensive to compute, so people use t-tests as an approximation. What I've realized is that since computers are so powerful nowadays, you can just approximate the permutation test with monte carlo simulations, which avoids the headache of checking if your data meets the assumptions of a t-test.
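A minimal sketch of what that looks like in Python, assuming a two-sided test on the difference in means; the function name, the default of 10,000 resamples and the +1 correction are my own choices, not anything standard to this thread. Note the shuffling still assumes the two groups are exchangeable under the null.

```python
import numpy as np

def permutation_test(x, y, n_resamples=10_000, seed=0):
    """Monte Carlo approximation of a two-sided permutation test
    for a difference in means between samples x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = abs(x.mean() - y.mean())

    count = 0
    for _ in range(n_resamples):
        # Randomly reassign group labels by shuffling the pooled data.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:x.size].mean() - shuffled[x.size:].mean())
        if diff >= observed:
            count += 1

    # The +1 counts the observed labelling itself, so p is never exactly 0.
    return (count + 1) / (n_resamples + 1)

# Made-up numbers, purely for illustration.
a = [5.1, 4.9, 6.2, 5.8, 6.0]
b = [4.2, 4.8, 4.1, 5.0, 4.4]
print(permutation_test(a, b))
```

If you'd rather not roll your own, recent SciPy versions ship `scipy.stats.permutation_test`, which does the same kind of resampling.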
>>16197968
Been trying to, but I've gotten lazy recently. Going to get back into it because of this comment.
>>
>>16250290
> What I've realized is that since computers are so powerful nowadays, you can just approximate the permutation test with monte carlo simulations, which avoids the headache of checking if your data meets the assumptions of a t-test.

Combinatorial explosion is going to fuck you up good my friend. Assignment algorithms are great to demonstrate exactly why you can't just wave your hands and say "powerful computers will fix it all."

Let's say you have a fancy global optimization based parking assignment algorithm and you have a (fairly small) parking lot of 100 spots and you want to prove that your algorithm is better than random assignment no matter what the starting layout is. There's 2^100 possible permutations, but with Monte Carlo sampling you could probably reduce your permutation test burden to 2^80 or so trials needed to reject the null.

Let's say you have a really powerful computer that can do 10,000 of these assignments per second, (which is actually very optimistic for a potentially 100 x 100 integer programming problem).

These Monte Carlo trials would take you a speedy 3.8E12 years to complete. Quite quick actually!

Let's say now you've got 100,000 of these computers arranged in some sort of sci-fi super cluster (and magically have instantaneous synchronization and no potential for accidentally repeated permutations). This would reduce your time to complete these trials down to a much more manageable 38 million years.
Now if you had a million of these 100,000 computer super clusters with perfect parallelization/synchronization and no data management issues, you could validate your algorithm in 38 years of constant Monte Carlo trials! You might need 10 nuclear powerplants solely dedicated to supporting your computing power to test your one little parking assignment algorithm, but you could do it!
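The arithmetic above can be checked in a few lines. This is just a sanity check of the numbers quoted in this post (2^80 trials, 10,000 assignments per second per machine); the function name is made up.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

TRIALS = 2 ** 80           # trial count claimed above
RATE_PER_MACHINE = 10_000  # assignments per second, as assumed above

def years_needed(machines):
    """Wall-clock years to run all trials with perfect parallelization."""
    return TRIALS / (RATE_PER_MACHINE * machines) / SECONDS_PER_YEAR

for machines in (1, 100_000, 100_000 * 1_000_000):
    print(f"{machines:>15,} machines: {years_needed(machines):.2e} years")
```

That comes out to roughly 3.8E12 years, 38 million years and 38 years respectively.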

I think dealing with the De Moivre Laplace approximation is a better choice in most of these kinds of circumstances.
>>
>>16245967
Surprisingly easy syntax. A lot like BASIC back in the day
>>
>>16250431
Why are you holding the permutation test to a higher standard than the original t test? If 0.001 of sampled permutations have a higher test statistic, the p value is 0.001. Sure there will be billions of permutations with a higher test statistic but there's no need to get all of them.
Why do you have to know a tiny p value exactly?
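To put a number on how inexact a sampled p value is: the count of random permutations that beat the observed statistic is binomial, so the standard error of the estimate is about sqrt(p(1-p)/B) for B random permutations. A rough sketch, with a made-up function name:

```python
import math

def mc_pvalue_standard_error(p_hat, n_resamples):
    """Binomial standard error of a Monte Carlo p-value estimate."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_resamples)

for n in (10_000, 100_000, 1_000_000):
    se = mc_pvalue_standard_error(0.001, n)
    print(f"B={n:>9,}: p ~ 0.001 +/- {se:.1e}")
```

So the precision depends on how many permutations you sample, not on how many exist.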
>>
>>16251871
Right so since you haven’t and can’t sample all of them, you’re back to having to test the ones you did sample for statistical significance
>>
bump
>>
I know this might be retarded but are there any cutting edge research topics at the intersection of convex optimization and statistics?
>>
File: Fairy Skills.png (164 KB, 1273x543)
Prefacing this by stating I'm not very good with stats. I have a stats question based on a video game I play and how certain skills are randomly learned.
When this character levels up enough to learn a skill, it can learn one of seven skills, and each skill has varying tiers of that skill that can be learned. The game states skills are attempted to be learned in a specific order, rather than all at once, see pic related - the first skill is attempted to be learned at a 1% chance of success, and if that fails then the next tier is attempted at a 2% chance of success, and so on down the columns then across the rows.
This means the actual learning chance isn't simply the chance of each skill, right?
If there were 100 skills each at a 1% chance, then the resulting learned skills as more and more are learned would start looking like a normal distribution centered around the midpoint of the list (I think?). However the skill chance is not constant, so there would be some bias to the distribution but I can't figure out how to combine the two.
To complicate matters further, if the character successfully learns any tier of one skill, the rest of the tiers of that skill are then unable to be learned, so the list is shortened.
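Here is a rough sketch of how I think the sequential rolls combine, assuming each attempt is an independent roll and the first success ends the level-up. The skill names and percentages below are placeholders, not the values from the pic. The chance of ending on a given entry is its own chance times the chance that every earlier entry failed.

```python
def learn_probabilities(attempts):
    """attempts: list of (skill, tier, chance), tried in order.
    Returns ({(skill, tier): probability}, probability of learning nothing)."""
    probs = {}
    p_reach = 1.0  # probability that every earlier attempt failed
    for skill, tier, chance in attempts:
        probs[(skill, tier)] = p_reach * chance
        p_reach *= 1.0 - chance
    return probs, p_reach

def drop_skill(attempts, learned_skill):
    """Remove all tiers of a skill once any tier of it has been learned."""
    return [a for a in attempts if a[0] != learned_skill]

# Placeholder table, NOT the real game values.
attempts = [
    ("Skill A", 1, 0.01), ("Skill A", 2, 0.02), ("Skill A", 3, 0.03),
    ("Skill B", 1, 0.01), ("Skill B", 2, 0.02),
]
probs, p_nothing = learn_probabilities(attempts)
for key, p in probs.items():
    print(key, round(p, 5))
print("nothing learned:", round(p_nothing, 5))

# After learning any tier of Skill A, the next level-up uses the shorter list.
print(learn_probabilities(drop_skill(attempts, "Skill A")))
```

Under that reading, 100 entries at a constant 1% would give entry k a chance of 0.01 x 0.99^(k-1), which shrinks steadily down the list, so the first entry would be the most likely single outcome rather than the midpoint.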
>>
>>16253272
Yes, a lot actually.
>>
>>16253272
Yes, non-linear programming approaches to statistical estimation problems are very powerful. In particular, you'll see cone-tangent and Fenchel duality approaches to constrained NLS solutions used in all sorts of problematic statistics problems in physics and engineering (e.g., inverse parameter problems for things like distance or directional cosine based direction).
>>
Bayesian probability theory made me lose faith in humankind. Also the goat gameshow thing with 1000 doors and the host opens 998 other doors.
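For anyone who doesn't buy the 1000-door version, a quick simulation sketch, assuming the usual rules (the host knows where the prize is and never opens that door); the function name and defaults are made up.

```python
import random

def switch_win_rate(n_doors=1000, n_games=100_000, seed=0):
    """Fraction of games won by always switching to the one door
    the host leaves closed."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        prize = rng.randrange(n_doors)
        pick = rng.randrange(n_doors)
        if pick == prize:
            # Host may leave any losing door closed; choose one uniformly.
            other = rng.randrange(n_doors - 1)
            if other >= pick:
                other += 1
        else:
            # Host cannot open the prize door, so it is the one left closed.
            other = prize
        wins += (other == prize)
    return wins / n_games

print(switch_win_rate())
```

Switching only loses when the first pick was already right, which happens 1 time in 1000.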
>>
>>16254025
What about Bayesianism has made you lose faith? Is it the interpretation of probability as a "belief" or "uncertainty" or is it something more about the mathematical approach to Bayesianism?
>>
wtf are you guys “programming” ?
>>
>>16253981
>cone-tangent and Fenchel duality approaches
lol. I am unironically at this part in a 140-page paper I'm reading.
>>
File: 1708000923106363m.jpg (75 KB, 691x1024)
>>16187525
suuuup?
>>
>>16254085
Fenchel conjugates are also super important for large deviations theory, which forms the basis for near-optimal fixed sample size hypothesis testing when your elementwise test statistic is not necessarily a log likelihood ratio. Convex analysis and information theory can both be made very useful to statistics if you feel like learning some math.
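The textbook instance of that link is Cramér's theorem, stated here loosely and only for the i.i.d. case. With cumulant generating function [math]\Lambda(\theta) = \log E[e^{\theta X_1}][/math], the rate function is its Fenchel conjugate:

[eqn]\Lambda^*(x) = \sup_{\theta \in \mathbb{R}} \left( \theta x - \Lambda(\theta) \right)[/eqn]

and for [math]x > E[X_1][/math] the tail of the sample mean decays roughly like

[eqn]P\left( \frac{1}{n} \sum_{i=1}^{n} X_i \ge x \right) \approx e^{-n \Lambda^*(x)}[/eqn]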


