/sci/ - Science & Math


File: OU.png (15 KB, 368x248)
previous thread >>16174616

This is one of the board's newest generals. Fairly high activity due to edge lords trying to be funny but instead spreading facts about the absolute state of our world.

Intro stats is fairly easy; with intermediate stats comes the programming, and we have already had several battles in the thread about what language is the best. Nobody uses SAS, funnily enough; SPSS has had some people trying to joust the edgelords who are into R and C++, while the Stata children are silent as usual.

Come one, come all. State your dumb questions, /pol/tardy or not. Some fairly useful and funny math is showcased in this thread.
>>
Stats people use C++? Since when?
>>
>>16187525
the previous thread 404'd after a small handful of replies because it wasn't launched with a racial crime statistics pic; this thread condemned itself to a similar fate.
lrn2/psg/ fagit
>>
>>16188514
Is that so?
>>
>>16189863
If you're that desperate for attention then you should have stuck to the thread guidelines outlined in >>16188514

>>16188592
Yes, as demonstrated by the fact that you had to bump your garbage thread that nobody wants to see off the ass end of page 10.
>>
>>16187525

Any anons working on information geometry?
>>
>>16190158
It would be interesting to hear a bit about the applications of this.
>>
Can anyone redpill me on Poisson statistics?
>>
>>16187525
Biggest lie in all of statistics is that events are independent. They're not. In casinos it's common to see streaks of 10 in roulette. If the next event is independent you'd expect 50% of the times 10 streaks are observed to continue to 11. That's not what happens.
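The independence baseline in that claim can be checked by simulation. A minimal sketch, assuming a simplified fair 50/50 red/black wheel with no green zero (a real wheel gives each colour 18/37 or 18/38); the function name and the spin count are made up for illustration:

```python
import random

def streak_continuation_rate(n_spins=1_000_000, streak_len=10, seed=0):
    """Fraction of same-colour streaks of at least `streak_len` that extend by one spin."""
    rng = random.Random(seed)
    prev = rng.random() < 0.5
    run = 1          # length of the current same-colour run ending at the previous spin
    seen = 0         # times a run of at least `streak_len` was on the table
    continued = 0    # times the next spin matched it
    for _ in range(n_spins - 1):
        cur = rng.random() < 0.5
        if run >= streak_len:
            seen += 1
            if cur == prev:
                continued += 1
        run = run + 1 if cur == prev else 1
        prev = cur
    return continued / seen, seen

rate, seen = streak_continuation_rate()
print(f"{seen} streaks of 10+ observed, {rate:.3f} continued")
```

With independent spins the continuation rate should hover around one half, so seeing streaks of 10 is not by itself evidence of dependence.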
>>
I’m doing a course on ODE and the Laplace transform is absolutely not motivated lol. Digging through wiki it appears Laplace used a similar method in working with probabilities. Anyone have more info? Any decent introductory books on probabilities, especially ones that motivate Laplace transform?
>>
Bumpety
>>
>>16190985
I'd say they have a definite place in general stochastic processes. No experience myself, but just thinking aloud.
>>
>>16187525
What would be a good, complete beginner, book for probability and statistics and one good follow-up book? Preferably with depth, just not so much as to be overwhelming for a beginner.

>>16190158
That sounds fascinating. What is it?
>>
>>16187525
What's the difference between gamma and inverse gaussian distributions? I'm doing generalized linear mixed-effects modeling
>>
>>16192404
I think 'Statistical Inference' by Casella and Berger is a good follow-up. You can find the text and solution manual on libgen (beware that the solution manual actually has mistakes in it).

To get the most out of the text, I think being solid in calculus would help. The examples/problems can be quite rigorous, so it may not be the best starting point.
>>
how do I find a consulting job to make extra money as a PhD student? I am doing CS/ML. it's harder to find an internship/studentship nowadays in FAANG btw.
>>
>>16192812
fuck, wrong thread
>>
>>16192404
First do one variable and then multivariable calculus. Then do a beginning course in probability and stats. Then applied stats. Then a more foundational theoretical course in stats.
>>
>>16192812
Honestly a lot of statisticians are consultants as well. I consult sometimes. Get yourself the simplest LLC you can get in your jurisdiction, do some projects that look pretty and put them up on some wordpress shit. Then start cold calling companies within your domain knowledge sphere. My domain knowledge is within accounting and economics, so I consult within those spheres and neither accountants nor economists are very good at hardcore stats.
>>
Anyone got data on heights of men in America by age, race, region, etc., spanning multiple decades?
>>
>>16193179
Check https://datausa.io/
>>
Any Gibbs sampling chads here?
Given winbugs/openbugs is dead and ancient what's the best route to go down, JAGS or STAN?
>>
>>16193762
STAN is nice to use but occasionally a pain in the ass to install because of dependencies
>>
>>16193179
https://www.healthdata.org/research-analysis/health-by-location/united-states/county-profiles
>>
>>16192880
Good advice, thanks
>>
>>16193913
What domain knowledge do you have?
>>
>>16192880
Is this a side hustle or your primary income?
>>
>>16194101
side but scalable.
>>
>>16193762
I will second what >>16193763 said: STAN can be a bastard to install. However, it also has a very active community (https://discourse.mc-stan.org/), so chances are good you can get help/find info if you get errors. If you're using R, there's also a couple interfaces out there that might make STAN a bit easier to use.
>>
>>16190985
The Laplace transform of a density function gives you the moment generating function of the random variable. The MGF is very important to certain aspects of statistical signal processing and detection theory (especially large deviations theory and sequential hypothesis testing).
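As a worked check of that relationship, take an exponential density with rate λ (chosen here only as an illustration). With the one-sided Laplace transform convention, the MGF is the transform evaluated at -t:

```latex
\mathcal{L}\{f\}(s) = \int_0^\infty e^{-sx}\, \lambda e^{-\lambda x}\, dx = \frac{\lambda}{\lambda + s},
\qquad
M_X(t) = \mathbb{E}\!\left[e^{tX}\right] = \mathcal{L}\{f\}(-t) = \frac{\lambda}{\lambda - t}, \quad t < \lambda .
```

Differentiating at t = 0 recovers the moments, e.g. M_X'(0) = 1/λ = E[X].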

Probability, Random Variables and Stochastic Processes by Papoulis is the standard engineering oriented probability book used at either the upper undergrad or beginning of grad school. Has a decent amount of coverage of the relevance of Fourier and Laplace transforms to probability theory.

Another book that's perhaps less introductory and that deals directly with the relevance of the PSD (so Fourier vs Laplace) is Bremaud's Fourier Analysis and Stochastic Processes. That one requires a good bit more analysis background to really work with though.
>>
I'm a bio student and I would like to master statistics. I have taken some intro statistics for biologists, but it was just a couple of lectures about the normal distribution and doing a t-test.

I would like to develop a solid background in statistics, from basics to more advanced topics. What books or online courses do you recommend?
>>
>>16194520
Depends on how lost in the sauce you want to get and also on your math background.

There are basically four "standard texts" in increasing level of difficulty that people recommend for either upper level undergrad or first year grad students that aren't doing measure theoretic probability:
1) Probability and Statistical Inference by Hogg and Tanis
2) All of Statistics by Wasserman
3) Probability and Statistical Inference by Mukhopadhyay
4) Statistical Inference by Casella and Berger
>>
>>16194554
>All of Statistics
can vouch for this. it was a good read. really helped me thru my PhD.
>>
>>16194562
I'd say of the 4, All of Statistics and Casella and Berger were the most helpful for me. I'm not a statistician though, I'm an engineer. Can't comment on their usefulness for actual stats grad students.
>>
>>16194568
I've only read number 2 out of the 4 that were listed. needed it cause I was preparing for an ML interview. got recommended by a friend.
>>
>>16187543
Never
>>
>>16194569
Oh, I definitely don't recommend reading all 4 of them. They cover basically the same material but at different levels of depth and slightly different emphasis.

At the point that you've gone through one of them, you probably have enough background that you can just jump right into whatever specific statistics topic you actually want to study directly.
>>
>>16192808
Thanks.
>>16192878
Not what I asked for, but thanks for trying.
>>
>>16193924
Industrial engineering
>>
>>16194554
I want to get balls deep
>>
Bump
>>
File: 764923467892349238.jpg (37 KB, 483x470)
Who here is reading a stats book, any stats book, daily?
>>
File: 978-0-387-21718-5.jpg (83 KB, 827x1241)
>>16197214
The deepest you can go is measure theoretic/analysis based statistics. This will give you a lot of ability to tie in tools from more advanced mathematics if you are careful.

Mathematical Statistics by Jun Shao is a pretty good starting point for this, but it assumes you are already fairly comfortable with analysis and measure theoretic probability to a certain degree.
>>
How does one read a regression table ? How do you determine whether a result is statistically significant ? Are p-values (probability of null hypothesis) related to confidence intervals ?
>>
>>16198089
Give us an example table you would like to have interpreted
>>
>>16197968
I am reading the daily racial crime stats.
>>
>>16198153
It's good to stay informed. Thoughbeit that does not count.
>>
>>16197968
I don't read stats books daily, but I have been spending some time on some intermediate probability theory on a pretty close to daily basis recently.
>>
>>16198227
Doing what with it?
>>
>>16198263
Reading the book and doing problems. I'm trying to get a better understanding of continuous time Markov chains.
>>
>>16198266
Can you post the book?
>>
>>16190158
Thank you for the good read.
>>
File: 1685919603099301.jpg (111 KB, 854x351)
>>16198095
That one for example
>>
File: 978-3-030-40183-2.jpg (104 KB, 827x1254)
>>16198270
Sorry, I thought I had mentioned it in that post. Looking back I didn't.

I'm going through this book right now. Probably on the easier side for measure theoretic probability, but covers a much wider variety of stochastic process topics than the standard recommendations like Durrett, Ash, etc.
>>
>>16198352
The first value in each row is the coefficient estimate and the ones in square brackets are confidence intervals (lower and upper bounds). If the confidence interval contains 0, the effect is not statistically distinguishable from 0 at that confidence level. If the CI range does not contain 0, it is thought to be statistically different from 0.

First column is calculated as just log income as a function of exports/area. Second column checks if colonizer effect and ln exports together have an effect. Third column checks if geography controls alter the effect of exports and colonizers.

The p-value for the overall F-test can be checked from a lookup table or a p-value calculator by taking in the F-stat value and its degrees of freedom (k and N-k-1 for k predictors and N observations).
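The link between a bracketed confidence interval and a p-value can be sketched as follows. The coefficients and standard errors are made-up numbers, not values from the table in question, and the normal approximation is an assumption (small samples would use a t distribution instead):

```python
from statistics import NormalDist

def ci_and_p(coef, se, level=0.95):
    """Normal-approximation confidence interval and two-sided p-value for H0: coef = 0."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (coef - z_crit * se, coef + z_crit * se)
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return ci, p

# Hypothetical estimates: one clearly nonzero, one not.
for coef, se in [(0.52, 0.10), (0.08, 0.10)]:
    (lo, hi), p = ci_and_p(coef, se)
    excludes_zero = lo > 0 or hi < 0
    print(f"coef={coef:.2f} CI=[{lo:.2f}, {hi:.2f}] p={p:.4f} excludes 0: {excludes_zero}")
```

Under this approximation a 95% interval excludes 0 exactly when the two-sided p-value is below 0.05, which is why the two are read together.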
>>
>>16198153
Worthless.
>>
>>16198821
Nah they are good man. Gotta know what the darkies are up to.
>>
>>16198806
Thank you. Somehow you managed to explain it better than my professors.
>>
Bump
>>
>>16198836
You literally don't. It's hilarious you should say it as you have. You sound more black than I am.
>>
Any good resources for regression modeling?
>>
>>16200438
scikit-learn user guide is good, not perfect, but if you read through it you'll know scikit-learn well enough at a minimum.

https://scikit-learn.org/stable/user_guide.html
>>
>>16200438
I would suggest 'Regression Modeling Strategies' by Frank Harrell. It's fairly approachable and covers a lot of topics (linear, logistic and ordered regression, model validation, etc).
>>
>>16200249
You are a dumb liberal faggot. What are you doing on 4chan?
>>
>>16201177
Enjoying anime because this is an anime website
>>
>>16200438
I have to learn this too. What book did you end up choosing?
>>
>>16201177
Cope. This is not your safe space, queer.
>>16201366
You're not me. I only rarely watch anime. I haven't seen any since Season 3 of Kimetsu no Yaiba.
>>
>>16202011
kimetsu no what? Are you one of those darkskinned pajeet anime watchers?
>>
>>16198744
I'll check that book out. Thanks
>>
>>16187525
why bother learning advanced SQL, R and stats when the world is run on excel, spss and "line look positive", "p value small" and "program says confidence high"
>>
>>16203871
You have two choices:
1. Join them and be doomed to reinvent the wheel every day
2. Do things that feel right and make your work reproducible, and build a foundation for the next generation
>>
if you were to have, say, a 70% chance of A happening and a 30% chance of B happening, even if you have done the math that made you come to this conclusion, would it still technically boil down to guessing?
>>
>>16204032
What do you mean? For any particular experiment (if it's properly random/stochastic) then knowing the distribution doesn't give you any ability to reliably know the outcomes. It can tell you their distribution, and you can make predictions in a statistical sense, but you can't know exactly the outcome of a probabilistic experiment without observing it.
>>
>>16204063
was thinking about situations where there is no guarantee, you are simply just using the knowledge and experience you have to get to a % outcome. like say the weather for meteorology.
>>
>>16204109
Then the answer to your question is yes. If you only know that P(A) = .7, P(B) = .3 and P(A or B) = 1, then you can't know for certain which of the two will happen until it happens.
>>
>>16204124
thanks anon
>>
Bump
>>
Give me a quick rundown on ridgeregressions plox.
>>
>>16206763
There's a few ways you can think about ridge regression.

The most straightforward way (and the way it was originally developed) is that ridge regression imposes an l2 norm constraint on your beta. You're minimizing the mean-square-error subject to your beta being within/on (depending on the setup) some sphere centered around the origin.

Another way of thinking about ridge regression is the Bayesian interpretation. Ridge regression imposes a Gaussian prior on beta.
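A minimal sketch of both views for the simplest case, one predictor and no intercept, where the ridge estimate has the closed form Σxy / (Σx² + λ). The data, the penalty λ, and the function names are invented for illustration, and the Bayesian equivalence assumes a known noise variance σ² with prior variance τ² = σ²/λ:

```python
import random

def ols_slope(x, y):
    """Least-squares slope through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def ridge_slope(x, y, lam):
    """Minimiser of sum((y - b*x)^2) + lam * b^2."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def map_slope(x, y, sigma2, tau2):
    """Posterior mode of b for y ~ N(b*x, sigma2) with prior b ~ N(0, tau2)."""
    precision = sum(a * a for a in x) / sigma2 + 1.0 / tau2
    return (sum(a * b for a, b in zip(x, y)) / sigma2) / precision

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [2.0 * a + rng.gauss(0, 1) for a in x]   # hypothetical true slope of 2

lam, sigma2 = 5.0, 1.0
b_ols = ols_slope(x, y)
b_ridge = ridge_slope(x, y, lam)
b_map = map_slope(x, y, sigma2, tau2=sigma2 / lam)
print(b_ols, b_ridge, b_map)
```

The ridge slope is pulled toward zero relative to the least-squares slope, and it coincides with the posterior mode under the Gaussian prior.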
>>
>>16207149
I always looked at it as an applied Lagrange multiplier for statistics and regressions. That it's more of an optimization thing than an error minimizer.
>>
Is anyone here studying probability / statistics on a daily basis?
>>
>>16207320
You can definitely look at it that way. In the literal sense ridge regression is a constraint on the L2 norm of the parameter that your objective function is applied to (an inequality, which holds with equality whenever the constraint binds).

If your objective function is a linear least squares, that's the same thing as maximizing the posterior distribution of your parameter given the data with a Gaussian likelihood function on the data given the parameter and a Gaussian prior on the parameter.
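Spelled out as a sketch, under the assumptions of Gaussian noise with variance σ² and an independent zero-mean Gaussian prior with variance τ² on each coefficient, the formulations line up as follows, where λ ≥ 0 is the multiplier corresponding to the radius t:

```latex
\hat\beta_{\text{ridge}}
  = \arg\min_{\beta} \|y - X\beta\|_2^2 \ \text{ s.t. } \ \|\beta\|_2^2 \le t
  = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 ,
\qquad
\hat\beta_{\text{MAP}}
  = \arg\max_{\beta} \Big[ -\tfrac{1}{2\sigma^2}\|y - X\beta\|_2^2 - \tfrac{1}{2\tau^2}\|\beta\|_2^2 \Big]
  = \arg\min_{\beta} \|y - X\beta\|_2^2 + \tfrac{\sigma^2}{\tau^2}\|\beta\|_2^2 .
```

The two estimates coincide when λ = σ²/τ².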

It works out to be tomato tomahto.
>>
>>16208223
Thanks anon. You make me like this thread.
>>
>>16208655
Nice, this is a nice thread
>>
Tell me about the p value, what does it actually mean?
>>
>>16210217
Probability of false alarm. It's basically the probability that the particular data or test statistic you are observing (or something more extreme) could have happened randomly by chance even though the hypothesis you are testing for isn't true, i.e. the null holds.
>>
>>16210217
Assuming the null is true, the probability that one obtains results at least as extreme as what was observed.

This is a nice read about p-values: https://www.fharrell.com/post/pval-litany/#:~:text=A%20p%2Dvalue%20is%20the,the%20effect%20of%20a%20variable.
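A small worked example of that definition, assuming a hypothetical experiment of 60 heads in 100 coin flips tested against a fair-coin null. The one-sided p-value is the probability, computed under the null, of a result at least as extreme as the one observed:

```python
from math import comb

def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value for observing k successes."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_value = binom_upper_tail(60, 100)
print(f"P(at least 60 heads in 100 fair flips) = {p_value:.4f}")
```

Note that this is a statement about the data given the null, not the probability that the null is true.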
>>
>>16209340
Yes, a very nice thread.
>>
>>16208655
>>16209340
>>16211385
reading the first few chapters of the Deep Learning book by Yoshua Bengio's group would've given you this exact information. the fact that you guys are excited by this tells me you guys are either undergrads or code monkeys who are ML wannabes.
>>
>>16211928
So what if they are undergrads? I don't understand your point. Yes, it's not particularly novel information if you are someone who has spent years doing Bayesian ML/Bayesian statistics, but it takes some time to see the connections between these frequentist regularization methods and the Bayesian MAP formulation of said regularization.
>>
>>16211928
Post pic of hand and it will be brown with CI of 95.
>>
>>16211946
Elitism is good, but it should be with a firm and happy hand. Not with a dull depressed heavy hand.
>>
What is the most difficult branch in statistics?
>>
>>16217068
In what way do you mean difficult? Do you mean mathematically difficult or do you mean practically difficult?
>>
>>16217241
Mathematically difficult
>>
>>16219240
I guess that depends on what you find difficult. Generally statistics gets mathematically complicated when the probability theory gets complicated.

Many people find measure theoretic statistics fairly difficult, and this will propagate throughout all of the related fields (performance analysis and large deviations theory, sequential analysis, information theoretic statistics, etc.) with this formulation.
>>
>>16211928
You're on 4chan, what did you expect?
>>
Statistics is not only useful. It's fun as well. I love to do PDEs on stats problems.
>>
>>16222603
>Statistics is fun
LOL seriously? You like anal (receiving)?
>PDE is fun
Hell yeah it is
>>
>>16222679
classic shitpost. Now go to another thread for retards.
>>
so when are you fags going to prove the theory of probability?
>>
>>16223279
lol lmao even
>>
>>16187543
They do, IF they're also computational mathematicians. The stats universities that are actually trying to push forward new or novel techniques use C++ and then make interfaces with R (because they know the applied community all uses R).

Take the INLA project as an example. And that's just something actively in development.
>>
Why is p-hacking bad? Isn't it literally just what happens as you collect more data regardless of the problem?

From a frequentist standpoint, your interval widths go to zero as more data is collected, and so do the p-values for any nonzero true effect, simply because we are working from the interpretation of constant coefficients in our models. Statistical significance is great and all, but it's not a measure of importance or impact, just 'hey this interval doesn't overlap with hypothesis X or other coefficient Y'.

I don't really understand the p-hacking problem whatsoever basically. Especially when combined with any sort of validation techniques or with any follow-on operational type question (a statistically significant difference doesn't mean an impactful difference: with enough data $1 is very statistically significantly different from $1.01, but that doesn't actually matter in the majority of contexts).
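The $1 versus $1.01 point can be sketched numerically. Assuming, purely for illustration, that the observed group means are exactly $1.00 and $1.01, that both groups have a known standard deviation of $0.50, and that a two-sample z-test is used, the same one-cent difference goes from undetectable to overwhelmingly significant as the per-group sample size grows:

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_pvalue(mean_a, mean_b, sd, n_per_group):
    """Two-sided p-value for a difference in means with known, equal standard deviation."""
    se = sd * sqrt(2 / n_per_group)
    z = (mean_b - mean_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (100, 10_000, 1_000_000):
    p = two_sample_z_pvalue(1.00, 1.01, sd=0.50, n_per_group=n)
    print(f"n per group = {n:>9,}: p = {p:.3g}")
```

The effect size never changes here; only the sample size does, which is why significance and practical importance have to be judged separately.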
>>
>>16223446
From my understanding, the problem with p-hacking is that you are collecting a biased sample set. It isn't just that you are collecting more data, it is that you are collecting more data under a specific subset which is more likely to show significance (e.g., tailed or skewed data leaning towards the extreme cases of the alternative).

It's a case of biased sample selection (or potentially pruning of negative outliers which would make your test statistics more centrally located).
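One concrete way to see the difference between simply collecting more data and p-hacking is optional stopping: peeking at the p-value as data arrives and stopping as soon as it dips below 0.05. This is a different mechanism from the sample pruning described above, and the sketch below is only an illustration under assumed settings (standard normal data with known variance, a true mean of zero, a peek every 10 observations up to 200):

```python
import random
from math import sqrt
from statistics import NormalDist

def z_test_p(total, n):
    """Two-sided p-value for H0: mean = 0 with known unit variance."""
    z = total / sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rates(n_sims=2000, n_max=200, peek_every=10, alpha=0.05, seed=0):
    rng = random.Random(seed)
    hits_peeking = hits_fixed = 0
    for _ in range(n_sims):
        total, stopped_early = 0.0, False
        for n in range(1, n_max + 1):
            total += rng.gauss(0, 1)          # the null is true: the mean really is 0
            if n % peek_every == 0 and z_test_p(total, n) < alpha:
                stopped_early = True          # a p-hacker would stop and report here
        hits_peeking += stopped_early
        hits_fixed += z_test_p(total, n_max) < alpha   # single pre-planned test
    return hits_peeking / n_sims, hits_fixed / n_sims

peeking, fixed = false_positive_rates()
print(f"false positive rate with peeking: {peeking:.3f}, with one planned test: {fixed:.3f}")
```

Even though no effect exists, repeatedly testing and stopping on the first significant result should flag "significance" far more often than the nominal 5% of a single planned test.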
>>
>>16223279
It's more of a question of how long before the theory can be proven with 100 percent accuracy. Any day now I'm sure..
>>
>>16223279
cope from brainlet
>>
>>16223446
P-hacking implies that you already have decided beforehand what the end result is instead of accepting the data as it is
>>
>>16224049
two more weeks right?
>>
Do any unis teach a completely unbiased course on race statistics?
>>
>>16226420
No. The same way that there are no colleges that teach entirely unbiased courses on any other highly controversial subject where there's still open research questions.
>>
>>16226420
lol god no. If you want to learn the real stuff, you have to learn it yourself. Start with the bell curve. Maybe the closest would be some analysis course on applied criminology at Quantico where they teach how the world works to federales.
>>
>>16226420
There's one prestigious uni called /pol/, you can complete a whole degree on racial statistics there
>>
>>16228727
kek
>>
are random variables a group under convolution?
>>
>>16230012
Define random
>>
your vanity thread is on page 10 again, better bump it quick
>>
>>16230928
lmao
>>
>>16230927
a function from the sample space to a subset of the reals (or real space)
>>
File: 0003.png (38 KB, 618x559)
I LOVE <3 non parametric stats <3
>>
>>16232093
why?
>>
>>16233286
Fuck normal distributions
Fuck means
Fuck SD
>>
My PI forces me to use Matlab for all the analyses and statistics. It's surprisingly comfy but disgusting at the same time.
>>
>>16234955
You work in some kind of weird finance department?
>>
>>16235418
He probably works for the based department. Matlab is based as fuck. T. Statistical signal processing engineer.
>>
>>16235418
Applied physics
>>
>>16236011
Continue using it. Since you are in the field that actually uses it as a standard.
>>16235478
You my dear sir, are an idiot.
>>
>>16236346
I may be a retard but I'm a based retard who uses a software environment that easily handles constrained optimization of nonlinear objective functions.
>>
>>16187525
good thread OP
>>
Redpill me on gamma distributions
>>
File: misspelling.png (122 KB, 750x1050)
Was over in another board and got suggested to post here.

Problem:
I'm doing data analysis for a refrigeration-based dehumidification product for a company. Sometimes it goes through QC no problem. Sometimes it has a lot of issues. I want to find out why.

What I've done so far:
I've been able to collate the following data (*):
1-Testing chart data for each product
2-Order form data for each product
3-BOM data for each product
(4-I'm working on getting job routing data for each product atm, as someone else in the other thread suggested to me).
Using 1, I can look at the number of failed charts to get a list of 'good' and 'bad' products.
Using 2, I can filter the previous list to only look at the dehum products.
Once I do this, I have a sample size of maybe 500 (the company is not high-volume, they make niche, custom products).
I've run the following statistical tests:
-Script to do brute force ANOVAs of components in BOM v. good/bad end-products. This only identified outlier products' materials. For example, it suggested things like, "The shipping crate used in the outlier is suspect." In general, I got a lot of "Pirates cause global warming" noise.
-Because of the previous results, I made all the data binary (good=1,bad=0,part in BOM=1,part not in BOM=0) and did Fisher p-testing. This only identified 'obvious' parts. Things like, "Yes, all compressors would be suspect, of fucking course, that's how refrigeration works." It didn't narrow anything down.
-I tried running correlations on some relevant variables (e.g., amount of refrigerant in product v. failed test numbers), and I just get noise.
There's a chance I missed something in these two previous tests, because there was a lot of noise to go through.
-Because of the small sample size (500), I feel I'm limited to single-variable analyses.
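For reference, the binary part-versus-outcome test described above can be sketched like this. The counts are invented, the function name is hypothetical, and the Bonferroni adjustment for screening many parts at once is an assumption added here, not something from the analysis described:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: rows = part in BOM / not in BOM, columns = bad / good units.
p = fisher_exact_two_sided(30, 70, 60, 340)
n_parts_tested = 200                      # assumed number of parts screened
print(p, min(1.0, p * n_parts_tested))    # raw and Bonferroni-adjusted p-values
```

Screening hundreds of parts one at a time produces some small raw p-values by chance alone, which is one possible source of the noise mentioned above.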

Can anyone think of anything else I should try?

(*) An aside vent: just getting this data collated, accessible, and cross-referenced was a PIA.
>>
File: 047072210X.jpg (35 KB, 300x469)
>>16237819
At the end of picrel they go into something similar for VW.


