Here I discuss how to derive the F distribution as a random variable that is a ratio of two independent chi-square-distributed variables. I'll also briefly discuss the F-test and ANOVA here.
In my previous posts I’ve described Chi-square distribution (as a special case of Gamma distribution) and Pearson’s Chi-square test, from which many other distributions and tests are derived in the field of statistics.
In this post I am going to derive the distribution function of Snedecor's F distribution. It is essentially a ratio of two independent chi-square-distributed variables with $d_1$ and $d_2$ degrees of freedom respectively.
In order to infer its probability density function/cumulative distribution function from the ratio, I'll first have to discuss some non-trivial technicalities (measure theory etc.).
Conditional probabilities of multi-dimensional continuous variables
Suppose that we need to calculate the probability density function of a random variable $Z$, which is a product of 2 independent random variables $X$ and $Y$: $Z = XY$.
First, let us recall the definition of independent random variables in the continuous case: $f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y)$. Basically, the joint probability density function is a product of the individual probability density functions.
Thus, the cumulative distribution function is $F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_X(s)\, f_Y(t)\, dt\, ds$.
Now, we need to calculate the cumulative distribution function of a product of 2 random variables. The logic is similar to convolutions in the case of a sum of variables: if the product $Z = XY$ takes the value $z$, we allow $X$ to take an arbitrary value $t$, and $Y$ should then take the value $z/t$.
We will be integrating in the $(t, z)$ space, where $x = t$ and $y = z/t$, so we have to multiply the integrand by the Jacobian determinant $\left|\frac{\partial(x, y)}{\partial(t, z)}\right| = \frac{1}{|t|}$.
Thus, the probability density function of the F distribution (a product of two positive random variables) is $f_Z(z) = \int_0^{\infty} f_X(t)\, f_Y(z/t)\, \frac{1}{t}\, dt$.
Similarly, the cumulative distribution function is $F_Z(z) = \iint_{xy \le z} f_X(x)\, f_Y(y)\, dx\, dy = \int_0^{\infty} f_X(x) \left( \int_0^{z/x} f_Y(y)\, dy \right) dx$ (note that multiplication of the integrand by a Jacobian is not required here, as this is a proper 2D integral).
Graphically, it represents the integral of the 2-dimensional probability density function over the area delimited by the hyperbola $xy = z$.
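To make the product-density formula concrete, here is a minimal numerical sketch in Python (assuming numpy and scipy are available; the degrees of freedom 3 and 5 are arbitrary choices for this check). It compares $\int_0^{\infty} f_X(t)\, f_Y(z/t)\, \frac{1}{t}\, dt$ against a crude Monte Carlo density estimate for a product of two chi-square variables:

```python
import numpy as np
from scipy import stats, integrate

# Hypothetical example: X ~ chi2(3), Y ~ chi2(5), Z = X * Y.
d1, d2 = 3, 5
rng = np.random.default_rng(42)
z_samples = rng.chisquare(d1, 1_000_000) * rng.chisquare(d2, 1_000_000)

def product_pdf(z):
    # f_Z(z) = integral over t of f_X(t) * f_Y(z/t) * (1/t) dt
    integrand = lambda t: stats.chi2.pdf(t, d1) * stats.chi2.pdf(z / t, d2) / t
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

# Compare the formula against the fraction of samples in a small window.
for z in [1.0, 5.0, 15.0]:
    empirical = np.mean(np.abs(z_samples - z) < 0.25) / 0.5
    print(f"z={z}: formula={product_pdf(z):.4f}, empirical={empirical:.4f}")
```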
Off-topic consistency considerations
Please skip this section; it is a memento for myself, the product of my attempts to reason about how this integration works.
Suppose we want to get the c.d.f. from the p.d.f.: $F_Z(z) = \int_0^z \int_0^{\infty} f_X(t)\, f_Y(s/t)\, \frac{1}{t}\, dt\, ds$. How to interpret it? $dt \cdot ds$ is an area, so $dt \times ds$ is a unit rectangle; the inner integral is an integral of the joint density over the length of each hyperbola $xy = s$, corresponding to a single value of $s$. When we integrate over the length of each hyperbola, as $t$ approaches infinity, $s/t$ approaches zero, and the band between neighbouring hyperbolas gets thinner (its thickness falls as $\frac{1}{t}$), so the area of each element stays the same; this is where the $\frac{1}{t}$ factor comes from.
A consistency consideration: we can infer the p.d.f. from the inequalities directly and see that the integration is consistent: $f_Z(z) = \frac{d F_Z(z)}{dz} = \frac{d}{dz} \int_0^{\infty} f_X(x) \left( \int_0^{z/x} f_Y(y)\, dy \right) dx = \int_0^{\infty} f_X(x)\, f_Y(z/x)\, \frac{1}{x}\, dx$.
Snedecor’s F distribution derivation
We want to calculate the probability density function of the F distribution as a product of 2 distributions, chi-square and inverse chi-square. But we need to invert the second chi-square variable first, so we'll have to derive the probability density function of the inverse chi-square distribution.
Inverse chi-square distribution
Recall the probability density function of the chi-square distribution with $k$ degrees of freedom: $f_{\chi^2_k}(x) = \frac{x^{k/2 - 1} e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}$.
By the inverse distribution formula, for $Y = 1/X$: $F_Y(y) = P(Y \le y) = P(X \ge 1/y) = 1 - F_X(1/y)$.
Thus, $f_Y(y) = \frac{d F_Y(y)}{dy} = f_X(1/y) \cdot \frac{1}{y^2}$. Now, if $X \sim \chi^2_k$ and $Y = 1/X$, we substitute $1/y$ into the chi-square density.
As a result, the p.d.f. of inverse chi-square is $f_{1/\chi^2_k}(y) = \frac{(1/y)^{k/2 - 1} e^{-1/(2y)}}{2^{k/2}\, \Gamma(k/2)} \cdot \frac{1}{y^2} = \frac{y^{-k/2 - 1} e^{-1/(2y)}}{2^{k/2}\, \Gamma(k/2)}$.
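As a sanity check, we can compare the derived inverse chi-square density against samples of $1/\chi^2_k$ (a minimal sketch assuming scipy; $k = 4$ and the evaluation point $y = 0.5$ are arbitrary choices):

```python
import numpy as np
from scipy.special import gamma

# Derived p.d.f. of the inverse chi-square with k degrees of freedom:
# f(y) = y^(-k/2 - 1) * exp(-1 / (2y)) / (2^(k/2) * Gamma(k/2))
def inv_chi2_pdf(y, k):
    return y ** (-k / 2 - 1) * np.exp(-1 / (2 * y)) / (2 ** (k / 2) * gamma(k / 2))

k = 4  # arbitrary degrees of freedom for this check
rng = np.random.default_rng(42)
samples = 1 / rng.chisquare(k, 1_000_000)

# Crude empirical density near y = 0.5 vs the formula.
y = 0.5
empirical = np.mean(np.abs(samples - y) < 0.01) / 0.02
print(f"formula={inv_chi2_pdf(y, k):.4f}, empirical={empirical:.4f}")
```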
Now, let us substitute the p.d.f.s of the chi-square (with $d_1$ degrees of freedom) and inverse chi-square (with $d_2$ degrees of freedom) distributions into the F-distribution probability density function:
$f_F(z) = \int_0^{\infty} f_{\chi^2_{d_1}}(t)\, f_{1/\chi^2_{d_2}}(z/t)\, \frac{1}{t}\, dt = \int_0^{\infty} \frac{t^{d_1/2 - 1} e^{-t/2}}{2^{d_1/2}\, \Gamma(d_1/2)} \cdot \frac{(z/t)^{-d_2/2 - 1} e^{-t/(2z)}}{2^{d_2/2}\, \Gamma(d_2/2)} \cdot \frac{1}{t}\, dt = \frac{z^{-d_2/2 - 1}}{2^{(d_1+d_2)/2}\, \Gamma(d_1/2)\, \Gamma(d_2/2)} \int_0^{\infty} t^{(d_1+d_2)/2 - 1} e^{-\frac{t}{2}\left(1 + \frac{1}{z}\right)}\, dt$
We aim to convert the integral into a gamma function $\Gamma(s) = \int_0^{\infty} u^{s-1} e^{-u}\, du$.
In order to do that we shall perform a variable substitution $u = \frac{t}{2}\left(1 + \frac{1}{z}\right)$, hence $t = \frac{2uz}{z+1}$ and $dt = \frac{2z}{z+1}\, du$. Our integral then takes the form of a gamma function:
$\int_0^{\infty} t^{(d_1+d_2)/2 - 1} e^{-\frac{t}{2}\left(1 + \frac{1}{z}\right)}\, dt = \left(\frac{2z}{z+1}\right)^{(d_1+d_2)/2} \int_0^{\infty} u^{(d_1+d_2)/2 - 1} e^{-u}\, du = \left(\frac{2z}{z+1}\right)^{(d_1+d_2)/2} \Gamma\left(\frac{d_1+d_2}{2}\right)$
Substituting it into the expression for the p.d.f., we get: $f_F(z) = \frac{z^{-d_2/2 - 1}}{2^{(d_1+d_2)/2}\, \Gamma(d_1/2)\, \Gamma(d_2/2)} \cdot \left(\frac{2z}{z+1}\right)^{(d_1+d_2)/2} \Gamma\left(\frac{d_1+d_2}{2}\right) = \frac{\Gamma\left(\frac{d_1+d_2}{2}\right)}{\Gamma(d_1/2)\, \Gamma(d_2/2)} \cdot \frac{z^{d_1/2 - 1}}{(1+z)^{(d_1+d_2)/2}}$.
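A quick way to validate this result is to sample $\chi^2_{d_1}/\chi^2_{d_2}$ (without normalization by degrees of freedom) and compare the histogram with the derived density. This is a sketch with $d_1 = 5$, $d_2 = 7$ chosen arbitrarily; the derived density happens to be the Beta prime distribution, available in scipy as stats.betaprime:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma

# Derived density of Z = chi2(d1) / chi2(d2), no normalization by d.o.f.
def ratio_pdf(z, d1, d2):
    const = gamma((d1 + d2) / 2) / (gamma(d1 / 2) * gamma(d2 / 2))
    return const * z ** (d1 / 2 - 1) / (1 + z) ** ((d1 + d2) / 2)

d1, d2 = 5, 7
rng = np.random.default_rng(42)
z = rng.chisquare(d1, 1_000_000) / rng.chisquare(d2, 1_000_000)

# Empirical histogram vs the formula and vs scipy's Beta prime.
hist, edges = np.histogram(z, bins=100, range=(0, 10), density=True)
mid = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - ratio_pdf(mid, d1, d2))))  # small discrepancy
print(np.max(np.abs(ratio_pdf(mid, d1, d2)
                    - stats.betaprime.pdf(mid, d1 / 2, d2 / 2))))  # ~0
```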
An alternative derivation is available here.
Normalization of chi-square distributions by degrees of freedom
In the actual F distribution the chi-square variables are normalized by their respective degrees of freedom, so that $F = \frac{\chi^2_{d_1}/d_1}{\chi^2_{d_2}/d_2}$.
The general form of the F distribution probability density is $f_F(x; d_1, d_2) = \frac{\Gamma\left(\frac{d_1+d_2}{2}\right)}{\Gamma(d_1/2)\, \Gamma(d_2/2)} \left(\frac{d_1}{d_2}\right)^{d_1/2} \frac{x^{d_1/2 - 1}}{\left(1 + \frac{d_1}{d_2} x\right)^{(d_1+d_2)/2}}$.
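The general form can be checked directly against scipy's implementation (a minimal sketch; the degrees of freedom 5 and 7 are arbitrary):

```python
import numpy as np
from scipy import stats
from scipy.special import gamma

# General F p.d.f. in the form derived above.
def f_pdf(x, d1, d2):
    const = gamma((d1 + d2) / 2) / (gamma(d1 / 2) * gamma(d2 / 2))
    return (const * (d1 / d2) ** (d1 / 2) * x ** (d1 / 2 - 1)
            / (1 + d1 * x / d2) ** ((d1 + d2) / 2))

x = np.linspace(0.01, 5, 100)
print(np.allclose(f_pdf(x, 5, 7), stats.f.pdf(x, 5, 7)))  # True
```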
F distribution is a special case of Beta-distribution
It is easy to notice that the expression $\frac{\Gamma\left(\frac{d_1+d_2}{2}\right)}{\Gamma(d_1/2)\, \Gamma(d_2/2)}$ is the inverse of the Beta function $B(d_1/2, d_2/2) = \frac{\Gamma(d_1/2)\, \Gamma(d_2/2)}{\Gamma\left(\frac{d_1+d_2}{2}\right)}$.
It is also easy to see that $\frac{z^{d_1/2 - 1}}{(1 + z)^{(d_1+d_2)/2}}$ is a typical integrand of an incomplete Beta function: the substitution $u = \frac{z}{1 + z}$ (with $dz = \frac{du}{(1 - u)^2}$) turns $\frac{z^{d_1/2 - 1}}{(1 + z)^{(d_1+d_2)/2}}\, dz$ into $u^{d_1/2 - 1} (1 - u)^{d_2/2 - 1}\, du$, the integrand used in the Beta-distribution probability density function.
Thus, the F distribution is just a Beta distribution in disguise: for the unnormalized ratio $Z = \chi^2_{d_1}/\chi^2_{d_2}$ we have $\frac{Z}{1+Z} \sim \mathrm{Beta}(d_1/2,\, d_2/2)$, and for $X \sim F(d_1, d_2)$ we have $\frac{d_1 X / d_2}{1 + d_1 X / d_2} \sim \mathrm{Beta}(d_1/2,\, d_2/2)$.
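This relationship is easy to verify numerically: mapping F-distributed samples through $u = \frac{d_1 x / d_2}{1 + d_1 x / d_2}$ should produce Beta-distributed values (a sketch assuming scipy; the parameters are arbitrary):

```python
import numpy as np
from scipy import stats

d1, d2 = 5, 7  # arbitrary parameters for this check
x = stats.f.rvs(d1, d2, size=1_000_000, random_state=42)

# Map F samples onto (0, 1); the result should be Beta(d1/2, d2/2).
u = (d1 * x / d2) / (1 + d1 * x / d2)
result = stats.kstest(u, stats.beta(d1 / 2, d2 / 2).cdf)
print(result)  # the KS p-value should not be small
```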
The F-test is just an application of the F distribution to data.
Suppose you have a set of patients, and some subset of them receives a treatment. You need to prove that the treatment works.
You measure some parameter (e.g. duration of sickness) for the treated patients and for the whole set of patients.
You then assume a null hypothesis that there is no difference between the treated patients and the whole set. If the null hypothesis holds, the ratio of sample variances between the treated patients and all patients should be F-distributed. If the p-value obtained in this test is too small, you reject the null hypothesis and claim that the treatment works.
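Here is a minimal sketch of such a variance-ratio F-test in Python (the patient data is entirely synthetic, and the two-sided p-value computation shown is one common convention):

```python
import numpy as np
from scipy import stats

# Entirely synthetic data: sickness duration (days) for a hypothetical study.
rng = np.random.default_rng(42)
untreated = rng.normal(10, 3, size=70)
treated = rng.normal(8, 1.5, size=30)  # treatment shortens and stabilizes
all_patients = np.concatenate([treated, untreated])

# F-statistic: ratio of sample variances (ddof=1 for unbiased estimates).
f_stat = np.var(treated, ddof=1) / np.var(all_patients, ddof=1)
df1, df2 = len(treated) - 1, len(all_patients) - 1

# Under the null hypothesis the statistic is F(df1, df2)-distributed;
# a two-sided p-value is twice the smaller tail probability.
p_value = 2 * min(stats.f.cdf(f_stat, df1, df2), stats.f.sf(f_stat, df1, df2))
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```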