Two-parameter family of continuous probability distributions
In probability theory and statistics, the inverse gamma distribution is a two-parameter family of continuous probability distributions on the positive real line; it is the distribution of the reciprocal of a gamma-distributed random variable.
Perhaps the chief use of the inverse gamma distribution is in Bayesian statistics, where the distribution arises as the marginal posterior distribution for the unknown variance of a normal distribution, if an uninformative prior is used, and as an analytically tractable conjugate prior, if an informative prior is required.[1] It is common among some Bayesians to consider an alternative parametrization of the normal distribution in terms of the precision, defined as the reciprocal of the variance, which allows the gamma distribution to be used directly as a conjugate prior. Other Bayesians prefer to parametrize the inverse gamma distribution differently, as a scaled inverse chi-squared distribution.
Probability density function
The inverse gamma distribution's probability density function is defined over the support x > 0:

f(x;\alpha ,\beta )={\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}(1/x)^{\alpha +1}\exp \left(-\beta /x\right)
with shape parameter α and scale parameter β.[2] Here Γ(·) denotes the gamma function.
Unlike the gamma distribution, which contains a somewhat similar exponential term, β is a scale parameter, since the density function satisfies:

f(x;\alpha ,\beta )={\frac {f(x/\beta ;\alpha ,1)}{\beta }}
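As a quick numerical illustration, the density and its scale-parameter property can be sketched in Python (a minimal standard-library sketch; the function name invgamma_pdf is ours, not from any particular package):

```python
import math

def invgamma_pdf(x, alpha, beta):
    """Inverse gamma density: beta^alpha / Gamma(alpha) * (1/x)^(alpha+1) * exp(-beta/x)."""
    if x <= 0:
        return 0.0
    return (beta ** alpha / math.gamma(alpha)) * (1.0 / x) ** (alpha + 1) * math.exp(-beta / x)

# Scale-parameter property: f(x; alpha, beta) == f(x/beta; alpha, 1) / beta
x, alpha, beta = 2.0, 3.0, 1.5
lhs = invgamma_pdf(x, alpha, beta)
rhs = invgamma_pdf(x / beta, alpha, 1.0) / beta
print(lhs, rhs)  # the two values agree
```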
Cumulative distribution function
The cumulative distribution function is the regularized gamma function

F(x;\alpha ,\beta )={\frac {\Gamma \left(\alpha ,{\frac {\beta }{x}}\right)}{\Gamma (\alpha )}}=Q\left(\alpha ,{\frac {\beta }{x}}\right)

where the numerator is the upper incomplete gamma function and the denominator is the gamma function. Many math packages allow direct computation of Q, the regularized gamma function.
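For example, SciPy exposes Q as scipy.special.gammaincc, so the CDF can be evaluated directly and checked against a numerical integral of the density (a sketch assuming SciPy is available):

```python
import math
from scipy.special import gammaincc  # regularized upper incomplete gamma Q(a, z)
from scipy.integrate import quad

def invgamma_pdf(x, alpha, beta):
    # evaluate in log space to avoid overflow of x**(-(alpha+1)) near x = 0
    if x <= 0:
        return 0.0
    logp = alpha * math.log(beta) - math.lgamma(alpha) - (alpha + 1) * math.log(x) - beta / x
    return math.exp(logp)

def invgamma_cdf(x, alpha, beta):
    """F(x; alpha, beta) = Q(alpha, beta / x)."""
    return gammaincc(alpha, beta / x) if x > 0 else 0.0

alpha, beta, x0 = 2.5, 1.2, 0.8
closed_form = invgamma_cdf(x0, alpha, beta)
numeric, _ = quad(invgamma_pdf, 0, x0, args=(alpha, beta))
print(closed_form, numeric)  # both approximate F(0.8; 2.5, 1.2)
```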
Provided that α > n, the n-th moment of the inverse gamma distribution is given by[3]

\mathrm {E} [X^{n}]=\beta ^{n}{\frac {\Gamma (\alpha -n)}{\Gamma (\alpha )}}={\frac {\beta ^{n}}{(\alpha -1)\cdots (\alpha -n)}}.
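The second equality follows from applying the recurrence Γ(α) = (α − 1)Γ(α − 1) n times. A small sketch (function names are ours) evaluates both forms and confirms they agree:

```python
import math

def moment_gamma_form(n, alpha, beta):
    # E[X^n] = beta^n * Gamma(alpha - n) / Gamma(alpha), requires alpha > n
    return beta ** n * math.gamma(alpha - n) / math.gamma(alpha)

def moment_product_form(n, alpha, beta):
    # equivalent form: beta^n / ((alpha - 1)(alpha - 2) ... (alpha - n))
    denom = 1.0
    for k in range(1, n + 1):
        denom *= alpha - k
    return beta ** n / denom

alpha, beta = 5.0, 2.0
for n in (1, 2, 3):
    print(n, moment_gamma_form(n, alpha, beta), moment_product_form(n, alpha, beta))
```

In particular the mean is β/(α − 1), here 2/4 = 0.5.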
Characteristic function
The inverse gamma distribution has characteristic function

{\frac {2\left(-i\beta t\right)^{\frac {\alpha }{2}}}{\Gamma (\alpha )}}K_{\alpha }\left({\sqrt {-4i\beta t}}\right)

where K_α is the modified Bessel function of the second kind.
For α > 0 and β > 0,

\mathbb {E} [\ln(X)]=\ln(\beta )-\psi (\alpha )

and

\mathbb {E} [X^{-1}]={\frac {\alpha }{\beta }}.
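Both identities can be checked numerically against integrals of the density (a sketch assuming SciPy; scipy.special.psi is the digamma function):

```python
import math
from scipy.integrate import quad
from scipy.special import psi  # digamma function

alpha, beta = 2.5, 1.5

def pdf(x):
    # inverse gamma density, evaluated in log space for numerical safety
    if x <= 0:
        return 0.0
    logp = alpha * math.log(beta) - math.lgamma(alpha) - (alpha + 1) * math.log(x) - beta / x
    return math.exp(logp)

e_log, _ = quad(lambda x: pdf(x) * math.log(x), 0, math.inf)
e_inv, _ = quad(lambda x: pdf(x) / x, 0, math.inf)
print(e_log, math.log(beta) - psi(alpha))  # E[ln X] = ln(beta) - psi(alpha)
print(e_inv, alpha / beta)                 # E[1/X] = alpha / beta
```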
The information entropy is

{\begin{aligned}\operatorname {H} (X)&=\operatorname {E} [-\ln(p(X))]\\&=\operatorname {E} \left[-\alpha \ln(\beta )+\ln(\Gamma (\alpha ))+(\alpha +1)\ln(X)+{\frac {\beta }{X}}\right]\\&=-\alpha \ln(\beta )+\ln(\Gamma (\alpha ))+(\alpha +1)\ln(\beta )-(\alpha +1)\psi (\alpha )+\alpha \\&=\alpha +\ln(\beta \Gamma (\alpha ))-(\alpha +1)\psi (\alpha ).\end{aligned}}

where ψ(α) is the digamma function.
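As a sanity check, the closed form can be compared with a direct numerical evaluation of −∫ p(x) ln p(x) dx (a sketch assuming SciPy):

```python
import math
from scipy.integrate import quad
from scipy.special import psi  # digamma

alpha, beta = 3.0, 2.0

def logpdf(x):
    # log of the inverse gamma density
    return alpha * math.log(beta) - math.lgamma(alpha) - (alpha + 1) * math.log(x) - beta / x

def integrand(x):
    # -p(x) * ln p(x); guard the region where the density underflows to zero
    if x <= 0:
        return 0.0
    lp = logpdf(x)
    p = math.exp(lp)
    if p == 0.0:
        return 0.0
    return -p * lp

closed = alpha + math.log(beta * math.gamma(alpha)) - (alpha + 1) * psi(alpha)
numeric, _ = quad(integrand, 0, math.inf)
print(closed, numeric)  # the two values agree
```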
The Kullback-Leibler divergence of Inverse-Gamma(α_p, β_p) from Inverse-Gamma(α_q, β_q) is the same as the KL-divergence of Gamma(α_p, β_p) from Gamma(α_q, β_q):
D_{\mathrm {KL} }(\alpha _{p},\beta _{p};\alpha _{q},\beta _{q})=\mathbb {E} \left[\log {\frac {\rho (X)}{\pi (X)}}\right]=\mathbb {E} \left[\log {\frac {\rho (1/Y)}{\pi (1/Y)}}\right]=\mathbb {E} \left[\log {\frac {\rho _{G}(Y)}{\pi _{G}(Y)}}\right],

where ρ, π are the pdfs of the inverse gamma distributions, ρ_G, π_G are the pdfs of the gamma distributions, and Y is Gamma(α_p, β_p) distributed.
{\begin{aligned}D_{\mathrm {KL} }(\alpha _{p},\beta _{p};\alpha _{q},\beta _{q})={}&(\alpha _{p}-\alpha _{q})\psi (\alpha _{p})-\log \Gamma (\alpha _{p})+\log \Gamma (\alpha _{q})+\alpha _{q}(\log \beta _{p}-\log \beta _{q})+\alpha _{p}{\frac {\beta _{q}-\beta _{p}}{\beta _{p}}}.\end{aligned}}
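The closed form can be verified against a direct numerical evaluation of ∫ ρ(x) log(ρ(x)/π(x)) dx (a sketch assuming SciPy; function names are ours):

```python
import math
from scipy.integrate import quad
from scipy.special import psi  # digamma

def logpdf(x, a, b):
    # log of the inverse gamma density with shape a and scale b
    return a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(x) - b / x

def kl_closed(ap, bp, aq, bq):
    # closed-form KL divergence between two inverse gamma distributions
    return ((ap - aq) * psi(ap) - math.lgamma(ap) + math.lgamma(aq)
            + aq * (math.log(bp) - math.log(bq)) + ap * (bq - bp) / bp)

ap, bp, aq, bq = 3.0, 2.0, 2.0, 1.0

def integrand(x):
    if x <= 0:
        return 0.0
    lp = logpdf(x, ap, bp)
    p = math.exp(lp)
    if p == 0.0:
        return 0.0
    return p * (lp - logpdf(x, aq, bq))

numeric, _ = quad(integrand, 0, math.inf)
print(kl_closed(ap, bp, aq, bq), numeric)  # the two values agree
```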
If X ~ Inv-Gamma(α, β) then kX ~ Inv-Gamma(α, kβ), for k > 0.
If X ~ Inv-Gamma(α, 1/2) then X ~ Inv-χ²(2α) (inverse-chi-squared distribution).
If X ~ Inv-Gamma(α/2, 1/2) then X ~ Scaled Inv-χ²(α, 1/α) (scaled inverse chi-squared distribution).
If X ~ Inv-Gamma(1/2, c/2) then X ~ Levy(0, c) (Lévy distribution).
If X ~ Inv-Gamma(1, c) then 1/X ~ Exp(c) (exponential distribution).
If X ~ Gamma(α, β) (gamma distribution with rate parameter β) then 1/X ~ Inv-Gamma(α, β) (see the derivation below for details).
Note that if X ~ Gamma(k, θ) (gamma distribution with scale parameter θ) then 1/X ~ Inv-Gamma(k, 1/θ).
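The reciprocal relationship gives a simple way to simulate inverse gamma variates: draw gamma samples and invert them. A sketch assuming NumPy is available; note that NumPy's gamma sampler is parameterized by shape and scale, so using scale 1/β (i.e. rate β) yields reciprocals distributed as Inv-Gamma(α, β):

```python
import numpy as np

alpha, beta = 5.0, 2.0
rng = np.random.default_rng(0)

# rng.gamma takes (shape, scale); scale = 1/beta corresponds to rate beta,
# so the reciprocals are Inv-Gamma(alpha, beta) distributed
gamma_samples = rng.gamma(alpha, 1.0 / beta, size=200_000)
inv_gamma_samples = 1.0 / gamma_samples

print(inv_gamma_samples.mean())  # should be close to beta / (alpha - 1) = 0.5
```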
The inverse gamma distribution is a special case of the type-5 Pearson distribution.
A multivariate generalization of the inverse gamma distribution is the inverse-Wishart distribution.
For the distribution of a sum of independent inverse gamma variables, see Witkovsky (2001).
Derivation from Gamma distribution
Let X ~ Gamma(α, β), and recall that the pdf of the gamma distribution is

f_{X}(x)={\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}x^{\alpha -1}e^{-\beta x}, \quad x>0.
Note that β is the rate parameter from the perspective of the gamma distribution.
Define the transformation Y = g(X) = 1/X. Then, the pdf of Y is

{\begin{aligned}f_{Y}(y)&=f_{X}\left(g^{-1}(y)\right)\left|{\frac {d}{dy}}g^{-1}(y)\right|\\[6pt]&={\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}\left({\frac {1}{y}}\right)^{\alpha -1}\exp \left({\frac {-\beta }{y}}\right){\frac {1}{y^{2}}}\\[6pt]&={\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}\left({\frac {1}{y}}\right)^{\alpha +1}\exp \left({\frac {-\beta }{y}}\right)\\[6pt]&={\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}y^{-\alpha -1}\exp \left({\frac {-\beta }{y}}\right)\end{aligned}}
Note that β is the scale parameter from the perspective of the inverse gamma distribution. This can be demonstrated directly by checking that β satisfies the conditions for being a scale parameter:
{\begin{aligned}{\frac {f(y/\beta ;\alpha ,1)}{\beta }}&={\frac {1}{\beta }}{\frac {1}{\Gamma (\alpha )}}\left({\frac {y}{\beta }}\right)^{-\alpha -1}\exp \left(-{\frac {1}{y/\beta }}\right)\\[6pt]&={\frac {\beta ^{\alpha }}{\Gamma (\alpha )}}y^{-\alpha -1}\exp \left(-{\frac {\beta }{y}}\right)\\[6pt]&=f(y;\alpha ,\beta )\end{aligned}}
Witkovsky, V. (2001). "Computing the Distribution of a Linear Combination of Inverted Gamma Variables". Kybernetika. 37 (1): 79–90. MR 1825758. Zbl 1263.62022.