In probability theory and statistics, the inverse gamma distribution is a two-parameter family of continuous probability distributions on the positive real line; it is the distribution of the reciprocal of a variable distributed according to the gamma distribution.
Perhaps the chief use of the inverse gamma distribution is in Bayesian statistics, where it arises as the marginal posterior distribution for the unknown variance of a normal distribution when an uninformative prior is used, and as an analytically tractable conjugate prior when an informative prior is required.[1] It is common among some Bayesians to consider an alternative parametrization of the normal distribution in terms of the precision, defined as the reciprocal of the variance, which allows the gamma distribution to be used directly as a conjugate prior. Other Bayesians prefer to parametrize the inverse gamma distribution differently, as a scaled inverse chi-squared distribution.
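As a concrete illustration of the conjugacy just described, the following is a minimal sketch of updating an inverse gamma prior on the variance of a normal distribution with known mean, using SciPy; the hyperparameters and data are made up for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical prior hyperparameters and data (for illustration only).
alpha0, beta0 = 3.0, 2.0   # inverse gamma prior on the variance
mu = 0.0                   # known mean of the normal likelihood
x = np.random.default_rng(0).normal(mu, 1.5, size=50)  # observed data

# Conjugate update: with known mean, the posterior is again inverse gamma.
alpha_n = alpha0 + len(x) / 2
beta_n = beta0 + 0.5 * np.sum((x - mu) ** 2)

posterior = stats.invgamma(alpha_n, scale=beta_n)
print("posterior mean of the variance:", posterior.mean())  # beta_n / (alpha_n - 1)
```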
Probability density function

The inverse gamma distribution's probability density function is defined over the support $x > 0$:
$$f(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} (1/x)^{\alpha+1} \exp\left(-\beta/x\right)$$

with shape parameter $\alpha$ and scale parameter $\beta$.[2] Here $\Gamma(\cdot)$ denotes the gamma function.
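The density can be evaluated directly; a minimal sketch using scipy.stats.invgamma, whose `a` and `scale` arguments correspond to $\alpha$ and $\beta$ here (parameter values are illustrative):

```python
import numpy as np
from scipy import stats
from scipy.special import gamma

alpha, beta = 3.0, 2.0
x = np.linspace(0.1, 5.0, 5)

# Density from the formula above.
pdf_formula = beta**alpha / gamma(alpha) * x**(-alpha - 1) * np.exp(-beta / x)

# SciPy's parametrization: a = shape alpha, scale = beta.
pdf_scipy = stats.invgamma.pdf(x, a=alpha, scale=beta)

assert np.allclose(pdf_formula, pdf_scipy)
```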
Unlike the gamma distribution, which contains a somewhat similar exponential term, $\beta$ here is a scale parameter, since the density function satisfies:
$$f(x; \alpha, \beta) = \frac{f(x/\beta; \alpha, 1)}{\beta}$$

Cumulative distribution function

The cumulative distribution function is the regularized gamma function

$$F(x; \alpha, \beta) = \frac{\Gamma\!\left(\alpha, \frac{\beta}{x}\right)}{\Gamma(\alpha)} = Q\!\left(\alpha, \frac{\beta}{x}\right)$$

where the numerator is the upper incomplete gamma function and the denominator is the gamma function. Many math packages allow direct computation of $Q$, the regularized gamma function.
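In SciPy, for example, the regularized upper incomplete gamma function is available as scipy.special.gammaincc, so the CDF can be computed directly; a minimal sketch (parameter values are illustrative):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaincc  # regularized upper incomplete gamma Q(a, x)

alpha, beta = 3.0, 2.0
x = np.linspace(0.5, 10.0, 5)

cdf_formula = gammaincc(alpha, beta / x)  # Q(alpha, beta/x)
cdf_scipy = stats.invgamma.cdf(x, a=alpha, scale=beta)

assert np.allclose(cdf_formula, cdf_scipy)
```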
Moments

Provided that $\alpha > n$, the $n$-th moment of the inverse gamma distribution is given by[3]
$$\mathrm{E}[X^n] = \beta^n \frac{\Gamma(\alpha - n)}{\Gamma(\alpha)} = \frac{\beta^n}{(\alpha - 1)\cdots(\alpha - n)}.$$

Characteristic function

The inverse gamma distribution has characteristic function

$$\frac{2\left(-i\beta t\right)^{\frac{\alpha}{2}}}{\Gamma(\alpha)} K_{\alpha}\!\left(\sqrt{-4i\beta t}\right)$$

where $K_{\alpha}$ is the modified Bessel function of the second kind.
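A minimal numerical check of the moment formula above, comparing the closed form with SciPy's generic moment computation (parameter values are illustrative):

```python
import numpy as np
from scipy import stats

alpha, beta = 5.0, 2.0

for n in range(1, 4):  # the formula requires alpha > n
    closed_form = beta**n / np.prod([alpha - k for k in range(1, n + 1)])
    numeric = stats.invgamma.moment(n, a=alpha, scale=beta)
    assert np.isclose(closed_form, numeric)
```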
Properties

For $\alpha > 0$ and $\beta > 0$,

$$\mathbb{E}[\ln(X)] = \ln(\beta) - \psi(\alpha)$$

and

$$\mathbb{E}[X^{-1}] = \frac{\alpha}{\beta}.$$

The information entropy is
$$\begin{aligned}
\operatorname{H}(X) &= \operatorname{E}[-\ln(p(X))] \\
&= \operatorname{E}\left[-\alpha\ln(\beta) + \ln(\Gamma(\alpha)) + (\alpha + 1)\ln(X) + \frac{\beta}{X}\right] \\
&= -\alpha\ln(\beta) + \ln(\Gamma(\alpha)) + (\alpha + 1)\ln(\beta) - (\alpha + 1)\psi(\alpha) + \alpha \\
&= \alpha + \ln(\beta\,\Gamma(\alpha)) - (\alpha + 1)\psi(\alpha).
\end{aligned}$$

where $\psi(\alpha)$ is the digamma function.
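A minimal numerical check of the entropy expression, using SciPy's digamma and log-gamma functions (parameter values are illustrative):

```python
import numpy as np
from scipy import stats
from scipy.special import digamma, gammaln

alpha, beta = 3.0, 2.0

# Closed form: alpha + ln(beta * Gamma(alpha)) - (alpha + 1) * psi(alpha)
closed_form = alpha + np.log(beta) + gammaln(alpha) - (alpha + 1) * digamma(alpha)
numeric = stats.invgamma(a=alpha, scale=beta).entropy()

assert np.isclose(closed_form, numeric)
```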
Kullback–Leibler divergence

The Kullback–Leibler divergence of Inverse-Gamma($\alpha_p$, $\beta_p$) from Inverse-Gamma($\alpha_q$, $\beta_q$) is the same as the KL divergence of Gamma($\alpha_p$, $\beta_p$) from Gamma($\alpha_q$, $\beta_q$):
$$D_{\mathrm{KL}}(\alpha_p, \beta_p; \alpha_q, \beta_q) = \mathbb{E}\left[\log\frac{\rho(X)}{\pi(X)}\right] = \mathbb{E}\left[\log\frac{\rho(1/Y)}{\pi(1/Y)}\right] = \mathbb{E}\left[\log\frac{\rho_G(Y)}{\pi_G(Y)}\right],$$

where $\rho, \pi$ are the pdfs of the inverse gamma distributions, $\rho_G, \pi_G$ are the pdfs of the gamma distributions, and $Y$ is Gamma($\alpha_p$, $\beta_p$) distributed.
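This identity is easy to verify by Monte Carlo; a minimal sketch estimating the inverse-gamma KL divergence from samples and comparing it with the closed form given below (parameter values are illustrative):

```python
import numpy as np
from scipy import stats
from scipy.special import digamma, gammaln

ap, bp = 4.0, 3.0   # parameters of p
aq, bq = 2.0, 1.5   # parameters of q

# Monte Carlo estimate: E_p[log p(X) - log q(X)] with X ~ Inv-Gamma(ap, bp).
x = stats.invgamma.rvs(a=ap, scale=bp, size=500_000, random_state=0)
kl_mc = np.mean(stats.invgamma.logpdf(x, a=ap, scale=bp)
                - stats.invgamma.logpdf(x, a=aq, scale=bq))

# Closed form below (identical to the Gamma-Gamma KL divergence).
kl_closed = ((ap - aq) * digamma(ap) - gammaln(ap) + gammaln(aq)
             + aq * (np.log(bp) - np.log(bq)) + ap * (bq - bp) / bp)

print(kl_mc, kl_closed)  # the two values should agree to a few decimal places
```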
Computing the expectation explicitly yields the closed form

$$D_{\mathrm{KL}}(\alpha_p, \beta_p; \alpha_q, \beta_q) = (\alpha_p - \alpha_q)\psi(\alpha_p) - \log\Gamma(\alpha_p) + \log\Gamma(\alpha_q) + \alpha_q(\log\beta_p - \log\beta_q) + \alpha_p\frac{\beta_q - \beta_p}{\beta_p}.$$

Related distributions

- If $X \sim \mbox{Inv-Gamma}(\alpha, \beta)$ then $kX \sim \mbox{Inv-Gamma}(\alpha, k\beta)$, for $k > 0$.
- If $X \sim \mbox{Inv-Gamma}(\alpha, \tfrac{1}{2})$ then $X \sim \mbox{Inv-}\chi^2(2\alpha)$ (inverse-chi-squared distribution).
- If $X \sim \mbox{Inv-Gamma}(\tfrac{\alpha}{2}, \tfrac{1}{2})$ then $X \sim \mbox{Scaled Inv-}\chi^2(\alpha, \tfrac{1}{\alpha})$ (scaled inverse chi-squared distribution).
- If $X \sim \mbox{Inv-Gamma}(\tfrac{1}{2}, \tfrac{c}{2})$ then $X \sim \mbox{Levy}(0, c)$ (Lévy distribution).
- If $X \sim \mbox{Inv-Gamma}(1, c)$ then $\tfrac{1}{X} \sim \mbox{Exp}(c)$ (exponential distribution).
- If $X \sim \mbox{Gamma}(\alpha, \beta)$ (gamma distribution with rate parameter $\beta$) then $\tfrac{1}{X} \sim \mbox{Inv-Gamma}(\alpha, \beta)$ (see the derivation below for details). Note that if $X \sim \mbox{Gamma}(k, \theta)$ (gamma distribution with scale parameter $\theta$) then $1/X \sim \mbox{Inv-Gamma}(k, 1/\theta)$.
- The inverse gamma distribution is a special case of the type 5 Pearson distribution.
- A multivariate generalization of the inverse gamma distribution is the inverse-Wishart distribution.
- For the distribution of a sum of independent inverted gamma variables, see Witkovsky (2001).

Derivation from Gamma distribution

Let $X \sim \mbox{Gamma}(\alpha, \beta)$, and recall that the pdf of the gamma distribution is
$$f_X(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \qquad x > 0.$$

Note that $\beta$ is the rate parameter from the perspective of the gamma distribution.
Define the transformation $Y = g(X) = \tfrac{1}{X}$. Then, the pdf of $Y$ is
$$\begin{aligned}
f_Y(y) &= f_X\left(g^{-1}(y)\right)\left|\frac{d}{dy} g^{-1}(y)\right| \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{1}{y}\right)^{\alpha - 1} \exp\left(\frac{-\beta}{y}\right) \frac{1}{y^2} \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{1}{y}\right)^{\alpha + 1} \exp\left(\frac{-\beta}{y}\right) \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)} y^{-\alpha - 1} \exp\left(\frac{-\beta}{y}\right)
\end{aligned}$$

Note that $\beta$ is the scale parameter from the perspective of the inverse gamma distribution. This can be straightforwardly demonstrated by seeing that $\beta$ satisfies the conditions for being a scale parameter:
$$\begin{aligned}
\frac{f(y/\beta; \alpha, 1)}{\beta} &= \frac{1}{\beta}\,\frac{1}{\Gamma(\alpha)}\left(\frac{y}{\beta}\right)^{-\alpha - 1}\exp\left(-\frac{1}{y/\beta}\right) \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, y^{-\alpha - 1}\exp\left(-\frac{\beta}{y}\right) \\
&= f(y; \alpha, \beta)
\end{aligned}$$
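A quick numerical sanity check of this derivation: reciprocals of samples from a Gamma($\alpha$, rate $\beta$) distribution should follow Inv-Gamma($\alpha$, $\beta$). A minimal sketch using a Kolmogorov–Smirnov test (parameter values are illustrative):

```python
import numpy as np
from scipy import stats

alpha, beta = 3.0, 2.0
rng = np.random.default_rng(0)

# NumPy's gamma sampler is parametrized by scale, so rate beta -> scale 1/beta.
gamma_samples = rng.gamma(shape=alpha, scale=1.0 / beta, size=100_000)
recip = 1.0 / gamma_samples

# Reciprocals should be indistinguishable from Inv-Gamma(alpha, beta).
result = stats.kstest(recip, stats.invgamma(a=alpha, scale=beta).cdf)
print(result.pvalue)  # a large p-value: no evidence against the derivation
```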
References

Witkovsky, V. (2001). "Computing the Distribution of a Linear Combination of Inverted Gamma Variables". Kybernetika. 37 (1): 79–90. MR 1825758. Zbl 1263.62022.