Normal-inverse-Wishart distribution

Notation: $(\boldsymbol{\mu},\boldsymbol{\Sigma}) \sim \mathrm{NIW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu)$

Parameters:
- $\boldsymbol{\mu}_0 \in \mathbb{R}^D$ — location (real vector)
- $\lambda > 0$ — (real)
- $\boldsymbol{\Psi} \in \mathbb{R}^{D\times D}$ — inverse scale matrix (positive definite)
- $\nu > D-1$ — (real)

Support: $\boldsymbol{\mu} \in \mathbb{R}^D$; $\boldsymbol{\Sigma} \in \mathbb{R}^{D\times D}$, a positive-definite covariance matrix

PDF: $f(\boldsymbol{\mu},\boldsymbol{\Sigma} \mid \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu) = \mathcal{N}\!\left(\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, \tfrac{1}{\lambda}\boldsymbol{\Sigma}\right)\, \mathcal{W}^{-1}(\boldsymbol{\Sigma} \mid \boldsymbol{\Psi},\nu)$
In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]
Suppose
$$\boldsymbol{\mu} \mid \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Sigma} \sim \mathcal{N}\!\left(\boldsymbol{\mu}\,\Big|\,\boldsymbol{\mu}_0,\tfrac{1}{\lambda}\boldsymbol{\Sigma}\right)$$
has a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and covariance matrix $\tfrac{1}{\lambda}\boldsymbol{\Sigma}$, where
$$\boldsymbol{\Sigma} \mid \boldsymbol{\Psi},\nu \sim \mathcal{W}^{-1}(\boldsymbol{\Sigma} \mid \boldsymbol{\Psi},\nu)$$
has an inverse Wishart distribution. Then $(\boldsymbol{\mu},\boldsymbol{\Sigma})$ has a normal-inverse-Wishart distribution, denoted as
$$(\boldsymbol{\mu},\boldsymbol{\Sigma}) \sim \mathrm{NIW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu).$$

Probability density function

$$f(\boldsymbol{\mu},\boldsymbol{\Sigma} \mid \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu) = \mathcal{N}\!\left(\boldsymbol{\mu}\,\Big|\,\boldsymbol{\mu}_0,\tfrac{1}{\lambda}\boldsymbol{\Sigma}\right)\mathcal{W}^{-1}(\boldsymbol{\Sigma} \mid \boldsymbol{\Psi},\nu)$$

The full version of the PDF is as follows:[2]
$$f(\boldsymbol{\mu},\boldsymbol{\Sigma} \mid \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu) = \frac{\lambda^{D/2}\,|\boldsymbol{\Psi}|^{\nu/2}\,|\boldsymbol{\Sigma}|^{-\frac{\nu+D+2}{2}}}{(2\pi)^{D/2}\,2^{\frac{\nu D}{2}}\,\Gamma_D\!\left(\frac{\nu}{2}\right)}\exp\left\{-\frac{1}{2}\operatorname{tr}(\boldsymbol{\Psi}\boldsymbol{\Sigma}^{-1}) - \frac{\lambda}{2}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^{T}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)\right\}$$
Here $\Gamma_D[\cdot]$ is the multivariate gamma function and $\operatorname{tr}(\boldsymbol{\Psi})$ is the trace of the given matrix.
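Because the density factorizes into a multivariate normal term and an inverse Wishart term, it can be evaluated with standard library routines. A minimal sketch using SciPy (the function name `niw_logpdf` is illustrative, not a library API):

```python
import numpy as np
from scipy.stats import multivariate_normal, invwishart

def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Log-density of the normal-inverse-Wishart distribution, via the
    factorization f = N(mu | mu0, Sigma/lam) * IW(Sigma | Psi, nu)."""
    log_n = multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
    log_iw = invwishart.logpdf(Sigma, df=nu, scale=Psi)
    return log_n + log_iw

# Example in D = 2 dimensions
mu0 = np.zeros(2)
lam, nu = 1.0, 4.0
Psi = np.eye(2)
print(niw_logpdf(np.zeros(2), np.eye(2), mu0, lam, Psi, nu))
```

At $\boldsymbol{\mu}=\boldsymbol{\mu}_0$, $\boldsymbol{\Sigma}=\boldsymbol{\Psi}=\mathbf{I}$, $\lambda=1$, $\nu=4$, the full PDF above reduces to $-4\log 2 - 2\log\pi - 1$, which can serve as a check of the factorized form.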
Marginal distributions

By construction, the marginal distribution over $\boldsymbol{\Sigma}$ is an inverse Wishart distribution, and the conditional distribution over $\boldsymbol{\mu}$ given $\boldsymbol{\Sigma}$ is a multivariate normal distribution. The marginal distribution over $\boldsymbol{\mu}$ is a multivariate t-distribution.
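Explicitly, integrating out $\boldsymbol{\Sigma}$ gives (a standard result, stated e.g. in the Murphy reference cited here) a multivariate t-distribution with $\nu - D + 1$ degrees of freedom:

$$\boldsymbol{\mu} \sim t_{\nu-D+1}\!\left(\boldsymbol{\mu}_0,\ \frac{\boldsymbol{\Psi}}{\lambda(\nu-D+1)}\right)$$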
Posterior distribution of the parameters

Suppose the sampling density is a multivariate normal distribution
$$\boldsymbol{y}_i \mid \boldsymbol{\mu},\boldsymbol{\Sigma} \sim \mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$$
where $\boldsymbol{y}$ is an $n \times p$ matrix and $\boldsymbol{y}_i$ (of length $p$) is row $i$ of the matrix.
When the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:
$$(\boldsymbol{\mu},\boldsymbol{\Sigma}) \sim \mathrm{NIW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu).$$
The resulting posterior distribution for the mean and covariance matrix will also be normal-inverse-Wishart:
$$(\boldsymbol{\mu},\boldsymbol{\Sigma} \mid \boldsymbol{y}) \sim \mathrm{NIW}(\boldsymbol{\mu}_n,\lambda_n,\boldsymbol{\Psi}_n,\nu_n),$$
where
$$\boldsymbol{\mu}_n = \frac{\lambda\boldsymbol{\mu}_0 + n\bar{\boldsymbol{y}}}{\lambda+n}$$
$$\lambda_n = \lambda + n$$
$$\nu_n = \nu + n$$
$$\boldsymbol{\Psi}_n = \boldsymbol{\Psi} + \boldsymbol{S} + \frac{\lambda n}{\lambda+n}(\bar{\boldsymbol{y}}-\boldsymbol{\mu}_0)(\bar{\boldsymbol{y}}-\boldsymbol{\mu}_0)^{T}, \quad \text{with } \boldsymbol{S} = \sum_{i=1}^{n}(\boldsymbol{y}_i-\bar{\boldsymbol{y}})(\boldsymbol{y}_i-\bar{\boldsymbol{y}})^{T}.$$

To sample from the joint posterior of $(\boldsymbol{\mu},\boldsymbol{\Sigma})$, one simply draws samples from $\boldsymbol{\Sigma} \mid \boldsymbol{y} \sim \mathcal{W}^{-1}(\boldsymbol{\Psi}_n,\nu_n)$, then draws $\boldsymbol{\mu} \mid \boldsymbol{\Sigma},\boldsymbol{y} \sim \mathcal{N}_p(\boldsymbol{\mu}_n,\boldsymbol{\Sigma}/\lambda_n)$. To draw from the posterior predictive of a new observation, draw $\tilde{\boldsymbol{y}} \mid \boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{y} \sim \mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$, given the already drawn values of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.[3]
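The hyperparameter updates and the two-stage posterior draw can be sketched in a few lines of NumPy/SciPy; the helper names `niw_posterior` and `sample_posterior` are illustrative, not from any particular library:

```python
import numpy as np
from scipy.stats import invwishart

def niw_posterior(y, mu0, lam, Psi, nu):
    """Conjugate update: given an n x p data matrix y and NIW prior
    hyperparameters, return the posterior (mu_n, lam_n, Psi_n, nu_n)."""
    n = y.shape[0]
    ybar = y.mean(axis=0)
    S = (y - ybar).T @ (y - ybar)              # scatter matrix
    lam_n = lam + n
    nu_n = nu + n
    mu_n = (lam * mu0 + n * ybar) / lam_n
    d = (ybar - mu0).reshape(-1, 1)
    Psi_n = Psi + S + (lam * n / lam_n) * (d @ d.T)
    return mu_n, lam_n, Psi_n, nu_n

def sample_posterior(rng, mu_n, lam_n, Psi_n, nu_n):
    """One joint posterior draw: Sigma ~ IW(Psi_n, nu_n),
    then mu ~ N(mu_n, Sigma / lam_n)."""
    Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)
    mu = rng.multivariate_normal(mu_n, Sigma / lam_n)
    return mu, Sigma

# Example: two observations in p = 2 dimensions under a weak prior
y = np.array([[1.0, 0.0], [3.0, 2.0]])
mu_n, lam_n, Psi_n, nu_n = niw_posterior(y, np.zeros(2), 1.0, np.eye(2), 4.0)
rng = np.random.default_rng(0)
mu, Sigma = sample_posterior(rng, mu_n, lam_n, Psi_n, nu_n)
```

A draw from the posterior predictive is then simply `rng.multivariate_normal(mu, Sigma)`, using the already drawn `mu` and `Sigma`.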
Generating normal-inverse-Wishart random variates

Generation of random variates is straightforward:
- Sample $\boldsymbol{\Sigma}$ from an inverse Wishart distribution with parameters $\boldsymbol{\Psi}$ and $\nu$
- Sample $\boldsymbol{\mu}$ from a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and covariance $\tfrac{1}{\lambda}\boldsymbol{\Sigma}$

Related distributions

The normal-Wishart distribution is essentially the same distribution parameterized by precision rather than variance: if $(\boldsymbol{\mu},\boldsymbol{\Sigma}) \sim \mathrm{NIW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu)$, then $(\boldsymbol{\mu},\boldsymbol{\Sigma}^{-1}) \sim \mathrm{NW}(\boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi}^{-1},\nu)$. The normal-inverse-gamma distribution is the one-dimensional equivalent. The multivariate normal distribution and the inverse Wishart distribution are the component distributions out of which this distribution is made.

References

1. Murphy, Kevin P. (2007). "Conjugate Bayesian Analysis of the Gaussian Distribution."
2. Prince, Simon J. D. (June 2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. Section 3.8: "Normal inverse Wishart distribution".
3. Gelman, Andrew, et al. (2014). Bayesian Data Analysis. Vol. 2, p. 73. Boca Raton, FL: Chapman & Hall/CRC.

Further reading

Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.