Matrix t
Notation: $\mathrm{T}_{n,p}(\nu, \mathbf{M}, \boldsymbol{\Sigma}, \boldsymbol{\Omega})$
Parameters:
  $\mathbf{M}$: location (real $n \times p$ matrix)
  $\boldsymbol{\Omega}$: scale (positive-definite real $p \times p$ matrix)
  $\boldsymbol{\Sigma}$: scale (positive-definite real $n \times n$ matrix)
  $\nu > 0$: degrees of freedom (real)
Support: $\mathbf{X} \in \mathbb{R}^{n \times p}$
PDF: $$\frac{\Gamma_p\left(\frac{\nu+n+p-1}{2}\right)}{\pi^{\frac{np}{2}}\,\Gamma_p\left(\frac{\nu+p-1}{2}\right)}\,|\boldsymbol{\Omega}|^{-\frac{n}{2}}\,|\boldsymbol{\Sigma}|^{-\frac{p}{2}} \times \left|\mathbf{I}_n + \boldsymbol{\Sigma}^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol{\Omega}^{-1}(\mathbf{X}-\mathbf{M})^{\rm T}\right|^{-\frac{\nu+n+p-1}{2}}$$
CDF: no analytic expression
Mean: $\mathbf{M}$ if $\nu > 1$, else undefined
Mode: $\mathbf{M}$
Variance: $\mathrm{cov}(\mathrm{vec}(\mathbf{X})) = \frac{\boldsymbol{\Sigma} \otimes \boldsymbol{\Omega}}{\nu-2}$ if $\nu > 2$, else undefined
CF: see below
In statistics, the matrix t-distribution (or matrix variate t-distribution) is the generalization of the multivariate t-distribution from vectors to matrices.[1][2]
The matrix t-distribution has the same relationship to the multivariate t-distribution that the matrix normal distribution has to the multivariate normal distribution: if the matrix has only one row, or only one column, the distributions become equivalent to the corresponding (vector-)multivariate distribution. The matrix t-distribution is the compound distribution that results from an infinite mixture of a matrix normal distribution with an inverse Wishart distribution placed over either of its covariance matrices,[1] and the multivariate t-distribution can be generated in a similar way.[2]
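The compound construction can be illustrated with a short sampler. This is a minimal sketch, not taken from the cited sources: the function name `sample_matrix_t` is hypothetical, and the mixing law is an assumption obtained by matching the exponent of the density, namely an inverse Wishart distribution (in SciPy's parameterization) with $\nu + n - 1$ degrees of freedom and scale $\boldsymbol{\Sigma}$ placed over the row covariance.

```python
import numpy as np
from scipy.stats import invwishart


def sample_matrix_t(nu, M, Sigma, Omega, size=1, seed=None):
    """Draw `size` matrices from a matrix t via an inverse-Wishart scale mixture.

    Assumption (not from the cited sources): the row covariance V follows
    InvWishart_n(df = nu + n - 1, scale = Sigma) in SciPy's parameterization.
    """
    M = np.asarray(M, dtype=float)
    n, p = M.shape
    rng = np.random.default_rng(seed)

    # Step 1: draw the random row covariance matrices V_k.
    V = invwishart.rvs(df=nu + n - 1, scale=Sigma, size=size, random_state=seed)
    V = np.reshape(V, (size, n, n))  # rvs may drop the batch axis

    # Step 2: draw X_k | V_k ~ MatrixNormal(M, V_k, Omega) using Cholesky factors.
    L_row = np.linalg.cholesky(V)        # batch of factors, V_k = L_k L_k^T
    L_col = np.linalg.cholesky(Omega)    # Omega = L_col L_col^T
    Z = rng.standard_normal((size, n, p))
    return M + L_row @ Z @ L_col.T
```

Under this assumption the expected row covariance is $\boldsymbol{\Sigma}/(\nu-2)$, so the samples should reproduce the variance $(\boldsymbol{\Sigma} \otimes \boldsymbol{\Omega})/(\nu-2)$ when $\nu > 2$.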
In a Bayesian analysis of a multivariate linear regression model based on the matrix normal distribution, the matrix t-distribution is the posterior predictive distribution.[3]
For a matrix t-distribution, the probability density function at a point $\mathbf{X}$ in the space of real $n \times p$ matrices is

$$f(\mathbf{X};\nu,\mathbf{M},\boldsymbol{\Sigma},\boldsymbol{\Omega}) = K \times \left|\mathbf{I}_n + \boldsymbol{\Sigma}^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol{\Omega}^{-1}(\mathbf{X}-\mathbf{M})^{\rm T}\right|^{-\frac{\nu+n+p-1}{2}},$$

where the normalizing constant $K$ is given by

$$K = \frac{\Gamma_p\left(\frac{\nu+n+p-1}{2}\right)}{\pi^{\frac{np}{2}}\,\Gamma_p\left(\frac{\nu+p-1}{2}\right)}\,|\boldsymbol{\Omega}|^{-\frac{n}{2}}\,|\boldsymbol{\Sigma}|^{-\frac{p}{2}}.$$

Here $\Gamma_p$ is the multivariate gamma function.
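The density is easiest to evaluate on the log scale. The following is a minimal sketch of that computation; the function name `matrix_t_logpdf` is hypothetical, and it relies on `scipy.special.multigammaln` for $\log \Gamma_p$ and on linear solves rather than explicit inverses.

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import t as student_t


def matrix_t_logpdf(X, nu, M, Sigma, Omega):
    """Log-density of the matrix t-distribution at X (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    D = X - M
    a = 0.5 * (nu + n + p - 1)  # exponent of the determinant term

    # log K: ratio of multivariate gamma functions and determinant factors.
    log_K = (
        multigammaln(a, p)
        - multigammaln(0.5 * (nu + p - 1), p)
        - 0.5 * n * p * np.log(np.pi)
        - 0.5 * n * np.linalg.slogdet(Omega)[1]
        - 0.5 * p * np.linalg.slogdet(Sigma)[1]
    )

    # I_n + Sigma^{-1} (X - M) Omega^{-1} (X - M)^T, built from linear solves.
    inner = np.eye(n) + np.linalg.solve(Sigma, D) @ np.linalg.solve(Omega, D.T)
    return log_K - a * np.linalg.slogdet(inner)[1]


# Sanity check for n = p = 1: with Sigma = nu and Omega = 1 the density
# reduces to the standard univariate Student t with nu degrees of freedom.
x = 0.3
lp_matrix = matrix_t_logpdf(np.array([[x]]), 4.0, np.zeros((1, 1)),
                            np.array([[4.0]]), np.eye(1))
lp_scalar = student_t.logpdf(x, df=4.0)
```

The two values `lp_matrix` and `lp_scalar` should agree up to floating-point error, which gives a quick check on the normalizing constant.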
If $\mathbf{X} \sim \mathcal{T}_{n\times p}(\nu,\mathbf{M},\boldsymbol{\Sigma},\boldsymbol{\Omega})$, then the following properties hold:[2]

The mean, or expected value, is, if $\nu > 1$:

$$E[\mathbf{X}] = \mathbf{M}$$

and we have the following second-order expectations, if $\nu > 2$:

$$E[(\mathbf{X}-\mathbf{M})(\mathbf{X}-\mathbf{M})^{T}] = \frac{\boldsymbol{\Sigma}\operatorname{tr}(\boldsymbol{\Omega})}{\nu-2}$$

$$E[(\mathbf{X}-\mathbf{M})^{T}(\mathbf{X}-\mathbf{M})] = \frac{\boldsymbol{\Omega}\operatorname{tr}(\boldsymbol{\Sigma})}{\nu-2}$$

where $\operatorname{tr}$ denotes trace.

More generally, for appropriately dimensioned matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$:

$$\begin{aligned}
E[(\mathbf{X}-\mathbf{M})\mathbf{A}(\mathbf{X}-\mathbf{M})^{T}] &= \frac{\boldsymbol{\Sigma}\operatorname{tr}(\mathbf{A}^{T}\boldsymbol{\Omega})}{\nu-2}\\
E[(\mathbf{X}-\mathbf{M})^{T}\mathbf{B}(\mathbf{X}-\mathbf{M})] &= \frac{\boldsymbol{\Omega}\operatorname{tr}(\mathbf{B}^{T}\boldsymbol{\Sigma})}{\nu-2}\\
E[(\mathbf{X}-\mathbf{M})\mathbf{C}(\mathbf{X}-\mathbf{M})] &= \frac{\boldsymbol{\Sigma}\mathbf{C}^{T}\boldsymbol{\Omega}}{\nu-2}
\end{aligned}$$

Transpose transform:

$$\mathbf{X}^{T} \sim \mathcal{T}_{p\times n}(\nu,\mathbf{M}^{T},\boldsymbol{\Omega},\boldsymbol{\Sigma})$$

Linear transform: let $\mathbf{A}$ ($r \times n$) be of full rank $r \leq n$ and $\mathbf{B}$ ($p \times s$) be of full rank $s \leq p$. Then

$$\mathbf{AXB} \sim \mathcal{T}_{r\times s}(\nu,\mathbf{AMB},\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{T},\mathbf{B}^{T}\boldsymbol{\Omega}\mathbf{B})$$

The characteristic function and various other properties can be derived from the re-parameterised formulation (see below).
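As a consistency check (a worked sketch, not part of the cited sources), the linear-transform rule can be compared with the generalized second-moment formula. Writing $\mathbf{Y} = \mathbf{AXB}$ and applying the first generalized identity with the $p \times p$ weight matrix $\mathbf{B}\mathbf{B}^{T}$ gives, for $\nu > 2$:

```latex
\begin{aligned}
E\left[(\mathbf{Y}-\mathbf{AMB})(\mathbf{Y}-\mathbf{AMB})^{T}\right]
&= \mathbf{A}\,E\left[(\mathbf{X}-\mathbf{M})\,\mathbf{B}\mathbf{B}^{T}\,(\mathbf{X}-\mathbf{M})^{T}\right]\mathbf{A}^{T} \\
&= \mathbf{A}\,\frac{\boldsymbol{\Sigma}\,\operatorname{tr}\left((\mathbf{B}\mathbf{B}^{T})^{T}\boldsymbol{\Omega}\right)}{\nu-2}\,\mathbf{A}^{T} \\
&= \frac{(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{T})\,\operatorname{tr}(\mathbf{B}^{T}\boldsymbol{\Omega}\mathbf{B})}{\nu-2}.
\end{aligned}
```

The last line is the basic second-order expectation evaluated with row scale $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{T}$ and column scale $\mathbf{B}^{T}\boldsymbol{\Omega}\mathbf{B}$, so the two properties agree.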
Re-parameterized matrix t-distribution

Re-parameterized matrix t
Notation: $\mathrm{T}_{n,p}(\alpha, \beta, \mathbf{M}, \boldsymbol{\Sigma}, \boldsymbol{\Omega})$
Parameters:
  $\mathbf{M}$: location (real $n \times p$ matrix)
  $\boldsymbol{\Omega}$: scale (positive-definite real $p \times p$ matrix)
  $\boldsymbol{\Sigma}$: scale (positive-definite real $n \times n$ matrix)
  $\alpha > (p-1)/2$: shape parameter
  $\beta > 0$: scale parameter
Support: $\mathbf{X} \in \mathbb{R}^{n \times p}$
PDF: $$\frac{\Gamma_p(\alpha+n/2)}{(2\pi/\beta)^{\frac{np}{2}}\,\Gamma_p(\alpha)}\,|\boldsymbol{\Omega}|^{-\frac{n}{2}}\,|\boldsymbol{\Sigma}|^{-\frac{p}{2}} \times \left|\mathbf{I}_n + \frac{\beta}{2}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol{\Omega}^{-1}(\mathbf{X}-\mathbf{M})^{\rm T}\right|^{-(\alpha+n/2)}$$
CDF: no analytic expression
Mean: $\mathbf{M}$ if $\alpha > p/2$, else undefined
Variance: $\frac{2(\boldsymbol{\Sigma} \otimes \boldsymbol{\Omega})}{\beta(2\alpha-p-1)}$ if $\alpha > (p+1)/2$, else undefined
CF: see below
An alternative parameterisation of the matrix t-distribution uses two parameters $\alpha$ and $\beta$ in place of $\nu$.[3]
This formulation reduces to the standard matrix t-distribution with $\beta = 2$, $\alpha = \frac{\nu+p-1}{2}$.
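The reduction can be verified term by term. The following worked substitution (a sketch using only the formulas listed for the two parameterisations) sets $\beta = 2$ and $\alpha = (\nu+p-1)/2$ in the exponent, the normalizing constant, the variance, and the parameter constraints:

```latex
\begin{aligned}
\alpha + \tfrac{n}{2} &= \tfrac{\nu+p-1}{2} + \tfrac{n}{2} = \tfrac{\nu+n+p-1}{2}, \\
\left(\tfrac{2\pi}{\beta}\right)^{\frac{np}{2}} &= \pi^{\frac{np}{2}}, \qquad
\tfrac{\beta}{2}\,\boldsymbol{\Sigma}^{-1} = \boldsymbol{\Sigma}^{-1}, \\
\frac{2(\boldsymbol{\Sigma}\otimes\boldsymbol{\Omega})}{\beta(2\alpha-p-1)}
&= \frac{2(\boldsymbol{\Sigma}\otimes\boldsymbol{\Omega})}{2(\nu-2)}
 = \frac{\boldsymbol{\Sigma}\otimes\boldsymbol{\Omega}}{\nu-2}, \\
\alpha > \tfrac{p-1}{2} \iff \nu > 0, \qquad
\alpha &> \tfrac{p}{2} \iff \nu > 1, \qquad
\alpha > \tfrac{p+1}{2} \iff \nu > 2.
\end{aligned}
```

Each line recovers the corresponding quantity or existence condition of the standard parameterisation.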
This formulation of the matrix t-distribution can be derived as the compound distribution that results from an infinite mixture of a matrix normal distribution with an inverse multivariate gamma distribution placed over either of its covariance matrices.
If $\mathbf{X} \sim \mathrm{T}_{n,p}(\alpha,\beta,\mathbf{M},\boldsymbol{\Sigma},\boldsymbol{\Omega})$ then[2][3]

$$\mathbf{X}^{\rm T} \sim \mathrm{T}_{p,n}\left(\alpha + \tfrac{n-p}{2},\beta,\mathbf{M}^{\rm T},\boldsymbol{\Omega},\boldsymbol{\Sigma}\right).$$

The shape parameter changes from $\alpha$ to $\alpha + (n-p)/2$ so that the exponent $\alpha + n/2$ of the density is preserved; this matches the unchanged $\nu$ of the standard parameterisation, where $\alpha = (\nu+p-1)/2$ becomes $(\nu+n-1)/2$. For $n = p$ the shape parameter is unchanged. The property above comes from Sylvester's determinant theorem:
$$\det\left(\mathbf{I}_n + \frac{\beta}{2}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol{\Omega}^{-1}(\mathbf{X}-\mathbf{M})^{\rm T}\right) = \det\left(\mathbf{I}_p + \frac{\beta}{2}\boldsymbol{\Omega}^{-1}(\mathbf{X}^{\rm T}-\mathbf{M}^{\rm T})\boldsymbol{\Sigma}^{-1}(\mathbf{X}^{\rm T}-\mathbf{M}^{\rm T})^{\rm T}\right).$$

If $\mathbf{X} \sim \mathrm{T}_{n,p}(\alpha,\beta,\mathbf{M},\boldsymbol{\Sigma},\boldsymbol{\Omega})$ and $\mathbf{A}$ ($n \times n$) and $\mathbf{B}$ ($p \times p$) are nonsingular matrices then[2][3]

$$\mathbf{AXB} \sim \mathrm{T}_{n,p}(\alpha,\beta,\mathbf{AMB},\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{\rm T},\mathbf{B}^{\rm T}\boldsymbol{\Omega}\mathbf{B}).$$

The characteristic function is[3]

$$\phi_T(\mathbf{Z}) = \frac{\exp(\operatorname{tr}(i\mathbf{Z}'\mathbf{M}))\,|\boldsymbol{\Omega}|^{\alpha}}{\Gamma_p(\alpha)\,(2\beta)^{\alpha p}}\,|\mathbf{Z}'\boldsymbol{\Sigma}\mathbf{Z}|^{\alpha}\,B_{\alpha}\left(\frac{1}{2\beta}\mathbf{Z}'\boldsymbol{\Sigma}\mathbf{Z}\boldsymbol{\Omega}\right),$$

where

$$B_{\delta}(\mathbf{WZ}) = |\mathbf{W}|^{-\delta}\int_{\mathbf{S}>0}\exp\left(\operatorname{tr}(-\mathbf{SW}-\mathbf{S}^{-1}\mathbf{Z})\right)|\mathbf{S}|^{-\delta-\frac{1}{2}(p+1)}\,d\mathbf{S},$$

and where $B_{\delta}$ is the type-two Bessel function of Herz of a matrix argument.
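The determinant identity from Sylvester's theorem, used earlier in this section for the transpose property, can be checked numerically. The snippet below is a minimal sketch with randomly generated inputs; the helper names `random_spd` and `determinant_sides` are hypothetical.

```python
import numpy as np


def random_spd(rng, k):
    """Return a random symmetric positive-definite k x k matrix."""
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)


def determinant_sides(X, M, Sigma, Omega, beta=2.0):
    """Evaluate both sides of the Sylvester identity for the matrix t kernel."""
    n, p = X.shape
    D = X - M
    SD = np.linalg.solve(Sigma, D)      # Sigma^{-1} (X - M),   shape (n, p)
    OD = np.linalg.solve(Omega, D.T)    # Omega^{-1} (X - M)^T, shape (p, n)
    left = np.linalg.det(np.eye(n) + 0.5 * beta * SD @ OD)    # n x n determinant
    right = np.linalg.det(np.eye(p) + 0.5 * beta * OD @ SD)   # p x p determinant
    return left, right


rng = np.random.default_rng(0)
n, p = 4, 2
X = rng.standard_normal((n, p))
M = rng.standard_normal((n, p))
left, right = determinant_sides(X, M, random_spd(rng, n), random_spd(rng, p), beta=1.5)
```

The values `left` and `right` should agree up to rounding. In practice the identity also lets the determinant be computed in whichever of the two dimensions, $n$ or $p$, is smaller.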
1. Zhu, Shenghuo; Yu, Kai; Gong, Yihong (2007). "Predictive Matrix-Variate t Models". In J. C. Platt, D. Koller, Y. Singer, and S. Roweis (eds.), NIPS '07: Advances in Neural Information Processing Systems 20, pp. 1721–1728. MIT Press, Cambridge, MA, 2008. The notation is changed a bit in this article for consistency with the matrix normal distribution article.
2. Gupta, Arjun K.; Nagar, Daya K. (1999). Matrix Variate Distributions. CRC Press. Chapter 4.
3. Iranmanesh, Anis; Arashi, M.; Tabatabaey, S. M. M. (2010). "On Conditional Applications of Matrix Variate Normal Distribution". Iranian Journal of Mathematical Sciences and Informatics, 5:2, pp. 33–43.