I have been taking a graduate course on computational statistics this semester. Although the choice of priors in a Bayesian model is subjective and prone to human error, the hierarchical paradigm comes in pretty handy when we deal with sequentially dependent models such as some RNNs.

It is becoming more and more clear to me why Bayesian statistics is computationally intensive: mind you, what else can you resort to when you wish to integrate out a parameter over products of mixtures of exponential-family densities?

Indeed, consider the following simple hierarchical model:
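Concretely (reconstructing the display from the two conclusions derived below), the setup is the standard normal variance mixture:

$$
z \mid s \sim \mathcal{N}(0, s),
$$

with a prior placed on the variance $s > 0$.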

We consider two cases: either $s \sim \Gamma(a, b)$ or $s \sim \operatorname{Inv-}\Gamma(a, b)$. The integration in the first case is not very straightforward; nevertheless, it is given as a one-line fact in the original paper [West 1987]. We proceed using a technique introduced on Cross Validated.

### Gamma Prior
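For tractability take $a = 1$ (cf. the remark at the end), so that the prior reduces to the exponential density $p(s) = b\,e^{-bs}$. The marginal to be computed is then:

$$
p(z) = \int_0^\infty \frac{1}{\sqrt{2\pi s}}\, e^{-\frac{z^2}{2s}}\; b\, e^{-b s}\, ds.
$$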

This suggests the change of variable $v = \sqrt{s} > 0$. It follows that:
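Since $s = v^2$ and $ds = 2v\,dv$, and completing the square in the exponent,

$$
p(z) = \frac{2b}{\sqrt{2\pi}} \int_0^\infty e^{-\frac{z^2}{2v^2} - b v^2}\, dv
     = \frac{2b}{\sqrt{2\pi}}\, e^{-\sqrt{2b}\,\lvert z \rvert} \int_0^\infty e^{-\frac{1}{2}\left(\sqrt{2b}\,v - \frac{\lvert z \rvert}{v}\right)^2} dv.
$$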

Write $t = \sqrt{2b}\, v - \frac{\lvert z \rvert}{v} \in \mathbb{R}$. Note that $t(v)$ is monotone, with $t \rightarrow -\infty$ as $v \rightarrow 0^+$ and $t \rightarrow \infty$ as $v \rightarrow \infty$.

Moreover, note that
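inverting the relation $t = \sqrt{2b}\,v - \lvert z \rvert / v$, i.e. solving the quadratic $\sqrt{2b}\,v^2 - t v - \lvert z \rvert = 0$ for $v$, gives

$$
v = \frac{t + \sqrt{t^2 + 4\sqrt{2b}\,\lvert z \rvert}}{2\sqrt{2b}},
$$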

where the other root is rejected since $v\ge 0$. Hence
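differentiating $v(t)$,

$$
\frac{dv}{dt} = \frac{1}{2\sqrt{2b}} \left( 1 + \frac{t}{\sqrt{t^2 + 4\sqrt{2b}\,\lvert z \rvert}} \right).
$$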

It follows that
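changing the integration variable from $v$ to $t$ (with the Jacobian $dv/dt$ above),

$$
\begin{aligned}
p(z) &= \frac{2b}{\sqrt{2\pi}}\, e^{-\sqrt{2b}\,\lvert z \rvert} \int_0^\infty e^{-\frac{1}{2} t(v)^2}\, dv \\
&= \frac{2b}{\sqrt{2\pi}}\, e^{-\sqrt{2b}\,\lvert z \rvert} \int_{-\infty}^{\infty} e^{-t^2/2}\; \frac{1}{2\sqrt{2b}} \left( 1 + \frac{t}{\sqrt{t^2 + 4\sqrt{2b}\,\lvert z \rvert}} \right) dt \\
&= \frac{2b}{\sqrt{2\pi}}\, e^{-\sqrt{2b}\,\lvert z \rvert} \cdot \frac{1}{2\sqrt{2b}} \int_{-\infty}^{\infty} e^{-t^2/2}\, dt,
\end{aligned}
$$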

where the last equality follows since the second term in the Jacobian is an odd function of $t$ and integrates to zero against the even Gaussian factor.

Evaluating the remaining Gaussian integral, $\int_{-\infty}^{\infty} e^{-t^2/2}\, dt = \sqrt{2\pi}$, gives $p(z) = \frac{\sqrt{2b}}{2}\, e^{-\sqrt{2b}\,\lvert z \rvert}$. We therefore conclude that when $a=1$, $z \sim \mathrm{DE}(\sqrt{2b})$.
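As a quick sanity check (a minimal Monte Carlo sketch with NumPy, not from the original derivation), sampling from the hierarchy should reproduce the moments of $\mathrm{DE}(\sqrt{2b})$, whose variance is $2/(\sqrt{2b})^2 = 1/b$:

```python
import numpy as np

rng = np.random.default_rng(0)
b = 2.0
n = 500_000

# Hierarchy: s ~ Gamma(1, b) = Exponential(rate=b), then z | s ~ N(0, s).
s = rng.exponential(scale=1.0 / b, size=n)
z = rng.normal(loc=0.0, scale=np.sqrt(s))

# DE(sqrt(2b)) has mean 0 and variance 1/b.
print(z.mean(), z.var())  # ≈ 0 and ≈ 1/b = 0.5
```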

### Inverse Gamma Prior
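Here the marginal integrates in closed form; a sketch of the computation, using the Gamma integral $\int_0^\infty s^{-c-1} e^{-\lambda/s}\, ds = \Gamma(c)\,\lambda^{-c}$ (seen by substituting $u = 1/s$):

$$
\begin{aligned}
p(z) &= \int_0^\infty \frac{1}{\sqrt{2\pi s}}\, e^{-\frac{z^2}{2s}}\; \frac{b^a}{\Gamma(a)}\, s^{-a-1} e^{-b/s}\, ds \\
&= \frac{b^a}{\sqrt{2\pi}\,\Gamma(a)} \int_0^\infty s^{-\left(a+\frac{1}{2}\right)-1} e^{-\left(b + \frac{z^2}{2}\right)/s}\, ds \\
&= \frac{b^a\, \Gamma\!\left(a+\frac{1}{2}\right)}{\sqrt{2\pi}\,\Gamma(a)} \left( b + \frac{z^2}{2} \right)^{-\left(a+\frac{1}{2}\right)}.
\end{aligned}
$$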

That is, when $a = b = \nu/2$, the marginal simplifies to the Student-$t$ density with $\nu$ degrees of freedom: $z \sim t_{\nu}$.
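Again a quick Monte Carlo sanity check (a sketch, not from the original derivation): with $s \sim \operatorname{Inv-}\Gamma(\nu/2, \nu/2)$, the marginal variance should match $\operatorname{Var}(t_\nu) = \nu/(\nu-2)$:

```python
import numpy as np

rng = np.random.default_rng(1)
nu = 10.0
n = 500_000

# s ~ Inv-Gamma(nu/2, nu/2): draw g ~ Gamma(shape=nu/2, rate=nu/2), set s = 1/g.
g = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
s = 1.0 / g
z = rng.normal(loc=0.0, scale=np.sqrt(s))

# Student-t with nu degrees of freedom has variance nu / (nu - 2).
print(z.var())  # ≈ 10/8 = 1.25
```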

### Remark

There may exist simpler and more elegant methods of tackling the first integral, for example via moment generating functions or by converting it into a contour integral. I may post a further update once I grasp them.

Moreover, setting $a=1$ in the first case is solely for the sake of integration; I am not sure whether the general integral admits a closed form otherwise.

1. West, M. (1987). On scale mixtures of normal distributions. *Biometrika*, 74(3), 646–648.