Bayes estimator
From Wikipedia, the free encyclopedia
In decision theory and estimation theory, a Bayes estimator is an estimator or decision rule that maximizes the posterior expected value of a utility function or minimizes the posterior expected value of a loss function (also called posterior expected loss).
Definition
Suppose an unknown parameter θ is known to have a prior distribution π. Let δ be an estimator of θ (based on some measurements), and let R(θ,δ) be a risk function, such as the mean squared error. The Bayes risk of δ is defined as Eπ{R(θ,δ)}, where the expectation is taken over the probability distribution of θ. An estimator δ is said to be a Bayes estimator if it minimizes the Bayes risk among all estimators. The estimator which minimizes the posterior expected loss for each x also minimizes the Bayes risk and therefore is a Bayes estimator.
If the prior is improper then an estimator which minimizes the posterior expected loss for each x is called a generalized Bayes estimator.
Examples
Minimum mean square error estimation
The most common risk function used for Bayesian estimation is the mean square error (MSE), also called squared error risk. The MSE is defined by

MSE = E[(δ(x) − θ)²],

where the expectation is taken over the joint distribution of θ and x.
Using the MSE as risk, the Bayes estimate of the unknown parameter is simply the mean of the posterior distribution,

δ(x) = E[θ|x].
This is known as the minimum mean square error (MMSE) estimator. The Bayes risk, in this case, is the posterior variance.
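The posterior-mean property can be checked numerically on a discretized posterior. A minimal Python sketch; the grid, uniform prior, and binomial likelihood below are illustrative assumptions, not part of the article:

```python
# Sketch: on a discretized posterior, the point estimate minimizing the
# posterior expected squared loss coincides with the posterior mean
# (the MMSE estimate), up to grid resolution.

def posterior_grid(grid, prior, likelihood):
    """Normalize prior × likelihood on a grid of parameter values."""
    unnorm = [p * likelihood(theta) for theta, p in zip(grid, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def expected_sq_loss(a, grid, post):
    """Posterior expected squared error of the point estimate a."""
    return sum(p * (a - theta) ** 2 for theta, p in zip(grid, post))

# Illustrative example: binomial likelihood (7 successes in 10 trials),
# uniform prior on a grid of 101 points.
grid = [i / 100 for i in range(101)]
prior = [1 / len(grid)] * len(grid)
lik = lambda th: th ** 7 * (1 - th) ** 3
post = posterior_grid(grid, prior, lik)

post_mean = sum(th * p for th, p in zip(grid, post))
# The grid point with the smallest posterior expected squared loss
# is the one nearest the posterior mean.
best = min(grid, key=lambda a: expected_sq_loss(a, grid, post))
```

With a uniform prior this posterior is (a discretization of) Beta(8, 4), whose mean is 8/12, and the minimizing grid point lands on the nearest grid value.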
Bayes estimators for conjugate priors
If there is no inherent reason to prefer one prior probability distribution over another, a conjugate prior is sometimes chosen for simplicity. A conjugate prior is defined as a prior distribution belonging to some parametric family, for which the resulting posterior distribution also belongs to the same family. This is an important property, since the Bayes estimator, as well as its statistical properties (variance, confidence interval, etc.), can all be derived from the posterior distribution.
Conjugate priors are especially useful for sequential estimation, where the posterior of the current measurement is used as the prior in the next measurement. In sequential estimation, unless a conjugate prior is used, the posterior distribution typically becomes more complex with each added measurement, and the Bayes estimator cannot usually be calculated without resorting to numerical methods.
Following are some examples of conjugate priors.
- If x|θ is normal, x|θ ~ N(θ,σ²), and the prior is normal, θ ~ N(μ,τ²), then the posterior is also normal and the Bayes estimator under MSE is given by

  δ(x) = (τ²/(σ²+τ²)) x + (σ²/(σ²+τ²)) μ.
- If x1,...,xn are iid Poisson random variables, xi|θ ~ P(θ), and if the prior is Gamma distributed, θ ~ G(a,b) with shape a and rate b, then the posterior is also Gamma distributed, G(a+Σxi, b+n), and the Bayes estimator under MSE is given by

  δ = (a + Σxi)/(b + n).
- If x1,...,xn are iid uniformly distributed, xi|θ ~ U(0,θ), and if the prior is Pareto distributed, θ ~ Pa(θ0,a), then the posterior is also Pareto distributed, Pa(max(θ0, x1,...,xn), a+n), and the Bayes estimator under MSE is given by

  δ = (a+n) max(θ0, x1,...,xn) / (a+n−1),   assuming a+n > 1.
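Two of these conjugate updates can be written directly as code. A minimal sketch, assuming the Gamma prior is parameterized by shape a and rate b (the numeric inputs are illustrative):

```python
# Minimal sketches of two conjugate-prior Bayes estimators under MSE
# (i.e. posterior means).

def normal_normal_mmse(x, sigma2, mu, tau2):
    """Posterior mean for x|θ ~ N(θ, σ²) with prior θ ~ N(μ, τ²)."""
    w = tau2 / (sigma2 + tau2)      # weight given to the observation
    return w * x + (1 - w) * mu

def poisson_gamma_mmse(xs, a, b):
    """Posterior mean for iid xi|θ ~ Poisson(θ) with prior θ ~ Gamma(a, b)
    in shape-rate form: the posterior is Gamma(a + Σxi, b + n)."""
    return (a + sum(xs)) / (b + len(xs))
```

For example, with σ² = τ², the normal-normal estimate is the simple average of the observation and the prior mean.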
Alternative risk functions
Risk functions are chosen depending on how one measures the distance between the estimate and the unknown parameter. The MSE is the most common risk function in use, primarily due to its simplicity. However, alternative risk functions are also occasionally used. The following are several examples of such alternatives. We denote the posterior generalized distribution function by F.
- A "linear" loss function, with a > 0, which yields the posterior median as the Bayes' estimate:

  L(θ,δ) = a|θ−δ|, with the Bayes estimator satisfying F(δ(x)|x) = 1/2.
- Another "linear" loss function, which assigns different "weights" a,b > 0 to underestimation and overestimation. It yields a quantile of the posterior distribution, and is a generalization of the previous loss function:

  L(θ,δ) = a(θ−δ) for θ−δ ≥ 0, and b(δ−θ) for θ−δ < 0, with the Bayes estimator satisfying F(δ(x)|x) = a/(a+b).
- The following loss function is trickier: it yields either the posterior mode, or a point close to it, depending on the curvature and properties of the posterior distribution. Small values of the parameter K > 0 are recommended, in order to use the mode as an approximation (L > 0):

  L(θ,δ) = 0 if |θ−δ| < K, and L(θ,δ) = L if |θ−δ| ≥ K.
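The linear-loss cases can be illustrated numerically: given draws from the posterior, the minimizer of the symmetric linear loss is the posterior median, and the asymmetric version with weights a (underestimation) and b (overestimation) is minimized at the a/(a+b) quantile. A sketch by enumeration; the sample values below are an illustrative stand-in for real posterior draws:

```python
# Enumerate candidate point estimates and pick the one minimizing
# the posterior expected linear loss.

def expected_linear_loss(est, samples, a=1.0, b=1.0):
    """a penalizes underestimation (est < θ), b penalizes overestimation."""
    total = 0.0
    for th in samples:
        total += a * (th - est) if est < th else b * (est - th)
    return total / len(samples)

samples = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Symmetric weights: the minimizer is the posterior median.
median_est = min(samples, key=lambda e: expected_linear_loss(e, samples))

# Asymmetric weights a=3, b=1: the minimizer is the 3/4 quantile.
quantile_est = min(samples,
                   key=lambda e: expected_linear_loss(e, samples, a=3.0, b=1.0))
```

On these nine equally weighted samples the median is 0.5 and the 3/4 quantile is 0.7.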
Other loss functions can be conceived, although the mean squared error is the most widely used and validated.
Generalized Bayes estimator
An improper prior has infinite mass, so the Bayes risk is usually infinite and therefore not meaningful. However, the posterior expected loss usually exists and is given by

∫ L(θ,a) π(θ|x) dθ,
where L is the loss function, a is an action and π(θ|x) is the posterior density.
A generalized Bayes estimator, for a given x, is an action which minimizes this posterior expected loss (when the prior π(θ) is improper).
A useful example is estimation of a location parameter under a loss function of the form L(a−θ). Here θ is a location parameter, i.e. f(x|θ) = f(x−θ). It is common to use the improper prior π(θ) = 1 in this case, especially when no other, more subjective information is available. This yields

π(θ|x) = π(θ) f(x|θ) = f(x−θ),

so the posterior expected loss is (substituting y = x−θ and defining C = a−x)

E[L(a−θ)|x] = ∫ L(a−θ) f(x−θ) dθ = ∫ L(y+C) f(y) dy = E[L(y+C)].

The generalized Bayes estimator is therefore x + C, where C is a constant minimizing E[L(y+C)]. Under MSE, as a special case, C = −E[y], and the generalized Bayes estimator is δ(x) = x − E[y].
For example, for Gaussian samples X|θ ~ N(θ,Ip), where X=(x1,...,xp) and θ=(θ1,...,θp), we have E[y] = 0, so the generalized Bayes estimator of θ is δ(X) = X.
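The argument can be sketched numerically: approximate the sampling density f by a symmetric set of y values (an assumption standing in for a symmetric density such as the Gaussian), and search for the constant C minimizing E[L(y + C)]:

```python
# Sketch: the generalized Bayes estimator is x + C, where C minimizes
# E[L(y + C)] with y = x − θ. Under squared error loss this gives
# C = −E[y], which is 0 for a symmetric density.

y_vals = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]   # symmetric grid, so E[y] = 0

def expected_loss(c, loss=lambda d: d * d):
    """Approximate E[L(y + C)] over the grid of y values."""
    return sum(loss(y + c) for y in y_vals) / len(y_vals)

# Search over candidate constants C.
candidates = [i / 10 for i in range(-20, 21)]
best_c = min(candidates, key=expected_loss)
# best_c is −E[y] = 0 here, recovering δ(x) = x under MSE.
```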
Empirical Bayes estimator
A Bayes estimator derived through the empirical Bayes method is called an empirical Bayes estimator. Empirical Bayes methods enable the use of auxiliary empirical data, from past observations, in constructing a Bayes estimator, under the assumption that the estimated parameters are drawn from a common prior. Similarly, in compound decision problems, where independent observations are made simultaneously, the data from the current observations can be used.
Parametric empirical Bayes (PEB) is usually preferable since it is more applicable and more accurate on small amounts of data.[1]
An example of PEB estimation: given past observations x1,...,xn with conditional distributions f(xi|θi), we wish to estimate θn+1 based on xn+1.
Assuming that the θi have a common prior with a specific parametric form (e.g. normal), we can use the past observations to determine the mean μπ and variance σπ² of that prior in the following way.

First, we estimate the mean μm and variance σm² of the marginal distribution of x1,...,xn by

μm = (1/n) Σ xi,  σm² = (1/n) Σ (xi − μm)².

Then we use the following relations, where μf(θ) and σf(θ)² are the mean and variance of the conditional distribution:

μm = Eπ[μf(θ)],  σm² = Eπ[σf(θ)²] + Eπ[(μf(θ) − μm)²].

Further assuming that μf(θ) = θ and that σf(θ)² = K is constant, we get

μm = Eπ[θ] = μπ,  σm² = K + Eπ[(θ − μπ)²] = K + σπ².

So finally, the estimated moments of the prior are

μπ = μm,  σπ² = σm² − K.

Now, if for example xi|θi ~ N(θi,1) and we assume a normal prior (which is the conjugate prior in this case), then θn+1 ~ N(μπ, σπ²) with μπ = μm and σπ² = σm² − 1, and we can calculate the Bayes estimator of θn+1 based on xn+1 using the normal conjugate case above.
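The recipe above can be sketched end to end for the xi|θi ~ N(θi, 1) case with an assumed normal prior. The data values are illustrative; truncating a negative variance estimate at zero is a common practical safeguard not discussed in the article:

```python
# Parametric empirical Bayes sketch for xi|θi ~ N(θi, K) with K = 1:
# estimate marginal moments from past data, recover the prior moments
# (μπ = μm, σπ² = σm² − K), then apply the normal-normal posterior mean
# to the new observation.

def peb_estimate(past_xs, x_new, k=1.0):
    n = len(past_xs)
    mu_m = sum(past_xs) / n
    var_m = sum((x - mu_m) ** 2 for x in past_xs) / n
    mu_pi = mu_m                         # estimated prior mean
    var_pi = max(var_m - k, 0.0)         # estimated prior variance, truncated at 0
    # Normal-normal posterior mean with observation variance k:
    w = var_pi / (var_pi + k)
    return w * x_new + (1 - w) * mu_pi
```

For past data [0, 2, 4, 6, 8] the marginal moments are μm = 4 and σm² = 8, so the prior estimate is N(4, 7), and a new observation x = 12 is shrunk slightly toward 4.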
Properties
Admissibility of Bayes estimators
Bayes rules with finite Bayes risk are typically admissible:
- If a Bayes rule is unique then it is admissible. For example, as stated above, under mean squared error (MSE) the Bayes rule is unique and therefore admissible.
- For discrete θ, Bayes rules are admissible.
- For continuous θ, if the risk function R(θ,δ) is continuous in θ for every δ, then the Bayes rules are admissible.
However, generalized Bayes rules usually have infinite Bayes risk. These can be inadmissible, and verifying their admissibility can be difficult. For example, the generalized Bayes estimator of θ based on Gaussian samples described in the "Generalized Bayes estimator" section above is inadmissible for p > 2, since the James-Stein estimator is known to have smaller risk for all θ.
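For reference, the James-Stein estimator mentioned here has the closed form δJS(X) = (1 − (p−2)/‖X‖²)·X for X|θ ~ N(θ, Ip). A minimal sketch (the input vector is illustrative):

```python
# James-Stein estimator: shrink the observation vector toward the origin
# by a factor depending on its squared norm. Dominates δ(X) = X for p > 2.

def james_stein(x):
    p = len(x)
    norm_sq = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) / norm_sq
    return [shrink * v for v in x]
```

For instance, with p = 3 and X = (3, 0, 4), ‖X‖² = 25 and the shrinkage factor is 1 − 1/25 = 0.96.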
Asymptotic efficiency of Bayes estimators
Suppose that x1,…,xn are iid samples with density f(xi|θ) and δn = δ(x1,…,xn) is the Bayes estimator of θ. In addition, let θ0 be the true (unknown) value of θ. While Bayesian analysis assumes θ has density π(θ) and posterior density π(θ|X), for analyzing the asymptotic behavior of δn we regard θ0 as a deterministic parameter. Under specific conditions,[2] for large samples (large values of n), the posterior density of θ is approximately normal. This means that for large n, the effect of the prior placed on θ diminishes.
Moreover, if δn is the Bayes estimator under MSE, then it is asymptotically unbiased and converges in distribution to the normal distribution:

√n (δn − θ0) → N(0, 1/I(θ0)),

where I(θ0) is the Fisher information at θ0. It follows that the Bayes estimator δn under MSE is asymptotically efficient.
Another estimator which is asymptotically normal and efficient is the maximum likelihood estimator (MLE). The relationship between the two for large samples can be shown in the following simple example. Consider the estimation of θ based on a binomial sample x ~ b(θ,n), where θ denotes the probability of success. Assuming the prior of θ is a Beta distribution, B(a,b), this is a conjugate prior and the posterior distribution is known to be B(a+x, b+n−x). So the Bayes estimator under MSE is

δn(x) = E[θ|x] = (a + x)/(a + b + n).

The MLE in this case is x/n, so we get

δn(x) = (a + b)/(a + b + n) · a/(a + b) + n/(a + b + n) · x/n,

a weighted average of the prior mean a/(a+b) and the MLE x/n.
The last expression implies that, for n → ∞, the Bayes estimator (in the described problem) is close to the MLE. On the other hand, when n is small, the prior information is more dominant.
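The comparison in this example can be sketched directly (the prior parameters and counts are illustrative):

```python
# Beta-binomial sketch: the Bayes estimator under MSE is the posterior
# mean of B(a+x, b+n−x), and it approaches the MLE x/n as n grows.

def bayes_binomial(x, n, a, b):
    """Posterior-mean (MMSE) estimate under a Beta(a, b) prior."""
    return (a + x) / (a + b + n)

def mle_binomial(x, n):
    """Maximum likelihood estimate of the success probability."""
    return x / n

# With a uniform Beta(1, 1) prior and 7 successes in 10 trials, the Bayes
# estimate is 8/12, noticeably shrunk toward the prior mean 1/2; with
# 7000 successes in 10000 trials it is nearly indistinguishable from 0.7.
```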
References
- Lehmann, E. L.; Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer. ISBN 0-387-98502-6.
- Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer-Verlag, New York. ISBN 0-387-96098-8 (also ISBN 3-540-96098-8).