By David L. Neuhoff, Department of Electrical Engineering and Computer Science, University of Michigan, USA, neuhoff@umich.edu
The usual answer to the question “What probability distribution maximizes entropy or differential entropy of a random variable X subject to the constraint that the expected value of a real-valued function g applied to X has a specified value μ?” is an exponential distribution (probability mass or probability density function), with g(x) in the exponent multiplied by a parameter λ, and with the parameter chosen so the exponential distribution causes the expected value of g(X) to equal μ. The latter is called moment matching. While it is well-known that, when there are multiple expected value constraints, there are functions and expected value specifications for which moment matching is not possible, it is not well-known that this can happen when there is a single expected value constraint and a single parameter.
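For concreteness, the standard form of this answer can be sketched as follows (the notation pλ and Z(λ) is chosen here for illustration and is not necessarily that of the monograph): in the discrete case,
\[
p_\lambda(x) \;=\; \frac{e^{\lambda g(x)}}{Z(\lambda)}, \qquad
Z(\lambda) \;=\; \sum_x e^{\lambda g(x)}, \qquad
\text{with } \lambda \text{ chosen so that } \sum_x g(x)\, p_\lambda(x) = \mu,
\]
and in the continuous case the sums are replaced by integrals, giving a density rather than a mass function.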
This motivates the present monograph, whose goal is to reexamine the question posed above, and to derive its answer in an accessible, self-contained and complete fashion. It also derives the maximum entropy/differential entropy when there is a constraint on the support of the probability distributions, when there is only a bound on the expected value, and when there is a variance constraint. Properties of the resulting maximum entropy/differential entropy as a function of μ are derived, such as its convexity and its monotonicities. Example functions are presented, including many for which moment matching is possible for all relevant values of μ, and some for which it is not. Indeed, there can be only subtle differences between the two kinds of functions.
As one-parameter exponential probability distributions play a dominant role, one section of this monograph provides a self-contained discussion and derivation of their properties, such as the finiteness and continuity of the exponential normalizing constant (sometimes called the partition function) as λ varies, the finiteness, continuity, monotonicity and limits of the expected value of g(X) under the exponential distribution as λ varies, and similar issues for entropy and differential entropy. Most of these are needed in deriving the maximum entropy/differential entropy or the properties of the resulting function of μ.
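As a rough guide to the quantities studied there (again a sketch in illustrative notation, stated for the continuous case and under the usual regularity conditions permitting differentiation under the integral sign):
\[
Z(\lambda) = \int e^{\lambda g(x)}\,dx, \qquad
\mathbb{E}_{p_\lambda}[g(X)] = \frac{d}{d\lambda}\,\ln Z(\lambda), \qquad
h(p_\lambda) = \ln Z(\lambda) - \lambda\,\mathbb{E}_{p_\lambda}[g(X)],
\]
where h denotes differential entropy (in nats); the discrete case is analogous, with sums in place of integrals and entropy in place of differential entropy.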
Aside from addressing the question posed initially, this monograph can be viewed as a warmup for discussions of maximizing entropy/differential entropy with multiple expected value constraints and of multiparameter exponential families. It also provides a small taste of information geometry.