In information theory, the conditional entropy (or equivocation) quantifies the remaining entropy (i.e. uncertainty) of a random variable Y given that the value of another random variable X is known. It is referred to as the entropy of Y conditional on X, and is written H(Y | X). Like other entropies, the conditional entropy is measured in bits, nats, or bans.
More precisely, if H(Y | X = x) is the entropy of the variable Y conditional on the variable X taking a certain value x, then H(Y | X) is the result of averaging H(Y | X = x) over all possible values x that X may take.
Given discrete random variable X with support and Y with support , the conditional entropy of Y given X is defined as:
Note: The supports of X and Y can be replaced by their domains if it is understood that 0log0 should be treated as being equal to zero.
From this definition and the definition of conditional probability, the chain rule for conditional entropy is
This is true because
Intuitively, the combined system contains H(X,Y) bits of information: we need H(X,Y) bits of information to reconstruct its exact state. If we learn the value of X, we have gained H(X) bits of information, and the system has H(Y | X) bits of uncertainty remaining.
H(Y | X) = 0 if and only if the value of Y is completely determined by the value of X. Conversely, H(Y | X) = H(Y) if and only if Y and X are independent random variables.
Generalization to quantum theory
In quantum information theory, the conditional entropy is generalized to the conditional quantum entropy.
For any X and Y:
H(X,Y) = H(X | Y) + H(Y | X) + I(X;Y), where I(X;Y) is the mutual information between X and Y.
, where I(X;Y) is the mutual information between X and Y.
For independent X and Y:
H(Y | X) = H(Y) and H(X | Y) = H(X)
Although the specific-conditional entropy, H(X | Y = y), can be either lesser or greater than H(X | Y), H(X | Y = y) can never exceed H(X) when X is the uniform distribution.
- Theresa M. Korn; Korn, Granino Arthur. Mathematical Handbook for Scientists and Engineers: Definitions, Theorems, and Formulas for Reference and Review. New York: Dover Publications. pp. 613–614. ISBN 0-486-41147-8.
- C. Arndt (2001). Information Measures: Information and its description in Science and Engineering. Berlin: Springer. pp. 370–373. ISBN 3-540-41633-1.
- Entropy (information theory)
- Mutual information
- Conditional quantum entropy
- Variation of information
Wikimedia Foundation. 2010.
Look at other dictionaries:
Conditional quantum entropy — The conditional quantum entropy is an entropy measure used in quantum information theory. It is a generalization of the conditional entropy of classical information theory. The conditional entropy is written S(ρ | σ), or H(ρ | σ), depending on… … Wikipedia
Entropy (information theory) — In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information… … Wikipedia
Conditional mutual information — In probability theory, and in particular, information theory, the conditional mutual information is, in its most basic form, the expected value of the mutual information of two random variables given the value of a third. Contents 1 Definition 2… … Wikipedia
Entropy of mixing — The entropy of mixing is the change in the configuration entropy, an extensive thermodynamic quantity, when two different chemical substances or components are mixed. This entropy change must be positive since there is more uncertainty about the… … Wikipedia
Conditional random field — A conditional random field (CRF) is a statistical modelling method often applied in pattern recognition. More specifically it is a type of discriminative undirected probabilistic graphical model. It is used to encode known relationships between… … Wikipedia
Joint quantum entropy — The joint quantum entropy generalizes the classical joint entropy to the context of quantum information theory. Intuitively, given two quantum states ho and sigma, represented as density operators that are subparts of a quantum system, the joint… … Wikipedia
Joint entropy — The joint entropy is an entropy measure used in information theory. The joint entropy measures how much entropy is contained in a joint system of two random variables. If the random variables are X and Y, the joint entropy is written H(X,Y). Like … Wikipedia
Cross entropy — In information theory, the cross entropy between two probability distributions measures the average number of bits needed to identify an event from a set of possibilities, if a coding scheme is used based on a given probability distribution q,… … Wikipedia
Differential entropy — (also referred to as continuous entropy) is a concept in information theory that extends the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continuous probability distributions. Contents 1 Definition 2… … Wikipedia
Maximum-entropy Markov model — MEMM redirects here. For the German Nordic combined skier, see Silvio Memm. In machine learning, a maximum entropy Markov model (MEMM), or conditional Markov model (CMM), is a graphical model for sequence labeling that combines features of hidden … Wikipedia