The online information entropy (Shannon entropy) calculator supports calculating the entropy of binary data, text data, and probability distributions. The logarithmic base can be set to 2, e, or 10, and the precision of the result can also be configured.
Information entropy (also known as Shannon entropy) is a core concept in information theory,
proposed by Claude Shannon in 1948, used to quantify the uncertainty or randomness of
information. It answers the question of how much information a message actually contains.
Information entropy measures the average uncertainty of an information source (or random event).
The higher the entropy value, the greater the uncertainty and
randomness of the system, making it more difficult to predict.
The lower the entropy value, the more ordered and deterministic
the system is.
- Input Content : Enter the data for which the entropy will be calculated.
- Data Type : Select the data type of the input data or opened file. This tool supports the
following three types of data:
- Binary Data : The data or opened file is binary data.
- Text Data : The input data or opened file is character text.
- Probability Data : The input data or opened file is probability data, given as comma-separated numerical values. Each probability must be greater than 0 and less than or equal to 1, and all probabilities must sum to 1.
- Data Format : When the data type is binary data, select the format of the input binary data; Hex and Base64 are supported.
- Calculate Type : When the data type is binary data, select whether to calculate bit-level or byte-level entropy. At the bit level the entropy range is 0 <= H <= 1 (base 2); at the byte level it is 0 <= H <= 8 (base 2). A sketch comparing the two levels appears after this list.
- Base : Select the logarithmic base for the entropy calculation. This tool supports the
following three bases:
- 2 : The unit of entropy is the bit. This is the most commonly used base; it matches the binary system and is standard in computer science, communication engineering, and electronics.
- e : The unit of entropy is the nat. It is common in theoretical derivations and mathematics, where it sometimes simplifies formulas.
- 10 : The unit of entropy is the hartley (also called the ban or dit). It is used less often.
- Calculate Formula : H(X) = -∑ P(x_i) log_b P(x_i), for i = 1, 2, …, n, where P(x_i) is the probability of x_i and b is the logarithmic base, which determines the unit of entropy. The maximum entropy is Hmax = log_b(N), where N is the number of possible values. A worked sketch of this formula appears after this list.
- According to information theory research, without considering context (zero-order entropy), the entropy of Chinese text is approximately 9.5-9.7 bits per Chinese character; after accounting for context dependency (higher-order entropy), the effective entropy drops to 5-6 bits per character or even lower. The entropy of English text is approximately 4.0-4.7 bits per letter.
- When the input data is text, if the input text consists entirely of ASCII characters, the maximum entropy value is log_b(128). If the input text consists entirely of Chinese characters, the maximum entropy value is log_b(6763) (the number of GB2312 Chinese characters). In other cases, the maximum entropy value is not calculated.
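To make the formula concrete, here is a minimal Python sketch of zero-order entropy for a probability distribution and for a text string, with a selectable base. It is an illustration under the definitions above, not the calculator's actual implementation; the function names (`shannon_entropy`, `text_entropy`, `max_entropy`) are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(probabilities, base=2):
    """H = -sum(p * log_b(p)) for a probability distribution that sums to 1."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

def text_entropy(text, base=2):
    """Zero-order (no context) per-character entropy of a text string."""
    counts = Counter(text)
    total = len(text)
    return shannon_entropy([c / total for c in counts.values()], base)

def max_entropy(n_symbols, base=2):
    """Upper bound Hmax = log_b(N) for N equally likely symbols."""
    return math.log(n_symbols, base)

if __name__ == "__main__":
    print(shannon_entropy([0.5, 0.25, 0.25]))  # 1.5 bits
    print(text_entropy("hello world"))         # bits per character (base 2)
    print(max_entropy(128))                    # 7.0 bits: all-ASCII upper bound
```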
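The bit-level versus byte-level distinction for binary data can be sketched the same way; again this is an assumed illustration with hypothetical helper names, not the tool's own code. Hex input is decoded to bytes, and the frequencies are counted either over byte values (256 possible symbols, so H <= 8 in base 2) or over individual bits (2 symbols, so H <= 1).

```python
import math
from collections import Counter

def _entropy(counts, base=2):
    """Entropy of a frequency table (symbol -> count)."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())

def byte_entropy_hex(hex_string, base=2):
    """Byte-level entropy of hex-encoded data: 0 <= H <= 8 in base 2."""
    data = bytes.fromhex(hex_string)
    return _entropy(Counter(data), base)

def bit_entropy_hex(hex_string, base=2):
    """Bit-level entropy of hex-encoded data: 0 <= H <= 1 in base 2."""
    data = bytes.fromhex(hex_string)
    bits = "".join(f"{byte:08b}" for byte in data)
    return _entropy(Counter(bits), base)

if __name__ == "__main__":
    sample = "00ff00ff"                  # two distinct bytes, balanced bits
    print(byte_entropy_hex(sample))      # 1.0 (two equally likely byte values)
    print(bit_entropy_hex(sample))       # 1.0 (equal numbers of 0s and 1s)
```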