哈夫曼编码 · 算法

[TOC] # 前缀码对每一个字符规定一个0,1串作为其代码，并要求任一字符的代码都不是其他字符的前缀。这种编码称为前缀码。译码过程需要方便地取出编码的前缀，因此需要一种表示前缀码的合适的数据结构。为此，我们选用了二叉树。 # 哈夫曼树在一般的数据结构的书中，树的那章后面，著者一般都会介绍一下哈夫曼(HUFFMAN)树和哈夫曼编码。哈夫曼是哈夫曼树的一个应用。哈夫曼编码应用广泛，如JPEG中就应用了哈夫曼编码。首先介绍什么是哈夫曼树。哈夫曼树又称最优二叉树，是一种带权路径长度最短的二叉树。所谓树的带权路径长度，就是树中所有的叶结点的权值乘上其到根结点的路径长度（若根结点为0层，叶结点到根结点的路径长度为叶结点的层数）。树的带权路径长度记为WPL= (W1\*L1+W2\*L2+W3\*L3+...+Wn\*Ln)，N个权值Wi(i=1,2,...n)构成一棵有N个叶结点的二叉树，相应的叶结点的路径长度为Li(i=1,2,...n)。可以证明哈夫曼树的WPL是最小的。 * **哈夫曼编码步骤**： 1. 对给定的n个权值{W1,W2,W3,...,Wi,...,Wn}构成n棵二叉树的初始集合F= {T1,T2,T3,...,Ti,...,Tn}，其中每棵二叉树Ti中只有一个权值为Wi的根结点，它的左右子树均为空。（为方便在计算机上实现算法，一般还要求以Ti的权值Wi的升序排列。） 2. 在F中选取两棵根结点权值最小的树作为新构造的二叉树的左右子树，新二叉树的根结点的权值为其左右子树的根结点的权值之和。 3. 从F中删除这两棵树，并把这棵新的二叉树同样以升序排列加入到集合F中。 4. 重复二和三两步，直到集合F中只有一棵二叉树为止。简易的理解就是，假如我有A,B,C,D,E五个字符，出现的频率（即权值）分别为5,4,3,2,1,那么我们第一步先取两个最小权值作为左右子树构造一个新树，即取1，2构成新树，其结点为1+2=3，如图： ![](https://box.kancloud.cn/2016-02-15_56c136938fdb1.png) 虚线为新生成的结点，第二步再把新生成的权值为3的结点放到剩下的集合中，所以集合变成{5,4,3,3}，再根据第二步，取最小的两个权值构成新树，如图： ![](https://box.kancloud.cn/2016-02-15_56c13693a0d9a.png) 再依次建立哈夫曼树，如下图： ![](https://box.kancloud.cn/2016-02-15_56c13693afe5f.jpg) 其中各个权值替换对应的字符即为下图： ![](https://box.kancloud.cn/2016-02-15_56c13693bf596.jpg) 所以各字符对应的编码为：A->11,B->10,C->00,D->011,E->010 ![](https://box.kancloud.cn/2016-04-18_57148e3b4931d.gif) # 分析 # 代码实现-1 ``` #include <stdio.h> #include <string.h> typedef struct node_t { struct node_t *left,*right; int freq; char c; } *node; struct node_t pool[256] = {{0}}; node qqq[255],*q=qqq-1; int n_nodes = 0,qend = 1; char *code[128] = {0},buf[1024]; node new_node(int freq,char c,node a,node b) { node n = pool + n_nodes++; if(freq) n->c = c,n->freq = freq; else { n->left = a,n->right = b; n->freq = a->freq + b->freq; } return n; } void qinsert(node n) { int j,i=qend++; while((j = i/2)) { if(q[j]->freq <= n->freq) break; q[i] = q[j],i=j; } q[i] = n; } node qremove() { int i,l; node n = q[i=1]; if(qend < 2) return 0; qend--; while((l=i*2) < qend) { if(l+1 < qend && q[l+1]->freq < q[l]->freq) l++; q[i] = q[l]; i = l; } q[i] = q[qend]; return n; } /* walk the tree and put 0s and 1s */ void build_code(node n, char *s, int len) { static char *out = buf; if (n->c) { s[len] = 0; strcpy(out, s); code[n->c] = out; out += len + 1; return; } s[len] = '0'; build_code(n->left, s, len + 1); s[len] = '1'; build_code(n->right, s, len + 1); } void init(const char *s) { int i,freq[128] = {0}; char c[16]; while(*s) freq[(int)*s++]++; for(i = 0; i < 128 ; i++) { if(freq[i]) qinsert(new_node(freq[i],i,0,0)); } while(qend > 2) { qinsert(new_node(0,0,qremove(),qremove())); } build_code(q[1],c,0); } void encode(const char *s, char *out) { while (*s) { strcpy(out, code[*s]); out += strlen(code[*s++]); } } void decode(const char *s, node t) { node n = t; while (*s) { if (*s++ == '0') n = n->left; else n = n->right; if (n->c) putchar(n->c), n = t; } putchar('\n'); if (t != n) printf("garbage input\n"); } int main(void) { int i; const char *str = "thisis", buf[1024]; init(str); for (i = 0; i < 128; i++) if (code[i]) printf("'%c': %s\n", i, code[i]); encode(str, buf); printf("encoded: %s\n", buf); printf("decoded: "); decode(buf, q[1]); return 0; } ```