简介

哈夫曼编码（Huffman Coding）是一种无损压缩算法，由David A. Huffman在1952年提出。哈夫曼编码是一种用于数据压缩的算法，它基于字符出现频率的统计信息，将出现频率高的字符用较短的编码表示，出现频率低的字符用较长的编码表示。它的基本思想是将频率较高的字符用较短的编码表示，频率较低的字符用较长的编码表示，以此减少所需的比特数，从而达到压缩数据的效果。这样可以有效地减少数据的存储空间，提高数据传输的效率。该算法目前已经被广泛应用于数据传输、图像压缩、音频压缩等领域。

实现步骤

实验安排

第1天上午

PPT+Java环境配置+Node类+列表和排序

压缩算法简介及课程安排

阶段1：环境准备 Hello.java

阶段2：

统计字母出现频率 0.5h

阶段3：
2. 建立概率节点类 0.5h
3. 建立节点列表 0.5h
4. 定义根据节点概率的排序算法 0.5h

检测 0.5

第1天下午

2023/06/02 下午 str->HFT + HFT→字典<char, str> + 编码 + 参数化执行

新节点建立=新节点建立+最小双节点选择+添加到列表+排序 0.5h
循环节点遍历 0.5h
编码树构建输出
字符编码

第2天上午

2023/06/02 上午

第2天下午

一些问题

java -cp . Hello
javac -encoding gbk/utf-8 Hello.java

解码树构建+解码过程

最终目标

实现对二进制文件的压缩和解压：

压缩： haffman -c input_file compressed_file 将 input_file 压缩后输出到 compressed_file
解缩： haffman -d compressed_file decompressed_file 将 compressed_file 解压后输出到 decompressed_file
java
- java Haffman -c xunlei.exe xunlei.exe.bat 将文件 xunlei.exe 压缩后输出到 xunlei.exe.bat
- java Haffman -d xunlei.exe.bat xunlei1.exe 将 xunlei.exe.bat 解压到 xunlei1.exe
- 其中 -c 表示 compress 压缩， -d 表示 decompress 解压
C/C++
- haffman.exe -c xunlei.exe xunlei.exe.bat 将文件 xunlei.exe 压缩后输出到 xunlei.exe.bat
- haffman.exe -d xunlei.exe.bat xunlei1.exe 将 xunlei.exe.bat 解压到 xunlei1.exe
- 其中 -c 表示 compress 压缩， -d 表示 decompress 解压
Python
- python haffman.py -c xunlei.exe xunlei.exe.bat 将文件 xunlei.exe 压缩后输出到 xunlei.exe.bat
- python haffman.py -d xunlei.exe.bat xunlei1.exe 将 xunlei.exe.bat 解压到 xunlei1.exe
- 其中 -c 表示 compress 压缩， -d 表示 decompress 解压

第2天下午：文档编写

实现代码

在Java中实现哈夫曼编码的步骤如下：

统计字符出现的频率，可以使用HashMap来存储每个字符出现的次数。根据统计结果构建哈夫曼树，可以使用优先队列（PriorityQueue）来实现。遍历哈夫曼树，生成每个字符对应的哈夫曼编码，可以使用递归实现。将原始数据按照生成的哈夫曼编码进行编码，将编码后的数据写入输出流。解压缩时，读取编码后的数据，根据哈夫曼编码还原原始数据。

以下是一个简单的Java实现示例：

Java二进制操作示例

Java Formatting, javatpoint.com/java-string-format, java-string-format-examples

String.format("%1$" + length + “s”, inputString).replace(’ ', ‘0’);

注 Java的 String.Format 也是可以选参数的，1$ 表示第1个参数（下标从1开始），如：String.format("%2$s", 32, "Hello"); // prints: "Hello"

public class BitwiseOperations {
public static void main(String[] args) {
    int a = 0b10101010; // 170
    int b = 0b01010101; // 85
System.out.println("a = 0b10101010, b = 0b01010101"); // 打印：0

    // 按位与运算 &
    int c = a & b; // 0b00000000，即 0
    System.out.println("a & b = " + Integer.toBinaryString(c)); // 打印：0

    // 按位或运算 |
    int d = a | b; // 0b11111111，即 255
    System.out.println("a | b = " + Integer.toBinaryString(d)); // 打印：11111111

    // 按位异或运算 ^
    int e = a ^ b; // 0b11111111，即 255
    System.out.println("a ^ b = " + Integer.toBinaryString(e)); // 打印：11111111

    // 取反运算 ~
    int f = ~a; // 0b01010101，即 85 的补码形式
    System.out.println("~a = " + Integer.toBinaryString(f)); // 打印：1111111111111111111111111010101

    // 左移运算 <<
    int g = a << 1; // 0b101010100，即 340
    System.out.println("a << 1 = " + String.format("%40s", Integer.toBinaryString(g)).replace(' ', '0')); // 打印：101010100

    // 右移运算 >>
    int h = a >> 1; // 0b01010101，即 85
    System.out.println("a >> 1 = " + Integer.toBinaryString(h)); // 打印：1010101

    // 无符号右移运算 >>>
    int i = a >>> 1; // 0b01010101，即 85
    System.out.println("a >>> 1 = " + Integer.toBinaryString(i)); // 打印：1010101
}
}

执行与输出：

PS C:\data> javac -encoding utf-8 .\BitwiseOperations.java ; java BitwiseOperations
a = 0b10101010, b = 0b01010101
a & b = 0
a | b = 11111111
a ^ b = 11111111
~a = 11111111111111111111111101010101
a << 1 = 0000000000000000000000000000000101010100
a >> 1 = 1010101
a >>> 1 = 1010101
PS C:\data>

Java二进制读写

以下是Java程序读取一个二进制文件in.dat，将所有字节反转后写到 output.dat 的实现：

import java.io.*;

public class ReverseBytes {
public static void main(String[] args) throws IOException {
    // 打开输入文件
    FileInputStream in = new FileInputStream("in.dat");
    // 创建输出文件
    FileOutputStream out = new FileOutputStream("output.dat");

    int byteCount = 0;
    byte[] buffer = new byte[1024];

    // 读取和写入数据
    while ((byteCount = in.read(buffer)) != -1) {
        // 反转字节数组中的每个元素
        for (int i = 0; i < byteCount; i++) {
            buffer[i] = (byte) ~buffer[i];
        }
        // 写入反转后的数据到输出文件
        out.write(buffer, 0, byteCount);
    }

    // 关闭输入和输出流
    in.close();
    out.close();
}
}

在程序中，我们使用 FileInputStream 和 FileOutputStream 类来打开输入和输出文件，并用一个 byte 数组缓冲区来存储读取的字节数。然后，在 while 循环中不断读取数据，将每个字节都取反（即按位取反 ~），最后将反转后的数据写入到输出文件中。最后，记得关闭输入和输出流。

参考资料

Java实现

2023/05/31 测试可用

$ javac -encoding utf-8 .\Haffman.java
$ java Haffman

import java.util.*;

public class Haffman {
public static void compress(String data) {
    // Step 1: 使用字典统计统计字符出现的次数
    Map<Character, Integer> freqMap = new HashMap<>();
    for (char c : data.toCharArray()) {
        freqMap.put(c, freqMap.getOrDefault(c, 0) + 1);
    }

    // Step 2: 构建哈夫曼树
    PriorityQueue<Node> pq = new PriorityQueue<>();
    for (Map.Entry<Character, Integer> entry : freqMap.entrySet()) {
        pq.offer(new Node(entry.getKey(), entry.getValue()));
    }
    while (pq.size() > 1) {
        Node left = pq.poll();
        Node right = pq.poll();
        pq.offer(new Node('\0', left.freq + right.freq, left, right));
    }
    Node root = pq.poll();

    // Step 3: 生成哈夫曼编码
    Map<Character, String> codeMap = new HashMap<>();
    generateCode(root, "", codeMap);

    // Step 4: 压缩数据
    StringBuilder sb = new StringBuilder();
    for (char c : data.toCharArray()) {
        sb.append(codeMap.get(c));
    }
    String compressedData = sb.toString();
    System.out.println(compressedData);

    // Step 5: 解压缩数据
    StringBuilder result = new StringBuilder();
    Node node = root;
    for (char c : compressedData.toCharArray()) {
        if (c == '0') {
            node = node.left;
        } else {
            node = node.right;
        }
        if (node.isLeaf()) {
            result.append(node.ch);
            node = root;
        }
    }
    System.out.println(result.toString());
}

private static void generateCode(Node node, String code, Map<Character, String> codeMap) {
    if (node.isLeaf()) {
        codeMap.put(node.ch, code);
    } else {
        generateCode(node.left, code + "0", codeMap);
        generateCode(node.right, code + "1", codeMap);
    }
}

static class Node implements Comparable<Node> {
    char ch;
    int freq;
    Node left, right;

    public Node(char ch, int freq) {
        this.ch = ch;
        this.freq = freq;
    }

    public Node(char ch, int freq, Node left, Node right) {
        this.ch = ch;
        this.freq = freq;
        this.left = left;
        this.right = right;
    }

    public boolean isLeaf() {
        return left == null && right == null;
    }

    @Override
    public int compareTo(Node o) {
        return Integer.compare(freq, o.freq);
    }
}

public static void main(String[] args) {
    String data = "ABBCDDEEE";
    compress(data);
}
}