1 实验目的

本次实验主要是通过随机生成的符合Binomial分布的随机数，以验证其概率计算公式的正确性，其公式定义如下：
$P(X|n,k) = (_k^n)p^k(1-p)^{(n-k)}$

其中， $(_k^n)=C_n^k=n(n-1)(n-2)...(n-k)=\frac{n!}{k!(n-k)!}$ 即n中k种可能的全排列。

2 实验原理

实验分为两步：
第1步，通过 test1() 证明Javar提供的 Random.nextInt()能够提供平均分布的函数。
第2步，利用 Random.nextInt(n) 设计函数 boolean test(double p) 返回概率这p的事件是否可以发生，然后随机生成数据，验证 k 与 n 的关系。

3 实验过程

编写了Java代码，参见附录，实现了整个过程。

4 实验结果

4.1 平均分布验证 `test1()`

函数 test1() 主要用于验证 Random.nextInt(n) 生成的随机数是否符合正态分布。
通过，使用源代码的 test1() 函数，生成2亿个数据，得到以下结果：

******************** 结果列表 ********************
counters[0] = 9999557
counters[1] = 9995208
counters[2] = 9999310
counters[3] = 9999332
counters[4] = 9999267
counters[5] = 9999526
counters[6] = 9998762
counters[7] = 9999090
counters[8] = 10004129
counters[9] = 10004558
counters[10] = 10002649
counters[11] = 9996725
counters[12] = 9999594
counters[13] = 10000538
counters[14] = 10000581
counters[15] = 10005193
counters[16] = 9997375
counters[17] = 10000347
counters[18] = 9998698
counters[19] = 9999561
******************** 统计数据 ********************
平均值: 10000000
标准差: 10928.38
偏差比: 0.001093

偏差比只有约千分之一，所以可以得到结论：生成的函数符合平均分布。

4.2 Binomial公式数据生成验证 `test2()`

通过使用函数 test2() 可以得到以下数据：

size = 1000万

******************** Binomial 实验(n=20, size=10m) ********************
k值	发生次数		实际概率		理论概率		概率偏差比
0	12		0.000001	0.000001	25.829120%
1	199		0.000020	0.000019	4.333312%
2	1826		0.000183	0.000181	0.773672%
3	10759		0.001076	0.001087	1.038340%
4	46095		0.004610	0.004621	0.239194%
5	147249		0.014725	0.014786	0.411657%
6	370519		0.037052	0.036964	0.236669%
7	740837		0.074084	0.073929	0.209481%
8	1200126		0.120013	0.120134	0.101348%
9	1601530		0.160153	0.160179	0.016318%
10	1762175		0.176218	0.176197	0.011605%
11	1602306		0.160231	0.160179	0.032128%
12	1200108		0.120011	0.120134	0.102846%
13	739723		0.073972	0.073929	0.058796%
14	369674		0.036967	0.036964	0.008071%
15	147952		0.014795	0.014786	0.063801%
16	45972		0.004597	0.004621	0.505396%
17	10893		0.001089	0.001087	0.194196%
18	1835		0.000184	0.000181	1.270366%
19	204		0.000020	0.000019	6.954752%
20	6		0.000001	0.000001	37.085440%

size = 1亿

******************** Binomial 实验(n=20, size=100m) ********************
k值	发生次数		实际概率		理论概率		概率偏差比
0	95		0.000001	0.000001	0.385280%
1	1846		0.000018	0.000019	3.216435%
2	18026		0.000180	0.000181	0.517732%
3	108707		0.001087	0.001087	0.010920%
4	462541		0.004625	0.004621	0.105138%
5	1478587		0.014786	0.014786	0.000699%
6	3696003		0.036960	0.036964	0.011867%
7	7387746		0.073877	0.073929	0.069490%
8	12014555	0.120146	0.120134	0.009320%
9	16015085	0.160151	0.160179	0.017660%
10	17622246	0.176222	0.176197	0.014420%
11	16017554	0.160176	0.160179	0.002246%
12	12019029	0.120190	0.120134	0.046562%
13	7390142		0.073901	0.073929	0.037080%
14	3698042		0.036980	0.036964	0.043294%
15	1479722		0.014797	0.014786	0.077462%
16	461158		0.004612	0.004621	0.194177%
17	108729		0.001087	0.001087	0.009316%
18	18185		0.000182	0.000181	0.359761%
19	1899		0.000019	0.000019	0.437709%
20	103		0.000001	0.000001	8.003328%

5 结论

通过结果，我们可以看到：
1）公式计算结果与实际结果几乎一致；
2）1亿次较1000万次的结果误差更小，符合大数中心定律；
3）k值两端的分布偏差相对中心较大是由于数据量偏小造成的。

附录：源代码

package tests;

import java.util.Random;

public class V20210120_BinorialDistributionTest {

	static Random rand = new Random();

	public static void main(String[] args) {
		test1();
		// test2();
	}

	static long factorial(int n) {
		if (n < 2)
			return 1;

		long v = 1;
		for (int i = 1; i <= n; i++)
			v *= i;
		return v;
	}

	static long permutation(int n, int k) {
		// n! / (n-k)!/k!
		return factorial(n) / factorial(n - k) / factorial(k);
	}

	private static void test2() {
		int size = 100000000;
		int n = 20;
		double p = 0.5;
		int[] counters = new int[n + 1];
		for (int i = 0; i < size; i++) {
			counters[binomial(n, p)]++;
		}
		double[] actualValues = new double[n + 1];
		double[] theoryValues = new double[n + 1];

		// show(counters);

		for (int i = 0; i < theoryValues.length; i++) {
			actualValues[i] = counters[i] * 1.0 / size;
			theoryValues[i] = permutation(n, i) * Math.pow(p, i) * Math.pow(1 - p, n - i);
		}

		System.out.printf("******************** Binomial 实验(n=%d, size=%dm) ********************\n", n, size/1000000);
		System.out.println("k值\t发生次数\t\t实际概率\t\t理论概率\t\t概率偏差比");
		for (int i = 0; i < theoryValues.length; i++) {
			System.out.printf("%d\t%d\t\t%.6f\t%.6f\t%.6f%%\n", i, counters[i], actualValues[i], theoryValues[i],
					Math.abs(100 * (theoryValues[i] - actualValues[i]) / theoryValues[i]));
		}
	}

	public static int binomial(int n, double p) {
		int occurredTimes = 0;
		for (int i = 0; i < n; i++)
			occurredTimes += test(p) ? 1 : 0;
		return occurredTimes;
	}

	public static boolean test(double p) {
		return rand.nextInt(10000) < 10000 * p;
	}

	/**
	 * 通过随机1亿个[0,100)的随机数证明Java的随机数是符合平均分布的。
	 */
	public static void test1() {
		int[] counters = new int[20];
		int testTimes = 100_000_000;
		for (int i = 0; i < testTimes; i++) {
			counters[rand.nextInt(counters.length)]++;
		}

		show(counters);
	}

	static void show(int[] arr) {
		System.out.println("******************** 结果列表 ********************");
		for (int i = 0; i < arr.length; i++)
			System.out.printf("counters[%d] = %d\n", i, arr[i]);

		System.out.println("******************** 统计数据 ********************");
		double avg = mean(arr);
		double sigma = varience(arr);
		System.out.printf("平均值: %.0f\n", avg);
		System.out.printf("标准差: %.2f\n", sigma);
		System.out.printf("偏差比: %.6f\n", sigma / avg);
	}

	private static double varience(int[] arr) {
		double avg = mean(arr);
		double sigma = 0;
		for (int i = 0; i < arr.length; i++)
			sigma += Math.pow(arr[i] - avg, 2);

		return Math.sqrt(sigma);
	}

	public static double mean(int[] arr) {
		double mean = 0;
		for (int i = 0; i < arr.length; i++)
			mean += arr[i];
		return mean / arr.length;
	}
}