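😃 Since JDK 12, the OpenJDK source tree has included a microbenchmark suite built on JMH (Java Microbenchmark Harness); for your own projects, JMH is still added as an ordinary dependency. JMH is a toolkit that helps you implement Java microbenchmarks correctly. It was developed by the same people who implement the Java Virtual Machine (JVM), so they understand Java's internals and how Java is optimized at run time.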

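1. What is JMH?

JMH is short for Java Microbenchmark Harness, a toolkit for code-level microbenchmarks. (See the JMH repository on GitHub.)

JMH is a benchmarking tool developed by the OpenJDK team. It is generally used for performance tuning of code, with precision down to the nanosecond level, and it works for Java as well as other JVM-based languages. Unlike Apache JMeter, JMH can target any method, so its granularity is finer and it is not limited to REST APIs.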

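2. Why JMH?

2.1 JVM causes

Modern JVMs are increasingly smart: they can optimize code at compile time, at class-loading time, and at run time. If you run a performance test without knowing about these optimizations, a small oversight can leave your results far off the mark. Likewise, Java was born with the slogan "compile once, run anywhere"; the JVM provides the underlying support for that, along with memory management, and these mechanisms can all affect performance test results in unpredictable ways.

To test a simple method, we might do something like the following, or wrap the call in a loop and divide the total time by the number of iterations.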

long start = System.currentTimeMillis();
// ....
long end = System.currentTimeMillis();
System.out.println(end - start);

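But is the measured data really accurate? The answer is no.

First, reading the timestamp can itself introduce error. Second, the JVM may optimize some of the code, so the measured time is not what a real workload would see. Third, the JVM also optimizes loops, for example by unrolling them (not covered in detail here). Finally, the JVM may optimize code at any stage, which makes the result nondeterministic.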
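The loop-and-divide approach described above can be sketched in plain Java as follows. This is only an illustration of where the naive method is fragile, not a recommended harness; the class name NaiveTimingSketch and the sumOfSquares workload are hypothetical.

```java
/** Naive timing harness of the kind described above, shown to expose its weak points. */
final class NaiveTimingSketch {

    /** Hypothetical workload: sum of the squares 1..n. */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /** Times `rounds` calls and returns the average nanoseconds per call. */
    static double averageNanos(int rounds, int n) {
        long sink = 0;                      // accumulate results so the work is not trivially dead code
        long start = System.nanoTime();     // monotonic clock; currentTimeMillis() is wall-clock time
        for (int i = 0; i < rounds; i++) {
            sink += sumOfSquares(n);        // the JIT may still hoist or fold this repeated pure call
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {                   // consume the sink so the loop cannot be discarded outright
            System.out.println("unexpected sink value");
        }
        return (double) elapsed / rounds;
    }

    public static void main(String[] args) {
        // Early batches usually include interpretation and JIT compilation time,
        // so the averages tend to drift between batches; no fixed output is expected.
        for (int batch = 1; batch <= 5; batch++) {
            System.out.printf("batch %d: %.1f ns/call%n", batch, averageNanos(100_000, 1_000));
        }
    }
}
```

Running it a few times typically gives different numbers per batch and per run, which is the instability the paragraph above describes.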

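2.2 Without JMH

Code performance is misjudged.
The real performance gap between similar methods cannot be determined clearly, so the wrong option gets chosen.

2.3 With JMH

Measurements are more precise: JMH can prevent optimizations that the JVM and the hardware would otherwise apply during a microbenchmark, so the results reflect how the code performs in a realistic scenario.
It is easy to pick up: a few simple annotations are enough to benchmark a set of similar methods.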

3. JMH quick start

3.1 Adding the dependencies:

<!-- JMH benchmarking -->
<dependency>
	<groupId>org.openjdk.jmh</groupId>
	<artifactId>jmh-core</artifactId>
	<version>1.34</version>
</dependency>
<dependency>
	<groupId>org.openjdk.jmh</groupId>
	<artifactId>jmh-generator-annprocess</artifactId>
	<version>1.34</version>
</dependency>

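3.2 A simple demo:

We benchmark a simple method.

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class JMHExample01 {
    @Benchmark
    public void wellHelloThere() {
        // this method was intentionally left blank.
    }

    public static void main(String[] args) throws RunnerException {
        final Options options = new OptionsBuilder().include(JMHExample01.class.getSimpleName())
                .forks(1)
                .measurementIterations(5)
                .warmupIterations(5)
                .build();
        new Runner(options).run();
    }
}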

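As the code shows, we benchmark the wellHelloThere method, whose body is intentionally left empty.

measurementIterations(5) and warmupIterations(5) set the number of measurement iterations and warmup iterations to 5 each.

The output is as follows (the log was captured from a class named JMHSample_01_HelloWorld, as the benchmark name in it shows):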

# JMH version: 1.34
# VM version: JDK 1.8.0_312_fiber, OpenJDK 64-Bit Server VM, 25.312-b1
# VM invoker: D:\Software\TencentKona-8.0.8-312\jre\bin\java.exe
# VM options: -Dfile.encoding=UTF-8 -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2021.2\lib\idea_rt.jar=52883:C:\Program Files\JetBrains\IntelliJ IDEA 2021.2\bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.eachen.jmh.samples.JMHSample_01_HelloWorld.wellHelloThere

# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 4250418666.712 ops/s
# Warmup Iteration   2: 4351990563.955 ops/s
# Warmup Iteration   3: 4294196982.723 ops/s
# Warmup Iteration   4: 4356422901.963 ops/s
# Warmup Iteration   5: 4380265370.149 ops/s
Iteration   1: 4333736833.571 ops/s
Iteration   2: 4357296430.734 ops/s
Iteration   3: 4389356560.825 ops/s
Iteration   4: 4388132443.569 ops/s
Iteration   5: 4383114985.169 ops/s
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8


Result "com.eachen.jmh.samples.JMHSample_01_HelloWorld.wellHelloThere":
  4370327450.774 ±(99.9%) 93359784.885 ops/s [Average]
  (min, avg, max) = (4333736833.571, 4370327450.774, 4389356560.825), stdev = 24245239.658
  CI (99.9%): [4276967665.889, 4463687235.658] (assumes normal distribution)


# Run complete. Total time: 00:01:41

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                Mode  Cnt           Score          Error  Units
JMHSample_01_HelloWorld.wellHelloThere  thrpt    5  4370327450.774 ± 93359784.885  ops/s

The result: the method runs about 4370327450.774 times per second (ops/s = operations per second), with an error of ± 93359784.885 ops/s.
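To see where the summary figures come from, the five measurement scores in the log can be fed through a small plain-Java calculation. This is a sketch under two assumptions: the error column is the half-width of a 99.9% confidence interval computed as t * s / sqrt(n), and the critical value 8.610 (Student's t, 4 degrees of freedom) is taken from a standard table. The class name ScoreStatsSketch is hypothetical.

```java
/** Recomputes the average, standard deviation and error from the iteration scores in the log. */
final class ScoreStatsSketch {

    // Measurement iteration scores copied from the run above (ops/s).
    static final double[] SCORES = {
        4333736833.571, 4357296430.734, 4389356560.825, 4388132443.569, 4383114985.169
    };

    // Assumption: two-sided 99.9% Student-t critical value for 4 degrees of freedom.
    static final double T_999_DF4 = 8.610;

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double stdev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    /** Half-width of the confidence interval: t * s / sqrt(n). */
    static double error(double[] xs, double t) {
        return t * stdev(xs) / Math.sqrt(xs.length);
    }

    public static void main(String[] args) {
        System.out.printf("mean  = %.3f ops/s%n", mean(SCORES));
        System.out.printf("stdev = %.3f%n", stdev(SCORES));
        System.out.printf("error = %.3f (approximate, t rounded to 3 decimals)%n",
                error(SCORES, T_999_DF4));
    }
}
```

The mean and standard deviation agree with the [Average] and stdev lines of the log, and the error lands within a fraction of a percent of the reported value because the table value of t is rounded.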

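4. Basic JMH usage

4.1 Marking benchmark methods with @Benchmark

Annotate each method you want to measure with @Benchmark.

If no annotated method is found, the run fails with a RunnerException. Note that the trace below was captured from a different failure: it reports that the JMH lock could not be acquired because another JMH instance might be running.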

Exception in thread "main" org.openjdk.jmh.runner.RunnerException: ERROR: Another JMH instance might be running. Unable to acquire the JMH lock (C:\Users\EACHEN~1\AppData\Local\Temp\/jmh.lock), exiting. Use -Djmh.ignoreLock=true to forcefully continue.
	at org.openjdk.jmh.runner.Runner.run(Runner.java:211)
	at com.eachen.concurrence.jmh.JMHExample02.main(JMHExample02.java:41)
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8

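4.2 Warmup and Measurement

What are Warmup and Measurement?

Both set a number of iterations: Warmup sets the number of warmup iterations, and Measurement sets the number of measured iterations.

Warmup: before the benchmark code is actually measured, JMH warms it up so that it has already gone through early class optimization, run-time compilation, and JIT optimization and has reached its final state. This lets the measured numbers reflect the code's real performance.
Measurement: the actual measuring. All data from every measurement iteration is included in the statistics (warmup data is not).

How do you use Warmup and Measurement?

Set Warmup and Measurement globally.
Set Warmup and Measurement on the benchmark method.

Note: options passed at run time override the values set in annotations.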

// 1.1 Set globally when building the Options
public static void main(String[] args) throws RunnerException {
    final Options options = new OptionsBuilder().include(JMHExample01.class.getSimpleName())
            .forks(1)
            .measurementIterations(10)
            .warmupIterations(10)
            .build();
    new Runner(options).run();
}

// 1.2 Set globally with annotations on the class
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 10)
@State(Scope.Thread)
public class JMHExample02 {

    // 2. Set Warmup and Measurement on the benchmark method
    @Benchmark
    @Measurement(iterations = 10)
    @Warmup(iterations = 10)
    public void normalMethod() {
        // intentionally left blank
    }
}

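Warmup and Measurement in detail

Both Warmup and Measurement accept four settings:

iterations: the number of iterations
time: the duration of each iteration
timeUnit: the time unit that goes with time
batchSize: the number of benchmark method calls per operation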
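For the time-based modes, iterations and time together determine roughly how long a run takes. The sketch below is a back-of-the-envelope estimate, assuming the run time of one benchmark method is forks * (warmup iterations * warmup time + measurement iterations * measurement time) and ignoring JVM start-up and fixture code; the class and method names are hypothetical.

```java
import java.time.Duration;

/** Rough run-time estimate for one benchmark method in a time-based mode. */
final class RunTimeEstimateSketch {

    /**
     * forks * (warmupIterations * warmupTime + measurementIterations * measurementTime).
     * JVM start-up, fixture methods and any warmup forks are not included.
     */
    static Duration estimate(int forks,
                             int warmupIterations, Duration warmupTime,
                             int measurementIterations, Duration measurementTime) {
        Duration perFork = warmupTime.multipliedBy(warmupIterations)
                .plus(measurementTime.multipliedBy(measurementIterations));
        return perFork.multipliedBy(forks);
    }

    public static void main(String[] args) {
        // Quick-start settings: 1 fork, 5 warmup and 5 measurement iterations of 10 s each.
        Duration d = estimate(1, 5, Duration.ofSeconds(10), 5, Duration.ofSeconds(10));
        System.out.println(d.getSeconds() + " s"); // prints "100 s"
    }
}
```

The 100 seconds match the ETA of 00:01:40 that the quick-start log prints before the run starts.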

/**
 * <p>Measurement annotations allows to set the default measurement parameters for
 * the benchmark.</p>
 *
 * <p>This annotation may be put at {@link Benchmark} method to have effect on that
 * method only, or at the enclosing class instance to have the effect over all
 * {@link Benchmark} methods in the class. This annotation may be overridden with
 * the runtime options.</p>
 *
 * @see Warmup
 */
@Inherited
@Target({ElementType.METHOD,ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface Measurement {

    int BLANK_ITERATIONS = -1;
    int BLANK_TIME = -1;
    int BLANK_BATCHSIZE = -1;

    /** @return Number of measurement iterations */
    int iterations() default BLANK_ITERATIONS;

    /** @return Time of each measurement iteration */
    int time() default BLANK_TIME;

    /** @return Time unit for measurement iteration duration */
    TimeUnit timeUnit() default TimeUnit.SECONDS;

    /** @return Batch size: number of benchmark method calls per operation */
    int batchSize() default BLANK_BATCHSIZE;

}

/**
 * <p>Warmup annotation allows to set the default warmup parameters for the benchmark.</p>
 *
 * <p>This annotation may be put at {@link Benchmark} method to have effect on that method
 * only, or at the enclosing class instance to have the effect over all {@link Benchmark}
 * methods in the class. This annotation may be overridden with the runtime options.</p>
 *
 * @see Measurement
 */
@Target({ElementType.METHOD,ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface Warmup {

    int BLANK_ITERATIONS = -1;
    int BLANK_TIME = -1;
    int BLANK_BATCHSIZE = -1;

    /** @return Number of warmup iterations */
    int iterations() default BLANK_ITERATIONS;

    /** @return Time for each warmup iteration */
    int time() default BLANK_TIME;

    /** @return Time unit for warmup iteration duration */
    TimeUnit timeUnit() default TimeUnit.SECONDS;

    /** @return batch size: number of benchmark method calls per operation */
    int batchSize() default BLANK_BATCHSIZE;

}

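4.3 BenchmarkMode

JMH uses the @BenchmarkMode annotation to declare which mode a benchmark runs in. JMH provides four modes, and several of them can be active at once.

AverageTime: reports the time each call of the benchmark method takes, i.e. elapsed time/operation.
Throughput: the inverse of AverageTime; it reports how many times the method can be called per unit of time.
SampleTime: collects results by sampling the time of individual calls. Much like a histogram, it gathers the sampled timings and distributes them across intervals.
SingleShotTime: mainly for cold-start tests. In both Warmup and Measurement, the benchmark method runs only once per iteration; the number of warmup iterations is usually set to 0.
Multiple modes and All: besides choosing one of the four modes above, you can run a benchmark method in several modes, or in all of them if you like. (The BenchmarkMode annotation accepts an array of Mode values.)

BenchmarkMode can be placed as an annotation on a benchmark method or on the class, or set through Options, in which case it again overrides the annotation.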
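Since Throughput and AverageTime describe the same measurement from opposite directions, a score in one mode can be roughly converted to the other. The sketch below does the arithmetic in plain Java; it is only an approximation, because the two modes aggregate results differently, and the class name ModeConversionSketch is hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Converts a throughput score into an approximate average time per operation. */
final class ModeConversionSketch {

    /** Returns the time per operation, in the given unit, for a throughput in operations per second. */
    static double timePerOp(double opsPerSecond, TimeUnit unit) {
        // How many of the requested units fit into one second (may be fractional for coarse units).
        double unitsPerSecond = TimeUnit.SECONDS.toNanos(1) / (double) unit.toNanos(1);
        return unitsPerSecond / opsPerSecond;
    }

    public static void main(String[] args) {
        double throughput = 4370327450.774; // ops/s, from the quick-start result
        System.out.printf("%.4f ns/op%n", timePerOp(throughput, TimeUnit.NANOSECONDS));
    }
}
```

For the quick-start score this comes to roughly 0.23 ns/op, the kind of figure an empty method would show under Mode.AverageTime with @OutputTimeUnit(TimeUnit.NANOSECONDS).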

/**
 * Benchmark mode.
 */
public enum Mode {

    /**
     * <p>Throughput: operations per unit of time.</p>
     *
     * <p>Runs by continuously calling {@link Benchmark} methods,
     * counting the total throughput over all worker threads. This mode is time-based, and it will
     * run until the iteration time expires.</p>
     */
    Throughput("thrpt", "Throughput, ops/time"),

    /**
     * <p>Average time: average time per per operation.</p>
     *
     * <p>Runs by continuously calling {@link Benchmark} methods,
     * counting the average time to call over all worker threads. This is the inverse of {@link Mode#Throughput},
     * but with different aggregation policy. This mode is time-based, and it will run until the iteration time
     * expires.</p>
     */
    AverageTime("avgt", "Average time, time/op"),

    /**
     * <p>Sample time: samples the time for each operation.</p>
     *
     * <p>Runs by continuously calling {@link Benchmark} methods,
     * and randomly samples the time needed for the call. This mode automatically adjusts the sampling
     * frequency, but may omit some pauses which missed the sampling measurement. This mode is time-based, and it will
     * run until the iteration time expires.</p>
     */
    SampleTime("sample", "Sampling time"),

    /**
     * <p>Single shot time: measures the time for a single operation.</p>
     *
     * <p>Runs by calling {@link Benchmark} once and measuring its time.
     * This mode is useful to estimate the "cold" performance when you don't want to hide the warmup invocations, or
     * if you want to see the progress from call to call, or you want to record every single sample. This mode is
     * work-based, and will run only for a single invocation of {@link Benchmark}
     * method.</p>
     *
     * Caveats for this mode include:
     * <ul>
     *  <li>More warmup/measurement iterations are generally required.</li>
     *  <li>Timers overhead might be significant if benchmarks are small; switch to {@link #SampleTime} mode if
     *  that is a problem.</li>
     * </ul>
     */
    SingleShotTime("ss", "Single shot invocation time"),

    /**
     * Meta-mode: all the benchmark modes.
     * This is mostly useful for internal JMH testing.
     */
    All("all", "All benchmark modes"),

    ;

    private final String shortLabel;
    private final String longLabel;

    Mode(String shortLabel, String longLabel) {
        this.shortLabel = shortLabel;
        this.longLabel = longLabel;
    }

    public String shortLabel() {
        return shortLabel;
    }

    public String longLabel() {
        return longLabel;
    }

    public static Mode deepValueOf(String name) {
        try {
            return Mode.valueOf(name);
        } catch (IllegalArgumentException iae) {
            Mode inferred = null;
            for (Mode type : values()) {
                if (type.shortLabel().startsWith(name)) {
                    if (inferred == null) {
                        inferred = type;
                    } else {
                        throw new IllegalStateException("Unable to parse benchmark mode, ambiguous prefix given: \"" + name + "\"\n" +
                                "Known values are " + getKnown());
                    }
                }
            }
            if (inferred != null) {
                return inferred;
            } else {
                throw new IllegalStateException("Unable to parse benchmark mode: \"" + name + "\"\n" +
                        "Known values are " + getKnown());
            }
        }
    }

    public static List<String> getKnown() {
        List<String> res = new ArrayList<>();
        for (Mode type : Mode.values()) {
            res.add(type.name() + "/" + type.shortLabel());
        }
        return res;
    }
}

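4.4 OutputTimeUnit

OutputTimeUnit sets the time unit used when results are reported, for example how many time units one call takes, or how many calls fit in one time unit. It can be set on the class, on the method, or in Options; the override order is the same as for BenchmarkMode.

4.5 The three State scopes

JMH has three kinds of State, one for each value of the Scope enum.

Benchmark
Thread
Group

Thread-exclusive State

With thread-exclusive State, each thread running the benchmark method holds its own object instance. That instance may be passed in as a parameter of the benchmark method, or it may be the class that hosts the benchmark method. Scope.Thread is mainly used for classes that are not thread-safe.

State shared across threads

Sometimes we need to measure how a class performs when several threads operate on it at once, for example when multiple threads access shared data. For that, all threads must use the same instance, and JMH provides Scope.Benchmark for this purpose.

State shared within a thread group

Scope.Group gives you two things: a single instance shared by multiple threads, and the ability to run more than one benchmark method concurrently. For example, in a highly concurrent environment several threads read and write one ConcurrentHashMap at the same time. With group scope, multiple benchmark methods can run concurrently against the same instance.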
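The difference between per-thread and shared state can be illustrated without JMH. The following plain-Java analogy is not how JMH manages state internally; it only mimics the visible effect, with one Counter per worker standing in for Scope.Thread and a single Counter for all workers standing in for Scope.Benchmark. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

/** Plain-Java analogy for per-thread state versus state shared by all threads. */
final class StateScopeSketch {

    /** Hypothetical state object holding one counter. */
    static final class Counter {
        final AtomicLong value = new AtomicLong();
    }

    /**
     * Starts `threads` workers that each increment a counter `perThread` times.
     * shared = true:  all workers use one Counter (analogous to Scope.Benchmark).
     * shared = false: every worker gets its own Counter (analogous to Scope.Thread).
     * Returns the final value of the counter each worker used.
     */
    static long[] run(int threads, int perThread, boolean shared) {
        Counter common = new Counter();
        Counter[] counters = new Counter[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            Counter c = shared ? common : new Counter();
            counters[t] = c;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.value.incrementAndGet();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait for every worker before reading any counter
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        long[] totals = new long[threads];
        for (int t = 0; t < threads; t++) {
            totals[t] = counters[t].value.get();
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(4, 1_000, true)));  // [4000, 4000, 4000, 4000]
        System.out.println(Arrays.toString(run(4, 1_000, false))); // [1000, 1000, 1000, 1000]
    }
}
```

In the shared case every worker contends on the same object, which is the situation Scope.Benchmark is meant to measure; in the per-thread case there is no contention, which is why Scope.Thread suits classes that are not thread-safe.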

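4.6 Using @Param

@Param removes redundant code and provides something close to data-driven testing.

With several @Param fields, JMH runs the benchmark for every combination of their values, giving an N * N * N style test matrix.

See also the JMHSample_27_Params example: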

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class JMHSample_27_Params {

    /**
     * In many cases, the experiments require walking the configuration space
     * for a benchmark. This is needed for additional control, or investigating
     * how the workload performance changes with different settings.
     */

    @Param({"1", "31", "65", "101", "103"})
    public int arg;

    @Param({"0", "1", "2", "4", "8", "16", "32"})
    public int certainty;

    @Benchmark
    public boolean bench() {
        return BigInteger.valueOf(arg).isProbablePrime(certainty);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_27_Params.class.getSimpleName())
//                .param("arg", "41", "42") // Use this to selectively constrain/override parameters
                .build();

        new Runner(opt).run();
    }
}
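To get a feel for how quickly the parameter matrix grows, the combinations for the two @Param fields in the sample above can be enumerated in plain Java. This is only a counting sketch: the class name ParamGridSketch is hypothetical, and the order in which it lists combinations is illustrative rather than a statement about the order JMH uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates the Cartesian product of two parameter value lists. */
final class ParamGridSketch {

    /** Returns every (arg, certainty) pair. */
    static List<int[]> combinations(int[] args, int[] certainties) {
        List<int[]> result = new ArrayList<>();
        for (int arg : args) {
            for (int certainty : certainties) {
                result.add(new int[] {arg, certainty});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] argValues = {1, 31, 65, 101, 103};          // values of @Param arg in the sample
        int[] certaintyValues = {0, 1, 2, 4, 8, 16, 32};  // values of @Param certainty in the sample
        List<int[]> grid = combinations(argValues, certaintyValues);
        System.out.println(grid.size() + " parameter combinations"); // prints "35 parameter combinations"
    }
}
```

With the sample's settings (one fork, five 1-second warmup iterations and five 1-second measurement iterations), 35 combinations already imply a run of several minutes, which is where the commented-out .param(...) override in the sample becomes handy for narrowing the matrix.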

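4.7 JMH fixtures

Setup and TearDown

JMH provides two fixture annotations, @Setup and @TearDown. @Setup runs before the benchmark method executes and is typically used to initialize resources; @TearDown runs after the benchmark method has executed and is typically used to release and clean up resources.

Level

By default, Setup and TearDown run once before and once after all iterations of a benchmark method. To run fixture methods around every iteration, or around every call of the benchmark method, configure the Level of @Setup and @TearDown.

Trial: the default for Setup and TearDown. The fixture method runs before and after all iterations of each benchmark method. (Positions 1 and 2 in the figure below.)
Iteration: because Warmup and Measurement can be configured, every benchmark method runs for a number of iterations. To call fixture methods before and after each iteration, set Level to Iteration. (Positions 3 and 4 in the figure below.)
Invocation: setting Level to Invocation means the fixture methods run before and after every single call of the benchmark method within each iteration. (Positions 5 and 6 in the figure below.)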
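The nesting of the three levels can be modelled with a few lines of plain Java. This is a simplified model of the ordering described above, not JMH code: it assumes a fixed number of calls per iteration (real iterations in time-based modes run until their time expires), and the class name and event labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Models the order in which Trial, Iteration and Invocation fixtures would fire. */
final class FixtureOrderSketch {

    /** Records the sequence of fixture and benchmark calls for one benchmark method. */
    static List<String> trace(int iterations, int invocationsPerIteration) {
        List<String> events = new ArrayList<>();
        events.add("setup(Trial)");
        for (int it = 0; it < iterations; it++) {
            events.add("setup(Iteration)");
            for (int inv = 0; inv < invocationsPerIteration; inv++) {
                events.add("setup(Invocation)");
                events.add("benchmark()");
                events.add("teardown(Invocation)");
            }
            events.add("teardown(Iteration)");
        }
        events.add("teardown(Trial)");
        return events;
    }

    public static void main(String[] args) {
        // Two iterations with two calls each: 18 events, Trial fixtures outermost.
        trace(2, 2).forEach(System.out::println);
    }
}
```

In JMH itself the level is chosen on the annotation, for example @Setup(Level.Iteration). Invocation-level fixtures run around every call, so their own cost can become significant when the benchmark method is very short.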

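4.8 CompilerControl

JMH provides the @CompilerControl annotation to control, among other things, whether a method is inlined. Its options include:

CompilerControl.Mode.DONT_INLINE: do not inline
CompilerControl.Mode.INLINE: force inlining
CompilerControl.Mode.EXCLUDE: exclude from compilation

Other options are available as well; see the definition: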

@Target({ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface CompilerControl {

    /**
     * The compilation mode.
     * @return mode
     */
    Mode value();

    /**
     * Compilation mode.
     */
    enum Mode {

        /**
         * Insert the breakpoint into the generated compiled code.
         */
        BREAK("break"),

        /**
         * Print the method and it's profile.
         */
        PRINT("print"),

        /**
         * Exclude the method from the compilation.
         */
        EXCLUDE("exclude"),

        /**
         * Force inline.
         */
        INLINE("inline"),

        /**
         * Force skip inline.
         */
        DONT_INLINE("dontinline"),

        /**
         * Compile only this method, and nothing else.
         */
        COMPILE_ONLY("compileonly"),;

        private final String command;

        Mode(String command) {
            this.command = command;
        }

        public String command() {
            return command;
        }
    }
}

Here is one of the JMH samples:

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JMHSample_16_CompilerControl {

    /*
     * We can use HotSpot-specific functionality to tell the compiler what
     * do we want to do with particular methods. To demonstrate the effects,
     * we end up with 3 methods in this sample.
     */

    /**
     * These are our targets:
     *   - first method is prohibited from inlining
     *   - second method is forced to inline
     *   - third method is prohibited from compiling
     *
     * We might even place the annotations directly to the benchmarked
     * methods, but this expresses the intent more clearly.
     */

    public void target_blank() {
        // this method was intentionally left blank
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void target_dontInline() {
        // this method was intentionally left blank
    }

    @CompilerControl(CompilerControl.Mode.INLINE)
    public void target_inline() {
        // this method was intentionally left blank
    }

    @CompilerControl(CompilerControl.Mode.EXCLUDE)
    public void target_exclude() {
        // this method was intentionally left blank
    }

    /*
     * These method measures the calls performance.
     */

    @Benchmark
    public void baseline() {
        // this method was intentionally left blank
    }

    @Benchmark
    public void blank() {
        target_blank();
    }

    @Benchmark
    public void dontinline() {
        target_dontInline();
    }

    @Benchmark
    public void inline() {
        target_inline();
    }

    @Benchmark
    public void exclude() {
        target_exclude();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_16_CompilerControl.class.getSimpleName())
                .warmupIterations(0)
                .measurementIterations(3)
                .forks(1)
                .build();

        new Runner(opt).run();
    }
}

The results:

Benchmark                                Mode  Cnt  Score   Error  Units
JMHSample_16_CompilerControl.baseline    avgt    3  0.231 ± 0.014  ns/op
JMHSample_16_CompilerControl.blank       avgt    3  0.228 ± 0.006  ns/op
JMHSample_16_CompilerControl.dontinline  avgt    3  1.494 ± 3.834  ns/op
JMHSample_16_CompilerControl.exclude     avgt    3  9.419 ± 6.557  ns/op
JMHSample_16_CompilerControl.inline      avgt    3  0.227 ± 0.007  ns/op

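The results show that the inlined method runs as fast as the empty baseline, while the method excluded from compilation is the slowest. The dontinline score carries a large error (± 3.834 ns/op); this run used no warmup and only three measurement iterations, so the figures are indicative rather than precise.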