Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Tips and Traps
- The built-in benchmarking harness (#[bench]) is still unstable (nightly only) and will likely be deprecated.
- criterion is currently the best Rust crate for benchmarking.
Criterion
- With stable Rust, Criterion can only benchmark public functions/methods.
- Criterion supports the same filtering behavior as the standard-library testing and benchmarking tools, so running cargo bench NAME runs only the benchmarks with "NAME" in the benchmark name or function name.
- Even though Criterion is currently the best benchmarking tool available in Rust, it still has a few issues.
    - It can only benchmark public functions/methods with stable Rust.
    - If you benchmark wall-clock/CPU times, the results for the same function (without any code change) might vary significantly between 2 benchmark runs made very close together in time. Consider benchmarking Linux perf events instead, which gives more stable results.
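To get a feel for the run-to-run variance of wall-clock timing mentioned above, here is a minimal sketch using only the standard library (not Criterion). The workload sum_of_squares and the helpers time_batch and spread are hypothetical names made up for illustration; std::hint::black_box keeps the compiler from optimizing the measured call away.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload: wrapping sum of squares of 0..n.
pub fn sum_of_squares(n: u64) -> u64 {
    (0..n)
        .map(|x| x.wrapping_mul(x))
        .fold(0u64, |acc, x| acc.wrapping_add(x))
}

/// Times `iters` calls of `f` and returns the total wall-clock duration.
pub fn time_batch<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box stops the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed()
}

/// Runs `batches` timed batches and returns the (fastest, slowest) batch.
/// With `batches == 0` it returns (Duration::MAX, Duration::ZERO).
pub fn spread<T>(batches: usize, iters: u32, mut f: impl FnMut() -> T) -> (Duration, Duration) {
    let mut fastest = Duration::MAX;
    let mut slowest = Duration::ZERO;
    for _ in 0..batches {
        let d = time_batch(iters, &mut f);
        fastest = fastest.min(d);
        slowest = slowest.max(d);
    }
    (fastest, slowest)
}
```

Calling spread(10, 1_000, || sum_of_squares(black_box(10_000))) a few times in a row on a busy machine will usually show a noticeable gap between the fastest and slowest batch. That gap is the noise that perf-event based measurements try to avoid.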
 
- There are lots of Criterion extensions that enhance its features.
    - cargo-criterion
    - criterion-perf-events: a measurement plugin for Criterion.rs that measures events of the Linux perf interface.
    - criterion-cycles-per-byte: measures (proportional) clock cycles using the x86 or x86_64 rdtsc instruction. Note that RDTSC (and thus criterion-cycles-per-byte) does not measure CPU cycles accurately. Please refer to "RDTSC does not measure cycles" for a detailed discussion.
    - criterion-linux-perf: a measurement plugin for Criterion.rs that provides measurements using Linux's perf interface.
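As a rough illustration of what an RDTSC-based measurement reads (see the criterion-cycles-per-byte item above), here is a minimal sketch using core::arch::x86_64::_rdtsc from the standard library. read_tsc and tsc_ticks are hypothetical helper names, not part of criterion-cycles-per-byte, and the sketch ignores instruction serialization and core migration. On modern x86 CPUs the counter typically ticks at a constant rate regardless of the current core frequency, which is why the result is only proportional to cycles.

```rust
/// Reads the time-stamp counter on x86_64.
#[cfg(target_arch = "x86_64")]
pub fn read_tsc() -> Option<u64> {
    // SAFETY: the RDTSC instruction exists on every x86_64 CPU.
    Some(unsafe { core::arch::x86_64::_rdtsc() })
}

/// Fallback for other architectures: no TSC available in this sketch.
#[cfg(not(target_arch = "x86_64"))]
pub fn read_tsc() -> Option<u64> {
    None
}

/// Runs `f` once and returns its result together with the number of
/// TSC ticks that elapsed, or None if no TSC is available.
pub fn tsc_ticks<T>(f: impl FnOnce() -> T) -> Option<(T, u64)> {
    let start = read_tsc()?;
    let out = std::hint::black_box(f());
    let end = read_tsc()?;
    // wrapping_sub avoids a panic if the two reads are not ordered,
    // for example after the thread migrates to another core.
    Some((out, end.wrapping_sub(start)))
}
```

The tick count from a single call is very noisy; real harnesses repeat the measurement many times and aggregate.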
 
divan
divan is a fast and simple benchmarking library for Rust projects.
Iai
- Iai is an experimental benchmarking harness that uses Cachegrind to perform extremely precise single-shot measurements of Rust code.
- The idea of Iai is very cool, but unfortunately it does not support excluding setup code from benchmarks at this time. This makes Iai unusable in most cases. The PR "Use Callgrind instead of Cachegrind #26" might fix this issue later.
Benchmark Numbers for Rust
Infographics: Operation Costs in CPU Clock Cycles
Introduction to C and Computer Organization
Random access of an array element: my impression is that it takes about 6-7 CPU cycles (including the bounds check). get_unchecked (without the bounds check) takes about 4 CPU cycles (verify this).
How much does an array access cost?
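The two access paths compared above look like this in code. This is a minimal sketch; get_checked and get_unchecked_raw are hypothetical wrapper names, and the cycle counts quoted above are not verified by it.

```rust
/// Bounds-checked access: panics if `i >= data.len()`.
pub fn get_checked(data: &[u64], i: usize) -> u64 {
    data[i]
}

/// Unchecked access: skips the bounds check.
///
/// # Safety
/// The caller must guarantee `i < data.len()`; otherwise the behavior
/// is undefined.
pub unsafe fn get_unchecked_raw(data: &[u64], i: usize) -> u64 {
    // SAFETY: the caller upholds `i < data.len()`.
    unsafe { *data.get_unchecked(i) }
}
```

In many loops the compiler can prove the index is in range and drop the check on its own, so it is worth measuring before reaching for the unsafe version.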
Multiplication of a non-const integer by a const integer: 4.5 CPU cycles
Multiplication of 2 non-const usize values: 6 CPU cycles
usize::trailing_zeros: 9.5 CPU cycles
usize::count_ones: 21 CPU cycles
f64::max: about 12 CPU cycles?
f64::max is not fast because it needs to handle NaNs. A simple implementation of max using > is much faster if your data won't have NaNs.
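A minimal sketch of the simple comparison-based max described above. max_no_nan is a hypothetical name; it is only equivalent to f64::max when neither input is NaN.

```rust
/// Comparison-based max. Assumes the inputs are never NaN.
/// If `b` is NaN the comparison is false and NaN is returned,
/// whereas f64::max would return the non-NaN argument.
pub fn max_no_nan(a: f64, b: f64) -> f64 {
    if a > b { a } else { b }
}
```

For example, max_no_nan(1.0, 2.0) and f64::max(1.0, 2.0) both give 2.0, but max_no_nan(1.0, f64::NAN) gives NaN while f64::max(1.0, f64::NAN) gives 1.0.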
Vec::clear / ArrayVec::clear: 2 CPU cycles
References
- How to benchmark a private function on all versions of the compiler
- Why my Rust benchmarks were wrong, or how to correctly use std::hint::black_box?
- https://github.com/madsmtm/objc2/blob/master/objc2/benches/autorelease.rs