The optimization of short sequences of loop-free, fixed-point assembly code is an important problem in high-performance computing. However, the competing constraints
of transformation correctness and performance improvement often force even special-purpose compilers to produce suboptimal code. We show that by encoding these constraints as terms in a cost function, and using a Markov Chain Monte Carlo sampler to rapidly explore the space of
all possible code sequences, we are able to generate aggressively optimized versions of a given target code sequence. Beginning from binaries compiled by llvm -O0, we are able to produce provably correct code sequences that either match or outperform the code produced by gcc -O3, icc -O3, and in some cases expert handwritten assembly.
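
To make the idea of a cost function with a correctness term and a performance term concrete, the following is a minimal sketch of a Metropolis-style search over a toy instruction set. It is an illustration only and not the system described here: the two-register ISA, the opcode latencies, the weight `perf_weight`, the inverse temperature `beta`, and the names `run`, `cost`, `propose`, and `mcmc_search` are all assumptions introduced for the example, and correctness is approximated by test cases rather than proved.

```python
import math
import random

# Hypothetical toy ISA: opcode -> (semantics on 8-bit values, assumed latency).
OPS = {
    "nop": (lambda d, s: d, 0),
    "mov": (lambda d, s: s, 1),
    "add": (lambda d, s: (d + s) & 0xFF, 1),
    "shl": (lambda d, s: (d << 1) & 0xFF, 1),
    "mul": (lambda d, s: (d * s) & 0xFF, 3),
}
NUM_REGS = 2


def run(program, x):
    """Execute a list of (opcode, dst, src) with r0 = x, r1 = 0; return r0."""
    regs = [x, 0]
    for op, dst, src in program:
        regs[dst] = OPS[op][0](regs[dst], regs[src])
    return regs[0]


def cost(program, target, tests, perf_weight=0.05):
    """Correctness term (differing output bits) plus weighted latency term."""
    wrong_bits = sum(
        bin(run(program, x) ^ run(target, x)).count("1") for x in tests
    )
    latency = sum(OPS[op][1] for op, _, _ in program)
    return wrong_bits + perf_weight * latency


def propose(program, rng):
    """Symmetric proposal: replace one instruction with a uniformly random one."""
    new = list(program)
    i = rng.randrange(len(new))
    new[i] = (
        rng.choice(sorted(OPS)),
        rng.randrange(NUM_REGS),
        rng.randrange(NUM_REGS),
    )
    return new


def mcmc_search(target, tests, steps=2000, beta=1.0, seed=0):
    """Metropolis sampling over programs; returns the lowest-cost program seen."""
    rng = random.Random(seed)
    current = list(target)
    current_cost = cost(current, target, tests)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_cost = cost(candidate, target, tests)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept regressions with prob exp(-beta * delta).
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Unoptimized target: r0 = 4 * r0 via a copy and three additions.
TARGET = [("mov", 1, 0), ("add", 0, 1), ("add", 0, 1), ("add", 0, 1)]
TESTS = list(range(0, 256, 7))

if __name__ == "__main__":
    best, best_cost = mcmc_search(TARGET, TESTS)
    print(best, best_cost)
```

Because the performance weight is chosen so that any single wrong output bit outweighs the largest possible latency term, a program whose cost does not exceed the target's own cost must agree with the target on every test input. Agreement on tests is weaker than a proof of equivalence, so a candidate found this way would still need to be verified separately.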