OCaml's optimization is very good, but not perfect. For example, if you need true 32-bit integers instead of the native tagged ints (31-bit on 32-bit platforms, 63-bit on 64-bit ones), your performance is going to tank due to boxing.
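To make the boxing point concrete, here is a minimal sketch contrasting native `int` with `Int32`. The function names `sum_native` and `sum_int32` are made up for illustration, and it assumes OCaml 4.03 or later for `Sys.int_size`. It only shows the difference in types and semantics; it is not a benchmark, and how much boxing the compiler actually avoids depends on the version and flags.

```ocaml
(* Width of the native tagged int: 31 on 32-bit native targets,
   63 on 64-bit native targets. One bit is reserved as the tag. *)
let native_bits = Sys.int_size

(* Native ints are immediate values, so this loop needs no allocation. *)
let sum_native (a : int array) : int =
  Array.fold_left ( + ) 0 a

(* int32 gives exact 32-bit wraparound, but values are generally boxed
   (heap-allocated) when stored in ordinary arrays or passed between
   functions, which is where the slowdown comes from. *)
let sum_int32 (a : int32 array) : int32 =
  Array.fold_left Int32.add 0l a

let () =
  (* Int32 arithmetic wraps at exactly 32 bits. *)
  let wrapped = Int32.add Int32.max_int 1l in
  Printf.printf "native int size: %d bits\n" native_bits;
  Printf.printf "Int32.max_int + 1 = %ld\n" wrapped;
  Printf.printf "native sum: %d\n" (sum_native [| 1; 2; 3 |]);
  Printf.printf "int32 sum: %ld\n" (sum_int32 [| 1l; 2l; 3l |])
```

If you only need the 32-bit wraparound in a few places, one option is to stay in native ints and mask explicitly, which avoids the `Int32` type altogether on 64-bit platforms.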
MLton is a whole-program optimizer (rather than compiling a function at a time like OCaml) and wrings out a ton of performance, though compile times are quite a bit longer.
As of 3 years ago it wasn't, but maybe that has changed.
Whole-program compilation isn't without its downsides, though. It seems pretty common to develop in SML/NJ because it compiles fast, then do the final performance profiling and deployment with MLton (having a standard helps). Function-at-a-time compilation is also more amenable to caching and reusing parts of the previous compile.