You have to be careful about deferring such work. It may end up more expensive if it means multiple threads access that data, and/or you need to extend the lifetime of an object so the logger can still access it.
As long as you are just using static strings and native types, it amounts to a pointer/index bump and a load/store per item. Let's imagine you have the format string, priority number, system id, and 7 pieces of data in the payload. That would be 10 items, so something like 40 cycles? I can see how the paper gets 18ns.
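To make the "bump and store per item" cost concrete, here is a rough sketch of what the hot path could look like. It assumes a single-threaded staging buffer with one 64-bit slot per item. The names `StagingBuffer` and `log_record` are made up for illustration and are not taken from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical staging buffer: one 64-bit slot per logged item.
// Assumes a single producer thread; no synchronization is shown.
struct StagingBuffer {
    static constexpr std::size_t kCapacity = 1024;
    std::uint64_t slots[kCapacity];
    std::size_t head = 0;  // index of the next free slot

    // One store plus one index bump per item.
    void push(std::uint64_t value) {
        assert(head < kCapacity);  // a real logger would wrap or flush instead
        slots[head++] = value;
    }
};

// Records the address of a static format string, a priority, a system id,
// and 7 payload values: 10 items in total. No formatting happens here;
// that work is deferred to whoever drains the buffer.
inline std::size_t log_record(StagingBuffer& buf,
                              const char* fmt,
                              std::uint32_t priority,
                              std::uint32_t system_id,
                              const std::uint64_t (&payload)[7]) {
    const std::size_t start = buf.head;
    buf.push(reinterpret_cast<std::uintptr_t>(fmt));  // pointer only, no copy
    buf.push(priority);
    buf.push(system_id);
    for (std::uint64_t value : payload) {
        buf.push(value);
    }
    return buf.head - start;  // number of slots written
}
```

Storing only the pointer is what makes static strings cheap, and it is also where the lifetime concern comes from: anything that is not static has to outlive the deferred formatting step.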
I have no doubt the 7ns number is heavily cooked.