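The overhead of atomics comes almost entirely (if not exclusively?) from managing the caches in the CPU. Otherwise they're just normal bytes in memory. Your CPU already has to do cache coherency work for regular loads and stores, so an atomic is mostly only worse if there's contention: each core that writes has to take exclusive ownership of the cache line, which invalidates the copies held by other cores and bounces the line between them (effectively a forced flush). Stronger memory orderings can add some fence cost on top of that, but it's usually the smaller effect.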
The worst case for an atomic write is two additional cache line flushes, iirc.
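
If you want to see the contention effect for yourself, here's a rough C++ sketch, not a proper benchmark. It checks that an atomic integer is the same size as a plain one, then times two threads incrementing either one shared counter or two counters on separate cache lines. The names (`PaddedCounter`, `run_two_threads`, `contended`, `uncontended`) are made up for this example, and the 64-byte cache line size is an assumption that holds on most x86-64 parts but not everywhere.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Assumption: on mainstream 64-bit platforms an atomic integer occupies the
// same bytes as a plain one, with no hidden lock stored next to it.
static_assert(sizeof(std::atomic<std::uint64_t>) == sizeof(std::uint64_t),
              "atomic<uint64_t> carries extra storage on this platform");

// Assumption: 64-byte cache lines. The alignment keeps each counter on its
// own line so two separate counters never share one.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

struct Result {
    double ms;            // elapsed wall time in milliseconds
    std::uint64_t total;  // sum of all increments performed
};

// Starts two threads that each do `iters` relaxed atomic increments on the
// counter they are handed, and returns the elapsed time in milliseconds.
double run_two_threads(std::atomic<std::uint64_t>& a,
                       std::atomic<std::uint64_t>& b,
                       std::uint64_t iters) {
    auto work = [iters](std::atomic<std::uint64_t>& c) {
        for (std::uint64_t i = 0; i < iters; ++i) {
            c.fetch_add(1, std::memory_order_relaxed);
        }
    };
    auto start = std::chrono::steady_clock::now();
    std::thread t1(work, std::ref(a));
    std::thread t2(work, std::ref(b));
    t1.join();
    t2.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Both threads hit the same counter, so they fight over one cache line.
Result contended(std::uint64_t iters) {
    PaddedCounter shared;
    double ms = run_two_threads(shared.value, shared.value, iters);
    assert(shared.value.load() == 2 * iters);  // no increments were lost
    return {ms, shared.value.load()};
}

// Each thread gets its own counter on its own cache line.
Result uncontended(std::uint64_t iters) {
    PaddedCounter a, b;
    double ms = run_two_threads(a.value, b.value, iters);
    return {ms, a.value.load() + b.value.load()};
}
```

Call `contended(n)` and `uncontended(n)` with the same `n` (a few million, say) and compare the `ms` fields. Both do the same number of atomic increments and both end with `total == 2 * n`; any gap in timing comes from the shared cache line. The actual numbers depend heavily on the CPU, the compiler flags, and whether the two threads land on different cores, so treat them as illustrative only.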