most platforms come with their own libraries for this, which are usually a mix of hand coded assembly and C. You #include the whole library/sdk, but the linker strips out all bits you don't use.
Even then, if you read the disassembled code, you can usually find within a few minutes looking some stupid/unused/inefficient code - so you could totally do a better job if you wrote the assembly by hand, but it would take much more time (especially since most of these architectures tend to have very irregular instruction sets)
Even then, if you read the disassembled code, you can usually find within a few minutes looking some stupid/unused/inefficient code - so you could totally do a better job if you wrote the assembly by hand, but it would take much more time (especially since most of these architectures tend to have very irregular instruction sets)