by exacube 4008 days ago
This post is really old (2008), and the issue has probably been resolved since. Here is the gist of the optimization the author proposes:

The memory and string functions in libc perform poorly because they do not use XMM registers and have no efficient way of dealing with unaligned data. The most efficient way to copy data when source and destination have different alignments is to read aligned into XMM registers, shift and combine consecutive reads so that they fit the alignment of the destination, and then write aligned.
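
To make the read/shift/combine/write idea concrete, here is a rough sketch of my own, not the author's code. It uses 8-byte words as a stand-in for 16-byte XMM registers so that it compiles anywhere; with SSE the same pattern would use aligned 16-byte loads and stores plus a byte-shift-and-merge instruction such as PALIGNR. The helper name copy_shift_combine is made up, and the sketch assumes a little-endian CPU and that the aligned words covering the source range are readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the post.
 * Copies nwords 8-byte words from src (any alignment) to dst (8-byte
 * aligned) using only aligned loads and aligned stores.
 * Assumptions: little-endian CPU; the aligned words that cover
 * [src, src + 8*nwords) are readable and live in a uint64_t array. */
static void copy_shift_combine(uint64_t *dst, const unsigned char *src,
                               size_t nwords)
{
    size_t off = (uintptr_t)src & 7;       /* src misalignment in bytes */
    assert(((uintptr_t)dst & 7) == 0);     /* dst must be aligned */

    if (nwords == 0)
        return;
    if (off == 0) {                        /* same alignment: plain copy */
        memcpy(dst, src, nwords * sizeof *dst);
        return;
    }

    /* Round src down to the previous aligned word. */
    const uint64_t *s = (const uint64_t *)(const void *)(src - off);
    unsigned rshift = 8 * (unsigned)off;   /* bits to drop from the low word */
    unsigned lshift = 64 - rshift;         /* bits to take from the next word */

    uint64_t lo = s[0];                    /* aligned read */
    for (size_t i = 0; i < nwords; i++) {
        uint64_t hi = s[i + 1];            /* next aligned read */
        /* Combine the tail of lo with the head of hi so the result
         * matches the destination's alignment, then write aligned. */
        dst[i] = (lo >> rshift) | (hi << lshift);
        lo = hi;                           /* each word is loaded only once */
    }
}
```

Each source word is loaded exactly once and every store is aligned; the only extra work per word is two shifts and an OR. A real memcpy would also need head and tail handling for lengths that are not a multiple of the word size, which this sketch leaves out.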