Micro-ops (uops) are the smaller units of work a CPU executes to carry out an instruction. Some instructions take 1 uop, like adding two registers together. Others take multiple uops, like adding a register and a memory location: that might be split into one uop to read from memory into a temporary register, and then another uop to add that temporary register to the other one.
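To make that split concrete, here is a toy sketch in Python. It is not how any real decoder works; the `Uop` type, the `tmp0` register name, and the two-operand `add` syntax are all made up for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    """Hypothetical micro-op: an operation, a destination, and its sources."""
    op: str
    dst: str
    srcs: tuple


def decode(instr: str) -> list[Uop]:
    """Split a toy 'add dst, src' instruction into uops."""
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = [o.strip() for o in operands.split(",")]
    if mnemonic != "add":
        raise ValueError(f"toy decoder only knows 'add', got {mnemonic!r}")
    if src.startswith("["):
        # Memory operand: load into a temporary first, then add.
        return [
            Uop("load", "tmp0", (src,)),
            Uop("add", dst, (dst, "tmp0")),
        ]
    # Register operand: a single uop is enough.
    return [Uop("add", dst, (dst, src))]


if __name__ == "__main__":
    for text in ("add rax, rbx", "add rax, [rcx]"):
        print(text, "->", [u.op for u in decode(text)])
```

The register-register form comes out as one uop, and the register-memory form as a load followed by an add.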
There are instruction decoders that quickly break up a CPU instruction into a series of uops. They need to be fast or the execution units may sit idle, so there are limitations, like breaking an instruction into 4 uops max. On Intel x86 processors, there were lots of quite complex instructions that might need to be broken down into more than 4 uops, so a separate microcode module handles those. However, there is only one of those, so it can become a performance bottleneck if you use too many of those complex instructions.
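A rough way to see why the single microcode unit can become a bottleneck is a toy front-end model like the one below. Everything in it is an assumption for illustration: the instruction names, the uop counts, the number of parallel decoders, and the idea that the microcode unit emits 4 uops per cycle while the regular decoders wait. Real CPUs differ in all of these details.

```python
import math

MAX_DECODER_UOPS = 4  # the "4 uops max" limit from the text; varies by CPU

# Hypothetical uop counts per instruction, invented for this sketch.
UOP_COUNTS = {
    "add_reg_reg": 1,
    "add_reg_mem": 2,
    "complex_string_op": 12,
}


def route(instr: str) -> str:
    """Pick which unit handles an instruction in this toy model."""
    if UOP_COUNTS[instr] <= MAX_DECODER_UOPS:
        return "decoder"
    return "microcode"


def front_end_cycles(stream, n_decoders=4):
    """Very rough cycle count for decoding a stream of instructions.

    Assumptions: simple instructions decode n_decoders per cycle; each
    microcoded instruction goes through the single microcode unit, which
    emits MAX_DECODER_UOPS uops per cycle while the decoders sit idle.
    """
    cycles = 0
    simple_run = 0
    for instr in stream:
        if route(instr) == "decoder":
            simple_run += 1
        else:
            # Flush the pending simple instructions, then serialize.
            cycles += math.ceil(simple_run / n_decoders)
            simple_run = 0
            cycles += math.ceil(UOP_COUNTS[instr] / MAX_DECODER_UOPS)
    cycles += math.ceil(simple_run / n_decoders)
    return cycles


if __name__ == "__main__":
    print("8 simple adds:", front_end_cycles(["add_reg_reg"] * 8))
    print("8 complex ops:", front_end_cycles(["complex_string_op"] * 8))
```

In this model the same number of instructions takes many more front-end cycles once they all have to go through the one microcode unit, which is the effect described above.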
If you really want a technical book on this, read "Modern Processor Design" by Shen. It's a bit pricey though (I got one cheap second-hand).
The bible is Agner Fog's optimization manuals [1], which contain quite a detailed description of the microarchitecture of Intel and AMD CPUs from the Pentium era to today. They are based on extensive reverse engineering done by the author.
David Kanter's microarchitecture articles at RWT [2] are also quite good.