Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.
Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
In order to properly implement these atomic we need one register more than other
binary atomics. It is used for storing result from comparing values in addition
to the one that is used for actual result of operation.
https://reviews.llvm.org/D71028