forked from OSchip/llvm-project
				
			Update Atomics.rst
Summary: I changed various bits of the compilation of atomics recently, and forgot updating the documentation. This patch just brings it up to date. Test Plan: no change to the code Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5590 llvm-svn: 218937
This commit is contained in:
		
							parent
							
								
									19c9f8cd50
								
							
						
					
					
						commit
						e83f59e658
					
				| 
						 | 
					@ -18,8 +18,8 @@ clarified in the IR.
 | 
				
			||||||
The atomic instructions are designed specifically to provide readable IR and
 | 
					The atomic instructions are designed specifically to provide readable IR and
 | 
				
			||||||
optimized code generation for the following:
 | 
					optimized code generation for the following:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* The new C++0x ``<atomic>`` header.  (`C++0x draft available here
 | 
					* The new C++11 ``<atomic>`` header.  (`C++11 draft available here
 | 
				
			||||||
  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C1x draft available here
 | 
					  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
 | 
				
			||||||
  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)
 | 
					  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Proper semantics for Java-style memory, for both ``volatile`` and regular
 | 
					* Proper semantics for Java-style memory, for both ``volatile`` and regular
 | 
				
			||||||
| 
						 | 
					@ -115,7 +115,10 @@ memory operation can happen on any thread between the load and store.
 | 
				
			||||||
A ``fence`` provides Acquire and/or Release ordering which is not part of
 | 
					A ``fence`` provides Acquire and/or Release ordering which is not part of
 | 
				
			||||||
another operation; it is normally used along with Monotonic memory operations.
 | 
					another operation; it is normally used along with Monotonic memory operations.
 | 
				
			||||||
A Monotonic load followed by an Acquire fence is roughly equivalent to an
 | 
					A Monotonic load followed by an Acquire fence is roughly equivalent to an
 | 
				
			||||||
Acquire load.
 | 
					Acquire load, and a Monotonic store following a Release fence is roughly
 | 
				
			||||||
 | 
					equivalent to a Release store. SequentiallyConsistent fences behave as both
 | 
				
			||||||
 | 
					an Acquire and a Release fence, and offer some additional complicated
 | 
				
			||||||
 | 
					guarantees, see the C++11 standard for details.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Frontends generating atomic instructions generally need to be aware of the
 | 
					Frontends generating atomic instructions generally need to be aware of the
 | 
				
			||||||
target to some degree; atomic instructions are guaranteed to be lock-free, and
 | 
					target to some degree; atomic instructions are guaranteed to be lock-free, and
 | 
				
			||||||
| 
						 | 
					@ -221,7 +224,7 @@ essentially guarantees that if you take all the operations affecting a specific
 | 
				
			||||||
address, a consistent ordering exists.
 | 
					address, a consistent ordering exists.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Relevant standard
 | 
					Relevant standard
 | 
				
			||||||
  This corresponds to the C++0x/C1x ``memory_order_relaxed``; see those
 | 
					  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
 | 
				
			||||||
  standards for the exact definition.
 | 
					  standards for the exact definition.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Notes for frontends
 | 
					Notes for frontends
 | 
				
			||||||
| 
						 | 
					@ -251,8 +254,8 @@ Acquire provides a barrier of the sort necessary to acquire a lock to access
 | 
				
			||||||
other memory with normal loads and stores.
 | 
					other memory with normal loads and stores.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Relevant standard
 | 
					Relevant standard
 | 
				
			||||||
  This corresponds to the C++0x/C1x ``memory_order_acquire``. It should also be
 | 
					  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
 | 
				
			||||||
  used for C++0x/C1x ``memory_order_consume``.
 | 
					  used for C++11/C11 ``memory_order_consume``.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Notes for frontends
 | 
					Notes for frontends
 | 
				
			||||||
  If you are writing a frontend which uses this directly, use with caution.
 | 
					  If you are writing a frontend which uses this directly, use with caution.
 | 
				
			||||||
| 
						 | 
					@ -281,7 +284,7 @@ Release is similar to Acquire, but with a barrier of the sort necessary to
 | 
				
			||||||
release a lock.
 | 
					release a lock.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Relevant standard
 | 
					Relevant standard
 | 
				
			||||||
  This corresponds to the C++0x/C1x ``memory_order_release``.
 | 
					  This corresponds to the C++11/C11 ``memory_order_release``.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Notes for frontends
 | 
					Notes for frontends
 | 
				
			||||||
  If you are writing a frontend which uses this directly, use with caution.
 | 
					  If you are writing a frontend which uses this directly, use with caution.
 | 
				
			||||||
| 
						 | 
					@ -307,7 +310,7 @@ AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
 | 
				
			||||||
barrier (for fences and operations which both read and write memory).
 | 
					barrier (for fences and operations which both read and write memory).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Relevant standard
 | 
					Relevant standard
 | 
				
			||||||
  This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
 | 
					  This corresponds to the C++11/C11 ``memory_order_acq_rel``.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Notes for frontends
 | 
					Notes for frontends
 | 
				
			||||||
  If you are writing a frontend which uses this directly, use with caution.
 | 
					  If you are writing a frontend which uses this directly, use with caution.
 | 
				
			||||||
| 
						 | 
					@ -330,7 +333,7 @@ and Release semantics for stores. Additionally, it guarantees that a total
 | 
				
			||||||
ordering exists between all SequentiallyConsistent operations.
 | 
					ordering exists between all SequentiallyConsistent operations.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Relevant standard
 | 
					Relevant standard
 | 
				
			||||||
  This corresponds to the C++0x/C1x ``memory_order_seq_cst``, Java volatile, and
 | 
					  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
 | 
				
			||||||
  the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
 | 
					  the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Notes for frontends
 | 
					Notes for frontends
 | 
				
			||||||
| 
						 | 
					@ -368,6 +371,11 @@ Predicates for optimizer writers to query:
 | 
				
			||||||
  that they return true for any operation which is volatile or at least
 | 
					  that they return true for any operation which is volatile or at least
 | 
				
			||||||
  Monotonic.
 | 
					  Monotonic.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* ``isAtLeastAcquire()``/``isAtLeastRelease()``: These are predicates on
 | 
				
			||||||
 | 
					  orderings. They can be useful for passes that are aware of atomics, for
 | 
				
			||||||
 | 
					  example to do DSE across a single atomic access, but not across a
 | 
				
			||||||
 | 
					  release-acquire pair (see MemoryDependencyAnalysis for an example of this)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Alias analysis: Note that AA will return ModRef for anything Acquire or
 | 
					* Alias analysis: Note that AA will return ModRef for anything Acquire or
 | 
				
			||||||
  Release, and for the address accessed by any Monotonic operation.
 | 
					  Release, and for the address accessed by any Monotonic operation.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					@ -389,7 +397,9 @@ operations:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* DSE: Unordered stores can be DSE'ed like normal stores.  Monotonic stores can
 | 
					* DSE: Unordered stores can be DSE'ed like normal stores.  Monotonic stores can
 | 
				
			||||||
  be DSE'ed in some cases, but it's tricky to reason about, and not especially
 | 
					  be DSE'ed in some cases, but it's tricky to reason about, and not especially
 | 
				
			||||||
  important.
 | 
					  important. It is possible in some case for DSE to operate across a stronger
 | 
				
			||||||
 | 
					  atomic operation, but it is fairly tricky. DSE delegates this reasoning to
 | 
				
			||||||
 | 
					  MemoryDependencyAnalysis (which is also used by other passes like GVN).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Folding a load: Any atomic load from a constant global can be constant-folded,
 | 
					* Folding a load: Any atomic load from a constant global can be constant-folded,
 | 
				
			||||||
  because it cannot be observed.  Similar reasoning allows scalarrepl with
 | 
					  because it cannot be observed.  Similar reasoning allows scalarrepl with
 | 
				
			||||||
| 
						 | 
					@ -400,7 +410,8 @@ Atomics and Codegen
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
 | 
					Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
 | 
				
			||||||
On architectures which use barrier instructions for all atomic ordering (like
 | 
					On architectures which use barrier instructions for all atomic ordering (like
 | 
				
			||||||
ARM), appropriate fences are split out as the DAG is built.
 | 
					ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if
 | 
				
			||||||
 | 
					``setInsertFencesForAtomic()`` was used.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The MachineMemOperand for all atomic operations is currently marked as volatile;
 | 
					The MachineMemOperand for all atomic operations is currently marked as volatile;
 | 
				
			||||||
this is not correct in the IR sense of volatile, but CodeGen handles anything
 | 
					this is not correct in the IR sense of volatile, but CodeGen handles anything
 | 
				
			||||||
| 
						 | 
					@ -415,11 +426,6 @@ error when given an operation which cannot be implemented.  (The LLVM code
 | 
				
			||||||
generator is not very helpful here at the moment, but hopefully that will
 | 
					generator is not very helpful here at the moment, but hopefully that will
 | 
				
			||||||
change.)
 | 
					change.)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The implementation of atomics on LL/SC architectures (like ARM) is currently a
 | 
					 | 
				
			||||||
bit of a mess; there is a lot of copy-pasted code across targets, and the
 | 
					 | 
				
			||||||
representation is relatively unsuited to optimization (it would be nice to be
 | 
					 | 
				
			||||||
able to optimize loops involving cmpxchg etc.).
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
 | 
					On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
 | 
				
			||||||
generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
 | 
					generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
 | 
				
			||||||
fences generate an ``MFENCE``, other fences do not cause any code to be
 | 
					fences generate an ``MFENCE``, other fences do not cause any code to be
 | 
				
			||||||
| 
						 | 
					@ -435,3 +441,17 @@ operation. Loads and stores generate normal instructions.  ``cmpxchg`` and
 | 
				
			||||||
``atomicrmw`` can be represented using a loop with LL/SC-style instructions
 | 
					``atomicrmw`` can be represented using a loop with LL/SC-style instructions
 | 
				
			||||||
which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
 | 
					which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
 | 
				
			||||||
on ARM, etc.).
 | 
					on ARM, etc.).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					It is often easiest for backends to use AtomicExpandPass to lower some of the
 | 
				
			||||||
 | 
					atomic constructs. Here are some lowerings it can do:
 | 
				
			||||||
 | 
					* cmpxchg -> loop with load-linked/store-conditional
 | 
				
			||||||
 | 
					  by overriding ``hasLoadLinkedStoreConditional()``, ``emitLoadLinked()``,
 | 
				
			||||||
 | 
					  ``emitStoreConditional()``
 | 
				
			||||||
 | 
					* large loads/stores -> ll-sc/cmpxchg
 | 
				
			||||||
 | 
					  by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
 | 
				
			||||||
 | 
					* strong atomic accesses -> monotonic accesses + fences
 | 
				
			||||||
 | 
					  by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()``
 | 
				
			||||||
 | 
					  and ``emitTrailingFence()``
 | 
				
			||||||
 | 
					* atomic rmw -> loop with cmpxchg or load-linked/store-conditional
 | 
				
			||||||
 | 
					  by overriding ``expandAtomicRMWInIR()``
 | 
				
			||||||
 | 
					For an example of all of these, look at the ARM backend.
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
		Loading…
	
		Reference in New Issue