llvm-project

Commit Graph

Author	SHA1	Message	Date
David Sherwood	71bd59f0cb	[SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute In this patch I have added support for a new loop hint called vectorize.scalable.enable that says whether we should enable scalable vectorization or not. If a user wants to instruct the compiler to vectorize a loop with scalable vectors they can now do this as follows: br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2 ... !2 = !{!2, !3, !4} !3 = !{!"llvm.loop.vectorize.width", i32 8} !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true} Setting the hint to false simply reverts the behaviour back to the default, using fixed width vectors. Differential Revision: https://reviews.llvm.org/D88962	2020-12-02 13:23:43 +00:00
Juneyoung Lee	8e504615e9	[LangRef] missing link, minor fix	2020-11-30 23:09:36 +09:00
Juneyoung Lee	1856e22eeb	[LangRef] minor fixes to poison examples and well-defined values section (NFC)	2020-11-29 20:51:25 +09:00
Juneyoung Lee	2e32c49d97	[LangRef] Add poison constant This patch adds a description about the newly added poison constant to LangRef. Differential Revision: https://reviews.llvm.org/D92162	2020-11-27 10:29:52 +09:00
Alex Richardson	3bc4157556	Add a default address space for globals to DataLayout This is similar to the existing alloca and program address spaces (D37052) and should be used when creating/accessing global variables. We need this in our CHERI fork of LLVM to place all globals in address space 200. This ensures that values are accessed using CHERI load/store instructions instead of the normal MIPS/RISC-V ones. The problem this is trying to fix is that most of the time the type of globals is created using a simple PointerType::getUnqual() (or ::get() with the default address-space value of 0). This does not work for us and we get assertion/compilation/instruction selection failures whenever a new call is added that uses the default value of zero. In our fork we have removed the default parameter value of zero for most address space arguments and use DL.getProgramAddressSpace() or DL.getGlobalsAddressSpace() whenever possible. If this change is accepted, I will upstream follow-up patches to use DL.getGlobalsAddressSpace() instead of relying on the default value of 0 for PointerType::get(), etc. This patch and the follow-up changes will not have any functional changes for existing backends with the default globals address space of zero. A follow-up commit will change the default globals address space for AMDGPU to 1. Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D70947	2020-11-20 15:46:52 +00:00
Leonard Chan	a97f62837f	[llvm][IR] Add dso_local_equivalent Constant The `dso_local_equivalent` constant is a wrapper for functions that represents a value which is functionally equivalent to the global passed to this. That is, if this accepts a function, calling this constant should have the same effects as calling the function directly. This could be a direct reference to the function, the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the resolved function as a call target. When lowered, the returned address must have a constant offset at link time from some other symbol defined within the same binary. The address of this value is also insignificant. The name is leveraged from `dso_local` where use of a function or variable is resolved to a symbol in the same linkage unit. In this patch: - Addition of `dso_local_equivalent` and handling it - Update Constant::needsRelocation() to strip constant inbound GEPs and take advantage of `dso_local_equivalent` for relative references This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959) which makes vtables readonly. This works by replacing the dynamic relocations for function pointers in them with static relocations that represent the offset between the vtable and virtual functions. If a function is externally defined, `dso_local_equivalent` can be used as a generic wrapper for the function to still allow for this static offset calculation to be done. See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details. Differential Revision: https://reviews.llvm.org/D77248	2020-11-19 10:26:17 -08:00
Nick Desaulniers	f4c6080ab8	Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch" This reverts commit `b7926ce6d7`. Going with a simpler approach.	2020-11-17 17:27:14 -08:00
Nikita Popov	c87c375096	[LangRef] Clarify GEP inbounds wrapping semantics Clarify the semantics of GEP inbounds, in particular with regard to what it means for wrapping. This cleans up some confusion on when it is legal to apply nuw/nsw flags to various parts of the GEP calculation. Differential Revision: https://reviews.llvm.org/D90708	2020-11-13 17:49:41 +01:00
Florian Hahn	8bb6347939	Add !annotation metadata and remarks pass. This patch adds a new !annotation metadata kind which can be used to attach annotation strings to instructions. It also adds a new pass that emits summary remarks per function with the counts for each annotation kind. The intended uses cases for this new metadata is annotating 'interesting' instructions and the remarks should provide additional insight into transformations applied to a program. To motivate this, consider these specific questions we would like to get answered: * How many stores added for automatic variable initialization remain after optimizations? Where are they? * How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated? Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks' (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html) Reviewed By: thegameg, jdoerfert Differential Revision: https://reviews.llvm.org/D91188	2020-11-13 13:24:10 +00:00
David Green	7f34b9ddf8	[Sphinx] Fix langref formatting. NFC	2020-11-10 16:47:43 +00:00
David Green	b2ac9681a7	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new instrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on it's own might cause more trouble that it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881	2020-11-10 15:57:58 +00:00
Atmn Patel	cea0599aa7	[LangRef] Adds llvm.loop.mustprogress loop metadata This patch adds the llvm.loop.mustprogress loop metadata. This is to be added to loops where the frontend language requires that the loop makes observable interactions with the environment. This is the loop-level equivalent to the function attribute `mustprogress` defined in D86233. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D88464	2020-11-04 22:32:50 -05:00
Nikita Popov	fa48ff3fc9	[CodeGen] Fix neutral value of vecreduce fadd in tests (NFC) The neutral value is -0.0, not 0.0. This doesn't matter for "fast" reductions due to nsz, but does matter for reassoc-only and seq reductions. Change tests to mostly use -0.0 where the neutral value was intended, and add some additional test coverage in some places. Also update LangRef to use the right value.	2020-10-29 21:26:14 +01:00
Johannes Doerfert	14077836ec	[LangRef] Clarify `dereferenceable` -> `nonnull` implication If `null_pointer_is_valid` is present, `dereferenceable` does not imply `nonnull`, make it clear. Came up in D17993. Reviewed By: aqjune Differential Revision: https://reviews.llvm.org/D89417	2020-10-27 19:12:53 -05:00
Artur Pilipenko	6ec2c5e402	GC-parseable element atomic memcpy/memmove This change introduces a GC parseable lowering for element atomic memcpy/memmove intrinsics. This way runtime can provide an implementation which can take a safepoint during copy operation. See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev for the background and details: https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ Differential Revision: https://reviews.llvm.org/D88861	2020-10-23 14:06:09 -07:00
Nick Desaulniers	b7926ce6d7	[IR] add fn attr for no_stack_protector; prevent inlining on mismatch It's currently ambiguous in IR whether the source language explicitly did not want a stack a stack protector (in C, via function attribute no_stack_protector) or doesn't care for any given function. It's common for code that manipulates the stack via inline assembly or that has to set up its own stack canary (such as the Linux kernel) would like to avoid stack protectors in certain functions. In this case, we've been bitten by numerous bugs where a callee with a stack protector is inlined into an __attribute__((__no_stack_protector__)) caller, which generally breaks the caller's assumptions about not having a stack protector. LTO exacerbates the issue. While developers can avoid this by putting all no_stack_protector functions in one translation unit together and compiling those with -fno-stack-protector, it's generally not very ergonomic or as ergonomic as a function attribute, and still doesn't work for LTO. See also: https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/ https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u Typically, when inlining a callee into a caller, the caller will be upgraded in its level of stack protection (see adjustCallerSSPLevel()). By adding an explicit attribute in the IR when the function attribute is used in the source language, we can now identify such cases and prevent inlining. Block inlining when the callee and caller differ in the case that one contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`. Fixes pr/47479. Reviewed By: void Differential Revision: https://reviews.llvm.org/D87956	2020-10-23 11:55:39 -07:00
Atmn Patel	1e55cf77f3	[LangRef] Define mustprogress attribute LLVM IR currently assumes some form of forward progress. This form is not explicitly defined anywhere, and is the cause of miscompilations in most languages that are not C++11 or later. This implicit forward progress guarantee can not be opted out of on a function level nor on a loop level. Languages such as C (C11 and later), C++ (pre-C++11), and Rust have different forward progress requirements and this needs to be evident in the IR. Specifically, C11 and onwards (6.8.5, Paragraph 6) states that "An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of for statement) its expression-3, may be assumed by the implementation to terminate." C++11 and onwards does not have this assumption, and instead assumes that every thread must make progress as defined in [intro.progress] when it comes to scheduling. This was initially brought up in [0] as a bug, a solution was presented in [1] which is the current workaround, and the predecessor to this change was [2]. After defining a notion of forward progress for IR, there are two options to address this: 1) Set the default to assuming Forward Progress and provide an opt-out for functions and an opt-in for loops. 2) Set the default to not assuming Forward Progress and provide an opt-in for functions, and an opt-in for loops. Option 2) has been selected because only C++11 and onwards have a forward progress requirement and it makes sense for them to opt-into it via the defined `mustprogress` function attribute. The `mustprogress` function attribute indicates that the function is required to make forward progress as defined. This is sharply in contrast to the status quo where this is implicitly assumed. In addition, `willreturn` implies `mustprogress`. The background for why this definition was chosen is in [3] and for why the option was chosen is in [4] and the corresponding thread(s). The implementation is in D85393, the clang patch is in D86841, the LoopDeletion patch is in D86844, the Inliner patches are in D87180 and D87262, and there will be more incoming. [0] https://bugs.llvm.org/show_bug.cgi?id=965#c25 [1] https://lists.llvm.org/pipermail/llvm-dev/2017-October/118558.html [2] https://reviews.llvm.org/D65718 [3] https://lists.llvm.org/pipermail/llvm-dev/2020-September/144919.html [4] https://lists.llvm.org/pipermail/llvm-dev/2020-September/145023.html Reviewed By: jdoerfert, efriedma, nikic Differential Revision: https://reviews.llvm.org/D86233	2020-10-19 13:34:27 -04:00
Sam Parker	03f3ef221b	[LangRef] Correct return type llvm.test.set.loop.iterations.* The langref description for llvm.test.set.loop.iterations.* were missing the i1 return type. Differential Revision: https://reviews.llvm.org/D89564 Patch by: Janek van Oirschot	2020-10-19 12:56:38 +01:00
Juneyoung Lee	62a0ec1612	Add support for !noundef metatdata on loads This patch adds metadata !noundef and makes load instructions can optionally have it. A load with !noundef always return a well-defined value (has no undef bit or isn't poison). If the loaded value isn't well defined, the behavior is undefined. This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values. It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise. The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead. The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89050	2020-10-17 13:50:10 +09:00
Juneyoung Lee	701cf4b5a5	[LangRef] Rename the names of metadata in load/store's syntax (NFC) Discussed in D89050	2020-10-17 13:30:02 +09:00
Alok Kumar Sharma	0538353b3b	[DebugInfo] Support for DWARF operator DW_OP_over LLVM rejects DWARF operator DW_OP_over. This DWARF operator is needed for Flang to support assumed rank array. Summary: Currently LLVM rejects DWARF operator DW_OP_over. Below error is produced when llvm finds this operator. [..] invalid expression !DIExpression(151, 20, 16, 48, 30, 35, 80, 34, 6) warning: ignoring invalid debug info in over.ll [..] There were some parts missing in support of this operator, which are now completed. Testing -added a unit testcase -check-debuginfo -check-llvm Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D89208	2020-10-17 08:42:28 +05:30
Matt Arsenault	0a7cd99a70	Reapply "OpaquePtr: Add type to sret attribute" This reverts commit `eb9f7c28e5`. Previously this was incorrectly handling linking of the contained type, so this merges the fixes from D88973.	2020-10-16 11:05:02 -04:00
Scott Linder	3f2386de63	[DebugInfo][docs] Document DILabel in LangRef Add some minimal documentation for DILabel, originally introduced in D45024. Update the name and semantics of the `variables:` field in the documentation for `DISubprogram`; the field is now called `retainedNodes:` and is a heterogeneous list of `DILocalVariable` and `DILabel`. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D89082	2020-10-13 18:26:41 +00:00
Alok Kumar Sharma	96bd4d34a2	[DebugInfo] Support for DWARF attribute DW_AT_rank This patch adds support for DWARF attribute DW_AT_rank. Summary: Fortran assumed rank arrays have dynamic rank. DWARF attribute DW_AT_rank is needed to support that. Testing: unit test cases added (hand-written) check llvm check debug-info Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D89141	2020-10-10 17:51:12 +05:30
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Michael Kruse	c3f12dd606	[docs] Revise loop terminology reference. Motivated by D88183, this seeks to clarify the current loop nomenclature with added illustrations, examples for possibly unexpected situations (infinite loops not part of the "parent" loop, logical loops sharing the same header, ...), and clarification on what other sources may consider a loop. The current document also has multiple errors that are fixed here. Some selected errors: * Loops a defined as strongly-connected components. A component a partition of all nodes, i.e. a subloop can never be a component. That is, the document as it currently is only covers top-level loops, even it also uses the term SCC for subloops. * "a block can be the header of two separate loops at the same time" (it is considered a single loop by LoopInfo) * "execute before some interesting event happens" (some interesting event is not well-defined) Reviewed By: baziotis, Whitney Differential Revision: https://reviews.llvm.org/D88408	2020-10-05 10:28:04 -05:00
Tres Popp	eb9f7c28e5	Revert "OpaquePtr: Add type to sret attribute" This reverts commit `55c4ff91bd`. Issues were introduced as discussed in https://reviews.llvm.org/D88241 where this change made previous bugs in the linker and BitCodeWriter visible.	2020-09-29 10:31:04 +02:00
Juneyoung Lee	8bd205bf1d	[LangRef] Clarify the behavior of memory access instructions when pointers/sizes aren't well-defined This is a patch to LangRef that clarifies the behavior of load/store/memset/memcpy/memmove when the pointers or sizes are not well-defined as well. MSan detects a case when e.g., only lower bits of address are garbage when `-msan-check-access-address` is enabled, and it does not directly conflict with this patch because a C program should not use a pointer with undef bits and reasonable optimizations do not convert a well-defined pointer into a pointer with undef bits. This patch contains a definition of a well-defined value as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D87994	2020-09-26 08:13:27 +09:00
Matt Arsenault	55c4ff91bd	OpaquePtr: Add type to sret attribute Make the corresponding change that was made for byval in `b7141207a4`. Like byval, this requires a bulk update of the test IR tests to include the type before this can be mandatory.	2020-09-25 14:07:30 -04:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Florian Hahn	1ddb3a369f	[LangRef] Adjust guarantee for llvm.memcpy to also allow equal arguments. This adjusts the description of `llvm.memcpy` to also allow operands to be equal. This is in line with what Clang currently expects. This change is intended to be temporary and followed by re-introduce a variant with the non-overlapping guarantee for cases where we can actually ensure that property in the front-end. See the links below for more details: http://lists.llvm.org/pipermail/cfe-dev/2020-August/066614.html and PR11763. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D86815	2020-09-05 19:18:23 +01:00
Yang Zhihui	691d436685	Fix typos in doc LangRef.rst Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D87077	2020-09-04 05:17:31 -07:00
Michael Kruse	137dfd616a	[LangRef] Fix condition for when a loop is considered parallel. The wording before this patch applies to llvm.mem.parallel_loop_access, not access groups. Reviewed By: mppf, hfinkel Differential Revision: https://reviews.llvm.org/D83781	2020-09-01 15:41:59 -05:00
Juneyoung Lee	09dcb52ca8	[LangRef] Apply a missing comment from D86189	2020-08-30 14:56:17 +09:00
Juneyoung Lee	98e5776897	[LangRef] State that storing an aggregate fills padding with undef This patch makes LangRef be explicit about the value of padding when storing an aggregate. It states that when an aggregate is stored into memory, padding is filled with undef. Here is a clue that supports this change (edited to reflect the discussion from llvm-dev): - IPSCCP ignores padding and directly stores a constant aggregate if possible. It loses the data stored in the padding. https://godbolt.org/z/xzenYs Memcpyopt ignores (the preexisting value of) padding when copying an aggregate or storing a constant: https://godbolt.org/z/hY6ndd / https://godbolt.org/z/3WMP5a The two items below are not relevant with this patch because Clang lowers load/store of individual field of struct into load/stores of the corresponding pointer with a primitive type. Also, when copy is needed, it uses memcpy instead of load/store of an aggregate, as discussed in the llvm-dev. However, this patch is still valid (as discussed) because it is needed to explain the two optimizations above. - According to C17, the value of padding bytes when storing values in structures or unions is unspecified. - I updated Alive2 and it did not find any problematic transformation from LLVM unit tests and while running translation validation of a few C programs. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D86189	2020-08-30 14:53:20 +09:00
Sjoerd Meijer	ff6dbb2319	Follow up of rGca243b07276a: fixed a typo. NFC.	2020-08-27 10:53:41 +01:00
Sjoerd Meijer	ca243b0727	[LangRef] get.active.lane.mask can produce poison value We had already specified that second argument `n` of this intrinsic is `n > 0`, but now add to this that the result is a poison value if this is not the case. Differential Revision: https://reviews.llvm.org/D86637	2020-08-27 08:57:35 +01:00
Juneyoung Lee	24dd04116d	[LangRef] Memset/memcpy/memmove can take undef/poison pointer if the size is 0 According to the current LangRef, Memset/memcpy/memmove can take a null/dangling pointer if the size is zero. (Relevant thread: http://lists.llvm.org/pipermail/llvm-dev/2017-July/115665.html ) This patch expands it and allows the functions to take undef/poison pointers too. This required the updates in the align attribute since it isn't specified what is the alignment of undef/poison pointers. This patch states that their alignment is 1. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86643	2020-08-27 06:19:28 +09:00
Sjoerd Meijer	2002bb4878	[LangRef] Revise semantics of intrinsic get.active.lane.mask A first version of get.active.lane.mask was committed in rG7fb8a40e5220. One of the main purposes and uses of this intrinsic is to communicate information from the middle-end to the back-end, but its current definition and semantics make this actually very difficult. The intrinsic was defined as: @llvm.get.active.lane.mask(%IV, %BTC) where %BTC is the Backedge-Taken Count (variable names are different in the LangRef spec). This allows to implicitly communicate the loop tripcount, which can be reconstructed by calculating BTC + 1. But it has been very difficult to prove that calculating BTC + 1 is safe and doesn't overflow. We need complicated range and SCEV analysis, and thus the problem is that this intrinsic isn't really doing what it was supposed to solve. Examples of the overflow checks that are required in the (ARM) back-end are D79175 and D86074, which aren't even complete/correct yet. To solve this problem, we are revising the definitions/semantics for get.active.lane.mask to avoid all the complicated overflow analysis. This means that instead of communicating the BTC, we are now using the loop tripcount. Now using LangRef's variable names, its semantics is changed from: icmp ule (%base + i), %n to: icmp ult (%base + i), %n with %n > 0 and corresponding to the loop tripcount. The intrinsic signature remains the same. Differential Revision: https://reviews.llvm.org/D86147	2020-08-25 16:23:51 +01:00
Philip Reames	a96fc4638b	Remove deopt and gc transition arguments from gc.statepoint intrinsic (Forgot to land this a couple of weeks back.) In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic. The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in tree uses and generation have already been updated to use the new operand bundle based forms, the only folks broken by the change will be those with frontends generating statepoints directly and the updates should be easy. Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones. Differential Revision: https://reviews.llvm.org/D80892	2020-08-14 16:07:40 -07:00
Kazu Hirata	a31b3893c7	[docs] Fix typos	2020-08-09 19:31:49 -07:00
Bevin Hansson	5de6c56f7e	[Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. Summary: This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat, which perform signed and unsigned saturating left shift, respectively. These are useful for implementing the Embedded-C fixed point support in Clang, originally discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html and http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html Reviewers: leonardchan, craig.topper, bjope, jdoerfert Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83216	2020-08-07 15:09:24 +02:00
Bevin Hansson	177735aac7	[LangRef] Minor fixes to intrinsic headers and descriptions. NFC.	2020-08-07 15:09:24 +02:00
Simon Pilgrim	6e727551b9	Fix sphinx indentation warning to stop newline in byref section html output.	2020-08-04 16:12:50 +01:00
Jinsong Ji	d28f86723f	Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `bf544fa1c3`. Fixed the typo in PPCInstrInfo.cpp.	2020-07-28 14:00:11 +00:00
Jinsong Ji	bf544fa1c3	Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `adffce7153`. This is breaking test-suite, revert while investigation.	2020-07-27 21:07:00 +00:00
Jinsong Ji	adffce7153	[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html no one is making use of QPX/A2Q/BGQ/BGP CNK anymore. This patch remove the support of QPX/A2Q in llvm, BGQ/BGP in clang, CNK support in openmp/polly. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D83915	2020-07-27 19:24:39 +00:00
Roman Lebedev	fef0cf0810	[LangRef] Add integer min/max/abs intrinsics Add LangRef specification for the llvm.abs, llvm.umin, llvm.umax, llvm.smin, and llvm.smax integer intrinsics. Link to RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-June/142257.html Proposed alive2 implementation: https://github.com/AliveToolkit/alive2/pull/353 Differential Revision: https://reviews.llvm.org/D81829	2020-07-23 20:56:18 +02:00
Alok Kumar Sharma	2d10258a31	[DebugInfo] Support for DW_AT_associated and DW_AT_allocated. Summary: This support is needed for the Fortran array variables with pointer/allocatable attribute. This support enables debugger to identify the status of variable whether that is currently allocated/associated. for pointer array (before allocation/association) without DW_AT_associated (gdb) pt ptr type = integer (140737345375288:140737354129776) (gdb) p ptr value requires 35017956 bytes, which is more than max-value-size with DW_AT_associated (gdb) pt ptr type = integer (:) (gdb) p ptr $1 = <not associated> for allocatable array (before allocation) without DW_AT_allocated (gdb) pt arr type = integer (140737345375288:140737354129776) (gdb) p arr value requires 35017956 bytes, which is more than max-value-size with DW_AT_allocated (gdb) pt arr type = integer, allocatable (:) (gdb) p arr $1 = <not allocated> Testing - unit test cases added - check-llvm - check-debuginfo Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D83544	2020-07-20 19:54:35 +05:30
Matt Arsenault	5e999cbe8d	IR: Define byref parameter attribute This allows tracking the in-memory type of a pointer argument to a function for ABI purposes. This is essentially a stripped down version of byval to remove some of the stack-copy implications in its definition. This includes the base IR changes, and some tests for places where it should be treated similarly to byval. Codegen support will be in a future patch. My original attempt at solving some of these problems was to repurpose byval with a different address space from the stack. However, it is technically permitted for the callee to introduce a write to the argument, although nothing does this in reality. There is also talk of removing and replacing the byval attribute, so a new attribute would need to take its place anyway. This is intended avoid some optimization issues with the current handling of aggregate arguments, as well as fixes inflexibilty in how frontends can specify the kernel ABI. The most honest representation of the amdgpu_kernel convention is to expose all kernel arguments as loads from constant memory. Today, these are raw, SSA Argument values and codegen is responsible for turning these into loads. Background: There currently isn't a satisfactory way to represent how arguments for the amdgpu_kernel calling convention are passed. In reality, arguments are passed in a single, flat, constant memory buffer implicitly passed to the function. It is also illegal to call this function in the IR, and this is only ever invoked by a driver of some kind. It does not make sense to have a stack passed parameter in this context as is implied by byval. It is never valid to write to the kernel arguments, as this would corrupt the inputs seen by other dispatches of the kernel. These argumets are also not in the same address space as the stack, so a copy is needed to an alloca. From a source C-like language, the kernel parameters are invisible. Semantically, a copy is always required from the constant argument memory to a mutable variable. The current clang calling convention lowering emits raw values, including aggregates into the function argument list, since using byval would not make sense. This has some unfortunate consequences for the optimizer. In the aggregate case, we end up with an aggregate store to alloca, which both SROA and instcombine turn into a store of each aggregate field. The optimizer never pieces this back together to see that this is really just a copy from constant memory, so we end up stuck with expensive stack usage. This also means the backend dictates the alignment of arguments, and arbitrarily picks the LLVM IR ABI type alignment. By allowing an explicit alignment, frontends can make better decisions. For example, there's real no advantage to an aligment higher than 4, so a frontend could choose to compact the argument layout. Similarly, there is a high penalty to using an alignment lower than 4, so a frontend could opt into more padding for small arguments. Another design consideration is when it is appropriate to expose the fact that these arguments are all really passed in adjacent memory. Currently we have a late IR optimization pass in codegen to rewrite the kernel argument values into explicit loads to enable vectorization. In most programs, unrelated argument loads can be merged together. However, exposing this property directly from the frontend has some disadvantages. We still need a way to track the original argument sizes and alignments to report to the driver. I find using some side-channel, metadata mechanism to track this unappealing. If the kernel arguments were exposed as a single buffer to begin with, alias analysis would be unaware that the padding bits betewen arguments are meaningless. Another family of problems is there are still some gaps in replacing all of the available parameter attributes with metadata equivalents once lowered to loads. The immediate plan is to start using this new attribute to handle all aggregate argumets for kernels. Long term, it makes sense to migrate all kernel arguments, including scalars, to be passed indirectly in the same manner. Additional context is in D79744.	2020-07-20 10:23:09 -04:00

1 2 3 4 5 ...

850 Commits