Volatility is not an aliasing property. We used to model volatile as if it had extremely conservative aliasing implications, but that hasn't been true for several years now. So it doesn't make sense for it to be tracked in AliasSet.
It also turns out the code is entirely a noop. Outside of the AST code to update it, there was only one user: load/store promotion in LICM. L/S promotion doesn't need the check since it walks all the users of the address anyway. It already checks each load or store via !isUnordered, which causes us to bail for volatile accesses. (Look at the lines immediately following the two removed asserts.)
There is the possibility of some small compile time impact here, but the only case which will get noticeably slower is a loop with a large number of loads and stores to the same address where only the last one we inspect is volatile. This is sufficiently rare that it's not worth optimizing for.
llvm-svn: 340312
DAGCombiner doesn't pay attention to whether constants are opaque before doing the div by constant optimization. So BypassSlowDivision shouldn't introduce control flow that would make DAGCombiner unable to see an opaque constant. This can occur when a div and a rem of the same constant are used in the same basic block: the constant will be hoisted, but will not leave the block.
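A minimal IR sketch of the problematic shape (the names and the constant are illustrative, not from the PR): both the div and the rem of one large divisor sit in a single block, and constant hoisting may turn that shared divisor into an opaque constant.
```
define i64 @div_and_rem(i64 %x) {
entry:
  ; Both users of the same large divisor live in one basic block.
  %q = udiv i64 %x, 123456789123456789
  %r = urem i64 %x, 123456789123456789
  %s = add i64 %q, %r
  ret i64 %s
}
```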
Longer term we probably need to look into the X86 immediate cost model used by constant hoisting and maybe not mark div/rem immediates for hoisting at all.
This fixes the case from PR38649.
Differential Revision: https://reviews.llvm.org/D51000
llvm-svn: 340303
Remove duplicate tests from InstCombine that were added with
D50582. I left negative tests there to verify that nothing
in InstCombine tries to go overboard. If isKnownNeverNaN is
improved to handle the FP binops or other cases, we should
have coverage under InstSimplify, so we could remove more
duplicate tests from InstCombine at that time.
llvm-svn: 340279
Summary:
Follow up change to rL339703, where we now vectorize loops with non-phi
instructions used outside the loop. Note that the cyclic dependency
identification occurs when identifying reduction/induction vars.
We also need to identify that we do not allow users where the PSCEV information
within and outside the loop is different. This was the fix added in rL307837
for PR33706.
Reviewers: Ayal, mkuper, fhahn
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D50778
llvm-svn: 340278
This is a slight modification of the tests from D50582;
change half of the predicates to 'uno' so we have coverage
for that side too. All of the positive tests can fold to a
constant (true/false), so that should happen in instsimplify.
llvm-svn: 340276
Summary:
Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics
only allowed float types for the data to be loaded or stored, which
sometimes meant the frontend needed to generate a bitcast. In this respect, the
new intrinsics copied the old buffer intrinsics.
This commit extends the new intrinsics to allow int types as well.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D50315
Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85
llvm-svn: 340270
Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.buffer.load
llvm.amdgcn.raw.buffer.load.format
llvm.amdgcn.raw.buffer.load.format.d16
llvm.amdgcn.struct.buffer.load
llvm.amdgcn.struct.buffer.load.format
llvm.amdgcn.struct.buffer.load.format.d16
llvm.amdgcn.raw.buffer.store
llvm.amdgcn.raw.buffer.store.format
llvm.amdgcn.raw.buffer.store.format.d16
llvm.amdgcn.struct.buffer.store
llvm.amdgcn.struct.buffer.store.format
llvm.amdgcn.struct.buffer.store.format.d16
llvm.amdgcn.raw.buffer.atomic.*
llvm.amdgcn.struct.buffer.atomic.*
with the following changes from the llvm.amdgcn.buffer.*
intrinsics:
* there are separate raw and struct versions: raw does not have an
index arg and sets idxen=0 in the instruction, and struct always sets
idxen=1 in the instruction even if the index is 0, to allow for the
fact that gfx9 does bounds checking differently depending on whether
idxen is set;
* there is a combined cachepolicy arg (glc+slc)
* there are now only two offset args: one for the offset that is
included in bounds checking and swizzling, to be split between the
instruction's voffset and immoffset fields, and one for the offset
that is excluded from bounds checking and swizzling, to go into the
instruction's soffset field.
The AMDISD::BUFFER_* SD nodes always have an index operand, all three
offset operands, combined cachepolicy operand, and an extra idxen
operand.
The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50306
Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
llvm-svn: 340269
Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.tbuffer.load
llvm.amdgcn.struct.tbuffer.load
llvm.amdgcn.raw.tbuffer.store
llvm.amdgcn.struct.tbuffer.store
with the following changes from the llvm.amdgcn.tbuffer.* intrinsics:
* there are separate raw and struct versions: raw does not have an index
arg and sets idxen=0 in the instruction, and struct always sets
idxen=1 in the instruction even if the index is 0, to allow for the
fact that gfx9 does bounds checking differently depending on whether
idxen is set;
* there is a combined format arg (dfmt+nfmt)
* there is a combined cachepolicy arg (glc+slc)
* there are now only two offset args: one for the offset that is
included in bounds checking and swizzling, to be split between the
instruction's voffset and immoffset fields, and one for the offset
that is excluded from bounds checking and swizzling, to go into the
instruction's soffset field.
The AMDISD::TBUFFER_* SD nodes always have an index operand, all three
offset operands, combined format operand, combined cachepolicy operand,
and an extra idxen operand.
The tbuffer pseudo- and real instructions now also have a combined
format operand.
The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store
intrinsics continue to work.
V2: Separate raw and struct intrinsics.
V3: Moved extract_glc and extract_slc defs to a more sensible place.
V4: Rebased on D49995.
V5: Only two separate offset args instead of three.
V6: Pseudo- and real instructions have joint format operand.
V7: Restored optionality of dfmt and nfmt in assembler.
V8: Addressed minor review comments.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D49026
Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4
llvm-svn: 340268
Summary:
Previously a BUNDLE instruction inherited the DebugLoc from the
first instruction in the bundle, even if that DebugLoc had no
DILocation. With this commit this is changed into selecting the
first DebugLoc that has a DILocation, by searching among the
bundled instructions.
The idea is to reduce the number of bundles that lack debug locations.
Reviewers: #debug-info, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: JDevlieghere, mattd, llvm-commits
Differential Revision: https://reviews.llvm.org/D50639
llvm-svn: 340267
During combining, ReduceLoadWidth is used to combine AND nodes that
mask loads into narrow loads. This patch allows the mask to be a
shifted constant. This results in a narrow load which is then left
shifted to compensate for the new offset.
Differential Revision: https://reviews.llvm.org/D50432
llvm-svn: 340261
This reduces most of the sdiv stages (the MULHS, shifts etc.) to just zero/identity values and uses the numerator scale factor to multiply by +1/-1.
llvm-svn: 340260
This patch teaches LICM to hoist guards from the loop if they are guaranteed to execute and
if there are no side effects that could prevent that.
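A minimal sketch (hypothetical IR, not from the patch) of the shape this enables: the guard's condition is loop-invariant and the guard is guaranteed to execute, so it can move to the preheader.
```
declare void @llvm.experimental.guard(i1, ...)

define void @f(i1 %inv.cond, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  ; Guaranteed to execute on every iteration; the condition is
  ; invariant, so the guard is now hoistable to the preheader.
  call void (i1, ...) @llvm.experimental.guard(i1 %inv.cond) [ "deopt"() ]
  %i.next = add nuw i64 %i, 1
  %cont = icmp ult i64 %i.next, %n
  br i1 %cont, label %loop, label %exit
exit:
  ret void
}
```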
Differential Revision: https://reviews.llvm.org/D50501
Reviewed By: reames
llvm-svn: 340256
Summary:
RegisterCoalescer::reMaterializeTrivialDef used to assert that
the input register was live in. But as shown by the new
coalesce-dead-lanes.mir test case, that seems to be a valid
scenario. We now return false instead of asserting, simply
declining to remat the dead def.
Normally a COPY of an undef value is eliminated by
eliminateUndefCopy(), although we only do that when the
destination isn't a physical register. So the situation
above should be limited to the case where we copy an undef
value to a physical register.
Reviewers: kparzysz, wmi, tpr
Reviewed By: kparzysz
Subscribers: MatzeB, qcolombet, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D50842
llvm-svn: 340255
LangRef for BitCast requires that
"The bit sizes of value and the destination type, ty2, must be identical".
Currently the verifier allows a BitCast from a pointer to a vector of pointers
even though the sizes are different.
This change fixes that.
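An illustrative (hypothetical) cast the verifier now rejects; on a target with 64-bit pointers the operand and the destination type differ in bit size:
```
; Rejected after this change: 64-bit operand, 128-bit destination type.
;   %pv = bitcast i8* %p to <2 x i8*>
```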
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: llvm-commits, wdng
Differential Revision: https://reviews.llvm.org/D50886
llvm-svn: 340249
These intrinsics are modelled as writing for control flow purposes, but they don't actually write to any location. Marking these - as we did for guards - allows LICM to hoist loads out of loops containing invariant.starts.
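A hedged sketch of what this unblocks; the names and sizes are illustrative:
```
declare {}* @llvm.invariant.start.p0i8(i64, i8* nocapture)
declare void @use(i32)

define void @f(i8* %p, i32* %q, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %tok = call {}* @llvm.invariant.start.p0i8(i64 1, i8* %p)
  ; With invariant.start known not to write, this invariant load can
  ; now be hoisted by LICM.
  %v = load i32, i32* %q
  call void @use(i32 %v)
  %i.next = add nuw i64 %i, 1
  %cont = icmp ult i64 %i.next, %n
  br i1 %cont, label %loop, label %exit
exit:
  ret void
}
```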
Differential Revision: https://reviews.llvm.org/D50861
llvm-svn: 340245
This is encoded as __E and should print something like
"dynamic initializer for 'Foo'(void)".
This also adds support for dynamic atexit destructors, which are
basically identical but encoded as __F with a slightly different
description.
llvm-svn: 340239
Summary:
We decided to revert this from i64 to i32 in the Nov 28 CG meeting. Fixes
PR38632.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D51010
llvm-svn: 340234
If we can use comdats, then we can make it so that the global metadata
is thrown away if the prevailing definition of the global was
uninstrumented. I have only tested this on COFF targets, but in theory,
there is no reason that we cannot also do this for ELF.
This will allow us to re-enable string merging with ASan on Windows,
reducing the binary size cost of ASan on Windows.
I tested this change with ASan+PGO, and I fixed an issue with the
__llvm_profile_raw_version symbol. With the old version of my patch, we
would attempt to instrument that symbol on ELF because it had a comdat
with external linkage. If we had been using the linker GC-friendly
metadata scheme, everything would have worked, but clang does not enable
it by default.
llvm-svn: 340232
Due to some splat handling code in getVectorShuffle, it's possible for NewV1/NewV2 to have their masks modified from what was requested. This can lead to cycles being created in the DAG.
This patch examines the returned mask and makes sure it's different. Long term we may need to look closer at that splat code in getVectorShuffle, or add more splat awareness to getVectorShuffle.
Fixes PR38639
Differential Revision: https://reviews.llvm.org/D50981
llvm-svn: 340214
We can safely avoid interfering with the subus combine if both inputs are freely truncatable. Either both extends, or an extend and a constant vector.
Differential Revision: https://reviews.llvm.org/D50878
llvm-svn: 340212
1. Change the software pipeliner to use unknown size instead of dropping
memory operands. It used to do it before, but MachineInstr::mayAlias
did not handle it correctly.
2. Recognize UnknownSize in MachineInstr::mayAlias.
3. Print and parse UnknownSize in MIR.
Differential Revision: https://reviews.llvm.org/D50339
llvm-svn: 340208
This reverts commit 7debc334e6421bb5251ef8f18e97166dfc7dd787.
I missed updating legalizer-info-validation.mir as I had assertions
turned off in my build and that specific test requires asserts. Fixed it
now.
llvm-svn: 340197
The 32-bit constant address space is declared as 6, so the
maximum number of address spaces is 6, not 5.
Fixes "LLVM ERROR: Pointer address space out of range".
v3: use static_assert()
v2: add a very simple test for 32-bit addr space
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 340171
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.
Handle the case where the sign bit extends to only part of the small elements.
llvm-svn: 340169
This patch adds system registers for controlling aspects of SVE:
- ZCR_EL1 (r/w) visible at EL1 and EL0.
- ZCR_EL2 (r/w) visible at EL2 and Non-secure EL1 and EL0.
- ZCR_EL3 (r/w) visible at all exception levels.
and a system register identifying SVE:
- ID_AA64ZFR0_EL1 (r) SVE Feature identifier.
Reviewers: SjoerdMeijer, samparker, pbarrio, fhahn, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D50885
llvm-svn: 340158
If the arch is P8, we will select XFLOAD to load the floating point value, and then expand it to vsx and non-vsx X-form instructions post RA. This patch tries to convert the X-form to a D-form if it meets the requirement that one operand of the X-form instruction is the special Zero register and the other operand is fed by an add instruction, i.e.
y = add imm, reg
LFDX. 0, y
-->
LFD imm(reg)
Reviewers: Nemanjai
Differential Revision: https://reviews.llvm.org/D49007
llvm-svn: 340149
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.
The next step would be to support cases where the large elements aren't all sign bits, and determine the small element equivalent based on the demanded elements.
llvm-svn: 340143
We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturating add.
This rewrites the canonicalization to just be based on the condition code.
llvm-svn: 340134
The code already supports 128-bit and 256-bit vectors and even knows to split 256-bit vectors for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements.
While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular.
llvm-svn: 340128
A while back I submitted a patch to resolve backreferences
lazily, thinking that it was not always possible to know
in advance what type you were looking at until you had completed
a full pass over the input, and therefore it would be impossible
to resolve backreferences eagerly.
This was mistaken though, and turned out to be an unrelated
problem. In fact, the reverse is true. You *must* resolve
backreferences eagerly. This is because certain types of nested
mangled symbols do not share a backreference context with their
parent symbol, and as such, if you try to resolve them lazily
their backreference context will have been lost by the time you
finish demangling the entire input. On the other hand, resolving
them eagerly appears to always work, and enables us to port
many more tests over.
llvm-svn: 340126
Summary:
I believe this restores the behavior we had before r339147.
Fixes PR38622.
Reviewers: RKSimon, chandlerc, spatel
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50936
llvm-svn: 340120
Summary:
Port GNU Objcopy -G/--keep-global-symbol(s).
This is slightly different than the already-implemented --globalize-symbol, which marks a symbol as global when copying. When --keep-global-symbol (alias -G) is used, *only* those symbols marked will stay global, and all other globals are demoted to local. (Also note that it doesn't *promote* a symbol to global.) Additionally, there is a pluralized version of the flag, --keep-global-symbols, which effectively applies --keep-global-symbol for every non-comment line in a file.
Reviewers: jakehehrlich, jhenderson, alexshap
Reviewed By: jhenderson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50589
llvm-svn: 340105
We were only printing the vtordisp thunk before as the previous
patch was more aimed at getting special operators working, one
of which was a thunk. This patch gets all thunk types to print
properly, and adds a test for each one.
llvm-svn: 340088
Summary:
The -I (--input-target) and -B (--binary-architecture) flags exist but are currently silently ignored. This adds support for -I binary for architectures i386, x86-64 (and alias i386:x86-64), arm, aarch64, sparc, and ppc (powerpc:common64). This is largely based on D41687.
This is done by implementing an additional subclass of Reader, BinaryReader, which works by interpreting the input file as the contents of a .data section, setting up a synthetic header, and adding additional sections/symbols (e.g. _binary__tmp_data_txt_start).
Reviewers: jakehehrlich, alexshap, jhenderson, javed.absar
Reviewed By: jhenderson
Subscribers: jyknight, nemanjai, kbarton, fedor.sergeev, jrtc27, kristof.beyls, paulsemel, llvm-commits
Differential Revision: https://reviews.llvm.org/D50343
llvm-svn: 340070
Extending the concept introduced in D49562, this patch lowers constant vXi8 ISD::SRL/ISD::SRA by zero/sign extending to vXi16 and using PMULLW and then truncating the high 8 bits of the result.
Differential Revision: https://reviews.llvm.org/D50781
llvm-svn: 340062
Summary:
Adds the option for the printing of summary information about functions
considered but rejected for importing during the thin link.
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D50881
llvm-svn: 340047
Previously, some of the code for actually parsing mangled
operator names was more like formatting code in nature,
and was interspersed with the demangling code which builds
the AST. This means that by the time we got to the printing
code, we had lost all information about what type of operator
we had, and all we were left with was a string that we just
had to print. However, not all operators are actually even
operators; the mangling is basically just a catch-all for
"special names", and for some of the other types it helps
to know, when we're actually doing the printing, what it is.
This patch changes the way things work by introducing an
OperatorInfo structure and corresponding enumeration. When
we demangle we store the enumeration value and demangled
components separately. This gives more flexibility during
printing.
In doing so, some demanglings of special names which we didn't
previously support come out of this for free, so we now demangle
those.
A few are more complex and are better left for a followup patch
though.
An exhaustive test of every possible operator code is included,
with the ones that don't yet work commented out.
llvm-svn: 340046
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
llvm-svn: 340039
This patch addresses:
- Implementation within PPCISelLowering.cpp to check if we should use direct
load into vector instructions (such as lxsd/lfd ) when the scalar_to_vector
function is used; which will allow us to catch as many cases of the
scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into
lxsd.
- Test cases to exhibit the behaviour of emitting lxsd/lfd.
Patch by amyk
Differential revision: https://reviews.llvm.org/D49698
llvm-svn: 340037
* -march=x86-64 -> -mtriple=x86_64-unknown-linux to avoid _ prefixes to
symbols
* add -start-before to avoid running the whole codegen on the IR. I
assumed it is meant to be running after X86SpeculativeLoadHardening.
llvm-svn: 340034
test/CodeGen/X86/shadow-stack.ll has the following machine verifier
errors:
```
*** Bad machine code: Using a killed virtual register ***
- function: bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: %3:gr64 = MOV64rm killed %2:gr64, 1, $noreg, 8, $noreg
- operand 1: killed %2:gr64
*** Bad machine code: Using a killed virtual register ***
- function: bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: $rsp = MOV64rm killed %2:gr64, 1, $noreg, 16, $noreg
- operand 1: killed %2:gr64
*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function: bar
- basic block: %bb.2 entry (0x7fdc818574f8)
Virtual register %2 is used after the block.
```
The fix here is to only copy the machine operand's register without the
kill flags for all the instructions except the very last one of the
sequence.
I had to insert dummy PHIs in the test case to force the NoPHI function
property to be set to false. More on this here: https://llvm.org/PR38439
Differential Revision: https://reviews.llvm.org/D50260
llvm-svn: 340033
NewGVN uses InstructionSimplify for simplifications of leaders of
congruence classes. It is not guaranteed that the metadata or other
flags/keywords (like nsw or exact) of the leader is available for all members
in a congruence class, so we cannot use it for simplification.
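A small illustration (hypothetical example, not from the patch) of why the leader's flags cannot be trusted for the whole class:
```
define i32 @congruent(i32 %x, i32 %y) {
  ; %a and %b compute the same value and land in one congruence class,
  ; but only %a carries nsw; simplifying the whole class using %a's
  ; nsw flag would be unsound for %b.
  %a = add nsw i32 %x, %y
  %b = add i32 %x, %y
  %s = xor i32 %a, %b
  ret i32 %s
}
```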
This patch adds a InstrInfoQuery struct with a boolean field
UseInstrInfo (which defaults to true to keep the current behavior as
default) and a set of helper methods to get metadata/keywords for a
given instruction, if UseInstrInfo is true. The whole thing might need a
better name, to avoid confusion with TargetInstrInfo but I am not sure
what a better name would be.
The current patch threads through InstrInfoQuery to the required
places, which is messier than it would be if
InstructionSimplify and ValueTracking shared the same Query struct.
The reason I added it as a separate struct is that it can be shared
between InstructionSimplify and ValueTracking's query objects. Also,
some places do not need a full query object, just the InstrInfoQuery.
It also updates some interfaces that do not take a Query object, but a
set of optional parameters to take an additional boolean UseInstrInfo.
See https://bugs.llvm.org/show_bug.cgi?id=37540.
Reviewers: dberlin, davide, efriedma, sebpop, hiraditya
Reviewed By: hiraditya
Differential Revision: https://reviews.llvm.org/D47143
llvm-svn: 340031
This patch performs a widening transformation of bitwise atomicrmw
{or,xor,and} and applies it prior to tryExpandAtomicRMW. This operates
similarly to convertCmpXchgToIntegerType. For these operations, the i8/i16
atomicrmw can be implemented in terms of the 32-bit atomicrmw by appropriately
manipulating the operands. There is no functional change for the handling of
partword or/xor, but the transformation for partword 'and' is new.
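A rough sketch of the widening, not the pass's literal output; %p.aligned and %shift stand for hypothetical values derived from the original pointer:
```
define i8 @or_i8(i8* %p, i8 %v) {
  %old = atomicrmw or i8* %p, i8 %v monotonic
  ret i8 %old
}
; ...is rewritten to operate on the containing aligned i32 word, roughly:
;   %w   = zext i8 %v to i32
;   %sv  = shl i32 %w, %shift
;   %o32 = atomicrmw or i32* %p.aligned, i32 %sv monotonic
;   %sh  = lshr i32 %o32, %shift
;   %old = trunc i32 %sh to i8
```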
The advantage of performing this transformation early is that the same
code-path can be used regardless of the approach used to expand the atomicrmw
(AtomicExpansionKind). i.e. the same logic is used for
AtomicExpansionKind::CmpXchg and can also be used by the intrinsic-based
expansion in D47882.
Differential Revision: https://reviews.llvm.org/D48129
llvm-svn: 340027
Summary:
Currently, in LICM, we use the alias set tracker to identify if the
instruction (we're interested in hoisting) aliases with instruction that
modifies that memory location.
This patch adds an LICM alias analysis diagnostic tool that checks the
mod ref info of the instruction we are interested in hoisting/sinking,
with every instruction in the loop. Because of O(N^2) complexity this
is now only a diagnostic tool to show the limitation we have with the
alias set tracker and is OFF by default.
Test cases show the difference with the diagnostic analysis tool, where
we're able to hoist out loads and readonly + argmemonly calls from the
loop, where the alias set tracker analysis is not able to hoist these
instructions out.
Reviewers: reames, mkazantsev, fedor.sergeev, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50854
llvm-svn: 340026
This is another step towards being able to canonicalize to the funnel shift
intrinsics in IR (see D49242 for the initial patch).
We should not have any loss of simplification power in IR between these and
the equivalent IR constructs.
Differential Revision: https://reviews.llvm.org/D50848
llvm-svn: 340022
- Generate pointer authentication instructions
- The functions instrumented depend on function attributes:
  all (all functions instrumented)
  non-leaf (only those that spill LR)
  none
- Function prologues sign the LR before spilling to the stack and function
  epilogues authenticate the LR once restored
- If the target is v8.3a or greater, it can use the combined authenticate and
  return instruction
Differential revision: https://reviews.llvm.org/D49793
llvm-svn: 340018
Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli
(extend sign word and shift left immediate) instruction.
Patch by RolandF.
Differential revision: https://reviews.llvm.org/D49879
llvm-svn: 340016
Add +fp16fml feature for new FP16 instructions, which are a
mandatory part of FP16 from v8.4-A and an optional part of FP16
from v8.2-A. It doesn't seem to be possible to model this in
LLVM, but the relationship between the options is handled by
the related clang patch.
In keeping with what I think is the usual practice, the fp16fml
extension is accepted regardless of base architecture version.
Builds on/replaces Sjoerd Meijer's patch to add these instructions at
https://reviews.llvm.org/D49839.
Differential Revision: https://reviews.llvm.org/D50228
llvm-svn: 340013
Add support for cases where only some c1+c2 results exceed the max bitshift, clamping accordingly.
Differential Revision: https://reviews.llvm.org/D35722
llvm-svn: 340010
Summary:
Looking at the callee argument list, as is done now, might not work if
the function has been cast into one that is expected to return
a struct. This change also simplifies the code.
The isFP128ABICall() function can be removed as it is no longer needed.
The test in fp128.ll has been updated to verify this.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48117
llvm-svn: 340008
Summary: When @llvm.returnaddress is called with a value higher than 0
it needs to read from the call stack to get the return address. This
means that the register windows needs to be flushed to the stack to
guarantee that the data read is valid. For values higher than 1 this
is done indirectly by the call to getFRAMEADDR(), but not for the value 1.
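For illustration, a hedged example of the depth-1 case this fixes (names are mine):
```
declare i8* @llvm.returnaddress(i32)

define i8* @caller_return_address() {
  ; Depth 1 reads the caller's frame from the stack, so the register
  ; windows must be flushed first for the read to be valid.
  %ra = call i8* @llvm.returnaddress(i32 1)
  ret i8* %ra
}
```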
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48636
llvm-svn: 340003
The description of `isGuaranteedToExecute` does not correspond to its implementation.
According to the description, it should return `true` if an instruction is executed under the
assumption that its loop is *entered*. However there is a sophisticated algorithm inside
that tries to prove that the instruction is executed if the loop is *exited*, which is not the
same thing for infinite loops. There is an attempt to protect from dealing with infinite loops
by prohibiting loops without exit blocks; however an infinite loop can have exit blocks.
As a result, MustExecute can falsely consider some blocks that are never entered as
mustexec, and LICM can hoist dangerous instructions out of them based on this fact.
This may introduce UB to programs which did not contain it initially.
This patch removes the problematic algorithm and replaces it with one that tries to
prove what is required in the description.
Differential Revision: https://reviews.llvm.org/D50558
Reviewed By: reames
llvm-svn: 339984
https://reviews.llvm.org/D50401
Add opcodes for the llvm.trunc and llvm.round intrinsics and update the
IRTranslator for the same.
Reviewed by: dsanders.
llvm-svn: 339977
Summary:
This adds support for exception handling to CFGStackify pass. This only
adds TRY / END_TRY markers and DOES NOT yet fix unwind mismatches that
can be created by the linearization of the CFG into the structural wasm
format. The mismatch fix will be added by following patches.
In detail, this patch
- Added support for TRY / END_TRY markers to support EH
- Changed many static functions into class member functions as they take
too many arguments now
- Added several more bookkeeping data structures
- Refactored routines that decide where to insert markers, because
without refactoring this got too complicated as we added support for new
kinds of markers (TRY/END_TRY).
- Rewrote rethrow instructions' BB arguments to relative depths in EH
pad stack.
Reviewers: dschuff, sunfish
Subscribers: sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D48273
llvm-svn: 339967
This adds dumping and MIR printing support for pre- and post-instruction
symbols, as well as MIR parsing support for `MCSymbol` `MachineOperand`s.
The only real way to test pre- and post-instruction symbol support is to
use them in operands, so I ended up implementing that within the patch
as well. I can split out the operand support if folks really want but it
doesn't really seem worth it.
The functional implementation of pre- and post-instruction symbols is
now *completely trivial*. Two tiny bits of code in the (misnamed)
AsmPrinter. It should be completely target independent as well. We emit
these exactly the same way as we emit basic block labels. Most of the
code here is to give full dumping, MIR printing, and MIR parsing support
so that we can write useful tests.
The MIR parsing of MC symbol operands still isn't 100%, as it forces the
symbols to be non-temporary and non-local symbols with names. However,
those names often can encode most (if not all) of the special semantics
desired, and unnamed symbols seem especially annoying to serialize and
de-serialize. While this isn't perfect or full support, it seems plenty
to write tests that exercise usage of these kinds of operands.
The MIR support for pre-and post-instruction symbols was quite
straightforward. I chose to print them out in an as-if-operand syntax
similar to debug locations as this seemed the cleanest way and let me
use nice introducer tokens rather than inventing more magic punctuation
like we use for memoperands.
However, supporting MIR-based parsing of these symbols caused me to
change the design of the symbol support to allow setting arbitrary
symbols. Without this, I don't see any reasonable way to test things
with MIR.
Differential Revision: https://reviews.llvm.org/D50833
llvm-svn: 339962
This is a follow-up suggested with rL339604.
For tan(), we don't have a corresponding LLVM
intrinsic -- unlike sin/cos -- so this is the
only way/place that we can do this fold currently.
llvm-svn: 339958
Thread sanitizer instrumentation fails to skip all loads and stores to
profile counters. This can happen if profile counter updates are merged:
```
  %.sink = phi i64* ...
  %pgocount5 = load i64, i64* %.sink
  %27 = add i64 %pgocount5, 1
  %28 = bitcast i64* %.sink to i8*
  call void @__tsan_write8(i8* %28)
  store i64 %27, i64* %.sink
```
To suppress TSan diagnostics about racy counter updates, make the
counter updates atomic when TSan is enabled. If there's general interest
in this mode it can be surfaced as a clang/swift driver option.
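Continuing the snippet above, the merged update becomes, in essence, a single atomic increment (the exact ordering is per the patch, shown here as monotonic):
```
  atomicrmw add i64* %.sink, i64 1 monotonic
```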
Testing: check-{llvm,clang,profile}
rdar://40477803
Differential Revision: https://reviews.llvm.org/D50867
llvm-svn: 339955
When nodes are reassociated the vector-reduction flag gets lost.
The test case here shows what would happen if you had a sum of absolute differences loop that started with a non-zero but constant sum and that loop was unrolled. The vectorizer will generate a constant vector for the initial value. And DAGCombiner reassociate tries to move it down the addition tree, erasing the vector-reduction flag. Interestingly this moves constants the opposite direction of the reassociate IR pass.
I've chosen to just punt on the reassociate, but I suppose we could maybe preserve the flag if both nodes have it set.
Differential Revision: https://reviews.llvm.org/D50827
llvm-svn: 339946
Normally the peephole pass converts EXTRACT_SUBREG to COPY instructions. But we're after peephole so we can't rely on it to clean these up.
To fix this, the eflags pass now emits a COPY with a subreg input.
I also noticed that in 32-bit mode we need to constrain the input to the copy to ensure the subreg is valid. Otherwise we'll fail verify-machineinstrs.
Differential Revision: https://reviews.llvm.org/D50656
llvm-svn: 339945
Handle the case when the symbol is private. Private symbols are not in
the COFF object file symbol table, so they aren't inserted into
SymbolMap. We can't look up the section of the symbol that way. Instead,
get the MCSection from the MCSymbol and map that to the object file
section.
Print a better error message when the symbol has no section, like when
the symbol is undefined.
Fixes PR38607
llvm-svn: 339942
In cases where the debugger load time is a worthwhile tradeoff (or less
costly - such as loading from a DWP instead of a variety of DWOs
(possibly over a high-latency/distributed filesystem)) against object
file size, it can be reasonable to disable pubnames and corresponding
gdb-index creation in the linker.
A backend-flag version of this was implemented for NVPTX in
D44385/r327994 - which was fine for NVPTX which wouldn't mix-and-match
CUs. Now that it's going to be a user-facing option (likely powered by
"-gno-pubnames", the same as GCC) it should be encoded in the
DICompileUnit so it can vary per-CU.
After this, likely the NVPTX support should be migrated to the metadata
& the previous flag implementation should be removed.
Reviewers: aprantl
Differential Revision: https://reviews.llvm.org/D50213
llvm-svn: 339939
The fix is fairly simple, but it says something unpleasant about the usage and testing of invariant.start/end scopes that this went undetected. To put this in perspective, *any* invariant.end in a loop flowing through LICM crashed. I haven't bothered to figure out just how far back this goes, but it's not caused by any of the recent changes. We're probably talking months if not years.
llvm-svn: 339936
Each use of a value should be jointly dominated by the union of defs and
undefs. It can happen that it will only be jointly dominated by undefs,
and that is still legal. Make sure that the verifier is aware of that.
llvm-svn: 339924
There is no way in the universe that doing a full-width division in
software will be faster than doing overflowing multiplication in
software in the first place, especially given that this same full-width
multiplication needs to be done anyway.
This patch replaces the previous implementation with a direct lowering
into an overflowing multiplication algorithm based on half-width
operations.
Correctness of the algorithm was verified by exhaustively checking the
output of this algorithm for overflowing multiplication of 16 bit
integers against an obviously correct widening multiplication. Barring
any oversights introduced by porting the algorithm to DAG, confidence in
correctness of this algorithm is extremely high.
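The affected entry point, for reference; a 128-bit overflowing multiply that previously relied on a full-width division to detect overflow (function name is illustrative):
```
declare { i128, i1 } @llvm.umul.with.overflow.i128(i128, i128)

define { i128, i1 } @mulo(i128 %a, i128 %b) {
  %r = call { i128, i1 } @llvm.umul.with.overflow.i128(i128 %a, i128 %b)
  ret { i128, i1 } %r
}
```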
The following table shows the change in both t = runtime and s = space. The
change is expressed as a multiplier of original, so anything under 1 is
“better” and anything above 1 is worse.
+-------+-----------+-----------+-------------+-------------+
| Arch | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
+-------+-----------+-----------+-------------+-------------+
| X64 | - | - | ~0.5 | ~0.64 |
| i686 | ~0.5 | ~0.6666 | ~0.05 | ~0.9 |
| armv7 | - | ~0.75 | - | ~1.4 |
+-------+-----------+-----------+-------------+-------------+
Performance numbers have been collected by running overflowing
multiplication in a loop under `perf` on two x86_64 (one Intel Haswell,
other AMD Ryzen) based machines. Size numbers have been collected by
looking at the size of function containing an overflowing multiply in
a loop.
All in all, it can be seen that both performance and size has improved
except in the case of armv7 where code size has regressed for 128-bit
multiply. u128*u128 overflowing multiply on 32-bit platforms seem to
benefit from this change a lot, taking only 5% of the time compared to
original algorithm to calculate the same thing.
The final benefit of this change is that LLVM is now capable of lowering
the overflowing unsigned multiply for integers of any bit-width as long
as the target is capable of lowering regular multiplication for the same
bit-width. Previously, 128-bit overflowing multiply was the widest
possible.
Patch by Simonas Kazlauskas!
Differential Revision: https://reviews.llvm.org/D50310
llvm-svn: 339922
This patch refactors the existing TargetLowering::BuildSDIV base implementation to support non-uniform constant vector denominators.
This is the last patch necessary to close PR36545
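An example of the newly supported shape (divisors are illustrative); the denominator no longer needs to be a splat:
```
define <4 x i32> @nonuniform_sdiv(<4 x i32> %x) {
  %d = sdiv <4 x i32> %x, <i32 3, i32 5, i32 7, i32 9>
  ret <4 x i32> %d
}
```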
Differential Revision: https://reviews.llvm.org/D50765
llvm-svn: 339908
Summary:
This prefix was added in r333421, and it changed our dumper output to
say things like "CVRegEAX" instead of just "EAX". That's a functional
change that I'd rather avoid.
I tested GCC, Clang, and MSVC, and all of them support #pragma
push_macro. They don't issue warnings when the macro is not defined
either.
I don't have a Mac so I can't test the real termios.h header, but I
looked at the termios.h sources online and looked for other conflicts.
I saw only the CR* macros, so those are the ones we work around.
Reviewers: zturner, JDevlieghere
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50851
llvm-svn: 339907
This will allow the library to just use __builtin_expf directly
without expanding this itself. Note f64 still won't work because
there is no exp instruction for it.
llvm-svn: 339902
Allow the comparison of x86 registers in the evaluation of assembler
directives. This generalizes and simplifies the extension from r334022
to catch another case found in the Linux kernel.
Reviewers: rnk, void
Reviewed By: rnk
Subscribers: hiraditya, nickdesaulniers, llvm-commits
Differential Revision: https://reviews.llvm.org/D50795
llvm-svn: 339895
When demangling string literals, Microsoft's undname
simply prints 'string'. This patch implements string
literal demangling while doing a bit better than this
by decoding as much of the string as possible and
trying to faithfully reproduce the original string
literal definition.
This is a bit tricky because the different character
types char, char16_t, and char32_t are not uniquely
identified by the mangling, so we have to use a
heuristic to try to guess the character type. But
it works pretty well, and many tests are added to
illustrate the behavior.
Differential Revision: https://reviews.llvm.org/D50806
llvm-svn: 339892
When we have an MD5 mangled name, we shouldn't choke and say
that it's an invalid name. Even though it's impossible to demangle,
we should just output the original name.
llvm-svn: 339891
Expand the number of cases when `pow(x, 0.5)` is simplified into `sqrt(x)`
by considering the math semantics with more granularity.
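A hedged illustration; the exact fast-math flags required follow the patch's finer-grained rules, shown here simply as `fast`:
```
declare double @llvm.pow.f64(double, double)

define double @pow_half(double %x) {
  ; With suitable math semantics this becomes a call to @llvm.sqrt.f64.
  %r = call fast double @llvm.pow.f64(double %x, double 5.000000e-01)
  ret double %r
}
```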
Differential revision: https://reviews.llvm.org/D50036
llvm-svn: 339887
While searching through the use-def tree, ignore GetElementPtrInst
instructions because they don't need promoting and neither do their
indices. Otherwise, the wide indices prevent the transformation from
happening.
Differential Revision: https://reviews.llvm.org/D50762
llvm-svn: 339871
When emitting the difference between two symbols, the standard behavior is
that the difference will be resolved to an absolute value if both of the
symbols are offsets from the same data fragment. This is undesirable on
architectures such as RISC-V where relaxation in the linker may cause the
computed difference to become invalid. This caused an issue when compiling to
object code, where the size of a function in the debug information was already
calculated even though it could change as a consequence of relaxation in the
subsequent linking stage.
This patch inhibits the resolution of symbol differences to absolute values
where the target's AsmBackend has declared that it does not want these to be
folded.
Differential Revision: https://reviews.llvm.org/D45773
Patch by Edward Jones.
llvm-svn: 339864
Originally committed in r339755 which was reverted in r339806 due to
an asan issue. The issue was caused by my assumption that operands to
a CallInst mapped to the FunctionType Params. CallInsts are now
handled by iterating over their ArgOperands instead of Operands.
Original Message:
Treat signed icmps as 'sinks', allowing them to be in the use-def
tree, enabling more promotions to be performed. As a sink, any
promoted incoming values need to be truncated before being used by
the signed icmp.
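A minimal sketch (hypothetical values and widths) of a signed icmp acting as a sink with truncation:
```
define i1 @sink_example(i8 zeroext %a, i8 zeroext %b) {
  %wa = zext i8 %a to i32
  ; ...promoted arithmetic happens at the wider width...
  %w = add i32 %wa, 1
  ; The promoted value is truncated back before the signed 'sink'.
  %t = trunc i32 %w to i8
  %c = icmp slt i8 %t, %b
  ret i1 %c
}
```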
Differential Revision: https://reviews.llvm.org/D50067
llvm-svn: 339858
This adds a shorter name ('x86-slh') for the internal flags and pass name.
Without this, you can't use the -stop-after or -stop-before
infrastructure. I seem to have just missed this when originally adding
the pass.
The shorter name solves two problems. First, the flag names were ...
really long and hard to type/manage. Second, the pass name can't be the
exact same as the flag name used to enable this, and there are already
some users of that flag name so I'm avoiding changing it unnecessarily.
llvm-svn: 339836
Summary:
The profile count of a block is computed by multiplying its block frequency
by the entry count and dividing the result by the entry block frequency. Do
rounded division in the last step and update test cases appropriately.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50822
llvm-svn: 339835
This patch fixes PR38125.
Instruction extension types are recorded in PromotedInsts, and can be used later in the function canGetThrough. If an instruction has two users with different extension types, it will be inserted into PromotedInsts two times in the function promoteOperandForOther. The second one overwrites the first one, and the final extension type is wrong, which later causes problems in canGetThrough.
This patch changes the simple bool extension type to a 2-bit enum type, adding a BothExtension type in addition to zero/sign extension. When a user sees BothExtension for an instruction, it actually knows nothing about how that instruction is extended.
Differential Revision: https://reviews.llvm.org/D49512
llvm-svn: 339822
Handle fmul, fsub and preserve flags.
Also really test minnum/maxnum reductions.
The existing tests were only checking from
minnum/maxnum matched from a fast math compare
and select which is not the same.
llvm-svn: 339820
To lower this we now create a new V1 containing the low half of both sources and a new V2 containing the upper half of both sources. Then we create a repeated lane shuffle of those new sources to create the final result.
This fixes PR35833
Differential Revison: https://reviews.llvm.org/D41794
llvm-svn: 339818
If a relocation group doesn't have the RELOCATION_GROUP_HAS_ADDEND_FLAG set, then this implies the group's addend equals zero.
In this case the android packed format won't encode an explicit addend delta; instead we need to set Addend, the "previous addend" variable, to zero ourselves.
Patch by Yi-Yo Chiang!
Differential Revision: https://reviews.llvm.org/D50601
llvm-svn: 339799
These correspond to the x86 tests added with rL339790 / rL339791, but I widened
the non-fsin tests to v3f32 to show the problem because AArch supports v2f32 ops.
llvm-svn: 339793
Subregister liveness applies selectively to register classes with certain
properties. Make sure that when it's enabled, it applies to a given virtual
register (in virtual register rewriter).
llvm-svn: 339784
Make ISD::VSELECT legal so long as there are AltiVec instructions;
otherwise its default behavior is to expand.
Use xxsel to match vselect if VSX is enabled, otherwise use vsel.
To avoid writing many patterns in the td file, promote (for vectors it's a
bitcast) all other types to v4i32 and only pattern match vselect of v4i32 into
vsel or xxsel.
Patch by wuzish
Differential revision: https://reviews.llvm.org/D49531
llvm-svn: 339779
This allows to set custom Info field value for SHT_GROUP sections.
It is useful to allow this because it lets us replace at least one binary
object committed in LLD with a yaml2obj based test.
Differential revision: https://reviews.llvm.org/D50776
llvm-svn: 339772
We only try to promote types which are smaller than 16 bits, but we
also need to check that the type is not less than 8 bits.
Differential Revision: https://reviews.llvm.org/D50769
llvm-svn: 339770
When trying to combine a DAG that builds a vector out of sign-extensions of
vector extracts, the code assumes legal input types. Due to that, we have to
disable this combine prior to legalization.
In some cases, the DAG will look slightly different after legalization so
account for that in the matching code.
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087
Differential Revision: https://reviews.llvm.org/D49080
llvm-svn: 339769
This patch fixes a regression introduced at revision 338702.
A processor resource mask was incorrectly implicitly truncated to an unsigned
quantity. Later on, the truncated mask was used to initialize an element of a
vector of processor resource descriptors.
On targets with more than 32 processor resources, some elements of the vector
are left uninitialized. As a consequence, this bug might have eventually caused
a crash due to null dereference in the Scheduler.
This patch fixes PR38575, and adds a test for it.
llvm-svn: 339768
Currently, it is possible to use yaml2obj for producing SHT_GROUP sections
of type GRP_COMDAT. For an LLD test case I need to produce an object with
a broken (different from GRP_COMDAT) type.
The patch teaches the tool to do such things.
Differential revision: https://reviews.llvm.org/D50761
llvm-svn: 339764
This commit adds new sibling-call test cases, so it will be possible to see
how these test cases will be changed after applying D45653.
See D45653 for details.
llvm-svn: 339760
This patch refactors the existing BuildExactSDIV implementation to support non-uniform constant vector denominators.
Differential Revision: https://reviews.llvm.org/D50392
llvm-svn: 339756
Treat signed icmps as 'sinks', allowing them to be in the use-def
tree, enabling more promotions to be performed. As a sink, any
promoted incoming values need to be truncated before being used by
the signed icmp.
Differential Revision: https://reviews.llvm.org/D50067
llvm-svn: 339755
Add pointers to the list of allowed types, but don't try to promote
them. Also fixed a bug with the promotion of undef values, so a new
value is now created instead of mutating in place. We also now only
promote if there's an instruction in the use-def chains other than
the icmp, sinks and sources.
Differential Revision: https://reviews.llvm.org/D50054
llvm-svn: 339754
The `experimental_guard` intrinsic has memory write semantics to model the thread-exiting
logic, but does not do any actual writes to memory. Currently, `AliasSetTracker` treats it as a
normal memory write. As a result, a loop-invariant load cannot be hoisted out of the loop because
the guard may possibly alias with it.
This patch changes `AliasSetTracker` so that it doesn't treat guards as memory writes.
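A hedged sketch (hypothetical names) of the kind of load this unblocks:
```
declare void @llvm.experimental.guard(i1, ...)

define i32 @hoist_past_guard(i32* %p, i1 %c, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  call void (i1, ...) @llvm.experimental.guard(i1 %c) [ "deopt"() ]
  ; The guard is no longer treated as a write, so this invariant load
  ; can be hoisted out of the loop.
  %v = load i32, i32* %p
  %i.next = add nuw i64 %i, 1
  %cont = icmp ult i64 %i.next, %n
  br i1 %cont, label %loop, label %exit
exit:
  ret i32 %v
}
```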
Differential Revision: https://reviews.llvm.org/D50497
Reviewed By: reames
llvm-svn: 339753
Summary:
This shouldn't have been a specific number but rather a regex. This was
a part of rL339474 which got reverted.
Reviewers: aardappel
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50728
llvm-svn: 339736
Flags in DIBasicType will be used to pass attributes used in
DW_TAG_base_type, such as DW_AT_endianity.
Patch by Chirag Patel!
Differential Revision: https://reviews.llvm.org/D49610
llvm-svn: 339714
Modifies existing SIMD tests to also check that SIMD instructions are
lowered to the expected bytes. This CL depends on D50597.
Reviewers: aheejin
Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D50660
Patch by Thomas Lively (tlively)
llvm-svn: 339712
1) We print __restrict twice on member pointers. This is fixed
and relevant tests are re-enabled.
2) Several tests were disabled because of printing slightly
different output than undname. These were confirmed to be
bugs in undname, so we just re-enable the tests.
3) The test for printing reference temporaries is re-enabled. This
is a clang mangling extension, so we have some flexibility with
how we demangle it. The output currently looks fine, so we just
re-enable the test with no fixes.
llvm-svn: 339708
Implement instruction selection for all versions of the extract_lane
instruction. Use explicit sext/zext to differentiate between
extract_lane_s and extract_lane_u for applicable types, otherwise
default to extract_lane_u.
Reviewers: aheejin
Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D50597
Patch by Thomas Lively (tlively)
llvm-svn: 339707
Summary:
This patch teaches the loop vectorizer to vectorize loops with non-header
phis that have outside uses. This is because the iteration
dependence distance for these phis can be widened up to VF (similar to
how we do for induction/reduction) if they do not have a cyclic
dependence with header phis. When identifying reduction/induction/first
order recurrence header phis, we already identify if there are any cyclic
dependencies that prevents vectorization.
The vectorizer is taught to extract the last element from the vectorized
phi and update the scalar loop exit block phi to contain this extracted
element from the vector loop.
This patch can be extended to vectorize loops where instructions other
than phis have outside uses.
Reviewers: Ayal, mkuper, mssimpso, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50579
llvm-svn: 339703
rL339686 added the case where a faux shuffle might have repeated shuffle inputs coming from either side of the OR().
This patch improves the insertion of the inputs into the source ops lists to account for this, as well as making it trivial to add support for shuffles with more than 2 inputs in the future.
llvm-svn: 339696
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
llvm-svn: 339676
Summary: This revision improves the previous version (rL330322), which was reverted due to crashes.
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.
Reviewers: craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: mike.dvoretsky, DavidKreitzer, sroland, llvm-commits
Differential Revision: https://reviews.llvm.org/D46179
llvm-svn: 339650
Summary:
When WPD is performed in a ThinLTO backend, the function may be created
if it isn't already in that module. Module::getOrInsertFunction may
add a bitcast, in which case the returned Constant is not a Function and
doesn't have a name. Invoke stripPointerCasts() on the returned value
where we access its name.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49959
llvm-svn: 339640
Summary:
The AsmWriter was only writing the Args for a ConstVCall if it was
non-empty, however, the LLParser was always expecting it. To aid
in making it optional, surround the ConstVCall VFuncId and Args in
parentheses when writing, then make the Args optional when reading.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49960
llvm-svn: 339637
Summary:
Calls marked 'tail' cannot read or write allocas from the current frame
because the current frame might be destroyed by the time they run.
However, a tail call may use an alloca with byval. Calling with byval
copies the contents of the alloca into argument registers or stack
slots, so there is no lifetime issue. Tail calls never modify allocas,
so we can return just ModRefInfo::Ref.
Fixes PR38466, a longstanding bug.
Reviewers: hfinkel, nlewycky, gbiv, george.burgess.iv
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50679
llvm-svn: 339636
The behavior in 64-bit mode is different between Intel and AMD CPUs. Intel ignores the 0x66 prefix. AMD does not. objdump doesn't ignore the 0x66 prefix. Since LLVM aims to match objdump behavior, we should do the same.
While I was trying to fix this I had to change brtarget16/32 to use ENCODING_IW/ID instead of ENCODING_Iv to get the 0x66+REX.W case to act sort of sanely. It's still wrong, but that's a problem for another day.
The change in encoding exposed the fact that 16-bit mode disassembly of relative jumps was creating JMP_4 with a 2 byte immediate. It should have been JMP_2. From just printing you can't tell the difference, but if you dumped the encoding it wouldn't have matched what we started with.
While fixing that, it exposed that jo/jno opcodes were missing from the switch that this patch deleted and there were no test cases for them.
Fixes PR38537.
llvm-svn: 339622
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}
General pattern:
X & Y
Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e. `%arg & 4294967168` can be either `4294967168` or `0`
Pattern can be one of:
%t = add i32 %arg, 128
%r = icmp ult i32 %t, 256
Or
%t0 = shl i32 %arg, 24
%t1 = ashr i32 %t0, 24
%r = icmp eq i32 %t1, %arg
Or
%t0 = trunc i32 %arg to i8
%t1 = sext i8 %t0 to i32
%r = icmp eq i32 %t1, %arg
This pattern is a signed truncation check.
And `X` is checking that some bit in that same mask is zero.
I.e. can be one of:
%r = icmp sgt i32 %arg, -1
Or
%t = and i32 %arg, 2147483648
%r = icmp eq i32 %t, 0
Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
%r = icmp ult i32 %arg, 128
The transform itself ended up being rather horrible, even though I omitted some cases.
Surely there is some infrastructure that can help clean this up that I missed?
https://rise4fun.com/Alive/3Ou
The initial commit (rL339610)
was reverted, since the first assert was being triggered.
The @positive_with_extra_and test now has coverage for that case.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: RKSimon, erichkeane, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D50465
llvm-svn: 339621
Even though this code is below a function called optimizeFloatingPointLibCall(),
we apparently can't guarantee that we're dealing with FPMathOperators, so bail
out immediately if that's not true.
llvm-svn: 339618
At least one buildbot was able to actually trigger that assert
on the top of the function. Will investigate.
This reverts commit r339610.
llvm-svn: 339612
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}
General pattern:
X & Y
Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e. `%arg & 4294967168` can be either `4294967168` or `0`
Pattern can be one of:
%t = add i32 %arg, 128
%r = icmp ult i32 %t, 256
Or
%t0 = shl i32 %arg, 24
%t1 = ashr i32 %t0, 24
%r = icmp eq i32 %t1, %arg
Or
%t0 = trunc i32 %arg to i8
%t1 = sext i8 %t0 to i32
%r = icmp eq i32 %t1, %arg
This pattern is a signed truncation check.
And `X` is checking that some bit in that same mask is zero.
I.e. can be one of:
%r = icmp sgt i32 %arg, -1
Or
%t = and i32 %arg, 2147483648
%r = icmp eq i32 %t, 0
Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
%r = icmp ult i32 %arg, 128
https://rise4fun.com/Alive/3Ou
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: RKSimon, erichkeane, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D50465
llvm-svn: 339610
Added a reduction test case showing where it's illegal to
vectorize a loop.
Resetting the reduction variable during loop iterations prevents us from
widening the dependency cycle to VF, thereby making it illegal to
vectorize the loop.
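A sketch of the kind of loop this covers (names hypothetical): %sum looks like a reduction, but it is conditionally reset to 0, so the add does not form a pure dependency cycle:
```
define i32 @reset_reduction(i32* %a, i32* %c, i32 %n) {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %sum = phi i32 [ 0, %entry ], [ %sum.next, %loop ]
  %a.gep = getelementptr i32, i32* %a, i32 %i
  %v = load i32, i32* %a.gep
  %add = add i32 %sum, %v
  %c.gep = getelementptr i32, i32* %c, i32 %i
  %cv = load i32, i32* %c.gep
  %reset = icmp eq i32 %cv, 0
  ; Conditionally resetting the running sum breaks the reduction cycle.
  %sum.next = select i1 %reset, i32 0, i32 %add
  %i.next = add i32 %i, 1
  %done = icmp eq i32 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret i32 %sum.next
}
```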
llvm-svn: 339605
This is a very partial fix for the reported problem. I suspect
we do not get this fold in most motivating cases because most of
the time, the libcall would have been replaced by an intrinsic,
and that optimization is handled elsewhere...but maybe it should
be handled here?
llvm-svn: 339604
Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR:
don't zero-extend the demanded-elements mask when it is already as long as the
source vector.
Differential Revision: https://reviews.llvm.org/D49574
llvm-svn: 339600
Summary: The GR740 provides an up-counting cycle counter in the
registers ASR22 and ASR23. As these registers cannot be
read together atomically, we only use the value of ASR23
for llvm.readcyclecounter(). The ASR23 register holds the
32 LSBs of the up-counter.
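For reference, a minimal sketch of how the intrinsic is used from IR:
```
declare i64 @llvm.readcyclecounter()

define i64 @cycles() {
  ; Per the summary above, on the GR740 only ASR23 (the counter's
  ; 32 LSBs) is read to produce this value.
  %c = call i64 @llvm.readcyclecounter()
  ret i64 %c
}
```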
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48638
llvm-svn: 339551
This is a second part of D49974, handling widening of conditional branches that
have a very likely `false` branch.
Differential Revision: https://reviews.llvm.org/D50040
Reviewed By: reames
llvm-svn: 339537
This is another variation of PR38533. In this case, the result type of the bitcast is legal and 16 bits wide, but not a scalar integer. So we need to emit the convert to i16 and then bitcast that to the true result type. This new bitcast will be further type legalized if necessary.
llvm-svn: 339536
Previously, if the result type was a vector, we emitted an FP_TO_FP16 with a vector result type, which isn't valid.
This is basically the opposite case of the root cause of PR38533.
llvm-svn: 339535
Fixes PR37524.
The exception handling encodings for x86_64 in the kernel code model
were changed in r309884. Restore them to the correct ones. These
encodings include PersonalityEncoding, LSDAEncoding and
TTypeEncoding.
Differential Revision: https://reviews.llvm.org/D50490
llvm-svn: 339534
Summary:
We've supported constant folding for the SSE versions for many years. This patch adds support for the AVX512 versions, including unsigned, with the default rounding mode. We could probably do more with other rounding modes and SAE in the future.
The test cases are largely based on the sse.ll test cases, but I did add some to ensure the unsigned versions don't accept negative values. I also checked the bounds of f64->i32 conversions to make sure unsigned has a larger positive range than signed.
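For example, a hedged sketch of a call that should now fold, assuming the usual intrinsic name and that the immediate 4 encodes the default CUR_DIRECTION rounding mode:
```
declare i32 @llvm.x86.avx512.vcvtss2usi32(<4 x float>, i32)

define i32 @fold_cvt() {
  ; Constant input plus default rounding mode: foldable to ret i32 42.
  %r = call i32 @llvm.x86.avx512.vcvtss2usi32(<4 x float> <float 42.0, float 0.0, float 0.0, float 0.0>, i32 4)
  ret i32 %r
}
```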
Reviewers: RKSimon, spatel, chandlerc
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50553
llvm-svn: 339529
I'm not sure of the exact nsz flag combination that
is OK. I think as long as it's on either one, this is OK.
For now, just check it on the omod multiply.
llvm-svn: 339513
If one of the elements is undef, use the canonicalized constant
from the other element instead of 0.
Splat vectors are more useful for other optimizations, such
as matching vector clamps. This was breaking clamp matching on
half3, where the 4th component is undef.
llvm-svn: 339512
Try to improve the computed counts when the count has been explicitly set by a pragma
or command line option. This moves the code around so that we first call
computeUnrollCount to get a sensible count, and then override that if explicit unroll
and jam counts are specified.
Also added some extra debug messages for when unroll and jam is disabled.
Differential Revision: https://reviews.llvm.org/D50075
llvm-svn: 339501
Now we switch to the subregister in expandPostRAPseudos, where we already switched the opcode.
This simplifies a few isel patterns that used the pseudo directly, and it magically seems to have improved our ability to CSE it in the undef-label.ll test.
llvm-svn: 339496
If we have an assume which is known to execute and whose operand is invariant, we can lift that into the pre-header. So long as we don't change which paths the assume executes on, this is a legal transformation. It's likely to be a useful canonicalization as other transforms only look for dominating assumes.
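A minimal sketch of the shape this enables (names hypothetical): the assume's operand is loop-invariant and the assume executes on every iteration, so it can be lifted to the preheader:
```
declare void @llvm.assume(i1)

define void @f(i1 %cond, i32 %n) {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  ; Invariant operand + guaranteed execution: hoistable to %entry.
  call void @llvm.assume(i1 %cond)
  %i.next = add i32 %i, 1
  %done = icmp eq i32 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret void
}
```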
Differential Revision: https://reviews.llvm.org/D50364
llvm-svn: 339481
Summary:
Moved the Explicit Locals pass to last.
Made that pass obligatory.
Made it convert from register-based to stack-based instructions, and removed the registers.
Fixed related code that was expecting register-based instructions.
Added the correct testing flag to all tests, depending on which
format they were expecting so far.
Translated one test to stack format as an example: reg-stackify-stack.ll
tested:
llvm-lit -v `find test -name WebAssembly`
unittests/MC/*
Reviewers: dschuff, sunfish
Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100
Differential Revision: https://reviews.llvm.org/D50568
llvm-svn: 339474
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.
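For instance (a hedged sketch), masking with 255 should now select uxtb instead of materializing the mask:
```
; Expected Thumb1 lowering, per the description above: uxtb r0, r0
define i32 @mask_low_byte(i32 %x) {
  %r = and i32 %x, 255
  ret i32 %r
}
```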
Some potential improvements are outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but it seems to work
pretty well already.
The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)
According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.
Differential Revision: https://reviews.llvm.org/D50030
llvm-svn: 339472
There are two cases we need to support with extern "C"
functions. The first is the case of a '9' indicating that
the function has no prototype. This occurs when we mangle
a symbol inside an extern "C" function, but not the
function itself.
The second case is when we have overloaded extern "C"
functions. In this case we emit $$J0 to indicate this.
This patch adds support for both of these cases.
llvm-svn: 339471