An upcoming patch D41434, changes the ordering of the matcher table
for assembly. This patch corrects the definition of the normal MIPS
cvt.d.w not to be available in microMIPS.
llvm-svn: 325589
Current implementation always allocates the parameter save area conservatively
for fastcc functions. There is no reason to allocate the parameter save area if
all the parameters can be passed via registers.
Differential Revision: https://reviews.llvm.org/D42602
llvm-svn: 325581
We can always convert xor %a, -1 into MVN, even in thumb 1 where the -1
would not otherwise be considered a cheap constant. This prevents the
-1's from being pulled out into constants and potentially hoisted.
Differential Revision: https://reviews.llvm.org/D43451
llvm-svn: 325573
For instructions like call foo and jmp foo patch changes
relocation produced from R_X86_64_PC32 to R_X86_64_PLT32.
Relocation can be used as a marker for 32-bit PC-relative branches.
Linker will reduce PLT32 relocation to PC32 if function is defined locally.
Differential revision: https://reviews.llvm.org/D43383
llvm-svn: 325569
Summary:
The machine instruction scheduler was illegally moving a buffer store
past a buffer load with the same descriptor and offset. Fixed by marking
buffer ops as mayAlias and isAliased. This may be overly conservative,
and we may need to revisit.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D43332
Change-Id: Iff3173d9e0653e830474546276ab9d30318b8ef7
llvm-svn: 325567
The 128 and 256 bit versions were already not used by clang. This adds an equivalent unmasked 512 bit version. Then autoupgrades all sizes to use unmasked intrinsics plus select.
llvm-svn: 325559
This is a follow on commit to r[x] where we fix the other direction of copy.
For this case, after converting the source from gpr32 -> fpr32, we use a
subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land.
https://reviews.llvm.org/D43444
llvm-svn: 325550
Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates. So this is more consistent anyway.
llvm-svn: 325534
We swapped the operands and used setle, but I don't see any reason to do that. I think this is a holdover from SSE where we swap and the invert to use pcmpgt. But with AVX512 we don't want an invert so we won't use pcmpgt. So there's no need to swap.
llvm-svn: 325527
Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node.
This removes over 24000 bytes from the isel table.
llvm-svn: 325526
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.
Author: FarhanaAleen
Reviewed By: rampitec
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D43275
llvm-svn: 325518
It was reverted because it broke the grub build. The reason the grub
build broke is because grub does its own relocation processing and was
not handing R_386_PLT32. Since grub has no dynamic linker, the fix is
trivial: handle R_386_PLT32 exactly like R_386_PC32.
On the report it was noted that they are using
-fno-integrated-assembler. The upstream GAS (starting with
451875b4f976a527395e9303224c7881b65e12ed) will already be producing a
R_386_PLT32 anyway, so they have to update their code one way or the
other
Original message:
Don't assume a null GV is local for ELF and MachO.
This is already a simplification, and should help with avoiding a plt
reference when calling an intrinsic with -fno-plt.
With this change we return false for null GVs, so the caller only
needs to check the new metadata to decide if it should use foo@plt or
*foo@got.
llvm-svn: 325514
This adds the program memory address space setting to the AVR data
layout.
This setting was very recently added under r325479.
At the moment, there are no uses of this setting. In the future, things
such as switch lookup tables should reside there.
llvm-svn: 325481
The parseFunctionArgs() method was directly reading the
arguments from a Function object, but is should have used the
arguments supplied by the SelectionDAGBuilder.
This was causing
the lowering code to only lower one argument, not two in some cases.
Thanks to @brainlag on GitHub for coming up with the working fix!
Patch-by: @brainlag on GitHub
llvm-svn: 325474
We're accidentally checking that the same node is a constant twice instead of checking the other node.
This isn't a functional problem since we didn't do anything below that explicitly requires constants. It just means we may have introduced a sign_extend or zero_extend that won't fold out.
llvm-svn: 325469
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Yonghong Song
llvm-svn: 325457
Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1.
This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt
llvm-svn: 325456
This makes sure that alloca() function calls properly probe the
stack as needed.
Differential Revision: https://reviews.llvm.org/D42356
llvm-svn: 325433
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Stanislav Mekhanoshin, Tom Stellard.
llvm-svn: 325425
Sadly, r324359 caused at least PR36312. There is a patch out for review
but it seems to be taking a bit and we've already had these crashers in
tree for too long. We're hitting this PR in real code now and are
blocked on shipping new compilers as a consequence so I'm reverting us
back to green.
Sorry for the churn due to the stacked changes that I had to revert. =/
llvm-svn: 325420
Summary:
Currently we convert to shuffles during lowering. This moves it to DAG combine so hopefully we can get it done before type legalization has to extend the condition.
I believe in some cases we're creating SHRUNKBLENDs that end up with constant conditions because we see the extended on the condition and think its a dynamic selelect before DAG combine gets a chance to constant fold the extend. We could add combines to turn SHRUNKBLENDs with constant condition back to vselect. But it seemed like it might be better to just send them to shuffles as early as possible so they never get a chance to become SHRUNKBLENDs. This the reason some tests went from blends controlled by a constant pool load to just move.
Some of the constant pool entries changed because the sign_extend introduced by type legalization turned undef elements in select condition into 0s. While the select->shuffle used -1 in the shuffle mask. So now the shuffle lowering can do what it wants with them.
I'll remove the lowering code as a follow up. We might be able to simplify some of the pre-checks for SHRUNKBLEND as the FIXME there says.
Reviewers: spatel, RKSimon, efriedma, zvi, andreadb
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43367
llvm-svn: 325417
Undef in select condition means we should pick the element from one side or the other. An undef in a shuffle mask means pick any element from either source or worse.
I suspect by the time we get here most of the undefs in a constant vector have been removed by other things, but doing this for safety.
llvm-svn: 325394
The data type is assumed to be a vector, but sometimes it is not, leading
to an assertion.
Add simple test-case to verify this.
Differential revision: https://reviews.llvm.org/D42599
llvm-svn: 325378
Summary:
This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce
an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed.
Reviewers:
arsenm
Differential Revision:
https://reviews.llvm.org/D33559
llvm-svn: 325372
This seems to interfere with a target independent brcond combine that looks for the (srl (and X, C1), C2) pattern to enable TEST instructions. Once we flip, that combine doesn't fire and we end up exposing it to the X86 specific BT combine which causes us to emit a BT instruction. BT has lower throughput than TEST.
We could try to make the brcond combine aware of the alternate pattern, but since the flip was just a code size reduction and not likely to enable other combines, it seemed easier to just delay it until after lowering.
Differential Revision: https://reviews.llvm.org/D43201
llvm-svn: 325371
Add an explicit check before looking up symbol in SymbolIndices.
This was previously silently succeeding and returning zero for such
unnamed temporaries.
Differential Revision: https://reviews.llvm.org/D43365
llvm-svn: 325367
Summary:
In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction
is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect
vgpr.
In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction.
Reviewers:
rampitec
Differential Revision:
https://reviews.llvm.org/D43297
llvm-svn: 325355
Running a bootstrap build with UBSan produces a number of instances where
we have signed integer overflow due to this transform. Change the type to
long to prevent this UB on 64-bit build machines.
llvm-svn: 325347
These instructions conflict with their full length variants
for the purposes of FastISel as they cannot be distingushed
based on the number and type of operands and predicates.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D41285
llvm-svn: 325341
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Eli Friedman
llvm-svn: 325327
This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form
a != b ? x : 0
a == b ? 0 : x
and that currently (e.g. in Thumb1) are emitted as branches.
Differential Revision: https://reviews.llvm.org/D34515
llvm-svn: 325323