Commit Graph

49278 Commits

Author SHA1 Message Date
Chen Zheng c941d925b0 [MachineCycle][NFC] add a cache for block and its top level cycle
This solves https://github.com/llvm/llvm-project/issues/57664

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D134019
2022-09-21 01:31:04 -04:00
Craig Topper 70a64fe7b1 [RISCV] Remove support for the unratified Zbt extension.
This extension does not appear to be on its way to ratification.

Out of the unratified bitmanip extensions, this one had the
largest impact on the compiler.

Posting this patch to start a discussion about whether we should
remove these extensions. We'll talk more at the RISC-V sync meeting this
Thursday.

Reviewed By: asb, reames

Differential Revision: https://reviews.llvm.org/D133834
2022-09-20 20:26:48 -07:00
Shubham Sandeep Rastogi 636de2bf34 Change isLittleEndian to follow llvm style and add an accessor
Differential Revision: https://reviews.llvm.org/D134290
2022-09-20 17:00:47 -07:00
Scott Linder f583151461 [NFC][AMDGPU] Refactor AMDGPUDisassembler
Clean up ahead of a patch to fix bugs in the AMDGPUDisassembler.

Use lit.local.cfg substitutions and more idiomatic use of split-file to
simplify and extend existing kernel-descriptor disassembly tests.

Add a comment to AMDHSAKernelDescriptor.h, as at least one small set
towards keeping all kernel-descriptor sensitive code in sync.

Reviewed By: kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D130105
2022-09-20 20:37:19 +00:00
Kazu Hirata 00874c48ea [IPO] Reorder parameters of InlineFunction (NFC)
With the recent addition of new parameter MergeAttributes (D134117),
callers need to specify several default parameters before getting to
specify the new parameter.

This patch reorders the parameters so that callers do not have to
specify as many default parameters.

Differential Revision: https://reviews.llvm.org/D134125
2022-09-20 09:09:38 -07:00
Eric Li 403d72cd43 [Support][NFC] Clarify function comment
Follow-up to 86118ec2 that addresses the comments in D134072, which
were accidentally left off of the commit.
2022-09-20 11:10:16 -04:00
Eric Li 86118ec2d0 [Support] Provide access to the full mapping in llvm::Annotations
Providing access to the mapping of annotations allows test helpers to
be expressive by using the annotations as expectations. For example, a
matcher could verify that all annotated points were matched by a
matcher, or that an refactoring surgically modifies specific ranges.

Differential Revision: https://reviews.llvm.org/D134072
2022-09-20 11:06:21 -04:00
Vy Nguyen 016c2f5e32 [lld-macho] Support -dyld_env
This arg is undocumented but from looking at the code + experiment, it's used to add additional DYLD_ENVIRONMENT load commands to the output.

Differential Revision: https://reviews.llvm.org/D134058
2022-09-20 10:16:45 -04:00
Caroline Concatto d32b8fdbdb [LLVM][AArch64] Replace aarch64.sve.ld by aarch64.sve.ldN.sret
This patch removes the intrinsic aarch64.sve.ldN from tablegen in favour of
using arch64.sve.ldN.sret.

Depends on: D133023

Differential Revision: https://reviews.llvm.org/D133025
2022-09-20 13:15:07 +01:00
luxufan bfd31c6f12 [MemorySSA][NFC] Use const whenever possible
Differential Revision: https://reviews.llvm.org/D134162
2022-09-20 02:21:02 +00:00
Matt Arsenault 2adae8e1b7 VectorCombine: Pass through AssumptionCache 2022-09-19 19:25:22 -04:00
Matt Arsenault ce44357216 Analysis: Add AssumptionCache to isSafeToSpeculativelyExecute
Does not update any of the uses.
2022-09-19 19:25:22 -04:00
Matt Arsenault 34fb7803f8 GlobalISel: Pass through AssumptionCache 2022-09-19 19:10:51 -04:00
Matt Arsenault bcb931c484 SelectionDAG: Add AssumptionCache analysis dependency
Fixes compile time regression after
bb70b5d406
2022-09-19 19:10:51 -04:00
Matt Arsenault 0d8ffcc532 Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer
This does not try to pass it through from the end users.
2022-09-19 18:57:33 -04:00
Mircea Trofin c625c17b88 [lld][thinlto] Include -mllvm options in the thinlto cache key
They may modify thinlto optimization.

This patch only extends support for `-mllvm`. There is another way to
pass llvm flags, `-plugin-opt`, but its processing is different and will
be provided in a subsequent patch.

Differential Revision: https://reviews.llvm.org/D134013
2022-09-19 12:04:17 -07:00
Fangrui Song 0b140d0910 [Object] Add zstd decompression support to Decompressor
llvm::object::Decompressor is used by many DWARF consumers like llvm-dwarfdump,
llvm-dwp, llvm-symbolizer. Add tests to them. The lldb test can be left to
D133530.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D134116
2022-09-19 11:41:16 -07:00
zhijian dcd5abd4c4 [AIX] llvm-readobj support a new option --exception-section for xcoff object file.
Summary:

llvm-readobj support a new option --exception-section for xcoff object file.

https://www.ibm.com/docs/en/aix/7.2?topic=formats-xcoff-object-file-format#XCOFF__iua3i23ajbau

Reviewers:  James Henderson,Paul Scoropan

Differential Revision: https://reviews.llvm.org/D133030
2022-09-19 10:55:48 -04:00
Nikita Popov dd61726d5b Revert "[SimplifyCFG] accumulate bonus insts cost"
This reverts commit e5581df60a.

This causes major compile-time regressions, about 2-3% end-to-end
on CTMark.
2022-09-19 14:46:43 +02:00
Max Kazantsev 818b1ab84e [SCEV][NFC] Remove unused parameter from forgetLoopDispositions
Let's be honest about it, we don't drop loop dispositions for
particular loops. Remove the parameter that misleadingly makes
it apparent that we do.
2022-09-19 14:06:42 +07:00
Kazu Hirata 6b49f30fca [llvm] Deprecate llvm::empty (NFC)
This patch deprecates llvm::empty as I've migrated all known uses of
llvm::empty(x) to x.empty().

Differential Revision: https://reviews.llvm.org/D134141
2022-09-18 22:01:32 -07:00
Kazu Hirata 2078350645 Use std::make_unsigned_t (NFC) 2022-09-18 18:41:02 -07:00
Lang Hames 0e43f3b04d [ORC][ORC-RT] Make WrapperFunctionCall::Create support void functions.
Serialized calls to void-wrapper-functions should have zero bytes of argument
data, but accessing ArgData[0] may (and will, in the case of SmallVector) fail
if the argument data buffer is empty.

This commit fixes the issue by adding a check for empty argument buffers.
2022-09-18 17:53:45 -07:00
Yaxun (Sam) Liu e5581df60a [SimplifyCFG] accumulate bonus insts cost
SimplifyCFG folds

bool foo() {
  if (cond1) return false;
  if (cond2) return false;
  return true;
}

as

bool foo() {
  if (cond1 | cond2) return false
  return true;
}

'cond2' is called 'bonus insts' in branch folding since they introduce overhead
since the original CFG could do early exit but the folded CFG always executes
them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into
its predecessor BB which shares the destination. If it is below bonus-inst-threshold,
SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed.

When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts
in the current BB to be considered for folding. This causes issue for unrolled loops
which share destinations, e.g.

bool foo(int *a) {
  for (int i = 0; i < 32; i++)
    if (a[i] > 0) return false;
  return true;
}

After unrolling, it becomes

bool foo(int *a) {
  if(a[0]>0) return false
  if(a[1]>0) return false;
  //...
  if(a[31]>0) return false;
  return true;
}

SimplifyCFG will merge each BB with its predecessor BB,
and ends up with 32 'bonus insts' which are always executed, which
is much slower than the original CFG.

The root cause is that SimplifyCFG does not consider the
accumulated cost of 'bonus insts' which are folded from
different BB's.

This patch fixes that by introducing a ValueMap to track
costs of 'bonus insts' coming from different BB's into
the same BB, and cuts off if the accumulated cost
exceeds a threshold.

Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D132408
2022-09-18 20:21:14 -04:00
Kazu Hirata 82293ed486 [ModuleInliner] Remove unused using declarations (NFC) 2022-09-18 14:27:06 -07:00
Kazu Hirata 3e720fa9dc Use std::decay_t (NFC) 2022-09-18 10:25:08 -07:00
Kazu Hirata 5e5a6c5b07 Use std::conditional_t (NFC) 2022-09-18 10:25:06 -07:00
Kazu Hirata 919b638eff [ADT] Use std::common_type_t (NFC) 2022-09-18 10:25:04 -07:00
Kazu Hirata d3b95ecc98 [ModuleInliner] Remove InlineOrder::front (NFC)
InlineOrder::front is a remnant from the era when we had a nested
"while" loops in the module inliner, with the inner one grouping the
call sites with the same caller.

Now that we have a simple "while" loop draining the priority queue, we
can just use InlineOrder::pop.

Differential Revision: https://reviews.llvm.org/D134121
2022-09-18 08:49:44 -07:00
Kazu Hirata 284f0397e2 [Transforms] Merge function attributes within InlineFunction (NFC)
In the past, we've had a bug resulting in a compiler crash after
forgetting to merge function attributes (D105729).

This patch teaches InlineFunction to merge function attributes.  This
way, we minimize the "time" when the IR is valid, but the function
attributes are not.

Differential Revision: https://reviews.llvm.org/D134117
2022-09-17 23:10:23 -07:00
Kai Nacke ae35188f97 [GISel] Fix match tree emitter.
The following changes are necessasy to get the generated tree
matcher to compile:

- In CodeExpansions::declare(), the assert() prevents connecting
  two instructions. E.g. the match code
    (match (MUL $t, $s1, $s2),
           (SUB $d, $t, $s3)),
  results in two declarations of $t, one for the def and one for
  the use. Removing the assertion allows this construct.
  If $t is later used, it is one of the operands, which should be
  perfectly fine.
- The code emitted in GIMatchTreeVRegDefPartitioner::generatePartitionSelectorCode()
  is not compilable:
  - The value of NewInstrID should be emitted, not the name
  - Both calls involving getOperand() end with one parenthesis too many
- Swaps generated condition for the partition code in the latter function

It also changes the rules i2p_to_p2i, fabs_fabs_fold, and fneg_fneg_fold
to use the tree matcher for a linear match. These rules are tested by:

CodeGen/AArch64/GlobalISel/combine-fabs.mir
CodeGen/AArch64/GlobalISel/combine-fneg.mir
CodeGen/AArch64/GlobalISel/combine-ptrtoint.mir
CodeGen/AMDGPU/GlobalISel/combine-add-nullptr.mir

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133257
2022-09-18 00:00:15 +00:00
Aiden Grossman e5e3dccd07 [mlgo] Add in-development instruction based features for regalloc advisor
This patch adds in instruction based features to the regalloc advisor
gated behind a flag so a user can decide at runtime whether or not they
want to enable the feature. The features are only enabled when LLVM is
compiled in MLGO develpment mode (LLVM_HAVE_TF_API) is set to true.

To extract the instruction features, I'm taking a list of segments from
each LiveInterval and noting the start and end SlotIndices. This list is then
sorted based on the start SlotIndex and I iterate through each SlotIndex
to grab instructions, making sure to check for overlaps. This results in
a vector of opcodes and binary mapping matrix that maps live ranges to the
opcodes of the instructions within that LR.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D131930
2022-09-17 19:54:45 +00:00
Kazu Hirata 20d764aff0 [llvm] Don't including SetVector.h (NFC)
llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without
including SetVector.h, so this patch adds an appropriate #include
there.
2022-09-17 12:36:43 -07:00
Fangrui Song 367997d0d6 [Support] Rename llvm::compression::{zlib,zstd}::uncompress to more appropriate decompress
This improves consistency with other places (e.g. llvm::compression::decompress,
llvm::object::Decompressor::decompress, llvm-objcopy).
Note: when zstd::uncompress was added, we noticed that the API `ZSTD_decompress`
is fine while the zlib API `uncompress` is a misnomer.
2022-09-17 12:35:17 -07:00
Kazu Hirata 6170437df5 [ModuleInliner] Remove unnecessary #includes (NFC)
While I am at it, this patch removes an unnecessary forward
declaration.
2022-09-17 12:05:35 -07:00
Kazu Hirata 5faf4bf195 [ModuleInliner] Move UseInlinePriority to InlineOrder.cpp (NFC)
UseInlinePriority specifies the priority function.  This patch
simplifies the code by moving UseInlinePriority closer to the actual
consumer -- the switch statement inside getInlineOrder.

Differential Revision: https://reviews.llvm.org/D134100
2022-09-17 11:41:28 -07:00
Sander de Smalen bed214cf0f [AArch64][SME] Add intrinsics for enabling/disabling ZA.
This adds the intrinsics:
* void @llvm.aarch64.sme.za.enable() -> smstart za
* void @llvm.aarch64.sme.za.disable()  -> smstop za

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D133894
2022-09-17 16:41:42 +00:00
Shivam Gupta 78533528cf [NFC] Fix a comment in InitializePasses.h 2022-09-17 19:09:26 +05:30
Kazu Hirata 29c841ce93 Revert "[llvm] Remove llvm::is_trivially_{copy/move}_constructible (NFC)"
This reverts commit 01ffe31cbb.

A build breakage with GCC 7.3 has been reported:

https://reviews.llvm.org/D132311#3797053

FWIW, GCC 7.5 is OK according to Pavel Chupin.  I also personally
tested GCC 8.4.0.
2022-09-16 18:26:20 -07:00
Kazu Hirata 6e30a9cc08 [Inliner] Retire DefaultInlineOrder (NFC)
DefaultInlineOrder was largely an exercise in generalizing the
traversal order of call sites within the inliner.

Now that the module inliner is starting to form its shape, there is no
point in sharing DefaultInlineOrder between the module inliner and the
CGSCC inliner.  DefaultInlineOrder and all the other inline orders are
mutually exclusive in the following sense:

- The use of DefaultInlineOrder doesn't make sense in the module
  inliner because there is no priority inherent in the order in which
  call sites are added to the list of call sites -- SmallVector.

- The use of any other inline order doesn't make sense in the CGSCC
  inliner because little prioritization can be done within one CGSCC.

This patch essentially reverts the addition of DefaultInlineOrder so
that the loop structure of Inliner.cpp looks like the state just
before we started working on the module inliner (circa June 2021).

At the same time, ww remove the choice of DefaultInlineOrder from
UseInlinePriority.

Differential Revision: https://reviews.llvm.org/D134080
2022-09-16 15:36:40 -07:00
Jessica Paquette 1076b31da8 [GlobalISel] Combine select + fcmp to fminnum/fmaxnum/fminimum/fmaximum
This is a partial port of the code used by the SelectionDAGBuilder to
translate selects.

In particular, see matchSelectPattern in ValueTracking.cpp. This is a
GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum.

I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler.
On the AArch64-end, it seems like the FP cases are more important for perf
right now, so I bit the bullet and went at the more complicated problem. :)

I elected to do this as a post-legalize combine rather than in the
IRTranslator because

Deciding which fmax/fmin to use can depend on legalization rules
Philosophically-speaking (TM), putting it in a combine just feels cleaner

Being able to enable/disable the combine is handy
Another option would be to use the ValueTracking code in the IRTranslator and
match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat
annoying since we'd need to write lowerings back into the selects in the
legalizer. I'm not strongly opposed to the approach.

We'd also want to be careful with vector selects once that's implemented,
which explicitly check if a vector select is legal on the target. That'd
probably need a hook.

From what I can tell, doing this as a combine is probably a cleaner option
long-term.

Differential Revision: https://reviews.llvm.org/D116702
2022-09-16 13:35:46 -07:00
David Majnemer 8a868d8859 Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView""
This reverts commit cd20a18286 and adds a
"let Heading" to NoStackProtectorDocs.
2022-09-16 19:39:48 +00:00
Kazu Hirata e0bc76eb23 [ModuleInliner] Move InlinePriority and its derived classes to InlineOrder.cpp (NFC)
These classes are referred to only from getInlineOrder in
InlineOrder.cpp.  This patch hides the entire class declarations and
definitions in InlineOrder.cpp.

Differential Revision: https://reviews.llvm.org/D134056
2022-09-16 12:32:16 -07:00
Justin Lebar 8cc3bfd13f
[NFC] Fix indentation in ValueTracking.h.
In a separate patch I want to modify ValueTracking.h.  When I touch the
header, arc wants to clang-format the lines I touch (reasonable!).  But
then these whitespace changes get mixed into my patch.
2022-09-16 10:46:23 -07:00
mbs 7061a3f3f8 [support] Prepare TimeProfiler for cross-thread support
This NFC prepares the TimeProfiler to support the construction
and completion of time profiling 'entries' across threads.

Add ClockType alias so we can change the clock in one place.
(trivial) Use c++ usings instead of typedefs
Rename Entry to TimeTraceProfilerEntry since this type will eventually become public.
Add an intro comment.
Add some smoke unit tests.

Reviewed By: russell.gallop, rriddle, lattner, jloser

Differential Revision: https://reviews.llvm.org/D133153
2022-09-16 10:20:18 -06:00
Yuta Mukai 116838b151 [MachinePipeliner] Fix the interpretation of the scheduling model
The method of counting resource consumption is modified to be based on
"Cycles" value when DFA is not used.

The calculation of ResMII is modified to total "Cycles" and divide it
by the number of units for each resource. Previously, ResMII was
excessive because it was assumed that resources were consumed for
the cycles of "Latency" value.

The method of resource reservation is modified similarly. When a
value of "Cycles" is larger than 1, the resource is considered to be
consumed by 1 for cycles of its length from the scheduled cycle.
To realize this, ResourceManager maintains a resource table for all
slots. Previously, resource consumption was always 1 for 1 cycle
regardless of the value of "Cycles" or "Latency".

In addition, the number of micro operations per cycle is modified to
be constrained by "IssueWidth". To disable the constraint,
--pipeliner-force-issue-width=100 can be used.

For the case of using DFA, the scheduling results are unchanged.

Reviewed By: dpenry

Differential Revision: https://reviews.llvm.org/D133572
2022-09-16 09:51:48 +09:00
Navid Emamdoost 3e52c0926c Add -fsanitizer-coverage=control-flow
Reviewed By: kcc, vitalybuka, MaskRay

Differential Revision: https://reviews.llvm.org/D133157
2022-09-15 15:56:04 -07:00
Alexander Timofeev fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00
Florian Hahn 81a11da762
[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle  creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no way other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
	unsigned int v = *p;
	sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D120571
2022-09-15 19:18:13 +01:00
Sergei Barannikov c6acb4eb0f [SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit.  This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00
Jay Foad 3822a01e0b [AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction
Differential Revision: https://reviews.llvm.org/D133928
2022-09-15 16:46:14 +01:00
Sander de Smalen 45d28779c5 [AArch64][SME] Fix lowering of llvm.aarch64.get.pstatesm()
A thread may not have access to SME or TPIDR2_EL0, so in order to
safely query PSTATE.SM in a streaming-compatible function, the
code should call `__arm_sme_state()`, as described in the ABI:

  c2bb09c4d4

This means that the value of pstate.sm is:
* 0 if the function is non-streaming.
* 1 if the function has `arm_streaming` or `arm_locally_streaming`.
* evaluated at runtime by a call to __arm_sme_state() otherwise.

This patch also adds a calling convention for calls to SME support routines.

At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic
and use function calls (with the corresponding cc) directly instead.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131571
2022-09-15 15:14:13 +00:00
Matt Arsenault 63d1d37d35 RegAllocGreedy: Avoid overflowing priority bitfields
The class priority is expected to be at most 5 bits before it starts
clobbering bits used for other fields. Also clamp the instruction
distance in case we have millions of instructions.

AMDGPU was accidentally overflowing into the global priority bit in
some cases. I think in principal we would have wanted this, but in the
cases I've looked at, it had the counter intuitive effect and
de-prioritized the large register tuple.

Avoid using weird bit hack PPC uses for global priority. The
AllocationPriority field is really 5 bits, and PPC was relying on
overflowing this to 6-bits to forcibly set the global priority
bit. Split this out as a separate flag to avoid having magic behavior
for values above 31.
2022-09-15 10:38:40 -04:00
Alexey Lapshin adabfb5e32 [DWARFLinker][NFC] Set the target DWARF version explicitly.
Currently, DWARFLinker determines the target DWARF version internally.
It examines incoming object files, detects maximal
DWARF version and uses that version for the output file.
This patch allows explicitly setting output DWARF version by the consumer
of DWARFLinker. So that DWARFLinker uses a specified version instead
of autodetected one. It allows consumers to use different logic for
setting the target DWARF version. f.e. instead of the maximally used version
someone could set a higher version to convert from DWARFv4 to DWARFv5
(This possibility is not supported yet, but it would be good if
the interface will support it). Or another variant is to set the target
version through the command line. In this patch, the autodetection is moved
into the consumers(DwarfLinkerForBinary.cpp, DebugInfoLinker.cpp).

Differential Revision: https://reviews.llvm.org/D132755
2022-09-15 16:06:10 +03:00
Dhruva Chakrabarti 839ac62c50 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 7539e9cf81.
2022-09-15 03:08:46 +00:00
Vitaly Buka 72b776168c [IRBuilder] Add CreateMaskedExpandLoad and CreateMaskedCompressStore 2022-09-14 19:18:52 -07:00
Giorgis Georgakoudis 7539e9cf81 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6, ABataev

Differential Revision: https://reviews.llvm.org/D102107
2022-09-15 00:54:05 +00:00
Sam Clegg 8273ca1421 [MC] Fix typo in getSectionAddressSize comment. NFC
The comment was refering to a now non-existant function that was removed
in 93e3cf0ebd.

Differential Revision: https://reviews.llvm.org/D133098
2022-09-14 15:15:41 -07:00
Craig Topper 50a699e362 [IR][VP] Remove IntrArgMemOnly from vp.gather/scatter.
IntrArgMemOnly is only valid for intrinsics that use a scalar
pointer argument. These intrinsics use a vector of pointer.

Alias analysis will try to find a scalar pointer argument and
will return incorrect alias results when it doesn't find one.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133898
2022-09-14 15:00:07 -07:00
Arthur Eubanks ccc9107ad6 [OptBisect] Add flag to print IR when opt-bisect kicks in
-opt-bisect-print-ir-path=foo will dump the IR to foo when opt-bisect-limit starts skipping passes.

Currently we don't print the IR if the opt-bisect-limit is higher than the total number of times opt-bisect is called.

This makes getting the IR right before a bad transform easier.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D133809
2022-09-14 13:48:03 -07:00
Fangrui Song 25394c9d10 [llvm-objdump] Change printSymbolVersionDependency to use ELFFile API
When .gnu.version_r is empty (allowed by readelf but warned by objdump),
llvm-objdump -p may decode the next section as .gnu.version_r and may crash due
to out-of-bounds C string reference. ELFFile<ELFT>::getVersionDependencies
handles 0-entry .gnu.version_r gracefully. Just use it.

Fix https://github.com/llvm/llvm-project/issues/57707

Differential Revision: https://reviews.llvm.org/D133751
2022-09-14 12:30:34 -07:00
Nikita Popov b1cd393f9e [AA] Tracking per-location ModRef info in FunctionModRefBehavior (NFCI)
Currently, FunctionModRefBehavior tracks whether the function reads
or writes memory (ModRefInfo) and which locations it can access
(argmem, inaccessiblemem and other). This patch changes it to track
ModRef information per-location instead.

To give two examples of why this is useful:

* D117095 highlights a weakness of ModRef modelling in the presence
  of operand bundles. For a memcpy call with deopt operand bundle,
  we want to say that it can read any memory, but only write argument
  memory. This would allow them to be treated like any other calls.
  However, we currently can't express this and have to say that it
  can read or write any memory.
* D127383 would ideally be modelled as a separate threadid location,
  where threadid Refs outside pre-split coroutines can be ignored
  (like other accesses to constant memory). The current representation
  does not allow modelling this precisely.

The patch as implemented is intended to be NFC, but there are some
obvious opportunities for improvements and simplification. To fully
capitalize on this we would also want to change the way we represent
memory attributes on functions, but that's a larger change, and I
think it makes sense to separate out the FunctionModRefBehavior
refactoring.

Differential Revision: https://reviews.llvm.org/D130896
2022-09-14 16:34:41 +02:00
Martin Storsjö e280940bfb [Support] Access threadIndex via a wrapper function
On Unix platforms, this wrapper function is inline, so it should
expand to the same direct access to the thread local variable. On
Windows, it's a non-inline function within Parallel.cpp, allowing
making the thread_local variable static.

Windows Native TLS doesn't support direct access to thread local
variables in a different DLL, and GCC/binutils on Windows occasionally
has problems with non-static thread local variables too.

This fixes mingw dylib builds with native TLS after
e6aebff674.

At the same time, move the whole thread local variable within
    #if LLVM_ENABLE_THREADS
to fix builds without threading support.

Differential Revision: https://reviews.llvm.org/D133759
2022-09-14 09:19:27 +03:00
Pengxuan Zheng ecb5ea6a26 [Object][COFF] Allow section symbol to be common symbol
I ran into an lld-link error due to a symbol named ".idata$4" coming from some
static library:
  .idata$4 should not refer to special section 0.

Here is the symbol table entry for .idata$4:

  Symbol {
      Name: .idata$4
      Value: 3221225536
      Section: IMAGE_SYM_UNDEFINED (0)
      BaseType: Null (0x0)
      ComplexType: Null (0x0)
      StorageClass: Section (0x68)
      AuxSymbolCount: 0
  }

The symbol .idata$4 is a section symbol (IMAGE_SYM_CLASS_SECTION) and LLD
currently handles it as a regular defined symbol since isCommon() returns false
for this symbol. This results in the error ".idata$4 should not refer to special
section 0" because lld-link asserts that regular defined symbols should not
refer to section 0.

Should this symbol be handled as a common symbol instead? LLVM currently only
allows external symbols (IMAGE_SYM_CLASS_EXTERNAL) to be common
symbols. However, the PE/COFF spec (see section "Section Number Values") does
not seem to mention this restriction. Any thoughts?

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D133627
2022-09-13 18:07:02 -07:00
YongKang Zhu 5fa6b24354 Address feedback in https://reviews.llvm.org/D133637
https://reviews.llvm.org/D133637 fixes the problem where we should hash raw content of
register mask instead of the pointer to it.

Fix the same issue in `llvm::hash_value()`.

Remove the added API `MachineOperand::getRegMaskSize()` to avoid potential confusion.

Add an assert to emphasize that we probably should hash a machine operand iff it has
associated machine function, but keep the fallback logic in the original change.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D133747
2022-09-13 16:12:41 -07:00
Chris Bieneman 4b96f8996a [DX] DXContainer does not support COMDAT
The DXContainer is pretty primitive, but doesn't support COMDAT. We need
to set that in the Triple so that Clang won't try to emit COMDATs.
2022-09-13 13:59:47 -05:00
theidexisted 0a1c8522f3 [NFC][ADT] Fix assert message
Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D129632
2022-09-13 18:55:57 +00:00
Hendrik Greving 393a17b5d1 [ValueTypes] Define MVTs for v256i2/v128i4.
Adds MVT::v256i2, MVT::v128i4.

Differential Revision: https://reviews.llvm.org/D133603
2022-09-13 09:02:23 -07:00
Pavel Samolysov 02aaf8e3d6 [NFC][ScheduleDAGInstrs] Use structure bindings and emplace_back
Some uses of std::make_pair and the std::pair's first/second members
in the ScheduleDAGInstrs.[cpp|h] files were replaced with using of the
vector's emplace_back along with structure bindings from C++17.
2022-09-13 12:49:04 +03:00
Sylvestre Ledru cd20a18286 Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"
Causing:
https://github.com/llvm/llvm-project/issues/57709

This reverts commit ab56719acd.
2022-09-13 10:53:59 +02:00
Max Kazantsev 86d5586d78 [SCEVExpander] Recompute poison-generating flags on hoisting. PR57187
Instruction being hoisted could have nuw/nsw flags inferred from the old
context, and we cannot simply move it to the new location keeping them
because we are going to introduce new uses to them that didn't exist before.

Example in https://github.com/llvm/llvm-project/issues/57187 shows how
this can produce branch by poison from initially well-defined program.

This patch forcefully recomputes poison-generating flag in the new context.

Differential Revision: https://reviews.llvm.org/D132022
Reviewed By: fhahn, nikic
2022-09-13 12:56:35 +07:00
Aiden Grossman eec183c171 [nfc] Refactor SlotIndex::getInstrDistance to better reflect actual functionality
This patch refactors SlotIndex::getInstrDistance to
SlotIndex::getApproxInstrDistance to better describe the actual
functionality of this function. This patch also adds in some additional
comments better documenting the assumptions that this function makes to
increase clarity.

Based on discussion on the LLVM Discourse:
https://discourse.llvm.org/t/odd-behavior-in-slotindex-getinstrdistance/64934/5

Reviewed By: mtrofin, foad

Differential Revision: https://reviews.llvm.org/D133386
2022-09-12 23:33:35 +00:00
Amara Emerson 25bcc8c797 [GlobalISel][Legalizer] Fix minScalarEltSameAsIf to handle p0 element types.
The mutation the action generates tries to change the input type into the
element type of larger vector type. This doesn't work if the larger element
type is a vector of pointers since it creates an illegal mutation between
scalar and pointer types.

Differential Revision: https://reviews.llvm.org/D133671
2022-09-13 00:01:37 +01:00
David Majnemer ab56719acd [clang, llvm] Add __declspec(safebuffers), support it in CodeView
__declspec(safebuffers) is equivalent to
__attribute__((no_stack_protector)).  This information is recorded in
CodeView.

While we are here, add support for strict_gs_check.
2022-09-12 21:15:34 +00:00
Kazu Hirata 9606608474 [llvm] Use x.empty() instead of llvm::empty(x) (NFC)
I'm planning to deprecate and eventually remove llvm::empty.

I thought about replacing llvm::empty(x) with std::empty(x), but it
turns out that all uses can be converted to x.empty().  That is, no
use requires the ability of std::empty to accept C arrays and
std::initializer_list.

Differential Revision: https://reviews.llvm.org/D133677
2022-09-12 13:34:35 -07:00
YongKang Zhu 481a32f587 Bug fix on stable hash calculation for machine operands RegisterMask and RegisterLiveOut
MachineOperand::getRegMask() returns a pointer to register mask.  We should hash the raw content of register mask instead of its pointer.

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D133637
2022-09-12 13:25:04 -07:00
Fangrui Song e6aebff674 [ELF] Parallelize relocation scanning
* Change `Symbol::flags` to a `std::atomic<uint16_t>`
* Add `llvm::parallel::threadIndex` as a thread-local non-negative integer
* Add `relocsVec` to part.relaDyn and part.relrDyn so that relative relocations can be added without a mutex
* Arbitrarily change -z nocombreloc to move relative relocations to the end. Disable parallelism for deterministic output.

MIPS and PPC64 use global states for relocation scanning. Keep serial scanning.

Speed-up with mimalloc and --threads=8 on an Intel Skylake machine:

* clang (Release): 1.27x as fast
* clang (Debug): 1.06x as fast
* chrome (default): 1.05x as fast
* scylladb (default): 1.04x as fast

Speed-up with glibc malloc and --threads=16 on a ThunderX2 (AArch64):

* clang (Release): 1.31x as fast
* scylladb (default): 1.06x as fast

Reviewed By: andrewng

Differential Revision: https://reviews.llvm.org/D133003
2022-09-12 12:56:35 -07:00
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Jay Foad 210e6a993d [GlobalISel] Simplify extended add/sub to add/sub with carry
Simplify extended add/sub (with carry-in and carry-out) to add/sub with
carry (with carry-out only) if carry-in is known to be zero.

Differential Revision: https://reviews.llvm.org/D133702
2022-09-12 17:05:44 +01:00
Alexey Bataev dfe1e9dd79 [SLP]Improve reordering of clustered reused scalars.
If the reused scalars are clustered, i.e. each part of the reused mask
contains all elements of the original scalars exactly once, we can
reorder those clusters to improve the whole ordering of of the clustered
vectors.

Differential Revision: https://reviews.llvm.org/D133524
2022-09-12 06:52:25 -07:00
Matt Arsenault 7834194837 TableGen: Introduce generated getSubRegisterClass function
Currently there isn't a generic way to get a smaller register class
that can be produced from a subregister of a larger class. Replaces a
manually implemented version for AMDGPU. This will be used to improve
subregister support in the allocator.
2022-09-12 09:03:37 -04:00
Matt Arsenault bb70b5d406 CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer
Previously this was assuming piontsToConstantMemory implies
dereferenceable.
2022-09-12 08:38:35 -04:00
David Spickett 739b69e655 [LLVM][AArch64] Explain that X19 is used as the frame base pointer register
Fixes #50098

LLVM uses X19 as the frame base pointer, if it needs to. Meaning you
can get warnings if you clobber that with inline asm.

However, it doesn't explain why. The frame base register is not part
of the ABI so it's pretty confusing why you get that warning out of the blue.

This adds a method to explain a reserved register with X19 as the first one.
The logic is the same as getReservedRegs.

I could have added a return parameter to isASMClobberable and friends
but found that there's a lot of things that call isReservedReg in various
ways.

So while one more method on the pile isn't great design, it is simpler
right now to do it this way and only pay the cost if you are actually using
a reserved register.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D133213
2022-09-12 09:18:09 +00:00
Johannes Doerfert c922cac868 Revert "[Attributor] AAPointerInfo should allow "harmless" uses"
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"

This reverts commit 844f6c5d03 and
4ed0a88cd8 as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
2022-09-11 21:37:54 -07:00
Johannes Doerfert 4ed0a88cd8 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-09-11 20:16:11 -07:00
Johannes Doerfert 21711039e3 [OpenMP] Allow the Attributor to look at functions we also internalized
This is important as we have accesses to globals in those which we need to
categorize.
2022-09-11 20:16:11 -07:00
Kazu Hirata a21c8be1cc [llvm] Use std::aligned_storage_t (NFC) 2022-09-11 16:11:39 -07:00
Junduo Dong 6975ab7126 [Clang] Reimplement time tracing of NewPassManager by PassInstrumentation framework
The previous implementation of time tracing in NewPassManager is direct but messive.

The key codes are like the demo below:
```
  /// Runs the function pass across every function in the module.
  PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
                        LazyCallGraph &CG, CGSCCUpdateResult &UR) {
      /// ...
      PreservedAnalyses PassPA;
      {
        TimeTraceScope TimeScope(Pass.name());
        PassPA = Pass.run(F, FAM);
      }
      /// ...
 }
```

It can be bothered to judge where should we add the tracing codes by hands.

With the PassInstrumentation framework, we can easily add `Before/After` callback
functions to add time tracing codes.

Differential Revision: https://reviews.llvm.org/D131960
2022-09-11 05:42:55 -07:00
sunho d1c4d96126 [ORC][ORC_RT][COFF] Remove public bootstrap method.
Removes public bootstrap method that is not really necessary and not consistent with other platform API.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D132780
2022-09-10 15:25:50 +09:00
sunho 73c4033987 [ORC][ORC_RT][COFF] Support dynamic VC runtime.
Supports dynamic VC runtime. It implements atexits handling which is required to load msvcrt.lib successfully. (the object file containing atexit symbol somehow resolves to static vc runtim symbols) It also default to dynamic vc runtime which tends to be more robust.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D132525
2022-09-10 15:25:49 +09:00
Joe Loser 62b8a61d6c [llvm] Remove includes of `llvm/Support/STLArrayExtras.h`
`llvm` and downstream internal callers no longer use `array_lengthof`, so drop
the include everywhere.

Differential Revision: https://reviews.llvm.org/D133600
2022-09-09 17:44:00 -06:00
Fangrui Song 058f17d3af [ADT] Move LLVM_DEPRECATED before type after D133502
`[[deprecated(...)]]` cannot appear between `inline size_t`.
2022-09-09 15:56:58 -07:00
Joe Loser 5758c824da [ADT] Mark `llvm::array_lengthof` as deprecated
As a follow-up of 5e96cea1db, mark
`llvm::array_lengthof` as deprecated in favor of using `std::size` function
directly.

Differential Revision: https://reviews.llvm.org/D133502
2022-09-09 15:31:00 -06:00
Augie Fackler 4fea8ee540 OpenMP: mark allocptr attribute on __kmpc_free_shared
Differential Revision: https://reviews.llvm.org/D124491
2022-09-09 14:09:18 -04:00
Nikita Popov a9f312c7f4 [AST] Use BatchAA in aliasesUnknownInst() (NFCI) 2022-09-09 15:54:48 +02:00
Sebastian Neubauer c7750c522e Add helper func to get first non-alloca position
The LLVM performance tips suggest that allocas should be placed at the
beginning of the entry block. So far, llvm doesn’t provide any helper to
find that position.

Add BasicBlock::getFirstNonPHIOrDbgOrAlloca and IRBuilder::SetInsertPointPastAllocas(Function*)
that get an insert position after the (static) allocas at the start of a
function and use it in ShadowStackGCLowering.

Differential Revision: https://reviews.llvm.org/D132554
2022-09-09 15:39:53 +02:00
Namhyung Kim 43efb5e445 [llvm-objdump] Create name for fake sections
It doesn't have a section header string table so add a vector to have
the strings and create name based on the program header type and the
index.

Differential Revision: https://reviews.llvm.org/D131290
2022-09-09 12:27:07 +01:00
Serge Pavlov 7b9fae05b4 [Clang] Use virtual FS in processing config files
Clang has support of virtual file system for the purpose of testing, but
treatment of config files did not use it. This change enables VFS in it
as well.

Differential Revision: https://reviews.llvm.org/D132867
2022-09-09 18:24:45 +07:00
Serge Pavlov 55e1441f7b Revert "[Clang] Use virtual FS in processing config files"
This reverts commit 9424497e43.
Some buildbots failed, reverted for investigation.
2022-09-09 16:43:15 +07:00
Serge Pavlov 9424497e43 [Clang] Use virtual FS in processing config files
Clang has support of virtual file system for the purpose of testing, but
treatment of config files did not use it. This change enables VFS in it
as well.

Differential Revision: https://reviews.llvm.org/D132867
2022-09-09 16:28:51 +07:00
Fangrui Song 781dea021a [Support] Rename DebugCompressionType::Z to Zlib
"Z" was so named when we had both gABI ELFCOMPRESS_ZLIB and the legacy .zdebug support.
Now we have just one zlib format, we should use the more descriptive name.
2022-09-08 16:11:29 -07:00
raghavmedicherla 5d3cf8267f Revert "Support: Add mapped_file_region::sync(), equivalent to msync"
This reverts commit 142f51fc2f.

This shouldn't be committed, it got committed accidentally.
2022-09-08 12:49:52 -04:00
Thomas Lively ac3b8df8f2 [WebAssembly] Prototype `f32x4.relaxed_dot_bf16x8_add_f32`
As proposed in https://github.com/WebAssembly/relaxed-simd/issues/77. Only an
LLVM intrinsic and a clang builtin are implemented. Since there is no bfloat16
type, use u16 to represent the bfloats in the builtin function arguments.

Differential Revision: https://reviews.llvm.org/D133428
2022-09-08 08:07:49 -07:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Eric Wang d8a2d3f7d4 [NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D132835
2022-09-08 07:50:03 -07:00
David Spickett e428baf001 [LLVM][ARM] Remove options for armv2, 2A, 3 and 3M
Fixes #57486

These pre v4 architectures are not specifically supported
by codegen. As demonstrated in the linked issue.

GCC has not supported 3M since GCC 9 and presumably
2 and 2A earlier than that. So we are aligned in that sense.

(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2abd6e34fcf3bd9f9ffafcaa47cdc3ed443f9add)

This removes the options and associated testing.

The Pre_v4 build attribute remains mainly because its absence
would be more confusing. It will not be used other than to
complete the list of build attributes as shown in the ABI.

https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#3352the-target-related-attributes

Reviewed By: nickdesaulniers, peter.smith, rengolin

Differential Revision: https://reviews.llvm.org/D133109
2022-09-08 09:49:48 +00:00
Nikita Popov 96cb7c2273 [ConstantExpr] Remove fneg expression
As part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179,
this removes the fneg constant expression (which is, incidentally,
the only unary operator expression).

Differential Revision: https://reviews.llvm.org/D133418
2022-09-08 10:24:55 +02:00
Fangrui Song b6e1fd761d [llvm-objcopy] Support --{,de}compress-debug-sections for zstd
Also, add ELFCOMPRESS_ZSTD (2) from the approved generic-abi proposal:
https://groups.google.com/g/generic-abi/c/satyPkuMisk
("Add new ch_type value: ELFCOMPRESS_ZSTD")

Link: https://discourse.llvm.org/t/rfc-zstandard-as-a-second-compression-method-to-llvm/63399
("[RFC] Zstandard as a second compression method to LLVM")

Reviewed By: jhenderson, dblaikie

Differential Revision: https://reviews.llvm.org/D130458
2022-09-08 00:59:14 -07:00
Fangrui Song a41977dd0f [Support] Add llvm::compression::{getReasonIfUnsupported,compress,decompress}
as high-level API on top of `llvm::compression::{zlib,zstd}::*`:

* getReasonIfUnsupported: return nullptr if the specified format is
  supported, or (if unsupported) a string like `LLVM was not built with LLVM_ENABLE_ZLIB ...`
* compress: dispatch to zlib::uncompress or zstd::uncompress
* decompress: dispatch to zlib::uncompress or zstd::uncompress

Move `llvm::DebugCompressionType` from MC to Support to avoid Support->MC cyclic
dependency. There are 40+ uses in llvm-project.

Add another enum class `llvm::compression::Format` to represent supported
compression formats, which may be a superset of ELF compression formats.

See D130458 (llvm-objcopy --{,de}compress-debug-sections for zstd) for a use
case.

Link: https://discourse.llvm.org/t/rfc-zstandard-as-a-second-compression-method-to-llvm/63399
("[RFC] Zstandard as a second compression method to LLVM")

---

Note: this patch alone will cause -Wswitch to llvm/lib/ObjCopy/ELF/ELFObject.cpp

Reviewed By: ckissane, dblaikie

Differential Revision: https://reviews.llvm.org/D130506
2022-09-08 00:58:55 -07:00
Nikita Popov 0444b40ed3 Revert "[Support] Add llvm::compression::{getReasonIfUnsupported,compress,decompress}"
This reverts commit 19dc3cff0f.
This reverts commit 5b19a1f8e8.
This reverts commit 9397648ac8.
This reverts commit 10842b4475.

Breaks the GCC build, as reported here:
https://reviews.llvm.org/D130506#3776415
2022-09-08 09:33:12 +02:00
Fangrui Song 10842b4475 [Support] Work around GCC's enum support 2022-09-08 00:13:25 -07:00
Fangrui Song 5b19a1f8e8 [llvm-objcopy] Support --{,de}compress-debug-sections for zstd
Also, add ELFCOMPRESS_ZSTD (2) from the approved generic-abi proposal:
https://groups.google.com/g/generic-abi/c/satyPkuMisk
("Add new ch_type value: ELFCOMPRESS_ZSTD")

Link: https://discourse.llvm.org/t/rfc-zstandard-as-a-second-compression-method-to-llvm/63399
("[RFC] Zstandard as a second compression method to LLVM")

Reviewed By: jhenderson, dblaikie

Differential Revision: https://reviews.llvm.org/D130458
2022-09-07 23:53:40 -07:00
Fangrui Song 19dc3cff0f [Support] Add llvm::compression::{getReasonIfUnsupported,compress,decompress}
as high-level API on top of `llvm::compression::{zlib,zstd}::*`:

* getReasonIfUnsupported: return nullptr if the specified format is
  supported, or (if unsupported) a string like `LLVM was not built with LLVM_ENABLE_ZLIB ...`
* compress: dispatch to zlib::uncompress or zstd::uncompress
* decompress: dispatch to zlib::uncompress or zstd::uncompress

Move `llvm::DebugCompressionType` from MC to Support to avoid Support->MC cyclic
dependency. There are 40+ uses in llvm-project.

Add another enum class `llvm::compression::Format` to represent supported
compression formats, which may be a superset of ELF compression formats.

See D130458 (llvm-objcopy --{,de}compress-debug-sections for zstd) for a use
case.

Link: https://discourse.llvm.org/t/rfc-zstandard-as-a-second-compression-method-to-llvm/63399
("[RFC] Zstandard as a second compression method to LLVM")

Differential Revision: https://reviews.llvm.org/D130506
2022-09-07 23:53:14 -07:00
gonglingqin d5f7a2182d [LoongArch] Add codegen support for atomicrmw xchg operation on LA32
Depends on D131228

Differential Revision: https://reviews.llvm.org/D131229
2022-09-08 13:57:53 +08:00
gonglingqin b60f801607 [LoongArch] Add codegen support for atomicrmw xchg operation on LA64
In order to avoid the patch being too large, the atomicrmw xchg operation
on LA32 will be added later

Differential Revision: https://reviews.llvm.org/D131228
2022-09-08 13:57:26 +08:00
Fangrui Song f48931f3a8 [NewPM] Switch -filter-passes from ClassName to pass-name
NewPM -filter-passes (D86360) uses ClassName instead of pass-name as used in
`-passes`, `-print-after`, etc. D87216 has added a mechanism to map
ClassName to pass-name. Adopt it for -filter-passes.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D133263
2022-09-07 22:02:26 -07:00
Marco Elver 97c2220565 [SanitizerBinaryMetadata] Introduce SanitizerBinaryMetadata instrumentation pass
Introduces the SanitizerBinaryMetadata instrumentation pass which uses
the new MD_pcsections metadata kinds to instrument certain types of
instructions and functions required for breakpoint-based sanitizers.

The first intended user of the binary metadata emitted will be a variant
of GWP-TSan [1]. GWP-TSan will require information about atomic
accesses; to unambiguously determine if an access is atomic or not, we
also require "covered" information which code has been compiled with
SanitizerBinaryMetadata instrumentation enabled.

[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D130887
2022-09-07 21:25:40 +02:00
Andrea Di Biagio 3262794804 [MCA] Correctly check pipeline availability for partially overlapping resource groups.
This patch mostly reverts commit 70b37f4c03 which fixed PR50725.

In case of explicit consumption of multiple partially overlapping group
resources, the ResourceManager was not correctly checking pipeline
esources availability.

The fix for PR50725 only partially addressed a few instances of that issue.
This is a more general (although, technically slower) fix for that same issue.

It also fixes Issue #57548

Thanks to Haohai Wen for the small reproducible.
2022-09-07 12:17:59 +01:00
Marco Elver 343700358f [AsmPrinter] Emit PCs into requested PCSections
Interpret MD_pcsections in AsmPrinter emitting the requested metadata to
the associated sections. Functions and normal instructions are handled.

Differential Revision: https://reviews.llvm.org/D130879
2022-09-07 11:36:02 +02:00
Marco Elver 31a548021b [GlobalISel] Propagate PCSections metadata to MachineInstr
Propagate (most) PC sections metadata to MachineInstr when GlobalISel is
doing instruction selection.

This change results in support for architectures using GlobalISel (such
as -O0 with AArch64). Not all instructions may be supported yet, and
requires further target-specific handling (such as done for AArch64
pseudo-atomics). Expanding supported instructions is planned on a
case-by-case basis and new use cases for PC sections metadata.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130886
2022-09-07 11:36:02 +02:00
Marco Elver 0ba8886af5 [FastISel] Propagate PCSections metadata to MachineInstr
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130884
2022-09-07 11:36:01 +02:00
Nikita Popov 98a3a340c3 [ConstantExpr] Don't create fneg expressions
Don't create fneg expressions unless explicitly requested by IR or
bitcode.
2022-09-07 11:27:25 +02:00
Marco Elver da695de628 [MachineInstrBuilder] Introduce MIMetadata to simplify metadata propagation
In many places DebugLoc and PCSections metadata are just copied along to
propagate them through MachineInstrs. Simplify doing so by bundling them
up in a MIMetadata class that replaces the DebugLoc argument to most
BuildMI() variants.

The DebugLoc-only constructors allow implicit construction, so that
existing usage of `BuildMI(.., DL, ..)` works as before, and the rest of
the codebase using BuildMI() does not require changes.

NFC.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130883
2022-09-07 11:22:50 +02:00
Marco Elver 4c58b00801 [SelectionDAG] Propagate PCSections through SDNodes
Add a new entry to SDNodeExtraInfo to propagate PCSections through
SelectionDAG.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130882
2022-09-07 11:22:50 +02:00
Jay Foad 1427d55d70 [TableGen] Document sequence with stride
Document (in comments) the optional fourth "stride" argument to the
sequence operator, which was added in svn r157416.

Differential Revision: https://reviews.llvm.org/D133297
2022-09-07 09:58:22 +01:00
Vitaly Buka 4c18670776 [NFC][sancov] Rename ModuleSanitizerCoveragePass 2022-09-06 20:55:39 -07:00
Vitaly Buka 5e38b2a456 [NFC][msan] Rename ModuleMemorySanitizerPass 2022-09-06 20:30:35 -07:00
Vitaly Buka 93600eb50c [NFC][asan] Rename ModuleAddressSanitizerPass 2022-09-06 15:02:11 -07:00
Vitaly Buka e7bac3b9fa [msan] Convert Msan to ModulePass
MemorySanitizerPass function pass violatied requirement 4 of function
pass to do not insert globals. Msan nees to insert globals for origin
tracking, and paramereters tracking.

https://llvm.org/docs/WritingAnLLVMPass.html#the-functionpass-class

Reviewed By: kstoimenov, fmayer

Differential Revision: https://reviews.llvm.org/D133336
2022-09-06 15:01:04 -07:00
Arthur Eubanks 7f57c97d30 [ThinLTOBitcodeWriter] Mark pass as required
Or else with -opt-bisect-limit we don't write ThinLTO bitcode.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D133378
2022-09-06 14:47:34 -07:00
bzcheeseman 716b9f7a1a [LLVM][Support/ADT] Add assert for isPresent to dyn_cast.
This change adds an assert to dyn_cast that the value passed-in is present. In the past, this relied on the isa_impl assertion (which still works in many cases) but which we can tighten up for a better QoI.

The PointerUnion change is because it seems like (based on the call sites) the semantics of the member dyn_cast are actually dyn_cast_if_present.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D133221
2022-09-06 13:58:56 -07:00
raghavmedicherla 142f51fc2f Support: Add mapped_file_region::sync(), equivalent to msync
Add mapped_file_region::sync(), equivalent to POSIX msync,
synchronizing written content to disk without unmapping the region.
Asserts if the mode is not mapped_file_region::readwrite.

Note that I don't have access to a Windows machine, so I can't
easily run those unit tests.

Change by dexonsmith

Differential Revision: https://reviews.llvm.org/D95494
2022-09-06 16:46:37 -04:00
Markus Böck f049b2c3fc [MC] Emit Stackmaps before debug info
This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment.

The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections.

This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op.

Differential Revision: https://reviews.llvm.org/D132708
2022-09-06 20:20:56 +02:00
Joseph Huber 58645d3252 [OpenMP] Fix `omp_get_wtime` function being marked incorrectly as readonly
OpenMP has a list of of optimistic attributes that can be attached to
known runtime functions to aid some analysis. The `omp_get_wtime`
function incorrectly used the `readonly` attribute. This is not correct
at the `omp_get_wtime` function changes values depending on some
external state. This is more correctly modeled with
`inaccessiblememonly` meaning that the value does not depend on anything
within the module, but can not be removes as it depends on external
state.

Fixes #57578

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D133360
2022-09-06 12:59:00 -05:00
Jakub Kuderski 20573d11b7 [ADT] Remove is_splat
`is_splat` is superseded by `all_equal` and marked as deprecated.
See the discussion thread for more details:
https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D132336
2022-09-06 13:49:26 -04:00
Matthias Gehre 7948d89afe Fix "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64" compilation on Windows 2022-09-06 16:11:14 +01:00
Marco Elver 7d63983c65 [SelectionDAG] Properly copy ExtraInfo on RAUW
During SelectionDAG legalization SDNodes with associated extra info may
be replaced with a new SDNode. Preserve associated extra info on
ReplaceAllUsesWith and remove entries in DeallocateNode.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130881
2022-09-06 16:32:50 +02:00
Marco Elver cc3faf4226 [SelectionDAG] Rename CallSiteDbgInfo to NodeExtraInfo
For information infrequently attached to SDNodes, it is useful to
provide a way to add this information out-of-line. This is already done
for call-site specific information.

Rename CallSiteDbgInfo to NodeExtraInfo in preparation of adding
additional information not necessarily related to call sites only.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130880
2022-09-06 16:32:50 +02:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Joseph Huber 5dbc7cf7ca [Object] Refactor code for extracting offload binaries
We currently extract offload binaries inside of the linker wrapper.
Other tools may wish to do the same extraction operation. This patch
simply factors out this handling into the `OffloadBinary.h` interface.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D132689
2022-09-06 08:55:16 -05:00
Marco Elver 42836e283f [MachineInstr] Allow setting PCSections in ExtraInfo
Provide MachineInstr::setPCSection(), to propagate relevant metadata
through the backend. Use ExtraInfo to store the metadata.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130876
2022-09-06 15:52:44 +02:00
Marco Elver c70f6e1362 [Metadata] Introduce MD_pcsections
Introduces MD_pcsections metadata kind. See added documentation for
more details.

Subsequent patches enable propagating PC sections metadata through code
generation to the AsmPrinter.

RFC: https://discourse.llvm.org/t/rfc-pc-keyed-metadata-at-runtime/64191

Reviewed By: dvyukov, vitalybuka

Differential Revision: https://reviews.llvm.org/D130875
2022-09-06 15:52:44 +02:00
Amara Emerson 3dd861818a [GlobalISel] Combine G_INSERT/EXTRACT_VECTOR_ELT with out of bounds indices to undef.
Differential Revision: https://reviews.llvm.org/D133309
2022-09-06 13:45:04 +01:00
luxufan 2e7aed1947 [MemorySSA][NFC] Simplify if condition
Differential Revision: https://reviews.llvm.org/D133332
2022-09-05 10:43:17 +00:00
Eli Friedman 63335afb4e [ARM64EC 2/?] Add target triple, and allow targeting it.
Part of patchset to add initial support for ARM64EC.

Per discussion on review, using the triple arm64ec-pc-windows-msvc. The
parsing works the same way as Apple's alternate Arm ABI "arm64e".

Differential Revision: https://reviews.llvm.org/D125412
2022-09-05 12:27:10 -07:00
Eli Friedman 488ad99ecf [ARM64EC 1/?] Add parsing support to llvm-objdump/llvm-readobj.
This is the first patch of a patchset to add initial support for
ARM64EC. Basic documentation is available at
https://docs.microsoft.com/en-us/windows/uwp/porting/arm64ec-abi .
(Discourse post:
https://discourse.llvm.org/t/initial-patches-for-arm64ec-windows-11-now-posted/62449
.)

The file format for ARM64EC is basically identical to normal ARM64.
There are a few extra sections, but the existing code for reading ARM64
object files just works.

Differential Revision: https://reviews.llvm.org/D125411
2022-09-05 12:25:08 -07:00
Joseph Huber c1d19a8489 [ELF] Provide the GNU hash function in libObject
GNU uses a different hashing function compared to the sys-V standard
function already provided in libObject. This is already used internally
in LLD for generating synthetic sections. This patch simply extracts
this definition and makes it availible to other users of `libObject`.
This is done in preparation for supporting symbol name lookups via the
GNU hash table.

Reviewed By: MaskRay, jhenderson

Differential Revision: https://reviews.llvm.org/D132696
2022-09-05 11:04:57 -05:00
Kazu Hirata 2bb43d72d9 [ADT] Use std::tuple_element_t (NFC) 2022-09-03 23:27:24 -07:00
Kazu Hirata 03c3c2db10 [llvm] Use std::remove_reference_t (NFC) 2022-09-03 23:27:22 -07:00