llvm-project

Commit Graph

Author	SHA1	Message	Date
gbreynoo	e60f147324	[llvm-dwarfdump][test] Add missing dedicated tests for some options This change adds tests specifically for --parent-recurse-depth, --quiet and -o. The test for -o found a typo in an error message which is also fixed in this change. Differential Revision: https://reviews.llvm.org/D103250	2021-06-01 14:57:00 +01:00
Nico Weber	65527a8082	[dsymutil tests] Try to make eh_frames.test run on other platforms We now have llvm-otool :)	2021-05-28 15:49:31 -04:00
Simon Giesecke	5f2d4b23b4	Add --quiet option to llvm-gsymutil to suppress output of warnings. Differential Revision: https://reviews.llvm.org/D102829	2021-05-27 12:36:34 +00:00
Esme-Yi	d82f2a123f	[llvm-objdump] Print the DEBUG type under `--section-headers`. Summary: Under the option --section-headers, we can only print the section types of TEXT, DATA, and BSS for now. This patch adds the DEBUG type. Reviewed By: jhenderson, Higuoxing Differential Revision: https://reviews.llvm.org/D102603	2021-05-27 04:53:14 +00:00
Fangrui Song	73a1179535	[llvm-mc] Add -M to replace -riscv-no-aliases and -riscv-arch-reg-names In objdump, many targets support `-M no-aliases`. Instead of having a `-*-no-aliases` for each target when LLVM adds the support, it makes more sense to introduce objdump style `-M`. -riscv-arch-reg-names is removed. -riscv-no-aliases has too many uses and thus is retained for now. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D103004	2021-05-26 10:43:32 -07:00
Andrea Di Biagio	5f500d73cd	[MCA] Add a test for PR50483. NFC	2021-05-26 15:52:11 +01:00
Andrea Di Biagio	63cc9fd579	[MCA][InOrderIssueStage] Fix LastWriteBackCycle computation. Conservatively use the instruction latency to compute the last write-back cycle. Before this patch, the last write cycle computation was incorrect for store instructions that didn't declare any register writes.	2021-05-26 14:17:43 +01:00
Simon Pilgrim	21aec4fdc5	[X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs Match what's documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate. Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.	2021-05-26 11:14:21 +01:00
Simon Pilgrim	66978466ba	[X86][Atom] Fix vector variable shift resource/throughputs Match what's documented in the Intel AOM - the non-immediate variants of the PSLL/PSRA/PSRL* shift instructions require BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-26 10:30:59 +01:00
Wenlei He	fa14fd30ce	[CSSPGO][llvm-profgen] Change default cold threshold for context merging llvm-profgen uses a profile-summary-based cold threshold to merge and trim cold context profiles. This is to strike a good balance between profile size and performance. We've been using 99.9% as the cutoff to save profile size without affecting performance. This change switches the default cold threshold cutoff for llvm-profgen to 99.9% instead of 99.9999%. The redundant switch csprof-cold-thres is also removed and tests are cleaned up. Differential Revision: https://reviews.llvm.org/D103071	2021-05-25 10:41:10 -07:00
Langston Barrett	472c009139	[llvm-reduce] Exit when input module is malformed The parseInputFile function returns an empty unique_ptr to signal an error, like when the input file doesn't exist, or is malformed. In this case, the tool should exit immediately rather than segfault by dereferencing the unique_ptr later. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102891	2021-05-25 10:01:12 -07:00
Jinsong Ji	882e4cbd74	[AIX][AsmPrinter] Print Symbol in comments for TOC load We are using TOCEntry symbols like `LC..0` in TOC loads. This is hard to read, requiring at least an additional step to figure out the loaded symbols. We should print out the name in comments. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D102949	2021-05-25 16:37:40 +00:00
Simon Pilgrim	57250f2f3c	[X86][Atom] Fix vector PSHUFB resource/throughputs Match what's documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-25 17:31:45 +01:00
Hongtao Yu	3b51b51877	[CSSPGO][llvm-profgen] Report samples for untrackable frames. Fixing an issue where samples collected for an untrackable frame are not reported. An untrackable frame refers to a frame whose caller is untrackable due to missing debug info or pseudo probe. Though the frame is connected to its parent frame through the frame pointer chain at runtime, the compiler cannot build the connection without debug info or pseudo probe. In such a case we just need to report the untrackable frame as the base frame and all of its child frames. With more samples reported I'm seeing this improves the performance of an internal benchmark by 2.5%. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D102961	2021-05-24 12:39:12 -07:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit `bda6e5bee0`. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since `d6de1e1a71`, no attribute is equivalent to setting the attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
Fangrui Song	5d9ea36baf	[UpdateTestChecks] Default --x86_scrub_rip to False True is a bad default: the useful symbol names and `@GOTPCREL` are scrubbed. Change the default and add global variable tests to x86-basic.ll (renamed from x86_function_name.ll since we now also test variables). I updated some tests to show the differences. Updated LCPI regex to include Darwin style `LCPI_[0-9]+_[0-9]+` (no leading dot). Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D102588	2021-05-21 19:26:15 -07:00
Djordje Todorovic	b9076d119a	Recommit: "[Debugify][Original DI] Test dbg var loc preservation" [Debugify][Original DI] Test dbg var loc preservation This is an improvement of [0]. This adds checking of original llvm.dbg.values()/declares() instructions in optimizations. We have picked a real issue that has been found with this (actually, picked one variable location missing from [1] and resolved the issue), and the result is the fix for that -- D100844. Before applying the D100844, using the options from [0] (but with this patch applied) on the compilation of GDB 7.11, the final HTML report for the debug-info issues can be found at [1] (please scroll down, and look for "Summary of Variable Location Bugs"). After applying the D100844, the numbers have improved a bit -- please take a look into [2]. [0] https://llvm.org/docs/HowToUpdateDebugInfo.html#test-original-debug-info-preservation-in-optimizations [1] https://djolertrk.github.io/di-check-before-adce-fix/ [2] https://djolertrk.github.io/di-check-after-adce-fix/ Differential Revision: https://reviews.llvm.org/D100845 The unit test was failing because the pass from the test that modifies the IR didn't return 'true' in its runOnFunction(), so the expensive-check configuration triggered an assertion.	2021-05-21 02:04:29 -07:00
Simon Pilgrim	a26288e803	[X86][Atom] Fix vector fadd/fcmp/fmul resource/throughputs Match what's documented in the Intel AOM - fadd/fcmp use Port1 and fmul uses Port0, but in many cases BOTH ports are required - this was being incorrectly modelled as EITHER port. Discovered while investigating the correct fptoui costs to fix the regressions in D101555. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-20 18:56:58 +01:00
Alex Orlov	752385b128	Add support for DWARF embedded source to llvm-symbolizer. This patch adds DWARF embedded source printout to llvm-symbolizer. Reviewed By: jhenderson, dblaikie Differential Revision: https://reviews.llvm.org/D102355	2021-05-20 21:40:28 +04:00
Djordje Todorovic	0ae3c1d4d7	Revert "[Debugify][Original DI] Test dbg var loc preservation" This reverts commit `76f375f3d9`. This will be pushed again, after investigating a test failure: https://lab.llvm.org/buildbot/#/builders/16/builds/11254	2021-05-20 07:11:35 -07:00
Djordje Todorovic	76f375f3d9	[Debugify][Original DI] Test dbg var loc preservation This is an improvement of [0]. This adds checking of original llvm.dbg.values()/declares() instructions in optimizations. We have picked a real issue that has been found with this (actually, picked one variable location missing from [1] and resolved the issue), and the result is the fix for that -- D100844. Before applying the D100844, using the options from [0] (but with this patch applied) on the compilation of GDB 7.11, the final HTML report for the debug-info issues can be found at [1] (please scroll down, and look for "Summary of Variable Location Bugs"). After applying the D100844, the numbers have improved a bit -- please take a look into [2]. [0] https://llvm.org/docs/HowToUpdateDebugInfo.html\ [1] https://djolertrk.github.io/di-check-before-adce-fix/ [2] https://djolertrk.github.io/di-check-after-adce-fix/ Differential Revision: https://reviews.llvm.org/D100845	2021-05-20 06:42:02 -07:00
Sergey Dmitriev	1fb5278882	[llvm-strip] Add support for '--' for delimiting options from input files This allows llvm-strip to be used with file names that begin with dashes. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D102825	2021-05-20 03:33:51 -07:00
Simon Giesecke	0ddc75fd08	Add option to llvm-gsymutil to read addresses from stdin. Differential Revision: https://reviews.llvm.org/D102224	2021-05-20 06:10:35 +00:00
Sergey Dmitriev	0b12963b74	[llvm-objcopy] Update LIT test to resolve bot failure [NFC] Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D102823	2021-05-19 19:56:35 -07:00
Andrea Di Biagio	9acabe8b6f	[MCA] Unbreak the buildbots by passing flag -mcpu=generic to the new test added by commit `e5d59db469`. This should unbreak buildbot clang-ppc64le-linux-lnt.	2021-05-19 19:12:33 +01:00
Patrick Holland	e5d59db469	[MCA] llvm-mca MCTargetStreamer segfault fix In order to create the code regions for llvm-mca to analyze, llvm-mca creates an AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions(). Within this function, both an MCAsmParser and MCTargetAsmParser are created so that MCAsmParser::Run() can be used to create the code regions for us. These parser classes were created for llvm-mc so they are designed to emit code with an MCStreamer and MCTargetStreamer that are expected to be setup and passed into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any code, an MCStreamerWrapper class gets created instead and passed into the MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides many of the emit methods to just do nothing. The exception is the emitInstruction() method which calls Regions.addInstruction(Inst). This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build our code regions, however there are a few directives which rely on the MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because llvm-mca doesn’t setup an MCTargetStreamer, when the parser encounters one of those directives, a segfault will occur. In x86, each one of these 7 directives will cause this segfault if they exist in the input assembly to llvm-mca: .cv_fpo_proc .cv_fpo_setframe .cv_fpo_pushreg .cv_fpo_stackalloc .cv_fpo_stackalign .cv_fpo_endprologue .cv_fpo_endproc I haven’t looked at other targets, but I wouldn’t be surprised if some of the other ones also have certain directives which could result in this same segfault. My proposed solution is to simply initialize an MCTargetStreamer after we initialize the MCStreamerWrapper. 
The MCTargetStreamer requires an ostream object, but we don’t actually want any of these directives to be emitted anywhere, so I use an ostream created with the nulls() function. Since this needs to happen after the MCStreamerWrapper has been initialized, it needs to happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize within the main() function of llvm-mca. So this MCInstPrinter gets constructed within main() then passed into the parseCodeRegions() function as a parameter. (If you feel like it would be appropriate and possible to create the MCInstPrinter within the parseCodeRegions() function, then feel free to modify my solution. That would stop us from having to pass it into the function and would limit its scope / lifetime.) My solution stops the segfault from happening and still passes all of the current (expected) llvm-mca tests. I also added a new test for x86 that checks for this segfault on an input that includes one of the .cv_fpo directives (this test fails without my solution, but passes with it). As far as I can tell, all of the functions that I modified are only called from within llvm-mca so there shouldn’t be any worries about breaking other tools. Differential Revision: https://reviews.llvm.org/D102709	2021-05-19 18:36:10 +01:00
Simon Pilgrim	b14f9a1ebd	[X86][Atom] Fix vector integer shift by immediate resource/throughputs Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-19 14:39:40 +01:00
Sergey Dmitriev	f24f140290	[llvm-objcopy] Add support for '--' for delimiting options from input/output files This allows llvm-objcopy to be used with file names that begin with dashes. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D102665	2021-05-19 01:56:46 -07:00
Alex Orlov	4fedb3a613	[symbolizer] Added StartAddress for the resolved function. In many cases it is helpful to know at what address the resolved function starts. This patch adds a new StartAddress member to the DILineInfo structure. Reviewed By: jhenderson, dblaikie Differential Revision: https://reviews.llvm.org/D102316	2021-05-19 02:38:13 +04:00
Simon Pilgrim	f9b1208681	[X86][Atom] Fix vector integer multiplication resource/throughputs Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits. Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.	2021-05-15 14:25:48 +01:00
Roman Lebedev	990e806b36	[NFC][X86][MCA] Add pseudo-zero-idiom vperm2f128/vperm2i128 tests - don't break deps While the btver2 model states that this pattern is a zero-cycle zero-idiom on Jaguar, it does not appear to be the case on Znver3: here it measures as not being recognized as a dep-breaking zero-idiom, let alone a zero-cycle one.	2021-05-14 20:23:05 +03:00
Roman Lebedev	1fc1c88704	[X86] AMD Zen 3: same-reg AVX YMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:05 +03:00
Roman Lebedev	2f8572d8e2	[X86] AMD Zen 3: same-reg AVX XMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:04 +03:00
Roman Lebedev	f8f7c765a0	[X86] AMD Zen 3: same-reg SSE XMM PCMPGT{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom As measured by exegesis, and confirmed by ref docs.	2021-05-14 20:23:04 +03:00
Roman Lebedev	d2fb4bfba8	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPCMPGT{B,W,D,Q} tests	2021-05-14 20:23:04 +03:00
Roman Lebedev	094b493a3a	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPCMPGT{B,W,D,Q} tests	2021-05-14 20:23:04 +03:00
Roman Lebedev	1c0ac0b0f2	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PCMPGT{B,W,D,Q} tests	2021-05-14 20:23:03 +03:00
Roman Lebedev	26eeb6e650	[X86] AMD Zen 3: same-reg AVX YMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:03 +03:00
Roman Lebedev	41a5dcdf87	[X86] AMD Zen 3: same-reg AVX XMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:03 +03:00
Roman Lebedev	6733fe5c0d	[X86] AMD Zen 3: same-reg SSE XMM PSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such.	2021-05-14 20:23:03 +03:00
Roman Lebedev	9e9c80c250	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBUS{B,W} tests	2021-05-14 20:23:03 +03:00
Roman Lebedev	b6a0449b34	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBUS{B,W} tests	2021-05-14 20:23:02 +03:00
Roman Lebedev	128d9c6bbd	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBUS{B,W} tests	2021-05-14 20:23:02 +03:00
Roman Lebedev	555e1d2987	[X86] AMD Zen 3: same-reg AVX YMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:02 +03:00
Roman Lebedev	012417c980	[X86] AMD Zen 3: same-reg AVX XMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such. Yes, this one is also not zero-cycle.	2021-05-14 20:23:02 +03:00
Roman Lebedev	29c4f892fe	[X86] AMD Zen 3: same-reg SSE XMM PSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom Not really mentioned in ref docs, but measures as such.	2021-05-14 20:23:02 +03:00
Roman Lebedev	0e20d1f0ef	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	14e48cf8ee	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	4673af527e	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBS{B,W} tests	2021-05-14 20:23:01 +03:00
Roman Lebedev	93f2642871	[X86] AMD Zen 3: same-reg AVX YMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:01 +03:00
Roman Lebedev	7a45b96e04	[X86] AMD Zen 3: same-reg AVX XMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:01 +03:00
Roman Lebedev	1ea8be214f	[X86] AMD Zen 3: same-reg SSE XMM PSUB{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	bbd2117c34	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	d08909d1cb	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	a6f5351443	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUB{B,W,D,Q} tests	2021-05-14 20:23:00 +03:00
Roman Lebedev	ce22f53916	[X86] AMD Zen 3: same-reg AVX YMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	44c2b4fe91	[X86] AMD Zen 3: same-reg AVX XMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:23:00 +03:00
Roman Lebedev	a72cacb53f	[X86] AMD Zen 3: same-reg SSE XMM PANDN is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:22:59 +03:00
Roman Lebedev	9acc589e5a	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPANDN tests	2021-05-14 20:22:59 +03:00
Roman Lebedev	a3617138c2	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPANDN tests	2021-05-14 20:22:59 +03:00
Roman Lebedev	3f235a0b84	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PANDN tests	2021-05-14 20:22:59 +03:00
Roman Lebedev	1d73c2b8cf	[X86] AMD Zen 3: same-reg AVX YMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:22:59 +03:00
Roman Lebedev	31669b5073	[X86] AMD Zen 3: same-reg AVX XMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 20:22:58 +03:00
Roman Lebedev	498bf365f4	[X86] AMD Zen 3: same-reg SSE XMM PXOR is a 1-cycle(!) dep-breaking zero-idiom As confirmed by the exegesis measurements, and ref docs.	2021-05-14 20:22:58 +03:00
Roman Lebedev	3009f8a383	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	d58d020b6c	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	0f7a595095	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PXOR tests	2021-05-14 20:22:58 +03:00
Roman Lebedev	4af4afe014	[X86] AMD Zen 3: same-reg AVX YMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	17f99a8a41	[X86] AMD Zen 3: same-reg AVX XMM VANDNPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	38ceb46fb0	[X86] AMD Zen 3: same-reg SSE XMM ANDNPD is a 1-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	3221e06e9b	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPD tests	2021-05-14 14:06:24 +03:00
Roman Lebedev	0b7e52e725	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPD tests	2021-05-14 14:06:24 +03:00
Roman Lebedev	055fa84cd8	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPD tests	2021-05-14 14:06:24 +03:00
Roman Lebedev	d8a595b81c	[X86] AMD Zen 3: same-reg AVX YMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:24 +03:00
Roman Lebedev	fd4cbc822b	[X86] AMD Zen 3: same-reg AVX XMM VANDNPS is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 14:06:23 +03:00
Roman Lebedev	f38dcbecb6	[X86] AMD Zen 3: same-reg SSE XMM ANDNPS is a 1-cycle(!) dep-breaking zero-idiom Same as SSE XMM XORPS/XORPD, it is not zero-cycle, even though it breaks the deps. As confirmed by the exegesis measurements, and ref docs.	2021-05-14 14:06:23 +03:00
Roman Lebedev	c79c7bb980	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VANDNPS tests	2021-05-14 14:06:23 +03:00
Roman Lebedev	a57006d627	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VANDNPS tests	2021-05-14 14:06:23 +03:00
Roman Lebedev	a657808948	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM ANDNPS tests	2021-05-14 14:06:23 +03:00
Roman Lebedev	43a7f130a7	[X86] AMD Zen 3: same-reg AVX YMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 11:56:07 +03:00
Roman Lebedev	336b9dbe88	[X86] AMD Zen 3: same-reg AVX XMM VXORPD is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis measurements, and ref docs.	2021-05-14 11:56:07 +03:00
Roman Lebedev	9c596bc541	[X86] AMD Zen 3: same-reg SSE XMM XORPD is a 1-cycle(!) dep-breaking zero-idiom Same as with its float friend, unlike their AVX versions. As confirmed by exegesis, and ref docs.	2021-05-14 11:56:07 +03:00
Roman Lebedev	3567c7eda1	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPD tests	2021-05-14 11:56:07 +03:00
Roman Lebedev	57eee56d0a	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPD tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	fdc65e46b6	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPD tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	59554c01ab	[X86] AMD Zen 3: same-reg AVX YMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom As confirmed by exegesis, and ref docs.	2021-05-14 11:56:06 +03:00
Roman Lebedev	2a7c52ff7f	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VXORPS tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	26c1bffe67	[X86] AMD Zen 3: same-reg AVX XMM VXORPS is a zero-cycle(!) dep-breaking zero-idiom Unlike its legacy SSE XMM XORPS version, which measures as being 1-cycle, this one is certainly a zero-cycle instruction, in addition to both of them being dependency breaking. As confirmed by exegesis measurements, and ref docs.	2021-05-14 11:56:06 +03:00
Roman Lebedev	a9fb321a67	[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VXORPS tests	2021-05-14 11:56:06 +03:00
Roman Lebedev	aa0dcb3ba4	[X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking zero-idiom While both the SOG and Agner insist that it is zero-cycle, I cannot confirm that claim. While it clearly breaks the dependency, I cannot come up with a snippet, or measurement approach, to end up with IPC bigger than 4, which, to me, means that it actually consumes execution resource of an FP unit for a cycle.	2021-05-14 00:03:36 +03:00
Roman Lebedev	6c4596793d	[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM XORPS test	2021-05-14 00:03:36 +03:00
Martin Storsjö	b42fb6811e	[llvm-nm] Support the -V option, print that the tool is compatible with GNU nm This unlocks some codepaths in libtool. Differential Revision: https://reviews.llvm.org/D102321	2021-05-13 22:36:25 +03:00
Aakanksha Patil	464e4dc50f	[AMDGPU] Add gfx1034 target Differential Revision: https://reviews.llvm.org/D102306	2021-05-13 14:25:18 -04:00
Jordan Rupprecht	1336c5ae2f	[llvm-cov][test] Add test coverage for "gcov" implying "llvm-cov gcov" compatibility. Much like other LLVM binary utilities, `llvm-cov` has a symlink compatibility feature where it runs in `gcov` compatibility mode if the binary name ends in `gcov`. This is identical to invoking `llvm-cov gcov ...`. Differential Revision: https://reviews.llvm.org/D102299	2021-05-12 08:21:42 -07:00
Greg McGary	5a43901539	[llvm-objdump] Exclude __mh_*_header symbols during MachO disassembly `__mh_(execute|dylib|dylinker|bundle|preload|object)_header` are special symbols whose values hold the VMA of the Mach header to support introspection. They are attached to the first section in `__TEXT`, even though their addresses are outside `__TEXT`, and they do not refer to code. It is normally harmless, but when the first section of `__TEXT` has no other symbols, `__mh_*_header` is considered by the disassembler when determining function boundaries. Since `__mh_*_header` refers to an address outside `__TEXT`, the boundary determination fails and disassembly quits. Since `__TEXT,__text` normally has symbols, this bug is obscured. Experiments placing `__stubs` and `__stub_helper` first exposed the bug, since neither has symbols. Differential Revision: https://reviews.llvm.org/D101786	2021-05-12 06:39:14 -07:00
Alex Orlov	d8e65585f7	Fixed llvm-objcopy to add correct symbol table for ELF with program headers. This fixes the following bug: https://bugs.llvm.org/show_bug.cgi?id=43935 Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D102258	2021-05-12 12:39:30 +04:00
Petr Hosek	8280ece0c9	[Coverage] Support overriding compilation directory When making compilation relocatable, for example in distributed compilation scenarios, we want to set compilation dir to a relative value like `.` but this presents a problem when generating reports because if the file path is relative as well, for example `..`, you may end up writing files outside of the output directory. This change introduces a flag that allows overriding the compilation directory that's stored inside the profile with a different value that is absolute. Differential Revision: https://reviews.llvm.org/D100232	2021-05-11 15:26:45 -07:00
Alan Phipps	eccb925147	Reland "[Coverage] Fix branch coverage merging in FunctionCoverageSummary::get() for instantiation" Originally landed in: `6400905a61` Reverted in: `668dccc396` Fix branch coverage merging in FunctionCoverageSummary::get() for instantiation groups. This change corrects the implementation for the branch coverage summary to do the same thing for branches that is done for lines and regions. That is, across function instantiations in an instantiation group, the maximum branch coverage found in any of those instantiations is returned, with the total number of branches being the same across instantiations. Differential Revision: https://reviews.llvm.org/D102193	2021-05-11 11:48:23 -05:00
Alan Phipps	668dccc396	Revert "Fix branch coverage merging in FunctionCoverageSummary::get() for instantiation" This reverts commit `6400905a61`.	2021-05-11 11:26:19 -05:00
Alan Phipps	6400905a61	Fix branch coverage merging in FunctionCoverageSummary::get() for instantiation groups. This change corrects the implementation for the branch coverage summary to do the same thing for branches that is done for lines and regions. That is, across function instantiations in an instantiation group, the maximum branch coverage found in any of those instantiations is returned, with the total number of branches being the same across instantiations. Differential Revision: https://reviews.llvm.org/D102193	2021-05-11 10:42:40 -05:00
Alex Orlov	05d1ae4e18	Add support for JSON output style to llvm-symbolizer This patch adds JSON output style to llvm-symbolizer to better support CLI automation by providing a machine readable output. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D96883	2021-05-11 13:10:54 +04:00
Djordje Todorovic	1ed2963600	[llvm-dwarfdump] Fix abstract origin vars location stats calculation There are cases where a concrete DIE with DW_TAG_subprogram can have abstract_origin attribute, so we handle that situation as well. Differential Revision: https://reviews.llvm.org/D101025	2021-05-11 01:04:51 -07:00
Roman Lebedev	6a64c462eb	[X86] AMD Zen 3: same-reg AVX YMM VPCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Still not zero-cycle :)	2021-05-10 23:49:27 +03:00
Roman Lebedev	5864e7b86b	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg AVX YMM VPCMP	2021-05-10 23:49:27 +03:00
Roman Lebedev	2953245337	[X86] AMD Zen 3: same-reg AVX XMM VPCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Again, it's not zero-cycle.	2021-05-10 23:49:26 +03:00
Roman Lebedev	f59db6c4f8	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg AVX XMM VPCMP	2021-05-10 23:49:26 +03:00
Roman Lebedev	0f3bcb97ef	[X86] AMD Zen 3: same-reg SSE XMM PCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Much like with MMX PCMP, it does actually have to execute, though.	2021-05-10 23:49:26 +03:00
Roman Lebedev	0e538f937a	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg XMM SSE PCMP	2021-05-10 23:49:26 +03:00
Roman Lebedev	b24edfff4f	[X86] AMD Zen 3: same-reg PCMPEQ is an MMX all-ones dep breaking idiom They are, however, not zero-cycle, and do actually execute. As measured by exegesis, and confirmed by ref docs.	2021-05-10 23:49:26 +03:00
Roman Lebedev	ba225ce961	[NFC][X86][MCA] AMD Zen 3: add tests for same-reg MMX PCMPEQ	2021-05-10 23:49:25 +03:00
Roman Lebedev	08cf2776ac	[X86] AMD Zen 3: sub-32-bit CMP also break dependencies They measure as having the same effect as 32-bit CMP.	2021-05-10 20:57:38 +03:00
Roman Lebedev	ecff974b66	[NFC][X86][MCA] AMD Zen 3: add tests for sub-32-bit CMP dep breaking	2021-05-10 20:57:37 +03:00
Fangrui Song	7a0231ae59	[llvm-objdump][MachO] Print a newline before lazy bind/bind/weak/exports trie This adds a separator between two pieces of information. Reviewed By: #lld-macho, alexshap Differential Revision: https://reviews.llvm.org/D102114	2021-05-10 09:16:18 -07:00
Roman Lebedev	be23d5e814	[X86] AMD Zen 3: same-reg CMP is a zero-cycle dependency-breaking instruction As measured by exegesis, and confirmed by ref docs.	2021-05-10 00:03:20 +03:00
Roman Lebedev	9a31efa2f5	[NFC][X86][MCA] AMD Zen 3: add tests for CMP dependency breaking	2021-05-10 00:03:20 +03:00
Roman Lebedev	11b0568dce	[X86] AMD Zen 3: same-reg SBB is a dependency-breaking instruction As confirmed by exegesis measurements, and ref docs. It does actually execute. While there, bump latency for MULX32rr, that seems to match measurements.	2021-05-10 00:03:20 +03:00
Roman Lebedev	8d0e2d2b0f	[NFC][X86][MCA] AMD Zen 3: add tests for SBB dependency breaking	2021-05-10 00:03:20 +03:00
Roman Lebedev	eed8552787	[X86] AMD Zen 3: same-register XOR/SUB are GPR dependency breaking zero-idioms As measured by exegesis and confirmed in reference docs.	2021-05-10 00:03:20 +03:00
Roman Lebedev	ab794852ed	[NFC][X86][MCA] AMD Zen3: add GPR zero-idiom dependency breaking tests	2021-05-10 00:03:20 +03:00
Roman Lebedev	a21df76db6	[X86] AMD Zen 3: XCHG is a zero-cycle instruction As measured by exegesis and confirmed by reference docs.	2021-05-09 20:37:57 +03:00
Fangrui Song	492173d42b	[test] Fix tools/gold/X86/new-pm.ll after D101797	2021-05-08 13:41:36 -07:00
Arthur Eubanks	34a8a437bf	[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose Printing pass manager invocations is fairly verbose and not super useful. This allows us to remove DebugLogging from pass managers and PassBuilder since all logging (aside from analysis managers) goes through instrumentation now. This has the downside of never being able to print the top level pass manager via instrumentation, but that seems like a minor downside. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D101797	2021-05-07 21:51:47 -07:00
Roman Lebedev	2819009b5a	[X86] AMD Zen 3: _REV variants of zero-cycles moves are also zero-cycles (PR50261) Sometimes disassembler picks _REV variants of instructions over the plain ones, which in this case exposed an issue that the _REV variants aren't being modelled as optimizable moves.	2021-05-07 18:27:40 +03:00
Roman Lebedev	a8e30e63ac	[NFC][X86][MCA] AMD Zen3: add test for zero-cycle X87 move	2021-05-07 18:27:40 +03:00
Roman Lebedev	34de155f7e	[NFC][X86][MCA] AMD Zen3 Decrease iteration count in reg-move-elimination tests Drop it just enough so it still produces the right IPC.	2021-05-07 17:06:45 +03:00
Roman Lebedev	758c173309	[X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:45 +03:00
Roman Lebedev	715c0d0bd4	[X86] AMD Zen 3: AVX YMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:45 +03:00
Roman Lebedev	ee020b930d	[X86] AMD Zen 3: AVX XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:44 +03:00
Roman Lebedev	9db4203883	[X86] AMD Zen 3: SSE XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 Zero Cycle Move The processor is able to execute certain register to register mov operations with zero cycle delay. Agner, 22.13 Instructions with no latency Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle.	2021-05-07 17:06:44 +03:00
Roman Lebedev	0d961fbd52	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX YMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	bcbfc22ff9	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	cbabe4f4d6	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable SSE XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	d8c6202576	[X86] AMD Zen 3: throughput for renameable GPR moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:43 +03:00
Roman Lebedev	e6d688ec96	[NFC][X86][MCA] Increase iteration count in reg move elimination tests So the IPC actually stabilizes at 6.	2021-05-07 17:06:43 +03:00
Roman Lebedev	bda9ca3e44	[NFC][X86][MCA] AMD Zen 3: add tests with non-eliminatible MMX moves In Zen3, MMX moves are not eliminated, i've verified this with llvm-exegesis.	2021-05-07 13:56:07 +03:00
Roman Lebedev	7059b28d5d	[X86] AMD Zen 3: 32/64 -bit GPR register moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 Zero Cycle Move The processor is able to execute certain register to register mov operations with zero cycle delay. Agner, 22.13 Instructions with no latency Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle.	2021-05-07 13:56:07 +03:00
Roman Lebedev	227678089c	[NFC][X86][MCA] AMD Zen 3: add tests with eliminatible GPR moves	2021-05-07 13:56:07 +03:00
gbreynoo	f0762fc42f	[llvm-dwarfdump] Help option output should be consistent with the command guide The dwarfdump command guide shows the short options used as aliases but these are not found in the help text unless --show-hidden is used. Investigating other tools some follow this pattern, others like llvm-objdump show aliases with --help. This change fixes the help output to be consistent with the command guide. This includes updating alias descriptions in the help output to use "--". As part of this change I updated cmdline.test, including some options that were missing testing. Differential Revision: https://reviews.llvm.org/D101646	2021-05-07 11:23:05 +01:00
Matthew Voss	22aece57be	Allow llvm-dis to disassemble multiple files Differential Revision: https://reviews.llvm.org/D101110	2021-05-06 11:08:55 -07:00
Fangrui Song	b3336bfa2e	[llvm-objcopy][ELF] --only-keep-debug: set offset/size of segments with no sections to zero PR50160: we currently ignore non-PT_PHDR segments with no sections, not accounting for its p_offset and p_filesz: this can cause an out-of-bounds write in `writeSegmentData` if the p_offset+p_filesz is larger than the total file size. This can be fixed by setting p_offset=p_filesz=0. The logic nicely unifies with the logic added in D90897. Reviewed By: jhenderson, rupprecht Differential Revision: https://reviews.llvm.org/D101560	2021-05-05 10:26:57 -07:00
Fangrui Song	e510860656	[llvm-objdump] Add -M {att,intel} & deprecate --x86-asm-syntax={att,intel} The internal `cl::opt` option --x86-asm-syntax sets the AsmParser and AsmWriter dialect. The option is used by llc and llvm-mc tests to set the AsmWriter dialect. This patch adds -M {att,intel} as GNU objdump compatible aliases (PR43413). Note: the dialect is initialized when the MCAsmInfo is constructed. `MCInstPrinter::applyTargetSpecificCLOption` is called too late and its MCAsmInfo reference is const, so changing the `cl::opt` in `MCInstPrinter::applyTargetSpecificCLOption` is not an option, at least without large amount of refactoring. Reviewed By: hoy, jhenderson, thakis Differential Revision: https://reviews.llvm.org/D101695	2021-05-05 00:20:41 -07:00
Fangrui Song	96f3a63076	[llvm-objcopy] --dump-section: error if '=' is missing or filename is empty Fix PR45416: the diagnostic when '=' is missing is misleading. `FileOutputBuffer::create` returns successfully when the filename is empty (the temporary file is `.tmp%%%%%%%`), but `FileOutputBuffer::commit` will error when renaming `.tmp%%%%%%%` to the empty name. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D101697	2021-05-04 17:30:57 -07:00
Martin Storsjö	70c4930637	[llvm-readobj] [ARMWinEH] Try to resolve label symbols into regular ones Unwind info generated by MSVC tends to have relocations pointing at static "label" symbols like "$LN4" instead of regular ones based on the actual function's name. Try to resolve such symbols to a non-label symbol if possible (ideally to an external symbol), to improve the readability. Differential Revision: https://reviews.llvm.org/D101567	2021-05-04 22:22:18 +03:00
Fangrui Song	dcf6d0d389	[llvm-objdump] Fix -a after D100433 -a is alias for --archive-headers, not --all-headers	2021-05-04 10:17:36 -07:00
Fangrui Song	0c2e2f88fb	[llvm-objdump] Improve newline consistency between different pieces of information When dumping multiple pieces of information (e.g. --all-headers), there is sometimes no separator between two pieces. This patch uses the "\nheader:\n" style, which generally improves compatibility with GNU objdump. Note: objdump -t/-T does not add a newline before "SYMBOL TABLE:" and "DYNAMIC SYMBOL TABLE:". We add a newline to be consistent with other information. `objdump -d` prints two empty lines before the first 'Disassembly of section'. We print just one with this patch. Differential Revision: https://reviews.llvm.org/D101796	2021-05-04 09:56:07 -07:00
gbreynoo	a617e2064d	[llvm-objdump] Remove Generic Options group from help text output Reapply `7368624` after revert and fix. Looking at other tools using tablegen for help output, general options like --help are not separated from other options. This change removes the "Generic Options" option group so the options are listed together. The Mach-O specific option group is left unaffected. The test help.test was modified to reflect this change. Differential Revision: https://reviews.llvm.org/D101652	2021-05-04 17:42:20 +01:00
Dimitry Andric	dffddde73a	Revert "[llvm-objdump] Remove Generic Options group from help text output" This reverts commit `73686247ac`, as there were git stash conflict markers left unresolved.	2021-05-04 18:28:31 +02:00
gbreynoo	73686247ac	[llvm-objdump] Remove Generic Options group from help text output Looking at other tools using tablegen for help output, general options like --help are not separated from other options. This change removes the "Generic Options" option group so the options are listed together. The Mach-O specific option group is left unaffected. The test help.test was modified to reflect this change. Differential Revision: https://reviews.llvm.org/D101652	2021-05-04 16:48:03 +01:00
Giorgis Georgakoudis	404fa9a6cf	[Utils] Add prof metadata to matched unnamed values Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101742	2021-05-03 15:15:34 -07:00
Konstantin Zhuravlyov	2055cc8ef4	AMDGPU: XFAIL LLVM::note-amd-valid-v2.test for big endian	2021-05-03 09:45:19 -04:00
Konstantin Zhuravlyov	94aaf3ddd9	Reland "AMDGPU/llvm-readobj: Add missing tests for note parsing/displaying" This reverts commit `54aad63659`. Includes fix for note-amd-valid-v3.s test.	2021-05-02 22:56:17 -04:00
Sergio Perez Gonzalez	761d5614a1	[Object] Fix e_machine description for EM_CR16 and add EM_MICROBLAZE Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101133	2021-05-02 19:25:39 -07:00
Roman Lebedev	2b93c9c16c	[X86] AMD Zen 3 Scheduler Model Introduce basic schedule model for AMD Zen 3 CPU's, a.k.a `znver3`. This is fully built from scratch, from llvm-mca measurements and documented reference materials. Nothing was copied from `znver2`/`znver1`. I believe this is in a reasonable state of completion for inclusion, probably better than D52779 `bdver2` was :) Namely: * uops are pretty spot-on (at least what llvm-mca can measure) {F16422596} * latency is also pretty spot-on (at least what llvm-mca can measure) {F16422601} * throughput is within reason {F16422607} I haven't run many benchmarks with this, however RawSpeed benchmarks say this is beneficial: {F16603978} {F16604029} I'll call out the obvious problems there: * I didn't really bother with X87 instructions * I didn't really bother with obviously-microcoded/system instructions * There is a large discrepancy in throughput for `mr` and `rm` instructions. I'm not really sure if it's a modelling defect that needs to be fixed, or it's a defect of measurements. * Pipe distributions are probably bad :) I can't do much here until AMD allows that to be fixed by documenting the appropriate counters and updating libpfm That being said, as @RKSimon notes: >>! In D94395#2647381, @RKSimon wrote: > I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...> so how much worse this could possibly be?! Things that aren't there: * Various tunings: zero idioms, etc. That is follow-ups. Differential Revision: https://reviews.llvm.org/D94395	2021-05-01 22:08:13 +03:00
Jez Ng	c00fc180ec	[llvm-readobj] Recognize N_THUMB_DEF as a symbol flag The right symbol flag mask is ~0x7, not ~0xf. Also emit string names for the other flags (we were missing some). Reviewed By: #lld-macho, gkm Differential Revision: https://reviews.llvm.org/D101548	2021-04-30 17:39:56 -04:00
Arthur Eubanks	511f2cecf7	[llvm-reduce] Don't unset dso_local on implicitly dso_local GVs This introduces a flag that aborts if we ever reduce to IR that fails the verifier. Reviewed By: swamulism, arichardson Differential Revision: https://reviews.llvm.org/D101279	2021-04-30 11:57:22 -07:00
Arthur Eubanks	545a8177ea	[llvm-reduce] Add flag to only run specific passes Reviewed By: fhahn, hans Differential Revision: https://reviews.llvm.org/D101278	2021-04-30 11:51:01 -07:00
Konstantin Zhuravlyov	54aad63659	Revert "AMDGPU/llvm-readobj: Add missing tests for note parsing/displaying" This reverts commit `c9c4676a45`. Reason for revert: note-amd-valid-v3.s test fails if AMDGPU is not built.	2021-04-30 14:45:52 -04:00
Nick Desaulniers	dde24a87c5	[llvm-objdump] add -v alias for --version Used by the Linux kernel's CONFIG_X86_DECODER_SELFTEST. Link: https://github.com/ClangBuiltLinux/linux/issues/1130 Reviewed By: MaskRay, jhenderson, rupprecht Differential Revision: https://reviews.llvm.org/D101483	2021-04-30 11:26:36 -07:00
Konstantin Zhuravlyov	c9c4676a45	AMDGPU/llvm-readobj: Add missing tests for note parsing/displaying This is a follow up review/change for https://reviews.llvm.org/D95638 Add valid note tests for code object v2 notes: - NT_AMD_HSA_CODE_OBJECT_VERSION (required yaml2obj update) - NT_AMD_HSA_HSAIL (required yaml2obj update) - NT_AMD_HSA_ISA_VERSION (required yaml2obj update) - NT_AMD_HSA_METADATA - NT_AMD_HSA_ISA_NAME - NT_AMD_PAL_METADATA Add valid note tests for code object v3 notes: - NT_AMDGPU_METADATA Add invalid note tests for code object v2 notes: - NT_AMD_HSA_CODE_OBJECT_VERSION (required yaml2obj update) - NT_AMD_HSA_HSAIL (required yaml2obj update) - NT_AMD_HSA_ISA_VERSION (required yaml2obj update) Add invalid note tests for code object v3 notes: - NT_AMDGPU_METADATA Differential Revision: https://reviews.llvm.org/D101304	2021-04-30 11:19:16 -04:00
Andrea Di Biagio	8bd4f3d547	[MCA] Fix CarryOver check in the DispatchStage (PR50174). Early exit from method DispatchStage::isAvailable() if the dispatch group is already full. Not all instructions declare at least one uOP. Fixes PR50174.	2021-04-30 14:26:46 +01:00
Martin Storsjö	4750a8b1bc	Reapply [llvm-readobj] [ARMWinEH] Fix handling of relocations and symbol offsets When looking up data referenced from pdata/xdata structures, the referenced data can be found in two different ways: - For an unrelocated object file, it's located via a relocation - For a relocated, linked image, the data is referenced with an (image relative) absolute address For the latter case, the absolute address can optionally be described with a symbol. For the case of an object file, there are two offsets involved; one immediate offset encoded in the data location that is modified by the relocation, and a section offset in the symbol. Previously, for the ExceptionRecord field, we printed the offset from the symbol (only) but used the immediate offset ignoring the symbol's address (using only the symbol's section) for printing the exception data. Add a helper method for doing the lookup and address calculation, for simplifying the calling code and making all the cases consistent. This addresses an existing FIXME comment, fixing printing of the exception data for cases where relocations point at individual symbols in the xdata section (which is what MSVC generates) instead of all relocations pointing at the start of the xdata section (which is what LLVM generates). This also fixes printing of the function name for packed entries in linked images. Relanded with a format string fix in the formatSymbol function; one can't use %X as format string for a uint64_t. That bug has been present since this code was added in `e6971cab30`. Differential Revision: https://reviews.llvm.org/D100305	2021-04-30 09:51:23 +03:00
Martin Storsjö	5bf2ef9d86	Revert "[llvm-readobj] [ARMWinEH] Fix handling of relocations and symbol offsets" This reverts commit `3778924088`. The added test fails on at least one buildbot, by printing a reversed combination, printing "func3_xdata +0x18 (0x8)" while it's supposed to be "func3_xdata +0x8 (0x18)", see e.g. https://lab.llvm.org/buildbot/#/builders/107/builds/7269. Currently no idea how that could happen, but reverting until it can be figured out.	2021-04-30 00:06:16 +03:00
Martin Storsjö	3778924088	[llvm-readobj] [ARMWinEH] Fix handling of relocations and symbol offsets When looking up data referenced from pdata/xdata structures, the referenced data can be found in two different ways: - For an unrelocated object file, it's located via a relocation - For a relocated, linked image, the data is referenced with an (image relative) absolute address For the latter case, the absolute address can optionally be described with a symbol. For the case of an object file, there are two offsets involved; one immediate offset encoded in the data location that is modified by the relocation, and a section offset in the symbol. Previously, for the ExceptionRecord field, we printed the offset from the symbol (only) but used the immediate offset ignoring the symbol's address (using only the symbol's section) for printing the exception data. Add a helper method for doing the lookup and address calculation, for simplifying the calling code and making all the cases consistent. This addresses an existing FIXME comment, fixing printing of the exception data for cases where relocations point at individual symbols in the xdata section (which is what MSVC generates) instead of all relocations pointing at the start of the xdata section (which is what LLVM generates). This also fixes printing of the function name for packed entries in linked images. Differential Revision: https://reviews.llvm.org/D100305	2021-04-29 23:35:10 +03:00
Alexander Shaposhnikov	86f291ebb2	[llvm-objcopy][MachO] Add support for LC_THREAD/LC_UNIXTHREAD Add support for LC_THREAD/LC_UNIXTHREAD (these load commands can be copied over without any modifications). Test plan: make check-all Differential revision: https://reviews.llvm.org/D101384	2021-04-28 16:29:33 -07:00
Jonas Devlieghere	625bd94c6d	[dsymutil] Add flag to force a static variable to keep its enclosing function Add a flag to change dsymutil's behavior and force a static variable to keep its enclosing function. The test shows a situation where that could be useful. I'm not convinced this behavior makes sense as a default, which is why it's behind a flag. rdar://74918374 Differential revision: https://reviews.llvm.org/D101337	2021-04-28 11:33:04 -07:00
Alex Richardson	79030a22cc	[llvm-objdump] Fix dumping dynamic relative relocations for SHT_REL Previously printing R_386_RELATIVE relocations would trigger `error: can't read an entry at 0x40: it goes past the end of the section (0x40)` I found this while writing a test case for LLD (D100490). This also includes some minor cleanup in the elf-dynamic-relcos.test llvm-objdump test based on the newly added test. Reviewed By: jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D100489	2021-04-28 12:23:00 +01:00
Alex Richardson	9692811b26	[update_(llc_)test_checks.py] Support pre-processing commands This has been rather useful in our downstream CHERI target where we want to run tests both with addrspace(0) and addrspace(200) pointers. With this patch we can prefix the opt command with `sed -e 's/addrspace(200)/addrspace(0)/g' -e 's/-A200-P200-G200//g'` to test both cases using the same IR input. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95137	2021-04-28 12:19:19 +01:00
Alexander Shaposhnikov	412437aec0	Revert "[llvm-objcopy][MachO] Add support for LC_THREAD/LC_UNIXTHREAD" This reverts commit `4dfddf715b` since it breaks some build bots (e.g. clang-ppc64be-linux)	2021-04-27 16:19:59 -07:00
Alexander Shaposhnikov	4dfddf715b	[llvm-objcopy][MachO] Add support for LC_THREAD/LC_UNIXTHREAD Add support for LC_THREAD/LC_UNIXTHREAD (these load commands can be copied over without any modifications). Test plan: make check-all Differential revision: https://reviews.llvm.org/D101384	2021-04-27 15:54:51 -07:00
Fangrui Song	a41f076ef1	[test] Fix tools/gold/X86/weak.ll after D94202 The order regressed after D94202: after `a b`, when adding `a c`, we may reorder `a` to the right of `b`, causing the final order to be `b a c`.	2021-04-26 16:04:22 -07:00
Ali Tamur	51b4610743	Support DW_FORM_strx* in llvm-dwp. Currently llvm-dwp only handled DW_FORM_string and DW_FORM_GNU_str_index; with this patch it also starts to handle DW_FORM_strx[1-4]? Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D75485	2021-04-26 12:32:45 -07:00
Martin Storsjö	f8de9aaef2	[llvm-rc] Add a GNU windres-like frontend to llvm-rc This primarily parses a different set of options and invokes the same resource compiler as llvm-rc normally. Additionally, it can convert directly to an object file (which in MSVC style setups is done with the separate cvtres tool, or by the linker). (GNU windres also supports other conversions; from coff object file back to .res, and from .res or object file back to .rc form; that's not yet implemented.) The other bigger complication lies in being able to imply or pass the intended target triple, to let clang find the corresponding mingw sysroot for finding include files, and for specifying the default output object machine format. It can be implied from the tool triple prefix, like `<triple>-[llvm-]windres` or picked up from the windres option e.g. `-F pe-x86-64`. In GNU windres, that option takes BFD style format names such as pe-i386 or pe-x86-64. As libbfd in binutils doesn't support Windows on ARM, there's no such canonical name for the ARM targets. Therefore, as an LLVM specific extension, this option is extended to allow passing full triples, too. Differential Revision: https://reviews.llvm.org/D100756	2021-04-26 22:04:29 +03:00
Tim Renouf	18adf4bb0d	[AMDGPU][llvm-objdump] Add lit.local.cfg missing from recent commit Stops llvm-objdump tests failing when AMDGPU target is not supported. Change-Id: Ic4ae443958c41c303ff6bee0966e5f21ab7a1851	2021-04-26 14:07:04 +01:00
Tim Renouf	8710eff6c3	[MC][AMDGPU][llvm-objdump] Synthesized local labels in disassembly 1. Add an accessor function to MCSymbolizer to retrieve addresses referenced by a symbolizable operand, but not resolved to a symbol. That way, the caller can synthesize labels at those addresses and then retry disassembling the section. 2. Implement that in AMDGPU -- a failed symbol lookup results in the address being added to a vector returned by the new function. 3. Use that in llvm-objdump when using MCSymbolizer (which only happens on AMDGPU) and SymbolizeOperands is on. Differential Revision: https://reviews.llvm.org/D101145 Change-Id: I19087c3bbfece64bad5a56ee88bcc9110d83989e	2021-04-26 13:56:36 +01:00
Djordje Todorovic	6ba150dbb4	[llvm-dwarfdump] Fix split-dwarf bug in stats for inlined var loc cov Initial (D96045) patch didn't handle split dwarf cases, so this fixes that bug. In addition, before applying this patch, we had a slowdown that happened after the D96045. With this patch, the slowdown will be fixed as well. Differential Revision: https://reviews.llvm.org/D100951	2021-04-26 01:56:15 -07:00
Keith Smiley	86b98c60c5	llvm-objdump: add --rpaths to macho support This prints the rpaths for the given binary Reviewed By: kastiglione Differential Revision: https://reviews.llvm.org/D100681	2021-04-22 16:01:10 -07:00
Hongtao Yu	aaf120b528	[llvm-profgen] A couple tweaks to the testing harness. 1. Remove unnecessary filtering code. 2. Add llvm-profgen to tool substitutions. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D101006	2021-04-22 08:57:14 -07:00
Wenlei He	dff8315892	[CSSPGO][llvm-profdata] Support trimming cold context when merging profiles The change adds support for trimming and merging cold context when merging CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen, however the flexibility to trim cold context after profile is generated can be useful. Differential Revision: https://reviews.llvm.org/D100528	2021-04-22 00:42:37 -07:00
Hongtao Yu	1a719089a8	[CSSPGO][llvm-profgen] Always report dangling probes for frames with real samples. Report dangling probes for frames that have real samples collected. Dangling probes are the probes associated to an empty block. When reported, sample count on a dangling probe will not be trusted by the compiler and we will rely on the counts inference algorithm to get the probe a reasonable count. This actually fixes a bug where previously only those dangling probes with samples collected were reported. This patch also fixes two existing issues. Pseudo probes are stored in `Address2ProbesMap` and their pointers are used in `PseudoProbeInlineTree`. Previously `std::vector` was used to store probes and the pointers to probes may get obsolete as the vector grows. I'm changing `std::vector` to `std::list` instead. The other issue is that all outlined functions shared the same inline frame previously due to the unchanged `Index` value as the dummy inlineSite identifier. Good results seen for SPEC2017 in general regarding profile quality. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D100235	2021-04-21 18:07:58 -07:00
Martin Storsjö	64bc44f5dd	[llvm-rc] Run clang to preprocess input files Allow opting out from preprocessing with a command line argument. Update tests to pass -no-preprocess to make it not try to use clang (which isn't a build level dependency of llvm-rc), but add a test that does preprocessing under clang/test/Preprocessor. Update a few options to allow them both joined (as -DFOO) and separate (-D BR), as rc.exe allows both forms of them. With the verbose flag set, this prints the preprocessing command used (which differs from what rc.exe does). Tests under llvm/test/tools/llvm-rc only test constructing the preprocessor commands, while tests under clang/test/Preprocessor test actually running the preprocessor. Differential Revision: https://reviews.llvm.org/D100755	2021-04-21 11:50:10 +03:00
Sebastian Neubauer	4897effb14	[AMDGPU] Add TransVALU to gfx10 Instructions on the transcendental unit are executed in parallel to the normal VALU, so add this as an extra resource. This doesn't seem to have any effect, but it should be more correct. Differential Revision: https://reviews.llvm.org/D100123	2021-04-20 15:34:43 +02:00
Nico Weber	1a3f88658a	[llvm-objdump] Add an llvm-otool tool This implements an LLVM tool that's flag- and output-compatible with macOS's `otool` -- except for bugs, but from testing with both `otool` and `xcrun otool-classic`, llvm-otool matches vanilla otool's behavior very well already. It's not 100% perfect, but it's a very solid start. This uses the same approach as llvm-objcopy: llvm-objdump uses a different OptTable when it's invoked as llvm-otool. This is possible thanks to D100433. Differential Revision: https://reviews.llvm.org/D100583	2021-04-20 08:24:58 -04:00
Martin Storsjö	73cda4d183	[llvm-rc] Fix handling of the /X option to match its documentation and rc.exe This matches how it's documented in the option listing. Differential Revision: https://reviews.llvm.org/D100754	2021-04-20 09:22:43 +03:00
David Penry	78a871abf7	[ARM] Use ProcResGroup in Cortex-M7 scheduling model Used to model structural hazards on FP issue, where some instructions take up 2 issue slots and others one as well as similar structural hazards on load issue, where some instructions take up two load lanes and others one. Differential Revision: https://reviews.llvm.org/D98977	2021-04-19 21:23:05 +01:00
LemonBoy	24185541ca	[yaml2obj/obj2yaml/llvm-readobj] Support printing and parsing AVR-specific e_flags The `e_flags` contains a mixture of bitfields and regular ones, ensure all of them can be serialized and deserialized. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100250	2021-04-15 15:54:28 +02:00
Alex Orlov	49cbf4cd85	Fix bug in .eh_frame/.debug_frame PC offset calculation for DW_EH_PE_pcrel This fixes the following bugs: https://bugs.llvm.org/show_bug.cgi?id=27249 https://bugs.llvm.org/show_bug.cgi?id=46414 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100328	2021-04-15 15:06:20 +04:00
Nico Weber	5a625e5303	[llvm-objdump] try to fix section-filter.test in full builds after `51aa61e74b`	2021-04-14 20:58:51 -04:00
Nico Weber	1035123ac5	[llvm-objdump] Switch command-line parsing from llvm::cl to OptTable This is similar to D83530, but for llvm-objdump. The motivation is the desire to add an `llvm-otool` symlink to llvm-objdump that behaves like macOS's `otool`, using the same technique that llvm-objcopy uses to behave like `strip` (etc). This change for the most part preserves behavior. In some cases, it increases compatibility with GNU objdump a bit. For example, the long options now require two dashes, and the long options taking arguments for the most part now require a `=` in front of the value. Exceptions are flags where tests passed the value separately, for these the separate form is kept as an alias to the = form. The one-letter short form args are now joined or separate and no longer accept a =, which also matches GNU objdump. cl::opt<>s in libraries now have to be explicitly plumbed through. This patch does that for --x86-asm-syntax=, but there's hope that we can remove that again. Differential Revision: https://reviews.llvm.org/D100433	2021-04-14 20:12:24 -04:00
Wenlei He	00ef28ef21	[CSSPGO] Fix dangling context strings and improve profile order consistency and error handling This patch fixed the following issues alongside some refactoring: 1. Fix bugs where StringRef for context string outlives the underlying std::string. We now keep string table in profile generator to hold std::strings. We also do the same for bracketed context strings in profile writer. 2. Make sure profile output strictly follows (total sample, name) order. Previously, there's inconsistency between ProfileMap's key and FunctionSamples's name, leading to inconsistent ordering. This is now fixed by introducing context profile canonicalization. Assertions are also added to make sure ProfileMap's key and FunctionSamples's name are always consistent. 3. Enhanced error handling for profile writing to make sure we bubble up errors properly for both llvm-profgen and llvm-profdata when string table is not populated correctly for extended binary profile. 4. Keep all internal context representation bracket free. This avoids creating new strings for context trimming, merging and preinline. getNameWithContext API is now simplified accordingly. 5. Factor out the code for context trimming and merging into SampleContextTrimmer in SampleProf.cpp. This enables llvm-profdata to use the trimmer when merging profiles. Changes in llvm-profgen will be in separate patch. Differential Revision: https://reviews.llvm.org/D100090	2021-04-10 12:39:10 -07:00
Alex Orlov	f47a4c0713	[lld] Fixed CodeView GuidAdapter::format to handle GUID bytes in the right order. This fixes https://bugs.llvm.org/show_bug.cgi?id=41712 bug. Reviewed By: aganea Differential Revision: https://reviews.llvm.org/D99978	2021-04-09 05:29:14 +04:00
Andrew Savonichev	f08a2fc09e	[MCA] Add tests for IPC on Cortex-A55 The tests compare IPC statistics that MCA provides with IPC values measured on Cortex-A55 hardware. For hardware tests, each snippet is run in a loop unrolled by 1000, and IPC is measured by linux-perf. Several tests do not match the hardware: the skewed ALU is not supported, and LDR seems to be missing a forwarding path. Differential Revision: https://reviews.llvm.org/D98174	2021-04-08 19:37:07 +03:00
Nico Weber	c22b09debd	Revert "[clang] Speedup line offset mapping computation" This reverts commit `6951b72334`. Breaks several bots, see comments on https://reviews.llvm.org/D99409	2021-04-07 09:42:11 -04:00
serge-sans-paille	6951b72334	[clang] Speedup line offset mapping computation Clang spends a decent amount of time in the LineOffsetMapping::get(...) function. This function used to be vectorized (through SSE2) then the optimization got dropped because the sequential version was on-par performance wise. This provides an optimization of the sequential version that works on a word at a time, using (documented) bithacks to provide a portable vectorization. When preprocessing the sqlite amalgamation, this yields a sweet 3% speedup. Differential Revision: https://reviews.llvm.org/D99409	2021-04-07 14:04:32 +02:00
Jonas Devlieghere	162c2759b6	[dsymutil] Stop emulating dsymutil-classic CIE caching behavior Stop emulating dsymutil-classic which only cached the last used CIE for reuse.	2021-04-06 20:15:41 -07:00
Jonas Devlieghere	5d07dc8977	[dsymutil] Don't emit .debug_pubnames and .debug_pubtypes Consider the .debug_pubnames and .debug_pubtypes their own kind of accelerator and stop emitting them together with the Apple-style accelerator tables. The only reason we were still emitting both was for (byte-for-byte) compatibility with dsymutil-classic. - This patch adds a new accelerator table kind "Pub" which can be specified with --accelerator=Pub. - This patch removes the ability to emit both pubnames/types and apple style accelerator tables. I don't think anyone is relying on that but it's worth pointing out. - This patch removes the --minimize option and makes this behavior the default. Specifying the flag will result in a warning but won't abort the program. Differential revision: https://reviews.llvm.org/D99907	2021-04-06 19:01:45 -07:00
Arthur Eubanks	9c8b28a69b	[llvm-reduce] Remove unwanted module inline asm We can clear line by line, but that's likely not very important. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D99921	2021-04-06 09:35:37 -07:00
Jay Foad	94d0fc32f5	[AMDGPU] Add some missing testing for new subtargets gfx90a and gfx90c Differential Revision: https://reviews.llvm.org/D99647	2021-04-06 08:38:59 +01:00
Roman Lebedev	d094f3c3c5	[llvm-exegesis] SnippetFile: do create source manager in MCContext This way, once there's an error in the snippet file (like in the test), llvm-exegesis won't crash with an assertion failure, but print a nice diagnostic about the problem.	2021-04-04 15:58:39 +03:00
Roman Lebedev	64a52e1e32	[llvm-exegesis] Don't erroneously refuse to measure POPCNT instruction	2021-04-04 14:38:26 +03:00
Eric Astor	0499a9d688	[ms] [llvm-ml] Accept /WX to signal that warnings should be fatal. Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it. Also make sure that if Warning() returns true, we always treat it as an error. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D92504	2021-04-02 15:13:20 -04:00
Eric Astor	15ec0ad77a	[ms] [llvm-ml] Fix case-sensitivity for variables and textmacros Make variables and text-macro references case-insensitive, to match ml.exe. Also improve error handling for text-macro expansion. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D92503	2021-04-02 14:08:02 -04:00
Samuel	0bc5436ae8	[llvm-reduce] Move tests to tools folder Move tests for llvm-reduce to tools folder Reviewed By: fhahn, lebedev.ri Differential Revision: https://reviews.llvm.org/D99632	2021-04-01 10:04:10 -07:00
Douglas Yung	c5109d3c79	Fix path in test added in `e0577b3130` to work with both Linux/Windows paths. Patch by Ying Yi!	2021-03-30 04:17:29 -07:00
Markus Böck	142d522ded	[llvm-profdata] Make sure to consume Error on the error path of setIsIRLevelProfile Encountered a crash while running a debug build, where this code path would be taken due to a mismatch in profile coverage data versions. Without consuming the error, an assert would be triggered inside the destructor of Error. Differential Revision: https://reviews.llvm.org/D99457	2021-03-30 08:52:58 +02:00
Jonas Devlieghere	b19a9efbc9	[dsymutil] s/dwarfdump/llvm-dwarfdump/ in test	2021-03-29 17:14:35 -07:00
Jonas Devlieghere	e0577b3130	[dsymutil] Relocate DW_TAG_label dsymutil is not relocating the DW_AT_low_pc for a DW_TAG_label. This patch fixes that and adds a test. Differential revision: https://reviews.llvm.org/D99534	2021-03-29 15:45:48 -07:00
Wenlei He	30b0232336	[CSSPGO][llvm-profgen] Context-sensitive global pre-inliner This change sets up a framework in llvm-profgen to estimate inline decisions and adjust the context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen. It will serve two purposes: 1) Since the context profile for a not-inlined context will be merged into the base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size. 2) For thinLTO, when a context involving functions from different modules is not inlined, we can't merge function profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation. The compiler inline heuristic uses inline cost, which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler. This change only has the framework, with a few TODOs left for follow-up patches for a complete implementation: 1) We need to retrieve the size for function/inlinee from debug info for inlining estimation. Currently we use the number of samples in a profile as a placeholder for size estimation. 2) Currently the thresholds are using the values used by the sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR. Differential Revision: https://reviews.llvm.org/D99146	2021-03-29 09:46:14 -07:00
Andrew Savonichev	bba25a9cd8	[MCA] Support carry-over instructions for in-order processors Instructions that have more uops than the processor's IssueWidth are issued in multiple cycles. The patch fixes PR49712. Differential Revision: https://reviews.llvm.org/D99339	2021-03-26 00:06:19 +03:00
Konstantin Zhuravlyov	f4ace63737	AMDGPU: Add target id and code object v4 support - Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id) - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object) - Add kernarg_size to kernel descriptor - Change trap handler ABI to no longer move queue pointer into s[0:1] - Cleanup ELF definitions - Add V2, V3, V4 suffixes to make a clear distinction for code object version - Consolidate note names Differential Revision: https://reviews.llvm.org/D95638	2021-03-24 11:54:05 -04:00
Vinicius Tinti	804ff7f293	[llvm-objdump] Implement --prefix-strip option The option `--prefix-strip` is only used when `--prefix` is not empty. It removes N initial directories from absolute paths before adding the prefix. This matches GNU's objdump behavior. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D96679	2021-03-24 13:22:35 +00:00
Andrew Savonichev	292da93d59	[MCA] Disable RCU for InOrderIssueStage This is a follow-up for: D98604 [MCA] Ensure that writes occur in-order When instructions are aligned by the order of writes, they retire in-order naturally. There is no need for an RCU, so it is disabled. Differential Revision: https://reviews.llvm.org/D98628	2021-03-24 13:54:04 +03:00
Andy Wingo	9ac5620cb8	[WebAssembly] Rename WasmLimits::Initial to ::Minimum. NFC. This patch renames the "Initial" member of WasmLimits to the name used in the spec, "Minimum". In the core WebAssembly specification, the Limits data type has one required "min" member and one optional "max" member, indicating the minimum required size of the corresponding table or memory, and the maximum size, if any. Although the WebAssembly spec does instantiate locally-defined tables and memories with the initial size being equal to the minimum size, it can't impose such a requirement for imports. It doesn't make sense to require an initial size for a memory import, for example. The compiler can only sensibly express the minimum and maximum sizes. See https://github.com/WebAssembly/js-types/blob/master/proposals/js-types/Overview.md#naming-of-size-limits for a related discussion that agrees that the right name of "initial" is "minimum" when querying the type of a table or memory from JavaScript. (Of course it still makes sense for JS to speak in terms of an initial size when it explicitly instantiates memories and tables.) Differential Revision: https://reviews.llvm.org/D99186	2021-03-24 09:10:11 +01:00
Jay Foad	fc7e3e7dd9	[AMDGPU] Set SchedRW on real instructions Copy SchedRW from pseudos to real instructions so that llvm-mca has access to it. This is NFC for normal compiler codegen, which schedules pseudos not real instructions. Add an llvm-mca test for some high latency double-precision instructions as a smoke test. Differential Revision: https://reviews.llvm.org/D99187	2021-03-23 15:38:11 +00:00
Andrea Di Biagio	f5bdc88e4d	[MCA] Improved handling of negative read-advance cycles. Before this patch, register writes were always invalidated by the RegisterFile at instruction commit stage. So, the RegisterFile was often losing the knowledge about the `execute cycle` of writes already committed. While this was not problematic for non-delayed reads, this was sometimes leading to inaccurate read latency computations in the presence of negative read-advance cycles. This patch fixes the issue by changing how the RegisterFile component internally keeps track of the `execute cycle` information of each write. On every instruction executed, the RegisterFile gets notified by the RetireStage, so that it can internally record the execute cycle of each executed write. The `execute cycle` information is stored within WriteRef itself, and it is not invalidated when the write is committed.	2021-03-23 14:47:23 +00:00
Yvan Roux	241032a205	[llvm-symbolizer][llvm-nm] Fix AArch64 and ARM mapping symbols handling. Exclude AArch64 mapping symbols ($x and $d) for symtab symbolization as it was done for ARM since D95916 to bring the bots back to a green state. This is implemented by setting SF_FormatSpecific such that llvm-symbolizer will ignore them, and using this flag to re-implement the llvm-nm --special-syms option, which makes it work for both targets. Differential Revision: https://reviews.llvm.org/D98803	2021-03-23 14:17:12 +01:00
Rahman Lavaee	949abf7d6a	[llvm-readelf, propeller] Add fallthrough bit to basic block metadata in BB-Address-Map section. This patch adds a fallthrough bit to basic block metadata, indicating whether the basic block can fall through without taking any branches. The bit will help us avoid an Intel LBR bug which results in occasional duplicate entries at the beginning of the LBR stack. This patch uses `MachineBasicBlock::canFallThrough()` to set the bit. This is not a const method because it eventually calls `TargetInstrInfo::analyzeBranch`, but it calls this function with the default `AllowModify=false`. So we can either make the argument to `getBBAddrMapMetadata` non-const, or we can use `const_cast` when calling `canFallThrough`. I decided to go with the latter since this is purely due to legacy code, and in general we should not allow the BasicBlock to be mutable during `getBBAddrMapMetadata`. Reviewed By: tmsriram Differential Revision: https://reviews.llvm.org/D96918	2021-03-22 21:38:05 -07:00
Jonas Devlieghere	3d6c7d6e8e	[dsymutil] Fix spurious warnings for missing symbols with thinLTO Fix spurious warnings for missing symbols with thinLTO. The latter appends a unique suffix to avoid collisions for exported private symbols, resulting in dsymutil complaining it couldn't find the symbol in the object file. rdar://75434058 Differential revision: https://reviews.llvm.org/D99125	2021-03-22 18:36:39 -07:00
Wenlei He	ce6bfe9411	[CSSPGO][llvm-profgen] Use profile summary based threshold for context trimming and merging Switch to use cold threshold from profile summary for cold context merging and trimming, instead of relying on hard coded values. Minor refactoring included for switch names, etc. Differential Revision: https://reviews.llvm.org/D98921	2021-03-22 08:56:59 -07:00
Andrew Litteken	0776eca7a4	Revert "[IRSim] Adding basic implementation of llvm-sim." Causing build errors on the Windows Buildbots. This reverts commit `5155dff278`.	2021-03-20 18:03:09 -05:00
Andrew Litteken	5155dff278	[IRSim] Adding basic implementation of llvm-sim. This is a similarity visualization tool that accepts a Module and passes it to the IRSimilarityIdentifier. The resulting SimilarityGroups are output in a JSON file. Tests are found in test/tools/llvm-sim and check for the file not found, a bad module, and that the JSON is created correctly. Reviewers: paquette, jroelofs, MaskRay Recommit of: `15645d044b` to fix linking errors. Differential Revision: https://reviews.llvm.org/D86974	2021-03-20 16:47:50 -05:00
Fangrui Song	948be862d6	[llvm-readobj] Remove legacy GNU_PROPERTY_X86_ISA_1_{NEEDED,USED} and dump new GNU_PROPERTY_X86_ISA_1_{NEEDED,USED} https://sourceware.org/bugzilla/show_bug.cgi?id=26703 deprecated the previous GNU_PROPERTY_X86_ISA_1_{CMOV,SSE,*} values (renamed to `COMPAT`) and added new values. Since the legacy values are not used by compilers, having dumping support in llvm-readobj is unnecessary. So just drop the legacy feature. The new values are used by GCC 11 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250) `-march=x86-64-v[234]` to indicate the micro-architecture ISA levels. Differential Revision: https://reviews.llvm.org/D98818	2021-03-19 14:35:22 -07:00
Wenlei He	1410db70b9	[CSSPGO] Add attribute metadata for context profile This change adds an attribute field for metadata of the context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in the previous build. This will be used to help estimate inlining and will be taken into account when trimming context. Changes for that in llvm-profgen will follow. It will also help tuning. Differential Revision: https://reviews.llvm.org/D98823	2021-03-18 22:00:56 -07:00
Andrew Savonichev	e6ce0db378	[MCA] Ensure that writes occur in-order Delay the issue of a new instruction if that leads to out-of-order commits of writes. This patch fixes the problem described in: https://bugs.llvm.org/show_bug.cgi?id=41796#c3 Differential Revision: https://reviews.llvm.org/D98604	2021-03-18 17:10:20 +03:00
Chen Zheng	12824266c7	[NFC] make XCOFF dwarf dump test run only on PowerPC target.	2021-03-17 21:59:47 -04:00
Chen Zheng	d33b016ada	[XCOFF][llvm-dwarfdump] llvm-dwarfdump support for XCOFF Author: hubert.reinterpretcast, shchenz Reviewed By: jasonliu, echristo Differential Revision: https://reviews.llvm.org/D97186	2021-03-17 21:21:51 -04:00
Eric Astor	1236dbc2fa	[ms] [llvm-ml] Allow the /Zs parameter as a synonym for -filetype=null For ml.exe, /Zs implies a syntax check with no output files. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D90061	2021-03-17 12:18:43 -04:00
Fangrui Song	8fbedb6b90	[llvm-nm] Add --format=just-symbols and make --just-symbol-name its alias https://sourceware.org/bugzilla/show_bug.cgi?id=27487 binutils will have --format=just-symbols/-j as well. Arbitrarily prefer `-j` to `--format=sysv`. Previously `--format=sysv -j` prints in the sysv format while `-j` takes precedence over other formats. Differential Revision: https://reviews.llvm.org/D98569	2021-03-16 10:07:01 -07:00
David Zarzycki	0fda5e8441	[llvm-exegesis testing] Workaround unreliable test Picking an instruction at random is not perfectly reliable.	2021-03-16 08:00:14 -04:00
wlei	dddd590fd0	[CSSPGO][llvm-profgen] Fix getCanonicalFnName usage in llvm-profgen Previously we didn't support keeping the unique linkage name (-funique-internal-linkage-name) in llvm-profgen. As discussed in https://reviews.llvm.org/D96932, we chose to canonicalize it. Now that "selected" is set as the default parameter of getCanonicalFnName in `D96932`, we don't need to add any attribute here for the previous usage and only need to fix the missing usage in the pseudo probe decoding. Differential Revision: https://reviews.llvm.org/D98226	2021-03-15 21:00:42 -07:00
Wael Yehia	c05990a0cc	[PATCH] fix location of test case from D97507.	2021-03-15 09:34:24 -04:00
Johannes Doerfert	cd1bd6e587	[Utils] Check for more global information in update_test_checks This allows to check for various globals (metadata/attributes/...) and also resolves problems with globals (metadata/attributes/...) being reused across different prefixes. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D94741	2021-03-11 23:31:16 -06:00
David Blaikie	7906c0309b	Move (llvm-original-di-preservation) test example output into the Inputs directory (since it's an input to the test execution) The "Inputs" subdirectory is used for all files read by the test, not only those used as input to the execution - so even though this file is used as a golden reference for the output of the test, it's still an input to the test execution (it is read in the process of executing the test).	2021-03-11 17:36:33 -08:00
Jay Foad	7340fd6886	[MCA] Support in-order CPUs with MicroOpBufferSize=1 Differential Revision: https://reviews.llvm.org/D98356	2021-03-11 10:12:54 +00:00
Djordje Todorovic	9f41c03f82	[Debugify][OriginalDIMode] Export the report into JSON file Using the original-di check with debugify in combination with llvm/utils/llvm-original-di-preservation.py makes it a very user-friendly tool. An example of the HTML page with the issues related to debug info can be found at [0]. [0] https://djolertrk.github.io/di-checker-html-report-example/ Differential Revision: https://reviews.llvm.org/D82546	2021-03-11 01:11:13 -08:00
Zequan Wu	8d5c3ae357	Revert "[llvm-cov] reset executation count to 0 after wrapped segment" This reverts D85036 Differential Revision: https://reviews.llvm.org/D98084	2021-03-09 14:47:32 -08:00
Alexander Shaposhnikov	ede56e5127	[llvm-objcopy][MachO] Add support for --keep-undefined This diff introduces --keep-undefined in llvm-objcopy/llvm-strip for Mach-O which makes the tools preserve undefined symbols. Test plan: make check-all Differential revision: https://reviews.llvm.org/D97040	2021-03-08 18:57:25 -08:00
Alexander Shaposhnikov	5f2f84a68a	[llvm-objdump][MachO] Add support for dumping function starts Add support for dumping function starts for Mach-O binaries. Test plan: make check-all Differential revision: https://reviews.llvm.org/D97027	2021-03-08 18:44:44 -08:00
Rahman Lavaee	c245c21c43	[llvm-readelf] Support dumping the BB address map section with --bb-addr-map. This patch lets llvm-readelf dump the content of the BB address map section in the following format: ``` Function { At: <address> BB entries [ { Offset: <offset> Size: <size> Metadata: <metadata> }, ... ] } ... ``` Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D95511	2021-03-08 16:20:11 -08:00
wlei	c460ef61d6	[CSSPGO][llvm-profgen] Change sample count of dangling probe in llvm-profgen Differential Revision: https://reviews.llvm.org/D96811	2021-03-08 14:36:02 -08:00
Hongtao Yu	e68fafa49f	[CSSPGO] llvm-profdata support for CS profile. Context-sensitive AutoFDO profile has a different name scheme where full calling contexts are encoded as function names. When processing a CS profile, llvm-profdata should use full contexts instead of leaf function names. Reviewed By: wmi, wenlei, wlei Differential Revision: https://reviews.llvm.org/D97998	2021-03-08 09:04:40 -08:00
Keith Smiley	64240f8138	llvm-nm: add flag to suppress no symbols warning This spelling matches binutils https://sourceware.org/bugzilla/show_bug.cgi?id=27408 Differential Revision: https://reviews.llvm.org/D83152	2021-03-07 16:20:13 -08:00
Abhina Sreeskantharajan	c52fe0b021	[test] Use host platform specific error message substitution in lit tests This patch uses the errno python library to print out the correct error messages instead of hardcoding the error message per platform. Reviewed By: jhenderson, ASDenysPetrov Differential Revision: https://reviews.llvm.org/D97472	2021-03-05 07:21:53 -05:00
James Henderson	076698154a	[llvm-objcopy] Fix crash for binary input files with non-ascii names The code was using the standard isalnum function, which doesn't handle values outside the ASCII range. Switching to using llvm::isAlnum instead ensures we don't provoke undefined behaviour, which can in some cases result in crashes. Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D97663	2021-03-05 08:57:40 +00:00
James Henderson	47c343d768	[llvm-objcopy][test] Fix test that could have passed spuriously The test was showing that when --strip-unneeded is specified for an executable, all the symbols are stripped. However, the set of symbols used in the test would be stripped by --strip-unneeded for an ET_REL object too. Fix this by adding additional symbols that aren't normally stripped by --strip-unneeded. Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D97664	2021-03-05 08:57:39 +00:00
Haowei Wu	db06088d63	[llvm-ifs] Add option to use InterfaceStub library This change adds '-use-interfacestub' option to allow llvm-ifs to use InterfaceStub lib when generating ELF binary. Differential Revision: https://reviews.llvm.org/D94461	2021-03-04 11:28:49 -08:00
James Henderson	1562e4552c	[llvm-objcopy][llvm-strip][test] Improve testing This patch adds a number of new test cases that cover various llvm-objcopy and llvm-strip features that had missing test coverage of various descriptions: * --add-section - checked the shdr properties, not just the content. * Dedicated test case for --add-symbol when there are many sections. * Show that --change-start accepts negative values without overflow. This was previously present but got lost between review versions. * --dump-section - show that multiple sections can be dumped simultaneously to different files, and that an error is reported when a section cannot be found. * --globalize-symbol(s) - show that symbols that are not mentioned are not globalized, if they would otherwise be, and that missing symbols from the list do not cause problems. * --keep-global-symbol - show that the --regex option can be used in conjunction with this option. * --keep-symbol - show that the --regex option can be used in conjunction with this option. * --localize-symbol(s) - show that symbols that are not mentioned are not localized, if they would otherwise be, and that missing symbols from the list do not cause problems. * --prefix-alloc-sections - show the behaviour of an empty string argument and multiple arguments. * --prefix-symbols - show the behaviour of an empty string argument and multiple arguments. Also show the option applies to undefined symbols. * --redefine-symbol - show that symbols with no name can be renamed, that it is not an error if a symbol is not specified, and that the option doesn't chain (i.e. --redefine-sym a=b --redefine-sym b=c does not redefine a as c). * --rename-section - show that all section flags are preserved if none are specified. Also show that the option does not chain. * --set-section-alignment - show that only specified sections have their alignments changed. * --set-section-flags - show which section flags are preserved when this option is used. Also show that unspecified sections are not affected. * --preserve-dates - show that -p is an alias of --preserve-dates. * --strip-symbol - show that --regex works with this option for llvm-objcopy as well as llvm-strip. * --strip-unneeded-symbol(s) - show more clearly that needed symbols are not stripped even if requested by this option. * --allow-broken-links - show the sh_link of a symbol table is set to 0 when its string table has been removed when this option is specified. * --weaken-symbol(s) - show that symbols that are not mentioned are not weakened, if they would otherwise be, and that missing symbols from the list do not cause problems. * --wildcard - show the wildcard behaviour for several options that were previously unchecked. Reviewed by: alexshap Differential Revision: https://reviews.llvm.org/D97666	2021-03-04 11:32:27 +00:00
Oliver Stannard	aac056c528	[objdump][ARM] Use correct offset when printing ARM/Thumb branch targets llvm-objdump only uses one MCInstrAnalysis object, so if ARM and Thumb code is mixed in one object, or if an object is disassembled without explicitly setting the triple to match the ISA used, then branch and call targets will be printed incorrectly. This could be fixed by creating two MCInstrAnalysis objects in llvm-objdump, like we currently do for SubtargetInfo. However, I don't think there's any reason we need two separate sub-classes of MCInstrAnalysis, so instead these can be merged into one, and the ISA determined by checking the opcode of the instruction. Differential revision: https://reviews.llvm.org/D97766	2021-03-04 11:15:57 +00:00
Andrew Savonichev	d791695cb5	[MCA] Add support for in-order CPUs This patch adds a pipeline to support in-order CPUs such as ARM Cortex-A55. In-order pipeline implements a simplified version of Dispatch, Scheduler and Execute stages as a single stage. Entry and Retire stages are common for both in-order and out-of-order pipelines. Differential Revision: https://reviews.llvm.org/D94928	2021-03-04 14:08:19 +03:00
James Henderson	8bb74d16ef	[llvm-objcopy/strip] Fix off-by-one error in SYMTAB_SHNDX need check The check for whether an extended symbol index table was required dropped the first SHN_LORESERVE sections from the sections array before checking whether the remaining sections had symbols. Unfortunately, the null section header is not present in this list, so the check was skipping the first section that might be important. If that section contained a symbol, and no subsequent ones did, the .symtab_shndx section would not be emitted, leading to a corrupt object. Also consolidate and expand test coverage in the area to cover this bug and other aspects of the SYMTAB_SHNDX section. Reviewed by: alexshap, MaskRay Differential Revision: https://reviews.llvm.org/D97661	2021-03-04 10:23:45 +00:00

5281 Commits