Commit Graph

1151 Commits

Author SHA1 Message Date
Jan Svoboda 622354a522 [llvm][ADT] Implement `BitVector::{pop_,}back`
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.

Currently, some users of `std::vector<bool>` cannot switch to `llvm::BitVector` because it doesn't implement the `pop_back()` and `back()` functions.

To enable easy transition of `std::vector<bool>` users, this patch implements `llvm::BitVector::pop_back()` and `llvm::BitVector::back()`.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D117115
2022-01-21 14:50:53 +01:00
Juergen Ributzka 3025c3eded Replace PlatformKind with PlatformType.
The PlatformKind/PlatformType enums contain the same information, which requires
them to be kept in-sync. This commit changes over to PlatformType as the sole
source of truth, which allows the removal of the redundant PlatformKind.

The majority of the changes were in LLD and TextAPI.

Reviewed By: cishida

Differential Revision: https://reviews.llvm.org/D117163
2022-01-13 09:23:49 -08:00
Leonard Grey 0f85393004 [MachO] Port call graph profile section and directive
This ports the `.cg_profile` assembly directive and call graph profile section
generation to MachO from COFF/ELF. Due to MachO section naming rules, the
section is called `__LLVM,__cg_profile` rather than `.llvm.call-graph-profile`
as in COFF/ELF. Support for llvm-readobj is included to facilitate testing.

Corresponding LLD change is D112164

Differential Revision: https://reviews.llvm.org/D112160
2022-01-12 09:22:26 -05:00
Kazu Hirata b932bdf59f [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-07 17:45:09 -08:00
Kazu Hirata e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887e.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Kazu Hirata fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Fangrui Song c6bf71363a [ELFAsmParser] Optimize hasPrefix with StringRef::consume_front 2021-12-30 00:16:03 -08:00
Sami Tolvanen 9a74c753fe [ThinLTO][MC] Use conditional assignments for promotion aliases
Inline assembly refererences to static functions with ThinLTO+CFI were
fixed in D104058 by creating aliases for promoted functions. Creating
the aliases unconditionally resulted in an unexpected size increase in
a Chrome helper binary:

https://bugs.chromium.org/p/chromium/issues/detail?id=1261715

This is caused by the compiler being unable to drop unused code now
referenced by the alias in module-level inline assembly. This change
adds a .set_conditional assembly extension, which emits an assignment
only if the target symbol is also emitted, avoiding phantom references
to functions that could have otherwise been dropped.

This is an alternative to the solution proposed in D112761.

Reviewed By: pcc, nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D113613
2021-12-10 12:21:37 -08:00
Phoebe Wang d7c07f60b3 [X86][MS-InlineAsm] Make the constraint *m to be simple place holder
D113096 solved the "undefined reference to xxx" issue by adding
constraint *m for the global var. But it has strong side effect due to
the symbol in the assembly being replaced with constraint variable.
This leads to some lowering fails. https://godbolt.org/z/h3nWoerPe

This patch fix the problem by use the constraint *m as place holder
rather than real constraint. It has negligible effect for the existing
code generation.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D115225
2021-12-10 09:29:38 +08:00
Tobias Burnus c01c62c76c [MC][ELF] Fix accepting abbreviated form with Type change
Follow up to D92052 and D94072, exposed due to D107707

Many assemblers to permit that only the first .section contains all
the attributes like '.lds_bss,"w",@nobits' and later section only
use the name ('.lds_bss') inheriting those attributes from the first
section.  I turned out that the case that Type changed was missed
when implementing it - and D107707 make it much more likely to hit
that issue. That's fixed by this commit.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114717
2021-11-30 14:45:26 +00:00
Kazu Hirata d14d7068b6 [llvm] Use StringRef::contains (NFC) 2021-10-23 08:45:27 -07:00
Anirudh Prasad fb99424a6f [SystemZ][z/OS] Introduce initial support for GOFF asm parser
- Introduce a skeleton outline for the GOFFAsmParser
- Before instantiating AsmParser/HLASMAsmParser, target specific asm parsers are attempted to be initialized first before proceeding. If it doesn't exist for a particular file type, we report a fatal error.
- This patch allows to properly instantiate the HLASMAsmParser on z/OS, and ensures we can write lit tests and unit tests which will involve the instantiation of asm parsers, without an assert / fatal error.

Reviewed By: uweigand, Kai

Differential Revision: https://reviews.llvm.org/D110730
2021-10-01 10:29:14 -04:00
Peter Smith b026ce9c8a [MC] Add Subtarget for MAsmParser call to emitCodeAlignment
The call to emitCodeAlignment was missing a STI which is required
after D45962.

emitCodeAlignment has a default parameter of 0 for MaxBytesToEmit.
Explicitly passing 0 here was interpreted as as nullptr for the STI.
This could possibly be avoided by taking STI as a const reference in
emitCodeAlignment.

Differential Revision: https://reviews.llvm.org/D109425
2021-09-08 13:28:24 +01:00
Peter Smith 5e71839f77 [MC] Add MCSubtargetInfo to MCAlignFragment
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.

There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.

For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.

This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.

Note that involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.

Differential Revision: https://reviews.llvm.org/D45961
2021-09-07 15:46:19 +01:00
Tozer 5c6f748cbc [MCParser] Correctly handle CRLF line ends when consuming line comments
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983

The AsmLexer currently has an issue with lexing line comments in files
with CRLF line endings, in which it reads the carriage return as being
part of the line comment. This causes an error for certain valid comment
layouts; this patch fixes this by excluding the carriage return from the
line comment.

Differential Revision: https://reviews.llvm.org/D90234
2021-08-17 15:52:51 +01:00
Simon Atanasyan 990e8025b5 [MC][ELF] Do not error on parsing .debug_* section directive for MIPS
MIPS .debug_* sections should have SHT_MIPS_DWARF section type to
distinguish among sections contain DWARF and ECOFF debug formats, but in
assembly files these sections have SHT_PROGBITS (@progbits) type. Now
assembler shows 'changed section type for ...' error when parsing
`.section .debug_*,"",@progbits` directive for MIPS targets.

The same problem exists for x86-64 target and this patch extends
workaround implemented in D76151. The patch adds one more case
when assembler ignores section types mismatch after `SwitchSection()`
call.

Differential Revision: https://reviews.llvm.org/D107707
2021-08-09 08:54:56 +03:00
Arthur Eubanks ad25344620 [MC][CodeGen] Emit constant pools earlier
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.

We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.

Fixes PR51208

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107314
2021-08-03 20:55:31 -07:00
Anirudh Prasad a8cfa4b9bd [SystemZ][z/OS] Initial code to generate assembly files on z/OS
- This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target.
- Only the .text and the .bss sections are added for now.
- The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections.
- This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target
- Further improvements and additions will be made in future patches.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D106380
2021-07-27 11:29:15 -04:00
Eric Astor a4e964a282 [ms] [llvm-ml] Fix macro case-insensitivity
We previously had issues identifying macros not registered with a lowercase name.

Reviewed By: mstorsjo, thakis

Differential Revision: https://reviews.llvm.org/D106453
2021-07-22 15:50:52 -04:00
Eric Astor 5fba605896 [ms] [llvm-ml] Support built-in text macros
Add support for all built-in text macros supported by ML64:
@Date, @Time, @FileName, @FileCur, and @CurSeg.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D104965
2021-07-21 11:44:09 -04:00
Eric Astor 4cbb912d75 [ms] [llvm-ml] Add support for numeric built-in symbols
Support @Version and @Line as built-in symbols. For now, resolves @Version to 1427 (the same as for the VS 2019 release of ML.EXE).

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D104964
2021-07-21 11:43:07 -04:00
Simon Tatham e49985bb60 Remove unused parameter from parseMSInlineAsm.
No implementation uses the `LocCookie` parameter at all. Errors are
reported from inside that function by `llvm::SourceMgr`, and the
instance of that at the clang call site arranges to pass the error
messages back to a `ClangAsmParserCallback`, which is where the clang
SourceLocation for the error is computed.

(This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.
But this particular change seems beneficial in its own right.)

Reviewed By: miyuki

Differential Revision: https://reviews.llvm.org/D105490
2021-07-12 15:07:03 +01:00
Eric Astor 678211de6d [ms] [llvm-ml] Standardize blocking of lexical substitution
In MASM, the ifdef family of directives treats its argument literally, without expanding it as a text macro. Add support for this, and also replace the special handling that was previously used for echo.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D104196
2021-07-02 14:17:37 -04:00
Jinsong Ji bf64210fd8 [AIX] Add dummy XCOFF MCAsmParserExtension
Implement XCOFFMCAsmParser so that we can use MC to parse inline asm.

The directives and storage mapping classes will be added later
iteratively.

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D105259
2021-07-02 16:12:21 +00:00
Jonas Paulsson 7aef99351a [MCStreamer] Move emission of attributes section into MCELFStreamer
Enable the emission of a GNU attributes section by reusing the code for
emitting the ARM build attributes section.

The GNU attributes follow the exact same section format as the ARM
BuildAttributes section, so this can be factored out and reused for GNU
attributes generally.

The immediate motivation for this is to emit a GNU attributes section for the
vector ABI on SystemZ (https://reviews.llvm.org/D105067).

Review: Logan Chien, Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D102894
2021-06-30 16:00:27 -05:00
Anirudh Prasad 2dca0b5a1c [AsmParser][SystemZ][z/OS] Fix hanging scenario in HLASMAsmParser class
- In the caller of the overridden `parseStatement` function (i.e. the `AsmParser::Run()`) in the case of an error **and** if we're not at the start of the statement, we "eat" up until the end of the current statement, so we don't have to process it again.
- However, in the HLASMAsmParser class what's happening is that, if an error occurs at the very start of the statement (for example, you invoke the HLASMAsmParser to parse a gnu directive), we will error out, but we never really progress in terms of the next token in the statement to parse. We simply keep looping processing the same error over and over again (partly because we're at the start of the statement)
- To remedy this, when the `parseAsHLASMLabel` function fails, before returning, we "eat" until the end of the statement function, so we don't process it anymore.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D104869
2021-06-28 12:47:08 -04:00
Eric Astor c8d0d8a8a1 [ms] [llvm-ml] Add support for ALIGN, EVEN, and ORG directives
Match ML.EXE's behavior for ALIGN, EVEN, and ORG directives both at file level and in STRUCTs.

We currently reject negative offsets passed to ORG inside STRUCTs (in ML.EXE and ML64.EXE, they wrap around as for an unsigned 32-bit integer).

Also, if a STRUCT is declared using an ORG directive, no value of that type can be defined.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92507
2021-06-25 17:19:45 -04:00
Martin Storsjö 42f74e8249 [llvm] Rename StringRef _lower() method calls to _insensitive()
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
2021-06-25 00:22:01 +03:00
Anirudh Prasad 631362665c [AsmParser][SystemZ][z/OS] Support for emitting labels in upper case
- Currently, the emitting of labels in the parsePrimaryExpr function is case independent. It just takes the identifier and emits it.
- However, for HLASM the emitting of labels is case independent. We are emitting them in the upper case only, to enforce case independency. So we need to ensure that at the time of parsing the label we are emitting the upper case (in `parseAsHLASMLabel`), but also, when we are processing a PC-relative relocatable expression, we need to ensure we emit it in upper case (in `parsePrimaryExpr`)
- To achieve this a new MCAsmInfo attribute has been introduced which corresponding targets can override if needed.

Reviewed By: abhina.sreeskantharajan, uweigand

Differential Revision: https://reviews.llvm.org/D104715
2021-06-24 12:50:11 -04:00
RamNalamothu 167e7afcd5 Implement DW_CFA_LLVM_* for Heterogeneous Debugging
Add support in MC/MIR for writing/parsing, and DebugInfo.

This is part of the Extensions for Heterogeneous Debugging defined at
https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html

Specifically the CFI instructions implemented here are defined at
https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#cfa-definition-instructions

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D76877
2021-06-14 08:51:50 +05:30
Eric Astor d81c059c3e [ms] [llvm-ml] Fix capitalization of the ignored CPU directives
These directives are matched in lowercase, so make sure to use lowercase for their P suffix.

Differential Revision: https://reviews.llvm.org/D104206
2021-06-13 18:34:42 -04:00
Eric Astor f03a3caac5 [ms] [llvm-ml] Warn on command-line redefinition
If a macro is defined on the command line and then overridden in the source code, this is likely to be an error in the user's build system. We should warn on this.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D104008
2021-06-10 14:20:21 -04:00
Eric Astor 00ebbedd1c [ms] [llvm-ml] Make variable redefinition match ML.EXE
MASM specifies that all variable definitions are redefinable, except for EQU definitions to expressions. (TEXTEQU is unspecified, but appears to be fully redefinable as well.)

Also, in practice, ML.EXE allows redefinitions where the value doesn't change.

Make variable redefinition possible for text macros, suppressing expansion if written as the first argument to an EQU or TEXTEQU directive.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D103993
2021-06-10 08:36:15 -04:00
Eric Astor c8d6e67d53 [ms] [llvm-ml] Fix parity errors in error handling for INCLUDE directive
Also adds basic testing for "include" directive.

Differential Revision: https://reviews.llvm.org/D103980
2021-06-09 13:34:36 -04:00
Eric Astor dc0c3fe5f3 [ms] [llvm-ml] Disambiguate size directives and variable declarations
MASM allows statements of the form:
	<VAR> DWORD 5
to declare a variable with name <VAR>, while:
	call dword ptr [<value>]
is a valid instruction. To disambiguate, we recognize size directives by the trailing "ptr" token.

As discussed in https://lists.llvm.org/pipermail/llvm-dev/2021-May/150774.html

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D103257
2021-06-08 15:44:31 -04:00
Anirudh Prasad e52007cac4 [SystemZ][z/OS] Stricter condition for HLASM class instantiation
- A lot of lit tests simply specify the arch minus the triple. On z/OS, this could result in a scenario of some-other-triple-unknown-ibm-zos. This points to an incorrect triple + arch combo.
- To prevent this, isOSzOS change is switched in favour of isOSBinFormatGOFF.
- This is because, the GOFF format is set only if the triple is systemz and if the operating system is GOFF. And currently, there are no other architectures/os's using the GOFF file format.
- An argument could be made that the problematic tests be fixed to explicitly specify the arch-vendor-triple string, but there's a large number of these tests, and adding this stricter scope ensures that we aren't instantiating the incorrect instance of the AsmParser for other platforms when run on z/OS.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D103343
2021-06-01 15:56:50 -04:00
Anirudh Prasad b8dcd920ec [AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 2
- This patch is the second (and hopefully final) part of providing HLASM syntax for inline asm statements for z/OS to LLVM (continuing on from https://reviews.llvm.org/D98276)
- This second part deals with providing label support
- As mentioned in https://reviews.llvm.org/D98276, if the first token is not a space we process the first token as a label, and the remaining tokens as a possible machine instruction
- To achieve this, a new `parseAsHLASMLabel` function is introduced. This function processes the first token, validates whether it is an "acceptable" label according to HLASM standards, and then emits it
- After handling and emitting the label, call the `parseAsMachineInstruction` instruction to process the remaining tokens as a machine instruction.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D103320
2021-05-31 11:27:02 -04:00
Anirudh Prasad f076da66b9 [AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1
- This patch (is one in a series of patches) which introduces HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html | main RFC here ]])
- This patch in particular introduces HLASM Parser support for Z machine instructions.
- The approach taken here was to subclass `AsmParser`, and make various functions and variables as "protected" wherever appropriate.
- The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well.

The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf | HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format):

```
<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
```

1. TokA is referred to as the Name Entry. This token is optional
2. TokB is referred to as the Operation Entry. This token is mandatory.
3. TokC is referred to as the Operand Entry. This token is mandatory
4. TokD is referred to as the Remarks Entry. This token is optional

- If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), Tok B as the Operation Entry and so on.
- If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction and TokD as a possible Remark field
- TokC (Operand Entry), no spaces are allowed between OperandEntries. If a space occurs it is classified as an error.
- TokD if provided is taken as is, and emitted as a comment.

The following additional approach was examined, but not taken:

- Adding custom private only functions to base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of non-use to every other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would also have the same disadvantage.

Testing:

- This patch doesn't have tests added with it, for the sole reason that MCStreamer Support and Object File support hasn't been added for the z/OS target (yet). Hence, it's not possible generate code outright for the z/OS target. They are in the process of being committed / process of being worked on.

- Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.

Reviewed By: Kai, uweigand

Differential Revision: https://reviews.llvm.org/D98276
2021-05-19 11:05:30 -04:00
Sam Clegg 3041b16f73 [WebAssembly] Add TLS data segment flag: WASM_SEG_FLAG_TLS
Previously the linker was relying solely on the name of the segment
to imply TLS.

Differential Revision: https://reviews.llvm.org/D102202
2021-05-12 13:31:02 -07:00
Sam Clegg 3b8d2be527 Reland: "[lld][WebAssembly] Initial support merging string data"
This change was originally landed in: 5000a1b4b9
It was reverted in: 061e071d8c

This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.

Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)

Like the ELF linker merging is only performed at `-O1` and above.

This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)

Differential Revision: https://reviews.llvm.org/D97657
2021-05-10 16:03:38 -07:00
Nico Weber 061e071d8c Revert "[lld][WebAssembly] Initial support merging string data"
This reverts commit 5000a1b4b9.
Breaks tests, see https://reviews.llvm.org/D97657#2749151

Easily repros locally with `ninja check-llvm-mc-webassembly`.
2021-05-10 18:28:28 -04:00
Sam Clegg 5000a1b4b9 [lld][WebAssembly] Initial support merging string data
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.

Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)

Like the ELF linker merging is only performed at `-O1` and above.

This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)

Differential Revision: https://reviews.llvm.org/D97657
2021-05-10 13:15:12 -07:00
Philipp Krones 632ebc4ab4 [MC] Untangle MCContext and MCObjectFileInfo
This untangles the MCContext and the MCObjectFileInfo. There is a circular
dependency between MCContext and MCObjectFileInfo. Currently this dependency
also exists during construction: You can't contruct a MOFI without a MCContext
without constructing the MCContext with a dummy version of that MOFI first.
This removes this dependency during construction. In a perfect world,
MCObjectFileInfo wouldn't depend on MCContext at all, but only be stored in the
MCContext, like other MC information. This is future work.

This also shifts/adds more information to the MCContext making it more
available to the different targets. Namely:

- TargetTriple
- ObjectFileType
- SubtargetInfo

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101462
2021-05-05 10:03:02 -07:00
Anirudh Prasad ae2aef1361 [AsmParser][SystemZ][z/OS] Reject character and string literals for HLASM
- As per the HLASM support we are providing, i.e. support only for the first parameter of the inline asm block, only pertaining to Z machine instructions defined in LLVM, character literals and string literals are not supported (see Figure 4 - https://www-01.ibm.com/servers/resourcelink/svc00100.nsf/pages/zOSV2R3sc264940/$file/asmr1023.pdf for more information)
- This patch explicitly rejects the usage of char literals and string literals (for example "abc 'a'") when the relevant field is set
- This is achieved by introducing a field called `LexHLASMStrings` in MCAsmLexer similar to `LexMasmStrings`

Reviewed By: abhina.sreeskantharajan, Kai

Differential Revision: https://reviews.llvm.org/D101660
2021-05-05 10:21:55 -04:00
Fangrui Song 7cac6a9d7a [MC] Add MCAsmParser::parseComma to improve diagnostics
llvm-mc will error "expected comma" instead of "unexpected token".
2021-05-04 14:13:19 -07:00
Fangrui Song 7b1e1fccb0 [MC] Don't capitalize a floating point diagnostic 2021-05-04 13:40:26 -07:00
Fangrui Song 3d473ae72e [MC] Remove unneeded "in '.xxx' directive" from diagnostics
The directive name is not useful because the next line replicates the error line
which includes the directive.
2021-05-04 13:30:29 -07:00
Anirudh Prasad ca02fab7e7 [AsmParser][SystemZ][z/OS] Implement HLASM location counter syntax ("*") for Z PC-relative instructions.
- This patch attempts to implement the location counter syntax (*) for the HLASM variant for PC-relative instructions.
- In the HLASM variant, for purely constant relocatable values, we expect a * token preceding it, with special support for " *" which is parsed as "<pc-rel-insn 0>"
- For combinations of absolute values and relocatable values, we don't expect the "*" preceding the token.

When you have a " * "  what’s accepted is:

```
*<space>.*{.*} -> <pc-rel-insn> 0
*[+|-][constant-value] -> <pc-rel-insn> [+|-]constant-value
```

When you don’t have a " * " what’s accepted is:

```
brasl  1,func           is allowed (MCSymbolRef type)
brasl  1,func+4         is allowed (MCBinary type)
brasl  1,4+func         is allowed (MCBinary type)
brasl  1,-4+func        is allowed (MCBinary type)
brasl  1,func-4         is allowed (MCBinary type)
brasl  1,*func          is not allowed (* cannot be used for non-MCConstantExprs)
brasl  1,*+func         is not allowed (* cannot be used for non-MCConstantExprs)
brasl  1,*+func+4       is not allowed (* cannot be used for non-MCConstantExprs)
brasl  1,*+4+func       is not allowed (* cannot be used for non-MCConstantExprs)
brasl  1,*-4+8+func     is not allowed (* cannot be used for non-MCConstantExprs)
```

Reviewed By: Kai

Differential Revision: https://reviews.llvm.org/D100987
2021-05-03 14:58:24 -04:00
Anirudh Prasad ded0a70aeb [AsmParser][SystemZ][z/OS] Reject "Dot" as current PC on z/OS
- Currently, the "." (Dot) character, when not identifying an Identifier or a Constant, refers to the current PC (Program Counter)
- However, in z/OS, for the HLASM dialect, it strictly accepts only the "*" as the current PC (Support for this will be put up in a follow-up patch)
- The changes in this patch allow individual platforms to choose whether they would like to use the "." (Dot) character as a marker for the current PC or not.
- It is achieved by introducing a new field in MCAsmInfo.h called `DotIsPC` (similar to `DollarIsPC`)

Reviewed By: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D100975
2021-04-29 11:58:54 -04:00
Anirudh Prasad 07b0a72d8e [AsmParser][SystemZ][z/OS] Use updated framework in AsmLexer to accept special tokens as Identifiers
- Previously, https://reviews.llvm.org/D99889 changed the framework in the AsmLexer to treat special tokens, if they occur at the start of the string, as Identifiers.
- These are used by the MASM Parser implementation in LLVM, and we can extend some of the changes made in the previous patch to SystemZ.
- In SystemZ, the special "tokens" referred to here are "_", "$", "@", "#". [_|$|@|#] are already supported as "part" of an Identifier.
- The changes in this patch ensure that these special tokens, when they occur at the start of the Identifier, are treated as Identifiers.

Reviewed By: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D100959
2021-04-28 15:43:24 -04:00
Anirudh Prasad 8f6185c713 [AsmParser][ms][X86] Fix possible misbehaviour in parsing of special tokens at start of string.
- Previously, https://reviews.llvm.org/D72680 introduced a new attribute called `AllowSymbolAtNameStart` (in relation to the MAsmParser changes) in `MCAsmInfo.h` which (according to the comment in the header) allows the following behaviour:

```
  /// This is true if the assembler allows $ @ ? characters at the start of
  /// symbol names. Defaults to false.
```

- However, the usage of this field in AsmLexer.cpp doesn't seem completely accurate* for a couple of reasons.

```
  default:
    if (MAI.doesAllowSymbolAtNameStart()) {
      // Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]*
      if (!isDigit(CurChar) &&
          isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
                           AllowHashInIdentifier))
        return LexIdentifier();
    }
```

1. The Dollar and At tokens, when occurring at the start of the string, are treated as separate tokens (AsmToken::Dollar and AsmToken::At respectively) and not lexed as an Identifier.
2. I'm not too sure why `MAI.doesAllowAtInName()` is used when `AllowAtInIdentifier` could be used. For X86 platforms, afaict, this shouldn't be an issue, since the `CommentString` attribute isn't "@". (alternatively the call to the setter can be set anywhere else as needed). The `AllowAtInName` does have an additional important meaning, but in the context of AsmLexer, shouldn't mean anything different compared to `AllowAtInIdentifier`

My proposal is the following:

- Introduce 3 new fields called `AllowQuestionTokenAtStartOfString`, `AllowDollarTokenAtStartOfString` and `AllowAtTokenAtStartOfString` in MCAsmInfo.h which will encapsulate the previously documented behaviour of "allowing $, @, ? characters at the start of symbol names")
- Introduce these fields where "$", "@" are lexed, and treat them as identifiers depending on whether `Allow[Dollar|At]TokenAtStartOfString` is set.
- For the sole case of "?", append it to the existing logic for treating a "default" token as an Identifier.

z/OS (HLASM) will also make use of some of these fields in follow up patches.

completely accurate* - This was based on the comments and the intended behaviour the code. I might have completely misinterpreted it, and if that is the case my sincere apologies. We can close this patch if necessary, if there are no changes to be made :)

Depends on https://reviews.llvm.org/D99374

Reviewed By: Jonathan.Crowther

Differential Revision: https://reviews.llvm.org/D99889
2021-04-21 10:21:09 -04:00
Anirudh Prasad 6ddd8c28b7 [AsmParser][SystemZ][z/OS] Add support to AsmLexer to accept HLASM style integers
- Add support for HLASM style integers. These are the decimal integers [0-9].
- HLASM does not support the additional prefixed integers like, `0b`, `0x`, octal integers and Masm style integers.
- To achieve this, a field `LexHLASMStyleIntegers` (similar to the `LexMasmStyleIntegers` field) is introduced in `MCAsmLexer.h` as well as a corresponding setter.

Note: This field could also go into MCAsmInfo.h. I used the previous precedent set by the `LexMasmIntegers` field.

Depends on https://reviews.llvm.org/D99286

Reviewed By: epastor

Differential Revision: https://reviews.llvm.org/D99374
2021-04-13 15:29:37 -04:00
Anirudh Prasad f7eec83932 [AsmParser][SystemZ][z/OS] Add in support to allow use of additional comment strings.
- Currently, MCAsmInfo provides a CommentString attribute, that various targets can set, so that the AsmLexer can appropriately lex a string as a comment based on the set value of the attribute.
- However, AsmLexer also supports a few additional comment syntaxes, in addition to what's specified as a CommentString attribute. This includes regular C-style block comments (/* ... */), regular C-style line comments (// .... ) and #. While I'm not sure as to why this behaviour exists, I am assuming it does to maintain backward compatibility with GNU AS (see https://sourceware.org/binutils/docs/as/Comments.html#Comments for reference)
For example:
Consider a target which sets the CommentString attribute to '*'.
The following strings are all lexed as comments.

```
"# abc" -> comment
"// abc" -> comment
"/* abc */ -> comment
"* abc" -> comment
```

- In HLASM however, only "*" is accepted as a comment string, and nothing else.
- To achieve this, an additional attribute (`AllowAdditionalComments`) has been added to MCAsmInfo. If this attribute is set to false, then only the string specified by the CommentString attribute is used as a possible comment string to be lexed by the AsmLexer. The regular C-style block comments, line comments and "#" are disabled. As a final note, "#" will still be treated as a comment, if the CommentString attribute is set to "#".

Depends on https://reviews.llvm.org/D99277

Reviewed By: abhina.sreeskantharajan, myiwanch

Differential Revision: https://reviews.llvm.org/D99286
2021-04-13 11:15:09 -04:00
LemonBoy edb18ea5a9 [AsmParser] Recognize more escaped characters between single quotes
The GNU AS manual states the following about single-character constants enclosed within single quotes:

>  Some backslash escapes apply to characters, \b, \f, \n, \r, \t, and \" with the same meaning as for strings, plus \' for a single quote.

Add two more characters to the switch handling this case to match GAS behaviour, plus a test to make sure nothing regresses.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99609
2021-04-08 09:59:37 +02:00
Abhina Sreeskantharajan 82b3e28e83 [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text
Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable.

Solution:
This patch adds two new flags

  - OF_CRLF which indicates that CRLF translation is used.
  - OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.

Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.

So this is the behaviour per platform with my patch:

z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return

The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.
```
  if (Flags & OF_CRLF)
    CrtOpenFlags |= _O_TEXT;
```

These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99426
2021-04-06 07:23:31 -04:00
Ricky Taylor 4db18d62af [M68k] Add support for Motorola literal syntax to AsmParser
These look like $00A0cf for hex and  %001010101 for binary. They are used in Motorola assembly syntax.

Differential Revision: https://reviews.llvm.org/D98519
2021-04-05 20:02:29 +01:00
Eric Astor 0499a9d688 [ms] [llvm-ml] Accept /WX to signal that warnings should be fatal.
Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it.

Also make sure that if Warning() returns true, we always treat it as an error.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92504
2021-04-02 15:13:20 -04:00
Eric Astor 15ec0ad77a [ms] [llvm-ml] Fix case-sensitivity for variables and textmacros
Make variables and text-macro references case-insensitive, to match ml.exe.

Also improve error handling for text-macro expansion.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92503
2021-04-02 14:08:02 -04:00
Anirudh Prasad 7b921a6747 [AsmParser][SystemZ][z/OS] Add in support to accept "#" as part of an Identifier token
- This patch adds in support to accept the "#" character as part of an Identifier.
- This support is needed especially for the HLASM dialect since "#" is treated as part of the valid "Alphabet" range
- The way this is done is by making use of the previous precedent set by the `AllowAtInIdentifier` field in `MCAsmLexer.h`. A new field called `AllowHashInIdentifier` is introduced.
- The static function `IsIdentifierChar` is also updated to accept the `#` character if the `AllowHashInIdentifier` field is set to true.
Note: The field introduced in `MCAsmLexer.h` could very well be moved to `MCAsmInfo.h`. I'm not opposed to it. I decided to put it in `MCAsmLexer` since there seems to be some sort of precedent already with `AllowAtInIdentifier`.

Reviewed By: abhina.sreeskantharajan, nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D99277
2021-04-01 11:24:43 -04:00
Konstantin Zhuravlyov f4ace63737 AMDGPU: Add target id and code object v4 support
- Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id)
  - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object)
    - Add kernarg_size to kernel descriptor
    - Change trap handler ABI to no longer move queue pointer into s[0:1]
  - Cleanup ELF definitions
    - Add V2, V3, V4 suffixes to make a clear distinction for code object version
    - Consolidate note names

Differential Revision: https://reviews.llvm.org/D95638
2021-03-24 11:54:05 -04:00
Anirudh Prasad 301d9261b7 [AsmParser][SystemZ][z/OS] Re-introduce HLASM comment syntax
- https://reviews.llvm.org/rGb605cfb336989705f391d255b7628062d3dfe9c3 was reverted due to sanitizer bugs in the introduced unit-test (specifically in the Address sanitizer https://lab.llvm.org/buildbot/#/builders/5/builds/5697)
- This patch attempts to rectify that, as well as re-factor parts of the test
- The issue was previously, within the `setupCallToAsmParser` function in the unit-test, `SrcMgr` was declared as a local variable. `SrcMgr` owns a unique pointer. Since the variable goes out of scope at the end of the function, the unique pointer is released.
- This patch, moves the declaration of the `SrcMgr` variable to a class field, since the scope will remain until the class's destructor is invoked (which in this case is at the end of the unit test)
- Furthermore, this patch also moves the `MCContext Ctx` declaration from a local variable instance inside a function, to a unique pointer class field. This ensures the instantiation of the MCContext remains until the tear down of the test.

Reviewed By: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D99004
2021-03-24 10:17:00 -04:00
Yuanfang Chen b4a8c0ebb6 [LTO][MC] Discard non-prevailing defined symbols in module-level assembly
This is the alternative approach to D96931.

In LTO, for each module with inlineasm block, prepend directive ".lto_discard <sym>, <sym>*" to the beginning of the inline
asm.  ".lto_discard" is both a module inlineasm block marker and (optionally) provides a list of symbols to be discarded.

In MC while emitting for inlineasm, discard symbol binding & symbol
definitions according to ".lto_disard".

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D98762
2021-03-18 15:33:42 -07:00
Anirudh Prasad 9f5da80013 Revert "[AsmParser][SystemZ][z/OS] Reland "Introduce HLASM Comment Syntax""
This reverts commit b605cfb336.

Differential Revision: https://reviews.llvm.org/D98744
2021-03-16 18:39:04 -04:00
Anirudh Prasad b605cfb336 [AsmParser][SystemZ][z/OS] Reland "Introduce HLASM Comment Syntax"
- Previously, https://reviews.llvm.org/D97703 was [[ https://reviews.llvm.org/D98543 | reverted ]] as it broke when building the unit tests when shared libs on.
- This patch reverts the "revert" and makes two minor changes
- The first is it also links in the MCParser lib when building the unittest. This should resolve the issue when building with with shared libs on and off
- The second renames the name of the unit test from `SystemZAsmLexer` to `SystemZAsmLexerTests` since the convention for unittest binaries is to suffix the name of the unit test with "Tests"

Reviewed By: Kai

Differential Revision: https://reviews.llvm.org/D98666
2021-03-16 17:11:46 -04:00
Hubert Tong 4f9cc1512d Revert "[AsmParser][SystemZ][z/OS] Introducing HLASM Comment Syntax"
This reverts commit bcdd40f802.
See https://reviews.llvm.org/D98543.
2021-03-12 14:48:00 -05:00
Anirudh Prasad bcdd40f802 [AsmParser][SystemZ][z/OS] Introducing HLASM Comment Syntax
- This patch adds in support for the ordinary HLASM comment syntax asm
  statements (Reference - Chapter 7, Comment Statements, Ordinary Comment
  Statements)
- In brief, the ordinary comment syntax if used, must begin with the "*"
  character
- To achieve this, this patch makes use of the CommentString attribute
  provided in the base MCAsmInfo class
- In the SystemZMCAsmInfo class, the CommentString attribute was set to
  "*" based on the assembler dialect
- Furthermore, a new attribute RestrictCommentString, is provided to only
  treat a string as a comment if it appears at the start of the asm
  statement. Example: "jo *-4" is valid in HLASM (jump back 4 bytes from
  current point - similar to jo -4 in gnu asm) and we don't want "*-4" to
  be treated as a comment.
- RFC for HLASM Parser support implementation: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html

Reviewed By: scott.linder, Kai

Differential Revision: https://reviews.llvm.org/D97703
2021-03-12 11:56:11 -05:00
Fangrui Song 45f949ee46 [MC] Migrate some parseToken(AsmToken::EndOfStatement, ...) to parseEOL() 2021-03-06 19:25:22 -08:00
Fangrui Song bb6732cf62 [MC] Add parseEOL() overload and migrate some parseToken(AsmToken::EndOfStatement) to parseEOL()
For many directives, the following diagnostics

* `error: unexpected token`
* `error: unexpected token in '.abort' directive"`

are replaced with `error: expected newline`.

`unexpected token` may make the user think a different token is needed.
`expected newline` is clearer about the expected token.

For `in '...' directive`, the directive name is not useful because the next line
replicates the error line which includes the directive.
2021-03-06 17:45:23 -08:00
Fangrui Song e5eb3e3836 [MC] Parse end-of-line for .addrsig & .addrsig_sym 2021-03-06 16:26:27 -08:00
Fangrui Song fd785f98aa [MC] Parse end-of-line for .cfi_* directives
Otherwise MCAsmStreamer will emit duplicate newlines.
2021-03-06 16:20:55 -08:00
Fangrui Song d96af2ed2d [MC] Support .symver *, *, remove
As a resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=25295 , GNU as
from binutils 2.35 supports the optional third argument for the .symver directive.

'remove' for a non-default version is useful:
`.symver def_v1, def@v1, remove` => def_v1 is not retained in the symbol table.
Previously the user has to strip the original symbol or specify a `local:`
version node in a version script to localize the symbol.

`.symver def, def@@v1, remove` and `.symver def, def@@@v1, remove` are supported
as well, though they are identical to `.symver def, def@@@v1`.

local/hidden are not useful so this patch does not implement them.
2021-03-06 15:23:02 -08:00
Benjamin Kramer 955365524a [MCParser] Bring back srcmanager diagnostics in AsmParser
AsmParser may have no LLVMContext attached to it, which means after
5de2d189e6 everything goes to stderr.
Restore the old behavior.
2021-03-02 13:43:03 +01:00
Yuanfang Chen 5de2d189e6 [Diagnose] Unify MCContext and LLVMContext diagnosing
The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.

* Kill LLVMContext inlineasm diagnose handler and migrate it to use
  DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
  covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
  is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
  use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
  handler; if LLVMContext is not available, MCContext uses its own default
  diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D97449
2021-03-01 15:58:37 -08:00
Fangrui Song 880c9c56c1 [MC] Allow .cfi_sections with empty section list
GNU as supports this. This mode silently ignores
.cfi_startproc/.cfi_endproc and .cfi_* in between.

Also drop a diagnostic `in '.cfi_sections' directive`: the diagnostic
already includes the line and it is clear the line is a `.cfi_sections` directive.
2021-02-25 22:29:49 -08:00
Jon Roelofs 7f6e331645 Support `#pragma clang section` directives on MachO targets
rdar://59560986

Differential Revision: https://reviews.llvm.org/D97233
2021-02-25 09:30:10 -08:00
Petr Hosek 16af973933 [MC][ELF] Support for zero flag section groups
This change introduces support for zero flag ELF section groups to LLVM.
LLVM already supports COMDAT sections, which in ELF are a special type
of ELF section groups. These are generally useful to enable linker GC
where you want a group of sections to always travel together, that is to
be either retained or discarded as a whole, but without the COMDAT
semantics. Other ELF assemblers already support zero flag ELF section
groups and this change helps us reach feature parity.

Differential Revision: https://reviews.llvm.org/D95851
2021-02-16 14:23:40 -08:00
Sam Clegg 7e7cfce0b6 [WebAssembly] Use data sections by default
This allows data sections that don't start with `.data` to be
used/created.

Without this, clang's `__attribute__((section("foo")))` would
generate assembly that would not parse.

Differential Revision: https://reviews.llvm.org/D96233
2021-02-09 11:03:06 -08:00
Fangrui Song 3e8ab54ba0 [MC] Upgrade DWARF version to 5 upon .file 0
Without `-dwarf-version`, llvm-mc uses the default `MCContext::DwarfVersion` 4.

Without `-gdwarf-N`, Clang cc1as uses `clang::driver::ToolChain::GetDefaultDwarfVersion`
which is 4 on many toolchains. Note: `clang -c` can synthesize .debug_info without -g.

There is currently a MCParser warning upon `.file 0` and MCParser errors upon
`.loc 0` if the DWARF version is less than 5. This causes friction to the
following usage:

```
clang -S -g -gdwarf-5 a.c

// MC warning due to .file 0, MC error due to .loc 0
clang -c a.s
llvm-mc -filetype=obj a.s
```

My idea is that we can just upgrade `MCContext::DwarfVersion` to 5 upon
`.file 0` to make the above commands work.

The downside is that for an explicit version `clang -c -gdwarf-4 a.s`, it can be
argued that the new behavior drops the probably intended diagnostic. I think the
downside is small because in most cases DWARF version for an assembly action
should either match the original compile action or be omitted.

Ongoing discussion taking a similar action for GNU as: https://sourceware.org/pipermail/binutils/2021-January/114980.html

Differential Revision: https://reviews.llvm.org/D94882
2021-02-02 09:41:05 -08:00
Fangrui Song 1477ed8465 [MC] Support SHF_GNU_RETAIN as section flag 'R'
On Linux target triples, GNU as sets EI_OSABI to ELFOSABI_GNU when SHF_GNU_RETAIN is used。
On `*-*-freebsd`, it usually sets EI_OSABI to ELFOSABI_FREEBSD.

GNU ld respects SHF_GNU_RETAIN only for ELFOSABI_FREEBSD/ELFOSABI_GNU.
https://sourceware.org/bugzilla/show_bug.cgi?id=27282

MC doesn't set ELFOSABI_GNU for SHF_GNU_RETAIN/STB_GNU_UNIQUE/STT_GNU_IFUNC.
MC assembled object files do not have special semantics in GNU ld.

Reviewed By: psmith

Differential Revision: https://reviews.llvm.org/D95730
2021-02-02 09:34:09 -08:00
Tobias Burnus 70ea15b889 [MC][ELF] Fix accepting abbreviated form with sh_flags and sh_entsize
Followup to D92052 as I missed an issue as shown via GCC bug https://gcc.gnu.org/PR97827, namely: (e.g.) ".rodata." implies ELF::SHF_ALLOC.

Crossref:

- D73999 / commit 75af9da755
  added for LLVM 11 a check that sh_flags and sh_entsize (and sh_type)
  changes are an error, in line with GNU assembler.

-  D92052 / commit 1deff4009e
   permitted the abbreviated form which many assemblers accept and
   GCC generates: while the first .section contains the flags and entsize,
   subsequent sections simply contain the name without repeating entsize or
   flags.

However, the latter patch missed in the check that some flags are automatically set, e.g. '.rodata." implies ELF::SHF_ALLOC.

Related https://bugs.llvm.org/show_bug.cgi?id=48201

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D94072
2021-01-28 14:54:43 +00:00
Kazu Hirata cfa241680f [llvm] Don't include StringSwitch.h where unnecessary (NFC) 2021-01-21 19:59:48 -08:00
Jian Cai 9dfeec8530 Reland "[AsmParser] make .ascii support spaces as separators"
This relands commit e0963ae274, which was
reverted on commit 82c4153e66 due to a
test failure, which turned out to be a false positive.
2021-01-14 17:51:47 -08:00
Jian Cai 82c4153e66 Revert "[AsmParser] make .ascii support spaces as separators"
This reverts commit e0963ae274. The change
breaks some GDB tests. Revert it while we investigate.
2021-01-13 14:38:22 -08:00
Kazu Hirata 1d0bc05551 [llvm] Use llvm::append_range (NFC) 2021-01-06 18:27:33 -08:00
Jian Cai e0963ae274 [AsmParser] make .ascii support spaces as separators
Currently the integrated assembler only allows commas as the separator
between string arguments in .ascii. This patch adds support to using
space as separators and make IAS consistent with GNU assembler.

Link: https://github.com/ClangBuiltLinux/linux/issues/1196

Reviewed By: nickdesaulniers, jrtc27

Differential Revision: https://reviews.llvm.org/D91460
2020-12-20 22:41:00 -08:00
Fangrui Song 01d1de8196 [MC] Reject byte alignment if larger than or equal to 2**32
This is consistent with the resolution to power-of-2 alignments.
Otherwise, emitCodeAlignment and emitValueToAlignment cannot handle alignments
larger than 2**32 and will trigger assertion failure (PR35218).

Note: GNU as as of 2.35 will use 1 for such a large byte `.align`
2020-12-20 14:17:00 -08:00
Tobias Burnus 1deff4009e [MC][ELF] Accept abbreviated form with sh_flags and sh_entsize
D73999 / commit 75af9da755
added for LLVM 11 a check that sh_flags and sh_entsize (and sh_type)
changes are an error, in line with GNU assembler.

However, GNU assembler accepts and GCC generates an abbreviated form:
while the first .section contains the flags and entsize, subsequent
sections simply contain the name without repeating entsize or flags.

Do likewise for better compatibility.

See https://bugs.llvm.org/show_bug.cgi?id=48201

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D92052
2020-12-11 16:45:45 +00:00
Hongtao Yu 705a4c149d [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 17:29:28 -08:00
Derek Schuff 8d396acac3 [WebAssembly] Support COMDAT sections in assembly syntax
This CL changes the asm syntax for section flags, making them more like ELF
(previously "passive" was the only option). Now we also allow "G" to designate
COMDAT group sections. In these sections we set the appropriate comdat flag on
function symbols, and also avoid auto-creating a new section for them.

This also adds asm-based tests for the changes D92691 to go along with
the direct-to-object tests.

Differential Revision: https://reviews.llvm.org/D92952
This is a reland of rG4564553b8d8a with a fix to the lit pipeline in
llvm/test/MC/WebAssembly/comdat.ll
2020-12-10 16:43:59 -08:00
Derek Schuff dd1aa4fdd8 Revert "[WebAssembly] Support COMDAT sections in assembly syntax"
This reverts commit 4564553b8d.
It broke several buildbots.
2020-12-10 15:55:33 -08:00
Mitch Phillips 7ead5f5aa3 Revert "[CSSPGO] Pseudo probe encoding and emission."
This reverts commit b035513c06.

Reason: Broke the ASan buildbots:
  http://lab.llvm.org:8011/#/builders/5/builds/2269
2020-12-10 15:53:39 -08:00
Mitch Phillips b955eb688d Revert "[NFC] Fix a gcc build break by using an explict constructor."
This reverts commit 248b279cf0.

Reason: Dependency of patch that broke the ASan buildbots:
  http://lab.llvm.org:8011/#/builders/5/builds/2269
2020-12-10 15:53:38 -08:00
Derek Schuff 4564553b8d [WebAssembly] Support COMDAT sections in assembly syntax
This CL changes the asm syntax for section flags, making them more like ELF
(previously "passive" was the only option). Now we also allow "G" to designate
COMDAT group sections. In these sections we set the appropriate comdat flag on
function symbols, and also avoid auto-creating a new section for them.

This also adds asm-based tests for the changes D92691 to go along with
the direct-to-object tests.

Differential Revision: https://reviews.llvm.org/D92952
2020-12-10 14:46:24 -08:00
Hongtao Yu 248b279cf0 [NFC] Fix a gcc build break by using an explict constructor. 2020-12-10 11:21:40 -08:00
Hongtao Yu b035513c06 [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 09:50:08 -08:00
Scott Linder 19c56e11fa [MC] Fix ICE with non-newline terminated input
There is an explicit option for the lexer to support this, but we crash
when `-preserve-comments` is enabled because it checks for
`getTok().getString().empty()` to detect the case. This doesn't
work currently because the lexer reports this case as a string of length
1, containing a null byte.

Change the lexer to instead report this case via an empty string, as the
null terminator isn't logically a part of the textual input, and the
check for `.empty()` seems natural and obvious in the calling code.

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D92681
2020-12-09 23:39:32 +00:00
Fangrui Song 9c53b2adc8 [MC] Delete unused declarations
Notes:

* llvm::createAsmStreamer: it has been moved to TargetRegistry.h
* (anon ns)::WasmObjectWriter::updateCustomSectionRelocations: remnant of D46335
* COFFAsmParser::ParseSEHRegisterNumber: remnant of D66625
* llvm::CodeViewContext::isValidCVFileNumber: accidentally added by r279847
2020-12-06 15:36:39 -08:00
Scott Linder d55d6806ad [MC] Consume EndOfStatement in .cfi_{sections,endproc}
Previously these directives were always interpreted as having an extra
blank line after them.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D92612
2020-12-04 22:30:29 +00:00
Eric Astor c64037b784 [ms] [llvm-ml] Support command-line defines
Enable command-line defines as textmacros

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D90059
2020-12-01 18:06:05 -05:00
Eric Astor abef659a45 [ms] [llvm-ml] Implement the statement expansion operator
If prefaced with a %, expand text macros and macro functions in any statement.

Also, prevent expanding text macros in the message of an ECHO directive unless expanded explicitly by the statement expansion operator.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89740
2020-11-30 14:33:24 -05:00