Commit Graph

10 Commits

Author SHA1 Message Date
Nikita Popov be5af50e7d [BPF] Use elementtype attribute for preserve.array/struct.index intrinsics
Use the elementtype attribute introduced in D105407 for the
llvm.preserve.array/struct.index intrinsics. It carries the
element type of the GEP these intrinsics effectively encode.

This patch:

 * Adds a verifier check that the attribute is required.
 * Adds it in the IRBuilder methods for these intrinsics.
 * Autoupgrades old bitcode without the attribute.
 * Updates the lowering code to use the attribute rather than
   the pointer element type.
 * Updates lots of tests to specify the attribute.
 * Adds -force-opaque-pointers to the intrinsic-array.ll test
   to demonstrate that these intrinsics now work with opaque pointers.
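
For context, a minimal C sketch (illustrative only, not part of the patch) of
the kind of source clang lowers to these intrinsics when targeting BPF; with
this change the emitted intrinsic calls also carry the elementtype attribute
described above:

  struct s { int a; int b; };

  /* &p->b is lowered to an llvm.preserve.struct.access.index call */
  int read_b(struct s *p) {
    return *(int *)__builtin_preserve_access_index(&p->b);
  }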

https://reviews.llvm.org/D106184
2021-07-17 11:09:18 +02:00
serge-sans-paille 4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee0.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71, having no attribute is equivalent to
setting the attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Yonghong Song 54d9f743c8 BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible
Move the AbstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.

Currently, the compiler may transform code like the following
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
    p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
    bpf_probe_read(buf, buf_size, p2);
  }
to
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    bpf_probe_read(buf, buf_size, p2);
  }
and eventually assembly code looks like
  reloc_exist = 1;
  reloc_member_offset = 10; //calculate member offset from base
  p2 = base + reloc_member_offset;
  if (reloc_exist) {
    bpf_probe_read(buf, buf_size, p2);
  }
If, during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (the field does not exist), the reloc_member_offset
relocation cannot be resolved and will be patched with an illegal
instruction. This will cause a verifier failure.
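
For reference, a minimal C sketch of the kind of source that produces the
chained calls above (struct and field names are made up for illustration):

  void bpf_probe_read(void *dst, unsigned size, const void *src);

  struct sk_buff___local {
    int len;
  } __attribute__((preserve_access_index));

  int prog(void *base, char *buf, unsigned buf_size) {
    struct sk_buff___local *skb = base;
    /* the existence check and the read walk the same member chain;
       without this patch the compiler may CSE the two chains into one */
    if (__builtin_preserve_field_info(skb->len, 2 /* FIELD_EXISTENCE */))
      bpf_probe_read(buf, buf_size, &skb->len);
    return 0;
  }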

This patch attempts to address this issue by doing chaining
analysis and replacing chains with special globals right
after clang code gen. This removes the CSE possibility
described above. The IR typically looks like
  %6 = load @llvm.sk_buff:0:50$0:0:0:2:0
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6
for a particular address computation relocation.

But this transformation has another consequence: code sinking
may happen, like below:
  PHI = <possibly different @preserve_*_access_globals>
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6

For such cases, we will not be able to generate relocations since
multiple relocations are merged into one.

This patch introduces a passthrough builtin
to prevent such optimization. Inline assembly appears to have a
bigger impact on optimization, e.g., inlining, so using a
passthrough builtin has less impact on optimizations.

A new IR pass is introduced at the beginning of target-dependent
IR optimization, which:
  - reports a fatal error if any reloc global appears in a PHI node
  - removes all bpf passthrough builtin functions

Changes for existing CORE tests:
  - for clang tests, add "-Xclang -disable-llvm-passes" flags to
    avoid the builtin->reloc_global transformation so the test is still
    able to check correctness of the clang-generated IR.
  - for llvm CodeGen/BPF tests, add an "opt -O2 <ir_file> | llvm-dis" command
    before the "llc" command since "opt" is needed to run the newly-placed
    builtin->reloc_global transformation. Add a target triple to the IR
    file since "opt" requires it.
  - Since a target triple is added to the IR file, if a test may produce
    different results for different endiannesses, two tests are
    created, one for bpfeb and another for bpfel, e.g., some tests
    for the relocation of lshift/rshift of bitfields.
  - field-reloc-bitfield-1.ll has different relocations compared to the
    old code. This is because, for the structure in the test, the
    new code computes a struct layout alignment of 4 while the old code
    computed 8. Align 8 is more precise and permits a double (8-byte) load.
    With align 4, the new mechanism uses a 4-byte load, so different
    relocations are generated.
  - test intrinsic-transforms.ll is removed. It was used to test
    CSE on intrinsics to ensure we do not lose metadata. Now that metadata is
    attached to the global rather than the instruction, it cannot be lost through CSE.

Differential Revision: https://reviews.llvm.org/D87153
2020-09-28 16:56:22 -07:00
Yonghong Song fbb64aa698 [BPF] extend BTF_KIND_FUNC to cover global, static and extern funcs
Previously, an extern function was added as BTF_KIND_VAR. This does not work
well with the existing BTF infrastructure, as functions are expected to use
BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO.

This patch adds extern functions to BTF_KIND_FUNC. The two bits 0:1
of btf_type.info are used to indicate what kind of function it is:
  0: static
  1: global
  2: extern
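
The snippet below is a sketch (not from the patch) of how a consumer could
read that linkage, assuming a simplified btf_type layout (the real uapi
header uses a union for the last word):

  #include <stdint.h>

  struct btf_type {
    uint32_t name_off;
    uint32_t info;          /* low bits: vlen/linkage; higher bits: kind, kind_flag */
    uint32_t size_or_type;
  };

  /* for BTF_KIND_FUNC, bits 0:1 of info encode the linkage:
     0 = static, 1 = global, 2 = extern */
  static inline uint32_t btf_func_linkage(const struct btf_type *t) {
    return t->info & 0x3;
  }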

Differential Revision: https://reviews.llvm.org/D71638
2020-01-10 09:06:31 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Yonghong Song a27c998c00 [BPF] fix a CO-RE issue with -mattr=+alu32
Ilya Leoshkevich (<iii@linux.ibm.com>) reported an issue that,
with -mattr=+alu32, CO-RE has a segfault in the BPF MISimplifyPatchable
pass.

The pattern to be transformed by the MISimplifyPatchable
pass looks like below:
  r5 = ld_imm64 @"b:0:0$0:0"
  r2 = ldw r5, 0
  ... r2 ... // use r2
The pass will remove the intermediate 'ldw' instruction
and replace all uses of r2 with r5, like below:
  r5 = ld_imm64 @"b:0:0$0:0"
  ... r5 ... // use r5
Later, the ld_imm64 insn will be replaced with
  r5 = <patched immediate>
for field relocation purpose.

With -mattr=+alu32, the input code may become
  r5 = ld_imm64 @"b:0:0$0:0"
  w2 = ldw32 r5, 0
  ... w2 ... // use w2
Replacing "w2" with "r5" is incorrect and will
trigger compiler internal errors.

To fix the problem, if the register class of the ldw* destination
register is sub_32, we just replace the original ldw*
instruction with:
  w2 = w5
Directly replacing all uses of w2 with an in-place
constructed w5 for the use operand does not seem to work in all cases.

The latest kernel will have -mattr=+alu32 on by default,
so this flag has been added to all CORE tests.
Tested with the latest kernel bpf-next branch as well with this patch.

Differential Revision: https://reviews.llvm.org/D69438
2019-10-25 14:27:25 -07:00
Yonghong Song d46a6a9e68 [BPF] Remove relocation for patchable externs
Previously, patchable extern relocations were introduced to patch
external variables used for multi versioning in the
compile once, run everywhere use case. The load instruction
is converted into a move with a patchable immediate
which can be changed by the bpf loader on the host.

The kernel verifier has evolved and is able to load
and propagate constant values, so this compiler relocation
becomes unnecessary. This patch removes the related code.

Differential Revision: https://reviews.llvm.org/D68760

llvm-svn: 374367
2019-10-10 15:33:09 +00:00
Yonghong Song 05e46979d2 [BPF] do compile-once run-everywhere relocation for bitfields
A bpf specific clang intrinsic is introduced:
   u32 __builtin_preserve_field_info(member_access, info_kind)
Depending on info_kind, different information will
be returned to the program. A relocation is also
recorded for this builtin so that the bpf loader can
patch the instruction on the target host.
This clang intrinsic is used to get certain information
to facilitate struct/union member relocations.

The offset relocation is extended by 4 bytes to
include the relocation kind.
The currently supported relocation kinds are
 enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
 };
for __builtin_preserve_field_info. The old
access offset relocation is covered by
    FIELD_BYTE_OFFSET = 0.
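
A sketch of the extended record layout implied above; the first three fields
follow the BPFOffsetReloc struct from the original CO-RE commit further down
in this log, and the exact names used in this patch may differ:

  struct BPFFieldReloc {
    uint32_t InsnOffset;     ///< Byte offset of the instruction in its section
    uint32_t TypeID;         ///< BTF type ID of the base struct/union
    uint32_t OffsetNameOff;  ///< String offset of the access string, e.g. "0:1"
    uint32_t RelocKind;      ///< FIELD_BYTE_OFFSET, FIELD_BYTE_SIZE, ...
  };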

An example:
struct s {
    int a;
    int b1:9;
    int b2:4;
};
enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
};

void bpf_probe_read(void *, unsigned, const void *);
int field_read(struct s *arg) {
  unsigned long long ull = 0;
  unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
  unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
 #ifdef USE_PROBE_READ
  bpf_probe_read(&ull, size, (const void *)arg + offset);
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  lshift = lshift + (size << 3) - 64;
 #endif
 #else
  switch(size) {
  case 1:
    ull = *(unsigned char *)((void *)arg + offset); break;
  case 2:
    ull = *(unsigned short *)((void *)arg + offset); break;
  case 4:
    ull = *(unsigned int *)((void *)arg + offset); break;
  case 8:
    ull = *(unsigned long long *)((void *)arg + offset); break;
  }
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #endif
  ull <<= lshift;
  if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
    return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
  return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
}

There is a minor overhead for bpf_probe_read() on big endian.

The code and relocations generated for field_read where bpf_probe_read() is
used to access argument data in little endian mode:
        r3 = r1
        r1 = 0
        r1 = 4  <=== relocation (FIELD_BYTE_OFFSET)
        r3 += r1
        r1 = r10
        r1 += -8
        r2 = 4  <=== relocation (FIELD_BYTE_SIZE)
        call bpf_probe_read
        r2 = 51 <=== relocation (FIELD_LSHIFT_U64)
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r2
        r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
        r0 = r1
        r0 >>= r2
        r3 = 1  <=== relocation (FIELD_SIGNEDNESS)
        if r3 == 0 goto LBB0_2
        r1 s>>= r2
        r0 = r1
LBB0_2:
        exit

Compared to the above code, between the FIELD_LSHIFT_U64 and
FIELD_RSHIFT_U64 relocations the code for big endian mode has four more
instructions.
        r1 = 41   <=== relocation (FIELD_LSHIFT_U64)
        r6 += r1
        r6 += -64
        r6 <<= 32
        r6 >>= 32
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r6
        r2 = 60   <=== relocation (FIELD_RSHIFT_U64)

The code and relocation generated when using direct load.
        r2 = 0
        r3 = 4
        r4 = 4
        if r4 s> 3 goto LBB0_3
        if r4 == 1 goto LBB0_5
        if r4 == 2 goto LBB0_6
        goto LBB0_9
LBB0_6:                                 # %sw.bb1
        r1 += r3
        r2 = *(u16 *)(r1 + 0)
        goto LBB0_9
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
        if r4 == 8 goto LBB0_8
        goto LBB0_9
LBB0_8:                                 # %sw.bb9
        r1 += r3
        r2 = *(u64 *)(r1 + 0)
        goto LBB0_9
LBB0_5:                                 # %sw.bb
        r1 += r3
        r2 = *(u8 *)(r1 + 0)
        goto LBB0_9
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51
        r2 <<= r1
        r1 = 60
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

Considering that the verifier is able to do limited constant
propagation following branches, the following is the
code actually traversed.
        r2 = 0
        r3 = 4   <=== relocation
        r4 = 4   <=== relocation
        if r4 s> 3 goto LBB0_3
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51   <=== relocation
        r2 <<= r1
        r1 = 60   <=== relocation
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

For the native load case, the load size is calculated to be the
same as the load width LLVM would otherwise use to load
the value from which the bitfield value is then extracted.

Differential Revision: https://reviews.llvm.org/D67980

llvm-svn: 374099
2019-10-08 18:23:17 +00:00
Yonghong Song d3d88d08b5 [BPF] Support for compile once and run everywhere
Introduction
============

This patch adds initial support for bpf program compile once
and run everywhere (CO-RE).

The main motivation is bpf programs that depend on
kernel headers, which may vary between different kernel versions.
The initial discussion can be found at https://lwn.net/Articles/773198/.

Currently, a bpf program accesses kernel internal data structures
through the bpf_probe_read() helper. The idea is to capture the
kernel data structure accesses made through bpf_probe_read()
and relocate them on different kernel versions.

On each host, right before bpf program load, the bpf loader
will look at the types of the native linux kernel through vmlinux BTF,
calculate the proper access offsets and patch the instructions.

To accommodate this, three intrinsic functions
   preserve_{array,union,struct}_access_index
are introduced; in clang they preserve the base pointer,
the struct/union/array access_index and the struct/union debuginfo type
information. Later, the bpf IR pass can reconstruct the whole gep
access chain without looking at the gep itself.

This patch does the following:
  . An IR pass is added to convert preserve_*_access_index to a
    global variable whose name encodes the getelementptr
    access pattern. The global variable has metadata
    attached to describe the corresponding struct/union
    debuginfo type.
  . A SimplifyPatchable MachineInstruction pass is added
    to remove unnecessary loads.
  . The BTF output pass is enhanced to generate relocation
    records located in the .BTF.ext section.

Typical CO-RE also needs support for global variables which can
be assigned different values on different hosts. For example,
the kernel version can be used to guard different versions of code.
This patch adds support for patchable externs as well.

Example
=======

The following is an example.

  struct pt_regs {
    long arg1;
    long arg2;
  };
  struct sk_buff {
    int i;
    struct net_device *dev;
  };

  #define _(x) (__builtin_preserve_access_index(x))
  static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) =
          (void *) 4;
  extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version;
  int bpf_prog(struct pt_regs *ctx) {
    struct net_device *dev = 0;

    // ctx->arg* does not need bpf_probe_read
    if (__kernel_version >= 41608)
      bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev));
    else
      bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev));
    return dev != 0;
  }

In the above, we want to translate the third argument of
bpf_probe_read() into relocations.

  -bash-4.4$ clang -target bpf -O2 -g -S trace.c

The compiler will generate two new subsections in .BTF.ext,
OffsetReloc and ExternReloc.
OffsetReloc records the structure member offset operations,
and ExternReloc records the external globals, where
only u8, u16, u32 and u64 are supported.

   BPFOffsetReloc Size
   struct SecOffsetReloc for ELF section #1
   A number of struct BPFOffsetReloc for ELF section #1
   struct SecOffsetReloc for ELF section #2
   A number of struct BPFOffsetReloc for ELF section #2
   ...
   BPFExternReloc Size
   struct SecExternReloc for ELF section #1
   A number of struct BPFExternReloc for ELF section #1
   struct SecExternReloc for ELF section #2
   A number of struct BPFExternReloc for ELF section #2

  struct BPFOffsetReloc {
    uint32_t InsnOffset;    ///< Byte offset in this section
    uint32_t TypeID;        ///< TypeID for the relocation
    uint32_t OffsetNameOff; ///< The string to traverse types
  };

  struct BPFExternReloc {
    uint32_t InsnOffset;    ///< Byte offset in this section
    uint32_t ExternNameOff; ///< The string for external variable
  };

Note that only externs with the attribute section ".BPF.patchable_externs"
are considered for ExternReloc and will be patched by the bpf loader
right before the load.

For the above test case, two offset records and one extern record
will be generated:
  OffsetReloc records:
        .long   .Ltmp12                 # Insn Offset
        .long   7                       # TypeId
        .long   242                     # Type Decode String
        .long   .Ltmp18                 # Insn Offset
        .long   7                       # TypeId
        .long   242                     # Type Decode String

  ExternReloc record:
        .long   .Ltmp5                  # Insn Offset
        .long   165                     # External Variable

  In string table:
        .ascii  "0:1"                   # string offset=242
        .ascii  "__kernel_version"      # string offset=165

The default member offset can be calculated as
    the offset of the 2nd member (with 0 representing the 1st member) of struct "sk_buff".

The asm code:
    .Ltmp5:
    .Ltmp6:
            r2 = 0
            r3 = 41608
    .Ltmp7:
    .Ltmp8:
            .loc    1 18 9 is_stmt 0        # t.c:18:9
    .Ltmp9:
            if r3 > r2 goto LBB0_2
    .Ltmp10:
    .Ltmp11:
            .loc    1 0 9                   # t.c:0:9
    .Ltmp12:
            r2 = 8
    .Ltmp13:
            .loc    1 19 66 is_stmt 1       # t.c:19:66
    .Ltmp14:
    .Ltmp15:
            r3 = *(u64 *)(r1 + 0)
            goto LBB0_3
    .Ltmp16:
    .Ltmp17:
    LBB0_2:
            .loc    1 0 66 is_stmt 0        # t.c:0:66
    .Ltmp18:
            r2 = 8
            .loc    1 21 66 is_stmt 1       # t.c:21:66
    .Ltmp19:
            r3 = *(u64 *)(r1 + 8)
    .Ltmp20:
    .Ltmp21:
    LBB0_3:
            .loc    1 0 66 is_stmt 0        # t.c:0:66
            r3 += r2
            r1 = r10
    .Ltmp22:
    .Ltmp23:
    .Ltmp24:
            r1 += -8
            r2 = 8
            call 4

For instructions .Ltmp12 and .Ltmp18, "r2 = 8", the number
8 is the structure member offset based on the current BTF.
The loader needs to adjust it if it changes on the host.

For instruction .Ltmp5, "r2 = 0", the external variable
gets a default value of 0; the loader needs to supply an appropriate
value for the particular host.

Compiling to generate object code and disassemble:
   0000000000000000 bpf_prog:
           0:       b7 02 00 00 00 00 00 00         r2 = 0
           1:       7b 2a f8 ff 00 00 00 00         *(u64 *)(r10 - 8) = r2
           2:       b7 02 00 00 00 00 00 00         r2 = 0
           3:       b7 03 00 00 88 a2 00 00         r3 = 41608
           4:       2d 23 03 00 00 00 00 00         if r3 > r2 goto +3 <LBB0_2>
           5:       b7 02 00 00 08 00 00 00         r2 = 8
           6:       79 13 00 00 00 00 00 00         r3 = *(u64 *)(r1 + 0)
           7:       05 00 02 00 00 00 00 00         goto +2 <LBB0_3>

    0000000000000040 LBB0_2:
           8:       b7 02 00 00 08 00 00 00         r2 = 8
           9:       79 13 08 00 00 00 00 00         r3 = *(u64 *)(r1 + 8)

    0000000000000050 LBB0_3:
          10:       0f 23 00 00 00 00 00 00         r3 += r2
          11:       bf a1 00 00 00 00 00 00         r1 = r10
          12:       07 01 00 00 f8 ff ff ff         r1 += -8
          13:       b7 02 00 00 08 00 00 00         r2 = 8
          14:       85 00 00 00 04 00 00 00         call 4

Instructions #2, #5 and #8 need relocation resolution from the loader.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D61524

llvm-svn: 365503
2019-07-09 15:28:41 +00:00