AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).
Differential Revision: https://reviews.llvm.org/D33169
llvm-svn: 303858
Summary:
This patch adds udiv/sdiv/urem/srem/udivrem/sdivrem methods that can divide by a uint64_t. This makes division consistent with all the other arithmetic operations.
This modifies the interface of the divide helper method to work on raw arrays instead of APInts. This way we can pass the uint64_t in for the RHS without wrapping it in an APInt. This required moving all the Quotient and Remainder allocation handling up to the callers. For udiv/urem this was as simple as just creating the Quotient/Remainder with the right size when they were declared. For udivrem we have to rely on reallocate not changing the contents of the variable LHS or RHS is aliased with the Quotient or Remainder APInts. We also have to zero the upper bits of Remainder and Quotient that divide doesn't write to if lhsWords/rhsWords is smaller than the width.
I've update the toString method to use the new udivrem.
Reviewers: hans, dblaikie, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33310
llvm-svn: 303431
compatible target triple
Currently, an assertion fails in ThinLTOCodeGenerator::addModule when
the target triple of the module being added doesn't match that of the
one stored in TMBuilder. This patch relaxes the constraint and makes
changes to allow target triples that only differ in their version
numbers on Apple platforms, similarly to what r228999 did.
rdar://problem/30133904
Differential Revision: https://reviews.llvm.org/D33291
llvm-svn: 303326
Often you have an array and you just want to use it. With the current
design, you have to first construct a `BinaryByteStream`, and then create
a `BinaryStreamRef` from it. Worse, the `BinaryStreamRef` holds a pointer
to the `BinaryByteStream`, so you can't just create a temporary one to
appease the compiler, you have to actually hold onto both the `ArrayRef`
as well as the `BinaryByteStream` *AND* the `BinaryStreamReader` on top of
that. This makes for very cumbersome code, often requiring one to store a
`BinaryByteStream` in a class just to circumvent this.
At the cost of some added complexity (not exposed to users, but internal
to the library), we can do better than this. This patch allows us to
construct `BinaryStreamReaders` and `BinaryStreamWriters` directly from
source data (e.g. `StringRef`, `MutableArrayRef<uint8_t>`, etc). Not only
does this reduce the amount of code you have to type and make it more
obvious how to use it, but it solves real lifetime issues when it's
inconvenient to hold onto a `BinaryByteStream` for a long time.
The additional complexity is in the form of an added layer of indirection.
Whereas before we simply stored a `BinaryStream*` in the ref, we now store
both a `BinaryStream*` **and** a `std::shared_ptr<BinaryStream>`. When
the user wants to construct a `BinaryStreamRef` directly from an
`ArrayRef` etc, we allocate an internal object that holds ownership over a
`BinaryByteStream` and forwards all calls, and store this in the
`shared_ptr<>`. This also maintains the ref semantics, as you can copy it
by value and references refer to the same underlying stream -- the one
being held in the object stored in the `shared_ptr`.
Differential Revision: https://reviews.llvm.org/D33293
llvm-svn: 303294
driver-mode recognition in clang (this is because the sysctl method
always returns one and only one executable path, even for an executable
with multiple links):
Fix DynamicLibraryTest.cpp on FreeBSD and NetBSD
Summary:
After rL301562, on FreeBSD the DynamicLibrary unittests fail, because
the test uses getMainExecutable("DynamicLibraryTests", Ptr), and since
the path does not contain any slashes, retrieving the main executable
will not work.
Reimplement getMainExecutable() for FreeBSD and NetBSD using sysctl(3),
which is more reliable than fiddling with relative or absolute paths.
Also add retrieval of the original argv[] from the GoogleTest framework,
to use as a fallback for other OSes.
Reviewers: emaste, marsupial, hans, krytarowski
Reviewed By: krytarowski
Subscribers: krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D33171
llvm-svn: 303285
We have to check gCrashRecoveryEnabled before using __try.
In other words, SEH works too well and we ended up recovering from
crashes in implicit module builds that we weren't supposed to. Only
libclang is supposed to enable CrashRecoveryContext to allow implicit
module builds to crash.
llvm-svn: 303279
Summary:
It avoids problems when other libraries raise exceptions. In particular,
OutputDebugString raises an exception that the debugger is supposed to
catch and suppress. VEH kicks in first right now, and that is entirely
incorrect.
Unfortunately, GCC does not support SEH, so I've kept the old buggy VEH
codepath around. We could fix it with SetUnhandledExceptionFilter, but
that is not per-thread, so a well-behaved library shouldn't set it.
Reviewers: zturner
Subscribers: llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D33261
llvm-svn: 303274
Since we use AddVectoredExceptionHandler, we get notified of
every exception that gets raised by a program. Sometimes these
are not necessarily errors though, and this can be especially
true when linking against a library that we have no control
over, and may raise an exception internally which it intends
to catch.
In particular, the Windows API OutputDebugString does exactly
this. It raises an exception inside of a __try / __except,
giving the debugger a chance to handle the exception to print
the message to the debug console.
But this doesn't interoperate nicely with our vectored exception
handler, which just sees another exception and decides that we
need to terminate the program.
Add a special case for this so that we ignore ODS exceptions
and continue normally.
Note that a better fix is to simply not use vectored exception
handlers and use SEH instead, but given that MinGW doesn't support
SEH, this is the only solution for MinGW.
Differential Revision: https://reviews.llvm.org/D33260
llvm-svn: 303219
Summary:
After rL301562, on FreeBSD the DynamicLibrary unittests fail, because
the test uses getMainExecutable("DynamicLibraryTests", Ptr), and since
the path does not contain any slashes, retrieving the main executable
will not work.
Reimplement getMainExecutable() for FreeBSD and NetBSD using sysctl(3),
which is more reliable than fiddling with relative or absolute paths.
Also add retrieval of the original argv[] from the GoogleTest framework,
to use as a fallback for other OSes.
Reviewers: emaste, marsupial, hans, krytarowski
Reviewed By: krytarowski
Subscribers: krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D33171
llvm-svn: 303015
We already counted the number of bits in the RHS so its pretty cheap to just check if the RHS is 1.
Differential Revision: https://reviews.llvm.org/D33154
llvm-svn: 302953
This helped the compiler generate better code for the single word case. It was able to remember that the bit width was still a single word when it created the Remainder APInt and not create code for it possibly being multiword.
llvm-svn: 302952
At this point in the code rhsWords is guaranteed to be non-zero and less than or equal to lhsWords. So if lhsWords is 1, rhsWords must also be 1. urem alread had the check removed so this makes all 3 consistent.
llvm-svn: 302930
Summary:
This adds a resize method to APInt that manages deleting/allocating storage for an APInt and changes its bit width. Use this to simplify code in copy assignment and divide.
The assignment code in particular was overly complicated. Treating every possible case as a separate implementation. I'm also pretty sure the clearUnusedBits code at the end was unnecessary. Since we always copying whole words from the source APInt. All unused bits should be clear in the source.
Reviewers: hans, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33073
llvm-svn: 302863
This time it actually occurred to me to change the #defines
to actually test the pre-processed out codepath. Hopefully
this time it works.
llvm-svn: 302752
This lets toString take advantage of the degenerate case checks in udivrem and is just generally cleaner.
One minor downside of this is that the divisor APInt now needs to be the same size as Tmp which requires an additional allocation. But we were doing a poor job of reusing allocations before so the new code should still be an improvement.
llvm-svn: 302704
The description says it returns the number of words needed to represent the results. But the way it was coded it always returns (lhsWords + rhsWords) or (lhsWords + rhsWords - 1). But the result could be even smaller than that and it wouldn't tell you.
No one uses the result today so rather than try to fix it, just remove it.
llvm-svn: 302551
The value of 'i' is always the smaller of DstParts and SrcParts so we can just use that fact to write all the code in terms of SrcParts and DstParts.
llvm-svn: 302408
Currently multiply is implemented in operator*=. Operator* makes a copy and uses operator*= to modify the copy.
Operator*= itself allocates a temporary buffer to hold the multiply result as it computes it. Then copies it to the buffer in *this.
Operator*= attempts to bound the size of the result based on the number of active bits in its inputs. It also has a couple special cases to handle 0 inputs without any memory allocations or multiply operations. The best case is that it calculates a single word regardless of input bit width. The worst case is that it calculates the a 2x input width result and drop the upper bits.
Since operator* uses operator*= it incurs two allocations, one for a copy of *this and one for the temporary allocation. Neither of these allocations are kept after the method operation is done.
The main usage in the backend appears to be ConstantRange::multiply which uses operator* rather than operator*=.
This patch moves the multiply operation to operator* and implements operator*= using it. This avoids the copy in operator*. operator* now allocates a result buffer sized the same width as its inputs no matter what. This buffer will be used as the buffer for the returned APInt. Finally, we reuse tcMultiply to implement the multiply operation. This function is capable of not calculating additional upper words that will be discarded.
This change does lose the special optimizations for the inputs using less words than their size implies. But it also removed the getActiveBits calls from all multiplies. If we think those optimizations are important we could look at providing additional bounds to tcMultiply to limit the computations.
Differential Revision: https://reviews.llvm.org/D32830
llvm-svn: 302171
Otherwise, each CPU has to manually specify the extensions it supports,
even though they have to be a superset of the base arch extensions.
And when there's redundant data there's stale data, so most of the CPUs
lie about the features they support (almost none lists AEK_FP).
Instead, do the saner thing: add the optional extensions on top of the
base extensions provided by the architecture.
The ARM TargetParser has the same behavior.
Differential Revision: https://reviews.llvm.org/D32780
llvm-svn: 302078
This was reverted due to a "missing" file, but in reality
what happened was that I renamed a file, and then due to
a merge conflict both the old file and the new file got
added to the repository. This led to an unused cpp file
being in the repo and not referenced by any CMakeLists.txt
but #including a .h file that wasn't in the repo. In an
even more unfortunate coincidence, CMake didn't report the
unused cpp file because it was in a subdirectory of the
folder with the CMakeLists.txt, and not in the same directory
as any CMakeLists.txt.
The presence of the unused file was then breaking certain
tools that determine file lists by globbing rather than
by what's specified in CMakeLists.txt
In any case, the fix is to just remove the unused file from
the patch set.
llvm-svn: 302042
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).
Reapplied - this time without changing line endings of existing files.
Differential Revision: https://reviews.llvm.org/D32769
llvm-svn: 302041
Currently several places assume the VAL member is always at least the same size as pVal. In particular for a memcpy in the move assignment operator. While this is a true assumption, it isn't good practice to assume this.
This patch gives the union a name so we can write the memcpy in terms of the union itself. This also adds a similar memcpy to the move constructor where we previously just copied using VAL directly.
This patch is mostly just a mechanical addition of the U in front of VAL and pVAL everywhere. But several constructors had to be modified since we can't directly initializer a field of named union from the initializer list.
Differential Revision: https://reviews.llvm.org/D30629
llvm-svn: 302040
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).
Differential Revision: https://reviews.llvm.org/D32769
llvm-svn: 302028
The "macosx" OS type is still the canonical type. In the future "macos" will
become the canonical OS type (but we will still support "macosx").
rdar://27043820
Differential Revision: https://reviews.llvm.org/D32748
llvm-svn: 302011
The patch is failing to add StringTableStreamBuilder.h, but that isn't
even discovered because the corresponding StringTableStreamBuilder.cpp
isn't added to any CMakeLists.txt file and thus never built. I think
this patch is just incomplete.
llvm-svn: 302002
Previously we had knowledge of how to serialize and deserialize
a string table inside of DebugInfo/PDB, but the string table
that it serializes contains a piece that is actually considered
CodeView and can appear outside of a PDB. We already have logic
in llvm-readobj and MCCodeView to read and write this format,
so it doesn't make sense to duplicate the logic in DebugInfoPDB
as well.
This patch makes codeview::StringTable (for writing) and
codeview::StringTableRef (for reading), updates DebugInfoPDB
to use these classes for its own writing, and updates llvm-readobj
to additionally use StringTableRef for reading.
It's a bit more difficult to get MCCodeView to use this for
writing, but it's a logical next step.
llvm-svn: 301986
When dumping raw data from a stream, you might know the offset
of a certain record you're interested in, as well as how long
that record is. Previously, you had to dump the entire stream
and wade through the bytes to find the interesting record.
This patch allows you to specify an offset and length on the
command line, and it will only dump the requested range.
llvm-svn: 301607
libraries are properly unloaded when llvm_shutdown is called.
Summary:
This was mostly affecting usage of the JIT, where storing the library handles in
a set made iteration unordered/undefined. This lead to disagreement between the
JIT and native code as to what the address and implementation of particularly on
Windows with stdlib functions:
JIT: putenv_s("TEST", "VALUE") // called msvcrt.dll, putenv_s
JIT: getenv("TEST") -> "VALUE" // called msvcrt.dll, getenv
Native: getenv("TEST") -> NULL // called ucrt.dll, getenv
Also fixed is the issue of DynamicLibrary::getPermanentLibrary(0,0) on Windows
not giving priority to the process' symbols as it did on Unix.
Reviewers: chapuni, v.g.vassilev, lhames
Reviewed By: lhames
Subscribers: danalbert, srhines, mgorny, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D30107
llvm-svn: 301562
The previous algorithm processed one character at a time, which is very
painful on a modern CPU. Replace it with xxHash64, which both already
exists in the codebase and is fairly fast.
Patch from Scott Smith!
Differential Revision: https://reviews.llvm.org/D32509
llvm-svn: 301487
libraries are properly unloaded when llvm_shutdown is called.
Summary:
This was mostly affecting usage of the JIT, where storing the library handles in
a set made iteration unordered/undefined. This lead to disagreement between the
JIT and native code as to what the address and implementation of particularly on
Windows with stdlib functions:
JIT: putenv_s("TEST", "VALUE") // called msvcrt.dll, putenv_s
JIT: getenv("TEST") -> "VALUE" // called msvcrt.dll, getenv
Native: getenv("TEST") -> NULL // called ucrt.dll, getenv
Also fixed is the issue of DynamicLibrary::getPermanentLibrary(0,0) on Windows
not giving priority to the process' symbols as it did on Unix.
Reviewers: chapuni, v.g.vassilev, lhames
Reviewed By: lhames
Subscribers: danalbert, srhines, mgorny, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D30107
llvm-svn: 301236
This replaces a hand written copy loop with a call to memcpy for both zext and sext.
For sext, it replaces multiple if/else blocks propagating sign information forward. Now we just do a copy, a sign extension on the last copied word, a memset, and clearUnusedBits.
Differential Revision: https://reviews.llvm.org/D32417
llvm-svn: 301201
This patch adds an in place version of ashr to match lshr and shl which were recently added.
I've tried to make this similar to the lshr code with additions to handle the sign extension. I've also tried to do this with less if checks than the current ashr code by sign extending the original result to a word boundary before doing any of the shifting. This removes a lot of the complexity of determining where to fill in sign bits after the shifting.
Differential Revision: https://reviews.llvm.org/D32415
llvm-svn: 301198
Summary: SUSE's ARM triples end with -gnueabi even though they are hard-float. This requires special handling of SUSE ARM triples. Hence we need a way to differentiate the SUSE as vendor. This CL adds that.
Reviewers: chandlerc, compnerd, echristo, rengolin
Reviewed By: rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32426
llvm-svn: 301174
Previously single word would always return 0 regardless of the original sign. Multi word would return all 0s or all 1s based on the original sign. Now single word takes into account the sign as well.
llvm-svn: 301159
The changes are causing the i686-mingw32 build to fail.
This reverts commit r301153, and the changes for a separate warning on i686-mingw32 in r301155 and r301156.
llvm-svn: 301157
libraries are properly unloaded when llvm_shutdown is called.
Summary:
This was mostly affecting usage of the JIT, where storing the library handles in
a set made iteration unordered/undefined. This lead to disagreement between the
JIT and native code as to what the address and implementation of particularly on
Windows with stdlib functions:
JIT: putenv_s("TEST", "VALUE") // called msvcrt.dll, putenv_s
JIT: getenv("TEST") -> "VALUE" // called msvcrt.dll, getenv
Native: getenv("TEST") -> NULL // called ucrt.dll, getenv
Also fixed is the issue of DynamicLibrary::getPermanentLibrary(0,0) on Windows
not giving priority to the process' symbols as it did on Unix.
Reviewers: chapuni, v.g.vassilev, lhames
Reviewed By: lhames
Subscribers: danalbert, srhines, mgorny, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D30107
llvm-svn: 301153
The current code is trying to be clever with shifts to avoid needing to clear unused bits. But it looks like the compiler is unable to optimize out the unused bit handling in the APInt constructor. Given this its better to just use SignExtend64 and have more readable code.
llvm-svn: 301133
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.
For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."
PR32754.
llvm-svn: 301111
For single word, shift by BitWidth was always returning 0, but for multiword it was based on original sign. Now single word matches multi word.
llvm-svn: 301094
Currently sle and ule have to call slt/ult and eq to get the proper answer. This results in extra code for both calls and additional scans of multiword APInts.
This patch replaces slt/ult with a compareSigned/compare that can return -1, 0, or 1 so we can cover all the comparison functions with a single call.
While I was there I removed the activeBits calls and other checks at the start of the slow part of ult. Both of the activeBits calls potentially scan through each of the APInts separately. I can't imagine that's any better than just scanning them in parallel and doing the compares. Now we just share the code with tcCompare.
These changes seem to be good for about a 7-8k reduction on the size of the opt binary on my local x86-64 build.
Differential Revision: https://reviews.llvm.org/D32339
llvm-svn: 300995
This should fix the bug https://bugs.llvm.org/show_bug.cgi?id=12906
To print the FP constant AsmWriter does the following:
1) convert FP value to String (actually using snprintf function which is locale dependent).
2) Convert String back to FP Value
3) Compare original and got FP values. If they are not equal just dump as hex.
The problem happens on the 2nd step when APFloat does not expect group delimiter or
fraction delimiter other than period symbol and so on, which can be produced on the
first step if LLVM library is used in an environment with corresponding locale set.
To fix this issue the locale independent APFloat:toString function is used.
However it prints FP values slightly differently than snprintf does. Specifically
it suppress trailing zeros in significant, use capital E and so on.
It results in 117 test failures during make check.
To avoid this I've also updated APFloat.toString a bit to pass make check at least.
Reviewers: sberg, bogner, majnemer, sanjoy, timshen, rnk
Reviewed By: timshen, rnk
Subscribers: rnk, llvm-commits
Differential Revision: https://reviews.llvm.org/D32276
llvm-svn: 300943
Associate the version-when-defined with definitions of standard DWARF
constants. Identify the "vendor" for DWARF extensions.
Use this information to verify FORMs in .debug_abbrev are defined as
of the DWARF version specified in the associated unit.
Removed two tests that had specified DWARF v1 (which essentially does
not exist).
Differential Revision: http://reviews.llvm.org/D30785
llvm-svn: 300875
This question comes up in many places in SimplifyDemandedBits. This makes it easy to ask without allocating additional temporary APInts.
The BitVector class provides a similar functionality through its (IMHO badly named) test(const BitVector&) method. Though its output polarity is reversed.
I've provided one example use case in this patch. I plan to do more as a follow up.
Differential Revision: https://reviews.llvm.org/D32258
llvm-svn: 300851
The hardware div feature refers only to Thumb, but because of its name
it is tempting to use it to check for hardware division in general,
which may cause problems in ARM mode. See https://reviews.llvm.org/D32005.
This patch adds "Thumb" to its name, to make its scope clear. One
notable place where I haven't made the change is in the feature flag
(used with -mattr), which is still hwdiv. Changing it would also require
changes in a lot of tests, including clang tests, and it doesn't seem
like it's worth the effort.
Differential Revision: https://reviews.llvm.org/D32160
llvm-svn: 300827
Summary: This is a simple question we should be able to answer without creating a temporary to hold the AND result. We can also get an early out as soon as we find a word that intersects.
Reviewers: RKSimon, hans, spatel, davide
Reviewed By: hans, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32253
llvm-svn: 300812