Commit Graph

341 Commits

Author SHA1 Message Date
Zachary Turner 07d803777c [CodeView] Micro-optimizations to speed up type merging.
Based on a profile, a couple of hot spots were identified in the
main type merging loop.  The code was simplified, a few loops
were re-arranged, and some outlined functions were inlined.  This
speeds up type merging by a decent amount, shaving around 3-4 seconds
off of a 40 second link in my test case.

Differential Revision: https://reviews.llvm.org/D42559

llvm-svn: 323790
2018-01-30 17:12:04 +00:00
Zachary Turner 1bc2ce6b9b Speed up iteration of CodeView record streams.
There's some abstraction overhead in the underlying
mechanisms that were being used, and it was leading to an
abundance of small but not-free copies being made.  This
showed up on a profile.  Eliminating this and going back to
a low-level byte-based implementation speeds up lld with
/DEBUG between 10 and 15%.

Differential Revision: https://reviews.llvm.org/D42148

llvm-svn: 322871
2018-01-18 18:35:01 +00:00
Zachary Turner 0d07a8e948 [COFF] Teach LLD to use the COFF .debug$H section.
This adds the /DEBUG:GHASH option to LLD which will look for
the existence of .debug$H sections in linker inputs and use them
to accelerate type merging.  The clang-cl side has already been
added, so this completes the work necessary to begin experimenting
with this feature.

Differential Revision: https://reviews.llvm.org/D40980

llvm-svn: 320719
2017-12-14 18:07:04 +00:00
Zachary Turner 048f8f99bf [CodeView] Teach clang to emit the .debug$H COFF section.
Currently this is an LLVM extension to the COFF spec which is
experimental and intended to speed up linking.  For now it is
behind a hidden cl::opt flag, but in the future we can move it
to a "real" cc1 flag and have the driver pass it through whenever
it is appropriate.

The patch to actually make use of this section in lld will come
in a followup.

Differential Revision: https://reviews.llvm.org/D40917

llvm-svn: 320649
2017-12-13 22:33:58 +00:00
Michael Zolotukhin 0c169bf7f7 Remove redundant includes from lib/DebugInfo.
llvm-svn: 320620
2017-12-13 21:30:49 +00:00
Zachary Turner ecd2684ed7 [DebugInfo] Fix register variables not showing up in pdb.
Previously, when linking against libcmt from the MSVC runtime,
lld-link /verbose would show "Ignoring unknown symbol record
with kind 0x1006".  It turns out this was because
TypeIndexDiscovery did not handle S_REGISTER records, so these
records were not getting properly remapped.

Patch by: Alexnadre Ganea
Differential Revision: https://reviews.llvm.org/D40919

llvm-svn: 320108
2017-12-07 22:51:16 +00:00
Zachary Turner 376d437776 Teach llvm-pdbutil to dump types from object files.
llvm-svn: 319859
2017-12-05 23:58:18 +00:00
Zachary Turner 023f88ef42 Fix -Wmissing-braces error.
llvm-svn: 319855
2017-12-05 23:19:33 +00:00
Zachary Turner 87b78e9d1e [CodeView] Add support for content hashing CodeView type records.
Currently nothing uses this, but this at least gets the core
algorithm in, and adds some test to demonstrate correctness.

Differential Revision: https://reviews.llvm.org/D40736

llvm-svn: 319854
2017-12-05 23:08:58 +00:00
Zachary Turner f0e4c6a819 Simplify the DenseSet used for hashing CodeView records.
This was storing the hash alongside the key so that the hash
doesn't need to be re-computed every time, but in doing so it
was allocating a structure to keep the key size small in the
DenseMap.  This is a noble goal, but it also leads to a pointer
indirection on every probe, and this cost of this pointer
indirection ends up being higher than the cost of having a
slightly larger entry in the hash table.  Removing this not only
simplifies the code, but yields a small but noticeable
performance improvement in the type merging algorithm.

llvm-svn: 319493
2017-11-30 23:00:30 +00:00
Zachary Turner ca6dbf1440 Split TypeTableBuilder into two classes.
llvm-svn: 319456
2017-11-30 18:39:50 +00:00
Zachary Turner 52d036e693 [CodeView] Factor some code out of TypeTableBuilder.
This class had some code that would automatically remap type
indices before hashing and serializing.  The only caller of
this method was the TypeStreamMerger anyway, and the method
doesn't make general sense, and prevents making certain future
improvements to the class.  So, factoring this up one level
into the TypeStreamMerger where it belongs.

llvm-svn: 319377
2017-11-29 22:41:56 +00:00
Zachary Turner 3e3936da93 Make TypeTableBuilder inherit from TypeCollection.
A couple of places in LLD were passing references to
TypeTableCollections around, which makes it hard to change the
implementation at runtime.  However, these cases only needed to
iterate over the types in the collection, and TypeCollection
already provides a handy abstract interface for this purpose.

By implementing this interface, we can get rid of the need to
pass TypeTableBuilder references around, which should allow us
to swap the implementation at runtime in subsequent patches.

llvm-svn: 319345
2017-11-29 19:35:21 +00:00
Zachary Turner 4c1fa68590 Fix a warning.
llvm-svn: 319263
2017-11-29 00:13:44 +00:00
Zachary Turner 29b081dcd1 [NFC] Minor cleanups in CodeView TypeTableBuilder.
llvm-svn: 319260
2017-11-28 23:57:13 +00:00
Rafael Espindola bba7f862d8 Fix non assert build warnings.
llvm-svn: 319200
2017-11-28 18:50:08 +00:00
Zachary Turner 6900de1dfb [CodeView] Refactor / Rewrite TypeSerializer and TypeTableBuilder.
The motivation behind this patch is that future directions require us to
be able to compute the hash value of records independently of actually
using them for de-duplication.

The current structure of TypeSerializer / TypeTableBuilder being a
single entry point that takes an unserialized type record, and then
hashes and de-duplicates it is not flexible enough to allow this.

At the same time, the existing TypeSerializer is already extremely
complex for this very reason -- it tries to be too many things. In
addition to serializing, hashing, and de-duplicating, ti also supports
splitting up field list records and adding continuations. All of this
functionality crammed into this one class makes it very complicated to
work with and hard to maintain.

To solve all of these problems, I've re-written everything from scratch
and split the functionality into separate pieces that can easily be
reused. The end result is that one class TypeSerializer is turned into 3
new classes SimpleTypeSerializer, ContinuationRecordBuilder, and
TypeTableBuilder, each of which in isolation is simple and
straightforward.

A quick summary of these new classes and their responsibilities are:

- SimpleTypeSerializer : Turns a non-FieldList leaf type into a series of
  bytes. Does not do any hashing. Every time you call it, it will
  re-serialize and return bytes again. The same instance can be re-used
  over and over to avoid re-allocations, and in exchange for this
  optimization the bytes returned by the serializer only live until the
  caller attempts to serialize a new record.

- ContinuationRecordBuilder : Turns a FieldList-like record into a series
  of fragments. Does not do any hashing. Like SimpleTypeSerializer,
  returns references to privately owned bytes, so the storage is
  invalidated as soon as the caller tries to re-use the instance. Works
  equally well for LF_FIELDLIST as it does for LF_METHODLIST, solving a
  long-standing theoretical limitation of the previous implementation.

- TypeTableBuilder : Accepts sequences of bytes that the user has already
  serialized, and inserts them by de-duplicating with a hash table. For
  the sake of convenience and efficiency, this class internally stores a
  SimpleTypeSerializer so that it can accept unserialized records. The
  same is not true of ContinuationRecordBuilder. The user is required to
  create their own instance of ContinuationRecordBuilder.

Differential Revision: https://reviews.llvm.org/D40518

llvm-svn: 319198
2017-11-28 18:33:17 +00:00
Reid Kleckner 8aa32ffbad [codeview] Fix handling of S_HEAPALLOCSITE
The type index is from the TPI stream, not the IPI stream. Fix the
dumper, fix type index discovery, and add a test in LLD.

Also improve the log message we emit when we fail to rewrite type
indices in LLD. That's how I found this bug.

llvm-svn: 316461
2017-10-24 17:02:40 +00:00
Reid Kleckner 0e88118dd7 [codeview] Add support for inlinee lists
This adds type index discovery and dumper support for symbol record kind
0x1168, which is a list of inlined function ids. This symbol kind is
undocumented, but S_INLINEES is consistent with the existing
nomenclature.

Fixes PR34222

llvm-svn: 316398
2017-10-23 23:43:40 +00:00
Reid Kleckner ecddee27a8 [codeview] Recognize two records with no type index fields
Thunk records do not have types and frame cookies do not have types.

These were found while linking libconcrt.lib from MSVC.

llvm-svn: 316385
2017-10-23 22:44:24 +00:00
Hans Wennborg 660531085a CodeView: Provide a .def file with the register ids
The list of register ids was previously written out in a couple of dirrent
places. This puts it in a .def file and also adds a few more registers (e.g.
the x87 regs) which should lead to more readable dumps, but I didn't include
the whole list since that seems unnecessary.

X86_MC::initLLVMToSEHAndCVRegMapping is pretty ugly, but at least it's not
relying on magic constants anymore. The TODO of using tablegen still stands.

Differential revision: https://reviews.llvm.org/D38480

llvm-svn: 314821
2017-10-03 18:27:22 +00:00
Hans Wennborg ea89ff7c25 CodeView symbol dumper: use symbolic names for registers
https://reviews.llvm.org/D38469

llvm-svn: 314690
2017-10-02 17:44:47 +00:00
Zachary Turner abb17cc084 [llvm-pdbutil] Support dumping CodeView from object files.
We have llvm-readobj for dumping CodeView from object files, and
llvm-pdbutil has always been more focused on PDB.  However,
llvm-pdbutil has a lot of useful options for summarizing debug
information in aggregate and presenting high level statistical
views.  Furthermore, it's arguably better as a testing tool since
we don't have to write tests to conform to a state-machine like
structure where you match multiple lines in succession, each
depending on a previous match.  llvm-pdbutil dumps much more
concisely, so it's possible to use single-line matches in many
cases where as with readobj tests you have to use multi-line
matches with an implicit state machine.

Because of this, I'm adding object file support to llvm-pdbutil.
In fact, this mirrors the cvdump tool from Microsoft, which also
supports both object files and pdb files.  In the future we could
perhaps rename this tool llvm-cvutil.

In the meantime, this allows us to deep dive into object files
the same way we already can with PDB files.

llvm-svn: 312358
2017-09-01 20:06:56 +00:00
Zachary Turner 5641c07d6b [PDB] Serialize records into a stack-allocated buffer.
We were using a std::vector<> and resizing to MaxRecordLength,
which is ~64KB.  We would then do this repeatedly often many
times in a tight loop, which was causing measurable performance
impact when linking PDBs.

Patch by Alex Telishev
Differential Revision: https://reviews.llvm.org/D36940

llvm-svn: 311375
2017-08-21 20:17:19 +00:00
Zachary Turner 197bba0028 Remove unused variable.
llvm-svn: 311119
2017-08-17 20:18:36 +00:00
Zachary Turner 96bcd6a37a [llvm-pdbutil] Fix some dumping issues.
When dumping, we were treating the S_INLINESITESYM as referring
to a type record, when it actually refers to an id record.  We
had this correct in TypeIndexDiscovery, so our merging algorithm
should be fine, but we had it wrong in the dumper, which means it
would appear to work most of the time, unless the index was out
of bounds in the type stream, when it would fail.  Fixed this, and
audited a few other cases to make them match the behavior in
TypeIndexDiscovery.

Also, I've now observed a new symbol record with kind 0x1168 which
I have no clue what it is, so to avoid crashing we have to just
print "Unknown Symbol Kind".

llvm-svn: 311117
2017-08-17 20:04:51 +00:00
Zachary Turner ee9906d884 [LLD/PDB] Write actual records to the globals stream.
Previously we were writing an empty globals stream.  Windows
tools interpret this as "private symbols are not present in
this PDB", even when they are, so we need to fix this.  Regardless,
without it we don't have information about global variables, so
we need to fix it anyway.  This patch does that.

With this patch, the "lm" command in WinDbg correctly reports
that we have private symbols available, but the "dv" command
still refuses to display local variables.

Differential Revision: https://reviews.llvm.org/D36535

llvm-svn: 310743
2017-08-11 19:00:03 +00:00
Zachary Turner 59e3ae827d [PDB] Fix linking of function symbols and local variables.
The compiler outputs PROC32_ID symbols into the object files
for functions, and these symbols have an embedded type index
which, when copied to the PDB, refer to the IPI stream.  However,
the symbols themselves are also converted into regular symbols
(e.g. S_GPROC32_ID -> S_GPROC32), and type indices in the regular
symbol records refer to the TPI stream.  So this patch applies
two fixes to function records.
  1. It converts ID symbols to the proper non-ID record type.
  2. After remapping the type index from the object file's index
     space to the PDB file/IPI stream's index space, it then
     remaps that index to the TPI stream's index space by.

Besides functions, during the remapping process we were also
discarding symbol record types which we did not recognize.
In particular, we were discarding S_BPREL32 records, which is
what MSVC uses to describe local variables on the stack.  So
this patch fixes that as well by copying them to the PDB.

Differential Revision: https://reviews.llvm.org/D36426

llvm-svn: 310394
2017-08-08 18:34:44 +00:00
Reid Kleckner 14d90fd05c [PDB] Improve GSI hash table dumping for publics and globals
The PDB "symbol stream" actually contains symbol records for the publics
and the globals stream. The globals and publics streams are essentially
hash tables that point into a single stream of records. In order to
match cvdump's behavior, we need to only dump symbol records referenced
from the hash table. This patch implements that, and then implements
global stream dumping, since it's just a subset of public stream
dumping.

Now we shouldn't see S_PROCREF or S_GDATA32 records when dumping
publics, and instead we should see those record in the globals stream.

llvm-svn: 309066
2017-07-26 00:40:36 +00:00
Reid Kleckner 898ddf61c0 [codeview] Emit 'D' as the cv source language for D code
This matches DMD:
522263965c/src/ddmd/backend/cv8.c (L199)

Fixes PR33899.

llvm-svn: 308890
2017-07-24 16:16:42 +00:00
Reid Kleckner 67653ee086 [codeview] Fix YAML for LF_TYPESERVER2 by hoisting PDB_UniqueId
Summary:
We were treating the GUIDs in TypeServer2Record as strings, and the
non-ASCII bytes in the GUID would not round-trip through YAML.

We already had the PDB_UniqueId type portably represent a Windows GUID,
but we need to hoist that up to the DebugInfo/CodeView library so that
we can use it in the TypeServer2Record as well as in PDB parsing code.

Reviewers: inglorion, amccarth

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D35495

llvm-svn: 308234
2017-07-17 23:59:44 +00:00
Reid Kleckner 3167480ca6 [codeview] Don't use the type visitor to merge types
Summary:
This didn't do much to speed things up, but it implements a FIXME, and I
think it's a nice simplification. We don't need the record kind switch.
We're doing that ourselves.

Reviewers: ruiu, inglorion

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D35496

llvm-svn: 308213
2017-07-17 20:31:38 +00:00
Reid Kleckner a842cd75e2 [codeview] Remove TypeServerHandler and PDBTypeServerHandler
Summary:
Instead of wiring these through the CVTypeVisitor interface, clients
should inspect the CVTypeArray before visiting it and potentially load
up the type server's TPI stream if they need it.

No tests relied on this functionality because LLD was the only client.

Reviewers: ruiu

Subscribers: mgorny, hiraditya, zturner, llvm-commits

Differential Revision: https://reviews.llvm.org/D35394

llvm-svn: 308212
2017-07-17 20:28:06 +00:00
Reid Kleckner af88a910fd [CodeView] Dump BuildInfoSym and ProcSym type indices
I need to print the type index in hex so that I can match it in
FileCheck for a test I'm writing.

llvm-svn: 308107
2017-07-15 18:10:39 +00:00
Reid Kleckner 6597c28d76 [PDB] Fix type server handling for archives
Summary:
This fixes type indices for SDK or CRT static archives. Previously we'd
try to look next to the archive object file path, which would not exist
on the local machine.

Also error out if we can't resolve a type server record. Hypothetically
we can recover from this error by discarding debug info for this object,
but that is not yet implemented.

Reviewers: ruiu, amccarth

Subscribers: aprantl, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D35369

llvm-svn: 307946
2017-07-13 20:12:23 +00:00
Reid Kleckner 8d8888ff42 [codeview] Change readobj symbol dumping format
Avoid duplicating DictScope with hand-written names everywhere.  Print
the S_-prefixed symbol kind for every record. This should make it easier
to search for certain kinds of records when debugging PDB linking.

llvm-svn: 307732
2017-07-11 23:41:41 +00:00
Reid Kleckner 447f677133 [codeview] Fix type index discovery for four symbol records
I encountered these when linking LLD, which uses atls.lib. Those objects
appear to use these uncommon symbol records:

  0x115E S_HEAPALLOCSITE
  0x113D S_ENVBLOCK
  0x1113 S_GTHREAD32
  0x1153 S_FILESTATIC

llvm-svn: 307725
2017-07-11 22:37:25 +00:00
Eugene Zelenko 4fcfc19976 [CodeView, PDB] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 306911
2017-06-30 23:06:03 +00:00
Zachary Turner af8c75a8c0 [llvm-pdbutil] Output the symbol offset when dumping.
Type records have a unique type index, but symbol records do
not.  Instead, symbol records refer to other symbol records
by referencing their offset in the symbol stream.  In a sense
this is the analogue of the TypeIndex, but we are not printing
it in the dumper.  Printing it not only gives us more useful
information when manually investigating the contents of a PDB,
but also allows us to write better tests by enabling us to
verify that fields that reference other symbol records do
so correctly.

Differential Revision: https://reviews.llvm.org/D34906

llvm-svn: 306890
2017-06-30 21:35:00 +00:00
Zachary Turner 02a267758e [llvm-pdbutil] Add the ability to dump the dependency tree for a type
Previously we had the -type-index option which would dump the record of
a single, but we had no way to follow the dependency graph backwards and
also dump all dependent types.

Having this option makes test-writing better, because we can limit the
test to only those records that are of importance for the thing we're
trying to test, which allows us to use things like CHECK-NEXT to reduce
fragility.

Differential Revision: https://reviews.llvm.org/D34899

llvm-svn: 306852
2017-06-30 18:15:47 +00:00
Eugene Zelenko 8456b16ea9 [CodeView] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 306616
2017-06-29 00:05:44 +00:00
Zachary Turner c2f5b4bfd9 [llvm-pdbutil] Dump raw bytes of type and id records.
llvm-svn: 306167
2017-06-23 21:50:54 +00:00
Reid Kleckner d0e6e24a53 [PDB] Add symbols to the PDB
Summary:
The main complexity in adding symbol records is that we need to
"relocate" all the type indices. Type indices do not have anything like
relocations, an opaque data structure describing where to find existing
type indices for fixups. The linker just has to "know" where the type
references are in the symbol records. I added an overload of
`discoverTypeIndices` that works on symbol records, and it seems to be
able to link the standard library.

Reviewers: zturner, ruiu

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D34432

llvm-svn: 305933
2017-06-21 17:25:56 +00:00
Reid Kleckner 44cdb10964 [PDB] Start emitting source file and line information
Summary:
This is a first step towards getting line info to show up in VS and
windbg. So far, only llvm-pdbutil can parse the PDBs that we produce.
cvdump doesn't like something about our file checksum tables. I'll have
to dig into that next.

This patch adds a new DebugSubsectionRecordBuilder which takes bytes
directly from some other producer, such as a linker, and sticks it into
the PDB. Line tables only need to be relocated. No data needs to be
rewritten.

File checksums and string tables, on the other hand, need to be re-done.

Reviewers: zturner, ruiu

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D34257

llvm-svn: 305713
2017-06-19 17:21:45 +00:00
Reid Kleckner 18d90e17ad [CodeView] Fix dumping of public symbol record flags
I noticed nonsensical type information while dumping PDBs produced by
MSVC.

llvm-svn: 305708
2017-06-19 16:54:51 +00:00
Zachary Turner 26dbc5420d Delete TypeDatabase.
Merge the functionality into the random access type collection.
This class was only being used in 2 places, so getting rid of it
simplifies the code.

llvm-svn: 305653
2017-06-18 20:52:45 +00:00
Zachary Turner b0fdd214b7 Don't crash if a type record can't be found.
This was a regression introduced in a previous patch.  Adding
back the code that handles this case.

llvm-svn: 305617
2017-06-17 00:02:24 +00:00
Zachary Turner ad859bd472 [CodeView] Fix random access of type names.
Suppose we had a type index offsets array with a boundary at type index
N. Then you request the name of the type with index N+1, and that name
requires the name of index N-1 (think a parameter list, for example). We
didn't handle this, and we would print something like (<unknown UDT>,
<unknown UDT>).

The fix for this is not entirely trivial, and speaks to a larger
problem. I think we need to kill TypeDatabase, or at the very least kill
TypeDatabaseVisitor. We need a thing that doesn't do any caching
whatsoever, just given a type index it can compute the type name "the
slow way". The reason for the bug is that we don't have anything like
that. Everything goes through the type database, and if we've visited a
record, then we're "done". It doesn't know how to do the expensive thing
of re-visiting dependent records if they've not yet been visited.

What I've done here is more or less copied the code (albeit greatly
simplified) from TypeDatabaseVisitor, but wrapped it in an interface
that just returns a std::string. The logic of caching the name is now in
LazyRandomTypeCollection. Eventually I'd like to move the record
database here as well and the visited record bitfield here as well, at
which point we can actually just delete TypeDatabase. I don't see any
reason for it if a "sequential" collection is just a special case of a
random access collection with an empty partial offsets array.

Differential Revision: https://reviews.llvm.org/D34297

llvm-svn: 305612
2017-06-16 23:42:44 +00:00
Zachary Turner 59224cba2e Remove some dead code / includes.
I'm trying to get rid of the TypeDatabase class, so the first
step is to minimize its footprint.

llvm-svn: 305611
2017-06-16 23:42:15 +00:00
Zachary Turner 4e950647fb [llvm-pdbutil] Add support for dumping lines and inlinee lines.
llvm-svn: 305529
2017-06-15 23:56:19 +00:00
Zachary Turner 6305545527 Resubmit "[llvm-pdbutil] rewrite the "raw" output style."
This resubmits commit c0c249e9f2ef83e1d1e5f166b50673d92f3579d7.

It was broken due to some weird template issues, which have
since been fixed.

llvm-svn: 305517
2017-06-15 22:24:24 +00:00
Zachary Turner da504b794c Revert "[llvm-pdbutil] rewrite the "raw" output style."
This reverts commit 83ea17ebf2106859a51fbc2a86031b44d33696ad.

This is failing due to some strange template problems, so reverting
until it can be straightened out.

llvm-svn: 305505
2017-06-15 20:55:51 +00:00
Zachary Turner b560fdf3b8 [llvm-pdbutil] rewrite the "raw" output style.
After some internal discussions, we agreed that the raw output style had
outlived its usefulness. It was originally created before we had even
thought of dumping to YAML, and it was intended to give us some insight
into the internals of a PDB file. Now we have YAML mode which does
almost exactly this but is more powerful in that it can round-trip back
to a PDB, which the raw mode could not do. So the raw mode had become
purely a maintenance burden.

One option was to just delete it. However, its original goal was to be
as readable as possible while staying close to the "metal" - i.e.
presenting the output in a way that maps directly to the underlying file
format. We don't actually need that last requirement anymore since it's
covered by the yaml mode, so we could repurpose "raw" mode to actually
just be as readable as possible.

This patch implements about 80% of the functionality previously in raw
mode, but in a completely different style that is more akin to what
cvdump outputs. Records are very compressed, often times appearing on
just one line. One nice thing about this is that it makes full record
matching easier, because you can grep for indices, names, and leaf types
on a single line often.

See the tests for some examples of what the new output looks like.

Note that this patch actually regresses the functionality of raw mode in
a few areas, but only because the patch was already unreasonably large
and going 100% would have been even worse. Specifically, this patch is
missing:

The ability to dump module debug subsections (checksums, lines, etc)
The ability to dump section headers
Aside from that everything is here. While goign through the tests fixing
them all up, I found many duplicate tests. They've been deleted. In
subsequent patches I will go through and re-add the missing
functionality.

Differential Revision: https://reviews.llvm.org/D34191

llvm-svn: 305495
2017-06-15 19:34:41 +00:00
Zachary Turner a8cfc29c9a Resubmit "[codeview] Make obj2yaml/yaml2obj support .debug$S..."
This was originally reverted because of some non-deterministic
failures on certain buildbots.  Luckily ASAN eventually caught
this as a stack-use-after-scope, so the fix is included in
this patch.

llvm-svn: 305393
2017-06-14 15:59:27 +00:00
Zachary Turner 0085dce221 Revert "[codeview] Make obj2yaml/yaml2obj support .debug$S..."
This is causing failures on linux bots with an invalid stream
read.  It doesn't repro in any configuration on Windows, so
reverting until I have a chance to investigate on Linux.

llvm-svn: 305371
2017-06-14 06:24:24 +00:00
Zachary Turner a3da4467fa [codeview] Make obj2yaml/yaml2obj support .debug$S/T sections.
This allows us to use yaml2obj and obj2yaml to round-trip CodeView
symbol and type information without having to manually specify the bytes
of the section. This makes for much easier to maintain tests. See the
tests under lld/COFF in this patch for example. Before they just said
SectionData: <blob> whereas now we can use meaningful record
descriptions. Note that it still supports the SectionData yaml field,
which could be useful for initializing a section to invalid bytes for
testing, for example.

Differential Revision: https://reviews.llvm.org/D34127

llvm-svn: 305366
2017-06-14 05:31:00 +00:00
Sylvestre Ledru 337804d86a Same expressions on both sides of the return
Summary:
I guess we want PointerToMemberFunction & PointerToDataMember


Fix coverity cid 1376038 


Reviewers: zturner

Reviewed By: zturner

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34110

llvm-svn: 305219
2017-06-12 18:53:46 +00:00
Zachary Turner 3226fe95bb [pdb] Support CoffSymbolRVA debug subsection.
llvm-svn: 305108
2017-06-09 20:46:52 +00:00
Zachary Turner 7e62cd17d6 Allow VarStreamArray to use stateful extractors.
Previously extractors tried to be stateless with any additional
context information needed in order to parse items being passed
in via the extraction method.  This led to quite cumbersome
implementation challenges and awkwardness of use.  This patch
brings back support for stateful extractors, making the
implementation and usage simpler.

llvm-svn: 305093
2017-06-09 17:54:36 +00:00
Bob Haarman fdf499bf2d [codeview] use 32-bit integer for RelocOffset in DebugLinesSubsection
Summary:
RelocOffset is a 32-bit value, but we previously truncated it to 16 bits.

Fixes PR33335.

Reviewers: zturner, hiraditya!

Reviewed By: zturner

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33968

llvm-svn: 305043
2017-06-09 01:18:10 +00:00
Zachary Turner 28c22c83e3 [pdb] Don't crash on unknown debug subsections.
More and more unknown debug subsection kinds are being discovered
so we should make it possible to dump these and display the
bytes.

llvm-svn: 305041
2017-06-09 00:53:59 +00:00
Zachary Turner deb391309c [CodeView] Support remaining debug subsection types
This adds support for Symbols, StringTable, and FrameData subsection
types.  Even though these subsections rarely if ever appear in a PDB
file (they are usually in object files), there's no theoretical reason
why they *couldn't* appear in a PDB.  The real issue though is that in
order to add support for dumping and writing them (which will be useful
for object files), we need a way to test them.  And since there is no
support for reading and writing them to / from object files yet, making
PDB support them is the best way to both add support for the underlying
format and add support for tests at the same time.  Later, when we go
to add support for reading / writing them from object files, we'll need
only minimal changes in the underlying read/write code.

llvm-svn: 305037
2017-06-09 00:28:08 +00:00
Zachary Turner 1bf7762049 [llvm-pdbdump] Support native ordering of subsections in raw mode.
This is the same change for the YAML Output style applied to the
raw output style.  Previously we would queue up all subsections
until every one had been read, and then output them in a pre-
determined order.  This was because some subsections need to be
read first in order to properly dump later subsections.  This
patch allows them to be dumped in the order they appear.

Differential Revision: https://reviews.llvm.org/D34015

llvm-svn: 305034
2017-06-08 23:49:01 +00:00
Zachary Turner 88101dadcc [CodeView] Fix endianness bug.
We should be outputting in little endian, but we were writing
in host endianness.

llvm-svn: 304741
2017-06-05 22:12:23 +00:00
Zachary Turner 349c18f837 [CodeView] Handle Cross Module Imports and Exports.
While it's not entirely clear why a compiler or linker might
put this information into an object or PDB file, one has been
spotted in the wild which was causing llvm-pdbdump to crash.

This patch adds support for reading-writing these sections.
Since I don't know how to get one of the native tools to
generate this kind of debug info, the only test here is one
in which we feed YAML into the tool to produce a PDB and
then spit out YAML from the resulting PDB and make sure that
it matches.

llvm-svn: 304738
2017-06-05 21:40:33 +00:00
Zachary Turner 92dcdda623 [CodeView] Support CodeView subsections in any order.
Previously we would expect certain subsections to appear
in a certain order because some subsections would reference
other subsections, but in practice we need to support
arbitrary orderings since some object file and PDB file
producers generate them this way.  This also paves the
way for supporting Yaml <-> Object File conversion of
CodeView, since Object Files typically have quite a
large number of subsections in their debug info.

Differential Revision: https://reviews.llvm.org/D33807

llvm-svn: 304588
2017-06-02 19:49:14 +00:00
Zachary Turner afb81a83a9 Fix 2 more -Wreorder warnings.
llvm-svn: 304494
2017-06-01 23:24:50 +00:00
Zachary Turner ebd3ae8371 [CodeView] Properly align symbol records on read/write.
Object files have symbol records not aligned to any particular
boundary (e.g. 1-byte aligned), while PDB files have symbol
records padded to 4-byte aligned boundaries.  Since they share
the same reading / writing code, we have to provide an option to
specify the alignment and propagate it up to the producer or
consumer who knows what the alignment is supposed to be for the
given container type.

Added a test for this by modifying the existing PDB -> YAML -> PDB
round-tripping code to round trip symbol records as well as types.

Differential Revision: https://reviews.llvm.org/D33785

llvm-svn: 304484
2017-06-01 21:52:41 +00:00
Zachary Turner d427383cb8 [CodeView] Move CodeView YAML code to ObjectYAML.
This is the beginning of an effort to move the codeview yaml
reader / writer into ObjectYAML so that it can be shared.
Currently the only consumer / producer of CodeView YAML is
llvm-pdbdump, but CodeView can exist outside of PDB files, and
indeed is put into object files and passed to the linker to
produce PDB files.  Furthermore, there are subtle differences
in the types of records that show up in object file CodeView
vs PDB file CodeView, but they are otherwise 99% the same.

By having this code in ObjectYAML, we can have llvm-pdbdump
reuse this code, while teaching obj2yaml and yaml2obj to use
this syntax for dealing with object files that can contain
CodeView.

This patch only adds support for CodeView type information
to ObjectYAML.  Subsequent patches will add support for
CodeView symbol information.

llvm-svn: 304248
2017-05-30 21:53:05 +00:00
Zachary Turner 591312c5c1 [CodeView] Add more DebugSubsection implementations.
This adds implementations for Symbols and FrameData, and renames
the existing codeview::StringTable class to conform to the
DebugSectionStringTable convention.

llvm-svn: 304222
2017-05-30 17:13:33 +00:00
Zachary Turner 8c099fe06e [CodeView] Rename ModuleDebugFragment -> DebugSubsection.
This is more concise, and matches the terminology used in other
parts of the codebase more closely.

llvm-svn: 304218
2017-05-30 16:36:15 +00:00
Zachary Turner f2110283c6 Remove unused member.
llvm-svn: 303942
2017-05-25 23:47:56 +00:00
Zachary Turner fed467eefb [CV Type Merging] Find nested type indices faster.
Merging two type streams is one of the most time consuming
parts of generating a PDB, and as such it needs to be as
fast as possible.  The visitor abstractions used for interoperating
nicely with many different types of inputs and outputs have
been used widely and help greatly for testability and implementing
tools, but the abstractions build up and get in the way of
performance.

This patch removes all of the visitation stuff from the type
stream merger, essentially re-inventing the leaf / member switch
and loop, but at a very low level.  This allows us many other
optimizations, such as not actually deserializing *any* records
(even member records which don't describe their own length), as
the operation of "figure out how long this record is" is somewhat
faster than "figure out how long this record *and* get all its
fields out".  Furthermore, whereas before we had to deserialize,
re-write type indices, then re-serialize, now we don't have to
do any of those 3 steps.  We just find out where the type indices
are and pull them directly out of the byte stream and re-write
them.

This is worth a 50-60% performance increase.  On top of all other
optimizations that have been applied this week, I now get the
following numbers when linking lld.exe and lld.pdb

MSVC: 25.67s
Before This Patch: 18.59s
After This Patch: 8.92s

So this is a huge performance win.

Differential Revision: https://reviews.llvm.org/D33564

llvm-svn: 303935
2017-05-25 23:36:16 +00:00
Zachary Turner 7f97c362a4 [CodeView Type Merging] Don't keep re-allocating temp serializer.
Previously, every time we wanted to serialize a field list record, we
would create a new copy of FieldListRecordBuilder, which would in turn
create a temporary instance of TypeSerializer, which itself had a
std::vector<> that was about 128K in size. So this 128K allocation was
happening every time. We can re-use the same instance over and over, we
just have to clear its internal hash table and seen records list between
each run. This saves us from the constant re-allocations.

This is worth an ~18.5% speed increase (3.75s -> 3.05s) in my tests.

Differential Revision: https://reviews.llvm.org/D33506

llvm-svn: 303919
2017-05-25 21:15:37 +00:00
Zachary Turner dda25b128c [CodeView Type Merging] Avoid record deserialization when possible.
A profile shows the majority of time doing type merging is spent
deserializing records from sequences of bytes into friendly C++ structures
that we can easily access members of in order to find the type indices to
re-write.

Records are prefixed with their length, however, and most records have
type indices that appear at fixed offsets in the record. For these
records, we can save some cycles by just looking at the right place in the
byte sequence and re-writing the value, then skipping the record in the
type stream. This saves us from the costly deserialization of examining
every field, including potentially null terminated strings which are the
slowest, even though it was unnecessary to begin with.

In addition, we apply another optimization. Previously, after
deserializing a record and re-writing its type indices, we would
unconditionally re-serialize it in order to compute the hash of the
re-written record. This would result in an alloc and memcpy for every
record. If no type indices were re-written, however, this was an
unnecessary allocation. In this patch re-writing is made two phase. The
first phase discovers the indices that need to be rewritten and their new
values. This information is passed through to the de-duplication code,
which only copies and re-writes type indices in the serialized byte
sequence if at least one type index is different.

Some records have type indices which only appear after variable length
strings, or which have lists of type indices, or various other situations
that can make it tricky to make this optimization. While I'm not giving up
on optimizing these cases as well, for now we can get the easy cases out
of the way and lay the groundwork for more complicated cases later.

This patch yields another 50% speedup on top of the already large speedups
submitted over the past 2 days. In two tests I have run, I went from 9
seconds to 3 seconds, and from 16 seconds to 8 seconds.

Differential Revision: https://reviews.llvm.org/D33480

llvm-svn: 303914
2017-05-25 21:06:28 +00:00
Zachary Turner bb64231d2d Don't do a full scan of the type stream before processing records.
LazyRandomTypeCollection is designed for random access, and in
order to provide this it lazily indexes ranges of types.  In the
case of types from an object file, there is no partial index
to build off of, so it has to index the full stream up front.
However, merging types only requires sequential access, and when
that is needed, this extra work is simply wasted.  Changing the
algorithm to work on sequential arrays of types rather than
random access type collections eliminates this up front scan.

llvm-svn: 303707
2017-05-24 00:26:27 +00:00
Zachary Turner 7daf62e743 [CodeView] Eliminate redundant hashes and allocations.
When writing field list records, we would construct a temporary
type serializer that shared a bump ptr allocator with the rest
of the application, so anything allocated from here would live
forever.  Furthermore, this temporary serializer had all the
properties of a full blown serializer including record hashing
and de-duplication.

These features are required when you're merging multiple type
streams into each other, because different streams may contain
identical records, but records from the same type stream will
never collide with each other.  So all of this hashing was
unnecessary.

To solve this, two fixes are made:

1) The temporary serializer keeps its own bump ptr allocator
instead of sharing a global one.  When it's finished, all of
its memory is freed.

2) Instead of using the same temporary serializer for the life
of an entire type stream, we use it only for the life of a single
field list record and delete it when the field list record is
completed.  This way the hash table will not grow as other
records from the same type stream get inserted.  Further improvements
could eliminate hashing entirely from this codepath.

This reduces the link time by 85% in my test, from 1 minute to 9
seconds.

llvm-svn: 303676
2017-05-23 18:56:23 +00:00
Reid Kleckner 36238b15d7 Speculative build fix for non-Windows
llvm-svn: 303667
2017-05-23 18:28:13 +00:00
Reid Kleckner ded38803c5 [PDB] Hash types up front when merging types instead of using StringMap
Summary:
First, StringMap uses llvm::HashString, which is only good for short
identifiers and really bad for large blobs of binary data like type
records. Moving to `DenseMap<StringRef, TypeIndex>` with some tricks for
memory allocation fixes that.

Unfortunately, that didn't buy very much performance. Profiling showed
that we spend a long time during DenseMap growth rehashing existing
entries. Also, in general, DenseMap is faster when the keys are small.
This change takes that to the logical conclusion by introducing a small
wrapper value type around a pointer to key data. The key data contains a
precomputed hash, the original record data (pointer and size), and the
type index, which is the "value" of our original map.

This reduces the time to produce llvm-as.exe and llvm-as.pdb from ~15s
on my machine to 3.5s, which is about a 4x improvement.

Reviewers: zturner, inglorion, ruiu

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33428

llvm-svn: 303665
2017-05-23 18:23:59 +00:00
Zachary Turner bf35e6ab2a Revert "Make TypeSerializer's StringMap use the same allocator."
This reverts commit e34ccb7b57da25cc89ded913d8638a2906d1110a.

This is causing failures on the ASAN bots.

llvm-svn: 303640
2017-05-23 15:50:37 +00:00
Zachary Turner d4136e945e Implement various flavors of type merging.
Previous algotirhm assumed that types and ids are in a single
unified stream.  For inputs that come from object files, this
is the case.  But if the input is already a PDB, or is the result
of a previous merge, then the types and ids will already have
been split up, in which case we need an algorithm that can
accept operate on independent streams of types and ids that
refer across stream boundaries to each other.

Differential Revision: https://reviews.llvm.org/D33417

llvm-svn: 303577
2017-05-22 21:07:43 +00:00
Zachary Turner 12f8c31c04 Make TypeSerializer's StringMap use the same allocator.
llvm-svn: 303576
2017-05-22 21:07:14 +00:00
Zachary Turner 526f4f2aa8 Resubmit "[CodeView] Provide a common interface for type collections."
This was originally reverted because it was a breaking a bunch
of bots and the breakage was not surfacing on Windows.  After much
head-scratching this was ultimately traced back to a bug in the
lit test runner related to its pipe handling.  Now that the bug
in lit is fixed, Windows correctly reports these test failures,
and as such I have finally (hopefully) fixed all of them in this
patch.

llvm-svn: 303446
2017-05-19 19:26:58 +00:00
Zachary Turner 1dfcf8d92c Revert "[CodeView] Provide a common interface for type collections."
This is a squash of ~5 reverts of, well, pretty much everything
I did today.  Something is seriously broken with lit on Windows
right now, and as a result assertions that fire in tests are
triggering failures.  I've been breaking non-Windows bots all
day which has seriously confused me because all my tests have
been passing, and after running lit with -a to view the output
even on successful runs, I find out that the tool is crashing
and yet lit is still reporting it as a success!

At this point I don't even know where to start, so rather than
leave the tree broken for who knows how long, I will get this
back to green, and then once lit is fixed on Windows, hopefully
hopefully fix the remaining set of problems for real.

llvm-svn: 303409
2017-05-19 05:57:45 +00:00
Zachary Turner 47fdc73771 Don't crash if someone tries to visit an empty type stream.
llvm-svn: 303408
2017-05-19 05:18:09 +00:00
Zachary Turner 59ab6a3816 [CodeView] Reduce memory usage in TypeSerializer.
We were using a BumpPtrAllocator to allocate stable storage for
a record, then trying to insert that into a hash table.  If a
collision occurred, the bytes were never inserted and the
allocation was unnecessary.  At the cost of an extra hash
computation, check first if it exists, and only if it does do
we allocate and insert.

llvm-svn: 303407
2017-05-19 04:56:48 +00:00
Zachary Turner 8f1d87a79a Fix crasher in CodeView test.
Apparently this was always broken, but previously we were more
graceful about it and we would print "unknown udt" if we couldn't
find the type index, whereas now we just segfault because we
assume it's valid.  But this exposed a real bug, which is that
we weren't looking in the right place.  So fix that, and also
fix this crash at the same time.

llvm-svn: 303397
2017-05-19 00:56:39 +00:00
Zachary Turner 7b62d7ccc0 Fix some build errors and warnings.
llvm-svn: 303391
2017-05-18 23:12:42 +00:00
Zachary Turner b32ec02b80 [CodeView] Raise the source to ID map out of the TypeStreamMerger.
This map will be needed to rewrite symbol streams after re-writing
the corresponding type streams.

llvm-svn: 303390
2017-05-18 23:04:08 +00:00
Zachary Turner 0c60f269fc [CodeView] Provide a common interface for type collections.
Right now we have multiple notions of things that represent collections of
types. Most commonly used are TypeDatabase, which is supposed to keep
mappings from TypeIndex to type name when reading a type stream, which
happens when reading PDBs. And also TypeTableBuilder, which is used to
build up a collection of types dynamically which we will later serialize
(i.e. when writing PDBs).

But often you just want to do some operation on a collection of types, and
you may want to do the same operation on any kind of collection. For
example, you might want to merge two TypeTableBuilders or you might want
to merge two type streams that you loaded from various files.

This dichotomy between reading and writing is responsible for a lot of the
existing code duplication and overlapping responsibilities in the existing
CodeView library classes. For example, after building up a
TypeTableBuilder with a bunch of type records, if we want to dump it we
have to re-invent a bunch of extra glue because our dumper takes a
TypeDatabase or a CVTypeArray, which are both incompatible with
TypeTableBuilder.

This patch introduces an abstract base class called TypeCollection which
is shared between the various type collection like things. Wherever we
previously stored a TypeDatabase& in some common class, we now store a
TypeCollection&.

The advantage of this is that all the details of how the collection are
implemented, such as lazy deserialization of partial type streams, is
completely transparent and you can just treat any collection of types the
same regardless of where it came from.

Differential Revision: https://reviews.llvm.org/D33293

llvm-svn: 303388
2017-05-18 23:03:06 +00:00
Zachary Turner 1d795c451e [CodeView] Simplify the use of visiting type records & streams.
There is often a lot of boilerplate code required to visit a type
record or type stream.  The #1 use case is that you have a sequence
of bytes that represent one or more records, and you want to
deserialize each one, switch on it, and call a callback with the
deserialized record that the user can examine.  Currently this
requires at least 6 lines of code:

  codeview::TypeVisitorCallbackPipeline Pipeline;
  Pipeline.addCallbackToPipeline(Deserializer);
  Pipeline.addCallbackToPipeline(MyCallbacks);

  codeview::CVTypeVisitor Visitor(Pipeline);
  consumeError(Visitor.visitTypeRecord(Record));

With this patch, it becomes one line of code:

  consumeError(codeview::visitTypeRecord(Record, MyCallbacks));

This is done by having the deserialization happen internally inside
of the visitTypeRecord function.  Since this is occasionally not
desirable, the function provides a 3rd parameter that can be used
to change this behavior.

Hopefully this can significantly reduce the barrier to entry
to using the visitation infrastructure.

Differential Revision: https://reviews.llvm.org/D33245

llvm-svn: 303271
2017-05-17 16:39:06 +00:00
Zachary Turner dd3a739d52 [CodeView] Add a random access type visitor.
This adds a visitor that is capable of accessing type
records randomly and caching intermediate results that it
learns about during partial linear scans.  This yields
amortized O(1) access to a type stream even though type
streams cannot normally be indexed.

Differential Revision: https://reviews.llvm.org/D33009

llvm-svn: 302936
2017-05-12 19:18:12 +00:00
Aaron Ballman f22f885b66 Removing a file that is not necessary (and was causing link diagnostics with MSVC 2015); NFC.
llvm-svn: 302531
2017-05-09 14:22:48 +00:00
Zachary Turner 1dacb24222 [CodeView] Add support for random access type visitors.
Previously type visitation was done strictly sequentially, and
TypeIndexes were computed by incrementing the TypeIndex of the
last visited record.  This works fine for situations like dumping,
but not when you want to visit types in random order.  For example,
in a debug session someone might lookup a symbol by name, find that
it has TypeIndex 10,000 and then want to go straight to TypeIndex
10,000.

In order to make this work, the visitation framework needs a mode
where it can plumb TypeIndices through the callback pipeline.  This
patch adds such a mode.  In doing so, it is necessary to provide
an alternative implementation of TypeDatabase that supports random
access, so that is done as well.

Nothing actually uses these random access capabilities yet, but
this will be done in subsequent patches.

Differential Revision: https://reviews.llvm.org/D32928

llvm-svn: 302454
2017-05-08 18:38:43 +00:00
Zachary Turner 8c74673388 [CodeView] Reserve TypeDatabase records up front.
Most of the time we know exactly how many type records we
have in a list, and we want to use the visitor to deserialize
them into actual records in a database.  Previously we were
just using push_back() every time without reserving the space
up front in the vector.  This is obviously terrible from a
performance standpoint, and it's not uncommon to have PDB
files with half a million type records, where the performance
degredation was quite noticeable.

llvm-svn: 302302
2017-05-05 22:02:37 +00:00
Zachary Turner 4f145b2a59 Remove unused private field.
llvm-svn: 302069
2017-05-03 19:42:06 +00:00
Davide Italiano 2e23ce4cad [CodeView] Remove constructor initialization of a removed field.
I should've staged this with my last commit.

llvm-svn: 302059
2017-05-03 18:02:46 +00:00
Zachary Turner cf468d86f3 [CodeView] Use actual strings for dealing with checksums and lines.
The raw CodeView format references strings by "offsets", but it's
confusing what table the offset refers to.  In the case of line
number information, it's an offset into a buffer of records,
and an indirection is required to get another offset into a
different table to find the final string.  And in the case of
checksum information, there is no indirection, and the offset
refers directly to the location of the string in another buffer.

This would be less confusing if we always just referred to the
strings by their value, and have the library be smart enough
to correctly resolve the offsets on its own from the right
location.

This patch makes that possible.  When either reading or writing,
all the user deals with are strings, and the library does the
appropriate translations behind the scenes.

llvm-svn: 302053
2017-05-03 17:11:40 +00:00
Zachary Turner 2d5c2cd3ce [llvm-readobj] Update readobj to re-use parsing code.
llvm-readobj hand rolls some CodeView parsing code for string
tables, so this patch updates it to re-use some of the newly
introduced parsing code in LLVMDebugInfoCodeView.

Differential Revision: https://reviews.llvm.org/D32772

llvm-svn: 302052
2017-05-03 17:11:11 +00:00
Zachary Turner c504ae3cef Resubmit r301986 and r301987 "Add codeview::StringTable"
This was reverted due to a "missing" file, but in reality
what happened was that I renamed a file, and then due to
a merge conflict both the old file and the new file got
added to the repository.  This led to an unused cpp file
being in the repo and not referenced by any CMakeLists.txt
but #including a .h file that wasn't in the repo.  In an
even more unfortunate coincidence, CMake didn't report the
unused cpp file because it was in a subdirectory of the
folder with the CMakeLists.txt, and not in the same directory
as any CMakeLists.txt.

The presence of the unused file was then breaking certain
tools that determine file lists by globbing rather than
by what's specified in CMakeLists.txt

In any case, the fix is to just remove the unused file from
the patch set.

llvm-svn: 302042
2017-05-03 15:58:37 +00:00
Daniel Jasper dff096f217 Revert r301986 (and subsequent r301987).
The patch is failing to add StringTableStreamBuilder.h, but that isn't
even discovered because the corresponding StringTableStreamBuilder.cpp
isn't added to any CMakeLists.txt file and thus never built. I think
this patch is just incomplete.

llvm-svn: 302002
2017-05-03 07:29:25 +00:00
Zachary Turner 59e83892e0 Fix use after free in BinaryStream library.
This was reported by the ASAN bot, and it turned out to be
a fairly fundamental problem with the design of VarStreamArray
and the way it passes context information to the extractor.

The fix was cumbersome, and I'm not entirely pleased with it,
so I plan to revisit this design in the future when I'm not
pressed to get the bots green again.  For now, this fixes
the issue by storing the context information by value instead
of by reference, and introduces some impossibly-confusing
template magic to make things "work".

llvm-svn: 301999
2017-05-03 05:34:00 +00:00
Zachary Turner 67736594f7 Fix type conversion error.
llvm-svn: 301987
2017-05-02 23:41:51 +00:00
Zachary Turner 7dba20bd2b Make codeview::StringTable.
Previously we had knowledge of how to serialize and deserialize
a string table inside of DebugInfo/PDB, but the string table
that it serializes contains a piece that is actually considered
CodeView and can appear outside of a PDB.  We already have logic
in llvm-readobj and MCCodeView to read and write this format,
so it doesn't make sense to duplicate the logic in DebugInfoPDB
as well.

This patch makes codeview::StringTable (for writing) and
codeview::StringTableRef (for reading), updates DebugInfoPDB
to use these classes for its own writing, and updates llvm-readobj
to additionally use StringTableRef for reading.

It's a bit more difficult to get MCCodeView to use this for
writing, but it's a logical next step.

llvm-svn: 301986
2017-05-02 23:36:17 +00:00
Zachary Turner edef14510e [PDB/CodeView] Read/write codeview inlinee line information.
Previously we wrote line information and file checksum
information, but we did not write information about inlinee
lines and functions.  This patch adds support for that.

llvm-svn: 301936
2017-05-02 16:56:09 +00:00
Zachary Turner 8a2ebfb1cd [CodeView] Write CodeView line information.
Differential Revision: https://reviews.llvm.org/D32716

llvm-svn: 301882
2017-05-01 23:27:42 +00:00
Zachary Turner 7cc13e557c [PDB/CodeView] Rename some classes.
In preparation for introducing writing capabilities for each of
these classes, I would like to adopt a Foo / FooRef naming
convention, where Foo indicates that the class can manipulate and
serialize Foos, and FooRef indicates that it is an immutable view of
an existing Foo.  In other words, Foo is a writer and FooRef is a
reader.  This patch names some existing readers to conform to the
FooRef convention, while offering no functional change.

llvm-svn: 301810
2017-05-01 16:46:39 +00:00
Zachary Turner 5b6e4e0aed [llvm-pdbdump] Abstract some of the YAML/Raw printing code.
There is a lot of duplicate code for printing line info between
YAML and the raw output printer.  This introduces a base class
that can be shared between the two, and makes some minor
cleanups in the process.

llvm-svn: 301728
2017-04-29 01:13:21 +00:00
Zachary Turner c37cb0c6a5 [CodeView] Isolate Debug Info Fragments into standalone classes.
Previously parsing of these were all grouped together into a
single master class that could parse any type of debug info
fragment.

With writing forthcoming, the complexity of each individual
fragment is enough to warrant them having their own classes so
that reading and writing of each fragment type can be grouped
together, but isolated from the code for reading and writing
other fragment types.

In doing so, I found a place where parsing code was duplicated
for the FileChecksums fragment, across llvm-readobj and the
CodeView library, and one of the implementations had a bug.
Now that the codepaths are merged, the bug is resolved.

Differential Revision: https://reviews.llvm.org/D32547

llvm-svn: 301557
2017-04-27 16:12:16 +00:00
Zachary Turner e509447418 [Support] Make BinaryStreamArray extractors stateless.
Instead, we now pass a context memeber through the extraction
process.

llvm-svn: 301556
2017-04-27 16:11:47 +00:00
Zachary Turner 67c5601404 Rename some PDB classes.
We have a lot of very similarly named classes related to
dealing with module debug info.  This patch has NFC, it just
renames some classes to be more descriptive (albeit slightly
more to type).  The mapping from old to new class names is as
follows:

   Old          |        New
ModInfo         | DbiModuleDescriptor
ModuleSubstream | ModuleDebugFragment
ModStream       | ModuleDebugStream

With the corresponding Builder classes renamed accordingly.

Differential Revision: https://reviews.llvm.org/D32506

llvm-svn: 301555
2017-04-27 16:11:19 +00:00
Vassil Vassilev e1f12fadc0 Remove unused functions. Remove static qualifier from functions in header files. NFC.
llvm-svn: 299947
2017-04-11 14:55:32 +00:00
Reid Kleckner c4b5d794f1 [codeview] Cope with unsorted streams in type merging
Summary:
MASM can produce type streams that are not topologically sorted. It can
even produce type streams with circular references, but those are not
common in practice.

Reviewers: inglorion, ruiu

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31629

llvm-svn: 299403
2017-04-03 23:58:15 +00:00
Reid Kleckner 1c3b5087b7 [codeview] Add support for label type records
MASM can produce these type records.

llvm-svn: 299388
2017-04-03 21:25:20 +00:00
Reid Kleckner acd9a6f09d [codeview] Fix buggy BeginIndexMapSize assertion
This assert is just trying to test that processing each record adds
exactly one entry to the index map. The assert logic was wrong when the
first record in the type stream was a field list.

I've simplified the code by moving the LF_FIELDLIST-specific logic into
the callback for that record type.

llvm-svn: 299035
2017-03-29 22:51:22 +00:00
Reid Kleckner 5d57752c81 [PDB] Split item and type records when merging type streams
Summary: MSVC does this when producing a PDB.

Reviewers: ruiu

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31316

llvm-svn: 298717
2017-03-24 17:26:38 +00:00
Reid Kleckner a5d187b0ff [PDB] Use two DBs when dumping the IPI stream
Summary:
When dumping these records from an object file section, we should use
only one type database. However, when dumping from a PDB, we should use
two: one for the type stream and one for the IPI stream.

Certain type records that normally live in the .debug$T object file
section get moved over to the IPI stream of the PDB file and they get
new indices.

So far, I've noticed that the MSVC linker always moves these records
into IPI:
- LF_FUNC_ID
- LF_MFUNC_ID
- LF_STRING_ID
- LF_SUBSTR_LIST
- LF_BUILDINFO
- LF_UDT_MOD_SRC_LINE

These records have index fields that can point into TPI or IPI. In
particular, LF_SUBSTR_LIST and LF_BUILDINFO point to LF_STRING_ID
records to describe compilation command lines.

I've modified the dumper to have an optional pointer to the item DB, and
to do type name lookup of these fields in that DB. See printItemIndex.
The result is that our pdbdump-headers.test is more faithful to the PDB
contents and the output is less confusing.

Reviewers: ruiu

Subscribers: amccarth, zturner, llvm-commits

Differential Revision: https://reviews.llvm.org/D31309

llvm-svn: 298649
2017-03-23 21:36:25 +00:00
Reid Kleckner c573acd9e9 [codeview] Move type index remapping logic to type merger
Summary:
This removes the 'remapTypeIndices' method on every TypeRecord class. My
original idea was that this would be the beginning of some kind of
generic entry point that would enumerate all of the TypeIndices inside
of a TypeRecord, so that we could write generic graph algorithms for
them without duplicating the knowledge of which fields are type index
fields everywhere. This never happened, and nothing else uses this
method. I need to change the API to deal with merging into IPI streams,
so let's move it into the file that uses it first.

Reviewers: zturner, ruiu

Reviewed By: zturner, ruiu

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D31267

llvm-svn: 298564
2017-03-23 00:14:23 +00:00
Reid Kleckner 45928018c5 [codeview] Use separate records for LF_SUBSTR_LIST and LF_ARGLIST
They are structurally the same, but now we need to distinguish them
because one record lives in the IPI stream and the other lives in TPI.

llvm-svn: 298474
2017-03-22 01:37:38 +00:00
Zachary Turner 2ed2aa75bf [pdb] Fix an uninitialized read, and add a test for it.
This was originally reported in pr32249, uncovered by PTVS-Studio.
There was no code coverage for this path because it was
difficult to construct odd-case PDB files that were not generated
by cl.

Now that we can write construct minimal PDB files from YAML,
it's easy to construct fragments that generate whatever we want.

In this patch I add a test that creates 2 type records.  One
with a unique name, and one without.  I verify that we can go
from PDB to Yaml with no errors.  In a future patch I'd like
to add something like llvm-pdbdump raw -lookup-type that will
just dump one record and nothing else, which should make it
a bit cleaner to find this kind of thing.

llvm-svn: 298017
2017-03-17 00:15:55 +00:00
Zachary Turner 42cb87f401 [PDB] It is not an error getting the "Invalid" Annotation opcode.
The linker can insert invalid opcodes to indicate padding
bytes, and we should not fail in this case.

llvm-svn: 298016
2017-03-17 00:15:27 +00:00
Zachary Turner 407dec59a4 [llvm-pdbdump] Add support for dumping symbols from Yaml -> PDB.
Previously we could round-trip type records from PDB -> Yaml ->
PDB, but for symbols we could only go from PDB -> Yaml.  This
completes the round-tripping for symbols as well.

llvm-svn: 297625
2017-03-13 14:57:45 +00:00
Zachary Turner d9dc2829ea [Support] Move Stream library from MSF -> Support.
After several smaller patches to get most of the core improvements
finished up, this patch is a straight move and header fixup of
the source.

Differential Revision: https://reviews.llvm.org/D30266

llvm-svn: 296810
2017-03-02 20:52:51 +00:00
Zachary Turner 695ed56ba5 [PDB] Make streams carry their own endianness.
Before the endianness was specified on each call to read
or write of the StreamReader / StreamWriter, but in practice
it's extremely rare for streams to have data encoded in
multiple different endiannesses, so we should optimize for the
99% use case.

This makes the code cleaner and more general, but otherwise
has NFC.

llvm-svn: 296415
2017-02-28 00:04:07 +00:00
Zachary Turner 120faca41b [PDB] Partial resubmit of r296215, which improved PDB Stream Library.
This was reverted because it was breaking some builds, and
because of incorrect error code usage.  Since the CL was
large and contained many different things, I'm resubmitting
it in pieces.

This portion is NFC, and consists of:

1) Renaming classes to follow a consistent naming convention.
2) Fixing the const-ness of the interface methods.
3) Adding detailed doxygen comments.
4) Fixing a few instances of passing `const BinaryStream& X`.  These
   are now passed as `BinaryStreamRef X`.

llvm-svn: 296394
2017-02-27 22:11:43 +00:00
NAKAMURA Takumi 05a75e40da Revert r296215, "[PDB] General improvements to Stream library." and followings.
r296215, "[PDB] General improvements to Stream library."
r296217, "Disable BinaryStreamTest.StreamReaderObject temporarily."
r296220, "Re-enable BinaryStreamTest.StreamReaderObject."
r296244, "[PDB] Disable some tests that are breaking bots."
r296249, "Add static_cast to silence -Wc++11-narrowing."

std::errc::no_buffer_space should be used for OS-oriented errors for socket transmission.
(Seek discussions around llvm/xray.)

I could substitute s/no_buffer_space/others/g, but I revert whole them ATM.

Could we define and use LLVM errors there?

llvm-svn: 296258
2017-02-25 17:04:23 +00:00
Zachary Turner af299ea5d4 [PDB] General improvements to Stream library.
This adds various new functionality and cleanup surrounding the
use of the Stream library.  Major changes include:

* Renaming of all classes for more consistency / meaningfulness
* Addition of some new methods for reading multiple values at once.
* Full suite of unit tests for reader / writer functionality.
* Full set of doxygen comments for all classes.
* Streams now store their own endianness.
* Fixed some bugs in a few of the classes that were discovered
  by the unit tests.

llvm-svn: 296215
2017-02-25 00:44:30 +00:00
Zachary Turner d2684b7969 [PDB] Rename Stream related source files.
This is part of a larger effort to get the Stream code moved
up to Support.  I don't want to do it in one large patch, in
part because the changes are so big that it will treat everything
as file deletions and add, losing history in the process.
Aside from that though, it's just a good idea in general to
make small changes.

So this change only changes the names of the Stream related
source files, and applies necessary source fix ups.

llvm-svn: 296211
2017-02-25 00:33:34 +00:00
Zachary Turner 181fe17b6f Don't assume little endian in StreamReader / StreamWriter.
In an effort to generalize this so it can be used by more than
just PDB code, we shouldn't assume little endian.

llvm-svn: 295525
2017-02-18 01:35:33 +00:00
Zachary Turner 7b327d051b [pdb] Add the ability to resolve TypeServer PDBs.
Some PDBs or object files can contain references to other PDBs
where the real type information lives.  When this happens,
all type indices in the original PDB are meaningless because
their records are not there.

With this patch we add the ability to pull type info from those
secondary PDBs.

Differential Revision: https://reviews.llvm.org/D29973

llvm-svn: 295382
2017-02-16 23:35:45 +00:00
Zachary Turner 5ce0f4a9de Properly parse the TypeServer2 record.
llvm-svn: 294046
2017-02-03 21:22:27 +00:00
Rui Ueyama a9b29615fb Re-submit r293820: Return Error instead of bool from mergeTypeStreams().
llvm-svn: 293847
2017-02-02 00:47:10 +00:00
Rui Ueyama 7d07a1652d Revert r293820: Return Error instead of bool from mergeTypeStreams().
It broke buildbots.

llvm-svn: 293824
2017-02-01 22:28:43 +00:00
Rui Ueyama 00d4f49717 Return Error instead of bool from mergeTypeStreams().
Previously, mergeTypeStreams returns only true or false, so it was
impossible to know the reason if it failed. This patch changes the
function signature so that it returns an Error object.

Differential Revision: https://reviews.llvm.org/D29362

llvm-svn: 293820
2017-02-01 22:09:34 +00:00
Zachary Turner d50c01308e [pdb] Add a new command for analyzing hash collisions.
This introduces the `analyze` subcommand.  For now there is only
one option, to analyze hash collisions in the type streams.  In
the future, however, we could add many more things here, such
as performing size analyses, compacting, and statistics about
the type of records etc.

llvm-svn: 293795
2017-02-01 18:30:22 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Zachary Turner 629cb7d8cc [CodeView] Finish decoupling TypeDatabase from TypeDumper.
Previously the type dumper itself was passed around to a lot of different
places and manipulated in ways that were more appropriate on the type
database. For example, the entire TypeDumper was passed into the symbol
dumper, when all the symbol dumper wanted to do was lookup the name of a
TypeIndex so it could print it. That's what the TypeDatabase is for --
mapping type indices to names.

Another example is how if the user runs llvm-pdbdump with the option to
dump symbols but not types, we still have to visit all types so that we
can print minimal information about the type of a symbol, but just without
dumping full symbol records. The way we did this before is by hacking it
up so that we run everything through the type dumper with a null printer,
so that the output goes to /dev/null. But really, we don't need to dump
anything, all we want to do is build the type database. Since
TypeDatabaseVisitor now exists independently of TypeDumper, we can do
this. We just build a custom visitor callback pipeline that includes a
database visitor but not a dumper.

All the hackery around printers etc goes away. After this patch, we could
probably even delete the entire CVTypeDumper class since really all it is
at this point is a thin wrapper that hides the details of how to build a
useful visitation pipeline. It's not a priority though, so CVTypeDumper
remains for now.

After this patch we will be able to easily plug in a different style of
type dumper by only implementing the proper visitation methods to dump
one-line output and then sticking it on the pipeline.

Differential Revision: https://reviews.llvm.org/D28524

llvm-svn: 291724
2017-01-11 23:24:22 +00:00
Zachary Turner a9054ddd9c [CodeView/PDB] Rename a bunch of files.
We were starting to get some name clashes between llvm-pdbdump
and the common CodeView framework, so I took this opportunity
to rename a bunch of files to more accurately describe their
usage.  This also helps in llvm-pdbdump to distinguish
between different files and whether they are used for pretty
dump mode or raw dump mode.

llvm-svn: 291627
2017-01-11 00:35:43 +00:00
Zachary Turner c640b76db5 [CodeView] Add TypeDatabase class.
This creates a centralized class in which to store type records.
It stores types as an array of entries, which matches the
notion of a type stream being a topologically sorted DAG.
Logic to build up such a database was already being used in
CVTypeDumper, so CVTypeDumper is now updated to to read from
a TypeDatabase which is filled out by an earlier visitor in
the pipeline.

Differential Revision: https://reviews.llvm.org/D28486

llvm-svn: 291626
2017-01-11 00:35:08 +00:00
Zachary Turner 10005d915e Delete unused file.
llvm-svn: 290021
2016-12-17 00:58:19 +00:00
Zachary Turner 46225b193f Resubmit "[CodeView] Hook CodeViewRecordIO for reading/writing symbols."
The original patch was broken due to some undefined behavior
as well as warnings that were triggering -Werror.

llvm-svn: 290000
2016-12-16 22:48:14 +00:00
Zachary Turner d0fffd1d14 Revert "[CodeView] Hook CodeViewRecordIO for reading/writing symbols."
This reverts commit r289978, which is failing due to some rebase/merge
issues.

llvm-svn: 289981
2016-12-16 19:25:23 +00:00
Zachary Turner a4e7dfbc16 [CodeView] Hook CodeViewRecordIO for reading/writing symbols.
This is the 3rd of 3 patches to get reading and writing of
CodeView symbol and type records to use a single codepath.

Differential Revision: https://reviews.llvm.org/D26427

llvm-svn: 289978
2016-12-16 19:20:35 +00:00
Zachary Turner 44728f4014 Fix some size_t / uint32_t ambiguity errors.
llvm-svn: 286305
2016-11-08 22:30:11 +00:00
Zachary Turner 4efa0a4201 [CodeView] Hook up CodeViewRecordIO to type serialization path.
Previously support had been added for using CodeViewRecordIO
to read (deserialize) CodeView type records.  This patch adds
support for writing those same records.  With this patch,
reading and writing of CodeView type records finally uses a single
codepath.

Differential Revision: https://reviews.llvm.org/D26253

llvm-svn: 286304
2016-11-08 22:24:53 +00:00
Zachary Turner 7251ede7c5 Add CodeViewRecordIO for reading and writing.
Using a pattern similar to that of YamlIO, this allows
us to have a single codepath for translating codeview
records to and from serialized byte streams.  The
current patch only hooks this up to the reading of
CodeView type records.  A subsequent patch will hook
it up for writing of CodeView type records, and then a
third patch will hook up the reading and writing of
CodeView symbols.

Differential Revision: https://reviews.llvm.org/D26040

llvm-svn: 285836
2016-11-02 17:05:19 +00:00
Bob Haarman 26a87bd030 [codeview] support emitting indirect virtual base class information
Summary:
Fixes PR28281.

MSVC lists indirect virtual base classes in the field list of a class,
using LF_IVBCLASS records. This change makes LLVM emit such records
when processing DW_TAG_inheritance tags with the DIFlagVirtual and
(newly introduced) DIFlagIndirect tags.

Reviewers: rnk, ruiu, zturner

Differential Revision: https://reviews.llvm.org/D25578

llvm-svn: 285130
2016-10-25 22:11:52 +00:00
Zachary Turner 4d49eb9fa0 [CodeView] Refactor serialization to use StreamInterface.
This was all using ArrayRef<>s before which presents a problem
when you want to serialize to or deserialize from an actual
PDB stream.  An ArrayRef<> is really just a special case of
what can be handled with StreamInterface though (e.g. by using
a ByteStream), so changing this to use StreamInterface allows
us to plug in a PDB stream and get all the record serialization
and deserialization for free on a MappedBlockStream.

Subsequent patches will try to remove TypeTableBuilder and
TypeRecordBuilder in favor of class that operate on
Streams as well, which should allow us to completely merge
the reading and writing codepaths for both types and symbols.

Differential Revision: https://reviews.llvm.org/D25831

llvm-svn: 284762
2016-10-20 18:31:19 +00:00
Reid Kleckner 990504e625 Remove LLVM_NOEXCEPT and replace it with noexcept
Now that we have dropped MSVC 2013, all supported compilers support
noexcept and we can drop this portability macro.

llvm-svn: 284672
2016-10-19 23:52:38 +00:00
Reid Kleckner edfc9dcf42 Truncate long names in type records
In the MS ABI, the frontend is supposed to MD5 such pathologically long
names. LLVM should still defend itself from long names, though.

Fixes part of PR29098.

llvm-svn: 284136
2016-10-13 17:33:22 +00:00