Originally I wanted to remove the RegularExpression class in Utility and
replace it with llvm::Regex. However, during that transition I noticed
that there are several places where need the regular expression string.
So instead I propose to keep the RegularExpression class and make it a
thin wrapper around llvm::Regex.
This patch also removes the workaround for empty regular expressions.
The result is that we are now (more or less) POSIX conformant.
Differential revision: https://reviews.llvm.org/D66174
llvm-svn: 369153
Summary:
This commit achieves the following:
- Functions used to return a `TypeSystem *` return an
`llvm::Expected<TypeSystem *>` now. This means that the result of a call
is always checked, forcing clients to move more carefully.
- `TypeSystemMap::GetTypeSystemForLanguage` will either return an Error or a
non-null pointer to a TypeSystem.
Reviewers: JDevlieghere, davide, compnerd
Subscribers: jdoerfert, lldb-commits
Differential Revision: https://reviews.llvm.org/D65122
llvm-svn: 367360
Delete the abstract GetOffset function, which is only defined for
rnglists entries. Instead fix up entries which refer to the range list
classes so that one can statically know that he is dealing with the
rnglists section and call the function that way.
llvm-svn: 367106
Summary:
With the last round of refactors, supporting type units in dwo files
becomes almost trivial. This patch contains a couple of small fixes,
which taken as a whole make type units work in the split dwarf scenario
(both DWARF4 and DWARF5):
- DWARFContext: make sure we actually read the debug_types.dwo section
- DWARFUnit: set string offsets base on all units in the dwo file, not
just the main CU
- ManualDWARFIndex: index all units in the file
- SymbolFileDWARFDwo: Search for the single compile unit in the file, as
we can no longer assume it will be the first one
The last part makes it obvious that there is still some work to be done
here, namely that we do not support dwo files with multiple compile
units. That is something that should be easier after the DIERef
refactors, but it still requires more work.
Tests are added for the type units+split dwarf + dwarf4/5 scenarios, as
well as a test that checks we behave reasonably in the presence of dwo
files with multiple CUs.
Reviewers: clayborg, JDevlieghere, aprantl
Subscribers: arphaman, lldb-commits
Differential Revision: https://reviews.llvm.org/D63643
llvm-svn: 364274
Summary:
When dwo support was introduced, it used a trick where debug info
entries were referenced by the offset of the compile unit in the main
file, but the die offset was relative to the dwo file. Although there
was some elegance to it, this representation was starting to reach its
breaking point:
- the fact that the skeleton compile unit owned the DWO file meant that
it was impossible (or at least hard and unintuitive) to support DWO
files containing more than one compile unit. These kinds of files are
produced by LTO for example.
- it made it impossible to reference any DIEs in the skeleton compile
unit (although the skeleton units are generally empty, clang still
puts some info into them with -fsplit-dwarf-inlining).
- (current motivation) it made it very hard to support type units placed
in DWO files, as type units don't have any skeleton units which could
be referenced in the main file
This patch addresses this problem by introducing an new
"dwo_num" field to the DIERef class, whose purpose is to identify the
dwo file. It's kind of similar to the dwo_id field in DWARF5 unit
headers, but while this is a 64bit hash whose main purpose is to catch
file mismatches, this is just a smaller integer used to indentify a
loaded dwo file. Currently, this is based on the index of the skeleton
compile unit which owns the dwo file, but it is intended to be
eventually independent of that (to support the LTO use case).
Simultaneously the cu_offset is dropped to conserve space, as it is no
longer necessary. This means we can remove the "BaseObjectOffset" field
from the DWARFUnit class. It also means we can remove some of the
workarounds put in place to support the skeleton-unit+dwo-die combo.
More work is needed to remove all of them, which is out of scope of this
patch.
Reviewers: JDevlieghere, clayborg, aprantl
Subscribers: mehdi_amini, dexonsmith, arphaman, lldb-commits
Differential Revision: https://reviews.llvm.org/D63428
llvm-svn: 364009
Previously it was storing a *pointer*, which left open the possibility
of this pointer being null. We never made use of that possibility (it
does not make sense), and most of the code was already assuming that.
However, there were a couple of null-checks scattered around the code.
This patch replaces the reference with a pointer, making the
non-null-ness explicit, and removes the remaining null-checks.
llvm-svn: 363381
Summary:
This patch creates a cache of file lists in line tables referenced by
type units. This cache is used to avoid parsing a line table twice
(since a file list will generally be shared by many type units).
It also sets things up in a way that parsing of DW_AT_decl_file
attributes will keep working even when we stop creating lldb compile
units for dwarf type units, but it stops short of actually doing that.
This means that the request for files now go directly to SymbolFileDWARF
instead of being routed there indirectly via the
lldb_private::CompileUnit class.
As a result of this, a number of occurences of SymbolContext variables
in DWARFASTParserClang have become unused, so I remove them.
This patch reduces the number of times a file list is being parsed, but
the situation is still suboptimal, as the parsed list is being copied
multiple times. This will be fixed when we stop creating CompileUnits
for DWARF type units.
Reviewers: clayborg, aprantl, JDevlieghere
Subscribers: jdoerfert, lldb-commits
Differential Revision: https://reviews.llvm.org/D62894
llvm-svn: 363143
Summary:
r362103 exposed a bug, where we could read incorrect data if a skeleton
unit contained more than the single unit DIE. Clang emits these kinds of
units with -fsplit-dwarf-inlining (which is also the default).
Changing lldb to handle these DIEs is nontrivial, as we'd have to change
the UID encoding logic to be able to reference these DIEs, and fix up
various places which are assuming that all DIEs come from the separate
compile unit.
However, it turns out this is not necessary, as the DWO unit contains
all the information that the skeleton unit does. So, this patch just
skips parsing the extra DIEs if we have successfully found the DWO file.
This enforces the invariant that the rest of the code is already
operating under.
This patch fixes a couple of existing tests, but I've also included a
simpler test which does not depend on execution of binaries, and would
have helped us in catching this sooner.
Reviewers: clayborg, JDevlieghere, aprantl
Subscribers: probinson, dblaikie, lldb-commits
Differential Revision: https://reviews.llvm.org/D62852
llvm-svn: 362586
The function Extract() is almost a duplicate of FastExtract() but is not used.
Delete it and rename FastExtract() to Extract().
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D62593
llvm-svn: 362049
Summary:
debug_ranges got renamed to debug_rnglists in DWARF 5. Prior to this
patch lldb was just picking the first section it could find in the file,
and using that for all address ranges lookups. This is not correct in
case the file contains a mixture of compile units with various standard
versions (not a completely unlikely scenario).
In this patch I make lldb support reading from both sections
simulaneously, and decide the correct section to use based on the
version number of the compile unit. SymbolFileDWARF::DebugRanges is
split into GetDebugRanges and GetDebugRngLists (the first one is renamed
mainly so we can catch all incorrect usages).
I tried to structure the code similarly to how llvm handles this logic
(hence DWARFUnit::FindRnglistFromOffset/Index), but the implementations
are still relatively far from each other.
Reviewers: JDevlieghere, aprantl, clayborg
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D62302
llvm-svn: 361938
The fix form sizes use to have two arrays: one for 4 byte addresses and in for 8 byte addresses. The table had an issue where DW_FORM_flag_present wasn't being represented as a fixed size form because its actual size _is_ zero and zero was used to indicate the form isn't fixed in size. Any code that needed to quickly access the DWARF had to get a FixedFormSizes instance using the address byte size.
This fix cleans things up by adding a DWARFFormValue::GetFixedSize() both as a static method and as a member function on DWARFFormValue. It correctly can indicate if a form size is zero. This cleanup is a precursor to a follow up patch where I hope to speed up DWARF parsing.
I verified performance doesn't regress by loading hundreds of DWARF files and setting a breakpoint by file and line and by name in files that do not have DWARF indexes. Performance remained consistent between the two approaches.
Differential Revision: https://reviews.llvm.org/D62416
llvm-svn: 361675
Summary:
This patch implements the main feature of type units. When completing a
type, if we encounter a DW_AT_signature attribute, we use it's value to
lookup the complete definition of the type in the relevant type unit.
To enable this lookup, we build up a map of all type units in a symbol
file when parsing the units. Then we consult this map when resolving the
DW_AT_signature attribute.
I include add a couple of tests which exercise the type lookup feature,
including one that ensure we do something reasonable in case we fail to
lookup the type.
A lot of the ideas in this patch have been taken from D32167 and D61505.
Reviewers: clayborg, JDevlieghere, aprantl, alexshap
Subscribers: mgrang, lldb-commits
Differential Revision: https://reviews.llvm.org/D62246
llvm-svn: 361603
Summary:
NFC = [[ https://llvm.org/docs/Lexicon.html#nfc | Non functional change ]]
This commit is the result of modernizing the LLDB codebase by using
`nullptr` instread of `0` or `NULL`. See
https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-nullptr.html
for more information.
This is the command I ran and I to fix and format the code base:
```
run-clang-tidy.py \
-header-filter='.*' \
-checks='-*,modernize-use-nullptr' \
-fix ~/dev/llvm-project/lldb/.* \
-format \
-style LLVM \
-p ~/llvm-builds/debug-ninja-gcc
```
NOTE: There were also changes to `llvm/utils/unittest` but I did not
include them because I felt that maybe this library shall be updated in
isolation somehow.
NOTE: I know this is a rather large commit but it is a nobrainer in most
parts.
Reviewers: martong, espindola, shafik, #lldb, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, JDevlieghere, teemperor, rnkovacs, emaste, kubamracek, nemanjai, ki.stfu, javed.absar, arichardson, kbarton, jrtc27, MaskRay, atanasyan, dexonsmith, arphaman, jfb, jsji, jdoerfert, lldb-commits, llvm-commits
Tags: #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D61847
llvm-svn: 361484
Summary:
Type units don't describe any code, so they should never be the result
of any address lookup queries.
Previously, we would compute the address ranges for the type units for
via the line tables they reference because the type units looked a lot
like line-tables-only compile units. However, this is not correct, as
the line tables are only referenced from type units so that other
declarations can use the file names contained in them.
In this patch I make the BuildAddressRangeTable function virtual, and
implement it only for compile units.
Testing this was a bit tricky, because the behavior depends on the order
in which we add things to the address range map. This rarely caused a
problem with DWARF v4 type units, as they are always added after all
CUs. It happened more frequently with DWARF v5, as there clang emits the
type units first. However, this is still not something that it is
required to do, so for testing I've created an assembly file where I've
deliberately sandwiched a compile unit between two type units, which
should isolate us from both changes in how the compiler emits the units
and changes in the order we process them.
Reviewers: clayborg, aprantl, JDevlieghere
Subscribers: jdoerfert, lldb-commits
Differential Revision: https://reviews.llvm.org/D62178
llvm-svn: 361465
Summary:
This patch introduces the DWARFTypeUnit class, and teaches lldb to parse
type units out of both the debug_types section (DWARF v4), and from the
regular debug_info section (DWARF v5).
The most important piece of functionality - resolving DW_AT_signatures
to connect type forward declarations to their definitions - is not
implemented here, but even without that, a lot of functionality becomes
available. I've added tests for the commands that start to work after
this patch.
The changes in this patch were greatly inspired by D61505, which in turn took
over changes from D32167.
Reviewers: JDevlieghere, clayborg, aprantl
Subscribers: mgorny, jankratochvil, lldb-commits
Differential Revision: https://reviews.llvm.org/D62008
llvm-svn: 361360
In D61502#1503247 @clayborg suggested that SymbolFileDWARF *dwarf2Data is
really redundant in all the calls with also having DWARFUnit *cu. So remove it.
One `SymbolFileDWARF *` nullptr check
(DWARFDebugInfoEntry::GetDIENamesAndRanges) could be removed, other two nullptr
checks (DWARFDebugInfoEntry::GetName and DWARFDebugInfoEntry::AppendTypeName)
need to stay in place (now for `DWARFUnit *`).
Differential Revision: https://reviews.llvm.org/D62011
llvm-svn: 361277
Summary:
This patch introduces the DWARFUnitHeader class. Its purpose (and its
structure, to the extent it was possible to make it) is the same as its
LLVM counterpart -- to extract the unit header information before we
actually construct the unit, so that we know which kind of units to
construct. This is needed because as of DWARF5, type units live in the
.debug_info section, which means it's not possible to statically
determine the type of units in a given section.
Reviewers: aprantl, clayborg, JDevlieghere
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D62073
llvm-svn: 361224
This moves the sections from SymbolFileDWARF to DWARFContext, where it
was trivial to do so. A couple of sections are still left in
SymbolFileDWARF. These will be handled by separate patches.
llvm-svn: 361127
So far dw_offset_t was global for the whole SymbolFileDWARF but with
.debug_types the same dw_offset_t may mean two different things depending on
its section (=CU). So references now return whole new referenced DWARFDIE
instead of just dw_offset_t.
This means that some functions have to now handle 16 bytes instead of 8 bytes
but I do not see that anywhere performance critical.
Differential Revision: https://reviews.llvm.org/D61502
llvm-svn: 360795
D42892 changed a lot of code to use superclass DWARFUnit instead of its
subclass DWARFCompileUnit.
Finish this change more thoroughly for any *CompileUnit* -> *Unit* names.
Later patch will introduce DWARFTypeUnit which needs to be sometimes different
from DWARFCompileUnit and it would be confusing without this renaming.
Differential Revision: https://reviews.llvm.org/D61501
llvm-svn: 360443
Summary:
The implementation of GetID used a relatively complicated algorithm,
which returned some kind of an offset of the unit in some file
(depending on the debug info flavour). The only thing this ID was used
for was to enable subseqent retrieval of the unit from the SymbolFile.
This can be made simpler if we just make the "ID" of the unit an index
into the list of the units belonging to the symbol file. We already
support indexed access to the units, so each unit already has a well
"index" -- this just makes it accessible from within the unit.
To make the distincion between "id" and "offset" clearer (and help catch
any misuses), I also rename DWARFDebugInfo::GetCompileUnit (which
accesses by offset) into DWARFDebugInfo::GetCompileUnitAtOffset.
On its own, this only brings a minor simplification, but it enables
further simplifications in the DIERef class (coming in a follow-up
patch).
Reviewers: JDevlieghere, clayborg, aprantl
Subscribers: arphaman, jdoerfert, lldb-commits, tberghammer, jankratochvil
Differential Revision: https://reviews.llvm.org/D61481
llvm-svn: 360014
A lot of comments in LLDB are surrounded by an ASCII line to delimit the
begging and end of the comment.
Its use is not really consistent across the code base, sometimes the
lines are longer, sometimes they are shorter and sometimes they are
omitted. Furthermore, it looks kind of weird with the 80 column limit,
where the comment actually extends past the line, but not by much.
Furthermore, when /// is used for Doxygen comments, it looks
particularly odd. And when // is used, it incorrectly gives the
impression that it's actually a Doxygen comment.
I assume these lines were added to improve distinguishing between
comments and code. However, given that todays editors and IDEs do a
great job at highlighting comments, I think it's worth to drop this for
the sake of consistency. The alternative is fixing all the
inconsistencies, which would create a lot more churn.
Differential revision: https://reviews.llvm.org/D60508
llvm-svn: 358135
This reverts commit r356682 because it breaks the DWO flavours of some
tests:
lldb-Suite :: lang/c/const_variables/TestConstVariables.py
lldb-Suite :: lang/c/local_variables/TestLocalVariables.py
lldb-Suite :: lang/c/vla/TestVLA.py
llvm-svn: 356773
This is mostly mechanical, and just moves the remaining non-DWO
related sections over to DWARFContext.
Differential Revision: https://reviews.llvm.org/D59611
llvm-svn: 356682
All of this is code that is unreferenced. Removing as much of
this as possible makes it more easy to determine what functionality
is missing and/or shared between LLVM and LLDB's DWARF interfaces.
llvm-svn: 356509
These log statements have questionable value, and hinder the effort
of separating the high and low level DWARF parsing interfaces inside
of LLDB. Removing them for now, and if/when we need such log statements
again in the future, we can add them back (if possible) or introduce a
mechanism for logging from the low-level interface in such a way that it
isn't coupled to the high level interface.
Differential Revision: https://reviews.llvm.org/D59498
llvm-svn: 356469
LLVM doesn't produce DWARF64, and neither does GCC. LLDB's support
for DWARF64 is only partial, and if enabled appears to also not work.
Finally, it's untested. Removing this makes merging LLVM and
LLDB's DWARF parsing implementations simpler.
Differential Revision: https://reviews.llvm.org/D59235
llvm-svn: 355975
This is a very thin wrapper over a std::vector<DWARFDIE> and does
not seem to provide any real value over just using a container
directly.
Differential Revision: https://reviews.llvm.org/D59165
llvm-svn: 355974
The `ap` suffix is a remnant of lldb's former use of auto pointers,
before they got deprecated. Although all their uses were replaced by
unique pointers, some variables still carried the suffix.
In r353795 I removed another auto_ptr remnant, namely redundant calls to
::get for unique_pointers. Jim justly noted that this is a good
opportunity to clean up the variable names as well.
I went over all the changes to ensure my find-and-replace didn't have
any undesired side-effects. I hope I didn't miss any, but if you end up
at this commit doing a git blame on a weirdly named variable, please
know that the change was unintentional.
llvm-svn: 353912
Summary:
This adds support for auto-detection of path style to SymbolFileBreakpad
(similar to how r351328 did the same for DWARF). We guess each file
entry separately, as we have no idea which file came from which compile
units (and different compile units can have different path styles). The
breakpad generates should have already converted the paths to absolute
ones, so this guess should be reasonable accurate, but as always with
these kinds of things, it is hard to give guarantees about anything.
In an attempt to bring some unity to the path guessing logic, I move the
guessing logic from inside SymbolFileDWARF into the FileSpec class and
have both symbol files use it to implent their desired behavior.
Reviewers: clayborg, lemo, JDevlieghere
Subscribers: aprantl, markmentovai, lldb-commits
Differential Revision: https://reviews.llvm.org/D57895
llvm-svn: 353702
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
If we opened a file which was produced on system with different path
syntax, we would parse the paths from the debug info incorrectly.
The reason for that is that we would parse the paths as they were
native. For example this meant that on linux we would treat the entire
windows path as a single file name with no directory component, and then
we would concatenate that with the single directory component from the
DW_AT_comp_dir attribute. When parsing posix paths on windows, we would
at least get the directory separators right, but we still would treat
the posix paths as relative, and concatenate them where we shouldn't.
This patch attempts to remedy this by guessing the path syntax used in
each compile unit. (Unfortunately, there is no info in DWARF which would
give the definitive path style used by the produces, so guessing is all
we can do.) Currently, this guessing is based on the DW_AT_comp_dir
attribute of the compile unit, but this can be refined later if needed
(for example, the DW_AT_name of the compile unit may also contain some
useful info). This style is then used when parsing the line table of
that compile unit.
This patch is sufficient to make the line tables come out right, and
enable breakpoint setting by file name work correctly. Setting a
breakpoint by full path still has some kinks (specifically, using a
windows-style full path will not work on linux because the path will be
parsed as a linux path), but this will require larger changes in how
breakpoint setting works.
Reviewers: clayborg, zturner, JDevlieghere
Subscribers: aprantl, lldb-commits
Differential Revision: https://reviews.llvm.org/D56543
llvm-svn: 351328
This patch simplifies boolean expressions acorss LLDB. It was generated
using clang-tidy with the following command:
run-clang-tidy.py -checks='-*,readability-simplify-boolean-expr' -format -fix $PWD
Differential revision: https://reviews.llvm.org/D55584
llvm-svn: 349215
A skeleton compilation unit may contain the DW_AT_str_offsets_base attribute
that points to the first string offset of the CU contribution to the
.debug_str_offsets. At the same time, when we use split dwarf,
the corresponding split debug unit also
may use DW_FORM_strx* forms pointing to its own .debug_str_offsets.dwo.
In that case, DWO does not contain DW_AT_str_offsets_base, but LLDB
still need to know and skip the .debug_str_offsets.dwo section header to
access the offsets.
The patch implements the support of DW_AT_str_offsets_base.
Differential revision: https://reviews.llvm.org/D54844
llvm-svn: 347859
The issue happens because starting from DWARF v5
DW_AT_addr_base attribute should be used
instead of DW_AT_GNU_addr_base. LLDB does not do that and
we end up reading the .debug_addr header as section content
(as addresses) instead of skipping it and reading the real addresses.
Then LLDB is unable to match 2 similar locations and
thinks they are different.
Differential revision: https://reviews.llvm.org/D54751
llvm-svn: 347842
Summary:
While parsing a childless compile unit DIE we could crash if the DIE was
followed by any extra data (such as a superfluous end-of-children
marker). This happened because the break-on-depth=0 check was performed
only when parsing the null DIE, which was not correct because with a
childless root DIE, we could reach the end of the unit without ever
encountering the null DIE.
If the compile unit contribution ended directly after the CU DIE,
everything would be fine as we would terminate parsing due to reaching
EOF. However, if the contribution contained extra data (perhaps a
superfluous end-of-children marker), we would crash because we would
treat that data as the begging of another compile unit.
This fixes the crash by moving the depth=0 check to a more generic
place, and also adds a regression test.
Reviewers: clayborg, jankratochvil, JDevlieghere
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D54417
llvm-svn: 346849
This adds support for DW_RLE_base_addressx, DW_RLE_startx_endx,
DW_RLE_startx_length, DW_FORM_rnglistx.
Differential revision: https://reviews.llvm.org/D53929
llvm-svn: 345958
This adds the support for DW_FORM_addrx, DW_FORM_addrx1,
DW_FORM_addrx2, DW_FORM_addrx3, DW_FORM_addrx4 forms.
Differential revision: https://reviews.llvm.org/D53813
llvm-svn: 345706
It merges DWARFDebugInfoEntry's m_empty_children into m_has_children.
m_empty_children was implemented by rL144983.
As Greg confirmed m_has_children was used to represent what was in the DWARF in
the byte that follows the DW_TAG. m_empty_children was used for DIEs that said
they had children but actually only contain a single NULL tag. It is fine to
not differentiate between the two.
Also changed assert()->lldbassert() for m_abbr_idx 16-bit overflow check as
that could be a tough bug to catch if it ever happens.
I have checked all calls of HasChildren() that this change should not matter to
them. The code even wants to know if there are any children - it does not
matter how the children presence is coded in the binary.
Patch written based on suggestions by Greg Clayton.
Differential Revision: https://reviews.llvm.org/D53321
llvm-svn: 344644
xbolva00 bugreported $subj in: https://reviews.llvm.org/D46810#1247410
It can happen only from the line:
m_die_array.back().SetEmptyChildren(true);
In the case DW_TAG_compile_unit has DW_CHILDREN_yes but there is only 0 (end of
list, no children present). Therefore the assertion can fortunately happen only
with a hand-crafted DWARF or with DWARF from some suboptimal compilers.
Differential Revision: https://reviews.llvm.org/D53255
llvm-svn: 344605
If BuildAddressRangeTable called ExtractDIEsIfNeeded(false), then another
thread started processing data from m_die_array and then the first thread
called final ClearDIEs() the second thread would crash.
It is also required without multithreaded debugger using DW_TAG_partial_unit
for DWZ.
Differential revision: https://reviews.llvm.org/D40470
llvm-svn: 333987
rL145086 introduced m_die_array.shrink_to_fit() implemented by
exact_size_die_array.swap, it was before LLVM became written in C++11.
Differential revision: https://reviews.llvm.org/D47492
llvm-svn: 333636
GetUnitDIEPtrOnly() needs to return pointer to the first DIE.
But the first element of m_die_array after ExtractDIEsIfNeeded(true)
may move in memory after later ExtractDIEsIfNeeded(false).
DWARFDebugInfoEntry::collection m_die_array is std::vector,
its data may move during its expansion.
Differential revision: https://reviews.llvm.org/D46810
llvm-svn: 333437