Commit Graph

4 Commits

Author SHA1 Message Date
Kirill Bobyrev b217ddb1bb [clangd] Use TRUE iterator instead of complete posting list
Stop using `$$$` (empty) trigram and generating a posting list with all
items. Since TRUE iterator is already implemented and correctly inserted
when there are no real trigram posting lists, this is a valid
transformation.

Benchmarks show that this simple change allows ~30% speedup on dataset
of real completion queries.

Before

```
-------------------------------------------------------
Benchmark                Time           CPU Iterations
-------------------------------------------------------
DexAdHocQueries    5640321 ns    5640265 ns        120
DexRealQ         939835603 ns  939830296 ns          1
```

After

```
-------------------------------------------------------
Benchmark                Time           CPU Iterations
-------------------------------------------------------
DexAdHocQueries    3452014 ns    3451987 ns        203
DexRealQ         667455912 ns  667455750 ns          1
```

Reviewed by: ilya-biryukov

Differential Revision: https://reviews.llvm.org/D51287

llvm-svn: 340729
2018-08-27 09:47:50 +00:00
Simon Pilgrim 7b31fae983 Fix MSVC 'std::min: no matching overloaded function found' error.
llvm-svn: 339557
2018-08-13 12:24:48 +00:00
Kirill Bobyrev ff2dd9095f [clangd] Generate incomplete trigrams for the Dex index
This patch handles trigram generation "short" identifiers and queries.
Trigram generator produces incomplete trigrams for short names so that
the same query iterator API can be used to match symbols which don't
have enough symbols to form a trigram and correctly handle queries which
also are not sufficient for generating a full trigram.

Reviewed by: ioeric

Differential revision: https://reviews.llvm.org/D50517

llvm-svn: 339548
2018-08-13 08:57:06 +00:00
Kirill Bobyrev 5e82f05e7a [clangd] Introduce Dex symbol index search tokens
This patch introduces the core building block of the next-generation
Clangd symbol index - Dex. Search tokens are the keys in the inverted
index and represent a characteristic of a specific symbol: examples of
search token types (Token Namespaces) are

* Trigrams -  these are essential for unqualified symbol name fuzzy
search * Scopes for filtering the symbols by the namespace * Paths, e.g.
these can be used to uprank symbols defined close to the edited file

This patch outlines the generic for such token namespaces, but only
implements trigram generation.

The intuition behind trigram generation algorithm is that each extracted
trigram is a valid sequence for Fuzzy Matcher jumps, proposed
implementation utilize existing FuzzyMatcher API for segmentation and
trigram extraction.

However, trigrams generation algorithm for the query string is different
from the previous one: it simply yields sequences of 3 consecutive
lowercased valid characters (letters, digits).

Dex RFC in the mailing list:
http://lists.llvm.org/pipermail/clangd-dev/2018-July/000022.html

The trigram generation techniques are described in detail in the
proposal:
https://docs.google.com/document/d/1C-A6PGT6TynyaX4PXyExNMiGmJ2jL1UwV91Kyx11gOI/edit#heading=h.903u1zon9nkj

Reviewers: sammccall, ioeric, ilya-biryukovA

Subscribers: cfe-commits, klimek, mgorny, MaskRay, jkorous, arphaman

Differential Revision: https://reviews.llvm.org/D49591

llvm-svn: 337901
2018-07-25 10:34:57 +00:00