
Bug 45819 - Nondeterminism of iterators causes false ThinLTO cache misses
Summary: Nondeterminism of iterators causes false ThinLTO cache misses
Status: RESOLVED FIXED
Alias: None
Product: tools
Classification: Unclassified
Component: lto
Version: trunk
Hardware: PC Windows NT
Importance: P normal
Assignee: katya.romanova
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-06 14:47 PDT by katya.romanova
Modified: 2020-06-16 14:53 PDT
8 users

See Also:
Fixed By Commit(s): 252892fea7088abbeff9476e0ecbacc091d135a0


Description katya.romanova 2020-05-06 14:47:35 PDT
We noticed that when building an executable with ThinLTO (cache enabled), unexpected cache misses happen from time to time and new cache entries are generated when they are not needed. It doesn't happen in a predictable manner (i.e. a cache miss might or might not occur on any given relink). I will explain below why it happens sporadically.

==========================

REPRODUCING THE PROBLEM:

Pretty much any medium-sized project will reproduce the problem.

Below is an example where I link 9 bitcode files with ThinLTO (cache enabled). 9 cache entries are created. After I relink (without any changes), 13 cache entries appear (i.e. ThinLTO decided to regenerate cache entries for 4 files).

// No cache directory yet
$ ls
bar.c  clean.bat    foo.c  har.c  hoo.c  lar.c  link-only.bat  loo.o   main.o  mar.o  moo.o
bar.o  compile.bat  foo.o  har.o  hoo.o  lar.o  loo.c          main.c  mar.c   moo.c  run.bat

// linking 9 bitcode files with ThinLTO and cache enabled.
$ D:/LLVM/Upstream_TOT_Windows/build/Release/bin/ld.lld.exe -v --thinlto-cache-dir=CACHE main.o bar.o mar.o lar.o har.o foo.o moo.o loo.o hoo.o -o main.exe
LLD 11.0.0 (https://github.com/llvm/llvm-project.git 0e13a0331fb90078bf71cc0c4612492a6954a5d0) (compatible with GNU linkers)

// After linking, CACHE directory appears; it has 9 cache entries now
$ ls CACHE
llvmcache.timestamp                                 llvmcache-672ECABBAC96BA45430F348DFCE8E15491B55EB6
llvmcache-03A6D70AEC6CAD0212E34C0A503FBFC9FE5215F1  llvmcache-992141653337693DA6421334203353861997FC36
llvmcache-309BAC6DAF777946CF2B27C3FD275A9068C05062  llvmcache-B514E34E44F63329B217ECDBC11D3CA910D4C61C
llvmcache-4320D3E62841F1AF0ED4D2C750579F993CA6D990  llvmcache-B68A1DC4F284712DC4A93C5ABC81D7F6004A8B44
llvmcache-50BC52B7A3DB8537B5803E65DD37EAFA66ED3909  llvmcache-CFE77E094F6A065B944ECC2CB43C8E2CEF17BE3B

// re-linking the same way as before (no changes were made)
$ D:/LLVM/Upstream_TOT_Windows/build/Release/bin/ld.lld.exe -v --thinlto-cache-dir=CACHE main.o bar.o mar.o lar.o har.o foo.o moo.o loo.o hoo.o -o main.exe
LLD 11.0.0 (https://github.com/llvm/llvm-project.git 0e13a0331fb90078bf71cc0c4612492a6954a5d0) (compatible with GNU linkers)

// Note that CACHE directory now has 13 cache entries, though only 9 cache entries are expected.
$ ls CACHE
llvmcache.timestamp                                 llvmcache-672ECABBAC96BA45430F348DFCE8E15491B55EB6
llvmcache-03A6D70AEC6CAD0212E34C0A503FBFC9FE5215F1  llvmcache-8EBFE11F8BD1F02A060B8A4AB2061C0442B64CF1
llvmcache-2EF2ADD5D8F0D727E460B6A94CA3A24C71C051CB  llvmcache-992141653337693DA6421334203353861997FC36
llvmcache-309BAC6DAF777946CF2B27C3FD275A9068C05062  llvmcache-B514E34E44F63329B217ECDBC11D3CA910D4C61C
llvmcache-41900E8308B2AB588B8EFDBFEE923C1F3BA154C8  llvmcache-B68A1DC4F284712DC4A93C5ABC81D7F6004A8B44
llvmcache-4320D3E62841F1AF0ED4D2C750579F993CA6D990  llvmcache-CFE77E094F6A065B944ECC2CB43C8E2CEF17BE3B
llvmcache-50BC52B7A3DB8537B5803E65DD37EAFA66ED3909  llvmcache-D032EEB2A05D13816229E79B08CAFFE5526A0A22


ANALYSIS OF THE PROBLEM:

Look at the following excerpt in LTO.cpp, where we are calculating the LTO cache entry key:

void llvm::computeLTOCacheKey(
    SmallString<40> &Key, const Config &Conf, const ModuleSummaryIndex &Index,
    StringRef ModuleID, const FunctionImporter::ImportMapTy &ImportList,
    const FunctionImporter::ExportSetTy &ExportList,
    const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
    const GVSummaryMapTy &DefinedGlobals,
    const std::set<GlobalValue::GUID> &CfiFunctionDefs,
    const std::set<GlobalValue::GUID> &CfiFunctionDecls) {

Note that ‘ExportList’ is a ‘DenseSet’ of ValueInfo's:
    using ExportSetTy = DenseSet<ValueInfo>;

‘ExportList’ is used for the calculation of a hash value (LTO Cache Key).

Though the elements of this DenseSet are the same every time, the order in which the iterator walks through them might (or might not) differ the next time we relink with ThinLTO. If the order happens to differ, we generate a different hash value and a cache miss occurs.

I wrote some code that looped through the elements of an ExportList (DenseSet<ValueInfo>) using the iterator and printed GUID values (which is what is used when a cache entry key is calculated).

For one of the files (bar.o in the example below), the order in which ExportList elements are visited is different, so the cache entry key value is different too. As a result, we will have a cache miss here.

Module ID: bar.o
Export: 16434608426314478903 13549030661878666305 5512409407375956689

Module ID: bar.o
Export: 16434608426314478903 5512409407375956689 13549030661878666305

Looking at the implementation of:
  template <> struct DenseMapInfo<ValueInfo> (see ModuleSummaryIndex.h)

we notice that the hash value that is being used by a DenseMap is a pointer:
  static unsigned getHashValue(ValueInfo I) { return (uintptr_t)I.getRef(); }

but there is no guarantee that the object will be allocated at the same
address the next time the executable runs.

HOW WE COULD FIX IT:

(1) We could store GUIDs into a container and sort the values before we use them to generate a hash value.

Before:
  for (const auto &VI : ExportList) {
    auto GUID = VI.getGUID();
    // The export list can impact the internalization, be conservative here
    Hasher.update(ArrayRef<uint8_t>((uint8_t *)&GUID, sizeof(GUID)));
  }

After:
  std::vector<uint64_t> exportsGUID;
  exportsGUID.reserve(ExportList.size());
  for (const auto &VI : ExportList) {
    auto GUID = VI.getGUID();
    exportsGUID.push_back(GUID);
  }

  // Sort the export list elements' GUIDs.
  std::stable_sort(exportsGUID.begin(), exportsGUID.end());
  for (uint64_t guid : exportsGUID) {
    // The export list can impact the internalization, be conservative here
    Hasher.update(ArrayRef<uint8_t>((uint8_t *)&guid, sizeof(guid)));
  }

(2) We could use a different hash function.  The current implementation:
  static unsigned getHashValue(ValueInfo I) { return (uintptr_t)I.getRef(); }
is based on an address (which can change from run to run, causing this problem). If we decide to make this change, we need to discuss which hash function to use.

(3) We could use a different container that guarantees that the iteration order matches the insertion order, e.g. SetVector instead of DenseSet, i.e. change
  using ExportSetTy = DenseSet<ValueInfo>;
into
  using ExportSetTy = SetVector<ValueInfo>;
The code changes are minimal and this fix takes care of the problem. However, SetVector uses more space and is less efficient. Potentially we could use some other container, maybe std::vector or SmallVector, but that might require more code changes.


SOME ADDITIONAL NOTES:
(1) I have only encountered this problem on Windows, where I use the lld linker more frequently. The problem might exist on Unix as well, but since it does not reproduce consistently, I may simply not have run into it.

(2) Another thought on why it fails on Windows but might work on Unix: address space layout randomization (i.e. the MSVC compiler has /HIGHENTROPYVA enabled).

(3) We might have a similar issue with ImportList as well. I haven't looked into it yet.
Comment 1 Teresa Johnson 2020-05-07 08:48:59 PDT
+pcc for thoughts since the caching is used by Chromium. I mostly look at distributed ThinLTO which doesn't use the cache computation and won't be affected by this issue.

This was presumably introduced by D70128, which was committed back in November and changed the export list from a std::unordered_set to a DenseSet. When reviewing the patch, I missed the fact that it would affect the determinism of the iteration order. cc'ed Evgeny, who authored the patch.

I suppose this was done for efficiency reasons. But due to the cache effects we should either change this back to unordered_set or some other deterministically iterated set. It shouldn't have a fundamental effect on what that patch was doing. Or as Katya suggests, we could sort them in some way before computing the cache key.

It looks like the import list may also have this problem, as it uses a StringMap, which also doesn't have a deterministic iteration order. It's been that way for a couple of years, so I'm surprised this non-determinism hasn't shown up earlier.

I have a preference towards doing some sorting when the cache key is computed, so that the rest of the thin link isn't affected by a less efficient container.
Comment 2 Eugene Leviant 2020-05-07 09:32:15 PDT
I guess std::unordered_set doesn't have a deterministic iteration order either, so changing the type from DenseSet to std::unordered_set may not help. Sorting looks good; I'd probably use std::sort instead of std::stable_sort, because we're hashing GUIDs only.
Comment 3 David Blaikie 2020-05-07 10:47:40 PDT
Changing the hash function (2) is inadequate because hashing containers have no guaranteed ordering.

If there's a clear "build" phase, separate from the "use/iterate" phase - yeah, just sort it (1) (moving/copying the values into a sortable container from the hashing container) before the use/iterate.

If there is no clear phasing (if you do some building, then iterate, then do some more building, etc) - then use a MapVector (3).
Comment 4 katya.romanova 2020-05-07 12:47:09 PDT
(In reply to David Blaikie from comment #3)
> Changing the hash function (2) is inadequate because hashing containers have
> no guaranteed ordering.
> 
> If there's a clear "build" phase, separate from the "use/iterate" phase -
> yeah, just sort it (1) (moving/copying the values into a sortable container
> from the hashing container) before the use/iterate.
> 

Yes, we have a clear "build" phase, followed by a separate "use/iterate" phase.

So, the general consensus is to sort both the ExportList and ImportList containers, similar to what I proposed in (1), but using std::sort instead of std::stable_sort.
Comment 5 katya.romanova 2020-05-12 03:06:12 PDT
I submitted a review for the fix for this problem.
https://reviews.llvm.org/D79772
Comment 6 katya.romanova 2020-06-10 13:01:36 PDT
Fixed by commit 252892fea7088abbeff9476e0ecbacc091d135a0