LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 23277 - __builtin_object_size(null_pointer) returns the wrong result
Summary: __builtin_object_size(null_pointer) returns the wrong result
Status: RESOLVED FIXED
Alias: None
Product: clang
Classification: Unclassified
Component: -New Bugs (show other bugs)
Version: trunk
Hardware: All All
: P normal
Assignee: Unassigned Clang Bugs
URL:
Keywords:
Depends on:
Blocks: 21420
  Show dependency tree
 
Reported: 2015-04-18 11:37 PDT by Daniel Micay
Modified: 2017-03-22 14:07 PDT (History)
9 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Micay 2015-04-18 11:37:31 PDT
This outputs (size_t)-1 in both cases with `gcc -O2` but clang gives 0 for the second case:

  #include <stdio.h>

  int main(int argc, char **argv) {
    printf("%zu\n", __builtin_object_size(NULL, 0));
    void *p = NULL;
    printf("%zu\n", __builtin_object_size(p, 0));
    return 0;
  }

This is broken with glibc's _FORTIFY_SOURCE implementation for nullable parameters. It assumes that it can do comparisons like `object_size(buffer) < copy_size` after handling the unknown case, but this can result in a false positive with Clang. It's not a glibc bug because they're using GNU C as it is defined by GCC, which is where this feature comes from.
Comment 1 Richard Smith 2015-04-18 18:04:54 PDT
__builtin_object_size(p, 0) gives an upper bound on the number of bytes that can be accessed through pointer p, so both 0 and -1 seem like correct answers here, since LLVM explicitly assumes that there cannot exist a valid object at address 0. As you say, we may need to deliberately give a worse answer here to placate glibc.

Which glibc function in particular are you having problems with, and how does it use __builtin_object_size?

Whatever we do, the first case (constant-folded by Clang) and the second case (constant-folded by LLVM) should give the same answer.
Comment 2 Richard Smith 2015-04-18 18:09:20 PDT
Also, _FORTIFY_SOURCE + glibc + clang is not supported and does not work (for instance, it relies on __builtin_va_pack_len and friends, which we have no intention of supporting), so glibc compatibility is unlikely to be a strong motivator for a change here.
Comment 3 Daniel Micay 2015-04-18 19:30:37 PDT
The realpath(name, resolved) function is one example. The second parameter can be NULL, in which case realpath allocates the string itself with malloc(). The glibc code has a check for (size_t)-1 and then fails if the detected size is smaller than PATH_MAX. This occurs in all of the _FORTIFY_SOURCE functions with similar nullable buffer parameters.

I'm working on extending _FORTIFY_SOURCE in Android's libc (Bionic) and this ends up being annoying because the distinction between zero-size and NULL is important. It has to be handled like this:

#if defined(__clang__)
    if (bos == 0 && !(__builtin_constant_p(resolved) && resolved != NULL)) {
        return __realpath_real(path, resolved);
    }
#endif

The code then checks for bos < PATH_MAX and errors out. I don't think 0 is necessarily a better answer because it makes it seem like there's a 0-size object when NULL really means there's no object at all. It's easy enough to detect NULL if you actually want to handle it separately, but I don't think that's the common case. It's harder to undo the merge of the two cases.
Comment 4 Daniel Micay 2015-04-18 19:33:01 PDT
I know standard C doesn't actually have 0-size objects, but GNU C and other languages like Rust *do*, as does LLVM IR.
Comment 5 Richard Smith 2015-04-18 20:04:24 PDT
I would expect the common case to be that a null pointer is treated like a pointer to an object of size 0 -- not inherently an invalid argument, but invalid if the function requires the ability to access a nonzero number of bytes. If the function explicitly specifies that it has special behavior for a null argument, I'd expect the fortify check to also have an explicit special case for a null argument.

In LLVM IR, the null pointer acts essentially like a pointer to a zero-sized object.

FWIW, if we could prove that a pointer did not point to an object at all, I think 0 would be the right thing for __builtin_object_size to return when applied to that object -- the answer to the question "how many bytes can I access from this pointer?" is zero.
Comment 6 Daniel Micay 2015-04-18 20:22:46 PDT
If it can figure out that there's a dangling pointer, 0 is definitely the right answer. I don't think it's the right way to handle NULL though.
Comment 7 Renato Golin 2015-09-29 10:07:12 PDT
(In reply to comment #1)
> __builtin_object_size(p, 0) gives an upper bound on the number of bytes that
> can be accessed through pointer p, so both 0 and -1 seem like correct
> answers here

From the docs:

"type is an integer constant from 0 to 3. (...) The second bit determines if maximum or minimum of remaining bytes is computed."

Which results in:

When in doubt, "__builtin_object_size should return (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3."

I think the docs are clear enough in this case. This has nothing to do with fortify or glibc.
Comment 8 George Burgess 2016-12-20 19:05:09 PST
This incompatibility just bit us in ChromeOS-land, so I'll throw out my 2c:

Were I designing __builtin_object_size from the ground up, I'd totally agree that __builtin_object_size(NULL, N) should lower to 0. That said, my understanding of __builtin_object_size in clang is that it exists for compatibility with gcc. This NULL-pointer (mis-)feature is clearly relied on by glibc, and I'm uncertain how many other users of __builtin_object_size depend on it (implicitly or explicitly). Because __builtin_object_size, in my mind, exists primarily for compatibility, I don't see why we can't try to better match this corner case.

I'm happy to make clang answer conservatively in this case; if I do, would you be okay with it Richard?
Comment 9 Richard Smith 2016-12-20 21:23:56 PST
Re comment#7:
> When in doubt, "__builtin_object_size should return (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3."

But there is no doubt as to how many bytes can be accessed through a null pointer. The answer is zero bytes.

> This has nothing to do with fortify or glibc.

I disagree. Giving 0 for a null pointer is sound and principled, and the obvious right answer; the only reason to do something else is compatibility with GCC or with code that relies on the GCC semantics.

And, to be clear, that compatibility *is* important.

Re comment#8:
> I'm happy to make clang answer conservatively in this case; if I do, would you be okay with it Richard?

Yes, this seems like the best thing to do. But it's not clear that it's Clang you need to change; the "wrong" answer is coming from LLVM's lowering of @llvm.objectsize, and for that intrinsic, zero is definitely the correct answer for a null pointer.

Ideally, we should solve this in a way that doesn't insert a branch for every non-constant use of __builtin_object_size...
Comment 10 George Burgess 2016-12-20 22:42:32 PST
> But it's not clear that it's Clang you need to change; the "wrong" answer is coming from LLVM's lowering of @llvm.objectsize, and for that intrinsic, zero is definitely the correct answer for a null pointer 

Yup!

> Ideally, we should solve this in a way that doesn't insert a branch for every non-constant use of __builtin_object_size...

I was thinking of adding a bit to @llvm.objectsize that controls how it handles NULL, similar to how it has the min/max bit already. If we make the default "null is 0 bytes", I'd imagine that should be fairly painless.
Comment 11 Richard Smith 2016-12-21 00:46:53 PST
If the people on the llvm side are OK with that, it's fine by me.
Comment 12 Renato Golin 2016-12-21 06:17:48 PST
(In reply to comment #10)
> I was thinking of adding a bit to @llvm.objectsize that controls how it
> handles NULL, similar to how it has the min/max bit already. If we make the
> default "null is 0 bytes", I'd imagine that should be fairly painless.

I recommend you send this as an RFC to the LLVM dev list before trying to implement it (unless it's really quick and serves as a demonstration). Changing core IR builtins should be done with a larger consensus.
Comment 13 George Burgess 2017-03-22 14:07:58 PDT
Fixed by r298430+r298431.

Thanks!