LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 46299 - objcopy zero-size section, huge binaries
Summary: objcopy zero-size section, huge binaries
Status: NEW
Alias: None
Product: tools
Classification: Unclassified
Component: llvm-objcopy/strip
Version: 10.0
Hardware: Other other
Importance: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-12 03:54 PDT by Vincent Hamp
Modified: 2020-06-17 01:36 PDT
CC List: 8 users

See Also:
Fixed By Commit(s):


Attachments
ELF (94.33 KB, application/x-executable)
2020-06-12 03:55 PDT, Vincent Hamp
ELF (fixed) (91.73 KB, application/x-executable)
2020-06-15 05:18 PDT, Vincent Hamp

Description Vincent Hamp 2020-06-12 03:54:06 PDT
I'm expecting objcopy to create a binary from the attached ELF like this:
llvm-objcopy A.elf -O binary A.bin

Running size tells me that the binary should be 824 bytes, yet the file I get is 384 MB.

Using readelf -e A.elf to inspect the section headers, I can see a suspicious NULL section at the very beginning that is completely empty. Could this be the reason the binary gets so bloated?

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .vector_table     PROGBITS        08000000 001000 000010 00   A  0   0  4
  [ 2] .version          PROGBITS        08000010 001010 000010 00   A  0   0  1
  [ 3] .text             PROGBITS        08000020 001020 000308 00  AX  0   0  4
  [ 4] .rodata           PROGBITS        08000328 001328 000000 00  AX  0   0  1
  [ 5] .ARM.exidx        ARM_EXIDX       08000328 001328 000010 00  AL  3   0  4
  [ 6] .preinit_array    PROGBITS        08000338 001338 000000 00   A  0   0  1
  [ 7] .init_array       INIT_ARRAY      08000338 001338 000004 04  WA  0   0  4
  [ 8] .fini_array       FINI_ARRAY      0800033c 00133c 000004 04  WA  0   0  4
  [ 9] .data             PROGBITS        20000000 002000 000000 00  WA  0   0  1
  [10] .data2            PROGBITS        10000000 002000 000000 00  WA  0   0  1
  [11] .bss              NOBITS          20000000 002000 0001ac 00  WA  0   0 512
  [12] ._user_heap_stack PROGBITS        200001ac 002000 000e04 00  WA  0   0  1
  [13] .ARM.attributes   ARM_ATTRIBUTES  00000000 002e04 000049 00      0   0  1
  [14] .debug_str        PROGBITS        00000000 002e4d 004795 01  MS  0   0  1
  [15] .debug_loc        PROGBITS        00000000 0075e2 001709 00      0   0  1
  [16] .debug_abbrev     PROGBITS        00000000 008ceb 000d43 00      0   0  1
  [17] .debug_info       PROGBITS        00000000 009a2e 00a9f9 00      0   0  1
  [18] .debug_ranges     PROGBITS        00000000 014427 000148 00      0   0  1
  [19] .comment          PROGBITS        00000000 01456f 00002a 01  MS  0   0  1
  [20] .debug_frame      PROGBITS        00000000 01459c 00065c 00      0   0  4
  [21] .debug_line       PROGBITS        00000000 014bf8 0012bb 00      0   0  1
  [22] .symtab           SYMTAB          00000000 015eb4 000690 10     24  60  4
  [23] .shstrtab         STRTAB          00000000 016544 000106 00      0   0  1
  [24] .strtab           STRTAB          00000000 01664a 0004b5 00      0   0  1
Comment 1 Vincent Hamp 2020-06-12 03:55:04 PDT
Created attachment 23606
ELF
Comment 2 James Henderson 2020-06-15 04:37:37 PDT
Hi Vincent,

Just to let you know, the NULL section header is entirely normal. ELF requires there to be a single NULL section header at the start of the section header table, and it is occasionally used for special metadata.

Did you get 824 bytes when you ran the command using GNU objcopy? I get the following output using GNU and LLVM size:

   text    data     bss     dec     hex filename
    868    3596     428    4892    131c A.elf

I also note, however, that the section header dump of the attached ELF doesn't look quite the same as the one you posted:

---

There are 26 section headers, starting at offset 0x17544:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .vector_table     PROGBITS        08000000 001000 000010 00   A  0   0  4
  [ 2] .version          PROGBITS        08000010 001010 000010 00   A  0   0  1
  [ 3] .text             PROGBITS        08000020 001020 000334 00  AX  0   0  4
  [ 4] .rodata           PROGBITS        08000354 001354 000000 00  AX  0   0  1
  [ 5] .ARM.exidx        ARM_EXIDX       08000354 001354 000010 00  AL  3   0  4
  [ 6] .preinit_array    PROGBITS        08000364 001364 000000 00   A  0   0  1
  [ 7] .init_array       INIT_ARRAY      08000364 001364 000004 04  WA  0   0  4
  [ 8] .fini_array       FINI_ARRAY      08000368 001368 000004 04  WA  0   0  4
  [ 9] .data             PROGBITS        20000000 002000 000000 00  WA  0   0  1
  [10] .data2            PROGBITS        10000000 002000 000000 00  WA  0   0  1
  [11] .bss              NOBITS          20000000 002000 0001ac 00  WA  0   0 512
  [12] .bss2             PROGBITS        200001ac 0021ac 000000 00  WA  0   0  1
  [13] ._user_heap_stack PROGBITS        200001ac 0021ac 000e04 00  WA  0   0  1
  [14] .ARM.attributes   ARM_ATTRIBUTES  00000000 002fb0 000049 00      0   0  1
  [15] .debug_str        PROGBITS        00000000 002ff9 004795 01  MS  0   0  1
  [16] .debug_loc        PROGBITS        00000000 00778e 001bc1 00      0   0  1
  [17] .debug_abbrev     PROGBITS        00000000 00934f 000d4a 00      0   0  1
  [18] .debug_info       PROGBITS        00000000 00a099 00ad8d 00      0   0  1
  [19] .debug_ranges     PROGBITS        00000000 014e26 000148 00      0   0  1
  [20] .comment          PROGBITS        00000000 014f6e 00002a 01  MS  0   0  1
  [21] .debug_frame      PROGBITS        00000000 014f98 00065c 00      0   0  4
  [22] .debug_line       PROGBITS        00000000 0155f4 0012cf 00      0   0  1
  [23] .symtab           SYMTAB          00000000 0168c4 0006b0 10     25  60  4
  [24] .shstrtab         STRTAB          00000000 016f74 00010c 00      0   0  1
  [25] .strtab           STRTAB          00000000 017080 0004c3 00      0   0  1

---

I also took a look at the attached ELF, and it looks slightly odd to me. I suspect, though I don't know for sure, that there's something wrong with your assembly or possibly your linker script. I dumped the program headers, and here is the result:

---

There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x001000 0x08000000 0x08000000 0x00364 0x00364 R E 0x1000
  LOAD           0x001364 0x08000364 0x08000364 0x00008 0x00008 RW  0x1000
  LOAD           0x002000 0x20000000 0x1800036c 0x00fb0 0x00fb0 RW  0x1000
  GNU_RELRO      0x001364 0x08000364 0x08000364 0x00008 0x00c9c R   0x1
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x0
  EXIDX          0x001354 0x08000354 0x08000354 0x00010 0x00010 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00     .vector_table .version .text .rodata .ARM.exidx
   01     .preinit_array .init_array .fini_array
   02     .data .bss .bss2 ._user_heap_stack
   03     .preinit_array .init_array .fini_array
   04
   05     .rodata .ARM.exidx
   None   .data2 .ARM.attributes .debug_str .debug_loc .debug_abbrev .debug_info .debug_ranges .comment .debug_frame .debug_line .symtab .shstrtab .strtab

---

Things I noticed from this and the section header dumps:
1) You appear to have a PROGBITS ._user_heap_stack section following the .bss section. This causes the .bss section to be allocated file space in the segment, since the later section cannot be represented otherwise.
2) The .bss2 section in the attachment appears to be PROGBITS too, which suggests you have created this section with the wrong flags. This may also be the mistake with ._user_heap_stack.
3) The address of .data2 goes backwards. This is probably harmless in itself, but might indicate another problem somewhere.
4) As far as I know, the file size of a binary output will be the difference between the start address of the first non-NOBITS allocatable section (in this case .vector_table) and the end address of the last one (in this case ._user_heap_stack). That gives a required size of about 384 MB.
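Point 4 can be checked with a little arithmetic. The Python sketch below is a simplified model, not llvm-objcopy's actual algorithm: it assumes each section's sh_addr (the Addr column from the dump in the description) is also its load address, and it takes the size of an -O binary image to be the span from the lowest start to the highest end of the allocatable, non-NOBITS sections. Whether zero-size sections count is left as a hypothetical skip_empty parameter, because this thread does not establish how the real tool treats them. The Section class and binary_span function are made-up names for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Section:
    """Hypothetical, minimal view of one row of `readelf -S` output."""
    name: str
    sh_type: str   # e.g. "PROGBITS", "NOBITS"
    addr: int      # sh_addr; assumed here to equal the load address
    size: int
    alloc: bool    # True if the "A" (SHF_ALLOC) flag is set


def binary_span(sections, skip_empty=True):
    """Bytes from the lowest start to the highest end of the sections that
    carry file contents into an -O binary image (simplified model)."""
    contributing = [
        s for s in sections
        if s.alloc and s.sh_type != "NOBITS" and (s.size > 0 or not skip_empty)
    ]
    if not contributing:
        return 0
    start = min(s.addr for s in contributing)
    end = max(s.addr + s.size for s in contributing)
    return end - start


# Allocatable sections from the dump in the description, plus one
# non-allocatable section to show that it is ignored.
SECTIONS = [
    Section(".vector_table", "PROGBITS", 0x08000000, 0x010, True),
    Section(".version", "PROGBITS", 0x08000010, 0x010, True),
    Section(".text", "PROGBITS", 0x08000020, 0x308, True),
    Section(".rodata", "PROGBITS", 0x08000328, 0x000, True),
    Section(".ARM.exidx", "ARM_EXIDX", 0x08000328, 0x010, True),
    Section(".preinit_array", "PROGBITS", 0x08000338, 0x000, True),
    Section(".init_array", "INIT_ARRAY", 0x08000338, 0x004, True),
    Section(".fini_array", "FINI_ARRAY", 0x0800033c, 0x004, True),
    Section(".data", "PROGBITS", 0x20000000, 0x000, True),
    Section(".data2", "PROGBITS", 0x10000000, 0x000, True),
    Section(".bss", "NOBITS", 0x20000000, 0x1ac, True),
    Section("._user_heap_stack", "PROGBITS", 0x200001ac, 0xe04, True),
    Section(".ARM.attributes", "ARM_ATTRIBUTES", 0x00000000, 0x049, False),
]

# Same layout, but with ._user_heap_stack emitted as NOBITS.
FIXED = [
    replace(s, sh_type="NOBITS") if s.name == "._user_heap_stack" else s
    for s in SECTIONS
]

print(binary_span(SECTIONS))                  # 402657200 (about 384 MiB)
print(binary_span(FIXED))                     # 832
print(binary_span(FIXED, skip_empty=False))   # 402653184 (exactly 384 MiB)
```

Under this model, the PROGBITS ._user_heap_stack ending at 0x20000fb0 stretches the image to 0x18000fb0 bytes, in line with the reported 384 MB. As NOBITS it would drop to 832 bytes, unless empty sections such as the zero-size .data at 0x20000000 are also counted.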
Comment 3 Vincent Hamp 2020-06-15 05:18:39 PDT
Created attachment 23616
ELF (fixed)
Comment 4 Vincent Hamp 2020-06-15 05:33:11 PDT
Hello James

Thank you for your fast reply. Apparently I'm an idiot: I must have attached the wrong ELF file, one in which I had been experimenting with some linker script changes. I've recompiled and reattached an ELF file whose dump matches the one I posted 3 days ago.

The newly attached ELF actually contains 24 sections, and .bss2 is no longer present.

._user_heap_stack is still there, though, and I don't really know why it's of type PROGBITS. This section is marked as (NOLOAD) in my linker script like this:

  ._user_heap_stack (NOLOAD) :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
  } >RAM

The .bss2 section in the first ELF file was marked (NOLOAD) too, by the way. Yet it also ended up as type PROGBITS?

I've also tried removing the ._user_heap_stack section from my linker script altogether. This had no effect on the produced binary either; it was still 384 MB.

The address of .data2 indeed goes backwards, but sadly that address comes from my silicon vendor, so there is no changing that.
Comment 5 James Henderson 2020-06-15 05:53:39 PDT
Thanks for the updated ELF, Vincent. Unfortunately, I don't have any more time to look at this today. Regarding the linker behaviour for NOBITS/PROGBITS, maybe George Rimar or Fangrui Song can assist. I'm assuming you're using LLD? They have more knowledge than I do in that area. Fangrui has also done some work on llvm-objcopy in the binary output area recently, so might spot something I've missed or misunderstood.

A simple solution for .data2 is to put it before .data in the linker script. That should change its order in the section header table without impacting the address in this case (I'm assuming it's given a hard-coded address in the linker script).

I probably wasn't clear in my point 4, but I think that point indicates there isn't a bug in llvm-objcopy here. Unfortunately, I don't have an ARM-supporting GNU objcopy to verify with. I assume you do; if so, could you try using it on the output and let me know what size you get, please?
Comment 6 Vincent Hamp 2020-06-15 06:27:55 PDT
Running arm-none-eabi-size on the ELF gives me the following output:
   text	   data	    bss	    dec	    hex	filename
    824	   3596	    428	   4848	   12f0	A.elf

What I also find interesting in this regard is that the ._user_heap_stack section I posted before seems to be counted as "data". The section is empty and, to my knowledge, is only used to produce linker errors in case there isn't enough RAM available to allocate all static objects + minimum heap size + stack size. Generating an ELF with arm-none-eabi-gcc using the very same linker script does not count this section as "data".

GCC's ELF is also missing the NULL section at the very beginning and .ARM.exidx (which are turned off anyhow?), and it shows only two LOAD entries in the program headers:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x010000 0x08000000 0x08000000 0x00158 0x00158 RWE 0x10000
  LOAD           0x000000 0x20000000 0x20000000 0x00000 0x01190 RW  0x10000


I've now also tried removing not only ._user_heap_stack but also .data2 and .bss2... still no change.
Comment 7 Vincent Hamp 2020-06-15 06:32:16 PDT
OK, I checked the ELF again. Sorry about the NULL section thing. You were right that this section is always present; it's also present in an ELF generated by GCC.
Comment 8 James Henderson 2020-06-15 06:44:34 PDT
The size tool only gives an indication of the memory footprint of the sections within a binary. It does not indicate the size of the program segments, which could in theory be larger. Additionally, it is not a good guide to the binary output size, assuming my understanding of binary output is correct (it might not be, as I'm not a user of it). ._user_heap_stack is counted as data presumably because it is (incorrectly) marked as a PROGBITS section. That sounds like a bug in the linker to me, and might be the ultimate cause of the large file you're getting from llvm-objcopy.
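The way a PROGBITS ._user_heap_stack ends up under "data" can be reproduced with a simplified model of the Berkeley-format totals. The rules below are an assumption that approximates what GNU and LLVM size do for ELF: an allocatable section counts as text if it is executable or not writable, as bss if it is NOBITS, and as data otherwise; non-allocatable sections are ignored. Section and berkeley_totals are hypothetical names. Given the flags from the dump in the description, the model yields the same totals as arm-none-eabi-size in comment 6.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Section:
    """Hypothetical, minimal view of one row of `readelf -S` output."""
    name: str
    sh_type: str   # e.g. "PROGBITS", "NOBITS"
    size: int
    flags: str     # readelf flag letters, e.g. "WA", "AX"


def berkeley_totals(sections):
    """Return (text, data, bss) using simplified Berkeley-style rules."""
    text = data = bss = 0
    for s in sections:
        if "A" not in s.flags:
            continue                  # debug info, symbol tables, etc.
        if "X" in s.flags or "W" not in s.flags:
            text += s.size            # code and read-only data
        elif s.sh_type == "NOBITS":
            bss += s.size             # writable, occupies no file space
        else:
            data += s.size            # writable, has file contents
    return text, data, bss


SECTIONS = [
    Section(".vector_table", "PROGBITS", 0x010, "A"),
    Section(".version", "PROGBITS", 0x010, "A"),
    Section(".text", "PROGBITS", 0x308, "AX"),
    Section(".rodata", "PROGBITS", 0x000, "AX"),
    Section(".ARM.exidx", "ARM_EXIDX", 0x010, "AL"),
    Section(".preinit_array", "PROGBITS", 0x000, "A"),
    Section(".init_array", "INIT_ARRAY", 0x004, "WA"),
    Section(".fini_array", "FINI_ARRAY", 0x004, "WA"),
    Section(".data", "PROGBITS", 0x000, "WA"),
    Section(".data2", "PROGBITS", 0x000, "WA"),
    Section(".bss", "NOBITS", 0x1ac, "WA"),
    Section("._user_heap_stack", "PROGBITS", 0xe04, "WA"),
    Section(".debug_str", "PROGBITS", 0x4795, "MS"),
]

# The same sections, with ._user_heap_stack emitted as NOBITS instead.
AS_NOBITS = [
    replace(s, sh_type="NOBITS") if s.name == "._user_heap_stack" else s
    for s in SECTIONS
]

print(berkeley_totals(SECTIONS))    # (824, 3596, 428)
print(berkeley_totals(AS_NOBITS))   # (824, 8, 4016)
```

Under these rules, the 0xe04 (3588) bytes of ._user_heap_stack move from data to bss once the section is NOBITS, which is what a (NOLOAD) section would be expected to look like.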

> I've now also tried to remove not only ._user_heap_stack but also .data2 and
> .bss2... still, no changes.

By this, do you mean removed from the linker script, from the input, or something else?
Comment 9 George Rimar 2020-06-15 07:00:03 PDT
At first I thought the issue might be caused by the ._user_heap_stack definition having no input sections, but we have a test case that covers that, e.g.:
https://github.com/llvm/llvm-project/blob/master/lld/test/ELF/linkerscript/noload.s

And our handling in LLD looks trivial for such a simple case:
https://github.com/llvm/llvm-project/blob/master/lld/ELF/ScriptParser.cpp#L765

So, to answer the question of why ._user_heap_stack is created as PROGBITS, it would be helpful to have either a small sample or a linker reproduce file (if that is acceptable). A reproduce file can be created with the --reproduce option. It creates a tar archive with all the linker inputs included, which can be used to debug the behavior.
Comment 10 Vincent Hamp 2020-06-15 07:33:32 PDT
By removing sections I meant removing them from the linker script (and from my startup code).

I think I succeeded in creating a linker reproduce file. Sadly the file size limit here does not allow me to attach it directly, so I uploaded it here:
https://higaski.at/repro.tar
Comment 11 George Rimar 2020-06-16 02:38:08 PDT
(In reply to Vincent Hamp from comment #10)
> By removing sections I meant removing them from the linker script (and from
> my startup code).
> 
> I think I succeeded in creating a linker reproduce file. Sadly the file size
> limit here does not allow me to attach it directly, so I uploaded it here:
> https://higaski.at/repro.tar

I've tried the repro provided and the ._user_heap_stack section is SHT_NOBITS for me:

umb@ubuntu:~/tests/200$ readelf -a MSxxx 
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x8000020
  Start of program headers:          52 (bytes into file)
  Start of section headers:          146144 (bytes into file)
  Flags:                             0x5000400, Version5 EABI, hard-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         24
  Section header string table index: 22

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .vector_table     PROGBITS        08000000 010000 000010 00   A  0   0  4
  [ 2] .version          PROGBITS        08000010 010010 000010 00   A  0   0  1
  [ 3] .text             PROGBITS        08000020 010020 0002c8 00  AX  0   0  4
  [ 4] .rodata           PROGBITS        080002e8 0102e8 000000 00  AX  0   0  1
  [ 5] .ARM.exidx        ARM_EXIDX       080002e8 0102e8 000010 00  AL  3   0  4
  [ 6] .preinit_array    PROGBITS        080002f8 0102f8 000000 00   A  0   0  1
  [ 7] .init_array       INIT_ARRAY      080002f8 0102f8 000004 04  WA  0   0  4
  [ 8] .fini_array       FINI_ARRAY      080002fc 0102fc 000004 04  WA  0   0  4
  [ 9] .data             PROGBITS        20000000 010300 000000 00  WA  0   0  1
  [10] .bss              NOBITS          20000000 010300 0001ac 00  WA  0   0 512
  [11] ._user_heap_stack NOBITS          200001ac 010300 000e04 00  WA  0   0  1
  [12] .ARM.attributes   ARM_ATTRIBUTES  00000000 010300 000049 00      0   0  1
  [13] .debug_str        PROGBITS        00000000 010349 004795 01  MS  0   0  1
  [14] .debug_loc        PROGBITS        00000000 014ade 001481 00      0   0  1
  [15] .debug_abbrev     PROGBITS        00000000 015f5f 000d4a 00      0   0  1
  [16] .debug_info       PROGBITS        00000000 016ca9 00a78d 00      0   0  1
  [17] .debug_ranges     PROGBITS        00000000 021436 000148 00      0   0  1
  [18] .comment          PROGBITS        00000000 02157e 00007e 01  MS  0   0  1
  [19] .debug_frame      PROGBITS        00000000 0215fc 00065c 00      0   0  4
  [20] .debug_line       PROGBITS        00000000 021c58 00128b 00      0   0  1
  [21] .symtab           SYMTAB          00000000 022ee4 000660 10     23  60  4
  [22] .shstrtab         STRTAB          00000000 023544 0000ff 00      0   0  1
  [23] .strtab           STRTAB          00000000 023643 00049c 00      0   0  1
Key to Flags:
Comment 12 George Rimar 2020-06-16 02:39:45 PDT
Are you using the latest LLD available? Mine is:

umb@ubuntu:~/tests/200$ ~/LLVM/LLVM/llvm-project/build/bin/ld.lld -v
LLD 11.0.0 (https://github.com/llvm/llvm-project.git 16b7eb6dd1247dbe322061d33636a054d6c954dc) (compatible with GNU linkers)
Comment 13 Vincent Hamp 2020-06-16 03:50:42 PDT
No, I'm using 10.0.0:

[vinci@threadripper ~]$ ld.lld -v
LLD 10.0.0 (compatible with GNU linkers)
Comment 14 George Rimar 2020-06-17 00:48:22 PDT
So, running the following on MSxxx produced by the latest LLD

llvm-objcopy MSxxx -O binary A.bin
(llvm-objcopy is also built from latest sources)

results in a 768-byte output for me. It seems there is no issue; you just have to update your LLD/LLVM.
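The 768 bytes are consistent with the section headers of the LLD 11 output in comment 11. As a rough cross-check, assuming that the Addr column equals the load address and that zero-size and NOBITS sections contribute nothing (assumptions for illustration, not a description of llvm-objcopy's internals), the image runs from .vector_table at 0x08000000 to the end of .fini_array at 0x08000300:

```python
# (name, type, address, size) for the allocatable sections shown in comment 11.
ALLOC_SECTIONS = [
    (".vector_table", "PROGBITS", 0x08000000, 0x010),
    (".version", "PROGBITS", 0x08000010, 0x010),
    (".text", "PROGBITS", 0x08000020, 0x2c8),
    (".rodata", "PROGBITS", 0x080002e8, 0x000),
    (".ARM.exidx", "ARM_EXIDX", 0x080002e8, 0x010),
    (".preinit_array", "PROGBITS", 0x080002f8, 0x000),
    (".init_array", "INIT_ARRAY", 0x080002f8, 0x004),
    (".fini_array", "FINI_ARRAY", 0x080002fc, 0x004),
    (".data", "PROGBITS", 0x20000000, 0x000),
    (".bss", "NOBITS", 0x20000000, 0x1ac),
    ("._user_heap_stack", "NOBITS", 0x200001ac, 0xe04),
]

# Keep only sections that carry file contents (non-NOBITS, non-empty).
contributing = [
    (addr, size) for _name, sh_type, addr, size in ALLOC_SECTIONS
    if sh_type != "NOBITS" and size > 0
]
start = min(addr for addr, _ in contributing)
end = max(addr + size for addr, size in contributing)
image_size = end - start
print(image_size)  # 768
```

Because ._user_heap_stack is NOBITS in this output, under these assumptions nothing at 0x20000000 contributes file contents, so the image stays within flash.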
Comment 15 George Rimar 2020-06-17 01:36:40 PDT
The fix has been backported to 10.0.1: https://bugs.llvm.org/show_bug.cgi?id=46225