MCLinker - the final toolchain frontier 
           Jörg Sonnenberger
          
            joerg@NetBSD.org
          
           Naples, April 06, 2013 
           BSD Day 2013 
        
        
           Overview 
          
            -  Introduction 
-  Architecture 
-  Performance 
-  Implementation status 
-  Future work 
 Introduction 
          
            -  Machine Code Linker complements the MC layer of LLVM 
-  Created by Luba Tang from MediaTek in 2011 
-  Uses same BSD-license as LLVM 
 Architecture: High-level view 
          
            -  Build input tree 
-  Build fragment reference graph 
-  Layout sections, relocate and write output 
-  GNU ld: all three steps intermixed 
-  gold: merges the first two phases 
 Build the input tree 
            
               -  Goal: High-level intermediate representation 
-  Based on command line 
-  ...and file system content 
-  Deals with positional arguments (--start-group, --as-needed) 
-  Nesting: linker archives contain objects 
-  Typed objects: object files, linker archives, shared libraries 
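A minimal sketch of how a command line could become such a tree. The class names `Input` and `Group` and the helper `classify` are hypothetical. Classification by file suffix is a simplification, since a real linker inspects file contents. Archive members are not expanded here, and `--as-needed` is simply recorded on every input that follows it.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Input:
    path: str
    kind: str              # "object", "archive" or "shared"
    as_needed: bool = False

@dataclass
class Group:
    members: List[Union[Input, "Group"]] = field(default_factory=list)

def classify(path: str) -> str:
    # Hypothetical shortcut: decide by suffix instead of reading the file header.
    if path.endswith(".a"):
        return "archive"
    if path.endswith(".so") or ".so." in path:
        return "shared"
    return "object"

def build_input_tree(args: List[str]) -> Group:
    root = Group()
    stack = [root]         # innermost open group is on top
    as_needed = False      # positional state: applies to the inputs that follow
    for arg in args:
        if arg == "--start-group":
            group = Group()
            stack[-1].members.append(group)
            stack.append(group)
        elif arg == "--end-group":
            if len(stack) == 1:
                raise ValueError("--end-group without --start-group")
            stack.pop()
        elif arg == "--as-needed":
            as_needed = True
        elif arg == "--no-as-needed":
            as_needed = False
        else:
            stack[-1].members.append(Input(arg, classify(arg), as_needed))
    if len(stack) != 1:
        raise ValueError("unterminated --start-group")
    return root
```

For example, `build_input_tree(["crt1.o", "--as-needed", "--start-group", "libc.a", "libgcc.a", "--end-group", "libm.so"])` yields an object file, a nested group of two archives, and a shared library flagged as-needed.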
 Build fragment reference graph 
            
              -  Goal: symbol resolution 
-  Build a graph with sections as nodes, symbol references as edges 
-  Traverse input tree and look for files 
-  ...which are explicitly requested OR provide a missing definition 
-  ...process sections and symbol table 
-  Linker groups: use a stack, push when hitting --start-group 
-  ...rescan from the group start as long as new undefined references occur 
-  Optimize for cache locality 
-  Place symbol attributes and the initial part of the name in the same cache line 
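The group rescan rule can be sketched as follows, assuming a hypothetical model where each archive member is a tuple of (name, symbols it defines, symbols it references) and member names are unique. Only a single group is handled; the stack for nested groups is left out.

```python
from typing import List, Set, Tuple

# Hypothetical archive member: (name, symbols it defines, symbols it references).
Member = Tuple[str, Set[str], Set[str]]

def resolve_group(undefined: Set[str], defined: Set[str],
                  archives: List[List[Member]]) -> Tuple[List[str], Set[str]]:
    """Load archive members of one --start-group ... --end-group.

    A member is loaded only if it provides a currently missing definition.
    The whole group is rescanned from its start as long as a pass loaded
    something, because loaded members can add new undefined references.
    """
    undefined, defined = set(undefined), set(defined)  # work on copies
    loaded: List[str] = []
    changed = True
    while changed:
        changed = False
        for archive in archives:
            for name, defs, refs in archive:
                if name in loaded or not (defs & undefined):
                    continue
                loaded.append(name)
                defined |= defs
                undefined -= defs
                undefined |= refs - defined
                changed = True
    return loaded, undefined
```

With `A = [("a1.o", {"foo"}, {"bar"}), ("a2.o", {"baz"}, set())]` and `B = [("b1.o", {"bar"}, {"baz"})]`, resolving `{"foo"}` loads `a1.o`, `b1.o` and then, on the second pass, `a2.o`. A single left-to-right pass would have left `baz` undefined.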
 Layout sections 
            
              -  Goal: decide section order and final positions 
-  Merge sections with the same name, including subsections 
-  Drop redundant or unused sections 
-  Finalize symbol values 
-  Advantage of late layout: avoids recomputations 
-  Single pass for ordering and address assignment 
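A sketch of combined ordering and address assignment in one pass. The names `InputSection`, `output_name` and `layout` are hypothetical. The subsection rule (".text.foo" merges into ".text") mirrors common GNU ld defaults for a few prefixes but is an assumption here, and alignments are assumed to be powers of two.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class InputSection:
    name: str
    size: int
    align: int  # assumed to be a power of two

def output_name(name: str) -> str:
    # Assumed rule: a subsection such as ".text.foo" is merged into ".text".
    for prefix in (".text", ".rodata", ".data", ".bss"):
        if name == prefix or name.startswith(prefix + "."):
            return prefix
    return name

def align_up(value: int, align: int) -> int:
    return (value + align - 1) & ~(align - 1)

def layout(sections: List[InputSection], base: int) -> Dict[str, Tuple[int, int]]:
    """Group input sections by output section, then assign addresses in one pass.

    Returns {output section: (start address, size)} in first-seen order.
    """
    groups: Dict[str, List[InputSection]] = {}
    for sec in sections:
        groups.setdefault(output_name(sec.name), []).append(sec)
    result: Dict[str, Tuple[int, int]] = {}
    addr = base
    for out_name, members in groups.items():
        start = align_up(addr, max(s.align for s in members))
        addr = start
        for s in members:
            addr = align_up(addr, s.align) + s.size
        result[out_name] = (start, addr - start)
    return result
```

Once every input section has an address, a symbol's final value is its section address plus its offset, so no later recomputation is needed.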
 Compute relocations 
            
              -  Apply finalized symbol values to relocations 
-  Decide which relocations are known at link time 
-  ...and which are left for the run time linker 
-  ...or whether they can be replaced by cheaper versions 
-  Constant tables vs limited immediate encoding 
-  Global dynamic vs initial exec TLS model 
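The two decisions above can be sketched as small functions. Both are simplified assumptions, not MCLinker's code: `classify_absolute_reloc` ignores copy relocations and PLT entries, and `relax_tls` follows the usual ELF TLS relaxation rules for executables (global dynamic to initial exec or local exec, local dynamic to local exec, initial exec to local exec).

```python
from enum import Enum

class Action(Enum):
    STATIC = "resolved at link time"
    RELATIVE = "dynamic relative relocation"
    SYMBOLIC = "dynamic symbol relocation"

def classify_absolute_reloc(output_is_pic: bool, symbol_preemptible: bool) -> Action:
    """Simplified decision for an absolute address relocation."""
    if symbol_preemptible:
        # The definition may come from another module: only ld.so knows it.
        return Action.SYMBOLIC
    if output_is_pic:
        # The symbol binds locally, but the load base is unknown until run time.
        return Action.RELATIVE
    # Fixed-address output and local definition: write the final value now.
    return Action.STATIC

def relax_tls(model: str, output_is_executable: bool, defined_in_output: bool) -> str:
    """Return the cheapest TLS access model the linker may rewrite to."""
    if not output_is_executable:
        return model  # shared objects keep the compiler's choice
    if model == "local-dynamic":
        return "local-exec"
    if model == "global-dynamic":
        return "local-exec" if defined_in_output else "initial-exec"
    if model == "initial-exec" and defined_in_output:
        return "local-exec"
    return model
```

The cheaper model avoids a call to `__tls_get_addr`, which is why these relaxations are worth doing at link time.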
 Write output 
            
              -  Goal: write final binary 
-  Apply relocations to input sections 
-  Write resulting sections/segments 
-  Mix in metadata 
-  Use memory mapped files if possible 
-  ...helps the translation lookaside buffer (TLB) 
-  ...improves page locality 
-  ...helps filesystem cache 
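A minimal sketch of writing the output image through a memory mapping, with a plain seek/write fallback for the "if possible" case. The function name `write_output` and the (file offset, bytes) chunk format are hypothetical; only Python's standard `mmap` module is used.

```python
import mmap
from typing import List, Tuple

def write_output(path: str, size: int, chunks: List[Tuple[int, bytes]]) -> None:
    """Write (file offset, bytes) chunks into a new output file of `size` bytes.

    Uses a shared memory mapping when possible and falls back to seek/write.
    Bytes not covered by any chunk are left as zeros.
    """
    if size <= 0:
        raise ValueError("output size must be positive")
    for offset, data in chunks:
        if offset < 0 or offset + len(data) > size:
            raise ValueError("chunk lies outside the output file")
    with open(path, "w+b") as f:
        f.truncate(size)  # a mapping cannot grow the file, so size it first
        try:
            m = mmap.mmap(f.fileno(), size)
        except (OSError, ValueError):
            m = None  # e.g. a file system that cannot be mapped
        if m is None:
            for offset, data in chunks:
                f.seek(offset)
                f.write(data)
            return
        with m:
            for offset, data in chunks:
                m[offset:offset + len(data)] = data
            m.flush()  # push dirty pages to the file before closing
```

Writing through the mapping lets relocated section contents go straight into page-cache pages of the output file, without an extra copy through a user-space buffer.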
 Performance: Time and memory use 

                 | Binary | Metric | GNU ld | gold | MCLinker | 
                 | llvm-tblgen | Run time | 0.10s | 0.04s | 0.05s | 
                 | llvm-tblgen | Peak RSS | 17,700KB | 17,528KB | 17,508KB | 
                 | clang | Run time | 1.41s | 0.44s | 0.69s | 
                 | clang | Peak RSS | 150MB | 182MB | 176MB | 
 Output size 

                 | Binary | Segment | GNU ld | gold | MCLinker | 
                 | llvm-tblgen | text | 1,828KB | 1,786KB | 2,124KB | 
                 | llvm-tblgen | data | 2,664 | 2,520 | 2,408 | 
                 | llvm-tblgen | bss | 5,912 | 2,520 | 5,360 | 
                 | clang | text | 26.9MB | 26.7MB | 34.3MB | 
                 | clang | data | 22,112 | 22,112 | 21,984 | 
                 | clang | bss | 47,736 | 47,704 | 47,624 | 

               -  MCLinker behaves as if --export-dynamic were given 
-  Text size difference comes from .rodata and .dynstr 
 Linking GCC's cc1 

                 | Metric | GNU ld | MCLinker | 
                 | Run time | 0.20s | 0.16s | 
                 | Peak RSS | 47,888KB | 51,752KB | 
                 | Code size | 8,618KB | 8,178KB | 
                 | Data size | 1,154KB | 1,154KB (+48B) | 

 Implementation status: MI (machine independent) 
            
              -  Most basic ELF functionality works:
    -  Static/dynamic linkage 
-  Partial linking 
-  Visibility and binding rules 
-  DT_NEEDED not honoured yet 
 
 i386 and amd64 
            
              -  build.sh release works 
-  ...using a fallback to GNU ld for parts depending on linker scripts 
-  TLS support incomplete: relaxation tests fail 
 ARM 
            
              -  build.sh release builds 
-  ...using a few more hacks than X86 
-  ...parts of libc.so don't work when optimized 
-  ...analysis still in progress 
-  TLS support incomplete 
-  ARM ELF header flags problematic 
-  Optional system linker for Android 
-  No support for AArch64 
 MIPS 
            
              -  Used by Android/MIPS 
-  NetBSD untested (yet) 
-  No support for N64 or O64 
 Future work 
          
            -  Extensive testsuite 
-  Symbol versioning 
-  Linker scripts 
-  LTO 
-  Research: fine-grained layout on a per-function basis 
-  EH table optimisations 
-  Platform work:
    -  To-be-completed: X86 (i386 and amd64), ARM and MIPS support 
-  Work-in-progress: X32, MIPS64, Hexagon 
-  Not-started-yet: AArch64 
 
 Corporate supporters 
          
            -  MediaTek 
-  Google 
-  Intel 
-  MIPS 
-  Qualcomm