Explains how compilers translate high-level language source code (like code written in Python) into low-level machine code that the computer can execute, helping readers learn how to write source code that compiles into the best possible machine code.
In the beginning, most software was written in assembly, the CPU's low-level language, in order to achieve acceptable performance on relatively slow hardware. Early programmers were sparing in their use of high-level language code, knowing that a high-level language compiler would generate crummy, low-level machine code for their software. Today, however, many programmers write in high-level languages like Python, C/C++/C#, Java, and Swift. The result is often sloppy, inefficient code.
But you don't need to give up the productivity and portability of high-level languages in order to produce more efficient software.
In this second volume of the Write Great Code series, you'll learn:
• How to analyze the output of a compiler to verify that your code does, indeed, generate good machine code
• The types of machine code statements that compilers typically generate for common control structures, so you can choose the best statements when writing high-level language (HLL) code
• Just enough 80x86 and PowerPC assembly language to read compiler output
• How compilers convert various constant and variable objects into machine data, and how to use these objects to write faster and shorter programs
NEW TO THIS EDITION, COVERAGE OF:
• Programming languages like Swift and Java
• Code generation on modern 64-bit CPUs
• ARM processors on mobile phones and tablets
• Stack-based architectures like the Java Virtual Machine
• Modern language systems like the Microsoft Common Language Runtime
With an understanding of how compilers work, you'll be able to write source code that they can translate into elegant machine code. That understanding starts right here, with Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level.
Author(s): Randall Hyde
Edition: 2
Publisher: No Starch Press
Year: 2020
Language: English
Pages: 656
City: San Francisco, CA
Tags: Programming; Best Practices; Assembly Language; Compilers; Programming Languages
Contents In Detail
Acknowledgments
Introduction
Performance Characteristics of Great Code
The Goal of This Book
Chapter Organization
Assumptions and Prerequisites
The Environment for This Book
For More Information
Chapter 1: Thinking Low-Level, Writing High-Level
Misconceptions About Compiler Quality
Why Learning Assembly Language Is Still a Good Idea
Why Learning Assembly Language Isn’t Absolutely Necessary
Thinking Low-Level
Compilers Are Only as Good as the Source Code You Feed Them
How to Help the Compiler Produce Better Machine Code
How to Think in Assembly While Writing HLL Code
Writing High-Level
Language-Neutral Approach
Additional Tips
For More Information
Chapter 2: Shouldn’t You Learn Assembly Language?
Benefits and Roadblocks to Learning Assembly Language
How This Book Can Help
High-Level Assemblers to the Rescue
High-Level Assembly Language
Thinking High-Level, Writing Low-Level
The Assembly Programming Paradigm (Thinking Low-Level)
For More Information
Chapter 3: 80x86 Assembly for the HLL Programmer
Learning One Assembly Language Is Good, Learning More Is Better
80x86 Assembly Syntaxes
Basic 80x86 Architecture
Registers
80x86 32-Bit General-Purpose Registers
The 80x86 EFLAGS Register
Literal Constants
Binary Literal Constants
Decimal Literal Constants
Hexadecimal Literal Constants
Character and String Literal Constants
Floating-Point Literal Constants
Manifest (Symbolic) Constants in Assembly Language
Manifest Constants in HLA
Manifest Constants in Gas
Manifest Constants in MASM
80x86 Addressing Modes
80x86 Register Addressing Modes
Immediate Addressing Mode
Displacement-Only Memory Addressing Mode
RIP-Relative Addressing Mode
Register Indirect Addressing Mode
Indexed Addressing Mode
Scaled-Index Addressing Modes
Declaring Data in Assembly Language
Data Declarations in HLA
Data Declarations in MASM
Data Declarations in Gas
Specifying Operand Sizes in Assembly Language
Type Coercion in HLA
Type Coercion in MASM
Type Coercion in Gas
For More Information
Chapter 4: Compiler Operation and Code Generation
File Types That Programming Languages Use
Source Files
Tokenized Source Files
Specialized Source Files
Types of Computer Language Processors
Pure Interpreters
Interpreters
Compilers
Incremental Compilers
The Translation Process
Scanning (Lexical Analysis)
Parsing (Syntax Analysis)
Intermediate Code Generation
Optimization
Compiler Benchmarking
Native Code Generation
Compiler Output
Emitting HLL Code as Compiler Output
Emitting Assembly Language as Compiler Output
Emitting Object Files as Compiler Output
Emitting Executable Files as Compiler Output
Object File Formats
The COFF File Header
The COFF Optional Header
COFF Section Headers
COFF Sections
The Relocation Section
Debugging and Symbolic Information
Executable File Formats
Pages, Segments, and File Size
Internal Fragmentation
Reasons to Optimize for Space
Data and Code Alignment in an Object File
Choosing a Section Alignment Size
Combining Sections
Controlling the Section Alignment
Aligning Sections Within Library Modules
How Linkers Affect Code
For More Information
Chapter 5: Tools for Analyzing Compiler Output
Background
Telling a Compiler to Produce Assembly Output
Assembly Output from GNU Compilers
Assembly Output from Visual C++
Example Assembly Language Output
Assembly Output Analysis
Using Object Code Utilities to Analyze Compiler Output
The Microsoft dumpbin.exe Utility
The FSF/GNU objdump Utility
Using a Disassembler to Analyze Compiler Output
Using the Java Bytecode Disassembler to Analyze Java Output
Using the IL Disassembler to Analyze Microsoft C# and Visual Basic Output
Using a Debugger to Analyze Compiler Output
Using an IDE’s Debugger
Using a Stand-Alone Debugger
Comparing Output from Two Compilations
Before-and-After Comparisons with diff
For More Information
Chapter 6: Constants and High-Level Languages
Literal Constants and Program Efficiency
Binding Times
Literal Constants vs. Manifest Constants
Constant Expressions
Manifest Constants vs. Read-Only Memory Objects
Swift let Statements
Enumerated Types
Boolean Constants
Floating-Point Constants
String Constants
Composite Data Type Constants
Constants Don’t Change
For More Information
Chapter 7: Variables in a High-Level Language
Runtime Memory Organization
The Code, Constant, and Read-Only Sections
The Static Variables Section
The Storage Variables Section
The Stack Section
The Heap Section and Dynamic Memory Allocation
What Is a Variable?
Attributes
Binding
Static Objects
Dynamic Objects
Scope
Lifetime
Variable Definition
Variable Storage
Static Binding and Static Variables
Pseudo-Static Binding and Automatic Variables
Dynamic Binding and Dynamic Variables
Common Primitive Data Types
Integer Variables
Floating-Point/Real Variables
Character Variables
Boolean Variables
Variable Addresses and High-Level Languages
Allocating Storage for Global and Static Variables
Using Automatic Variables to Reduce Offset Sizes
Allocating Storage for Intermediate Variables
Allocating Storage for Dynamic Variables and Pointers
Using Records/Structures to Reduce Instruction Offset Sizes
Storing Variables in Machine Registers
Variable Alignment in Memory
Records and Alignment
For More Information
Chapter 8: Array Data Types
Arrays
Array Declarations
Array Representation in Memory
Swift Array Implementation
Accessing Elements of an Array
Padding vs. Packing
Multidimensional Arrays
Dynamic vs. Static Arrays
For More Information
Chapter 9: Pointer Data Types
The Definition of a Pointer
Pointer Implementation in High-Level Languages
Pointers and Dynamic Memory Allocation
Pointer Operations and Pointer Arithmetic
Adding an Integer to a Pointer
Subtracting an Integer from a Pointer
Subtracting a Pointer from a Pointer
Comparing Pointers
Using Logical AND/OR Operations with Pointers
Using Other Operations with Pointers
A Simple Memory Allocator Example
Garbage Collection
The OS and Memory Allocation
Heap Memory Overhead
Common Pointer Problems
Using an Uninitialized Pointer
Using a Pointer That Contains an Illegal Value
Continuing to Use Storage After It Has Been Freed
Failing to Free Storage After Using It
Accessing Indirect Data Using the Wrong Data Type
Performing Illegal Operations on Pointers
Pointers in Modern Languages
Managed Pointers
For More Information
Chapter 10: String Data Types
Character String Formats
Zero-Terminated Strings
Length-Prefixed Strings
Seven-Bit Strings
HLA Strings
Descriptor-Based Strings
Static, Pseudo-Dynamic, and Dynamic Strings
Static Strings
Pseudo-Dynamic Strings
Dynamic Strings
Reference Counting for Strings
Delphi Strings
Using Strings in a High-Level Language
Unicode Character Data in Strings
The Unicode Character Set
Unicode Code Points
Unicode Code Planes
Surrogate Code Points
Glyphs, Characters, and Grapheme Clusters
Unicode Normals and Canonical Equivalence
Unicode Encodings
Unicode Combining Characters
Unicode String Functions and Performance
For More Information
Chapter 11: Record, Union, and Class Data Types
Records
Declaring Records in Various Languages
Instantiating a Record
Initializing Record Data at Compile Time
Storing Records in Memory
Using Records to Improve Memory Performance
Working with Dynamic Record Types and Databases
Discriminant Unions
Declaring Unions in Various Languages
Storing Unions in Memory
Using Unions in Other Ways
Variant Types
Namespaces
Classes and Objects
Classes vs. Objects
Simple Class Declarations in C++
Class Declarations in C# and Java
Class Declarations in Delphi (Object Pascal)
Class Declarations in HLA
Virtual Method Tables
Abstract Methods
Sharing VMTs
Inheritance in Classes
Polymorphism in Classes
Multiple Inheritance (in C++)
Protocols and Interfaces
Classes, Objects, and Performance
For More Information
Chapter 12: Arithmetic and Logical Expressions
Arithmetic Expressions and Computer Architecture
Stack-Based Machines
Accumulator-Based Machines
Register-Based Machines
Typical Forms of Arithmetic Expressions
Three-Address Architectures
Two-Address Architectures
Architectural Differences and Your Code
Complex Expressions
Optimization of Arithmetic Statements
Constant Folding
Constant Propagation
Dead Code Elimination
Common Subexpression Elimination
Strength Reduction
Induction
Loop Invariants
Optimizers and Programmers
Side Effects in Arithmetic Expressions
Containing Side Effects: Sequence Points
Avoiding Problems Caused by Side Effects
Forcing a Particular Order of Evaluation
Short-Circuit Evaluation
Using Short-Circuit Evaluation with Boolean Expressions
Forcing Short-Circuit or Complete Boolean Evaluation
Comparing Short-Circuit and Complete Boolean Evaluation Efficiency
The Relative Cost of Arithmetic Operations
For More Information
Chapter 13: Control Structures and Programmatic Decisions
How Control Structures Affect a Program’s Efficiency
Introduction to Low-Level Control Structures
The goto Statement
Restricted Forms of the goto Statement
The if Statement
Improving the Efficiency of Certain if/else Statements
Forcing Complete Boolean Evaluation in an if Statement
Forcing Short-Circuit Evaluation in an if Statement
The switch/case Statement
Semantics of a switch/case Statement
Jump Tables vs. Chained Comparisons
Other Implementations of switch/case
The Swift switch Statement
Compiler Output for switch Statements
For More Information
Chapter 14: Iterative Control Structures
The while Loop
Forcing Complete Boolean Evaluation in a while Loop
Forcing Short-Circuit Boolean Evaluation in a while Loop
The repeat..until (do..until/do..while) Loop
Forcing Complete Boolean Evaluation in a repeat..until Loop
Forcing Short-Circuit Boolean Evaluation in a repeat..until Loop
The forever..endfor Loop
Forcing Complete Boolean Evaluation in a forever Loop
Forcing Short-Circuit Boolean Evaluation in a forever Loop
The Definite Loop (for Loops)
For More Information
Chapter 15: Functions and Procedures
Simple Function and Procedure Calls
Return Address Storage
Other Sources of Overhead
Leaf Functions and Procedures
Macros and Inline Functions
Passing Parameters to a Function or Procedure
Activation Records and the Stack
Breaking Down the Activation Record
Assigning Offsets to Local Variables
Associating Offsets with Parameters
Accessing Parameters and Local Variables
Registers to the Rescue
Java VM and Microsoft CLR Parameters and Locals
Parameter-Passing Mechanisms
Pass-by-Value
Pass-by-Reference
Function Return Values
For More Information
Afterword: Engineering Software
Glossary
Online Appendixes
Index