Build Your Own Programming Language: A programmer's guide to designing compilers, interpreters, and DSLs for solving modern computing problems

This document was uploaded by one of our users. The uploader already confirmed that they had the permission to publish it. If you are author/publisher or own the copyright of this documents, please report to us by using this DMCA report form.

Simply click on the Download Book button.

Yes, Book downloads on Ebookily are 100% Free.

Sometimes the book is free on Amazon As well, so go ahead and hit "Search on Amazon"

Written by the creator of the Unicon programming language, this book will show you how to implement programming languages to reduce the time and cost of creating applications for new or specialized areas of computing Key Features • Reduce development time and solve pain points in your application domain by building a custom programming language • Learn how to create parsers, code generators, file readers, analyzers, and interpreters • Create an alternative to frameworks and libraries to solve domain-specific problems Book Description The need for different types of computer languages is growing rapidly and developers prefer creating domain-specific languages for solving specific application domain problems. Building your own programming language has its advantages. It can be your antidote to the ever-increasing size and complexity of software. In this book, you'll start with implementing the frontend of a compiler for your language, including a lexical analyzer and parser. The book covers a series of traversals of syntax trees, culminating with code generation for a bytecode virtual machine. Moving ahead, you'll learn how domain-specific language features are often best represented by operators and functions that are built into the language, rather than library functions. We'll conclude with how to implement garbage collection, including reference counting and mark-and-sweep garbage collection. Throughout the book, Dr. Jeffery weaves in his experience of building the Unicon programming language to give better context to the concepts where relevant examples are provided in both Unicon and Java so that you can follow the code of your choice of either a very high-level language with advanced features, or a mainstream language. By the end of this book, you'll be able to build and deploy your own domain-specific languages, capable of compiling and running programs. What you will learn • Perform requirements analysis for the new language and design language syntax and semantics • Write lexical and context-free grammar rules for common expressions and control structures • Develop a scanner that reads source code and generate a parser that checks syntax • Build key data structures in a compiler and use your compiler to build a syntax-coloring code editor • Implement a bytecode interpreter and run bytecode generated by your compiler • Write tree traversals that insert information into the syntax tree • Implement garbage collection in your language Who this book is for This book is for software developers interested in the idea of inventing their own language or developing a domain-specific language. Computer science students taking compiler construction courses will also find this book highly useful as a practical guide to language implementation to supplement more theoretical textbooks. Intermediate-level knowledge and experience working with a high-level language such as Java or the C++ language are expected to help you get the most out of this book.

Author(s): Clinton L. Jeffery
Edition: 1
Publisher: Packt Publishing
Year: 2021

Language: English
Commentary: Publisher's PDF
Pages: 494
City: Birmingham, UK
Tags: Syntactic Analysis; Domain-Specific Languages; Code Generation; Compilers; Lexical Analysis; Translators; Garbage Collection; Syntax Trees

Cover
Copyright
Contributors
Table of Contents
Preface
Section 1: Programming Language Frontends
Chapter 1: Why Build Another Programming Language?
So, you want to write your own programming language…
Types of programming language implementations
Organizing a bytecode language implementation
Languages used in the examples
Language versus library – what's the difference?
Applicability to other software engineering tasks
Establishing the requirements for your language
Case study – requirements that inspired the Unicon language
Unicon requirement #1 – preserve what people love about Icon
Unicon requirement #2 – support large-scale programs working on big data
Unicon requirement #3 – high-level input/output for modern applications
Unicon requirement #4 – provide universally implementable system interfaces
Summary
Questions
Chapter 2: Programming Language Design
Determining the kinds of words and punctuation to provide in your language
Specifying the control flow
Deciding on what kinds of data to support
Atomic types
Composite types
Domain-specific types
Overall program structure
Completing the Jzero language definition
Case study – designing graphics facilities in Unicon
Language support for 2D graphics
Adding support for 3D graphics
Summary
Questions
Chapter 3: Scanning Source Code
Technical requirements
Lexemes, lexical categories, and tokens
Regular expressions
Regular expression rules
Regular expression examples
Using UFlex and JFlex
Header section
Regular expressions section
Writing a simple source code scanner
Running your scanner
Tokens and lexical attributes
Expanding our example to construct tokens
Writing a scanner for Jzero
The Jzero flex specification
Unicon Jzero code
Java Jzero code
Running the Jzero scanner
Regular expressions are not always enough
Summary
Questions
Chapter 4: Parsing
Technical requirements
Analyzing syntax
Understanding context-free grammars
Writing context-free grammar rules
Writing rules for programming constructs
Using iyacc and BYACC/J
Declaring symbols in the header section
Putting together the yacc context-free grammar section
Understanding yacc parsers
Fixing conflicts in yacc parsers
Syntax error recovery
Putting together a toy example
Writing a parser for Jzero
The Jzero lex specification
The Jzero yacc specification
Unicon Jzero code
Java Jzero parser code
Running the Jzero parser
Improving syntax error messages
Adding detail to Unicon syntax error messages
Adding detail to Java syntax error messages
Using Merr to generate better syntax error messages
Summary
Questions
Chapter 5: Syntax Trees
Technical requirements
Using GNU make
Learning about trees
Defining a syntax tree type
Parse trees versus syntax trees
Creating leaves from terminal symbols
Wrapping tokens in leaves
Working with YACC's value stack
Wrapping leaves for the parser's value stack
Determining which leaves you need
Building internal nodes from production rules
Accessing tree nodes on the value stack
Using the tree node factory method
Forming syntax trees for the Jzero language
Debugging and testing your syntax tree
Avoiding common syntax tree bugs
Printing your tree in a text format
Printing your tree using dot
Summary
Questions
Section 2: Syntax Tree Traversals
Chapter 6: Symbol Tables
Establishing the groundwork for symbol tables
Declarations and scopes
Assigning and dereferencing variables
Choosing the right tree traversal for the job
Creating and populating symbol tables for each scope
Adding semantic attributes to syntax trees
Defining classes for symbol tables and symbol table entries
Creating symbol tables
Populating symbol tables
Synthesizing the isConst attribute
Checking for undeclared variables
Identifying the bodies of methods
Spotting uses of variables within method bodies
Finding redeclared variables
Inserting symbols into the symbol table
Reporting semantic errors
Handling package and class scopes in Unicon
Mangling names
Inserting self for member variable references
Inserting self as the first parameter in method calls
Testing and debugging symbol tables
Summary
Questions
Chapter 7: Checking Base Types
Type representation in the compiler
Defining a base class for representing types
Subclassing the base class for complex types
Assigning type information to declared variables
Synthesizing types from reserved words
Inheriting types into a list of variables
Determining the type at each syntax tree node
Determining the type at the leaves
Calculating and checking the types at internal nodes
Runtime type checks and type inference in Unicon
Summary
Questions
Chapter 8: Checking Types on Arrays, Method Calls, and Structure Accesses
Checking operations on array types
Handling array variable declarations
Checking types during array creation
Checking types during array accesses
Checking method calls
Calculating the parameters and return type information
Checking the types at each method call site
Checking the type at return statements
Checking structured type accesses
Handling instance variable declarations
Checking types at instance creation
Checking types at instance accesses
Summary
Questions
Chapter 9: Intermediate Code Generation
Preparing to generate code
Why generate intermediate code?
Learning about the memory regions in the generated program
Introducing data types for intermediate code
Adding the intermediate code attributes to the tree
Generating labels and temporary variables
An intermediate code instruction set
Instructions
Declarations
Annotating syntax trees with labels for control flow
Generating code for expressions
Generating code for control flow
Generating label targets for condition expressions
Generating code for loops
Generating intermediate code for method calls
Reviewing the generated intermediate code
Summary
Chapter 10: Syntax Coloring in an IDE
Downloading the example IDEs used in this chapter
Integrating a compiler into a programmer's editor
Analyzing source code from within the IDE
Sending compiler output to the IDE
Avoiding reparsing the entire file on every change
Using lexical information to colorize tokens
Extending the EditableTextList component to support color
Coloring individual tokens as they are drawn
Highlighting errors using parse results
Adding Java support
Summary
Section 3: Code Generation and Runtime Systems
Chapter 11: Bytecode Interpreters
Understanding what bytecode is
Comparing bytecode with intermediate code
Building a bytecode instruction set for Jzero
Defining the Jzero bytecode file format
Understanding the basics of stack machine operation
Implementing a bytecode interpreter
Loading bytecode into memory
Initializing the interpreter state
Fetching instructions and advancing the instruction pointer
Instruction decoding
Executing instructions
Starting up the Jzero interpreter
Writing a runtime system for Jzero
Running a Jzero program
Examining iconx, the Unicon bytecode interpreter
Understanding goal-directed bytecode
Leaving type information in at runtime
Fetching, decoding, and executing instructions
Crafting the rest of the runtime system
Summary
Questions
Chapter 12: Generating Bytecode
Converting intermediate code to Jzero bytecode
Adding a class for bytecode instructions
Mapping intermediate code addresses to bytecode addresses
Implementing the bytecode generator method
Generating bytecode for simple expressions
Generating code for pointer manipulation
Generating bytecode for branches and conditional branches
Generating code for method calls and returns
Handling labels and other pseudo-instructions in intermediate code
Comparing bytecode assembler with binary formats
Printing bytecode in assembler format
Printing bytecode in binary format
Linking, loading, and including the runtime system
Unicon example – bytecode generation in icont
Summary
Questions
Chapter 13: Native Code Generation
Deciding whether to generate native code
Introducing the x64 instruction set
Adding a class for x64 instructions
Mapping memory regions to x64 register-based address modes
Using registers
Starting from a null strategy
Assigning registers to speed up the local region
Converting intermediate code to x64 code
Mapping intermediate code addresses to x64 locations
Implementing the x64 code generator method
Generating x64 code for simple expressions
Generating code for pointer manipulation
Generating native code for branches and conditional branches
Generating code for method calls and returns
Handling labels and pseudo-instructions
Generating x64 output
Writing the x64 code in assembly language format
Going from native assembler to an object file
Linking, loading, and including the runtime system
Summary
Questions
Chapter 14: Implementing Operators and Built-In Functions
Implementing operators
Asking whether operators imply hardware support and vice versa
Adding String concatenation to intermediate code generation
Adding String concatenation to the bytecode interpreter
Adding String concatenation to the native runtime system
Writing built-in functions
Adding built-in functions to the bytecode interpreter
Writing built-in functions for use with the native code implementation
Integrating built-ins with control structures
Developing operators and functions for Unicon
Writing operators in Unicon
Developing Unicon's built-in functions
Summary
Questions
Chapter 15: Domain Control Structures
Knowing when you need a new control structure
Defining what a control structure is
Reducing excessive redundant parameters
Scanning strings in Icon and Unicon
Scanning environments and their primitive operations
Eliminating excessive parameters via a control structure
Rendering regions in Unicon
Rendering 3D graphics from a display list
Specifying rendering regions using built-in functions
Varying graphical levels of detail using nested rendering regions
Creating a rendering region control structure
Summary
Questions
Chapter 16: Garbage Collection
Appreciating the importance of garbage collection
Counting references to objects
Adding reference counting to Jzero
Generating code for heap allocation
Modifying the generated code for the assignment operator
Considering the drawbacks and limitations of reference counting
Marking live data and sweeping the rest
Organizing heap memory regions
Traversing the basis to mark live data
Reclaiming live memory and placing it into contiguous chunks
Summary
Questions
Chapter 17: Final Thoughts
Reflecting on what was learned from writing this book
Deciding where to go from here
Studying programming language design
Learning about implementing interpreters and bytecode machines
Acquiring expertise in code optimization
Monitoring and debugging program executions
Designing and implementing IDEs and GUI builders
Exploring references for further reading
Studying programming language design
Learning about implementing interpreters and bytecode machines
Acquiring expertise in native code and code optimization
Monitoring and debugging program executions
Designing and implementing IDEs and GUI builders
Summary
Section 4: Appendix
Appendix: Unicon Essentials
Running Unicon
Using Unicon's declarations and data types
Declaring different kinds of program components
Using atomic data types
Organizing multiple values using structure types
Evaluating expressions
Forming basic expressions using operators
Invoking procedures, functions, and methods
Iterating and selecting what and how to execute
Generators
Debugging and environmental issues
Learning the basics of the UDB debugger
Environment variables
Preprocessor
Function mini-reference
Selected keywords
Assessments
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Other Books You May Enjoy
Index