Guide to Computer Processor Architecture: A RISC-V Approach, with High-Level Synthesis

The book presents a succession of RISC-V processor implementations of increasing difficulty (non-pipelined, pipelined, deeply pipelined, multithreaded, multicore).
Each implementation is given as HLS (High-Level Synthesis) code in C++, which can actually be synthesized and tested on an FPGA-based development board (such a board can be obtained for free from the Xilinx University Program, which is aimed at university professors).
The book is useful for three reasons. First, it is a novel way to introduce computer architecture: the code provided can serve as labs for a processor architecture course. Second, the content is based on the RISC-V Instruction Set Architecture, an open-source machine language that is poised to become the machine language taught in courses, replacing DLX and MIPS. Third, all the designs are implemented through High-Level Synthesis, a tool that translates a C program into an IP (Intellectual Property). Hence, the book can also serve engineers who want to implement processors on FPGAs and researchers who want to develop RISC-V based hardware simulators.
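To give a rough idea of what "a processor written as a C++ function" can look like, here is a minimal sketch of a fetch-and-execute loop. It is not taken from the book: the names `fetching_sketch`, `fetch`, `execute`, `CODE_SIZE` and `RET` are hypothetical, plain `uint32_t` stands in for the bit-accurate types an HLS tool would normally provide, and the only "execution" done is advancing the program counter until a `ret` instruction is fetched.

```cpp
#include <cassert>
#include <cstdint>

// All names below are hypothetical, chosen for illustration only.
constexpr uint32_t CODE_SIZE = 8;          // size of the code memory, in words
constexpr uint32_t RET = 0x00008067;       // RV32I encoding of "jalr x0, 0(x1)" (ret)

// Fetch: read the instruction word addressed by pc (pc counts words here).
static uint32_t fetch(uint32_t pc, const uint32_t code[CODE_SIZE]) {
    return code[pc % CODE_SIZE];
}

// Execute: in this sketch every instruction simply moves on to the next word.
static uint32_t execute(uint32_t pc) {
    return pc + 1;
}

// Top function: the kind of C++ function an HLS tool could translate into an IP.
// Returns the number of instructions fetched, including the final ret.
uint32_t fetching_sketch(uint32_t start_pc, const uint32_t code[CODE_SIZE]) {
    assert(code != nullptr);
    uint32_t pc = start_pc;
    uint32_t nb_instructions = 0;
    uint32_t instruction;
    do {
        instruction = fetch(pc, code);
        pc = execute(pc);
        ++nb_instructions;
        // Running condition: stop on ret, or after CODE_SIZE instructions
        // so that the loop is guaranteed to terminate.
    } while (instruction != RET && nb_instructions < CODE_SIZE);
    return nb_instructions;
}
```

Calling `fetching_sketch(0, code)` on a code memory holding `addi a1, zero, 5` (`0x00500593`), `addi a2, a1, 1` (`0x00158613`) and `ret` would count three fetched instructions. The bound on the iteration count is a design choice of this sketch: it keeps the loop finite even when no `ret` is present.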

Author(s): Bernard Goossens
Series: Undergraduate Topics in Computer Science
Edition: 1
Publisher: Springer
Year: 2023

Language: English
Pages: 464
City: Cham
Tags: RISC; RISC-V; Toolchains; Reduced Instruction Set Computer; Load-Store Architecture; Floating-Point Instructions; FPGA; Xilinx Vitis; Vivado; Vitis IDE; RISC-V Architecture; Pipelined Processor; Multicycle Pipeline; Multiple Hart Pipeline; Multiple Core Processors

Preface
Processor Architecture: A Do-It-Yourself Approach
RISC-V Open-Source Processor Designs
A Very Practical Introduction to Computer Architecture for Undergraduate Students
A Teaching Tool for Instructors With a GitHub Support
A Practical and Detailed Introduction to High-Level Synthesis and to RISC-V for FPGA Engineers
What the Book Does Not Contain
An Organization in Two Parts: Single Core Designs, Multiple Core Designs
References
Acknowledgements
Contents
Acronyms
Single Core Processors
1 Introduction: What Is an FPGA, What Is High-Level Synthesis or HLS?
1.1 What Hardware to Put in an FPGA?
1.2 Look-Up Table (LUT): A Piece of Hardware to Store a Truth Table
1.3 Combining LUTs
1.4 The Structure of an FPGA
1.5 Programming an FPGA
2 Setting up and Using the Vitis_HLS, Vivado, and Vitis IDE Tools
2.1 Getting the Hardware
2.2 Getting the Software: The Xilinx Vitis Tool
2.3 Installing the Development Board Definition in the Vitis Software
2.4 Installing the Book Resources
2.5 Using the Software
2.5.1 Creating a Project
2.5.2 Creating an IP
2.5.3 Simulating an IP
2.5.4 Synthesizing an IP
2.6 Creating a Design with Vivado
2.7 Loading the IP and Running the FPGA with Vitis
3 Installing and Using the RISC-V Tools
3.1 Installing a RISC-V Toolchain and an Emulator/Debugger
3.1.1 Installing the RISC-V Toolchain
3.1.2 The Spike Simulator
3.1.3 Building Executable Code For the RISC-V FPGA Based Processors
3.2 Debugging With Gdb
3.2.1 Installing Gdb
3.2.2 Installing OpenOCD
3.2.3 Defining a Linker Script File Compatible With the Spike Simulated Machine
3.2.4 Using the Linker Script File to Compile
3.2.5 Defining a Spike Configuration File for OpenOCD
3.2.6 Connecting Spike, OpenOCD, and Gdb
3.2.7 A Debugging Session
3.3 Debugging a Complex Code with Gdb
4 The RISC-V Architecture
4.1 The RISC-V Instruction Set Architecture
4.1.1 The RV32I Registers and the RISC-V Application Binary Interface
4.1.2 The RV32I Instructions
4.1.3 The RV32I Instruction Formats
4.1.4 The Assembler Syntax
4.2 Code Examples
4.2.1 Expressions
4.2.2 Tests
4.2.3 Loops
4.2.4 Function Calls
5 Building a Fetching, Decoding, and Executing Processor
5.1 General Programming Concepts for HLS
5.1.1 The Critical Path
5.1.2 Computing More to Reduce the Critical Path
5.1.3 Parallel Execution
5.2 The Fundamental Execution Time Equation for a Processor
5.3 First Step: Build the Path to Update PC
5.3.1 The Fetching_ip Design
5.3.2 The Fetching_ip Top Function
5.3.3 The Fetch Function
5.3.4 The Execute Function
5.3.5 The IP Running Condition
5.3.6 The IP Simulation with the Testbench
5.3.7 The Simulation Prints
5.3.8 The Fetching_ip Synthesis
5.3.9 The Z1_fetching_ip Vivado Project
5.3.10 The Helloworld.c Program to Drive the Fetching_ip on the FPGA
5.4 Second Step: Add a Bit of Decoding to Compute the Next PC
5.4.1 The RISC-V Instruction Encoding
5.4.2 The Fetching_decoding_ip
5.4.3 The Fetching_decoding_ip.h File
5.4.4 The Fetch Function and the Running_cond_update Function
5.4.5 The Decode Function
5.4.6 The Instruction Execution (Computing Next PC)
5.4.7 The Fetching_decoding_ip Simulation with the Testbench
5.4.8 The Fetching_decoding_ip Synthesis
5.4.9 The Z1_fetching_decoding_ip Vivado Project
5.4.10 The Helloworld.c Code to Drive the Fetching_decoding_ip
5.5 Third Step: Filling the Execute Stage to Build the Register Path
5.5.1 A Fetching, Decoding, and Executing IP: The Fde_ip Design
5.5.2 Two Debugging Tools: Register File Dump and Code Disassembling
5.5.3 The IP Running Condition
5.5.4 The Fde_ip.h File
5.5.5 The Decode Function and the Execute Function
5.5.6 The Register File
5.5.7 Computing
5.5.8 Simulating the Fde_ip With the Testbench
5.5.9 The Fde_ip Synthesis
5.5.10 The Z1_fde_ip Vivado Project
5.5.11 The Helloworld.c Program to Drive the Fde_ip on the FPGA
6 Building a RISC-V Processor
6.1 The Rv32i_npp_ip Top Function
6.1.1 The Rv32i_npp_ip Top Function Prototype, Local Declarations, and Initializations
6.1.2 The Do ... While Loop
6.2 Decoding Update
6.3 Data Memory Accesses: Alignment and Endianness
6.4 The Execute Function
6.4.1 The Computation of the Accessed Address
6.4.2 The Compute_result Function
6.4.3 The Mem_store Function
6.4.4 The Mem_load Function
6.4.5 The Write_reg Function
6.5 Simulating the Rv32i_npp_ip With the Testbench
6.6 Synthesis of the Rv32i_npp_ip
6.7 The Z1_rv32i_npp_ip Vivado Project
6.8 The Helloworld.c Program to Drive the Rv32i_npp_ip on the FPGA
7 Testing Your RISC-V Processor
7.1 Testing the Rv32i_npp_ip Processor with my Test Programs
7.2 More Testing with the Official Riscv-Tests
7.2.1 Running the Riscv-Tests With Spike
7.2.2 The Riscv-Tests Structure
7.2.3 Adapting the Riscv-Tests Structure to the Vitis_HLS Environment
7.2.4 Adding a _start.S Program to Glue All the Tests Together
7.2.5 The Testbench to Simulate the Tests in Vitis_HLS
7.2.6 Running the Riscv-Tests in the Vitis_HLS Environment
7.2.7 Running the Tests on the FPGA
7.3 Running a Benchmark Suite on the Rv32i_npp_ip Processor
7.3.1 The Basicmath_small Benchmark From the Mibench Suite
7.3.2 Running the Basicmath_Small Benchmark on the FPGA
7.3.3 The Other Benchmarks of the Mibench Suite
7.3.4 The Mibench and Riscv-Tests Benchmarks Execution Times on the Rv32i_npp_ip Implementation
7.4 Proposed Exercises: The RISC-V M and F Instruction Extensions
7.4.1 Adapt the Rv32i_npp_ip Design to the RISC-V M Extension
7.4.2 Adapt the Rv32i_npp_ip Design to the RISC-V F Extension
7.5 Debugging Hints
7.5.1 Synthesis Is Not Simulation: You Can Disactivate Some Parts of the Simulation
7.5.2 Infinite Simulation: Replace the "Do ... While" Loop By a "For" Loop
7.5.3 Frozen IP on the FPGA: Check Ap_int and Ap_uint Variables
7.5.4 Frozen IP on the FPGA: Check Computations Inside #Ifndef __SYNTHESIS__
7.5.5 Frozen IP on the FPGA: Replace the "While (!IsDone(...));" Loop By a "For" Loop
7.5.6 Frozen IP on the FPGA: Reduce the RISC-V Code Run
7.5.7 Non Deterministic Behaviour on the FPGA: Check Initializations
7.5.8 Debugging Prints Along the Run on the FPGA
8 Building a Pipelined RISC-V Processor
8.1 First Step: Control a Pipeline
8.1.1 The Difference Between a Non-pipelined and a Pipelined Microarchitecture
8.1.2 The Inter-stage Connection Structures
8.1.3 The IP Top Function
8.1.4 Control Flow Instructions Handling in the Pipeline
8.1.5 The Fetch_decode Pipeline Stage
8.1.6 The Execute_wb Pipeline Stage
8.1.7 The Simulation and Synthesis of the IP
8.1.8 The Vivado Project Using the IP
8.1.9 The Execution of the Vivado Project on the Development Board
8.1.10 Further Testing of the Simple_pipeline_ip
8.1.11 Comparison of the Non-pipelined Design with the Pipelined One
8.2 Second Step: Slice a Pipeline into Stages
8.2.1 A 4-Stage Pipeline
8.2.2 The Inter-stage Connections
8.2.3 The Decode Part of the Fetch_decode Stage
8.2.4 The IP Top Function
8.2.5 The Bypass Mechanism in the Execute Stage
8.2.6 The Execute Stage
8.2.7 Memory Load Hazards
8.2.8 The Memory Access Stage
8.2.9 The Writeback Stage
8.2.10 The Testbench Function
8.2.11 The IP Synthesis
8.2.12 The Vivado Project
8.2.13 The Execution of the Vivado Project on the Development Board
8.2.14 Further Testing of the Rv32i_pp_ip
8.3 The Comparison of the 2-Stage Pipeline with the 4-Stage One
9 Building a RISC-V Processor with a Multicycle Pipeline
9.1 The Difference Between a Pipeline and a Multicycle Pipeline
9.1.1 The Wait Signal to Freeze Stages
9.1.2 The Valid Input and Output Bits
9.1.3 Fetching and Computing the Next PC
9.1.4 The Safe Structure of a Multicycle Stage
9.1.5 Multiple Multicycle Stages
9.2 The IP Top Function
9.3 The Pipeline Stages
9.3.1 The Fetch Stage
9.3.2 The Decode Stage
9.3.3 The Issue Stage
9.3.4 The Execute Stage
9.3.5 The Memory Access Stage
9.3.6 The Writeback Stage
9.4 Simulating, Synthesizing, and Running the IP
9.4.1 Simulating and Synthesizing the IP
9.4.2 The Vivado Project and the Implementation Report
9.4.3 Running the IP on the Development Board
9.4.4 Further Testing of the Multicycle_pipeline_ip
9.5 Comparing the Multicycle Pipeline to the 4-Stage Pipeline
9.6 Proposed Exercise: Reduce II to 1
10 Building a RISC-V Processor with a Multiple Hart Pipeline
10.1 Handling Multiple Threads Simultaneously with a Multiple Hart Processor
10.2 A Multiple Hart Memory Model
10.3 The Multihart Pipeline
10.3.1 The Number of Harts
10.3.2 The Multihart Stage States
10.3.3 The Occupation Arrays
10.3.4 The Multihart_ip Top Function
10.3.5 The New_cycle Function to Copy the _to_ Structures Into the _from_ Ones
10.3.6 The Multihart Fetch Stage
10.3.7 The Decode Stage
10.3.8 The Issue Stage
10.3.9 The Execute Stage
10.3.10 The Memory Access Stage
10.3.11 The Writeback Stage
10.3.12 The Lock_unlock_update Function
10.3.13 The Running_cond_update Function
10.4 Simulating the Multihart_ip
10.4.1 Filling Harts with Independent Codes
10.4.2 Filling Harts with a Parallelized Code
10.5 Synthesizing the IP
10.6 The Vivado Project and the Implementation Report
10.7 Running the Multihart_ip on the Development Board
10.7.1 Running Independent Codes
10.7.2 Running a Parallelized Application
10.7.3 Further Testing of the Multihart_ip
10.8 Comparing the Multihart_ip to the 4-stage Pipeline
10.8.1 Two Harts
10.8.2 Four Harts
10.8.3 Eight Harts
Multiple Core Processors
11 Connecting IPs
11.1 The AXI Interconnection System
11.2 The Non Pipelined RISC-V Processor with External Memory IPs
11.2.1 The Top Function with a Bram Interface
11.2.2 The IP Synthesis
11.2.3 The Vivado Project
11.2.4 Running the IP on the Development Board
11.3 Connecting Multiple CPUs and Multiple RAMs Through an AXI Interconnect
11.3.1 The Multiple IPs Design
11.3.2 The CPU Top Function
11.3.3 The CPU Header File and the Testbench Code
11.4 Simulating, Synthesizing, and Running a Multiple IP Design
11.4.1 Simulation
11.4.2 Synthesis
11.4.3 The Vivado Project
11.4.4 Running the Multiple IP Design
12 A Multicore RISC-V Processor
12.1 An Adaptation of the Multicycle_pipeline_ip to Multicore
12.1.1 Adding an IP Number to the Top Function Prototype
12.1.2 The IP Top Function Declarations
12.1.3 The IP Top Function Initializations
12.1.4 The IP Top Function Main Loop
12.1.5 The Register File Initialization
12.1.6 The Memory Access
12.2 Simulating the IP
12.2.1 Simulating Independent Programs on the Different IPs
12.2.2 Simulating a Parallelized Program
12.3 Synthesizing the IP
12.4 The Vivado Project
12.5 Running the IPs on the Development Board
12.5.1 Running Independent Programs
12.5.2 Running a Parallelized Program
12.6 Evaluating the Parallelism Efficiency of the Multicore IP
13 A Multicore RISC-V Processor with Multihart Cores
13.1 An Adaptation of the Multihart_ip to Multicore
13.1.1 The Multicore Multihart IP Top Function Prototype and Local Declarations
13.1.2 The Data Memory Accesses
13.2 Simulating the IP
13.2.1 Simulating Independent Programs
13.2.2 Simulating a Parallelized Program
13.2.3 Synthesizing the IP
13.2.4 The Vivado Project
13.3 Running the IP on the Development Board
13.3.1 Running Independent Programs
13.3.2 Running a Parallelized Program
13.4 Evaluating the Parallelism Efficiency of the Multicore Multihart IP
14 Conclusion: Playing with the Pynq-Z1/Z2 Development Board Leds and Push Buttons
14.1 A Zynq Design to Access Buttons and Leds on the Development Board
14.2 A Design to Access Buttons and Leds from a RISC-V Processor
14.3 Conclusion
Index