Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools

The fact that there are more embedded computers than general-purpose computers and that we are impacted by hundreds of them every day is no longer news. What is news is that their increasing performance requirements, complexity and capabilities demand a new approach to their design. Fisher, Faraboschi, and Young describe a new age of embedded computing design, in which the processor is central, making the approach radically distinct from contemporary practices of embedded systems design. They demonstrate why it is essential to take a computing-centric and system-design approach to the traditional elements of nonprogrammable components, peripherals, interconnects and buses. These elements must be unified in a system design with high-performance processor architectures, microarchitectures and compilers, and with the compilation tools, debuggers and simulators needed for application development. In this landmark text, the authors apply their expertise in highly interdisciplinary hardware/software development and VLIW processors to illustrate this change in embedded computing. VLIW architectures have long been a popular choice in embedded systems design, and while VLIW is a running theme throughout the book, embedded computing is the core topic. Embedded Computing examines both in a book filled with fact and opinion based on the authors' many years of R&D experience.

· Complemented by a unique, professional-quality embedded tool-chain on the authors' website, http://www.vliw.org/book
· Combines technical depth with real-world experience
· Comprehensively explains the differences between general-purpose computing systems and embedded systems at the hardware, software, tools and operating system levels
· Uses concrete examples to explain and motivate the trade-offs

Author(s): Joseph A. Fisher, Paolo Faraboschi, Cliff Young
Edition: 1
Publisher: Morgan Kaufmann
Year: 2004

Language: English
Pages: 712

CHAPTER 1 An Introduction to Embedded Processing
1.1 What Is Embedded Computing?
1.1.1 Attributes of Embedded Devices
1.1.2 Embedded Is Growing
1.2 Distinguishing Between Embedded and General-Purpose Computing
1.2.1 The “Run One Program Only” Phenomenon
1.2.2 Backward and Binary Compatibility
1.2.3 Physical Limits in the Embedded Domain
1.3 Characterizing Embedded Computing
1.3.1 Categorization by Type of Processing Engine
1.3.2 Categorization by Application Area
1.3.3 Categorization by Workload Differences
1.4 Embedded Market Structure
1.4.1 The Market for Embedded Processor Cores
1.4.2 Business Model of Embedded Processors
1.4.3 Costs and Product Volume
1.4.5 Industry Standards
1.4.6 Product Life Cycle
1.4.7 The Transition to SoC Design
1.4.8 The Future of Embedded Systems
1.5 Further Reading
1.6 Exercises
CHAPTER 2 An Overview of VLIW and ILP
2.1.1 Baseline: Sequential Program Semantics
2.1.2 Pipelined Execution, Overlapped Execution, and Multiple Execution Units
2.1.3 Dependence and Program Rearrangement
2.1.4 ILP and Other Forms of Parallelism
2.2 Design Philosophies
2.2.1 An Illustration of Design Philosophies: RISC Versus CISC
2.2.2 First Definition of VLIW
2.2.3 A Design Philosophy: VLIW
2.3.1 The Phases of a High-Performance Compiler
2.3.2 Compiling for ILP and VLIW
2.4 VLIW in the Embedded and DSP Domains
2.5.1 ILP Hardware in the 1960s and 1970s
2.5.2 The Development of ILP Code Generation in the 1980s
2.5.3 VLIW Development in the 1980s
2.5.4 ILP in the 1990s and 2000s
2.6 Exercises
CHAPTER 3 An Overview of ISA Design
3.1.1 Architectural State: Memory and Registers
3.1.2 Pipelining and Operational Latency
3.1.3 Multiple Issue and Hazards
3.1.4 Exception and Interrupt Handling
3.1.5 Discussion
3.2 Basic VLIW Design Principles
3.2.1 Implications for Compilers and Implementations
3.2.2 Execution Model Subtleties
3.3 Designing a VLIW ISA for Embedded Systems
3.3.1 Application Domain
3.3.2 ILP Style
3.3.3 Hardware/Software Tradeoffs
3.4.1 A Larger Definition of Architecture
3.4.2 Encoding and Architectural Style
3.5 VLIW Encoding
3.5.2 Instruction Encoding
3.5.3 Dispatching and Opcode Subspaces
3.6 Encoding and Instruction-set Extensions
3.8 Exercises
CHAPTER 4 Architectural Structures in ISA Design
4.1.2 Datapath Width
4.1.3 Operation Repertoire
4.1.4 Micro-SIMD Operations
4.2 Registers and Clusters
4.2.1 Clustering
4.2.3 Address and Data Registers
4.2.4 Special Register File Features
4.3 Memory Architecture
4.3.1 Addressing Modes
4.3.3 Alignment Issues
4.3.4 Caches and Local Memories
4.4 Branch Architecture
4.4.1 Unbundling Branches
4.4.2 Multiway Branches
4.4.3 Multicluster Branches
4.4.4 Branches and Loops
4.5.1 Speculation
4.5.2 Predication
4.6 System Operations
4.7 Further Reading
4.8 Exercises
CHAPTER 5 Microarchitecture Design
5.1.1 Register File Structure
5.1.2 Register Files, Technology, and Clustering
5.1.3 Separate Address and Data Register Files
5.2 Pipeline Design
5.2.1 Balancing a Pipeline
5.3.1 Instruction Fetch
5.3.2 Alignment and Instruction Length
5.3.3 Decoding and Dispersal
5.4 The Datapath
5.4.1 Execution Units
5.4.2 Bypassing and Forwarding Logic
5.4.3 Exposing Latencies
5.4.4 Predication and Selects
5.5.1 Local Memory and Caches
5.5.2 Byte Manipulation
5.5.3 Addressing, Protection, and Virtual Memory
5.5.4 Memories in Multiprocessor Systems
5.5.5 Memory Speculation
5.6.1 Branch Architecture
5.6.2 Predication and Selects
5.6.3 Interrupts and Exceptions
5.6.4 Exceptions and Pipelining
5.8 Power Considerations
5.8.1 Energy Efficiency and ILP
5.9 Further Reading
5.10 Exercises
6.1 System-on-a-Chip (SoC)
6.1.1 IP Blocks and Design Reuse
6.1.2 Design Flows
6.1.3 SoC Buses
6.2 Processor Cores and SoC
6.2.1 Nonprogrammable Accelerators
6.2.2 Multiprocessing on a Chip
6.3 Overview of Simulation
6.3.1 Using Simulators
6.4 Simulating a VLIW Architecture
6.4.1 Interpretation
6.4.2 Compiled Simulation
6.4.3 Dynamic Binary Translation
6.4.4 Trace-driven Simulation
6.5 System Simulation
6.5.2 Hardware Simulation
6.5.3 Accelerating Simulation
6.6 Validation and Verification
6.6.1 Co-simulation
6.6.2 Simulation, Verification, and Test
6.7 Further Reading
6.8 Exercises
7.1 What Is Important in an ILP Compiler?
7.2 Embedded Cross-Development Toolchains
7.2.1 Compiler
7.2.2 Assembler
7.2.3 Libraries
7.2.4 Linker
7.2.6 Run-time Program Loader
7.2.7 Simulator
7.2.8 Debuggers and Monitor ROMs
7.2.9 Automated Test Systems
7.3 Structure of an ILP Compiler
7.3.2 Machine-independent Optimizer
7.4.1 Code Layout Techniques
7.5 Embedded-Specific Tradeoffs for Compilers
7.5.1 Space, Time, and Energy Tradeoffs
7.5.2 Power-specific Optimizations
7.6 DSP-Specific Compiler Optimizations
7.6.1 Compiler-visible Features of DSPs
7.6.2 Instruction Selection and Scheduling
7.6.4 Local Memories
7.6.5 Register Assignment Techniques
7.6.6 Retargetable DSP and ASIP Compilers
7.7 Further Reading
7.8 Exercises
CHAPTER 8 Compiling for VLIWs and ILP
8.1.1 Types of Profiles
8.1.3 Synthetic Profiles (Heuristics in Lieu of Profiles)
8.1.5 Profiles and Embedded Applications
8.2 Scheduling
8.2.1 Acyclic Region Types and Shapes
8.2.3 Schedule Construction
8.2.4 Resource Management During Scheduling
8.2.5 Loop Scheduling
8.2.6 Clustering
8.3 Register Allocation
8.3.1 Phase-ordering Issues
8.4.1 Control and Data Speculation
8.4.2 Predicated Execution
8.4.3 Prefetching
8.5 Instruction Selection
8.6 Further Reading
8.7 Exercises
CHAPTER 9 The Run-time System
9.1.1 Exception Handling
9.2 Application Binary Interface Considerations
9.2.1 Loading Programs
9.2.2 Data Layout
9.2.3 Accessing Global Data
9.2.4 Calling Conventions
9.2.5 Advanced ABI Topics
9.3 Code Compression
9.3.1 Motivations
9.3.3 Architectural Compression Options
9.3.4 Compression Methods
9.4.1 “Traditional” OS Issues Revisited
9.4.2 Real-time Systems
9.4.3 Multiple Flows of Control
9.4.4 Market Considerations
9.4.5 Downloadable Code and Virtual Machines
9.5.1 Multiprocessing in the Embedded World
9.5.2 Multiprocessing and VLIW
9.6 Further Reading
9.7 Exercises
10.1 Programming Language Choices
10.1.1 Overview of Embedded Programming Languages
10.1.2 Traditional C and ANSI C
10.1.3 C++ and Embedded C++
10.1.4 Matlab
10.1.5 Embedded Java
10.1.6 C Extensions for Digital Signal Processing
10.1.7 Pragmas, Intrinsics, and Inline Assembly Language Code
10.2.1 Importance and Methodology
10.2.2 Tuning an Application for Performance
10.2.3 Benchmarking
10.3 Scalability and Customizability
10.3.1 Scalability and Architecture Families
10.3.2 Exploration and Scalability
10.3.3 Customization
10.3.4 Reconfigurable Hardware
10.3.5 Customizable Processors and Tools
10.3.6 Tools for Customization
10.3.7 Architecture Exploration
10.4 Further Reading
10.5 Exercises
11.1 Digital Printing and Imaging
11.1.1 Photo Printing Pipeline
11.1.2 Implementation and Performance
11.2 Telecom Applications
11.2.1 Voice Coding
11.2.2 Multiplexing
11.2.3 The GSM Enhanced Full-rate Codec
11.3 Other Application Areas
11.3.1 Digital Video
11.3.2 Automotive
11.3.3 Hard Disk Drives
11.3.4 Networking and Network Processors
11.4 Further Reading
11.5 Exercises
APPENDIX A The VEX System
A.1 The VEX Instruction-set Architecture
A.1.1 VEX Assembly Language Notation
A.1.2 Clusters
A.1.3 Execution Model
A.1.5 Arithmetic and Logic Operations
A.1.6 Intercluster Communication
A.1.7 Memory Operations
A.1.8 Control Operations
A.1.9 Structure of the Default VEX Cluster
A.1.10 VEX Semantics
A.2 The VEX Run-time Architecture
A.2.1 Data Allocation and Layout
A.2.3 Stack Layout and Procedure Linkage
A.3 The VEX C Compiler
A.3.1 Command Line Options
A.3.2 Compiler Pragmas
A.3.3 Inline Expansion
A.3.4 Machine Model Parameters
A.3.5 Custom Instructions
A.4 Visualization Tools
A.5 The VEX Simulation System
A.5.1 gprof Support
A.5.2 Simulating Custom Instructions
A.5.3 Simulating the Memory Hierarchy
A.6.1 Clusters
A.6.2 Machine Model Resources
A.7.1 Compile and Run
A.7.2 Profiling
A.7.3 Custom Architectures
A.8 Exercises
APPENDIX B Glossary
APPENDIX C Bibliography
Index