A Designer's Guide to Asynchronous VLSI

Bypass the limitations of synchronous design and create lower-power, higher-performance circuits with shorter design times using this practical guide to asynchronous design. The book covers the fundamentals of asynchronous design and a wide variety of design styles, with an emphasis throughout on practical techniques and real-world applications.

Author(s): Peter A. Beerel, Recep O. Ozdag, Marcos Ferretti
Edition: 1
Publisher: Cambridge University Press
Year: 2010

Language: English
Pages: 353
City: Cambridge; New York

Half-title......Page 3
Title......Page 5
Copyright......Page 6
Dedication......Page 7
Contents......Page 9
Acknowledgments......Page 13
1: Introduction......Page 15
1.1 Synchronous design basics......Page 16
1.2.1 Computer-aided design for high-performance......Page 18
1.3 Asynchronous design basics......Page 19
1.4 Asynchronous design flows......Page 20
1.5 Potential advantages of asynchronous design......Page 21
1.5.1 High performance......Page 22
1.5.3 Modularity and ease of design......Page 23
1.6.1 Testing and debugging......Page 24
1.7 Organization of the book......Page 25
References......Page 26
2.1.1 Bundled-data channels......Page 30
2.1.2 One-of-N channel......Page 32
2.1.3 Single-track 1-of-N channel......Page 34
2.1.4 Shared channels......Page 35
2.1.6 Abstract channel diagrams......Page 36
SEQ module......Page 38
PAR module......Page 39
2.2.2 Pipelined handshaking......Page 40
Full buffers versus half buffers......Page 42
Non-linear pipelines......Page 43
2.3 Asynchronous memories and holding state......Page 44
2.4.1 Non-pipelined arbiters......Page 47
2.4.2 Pipelined arbiters......Page 49
2.5.1 Two-place FIFO......Page 50
The 2 x 2 asynchronous crossbar......Page 52
2.6 Exercises......Page 54
References......Page 55
3: Modeling channel-based designs......Page 57
3.1 Communicating sequential processes......Page 58
3.2 Using asynchronous-specific languages......Page 60
3.4 Using existing hardware design languages......Page 61
3.5.1 Using send and receive macros......Page 62
3.5.3 Using enclosed handshaking macros......Page 64
3.5.4 Modeling the dining-philosophers problem in VerilogCSP......Page 66
3.5.5 Modeling a 2 x 2 asynchronous crossbar in VerilogCSP......Page 68
3.6.1 Send and receive macros......Page 69
3.6.2 Synchronization macros......Page 70
3.6.4 Enclosed handshaking macros......Page 71
3.7.2 Monitoring the state of ports and channels......Page 72
3.8 Summary of VerilogCSP macros......Page 75
3.9 Exercises......Page 76
References......Page 78
4: Pipeline performance......Page 80
4.1.1 Forward latency......Page 81
4.1.3 Backward latency......Page 82
4.2 Linear pipelines......Page 83
4.2.1 Homogeneous linear pipelines......Page 84
4.2.2 Series composition of linear pipelines......Page 86
4.3 Pipeline loops......Page 87
4.3.1 Design example: implementation of Euclid's algorithm......Page 88
4.3.2 Performance analysis of rings......Page 90
4.3.3 Improving ring throughput......Page 91
4.4 Forks and joins......Page 93
4.5 More complex pipelines......Page 95
4.6 Exercises......Page 96
References......Page 97
5.1 Petri nets......Page 98
5.1.1 Petri net types......Page 99
5.1.3 Modeling delays in Petri nets......Page 100
5.1.4 Cycle time......Page 101
5.2.1 Full-buffer channel nets......Page 102
5.2.2 Cycle time and throughput......Page 103
5.3.1 Shortest-path-based algorithm......Page 104
5.3.2 Linear-programming-based approaches......Page 106
5.3.4 Karp's algorithm......Page 108
5.4.1 Slack matching: an intuitive analysis......Page 110
5.4.2 Slack matching: an MILP optimization framework......Page 112
5.5 Advanced topic: stochastic performance analysis......Page 114
5.6 Exercises......Page 116
References......Page 117
6: Deadlock......Page 120
6.1 Deadlock caused by incorrect circuit design......Page 121
6.2.1 Data token starvation......Page 122
6.3 Deadlock caused by arbitration......Page 124
6.3.1 Arbiter deadlock example 1......Page 125
6.3.2 Arbiter deadlock example 2......Page 127
Reference......Page 129
7.1.1 Delay-insensitive design......Page 130
7.1.3 Speed-independent design......Page 131
7.2 Timing constraints......Page 132
7.4 Logic styles......Page 133
7.4.2 Dynamic logic......Page 134
7.4.3 Muller C-element implementations......Page 135
7.5 Datapath design......Page 137
7.5.1 Bundled data......Page 138
7.5.2 Quasi-delay-insensitive design......Page 139
7.5.4 Indicatability......Page 140
7.6.1 Communicating sequential process language refinement......Page 143
7.6.3 Gate-level netlist translation......Page 144
7.6.4 High-level synthesis-based approaches......Page 145
References......Page 146
8.1 Fundamental-mode Huffman circuits......Page 150
8.1.3 Burst-mode specification......Page 152
8.1.4 Hazards......Page 155
Dynamic hazards......Page 156
8.1.5 Burst-mode design example......Page 157
8.2 STG-based design......Page 160
8.2.1 STG example......Page 161
8.2.2 CAD tools for STG-based controller design......Page 162
8.3 Exercises......Page 163
References......Page 164
9.1 Two-phase micropipelines......Page 166
9.1.1 Non-linear pipelines......Page 169
9.1.2 Resource sharing......Page 170
9.1.3 Arbitration......Page 171
9.1.4 Event-module implementations......Page 172
9.2 Four-phase micropipelines......Page 173
9.3 True-four-phase pipelines......Page 176
9.4 Delay line design......Page 178
9.4.1 Asymmetric delay line templates......Page 179
9.4.2 Symmetric delay line templates......Page 181
9.5 Other micropipeline techniques......Page 182
9.6 Exercises......Page 183
References......Page 184
10: Syntax-directed translation......Page 186
10.1.3 Composite commands......Page 187
10.2 Handshake components......Page 188
10.3 Translation algorithm......Page 190
10.4 Control component implementation......Page 191
10.5 Datapath component implementations......Page 192
10.5.1 QDI implementations......Page 194
10.5.2 Single-rail implementations......Page 198
10.6 Peephole optimizations......Page 201
10.7 Self-initialization......Page 202
10.8 Testability......Page 203
10.9.1 Tangram digital compact cassette error corrector......Page 206
10.9.3 Balsa SPA – an asynchronous ARM V5T processor......Page 208
10.9.4 Haste ARM996HS: a commercially available asynchronous ARM core......Page 209
10.10 Summary......Page 210
References......Page 211
11.1 Weak-conditioned half buffer......Page 214
11.2 Precharged half buffer......Page 218
11.2.1 PCHB full adder......Page 221
11.2.2 Conditional reading and writing......Page 222
11.2.3 PCHB reset......Page 223
11.2.4 PCHB register......Page 224
11.3 Precharged full buffer......Page 230
11.4 Why input-completion sensing?......Page 231
11.5 Reduced-stack precharged half buffer (RSPCHB)......Page 234
11.5.1 Conditional reading and writing RSPCHB......Page 238
11.5.2 RSPCHB register......Page 239
11.5.3 Loops using RSPCHB......Page 241
11.6 Reduced-stack precharged full buffer (RSPCFB)......Page 243
11.8 Token insertion......Page 246
11.9 Arbiter......Page 250
11.10 Exercises......Page 252
References......Page 253
12.1 Williams' PS0 pipeline......Page 254
12.3.1 The LP3/1 pipeline......Page 256
12.3.2 The LP2/2 pipeline......Page 258
12.3.3 The LP2/1 pipeline......Page 260
12.4.1 The LPSR2/2 pipeline......Page 261
12.4.2 The LPSR2/1 pipeline......Page 263
12.5 High-capacity pipelines (single-rail)......Page 264
12.6.1 Slow and stalled right-hand environments in forks......Page 267
12.6.2 Slow and stalled left-hand environments in joins......Page 268
12.7.1 Solution 1 for LPSR2/2......Page 269
12.7.3 Pipeline cycle time......Page 270
12.8.1 Joins......Page 271
12.8.2 Forks......Page 272
12.9 High-capacity pipelines (single-rail)......Page 273
12.9.2 Pipeline cycle time......Page 275
12.10 Conditionals......Page 276
12.11 Loops......Page 277
12.12 Simulation results......Page 278
References......Page 280
13.1 Introduction......Page 281
13.2 GasP bundled data......Page 283
13.3 Pulsed logic......Page 284
13.4 Single-track full-buffer template......Page 285
13.4.1 Static single-track full-buffer (SSTFB) template......Page 287
13.4.2 The 10-transition template......Page 288
13.5.2 STFB fork......Page 289
13.5.4 STFB merge......Page 290
13.5.5 STFB split......Page 291
13.5.6 STFB arbiter......Page 292
13.5.7 STFB kpg adder......Page 293
13.5.8 Shared channels......Page 294
13.5.9 Bit generators and buckets......Page 296
13.6.1 Transistor-sizing strategy......Page 297
13.6.2 Output sub-cell STFB_POUT......Page 299
13.6.4 Input channel reset transistors......Page 300
13.6.5 Direct-path current analysis......Page 301
13.6.6 Performance analysis......Page 303
13.8.1 The prefix adder......Page 304
13.8.3 The output circuitry......Page 306
13.8.4 The chip implementation......Page 308
13.8.6 Comparisons......Page 309
13.8.8 Test results......Page 310
13.9 Conclusions and open questions......Page 313
13.10 Exercises......Page 314
References......Page 315
14: Asynchronous crossbar......Page 318
14.1 Fulcrum's Nexus asynchronous crossbar......Page 319
14.1.1 The crossbar......Page 320
14.1.2 Input control......Page 321
14.1.3 Output control......Page 322
14.2.2 Clock domain converter datapath......Page 323
14.2.3 Latency......Page 324
References......Page 325
15.1.1 Background of the algorithm......Page 327
15.1.2 The synchronous design......Page 328
Normalization and its benefits......Page 329
Register-transfer-level design......Page 331
Synchronous chip implementation......Page 334
15.2.1 The asynchronous Fano architecture......Page 335
15.2.2 The skip-ahead unit......Page 337
15.2.3 The memory design......Page 339
15.2.5 Simulation results and comparison......Page 340
15.3 An asynchronous semi-custom physical design flow......Page 343
15.3.1 Physical design flow using standard CAD tools......Page 344
References......Page 349
Index......Page 350