Short compute times are crucial for timely diagnostics in biomedical applications, but new and improved imaging techniques create a high demand for computing power. This book discusses reconfigurable computing with FPGAs as an alternative to multi-core processors and graphics card accelerators. Instead of adjusting the application to the hardware, FPGAs also allow the hardware to be adjusted to the problem. Acceleration of Biomedical Image Processing with Dataflow on FPGAs covers the transformation of image processing algorithms into systems of deep pipelines that can be executed with very high parallelism. The transformation process is discussed from initial design decisions to working implementations. Two example applications, from stochastic localization microscopy and electron tomography, illustrate the approach further.

Topics discussed in the book include:
• Reconfigurable hardware
• Dataflow computing
• Image processing
• Application acceleration
Author(s): Frederik Grüll, Udo Kebschull
Series: River Publishers Series in Information Science and Technology
Publisher: River Publishers
Year: 2016
Language: English
Pages: 228
City: Gistrup
Cover
Half Title
Series Page
Title Page
Copyright Page
Table of Contents
Foreword
Preface
Acknowledgements
List of Figures
List of Tables
List of Listings
List of Abbreviations
1: Introduction
1.1 Motivation
1.2 Overview
1.2.1 The Idea
1.2.2 Aim of this Book
1.3 Outline
2: Dataflow Computing
2.1 Early Approaches
2.1.1 Control Flow and Dataflow
2.1.2 Dataflow Machines
2.1.3 Dataflow Programs
2.2 Principles of Dataflow Computing on Reconfigurable Hardware
2.2.1 Primitives
2.2.2 Scheduling
2.2.2.1 Dynamic Scheduling
2.2.2.2 Static Scheduling
2.2.2.3 Combined Forms
2.2.3 Image Processing
2.2.3.1 Point Operations
2.2.3.2 Convolutions
2.2.3.3 Reductions
2.2.3.4 Operations with Non-Linear Access Patterns
2.3 FPGA Hardware
2.3.1 Integrated Circuits
2.3.1.1 Configurable Logic Blocks
2.3.1.2 Block RAM
2.3.1.3 Digital Signal Processors
2.3.2 Low-Level Hardware Description Languages
2.3.2.1 VHDL and Verilog
2.3.2.2 FPGA Design Flow
2.3.3 FPGAs as Application Accelerators
2.3.3.1 Pipelining
2.3.3.2 Flynn’s Taxonomy
2.3.3.3 Limits of Acceleration
2.4 Languages
2.4.1 Imperative Languages
2.4.1.1 Handel-C
2.4.1.2 Xilinx Vivado High-Level Synthesis
2.4.1.3 ROCCC 2.0
2.4.2 Stream Languages
2.4.2.1 MaxCompiler
2.4.2.2 Silicon Software VisualApplets
3: Acceleration of Imperative Code with Dataflow Computing
3.1 Relation to List Processing
3.1.1 Basic Functions
3.1.2 Transformations
3.1.2.1 Nested Lists
3.1.3 Reductions
3.1.4 Generation
3.1.5 Sublists
3.1.6 Searching
3.1.6.1 Indexing Lists
3.1.7 Zipping and Unzipping
3.1.8 Set Operations
3.1.9 Ordered Lists
3.1.10 Summary
3.2 Identification of Throughput Boundaries
3.2.1 Profiling in Software
3.2.2 Profiling the CPU System
3.2.3 Profiling Dataflow Designs
3.3 Pipelining Imperative Control Flows
3.3.1 Sequences
3.3.2 Conditionals
3.3.3 Loops
3.3.3.1 Loop Unrolling
3.3.3.2 Loop Parallelization
3.3.3.3 Loop Cascading
3.3.3.4 Loop Tiling
3.3.3.5 Loop Interweaving
3.3.3.6 Finite-State Machines
3.3.4 Summary
3.4 Efficient Bit and Number Manipulations
3.4.1 Encoding
3.4.1.1 Integers and Fixed-Point Representations
3.4.1.2 Floating-Point Representations
3.4.1.3 Alternative Encodings
3.4.2 Dimensioning
3.4.2.1 Range
3.4.2.2 Precision
3.5 Customizing Memory Access
3.5.1 Memory Layout and Access Patterns
3.5.2 On-Chip Memory
3.5.3 Off-Chip Memory
3.6 Summary
4: Biomedical Image Processing and Reconstruction
4.1 Localization Microscopy
4.1.1 History
4.1.2 Physical Principles
4.1.3 Localization Algorithms
4.1.4 Background Removal
4.1.5 Spot Detection
4.1.6 Feature Extraction
4.1.7 Super-Resolution Image Generation
4.1.8 State of the Art
4.1.9 Analysis of the Algorithm
4.1.9.1 Methods
4.1.9.2 Dataflow
4.1.9.3 Dimensioning of the Hardware
4.1.10 Implementation
4.1.10.1 Host Code
4.1.10.2 Background Removal
4.1.10.3 Spot Detection
4.1.10.4 Spot Separation
4.1.10.5 Feature Extraction
4.1.10.6 Visualization
4.1.11 Results
4.1.11.1 Accuracy
4.1.11.2 Throughput
4.1.11.3 Resource Usage
4.1.12 Discussion
4.2 3D Electron Tomography
4.2.1 Reconstruction Algorithms
4.2.2 State of the Art
4.2.3 Analysis of the Algorithm
4.2.3.1 Modifications
4.2.3.2 Dataflow
4.2.3.3 Dimensioning of the Hardware
4.2.4 Implementation
4.2.4.1 Scheduling
4.2.4.2 External DRAM
4.2.4.3 Ray–Box Intersection
4.2.4.4 Projection Accumulator
4.2.4.5 Residues Storage
4.2.4.6 Multi-Piping
4.2.5 Results
4.2.5.1 Accuracy
4.2.5.2 Throughput
4.2.5.3 Resource Usage
4.2.6 Discussion
5: Conclusion
5.1 Portability
5.2 High-Level Development
5.3 Acceleration
5.4 Outlook
References
Index
About the Authors