This first comprehensive treatment of algorithmic differentiation (AD) describes all chain-rule-based techniques for evaluating derivatives of composite functions, with particular emphasis on the reverse, or adjoint, mode. The corresponding complexity analysis shows that gradients are always relatively cheap, while the cost of evaluating Jacobian and Hessian matrices depends strongly on problem structure and its efficient exploitation. Attempts to minimize the operation count and/or memory requirement lead to hard combinatorial optimization problems in the case of Jacobians, and to a well-defined trade-off curve between spatial and temporal complexity for gradient evaluations.
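For readers new to the adjoint mode, the following minimal Python sketch (illustrative, not code from the book) shows the key idea behind the cheap gradient result: operations are recorded on a tape during the forward evaluation, and a single reverse sweep then propagates adjoints through the chain rule, yielding all partial derivatives at a cost proportional to one function evaluation. The names Var and backward are hypothetical.

    import math

    class Var:
        """Scalar node on a computation tape for reverse-mode AD."""
        def __init__(self, value, parents=()):
            self.value = value        # primal value
            self.parents = parents    # (parent_node, local_partial) pairs
            self.grad = 0.0           # adjoint, accumulated on the reverse sweep

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

    def sin(x):
        return Var(math.sin(x.value), [(x, math.cos(x.value))])

    def backward(output):
        """Reverse sweep: propagate adjoints from the output to all inputs."""
        output.grad = 1.0
        order, seen = [], set()
        def visit(node):                      # topological order of the tape
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(output)
        for node in reversed(order):          # chain rule, adjoint form
            for parent, partial in node.parents:
                parent.grad += partial * node.grad

    # Example: f(x1, x2) = x1 * x2 + sin(x1)
    x1, x2 = Var(2.0), Var(3.0)
    y = x1 * x2 + sin(x1)
    backward(y)
    print(x1.grad)   # df/dx1 = x2 + cos(x1) = 3 + cos(2)
    print(x2.grad)   # df/dx2 = x1 = 2

Note that one reverse sweep delivers both partial derivatives simultaneously, which is why the gradient costs only a small constant multiple of the function evaluation, independent of the number of inputs.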
The book is divided into three parts: a stand-alone introduction to the fundamentals of AD and its software, a thorough treatment of methods for sparse problems, and final chapters on higher derivatives, nonsmooth problems, and program reversal schedules. Each chapter concludes with examples and exercises suitable for students with a basic understanding of differential calculus, procedural programming, and numerical linear algebra.