Making Presentation Math Computable: A Context-Sensitive Approach for Translating LaTeX to Computer Algebra Systems

This open access book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Science, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de facto standard for typesetting mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are therefore insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions. First, it provides an approach to enhance LaTeX expressions with sufficient semantic information for translations into CAS syntaxes. Second, it demonstrates LaCASt, the first context-aware LaTeX to CAS translation framework. Third, it provides a novel approach to evaluating the performance of LaTeX to CAS translations on large-scale datasets through automatic verification of equations in digital mathematical libraries.

This is an open access book.

Author(s): André Greiner-Petter
Edition: 1
Publisher: Springer Vieweg
Year: 2023

Language: English
Pages: 215
City: Wiesbaden
Tags: Open Access; LaTeX; Computer Algebra Systems; Presentational Mathematics; Presentation to Computation Translations; Computable Mathematics; Mathematical Information Retrieval

Contents
List of Figures
List of Tables
Abstract
Zusammenfassung
Acknowledgements
CHAPTER 1
Introduction
1.1 Motivation & Problem
1.2 Research Gap
1.3 Research Objective
1.4 Thesis Outline
1.4.1 Publications
1.4.2 Research Path
CHAPTER 2
Mathematical Information Retrieval
2.1 Background and Overview
2.2 Mathematical Formats and Their Conversions
2.2.1 Web Formats
2.2.1.1 MathML
2.2.1.2 OpenMath
2.2.1.3 OMDoc
2.2.2 Word Processor Formats
2.2.2.1 LaTeX
2.2.2.2 Semantic/Content LaTeX
2.2.2.3 sTeX
2.2.2.4 Template Editors
2.2.3 Computable Formats
2.2.3.1 Computer Algebra Systems
2.2.3.2 Theorem Prover
2.2.4 Images and Tree Representations
2.2.5 Math Embeddings
2.3 From Presentation to Content Languages
2.3.1 Background
2.3.1.1 Related Work
2.3.2 Benchmarking MathML
2.3.2.1 Collection
2.3.2.2 Gold Standard
2.3.2.3 Evaluation Metrics
2.3.3 Evaluation of Context-Agnostic Conversion Tools
2.3.3.1 Tool Selection
2.3.3.2 Testing framework
2.3.3.3 Results
2.3.4 Summary of MathML Converters
2.4 Mathematical Information Retrieval for LaTeX Translations
CHAPTER 3
Semantification of Mathematical LaTeX
3.1 Semantification via Math-Word Embeddings
3.1.1 Foundations and Related Work
3.1.1.1 Word Embedding
3.1.2 Semantic Knowledge Extraction
3.1.2.1 Evaluation of Math-Embedding-Based Knowledge Extraction
3.1.2.2 Improvement by Considering the Context
3.1.2.3 Visualizing Our Model
3.1.3 On Overcoming the Issues of Knowledge Extraction Approaches
3.1.4 The Future of Math Embeddings
3.2 Semantification with Mathematical Objects of Interest
3.2.1 Related Work
3.2.2 Data Preparation
3.2.2.1 Data Wrangling
3.2.2.2 Complexity of Math
3.2.3 Frequency Distributions of Mathematical Formulae
3.2.3.1 Zipf’s Law
3.2.3.2 Analyzing and Comparing Frequencies
3.2.4 Relevance Ranking for Formulae
3.2.5 Applications
3.2.6 Outlook
3.3 Semantification with Textual Context Analysis
3.3.1 Semantification, Translation & Evaluation Pipeline
CHAPTER 4
From LaTeX to Computer Algebra Systems
4.1 Context-Agnostic Neural Machine Translation
4.1.1 Training Datasets & Preprocessing
4.1.2 Methodology
4.1.3 Evaluation of the Convolutional Network
4.1.3.1 Results
4.1.3.2 Qualitative Analysis and Discussion
4.2 Context-Sensitive Translation
4.2.1 Motivation
4.2.2 Related Work
4.2.3 Formal Mathematical Language Translations
4.2.3.1 Example of a Formal Translation
4.2.4 Document Pre-Processing
4.2.5 Annotated Dependency Graph Construction
4.2.6 Semantic Macro Replacement Patterns
4.2.6.1 Common Knowledge Pattern Recognition
CHAPTER 5
Qualitative and Quantitative Evaluations
5.1 Evaluations on the Digital Library of Mathematical Functions
5.1.1 The DLMF dataset
5.1.2 Semantic LaTeX to CAS translation
5.1.2.1 Constraint Handling
5.1.2.2 Parse sums, products, integrals, and limits
5.1.2.3 Lagrange’s notation for differentiation and derivatives
5.1.3 Evaluation of the DLMF using CAS
5.1.3.1 Symbolic Evaluation
5.1.3.2 Numerical Evaluation
5.1.4 Results
5.1.4.1 Error Analysis
5.1.5 Conclude Quantitative Evaluations on the DLMF
5.1.5.1 Future Work
5.2 Evaluations on Wikipedia
5.2.1 Symbolic and Numeric Testing
5.2.2 Benchmark Testing
5.2.3 Results
5.2.3.1 Descriptive Term Extractions
5.2.3.2 Semantification
5.2.3.3 Translations from LaTeX to CAS
5.2.4 Error Analysis & Discussion
5.2.4.1 Defining Equations
5.2.4.2 Missing Information
5.2.4.3 Non-Matching Replacement Patterns
5.2.5 Conclude Qualitative Evaluations on Wikipedia
CHAPTER 6
Conclusion and Future Work
6.1 Summary
6.2 Contributions and Impact of the Thesis
6.3 Future Work
6.3.1 Improved Translation Pipeline
6.3.2 Improve LaTeX to MathML Converters
6.3.3 Enhanced Formulae in Wikipedia
6.3.4 Language Independence
Glossary
Bibliography of Publications, Submissions & Talks
Bibliography