Exploring MathCQA with a Math-aware Search Engine

This thesis details how a math-aware search engine Tangent-L

Author(s): Ng YinKi
Year: 2021

Language: English

List of Figures
List of Tables
Introduction: from MathIR to MathCQA
Background
Mathematical Information Retrieval (MathIR)
Formula Representations
Retrieval Models
Effectiveness Measures
The NTCIR MathIR benchmark
Math Community Question Answering (MathCQA)
The ARQMath Lab Series
Dataset: The MSE Collection and Formula Files
Task 1: The MathCQA Task
Task 2: In-Context Formula Retrieval
Math-aware Search Engines at ARQMath-2
TF-IDF and Tangent-S
Approach0
XY-PHOC-DPRL
MIRMU and MSM
DPRL
TU_DBS
NLP_NITS
PSU
Tangent-L: the Math-aware Search Engine
The Vanilla Version
From Formulas to Math Tokens
Single Retrieval Model with BM25+ Ranking
Incorporating Repeated Symbols
From Repeated Symbols to Repetition Tokens
Revised Ranking Formula
Formula Normalization
Five Classes of Semantic Matches
Limitation
Holistic Formula Search
Formula Retrieval with a Formula Corpus
Retrieval with Holistic Formulas
Addressing the MathCQA Task
Query Conversion: Creating Search Queries from Math Questions
Basic Formula Extraction
Keyword Extraction with "Mathy" Words
Math-aware Retrieval: Searching Indexed Corpus for Best Matches
Different Forms of Retrievals
Parameter Tuning for Tangent-L
Creating Indexing Units
Data Cleansing for Formula Files
Answer Ranking: Finalizing the Ranked Answers
Incorporating CQA Metadata
Ranking by Proximity
Experimental Runs for Best Configuration
Setup for Evaluation
Comparing Generated Search Queries
Comparing Corpora
Core Tangent-L: Fine Tuning and Formula Normalization
Core Tangent-L: Fine Tuning for Individual Queries
Tangent-L Variant: Exploring Holistic Formula Search
Validating Proximity
Validating CQA Metadata
MathDowsers' Submission Runs and Results
Submissions Overview
Strengths and Weaknesses
Addressing In-context Formula Retrieval
Formula-centric: Selecting Visually Matching Formulas
Document-centric: Screening Formulas from Matched Documents
MathDowsers' Submission Runs and Results
User Interface for Data Exploration
The MathDowsers' Browser
Highlighting of Matching Terms
Conclusion and Future Work
References
APPENDICES
The ARQMath Lab Official Results
The MathCQA Task in ARQMath-1
The MathCQA Task in ARQMath-2
In-Context Formula Retrieval in ARQMath-1
Official Result in ARQMath-1
Official Result in ARQMath-1 and Re-evaluation during ARQMath-2
In-Context Formula Retrieval in ARQMath-2
The ARQMath Lab Resources
Manually-selected Keywords and Formulas for ARQMath-1 Topics
Word Lists for Search Queries
Top-50 Most Common Words from the MSE Tags
Top-50 Most Common Words from NTCIR MathIR Wikipedia Article Titles
Optimal values for Individual Topics of Different Dependencies
Conclusions from MathDowsers' Working Notes in the MathCQA Task
ARQMath-1 Submission Runs
ARQMath-2 Submission Runs
Machine Specifications and Efficiency
Machines used for the ARQMath-1 system
Machines used for the ARQMath-2 system
User Interface of the MathDowsers' Browser
The ARQMath Question Panel
The Answers Panel
Interface for inputting a custom answer ranking
Displaying human relevance judgments