The principal aim of this research is to describe the extent to which formal
models for linguistic data structuring are crucial in Natural Language
Processing (NLP) applications. In this sense, we will pay particular attention to those Knowledge Management Systems (KMS) which are
designed for the Internet, and also to the enhanced solutions they may
require. In order to deal appropriately with these topics, we will describe
how to build computational linguistics applications that help humans establish and maintain an advantageous relationship with
technologies, especially those that are based on or
produce man-machine interactions in natural language.
We will explore the positive relationship which may exist between
well-structured Linguistic Resources (LR) and KMS, and argue
that if the information architecture of a KMS is based on the formalization of linguistic data, then the system works better and is more consistent.
As for the topics we want to deal with, it must first be stated
that, in order to build efficient and effective Information Retrieval (IR) tools, understanding and formalizing the combinatory mechanisms of natural language seems to be the first step to take, not least
because any piece of information produced by humans on the Internet is
necessarily a linguistic act. Therefore, in this research work we will also
discuss the NLP structuring of a Hybrid Model of linguistic formalization,
which we hope will prove to be a useful tool to support, improve and
refine KMSs.
Exploring Formal Models of Linguistic Data Structuring
More specifically, in section 1 we will describe how to structure language resources that can be implemented inside KMSs, to what extent they can
improve the performance of these systems, and how the problem of linguistic data structuring is dealt with by natural language formalization
methods.
In section 2 we will proceed with a brief review of computational
linguistics, paying particular attention to specific software packages
such as Intex, Unitex, NooJ, and Cataloga, which are developed according to the Lexicon-Grammar (LG) method, a linguistic theory established
in the 1960s by Maurice Gross.
In section 3 we will describe some specific works useful to monitor
the state of the art in Linguistic Data Structuring Models, Enhanced
Solutions for KMSs, and NLP Applications for KMSs.
In section 4 we will address problems related to natural language
formalization methods, describing mainly Transformational-Generative Grammar (TGG) and LG, plus other methods based on statistical
approaches and ontologies.
In section 5 we will propose a Hybrid Model usable in NLP applications in order to create effective enhanced solutions for KMSs. Specific
features and elements of our hybrid model will be shown through some
results of experimental research work. The case study we will present
concerns a very complex NLP problem that has been little explored in recent years, namely
the treatment of Multi Word Units (MWUs).
In section 6 we will conclude by evaluating our results and presenting possible perspectives for future work.
Keywords
Knowledge Management System, Natural Language Processing, Linguistic Formal Model, Hybrid Formal Model.
Author(s): Federica Marano
Series: X Ciclo – Nuova Serie 2008-2011
Publisher: Università degli Studi di Salerno
Year: 2011
Language: English
Pages: 168
City: Salerno
Tags: Knowledge Management System; Natural Language Processing; NLP; Linguistic Formal Model; Hybrid Formal Model
Foreword 13
Introduction 17
The Relationship between Linguistic Resources and Knowledge Management Systems 27
1 Well-structured Linguistic Resources for effective Knowledge Management Systems 27
The Point of View of Computational Linguistics 39
2 A Brief Review of Computational Linguistics 39
2.1 A Short Survey on Some Main Computational Linguistics Subfields 46
2.2 Lexicon-Grammar, a Frame for Computational Linguistics 49
2.3 Lexicon-Grammar: Resources, Tools and Software for Computational Linguistics 55
A State of the Art 67
3 Natural Language Formalization 67
3.1 Models of Linguistic Data Structuring 68
3.1.1 PAULA XML: Interchange Format for Linguistic Annotations 68
3.1.2 EXMARaLDA 70
3.1.3 TUSNELDA 71
3.2 Enhanced Solutions for Knowledge Management Systems 73
3.2.1 Defining Knowledge Management and Knowledge Management System Structure 74
3.2.2 Different Types of Knowledge 77
3.2.3 From Knowledge Management to Enhanced Knowledge Management Systems 80
3.2.4 Knowledge Management Systems 81
3.2.5 KMSs and Data-Driven Decision Support Systems 87
3.3 NLP Applications for Knowledge Management Systems 88
3.3.1 WordNet 89
3.3.2 FrameNet 91
3.3.3 KIM 94
Formal Models for Linguistic Data 97
4 The Question of Linguistic Data Structuring Formal Models 97
4.1 “On the Failure of Generative Grammar” 102
4.2 Lexicon-Grammar: a Theoretical and Methodological Challenge in the Formal Modelling of Linguistic Data 106
4.3 Statistical Models: Faster Methods of Data Processing 110
4.3.1 Statistical Analysis Tools and Procedures 112
4.4 Ontology-Based Models: a Survey on Classification Tools 116
Hybrid Model of Linguistic Formalization for Knowledge Management 121
5 Hybrid Model of NLP 121
5.1 Linguistic Pre-processing of Data for NLP Applications 126
5.1.1 Linguistic Resources and Tools in Translation Processes 134
Discussions and Conclusions 147
References 151