European Language Grid: A Language Technology Platform for Multilingual Europe

This open access book provides an in-depth description of the EU project European Language Grid (ELG). Its motivation lies in the fact that Europe is a multilingual society with 24 official European Union Member State languages and dozens of additional languages, including regional and minority languages. The only meaningful way to enable multilingualism and to benefit from this rich linguistic heritage is through Language Technologies (LT), including Natural Language Processing (NLP), Natural Language Understanding (NLU), Speech Technologies and language-centric Artificial Intelligence (AI) applications.
The European Language Grid provides a single umbrella platform for the European LT community, including research and industry, effectively functioning as a virtual home, marketplace, showroom, and deployment centre for all services, tools, resources, products and organisations active in the field. Today the ELG cloud platform already offers access to more than 13,000 language processing tools and language resources. It enables all stakeholders to deposit, upload and deploy their technologies and datasets. The platform also supports the long-term objective of establishing digital language equality in Europe by 2030 – to create a situation in which all European languages enjoy equal technological support. 
This is the very first book dedicated to Language Technology and NLP platforms. Cloud technology has only recently matured enough to make the development of a platform like ELG feasible on a larger scale. The book comprehensively describes the results of the ELG project. Following an introduction, the content is divided into four main parts: (I) ELG Cloud Platform; (II) ELG Inventory of Technologies and Resources; (III) ELG Community and Initiative; and (IV) ELG Open Calls and Pilot Projects.

Author(s): Georg Rehm
Series: Cognitive Technologies
Publisher: Springer
Year: 2022

Language: English
Pages: 379
City: Cham

Foreword
Preface
Contents
List of Contributors
European Language Grid EU Project (Parts I, II and III)
European Language Grid FSTP Pilot Projects (Part IV)
Acronyms
Chapter 1 European Language Grid: Introduction
1 Overview and Context
2 The European Language Grid EU Project
3 Beyond the ELG EU Project
4 Summary of this Book
4.1 Part I: ELG Cloud Platform
4.2 Part II: ELG Inventory of Technologies and Resources
4.3 Part III: ELG Community and Initiative
4.4 Part IV: ELG Open Calls and Pilot Projects
References
Part I ELG Cloud Platform
Chapter 2 The European Language Grid Platform: Basic Concepts
1 Introduction
2 Overview of the ELG Platform
2.1 Catalogue
2.2 Repository of Language Resources and Technologies
2.3 Running Language Technology Cloud Services
3 User Types and User Model
4 Architecture
5 Catalogue Contents and Metadata Model
6 Publication Life Cycle
7 ELG and the FAIR Principles
Findability principles
Accessibility principles
Interoperability principles
Re-usability principles
8 Related Platforms and Infrastructures
9 Conclusions
References
Appendix
Chapter 3 Using the European Language Grid as a Consumer
1 Introduction
2 Web-based Interface
2.1 Viewing the Catalogue
2.2 Searching the Catalogue
2.2.1 Free Text Search
2.2.2 Faceted Search
2.3 Viewing Metadata Records and Resources
2.4 Consumer’s Grid
2.5 Try out UIs for Language Technology Services
3 Public REST APIs
3.1 Accessing and Using the Catalogue
3.2 Downloading a Resource
3.3 Language Technology Service Public API
4 Python SDK for Users
4.1 Browsing the Catalogue
4.2 Downloading a Resource
4.3 Obtaining an Access Token
4.4 Calling Language Technology Services
5 User Authentication
6 Licensing and Billing
7 Consumer-Related Functionalities in ELG and other Platforms
7.1 Catalogue and Repository Functionalities
7.2 Language Technology Service Execution
8 Conclusions
References
Chapter 4 Contributing to the European Language Grid as a Provider
1 Introduction
2 Adding Resources to the ELG Platform
2.1 Creating Metadata Records
2.1.1 Creation and Upload of Metadata Files
2.1.2 Metadata Editor
2.2 Uploading and Managing Data Files
2.3 Managing Catalogue Entries
3 Validating and Publishing Metadata Records
4 Entity-Type Specific Requirements
4.1 ELG-compatible Services
4.1.1 Internal LT API Specification
4.1.2 Helper Services
4.1.3 Integration Requirements and Options
4.1.4 Creation of Docker Images
4.1.5 Helper Libraries for Java
4.1.6 Helper Tools for Python
4.1.7 Metadata Requirements
4.1.8 Technical Validation and Registration of ELG-Compatible Services
4.1.9 Custom Try Out Interface
4.2 ELG-hosted Resources
4.2.1 Requirements for ELG-hosted Resources
4.2.2 Packaging Data and Splitting Metadata Records: Recommendations
4.3 Metadata Records for External LRTs, Organisations and Projects
5 Provider-Related Functionalities in ELG and other Platforms
5.1 Metadata Requirements
5.2 Provider User Interface and Metadata User Interface
5.3 Try Out User Interface
5.4 Helper Tools for Packaging Resources
5.5 Packaging Data Resources
6 Conclusions
References
Chapter 5 Cloud Infrastructure of the European Language Grid
1 Introduction
2 Cloud Infrastructure
2.1 Kubernetes and Cloud Native
2.2 Storage
2.3 Software Repositories
2.4 Container Registries
3 Installation
3.1 ELG Charts
3.2 Third-Party Charts
4 Scalability of LT Tools and Services
4.1 Implementation
4.2 Use Cases
5 Conclusions
References
Chapter 6 Interoperable Metadata Bridges to the wider Language Technology Ecosystem
1 Introduction
2 Approach
3 Establishing Interoperable Connections: Four Use Cases
3.1 Use Case 1: OAI-PMH (CLARIN Nodes and ELRC-SHARE)
3.2 Use Case 2: Custom API and Proprietary Schema (Hugging Face)
3.3 Use Case 3: General Catalogues and Standard Schemas (Zenodo)
3.4 Use Case 4: Collaborative Community Initiatives (ELE, ELG)
3.5 Summary of Use Cases
4 Implementing Metadata Interoperability
4.1 ELG Metadata Schema – Relaxed Version
4.2 Publication Policies for Imported Metadata Records
5 Interoperability across Repositories
5.1 Technical Interoperability across Repositories
5.2 Semantic Interoperability across Repositories
5.3 Minimal Metadata Requirements
5.4 Duplicate Resources
6 Conclusions
References
Part II ELG Inventory of Technologies and Resources
Chapter 7 Language Technology Tools and Services
1 Introduction
2 Machine Translation
3 Automatic Speech Recognition
3.1 Case Study: Speech Tools from HENSOLDT
4 Text Analytics
4.1 Case Study: Cogito Discover from Expert.AI
4.2 Case Study: GATE from University of Sheffield
4.3 Case Study: Microservices At Your Service
5 Other Service Types
5.1 Pilot Project: Terminological Concept Systems from Natural Language Text from University of Vienna
5.2 Pilot Project: MKS as Linguistic Linked Open Data from Coreon
6 Conclusions
References
Chapter 8 Datasets, Corpora and other Language Resources
1 Introduction
2 Identification of Language Resources and Repositories
2.1 Identification by the Consortium
2.2 Identification by the National Competence Centres
2.3 Collaboratively Filling the Gaps
2.3.1 Contributions from the ELG Pilot Projects
2.3.2 Contributions from the European Language Equality Project
2.3.3 Platform Users
3 Integrating Repositories into ELG
3.1 Priorities in the Ingestion Work
3.2 Contributing Language Resources
4 Procedures to Ingest Language Resources
4.1 Metadata Conversion
4.1.1 From ELRA Catalogue to ELG
4.1.2 From META-SHARE to ELG
4.1.3 From ELRC-SHARE to ELG
4.1.4 Import into ELG
4.2 Metadata Extraction and Completion
4.2.1 Zenodo
4.2.2 ELRA-SHARE-LRs
4.2.3 Quantum Stat
4.2.4 Hugging Face
4.3 Metadata Harvesting
4.3.1 ELRC-SHARE
4.3.2 LINDAT/CLARIAH-CZ
4.3.3 CLARIN-PL and CLARIN-SI
4.3.4 Zenodo
5 Language Resources in the ELG Catalogue
6 Language Resources and Legal Issues
7 Language Resources and Data Management
8 Conclusions
References
Chapter 9 Language Technology Companies, Research Organisations and Projects
1 Introduction
2 The European Language Technology Landscape
3 Organisations in the European Language Grid
3.1 Collecting the Members of the European LT Community
3.2 Preparation and Integration of Metadata Records
3.3 Claiming and Enriching Organisation Pages
3.4 Organisation Pages in the European Language Grid
4 Projects in the European Language Grid
5 Conclusions
References
Part III ELG Community and Initiative
Chapter 10 European Language Technology Landscape: Communication and Collaborations
1 Introduction
2 Stakeholders of the European Language Grid
2.1 Language Technology Providers
2.1.1 Participants in the Open Calls – Pilot Projects
2.2 Language Technology Users
2.2.1 Public Administrations and NGOs
2.2.2 European Citizens – Members of the European Language Communities
2.3 Additional Horizon 2020 EU Projects
2.4 Major European Projects and Initiatives
2.5 National Competence Centres
2.6 Public at Large
3 Communication and Outreach Activities
3.1 Communication Strategy
3.2 Communication Campaign
3.2.1 Communication Objectives
3.2.2 Communication Channels
4 Collaborations with other Projects and Initiatives
5 Conclusions
References
Chapter 11 ELG National Competence Centres and Events
1 Introduction
2 National Competence Centres
2.1 Tasks and Responsibilities
2.2 Role and Structure
2.3 Visibility and Promotion
2.4 Operational Aspects
3 Conferences and Workshops
3.1 META-FORUM Conference Series
3.1.1 META-FORUM 2019
3.1.2 META-FORUM 2020
3.1.3 META-FORUM 2021
3.1.4 META-FORUM 2022
3.2 ELG Workshops
3.3 Additional Conferences
4 Conclusions
References
Chapter 12 Innovation and Marketplace: A Vision for the European Language Grid
1 Introduction
2 Innovation
2.1 Significance of Innovation
2.2 Types of Innovation and Innovation Strategies
2.3 Open Innovation in the ELG Platform and Marketplace
2.3.1 Products
2.3.2 Services
2.3.3 Further Aspects of Innovation
3 Multi-sided Marketplace Approach
3.1 Foundations for a Successful Marketplace
3.2 ELG Ecosystem of Participants
3.3 Technical and Practical Aspects
4 Conclusions
References
Chapter 13 Sustaining the European Language Grid: Towards the ELG Legal Entity
1 Introduction
2 Long-term Vision and Mission of ELG
2.1 Mission of the European Language Grid
2.2 Added Value for Stakeholders
3 Main Pillars of the Business and Operational Model
3.1 Expectations by the ELG Consortium’s SME Partners
3.2 Key Aspects of the ELG Legal Entity
3.3 Assessment of Operational Costs
3.4 Business Model Canvas
3.5 Product Portfolio and Revenue Streams
3.5.1 Product Category: Marketplace
3.5.2 Product Category: Consulting
3.5.3 Product Category: ELG APIs
3.5.4 Product Category: LT-as-a-Service
3.5.5 Product Category: Data-as-a-Service
Data-as-a-Service (for academic users)
3.5.6 Product Category: Repository-as-a-Service, Platform-as-a-Service
3.5.7 Product Category: Events
3.5.8 Product Category: Marketing and Advertisements
3.5.9 Miscellaneous
ELG Use Cases as Show Cases
3.5.10 Summary and Assessment
3.6 Legal Entity Type
4 Summary and Next Steps
References
Part IV ELG Open Calls and Pilot Projects
Chapter 14 Open Calls and Pilot Projects
1 Introduction
2 Organisation of the Open Calls
2.1 Management Structure and Organisation
2.1.1 Pilot Board
2.1.2 External Evaluators
2.1.3 Management Team
2.1.4 Technical Team
2.2 Timeline
2.3 Communication with Stakeholders
2.4 Submission Process
2.5 Evaluation Process
2.5.1 Preparation of the Evaluation Process
2.5.2 Execution of the Proposal Evaluation Process
3 Results
3.1 Open Call 1
3.1.1 Overview
3.1.2 Selected Projects
3.1.3 Feedback provided and Survey for Proposers
3.2 Open Call 2
3.2.1 Changes made between Open Call 1 and Open Call 2
3.2.2 Overview
3.2.3 Selected Projects
3.2.4 Survey for Proposers to the Open Call 2
4 Pilot Project Execution
5 Conclusions
References
Chapter 15 Basque-speaking Smart Speaker based on Mycroft AI
1 Overview and Objectives of the Pilot Project
2 Mycroft Localisation
3 Privacy, Gender and Proximity
4 Developments in Basque Speech Technology
4.1 ASR Robustness in Noisy Environments
4.2 ASR Closed Grammar-based Recognition
4.3 Neural Network-based Basque TTS
4.4 Gender-neutral Voice
5 Conclusions and Results of the Pilot Project
References
Chapter 16 CEFR Labelling and Assessment Services
1 Overview and Objectives of the Pilot Project
2 Methodology
3 Implementation
4 Evaluation
5 Conclusions and Results of the Pilot Project
References
Chapter 17 European Clinical Case Corpus
1 Overview and Objectives of the Pilot Project
2 Corpus Collection and Annotation
3 Implementation
4 Evaluation
5 Conclusions and Results of the Pilot Project
References
Chapter 18 Extracting Terminological Concept Systems from Natural Language Text
1 Overview and Objectives
2 Methodology
2.1 Preprocessing
2.2 Term Extraction
2.3 Relation Extraction
2.4 Postprocessing
3 Evaluation
4 Conclusions and Results of the Pilot Project
References
Chapter 19 Italian EVALITA Benchmark Linguistic Resources, NLP Services and Tools
1 Overview and Objectives of the Pilot Project
2 Methodology
2.1 Surveying the EVALITA Tasks
2.2 The EVALITA Knowledge Graph
2.3 Anonymisation of Resources
2.4 Release of Data and Models through ELG
3 Conclusions and Results of the Pilot Project
References
Chapter 20 Lingsoft Solutions as Distributable Containers
1 Overview and Objectives of the Pilot Project
2 Methodology
3 Implementation
4 Evaluation
5 Conclusions and Results of the Pilot Project
References
Chapter 21 Motion Capture 3D Sign Language Resources
1 Overview and Objectives of the Pilot Project
2 Methodology and Experiment
2.1 Recording Setup
2.2 Data Annotation
2.3 Data Post-processing
2.4 Dataset Parameters
3 Conclusions and Results of the Pilot Project
References
Chapter 22 Multilingual Image Corpus
1 Overview and Objectives of the Pilot Project
2 Methodology
2.1 Ontology of Visual Objects
2.2 Collection of Images and Metadata
3 Criteria for the Selection of Images
3.1 Generation and Evaluation of Suggestions
3.2 Annotation Protocol
4 Multilingual Classes
5 Conclusions and Results of the Pilot Project
References
Chapter 23 Multilingual Knowledge Systems as Linguistic Linked Open Data
1 Overview and Objectives of the Pilot Project
2 Making Coreon Data Structure LLOD-compatible
3 Real-Time Data Access via a SPARQL Endpoint
4 Conclusions and Results of the Pilot Project
References
Chapter 24 Open Translation Models, Tools and Services
1 Overview and Objectives of the Pilot Project
2 Increasing Language Coverage
3 Conclusions and Results of the Pilot Project
References
Chapter 25 Sign Language Explanations for Terms in a Text
1 Overview and Objectives of the Pilot Project
2 Methodology
3 Implementation
4 Evaluation
5 Conclusions and Results of the Pilot Project
References
Chapter 26 Streaming Language Processing in Manufacturing
1 Overview and Objectives of the Pilot Project
2 Graphical, Flow-based Modeling with Apache StreamPipes
3 Architecture
4 Implementation
5 Conclusions and Results of the Pilot Project
References
Chapter 27 Textual Paraphrase Dataset for Deep Language Modelling
1 Overview and Objectives of the Pilot Project
2 Methodology
3 Implementation
4 Evaluation
5 Conclusions and Results of the Pilot Project
References
Chapter 28 Universal Semantic Annotator
1 Overview and Objectives of the Pilot Project
2 Methodology
3 Implementation
4 Evaluation
5 Conclusions and Results of the Pilot Project
References
Chapter 29 Virtual Personal Assistant Prototype YouTwinDi
1 Overview and Objectives of the Pilot Project
2 Methodology
2.1 Use Case 1: Automated Translation of local News
2.2 Use Case 2: Secure Communication between Virtual Assistants
3 Implementation
4 Conclusions and Results of the Pilot Project
References