Computational Processing of the Portuguese Language: 14th International Conference, PROPOR 2020, Evora, Portugal, March 2–4, 2020, Proceedings (Lecture Notes in Computer Science, 12037)

This book constitutes the proceedings of the 14th International Conference on Computational Processing of the Portuguese Language, PROPOR 2020, held in Evora, Portugal, in March 2020.

The 36 full papers presented together with 5 short papers were carefully reviewed and selected from 70 submissions. They are grouped in topical sections on speech processing; resources and evaluation; natural language processing applications; semantics; natural language processing tasks; and multilinguality.

Author(s): Paulo Quaresma (editor), Renata Vieira (editor), Sandra Aluísio (editor), Helena Moniz (editor), Fernando Batista (editor), Teresa Gonçalves (editor)
Publisher: Springer
Year: 2020

Language: English
Pages: 440

Preface
Organization
Contents
Speech Processing
Towards Automatic Determination of Critical Gestures for European Portuguese Sounds
1 Introduction
2 Methods
3 Results and Discussion
3.1 Discussion
4 Conclusions
References
Comparison of Heterogeneous Feature Sets for Intonation Verification
1 Introduction
2 Related Work
3 Imitation Classification Approach
3.1 Word Segmentation
3.2 Feature Sets
3.3 Dynamic Time Warping
3.4 Classification
4 Experimental Setup
4.1 Data Set
4.2 Cross-Validation Experiments
4.3 Experiments in the Test Set: Results and Discussion
5 Conclusions and Future Work
References
The BioVisualSpeech European Portuguese Sibilants Corpus
1 Introduction
2 Related Work
3 Sibilant Consonants and Sigmatism
4 The Sibilants Corpus
4.1 Isolated Sibilants
4.2 Words with Sibilants
4.3 Word Samples Annotation
5 Conclusion
References
Evaluation of Deep Learning Approaches to Text-to-Speech Systems for European Portuguese
1 Introduction
2 Development of the Base Model
3 Evaluation
3.1 Analysis of Grapheme-to-Phoneme Errors
4 Comparison with Merlin
4.1 Interrogative Sentences
5 Speaker Adaptation Tests
6 Conclusions and Future Work
References
Evaluation and Extensions of an Automatic Speech Therapy Platform
1 Introduction
2 The VITHEA Platform
2.1 Automatic Speech Recognition
2.2 Speech Synthesis and Virtual Character Animation
3 Evaluation of the VITHEA Platform
3.1 Evaluation Description
3.2 Evaluation Results
4 Recent Improvements Motivated by the Evaluation
4.1 Adaptation to Mobile Devices
4.2 Follow-Up Systems and Projects After VITHEA
5 Conclusion
References
Resources and Evaluation
A Dataset for the Evaluation of Lexical Simplification in Portuguese for Children
1 Introduction
2 Related Work
3 Correcting and Complementing Synonyms
4 Ranking Synonyms
5 Conclusions and Future Work
References
Situational Irony in Farcical News Headlines
1 Introduction
2 Computational Modelling of Irony
3 Data Collection and Annotation
3.1 Preliminary Analysis
3.2 Gold-Standard Corpus
3.3 Inter-annotator Agreement
4 Modelling Out-of-Domain Contrast
5 Experimental Setup
6 Results and Discussion
7 Main Conclusions and Future Work
References
Inferring the Source of Official Texts: Can SVM Beat ULMFiT?
1 Introduction
2 The Dataset
3 The Models
3.1 Preprocessing
3.2 Baseline
3.3 Transfer Learning
4 Experiments
4.1 Baseline
4.2 Transfer Learning
5 Results
5.1 Ablation Analysis
6 Conclusion
References
Native Language Identification on L2 Portuguese
1 Introduction
2 Related Work
3 Data and Method
3.1 Data
3.2 Classification Models and Evaluation
3.3 Features
4 Results
4.1 Individual Feature Types
4.2 Ensemble Models
5 Conclusion and Future Work
References
Aligning IATE Criminal Terminology to SUMO
1 Introduction
2 Background
2.1 IATE Criminal Domain
2.2 Top Ontologies
2.3 WordNet and Its Alignments to Top Ontologies
3 Related Work
4 Matching Approach
4.1 Pre-processing
4.2 Synset Disambiguation
4.3 Identification of Correspondences to SUMO
5 Experiments
5.1 Results and Discussion
6 Concluding Remarks and Future Work
References
The Construction of a Corpus from the Brazilian Historical-Biographical Dictionary
1 Introduction
2 DHBB
3 Universal Dependencies for Portuguese
4 Text Segmentation
5 Part-of-Speech Tagging
6 Conclusion
References
Natural Language Processing Applications
Making the Most of Synthetic Parallel Texts: Portuguese-Chinese Neural Machine Translation Enhanced with Back-Translation
1 Introduction
2 Related Work
3 NMT Architecture
4 Experimental Setup
4.1 Seed Corpus and MT System
4.2 Experiments
5 Results
6 Discussion and Conclusions
References
Leveraging on Semantic Textual Similarity for Developing a Portuguese Dialogue System
1 Introduction
2 Related Work
3 Training a Model for Portuguese STS
4 Integration in a Dialogue System
4.1 Question Variations
4.2 Evaluation Results
4.3 Error Analysis
5 Out-of-Domain Interactions
5.1 Identifying Out-of-Domain Interactions
5.2 Answering Out-of-Domain Interactions
6 Conclusion
A Example Conversation
References
Fake News Detection on Fake.Br Using Hierarchical Attention Networks
1 Introduction
2 Related Work
3 Dataset and Methods
3.1 Fake.Br Corpus
3.2 Hierarchical Attention Network
3.3 Model Configuration
4 Results
5 Conclusion
References
Screening of Email Box in Portuguese with SVM at Banco do Brasil
1 Introduction
2 Related Works
3 The Application
3.1 Corpus Construction
3.2 Preprocessing, Architecture and Results
4 Discussion
5 Conclusions
References
Back to the Feature, in Entailment Detection and Similarity Measurement for Portuguese
1 Introduction
2 Related Work
3 Modelling Equivalence for Portuguese Sentences
3.1 BERT Embeddings
3.2 Lexical Features
3.3 Non Deep Learning Methods
4 Experimental Setup
4.1 The SICK-BR Corpus
4.2 Machine Learning Choices
4.3 Evaluation Metrics
5 Results and Discussion
6 Conclusion
References
Emoji Prediction for Portuguese
1 Introduction
2 Related Work
3 Methodology
3.1 Data
3.2 Classifiers
4 Results
4.1 Scenario 1: Prediction of Five Emojis
4.2 Scenario 2: Prediction of Ten Emojis
5 Final Notes
References
Vitality Analysis of the Linguistic Atlas of Brazil on Twitter
1 Introduction
2 Materials
2.1 Twitter Corpus
2.2 Lexicon Analysis
2.3 Semantic Analysis
3 Our Methods
3.1 Lexicon Method
3.2 Semantic Method
4 Experiments and Results
4.1 Lexicon Experiments
4.2 Semantic Experiments
5 Results
6 Related Works
7 Discussions and Challenges
8 Conclusion
References
Fact-Checking for Portuguese: Knowledge Graph and Google Search-Based Methods
1 Introduction
2 Related Work
3 Fact-Checking Approaches
3.1 Wikipedia's Knowledge Graph (WKG)
3.2 Proposal with Google's Search Results (GSR)
4 Experiments and Results
4.1 The WKG Method
4.2 The GSR Method
5 Final Remarks
References
Extraction and Use of Structured and Unstructured Features for the Recommendation of Urban Resources
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Data Collection
3.2 Data Analysis
3.3 Recommendation of Urban Resources
3.4 Evaluation
4 Extraction and Use of Indicators
5 Results and Discussion
6 Final Remarks
References
Semantics
A Portuguese Dataset for Evaluation of Semantic Question Answering
1 Introduction
2 Adopted Methodology
2.1 Translation of Question Sentences
2.2 Selection of Keywords
2.3 Adaptation of QALD-7 Queries and Responses to DBPedia PT
2.4 Necessary Adaptations
3 Corpus Application Possibilities
4 Conclusions
References
Exploring the Potentiality of Semantic Features for Paraphrase Detection
1 Introduction
2 Related Work
3 The Corpus
4 Paraphrase Identification Method
5 Experiments and Results
6 Final Remarks
References
Portuguese Language Models and Word Embeddings: Evaluating on Semantic Similarity Tasks
1 Introduction
2 Related Work
3 Word Representations
3.1 Static Word Representations
3.2 Contextualized Word Representations
4 Experiments and Results
4.1 Evaluation
5 Conclusion and Future Work
References
Relation Extraction for Competitive Intelligence
1 Introduction
2 Related Work
3 RelP++: A Relation Extraction Framework
4 Evaluation on a Real Scenario
4.1 Results and Discussion
4.2 Data Visualization
5 Conclusion
References
Natural Language Processing Tasks
Evaluating Methods of Different Paradigms for Subjectivity Classification in Portuguese
1 Introduction
2 Related Work
3 The Corpora
4 The Methods
4.1 Lexicon-Based Method
4.2 Graph-Based Method
4.3 Machine Learning-Based Methods
5 Results and Discussion
6 Final Remarks
References
Sentence Compression for Portuguese
1 Introduction
2 Related Work
3 The Datasets
4 Our Methods
5 Experimental Setup and Results
6 Final Remarks
References
An Investigation of Pre-trained Embeddings in Dependency Parsing
1 Introduction
2 Dependency Parsing Strategies
3 Word Embeddings
4 Method
4.1 Corpus
4.2 Baseline: UDPipe
4.3 Parser: jPTDP
5 Results
6 Concluding Remarks
References
Segmentation of Words Written in the Latin Alphabet: A Systematic Review
1 Introduction
2 Review Protocol
2.1 Research Questions
2.2 Search Strategy
2.3 Inclusion/Exclusion Criteria
2.4 Quality Assessment (QA)
3 Conducting the Review
4 SLR Results
4.1 RQ.1: What Are the Differences in WS Methods in Specific Contexts?
4.2 RQ.2: Which Technique Performed Best in Specific Contexts?
4.3 RQ.3: What Is the State of the Art in WS in the Portuguese Language Context (PL)?
5 Discussion and Conclusions
References
A Deep Learning Model of Common Sense Knowledge for Augmenting Natural Language Processing Tasks in Portuguese Language
1 Introduction
2 Background Knowledge
2.1 World Knowledge in NLU Systems
2.2 Common Sense Knowledge Bases
3 A Deep Learning Model of Common Sense Knowledge
4 Experimental Evaluation
4.1 Target Application A – Stance Classification
4.2 Target Application B – Chatbot in the Portuguese Language
5 Conclusion
References
Linguistic Analysis Model for Monitoring User Reaction on Satirical News for Brazilian Portuguese
1 Introduction
2 Data
3 Method and Analysis
3.1 Categories
3.2 Linguistic Features
4 Results and Discussion
5 Conclusions
References
Multilinguality
Word Embeddings at Post-Editing
1 Introduction
2 Related Work
3 Bilingual Word Embeddings
4 Proposed Approach
4.1 Suggestion Generation at WE@PE
5 Experiments and Results
5.1 Intrinsic Evaluation
5.2 Extrinsic Evaluation
6 Conclusions and Future Work
References
Argument Identification in a Language Without Labeled Data
1 Introduction
2 Related Work
3 Experiments
3.1 First Experiment: Machine Translation
3.2 Second Experiment: Learning Transfer
3.3 Third Experiment: Data Augmentation
4 Conclusions
References
Natural Language Inference for Portuguese Using BERT and Multilingual Information
1 Introduction
2 Related Work
3 ASSIN Dataset
4 BERT
5 Experiments
5.1 Fine Tuning BERT on the ASSIN Corpus
5.2 Balancing the ASSIN Corpus
5.3 Multilingual Data Augmentation on the ASSIN Corpus
6 Results and Discussion
7 Conclusions and Future Works
References
Exploiting Siamese Neural Networks on Short Text Similarity Tasks for Multiple Domains and Languages
1 Introduction
2 Related Work
2.1 Short Text Similarity Shared Tasks
2.2 Siamese Neural Networks
3 Methods
3.1 Datasets
3.2 Model Architecture and Features
3.3 Experimental Setup
4 Results
5 Discussion
6 Conclusion
References
CrossOIE: Cross-Lingual Classifier for Open Information Extraction
1 Introduction
2 Related Work
3 CrossOIE
3.1 Problem Definition
3.2 Cross-Lingual Contextual Embedding
3.3 Model Architecture
4 Experiments
4.1 Dataset
4.2 Experimental Setup
4.3 Evaluation and Results
4.4 Discussion
5 Conclusion and Future Work
References
One Book, Two Language Varieties
1 Introduction
2 Related Work
3 Parallel Corpus
4 Methodology
5 Types of Linguistic Phenomena in EP–BP Alignments
6 Quantitative Results
7 Conclusions
References
Short Papers
Speech Breathing and Expressivity: An Experimental Study in Reading and Singing Styles
1 Introduction
2 Methodology
2.1 Corpus
2.2 Subjects
2.3 The Production and Perception Tasks Performed by the Subjects
2.4 Data Recording
2.5 Analytical Procedures
3 Results
4 Discussion
5 Conclusion
References
Exploring Portuguese Word Embeddings for Discovering Lexical-Semantic Relations
1 Introduction
2 Related Work
3 Data and Tools
4 Experimentation
5 Conclusion
References
The ASSIN 2 Shared Task: A Quick Overview
1 Introduction
2 Task
3 Participants and Results
4 Conclusions
References
A Multiplayer Voice-Enabled Game Platform for the Elderly
1 Introduction
2 Multiplayer Quiz Game
2.1 Game Design
2.2 System Architecture
2.3 Speech and Language Technologies
3 Preliminary Results
3.1 User Satisfaction Surveys
3.2 Speech Recognition Results
4 Conclusions and Future Work
References
Towards a Conversational Agent with "Character"
1 Introduction
2 Datasets
3 Model
3.1 Generative Model
3.2 The Retrieval-Based Module
4 Evaluation
5 Conclusions and Future Work
References
Author Index