Data Mining: 19th Australasian Conference on Data Mining, AusDM 2021, Brisbane, QLD, Australia, December 14-15, 2021, Proceedings (Communications in Computer and Information Science)





This book constitutes the refereed proceedings of the 19th Australasian Conference on Data Mining, AusDM 2021, held in Brisbane, Queensland, Australia, in December 2021.*
The 16 revised full papers presented were carefully reviewed and selected from 32 submissions. The papers are organized in two sections: a research track and an application track.

*Due to the COVID-19 pandemic, the conference was held online.

Author(s): Yue Xu (editor), Rosalind Wang (editor), Anton Lord (editor), Yee Ling Boo (editor), Richi Nayak (editor), Yanchang Zhao (editor), Graham Williams (editor)
Publisher: Springer
Year: 2021

Language: English
Pages: 252

Preface
Organization
Contents
Research Track
Parallel Nonlinear Dimensionality Reduction Using GPU Acceleration
1 Introduction
2 GPU-RAPIDS Platform for UMAP
2.1 GPU-RAPIDS Platform
2.2 Constructing k-NN Graph: UMAP-Learn vs CuML-UMAP
2.3 Get the Nearest Neighbours
2.4 k-NN Graph Optimisation on GPU
2.5 Embedding Optimisation
3 Benchmark Experimental Comparison
3.1 F-MNIST Data Set
3.2 MNIST Digits Data Set
4 A Case Study on Leukemia Diagnostic Data Analysis
5 Conclusion
References
Taking the Confusion Out of Multinomial Confusion Matrices and Imbalanced Classes
1 Introduction
2 Prior Work on Making Sense of Confusion Matrices
3 Factoring the Confusion Matrix Using Class Odds
4 Application and Demonstration
4.1 Eddy's Probabilistic Reasoning Challenge (2 Classes)
4.2 Cancer of Unknown Primary—CUP (17 Classes)
4.3 HAndwritten SYmbols—HASY (379 Classes)
5 Discussion and Conclusions
References
Sharpshooting Most Beneficial Part of AUC for Detecting Malicious Logs
1 Introduction
2 Definition of AUC and pAUC
3 Problem
4 PUMA
4.1 Exact Definition of pAUC
4.2 Formulation
4.3 Training
5 Datasets
6 Evaluation
6.1 Experimental Conditions
6.2 Feature Extraction
6.3 Methods Evaluated
6.4 Experimental Results
7 Related Work
8 Conclusion
References
A Drift Aware Hierarchical Test Based Approach for Combating Social Spammers in Online Social Networks
1 Introduction
2 Background and Related Work
3 An Approach of Spam Drift Detection Based on User Behavior Changes
3.1 Hierarchical Test-Based Approach for Drift Detection
3.2 Drift Detection Based on Kullback-Leibler Divergence
3.3 Drift Detection Based on Feature Similarity Differences
3.4 Validation of Drifted Users Based on Peer Acceptance
3.5 Drift Adaptation with Newly Predicted Labels
4 Experimental Setup
4.1 Datasets and Criteria for Evaluation
4.2 Experiment Results and Evaluation
5 Discussion
6 Conclusion
References
Hospital Readmission Prediction Using Semantic Relations Between Medical Codes
1 Introduction
2 Related Works
3 Proposed Model
3.1 Basic Notation
3.2 Generate Description Features
3.3 Learning Representations with Description Features
3.4 Prediction Model
4 Experiments
4.1 Experimental Setup
4.2 Experimental Results
5 Conclusion
References
HFM++: An Enhanced Holographic Factorization Machine for Recommendation
1 Introduction
2 Preliminary
3 Our HFM++ Model
3.1 Model Architecture
3.2 Model Learning
4 Empirical Evaluation
4.1 Experimental Setup
4.2 Experimental Results
5 Related Work
6 Conclusion
References
Deep Learning for Bias Detection: From Inception to Deployment
1 Introduction
2 Literature Review
3 Language Based Deep Learning Model for Bias Detection
3.1 Neural Network Language Model
3.2 Regularising Classifier with Language Model
4 Empirical Analysis
4.1 Data Collection
4.2 Baseline Models
4.3 Experimental Results: SSW Dataset
4.4 Experimental Results: the iShield.ai Dataset
5 Deployment and Architecture of Integrated System
6 Conclusion
References
Exploring Fusion Strategies in Deep Learning Models for Multi-Modal Classification
1 Introduction
2 Related Work
3 Methodology: Multi-modality Deep Learning Classification
3.1 Feature Extraction
3.2 Fusion Strategy
3.3 Classification
3.4 Complexity Analysis of a Fusion Strategy
4 Experiments
4.1 Datasets
4.2 Baseline Models
4.3 Experimental Set-Up
4.4 Evaluation Criteria
5 Results and Discussion
5.1 Sentiment Analysis
5.2 Hate Speech Detection
5.3 Crisis Event Detection
5.4 Fake News Detection
5.5 Final Remarks
6 Conclusion
References
Application Track
Chameleon: A Python Workflow Toolkit for Feature Selection
1 Introduction
1.1 Feature Selection for Classification
1.2 Python Toolkit for Feature Selection and Classification
1.3 Contribution
2 Methods
2.1 Chameleon Toolkit Structure
2.2 Feature Selection Methods
2.3 Feature Selection with Common Features
2.4 Classification Methods
2.5 Performance Metrics
2.6 Data Sets
2.7 Feature Selection Evaluation
2.8 Software
3 Results and Discussion
3.1 Comparing Toolkit Feature Selection Methods
3.2 Evaluating Common Features Selection Method
4 Conclusion
References
PostMatch: A Framework for Efficient Address Matching
1 Introduction
2 Related Work
3 The PostMatch Framework
3.1 'Site' and 'Locality' Fields in Addresses
3.2 Address Parsing
3.3 Normalisation
3.4 The 'Postparse' Post-Processing Method
3.5 Address Pair Matching
4 Experimental Evaluation
4.1 Data and Experimental Environment
4.2 Experiments on Address Parsing
4.3 Experiments on Address Matching
5 Conclusions
References
Detection of Classical Cipher Types with Feature-Learning Approaches
1 Introduction
2 Related Work
3 Neural Cipher Identifier
3.1 Long Short-Term Memory
3.2 Transformer
3.3 Learning-Rate Schedulers
3.4 Transfer Learning
3.5 Ensemble Learning
4 Conclusion
References
SOMPS-Net: Attention Based Social Graph Framework for Early Detection of Fake Health News
1 Introduction
2 Related Work
3 Dataset
4 Problem Definition
5 Architecture
5.1 Social Interaction Graph
5.2 Publisher and News Statistics
6 Experiments and Results
6.1 Experimental Setup
6.2 Comparison Systems
6.3 Results Analysis
6.4 Early Detection
7 Conclusion
References
The Impact of Sentiment in the News Media on Daily and Monthly Stock Market Returns
1 Introduction
2 Related Literature
2.1 Feedback from Text Media to Financial Markets
2.2 Computational Methods for Sentiment Analysis
3 Methodology
3.1 The Data
3.2 Topic Modelling of the News Media Corpus
3.3 BERT for Sentiment Classification
3.4 Time Series Analysis of Sentiment Scores
4 Results
5 Conclusion
References
Investigation of Topic Modelling Methods for Understanding the Reports of the Mining Projects in Queensland
1 Introduction
2 Related Works
2.1 Document Representation Model
2.2 Recent Studies of Topic Modelling
3 Topic Modelling Approaches
3.1 Two-Dimensional Topic Modelling: LDA and NMF
3.2 Three-Dimensional (3D) Topic Modelling: Tensor Clustering
3.3 Dataset
3.4 Data Preprocessing
3.5 Evaluation Measures
3.6 Topic Keyword Matching
4 Results and Discussion
4.1 Document Group by LDA
4.2 Document Group by Tensor
4.3 Indecisiveness of Tensor Clustering
5 Conclusion
References
A Semi-automatic Data Extraction System for Heterogeneous Data Sources: A Case Study from Cotton Industry
1 Introduction
2 Background and Related Work
3 Automated Discovery of Relevant Information
3.1 Data Acquisition
3.2 Semantic Enrichment Using Chunking and NER
3.3 Query Formulation
4 Empirical Analysis
4.1 Data Sources
4.2 Implementation
4.3 Results: Information Retrieval Performance
5 Conclusion
References
Nonnegative Matrix Factorization to Understand Spatio-Temporal Traffic Pattern Variations During COVID-19: A Case Study
1 Introduction
2 Related Work
3 Nonnegative Matrix Factorization for Traffic Pattern Elicitation
3.1 Nonnegative Matrix Factorization
3.2 NMF for Traffic Pattern Elicitation
3.3 Understanding Traffic Pattern Variations
4 A Case Study
4.1 Dataset
4.2 Evaluation Measures
4.3 Traffic Pattern Results for 2019
4.4 Traffic Pattern Results for 2020
4.5 Traffic Pattern Variation Analysis
5 Conclusion
References
Correction to: Taking the Confusion Out of Multinomial Confusion Matrices and Imbalanced Classes
Correction to: Chapter “Taking the Confusion Out of Multinomial Confusion Matrices and Imbalanced Classes” in: Y. Xu et al. (Eds.): Data Mining, CCIS 1504, https://doi.org/10.1007/978-981-16-8531-6_2
Author Index