Albert Bifet and Richard Kirkby, August 2009
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams.
MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves.
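As a rough illustration of the idea behind the Hoeffding Trees mentioned above: such a tree splits a leaf only once the observed advantage of the best attribute over the runner-up exceeds the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). The sketch below is plain Java, not MOA code; the class name, method names, and the sample gain values are assumptions chosen for illustration.

```java
// Illustrative sketch only: names here are hypothetical and not part of MOA's API.
class HoeffdingBoundSketch {

    /**
     * Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)),
     * where R is the range of the measured quantity, delta the allowed
     * probability of error, and n the number of examples seen.
     */
    static double epsilon(double range, double delta, long n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    /** True when the best attribute's observed gain beats the runner-up by more than epsilon. */
    static boolean shouldSplit(double bestGain, double secondBestGain,
                               double range, double delta, long n) {
        return bestGain - secondBestGain > epsilon(range, delta, n);
    }

    public static void main(String[] args) {
        double range = 1.0;   // information gain with two classes lies in [0, 1]
        double delta = 1e-7;  // assumed split confidence parameter
        // Assumed gains of 0.30 and 0.25 for the two best attributes.
        for (long n : new long[] {200, 1000, 5000, 25000}) {
            System.out.printf("n=%d epsilon=%.4f split=%b%n",
                    n, epsilon(range, delta, n),
                    shouldSplit(0.30, 0.25, range, delta, n));
        }
    }
}
```

Because epsilon shrinks as 1/sqrt(n), the same gap between two attributes that is inconclusive after a few hundred examples becomes decisive once enough of the stream has been seen.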
MOA is related to WEKA, the Waikato Environment for Knowledge Analysis, which is an award-winning open-source workbench containing implementations of a wide range of batch machine learning methods. WEKA is also written in Java. The main benefits of Java are portability, since applications can run on any platform with an appropriate Java virtual machine, and strong, well-developed support libraries. Use of the language is widespread, and features such as automatic garbage collection help to reduce programmer burden and error.
This text explains the theoretical and practical foundations of the methods and streams available in MOA. The moa and the weka are both birds native to New Zealand. The weka is a cheeky bird of similar size to a chicken. The moa was a large ostrich-like bird, an order of magnitude larger than a weka, that was hunted to extinction.
DATA STREAM MINING: A Practical Approach, Albert Bifet and Richard Kirkby, August 2009:
Contents
Introduction and Preliminaries
Preliminaries
MOA Stream Mining
Assumptions
Requirements
Mining Strategies
Change Detection Strategies
MOA Experimental Setting
Previous Evaluation Practices - Batch Setting
- Data Stream Setting
Evaluation Procedures for Data Streams - Holdout
- Interleaved Test-Then-Train or Prequential
- Comparison
Testing Framework
Environments - Sensor Network
- Handheld Computer
- Server
Data Sources - Random Tree Generator
- Random RBF Generator
- LED Generator
- Waveform Generator
- Function Generator
Generation Speed and Data Size
Evolving Stream Experimental Setting - Concept Drift Framework
- Datasets for concept drift
Stationary Data Stream Learning
Hoeffding Trees
The Hoeffding Bound for Tree Induction
The Basic Algorithm - Split Confidence
- Sufficient Statistics
- Grace Period
- Pre-pruning
- Tie-breaking
- Skewed Split Prevention
Memory Management - Poor Attribute Removal
MOA Java Implementation Details - Fast Size Estimates
Numeric Attributes
Batch Setting Approaches - Equal Width
- Equal Frequency
- k-means Clustering
- Fayyad and Irani
- C4.5
Data Stream Approaches - VFML Implementation
- Exhaustive Binary Tree
- Quantile Summaries
- Gaussian Approximation
- Numerical Interval Pruning
Prediction Strategies
Majority Class
Adaptive Hybrid
Hoeffding Tree Ensembles
Batch Setting - Bagging
- Boosting
- Option Trees
Data Stream Setting - Bagging
- Boosting
- Option Trees
Realistic Ensemble Sizes
Evolving Data Stream Learning
Evolving data streams
Algorithms for mining with change - OLIN: Last
- CVFDT: Domingos
- UFFT: Gama
A Methodology for Adaptive Stream Mining - Time Change Detectors and Predictors: A General Framework
- Window Management Models
Optimal Change Detector and Predictor
Adaptive Sliding Windows
Introduction
Maintaining Updated Windows of Varying Length - Setting
- First algorithm: ADWIN0
- ADWIN0 for Poisson processes
- Improving time and memory requirements
K-ADWIN = ADWIN + Kalman Filtering
Adaptive Hoeffding Trees
Introduction
Decision Trees on Sliding Windows - HWT-ADWIN: Hoeffding Window Tree using ADWIN
- CVFDT
Hoeffding Adaptive Trees - Example of performance Guarantee
- Memory Complexity Analysis
Adaptive Ensemble Methods
New method of Bagging using trees of different size
New method of Bagging using ADWIN
Adaptive Hoeffding Option Trees
Method performance
Bibliography
Massive Online Analysis Manual, Albert Bifet and Richard Kirkby, August 2009:
Contents
Introduction
Data streams Evaluation
Installation
Using the GUI
Using the command line
Comparing two classifiers
Tasks in MOA
WriteStreamToARFFFile
MeasureStreamSpeed
LearnModel
EvaluateModel
EvaluatePeriodicHeldOutTest
EvaluateInterleavedTestThenTrain
EvaluatePrequential
Evolving data streams
Streams - ArffFileStream
- ConceptDriftStream
- ConceptDriftRealStream
- FilteredStream
- AddNoiseFilter
Streams Generators - generators.AgrawalGenerator
- generators.HyperplaneGenerator
- generators.LEDGenerator
- generators.LEDGeneratorDrift
- generators.RandomRBFGenerator
- generators.RandomRBFGeneratorDrift
- generators.RandomTreeGenerator
- generators.SEAGenerator
- generators.STAGGERGenerator
- generators.WaveformGenerator
- generators.WaveformGeneratorDrift
Classifiers
Classifiers for static streams - MajorityClass
- Naive Bayes
- DecisionStump
- HoeffdingTree
- HoeffdingTreeNB
- HoeffdingTreeNBAdaptive
- HoeffdingOptionTree
- HoeffdingOptionTreeNB
- HoeffdingTreeOptionNBAdaptive
- OzaBag
- OzaBoost
- OCBoost
Classifiers for evolving streams - OzaBagASHT
- OzaBagADWIN
- SingleClassifierDrift
- AdaHoeffdingOptionTree
Writing a classifier
Creating a new classifier
Compiling a classifier
Bi-directional interface with WEKA
WEKA classifiers from MOA - WekaClassifier
- SingleClassifierDrift
MOA classifiers from WEKA
A framework for learning from a continuous supply of examples, a data stream. Includes tools for evaluation and a collection of machine learning algorithms. Related to the WEKA project, also written in Java, while scaling to more demanding problems.
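To make "learning from a continuous supply of examples" and its evaluation concrete, here is a minimal sketch of interleaved test-then-train (prequential) evaluation: each example is first used to test the current model and only then to train it. This is plain Java and does not use MOA's classes; the learner, the synthetic label sequence, and all names are assumptions for illustration. The learner simply predicts the most frequent class seen so far.

```java
import java.util.Random;

// Illustrative sketch only: names here are hypothetical and not part of MOA's API.
class PrequentialSketch {

    /** Learner that always predicts the class it has seen most often so far. */
    static class MajorityClassLearner {
        private final long[] counts;

        MajorityClassLearner(int numClasses) {
            counts = new long[numClasses];
        }

        int predict() {
            int best = 0;
            for (int c = 1; c < counts.length; c++) {
                if (counts[c] > counts[best]) {
                    best = c;
                }
            }
            return best;
        }

        void train(int label) {
            counts[label]++;
        }
    }

    /** Runs test-then-train over the labels in order and returns the accuracy. */
    static double prequentialAccuracy(int[] labels, int numClasses) {
        MajorityClassLearner learner = new MajorityClassLearner(numClasses);
        long correct = 0;
        for (int label : labels) {
            if (learner.predict() == label) {
                correct++;            // test on the example first
            }
            learner.train(label);     // then use the same example for training
        }
        return labels.length == 0 ? 0.0 : (double) correct / labels.length;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a stream: class 1 with probability 0.7 (assumed).
        Random rng = new Random(1);
        int[] labels = new int[10000];
        for (int i = 0; i < labels.length; i++) {
            labels[i] = rng.nextDouble() < 0.7 ? 1 : 0;
        }
        System.out.println("prequential accuracy: " + prequentialAccuracy(labels, 2));
    }
}
```

A real stream would be consumed one example at a time rather than held in an array; the array only keeps the sketch short. Every example is used for both testing and training, so no separate holdout set is needed.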
Sites:
MOA: Massive On-line Analysis is an environment for massive data mining