Software - MOA - Massive Online Analysis + Data Stream Mining + Manual

Albert Bifet and Richard Kirkby, August 2009
Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams.
MOA includes a collection of offline and online methods as well as tools for evaluation. In particular, it implements boosting, bagging, and Hoeffding Trees, all with and without Naïve Bayes classifiers at the leaves.
MOA is related to WEKA, the Waikato Environment for Knowledge Analysis, which is an award-winning open-source workbench containing implementations of a wide range of batch machine learning methods. WEKA is also written in Java. The main benefits of Java are portability, where applications can be run on any platform with an appropriate Java virtual machine, and the strong and well-developed support libraries. Use of the language is widespread, and features such as automatic garbage collection help to reduce programmer burden and error.
This text explains the theoretical and practical foundations of the methods and streams available in MOA. The moa and the weka are both birds native to New Zealand. The weka is a cheeky bird of similar size to a chicken. The moa was a large ostrich-like bird, an order of magnitude larger than a weka, that was hunted to extinction.
DATA STREAM MINING: A Practical Approach, Albert Bifet and Richard Kirkby August 2009
Contents
Introduction and Preliminaries
Preliminaries
MOA Stream Mining
Assumptions
Requirements
Mining Strategies
Change Detection Strategies

MOA Experimental Setting
Previous Evaluation Practices
- Batch Setting
- Data Stream Setting
Evaluation Procedures for Data Streams
- Holdout
- Interleaved Test-Then-Train or Prequential
- Comparison
Testing Framework
Environments

- Sensor Network
- Handheld Computer
- Server
Data Sources
- Random Tree Generator
- Random RBF Generator
- LED Generator
- Waveform Generator
- Function Generator
Generation Speed and Data Size
Evolving Stream Experimental Setting

- Concept Drift Framework
- Datasets for concept drift
Stationary Data Stream Learning
Hoeffding Trees

The Hoeffding Bound for Tree Induction
The Basic Algorithm

- Split Confidence
- Sufficient Statistics
- Grace Period
- Pre-pruning
- Tie-breaking
- Skewed Split Prevention
Memory Management
- Poor Attribute Removal
MOA Java Implementation Details
- Fast Size Estimates
Numeric Attributes
Batch Setting Approaches
- Equal Width
- Equal Frequency
- k-means Clustering
- Fayyad and Irani
- C4.5
Data Stream Approaches
- VFML Implementation
- Exhaustive Binary Tree
- Quantile Summaries
- Gaussian Approximation
- Numerical Interval Pruning
Prediction Strategies
Majority Class
Adaptive Hybrid

Hoeffding Tree Ensembles
Batch Setting
- Bagging
- Boosting
- Option Trees
Data Stream Setting
- Bagging
- Boosting
- Option Trees
Realistic Ensemble Sizes
Evolving Data Stream Learning
Evolving data streams

Algorithms for mining with change
- OLIN: Last
- CVFDT: Domingos
- UFFT: Gama
A Methodology for Adaptive Stream Mining
- Time Change Detectors and Predictors: A General Framework
- Window Management Models
Optimal Change Detector and Predictor

Adaptive Sliding Windows
Introduction
Maintaining Updated Windows of Varying Length

- Setting
- First algorithm: ADWIN0
- ADWIN0 for Poisson processes
- Improving time and memory requirements
K-ADWIN = ADWIN + Kalman Filtering
Adaptive Hoeffding Trees
Introduction
Decision Trees on Sliding Windows

- HWT-ADWIN: Hoeffding Window Tree using ADWIN
- CVFDT
Hoeffding Adaptive Trees

- Example of performance Guarantee
- Memory Complexity Analysis
Adaptive Ensemble Methods

New method of Bagging using trees of different size
New method of Bagging using ADWIN
Adaptive Hoeffding Option Trees
Method performance

Bibliography
Massive Online Analysis Manual, Albert Bifet and Richard Kirkby, August 2009:
Contents
Introduction
Data streams Evaluation
Installation
Using the GUI
Using the command line
Comparing two classifiers
Tasks in MOA
WriteStreamToARFFFile
MeasureStreamSpeed
LearnModel
EvaluateModel
EvaluatePeriodicHeldOutTest
EvaluateInterleavedTestThenTrain
EvaluatePrequential

Evolving data streams
Streams
- ArffFileStream
- ConceptDriftStream
- ConceptDriftRealStream
- FilteredStream
- AddNoiseFilter
Streams Generators
- generators.AgrawalGenerator
- generators.HyperplaneGenerator
- generators.LEDGenerator
- generators.LEDGeneratorDrift
- generators.RandomRBFGenerator
- generators.RandomRBFGeneratorDrift
- generators.RandomTreeGenerator
- generators.SEAGenerator
- generators.STAGGERGenerator
- generators.WaveformGenerator
- generators.WaveformGeneratorDrift
Classifiers
Classifiers for static streams
- MajorityClass
- Naive Bayes
- DecisionStump
- HoeffdingTree
- HoeffdingTreeNB
- HoeffdingTreeNBAdaptive
- HoeffdingOptionTree
- HoeffdingOptionTreeNB
- HoeffdingTreeOptionNBAdaptive
- OzaBag
- OzaBoost
- OCBoost
Classifiers for evolving streams
- OzaBagASHT
- OzaBagADWIN
- SingleClassifierDrift
- AdaHoeffdingOptionTree
Writing a classifier
Creating a new classifier
Compiling a classifier

Bi-directional interface with WEKA
WEKA classifiers from MOA
- WekaClassifier
- SingleClassifierDrift
MOA classifiers from WEKA
A framework for learning from a continuous supply of examples, a data stream. Includes tools for evaluation and a collection of machine learning algorithms. Related to the WEKA project, also written in Java, while scaling to more demanding problems.
Websites:
MOA
Massive On-line Analysis is an environment for massive data mining

Language: Russian
Commentary: 141102
Tags: Computer science and engineering; Artificial intelligence; Data mining