Data Classification
Book description
Research on the problem of classification tends to be fragmented across areas such as pattern recognition, databases, data mining, and machine learning. Addressing the work of these different communities in a unified way, this book explores the underlying algorithms of classification as well as applications of classification in a variety of problem domains, including text, multimedia, social network, and biological data. It presents core methods in data classification, covers recent problem domains, and discusses advanced methods for improving the quality of classification results.
Table of contents
- Preliminaries
- Series
- Dedication
- Editor Biography
- Contributors
- Preface
- Chapter 1 An Introduction to Data Classification
- 1.1 Introduction
- 1.2 Common Techniques in Data Classification
- 1.2.1 Feature Selection Methods
- 1.2.2 Probabilistic Methods
- 1.2.3 Decision Trees
- 1.2.4 Rule-Based Methods
- 1.2.5 Instance-Based Learning
- 1.2.6 SVM Classifiers
- 1.2.7 Neural Networks
- 1.3.1 Large Scale Data: Big Data and Data Streams
- 1.3.1.1 Data Streams
- 1.3.1.2 The Big Data Framework
- 1.4.1 Rare Class Learning
- 1.4.2 Distance Function Learning
- 1.4.3 Ensemble Learning for Data Classification
- 1.4.4 Enhancing Classification Methods with Additional Data
- 1.4.4.1 Semi-Supervised Learning
- 1.4.4.2 Transfer Learning
- 1.4.5.1 Active Learning
- 1.4.5.2 Visual Learning
- Figure 1.1
- Figure 1.2
- Figure 1.3
- Figure 1.4
- Figure 1.5
- Table 1.1
- 2.1 Introduction
- 2.1.1 Data Classification
- 2.1.2 Feature Selection
- 2.1.3 Feature Selection for Classification
- 2.2.1 Filter Models
- 2.2.2 Wrapper Models
- 2.2.3 Embedded Models
- 2.3.1 Features with Group Structure
- 2.3.2 Features with Tree Structure
- 2.3.3 Features with Graph Structure
- 2.4.1 The Grafting Algorithm
- 2.4.2 The Alpha-Investing Algorithm
- 2.4.3 The Online Streaming Feature Selection Algorithm
- 2.5.1 Scalability
- 2.5.2 Stability
- 2.5.3 Linked Data
- Figure 2.1
- Figure 2.2
- Figure 2.3
- Figure 2.4
- Figure 2.5
- Figure 2.6
- Figure 2.7
- Figure 2.8
- Figure 2.9
- Figure 2.10
- 3.1 Introduction
- 3.2 Naive Bayes Classification
- 3.2.1 Bayes’ Theorem and Preliminaries
- 3.2.2 Naive Bayes Classifier
- 3.2.3 Maximum-Likelihood Estimates for Naive Bayes Models
- 3.2.4 Applications
- 3.3.1 Logistic Regression
- 3.3.2 Parameter Estimation for Logistic Regression
- 3.3.3 Regularization in Logistic Regression
- 3.3.4 Applications
- 3.4.1 Bayesian Networks
- 3.4.1.1 Bayesian Network Construction
- 3.4.1.2 Inference in a Bayesian Network
- 3.4.1.3 Learning Bayesian Networks
- 3.4.2.1 The Inference and Learning Algorithms
- 3.4.3.1 Conditional Independence
- 3.4.3.2 Clique Factorization
- 3.4.3.3 The Inference and Learning Algorithms
- 3.4.4.1 The Learning Algorithms
- Figure 3.1
- Figure 3.2
- Figure 3.3
- Figure 3.4
- 4.1 Introduction
- 4.2 Top-Down Decision Tree Induction
- 4.2.1 Node Splitting
- 4.2.2 Tree Pruning
- 4.3.1 Splitting Criteria
- 4.3.2 Stopping Conditions
- 4.3.3 Pruning Strategy
- 4.3.4 Handling Unknown Values: Induction and Prediction
- 4.3.5 Other Issues: Windowing and Multivariate Criteria
- 4.4.1 RainForest-Based Approach
- 4.4.2 SPIES Approach
- 4.4.3 Parallel Decision Tree Construction
- 4.5.1 ID3 Family
- 4.5.2 VFDT Family
- 4.5.3 Ensemble Method for Streaming Data
- Figure 4.1
- Figure 4.2
- Figure 4.3
- Figure 4.4
- Figure 4.5
- Figure 4.6
- Table 4.1
- Table 4.2
- Table 4.3
- 5.1 Introduction
- 5.2 Rule Induction
- 5.2.1 Two Algorithms for Rule Induction
- 5.2.1.1 CN2 Induction Algorithm (Ordered Rules)
- 5.2.1.2 RIPPER Algorithm and Its Variations (Ordered Classes)
- 5.3.1 Association Rule Mining
- 5.3.1.1 Definitions of Association Rules, Support, and Confidence
- 5.3.1.2 An Introduction to the Apriori Algorithm
- 5.3.3.1 Additional Discussion for CARs Mining
- 5.3.3.2 Building a Classifier Using CARs
- 5.4.1 Text Categorization
- 5.4.2 Intrusion Detection
- 5.4.3 Using Class Association Rules for Diagnostic Data Mining
- 5.4.4 Gene Expression Data Analysis
- Table 5.1
- Table 5.2
- Table 5.3
- 6.1 Introduction
- 6.2 Instance-Based Learning Framework
- 6.3 The Nearest Neighbor Classifier
- 6.3.1 Handling Symbolic Attributes
- 6.3.2 Distance-Weighted Nearest Neighbor Methods
- 6.3.3 Local Distance Scaling
- 6.3.4 Attribute-Weighted Nearest Neighbor Methods
- 6.3.5 Locally Adaptive Nearest Neighbor Classifier
- 6.3.6 Combining with Ensemble Methods
- 6.3.7 Multi-Label Learning
- Figure 6.1
- Figure 6.2
- Figure 6.3
- Figure 6.4
- Figure 6.5
- 7.1 Introduction
- 7.2 The Maximum Margin Perspective
- 7.3 The Regularization Perspective
- 7.4 The Support Vector Perspective
- 7.5 Kernel Tricks
- 7.6 Solvers and Algorithms
- 7.7 Multiclass Strategies
- 7.8 Conclusion
- Bibliography
- Figure 7.1
- Figure 7.2
- Figure 7.3
- Figure 7.4
- Figure 7.5
- Figure 7.6
- Figure 7.7
- Figure 7.8
- Figure 7.9
- 8.1 Introduction
- 8.2 Fundamental Concepts
- 8.2.1 Mathematical Model of a Neuron
- 8.2.2 Types of Units
- 8.2.2.1 McCulloch-Pitts Binary Threshold Unit
- 8.2.2.2 Linear Unit
- 8.2.2.3 Linear Threshold Unit
- 8.2.2.4 Sigmoidal Unit
- 8.2.2.5 Distance Unit
- 8.2.2.6 Radial Basis Unit
- 8.2.2.7 Polynomial Unit
- 8.2.3.1 Layered Network
- 8.2.3.2 Networks with Feedback
- 8.2.3.3 Modular Networks
- 8.2.5.1 Hebbian Rule
- 8.2.5.2 The Delta Rule
- 8.3.1 The Single-Layer Perceptron
- 8.3.1.1 Perceptron Criterion
- 8.3.1.2 Multi-Class Perceptrons
- 8.3.1.3 Perceptron Enhancements
- 8.3.2.1 Two-Class Adaline
- 8.3.2.2 Multi-Class Adaline
- 8.3.3.1 LVQ1 Training
- 8.3.3.2 LVQ2 Training
- 8.3.3.3 Application and Limitations
- 8.4.1 Radial Basis Function Network
- 8.4.2 RBFN Training
- 8.4.2.1 Using Training Samples as Centers
- 8.4.2.2 Random Selection of Centers
- 8.4.2.3 Unsupervised Selection of Centers
- 8.4.2.4 Supervised Estimation of Centers
- 8.4.2.5 Linear Optimization of Weights
- 8.4.2.6 Gradient Descent and Enhancements
- 8.5.1 MLP Architecture for Classification
- 8.5.1.1 Two-Class Problems
- 8.5.1.2 Multi-Class Problems
- 8.5.1.3 Forward Propagation
- 8.5.2.1 Mean Square Error (MSE)
- 8.5.2.2 Cross-Entropy (CE)
- 8.5.2.3 Minimum Classification Error (MCE)
- 8.5.4.1 Backpropagation with Momentum
- 8.5.4.2 Delta-Bar-Delta
- 8.5.4.3 Rprop Algorithm
- 8.5.4.4 Quick-Prop
- 8.6.1 Use of Prior Knowledge
- 8.6.2 Layer-Wise Greedy Training
- 8.6.2.1 Deep Belief Networks (DBNs)
- 8.6.2.2 Stacked Auto-Encoder
- Figure 8.1
- Figure 8.2
- Figure 8.3
- Figure 8.4
- Table 8.1
- 9.1 Introduction
- 9.2 Generic Stream Classification Algorithms
- 9.2.1 Decision Trees for Data Streams
- 9.2.2 Rule-Based Methods for Data Streams
- 9.2.3 Nearest Neighbor Methods for Data Streams
- 9.2.4 SVM Methods for Data Streams
- 9.2.5 Neural Network Classifiers for Data Streams
- 9.2.6 Ensemble Methods for Data Streams
- 9.3.1 Detecting Rare Classes
- 9.3.2 Detecting Novel Classes
- 9.3.3 Detecting Infrequently Recurring Classes
- 9.5.1 Text Streams
- 9.5.2 Graph Streams
- 9.5.3 Uncertain Data Streams
- Figure 9.1
- Figure 9.2
- Table 9.1
- 10.1 Introduction
- 10.2 Scale-Up on a Single Machine
- 10.2.1 Background
- 10.2.2 SVMPerf
- 10.2.3 Pegasos
- 10.2.4 Bundle Methods
- 10.3.1 Parallel Decision Trees
- 10.3.2 Parallel SVMs
- 10.3.3 MRM-ML
- 10.3.4 SystemML
- 11.1 Introduction
- 11.2 Feature Selection for Text Classification
- 11.2.1 Gini Index
- 11.2.2 Information Gain
- 11.2.3 Mutual Information
- 11.2.4 χ2-Statistic
- 11.2.5 Feature Transformation Methods: Unsupervised and Supervised LSI
- 11.2.6 Supervised Clustering for Dimensionality Reduction
- 11.2.7 Linear Discriminant Analysis
- 11.2.8 Generalized Singular Value Decomposition
- 11.2.9 Interaction of Feature Selection with Classification
- 11.5.1 Bernoulli Multivariate Model
- 11.5.2 Multinomial Distribution
- 11.5.3 Mixture Modeling for Text Classification
- 11.6.1 SVM Classifiers
- 11.6.2 Regression-Based Classifiers
- 11.6.3 Neural Network Classifiers
- 11.6.4 Some Observations about Linear Classifiers
- 11.9.1 Classifier Ensemble Learning
- 11.9.2 Data-Centered Methods: Boosting and Bagging
- 11.9.3 Optimizing Specific Measures of Accuracy
- 11.10.1 Semi-Supervised Learning
- 11.10.2 Transfer Learning
- 11.10.3 Active Learning
- Figure 11.1
- Figure 11.2
- Figure 11.3
- Figure 11.4
- 12.1 Introduction
- 12.1.1 Overview
- 12.2.1 Text Features
- 12.2.2 Image Features
- 12.2.3 Audio Features
- 12.2.4 Video Features
- 12.3.1 Fusion Methods
- 12.3.2 Audio Visual Speech Recognition
- 12.3.2.1 Visual Front End
- 12.3.2.2 Decision Fusion on HMM
- 12.4.1 Popular Applied Ontology
- 12.4.2 Ontological Relations
- 12.4.2.1 Definition
- 12.4.2.2 Subclass Relation
- 12.4.2.3 Co-Occurrence Relation
- 12.4.2.4 Combination of the Two Relations
- 12.4.2.5 Inherently Used Ontology
- 12.5.1 Data Modalities
- 12.5.2 Challenges in Geographical Classification
- 12.5.3 Geo-Classification for Images
- 12.5.3.1 Classifiers
- Figure 12.1
- Figure 12.2
- Figure 12.3
- Figure 12.4
- Figure 12.5
- Figure 12.6
- Figure 12.7
- Figure 12.8
- Figure 12.9
- 13.1 Introduction
- 13.2 Time Series Representation
- 13.3 Distance Measures
- 13.3.1 Lp-Norms
- 13.3.2 Dynamic Time Warping (DTW)
- 13.3.3 Edit Distance
- 13.3.4 Longest Common Subsequence (LCSS)
- 13.4.1 Speeding Up the k-NN
- 14.1 Introduction
- 14.2 Background
- 14.2.1 Sequence
- 14.2.2 Sequence Classification
- 14.2.3 Frequent Sequential Patterns
- 14.2.4 n-Grams
- 14.4.1 Filtering Method for Sequential Feature Selection
- 14.4.2 Pattern Mining Framework for Mining Sequential Features
- 14.4.3 A Wrapper-Based Method for Mining Sequential Features
- 14.5.0.1 Alignment-Based Distance
- 14.5.0.2 Keyword-Based Distance
- 14.5.0.3 Kernel-Based Similarity
- 14.5.0.4 Model-Based Similarity
- 14.5.0.5 Time Series Distance Metrics
- 14.8.1 Semi-Supervised Sequence Classification
- 14.8.2 Classification of Label Sequences
- 14.8.3 Classification of Sequence of Vector Data
- 15.1 Introduction
- 15.2 Collective Classification Problem Definition
- 15.2.1 Inductive vs. Transductive Learning
- 15.2.2 Active Collective Classification
- 15.3.1 Label Propagation
- 15.3.2 Iterative Classification Algorithms
- 15.5.1 Directed Models
- 15.5.2 Undirected Models
- 15.5.3 Approximate Inference in Graphical Models
- 15.5.3.1 Gibbs Sampling
- 15.5.3.2 Loopy Belief Propagation (LBP)
- 15.6.1 Data Graph
- 15.6.2 Relational Features
- Figure 15.1
- Figure 15.2
- 16.1 Introduction
- 16.2 Preliminaries
- 16.2.1 Data Uncertainty Models
- 16.2.2 Classification Framework
- 16.3.1 Decision Trees
- 16.3.2 Rule-Based Classification
- 16.3.3 Associative Classification
- 16.3.4 Density-Based Classification
- 16.3.5 Nearest Neighbor-Based Classification
- 16.3.6 Support Vector Classification
- 16.3.7 Naive Bayes Classification
- Figure 16.1
- Figure 16.2
- Figure 16.3
- Figure 16.4
- Figure 16.5
- 17.1 Introduction
- 17.2 Rare Class Detection
- 17.2.1 Cost Sensitive Learning
- 17.2.1.1 MetaCost: A Relabeling Approach
- 17.2.1.2 Weighting Methods
- 17.2.1.3 Bayes Classifiers
- 17.2.1.4 Proximity-Based Classifiers
- 17.2.1.5 Rule-Based Classifiers
- 17.2.1.6 Decision Trees
- 17.2.1.7 SVM Classifier
- 17.2.2.1 Relation between Weighting and Sampling
- 17.2.2.2 Synthetic Over-Sampling: SMOTE
- 17.2.2.3 One Class Learning with Positive Class
- 17.2.2.4 Ensemble Techniques
- 17.3.1 Difficult Cases and One-Class Learning
- 17.4.1 One Class Novelty Detection
- 17.4.2 Combining Novel Class Detection with Rare Class Detection
- 17.4.3 Online Novelty Detection
- Figure 17.1
- Figure 17.2
- 18.1 Introduction
- 18.2 The Definition of Distance Metric Learning
- 18.3 Supervised Distance Metric Learning Algorithms
- 18.3.1 Linear Discriminant Analysis (LDA)
- 18.3.2 Margin Maximizing Discriminant Analysis (MMDA)
- 18.3.3 Learning with Side Information (LSI)
- 18.3.4 Relevant Component Analysis (RCA)
- 18.3.5 Information Theoretic Metric Learning (ITML)
- 18.3.6 Neighborhood Component Analysis (NCA)
- 18.3.7 Average Neighborhood Margin Maximization (ANMM)
- 18.3.8 Large Margin Nearest Neighbor Classifier (LMNN)
- 18.4.1 Semi-Supervised Metric Learning
- 18.4.1.1 Laplacian Regularized Metric Learning (LRML)
- 18.4.1.2 Constraint Margin Maximization (CMM)
- 18.4.2.1 Pseudo-Metric Online Learning Algorithm (POLA)
- 18.4.2.2 Online Information Theoretic Metric Learning (OITML)
- Table 18.1
- Table 18.2
- 19.1 Introduction
- 19.2 Bayesian Methods
- 19.2.1 Bayes Optimal Classifier
- 19.2.2 Bayesian Model Averaging
- 19.2.3 Bayesian Model Combination
- 19.3.1 General Idea
- 19.3.2 Random Forest
- 19.4.1 General Boosting Procedure
- 19.4.2 AdaBoost
- 19.5.1 General Stacking Procedure
- 19.5.2 Stacking and Cross-Validation
- 19.5.3 Discussions
- Figure 19.1
- Table 19.1
- Table 19.2
- Table 19.3
- Table 19.4
- Table 19.5
- Table 19.6
- Table 19.7
- Table 19.8
- Table 19.9
- Table 19.10
- Table 19.11
- Table 19.12
- Table 19.13
- Table 19.14
- Table 19.15
- 20.1 Introduction
- 20.1.1 Transductive vs. Inductive Semi-Supervised Learning
- 20.1.2 Semi-Supervised Learning Framework and Assumptions
- 20.2.1 Algorithms
- 20.2.2 Description of a Representative Algorithm
- 20.2.3 Theoretical Justification and Relevant Results
- 20.3.1 Algorithms
- 20.3.2 Description of a Representative Algorithm
- 20.3.3 Theoretical Justification and Relevant Results
- 20.4.1 Algorithms
- 20.4.1.1 Graph Cut
- 20.4.1.2 Graph Transduction
- 20.4.1.3 Manifold Regularization
- 20.4.1.4 Random Walk
- 20.4.1.5 Large Scale Learning
- 20.5.1 Algorithms
- 20.5.2 Description of a Representative Algorithm
- 20.5.3 Theoretical Justification and Relevant Results
- Figure 20.1
- Figure 20.2
- 21.1 Introduction
- 21.2 Transfer Learning Overview
- 21.2.1 Background
- 21.2.2 Notations and Definitions
- 21.3.1 Instance-Based Approach
- 21.3.1.1 Case I: No Target Labeled Data
- 21.3.1.2 Case II: A Few Target Labeled Data
- 21.3.2.1 Encoding Specific Knowledge for Feature Learning
- 21.3.2.2 Learning Features by Minimizing Distance between Distributions
- 21.3.2.3 Learning Features Inspired by Multi-Task Learning
- 21.3.2.4 Learning Features Inspired by Self-Taught Learning
- 21.3.2.5 Other Feature Learning Approaches
- 21.4.1 Heterogeneous Feature Spaces
- 21.4.2 Different Label Spaces
- 21.6.1 Binary Classification vs. Multi-Class Classification
- 21.6.2 Knowledge Transfer from Multiple Source Domains
- 21.6.3 Transfer Learning Meets Active Learning
- 21.7.1 NLP Applications
- 21.7.2 Web-Based Applications
- 21.7.3 Sensor-Based Applications
- 21.7.4 Applications to Computer Vision
- 21.7.5 Applications to Bioinformatics
- 21.7.6 Other Applications
- Figure 21.1
- Table 21.1
- Table 21.2
- Table 21.3
- 22.1 Introduction
- 22.2 Motivation and Comparisons to Other Strategies
- 22.2.1 Comparison with Other Forms of Human Feedback
- 22.2.2 Comparisons with Semi-Supervised and Transfer Learning
- 22.3.1 Heterogeneity-Based Models
- 22.3.1.1 Uncertainty Sampling
- 22.3.1.2 Query-by-Committee
- 22.3.1.3 Expected Model Change
- 22.3.2.1 Expected Error Reduction
- 22.3.2.2 Expected Variance Reduction
- 22.4.1 A Simple Example
- 22.4.2 Existing Works
- 22.4.3 Preliminaries
- 22.4.4 Importance Weighted Active Learning
- 22.4.4.1 Algorithm
- 22.4.4.2 Consistency
- 22.4.4.3 Label Complexity
- 22.5.1 Active Learning in Sequences
- 22.5.2 Active Learning in Graphs
- 22.5.2.1 Classification of Many Small Graphs
- 22.5.2.2 Node Classification in a Single Large Graph
- 22.6.1 Active Learning of Features
- 22.6.2 Active Learning of Kernels
- 22.6.3 Active Learning of Classes
- 22.6.4 Streaming Active Learning
- 22.6.5 Multi-Instance Active Learning
- 22.6.6 Multi-Label Active Learning
- 22.6.7 Multi-Task Active Learning
- 22.6.8 Multi-View Active Learning
- 22.6.9 Multi-Oracle Active Learning
- 22.6.10 Multi-Objective Active Learning
- 22.6.11 Variable Labeling Costs
- 22.6.12 Active Transfer Learning
- 22.6.13 Active Reinforcement Learning
- Figure 22.1
- Figure 22.2
- 23.1 Introduction
- 23.1.1 Requirements for Visual Classification
- 23.1.2 Visualization Metaphors
- 23.1.2.1 2D and 3D Spaces
- 23.1.2.2 More Complex Metaphors
- 23.2.1 Nomograms
- 23.2.1.1 Naïve Bayes Nomogram
- 23.2.2.1 Edge Cluttering
- 23.2.3.1 Star Coordinates
- 23.2.4.1 Clustering
- 23.2.4.2 Naïve Bayes Classification
- 23.2.5.1 Self-Organizing Maps
- 23.2.5.2 Generative Topographic Mapping
- 23.2.6.1 Decision Trees
- 23.2.6.2 Treemap
- 23.2.6.3 Hyperbolic Tree
- 23.2.6.4 Phylogenetic Trees
- 23.3.1 EnsembleMatrix and ManiMatrix
- 23.3.2 Systematic Mapping
- 23.3.3 iVisClassifier
- 23.3.4 ParallelTopics
- 23.3.5 VisBricks
- 23.3.6 WHIDE
- 23.3.7 Text Document Retrieval
- Figure 23.1
- Figure 23.2
- Figure 23.3
- Figure 23.4
- Figure 23.5
- Figure 23.6
- Figure 23.7
- 24.1 Introduction
- 24.2 Validation Schemes
- 24.3 Evaluation Measures
- 24.3.1 Accuracy Related Measures
- 24.3.1.1 Discrete Classifiers
- 24.3.1.2 Probabilistic Classifiers
- 24.4.1 Parametric Statistical Comparisons
- 24.4.1.1 Pairwise Comparisons
- 24.4.1.2 Multiple Comparisons
- 24.4.2.1 Pairwise Comparisons
- 24.4.2.2 Multiple Comparisons
- 24.4.2.3 Permutation Tests
- Figure 24.1
- Figure 24.2
- Figure 24.3
- Figure 24.4
- Table 24.1
- Table 24.2
- Table 24.3
- Table 24.4
- Table 24.5
- 25.1 Introduction
- 25.2 Educational Resources
- 25.2.1 Books on Data Classification
- 25.2.2 Popular Survey Papers on Data Classification
- 25.3.1 Data Benchmarks for Software and Research
Product information
- Title: Data Classification
- Editor(s): Charu C. Aggarwal
- Release date: July 2014
- Publisher(s): Chapman and Hall/CRC
- ISBN: 9781498760584