Adaptive OVFDT with Incremental Pruning and ROC Corrective Learning for Data Stream Mining
OVFDT (Optimized Very Fast Decision Tree) is a recent classifier for data stream mining based on an incremental learning method called the Hoeffding tree algorithm (HTA). HTA uses information gain and the Hoeffding bound to grow a decision tree incrementally from an incoming data stream. OVFDT adds mechanisms for regulating the tree size (to satisfy small heap memory requirements) and for improving prediction accuracy. The result is a balance between a compact decision tree and good prediction accuracy. The prototype of OVFDT was developed and tested by UM. In this sequel project, we want to continue the research by incorporating further enhancements into the algorithms of the OVFDT model, such as a Receiver Operating Characteristic (ROC) matrix. The ROC curve, a standard tool for evaluating classifiers in machine learning, is to be extended from a univariate model to a multivariate ROC matrix that monitors the performance of each tree node in OVFDT. Any underperforming tree nodes or paths would be adaptively pruned or replaced as cued by the ROC matrix. As a result, the overall performance of OVFDT could improve even when the input data streams contain concept changes.
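To make the role of the Hoeffding bound concrete, the following is a minimal sketch of the split test used by Hoeffding tree learners in general. It is not the OVFDT implementation. The function names, the parameter names, and the choice of log2(number of classes) as the range of information gain are assumptions made for illustration. The bound itself is the standard form, epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split measure, delta is the allowed error probability, and n is the number of examples seen at the leaf.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range: range R of the split measure (assumed known)
    delta: allowed probability that the true mean lies outside the bound
    n: number of examples observed at the leaf
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n_classes, delta, n):
    """Hypothetical helper: decide whether a leaf has seen enough data to split.

    The leaf splits when the observed information-gain advantage of the best
    attribute over the runner-up exceeds the Hoeffding bound.
    """
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    # Assumption: information gain in bits ranges over [0, log2(n_classes)].
    value_range = math.log2(n_classes)
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon


if __name__ == "__main__":
    # Illustrative values only; they are not results from OVFDT.
    for n in (100, 1000, 10000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.25, 2, 1e-7, n))
```

Because epsilon shrinks as n grows, a small but consistent gain advantage eventually justifies a split. This is what lets the tree grow from a stream without storing past examples.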