High Computational Complexity of Decision Tree
Induction with Many Missing Attribute Values


Rafal Latkowski

Institute of Computer Science, Warsaw University
ul. Banacha 2, 02-097 Warsaw, Poland
rlatkows@mimuw.edu.pl

Abstract

Decision tree induction is a widely applied technique in
machine learning. With careful implementation, algorithms based on
this technique can achieve a computational complexity of
O(n*log(n)). However, decision trees require special treatment
in the presence of missing attribute values. The most popular and
widely used approach to this problem is to divide such cases among
all subsets (sub-nodes). This approach is implemented, e.g., in
C4.5, RSES-Lib, WEKA, and many other machine learning programs.
In this paper we show that this strategy leads to a significant
increase, to O(n^gamma), in the computational complexity of the
decision tree induction algorithm on data sets with many missing values.
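To make the source of the blow-up concrete, the following is a minimal sketch of the C4.5-style splitting strategy described above: a case whose value for the split attribute is missing is copied into every branch with a fractional weight. All names (`split_cases`, the `weight` field, the toy `outlook` data) are illustrative, not taken from any of the cited systems.

```python
def split_cases(cases, attribute, values):
    """Partition weighted cases on `attribute`. A case with a missing
    value (None) is copied into every branch, its weight scaled by the
    fraction of known-value weight falling into that branch (C4.5-style)."""
    known = [c for c in cases if c[attribute] is not None]
    missing = [c for c in cases if c[attribute] is None]
    known_weight = sum(c["weight"] for c in known)
    branches = {}
    for v in values:
        subset = [dict(c) for c in known if c[attribute] == v]
        if known_weight > 0:
            frac = sum(c["weight"] for c in subset) / known_weight
        else:
            frac = 1.0 / len(values)  # no known values: split evenly
        for c in missing:
            copy = dict(c)
            copy["weight"] = c["weight"] * frac  # fractional copy
            subset.append(copy)
        branches[v] = subset
    return branches

cases = [
    {"outlook": "sunny", "weight": 1.0},
    {"outlook": "rain",  "weight": 1.0},
    {"outlook": None,    "weight": 1.0},
]
branches = split_cases(cases, "outlook", ["sunny", "rain"])
total = sum(len(subset) for subset in branches.values())
print(total)  # 4 records from 3 cases: the missing-value case appears twice
```

Because every missing-value case is replicated into every sub-node, the total number of case records can grow at each level of the tree; this replication is what drives the superlinear complexity analyzed in the paper.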