Classification and Regression Trees (CART)
CART was introduced to the machine learning community by Leo Breiman. Classically, the algorithm is known simply as decision trees, but in the modern community, some programming libraries refer to it as CART. This algorithm is the cornerstone of ensemble machine learning methods such as bagging and boosting.
The representation used by CART is a binary tree, the same binary tree we have all learned about in data structures. Let's have a brief review of the binary tree.
Binary trees differ from other trees in that, in a general tree structure, a parent node (including the root) may have any number of children, whereas in a binary tree, as its name suggests, any parent node can have a maximum of two children (branches). This figure shows a representation of a simple binary tree:
This tree is a subtree that we created in the previous chapter. Here, you can see that each node has a maximum of two children. One more thing to add here: when we work with numerical attribute values, our binary tree will look like this:
If you look closer at the preceding tree, you'll see that we have changed our attribute values from categorical to numerical, so the node-splitting criteria are going to be based on numerical values only. There is an important property of binary trees: one branch holds values smaller than the parent node and the other holds values larger than it. Conventionally (and in the code that follows), children on the left have values less than their parent node, while children on the right have values greater than their parent node. This criterion is used to divide data around an anchor point and create subsets out of it; later on, we can check whether each subset can be divided further or not.
Now, let's try to create a simple binary tree out of a simple dataset to understand its concept.
Suppose we have the following array of values:
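The original snippet isn't reproduced here, so as a stand-in, assume a small, hard-coded list of numbers (the exact values are illustrative only):

```python
# A small, hard-coded sample of numeric values (illustrative stand-in
# for the array used in the text).
data = [9, 4, 14, 7, 20, 2, 17]
print(data)
```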
After executing the preceding line, we will have a list of numbers like this:
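With the stand-in list above, the printed output is simply:

```
[9, 4, 14, 7, 20, 2, 17]
```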
Now, as we know that a binary tree may have a maximum of two branches (left and right), we can represent each node as a Python dictionary with the keys left and right. There will be one more field called data, where we store the node's value. Each node in our binary tree will have these three fields. Let's write a method to create a node with the aforementioned keys:
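The original method isn't shown here; a minimal sketch could look like this (the function name createNode is an assumption):

```python
def createNode(data):
    """Create a tree node as a dictionary with three fields:
    'data' holds the node value, 'left' and 'right' hold the child nodes."""
    return {'data': data, 'left': None, 'right': None}
```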
As you can see, our dictionary node has three fields where data will hold the node value at a particular node, and left and right will hold the next nodes.
Let's call the preceding method to create a root node for our tree; we will choose the median of the preceding array as our root node. Let's create it:
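Continuing the sketch with the same hypothetical names, we can take the median using Python's statistics module and pass it to the node-creation method:

```python
import statistics

# The median of the sample array becomes the root of the tree.
root = createNode(statistics.median(data))
print(root)
```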
After executing the preceding lines, we will get the following dictionary:
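For the stand-in array, the median is 9, so the root node looks like this:

```
{'data': 9, 'left': None, 'right': None}
```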
Now, as we have our root node, we can start building our tree out of it. We will take array values one by one using a for loop; we will put values less than the parent node in the left branch, while values greater than the parent node will go to the right branch. The following is the method to do this:
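The text refers to a method named createBinaryTree; a recursive sketch of it, under the same assumptions as above, is:

```python
def createBinaryTree(node, value):
    """Recursively insert value into the subtree rooted at node.
    Values smaller than the node go to the left branch,
    larger values go to the right branch; duplicates are ignored."""
    if value < node['data']:
        if node['left'] is None:
            node['left'] = createNode(value)
        else:
            createBinaryTree(node['left'], value)
    elif value > node['data']:
        if node['right'] is None:
            node['right'] = createNode(value)
        else:
            createBinaryTree(node['right'], value)
```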
We will use recursion to create our tree. Recursion is a technique in which a function calls itself again and again to complete a repetitive task. As you can see in the preceding method, we are calling createBinaryTree inside the same function to build a tree under a tree; each call to createBinaryTree adds a subtree to the respective node.
Now that we have sufficient code to create a binary tree for our small array, let's create a tree out of it:
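Putting the pieces together, still under the same assumed names:

```python
# Insert every value from the array into the tree, one by one.
# A value equal to an existing node (here, the median 9) is simply ignored.
for value in data:
    createBinaryTree(root, value)

print(root)
```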
As we have already created the root node of our tree, we will now start extracting values from the array and use them to build the tree. After the execution of the preceding block, we will have a Python dictionary containing our entire tree structure, as follows:
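For the stand-in array, the resulting dictionary is (pretty-printed here for readability):

```
{'data': 9,
 'left': {'data': 4,
          'left': {'data': 2, 'left': None, 'right': None},
          'right': {'data': 7, 'left': None, 'right': None}},
 'right': {'data': 14,
           'left': None,
           'right': {'data': 20,
                     'left': {'data': 17, 'left': None, 'right': None},
                     'right': None}}}
```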
So, this is the tree we have built out of our array.
The first question that may come to your mind is: now what? What do we do with this tree? Well, binary trees are extremely useful whenever you need to search for an element in them. However, binary search is out of our scope; we will see how a binary tree helps us build our decision tree.
So, to turn the binary tree algorithm into a decision tree, what do we need to add to our code? Let's think about it:
We should have some metric on the basis of which we can decide what should be the value of our node (including the root node)
We must know where to split a tree branch
We must know some stopping criteria for our tree-growing process, or it may grow indefinitely
What do I mean by some metric? Well, we used information gain (based on entropy) to select nodes in the previous chapter, and that technique showed us some promising results. But that criterion for checking the impurity of a subset is well defined for categorical data rather than numerical data. So what should we use for our test case? We will be using the Gini index as our metric to quantify purity and choose the value of a node.
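The exact implementation comes later in the chapter; as a sketch of the idea, the Gini index of a candidate split can be computed as the weighted impurity of the resulting groups (the function name and signature below are assumptions, and rows are assumed to carry their class label as the last element):

```python
def gini_index(groups, classes):
    """Compute the Gini index for a candidate split.
    groups  : list of groups, each a list of rows whose last element is the class label
    classes : list of all possible class labels
    """
    n_instances = sum(len(group) for group in groups)
    gini = 0.0
    for group in groups:
        size = len(group)
        if size == 0:
            continue  # skip empty groups to avoid division by zero
        score = 0.0
        for class_val in classes:
            proportion = [row[-1] for row in group].count(class_val) / size
            score += proportion * proportion
        # Weight the group's impurity (1 - sum of squared proportions)
        # by the group's relative size.
        gini += (1.0 - score) * (size / n_instances)
    return gini
```

For a perfect split, such as gini_index([[[1, 0], [1, 0]], [[1, 1], [1, 1]]], [0, 1]), this returns 0.0, while a completely mixed two-class split returns 0.5; lower values therefore indicate purer subsets.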