
To become familiar with the Algorithm::DecisionTree module, work through the following steps:

  (1)    Run the 

               training_data_generator.pl

         script to create your training data. First run the
         script as it is; then make a copy of the param.txt
         file, modify the copy as you wish, and rerun the
         script with your version of param.txt.


  (2)    Next run the 

                construct_dt_and_classify_one_sample.pl

         script as it is.  

         HIGHLY RECOMMENDED:  Always turn on the debug1 option
                              in the call to the constructor
                              when experimenting with a training
                              datafile for the first time.

         Now modify the test sample in this script and see
         what classification result you get for it.  Next,
         run the script on the training datafile that you
         created in (1).  For that, your test samples must
         use the feature names and values declared in your
         own parameter file.


  (3)    If your decision tree is going to be very large, you may
         need to change the values of the following constructor 
         parameters in the experiments in (2) above:

                    max_depth_desired

                    entropy_threshold

         The first parameter, max_depth_desired, limits how
         deep the tree may grow below the root node.  The
         second parameter, entropy_threshold, sets the
         resolution in entropy space: roughly speaking, a
         feature test is used at a node only if the entropy
         reduction it achieves exceeds this threshold.  The
         smaller the value of the first parameter and the
         larger the value of the second, the smaller the
         decision tree.


  (4)    Now run the test data generator script by invoking 

                generate_test_data.pl

         As supplied, the script writes out 20 samples for
         testing, but you can change that number to anything
         you wish.

         The test data is written to a file without the class
         labels, since those are what the classifier is supposed
         to predict.  The class labels are written to a separate
         file whose name you can specify in the above script.
         As currently programmed, the name of this file is

                test_data_class_labels.dat

         By comparing the class labels returned by the classifier 
         with the class labels in this file, you can assess the 
         accuracy of the classifier.


  (5)    Finally, run the classifier on the test datafile by
         invoking

                classify_test_data_in_a_file.pl  training.dat  testdata2.dat  out.txt

         Note carefully the three arguments you must supply to
         the script: the first names the file holding the
         training data, the second the file holding the test
         data, and the third the file to which the
         classification results will be written.
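
  For readers curious about what max_depth_desired and
entropy_threshold control in (3), and how the accuracy check
in (4) works, here is a minimal Python sketch of an ID3-style
decision tree over symbolic features.  It is NOT the
implementation inside Algorithm::DecisionTree: the function
names, the tree representation, the exact stopping rules,
and the feature names in the sample data are all assumptions
made for illustration.

```python
import math
from collections import Counter

# Illustrative sketch only; not Algorithm::DecisionTree's algorithm or API.
# A tree is either a class label (leaf) or a tuple
# (feature, {feature_value: subtree}, default_label).

def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(samples, labels, feature):
    """Weighted average entropy of the subsets produced by testing `feature`."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[feature], []).append(label)
    return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

def build_tree(samples, labels, features, depth=0,
               max_depth_desired=5, entropy_threshold=0.01):
    """Grow a tree; `samples` are dicts mapping feature name -> value."""
    majority = Counter(labels).most_common(1)[0][0]
    node_entropy = entropy(labels)
    # Stop at the depth limit, at a (nearly) pure node, or when features run out.
    if depth >= max_depth_desired or node_entropy <= entropy_threshold or not features:
        return majority
    best = min(features, key=lambda f: split_entropy(samples, labels, f))
    # Reject the test if it does not reduce entropy by more than the threshold.
    if node_entropy - split_entropy(samples, labels, best) <= entropy_threshold:
        return majority
    remaining = [f for f in features if f != best]
    children = {}
    for value in {s[best] for s in samples}:
        idx = [i for i, s in enumerate(samples) if s[best] == value]
        children[value] = build_tree([samples[i] for i in idx],
                                     [labels[i] for i in idx],
                                     remaining, depth + 1,
                                     max_depth_desired, entropy_threshold)
    return (best, children, majority)

def classify(tree, sample):
    """Follow feature tests down to a leaf; unseen values fall back to the default."""
    while isinstance(tree, tuple):
        feature, children, default = tree
        if sample.get(feature) not in children:
            return default
        tree = children[sample[feature]]
    return tree

def accuracy(predicted, true_labels):
    """Fraction of predictions that match the held-out class labels."""
    return sum(p == t for p, t in zip(predicted, true_labels)) / len(true_labels)

# Made-up training data with two symbolic features.
train = [
    {"smoking": "heavy", "exercising": "never"},
    {"smoking": "heavy", "exercising": "regularly"},
    {"smoking": "never", "exercising": "never"},
    {"smoking": "never", "exercising": "regularly"},
]
train_labels = ["high_risk", "high_risk", "low_risk", "low_risk"]
tree = build_tree(train, train_labels, ["smoking", "exercising"])

# Test samples are classified without their labels; the labels are kept apart.
test = [{"smoking": "never", "exercising": "regularly"},
        {"smoking": "heavy", "exercising": "never"}]
test_labels = ["low_risk", "high_risk"]
predicted = [classify(tree, s) for s in test]
print(predicted, accuracy(predicted, test_labels))  # ['low_risk', 'high_risk'] 1.0
```

In this sketch, lowering max_depth_desired or raising
entropy_threshold makes build_tree stop splitting earlier,
which yields a smaller tree, possibly at some cost in
accuracy.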


=======================================================================


FOR THE CASE OF VERY LARGE DECISION TREES:


  Large decision trees can take a very long time to create.
If that is true of your application, having to construct a
decision tree afresh every time you want to classify
something can quickly become tiresome.  In that case,
consider storing your decision tree in a disk file and then
using the disk-stored decision tree for all subsequent
classification work.  The following scripts in this
directory:

      store_dt_on_disk.pl

      classify_from_disk_stored_dt.pl

show you how you can do that.
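
  The two scripts above are where to look for how the module
itself does this.  As a language-neutral illustration of the
idea only, the Python sketch below saves an already-built
tree with the standard pickle module and reloads it for
classification.  The tree representation, the classify()
helper, and the file name are assumptions made for
illustration; this is not the file format used by
store_dt_on_disk.pl.

```python
import os
import pickle
import tempfile

# Illustrative sketch only; not the on-disk format of Algorithm::DecisionTree.
# A tree is either a class label (leaf) or (feature, {value: subtree}, default).

def classify(tree, sample):
    """Walk the tree down to a leaf label; unseen values use the node's default."""
    while isinstance(tree, tuple):
        feature, children, default = tree
        if sample.get(feature) not in children:
            return default
        tree = children[sample[feature]]
    return tree

def store_tree(tree, path):
    """Serialize the tree once, right after the slow construction step."""
    with open(path, "wb") as fh:
        pickle.dump(tree, fh)

def load_tree(path):
    """Reload the stored tree in later classification sessions."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# A small hand-built tree standing in for one that is expensive to construct.
tree = ("smoking", {"heavy": "high_risk", "never": "low_risk"}, "low_risk")

path = os.path.join(tempfile.mkdtemp(), "decision_tree.pkl")  # hypothetical file name
store_tree(tree, path)
restored = load_tree(path)
print(classify(restored, {"smoking": "heavy"}))  # high_risk
```

Because pickle.load can execute arbitrary code embedded in a
malicious file, only load tree files that you created
yourself.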
