Random Forest Classifier


I have just begun to use Python scripts in Unscrambler 11, and I have created my first Random Forest classification model. I first created the model with “RandomForestClassifierBuildModel”, then applied it to data with “RandomForestClassifierClassify”. Apparently all went well, but when I checked the results I found a small but annoying issue.

Unlike the usual Unscrambler classifiers, the RF classifier did NOT list the categories in the predicted class vector in the same order as in the true class vector. Therefore, not only were different markers assigned to the same class in scatter plots (a lesser problem), but when I created the confusion matrix using the “Contingency Table” tool, the correctly classified samples were not on the main diagonal as they should be.

OK, I could rearrange them by hand, but that would not be an optimal solution, especially when many classes are involved.

I have tried to change the order of the categories using “Category Property”, but it did not work, because it changed the predicted class of all samples.

In other words, samples are not associated with a particular label, but with a particular position in the modality list. If you invert, say, the order of labels “A” and “B” in the list, all “A” samples become “B” and vice versa.

Therefore, I don’t know how to fix this issue. Do you have any suggestions?
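(As an aside, the mismatch can be worked around outside the Contingency Table tool by fixing one label order shared by both axes, so correct classifications always land on the main diagonal. A minimal Python sketch; the class vectors here are illustrative placeholders, not actual Unscrambler output:)

```python
# Build a confusion matrix with an explicit, shared label order,
# so row i and column i always refer to the same class.
ref_classes  = ["A", "A", "B", "B", "C"]   # placeholder reference classes
pred_classes = ["A", "B", "B", "B", "C"]   # placeholder predicted classes

labels = sorted(set(ref_classes) | set(pred_classes))  # one fixed order for both axes
index = {lab: i for i, lab in enumerate(labels)}

matrix = [[0] * len(labels) for _ in labels]
for ref, pred in zip(ref_classes, pred_classes):
    matrix[index[ref]][index[pred]] += 1   # rows: reference, columns: predicted

for lab, row in zip(labels, matrix):
    print(lab, row)
```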

  1. External Admin

    The predicted class vector output by the Random Forest Classify script is ordered so that it corresponds to the rows of the X matrix used as input for the classification: the class in the first row corresponds to the sample in the first row of the X matrix, the predicted class in the second row corresponds to the second row (sample) of the X matrix, and so on.

    Although the contingency table can be used to create a confusion matrix by comparing the predicted classes from a classification with the corresponding reference classes, it is not designed solely for this purpose, so its output will not always place the correctly classified samples on the main diagonal. One example is when you have more reference classes than the model predicted. Say the original model was built on samples representing 4 classes, and during prediction on the same samples the model did not classify any sample as class 3. In that case, class 3 would be missing from one dimension of the contingency table, resulting in either a 3 × 4 or a 4 × 3 matrix. The confusion matrix, on the other hand, is always square: the same number of rows and columns.
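    This behaviour can be reproduced in plain Python (the class vectors below are hypothetical, not Unscrambler output): a table whose axes list only the values actually observed on each axis comes out 4 × 3 here, while forcing the full class list on both axes gives the square 4 × 4 confusion matrix.

    ```python
    from collections import Counter

    # Hypothetical example: 4 reference classes, but the model never predicts "3".
    reference = ["1", "2", "3", "4", "1", "3"]
    predicted = ["1", "2", "4", "4", "1", "2"]

    counts = Counter(zip(reference, predicted))

    # Contingency-table style: each axis lists only the values observed on that axis.
    ref_levels  = sorted(set(reference))   # ["1", "2", "3", "4"]
    pred_levels = sorted(set(predicted))   # ["1", "2", "4"]  ->  4 x 3 table
    contingency = [[counts[(r, p)] for p in pred_levels] for r in ref_levels]

    # Confusion-matrix style: both axes use the full class list -> always square.
    all_levels = sorted(set(reference) | set(predicted))
    confusion = [[counts[(r, p)] for p in all_levels] for r in all_levels]

    print(len(contingency), "x", len(contingency[0]))  # 4 x 3
    print(len(confusion), "x", len(confusion[0]))      # 4 x 4
    ```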

    A script that summarizes classification results is currently in the works, with outputs including a confusion matrix and other tools for evaluating the performance of a classification model. It will be shared in the Camo Community and will be applicable to any classification model, as the input will be a predicted class vector and a reference class vector.

  2. 28/01/2020 at 9:44 am

    Thanks for your help. I am looking forward to the release of the script for summarizing classification results; it will be very useful.

    Indeed, I know that contingency tables are not the optimal solution, but at present they are the only option in Unscrambler, which lacks a dedicated tool for comparing categorical variables. Or, to be fair, such a tool must exist internally, because when you create an LDA or SVM model it gives you all the classification statistics.

    However, that tool is not directly accessible to the user, and when you validate such models on a new data set you have to compare actual and predicted classes manually. This is rather awkward, especially if the data set is large. For this reason, I often use contingency analysis, cautiously, for matching predicted and reference classes.

  3. 23/09/2020 at 7:37 am

    A new Python script for classification performance metrics is now available. It includes generation of a confusion matrix, accuracy, sensitivity, specificity, etc. Access it here: https://community.camo.com/?p=2120#classification
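    For reference, metrics of this kind can also be computed directly from a predicted and a reference class vector. The sketch below is illustrative only (it is not the Camo Community script) and uses made-up class vectors:

    ```python
    # Overall accuracy plus per-class sensitivity and specificity
    # from two class vectors. Illustrative example data only.
    reference = ["A", "A", "B", "B", "C", "C"]
    predicted = ["A", "B", "B", "B", "C", "A"]

    labels = sorted(set(reference) | set(predicted))
    n = len(reference)

    accuracy = sum(r == p for r, p in zip(reference, predicted)) / n

    for lab in labels:
        tp = sum(r == lab and p == lab for r, p in zip(reference, predicted))
        fn = sum(r == lab and p != lab for r, p in zip(reference, predicted))
        fp = sum(r != lab and p == lab for r, p in zip(reference, predicted))
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        print(lab, round(sensitivity, 2), round(specificity, 2))
    ```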
