Random forest importance

6/19/2023

In the field of bioinformatics, the development of computational methods for predicting clinical outcomes using profiling datasets with a large number of variables has drawn great interest. In such datasets, the sample sizes tend to be very small compared to the number of predictors (genes), resulting in the n ≪ p issue. Moreover, the existence of complex unknown correlation structures among predictors makes prediction and feature extraction more difficult. Therefore, the prediction task has been formulated as a classification problem combined with feature representations, and related work has tried to solve the problem using machine learning approaches such as random forests [1, 2], neural networks [3], sparse linear models [4, 5] and support vector machines [6]. While the primary goal of these methods is to achieve high classification accuracy, effort has also been put into learning effective feature representations. The literature shows that among machine learning techniques, random forests (RF) [7] have been an excellent tool for learning feature representations [8, 9], given their robust classification power and easily interpretable learning mechanism. This can be useful in building robust predictive models, especially when the underlying structures in the feature space are complex and unknown.

Classification methods have also been developed that consider known functional links between features. For example, a variant of the random forest method has been proposed in which feature sub-sampling is conducted according to the spatial information of genes on a known functional network [10]. Objective functions of the support vector machine and logistic regression have been modified by adding relational penalty terms, again based on known functional information [11, 12, 13].
In predictive model development, gene expression data is associated with the unique challenge that the number of samples (n) is much smaller than the number of features (p). This "n ≪ p" property has kept classification of gene expression data from benefiting from deep learning techniques, which have proved powerful under "n > p" scenarios in other application fields, such as image classification. Further, the sparsity of effective features with unknown correlation structures in gene expression profiles brings more challenges for classification tasks. To tackle these problems, we propose a new classifier named Forest Deep Neural Network (fDNN), which integrates the deep neural network architecture with a supervised forest feature detector. Using this built-in feature detector, the method is able to learn sparse feature representations and feed them into a neural network to mitigate the overfitting problem.

Simulation experiments and real data analyses using two RNA-seq expression datasets are conducted to evaluate fDNN's capability. The method proves to be a useful addition to current predictive models, with better classification performance and more meaningful selected features than ordinary random forests and deep neural networks.
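To make the forest-then-network idea more concrete, here is a minimal sketch in Python with scikit-learn. It is not the fDNN implementation. It assumes that the forest's "representation" of a sample is simply the class predicted by each tree, uses scikit-learn's MLPClassifier as a stand-in for the deep neural network, and runs on synthetic data. All sizes, hyperparameters and the helper name forest_representation are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic "n << p" data (assumption): 120 samples, 1000 features,
# of which only the first 10 carry signal.
n_samples, n_features, n_informative = 120, 1000, 10
X = rng.normal(size=(n_samples, n_features))
y = (X[:, :n_informative].sum(axis=1) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: fit a supervised forest that acts as the feature detector.
forest = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)
forest.fit(X_train, y_train)


def forest_representation(fitted_forest, data):
    """Return an (n_samples, n_trees) matrix of per-tree class predictions."""
    return np.column_stack(
        [tree.predict(data) for tree in fitted_forest.estimators_]
    )


Z_train = forest_representation(forest, X_train)
Z_test = forest_representation(forest, X_test)

# Step 2: train a small neural network on the forest representation
# instead of on the 1000 raw features.
net = MLPClassifier(hidden_layer_sizes=(64, 16), max_iter=2000, random_state=0)
net.fit(Z_train, y_train)
accuracy = net.score(Z_test, y_test)

# Random forest importance: rank the raw features by impurity-based importance.
top_features = np.argsort(forest.feature_importances_)[::-1][:10]

print("network input shape:", Z_train.shape)
print("test accuracy:", accuracy)
print("top-ranked raw features:", top_features)
```

The network here sees 200 binary inputs rather than 1000 raw expression values, which is the sense in which the forest reduces the input the network has to fit. The feature_importances_ ranking is scikit-learn's impurity-based importance and is shown only as one common way to read off which raw features the forest relied on.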
