Improvising Boruta feature selection in multivariate framework (QuantDev Brownbag)

Time	Wed, Mar 13, 2024, 12:00 pm to 1:00 pm (America/New_York)
Organizer	Population Research Institute
Location	HHD 101 conference room
Presenter(s)	Priyanka Paul, a doctoral student in Human Development and Family Studies (HDFS) at Penn State.
Description	With increasing access to large-scale datasets, it has become easier to build and test process models, but identifying the important moderators among a large pool of potentially influential variables remains challenging. Heterogeneity in the data further calls for more advanced analytic strategies. We introduce SEM-Boruta, a novel adaptation of the Boruta feature selection algorithm, and examine its robustness by applying it to simulated datasets within a multilevel multigroup framework. Boruta is built on random forests, a machine learning procedure based on recursive partitioning that is often used to identify potentially important predictors. By adapting Boruta to use Structural Equation Model (SEM) Trees (Brandmaier et al., 2013), which split a sample on covariates associated with differences in the parameters of an SEM, we convert it from a predictor selection algorithm into a moderator selection algorithm: a guided form of heterogeneity search. We present initial validation of SEM-Boruta’s robustness and offer insights into its ability to discern relevant features in complex data structures. This investigation also underscores the value of tools like SEM-Boruta for identifying heterogeneous subsets in a population and for gaining a more nuanced understanding of heterogeneity in research data.