Improvising Boruta feature selection in multivariate framework (QuanDev Brownbag)
Population Research Institute
Date March 13, 2024
Time 12:00 pm to 01:00 pm (America/New_York)
Location HHD 101 conference room
Presenter(s) Priyanka Paul, a doctoral student in HDFS at Penn State.

With increasing access to large-scale datasets, it is easier to create and test models of process, but identifying important moderators in a sea of possible influential variables can be quite challenging. Furthermore, the presence of heterogeneity in data necessitates advances in analytical strategies. We introduce SEM-Boruta, a novel adaptation of the Boruta feature selection algorithm, and explore its robustness through application on simulated datasets within a multilevel multigroup framework. Boruta utilizes Random Forest Classifiers, a machine learning procedure based on recursive partitioning that is often used to identify potentially important predictors. By adapting Boruta to use Structural Equation Model (SEM) Trees (Brandmaier et al., 2013), we convert it from a predictor selection algorithm into a moderator selection algorithm: a guided form of heterogeneity search. We present initial validation of SEM-Boruta’s robustness and provide insights into its ability to discern relevant features in complex data structures. This investigation also underscores the importance of tools like SEM-Boruta in identifying heterogeneous subsets in a population and provides a nuanced understanding of heterogeneity in research data.
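
The shadow-feature idea behind Boruta, as described above, can be illustrated with a minimal sketch. This is not the presenter's SEM-Boruta implementation and it does not use Random Forests or SEM Trees: as a loudly stated simplification, importance here is the absolute Pearson correlation between a feature and the outcome, so the example runs with the Python standard library alone. The function names (`pearson`, `shadow_hits`) and the simulated data are hypothetical, and the full Boruta procedure additionally applies a statistical test to the hit counts before confirming or rejecting a feature.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def shadow_hits(columns, y, n_rounds=50, seed=0):
    """Count how often each feature beats the best shadow feature.

    Stand-in importance: |Pearson r| with the outcome (an assumption made
    for this sketch; Boruta itself uses Random Forest importances).
    """
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_rounds):
        # Shadow features are permuted copies of the real features, so any
        # association they show with y arises by chance alone.
        best_shadow = 0.0
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs(pearson(shadow, y)))
        # A real feature scores a "hit" when it outperforms every shadow.
        for j, col in enumerate(columns):
            if abs(pearson(col, y)) > best_shadow:
                hits[j] += 1
    return hits


# Simulated data: feature 0 drives the outcome, features 1 and 2 are noise.
data_rng = random.Random(1)
n = 200
columns = [[data_rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
y = [2 * x0 + data_rng.gauss(0, 1) for x0 in columns[0]]

hits = shadow_hits(columns, y)
print(hits)
```

In this toy setting the informative feature should collect a hit in every round, while the noise features' counts depend on their chance correlation with the outcome. In the approach described in the abstract, the importance measure would instead come from SEM Trees, so that the selected variables are moderators rather than predictors.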