COBRA web application to benchmark linear regression models for catalyst optimization with few-entry datasets

by Zhen Cao, Laura Falivene, Albert Poater, Bholanath Maity, Ziyung Zhang, Gentoku Takasao, Sadeed Bin Sayed, Luigi Cavallo, et al.
Year: 2025

Extra Information

Cell Reports Physical Science.

Abstract

Multivariate linear regression (MLR) is a promising machine learning tool for catalyst engineering with small datasets. To become the standard in small and medium-sized laboratories, MLR predictors must work with minimal experimental data and provide reliable predictions. To evaluate MLR predictors trained on limited datasets, we developed an easy-to-use workflow to validate models before real-world application. This workflow was tested on 29 reaction classes, covering various reaction types (e.g., hydrogenation and cross-coupling) and experimental properties (e.g., yields and selectivities). We confirm that common metrics such as the coefficient of determination (R²) and the cross-validated R² (Q²) are useful starting points but insufficient to uncover all model weaknesses. The proposed workflow addresses these gaps with a systematic pipeline applicable to any MLR model. Additionally, a web application was developed to enable users to perform these validation tests online. This framework aims to standardize and simplify the evaluation of MLR predictors in catalyst research.
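
As an illustration of the two metrics named in the abstract, the following minimal Python sketch computes R² and a leave-one-out Q² for an ordinary least-squares MLR model. It is not the authors' workflow or the COBRA implementation. The function names, the choice of leave-one-out as the cross-validation scheme, the definition Q² = 1 − PRESS/TSS, and the synthetic 12-entry dataset are all assumptions made for illustration.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit ordinary least squares with an intercept; return the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predict responses for descriptor matrix X using fitted coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def r2(X, y):
    """Coefficient of determination on the training data."""
    resid = y - predict(fit_mlr(X, y), X)
    return 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)


def q2_loo(X, y):
    """Leave-one-out cross-validated R2 (assumed definition: 1 - PRESS/TSS)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # hold out entry i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i : i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic few-entry dataset (hypothetical): 12 catalysts, 2 descriptors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 2))
y_demo = 1.5 * X_demo[:, 0] - 0.8 * X_demo[:, 1] + rng.normal(scale=0.3, size=12)

r2_demo = r2(X_demo, y_demo)
q2_demo = q2_loo(X_demo, y_demo)
print(f"R2 = {r2_demo:.3f}, Q2 (LOO) = {q2_demo:.3f}")
```

With this definition, Q² can never exceed R² for a least-squares fit, so a large gap between the two is a first warning sign of overfitting. As the abstract notes, close agreement between them does not by itself establish that a model is reliable.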

Keywords

homogeneous catalysis; multivariate linear regression; steric descriptors; electronic descriptors; catalyst design; assessing regressor model