Posted on 6 Feb 2023 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.167569565.51141128/v1 | This is a preprint and has not been peer reviewed. Data may be preliminary.

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The
latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer
protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological
complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts,
making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein
interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological
and non-physiological complexes. Two combined predictors were then built: a simple consensus score generated from the best-performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier. Both approaches showed excellent performance, with areas under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming the individual scores
developed by different groups. Additionally, AlphaFold2 engines were shown to recall the physiological dimers with significantly
higher accuracy than the non-physiological set, lending support to the pertinence of our benchmark dataset. Optimizing the
combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising
strategy.
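
To illustrate how a consensus of interface scores can be compared against individual scores by ROC AUC, the following is a minimal sketch, not the procedure used in this study. It assumes that a higher score means "more likely physiological" and that the consensus is the mean of rank-normalized scores; the abstract does not specify how the consensus was formed, so that choice is an assumption. The method names and all numbers are invented toy data, and `roc_auc`, `rank_normalize` and `consensus` are hypothetical helpers.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counting 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_normalize(values):
    """Map values to [0, 1] by average rank, so that scores reported
    on different scales become comparable."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values; they share the average rank.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0
        i = j + 1
    denom = max(len(values) - 1, 1)
    return [r / denom for r in ranks]


def consensus(score_table):
    """Mean of rank-normalized scores across methods (assumed scheme)."""
    normalized = [rank_normalize(col) for col in score_table.values()]
    return [mean(per_complex) for per_complex in zip(*normalized)]


# Toy data (invented): 1 = physiological, 0 = non-physiological.
labels = [1, 1, 1, 0, 0, 0]
scores = {
    "method_a": [0.9, 0.4, 0.8, 0.5, 0.2, 0.1],
    "method_b": [12.0, 15.0, 3.0, 5.0, 4.0, 9.0],
    "method_c": [0.3, 0.6, 0.7, 0.2, 0.65, 0.1],
}

for name, column in scores.items():
    print(f"{name}: AUC = {roc_auc(labels, column):.3f}")

cons = consensus(scores)
print(f"consensus: AUC = {roc_auc(labels, cons):.3f}")
```

In this toy example each method misranks a different complex, so averaging their normalized ranks separates the two classes better than any single method does. Real interface scores are often correlated, so the gain in practice can be smaller.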