TY - JOUR
T1 - A deep learning framework for protein-to-metal binding prediction using protein language models
AU - Shishir, Fairuz Shadmani
AU - Sarker, Bishnu
AU - Rahman, Farzana
AU - Shomaji, Sumaiya
PY - 2025/7/29
Y1 - 2025/7/29
N2 - This study presents an end-to-end deep learning framework for protein-to-metal-ion binding prediction, a critical task in understanding protein function, structural stability, and metal transport mechanisms. A binding site is a residue location in a protein sequence where a metal binds to a protein. Manual curation of metal binding sites is a tedious process involving mining through research articles, making it expensive, laborious, and time-consuming. Therefore, developing a computational pipeline is essential to predict metal ion binding of unannotated proteins. Existing computational methods have significant shortcomings: they fail to capture the long-term dependency of the residues, lack positional information, and rely on a pre-determined set of residues and metal ions. In this paper, we propose a metal-ion binding prediction pipeline using a large language model, emphasizing 1) the comparative performance of five state-of-the-art protein language models (pLMs), 2) the impact of positional encoding of binding sites, and 3) the comparison with classical machine learning techniques. A 10-fold cross-validation evaluation yielded a Matthews Correlation Coefficient (MCC) of 0.89, along with precision, recall, and F1 scores exceeding 95% for the six most extensively studied metal ions reported in the literature.
AB - This study presents an end-to-end deep learning framework for protein-to-metal-ion binding prediction, a critical task in understanding protein function, structural stability, and metal transport mechanisms. A binding site is a residue location in a protein sequence where a metal binds to a protein. Manual curation of metal binding sites is a tedious process involving mining through research articles, making it expensive, laborious, and time-consuming. Therefore, developing a computational pipeline is essential to predict metal ion binding of unannotated proteins. Existing computational methods have significant shortcomings: they fail to capture the long-term dependency of the residues, lack positional information, and rely on a pre-determined set of residues and metal ions. In this paper, we propose a metal-ion binding prediction pipeline using a large language model, emphasizing 1) the comparative performance of five state-of-the-art protein language models (pLMs), 2) the impact of positional encoding of binding sites, and 3) the comparison with classical machine learning techniques. A 10-fold cross-validation evaluation yielded a Matthews Correlation Coefficient (MCC) of 0.89, along with precision, recall, and F1 scores exceeding 95% for the six most extensively studied metal ions reported in the literature.
U2 - 10.1109/TCBBIO.2025.3595446
DO - 10.1109/TCBBIO.2025.3595446
M3 - Article
SN - 2998-4165
JO - IEEE Transactions on Computational Biology and Bioinformatics
JF - IEEE Transactions on Computational Biology and Bioinformatics
ER -