TY - JOUR
T1 - CathepsinDL
T2 - Deep Learning-Driven Model for Cathepsin Inhibitor Screening and Drug Target Identification
AU - Junaid Anwar Qader, Mohammed
AU - Mohan Sah, Chandra
AU - Kumar Sahoo, Tapan
AU - Kumar Majhi, Santosh
AU - Mishra, Kaushik
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Cathepsins are lysosomal proteases that are crucial for protein breakdown, bone remodelling and antigen processing, and whose dysregulation leads to diseases such as cancer, osteoporosis and neurodegenerative disorders, making them important drug targets. This study introduces a 1D Convolutional Neural Network-based classification model to enhance screening for potential cathepsin inhibitors, leading to more efficient selection of potential targets for experimental validation. The dataset was gathered from BindingDB and ChEMBL for the targets Cathepsin B, S, D and K, together with the half-maximal inhibitory concentration (IC50) values of their inhibitors, which were categorized into four classes (potent, active, intermediate, and inactive) based on IC50 ranges. The inhibitor ligands were collected in Simplified Molecular Input Line Entry System (SMILES) notation and converted to molecular descriptors using the RDKit library. Because of the large number of features, feature selection techniques such as Recursive Feature Elimination (RFE), variance thresholding, and correlation analysis were employed to refine and reduce the initial molecular descriptor set. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address class imbalance. The proposed model achieved high classification accuracies of 97.67% ± 0.54% for Cathepsin B, 90.69% ± 0.57% for Cathepsin S, 97.27% ± 0.23% for Cathepsin D, and 92.03% ± 1.07% for Cathepsin K, highlighting the effectiveness of feature selection and deep learning in ligand classification. This approach enhances the identification of potential drug targets for in vitro and in vivo testing, making the process more cost-effective and time-efficient.
AB - Cathepsins are lysosomal proteases that are crucial for protein breakdown, bone remodelling and antigen processing, and whose dysregulation leads to diseases such as cancer, osteoporosis and neurodegenerative disorders, making them important drug targets. This study introduces a 1D Convolutional Neural Network-based classification model to enhance screening for potential cathepsin inhibitors, leading to more efficient selection of potential targets for experimental validation. The dataset was gathered from BindingDB and ChEMBL for the targets Cathepsin B, S, D and K, together with the half-maximal inhibitory concentration (IC50) values of their inhibitors, which were categorized into four classes (potent, active, intermediate, and inactive) based on IC50 ranges. The inhibitor ligands were collected in Simplified Molecular Input Line Entry System (SMILES) notation and converted to molecular descriptors using the RDKit library. Because of the large number of features, feature selection techniques such as Recursive Feature Elimination (RFE), variance thresholding, and correlation analysis were employed to refine and reduce the initial molecular descriptor set. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address class imbalance. The proposed model achieved high classification accuracies of 97.67% ± 0.54% for Cathepsin B, 90.69% ± 0.57% for Cathepsin S, 97.27% ± 0.23% for Cathepsin D, and 92.03% ± 1.07% for Cathepsin K, highlighting the effectiveness of feature selection and deep learning in ligand classification. This approach enhances the identification of potential drug targets for in vitro and in vivo testing, making the process more cost-effective and time-efficient.
UR - https://www.scopus.com/pages/publications/105018046793
UR - https://www.scopus.com/pages/publications/105018046793#tab=citedBy
U2 - 10.1109/ACCESS.2025.3617246
DO - 10.1109/ACCESS.2025.3617246
M3 - Article
AN - SCOPUS:105018046793
SN - 2169-3536
VL - 13
SP - 173695
EP - 173711
JO - IEEE Access
JF - IEEE Access
ER -