TY - JOUR
T1 - Natural-Language Processing (NLP) based feature extraction technique in Deep-Learning model to predict the Blood-Brain-Barrier permeability of molecules
AU - Singh, Ravi
AU - Ghosh, Powsali
AU - Ganeshpurkar, Ankit
AU - Anand, Asha
AU - Swetha, Rayala
AU - Singh, Ravi Bhushan
AU - Kumar, Dileep
AU - Singh, Sushil Kumar
AU - Kumar, Ashok
N1 - Publisher Copyright: © 2023 Wiley-VCH GmbH.
PY - 2023/10
Y1 - 2023/10
AB - Blood-Brain-Barrier (BBB) permeability is one of the critical factors in the success or failure of CNS drug development. The most accurate method of measuring BBB permeability involves clinical experiments, which are labour-intensive and time-consuming. Thus, numerous efforts have been made to use artificial intelligence (AI) to predict molecules' BBB permeability. Most previous models are based on calculated descriptors and molecular fingerprints. In the present work, we have developed an NLP-based feature extraction technique in Deep-Learning models to predict BBB permeability. We used the B3DB database and generated SELFIES to extract features from the molecules. We employed word-level and N-gram tokenization to represent words as numeric vectors. The extracted features were fed into several Artificial Neural Network (ANN) and Bi-directional Long Short-Term Memory (LSTM) models. The ANN-10 model, built using an ANN and 6-gram tokenization, performed best on the independent test set. Its accuracy, precision, recall, F1, specificity and ROC AUC scores were 0.89, 0.91, 0.91, 0.91, 0.85 and 0.90, respectively. Thus, the developed model can be used for the early screening of CNS drugs.
UR - https://www.scopus.com/pages/publications/85168574869
U2 - 10.1002/minf.202200271
DO - 10.1002/minf.202200271
M3 - Article
AN - SCOPUS:85168574869
SN - 1868-1743
VL - 42
JO - Molecular Informatics
JF - Molecular Informatics
IS - 10
M1 - 2200271
ER -