TY - GEN
T1 - Secure Code Generation with CodeT5
T2 - 3rd World Conference on Information Systems for Business Management, ISBM 2024
AU - Nayak, Sanjana Ganesh
AU - Rao, Anirudha
AU - Bhat, B. L. Siddhartha
AU - Balachandra, Mamatha
AU - Prakash, Om
AU - Singh, Varinder Pratap
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - In recent years, generative models, especially large language models built on recent advances in AI, have shown promise for numerous applications. This work illustrates how a large language model, CodeT5, can enhance secure text-to-code generation. CodeT5 is a unified pre-trained encoder–decoder transformer model that benefits from the semantic hints carried by developer-assigned identifier names, improving code understanding and promoting trustworthy text-to-code generation. It addresses gaps in prior approaches by incorporating an identifier-aware pre-training task and by connecting natural language to programming language abstractions through user-written code comments. To enhance code security, CodeT5 is trained on a large CVE dataset, leveraging code snippets from before and after security patches. This hybrid paradigm promotes secure coding as well as AI-enhanced software engineering.
UR - https://www.scopus.com/pages/publications/105009904712
UR - https://www.scopus.com/pages/publications/105009904712#tab=citedBy
U2 - 10.1007/978-981-96-1206-2_4
DO - 10.1007/978-981-96-1206-2_4
M3 - Conference contribution
AN - SCOPUS:105009904712
SN - 9789819612055
T3 - Smart Innovation, Systems and Technologies
SP - 29
EP - 39
BT - Information Systems for Intelligent Systems - Proceedings of ISBM 2024
A2 - So-In, Chakchai
A2 - Londhe, Narendra S.
A2 - Bhatt, Nityesh
A2 - Kitsing, Meelis
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 12 September 2024 through 13 September 2024
ER -