TEST-DRIVEN DEVELOPMENT FOR CODE-GENERATED BACKENDS: A SCHOLARSHIP EVALUATION SYSTEM CASE STUDY
Abstract
Despite significant advances in Large Language Model (LLM) optimization techniques, challenges remain in applying these models to real-world software development tasks. Noble Saji Mathews and Meiyappan Nagappan's Tgen framework demonstrates that Test-Driven Development (TDD) can enhance the functionality and robustness of code generated by LLMs. Building on that work, this study extends the application of TDD to code generation for backend systems, using the Pertamina University scholarship evaluation system as a case study. Pertamina University manages multiple scholarship programs and over 400 awardees, necessitating an efficient scholarship management system. This research evaluates how TDD improves LLM-generated backend code and explores which kinds of test code best realize the potential of LLMs. We gather functional requirements from stakeholders and the project manager, then derive test cases from those requirements. The study comprises three analyses. The first compares LLM-generated backend code produced from unstandardized test code with code produced from standardized test code, where standardized test code follows specific implementation rules. The second examines the ability of LLMs to turn test cases into test code and then generate backend code from that test code, compared against backend code generated from human-written test code. The third is performance testing that compares LLM-generated backend code with human-written backend code to assess whether LLMs can outperform manual development. Code generated with the TDD approach passes all given test cases (100%), and TDD results improve by 28.5% when standardized test code is used. Code generated from LLM-written test code passes only 41% of the test cases, 59% lower than code generated from human-written test code. Performance testing indicates that LLM-generated code achieves competitive throughput, response time, and error rate compared with human-written code. Internal quality is assessed with static code analysis tools: the LLM-generated backend code reports zero security, reliability, or maintainability issues, whereas the human-written code reports five maintainability issues. These findings show that TDD with well-written tests improves the code-generation capability of LLMs. However, the ability of LLMs to produce their own test cases and test code still falls short of expectations, making it unrealistic for programmers to rely solely on LLMs to generate backend code without manual intervention.