Authors: Uyaguari, Fernando; Acuña, Silvia Teresita; Castro, John W.; Dieste, Óscar; Juristo, Natalia
Record dates: 2025-10-10, 2025-10-10
Year: 2025
ISSN: 0950-5849
Handle: https://hdl.handle.net/20.500.12740/23371

Context: Test-driven development (TDD) is a software development technique studied empirically over the last few decades. There are several systematic literature reviews (SLRs) on TDD. The reliability of these studies should not be taken for granted because SLRs are highly dependent on the context and researcher decision-making.
Objective: This study determines, analyses and synthesizes the limited overlap between SLRs on TDD and its influence on the conclusions and results with respect to the code quality and developer productivity response variables.
Method: A tertiary study was conducted to source SLRs on TDD from the scientific literature, and the primary studies referenced in each SLR were analysed. We compared SLRs with similar objectives, SLRs with similar response variables, and all SLRs. We analysed the differences between the selected primary studies and their impact on the conclusions and results.
Results: The overlap between SLRs with similar response variables (54 %) is greater than between SLRs with similar objectives (36 %). Only three per cent of the primary studies are included in all eight analysed SLRs. Conclusions regarding external quality and productivity may vary across the SLRs on TDD. While we found that SLR results are similar, these results may differ when authors classify primary studies by experiments and case studies.
Conclusion: SLRs with similar response variables tend to be more repeatable than SLRs with similar objectives and SLRs addressing the same topic. The SLR authors’ criteria with respect to the consistency of evidence may influence the conclusions of SLRs on TDD. The results of SLRs where all primary studies count equally appear to be consistent.
The SLR authors’ criteria for selecting primary studies may influence the results classified by case studies and experiments.

© 2025 Elsevier B.V., All rights reserved.
Access: restrictedAccess
Keywords: RELIABILITY; REPEATABILITY; SYSTEMATIC LITERATURE REVIEW; TEST-DRIVEN DEVELOPMENT; COMPUTER SOFTWARE SELECTION AND EVALUATION; SOFTWARE RELIABILITY; SOFTWARE TESTING; CODE DEVELOPERS; CODE QUALITY; DECISIONS MAKINGS; EXTERNAL QUALITY; SCIENTIFIC LITERATURE; SOFTWARE DEVELOPMENT TECHNIQUES; TERTIARY STUDY; TEST DRIVEN DEVELOPMENT; SOFTWARE DESIGN
Title: Reliability of systematic literature reviews on test-driven development
Type: Review
DOI: https://doi.org/10.1016/j.infsof.2025.107762