1 Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, PA, USA.
2 Google Search AI, Google, Mountain View, CA, USA.
10.22038/abjs.2025.84896.3874
Abstract
Objectives: Large language models (LLMs) may improve the process of conducting systematic literature reviews. Our aim was to evaluate the utility of one popular LLM chatbot, Chat Generative Pre-trained Transformer (ChatGPT), in systematic literature reviews when compared to traditionally conducted reviews.
Methods: We identified five systematic reviews published in the Journal of Bone and Joint Surgery from 2021 to 2022. We retrieved the clinical questions, methodologies, and included studies for each review. We evaluated ChatGPT’s performance on three tasks. (1) For each published systematic review’s core clinical question, ChatGPT designed a relevant database search strategy. (2) ChatGPT screened the abstracts of those articles identified by that search strategy for inclusion in a review. (3) For one systematic review, ChatGPT reviewed each individual manuscript identified after screening to identify those that fit inclusion criteria. We compared the performance of ChatGPT on each of these three tasks to the previously published systematic reviews.
Results: The search strategies designed by ChatGPT captured a median of 91% (interquartile range [IQR], 84% to 94%) of the articles included in the published systematic reviews. After screening the abstracts of the retrieved articles, ChatGPT captured a median of 75% (IQR, 70% to 79%) of the articles included in the published systematic reviews. On in-depth screening of manuscripts, ChatGPT captured only 55% of target publications; however, this improved to 100% on review of the manuscripts that ChatGPT identified on this step. Qualitative analysis of ChatGPT’s performance highlighted the importance of prompt design and engineering.
Conclusion: Using published reviews as a gold standard, ChatGPT demonstrated the ability to replicate fundamental tasks of orthopedic systematic review. Cautious, supervised use of this general-purpose LLM may aid the process of systematic literature review. Further study and discussion of the role of LLMs in literature review are needed.
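The capture percentages reported in the Results can be read as a recall-style measure: the share of articles in a published review that also appear in the set ChatGPT retrieved or retained, summarized across reviews as a median and IQR. The following is a minimal sketch of that arithmetic, not the authors' analysis code. The function names, the article identifiers, and the choice of the "inclusive" quartile method are all assumptions made for illustration.

```python
from statistics import median, quantiles


def capture_rate(retrieved, reference):
    """Fraction of the reference review's articles found in `retrieved`.

    `reference` is the gold-standard list (articles in the published review);
    `retrieved` is what the LLM-assisted step returned or retained.
    """
    reference = set(reference)
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & set(retrieved)) / len(reference)


def summarize(rates):
    """Return (median, Q1, Q3) for a list of per-review capture rates.

    The quartile method is an assumption; the source does not state
    which convention was used.
    """
    q1, _, q3 = quantiles(rates, n=4, method="inclusive")
    return median(rates), q1, q3


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only (not study data).
    published = ["art-01", "art-02", "art-03", "art-04"]
    llm_found = ["art-01", "art-02", "art-09"]
    print(f"capture rate: {capture_rate(llm_found, published):.0%}")
```

Applying `capture_rate` to each of the reviews and passing the resulting list to `summarize` would yield figures of the same form as those quoted in the abstract.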
Yao, J. J., Lopez, R. D., Rizk, A. A., Aggarwal, M. and Namdari, S. (2025). Evaluation of a Popular Large Language Model in Orthopedic Literature Review: Comparison to Previously Published Reviews. The Archives of Bone and Joint Surgery. doi: 10.22038/abjs.2025.84896.3874