Detecting AI-authored academic papers
Detection of AI-authored academic papers can now exceed 99% accuracy, thanks to an innovative XGBoost-based model from Desaire’s team.
Desaire and her research team have shed new light on how to differentiate academic content authored by AI from content written by humans. This groundbreaking method adds another layer of detection and security, crucial in an era when AI language models like ChatGPT are increasingly used.

Introduction
As AI models like ChatGPT, developed by OpenAI, become more common, the ability to distinguish between AI-generated and human-authored texts is crucial. Desaire’s team focuses on this challenge in the academic realm, where they’ve leveraged supervised classification methods to create an innovative solution.
The Issue at Hand
ChatGPT’s capacity for wide-ranging writing tasks, from grammar corrections to research report generation, made it popular: it garnered over 100 million users within two months of its release. The potential misuse of such technology, especially for academic writing, calls for effective detection strategies to distinguish between human and AI outputs.
Current Detection Techniques
Existing detection techniques span a broad spectrum, from ‘zero-shot’ methods that require minimal training to training-intensive deep learning approaches such as fine-tuned RoBERTa models. While both have shown promise, they face difficulties when distinguishing between academic writing by humans and ChatGPT’s outputs.
Prior studies, focusing on online data sources like Reddit or Wikipedia, demonstrated that the RoBERTa detector could correctly identify authorship (human or ChatGPT) with high accuracy. However, the less emotive language of academic writing poses a unique challenge that these detectors struggle to overcome.
Desaire’s team cites studies like those by Gao et al., which showed that human reviewers correctly identified AI from human-authored medical abstracts less than 70% of the time. The online RoBERTa-based tool, the GPT-2 Output Detector, fared somewhat better, with an accuracy of 82%.
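For readers who want to probe this baseline themselves, the sketch below queries a RoBERTa-based detector through the Hugging Face transformers library. The checkpoint name, “roberta-base-openai-detector”, is the publicly released model commonly associated with the online GPT-2 Output Detector; treating it as equivalent to the exact tool the paper tested is an assumption, not something the study states.

```python
# Minimal sketch: score a paragraph with the publicly available RoBERTa-based
# detector checkpoint (assumed to correspond to the online GPT-2 Output Detector).
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")

paragraph = (
    "We investigated the catalytic activity of the enzyme under varying pH "
    "conditions and observed a pronounced optimum near physiological values."
)

result = detector(paragraph)[0]
# The model returns a label (e.g., human- vs. machine-generated) and a confidence score.
print(result["label"], round(result["score"], 3))
```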
The Proposed Solution
Desaire and her team sought to answer two main questions: Can a leading approach effectively distinguish between AI and human academic science writing? And can a better classification strategy be devised? Their investigation centered on the GPT-2 Output Detector for its proven success, wide adoption, and thorough coverage in the literature.
Additionally, they sought to develop a method that uses a novel, pertinent training dataset to pinpoint a minimal set of human-identified features. The ultimate goal was to design a model that didn’t rely on deep learning but could discern the unique writing quirks of academic scientists.
The team’s strategy involved assembling a training dataset comprising 64 articles from the journal Science and 128 ChatGPT-generated examples. This ensured topic diversity and freedom from discipline-specific norms. After developing the model, they tested it against two sets containing human-authored articles and ChatGPT-generated essays, yielding over 1,200 test paragraphs.
Their discussion presents this as the first effective method to differentiate between human- and AI-generated academic writing. The features considered include diversity in sentence length, paragraph length, punctuation usage, and common word usage. They stress that their method isn’t a one-size-fits-all model but a template that can be applied to various domains, especially academic literature.
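As a rough illustration of this template, the sketch below extracts a few hand-crafted, paragraph-level features of the kinds named above and fits an off-the-shelf XGBoost classifier. The feature definitions, hyperparameters, and toy data are illustrative assumptions, not the authors’ exact feature set or training corpus.

```python
# Simplified sketch of a feature-based paragraph classifier: a handful of
# hand-crafted features (stand-ins for the categories the paper names) plus
# an off-the-shelf XGBoost model.
import re
import numpy as np
from xgboost import XGBClassifier

def paragraph_features(paragraph: str) -> list:
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    sentence_lengths = [len(s.split()) for s in sentences] or [0]
    words = paragraph.lower().split()
    return [
        len(sentences),                                    # paragraph length, in sentences
        float(np.std(sentence_lengths)),                   # diversity in sentence length
        paragraph.count(";") + paragraph.count(":"),       # punctuation usage
        paragraph.count("(") + paragraph.count(")"),       # parentheses, frequent in scientific prose
        sum(w in {"however", "but", "although"} for w in words),  # usage of selected common words
    ]

# Toy placeholder data; the real training set would be the labeled paragraphs
# drawn from the Science articles and the ChatGPT-generated documents.
train_paragraphs = [
    "However, the kinetics tell a different story; rate constants fell sharply (by an order of magnitude) above 40 degrees.",
    "The study examines enzymes. Enzymes are proteins that speed up reactions. They are important in many biological processes.",
]
train_labels = [1, 0]  # 1 = human-written, 0 = ChatGPT-generated

X = np.array([paragraph_features(p) for p in train_paragraphs])
y = np.array(train_labels)

model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

new_paragraph = "Although the sample size was small, the variance across replicates was striking; see the caveats below."
print(model.predict(np.array([paragraph_features(new_paragraph)])))  # paragraph-level prediction
```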
However, as a proof-of-concept study, Desaire and her team acknowledge that further research is needed. The goal is to assess the approach’s applicability across broader contexts, test it on larger datasets, and explore different types of academic writing.
The team also emphasizes the ‘arms race’ in the development of AI detectors, which necessitates more researchers joining the effort. They suggest expanding the current features or designing new ones for enhanced detection; for instance, they propose assigning scores to words based on how commonly they appear in language-model output.
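A minimal, hypothetical sketch of that word-scoring idea follows: each word gets a commonness score derived from a reference corpus, and those scores are averaged into a paragraph-level feature. The choice of reference corpus and the averaging step are illustrative assumptions, not details from the paper.

```python
# Sketch of a word-commonness feature: score words by their relative frequency
# in a reference corpus (e.g., ChatGPT-generated text), then average per paragraph.
from collections import Counter

def commonness_scores(reference_texts: list) -> dict:
    """Relative frequency of each word across the reference texts."""
    counts = Counter(w.lower() for text in reference_texts for w in text.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def mean_commonness(paragraph: str, scores: dict) -> float:
    """Average commonness score of the paragraph's words; unseen words score 0."""
    words = paragraph.lower().split()
    return sum(scores.get(w, 0.0) for w in words) / max(len(words), 1)

reference = ["the results suggest that the model performs well in most cases"]
scores = commonness_scores(reference)
print(mean_commonness("the model performs well", scores))  # one candidate feature value
```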
The study concludes that both aims were met: the team evaluated the GPT-2 Output Detector’s effectiveness and developed a method that surpasses it. The newly proposed approach shows greater accuracy at the document level, proving beneficial for academic writing, given a representative training dataset.
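Document-level accuracy implies some rule for rolling paragraph-level predictions up into a whole-document call. The snippet below assumes a simple majority vote purely for illustration; the paper’s actual aggregation rule may differ.

```python
# Sketch of paragraph-to-document aggregation via majority vote (assumed, not the paper's rule).
def document_label(paragraph_predictions: list) -> int:
    """Return 1 (human) if most paragraphs are predicted human, else 0 (ChatGPT)."""
    human_votes = sum(paragraph_predictions)
    return 1 if human_votes * 2 >= len(paragraph_predictions) else 0

print(document_label([1, 1, 0, 1, 1]))  # -> 1: the document is classified as human-written
```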
Conclusion
Desaire’s team provides a promising solution for detecting AI-authored academic literature, offering a practical alternative for researchers lacking deep learning expertise. This breakthrough might redefine the landscape of AI author detection in academic and specialized domains.
Resources
Desaire, H., Chua, A. E., Isom, M., Jarosova, R., & Hua, D. (2023). Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools. Cell Reports Physical Science, 101426. DOI: https://doi.org/10.1016/j.xcrp.2023.101426