The framework of Ghostbuster, our new state-of-the-art method for detecting AI-generated text.
Large language models like ChatGPT write impressively well, so well, in fact, that they have become a problem. Students have begun using these models to write their assignments, leading some schools to ban ChatGPT. These models are also prone to producing text with factual errors, so cautious readers may want to know whether generative AI tools have been used to write news articles or other sources before trusting them.
What can teachers and consumers do? Existing tools for detecting AI-generated text often perform poorly on data that differs from what they were trained on. Worse, if these detectors misclassify real human writing as AI-generated, they can jeopardize students whose genuine work is called into question.
Our recent paper introduces Ghostbuster, a state-of-the-art method for detecting AI-generated text. Ghostbuster works by computing the probability of generating each token in a document under several weaker language models, then combining functions of these probabilities as features for a final classifier. Ghostbuster does not need to know which model was used to generate a document, nor the probability of generating the document under that specific model. This property makes Ghostbuster particularly useful for detecting text potentially generated by an unknown or black-box model, such as the popular commercial models ChatGPT and Claude, for which probabilities are not available. We are particularly interested in ensuring that Ghostbuster generalizes well, so we evaluated it across a range of ways that text can be generated, including different domains (using newly collected datasets of essays, news, and stories), language models, and prompts.
Examples of human-written and AI-generated text from our datasets.
Why this approach?
Many current AI-generated text detection systems are brittle when classifying different types of text (e.g., different writing styles, or different text generation models or prompts). Simpler models that use perplexity alone generally cannot capture more complex features and perform particularly poorly on new writing domains. In fact, we found that a perplexity-only baseline was worse than random in some domains, including non-native English speaker data. Meanwhile, classifiers based on large language models like RoBERTa easily capture complex features but overfit the training data and generalize poorly: we found that a RoBERTa baseline had catastrophic worst-case generalization performance, sometimes even worse than the perplexity-only baseline. Zero-shot methods, which classify text without training on labeled data by calculating the probability that a specific model generated the text, also tend to perform poorly when a different model was actually used to generate the text.
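To make the perplexity-only baseline concrete, here is a minimal sketch (illustrative, not the implementation of any of the systems above): it reduces a document to a single scalar, its perplexity under one language model, and thresholds it. The `token_logprobs` input and the threshold value are assumptions for illustration.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a document from its per-token natural-log probabilities."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

def looks_ai_generated(token_logprobs, threshold=20.0):
    """Flag a document as AI-generated if its perplexity falls below a threshold.

    The threshold is a hypothetical value tuned on validation data. Because
    this single scalar is the baseline's only feature, it transfers poorly to
    new domains (such as non-native English speakers' writing) whose
    perplexity distribution differs from the training data's.
    """
    return perplexity(token_logprobs) < threshold
```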
How Ghostbuster Works
Ghostbuster uses a three-step training process: probability calculation, feature selection, and classifier training.
Computing probabilities: We converted each document into a series of vectors by computing the probability of generating each word in the document under a series of weaker language models (a unigram model, a trigram model, and two non-instruction-tuned GPT-3 models, ada and davinci).
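As a rough sketch of this step, the snippet below trains simple add-one-smoothed n-gram models from token counts and extracts per-token log probabilities; the smoothing scheme is an assumption for illustration, and the GPT-3 probabilities would come from the API's per-token logprobs (omitted here).

```python
import math
from collections import Counter

def train_ngram(tokens, n):
    """Count n-grams and their (n-1)-gram contexts for add-one smoothing."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    contexts = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 1))
    return grams, contexts

def ngram_logprobs(tokens, grams, contexts, n, vocab_size):
    """Per-token log probabilities of a document under the smoothed model."""
    logprobs = []
    for i in range(n - 1, len(tokens)):
        gram = tuple(tokens[i - n + 1:i + 1])
        context = tuple(tokens[i - n + 1:i])
        p = (grams[gram] + 1) / (contexts[context] + vocab_size)
        logprobs.append(math.log(p))
    return logprobs

# Each document thus becomes several aligned probability vectors: one from
# the unigram model (n=1), one from the trigram model (n=3), and one from
# each GPT-3 model via the API's logprobs field (not shown here).
```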
Feature selection: We used a structured search procedure to select features, which works by (1) defining a set of vector and scalar operations that combine the probabilities, and (2) searching for useful combinations of these operations using forward feature selection, repeatedly adding the best remaining feature.
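A sketch of what this search can look like, under assumptions of our own for illustration: `prob_vectors` maps each weak model's name to its per-token log-probability vector, candidate features apply a pairwise vector operation followed by a scalar reduction, and `score` is a hypothetical callback that evaluates a feature set with the downstream classifier on validation data.

```python
import itertools
import numpy as np

# Pairwise operations on probability vectors, and reductions to scalars.
VECTOR_OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply,
              "div": lambda a, b: a / (b + 1e-9)}
SCALAR_OPS = {"mean": np.mean, "max": np.max, "min": np.min, "var": np.var}

def candidate_features(prob_vectors):
    """Enumerate scalar features: a vector op on a model pair, then a reduction."""
    feats = {}
    for (na, va), (nb, vb) in itertools.permutations(prob_vectors.items(), 2):
        for vop_name, vop in VECTOR_OPS.items():
            combined = vop(np.asarray(va), np.asarray(vb))
            for sop_name, sop in SCALAR_OPS.items():
                feats[f"{sop_name}({vop_name}({na},{nb}))"] = sop(combined)
    return feats

def forward_select(feature_names, score, max_features=10):
    """Greedily add the feature that most improves validation performance,
    stopping once no remaining feature helps."""
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        gains = {f: score(selected + [f]) for f in feature_names
                 if f not in selected}
        if not gains:
            break
        best_feat, new_score = max(gains.items(), key=lambda kv: kv[1])
        if new_score <= best_score:
            break
        selected.append(best_feat)
        best_score = new_score
    return selected
```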
Classifier training: We trained a linear classifier on the best probability-based features and some manually selected additional features.
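Concretely, this last step can be as simple as logistic regression over the selected features. The sketch below uses placeholder data; in practice each row of `X` would hold one document's selected probability-based features plus the handcrafted ones, with label 1 for AI-generated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix and labels; real rows would hold the features
# chosen by forward selection plus handcrafted ones (e.g., document length).
X = np.random.rand(1000, 12)
y = np.random.randint(0, 2, 1000)  # 1 = AI-generated, 0 = human-written

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
p_ai = clf.predict_proba(X)[:, 1]  # probability each document is AI-generated
```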
Results
When trained and tested on the same domain, Ghostbuster achieved 99.0 F1 on all three datasets, outperforming GPTZero by 5.9 F1 and DetectGPT by 41.6 F1. Out of domain, Ghostbuster averaged 97.0 F1 across all conditions, outperforming DetectGPT by 39.6 F1 and GPTZero by 7.5 F1. Our RoBERTa baseline achieved 98.1 F1 when evaluated in-domain on all datasets, but its generalization performance was inconsistent. Ghostbuster outperformed the RoBERTa baseline in all domains except out-of-domain creative writing, and had much better out-of-domain performance than RoBERTa on average (a 13.8 F1 margin).
Results on Ghostbuster's in-domain and out-of-domain performance.
To ensure that Ghostbuster is robust to the range of ways a user might prompt a model, such as requesting different writing styles or reading levels, we evaluated Ghostbuster's robustness to several prompt variants. Ghostbuster outperformed all other tested approaches on these prompt variants with 99.5 F1. To test generalization across models, we evaluated performance on text generated by Claude, where Ghostbuster also outperformed all other tested approaches with 92.2 F1.
AI-generated text detectors have been fooled by lightly editing the generated text. We examined Ghostbuster's robustness to edits such as swapping sentences or paragraphs, reordering characters, or replacing words with synonyms. Most edits at the sentence or paragraph level did not significantly affect performance, though performance degraded gradually if the text was altered through repeated paraphrasing, use of commercial detection evaders such as Undetectable AI, or heavy word- or character-level edits. Performance was also best on longer documents.
Since AI-generated text detectors may misclassify writing by non-native English speakers as AI-generated, we evaluated Ghostbuster's performance on non-native English speakers' writing. All tested models had greater than 95% accuracy on two of the three datasets tested, but performed worse on the third dataset of shorter essays. However, document length may be the main factor here, since Ghostbuster performs almost as well on these documents (74.7 F1) as on other out-of-domain documents of similar length (75.6 to 93.1 F1).
Users wishing to apply Ghostbuster to real-world cases of potentially prohibited text generation (e.g., student essays written by ChatGPT) should note that errors are more likely on shorter text, on domains far from those Ghostbuster was trained on (e.g., different varieties of English), on text by non-native English speakers, on human-edited model generations, and on text generated by prompting an AI model to modify a human-authored input. To avoid perpetuating algorithmic harms, we strongly recommend against automatically penalizing alleged use of text generation without human supervision. Instead, we recommend cautious, human-in-the-loop use of Ghostbuster whenever classifying someone's writing as AI-generated could harm them. Ghostbuster can also help with a variety of lower-risk applications, including filtering AI-generated text out of language model training data and checking whether online information sources are AI-generated.
Conclusion
Ghostbuster is a state-of-the-art model for detecting AI-generated text, achieving 99.0 F1 on the tested domains and representing substantial progress over existing models. It generalizes well across different domains, prompts, and models, and is well suited to identifying text from black-box or unknown models because it does not require access to token probabilities from the specific model used to generate the document.
Future directions for Ghostbuster include providing explanations for model decisions and improving robustness to attacks that specifically attempt to fool detectors. AI-generated text detection approaches can also be used alongside alternatives such as watermarking. We also hope that Ghostbuster can help with a variety of applications, such as filtering language model training data or flagging AI-generated content on the web.
Try Ghostbuster here: ghostbuster.app
Learn more about Ghostbuster here: (paper) (code)
Try to guess if the text is AI-generated here: ghostbuster.app/experiment