- 4.7B pictures per day
- 7.5M blogs per day
- 3.7M videos per day (YouTube only)
- 400K books per day
- 100K songs per day
- 30 games per day (Steam only)
But many of today’s AI tools are incredibly empowering: they fill in missing skills, speed up production and help us over the hump when we’re mentally or creatively blocked. The result will be an acceleration in output, with daily production far exceeding the numbers listed above. And it won’t be long before AI tools no longer need us to prompt them, and instead create programmatically on their own.
At that point, how can you separate the wheat from the chaff? Or, to put it another way, separate the authentic human voices from the computer-model-generated ones? While our eyes are extremely good at detecting the uncanny valley that AI-generated pictures and videos inadvertently create (especially when they include humans), we’re not so good with other forms of media.
One way is to turn to Python and its machine learning techniques to help us identify the content that is most likely AI-generated. The simplest version of this technique reads in text and performs a lexical comparison against the words, phrases and idioms that an AI model is biased toward choosing.
In this post we’ll look at two such tools: GPTzero and its iteration, DetectGPT. While the original concept for these projects was to identify text generated specifically by ChatGPT, we’ll test the models against a range of AI text generators as well as human-generated text to assess the state of the art in AI detection.
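To make the lexical comparison idea concrete, here is a minimal sketch. The phrase list and the `marker_phrase_rate` helper are hypothetical and chosen only for illustration; they are not taken from GPTzero or DetectGPT, and a real detector would derive its markers from a labeled corpus rather than a hand-written list.

```python
import re
from collections import Counter

# Hypothetical marker phrases, for illustration only (an assumption, not a
# list used by GPTzero or DetectGPT).
AI_MARKER_PHRASES = [
    "delve into",
    "tapestry",
    "it is important to note",
    "in conclusion",
    "newfound sense of",
]


def marker_phrase_rate(text, phrases=AI_MARKER_PHRASES):
    """Return (marker hits per 100 words, Counter of matched phrases)."""
    # Lowercase and collapse whitespace so phrases match across line breaks.
    normalized = re.sub(r"\s+", " ", text.lower())
    word_count = len(re.findall(r"[a-z'’]+", normalized))
    hits = Counter()
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        count = len(re.findall(pattern, normalized))
        if count:
            hits[phrase] = count
    if word_count == 0:
        return 0.0, hits
    return 100.0 * sum(hits.values()) / word_count, hits
```

A higher rate only suggests that a text leans on the listed phrases. On its own it is a weak signal, which is why the tools below rely on language-model statistics instead.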
Getting Started with Python AI Detection
Before you begin, make sure that you’ve installed the GPTzero and DetectGPT Python runtime environments. Note that you can install both runtimes onto your system without fear of conflict since they are both installed into virtual environments, which you can then switch between to try them out.
Also note that installation reaches out to GitHub and pulls down the corresponding repository, automatically cloning it into the virtual environment, even if you don’t have Git installed on your system.
In order to download and install these ready-to-use Python projects, you will need to create a free ActiveState Platform account. Just use your GitHub credentials or your email address to register. Signing up is easy and it unlocks the ActiveState Platform’s many other dependency management benefits.
Alternatively, you can use our State Tool CLI to install the GPTzero and DetectGPT Python runtime environments as follows:
For Windows users, run the following at a CMD prompt to automatically download and install the DetectGPT Python runtime and project code into a virtual environment:
powershell -Command "& $([scriptblock]::Create((New-Object Net.WebClient).DownloadString('https://platform.www.activestate.com/dl/cli/911674306.1670279101_pdli01/install.ps1')))" -c'state activate --default Pizza-Team/DetectGPT'
Unfortunately, the ActiveState Platform currently will not build orjson, which is a dependency of gradio – a key requirement to run the GPTzero and DetectGPT projects. To get around this limitation, we can just pip install gradio by running:
python3 -m pip install gradio
Once that’s complete, we now have DetectGPT ready to go, but let’s also install the GPTzero project alongside it. Type exit on the command line to exit out of the DetectGPT project.
You can now run:
powershell -Command "& $([scriptblock]::Create((New-Object Net.WebClient).DownloadString('https://platform.www.activestate.com/dl/cli/911674306.1670279101_pdli01/install.ps1')))" -c'state activate --default Pizza-Team/GPTzero'
Again, we’ll need to pip install gradio by running:
python3 -m pip install gradio
————————————————————-
For Linux users, run the following to automatically download and install the DetectGPT Python runtime and project code into a virtual environment:
sh <(curl -q https://platform.www.activestate.com/dl/cli/911674306.1670279101_pdli01/install.sh) -c'state activate --default Pizza-Team/DetectGPT'
Unfortunately, the ActiveState Platform currently will not build orjson, which is a dependency of gradio – a key requirement to run the GPTzero and DetectGPT projects. To get around this limitation, we can just pip install gradio by running:
python3 -m pip install gradio
Once that’s complete, we now have DetectGPT ready to go, but let’s also install the GPTzero project alongside it. Type exit on the command line to exit out of the DetectGPT project.
You can now run:
sh <(curl -q https://platform.www.activestate.com/dl/cli/911674306.1670279101_pdli01/install.sh) -c'state activate --default Pizza-Team/GPTzero'
Again, we’ll need to pip install gradio by running:
python3 -m pip install gradio
Okay, we’re all set.
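If you want to double-check that the pip fallback worked, a quick sketch like the one below reports which packages the active environment cannot import. The `missing_packages` helper is a hypothetical name introduced here; it uses only the standard library’s `importlib.util.find_spec`.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # gradio is the package this post installs with pip.
    missing = missing_packages(["gradio"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the project’s `python3` while the project is active, so that it inspects the virtual environment rather than your system Python.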
Detecting AI-Generated Text with Python
First, let’s start with GPTzero, which should be your active project. If it isn’t, you can make it the active project by running the following command:
state activate Pizza-Team/GPTzero
Now we can load up the detection model in a simple web application so we’re ready to go:
python3 webapp/main.py
Let’s give GPTzero some text for it to review by pasting in the introductory section above. This will give us a “human-written” baseline. GPTzero’s assessment is pretty definitive:
{ Perplexity: 24, Perplexity per line: 145.27777777777777, Burstiness: 574, label: 1 } The Text is written by Human.
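The post does not define the fields in this output, so here is a rough sketch of what they usually mean. Perplexity is the exponential of the average negative log-probability a language model assigns to each token, so lower values mean more predictable text. The helpers below assume you already have per-token log-probabilities from some language model, and they treat burstiness as the spread of per-line perplexity. That second definition is an assumption made for illustration and may not match how GPTzero computes its value.

```python
import math
import statistics


def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p(token))) over natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def burstiness(per_line_logprobs):
    """Spread of per-line perplexity (population standard deviation).

    Assumption: this is one plausible reading of "burstiness", not
    necessarily the formula GPTzero uses.
    """
    scores = [perplexity(line) for line in per_line_logprobs if line]
    if len(scores) < 2:
        return 0.0
    return statistics.pstdev(scores)
```

Under this reading, human writing tends to mix predictable and surprising lines, which shows up as higher perplexity and a wider spread, consistent with the much larger numbers GPTzero reports for the human-written baseline.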
Now let’s try evaluating output from ChatGPT. We’ll get ChatGPT to write a short story about a sentient turtle so it will need to generate something from scratch, rather than reinterpreting an existing text.
Input: write the beginning of a novel about a sentient turtle that becomes aware that she’s different from the other animals in her pond
Output (abbreviated): The sun glinted off the tranquil surface of the pond, casting a golden glow over the water. In the midst of the calm, there was a turtle named Tilda who basked on a log, her dark green shell gleaming in the warm light. She had lived in the pond for as long as she could remember, watching the world go by at a leisurely pace.
But one day, as Tilda floated lazily on the water’s surface, a strange sensation washed over her. She felt an odd tingling in her flippers, a tingling that seemed to spread through her entire body. At first, she dismissed it as a passing sensation, but as the tingling continued, Tilda realized that something was changing within her.
But it was not just her physical abilities that were changing. Tilda found herself pondering existential questions that had never crossed her mind before. She wondered about her place in the world, her purpose, and her identity. Who was she, really? Was she just a turtle, destined to live out her days in the pond, or was there something more to her existence?
Tilda’s journey of self-discovery had only just begun, and she knew that it would be filled with challenges and uncertainties. But she was determined to embrace her uniqueness, to explore the world with open eyes and an open heart, and to find her place in a world that seemed both familiar and foreign. And so, with a newfound sense of purpose and curiosity, Tilda set out on a quest to unravel the mysteries of her existence and discover her true identity as a sentient turtle.
GPTzero’s evaluation of the story is again definitive:
{ Perplexity: 6, Perplexity per line: 22.90909090909091, Burstiness: 60, label: 0 } The Text is generated by AI.
Now let’s try something in between. We’ll ask ChatGPT “What is Einstein’s Theory of Gravity?”, a question for which it should have plenty of human-written reference material in its training data to draw on, rather than having to make something up from whole cloth. GPTzero is not fooled:
{ Perplexity: 5, Perplexity per line: 12.11111111111111, Burstiness: 21, label: 0 } The Text is generated by AI.
Finally, we’ll use a different AI text generator (copy.ai) to write a blog post about Einstein’s Theory of Gravity, and then ask GPTzero to evaluate it:
{ Perplexity: 12, Perplexity per line: 14340.615384615385, Burstiness: 185975, label: 1 } The Text is written by Human.
This is not entirely unexpected since GPTzero was originally designed to recognize ChatGPT specifically, rather than AI-generated text in general. DetectGPT is a more general model, so let’s ask it to evaluate the copy.ai output.
First we’ll need to activate the DetectGPT project by running the following commands:
exit
state activate Pizza-Team/DetectGPT
python3 webapp/main.py
Now we can evaluate the copy.ai blog post:
{ confidence: "80.57%", label: 0 } This text is most likely generated by an A.I.
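Across all of the outputs shown in this post, a label of 1 accompanies a “human” verdict and a label of 0 accompanies an “AI” verdict. If you want to post-process results in your own scripts, a small helper along these lines turns a result into a readable verdict. The `describe_result` function and the dictionary shape are assumptions based only on the output printed above, not an API provided by either project.

```python
def describe_result(result):
    """Turn a detector result dict into a short human-readable verdict.

    Assumes the convention seen in this post's outputs:
    label 1 = human-written, label 0 = AI-generated.
    """
    label = result["label"]
    if label not in (0, 1):
        raise ValueError(f"unexpected label: {label!r}")
    verdict = "human-written" if label == 1 else "AI-generated"
    confidence = result.get("confidence")
    if confidence is None:
        return verdict
    # Confidence is printed as a string such as "80.57%".
    percent = float(str(confidence).rstrip("%"))
    return f"{verdict} ({percent:.2f}% confidence)"
```

For the result above, `describe_result({"confidence": "80.57%", "label": 0})` yields `AI-generated (80.57% confidence)`.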
We can also ask DetectGPT to evaluate our three other texts, and the results are very convincing: on these samples, the DetectGPT model is excellent at discerning human-created from AI-generated text.
Conclusions – Python for AI Detection
When a model is overfitted to a single AI text generator (such as GPTzero being overfitted to ChatGPT), it can be fooled by text generated by a different AI system. However, as DetectGPT clearly shows, it is possible to create a more general model that quite accurately recognizes AI-generated text.
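To put a number on that, the four GPTzero runs reported in this post can be tallied with a few lines of Python. The labels come straight from the outputs above (1 = human, 0 = AI); the short text names and the `accuracy` and `misses` helpers are just illustrative. Four samples are far too few to draw firm conclusions, so treat this as a template for your own, larger comparison.

```python
# (text, true source, label GPTzero returned in this post)
GPTZERO_RUNS = [
    ("post introduction", "human", 1),
    ("ChatGPT turtle story", "ai", 0),
    ("ChatGPT Einstein answer", "ai", 0),
    ("copy.ai Einstein blog post", "ai", 1),
]


def is_correct(truth, label):
    """A prediction is correct when label 1 pairs with human, 0 with AI."""
    return (label == 1) == (truth == "human")


def accuracy(runs):
    """Fraction of runs where the detector's label matched the true source."""
    return sum(is_correct(truth, label) for _, truth, label in runs) / len(runs)


def misses(runs):
    """Names of the texts the detector labeled incorrectly."""
    return [name for name, truth, label in runs if not is_correct(truth, label)]


if __name__ == "__main__":
    print(f"GPTzero accuracy: {accuracy(GPTZERO_RUNS):.0%}")
    print("Misclassified:", misses(GPTZERO_RUNS))
```

The only miss is the copy.ai post, which is the text produced by a generator other than ChatGPT.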
Using ActiveState Python is a simple and easy way to evaluate different projects/models at the same time since it:
- Automatically clones the GitHub project (no need for Git to be installed on your system) and installs the associated runtime environment, all with a single command.
- Automatically installs all projects into virtual environments by default to ensure they don’t step on each other or cause dependency conflicts.
- Supports pip install if a required package is currently missing/won’t build on the ActiveState Platform, giving you flexibility.
- Makes switching between projects as simple as exiting from the current project and activating the next one.
Next steps:
Sign up for a free ActiveState Platform account so you can download the GPTzero and DetectGPT projects and try them for yourself.