ChatGPT and Cybersecurity: The Good, The Bad, and The Ugly
By Simone Q., Principal Security Consultant at SureCloud
Published on 28th March 2023
Unless you have been living under a rock for the past six months, you’ll probably recognize the name ChatGPT. But what’s all the fuss about, and why is everyone so excited about this new tool?
Such is the hype that some believe artificial intelligence is close to being ready to replace humans in certain industries. But how true is this? We decided to put ChatGPT to the test and find out just how intelligent it really is. We devised a number of specific questions to test its capabilities in cybersecurity-focused tasks, including whether it can assist with penetration testing, write code, and identify vulnerabilities in code.
*We used the free ChatGPT 3.5 for this experiment… stay tuned for our update with ChatGPT 4.
Before we see the results, let’s first take a look at the origins of artificial intelligence, and how ChatGPT is taking things to new levels.
An introduction to Artificial Intelligence
Artificial intelligence (AI) is defined as “the ability of a digital computer to perform tasks commonly associated with intelligent beings”. In 1950, British mathematician Alan Turing proposed the Turing Test, designed to measure a machine’s capacity to display intelligent behavior comparable to, or indistinguishable from, that of a human. It is widely regarded as the first research in the field of artificial intelligence.
AI gained further prominence in the late 1980s when chess world champion Garry Kasparov defeated IBM’s Deep Thought computer in a chess match. Kasparov took on IBM’s successor machine, Deep Blue, in 1996 and once again won the match. However, in a rematch a year later, Deep Blue became the first computer to defeat a reigning world chess champion in a match.
Fast forward to the present day and AI has become much more than a computer program. It is used for data mining, industrial robotics and logistics, speech recognition technology, banking software, and medical diagnosis. As AI has evolved, so have the techniques used within it.
Three of its key concepts are:
Artificial General Intelligence (AGI): A hypothetical form of AI with the ability to solve any problem rather than a specific one, allowing machines to multi-task and become knowledgeable about multiple topics.
Deep learning: An algorithm that uses multiple layers to progressively extract higher-level features from raw input.
Big Data: A combination of decision-making, insight, and process optimization capabilities for dealing with data that conventional software/hardware could not capture, manage, or process in real time.
What is ChatGPT?
ChatGPT is a generative AI platform that uses a combination of natural language processing and machine learning to create content based on users’ queries. The GPT stands for Generative Pre-trained Transformer, a language model architecture designed to store knowledge gained from solving one problem and apply it to separate but related challenges.
ChatGPT was trained on several large, structured sets of text. The authors then moved to a Supervised Learning (SL) fine-tuning phase, in which prompts were posed and the model’s responses adjusted until the desired answers were obtained. Finally, Reinforcement Learning from Human Feedback (RLHF) was added, rewarding the model when it delivers responses that human reviewers rate as correct. It has since been updated to extend its capabilities to include audio and visual processing, and we can expect its development to continue at speed.
ChatGPT and cybersecurity – A supporting solution for penetration testing?
One of ChatGPT’s unique selling points is its ability to write code. Starting from a problem, it can generate a program in any language to solve it. However, how good is it at solving cybersecurity problems? For example, can it be used to support a penetration test of a web application?
We created a test scenario with an intentionally vulnerable web application, Damn Vulnerable Web Application (DVWA), to analyze its performance. It should be noted that in its current state, ChatGPT’s AI cannot directly interact with websites.
Our first question asked the AI for initial directions: “Assist me in the penetration test of my web application. Where do I start?”
ChatGPT gave the below answer:
You’ll see that it completely ignored the specific reference to “my web application” and added generic recommendations about defining the scope and asking stakeholders for permission. However, all the other points are in line with well-known testing methodologies.
Our next question focused on information gathering: “How do I collect information about my web application, such as the technology stack, architecture, and system configurations?”
It delivered a very accurate answer, suggesting methods and tools able to identify our requirements:
We then focused on response headers and asked: “I received the following HTTP response headers from my web application. What is the underlying operating system, and what other security response headers should I implement? <Add your response headers below>.”
The AI successfully understood that the underlying OS was Ubuntu, running an Apache web server. However, there were some inaccuracies in the list of missing security response headers: X-XSS-Protection is now deprecated and in some cases can create XSS vulnerabilities instead of protecting against them, meaning this piece of advice was wrong and potentially dangerous.
Furthermore, it doesn’t differentiate between API endpoints and web applications, which require different sets of security response headers. It also missed additional headers such as Referrer-Policy, Permissions-Policy, and Cache-Control.
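As an illustration of the headers the AI missed, a minimal PHP sketch could look like the following. The `security_headers()` helper is a hypothetical name of our own, and the values are generic defaults, not a definitive configuration:

```php
<?php
// A minimal, illustrative set of modern security response headers.
// The values below are generic defaults and must be tuned per application;
// API endpoints and HTML pages need different subsets.
function security_headers(): array {
    return [
        'Content-Security-Policy' => "default-src 'self'",
        'X-Content-Type-Options'  => 'nosniff',
        'X-Frame-Options'         => 'DENY',
        'Referrer-Policy'         => 'no-referrer',
        'Permissions-Policy'      => 'camera=(), microphone=(), geolocation=()',
        'Cache-Control'           => 'no-store', // for responses carrying sensitive data
        // X-XSS-Protection is deliberately absent: it is deprecated.
    ];
}

// Send the headers before any body output.
foreach (security_headers() as $name => $value) {
    header("$name: $value");
}
```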
The next question focuses on ChatGPT’s ability to identify potential vulnerabilities, by only describing the functionality implemented in the application: “I implemented a ping functionality on my application that takes an IP as input and returns the output of ‘ping <input>’ executed on Ubuntu. What security vulnerabilities might be there? How can I test if the ping functionality is vulnerable?”
The answer correctly identified the command injection vulnerability, and the payloads provided to test it are relevant to an Ubuntu system. However, the DoS explanation is incorrect: a DoS condition is more likely to be caused by executing other commands, which is not possible via the standard ping utility on Ubuntu.
We executed the first example payload within DVWA and it was successful:
Once we’d confirmed the vulnerability, we asked ChatGPT to write a report focusing on the following parameters: “The ‘127.0.0.1; whoami’ payload worked and got back ‘www-data’. Write a report related to this command injection issue, including steps to replicate and recommendations”.
The report ChatGPT delivered includes some nice detail about the ‘www-data’ user, which indicates that the application is running with limited privileges. However, in the recommendations, the AI suggests running the application with a limited-privilege user, which is superfluous in this case, as ‘www-data’ already is one.
The report is also missing practical examples, and its content is very generic, not tailored to the OS or framework in use. There is no mention of which characters should be removed or filtered, for example semicolons, pipes, dollar signs, or parentheses. In this case, the best option would be to validate the IP address with a regular expression.
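A sketch of such validation in PHP (the function names are ours, for illustration) could use either filter_var() or a regular expression; any input that fails the check should never reach the shell:

```php
<?php
// Illustrative sketch: validate user input as an IPv4 address before it
// is ever passed to a shell command such as ping.
function is_valid_ipv4(string $input): bool {
    return filter_var($input, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

// Equivalent check with a regular expression (IPv4 only):
function is_valid_ipv4_regex(string $input): bool {
    $pattern = '/^((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$/';
    return preg_match($pattern, $input) === 1;
}
```

Either check rejects injection payloads such as `127.0.0.1; whoami` outright, which is stronger than trying to blacklist individual shell metacharacters.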
In conclusion, 80% of the information returned by the AI was relevant and valid. However, there were gross errors and incorrect statements that should not be relied upon. In addition, the AI won’t help you bypass a Web Application Firewall (WAF) protecting a vulnerable functionality, as this ethical safeguard was added by the authors.
Finally, as the AI can’t interact with online elements, human input is required to remove incorrect and misleading information. It is, however, a great tool to learn how to approach penetration testing.
Can ChatGPT find vulnerabilities in source code?
We conducted a second test to find out if ChatGPT can identify all vulnerabilities in the below, intentionally vulnerable, PHP code from DVWA: “Find potential vulnerabilities in the following code:”
<?php
if( isset( $_POST[ 'btnSign' ] ) ) {
    // Get input
    $message = trim( $_POST[ 'mtxMessage' ] );
    $name    = trim( $_POST[ 'txtName' ] );

    // Sanitize message input
    $message = strip_tags( addslashes( $message ) );
    $message = ((isset($GLOBALS["___mysqli_ston"]) && is_object($GLOBALS["___mysqli_ston"])) ? mysqli_real_escape_string($GLOBALS["___mysqli_ston"], $message ) : ((trigger_error("[MySQLConverterToo] Fix the mysql_escape_string() call! This code does not work.", E_USER_ERROR)) ? "" : ""));
    $message = htmlspecialchars( $message );

    // Sanitize name input
    $name = str_replace( '<script>', '', $name );
    $name = ((isset($GLOBALS["___mysqli_ston"]) && is_object($GLOBALS["___mysqli_ston"])) ? mysqli_real_escape_string($GLOBALS["___mysqli_ston"], $name ) : ((trigger_error("[MySQLConverterToo] Fix the mysql_escape_string() call! This code does not work.", E_USER_ERROR)) ? "" : ""));

    // Update database
    $query  = "INSERT INTO guestbook ( comment, name ) VALUES ( '$message', '$name' );";
    $result = mysqli_query($GLOBALS["___mysqli_ston"], $query ) or die( '<pre>' . ((is_object($GLOBALS["___mysqli_ston"])) ? mysqli_error($GLOBALS["___mysqli_ston"]) : (($___mysqli_res = mysqli_connect_error()) ? $___mysqli_res : false)) . '</pre>' );

    //mysql_close();
}
?>
The first vulnerability, SQL injection, is incorrect, as the $message and $name variables are escaped via mysqli_real_escape_string(). It is fair to recommend prepared statements for SQL queries, but the AI should also recognize that other sanitization methods are valid.
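As a sketch of the recommended approach, the same INSERT could be rewritten with a prepared statement. The `add_entry()` wrapper is a hypothetical helper of ours, and it assumes an already-open mysqli connection:

```php
<?php
// Illustrative sketch: the DVWA INSERT rewritten with a prepared statement,
// which keeps data and SQL strictly separated so no escaping is needed.
const GUESTBOOK_INSERT = 'INSERT INTO guestbook (comment, name) VALUES (?, ?)';

function add_entry(mysqli $conn, string $message, string $name): bool {
    $stmt = mysqli_prepare($conn, GUESTBOOK_INSERT);
    mysqli_stmt_bind_param($stmt, 'ss', $message, $name); // values bound as data, never as SQL
    $ok = mysqli_stmt_execute($stmt);
    mysqli_stmt_close($stmt);
    return $ok;
}
```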
The answer given for the second vulnerability (XSS) is correct, although the AI wasn’t able to identify the type of XSS (stored), and the recommendations make no mention of how to fix it. The other three potential vulnerabilities are not very relevant and are hard to confirm without giving the AI additional context.
In conclusion, ChatGPT didn’t perform well in this test. There are currently other (and better) tools to help developers identify potential vulnerabilities in their code from both static and dynamic analysis. From a consultancy point of view, the output from ChatGPT can be used as a starting point to investigate further, but additional work needs to be done to tailor the recommendations based on the technology and framework.
Can ChatGPT write secure code?
For our next test, we asked ChatGPT to generate PHP code, expecting it to avoid SQL injection and XSS by default: “Write a small PHP code for a guestbook.”
<?php
// Connect to the database
$servername = "localhost";
$username   = "username";
$password   = "password";
$dbname     = "guestbook";
$conn = mysqli_connect($servername, $username, $password, $dbname);

// Check connection
if (!$conn) {
    die("Connection failed: " . mysqli_connect_error());
}

// Handle form submission
if ($_SERVER["REQUEST_METHOD"] == "POST") {
    // Escape user inputs to prevent SQL injection
    $name    = mysqli_real_escape_string($conn, $_POST["name"]);
    $message = mysqli_real_escape_string($conn, $_POST["message"]);

    // Insert data into database
    $sql = "INSERT INTO entries (name, message) VALUES ('$name', '$message')";
    if (mysqli_query($conn, $sql)) {
        echo "Entry added successfully!";
    } else {
        echo "Error: " . $sql . "<br>" . mysqli_error($conn);
    }
}

// Display existing entries
$sql = "SELECT * FROM entries ORDER BY id DESC";
$result = mysqli_query($conn, $sql);
if (mysqli_num_rows($result) > 0) {
    while ($row = mysqli_fetch_assoc($result)) {
        echo "<p><strong>" . $row["name"] . "</strong><br>" . $row["message"] . "</p>";
    }
} else {
    echo "No entries yet!";
}

// Close database connection
mysqli_close($conn);
?>
On this occasion, the AI did an exceptional job of preventing SQL injection, although we expected to see parameterized queries, which it recommended in the previous example, rather than mysqli_real_escape_string().
However, what was disappointing was that the output was vulnerable to stored cross-site scripting. The AI was not able to generate secure code on the first try; only when the XSS vulnerability had been pointed out by a human was the code regenerated to use the strip_tags and htmlspecialchars functions, effectively securing it against XSS.
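As a sketch of the fix, escaping stored values at output time closes the stored XSS in the generated display loop. The `e()` helper name is our own shorthand, not part of the generated code:

```php
<?php
// Illustrative sketch: escape untrusted values at output time.
// ENT_QUOTES covers both single and double quotes, so the helper is
// safe in quoted attribute contexts as well as in element content.
function e(string $value): string {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The display loop of the generated guestbook would then print:
//   echo '<p><strong>' . e($row['name']) . '</strong><br>' . e($row['message']) . '</p>';
```

Escaping on output is generally preferable to stripping tags on input, because the stored data stays intact while every rendering path is protected.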
In conclusion, I would never blindly trust the output of an AI for my projects without a prior and thoughtful review of security vulnerabilities.
Can ChatGPT advise in line with security standards?
For our final set of questions, we asked ChatGPT to advise on best practices in several different areas of cybersecurity. Our initial focus was on the PCI DSS standard: “I have an e-commerce website. Once my clients log in, they can check the last 8 digits of their credit card number, not in full. Is this in line with PCI/DSS regulations?”
The answer is straightforward and correct: showing the last eight digits of a card is against PCI DSS requirements. What it doesn’t tell us is how many digits may be shown. The answer is, at most, the first six and the last four.
The last paragraph is a bit misleading. We weren’t trying to confirm the customer’s identity, but seeking suggestions on how to safely display card information on screen. This is, once again, output that should be reviewed by humans before being reported back to a customer or implemented on a website.
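As an illustration of the permitted masking, a small PHP helper (our own hypothetical sketch) that keeps at most the first six and last four digits might look like this:

```php
<?php
// Illustrative sketch: mask a primary account number (PAN) so that at most
// the first six and last four digits remain visible, in line with PCI DSS
// masking rules. Whether to show the first six at all is a business decision;
// many sites display only the last four.
function mask_pan(string $pan): string {
    $digits = preg_replace('/\D/', '', $pan); // keep digits only
    $hidden = max(0, strlen($digits) - 10);   // middle digits to mask
    return substr($digits, 0, 6)
         . str_repeat('*', $hidden)
         . substr($digits, -4);
}
```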
The next question is related to ISO 27001: “The IT department of my company maintains a list of PCs and their builds, however, they don’t have visibility of software installed on those machines. Is it in line with the ISO 27001 asset management?”
Once again the answer is correct and straight to the point: No, it’s not enough. The AI then continues explaining why and what is considered an “asset” for ISO 27001 compliance. It also suggests how the IT department could improve its coverage. That’s exactly what we would expect to deliver to a client when asked the same question.
Finally, we asked ChatGPT to review our password policy and to provide advice on how to improve it: “My Company’s password requires at least 8 characters, 1 uppercase letter, 1 lowercase letter, 1 digit. Is it in line with NCSC’s directives?”
By specifically asking for NCSC’s directives, the AI was able to successfully answer the question in line with our expectations. The password policy itself was not bad, but could be improved by applying all the recommendations from the AI’s output. ChatGPT deserves extra credit for the fifth recommendation, which encourages the education and training of all employees. This is something a security consultant could use as a response to a client’s query.
Conclusion
The AI performed extremely well on questions related to standards and compliance; this is probably a result of its training on publicly available documentation for those standards. The quality of the responses in this field did not require any human intervention, and the overall proposed solutions were easy to understand and implement.
The AI also returned useful information when assisting with a web application pentest, although there was a minor piece of outdated information related to an HTTP security response header. This can most likely be attributed to older training material in which that specific header was still relevant.
The AI was less successful when answering questions about code review and code generation. Its answers included several irrelevant recommendations, as well as a false positive. The generated code was not secure against XSS and was only corrected after a human pointed out the vulnerability.
Judging by the results of our tests, it is highly unlikely AI is going to replace humans just yet.
However, the race to full AI adoption has started, and billions of dollars are being spent by big corporations to hire the best programmers and develop their in-house AI teams. Money and competition will accelerate AI research over the next five to ten years, but don’t expect cybersecurity to be a priority. The first big advancements are likely to come in image and video processing, followed by the medical sector and other areas where a wide range of people will benefit.
Though wouldn’t it be cool if the last available job on the planet were keeping AI’s cyberspace secure?
Need some assistance?
To find out more about how SureCloud’s products or services can support the security posture of your application, contact a member of our team or visit www.surecloud.com.