
Like self-driving cars, fully AI-automated sysadmins don't exist

As with cars, there are few system administration tasks that involve little to no automation.

Credit: Alexandra Francis

The Society of Automotive Engineers has defined six levels of autonomous driving, ranging from Level 0—where the driver is responsible for everything—to Level 5—where a car performs all driving tasks under any conditions from Point A to Point B. The same spectrum can be used with system administration tasks to determine where and to what extent AI should be leveraged.

If we apply the SAE’s levels to system administration tasks, it would look something like this:

  • Level 0: No Automation
  • Level 1: Assistance Required
  • Level 2: Partial Automation
  • Level 3: Conditional Automation
  • Level 4: High Automation
  • Level 5: Full Automation

I thought about the sysadmin tasks ripest for AI-based automation and leveled them based on this spectrum. It’s important to note that all of this is a snapshot in time. As AI technology matures—and organizations’ comfort level with AI increases—what’s a Level 2 today may be a Level 3 or 4 tomorrow.

It stands to reason that none of the tasks I focused on landed at Level 0 or Level 1. As with cars, few system administration tasks involve little to no automation. You could point to something like racking servers and unraveling cables, but AI will never help detangle a Clark Griswold-level cable ball.

You’ll also notice that there are no Level 5 tasks—yet. (More on that later.)

I’m open to discussion and even argument on what I’ve come up with. I would also be really interested to hear how sysadmins would categorize these and other functions, as well as how they see the sysadmin role changing as AI matures.

Level 2

System shutdown: In the traditional world, this could be a server shutdown, but in a modern, cloud-native world, it could be the shutdown of a critical application, load-balanced across thousands of containers running on hundreds of worker nodes. Either way, a human needs to be involved at a high level. There are a variety of reasons for initiating a shutdown, but humans should always be the ones driving it. At most, system shutdown should be Level 2 on the autonomy scale. AI can help suss out behavioral anomalies or security threats. A “driver assistance” feature might prompt:

  • “Are you really sure you want to shut that down?”
  • “I noticed a couple of containers didn’t shut down correctly and were still serving traffic”
  • “A critical task is hanging, and data hasn't been flushed to disk, so shutting down now could cause database corruption”

Sort of like lane keeping. AI has the potential to enlighten the user about the subtasks that are happening, and their status, in a completely new and transparent way. But the decision to initiate a shutdown should come only after a human has verified an issue and authorized defensive actions: feet on the gas and the brake, if you will.
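To make the “driver assistance” idea concrete, here is a minimal sketch of a pre-shutdown check, assuming a Kubernetes-style deployment and plain kubectl. The namespace, label selector, and scale-down command are illustrative placeholders, not a real tool.

```python
"""Hypothetical Level 2 "driver assistance" for a shutdown.

A minimal sketch: before a human-initiated shutdown proceeds, check whether
any pods behind the app are still Ready (i.e., still serving traffic) and
ask the operator to confirm. Namespace, label selector, and the final
shutdown command are illustrative assumptions.
"""
import json
import subprocess
import sys

NAMESPACE = "production"   # assumption: where the app runs
SELECTOR = "app=checkout"  # assumption: label for the workload


def pods_still_serving() -> list[str]:
    """Return the names of pods that are still Ready for the selector."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", NAMESPACE, "-l", SELECTOR, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    pods = json.loads(out.stdout)["items"]
    serving = []
    for pod in pods:
        statuses = pod.get("status", {}).get("containerStatuses", [])
        if any(cs.get("ready") for cs in statuses):
            serving.append(pod["metadata"]["name"])
    return serving


if __name__ == "__main__":
    serving = pods_still_serving()
    if serving:
        print(f"{len(serving)} pod(s) are still serving traffic: {', '.join(serving)}")
    answer = input("Are you really sure you want to shut this down? [y/N] ")
    if answer.strip().lower() != "y":
        sys.exit("Shutdown aborted by operator.")
    # The human stays in the driver's seat: only now run the actual shutdown,
    # here a placeholder that scales the deployment to zero.
    subprocess.run(["kubectl", "scale", "deployment/checkout",
                    "--replicas=0", "-n", NAMESPACE], check=True)
```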

Repairing system issues: Using AI to diagnose issues and then automatically fix those issues is a promising use case, but still a Level 2. I’ve had conversations with colleagues who used agentic AI to determine whether a set of pods in Kubernetes was healthy and recommend tools to use to fix them if they weren’t. At this point, we’re staying away from automatic fixes because the prospect is a little bit terrifying, but it’s something we may see in the future. If you basically control the inputs and the outputs—for example, saying, “Here's a set of tools you can use, and here are the things you can do with them”—AI is really good at figuring it all out. These capabilities could eventually be used to support safe automatic repairs, but might require some modifications to existing utilities.
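Here is a rough sketch of what constraining the inputs and outputs could look like in practice: a small whitelist of read-only diagnostic tools the agent is allowed to request, with everything else rejected. The model call itself is left as a stub, since the guardrail, not any particular AI API, is the point.

```python
"""Sketch of a constrained tool set for an agentic diagnosis loop.

Assumption-heavy: the model integration is stubbed out. The point is that
the agent can only request tools from an explicit whitelist, and nothing it
suggests runs without passing through that gate.
"""
import shlex
import subprocess

# The only commands the "agent" may ask for. Read-only diagnostics here;
# anything that mutates state is deliberately absent for now.
ALLOWED_TOOLS = {
    "list_pods":     "kubectl get pods -n production -o wide",
    "describe_pod":  "kubectl describe pod {name} -n production",
    "recent_events": "kubectl get events -n production --sort-by=.lastTimestamp",
}


def run_tool(tool: str, **kwargs: str) -> str:
    """Run a whitelisted tool and return its output; reject everything else."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool {tool!r} is not in the allowed set")
    cmd = ALLOWED_TOOLS[tool].format(**kwargs)
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return result.stdout or result.stderr


def suggest_next_tool(observations: list[str]) -> dict:
    """Placeholder for the model call: given what has been observed so far,
    return the next tool to run, e.g. {"tool": "describe_pod", "name": "web-7d9f"}.
    A real implementation would call whatever LLM the team has chosen."""
    raise NotImplementedError("wire up your model of choice here")


if __name__ == "__main__":
    # First observation for the model to reason over.
    print(run_tool("list_pods"))
    # A real loop would feed observations to suggest_next_tool(), validate each
    # requested tool through run_tool(), and stop when the model (or a human)
    # is satisfied. Automatic fixes stay out of scope for now.
```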

Level 3

Powering shells: Language models are being integrated into shells and CLIs, enabling users to enter natural-language commands rather than the cryptic shell commands that have developed organically over the last 30 or 40 years—and that are very difficult to remember, much less understand. In this case, the commands are driving the operating system, but sysadmins need to keep their feet poised over the gas and the brake to ensure that telling an OS to copy a file or directory doesn’t result in, say, the deletion of a file or directory. With all that said, I’m giving this a Level 3 designation because we are starting to experiment with asking AI to make changes to the shell that currently require searching the web to find weird strings of characters that you copy and paste (and then pray will work). We’ve seen it work in simple use cases, but you still need a human in the loop—with the ability to take control at any time.
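A minimal sketch of that human-in-the-loop pattern might look like the following; the translation step is a hard-coded stand-in for a real language model, and the proposed command is always shown for confirmation before anything runs.

```python
"""Sketch of a natural-language shell wrapper with a human in the loop.

The translation step is a stand-in for any LLM, and the proposed command is
always displayed for confirmation, so the operator's feet stay poised over
the gas and the brake.
"""
import subprocess


def translate_to_shell(request: str) -> str:
    """Placeholder: ask a language model to turn a plain-English request into
    a shell command. Hard-coded here to keep the sketch self-contained."""
    canned = {
        "copy the config directory to a backup": "cp -r /etc/myapp /etc/myapp.bak",
    }
    return canned.get(request, "echo 'no translation available'")


if __name__ == "__main__":
    request = input("What do you want to do? ")
    command = translate_to_shell(request)
    print(f"Proposed command: {command}")
    if input("Run it? [y/N] ").strip().lower() == "y":
        subprocess.run(command, shell=True, check=False)
    else:
        print("Nothing executed.")
```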

Log analysis: Log analysis is tedious and tiring, like driving six hours on a freeway. Log data is essentially free-form natural language. Taking humans completely out of the log analysis process would be irresponsible, but we can use generative AI to reduce the cognitive load massively, by perhaps 80 to 90 percent. For example, sysadmins could use generative AI to summarize a million lines of log data into a couple of sentences. Or a sysadmin might analyze the log data using retrieval-augmented generation (RAG), asking interactive questions until they get the answer they’re looking for, say, the cause of a problem they’re seeing. That approach might also be used in the future to comply with regulations that require “reading the logs.” But a human still needs to evaluate the data and decide what actions to take, which I’d say puts this use case at Level 3.
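As a sketch of how that cognitive-load reduction might be wired up, the following pre-filters and chunks a large log before anything reaches a model. The summarize() function is a stand-in for whatever generative AI service is actually in use, and the log path and keywords are illustrative.

```python
"""Sketch of cutting a huge log down before a model ever sees it.

A crude pre-filter plus chunking: pull out warning/error lines, split them
into prompt-sized chunks, and summarize each chunk. summarize() is a
stand-in for a real generative AI call.
"""
from pathlib import Path

CHUNK_LINES = 500  # assumption: roughly what fits in one prompt comfortably


def interesting_lines(log_path: Path) -> list[str]:
    """Keep only lines likely to matter; the rest is noise for a summary."""
    keep = ("ERROR", "WARN", "CRITICAL", "FATAL")
    return [ln for ln in log_path.read_text(errors="replace").splitlines()
            if any(token in ln for token in keep)]


def chunks(lines: list[str], size: int = CHUNK_LINES):
    """Yield newline-joined chunks of at most `size` lines."""
    for i in range(0, len(lines), size):
        yield "\n".join(lines[i:i + size])


def summarize(text: str) -> str:
    """Stand-in for the model call that turns a chunk of log lines into a
    couple of sentences. It only reports counts so the sketch runs; swap in
    whatever generative AI service is actually in use."""
    lines = text.splitlines()
    return f"{len(lines)} notable lines, first: {lines[0][:80]}" if lines else "nothing notable"


if __name__ == "__main__":
    lines = interesting_lines(Path("/var/log/myapp/app.log"))  # illustrative path
    partial_summaries = [summarize(c) for c in chunks(lines)]
    # A final pass could summarize the summaries; a human still reads the
    # result and decides what action, if any, to take.
    print("\n".join(partial_summaries))
```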

Level 4

Generating config files: Generating config files is natural language processing, and natural language processing is something that AI innately does really well—a Level 4 task if ever there was one, especially when you constrain the inputs and outputs. In fact, I would say that generating config files is the same as asking AI to translate a sentence from Spanish to English or even to generate an original story with the theme of man vs. machine. But, while humans might want to write a poem, they probably don't want to manually generate config files. Using a language model to perform the task is a huge time saver that can potentially trim hundreds of human work hours down to just a few. With that said, humans must review and validate files to ensure that they, for example, address organization-specific factors or comply with industry standards. Humans also need to make sure config files are documented to help avoid problems with future translation.
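A sketch of how that review-and-validate step could be made at least partially mechanical: the generation call below is a hard-coded stand-in for a model, and the YAML parsing and required-key check (using PyYAML) are the machine-verifiable part before a human reads the draft. The key names and output path are assumptions for illustration.

```python
"""Sketch of generating a config file with a model, then gating it.

The generation call is a stand-in; the checks after it are the point: parse
the YAML, confirm required keys exist, and leave the final review to a human.
"""
from pathlib import Path

import yaml  # external dependency: pip install pyyaml

REQUIRED_KEYS = {"listen_port", "log_level", "upstream"}  # assumption


def generate_config(description: str) -> str:
    """Stand-in for asking a language model to draft a config from a
    natural-language description. Hard-coded so the sketch runs end to end."""
    return "listen_port: 8443\nlog_level: info\nupstream: checkout.internal:8080\n"


def validate(text: str) -> dict:
    """Fail fast on anything that is not even structurally plausible."""
    data = yaml.safe_load(text)
    if not isinstance(data, dict):
        raise ValueError("config did not parse to a mapping")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    return data


if __name__ == "__main__":
    draft = generate_config("frontend proxy for the checkout service")
    validate(draft)
    Path("proxy.draft.yaml").write_text(draft)
    print("Draft written to proxy.draft.yaml; review before deploying.")
```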

Updating config files: Updating config files is another tedious job that no one wants to do—the perfect candidate for generative AI and one that can be performed almost 100% autonomously. Almost. Sysadmins shouldn’t completely rely on AI to determine which config options have been deprecated and which new ones are in place—they must be the final arbiter of what’s OK and what’s not. However, a machine learning model can provide support along the way and is about as close to hands-off “driving” as sysadmins can get at this time. Put it this way: On a good day, when skies are clear and the road is straight, sysadmins could use AI to update (or generate) config files without putting their hands on the wheel, gas, or brake. But on a bad day, when the road climbs a mountain in a snowstorm and a deer can’t decide which side of the road to bolt toward, sysadmins need to be fully back in the driver’s seat.
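As a sketch of keeping the sysadmin as final arbiter, the following only reports deprecated options in a YAML config rather than rewriting anything. The deprecation table is invented for illustration and might, in practice, be drawn from release notes or suggested by a model.

```python
"""Sketch of flagging deprecated options during a config update.

The deprecation table is illustrative, and the sysadmin remains the final
arbiter: the script reports, it does not rewrite.
"""
import sys

import yaml  # external dependency: pip install pyyaml

# assumption: old option -> its replacement (or None if simply removed)
DEPRECATED = {
    "ssl_protocols_v1": "tls_min_version",
    "worker_threads": "worker_processes",
    "legacy_auth": None,
}


def report(path: str) -> int:
    """Print each deprecated key found in the file; return how many were found."""
    with open(path) as f:
        config = yaml.safe_load(f) or {}
    hits = 0
    for old, new in DEPRECATED.items():
        if old in config:
            hits += 1
            hint = f"use {new!r} instead" if new else "no replacement; remove it"
            print(f"{path}: {old!r} is deprecated ({hint})")
    return hits


if __name__ == "__main__":
    found = sum(report(p) for p in sys.argv[1:])
    # Exit non-zero so CI or a human notices, but change nothing automatically.
    sys.exit(1 if found else 0)
```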

Providing peer perspective: This one might not do much for sysadmins’ social skills, but it can be done with little human interaction. Want to find out how peers have handled a certain challenge or the criteria they have used to evaluate a certain type of technology? Where sysadmins may have reached out to their human connections in the past, they can now enter any scenario they need help with into a generative AI tool and get loads of advice, anecdotes, stories, examples, and directions. However, just as when you have a conversation with a human, you have to consider the source—and biases and potential for hallucination—when you “talk” to AI. To be fair, in years past, I’ve been extremely frustrated with the guidance my colleagues have given me, so your mileage may vary.

Conclusion

Level 5 is not possible with cars today, but there are times when Level 4 is achievable—under certain conditions. I wouldn’t put that much trust in autonomous driving during a blinding snowstorm or on a mountain road with hairpin turns, but I might on a sunny day driving along a long lonesome highway in a desert in Arizona. The same is true for system administration tasks. The extent to which AI can support sysadmin tasks is increasing quickly. But, in the end, the most powerful tool in a sysadmin's arsenal isn't AI—it's the combination of AI and human expertise. And that will likely always be the case.
