Exploratory Testing: Centaur Chess at the Jagged Frontier

I play a lot of blitz chess, though not particularly well. As Garry Kasparov quipped in the book How Life Imitates Chess, chess attracts bright minds who often play "maybe not brilliantly, but with passion and interest." What I like about blitz chess is that it trains you to make reasonably good decisions quickly, saving your time and concentration for crucial moments.

Chess players must read a board and use their experience from previous games to quickly eliminate moves that lead nowhere. Otherwise, faced with an exponential number of combinations within a few moves, human players are soon overwhelmed.

Computers, of course, don't have the same problem. Blessed with near-perfect recall of millions of games and the ability to perform vast amounts of brute-force calculation, a typical home computer can explore chess lines to a depth that would challenge all but the most masterful players. Computers, however, are predictable players when taken out of familiar waters: novel or unusual positions can confuse an engine, leading to blunders a human player can exploit.

After his legendary confrontations with Deep Blue, Garry Kasparov realized that the combination of human and computer reasoning could advance the game of chess further than either could alone. He proposed letting players call on computer guidance during play, and the concept of Centaur Chess was born: one of the first practical demonstrations of humans and machines working together using artificial intelligence.

Similarly, advanced exploratory testers do not have to rely on intuition and experience alone. Logical places to start exploratory testing might include features that have historically contained defects, guided by device logs and product analytics. Exploring how an app behaves as services degrade, an approach pioneered by Netflix and known as chaos engineering, tends to reveal bugs and errors that arise in production but are hard to detect in a controlled test environment. Most recently, generative AI models such as ChatGPT have emerged that testers can explore for ideas.
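To make the chaos-style approach concrete, here is a minimal sketch of how a tester might mimic a degraded dependency while exploring how client code copes. The service URL and failure rates are assumptions for illustration, and real chaos engineering tools such as Netflix's Chaos Monkey inject faults at the infrastructure level rather than in test code like this.

```python
# Minimal sketch: hand-rolled fault injection to mimic a degraded service.
# The URL and failure/latency numbers below are hypothetical.
import random
import time

import requests


def flaky_get(url: str, failure_rate: float = 0.3, max_delay: float = 2.0):
    """Call a service, but randomly inject errors and latency to simulate degradation."""
    if random.random() < failure_rate:
        raise ConnectionError(f"Injected failure calling {url}")
    time.sleep(random.uniform(0, max_delay))  # injected latency
    return requests.get(url, timeout=5)


# Explore how the app behaves when its dependency misbehaves.
for attempt in range(10):
    try:
        response = flaky_get("https://api.example.com/health")
        print(attempt, response.status_code)
    except ConnectionError as exc:
        print(attempt, "degraded:", exc)
```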

As anyone active on LinkedIn has seen by now, countless startups and newly coined "prompt engineers" promise vast productivity gains from generative AI tools. For certain applications, such as graphic design and creative writing, generative AI already achieves astounding results. At the same time, it currently falls short in disciplines that require precision and logic, such as software development and mathematics, fueling naysayers who believe that generative AI is of limited use in software testing.

But as any good centaur would tell you, it won't be a self-assured human tester or yet another AI startup that beats you; it will be a tester who uses AI.

Don't be a digital dinosaur

In his article "Centaurs and Cyborgs on the Jagged Frontier," researcher Ethan Mollick drops this bombshell from the findings of his working paper:

Consultants using AI finished 12.2% more tasks on average, completed tasks 25.1% more quickly, and produced 40% higher quality results than those without.

To achieve these productivity gains, people used ChatGPT in one of two ways: as centaurs, who performed some tasks entirely themselves and left others to the AI, or as cyborgs, who blended AI and human input into almost every task.

Whether a person worked with AI as a centaur or a cyborg depended on their working style, but either approach required understanding the limitations of generative AI and using it in ways likely to produce good results. There are tasks that generative AI is simply not good at, and the boundary between what it can and cannot do is unclear. This is what Mollick calls the "Jagged Frontier."

Yes, we all know that ChatGPT is bad at math. So don't use it for that! Generative AI predicts the next word (token) from the words that came before, so it isn't built to calculate. If you want a computer to do math for you, use Wolfram Alpha. Meanwhile, as a tester, for more creative, less precise tasks such as generating test case ideas, the more detail you give ChatGPT about your application, the more tailored its suggestions will be.
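As a rough sketch of what that can look like in practice, the snippet below asks a model for exploratory test charters using the OpenAI Python client. The model name, feature description, and risk areas are assumptions for illustration; the point is simply that richer context in the prompt tends to produce more tailored suggestions.

```python
# Minimal sketch: asking a generative AI model for exploratory test ideas.
# Assumes the official OpenAI Python client (pip install openai) and an
# OPENAI_API_KEY in the environment; the feature details are hypothetical.
from openai import OpenAI

client = OpenAI()

# The more context about the feature, users, and known risk areas you provide,
# the more tailored the suggested test charters tend to be.
prompt = """
You are an experienced exploratory software tester.
Feature under test: guest checkout in a mobile shopping app (iOS and Android).
Known risk areas: flaky payment gateway, intermittent connectivity, address validation.
Suggest 10 exploratory test charters, each with a goal and a first step.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```

Treat the output as a starting list of charters to explore, not as a finished test plan; the tester still decides which ideas are worth pursuing.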

Generative AI technology is still in its early stages, and testers who take the time to explore its capabilities and figure out its boundaries as they apply to software testing have the opportunity to become vastly more productive.