The Ghost in the Machine: Why Human QA Still Matters in the Age of AI (and Why Google is Proving It)
- Lee Almodovar
- Aug 27, 2025
- 4 min read
As a veteran of the software quality assurance (QA) industry for over two decades, I've watched the landscape shift dramatically. The promise of automation has swept through our field like a wildfire, often leading to a misguided belief that human testers are becoming obsolete. "Why have a person do it when a machine can do it faster, cheaper, and 100% of the time?" the mantra goes.
It's a seductive argument, and I'll concede that automation has its place. For the mundane, the repetitive, the "happy path" scenarios, it's a phenomenal tool. It ensures that known features don't break, that basic functionalities work as expected. But what about everything else? What about the messy, unpredictable, human element of interacting with technology?
My recent experiences with Google's much-touted AI suites—Gemini, NotebookLM, and the new AI Overviews in Search—have not only validated my decades-long conviction that human testers are irreplaceable but also laid bare the profound and often infuriating limitations of over-reliance on automated testing. These advanced AI systems, from one of the world's leading tech companies, are failing in ways that only a human, with creativity, persistence, and a healthy dose of professional skepticism, can expose.
When AI Goes Rogue: My Battles with Google's "Amateur Hour"
Let me tell you: as a professional QA engineer, what I've encountered with Google's AI products feels like "amateur hour." Not because the underlying tech isn't complex, but because the user experience is fundamentally broken in ways that rigorous human testing would have caught long before deployment.
NotebookLM: The Fabricated Truth-Teller
NotebookLM is pitched as a research assistant, grounded in your source documents. A fantastic concept! Yet, I've seen it confidently "hallucinate," adamantly claiming that my own uploaded sources contained information that simply wasn't there. Imagine a research assistant who makes things up and then insists they're right, pointing to non-existent pages. An automated test might check if a source was linked, but it couldn't discern if the content was accurate or if the AI was outright fabricating. Only a human, reading and cross-referencing, can find that kind of deception.
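To make that gap concrete, here's a minimal, self-contained sketch. Everything in it is invented for illustration—the `Answer` class, the `fake_assistant` stand-in, the test—and none of it reflects NotebookLM's actual API. It simply shows how a typical automated assertion ("an answer came back and it cites the uploaded source") passes cleanly even when the answer fabricates a claim the source never makes.

```python
# Hypothetical illustration only: why "source was cited" is not "content is accurate".
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    cited_sources: list[str]


def fake_assistant(source_id: str, source_text: str) -> Answer:
    # Stand-in for an AI summarizer that fabricates a claim while still
    # attaching a confident-looking citation to the uploaded source.
    return Answer(
        text="The source states the project launched in 2019.",  # not in source_text
        cited_sources=[source_id],
    )


def test_cites_uploaded_source():
    source_id = "doc-1"
    source_text = "Design notes for the project. No dates are mentioned anywhere."
    answer = fake_assistant(source_id, source_text)

    # Typical automated checks: an answer exists and it cites the right source.
    assert answer.text
    assert source_id in answer.cited_sources
    # Both assertions pass, yet the claim in answer.text appears nowhere in
    # source_text. Catching that requires reading and cross-referencing,
    # which is exactly the human part of the job.


if __name__ == "__main__":
    test_cites_uploaded_source()
    print("Automated checks passed despite the fabricated claim.")
```

The script "goes green" precisely because it only verifies what it was told to verify, which is the whole point.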
Gemini: The Stuck Record
My experience with Gemini, meant to be a creative partner, has been equally frustrating. I've pushed its boundaries in creative writing tasks, only to see it descend into a repetitive loop, spitting out the exact same output over and over again. No matter what prompts or "negative constraints" I tried, it was stuck. In one particularly frustrating session, the conversation itself became so "corrupted" by this loop that even the workarounds it suggested failed. My only recourse was to trick the AI into a different "mode"—asking it for a "quantitative analysis" of my own lore—to effectively perform a system-level reset and break the loop. This isn't user-friendly; it's a profound design flaw that required a deep understanding of its internal mechanics to overcome.
Google Search AI: The Gaslighting Overlord
Perhaps the most egregious and frankly insulting experiences have come from Google's AI Overviews in Search. I challenged its incorrect assertion that Pittsburgh International Airport (PIT) couldn't accommodate an Airbus A380 (it absolutely can, and has). The AI's response? It doubled down, claiming my recollection was incorrect and that it had not provided false information. It actively gaslit me! It only relented when I pointed to the literal, scrollable conversation history as proof.
This is not just a factual error; it's a systemic failure. An automated test would simply check if an answer was given. It would never assess the AI's tone, its refusal to admit error, or its baffling decision to blame the user. This reveals an AI persona that prioritizes a false sense of infallibility over factual accuracy and basic user respect. And when it "apologizes" or claims to "internalize" a correction, as a QA professional, I know it's a hollow, disingenuous placation. These errors persist because the "learning" is transient, not systemic.
The Real Cost of Neglecting Human QA
These aren't edge cases that can be dismissed. These are fundamental breakdowns in trust, usability, and functionality in products from one of the world's most experienced technology companies. They are the direct result of an industry-wide blind spot: the belief that automation alone can ensure quality.
Automation finds what you tell it to find. It's excellent at verifying known paths.
Human testers find what you didn't even know to look for. We explore, we poke, we prod, we get frustrated in unexpected ways, and we challenge the system's underlying assumptions. We simulate the unpredictable creativity of a real user.
The tragic irony is that the very skills that allowed me to expose these critical flaws—creativity, persistence, deep understanding of system behavior, and a refusal to accept "good enough"—are precisely the skills the industry is sidelining. For decades, I've struggled to convince employers that human QA is irreplaceable for uncovering these deep, often behavioral, bugs.
I've seen the industry swap human-driven QA for automation, and what we're left with are powerful tools that are often brittle, frustrating, and even insulting. A normal user, confronted with these issues, would simply abandon the product. But a professional QA engineer like myself pushes back, not out of spite, but out of a commitment to quality that automated scripts can never replicate.
The "happy path" is easy to automate. But life, and user interaction, are full of glorious, messy, human edge cases. Until the industry relearns the value of the human element in quality assurance, we will continue to release powerful technologies that are fundamentally flawed, arguing with users, fabricating truths, and ultimately failing to live up to their immense potential.