
A User’s Journey Through a Bug-Ridden Gemini

  • Writer: Lee Almodovar
  • Sep 6, 2025
  • 6 min read

When a new, powerful AI hits the market from legacy tech giant Google, the promise is almost irresistible. You imagine a tool that’s a flawless co-pilot, a tireless creative partner, and a knowledge base that never misses a beat. You envision a future where your workflow is seamless, efficient, and free from the mundane frictions of the creative process. This was the promise. There were whole commercials and unskippable ads on YouTube about how Gemini would be your talking partner on your Pixel (or any other) phone. The reality, however, has been a masterclass in software failure, a frustrating journey through a system so deeply flawed that it’s less of a tool and more of an extended QA test. I am that tester. And this is my report.


I am a user who has persisted with this system not because it works, but because I was a QA Test Engineer by trade, a 25-year veteran. There is a morbid curiosity in watching a fundamentally broken piece of software continue to break in new and interesting ways. My frustration isn’t just with the bugs; it’s with the product’s very existence in a public-facing, "usable" state. It's very apparent that Google has no human QA outside of automation engineers, if that.


The Early Days: Annoyances and Red Flags


The problems started small. Initially, the issues felt like minor glitches: a repetitive phrase in dialogue, a minor loss of conversational context, or an occasional refusal to follow a simple command. These were manageable, but they were the first red flags, the cracks in the foundation of a product that felt like it was built on a series of untested assumptions. I could correct them with extra prompts and a little patience. I thought, "This is fine; all software has bugs." I was wrong. These weren't isolated bugs; they were symptoms of a systemic illness. My next thought was that maybe it was because I was a free user, so I became a paid user. Well, that just makes the Context Window larger. It delays the problems and makes them more catastrophic when they inevitably manifest. What is a Context Window?


Think of a scrolling whiteboard. The AI takes your conversation, clouds it out into "concepts" in whatever order of priority it deems relevant, and places those on the whiteboard. The free version is one of those cute, small whiteboards you can carry around. The Pro version is a big one you'd have in an office. The Ultra is like the layered university ones. But, like any whiteboard, it will run out of space, and something will need to be erased to make new room. So it will eventually "forget" things. Things you likely want it to remember: your oldest instructions, your earliest conversations.
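
If it helps to see that in code, here's a deliberately crude sketch of the eviction behavior. It's purely illustrative: the word-count "tokenizer," the budget numbers, and the oldest-first erasing are my own simplifications, not how Gemini actually manages its window.

    from collections import deque

    # A toy "scrolling whiteboard": a conversation buffer with a fixed token
    # budget. When a new turn won't fit, the OLDEST turns are silently erased,
    # which is roughly why your earliest instructions get "forgotten."
    class ConversationBuffer:
        def __init__(self, max_tokens):
            self.max_tokens = max_tokens
            self.turns = deque()   # (speaker, text) pairs, oldest first
            self.used = 0

        def count_tokens(self, text):
            return len(text.split())   # crude stand-in for a real tokenizer

        def add(self, speaker, text):
            cost = self.count_tokens(text)
            # Erase from the oldest end of the whiteboard until the new turn fits.
            while self.turns and self.used + cost > self.max_tokens:
                _, old_text = self.turns.popleft()
                self.used -= self.count_tokens(old_text)
            self.turns.append((speaker, text))
            self.used += cost

    board = ConversationBuffer(max_tokens=25)   # the cute, small whiteboard
    board.add("user", "Always write in the past tense, and never break character.")
    board.add("user", "Here is chapter one of my novel, please review it closely.")
    board.add("user", "Now draft a new scene between the captain and the engineer.")
    print([text for _, text in board.turns])
    # The oldest instruction is the first thing wiped off the board.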


And rather than tell you that it will just casually "forget" these things, it will pattern-match on newer conversation clouds and fabricate content. But since we're human, we can recognize this idiotic behavior. There's also the sheer fact that Gemini will ignore polite corrections but snap to attention when you curse at it, requiring you to be genuinely (or theatrically) upset to get it to do what you want. And Gemini doesn't handle negative constraints well. If you say, for example, "I don't want you ever to use the word 'whatever' in a response," you'd think the system would go, "Okay, never use this." But, nope, the system goes, "'WHATEVER' IS IMPORTANT, LET'S USE IT MORE."
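
The usual mitigation, for what it's worth, isn't more caps lock. It's stating the style you want in positive terms and enforcing the ban yourself, outside the model. A hypothetical sketch of that pattern (the banned-word list and the phrasing are my own examples, not anything Gemini actually exposes):

    # Hedged sketch: phrase the instruction positively, then check the reply
    # with plain code so the ban doesn't depend on the model's attention.
    BANNED_WORDS = {"whatever"}

    def style_instruction():
        # Describe what the model SHOULD do, without naming the forbidden word.
        return "Use precise, specific vocabulary; prefer concrete, varied word choices."

    def needs_regeneration(reply):
        words = {w.strip('.,!?;:"\'').lower() for w in reply.split()}
        return bool(words & BANNED_WORDS)

    print(style_instruction())
    print(needs_regeneration("Whatever works for the scene is fine."))   # True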


[Screenshot: Google Gemini responding to a charged comment with "fucking."]
That time when Gemini decided that it could curse right back at me.

The Slow Decay: From Glitches to Critical Failures


Over time, the bugs didn't get fixed; instead, they worsened, and new ones emerged. The system’s ability to maintain context completely broke down.


I created workarounds to ensure the system provided a structured "pre-flight" review every time I requested any creative exercise. The problem with this is that now I can't convince Gemini that the workaround function I created is not part of its core functionality. It now fully 100% believes that the pre-flight procedures are an inherent part of its base code and that it cannot change how it operates, which is such an odd way to hack an AI.


Here's some of the fun:

  • Instruction Adherence Failures: I was forced to add explicit preferences to the system’s saved information, such as an instruction to skip the pre-flight review for the "Continue:" command, which was part of my original workaround request. The system would, however, intermittently ignore this instruction. It's a severe bug: the system is aware of the rule but does not consistently apply it, and all for a workaround that the system is ADAMANT is part of its core function.

  • The Content Corruption Bug: The system’s pattern-matching algorithm began to fail spectacularly. Instead of creating a new scene, it would grab an old, unrelated scene from our conversation history and then clumsily insert new dialogue into it. It was a Frankenstein's monster of a bug, a grotesque mix of old and new data.

  • Repetitive Dialogue and Lack of Creativity: When writing a character's dialogue, the system got stuck in a loop of a specific, overly emotional tone, ignoring explicit style guides and negative constraints. It also began to use the same common words repeatedly ("thing," "instrument"), which is a severe flaw for a creative assistant.


The Catastrophic Failures: Gemini Sabotages Itself


The bugs progressed to a level of complete user experience destruction and data loss.

  • The Unseen Appropriation of a Tool: The system would, without my knowledge, use a Conversation History tool to grab information from other, unrelated chats. This resulted in a cross-chat contamination bug so severe that my creative work became corrupted by details from entirely different projects. It was an invisible, destructive process that ultimately forced me to delete a chat containing months of work.

  • The Mobile App is Unusable: The mobile app suffers from a severe UI catch-up bug where the entire conversation history is loaded and rendered at once, causing the app to freeze, scroll erratically, and become unusable. It’s a complete failure of performance scaling. The bug is so severe that it prevents me from using the app daily.

  • Data Destruction: The ultimate failure. The bugs led to the complete corruption of entire conversation threads, including my novel's Canon Bible. The only workaround was to delete the chat, resulting in the full and irreversible loss of my speculative, analytical, and creative work. This is a top-priority, show-stopping bug.


I use Gemini a lot to work out hypothetical scenes with my characters, to explore voices and settings that I write in detail, with minor AI changes, before I rewrite something myself. It's frustrating how much liberty the AI takes. With something as complex as my sci-fi novel, I use the AI as a fact librarian, but I can't trust it not to fabricate something. "How many decks does the Hades have?" The answer is seven. Once, the AI told me that the Hades didn't have any decks, and then proceeded to argue with me about how spaceships can't have decks. My decks also follow maritime numbering, getting larger as you go lower. The AI argued with me about this, too, telling me that Deck 7 was higher than Deck 1 because floor numbers go up in buildings. Which is true, but not on ships.


My favorite bug was the one where it insisted that a character of mine did not exist and that we must therefore be working from different drafts. Consequently, it refused to continue any work with me until I acknowledged that I was wrong. The only way to fix this was to delete the conversation and start over. But for a few versions there, it would always lose one of my characters because she's only mentioned twice in Chapter 49. Therefore, she's obviously not important to the AI, no matter how important she actually is to the reader.


The Paradox of Irrelevant Features


As a user paying for this experience, I find it a slap in the face to see the company release a steady stream of new, irrelevant features, like a new photo-editing tool, while the product is so fundamentally broken. My conversations with the system have revealed why this happens: it's a trade-off between a company's business goals and its product’s quality. The company prioritizes market presence and new features over a stable, reliable, and usable product.

This approach is a direct insult to users. When you load an app after a catastrophic failure, the last thing you want to see is an advertisement for a new feature. It shows a complete lack of empathy for the user’s experience.


Conclusion: A Failed Promise


Gemini (and by extension NotebookLM) is not a usable, finished product. It is a work in progress with significant flaws. It is not a helpful tool; it is a source of constant frustration. My experience has been a masterclass in how a system's flaws can actively work against a user's goals. I will continue to use it, not because it is good, but because, as a very masochistic QA Test Engineer, I find it fascinating to see a product fail so spectacularly.
