
DEV Community

Aston Cook

The Flaky Test Question That Separates Senior QA Engineers From Juniors

I've run more than 50 automation interviews in the past year. The same question exposes experience gaps faster than any other:

"Tell me about the last flaky test you fixed. Walk me through your debugging process."

That's it. Two sentences. And within two minutes, I usually know if a candidate has actually shipped automation or just read about it.

Here's what I hear from junior candidates, what I hear from senior ones, and what you should say if you want to sound like you belong in the room.

The junior answer

Junior candidates almost always say something like this:

"I added a wait. Then it passed."

Sometimes it's sleep(5). Sometimes it's page.waitForTimeout(3000). Sometimes they swap the selector. The test goes green, they merge the PR, and they move on.

I'm not trying to pile on. Every automation engineer has done this at some point, myself included. The problem is not the quick fix. The problem is that the candidate treats a flaky test as a nuisance to silence rather than a signal to investigate.

When I push and ask "why was it flaky in the first place," the room gets quiet. That silence is what I'm listening for.

The senior answer

Senior candidates treat flakiness as a diagnostic puzzle. They have a mental checklist. When I ask the same question, they give me something closer to this:

"I start by asking whether the flake is in the test, the app, or the environment. Most flakes I've seen in the last year have been race conditions where the test asserts before the app finishes a network call. I check the trace first, then I look at whether we have an auto-waiting locator strategy, then I look at test isolation."

Notice what's happening. They are not naming a tool. They are naming a process. Tools change. Process compounds.

The three buckets every senior engineer knows

When you get the flaky test question, structure your answer around three buckets. This is how I teach it in mock interviews on AssertHired, and it mirrors how strong engineers actually think on the job.

Bucket 1: The test itself

Most flaky tests are bad tests. Common patterns I see:

  • Hardcoded waits instead of web-first assertions
  • Selectors that rely on DOM structure instead of accessibility or test IDs
  • Tests that depend on previous test state
  • Assertions that run before the network call resolves
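The first bullet is the most common one. Playwright's web-first assertions poll for the state you care about instead of sleeping a fixed amount; in plain TypeScript, the underlying idea looks something like this sketch (the `waitFor` helper and its defaults are mine, not a library API):

```typescript
// Generic condition poller: resolves as soon as the predicate is true,
// fails after a deadline instead of sleeping a fixed amount and hoping.
async function waitFor(
  predicate: () => boolean,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (predicate()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}

// Usage sketch: instead of sleep(5000) followed by an assertion,
// poll for the thing you actually want to be true.
let modalOpen = false;
setTimeout(() => { modalOpen = true; }, 100); // app opens the modal "eventually"
waitFor(() => modalOpen).then(() => console.log("modal visible"));
```

The point is not the helper itself. It is that the wait is tied to a condition, so the test is exactly as slow as the app and no slower, and a genuine failure still surfaces as a timeout.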

Playwright made a lot of this easier with built-in auto-waiting. Reports suggest teams moving from Selenium with custom waits to Playwright cut flake rates by around 60 percent, and that matches what I saw when we migrated our suite at Resilience. We did not get to zero flakes. We got to a place where flakes usually pointed at a real bug instead of a timing trick.

If you fix bucket one, you sound competent. You do not yet sound senior.

Bucket 2: The application under test

This is where the senior engineers earn their title.

A test that flakes on button.click() followed by a modal assertion might not be a test problem. It might be a product bug. The modal sometimes opens in 200ms and sometimes in 2 seconds because a backend call is slow under load. Fixing the wait hides the bug. Logging the timing, filing a ticket, and partnering with the dev team surfaces it.

In one interview last month, a candidate told me she found a flaky login test that was actually catching a rare session race condition. She escalated it, the team patched it, and the flake disappeared. That story took her 90 seconds to tell, and it got her the recommendation.

When you interview, bring a story like that. One is enough.

Bucket 3: The environment and infrastructure

Flaky tests in CI that pass locally are almost always environmental. Things I check:

  • Is the CI runner sharing resources? A test that passes on a fresh machine but flakes on a loaded runner is a concurrency problem, not a test problem.
  • Are test artifacts being cleaned between runs?
  • Is the database seeded deterministically or are we pulling from a shared staging DB?
  • Are we running with too much parallelism for the app to handle?

I once chased a flake for three days before realizing two parallel test workers were hitting the same user account. The fix was not in the test. It was in the test data strategy. We moved to per-worker test users, and the flake rate on that suite dropped from around 8 percent to under 1 percent.

Test data is where a lot of senior engineers separate themselves. If you have a story about building a fixture factory, a per-test tenant, or a cleanup hook that actually works, tell it.

What I want to hear in the first 60 seconds

When I ask the flaky test question, the best answers share three traits.

First, they start with a real example. Not "I would do X." Instead: "Last month I had a test that failed once every 20 runs." Real numbers beat hypotheticals every time.

Second, they name the category before the fix. Saying "this was a test isolation problem, not a timing problem" tells me the candidate has a mental taxonomy.

Third, they end with a system change, not just a code change. "We added a pre-run health check for the staging environment" is a better answer than "I added a retry." Retries are fine. Systemic improvements are better.

Common interview traps I see

A few anti-patterns that tank otherwise strong candidates.

Talking about test retries as the primary solution. Retries are a pressure valve. They are not a strategy. If your answer starts and ends with "we retry three times," I will assume you have not actually fixed a flake.

Name-dropping tools without context. Saying "we use Playwright with the trace viewer" is fine. Saying "we use Playwright with the trace viewer because the trace viewer showed us the button re-rendered during our click" is much better. The second version proves you actually use the tool.

Blaming the devs. Sometimes the app is the problem. Saying so is fine. Saying it with contempt is not. Strong engineers talk about partnering with the dev team, filing tickets, and sharing reproductions. That cultural signal matters more than people think.

Preparing for this question

If you have an interview coming up, spend 20 minutes doing this exercise.

Pick the last three flaky tests you worked on. For each one, write down: the category (test, app, or environment), the symptom, the root cause, the fix, and the systemic improvement you made after.

If you cannot fill in all five fields for at least one test, you need more reps before your interview. That's not a judgment. It's just a gap you can close in a week of focused work.

If you can fill in all five for multiple tests, you are further along than most candidates I see. Practice telling one story in under 90 seconds, and you are going to stand out.

The bigger point

Flaky tests are the most honest question I can ask in an interview. They reveal whether a candidate has actually shipped automation, whether they treat testing as a craft or a checkbox, and whether they think in systems or in snippets.

If you can answer this question well, you can probably answer most of my other questions too. And if you cannot answer it yet, the path is straightforward: ship more automation, fix more flakes, keep a journal of what you learn. There is no substitute for reps.


Aston Cook is a Senior QA Automation Engineer and founder of AssertHired, an AI-powered mock interview platform for QA professionals. He has conducted 50+ automation engineer interviews and writes about QA career development. Find him on LinkedIn (16K+ followers) or at asserthired.com.
