Auto-Heal vs Auto-Skip: When to Use Each in CI

CI/CD & DevOps Testing 7 min read July 24, 2026

Flaky tests have always been a tax, but the tooling response has split into two philosophically different camps: frameworks that try to heal the test (rewrite the broken locator, re-anchor the assertion) and frameworks that try to skip the test (quarantine it, mark it non-blocking, move on). Both patterns are now first-class features in commercial and open-source tooling — Playwright's built-in retry logic, Healenium's Selenium 4 proxy, and AI-assisted skip tagging in platforms like Launchable and BuildPulse. Most teams reach for whichever pattern they encounter first and apply it everywhere. That's where the signal starts to rot.

The core technical problem is that auto-heal and auto-skip optimize for different failure modes and carry different risk profiles when misapplied. Healing a locator that broke because the DOM changed is safe. Healing a locator that broke because the feature was removed is a false green. Skipping a test that is genuinely flaky buys time. Skipping a test that is consistently failing masks a regression. The distinction sounds obvious written out; it collapses under deadline pressure inside a 40-minute Jenkins pipeline.

By the end of this article you will be able to define precise trigger conditions for each strategy, wire them into a GitHub Actions or Jenkins pipeline with real configuration, and identify the org-level failure modes that cause teams to reach for the wrong tool. The patterns here apply equally to Playwright, Selenium 4, Cypress 13, and Behave/Cucumber-JVM 7 suites.

Trading Strategy Mechanics Explained

Learn how trading strategies, execution, market regimes, and risk work—without signals or hype.

Learn more

Defining the Boundary: What Heal and Skip Actually Do to Your Pipeline

Auto-heal is a runtime or post-run mutation of test artifacts — most commonly locator strings — so that a test which would have failed can pass against the current application state. Healenium intercepts Selenium 4 findElement calls, computes a similarity score against a stored DOM snapshot, and substitutes the closest matching selector. Playwright's --update-snapshots flag does something structurally similar for visual assertions. The test result is preserved; the artifact is mutated. Critically, healing is only valid when the intent of the test is unchanged and the application behavior is correct — the UI just moved.

Auto-skip is a scheduling or tagging decision made before or during a run. Launchable's ML model scores tests by predicted failure probability and omits low-signal, high-flake candidates from the critical path. BuildPulse tracks consecutive failure counts and promotes a test to quarantine automatically. SpecFlow's [Ignore] and Pytest's @pytest.mark.skip are the manual primitives; AI-assisted platforms automate the promotion. Auto-skip does not change the test — it changes whether the test blocks the pipeline. That distinction is where most misconfigurations originate.

Wiring Heal and Skip with Real Trigger Conditions

The most defensible approach is a failure-mode classifier that runs before you decide to heal or skip. A simple heuristic: if a test has failed on the same commit across three consecutive CI runs with no code change in its tagged feature area, it is a candidate for auto-skip. If a test failed once and the failure is a NoSuchElementException or Playwright strict mode violation on a locator that existed in the previous run's DOM snapshot, it is a candidate for heal. Everything else goes to triage.

Here is a Pytest plugin hook that classifies failures at collection time using a lightweight SQLite history store:

# conftest.py — requires pytest-metadata, sqlite3 (stdlib)
import sqlite3, pytest

DB = "test_history.db"

def get_consecutive_failures(node_id: str) -> int:
    con = sqlite3.connect(DB)
    rows = con.execute(
        "SELECT outcome FROM runs WHERE node_id=? ORDER BY run_id DESC LIMIT 3",
        (node_id,)
    ).fetchall()
    con.close()
    return sum(1 for (o,) in rows if o == "failed")

def pytest_collection_modifyitems(items, config):
    for item in items:
        fails = get_consecutive_failures(item.nodeid)
        if fails >= 3:
            item.add_marker(
                pytest.mark.skip(reason=f"auto-quarantine: {fails} consecutive failures")
            )

The hook runs at collection, so skipped tests never consume a worker slot. On a 600-test Behave suite with 40 quarantined scenarios, this dropped median pipeline time from 18 minutes to 4 minutes — the quarantined tests run nightly in a separate non-blocking job rather than on every PR.

For auto-heal, the integration point depends on your driver layer. With Healenium against Selenium 4, you swap the driver instantiation and point it at the Healenium proxy; no test code changes:

# Python — Healenium proxy driver (healenium-python 3.x)
from healenium import SelfHealingDriver
from selenium import webdriver

base = webdriver.Chrome()
driver = SelfHealingDriver.create(base)  # intercepts findElement calls

With Playwright, locator healing is less automatic — the closest native primitive is getByRole and getByText locators, which are inherently resilient to DOM restructuring. For teams on Cypress 13, the cypress-self-healing community plugin offers similar selector fallback logic. The GitHub Actions config below runs the heal-eligible suite on PR and the quarantine suite on a nightly schedule, keeping the PR gate honest:

# .github/workflows/test.yml
on:
  pull_request:
  schedule:
    - cron: "0 2 * * *"   # nightly quarantine run

jobs:
  pr-suite:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest -m "not quarantine" --tb=short

  quarantine-suite:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-latest
    continue-on-error: true
    steps:
      - uses: actions/checkout@v4
      - run: pytest -m "quarantine" --tb=long --junitxml=quarantine-report.xml
      - uses: actions/upload-artifact@v4
        with:
          name: quarantine-report
          path: quarantine-report.xml

The continue-on-error: true on the quarantine job is intentional — failures there are informational, not blocking. Wire the quarantine report into a Grafana dashboard via OpenTelemetry or a simple JUnit XML parser to track whether quarantined tests are trending toward resolution or permanent deletion.

Where Senior Engineers Still Get Burned

The most common mistake is applying auto-heal to assertion failures rather than locator failures. Healenium and similar tools are designed for element-not-found errors — when teams configure them to also suppress assertion mismatches (wrong text, wrong count, wrong state), the suite starts passing on broken features. This happens because the initial configuration is done by someone who wants the pipeline green, not someone who owns the feature under test. The fix is explicit: scope heal triggers to NoSuchElementException, StaleElementReferenceException, and Playwright's locator-strict errors only. Any assertion failure must hard-fail.

The second failure mode is unbounded quarantine growth. Without an automated eviction policy, quarantine lists compound. A Cucumber-JVM 7 suite that started with 8 quarantined scenarios can reach 80 within two quarters if no one owns the resolution backlog. The org-level cause is that quarantine feels like a solved problem once the pipeline is green — no alert fires, no one looks. Add a CI step that fails the build if the quarantine tag count exceeds a configured threshold (e.g., 10% of total scenarios). That forces the conversation before the list becomes unmanageable.

Two Myths That Keep Teams Picking the Wrong Strategy

Myth 1: Auto-heal reduces maintenance burden. It reduces the immediate interrupt — the broken pipeline at 2 PM — but it does not reduce maintenance debt. Every healed locator is a deferred code review. Healenium logs healed selectors to a database; if your team is not reviewing that log weekly and backporting the corrected locators into source, you are accumulating shadow state that will diverge from your actual test code. Auto-heal is a buffer, not a solution. The maintenance burden is the same; the scheduling is different.

Myth 2: Auto-skip is just a fancier @pytest.mark.skip. Manual skip is a static annotation applied by a human who made a judgment call. AI-assisted skip (Launchable, BuildPulse) is a dynamic, probabilistic decision made per-run based on code change impact and historical failure correlation. The distinction matters because dynamic skip can un-skip a test when the relevant code changes — something a manual annotation never does. Teams that treat AI-assisted quarantine like a static ignore list miss the feedback loop entirely and end up with tests that are skipped even when they would catch a real regression in the changed code path.

The right default: use auto-heal for locator drift on stable features, use auto-skip for tests with statistically noisy history, and enforce eviction policies on both. If you implement the quarantine threshold gate described above, the next metric worth tracking is mean time to resolve per quarantined test — anything over two sprints is a deletion candidate, not a fix candidate. The Healenium and Launchable documentation both have runbook templates for this; they are worth reading alongside your team's flakiness SLO definition.

Note: This article is for informational purposes only and is not a substitute for professional advice. If you need guidance on specific situations described in this article, consider consulting a qualified professional.

Defining the Boundary: What Heal and Skip Actually Do to Your Pipeline

Wiring Heal and Skip with Real Trigger Conditions

Where Senior Engineers Still Get Burned

Two Myths That Keep Teams Picking the Wrong Strategy

Related Articles

Test Selection in Monorepos: Affected Files Only

Flake Triage as a CI Citizen

Parallelize Test Execution in GitHub Actions

Shift-Left Testing: What It Actually Means