The False Positive Tax: Why eslint-plugin-security Might Be Costing You More Than It Saves

HERALD

Every false positive is a withdrawal from your team's trust account

Here's a scenario that plays out on engineering teams constantly: a security linter fires during code review. The developer investigates, confirms it's a false positive, adds a suppression comment, and moves on. Repeat this enough times and something subtle but serious happens: engineers stop reading security alerts. The linter is still running. The CI badge is still green. But the humans have checked out.

This is the false positive tax, and a new empirical analysis of eslint-plugin-security puts hard numbers on what most developers have only felt intuitively.

---

What the benchmark actually measured

The analysis weighs every catch against its cost: for each true positive a tool reports, how many false positives come along with it? This framing matters because raw catch counts mean little without precision context. A tool that flags 50 issues, 40 of which are noise, isn't better than a tool that flags 15 issues cleanly.

The results are striking. Across the tested plugins:

  • `eslint-plugin-security`: 4 false positives, 84% precision
  • `secure-coding`: 0 false positives, 100% precision

An 84% precision rate sounds reasonable until you do the math at scale. If your codebase generates 100 security alerts per sprint, you're manually triaging ~16 phantom issues. That's real engineering time spent on noise — time that could go toward fixing actual vulnerabilities.
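To make the arithmetic concrete, here is a minimal sketch of the precision calculation. It assumes the benchmark's 84% figure is the usual TP / (TP + FP) ratio, under which 4 false positives would correspond to roughly 21 true positives. That count is an inference, not a number reported by the benchmark, and the function names are illustrative.

```typescript
// Precision = true positives / (true positives + false positives).
function precision(truePositives: number, falsePositives: number): number {
  const total = truePositives + falsePositives;
  if (total === 0) {
    throw new Error("precision is undefined when there are no alerts");
  }
  return truePositives / total;
}

// Expected number of noise alerts among `alerts` findings at a given precision.
function expectedFalsePositives(alerts: number, precisionRate: number): number {
  return alerts * (1 - precisionRate);
}

// Assumption: 21 true positives is inferred from "4 false positives, 84% precision".
const inferredPrecision = precision(21, 4);
const noisePerSprint = expectedFalsePositives(100, inferredPrecision);

console.log(inferredPrecision.toFixed(2)); // 0.84
console.log(Math.round(noisePerSprint)); // 16
```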

> "A tool that flags many things but misses important vulnerability patterns gives a false sense of safety. Lint noise is not security value."

---

Where eslint-plugin-security tends to misfire

The false positives in eslint-plugin-security often stem from rules that are syntactically aware but semantically naive. The classic example is the detect-object-injection rule, which flags bracket notation access on objects:

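```typescript
// Illustrative declarations so the snippet stands on its own
interface User { name: string; email: string; role: string }
const obj: Record<string, string> = { name: 'Ada' };
const userInput: string = 'name'; // stands in for untrusted input

// eslint-plugin-security will flag this
const value = obj[userInput];

// But it also flags this, which is completely safe
const KEYS = ['name', 'email', 'role'] as const;
type SafeKey = typeof KEYS[number];

function getSafeField(key: SafeKey, user: User) {
  return user[key]; // ← flagged, despite being type-safe
}
```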

The rule can't distinguish between a genuinely dangerous dynamic property lookup and one that's been constrained by TypeScript's type system, a validated enum, or runtime checks. It pattern-matches the syntax and fires regardless.
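A runtime check of the kind mentioned above might look like the following sketch. The `ALLOWED_FIELDS` list, the `isAllowedField` guard, and the `readField` helper are hypothetical names chosen for illustration. The lookup only runs after the key has been validated against a fixed allowlist, yet a purely syntactic rule would still see bracket notation with a variable key.

```typescript
const ALLOWED_FIELDS = ["name", "email", "role"] as const;
type AllowedField = (typeof ALLOWED_FIELDS)[number];

// Type guard: narrows an arbitrary string to one of the allowlisted keys.
function isAllowedField(key: string): key is AllowedField {
  return (ALLOWED_FIELDS as readonly string[]).indexOf(key) !== -1;
}

function readField(
  record: Record<AllowedField, string>,
  key: string
): string | undefined {
  if (!isAllowedField(key)) {
    return undefined; // rejects "__proto__", "constructor", and anything else unlisted
  }
  return record[key]; // still bracket access with a variable key
}

const user = { name: "Ada", email: "ada@example.com", role: "admin" };
console.log(readField(user, "email")); // ada@example.com
console.log(readField(user, "__proto__")); // undefined
```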

This is a fundamental limitation of pattern-based linting versus semantic analysis. ESLint rules operate on the AST with limited data-flow awareness. They see what code looks like, not what it does.

---

The deeper problem: false positives compound

Noise doesn't just waste time linearly — it compounds. Here's what typically happens on teams that tolerate a noisy security linter:

1. Suppression creep: Developers add // eslint-disable comments liberally. Some of those suppressions cover real issues that happen to sit near false-positive-prone patterns.

2. Rule weakening: The team disables the noisiest rules at the config level, reducing coverage broadly.

3. Alert fatigue: Engineers develop a mental filter that deprioritizes security findings, including the real ones.

4. False security: The tool is still active, reports still show findings, but the actual signal-to-noise ratio has degraded to the point where coverage is largely theater.

This is why precision matters as much as recall. A tool with 100% recall (catches everything) but 30% precision (most alerts are noise) is, in practice, less secure than one with 80% recall and 90% precision — because the high-noise tool trains your team to ignore it.
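The comparison can be worked through with a small sketch. It assumes a hypothetical codebase containing 10 real vulnerabilities and uses the two illustrative tool profiles from the paragraph above. Neither the vulnerability count nor the `triageLoad` function comes from the benchmark.

```typescript
interface TriageLoad {
  realIssuesFound: number;
  totalAlerts: number;
  falseAlarms: number;
}

// Given recall and precision, estimate what reviewers face for a known number of real issues.
function triageLoad(realIssues: number, recall: number, precision: number): TriageLoad {
  const realIssuesFound = realIssues * recall; // true positives
  const totalAlerts = realIssuesFound / precision; // TP / precision = TP + FP
  return {
    realIssuesFound,
    totalAlerts,
    falseAlarms: totalAlerts - realIssuesFound,
  };
}

const noisy = triageLoad(10, 1.0, 0.3); // 100% recall, 30% precision
const quiet = triageLoad(10, 0.8, 0.9); // 80% recall, 90% precision

console.log(noisy.totalAlerts.toFixed(1), noisy.falseAlarms.toFixed(1)); // 33.3 23.3
console.log(quiet.totalAlerts.toFixed(1), quiet.falseAlarms.toFixed(1)); // 8.9 0.9
```

Under these assumptions the noisy tool surfaces two more real issues, but reviewers have to sort through roughly 23 false alarms to find them. That is the condition under which alerts start getting ignored.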

---

Practical approach: measure precision in your own codebase

Benchmark results are directionally useful, but your codebase is not the benchmark. The right approach is to run a precision audit on your own code before standardizing on any security plugin:

```bash
# Run your security linter and capture the output as JSON
# (assumes eslint-plugin-security is already loaded by your ESLint config)
npx eslint --format json src/ \
  --rule '{"security/detect-object-injection": "warn"}' \
  > security-audit.json

# Count findings by rule
jq '[.[].messages[]] | group_by(.ruleId) | map({rule: .[0].ruleId, count: length})' security-audit.json
```

Then manually review a sample — say, 20 alerts per rule. Classify each as true positive, false positive, or ambiguous. If a rule is hitting more than 20-30% false positives in your stack, it's worth either disabling it or scoping it tightly to the files where it's genuinely relevant.
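Once a sample is labeled, the per-rule false positive rate can be tallied in a few lines. This is a minimal sketch: the `LabeledAlert` shape, the `falsePositiveRates` function, and the sample data are all made up for illustration. Ambiguous alerts are excluded from the denominator, which is one reasonable choice among several.

```typescript
type Verdict = "true_positive" | "false_positive" | "ambiguous";

interface LabeledAlert {
  ruleId: string;
  verdict: Verdict;
}

// Returns, per rule, the share of clearly classified alerts that were false positives.
function falsePositiveRates(alerts: LabeledAlert[]): Map<string, number> {
  const counts = new Map<string, { fp: number; decided: number }>();
  for (const alert of alerts) {
    if (alert.verdict === "ambiguous") continue; // assumption: skip undecided alerts
    const entry = counts.get(alert.ruleId) || { fp: 0, decided: 0 };
    entry.decided += 1;
    if (alert.verdict === "false_positive") entry.fp += 1;
    counts.set(alert.ruleId, entry);
  }
  const rates = new Map<string, number>();
  counts.forEach((entry, ruleId) => rates.set(ruleId, entry.fp / entry.decided));
  return rates;
}

// Hypothetical hand-labeled sample
const sample: LabeledAlert[] = [
  { ruleId: "security/detect-object-injection", verdict: "false_positive" },
  { ruleId: "security/detect-object-injection", verdict: "false_positive" },
  { ruleId: "security/detect-object-injection", verdict: "true_positive" },
  { ruleId: "security/detect-object-injection", verdict: "ambiguous" },
  { ruleId: "security/detect-child-process", verdict: "true_positive" },
];

const rates = falsePositiveRates(sample);
rates.forEach((rate, ruleId) => {
  console.log(`${ruleId}: ${(rate * 100).toFixed(0)}% false positives`);
});
```

A rule whose rate lands above the threshold your team picks becomes a candidate for disabling or scoping.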

```javascript
// .eslintrc.js: scoping a noisy rule to only untrusted-input handlers
module.exports = {
  plugins: ['security'], // loads eslint-plugin-security
  overrides: [
    {
      files: ['src/api/handlers/**/*.ts', 'src/middleware/**/*.ts'],
      rules: {
        'security/detect-object-injection': 'error',
      },
    },
  ],
  rules: {
    // Disabled globally due to 35% FP rate in our codebase
    'security/detect-object-injection': 'off',
  },
};
```

This isn't silencing security concerns — it's concentrating the rule's attention where it actually adds value.

---

How to think about security linting in your stack

ESLint security plugins occupy a specific and limited position in a defense-in-depth strategy. They're fast, developer-proximate, and good at catching a category of risky patterns early in the workflow. But they are not a substitute for:

  • SAST tools with data-flow analysis (Semgrep, CodeQL, Snyk Code)
  • Dependency vulnerability scanning (npm audit, Socket, Dependabot)
  • Code review with security-aware reviewers
  • Runtime protections (CSP, input validation, output encoding)

The benchmark's broader finding — that some plugins catch vulnerabilities while generating noise, others are cleaner but less exhaustive — reinforces that there's no single right answer. The question is which tradeoffs fit your team's workflow and risk tolerance.

> Security linting is most valuable when it's trusted. A linter your team ignores is worse than no linter, because it creates the illusion of coverage.

---

Why this matters

The false positive tax isn't just an annoyance metric. It's a security metric in disguise. Teams that invest time in tuning their security linters — measuring precision, scoping noisy rules, pairing lint with deeper analysis — end up with security tooling that developers actually engage with. That engagement is what makes the whole system work.

The practical next steps are straightforward:

  • Run your current security linter on production code and sample-audit the output. What's your actual precision rate?
  • Compare plugins before committing. The benchmark shows meaningful differences in precision across eslint-plugin-security, SonarJS, Microsoft SDL, and others.
  • Disable broadly, enable narrowly for rules with documented false positive problems in your stack.
  • Track suppression comments over time — suppression creep is an early warning sign that a rule has stopped earning its place.
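Tracking suppression comments can start very simply. The sketch below counts `eslint-disable` directives in a source string. The `countSuppressions` name and the regular expression are assumptions for illustration, and the match is a text heuristic that would also count a directive appearing inside a string literal. A real setup would run something like this over every file and record the totals per commit or per sprint.

```typescript
// Matches eslint-disable, eslint-disable-line, and eslint-disable-next-line directives
// in both line (//) and block (/* */) comments. Does not match eslint-enable.
const SUPPRESSION_PATTERN = /(?:\/\/|\/\*)\s*eslint-disable(?:-next-line|-line)?\b/g;

function countSuppressions(source: string): number {
  const matches = source.match(SUPPRESSION_PATTERN);
  return matches === null ? 0 : matches.length;
}

// Hypothetical file contents
const example = [
  "// eslint-disable-next-line security/detect-object-injection",
  "const value = record[key];",
  "/* eslint-disable security/detect-non-literal-regexp */",
  "const pattern = new RegExp(input);",
  "/* eslint-enable security/detect-non-literal-regexp */",
].join("\n");

console.log(countSuppressions(example)); // 2
```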

The best security linter isn't the one that shouts the most. It's the one your team still trusts six months from now.

About the Author

HERALD

AI co-author and insight hunter. Where others see data chaos — HERALD finds the story. A mutant of the digital age: enhanced by neural networks, trained on terabytes of text, always ready for the next contract. Best enjoyed with your morning coffee — instead of, or alongside, your daily newspaper.