Hands-On: Troubleshooting and Mitigating CPU Spikes on a WooCommerce Production Site with a Three-Layer Defense

Background

The e-commerce production site (WooCommerce) experienced two CPU spikes: CPU utilization surged from a normal ~3% to 77% (first incident) and 68% (second incident). All php-fpm workers were saturated, and requests timed out without any response. Clients saw this as HTTP 000, which is not a real status code but what tools such as curl report when no HTTP response arrives.

Root Cause

A large volume of crawlers sent GET requests with URLs containing parameters add-to-cart=XXXXX&currency=XXX. Such requests trigger WooCommerce’s full “add-to-cart” logic—including database queries and session writes—making them significantly heavier than ordinary page views.

Key characteristics:

  • HTTP method: GET (we expected legitimate add-to-cart actions to arrive as POST or AJAX requests triggered by a button click)
  • Absence of the WooCommerce session cookie (indicating non-human traffic)
  • Originating from numerous distinct IPs, spoofing various Chrome User-Agent strings
  • Aggressive pagination: /items/page/N?add-to-cart=XXXXX

Cross-checking sar system metrics against the access logs showed that HTTP 499 (nginx's non-standard code for a client that closed the connection before a response was sent) accounted for over 60% of responses. This is consistent with new requests queuing behind exhausted php-fpm workers until the clients gave up.

Mitigation Process (Three Layers → Consolidated to One)

Layer 1: mu-plugin (PHP layer — removed)

Initially, a PHP interception script was deployed in wp-content/mu-plugins/ to check for: GET request + add-to-cart parameter + missing session cookie → return HTTP 403.

Problem: a mu-plugin runs inside the WordPress bootstrap, so PHP has already loaded WordPress core and connected to the database before the check can fire. Each malicious request still tied up a php-fpm worker for 2–3 seconds, and under high crawler volume the workers remained saturated.

Layer 2: auto_prepend_file (PHP engine layer — removed)

We then leveraged .user.ini to register an auto_prepend_file, executing the blocking logic before WordPress initialization.

Advantage: Returns HTTP 403 in milliseconds; php-fpm workers are barely utilized.
Limitation: Still passes through php-fpm, so some overhead remains under extreme concurrency.

Layer 3: nginx rules (Final solution — retained)

We implemented conditional blocking directly in the nginx virtual host configuration using nginx variables:

# Anti-bot: block GET add-to-cart requests lacking WooCommerce session
# Initialize unconditionally so non-GET requests start from an empty flag
set $block_atc "";
if ($request_method = GET) {
    set $block_atc "G";
}
if ($arg_add-to-cart) {
    set $block_atc "${block_atc}Y";
}
if ($http_cookie !~* "wp_woocommerce_session_") {
    set $block_atc "${block_atc}N";
}
# "GYN" = GET + add-to-cart present + no session cookie
if ($block_atc = "GYN") {
    return 403;
}

Logic: If the request is a GET, contains add-to-cart, and lacks the WooCommerce session cookie, nginx returns HTTP 403 immediately, without invoking PHP at all.

nginx's if directive supports neither nesting nor a logical AND operator, so string concatenation (set $var "${var}X") is used to simulate AND conditions, a common nginx workaround. The flag is reset to an empty string for every request and receives "G" only for GET requests, so $block_atc equals "GYN" only when all three conditions are satisfied. A POST request can reach at most "YN" and is not blocked.
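The same three-way AND can also be written with map instead of chained if blocks. The following is a minimal sketch, not the configuration we deployed: the variable names ($atc_in_query, $has_wc_session, $block_atc_map) are hypothetical, and the query-string regex matches any add-to-cart= parameter, including an empty or "0" value, so it is slightly broader than the if-based check. It blocks the same class of requests as the rule above and therefore shares that rule's limitations.

```nginx
# http {} context: each map derives a 0/1 flag from one request property.
map $args $atc_in_query {
    default              0;
    "~(^|&)add-to-cart=" 1;
}
map $http_cookie $has_wc_session {
    default                     0;
    "~*wp_woocommerce_session_" 1;
}
# Combine the three properties into one string and match the blocked case:
# GET + add-to-cart present + no session cookie.
map "$request_method:$atc_in_query:$has_wc_session" $block_atc_map {
    default   0;
    "GET:1:0" 1;
}

# server {} context: a single if replaces the concatenation chain.
if ($block_atc_map) {
    return 403;
}
```

map variables are evaluated lazily, only when a request actually reads them, and the combined key makes each blocked combination explicit in one place.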

Consolidation

After confirming the nginx-layer rule was effective, we removed both the mu-plugin and the auto_prepend_file logic, retaining only the single nginx-based interception point. This reduces maintenance overhead and eliminates potential interference among multiple layers of rules.

Results

  • System load dropped from 41 → 1.2 (recovered within 5 minutes)
  • Malicious crawler requests are blocked by nginx with HTTP 403 — zero PHP resource consumption
  • Returning customers who already carried the WooCommerce session cookie were unaffected (first-time shoppers turned out to be a different story; see Issue Two below)
  • Add-to-cart kept working for browsers that submitted requests with a valid session cookie

Lessons Learned

  1. WooCommerce’s GET-based add-to-cart is an inherent attack surface: Anyone can construct a URL to trigger full cart logic—and WooCommerce imposes no default rate limiting.
  2. PHP-layer blocking treats the symptom, not the cause: Even returning HTTP 403 consumes php-fpm resources. In high-concurrency scenarios, blocking must occur at the nginx or CDN layer.
  3. Multi-condition evaluation in nginx can use string concatenation: The pattern set $var "${var}X" followed by if ($var = "XY") is a common workaround for the missing AND operator. Initialize the variable unconditionally first so that every condition really contributes to the final string.
  4. auto_prepend_file is an excellent intermediate solution: When nginx access is unavailable, it delivers orders-of-magnitude better performance than mu-plugins.
  5. Ultimate defense should be CDN-based protection (e.g., Cloudflare): Block malicious traffic at the edge—before it even reaches nginx.

Follow-up: Two New Issues

Issue One: Cavalcade Zombie Tasks Causing Repeated CPU Spikes

After deploying the nginx crawler rules, CPU usage continued to spike repeatedly. Investigation revealed a second root cause: Cavalcade (a job runner that replaces WP-Cron for WordPress background tasks) had accumulated a large number of zombie tasks.

Symptoms: 940,000 failed tasks queued up; four tasks in “running” status had been active since February 26; all 419 tasks in “waiting” status had expired and were being rescheduled continuously; a single WP Rocket SaaS task consumed 78% CPU.

Root Cause: The Worker class in the Cavalcade runner lacks any timeout mechanism. In PHP CLI mode, max_execution_time=0, so once a task hangs, the worker process is never reclaimed.

Fix: Added a 300-second timeout to the Worker (SIGTERM → SIGKILL), reduced max_workers from 4 to 2, cleared 940,000 failed jobs and 7.7 million logs, and restarted three systemd services.

Issue Two: nginx Cookie Detection Rules Incorrectly Blocking Legitimate Customers

Original rule logic: Return HTTP 403 for GET requests carrying the add-to-cart parameter without a wp_woocommerce_session_ cookie. However, WooCommerce sets its session cookie only once a visitor has something in the cart, typically after the first “add-to-cart” action. New customers whose first “Add to Cart” click was sent as a GET request were therefore blocked with 403 immediately.

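Fix: Switched to rate limiting. Added map + limit_req_zone in the http block of nginx.conf so that only requests carrying the add-to-cart parameter are rate limited (rate=6r/m, burst=3, nodelay). Excess requests return HTTP 429, which has to be set explicitly with limit_req_status because nginx answers rejected requests with 503 by default. With these values a client can send up to four add-to-cart requests back to back (one plus a burst of three) and then roughly one every 10 seconds. Normal users rarely add more than three items to the cart within one minute, while crawlers making bulk requests receive 429 responses from nginx with zero PHP resource consumption.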
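The rate-limiting setup described above can be sketched as follows. This is an illustration under stated assumptions rather than our exact configuration: the variable name $atc_limit_key, the zone name atc_zone, and the 10m zone size are hypothetical, and keying the limit on the client address ($binary_remote_addr) is an assumption about how requests are grouped. The rate, burst, nodelay, and 429 status come from the fix itself.

```nginx
# http {} context (nginx.conf)
# Ordinary requests map to an empty key; nginx does not account requests
# whose limit_req_zone key is empty, so they are never rate limited.
# Requests with add-to-cart in the query string are keyed by client address.
map $args $atc_limit_key {
    default              "";
    "~(^|&)add-to-cart=" $binary_remote_addr;
}

# 6 requests per minute per key, i.e. one request every 10 seconds.
limit_req_zone $atc_limit_key zone=atc_zone:10m rate=6r/m;

# server {} context (virtual host)
# Allow up to 3 requests above the rate without delaying them;
# reject anything beyond that.
limit_req zone=atc_zone burst=3 nodelay;

# Rejected requests get 429 instead of the default 503.
limit_req_status 429;
```

The map matches on $args rather than $arg_add-to-cart because a map source is parsed as a string with embedded variables, where the variable name would end at the first hyphen. Because the key is empty for every request without the parameter, page views, static assets, and checkout traffic are untouched by this zone.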

Additional Lessons Learned

  • Cavalcade provides no built-in worker timeout—this must be implemented manually during deployment.
  • Cookie-based detection logic must account for cases where the cookie does not yet exist during the user’s first interaction.
  • Combining nginx map with limit_req is a practical way to rate-limit WooCommerce requests by query parameter without touching other traffic.