📦 Fixing Claude Code's process forking bug

I’ve been using Claude Code recently, and I keep running into annoying bugs. In a particularly egregious one, Claude Code spawned thousands of zombie processes and exhausted my per-user process limit, essentially fork-bombing me.

In the iTerm2 window where I had claude running, I saw the following output being spammed to the console:

EAGAIN: resource temporarily unavailable, posix_spawn '/usr/local/bin/pgrep'
      path: "/usr/local/bin/pgrep",
   syscall: "spawn pgrep",
     errno: -35,
 spawnargs: [ "-P", 635 ],
      code: "EAGAIN"

      at spawn (node:child_process:669:35)
      at spawn (node:child_process:14:39)
      ...

Eventually my machine ground to a halt. When I opened a new iTerm2 window, I saw:

forkpty: Resource temporarily unavailable

Cause

I looked it up, and it turns out Claude Code has a known bug on macOS where a child-process tracking loop repeatedly spawns pgrep, eventually exhausting the per-user process limit. The stack trace shows posix_spawn failing with EAGAIN while trying to spawn pgrep -P ....

The problem with this kind of bug is that it’s hard to be ready to debug it when it happens, since it takes down the very tools you’d debug with (in my case, the terminal). I couldn’t even open a new browser tab, since Chromium creates a new process for each tab (for good, security-related reasons), and I had already hit the limit!

Mitigation: a wrapper script

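I added a wrapper that lowers the process limit (ulimit -u) before launching Claude. Everything Claude spawns inherits the lower limit, so once my total process count reaches the cap, Claude’s process tree can no longer fork, while my other programs keep their higher default limit and still can. I also moved my fix for Claude Code leaving bracketed paste mode enabled into this wrapper script.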

#!/usr/bin/env bash
# Claude Code wrapper - limits max processes, fixes bracketed paste on exit

# Find the real claude binary
REAL=""
IFS=':' read -r -a dirs <<< "$PATH"
for d in "${dirs[@]}"; do
  cand="${d:-./}/claude"
  # Skip the wrapper itself; -ef compares the files, so symlinks and relative paths match too
  if [[ ! "$cand" -ef "$0" && -x "$cand" && ! -d "$cand" ]]; then
    REAL="$cand"
    break
  fi
done

[[ -z "$REAL" ]] && { echo "Error: claude not found" >&2; exit 127; }

# Cap max processes (default 1000) to prevent fork bombs
ulimit -u "${CLAUDE_CODE_MAXPROC:-1000}" 2>/dev/null || true

# Disable feedback survey popup.
export CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY=1

echo "Claude wrapper active (maxproc=$(ulimit -u))" >&2

# Run claude
"$REAL" "$@"
rc=$?

# Fix bracketed paste mode that claude sometimes leaves enabled
# (only when stdout is a terminal, so piped output stays clean)
[[ -t 1 ]] && printf '\e[?2004l'

exit "$rc"

Customizing the process limit

Set CLAUDE_CODE_MAXPROC (a variable read by the wrapper, not by Claude Code itself) to adjust the cap:

export CLAUDE_CODE_MAXPROC=2000

If Claude goes haywire, it now hits the process ceiling instead of taking down my whole session. This helped a lot the next time it happened: Claude Code couldn’t completely exhaust my machine, so I was able to do some analysis and confirm the issue.
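One caveat with the wrapper as written: because the ulimit call is followed by 2>/dev/null || true, a malformed CLAUDE_CODE_MAXPROC (say, a typo) is silently ignored and no cap is applied. A minimal sketch of validating the value first is below; resolve_maxproc is a hypothetical helper name of mine, not something Claude Code provides.

```shell
# Hypothetical helper (not part of the original wrapper): print a usable
# process cap, falling back to a default when the override is malformed.
# Arguments: <requested value> [default, 1000 if omitted]
resolve_maxproc() {
  local requested="${1:-}" default="${2:-1000}"
  case "$requested" in
    '' )
      # Nothing requested: use the default quietly.
      echo "$default" ;;
    *[!0-9]* | 0* )
      # Contains a non-digit, or starts with 0: reject it loudly.
      echo "Ignoring invalid CLAUDE_CODE_MAXPROC='$requested'" >&2
      echo "$default" ;;
    * )
      echo "$requested" ;;
  esac
}

# Prints the validated cap for the current environment.
resolve_maxproc "${CLAUDE_CODE_MAXPROC:-}"
```

In the wrapper, the cap could then be applied with ulimit -u "$(resolve_maxproc "${CLAUDE_CODE_MAXPROC:-}")" in place of the inline default.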

Analysis

Checking the process limit vs. actual count:

# User process limit:
ulimit -u

# Actual process count (-o pid= suppresses the header line)
ps -U "$USER" -o pid= | wc -l
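To make the comparison explicit, the two numbers can be turned into remaining headroom. This is a small sketch; proc_headroom is a hypothetical helper name, and it assumes ulimit -u prints either an integer or the word unlimited.

```shell
# Hypothetical helper: how many more processes can be created before the
# limit is reached. Arguments: <limit> <current process count>
proc_headroom() {
  local limit="$1" count="$2" left
  if [ "$limit" = "unlimited" ]; then
    echo "unlimited"
    return 0
  fi
  left=$((limit - count))
  # Clamp at zero: a count above the limit means no headroom at all.
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  echo "$left"
}

# Usage (tr strips the padding that wc adds on macOS):
#   proc_headroom "$(ulimit -u)" "$(ps -U "$USER" -o pid= | wc -l | tr -d ' ')"
```

A headroom near zero means new terminals and browser tabs will start failing to launch.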

Something was creating many, many processes. A histogram of process names showed the problem:

$ ps -u "$USER" -o comm= | sort | uniq -c | sort -nr | head -10
[a large number] <defunct>
... some other ones

This revealed many defunct (zombie) processes. Grouping zombies by parent PID:

$ ps -A -o ppid=,stat= | awk '$2 ~ /Z/ {print $1}' | sort | uniq -c | sort -nr | head -20
[a large number] 16784

Almost all were parented by a single PID, which turned out to be claude -r:

$ ps -p 16784 -o command
COMMAND
/Users/shivan/.local/bin/claude

Killing that process immediately dropped the process count back to normal (~542), and iTerm2 started working again.
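The zombie-grouping step above can be wrapped in a function so it is ready the next time this happens. This is a sketch, and zombies_by_parent is a hypothetical name. It reads "PPID STAT" pairs on stdin instead of calling ps itself, so it can be tried on canned input.

```shell
# Hypothetical helper: read "PPID STAT" pairs on stdin and print
# "<zombie count> <parent PID>" lines, largest count first.
zombies_by_parent() {
  awk '$2 ~ /^Z/ { count[$1]++ }
       END { for (p in count) print count[p], p }' |
    sort -k1,1nr -k2,2n
}

# Canned example: PID 16784 has three zombie children, PID 42 has one.
printf '1 Ss\n16784 Z\n16784 Z+\n16784 Z\n42 Z\n42 S\n' | zombies_by_parent
```

On a live system it would be fed with ps -A -o ppid=,stat= | zombies_by_parent | head -20.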

The actual fix

Looking at the error output again:

EAGAIN: resource temporarily unavailable, posix_spawn '/usr/local/bin/pgrep'
      path: "/usr/local/bin/pgrep",
   syscall: "spawn pgrep",
     errno: -35,
 spawnargs: [ "-P", 635 ],
      code: "EAGAIN"
      ...

I was confused about why Claude Code was using /usr/local/bin/pgrep when it should have been using the system-installed /usr/bin/pgrep. Running which pgrep also returned the wrong location, so the problem wasn’t specific to Claude Code.

On further investigation, it looks like orphaned Homebrew files were left behind in /usr/local/ (the old Intel Homebrew location) when I migrated from an Intel Mac to Apple Silicon, while the new Apple Silicon Homebrew uses /opt/homebrew/. The orphaned /usr/local/bin/pgrep was a symlink to proctools, a Homebrew package that provided pgrep before macOS included it natively.
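A quick way to spot this kind of shadowing is to list every pgrep on PATH in lookup order. The sketch below walks PATH by hand, much like the wrapper does; list_on_path is a hypothetical helper name.

```shell
# Hypothetical helper: print every executable called <name> on PATH, in
# lookup order. The first line is what a bare command name resolves to.
list_on_path() {
  # Run in a subshell so the IFS and globbing changes do not leak out.
  (
    name="$1"
    IFS=:
    set -f
    for d in $PATH; do
      # An empty PATH entry means the current directory.
      cand="${d:-.}/$name"
      if [ -x "$cand" ] && [ ! -d "$cand" ]; then
        printf '%s\n' "$cand"
      fi
    done
  )
}

list_on_path pgrep
```

If the first line printed is anything other than /usr/bin/pgrep, a third-party pgrep is taking precedence over the system one.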

Critically, the proctools pgrep has a bug where the -P flag (filter by parent PID) is completely ignored. Instead of returning only child processes, it returns all processes on the system. So when Claude Code runs pgrep -P <pid> to track its child processes, the broken version returns hundreds of PIDs instead. If Claude Code then iterates over these results and spawns more pgrep calls, the process count explodes. This is exactly what happened in the PM2 project: “TreeKill fails to work and spawns hundreds of pgrep processes which typically make the system unusable”. Their fix was to always use the native pgrep on macOS, which is what fixed it in my case too.
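To get a feel for how fast this blows up, here is some back-of-the-envelope arithmetic. It assumes that every PID returned by pgrep triggers one further pgrep call, repeated for a fixed number of levels. That is my guess at the shape of the loop, not something I confirmed in Claude Code’s source, and the fan-out numbers are made up for illustration.

```shell
# Illustrative arithmetic only: number of pgrep calls at a given depth,
# assuming each call returns <fanout> PIDs and each PID triggers one more call.
# Arguments: <fanout> <depth>
pgrep_calls_at_depth() {
  local fanout="$1" depth="$2" calls=1 i=0
  while [ "$i" -lt "$depth" ]; do
    calls=$((calls * fanout))
    i=$((i + 1))
  done
  echo "$calls"
}

pgrep_calls_at_depth 3 2     # pgrep honoring -P, 3 children each: 9
pgrep_calls_at_depth 500 2   # pgrep ignoring -P, 500 PIDs each: 250000
```

Even two levels deep, a pgrep that returns every process on the system is enough to run through a per-user process limit.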

I removed all the orphaned Intel Homebrew files, made sure I didn’t have proctools installed, and now which pgrep correctly returns /usr/bin/pgrep. I think this actually fixes the issue, since people on the GitHub issue reported that running brew uninstall proctools and using the system-installed pgrep fixed the problem for them.