The PhantomBuster API is powerful, underdocumented, and full of operational surprises. Here are the lessons from building an automation that orchestrates Phantoms to find high-intent B2B prospects on LinkedIn — and what “intent-based” actually means in practice.
Most lead generation tools give you a list of people who match a job title. An intent-based pipeline gives you a list of people who match a job title and just publicly signalled that they care about the problem you solve.
At QualitaX, we built a “Prospecting” agent that finds LinkedIn posts about specific pain points, extracts everyone who engaged with them, filters for ideal customer profile matches, classifies their companies with an LLM, and pushes qualified contacts into HubSpot — fully automated and unattended, on a weekly or daily schedule depending on configuration.
PhantomBuster is the engine that powers the two most expensive stages of this pipeline: extracting post engagers and enriching LinkedIn profiles. It is also where every significant production issue originated.
This article documents what we learned — not from the documentation, but from running real workloads against real API constraints.
1. Working Code Describes Reality
Before writing a single line of integration code, we studied a working PhantomBuster implementation. Reading its 1,800-line monolith taught us things that the API documentation did not:
- PhantomBuster’s Post Commenter & Liker Scraper has four workers — Master, Post Extractor, Commenter Worker, Liker — that must be orchestrated sequentially.
- Worker IDs are not configured anywhere in the API. They are discovered from the master agent’s v1 API output by parsing log lines.
- The real CSV field names from PhantomBuster (profileUrl, hasLiked, occupation, postsUrl) don’t match what you’d guess from the documentation.
- The v1 and v2 APIs expose different capabilities. The v2 API is the primary interface, but the v1 API has endpoints — like /agent/{id}/abort — that v2 simply does not offer.
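The worker-ID discovery deserves a sketch of its own, since the IDs only ever appear inside the master agent's log output. The log format below is an illustrative assumption, not PhantomBuster's documented output; inspect your own master agent's v1 output for the real lines before adapting the pattern:

```python
import re

# NOTE: this log format is a made-up stand-in for illustration. The real
# lines come from the master agent's v1 API output and vary by account.
WORKER_LINE = re.compile(r"worker '(?P<name>[\w ]+)' \(agent (?P<id>\d+)\)")

def discover_worker_ids(log_lines):
    """Pull worker agent IDs out of the master agent's log output."""
    workers = {}
    for line in log_lines:
        m = WORKER_LINE.search(line)
        if m:
            # Normalise "Post Extractor" -> "post_extractor" for lookups
            key = m.group("name").strip().lower().replace(" ", "_")
            workers[key] = m.group("id")
    return workers
```

The point is less the regex than the dependency: your orchestration config is derived from parsing another agent's logs, so a format change upstream silently breaks discovery.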
The working code also revealed the actual orchestration sequence for getting data from a single post:
```python
async def _scrape_one_post(client, post_url, master_id, workers, pb_key, ...):
    # Step A: Save post URL to master agent config
    await _save_agent_argument(
        client, master_id, pb_key,
        {"linkedinPostUrl": post_url, "inputType": "linkedinPostUrl"},
    )
    # Step B: Launch post extractor worker
    await _launch_and_wait(client, workers["post_extractor"], pb_key, ...)
    # Step C: Launch master (processes commenters)
    await _launch_and_wait(client, master_id, pb_key, ...)
    # Step D: Launch liker worker
    await _launch_and_wait(client, workers["likers"], pb_key, ...)
    # Step E: Launch master again (collects all results)
    agent_data = await _launch_and_wait(client, master_id, pb_key, ...)
    # Step F: Download combined CSV from S3
    raw_results = await _download_result_csv(client, agent_data, master_id, pb_key)
    return _normalise_engagers(raw_results, post_url, topic_label, ...)
```

Four sequential agent launches, each requiring its own poll-until-done cycle, for a single LinkedIn post.
Lesson: When building an integration with a complex API, find a working implementation and study its actual data flows. API documentation describes the interface. Working code describes the protocol.
2. HTTP 429 Can Mean Two Completely Different Things
When PhantomBuster returns HTTP 429, it can mean either of two things:
- API rate limit: you are making too many API calls per second. Wait a few seconds, retry the same request. It will work.
- Parallelism limit: your plan only allows N agents running simultaneously, and you have hit that ceiling. Retrying the same launch request will never succeed — you need to wait for the currently running agent to finish.
Our original retry logic treated both as the same condition: wait 15 seconds, retry. For a parallelism limit, this burns through all retries while the other agent continues running for minutes.
The fix required a fundamentally different retry strategy — not “wait and retry the request,” but “wait for the blocking condition to clear”:
```python
async def _wait_for_slot(client, pb_key, poll_interval=20, timeout=300):
    """Wait until no PhantomBuster agents are currently running."""
    start = time.time()
    while time.time() - start < timeout:
        all_agents = await _pb_request(
            client, "GET", f"{PB_API_V2}/agents/fetch-all", pb_key,
        )
        running = []
        for a in all_agents:
            output = await _get_agent_output(client, str(a["id"]), pb_key)
            if output.get("status") == "running":
                running.append(a.get("name", a["id"]))
        if not running:
            return
        logger.info(
            "Waiting for PB slot - %d agent(s) running (%s)...",
            len(running), ", ".join(str(r)[:30] for r in running),
        )
        await asyncio.sleep(poll_interval)
    logger.warning("Timed out waiting for PB slot after %ds", timeout)
```

This turns a retry loop into a sequencing mechanism. Every agent launch is now preceded by _wait_for_slot(), which blocks until the parallelism constraint is satisfied rather than retrying blindly.
Lesson: When an API returns a generic rate-limit status code, investigate whether it covers multiple distinct conditions. “Too many requests per second” and “too many concurrent operations” require fundamentally different strategies. The first is a timing problem. The second is a capacity problem.
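One way to wire the two strategies together is a launch wrapper that, on a 429, checks whether any agent is actually running and routes accordingly. The callables below are injected stand-ins for the real PhantomBuster calls (assumed shapes, for illustration only): launch() returns a container id on success or the sentinel 429, and list_running() performs the per-agent live-status check.

```python
import asyncio

async def launch_with_429_routing(launch, list_running, wait_for_slot,
                                  max_attempts=4, backoff=2.0):
    """Launch an agent, routing HTTP 429 to the right recovery strategy.

    launch/list_running/wait_for_slot are stand-ins for the real PB calls;
    this is a sketch of the routing logic, not production code.
    """
    for attempt in range(max_attempts):
        result = await launch()
        if result != 429:
            return result  # launched successfully
        if await list_running():
            # Parallelism limit: retrying is pointless until a slot frees
            await wait_for_slot()
        else:
            # Plain request-rate limit: a short backoff is enough
            await asyncio.sleep(backoff * (attempt + 1))
    raise RuntimeError("could not launch agent after retries")
```

The branch condition is the whole fix: the same status code triggers either a timed backoff or an indefinite wait for capacity, and only a live check of running agents can tell you which.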
3. The Status Field You See Is Not the Status You Need
PhantomBuster’s fetch-all endpoint returns a list of agents, each with a lastEndStatus field. The natural assumption is that this tells you whether an agent is currently running. It does not — it shows the status of the last completed run. An agent that is actively running right now still shows its previous run’s end status.
To check whether an agent is currently running, you must call fetch-output for each individual agent and check the status field on that response. This is a per-agent API call, not a bulk operation:
```python
# WRONG: lastEndStatus shows the PREVIOUS run's result
all_agents = await _pb_request(client, "GET", f"{PB_API_V2}/agents/fetch-all", pb_key)
for a in all_agents:
    if a.get("lastEndStatus") == "running":  # This field is never "running"
        ...

# CORRECT: fetch-output gives the CURRENT execution status
for a in all_agents:
    output = await _get_agent_output(client, str(a["id"]), pb_key)
    if output.get("status") == "running":  # This is the real-time status
        ...
```

This distinction cost us an outage. The _wait_for_slot function originally checked lastEndStatus and concluded no agents were running, then launched a new agent — which immediately hit the parallelism limit.
Lesson: “status” fields in APIs can often describe the last completed state, not the current state. Do not assume. When you need real-time running status, verify you are checking the right endpoint — and write a test that confirms the difference.
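Such a test does not need the live API; two canned responses capture the trap. The payload shapes below mirror the description above, with simulated values:

```python
import asyncio

# Simulated payloads: fetch-all carries only the PREVIOUS run's end status,
# while fetch-output carries the live status. Shapes per the discussion
# above; the values themselves are made up for the test.
FETCH_ALL = [{"id": 1, "name": "master", "lastEndStatus": "success"}]
FETCH_OUTPUT = {1: {"status": "running"}}

async def count_running_wrong():
    """Buggy check: lastEndStatus never reads 'running' for a live agent."""
    return sum(1 for a in FETCH_ALL if a.get("lastEndStatus") == "running")

async def count_running_right():
    """Correct check: one fetch-output call per agent for the live status."""
    running = 0
    for a in FETCH_ALL:
        output = FETCH_OUTPUT[a["id"]]  # stands in for _get_agent_output(...)
        if output.get("status") == "running":
            running += 1
    return running
```

With these fixtures, the buggy check reports zero running agents while the correct one reports one, which is exactly the disagreement that caused the outage.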
4. “Error” Does Not Mean “No Results”
PhantomBuster’s multi-agent workflows can produce complete results and still exit with an error status. This is not a bug — it is a consequence of multi-step orchestration where later steps can fail after earlier steps have already persisted their output.
In production, we observed a run in which the master agent retrieved 6 posts, found 290 profiles, and saved 132 qualified leads to a CSV — then exited with status error and exit code 1 because a worker agent reported “Invalid argument.” The CSV was valid. The 132 leads were real. Our pipeline discarded all of them.
The original polling logic treated any non-finished status as a hard failure:
```python
# BEFORE: error status = discard everything
if status in ("error", "launch error"):
    raise ToolExecutionError("phantombuster", f"Agent error: {last_line}")
```

The fix: check for evidence of persisted results before discarding an errored run:
```python
# AFTER: check for saved results before raising
if status in ("error", "launch error"):
    has_results = any("CSV saved" in ln or "result.csv" in ln for ln in lines)
    if has_results:
        logger.warning(
            "Agent %s ended with error but produced results - "
            "treating as partial success: %s",
            agent_id, last_line[:200],
        )
        return await _get_agent_status(client, agent_id, pb_key)
    raise ToolExecutionError("phantombuster", f"Agent error: {last_line}")
```

The downstream _download_result_csv() function already handles CSV retrieval regardless of exit status — it just needed the polling logic to let it run.
Lesson: In multi-step external workflows, “the job failed” and “the job produced no useful output” are different conditions. An error at step 4 of 5 does not invalidate steps 1 through 3. Always check for partial results before discarding a failed run — especially when results are persisted externally rather than returned inline.
5. Every External Process Needs an Abort Mechanism
After a parallelism-limit failure, the master agent was stuck in running state — the pipeline process had been killed, but PhantomBuster did not know that. This blocked all subsequent launches until the agent was manually aborted through the web UI.
The v2 API does not expose an abort endpoint. The v1 API does:
```python
async def _abort_agent(client, agent_id, pb_key):
    """Abort a running agent via v1 API."""
    url = f"{PB_API_V1}/agent/{agent_id}/abort"
    headers = {"X-Phantombuster-Key": pb_key}
    try:
        resp = await client.post(url, headers=headers, timeout=15)
        if resp.status_code == 200:
            logger.info("Aborted agent %s", agent_id)
        else:
            logger.warning("Abort agent %s returned %d", agent_id, resp.status_code)
    except Exception as e:
        logger.warning("Failed to abort agent %s: %s", agent_id, e)
```

This was discovered only after a failure forced us to dig through the v1 API reference. The abort endpoint does not seem to be referenced anywhere in the v2 documentation or migration guide.
Lesson: Any integration that launches external processes needs a corresponding abort or cancel mechanism. If the orchestrator crashes, the external process does not stop — and if it is holding a concurrency slot, everything downstream is blocked. Find the kill switch before you need it.
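Having the kill switch is half the job; the other half is making sure it fires when the orchestrator dies mid-run. A try/finally around every launch covers crashes and task cancellation alike. The callables here are stand-ins for the real launch, poll, and abort calls (a sketch, not the production wrapper):

```python
import asyncio

async def run_with_abort(launch, wait_done, abort):
    """Run an external agent, aborting it if we exit before completion.

    launch/wait_done/abort are injected stand-ins for the real
    PhantomBuster calls (launch agent, poll until finished, v1 abort).
    """
    completed = False
    await launch()
    try:
        result = await wait_done()
        completed = True
        return result
    finally:
        # On a crash or cancellation mid-run, free the concurrency slot
        # instead of leaving the agent stuck in "running".
        if not completed:
            await abort()
```

Because asyncio delivers cancellation as an exception raised inside the awaited coroutine, the finally block runs even when the surrounding task is killed, which is precisely the case that left our master agent stuck.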
6. “Stopped Workflow” Is a Third State Nobody Warns You About
Our first run failed with HTTP 412 on every post: “The workflow of this agent has been stopped. A worker agent cannot be launched if its master agent is stopped.”
This is not about rate limits or parallelism. In PhantomBuster, a multi-agent workflow has an explicit “started” and “stopped” state that is separate from whether its agents are idle or running. A stopped workflow refuses all worker launches regardless of plan limits or agent availability.
This state is managed through the PhantomBuster web UI, not through the API. There is no API endpoint to check whether a workflow is started or stopped, and no API endpoint to start it. It is a manual, one-time setup step that was only discovered during the first run.
Lesson: Multi-agent orchestration platforms often have workflow-level states that are invisible through individual agent APIs. Test the full orchestration flow in the real environment — with real credentials, real plan limits, and real workflow states — before calling it ready. Unit tests with mocked API calls will never catch this.
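Since the stopped state can neither be queried nor cleared through the API, the best code can do is fail loudly with instructions instead of letting the 412 blend into generic launch failures. A minimal guard, assuming nothing about the response beyond its status code (the error text is our own, not PhantomBuster's):

```python
class WorkflowStoppedError(RuntimeError):
    """Raised when PhantomBuster rejects a worker launch with HTTP 412."""

def check_launch_response(status_code, body=""):
    """Translate the 412 'workflow stopped' rejection into an actionable
    error message pointing the operator at the manual fix."""
    if status_code == 412:
        raise WorkflowStoppedError(
            "Worker launch rejected: the master workflow is stopped. "
            "This state is only visible and settable in the PhantomBuster "
            "web UI - open the workflow, press Start, then re-run. "
            f"Response: {body[:200]}"
        )
```

A dedicated exception type also lets the pipeline distinguish "needs a human in the web UI" from failures that retries can fix.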
7. The Cheapest Gate Goes First
The pipeline has seven stages, and the ordering is deliberate. The ICP title filter — a pure Python fuzzy match using rapidfuzz — runs before any paid enrichment:
```python
# Stage 3: ICP filter (free, milliseconds, rejects 85-90%)
passing, rejected = filter_by_icp_title(
    engagers,
    target_titles=cc.icp.target_titles,
    threshold=cc.icp.title_match_threshold,
)

# Stage 4: Profile enrichment (PB credits, 15-45s per profile)
profiles = await run_profile_scraper(
    profile_urls=[e.linkedin_url for e in passing],  # Only ICP matches
    ...
)
```

In a typical run, 328 engagers become 28 ICP matches — a 91.5% rejection rate. If profile enrichment ran first, we would spend PhantomBuster credits and 15-45 seconds per profile on 300 people who were never going to qualify.
The ordering principle extends through the full pipeline:
- Post discovery — Cost: SearchAPI query fee — Rejection rate: 0% (discovery, not filtering)
- Engager extraction — Cost: PB credits per post — Rejection rate: 0% (extraction, not filtering)
- ICP title filter — Cost: Free (Python) — Rejection rate: 85–92%
- Profile enrichment — Cost: PB credits per profile — Rejection rate: ~5% (missing data)
- Company classification — Cost: Claude tokens per company — Rejection rate: 30–60%
- HubSpot push — Cost: Free (included in subscription) — Rejection rate: ~0%
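A back-of-the-envelope check, using the figures from the run above (328 engagers, 28 ICP matches) and a mid-range assumption of roughly 30 seconds per enrichment, shows why the ordering is cost control rather than micro-optimisation:

```python
def enrichment_calls(n_engagers, n_icp_matches, filter_first):
    """Paid profile-enrichment calls made under each stage ordering."""
    return n_icp_matches if filter_first else n_engagers

# Figures from the run described above; ~30s per profile is an assumed
# midpoint of the 15-45s window.
avoided = (enrichment_calls(328, 28, filter_first=False)
           - enrichment_calls(328, 28, filter_first=True))
seconds_saved = avoided * 30  # 300 avoided enrichments of ~30s each
```

That is roughly two and a half hours of scraper time, plus 300 profiles' worth of PhantomBuster credits, saved on every single run by a filter that costs milliseconds.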
Every dollar spent on enrichment is spent on a contact that has already passed the free filter. Every Claude API call is made for a contact whose title already matched and whose profile was already enriched.
Lesson: In a multi-stage pipeline with paid external services, order stages by cost and selectivity. Free filters go first. Expensive enrichment goes last. This is not an optimisation — it is a cost control mechanism that compounds with volume.
8. Intent Signals Need Deterministic Scoring, Not LLM Judgement
It is tempting to ask the LLM to assess “how interested” a prospect is. We deliberately did not do this. Intent scoring is deterministic — a pure function of four observable signals:
```python
def calculate_intent_score(engagement_type, title_match_score,
                           classification, size_verdict) -> int:
    score = 0
    # Engagement signal (observed behaviour)
    if engagement_type == EngagementType.BOTH:
        score += 4
    elif engagement_type == EngagementType.COMMENTED:
        score += 3
    elif engagement_type == EngagementType.LIKED:
        score += 1
    # Title match quality (ICP fit)
    if title_match_score and title_match_score >= 90:
        score += 3
    elif title_match_score and title_match_score >= 80:
        score += 1
    # Company classification (LLM-assessed, but output is an enum)
    if classification == B2BClassification.B2B_SAAS:
        score += 2
    elif classification == B2BClassification.B2B_SERVICES:
        score += 1
    # Company size (in target range)
    if size_verdict == SizeVerdict.IN_RANGE:
        score += 2
    return score  # 0-11 scale
```

The LLM classifies the company — but produces a constrained enum (b2b_saas, b2b_services, b2c, unclear), not a free-text assessment. The score formula is pure Python: testable, auditable, and identical across every run. A client can look at a contact with score 8/11 and reconstruct exactly why: commented on a post (3), exact title match (3), B2B SaaS company (2). No black box.
Lesson: Use LLMs for judgement calls that require world knowledge — like classifying whether a company is B2B SaaS. Use deterministic code for scoring and ranking. The combination gives you the LLM’s intelligence without sacrificing auditability.
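The "no black box" claim is cheap to enforce mechanically: have the scorer return its breakdown alongside the total. A self-contained sketch with plain strings standing in for the enums above (not the production scorer):

```python
def explain_intent_score(engagement_type, title_match_score,
                         classification, size_verdict):
    """Score plus a human-readable breakdown, mirroring the formula above.

    Plain strings stand in for the enums; a sketch for illustration.
    """
    parts = []
    pts = {"both": 4, "commented": 3, "liked": 1}.get(engagement_type, 0)
    if pts:
        parts.append((f"{engagement_type} on post", pts))
    if title_match_score and title_match_score >= 90:
        parts.append(("exact title match", 3))
    elif title_match_score and title_match_score >= 80:
        parts.append(("close title match", 1))
    pts = {"b2b_saas": 2, "b2b_services": 1}.get(classification, 0)
    if pts:
        parts.append((classification, pts))
    if size_verdict == "in_range":
        parts.append(("company size in range", 2))
    return sum(p for _, p in parts), parts
```

The breakdown tuples are exactly what a sales rep (or a sceptical client) needs to see next to the number: each point traced back to an observable signal.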
9. Sometimes Production Is the Only Real Integration Test
Three issues appeared only in production. None of them could have been found with mocked API calls:
- PhantomBuster workflow stopped (HTTP 412) — a UI-level state that the API cannot query or set. Never occurs in unit tests.
- Parallelism limit on agent launches (HTTP 429 variant) — only manifests when a real PB plan with concurrent agent limits is involved.
- Master agent stuck in running state — only occurs when the orchestrator process is killed mid-run, leaving PhantomBuster unaware that the client disconnected.
All three are integration-level failures that exist in the gap between “the API call succeeds” and “the multi-step workflow completes.” They cannot be found by testing individual functions — they require running the full pipeline against real services with real concurrency constraints and real plan limitations.
After fixing all three, the pipeline successfully extracted 328 engagers from 39 posts, filtered to 28 ICP matches, and entered profile enrichment — demonstrating the full funnel working end to end.
Lesson: For pipelines that orchestrate external services, there is no substitute for running against the real thing. Mock-based tests verify your code handles API responses correctly. Production runs verify that the sequence of operations works within the operational constraints of the external service — constraints that are often undocumented, plan-specific, and invisible until they break your pipeline at 3am.
The Pattern Across All 9 Lessons
Every lesson above shares a common thread: the PhantomBuster API is a capability layer, not a reliability layer. It gives you the ability to retrieve LinkedIn data but it does not give you retry logic, sequencing, abort mechanisms, partial result recovery, or cost control. Those are your responsibility.
An agentic GTM pipeline is not a script that calls an API. It is an orchestration system that manages external state across multiple services — PhantomBuster, LinkedIn (indirectly), SearchAPI, Claude, HubSpot — each with its own failure modes, rate limits, and operational constraints. The agent’s job is to navigate all of these reliably, week after week, without human intervention.
The intent signal — “this person engaged with a post about a problem you solve” — is genuinely valuable. A sales rep opening HubSpot on Monday morning to find 15 new contacts, each with the specific post they engaged with, the topic it relates to, and a one-sentence explanation of why your offer is relevant to them, is a fundamentally different starting point than a cold list.
But the signal is only as good as the pipeline that delivers it. And the pipeline is only as good as its handling of the cases where PhantomBuster returns 429 and means something different than you expected.
QualitaX builds production-grade agentic systems for B2B go-to-market. If you want an AI agent that finds prospects based on real intent signals — not just job titles — get in touch.