Batch enrich HubSpot contacts missing job title or company size using an agent skill

Complexity: Low · Cost: Usage-based

Prerequisites
  • Claude Code, Cursor, or another AI coding agent that supports skills
  • HubSpot private app token stored as HUBSPOT_TOKEN (scopes: crm.objects.contacts.read, crm.objects.contacts.write)
  • Apollo API key stored as APOLLO_API_KEY

Overview

Create an agent skill that finds contacts in HubSpot with incomplete data, batch-enriches them via Apollo, and fills in the gaps — without overwriting any existing data. Run /batch-enrich on demand or schedule it weekly.

Step 1: Create the skill directory

mkdir -p .claude/skills/batch-enrich/scripts

Step 2: Write the SKILL.md file

Create .claude/skills/batch-enrich/SKILL.md:

---
name: batch-enrich
description: Finds HubSpot contacts missing a job title and batch-enriches them via Apollo, filling job title, company, phone, LinkedIn URL, and industry where empty. Only fills empty fields and never overwrites existing data. Uses Apollo's bulk endpoint for efficiency.
disable-model-invocation: true
allowed-tools: Bash(python *)
---
 
Batch enrich HubSpot contacts with missing fields:
 
1. Run: `python $SKILL_DIR/scripts/batch_enrich.py`
2. Review the enrichment summary
3. Confirm the number of contacts updated and fields filled

Step 3: Write the batch enrichment script

Create .claude/skills/batch-enrich/scripts/batch_enrich.py:

#!/usr/bin/env python3
"""
Batch Enrichment: HubSpot (search missing) → Apollo (bulk enrich) → HubSpot (update)
Finds contacts missing job title, enriches via Apollo bulk endpoint,
writes only empty fields back to HubSpot.
"""
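import os
import subprocess
import sys
import time
from datetime import datetime
 
try:
    import requests
except ImportError:
    # Install into the interpreter running this script, not whichever pip is first on PATH
    subprocess.check_call([sys.executable, "-m", "pip", "install", "requests", "-q"])
    import requests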
 
HUBSPOT_TOKEN = os.environ.get("HUBSPOT_TOKEN")
APOLLO_API_KEY = os.environ.get("APOLLO_API_KEY")
 
if not all([HUBSPOT_TOKEN, APOLLO_API_KEY]):
    print("ERROR: Set HUBSPOT_TOKEN and APOLLO_API_KEY environment variables")
    sys.exit(1)
 
HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}", "Content-Type": "application/json"}
APOLLO_HEADERS = {"x-api-key": APOLLO_API_KEY, "Content-Type": "application/json"}
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"}
 
# --- Search for contacts missing job title ---
print(f"[{datetime.now().isoformat()}] Searching for contacts missing job title...")
contacts = []
after = 0
while True:
    resp = requests.post(
        "https://api.hubapi.com/crm/v3/objects/contacts/search",
        headers=HS_HEADERS,
        json={
            "filterGroups": [{"filters": [{
                "propertyName": "jobtitle",
                "operator": "NOT_HAS_PROPERTY"
            }]}],
            "properties": ["email", "firstname", "lastname", "jobtitle", "company", "phone", "linkedin_url", "industry"],
            "limit": 100,
            "after": after,
        }
    )
    resp.raise_for_status()
    data = resp.json()
    contacts.extend(data["results"])
    if data.get("paging", {}).get("next"):
        after = data["paging"]["next"]["after"]
        time.sleep(0.2)
    else:
        break
 
# Filter to business emails only
business_contacts = [
    c for c in contacts
    if c["properties"].get("email")
    and c["properties"]["email"].split("@")[-1].lower() not in PERSONAL_DOMAINS
]
print(f"Found {len(contacts)} total, {len(business_contacts)} with business emails\n")
 
# --- Batch enrich via Apollo bulk endpoint ---
enriched = 0
fields_filled = 0
 
for i in range(0, len(business_contacts), 10):
    batch = business_contacts[i:i+10]
    details = [{"email": c["properties"]["email"]} for c in batch]
 
    apollo_resp = requests.post(
        "https://api.apollo.io/api/v1/people/bulk_match",
        headers=APOLLO_HEADERS,
        json={"details": details}
    )
    apollo_resp.raise_for_status()
    matches = apollo_resp.json().get("matches", [])
 
    for contact, match in zip(batch, matches):
        if not match:
            continue
 
        props = contact["properties"]
        updates = {}
 
        # Only fill empty fields. Guard against a null "organization" value
        # before reading nested keys.
        org = match.get("organization") or {}
        if not props.get("jobtitle") and match.get("title"):
            updates["jobtitle"] = match["title"]
        if not props.get("company") and org.get("name"):
            updates["company"] = org["name"]
        if not props.get("phone") and match.get("phone_numbers"):
            phone = match["phone_numbers"][0].get("sanitized_number")
            if phone:
                updates["phone"] = phone
        if not props.get("linkedin_url") and match.get("linkedin_url"):
            updates["linkedin_url"] = match["linkedin_url"]
        if not props.get("industry") and org.get("industry"):
            updates["industry"] = org["industry"]
 
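        if updates:
            # enrichment_date and enrichment_source are assumed to be custom contact
            # properties that already exist in HubSpot; unknown properties are rejected.
            updates["enrichment_date"] = datetime.now().strftime("%Y-%m-%d")
            updates["enrichment_source"] = "apollo-batch"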
 
            requests.patch(
                f"https://api.hubapi.com/crm/v3/objects/contacts/{contact['id']}",
                headers=HS_HEADERS,
                json={"properties": updates}
            ).raise_for_status()
 
            enriched += 1
            # Count only real data fields, not the two enrichment bookkeeping properties
            filled = [k for k in updates if k not in ("enrichment_date", "enrichment_source")]
            fields_filled += len(filled)
            print(f"  {props['email']} -> {filled}")
 
    print(f"  Batch {i//10 + 1}: processed {min(i+10, len(business_contacts))}/{len(business_contacts)}")
    time.sleep(1)
 
print(f"\nDone. Enriched {enriched}/{len(business_contacts)} contacts, filled {fields_filled} fields.")
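
The script calls raise_for_status() on every request, so a single HTTP 429 or transient 5xx response aborts the whole run. One way to make it more tolerant is a small retry wrapper with exponential backoff. The sketch below is not part of the original script: send_with_retry, its parameters, and the set of retryable status codes are hypothetical choices for illustration.

```python
import time

RETRYABLE = {429, 500, 502, 503, 504}  # assumed set of status codes worth retrying


def send_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable response or attempts run out.

    send is any zero-argument callable returning an object with a
    status_code attribute, e.g. lambda: requests.post(url, json=payload).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        resp = send()
        is_last = attempt == max_attempts - 1
        if resp.status_code not in RETRYABLE or is_last:
            return resp
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * (2 ** attempt))
```

In the script, a call such as requests.post(...) could be wrapped as send_with_retry(lambda: requests.post(...)) and followed by raise_for_status() as before, so a final failure still stops the run.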

Step 4: Run the skill

# Via Claude Code
/batch-enrich
 
# Or directly
python .claude/skills/batch-enrich/scripts/batch_enrich.py

Step 5: Schedule it weekly (optional)

Option A: Cron

# crontab -e — run every Sunday at 10 PM
0 22 * * 0 cd /path/to/project && python .claude/skills/batch-enrich/scripts/batch_enrich.py >> /var/log/batch-enrich.log 2>&1

Option B: GitHub Actions

name: Weekly Batch Enrichment
on:
  schedule:
    - cron: '0 3 * * 0'  # Sunday 3 AM UTC
  workflow_dispatch: {}
jobs:
  enrich:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install requests
      - run: python .claude/skills/batch-enrich/scripts/batch_enrich.py
        env:
          HUBSPOT_TOKEN: ${{ secrets.HUBSPOT_TOKEN }}
          APOLLO_API_KEY: ${{ secrets.APOLLO_API_KEY }}

Cost

  • Apollo: 1 credit per person (bulk_match has the same per-person cost as individual calls). Basic plan ($49/mo) = 900 credits.
  • HubSpot: Free within API rate limits.
  • Compute: Free on GitHub Actions.
  • Weekly batch of 100 contacts: 100 Apollo credits + 10 bulk API calls. Monthly: ~400 credits for weekly runs.
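
The arithmetic in the last bullet can be sketched as a small helper for other batch sizes. estimate_apollo_usage is a hypothetical name; the 10-contacts-per-call batch size matches the script, and four runs per month is the same approximation the bullet uses.

```python
import math


def estimate_apollo_usage(contacts_per_run, runs_per_month=4, batch_size=10):
    """Return (credits_per_run, bulk_calls_per_run, credits_per_month).

    Assumes 1 credit per person, as stated in the cost notes above.
    """
    credits_per_run = contacts_per_run
    bulk_calls_per_run = math.ceil(contacts_per_run / batch_size)
    credits_per_month = credits_per_run * runs_per_month
    return credits_per_run, bulk_calls_per_run, credits_per_month


print(estimate_apollo_usage(100))  # (100, 10, 400)
```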

The script never overwrites existing data

The key safeguard in this script is the emptiness check applied to every field before it is written, for example if not props.get("jobtitle"). If a sales rep manually entered a job title, the script leaves it alone even when Apollo returns a different value, so manually curated data is preserved.
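
The fill-only-empty rule can also be isolated into a pure function, which makes it testable without calling either API. This is a sketch rather than the script's actual structure: build_updates and FIELD_MAP are hypothetical names, the Apollo field names are the ones the script already reads, and phone is left out to keep the example short.

```python
def _org(match):
    # Treat a missing or null "organization" as an empty dict
    return match.get("organization") or {}


# HubSpot property -> function that extracts the value from an Apollo match
FIELD_MAP = {
    "jobtitle": lambda m: m.get("title"),
    "company": lambda m: _org(m).get("name"),
    "linkedin_url": lambda m: m.get("linkedin_url"),
    "industry": lambda m: _org(m).get("industry"),
}


def build_updates(props, match):
    """Return only the properties that are empty in HubSpot and present in Apollo."""
    updates = {}
    for hs_field, extract in FIELD_MAP.items():
        value = extract(match)
        if not props.get(hs_field) and value:
            updates[hs_field] = value
    return updates


existing = {"jobtitle": "VP Sales", "company": None}
apollo = {"title": "Head of Sales", "organization": {"name": "Acme"}}
print(build_updates(existing, apollo))  # {'company': 'Acme'}
```

The existing job title survives even though Apollo disagrees, and only the empty company field is filled.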

When to use this approach

  • You want to clean up data gaps right now without building automation infrastructure
  • You're onboarding a new data vendor and want to test fill rates before committing to a platform
  • You want to run enrichment on a specific segment — modify the search filter for a custom list
  • You prefer seeing enrichment results in real-time as the script runs
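
As an example of targeting a specific segment, the search body can carry an extra filter. In the HubSpot CRM search API, filters inside one filter group are combined with AND, while separate filter groups are combined with OR. The sketch below is illustrative only: build_search_body is a hypothetical helper, and restricting to contacts whose lifecyclestage is "lead" is an assumed segment, not something the original script does.

```python
def build_search_body(extra_filters=None, after=None, limit=100):
    """Build a contacts search payload: missing job title AND any extra filters."""
    filters = [{"propertyName": "jobtitle", "operator": "NOT_HAS_PROPERTY"}]
    filters.extend(extra_filters or [])
    body = {
        "filterGroups": [{"filters": filters}],  # filters in one group are ANDed
        "properties": ["email", "firstname", "lastname", "jobtitle", "company"],
        "limit": limit,
    }
    if after is not None:
        body["after"] = after  # paging cursor from the previous response
    return body


# Assumed segment: only contacts whose lifecycle stage is "lead"
leads_only = build_search_body(
    [{"propertyName": "lifecyclestage", "operator": "EQ", "value": "lead"}]
)
```

The returned dictionary would be passed as the json argument of the search request in place of the inline payload.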

When to move to a dedicated tool

  • You want enrichment to run reliably every week without human intervention
  • You need visual monitoring showing fill rates, credit usage, and error rates
  • Multiple team members need to manage the enrichment configuration
  • You want to chain enrichment with other workflows (lead scoring, routing) in one platform
