Batch enrich HubSpot contacts missing job title or company size using code

Medium complexity · Cost: $0 · Recommended

Prerequisites
  • Node.js 18+ or Python 3.9+
  • HubSpot private app token with crm.objects.contacts.read and crm.objects.contacts.write scopes
  • Apollo API key with enrichment credits
  • A scheduling environment: cron or GitHub Actions

Step 1: Set up the project

# Verify HubSpot search with NOT_HAS_PROPERTY
curl -s -X POST "https://api.hubapi.com/crm/v3/objects/contacts/search" \
  -H "Authorization: Bearer $HUBSPOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "filterGroups": [{"filters": [{"propertyName": "jobtitle", "operator": "NOT_HAS_PROPERTY"}]}],
    "properties": ["email", "jobtitle"],
    "limit": 5
  }' | python3 -m json.tool

Step 2: Search HubSpot for contacts with missing fields

import requests
import os
import time
from datetime import datetime
 
HUBSPOT_TOKEN = os.environ["HUBSPOT_TOKEN"]
APOLLO_API_KEY = os.environ["APOLLO_API_KEY"]
HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}", "Content-Type": "application/json"}
 
CRITICAL_FIELDS = ["jobtitle", "company", "phone", "industry"]
 
def get_contacts_missing_fields(field="jobtitle", limit=200):
    """Find contacts missing a specific field using NOT_HAS_PROPERTY."""
    contacts = []
    after = None  # paging cursor; omitted on the first request

    while len(contacts) < limit:
        payload = {
            "filterGroups": [{"filters": [{
                "propertyName": field,
                "operator": "NOT_HAS_PROPERTY"
            }]}],
            # Request every property that Step 4 checks before writing,
            # including linkedin_url, so existing values are never overwritten.
            "properties": ["email", "firstname", "lastname", "linkedin_url"] + CRITICAL_FIELDS,
            "limit": min(100, limit - len(contacts)),
        }
        if after is not None:
            payload["after"] = after

        resp = requests.post(
            "https://api.hubapi.com/crm/v3/objects/contacts/search",
            headers=HS_HEADERS,
            json=payload,
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        contacts.extend(data["results"])

        next_page = data.get("paging", {}).get("next")
        if next_page:
            after = next_page["after"]
            time.sleep(0.2)  # HubSpot Search rate limit
        else:
            break

    return contacts
NOT_HAS_PROPERTY

The NOT_HAS_PROPERTY operator finds contacts where the property has never been set or was explicitly cleared. It doesn't match empty strings — only truly null values. If someone set a field to an empty string, it won't appear in results.

Step 3: Batch enrich via Apollo's bulk endpoint

Use Apollo's bulk_match endpoint to enrich up to 10 contacts per request, reducing the number of API calls:

def bulk_enrich_apollo(contacts_batch):
    """Enrich up to 10 contacts in a single Apollo API call.

    Returns (contact, match) pairs; match is None when Apollo finds no person.
    """
    # Only contacts with an email are sent, so results must be paired against
    # this filtered list, not the original batch, to keep indexes aligned.
    with_email = [c for c in contacts_batch if c["properties"].get("email")]
    if not with_email:
        return []

    resp = requests.post(
        "https://api.apollo.io/api/v1/people/bulk_match",
        headers={"x-api-key": APOLLO_API_KEY, "Content-Type": "application/json"},
        json={"details": [{"email": c["properties"]["email"]} for c in with_email]},
        timeout=30,
    )
    resp.raise_for_status()
    matches = resp.json().get("matches", [])
    return list(zip(with_email, matches))

def enrich_batch(contacts):
    """Process contacts in batches of 10 using Apollo bulk endpoint."""
    results = []
    for i in range(0, len(contacts), 10):
        batch = contacts[i:i+10]

        for contact, match in bulk_enrich_apollo(batch):
            if match:  # skip people Apollo could not find
                results.append({"contact": contact, "match": match})

        time.sleep(1)  # rate limit between bulk calls
        print(f"  Processed {min(i+10, len(contacts))}/{len(contacts)}")

    return results
Apollo bulk_match array ordering

The matches array is returned in the same order as the details input array. If a person isn't found, the corresponding index contains null. Always zip/pair results by index, not by email.
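As a minimal sketch of index-based pairing, assuming the ordering behaviour described above, the helper below pairs each input with the match at the same position and drops the null entries. The name pair_by_index and the sample records are illustrative only and are not part of Apollo's API.

```python
def pair_by_index(details, matches):
    """Pair each input record with the match at the same position.

    Entries where Apollo returned null (None in Python) are dropped.
    """
    return [(d, m) for d, m in zip(details, matches) if m is not None]


# Hypothetical sample data: the second person was not found.
details = [{"email": "a@acme.com"}, {"email": "b@acme.com"}, {"email": "c@acme.com"}]
matches = [{"title": "CTO"}, None, {"title": "VP Sales"}]

pairs = pair_by_index(details, matches)
for detail, match in pairs:
    print(detail["email"], "->", match["title"])
```

Because the null placeholder keeps its slot in the array, the third input still lines up with the third match even though the second one is missing.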

Step 4: Update HubSpot (only empty fields)

The key rule: never overwrite existing data. Only fill fields that are currently null:

def update_contact_fields(contact_id, existing_props, apollo_match):
    """Write only fields that are currently empty."""
    properties = {}
 
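    # Apollo can return null for "organization", so fall back to {} before .get().
    # linkedin_url, enrichment_date, and enrichment_source must already exist as
    # contact properties in HubSpot, or the PATCH request will be rejected.
    field_map = {
        "jobtitle": lambda m: m.get("title"),
        "company": lambda m: (m.get("organization") or {}).get("name"),
        "phone": lambda m: (m.get("phone_numbers") or [{}])[0].get("sanitized_number"),
        "linkedin_url": lambda m: m.get("linkedin_url"),
        "industry": lambda m: (m.get("organization") or {}).get("industry"),
    }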
 
    for hs_field, extractor in field_map.items():
        existing_value = existing_props.get(hs_field)
        if not existing_value:  # only fill if empty
            new_value = extractor(apollo_match)
            if new_value:
                properties[hs_field] = new_value
 
    if not properties:
        return 0
 
    properties["enrichment_date"] = datetime.now().strftime("%Y-%m-%d")
    properties["enrichment_source"] = "apollo-batch"
 
    resp = requests.patch(
        f"https://api.hubapi.com/crm/v3/objects/contacts/{contact_id}",
        headers=HS_HEADERS,
        json={"properties": properties}
    )
    resp.raise_for_status()
    return len(properties) - 2  # subtract enrichment_date and enrichment_source

Step 5: Tie it together

def main():
    print(f"[{datetime.now().isoformat()}] Starting batch enrichment...")
 
    contacts = get_contacts_missing_fields(field="jobtitle", limit=200)
    print(f"Found {len(contacts)} contacts missing job title")
 
    # Filter out personal emails
    business_contacts = [
        c for c in contacts
        if c["properties"].get("email") and
        c["properties"]["email"].split("@")[-1].lower() not in
        ("gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com")
    ]
    print(f"After filtering personal emails: {len(business_contacts)} contacts")
 
    enriched_results = enrich_batch(business_contacts)
    print(f"Apollo matched {len(enriched_results)} contacts")
 
    fields_filled = 0
    contacts_updated = 0
    for item in enriched_results:
        count = update_contact_fields(
            item["contact"]["id"],
            item["contact"]["properties"],
            item["match"]
        )
        if count > 0:
            contacts_updated += 1
            fields_filled += count
 
    print(f"\nDone. Updated {contacts_updated} contacts, filled {fields_filled} fields.")
 
if __name__ == "__main__":
    main()

Step 6: Schedule the script

# .github/workflows/batch-enrich.yml
name: Weekly Batch Enrichment
on:
  schedule:
    - cron: '0 3 * * 0'  # Sunday at 3 AM UTC
  workflow_dispatch: {}
jobs:
  enrich:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install requests
      - run: python batch_enrich.py
        env:
          HUBSPOT_TOKEN: ${{ secrets.HUBSPOT_TOKEN }}
          APOLLO_API_KEY: ${{ secrets.APOLLO_API_KEY }}

Rate limits

API               | Limit                         | Strategy
HubSpot Search    | 5 req/sec                     | 200ms between paginated calls
HubSpot PATCH     | 150 req/10 sec                | No delay needed for batch sizes under 150
Apollo bulk_match | 5 req/sec, 10 records/request | 1 second between bulk calls
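
Fixed sleeps keep a normal run under these limits, but a shared token or a larger batch can still trigger an HTTP 429. The sketch below is one possible way to retry in that case, not something the script above already does. The helper name call_with_backoff is hypothetical, and it assumes any Retry-After header is expressed in seconds, which may not hold for every API.

```python
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429 with exponential backoff.

    send is any zero-argument callable returning an object that exposes
    .status_code and .headers (for example a requests.Response).
    """
    attempt = 0
    while True:
        resp = send()
        # Return on any non-429 status, or once the retry budget is spent.
        if resp.status_code != 429 or attempt >= max_retries:
            return resp

        # Prefer the server's hint when present (assumed to be in seconds);
        # otherwise double the delay on each attempt.
        delay = base_delay * 2 ** attempt
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            try:
                delay = float(retry_after)
            except ValueError:
                pass  # unparseable hint: keep the exponential delay
        sleep(delay)
        attempt += 1
```

In the script, a call such as requests.patch(...) could be wrapped as call_with_backoff(lambda: requests.patch(url, headers=HS_HEADERS, json=body)), followed by resp.raise_for_status() on the returned response.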

Cost

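  • Apollo: 1 credit per person in the bulk request (same as individual calls). The bulk endpoint saves time, not credits. Basic plan ($49/mo) = 900 credits. 200 contacts/week = 800 credits in a month with four runs; a month with five Sundays needs 1,000 credits, which exceeds the Basic allowance.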
  • HubSpot: Free within rate limits.
  • GitHub Actions: Free tier (2,000 min/month). Each batch run takes 2-5 minutes.
  • Per 200 contacts: 200 Apollo credits + ~20 bulk API calls + ~200 HubSpot PATCH calls. Total cost: ~$11 at Basic plan pricing.
Bulk endpoint still costs 1 credit per person

Apollo's bulk_match endpoint doesn't offer a discount — it's 1 credit per person in the request, same as individual calls. The benefit is fewer HTTP requests (1 instead of 10) and faster processing. Use it for efficiency, not cost savings.
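
The arithmetic behind these figures can be checked with a short calculation. This is a sketch that takes the plan price and credit allowance quoted above as given; the constant and function names are illustrative, and current Apollo pricing may differ.

```python
PLAN_PRICE_USD = 49      # Basic plan price quoted above
PLAN_CREDITS = 900       # credits included in that plan
CONTACTS_PER_RUN = 200   # one credit per contact sent to bulk_match


def monthly_credits(runs_per_month, contacts_per_run=CONTACTS_PER_RUN):
    """Credits consumed in a month at one credit per contact."""
    return runs_per_month * contacts_per_run


def cost_per_run(contacts_per_run=CONTACTS_PER_RUN):
    """Effective dollar cost of one run at the plan's per-credit rate."""
    return contacts_per_run * PLAN_PRICE_USD / PLAN_CREDITS


for runs in (4, 5):  # a weekly schedule runs 4 or 5 times per month
    used = monthly_credits(runs)
    status = "within" if used <= PLAN_CREDITS else "over"
    print(f"{runs} runs/month: {used} credits ({status} the {PLAN_CREDITS}-credit plan)")

print(f"Effective cost per run: ${cost_per_run():.2f}")  # prints $10.89
```

Lowering the per-run limit to 180 contacts would keep a five-run month at exactly 900 credits.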

Next steps

  • Expand field checks — run the search for multiple missing fields: jobtitle, company, phone, industry. Use separate search queries per field or combine with filterGroups.
  • Add deduplication — track enriched contacts by ID to avoid re-processing on overlapping runs
  • Add Slack summary — post a weekly summary to a Slack channel with enrichment metrics
  • Monitor fill rates — log what percentage of contacts Apollo successfully enriches to evaluate ROI
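
For the first item, one way to combine the field checks is to give each field its own filter group. HubSpot ORs filter groups together and ANDs the filters inside a group, so the body below matches contacts missing any of the listed fields. The function name build_missing_any_payload is hypothetical. HubSpot also limits how many filter groups one request may carry, so confirm the current cap in the search API documentation before adding more fields.

```python
def build_missing_any_payload(fields, properties, limit=100, after=None):
    """Build a search body matching contacts missing ANY of the given fields.

    Each field gets its own filter group: groups are ORed together,
    while filters inside a single group are ANDed.
    """
    payload = {
        "filterGroups": [
            {"filters": [{"propertyName": f, "operator": "NOT_HAS_PROPERTY"}]}
            for f in fields
        ],
        "properties": list(properties),
        "limit": limit,
    }
    if after is not None:  # only send the paging cursor once we have one
        payload["after"] = after
    return payload


critical = ["jobtitle", "company", "phone", "industry"]
body = build_missing_any_payload(critical, ["email"] + critical)
print(len(body["filterGroups"]))  # one group per field
```

The returned dictionary can be passed as the json argument of the search request in place of the single-field body used in Step 2.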
