Batch enrich HubSpot contacts missing job title or company size using code
Prerequisites
- Node.js 18+ or Python 3.9+
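- HubSpot private app token with the crm.objects.contacts.read and crm.objects.contacts.write scopes
- Apollo API key with enrichment credits
- HubSpot contact properties named enrichment_date, enrichment_source, and linkedin_url (create them as custom properties if they don't exist yet): Step 4 writes to all three, and HubSpot rejects updates that reference properties it doesn't recognize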
- A scheduling environment: cron or GitHub Actions
Step 1: Set up the project
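Both the curl check below and the Python script read credentials from the HUBSPOT_TOKEN and APOLLO_API_KEY environment variables. The script's os.environ[...] lookups fail on the first missing variable with a bare KeyError. A small preflight helper can report every missing variable at once. This is a minimal sketch; missing_env is a hypothetical name, not part of any library:

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names from `names` that are unset or empty in `env`.

    Defaults to the real process environment (os.environ).
    """
    env = os.environ if env is None else env
    # An empty string counts as missing: a blank token fails just like no token.
    return [name for name in names if not env.get(name)]
```

Call missing_env(["HUBSPOT_TOKEN", "APOLLO_API_KEY"]) at startup. If the returned list is non-empty, exit with a message naming those variables.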
```bash
# Verify HubSpot search with NOT_HAS_PROPERTY
curl -s -X POST "https://api.hubapi.com/crm/v3/objects/contacts/search" \
  -H "Authorization: Bearer $HUBSPOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "filterGroups": [{"filters": [{"propertyName": "jobtitle", "operator": "NOT_HAS_PROPERTY"}]}],
    "properties": ["email", "jobtitle"],
    "limit": 5
  }' | python3 -m json.tool
```
Step 2: Search HubSpot for contacts with missing fields
```python
import os
import time
from datetime import datetime

import requests

HUBSPOT_TOKEN = os.environ["HUBSPOT_TOKEN"]
APOLLO_API_KEY = os.environ["APOLLO_API_KEY"]
HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}", "Content-Type": "application/json"}
CRITICAL_FIELDS = ["jobtitle", "company", "phone", "industry"]
# Fetch every field Step 4 may write (including linkedin_url); an unfetched
# field looks empty and would be overwritten.
FETCH_PROPERTIES = ["email", "firstname", "lastname", "linkedin_url"] + CRITICAL_FIELDS


def get_contacts_missing_fields(field="jobtitle", limit=200):
    """Find contacts missing a specific field using NOT_HAS_PROPERTY."""
    contacts = []
    after = 0
    while len(contacts) < limit:
        resp = requests.post(
            "https://api.hubapi.com/crm/v3/objects/contacts/search",
            headers=HS_HEADERS,
            json={
                "filterGroups": [{"filters": [{
                    "propertyName": field,
                    "operator": "NOT_HAS_PROPERTY"
                }]}],
                "properties": FETCH_PROPERTIES,
                "limit": min(100, limit - len(contacts)),
                "after": after
            }
        )
        resp.raise_for_status()
        data = resp.json()
        contacts.extend(data["results"])
        if data.get("paging", {}).get("next"):
            after = data["paging"]["next"]["after"]
            time.sleep(0.2)  # stay under HubSpot's Search rate limit
        else:
            break
    return contacts
```

The NOT_HAS_PROPERTY operator finds contacts where the property has never been set or was explicitly cleared. It doesn't match empty strings, only truly null values: if someone set a field to an empty string, that contact won't appear in results.
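NOT_HAS_PROPERTY filters on the server. Once contacts are loaded, Step 4 decides emptiness with Python truthiness (if not existing_value). That check treats None and "" as empty but treats a whitespace-only value such as " " as filled. If your data contains such values, a slightly stricter client-side check is one option. This is a minimal sketch; is_blank and blank_fields are hypothetical helpers:

```python
from typing import Iterable, List, Mapping, Optional


def is_blank(value: Optional[str]) -> bool:
    """True for None, "", and strings containing only whitespace."""
    return value is None or not str(value).strip()


def blank_fields(properties: Mapping[str, Optional[str]], fields: Iterable[str]) -> List[str]:
    """Return the fields whose values are missing or blank in a contact's properties."""
    return [field for field in fields if is_blank(properties.get(field))]
```

For example, blank_fields(contact["properties"], CRITICAL_FIELDS) lists which critical fields a contact still lacks. Using is_blank in place of the truthiness check in Step 4 would also fill whitespace-only values.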
Step 3: Batch enrich via Apollo's bulk endpoint
Use Apollo's bulk_match endpoint to enrich up to 10 contacts per request, reducing API calls:
```python
def bulk_enrich_apollo(contacts_batch):
    """Enrich up to 10 contacts in a single Apollo API call.

    Returns (contact, match) pairs; match is None when Apollo found no one.
    """
    # Only contacts with an email go into `details`, so keep this parallel list
    # to pair each match with the right contact by index.
    with_email = [c for c in contacts_batch if c["properties"].get("email")]
    if not with_email:
        return []
    details = [{"email": c["properties"]["email"]} for c in with_email]
    resp = requests.post(
        "https://api.apollo.io/api/v1/people/bulk_match",
        headers={"x-api-key": APOLLO_API_KEY, "Content-Type": "application/json"},
        json={"details": details}
    )
    resp.raise_for_status()
    matches = resp.json().get("matches", [])
    return list(zip(with_email, matches))


def enrich_batch(contacts):
    """Process contacts in batches of 10 using Apollo bulk endpoint."""
    results = []
    for i in range(0, len(contacts), 10):
        batch = contacts[i:i+10]
        for contact, match in bulk_enrich_apollo(batch):
            if match:
                results.append({"contact": contact, "match": match})
        time.sleep(1)  # rate limit between bulk calls
        print(f"  Processed {min(i+10, len(contacts))}/{len(contacts)}")
    return results
```

The matches array is returned in the same order as the details input array. If a person isn't found, the corresponding index contains null. Always pair results by index, not by email.
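Pairing by index has one trap: Python's zip() stops at the shorter input. A response with fewer entries than requested would silently drop contacts instead of failing. Below is a minimal sketch of a stricter pairing that also reports the match rate, which is useful for monitoring fill rates. pair_matches is a hypothetical helper. It assumes the response keeps the one-entry-per-request, null-for-no-match shape described above:

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def pair_matches(
    contacts: Sequence[Dict[str, Any]],
    matches: Sequence[Optional[Dict[str, Any]]],
) -> Tuple[List[Tuple[Dict[str, Any], Dict[str, Any]]], float]:
    """Pair contacts[i] with matches[i], dropping None (no match) entries.

    Raises ValueError when the lengths differ instead of truncating like zip().
    Returns (matched_pairs, match_rate).
    """
    if len(contacts) != len(matches):
        raise ValueError(f"sent {len(contacts)} contacts but got {len(matches)} matches")
    pairs = [(c, m) for c, m in zip(contacts, matches) if m is not None]
    rate = len(pairs) / len(contacts) if contacts else 0.0
    return pairs, rate
```

Failing loudly on a length mismatch is a deliberate choice. A misaligned batch would write one person's title onto another contact, which is worse than skipping the batch.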
Step 4: Update HubSpot (only empty fields)
The key rule: never overwrite existing data. Only fill fields that are currently null:
```python
def update_contact_fields(contact_id, existing_props, apollo_match):
    """Write only fields that are currently empty."""
    properties = {}
    field_map = {
        "jobtitle": lambda m: m.get("title"),
        # Apollo may return "organization": null, so fall back to {} before .get()
        "company": lambda m: (m.get("organization") or {}).get("name"),
        "phone": lambda m: (m.get("phone_numbers") or [{}])[0].get("sanitized_number"),
        "linkedin_url": lambda m: m.get("linkedin_url"),
        "industry": lambda m: (m.get("organization") or {}).get("industry"),
    }
    for hs_field, extractor in field_map.items():
        existing_value = existing_props.get(hs_field)
        if not existing_value:  # only fill if empty
            new_value = extractor(apollo_match)
            if new_value:
                properties[hs_field] = new_value
    if not properties:
        return 0
    # enrichment_date, enrichment_source and linkedin_url must exist as contact properties
    properties["enrichment_date"] = datetime.now().strftime("%Y-%m-%d")
    properties["enrichment_source"] = "apollo-batch"
    resp = requests.patch(
        f"https://api.hubapi.com/crm/v3/objects/contacts/{contact_id}",
        headers=HS_HEADERS,
        json={"properties": properties}
    )
    resp.raise_for_status()
    return len(properties) - 2  # subtract enrichment_date and enrichment_source
```
Step 5: Tie it together
```python
def main():
    print(f"[{datetime.now().isoformat()}] Starting batch enrichment...")
    contacts = get_contacts_missing_fields(field="jobtitle", limit=200)
    print(f"Found {len(contacts)} contacts missing job title")

    # Filter out personal emails
    business_contacts = [
        c for c in contacts
        if c["properties"].get("email") and
        c["properties"]["email"].split("@")[-1].lower() not in
        ("gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com")
    ]
    print(f"After filtering personal emails: {len(business_contacts)} contacts")

    enriched_results = enrich_batch(business_contacts)
    print(f"Apollo matched {len(enriched_results)} contacts")

    fields_filled = 0
    contacts_updated = 0
    for item in enriched_results:
        count = update_contact_fields(
            item["contact"]["id"],
            item["contact"]["properties"],
            item["match"]
        )
        if count > 0:
            contacts_updated += 1
            fields_filled += count

    print(f"\nDone. Updated {contacts_updated} contacts, filled {fields_filled} fields.")


if __name__ == "__main__":
    main()
```
Step 6: Schedule the script
```yaml
# .github/workflows/batch-enrich.yml
name: Weekly Batch Enrichment
on:
  schedule:
    - cron: '0 3 * * 0'  # Sunday at 3 AM UTC
  workflow_dispatch: {}
jobs:
  enrich:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install requests
      - run: python batch_enrich.py
        env:
          HUBSPOT_TOKEN: ${{ secrets.HUBSPOT_TOKEN }}
          APOLLO_API_KEY: ${{ secrets.APOLLO_API_KEY }}
```
Rate limits
| API | Limit | Strategy |
|---|---|---|
| HubSpot Search | 5 req/sec | 200ms between paginated calls |
| HubSpot PATCH | 150 req/10 sec | No delay needed for batch sizes under 150 |
| Apollo bulk_match | 5 req/sec, 10 records/request | 1 second between bulk calls |
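The fixed sleeps in the script keep request rates under these limits in the normal case. A 429 (Too Many Requests) response, however, still raises in raise_for_status() and aborts the run. One common pattern is to retry 429s with exponential backoff and honor a numeric Retry-After header if the server sends one. This guide doesn't say whether HubSpot or Apollo include that header, so the sketch treats it as optional. call_with_retry is a hypothetical helper. Its send argument is any zero-argument function that returns a requests-style response:

```python
import time
from typing import Any, Callable


def call_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() and retry while it returns HTTP 429.

    Waits for a numeric Retry-After header when present, otherwise
    base_delay * 2**attempt seconds. Returns the last response either way,
    so the caller can still call raise_for_status() on it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code != 429 or attempt == max_attempts - 1:
            return resp
        retry_after = resp.headers.get("Retry-After")
        try:
            delay = float(retry_after)
        except (TypeError, ValueError):
            # Header missing (None) or given as an HTTP date: use backoff instead.
            delay = base_delay * 2 ** attempt
        sleep(delay)
    raise AssertionError("unreachable: the last attempt always returns")
```

In the script, this would wrap calls such as call_with_retry(lambda: requests.patch(url, headers=HS_HEADERS, json=payload)), followed by raise_for_status() on the result. The injectable sleep argument exists so the backoff can be tested without waiting.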
Cost
- Apollo: 1 credit per person in the bulk request (same as individual calls). Basic plan ($49/mo) = 900 credits. 200 contacts/week is up to 800 credits in a month with four Sunday runs, but up to 1,000 in a month with five, which exceeds the 900-credit allowance.
- HubSpot: Free within rate limits.
- GitHub Actions: Free tier (2,000 min/month). Each batch run takes 2-5 minutes.
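- Per 200 contacts: 200 Apollo credits + ~20 bulk API calls + up to ~200 HubSpot PATCH calls (one per matched contact with at least one empty field to fill). Total cost: ~$11 at Basic plan pricing (200 × $49 / 900 credits ≈ $10.89).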
Apollo's bulk_match endpoint doesn't offer a discount: it's 1 credit per person in the request, same as individual calls. The benefit is fewer HTTP requests (one call per 10 people instead of 10) and faster processing. Use it for efficiency, not cost savings.
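The cost figures above reduce to a few lines of arithmetic. This is a minimal sketch under several assumptions: the Basic plan numbers quoted here ($49 for 900 credits), one credit per contact sent to Apollo, and one run per Sunday as in the Step 6 cron schedule. The function names are hypothetical:

```python
import calendar
import math
from datetime import date


def sundays_in_month(year: int, month: int) -> int:
    """Number of runs the weekly Sunday cron schedule makes in a month."""
    days = calendar.monthrange(year, month)[1]
    # date.weekday(): Monday == 0 ... Sunday == 6
    return sum(1 for d in range(1, days + 1) if date(year, month, d).weekday() == 6)


def run_estimate(contacts: int, plan_price: float = 49.0, plan_credits: int = 900) -> dict:
    """Credits, Apollo bulk calls (10 people each), and credit value for one run."""
    return {
        "credits": contacts,
        "bulk_calls": math.ceil(contacts / 10),
        "credit_value_usd": round(contacts * plan_price / plan_credits, 2),
    }


def month_estimate(contacts_per_run: int, year: int, month: int, plan_credits: int = 900) -> dict:
    """Upper-bound credits for a month and whether they exceed the plan allowance."""
    credits = contacts_per_run * sundays_in_month(year, month)
    return {"credits": credits, "over_allowance": credits > plan_credits}
```

For example, run_estimate(200) gives 200 credits, 20 bulk calls and $10.89. month_estimate(200, 2024, 3) flags March 2024, which has five Sundays, at 1,000 credits, over the 900-credit allowance.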
Next steps
- Expand field checks: run the search for multiple missing fields (jobtitle, company, phone, industry). Use separate search queries per field or combine them with filterGroups.
- Add deduplication: track enriched contacts by ID to avoid re-processing on overlapping runs. Contacts Apollo can't match still lack a job title, so without this the weekly search returns them again and spends credits on them every run.
- Add Slack summary: post a weekly summary to a Slack channel with enrichment metrics
- Monitor fill rates: log what percentage of contacts Apollo successfully enriches to evaluate ROI
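HubSpot's search combines filterGroups with OR and combines the filters inside one group with AND. "Missing any of these fields" therefore needs one single-filter group per field. Putting all four filters in one group would instead match only contacts missing every field. Below is a minimal sketch of that request body. missing_any_payload is a hypothetical helper. The max_groups default of 5 reflects our reading of HubSpot's documented limit; confirm it against the current CRM search docs:

```python
from typing import Dict, List, Sequence


def missing_any_payload(
    fields: Sequence[str],
    extra_properties: Sequence[str] = ("email", "firstname", "lastname"),
    limit: int = 100,
    max_groups: int = 5,  # assumed HubSpot cap on filterGroups; verify in current docs
) -> Dict[str, object]:
    """Search body matching contacts missing ANY of `fields`.

    filterGroups are ORed together and filters within a group are ANDed,
    so each field gets its own single-filter group.
    """
    if not 1 <= len(fields) <= max_groups:
        raise ValueError(f"need between 1 and {max_groups} fields, got {len(fields)}")
    groups: List[Dict[str, object]] = [
        {"filters": [{"propertyName": f, "operator": "NOT_HAS_PROPERTY"}]}
        for f in fields
    ]
    return {
        "filterGroups": groups,
        "properties": list(extra_properties) + list(fields),
        "limit": limit,
    }
```

Use missing_any_payload(CRITICAL_FIELDS) as the json= body of the Step 2 search request, adding the after cursor when paging. The search then returns contacts missing at least one critical field.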