Enrich HubSpot companies with technographic data from BuiltWith using code
Medium complexity | Cost: $0 | Recommended
Prerequisites
- Node.js 18+ or Python 3.9+
- HubSpot private app token with crm.objects.companies.read and crm.objects.companies.write scopes
- BuiltWith API key (Pro plan or above for API access)
- Custom HubSpot company properties: tech_stack_crm, tech_stack_marketing, tech_stack_analytics, tech_stack_all, tech_enrichment_date
- A scheduling environment: cron or GitHub Actions
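The five custom properties can be created by hand in HubSpot's property settings, or scripted. The following is a minimal sketch, not part of the original script. It assumes HubSpot's CRM properties endpoint (POST /crm/v3/properties/companies), text fields for the tech_stack_* values, a date field for tech_enrichment_date, and the default companyinformation property group. The labels and the helper names build_property_payloads and create_properties are illustrative, and creating properties through the API may require a schema write scope in addition to the two scopes listed above.

```python
import os

PROPERTY_NAMES = [
    "tech_stack_crm",
    "tech_stack_marketing",
    "tech_stack_analytics",
    "tech_stack_all",
    "tech_enrichment_date",
]


def build_property_payloads(group_name="companyinformation"):
    """Build one property-definition payload per custom property."""
    payloads = []
    for name in PROPERTY_NAMES:
        is_date = name == "tech_enrichment_date"
        payloads.append({
            "name": name,
            "label": name.replace("_", " ").title(),
            "groupName": group_name,  # assumed default company property group
            "type": "date" if is_date else "string",
            "fieldType": "date" if is_date else "text",
        })
    return payloads


def create_properties():
    """POST each definition to HubSpot (needs network access and a token)."""
    import requests  # imported here so the payload builder has no dependencies

    headers = {"Authorization": f"Bearer {os.environ['HUBSPOT_TOKEN']}"}
    for payload in build_property_payloads():
        resp = requests.post(
            "https://api.hubapi.com/crm/v3/properties/companies",
            headers=headers,
            json=payload,
            timeout=30,
        )
        if resp.status_code == 409:
            # Assumption: HubSpot answers 409 when the property already exists
            continue
        resp.raise_for_status()
```

Running create_properties() once before the first enrichment run should be enough; the payload builder can be inspected on its own without touching the API.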
Step 1: Set up the project
# Test the BuiltWith API
curl "https://api.builtwith.com/v21/api.json?KEY=$BUILTWITH_API_KEY&LOOKUP=hubspot.com" \
  | python3 -m json.tool | head -50
Step 2: Fetch companies missing tech data from HubSpot
import os
import time
from datetime import datetime

import requests

HUBSPOT_TOKEN = os.environ["HUBSPOT_TOKEN"]
BUILTWITH_API_KEY = os.environ["BUILTWITH_API_KEY"]
HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}", "Content-Type": "application/json"}


def get_companies_without_tech(limit=50):
    """Return up to `limit` companies whose tech_stack_crm property is empty."""
    companies = []
    after = 0  # paging cursor returned by the previous page
    while len(companies) < limit:
        resp = requests.post(
            "https://api.hubapi.com/crm/v3/objects/companies/search",
            headers=HS_HEADERS,
            json={
                "filterGroups": [{"filters": [{
                    "propertyName": "tech_stack_crm",
                    "operator": "NOT_HAS_PROPERTY"
                }]}],
                "properties": ["domain", "name"],
                "limit": min(100, limit - len(companies)),
                "after": after
            },
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        companies.extend(data["results"])
        if data.get("paging", {}).get("next"):
            after = data["paging"]["next"]["after"]
            time.sleep(0.2)  # pause between pages to respect the Search rate limit
        else:
            break
    return companies
Step 3: Look up tech stacks via BuiltWith
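Before a lookup, it can help to clean the domain value. HubSpot's domain property may hold values such as https://www.example.com/ rather than a bare domain. The sketch below is an assumption about your data, not a documented BuiltWith requirement, and normalize_domain is a hypothetical helper that does not appear in the original script.

```python
from urllib.parse import urlparse


def normalize_domain(raw):
    """Reduce a URL-ish string to a bare lowercase hostname, or None if empty."""
    if not raw or not raw.strip():
        return None
    value = raw.strip().lower()
    # urlparse only fills in the hostname when a scheme is present
    if "://" not in value:
        value = "http://" + value
    host = urlparse(value).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host or None


for raw in ("https://www.Example.com/pricing", "example.com", "   "):
    print(repr(raw), "->", normalize_domain(raw))
```

In main(), this would wrap the value read from company["properties"] so that empty or malformed entries are skipped before they cost an API call.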
CATEGORY_MAP = {
    "crm": ["crm"],
    "marketing": ["marketing-automation", "email", "marketing"],
    "analytics": ["analytics", "web-analytics"],
    "ecommerce": ["ecommerce", "shopping-cart", "payment"],
    "hosting": ["hosting", "cdn", "cloud-paas"],
}


def lookup_tech_stack(domain):
    """Call BuiltWith and categorize technologies."""
    resp = requests.get(
        "https://api.builtwith.com/v21/api.json",
        params={"KEY": BUILTWITH_API_KEY, "LOOKUP": domain},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()

    # Extract technologies from the nested response; any level may be
    # missing or null, so fall back to an empty container at each step
    results = data.get("Results") or []
    if not results:
        return None
    paths = (results[0].get("Result") or {}).get("Paths") or []
    if not paths:
        return None
    technologies = paths[0].get("Technologies") or []
    if not technologies:
        return None

    # Categorize: each technology lands in the first matching category only
    categorized = {cat: [] for cat in CATEGORY_MAP}
    for tech in technologies:
        tag = (tech.get("Tag") or "").lower()
        cats = [c.lower() for c in (tech.get("Categories") or [])]
        for category, keywords in CATEGORY_MAP.items():
            if any(kw in tag for kw in keywords) or any(kw in c for kw in keywords for c in cats):
                categorized[category].append(tech["Name"])
                break

    return {
        "tech_stack_crm": ", ".join(categorized["crm"]) or "None detected",
        "tech_stack_marketing": ", ".join(categorized["marketing"]) or "None detected",
        "tech_stack_analytics": ", ".join(categorized["analytics"]) or "None detected",
        "tech_stack_all": ", ".join(t["Name"] for t in technologies),
        "tech_count": len(technologies),
    }
BuiltWith response for unknown domains
If BuiltWith doesn't recognize a domain, it may return an empty Results array or a result with no Paths. Always check for empty/missing data at each nesting level. Don't assume the structure is always fully populated.
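The nesting checks described above can be sketched as a standalone helper. This version also merges technologies from every entry in Paths rather than only the first, which is a deliberate change from lookup_tech_stack. The helper name extract_technologies is hypothetical, and the sample payloads are made up for illustration; they only mirror the Results > Result > Paths > Technologies shape that the script relies on.

```python
def extract_technologies(data):
    """Return a de-duplicated list of technology dicts, or [] if any level is missing."""
    seen = set()
    technologies = []
    for result in (data or {}).get("Results") or []:
        paths = (result.get("Result") or {}).get("Paths") or []
        for path in paths:
            for tech in path.get("Technologies") or []:
                name = tech.get("Name")
                if name and name not in seen:
                    seen.add(name)
                    technologies.append(tech)
    return technologies


# Made-up payloads covering the empty and partially populated cases
empty_results = {"Results": []}
no_paths = {"Results": [{"Result": {"Paths": []}}]}
null_result = {"Results": [{"Result": None}]}
populated = {"Results": [{"Result": {"Paths": [
    {"Technologies": [{"Name": "HubSpot", "Tag": "crm"}]},
    {"Technologies": [{"Name": "HubSpot", "Tag": "crm"},
                      {"Name": "Google Analytics", "Tag": "analytics"}]},
]}}]}

for payload in (empty_results, no_paths, null_result, populated):
    print([t["Name"] for t in extract_technologies(payload)])
```

Returning an empty list instead of None lets the caller use a single truthiness check, the same way main() treats a missing result.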
Step 4: Update HubSpot companies
def update_company_tech(company_id, tech_data):
    """Write tech stack data to HubSpot company."""
    properties = {
        **tech_data,
        "tech_enrichment_date": datetime.now().strftime("%Y-%m-%d"),
    }
    # Remove tech_count from HubSpot update (internal metric only)
    properties.pop("tech_count", None)
    resp = requests.patch(
        f"https://api.hubapi.com/crm/v3/objects/companies/{company_id}",
        headers=HS_HEADERS,
        json={"properties": properties}
    )
    resp.raise_for_status()


def main():
    companies = get_companies_without_tech(limit=50)
    print(f"Found {len(companies)} companies to enrich\n")
    enriched = 0
    skipped = 0
    for company in companies:
        domain = company["properties"].get("domain")
        name = company["properties"].get("name", "Unknown")
        if not domain:
            print(f" {name} - no domain, skipping")
            skipped += 1
            continue
        tech_data = lookup_tech_stack(domain)
        if not tech_data:
            print(f" {name} ({domain}) - no tech data found")
            skipped += 1
            time.sleep(2)
            continue
        update_company_tech(company["id"], tech_data)
        enriched += 1
        print(f" {name} ({domain}) - {tech_data['tech_count']} technologies")
        print(f"   CRM: {tech_data['tech_stack_crm']}")
        print(f"   Marketing: {tech_data['tech_stack_marketing']}")
        print(f"   Analytics: {tech_data['tech_stack_analytics']}")
        time.sleep(2)  # BuiltWith rate limit
    print(f"\nDone. Enriched: {enriched}, Skipped: {skipped}")


if __name__ == "__main__":
    main()
Step 5: Schedule the script
# .github/workflows/tech-enrichment.yml
name: Technographic Enrichment
on:
  schedule:
    - cron: '0 3 * * 0'  # Weekly on Sunday at 3 AM UTC
  workflow_dispatch: {}
jobs:
  enrich:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install requests
      - run: python tech_enrich.py
        env:
          HUBSPOT_TOKEN: ${{ secrets.HUBSPOT_TOKEN }}
          BUILTWITH_API_KEY: ${{ secrets.BUILTWITH_API_KEY }}
Rate limits
| API | Limit | Delay |
|---|---|---|
| BuiltWith | Varies by plan (typically 1-2 req/sec) | 2 seconds between calls |
| HubSpot Search | 5 req/sec | 200ms between pages |
| HubSpot PATCH | 150 req/10 sec | No delay needed |
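The fixed time.sleep(2) in the script covers the common case. If a plan's limit turns out to be tighter, retrying with exponential backoff is one way to recover. The sketch below assumes that both APIs signal throttling with HTTP status 429, which is worth confirming for your BuiltWith plan. call_with_backoff is a hypothetical helper that accepts any zero-argument function returning an object with a status_code attribute.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Double the wait after each throttled attempt: 2s, 4s, 8s, ...
            sleep(base_delay * (2 ** attempt))
    return response  # still throttled; the caller decides what to do
```

With requests, a call might look like call_with_backoff(lambda: requests.get(url, params=params, timeout=30)). The sleep parameter exists so the delay can be swapped out in tests.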
Cost
- BuiltWith Pro: $295/mo for 500 API calls ($0.59/lookup). Enterprise: $495/mo for 2,000 calls ($0.25/lookup).
- HubSpot: Free within API rate limits.
- GitHub Actions: Free tier (2,000 min/month).
- Budget tip: At 50 companies/week, you'll use roughly 200 to 250 calls/month (four or five weekly runs), well within the Pro plan's 500-call limit.
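The per-lookup figures and the budget tip follow from simple division. Here is a quick check using only the plan prices and quotas listed above; averaging over a 52-week year is the one added assumption, and the function names are illustrative.

```python
PLANS = {
    "Pro": {"price_usd": 295, "calls": 500},
    "Enterprise": {"price_usd": 495, "calls": 2000},
}


def cost_per_lookup(plan):
    """Monthly price divided by the monthly call quota."""
    return PLANS[plan]["price_usd"] / PLANS[plan]["calls"]


def monthly_calls(companies_per_week):
    """Average calls per month over a 52-week year."""
    return companies_per_week * 52 / 12


for plan in PLANS:
    print(f"{plan}: ${cost_per_lookup(plan):.2f} per lookup")
print(f"50 companies/week is about {monthly_calls(50):.0f} calls/month")
```

Changing the argument to monthly_calls shows how far the weekly batch size can grow before the Pro quota is exhausted.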
BuiltWith bills per lookup, not per technology
You pay 1 API call per domain lookup, regardless of how many technologies BuiltWith detects on that domain. A domain with 3 technologies costs the same as one with 300. This makes BuiltWith more cost-effective for companies with large tech stacks.
Next steps
- Detect specific competitors: add a check for competitor product names and set a boolean uses_competitor_product property for easy filtering.
- Segment by tech maturity: companies with 50+ technologies are likely tech-savvy enterprises; companies with 5-10 are leaner. Use tech_count for segmentation. The script above drops tech_count before the HubSpot update, so to filter on it in HubSpot, add a custom number property for it and stop removing it.
- Track changes over time: store previous tech stack data and compare on re-enrichment to detect when a company adds or drops a tool.
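Competitor detection and change tracking both reduce to set operations on the comma-separated tech_stack_all value. This is a minimal sketch: the competitor names and the helpers parse_stack, uses_competitor, and diff_stacks are placeholders for illustration, and where the previous value is stored is left open. It also assumes technology names do not themselves contain commas.

```python
COMPETITOR_PRODUCTS = {"Salesforce", "Marketo"}  # placeholder names; use your own list


def parse_stack(value):
    """Split a comma-separated tech_stack_all string into a set of names."""
    return {name.strip() for name in (value or "").split(",") if name.strip()}


def uses_competitor(tech_stack_all, competitors=COMPETITOR_PRODUCTS):
    """True if any competitor product appears in the stack (case-insensitive)."""
    stack = {name.lower() for name in parse_stack(tech_stack_all)}
    return any(product.lower() in stack for product in competitors)


def diff_stacks(previous, current):
    """Return (added, dropped) technology names between two enrichment runs."""
    before, after = parse_stack(previous), parse_stack(current)
    return sorted(after - before), sorted(before - after)


previous = "HubSpot, Google Analytics, Mailchimp"
current = "HubSpot, Google Analytics, Marketo"
added, dropped = diff_stacks(previous, current)
print("added:", added, "dropped:", dropped)
print("uses competitor:", uses_competitor(current))
```

Note that the Step 2 search only selects companies with no tech_stack_crm value, so re-enriching existing records would need a different filter, for example one based on tech_enrichment_date.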