Identify website visitors with Clearbit Reveal and create HubSpot companies using an agent skill
Prerequisites
- Claude Code or another agent that supports the Agent Skills standard
- Clearbit Reveal API key stored as
CLEARBIT_API_KEYenvironment variable (legacy) or HubSpot Breeze Intelligence enabled - HubSpot private app token stored as
HUBSPOT_TOKENenvironment variable - A file or directory containing visitor IP addresses (server logs, CSV export, etc.)
Clearbit was acquired by HubSpot and rebranded as Breeze Intelligence. The standalone API is being sunset. This skill works with the legacy Clearbit Reveal API. If you're using Breeze, visitor identification happens natively in HubSpot — use this skill for batch processing of historical log data or as a supplement.
Overview
This agent skill takes a log file or CSV of visitor IP addresses, resolves them to companies via Clearbit Reveal, filters for ICP matches, and creates new company records in HubSpot. Ideal for batch processing a day's worth of visitor data on demand.
Step 1: Create the skill directory
mkdir -p .claude/skills/identify-visitors/scriptsStep 2: Write the SKILL.md file
Create .claude/skills/identify-visitors/SKILL.md:
---
name: identify-visitors
description: Resolve website visitor IPs to companies via Clearbit Reveal and create matching companies in HubSpot.
disable-model-invocation: true
allowed-tools: Bash(python *)
---
Identify companies from website visitor IPs and add them to HubSpot.
Usage: /identify-visitors <path_to_log_or_csv>
1. Run: `python $SKILL_DIR/scripts/reveal_visitors.py --input "$1"`
2. Review the output showing resolved companies and ICP matches
3. Confirm new companies were created in HubSpotStep 3: Write the script
Create .claude/skills/identify-visitors/scripts/reveal_visitors.py:
#!/usr/bin/env python3
"""
Visitor Identification: Clearbit Reveal → HubSpot
Resolves visitor IPs to companies, filters by ICP, creates HubSpot records.
"""
import argparse
import csv
import os
import re
import sys
import time
try:
import requests
except ImportError:
os.system("pip install requests -q")
import requests
# --- Config ---
CLEARBIT_API_KEY = os.environ.get("CLEARBIT_API_KEY")
HUBSPOT_TOKEN = os.environ.get("HUBSPOT_TOKEN")
if not all([CLEARBIT_API_KEY, HUBSPOT_TOKEN]):
print("ERROR: Set CLEARBIT_API_KEY and HUBSPOT_TOKEN environment variables")
sys.exit(1)
HS_HEADERS = {"Authorization": f"Bearer {HUBSPOT_TOKEN}", "Content-Type": "application/json"}
CB_HEADERS = {"Authorization": f"Bearer {CLEARBIT_API_KEY}"}
MIN_EMPLOYEES = 50 # Adjust to your ICP
# --- Functions ---
def extract_ips(input_path):
"""Extract unique IPs from a log file or CSV."""
ips = set()
with open(input_path) as f:
for line in f:
# Match IPv4 addresses
found = re.findall(r'\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b', line)
ips.update(found)
# Filter out private/reserved IPs
public_ips = set()
for ip in ips:
octets = [int(o) for o in ip.split(".")]
if octets[0] == 10:
continue
if octets[0] == 172 and 16 <= octets[1] <= 31:
continue
if octets[0] == 192 and octets[1] == 168:
continue
if octets[0] == 127:
continue
public_ips.add(ip)
return public_ips
def reveal_company(ip):
"""Resolve IP to company via Clearbit Reveal."""
resp = requests.get(
"https://reveal.clearbit.com/v1/companies/find",
params={"ip": ip},
headers=CB_HEADERS,
)
if resp.status_code == 404:
return None
resp.raise_for_status()
company = resp.json().get("company")
if not company or company.get("type") != "company":
return None
return {
"domain": company.get("domain"),
"name": company.get("name"),
"industry": company.get("category", {}).get("industry"),
"employees": company.get("metrics", {}).get("employees"),
"city": company.get("geo", {}).get("city"),
"state": company.get("geo", {}).get("state"),
"country": company.get("geo", {}).get("country"),
}
def company_exists(domain):
"""Check if company exists in HubSpot by domain."""
resp = requests.post(
"https://api.hubapi.com/crm/v3/objects/companies/search",
headers=HS_HEADERS,
json={"filterGroups": [{"filters": [{
"propertyName": "domain", "operator": "EQ", "value": domain,
}]}]},
)
resp.raise_for_status()
return len(resp.json().get("results", [])) > 0
def create_company(company):
"""Create company in HubSpot."""
resp = requests.post(
"https://api.hubapi.com/crm/v3/objects/companies",
headers=HS_HEADERS,
json={"properties": {
"domain": company["domain"],
"name": company["name"],
"industry": company.get("industry", ""),
"numberofemployees": str(company.get("employees", "")),
"city": company.get("city", ""),
"state": company.get("state", ""),
"country": company.get("country", ""),
}},
)
resp.raise_for_status()
return resp.json()["id"]
# --- Main ---
parser = argparse.ArgumentParser()
parser.add_argument("--input", required=True, help="Path to log file or CSV with visitor IPs")
parser.add_argument("--min-employees", type=int, default=50, help="Minimum employee count for ICP")
args = parser.parse_args()
MIN_EMPLOYEES = args.min_employees
ips = extract_ips(args.input)
print(f"Found {len(ips)} unique public IPs\n")
created = 0
already_exists = 0
no_match = 0
unresolved = 0
seen_domains = set()
for ip in sorted(ips):
company = reveal_company(ip)
time.sleep(0.1) # Respect rate limits
if not company:
unresolved += 1
continue
domain = company["domain"]
if domain in seen_domains:
continue
seen_domains.add(domain)
employees = company.get("employees") or 0
if employees < MIN_EMPLOYEES:
no_match += 1
continue
if company_exists(domain):
print(f" EXISTS: {company['name']} ({domain})")
already_exists += 1
continue
company_id = create_company(company)
print(f" CREATED: {company['name']} — {employees} employees — ID: {company_id}")
created += 1
print(f"\n--- Summary ---")
print(f"IPs processed: {len(ips)}")
print(f"Companies created: {created}")
print(f"Already in HubSpot: {already_exists}")
print(f"Below ICP threshold: {no_match}")
print(f"Unresolved IPs: {unresolved}")Step 4: Run the skill
# Process yesterday's access log
/identify-visitors /var/log/nginx/access.log
# Process a CSV export from your analytics tool
/identify-visitors ~/downloads/visitor-ips.csv
# With a custom employee threshold
python .claude/skills/identify-visitors/scripts/reveal_visitors.py \
--input /var/log/nginx/access.log --min-employees 100Step 5: Schedule (optional)
For daily processing, schedule via cron:
# crontab — run daily at 7 AM, process previous day's log
0 7 * * * cd /path/to/project && python .claude/skills/identify-visitors/scripts/reveal_visitors.py \
--input /var/log/nginx/access.logWhen to use this approach
- You want to batch-process a log file or CSV of IPs on demand
- You're evaluating visitor identification before committing to Breeze Intelligence pricing
- You need custom ICP filtering beyond what Breeze offers natively
- You want full control over which companies get created and with what data
When to graduate to a dedicated tool
- You need real-time identification (as visitors land, not in daily batches)
- You want native HubSpot integration without managing API keys
- Breeze Intelligence pricing works for your volume
Each IP lookup costs 1 Clearbit Reveal credit, regardless of whether it resolves to a company. A log file with 1,000 unique IPs = 1,000 credits used, even though only 200-300 will resolve. Pre-filter your IPs to exclude known consumer ISP ranges to save credits.
Need help implementing this?
We build and optimize automation systems for mid-market businesses. Let's discuss the right approach for your team.