Snapshot your HubSpot pipeline weekly in Google Sheets using code and cron
Prerequisites
- Node.js 18+ or Python 3.9+
- HubSpot private app token (scopes: crm.objects.deals.read, crm.schemas.deals.read)
- Google Cloud project with the Google Sheets API enabled
- Google service account with a JSON key file, or OAuth2 credentials
- The service account's email address shared as an Editor on the target Google Sheet
- A scheduling environment: cron, GitHub Actions, or a cloud function
Why code?
A script gives you full control over snapshot logic, column ordering, and metric calculation with zero ongoing cost. The pipeline snapshot runs once a week and takes seconds — perfect for cron or GitHub Actions. The trade-off is you handle errors, monitoring, and Google credential management yourself. Best for teams with a developer who can maintain the script and wants version-controlled infrastructure.
How it works
- Cron or GitHub Actions triggers the script every Monday morning
- HubSpot Pipelines API fetches stage definitions for ID-to-label mapping
- HubSpot Search API pulls all active deals with pagination
- Aggregation logic computes total value, deal count, and per-stage breakdowns
- Google Sheets API v4 appends a single snapshot row via service account auth
Step 1: Set up the project
```bash
# Verify your HubSpot token works
curl -s "https://api.hubapi.com/crm/v3/pipelines/deals" \
  -H "Authorization: Bearer $HUBSPOT_ACCESS_TOKEN" | head -c 200
```

Create a service account in the Google Cloud Console, download the JSON key file, and share your Google Sheet with the service account email (e.g., snapshot@your-project.iam.gserviceaccount.com) as an Editor. Store the key file path in the GOOGLE_SERVICE_ACCOUNT_KEY environment variable.
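Before wiring anything up, it can be worth sanity-checking the key file. Here is a minimal sketch; check_key_file is our own helper (not part of any Google SDK), and it only verifies the two fields the auth flow depends on:

```python
import json

def check_key_file(path):
    """Confirm the service account key parses and has the fields auth needs."""
    with open(path) as f:
        key = json.load(f)
    missing = [field for field in ("client_email", "private_key") if field not in key]
    return key.get("client_email"), missing
```

Print the returned email and share the target sheet with exactly that address.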
Step 2: Fetch pipeline stages
Get a mapping of stage IDs to human-readable names.
```bash
curl -s "https://api.hubapi.com/crm/v3/pipelines/deals" \
  -H "Authorization: Bearer $HUBSPOT_ACCESS_TOKEN" \
  | jq '.results[0].stages[] | {id: .id, label: .label}'
```

Step 3: Search for active deals
Use the HubSpot Search API to pull all deals with pagination.
```python
from hubspot.crm.deals import PublicObjectSearchRequest, Filter, FilterGroup

def get_all_deals(client):
    pipeline_filter = Filter(
        property_name="pipeline",
        operator="EQ",
        value="default",
    )
    filter_group = FilterGroup(filters=[pipeline_filter])
    all_deals = []
    after = 0
    while True:
        request = PublicObjectSearchRequest(
            filter_groups=[filter_group],
            properties=[
                "dealname", "amount", "dealstage",
                "closedate", "createdate",
            ],
            sorts=[{"propertyName": "amount", "direction": "DESCENDING"}],
            limit=100,
            after=after,
        )
        response = client.crm.deals.search_api.do_search(
            public_object_search_request=request
        )
        all_deals.extend(response.results)
        if response.paging and response.paging.next:
            after = response.paging.next.after
        else:
            break
    return all_deals
```

All HubSpot property values come back as strings — even numbers. amount returns "50000", not 50000. Always parse with float() / parseFloat() and handle None/null values.
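To make the string-parsing point concrete, here is a tiny helper; parse_amount is our own name, not an SDK function, and it mirrors the float(props.get("amount") or 0) pattern used in the next step:

```python
def parse_amount(raw):
    """Convert a HubSpot property value to a float, treating None/"" as 0."""
    if raw is None or raw == "":
        return 0.0
    return float(raw)

# HubSpot returns numeric properties as strings:
print(parse_amount("50000"))  # 50000.0
print(parse_amount(None))     # 0.0
```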
Step 4: Calculate snapshot metrics
Aggregate deals into total value, count, and per-stage breakdowns.
```python
from datetime import datetime, timezone

def calculate_snapshot(deals, stage_map, stage_order):
    total_value = 0
    by_stage = {}
    for deal in deals:
        props = deal.properties
        amount = float(props.get("amount") or 0)
        total_value += amount
        stage_id = props.get("dealstage", "unknown")
        stage_name = stage_map.get(stage_id, stage_id)
        if stage_name not in by_stage:
            by_stage[stage_name] = {"count": 0, "value": 0}
        by_stage[stage_name]["count"] += 1
        by_stage[stage_name]["value"] += amount
    # Build the row: date, total value, deal count, then per-stage count/value pairs
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    row = [today, total_value, len(deals)]
    for stage_name in stage_order:
        data = by_stage.get(stage_name, {"count": 0, "value": 0})
        row.append(data["count"])
        row.append(data["value"])
    return row
```

Step 5: Append to Google Sheets
Write the snapshot row to Google Sheets using the Sheets API v4 append method.
```python
import os

from google.oauth2 import service_account
from googleapiclient.discovery import build

def append_to_sheet(row, spreadsheet_id, range_name="Sheet1!A1"):
    creds = service_account.Credentials.from_service_account_file(
        os.environ["GOOGLE_SERVICE_ACCOUNT_KEY"],
        scopes=["https://www.googleapis.com/auth/spreadsheets"],
    )
    service = build("sheets", "v4", credentials=creds)
    body = {"values": [row]}
    result = service.spreadsheets().values().append(
        spreadsheetId=spreadsheet_id,
        range=range_name,
        valueInputOption="USER_ENTERED",
        insertDataOption="INSERT_ROWS",
        body=body,
    ).execute()
    print(f"Appended {result.get('updates', {}).get('updatedRows', 0)} row(s)")
    return result
```

The Sheets API append method finds the last row with data in the specified range and inserts below it. valueInputOption: "USER_ENTERED" means Google Sheets will parse values the same way it would if you typed them — numbers stay as numbers, dates as dates.
The service account is a separate Google identity. It can't see your sheets by default. You must share the spreadsheet with the service account email address (xxx@yyy.iam.gserviceaccount.com) and give it Editor access. Without this, you'll get a 403 error.
Step 6: Wire it all together
```python
#!/usr/bin/env python3
"""Pipeline snapshot: HubSpot → Google Sheets"""
import os
import sys
from datetime import datetime, timezone

try:
    from hubspot import HubSpot
    from google.oauth2 import service_account
    from googleapiclient.discovery import build
except ImportError:
    # Install into the interpreter actually running this script
    os.system(f"{sys.executable} -m pip install hubspot-api-client google-api-python-client google-auth -q")
    from hubspot import HubSpot
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

HUBSPOT_ACCESS_TOKEN = os.environ.get("HUBSPOT_ACCESS_TOKEN")
SPREADSHEET_ID = os.environ.get("GOOGLE_SPREADSHEET_ID")
GOOGLE_KEY_FILE = os.environ.get("GOOGLE_SERVICE_ACCOUNT_KEY")

if not all([HUBSPOT_ACCESS_TOKEN, SPREADSHEET_ID, GOOGLE_KEY_FILE]):
    print("ERROR: Missing env vars: HUBSPOT_ACCESS_TOKEN, GOOGLE_SPREADSHEET_ID, GOOGLE_SERVICE_ACCOUNT_KEY")
    sys.exit(1)

# 1. Get stages
client = HubSpot(access_token=HUBSPOT_ACCESS_TOKEN)
pipelines = client.crm.pipelines.pipelines_api.get_all(object_type="deals")
stage_map = {}
stage_order = []
for pipeline in pipelines.results:
    for stage in pipeline.stages:
        stage_map[stage.id] = stage.label
        stage_order.append(stage.label)
print(f"Found {len(stage_map)} stages: {', '.join(stage_order)}")

# 2. Get deals (paginated)
from hubspot.crm.deals import PublicObjectSearchRequest, Filter, FilterGroup

pipeline_filter = Filter(property_name="pipeline", operator="EQ", value="default")
all_deals = []
after = 0
while True:
    request = PublicObjectSearchRequest(
        filter_groups=[FilterGroup(filters=[pipeline_filter])],
        properties=["dealname", "amount", "dealstage", "closedate", "createdate"],
        sorts=[{"propertyName": "amount", "direction": "DESCENDING"}],
        limit=100,
        after=after,
    )
    response = client.crm.deals.search_api.do_search(public_object_search_request=request)
    all_deals.extend(response.results)
    if response.paging and response.paging.next:
        after = response.paging.next.after
    else:
        break
print(f"Found {len(all_deals)} deals")

# 3. Calculate snapshot
total_value = 0
by_stage = {}
for deal in all_deals:
    props = deal.properties
    amount = float(props.get("amount") or 0)
    total_value += amount
    stage_name = stage_map.get(props.get("dealstage", ""), props.get("dealstage", "unknown"))
    if stage_name not in by_stage:
        by_stage[stage_name] = {"count": 0, "value": 0}
    by_stage[stage_name]["count"] += 1
    by_stage[stage_name]["value"] += amount
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
row = [today, total_value, len(all_deals)]
for stage_name in stage_order:
    data = by_stage.get(stage_name, {"count": 0, "value": 0})
    row.append(data["count"])
    row.append(data["value"])
print(f"Snapshot: {today} | ${total_value:,.0f} | {len(all_deals)} deals")

# 4. Append to Google Sheets
creds = service_account.Credentials.from_service_account_file(
    GOOGLE_KEY_FILE, scopes=["https://www.googleapis.com/auth/spreadsheets"]
)
service = build("sheets", "v4", credentials=creds)
result = service.spreadsheets().values().append(
    spreadsheetId=SPREADSHEET_ID,
    range="Sheet1!A1",
    valueInputOption="USER_ENTERED",
    insertDataOption="INSERT_ROWS",
    body={"values": [row]},
).execute()
print(f"Appended to Google Sheets: {result.get('updates', {}).get('updatedRows', 0)} row(s)")
```

Step 7: Schedule with cron or GitHub Actions
Cron (server-based):
```bash
# crontab -e
0 8 * * 1 cd /path/to/pipeline-snapshot && python snapshot.py >> /var/log/pipeline-snapshot.log 2>&1
```

GitHub Actions (serverless):
```yaml
# .github/workflows/pipeline-snapshot.yml
name: Weekly Pipeline Snapshot
on:
  schedule:
    - cron: '0 13 * * 1' # 13:00 UTC = 8 AM EST (9 AM EDT)
  workflow_dispatch: {} # Allow manual runs
jobs:
  snapshot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install hubspot-api-client google-api-python-client google-auth
      - run: python snapshot.py
        env:
          HUBSPOT_ACCESS_TOKEN: ${{ secrets.HUBSPOT_ACCESS_TOKEN }}
          GOOGLE_SPREADSHEET_ID: ${{ secrets.GOOGLE_SPREADSHEET_ID }}
          GOOGLE_SERVICE_ACCOUNT_KEY: ${{ secrets.GOOGLE_SERVICE_ACCOUNT_KEY }}
```

For GitHub Actions, store the service account JSON key as a secret. You have two options: (1) save the entire JSON content as a secret and write it to a temp file at runtime, or (2) base64-encode the key, store that as a secret, and decode it in the workflow. Option 1 is simpler — just use `echo "$KEY_JSON" > /tmp/sa-key.json` and set GOOGLE_SERVICE_ACCOUNT_KEY=/tmp/sa-key.json.
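As an illustration of option 1, the snapshot step might look like this. SA_KEY_JSON and the secret name GOOGLE_SA_KEY_JSON are placeholder names of our choosing, not values the rest of this guide depends on:

```yaml
- name: Run snapshot with key written to a temp file
  run: |
    echo "$SA_KEY_JSON" > /tmp/sa-key.json
    python snapshot.py
  env:
    SA_KEY_JSON: ${{ secrets.GOOGLE_SA_KEY_JSON }}
    GOOGLE_SERVICE_ACCOUNT_KEY: /tmp/sa-key.json
    HUBSPOT_ACCESS_TOKEN: ${{ secrets.HUBSPOT_ACCESS_TOKEN }}
    GOOGLE_SPREADSHEET_ID: ${{ secrets.GOOGLE_SPREADSHEET_ID }}
```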
Troubleshooting
Common questions
What if the Google service account key rotates?
Service account keys don't expire automatically, but you should rotate them periodically. Update the JSON key file and the corresponding secret in GitHub Actions. The Google Cloud Console lets you create a new key while the old one is still active, so there's no downtime.
Can I add week-over-week deltas to the snapshot?
Yes. Read the last row from Google Sheets before appending the new one, then calculate percentage changes for each metric. The Sheets API get method with a range like Sheet1!A:A lets you find the last row.
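A sketch of the delta calculation; week_over_week is our own helper, and it assumes the row layout this guide produces (date first, numeric metrics after):

```python
def week_over_week(prev_row, new_row):
    """Percentage change for each numeric column; index 0 (the date) is skipped."""
    deltas = []
    for prev, new in zip(prev_row[1:], new_row[1:]):
        if prev:
            deltas.append(round((new - prev) / prev * 100, 1))
        else:
            deltas.append(None)  # no baseline to compare against
    return deltas

prev = ["2025-01-06", 400000, 40]
new = ["2025-01-13", 500000, 44]
print(week_over_week(prev, new))  # [25.0, 10.0]
```

Append the deltas as extra columns on the new row, or write them to a separate sheet.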
How do I handle pipelines with 200+ deals?
The script already handles pagination — it loops through 100-deal pages using the after cursor. Add a 200ms delay between pages to stay within HubSpot's 5 requests/second search API limit.
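The throttling can be isolated in a small loop like this. Both paginate_with_delay and the fetch_page(after) -> (results, next_after_or_None) interface are our own sketch, not part of the HubSpot SDK:

```python
import time

def paginate_with_delay(fetch_page, delay=0.2):
    """Pull pages until the cursor runs out, sleeping between calls
    to stay under the search API's per-second limit."""
    results, after = [], 0
    while True:
        page, after = fetch_page(after)
        results.extend(page)
        if after is None:
            return results
        time.sleep(delay)
```

In the script above, fetch_page would wrap the do_search call and return (response.results, response.paging.next.after if the next page exists else None).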
What's the total runtime for a typical snapshot?
Under 10 seconds for a 100-deal pipeline. HubSpot API calls take 200-500ms each, and the Google Sheets append is under 300ms. Even with pagination, a 500-deal pipeline finishes in under 30 seconds.
Rate limits
| API | Limit | Impact |
|---|---|---|
| HubSpot general | 150 req / 10 sec per account | No concern for weekly snapshot |
| HubSpot Search | 5 req / sec | Matters with pagination (add 200ms delay between pages) |
| Google Sheets API | 300 req / min per project | No concern for a single append |
| Google Sheets API | 60 req / min per user per project | No concern |
Trade-offs
- Full control over data transformation, column ordering, and snapshot logic
- Version controlled — track changes in git, review in PRs
- No per-execution cost — runs on existing infrastructure (GitHub Actions free tier includes 2,000 min/month)
- Google Sheets API is free — standard quotas are generous for weekly snapshots
- Maintenance overhead — you handle errors, monitoring, and credential rotation
- No visual debugging — logs, not a visual flow builder
Looking to scale your AI operations?
We build and optimize automation systems for mid-market businesses. Let's discuss the right approach for your team.