GitHub Events Analytics via MongoDB CDC
This use case demonstrates a real-time analytics pipeline for GitHub events. Data from the GitHub Archive is loaded into MongoDB Atlas, then streamed into RawTree via the rawtree-mongo-connector using Change Data Capture (CDC). The result is a set of analytical queries over millions of GitHub events — pushes, pull requests, issues, forks, and more — enabling dashboards that show developer activity, repository trends, and event throughput in real time.
Architecture
Setup Guide
1. Clone & Prerequisites
Clone the rawtree-mongo-connector repository.
You will need Go 1.22+, mongoimport, mongosh, and direnv installed.
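A quick way to confirm the tools listed above are on your PATH is a small helper like the following. This is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical function name, and it only checks that each command exists, not that Go is at version 1.22 or later.

```shell
# Hypothetical helper: print each required tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Usage: missing_tools go mongoimport mongosh direnv
```

If the function prints nothing, all four commands were found. Running `go version` afterwards shows whether the installed Go release meets the 1.22+ requirement.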
See the full prerequisites in the README.
$ git clone https://github.com/rawtreedb/rawtree-mongo-connector.git
$ cd rawtree-mongo-connector/tools/gharchive_ingestion
2. Configure Environment
Copy the environment template and fill in your MongoDB Atlas URI and RawTree credentials. The .envrc file uses direnv to auto-load variables when you enter the directory.
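Once direnv has loaded the file (the `direnv allow` step in this section), a check along these lines can confirm the variables actually reached your shell. The variable names `MONGODB_URI` and `RAWTREE_API_KEY` are assumptions for illustration; use the names that appear in `.env.example.local`. `require_vars` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper: fail on the first variable that is unset or empty.
require_vars() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      return 1
    fi
  done
}

# Usage (variable names are assumptions):
#   require_vars MONGODB_URI RAWTREE_API_KEY && echo "environment looks complete"
```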
$ cp .env.example.local .env.local
# Edit .env.local — set your MongoDB Atlas URI and RawTree API key
$ direnv allow
3. Load Data into MongoDB Atlas
Run the load script to download GitHub Archive events and import them into your MongoDB Atlas cluster. The script downloads hourly JSON files and streams them through mongoimport.
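The download-and-stream pattern described above can be sketched as follows. This is not the contents of `load-gharchive.sh`; it is a minimal illustration that assumes a `MONGODB_URI` variable whose URI names the target database, and uses `events` as a placeholder collection name. GH Archive publishes one gzipped file per hour, with one JSON event per line, which `mongoimport` can read directly from standard input.

```shell
# Build the download URL for one hourly archive.
# $1 is the date (YYYY-MM-DD); $2 is the hour, 0-23, without zero padding.
gharchive_url() {
  echo "https://data.gharchive.org/$1-$2.json.gz"
}

# Stream the first $3 events of one hour into MongoDB without writing to disk.
# MONGODB_URI and the collection name "events" are assumptions.
load_hour() {
  curl -sfL "$(gharchive_url "$1" "$2")" \
    | gunzip -c \
    | head -n "$3" \
    | mongoimport --uri "$MONGODB_URI" --collection events
}

# Usage: load_hour 2024-01-01 15 50000
```

Because `head` closes the pipe once it has enough lines, the download stops early instead of fetching the whole hour.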
$ ./load-gharchive.sh 50000 # Load 50k events (~150 MB)
4. Build & Start the Connector
Build the Go binary from the repo root, then run it from the ingestion directory so direnv loads your environment variables. The connector snapshots existing data first, then switches to CDC mode.
# From the repo root (cd ../.. if you are still in tools/gharchive_ingestion)
$ make build
# From tools/gharchive_ingestion (direnv loads .env.local)
$ cd tools/gharchive_ingestion
$ ../../bin/rawtree-mongo-connector --config config.yaml
5. Create a RawTree API Key
Create an API key to use with the live dashboard. You can use the CLI or go to console.rawtree.com → Settings → API Keys.
# Install the CLI
$ curl -sSf https://rawtree.com | sh
# Authenticate and create a key
$ rtree login
$ rtree key create --name mongo-cdc --permission read_write
6. Simulate Live Traffic
Run the continuous insert script to generate synthetic GitHub events. Timestamps continue from the latest event in the collection — no gaps.
$ ./continuous-insert.sh # 50 events every 10s
$ ./continuous-insert.sh 100 5 # 100 events every 5s
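To make the two arguments and the "no gaps" behavior concrete, here is one hypothetical way such a script could resolve its defaults and assign timestamps. It is a sketch under stated assumptions, not the actual `continuous-insert.sh`: the function names are invented, timestamps are treated as epoch seconds, and each batch is assumed to spread evenly across one interval after the latest stored event.

```shell
# Fall back to the defaults shown above: 50 events per batch, one batch every 10 s.
resolve_args() {
  echo "${1:-50} ${2:-10}"
}

# Print $2 epoch-second timestamps that start no earlier than $1 (the latest
# stored timestamp) and end exactly at $1 + $3 (the interval in seconds).
# At whole-second resolution, consecutive values may repeat.
next_timestamps() {
  latest=$1 count=$2 interval=$3
  i=1
  while [ "$i" -le "$count" ]; do
    echo $(( latest + i * interval / count ))
    i=$(( i + 1 ))
  done
}
```

Since the last timestamp of a batch equals the latest timestamp plus the interval, the next batch picks up from that value and the series stays contiguous.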