Tags: Database CDC, MongoDB, CDC, GitHub, Time Series, Change Streams

GitHub Events Analytics via MongoDB CDC

This use case demonstrates a real-time analytics pipeline for GitHub events. Data from the GitHub Archive is loaded into MongoDB Atlas and then streamed into RawTree by the rawtree-mongo-connector using Change Data Capture (CDC). On top of that data, a set of analytical queries runs over millions of GitHub events (pushes, pull requests, issues, forks, and more) and powers dashboards that show developer activity, repository trends, and event throughput in real time.

Architecture

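GH Archive (loaded with mongoimport) + Event generator (inserts with mongosh) → MongoDB Atlas
MongoDB Atlas → (Change Streams) → RawTree Mongo CDC Connector
RawTree Mongo CDC Connector → (HTTP API) → RawTree
RawTree → (SQL) → Dashboard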
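To make the hop from Change Streams through the connector to the HTTP API more concrete, here is a minimal Go sketch of how a CDC connector might turn MongoDB change events into rows for an analytics store. It models only three real change-event fields (operationType, documentKey, fullDocument); the _op and _deleted columns and the newline-delimited JSON batch format are assumptions for illustration, not the rawtree-mongo-connector's actual schema or wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ChangeEvent is a simplified view of a MongoDB change stream event.
// Only operationType, documentKey, and fullDocument are modeled here.
type ChangeEvent struct {
	OperationType string         `json:"operationType"`
	DocumentKey   map[string]any `json:"documentKey"`
	FullDocument  map[string]any `json:"fullDocument,omitempty"`
}

// toRow flattens a change event into a row for an append-only table.
// The _op and _deleted columns are hypothetical, not a documented schema.
// Note: update events carry fullDocument only if the stream was opened
// with a fullDocument option such as "updateLookup".
func toRow(ev ChangeEvent) (map[string]any, error) {
	row := map[string]any{"_op": ev.OperationType}
	switch ev.OperationType {
	case "insert", "update", "replace":
		if ev.FullDocument == nil {
			return nil, fmt.Errorf("%s event without fullDocument", ev.OperationType)
		}
		for k, v := range ev.FullDocument {
			row[k] = v
		}
		row["_deleted"] = false
	case "delete":
		// Deletes only carry the document key, so emit a tombstone row.
		for k, v := range ev.DocumentKey {
			row[k] = v
		}
		row["_deleted"] = true
	default:
		return nil, fmt.Errorf("unsupported operationType %q", ev.OperationType)
	}
	return row, nil
}

// encodeBatch serializes rows as newline-delimited JSON, one object per line.
func encodeBatch(rows []map[string]any) (string, error) {
	var sb strings.Builder
	enc := json.NewEncoder(&sb)
	for _, r := range rows {
		if err := enc.Encode(r); err != nil { // Encode appends a newline
			return "", err
		}
	}
	return sb.String(), nil
}

func main() {
	raw := `{"operationType":"insert","documentKey":{"_id":"1"},"fullDocument":{"_id":"1","type":"PushEvent"}}`
	var ev ChangeEvent
	if err := json.Unmarshal([]byte(raw), &ev); err != nil {
		panic(err)
	}
	row, err := toRow(ev)
	if err != nil {
		panic(err)
	}
	body, err := encodeBatch([]map[string]any{row})
	if err != nil {
		panic(err)
	}
	fmt.Print(body)
}
```

Writing deletes as tombstone rows keeps the destination append-only, which is a common choice for analytics tables; a real connector may handle deletes differently.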

Setup Guide

1. Clone & Prerequisites

Clone the rawtree-mongo-connector repository.

You will need Go 1.22+, mongoimport, mongosh, and direnv installed. See the README for the full list of prerequisites.

$ git clone https://github.com/rawtreedb/rawtree-mongo-connector.git
$ cd rawtree-mongo-connector/tools/gharchive_ingestion

2. Configure Environment

Copy the environment template and fill in your MongoDB Atlas URI and RawTree credentials. The .envrc file uses direnv to auto-load variables when you enter the directory.

$ cp .env.example.local .env.local
  # Edit .env.local — set your MongoDB Atlas URI and RawTree API key
$ direnv allow
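As a sanity check that direnv actually exported your settings, a program can fail fast when required variables are missing or blank. The Go sketch below does that; MONGODB_URI and RAWTREE_API_KEY are hypothetical names, so substitute whatever .env.example.local actually defines.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredVars uses hypothetical names; replace them with the variables
// defined in .env.example.local.
var requiredVars = []string{"MONGODB_URI", "RAWTREE_API_KEY"}

// missingVars returns the names in want that are unset or blank according
// to lookup (os.LookupEnv in practice, a fake in tests).
func missingVars(want []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range want {
		if v, ok := lookup(name); !ok || strings.TrimSpace(v) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if m := missingVars(requiredVars, os.LookupEnv); len(m) > 0 {
		fmt.Fprintf(os.Stderr, "missing environment variables: %s (did direnv load .env.local?)\n",
			strings.Join(m, ", "))
		os.Exit(1)
	}
	fmt.Println("environment looks complete")
}
```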

3. Load Data into MongoDB Atlas

Run the load script to download GitHub Archive events and import them into your MongoDB Atlas cluster. The script downloads hourly JSON files and streams them through mongoimport.

$ ./load-gharchive.sh 50000  # Load 50k events (~150 MB)

4. Build & Start the Connector

Build the Go binary from the repo root, then run it from the ingestion directory so direnv loads your environment variables. The connector snapshots existing data first, then switches to CDC mode.

  # From the repo root
$ make build

  # From tools/gharchive_ingestion (direnv loads .env.local)
$ cd tools/gharchive_ingestion
$ ../../bin/rawtree-mongo-connector --config config.yaml
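A snapshot-then-CDC handoff commonly follows one pattern: record the change stream position before the snapshot starts, copy the existing documents, then resume the stream from the recorded position and apply everything idempotently. With MongoDB that position could be a resume token or a cluster time passed as startAtOperationTime. The Go sketch below shows that ordering against a hypothetical in-memory Source; it is an illustration of the pattern, not the connector's actual code.

```go
package main

import "fmt"

// Doc is a minimal document: an ID plus an opaque payload.
type Doc struct {
	ID   string
	Data string
}

// Source is a hypothetical abstraction over a MongoDB collection.
type Source interface {
	CurrentPosition() int       // stands in for a resume token or cluster time
	Scan() []Doc                // snapshot of current documents
	ChangesSince(pos int) []Doc // ordered changes after pos
}

// Sink applies documents as upserts keyed by ID, so replaying a change that
// the snapshot already saw is harmless.
type Sink map[string]string

func (s Sink) Upsert(d Doc) { s[d.ID] = d.Data }

// Sync records the position first, then snapshots, then replays changes from
// that position, so writes made during the snapshot cannot be missed.
// It returns the number of replayed changes.
func Sync(src Source, dst Sink) int {
	pos := src.CurrentPosition()
	for _, d := range src.Scan() {
		dst.Upsert(d)
	}
	changes := src.ChangesSince(pos)
	for _, d := range changes {
		dst.Upsert(d)
	}
	return len(changes)
}

// memSource is an in-memory Source for illustration only; onScan simulates
// writes that land while the snapshot is running.
type memSource struct {
	docs   map[string]string
	log    []Doc
	onScan func(*memSource)
}

func (m *memSource) write(d Doc) {
	m.docs[d.ID] = d.Data
	m.log = append(m.log, d)
}

func (m *memSource) CurrentPosition() int { return len(m.log) }

func (m *memSource) Scan() []Doc {
	var out []Doc
	for id, data := range m.docs {
		out = append(out, Doc{ID: id, Data: data})
	}
	if m.onScan != nil {
		m.onScan(m) // these writes are not in the snapshot collected above
	}
	return out
}

func (m *memSource) ChangesSince(pos int) []Doc {
	return append([]Doc(nil), m.log[pos:]...)
}

func main() {
	src := &memSource{docs: map[string]string{"a": "v1"}}
	src.onScan = func(m *memSource) { m.write(Doc{ID: "b", Data: "v1"}) }
	dst := Sink{}
	replayed := Sync(src, dst)
	fmt.Println(len(dst), replayed) // 2 1
}
```

Because changes made during the snapshot are replayed as upserts keyed by document ID, a document that appears in both the snapshot and the stream ends up at its latest version.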

5. Create a RawTree API Key

Create an API key to use with the live dashboard. You can use the CLI or go to console.rawtree.com → Settings → API Keys.

  # Install the CLI
$ curl -sSf https://rawtree.com | sh

  # Authenticate and create a key
$ rtree login
$ rtree key create --name mongo-cdc --permission read_write

6. Simulate Live Traffic

Run the continuous insert script to generate synthetic GitHub events. Timestamps continue from the latest event already in the collection, so the time series has no gaps.

$ ./continuous-insert.sh        # 50 events every 10s
$ ./continuous-insert.sh 100 5  # 100 events every 5s