Superstore Sales Analytics via Supabase CDC
This use case streams a retail sales table from Supabase Postgres into RawTree using Change Data Capture. The classic Kaggle Superstore Sales dataset (~9.8k order lines) is imported into Supabase, published via a logical-replication publication, and consumed by the supabase/etl Rust worker, which lands every row event — initial copy, insert, update, delete — in an append-only RawTree table. There are two ways to deploy the worker: the rawtree_supabase_cdc_ingestion Terraform resource automates everything (Secrets Manager, IAM, CloudWatch, ECS cluster and a long-running Fargate service), while the build-from-source path walks through the same setup by hand on an EC2 instance using the Docker image from the rawtreedb/examples repository — a useful contrast that shows exactly what the Terraform resource does for you. The dashboard reconstructs the live table state from the CDC event log (latest event per primary key, ordered by commit LSN) and mirrors the analysis panels of a popular Kaggle EDA notebook for this dataset: monthly sales trends, customer segments, regional performance, product category breakdowns, shipping analysis, and seasonality — plus live CDC pipeline health.
Architecture
Setup Guide
1Prepare the Source Table in Supabase
Create a Supabase project and import the Superstore Sales dataset (train.csv) via Table Editor → New table → Import data from CSV, naming the table superstore_sales_data in the public schema. Supabase creates the table and preserves the CSV column names verbatim (including spaces like "Row ID" and "Order ID"). Then enable logical replication for the table in the SQL Editor.
The dataset is the classic Superstore Sales train.csv (~9.8k rows) used by many Kaggle EDA notebooks.
Superstore Sales Dataset on Kaggle→$ -- REPLICA IDENTITY FULL makes UPDATEs / DELETEs include the full old row,
$ -- which the supabase/etl worker needs to emit complete CDC events.
$ ALTER TABLE public.superstore_sales_data REPLICA IDENTITY FULL;
$ DROP PUBLICATION IF EXISTS rawtree_superstore_publication;
$ CREATE PUBLICATION rawtree_superstore_publication
$ FOR TABLE public.superstore_sales_data;2Get the Connection URL & CA Certificate
Copy the Direct connection string from Project Settings → Database → Connection string — not the pooler URL, because logical replication requires a direct replication connection. Supabase signs its Postgres certificates with a private CA, so also download the certificate from Project Settings → Database → SSL Configuration.
The Supabase direct endpoint is typically IPv6-only — the Terraform example provisions a dual-stack VPC so the Fargate task can reach it.
# Alternative: dump the CA from the live TLS handshake
$ host=db.<your-project-ref>.supabase.co
$ openssl s_client -showcerts -starttls postgres -connect "$host:5432" </dev/null 2>/dev/null \
$ | awk '/-----BEGIN CERT/,/-----END CERT/' > ~/supabase-ca.crt3Configure Providers
Set your RawTree credentials as environment variables. The AWS provider uses your default credentials or AWS_PROFILE.
$ export RAWTREE_API_KEY="rt_..."
$ export RAWTREE_ORG="your-org"
$ export RAWTREE_PROJECT="your-project"
$ export TF_VAR_supabase_database_url='postgres://postgres:PASS@db.<ref>.supabase.co:5432/postgres?sslmode=require'4Write Terraform Configuration
A single rawtree_supabase_cdc_ingestion resource provisions everything: a Secrets Manager secret for the API key and database URL, an IAM execution role, a CloudWatch log group, an ECS cluster, and a long-running Fargate service running the supabase/etl worker image. The networking below assumes a dual-stack subnet — copy the self-contained VPC from the provider's example.
The full example includes the dual-stack VPC, subnet, and routing the worker needs for IPv6 egress to Supabase.
Complete example with VPC — terraform-provider-rawtree→$ terraform {
$ required_providers {
$ rawtree = { source = "rawtreedb/rawtree" }
$ aws = { source = "hashicorp/aws", version = "~> 5.0" }
$ }
$ }
$ provider "rawtree" {}
$ provider "aws" { region = "us-east-1" }
$ variable "supabase_database_url" {
$ type = string
$ sensitive = true
$ }
$ resource "rawtree_supabase_cdc_ingestion" "superstore" {
$ name = "superstore"
$ region = "us-east-1"
$ publication = "rawtree_superstore_publication"
# For production, prefer database_url_secret_arn /
# tls_root_cert_secret_arn pointing at secrets you manage —
# those values never enter Terraform state.
$ database_url = var.supabase_database_url
$ tls_root_cert_pem = file("~/supabase-ca.crt")
$ subnet_ids = [aws_subnet.this.id] # dual-stack subnet
$ assign_public_ip = true # IPv4 for the ghcr.io image pull
$ cpu = 512
$ memory = 1024
$ }
$ output "log_group_name" {
$ value = rawtree_supabase_cdc_ingestion.superstore.log_group_name
$ }5Apply & Verify
Apply the configuration, then tail the worker logs. On first start the worker performs an initial copy of all ~9.8k rows into the RawTree table public_superstore__sales__data (the destination name doubles source underscores), then streams live changes from the replication slot.
$ terraform init
$ terraform apply
# Tail the Fargate worker logs
$ aws logs tail $(terraform output -raw log_group_name) --follow
# Verify rows are landing in RawTree
$ rtree query "SELECT count() FROM public_superstore__sales__data"6Open the Dashboard
Create a read-only API key and connect the dashboard. Insert, update, or delete rows in Supabase and watch the CDC Operations panel pick up the changes within seconds.
$ rtree key create --name superstore-dashboard --permission read