ADR-001: Usage Metering and Billing

Status

Proposed

Context

Current State

MeclabsAI currently has a comprehensive usage tracking system built on Firestore:

Usage tracking model: server/models/usage.ts tracks AI token consumption across user/team/organization levels
Storage structure: Usage data stored hierarchically in Firestore collections (team/{teamId}/usage/{timestamp}, team/{teamId}/member/{userId}/usage/{timestamp}, organization/{orgId}/usage/{timestamp})
Billing integration: Existing Stripe integration (server/routes/stripe.ts) handles subscriptions and payments
Rate limiting: Message and API call limits implemented (server/core/chat/rate-limiting.ts)
Analytics: User behavior tracking and usage dashboards (app/dialogs/settings-usage.tsx)
Real-time tracking: Usage captured during OpenAI API calls and message processing
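The scalability limitation is easier to see with the storage layout spelled out. The sketch below is a hypothetical helper, not the actual code in server/models/usage.ts. It lists the Firestore document paths that a single usage record touches under the hierarchy described above. It assumes the `{timestamp}` document ID is epoch milliseconds. Each billable call fans out into three writes, and per-period totals must be derived from these individual documents at query time.

```typescript
// Hypothetical illustration of the current Firestore layout: one usage
// record is written under three hierarchical paths (team, team member, org).
interface UsageRecord {
  orgId: string;
  teamId: string;
  userId: string;
  timestamp: number; // ASSUMPTION: epoch millis, used as the document ID
  tokens: number;
}

function usageDocPaths(r: UsageRecord): string[] {
  const ts = String(r.timestamp);
  return [
    `team/${r.teamId}/usage/${ts}`,
    `team/${r.teamId}/member/${r.userId}/usage/${ts}`,
    `organization/${r.orgId}/usage/${ts}`,
  ];
}
```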

While functional, this system has limitations in scalability, real-time aggregation, and flexibility for complex pricing models.

Business Requirements

As MeclabsAI.com scales, we need a production-ready system to meter customer usage, particularly for AI-related services like token consumption. This system is critical for implementing usage-based pricing, providing transparency to our customers, and enabling business intelligence. The solution must track usage across various dimensions, including by user, team workspace, organization, and tenant, and integrate with our existing Stripe billing system to support our B2B customers.

Key considerations:

Scalability: The system must handle high-volume, real-time usage events generated by our AI services.
Flexibility: We need to support various pricing models, including pay-as-you-go, tiered pricing, and prepaid credits.
Developer Experience: The solution should be easy for our engineers to integrate with and maintain.
B2B Support: The system must accommodate the needs of our B2B customers, including invoicing and detailed usage breakdowns.
Integration: The solution must integrate with our existing stack, including our billing provider, Stripe.
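The pricing models listed above differ mainly in how a metered quantity becomes a charge. The following is a hypothetical sketch of graduated tiered pricing and prepaid-credit drawdown. The tier bounds, prices, and units (thousands of tokens, prices in integer cents) are invented for illustration and are not MeclabsAI's price list.

```typescript
// Hypothetical graduated (tiered) pricing: each tier's rate applies only to
// the units that fall inside that tier. Units are thousands of tokens and
// prices are integer cents; both are illustrative assumptions.
interface Tier {
  upTo: number | null; // inclusive upper bound in units; null = unbounded
  unitPrice: number; // cents per unit within this tier
}

function graduatedPrice(units: number, tiers: Tier[]): number {
  let total = 0;
  let prevBound = 0;
  for (const tier of tiers) {
    const bound = tier.upTo ?? Infinity;
    // Units that land between the previous bound and this tier's bound.
    const inTier = Math.max(0, Math.min(units, bound) - prevBound);
    total += inTier * tier.unitPrice;
    if (units <= bound) break;
    prevBound = bound;
  }
  return total;
}

// Prepaid credits are drawn down first; only the remainder is invoiced.
function applyCredits(chargeCents: number, creditCents: number) {
  const used = Math.min(chargeCents, creditCents);
  return { invoiced: chargeCents - used, remainingCredit: creditCents - used };
}

const exampleTiers: Tier[] = [
  { upTo: 1000, unitPrice: 4 },
  { upTo: 10000, unitPrice: 3 },
  { upTo: null, unitPrice: 2 },
];
```

Pay-as-you-go is the special case of a single unbounded tier, so one function can cover both models.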

Decision

We have decided to adopt OpenMeter Cloud as our usage metering solution. OpenMeter Cloud is the managed version of the open-source OpenMeter platform. It is designed for real-time usage metering at scale and is built on Kafka and ClickHouse for high-throughput event processing and real-time aggregation.

Implementation Details

We will use OpenMeter Cloud's managed service. The high-level implementation plan is as follows:

Set up OpenMeter Cloud account: Create account and configure initial project settings.
Integrate with our services: Instrument our applications to send usage events to the OpenMeter Cloud API. This will involve creating events for AI token usage, API calls, and other billable actions.
Define metering rules: Configure OpenMeter Cloud to aggregate and process usage data according to our pricing models.
Integrate with billing: Connect OpenMeter Cloud to our existing Stripe billing system to automatically generate invoices based on metered usage.
Build customer-facing dashboards: Develop UI components to display usage information to our customers using OpenMeter Cloud's APIs.
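The instrumentation step above could start from the token counts that the OpenAI API already returns. The sketch below assumes an OpenAI Chat Completions `usage` object (`prompt_tokens`, `completion_tokens`). It maps that object to one input event and one output event in the shape of the example usage event in this ADR. `toUsageEvents`, the `requestId` context field, and the subject format are hypothetical. Event IDs are derived from the request ID rather than generated randomly, so a retried send reproduces the same ID. OpenMeter deduplicates on the CloudEvents `id` and `source`, but the deduplication window should be confirmed in its documentation before relying on it.

```typescript
// Sketch: convert the token counts from one OpenAI chat completion into
// usage events (input and output). Context fields are hypothetical.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

interface UsageEvent {
  specversion: "1.0";
  type: string;
  source: string;
  subject: string;
  id: string;
  time: string;
  data: { provider: string; model: string; type: "input" | "output"; tokens: number };
}

function toUsageEvents(
  usage: OpenAIUsage,
  ctx: { requestId: string; orgId: string; userId: string; model: string },
  now: Date = new Date(),
): UsageEvent[] {
  const base = {
    specversion: "1.0" as const,
    type: "prompt",
    source: "meclabs-ai-api",
    subject: `org:${ctx.orgId}/user:${ctx.userId}`,
    time: now.toISOString(),
  };
  const parts: Array<["input" | "output", number]> = [
    ["input", usage.prompt_tokens],
    ["output", usage.completion_tokens],
  ];
  return parts
    .filter(([, tokens]) => tokens > 0) // skip empty events, nothing to bill
    .map(([kind, tokens]) => ({
      ...base,
      // Deterministic ID: retrying the same request yields the same ID.
      id: `${ctx.requestId}-${kind}`,
      data: { provider: "openai", model: ctx.model, type: kind, tokens },
    }));
}
```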

Cost Analysis

OpenMeter Cloud (our choice):

Free tier: 100K events/month, $10K billing volume (suitable for initial development)
Pro tier: $249/month + $10 per million events after 1M
Enterprise: Custom pricing with volume discounts

Self-hosted OpenMeter (alternative):

Infrastructure costs: ~$500-1000/month for Kafka + ClickHouse clusters
Engineering overhead: ~0.5 FTE for setup + 0.25 FTE ongoing maintenance
Total cost: ~$8K-12K/month including engineer time at lower volumes; infrastructure cost grows with event volume

Cost comparison at projected scale:

At 10M events/month: OpenMeter Cloud (~$339) vs Self-hosted (~$10K)
At 100M events/month: OpenMeter Cloud (~$1,239) vs Self-hosted (~$15K)
At 1B events/month: OpenMeter Cloud (~$10,239) vs Self-hosted (~$25K)

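Decision rationale: OpenMeter Cloud provides better value given faster implementation (2-4 weeks vs 8-12 weeks), zero operational overhead, and enterprise support. On infrastructure cost alone, self-hosting becomes cheaper at roughly 50M events/month, where Cloud fees reach the $500-1000/month infrastructure estimate. Once engineering time is included, OpenMeter Cloud remains cheaper at every projected volume above. The engineering effort saved outweighs the infrastructure savings we forgo.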
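The arithmetic behind the comparison can be reproduced from the Pro-tier list price quoted above: $249/month plus $10 per million events beyond the first 1M. The sketch ignores the free tier and any Enterprise volume discounts. `breakEvenMillions` is a hypothetical helper that finds the smallest whole-million monthly volume at which Cloud fees reach a given self-hosted monthly cost.

```typescript
// Pro-tier list price as quoted in this ADR (free tier and Enterprise
// discounts deliberately ignored).
const PRO_BASE_USD = 249;
const PRO_INCLUDED_MILLIONS = 1;
const PRO_USD_PER_MILLION = 10;

function cloudMonthlyCostUsd(eventsPerMonth: number): number {
  const millions = eventsPerMonth / 1_000_000;
  const overage = Math.max(0, millions - PRO_INCLUDED_MILLIONS);
  return PRO_BASE_USD + overage * PRO_USD_PER_MILLION;
}

// Smallest whole-million volume at which Cloud fees reach or exceed the
// given self-hosted monthly cost. Returns 0 if the base fee already does.
function breakEvenMillions(selfHostedUsd: number): number {
  if (selfHostedUsd <= PRO_BASE_USD) return 0;
  return Math.ceil((selfHostedUsd - PRO_BASE_USD) / PRO_USD_PER_MILLION) + PRO_INCLUDED_MILLIONS;
}
```

Against the $500-1000/month infrastructure-only estimate, the crossover falls between 27M and 77M events/month (`breakEvenMillions(500)` and `breakEvenMillions(1000)`), which brackets the ~50M figure.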

```typescript
// Example of a usage event (CloudEvents 1.0 envelope)
const usageEvent = {
  "specversion": "1.0",
  "type": "prompt",
  "source": "meclabs-ai-api",
  "subject": "org:1234/user:5678",
  "id": "evt-uuid-1234",
  "time": "2025-07-15T14:00:00Z",
  "data": {
    "provider": "openai",
    "model": "gpt-4o",
    "type": "input",
    "tokens": 350
  }
};
```
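Every event carries its token count in `data.tokens`. An OpenMeter meter for this event therefore roughly corresponds to event type `prompt`, aggregation SUM, value property `$.tokens`, and grouping by `$.model` and `$.type`. The sketch below is an in-memory illustration of what such a meter computes per subject. It is not OpenMeter's API or implementation, and the `PromptEvent` type and `sumTokensBy` helper are hypothetical.

```typescript
// In-memory illustration of a SUM meter over "prompt" events: total tokens
// per subject, grouped by model and input/output type.
interface PromptEvent {
  type: string;
  subject: string;
  data: { model: string; type: string; tokens: number };
}

function sumTokensBy(events: PromptEvent[], eventType = "prompt"): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    if (e.type !== eventType) continue; // a meter only reads its own event type
    const key = `${e.subject}|${e.data.model}|${e.data.type}`;
    totals.set(key, (totals.get(key) ?? 0) + e.data.tokens);
  }
  return totals;
}
```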

Consequences

Positive

Fast Time-to-Market: 2-4 weeks implementation vs 8-12 weeks for self-hosted, allowing faster product iteration.
Zero Operational Overhead: No infrastructure management, monitoring, or maintenance required from our team.
Enterprise Support: Dedicated support, SLA guarantees, and proven scalability for production workloads.
High Performance: Built on Kafka and ClickHouse for high-throughput event ingestion and real-time queries.
Pricing Model Flexibility: Support for complex B2B pricing models including tiered, usage-based, and prepaid credits.

Negative

Higher Cost at Scale: Beyond roughly 50M events/month, OpenMeter Cloud fees exceed the estimated infrastructure cost of self-hosting (excluding engineering time).
Vendor Lock-in: Dependency on OpenMeter's managed service and pricing model.
Less Control: Limited ability to customize infrastructure or optimize for specific use cases.

Neutral

Data Residency: Usage data is processed and stored in OpenMeter's cloud infrastructure rather than in infrastructure we control.
Customization: API-level customization available, but less infrastructure-level control than self-hosted.

Alternatives Considered

Orb - Full-stack usage-based billing platform
- Pros: Complete billing workflows, pricing experimentation, robust reporting
- Cons: No public pricing, estimated $500+/month minimum, less control over infrastructure
- Rejected: More expensive at scale, vendor lock-in, less flexibility for custom B2B requirements
Metronome - Developer-focused metering platform
- Pros: High scalability (used by OpenAI), engineer-friendly, SQL-based configuration
- Cons: ~$X per 1K events + platform fees + % of billing volume, metering-only (no billing)
- Rejected: Incomplete solution requiring additional billing tools, higher costs at volume
Lago - Open-source billing and metering
- Pros: Open-source, 15K events/second ingestion, hybrid pricing models
- Cons: Premium features behind paywall, smaller community, less enterprise-ready
- Rejected: Less mature than OpenMeter, unclear split between open-source and premium-licensed features, limited public evidence of scalability
Existing Firestore System - Continue with current implementation
- Pros: Already built, integrated with Stripe, team familiarity
- Cons: Limited real-time aggregation, Firestore query limitations, manual billing workflows
- Rejected: Cannot support complex pricing models or real-time usage dashboards at scale
Custom Solution - Build from scratch with Kafka + time-series DB
- Pros: Complete control, optimized for our use case
- Cons: 6+ months development time, ongoing maintenance burden, diverts from core product
- Rejected: Significant engineering undertaking when proven solutions exist
Self-hosted OpenMeter - Deploy OpenMeter infrastructure ourselves
- Pros: Lower infrastructure cost at scale (50M+ events/month), complete infrastructure control, no vendor lock-in
- Cons: 8-12 weeks implementation time, ~0.25 FTE ongoing maintenance, Kafka/ClickHouse complexity
- Rejected: Engineering effort outweighs cost savings, diverts resources from core product development
Stripe Billing - Use Stripe's metered billing directly
- Pros: Integrated with existing payment flow, reliable, good documentation
- Cons: Limited metering flexibility, basic aggregation, not designed for complex B2B scenarios
- Rejected: Insufficient for multi-dimensional usage tracking and complex pricing models

Implementation Guidelines

Start with a pilot project to integrate OpenMeter Cloud with a single service to validate the integration approach.
Leverage OpenMeter Cloud's built-in monitoring and alerting capabilities, supplemented with our existing observability tools.
Create clear documentation for our engineering teams on how to instrument their services and send usage events to OpenMeter Cloud.
Begin with the free tier (100K events/month) for development and testing before moving to the Pro tier.
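For the pilot and the instrumentation documentation, a minimal batched sender might look like the sketch below. The endpoint URL, the bearer-token header, and the `application/cloudevents-batch+json` content type reflect our reading of OpenMeter's ingest API and must be verified against its API reference. `MAX_BATCH` and all helper names are hypothetical.

```typescript
// ASSUMPTIONS to verify against OpenMeter's API reference: Cloud base URL,
// POST /api/v1/events path, bearer-token auth, CloudEvents batch content type.
const OPENMETER_URL = "https://openmeter.cloud/api/v1/events";
const MAX_BATCH = 100; // hypothetical batch size

interface IngestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

type FetchLike = (url: string, init: IngestInit) => Promise<{ ok: boolean; status: number }>;

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Build one POST per batch; kept separate from sending so it can be tested.
function buildIngestRequests(events: object[], apiToken: string): Array<{ url: string; init: IngestInit }> {
  return chunk(events, MAX_BATCH).map((batch) => ({
    url: OPENMETER_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/cloudevents-batch+json",
        Authorization: `Bearer ${apiToken}`,
      },
      body: JSON.stringify(batch),
    },
  }));
}

// Send batches sequentially and stop at the first failed request.
async function sendUsageEvents(events: object[], apiToken: string, fetchFn: FetchLike): Promise<void> {
  for (const { url, init } of buildIngestRequests(events, apiToken)) {
    const res = await fetchFn(url, init);
    if (!res.ok) throw new Error(`OpenMeter ingest failed: HTTP ${res.status}`);
  }
}
```

In production, pass the global `fetch` (Node 18+) as `fetchFn`. Injecting it keeps the sender unit-testable without network access. Resending a failed batch is only safe if event IDs stay stable across retries.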

References

Notes

We will need to allocate engineering resources for the initial integration and instrumentation of our services with OpenMeter Cloud. The managed service eliminates infrastructure maintenance overhead, allowing our team to focus on core product development.
