The engineering team tweaks your amazing support chatbot - just a small change to include full customer history in every prompt. Overnight, support costs spike 10x.
If you've ever tracked cloud costs, you already know the drill: something is always quietly draining your budget while you're busy doing literally anything else. Now we've added LLMs into the mix - and surprise - they bring brand-new (and sometimes beautifully hidden) ways to overspend.
The good news? Your FinOps instincts still apply. The bad news? OpenAI's billing model doesn't exactly speak fluent "AWS Cost Explorer" yet. (AWS is just our example here - swap in your CSP of choice.)
Why Does This Matter Now?
LLMs are no longer experiments tucked away in R&D - they're powering customer support, search, product features, and even internal tooling. But as adoption explodes, the old ways of tracking cloud spend start to break down. Unlike EC2 or storage, LLMs don't come with resource IDs, flexible tagging, or predictable scaling curves. That's why applying FinOps principles to LLM usage isn't a "nice-to-have" anymore - it's becoming a survival skill for any team watching their AI bill grow faster than expected.
The Core Problem: No Real Tagging
Here's the kicker - OpenAI doesn't give you the rich, flexible tagging and metadata capabilities you've come to depend on in AWS, GCP, or Azure. Yes, they have projects, but these aren't the same as true key/value tags you can use to slice, dice, and allocate spend however you want.
So your naming convention is your cost allocation strategy. If your projects are called test1 or foo, your attribution data will be completely useless.
In cloud FinOps, tagging lets you pivot costs instantly by environment, feature, team, or customer. In OpenAI, you're working with fewer dials to turn - which means structure and discipline in naming are not optional.
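One way to make a naming convention enforceable is to parse project names back into the tags they are meant to encode. The sketch below assumes a hypothetical `<team>-<feature>-<env>` convention; the pattern, the allowed environments, and the `parse_project_name` helper are all illustrative, not something OpenAI prescribes.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical convention: <team>-<feature>-<env>, lowercase,
# for example "support-autoreply-prod". Adjust to your own scheme.
PROJECT_NAME = re.compile(
    r"(?P<team>[a-z0-9]+)-(?P<feature>[a-z0-9]+)-(?P<env>dev|staging|prod)"
)


class ProjectTags(NamedTuple):
    team: str
    feature: str
    env: str


def parse_project_name(name: str) -> Optional[ProjectTags]:
    """Return the tags encoded in a project name, or None if it breaks the convention."""
    match = PROJECT_NAME.fullmatch(name)
    if match is None:
        return None
    return ProjectTags(match["team"], match["feature"], match["env"])


if __name__ == "__main__":
    for name in ["support-autoreply-prod", "test1", "foo"]:
        print(name, "->", parse_project_name(name))
```

Names like test1 or foo come back as `None`, so a check like this could run wherever projects are created or audited.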
The Anatomy of OpenAI Cost & Usage
At its core, OpenAI pricing is just another flavor of usage-based billing - but with its own quirks:
- Model: Your "instance type," but with more flavors and barely any discounts available. You've got text models (GPT-3.5, GPT-4), image generation (DALL·E), embedding models for vector searches, and fine-tunes for custom cases. Each comes with its own performance profile and price point - just like cloud instance families.
- Operation Type: Your CSP "service type" equivalent. Chat, completion, embedding, fine-tuning, image generation - all billed differently, even within the same model.
- Tokens: Your "compute minutes" or "CPU cycles," but in text form. Every word you send to the model is broken into smaller chunks called tokens (roughly 4 characters, or ¾ of a word). You pay for tokens you send in (prompts) and tokens you get back (completions). The phrase "How can I reduce my cloud costs?" is 7 words but about 9 tokens. If you send a 500-word customer history in the prompt, that's ~650 tokens before the model even starts replying - and if the response is another 500 words, you're paying for both sides of the exchange. The more verbose your prompt or the longer the model's output, the bigger the bill.
- Time: When the request happened. This is critical for correlating spend spikes to a code change, feature launch, or that one developer "just testing something" in prod.
If AWS bills you for "4 hours of m5.large," OpenAI bills you for "X tokens of GPT-4." Same concept - just swap the jargon.
Example: Your product team ships an "instant support reply" feature using GPT-4. The prompt includes the customer's entire history for context, and the completion is several paragraphs long. Everyone's thrilled with the quality… until Finance notices this feature costs 10x more than it would with a smaller model or a tighter prompt. If you'd tracked model + operation type + token count from the start, you could have optimized early.
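To make the arithmetic concrete, here is a rough sketch of a per-request cost estimate. The model names and per-token prices are placeholders invented for illustration, not OpenAI's price list, and the words-to-tokens ratio is only the rule of thumb quoted above (500 words is about 650 tokens).

```python
# Placeholder prices in USD per 1,000 tokens. These are NOT real OpenAI
# prices; substitute the current published rates for the models you use.
HYPOTHETICAL_PRICES = {
    "large-model": {"input": 0.03, "output": 0.06},
    "small-model": {"input": 0.001, "output": 0.002},
}

TOKENS_PER_WORD = 1.3  # rule of thumb: one token is roughly 3/4 of a word


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text; a real tokenizer will differ."""
    return round(word_count * TOKENS_PER_WORD)


def request_cost(model: str, prompt_tokens: int, completion_tokens: int,
                 prices=HYPOTHETICAL_PRICES) -> float:
    """Cost of one request: input and output tokens are priced separately."""
    rate = prices[model]
    return (prompt_tokens * rate["input"] + completion_tokens * rate["output"]) / 1000


if __name__ == "__main__":
    history = estimate_tokens(500)  # full customer history in the prompt
    reply = estimate_tokens(500)    # several paragraphs of completion
    for model in HYPOTHETICAL_PRICES:
        print(model, f"{request_cost(model, history, reply):.5f} USD per request")
```

Multiplying the per-request figure by daily request volume is usually what turns a small prompt change into a visible line on the bill.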
The Proxy Pattern: Don't Leave Metadata to Chance
The reality: OpenAI only gives you two attribution fields - user and project. That's it. No arbitrary tags, no environment=prod, no feature=checkout.
That's why the LLM Proxy pattern is emerging as a best practice. Think of it as a traffic cop that sits between your app and OpenAI's API.
- Without a proxy: every API call goes straight to OpenAI. Developers have to remember to attach the right project name or user ID each time - and mistakes slip through.
- With a proxy: every call first passes through the proxy. The proxy automatically stamps the request with the right metadata (project, team, feature, environment) before forwarding it to OpenAI.
It's a thin middleware layer that intercepts every API call (to OpenAI or any other LLM vendor).
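A minimal sketch of the idea follows, under a few loud assumptions: the vendor call is an injected `send` function rather than a real SDK, the request and response are plain dicts, and the response carries a `usage` dict with `prompt_tokens` and `completion_tokens`. The `LLMProxy` class and `fake_send` stub are hypothetical. Because the vendor only accepts user and project, this version keeps the richer tags (team, feature, environment) in the proxy's own usage log.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class LLMProxy:
    """Stamps each request with attribution metadata and records token usage."""
    send: Callable[[Dict[str, Any]], Dict[str, Any]]  # hypothetical vendor transport
    project: str
    team: str
    feature: str
    environment: str
    usage_log: List[Dict[str, Any]] = field(default_factory=list)

    def complete(self, request: Dict[str, Any], user_id: str) -> Dict[str, Any]:
        # Attach the user field centrally so callers cannot forget it.
        stamped = {**request, "user": user_id}
        response = self.send(stamped)
        usage = response.get("usage", {})
        # Keep the richer tags on our side, next to the token counts.
        self.usage_log.append({
            "timestamp": time.time(),
            "project": self.project,
            "team": self.team,
            "feature": self.feature,
            "environment": self.environment,
            "user": user_id,
            "model": request.get("model"),
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "completion_tokens": usage.get("completion_tokens", 0),
        })
        return response


def fake_send(request: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real vendor call so the sketch runs offline."""
    return {"text": "ok", "usage": {"prompt_tokens": 12, "completion_tokens": 5}}


if __name__ == "__main__":
    proxy = LLMProxy(fake_send, "support-autoreply-prod", "support", "autoreply", "prod")
    proxy.complete({"model": "large-model", "messages": []}, user_id="u-1")
    print(proxy.usage_log[0])
```

In a real deployment the log would go to a durable store, and `send` would wrap whichever client library you use; both are left out here to keep the sketch self-contained.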
Usage Data: The Other Half of the Puzzle
OpenAI's Usage API is pure gold for FinOps - but by itself, it's not the whole picture. It tells you what was used, but not the associated dollar value.
The real insight comes when you join usage data with cost data, and the bridge between those datasets is the project ID. That's why consistent, meaningful project names are so critical.
When you align usage and cost data, you can:
- Attribute costs by team, feature, or environment
- Spot anomalies before they blow up into budget issues
- Forecast LLM spend with the same rigor youâd apply to cloud workloads
This is exactly the level of visibility you expect from AWS or GCP - it just takes a little extra work to get there with OpenAI.
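The join itself can be quite small. The sketch below assumes both datasets have already been exported as lists of dicts; the field names (`project_id`, `input_tokens`, `output_tokens`, `amount_usd`) are hypothetical stand-ins, so map them to whatever your actual exports contain.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def join_usage_and_cost(usage_rows: Iterable[Dict[str, Any]],
                        cost_rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Aggregate tokens and dollars per project_id, the shared key between datasets."""
    summary = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
    for row in usage_rows:
        summary[row["project_id"]]["tokens"] += row["input_tokens"] + row["output_tokens"]
    for row in cost_rows:
        summary[row["project_id"]]["cost_usd"] += row["amount_usd"]
    return dict(summary)


def cost_per_1k_tokens(entry: Dict[str, float]) -> float:
    """Blended unit cost for one project; 0.0 when no tokens were recorded."""
    if entry["tokens"] == 0:
        return 0.0
    return entry["cost_usd"] / entry["tokens"] * 1000


if __name__ == "__main__":
    # Made-up sample rows, purely to exercise the functions.
    usage = [
        {"project_id": "proj_a", "input_tokens": 1000, "output_tokens": 500},
        {"project_id": "proj_b", "input_tokens": 200, "output_tokens": 300},
    ]
    cost = [
        {"project_id": "proj_a", "amount_usd": 1.5},
        {"project_id": "proj_b", "amount_usd": 0.25},
    ]
    for project, entry in join_usage_and_cost(usage, cost).items():
        print(project, entry, cost_per_1k_tokens(entry))
```

A project that appears in only one dataset still shows up in the result with a zero on the other side, which is a cheap way to notice attribution gaps.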
The Bottom Line
Tracking OpenAI costs isn't magic - it's just… different. Forget EC2 instance IDs; think models, tokens, and context. Get those right, and you'll have true FinOps visibility for your LLM workloads.
Ignore them, and you'll be back here in six months, staring at a bill you wish you'd understood sooner.



