Cut your LLM bill ~80% by distilling your own logs
model-shrink ingests your prompt/response logs and distills a small fine-tuned open model that handles the repetitive 80% of your traffic — at a fraction of the cost. Start with a free eval report that proves the savings before you train anything.
No signup for the free report · No data stored · Works with your existing logs
From log file to dollar savings
The free report is the whole pitch: proof, in your own numbers, that distillation pays off — before you commit to anything.
Upload your real logs
Drop in JSONL or a JSON array of your prompt/response pairs — OpenAI and Anthropic chat formats work out of the box. Nothing is stored.
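To give a sense of what "works out of the box" could involve, here is a minimal Python sketch of normalizing such logs into prompt/response pairs. It is not model-shrink's actual parser. The `request`/`response` wrapper keys and the function names are assumptions made for illustration; only the inner shapes (OpenAI-style `choices[0].message.content`, Anthropic-style top-level `content` blocks) follow the public chat formats.

```python
import json


def load_records(text):
    """Parse a JSON array or JSONL (one JSON object per line) into a list."""
    stripped = text.strip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def flatten(content):
    """Chat content is either a plain string or a list of typed blocks."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    return "".join(
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    )


def to_pair(record):
    """Return (prompt, response) from one log record.

    Assumes a hypothetical wrapper: {"request": {...}, "response": {...}}.
    """
    messages = record["request"]["messages"]
    prompt = "\n".join(
        flatten(m["content"]) for m in messages if m["role"] == "user"
    )
    response = record["response"]
    if "choices" in response:  # OpenAI-style chat completion
        answer = flatten(response["choices"][0]["message"]["content"])
    else:  # Anthropic-style message: top-level list of content blocks
        answer = flatten(response["content"])
    return prompt, answer
```

Calling `[to_pair(r) for r in load_records(raw_text)]` on an exported file would yield a uniform list of pairs regardless of which provider produced each row.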
See what's distillable
We cluster your traffic by task type and score how much of each cluster a small fine-tuned model (7-8B parameters) can absorb at acceptable quality.
Get the dollar proof
A clear cost-savings dashboard: your current black-box spend vs. the distilled setup, in ₹ and $, with a per-workload breakdown.
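The arithmetic behind a per-workload comparison like this can be sketched in a few lines of Python. This is an illustrative model, not the report's actual formula: the `Workload` fields, the per-million-token prices, and the exchange rate are all assumptions you would replace with your own numbers, and fixed hosting fees are left out.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    monthly_tokens: int    # total tokens processed per month
    routable_share: float  # fraction a small model could absorb (0.0 to 1.0)


def savings_report(workloads, frontier_price, small_price, inr_per_usd):
    """Compare current spend with a split frontier/small-model setup.

    Prices are USD per one million tokens. All inputs are hypothetical.
    """
    rows = []
    for w in workloads:
        millions = w.monthly_tokens / 1_000_000
        current = millions * frontier_price
        # Routable traffic is billed at the small-model price,
        # the rest stays on the frontier model.
        blended = (
            w.routable_share * small_price
            + (1 - w.routable_share) * frontier_price
        )
        distilled = millions * blended
        saved = current - distilled
        rows.append(
            {
                "workload": w.name,
                "current_usd": current,
                "distilled_usd": distilled,
                "saved_usd": saved,
                "saved_inr": saved * inr_per_usd,
                "saved_pct": 100 * saved / current if current else 0.0,
            }
        )
    return rows
```

The savings percentage depends only on the routable share and the price ratio between the two models, which is why a small sample of logs can be enough to project it.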
How it works
Export your logs
Pull a sample of prompt/response pairs from your gateway, LangSmith, or DB. A few hundred rows is enough to project savings.
Run the free eval
Upload them. In seconds you get a distillation report — routable %, savings %, and which workloads to distill first.
Distill & deploy
On a paid plan we fine-tune a small open model on your data and host it behind one endpoint. Route the easy 80%, keep the frontier model for the long tail.
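The routing idea can be illustrated with a short Python sketch: send requests whose task type has been distilled to the small model, and let everything else fall through to the frontier model. The function names, the task labels, and the keyword classifier below are hypothetical stand-ins, not the hosted endpoint's API.

```python
def make_router(distilled_tasks, classify, small_model, frontier_model):
    """Build a router that picks a model per request.

    classify, small_model and frontier_model are caller-supplied callables;
    distilled_tasks is the set of task labels the small model was trained on.
    """

    def route(prompt):
        task = classify(prompt)
        if task in distilled_tasks:
            return "small", small_model(prompt)
        # Unknown or hard tasks (the long tail) stay on the frontier model.
        return "frontier", frontier_model(prompt)

    return route


def keyword_classifier(prompt):
    """Toy stand-in for a real task classifier (assumption for this sketch)."""
    if prompt.lower().startswith("classify"):
        return "ticket_triage"
    return "other"


router = make_router(
    distilled_tasks={"ticket_triage"},
    classify=keyword_classifier,
    small_model=lambda p: "small-model answer",        # placeholder call
    frontier_model=lambda p: "frontier-model answer",  # placeholder call
)
```

In practice the classifier is the sensitive part: a misrouted hard request degrades quality, so a conservative default that falls back to the frontier model is a reasonable starting point.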
“We were spending more on GPT-4o for ticket triage than on our servers. Routing the easy classifications to a small model was the obvious win — we just needed the numbers to justify it.”
— The pitch we hear from every AI team. Run the report and see your own.
Run my free eval report →
Stop overpaying the black box.
Free report today. Hosted distilled endpoint from $39/mo.