CacheFlow AI - AI API Cost Optimizer
Cut AI API costs by 60-85%. Smart caching, free API routing, local models. Works with OpenAI, Claude, Gemini, Cursor. One-time buy.
Overview
CacheFlow AI is a local proxy that sits between your app and AI APIs (OpenAI, Claude, Gemini), automatically reducing your costs by 60-85% without changing your code or sacrificing quality.
How it works:
1. Smart Caching — Same question = instant answer at $0. SQLite-based, zero dependencies.
2. Free API Routing — Simple tasks auto-route to Groq, Cerebras, OpenRouter (70B+ models at $0).
3. Local Model Support — Ollama integration with auto hardware detection (NVIDIA, AMD, Apple Silicon).
4. Prompt Compression — 10-30% token reduction on every request.
5. Real-Time Dashboard — Beautiful dark-themed UI with live savings counter and request logs.
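The compression step (item 4) is easy to picture. A hedged sketch in plain Node.js: CacheFlow's actual compressor is not shown here, and this illustrative version only normalizes whitespace, but it demonstrates how tokens can be shaved off a prompt before it ever reaches the provider.

```javascript
// Illustrative prompt compression: strip redundant whitespace so fewer
// tokens are billed. This is a sketch of the idea, not CacheFlow's code.
function compressPrompt(text) {
  return text
    .split("\n")
    .map((line) => line.trim())          // trim each line
    .filter((line) => line.length > 0)   // drop blank lines
    .join("\n")
    .replace(/[ \t]+/g, " ");            // collapse runs of spaces/tabs
}

const verbose = "  Summarize   this:\n\n\n   The   quick   brown fox.  ";
console.log(compressPrompt(verbose)); // "Summarize this:\nThe quick brown fox."
```

Whitespace is a cheap win because it carries no meaning for the model; real compressors go further (deduplicating context, shortening boilerplate), which is where the 10-30% range comes from.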
Setup: npm install → npx cacheflow init → npx cacheflow start. Then change one line: baseURL: "http://127.0.0.1:4747/v1"
Includes 30 source files, 8 AI provider integrations, real-time WebSocket dashboard, CLI with init wizard, auto hardware detection, SQLite-based caching + analytics. Node.js 18+. MIT License.
Works with OpenAI SDK, Anthropic SDK, Cursor, LangChain, and any OpenAI-compatible tool.
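Because the proxy speaks the OpenAI wire format, no SDK is strictly required. A minimal sketch using Node 18's built-in fetch; the model name "gpt-4o-mini" is just an example, and the request-building helper is split out purely for illustration:

```javascript
// Talk to the CacheFlow proxy directly over HTTP. Only the base URL differs
// from calling OpenAI itself; everything else is the standard wire format.
const BASE_URL = "http://127.0.0.1:4747/v1";

// Pure helper: assemble the URL and fetch options for a chat request.
function buildChatRequest(model, messages, apiKey) {
  return {
    url: `${BASE_URL}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

async function chat(model, messages) {
  const { url, options } = buildChatRequest(
    model,
    messages,
    process.env.OPENAI_API_KEY ?? "",
  );
  const res = await fetch(url, options); // global fetch, Node 18+
  return res.json();
}
```

With the proxy running, `chat("gpt-4o-mini", [{ role: "user", content: "hi" }])` goes through the cache and routing layers transparently.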
Features
- Smart caching (SQLite-based exact match) — duplicate requests served instantly at $0
- Free API routing to Groq, Cerebras, OpenRouter, Gemini Free — 70B+ models at $0
- Local model support via Ollama — auto-detects NVIDIA GPU, AMD GPU, Apple Silicon
- Prompt compression — 10-30% token reduction per request
- Real-time dashboard with live savings counter, request timeline, provider breakdown
- OpenAI-compatible API — /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models
- Anthropic Messages API compatibility (/v1/messages)
- 8 provider integrations: OpenAI, Anthropic, Gemini, Groq, Cerebras, Ollama, OpenRouter, Gemini Free
- CLI: init wizard (auto-detects hardware + API keys), start, stop, status, stats, demo
- Cost estimation with per-model pricing for 15+ models (GPT-4o, Claude Sonnet, Gemini Pro, etc.)
- Streaming support (SSE) with analytics tracking
- Request analytics with detailed stats API
- YAML configuration with sensible defaults
- Node.js 18+ with ES Modules
- Full test suite included
- .env.example with all configuration options
- MIT License — use however you want
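The exact-match caching above can be pictured as hashing the request into a lookup key. A hedged sketch, assuming the key covers the fields that determine the answer; the field names and hashing scheme here are illustrative, not CacheFlow's actual schema:

```javascript
import { createHash } from "node:crypto";

// Derive a deterministic cache key from the parts of a request that
// determine its answer. Identical requests produce identical keys, so the
// second occurrence can be served from SQLite at $0.
function cacheKey({ model, messages, temperature = 0 }) {
  const canonical = JSON.stringify({ model, messages, temperature });
  return createHash("sha256").update(canonical).digest("hex");
}

const a = cacheKey({ model: "gpt-4o", messages: [{ role: "user", content: "hi" }] });
const b = cacheKey({ model: "gpt-4o", messages: [{ role: "user", content: "hi" }] });
// a === b, so the second request is a cache hit
```

Exact matching is deliberately conservative: it never returns a stale or merely similar answer, which is why it costs nothing in quality.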
Requirements
- Node.js 18 or higher
- npm or yarn
- No external database needed (SQLite built-in)
- Optional: Ollama for local models
- Optional: Free API keys (Groq, Cerebras — get at groq.com)
- Optional: Paid API keys (OpenAI, Anthropic, Google)
Instructions
1. Extract the ZIP file
2. cd Source_Code
3. npm install (or use included node_modules)
4. npx cacheflow init (auto-detects your hardware and API keys)
5. npx cacheflow start (proxy starts on localhost:4747, dashboard on :4748)
6. Change your app's base URL to http://127.0.0.1:4747/v1
7. Open http://localhost:4748 for the real-time dashboard
8. Run: npx cacheflow demo (sends test requests to verify everything works)
9. Run: npx cacheflow status (shows live stats and savings)
For OpenAI SDK: new OpenAI({ baseURL: "http://127.0.0.1:4747/v1" })
For Anthropic SDK: new Anthropic({ baseURL: "http://127.0.0.1:4747/v1" })
For Cursor: Settings → API → Custom API URL → http://127.0.0.1:4747/v1
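The savings that `npx cacheflow status` reports rest on per-model cost estimation. A hedged sketch of how such an estimate works; the prices below are illustrative placeholders, not CacheFlow's actual pricing table:

```javascript
// Illustrative per-1M-token prices in USD (placeholders, not real rates).
const PRICES_PER_1M_TOKENS = {
  "gpt-4o":      { input: 2.5,  output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Estimate the dollar cost of one request from its token counts.
function estimateCost(model, inputTokens, outputTokens) {
  const p = PRICES_PER_1M_TOKENS[model];
  if (!p) return null; // unknown model: no estimate
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// 1,000 input + 500 output tokens on the cheaper model:
console.log(estimateCost("gpt-4o-mini", 1000, 500)); // ≈ $0.00045
```

Summing the avoided estimates for every cache hit and every request routed to a free provider yields the live savings counter on the dashboard.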
| Category | Scripts & Code / NodeJS |
| First release | 30 March 2026 |
| Last update | 30 March 2026 |
| Files included | .css, .html, JavaScript (.js) |
| Tags | cursor, NodeJS, developer tools, openai, llm, gemini, claude, groq, ollama, ai api proxy, smart caching, cost optimizer, token saver, api gateway, langchain |
