“Rebuilding Vapi but slower and nobody asked for push-to-talk.”
• MVP: push-to-talk (PTT) web/mobile interface + webhook router + basic keyword/intent parser + 2-3 LLM integrations + docs; shippable in 6-8 weeks
• Path to $1: offer a free tier (100 min/mo) and charge $49/mo for 1,000 min plus overages; land the first dollar in week 10 via a Product Hunt launch to the dev community
• Feature sprawl risk is high: users will immediately request wake-word, multi-turn context, CRM integrations, and call analytics, which is scope creep into rebuilding Vapi
• Must resist the temptation to become a full-stack platform; the focus thesis is lightweight bring-your-own-agent (BYOA) orchestration, but that is also why it is not defensible
• Kill criterion: if the product can't ship sub-500ms latency and undercut Vapi pricing by 40%+ in the first 90 days, pivot or shut down
• Voice AI agent TAM: $9.88B in 2026, expanding to $79.4B by 2034 (ResearchIntelo)
• Developer-focused orchestration segment: estimated $500M-1B subset (Vapi raised at terms implying $150M+ valuation, Retell at $40M+ ARR)
• SAM for "BYOA with webhooks": realistically <$50M; a niche within a niche, targeting devs who want more control than no-code but don't want existing API-first platforms
• SOM: <$2M in Year 1, achievable only with a radical price undercut or a novel capability; otherwise drowned out by 12+ credible alternatives
• Venture-scale only if it pivots to a vertical (healthcare voice agents, manufacturing coordinators) with 10x better containment rates
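The growth rate implied by the two TAM endpoints above can be checked with the standard compound annual growth rate formula. This is a minimal sketch; the function name `cagr` is hypothetical, and the only inputs are the $9.88B (2026) and $79.4B (2034) figures quoted in this section.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# TAM endpoints quoted above, in billions of dollars.
implied_growth = cagr(start_value=9.88, end_value=79.4, years=2034 - 2026)
print(f"Implied TAM CAGR: {implied_growth:.1%}")  # roughly 30% per year
```

Other growth figures cited in this report come from different sources with different market definitions, so they should not be expected to match this number exactly.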
• Existing pricing: Vapi $0.05/min platform + usage, Retell $0.07/min all-in, Synthflow $450/mo for 2,000 min; this leaves a narrow wedge for differentiation
• Per-minute SaaS is challenging: pricing at $0.03/min to undercut leaves a razor-thin margin after STT/LLM/TTS/hosting costs ($0.02/min blended)
• Platform fee model ($0.02/min orchestration) yields ~$200 MRR per customer doing 10k min/mo; 500 such customers are needed for $100k MRR
• CAC likely $2k-5k (developer tools, long sales cycles); LTV capped at $5k-15k in annual spend unless customers scale to enterprise volume
• Unit economics break even only at 2+ years of retention, but switching cost is near zero (webhook URLs are portable, no proprietary lock-in)
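The platform-fee arithmetic above can be reproduced in a few lines. This is a rough sketch under stated assumptions: the function names are hypothetical, fees are expressed in whole cents to avoid floating-point drift, and payback treats platform-fee revenue as pure contribution (no serving cost), which flatters the result.

```python
import math


def mrr_per_customer(minutes_per_month: int, fee_cents_per_minute: int) -> float:
    """Monthly recurring revenue in dollars from a per-minute platform fee."""
    return minutes_per_month * fee_cents_per_minute / 100


def customers_needed(target_mrr: float, per_customer_mrr: float) -> int:
    """Smallest customer count that reaches the target MRR."""
    return math.ceil(target_mrr / per_customer_mrr)


def payback_months(cac: float, per_customer_mrr: float) -> float:
    """Months of revenue needed to recover customer acquisition cost.

    Assumption: 100% of the platform fee counts toward payback.
    """
    return cac / per_customer_mrr


per_customer = mrr_per_customer(minutes_per_month=10_000, fee_cents_per_minute=2)
print(per_customer)                             # dollars of MRR per customer
print(customers_needed(100_000, per_customer))  # customers for $100k MRR
print(payback_months(2_000, per_customer))      # payback at the low CAC estimate
print(payback_months(5_000, per_customer))      # payback at the high CAC estimate
```

At the high end of the CAC range, payback lands just past two years, which is consistent with the retention requirement stated above.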
• Voice AI market growing at 34.8% CAGR to $47.5B by 2034, but dominated by 20+ established platforms (Retell AI at $40M ARR, Vapi, Bland, Synthflow) already offering webhook integrations
• Developer-focused platforms like Vapi and Retell charge $0.05-0.07/min with full webhook/API control: exactly what this idea proposes, but already shipping at scale
• Push-to-talk is legacy UX: wake-word and continuous listening are table stakes in 2026; PTT is relegated to troubleshooting mode in Home Assistant forums
• No evidence of developer pain around "bringing your own agent": platforms already support BYOLLM (bring your own LLM), custom voice, and custom telephony
• Market wants turnkey solutions with <400ms latency and compliance (HIPAA, SOC2), not another middleware layer requiring assembly
• Core stack is proven: STT (Deepgram $0.01/min), LLM orchestration (OpenAI Realtime API), TTS (ElevenLabs), webhook triggers (standard HTTP)
• Low technical risk to build an MVP in 8 weeks using existing SDKs; GitHub shows 4+ open-source PTT voice assistant repos as reference
• The real challenge is latency: competitors ship sub-600ms end-to-end; adding another middleware layer risks 800ms+ unless you control the full pipeline like Retell does
• Push-to-talk is easier than wake-word (no continuous VAD/endpointing), but the market doesn't want it; users expect always-listening or a physical button on the device
• Webhook execution during a live call requires a <100ms response or the conversation breaks; n8n/Zapier add 200-500ms of lag vs native integrations
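The latency concern above is easiest to see as a budget: sum the per-stage ranges and compare the total against a target. The sketch below is illustrative only. The `Stage` type and helper names are hypothetical, the 600ms pipeline figure is an assumption that reuses the competitor end-to-end number as a stand-in for the STT/LLM/TTS path, and the 200-500ms webhook range is the n8n/Zapier lag quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    low_ms: int   # optimistic latency for this stage
    high_ms: int  # pessimistic latency for this stage


def total_range(stages: list[Stage]) -> tuple[int, int]:
    """Best-case and worst-case end-to-end latency in milliseconds."""
    return (sum(s.low_ms for s in stages), sum(s.high_ms for s in stages))


def within_budget(stages: list[Stage], budget_ms: int) -> bool:
    """True only if even the worst case fits inside the budget."""
    return total_range(stages)[1] <= budget_ms


stages = [
    # Assumption: competitor end-to-end figure used as the core pipeline cost.
    Stage("STT + LLM + TTS pipeline", 600, 600),
    # Lag range quoted above for n8n/Zapier webhook hops.
    Stage("webhook via n8n/Zapier", 200, 500),
]

low, high = total_range(stages)
print(f"End-to-end: {low}-{high} ms")
print("Fits 500 ms target:", within_budget(stages, budget_ms=500))
```

Under these assumptions the middleware hop alone pushes the total past 800ms, so a sub-500ms target would require both a faster core pipeline and native (non-n8n/Zapier) webhook execution.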
KILL: A solution in search of a problem the market has already solved better.

**Strengths:**
• Technically feasible with proven building blocks and an 8-week build timeline
• $50B+ macro tailwind in voice AI, creating a rising tide for all boats
• Developer familiarity with webhook patterns lowers adoption friction

**Risks:**
• Zero differentiation vs Vapi/Retell/Bland: they offer the same webhooks and the same BYOLLM, with better latency and established trust
• Push-to-talk is obsolete UX; even hobbyist Home Assistant users complain it's clunky vs wake-word
• No moat: open-source alternatives exist, APIs are commoditized, and a customer can rebuild the entire platform in 2 weeks with off-the-shelf components
• Margin compression is inevitable in race-to-the-bottom pricing against VC-funded competitors burning $50M+ to own the market
• "Bring your own agent" implies portability, which means zero switching cost and zero retention leverage
• Crowded market with 20+ credible platforms and $2.1B in VC funding (2025) chasing the same developers