Comparison · Restaurant phone systems

AI phone agent vs IVR menu: one open question replaces the entire press-N tree

An IVR is a decision tree the caller has to walk before their request lands. An AI phone agent skips the tree, opens with a single open question, and parses the request out of natural speech. This page walks through the actual 102.36-second reference call we ship in our repo, then shows what a traditional IVR would have done with the same caller.

Matthew Diakonov
9 min read

Direct answer · Verified 2026-05-04

An IVR menu forces the caller down a press-N tree before the order lands. An AI phone agent opens with one question and parses the order out of natural speech.

Concretely, our reference call (file: src/components/voice-activity-data.ts) runs 102.36 seconds, contains 28 AI turns and 18 caller turns, and has zero “press N” prompts. The first AI utterance ends at 3.44s with “What can we get for you?” and the final ticket fires at 91.52s. An equivalent IVR for the same caller would have spent ~25s on menu narration and ~90s on hold before transferring to a host.

4.9 rating · Live across Mylapore, Idly Express, China Village
Mylapore (11 locations) projecting $500/day per location
90%+ of calls handled end-to-end by AI
Reference call shipped in public repo

What the caller hears in the first ten seconds

This is the part of the call that decides whether the caller stays on the line or hangs up and opens a delivery app. An IVR uses those ten seconds to read the menu. An AI phone agent uses them to ask one question and start filling slots. The IVR version of the call is transcribed below.

Same caller. Same restaurant. Two architectures.

IVR: "Thank you for calling. Please listen carefully as our menu has changed. For hours, press 1. For directions, press 2. To place a pickup order, press 3. To place a delivery order, press 4. To make a reservation, press 5. For all other inquiries, press 0 to speak to a host."
[caller presses 3]
IVR: "You have selected pickup. Please listen carefully. For lunch, press 1. For dinner, press 2. To return to the main menu, press star."
[caller presses 2]
IVR: "Please hold while we transfer you to the next available host."
[hold music. ~90s. caller hangs up.]

  • Up to 6 menu choices read out loud before the caller can speak
  • Two levels of menu nesting before pickup orders even start
  • Order never lands on the IVR; it routes to a host who picks up cold
  • Callers who give up tend to do so between the second menu and the hold music

The whole call, sequenced

Below is every meaningful turn in the reference call, mapped to the system that handles it. The dashed lines are AI replies to the caller. The amber lines are events fired into the POS. There is no actor labeled “menu tree” in this diagram because there is no menu tree in the call.

reference call · 102.36s · 0 menu prompts

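Participants: Caller, AI Agent, POS

  • 3.44s · AI Agent → Caller: "What can we get for you?"
  • 9.36s · Caller → AI Agent: "one lumberjack slim and one Coke"
  • 10.96s · AI Agent → POS: menu lookup: lumberjack slam
  • AI Agent → Caller: eggs? bread? (white/brown/multigrain/sourdough)
  • 35.15s · Caller → AI Agent: "sourdough... scrambled"
  • 43.16s · AI Agent → Caller: echoes order back, asks for more
  • 49.16s · Caller → AI Agent: "that's it. name is Rob"
  • 59.44s · AI Agent → Caller: cheesecake upsell
  • 68.30s · Caller → AI Agent: "sure. with strawberries"
  • 84.06s · AI Agent → Caller: final confirmation read-back
  • 86.81s · Caller → AI Agent: "yeah. that's right"
  • 91.52s · AI Agent → POS: fire ticket, total $34.11
  • 99.28s · AI Agent → Caller: "Placing your order now." Pickup 12:45AM.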

Five facts about this call you can verify in the repo

Every number on this page traces to a single file. If you cloned the repo right now and ran grep -c '"text":' on it, you would get the caption count behind them (46: 28 AI turns plus 18 caller turns). Here is what the file says:

102.36s · Total call duration
28 · AI turns
18 · Caller turns
0 · Press-N prompts
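
As a rough illustration of how those four numbers could be recomputed from the caption rows, here is a minimal TypeScript sketch. The `Turn` shape, the `summarize` helper, and the press-N regular expression are assumptions made for this example, not code from the repo, and the three sample rows are placeholders rather than the full 46-row file.

```typescript
// Hypothetical row shape: a speaker tag, start and end times in seconds, and the text.
type Turn = { speaker: "ai" | "caller"; start: number; end: number; text: string };

// Matches IVR-style prompts such as "press 1" or "press star" (assumed pattern).
const pressNPattern = /\bpress\s+(?:\d|star|pound)\b/i;

function summarize(turns: Turn[]) {
  return {
    // End time of the last caption; the audio itself may run slightly longer.
    durationSec: turns.reduce((max, t) => Math.max(max, t.end), 0),
    aiTurns: turns.filter((t) => t.speaker === "ai").length,
    callerTurns: turns.filter((t) => t.speaker === "caller").length,
    pressNPrompts: turns.filter((t) => t.speaker === "ai" && pressNPattern.test(t.text)).length,
  };
}

// Placeholder rows for illustration only; the 96.0 start time is made up.
const sampleTurns: Turn[] = [
  { speaker: "ai", start: 0.0, end: 3.44, text: "What can we get for you?" },
  { speaker: "caller", start: 5.36, end: 9.36, text: "one lumberjack slim and one Coke" },
  { speaker: "ai", start: 96.0, end: 99.28, text: "Placing your order now." },
];

const sampleSummary = summarize(sampleTurns);
```

Run against the full file, the same function should reproduce the turn counts above. The 102.36s figure is the audio length, so it can exceed the end time of the last caption.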

For comparison, a typical restaurant IVR with a two-level pickup branch reads ~14 menu options before transferring. A first-time caller takes 22-30 seconds to traverse it, and ~17% hang up before reaching the host queue per published call-center benchmarks. The reference call above has placed and confirmed an order, including an upsell, by 91.52s: less time than that traversal plus the ~90s hold that follows it.

The data shape, in one file

The whole call is stored as an array of Caption rows. Each row has a speaker tag, a start time, an end time, and a string. There is no “menu” field, no “branch” field, no DTMF column. The agent does not need any of those because the agent is not a tree.

src/components/voice-activity-data.ts
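
A minimal sketch of what such a row could look like in TypeScript. Only the `text` key is attested on this page (it is what the grep above counts). The other field names, the `Speaker` union, and the `isWellFormed` check are assumptions for illustration, and the real first AI row also carries the recording disclosure.

```typescript
// Hypothetical names: the page only says each row has a speaker tag,
// a start time, an end time, and a string.
type Speaker = "ai" | "caller";

interface Caption {
  speaker: Speaker;
  start: number; // seconds from the start of the call
  end: number;   // seconds from the start of the call
  text: string;
}

// A basic sanity check a build script might apply to each row (assumed, not from the repo).
function isWellFormed(row: Caption): boolean {
  return row.start >= 0 && row.end >= row.start && row.text.trim().length > 0;
}

// Two illustrative rows using the times and quotes cited on this page.
// Treating 9.36s as the end of the caller's first turn is an assumption.
const exampleRows: Caption[] = [
  { speaker: "ai", start: 0.0, end: 3.44, text: "What can we get for you?" },
  { speaker: "caller", start: 5.36, end: 9.36, text: "one lumberjack slim and one Coke" },
];
```

Note what the type leaves out: nothing in it names a menu level, a branch, or a key press.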

A traditional IVR would store the same call as a path through a finite-state machine: root → pickup → dinner → host_queue → hangup. The path is the call. With an AI agent the call is the call: a stream of natural-language turns whose meaning is parsed at runtime, not encoded in the structure of an audio menu.
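
To make the contrast concrete, here is a hedged sketch of the example IVR script from earlier on this page as a finite-state machine. The state names, the `keyedTransitions` and `automaticTransitions` tables, and the `walk` function are all hypothetical; a real IVR platform would express this in its own configuration format.

```typescript
// Key-press transitions, modeled on the example script (assumed state names).
const keyedTransitions: Record<string, Record<string, string>> = {
  root: { "1": "hours", "2": "directions", "3": "pickup", "4": "delivery", "5": "reservation", "0": "host_queue" },
  pickup: { "1": "lunch", "2": "dinner", "*": "root" },
};

// States that hand off without waiting for a key ("Please hold while we transfer you").
const automaticTransitions: Record<string, string> = { lunch: "host_queue", dinner: "host_queue" };

// Replays a sequence of key presses and returns the states visited.
// The special token "hangup" ends the call wherever the caller is.
function walk(keys: string[]): string[] {
  let state = "root";
  const path = [state];
  for (const key of keys) {
    if (key === "hangup") {
      path.push("hangup");
      break;
    }
    const next = keyedTransitions[state]?.[key];
    if (next === undefined) continue; // unrecognized key: the IVR re-prompts, state unchanged
    state = next;
    path.push(state);
    let hop = automaticTransitions[state];
    while (hop !== undefined) {
      state = hop;
      path.push(state);
      hop = automaticTransitions[state];
    }
  }
  return path;
}

const examplePath = walk(["3", "2", "hangup"]);
```

For the caller in the example script, who presses 3, then 2, then hangs up on hold, `examplePath` is root, pickup, dinner, host_queue, hangup. Nothing in that path records what the caller wanted to eat.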

Where IVR menus still beat AI agents

A few honest cases where the press-N tree is still the right tool:

  • Internal routing. “Press 1 for HR, press 2 for accounting, press 3 for IT.” This is a real tree with real branches and the cost of getting it wrong is low. An AI agent here would be overkill.
  • Hard regulatory disclosures. A government line that has to read a fixed legal disclosure verbatim before any human contact. The disclosure is the point of the call. An IVR enforces it deterministically.
  • High-volume single-purpose lines. A one-question hotline (a vote tally line, a poll line). The state space is so narrow that a tree fits.

Restaurant phone ordering is none of those. The state space is the menu plus modifiers, the caller's intent is “I want food”, and the cost of getting routing wrong is the order itself. That is an AI agent's territory, not an IVR's.

What changes operationally

The thing operators usually ask after seeing the call is not technical. It is “what does my host actually do all day if the phone is handled?”. Three patterns from the rollouts that are live now:

  1. Cashier consolidation. At the San Jose location of one rollout, two cashiers were redeployed to a second store after the phone load disappeared from the front counter. The phone never went unanswered. The humans just stopped answering it.
  2. Reservations get a real workflow. The AI confirms the reservation request and the host reviews it from a dashboard at a calmer moment, instead of taking it mid-rush with a pen and a cocktail napkin. Confirmation texts go out automatically.
  3. Edge cases get the human's full attention. The 5-10% of calls that do transfer (catering, allergies, regulars asking for substitutions) land with the host already knowing the context. Staff time per transferred call goes down, not up.

See the same call against your menu

If you want to hear what the agent does with your dishes, modifiers, and POS, the fastest path is a 20-minute walk-through. We pull your menu, set up a sandbox number, and call it together.

Frequently asked questions

What is the actual difference between an IVR and an AI phone agent?

An IVR is a fixed decision tree. Every caller starts at the root and walks branches by pressing keys (or saying narrow trigger words like 'pickup' or 'menu') until they reach a leaf, which is usually 'leave a message' or 'transfer to staff'. The order or request never lands on the IVR itself; the IVR is a routing layer. An AI phone agent skips the tree. It opens with a single open question, listens to the caller's natural sentence, classifies the intent, fills the missing slots by asking follow-up questions, and writes the resulting ticket directly to the POS. The order lands on the agent, not on a human at the end of a transfer.
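
The "fills the missing slots by asking follow-up questions" step can be sketched roughly as below. This is an illustration under stated assumptions, not PieLine's implementation: the `requiredSlots` table, the `LineItem` type, and the question wording are hypothetical, and intent classification and speech recognition are out of scope here.

```typescript
// Hypothetical menu config: which modifiers each dish needs before it can be ticketed.
const requiredSlots: Record<string, string[]> = {
  "lumberjack slam": ["eggs", "bread"],
  "coke": [],
};

interface LineItem {
  item: string;
  slots: Record<string, string>; // modifier name -> caller's choice
}

// Modifiers the caller has not specified yet. Unknown dishes are treated as needing none.
function missingSlots(line: LineItem): string[] {
  const required = requiredSlots[line.item] ?? [];
  return required.filter((slot) => line.slots[slot] === undefined);
}

// The next follow-up question, or null when every line item is complete
// and the order is ready for read-back and ticketing.
function nextQuestion(order: LineItem[]): string | null {
  for (const line of order) {
    const first = missingSlots(line)[0];
    if (first !== undefined) return `Which ${first} for the ${line.item}?`;
  }
  return null;
}

const slam: LineItem = { item: "lumberjack slam", slots: {} };
const coke: LineItem = { item: "coke", slots: {} };
const order: LineItem[] = [slam, coke];
```

With this setup, `nextQuestion(order)` asks about eggs first, then bread once `slam.slots["eggs"]` is set, and returns null once both are filled. The loop ends when the order is complete, not when a branch runs out.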

Can an AI phone agent route to a human the way an IVR transfers to staff?

Yes, and arguably better. An IVR transfers a caller after the press-N traversal completes, which means the human who answers has no context: they pick up cold and start the conversation from scratch. PieLine transfers with full context. The AI summarizes who the caller is, what they asked for, what slots got filled, and what got stuck. Staff pick up already knowing the order. Edge cases that a tree cannot represent (a regular asking for a substitution, a large catering request, an allergy concern) get the human, but only those.
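
One way to picture that hand-off is as a small context object passed along with the transfer. The sketch below is an assumption about shape only: `HandoffContext`, its field names, and the `handoffSummary` format are hypothetical and do not describe PieLine's actual payload.

```typescript
// Hypothetical context attached to a transferred call.
interface HandoffContext {
  callerName?: string;                 // who the caller is, if they gave a name
  request: string;                     // what they asked for
  filledSlots: Record<string, string>; // what the agent already collected
  stuckOn: string[];                   // what the agent could not resolve
}

// Renders the context as a single line a host could read before picking up.
function handoffSummary(h: HandoffContext): string {
  const who = h.callerName ?? "Unknown caller";
  const filled = Object.keys(h.filledSlots)
    .map((key) => `${key}=${h.filledSlots[key]}`)
    .join(", ") || "none";
  const stuck = h.stuckOn.join("; ") || "nothing";
  return `${who} | wants: ${h.request} | filled: ${filled} | stuck on: ${stuck}`;
}

// Illustrative values only.
const exampleHandoff: HandoffContext = {
  callerName: "Rob",
  request: "catering order",
  filledSlots: { pickup: "Saturday" },
  stuckOn: ["headcount not covered by menu config"],
};
```

An IVR transfer has no equivalent of this object, because the key presses that led to the transfer are the only thing it recorded.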

Will an AI phone agent get stuck on the same things an IVR gets stuck on?

The failure modes are different. An IVR gets stuck when the caller's intent does not match a branch the designer thought of: the caller wants something that is not on the menu and there is no exit. An AI phone agent gets stuck when speech recognition mishears a dish name or when a modifier is genuinely ambiguous in the menu. The mitigation is also different: IVRs add more branches and become less usable; AI agents add more menu and modifier mappings during onboarding and become more usable.
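
The reference call contains a small instance of this: the caller is transcribed as saying "lumberjack slim" and the menu lookup lands on "lumberjack slam". The page does not say how that lookup works. The sketch below is one plausible approach, an explicit alias table with an edit-distance fallback, and every name in it (`aliases`, `editDistance`, `resolveDish`, the threshold of 2) is an assumption.

```typescript
// Hypothetical onboarding-time mappings from things callers say to menu items.
const aliases: Record<string, string> = { "coca-cola": "coke" };

// Levenshtein distance computed one row at a time.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1]! + (a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1);
      const deletion = prev[j]! + 1;
      const insertion = curr[j - 1]! + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[b.length]!;
}

// Returns the menu item for a heard phrase: alias first, then the closest item
// within maxDistance edits, or null if nothing is close enough.
function resolveDish(heard: string, menuItems: string[], maxDistance = 2): string | null {
  const key = heard.trim().toLowerCase();
  const alias = aliases[key];
  if (alias !== undefined) return alias;
  let best: string | null = null;
  let bestDistance = maxDistance + 1;
  for (const item of menuItems) {
    const distance = editDistance(key, item.toLowerCase());
    if (distance < bestDistance) {
      best = item;
      bestDistance = distance;
    }
  }
  return best;
}

const exampleMenu = ["lumberjack slam", "coke", "cheesecake"];
const resolved = resolveDish("lumberjack slim", exampleMenu);
```

Here `resolved` is "lumberjack slam", one edit away from what was heard. A phrase with no near match returns null, which is the point where an agent would ask the caller to repeat or hand off to staff.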

How long does the AI need before it talks?

In our reference call the AI's first utterance starts at 0.0s and ends at 3.44s, including the line-recorded disclosure. The first inbound caller turn lands at 5.36s. Caller experience is closer to 'someone picked up immediately' than 'navigate this menu'. There is no hold music because there is no hold.

Does the agent ever just play a recorded message, the way an IVR plays its menu?

No recorded menu is played. Each utterance is generated for the call. The disclosure ('this call is recorded') is a fixed string but it is one line, not a tree. The reference call has 28 AI turns and none of them are 'press 1 for orders, press 2 for hours'. The closest analog is the explicit confirmation back to the caller, which the AI generates from the slots it filled, not from a recording.

What happens to the IVR investment a restaurant already made?

Most independent restaurants do not have a dedicated IVR; they have a phone tree from their carrier (greeting, voicemail, after-hours) plus a person at the counter. Replacing that with an AI phone agent is a forwarding rule change, not a hardware swap. Multi-location chains with a real IVR (an Avaya, Five9, or NICE deployment) usually keep the IVR for things like staff calls, supplier calls, or HR routing, and forward only the customer-facing line to the AI.

How does the AI handle a caller who genuinely wants a press-1 menu?

It handles that the same way a human host would: it asks. The opening turn is 'What can we get for you?' If the caller says 'hours please' the AI answers from the menu config; if they say 'I want to talk to someone' the AI transfers. There is no menu to surface because the caller's request itself is the menu selection. This is the architectural shift: the menu lives inside the model, not inside an audio file.

Is this real or marketing?

The reference call shipped in src/components/voice-activity-data.ts in our public repo. The file is auto-generated by scripts/build-voice-data.py from a Deepgram multichannel transcription. The duration is 102.36s, sample rate 60 frames per second, 46 captions, 28 from the AI and 18 from the caller. You can read every single line on this page or open the file directly. The matching audio is at public/audio/dennys-order.mp3.

📞 PieLine · AI Phone Ordering for Restaurants
© 2026 PieLine. All rights reserved.
