An AI phone agent for restaurants is just one real call, repeated. Here is the call, line by line.
The pages currently online about AI phone agents for restaurants stay abstract: features, vendors, pricing tables. This page does the opposite. It reads one real production call out loud. 102.36 seconds. 46 captions. A 12:45 AM Denny's order. A caller named Rob who asked for strawberries on the cheesecake mid-sentence and got them. Total: $34.11.
The file this whole page is built on
Inside this repo, at src/components/voice-activity-data.ts, there is a single object exported as voiceData. Its shape is small enough to read in one sitting: a duration in seconds, a sample rate, two amplitude envelopes (one per channel), and an array of caption objects with a speaker, a start time, an end time, and the words that were said. Nothing else.
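For readers who would rather see the shape than read about it, here is a minimal Python mirror of that description. Only `duration` and `sampleRate` are names quoted from the file; every other field name (`envelopes`, `captions`, `speaker`, `start`, `end`, `text`) and the speaker labels are guesses for illustration, and the real export is a TypeScript object, not a dataclass.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Caption:
    speaker: str   # assumed labels such as "customer" or "agent"
    start: float   # seconds from the start of the call
    end: float     # seconds from the start of the call
    text: str      # the words that were said


@dataclass
class VoiceData:
    duration: float               # the file declares 102.36
    sampleRate: int               # the file declares 60
    envelopes: List[List[float]]  # one amplitude envelope per channel (name assumed)
    captions: List[Caption] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the object contradicts the shape described above."""
        if len(self.envelopes) != 2:
            raise ValueError("expected one envelope per channel (two total)")
        for caption in self.captions:
            if not (0.0 <= caption.start <= caption.end <= self.duration):
                raise ValueError(f"caption outside the call: {caption}")
```

A check like `validate()` is cheap insurance when the file is regenerated: a caption that ends after `duration` usually means the audio and the transcript came from different takes.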
What makes this honest is the script that produced it: scripts/build-voice-activity-data.py. That script reads a 16-bit stereo WAV, sends it to Deepgram with multichannel=true, gets word-level timestamps for each side of the conversation, and writes the file. Customer voice on the L channel, agent voice on the R channel. No re-recording, no studio re-takes.
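The two local halves of that pipeline can be sketched with the standard library alone. This is not the actual script, which is not reproduced on this page. The function names, the peak-per-window envelope, and the reading of `sampleRate: 60` as envelope points per second are all assumptions. The response layout follows Deepgram's documented pre-recorded format (`results.channels[].alternatives[].words[]`), and the network call itself is left out so the sketch runs offline.

```python
import io
import struct
import wave
from typing import List, Tuple


def split_stereo_envelopes(
    wav_bytes: bytes, points_per_second: int = 60
) -> Tuple[List[float], List[float]]:
    """Return (left, right) peak envelopes scaled to the range 0.0 to 1.0."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        frame_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Frames are interleaved little-endian signed 16-bit samples: L0, R0, L1, R1, ...
    n_frames = len(raw) // 4
    samples = struct.unpack("<%dh" % (2 * n_frames), raw[: 4 * n_frames])
    left, right = samples[0::2], samples[1::2]

    # Number of audio frames summarized by each envelope point.
    window = max(1, frame_rate // points_per_second)

    def envelope(channel) -> List[float]:
        return [
            max(abs(s) for s in channel[i : i + window]) / 32768.0
            for i in range(0, len(channel), window)
        ]

    return envelope(left), envelope(right)


def words_by_channel(response: dict) -> List[List[Tuple[str, float, float]]]:
    """Pull (word, start, end) tuples per channel out of a multichannel response."""
    return [
        [(w["word"], w["start"], w["end"]) for w in ch["alternatives"][0]["words"]]
        for ch in response["results"]["channels"]
    ]
```

With the customer on the left channel and the agent on the right, index 0 of each result is the customer and index 1 is the agent.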
The anchor fact
The data file declares duration: 102.36, sampleRate: 60, and 46 caption rows split across two speakers. The five customer-to-agent latency gaps are 1.60s, 2.41s, 2.41s, 3.03s, 2.31s. The longest pause inside an agent turn is 4.14s, immediately after “one moment, please.”
Everything that follows on this page is read directly off that file. If you fork this repo and grep the captions, you will see the exact phrases quoted below.
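If you want to recompute the latency numbers rather than take them on faith, the arithmetic is small. The sketch below assumes each caption carries a speaker label, a start, and an end; the `Caption` type, the `"customer"` and `"agent"` labels, and both function names are illustrative, not taken from the repo.

```python
from typing import List, NamedTuple


class Caption(NamedTuple):
    speaker: str
    start: float
    end: float
    text: str


def handoff_gaps(captions: List[Caption]) -> List[float]:
    """Seconds between a customer caption ending and the next agent caption starting."""
    ordered = sorted(captions, key=lambda c: c.start)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker == "customer" and nxt.speaker == "agent":
            gaps.append(round(nxt.start - prev.end, 2))
    return gaps


def longest_pause_within(captions: List[Caption], speaker: str = "agent") -> float:
    """Longest silence between two consecutive captions from the same speaker."""
    ordered = sorted(captions, key=lambda c: c.start)
    longest = 0.0
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker == speaker and nxt.speaker == speaker:
            longest = max(longest, nxt.start - prev.end)
    return round(longest, 2)
```

Run against the real caption array, `handoff_gaps` should return the five values listed above and `longest_pause_within` should return 4.14, provided the file's captions are shaped the way this sketch assumes.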
Seven moments from one phone order
If you only watched a 30-second demo reel, you would miss most of the work the agent does. Here is the full call broken into seven beats, each pulled from a specific timestamp range in the data file.
0.00s to 0.40s: the pickup
The two channels, side by side
The build script splits stereo audio so the customer is on the L channel and the agent is on the R channel. That makes turn-taking easy to verify: every caption belongs to exactly one channel, so each turn starts on either the customer side or the agent side. No overlap drama. Below is the same call, rendered as a sequence diagram so you can see the rhythm.
Denny’s order, channel-by-channel turns
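The turn structure behind a diagram like this can be rebuilt from the captions in a few lines. This is a sketch under assumptions: captions are treated as `(speaker, start, end, text)` tuples, and a "turn" is defined here as a run of consecutive captions from the same speaker. Neither definition comes from the repo.

```python
from itertools import groupby
from typing import List, Tuple

Caption = Tuple[str, float, float, str]  # (speaker, start, end, text)
Turn = Tuple[str, float, float, str]     # same layout, one row per turn


def to_turns(captions: List[Caption]) -> List[Turn]:
    """Merge consecutive captions from the same speaker into one turn."""
    ordered = sorted(captions, key=lambda c: c[1])
    turns = []
    for speaker, group in groupby(ordered, key=lambda c: c[0]):
        rows = list(group)
        text = " ".join(r[3] for r in rows)
        turns.append((speaker, rows[0][1], rows[-1][2], text))
    return turns


def overlaps(turns: List[Turn]) -> List[Tuple[Turn, Turn]]:
    """Return adjacent turn pairs where the next speaker starts before the previous one ends."""
    return [(a, b) for a, b in zip(turns, turns[1:]) if b[1] < a[2]]
```

An empty list from `overlaps` is the programmatic version of "no overlap drama": nobody started talking before the other side had finished.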
Read the captions yourself
Below are the lines as they appear in the data file, with their start times in seconds. The full file has 46 captions; this is a curated sample of the moments most articles about AI restaurant phone agents skip over.
What is happening in the gaps
A two-second pause between speakers feels like nothing on the phone, but a lot is moving. Speech-to-text resolves the customer audio. The order state updates. The menu and modifier rules are consulted. Speech synthesis builds the next reply. And the POS stays in sync with whatever the agent has committed to so far.
Voice in, ticket out, every two seconds
The strawberry moment, in detail
If you only learn one thing about an AI restaurant phone agent from this page, learn this. A real caller will ask for something you did not put on the menu. The interesting question is not whether the agent says yes or no. It is what shape the answer takes on the kitchen ticket.
At 65.98s the customer said: “Can you add strawberries, if that’s an option?”
At 71.34s the agent replied: “You got it. One slice of New York style cheesecake with strawberry topping.”
At 75.42s the agent said the full order back: “A lumberjack slam with scrambled eggs and sourdough bread, a soft drink, and a New York style cheesecake with strawberry topping.”
What that means in the POS: the cheesecake row carries a strawberry-topping modifier as a typed value, not a string stapled to the back of the ticket. The line cook does not need to interpret the customer’s “if that’s an option.” The agent already converted it.
The pages that currently rank for this topic do not show this moment, because most generic voice bots cannot produce it. They write “add strawberries” into a free-text note and hope.
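The difference between a typed modifier and a stapled note is easiest to see as data. The sketch below is illustrative only: the class names, the keyword lookup, and the POS ID `MOD-TOPPING-STRAWBERRY` are hypothetical, and the real agent's matching logic is not described on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Modifier:
    group: str
    value: str
    pos_id: str  # hypothetical POS modifier ID


@dataclass
class LineItem:
    name: str
    modifiers: List[Modifier] = field(default_factory=list)
    note: str = ""  # free text; a typed ticket tries to keep this empty


# Hypothetical optional modifiers per item, keyed by a word the caller might say.
OPTIONAL_MODIFIERS: Dict[str, Dict[str, Modifier]] = {
    "New York style cheesecake": {
        "strawberries": Modifier("topping", "strawberry", "MOD-TOPPING-STRAWBERRY"),
    },
}


def apply_request(item: LineItem, spoken: str) -> Optional[Modifier]:
    """Attach a typed modifier if the request names one; return None otherwise."""
    lowered = spoken.lower()
    for keyword, modifier in OPTIONAL_MODIFIERS.get(item.name, {}).items():
        if keyword in lowered:
            item.modifiers.append(modifier)
            return modifier
    return None
```

The point of the design is what does not happen: the caller's hedge ("if that's an option") never reaches the ticket, and `note` stays empty. A request that matches nothing returns `None`, which is the agent's cue to say so instead of writing a note and hoping.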
Phrases that appear in the call (with timestamps from the data file)
What this call would have looked like with a generic voice bot
Same audio in. Different shape on the ticket. The differences are not exotic edge cases; they are the moments most callers actually create.
| Feature | What a generic bot tends to do | What the production call did |
|---|---|---|
| "slim" instead of "slam" | Asks "could you repeat that?" or fails to find a menu match. | Resolved silently to the correct dish, no clarification needed. |
| Required modifiers | Either skips the prompt and writes a kitchen note or asks one question at a time. | Egg style and bread enumerated in one breath: white, brown, multigrain, sourdough. |
| "if that’s an option" strawberry | Stapled as a free-text note; line cook decides what to do. | Confirmed and added as a typed strawberry-topping modifier on the cheesecake row. |
| Upsell | No upsell, or a robotic "would you like dessert?" that gets declined. | Cheesecake offered with a one-line joke; lifts AOV when accepted. |
| Order summary | No summary; or summary missing the latest modification. | Full order read back at 75.4s, including the new strawberry topping, before placing. |
| POS handoff | Time and total guessed from a prompt, not from a POS, often wrong by a couple of dollars. | $34.11 quoted because the POS calculated it. Pickup 12:45 AM with real prep math. |
A vendor that cannot show you a real call against its own production transcript is showing you a demo, not a product.
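The first row of the table is the easiest to picture in code. The call data does not say how the agent resolved "slim" to "slam". One common approach, assumed here, is fuzzy matching against the menu with Python's standard `difflib`. The menu list, the function name, and the 0.8 cutoff are illustrative.

```python
import difflib
from typing import List, Optional


def resolve_dish(heard: str, menu: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest menu item to what was heard, or None if nothing is close enough."""
    by_lower = {name.lower(): name for name in menu}
    matches = difflib.get_close_matches(
        heard.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


MENU = ["Lumberjack Slam", "Soft Drink", "New York Style Cheesecake"]  # illustrative subset
```

With this menu, `resolve_dish("lumberjack slim", MENU)` resolves to the Lumberjack Slam without asking the caller to repeat anything, while an unrelated phrase falls below the cutoff and returns `None`. The cutoff is the knob: set it too low and the agent confidently orders the wrong dish, too high and it asks "could you repeat that?" too often.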
What 102 seconds of one call proves
- The agent picks up before the second ring, day or night.
- It identifies itself, declares the call is recorded, asks one open question.
- A misspoken or misheard item name (‘slim’ for ‘slam’) does not break the order.
- Required modifiers are asked once, with options enumerated.
- Non-menu modifiers (‘strawberries, if that’s an option’) become typed values on the ticket.
- The upsell happens, in plain language, with personality.
- The full order is read back before placing.
- Total and pickup time come from the POS, not a prompt.
How to hear a call like this against your own menu
The Denny's call exists because Denny's menu, modifiers, and POS rules were wired up first. The same wiring step is what unlocks a real sample call against your menu. Here is what it takes.
Step 1: Book a 20-minute walkthrough
Bring the hardest order you took in the last week. Not a cheese pizza. The actual order with a modifier, an allergen question, and a substitution.
Step 2: Onboarding scrapes your online menu
Whatever you publish today (DoorDash, Toast Online Ordering, your site) is parsed into a structured menu with prices and category tags.
Step 3: Items map to your POS catalog
Each scraped dish gets matched to a Clover, Square, Toast, NCR Aloha, or Revel item ID so totals, taxes, and tickets land in the right place.
Step 4: Modifier tree gets wired
Spice levels, half-and-half pricing, roll builders, protein subs, lunch combo windows, and so on. Not a prompt; declared types against POS IDs.
Step 5: Sample call against your menu
We play a recorded test call so you can hear what your version of the Denny’s call sounds like before you commit.
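Step 4 is the one that is hardest to picture, so here is a rough sketch of what "declared types" could look like for the dish in the Denny's call. The bread options are the four quoted earlier on this page. The egg options other than scrambled, the class and function names, and the prompt wording are placeholders, and real declarations would also carry POS IDs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ModifierGroup:
    name: str
    options: Tuple[str, ...]
    required: bool = True


MODIFIER_TREE: Dict[str, List[ModifierGroup]] = {
    "lumberjack slam": [
        # Only "scrambled" is confirmed by the call; the other egg styles are placeholders.
        ModifierGroup("egg style", ("scrambled", "over easy", "sunny side up")),
        ModifierGroup("bread", ("white", "brown", "multigrain", "sourdough")),
    ],
}


def missing_groups(item: str, chosen: Dict[str, str]) -> List[ModifierGroup]:
    """Required groups for the item that have no valid choice yet."""
    return [
        group
        for group in MODIFIER_TREE.get(item, [])
        if group.required and chosen.get(group.name) not in group.options
    ]


def prompt_for(groups: List[ModifierGroup]) -> str:
    """Build one question that enumerates every outstanding group and its options."""
    parts = [f"{g.name}: {', '.join(g.options)}" for g in groups]
    return "Which would you like? " + "; ".join(parts) + "."
```

Because the groups are data rather than prompt text, the agent can ask for everything outstanding in one question and can refuse to place an order while `missing_groups` is non-empty.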
What this looks like at restaurants already on PieLine
Three live numbers from production restaurants, not synthetic benchmarks.
Mylapore (11-location South Indian)
Projected additional revenue per location per day from eliminating the phone bottleneck.
Idly Express (Almaden)
Calls completed end-to-end by AI. The remainder routes to a manager with a full transcript.
Concurrent calls per restaurant line
The reason every call hits the first ring: the next ring does not wait for a human to finish.
“The experience was better than speaking to a human. No hold time, no confusion, no rushing.”
Bring your hardest order. We will play it back.
A 20-minute walkthrough. Tell us your cuisine and the messiest real order from last week. We wire your menu against the modifier tree, then play a recorded sample call so you can hear it before you commit.
Frequently asked
Where does the 102-second number come from?
It is the literal duration field on the call data file in this repo (src/components/voice-activity-data.ts), which was generated by running scripts/build-voice-activity-data.py against the recorded WAV. Deepgram multichannel was the transcription engine. The file declares duration: 102.36, sampleRate: 60, two amplitude envelopes (customer on the left channel, AI on the right), and 46 caption rows ordered by start time. Nothing about the breakdown on this page is reconstructed; every timestamp is read off that file.
What was the AI agent’s actual response latency?
The five gaps between the customer finishing a turn and the AI starting its next one are 1.60s, 2.41s, 2.41s, 3.03s, and 2.31s. The longest single pause inside an AI turn is 4.14s, which sits between "one moment, please" at 11.84s and the next AI word at 15.98s. That is the menu lookup window. We do not pad the gaps; the line of code that prints the captions reads start and end times directly from Deepgram’s word output.
Did the agent really handle a modifier the caller invented?
Yes. At 65.98 seconds the customer says "Can you add strawberries, if that’s an option?" Strawberries are not a baseline component of a New York style cheesecake, and the question is phrased tentatively. At 71.34s the AI confirms "One slice of New York style cheesecake with strawberry topping." The full order summary at 75.42s includes "a New York style cheesecake with strawberry topping" before asking for confirmation. That is what handling a non-menu modifier looks like in production.
What is the upsell line you keep mentioning?
"Before I finish up, would you like to add a sweet treat like a slice of New York style cheesecake? It’s so good. It might make your Coke jealous." The agent says it after the customer has given their name and confirmed the order is complete. The line is three short sentences and 9.5 seconds long. The cheesecake landed on the ticket. PieLine’s baseline is a 15 to 20% lift in average order value from the upsell pass.
Why was the pickup time 12:45 AM?
Because the order was placed late at night. A live human at most restaurants is not picking up the phone at that hour, and at chains that do, the phone is the lowest-priority queue. PieLine handles the call regardless of clock time. The agent quoted "12:45AM" as the ready-by time at the end of the call. It can quote that for the same reason it can place the order: the POS integration knows the kitchen’s prep time and current load. Nothing is hard-coded.
How many calls can the agent handle at once?
Up to 20 simultaneous calls on a single restaurant line. A human phone employee handles one. That is the entire reason a peak-hour phone bottleneck collapses: every caller hits the first ring, every order goes into the POS as a line item, every modifier including strawberries on the cheesecake is captured. The 20-slot capacity is also why the answer for a Friday at 7pm is the same as the answer for a Tuesday at 12:45 AM.
What POS systems does the order land in?
Clover, Square, Toast, NCR Aloha, and Revel out of the box, with 50+ POS integrations available. The order lands as line items with modifier IDs. In this Denny’s call, that means a lumberjack slam row with two modifiers (eggs scrambled, bread sourdough), a soft drink row, and a cheesecake row with a strawberry-topping modifier. Not a free-text note for the cook to interpret. The agent quoted $34.11 because that is what the POS calculated, not a guess.
What does this cost compared to hiring someone to answer phones?
$350 per month covers up to 1,000 calls. Beyond that it is $0.50 per call. A dedicated phone employee runs $3,000 to $4,000 per month and handles one call at a time. Onboarding (menu scrape, POS mapping, dish description build) is included. There is a money-back guarantee on the first month if the 95%+ order accuracy or 90%+ AI-handled rate does not hold for your restaurant.
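The pricing rule is simple enough to check by hand, but here it is as a function. It assumes the most literal reading of the sentence above: a flat $350 that includes the first 1,000 calls, then $0.50 for each call beyond that, with no other tiers or fees. The function and parameter names are illustrative.

```python
def monthly_cost(
    calls: int,
    base: float = 350.0,
    included: int = 1000,
    overage: float = 0.50,
) -> float:
    """Monthly bill in dollars for a given call count under the flat-plus-overage reading."""
    if calls < 0:
        raise ValueError("calls cannot be negative")
    extra_calls = max(0, calls - included)
    return base + extra_calls * overage
```

Under this reading, 1,400 calls in a month would come to $550, and the bill would not reach the low end of the $3,000 employee figure until roughly 6,300 calls.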
More posts that get specific about restaurant phone ordering
Keep reading
20 simultaneous calls on the same line, with the math
Why a peak-hour phone bottleneck disappears when the answer to the next ring does not depend on a human being free.
POS integration: what actually moves between the agent and Toast or Clover
Where modifier IDs come from, why a webhook is not the same as POS integration, and how PieLine avoids re-entry at the till.
AI phone agent for restaurants: pick the vendor that ships your cuisine’s grammar
Pizza, sushi, Indian, Chinese, Mexican: same dial tone, five different backends. How to tell which grammar a vendor really ships.