An AI phone agent for restaurants is just one real call, repeated. Here is the call, line by line.

The pages currently online about AI phone agents for restaurants stay abstract: features, vendors, pricing tables. This page does the opposite. It reads one real production call out loud. 102.36 seconds. 46 captions. A late-night Denny's order, ready for pickup at 12:45 AM. A caller named Rob who asked for strawberries on the cheesecake in the same breath as accepting it, and got them. Total: $34.11.

Matthew Diakonov
9 min read
4.9 from 200+ restaurants
Real Deepgram-transcribed call shipped in this repo at src/components/voice-activity-data.ts
Customer L channel + AI R channel, 60 Hz amplitude envelope, 46 timestamped captions
Order: lumberjack slam, Coke, cheesecake with strawberries. Total $34.11. Pickup 12:45 AM.

The file this whole page is built on

Inside this repo, at src/components/voice-activity-data.ts, there is a single object exported as voiceData. Its shape is small enough to read in one sitting: a duration in seconds, a sample rate, two amplitude envelopes (one per channel), and an array of caption objects with a speaker, a start time, an end time, and the words that were said. Nothing else.

The thing that makes this honest is that there is also a script that produced it: scripts/build-voice-activity-data.py. That script reads a 16-bit stereo WAV, sends it to Deepgram with multichannel=true, gets word-level timestamps for each side of the conversation, and writes the file. Customer voice on the L channel, agent voice on the R channel. No re-recording, no studio re-takes.
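The envelope half of that pipeline can be sketched with Python's standard library alone. This is an assumption-laden stand-in, not scripts/build-voice-activity-data.py itself: the function name `amplitude_envelopes`, the peak-per-window method, and the 0..1 normalization are illustrative choices, and the Deepgram request is left out entirely.

```python
import array
import io
import sys
import wave
from typing import List, Tuple


def amplitude_envelopes(wav_bytes: bytes, rate_hz: int = 60) -> Tuple[List[float], List[float]]:
    """Return (left, right) peak-amplitude envelopes, normalized to 0..1.

    Assumes 16-bit signed PCM with 2 channels. Each output sample is the peak
    absolute amplitude over one window of framerate // rate_hz frames.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        if w.getnchannels() != 2 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        framerate = w.getframerate()
        raw = w.readframes(w.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()

    left = samples[0::2]  # frames are interleaved: L, R, L, R, ...
    right = samples[1::2]
    window = max(1, framerate // rate_hz)

    def envelope(channel) -> List[float]:
        return [
            max(abs(s) for s in channel[i:i + window]) / 32768.0
            for i in range(0, len(channel), window)
        ]

    return envelope(left), envelope(right)
```

At a 48 kHz input rate, for example, each 60 Hz envelope sample would summarize 800 frames per channel.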

The anchor fact

The data file declares duration: 102.36, sampleRate: 60, and 46 caption rows split across two speakers. The five customer-to-agent latency gaps are 1.60s, 2.41s, 2.41s, 3.03s, 2.31s. The longest pause inside an agent turn is 4.14s, immediately after “one moment, please.”

Everything that follows on this page is read directly off that file. If you fork this repo and grep the captions, you will see the exact phrases quoted below.
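Reading latency off the captions is a matter of subtracting one timestamp from another. A minimal sketch, assuming each caption carries a speaker label plus start and end times in seconds; the `Caption` type and the "customer"/"agent" labels are hypothetical, not the file's exact field names. The 4.14s pause after "one moment, please," for instance, is 15.98 - 11.84.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    speaker: str  # hypothetical labels: "customer" (L) or "agent" (R)
    start: float  # seconds from the start of the recording
    end: float
    text: str


def handoff_gaps(captions: List[Caption]) -> List[float]:
    """Seconds between a customer caption ending and the next agent caption starting."""
    ordered = sorted(captions, key=lambda c: c.start)
    return [
        round(nxt.start - prev.end, 2)
        for prev, nxt in zip(ordered, ordered[1:])
        if prev.speaker == "customer" and nxt.speaker == "agent"
    ]


def longest_agent_pause(captions: List[Caption]) -> float:
    """Longest silence between two consecutive agent captions (no customer speech between)."""
    ordered = sorted(captions, key=lambda c: c.start)
    pauses = [
        round(nxt.start - prev.end, 2)
        for prev, nxt in zip(ordered, ordered[1:])
        if prev.speaker == nxt.speaker == "agent"
    ]
    return max(pauses, default=0.0)
```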

102.36s: total call length
46: caption rows
$34.11: order total

Seven moments from one phone order

If you only watched a 30-second demo reel, you would miss most of the work the agent does. Here is the full call broken into seven beats, each pulled from a specific timestamp range in the data file.

0.00 — 0.40s: the pickup

"Hi." The agent answers in 0.40 seconds. No menu. No "press 1 for orders." No phone tree.

The two channels, side by side

The build script splits stereo audio so the customer is on the L channel and the agent is on the R channel. That makes turn-taking easy to verify: every message either started on the customer side or the agent side. No overlap drama. Below is the same call, rendered as a sequence diagram so you can see the rhythm.

Denny’s order, channel-by-channel turns

Participants: Customer (L), Agent (R), POS / kitchen

0:00 Agent: Hi. This is Denny on a recorded line.
0:05 Customer: Can I get one lumberjack slim and one Coke?
0:11 Agent to POS / kitchen: menu lookup (4.14s)
0:15 Agent: Eggs? Bread (white / brown / multigrain / sourdough)?
0:29 Customer: Sourdough, scrambled.
0:37 Agent: Anything else for your order?
0:45 Customer: No. Put it under the name Rob.
0:51 Agent: Sweet treat? Cheesecake jealous-of-Coke pitch.
1:02 Customer: Sure. And add strawberries, if that’s an option?
1:11 Agent: You got it. Strawberry topping. Confirms full order.
1:25 Customer: Yeah. That’s right.
1:29 Agent to POS / kitchen: placing order
1:31 Agent: Done. $34.11. Pickup 12:45 AM.
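Because each channel carries only one speaker, a turn structure like the one above can be rebuilt mechanically from the captions. A sketch under assumptions: the `Caption` fields and speaker labels are hypothetical stand-ins, and the overlap check only compares neighbors in start-time order.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class Caption:
    speaker: str  # hypothetical labels: "customer" (L) or "agent" (R)
    start: float
    end: float
    text: str


def turns(captions: List[Caption]) -> List[Tuple[str, float, float, str]]:
    """Merge consecutive same-speaker captions into (speaker, start, end, text) turns."""
    ordered = sorted(captions, key=lambda c: c.start)
    merged = []
    for speaker, group in groupby(ordered, key=lambda c: c.speaker):
        rows = list(group)
        merged.append((speaker, rows[0].start, rows[-1].end, " ".join(r.text for r in rows)))
    return merged


def overlaps(captions: List[Caption]) -> List[Tuple[Caption, Caption]]:
    """Adjacent captions from different speakers whose time ranges overlap."""
    ordered = sorted(captions, key=lambda c: c.start)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if a.speaker != b.speaker and b.start < a.end
    ]
```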

Read the captions yourself

Below are the lines as they appear in the data file, with their start times in seconds. The full file has 46 captions; this is a curated sample of the moments most articles about AI restaurant phone agents skip over.

voice-activity-data.ts — sample captions

What is happening in the gaps

A two-second pause between speakers feels like nothing on the phone, but a lot is moving. Speech-to-text resolves the customer audio. The order state updates. The menu and modifier rules are consulted. Speech synthesis builds the next reply. And the POS stays in sync with whatever the agent has committed to so far.

Voice in, ticket out, every two seconds

Inputs: customer audio (L channel), menu and modifier rules, POS state
Restaurant agent
Outputs: agent voice (R channel), order line items, kitchen ticket

The strawberry moment, in detail

If you only learn one thing about an AI restaurant phone agent from this page, learn this. A real caller will ask for something you did not put on the menu. The interesting question is not whether the agent says yes or no. It is what shape the answer takes on the kitchen ticket.

At 65.98s the customer said: “Can you add strawberries, if that’s an option?”

At 71.34s the agent replied: “You got it. One slice of New York style cheesecake with strawberry topping.”

At 75.42s the agent said the full order back: “A lumberjack slam with scrambled eggs and sourdough bread, a soft drink, and a New York style cheesecake with strawberry topping.”

What that means in the POS: the cheesecake row carries a strawberry-topping modifier as a typed value, not a string stapled to the back of the ticket. The line cook does not need to interpret the customer’s “if that’s an option.” The agent already converted it.

The pages that currently rank for this topic do not show this moment, because most generic voice bots cannot produce it. They write “add strawberries” into a free-text note and hope.
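The difference between a typed modifier and a stapled note is easiest to see as data. A minimal sketch; the `Modifier` and `LineItem` types, the ticket format, and the `ITEM-...`/`MOD-...` IDs are all hypothetical, not a real POS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Modifier:
    pos_id: str  # hypothetical POS modifier ID
    name: str


@dataclass
class LineItem:
    pos_id: str  # hypothetical POS item ID
    name: str
    modifiers: List[Modifier] = field(default_factory=list)
    note: str = ""  # free text: what a generic bot falls back to


def ticket_lines(item: LineItem) -> List[str]:
    """Render one line item roughly as a kitchen ticket might: typed modifiers indented."""
    lines = [item.name] + [f"  + {m.name}" for m in item.modifiers]
    if item.note:
        lines.append(f"  NOTE: {item.note}")
    return lines


# Typed: the strawberry topping is a value the POS and kitchen display can act on.
typed = LineItem(
    "ITEM-CHEESECAKE",
    "New York style cheesecake",
    [Modifier("MOD-STRAWBERRY", "Strawberry topping")],
)
# Free text: the same request stapled on for the line cook to interpret.
stapled = LineItem(
    "ITEM-CHEESECAKE",
    "New York style cheesecake",
    note="add strawberries, if that's an option?",
)
```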

Phrases that appear in the call (with timestamps from the data file)

0:00 Hi.
0:00 This is Denny on a recorded line.
0:02 What can we get for you?
0:10 One moment, please.
0:15 How would you like your eggs cooked?
0:20 White, brown, multigrain, or sourdough?
0:23 For your Coke, I’ll add a soft drink.
0:51 Thanks, Rob.
0:52 Would you like to add a sweet treat?
0:58 It might make your Coke jealous.
1:11 You got it.
1:12 One slice of cheesecake with strawberry topping.
1:29 Placing your order now.
1:31 Done.
1:32 Your total is $34.11.
1:35 Ready for pickup at 12:45AM.
1:37 Thank you for calling Denny’s, Rob.

What this call would have looked like with a generic voice bot

Same audio in. Different shape on the ticket. The differences are not exotic edge cases; they are the moments most callers actually create.

Feature | What a generic bot tends to do | What the production call did
"slim" instead of "slam" | Asks "could you repeat that?" or fails to find a menu match. | Resolved silently to the correct dish, no clarification needed.
Required modifiers | Either skips the prompt and writes a kitchen note, or asks one question at a time. | Egg style and bread asked in a single turn, with the bread options enumerated: white, brown, multigrain, sourdough.
"if that’s an option" strawberry | Stapled on as a free-text note; the line cook decides what to do. | Confirmed and added as a typed strawberry-topping modifier on the cheesecake row.
Upsell | No upsell, or a robotic "would you like dessert?" that gets declined. | Cheesecake offered with a one-line joke; lifts AOV when accepted.
Order summary | No summary, or a summary missing the latest modification. | Full order read back at 75.4s, including the new strawberry topping, before placing.
POS handoff | Time and total guessed from a prompt rather than pulled from a POS, often off by a couple of dollars. | $34.11 quoted because the POS calculated it. Pickup 12:45 AM with real prep math.
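The page does not say how the agent resolved "slim" to "slam." As a plainly swapped-in illustration, string similarity from Python's `difflib` is enough to recover that particular slip, while a synonym like Coke needs an explicit alias, echoing the agent's "For your Coke, I’ll add a soft drink." The menu names, alias table, and 0.8 cutoff below are hypothetical.

```python
import difflib
from typing import List, Optional

# Hypothetical catalog; a real one would come from the scraped menu.
MENU = ["Lumberjack Slam", "Grand Slam", "New York Style Cheesecake", "Soft Drink"]
# Hypothetical alias table for words string similarity cannot bridge.
ALIASES = {"coke": "Soft Drink"}


def resolve_item(heard: str, menu: List[str] = MENU, cutoff: float = 0.8) -> Optional[str]:
    """Map what the caller said to a menu name, or None if nothing is close enough."""
    key = heard.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    by_lower = {name.lower(): name for name in menu}
    matches = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None
```

Returning None rather than a low-confidence guess is the point where an agent would ask a clarifying question instead.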

A vendor that cannot show you a real call against its own production transcript is showing you a demo, not a product.

What 102 seconds of one call proves

  • The agent picks up before the second ring, day or night.
  • It identifies itself, states that the call is recorded, and asks one open question.
  • Customer disfluencies (‘slim’ for ‘slam’) do not break the order.
  • Required modifiers are asked once, with options enumerated.
  • Non-menu modifiers (‘strawberries, if that’s an option’) become typed values on the ticket.
  • The upsell happens, in plain language, with personality.
  • The full order is read back before placing.
  • Total and pickup time come from the POS, not a prompt.

How to hear a call like this against your own menu

The Denny's call exists because Denny's menu, modifiers, and POS rules were wired up first. The same wiring step is what unlocks a real sample call against your menu. Here is what it takes.


Step 1: Book a 20-minute walkthrough

Bring the hardest order you took in the last week. Not a cheese pizza. The actual order with a modifier, an allergen question, and a substitution.


Step 2: Onboarding scrapes your online menu

Whatever you publish today (DoorDash, Toast Online Ordering, your site) is parsed into a structured menu with prices and category tags.


Step 3: Items map to your POS catalog

Each scraped dish gets matched to a Clover, Square, Toast, NCR Aloha, or Revel item ID so totals, taxes, and tickets land in the right place.


Step 4: Modifier tree gets wired

Spice levels, half-and-half pricing, roll builders, protein subs, lunch combo windows, and so on. Not a prompt; declared types against POS IDs.


Step 5: Sample call against your menu

We play a recorded test call so you can hear what your version of the Denny’s call sounds like before you commit.
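Step 4's "declared types, not a prompt" can be pictured as a small schema the agent checks before it places an order. A sketch under assumptions: the `ModifierGroup` type and `missing_or_invalid` helper are hypothetical, the bread options come from the Denny's call, and the egg options are invented for illustration since the call does not list them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ModifierGroup:
    name: str
    options: Tuple[str, ...]
    required: bool = True
    max_choices: int = 1


# Hypothetical modifier tree for one dish. Bread options are the ones the agent
# listed on the call; the egg options are invented for illustration.
LUMBERJACK_SLAM = [
    ModifierGroup("eggs", ("scrambled", "over easy", "sunny side up")),
    ModifierGroup("bread", ("white", "brown", "multigrain", "sourdough")),
]


def missing_or_invalid(groups: List[ModifierGroup], chosen: Dict[str, List[str]]) -> List[str]:
    """Return one prompt, with options enumerated, per group that is unanswered or invalid."""
    prompts = []
    for group in groups:
        picks = chosen.get(group.name, [])
        unknown = [p for p in picks if p not in group.options]
        too_many = len(picks) > group.max_choices
        unanswered = group.required and not picks
        if unknown or too_many or unanswered:
            prompts.append(f"{group.name}: {', '.join(group.options)}?")
    return prompts
```

An empty result means every required group is answered with a known option, so the line item can go to the POS.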

What this looks like at restaurants already on PieLine

Three live numbers from production restaurants, not synthetic benchmarks.

Mylapore (11-location South Indian)

$0 / day / loc

Projected additional revenue per location per day from eliminating the phone bottleneck.

Idly Express (Almaden)

0%+

Calls completed end-to-end by AI. The remainder routes to a manager with a full transcript.

Concurrent calls per restaurant line

Up to 20

The reason every call hits the first ring: the next ring does not wait for a human to finish.

The experience was better than speaking to a human. No hold time, no confusion, no rushing.
A PieLine caller
Customer feedback recorded during active-call monitoring

Bring your hardest order. We will play it back.

A 20-minute walkthrough. Tell us your cuisine and the messiest real order from last week. We wire your menu against the modifier tree, then play a recorded sample call so you can hear it before you commit.

Frequently asked

Where does the 102-second number come from?

It is the literal duration field on the call data file in this repo (src/components/voice-activity-data.ts), which was generated by running scripts/build-voice-activity-data.py against the recorded WAV. Deepgram multichannel was the transcription engine. The file declares duration: 102.36, sampleRate: 60, two amplitude envelopes (customer on the left channel, AI on the right), and 46 caption rows ordered by start time. Nothing about the breakdown on this page is reconstructed; every timestamp is read off that file.
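A file with that shape is easy to sanity-check. The sketch below mirrors the described fields in Python; the names (`sample_rate` for `sampleRate`, `customer_envelope`, `agent_envelope`, and the `Caption` fields) are assumptions, and the one-sample tolerance on envelope length is an illustrative choice, since 102.36 seconds at 60 samples per second comes to 6,141.6.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Caption:
    speaker: str  # hypothetical labels: "customer" (L) or "agent" (R)
    start: float  # seconds from the start of the recording
    end: float
    text: str


@dataclass
class VoiceData:
    duration: float  # seconds, e.g. 102.36
    sample_rate: int  # envelope samples per second, e.g. 60
    customer_envelope: List[float] = field(default_factory=list)  # L channel
    agent_envelope: List[float] = field(default_factory=list)  # R channel
    captions: List[Caption] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return consistency problems; an empty list means the data checks out."""
        issues = []
        expected = self.duration * self.sample_rate
        for name, env in (("customer", self.customer_envelope), ("agent", self.agent_envelope)):
            if abs(len(env) - expected) > 1:
                issues.append(f"{name} envelope has {len(env)} samples, expected about {expected:.0f}")
        for prev, cur in zip(self.captions, self.captions[1:]):
            if cur.start < prev.start:
                issues.append(f"caption at {cur.start}s is out of order")
        for c in self.captions:
            if not 0.0 <= c.start <= c.end <= self.duration:
                issues.append(f"caption at {c.start}s falls outside the call")
        return issues
```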

What was the AI agent’s actual response latency?

The five gaps between the customer finishing a turn and the AI starting its next one are 1.60s, 2.41s, 2.41s, 3.03s, and 2.31s. The longest single pause inside an AI turn is 4.14s, which sits between "one moment, please" at 11.84s and the next AI word at 15.98s. That is the menu lookup window. We do not pad the gaps; the line of code that prints the captions reads start and end times directly from Deepgram’s word output.

Did the agent really handle a modifier the caller invented?

Yes. At 65.98 seconds the customer says "Can you add strawberries, if that’s an option?" Strawberries are not a baseline component of a New York style cheesecake, and the question is phrased tentatively. At 71.34s the AI confirms "One slice of New York style cheesecake with strawberry topping." The full order summary at 75.42s includes "a New York style cheesecake with strawberry topping" before asking for confirmation. That is what handling a non-menu modifier looks like in production.

What is the upsell line you keep mentioning?

"Before I finish up, would you like to add a sweet treat like a slice of New York style cheesecake? It’s so good. It might make your Coke jealous." The agent says it after the customer gave their name and confirmed the order was complete. The line is two sentences and 9.5 seconds long. The cheesecake landed on the ticket. PieLine’s baseline is a 15 to 20% lift in average order value from the upsell pass.

Why was the pickup time 12:45 AM?

Because the order was placed late at night. At most restaurants no one picks up the phone at that hour, and at chains where someone does, the phone is the lowest-priority queue. PieLine handles the call regardless of the time. At the end of the call the agent quoted "12:45AM" as the ready-by time. It can quote that time for the same reason it can place the order: the POS integration knows the kitchen’s prep time and current load. The time is not hard-coded.
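The arithmetic behind a ready-by quote can be sketched as placement time plus prep and queue time, rounded up. This is an assumption: the page says the POS knows prep time and current load, not how they are combined, and the `ready_by` helper, the five-minute rounding, and any sample inputs are hypothetical.

```python
from datetime import datetime, timedelta


def ready_by(placed_at: datetime, prep_minutes: int, queue_minutes: int, round_to: int = 5) -> datetime:
    """Placement time plus prep and queue time, rounded up to the next round_to-minute mark."""
    raw = placed_at + timedelta(minutes=prep_minutes + queue_minutes)
    # Round up to a whole multiple of round_to minutes past midnight of that day.
    midnight = raw.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(minutes=round_to)
    steps = -((midnight - raw) // step)  # ceiling division of timedeltas
    return midnight + steps * step
```

Rounding up rather than to the nearest mark keeps the quote on the safe side: the food should be ready at or before the time the caller hears.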

How many calls can the agent handle at once?

Up to 20 simultaneous calls on a single restaurant line. A human phone employee handles one. That is the entire reason a peak-hour phone bottleneck collapses: every caller hits the first ring, every order goes into the POS as a line item, every modifier including strawberries on the cheesecake is captured. The 20-slot capacity is also why the answer for a Friday at 7pm is the same as the answer for a Tuesday at 12:45 AM.
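A cap of 20 simultaneous calls behaves like a counting semaphore. The sketch below uses `asyncio.Semaphore` purely as an illustration, not PieLine's telephony stack: `handle_call` is a placeholder, and in this sketch a 21st caller waits for a free slot, which is an assumption, since the page does not say what happens past 20.

```python
import asyncio


async def handle_call(call_id: int, slots: asyncio.Semaphore, stats: dict) -> None:
    """Placeholder handler: holds one slot for the length of a simulated call."""
    async with slots:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stands in for the actual conversation
        stats["active"] -= 1


async def ring(n_calls: int, max_concurrent: int = 20) -> int:
    """Start n_calls at once; return the peak number handled simultaneously."""
    slots = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(handle_call(i, slots, stats) for i in range(n_calls)))
    return stats["peak"]
```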

What POS systems does the order land in?

Clover, Square, Toast, NCR Aloha, and Revel out of the box, with 50+ POS integrations available. The order lands as line items with modifier IDs. In this Denny’s call, that means a lumberjack slam row with two modifiers (eggs scrambled, bread sourdough), a soft drink row, and a cheesecake row with a strawberry-topping modifier. Not a free-text note for the cook to interpret. The agent quoted $34.11 because that is what the POS calculated, not a guess.

What does this cost compared to hiring someone to answer phones?

$350 per month covers up to 1,000 calls. Beyond that it is $0.50 per call. A dedicated phone employee runs $3,000 to $4,000 per month and handles one call at a time. Onboarding (menu scrape, POS mapping, dish description build) is included. There is a money-back guarantee on the first month if the 95%+ order accuracy or 90%+ AI-handled rate does not hold for your restaurant.
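The pricing above reduces to one line of arithmetic. A minimal sketch that models only the numbers stated here (no taxes or other fees); the function names are hypothetical.

```python
def monthly_cost(calls: int, base: float = 350.0, included: int = 1000, per_extra: float = 0.50) -> float:
    """$350 covers up to 1,000 calls; each call beyond that adds $0.50."""
    return base + max(0, calls - included) * per_extra


def calls_for_budget(budget: float, base: float = 350.0, included: int = 1000, per_extra: float = 0.50) -> float:
    """Inverse of monthly_cost for budgets at or above the base fee."""
    return included + (budget - base) / per_extra
```

By this arithmetic, $3,000 a month, the low end of a phone employee's cost, covers 6,300 calls, and $4,000 covers 8,300.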

PieLine: AI Phone Ordering for Restaurants
© 2026 PieLine. All rights reserved.
