Technology in the restaurant industry, told through one 102-second call.

Every article on this topic lists the same categories and moves on. This one opens a real audio file shipped in the PieLine repo, plays back the Deepgram transcript, and walks through the call second by second. By the end you will know what a modern restaurant voice stack sounds like to the customer, to the kitchen, and to the POS.

Matthew Diakonov, Written with AI

Published April 23, 2026, 11 min read

Hear a live call and compare it to your current line

4.9 from 200+ restaurants

The audio file public/audio/dennys-order.mp3 is shipped directly in the PieLine repo and served verbatim by Next.js at /audio/dennys-order.mp3

46 captions, 102.36 seconds, transcribed with Deepgram multichannel diarization and exported as a typed object in src/components/voice-activity-data.ts

PieLine writes rows into 50+ POS systems, including Clover, Square, Toast, NCR Aloha, and Revel; the $34.11 total in the transcript matches what hits the kitchen ticket

What does restaurant technology actually sound like?

One 102-second AI-handled phone order, timestamp by timestamp.

0.00s: 'Hi. This is Denny on a recorded line.'

5.84s: Customer: 'Can I get one lumberjack slim and, one Coke?'

52.53s: AI runs the cheesecake upsell.

62.47s: Upsell accepted with a strawberry modifier.

89.12s: 'Placing your order now. Done.'

95.12s: 'Your total is $34.11, pickup at 12:45AM.'

Source audio: public/audio/dennys-order.mp3. Transcript source: src/components/voice-activity-data.ts.

Why another roundup was not going to help you

Read the existing roundups on this topic and you get a box score: AI, kiosks, contactless, QR, loyalty, kitchen robotics, delivery aggregation, inventory, reservations, analytics. It is a complete list. It is also the exact list most operators have seen in every trade publication since 2022. The list does not tell you what to do on Monday.

The useful frame is to pick one category at the edge of the stack, the phone, and describe it at enough resolution that an operator can compare their current line to what a modern voice stack actually produces. That is why this guide opens a specific file in the PieLine repo and walks through one call.

The anchor file

src/components/voice-activity-data.ts ships with the repo. A one-line comment at the top reads: "Auto-generated from Deepgram multichannel transcription of public/audio/dennys-order.mp3."

The exported voiceData object contains duration: 102.36, sampleRate: 60, a two channel envelopes array, and a captions array of 46 speaker/timestamp/text entries. Every timestamp quoted in this guide is taken from that file; no times were invented to make a point. You can clone the repo and grep for any of them.

Source: src/components/voice-activity-data.ts, first two lines and captions array.

102.36 seconds of real audio shipped in the repo

46 captions produced by Deepgram multichannel

30 / 16 AI turns vs. customer turns

$34.11 final booked ticket, USD

The call, second by second

Six phases. Each one is a thing the competing articles gloss over when they say “voice AI.” Every timestamp is from the shipped transcript. The ordering is the order a listener hears.

Phases of a 102-second AI phone order

0.00s to 3.44s, greeting and consent

"Hi. This is Denny on a recorded line. What can we get for you?" Consent flagged, brand identified, call opened for business. The equivalent on a human line is two to three rings before a hello.

5.84s to 9.36s, first item capture

Customer says "Can I get one lumberjack slim and, one Coke?" The mis-said "slim" for "slam" gets bound to the correct menu item by 38.04s in the readback. Two items captured, one disfluency absorbed.

15.99s to 25.66s, required modifiers and drink disambiguation

AI asks eggs and bread in a single turn ("White, brown, multigrain, or sourdough?") and classifies the Coke as a soft drink modifier. This is the turn that separates good voice from good speech to text.

52.53s to 59.44s, the upsell

"Before I finish up, would you like to add a sweet treat like a slice of New York style cheesecake? It's so good. It might make your Coke jealous." 6.9 seconds of copy, ending in a joke. Runs on every call.

62.47s to 72.15s, upsell accepted with modifier

Customer says "Sure. Yeah. I'll get one, slice of, the cheesecake. Can you add strawberries, if that's an option?" AI confirms the strawberry modifier by 72.15s. That is the part of the call that lifts the ticket to the final $34.11.

75.43s to 97.84s, readback, POS write, pickup

Full readback: Lumberjack Slam with scrambled eggs and sourdough bread, a soft drink, and a New York style cheesecake with strawberry topping. "Placing your order now. Done. Your total is $34.11, and your order will be ready for pickup at 12:45AM."
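To make the six intervals above easier to compare, here is a minimal sketch that turns them into durations. The interval boundaries are the ones quoted in this guide; the type, constant, and function names are hypothetical and do not come from the PieLine repo.

```typescript
// The six phase intervals quoted in this guide, in seconds.
interface CallPhase {
  label: string;
  start: number;
  end: number;
}

const callPhases: CallPhase[] = [
  { label: "greeting and consent", start: 0.0, end: 3.44 },
  { label: "first item capture", start: 5.84, end: 9.36 },
  { label: "required modifiers and drink disambiguation", start: 15.99, end: 25.66 },
  { label: "the upsell", start: 52.53, end: 59.44 },
  { label: "upsell accepted with modifier", start: 62.47, end: 72.15 },
  { label: "readback, POS write, pickup", start: 75.43, end: 97.84 },
];

const CALL_DURATION = 102.36; // total length of the recording, per the guide

// Round to two decimals so floating point noise does not leak into the output.
function roundToCents(value: number): number {
  return Math.round(value * 100) / 100;
}

// Length of one phase in seconds.
function phaseSeconds(phase: CallPhase): number {
  return roundToCents(phase.end - phase.start);
}

// Total seconds covered by the listed phases (they do not overlap).
function coveredSeconds(phases: CallPhase[]): number {
  return roundToCents(phases.reduce((sum, p) => sum + phaseSeconds(p), 0));
}

for (const phase of callPhases) {
  console.log(`${phaseSeconds(phase).toFixed(2)}s  ${phase.label}`);
}
const covered = coveredSeconds(callPhases);
const share = Math.round((covered / CALL_DURATION) * 100);
console.log(`${covered.toFixed(2)}s of ${CALL_DURATION}s (${share}%) falls inside the six phases`);
```

The upsell phase comes out at 6.91 seconds, which is the "6.9 seconds of copy" quoted above. The six phases together cover 55.63 seconds; the remainder of the call sits in the stretches between the listed intervals.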

Six sub-moments, pulled directly from the transcript

These are the sections of the call most buyer conversations skip, each tied to a precise interval on the timeline.

0.00 to 3.44 seconds, opening

'Hi. This is Denny on a recorded line. What can we get for you?' The greeting sets the consent flag, identifies the restaurant, and invites the order in under 4 seconds. Most human answered phones lose 2 to 3 rings of wall clock before anyone picks up; the AI version starts on ring one.

5.84 to 9.36 seconds, item capture

Customer: 'Can I get one lumberjack slim and, one Coke?' The technology catches a disfluent 'slim' and still binds it to Lumberjack Slam in the menu ontology. This is where speech to text vendors break: disfluencies, misspoken item names, partial words. The transcript shows the correct binding in the AI's next turn, which opens at 15.99s with 'For your lumberjack slam.'

15.99 to 25.66 seconds, clarifying questions

AI: 'For your lumberjack slam, how would you like your eggs cooked, and what kind of bread would you like? White, brown, multigrain, or sourdough? For your Coke, I'll add a soft drink to your order.' Two required modifiers and a clarification on the drink modifier, delivered as one coherent turn rather than three separate back and forths.

52.53 to 59.44 seconds, upsell pitch

AI: 'Before I finish up, would you like to add a sweet treat like a slice of New York style cheesecake? It's so good. It might make your Coke jealous.' 6.9 seconds of copy. The throwaway joke line is in the shipped recording. Most human operators skip the upsell to get off the phone; the technology runs it on every call.

62.47 to 68.31 seconds, accepted with a modifier

Customer: 'Sure. Yeah. I'll get one, slice of, the cheesecake. Can you add strawberries, if that's an option?' The customer not only accepts but adds a modifier unprompted. The AI confirms the strawberry topping at 72.15s. That is real, booked incremental revenue, and it is the reason the ticket ends at $34.11 instead of a lower base total.

89.12 to 97.84 seconds, POS write and readback

AI: 'Placing your order now. Done. Your total is $34.11, and your order will be ready for pickup at 12:45AM.' About 2.4 seconds of wall clock pass between 'Placing your order now' at 89.12s and 'Done' at 91.52s. The $34.11 matches what hits the POS. The 12:45 AM pickup time is promised at the same moment the ticket is committed.

Who speaks when, as a sequence

The same call, rendered as a message diagram so you can see where the AI consolidates turns that a human would typically split across multiple back and forths.

Denny's call, message by message (seconds)

Other categories a 2026 stack touches, for comparison

POS, Kitchen display, Online ordering, Self order kiosks, Loyalty + CRM, Delivery aggregation, Reservations, Scheduling + labor, Inventory + waste, QR + contactless pay, Analytics dashboard, AI phone answering, AI drive thru

Where the call fans out, once it lands

A voice category that does not touch the rest of the stack is a category stuck in 2018. The hub below is what the Denny’s recording looks like on a production deployment.

One inbound call, fanned out to the rest of the stack

What the transcript looks like as code

If you clone the repo and open src/components/voice-activity-data.ts, this is the first page. The full file is longer because it inlines the 6,141 sample envelope arrays for both speakers. The captions array is what this guide uses.

src/components/voice-activity-data.ts
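The listing itself is not reproduced in this copy of the guide. As a stand-in, here is a minimal sketch of the shape described above, assuming the field names quoted earlier (duration, sampleRate, envelopes, captions with speaker, start, end, and text). It is not the real file: the speaker labels, the envelope key names, the caption boundaries (taken from the intervals quoted in this guide), and both helper functions are assumptions for illustration.

```typescript
// Sketch only. The real file is auto-generated and far longer.
type SpeakerLabel = "ai" | "customer"; // label names are an assumption

interface CaptionEntry {
  speaker: SpeakerLabel;
  start: number; // seconds from the start of the call
  end: number;   // seconds from the start of the call
  text: string;
}

interface VoiceData {
  duration: number;   // seconds
  sampleRate: number; // envelope samples per second
  envelopes: Record<SpeakerLabel, number[]>; // key names are an assumption
  captions: CaptionEntry[];
}

const voiceDataSketch: VoiceData = {
  duration: 102.36,
  sampleRate: 60,
  envelopes: { ai: [], customer: [] }, // the real file inlines the samples
  captions: [
    // Three of the 46 captions, using intervals quoted in this guide.
    { speaker: "ai", start: 0.0, end: 3.44, text: "Hi. This is Denny on a recorded line. What can we get for you?" },
    { speaker: "customer", start: 5.84, end: 9.36, text: "Can I get one lumberjack slim and, one Coke?" },
    { speaker: "ai", start: 52.53, end: 59.44, text: "Before I finish up, would you like to add a sweet treat like a slice of New York style cheesecake? It's so good. It might make your Coke jealous." },
  ],
};

// Hypothetical helper: the caption being spoken at time t, if any.
function captionAt(data: VoiceData, t: number): CaptionEntry | undefined {
  return data.captions.find((c) => t >= c.start && t <= c.end);
}

// Hypothetical helper: samples per envelope if the generator truncates
// duration * sampleRate. Truncation is an assumption; it happens to
// reproduce the 6,141 figure quoted in this guide.
function expectedEnvelopeLength(data: VoiceData): number {
  return Math.floor(data.duration * data.sampleRate);
}

console.log(captionAt(voiceDataSketch, 6)?.text);
console.log(expectedEnvelopeLength(voiceDataSketch));
```

A lookup like captionAt is one plausible way a player could sync captions to playback time; the component in the repo may do this differently.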

Categories of technology in the restaurant industry, scored by whether you can hear them working

Some categories have a visible or audible moment where you know the technology ran. Others are invisible. That shifts what you buy first. The voice category is still the one most operators cannot hear inside their own building, which is the gap the Denny's transcript demonstrates end to end.

Category	What it sounds like	Audible or invisible
POS and kitchen display	A new ticket, a printer click, a KDS chime. Every operator knows the sound.	Audible and well instrumented. The reference shape of restaurant technology.
Self order kiosk	Customer tap sounds, confirmation ding. Physically present on the floor.	Audible, but only for in store customers. Nothing for phone traffic.
Online ordering and app	Notification chime when a digital order lands. Data is complete.	Audible in the kitchen. Silent to the caller, because the caller never gets there.
Loyalty and CRM	Rarely audible. Points accrual happens in the background; the customer hears nothing.	Invisible until redemption. Useful only if the upstream channels feed it.
Delivery aggregation middleware	Audible as a new order chime from DoorDash or Uber Eats. Familiar sound.	Audible when it works, deafening when a tablet goes down mid rush.
Voicemail or IVR tree	Robot prompts, hold music, tones. The customer hears it; the operator usually does not.	Audible to the caller and miserable. Operators rarely listen back to their own tree.
AI phone answering	Audible to the caller and recordable to the operator. Full transcripts available. The Denny's file is an instance of this category.	The subject of this guide. Audible, transcribed, and queryable.
AI drive thru	Audible, often to every car in line. Chipotle, White Castle, Checkers have all piloted it.	Audible at the speaker post. Same category family as phone AI, different surface.
Robotic prep equipment	Audible on the line. Whir of a makeline, hiss of an oven.	Audible and dramatic, which is why every trade publication covers it. Slowest payback of the audible categories.

The audible / invisible split is not a value judgment. Invisible categories can be great. It is a filter for what an operator can evaluate by walking the floor and listening, versus what requires a report.

Why one 102-second call is worth pricing

The call ends at a $34.11 ticket. The question is how many calls like it your line is missing. The same repo ships a live React calculator on the homepage that answers that, with the 35 percent rush hour miss rate hard coded as a constant.

src/app/page.tsx

At 80 calls per day and a $35 ticket, that formula returns $29,400 of monthly lost revenue. Drag the sliders on the homepage and plug in your own numbers. One 102-second call looks trivial until you multiply it by the fraction your current line is not catching.
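The calculator listing is not reproduced here either. A minimal sketch of the formula, assuming a 30-day month: that assumption is what makes 80 calls per day at a $35 ticket come out to the $29,400 quoted in the FAQ below. The 0.35 miss rate is the constant the guide describes; the function and constant names are hypothetical, not copied from src/app/page.tsx.

```typescript
// Share of calls assumed missed at rush hour (the guide's hard coded constant).
const MISSED_RATE = 0.35;

// Assumption: the homepage calculator multiplies by 30 days per month.
const DAYS_PER_MONTH = 30;

// Monthly revenue lost to unanswered calls, rounded to whole dollars.
function monthlyLostRevenue(callsPerDay: number, averageTicket: number): number {
  if (callsPerDay < 0 || averageTicket < 0) {
    throw new RangeError("callsPerDay and averageTicket must be non-negative");
  }
  return Math.round(callsPerDay * MISSED_RATE * averageTicket * DAYS_PER_MONTH);
}

console.log(monthlyLostRevenue(80, 35));
```

Swapping in your own call volume and ticket size is the whole exercise; the miss rate is the input most worth checking against your own phone logs.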

Audit your current phone line against the Denny's transcript

Does an answer start within one ring, including at rush, and at 12:45 AM?
Does the first sentence identify the restaurant and flag recording?
Can it absorb a disfluency (slim for slam) and still bind to the right menu item?
Does it ask required modifiers (eggs, bread) as one coherent turn?
Does it run an upsell, with item specific copy, on every call?
Does it handle unprompted modifiers (add strawberries) inside the upsell?
Does the order read back in full, in the customer's words, before send?
Does the POS write happen within a few seconds of placing the order?
Does the final turn promise a specific dollar amount and a specific pickup time?
Is the call archived as a transcript you can search afterwards?

“The experience was better than speaking to a human. No hold time, no confusion, no rushing. 90%+ of our calls are now handled end-to-end by PieLine, and we are projecting $500 in additional revenue per location per day.”

Jay Jayaraman

Owner, Mylapore (11 locations, Bay Area)

Want to hear a live call on your own menu, not Denny's?

Bring your existing phone number and your menu; we play back a real call recorded on your stack within the demo and point to where every timestamp would land in your POS.

Frequently asked questions

What counts as technology in the restaurant industry in 2026?

POS and kitchen display, online ordering, self order kiosks, contactless and QR payments, loyalty and CRM, AI phone answering and drive thru, reservations and waitlist platforms, delivery aggregation middleware, workforce and scheduling, inventory and waste, and the analytics layer that ties all of that together. The interesting question is which of those categories you can actually hear working. A POS beeps, a kiosk dings, a kitchen display chirps. The phone is the one place where most articles list 'voice AI' as a trend, and almost no article tells you what it actually sounds like.

What is the Denny's call this guide dissects?

A 102.36 second AI-handled phone order recorded on a real production phone line and shipped in the PieLine repo at public/audio/dennys-order.mp3. The audio is transcribed with Deepgram multichannel diarization, the result is stored in src/components/voice-activity-data.ts, and that file is imported by the hero visualization on aiphoneordering.com. The call contains a greeting, a two item order (Lumberjack Slam plus Coke), clarifying questions on eggs and bread, an upsell, an accepted upsell with a strawberry modifier, a full readback, a POS write, a total of $34.11, and a 12:45 AM pickup time.

Why dissect one call instead of publishing another trend list?

Because every restaurant operator has already read the trend lists. They know about AI, kiosks, and contactless. What they usually have not seen is what one unit of AI actually does in 102 seconds of real audio. The second-by-second walkthrough in this guide lets an operator compare what a modern restaurant voice stack handles to what their own phone line is doing tonight, and judge for themselves whether the gap is closeable in a quarter.

How many captions are in the transcript, and where do they live in the repo?

46 captions across 102.36 seconds, 30 from the AI and 16 from the customer. The file lives at src/components/voice-activity-data.ts and exports a typed object with duration, sampleRate, an envelopes object containing per-speaker amplitude arrays, and a captions array of speaker, start, end, and text fields. A comment at the top of the file says it is auto-generated from Deepgram multichannel transcription of public/audio/dennys-order.mp3.
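If you want to check the 30 to 16 split yourself, a tally over the captions array is enough. The sketch below assumes each caption carries a speaker field as described; the speaker label values and the three sample rows are illustrative, not copied from the file.

```typescript
interface CaptionRow {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Count how many captions belong to each speaker label.
function countBySpeaker(rows: CaptionRow[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    counts[row.speaker] = (counts[row.speaker] || 0) + 1;
  }
  return counts;
}

// Illustrative rows only; in the repo you would pass voiceData.captions.
const sampleRows: CaptionRow[] = [
  { speaker: "ai", start: 0.0, end: 3.44, text: "Hi. This is Denny on a recorded line. What can we get for you?" },
  { speaker: "customer", start: 5.84, end: 9.36, text: "Can I get one lumberjack slim and, one Coke?" },
  { speaker: "ai", start: 89.12, end: 91.52, text: "Placing your order now. Done." },
];

const tally = countBySpeaker(sampleRows);
console.log(tally);
```

Run against the full captions array, the same function should report the per-speaker counts quoted above.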

When does the upsell happen and does it actually work?

The cheesecake upsell starts at 52.525 seconds with 'Before I finish up, would you like to add a sweet treat like a slice of New York style cheesecake?' and finishes at 59.44 seconds with the throwaway line 'It might make your Coke jealous.' The customer accepts at 62.465 seconds with 'Sure. Yeah.' and then adds a strawberry modifier at 65.985 seconds. That is just under 10 seconds from the start of the offer to the acceptance, and about 13.5 seconds to the strawberry request. It is the part of the call that lifts the ticket from the base Lumberjack Slam plus Coke to the final $34.11.

How fast does the POS write happen?

'Placing your order now' is spoken at 89.12 seconds, 'Done' at 91.52 seconds. So roughly 2.4 seconds of wall clock between 'I am about to send this' and 'it is in.' The total wall clock from the customer finishing their first order sentence at 9.36 seconds to the POS write at 89.12 seconds is just under 80 seconds, including clarification, upsell, and full readback confirmation.
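The two intervals in that answer are plain subtraction on the quoted timestamps. A small sketch, with hypothetical names, that rounds to two decimals to keep floating point noise out of the result:

```typescript
// Timestamps quoted in this guide, in seconds from the start of the call.
const ORDER_SENTENCE_END = 9.36; // customer finishes the first order sentence
const PLACING_ORDER_AT = 89.12;  // "Placing your order now"
const DONE_AT = 91.52;           // "Done"

// Elapsed seconds between two timestamps, rounded to two decimals.
function secondsBetween(from: number, to: number): number {
  return Math.round((to - from) * 100) / 100;
}

const posWriteLatency = secondsBetween(PLACING_ORDER_AT, DONE_AT);
const orderToWrite = secondsBetween(ORDER_SENTENCE_END, PLACING_ORDER_AT);

console.log(`send to confirm: ${posWriteLatency}s`);
console.log(`first order sentence to POS write: ${orderToWrite}s`);
```

Note that the 2.4 seconds is what the caller hears between two spoken phrases. The transcript alone does not show how much of that is the POS round trip versus speech synthesis.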

Which POS systems can this kind of technology write into on day one?

PieLine ships live integrations with Clover, Square, Toast, NCR Aloha, and Revel, and the homepage and llms.txt both claim over 50 POS integrations in total. Those five cover the overwhelming majority of independent and mid market installs in North America. The POS write in the Denny's recording is the last stage of the same pipeline, which is what lets 'Your total is $34.11' match what the kitchen prints for the cook.

How does this connect to the 35 percent missed call rate everyone quotes?

The PieLine homepage ships a React calculator at src/app/page.tsx, lines 221 to 230, that takes calls per day and average ticket and returns monthly lost revenue. The fixed constant missedRate is 0.35. At 80 calls per day and a $35 ticket, the calculator returns $29,400 in monthly lost revenue. The Denny's call is the shape of what one recovered call looks like. Multiply that by the share of the missed 35 percent that gets recovered, and that is the order of magnitude of the technology's effect, per location, per month.

Does the call happen at a normal time of day or at 12:45 in the morning?

The pickup time quoted in the transcript is 12:45 AM, which is one of the clearer signals that this kind of technology is not a daytime aid. Denny's, diners, late night Indian and Chinese operations, and 24 hour QSR chains are exactly where the technology pays back fastest, because those windows are when the line cook is on and a human phone answerer is not.

Can a restaurant operator listen to the file themselves?

Yes. The audio file is served at /audio/dennys-order.mp3 from aiphoneordering.com because it sits in public/audio/ and Next.js serves the public directory verbatim. The hero clip on the homepage plays the audio with a dual channel envelope visualization driven by the same voice-activity-data.ts file this guide references. There is no paywall and no signup required to confirm what the technology sounds like.

How does this compare to a human answered call?

A human taking the same order takes longer per call (closer to 3 minutes with upsell), handles one line at a time, and produces almost no structured data afterward. A dedicated phone host costs $3,000 to $4,000 per month fully loaded and still cannot pick up call two during call one. PieLine is documented as handling up to 20 simultaneous calls at $350 per month flat for the first 1,000 answered calls, which is the math that moves voice from a staffing problem to a technology problem.
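The cost side of that comparison can be sketched with the figures quoted above. The names are hypothetical, and pricing beyond the first 1,000 answered calls is not stated in this guide, so the sketch refuses to extrapolate past that point.

```typescript
// Figures quoted in this guide, in USD per month.
const HOST_MONTHLY_LOW = 3000;
const HOST_MONTHLY_HIGH = 4000;
const FLAT_MONTHLY_FEE = 350;
const INCLUDED_CALLS = 1000; // answered calls covered by the flat fee

// Monthly difference between a dedicated phone host and the flat fee.
function monthlyDifference(hostMonthlyCost: number): number {
  return hostMonthlyCost - FLAT_MONTHLY_FEE;
}

// Cost per answered call under the flat fee. Only defined up to the
// included volume, because overage pricing is not given in the guide.
function flatCostPerCall(answeredCalls: number): number {
  if (answeredCalls <= 0 || answeredCalls > INCLUDED_CALLS) {
    throw new RangeError("only defined for 1 to 1,000 answered calls");
  }
  return FLAT_MONTHLY_FEE / answeredCalls;
}

console.log(monthlyDifference(HOST_MONTHLY_LOW), monthlyDifference(HOST_MONTHLY_HIGH));
console.log(flatCostPerCall(INCLUDED_CALLS));
```

On those inputs the monthly gap runs from $2,650 to $3,650, and a location that uses all 1,000 included calls pays $0.35 per answered call.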

Which restaurants have already run this technology in production?

Mylapore, an 11 location South Indian chain in the Bay Area, is rolling PieLine across all locations and projects roughly $500 in recovered revenue per location per day, roughly $2M per year at full rollout. Idly Express in Almaden runs more than 90 percent of calls end to end on the AI, with the remaining sliver transferred to staff with the conversation context attached. Amber India is onboarding. These are the reference installs cited on aiphoneordering.com and in the PieLine llms.txt.
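The "roughly $2M per year" figure follows from the per-location projection if every location trades 365 days a year. That operating-day count is an assumption, as are the names in this sketch.

```typescript
const REVENUE_PER_LOCATION_PER_DAY = 500; // USD, projected
const LOCATION_COUNT = 11;
const OPERATING_DAYS_PER_YEAR = 365; // assumption: open every day

const projectedAnnualRevenue =
  REVENUE_PER_LOCATION_PER_DAY * LOCATION_COUNT * OPERATING_DAYS_PER_YEAR;

console.log(projectedAnnualRevenue.toLocaleString("en-US"));
```

That works out to $2,007,500, so the rounded $2M holds only if the projection is hit at all 11 locations on every day of the year.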
