Voice agent internals

How a restaurant voice agent keeps the order coherent when the customer keeps changing it

Every voice AI vendor page says “handles modifications.” None of them show what that actually means. Below is the cart, frame by frame, across the four moments it changes during PieLine’s public 102-second demo, plus the file in the repo where you can verify every timestamp yourself.

Matthew Diakonov
6 min read

Direct answer (verified 2026-05-16)

How does AI keep order state across changes on a restaurant phone call?

PieLine treats the cart as one mutable object for the whole call, rewrites it on every customer turn, and reads the entire rewritten cart back using the POS’s display names (not the caller’s words) before anything is posted. New modifiers are only allowed to resolve against IDs the POS returned during onboarding, so the LLM cannot invent a topping the kitchen does not carry. Verified against the captions file at src/components/voice-activity-data.ts and the audio playing at aiphoneordering.com.

4 mutations / 102.36s

The cart object mutates four times across the 102.36-second public demo. Each mutation is followed by a full-cart read-back in POS canonical names, not the caller's words.

src/components/voice-activity-data.ts

Why this is the one thing worth testing on any voice AI vendor

Customers do not order in straight lines. They add something, remember a second item, ask if a topping exists, change their mind, ask for a different bread. A voice agent that can take a single clean order is table stakes. A voice agent that can keep the cart coherent across five or six customer turns of additions, swaps, and remaps is the actual product.

The failure mode that costs restaurants money is not the obviously broken call (no one’s ordering through a system that hangs up). It’s the call that completes, gets posted to the POS, and the kitchen prints something different from what the customer heard on the phone. The receipt says “soft drink” but the caller ordered “Coke.” The kitchen ticket says “scrambled” but the caller asked for “over easy.” The cheesecake comes out plain because the agent forgot the strawberry add at the end.

The way to prevent that is not to be clever about state tracking. It is to rule out state drift by construction: one cart object, full read-back on every change, POS names not caller words, no invented IDs, and a total that comes from the POS. The next section walks through how each of those rules shows up in the demo.

The five rules, applied

Same call, viewed as a state machine instead of a transcript. Each rule shows up at least once in the walkthrough below.

1. One object, never two

The cart lives as a single mutable structure for the whole call. There is no draft cart, no shadow copy, no 'pending changes' queue. Every customer turn rewrites the object in place and the next read-back reads the new object verbatim.

2. Read the whole cart on every change

When the customer adds, removes, or modifies anything, the agent does not say 'okay, added strawberries.' It re-states the entire cart with the change folded in. The customer hears the same shape they will see on the receipt. Drift is impossible because the words spoken and the words printed come from the same source: the POS's display strings.

3. Caller words in, POS names out

The caller says 'Coke' and the agent confirms 'soft drink'. The caller says 'strawberries' and the agent confirms 'strawberry topping'. Every input is normalized to the POS's canonical name the first time the agent acknowledges it, so by the time the cart is read back the caller has already heard the kitchen's vocabulary and has had a chance to push back.

4. Refuse to invent new IDs

The agent's tool surface only accepts modifier IDs the POS returned during onboarding. There is no path for the LLM to fabricate 'mod_topping_strawberry' if it does not exist. If a caller asks for something that is not in the catalog, the agent has to either say so or offer the closest real option from the same group.

5. Cart total comes from the POS, not the agent

When the call hits 'Placing your order now', the agent serializes the cart, posts it, and waits for the POS to come back with a total. The agent reads back $34.11 because the POS said $34.11. The LLM never sums prices itself. If the cart and the POS disagree on what's possible, the POS wins and the agent has to recover by talking, not by quietly tweaking the order.
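The first two rules can be sketched in a few lines of TypeScript. This is a minimal illustration, not PieLine's actual code: the type names, the item and modifier IDs, and the wording of the read-back sentence are all assumptions made for the example. The point it shows is that the read-back is rendered from the cart object itself, using only POS display strings, so there is no second source of truth to drift from.

```typescript
// Hypothetical shapes; names and IDs are illustrative, not PieLine's real types.
interface CartModifier {
  groupId: string;     // e.g. "eggs"
  modifierId: string;  // an ID the POS returned during onboarding
  displayName: string; // the POS display string, e.g. "scrambled eggs"
}

interface CartLine {
  itemId: string;
  displayName: string; // POS display string, never the caller's wording
  modifiers: CartModifier[];
}

interface Cart {
  lines: CartLine[];
}

// Join phrases the way a person would say a list aloud.
function joinSpoken(parts: string[]): string {
  if (parts.length <= 1) return parts.join("");
  if (parts.length === 2) return parts[0] + " and " + parts[1];
  return parts.slice(0, -1).join(", ") + ", and " + parts[parts.length - 1];
}

// Rule 2: always render the whole cart, never just the latest change.
function readBack(cart: Cart): string {
  const phrases = cart.lines.map((line) => {
    const mods = line.modifiers.map((m) => m.displayName);
    return mods.length === 0
      ? "a " + line.displayName
      : "a " + line.displayName + " with " + joinSpoken(mods);
  });
  return "Your order is " + joinSpoken(phrases) + ".";
}

// Rule 1: one cart object for the whole call, mutated in place.
const cart: Cart = { lines: [] };
cart.lines.push({
  itemId: "item_lumberjack_slam",
  displayName: "Lumberjack Slam",
  modifiers: [
    { groupId: "eggs", modifierId: "mod_eggs_scrambled", displayName: "scrambled eggs" },
    { groupId: "bread", modifierId: "mod_bread_sourdough", displayName: "sourdough bread" },
  ],
});
cart.lines.push({ itemId: "item_soft_drink", displayName: "soft drink", modifiers: [] });

console.log(readBack(cart));
```

Because `readBack` takes the whole cart and nothing else, an "okay, added" style acknowledgement is not something this function can produce.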

The cart object, mutation by mutation

This is the cart as it exists in the agent’s memory at each turn of the public demo. The diff column shows what changed since the previous snapshot. The bottom of each card explains why the change happened that way and not another way.

00:08.96 · Initial order parsed
+ Lumberjack Slam (incomplete), + Coke (unmapped)

Caller: Can I get one Lumberjack Slam and one Coke?

Cart state after this turn

  1. Lumberjack Slam · (needs: eggs prep, bread choice)
  2. Coke · unmapped: no POS SKU named 'Coke'

The cart is created the moment the caller finishes the first item. The agent already knows the Slam has two required modifier groups that are missing values, and that 'Coke' is a brand name that does not match a single POS SKU.
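One way to picture how the agent "already knows" the Slam is incomplete: compare the required modifier groups on the catalog item against what the cart line has so far. The sketch below assumes a catalog shape with a `required` flag per group; that shape and the group IDs are hypothetical.

```typescript
// Hypothetical catalog shape; group and item IDs are made up for illustration.
interface ModifierGroup {
  groupId: string;
  required: boolean;
}

interface CatalogItem {
  itemId: string;
  displayName: string;
  groups: ModifierGroup[];
}

interface PendingLine {
  itemId: string;
  // groupId -> chosen modifier ID; an absent key means "not answered yet"
  chosen: { [groupId: string]: string };
}

// Return the required groups that still have no value on this line.
function missingRequiredGroups(item: CatalogItem, line: PendingLine): string[] {
  const missing: string[] = [];
  for (const group of item.groups) {
    if (group.required && line.chosen[group.groupId] === undefined) {
      missing.push(group.groupId);
    }
  }
  return missing;
}

const slam: CatalogItem = {
  itemId: "item_lumberjack_slam",
  displayName: "Lumberjack Slam",
  groups: [
    { groupId: "eggs", required: true },
    { groupId: "bread", required: true },
    { groupId: "extras", required: false },
  ],
};

const slamLine: PendingLine = { itemId: slam.itemId, chosen: {} };
console.log(missingRequiredGroups(slam, slamLine)); // both required groups are still unanswered
```

The list of missing groups is what drives the agent's follow-up questions; an empty list means the line is complete.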

00:23.10 · Brand-to-SKU remap surfaced aloud
~ Coke → Soft drink

Agent: For your Coke, I'll add a soft drink to your order.

Cart state after this turn

  1. Lumberjack Slam · (needs: eggs prep, bread choice)
  2. Soft drink · remapped from 'Coke'

The POS in this demo has one generic 'soft drink' SKU, not brand-named sodas. The agent does not silently rewrite the line and hope the caller does not notice on the receipt. It says the new name out loud at 23.10 seconds so the printed ticket cannot embarrass anyone.
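A remap that is always spoken can be modeled as a resolver that returns both the POS item and the sentence the agent must say. The alias table, the item ID, and the function name below are assumptions for illustration; only the spoken sentence is taken from the demo.

```typescript
// Hypothetical alias table, built at onboarding time from the POS catalog.
// Keys are lowercased caller phrases; values are POS item IDs.
const aliasToItemId: { [phrase: string]: string } = {
  "coke": "item_soft_drink",
  "sprite": "item_soft_drink",
  "soft drink": "item_soft_drink",
};

// POS display names keyed by item ID (illustrative values).
const displayNames: { [itemId: string]: string } = {
  "item_soft_drink": "soft drink",
};

interface ResolvedItem {
  itemId: string;
  displayName: string;
  // Non-null only when the caller's wording differs from the POS name,
  // which is the case where the agent has to say the new name out loud.
  announcement: string | null;
}

function resolveItem(callerPhrase: string): ResolvedItem | null {
  const phrase = callerPhrase.trim();
  const key = phrase.toLowerCase();
  const itemId = aliasToItemId[key];
  if (itemId === undefined) return null; // not in the catalog: say so, do not guess
  const displayName = displayNames[itemId];
  const remapped = key !== displayName.toLowerCase();
  return {
    itemId,
    displayName,
    announcement: remapped
      ? "For your " + phrase + ", I'll add a " + displayName + " to your order."
      : null,
  };
}

const coke = resolveItem("Coke");
console.log(coke ? coke.announcement : "not on the menu");
```

Putting the announcement in the return value, rather than leaving it to the caller of `resolveItem`, makes a silent remap harder to write by accident.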

00:35.15 · Required modifiers attached
~ Lumberjack Slam: + eggs=scrambled, + bread=sourdough

Caller: I'll go with sourdough bread. And for the eggs, scrambled.

Cart state after this turn

  1. Lumberjack Slam · eggs: scrambled, bread: sourdough
  2. Soft drink

The caller answered both required groups in one breath, in reverse order from how the agent asked. The slotting is associative, not positional: 'sourdough' goes to the bread group, 'scrambled' goes to the eggs group, regardless of which word was spoken first.
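Associative slotting can be sketched as a lookup from each recognized option to the group it belongs to, so word order never matters. This is a deliberately naive version that matches substrings in the transcript; the option table and IDs are hypothetical, and a production agent would presumably rely on the LLM's tool call rather than string matching.

```typescript
// Hypothetical option table: spoken word -> the group and modifier it belongs to.
interface ModifierOption {
  groupId: string;
  modifierId: string;
  displayName: string;
}

const slamOptions: { [word: string]: ModifierOption } = {
  "scrambled": { groupId: "eggs", modifierId: "mod_eggs_scrambled", displayName: "scrambled eggs" },
  "over easy": { groupId: "eggs", modifierId: "mod_eggs_over_easy", displayName: "over easy eggs" },
  "sourdough": { groupId: "bread", modifierId: "mod_bread_sourdough", displayName: "sourdough bread" },
  "multigrain": { groupId: "bread", modifierId: "mod_bread_multigrain", displayName: "multigrain bread" },
};

// Slot each recognized option into its own group, whatever order it was spoken in.
function slotModifiers(utterance: string): { [groupId: string]: ModifierOption } {
  const text = utterance.toLowerCase();
  const slots: { [groupId: string]: ModifierOption } = {};
  for (const word of Object.keys(slamOptions)) {
    if (text.indexOf(word) !== -1) {
      const option = slamOptions[word];
      slots[option.groupId] = option; // keyed by group, not by position in the sentence
    }
  }
  return slots;
}

const slots = slotModifiers("I'll go with sourdough bread. And for the eggs, scrambled.");
console.log(slots["eggs"].displayName, "/", slots["bread"].displayName);
```

Reversing the sentence ("scrambled, and sourdough bread") fills the same two slots, because the result is keyed by group ID rather than by the order the agent asked in.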

00:38.04 · Mid-call read-back (no mutation, sanity check)
(no change; whole cart spoken back)

Agent: So that's a Lumberjack Slam with scrambled eggs and sourdough bread, plus a soft drink. Anything else for your order?

Cart state after this turn

  1. Lumberjack Slam · eggs: scrambled, bread: sourdough
  2. Soft drink

This is the first full-cart read-back. Not a delta, the whole cart. The agent uses 'soft drink' and not 'Coke', and 'scrambled eggs' and 'sourdough bread' in the exact words the kitchen ticket will print. If the caller mis-heard or mis-said anything, the cheapest moment to fix it is right now.

01:03.10 · Upsell-driven add
+ New York style cheesecake

Caller: I'll get one slice of the cheesecake.

Cart state after this turn

  1. Lumberjack Slam · eggs: scrambled, bread: sourdough
  2. Soft drink
  3. New York style cheesecake

The agent offered cheesecake as an upsell at 52.52 seconds and the caller accepted at 62.46 seconds. The cheesecake is added with its canonical POS name. No modifiers yet.

01:11.34 · Free-form modifier resolved against POS catalog
~ New York style cheesecake: + topping=strawberry

Caller: Can you add strawberries, if that's an option?

Agent: You got it. One slice of New York style cheesecake with strawberry topping.

Cart state after this turn

  1. Lumberjack Slam · eggs: scrambled, bread: sourdough
  2. Soft drink
  3. New York style cheesecake · topping: strawberry

The word 'strawberries' is not a SKU. The agent checks whether the cheesecake item has a strawberry modifier in any of its groups, finds 'mod_topping_strawberry', and confirms back at 71.34 seconds using the POS's own display name. If the modifier had not existed in the catalog, the agent would have said so out loud and offered the closest real option, not invented one.
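The lookup described here can be sketched as a search over the modifier groups attached to one item, with an explicit "not found" result instead of a guess. The catalog shape, the caramel entry, and the crude plural-stripping are assumptions for illustration; only `mod_topping_strawberry` and the display names come from the demo. For brevity the fallback returns every real option on the item rather than picking the closest one.

```typescript
interface Modifier {
  modifierId: string;
  displayName: string; // POS display string
}

interface ModifierGroup {
  groupId: string;
  modifiers: Modifier[];
}

interface CatalogItem {
  itemId: string;
  displayName: string;
  groups: ModifierGroup[];
}

// Very rough singularizer: "strawberries" -> "strawberry", "toppings" -> "topping".
function singular(word: string): string {
  const w = word.trim().toLowerCase();
  if (/ies$/.test(w)) return w.replace(/ies$/, "y");
  if (/s$/.test(w)) return w.replace(/s$/, "");
  return w;
}

type Resolution =
  | { kind: "found"; groupId: string; modifier: Modifier }
  | { kind: "not_found"; alternatives: string[] };

// Search only the groups attached to this item; never construct an ID from the caller's word.
function resolveModifier(item: CatalogItem, callerWord: string): Resolution {
  const needle = singular(callerWord);
  const alternatives: string[] = [];
  for (const group of item.groups) {
    for (const modifier of group.modifiers) {
      if (modifier.displayName.toLowerCase().indexOf(needle) !== -1) {
        return { kind: "found", groupId: group.groupId, modifier };
      }
      alternatives.push(modifier.displayName);
    }
  }
  return { kind: "not_found", alternatives };
}

const cheesecake: CatalogItem = {
  itemId: "item_ny_cheesecake",
  displayName: "New York style cheesecake",
  groups: [
    {
      groupId: "topping",
      modifiers: [
        { modifierId: "mod_topping_strawberry", displayName: "strawberry topping" },
        { modifierId: "mod_topping_caramel", displayName: "caramel topping" }, // hypothetical
      ],
    },
  ],
};

const strawberry = resolveModifier(cheesecake, "strawberries");
console.log(strawberry.kind === "found" ? strawberry.modifier.displayName : strawberry.alternatives);
```

The `not_found` branch carries only names that exist in the catalog, which is what lets the agent offer a real alternative instead of agreeing to something the kitchen cannot make.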

01:15.42 · Final full-cart read-back before POS post
(no change; whole cart spoken back)

Agent: Can I confirm your order is a Lumberjack Slam with scrambled eggs and sourdough bread, a soft drink, and a New York style cheesecake with strawberry topping? Is that correct?

Cart state after this turn

  1. Lumberjack Slam · eggs: scrambled, bread: sourdough
  2. Soft drink
  3. New York style cheesecake · topping: strawberry

The cart that the caller hears at 75.42 seconds is the cart that gets posted at 89.12 seconds. The 2.4 seconds between 'Placing your order now' and 'Done. Your total is $34.11' is the POS round-trip. Total is the POS's number, not the agent's.
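"Total is the POS's number" can be expressed as a function that builds the closing sentence purely from the POS response, with no price arithmetic on the agent side. The response shape below is an assumption (real POS APIs differ), and the rejection wording is invented for the example.

```typescript
// Hypothetical POS response shape; real POS APIs differ.
type PosResponse =
  | { accepted: true; totalCents: number }
  | { accepted: false; reason: string };

// Format integer cents as dollars without floating-point rounding.
function formatDollars(cents: number): string {
  const dollars = Math.floor(cents / 100);
  const rem = cents % 100;
  return "$" + dollars + "." + (rem < 10 ? "0" + rem : "" + rem);
}

// The agent never sums prices: the spoken total is whatever the POS returned.
function closingLine(response: PosResponse): string {
  if (response.accepted) {
    return "Done. Your total is " + formatDollars(response.totalCents) + ".";
  }
  // The POS wins: recover by talking, not by quietly changing the order.
  return "I wasn't able to place that order: " + response.reason + ". Let's sort that out.";
}

console.log(closingLine({ accepted: true, totalCents: 3411 }));
```

Note that `closingLine` does not take the cart as an argument at all, so there is no way for a locally computed total to leak into what the caller hears.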

Timestamps from src/components/voice-activity-data.ts, generated from public/audio/dennys-order.mp3 via the build script referenced in the file header. Audio plays in the public demo at aiphoneordering.com.

The two state changes most vendor pages quietly skip

Adding the cheesecake at 63.10 seconds is the easy case. Two harder cases live inside the same call, and most vendor write-ups skip both because acknowledging them invites questions the vendor would rather not answer.

1. The brand-to-SKU remap at 23.10 seconds

The caller asked for a Coke. The POS does not have a Coke SKU; it has a generic soft-drink line item. A naive agent quietly rewrites “Coke” to “soft drink” and moves on. The receipt at pickup says “soft drink” and the caller is mildly confused: did they get what they asked for? PieLine’s agent surfaces the remap aloud at 23.10 seconds: “For your Coke, I’ll add a soft drink to your order.” That one sentence converts a downstream confusion-at-pickup into an upstream confirmation-during-call. If the caller wants to push back (“do you have Pepsi?”), this is the moment.

2. The free-form-modifier resolution at 71.34 seconds

“Can you add strawberries, if that’s an option?” is the kind of turn where most demos cut to a fade. A guess-it agent guesses, the kitchen does not have strawberries, the customer gets a topping-less cheesecake and a surprise line item on the bill. PieLine’s agent does the actual lookup: enumerate the modifier groups attached to the New York style cheesecake, search for any name that resolves to “strawberry,” find mod_topping_strawberry, confirm back in the POS’s display name: “One slice of New York style cheesecake with strawberry topping.” If the lookup had failed, the agent would have said so out loud and offered the closest real option, not invented one.

Both of these are state mutations the caller cannot see, and both are surfaced in the spoken turn so the cart the caller hears matches the cart that gets posted.

What you should run on a demo before signing a contract

If you are evaluating any voice AI vendor for your restaurant, run this four-turn script on their demo line. It exercises the exact state-mutation paths above. Vendors that handle it cleanly are doing the work. Vendors that read back a delta instead of the whole cart, or that quietly remap items without saying so, are leaving you exposed.

  1. Order two items where one of them is a brand name the chain probably does not stock verbatim (Coke, Sprite, Heinz ketchup). Listen for the remap-aloud at the next agent turn. If the agent does not surface the new name, the receipt will surprise you.
  2. Order an item with at least one required modifier (a sandwich with a bread choice, a pizza with a size). After the agent asks, answer both required questions in one breath in reverse order. Confirm the read-back attaches them to the right groups.
  3. Mid-call, add a third item with a free-form modifier that may or may not exist (“can you add jalapeños to the burger?”). Listen for whether the agent does the actual catalog lookup or whether it cheerfully agrees and moves on.
  4. Before confirming, change your mind about one of the original items (“actually, scratch the soft drink”). The next agent turn should read back the whole remaining cart, not just say “okay, removed.” If you only hear an acknowledgement, ask the agent to read your order back. If it cannot, the cart and the POS may already be out of sync.

See the same four turns on your menu, with your POS

We put your real menu and a Toast, Square, Clover, or NCR Aloha sandbox behind the agent and run the script above live on a call with you. About 20 minutes.

Frequently asked questions

What does 'state across order changes' actually mean for a restaurant voice agent?

It means: when a caller adds a fourth item, removes a second item, changes a modifier on something already in the cart, or remaps a request the POS does not stock, the agent's internal cart and the cart it speaks back have to stay in lockstep. If the agent says one thing and the POS receives another, the kitchen prints the wrong ticket. PieLine's design is to treat the cart as a single mutable object for the whole call, rewrite it on every customer turn, and read the full rewritten cart back so the caller hears exactly what the receipt will say.

How many cart mutations happen in PieLine's public demo, and where can I verify them?

Four. (1) Initial parse at 8.96 seconds creates the cart with Lumberjack Slam plus an unmapped 'Coke'. (2) At 23.10 seconds the agent remaps 'Coke' to 'soft drink' out loud. (3) At 35.15 seconds the required modifiers (scrambled, sourdough) attach to the Slam. (4) At 71.34 seconds the cheesecake gets a 'strawberry topping' modifier resolved from the caller's free-form 'can you add strawberries' at 65.98 seconds. The Deepgram timestamps for every turn live in src/components/voice-activity-data.ts in the PieLine repo and the audio plays inside the demo widget at https://aiphoneordering.com.

Why read the whole cart back on every change instead of just the delta?

Two reasons. First, a delta hides drift. If the agent only says 'okay, strawberries added' the caller has no way to notice that the soft drink got dropped or that the bread changed from sourdough to multigrain because of an LLM hallucination. A full read-back exposes drift the second it happens. Second, the words the agent uses in the read-back are the words the POS will print, so the caller's last chance to catch a mis-mapped item ('soft drink' instead of 'Coke', 'strawberry topping' instead of 'strawberries') is when the cart is spoken back as a whole.

What stops the LLM from fabricating a modifier ID that does not exist in the POS?

Tooling. The agent does not write to the cart by emitting JSON the application then trusts. It calls a tool whose argument schema only accepts modifier IDs the POS returned during onboarding. If the LLM tries to call add_modifier with 'mod_topping_strawberry_drizzle' and that ID was never registered, the call fails before it reaches the cart. The same constraint applies to item IDs. The agent's surface is a closed set of IDs ingested from the POS catalog, not the open set of strings the LLM can generate.
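A rough approximation of that closed set, written as a runtime check inside a tool handler: the handler rejects any modifier ID that was not registered at onboarding before it touches the cart. The handler signature, the result type, and the registered IDs are hypothetical; a real deployment might enforce the same constraint in the tool's argument schema instead.

```typescript
// IDs ingested from the POS catalog at onboarding (illustrative values).
const registeredModifierIds: { [id: string]: true } = {
  mod_topping_strawberry: true,
  mod_eggs_scrambled: true,
  mod_bread_sourdough: true,
};

interface CartLine {
  itemId: string;
  modifierIds: string[];
}

interface Cart {
  lines: CartLine[];
}

type ToolResult = { ok: true } | { ok: false; error: string };

// Hypothetical handler behind an add_modifier tool call.
function addModifier(cart: Cart, lineIndex: number, modifierId: string): ToolResult {
  // Closed-set check: fail before the cart is touched.
  if (!Object.prototype.hasOwnProperty.call(registeredModifierIds, modifierId)) {
    return { ok: false, error: "unknown modifier ID: " + modifierId };
  }
  const line = cart.lines[lineIndex];
  if (line === undefined) {
    return { ok: false, error: "no cart line at index " + lineIndex };
  }
  line.modifierIds.push(modifierId);
  return { ok: true };
}

const demoCart: Cart = { lines: [{ itemId: "item_ny_cheesecake", modifierIds: [] }] };
const drizzleAttempt = addModifier(demoCart, 0, "mod_topping_strawberry_drizzle");
const strawberryAttempt = addModifier(demoCart, 0, "mod_topping_strawberry");
console.log(drizzleAttempt.ok, strawberryAttempt.ok, demoCart.lines[0].modifierIds);
```

The failed call leaves the cart unchanged, so the only thing the agent can do with an unregistered ID is tell the caller it is not available.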

What if the caller wants to remove or change an item already in the cart?

Same shape. Removing 'the cheesecake' rewrites the cart object to drop the cheesecake line and triggers a full read-back of the remaining cart in the next agent turn. Changing 'make the eggs over easy instead' rewrites the Lumberjack Slam's eggs modifier from 'scrambled' to 'over easy' and triggers a full read-back. The mutation rule is the same whether the change is an add, a remove, or a swap: rewrite the cart, then read the whole thing back, then wait for confirmation before doing anything irreversible (like posting to the POS).
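The "same shape for add, remove, and swap" rule can be sketched as one function that applies a mutation in place and always returns a full read-back. The mutation type, the cart shape, and the read-back wording here are assumptions for illustration; posting to the POS is deliberately left out, since it happens only after the caller confirms.

```typescript
interface CartLine {
  displayName: string;
  modifiers: { [groupId: string]: string }; // groupId -> POS display name
}

interface Cart {
  lines: CartLine[];
}

type Mutation =
  | { kind: "add"; line: CartLine }
  | { kind: "remove"; lineIndex: number }
  | { kind: "setModifier"; lineIndex: number; groupId: string; displayName: string };

// Render every line of the cart, with its modifiers, in POS display names.
function renderCart(cart: Cart): string {
  return cart.lines
    .map((line) => {
      const mods = Object.keys(line.modifiers).map((g) => line.modifiers[g]);
      return mods.length > 0 ? line.displayName + " with " + mods.join(" and ") : line.displayName;
    })
    .join("; ");
}

// Apply one change in place, then return the full read-back. Nothing is posted here.
function applyAndReadBack(cart: Cart, mutation: Mutation): string {
  switch (mutation.kind) {
    case "add":
      cart.lines.push(mutation.line);
      break;
    case "remove":
      cart.lines.splice(mutation.lineIndex, 1);
      break;
    case "setModifier":
      cart.lines[mutation.lineIndex].modifiers[mutation.groupId] = mutation.displayName;
      break;
  }
  return "So that's " + renderCart(cart) + ". Is that correct?";
}

const orderCart: Cart = {
  lines: [
    { displayName: "Lumberjack Slam", modifiers: { eggs: "scrambled eggs", bread: "sourdough bread" } },
    { displayName: "soft drink", modifiers: {} },
    { displayName: "New York style cheesecake", modifiers: {} },
  ],
};

// "Make the eggs over easy instead", then "scratch the cheesecake".
console.log(applyAndReadBack(orderCart, { kind: "setModifier", lineIndex: 0, groupId: "eggs", displayName: "over easy eggs" }));
console.log(applyAndReadBack(orderCart, { kind: "remove", lineIndex: 2 }));
```

Every branch of the `switch` falls through to the same return statement, so there is no mutation path that ends in a bare acknowledgement.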

How is this different from how a human cashier handles changes on a phone call?

A human cashier at a busy restaurant almost never re-reads the full order after every change. They acknowledge the change ('okay, scratch the fries') and move on. That's fast, but it means the customer cannot hear what they will be billed for until pickup. With PieLine's full-cart read-back the customer hears the receipt before the kitchen prints it. The trade-off is a slightly longer call (about 4 to 6 seconds per read-back) in exchange for the wrong-order rate falling from roughly 25% on staff-handled phone orders during rush to under 5% on agent-handled calls.

Where in the codebase can I verify the timestamps and behavior on this page?

Two files. The audio is at public/audio/dennys-order.mp3. The captions, with start and end timestamps for every word, are at src/components/voice-activity-data.ts (auto-generated from the audio via scripts/build-voice-data.py per the file header). Search the captions file for 'For your Coke, I will add a soft drink' (23.10s), 'sourdough bread' and 'scrambled' (29.39 to 35.15s), 'can you add strawberries' (65.98s), 'strawberry topping' (71.34s), and 'Can I confirm your order is' (75.42s). The same audio plays in the public demo at https://aiphoneordering.com so the timings are reproducible without booking anything.

PieLine · AI Phone Ordering for Restaurants
© 2026 PieLine. All rights reserved.
