Voice AI · modifier accuracy

The accuracy number that decides whether voice AI works on your menu is the modifier number

Every voice AI pitch quotes one accuracy figure. It is the order-level number, and it is the easy one. The number that actually decides remakes, refunds, and the allergy call you do not want to make is modifier accuracy, and almost nobody quotes it. Here is why modifiers are where accuracy quietly breaks, what each kind of failure costs you, and the single architectural choice that sets the ceiling.

Matthew Diakonov, Written with AI

Published May 21, 2026 · 7 min read

Direct answer (verified 2026-05-21)

How accurate is voice AI on order modifiers?

The headline figure of 95 percent and up is order-level. Modifier-level accuracy is a separate, lower number, and it is the one that drives remakes, refunds, and allergy risk. Its ceiling is set by architecture, not by speech recognition: the question is whether the agent can only attach modifiers that already exist in your POS modifier group. PieLine constrains every modifier to the live POS group, confirms each one back using the POS display name, reads the price delta straight from the POS, and refuses to invent a modifier that does not exist.

Why one accuracy number is a magic trick

A base item is easy. There are a few dozen things on your menu and the agent has to pick one. Modifiers are not easy, because they multiply. A single order can carry half-and-half toppings, a spice level, a protein swap, a dietary tag, and two add-ons, each with its own price delta and its own way of being said out loud by a tired caller during the dinner rush. Six or eight modifier decisions on one ticket is normal.

That is the trick behind a clean 95 percent. If the system gets the base item right almost every time and slips on one modifier in twenty, the order-level number still looks great, while the modifier-level number, the one your line cooks and your refund log actually feel, is meaningfully lower. The pages that currently rank for this topic quote numbers as high as 99 percent and describe modifiers as something the model simply hears and captures. Hearing is not the hard part.
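To make the gap concrete, here is a minimal sketch of the arithmetic. It assumes every modifier decision is independent and equally likely to be right, which is a simplification, and the 95 percent per-modifier rate is just the "one in twenty" from the paragraph above, not a measured figure. The function name is hypothetical.

```python
def all_modifiers_correct(per_modifier_accuracy: float, modifiers_on_ticket: int) -> float:
    """Chance that every modifier on one ticket is right, assuming independent errors."""
    return per_modifier_accuracy ** modifiers_on_ticket


# Hypothetical inputs: one slip in twenty, six to eight modifiers per ticket.
for count in (6, 8):
    share = all_modifiers_correct(0.95, count)
    print(f"{count} modifiers at 95% each: {share:.1%} of tickets fully correct")
# 6 modifiers at 95% each: 73.5% of tickets fully correct
# 8 modifiers at 95% each: 66.3% of tickets fully correct
```

Under those assumptions, roughly a quarter to a third of modifier-heavy tickets carry at least one wrong modifier, which is the kind of gap a headline figure built mostly on base items would not show.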

The hard part is that a missed modifier is not symmetric with a missed item. Get the base wrong and the customer usually catches it on pickup. Get a modifier wrong and the food looks right, leaves the kitchen, and the failure surfaces at the worst possible moment: a remake when you are slammed, a refund, a one-star review, or a peanut on a plate that was supposed to have none.

The four ways a modifier goes wrong

Modifier failures are not all the same bug, and they do not all have the same fix. Sorting them is the first step to auditing any vendor honestly.

1. Omitted

The caller said it, the system dropped it. The ticket prints without the no-onion. Cheapest to catch if the agent reads the order back, which is why a verbatim confirmation step matters more than it sounds.

2. Invented

The system adds a modifier nobody asked for, usually by pattern-matching to a common combo. The customer pays for and receives something they did not order. This is the failure mode that a constrained, no-fabrication design exists to kill.

3. Misheard but plausible

"No cheese" becomes "no peas." The transcription reads fine and slides through every text-level check. This is the one a better audio model genuinely helps with, and the only one it helps with.

4. Heard correctly but unmappable

The system understood "extra ranch" perfectly, but there is no ranch modifier in your POS. So it lands as a free-text comment a line cook may never read, or the order is rejected and the caller hears dead air while it retries. Transcription was flawless. The ticket is still wrong. This is an architecture problem, and no audio upgrade touches it.
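One way to sort failures when auditing a ticket is plain set arithmetic. This is a rough sketch, assuming you know what the caller actually asked for (for example from listening to the recording); the names `classify` and `AuditResult` are hypothetical. A misheard-but-plausible modifier shows up here as a paired omission and invention, and only the audio can tell it apart from two separate mistakes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditResult:
    omitted: set[str]
    invented: set[str]
    unmappable: set[str]


def classify(requested: set[str], ticket: set[str], pos_group: set[str]) -> AuditResult:
    """Compare what the caller asked for against what the ticket carries."""
    unmappable = requested - pos_group          # asked for, but no such POS modifier
    omitted = (requested & pos_group) - ticket  # exists in the POS, missing from the ticket
    invented = ticket - requested               # on the ticket, never requested
    return AuditResult(omitted, invented, unmappable)


# Hypothetical call: "no cheese" was misheard as "no peas", "extra ranch" has no POS entry.
result = classify(
    requested={"no onion", "no cheese", "extra ranch"},
    ticket={"no onion", "no peas"},
    pos_group={"no onion", "no cheese", "no peas"},
)
print(result)
```

In this example the result lists "no cheese" as omitted, "no peas" as invented, and "extra ranch" as unmappable.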

The lever is not the microphone, it is what the agent is allowed to do

Two systems can transcribe the exact same call perfectly and produce different tickets, because they differ in what happens after the words are understood. An open-vocabulary system writes whatever it heard into a note. A constrained system can only attach modifiers that already exist in your POS. Same audio, different ceiling.

Same perfectly-heard order, two architectures

# Generic voice AI: open-vocabulary transcription
# The model hears the words, then writes whatever it heard.

heard = "light cheese, no cilantro, sub chicken for pork"
note = heard  # written to the ticket verbatim, as free text

# The note is free text. The POS has no idea what it means.
# - "light cheese" : no modifier ID, so it lands as a comment
#   the line cook may or may not read
# - "no cilantro"  : same, a comment, easy to miss on a busy rail
# - "sub chicken"  : no price delta applied; the ticket is now
#   underpriced and the kitchen pulls the wrong protein

# Transcription was perfect. The ticket is still wrong.


This is the anchor of the whole topic. PieLine resolves the caller's words against the live POS modifier group during the call and can only attach a modifier ID that already exists in that group. If a request does not resolve, the agent says so and offers the closest real option rather than inventing one. Every price delta is read from the POS price_delta field, never estimated or calculated as a multiple of the subtotal. You can trace this end to end, line by line, in the modifier mapping walkthrough linked at the bottom of this page.
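For contrast with the open-vocabulary version, here is a minimal sketch of the constrained side for the same order. It is not PieLine's implementation: the modifier IDs, display names, prices, and the exact-phrase lookup are all assumptions made for illustration, and a real system would fetch the group live from the POS and use far more forgiving language matching. Only the `price_delta` field name comes from the text above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Modifier:
    modifier_id: str
    display_name: str
    price_delta: Decimal  # taken from the POS record, never computed here


# Hypothetical modifier group for one item.
POS_GROUP = {
    "light cheese": Modifier("mod_101", "Light Cheese", Decimal("0.00")),
    "no cilantro": Modifier("mod_102", "No Cilantro", Decimal("0.00")),
    "sub chicken for pork": Modifier("mod_103", "Sub Chicken", Decimal("1.50")),
}


def resolve(phrases: list[str]) -> tuple[list[Modifier], list[str]]:
    """Attach only modifiers that exist in the group; return the rest as unresolved."""
    attached, unresolved = [], []
    for phrase in phrases:
        modifier = POS_GROUP.get(phrase.strip().lower())
        if modifier is None:
            unresolved.append(phrase)  # surfaced to the caller, not written to a note
        else:
            attached.append(modifier)
    return attached, unresolved


attached, unresolved = resolve(["light cheese", "no cilantro", "sub chicken for pork"])
upcharge = sum(m.price_delta for m in attached)
print([m.modifier_id for m in attached], unresolved, upcharge)
```

Every line on the resulting ticket carries a modifier ID, and the protein swap brings its own upcharge with it instead of arriving as an unpriced comment.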

What that refusal sounds like on a live call

Here is a realistic resolution trace for a half-and-half pizza with a modifier that exists and one that does not. Watch what happens at the line where "extra ranch" fails to resolve.

modifier resolution trace
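A trace of this kind could be produced by a sketch like the following. The groups, display names, trace labels, and spoken wording are hypothetical, the half-and-half side handling is left out for brevity, and the sketch assumes an upstream step has already decided which modifier group each request belongs to.

```python
# Hypothetical live modifier groups for one pizza: group -> {spoken phrase: POS display name}.
GROUPS = {
    "toppings": {"pepperoni": "Pepperoni", "mushroom": "Mushrooms"},
    "dips": {"garlic butter": "Garlic Butter Dip", "marinara": "Marinara Dip"},
}


def resolve_with_trace(requests: list[tuple[str, str]]) -> list[str]:
    """Resolve (group, phrase) pairs; the group label is assumed to come from upstream."""
    trace = []
    for group, phrase in requests:
        options = GROUPS.get(group, {})
        if phrase in options:
            trace.append(f"RESOLVED {phrase!r} -> {options[phrase]}")
            continue
        # Nothing is attached and nothing is written to a note; the agent speaks instead.
        say = f"We do not have {phrase}."
        if options:
            say += " We do have " + " or ".join(options.values()) + "."
        trace.append(f"NO MATCH {phrase!r} in group {group!r}")
        trace.append(f"SAY {say}")
    return trace


for line in resolve_with_trace([("toppings", "pepperoni"), ("dips", "extra ranch")]):
    print(line)
# RESOLVED 'pepperoni' -> Pepperoni
# NO MATCH 'extra ranch' in group 'dips'
# SAY We do not have extra ranch. We do have Garlic Butter Dip or Marinara Dip.
```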

The unmappable request never becomes a hopeful note. It becomes a spoken sentence and a real alternative. That single behavior is the difference between a kitchen ticket a line cook can act on and one that generates a callback.

95%+

“Order-level accuracy in production, with cuisine-specific modifier handling: half-and-half pizzas, spice levels, protein subs, custom sushi rolls. Edge cases route to a human with full context.”

PieLine

What pins modifier accuracy to its ceiling

The agent can only attach a modifier ID that already exists in the POS modifier group for that item.
Every modifier is confirmed back to the caller using the POS display name, not a paraphrase.
Price deltas are read from the POS price_delta field, so a swap or add-on never silently underprices the ticket.
An unrecognized request is spoken out loud with the closest real option offered, instead of being saved as a hopeful free-text note.
Required modifier groups (size, spice level, protein) are enforced, so the order cannot post half-built.

None of these are audio features. They are constraints on what the agent is permitted to do with the words once it has them. That is why a vendor demo that sounds impressive can still post bad tickets, and why you audit the POS ticket, not the transcript.
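The required-group rule is the easiest of these constraints to picture in code. A minimal sketch, assuming a hypothetical list of required groups for one item (a real list would come from the POS item definition):

```python
# Hypothetical required groups for one item, in the order the agent would ask for them.
REQUIRED_GROUPS = ("size", "spice level", "protein")


def missing_required(selected: dict[str, str]) -> list[str]:
    """Return the required groups the caller has not chosen from yet."""
    return [group for group in REQUIRED_GROUPS if not selected.get(group)]


def can_post(selected: dict[str, str]) -> bool:
    """An order is postable only when every required group has a selection."""
    return not missing_required(selected)


order = {"size": "Large", "protein": "Chicken"}
print(missing_required(order))  # ['spice level']
print(can_post(order))          # False
```

The agent would keep asking until `missing_required` comes back empty, so a half-built order never reaches the kitchen.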

The honest counterargument: where constraint costs you

Constraining the agent to existing modifiers is not free. If your POS modifier groups are incomplete, a perfectly reasonable request gets refused on the call. A caller who genuinely wants something you offer but never mapped will hear "we do not have that," which feels worse than a human who would have just typed it into the kitchen note.

The fix is onboarding, not loosening the constraint. PieLine's setup scrapes your menu and maps items and modifiers to POS IDs, including spice, sweetness, ingredients, and dietary info, with active monitoring and refinement during the first month so that real callers expose the gaps. The trade is deliberate: a slightly stricter agent that never invents a modifier beats a permissive one that produces confident, wrong tickets. On modifiers, refusing wrong is cheaper than guessing wrong.

“The experience was better than speaking to a human. No hold time, no confusion, no rushing.”

PieLine customer

Reported by a caller on a live restaurant line

How to test it yourself in ten minutes

Do not accept a single accuracy percentage from anyone, including us. Call the demo line and place your three nastiest real orders: the half-and-half with a substitution, the one with the allergy, the one with an add-on you know is not on the standard menu. Then look at the POS ticket, not the transcript.

Check three things. Did every modifier land as a real line with the right price? Was the unmappable request surfaced on the call or silently dropped into a note? And is the kitchen ticket something a line cook could act on at 7pm on a Friday without calling the customer back? Those three checks tell you the modifier number that the headline accuracy figure was hiding.
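If you can export the test tickets, the first two checks can be scripted. This is a rough sketch under stated assumptions: the ticket shape (a list of lines plus a free-text note), the field names, and the catalog contents are hypothetical and will differ by POS. The third check, whether a line cook can act on the ticket, stays a human judgment.

```python
from decimal import Decimal

# Hypothetical POS catalog: modifier ID -> price delta.
CATALOG = {"mod_102": Decimal("0.00"), "mod_103": Decimal("1.50")}


def audit_ticket(lines: list[dict], note: str) -> list[str]:
    """Return the problems found on one POS ticket; an empty list means it passed."""
    problems = []
    for line in lines:
        modifier_id = line.get("modifier_id")
        if modifier_id not in CATALOG:
            problems.append(f"not a real POS modifier: {line.get('name')}")
        elif line.get("price_delta") != CATALOG[modifier_id]:
            problems.append(f"wrong price on {line.get('name')}")
    if note.strip():
        problems.append(f"request buried in a free-text note: {note.strip()}")
    return problems


ticket_lines = [{"modifier_id": "mod_103", "name": "Sub Chicken", "price_delta": Decimal("0.00")}]
print(audit_ticket(ticket_lines, note="extra ranch"))
# ['wrong price on Sub Chicken', 'request buried in a free-text note: extra ranch']
```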

Bring your three nastiest orders to the call

We will run them on a live PieLine line and show you the POS ticket each one produces, modifier by modifier, with the price deltas pulled straight from the POS.

Frequently asked questions

How accurate is voice AI on order modifiers, specifically?

There are two numbers, and operators keep being shown only one. The headline figure (95 percent and up, sometimes quoted as high as 99 percent) is order-level: did the overall order come out right? Modifier-level accuracy (did every no-onion, light-cheese, sub-chicken, mild-not-hot detail land correctly?) is a separate and lower number on any menu with deep modifiers. It is also the number that actually costs you money, because a missed modifier is a remake, a refund, or, in the case of an allergy, a real safety event, not a rounding error. The honest answer is that modifier accuracy depends far more on the system's architecture than on how good its speech recognition is.

Why is modifier accuracy lower than order accuracy?

Because modifiers are where the long tail lives. A base item is usually one of a few dozen things on a menu and easy to pin down. Modifiers multiply: half-and-half toppings, spice levels, protein swaps, dietary tags, add-ons, side scoping, each with its own price delta and its own way of being phrased by a caller. One order can carry six or eight modifier decisions. If each is even slightly less reliable than the base item, the compounded modifier accuracy across a full ticket sits below the order-level figure. That gap is exactly what the single-number marketing claim hides.

What are the ways a modifier actually goes wrong on a phone order?

Four. Omitted: the caller said it, the system dropped it, the ticket prints without it. Invented: the system adds a modifier nobody asked for, often by pattern-matching to a common combo. Misheard but plausible: 'no cheese' becomes 'no peas,' a transcription that reads fine and passes silently. And heard correctly but unmappable: the system understood 'extra ranch' perfectly but there is no ranch modifier in your POS, so it lands as a free-text comment the line cook may never read. The fourth one is the killer because transcription was perfect and the ticket is still wrong.

Does better speech recognition fix modifier accuracy?

Only partly. Better transcription reduces the misheard-but-plausible failures. It does nothing for the unmappable failure, which is an architecture problem, not an audio problem. If the agent is allowed to write free text into the order, a perfectly transcribed 'extra ranch' still produces a wrong ticket when there is no ranch modifier to attach. The fix is to constrain the agent so it can only select modifiers that exist in the POS modifier group, and to make it say so out loud when a request does not resolve. That is the lever that moves modifier accuracy, not a marginally better audio model.

How does PieLine handle a modifier the caller asks for that does not exist?

It says so on the call and offers the closest real option from the modifier group, instead of fabricating one or burying it in a note. If a caller asks for extra ranch and there is no ranch add-on in the POS, the agent says that and offers the dips that do exist. The design rule is that the agent resolves the caller's words against the live modifier list during the call and can only attach a modifier ID that already exists. Anything that does not resolve gets surfaced, never silently dropped or invented.

Where do modifier prices come from, and can the AI get them wrong?

The price delta on every swap or add-on is read from the POS price_delta field for that modifier. The agent does not multiply the subtotal by anything or estimate an upcharge. That matters because an underpriced modifier is an invisible accuracy failure: the kitchen makes the right food, the customer is happy, and you quietly lose margin on every ticket. Reading the delta straight from the POS closes that leak. You can trace exactly how this works in the modifier mapping walkthrough linked below.

What kinds of complex modifications can it actually handle?

Half-and-half pizzas with per-side toppings, spice levels, dietary preparations such as Jain and no-onion-no-garlic, protein substitutions on noodle and rice dishes, build-your-own sushi rolls with nested protein, wrap, and add-on groups, and dietary tags that fan out across multiple modifier IDs. The pattern is the same in every cuisine: the agent decomposes the request into the POS's own modifier groups rather than treating the order as one blob of text. Edge cases beyond the menu route to a human with full context.

How should I audit a vendor's modifier accuracy before I sign?

Do not accept a single accuracy percentage. Call the demo line and place your three nastiest real orders: the one with the half-and-half and the substitution, the one with the allergy, the one with the unusual add-on you know is not on the standard menu. Then check the POS ticket, not the transcript. Look for whether every modifier landed as a real line with the right price, whether the unmappable request was surfaced or silently dropped, and whether the kitchen ticket is something a line cook can act on without calling the customer back.

The plumbing and the math behind modifier accuracy