A restaurant phone agent is eight Claude Skills firing in order. Here are the eight, against a real 102-second call.

Most pages on this topic stop at the SKILL.md frontmatter shape. That is the easy part. The harder question is which eight skills you actually need, what each one does, and how they hand off in real time. We answer that against a recorded call shipped in this repo: 102.36 seconds, 46 captions, $34.11, pickup at 12:45 AM, strawberries on a cheesecake.

Matthew Diakonov, Written with AI

Published April 25, 2026. 11 min read.


4.9 from 200+ restaurants

Eight skills mapped to literal timestamps in src/components/voice-activity-data.ts

Reference call: 102.36s, 46 captions, $34.11 total, 12:45 AM pickup

Same skill set runs at 20 simultaneous calls per restaurant line

Eight skills, one phone call

What a Claude-Skill-shaped restaurant phone agent looks like at runtime

0:11 menu-lookup fires (4.14s pause)

0:15 modifier-resolver enumerates eggs and bread

0:52 upsell fires once, with a mild joke

1:06 non-menu-fallback maps strawberries to a typed modifier

1:31 pos-line-builder writes $34.11 to the kitchen


The file every claim on this page is read off

Every other page on this topic argues from product copy. We cannot do that here, because the angle is mechanical: which skill fires when. So the spine of this page is one file in this repo, src/components/voice-activity-data.ts. It exports a single object with a duration in seconds, a sample rate, two amplitude envelopes (one per channel), and an array of caption objects with a speaker, a start time, an end time, and the words spoken.

The file was produced by scripts/build-voice-activity-data.py reading a 16-bit stereo WAV and sending it to Deepgram with multichannel=true. Customer voice on the L channel, agent voice on the R channel. No re-recording, no studio retakes.
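The build script itself is not reproduced on this page. As a rough sketch of its last step only, the snippet below turns a multichannel transcription response into caption rows. It assumes the response follows the shape Deepgram documents for pre-recorded audio (`results.channels[i].alternatives[0].words`, each word carrying `word`, `start`, and `end`). The `words_to_captions` helper, the 0.8-second gap threshold, and the channel-to-speaker mapping are illustrative assumptions, not the script's actual logic.

```python
from typing import Any, Optional

# Assumed mapping: channel 0 is the left (customer) channel, channel 1 the right (agent).
SPEAKERS = {0: "customer", 1: "agent"}


def words_to_captions(response: dict[str, Any], max_gap: float = 0.8) -> list[dict[str, Any]]:
    """Group each channel's words into captions, splitting on silences longer than max_gap."""
    captions: list[dict[str, Any]] = []
    for index, channel in enumerate(response["results"]["channels"]):
        words = channel["alternatives"][0]["words"]
        current: Optional[dict[str, Any]] = None
        for w in words:
            if current is not None and w["start"] - current["end"] <= max_gap:
                # Same caption: extend the text and push the end time forward.
                current["text"] += " " + w["word"]
                current["end"] = w["end"]
            else:
                # Silence was long enough (or this is the first word): start a new caption.
                current = {
                    "speaker": SPEAKERS.get(index, f"channel-{index}"),
                    "start": w["start"],
                    "end": w["end"],
                    "text": w["word"],
                }
                captions.append(current)
    # Interleave both channels in call order.
    return sorted(captions, key=lambda c: c["start"])
```

Splitting on silence is only one way to cut captions; the real file may use utterance boundaries supplied by the transcription service instead.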

The anchor fact

duration: 102.36s, sampleRate: 60, 46 caption rows. The longest pause inside an agent turn is 4.14s, between “one moment, please” at 11.84s and the next agent word at 15.98s. That pause is the literal menu-lookup skill firing.
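The pause can be recomputed from caption rows. The sketch below is a Python mirror of the caption shape described above (speaker, start, end, words); the `Caption` class and `longest_agent_pause` function are hypothetical names, and the three sample rows are hand-copied from timestamps quoted on this page rather than loaded from the data file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caption:
    speaker: str   # "customer" or "agent"
    start: float   # seconds from the start of the call
    end: float
    text: str


def longest_agent_pause(captions: list[Caption]) -> tuple[float, Caption, Caption]:
    """Return (gap, before, after) for the longest silence between two
    consecutive agent captions with no customer caption in between."""
    ordered = sorted(captions, key=lambda c: c.start)
    best = None
    for before, after in zip(ordered, ordered[1:]):
        if before.speaker == after.speaker == "agent":
            gap = round(after.start - before.end, 2)
            if best is None or gap > best[0]:
                best = (gap, before, after)
    if best is None:
        raise ValueError("no consecutive agent captions found")
    return best


# Sample rows only. The 10.96s start of "one moment, please" is assumed from
# the beat heading in the timeline; the other numbers are quoted in the text.
sample = [
    Caption("customer", 5.36, 9.36, "Can I get one lumberjack slim and one Coke?"),
    Caption("agent", 10.96, 11.84, "One moment, please."),
    Caption("agent", 15.98, 25.66, "How would you like your eggs cooked, and what kind of bread?"),
]
gap, before, after = longest_agent_pause(sample)
print(f"{gap:.2f}s pause after {before.text!r}")
```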

Fork the repo, grep for "voiceData", and you can read the same captions we're quoting below.

102.36s Call length

46 Caption rows

8 Skills firing

$34.11 Order total

The eight skills, one paragraph each

Each card is a discrete handler in our pipeline. Each one matches a beat in the reference call. If you wrote these as Claude Skills, each card below is one SKILL.md file.

menu-lookup

Fires the moment a customer says a dish name. In our reference call this is the 4.14s pause that follows "one moment, please," running from 11.84s to 15.98s. The skill resolves "lumberjack slim" to the lumberjack slam SKU, surfaces required modifier slots, and hands them back as a question.
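One way to tolerate "slim" for "slam" is fuzzy string matching against the menu. The sketch below uses `difflib.get_close_matches` from the Python standard library. The `MENU` table, the `lookup_item` name, and the 0.8 cutoff are assumptions for illustration; a production lookup would query the POS catalog rather than an in-memory dict.

```python
import difflib
from typing import Optional

# Hypothetical menu table: item name -> required modifier slots.
MENU = {
    "lumberjack slam": ["eggs", "bread"],
    "coke": [],
    "new york style cheesecake": [],
}


def lookup_item(heard: str, cutoff: float = 0.8) -> Optional[tuple[str, list[str]]]:
    """Resolve a possibly misheard dish name to a menu item and its required slots."""
    matches = difflib.get_close_matches(heard.lower().strip(), list(MENU), n=1, cutoff=cutoff)
    if not matches:
        return None  # caller should ask the customer to repeat rather than guess
    name = matches[0]
    return name, MENU[name]


print(lookup_item("lumberjack slim"))
```

Returning `None` below the cutoff keeps the skill from silently picking the wrong dish; the cutoff itself would need tuning against real call transcripts.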

modifier-resolver

Asks for required modifier slots in a single breath, with options enumerated. Triggered after menu-lookup hands back unfilled slots. In the reference call: 15.98s, the eggs-and-bread question with sourdough listed.

non-menu-fallback

Handles modifiers that are not on the menu. Fires when the customer says "if that's an option." Resolves to a typed modifier line on the ticket, not a free-text note. In the call: 65.98s, strawberries on the cheesecake.
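A minimal sketch of that resolution step, assuming a modifier table keyed by SKU. The table contents, the modifier ID string, and the `resolve_requested_modifier` helper are all hypothetical; only the idea of returning a typed line or nothing comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical modifier table: SKU -> modifier group -> option name -> modifier ID.
MODIFIER_TABLE = {
    "cheesecake": {"topping": {"strawberry": "mod-topping-strawberry"}},
}

# Naive plural handling for the demo; a real system would normalize more carefully.
ALIASES = {"strawberries": "strawberry"}


@dataclass(frozen=True)
class ModifierLine:
    sku: str
    group: str
    modifier_id: str


def resolve_requested_modifier(sku: str, requested: str) -> Optional[ModifierLine]:
    """Return a typed modifier line if the option exists for this SKU, else None."""
    option = ALIASES.get(requested.lower(), requested.lower())
    for group, options in MODIFIER_TABLE.get(sku, {}).items():
        if option in options:
            return ModifierLine(sku=sku, group=group, modifier_id=options[option])
    return None  # the agent declines out loud; nothing is written as a free-text note
```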

upsell

Two sentences, one mild joke, runs once after the order looks complete. In the reference call: 52.52s to 59.44s, "It might make your Coke jealous." Lifts AOV 15 to 20% in production.

pos-line-builder

Maps each captured intent into a POS line item with modifier IDs, not free text. The output of this skill is what lands in Toast or Square or Clover. In the call: $34.11 is what the POS calculated, not what the agent guessed.
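To make "modifier IDs, not free text" concrete, here is a sketch of a typed line item and a builder that fails loudly instead of falling back to a note. The ID strings and the `LineItem` and `build_lines` names are invented for illustration. Prices are left out on purpose, since the total is computed by the POS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    item_id: str
    quantity: int = 1
    modifier_ids: tuple[str, ...] = ()


# Hypothetical ID tables; real IDs come from the POS catalog.
ITEM_IDS = {
    "lumberjack slam": "item-lumberjack-slam",
    "coke": "item-coke",
    "cheesecake": "item-cheesecake",
}
MODIFIER_IDS = {
    ("eggs", "scrambled"): "mod-eggs-scrambled",
    ("bread", "sourdough"): "mod-bread-sourdough",
    ("topping", "strawberry"): "mod-topping-strawberry",
}


def build_lines(intents: list[dict]) -> list[LineItem]:
    """Map captured intents to line items; raise KeyError rather than emit free text."""
    lines = []
    for intent in intents:
        item_id = ITEM_IDS[intent["item"]]
        mods = tuple(
            MODIFIER_IDS[(slot, choice)]
            for slot, choice in intent.get("modifiers", {}).items()
        )
        lines.append(LineItem(item_id, intent.get("quantity", 1), mods))
    return lines


# The reference order as recapped on this page.
order = build_lines([
    {"item": "lumberjack slam", "modifiers": {"eggs": "scrambled", "bread": "sourdough"}},
    {"item": "coke"},
    {"item": "cheesecake", "modifiers": {"topping": "strawberry"}},
])
```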

pickup-quoter

Reads kitchen prep time and current load from the POS, returns a clock time the customer can write down. The reference call returns 12:45 AM, which lines up because the call was placed at 12:21 AM.
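The arithmetic is simple enough to sketch. The 24 minutes below is just the difference between the two times quoted above; how the POS splits that between prep time and kitchen load is not stated, so the `prep_minutes` and `load_minutes` parameters are assumptions, as is the calendar date.

```python
from datetime import datetime, timedelta


def quote_pickup(placed_at: datetime, prep_minutes: int, load_minutes: int = 0) -> str:
    """Return a 12-hour clock time, such as '12:45 AM', for the customer to write down."""
    ready = placed_at + timedelta(minutes=prep_minutes + load_minutes)
    hour = ready.hour % 12 or 12          # 0 and 12 both read as "12"
    suffix = "AM" if ready.hour < 12 else "PM"
    return f"{hour}:{ready.minute:02d} {suffix}"


# Call placed at 12:21 AM; the date is arbitrary.
print(quote_pickup(datetime(2026, 4, 25, 0, 21), prep_minutes=24))
```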

name-capture

Asks once, accepts disfluencies, writes the order under that name. In the call the customer says "Put it under the name Rob." The name lands on the ticket in one turn.

recap

Reads the full order back, line by line, before submitting. In the reference call this is 75.42s: lumberjack slam (eggs scrambled, sourdough), Coke, cheesecake with strawberry topping. Customer confirms with one word and the POS write fires.

What one of the eight actually looks like

This is the menu-lookup skill rendered as a SKILL.md file. The frontmatter tells the orchestrator when to invoke it; the markdown body is the procedure. Latency budget and disfluency rules live in the body because they are part of the procedure, not metadata.

skills/menu-lookup/SKILL.md
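A minimal sketch of what such a file could contain, held in a Python string so the frontmatter can be parsed in the same snippet. The three field names follow the ones this page describes (name, description, allowed-tools), and the tool names and the 4.5s budget are quoted elsewhere on this page. The wording of the description and procedure, and the `parse_frontmatter` helper, are illustrative assumptions rather than the shipped file.

```python
SKILL_MD = """\
---
name: menu-lookup
description: Use when the customer names a dish. Resolve it to a menu item and return any required modifier slots.
allowed-tools: pos.search_items, pos.get_required_modifiers
---

# Menu lookup

1. Match the spoken dish name against the menu, tolerating near misses ("slim" for "slam").
2. Fetch the item's required modifier slots.
3. Hand off to modifier-resolver with the unfilled slots.

Latency budget: 4.5s. Say "one moment, please" before searching.
"""


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, markdown body).

    Handles only flat `key: value` lines, which is all this sketch needs;
    a real loader would use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter delimiter")
    end = lines.index("---", 1)  # raises ValueError if the block is never closed
    fields = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


fields, body = parse_frontmatter(SKILL_MD)
print(fields["name"], "->", fields["allowed-tools"])
```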

Which skill fires when, in the reference call

Below is the same call as a sequence diagram. The L channel is the customer, the R channel is the orchestrator picking which skill to fire next, and the third lifeline is the POS write at the end. Every label is a real timestamp from the data file.

Skill firing order, Denny's reference call

What goes in and what comes out

A skill orchestrator is uninteresting on its own. The interesting part is the I/O surface. On the left: what the agent receives on every turn. On the right: what the eight skills, taken together, write back to the world.

Phone-call I/O surface

The call, beat by beat, with the skill that owns each beat

This is the same 102.36 seconds, told as a vertical timeline. Read it as a smoke test for whether your skill set is complete. If a beat below has no owner in your version, that is a missing skill.

0.00 to 0.40s — pickup

No skill yet. The agent answers "Hi" before the second ring. This beat is canned and lives outside the skill pipeline so the answer time is bounded.

0.40 to 3.44s — disclosure

"This is Denny on a recorded line." Three sentences in three seconds. Also canned; the orchestrator only takes over once the customer speaks.

5.36 to 9.36s — first ask

"Can I get one lumberjack slim and one Coke?" Two intents in one breath. The orchestrator splits them.

10.96 to 15.98s — menu-lookup

The 4.14s pause after "one moment, please." Resolves "lumberjack slim" -> lumberjack slam, surfaces required modifier slots (eggs, bread). Coke goes through too: no required modifiers.

15.98 to 25.66s — modifier-resolver

"How would you like your eggs cooked, and what kind of bread? White, brown, multigrain, or sourdough?" Required modifiers asked once, options enumerated.

37s to 52s — name-capture and probe

The orchestrator asks "anything else?" and accepts a name in the same exchange. Both small skills, both deterministic.

52.52 to 59.44s — upsell

"It might make your Coke jealous." Two sentences, about 6.9 seconds (52.52s to 59.44s), fires once. Result: cheesecake added.

65.98 to 75.34s — non-menu-fallback

"Strawberries, if that's an option?" Resolves to a typed strawberry-topping modifier on the cheesecake SKU. Three seconds from utterance to confirmation.

75.42 to 95s — recap

The full order read back, line by line, before any POS write. Lumberjack slam (eggs scrambled, bread sourdough), Coke, cheesecake with strawberry topping.

95 to 102.36s — pos-line-builder + pickup-quoter

POS write fires. $34.11 comes back from the POS. Pickup-quoter reads kitchen prep time and load and returns "12:45 AM." Goodbye.
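The smoke test can be run mechanically. The sketch below encodes the beats from the timeline above as tuples and reports which skills a given implementation is missing. The `BEATS` layout and both helper names are hypothetical; the timestamps and owners are copied from the headings.

```python
# Beats copied from the timeline above: (start, end, owning skills).
# An empty tuple marks a beat that sits outside the skill pipeline.
BEATS = [
    (0.00, 0.40, ()),                        # pickup (canned)
    (0.40, 3.44, ()),                        # disclosure (canned)
    (5.36, 9.36, ()),                        # customer's first ask
    (10.96, 15.98, ("menu-lookup",)),
    (15.98, 25.66, ("modifier-resolver",)),
    (37.00, 52.00, ("name-capture",)),
    (52.52, 59.44, ("upsell",)),
    (65.98, 75.34, ("non-menu-fallback",)),
    (75.42, 95.00, ("recap",)),
    (95.00, 102.36, ("pos-line-builder", "pickup-quoter")),
]


def missing_skills(beats, implemented: set[str]) -> set[str]:
    """Return the skills the timeline needs that the implementation lacks."""
    needed = {skill for _, _, owners in beats for skill in owners}
    return needed - implemented


def beat_duration(beats, skill: str) -> float:
    """Return the length in seconds of the first beat owned by the given skill."""
    for start, end, owners in beats:
        if skill in owners:
            return round(end - start, 2)
    raise KeyError(skill)


# A deliberately incomplete skill set, to show what the check reports.
print(sorted(missing_skills(BEATS, {"menu-lookup", "modifier-resolver", "recap"})))
```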

What a usable restaurant phone skill needs

Most write-ups stop at "give it a description and a body." That is necessary but not sufficient. A skill that ships in a real phone pipeline also needs:

Production checklist for one skill

A latency budget in seconds, written into the body. Menu-lookup is 4.5s; recap is 2.5s. Without a budget, the agent stalls on the phone.
A typed return shape, not free text. pos-line-builder returns line items with modifier IDs; the kitchen never sees a comment field.
A disfluency tolerance section. Phone callers say slim instead of slam. The skill body has to say what to do about that.
An allowed-tools list scoped to the smallest set the skill actually needs. menu-lookup does not need pos.write_order.
A clear handoff. menu-lookup hands to modifier-resolver. Each skill names the next one explicitly so the orchestrator does not guess.
An A/B hook. Upsell lifts AOV 15 to 20%; you cannot prove that without isolating the skill from the rest of the call.
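Three of the checklist items (latency budget, scoped tools, explicit handoff) can be enforced in code rather than left to the prompt. The sketch below is one possible shape, using `asyncio.wait_for` for the budget. The `SkillSpec` fields, the helper names, and the timeout result are assumptions; nothing here is the pipeline's actual orchestrator.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class SkillSpec:
    name: str
    latency_budget_s: float
    allowed_tools: frozenset[str]
    next_skill: str  # explicit handoff target


async def run_with_budget(spec: SkillSpec, handler: Callable[[], Awaitable[dict]]) -> dict:
    """Run a skill handler, returning a fallback result if it exceeds its budget."""
    try:
        return await asyncio.wait_for(handler(), timeout=spec.latency_budget_s)
    except asyncio.TimeoutError:
        # The orchestrator can now speak a holding line instead of stalling.
        return {"status": "timeout", "skill": spec.name}


def check_tool_call(spec: SkillSpec, tool: str) -> None:
    """Reject any tool call outside the skill's allowed-tools list."""
    if tool not in spec.allowed_tools:
        raise PermissionError(f"{spec.name} may not call {tool}")


MENU_LOOKUP = SkillSpec(
    name="menu-lookup",
    latency_budget_s=4.5,
    allowed_tools=frozenset({"pos.search_items", "pos.get_required_modifiers"}),
    next_skill="modifier-resolver",
)
```

Keeping the budget in the spec means the same number can be rendered into the SKILL.md body and enforced at runtime, so the two cannot drift apart.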

What it looks like when the orchestrator runs

A trace of the eight skills firing through the reference call. Every line corresponds to a real caption row in the data file.

trace: dennys-order.mp3

What pos-line-builder is allowed to write to

The pos-line-builder skill is the one that has to know your kitchen system. Out of the box it ships with these integrations.

Toast

Square

Clover

NCR Aloha

Revel

Olo

Lightspeed

TouchBistro

SpotOn

Lavu

50+ POS integrations available. The skill itself is unchanged across them; only the tool layer differs.

“The experience was better than speaking to a human. No hold time, no confusion, no rushing.”

Customer

11-location South Indian chain, Bay Area

Want to hear the eight skills run against your menu?

We will scrape your menu, map it to your POS, and play back a sample call. 30 minutes, no slides.

Frequently asked questions

Where does the 102.36-second number come from?

It is the literal duration field on the call data file shipped in this repo at src/components/voice-activity-data.ts, generated by scripts/build-voice-activity-data.py against a recorded WAV using Deepgram multichannel. The file declares duration: 102.36, sampleRate: 60, two amplitude envelopes (customer on the L channel, agent on the R), and 46 timestamped caption rows. Every timestamp on this page is read directly off that file.

Is "Claude Skills" the same as a restaurant phone agent?

No. Claude Skills is a packaging pattern from Anthropic where each skill is a folder with a SKILL.md (YAML frontmatter plus a markdown procedure) plus optional tools. The pattern maps cleanly onto a restaurant phone agent because every recurring beat in a call (menu lookup, modifier resolution, upsell, pickup quoting) is a small, named, repeatable procedure. We use the pattern conceptually here. The agent that produced the reference call is PieLine, not raw Claude Skills, but the same eight skills exist in our pipeline as discrete handlers.

Why eight skills and not one giant prompt?

A single prompt that tries to handle menu lookup, modifier resolution, upsell, pickup quoting, and POS line construction simultaneously hits a wall at the first non-menu modifier. It either says yes to everything (and the kitchen sees garbage) or no to everything (and a request like "strawberries on the cheesecake" never lands). Splitting the work into eight skills means each one has a tight latency budget, one tool surface, and a deterministic handoff. The 4.14s menu-lookup pause in the reference call is bounded by that budget; a monolithic prompt has no per-step budget, so its latency has no upper bound.

What does the SKILL.md frontmatter actually contain?

A name, a description that tells the agent when to use the skill, and an allowed-tools line that scopes which functions the skill can call. The body is plain markdown: a procedure, edge cases, latency budget, and disfluency rules. The example menu-lookup skill on this page is a faithful sketch of the shape; the description matters most because it is what the orchestrator matches the current turn against.

How does the non-menu-fallback skill keep strawberries off the kitchen as a free-text note?

It maps the utterance to a typed modifier on the resolved SKU, not a comment field. In the reference call the cheesecake SKU has a "topping" modifier with strawberry as a known option. The skill's job is to detect that the customer phrased a modifier tentatively ("if that's an option"), confirm it exists in the modifier table, and return a properly typed line. If the modifier does not exist, the skill says no out loud rather than writing a note the cook has to interpret.

Why does the upsell skill have a joke in it?

Because a flat upsell ("would you like to add dessert?") converts at near zero. The reference call line is "Before I finish up, would you like to add a sweet treat like a slice of New York style cheesecake? It's so good. It might make your Coke jealous." The joke is a concrete signal that this is not a phone tree. PieLine's baseline lift on the upsell skill is 15 to 20% in average order value, and it is one of the easier skills to A/B once you have it isolated as a discrete handler.

Can a restaurant build this themselves with raw Claude Skills?

You could write the eight SKILL.md files in an afternoon. The hard part is the tool layer: pos.search_items, pos.get_required_modifiers, pos.get_prep_time, pos.write_order. Each one must hit a specific POS (Toast, Clover, Square, NCR Aloha, Revel) and return modifier IDs the kitchen recognizes. PieLine ships those tool integrations across 50+ POS systems plus the call-handling stack (telephony, latency optimization, 20 simultaneous channels). The skills are the easy part; the integrations are the work.

How many simultaneous calls can this run on?

Up to 20 simultaneous calls on a single restaurant line. A human phone employee handles one. That is also why the answer for a Friday at 7pm is the same as the answer for a Tuesday at 12:21 AM (the time the reference call was placed). The skill set runs the same way at peak as it does at midnight.

What is the cost per call compared to a phone employee?

$350/month covers up to 1,000 calls. Beyond that, $0.50 per call. A dedicated phone employee runs $3,000 to $4,000/month and handles one call at a time. Onboarding (menu scrape, POS mapping, dish description build, skill configuration) is included. There is a money-back guarantee on the first month if 95%+ order accuracy or 90%+ AI-handled rate does not hold for your restaurant.
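The pricing reduces to a flat fee plus overage. A small sketch of that arithmetic, with the function and parameter names as assumptions:

```python
def monthly_cost(calls: int, base: float = 350.0, included: int = 1000, overage: float = 0.50) -> float:
    """Flat monthly fee covering `included` calls, plus a per-call charge beyond that."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return base + max(0, calls - included) * overage


for calls in (800, 1000, 1500):
    print(calls, monthly_cost(calls))
```

At 1,500 calls the overage is 500 × $0.50 = $250, for a total of $600.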

Related, on the same recorded call

Reference call

AI phone agent for restaurants, read line by line

The same Denny's call as this page, but read out loud caption by caption. Useful if you want to see what a single skill firing actually sounds like.


Capacity

20 simultaneous calls on the same line, with the math

Why the same skill set runs unchanged at peak hour and at 12:21 AM. The capacity story behind every other page on this site.


Integration

POS integration: what actually moves between the agent and Toast or Clover

Where modifier IDs come from, why a webhook is not POS integration, and what the pos-line-builder skill is actually writing.

