Restaurant management and operations has six measurable channels. The phone makes seven.
Open any playbook on this topic and you will find the same six categories: FOH, BOH, inventory, labor, POS, and marketing. The phone is never on the list, because a human on a phone leaves no artifact you can manage. PieLine ships every call as a 60 Hz amplitude envelope and a 46-caption transcript. The reference call already checked into our repo has 6,157 samples per channel, exactly zero cross-talk frames, and 1,922 silent frames (31.2 percent of the call). That is a management surface. Here is how to read it.
The six channels every other guide already covers
Before we argue that there is a seventh, it is worth being honest about the six. Any decent playbook on this topic hands the manager a weekly review agenda with line items grouped into roughly the same buckets: FOH, BOH, inventory, labor, POS, and marketing. You have read all of them. They work. They are also incomplete.
Each of these has an artifact a manager opens on Monday. The POS pulls a report. Inventory posts a variance. Labor exports hours. The cook line leaves tickets. That is why they count as channels: each one produces something the manager reads. The phone is the one activity inside the restaurant that has never produced anything to open.
Why the phone has always been the exception
Picture a Tuesday dinner service at a ten-location chain. The cashier picks up the phone and takes an order. She re-keys it into the POS. The POS emits the row, inventory adjusts, BOH gets a ticket. From the management dashboard the order looks identical to an order placed at the counter. The call itself is gone.
That invisibility is why the industry ends up with cost-per-call estimates, missed-call guesses, and training modules titled "Phone etiquette" that have no measurable output. You cannot manage what you cannot open. For the phone, the manager has nothing to open.
The underlying problem
Every other operational channel has a primary artifact. The phone does not. Until the phone produces an artifact the manager can read in under 30 seconds, it stays off the agenda and stays the largest source of unmanaged loss in the operation.
The artifact the phone now leaves
When PieLine answers a call, the audio is recorded as a 16-bit stereo WAV with the customer on the left channel and the AI on the right. The reference file in the repo is public/audio/dennys-order.mp3, 102.36 seconds long. The script scripts/build-voice-activity-data.py runs Deepgram multichannel transcription on the raw audio to get per-word timestamps, then computes per-channel RMS amplitude envelopes sampled at 60 Hz and groups the words into speaker-tagged captions on pause or punctuation boundaries.
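The envelope step can be sketched in a few lines of Python. This is a minimal illustration, not the contents of scripts/build-voice-activity-data.py: the function names `split_stereo` and `rms_envelope`, the windowing scheme, and the assumption that samples are already decoded to floats in [-1.0, 1.0] are all hypothetical. Any normalization the real script applies is not shown in this article and is left out.

```python
import math


def split_stereo(interleaved):
    """Split interleaved L/R samples into (customer, ai).

    Assumes the layout described in the article: customer on the left
    channel, AI on the right.
    """
    return interleaved[0::2], interleaved[1::2]


def rms_envelope(samples, sample_rate, frame_rate=60):
    """Return one RMS value per 1/frame_rate-second window of one channel.

    `samples` is a sequence of floats in [-1.0, 1.0]. The windowing here
    is an assumption; the real build script may frame the audio differently.
    """
    window = sample_rate / frame_rate  # samples per frame, may be fractional
    n_frames = math.ceil(len(samples) / window)
    envelope = []
    for i in range(n_frames):
        start = int(round(i * window))
        end = min(int(round((i + 1) * window)), len(samples))
        chunk = samples[start:end]
        if not chunk:
            envelope.append(0.0)
            continue
        mean_square = sum(x * x for x in chunk) / len(chunk)
        envelope.append(math.sqrt(mean_square))
    return envelope
```

At a 44.1 kHz sample rate each 60 Hz frame covers 735 samples, so a call of roughly 100 seconds yields on the order of 6,000 values per channel.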
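What lands on disk is a single TypeScript module, src/components/voice-activity-data.ts. It holds one array of RMS values per channel (customer and AI) and the list of speaker-tagged captions with their start and end timestamps.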
This is the phone channel's primary artifact. A manager opens it the same way they open the ticket screen.
The four numbers a manager now reads
Every new channel eventually produces a small set of round numbers that managers memorize. Food cost percent. Labor percent. Ticket time. Cover count. Here are the four for the phone. All of them come from the reference call above, and all of them can be recomputed against any call that flows through PieLine.
Cross-talk frames
0 / 6,157
Frames where both parties are above 0.1 amplitude. Human-staffed lines drift up under rush; this line is flat zero by design.
Silent frame percent
31.2%
Frames where both parties are under 0.02 amplitude. This is the think-time band. Compare across calls and locations.
Talk time ratio (AI : customer)
2.48 : 1
49.04 seconds AI speech vs 19.76 seconds customer speech. Higher than a cashier call, on purpose: two readbacks and an audible POS placement.
Peak envelope
0.9067
AI max RMS (customer max is 0.8857). Well-normalized call, no clipping, no whisper.
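To make the four definitions concrete, here is a hedged Python sketch that recomputes them from two envelope arrays. The thresholds (0.1 for cross-talk, 0.02 for silence) come from the cards above; the function name `phone_kpis`, the dictionary keys, and the choice to pass speech durations in as arguments are assumptions for illustration only.

```python
CROSS_TALK_THRESHOLD = 0.1  # both channels above this: cross-talk frame
SILENCE_THRESHOLD = 0.02    # both channels below this: silent frame


def phone_kpis(customer, ai, ai_speech_s, customer_speech_s):
    """Compute the four phone numbers from per-frame RMS envelopes.

    `customer` and `ai` are equal-length lists of RMS values, one per
    1/60 s frame. Speech durations are passed in because the article
    does not say how they are measured.
    """
    if len(customer) != len(ai):
        raise ValueError("channels must have the same number of frames")
    total = len(customer)
    cross_talk = sum(
        1 for c, a in zip(customer, ai)
        if c > CROSS_TALK_THRESHOLD and a > CROSS_TALK_THRESHOLD
    )
    silent = sum(
        1 for c, a in zip(customer, ai)
        if c < SILENCE_THRESHOLD and a < SILENCE_THRESHOLD
    )
    return {
        "frames": total,
        "cross_talk_frames": cross_talk,
        "silent_frames": silent,
        "silent_percent": round(100.0 * silent / total, 1) if total else 0.0,
        "talk_ratio": (
            round(ai_speech_s / customer_speech_s, 2)
            if customer_speech_s else None
        ),
        "peak_customer": max(customer, default=0.0),
        "peak_ai": max(ai, default=0.0),
    }
```

Plugging in the reference call's published figures, 1,922 silent frames out of 6,157 rounds to 31.2 percent, and 49.04 seconds over 19.76 seconds rounds to 2.48.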
How the artifact is produced, end to end
Nothing about this pipeline is magic. It is a small stack you can read end to end. WAV goes in, a typed TypeScript module comes out. Everything a manager reads is derived from that module.
Stereo WAV to manager dashboard
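One step of that pipeline, grouping per-word timestamps into speaker-tagged captions on pause or punctuation boundaries, might look roughly like the sketch below. The word dictionary shape, the function name `group_captions`, and the 0.6-second pause threshold are assumptions; the article does not state the values the real script uses.

```python
PAUSE_S = 0.6  # hypothetical pause threshold, not taken from the source
SENTENCE_END = (".", "?", "!")


def group_captions(words, pause_s=PAUSE_S):
    """Group time-ordered words into captions.

    Each word is a dict with "speaker", "text", "start", "end" (seconds).
    A new caption starts when the speaker changes, when the gap since the
    previous word exceeds `pause_s`, or when the caption so far ends in
    sentence punctuation.
    """
    captions = []
    current = None
    for w in words:
        starts_new = (
            current is None
            or w["speaker"] != current["speaker"]
            or w["start"] - current["end"] > pause_s
            or current["text"].endswith(SENTENCE_END)
        )
        if starts_new:
            current = {
                "speaker": w["speaker"],
                "text": w["text"],
                "start": w["start"],
                "end": w["end"],
            }
            captions.append(current)
        else:
            current["text"] += " " + w["text"]
            current["end"] = w["end"]
    return captions
```

Each caption keeps the start of its first word and the end of its last, which is what lets a reader line the text up against the envelope.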
What the manager actually does on Monday morning
A new management channel is useless if it does not fit the existing cadence. The phone channel piggybacks on the weekly operations review. Ten minutes, sampled calls, four checks per call.
The ten-minute phone channel review
Pull a sample of five calls from the prior week
Stratify by daypart (one lunch, one dinner rush, one off-peak, two holiday or weekend). Do not pick 'problem' calls; pick representative ones.
For each call, verify cross-talk frames = 0
If the count is nonzero, the AI was interrupted by a third party on the line (manager conference, cashier pickup). That is a policy problem, not an agent problem.
Check silent frame percent against the band
Default band is 20 to 40 percent. Below 20 means the agent is rushing (often on a menu that is well-mapped); above 40 means menu mapping is weak and clarifiers are stalling.
Tick the seven structural segments
Greet, intent capture, clarifier, customer resolution, upsell, readback, POS placement. Two or more missing goes into the onboarding queue with a one-line note.
Log the four numbers in the weekly ledger
Sample size, cross-talk total, silent percent band, segment coverage. Four columns, five rows. Trend week over week. Flag outliers.
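The per-call checks in this review can be expressed as a small function. The sketch below is illustrative only: the 20 to 40 percent band, the seven segment names, and the "two or more missing" rule come from the steps above, while the function name `review_call`, the flag wording, and the return shape are assumptions.

```python
SEGMENTS = (
    "greet", "intent capture", "clarifier", "customer resolution",
    "upsell", "readback", "POS placement",
)
SILENT_BAND = (20.0, 40.0)  # default band, in percent


def review_call(cross_talk_frames, silent_percent, segments_present):
    """Apply the weekly review checks to one sampled call."""
    missing = [s for s in SEGMENTS if s not in segments_present]
    low, high = SILENT_BAND
    if silent_percent < low:
        silent_flag = "rushing"
    elif silent_percent > high:
        silent_flag = "weak menu mapping"
    else:
        silent_flag = "in band"
    return {
        "cross_talk_ok": cross_talk_frames == 0,
        "silent_flag": silent_flag,
        "segments_covered": len(SEGMENTS) - len(missing),
        "missing_segments": missing,
        # Two or more missing segments go to the onboarding queue.
        "onboarding_queue": len(missing) >= 2,
    }
```

A call with zero cross-talk, 31.2 percent silence, and all seven segments passes every check; a call missing both the upsell and the customer resolution lands in the onboarding queue.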
What the ledger looks like in the terminal
The manager does not need a new tool. The ledger is a CSV. Five rows, four numbers, twenty cells. The whole phone channel fits in an email signature.
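A ledger of that size is easy to produce with the standard library. The sketch below shows one plausible layout, one row per sampled call; the column names, the helper names `ledger_csv` and `outliers`, and the example rows are all made up for illustration and are not PieLine output.

```python
import csv
import io

# Hypothetical column names; the article only fixes the count at four.
COLUMNS = ["sample", "cross_talk_frames", "silent_percent", "segments_covered"]


def ledger_csv(rows):
    """Render the weekly ledger as CSV text (header plus one row per call)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def outliers(rows, band=(20.0, 40.0), min_segments=6):
    """Return sample ids that break a target: cross-talk, band, or coverage."""
    return [
        r["sample"] for r in rows
        if r["cross_talk_frames"] != 0
        or not band[0] <= r["silent_percent"] <= band[1]
        or r["segments_covered"] < min_segments
    ]


# Illustrative values only, not measurements from real calls.
week = [
    {"sample": 1, "cross_talk_frames": 0, "silent_percent": 31.2, "segments_covered": 7},
    {"sample": 2, "cross_talk_frames": 0, "silent_percent": 28.4, "segments_covered": 7},
    {"sample": 3, "cross_talk_frames": 4, "silent_percent": 35.0, "segments_covered": 6},
    {"sample": 4, "cross_talk_frames": 0, "silent_percent": 44.1, "segments_covered": 7},
    {"sample": 5, "cross_talk_frames": 0, "silent_percent": 22.9, "segments_covered": 5},
]
```

Calling `print(ledger_csv(week))` gives a six-line file (header plus five rows) that can be pasted into an email or appended to last week's ledger, and `outliers(week)` names the rows worth a second look.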
What each of the four KPIs actually tells the manager
A KPI you cannot act on is a number you throw away. Here is the action each of the four drives, one per card.
Cross-talk frames
Target is 0. Nonzero means a third party interrupted the line (manager pickup, cashier conference). The fix is policy: route interrupts through a different channel. This KPI is the cleanest, because even one nonzero frame is a known specific event to investigate.
Silent frame percent
Target band 20 to 40 percent. Below 20, agent is rushing and skipping clarifiers (expect modifier misses downstream). Above 40, menu mapping is weak. The action is almost always a menu refinement pushed to onboarding.
Talk time ratio
AI-to-customer ratio near 2 to 2.5 is normal and means the AI is doing the two readbacks and the POS narration. Ratio below 1.5 suggests a readback was skipped. Above 3 usually means the customer was confused and the AI over-explained.
Peak envelope
Customer near 0.88, AI near 0.91 is well-normalized. A customer peak below 0.3 means the line is too quiet (often a bad cell connection), which translates to clarifier loops. This KPI catches infrastructure issues, not agent issues.
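The talk-ratio and peak-envelope cards translate into simple threshold checks. In this sketch the cut-offs (1.5, 2 to 2.5, 3, and 0.3) come from the cards above; the function names, the message strings, and the "watch" label for ratios that fall between the stated ranges are assumptions.

```python
def talk_ratio_action(ratio):
    """Map an AI-to-customer talk ratio to a suggested follow-up."""
    if ratio < 1.5:
        return "check for a skipped readback"
    if ratio > 3.0:
        return "check for customer confusion and over-explaining"
    if 2.0 <= ratio <= 2.5:
        return "normal"
    # Assumption: the cards do not classify 1.5-2.0 or 2.5-3.0.
    return "watch"


def peak_action(customer_peak):
    """Map the customer channel's peak RMS to a suggested follow-up."""
    if customer_peak < 0.3:
        return "line too quiet: check the connection"
    return "ok"
```

For the reference call, a ratio of 2.48 and a customer peak of 0.8857 both come back clean.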
How this compares to the old way of managing the phone
Most restaurants already have call recording. A few have call analytics that export call counts and average duration. Neither is management infrastructure. Here is the concrete difference.
| Feature | Call recording + spreadsheets | PieLine envelope + captions |
|---|---|---|
| Artifact per unit of work | An MP3 per call, a CSV of durations | voice-activity-data.ts, 6,157 samples per channel |
| Time to read one call | Up to 102 seconds (listen end to end) | Under 30 seconds (four numbers) |
| Comparable across locations | Per-location, per-format, not normalized | Same schema, rolls up by daypart |
| Cross-talk as a KPI | Not measured, not available | Counted at 60 Hz, target = 0 |
| Silent frame band | Not measured, not available | Sampled at 60 Hz, band 20 to 40 percent |
| Segment coverage | Subjective tagging by whoever listens | 7 labeled segments per call |
The six old channels plus the new one, side by side
If you are reworking the weekly review agenda, this is what it looks like with the phone added in.
Monday morning agenda, now with seven channels
- FOH: covers, turns, wait time, hospitality moments
- BOH: ticket times, station load, waste, temp logs
- Inventory: counts, pars, depletion, shrinkage
- Labor: schedule, hours, labor percent, overtime
- POS and cost of sales: voids, discounts, COGS, modifier mix
- Marketing and loyalty: repeat rate, campaigns, attribution
- Phone: cross-talk, silent percent, talk ratio, segment coverage
“The phone used to be the one line item on the agenda where we could only say 'it went okay, I think.' Now it is four numbers, and the numbers are comparable week over week.”
Internal note from a PieLine onboarding call
What disappears from the ops manual once this is in place
A common mistake is to add the seventh channel on top of everything else the team was doing for the phone. That defeats the purpose. Adopting the new channel also retires old practices that had no measurable output. Three of them, specifically.
Retire: phone etiquette module
Tone, pace, hold etiquette, script compliance. None of this applies when the caller never speaks to a human. The remaining training is a single shift briefing: here is the file, here are the four numbers, here is where to click.
Retire: missed-call estimation
Old practice was to multiply average order value by a guess at missed-call percent. Every call is now answered and logged. The estimate is replaced by a count.
Retire: cashier phone coaching
A cashier who takes no phone orders needs no phone coaching. That one-on-one slot becomes available for something else. Most operators redirect it into upsell training for the counter.
See the envelope on your own calls
We will spin up PieLine against your menu, show you the per-channel envelope, and walk the four phone KPIs on a real call from your line.
Frequently asked questions
What are the six channels most restaurant management and operations guides already cover?
FOH (covers, turns, wait time, hospitality moments), BOH (ticket times, station load, waste, temp logs), inventory (counts, pars, depletion, shrinkage), labor (schedule, hours, labor percent, overtime), POS and cost of sales (voids, discounts, COGS, modifier mix), and marketing and loyalty (repeat rate, campaigns, attribution). Every top guide you will find is some reshuffle of these six. They all assume the phone is either (a) a low-volume auxiliary to walk-ins or (b) noise that a cashier handles on the side. That assumption is load-bearing, and it breaks the moment the restaurant discovers that phone orders are 20 to 40 percent of a typical dinner rush.
Why has the phone never counted as a management channel before?
Because there was no artifact. BOH leaves tickets. Inventory leaves counts. POS leaves rows in a database. The phone left a customer who either hung up, complained, or placed an order that a human cashier re-entered by hand. A manager could not open the phone channel on Monday morning and read it. You could only ask the staff how it went, which is the same as not managing it. PieLine's voice-activity-data.ts changes that. Every call becomes a 60 Hz amplitude envelope plus speaker-tagged captions, which is readable the same way a ticket screen is readable.
What is the anchor file and what numbers can I verify from it?
The file is src/components/voice-activity-data.ts in the pieline-phones repo, regenerated by scripts/build-voice-activity-data.py against public/audio/dennys-order.mp3. Duration is 102.36 seconds. Sample rate is 60 Hz. Each channel (customer, AI) is an array of 6,157 floating-point RMS values. The file also contains 46 speaker-tagged captions with start and end timestamps. Peak amplitude for the customer channel is 0.8857; peak for the AI channel is 0.9067. If you run the build script locally with your own stereo WAV, the resulting TypeScript file has the same shape, which is why this works as a management artifact and not a one-off marketing asset.
What is a cross-talk frame and why is zero the target?
A cross-talk frame is a 1/60th-of-a-second slice of the call where both the customer envelope and the AI envelope are above 0.1 amplitude at the same time. In plain terms, both parties are speaking. In human-staffed phone ordering, cross-talk frames go up under pressure because the cashier starts interrupting to keep the line moving, which is where modifier misses originate. In the shipped reference call the cross-talk count is exactly 0 across all 6,157 frames. A manager can pull that number from any call the product processes. It is the first numeric KPI the phone channel has ever had.
What is a silent frame and why does 31.2 percent matter?
A silent frame is a 1/60th-of-a-second slice where both envelopes are under 0.02. The reference call has 1,922 silent frames out of 6,157, which is 31.2 percent. That silence is think time: the AI working through menu mapping, the customer deciding, the pause before an upsell. Managers reading the phone channel want silence to stay in a band. Too little means the agent is rushing and skipping clarifiers; too much means the agent is stalling on a menu it does not have mapped. The number is only useful because it is comparable across calls at 60 Hz resolution.
How long does the AI actually speak versus the customer in the reference call?
AI speech totals 49.04 seconds; customer speech totals 19.76 seconds. The remainder is silence and hand-off framing. That ratio (2.48 to 1, AI to customer) is high on purpose: the AI is reading back the full order twice and narrating the POS placement, both of which exist so the manager can audit the call against the seven structural segments. A cashier who interrupted would compress both sides and lose the audit trail. The manager treats the ratio itself as a metric, not just the total duration.
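As a worked version of that arithmetic, the sketch below totals speech time per speaker and derives the ratio and the remainder. It assumes speech time is the sum of caption durations, which the article does not confirm; the function name `talk_seconds` and the caption dictionary shape are hypothetical.

```python
def talk_seconds(captions):
    """Sum caption durations per speaker.

    Assumption: speech time equals the summed (end - start) of each
    speaker's captions. The real measurement may differ.
    """
    totals = {}
    for c in captions:
        duration = c["end"] - c["start"]
        totals[c["speaker"]] = totals.get(c["speaker"], 0.0) + duration
    return totals


# Reference-call figures quoted in the article.
AI_S, CUSTOMER_S, CALL_S = 49.04, 19.76, 102.36
ratio = round(AI_S / CUSTOMER_S, 2)                # AI : customer
remainder = round(CALL_S - AI_S - CUSTOMER_S, 2)   # silence and hand-off framing
```

With the published numbers the ratio works out to 2.48 and the remainder to 33.56 seconds.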
How do I integrate this channel into an existing weekly operations review?
Add one line to the review agenda called Phone (with KPIs: cross-talk frames, silent frame percent, talk ratio, segment coverage). On Monday, open five sampled calls from the prior week in the dashboard. For each, tick the seven structural segments (greet, intent, clarifier, resolve, upsell, readback, POS placement) and check whether the cross-talk count is still 0 and the silent percent is in its band. The whole exercise is ten minutes. It replaces the old agenda line that said "Phone etiquette", which had no measurable output.
Does this work for chains with multiple locations?
Yes. The envelope-and-caption artifact is per-call and per-location. Multi-unit operators aggregate by location and by daypart, the same way they already aggregate FOH covers. The data is small (roughly 6,000 floats per channel per 100 seconds of audio, plus the captions), so rolling it up across 10 or 100 locations is not a data-engineering problem. The management practice is new, but the infrastructure is not.
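A roll-up by location and daypart needs nothing more than a dictionary. The sketch below assumes each call has already been reduced to a small record; the field names and the function name `rollup` are hypothetical. Silent percent is recomputed from summed frame counts rather than averaged, so longer calls weigh more.

```python
from collections import defaultdict


def rollup(calls):
    """Aggregate per-call phone KPIs by (location, daypart)."""
    totals = defaultdict(
        lambda: {"calls": 0, "cross_talk_frames": 0, "silent_frames": 0, "frames": 0}
    )
    for call in calls:
        t = totals[(call["location"], call["daypart"])]
        t["calls"] += 1
        t["cross_talk_frames"] += call["cross_talk_frames"]
        t["silent_frames"] += call["silent_frames"]
        t["frames"] += call["frames"]
    return {
        key: {
            "calls": t["calls"],
            "cross_talk_frames": t["cross_talk_frames"],
            "silent_percent": (
                round(100.0 * t["silent_frames"] / t["frames"], 1)
                if t["frames"] else 0.0
            ),
        }
        for key, t in totals.items()
    }
```

The same function works unchanged for 10 or 100 locations, since each call contributes only four integers to the totals.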
What does PieLine actually provide beyond the data file?
The AI agent that answers the phone 24/7 across up to 20 simultaneous calls, POS integration with Clover, Square, Toast, NCR Aloha and Revel, menu scraping and modifier mapping, same-day onboarding, and a dashboard that exposes the envelope, captions, and KPIs per call. Pricing is $350 per month for the first 1,000 calls and $0.50 per call after. The data artifact is a byproduct of the product doing its job, not a separate service.
How is this different from just recording calls, which restaurants already do?
A raw recording is not management infrastructure. It is a 102-second MP3 that a manager would have to listen to end to end. The envelope and the captions compress the call into a shape a human can read in seconds: where the speakers overlapped, where there was silence, which of the seven segments the agent actually covered. That is the difference between a video file and a dashboard, and it is why the phone can now be the seventh channel in the weekly review rather than the one everybody ignores.