A 45-beat story,
at 62% fewer tokens.

How Qento keeps a protagonist faithful across a full story, and why a compact notation beats a verbose one.

A dev log from Karyll Catubig · solo iOS · shipped April 12, 2026

The problem

Forty-five beats. One context window.

A single Qento volume is forty-five beats long. One beat is one paragraph the model generates, triggered by one reader choice. Every beat has to carry evolving character trust, unresolved emotional debts, open narrative threads, genre-specific physics, the protagonist's psychology (wound, armor integrity, drive), and the reader's latest nudge.

Feed all of that to the model as prose instructions and you spend around three thousand tokens per beat. The bill climbs. The latency climbs. And the output drifts anyway. Three thousand tokens of prose still under-specifies a live state system.

First attempt

Prose instructions. Felt humane. Wasn't.

When you pack live state into prose, the model reads it like narrative. It picks up on the voice of the instructions and mimics that, instead of tracking the constraints the instructions describe. Output leans toward whatever tone the prompt itself was written in. Characters drift. Grudges evaporate. The wound stops mattering.

The insight

The model doesn't read sentences. It reads signal density.

A compact notation has a higher constraint-to-noise ratio than prose. The same state, expressed as equations, becomes a set of constraints the model can actually follow, not a wash of language it has to interpret first.

The language beneath

Kwento. The language Qento speaks to itself.

Kwento is the layer between the reader and the generator. It never shows up on screen. It only shapes what does. One master equation, two dialects.

Awit

The narration dialect. From Filipino epic verse. Writes beats, chapter openers, finales: the prose the reader sees.

Diwa

The interior dialect. From the Tagalog word for consciousness. Writes the thoughts the protagonist carries but doesn't speak.

The shared spine

One equation. Two dialects.

Awit and Diwa aren't separate systems. They parameterize the same master equation: nine terms, three time indices, typed coefficients. The dialects differ only in which blocks they select and which coefficients apply.

The coefficients aren't numbers. They're typed values that direct how the model composes each block. Below: the schema, then a seven-line excerpt of the scene state the equation operates on for a single beat.

P(t, k, n) = ⊕_i  c_i ⊙ B_i(t, k, n)

|B| = 9   (typed blocks)
c   ∈ {readOnly, initialize, subset(sel), conditional(pred), ...}
⊕   block sum    ⊙   coefficient composition
t   beat    k   chapter    n   volume

T = 0.82 ^ +0.15 | unsettled
W = stirring("she doesn't deserve to stay")
α = 0.42  →  armor cracking
τ = 0.31  →  trust slipping
ε = 0.58  →  she can feel it now
S = 3/4  →  everything unsaid
R = 2/4  →  sentences cut lean
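
For illustration, here is a minimal Python sketch of how a state like the excerpt above could be serialized into this notation. The field names and the reading of each symbol (tension, armor, awareness, and so on) are assumptions made for this example, not Kwento's actual schema, and the glosses after each arrow are passed in as plain strings rather than derived from the numbers.

```python
from dataclasses import dataclass


@dataclass
class SceneState:
    # Hypothetical field names; what each symbol really tracks is assumed.
    tension: float        # T
    tension_delta: float  # "^ +0.15", read here as the change this beat
    mood: str             # "| unsettled"
    wound_status: str     # W, e.g. "stirring"
    wound_line: str       # the thought attached to the wound
    armor: float          # α
    trust: float          # τ
    awareness: float      # ε (meaning assumed)
    subtext: int          # S, out of 4 (meaning assumed)
    restraint: int        # R, out of 4 (meaning assumed)


def render(state: SceneState, gloss: dict) -> str:
    """Serialize the state as one short line per term."""
    return "\n".join([
        f"T = {state.tension:.2f} ^ {state.tension_delta:+.2f} | {state.mood}",
        f'W = {state.wound_status}("{state.wound_line}")',
        f"α = {state.armor:.2f}  →  {gloss['α']}",
        f"τ = {state.trust:.2f}  →  {gloss['τ']}",
        f"ε = {state.awareness:.2f}  →  {gloss['ε']}",
        f"S = {state.subtext}/4  →  {gloss['S']}",
        f"R = {state.restraint}/4  →  {gloss['R']}",
    ])


example = SceneState(
    tension=0.82, tension_delta=0.15, mood="unsettled",
    wound_status="stirring", wound_line="she doesn't deserve to stay",
    armor=0.42, trust=0.31, awareness=0.58, subtext=3, restraint=2,
)
gloss = {
    "α": "armor cracking",
    "τ": "trust slipping",
    "ε": "she can feel it now",
    "S": "everything unsaid",
    "R": "sentences cut lean",
}
print(render(example, gloss))
```

Each term fits on one short line, which is the point: the value and its consequence sit side by side, with no connective prose for the model to imitate.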

About 350 tokens for the state layer per beat, roughly 88% compression from the prose version. Net ~62% fewer input tokens across the full per-beat payload, with measurably better craft adherence.
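
One way to read the master equation in code, as a rough sketch: each block B_i is a function of (t, k, n) that returns lines, each coefficient c_i is a typed value that decides how its block is used, and ⊕ concatenates whatever survives. The semantics given to readOnly, initialize, subset, and conditional below are guesses made for illustration, as are the three placeholder blocks and the two dialect tables; the real system has nine blocks and further coefficient types not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

Block = Callable[[int, int, int], List[str]]  # B_i(t, k, n) -> lines


@dataclass(frozen=True)
class ReadOnly:            # assumed: pass the block through unchanged
    pass


@dataclass(frozen=True)
class Initialize:          # assumed: emit only on the first beat (t == 0)
    pass


@dataclass(frozen=True)
class Subset:              # assumed: keep only lines whose symbol is selected
    keep: frozenset


@dataclass(frozen=True)
class Conditional:         # assumed: emit only when the predicate holds
    pred: Callable[[int, int, int], bool]


Coefficient = Union[ReadOnly, Initialize, Subset, Conditional]


def apply(c: Coefficient, block: Block, t: int, k: int, n: int) -> List[str]:
    """c ⊙ B(t, k, n): let the coefficient decide how the block is used."""
    lines = block(t, k, n)
    if isinstance(c, ReadOnly):
        return lines
    if isinstance(c, Initialize):
        return lines if t == 0 else []
    if isinstance(c, Subset):
        return [ln for ln in lines if ln.split(" ", 1)[0] in c.keep]
    if isinstance(c, Conditional):
        return lines if c.pred(t, k, n) else []
    raise TypeError(f"unknown coefficient: {c!r}")


def compose(terms: List[Tuple[Coefficient, Block]], t: int, k: int, n: int) -> str:
    """P(t, k, n): the block sum ⊕ over all (coefficient, block) pairs."""
    out: List[str] = []
    for c, block in terms:
        out.extend(apply(c, block, t, k, n))
    return "\n".join(out)


# Placeholder blocks, hypothetical content.
def premise_block(t, k, n):
    return ["premise: <set once per volume>"]


def state_block(t, k, n):
    return ["T = 0.82", "α = 0.42", "τ = 0.31"]


def interior_block(t, k, n):
    return ['W = stirring("she doesn\'t deserve to stay")']


# Two dialects over the same blocks; only selection and coefficients differ.
AWIT = [(Initialize(), premise_block),
        (Subset(frozenset({"T", "τ"})), state_block)]
DIWA = [(ReadOnly(), state_block),
        (Conditional(lambda t, k, n: t > 0), interior_block)]  # placeholder predicate

print(compose(AWIT, 0, 0, 0))
print(compose(DIWA, 4, 0, 0))
```

Under these assumptions a dialect is nothing more than a table of (coefficient, block) pairs, which is why adding a second dialect does not require a second system.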

The loop

How the beat refolds.

1. Nudge

Reader chooses what happens next. The choice writes deltas into the scene state.

2. State

The equation's terms update. Tension shifts. Armor cracks. Trust drifts. Genre physics scale which terms move and by how much.

3. Awit

The narration dialect reads the new state and writes the next beat. One paragraph, faithful to the updated scene.

4. Diwa

If a private moment opens, the protagonist speaks her mind. The exchange feeds back into the state as the next beat's starting condition.
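
The four steps above can be sketched as a single function. This is a simplified illustration, not the shipped loop: it assumes every term lives on a 0 to 1 scale, that genre physics is a per-term multiplier on incoming deltas, and that the Diwa exchange returns deltas of the same shape as a nudge. The names apply_deltas, run_beat, narrate, and reflect are hypothetical, and the two model calls are stubbed as plain callables.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    tension: float
    armor: float
    trust: float


def clamp(x: float) -> float:
    """Keep a term inside the assumed 0..1 range."""
    return max(0.0, min(1.0, x))


def apply_deltas(state: State, deltas: dict, physics: dict) -> State:
    """Write deltas into the state, scaled per term by genre physics."""
    updates = {
        term: round(clamp(getattr(state, term) + physics.get(term, 1.0) * delta), 2)
        for term, delta in deltas.items()
    }
    return replace(state, **updates)


def run_beat(state, nudge_deltas, physics, narrate, reflect=None):
    state = apply_deltas(state, nudge_deltas, physics)        # 1. Nudge, 2. State
    paragraph = narrate(state)                                # 3. Awit writes the beat
    if reflect is not None:                                   # 4. Diwa, only if a private moment opens
        state = apply_deltas(state, reflect(state), physics)  # feeds the next beat
    return paragraph, state


# Usage with stubbed model calls and made-up numbers.
start = State(tension=0.67, armor=0.50, trust=0.40)
physics = {"tension": 1.5}  # hypothetical genre where tension moves faster
paragraph, nxt = run_beat(
    start,
    nudge_deltas={"tension": 0.10, "trust": -0.09},
    physics=physics,
    narrate=lambda s: f"[beat written at T={s.tension:.2f}]",
    reflect=lambda s: {"armor": -0.08},
)
print(paragraph)
print(nxt)
```

The returned state is what the following beat starts from, so the Diwa exchange changes the next beat's conditions without ever appearing on screen.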

The numbers

What it cost. What it saved.

62% fewer input tokens, full per-beat payload
88% compression on the state layer vs prose
45 beats per volume
1 person built it
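
As a back-of-envelope check on how the two percentages relate, the sketch below works backward from the figures stated on this page (about 350 tokens of state, 88% state compression, 62% net reduction). It assumes all of the savings come from the state layer and that the rest of the payload is unchanged; the derived sizes are implied by that assumption, not measured.

```python
state_after = 350          # tokens, compact state layer (stated)
state_compression = 0.88   # stated
net_reduction = 0.62       # stated

# Prose state size implied by the compression figure.
state_before = state_after / (1 - state_compression)

# Tokens saved per beat, assuming only the state layer shrank.
saved = state_before - state_after

# Full payload before and after, implied by the net reduction.
payload_before = saved / net_reduction
payload_after = payload_before - saved

# Whatever is left is the part of the payload that is not state.
non_state = payload_after - state_after

print(f"prose state layer  ~{state_before:.0f} tokens")
print(f"payload before     ~{payload_before:.0f} tokens")
print(f"payload after      ~{payload_after:.0f} tokens")
print(f"non-state portion  ~{non_state:.0f} tokens")
```

Under that assumption the prose state layer comes out at roughly 2,900 tokens, in line with the "around three thousand" figure above, and the 62% is smaller than the 88% because the non-state part of the payload does not compress.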

Lessons

Three things I'd do differently.

I started with prose because prose felt humane. It isn't. The model is a pattern-reader; it rewards structure. Every hour I spent softening the instructions was an hour making the output worse.

I underweighted the integration cost. The master equation is maybe five percent of the work. You could reproduce the notation in a weekend. The real craft is the parameterization: how a nudge writes deltas into the state, how genre physics scale the terms, how Awit stays faithful across the drift, how Diwa's last thought becomes the next beat's starting condition. Kwento is the architecture that makes all of that click. That took the year.

I should have benchmarked earlier. The 62% and 88% numbers didn't exist until I built a proper eval harness. Until then I was optimizing in the dark, tightening prompts that weren't actually bottlenecked, leaving the real bottlenecks alone.

The story itself
is on the homepage.

iOS 18 and up · iPhone · no account required to start.