Marvin Labs | Guidance Tracking

Guidance Tracking captures the forward-looking statements management makes, links each one into a commitment tracked over time, and scores it against what actually happened. The output is a structured record of promises versus delivery for every company in coverage, summarized as two scores per company: accuracy and discipline.

It runs on Deep Research Agents and renders as structured tables, with each commitment showing its latest guided value, a source-linked date, a development history, and an outcome.

Reading and standardizing what management said is model-driven; the actuals each commitment is graded against are read straight from reported financials, not inferred by a model.

A Guidance Tracking report, with the accuracy and discipline scores at the top and each commitment listed alongside what was expected and what actually happened

Methodology

Data Sources

Guidance Tracking reads the same primary financial content that feeds the rest of the platform. Structured documents are always in scope:

Annual and interim reports
Earnings call transcripts (prepared remarks and Q&A)

Press releases are included when they carry forward-looking substance rather than boilerplate. Each press release is classified, and only those tagged as material to guidance are processed. Routine legal and supplemental filings are excluded.

Extraction walks the entire document, not only the summary or prepared-remarks section. Forward-looking statements frequently appear in the Q&A, in outlook sections, and in financial-review prose. Where a single sentence names several items, such as two geographies and two product lines, each is emitted as its own commitment, because each is independently verifiable.

Two Types of Forward-Looking Statements

Management predictions come in two forms, and both are tracked.

Metric-based statements

Metric-based statements set a quantitative target for a financial or operational measure: revenue, earnings, margins, user growth, production output. The target can be absolute, a range, or a relative term such as "mid-single digits" or "flat." Each is standardized into a testable assertion with a clearly defined metric, a target value or range, a segment (company-wide or a specific business unit), and a guided period.

Event-based statements

Event-based statements predict that a specific event will occur within a timeframe, without a numerical target: a product launch, a market entry, a facility opening, a regulatory decision. Each is standardized with a clearly defined event and a target date range.
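As an illustration of the two standardized forms, here is a minimal Python sketch. The class names, field names, and the split_by_segment helper are hypothetical and chosen for this example; they are not the platform's actual schema. The sketch also shows how one sentence naming two geographies could yield two independently verifiable commitments.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricCommitment:
    metric: str    # the measure being guided, e.g. "revenue growth"
    segment: str   # "company-wide" or a specific business unit
    period: str    # the guided period, e.g. "FY2025"
    low: float     # lower bound of the target (equal to high for a point target)
    high: float    # upper bound of the target
    quote: str     # original wording, kept so the interpretation can be reviewed


@dataclass(frozen=True)
class EventCommitment:
    event: str           # the predicted event, e.g. "Product X launch"
    window_start: date   # start of the target date range
    window_end: date     # end of the target date range
    quote: str


def split_by_segment(metric, period, targets, quote):
    """Emit one commitment per segment named in a single sentence (hypothetical helper)."""
    return [
        MetricCommitment(metric, segment, period, low, high, quote)
        for segment, (low, high) in targets.items()
    ]


# Invented example sentence covering two geographies.
quote = "We expect growth of 4 to 6% in Europe and 8 to 10% in Asia next year."
commitments = split_by_segment(
    "revenue growth", "FY2025", {"Europe": (4.0, 6.0), "Asia": (8.0, 10.0)}, quote
)

launch = EventCommitment(
    "Product X launch",
    date(2025, 7, 1),
    date(2025, 12, 31),
    "We plan to launch Product X in the second half.",
)
```

Both records keep the source quote, so the numeric or date interpretation can always be checked against what was said.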

Standardizing Vague Language

Management rarely speaks in clean numbers. The standardization step translates loose phrasing into a testable form while keeping the original quote attached, so every interpretation can be reviewed and challenged:

"Low to mid single digits" becomes a 1 to 5% range.
"Flat q/q" becomes a tolerance band around the prior period.
"Later in the year" becomes a date range with documented assumptions.
"Similar to last quarter" becomes a relational test against the prior result.

Linking Restatements: One Commitment Over Time

Management restates the same commitment across quarters. A target guided in January is reaffirmed in April, narrowed in July, and reported in October. That is one commitment moving through four filings, not four separate predictions.

Guidance Tracking links restatements into a single record. A commitment is one metric (or one anticipated event) for a given segment, target period, and currency basis. Every restatement of it is linked back to the same record, even when management shifts the date. The linked record shows how the guidance developed (raised, narrowed, lowered, reaffirmed, or dropped) and lets that path be compared against the realized outcome.

A metric guidance table where each row carries a history column tracing how the commitment was first guided, then raised, narrowed, or reaffirmed across later filings

Restatements are matched on meaning rather than exact wording. When the same underlying metric is phrased two ways across filings, such as "revenue growth" one quarter and "top-line growth" the next, both still resolve to the same record.
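A rough Python sketch of the linking idea follows. It keys each record on metric, segment, target period, and basis, and labels each restatement relative to the previous one. The alias table stands in for matching on meaning, which the product handles with a model. All names here (ALIASES, record_key, classify, link) and the classification rules are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical alias table standing in for meaning-based matching.
ALIASES = {"top-line growth": "revenue growth"}


def record_key(metric, segment, period, basis):
    name = metric.strip().lower()
    return (ALIASES.get(name, name), segment, period, basis)


def classify(prev, new):
    """Label a restatement relative to the previous range (assumed rules)."""
    (pl, ph), (nl, nh) = prev, new
    if (nl, nh) == (pl, ph):
        return "reaffirmed"
    if nl >= pl and nh <= ph:
        return "narrowed"
    prev_mid, new_mid = (pl + ph) / 2, (nl + nh) / 2
    if new_mid > prev_mid:
        return "raised"
    if new_mid < prev_mid:
        return "lowered"
    return "changed"  # e.g. widened around the same midpoint


def link(statements):
    """statements: (filing_date, metric, segment, period, basis, low, high) tuples."""
    records = defaultdict(list)
    for filed, metric, segment, period, basis, low, high in sorted(statements):
        history = records[record_key(metric, segment, period, basis)]
        label = "initial" if not history else classify(history[-1][1], (low, high))
        history.append((filed, (low, high), label))
    return dict(records)


# Invented filings: one commitment restated across three quarters.
records = link([
    ("2025-01-30", "Revenue growth", "company-wide", "FY2025", "reported", 4.0, 8.0),
    ("2025-04-29", "Top-line growth", "company-wide", "FY2025", "reported", 4.0, 8.0),
    ("2025-07-30", "Revenue growth", "company-wide", "FY2025", "reported", 5.0, 7.0),
])
labels = [label for _, _, label in next(iter(records.values()))]
```

In this example the three filings collapse into one record whose history reads initial, reaffirmed, narrowed.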

Resolving Outcomes

When a commitment's period arrives, its record is resolved against what actually happened.

Metric commitments are compared against the reported result on the same basis management guided to, so an adjusted target is checked against the adjusted actual and an as-reported target against the as-reported actual. The comparison reads the reported financials directly rather than inferring numbers from document prose. Each metric commitment is labeled:

Beat: the realized outcome was better than the guided target.
Met: the outcome fell within the guided target or range.
Missed: the outcome fell short of the guided target.

When the company never reported a comparable actual, the commitment is left unresolved rather than forced into a verdict.
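The labeling logic can be sketched in a few lines of Python. The function name and the higher_is_better flag are assumptions for this example; the flag covers metrics such as costs, where a result below the guided range is the better outcome.

```python
from typing import Optional


def resolve_metric(low: float, high: float, actual: Optional[float],
                   higher_is_better: bool = True) -> str:
    """Label a metric commitment against the reported actual (illustrative sketch)."""
    if actual is None:
        # No comparable actual was reported: leave it open, do not force a verdict.
        return "Unresolved"
    if low <= actual <= high:
        return "Met"
    above = actual > high
    return "Beat" if above == higher_is_better else "Missed"


# Revenue growth guided at 4 to 8 percent.
revenue_labels = [resolve_metric(4.0, 8.0, a) for a in (9.0, 6.0, 3.0, None)]

# Costs guided at 100 to 110, where coming in lower is better.
cost_label = resolve_metric(100.0, 110.0, 95.0, higher_is_better=False)
```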

Event commitments are resolved on a schedule informed by their expected dates, validated against company communications and, where needed, external reporting such as news articles and industry reports. Launches and facility openings are often publicly visible without a formal filing. Each event commitment is labeled:

Occurred: the event happened within its window.
Delayed: the window has fully passed without the event, or it happened materially later than guided.
Pending: the window is still open at the evaluation date. This is not yet judgeable and is not a miss.
Cancelled: the commitment was explicitly abandoned.
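The four event labels can be expressed as a small decision function. This Python sketch is an assumption-laden illustration: the function name, its parameters, and the grace_days allowance (a stand-in for "materially later") are invented here and do not describe the product's actual rules.

```python
from datetime import date, timedelta
from typing import Optional


def resolve_event(window_end: date, as_of: date,
                  occurred_on: Optional[date] = None,
                  cancelled: bool = False,
                  grace_days: int = 0) -> str:
    """Label an event commitment at an evaluation date (illustrative sketch)."""
    if cancelled:
        return "Cancelled"
    deadline = window_end + timedelta(days=grace_days)
    if occurred_on is not None:
        return "Occurred" if occurred_on <= deadline else "Delayed"
    # Not observed yet: an open window is not a miss.
    return "Pending" if as_of <= deadline else "Delayed"


window_end = date(2025, 12, 31)
on_time = resolve_event(window_end, date(2026, 2, 1), occurred_on=date(2025, 11, 15))
late = resolve_event(window_end, date(2026, 6, 1), occurred_on=date(2026, 4, 1))
still_open = resolve_event(window_end, date(2025, 9, 30))
lapsed = resolve_event(window_end, date(2026, 2, 1))
abandoned = resolve_event(window_end, date(2025, 9, 30), cancelled=True)
```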

Scoring: Accuracy and Discipline

Resolved commitments roll up into two scores per company, each on a scale of 1 to 5. The two are independent and measure different things.

Accuracy measures how well realized outcomes matched guidance, weighted by impact and miss magnitude. A small miss on a minor metric, or a slightly delayed event, barely affects the score. Repeated large misses on headline metrics such as revenue, EPS, or margin, or a cancelled flagship event, drive it down. Pending and unresolvable commitments do not count against accuracy.

Discipline measures how consistent and accountable the guidance practice is. It rewards commitments that were reaffirmed or explicitly updated each period and whose outcomes can be reconciled. It penalizes guidance issued once and then dropped without follow-up, or guidance that can never be checked because the company stopped disclosing the metric or never confirmed the event.
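The exact scoring formulas are not given here, so the following Python sketch shows only one way the two descriptions could be turned into 1-to-5 scores. Everything in it is an assumption: the function names, the 20% cap on miss magnitude, the linear mapping, and the input shapes. The caller is assumed to supply an impact weight and a relative shortfall for each commitment.

```python
from typing import List, Optional, Tuple

MISS_CAP = 0.20                      # assumed: a 20% shortfall earns the full penalty
SKIPPED = {"Pending", "Unresolved"}  # these do not count against accuracy


def accuracy_score(resolved: List[Tuple[str, float, float]]) -> Optional[float]:
    """resolved: (label, impact_weight, relative_shortfall) per commitment."""
    counted = [(w, s) for label, w, s in resolved if label not in SKIPPED]
    total = sum(w for w, _ in counted)
    if total == 0:
        return None  # nothing judgeable yet
    penalty = sum(w * min(s / MISS_CAP, 1.0) for w, s in counted) / total
    return 5.0 - 4.0 * penalty


def discipline_score(flags: List[Tuple[bool, bool]]) -> Optional[float]:
    """flags: (followed_up, reconcilable) per commitment."""
    if not flags:
        return None
    good = sum(1 for followed_up, reconcilable in flags if followed_up and reconcilable)
    return 1.0 + 4.0 * good / len(flags)


accuracy = accuracy_score([
    ("Met", 3.0, 0.0),       # headline metric, on target
    ("Missed", 1.0, 0.02),   # minor metric, small miss
    ("Pending", 5.0, 0.0),   # ignored
])
discipline = discipline_score([(True, True), (True, False)])
```

In this toy setup a small miss on a low-weight metric barely moves accuracy, while a commitment that can never be reconciled pulls discipline down regardless of how the numbers turned out.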

Separating the two reflects how analysts read management quality. A company can be accurate but undisciplined, hitting its numbers while managing guidance erratically. It can also be disciplined but inaccurate, reconciling every commitment while frequently missing. A single hit rate would hide that distinction. The approach follows research by Baik, Farber, and Lee (2011), which found that managers who issue frequent and accurate forecasts run firms that outperform peers on both operating and stock performance.

Practical Use

Build a data-driven view of management quality across a coverage universe, rather than relying on soft impressions.
Read an earnings call against the prior quarter's commitments instead of in isolation.
Distinguish teams that quietly drop guidance from teams that reaffirm and reconcile it.
Support an investment case with a record that holds up to scrutiny in an investment committee (IC) discussion.

For the wider framework these scores fit into, see Assessing Management Quality: Beyond Vibes, Handshakes, and Governance Checklists, which reads guidance accuracy alongside capital allocation, execution, and disclosure to judge a management team on its record rather than its reputation.

Guidance Tracking is available on the company Guidance page, which runs three agents: current state for open commitments, long-term performance for the scored track record, and earnings performance scoped to a single earnings event. The same evaluations are embedded in earnings reviews produced by Deep Research Agents and are queryable through AI Analyst Chat.