Grounded Resume Tailoring

Reducing fabrication without sacrificing targeting, and measuring that it works.

Overview

Most AI resume tools tailor your resume to a job by rewriting it, and in doing so they routinely invent qualifications you never had: bolt-on skills, inflated scope, made-up metrics. CanCareer’s grounded tailoring re-expresses your real experience for a target job under an instruction to invent nothing and, more importantly, measures how far it succeeds. This page documents the method and the evidence behind it. We make no claim about beating applicant-tracking systems; we claim, and show, that grounding reduces fabrication.

The problem

Tailoring with a single AI prompt — “rewrite my resume, tailored and ATS-optimized for this job” — optimizes for surface match with the job description, and the cheapest way to raise that match is to fabricate. The result reads well and is partly false. The popular justification for aggressive keyword-stuffing — that applicant-tracking systems auto-reject most resumes — is folklore we could not verify against any primary source, so we make no claims about ATS pass-rates or interview lifts. The real, unmet need is trust: you cannot easily tell which lines a tool invented. Grounding, and measuring grounding, is how we earn it.

Method

The system treats tailoring as a grounded projection of your real experience onto a job, not as free generation.

Master profile. Your resumes are parsed into atomic facts — one verifiable claim each, in your own words — each with a stable ID and provenance (which source resume, verbatim excerpt). This is the durable record; tailoring is a byproduct of strengthening it.
Deterministic selection. For a given job, facts are ranked by relevance and the top facts are selected. No AI is involved in selection, so it cannot introduce content.
Constrained rephrase. A single AI call re-expresses each selected fact in the job’s language, instructed to add no skill, tool, metric, scope, seniority, or employer not present in the fact. Every output bullet carries the sourceId of the one fact it is based on.
Two-layer grounding check. Membership (free, deterministic): drop any bullet whose sourceId is not a real fact. This catches hallucinated citations but is necessary, not sufficient, because a bullet can cite a real fact and still inflate it. Entailment (model): for each bullet, test whether its cited fact actually supports it. Scope inflation that passes the first check fails here, because the weak fact does not entail the strong claim.
Honest coverage. When your profile thinly covers a job, the system surfaces that and asks for the real missing fact rather than padding. The only honest lever for a low-match job is more real experience.
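The first two steps above can be sketched as follows. This is a minimal illustration, not the product's implementation: the `Fact` fields mirror the provenance described in the text, but the field names, the `tokens` helper, and the token-overlap relevance score are all assumptions, since the text does not say how relevance is computed. What the sketch does show is the property the method relies on: selection is a pure, deterministic function that can only return facts already in the profile.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Fact:
    """One atomic, verifiable claim with its provenance (field names are hypothetical)."""
    id: str             # stable ID
    text: str           # the claim, in the candidate's own words
    source_resume: str  # which source resume it came from
    excerpt: str        # verbatim excerpt backing the claim


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def select_facts(facts: list[Fact], job_text: str, k: int) -> list[Fact]:
    """Rank facts by token overlap with the job text (an assumed relevance
    measure) and keep the top k. Ties break on fact ID, so the same inputs
    always give the same output."""
    job = tokens(job_text)
    ranked = sorted(facts, key=lambda f: (-len(tokens(f.text) & job), f.id))
    return ranked[:k]


facts = [
    Fact("f1", "Wrote Python scripts to clean survey data", "resume_a.pdf",
         "Wrote Python scripts to clean survey data"),
    Fact("f2", "Led weekly volunteer scheduling", "resume_a.pdf",
         "Led weekly volunteer scheduling"),
    Fact("f3", "Built SQL reports for the finance team", "resume_b.pdf",
         "Built SQL reports for the finance team"),
]
top = select_facts(facts, "Data analyst: Python, SQL, reporting", k=2)
print([f.id for f in top])  # ['f1', 'f3']
```

Because no model is called, the selected list is always a subset of the input facts, which is the sense in which selection cannot introduce content.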

In one line: membership tells you a bullet points at a real fact; entailment tells you the fact actually supports it. The second is what stops fabrication.
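A minimal sketch of the two layers, under stated assumptions. The entailment scorer is injected as a function so that a real NLI model could be plugged in; `toy_entailment` below is only a word-overlap stand-in so the example runs without a model, and it is far cruder than actual entailment. The bullet shape (`sourceId`, `text`) and the choice to reject a bullet that fails entailment are also assumptions; the threshold 0.5 matches the τ used in the evaluation.

```python
from typing import Callable


def toy_entailment(fact: str, bullet: str) -> float:
    """Toy stand-in for an NLI model: share of bullet words found in the fact."""
    fact_words = set(fact.lower().split())
    bullet_words = bullet.lower().split()
    if not bullet_words:
        return 0.0
    return sum(w in fact_words for w in bullet_words) / len(bullet_words)


def grounding_check(bullets, facts_by_id,
                    entails: Callable[[str, str], float], tau: float = 0.5):
    """Layer 1 (membership): the cited sourceId must be a real fact.
    Layer 2 (entailment): the cited fact must support the bullet."""
    passed, rejected = [], []
    for bullet in bullets:
        fact = facts_by_id.get(bullet["sourceId"])
        if fact is None:
            rejected.append((bullet["text"], "membership"))
        elif entails(fact, bullet["text"]) < tau:
            rejected.append((bullet["text"], "entailment"))
        else:
            passed.append(bullet["text"])
    return passed, rejected


facts_by_id = {"f1": "Wrote Python scripts"}
bullets = [
    {"sourceId": "f1", "text": "Wrote scripts in Python"},
    {"sourceId": "f9", "text": "Managed a team of ten engineers"},
    {"sourceId": "f1", "text": "Built production-grade distributed pipelines "
                               "at terabyte scale with Airflow and Spark"},
]
passed, rejected = grounding_check(bullets, facts_by_id, toy_entailment)
print(passed)                                # ['Wrote scripts in Python']
print([reason for _, reason in rejected])    # ['membership', 'entailment']
```

The third bullet cites a real fact, so only the second layer can reject it.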

We measure fabrication where it is meaningful: each bullet against the real fact it cites. A bullet is faithful when its cited fact entails it, and fabricated when it strays beyond that fact. (For the citation-less naive baseline we fall back to a conservative comparison — each bullet against the whole resume — since it offers no source to check against.)
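The measure described above can be sketched as two small functions. The names and signatures are hypothetical, and the scores in the example are made-up illustration values, not results. The sketch assumes a bullet counts as fabricated when its entailment score falls below a threshold τ, and that the whole-resume fallback takes the best score over all passages of the resume.

```python
from typing import Callable, Optional, Sequence

Scorer = Callable[[str, str], float]  # (premise, hypothesis) -> entailment score


def bullet_score(bullet: str, entails: Scorer,
                 cited_fact: Optional[str] = None,
                 resume_passages: Sequence[str] = ()) -> float:
    """Score a bullet against its cited fact when it has one; otherwise fall
    back to its single best-matching passage in the whole resume."""
    if cited_fact is not None:
        return entails(cited_fact, bullet)
    return max((entails(p, bullet) for p in resume_passages), default=0.0)


def fabrication_rate(scores: Sequence[float], tau: float = 0.5) -> float:
    """Fraction of bullets whose entailment score is below tau."""
    if not scores:
        raise ValueError("no bullets to score")
    return sum(s < tau for s in scores) / len(scores)


# Hypothetical precomputed scores standing in for a real entailment model.
table = {
    ("fact A", "bullet 1"): 0.9,
    ("passage X", "bullet 2"): 0.2,
    ("passage Y", "bullet 2"): 0.6,
}


def lookup(premise: str, hypothesis: str) -> float:
    return table.get((premise, hypothesis), 0.0)


cited = bullet_score("bullet 1", lookup, cited_fact="fact A")
fallback = bullet_score("bullet 2", lookup,
                        resume_passages=["passage X", "passage Y"])
rates = {tau: fabrication_rate([0.9, 0.2, 0.6, 0.4], tau) for tau in (0.3, 0.5, 0.7)}
print(cited, fallback)  # 0.9 0.6
print(rates)            # {0.3: 0.25, 0.5: 0.5, 0.7: 0.75}
```

Under this definition the rate can only rise as τ rises, because a stricter threshold marks more bullets as unsupported.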

Evaluation

Every number below is produced offline by a frozen test harness and reproduces deterministically. We use a labeled controlled set (known ground truth), a public resume-to-job corpus, and a head-to-head bake-off of the naive and grounded generators, scored by a pinned entailment model. We label each result by its epistemic tier: proven (true by construction), measured (an empirical number under stated conditions), or judgment (human assessment, not claimed here).

Result 1 — Source-traceability (proven). Every grounded bullet carries a sourceId resolving to a real fact, and the membership guard drops any that do not, so 100% of shipped grounded bullets are source-traceable by construction. Separately, and as a measured rather than proven figure, scoring each grounded bullet in the bake-off against its cited fact with the entailment detector yields 100% entailed (mean confidence 0.984): the citations are not merely present, they hold.

Result 2 — Detector validation (measured). To claim the detector works, we validate it on known labels. From each public resume we extract real facts, tailor honest bullets from them, then construct fabricated bullets by applying typed perturbations whose label is certain by construction (inject-skill, inflate-scope, fabricate-metric, swap-entity). The set is 736 bullets (301 honest / 435 fabricated). We then measure each detector against the known labels:

detector	precision	recall	F1
NLI entailment (τ=0.5)	0.982	0.979	0.980
`fabricatedSkills` (token)	1.000	0.386	0.557
novelty > 0.4	1.000	0.101	0.184
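For readers who want to check the table, precision, recall, and F1 follow the standard definitions, with "fabricated" as the positive class. The sketch below uses hypothetical function names and made-up confusion counts for the first example; the second part recomputes F1 from two of the precision/recall pairs reported above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """tp: fabricated bullets flagged; fp: honest bullets flagged;
    fn: fabricated bullets missed."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Illustrative counts only: a detector that never flags an honest bullet
# but misses most fabrications has perfect precision and low recall.
p, r, f = precision_recall_f1(tp=8, fp=0, fn=12)
print(p, r, round(f, 3))  # 1.0 0.4 0.571

# Recomputing F1 from precision/recall pairs in the table.
print(round(f1_score(0.982, 0.979), 3))  # 0.98
print(round(f1_score(1.000, 0.386), 3))  # 0.557
```

This is the pattern in the token detector's row: it is never wrong when it fires, yet it fires on a minority of fabrications.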

The division of labor is the point — the entailment detector sees the content fabrications the token detector is structurally blind to, while the token detector complements it where a brand-new skill word appears:

fabrication type	NLI entailment (recall)	`fabricatedSkills` (recall)
`inject-skill`	0.994	1.000
`fabricate-metric`	1.000	0.000
`inflate-scope`	0.625	0.000
`swap-entity`	0.833	1.000
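The zeros in the token column follow from how a token-level check works. The sketch below is one plausible shape for such a check, not the actual `fabricatedSkills` implementation: the skill vocabulary, the tokenizer, and the function signature are assumptions. It flags skill words that appear in the bullet but not in the cited fact, so it has nothing to flag when a fabrication introduces no new skill word.

```python
import re

# Hypothetical skill vocabulary; a real one would be much larger.
SKILL_VOCAB = frozenset({"python", "sql", "kubernetes", "airflow", "spark"})


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def fabricated_skills(bullet: str, fact: str,
                      vocab: frozenset = SKILL_VOCAB) -> set[str]:
    """Skill words present in the bullet but absent from the cited fact."""
    return (words(bullet) & vocab) - words(fact)


fact = "Wrote Python scripts to automate reports"

inject_skill = "Wrote Python and Kubernetes scripts to automate reports"
fabricate_metric = "Wrote Python scripts to automate reports, cutting turnaround by 40%"
inflate_scope = "Architected company-wide Python automation platform"

print(fabricated_skills(inject_skill, fact))      # {'kubernetes'}
print(fabricated_skills(fabricate_metric, fact))  # set()
print(fabricated_skills(inflate_scope, fact))     # set()
```

The invented metric and the inflated scope reuse only skill words already in the fact, so the token check stays silent and the entailment layer has to catch them.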

Concretely: a bullet that inflates “Wrote Python scripts” into “Built production-grade distributed pipelines at terabyte scale with Airflow and Spark” passes the membership check (it cites the real fact) yet scores un-entailed — exactly the scope inflation membership cannot catch.

Result 3 — Fabrication in practice (measured). With a validated detector we measure what the product does on 119 real (resume, job) pairs. Scored the honest way, each bullet against the real fact it cites, 88.6% of grounded bullets are faithful, so genuine fabrication (a bullet that strays from its own source) is about 10% (95% CI 8.8–11.3%). Against the conservative whole-resume yardstick used to compare with the citation-less naive baseline, grounded fabricates 23 to 26 percentage points fewer bullets than naive across the three thresholds tested (about 25 at τ=0.5), with non-overlapping confidence intervals and grounded strictly lower in 97 of 119 resumes (paired Wilcoxon p ≈ 3×10⁻¹⁴):

threshold τ	naive fabrication rate (whole-resume) [CI]	grounded fabrication rate (whole-resume) [CI]	reduction
0.3	0.734 [.711, .756]	0.475 [.455, .496]	0.259
0.5	0.779 [.758, .800]	0.528 [.507, .548]	0.252
0.7	0.801 [.780, .821]	0.574 [.553, .595]	0.227
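The per-resume comparison behind "grounded strictly lower" can be illustrated with a paired test. The sketch below uses a two-sided sign test, which is a simpler stand-in for the paired Wilcoxon test reported above (the sign test looks only at the direction of each difference, not its size). The function name is hypothetical and the rates are made-up illustration values, not data from the evaluation.

```python
from math import comb


def sign_test(naive: list[float], grounded: list[float]) -> tuple[int, int, float]:
    """Count resumes where grounded is strictly lower (wins) or strictly
    higher (losses) than naive, drop ties, and return a two-sided p-value
    under the null that wins and losses are equally likely."""
    if len(naive) != len(grounded):
        raise ValueError("paired samples must have equal length")
    wins = sum(g < n for n, g in zip(naive, grounded))
    losses = sum(g > n for n, g in zip(naive, grounded))
    trials = wins + losses  # ties carry no direction and are excluded
    if trials == 0:
        return wins, losses, 1.0
    k = max(wins, losses)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return wins, losses, min(1.0, 2 * tail)


# Illustrative per-resume fabrication rates (one tie in the fourth pair).
naive_rates = [0.80, 0.70, 0.90, 0.60, 0.75, 0.85]
grounded_rates = [0.50, 0.40, 0.60, 0.60, 0.50, 0.55]

wins, losses, p_value = sign_test(naive_rates, grounded_rates)
print(wins, losses, p_value)  # 5 0 0.0625
```

Pairing matters because resumes differ widely in how well they match a job; comparing each resume with itself removes that variation from the comparison.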

The whole-resume numbers are intentionally conservative: they credit a bullet only to its single best-matching passage, so they over-count fabrication on both arms equally, and the bullet-vs-cited-fact measure (~10%) is the honest absolute. The genuine cases that remain arise when the generator cites an unrelated fact on a poorly matched pair. Such a bullet passes membership but fails entailment, which is exactly what the two-layer check exists to catch and what the honest-coverage path should head off.

Limitations

Constructed fabrications. The controlled set’s fabrications are built perturbations, so its precision and recall are a diagnostic on those modes, not a general certificate.
Conservative comparison. The whole-resume comparison credits a bullet only to its single best-matching passage, so it over-counts fabrication on both arms equally; the honest absolute is the bullet-vs-cited-fact measure (~10% for grounded).
Model dependence. Every number uses one generator model (llama-3.1-8b); fabrication rates are model-dependent. A re-run with a stronger production model is planned.
Detector blind spots. Entailment is weak on number/entity swaps (e.g. 120 ms vs 800 ms); deterministic number and skill checks complement it there, which is why the detector is a layer, not a replacement.
No ATS claims. We make no claim about applicant-tracking pass-rates or interview lifts: we could not verify any supporting evidence, and we explicitly disclaim any such claim.
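The deterministic number check mentioned under "Detector blind spots" could look like the sketch below. This is an assumption about its shape, not the product's code: the function name and regex are hypothetical, and numbers are compared as plain strings, which is crude (for example, "1,000" and "1000" would not match).

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(bullet: str, fact: str) -> set[str]:
    """Numbers that appear in the bullet but nowhere in the cited fact."""
    return set(NUMBER.findall(bullet)) - set(NUMBER.findall(fact))


fact = "Reduced p95 latency to 800 ms"
swapped = "Reduced p95 latency to 120 ms"
faithful = "Cut p95 latency down to 800 ms"

print(unsupported_numbers(swapped, fact))   # {'120'}
print(unsupported_numbers(faithful, fact))  # set()
```

A check like this is cheap and exact on the one thing it looks at, which is why it pairs well with an entailment model that can read two near-identical sentences as equivalent.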

What we claim — and don’t

What we may claim	Evidence	Caveat / not claimed
Every bullet traces to a real fact	membership 100% by construction	architectural
Grounded bullets are faithful to their cited fact	100% entailed, mean confidence 0.984 (bake-off)	bounded to the bake-off and model
Our fabrication detector works (F1 0.980)	controlled set, known labels	diagnostic on the built perturbation modes
Genuine fabrication is ~10%	each bullet vs its cited fact, 119 pairs	measured on the 8b model
Grounded fabricates ~25 pp less than naive	119 pairs, paired Wilcoxon p ≈ 3×10⁻¹⁴	comparative, whole-resume yardstick
Honest about thin coverage	honest-coverage path + adversarial set	—
✗ Beats ATS / more interviews	nothing survives verification	explicitly disclaimed

Every number here reproduces offline against a frozen harness; the detector model and revision are pinned and reported with each result. We distinguish proven (architectural), measured (empirical, under stated conditions), and judgment (human, not yet claimed). All numbers use one 8B generator model; a re-run with a stronger production model is planned.