ai-ml · nlp · intent-classification · distilbert · bart · model-selection

Phase A→B Classifier Deployment: Zero-Shot to Fine-Tuned

How to ship a working intent classifier on day one with zero labeled data, then graduate to a domain-specific model as you collect examples.

Updated April 6, 2026 · 3 min read

When building an intent classifier for a new domain, you have no labeled data on day one. How do you ship a working classifier immediately while building toward a domain-specific model? The answer is a two-phase deployment pattern that the industry has converged on.

The Cold Start Problem

You need to classify user queries into intents — SUMMARIZE, EXTRACT, REASON, SEARCH_ONLY — but you have no training data. Collecting and labeling 500+ examples per category takes weeks. Meanwhile, users need the feature now.

This is the classic ML cold start: you need a model to serve users, but you need user data to train a model. The two-phase approach breaks this chicken-and-egg cycle.

Phase A: Zero-Shot with BART-MNLI

Start with BART-MNLI (~400MB), a zero-shot classification model that requires no training data. It frames classification as natural language inference: “Does this text entail ‘this is a summarization request’?” This NLI framing works with any label set — you define intents as plain English descriptions, not numeric classes.
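Hugging Face's `transformers` library exposes this NLI framing directly through its zero-shot classification pipeline. Below is a minimal sketch; the `HYPOTHESES` wording and the `classify_intent` helper are illustrative (not from a specific codebase), and the classifier is injected as a callable so the mapping logic works independently of the ~400MB model download:

```python
# Map NLI hypotheses (plain-English label descriptions) to intent names.
# The exact phrasings here are assumptions -- tune them for your domain.
HYPOTHESES = {
    "this is a summarization request": "SUMMARIZE",
    "this is an extraction request": "EXTRACT",
    "this is a reasoning request": "REASON",
    "this is a search-only request": "SEARCH_ONLY",
}

def classify_intent(text, zero_shot):
    """zero_shot: a callable with the transformers zero-shot pipeline
    interface, returning {"labels": [...], "scores": [...]} sorted by
    score. Returns (intent, confidence) for the top hypothesis."""
    out = zero_shot(text, candidate_labels=list(HYPOTHESES))
    return HYPOTHESES[out["labels"][0]], out["scores"][0]

# In production, the callable would be the real pipeline:
#   from transformers import pipeline
#   zero_shot = pipeline("zero-shot-classification",
#                        model="facebook/bart-large-mnli")
```

Keeping the model behind a plain callable also makes the Phase B swap a one-line change.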

Phase A ships immediately. Accuracy is good (~85%) but not domain-specific. The real value is that a live classifier starts collecting labeled data through user corrections and feedback: every time a user corrects a misclassification, you get a free training example.
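One simple way to capture that correction signal is an append-only JSONL log, where each corrected prediction becomes a Phase B training example. A sketch, assuming illustrative field names:

```python
import json

def log_correction(path, text, predicted, corrected):
    """Append one user correction as a labeled example (JSONL).
    Field names are assumptions -- match them to your training schema."""
    record = {"text": text, "label": corrected, "model_prediction": predicted}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def load_examples(path):
    """Read the accumulated examples back for Phase B training."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Keeping the raw model prediction alongside the corrected label also lets you measure Phase A's real-world error rate for free.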

Phase B: Fine-Tuned DistilBERT

After collecting ~500+ labeled examples per category (via user corrections, golden set curation, and augmentation), fine-tune DistilBERT (~250MB). It is 4x faster at inference (~12ms vs ~50ms), 40% smaller, and achieves higher domain-specific accuracy (~95% vs ~85%).
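A compressed sketch of that fine-tuning step with the HF Trainer and early stopping. It assumes pre-tokenized `datasets`-style train/eval splits, and the hyperparameters shown are illustrative defaults, not tuned values; the heavy imports are deferred so the module loads without `transformers` installed:

```python
def fine_tune_distilbert(train_ds, eval_ds, num_labels=4,
                         out_dir="intent-distilbert"):
    """Sketch: fine-tune DistilBERT for intent classification.
    train_ds/eval_ds: tokenized datasets with input_ids/labels columns.
    Requires `pip install transformers torch`."""
    from transformers import (AutoModelForSequenceClassification,
                              EarlyStoppingCallback, Trainer,
                              TrainingArguments)
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=num_labels)
    args = TrainingArguments(
        output_dir=out_dir,
        eval_strategy="epoch",        # `evaluation_strategy` on older versions
        save_strategy="epoch",
        load_best_model_at_end=True,  # required for early stopping
        num_train_epochs=5,           # assumed hyperparameters
        per_device_train_batch_size=16,
    )
    trainer = Trainer(
        model=model, args=args,
        train_dataset=train_ds, eval_dataset=eval_ds,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)])
    trainer.train()
    return trainer
```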

The transition pipeline looks like this:

collect labeled data (corrections, golden sets)
  → augment (templates, synonyms) to reach 500+/class
  → fine-tune DistilBERT with HF Trainer + early stopping
  → assess against golden set (accuracy >= 0.90, f1 >= 0.88)
  → shadow comparison (run both models, compare metrics)
  → manual promotion (Staging → Production in MLflow)
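The assess and shadow steps above reduce to a couple of small checks. A sketch, using the thresholds from the pipeline (function and metric names are illustrative):

```python
def passes_golden_gate(metrics, min_accuracy=0.90, min_f1=0.88):
    """Promotion gate: block Phase B unless it clears the golden-set
    thresholds from the pipeline above."""
    return metrics["accuracy"] >= min_accuracy and metrics["f1"] >= min_f1

def shadow_compare(candidate_preds, incumbent_preds, gold_labels):
    """Run both models on the same traffic and compare accuracy.
    Only the incumbent's predictions are served to users."""
    def acc(preds):
        return sum(p == y for p, y in zip(preds, gold_labels)) / len(gold_labels)
    return {"incumbent": acc(incumbent_preds), "candidate": acc(candidate_preds)}
```

Promotion itself stays manual: a human reviews these numbers before moving the MLflow stage from Staging to Production.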

Model Comparison

| Dimension | BART-MNLI (Phase A) | DistilBERT (Phase B) |
| --- | --- | --- |
| Model size | ~400MB | ~250MB |
| Inference speed | ~50ms/sample | ~12ms/sample |
| Training data needed | 0 | 500+ per class |
| Accuracy (domain) | Good (~85%) | Better (~95%) |
| Flexibility | Any labels | Fixed label set |
| Architecture | 12-layer encoder-decoder | 6-layer encoder |

Why This Pattern Is Industry Standard

This two-phase approach is not novel. Google uses it (start generic, collect data, specialize), Spotify applies it to content tagging (zero-shot → fine-tuned), and most enterprise ML teams follow the same progression. The pattern works because it decouples shipping from data collection — you deliver value immediately while building toward a better model in parallel.

When to Use This Pattern

  • New classification tasks where labeled data does not exist yet
  • Products where user corrections provide a continuous labeling signal
  • Resource-constrained environments (NAS, edge) where model size matters after the transition

When Not To

  • If you already have abundant labeled data — skip Phase A entirely
  • If categories change frequently — zero-shot’s flexibility may be a permanent advantage over fine-tuning
  • If the classification task is too nuanced for NLI framing (e.g., subtle sentiment distinctions)

Key Takeaway

Do not wait for perfect data to ship a classifier. Start with zero-shot (BART-MNLI), collect data through user interactions, and graduate to a fine-tuned model (DistilBERT) when you have enough examples. The two-phase pattern lets you ship on day one and improve continuously.
