Knowledge Distillation Meets Reinforcement Learning

A two-stage training framework that first compresses a large "teacher" vision-language model into a smaller "student" using standard knowledge distillation, then further improves the student using reinforcement learning rewards.

Remote Sensing · Reinforcement Learning · Domain Shift · Knowledge Distillation · Vision-Language Model

Problem

Smaller models are easier to deploy, but classic knowledge distillation can be too "one-size-fits-all" for real-world data that varies widely (different sensors, regions, patients, etc.), so the student may generalize poorly under domain shift.

Approach

  • Stage 1 (KD): train the student to mimic the teacher using logit matching + feature alignment + task loss.
  • Stage 2 (RL refinement): treat the student as an "agent" that earns rewards for (a) moving its features closer to teacher cluster centers and (b) producing confident predictions, while gradually shifting weight away from pure KD over time (dynamic balance).
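The two stages above can be sketched roughly as follows. This is a minimal NumPy sketch, not the paper's exact formulation: the loss weights `alpha` and `beta`, the reward shaping, and the linear schedule in `dynamic_balance` are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kd_loss(student_logits, teacher_logits, student_feat, teacher_feat,
            labels, T=2.0, alpha=0.5, beta=0.5):
    """Stage 1: logit matching + feature alignment + task loss."""
    # Logit matching: KL(teacher || student) on temperature-softened distributions
    p_t = softmax(teacher_logits / T)
    log_p_s = np.log(softmax(student_logits / T) + 1e-12)
    logit_match = (p_t * (np.log(p_t + 1e-12) - log_p_s)).sum(-1).mean() * T * T
    # Feature alignment: mean squared error between student and teacher embeddings
    feat_align = ((student_feat - teacher_feat) ** 2).mean()
    # Task loss: cross-entropy against ground-truth labels
    probs = softmax(student_logits)
    task = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return task + alpha * logit_match + beta * feat_align

def rl_reward(student_feat, cluster_centers, student_logits, labels):
    """Stage 2: reward closeness to teacher cluster centers + confidence."""
    centers = cluster_centers[labels]                     # [B, D] center per class
    closeness = -((student_feat - centers) ** 2).sum(-1)  # higher when closer
    confidence = softmax(student_logits).max(-1)          # prob. of predicted class
    return closeness + confidence

def dynamic_balance(step, total_steps):
    """Shift weight from pure KD toward the RL objective over training."""
    lam = min(1.0, step / total_steps)  # hypothetical linear schedule
    return 1.0 - lam, lam               # (kd_weight, rl_weight)
```

In this reading, "dynamic balance" simply anneals the mixing coefficient: early steps optimize mostly the KD loss, later steps mostly the (negated) RL reward, so the student first copies the teacher and is then coached by the rewards.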

Results

  • Zero-shot classification: KDRL reports strong results across multiple remote-sensing benchmarks, e.g. RESISC (70.30), PatternNet (80.08), RSI-CB (48.85).
  • Image-text retrieval: high Recall@10 on RSITMD (67.44% image→text, 74.76% text→image) and RSICD (52.61% image→text, 57.04% text→image).

Highlights
  • "Teach, then coach." First the small model learns by copying a bigger expert model.
  • Rewards are intuitive: the student is rewarded for staying close to the teacher's feature clusters and for making confident predictions.
  • Strong wins on retrieval and classification benchmarks, though not across the board.