Self-distillation on math: probing
Looking for the SSD effect in output probabilities
Finding a training recipe to escape the self-distillation noise trap.
Initial baselines for simple self-distillation on competitive math.
Setting up the data needed to train and evaluate simple self-distillation for competitive math.
Improving LLM performance on competitive math through unverified self-distillation.