on the creation of narrow AI — NeurIPS 2025
New paper with Eric Michaud and Max Tegmark is out! (paper)
Some key findings:
1. In certain settings, curriculum effects are essential for high performance: sometimes you need to train on a broad distribution to learn a specific narrow skill.
2. Pruning often outperforms distillation at creating capable, task-specific networks.
3. Superposition remains a key bottleneck to efficient structured pruning.
4. Structured regularization can mitigate this problem by aligning task-specific features with prunable model components.
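To make finding 4 concrete: a common way to align features with prunable components is a group-lasso-style penalty that drives entire units (neurons, heads) toward zero, after which whole groups can be dropped. This is a generic toy sketch of that mechanism, not the paper's actual method or code; the weights and threshold below are made up for illustration.

```python
import math

# Hypothetical toy layer: 4 output neurons ("groups"), 3 input weights each.
W = [
    [0.9, -0.8, 0.7],     # strong neuron
    [0.01, 0.02, -0.01],  # near-zero neuron -> prunable
    [0.5, 0.4, -0.6],     # strong neuron
    [0.0, 0.03, 0.02],    # near-zero neuron -> prunable
]

# Group-lasso penalty: sum over neurons of the L2 norm of their weights.
# Added to the task loss during training, it pushes entire rows toward
# zero, so whole neurons (not scattered weights) become removable.
group_norms = [math.sqrt(sum(w * w for w in row)) for row in W]
penalty = sum(group_norms)

# Structured pruning step: keep only groups whose norm clears a threshold.
threshold = 0.1
keep = [n >= threshold for n in group_norms]
W_pruned = [row for row, k in zip(W, keep) if k]

print(keep)           # [True, False, True, False]
print(len(W_pruned))  # 2 neurons survive
```

The point of the group (row-wise) penalty, as opposed to plain L1 on individual weights, is that zeros land on whole components, so pruning removes real structure rather than leaving a sparse-but-unshrinkable matrix.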
Check out the preprint for additional findings.
update: accepted to NeurIPS 2025 (: