From Enormous Structured Models to On-device Federated Learning: Robustness, Heterogeneity and Optimization