The Zyphra Training Cookbook
August 26, 2024
PALO ALTO, CALIFORNIA

Training hybrid models is hard, and papers tend to gloss over the practical engineering work that goes into building good ones. This cookbook aims to help other technical groups hit the ground running when building their own hybrid (SSM, Transformer, MoE) models.

Authors
Quentin Anthony, Beren Millidge, Paolo Glorioso, and Yury Tokpanov
Collaborators
Daniel A Roberts (Sequoia Capital & MIT), Andrey Gromov (Meta FAIR), Kushal Tirumala (Meta FAIR), and Hassan Shapourian (Cisco)