Efficient training of language models to fill in the middle


We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill in the middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span. We use these ablations to prescribe strong default settings and best practices for training FIM models. We have released our best infilling model trained with these best practices in our API, and we publish our infilling benchmarks to aid future research.
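To make the transformation concrete, here is a minimal Python sketch of the augmentation described above. It assumes character-level spans with cut points chosen uniformly at random, and the sentinel strings PRE, SUF, and MID are illustrative placeholders, not the tokens used in the released model.

import random

# Minimal sketch of the FIM data transformation described above.
# Assumptions (not from the source): character-level spans, uniformly
# random cut points, and placeholder sentinel strings PRE/SUF/MID.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def fim_transform(document: str, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, split a document into (prefix, middle,
    suffix) and move the middle span to the end, marked by sentinels, so
    an autoregressive model learns to generate the middle last."""
    if len(document) < 2 or random.random() > fim_rate:
        return document  # keep this example in plain left-to-right order
    # Pick two distinct cut points uniformly at random to bound the middle.
    i, j = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Prefix-suffix-middle layout: the model conditions on the prefix and
    # suffix, then generates the middle last.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

if __name__ == "__main__":
    random.seed(0)
    print(fim_transform("def add(a, b):\n    return a + b\n", fim_rate=1.0))

The fim_rate parameter here corresponds to the data transformation frequency ablated in the paper: documents left untransformed continue to provide ordinary left-to-right training signal.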


