- Event: AI Conf 2024
- Language: English
- Speaker: Luca Baggi, AI Engineer - xtream
Bigger models or more data? The new scaling laws for LLMs
The incredibly famous Chinchilla paper changed the way we train LLMs. Its authors - including the current Mistral CEO - outlined scaling laws for maximising model performance under a fixed compute budget by balancing the number of parameters against the number of training tokens.
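As a rough illustration of that recipe, here is a minimal sketch of a compute-optimal split, assuming the common C ≈ 6·N·D approximation for training FLOPs and the ~20 tokens-per-parameter rule of thumb the paper is usually summarised by:

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal split of a FLOP budget into parameters and tokens.

    Assumes C ~= 6 * N * D and D ~= tokens_per_param * N,
    so C ~= 6 * tokens_per_param * N**2.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a 1e23 FLOP budget -> roughly 29B parameters and 580B tokens
n, d = chinchilla_optimal(1e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```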
Today, these heuristics are in jeopardy. LLaMA-3, for one, was trained on a seemingly unreasonable number of tokens - but that is precisely why it is so good. How much data do we actually need to train LLMs? This talk will shed light on the latest trends in model training and perhaps suggest newer scaling laws.
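To put that in perspective, a back-of-the-envelope comparison (a sketch: the ~15T-token figure is what Meta reported for Llama 3, and the "optimal" count reuses the ~20 tokens-per-parameter heuristic above):

```python
llama3_8b_params = 8e9
llama3_reported_tokens = 15e12            # Meta reports over 15T training tokens
chinchilla_tokens = 20 * llama3_8b_params  # ~160B tokens would be Chinchilla-optimal

print(f"over-training factor ~ {llama3_reported_tokens / chinchilla_tokens:.0f}x")
# -> roughly 90x more data than the Chinchilla-optimal budget for an 8B model
```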