While the "L" in Large Language Models (LLMs) suggests massive scale, the reality is more nuanced. Some LLMs contain trillions of parameters, while others operate effectively with far fewer.
Take a look at a few real-world examples and the practical implications of different model sizes.
LLM sizes and size classes
As web developers, we tend to think of the size of a resource as its download size. A model's documented size refers to its number of parameters instead. For example, Gemma 2B signifies Gemma with 2 billion parameters.
LLMs may have hundreds of thousands, millions, billions, or even trillions of parameters.
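To get a rough feel for how parameter count relates to download size, the following sketch estimates a weight file's size from the parameter count and the numeric precision the weights are stored in. The byte widths per precision are standard; the helper name and the TypeScript framing are illustrative, not something defined by this article.

```typescript
// Rough sketch: approximate download size of a model's weights, given its
// parameter count and the numeric precision the weights are stored in.
type Precision = "fp32" | "fp16" | "int8" | "int4";

// Standard byte widths per parameter for each precision.
const bytesPerParameter: Record<Precision, number> = {
  fp32: 4,   // 32-bit floating point
  fp16: 2,   // 16-bit floating point
  int8: 1,   // 8-bit quantized
  int4: 0.5, // 4-bit quantized
};

// Hypothetical helper: estimate the weight file size in gigabytes.
function estimateDownloadSizeGB(parameterCount: number, precision: Precision): number {
  return (parameterCount * bytesPerParameter[precision]) / 1e9;
}

// A 2-billion-parameter model such as Gemma 2B:
estimateDownloadSizeGB(2e9, "fp16"); // ≈ 4 GB of weights
estimateDownloadSizeGB(2e9, "int4"); // ≈ 1 GB of weights
```

The point of the sketch is only that parameter count and download size are related but distinct: the same 2B model can ship as very different file sizes depending on how its weights are quantized.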
Larger LLMs have more parameters than their smaller counterparts, which allows them to capture more complex language relationships and handle nuanced prompts. They're also often trained on larger datasets.
You may have noticed that certain model sizes, such as 2 billion or 7 billion parameters, are common: for example, Gemma 2B, Gemma 7B, and Mistral 7B. Model size classes are approximate groupings; Gemma 2B has approximately 2 billion parameters, but not exactly.
Model size classes offer a practical way to gauge LLM performance. Think of them like weight classes in boxing: models within the same size class are more comparable. Two 2B models should offer similar performance.
That said, a smaller model can match the performance of a larger model on specific tasks.