Large language models (LLMs) are language models with a very large number of parameters, the weights that the model learns from data. These parameters are what the model uses to predict the next word or token in a sequence of text. The more parameters a model has, the more complex and capable it can be, but also the more computationally expensive and resource-intensive it is to train and run.
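To make the idea concrete, here is a minimal sketch of counting a model's parameters and using them to predict the next token. It assumes the Hugging Face transformers library and the small GPT-2 checkpoint, neither of which is mentioned above; they serve only as an illustration of what "parameters predicting the next token" looks like in practice.

```python
# Minimal sketch: count a model's learned parameters and predict the next token.
# Assumes the Hugging Face "transformers" library and the small GPT-2 checkpoint
# (an illustrative choice, not a model discussed in the text).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The parameters are the learned weights; GPT-2 small has roughly 124 million.
print(f"parameters: {model.num_parameters():,}")

# Those parameters are used to score every vocabulary token as the next token.
inputs = tokenizer("Large language models are", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # one score per vocabulary token
next_token_id = logits[0, -1].argmax().item()  # pick the highest-scoring token
print(tokenizer.decode(next_token_id))
```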
Model sizes are usually measured in billions of parameters, commonly ranging from roughly 1.5 billion to more than 70 billion. For comparison, BERT, an earlier and widely used language model, has 110 million parameters in its base version, while PaLM 2, a recent LLM, reportedly has up to 340 billion parameters. Many models sit below 20 billion parameters and many sit above 70 billion, but relatively few fall in between. This suggests a gap in the parameter range of LLMs, which could be due to various factors, such as the availability of data, hardware, and optimization techniques.
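A rough back-of-the-envelope calculation shows why these parameter counts translate directly into resource cost. The sketch below assumes 2 bytes per parameter (fp16/bf16 weights) and ignores activations, the KV cache, and optimizer state, so it estimates only the memory needed to hold the weights of the sizes mentioned above.

```python
# Rough estimate of weight memory for the parameter counts named in the text,
# assuming 2 bytes per parameter (fp16/bf16) and counting weights only.
SIZES = {
    "BERT-base": 110e6,
    "small LLM (~1.5B)": 1.5e9,
    "large LLM (~70B)": 70e9,
    "PaLM 2 (reported ~340B)": 340e9,
}

for name, params in SIZES.items():
    gib = params * 2 / 2**30  # bytes for weights alone, converted to GiB
    print(f"{name:>24}: {params / 1e9:7.2f}B params ~ {gib:8.1f} GiB of weights")
```

Even under these optimistic assumptions, a 70B-parameter model needs on the order of 130 GiB just for its weights, which is why larger models are far more expensive to train and serve.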