Large language models (LLMs) are language models with a very large number of parameters, the weights that the model learns from data. These parameters are what the model uses to predict the next word or token in a sequence of text. The more parameters a model has, the more complex and capable it can be, but also the more computationally expensive and resource-intensive it is to train and run.
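To make the idea concrete, here is a minimal sketch of counting a model's parameters and using them to predict the next token. It assumes the Hugging Face transformers library and the small GPT-2 checkpoint, neither of which is mentioned above; they serve only as an illustration of what "parameters predicting the next token" looks like in practice.

```python
# Minimal sketch: count a model's learned parameters and predict the next token.
# Assumes the Hugging Face "transformers" library and the small GPT-2 checkpoint
# (an illustrative choice, not a model discussed in the text).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The parameters are the learned weights; GPT-2 small has roughly 124 million.
print(f"parameters: {model.num_parameters():,}")

# Those parameters are used to score every vocabulary token as the next token.
inputs = tokenizer("Large language models are", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # one score per vocabulary token
next_token_id = logits[0, -1].argmax().item()  # pick the highest-scoring token
print(tokenizer.decode(next_token_id))
```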
Model sizes are usually measured in billions of parameters, commonly ranging from roughly 1.5 billion to more than 70 billion. For comparison, BERT, an earlier and widely used language model, has 110 million parameters in its base version, while PaLM 2, a recent LLM, reportedly has up to 340 billion parameters. Many models sit below 20 billion parameters and many sit above 70 billion, but relatively few fall in between. This suggests a gap in the parameter range of LLMs, which could be due to various factors, such as the availability of data, hardware, and optimization techniques.
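A rough back-of-the-envelope calculation shows why these parameter counts translate directly into resource cost. The sketch below assumes 2 bytes per parameter (fp16/bf16 weights) and ignores activations, the KV cache, and optimizer state, so it estimates only the memory needed to hold the weights of the sizes mentioned above.

```python
# Rough estimate of weight memory for the parameter counts named in the text,
# assuming 2 bytes per parameter (fp16/bf16) and counting weights only.
SIZES = {
    "BERT-base": 110e6,
    "small LLM (~1.5B)": 1.5e9,
    "large LLM (~70B)": 70e9,
    "PaLM 2 (reported ~340B)": 340e9,
}

for name, params in SIZES.items():
    gib = params * 2 / 2**30  # bytes for weights alone, converted to GiB
    print(f"{name:>24}: {params / 1e9:7.2f}B params ~ {gib:8.1f} GiB of weights")
```

Even under these optimistic assumptions, a 70B-parameter model needs on the order of 130 GiB just for its weights, which is why larger models are far more expensive to train and serve.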