
Tokens In Foundational Models

Tokens are the smallest units of data that a model processes in Natural Language Processing (NLP). Depending on the granularity of the model's tokenizer, a token can be a word, a character, a subword, or even a whole sentence.

Areas of application

Natural Language Processing (NLP)
Language Modeling
Text Analysis
Information Retrieval

Example

In a language model, tokens could be individual words (‘dog’, ‘cat’, etc.), subwords (e.g., ‘dogs’ split into ‘dog’ and ‘s’), or single characters (‘d’, ‘o’, ‘g’).
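
The three granularities mentioned here can be illustrated with a minimal Python sketch. The function names, the toy vocabulary, and the greedy longest-match rule are assumptions made for illustration only; production tokenizers learn their subword vocabulary from data and are considerably more involved.

```python
import re


def word_tokens(text: str) -> list[str]:
    """Word-level tokens: lowercase the text and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def char_tokens(word: str) -> list[str]:
    """Character-level tokens: every character is its own token."""
    return list(word)


def subword_tokens(word: str, vocab: set[str]) -> list[str]:
    """Subword-level tokens via greedy longest-match against a toy vocabulary.

    This is a simplified, hypothetical scheme, not the algorithm of any
    particular library.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matched: fall back to a single character.
            end = start + 1
        tokens.append(word[start:end])
        start = end
    return tokens


# Hypothetical vocabulary chosen only to make the example work.
TOY_VOCAB = {"dog", "cat", "s", "play", "ing"}

words = word_tokens("Dogs chase cats")
chars = char_tokens("dog")
subwords = subword_tokens("dogs", TOY_VOCAB)

print(words)     # ['dogs', 'chase', 'cats']
print(chars)     # ['d', 'o', 'g']
print(subwords)  # ['dog', 's']
```

The same input text yields a different number of tokens at each granularity, which is why the choice of tokenizer affects how much text fits into a model's input.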

Resources

Semantic Data Model – Introduction
