What is the Context Window

The context window is the maximum number of tokens an LLM can consider at once when generating a response.

This includes your inputs, the model's previous responses, any global instructions, any files you've uploaded, and so on.

For example, if a model has a context window of 4,000 tokens, it can "see" up to 4,000 tokens of combined input and output at the same time.
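The arithmetic behind that limit can be sketched in a few lines of Python. This is a minimal illustration that assumes a hypothetical 4,000-token window and made-up token counts for each part of the input. Real limits and token counts depend on the specific model and its tokenizer. The function name `fits_in_context` is hypothetical.

```python
# Hypothetical numbers for illustration only; real limits and token counts
# depend on the model and its tokenizer.
CONTEXT_WINDOW = 4_000  # tokens the model can "see" at once (assumed)


def fits_in_context(input_tokens: int, max_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the input plus the room reserved for the reply fits in the window."""
    return input_tokens + max_output_tokens <= context_window


# Everything sent to the model counts toward the input (made-up counts).
input_parts = {
    "global_instructions": 300,
    "uploaded_file": 2_200,
    "conversation_history": 900,
    "new_message": 100,
}
input_tokens = sum(input_parts.values())  # 3,500 tokens of input

print(fits_in_context(input_tokens, max_output_tokens=500))    # 3,500 + 500 = 4,000: fits
print(fits_in_context(input_tokens, max_output_tokens=1_000))  # 3,500 + 1,000 = 4,500: too many
```

The point of the sketch is that input and output share one budget. A long prompt leaves less room for the model's reply.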

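When the total exceeds that limit, something has to give. Many chat applications drop the oldest messages to make room, which means your earlier instructions can be "forgotten". Other tools summarize older content instead, and many APIs simply reject the request with an error.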
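The "drop the oldest messages" strategy can be sketched as follows. This is a simplified illustration, not how any particular product works. `count_tokens` is a hypothetical stand-in that counts whitespace-separated words, whereas real models use subword tokenizers with different counts. `truncate_oldest` is also a hypothetical name.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Rough stand-in (assumption): real models use subword tokenizers,
    # so actual token counts will differ.
    return len(text.split())


def truncate_oldest(messages: List[str], budget: int) -> List[str]:
    """Drop messages from the front (oldest first) until the rest fit in `budget` tokens."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the earliest message is "forgotten" first
    return kept


history = [
    "Always answer in French.",           # 4 approximate tokens
    "What is the capital of Italy?",      # 6
    "La capitale de l'Italie est Rome.",  # 6
    "And of Spain?",                      # 3
]
print(truncate_oldest(history, budget=15))
```

With a 15-token budget, the four messages total 19 approximate tokens. Only the first message is dropped, and that message happens to be the "answer in French" instruction. This is exactly how an early instruction can silently stop applying. A common refinement is to always keep the global instructions and trim only the conversation history.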

Larger context windows allow for longer conversations, bigger documents, or more detailed instructions. However, using more of the window also tends to increase cost, since most APIs bill per token, and latency, since the model has more text to process. This is another important factor to consider when choosing a model.