What Is a Context Window in AI?

The short answer

A context window is the maximum amount of text a language model can process in a single request, measured in tokens. It includes both your input prompt and the model's response. Anything outside that window is invisible to the model during that request.

Think of it as the model's working memory for a single request. It cannot see anything outside that window while it answers you, so what you fit inside it matters.

How a context window works in practice

Context windows are measured in tokens, not words or characters. A token is a chunk of text, roughly a few characters or part of a word, and both your input and the model's output count against the same limit.
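
To make the idea of tokens more tangible, here is a minimal Python sketch that estimates a token count from character length. It assumes roughly four characters per token, which is only a common rule of thumb for English text; real tokenizers vary by model, and the function name estimate_tokens is hypothetical.

```python
import math

# Assumption: about 4 characters per token is a rough rule of thumb for
# English text. Real tokenizers differ by model, so treat this as an estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based only on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


if __name__ == "__main__":
    message = "Hi, I would like a refund for my last order."
    print(estimate_tokens(message))
```

For exact counts, use the tokenizer that your model provider publishes rather than an estimate like this one.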

This means the space is shared. If you paste a long document into a prompt, you leave less room for the model to reply. If you want a long answer, you need to leave headroom for it. Everything the model needs to reason about, your instructions plus the source material plus its own answer, has to fit inside that single window.
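
The shared budget comes down to simple arithmetic, which the sketch below illustrates. The window size of 8,000 tokens is a hypothetical figure chosen for the example, not the limit of any particular model, and both function names are made up for illustration.

```python
def remaining_output_tokens(window_size: int, prompt_tokens: int) -> int:
    """Tokens left for the model's reply once the prompt is counted."""
    return max(window_size - prompt_tokens, 0)


def fits(window_size: int, prompt_tokens: int, desired_output_tokens: int) -> bool:
    """True if the prompt plus the reply you want fit inside one window."""
    return prompt_tokens + desired_output_tokens <= window_size


if __name__ == "__main__":
    WINDOW = 8000  # hypothetical window size, in tokens
    print(remaining_output_tokens(WINDOW, 6500))  # 1500
    print(fits(WINDOW, 6500, 2000))               # False
```

In this example a 6,500-token prompt leaves only 1,500 tokens for the answer, so a 2,000-token reply would not fit unless the prompt were trimmed.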

What this means for small and mid-sized businesses

For most SMEs, the practical takeaway is simple: the model only knows what you put in front of it during a request. Information outside the context window is not available to the model. It does not silently remember your previous chats, your full customer database, or a file you sent last week unless that content is included in the current request.

This is why well-built automations feed the model exactly the right slice of information at the right moment, instead of dumping everything in. Fitting the relevant policy, the specific customer message, and clear instructions into one window usually beats overloading it with data the model will ignore or run out of room for.
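
One way to picture "the right slice of information" is the sketch below, which ranks snippets by how many words they share with the customer's question and keeps only those that fit a token budget. The word-overlap scoring and the four-characters-per-token estimate are simplifying assumptions; production systems often use search or embeddings instead, and select_snippets is a hypothetical name.

```python
import re


def _words(text: str) -> set:
    """Lowercase a text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_snippets(question, snippets, token_budget, chars_per_token=4):
    """Pick the snippets most related to the question that fit the budget.

    Assumption: relevance is approximated by shared words, and token cost
    by character count divided by chars_per_token (rounded up).
    """
    question_words = _words(question)

    def score(snippet):
        return len(question_words & _words(snippet))

    chosen, used = [], 0
    for snippet in sorted(snippets, key=score, reverse=True):
        if score(snippet) == 0:
            continue  # skip material unrelated to the question
        cost = -(-len(snippet) // chars_per_token)  # ceiling division
        if used + cost <= token_budget:
            chosen.append(snippet)
            used += cost
    return chosen


if __name__ == "__main__":
    docs = [
        "Refund policy: a refund is issued within 30 days of purchase.",
        "Our office is closed on public holidays.",
        "To request a refund, include your order number.",
    ]
    print(select_snippets("How do I request a refund for my order?", docs, 1000))
```

The unrelated note about office hours is left out, so the window is spent on material the model can actually use.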

A concrete everyday example

Say you use an AI assistant to draft replies to customer inquiries. You paste the customer's message and your refund policy into the prompt, then ask for a reply. All three parts, the policy, the message, and the drafted reply, live inside one context window.

If your refund policy is very long and you also paste in a full order history, you might use up so much of the window that the model has little space left to write a proper reply, or the request may not fit at all. Keeping the input tight and relevant leaves room for a useful answer.

When the context window is not the right thing to worry about

A bigger context window is not a fix for every problem. If your issue is that the model needs to remember things across many separate requests, over days or weeks, that is a memory and data-retrieval problem, not a window-size problem. You solve it by storing information and pulling the right pieces back in when needed.
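
Storing information and pulling it back in can be sketched as follows. This is a minimal in-memory illustration, assuming notes are keyed by a customer ID; a real system would use a database or search index, and NoteStore and build_prompt are hypothetical names.

```python
class NoteStore:
    """Keeps short notes per customer so later requests can reuse them."""

    def __init__(self):
        self._notes = {}

    def remember(self, customer_id: str, note: str) -> None:
        """Save a note for this customer."""
        self._notes.setdefault(customer_id, []).append(note)

    def recall(self, customer_id: str, limit: int = 3) -> list:
        """Return the most recent notes, up to limit, for this customer."""
        return self._notes.get(customer_id, [])[-limit:]


def build_prompt(store: NoteStore, customer_id: str, message: str) -> str:
    """Put stored notes and the new message into one prompt."""
    notes = store.recall(customer_id)
    history = "\n".join(f"- {note}" for note in notes) or "- (no notes yet)"
    return (
        "Known facts about this customer:\n"
        f"{history}\n\n"
        f"New message:\n{message}\n\n"
        "Draft a short, polite reply."
    )


if __name__ == "__main__":
    store = NoteStore()
    store.remember("c-17", "Prefers replies by email.")
    print(build_prompt(store, "c-17", "Where is my order?"))
```

The model still sees only what is in the prompt; the store simply decides which earlier facts are placed back into the window for each new request.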

It is also the wrong lever when accuracy is the real goal. Cramming more text into the window does not make the model more correct if the content is irrelevant or contradictory. Clear instructions and the right source material usually matter more than sheer capacity.

Frequently Asked Questions

Is a context window measured in words?

No. Context windows are measured in tokens, not words or characters. A token is a chunk of text roughly equal to a few characters or part of a word, and both your input and the output count toward the same limit.

Does the context window include the model's answer?

Yes. The context window includes both the input prompt you send and the model's generated response. They share the same space, so a longer prompt leaves less room for a longer answer.

Can a model remember things outside its context window?

Not during a single request. Information outside the context window is not available to the model at that moment. To carry information across requests, you need a system that stores it and reintroduces the relevant parts into a new prompt.
