What are LLMs?
LLM stands for Large Language Model. It is a type of advanced artificial intelligence model specifically designed to understand, process, and generate human language.
These models often come with billions, or even trillions, of parameters, allowing them to capture extremely complex language patterns.
Some popular examples are ChatGPT by OpenAI, Gemini by Google, Claude by Anthropic, and Llama by Meta.
By integrating LLMs into your apps, you can create various AI features such as virtual assistants, review summarizers, smart recommendations, and much more. The only limit is your imagination.
In this chapter, we are going to discuss how to choose the right LLM for your app, and how to integrate it into your application.
We'll use the OpenAI Platform for demonstration, which offers a collection of different LLMs with different capabilities and specializations.
Setting up an OpenAI account
To get started, first go to https://auth.openai.com/create-account and create a new account.
The OpenAI Platform is where you can set up your APIs, check usage stats, view logs, and test different LLMs offered by OpenAI.
Next, OpenAI requires you to purchase some credits before you can use the platform.
Navigate to Settings -> Billing, and click the Add to credit balance button.
Fill in your card information and set up your billing preferences. The minimum initial credit purchase is $5, and you can choose whether to enable automatic recharge. If you're just testing out the platform, $5 is more than enough.
Setting up an API key
Finally, we need to set up an API key so we can securely access OpenAI's various LLMs within our app.
Go to Settings -> API Keys, and click the Create new secret key button.
Type in a name for this API key, leave the project set to our default project, and then click Create secret key.
In the next window, copy the API key and save it somewhere safe; you won't be able to view it again, and we'll need it later.
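Hard-coding the key into your source is risky, because it can easily end up in version control. A safer pattern is to read it from an environment variable. Here is a minimal sketch, assuming the key is stored in the `OPENAI_API_KEY` variable, which is the name the official OpenAI SDKs look for by default:

```javascript
// Read the API key from an environment variable instead of
// hard-coding it, so the key never appears in your codebase.
function getApiKey() {
  const key = process.env.OPENAI_API_KEY;
  if (!key) {
    throw new Error(
      "Missing OPENAI_API_KEY - set it in your shell or a .env file"
    );
  }
  return key;
}
```

You can set the variable in your shell with `export OPENAI_API_KEY=...`, or load it from a `.env` file using a package such as dotenv.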
What are tokens?
Before we move on, there are some important key concepts for LLMs that we need to discuss.
The first one is the token.
Machines process human language differently than we do. We humans split a sentence into words, map each word to a meaning in our heads, and then understand the sentence as a whole.
But machines can't really "understand" words; instead, they split the sentence into tokens.
A token can be as short as one letter or as long as one word, or part of a word, depending on the language and the tokenizer used by the model.
For example, the word "fantastic" might be a single token, while "unbelievable!" could be split into several tokens like "un", "bel", "ievable", and "!". There are many visual tokenizer platforms you can use to test this, such as the GPT-Tokenizer Playground.
For the English language, a token roughly equals 3/4 of a word.
Each token is then converted into a number, known as a token ID, which the machine can understand and process.
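Real token counts are model-specific, but for quick budgeting you can approximate them with the 3/4-of-a-word rule of thumb above. The helper below is a hypothetical ballpark estimator, not how an actual tokenizer works:

```javascript
// Rough token estimate for English text, using the
// "1 token is roughly 3/4 of a word" rule of thumb.
// Real tokenizers (e.g. the one behind the GPT-Tokenizer
// Playground) will give different, exact counts.
function estimateTokens(text) {
  const trimmed = text.trim();
  if (trimmed === "") return 0;
  const words = trimmed.split(/\s+/).length;
  return Math.ceil(words / 0.75);
}
```

For precise counts, use a tokenizer built for the specific model you are targeting.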
Understanding tokens is important because they are the units in which LLMs measure cost and the maximum input/output size.
What is a context window?
The context window is the maximum number of tokens an LLM can consider at once when generating a response. This includes your inputs, the model's previous responses, any global instructions, any files you've uploaded, and so on.
For example, if a model has a context window of 4,000 tokens, it can "see" up to 4,000 tokens of combined input and output at the same time.
When you exceed that limit, the oldest tokens are chopped off, meaning your earliest instructions will be "forgotten".
Larger context windows allow for longer conversations, bigger documents, or more detailed instructions, but they may also increase cost and latency. This is another important factor you must consider when choosing a model.
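One common way to stay under the limit is to trim the oldest messages yourself before each request, instead of letting the model silently forget them. Here is a sketch, assuming you already have some token-counting function (passed in here as `countTokens`):

```javascript
// Keep only the most recent messages whose combined token
// count fits within the budget. Older messages are dropped
// first, mirroring how context overflow "forgets" the
// earliest turns of a conversation.
function trimToContextWindow(messages, maxTokens, countTokens) {
  const kept = [];
  let total = 0;
  // Walk backwards from the newest message.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = countTokens(messages[i]);
    if (total + cost > maxTokens) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}
```

Dropping whole messages from the front keeps the most recent turns intact, which is usually what matters for a coherent reply.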
Choosing the right model
One of the first steps in building an AI app is choosing the right LLM, and there are a few aspects you need to consider to make the right decision.
Before you start, ask yourself these questions:
- How fast do you need the LLM to be?
- How smart do you want the LLM to be?
- What is your budget?
Obviously, you don't want your app to take forever to give a response. It will have a significant negative impact on the user experience.
But, on the other hand, fast models are often not as "smart", and your app may require something with a decent level of accuracy, reasoning ability, and knowledge scope.
If you want your app to be profitable, cost is also an important factor. Models that are both fast and smart often cost more, so make sure your app has a clear budget.
In practice, balancing speed, capability, and cost is the impossible triangle when choosing a model. You will have to make sacrifices.
Let's take a look at how things work in action.
Head over to the OpenAI documentation, where you can see OpenAI's latest models.
Click on one of the models, and you can access its details page.
This is where you can see some of its key metrics.
GPT-5.2 has good reasoning ability and decent speed. However, it comes at a cost: it is one of OpenAI's more expensive models.
The price per million tokens (roughly 750,000 words) is $1.75 for input and $14 for output. Be absolutely sure you need something this powerful before opting for this model, or it will drain your bank account fast.
Another thing you should pay attention to is the context window. GPT-5.2 has a context window of 400,000 tokens, which is roughly 300,000 words, and the maximum output is 128,000 tokens, roughly 96,000 words.
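Since prices are quoted per million tokens, estimating the cost of a request is simple arithmetic. Below is a small helper using the GPT-5.2 rates quoted above; rates change over time, so always check the current pricing page before relying on any numbers:

```javascript
// Estimate the cost of one request in USD, given token counts
// and the per-million-token rates for input and output.
function estimateCostUSD(inputTokens, outputTokens, inputPerM, outputPerM) {
  return (
    (inputTokens / 1_000_000) * inputPerM +
    (outputTokens / 1_000_000) * outputPerM
  );
}

// GPT-5.2 rates as quoted in this chapter: $1.75/M input, $14/M output.
// Example: a request with 2,000 input tokens and 1,000 output tokens.
const cost = estimateCostUSD(2000, 1000, 1.75, 14);
```

Multiplying that per-request cost by your expected request volume gives a rough monthly bill, which is a useful sanity check before committing to a model.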
Comparing OpenAI models
By clicking the Compare button, you can compare multiple models and see their metrics side by side.
As a demonstration, we are going to compare GPT-5.2, GPT-5 mini, and GPT-5 nano.
Among the three models, GPT-5.2 is the smartest. It is most suitable for professional work, such as writing technical reports, data analysis, in-depth code reviews, and so on. But be careful, because it is also the most expensive.
GPT-5 nano is the fastest among the three, and it is ideal for quick summarization and classification tasks, or any high-volume jobs that don't require reasoning. It is also much cheaper than GPT-5.2, at only $0.05 per million tokens for input and $0.40 per million tokens for output.
GPT-5 mini is a more balanced option. It has better reasoning than GPT-5 nano and is cheaper than GPT-5.2, making it a good option for everyday use, such as standard customer support, workflow automation, and code assistance.
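The comparison above can be boiled down to a simple decision helper. The model names and trade-offs are the ones described in this chapter; treat the rules themselves as a hypothetical starting point for your own app, not an official recommendation:

```javascript
// Pick a model based on the trade-offs discussed above:
// deep reasoning -> GPT-5.2, cheap high-volume work -> GPT-5 nano,
// everything else -> the balanced GPT-5 mini.
function pickModel({ needsDeepReasoning = false, highVolume = false } = {}) {
  if (needsDeepReasoning) return "gpt-5.2";
  if (highVolume) return "gpt-5-nano";
  return "gpt-5-mini";
}
```

In a real app, the deciding factors would likely be richer (latency targets, per-request budget, context size), but the shape of the decision stays the same.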