LLM Basics: Engineering with Google AI Studio
Deep Dive: How LLMs Actually Work
Today we're moving beyond simple analogies. We're going to look at the actual levers we pull when working with Large Language Models (LLMs) like Gemini, using Google AI Studio as our laboratory.
1. The Raw Material: Tokens
LLMs don't process strings of text; they process sequences of Tokens.
- A token is a numerical ID representing a chunk of characters (a word, a subword, or a punctuation mark).
- In typical English text, a token averages roughly 4 characters (about three-quarters of a word); rare words are often split into several tokens.
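The 4-characters-per-token figure is only an average, but it makes for a useful back-of-the-envelope estimate. A minimal sketch (real tokenizers use learned subword vocabularies, so this is a ballpark, not an exact count):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token heuristic.

    Real tokenizers split on learned subword units, so this is only a
    ballpark figure, not an exact count.
    """
    return max(1, round(len(text) / 4))

prompt = "Explain how large language models process text."
print(estimate_tokens(prompt))  # ~12 tokens for this 47-character string
```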
2. Training Data vs. Context Window
It's important to understand where the AI's knowledge comes from:
- Training Data: This is the massive "library" the model read during its creation. It's static and doesn't change once the model is finished. If something happened yesterday, the model might not know about it because it wasn't in the training data.
- Context Window: This is the "short-term memory" or "workspace" for your current chat. Anything you paste into the prompt, or anything discussed earlier in the conversation, lives here. In Google AI Studio, Gemini has a massive context window of one to two million tokens (depending on the model), meaning you can drop entire books or hour-long videos into it to discuss.
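Combining the two ideas above, you can sanity-check whether a document will fit in the context window before pasting it in. A sketch using the same chars-per-token heuristic (the 1M-token default is a stand-in for Gemini-class models; `fits_in_context` is a hypothetical helper, not part of any SDK):

```python
def fits_in_context(text: str, context_window: int = 1_000_000) -> bool:
    """Check whether a document's rough token count fits the model's
    context window, using the ~4 chars/token heuristic."""
    estimated_tokens = len(text) // 4
    return estimated_tokens <= context_window

# A 300,000-word novel at ~5 characters per word:
novel = "x" * (300_000 * 5)
print(fits_in_context(novel))  # ~375,000 tokens -> fits in a 1M window
```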
3. Prompts: System vs User
In Google AI Studio, you'll see a distinction between different types of instructions. Mastering these is the key to getting high-quality results.
The System Prompt (The Director)
The System Prompt defines the "persona," fundamental rules, and constraints for the entire conversation. It tells the model how to behave before the user even speaks.
- Example: "You are a senior Python developer. Always explain code simply and provide a 'Try This' challenge at the end of every response."
- Pro Tip: Use the System Prompt to set the tone (professional, funny, academic) and to prevent the model from straying off-task.
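The separation can be made concrete as a request payload. The field names below are modeled loosely on the Gemini API's `system_instruction` and `contents` fields, but this is an illustrative data structure only; no network call is made:

```python
# Illustrative request payload separating the two prompt types.
request = {
    # Applies to the WHOLE conversation; set once, before the user speaks.
    "system_instruction": (
        "You are a senior Python developer. Always explain code simply "
        "and provide a 'Try This' challenge at the end of every response."
    ),
    # The running conversation: each new user turn is appended here,
    # without repeating the persona.
    "contents": [
        {"role": "user", "parts": ["Explain what a list comprehension is."]},
    ],
}

request["contents"].append(
    {"role": "model", "parts": ["A list comprehension builds a list..."]}
)
print(len(request["contents"]))  # 2 turns so far
```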
The User Prompt (The Request)
The User Prompt is the specific task, question, or data you are providing right now. This is where you apply specific prompting strategies to improve the model's performance:
- Zero-Shot: Just asking a question (e.g., "Summarize this article").
- Few-Shot: Providing 2-3 examples of input/output pairs within the prompt to "prime" the model for a specific format.
- Chain of Thought (CoT): One of the most powerful techniques. By simply adding an instruction like "Let's think step by step," you encourage the model to break down complex problems into smaller, logical parts before giving a final answer. This significantly improves reasoning on math, logic, and coding tasks.
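The three strategies above are just different ways of assembling the prompt string. A sketch (`build_prompt` is a hypothetical helper, not part of any SDK):

```python
def build_prompt(task: str, examples=None, chain_of_thought=False) -> str:
    """Assemble a user prompt using the strategies above.

    - zero-shot: just the task
    - few-shot: prepend input/output example pairs to prime the format
    - chain of thought: append a step-by-step instruction
    """
    lines = []
    for inp, out in (examples or []):
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {task}\nOutput:")
    if chain_of_thought:
        lines.append("Let's think step by step.")
    return "\n\n".join(lines)

few_shot = build_prompt(
    "sunny",
    examples=[("rainy", "bring an umbrella"), ("snowy", "wear boots")],
    chain_of_thought=True,
)
print(few_shot)
```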
4. Decoding Parameters: Controlling the "Guess"
When an LLM generates a response, it calculates the probability of every possible next token. We use parameters to control how it selects the winner:
Temperature
Controls randomness.
- 0.0: Deterministic. The model always picks the most likely token (great for facts/code).
- 1.0+: Creative. The model takes more risks (great for brainstorming).
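Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. A minimal sketch of that mechanism (real decoders work over vocabularies of ~100k+ tokens, not three):

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick the next token index from raw scores (logits).

    temperature 0 -> greedy argmax (deterministic); higher values
    flatten the distribution, so riskier tokens win more often.
    """
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.0, 0.1]  # token 0 is the model's favourite
print(sample_with_temperature(logits, 0.0))  # always 0 at temperature 0
```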
Top K
Limits the model to only consider the top K most likely tokens.
- If Top K is 3, the model ignores everything except the 3 best guesses. This prevents the model from picking a completely nonsensical word.
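Top K is a straightforward cutoff on the ranked candidate list. A sketch:

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens; everything else is excluded
    from sampling. `probs` maps token -> probability."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k]

probs = {"cat": 0.5, "dog": 0.3, "car": 0.15, "xylophone": 0.05}
print(top_k_filter(probs, 3))  # ['cat', 'dog', 'car'] -- 'xylophone' is dropped
```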
Top P (Nucleus Sampling)
Similar to Top K, but based on a cumulative probability threshold.
- If Top P is 0.9, the model looks at the smallest set of tokens whose probabilities add up to 90%. This allows the selection pool to shrink or grow dynamically based on how "confident" the model is.
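The dynamic shrink-or-grow behaviour is easiest to see side by side. A sketch of nucleus filtering with one confident and one spread-out distribution:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p (nucleus sampling). The pool shrinks when the model is
    confident and grows when probability is spread out."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool, cumulative = [], 0.0
    for token, prob in ranked:
        pool.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return pool

confident = {"the": 0.92, "a": 0.05, "an": 0.03}
spread = {"red": 0.35, "blue": 0.3, "green": 0.3, "mauve": 0.05}
print(top_p_filter(confident, 0.9))  # ['the'] -- one token already covers 90%
print(top_p_filter(spread, 0.9))     # ['red', 'blue', 'green'] -- pool grows
```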
5. Live Demo: Google AI Studio
To see these in action, weâll be using Google AI Studio.
- Select a Model: (e.g., Gemini 1.5 Pro).
- Adjust the Sliders: Look at the right sidebar to find Temperature, Top K, and Top P.
- Inspect Tokens: Use the "Token counter" to see how your prompt is being "read" by the machine.