LLM Basics: Engineering with Google AI Studio


Deep Dive: How LLMs Actually Work

Today we’re moving beyond simple analogies. We’re going to look at the actual levers we pull when working with Large Language Models (LLMs) like Gemini, using Google AI Studio as our laboratory.

1. The Raw Material: Tokens

LLMs don’t process strings of text; they process sequences of Tokens.

  • A token is an integer ID that stands for a chunk of characters — a word, part of a word, or a piece of punctuation.
  • In English text, one token averages roughly 4 characters (about three-quarters of a word), though the exact split depends on the model’s tokenizer.
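The “roughly 4 characters per token” rule above gives a quick way to budget prompts before you ever open AI Studio. The sketch below is only that heuristic — real tokenizers are model-specific, so treat the numbers as estimates, not exact counts:

```python
# Rough token estimate using the "1 token ≈ 4 characters" rule of thumb.
# Real tokenizers (like Gemini's) are model-specific; this heuristic is
# only for quick back-of-the-envelope budgeting.

def estimate_tokens(text: str) -> int:
    """Approximate token count for English text."""
    return max(1, round(len(text) / 4))

prompt = "LLMs don't process strings of text; they process sequences of tokens."
print(estimate_tokens(prompt))  # ~17 tokens for this 69-character sentence
```

AI Studio’s built-in token counter (covered in the demo section) reports the real count for whichever model you have selected.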

2. Training Data vs. Context Window

It’s important to understand where the AI’s knowledge comes from:

  • Training Data: This is the massive “library” the model read during its creation. It’s static and doesn’t change once the model is finished. If something happened yesterday, the model might not know about it because it wasn’t in the training data.
  • Context Window: This is the “short-term memory” or the “workspace” for your current chat. Anything you paste into the prompt, or anything discussed earlier in the conversation, lives here. In Google AI Studio, Gemini models offer context windows of up to one to two million tokens (depending on the model), meaning you can drop entire books or hour-long videos into a conversation and discuss them.
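The key mechanical point: the “memory” is just the running transcript, re-sent to the model on every turn. When the transcript exceeds the budget, the oldest messages fall out. The sketch below illustrates that idea with a deliberately tiny budget and the 4-characters-per-token estimate — the names and numbers are illustrative, not Gemini’s actual implementation:

```python
# A minimal sketch of the "context window": every turn, the full transcript
# is re-sent to the model, so earlier messages stay visible only while they
# fit the token budget. The budget here is tiny so trimming is visible;
# real Gemini budgets run into the millions of tokens.

TOKEN_BUDGET = 10

def trim_to_budget(history: list[str], budget: int = TOKEN_BUDGET) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept, used = [], 0
    for message in reversed(history):        # walk newest-first
        cost = max(1, len(message) // 4)     # crude 4-chars-per-token estimate
        if used + cost > budget:
            break                            # oldest messages fall out first
        kept.append(message)
        used += cost
    return list(reversed(kept))              # restore chronological order

history = ["User: hello there", "Model: hi!", "User: summarize our chat so far"]
print(trim_to_budget(history))  # the oldest message no longer fits
```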

3. Prompts: System vs. User

In Google AI Studio, you’ll see a distinction between different types of instructions. Mastering these is the key to getting high-quality results.

The System Prompt (The Director)

The System Prompt defines the “persona,” fundamental rules, and constraints for the entire conversation. It tells the model how to behave before the user even speaks.

  • Example: “You are a senior Python developer. Always explain code simply and provide a ‘Try This’ challenge at the end of every response.”
  • Pro Tip: Use the System Prompt to set the tone (professional, funny, academic) and to prevent the model from straying off-task.
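Under the hood, the system prompt and the user prompt travel as separate fields of the request. The sketch below builds a payload in the shape of the Gemini REST `generateContent` body — the field names (`systemInstruction`, `contents`) follow the public API docs, but treat this as an illustrative payload, not an official client:

```python
# Sketch: a system prompt (the Director) and a user prompt (the Request)
# are sent as separate fields. Field names follow the Gemini REST API's
# generateContent body shape; the payload itself is illustrative.

import json

def build_request(system_prompt: str, user_prompt: str) -> dict:
    return {
        "systemInstruction": {"parts": [{"text": system_prompt}]},  # the Director
        "contents": [  # the conversation so far; here, a single user turn
            {"role": "user", "parts": [{"text": user_prompt}]}
        ],
    }

request = build_request(
    "You are a senior Python developer. Always explain code simply.",
    "Explain what a list comprehension is.",
)
print(json.dumps(request, indent=2))
```

Because the system instruction sits outside the conversation turns, it keeps applying no matter how long the chat gets.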

The User Prompt (The Request)

The User Prompt is the specific task, question, or data you are providing right now. This is where you apply specific prompting strategies to improve the model’s performance:

  • Zero-Shot: Asking directly, with no examples (e.g., “Summarize this article”).
  • Few-Shot: Providing 2-3 examples of input/output pairs within the prompt to “prime” the model for a specific format.
  • Chain of Thought (CoT): This is one of the most powerful techniques. By simply adding an instruction like “Let’s think step by step,” you encourage the model to break down complex problems into smaller, logical parts before giving a final answer. This significantly improves reasoning for math, logic, and coding tasks.
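The strategies above can be combined mechanically: a few-shot prompt is just examples concatenated ahead of your question, and the CoT cue is one extra line. A hedged sketch (the Q/A formatting is one common convention, not a requirement of any API):

```python
# Assembling a few-shot prompt with an optional chain-of-thought cue.
# The "Let's think step by step" phrasing is the standard CoT trigger;
# the helper itself is plain string formatting.

def few_shot_prompt(examples: list[tuple[str, str]], question: str,
                    chain_of_thought: bool = False) -> str:
    lines = []
    for inp, out in examples:                      # 2-3 pairs "prime" the format
        lines.append(f"Q: {inp}\nA: {out}")
    lines.append(f"Q: {question}")
    if chain_of_thought:
        lines.append("Let's think step by step.")  # CoT trigger
    lines.append("A:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("2 + 2", "4"), ("10 - 3", "7")],
    "6 * 7",
    chain_of_thought=True,
)
print(prompt)
```

Paste the resulting string into the user-prompt box in AI Studio to see how the examples shape the model’s answer format.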

4. Decoding Parameters: Controlling the “Guess”

When an LLM generates a response, it’s calculating the probability of every possible next token. We use parameters to control how it selects the winner:

Temperature

Controls randomness.

  • 0.0: Deterministic. The model always picks the most likely token (great for facts/code).
  • 1.0+: Creative. The model takes more risks (great for brainstorming).

Top K

Limits the model to only consider the top K most likely tokens.

  • If Top K is 3, the model ignores everything except its 3 best guesses. This prevents it from ever picking a wildly improbable token from the long tail of the distribution.

Top P (Nucleus Sampling)

Similar to Top K, but based on a cumulative probability threshold.

  • If Top P is 0.9, the model looks at the smallest set of tokens whose probabilities add up to 90%. This allows the selection pool to shrink or grow dynamically based on how “confident” the model is.

5. Live Demo: Google AI Studio

To see these in action, we’ll be using Google AI Studio.

  1. Select a Model: (e.g., Gemini 1.5 Pro).
  2. Adjust the Sliders: Look at the right sidebar to find Temperature, Top K, and Top P.
  3. Inspect Tokens: Use the “Token counter” to see how your prompt is being “read” by the machine.
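Those same sliders map onto fields of the `generationConfig` object when you call the Gemini API directly (AI Studio’s “Get code” button exports exactly this kind of request). The field names below follow the public REST documentation; the model name and slider values are just examples:

```python
# The AI Studio sliders correspond to "generationConfig" fields in the
# Gemini API request body. Field names follow the public REST docs
# (temperature, topK, topP, maxOutputTokens); the values are examples.

import json

generation_config = {
    "temperature": 0.2,       # low randomness: good for facts and code
    "topK": 40,               # consider only the 40 most likely tokens
    "topP": 0.95,             # nucleus sampling threshold
    "maxOutputTokens": 1024,  # cap on response length
}

body = {
    "contents": [{"role": "user", "parts": [{"text": "Explain tokens in one line."}]}],
    "generationConfig": generation_config,
}
print(json.dumps(body, indent=2))
```

Reproducing a slider setup in code like this is the usual path from AI Studio experiments to a working application.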