LLM Demo

Let’s play with an LLM on your local machine!

from transformers import pipeline
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch.nn.functional as F

pipe = pipeline("text-generation", model="google/gemma-3-1b-it")
Device set to use mps:0
chat = [
    {"role": "user", "content": "How are you?"}
]
response = pipe(chat)
print(response[0]["generated_text"][-1]["content"])
I’m doing well, thank you for asking! As a large language model, I don’t experience feelings in the same way humans do, but I’m functioning perfectly and ready to help you with whatever you need. 😊 

How are *you* doing today? Is there anything you’d like to chat about or any task you’d like me to help you with?
chat = response[0]["generated_text"]
chat.append(
    {"role": "user", "content": "Tell me a joke about math."}
)
chat
[{'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant',
  'content': 'I’m doing well, thank you for asking! As a large language model, I don’t experience feelings in the same way humans do, but I’m functioning perfectly and ready to help you with whatever you need. 😊 \n\nHow are *you* doing today? Is there anything you’d like to chat about or any task you’d like me to help you with?'},
 {'role': 'user', 'content': 'Tell me a joke about math.'}]
response = pipe(chat)
print(response[0]["generated_text"][-1]["content"])
Why was six afraid of seven? 

Because seven eight nine! 

---

Hopefully that made you smile! 😊 Would you like to hear another one?
chat = response[0]["generated_text"]
chat.append(
    {"role": "user", "content": "Another one!"}
)
response = pipe(chat)
print(response[0]["generated_text"][-1]["content"])
Okay, here’s one:

Parallel lines have so much in common.

… Except they’ll never meet. 

---

Did you like that one? 😊
new_chat = [
    {"role": "user", "content": "Another one!"}
]
response = pipe(new_chat)
print(response[0]["generated_text"][-1]["content"])
Please provide me with the previous conversation! I need to know what we were talking about to be able to continue the "Another One!" sequence. 😊 

Let me know what you're referring to.
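
Notice that the model itself has no memory: the pipeline renders whatever message list you hand it into a single prompt string using the tokenizer’s chat template, so a fresh chat carries none of the earlier turns. Here’s a rough sketch of that rendering step, reusing the chat list from above (the tokenizer load duplicates the one in the next section):

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

# Flatten the list of {"role": ..., "content": ...} dicts into the one string the model sees
prompt_text = tokenizer.apply_chat_template(
    chat,
    tokenize=False,              # return a string instead of token IDs
    add_generation_prompt=True,  # append the marker that cues the assistant's reply
)
print(prompt_text)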

Here’s an example of how the tokenizer works.

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

texts = ["how are you?", "ubiquitous"]
for text in texts:
    tokens = tokenizer.tokenize(text)
    token_ids = tokenizer(text)["input_ids"]
    print(f"Text: {text}")
    print(f"Tokens: {tokens}")
    print(f"Token IDs: {token_ids}")
    print(f"Decoded: {tokenizer.decode(token_ids)}")
    print("-" * 40)
Text: how are you?
Tokens: ['how', '▁are', '▁you', '?']
Token IDs: [2, 7843, 659, 611, 236881]
Decoded: <bos>how are you?
----------------------------------------
Text: ubiquitous
Tokens: ['ub', 'iqu', 'itous']
Token IDs: [2, 709, 2379, 59450]
Decoded: <bos>ubiquitous
----------------------------------------
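
The leading <bos> in each decoded string is a special token that the tokenizer inserts by default. Here’s a small sketch of two ways to drop it (assuming the same tokenizer object); both should print the text without the <bos> marker:

# Skip special tokens at encoding time...
ids_plain = tokenizer("how are you?", add_special_tokens=False)["input_ids"]
print(tokenizer.decode(ids_plain))

# ...or keep them in the IDs and drop them when decoding.
ids_full = tokenizer("how are you?")["input_ids"]
print(tokenizer.decode(ids_full, skip_special_tokens=True))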

Here’s an example of next-token prediction. We output the top-k most likely next tokens for the input prompt.

model_name = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# prompt = "dog:chien, house:maison, book:livre, sun:soleil, cat:"
prompt = "She sat quietly by the bank of the"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Get logits for the last token in the sequence
next_token_logits = logits[0, -1, :]
probs = F.softmax(next_token_logits, dim=-1)

top_k = 10
top_k_probs, top_k_indices = torch.topk(probs, top_k)
top_k_tokens = [tokenizer.decode([idx]) for idx in top_k_indices]

for i in range(top_k):
    print(f"Token: {repr(top_k_tokens[i])}, Probability: {top_k_probs[i].item():.4f}")
Token: ' river', Probability: 0.9708
Token: ' stream', Probability: 0.0058
Token: ' creek', Probability: 0.0049
Token: ' River', Probability: 0.0030
Token: ' lake', Probability: 0.0029
Token: ' water', Probability: 0.0018
Token: ' Willow', Probability: 0.0006
Token: ' shimmering', Probability: 0.0005
Token: ' flowing', Probability: 0.0004
Token: ' old', Probability: 0.0003
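
Text generation is just this next-token step in a loop: pick a token, append it to the input, and predict again. Here’s a minimal greedy-decoding sketch built on the same model, tokenizer, and prompt (for illustration only; in practice you’d call model.generate, which also supports sampling, beam search, etc.):

input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
for _ in range(10):  # greedily extend the prompt by 10 tokens
    with torch.no_grad():
        logits = model(input_ids=input_ids).logits
    next_id = torch.argmax(logits[0, -1, :])  # most probable next token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(input_ids[0], skip_special_tokens=True))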

Here’s the classic word-embedding example: what is “king” - “man” + “woman”?

words = ["king", "man", "woman"]
token_ids = [tokenizer(w, add_special_tokens=False)["input_ids"][0] for w in words]
embedding_matrix = model.get_input_embeddings().weight

# Get embeddings for king, man, woman
king_emb = embedding_matrix[token_ids[0]]
man_emb = embedding_matrix[token_ids[1]]
woman_emb = embedding_matrix[token_ids[2]]

# Compute the target embedding: king - man + woman
target_emb = king_emb - man_emb + woman_emb

# Compute cosine similarity with all embeddings in the vocab
cos_sim = F.cosine_similarity(target_emb.unsqueeze(0), embedding_matrix, dim=1)

# Exclude the original words from the search
for idx in token_ids:
    cos_sim[idx] = -float('inf')

# Find the most similar token
best_idx = torch.argmax(cos_sim).item()
best_word = tokenizer.decode([best_idx])

print(f"Closest word to 'king - man + woman': {best_word}")
Closest word to 'king - man + woman':  queen
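
The single nearest neighbor hides how close the runners-up are. Here’s a quick extension (reusing the cos_sim tensor from above, with the original words already masked out) that lists the top 5 candidate tokens instead:

top_k = 5
top_sims, top_idxs = torch.topk(cos_sim, top_k)  # most similar remaining tokens
for sim, idx in zip(top_sims, top_idxs):
    print(f"Token: {tokenizer.decode([idx.item()])!r}, Cosine similarity: {sim.item():.3f}")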