LLM Demo#
Let’s play with an LLM on your local machine!
from transformers import pipeline
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch.nn.functional as F
pipe = pipeline("text-generation", model="google/gemma-3-1b-it")
Device set to use mps:0
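The pipeline picks an available accelerator automatically; on an Apple-silicon Mac that is the MPS backend, which is what the log line above reports. If you want to pin a device explicitly, you can pass one — a small sketch, where `pipe_cpu` and the choice of "cpu" are just for illustration:

# Optional: force a specific device instead of letting the pipeline auto-detect one.
# Typical values are "cpu", "cuda", or "mps", depending on your hardware.
pipe_cpu = pipeline("text-generation", model="google/gemma-3-1b-it", device="cpu")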
chat = [
    {"role": "user", "content": "How are you?"}
]
response = pipe(chat)
print(response[0]["generated_text"][-1]["content"])
I’m doing well, thank you for asking! As a large language model, I don’t experience feelings in the same way humans do, but I’m functioning perfectly and ready to help you with whatever you need. 😊
How are *you* doing today? Is there anything you’d like to chat about or any task you’d like me to help you with?
chat = response[0]["generated_text"]
chat.append(
    {"role": "user", "content": "Tell me a joke about math."}
)
chat
[{'role': 'user', 'content': 'How are you?'},
 {'role': 'assistant',
  'content': 'I’m doing well, thank you for asking! As a large language model, I don’t experience feelings in the same way humans do, but I’m functioning perfectly and ready to help you with whatever you need. 😊 \n\nHow are *you* doing today? Is there anything you’d like to chat about or any task you’d like me to help you with?'},
 {'role': 'user', 'content': 'Tell me a joke about math.'}]
response = pipe(chat)
print(response[0]["generated_text"][-1]["content"])
Why was six afraid of seven?
Because seven eight nine!
---
Hopefully that made you smile! 😊 Would you like to hear another one?
chat = response[0]["generated_text"]
chat.append(
    {"role": "user", "content": "Another one!"}
)
response = pipe(chat)
print(response[0]["generated_text"][-1]["content"])
Okay, here’s one:
Parallel lines have so much in common.
… Except they’ll never meet.
---
Did you like that one? 😊
new_chat = [
    {"role": "user", "content": "Another one!"}
]
response = pipe(new_chat)
print(response[0]["generated_text"][-1]["content"])
Please provide me with the previous conversation! I need to know what we were talking about to be able to continue the "Another One!" sequence. 😊
Let me know what you're referring to.
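Under the hood, the pipeline flattens the chat history into a single prompt string using the model’s chat template before tokenizing it. A minimal sketch, reusing the `chat` list built above (the exact turn markers, such as `<start_of_turn>`, are specific to Gemma):

prompt_text = pipe.tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    add_generation_prompt=True,  # append the marker that asks for an assistant turn
)
print(prompt_text)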
Here’s an example of how the tokenizer works: it splits text into tokens and maps each token to an integer ID.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
texts = ["how are you?", "ubiquitous"]
for text in texts:
    tokens = tokenizer.tokenize(text)
    token_ids = tokenizer(text)["input_ids"]
    print(f"Text: {text}")
    print(f"Tokens: {tokens}")
    print(f"Token IDs: {token_ids}")
    print(f"Decoded: {tokenizer.decode(token_ids)}")
    print("-" * 40)
Text: how are you?
Tokens: ['how', '▁are', '▁you', '?']
Token IDs: [2, 7843, 659, 611, 236881]
Decoded: <bos>how are you?
----------------------------------------
Text: ubiquitous
Tokens: ['ub', 'iqu', 'itous']
Token IDs: [2, 709, 2379, 59450]
Decoded: <bos>ubiquitous
----------------------------------------
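Two things worth noting: the tokenizer prepends a special `<bos>` (beginning-of-sequence) token by default, and rare words like “ubiquitous” are split into subword pieces drawn from a fixed vocabulary. A small sketch of both, using standard Hugging Face tokenizer methods (`token_ids` still holds the IDs of the last example):

# Total number of entries in the tokenizer's vocabulary.
print(f"Vocabulary size: {len(tokenizer)}")

# Decode the last example again, dropping special tokens such as <bos>.
print(tokenizer.decode(token_ids, skip_special_tokens=True))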
This is an example of next-token prediction: we feed the model a prompt and print the top-k most likely next tokens with their probabilities.
model_name = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# prompt = "dog:chien, house:maison, book:livre, sun:soleil, cat:"
prompt = "She sat quietly by the bank of the"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
# Get logits for the last token in the sequence
next_token_logits = logits[0, -1, :]
probs = F.softmax(next_token_logits, dim=-1)
top_k = 10
top_k_probs, top_k_indices = torch.topk(probs, top_k)
top_k_tokens = [tokenizer.decode([idx]) for idx in top_k_indices]
for i in range(top_k):
    print(f"Token: {repr(top_k_tokens[i])}, Probability: {top_k_probs[i].item():.4f}")
Token: ' river', Probability: 0.9708
Token: ' stream', Probability: 0.0058
Token: ' creek', Probability: 0.0049
Token: ' River', Probability: 0.0030
Token: ' lake', Probability: 0.0029
Token: ' water', Probability: 0.0018
Token: ' Willow', Probability: 0.0006
Token: ' shimmering', Probability: 0.0005
Token: ' flowing', Probability: 0.0004
Token: ' old', Probability: 0.0003
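Text generation is just this next-token step applied repeatedly: pick (or sample) a token, append it to the input, and predict again. A minimal greedy-decoding sketch with the same model; the pipeline’s `model.generate` does this more efficiently, with caching, sampling, and stopping criteria:

# Greedy decoding: repeatedly append the single most likely next token.
generated = inputs["input_ids"]
for _ in range(10):  # extend the prompt by 10 tokens
    with torch.no_grad():
        step_logits = model(input_ids=generated).logits
    next_id = torch.argmax(step_logits[0, -1, :]).reshape(1, 1)
    generated = torch.cat([generated, next_id], dim=1)
print(tokenizer.decode(generated[0], skip_special_tokens=True))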
Here’s the classic word-embedding example: what is “king” - “man” + “woman”?
words = ["king", "man", "woman"]
token_ids = [tokenizer(w, add_special_tokens=False)["input_ids"][0] for w in words]
embedding_matrix = model.get_input_embeddings().weight
# Get embeddings for king, man, woman
king_emb = embedding_matrix[token_ids[0]]
man_emb = embedding_matrix[token_ids[1]]
woman_emb = embedding_matrix[token_ids[2]]
# Compute the target embedding: king - man + woman
target_emb = king_emb - man_emb + woman_emb
# Compute cosine similarity with all embeddings in the vocab
cos_sim = F.cosine_similarity(target_emb.unsqueeze(0), embedding_matrix, dim=1)
# Exclude the original words from the search
for idx in token_ids:
    cos_sim[idx] = -float('inf')
# Find the most similar token
best_idx = torch.argmax(cos_sim).item()
best_word = tokenizer.decode([best_idx])
print(f"Closest word to 'king - man + woman': {best_word}")
Closest word to 'king - man + woman': queen
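Instead of reporting only the single best match, we can list the few nearest tokens to the target vector, which gives a sense of how close “queen” is relative to other candidates. A short sketch reusing `cos_sim` from above:

# Top-5 tokens closest to the "king - man + woman" vector
# (the original three words were already masked out above).
top_vals, top_ids = torch.topk(cos_sim, 5)
for val, idx in zip(top_vals, top_ids):
    print(f"{tokenizer.decode([idx.item()])!r}: {val.item():.3f}")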