A peek in the kitchen
Casper Lab is a Tamagotchi-inspired experience where visitors can watch Casper wander his cozy room and use AI to generate new furniture and hats for him. The interesting part is the image generation pipeline: taking messy user input, turning it into a clean prompt, generating a visually consistent sprite, and processing it for use on a transparent canvas.
Try the full experience →. Sign up, earn points, and add items to Casper's world!
When a visitor types something like "i want a cool wizard hat", that raw input goes through a two-step AI pipeline before it ever reaches the image model.
First, GPT-4o-mini extracts and spellchecks the item name. Then that clean name gets slotted into a strict prompt template that enforces visual consistency across all generated items:
```python
# Prompt template enforces a uniform pixel art style
SPRITE_PROMPT_TEMPLATE = (
    "Pixel art sprite of {item}, 3/4 view angle, "
    "black outline, white and gray shading only, "
    "chunky pixels, cute tamagotchi style, "
    "solid blue background (#0000FF)"
)
```
```python
import replicate

def generate_with_replicate(prompt: str) -> bytes:
    full_prompt = SPRITE_PROMPT_TEMPLATE.format(item=prompt)
    output = replicate.run(
        "google/imagen-4-fast",
        input={
            "prompt": full_prompt,
            "aspect_ratio": "1:1",
            "output_format": "png",
        },
    )
    ...
```
Every item that comes out of this pipeline looks like it belongs in the same world: same pixel density, same outline style, same color palette.
What makes AI-generated items look cohesive isn't fine-tuning or LoRA. It's constraining the output. The prompt template forces grayscale-only coloring, black outlines, and a fixed blue background. That blue background isn't aesthetic; it's a chroma key.
Client-side JavaScript removes it pixel-by-pixel after the image loads:
```javascript
const margin = 10;
for (let i = 0; i < data.length; i += 4) {
  const r = data[i];
  const g = data[i + 1];
  const b = data[i + 2];
  // Blue background detection: b > r + margin AND b > g + margin.
  // Grayscale sprite pixels (r ≈ g ≈ b) never satisfy this, so they stay intact.
  if (b > r + margin && b > g + margin) {
    data[i + 3] = 0; // Set alpha to 0 (transparent)
  }
}
```
Because the sprites are grayscale only, the blue channel in any actual sprite pixel is always roughly equal to red and green. Pure blue (#0000FF) is maximally distant from that, so the algorithm never accidentally erases part of the sprite.
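The decision rule is easy to sanity-check outside the browser. Here is a minimal Python sketch of the same per-pixel test (the function name and values are mine, not from the client code):

```python
MARGIN = 10

def is_background(r: int, g: int, b: int, margin: int = MARGIN) -> bool:
    """True if a pixel reads as chroma-key blue rather than sprite content."""
    return b > r + margin and b > g + margin

# Pure blue background: maximally distant from any grayscale value.
print(is_background(0, 0, 255))      # True
# Grayscale sprite pixel: r = g = b, never flagged as background.
print(is_background(128, 128, 128))  # False
# Slight blue tint within the margin (e.g. anti-aliasing) also survives.
print(is_background(120, 120, 128))  # False
```

The `margin` of 10 is the only tunable: larger values remove more edge halo around the sprite, at the risk of eating faintly blue-tinted sprite pixels.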
Users type all kinds of things: "make me sunglases", "i want a big bookshelf thing", or just "lamp". Before hitting the image model, a small LLM call extracts and cleans the item name. This is a pattern I use throughout: cheap, fast AI for structured extraction before the expensive generation step.
```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "Extract just the item name from the user's input. "
                       "Fix spelling errors. Return ONLY the clean noun phrase. "
                       "No explanations, no quotes, just 1-4 words.",
        },
        {"role": "user", "content": raw_prompt},
    ],
    max_tokens=20,
    temperature=0,
)
```
A $0.001 extraction call saves a $0.04 image generation from being wasted on a misspelled or overly verbose prompt.
Why a blue background for chroma key? Pure blue (#0000FF) is maximally distant from grayscale. The removal algorithm only needs one comparison per pixel, and since sprite content is grayscale, it can never accidentally make part of the sprite transparent.
Why Imagen over DALL-E? Google's Imagen 4 Fast via Replicate produces more consistent pixel art at lower latency. The API is pay-per-generation with no subscription, which is a good fit for a portfolio project with sporadic traffic.
Why clean input with AI first? Garbage in, garbage out. Image models don't handle typos or verbose phrasing well. A fast extraction step normalizes the input so the prompt template works reliably every time.