A peek in the kitchen
Casper Lab is a Tamagotchi-inspired experience where visitors can watch Casper wander his cozy room and use AI to generate new furniture and hats for him. The interesting part is the image generation pipeline: taking messy user input, turning it into a clean prompt, generating a visually consistent sprite, and processing it for use on a transparent canvas.
Try the full experience →. Sign up, earn points, and add items to Casper's world!
When a visitor types something like "i want a cool wizard hat", that raw input goes through a two-step AI pipeline before it ever reaches the image model.
First, GPT-4o-mini extracts and spellchecks the item name. Then that clean name gets slotted into a strict prompt template that enforces visual consistency across all generated items:
```python
# Prompt template enforces a uniform pixel art style
SPRITE_PROMPT_TEMPLATE = (
    "Pixel art sprite of {item}, 3/4 view angle, "
    "black outline, white and gray shading only, "
    "chunky pixels, cute tamagotchi style, "
    "solid blue background (#0000FF)"
)
```
```python
import replicate

def generate_with_replicate(prompt: str) -> bytes:
    full_prompt = SPRITE_PROMPT_TEMPLATE.format(item=prompt)
    output = replicate.run(
        "google/imagen-4-fast",
        input={
            "prompt": full_prompt,
            "aspect_ratio": "1:1",
            "output_format": "png",
        },
    )
    ...
```
Every item that comes out of this pipeline looks like it belongs in the same world: same pixel density, same outline style, same color palette.
What makes AI-generated items look cohesive isn't fine-tuning or LoRA. It's constraining the output. The prompt template forces grayscale-only coloring, black outlines, and a fixed blue background. That blue background isn't aesthetic; it's a chroma key.
Client-side JavaScript removes it pixel-by-pixel after the image loads:
```javascript
const margin = 10;
for (let i = 0; i < data.length; i += 4) {
  const r = data[i];
  const g = data[i + 1];
  const b = data[i + 2];
  // Blue background detection: b > r + margin AND b > g + margin.
  // Grayscale sprite pixels (r ≈ g ≈ b) never satisfy this, so they stay intact.
  if (b > r + margin && b > g + margin) {
    data[i + 3] = 0; // Set alpha to 0 (transparent)
  }
}
```
Because the sprites are grayscale only, the blue channel in any actual sprite pixel is always roughly equal to red and green. Pure blue (#0000FF) is maximally distant from that, so the algorithm never accidentally erases part of the sprite.
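The decision rule is easy to sanity-check outside the browser. Here is a minimal Python sketch of the same per-pixel test (the function name and values are mine, not from the client code):

```python
MARGIN = 10

def is_background(r: int, g: int, b: int, margin: int = MARGIN) -> bool:
    """True if a pixel reads as chroma-key blue rather than sprite content."""
    return b > r + margin and b > g + margin

# Pure blue background: maximally distant from any grayscale value.
print(is_background(0, 0, 255))      # True
# Grayscale sprite pixel: r = g = b, never flagged as background.
print(is_background(128, 128, 128))  # False
# Slight blue tint within the margin (e.g. anti-aliasing) also survives.
print(is_background(120, 120, 128))  # False
```

The `margin` of 10 is the only tunable: larger values remove more edge halo around the sprite, at the risk of eating faintly blue-tinted sprite pixels.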
Users type all kinds of things: "make me sunglases", "i want a big bookshelf thing", or just "lamp". Before hitting the image model, a small LLM call extracts and cleans the item name. This is a pattern I use throughout: cheap, fast AI for structured extraction before the expensive generation step.
```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "Extract just the item name from the user's input. "
                       "Fix spelling errors. Return ONLY the clean noun phrase. "
                       "No explanations, no quotes, just 1-4 words.",
        },
        {"role": "user", "content": raw_prompt},
    ],
    max_tokens=20,
    temperature=0,
)
```
A $0.001 extraction call saves a $0.04 image generation from being wasted on a misspelled or overly verbose prompt.
Why a blue background for chroma key? Pure blue (#0000FF) is maximally distant from grayscale. The removal algorithm only needs one comparison per pixel, and since sprite content is grayscale, it can never accidentally make part of the sprite transparent.
Why Imagen over DALL-E? Google's Imagen 4 Fast via Replicate produces more consistent pixel art at lower latency. The API is pay-per-generation with no subscription, which is a good fit for a portfolio project with sporadic traffic.
Why clean input with AI first? Garbage in, garbage out. Image models don't handle typos or verbose phrasing well. A fast extraction step normalizes the input so the prompt template works reliably every time.