Pelican Rides a Bicycle¶

Below are two SVG solutions to the Simon Willison's benchmark: "Generate an SVG of a pelican riding a bicycle."

Claude Sonnet 3.5:¶

Claude-Code:¶

What Changed?!¶

As agentic gen/ai exploded in 2025, it felt like Willison's benchmark became a solvable problem if we used these new agentic powers: skills, sub-agents, tool-use and more. Using Claude-Code, I built a pipeline that now solves this problem.

The Github repo github.com/0x6a77/pelican-rides-a-bicycle let's you replicate this work and see an example of Claude-Code skills with tool-calling.

How It Works¶

The Claude-Code skills pipeline orchestrates specialized capabilities:

Agentic/AI Orchestrates the pipeline
Image Generation: Flux diffusion model generates a bitmap of the prompt
Vector Tracing: autotrace converts the bitmap to SVG paths

The agent's job is to know which tool to use when. This is fundamentally different from asking a transformer to generate SVG vectors from scratch.

2026, The Year of Agentic Pipelines?¶

Some might argue this doesn't solve Willison's benchmark - it solves the problem his benchmark reveals. It shows transformers can't do one-shot compositional spatial reasoning. My pipeline shows that agentic gen/ai doesn't need to because it can orchestrate specialized tools instead. The pelican benchmark is valuable for tracking raw model capabilities, but to build systems that work today, we need orchestration.¹

The right way forward with gen/ai is to embrace agentic gen/ai in 2026 in deeper and more interesting ways. Stay tuned for an expansion of this idea.

(Stephen Wolfram's work on computational irreducibility suggests the pelican task may require a level of spatial reasoning that transformers can't do, so tool orchestration isn't just pragmatic, but potentially necessary. I plan to write about this in a future post.) ↩