
Beepo CV

Apple Swift Student Challenge Winner

An award-winning game that teaches how AI works through interactive explanations and hands-on experimentation

iPad game · Education

While taking Harvard's CS50 AI course, I was struck by how much machine learning resembles a black box, even to the computer scientists who create it.

With so many misconceptions floating around about AI right now, I wanted to create a game that I could share with friends and family to help them understand how the technology actually works, hopefully granting some insight into how to use it responsibly while also satisfying general curiosity about it.

Highlights

  • Try out MNIST handwritten digit recognition
  • Train your own image recognition model

Pedagogy

Beepo CV is deliberately structured to foster learner curiosity, engagement, and conceptual/intuitive understanding.

The app intentionally begins by asking users to draw a digit without any preamble. This is rooted in constructivist theory: because learners act before they’re told what’s happening, they experience authentic inquiry and surprise. When the app reveals that the drawing is just a numerical grid, and that the computer has already recognized the digit, it prompts a natural “How did it do that?”, a key moment of cognitive dissonance that motivates further exploration.

The steps that follow guide users through experiential learning. Instead of explaining how the computer knows what was drawn, Beepo presents learning as pattern recognition from examples and invites users to choose two physical objects, preparing them to understand how these models are created by training one themselves. The way this problem is presented also confronts learners with the limits of computation and the fundamental differences between human and machine learning.

Requiring users to provide multiple photos of their chosen objects also isn’t just a technical necessity; it models for learners how exposure to varied data strengthens understanding. This hands-on data-gathering emphasizes the pivotal role that the diversity and quantity of examples play in machine learning success, mirroring how people generalize concepts from multiple experiences in the real world and preparing users for the later discussion of bias in AI.

By describing model training in intuitive, approachable terms—“wiring code at random to find what works”—the app normalizes the experimental, iterative nature of learning algorithms. This demystifies the process, lowers intimidation, and makes the powerful abstraction of neural networks accessible to users of all ages.

To summarize, Beepo CV positions the user as an active participant: collecting data, watching the model learn, and then evaluating its performance. This agency fosters ownership over the learning experience and supports metacognition, as users reflect on their actions and the computer’s responses. Rather than passively receiving information, learners discover principles by actively building and testing a simple AI themselves.

Engineering

Beepo CV is built in Swift with SwiftUI. On-device AI training and inference were by far the most interesting features from an engineering perspective, both of which were powered by Apple's Core ML framework.

The handwriting recognition model at the start is optimized to be as simple as possible while still producing clear results: each image is represented as a $28 \times 28$ grid of grayscale pixel values, giving the neural network $784$ inputs to work with. The architecture follows a classic convolutional pattern:

$$\text{Input (28×28)} \rightarrow \text{Conv} \rightarrow \text{Pool} \rightarrow \text{Conv} \rightarrow \text{Pool} \rightarrow \text{Conv} \rightarrow \text{Pool} \rightarrow \text{Dense} \rightarrow \text{Softmax}$$

The three convolutional layers (16 → 32 → 64 filters) extract increasingly abstract features, while max pooling layers progressively shrink the spatial dimensions from $28 \times 28$ down to $3 \times 3$. A 128-unit dense layer then maps these features to the final 10-class probability distribution.
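As a minimal sketch of how inference with a model like this can be run through Core ML (the model URL and the "image"/"classLabel" feature names here are hypothetical stand-ins, not Beepo CV's actual interface):

```swift
import Foundation
import CoreML

// Minimal inference sketch. The model URL and the "image"/"classLabel"
// feature names are hypothetical, not Beepo CV's actual interface.
func classifyDigit(pixels: [Float], modelURL: URL) throws -> String? {
    precondition(pixels.count == 28 * 28, "expects 784 grayscale values")

    // Pack the 28×28 drawing grid into the multi-array shape the model expects.
    let input = try MLMultiArray(shape: [1, 28, 28], dataType: .float32)
    for (i, value) in pixels.enumerated() {
        input[i] = NSNumber(value: value)
    }

    // Load the compiled model and run a single prediction.
    let model = try MLModel(contentsOf: modelURL)
    let features = try MLDictionaryFeatureProvider(dictionary: ["image": input])
    let output = try model.prediction(from: features)

    // The softmax layer's top class comes back as a feature value.
    return output.featureValue(for: "classLabel")?.stringValue
}
```

Core ML can also take image inputs directly as pixel buffers; a raw multi-array is shown here because it mirrors the app's framing of the drawing as just a numerical grid.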

For the custom image classification model that users train themselves, the app organizes the user's dataset and then uses MLImageClassifier to train a model on it. This class uses transfer learning rather than training from scratch, optimizing for speed and efficiency on the limited computational resources of an entry-level iPad. Specifically, it takes a pre-trained vision model (trained on millions of images) and adapts it to the user's specific classes:

$$\text{Base Model (frozen)} \rightarrow \text{Feature Vector} \rightarrow \text{New Dense Layer} \rightarrow \text{Softmax}$$

This is also why it works with just 10 images per class: the frozen base model has already learned general visual features from millions of images, so only the small new layer needs to be fit to the user's examples, letting the model generalize remarkably well from very little data.

Crop augmentation is also used to further improve the model's accuracy, squeezing more value out of the limited dataset by generating multiple training examples from each image.
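A minimal Create ML sketch of what that training call can look like, assuming the user's photos have been saved into one folder per class (the paths and layout are placeholders, not the app's actual structure):

```swift
import Foundation
import CreateML

// Minimal training sketch, assuming one subdirectory per class
// (e.g. Dataset/Mug, Dataset/Plant). Paths are placeholders.
let datasetURL = URL(fileURLWithPath: "Dataset")

// Crop augmentation synthesizes several training examples from each photo.
let parameters = MLImageClassifier.ModelParameters(augmentation: [.crop])

// Transfer learning: Create ML adapts a pre-trained feature extractor to
// the user's classes instead of training a whole network from scratch.
let classifier = try MLImageClassifier(
    trainingData: .labeledDirectories(at: datasetURL),
    parameters: parameters
)

// Export the trained model so Core ML can load it for inference.
try classifier.write(to: URL(fileURLWithPath: "CustomClassifier.mlmodel"))
```

Because the base model stays frozen, training reduces to fitting the small new layer, which is what keeps it feasible on an entry-level iPad.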

Reflection

Everyone I showed Beepo CV to was fascinated, and I was happy to see how much they enjoyed the game and took away from it. I was also especially proud of the fact that it interested people across a very wide range of ages—from lower schoolers to grandparents.

And of course, I am very happy with the Apple Swift Student Challenge result! Being a WWDC Scholar has been a long-time goal, one I had worked towards for years before this submission.

Screenshots

Per competition requirements, the game must be playable within 3 minutes. The core gameplay loop, shortened to meet Apple's judging criteria, is as follows:

[17 screenshots walking through the shortened gameplay loop]