Where's Waldo is a game in which you look for Waldo (a bespectacled, dead-eyed, human-candy-cane-looking guy) hidden in chaotic, busy scenes of whimsy. In other words, it's the perfect data set for challenging our visual recognition API using Custom Training!
At Clarifai, we have an internal Hack Day every month where everyone works on a pet project they don't normally have time for. My Hack Day project this month was to attempt to answer a question that bored kids everywhere have asked when they found themselves stuck indoors, be it from a rainy day, a doctor's visit, or a cross-country road trip in the back of a minivan ... Where's Waldo?
With scenes filled with animals, aliens, mermaids, and people of all sorts (including several Waldo imposters or "friends" as they call them), I wanted to see just how well our Custom Training model could perform on such a challenging and popular data set. Here's how I did it!
Step one: Find the data
Luckily for me, some kind and curious soul had already created and shared a Where's Waldo data set online: he took 19 Where's Waldo maps, split them into grids/tiles, then labeled the tiles accordingly ('waldo' or 'not waldo').
I decided to follow that method with other Waldo maps not included in his set. Maps were split into several different grids (4x5, 4x6, 5x8, etc.) to increase the sample size. These tiles were then uploaded into our app and labeled accordingly.
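If you want to try the tiling step yourself, here is a minimal sketch of how a map could be cut into a grid. It is not the exact script I used: the function name `tile_boxes`, the map size, and the reading of "4x5" as 4 rows by 5 columns are all assumptions made for illustration.

```python
def tile_boxes(width, height, rows, cols):
    """Return (left, top, right, bottom) pixel boxes for a rows x cols grid.

    Integer division keeps the boxes contiguous, so every pixel of the map
    lands in exactly one tile even when the size does not divide evenly.
    """
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical map size; the grid shapes are the ones mentioned above.
MAP_WIDTH, MAP_HEIGHT = 2000, 1250
for rows, cols in [(4, 5), (4, 6), (5, 8)]:
    boxes = tile_boxes(MAP_WIDTH, MAP_HEIGHT, rows, cols)
    print(f"{rows}x{cols} grid: {len(boxes)} tiles, first box {boxes[0]}")
```

Each box uses the same (left, top, right, bottom) order that Pillow's `Image.crop` expects, so the tuples could be passed straight to it to save the individual tiles.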
Step two: Train, test, more data
After the initial training, I tested the model against a map not in the training set, featuring Waldo on a moon colony, naturally. Against the entire map, our model found ... nothing. Well ... a 1% chance of Waldo being present. Not an unexpected result, since so much of the image simply isn't Waldo - that's what makes playing these so fun/frustrating.
I decided to test it again using the same grid system as in training. Going from tile to tile, the results started making more sense. 1-3% chance of Waldo on relatively empty tiles. 10-20% chance on busier ones. 30-40% chance on Waldo imposters (shakes fist). Then, a tile with a 50% chance of Waldo! I scanned the tile anxiously. Boom. Waldo, hiding behind a crowd of people on the 2nd floor of a biodome.
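The tile-by-tile search boils down to scoring every tile and looking at the highest scorers first. Here is a rough sketch of that ranking step. The helper `rank_tiles` is hypothetical, and the scores are made-up values chosen to sit in the ranges described above; in practice each one would come from running the trained model on that tile.

```python
def rank_tiles(tile_scores):
    """Sort (tile_id, waldo_probability) pairs from most to least likely."""
    return sorted(tile_scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores keyed by (row, col) position in the grid.
example_scores = {
    (0, 0): 0.02,  # relatively empty tile
    (0, 1): 0.15,  # busy tile
    (1, 0): 0.35,  # Waldo lookalike
    (1, 1): 0.50,  # the tile worth checking first
}

ranked = rank_tiles(example_scores)
best_tile, best_score = ranked[0]
print(f"Check tile {best_tile} first ({best_score:.0%} chance of Waldo)")
```

Ranking rather than thresholding seemed like the safer choice here, since even the tile that actually contained Waldo only scored around 50%.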
Step three: More data, fewer imposters
My next goal was to lower the percentages of the Waldo lookalikes being flagged as Waldo. I went back to the original maps and manually cropped out somewhere between 500 and 1,000 people: some lookalikes, others with similar color schemes, others just random people. All to tell the model who wasn't Waldo.
The result: higher percentages on Waldo-positive tiles, lower percentages on Waldo-negative tiles, and fewer false positives on lookalikes.
Overall, though, the numbers still aren't great. The busier tiles still throw the model for a loop, Waldo predictions still aren't high enough, and false positives are still present. Oddly enough, it isn't the characters that would normally trick human eyes (the lookalikes, characters wearing stripes or glasses, etc.) that are throwing the false positives, but something else that I haven't been able to discern yet.
As I revisit this project, I believe my next step would be to find more Waldo-positive examples, to try to counteract the inherently unbalanced nature of the data. That's the beauty of machine learning - the more examples you show it, the better the results tend to be. If you're interested in trying this for yourself, sign up for a free API account and get started with Custom Training!