The world of artificially intelligent visual recognition has really started to blossom in the last two years, but sometimes it feels like there’s a new A.I. company popping up every day! If you’re reading this, you probably already know a little about visual recognition and machine learning, but you’re not exactly sure what makes each visual recognition company different. It can be really hard to keep up with all the options out there, so we created this guide to help you decide which visual recognition API is right for you.
There are six questions you should ask when you’re deciding which visual recognition API you want to use to power your app:
Visual recognition APIs allow you to build apps that automatically recognize and tag things in pictures. It sounds pretty straightforward, but there are actually several different factors - media formats, type of model, etc. - you’ll need to consider when picking the right API.
A great visual recognition API company will have a live demo you can try - put ours to the test with any image or video and watch it work like magic.
All visual recognition APIs have one or more models you can use. Different models are used to predict and identify different categories of things.
Every visual recognition API has a general model that recognizes and labels generic objects and things, like “lamp” and “dog” and “water.” Most visual recognition APIs understand a couple thousand different general concepts. Not to brag, but Clarifai understands over 11,000.
The best visual recognition APIs also give you access to domain models for more specific categories - for example, while Clarifai’s general model might recognize a cheeseburger as “food”, Clarifai’s food model would recognize a cheeseburger as “cheeseburger.” Domain models are great if you want to look at images through a more specific lens.
There are a lot of different domain models out there - here’s a breakdown of the most common ones:
Specializes in reading faces and emotions
Specializes in reading brands and logos
Specializes in recognizing landmarks and locations
Optical character recognition (OCR), also known as text recognition
Even though domain models are more specific than general models, sometimes what you want to recognize is even more personalized and requires building a custom model. For example, if you're a company like Nike, you probably want to recognize every single Nike shoe in the universe - this would require a custom model.
Building a custom model in-house requires a dedicated data scientist to train a model on a new data set of thousands, maybe even millions, of images, as well as thousands of lines of code and special infrastructure. Or, if you're using Clarifai's Custom Training product, all you have to do is show our API ten examples of a new concept and our technology automatically learns what it is and applies it to your very own custom model.
The cool thing about custom models is you can train A.I. to understand and recognize anything (even Pokemon):
A fun way to see custom training in action is to download our free Forevery iOS photo discovery app. The app automatically tags every photo on your camera roll and you can train it to recognize new concepts, too!
We've made some pretty cool custom models for our customers already:
- Weddings - specializes in recognizing items that are common to weddings
- Pornography - specializes in recognizing different categories and fetishes for pornographic sites
- Food - recognizes different types of food and makes you really hungry in the process
- Celebrities - recognizes celebrities and famous people all over the world
Not all visual recognition companies will build a custom model for you, nor will they necessarily have the resources required to do it well. If a custom model is what you need, make sure you find a company (like Clarifai!) that has leading researchers and tons of experience building and training models.
As you may already know, visual recognition technology is built upon a little thing called machine learning. Machine learning is when a computer is trained to act based on recognizing patterns rather than being explicitly programmed. The more examples a computer sees, the better it gets at understanding what it’s seeing. Here are a couple questions you should consider before deciding on a visual recognition solution.
So, more images trained = more accuracy, right?
Kind of. Yes, in general, the larger the dataset a model is trained on, the more accurate the results. However, there comes a point of diminishing returns - ten examples are obviously better than one example, but ten million examples are only incrementally better than one million.
Also, you can’t just train on any random images from a web crawl to improve accuracy - you must train on images that are labelled well and suit a specific task. A great visual recognition company (like Clarifai, naturally) will have access to specific datasets from certain parts of the web, partners, and feedback.
Wait, artificial intelligence has an editorial voice?
Yes! Even though we’ve trained computers to become smart, it’s still a human doing the initial training. That means human perceptions, limitations, and even biases can color each individual model. We’ve all heard the phrase, “One person’s trash is another’s treasure” - the same principle applies to visual recognition A.I.
As with human communication, computer understanding is limited by the size of its vocabulary. For example, some visual recognition APIs can recognize 1,000 different things in their vocabulary. Usually, these models are limited to only recognizing objects (e.g. tree, car, etc.).
We’re pretty sure Clarifai has the largest vocabulary of any visual recognition API on the market (#shamelessplug), recognizing 11,000+ different things in over 20 languages. And by “things,” we mean objects (e.g. cat), ideas (e.g. politics, love, friendship, etc.), and even real live human feelings (e.g. togetherness, anger, fun, etc.). Developers can trim down the number of concepts using our API but it’s always helpful to start with more options.
You might also want a solution that allows you to use your own taxonomy. So, if our general model calls "soda" ... well, "soda" ... and maybe you want to call it "pop," a platform like Custom Training will allow you to do so.
If you’re choosing a visual recognition API, look for one that uses the vocabulary that best suits your needs.
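To make the vocabulary idea concrete, here's a minimal sketch of trimming and renaming concepts on your side, after a prediction comes back. It is not Clarifai's API: the `apply_taxonomy` function, the `(name, score)` prediction format, and the sample scores are all hypothetical and only there for illustration.

```python
def apply_taxonomy(predictions, renames=None, keep=None, min_score=0.0):
    """Rename and trim predicted concepts.

    predictions: list of (name, score) pairs (a hypothetical format,
                 not any real API's response shape).
    renames:     dict mapping a model's label to the label you prefer.
    keep:        optional set of labels to keep (checked after renaming).
    min_score:   drop anything the model is less confident about.
    """
    renames = renames or {}
    result = []
    for name, score in predictions:
        if score < min_score:
            continue
        label = renames.get(name, name)  # fall back to the model's own label
        if keep is not None and label not in keep:
            continue
        result.append((label, score))
    return result


# Made-up predictions for a single image.
predictions = [("soda", 0.97), ("drink", 0.95), ("togetherness", 0.41)]
tags = apply_taxonomy(predictions, renames={"soda": "pop"}, keep={"pop", "drink"})
```

Here `tags` ends up holding "pop" and "drink" with their original scores, while "togetherness" is dropped because it isn't in the `keep` set.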
Because humans train A.I., it’s unavoidable for our own biases and perceptions to make an impact on how computers understand the world. Imagine how differently a very conservative person and a very liberal person might see the world, then apply those differences to visual recognition:
At Clarifai, as we’re choosing words in our models’ vocabulary, we try to keep it neutral and factual and avoid using subjective terms. This might be exactly what you're looking for. Or, depending on your worldview, a more subjective approach might be better for you - just be aware that these differences exist and choose the one that’s right for you.
When you’re choosing an A.I. company to work with, understand how the company views its ethical responsibility as the shapers of artificial intelligence. It may not be an immediate concern for the app you’re building, but hey, we’re all humans and the robot apocalypse is fast approaching!
We’ve written on the topic of A.I. ethics in the past if you’re interested in learning more about our take on it.
We get that the dollar (or yuan, or rupee, or euro, etc.) is sometimes king, which is why it's so important to clarifai (see what we did there?) what you get in exchange for how much you pay. Pricing for visual recognition APIs can be confusing, so here’s our breakdown of the different pricing plans in the market and what they mean. That way, you can understand exactly how much value you’re getting out of your API solution.
Per tier pricing gives you up to a certain number of units for a fixed fee. For example, if your tier is up to 20,000 units, you would pay the same amount for the month whether you use 10,000 or 20,000 units.
Per unit pricing charges you a set amount per unit. For example, you might be charged $0.003 per unit. This means you’re charged each time you use a unit but you’re never charged for units you don’t use.
So which is better? That depends on how well you know your needs. If your needs are relatively stable month to month (e.g. you need around 20,000 calls per month), per tier pricing tends to be much cheaper because resources can be allocated more efficiently. If your needs fluctuate more (e.g. you need 5,000 calls this month but 50,000 next month), per unit pricing may be better.
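If you'd rather see the math, here's a rough sketch comparing the two plans. The $0.003 per unit and the 20,000-unit tier come from the examples above; the $50 tier fee and the idea that overage is billed per unit are assumptions made up for this illustration, so plug in real numbers from the pricing pages you're comparing.

```python
def per_unit_cost(units, price_per_unit=0.003):
    """Per unit pricing: pay only for the units you actually use."""
    return units * price_per_unit


def per_tier_cost(units, tier_limit=20_000, tier_fee=50.0, overage_per_unit=0.003):
    """Per tier pricing: a flat fee covers everything up to tier_limit.

    The fee and the per-unit overage charge are assumed values.
    """
    overage = max(0, units - tier_limit)
    return tier_fee + overage * overage_per_unit


def cheaper_plan(monthly_units):
    """Compare total cost across several months of usage."""
    unit_total = sum(per_unit_cost(u) for u in monthly_units)
    tier_total = sum(per_tier_cost(u) for u in monthly_units)
    return "per tier" if tier_total < unit_total else "per unit"


stable = cheaper_plan([20_000, 20_000])      # steady usage every month
fluctuating = cheaper_plan([5_000, 50_000])  # quiet month, then a spike
```

Under these assumed prices, the steady usage pattern comes out cheaper per tier and the fluctuating one comes out cheaper per unit, which matches the rule of thumb above.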
It’s helpful to start with a visual recognition API that has a generous free tier - that way you can test and build to your heart’s content before committing. Clarifai gives you 15,000 free units to start and even more free units if you’re a student!
Processing images takes tons of computing power. Most visual recognition companies out there have very reliable infrastructure that can support high volumes, but it never hurts to ask about uptime and speed.
Some companies throttle or cap your usage, so it’s good to check what those limits are before you purchase. For example, Google Cloud Vision API limits your total usage to 20 million units per month, while Imagga limits your bandwidth to 2-5 API calls per second. If you’re concerned about hitting limits, Clarifai supports volumes in the billions and does not cap your bandwidth per second.
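If the API you pick does cap calls per second, you'll probably want to pace requests on your side instead of running into errors. Here's a minimal sketch of a client-side throttle; the `Throttle` class is hypothetical and not part of any vendor's SDK, and it only spaces calls evenly rather than handling retries or bursts.

```python
import time


class Throttle:
    """Spaces calls so that at most max_per_second start each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock    # injectable so the class can be tested
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve its slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

To use it, create one `Throttle(2)` for an API limited to 2 calls per second and call `wait()` right before each request.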
Support and customer service should be a factor in your decision making because you never know when you’ll need help. Integrations are important because they make it easier to build apps that work well with other popular apps. Here are some questions you should ask before settling on a visual recognition solution.
Some companies will only give you access to a support generalist if you need help. Others, like Clarifai, let you talk to real machine learning PhD experts if you so desire. Support is a big part of our company’s DNA - everyone from engineers to developer evangelists to marketers to the CEO himself takes part in helping customers.
A good indicator of how much a company cares about you as a customer is how hard it is to reach a real, live human at the company. Is the “contact us” form buried somewhere on the website or displayed prominently in the navigation bar? Does the company reply to tweets or Facebook messages? Are you talking to a human or a chatbot?
Some companies will charge hundreds of dollars extra for a separate support package if you need help. Others will not provide support unless you pay a certain amount of money. Clarifai gives great customer support to all of its customers as part of each API plan, even the customers on our free plan! We also hold monthly Hangouts where you can talk about anything - visual recognition, machine learning, careers, magical beasts and where to find them, etc. - with our support team and developer evangelists.
Depending on what you’re building, this could be a big deciding factor on which visual recognition API you want to use. Clarifai has tons of official and unofficial integrations and SDKs with popular services and languages. Our active community has not only contributed their helpful integrations on GitHub, they’ve also been really awesome about sharing their work on our blog and Devpost to teach and inspire others!
You, the informed consumer, should be sensitive about your data and privacy. Choose a company that is committed to protecting you as a customer. Even though the world of visual recognition is still pretty young, experienced brands like Clarifai have been around since the very beginning (three whole years ago, ha! #OG) and have in-house legal counsel to inform every single one of their decisions and the impact on their customers.
Today’s supporter may be tomorrow’s competitor - if you’re sharing your data with a company, do your research to make sure that they won’t be able to use it to compete with you directly or indirectly in the future.
Small companies with a single-minded focus tend to be better for developers because they’re focused on building a great product without regard to competing internal business interests or making money from advertisers. If you care about your data, it’s always better to use an API whose company doesn’t trade in your information to make money somewhere else in its business or sell to third parties.
You can understand how companies prioritize you as a customer by examining how they use your data in this chart. Clarifai only uses data in ways that benefit our customers and improve our product. Other companies might occupy a wider or narrower swath on this chart depending on how they use your data.
Ok, now that you’ve read through our guide on visual recognition APIs in the market, I hope you’ve come to the natural conclusion that Clarifai is the absolute best. Just kidding (kind of, not really) - I hope you found this guide helpful in explaining the visual recognition options out there and you're ready to try Clarifai for free!