The world of artificially intelligent visual recognition has really started to blossom in the last two years, but sometimes it feels like there’s a new A.I. company popping up every day! If you’re reading this, you probably already know a little about visual recognition and machine learning, but you’re not exactly sure what makes each visual recognition company different.* It can be really hard to keep up with all the options out there, so we created this guide to help you decide which visual recognition API is right for you.


*Yes, we are biased about our own awesomeness, but we know the market really well!

There are six questions you should ask when you’re deciding which visual recognition API you want to use to power your app:

What does the technology actually recognize?

How accurate is the technology?

What is the technology’s editorial voice?

How does the pricing add up?

Do support and extras come with the API?

Do you trust the company?


What does the visual recognition API actually recognize?

Visual recognition APIs allow you to build apps that automatically recognize and tag things in pictures. It sounds pretty straightforward, but there are actually several different factors - media formats, type of model, etc. - you’ll need to consider when picking the right API.

Media formats - images, animated gifs, video

Most visual recognition APIs can recognize images. But sometimes you want to go beyond static images and be able to recognize animated gifs and even video. Not every API can analyze images, animated gifs, AND video (Clarifai can! #humblebrag), so make sure the API you end up using satisfies all your media requirements.

Image / Animated GIF / Video

A great visual recognition API company will have a live demo you can try - put ours to the test with any image or video and watch it work like magic.
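Under the hood, one common way to extend image recognition to animated GIFs and video is to treat them as a sequence of still frames and run the image model on a sample of those frames. The minimal Python sketch below assumes that frame-sampling approach (it is an illustration, not a description of how Clarifai or any other API actually processes video); `sample_frame_indices` and its parameters are hypothetical names.

```python
def sample_frame_indices(total_frames: int, fps: float, seconds_between_samples: float) -> list[int]:
    """Return the indices of frames spaced roughly `seconds_between_samples` apart."""
    if total_frames < 0 or fps <= 0 or seconds_between_samples <= 0:
        raise ValueError("frame count must be non-negative; fps and interval must be positive")
    # Number of frames to skip between samples; always advance at least one frame.
    step = max(1, round(fps * seconds_between_samples))
    return list(range(0, total_frames, step))


# A 10-second clip at 30 fps, sampled once per second.
print(sample_frame_indices(total_frames=300, fps=30, seconds_between_samples=1.0))
# [0, 30, 60, 90, 120, 150, 180, 210, 240, 270]
```

Sampling one frame per second keeps the number of recognition calls proportional to the clip's length rather than its frame rate, which matters once you are paying per unit.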

General models vs. domain models

All visual recognition APIs have one or more models you can use. Different models are used to predict and identify different categories of things.

Every visual recognition API has a general model that recognizes and labels generic objects and things, like “lamp” and “dog” and “water.” Most visual recognition APIs understand a couple thousand different general concepts. Not to brag, but Clarifai understands over 11,000.

The best visual recognition APIs also give you access to domain models for more specific categories - for example, while Clarifai’s general model might recognize a cheeseburger as “food,” Clarifai’s food model would recognize a cheeseburger as “cheeseburger.” Domain models are great if you want to look at images through a more specific lens.
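The general-then-domain relationship can be sketched as a simple cascade: run the general model first, and only call a domain model when the general model is confident about a broad concept such as “food.” This is a hypothetical client-side pattern, not Clarifai's API; `cascade_predict`, the stub models, and the 0.9 threshold are all assumptions for illustration.

```python
from typing import Callable, Dict

Prediction = Dict[str, float]  # concept name -> confidence between 0 and 1
Model = Callable[[bytes], Prediction]


def cascade_predict(image: bytes, general_model: Model, domain_models: Dict[str, Model],
                    threshold: float = 0.9) -> Prediction:
    """Run the general model, then each domain model whose trigger concept is confident enough."""
    results = dict(general_model(image))
    for trigger_concept, domain_model in domain_models.items():
        if results.get(trigger_concept, 0.0) >= threshold:
            # Merge the domain model's more specific labels into the general ones.
            results.update(domain_model(image))
    return results


# Hypothetical stand-ins for real API calls.
def fake_general(image: bytes) -> Prediction:
    return {"food": 0.97, "table": 0.81}


def fake_food(image: bytes) -> Prediction:
    return {"cheeseburger": 0.95, "fries": 0.62}


image_bytes = b""  # placeholder for real image data
print(cascade_predict(image_bytes, fake_general, {"food": fake_food}))
```

Gating the domain call on the general result means you only pay for the specialized model on images where it is likely to help.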

There are a lot of different domain models out there - here’s a breakdown of the most common ones:

  • Face Recognition - specializes in reading faces and emotions
  • NSFW & Moderation - detects things that are not safe for work
  • Logo Detection - specializes in reading brands and logos
  • Color Recognition - recognizes all the colors of the rainbow and more
  • Location Recognition - specializes in recognizing landmarks and locations
  • OCR - optical character recognition, i.e. reading text in images

Custom models for every occasion

Even though domain models are more specific than general models, sometimes what you want to recognize is even more personalized and requires building a custom model. For example, if you're a company like Nike, you probably want to recognize every single Nike shoe in the universe - this would require a custom model.

Building a custom model in-house requires a dedicated data scientist to train a new model on a dataset of thousands, maybe even millions, of images, as well as thousands of lines of code and special infrastructure. Or, if you're using Clarifai's Custom Training product, all you have to do is show our API ten examples of a new concept, and our technology automatically learns what it is and applies it to your very own custom model.
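How can a handful of examples be enough? One widely used technique (not necessarily the one Clarifai's Custom Training uses) is to reuse a pretrained network that turns each image into an embedding vector, then represent a new concept by the average of its example embeddings and classify new images by their nearest average. The sketch below assumes those embeddings already exist; the class and function names are hypothetical, the example uses three examples per concept for brevity, and the tiny 2-dimensional vectors stand in for much longer real embeddings.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class NearestCentroidModel:
    """Few-shot 'custom model': each concept is the mean embedding of its examples."""

    def __init__(self) -> None:
        self.centroids: Dict[str, Vector] = {}

    def add_concept(self, name: str, example_embeddings: List[Vector]) -> None:
        if not example_embeddings:
            raise ValueError("need at least one example")
        self.centroids[name] = mean_vector(example_embeddings)

    def predict(self, embedding: Vector) -> Dict[str, float]:
        """Similarity of the input to every known concept (higher is closer)."""
        return {name: cosine_similarity(embedding, c) for name, c in self.centroids.items()}


# Toy 2-D "embeddings"; a real pipeline would get these from a pretrained network.
model = NearestCentroidModel()
model.add_concept("running shoe", [[0.9, 0.1], [1.0, 0.2], [0.8, 0.0]])
model.add_concept("sandal", [[0.1, 0.9], [0.2, 1.0], [0.0, 0.8]])
scores = model.predict([0.85, 0.15])
print(max(scores, key=scores.get))  # running shoe
```

Because the heavy lifting lives in the pretrained network, adding a concept only means averaging a few vectors, which is why this family of techniques needs so few examples.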

The cool thing about custom models is that you can train A.I. to understand and recognize anything (even Pokemon).

A fun way to see custom training in action is to download our free Forevery iOS photo discovery app. The app automatically tags every photo on your camera roll and you can train it to recognize new concepts, too!

We've made some pretty cool custom models for our customers already:

  • Contraband - recognizes things like drugs, firearms, and live animals, often used on auction and listing sites to filter for illegal items and moderate sites according to terms of use
  • Weddings - specializes in recognizing items that are common to weddings
  • Pornography - specializes in recognizing different categories and fetishes for pornographic sites
  • Food - recognizes different types of food and makes you really hungry in the process
  • Celebrities - recognizes celebrities and famous people all over the world

Not all visual recognition companies will build a custom model for you, nor will they necessarily have the resources required to do it well. If a custom model is what you need, make sure you find a company (like Clarifai!) that has leading researchers and tons of experience building and training models.

How accurate is the visual recognition technology?

As you may already know, visual recognition technology is built upon a little thing called machine learning. Machine learning is when a computer is trained to act based on recognizing patterns rather than being explicitly programmed. The more examples a computer sees, the better it gets at understanding what it’s seeing. Here are a couple of questions you should consider before deciding on a visual recognition solution.

What dataset is the model trained on and how many images does that include?

So, more images trained = more accuracy, right?

Kind of. Yes, in general, the larger the dataset a model is trained on, the more accurate the results. However, there comes a point of diminishing returns - ten examples are obviously better than one example, but ten million examples are only incrementally better than one million.

Also, you can’t just train on random images from a web crawl to improve accuracy - you must train on images that are labelled well and suit a specific task. A great visual recognition company (like Clarifai, naturally) will have access to specific datasets from certain parts of the web, partners, and feedback.

Can you send feedback to the model to make it more accurate as you’re using it?

Visual recognition technology becomes the most accurate when it’s given both positive and negative feedback. Imagine you’re training a puppy - you have to discipline it when it pees on your couch so that it doesn’t do it again. Same with machine learning - you have to tell it when it’s wrong so that it gets it right in the future.

To see this feedback loop principle in action, download Forevery! Our free photo discovery app for iOS automatically applies relevant tags to every photo on your camera roll. You can train the app to recognize new tags and give it positive or negative feedback on tags to make our API smarter.

When you’re choosing your visual recognition solution, ask the company whether it has a feedback loop built in so you can help make its algorithm better!
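A feedback loop can be as simple as nudging a concept's representation after each user correction. The sketch below assumes the concept is stored as a centroid vector (as in nearest-centroid few-shot classifiers) and applies a simplified, incremental Rocchio-style update: positive feedback pulls the centroid toward the example's embedding, and negative feedback pushes it away. The function names and the `alpha` step size are hypothetical, and real services may update their models very differently.

```python
from typing import Iterable, List, Tuple

Vector = List[float]


def apply_feedback(centroid: Vector, embedding: Vector, positive: bool, alpha: float = 0.1) -> Vector:
    """Move the centroid a fraction `alpha` toward (positive) or away from (negative) the example."""
    sign = 1.0 if positive else -1.0
    return [c + sign * alpha * (e - c) for c, e in zip(centroid, embedding)]


def replay_feedback(centroid: Vector, events: Iterable[Tuple[Vector, bool]], alpha: float = 0.1) -> Vector:
    """Apply a log of (embedding, was_the_tag_correct) events in order."""
    for embedding, positive in events:
        centroid = apply_feedback(centroid, embedding, positive, alpha)
    return centroid


# The user confirms one tag and rejects another.
updated = replay_feedback([0.5, 0.5], [([1.0, 0.0], True), ([0.0, 1.0], False)], alpha=0.5)
print(updated)  # [1.125, -0.125]
```

A small `alpha` makes each correction a gentle nudge, so one mistaken click cannot undo everything the model has already learned.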

What is the technology’s editorial voice?

Wait, artificial intelligence has an editorial voice?

Yes! Even though we’ve trained computers to become smart, it’s still a human doing the initial training. That means human perceptions, limitations, and even biases can color each individual model. We’ve all heard the phrase, “One person’s trash is another’s treasure” - the same principle applies to visual recognition A.I.

Look for a broad vocabulary

As with human communication, computer understanding is limited by the size of its vocabulary. For example, some visual recognition APIs can recognize 1,000 different things in their vocabulary. Usually, these models are limited to only recognizing objects (e.g. tree, car, etc.).

We’re pretty sure Clarifai has the largest vocabulary of any visual recognition API on the market (#shamelessplug), recognizing 11,000+ different things in over 20 languages. And by “things,” we mean objects (e.g. cat), ideas (e.g. politics, love, friendship, etc.), and even real live human feelings (e.g. togetherness, anger, fun, etc.). Developers can trim down the number of concepts using our API but it’s always helpful to start with more options.

Additionally, you might also want a solution that allows you to use your own taxonomy. So, if our general model calls "soda" ... well, "soda" ... and maybe you want to call it "pop," a platform like Custom Training will allow you to do so.
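Trimming the vocabulary and renaming concepts can also be done on your side of the API, as a post-processing step over the model's predictions. In this minimal sketch the predictions are assumed to arrive as a concept-to-confidence dictionary; the allow-list, the rename map (“soda” and “drink” both become “pop”), and the 0.5 confidence cutoff are hypothetical choices, not features of any particular API.

```python
from typing import Dict, Set


def apply_taxonomy(predictions: Dict[str, float], allowed: Set[str], renames: Dict[str, str],
                   min_confidence: float = 0.5) -> Dict[str, float]:
    """Keep only allowed, confident concepts and relabel them in your own vocabulary."""
    result: Dict[str, float] = {}
    for concept, confidence in predictions.items():
        if concept not in allowed or confidence < min_confidence:
            continue
        label = renames.get(concept, concept)
        # Several source concepts may map to one label; keep the highest confidence.
        result[label] = max(confidence, result.get(label, 0.0))
    return result


raw = {"soda": 0.93, "bottle": 0.88, "drink": 0.71, "politics": 0.12}
print(apply_taxonomy(raw, allowed={"soda", "bottle", "drink"}, renames={"soda": "pop", "drink": "pop"}))
```

Starting from a large vocabulary and filtering down is easy; a model that never learned a concept cannot be filtered into knowing it.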

If you’re choosing a visual recognition API, look for one that uses the vocabulary that best suits your needs.

Objective vs. subjective recognition

Because humans train A.I., it’s unavoidable for our own biases and perceptions to make an impact on how computers understand the world. Imagine how differently a very conservative person and a very liberal person might see the world, then apply those differences to visual recognition.

At Clarifai, as we’re choosing words in our models’ vocabulary, we try to keep it neutral and factual and avoid using subjective terms. This might be exactly what you're looking for. Or, depending on your worldview, a more subjective approach might be better for you - just be aware that these differences exist and choose the one that’s right for you.

Good humans = good A.I.

When you’re choosing an A.I. company to work with, understand how the company views its ethical responsibility as a shaper of artificial intelligence. It may not be an immediate concern for the app you’re building, but hey, we’re all humans and the robot apocalypse is fast approaching!

We’ve written on the topic of A.I. ethics in the past if you’re interested in learning more about our take on it.

How does the pricing add up?

We get that the dollar (or yuan, or rupee, or euro, etc.) is sometimes king, which is why it's so important to clarifai (see what we did there?) what you get in exchange for how much you pay. Pricing for visual recognition APIs can be confusing, so here’s our breakdown of the different pricing plans in the market and what they mean. That way, you can understand exactly how much value you’re getting out of your API solution. 

Per Tier vs. Per Unit pricing - which is better?

Per tier pricing gives you up to a certain number of units for a fixed fee. For example, if your tier is up to 20,000 units, you would pay the same amount for the month whether you use 10,000 or 20,000 units.

Per unit pricing charges you a set amount per unit. For example, you might be charged $0.003 per unit. This means you’re charged each time you use a unit but you’re never charged for units you don’t use.

So which is better? That depends on how well you know your needs. If your needs are relatively stable month to month (e.g. you need around 20,000 calls per month), per tier pricing tends to be much cheaper because resources can be allocated more efficiently. If your needs fluctuate more (e.g. you need 5,000 calls this month but 50,000 next month), per unit pricing may be better.
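To see how the trade-off plays out, you can total a few months of usage under each plan. The tier fees below are made up for illustration; only the $0.003 per-unit rate comes from the example above. `Decimal` is used so the dollar amounts add up exactly.

```python
from decimal import Decimal
from typing import List, Tuple

# Hypothetical price sheet: (maximum units per month, flat monthly fee in dollars).
TIERS: List[Tuple[int, Decimal]] = [(20_000, Decimal("50")), (100_000, Decimal("200"))]
PRICE_PER_UNIT = Decimal("0.003")  # the per-unit example rate from the text


def tier_cost(units: int) -> Decimal:
    """Fee of the cheapest tier that covers this month's usage."""
    for max_units, fee in TIERS:
        if units <= max_units:
            return fee
    raise ValueError(f"{units} units exceeds the largest tier")


def unit_cost(units: int) -> Decimal:
    return units * PRICE_PER_UNIT


def compare_plans(monthly_usage: List[int]) -> Tuple[Decimal, Decimal]:
    """Total (tier plan, per-unit plan) cost over the given months."""
    tiered = sum((tier_cost(u) for u in monthly_usage), Decimal(0))
    per_unit = sum((unit_cost(u) for u in monthly_usage), Decimal(0))
    return tiered, per_unit


print(compare_plans([20_000, 20_000, 20_000]))  # steady usage
print(compare_plans([5_000, 50_000]))           # fluctuating usage
```

With these hypothetical prices, a steady 20,000 calls a month costs $150 over three months on the tier plan versus $180 per unit, while a 5,000-then-50,000 pattern costs $250 on tiers versus $165 per unit.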

It’s helpful to start with a visual recognition API that has a generous free tier - that way you can test and build to your heart’s content before committing. Clarifai gives you 15,000 free units to start and even more free units if you’re a student!

What do I do if I need to process a bunch of pictures upfront?

Backfill pricing is a cheaper pricing option that some visual recognition companies (like Clarifai!) offer to customers with large backlogs of images and videos. For example, if you have an archive of 10,000 cat pictures on file, you can process the whole batch upfront at a heavily discounted price before committing to monthly per tier or per unit pricing.
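Processing a backlog usually comes down to splitting the archive into batches and submitting them one after another. The sketch below assumes the API accepts several inputs per request; the batch size of 128, the file names, and the `batches` helper are hypothetical, so check your provider's documented per-request limit.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, batch_size)):
        yield chunk


archive = [f"cat_{i:05d}.jpg" for i in range(10_000)]  # hypothetical file names
request_batches = list(batches(archive, batch_size=128))
print(len(request_batches), len(request_batches[-1]))  # 79 16
```

For the 10,000-picture archive, a batch size of 128 yields 79 requests, the last one holding the remaining 16 images.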

What limits and infrastructure are behind the API?

Processing images takes tons of computing power. Most visual recognition companies out there have very reliable infrastructure that can support high volumes, but it never hurts to ask about uptime and speed.

Some companies throttle or cap your usage, so it’s good to check what those limits are before you purchase. For example, Google Cloud Vision API limits your total usage to 20 million units per month, while Imagga limits you to 2-5 API calls per second. If you’re concerned about hitting limits, Clarifai supports volumes in the billions and does not cap the number of calls you can make per second.
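If your provider does cap calls per second, a client-side limiter keeps you under the cap instead of collecting errors. Below is a minimal token-bucket sketch (a standard rate-limiting technique, not any vendor's SDK); the class name is hypothetical, and `rate=2` simply mirrors the 2-calls-per-second figure above. The `clock` parameter is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilling at `rate` calls per second."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be positive and capacity at least 1")
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Block (with real sleeping) until a token is available."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)


bucket = TokenBucket(rate=2, capacity=2)  # mirrors a 2-calls-per-second cap
start = time.monotonic()
for n in range(4):
    bucket.acquire()
    print(f"call {n} at +{time.monotonic() - start:.2f}s")  # send the real API request here
```

Call `bucket.acquire()` before each API request; with `capacity=2`, a short burst of two calls goes through immediately and later calls are spaced about half a second apart.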

What kind of support, community, and integrations come with the API?

Support and customer service should be a factor in your decision making because you never know when you’ll need help. Integrations are important because they make it easier to build apps that work well with other popular apps. Here are some questions you should ask before settling on a visual recognition solution.

Who can I talk to if I need help, and how do I reach them?

Some companies will only give you access to a support generalist if you need help. Others, like Clarifai, let you talk to real machine learning PhD experts if you so desire. Support is a big part of our company’s DNA - everyone from engineers to developer evangelists to marketers to the CEO himself takes part in helping customers.

A good indicator of how much a company cares about you as a customer is how hard it is to reach a real, live human at the company. Is the “contact us” form buried somewhere on the website or displayed prominently in the navigation bar? Does the company reply to tweets or Facebook messages? Are you talking to a human or a chatbot?

Does customer support cost extra?

Some companies will charge hundreds of dollars extra for a separate support package if you need help. Others will not provide support unless you pay a certain amount of money. Clarifai gives great customer support to all of its customers as part of each API plan, even the customers on our free plan! We also hold monthly Hangouts where you can talk about anything - visual recognition, machine learning, careers, magical beasts and where to find them, etc. - with our support team and developer evangelists.

What kind of integrations does the API have?

Depending on what you’re building, this could be a big deciding factor on which visual recognition API you want to use. Clarifai has tons of official and unofficial integrations and SDKs with popular services and languages. Our active community has not only contributed their helpful integrations on GitHub, they’ve also been really awesome about sharing their work on our blog and Devpost to teach and inspire others!

Do you trust the company you're working with?

You, the informed consumer, should be sensitive about your data and privacy. Choose a company that is committed to protecting you as a customer. Even though the world of visual recognition is still pretty young, experienced brands like Clarifai have been around since the very beginning (three whole years ago, ha! #OG) and have in-house legal counsel to weigh every decision and its impact on customers.

In data we trust

Knowledge is power, so your data is probably one of the most valuable things you own. Don’t just give it away to anyone. Make sure the company you're working with takes steps to protect your security and privacy. We strive to be transparent with how we use your data - you can read our privacy policy here.

Beyond data security, think about the product's existential security as well. Will a visual recognition API always be the company's focus, or is it just a science project among many other projects? Is the company well-funded and will it continue to grow and focus all of its resources on building and improving the visual recognition product? These are all questions you should consider before settling on the right visual recognition API solution, especially if you're hoping to make artificial intelligence a cornerstone of your business or app.

Will the company always act in your best interest?

Today’s supporter may be tomorrow’s competitor - if you’re sharing your data with a company, do your research to make sure that they won’t be able to use it to compete with you directly or indirectly in the future.

Small companies with a single-minded focus tend to be better for developers because they’re focused on building a great product without regard to competing internal business interests or making money from advertisers. If you care about your data, it’s always better to use an API whose company doesn’t trade in your information to make money somewhere else in its business or sell to third parties.

You can understand how companies prioritize you as a customer by examining how they use your data in this chart. Clarifai only uses data in ways that benefit our customers and improve our product. Other companies might occupy a wider or narrower swath on this chart depending on how they use your data.

Ok, now that you’ve read through our guide on visual recognition APIs in the market, I hope you’ve come to the natural conclusion that Clarifai is the absolute best. Just kidding (kind of, not really) - I hope you found this guide helpful in explaining the visual recognition options out there and you're ready to try Clarifai for free!
