Azure Cognitive Computer Vision
Important
Transport Layer Security (TLS) 1.2 is now enforced for all HTTP requests to this service. For more information, see Azure Cognitive Services security.
The Computer Vision Image Analysis service can extract a wide variety of visual features from your images. For example, it can determine whether an image contains adult content, find specific brands or objects, or find human faces.
You can use Image Analysis through a client library SDK or by calling the REST API directly. Follow the quickstart to get started.
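As a rough illustration of the REST route, the sketch below builds (but does not send) an Analyze Image request using only the Python standard library. The `v3.2` path segment, the placeholder endpoint and key, and the helper name `build_analyze_request` are assumptions made for this example; check the API reference for the version your resource uses.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, key, image_url, features):
    """Build a POST request for the Analyze Image REST API without sending it."""
    # The requested features go in the visualFeatures query parameter, comma-separated.
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"  # version is an assumption
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # resource key from the Azure portal
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values: replace them with your own resource endpoint and key.
request = build_analyze_request(
    "https://example-resource.cognitiveservices.azure.com",
    "YOUR_KEY",
    "https://example.com/photo.jpg",
    ["Tags", "Objects"],
)

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       analysis = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any network call is made.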
You can analyze images to provide insights about their visual features and characteristics. All of the features in the list below are provided by the Analyze Image API. Follow a quickstart to get started.
Tag visual features
Identify and tag visual features in an image, from a set of thousands of recognizable objects, living things, scenery, and actions. When the tags are ambiguous or not common knowledge, the API response provides hints to clarify the context of the tag. Tagging isn’t limited to the main subject, such as a person in the foreground, but also includes the setting (indoor or outdoor), furniture, tools, plants, animals, accessories, gadgets, and so on.
Detect objects
Object detection is similar to tagging, but the API returns the bounding box coordinates for each tag applied. For example, if an image contains a dog, a cat, and a person, the Detect operation lists those objects together with their coordinates in the image. You can use this functionality to process further relationships between the objects in an image. It also lets you know when there are multiple instances of the same tag in an image.
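To make the idea of multiple instances and relationships between objects concrete, here is a minimal sketch that works on a detection result. The response shape (an `objects` list whose items carry `object`, `confidence`, and a `rectangle` with `x`, `y`, `w`, `h`) is assumed here, the sample values are invented, and the helper names are hypothetical.

```python
from collections import Counter

# Invented sample data, shaped like the assumed "objects" section of a response.
sample = {
    "objects": [
        {"object": "dog", "confidence": 0.90,
         "rectangle": {"x": 10, "y": 40, "w": 100, "h": 80}},
        {"object": "person", "confidence": 0.95,
         "rectangle": {"x": 60, "y": 10, "w": 90, "h": 200}},
        {"object": "dog", "confidence": 0.70,
         "rectangle": {"x": 300, "y": 50, "w": 80, "h": 60}},
    ]
}


def count_instances(analysis):
    """Count how many times each object label was detected."""
    return Counter(item["object"] for item in analysis.get("objects", []))


def boxes_overlap(a, b):
    """Return True when two x/y/w/h rectangles share any area."""
    return (a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"])


counts = count_instances(sample)
first_dog, person, second_dog = (o["rectangle"] for o in sample["objects"])
print(counts["dog"])                      # 2
print(boxes_overlap(first_dog, person))   # True
print(boxes_overlap(second_dog, person))  # False
```

A simple overlap test like this is only one way to reason about relationships; distance between box centers or containment checks follow the same pattern.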
Detect brands
Identify commercial brands in images or videos from a database of thousands of global logos. You can use this feature, for example, to discover which brands are most popular on social media or most prevalent in media product placement.
Categorize an image
Identify and categorize an entire image, using a category taxonomy with parent/child hereditary hierarchies. Categories can be used alone, or with our new tagging models.
Currently, English is the only supported language for tagging and categorizing images.
Describe an image
Generate a description of an entire image in human-readable language, using complete sentences. Computer Vision’s algorithms generate various descriptions based on the objects identified in the image. Each description is evaluated and assigned a confidence score. The list is then returned ordered from highest confidence score to lowest.
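A caller usually wants only the most confident description. The sketch below assumes the response nests candidates under `description.captions`, each with `text` and `confidence`; the sample captions are invented and `best_caption` is a hypothetical helper.

```python
# Invented sample data, shaped like the assumed "description" section of a response.
sample = {
    "description": {
        "captions": [
            {"text": "a person swimming in a pool", "confidence": 0.82},
            {"text": "a person in the water", "confidence": 0.61},
        ]
    }
}


def best_caption(analysis, min_confidence=0.0):
    """Return the highest-confidence caption text, or None if none qualifies."""
    captions = analysis.get("description", {}).get("captions", [])
    candidates = [c for c in captions if c["confidence"] >= min_confidence]
    if not candidates:
        return None
    # max() is used rather than relying on the order of the returned list.
    return max(candidates, key=lambda c: c["confidence"])["text"]


print(best_caption(sample))                      # a person swimming in a pool
print(best_caption(sample, min_confidence=0.9))  # None
```

The `min_confidence` cutoff is a design choice for this example: it lets an application fall back to its own default text when no caption is trustworthy enough.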
Detect faces
Detect faces in an image and provide information about each detected face. Computer Vision returns the coordinates, rectangle, gender, and age for each detected face.
Computer Vision provides a subset of the Face service functionality. You can use the Face service for more detailed analysis, such as facial identification and pose detection.
Detect image types
Detect characteristics of an image, such as whether it is a line drawing or how likely it is to be clip art.
Detect domain-specific content
Use domain models to detect and identify domain-specific content in an image, such as celebrities and landmarks. For example, if an image contains people, Computer Vision can use a domain model for celebrities to determine if the people detected in the image are known celebrities.
Detect the color scheme
Analyze color usage within an image. Computer Vision can determine whether an image is black & white or color and, for color images, identify the dominant and accent colors.
Generate a thumbnail
Analyze the contents of an image to generate an appropriate thumbnail for that image. Computer Vision first generates a high-quality thumbnail and then analyzes the objects within the image to determine the area of interest. Computer Vision then crops the image to fit the requirements of the area of interest. The generated thumbnail can be presented using an aspect ratio that is different from the aspect ratio of the original image, depending on your needs.
Get the area of interest
Analyze the contents of an image to return the coordinates of the area of interest. Instead of cropping the image and generating a thumbnail, Computer Vision returns the bounding box coordinates of the region, so the calling application can modify the original image as desired.
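Because the calling application does the cropping itself, it needs to turn the returned bounding box into a crop region. This sketch assumes the box arrives as `x`, `y`, `w`, `h` under an `areaOfInterest` key; the sample values are invented and `crop_box` is a hypothetical helper.

```python
# Invented sample data, shaped like the assumed "areaOfInterest" section of a response.
sample = {"areaOfInterest": {"x": 50, "y": 20, "w": 300, "h": 300}}


def crop_box(area, image_width, image_height):
    """Convert an x/y/w/h box to (left, top, right, bottom), clamped to the image."""
    left = max(0, area["x"])
    top = max(0, area["y"])
    right = min(image_width, area["x"] + area["w"])
    bottom = min(image_height, area["y"] + area["h"])
    return (left, top, right, bottom)


# For a 320x240 image the box is clipped at the right and bottom edges.
print(crop_box(sample["areaOfInterest"], 320, 240))  # (50, 20, 320, 240)
```

The returned tuple uses the (left, upper, right, lower) order that imaging libraries such as Pillow expect for a crop operation, so it can be passed on without further conversion.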
You can use Computer Vision to detect adult content in an image and return confidence scores for different classifications. The threshold for flagging content can be set on a sliding scale to accommodate your preferences.
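The sliding-scale idea can be sketched as a comparison of the returned scores against thresholds chosen by the application. The field names `adultScore`, `racyScore`, and `goreScore` are assumed here, the sample scores and threshold values are invented, and `flag_content` is a hypothetical helper.

```python
# Invented sample data, shaped like the assumed "adult" section of a response.
sample = {"adult": {"adultScore": 0.02, "racyScore": 0.45, "goreScore": 0.01}}


def flag_content(adult, thresholds):
    """Flag each classification whose score meets or exceeds its threshold."""
    return {
        name: adult[f"{name}Score"] >= limit
        for name, limit in thresholds.items()
    }


# A stricter application lowers the thresholds; a more permissive one raises them.
strict = {"adult": 0.5, "racy": 0.4, "gore": 0.5}
print(flag_content(sample["adult"], strict))
# {'adult': False, 'racy': True, 'gore': False}
```

Keeping the thresholds in application code, rather than relying on a single built-in flag, is what lets each deployment tune how cautious it wants to be.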
Image Analysis works on images that meet the following requirements:
As with all of the Cognitive Services, developers using the Computer Vision service should be aware of Microsoft’s policies on customer data. See the Cognitive Services page on the Microsoft Trust Center to learn more.
Customize and embed state-of-the-art computer vision image analysis for specific domains with Custom Vision, part of Azure Cognitive Services. Build frictionless customer experiences, optimize manufacturing processes, accelerate digital marketing campaigns, and more. No machine learning expertise is required.
What is Azure Cognitive Services – Computer Vision?
Azure Cognitive Services makes AI project development simpler and faster.
What can Cognitive Services do?
1. OCR
– Read text from images and PDF files
2. Face Detection
– Detect how many human faces appear in a photo
– Analyze gender and age
3. Image descriptions
– If you upload a photo of someone swimming, the service describes it, for example by noting that a man or a woman is swimming
4. Spatial Analysis (connects to CCTV or surveillance cameras)
– Detect how long a person appears in the CCTV footage
– Detect whether a person is wearing a mask
– Measure the distance between people
Computer Vision can power many digital asset management (DAM) scenarios. DAM is the business process of organizing, storing, and retrieving rich media assets and managing digital rights and permissions. For example, a company may want to group and identify images based on visible logos, faces, objects, colors, and so on. Or, you might want to automatically generate captions for images and attach keywords so they’re searchable. For an all-in-one DAM solution using Cognitive Services, Azure Cognitive Search, and intelligent reporting, see the Knowledge Mining Solution Accelerator Guide on GitHub. For other DAM examples, see the Computer Vision Solution Templates repository.
Language | Language code | Read 3.2 | OCR API | Read 3.0/3.1 |
---|---|---|---|---|
Afrikaans | af | ✔ | | |
Albanian | sq | ✔ | | |
Arabic | ar | ✔ | | |
Asturian | ast | ✔ | | |
Basque | eu | ✔ | | |
Bislama | bi | ✔ | | |
Breton | br | ✔ | | |
Catalan | ca | ✔ | | |
Cebuano | ceb | ✔ | | |
Chamorro | ch | ✔ | | |
Chinese Simplified | zh-Hans | ✔ | ✔ | |
Chinese Traditional | zh-Hant | ✔ | ✔ | |
Cornish | kw | ✔ | | |
Corsican | co | ✔ | | |
Crimean Tatar Latin | crh | ✔ | | |
Czech | cs | ✔ | ✔ | |
Danish | da | ✔ | ✔ | |
Dutch | nl | ✔ | ✔ | ✔ |
English (incl. handwritten) | en | ✔ | ✔ (print only) | ✔ |
Estonian | et | ✔ | | |
Fijian | fj | ✔ | | |
Filipino | fil | ✔ | | |
Finnish | fi | ✔ | ✔ | |
French | fr | ✔ | ✔ | ✔ |
Friulian | fur | ✔ | | |
Galician | gl | ✔ | | |
German | de | ✔ | ✔ | ✔ |
Gilbertese | gil | ✔ | | |
Greek | el | ✔ | | |
Greenlandic | kl | ✔ | | |
Haitian Creole | ht | ✔ | | |
Hani | hni | ✔ | | |
Hmong Daw Latin | mww | ✔ | | |
Hungarian | hu | ✔ | ✔ | |
Indonesian | id | ✔ | | |
Interlingua | ia | ✔ | | |
Inuktitut Latin | iu | ✔ | | |
Irish | ga | ✔ | | |
Italian | it | ✔ | ✔ | ✔ |
Japanese | ja | ✔ | ✔ | |
Javanese | jv | ✔ | | |
K'iche' | quc | ✔ | | |
Kabuverdianu | kea | ✔ | | |
Kachin Latin | kac | ✔ | | |
Kara-Kalpak | kaa | ✔ | | |
Kashubian | csb | ✔ | | |
Khasi | kha | ✔ | | |
Korean | ko | ✔ | ✔ | |
Kurdish Latin | kur | ✔ | | |
Luxembourgish | lb | ✔ | | |
Malay Latin | ms | ✔ | | |
Manx | gv | ✔ | | |
Neapolitan | nap | ✔ | | |
Norwegian | nb | ✔ | | |
Norwegian | no | ✔ | | |
Occitan | oc | ✔ | | |
Polish | pl | ✔ | ✔ | |
Portuguese | pt | ✔ | ✔ | ✔ |
Romanian | ro | ✔ | | |
Romansh | rm | ✔ | | |
Russian | ru | ✔ | | |
Scots | sco | ✔ | | |
Scottish Gaelic | gd | ✔ | | |
Serbian Cyrillic | sr-Cyrl | ✔ | | |
Serbian Latin | sr-Latn | ✔ | | |
Slovak | sk | ✔ | | |
Slovenian | slv | ✔ | | |
Spanish | es | ✔ | ✔ | ✔ |
Swahili Latin | sw | ✔ | | |
Swedish | sv | ✔ | ✔ | |
Tatar Latin | tat | ✔ | | |
Tetum | tet | ✔ | | |
Turkish | tr | ✔ | ✔ | |
Upper Sorbian | hsb | ✔ | | |
Uzbek Latin | uz | ✔ | | |
Volapük | vo | ✔ | | |
Walser | wae | ✔ | | |
Western Frisian | fy | ✔ | | |
Yucatec Maya | yua | ✔ | | |
Zhuang | za | ✔ | | |
Zulu | zu | ✔ | | |
Mobile Apps vs. Web Apps
Mobile apps are built for a specific platform, such as iOS for the Apple iPhone or Android for a Samsung device. They are downloaded and installed via an app store and have access to system resources, such as GPS and the camera function. Mobile apps live and run on the device itself. Snapchat, Instagram, Google Maps and Facebook Messenger are some examples of popular mobile apps.
Web apps, on the other hand, are accessed via the internet browser and will adapt to whichever device you’re viewing them on. They are not native to a particular system and don’t need to be downloaded or installed. Due to their responsive nature, they do indeed look and function a lot like mobile apps, and this is where the confusion arises.
While the designs are similar and follow the same fonts and color scheme, these are essentially two different products.
Web apps need an active internet connection in order to run, whereas mobile apps may work offline. Mobile apps have the advantage of being faster and more efficient, but they do require the user to regularly download updates. Web apps will update themselves.
Above all, mobile apps and web apps are designed and built very differently. To further differentiate between the two, it helps to understand how each is developed.