We tested Edge’s new feature to provide automatic alt texts for images
Using Artificial Intelligence, Microsoft Edge is now able to provide alternative descriptions for images that lack such descriptions. Fondazione LIA tested this new feature on several websites and for several types of images. We’re sharing the results in this article.
Providing alternative text for non-decorative images is one of the most important accessibility requirements. Alternative texts (also called alt texts or alternative descriptions) allow screen reader users to access the content and information conveyed by an image. They are textual descriptions of an image’s visual content, which would otherwise be inaccessible to users with visual disabilities.
Writing good alternative descriptions is not an easy task: it requires time and effort. Many images on the internet have inappropriate alt text, or no alt text at all. To fill this gap, Microsoft has added a new feature to Microsoft Edge: it now provides automatically generated alt text for images that lack it. The feature builds on modern image recognition technology based on Artificial Intelligence and Machine Learning algorithms, which can automatically process an image and produce a short textual description of it.
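To see how such gaps can be detected in practice, here is a minimal sketch in Python, using only the standard library’s `html.parser`, that flags `<img>` tags with no `alt` attribute at all. The class name and logic are ours for illustration, not how Edge itself works:

```python
from html.parser import HTMLParser

class MissingAltChecker(HTMLParser):
    """Collects <img> tags that lack an alt attribute entirely."""

    def __init__(self):
        super().__init__()
        self.missing = []  # src values of images with no alt text

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # alt="" legitimately marks a decorative image; a missing
        # attribute means screen readers get nothing useful.
        if attrs.get("alt") is None:
            self.missing.append(attrs.get("src", "(no src)"))

checker = MissingAltChecker()
checker.feed('<img src="a.jpg" alt="A dog"><img src="b.jpg">')
print(checker.missing)  # ['b.jpg']
```

A checker like this only finds missing alt text; judging whether existing alt text is appropriate still requires a human (or, as described below, an AI model).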
Automatic alt texts may not be perfect yet, but, for screen reader users, having some description is better than no description at all. In Microsoft Edge, users can enable this new feature in the browser settings.
Once the user has granted permission, Microsoft Edge sends images that lack alt text to its Azure Cognitive Services Computer Vision API for processing. The most common image formats are supported (JPEG, PNG, GIF, WEBP and others). The Vision API analyzes each image and creates a short descriptive summary of its content, which the screen reader reads to the user as the image description.
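To illustrate the kind of request involved, the following Python sketch only assembles the pieces of a call to the Computer Vision “describe” operation, without sending anything over the network. The endpoint path, header name and parameters follow Azure’s public REST documentation at the time of writing, and the resource name and key are placeholders:

```python
def build_describe_request(endpoint, key, image_url, language="en"):
    """Assemble (url, headers, body, params) for the Computer Vision
    'describe' operation; nothing is sent over the network here."""
    return (
        f"{endpoint}/vision/v3.2/describe",
        {
            "Ocp-Apim-Subscription-Key": key,  # resource API key
            "Content-Type": "application/json",
        },
        {"url": image_url},                    # image to be captioned
        {"maxCandidates": "1", "language": language},
    )

url, headers, body, params = build_describe_request(
    "https://example.cognitiveservices.azure.com",  # hypothetical resource
    "YOUR-KEY",
    "https://example.com/photo.jpg",
)
print(url)  # https://example.cognitiveservices.azure.com/vision/v3.2/describe
```

In a real client, these pieces would be passed to an HTTP library, and the JSON response would contain one or more candidate captions with confidence scores.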
As of the writing of this article, descriptions can be generated in five languages (English, Spanish, Japanese, Portuguese, and Simplified Chinese), but the feature can also recognize text inside images in over 120 languages. To inform users that the alt text they are hearing is automatically generated, descriptions are preceded by the phrase “Appears to be”, while automatically recognized text inside an image is introduced by “Appears to say”.
Some types of images will not be described, such as images that are excessively small or large, images marked as decorative by the creator, and all images categorized by the Vision API as explicit.
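As an illustration, that filtering step could be sketched like this in Python. The size thresholds here are purely illustrative guesses, since Microsoft has not published the actual limits Edge uses:

```python
def should_describe(width, height, alt, explicit=False,
                    min_side=50, max_side=4200):
    """Decide whether an image is worth sending for auto-description.
    min_side/max_side are illustrative guesses, not Edge's real limits."""
    if alt is not None:       # author already provided text
        return False          # (alt="" marks the image as decorative)
    if explicit:              # flagged by content moderation
        return False
    if min(width, height) < min_side or max(width, height) > max_side:
        return False          # too small or too large to caption usefully
    return True

print(should_describe(800, 600, None))  # True
print(should_describe(16, 16, None))    # False: tiny icon
print(should_describe(800, 600, ""))    # False: decorative image
```

Skipping tiny icons and decorative images avoids flooding screen reader users with descriptions that carry no information.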
Microsoft is rolling out this feature in Microsoft Edge for Windows, Mac and Linux, but for now it won’t be available in Edge on Android and iOS. According to Microsoft, “the algorithms are not perfect, and the quality of the description will vary”. But constant improvements to image recognition algorithms will gradually refine the quality of the service.
Results of the tests
Fondazione LIA has tested this feature with different types of images: photographs, graphs, paintings, illustrations and comics. The results show that Microsoft Edge’s automatic image description works in a range from “quite well” to “very well” with photographs, but still has a long way to go with other types of images. Even descriptions of photographs are not perfect yet, and sometimes they are downright misleading. For example, we came across a photograph of a person standing in front of a wall with their arms spread, their shadow forming the silhouette of a superhero. The description was “Appears to be: a person standing in front of a group of persons”. Another time, a dog was mistaken for a cat. Conversely, there were times when the description was quite accurate (if concise). Public figures, such as Donald Trump or Viktor Orbán, were correctly recognized.
As far as graphs are concerned, the Microsoft Vision API algorithms can distinguish the type of graph, classifying it as a bar graph, a line graph or a pie chart, but the description stops there, followed only by the recognition and transcription of any text contained in the image. We also tried infographics, but the description returned was simply “diagram”.
For the other types of images, we found great variability in the quality of the results. Drawings and illustrations were, among the images tested, the categories with the most erratic results, with drawings classified as “maps”, “cartoons” or “background patterns”. An extract from a comic was announced by the screen reader as “Appears to be: diagram”.
An attempt was also made with images of paintings and sculptures. In this case, the automatic descriptions proved closer to the content of the images, which were almost always correctly classified as “a painting of” or “a statue of”. However, the Vision API does not recognize famous artworks and provides only generic descriptions. For example, Apollo and Daphne by Gian Lorenzo Bernini was described as “A statue of a woman and a man”, while Michelangelo’s famous Pietà was described as “A statue of a man sitting on a throne”, which is of course not the actual content of the image.
Although not perfect and with room for improvement that Microsoft has already said it is aware of, the new Microsoft Edge feature is nevertheless an important step toward a more accessible Web.
The LIA 2019 pilot project
The results obtained during our tests show that photographs get better descriptions than all other types of images. This is not surprising; we expected it.
In fact, back in 2019 Fondazione LIA worked on a pilot project exploring the possibility of automatically generating alternative descriptions of images using AI. The project focused on testing a selection of commercially available AI algorithms for image recognition, image description and image classification, applied to digital publications, which often contain complex images such as charts, graphs, illustrations and drawings.
On this occasion, we verified that the state-of-the-art image recognition algorithms had been optimized for photographs, a trend that the new Microsoft Edge feature now confirms.
While waiting for a new generation of automatic image description algorithms, the descriptions provided by content creators will probably provide a better experience for screen reader users.
If you are interested in this topic, Fondazione LIA has a long experience and offers specific courses on alternative descriptions for publishers and companies. To find out more, visit the Training page on our website.