The past seen through artificial eyes
How does artificial intelligence (AI) view a digital archive? Does it see connections that we don’t see? What patterns does it reveal? Can AI be a guide, or a curator? Does it know what we’re looking for better than we do ourselves? And is it a tool, or an oracle? Together with Het Nieuwe Instituut and VPRO Tegenlicht, designer and artist Richard Vijgen has been investigating these questions over the past few months. Vijgen researched the possible role of AI in opening up digital heritage collections. Here, he describes how he did this using Het Nieuwe Instituut’s National Collection for Architecture and Urban Planning and the 20-year archive of Tegenlicht (Backlight), a documentary TV programme.
Inside the black box
Although AI is now commonplace, with many fields researching it and looking into its applications, it often remains unfathomable. In many instances, AI is a black box into which one can input something, which then (usually) produces something useful. Spotify playlists, Siri and Alexa voice commands, and self-driving Teslas are all examples that work surprisingly well. However, AI can also be the facial recognition software that does not recognise a Black woman as a person, or the banking algorithm that keeps refusing loans to the same people without anyone knowing why. Even for their developers, it is often impossible to determine why the AI makes a particular decision – a decision made in one of the hundreds of invisible layers of a trained neural network.
We look at AI with a mixture of awe and fear. How beautiful it is, as long as it does not turn against us! Without any prospect of understanding, AI will remain, in part, a mystery.
The heritage sector also makes extensive use of AI, such as for the automation of transcription, translation, annotation and classification. In general, these developments take place at the back-end, assisting the archivist and curator. But what role can AI have at the front-end, between the archive and the public? Moreover, how can AI provide a new public experience that is transparent, demystifies itself, and assumes a balanced relationship between artificial and human intelligence?
This study considers what the archive of the future might look like. How can AI contribute to the accessibility of a digital archive while offering, in a broad sense, new perspectives on how we interact with and experience digital archives?
This study applies existing AI techniques, such as visual object and pattern recognition, text analysis and classification, to the archives of Tegenlicht and the National Collection for Dutch Architecture and Urban Planning at Het Nieuwe Instituut. It focuses on two aspects:
- How does AI contribute to new forms of public experience of the archive?
- How can AI’s role be made visible or represented?
The first step of the study investigates object recognition. Using a pre-trained model (COCO), objects in 80 different categories can be recognised. Categories from the COCO dataset include, for example, person, car, train, spoon and giraffe. The study does this using YOLO v3 (You Only Look Once), a popular neural network dedicated to object recognition (https://pjreddie.com/darknet/yolo/). The algorithm looks at all episodes of Tegenlicht from 2018 and draws a frame around each object it recognises. An average Tegenlicht broadcast alternates between talking heads and cinematographic scenes. The algorithm can detect and label people and general objects.
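The step from raw detections to labelled frames can be sketched as follows. This is a minimal illustration, not the study’s actual pipeline: the detections, coordinates and confidence scores below are invented, and only a handful of the 80 COCO category labels are listed.

```python
# Sketch: turning YOLO-style raw detections into labelled boxes.
# All values below are illustrative dummy data, not real model output.

COCO_LABELS = {0: "person", 2: "car", 6: "train", 23: "giraffe", 44: "spoon"}

def label_detections(detections, threshold=0.5):
    """Keep detections above a confidence threshold and attach a label.

    Each detection is (x, y, w, h, confidence, class_id).
    Returns a list of (label, confidence, box) tuples.
    """
    results = []
    for x, y, w, h, conf, cls in detections:
        if conf >= threshold and cls in COCO_LABELS:
            results.append((COCO_LABELS[cls], conf, (x, y, w, h)))
    return results

# One video frame's worth of dummy detections:
frame = [
    (120, 80, 60, 150, 0.92, 0),   # a person, high confidence
    (300, 200, 90, 45, 0.61, 2),   # a car, medium confidence
    (10, 10, 30, 30, 0.20, 23),    # a "giraffe" the model is unsure about
]
print(label_detections(frame))
```

Only the person and the car survive the default threshold; the low-confidence giraffe is discarded, which foreshadows the role of the certainty percentage discussed further on.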
The limitations of the 80 categories that the model can recognise also become apparent. Although the classification is reasonably accurate, it is also very general. A follow-up test uses a more broadly trained model based on ImageNet 9K, which covers some 9,000 categories and can therefore make finer distinctions.
However, it becomes apparent that the model with more categories makes mistakes more quickly, such as misidentifying trees in a park as broccoli. Sometimes, classifications become too general. For example, the mountaineers in the image below are sometimes recognised as a “person” but also as a “living thing” or “organism”.
These results call into question how the neural network arrives at a classification. We can visualise this using class activation maps. A class activation map shows which “neurons” are activated when an image is recognised. On a colour scale from blue to red, it becomes visible which part of the image activates the network most strongly, with blue being the least active and red the most. The image below shows how the three rightmost figures activate the ImageNet 9K model the most, while the landscape itself scarcely activates it. This is because the model is trained on everyday objects, so people stand out more than snowy mountain landscapes.
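The idea behind a class activation map can be shown in a few lines: the final convolutional feature maps are weighted by the classification weights for one class and summed into a single heat map. The shapes and values here are toy stand-ins for a real network’s tensors.

```python
import numpy as np

# Minimal class activation map (CAM): weight the final convolutional
# feature maps by one class's classification weights, then sum them.
# The feature maps and weights below are random toy data.

def class_activation_map(feature_maps, class_weights):
    """feature_maps: (K, H, W) activations; class_weights: (K,) weights
    for the target class. Returns an (H, W) map scaled to [0, 1]."""
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # -> (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()   # 0 = least active (blue), 1 = most active (red)
    return cam

rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))   # 8 toy feature maps of 4x4
weights = rng.random(8)            # toy weights for, say, the "person" class
cam = class_activation_map(features, weights)
print(cam.shape)
```

Rendering `cam` through a blue-to-red colour map over the input image gives exactly the kind of visualisation described above.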
Each time the model recognises something, it assigns a certainty percentage. By setting the threshold for this percentage very high, the algorithm “sees” less; by setting it low, it sees more, but also makes more mistakes. These first experiments demonstrate that what an AI in the form of a neural network sees depends strongly on the data used to train the model. By using different models, an AI can “see” different things.
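The effect of the threshold is easy to demonstrate on a handful of invented certainty scores: the same detections yield more or fewer hits as the cutoff moves.

```python
# How the certainty threshold changes what the algorithm "sees":
# the same dummy scores yield more or fewer detections as the cutoff moves.
confidences = [0.95, 0.80, 0.55, 0.30, 0.10]  # illustrative scores

def count_detections(scores, threshold):
    return sum(1 for s in scores if s >= threshold)

for t in (0.9, 0.5, 0.2):
    print(f"threshold {t}: {count_detections(confidences, t)} detections")
```

At a threshold of 0.9 only one detection survives; at 0.2, four do, including the ones most likely to be mistakes.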
Training a model
Based on a selection of 100,000 images from the digital collection of Het Nieuwe Instituut's National Collection for Dutch Architecture and Urban Planning, the study explores how a neural network can be trained and a model created that can recognise patterns in the selection.
The images in the selection were originally sorted by the architects and mainly contain drawings, plus some photos. This experiment uses a selection of 11 architects, and the neural network is trained until it reaches a 14% margin of error. This means that of all the images the neural network sees, 14% are attributed to the wrong architect and 86% to the correct one. Although the drawings are often visually similar, the classification is surprisingly effective. By further cleaning up the training data, for example by deleting the photos, this margin of error can be reduced further.
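The error-margin bookkeeping itself is simple to illustrate. The sketch below trains a logistic-regression classifier on synthetic two-class data and reports the fraction of misclassified examples; it is a toy stand-in, with invented data and two classes rather than the study’s deep network and 11 architects.

```python
import numpy as np

# Toy stand-in for "train until the margin of error is acceptable":
# a logistic-regression classifier on synthetic two-class data.

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (200, 2)),   # class 0: e.g. "architect A"
               rng.normal(1, 1, (200, 2))])   # class 1: e.g. "architect B"
y = np.array([0] * 200 + [1] * 200)

w, b = np.zeros(2), 0.0
for _ in range(500):                          # gradient-descent epochs
    p = 1 / (1 + np.exp(-(X @ w + b)))        # predicted probabilities
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * (p - y).mean()

p = 1 / (1 + np.exp(-(X @ w + b)))
error_margin = float(((p >= 0.5).astype(int) != y).mean())
print(f"margin of error: {error_margin:.0%}")
```

The margin of error is simply the share of images attributed to the wrong class; a 14% margin means 86% are attributed correctly.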
A confusion matrix indicates which architects are wrongly classified. This matrix makes it clear that occasionally a drawing by Cuypers is mistaken for a Berlage, or a Dudok for a Blom. A model trained on photos is also expected to be able to recognise the work of an architect in photos and video.
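A confusion matrix is nothing more than a tally of (actual, predicted) pairs. The sketch below builds one from a handful of made-up predictions; the classifications are invented for illustration, not taken from the study.

```python
# Building a confusion matrix from predicted vs. actual architects.
# The predictions below are made up for illustration.
from collections import Counter

architects = ["Cuypers", "Berlage", "Dudok", "Blom"]
actual    = ["Cuypers", "Cuypers", "Berlage", "Dudok", "Dudok", "Blom"]
predicted = ["Cuypers", "Berlage", "Berlage", "Blom",  "Dudok", "Blom"]

matrix = Counter(zip(actual, predicted))  # (actual, predicted) -> count

for a in architects:
    row = [matrix[(a, p)] for p in architects]
    print(f"{a:>8}: {row}")
```

Off-diagonal counts are exactly the confusions described above: here one Cuypers drawing is mistaken for a Berlage and one Dudok for a Blom.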
Another application of image recognition and analysis focuses on organising the archive. For this purpose, the study makes use of the selection of 100,000 images, and the archive is organised in two steps. In step one, all images are classified (based on a general model) and formally analysed based on colour. Subsequently, an arrangement is made in which the most similar images are placed near one another using a t-SNE algorithm (t-distributed stochastic neighbour embedding) and placed in a grid. The translation from t-SNE to a grid is done with the help of RasterFairy by Mario Klingemann. The result is an abstract image consisting of small images that can be zoomed in on to see the individual image in its new context. Since the organisation is based on both image recognition and analysis, surprising combinations and connections can arise that would not be made based on metadata such as year, location or style.
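The embedding-to-grid step can be sketched with a greedy assignment: each 2-D point (standing in for one image’s t-SNE coordinate) is placed in the nearest free cell of a square grid, so that similar images end up as neighbours. This is a simplified stand-in for RasterFairy, not its actual algorithm, and the points are random toy data.

```python
import numpy as np

# Greedy sketch of the t-SNE-to-grid step: assign each 2-D point to the
# nearest still-unoccupied cell of a side x side grid.

def points_to_grid(points, side):
    """points: (N, 2) coordinates in [0, 1]; returns one grid-cell
    index per point into a side*side grid."""
    cells = np.array([(r, c) for r in range(side) for c in range(side)])
    centres = (cells + 0.5) / side            # centre of each grid cell
    free = np.ones(len(cells), dtype=bool)
    assignment = np.empty(len(points), dtype=int)
    for i, p in enumerate(points):
        d = np.linalg.norm(centres - p, axis=1)
        d[~free] = np.inf                      # only consider free cells
        assignment[i] = int(np.argmin(d))
        free[assignment[i]] = False
    return assignment

rng = np.random.default_rng(2)
pts = rng.random((9, 2))          # pretend t-SNE output for nine images
print(points_to_grid(pts, 3))     # one distinct cell per image
```

Each image gets exactly one cell, so the scattered embedding becomes the zoomable grid of thumbnails described above.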
Generative adversarial network
As a final experiment, the study uses the work of two architects, Piet Blom and Theo van Doesburg, to train a generative adversarial network. Two neural networks are played off against each other. One network tries to recognise the work of a particular architect as well as possible, and the other tries to produce, from scratch, an image that resembles the work of the architect in question. At first, this does not work; the images are random, and the recognition algorithm rejects them. After a while, however, it becomes better able to generate an image that, for example, can pass as a drawing by Van Doesburg, even though it is not. The outcome is visually interesting and raises all kinds of fundamental questions.
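The adversarial loop itself can be shown in miniature. In the toy sketch below, a one-parameter “generator” tries to match a target distribution (standing in for “the architect’s work”) while a logistic “discriminator” tries to tell real samples from generated ones; all numbers are invented, and the real study of course used image-generating networks, not 1-D data.

```python
import numpy as np

# The adversarial loop in miniature: generator vs. discriminator on
# toy 1-D data. "Real" samples come from N(4, 1); the generator learns
# a shift that makes its samples pass as real.

rng = np.random.default_rng(3)
g_shift = 0.0                      # generator: z -> z + g_shift
d_w, d_b = 0.1, 0.0                # discriminator: sigmoid(d_w * x + d_b)
lr = 0.05

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

for _ in range(2000):
    real = rng.normal(4.0, 1.0, 32)          # "the architect's work"
    fake = rng.normal(0.0, 1.0, 32) + g_shift  # generated samples
    # Discriminator step: push real towards 1, fake towards 0.
    pr, pf = sigmoid(d_w * real + d_b), sigmoid(d_w * fake + d_b)
    d_w += lr * ((1 - pr) * real - pf * fake).mean()
    d_b += lr * ((1 - pr) - pf).mean()
    # Generator step: move fakes so the discriminator scores them as real.
    pf = sigmoid(d_w * fake + d_b)
    g_shift += lr * ((1 - pf) * d_w).mean()

print(f"learned generator shift (target mean is 4.0): {g_shift:.1f}")
```

At first the generated samples are obviously fake and rejected; after enough rounds the generator’s output is statistically close to the real distribution, which is the same dynamic that eventually yields a plausible “Van Doesburg”.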
Who is the author of this image? Is it a new work by the algorithm or the architect, or is it a visual summary of his work?
The term artificial intelligence is called into question. Is it not better to talk about supplementary intelligence? Is artificial intelligence a false promise? What is its intelligence? Does it have a genuine understanding?
The techniques used in this study all rely on pattern recognition. With large amounts of information, a neural network can be trained in such a way that it becomes statistically ever more likely to recognise an image accurately.
Although this is more akin to a Pavlovian reaction than to intelligence, and is not a matter of understanding, it does lead to useful applications. Moreover, it offers ample scope for further research.
One example is the possibility of determining a building’s architect based on a photographic image.
To a computer scientist, it comes as no surprise that the training model determines what an AI can recognise. Nevertheless, it makes sense to make this visible and insightful in public applications. How does an algorithm see, and how is it trained? To what extent does it accommodate uncertainty when classifying? These are variables that can lead to very different outcomes, and that a user could be allowed to select. This would make an AI less of a black box and more of a tool, and would help demystify the relationship between man and machine. Visualising intermediate steps, such as class activation maps, could also be useful.
A generative adversarial network may seem to generate a new design by Van Doesburg, but what is the significance of this? Is it actually a new work? Who is the author? Who is the copyright holder? Again, it would be a misconception to attribute a creative power to the algorithm. It could be regarded as an attempt to make variations on the architect’s work by distilling an essence.
Artificial intelligence will play a role in how the public experiences the archive of the future. Whether this is in the form of an oracle or a tool is a question of design.
Neural networks are intrinsically stratified and diffuse. A user-friendly interface carries the risk of mystifying the technology and placing the user in a passive role. AI as a tool requires more effort on the part of the user, because it places them in a more active and intelligent role.