I recently posted some thoughts on controlled vocabularies and whether there is such a thing as ‘industry standard’ at the Controlled Vocabulary forum. Here’s a modified excerpt of that text (please excuse if the paragraphs seem disjointed):
As someone who prepares data (including keywording and captioning) for Getty, Corbis and Alamy as well as small specialist libraries on a daily basis, I recognise that there isn't a single industry keywording standard. I can only wish for one, as it would make my job a bit easier. Yet Getty, Corbis and some other generic stock libraries are in many ways similar and, on the whole, the apparent differences are, as has been said in the Keywording Central blog, “stylistic rather than substantive”. Both agencies typically use terms such as Teenager, Mid Adult and Young Adult for age; Teenage Girl, Mid Adult Man and Young Woman for gender; and Caucasian and Asian for ethnicity. This would be a good starting point for standardisation.
As I said in my previous post, I believe it is possible to define good keywording, and I would go even further by adding that it is necessary to define good keywording if image searching in the stock industry is to be meaningful (note: I regard social tagging as a different kettle of fish, which is why I limit the discussion to stock). Keywords are crucial for accessing images (at the moment content-based retrieval cannot achieve the same level of accuracy and consistency as text-based retrieval), and overarching rules on how to make visual content accessible by text apply. To summarise my previous post, good keywording is a) consistent (terms are used consistently and in the same way across the collection), b) relevant (it meets the needs and search methods of the searchers) and finally c) diverse (it caters for as many needs as possible). How exactly these principles are applied varies from agency to agency but the overall criterion is the same.
Some argue that, regardless of its advantages, a controlled vocabulary runs into problems because language is a moving target, with new words entering the English vocabulary and old terms becoming obsolete. To an extent this is true, but I don’t think this is where the limitations of a controlled vocabulary lie. In practice, keeping up with changing language isn’t a problem - I tweak my vocabulary maybe once a month and it’s never more than a couple of hours’ work. I don’t think the sheer size of the English language is a huge problem either, as most of us only use a fraction of the entire English vocabulary (even Shakespeare, who is regarded as having an unusually large vocabulary, covered only a small part of the entire language) and our everyday language, which is one criterion for choosing search terms, is fairly limited. Further, the key aim of a controlled vocabulary is precisely to deal with this vastness of language - a well-constructed keyword tree will bring consistency to the way in which words are used whilst allowing people to find images regardless of the term they are searching for (and contrary to some claims, choosing one preferred term over another isn’t arbitrary, as this depends on the target audience and how terms are used and understood in a given field). For me, the real limitations of a controlled vocabulary lie in the differences between text and visuals, which is another topic altogether.
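To make the idea of preferred terms concrete, here is a minimal sketch in Python of how a vocabulary might map whatever a searcher types onto the single term used for keywording. It is an illustration only and not how Getty, Corbis or any other agency actually implements its search: the thesaurus entries, the image records and the function names `normalise` and `search` are all hypothetical.

```python
# Hypothetical mini-thesaurus: each preferred term lists the variants it covers.
THESAURUS = {
    "Human Hand": ["hand", "hands"],
    "Full Suit": ["suit", "business suit"],
    "Teenager": ["teen", "adolescent"],
}

# Invert the thesaurus so that any variant, or the preferred term itself,
# resolves to the preferred term. Matching is case-insensitive.
LOOKUP = {}
for preferred, variants in THESAURUS.items():
    LOOKUP[preferred.lower()] = preferred
    for variant in variants:
        LOOKUP[variant.lower()] = preferred


def normalise(term):
    """Return the preferred term for a search word, or None if it is unknown."""
    return LOOKUP.get(term.strip().lower())


# Hypothetical image records, keyworded with preferred terms only.
IMAGES = {
    "img_001": {"Human Hand", "Teenager"},
    "img_002": {"Full Suit"},
}


def search(term):
    """Return the sorted ids of images whose keywords include the preferred term."""
    preferred = normalise(term)
    if preferred is None:
        return []
    return sorted(
        image_id for image_id, keywords in IMAGES.items() if preferred in keywords
    )
```

With this in place, `search("hand")` and `search("Human Hand")` return the same images, which is the consistency a keyword tree is meant to provide. A real vocabulary would also need hierarchy (broader and narrower terms) and a way of handling ambiguous words, which this sketch leaves out.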
There has been a trend to imitate the major industry players’ vocabulary, with the result that peculiar terms such as Human Hand and Full Suit are being used. You may think that this is counter-intuitive and unhelpful, as it is not how people normally talk and therefore not how people would search for images. However, the use of such terms arises from the need to display words as unambiguously as possible. This doesn’t mean that the searching experience is jeopardised. People can still use commonly used terms such as Hand and Suit and find the images they are after. Even in a search engine that doesn’t employ a thesaurus, I think the use of neologisms together with ‘common’ words can be justified as it adds more search value, which is surely desirable.
Going back to my earlier point, and to echo Sarah Saunders, there is no one solution that fits all, but there are common-sense principles that can dramatically improve image findability.