How do people organize their documents? For example, some people keep all their email in the Inbox and search by date or by correspondent to find the message they need. Others build elaborate folder hierarchies and file their email into them by hand. New email systems such as Google's Gmail advertise their powerful search functions, stating that "you never have to file emails in folders to be able to find them again". This is a significant advantage, because as a folder hierarchy grows, it becomes increasingly hard to find the right folder in which to file an item. Hierarchies have other limitations as well. For example, a file cannot be placed in two folders at once, although one sometimes wishes it could be, because the file relates to two semantic categories. An alternative to hierarchical (e.g. folder-based) organization is a flat, keyword-based organization. More recently there has been much interest in "tagging", for example of web pages and digital photos. However, users are generally reluctant to think up appropriate keywords to characterize a document. Moreover, the same word may have many meanings, and the same meaning may be expressed with different words. The vocabulary used in one community may also differ from that used in another. If we want to share documents, and to find documents shared by others, we need to anticipate the words that others are likely to use when searching for a document with a given meaning.

The AI community has been doing research on ontologies for the last 15 years, and this work has recently flourished under the banner of the "Semantic Web", with XML and RDF tagging. However, we believe that forcing everyone to agree on the same ontology is a futile goal. Of course, there need not be just one ontology: bridges across ontologies can be built through thesauri such as WordNet, and search engines are already able to find documents across communities that use different ontologies. So technically, the problem is close to being solved. From a human standpoint, however, it is unsolvable. Requiring adherence to a standard ontology places a heavy burden on the author or owner of a document, who has to annotate it correctly. The result will be that people simply will not annotate their documents, or will use their own terms, which only they understand and can use.

Read the most recent work on this topic from Google.