City University London research has contributed to search functions that deliver relevant results quickly - saving time and money.
Professor Stephen Robertson, Professor of Information Science, who has been with the University since 1978, developed the theory. Research Fellow Stephen Walker, who was at City from 1988 to 1998, put it into practice in software.
The pair used probability theory to rank information according to the user's information need, in a function known as BM25 (BM standing for Best Match, 25 being the function number in software). The breakthrough came at the 1994 Text Retrieval Conference, showing that BM25 could significantly improve search results compared to other search ranking models.
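To make the idea of a probabilistic ranking function concrete, the sketch below scores a few toy documents against a query. It is a minimal illustration, not the original City software: it assumes the commonly cited form of BM25, the conventional default parameters `k1=1.2` and `b=0.75`, and a smoothed inverse document frequency that stays non-negative. The function name `bm25_scores` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against the `query` terms.

    k1 controls term-frequency saturation and b controls document-length
    normalisation; both defaults are conventional values, assumed here.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Longer-than-average documents are penalised, shorter ones boosted.
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            n = doc_freq[term]
            # Smoothed IDF variant (an assumption) that never goes negative.
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Repeated occurrences help, but with diminishing returns.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "probabilistic ranking for text retrieval".split(),
    "web search engine advertising".split(),
    "ranking ranking ranking and more ranking in search".split(),
]
scores = bm25_scores(["ranking", "retrieval"], docs)
# Document indices ordered from best match to worst.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

In this toy collection the first document, which contains both query terms, outranks the third, which repeats one term many times. That reflects two properties of the function: rarer terms carry more weight, and repeating a term yields diminishing returns.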
Further work was carried out with Microsoft Research Cambridge, UK, to refine the ranking function, and research continues at the Department of Computer Science with researchers including Dr Andrew MacFarlane, on the optimization of filtering queries.
While commercial search engine companies do not disclose the inner workings of their search algorithms, they followed the development of BM25 closely, as attested by their attendance at the presentation of the research, and it is likely that they have at least been influenced by it.
Microsoft uses a form of this ranking function in its Bing search engine, which was first introduced in 2005 and is now the second largest web search engine after Google.
As well as powering web searches, this technology enables the advertising activity linked to these searches, which through Bing and other large search engines comprises an $8.4 billion market worldwide. With better ranking and more relevant documents promoting greater user acceptance and therefore greater advertising revenue, the City BM25 model has helped to deliver widespread commercial success.
The model has been implemented successfully by small businesses to provide better search services for their clients. Muscat Limited used the probabilistic algorithms to develop internet and intranet search systems in the 1990s; that software still powers the Yell search engine in 2014. Consumers accessing the BBC, Nokia, NASA and hundreds of other global websites used the Muscat probabilistic ranking search engine on a daily basis to access the depths of the internet using simple and forgiving query constructs. Today the co-founders of Muscat use similar algorithmic approaches to power behind-the-scenes advertising optimization, using the words on a page to query for 'best match' advertising in the time it takes a consumer to load the web page.
BM25 has also been adopted by widely-used open source software, including Apache Lucene, Xapian and Greenstone. Software from the Apache project was used on nearly two thirds of all websites as of September 2013.
Flax, an enterprise search engine built on the Xapian system, is used by companies including the Government Digital Service, Newspaper Licensing Agency, TMV Marine, C Spence Ltd, Australian Associated Press, Reed Specialist Recruitment, Financial Times, Durrants and MyDeco to provide their search systems.
The Greenstone system, partly developed at City, University of London, is an open source digital library system. Its users include various United Nations organisations, among them the UN Educational, Scientific and Cultural Organization (UNESCO), which encourages its use by client organisations.
Web users around the world may take it for granted that they can get quick and useful search results for whatever query they care to type into a search engine. Pioneering research from City helped to drive this area as the Internet was first emerging as a public platform, and can still be seen at work in major publicly available search engines and within business applications.