Dear friends in the information access community,
I am reaching out to you with this open letter because I believe we, the leading providers and analysts in the information access community, share a common goal of helping companies understand, evaluate, and differentiate the technologies in this space.
Frankly, I feel that we as a community can do much better at achieving this goal. In my experience talking with CTOs, CIOs, and other decision makers in enterprises, I've found that too many people fail to understand either the state of current technology or the processes they need to put in place to leverage that technology. Indeed, a recent AIIM report confirms what I already knew anecdotally--that there is a widespread failure in the enterprise to understand and derive value from information access.
In order to advance the state of knowledge, I propose that we engage an underutilized resource: the scholarly community of information retrieval and information science researchers. Not only has this community brought us many of the foundations of the technology we provide, but it has also developed a rigorous tradition of evaluation and peer review.
In addition, this community has been increasingly interested in connecting with practitioners, as demonstrated by the industry days held at top-tier scholarly conferences, such as SIGIR, CIKM, and ECIR. I have participated in a few of these, and I was impressed with the quality of both the presenters and the attendees. Web search leaders, such as Google, Yahoo, and Microsoft, have embraced these events, as have smaller companies that specialize in search and related technologies, such as information extraction. Enterprise information access providers, however, have been largely absent from these events, as have industry analysts.
I suggest that we take at least the following steps to engage the scholarly community of information retrieval and information science researchers:
- Collaborate with the organizers of academic conferences such as SIGIR, CIKM, and ECIR to promote participation of enterprise information access providers and analysts in conference industry days.
- Participate in workshops that are particularly relevant to enterprise information access providers, such as the annual HCIR and exploratory search workshops.
The rigor and independence of the conferences and workshops make them ideal as vendor-neutral forums. I hope that you all will join me in working to strengthen the connection between the commercial and scholarly communities, thus furthering everyone's understanding of the technology that drives our community forward.
Please contact me at dt@endeca.com or join in an open discussion at http://thenoisychannel.blogspot.com/2008/07/call-to-action.html if you are interested in participating in this effort.
Sincerely,
Daniel Tunkelang
The general goal of the workshop will be to coalesce a research agenda that stimulates progress toward better systems that support information seeking. More specifically, the workshop will aim to identify the most promising research directions for three aspects of information seeking: theory, development, and evaluation.
We are still working on writing up a report that summarizes the workshop's findings, so I don't want to steal its thunder. But what I can say is that participants shared a common goal of identifying driving problems and solution frameworks that would rally information seeking researchers much the way that TREC has rallied the information retrieval community.
We need to raise the status of evaluation procedures where recall trumps precision as a success metric. Specifically, we need to consider scenarios where the information being sought is existential in nature, i.e., the information seeker wants to know if an information object exists. In such cases, the measures should combine correctness of the outcome, user confidence in the outcome, and efficiency.
I'll let folks know as more information is released from the workshop.
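To make the precision-versus-recall distinction in that passage concrete, here is a minimal sketch of the two standard set-based measures. The document ids and the two result lists are hypothetical, chosen only for illustration, and the sketch does not attempt the combined measure of correctness, confidence, and efficiency that the workshop finding calls for.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: what fraction of the returned documents are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgments and result lists for a single query.
relevant = {"d1", "d2", "d3", "d4"}
narrow = ["d1", "d2"]                                      # few results, all relevant
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]   # everything relevant, plus noise

narrow_pr = precision_recall(narrow, relevant)
broad_pr = precision_recall(broad, relevant)
print("narrow:", narrow_pr)
print("broad: ", broad_pr)
```

In an existential scenario, the broad list is the more useful of the two despite its lower precision: a seeker who needs to know whether something exists can only trust a negative answer if recall is high.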
Exploratory search can be used to describe an information-seeking problem context that is open-ended, persistent, and multi-faceted; and to describe information-seeking processes that are opportunistic, iterative, and multi-tactical. In the first sense, exploratory search is commonly used in scientific discovery, learning, and decision making contexts. In the second sense, exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal (Marchionini, 2006).
If we accept this dichotomy, then the first sense of exploratory search is a niche use case, while the second sense characterizes almost everything we call search. Perhaps it is more useful to ask what is not exploratory search.
But fault does not lie with technology solution providers. Most organizations have failed to take a strategic approach to enterprise search. 49% of respondents have "No Formal Goal" for enterprise Findability within their organizations, and a large subset of the overall research population state that when it comes to the "Criticality of Findability to their Organization’s Business Goals and Success", 38% have no idea ("Don’t Know") what the importance of Findability is in comparison to a mere 10% who claim Findability is "Imperative" to their organization.
As I've blogged here before, there is no free lunch, and organizations can't expect to simply plug a search engine into their architectures as if it were an air freshener. But that doesn't let Endeca or anyone else off the hook. It is incumbent on enterprise search providers, including Endeca, both to set expectations around how it is incumbent on enterprise workers to help shape the solution by supplying their proprietary knowledge and information needs, and to make this process as painless as possible.
What is the difference? I think it's easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, as pictured below, so most of the area of the map is off-road. Hence, if you're dropped somewhere at random, you're really in the middle of nowhere. Before you start trying to find nearby towns and attractions, your first task is to find a road.
How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.
"Did you mean..." is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I'm glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.
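For readers who think better in code, here is a rough sketch of the two steps on a toy catalog. Everything in it is an assumption made for illustration: the catalog, the field names, and the helper functions are hypothetical, and the fuzzy matching from Python's standard `difflib` module stands in for whatever a real engine does behind "Did you mean...".

```python
import difflib
from collections import Counter

# Hypothetical toy catalog; the records and field names are illustrative only.
CATALOG = [
    {"title": "digital camera", "brand": "Canon", "price": "under $200"},
    {"title": "digital camera", "brand": "Nikon", "price": "$200 and up"},
    {"title": "camera bag", "brand": "Canon", "price": "under $200"},
]
VOCABULARY = sorted({word for item in CATALOG for word in item["title"].split()})


def clarify(term):
    """Clarification ("Did you mean..."): suggest known terms close to the typed one."""
    return difflib.get_close_matches(term, VOCABULARY, n=3, cutoff=0.6)


def search(term):
    """Return the items whose title contains the term as a whole word."""
    return [item for item in CATALOG if term in item["title"].split()]


def facet_counts(results, facet):
    """Refinement, step 1: show how the current results break down along a facet."""
    return Counter(item[facet] for item in results)


def refine(results, facet, value):
    """Refinement, step 2: narrow the current results to one facet value."""
    return [item for item in results if item[facet] == value]


suggestions = clarify("camra")          # the typo matches nothing, so find the road first
results = search(suggestions[0])        # now the user and the system are on the same page
brands = facet_counts(results, "brand")
canon_only = refine(results, "brand", "Canon")
print(suggestions, dict(brands), len(canon_only))
```

Note the ordering: `refine` has nothing to work with until `clarify` has turned the misspelled query into one that actually lands on the content.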
Last night, I had the privilege of speaking to fellow CMU School of Computer Science alumni at Fidelity's Center for Advanced Technology in Boston. Dean Randy Bryant, Associate Director of Corporate Relations Dan Jenkins, and Director of Alumni Relations Tina Carr, organized the event, and they encouraged me to pick a provocative subject.
Thus encouraged, I decided to ask the question: Is Search Broken?
Slides are here as a PowerPoint show for anyone interested, or use the embedded SlideShare show below.
A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose.
Here is a brief explanation from the site:
When you play a game at Gwap, you aren't just having fun. You're helping the world become a better place. By playing our games, you're training computers to solve problems for humans all over the world.
Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.
I've been interested in Von Ahn's work for several years, and most particularly in Phetch, a game that never quite made it out of beta but that strikes me as one of the most ambitious examples of "human computation". Here is a description from the Phetch site:
Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt -- you must find or help find an image from the Web.
One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.
If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.
A few important details that this description leaves out:
Now, let's unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.
A full analysis of this game is beyond the scope of a single blog post, but let's look at the game from the Seeker's perspective, holding our assumption that there is only one Seeker, and adding the additional assumption that the Describer's input is static and supplied before the Seeker starts trying to find the image.
Assuming these simplifications, here is how a Seeker plays Phetch:
The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR '07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin's grand challenge.