Love Lives Longer

"Don't Judge A Person By His Action, But By His Intention"

Location: Charlotte, North Carolina, United States

Wednesday, March 05, 2008

Challenges faced in Enterprise Search

White Paper

Vivek Anand
Retail & Consumer Goods

Confidentiality Statement

The information contained in this document represents the current view of TCS on the issues discussed as of the date of publication. Because TCS must respond to changing market conditions, it should not be interpreted to be a commitment on the part of TCS, and TCS cannot guarantee the accuracy of any information presented after the date of publication.
This white paper is for informational purposes only. TCS MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of the author.
TCS may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from TCS, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
© 2008 TCS. All rights reserved.

About the Author

I am Vivek Anand, currently working onsite on the Hallmark project in Kansas City, Missouri.
I am a team member on a project called Digital Asset Management Evolution (DAMe), where I am building an enterprise search interface for Hallmark’s digital asset management.
I joined TCS Kolkata in November 2005 and have worked on a couple of projects in the retail and consumer goods domain.


Executive Summary
Enterprise Search Challenges
What Is Driving Enterprise Search?
A Few Challenges You Face in Enterprise Search
The Keys to Successful Enterprise Search
Enterprise Search Solution Evaluation Guide

Executive Summary

In the business world, Search is no longer just about finding information. In fact, finding it is only a starting point. To be valuable to an organization, a search has to result in the ability to do something meaningful and profitable with the information you find. It has to be an integral part of a business productivity infrastructure.
Influenced by the consumer search experience on the Internet, the people in your organization have clear and demanding expectations about the way Search will look and feel within a business environment — as well as high standards for the relevance of results served. As an IT professional, you’re aware of the importance of effective enterprise-wide Search capabilities, and you know what kind of Enterprise Search experience employees are looking for. But you may find it a challenge to deliver what’s required, because Enterprise Search and Internet Search are very different.
Unstructured information represents the vast majority of the data accessible to an enterprise. Exploiting this information requires systems for managing and extracting knowledge from large collections of unstructured data, and applications for discovering patterns and relationships.
In addition, navigating the marketplace for Enterprise Search solutions is a time-consuming and confusing task, with choices seemingly polarized between low-end, inexpensive offerings with basic features, and high-end, highly customizable, expensive solutions.
This white paper, written for Technical Decision Makers and people working in enterprise search irrespective of domain, takes a look at the drivers and challenges that define Enterprise Search and examines the key elements of a successful Enterprise Search solution. It demonstrates how a winning solution gives business users/workers access to widespread unstructured sources as well as structured and line-of-business (LOB) system data while respecting an organization’s varied security needs.

Enterprise Search Challenges

Influenced by the consumer search experience, and driven by a clear need to provide business users with timely, customized access to relevant business data, companies are looking for comprehensive search capabilities that span disparate information sources and integrate seamlessly with existing infrastructure. However, given the complexities of the enterprise environment, the challenge that many organizations are facing is how to ensure that an enterprise search matches user expectations, and how to make sense of the seemingly polarized choices available in the marketplace. The assets and resources of an organization are spread across the globe. Further, the intrinsically unstructured form of this information and these resources poses even greater challenges.

What Is Driving Enterprise Search?

Explosion of Information
Information is the foundation of today’s business, and we’re all aware that the volume of information we consume, as well as the data we generate, is growing rapidly, at a rate of about 50 percent per annum. The information explosion in the workplace has imposed new performance pressure on employees, who now work with an overwhelming amount of data and struggle to make sense of what they find. An IDC (International Data Corporation) report, produced in collaboration with EMC Corporation, quantifies this growth. Its key findings are:
The 2006 digital universe was 161 billion gigabytes (161 exabytes) in size.
IDC projects that the amount of information created and copied each year will grow sixfold from 2006 to 2010.
While nearly 70% of the digital universe will be generated by individuals by 2010, organizations will be responsible for the security, privacy, reliability and compliance of at least 85% of the information.
In 2006, 161 exabytes of digital information were created and copied, continuing an unprecedented period of information growth. This digital universe equals approximately three million times the information in all the books ever written – or the equivalent of 12 stacks of books, each extending more than 93 million miles from the earth to the sun. According to IDC, the amount of information created and copied in 2010 will surge more than six fold to 988 exabytes, a compound annual growth rate of 57%.
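The growth figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the two volumes and the two years given in the text; the variable names are illustrative, and the compound annual growth rate formula is the standard one rather than anything specific to the IDC report.

```python
# Figures quoted in the text (exabytes created and copied per year).
start_exabytes = 161.0   # 2006
end_exabytes = 988.0     # 2010 projection
years = 2010 - 2006

# Overall growth over the period, and the equivalent constant yearly rate.
growth_factor = end_exabytes / start_exabytes
cagr = growth_factor ** (1 / years) - 1

print(f"growth factor over {years} years: {growth_factor:.2f}x")
print(f"compound annual growth rate: {cagr:.1%}")
```

The result is a little over sixfold growth across the four years, which corresponds to roughly 57% per year, in line with the quoted figures.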
While nearly 70% of the digital universe will be generated by individuals by 2010, most of this content will be touched by an organization along the way – on a network, in a data center, at a hosting site, at a telephone or Internet switch, or in a backup system. Organizations – including businesses of all sizes, agencies, governments and associations – will be responsible for the security, privacy, reliability and compliance of at least 85% of the information.
“This ever-growing mass of information is putting a considerable strain on the IT infrastructures we have in place today,” said Mark Lewis, EMC Executive Vice President and Chief Development Officer. “This explosive growth will change the way organizations and IT professionals do their jobs, and the way we consumers use information. Given that 85% of the information created and copied will be the responsibility of organizations and businesses; we must take steps as an industry to ensure we develop flexible, reliable and secure information infrastructures to handle the deluge.”
“The incredible growth and sheer amount of the different types of information being generated from so many different places represents […],” said John Gantz, Chief Research Officer and Senior Vice President, IDC. “It represents an entire shift in how information has moved from analog form, where it was finite, to digital form, where it’s infinite. From a technology perspective, organizations will need to employ ever-more sophisticated techniques to transport, store, secure and replicate the additional information that is being generated every day.”
Other key findings:
Images – Images, captured by more than 1 billion devices in the world, from digital cameras and camera phones to medical scanners and security cameras, comprise the largest component of the digital universe.
Digital Cameras – The number of images captured on consumer digital still cameras in 2006 exceeded 150 billion worldwide, while the number of images captured on cell phones hit almost 100 billion. IDC is forecasting the capture of more than 500 billion images by 2010.
Camcorders – Camcorder usage should double in total minutes of use between now and 2010.
E-mail – The number of e-mail mailboxes has grown from 253 million in 1998 to nearly 1.6 billion in 2006. During the same period, the number of e-mails sent grew three times faster than the number of people e-mailing; in 2006 just the e-mail traffic from one person to another – i.e., excluding spam – accounted for 6 exabytes.
Instant Messaging – There will be 250 million IM accounts by 2010, including consumer accounts from which business IMs are sent.
Broadband – Today over 60% of Internet users have access to broadband circuits, either at home, at work or at school.
Internet – In 1996 there were only 48 million people routinely using the Internet. The World Wide Web was just two years old. By 2006, there were 1.1 billion users on the Internet. By 2010, IDC expects another 500 million users to come online.
Unstructured Data – Over 95% of the digital universe is unstructured data. In organizations, unstructured data accounts for more than 80% of all information.
Compliance and Security – Today, 20% of the digital universe is subject to compliance rules and standards and about 30% is potentially subject to security applications.
Classification – IDC estimates that today less than 10% of organizational information is “classified,” or ranked according to value. IDC expects the amount of classified data to grow by more than 50% a year.
Emerging Economies – These now account for 10% of the digital universe but will grow 30-40% faster than mature economies.

Demand to Make Search Real
In the business world, Search isn’t just about finding information in the data warehouse. To be valuable to an organization, a search has to result in the ability to do something meaningful and profitable with the information you find. Enterprise Search isn’t simply about investigating content; it’s all about applying the knowledge you gather and using it to benefit the business you’re driving. It’s about real people needing the right tools to help them get their jobs done.

The Consumer Search Experience
Influenced by the consumer search experience on the Internet, business users/workers have great expectations about how a search solution should look, feel and perform. As an IT professional, you’re aware of the importance of effective enterprise-wide Search capabilities, and you know what kind of experience employees are looking for. But you may find it challenging to deliver what’s required because Enterprise Search and Internet Search are very different. Although enterprise corpora are smaller, they lack the highly hyperlinked nature of the web, so some of the most successful techniques for the web, those based on link analysis, do not apply in the enterprise. This results in lower relevancy of retrieved documents. Another factor is that in the enterprise there are additional security, reliability, and performance issues that complicate the problem. A well-publicized example is the need to protect the privacy of individuals’ personal data.

Empowering People to Find Information & Expertise
In today’s highly competitive business environment, business users/workers must have access to the people and data within the enterprise that they need to make informed, timely, and impactful decisions. But that information must be relevant, to avoid overburdening a person with unnecessary and distracting data or, conversely, under-serving them with a lack of detail. It also needs to be well protected to ensure that information is visible only to authorized users. The head of customer relations will need a very different view of the same data than a customer care specialist, for example. It’s all about getting the right amount of information to the right person in the right format.
Particular information may be meaningful to one person and obscure to another. It all depends on the scope of their work; data outside that scope has no meaning to them. A sales executive responding to a Request for Proposal (RFP) might need to access information from her laptop, a corporate information site, and some web pages. A finance manager reviewing a budget would be more interested in data from finance systems, document repositories, team sites, as well as input from subject matter experts. An executive preparing a strategy briefing might concentrate his search on SAP, or another line-of-business (LOB) system.

Need to Increase Business Efficiencies
An estimate from IDC (International Data Corporation) reveals that searches that return irrelevant results or no results can cost an organization millions of dollars yearly. The expense of not finding the information needed costs an organization employing hundreds of knowledge/business workers about US$5.3 million per year as they search through vast amounts of structured, line-of-business (LOB) system, and unstructured data. Considering the stakes, companies simply cannot afford to sustain an inefficient Search solution.

A Few Challenges You Face in Enterprise Search

The Difficulty of Meeting User Search Expectations
Internet search has grown by leaps and bounds since the launch of fast web search engines such as Google, Yahoo, and Lycos. It has grown dramatically as a cultural phenomenon, as a business, and as an easy way to find information about any subject. Based on the success and ubiquity of Internet Search, an organization might reasonably assume that the same ranking ingredients or algorithms could be applied successfully to Enterprise Search. The assumption is partially correct. Many of the broad approaches, when properly tuned, do help with relevance in Enterprise Search. However, to adequately assess an Enterprise Search solution, it is important to be aware of the differences between the Internet and the Enterprise.

The Key Differences between Internet and Enterprise Search
There are three main differences between an Internet and an Enterprise Search: Link Structure, Cross-Site Hierarchy, and Security.
Link Structure
Although enterprise corpora are smaller in size and scope than the World Wide Web, they lack its highly hyperlinked nature, so some of the most successful techniques for the web, those based on link analysis, do not apply in the enterprise. This results in lower relevancy of retrieved documents. The hyperlink model on the Internet is rich, especially among popular sites, because web page authors tend to link content in order to locate their sites in relation to others. The rising popularity of blogging has quickly enriched this aggregate link structure, with new content rapidly linked to and commented on.
By contrast, the link structure in an enterprise tends to be far less dense, because people at work do not spend a lot of time creating hyperlinks to other content. In a business context, these links do not figure strongly in the successful use of the content. What link structure does exist in the enterprise tends to be more introspective and navigational than editorial in nature. Information owners might provide a table of contents or a list of related items, but do not often spend time writing descriptive metadata, attaching tags, precise taxonomies, and rich, hyperlinked annotations to trigger search algorithms.
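To make the point about link analysis concrete, here is a minimal sketch that uses PageRank as one well-known example of a link-analysis technique (the paper itself does not name a specific algorithm). The two toy graphs, their page names, and the damping value are all assumptions chosen for illustration: one graph is densely linked like the web, the other has no links at all, as an extreme stand-in for a sparsely linked intranet.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy graphs, for illustration only.
web_like = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
intranet_like = {"a": [], "b": [], "c": [], "d": []}

web_rank = pagerank(web_like)
intranet_rank = pagerank(intranet_like)

print("web-like:", {page: round(score, 3) for page, score in web_rank.items()})
print("intranet-like:", {page: round(score, 3) for page, score in intranet_rank.items()})
```

In the densely linked graph the scores separate the pages clearly, while in the link-free graph every page ends up with the same score, so the link signal contributes nothing to ranking. Real intranets sit somewhere between these two extremes.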

Cross-Site Hierarchy

[Figure: intranet hierarchy with the Enterprise Portal at the root, Department Portals beneath it, and Supplier/Vendor Portals at the third level]

Influenced by their organizational hierarchy, organizations will often set up intranets to be somewhat, if not entirely, hierarchical in nature. Take, for example, a retail organization: the enterprise portal is typically regarded as the root of the entire intranet, departmental portals are second-order sites, and supplier/vendor portals then fold underneath as third-order sites. Often this structure is highly planned and regulated such that sites of a given type (for example, an HR section) always fit into the hierarchy at a predefined level. Sometimes there are multiple roots, or authoritative sites, resulting in multiple hierarchies.
This is in sharp contrast to the Internet, where some popular portal sites could be considered roots, but certainly do not serve as top-level nodes in a strict and consistent cross-site navigational structure across the entire Internet.

Security, Reliability and Performance
In an enterprise there are additional security, reliability, and performance issues that complicate the problem. A well-publicized example is the need to protect private and personal information (such as credit card details and employee payroll data).
The vast majority of content on the Internet is accessible anonymously, so a user would not expect to find information that requires authentication. This means that Internet search engines do not have to trim out results that the user should not see. So, given the same query, every user gets the same results. In an enterprise, by contrast, results must be trimmed to what each user is authorized to see, so the same query can return different results for different users.
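Trimming results per user can be sketched in a few lines. This is a minimal illustration, not any product’s API: the `Document` class, the group names, and the `security_trim` function are all hypothetical, and the sketch assumes each document simply carries the set of groups allowed to read it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    allowed_groups: frozenset  # groups permitted to see this document


def security_trim(results, user_groups):
    """Keep only the documents that share at least one group with the user."""
    return [doc for doc in results if doc.allowed_groups & user_groups]


# Hypothetical raw hits for one query, before trimming.
hits = [
    Document("Holiday calendar", frozenset({"all-employees"})),
    Document("Payroll run, March", frozenset({"hr", "finance"})),
    Document("Supplier price list", frozenset({"procurement"})),
]

hr_view = security_trim(hits, {"all-employees", "hr"})
buyer_view = security_trim(hits, {"all-employees", "procurement"})

print([doc.title for doc in hr_view])
print([doc.title for doc in buyer_view])
```

The same query yields a different result list for each user, which is exactly what an anonymous Internet search never has to handle. In practice the permissions would come from the source systems rather than being stored on the document as they are here.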

Achieving Relevance When Faced With Varied Information Sources
In the context of Search, relevance refers to the usefulness of results in relation to an initial query. Relevance is key to effective Search: results are of no value if they are not useful to business users/workers. Suppose a user searches the TCS intranet portal for a Java expert in TCS India. The results would be of little use if the query returned Java experts from TCS offices all over the world. So relevancy is important.
Information lives in many varied places both within and outside of an organization. It also exists in different forms. Around 75 percent of the information we seek exists in semi-structured or unstructured formats such as document files, share sites, subscription services, and websites.
Take, for example, Hallmark Cards, a retail giant in the gifts and greeting cards segment. It has a lot of organizational data. The data can be media files, sound files, image files, or even copy sheets, invoices, and so on. Being a global organization, it has offices all over the world, and so does its data. The data is spread across the world in different databases, storage media, and file servers. While most enterprise employees have access to these types of information, it is often inefficiently dispersed, and searchers frequently have to drill through large amounts of irrelevant information to find what they’re looking for.
In contrast, when it comes to structured data sources and line-of-business (LOB) systems, users may have difficulty accessing enough information. While frequent users of LOB systems can afford to spend the time familiarizing themselves with specialized interfaces, casual users can’t justify the same kind of investment. Search in these areas is often difficult or impossible due to the complexity involved in accessing systems such as mainframe, CRM, SCM, and ERP applications. Consequently, business users/workers are unable to access relevant information and are deprived of the tools they need to achieve business results.

Which Solution Would Suit My Business Needs?
When choosing an Enterprise Search solution, your organization will require you to balance many factors, including cost, usability, scalability, and extensibility. However, you may be concerned about what the marketplace currently has to offer. Traditionally, the Search market has offered two types of solutions: low-end, inexpensive offerings with basic features, and high-end, costly technologies that are resource-intensive and time-intensive to implement and manage. As a result, you risk making the wrong decision and either outgrowing an entry-level solution or regretting the purchase of a costly, complex platform that under-delivers.

The Keys to Successful Enterprise Search

A good Enterprise Search solution will be as effective for the people who use it as it is for the people who are responsible for its administration and security, so it’s not surprising that there is no single component that defines successful Enterprise Search.
While relevancy is key, the user experience is also important, because an unfamiliar, complex, or inconvenient user interface will be a barrier to adoption. A good search engine is one that makes good use of the metadata attached to each asset/resource.
An effective Search solution will also provide efficient access to unstructured data and unlock information stored in line-of-business (LOB) systems. It will provide easy access to people and expertise—making it a one-stop-shop for finding all the organization has to offer to solve a problem. And it will meet the needs of IT professionals by providing a secure, manageable, scalable and extensible platform.
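As an illustration of what making use of metadata can look like, the sketch below filters a small set of assets by metadata values and counts how many assets fall under each value of a field, which is the basis of faceted navigation. The asset records, field names, and helper functions are hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical digital assets with attached metadata.
assets = [
    {"name": "birthday-card-01.png", "type": "image", "region": "US"},
    {"name": "birthday-jingle.mp3", "type": "sound", "region": "US"},
    {"name": "diwali-card-07.png", "type": "image", "region": "India"},
    {"name": "invoice-2291.pdf", "type": "document", "region": "India"},
]


def filter_by_metadata(items, **criteria):
    """Return the items whose metadata matches every given field=value pair."""
    return [
        item for item in items
        if all(item.get(field) == value for field, value in criteria.items())
    ]


def facet_counts(items, field):
    """Count how many items carry each value of one metadata field."""
    return Counter(item[field] for item in items if field in item)


images_in_india = filter_by_metadata(assets, type="image", region="India")
type_facets = facet_counts(assets, "type")

print([item["name"] for item in images_in_india])
print(dict(type_facets))
```

A search interface could show the counts next to each filter so that users can narrow results in one click. The quality of both operations depends entirely on how consistently the metadata was attached in the first place.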

Enterprise Search Solution Evaluation Guide

Relevance
· Relevance should be tuned for the organization, taking into account differences between Enterprise and Internet Search, such as link structure, hierarchy across sites, security, and the level of difficulty involved in finding documentation (document findability).
· Relevance should also take into account a rich and broad range of additional factors, such as click distance, URL text matching, metadata extraction, language detection, file type biasing, and text analysis.
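Of the factors listed above, click distance is easy to sketch. The example assumes click distance means the smallest number of clicks needed to reach a page from an authoritative root site, which is a common reading but is not defined in this paper. The site names and link map are hypothetical.

```python
from collections import deque


def click_distance(links, root):
    """Breadth-first search: fewest clicks from `root` to each reachable page."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical intranet: enterprise portal -> department portals -> deeper pages.
intranet = {
    "enterprise-portal": ["hr-portal", "finance-portal"],
    "hr-portal": ["leave-policy", "supplier-portal"],
    "finance-portal": ["budget-2008"],
    "supplier-portal": ["vendor-terms"],
}

distances = click_distance(intranet, "enterprise-portal")
for page, clicks in sorted(distances.items(), key=lambda item: item[1]):
    print(clicks, page)
```

A ranking function could then give a small boost to pages with a low click distance, on the assumption that content close to an authoritative site is more likely to be authoritative itself. Pages that cannot be reached from the root simply do not appear in the result.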

Reach
The reach of an enterprise search solution refers to the types of information it can search over and thus find for the user. Users have come to have certain expectations for search engines, and those include being able to find every single piece of content “out there”. Reach can be measured along three dimensions:

· The type of content it can provide results for. Content can be classified as structured or unstructured. Structured content is the content found in databases and line-of-business applications. Although different business applications provide some level of search functionality, the problem is that users must log in to the right business application to find that content, which seriously limits the usefulness of those search capabilities. Unstructured content refers to documents, spreadsheets, presentations and other files that don’t have a pre-defined or standard schema. Enterprise content, information and knowledge are increasingly located in unstructured content.

· The places it can reach the content in. While structured content primarily lives in databases, unstructured content can be found in file systems, web sites, file shares and other places. It’s important that the enterprise search solution you deploy can get to all the places where your enterprise content lives.

· The formats it can support. Format is more relevant for unstructured data. Regardless of where the content is located, it is important that the search engine is able to open and index it, independently of what program was used to create the file.
User Experience
· The interface between the user and the search function should be simple, intuitive, and familiar, to encourage and enhance the experience.
· Search should be available from the interfaces of frequently used applications.
Efficient Access to Unstructured Data
· Although people generally have access to unstructured data, the process of finding it is often inefficient, with files in multiple locations (for example, multiple file shares containing duplicate copies and different versions of documents). Look for a solution that provides a clear, direct, and quick path to relevant information.
Access to Structured Data and LOB Systems
· Many organizations lock down much of their structured data, for fear of unauthorized users seeing more than they should — resulting in users being deprived of information that could be useful to them. Find a solution that helps secure and protect information where necessary, allowing appropriate access to structured data and LOB systems such as Siebel, SAP, CRM, and enterprise resource planning (ERP).
· Where users do have unobstructed access to structured data, differences in the search interface, syntax, and query methods can result in challenges both during the search for information and when interpreting results. Look for a solution that provides a common Search framework, regardless of the information source, and make sure that the interface allows casual users to have easy access to complex data sets.
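A common Search framework can be pictured as one small interface that every information source implements, so that the user issues a single query regardless of where the data lives. The sketch below is an assumption-laden illustration: the `SearchSource` protocol, the two toy sources, and the sample data are all hypothetical and do not correspond to any real product or connector.

```python
from typing import Protocol


class SearchSource(Protocol):
    """The single interface every source is assumed to expose."""
    name: str

    def search(self, query: str) -> list[str]: ...


class FileShareSource:
    """Unstructured source: matches the query against file names."""
    name = "file-share"

    def __init__(self, filenames):
        self._filenames = filenames

    def search(self, query):
        return [f for f in self._filenames if query.lower() in f.lower()]


class CustomerTableSource:
    """Structured source: matches the query against a customer column."""
    name = "crm"

    def __init__(self, rows):
        self._rows = rows

    def search(self, query):
        return [
            f"{row['customer']} ({row['city']})"
            for row in self._rows
            if query.lower() in row["customer"].lower()
        ]


def search_all(sources, query):
    """Send one query to every source and label each hit with its origin."""
    return [(source.name, hit) for source in sources for hit in source.search(query)]


sources = [
    FileShareSource(["Acme contract 2008.doc", "team-photo.jpg"]),
    CustomerTableSource([{"customer": "Acme Stores", "city": "Kansas City"}]),
]

results = search_all(sources, "acme")
for origin, hit in results:
    print(origin, "->", hit)
```

The casual user never sees the query syntax of the underlying system; each source translates the shared query into its own terms. A real deployment would also need ranking across sources and per-user security trimming, both of which are left out here.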
Security, Management and Scalability
· Look for solutions that provide custom security trimming, as well as standard features to help protect corporate information from unauthorized access. Find out how granular the administrative controls are and check for customizable interfaces, scalability, and extensibility.


The debate over Enterprise Search versus Web Search will go on, but one must understand the pros and cons of both. Enterprise search cannot be as extensive as web search, but it can incorporate many features of web search and add some extra features, such as faceted search and filtering of search results based on the metadata that matches the search query.

Simply put, the marketplace is not monolithic in its requirements. The diversity of demands on search technologies has been a disincentive for vendors to focus on distinct niches, leading them to place more effort on areas like e-commerce. This seems to be shifting, especially with all the large software companies now seriously announcing products in the enterprise search market.

