Monday, May 25, 2009 / Oscar Bernal


Implementing Search Suggest with Apache Solr (Part 1)

One of Oshyn’s latest successful projects, http://disneydvd.com, relies heavily on its “search suggest” feature to provide user-friendly search across all the titles available on the site. There are a great many movies and TV shows to search on a site like this, so it is imperative that users can find what they are looking for as quickly as possible and with the least trouble.

With these requirements in mind, Oshyn turned to Apache Solr, an enterprise search server. In short, Solr is a Lucene-based web application that can be deployed to any Java servlet container. Solr lets you query the index over HTTP and get results back as XML or JSON, which is ideal for a search suggest feature where the browser makes an AJAX call as the user types.
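As a rough illustration of that HTTP API, the Python sketch below builds the kind of request URL a suggest box might send to Solr. The host, port and `/solr/select` path are assumptions (they reflect a default Solr 1.x example setup), and `title` is a hypothetical schema field; `q`, `wt`, `rows` and `fl` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed Solr location (default example setup); adjust for your deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def suggest_url(lucene_query, rows=10):
    """Return a Solr select URL for a suggest request (sketch)."""
    params = {
        "q": lucene_query,  # query written in Lucene query syntax
        "wt": "json",       # ask for a JSON response instead of the default XML
        "rows": rows,       # keep the suggestion list short
        "fl": "title",      # hypothetical field: return only the title
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(suggest_url('title:"pirates of the"'))
# http://localhost:8983/solr/select?q=title%3A%22pirates+of+the%22&wt=json&rows=10&fl=title
```

In the browser, the same URL would be requested through AJAX on each keystroke, and the JSON response rendered as the suggestion list.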

Configuring Solr is not difficult, and it is very well documented. A feature such as search suggest is a different story, however: there are many variables to take into account to get a successful “search suggest” working on your site. In this blog post I will identify the challenges Oshyn had to overcome when implementing this feature with Solr and demonstrate the solutions we used to build a solid, “Google-like” search suggest.

1) Dealing with the lack of support for wildcard searches in phrase queries:

A “phrase”, as defined by the Lucene query parser documentation (http://lucene.apache.org/java/2_3_2/queryparsersyntax.html), is a group of words surrounded by double quotes, such as "hello dolly". With Lucene you can search for phrases in your index by building a query such as title:"pirates of the", where “title” is the name of the field you want to search. This is very useful for finding results that contain a partial phrase, but it is not nearly enough for “search suggest”, and I am going to explain why.
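A minimal Python sketch of building such a phrase query string for a given field follows. It assumes the classic Lucene query parser, in which only the double quote and the backslash need escaping inside a quoted phrase; `phrase_query` is a hypothetical helper, not part of Lucene or Solr.

```python
def phrase_query(field, text):
    """Build a Lucene phrase query such as title:"pirates of the"."""
    # Inside double quotes, only backslash and the quote character itself
    # need escaping (backslash first, so the added escapes are not doubled).
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(phrase_query("title", "pirates of the"))  # title:"pirates of the"
```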

With search suggest, if your users type “Pirates of the” into your search box, you want them to get the suggestion “Pirates of the Caribbean”. Lucene’s phrase search works for this example, but what about when they type “Pirates of th”, or “Pirat”, or… you can see where this is going. Phrase search will not return the desired results for these inputs, because they are not phrases contained in your index. Remember that a phrase in Lucene is a group of whole WORDS (space-separated terms), so a partial word such as “th” does not match “the”.
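To make the problem concrete, here is a toy Python model of phrase matching. It is not Lucene: it assumes a trivial analyzer that lowercases text and splits on whitespace, and `tokenize` and `phrase_matches` are hypothetical names. It shows that a phrase matches only when each of its words appears as a whole token, in order.

```python
def tokenize(text):
    # Toy analyzer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def phrase_matches(document, phrase):
    """True if the phrase's tokens occur as consecutive whole tokens in the document."""
    doc, ph = tokenize(document), tokenize(phrase)
    n = len(ph)
    return n > 0 and any(doc[i:i + n] == ph for i in range(len(doc) - n + 1))


title = "Pirates of the Caribbean"
for query in ["Pirates of the", "Pirates of th", "Pirat"]:
    print(f"{query!r}: {phrase_matches(title, query)}")
# 'Pirates of the': True
# 'Pirates of th': False
# 'Pirat': False
```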

If you want your users to get the right results, you need to work around this.

Oshyn implemented a query builder that turns a user’s input into a Lucene query that returns the desired results. How did we do this?

Lucene supports wildcard searches: you can enter a partial word followed by a wildcard, and Lucene will return all results containing a term that starts with your partial word followed by zero or more characters. For example, searching for “Pira*” will return “Pirates”, “Pirate”, “Piranha”, etc. We are getting close to what we want, but not close enough yet. Lucene does NOT support wildcards inside phrase queries, which means that in a search for "Pirates of th*" the * is not interpreted as a wildcard.
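Here is a similarly simplified Python model of a trailing-wildcard query: it expands a pattern such as `Pira*` against a hypothetical set of indexed terms, which is roughly what a prefix wildcard does against the term dictionary. The sample terms are made up, `expand_wildcard` is a hypothetical helper, and lowercasing the pattern is an assumption that matches the lowercase toy index.

```python
def expand_wildcard(pattern, indexed_terms):
    """Return the indexed terms matched by a pattern with one trailing '*'."""
    if not pattern.endswith("*") or "*" in pattern[:-1]:
        raise ValueError("this sketch only handles a single trailing '*'")
    prefix = pattern[:-1].lower()  # the toy index stores lowercase terms
    # '*' stands for zero or more characters, so the bare prefix also matches.
    return sorted(term for term in indexed_terms if term.startswith(prefix))


terms = {"pirate", "pirates", "piranha", "pixar", "the", "caribbean"}
print(expand_wildcard("Pira*", terms))  # ['piranha', 'pirate', 'pirates']
```

Note that the expansion works on individual terms, which is why it cannot simply be dropped inside a quoted phrase.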

It turns out that a combination of phrase and wildcard queries is exactly what you need to provide a fully functional search suggest.
Enough introduction; let’s get to the action. How does this look to you?

User input: Pirates of th
Oshyn’s Query Builder Output: (title:Pirates AND title:of AND (title:th* OR title:th)) OR title:"pirates of th"

The query builder parses the user’s input and builds a query that simulates a wildcard phrase query. It requires every word the user entered to match, and adds a wildcard (*) to the last word so that a partially typed word still matches. It also searches for the whole input as a phrase query, in case the exact phrase is found in the index. This should work!
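The behaviour just described can be sketched in Python as follows. This is not Oshyn’s actual query builder, whose code is not shown in the post: the function names are hypothetical, the escaping of Lucene special characters is an addition the post does not describe, and the phrase clause is lowercased only so that the output reproduces the example above.

```python
# Characters with special meaning in the classic Lucene query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_term(word):
    """Backslash-escape Lucene special characters in a single unquoted term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in word)


def escape_phrase(text):
    """Inside a quoted phrase only backslash and double quote need escaping."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_suggest_query(user_input, field="title"):
    words = user_input.split()
    if not words:
        return ""
    *head, last = [escape_term(w) for w in words]
    # Every complete word must match as a whole term.
    clauses = [f"{field}:{w}" for w in head]
    # The last, possibly unfinished word matches as a prefix or exactly.
    clauses.append(f"({field}:{last}* OR {field}:{last})")
    term_part = " AND ".join(clauses)
    # Alternative: the whole input as an exact phrase (lowercased to mirror the example).
    phrase = escape_phrase(" ".join(words).lower())
    return f'({term_part}) OR {field}:"{phrase}"'


print(build_suggest_query("Pirates of th"))
# (title:Pirates AND title:of AND (title:th* OR title:th)) OR title:"pirates of th"
```

Words such as “of” are passed through unchanged here; Part 2 explains why stopwords like this can keep the query from working as expected.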

Part 2 of this entry will show you why such a query might not work for you right away. The secret is “stopwords”, so if you have liked this so far, please keep reading in Part 2, and don't forget to leave a comment!