Constructor Blog | Ecommerce Search Industry and Product Information

How To Turn A Document Search Engine Into A Product Search Engine

Written by Dan McCormick | Mar 25, 2015 7:00:00 AM

Quite often, people install an advanced, sophisticated search engine on their e-commerce site, and are heartbroken to discover that it returns awful results. Chances are that’s because it’s a document search engine, not a product search engine. In this article, I’ll explain what the difference is, and how to remedy the problem.

Document Search Engines

Most search engines are designed to search documents like web pages. The methods they use to determine which results to show are often simpler than you might think: usually they just count the number of words in a document that match the search query, and return the documents with the most matches.

Web pages are very different from the structured data most e-commerce sites use, and document search engines are tuned for web pages. For instance, many document search engines have put a lot of work into improving relevancy through strategies like TF/IDF, which stands for term frequency/inverse document frequency. This is a clever way of finding relevant search results in a big set of documents.

Term frequency is a measure of how many times a term (that is, a word in a document) is used. If you’re searching for a document about platypuses, chances are that a document that uses the word “platypus” 10 times is going to be more relevant than a document that only uses it once.

The problem with using term frequency alone as a relevance measure is that certain common words (like “is”) occur very frequently in almost every document. Inverse document frequency corrects for this by giving more weight to words that appear in fewer documents, which further improves relevance.
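To make the two halves of TF/IDF concrete, here is a minimal sketch in Python. It assumes a plain whitespace tokenizer and the textbook log(N / document count) form of inverse document frequency; real search engines use smoothed and normalized variants, and the sample documents are made up for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real engine does far more.
    return text.lower().split()

def tf_idf(term, doc_tokens, all_docs_tokens):
    # Term frequency: how often the term occurs in this document.
    tf = Counter(doc_tokens)[term]
    # Document frequency: how many documents contain the term at all.
    docs_with_term = sum(1 for tokens in all_docs_tokens if term in tokens)
    if docs_with_term == 0:
        return 0.0
    # Inverse document frequency: terms found in fewer documents weigh more.
    idf = math.log(len(all_docs_tokens) / docs_with_term)
    return tf * idf

# Hypothetical documents, invented for this example.
docs = [
    "the platypus is a mammal and the platypus lays eggs",
    "the zoo is open and the cafe is open",
    "the platypus exhibit is closed",
]
tokenized = [tokenize(d) for d in docs]

common_word = tf_idf("is", tokenized[0], tokenized)
rare_word_twice = tf_idf("platypus", tokenized[0], tokenized)
rare_word_once = tf_idf("platypus", tokenized[2], tokenized)
```

In this sketch "is" appears in every document, so its weight drops to zero, while the document that mentions "platypus" twice scores higher than the one that mentions it once.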

E-Commerce and Document Search Engines

While all this is useful for document search engines, it can be counterproductive for product search engines on e-commerce sites. Consider, for example, indexing these two product descriptions:

“This comfortable jacket fits well and is a perfect match for our line of shoes.”

“These shoes are easy to wear and feel great.”

When someone searches for “comfortable shoes”, the search engine notices that a) the word “comfortable” appears only in the first description and b) the word “shoes” appears once in each description, so on that word alone the two are equally relevant. Because it finds both search words in the first description, it returns the jacket first, which strikes us as woefully wrong.
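The ranking described above can be reproduced with a few lines of Python. This is only a sketch of naive word-match counting, not how any particular search engine is implemented; the function names are my own.

```python
import re

def tokenize(text):
    # Lowercase and keep only runs of letters, dropping punctuation.
    return re.findall(r"[a-z]+", text.lower())

def match_count(query, description):
    # Count how many times each query word occurs in the description.
    desc_tokens = tokenize(description)
    return sum(desc_tokens.count(term) for term in tokenize(query))

jacket = ("This comfortable jacket fits well and is a perfect match "
          "for our line of shoes.")
shoes = "These shoes are easy to wear and feel great."

jacket_score = match_count("comfortable shoes", jacket)
shoes_score = match_count("comfortable shoes", shoes)
```

The jacket matches two query words and the shoes match only one, so a purely word-counting engine ranks the jacket first.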

Fortunately, knowing how a document search engine works can help you modify how your search engine returns your data. Here are a few ways to use your search engine’s logic to your advantage:

1) Clean your data

Before all else, make sure the data going into your search engine is as clean as possible. Consider hiring people (through sites like oDesk) to go through your product names and descriptions to ensure all information is relevant and up-to-date. Otherwise, you’ll find that descriptions like “Band-Aids — OUT OF STOCK, BACK IN MARCH” will mysteriously show up for searches for “Marching Band Uniform” (because “March” and “Band” match).

With search engines, the old adage is truer than ever: Garbage In, Garbage Out.

2) Separate titles, descriptions, and keywords in your search engine indexes

If you’re currently combining titles, descriptions, and keywords into a single searchable field, your results are suffering. Descriptions, in particular, should be weighted much lower than titles and keywords. Consider the description, “This printer is the best on the market. It will print beautiful color images in vivid reds, blues, and greens on all weights of paper.” Your search engine will think this is a great match for a query for “red paper,” because the words “red” and “paper” appear in the description.

Instead, index titles and keywords separately, and boost matches in those fields much more than matches in the description. That will cause results whose titles and keywords match the search terms to appear first in the results.
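Here is a rough sketch of what field boosting does to the “red paper” query. The field weights, the crude plural-stripping stemmer, and the titles and keywords of both products are assumptions invented for this example; only the printer description comes from the text above. Your search engine will have its own configuration syntax for field boosts.

```python
import re

# Assumed weights: title and keyword matches count far more than description matches.
FIELD_WEIGHTS = {"title": 5.0, "keywords": 3.0, "description": 1.0}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Crude stand-in for a real stemmer: strip a trailing "s" from longer words.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def field_score(query, product):
    query_terms = {stem(t) for t in tokenize(query)}
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_terms = {stem(t) for t in tokenize(product.get(field, ""))}
        # Each distinct query term found in a field adds that field's weight.
        score += weight * len(query_terms & field_terms)
    return score

# Hypothetical products; titles and keywords are made up.
printer = {
    "title": "Color Inkjet Printer",
    "keywords": "printer inkjet color",
    "description": ("This printer is the best on the market. It will print "
                    "beautiful color images in vivid reds, blues, and greens "
                    "on all weights of paper."),
}
red_paper = {
    "title": "Red Construction Paper",
    "keywords": "paper red craft",
    "description": "Fifty sheets of heavy paper.",
}

printer_score = field_score("red paper", printer)
paper_score = field_score("red paper", red_paper)
```

The printer still matches on its description, but with a low weight, so the product whose title and keywords contain the search terms comes out well ahead.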

3) Add product names to keyword lists multiple times

Remember TF/IDF? If your keywords for two products are “adult red shorts” and “short tennis socks”, your search engine considers those equally valid results for a search for “shorts”, because once “shorts” is reduced to its root form, “short”, that word appears once in each set of keywords. But you can trick your search engine into understanding your products better by repeating particularly important words in the keyword field. So you might index those adult red shorts using “adult red shorts shorts shorts shorts”. This will tell the search engine to return that product first in a search for “shorts”.
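A small sketch shows why the repetition works. It assumes a toy stemmer that only strips a trailing "s" and measures raw term frequency; a real engine's stemming and scoring will differ in detail.

```python
def stem(word):
    # Toy stemmer (an assumption for this example): "shorts" -> "short".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def term_frequency(query_term, keywords):
    # Count keyword tokens whose stem equals the stem of the query term.
    target = stem(query_term.lower())
    return sum(1 for word in keywords.lower().split() if stem(word) == target)

shorts_tf = term_frequency("shorts", "adult red shorts")
socks_tf = term_frequency("shorts", "short tennis socks")
boosted_tf = term_frequency("shorts", "adult red shorts shorts shorts shorts")
```

Before the repetition, both products have a term frequency of one for “shorts”; after it, the shorts have a term frequency of four and win the tie.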

4) Add synonyms to keyword lists

Indexing a “keywords” field that’s independent of the title and description of a product gives you a lot of good ways to improve your search results. Suppose you’re selling “WalkMaster Shoes” that are “perfect for tennis, walking, and strolling around.” You examine your list of zero-result searches and notice that people are searching for “sneakers” but not getting any results. You can add this keyword to your WalkMaster Shoes without changing the title or description of the product, and now searches for “sneakers” will return a relevant result.
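The workflow above can be sketched in Python as follows. The search log, the in-memory product list, and the matching rule (any query word found in the keywords) are simplifications assumed for illustration.

```python
from collections import Counter

def search(query, products):
    # A product matches if any query word appears in its keyword list.
    terms = set(query.lower().split())
    return [p["title"] for p in products if terms & set(p["keywords"])]

def zero_result_queries(search_log, products):
    # Tally the logged queries that currently return nothing.
    return Counter(q for q in search_log if not search(q, products))

products = [
    {"title": "WalkMaster Shoes",
     "keywords": ["shoes", "tennis", "walking", "strolling"]},
]
# Hypothetical search log.
search_log = ["sneakers", "walking shoes", "sneakers"]

missed = zero_result_queries(search_log, products)

# Add the missing synonym to the keywords only; title and description stay as they are.
products[0]["keywords"].append("sneakers")
after = search("sneakers", products)
```

Sorting the zero-result tally by count (for example with `missed.most_common()`) gives you a prioritized list of synonyms worth adding.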

Improving Results

Here are a few more ways to improve the results from your document search engine.

1) Add product properties like sizes, colors, and styles to your keyword lists

You may have a product named “Men’s Formal Shoe — Executive” with keywords “shoe, men’s, formal”. If you look through your search logs, you’ll likely notice people performing searches like “mens formal shoe size 10” or “black mens shoe”. These searches probably won’t show your men’s shoe because the size and color aren’t in the keywords list.

To solve that problem, be sure to add product properties like size, color, and style to your keyword list. That way, searches for those keywords will return the correct results.
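If your product properties already live in structured fields, you can generate the extra keywords rather than typing them in by hand. The sketch below assumes a hypothetical product record with a `properties` dictionary; the property names and values are invented for the example.

```python
def build_keywords(product):
    # Start from the hand-written keywords, then append every property value.
    keywords = list(product["keywords"])
    for value in product.get("properties", {}).values():
        values = value if isinstance(value, list) else [value]
        for item in values:
            keywords.extend(str(item).lower().split())
    return keywords

# Hypothetical product record; the properties are assumptions.
shoe = {
    "title": "Men's Formal Shoe - Executive",
    "keywords": ["shoe", "men's", "formal"],
    "properties": {"color": "Black", "size": [9, 10, 11], "style": "Oxford"},
}

indexed_keywords = build_keywords(shoe)
```

With the color and sizes in the keyword list, the words “black” and “10” from the searches above now have something to match.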

2) Harness the power of bi-grams

Suppose you have two products: “Laundry Enhancer Softener – Fresh Spring Waters” and “40 lb. Water Softener Salt Pellets”. Someone searching for “water softener” will want only the second of these, but your dutiful search engine will return both results because each product name contains a form of the words “water” and “softener”.

Your search engine likely supports the idea of bi-grams: indexing two consecutive words together. (A “bi-gram” is a set of two words that are next to each other. You may also see the term “n-grams”, which refers to sets of any number (“n”) of words that are next to each other.) This lets consecutive words in the product title rank higher if they match consecutive words in the search term. For instance, in this example, “water softener” would be more likely to match the second product because the words “water” and “softener” appear next to each other and in the same order as the search query.

If your search engine doesn’t seem to be using bi-grams to rank search results, look through the configuration settings to see if this is possible.
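To see what bi-gram matching buys you, here is a minimal sketch that adds a bonus for every adjacent word pair shared by the query and the product name. The toy stemmer and the boost value of 2.0 are assumptions; a real engine exposes this through its own analyzer or phrase-boost settings.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Toy stemmer: "waters" -> "water", "pellets" -> "pellet".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def bigrams(tokens):
    # Ordered pairs of adjacent tokens.
    return set(zip(tokens, tokens[1:]))

def score(query, title, bigram_boost=2.0):
    q = [stem(t) for t in tokenize(query)]
    t = [stem(w) for w in tokenize(title)]
    word_matches = len(set(q) & set(t))
    pair_matches = len(bigrams(q) & bigrams(t))
    # Single-word matches count once; matching adjacent pairs earn a bonus.
    return word_matches + bigram_boost * pair_matches

laundry = "Laundry Enhancer Softener – Fresh Spring Waters"
salt = "40 lb. Water Softener Salt Pellets"

laundry_score = score("water softener", laundry)
salt_score = score("water softener", salt)
```

Both names match the two individual words, but only the salt pellets contain “water” immediately followed by “softener”, so only they receive the bonus.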

3) Review stemming rules

Document search engines are designed to simplify words to their root form (or “stem”) so that, for instance, searches for “shoe” and “shoes” return the same results. This is usually a good thing, and saves you from having to add singular and plural forms of every keyword in your dataset to your search engine.

However, stemming can get you into trouble with certain words. “Fishing,” for example, stems to “fish”, which can lead to erratic search results if your data contains both terms. In general, beware of words that can be used as multiple parts of speech (“fish” is a noun, “fishing” is a verb), lest your “cream-colored whips” start appearing in searches for “whipped cream.”

If you’re having trouble with stemming, you can consider indexing both stemmed words and non-stemmed words, and boosting non-stemmed words in the search results. This would cause “whipped cream” to be indexed with exactly those words, so a search for that term would first return whipped cream products.
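The idea of indexing both forms and boosting exact matches can be sketched like this. The suffix-stripping stemmer is deliberately crude, the product names are taken from the example above, and the boost value is an assumption; in practice you would configure two fields (stemmed and unstemmed) in your search engine.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Crude stemmer for illustration: strip one common suffix...
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # ...then collapse a doubled final consonant ("whipp" -> "whip").
    if len(word) > 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

def score(query, title, exact_boost=2.0):
    q, t = tokenize(query), tokenize(title)
    stemmed_matches = len({stem(w) for w in q} & {stem(w) for w in t})
    exact_matches = len(set(q) & set(t))
    # Stemmed matches count once; exact (unstemmed) matches earn a boost on top.
    return stemmed_matches + exact_boost * exact_matches

whips_score = score("whipped cream", "Cream-Colored Whips")
cream_score = score("whipped cream", "Whipped Cream")
```

With the boost set to zero the two products tie, because their stems are identical; with the boost, the product that contains the exact words “whipped” and “cream” ranks first.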

4) Add popularity metrics to your search index

Document search engines are fine-tuned to return accurate, relevant results based on the words in a document. They usually aren’t particularly concerned about how popular a document is, however. This often works against you if you’re using them to search product names, because you’ll find they rank obscure, little-purchased items higher than bestsellers.

You can work around this by storing a popularity metric for each product in your search engine, and boosting your search results according to that score. A simple way to do this is to count how many times each product has sold, and store that number in your search engine. Then, configure your search engine to rank better-selling products above worse-selling ones. This will put bestselling items at the top of your search results, which will more accurately represent what your customers are looking for.
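As a rough sketch, a popularity boost can be as simple as multiplying the text relevance score by a function of units sold. The logarithmic damping below is my own assumption (it keeps a runaway bestseller from drowning out relevance entirely), and the products and scores are invented for the example.

```python
import math

def boosted_score(text_score, units_sold):
    # log1p dampens the sales count; a product with zero sales keeps its text score.
    return text_score * (1.0 + math.log1p(units_sold))

# Hypothetical results with identical text relevance.
results = [
    {"title": "Obscure Widget", "text_score": 2.0, "units_sold": 3},
    {"title": "Bestselling Widget", "text_score": 2.0, "units_sold": 5000},
]

ranked = sorted(
    results,
    key=lambda r: boosted_score(r["text_score"], r["units_sold"]),
    reverse=True,
)
```

When two products match the query equally well, the one that sells more now appears first.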

These four approaches can greatly improve the products that your search engine returns. We recommend you periodically review these suggestions to continually hone your search results and improve your site’s search experience.
