SEO Snake Oil Part 1: Latent Semantic Indexing (LSI)

This is the first in a series of posts I’ll be doing on “SEO Snake Oil”, or the often sketchy claims made about certain techniques in the field of search engine optimization.

Part of the challenge in convincing clients to invest in SEO is that it’s still a very new and widely misunderstood field.   The commercial internet traces back to the advent of the Netscape Mosaic browser in 1994, but the widespread optimization of webpages for search is really less than 10 years old (Google themselves only turned 12 the other day!).

Despite even the past few years of tremendous growth in SEO, the process still remains a mystery, or at least an unexplored area, for many businesses.

There are scores of reputable and knowledgeable SEO consultants out there, but as with all things on the web, any hot growth area is bound to be rife with charlatans and quick-scheme artists, looking to play upon potential customers’ confusion about the workings of the web. This has led to far-fetched sales pitches offered by “consultants” looking to grab thousands of dollars in fees in exchange for some off-the-shelf, half-baked advice. Much of this advice is wrong, outdated, or, at worst, downright misleading.

A common thread in these adverts is a claim to knowledge of special techniques that will help get your site ranked in the search engines, and these techniques are deliberately explained in the most oblique and jargon-heavy manner.

My favorite is the claim that they will employ “Latent Semantic Indexing” techniques to craft specific word combinations that will cause Google to assign greater relevancy to your site vs others.   LSI is actually a fascinating topic, but it is not a “real” SEO tool for consultants to exploit, and contrary to what you can find in SEO Meme-land, it is NOT used by the search engines to rank pages.  Most folks that claim to understand it are either lying or deluded, and probably both.

So what is the LSI myth, and what’s the reality?  Let’s try to do this without a slide-rule and degree in Calculus….

LSI is a Data Index and Retrieval Process

LSI is a method of indexing a database so that patterns in word distribution can be easily identified.   The technique is based on identifying semantic relationships and assigning values to the degree to which words are either close or distant in meaning to one another. So the word “golf” is closely related to the word “club”, and somewhat less related to the word “grass”, for example.

To apply LSI theory to the web, an index of webpages would be assembled in which each word on each page would be measured in terms of how often it appears on a page along with every other word, such that every possible relationship of two or more words could be assigned a value that correlates to its presumed semantic relationship.
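To make the counting step a little more concrete, here is a minimal Python sketch. It is only an illustration: the four “pages” are made up, the function name `cooccurrence_counts` is hypothetical, and it counts raw page-level co-occurrence of word pairs, which is a simplification of the matrix math a real LSI index would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy "pages"; a real index would cover millions of documents.
PAGES = [
    "golf club course",
    "golf club green",
    "golf grass",
    "stock market",
]

def cooccurrence_counts(pages):
    """Count, for each unordered word pair, how many pages contain both words."""
    pairs = Counter()
    for page in pages:
        # Sort the unique words so each pair is always stored in alphabetical order.
        words = sorted(set(page.split()))
        pairs.update(combinations(words, 2))
    return pairs

counts = cooccurrence_counts(PAGES)
print(counts[("club", "golf")])    # pages containing both "club" and "golf"
print(counts[("golf", "grass")])   # pages containing both "golf" and "grass"
print(counts[("golf", "stock")])   # pages containing both "golf" and "stock"
```

In this toy data, “golf” shares more pages with “club” than with “grass”, and none with “stock”, which mirrors the closer-or-more-distant relationships described above.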

In a simple search algorithm, a query for the word “golf” would be answered by looking for the greatest number of instances of the word “golf” on a particular page, and returning those pages to the user.   An LSI algorithm, however, could do better, in that it would rank on the basis of additional words that had a close semantic relationship to “golf”, increasing the chances of a desirable page being returned.
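For the curious, the difference between the two approaches can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any search engine works: the corpus is invented, all function names are hypothetical, the latent space is limited to two dimensions, and the singular vectors are found with plain power iteration rather than a numerical library, just to keep the example dependency-free.

```python
import math
from collections import Counter

# Hypothetical corpus: two golf pages and two finance pages.
DOCS = [
    "golf club golf course",
    "club fairway putter green",   # about golf, but never says "golf"
    "stock market shares",
    "market shares price",
]

def keyword_scores(docs, term):
    """Simple ranking signal: how many times the term appears on each page."""
    return [d.split().count(term) for d in docs]

def term_document_matrix(docs):
    """Return (vocab, A) where A[i][j] counts term i in document j."""
    counts = [Counter(d.split()) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]

def top_left_singular_vectors(A, k, iters=500):
    """Top-k left singular vectors of A, via power iteration on A^T A."""
    n_terms, n_docs = len(A), len(A[0])
    G = [[sum(A[t][i] * A[t][j] for t in range(n_terms))
          for j in range(n_docs)] for i in range(n_docs)]
    basis = []
    for _ in range(k):
        v = [1.0] * n_docs
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n_docs)) for i in range(n_docs)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (squared singular value).
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n_docs))
                  for i in range(n_docs))
        sigma = math.sqrt(lam)
        basis.append([sum(A[t][j] * v[j] for j in range(n_docs)) / sigma
                      for t in range(n_terms)])
        # Deflate so the next pass converges to the next singular vector.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n_docs)]
             for i in range(n_docs)]
    return basis

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def lsi_scores(docs, query, k=2):
    """Cosine similarity between the query and each page in a k-dim latent space."""
    vocab, A = term_document_matrix(docs)
    basis = top_left_singular_vectors(A, k)

    def project(vec):
        return [sum(u_i * x_i for u_i, x_i in zip(u, vec)) for u in basis]

    q_counts = Counter(query.split())
    q = project([q_counts[t] for t in vocab])
    return [cosine(q, project([row[j] for row in A])) for j in range(len(docs))]

kw = keyword_scores(DOCS, "golf")
lsi = lsi_scores(DOCS, "golf")
print(kw)                            # keyword counts: [2, 0, 0, 0]
print([round(s, 3) for s in lsi])    # latent-space similarity per page
```

The keyword count gives the second page a score of zero because the word “golf” never appears on it. In the latent space, that same page should score close to the first one, because it shares the word “club” with a page that does mention golf, while the two finance pages should land near zero.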

LSI was first applied at Bell Laboratories in the late 1980s, and its researchers successfully filed a patent for the procedure in 1988. LSI has been shown to be effective as an index and retrieval method for applications such as speech-to-text conversion and automatic essay scoring, and is even useful in classifying genes based on conceptual modeling of biological information contained in scientific citations.

So with all this techie-sounding mathematical stuff going on, surely LSI must be the way that Google associates words in an internet search query, yes?

Well, no, it’s not, but this is a common misperception that many folks hold after hearing about the mathematical wonder of LSI.   Some of the confusion seems to have arisen from Google’s 2003 acquisition of Applied Semantics, a company which did have a patented text-processing technology called CIRCA, but this has nothing whatsoever to do with LSI.

Google does have a “synonym search operator” that has also led many to think it is using an LSI algorithm. It’s actually a very cool and useful feature we can all use in Google, and it’s done by placing a tilde sign (~) in front of a searched term, like this:

~train

The result of that search for “~train” will be a page where certain terms synonymous with “train” are shown in bold, e.g. “rail”, “Amtrak”, and “subway”. This type of search can be useful in keyword research to generate some related words you may not be aware of, but it does not produce a very wide list, and is NOT the result of LSI application. It is more likely (although not known for sure) that Google is drawing upon clickthrough data or some other user-generated data to make those associations.

LSI is not meant to discover synonyms at all, as it is more interested in common co-occurrence of related words that do not necessarily mean the same thing.   Even if the search engines wanted to make use of LSI in rankings, from a pure processing-power standpoint, it would be impractical and would slow the engine response time down considerably.

For some backup on this, see the take of Dr. E. Garcia, who understands the math much better than I do and operates a site focused on the research of data retrieval and data mining.

And Now… the Snake Oil!

None of this seems to stem the tide of SEO consultants claiming intricate knowledge of LSI and the manner in which it should be applied to your keyword research, and the content of your website. Here are some of my favorite claims pulled from actual SEO websites. Remember, all of the following quotes represent PURE SEO BUNK!!!

“(Latent Semantic Indexing) is another component of SEO and helps determine your Web page “theme”. LSI is the artificial intelligence search engines use to rank your Web pages.”

“Integrating latent semantic indexing techniques into your search engine optimization consulting will reap future benefits as well. Sites will remain more current even if some terms are no longer searched for as often.”

“LSI represents advancement in the way search engines interact with users.”

“We work hard to make it easy for you to publish LSI-friendly websites”

“Undeniable sources prove that search engines like Google are using latent semantic indexing in their algorithm to provide searchers with useful, fresh, and relevant information. Therefore it is important to optimize your website with LSI in mind.”

Just Write Good Content

So the lesson here is:  when looking for someone to help you with SEO, or even to just explain to you how search engines operate, if you encounter anyone who starts talking about LSI, just know they are either making it up, or regurgitating something they saw on the web without doing any deeper investigation.

And it’s a shame, too, because clients are buying into this talk, and being fooled into some expensive content production that really doesn’t need to happen.

The best advice on making search-engine-friendly content for your website is still the following:

• Write high-quality content
• Speak to your users in a real voice, and address their core interests and needs
• If you are targeting a keyword in a piece of content, include it early in the piece, but don’t overstuff it in the content
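If you want a quick sanity check on that last point, a few lines of Python can report where a keyword first appears and how much of the text it takes up. This is just a rough sketch: the function name `keyword_report` is hypothetical, it handles single-word keywords only, and it deliberately suggests no “ideal” density, since no such number is given here.

```python
def keyword_report(text, keyword):
    """Report first position, count, and share of words for a single-word keyword."""
    # Lowercase each word and strip common punctuation from its edges.
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    hits = [i for i, w in enumerate(words) if w == keyword.lower()]
    return {
        "first_position": hits[0] if hits else None,   # 0 means the very first word
        "occurrences": len(hits),
        "density": len(hits) / len(words) if words else 0.0,
    }

report = keyword_report(
    "Golf lessons for beginners. Our golf pros teach weekly.", "golf"
)
print(report["first_position"], report["occurrences"])   # 0 2
```

A low first position means the keyword shows up early; a density that climbs as you edit is a hint that you may be overstuffing.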

There is no substitute for quality content, and LSI is not a technique that will improve your search traffic.   SEO content is best created by first figuring out who is best served by the content on your site, and writing for those folks, making strategic use of well-researched keywords along the way.

Then, with a little training, or the help of a competent SEO professional, you can apply dozens of additional on-site and off-site tactics designed to increase your site’s popularity, and draw search engine attention to the relevant content that is already there.

Beware the SEO Snake Oil!!

OK

JM

About Jim Magary

Jim Magary has been a media and marketing professional since 1991. Jim spent 18 years in the corporate advertising and media world in New York, managing media buying and planning strategy for blue-chip clients at large ad agencies. He also worked in the music business, generating fresh marketing ideas for a record label at the dawn of the digital music era. Now in Boston, MA, Jim founded Boomient as a digital marketing consultancy to help businesses fully leverage the internet for its massive marketing potential. Jim’s knowledge and experience doing integrated marketing and understanding the journey of the customer serve as Boomient’s foundational blueprint.