Search Engine Optimization (SEO) – Spiders

MiloSEO

About Spiders and Internal Links

Spiders representing the Internet’s search engines regularly crawl the World Wide Web to look for Web sites and other content to add to the search engines’ proprietary databases. They do so by following the hyperlinks that connect Web sites on the Internet. Thus, when crawling the Internet, a spider theoretically might visit any of the billions of pages on the Web, excluding, of course, sites that have no links pointing to them, unless those sites are submitted directly to the engines.

When visiting specific Web sites, whether submitted directly or located via a regular crawl, search engines similarly rely on their spiders to crawl the sites’ contents. The spiders do so by following the sites’ internal hyperlinks. Although some limit the crawl to two or three directory layers, spiders generally will attempt to visit and evaluate all pages and elements of a visited site. This means that even though a user only submits the URL of his/her site’s home page, the search engine’s spider may visit and index several pages. Many engines may even index FTP, image, or multimedia files that they come across while spidering a Web site, provided that the spider has internal links to follow.

Due to their very nature, spiders are crippled when they have no links to follow. Thus, a Web site with broken internal links, or none at all, will prevent a visiting spider from venturing beyond its point of entry. If the site contains functional external links, the spider will follow those and leave the site behind. If the site does not contain any external links, the spider has de facto reached a dead end. Unless the site in question is a one-page Web site, broken or missing hyperlinks inevitably will force the spider to bypass pages and other site elements that otherwise might have been indexed individually and contributed to the Web site’s overall traffic influx.

If, however, all the Web pages on a site are linked to each other with functional hyperlinks, a visiting spider is able to peruse every nook and cranny of the site.
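
As a simple illustration, a plain HTML navigation block like the one below, repeated on every page, gives a visiting spider a functional path to each page it references (the page names are, of course, hypothetical):

<p>
<a href="index.htm">Home</a> |
<a href="articles.htm">Articles</a> |
<a href="about.htm">About</a> |
<a href="contact.htm">Contact</a>
</p>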


About Link Analysis

Search engines Google, Yahoo!, MSN, and Ask all consider the number and, more importantly, the quality of “back links” (i.e., inbound page links) to be perhaps the single most important factor in their ranking calculations for indexed Web content. In essence, the search engines consider the presence of an inbound link to a given Web page an indication that the page from which the link originates considers its target important enough to actually link to. Additionally, the anchor text of the back link helps the search engine understand why the page is important. Based on various parameters, including the importance of the originating Web site, the search engines will evaluate each link pointing to a given Web site. This means that the nature of each link is actually more important than the overall number of links pointing to a site. This is particularly true in the case of Google, whose PageRank system largely depends on link analysis. Indeed, the very root of Google’s immense success lies in link analysis: When Google entered the search engine market, most search engines relied on keyword usage and Meta data when analyzing and ranking Web content. Google’s focus on back links as a means to determine the relevance of a given Web page thus broke new ground and has led directly to Google’s overwhelming market dominance.
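
To illustrate the role of anchor text, consider two hypothetical back links pointing to the same online music store (the URL is a placeholder):

<a href="http://www.example.com/">Classical music CDs at discount prices</a>
<a href="http://www.example.com/">Click here</a>

The first link’s anchor text tells the search engine what the destination page is about; the second tells it nothing.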

Sheer link quantity will have little impact on a page’s ranking. A limited number of quality links likely will have a bigger impact on a site’s search engine ranking than a large number of low-importance links. A high number of more or less relevant links to a site might increase the chances of random traffic stopping by the site, but will not make a dent in an otherwise lackluster search engine ranking. In other words: The key to building links to a Web site is to obtain links from the right places, rather than to obtain as many links as possible. Note that link analysis is only one of several pieces of the search engine ranking puzzle. Therefore, to achieve the best results, quality links should go hand in hand with well-composed Meta and Title tag content and site copy.
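
For reference, those on-page elements might look like the following for a hypothetical music store (all values are placeholders):

<title>Classical Music CDs - Example Music Store</title>
<meta name="description" content="Thousands of classical music CDs at discount prices.">
<meta name="keywords" content="classical music, CDs, music store">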


Not All Links Are Created Equal

In their earlier incarnations, the Internet’s search engines were more prone to include link quantity in their ranking calculations. But with that concept all but gone, link quality is one of the keys to boosting search engine rankings.

When analyzing links to a Web site, search engines will attempt to determine the relative importance of the originating sites. This means that if an originating site offers content similar or related to that of the site it is pointing to, the link will receive a higher rating than if the originating site’s content were entirely unrelated. As well, an originating site may be ranked based on the links that are pointing to it, and hence the relative importance of those links and their origins. The highest-quality links are those that appear to occur organically: Organic linking occurs when a Webmaster comes across a Web page that he/she finds particularly interesting or important, or for some other reason wants to direct his/her site visitors’ attention to, and thus decides to link to that page from his/her site. Organic linking, thus, is not the product of any form of link exchange. Obviously, no search engine can determine with absolute certainty the genesis of a given back link. However, one-way linking may serve as an indication of organic linking, as link exchanges generally involve reciprocal linking. Reciprocal linking is not necessarily bad; indeed, it can be a necessary element in any Web site owner’s link building efforts. But advanced search engines do not rate reciprocal links as highly as organic ones.

A search engine’s link quality analysis further examines the relative importance of the page that contains the outbound link. In short: If, for example, a page that holds a Google PageRank of 8 points to a given Web page, then — theoretically — that link will have a more positive influence on the linked-to page’s search engine ranking than it would had the link originated from a page with a PageRank of 3. This, of course, assumes that the search engines trust that the back link is bona fide and not part of a link buying/link farming scheme.

It is important to keep in mind that the exact link analysis method varies between the Internet’s search engines, and that the same link therefore will not necessarily have the same significance with all search engines. Nevertheless, meticulously built links to and from carefully selected sites are bound to be more effective than random reciprocal links obtained via free-for-all link exchange services and farms. Still, reciprocal linking can be useful, and is often offered as common courtesy when requesting links from another Web site.

Search engines in some cases consider internal links — i.e., links between a Web site’s pages — bona fide links and thus incorporate them into their ranking algorithms. In most cases, however, internal links will have no effect on the ranking system.


Link Context

Some search engines, including Google and Ask’s Teoma, zoom in on the link text and the context in which the link is found when performing link analysis. This is generally done by examining words in or in the vicinity of the link. If the context fits the link, then the link likely will be considered more important than if it seemed out of context. In a nutshell: If a link or a lead-in to a link mentions music CDs and the link points to an online music store, that link will receive a relatively high ranking because the context supports the link destination. Conversely, if the link text or adjacent text makes no mention of music or any related terms, search engines will not credit the link for its context. In short: If a back link indicates that the originating Web site considers the targeted site important, then the link text and link context may indicate why it considers it important.
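
Continuing the music store example, a contextually supported link might look like this in the originating page’s HTML (the URL and copy are hypothetical):

<p>Looking for hard-to-find classical music CDs? We recommend
<a href="http://www.example.com/store/">this online music store</a>
for its large selection of classical recordings.</p>

Here the surrounding copy reinforces what the destination page is about, which supports the link’s rating.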


Finding and Acquiring Good Links

How, then, can the Web developer determine which link originators will create the desired results?

Using the Internet’s search engines is the easiest and most obvious method to find out which Web sites will provide relevant and hence high-scoring links. When the developer visits a search engine and types in a search phrase that ideally would lead to the type of site he/she is creating, the search engine will return a list of matching sites. Generally, the best-ranked matches are the ones most likely to generate the most effective links. Having perused the results, the next step is to visit the sites and convince their administrators to provide the desired links.

There is, of course, no guarantee that a particular Web site will add a requested link. There can be many reasons that a Web site might decline to do so. For example, the site might refuse to link to an obviously competing online location. Or, the site’s owner may not be interested in being affiliated with other sites. Another reason could be that the site is already providing several outbound links and is not interested in adding to the count. Thus, sometimes the Web developer will have to look for links someplace else.

In any case, to request a link to a site, the Web site owner should compose and send a formal request to the targeted Web site’s owner or administrator and explain why providing a link would be mutually beneficial. If applicable, reciprocal linking can be offered as an incentive. But if the request is turned down it is better to move on to the next Web site on the list and request a link from there instead.


Sandboxing

Lately, the top search engines, notably Google and Yahoo!, have employed so-called “sandboxing” measures to ensure that no new Web page’s ranking is allowed to skyrocket until its links and content have been examined carefully. The sandboxing period, generally a few months, applies to most newly-indexed pages and allows the search engines to ascertain that a page’s inbound links do not originate from a link farm or other form of search engine spamming. The sandboxing concept can be frustrating for Web site owners, but it further increases the importance of link quality in relation to search engine ranking.


Frame-Related Problems and Solutions

Because spider-based search engines cannot crawl the content of Web sites that use frames — and some older browsers cannot display them — Web developers are generally advised to abstain from using frames when building their Web sites.

Frames are usually used to separate site content and facilitate navigation. However, because search engines cannot read and crawl the body contents and links of sites that are built with frames, framed sites usually don’t score high in search engine rankings. If frame usage is required, the page should include a Noframes tag that will provide alternate content for search spiders and browsers that do not support frames.


The Problem With Frame Usage

A framed Web page consists of a home page, or frameset, which contains HTML code that defines and links to the frames themselves. The frameset page thus serves as the site’s master page: The Frameset tag it contains defines all other pages used in the framed site.

When visiting a framed Web site, search engines can view the master frameset page, but cannot put the framed pages together. This means that search engines only see the HTML code on the frameset page, not the primary page body contents, which are located in the frame elements. Thus, when a framed Web page is submitted to an Internet search engine, the frameset page generally is the one submitted; but because the actual content resides in the frame elements, the search engine won’t see any potentially rank-boosting elements located there. Therefore, only very few, if any, framed Web sites possess top-ten search engine rankings.

See below for an example of frameset content. In this case, the frameset points to three frames: Header, FrameOne, FrameTwo:

<html>
<head>
<title>Site Title</title>
<meta http-equiv="content-style-type" content="text/css">
<meta http-equiv="content-type" content="text/html">
<meta name="description" content="Site description">
<meta name="keywords" content="keywords">
</head>
<!-- The outer frameset reserves a 100-pixel row for the header frame -->
<frameset border="0" rows="100,*">
<frame name="header" noresize src="Header.htm">
<!-- The inner frameset splits the remaining space into navigation and content columns -->
<frameset cols="100,8*">
<frame name="navigation" noresize src="FrameOne.htm">
<frame name="content" noresize src="FrameTwo.htm">
</frameset>
</frameset>
</html>

Obviously, the above content will not give an algorithmic search engine much to work with. And because all of the site’s body content, including internal and external links, is absent, the search engine’s ability to crawl, evaluate, and rank the site is severely limited.

Naturally, getting rid of the frames entirely would solve the problem of lackluster search engine rankings for framed Web sites. However, ridding a Web page of frames does limit the site-design options. The Web site developer therefore should carefully weigh the pros and cons of using frames on the site before deciding whether to stick with or abandon frame usage.


The Noframes Solution

An alternative to omitting frames is to add a Noframes tag to the master frameset page. While spiders cannot crawl framed Web sites, they can read the contents of a Noframes tag. The Noframes tag enables the Web site developer to create an alternate no-frame version of a framed Web site. When a search engine — or browser that cannot read frames — comes across a framed site, it will instead read or display the contents of the Noframes tag.

The Noframes tag may contain a simple message and a link to a no-frame alternate site. Or, the tag can contain the entire HTML code, including links and body content, for the alternate site. The latter option ensures that visiting search spiders can crawl the site in its entirety, including all internal and external site links.


Placing the Noframes Tag

The Noframes tag should be placed inside the Frameset tag. See example below.

<html>
<head>
<title>Site Title</title>
<meta http-equiv="content-style-type" content="text/css">
<meta http-equiv="content-type" content="text/html">
<meta name="description" content="Site description">
<meta name="keywords" content="keywords">
</head>
<frameset border="0" rows="100,*">
<frame name="header" noresize src="Header.htm">
<frameset cols="100,8*">
<frame name="navigation" noresize src="FrameOne.htm">
<frame name="content" noresize src="FrameTwo.htm">
</frameset>
<!-- The Noframes tag supplies the alternate content that spiders and frames-incapable browsers will read -->
<noframes>
<body>
<h1>Page headline</h1>
<p>Page body</p>
</body>
</noframes>
</frameset>
</html>


Using Flash Animation

Well-composed Flash animation can turn a Web site into a virtual work of art. Unfortunately, most spider-based search engines do not read Flash. Although Google’s spider currently has the capability of reading and indexing Flash animation, it is generally recommended that site elements that are critical to the page’s message, subject, and navigation be created in a format that can be read by all spiders. Flash animation on a Web site will not help improve the site’s search engine ranking, no matter how aesthetically pleasing the animation makes the site. And although Flash animation will not improve a Web site’s ranking, it could, if used unwisely, damage it. Note that Macromedia’s Flash Search Engine SDK provides a set of objects and source code designed to convert a Flash file’s text and links into HTML, thus enabling search engines to read the content.

On home pages, Flash animation is sometimes used to create visually stunning lead-ins. Occasionally, home pages are built entirely with Flash. In such cases some or all of the home page content is unreadable and uncrawlable for Internet search spiders. Therefore, pages that feature substantial amounts of Flash animation might only score mediocre rankings with Internet search engines, while entirely Flash-based pages probably won’t be indexed at all. In other words, submitting entirely Flash-animated pages to Internet search engines is usually futile. When submitting pages that are partly Flash-animated, decent rankings might be achieved by optimizing the page’s Meta and Title tags, and by adding well-composed and keyword-rich page copy.


Flash- and HTML-Based Hyperlinks

Because search engine spiders cannot read Flash, crawler-based search engines will detect no links created with Flash. For search engine positioning purposes, regular HTML hyperlinks therefore are considerably more effective than Flash-based links. However, by incorporating both solutions — an HTML and a Flash version of each page link — on the same page, Flash animation can be retained without crippling visiting spiders.
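
As a sketch of this dual approach (the file and page names are hypothetical), a page might embed its Flash navigation and then repeat the same destinations as plain HTML links:

<!-- Flash-based navigation: invisible to most spiders -->
<object type="application/x-shockwave-flash" data="navigation.swf" width="600" height="60">
<param name="movie" value="navigation.swf">
</object>
<!-- Equivalent HTML text links that spiders can follow -->
<p>
<a href="products.htm">Products</a> |
<a href="services.htm">Services</a> |
<a href="contact.htm">Contact</a>
</p>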


Redirections and Google

In the context of Google optimization, your Web site’s internal links mainly serve to enable the Google spider to access all corners of the site. All internal links, therefore, must be functional.

Generally, Google advises against using HTTP redirections on submitted URLs. Instead, you should submit the destination URL itself. If Google visits and indexes a Web site without the site’s URL being submitted via Google’s “Add URL” link (or through SEO services like Traffic Blazer), then the actual site will be indexed, not the redirecting URL.

In its Quality Guidelines, Google very clearly advises Webmasters against using “sneaky redirects.” That phrasing refers to redirections that serve the single purpose of tricking spiders into believing that the destination is more important than it really is.

Google does, however, recommend using a 301 permanent redirect HTTP status code if you have moved an existing Google-indexed site to a new URL:

Once your new site is live, you may wish to place a permanent redirect (using a “301” code in HTTP headers) on your old site to inform visitors and search engines that your site has moved.

In short: If you have moved your content, use a 301 redirection. Otherwise, do not attempt to redirect the Googlebot.
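
As an example of how such a redirection might be implemented on an Apache Web server (the path and domain are placeholders), a single line in the old site’s .htaccess file will do:

Redirect 301 /oldpage.htm http://www.example.com/newpage.htm

This tells visiting browsers and spiders, including the Googlebot, that the page has moved permanently to the new URL.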


Understanding the Robots.txt Standard


In some cases, you may want to take control of how Internet search engines crawl and index your Web site. There can be a number of reasons to instruct visiting spiders to steer clear of entire Web sites, or certain site elements. For example, you might wish to prevent pages that are under construction from being indexed by search engines. Or you might simply want to limit your bandwidth consumption by preventing certain spiders from crawling your site.

Regardless of the reason for limiting spider access to your site, the standard method of doing so is to create a “robots.txt” file and place it in your Web site’s root directory. The robots.txt file, whose directions practically all robots/spiders comply with, can prevent certain or all search engines from visiting and indexing particular pages within a Web site. A robots.txt file can also be used to entirely block spiders from crawling a site. The Traffic Blazer Robots.txt Generator provides an easy means of creating a robots.txt file for your Web site.


Composing the Robots.txt File

The robots.txt file can either block spiders entirely, or it can prevent the search engines from accessing specific directories, files or entire Web pages.

The robots.txt file consists of two defining elements:

  • user-agent:
  • disallow: /

The first element, “user-agent:”, specifies which agents, spiders, or browsers should read and obey the commands in the file. The second element, “disallow:”, defines which files and directories should be blocked from the search engines. An asterisk (“*”) denotes “everything.” Following this concept, you can define which spider(s) should steer clear of the site and which ones should crawl all or only selected parts of its contents. A sample file appears below.
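
For example, a hypothetical robots.txt file might let all spiders in while keeping them out of two unfinished directories, and block one specific spider entirely (the directory and agent names are placeholders):

User-agent: *
Disallow: /under-construction/
Disallow: /drafts/

User-agent: ExampleBot
Disallow: /

An empty “Disallow:” line, or the absence of a matching record, leaves a spider free to crawl the entire site.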


The Robots Meta Tag

A less versatile alternative to the robots.txt file is to create a Robots Meta tag for each applicable page on your Web site. A Robots Meta tag can be used to instruct search engine spiders not to index a page and/or not to follow the hyperlinks on the page. This tag must be included in the <head> tag portion of every Web page you want excluded. Note that the Robots Meta tag applies to all search engine spiders/robots; the Robots Meta standard does not allow you to specify particular spider names.
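
For instance, a page that should be neither indexed nor mined for links would carry the following tag in its <head> section:

<meta name="robots" content="noindex, nofollow">

To keep a page out of the index while still letting the spider follow its links, “noindex, follow” would be used instead.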
