#1
04-22-2006, 09:06 PM
Eminem
Senior Member
GB Guru

Join Date: Apr 2006
Posts: 149
SEO Guide

Search Engine Optimization


This is the first issue of my SEO newsletter. I'd like to welcome all new subscribers. In this issue I have written a must-read article about the changes in Google's implementation of PageRank. It explains:
  • why the SandBox theory is nonsense
  • why the toolbar PR does not always correlate with high rankings
  • why anchor text is NOT the most important factor for top Google rankings
  • why HostRank is the new major ranking factor at Google
  • how HostRank makes spamming much more difficult
  • and a lot of other goodies in here…

Before I go over the new implementation, let's first go over the problems with the original PageRank. If you don't understand PageRank very well, stop reading this article and first read Ian Rogers' PageRank article. You must understand PageRank to make sense of everything in my article.
Problems with the old PageRank system

PageRank was originally developed as a keyword-independent score that estimates the importance of a page on the web. Most decent SEOs quickly understood how to manipulate it. Yes, PageRank is easy to manipulate. Here are a bunch of ways:

* You can hoard PageRank by artificially generating pages and making them link to another page that you want to stuff with PR. Remember, in the classic formula the PR of all pages adds up to the number of pages, so every new page adds a PR of 1 to the total PR on the web, and you can then transfer it to another page. By simply adding lots of pages, you generate new PR and can pass it to the pages you want. In this way, you can build monstrous PR with a huge site that does not have a single link from another site! The cost of building new pages (adding PR to the web) is ZERO!
* Site-wide links pack PR easily. Essentially, you get a lot of PR from a single voter (another site, or your own site as in the previous point).
* Links to other sites leak PR from your internal pages and thus decrease the PR of your site's pages. Webmasters who understood that avoided linking to other sites.
* Webmasters started manipulating their internal linking structure to prevent too much PR leakage by putting lots of internal links on the pages that leak PR. I've done that myself. Even if you have a high PR page, you can bury your link partners on a PR1 or PR2 page.
* With PageRank, most incoming links to a site point to the home page, and because every internal page also links to the home page, home pages get unusually high PR compared to internal (content) pages. Searchers want Google to list the page they are interested in, not the home page.
* Calculating PageRank is a very resource-intensive process. The number of pages involved in its calculation is growing rapidly (faster than the improvements in CPU power).
* Google's toolbar shows the PR of a page, and that gave webmasters an even better idea of how to manipulate it.
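To make the first bullet concrete, here is a minimal Python sketch of the classic, non-normalized PageRank formula, PR(p) = (1 - d) + d * (sum of PR(q) / out(q) over the pages q that link to p). It uses the commonly quoted damping factor d = 0.85. The function and page names are illustrative assumptions, not anything Google publishes. Each generated feeder page has no inbound links, so it keeps only the floor value 1 - d. It still passes that value on to the target, which means the target's PR grows linearly with the number of generated pages, without a single external link.

```python
def classic_pagerank(links, d=0.85, iterations=50):
    """Classic non-normalized PageRank: PR(p) = (1 - d) + d * sum(PR(q) / out(q)).

    links maps each page to the list of pages it links to. Pages without
    outgoing links simply pass nothing on in this sketch.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}  # every page starts with PR 1
    for _ in range(iterations):
        new = {p: 1.0 - d for p in pages}  # the floor every page gets
        for source, targets in links.items():
            for t in targets:
                new[t] += d * pr[source] / len(targets)
        pr = new
    return pr


def hoarding_site(n_feeders):
    """Hypothetical site: n generated pages that each link only to 'target'."""
    return {f"feeder{i}": ["target"] for i in range(n_feeders)}


for n in (1, 10, 100):
    target_pr = classic_pagerank(hoarding_site(n))["target"]
    print(f"{n:>3} feeder pages -> target PR {target_pr:.3f}")
```

In this toy model the target ends up at 0.15 + 0.85 * n * 0.15. Nothing limits n except the effort of generating pages.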

So what's the solution? It's HostRank.

HostRank is basically the same calculation as PageRank, but it runs at the site-to-site level instead of the page-to-page level. The graph that represents linking among sites is called a HostGraph or a SiteGraph.

Imagine every site as a single page (a node in the HostGraph). If any page on site A links to a page on site B, then there's a link from site A to site B. Based on this data, Google runs a site-to-site PageRank calculation. Internal linking doesn't matter here, only host-to-host links. I have no idea what Google calls it, but let's call it HostRank.
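Here is a minimal sketch of how page-level links could be collapsed into the HostGraph described above. It assumes that each URL's hostname identifies a site. How Google actually groups hosts (www vs. non-www, subdomains and so on) is not stated here. The helper names and the example.* URLs are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Use the URL's hostname as the node name in the HostGraph (assumption)."""
    return urlsplit(url).hostname


def build_host_graph(page_links):
    """Collapse page-level links (source URL, target URL) into host-level edges.

    Internal links (same host on both ends) are dropped, and repeated links
    between the same pair of hosts collapse into a single edge.
    """
    graph = {}
    for source, target in page_links:
        a, b = host_of(source), host_of(target)
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b:
            graph[a].add(b)
    return graph


page_links = [
    ("http://a.example/index.html", "http://a.example/about.html"),  # internal, ignored
    ("http://a.example/blog/1.html", "http://b.example/"),
    ("http://a.example/blog/2.html", "http://b.example/page.html"),  # same host pair again
    ("http://b.example/", "http://c.example/"),
]
print(build_host_graph(page_links))
```

Every site still appears as a node, even if it only links internally. Such a site simply has no outgoing host edges.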

HostRank is a measure of a site's importance, not a page's importance. High HostRank can be obtained only if many other sites link to yours. Because creating lots of other sites is costly (unlike putting out pages), high HostRank is much harder to get. It is also much harder to convince many sites that themselves have high HostRank to link to yours.

If you register a lot of domains and interlink them, that won't increase your HostRank because meshed linking structures distribute HostRank evenly.

The only way to domain-spam HostRank is to create a lot of domains and point all of them to your important site. Even if you succeed temporarily, a simple spam report to Google will get you banned in no time.

Under the HostRank algorithm, new pages don't add PR to the web; only new sites (hosts) do.
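The claim that new pages add no HostRank can be checked in a small sketch. It assumes HostRank is simply the classic PageRank formula (d = 0.85) run over the host graph, which is the author's model rather than a confirmed Google implementation. The hosts are hypothetical. Generating 1,000 extra internal pages leaves the host graph unchanged, so every host's score stays the same.

```python
from urllib.parse import urlsplit


def host_graph(page_links):
    """Host-level edges; internal links and duplicate host pairs are dropped."""
    graph = {}
    for source, target in page_links:
        a, b = urlsplit(source).hostname, urlsplit(target).hostname
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b:
            graph[a].add(b)
    return graph


def host_rank(graph, d=0.85, iterations=50):
    """Classic (non-normalized) PageRank run on hosts instead of pages."""
    rank = {h: 1.0 for h in graph}
    for _ in range(iterations):
        new = {h: 1.0 - d for h in graph}
        for source, targets in graph.items():
            for t in targets:
                new[t] += d * rank[source] / len(targets)
        rank = new
    return rank


base = [
    ("http://a.example/", "http://b.example/"),
    ("http://c.example/", "http://b.example/"),
]
# The owner of a.example generates 1,000 extra pages that only link internally.
padded = base + [(f"http://a.example/p{i}.html", "http://a.example/") for i in range(1000)]

print(host_rank(host_graph(base)) == host_rank(host_graph(padded)))
```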
Vote limitation

How many links from site X to site Y does Google count? I always assume the worst case, so I assume Google counts only one host-to-host link. Even if you get 1 million links from another site, Google counts them as just 1 vote for your HostRank. I may be wrong, and Google may limit host-to-host votes to 2, 3 or 5 per site, but in any case this number is small.
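Here is a sketch of the vote limitation. The `max_votes` parameter is a hypothetical knob: 1 matches the worst-case assumption above, while 2, 3 or 5 are the alternatives mentioned. To keep the demo quick, it uses 10,000 site-wide links instead of a million.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_votes(page_links, max_votes=1):
    """Count host-to-host votes, capping every (source host, target host) pair.

    max_votes is a hypothetical knob: 1 is the worst-case assumption, while
    2, 3 or 5 are the alternatives mentioned in the article. Internal links
    (same host on both ends) cast no vote.
    """
    raw = Counter()
    for source, target in page_links:
        a, b = urlsplit(source).hostname, urlsplit(target).hostname
        if a != b:
            raw[(a, b)] += 1
    return {pair: min(count, max_votes) for pair, count in raw.items()}


# A hypothetical site-wide link: every page of x.example links to y.example.
sitewide = [(f"http://x.example/p{i}.html", "http://y.example/") for i in range(10_000)]

print(host_votes(sitewide))               # one vote, despite 10,000 links
print(host_votes(sitewide, max_votes=5))  # at most five votes
```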
External links (links to other sites)

Under the HostRank model, external links won't leak your HostRank (they may do that in very limited contexts, but in most cases they won't). Under the classic PageRank model, outgoing links leaked some PR.

All in all, HostRank encourages host-to-host voting (linking). You won't lose your rankings if you vote for other sites.

All outgoing links carry the same weight. You can't bury your link partners 5 levels deep. They'll get an even share of your total HostRank.
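Assume HostRank flows like PageRank. Then a host passes d times its score to the hosts it links to, split evenly among them. This hypothetical helper (d = 0.85 is an assumption) shows that the split depends only on the set of distinct linked hosts. How deep inside the site a link sits, or how many pages repeat it, makes no difference.

```python
def vote_shares(host_rank, linked_hosts, d=0.85):
    """Evenly split the HostRank a host passes on (d * host_rank).

    linked_hosts may contain repeats (the same partner linked from many pages
    or from deep pages); only the distinct hosts count.
    """
    targets = set(linked_hosts)
    if not targets:
        return {}
    share = d * host_rank / len(targets)
    return {host: share for host in targets}


# Hypothetical outgoing links collected from every page of one site.
outgoing = ["p1.example", "p2.example", "p2.example", "p3.example", "p4.example"]
print(vote_shares(4.0, outgoing))
```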
HostRank is scalable

The number of sites is much smaller than the number of pages, and it grows at a much slower pace. HostRank calculations are performed on a much smaller graph, and that graph grows relatively slowly.
What about the SandBox?

HostRank recalculations are done periodically. If you don't have a high HostRank, you have no chance of ranking for competitive keywords. New sites take time to acquire links from many other domains, and then they have to wait for the next HostRank recalculation.

And because PageRank is different from HostRank, webmasters who look only at their PageRank wrongly assume they should rank high, and then conclude that Google has put them in a SandBox. Google does not deliberately put new sites in a sandbox. New sites simply have a naturally low HostRank.
How can a new site rank high for competitive keywords?

1. Get links from as many unique high HostRank sites as fast as possible
2. Google must crawl and index these links
3. Google must recalculate the HostRank of all sites (this is probably done a couple of times per year).

How does Google distribute HostRank among the internal pages of a site?

The answer is LocalRank (I have previously referred to LocalRank as a formula from the Google re-ranking patent. That was a mistake. The Google re-ranking patent deals with a LocalScore, not LocalRank).

LocalRanks of a site are calculated by running PageRank on a graph including only the internal pages and links of a site. All external links are ignored. The final values are divided by the number of pages a site has. The resulting LocalRanks of a site are numbers from 0 to 1 and their sum is 1.
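Here is a minimal sketch of the LocalRank calculation described above. It uses the normalized PageRank form, (1 - d) / N + d * (sum of incoming shares), with d = 0.85. When no page lacks outgoing links, this is equivalent to dividing the classic scores by the page count. Some pages may have no internal outgoing links; spreading their rank evenly over the site is an assumption of this sketch, and it keeps the total at exactly 1. The page names are hypothetical.

```python
def local_rank(internal_links, d=0.85, iterations=100):
    """PageRank over one site's internal graph only, scaled so the values sum to 1.

    internal_links maps each page to the pages it links to on the same site;
    links to other sites must already be removed.
    """
    pages = set(internal_links) | {t for ts in internal_links.values() for t in ts}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no internal outlinks is spread evenly (assumption).
        dangling = sum(rank[p] for p in pages if not internal_links.get(p))
        new = {p: (1.0 - d + d * dangling) / n for p in pages}
        for source, targets in internal_links.items():
            for t in targets:
                new[t] += d * rank[source] / len(targets)
        rank = new
    return rank


# Hypothetical five-page site (external links already stripped).
site = {
    "home": ["about", "products", "contact"],
    "about": ["home"],
    "products": ["home", "widget"],
    "widget": ["home", "products"],
    "contact": ["home"],
}
ranks = local_rank(site)
for page, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:<9} {value:.3f}")
print("sum:", round(sum(ranks.values()), 6))
```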

T. Haveliwala (author of Topic-Sensitive PageRank, now working at Google) has a paper about faster PageRank calculations, where the first step is calculating these LocalRanks (local PageRanks). Haveliwala noticed that LocalRanks give less weight to the home page than the original PageRank implementation does. That's easily explained: LocalRanks ignore links from other sites, which mostly point to the home page. LocalRanks make it easier for internal pages to rank well.
Combining HostRank of a site with the LocalRank of a page

I guess the simplest way to get a final importance score for a page is to multiply the two factors. Example: if a site's HostRank is 300 and its home page has a LocalRank of 0.2, then the home page's final importance rank is 300 * 0.2 = 60.

Because the LocalRanks of a site sum to 1, the combined ranks of its pages sum to the HostRank value. In other words, all pages of a site together have a total weight (importance) equal to the site's HostRank.

LocalRank1 + LocalRank2 + ... + LocalRankN = 1
HostRank x (LocalRank1 + LocalRank2 + ... + LocalRankN) = HostRank
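The multiplication above is easy to check in code. This sketch implements only the author's guessed combination (HostRank times LocalRank). The numbers are the article's 300 and 0.2 example plus two hypothetical pages, with the LocalRanks chosen so they sum to 1.

```python
def combined_rank(host_rank, local_ranks):
    """Author's simplest guess: page importance = HostRank * LocalRank."""
    return {page: host_rank * lr for page, lr in local_ranks.items()}


# Hypothetical site with HostRank 300; its LocalRanks sum to 1.
local_ranks = {"home": 0.2, "products": 0.5, "about": 0.3}
final = combined_rank(300, local_ranks)
print(final["home"])        # the article's example: 300 * 0.2
print(sum(final.values()))  # equals the HostRank, up to float rounding
```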

To increase the importance of a page, you can increase the HostRank of its site (links from other sites), increase its LocalRank by manipulating the internal linking structure, or get more keyword-rich links. The first method has the highest potential.

Let's recap: HostRank is what makes some sites rank well for almost anything. Low HostRank makes a site rank only for very uncompetitive keywords. A new site starts with a HostRank of 1 and that value will be changed in the next HostRank update.

In his paper, Haveliwala mentions other advantages of LocalRanks:

1. The LocalRanks of a site are computed very fast. They involve a small data set and the values converge very fast.
2. LocalRank calculations can be parallelized. One computer can calculate the LocalRanks of site A, while another can calculate the LocalRanks of site B. That fits perfectly with Google's distributed architecture.
3. LocalRanks can be used to speed up the calculation of the classic page-to-page PageRanks.
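Each site's LocalRanks depend only on that site's internal links, so the per-site jobs are independent of each other. This sketch fans them out with Python's `concurrent.futures`. A `ThreadPoolExecutor` keeps the example simple, but CPU-bound work in CPython would need `ProcessPoolExecutor` or separate machines for a real speedup. The `local_rank` helper uses the same sketch assumptions as before (d = 0.85, dangling rank spread evenly), and the sites are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def local_rank(internal_links, d=0.85, iterations=100):
    """PageRank over one site's internal links only, scaled to sum to 1."""
    pages = set(internal_links) | {t for ts in internal_links.values() for t in ts}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without internal outlinks is spread evenly (assumption).
        dangling = sum(rank[p] for p in pages if not internal_links.get(p))
        new = {p: (1.0 - d + d * dangling) / n for p in pages}
        for source, targets in internal_links.items():
            for t in targets:
                new[t] += d * rank[source] / len(targets)
        rank = new
    return rank


# Hypothetical sites: each entry holds only that site's internal link graph.
sites = {
    "a.example": {"home": ["about"], "about": ["home"]},
    "b.example": {"home": ["faq", "shop"], "faq": ["home"], "shop": ["home", "faq"]},
}

# No site's job needs data from another site, so the jobs can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(zip(sites, pool.map(local_rank, sites.values())))

for site, ranks in results.items():
    print(site, {page: round(r, 3) for page, r in sorted(ranks.items())})
```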

Rank updates

1. HostRanks are recalculated infrequently. When that happens, big changes in the SERPs follow.
2. LocalRanks can be distributed among Google's thousands of commodity PCs, so they are probably recalculated more often. When you change your internal linking structure or add new pages, the LocalRanks will change in the next LocalRank update of your site.

The final keyword ranking score

The final ranking score of a page for some keywords is a combination of HostRank, LocalRank and the Information Retrieval Score (IR Score). The IR Score is a keyword-dependent score that weighs keywords found on-page and off-page. I will elaborate on the IR Score in another article.

The major ranking factors are (in order of importance):

1. HostRank (how important is your site; host popularity). Get that huge, and you'll rank for almost anything.
2. IR Score (how relevant or specific your page is to the searched keywords).
3. LocalRank (how important the page is relative to the other pages on your site).

How do you get HostRank?

The major sites that can give you a large HostRank link are DMOZ and Yahoo's directory. Although they have loads of outgoing links, they have many more incoming ones.

Submit articles to as many different sites as possible.

Write a press release and submit it to various sites.

If you have a site where people post a lot of outgoing links (such as a forum), put the forum on a subdomain or use the nofollow attribute (rel="nofollow") on those links.
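Here is a small sketch of how a crawler that honors nofollow could separate links, using Python's standard `html.parser`. It assumes that nofollow links are simply dropped from the link graph, so they pass no HostRank. The HTML snippet and URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, keeping rel="nofollow" links separate."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel can hold several space-separated values, e.g. rel="nofollow external".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


page = """
<p>Useful: <a href="http://partner.example/">partner</a></p>
<p>Posted by a user: <a href="http://spam.example/" rel="nofollow">cheap stuff</a></p>
"""
collector = LinkCollector()
collector.feed(page)
collector.close()
print("counted:", collector.followed)
print("ignored:", collector.nofollow)
```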

Unlike the original PageRank, HostRank favors link swaps! Start a reciprocal linking campaign. Swap with sites that have DMOZ and Yahoo listings. Swap with sites that have high HostRank (sites that rank well for competitive keywords).

HostRank is different from PageRank! I have a PR6 site that has a lower HostRank value than a PR4 site (that site outranks mine). My site has more PR because of site-wide links from a friend's site and internal linking that prevents PR leakage.

You may even disable the Toolbar PageRank. Just forget about it. Toolbar PR is not always related to HostRank. Frequently, optimized sites have inflated PageRank and a much lower HostRank value! Check their links on Yahoo and MSN.

Be careful when buying links! Don't buy site-wide links, and ignore toolbar PR. Look for a site with lots of links from lots of other hosts.
LocalRanks and your Internal Linking Structure

I will cover this in a more detailed article. Generally:

1. On every page, put links to your most important pages.
2. On every page, remove links to pages that aren't important.
3. Don't try to prevent PR leakage by stuffing your link exchange pages with loads of internal links. They DON'T help; they dilute the internal value of the important pages. Leave links to your home page and the other important sections.
4. Stuff one or more pages with LocalRank and divert it to the important ones. There are 2 scenarios:
4.1. You want your home page to be as important as possible. In this case, create a page X that links only to your home page and put links to page X on every page of your site. You may do that with more than one page (Y, Z etc.). My about page links only to my home page, and every page links to my about page.
4.2. You want to spread the LocalRank importance among a set of pages. In this case, do the same as in the previous case, but on page X link to all the important pages. Site maps are a great example of a page that channels LocalRank to a set of internal pages.
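The page X trick from scenario 4.1 can be tried on a toy site with a minimal LocalRank sketch. The sketch assumes normalized PageRank over internal links only, with d = 0.85 and the rank of pages without outlinks spread evenly. The pages are hypothetical: x links only to home, and every page links to x.

```python
def local_rank(internal_links, d=0.85, iterations=100):
    """PageRank over one site's internal links only, scaled to sum to 1."""
    pages = set(internal_links) | {t for ts in internal_links.values() for t in ts}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without internal outlinks is spread evenly (assumption).
        dangling = sum(rank[p] for p in pages if not internal_links.get(p))
        new = {p: (1.0 - d + d * dangling) / n for p in pages}
        for source, targets in internal_links.items():
            for t in targets:
                new[t] += d * rank[source] / len(targets)
        rank = new
    return rank


# Hypothetical four-page site before the change.
before = {
    "home": ["a", "b", "c"],
    "a": ["home", "b"],
    "b": ["home", "c"],
    "c": ["home", "a"],
}
# Scenario 4.1: every page also links to page x, and x links only to home.
after = {page: links + ["x"] for page, links in before.items()}
after["x"] = ["home"]

print(f"home before: {local_rank(before)['home']:.3f}")
print(f"home after:  {local_rank(after)['home']:.3f}")
```

The site gains a page, so the same total of 1 is shared among more pages. Even so, the home page's share goes up in this toy graph, because x funnels everything it receives back to home.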

Google's Guidelines

If you haven't read Google's guidelines lately, be sure to read them again. You'll notice how closely they match the HostRank and LocalRank algorithms (submit to DMOZ and Yahoo, use a sitemap, etc.).

I hope you enjoyed this article and found it useful. If so, don't forget to sign up for my SEO newsletter if you haven't done so yet. If you have an SEO/webmaster-related site, I would appreciate a link from you.

If you have questions, ask them on my forum.

I will have to rewrite some parts of the other articles on this site, since they are based on the classic PageRank model.

In the next articles, I'll write about the IR Score, strategies for managing a group of sites' HostRanks and manipulating internal LocalRanks.

Good Luck!

Note: No part of this article may be reposted on other sites or media without permission from its author. If you want to reference it, post a link to this page.
Related Papers

Ranking the Web Frontier - A paper dealing with HostRank from the IBM Almaden Research Center.

Exploiting the Block Structure of the Web for Computing PageRank - A paper from T. Haveliwala. It has interesting info on LocalRanks (local PageRanks).

Using SiteRank for Decentralized Computation of Web Document Ranking - A paper discussing SiteRank (HostRank) algorithms, their relation to PageRank and their implications for search.

Who Links to Whom: Mining Linkage between Web Sites - A paper from Google dealing with host-to-host linkage relations.
#2
04-24-2006, 01:35 AM
raudi
Senior Member
GB Advanced User

Join Date: Apr 2006
Posts: 110

I skimmed through this, and it's a super good read. Thanks a lot!