25.1.1. Search engines and the GET method

When a search engine spider encounters a URL with many parameters while indexing your pages, it will ignore the URL and not index that particular page. Just how many parameters are too many for a search engine is difficult to say. The search engines are deliberately vague on this point, just as they are on almost every other point regarding their algorithms. For example, Google states the following in its Guidelines for Webmasters:

Allow search bots to crawl your sites without session ID's or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.

This does not mean that Google will not spider dynamic pages at all, although this seems to have been the case in the past. Sometime in the first months of 2003, Google's algorithms became intelligent enough to tackle the tricky problem of dynamic URLs, at least partly (see "Google is now better at spidering dynamic sites"). Observational evidence suggests that Google will now index a page whose URL contains no more than 2-3 parameters with short names. According to Marissa Mayer of Google, speaking at Search Engine Strategies Chicago, Day Two, Dec. 10th 2003, URLs with 2 parameters were the maximum for Google at that time.
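To check your own URLs against this observation, you can count their GET parameters. The following is a minimal sketch in Python, not a description of how any search engine actually decides: the threshold of 2 is taken from the remark above, the function names are made up for illustration, and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed threshold, based on the "2 parameters" observation in the text.
# It is not a documented search engine limit.
MAX_PARAMS = 2


def count_query_params(url: str) -> int:
    """Return the number of GET parameters in the URL's query string."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "?a=&b=1" still counts as two parameters
    return len(parse_qsl(query, keep_blank_values=True))


def likely_indexable(url: str, max_params: int = MAX_PARAMS) -> bool:
    """Heuristic only: True if the URL has at most max_params parameters."""
    return count_query_params(url) <= max_params


# Hypothetical URLs, for illustration only
short_url = "http://www.example.com/modules.php?name=News&file=article"
long_url = short_url + "&sid=123&mode=thread"

print(count_query_params(short_url), likely_indexable(short_url))
print(count_query_params(long_url), likely_indexable(long_url))
```

The first URL has two parameters and passes the heuristic. The second has four and does not.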

Still, the problem remains for URLs with many parameters, as well as for URLs with session IDs. A page that uses session IDs can generate an infinite number of URLs for a spider to visit, all pointing to the same content. These types of pages are blocked from being indexed not only by Google, but by other search engines as well.
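The session ID problem can be illustrated with a short Python sketch. It assumes the session ID travels in a GET parameter named sid or phpsessid. Those names, the helper function and the example.com URLs are assumptions made for illustration, and your site may use different ones.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; real sites may use others.
SESSION_PARAMS = {"sid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any session ID parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two visits to the same page, each with a different session ID
visit_1 = "http://www.example.com/modules.php?name=Forums&sid=a1b2c3"
visit_2 = "http://www.example.com/modules.php?name=Forums&sid=d4e5f6"

print(visit_1 == visit_2)  # False
print(strip_session_id(visit_1) == strip_session_id(visit_2))  # True
```

To a spider that compares URLs as plain strings, the two visits look like two different pages. Only after the session parameter is removed does it become clear that they are the same page.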

If a bot ignores a page due to a session ID or a large number of GET parameters on the URL, it will also ignore all pages referenced by that page (unless it finds its way to them through some other link that it can follow). Since every PHP-Nuke module is accessible through a URL of the form

and the PHP-Nuke forums with at least 4 URL parameters, one of which is a session ID,

you run the risk that the majority of your pages will be unknown to the search engines. As a rule of thumb we can say that, if this happens, it will cost you two thirds of your external referrals[1]. This can cost you your web existence and can mean the difference between success and failure for your website!

Why? Because search engines create multiple entry points into your website, a fact that many people fail to realize. Most people you know may be coming to your website through its main index.php page, mainly because it is easier to remember, or because it is the web address printed on the business card you gave them. But a well-indexed website will soon begin driving traffic to pages located deeper in the site. The search engines have rendered elaborately crafted entry pages almost obsolete: today, every page of your website can be an entry page.

The average webmaster also overlooks that these interior pages often draw a different kind of user than the index page: users arriving there are much more qualified because they are looking for information specific to a certain topic. Because they are looking for very specific information, they are also more likely to convert on a sale or action that you have prepared for them.

If the search engines are not able to spider your dynamic content because of the GET parameters in the URL, you lose out: these more qualified visitors often won't find your website. Thus it is very important that you find a way to make as much of your website as possible visible to the search engines.

If your budget affords it, you can choose the lazy way: some search engines, like Inktomi, offer a Paid Inclusion Program. In a Paid Inclusion Program, it is you who submits a list of URLs for the search engine to crawl, rather than the search engine finding them automatically. This way, the search engine can be sure that the URLs you submitted contain real content that is of importance to you and that no two URLs point to duplicate content of one and the same page (something that can easily happen with session IDs and an automatic spider, for example).

On the plus side of a Paid Inclusion Program, you will get your pages indexed, the URLs will be the correct ones and the world will be able to search and find you. The downside is that you have to pay for each and every URL you want to have indexed. If this strains your budget, you will have to search for alternatives.

It turns out that such an alternative exists, thanks to the Swiss Army knife of URL manipulation called mod_rewrite.



[1] A typical website will get about two thirds of its external traffic from search engines and one third from sites that link directly to it. Of course, your mileage may vary.
