When a search engine spider encounters a URL with many parameters while indexing your pages, it may ignore the URL and not index that particular page. Just how many parameters are too many for a
search engine is difficult to say. The search engines are deliberately vague on this point, just as they are on almost every other point regarding their algorithms. For example, Google states the following in its Guidelines for Webmasters:
Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access
pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the
same page.
This does not mean that Google will not spider dynamic pages at all, although this seems to have been the case in the past. Sometime in the first months of 2003, Google's
algorithms became intelligent enough to tackle the tricky problem of dynamic URLs, at least partly (see "Google is now
better at spidering dynamic sites"). Observational evidence suggests that Google will now index a page whose URL contains no more than 2-3 parameters with short names (URLs with 2 parameters are the
maximum for Google right now, said Marissa Mayer of Google at the Search Engine Strategies conference in Chicago, Day Two,
Dec. 10th 2003).
Still, the problem remains for URLs with many parameters, as well as for URLs that carry session IDs. A page that uses session IDs can generate a practically unlimited number of distinct URLs for a
spider to visit. These types of pages are blocked from being indexed not only by Google, but by other search engines as well.
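
To get a feeling for which of your own URLs fall into the risky category, the rule of thumb above can be turned into a small check. The following is only a rough sketch in Python, not a description of how any search engine actually works: the threshold of 2 parameters, the list of session parameter names, the function name crawl_risks and the example.com URLs are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed, incomplete list of parameter names that usually carry a session ID.
SESSION_PARAM_NAMES = {"sid", "phpsessid", "sessionid", "session_id"}


def crawl_risks(url, max_params=2):
    """Return the reasons why a cautious spider might skip this URL.

    max_params is an assumed threshold, based on the "2-3 parameters"
    observation in the text; it is not a documented limit.
    """
    # Split the query string into (name, value) pairs, keeping empty values.
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    risks = []

    if len(params) > max_params:
        risks.append(f"{len(params)} parameters (threshold {max_params})")

    session_like = [name for name, _ in params
                    if name.lower() in SESSION_PARAM_NAMES]
    if session_like:
        risks.append("session ID parameter: " + ", ".join(session_like))

    return risks


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the check.
    for url in (
        "http://www.example.com/modules.php?name=News",
        "http://www.example.com/modules.php?name=Forums&file=viewtopic&t=12&sid=abc123",
    ):
        print(url, "->", crawl_risks(url) or "no obvious risk")
```

Running such a check over the links of your own site gives a first impression of how many pages depend on long parameter lists or session IDs to be reached.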
If a bot ignores a page because of a session ID or a large number of GET parameters in the URL, it will also ignore all pages referenced by that page (unless it
finds its way to them through some other link that it can follow). Since every PHP-Nuke module is accessible through a URL of the form
and the PHP-Nuke forums with at least 4 URL parameters, one of which is a session ID,
you run the risk that the majority of your pages will be unknown to the search engines. As a rule of thumb we can say that, if this happens, it will cost you two thirds of your external
referrals. This can cost you your web existence and can mean the difference between success and failure for your
Why? Because search engines create multiple entry points into your website, a fact that many people fail to realize. Most people you know may be coming to your website through its main index.php
page, mainly because it is easier to remember, or because it is simply the web address printed on the business card you gave them. But a well-indexed website will soon begin driving traffic to pages
located deeper in the site. The search engines have rendered elaborately crafted entry pages almost obsolete: today, every page of your website can be an entry page.
The average webmaster also overlooks the fact that these interior pages often draw a different kind of user than the index page: users arriving there are much more qualified because they are
looking for information specific to a certain topic. Because they are looking for very specific information, they are also more likely to convert on a sale or action that you have prepared for
them.
If the search engines are not able to spider your dynamic content because of the GET parameters in the URL, you are losing out: these more qualified visitors often won't find your website. Thus it is
very important that you find a way to make as much of your website as possible visible to the search engines.
If your budget allows it, you can choose the lazy way: some search engines, like Inktomi, offer a Paid Inclusion Program. In a Paid Inclusion Program, it is you who submits a
list of URLs to the search engine to crawl, not the search engine that finds them automatically. This way, the search engine can be sure that the list of URLs you submitted contains real content that
is of importance to you and that no two URLs point to duplicate content of one and the same page (something that can easily happen with session IDs and an automatic spider, for example).
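
The duplicate problem mentioned above is easy to reproduce: two URLs that differ only in their session ID look different but deliver the same page. Before submitting a list of URLs, you could reduce it to one URL per page along the following lines. This is a minimal sketch in Python under stated assumptions: the session parameter names, the helper names canonical_url and unique_pages, and the idea of sorting the remaining parameters are illustrative choices, not part of any Paid Inclusion Program.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed, incomplete list of parameter names that usually carry a session ID.
SESSION_PARAM_NAMES = {"sid", "phpsessid", "sessionid", "session_id"}


def canonical_url(url):
    """Drop session ID parameters and sort the rest, so that URLs which
    differ only in session ID or parameter order compare as equal."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in SESSION_PARAM_NAMES]
    kept.sort()
    # Rebuild the URL without the session parameters and without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))


def unique_pages(urls):
    """Return one canonical URL per distinct page, in first-seen order."""
    seen = {}
    for url in urls:
        seen.setdefault(canonical_url(url), url)
    return list(seen)
```

Note that sorting the parameters assumes that their order does not matter to the application, which is usually true for query strings but should be verified for your own site.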
On the plus side of a Paid Inclusion Program, you will get your pages indexed, the URLs will be the correct ones and the world will be able to search and find you. The downside is that you have to
pay for each and every URL you want to have indexed. If this strains your budget, you will have to search for alternatives.
It turns out that such an alternative exists, thanks to the Swiss Army knife of URL manipulation: mod_rewrite.