Nuke Cops :: View topic - GoogleTap mod_rewrite & SID Defined
Zhen-Xjell
Nuke Cops Founder


Joined: Nov 14, 2002
Posts: 5939


Posted: Sat Aug 30, 2003 7:38 pm

Hi folks, if you have any questions or comments feel free to reply.

I've received some questions lately about what Google Tap is all about. To answer, I'd like to start off with "why GoogleTap"...

I started up http://computercops.biz over 1.5 years ago and wanted to have the site crawled by Google. It was not very successful. The same thing happened with this site, http://nukecops.com.

This is where Google Tap comes into play... I've been told that someone originally wrote the mod_rewrite code for an old version of php-nuke; if you know his name, please feel free to comment. This original code snippet was introduced to me by the same member who later released his Googlifier. I've credited him in the GT release package.

With history aside, the original code was ported into the new php-nuke CMS, and a complete rewrite of the forum and other links was done. A mod_rewrite overhaul, if you will.

This recode removed the &SID from the news articles and also from the forums. Google treats a URL carrying that ID as tied to a single session (the kind of ID normally kept in a cookie), and as such it ignores those URLs.
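To make "removing the &SID" concrete, here is a minimal Python sketch that drops a session-ID query parameter from a URL. It assumes the parameter is named `sid` (as phpBB 2 uses for its session ID); the function name `strip_session_id` and the example URL are hypothetical, and the parameter name on your own site should be checked before relying on this.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_session_id(url: str, param: str = "sid") -> str:
    """Return url with the session-ID query parameter removed.

    The default name "sid" is an assumption for illustration; other query
    parameters and the URL fragment are kept as they are.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != param.lower()
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With the session ID gone, every visitor (and every crawler) sees the same URL for the same topic, which is what lets a search engine treat it as one indexable page.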

That is why the php-nuke news and forums are not indexed.

Now I know many of you have wonderful data in both of these modules, and I for one wanted to share my sites via Google and other crawlers.

From there, the forums have been opened up and so has the news. And GoogleTap has taken on a life of its own shortening links to other modules.

This is how it works...

header.php contains PHP code that, while the page is being generated, finds all matching URLs and translates them into the shorter URLs defined in an array in header.php.
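As a rough illustration of that long-to-short step: PHP's `preg_replace()` accepts an array of patterns and an array of replacements and applies them in order, and the Python sketch below mirrors that idea on a page of HTML. The list names `URLIN`/`URLOUT`, the short names `topic362.html`/`forum7.html`, and the patterns themselves are hypothetical examples, not the actual GoogleTap arrays.

```python
import re

# Hypothetical counterparts of the two arrays in header.php: regular
# expressions that match long php-nuke URLs, and their short replacements.
# Generated HTML usually writes "&" as "&amp;", so both forms are accepted.
# The lookahead (?=[\"']) only rewrites a URL that ends at the closing quote,
# so links with extra parameters (e.g. &start=15) are left untouched.
URLIN = [
    r"modules\.php\?name=Forums&(?:amp;)?file=viewtopic&(?:amp;)?t=([0-9]+)(?=[\"'])",
    r"modules\.php\?name=Forums&(?:amp;)?file=viewforum&(?:amp;)?f=([0-9]+)(?=[\"'])",
]
URLOUT = [
    r"topic\1.html",
    r"forum\1.html",
]


def shorten_urls(html: str) -> str:
    """Apply each pattern/replacement pair to the page, in order."""
    for pattern, replacement in zip(URLIN, URLOUT):
        html = re.sub(pattern, replacement, html)
    return html
```

For example, `shorten_urls('<a href="modules.php?name=Forums&amp;file=viewtopic&amp;t=362">')` turns the link into `<a href="topic362.html">`.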

When you click on the shorter links, the web server (Apache) can't find them, because no such files exist. This is where mod_rewrite comes into play: it maps the shorter rewritten link back to the real, long php-nuke link, all behind the scenes. Transparent, if you will.
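The reverse step that mod_rewrite performs can be sketched in Python as an ordered list of rules where the first match wins, much like rules carrying the `[L]` flag. The rule set and short names below are hypothetical; the comment shows roughly how the first rule might look as a RewriteRule line in .htaccess (in per-directory context the matched path has no leading slash).

```python
import re
from typing import Optional

# Hypothetical short -> long rules. The first one corresponds roughly to
#   RewriteRule ^topic([0-9]+)\.html$ modules.php?name=Forums&file=viewtopic&t=$1 [L]
# in .htaccess; real GoogleTap rules use different short names.
REWRITE_RULES = [
    (r"^topic([0-9]+)\.html$", r"modules.php?name=Forums&file=viewtopic&t=\1"),
    (r"^forum([0-9]+)\.html$", r"modules.php?name=Forums&file=viewforum&f=\1"),
]


def expand_url(path: str) -> Optional[str]:
    """Map a short path back to the real php-nuke URL.

    Rules are tried in order and the first match wins. Returns None when
    nothing matches; Apache would then answer 404, since the short "file"
    does not exist on disk.
    """
    for pattern, target in REWRITE_RULES:
        if re.match(pattern, path):
            return re.sub(pattern, target, path)
    return None
```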

Now, the problem is that mod_rewrite can be a major resource hog and typically makes Apache processes run away. It can effectively bring an entire server to a halt through memory leaks.

For the longest time I had to run a cron job that monitored the server load average. I had the threshold set to 5; when it was reached, the cron job restarted the Apache server, forcing the load back below 5. Anything above 5 started to cause a noticeable delay in page loads.

However, since the server was upgraded to a dual-CPU system, I no longer have high server load issues, and Apache no longer runs away. So this is truly a case of "buy better hardware to solve the software hog problem".

Google Tap has effectively enabled php-nuke sites to be reached by millions of Internet users. The key is to stay on top of its development and keep future code in line with php-nuke code changes and add-ons.

At the same time I try new techniques to optimize the mod_rewrite code.

As a quick synopsis...

Within header.php, the first array holds regular expressions that match the php-nuke URLs. The second array takes each matching URL and crafts it into the new, shorter URL.

Hence header.php does this:

long url --> short url

Then .htaccess, using mod_rewrite, does this:

short url --> long url
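Because header.php and .htaccess must be exact mirror images of each other, a small round-trip check can catch a rule that was changed on one side only. This is a hypothetical Python sketch: each rule pairs a header.php-style pattern with its .htaccess-style inverse, and the names and URLs are illustrative only.

```python
import re
from typing import Optional

# Hypothetical rule pair: the header.php direction (long -> short) and the
# .htaccess direction (short -> long). Each pair must be an exact inverse.
RULES = [
    {
        "long": r"modules\.php\?name=Forums&file=viewtopic&t=([0-9]+)",
        "short_tpl": r"topic\1.html",
        "short": r"topic([0-9]+)\.html",
        "long_tpl": r"modules.php?name=Forums&file=viewtopic&t=\1",
    },
]


def long_to_short(url: str) -> str:
    """header.php direction: return the short form, or url unchanged."""
    for rule in RULES:
        m = re.fullmatch(rule["long"], url)
        if m:
            return m.expand(rule["short_tpl"])
    return url


def short_to_long(url: str) -> Optional[str]:
    """.htaccess direction: return the long form, or None if no rule matches."""
    for rule in RULES:
        m = re.fullmatch(rule["short"], url)
        if m:
            return m.expand(rule["long_tpl"])
    return None


def round_trips(url: str) -> bool:
    """True if url is shortened and the short form maps back to url exactly."""
    short = long_to_short(url)
    return short != url and short_to_long(short) == url
```

Running `round_trips()` over a sample of real links from the site is one way to spot links that header.php shortens but mod_rewrite cannot expand, which would show up to visitors as broken pages.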

Remember these requirements for mod_rewrite to work:

- mod_rewrite needs to be compiled and loaded into Apache
- AllowOverride needs to be set to "All" for the directory your site resides in (mod_rewrite in .htaccess needs at least "FileInfo")
- RewriteEngine needs to be turned "On" in the .htaccess file

I can go into more detail if you like, so please let me know.

Thanks for your time.

For historical purposes, GT grew out of this thread:

http://www.nukecops.com/postt362.html

_________________
Paul Laudanski, Microsoft MVP Windows-Security
CastleCops: [de] [en] [wiki]
sting
Site Admin


Joined: Jul 24, 2003
Posts: 1985

Location: Apparently ALWAYS Online. . .

Posted: Fri Nov 07, 2003 8:54 am

Could you post an example of your cron job to handle the runaway processes?

Thanks,


-sting

_________________
Is it paranoia if they are really out to get you?

-------------------------------------------------------
sting usually hangs out at nukehaven.net
Zhen-Xjell
Nuke Cops Founder


Joined: Nov 14, 2002
Posts: 5939


Posted: Fri Nov 07, 2003 8:56 am

filename: loadavg.pl

Code:

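#!/usr/bin/perl
# loadavg.pl: restart Apache when the 1-minute load average exceeds 5.
# Intended to be run periodically from cron.
use strict;
use warnings;

my $threshold = 5;

# /proc/loadavg holds the 1-, 5- and 15-minute averages, then process counts.
open(my $load_fh, '<', '/proc/loadavg') or die "couldn't open /proc/loadavg: $!\n";
my @load = split ' ', <$load_fh>;
close($load_fh);

if ($load[0] > $threshold) {
    system('/sbin/service', 'httpd', 'restart');
}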
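For comparison, here is a hypothetical Python port of the same check. The parsing and threshold logic are split into pure functions so they can be tested without touching the server. The restart command is copied from the Perl script and may differ on other distributions. `main()` is deliberately not called automatically; it would be invoked from whatever script cron runs.

```python
import subprocess

THRESHOLD = 5.0  # same threshold as loadavg.pl


def one_minute_load(loadavg_text: str) -> float:
    """Parse the first field (the 1-minute average) of /proc/loadavg content."""
    return float(loadavg_text.split()[0])


def should_restart(loadavg_text: str, threshold: float = THRESHOLD) -> bool:
    """True when the 1-minute average is strictly above the threshold."""
    return one_minute_load(loadavg_text) > threshold


def main() -> None:
    """Read the live load average and restart Apache if it is too high."""
    with open("/proc/loadavg") as fh:
        text = fh.read()
    if should_restart(text):
        # Same command as the Perl script; adjust for your system.
        subprocess.run(["/sbin/service", "httpd", "restart"], check=False)
```

Like the Perl version, this only relieves the symptom. Restarting Apache also drops any requests in flight at that moment.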

KaTXi
Nuke Soldier


Joined: Jul 02, 2003
Posts: 13


Posted: Tue Nov 11, 2003 11:11 pm

mod_rewrite is killing my web site.
Is there any way to use mod_rewrite only when selected bots are visiting our sites, or the first time someone arrives from a rewritten URL (after searching on Google, for example)?
That should reduce the impact of mod_rewrite.
kjcdude
Captain


Joined: Jun 10, 2003
Posts: 441

Location: Southern California

Posted: Tue Nov 11, 2003 11:19 pm

I have also found that many problems arise from mod_rewrite, but I am still going to use GT because it gets more of my pages cached.

_________________
Diablo Heat | The OC Sucks Hot or Not | TheOCSucks.com The OC Sucks
chris
Support Mod


Joined: Jul 17, 2003
Posts: 12


Posted: Wed Nov 12, 2003 2:21 am

KaTXi wrote:

Is there any way to use mod_rewrite only when selected bots are visiting our sites


Certainly. Suppose the bot has the IP address 192.168.0.0. To tell mod_rewrite that the following rule applies only to requests coming from that IP address, prepend the line

Code:

RewriteCond %{REMOTE_ADDR} ^192\.168\.0\.0$


to the RewriteRule line you want it to apply to.

The Google bots crawl from these IP ranges:

Google deepbot (IP 216.x.x.x)
Google freshbot (IP 64.x.x.x)

This list may be incomplete, though.
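A hypothetical Python sketch of the matching such conditions perform on the client address. Several RewriteCond lines are ANDed by default, so listing alternatives needs the `[OR]` flag on all but the last. The prefixes come from the ranges above. Anchoring with `^` keeps `^64\.` from also catching 164.x.x.x, but whole 216.x and 64.x blocks still cover far more than Google. Also note that if the rewrite only runs for bot addresses while header.php still shortens links for everyone, ordinary visitors following a short link would get a 404.

```python
import re

# Prefix patterns in RewriteCond style, taken from the ranges quoted in the
# thread (which the thread itself says may be incomplete).
BOT_ADDR_PATTERNS = [r"^216\.", r"^64\."]


def is_listed_bot(remote_addr: str) -> bool:
    """True if remote_addr matches any pattern, like a chain of
    RewriteCond %{REMOTE_ADDR} ... [OR] lines."""
    return any(re.search(p, remote_addr) for p in BOT_ADDR_PATTERNS)
```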
Powered by phpBB © 2001, 2005 phpBB Group

Ported by Nuke Cops © 2003 www.nukecops.com
:: FI Theme :: PHP-Nuke theme by coldblooded (www.nukemods.com) ::
Web site engine's code is Copyright © 2002 by PHP-Nuke. All Rights Reserved. PHP-Nuke is Free Software released under the GNU/GPL license.
Nuke Cops Founded by Paul Laudanski (Zhen-Xjell)