Archive for March 2011

Preventing Duplicate Content

17 March 2011

Duplicate content is a problem on many websites, and most webmasters don’t realise they are doing anything wrong. Search engines want to provide relevant results for their users; that is how Google became successful. If a search engine returned five identical pages on the same results page, it would not be useful to the searcher. Many search engines therefore have filters in place to remove duplicate listings. This keeps their results clean and is, overall, a good feature. From a webmaster’s point of view, however, you don’t know which copy of the content the search engine is hiding, and it can put a real damper on your marketing efforts if the search engines won’t show the copy you are trying to promote.

A common request is to remove “index.php” from the URL, or to redirect it. This is possible only on the server side, either in “.htaccess” configuration files or in your main server config, using Apache’s mod_rewrite module.

Duplicate content occurs when the search engine finds identical content at different URLs, such as:

www and non-www

http://www.iwebdev.it and http://iwebdev.it

In most cases these return the same page; in other words, a duplicate of your entire site.

root and index

http://www.iwebdev.it (root) and http://www.iwebdev.it/index.php

Most people’s homepages are available by typing either URL – duplicate content.

Session IDs

http://www.iwebdev.it/project.php?PHPSESSID=24FD6437ACE578FEA5745

This problem affects many dynamic sites, including PHP, ASP and ColdFusion sites. Many forums are poorly indexed because of it as well. A new session ID is issued every time a visitor comes to your site. In other words, every time the search engine indexes your site, it gets the same content at a different URL. Amazingly, most search engines aren’t clever enough to detect and fix this, so it’s up to you as a webmaster.

One page, multiple URLs

http://www.iwebdev.it/project.php?category=web&product=design and http://www.iwebdev.it/project.php?category=software&product=design

A product may be allocated to more than one category – in this case the “product detail” page is identical, but it’s available via both URLs.
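
To make the first three cases concrete, here is a minimal Python sketch of how such variants could be collapsed into one canonical URL. It is only an illustration, not something the site itself runs: the function name `canonical_url`, the list of index file names and the choice of the www host as the preferred form are all assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

BARE_HOST = "iwebdev.it"
CANONICAL_HOST = "www.iwebdev.it"   # assumption: the www form is the preferred one
INDEX_FILES = {"index.php", "index.htm", "index.html", "index.asp"}  # assumed list
SESSION_PARAMS = {"PHPSESSID"}      # assumption: only PHP session IDs appear


def canonical_url(url):
    """Collapse www/non-www, index-file and session-ID variants into one URL."""
    parts = urlsplit(url)

    # www and non-www: force the preferred host name.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST

    # root and index: drop a trailing index file name.
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"

    # Session IDs: remove them from the query string, keep everything else.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in SESSION_PARAMS]

    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))
```

With this sketch, `http://iwebdev.it` and `http://iwebdev.it/index.php` both map to `http://www.iwebdev.it/`. The fourth case (one page under several categories) cannot be solved by string rewriting alone, because only the site knows which query parameters actually change the content.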

Removing Duplicate Content
Having duplicate content on your site can make marketing significantly more difficult, especially when you are marketing the non-www version and Google is only showing the www version. Because you can’t tell the search engines which is the “original” copy, you must prevent any duplicate content from occurring on your site.

www and non-www
I prefer to use the www version of my domain (no particular reason, it seems to look better on paper). If you are using Apache as your web server, you can include the following lines in your .htaccess file (change the values to your own of course).

RewriteEngine On
# Match the bare domain only (dots escaped, case-insensitive)
RewriteCond %{HTTP_HOST} ^iwebdev\.it$ [NC]
RewriteRule ^(.*)$ http://www.iwebdev.it/$1 [R=301,L]

If your webhost does not let you edit the .htaccess file, I would consider finding a new host. When it comes to removing duplicate content and producing search engine friendly URLs, Apache’s .htaccess is too good to ignore. If your website is hosted on Microsoft IIS, I recommend ISAPI Rewrite instead.

Remove all references to “index.php”
Your homepage should never be referred to as index.htm, index.php, index.asp etc. When you build incoming links, you will always get links to www.iwebdev.it, so your internal links should point to the same URL. One of my sites had a different PageRank on “/” (root) and “index.php” because the internal links pointed to index.php, creating duplicate content. Why go to the trouble of promoting two “different” pages at half strength when you can promote a single URL at full strength? After you have removed all references to the index file, you should set up a 301 redirect (below) from the index file (index.htm in the example) to / (root).

Remove Session IDs
I can only give advice for PHP users; ASP and ColdFusion users should do their own research on exactly how to remove these. With PHP, if the visitor’s browser does not accept cookies, the session ID can be inserted into the URL automatically as a way of maintaining state between pages. Most search engine crawlers don’t support cookies, which means they get a different PHPSESSID in the URL every time they visit, and this leads to very ugly indexing. There is no ideal solution, so I compromise: when sessions are a requirement for the website, I would rather lose a small number of visitors who don’t have cookies than put up with PHPSESSID in my search engine listings (and potentially lose a lot more visitors). To disable PHPSESSID in the URL, insert the following lines into .htaccess (php_value directives only work when PHP runs as an Apache module):

php_value session.use_only_cookies 1
php_value session.use_trans_sid 0

This means visitors with cookies turned off won’t be able to use any features of your site that rely on sessions, e.g. logging in or remembering form data.

Ensure all database-generated pages have unique URLs
This is somewhat more complicated, depending on how your site is set up. When I design pages, I always keep the “one page, one URL” rule in mind, and I design my page structure accordingly. If a product belongs to two categories, I ensure that both categories link to the same URL, or I modify the content significantly on both versions of the page so it’s not “identical” in the eyes of the search engine.
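
As a rough illustration of the “one page, one URL” rule, the Python sketch below builds product links from the product alone, so every category listing points at the same address. The catalogue data, the function names and the `product` query parameter are hypothetical; a real site would adapt this to its own templates and URL scheme.

```python
# Hypothetical catalogue: the same product is listed under two categories.
CATEGORIES = {
    "web": ["design"],
    "software": ["design"],
}


def product_url(product):
    """Build the single URL for a product, independent of any category."""
    return "http://www.iwebdev.it/project.php?product=" + product


def category_links(category):
    """Return the product links shown on a category page."""
    return [product_url(product) for product in CATEGORIES[category]]
```

Because the category never enters the URL, `category_links("web")` and `category_links("software")` return the same link for the shared product.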

301 Redirections
A 301 redirect is the correct way of telling the search engines that a page has moved permanently. If you still want the non-www domain name to work, you should 301 redirect the visitor to the www domain. The visitor will see the address change, and search engines will know to ignore the non-www version and use the www version instead. Use your .htaccess to 301 redirect visitors from index.htm to /, and to redirect any other pages that get renamed, e.g.

redirect 301 /index.htm http://www.iwebdev.it/

Collect syslog events to database (second part)

10 March 2011

In the previous post you installed syslog-ng 3.2.2. Now you have to configure the syslog-ng daemon to store events in a database; for this tutorial we chose MySQL and Postgres databases. First of all, open the syslog-ng configuration file.

nano /opt/syslog-ng/etc/syslog-ng.conf

Syslog-ng receives log messages from a source. To define a source, use the following syntax:

source <identifier> { source-driver(params); source-driver(params); … };

For example you have to define the following source:

source my_source{ tcp ( port ( 614 ) ); };

In syslog-ng, log messages are sent to destinations. The destination syntax is very similar to that of sources:

destination <identifier> {destination-driver(params); destination-driver(params); … };

You will normally log to a file, but you can also use a different destination driver: a pipe, a Unix socket, TCP or UDP ports, terminals or specific programs.

destination my_dest{ file("/var/log/mylog.txt"); };

If you want to collect syslog events in a database, you first have to create the MySQL database and table:

CREATE DATABASE `syslog` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

USE `syslog`;

CREATE TABLE IF NOT EXISTS `logs` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`host` varchar(128) collate utf8_unicode_ci default NULL,
`facility` varchar(10) collate utf8_unicode_ci default NULL,
`priority` varchar(10) collate utf8_unicode_ci default NULL,
`level` varchar(10) collate utf8_unicode_ci default NULL,
`tag` varchar(10) collate utf8_unicode_ci default NULL,
`datetime` datetime default NULL,
`program` varchar(15) collate utf8_unicode_ci default NULL,
`msg` text collate utf8_unicode_ci,
`seq` bigint(20) unsigned NOT NULL default '0',
`counter` int(11) NOT NULL default '1',
`fo` datetime default NULL,
`lo` datetime default NULL,
PRIMARY KEY (`id`),
KEY `datetime` (`datetime`),
KEY `sequence` (`seq`),
KEY `priority` (`priority`),
KEY `facility` (`facility`),
KEY `program` (`program`),
KEY `host` (`host`) )
ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER ON `syslog`.* TO 'syslog'@'localhost';

SET PASSWORD FOR 'syslog'@'localhost' = PASSWORD('syslog');

Edit the syslog-ng configuration accordingly: add these lines inside a destination statement (if you want to use Postgres, change mysql to pgsql):

sql(type(mysql)
host("localhost")
username("syslog")
password("syslog")
database("syslog")
table("logs")
columns("host", "facility", "priority", "level", "tag", "datetime", "program", "msg", "seq")
values("$HOST_FROM", "$FACILITY", "$PRIORITY", "$LEVEL", "$TAG", "$YEAR-$MONTH-$DAY $HOUR:$MIN:$SEC", "$PROGRAM", "$MSG", "$SEQNUM")
indexes("host", "facility", "priority", "datetime", "program", "seq"));
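
To picture what ends up in the table, the following Python sketch inserts one event into a simplified copy of the `logs` table and reads it back. It is not how syslog-ng works internally: it uses an in-memory SQLite database only so that it runs anywhere, the `store_event` helper is hypothetical, and the event values are invented to resemble what the macros in `values(...)` might expand to.

```python
import sqlite3

# Simplified stand-in for the `logs` table, with the same column names as the
# columns(...) option above. SQLite is an assumption made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE logs (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           host TEXT, facility TEXT, priority TEXT, level TEXT, tag TEXT,
           datetime TEXT, program TEXT, msg TEXT, seq INTEGER)"""
)

COLUMNS = ("host", "facility", "priority", "level", "tag",
           "datetime", "program", "msg", "seq")


def store_event(event):
    """Insert one event, binding each column to the matching dictionary key."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO logs ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [event[column] for column in COLUMNS],
    )


# Invented example event; real values come from the syslog-ng macros.
store_event({
    "host": "192.168.1.10", "facility": "auth", "priority": "info",
    "level": "info", "tag": "26", "datetime": "2011-03-10 09:05:07",
    "program": "sshd", "msg": "Accepted password for demo", "seq": 1,
})

rows = conn.execute(
    "SELECT host, program, msg FROM logs WHERE host = ?", ("192.168.1.10",)
).fetchall()
```

The same kind of SELECT, filtered by host, program or datetime, is what the indexes listed in `indexes(...)` are meant to speed up.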

Syslog-ng connects sources, filters and destinations with log statements. The syntax is:

log { source(src); filter(f_mail); filter(f_info); destination(mailinfo); };

So you have to connect my_source with my_dest:

log { source( my_source ); destination( my_dest ); };

To test the configuration, restart the syslog-ng daemon and send a test syslog event, for example with Kiwi Syslog Gen.
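
If you do not have a syslog generator at hand, a few lines of Python can send a test event too. This is a minimal sketch under some assumptions: the daemon listens on TCP port 614 as in the `my_source` example, it runs on the local machine (127.0.0.1), and it accepts newline-terminated BSD-style messages. The function names are made up for this example.

```python
import socket
from datetime import datetime


def build_syslog_message(facility, severity, host, program, text, when=None):
    """Format a BSD-style syslog line: <PRI>timestamp host program: text."""
    pri = facility * 8 + severity          # priority value = facility * 8 + severity
    when = when or datetime.now()
    # The BSD format pads the day of month with a space, not a zero.
    stamp = f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<{pri}>{stamp} {host} {program}: {text}"


def send_syslog_message(message, server="127.0.0.1", port=614):
    """Send one newline-terminated message over TCP (address and port assumed)."""
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(message.encode("utf-8") + b"\n")
```

Calling `send_syslog_message(build_syslog_message(1, 5, "testhost", "demo", "hello"))` should then produce a new line in /var/log/mylog.txt, or a new row in the logs table if the sql destination is connected in the log statement.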

Collect syslog events to database (first part)

9 March 2011

Syslog-ng is an open source implementation of the syslog protocol for Unix and Unix-like systems. It extends the original syslogd model with content-based filtering, flexible configuration options and important features such as TCP transport. Starting from version 3.0, syslog-ng can forward logs directly to a database (Postgres, MySQL, Firebird or SQLite). Compared with the old approach, which used a pipe and ran either a wrapper script or the mysql client directly, the new way saves a great deal of resources, because syslog-ng does not need to start a process every time there is a message to log. To use this feature you have to install syslog-ng 3.0 or later with the sql use flag enabled. To install syslog-ng, download the right version from the official site. For our purposes we download syslog-ng 3.2.2 (3.2.2/setups/linux-glibc2.3.6-i386).

wget -O syslog-ng-3.2.2-linux-glibc2.3.6-i386.run "http://www.balabit.com/downloads/files?path=/syslog-ng/sources/3.2.2/setups/linux-glibc2.3.6-i386/syslog-ng-3.2.2-linux-glibc2.3.6-i386.run"

Once you have downloaded the file, grant execute permission to syslog-ng-3.2.2-linux-glibc2.3.6-i386.run.

chmod +x syslog-ng-3.2.2-linux-glibc2.3.6-i386.run

Now you are ready to install syslog-ng.

./syslog-ng-3.2.2-linux-glibc2.3.6-i386.run

The first screen shows the path where syslog-ng will be installed; press “continue”.

The second screen summarizes the parameters of your system; press “yes” if the information is correct.

The third screen suggests checking that the “/opt/syslog-ng/bin” and “/opt/syslog-ng/sbin” directories are in the search PATH. To add them, put the following line into your shell profile:

PATH=/opt/syslog-ng/bin:/opt/syslog-ng/sbin:$PATH

The fourth step checks whether an old version of syslog-ng is installed. If the installer detects a configuration file from a previous syslog-ng installation, you can reuse this old configuration file. We choose “no”.

The installer generates a simple configuration file and asks whether you want to receive log messages from the network. We choose “yes”.

The last step asks whether you want to forward the log messages to a remote server; we choose “skip”.

Congratulations, we have installed syslog-ng 3.2.2.