27 August 2008

Get visistor's OS and Browser with PHP - Script 2

$HTTP_USER_AGENT=$_SERVER['HTTP_USER_AGENT'];

$os=""; $osh="";

if (eregi ("(win|microsoft)", $HTTP_USER_AGENT)==true) $os="Windows";

if (eregi ("(unix|x11|lynx|konqueror|w3m)", $HTTP_USER_AGENT)==true) $os="Unix";

if (eregi ("linux", $HTTP_USER_AGENT)==true) $os="Linux";

if (eregi ("bsd", $HTTP_USER_AGENT)==true) $os="*BSD";

if (eregi ("sunos", $HTTP_USER_AGENT)==true) $os="SunOS";

if (eregi ("hp-ux", $HTTP_USER_AGENT)==true) $os="HP-UX";

if (eregi ("os/2", $HTTP_USER_AGENT)==true) $os="OS/2";

if (eregi ("qnx", $HTTP_USER_AGENT)==true) $os="QNX";

if (eregi ("mac", $HTTP_USER_AGENT)==true) $os="Macintosh";

if (eregi ("(powerpc|ppc)(mac)", $HTTP_USER_AGENT)==true) $os="Macintosh PowerPC";

if (eregi ("beos", $HTTP_USER_AGENT)==true) $os="Beos";

if (eregi ("solaris", $HTTP_USER_AGENT)==true) $os="Solaris";

if (eregi ("amigaos", $HTTP_USER_AGENT)==true) $os="AmigaOS";

if (eregi ("windows nt|winnt", $HTTP_USER_AGENT)==true) $os="Windows NT";

if (eregi ("windows nt 4", $HTTP_USER_AGENT)==true) $os="Windows NT 4";

if (eregi ("(windows nt 5|windows 2000)", $HTTP_USER_AGENT)==true) $os="Windows 2000";

if (eregi ("(windows nt 5.1|windows xp)", $HTTP_USER_AGENT)==true) $os="Windows XP";

if (eregi ("windows me", $HTTP_USER_AGENT)==true) $os="Windows ME";

if (eregi ("(windows 98|win98)", $HTTP_USER_AGENT)==true) $os="Windows 98";

if (eregi ("(windows 95|win95)", $HTTP_USER_AGENT)==true) $os="Windows 95";

if (eregi ("(windows 3.1|win3.1)", $HTTP_USER_AGENT)==true) $os="Windows 3.1";

if (eregi ("(windows 3.11|win3.11)", $HTTP_USER_AGENT)==true) $os="Windows 3.11";

if (eregi ("(mandrake|mdk)", $HTTP_USER_AGENT)==true) $os="Linux Mandrake";

if (eregi ("debian", $HTTP_USER_AGENT)==true) $os="Linux Debian";

if (eregi ("webtv", $HTTP_USER_AGENT)==true) $os="WebTV";


$browser=""; $browserh="";

if (eregi ("(netscape|mozilla)", $HTTP_USER_AGENT)==true) $browser="Netscape";

if (eregi ("mozilla/3", $HTTP_USER_AGENT)==true) $browser="Netscape 3";

if (eregi ("mozilla/4.5", $HTTP_USER_AGENT)==true) $browser="Netscape 4.5";

if (eregi ("mozilla/4.6", $HTTP_USER_AGENT)==true) $browser="Netscape 4.6";

if (eregi ("mozilla/4.7", $HTTP_USER_AGENT)==true) $browser="Netscape 4.7";

if (eregi ("(mozilla/5|gecko/)", $HTTP_USER_AGENT)==true) $browser="Mozilla 1";

if (eregi ("netscape/6", $HTTP_USER_AGENT)==true) $browser="Netscape 6";

if (eregi ("msie", $HTTP_USER_AGENT)==true) $browser="IE";

if (eregi ("msie 3", $HTTP_USER_AGENT)==true) $browser="IE 3";

if (eregi ("msie 3.0", $HTTP_USER_AGENT)==true) $browser="IE 3.0";

if (eregi ("msie 3.01", $HTTP_USER_AGENT)==true) $browser="IE 3.01";

if (eregi ("msie 4", $HTTP_USER_AGENT)==true) $browser="IE 4";

if (eregi ("msie 4.0", $HTTP_USER_AGENT)==true) $browser="IE 4.0";

if (eregi ("msie 4.01", $HTTP_USER_AGENT)==true) $browser="IE 4.01";

if (eregi ("msie 5", $HTTP_USER_AGENT)==true) $browser="IE 5";

if (eregi ("msie 5.0", $HTTP_USER_AGENT)==true) $browser="IE 5.0";

if (eregi ("msie 5.01", $HTTP_USER_AGENT)==true) $browser="IE 5.01";

if (eregi ("msie 5.1", $HTTP_USER_AGENT)==true) $browser="IE 5.1";

if (eregi ("msie 5.5", $HTTP_USER_AGENT)==true) $browser="IE 5.5";

if (eregi ("msie 6", $HTTP_USER_AGENT)==true) $browser="IE 6";

if (eregi ("msie 6.0", $HTTP_USER_AGENT)==true) $browser="IE 6.0";

if (eregi ("msie 6.0b", $HTTP_USER_AGENT)==true) $browser="IE 6.0b";

if (eregi ("opera", $HTTP_USER_AGENT)==true) $browser="Opera";

if (eregi ("opera.2", $HTTP_USER_AGENT)==true) $browser="Opera 2";

if (eregi ("opera.3", $HTTP_USER_AGENT)==true) $browser="Opera 3";

if (eregi ("opera.4", $HTTP_USER_AGENT)==true) $browser="Opera 4";

if (eregi ("opera.5", $HTTP_USER_AGENT)==true) $browser="Opera 5";

if (eregi ("opera.5.11", $HTTP_USER_AGENT)==true) $browser="Opera 5.11";

if (eregi ("opera.5.12", $HTTP_USER_AGENT)==true) $browser="Opera 5.12";

if (eregi ("opera.6", $HTTP_USER_AGENT)==true) $browser="Opera 6";

if (eregi ("opera.7.01", $HTTP_USER_AGENT)==true) $browser="Opera 7.01";

if (eregi ("opera.7.10", $HTTP_USER_AGENT)==true) $browser="Opera 7.10";

if (eregi ("opera.7.11", $HTTP_USER_AGENT)==true) $browser="Opera 7.11";

if (eregi ("lynx", $HTTP_USER_AGENT)==true) $browser="lynx";

if (eregi ("w3m", $HTTP_USER_AGENT)==true) $browser="w3m";

if (eregi ("konqueror", $HTTP_USER_AGENT)==true) $browser="Konqueror";

echo $browser."".$os;

Get visistor's OS and Browser with PHP - Script 1


/**************************************************
* User Agent Detector
* -------------------
*
* Deluxe Portal Version 2.0
*
* This SINGLE FILE is Licensed under the LGPL which
* can be found at
* [url=http://www.gnu.org/licenses/lgpl.txt]http://www.gnu.org/licenses/lgpl.txt[/url]
**************************************************/
$_REMOTE = array();
$_REMOTE['USER_AGENT'] = $_SERVER['HTTP_USER_AGENT'];
$_REMOTE['BROWSER']['TYPE'] = 0;
$_REMOTE['BROWSER']['VERSION'] = 0;
$_REMOTE['BROWSER']['GECKO'] = 0;
$_REMOTE['OS']['PLATFORM'] = 0;
$_REMOTE['OS']['TYPE'] = 0;
$_REMOTE['OS']['VERSION'] = 0;

if (strpos($_REMOTE['USER_AGENT'], 'Opera') === 0)
{
$_REMOTE['BROWSER']['TYPE'] = 'Opera';
list($version, $dummy) = explode(' ', $_REMOTE['USER_AGENT'], 2);
list($dummy, $version) = explode('/', $version, 2);
if (strpos($version, 'v') === 0) $version = dp_substr($version, 1);
}
elseif (strpos($_REMOTE['USER_AGENT'], 'Mozilla') === 0)
{
if (strpos($_REMOTE['USER_AGENT'], 'Opera'))
{
$_REMOTE['BROWSER']['TYPE'] = 'Opera';
list($dummy, $version) = explode('\)', $_REMOTE['USER_AGENT'], 2);
list($dummy, $dummy, $version) = explode(' ', $version);
if (strpos($version, 'v') === 0) $version = dp_substr($version, 1);
}
elseif (strpos($_REMOTE['USER_AGENT'], '(compatible'))
{
if (strpos($_REMOTE['USER_AGENT'], 'Konqueror'))
{
$_REMOTE['BROWSER']['TYPE'] = 'Konqueror';
list($dummy, $version) = explode('Konqueror/', $_REMOTE['USER_AGENT'], 2);
list($version, $dummy) = explode(';', $version, 2);
}
elseif (strpos($_REMOTE['USER_AGENT'], 'MSIE'))
{
$_REMOTE['BROWSER']['TYPE'] = 'Internet Explorer';
list($dummy, $version) = explode('MSIE', $_REMOTE['USER_AGENT'], 2);
list($version, $dummy) = explode(';', $version, 2);
}
}
else
{
if (strpos($_REMOTE['USER_AGENT'], 'Netscape6'))
{
$_REMOTE['BROWSER']['TYPE'] = 'Netscape';
list($dummy, $version) = explode('Netscape6/', $_REMOTE['USER_AGENT'], 2);
}
elseif (strpos($_REMOTE['USER_AGENT'], 'Gecko'))
{
$_REMOTE['BROWSER']['TYPE'] = 'Mozilla';
list($dummy, $version) = explode('rv:', $_REMOTE['USER_AGENT'], 2);
list($version, $dummy) = explode('\)', $version, 2);
}
elseif (strpos($_REMOTE['USER_AGENT'], 'Sun'))
{
$_REMOTE['BROWSER']['TYPE'] = 'Hot Java';
list($version, $dummy) = explode(' ', $_REMOTE['USER_AGENT'], 2);
list($dummy, $version) = explode('/', $version, 2);
}
else
{
$_REMOTE['BROWSER']['TYPE'] = 'Netscape';
list($version, $dummy) = explode(' ', $_REMOTE['USER_AGENT'], 2);
list($dummy, $version) = explode('/', $version, 2);
}
}
}
elseif (strpos($_REMOTE['USER_AGENT'], 'amaya') === 0)
{
$_REMOTE['BROWSER']['TYPE'] = 'Amaya';
list($dummy, $version) = explode('amaya/', $_REMOTE['USER_AGENT'], 2);
list($version, $dummy) = explode(' ', $version, 2);
}
elseif (strpos($_REMOTE['USER_AGENT'], 'EPOC') === 0)
{
$_REMOTE['BROWSER']['TYPE'] = 'EPOC Web';
list($dummy, $version) = explode('WTL/', $_REMOTE['USER_AGENT'], 2);
list($version, $dummy) = explode(' ', $version, 2);
}
if (isset($version)) $_REMOTE['BROWSER']['VERSION'] = trim($version);
if (preg_match('/gecko\/([0-9]+)/i', $_REMOTE['USER_AGENT'], $match)) $_REMOTE['BROWSER']['GECKO'] = $match[1];
$_REMOTE['OS']['PLATFORM'] = '*NIX';
if (preg_match_all('/(win[dows]*)[\s]?([0-9a-z]*)[\w\s]?([a-z0-9.]*)/i', $_REMOTE['USER_AGENT'], $match))
{
$v = $match[2][count($match[0])-1];
$v2 = $match[3][count($match[0])-1];
$_REMOTE['OS']['PLATFORM'] = 'Win32';
$_REMOTE['OS']['VERSION'] = $v2;
if (stristr($v,'NT'))
{
if ($v2 == '5.0')
$v = 'Windows 2000';
elseif ($v2 == '5.1')
$v = 'Windows XP';
elseif ($v2 == '5.2')
$v = 'Windows 2003';
elseif ($v2 == '6.0')
$v = 'Windows Codename Longhorn';
else
$v = 'Windows NT';
}
elseif (stristr($v,'9x'))
$v = 'Windows 9x';
elseif ($v.$v2 == '16bit')

{
$v = 'Windows';
$_REMOTE['OS']['PLATFORM'] = 'Win16';
}
else
$v .= $v2;

if (empty($v)) $v = 'win';
$_REMOTE['OS']['TYPE'] = $v;
}
elseif( preg_match('/os\/2|ibm-webexplorer/i', $_REMOTE['USER_AGENT']))
{
$_REMOTE['OS']['TYPE'] = 'OS/2';
$_REMOTE['OS']['PLATFORM'] = 'OS/2';
}
elseif (preg_match('/(68)[k0]{1,3}|[p\S]{1,5}(pc)/i', $_REMOTE['USER_AGENT'], $match))
{
$_REMOTE['OS']['TYPE'] = 'Mac OS';
$platform = !empty($match[1]) ? '68k' : '';
$platform = !empty($match[2]) ? 'Power PC' : $platform;
$_REMOTE['OS']['PLATFORM'] = $platform;
}
elseif (preg_match('/(sun|i86)[os\s]*([0-9]*)/i',$_REMOTE['USER_AGENT'],$match))
{
if(!stristr('sun',$match[1]))
{
$match[1] = 'sun'.$match[1];
$_REMOTE['OS']['TYPE'] = $match[1];
$_REMOTE['OS']['VERSION'] = $match[2];
}
elseif (preg_match('/(irix)[\s]*([0-9]*)/i', $_REMOTE['USER_AGENT'], $match))
{
$_REMOTE['OS']['TYPE'] = $match[1];
$_REMOTE['OS']['VERSION'] = $match[2];
}
elseif (preg_match('/(hp-ux)[\s]*([0-9]*)/i', $_REMOTE['USER_AGENT'], $match))
{
$_REMOTE['OS']['TYPE'] = 'HP-UNIX';
$_REMOTE['OS']['VERSION'] = (int) $match[2];
}
elseif (preg_match('/aix([0-9]*)/i', $_REMOTE['USER_AGENT'], $match))
{
$_REMOTE['OS']['TYPE'] = 'AIX';
$_REMOTE['OS']['VERSION'] = $match[1];
}
elseif (preg_match('/dec|osfl|alphaserver|ultrix|alphastation/i', $_REMOTE['USER_AGENT'], $match))
$_REMOTE['OS']['TYPE'] = 'DEC';
elseif (preg_match('/vax|openvms/i', $_REMOTE['USER_AGENT'], $match))
$_REMOTE['OS']['TYPE'] = 'VMS';
elseif (preg_match('/sco|unix_sv/i', $_REMOTE['USER_AGENT'], $match))
$_REMOTE['OS']['TYPE'] = 'SCO';
elseif (stristr('unix_system_v', $_REMOTE['USER_AGENT']))
$_REMOTE['OS']['TYPE'] = 'UNIXWARE';
elseif (stristr('ncr', $_REMOTE['USER_AGENT']))
$_REMOTE['OS']['TYPE'] = 'MPRAS';
elseif (stristr('reliantunix', $_REMOTE['USER_AGENT']))
$_REMOTE['OS']['TYPE'] = 'Reliant';
elseif(stristr('sinix', $_REMOTE['USER_AGENT']))
$_REMOTE['OS']['TYPE'] = 'SINIX';
elseif (preg_match('/(free)?(bsd)/i', $_REMOTE['USER_AGENT'], $match))
$_REMOTE['OS']['TYPE'] = $match[1].$match[2];
elseif(preg_match('/x11|inux/i', $_REMOTE['USER_AGENT'], $match))
$_REMOTE['OS']['TYPE'] = 'Linux';
elseif (stristr('epoc', $_REMOTE['USER_AGENT']))
{
$_REMOTE['OS']['TYPE'] = 'EPOC';
$_REMOTE['OS']['PLATFORM'] = 'EPOC';
}
else
$_REMOTE['OS']['PLATFORM'] = 'UNKNOWN';
}

print_r($_REMOTE);

22 August 2008

Making Word documents really read-only

The need for ‘read-only’
Now let’s get down to the main topic for this week. Nowadays, Word documents are used very often to send proposals, project plans, research papers, manuscripts and so on. We often want the guarantee that the original document sent should not be editable by the recipient. In addition, we want to ensure that unauthorised persons are not even able to open the document.

Preventing unauthorised access
To ensure that only authorised persons can see the document (open it) you simply put in the Password to Open. This way, before the document is opened, the user is asked to enter a password. More importantly, the document does not display plain text even if it is opened in a raw editor like Notepad. Not many know that assigning a password actually encrypts the contents of the Word file, so that the plain text is not visible within or outside Word.
The programmatic way of doing this while saving the document is:

Activedocument.saveas ,,, “

Now, after sending the document to the intended recipient, you have to somehow communicate the password. This is easy enough if you regularly communicate with them. In fact, if you regularly exchange confidential documents with specific persons, it is a good idea to predetermine a separate password for communication with each of them.

This simplifies the communication further and does not require you to communicate the password every time you send a document.

Of course, there are better ways of doing this using digital signature technology. But just consider how many documents you send or receive with digital signatures as of today, and then you will understand the elegant utility of my suggestion! Now we have prevented unauthorised access. But what about the additional requirement of even authorised persons not being able to modify the document?

Preventing modification
This is also available as a base feature of Word. You need to specify another password called “Password to modify”. You can enter these passwords from the File - Save As dialog - Tools - Security Options (Office XP dialog. Other versions have a different set of menus but the password options are still available). Here is how the dialog looks.

Now when you enter the password for modification and save the document, the user needs to enter two passwords while opening the file. Once for opening the file and another if modifications are to be done to the file. If you put a modification password and not inform the customer about this password, the document can only be opened and read by the recipient but not modified. However, there is a major problem in the scheme of things in this case. The code for inserting open as well as modify password is:

Activedocument.saveas ,,, “” _
,,

The problem
When you do not have password to modify, you can choose the Open Read Only button from the password entry dialog.

Now the document opens and displays (Read-Only) in the title bar. But to your surprise, you can still modify the document! When you try to save it, the document forces you to save it as a different file name. If you try to save it using the same file name, it will generate an error, stating that the file is read-only.

I would say, this feature does not serve the purpose of making a document read-only at all. Why do we want the document read-only? We really want it to become non-editable.

Moreover, we do not even want the chance that the user can use Select All - Copy and then paste the entire content of your document into another document and peacefully save it!

S(he) can of course print it for reference. Some customers have even gone to the extent of saying that even the Print Screen key should not work! This is rather extreme. But believe me, with some clever programming, even this need can be satisfied. Anyway, that would be the topic for another article. For now, let us find a way of achieving a truly read-only, non-editable and non-cut-pastable(!) document in Word. This would be a great achievement in itself.

The trick and the solution
The solution to this problem is not very obvious. In fact, I have not come across any documentation which specifies the usage as is mentioned in this article. I myself found this feature while trying to do something else. Now let us come to the solution. First I will simply give the steps to achieve the desired protection. Then I will explain it.
To make a Word document really read-only follow these steps.

1. Write the document as usual.

2. When you finish writing the document, save it. Now save it as another document to make a copy of the original document. Why so? Read on to find out!

3. Move your cursor to the end of the document.

4. From Tools menu, choose Customise. A dialog appears listing available toolbars. If it is not already selected, choose the Forms toolbar by enabling the checkbox next to it. Click on close button.

5. Now you will see the Forms Toolbar.

6. This toolbar allows you to make fill-in-the-blanks type of forms within a word document. We will not go into the details of these types of forms in this article. But for now, just click on the first toolbar button from left (where ab is written). The tool tip for this button is “Text form field”.

7. Now at the cursor a small grey patch will appear. This is the text entry form field.

8. Double click on this grey patch. A dialog will appear. In this dialog, clear the check box labelled “Fill-in enabled”. Close the dialog by a click on the Ok button.

9. The grey area will continue to appear. To remove the grey shading, click on the Form Field Shading button on the toolbar. Now the grey shading will disappear. In effect, the field is no longer noticeable.

10. Now all that you need to do is to protect the document. To do this, choose Tools - Protect Document... menu option. Another dialog appears.

11. Choose the Forms option and enter a password and choose Ok button. You will be asked to re-enter the password with a detailed warning. Take the warning seriously. Believe me, you yourself cannot edit the document if you forget the password! Now you know why I asked you to save the original document with protection. In case you forget the protection password (which all of us do very efficiently time and again ), you have a safe editable copy at hand!

12. Now, in order to prevent unauthorised access, choose File-Save As option and Select Tools - Security options tab to enter the regular Password to Open. This password should be different than the form protection password you entered earlier. The password to open will need to be communicated to the intended recipient. So the document protection password should be different.

13. Now save and close the document.

Nothing great seems to have happened yet. Actually it has, but it is not apparent. To see the magic, follow these steps:

1. Open the document.

2. Enter the password to open. The document opens as usual. Contents are seen.

3. Now try editing the document. Sorry, you can’t.

4. Try highlighting the text. No luck

5. Try select all from Edit menu, sorry again. The option is disabled!

6. Try all the things you can think of to select and copy the text. Sorry. Nothing works.

7. Try the Save As option. While saving as, you can of course remove the Password to open. But that is your risk because, you are making the document prone to unauthorised use. But yes, in our quest to make the document editable, let us try that also. Remove the Open password. Save it as another document.

8. Open this newly saved document and again try to select/copy the text. Sorry sir. Does not work.

Now, of course we are all techies, we know programming. We know the Word object model. So let us try to get the text programmatically. Incidentally, the simplest way to get all the text in the document and paste it into another document is as follows. Assume that there are two documents open. One document contains text (filled.doc) and another document is blank (blank.doc)

Here is the code to get all the text from filled.doc into blank.doc documents(“blank.doc”).range.Text = _ documents(“filled.doc”).Range.Text

Simple and effective, isn’t it.

Now with this knowledge, try to do the same with the protected document. Let us say our protected document was called “safe.doc”. Open the safe.doc and try the following code.

documents(“blank.doc”).range.Text = _
documents(“safe.doc”).Range.Text

Now, you are almost sure it would work. Tough luck! You get a very nice error as shown below.

The only way to duplicate this document and make changes to it is to visually see the original document and re-type all the text and formatting! Of course this can be done but it is so much more difficult now!

The technology behind the trick
The technology lies in the fill-in-the-blanks form fields of Word. When you have a form that has a fixed layout and content and some variable data entry, like say a Leave Request Form, you use the form fields. Now in order to make sure that end users enter only the form field data and not change the base form text, you have to protect the document using a password. That is what we did. However, we did not want to use the form field at all. It was a dummy field. So we disabled the fill-in feature of the form field (the grey coloured text box). Now nothing is editable in the document. And that is why all the features related to editing—typing, cut, paste, select all, mouse based text selection—were disabled. So we achieved our purpose effectively.

Lessons to be learned

  • Exploring features is a great method of enriching knowledge.
  • It requires great amount of thoughtful design effort to make features work effectively.
  • If you are providing programmatic access to your application functionality, the same restrictions that are applied to menus must be applied to programmatic usage.
  • Effective usage of a simple feature can lead to great value. Even companies that regularly use digital signatures and various other sophisticated methods for secure document interchange can still use lots of unprotected documents. This trick is usable by anyone who has Word.
  • Apart from obvious usage of a feature, innovative thinking can lead to additional usage scenarios. The only method of unearthing these scenarios is curiosity coupled with knowledge.

08 August 2008

Video Sharing Sites - What Research Says

In this comparison we review several different video sharing scripts for making your own YouTube clone website, including Vshare, ClipShare, Rayzz, MadiaMax and PHPMotion. Which is best?


Let's get one thing out in the open before we start. These scripts are all VERY similar. So similar, in fact, that many of them are based on the same original code (some say PHPMotion). That being said, each video sharing platform has taken that first generation of YouTube Clone script and made it their own by adding features, creating new templates, fixing up bugs and holes, etcetera... The success of your site is also going to come down to your hard work and the quality of your host. We went through several FFMpeg hosting companies before we settled on APTHost as our favorite because they have less down time and 24-hour technical support, 1-800 numbers AND they are located in Canada (the rest we found were in India). But that's another comparison altogether so let's get back to our review of each video sharing software package:


  1. Vshare YouTube Clone

    This was the first YouTube Clone script we installed, and we were somewhat disappointed. They say you get what you pay for in life, and this was no exception. At $10 it was a great way for us to get our feet wet and see what these video sharing sites were all about. Although the features and design were almost identical to PHPMotion there was one BIG difference: The Code Was Encrypted. Basically these people stole the code from another script, probably PHPMotion or ClipShare, encrypted it so users would never be able to wean themselves off of their support, and then charged you $10 for what you could have got for free from PHPMotion. That is why they get our lowest possible score.

    Current Version: 2.6
    Free Templates: Available
    Feature List: Good
    Design: Poor
    Value: Great
    Bugs: Few
    Direct Support: None
    Forum Support: Great
    Security: Poor
    Avg. Price: $10
    Rating: 1


  2. ClipShare Video Sharing

    Clipshare was the second video sharing script we installed, and we were fairly happy with the design and features. Although developed from the same base-code as Vshare, ClipShare is more fully evolved (especially the most recent version). There are the same vulnerabilities associated with any code that is based on open source, but ClipShare has done a better job (recently) of closing up those security holes. What really hurt this clone script in our review was the poor customer service, and the fact that the owners have been repeatedly accused of deleting unflattering forum posts by their own customers. For the price, go with a code base that was custom-developed from scratch, or - if we are to believe they were the first - at least one that hasn't been copied all over the internet.

    ClipShare Video Sharing

    Current Version: 3
    Free Templates: Available
    Feature List: Good
    Design: Good
    Value: Poor
    Bugs: Many
    Direct Support: Poor
    Forum Support: Good
    Security: Poor
    Avg. Price: $200 +
    Rating: 3


  3. Rayzz Multi-Media Sharing

    Rayzz is probably the strongest script we've seen in terms of developing a mix of video sharing and online community features. If you are looking to build a pure video sharing website this probably isn't the software/script for you. But if you are planning on bigger things and want to stand out from the thousands of other YouTube Clones out there, Rayzz is well worth the extra dough. When you consider that it isn't all that much more than ClipShare and MediaMax, Rayzz becomes the clear choice. However, if you have a small budget and are just looking to get something up quick, one of the open-source or less-expensive options might be a better choice.

    Current Version: 2.0
    Free Templates: Available
    Feature List: Great
    Design: Good
    Value: Good
    Bugs: Few
    Direct Support: Good
    Forum Support: Good
    Security: Great
    Avg. Price: $420 +
    Rating: 5


  4. MediaMax Script

    This is an interesting YouTube Clone because, at first, we thought it was a completely original script. After paying for it, we realized it was a very recent rebranding of Video Watch Pro. The difference in the MediaMax version includes a better design and some extra features like picture sharing and PHPBB forum integration. Of course, you'll pay for those differences with an extra $100. The worst part about this place is that they have some of the worst customer service I've ever seen. Seriously, unless you want to be reduced to blind rage or tears, DO NOT BUY MEDIAMAX and I'm guessing the same goes for VideoWatch Pro.

    Current Version: 1
    Free Templates: Not Available
    Feature List: Great
    Design: Great
    Value: Good
    Bugs: Few
    Direct Support: Fair
    Forum Support: Poor
    Security: Good
    Avg. Price: $300 +
    Rating: 1


  5. PHPMotion

    What can we say about this You Tube Clone other than you get what you pay for? It happens to be free, and for that reason it is widely distributed, buggy and full of security holes. It is a GREAT way to get your feet wet, but remember that moving over user accounts and videos to a different platform when you outgrow PHPMotion is going to be difficult and expensive. Our advice, unless you plan on having your own developers totally tweak out this base code, is to spend the money and go with something like Rayzz. Still, it is hard for us to say anything bad about something that is FREE.

    Current Version: 2.0
    Free Templates: Available
    Feature List: Poor
    Design: Poor
    Value: Great
    Bugs: Some
    Direct Support: None
    Forum Support: Great
    Security: Poor
    Avg. Price: Free
    Rating: 1


Other Scripts worth mentioning are php melody by phpsugar (music sharing), phpfox, AlstraSoft eFriends, and Dolphin (All social networking / community sites).

07 August 2008

10 Steps to optimize MySQL Queries


  1. He's swinging for the top of the trees

    The rule in any situation where you want to opimize some code is that you first profile it and then find the bottlenecks. Mr. Silverton, however, aims right for the tippy top of the trees. I'd say 60% of database optimization is properly understanding SQL and the basics of databases. You need to understand joins vs. subselects, column indices, how to normalize data, etc. The next 35% is understanding the performance characteristics of your database of choice. COUNT(*) in MySQL, for example, can either be almost-free or painfully slow depending on which storage engine you're using. Other things to consider: under what conditions does your database invalidate caches, when does it sort on disk rather than in memory, when does it need to create temporary tables, etc. The final 5%, where few ever need venture, is where Mr. Silverton spends most of his time. Never once in my life have I used SQL_SMALL_RESULT.

  2. Good problems, bad solutions

    There are cases when Mr. Silverton does note a good problem. MySQL will indeed use a dynamic row format if it contains variable length fields like TEXT or BLOB, which, in this case, means sorting needs to be done on disk. The solution is not to eschew these datatypes, but rather to split off such fields into an associated table. The following schema represents this idea:

    CREATE TABLE posts (
    id int UNSIGNED NOT NULL AUTO_INCREMENT,
    author_id int UNSIGNED NOT NULL,
    created timestamp NOT NULL,
    PRIMARY KEY(id)
    );

    CREATE TABLE posts_data (
    post_id int UNSIGNED NOT NULL.
    body text,
    PRIMARY KEY(post_id)
    );
  3. That's just…yeah

    Some of his suggestions are just mind-boggling, e.g., "remove unnecessary paratheses." It really doesn't matter whether you do SELECT * FROM posts WHERE (author_id = 5 AND published = 1) or SELECT * FROM posts WHERE author_id = 5 AND published = 1. None. Any decent DBMS is going to optimize these away. This level of detail is akin to wondering when writing a C program whether the post-increment or pre-increment operator is faster. Really, if that's where you're spending your energy, it's a surprise you've written any code at all

My list

Let's see if I fare any better. I'm going to start from the most general.

  1. Benchmark, benchmark, benchmark!

    You're going to need numbers if you want to make a good decision. What queries are the worst? Where are the bottlenecks? Under what circumstances am I generating bad queries? Benchmarking is will let you simulate high-stress situations and, with the aid of profiling tools, expose the cracks in your database configuration. Tools of the trade include supersmack, ab, and SysBench. These tools either hit your database directly (e.g., supersmack) or simulate web traffic (e.g., ab).

  2. Profile, profile, profile!

    So, you're able to generate high-stress situations, but now you need to find the cracks. This is what profiling is for. Profiling enables you to find the bottlenecks in your configuration, whether they be in memory, CPU, network, disk I/O, or, what is more likely, some combination of all of them.

    The very first thing you should do is turn on the MySQL slow query log and install mtop. This will give you access to information about the absolute worst offenders. Have a ten-second query ruining your web application? These guys will show you the query right off.

    After you've identified the slow queries you should learn about the MySQL internal tools, like EXPLAIN, SHOW STATUS, and SHOW PROCESSLIST. These will tell you what resources are being spent where, and what side effects your queries are having, e.g., whether your heinous triple-join subselect query is sorting in memory or on disk. Of course, you should also be using your usual array of command-line profiling tools like top, procinfo, vmstat, etc. to get more general system performance information.

  3. Tighten Up Your Schema

    Before you even start writing queries you have to design a schema. Remember that the memory requirements for a table are going to be around #entries * size of a row. Unless you expect every person on the planet to register 2.8 trillion times on your website you do not in fact need to make your user_id column a BIGINT. Likewise, if a text field will always be a fixed length (e.g., a US zipcode, which always has a canonical representation of the form "XXXXX-XXXX") then a VARCHAR declaration just adds a superfluous byte for every row.

    Some people poo-poo database normalization, saying it produces unecessarily complex schema. However, proper normalization results in a minimization of redundant data. Fundamentally that means a smaller overall footprint at the cost of performance — the usual performance/memory tradeoff found everywhere in computer science. The best approach, IMO, is to normalize first and denormalize where performance demands it. Your schema will be more logical and you won't be optimizing prematurely.

  4. Partition Your Tables

    Often you have a table in which only a few columns are accessed frequently. On a blog, for example, one might display entry titles in many places (e.g., a list of recent posts) but only ever display teasers or the full post bodies once on a given page. Horizontal vertical partitioning helps:

    CREATE TABLE posts (
    id int UNSIGNED NOT NULL AUTO_INCREMENT,
    author_id int UNSIGNED NOT NULL,
    title varchar(128),
    created timestamp NOT NULL,
    PRIMARY KEY(id)
    );

    CREATE TABLE posts_data (
    post_id int UNSIGNED NOT NULL,
    teaser text,
    body text,
    PRIMARY KEY(post_id)
    );

    The above represents a situation where one is optimizing for reading. Frequently accessed data is kept in one table while infrequently accessed data is kept in another. Since the data is now partitioned the infrequently access data takes up less memory. You can also optimize for writing: frequently changed data can be kept in one table, while infrequently changed data can be kept in another. This allows more efficient caching since MySQL no longer needs to expire the cache for data which probably hasn't changed.

  5. Don't Overuse Artificial Primary Keys

    Artificial primary keys are nice because they can make the schema less volatile. If we stored geography information in the US based on zip code, say, and the zip code system suddenly changed we'd be in a bit of trouble. On the other hand, many times there are perfectly fine natural keys. One example would be a join table for many-to-many relationships. What not to do:

    CREATE TABLE posts_tags (
    relation_id int UNSIGNED NOT NULL AUTO_INCREMENT,
    post_id int UNSIGNED NOT NULL,
    tag_id int UNSIGNED NOT NULL,
    PRIMARY KEY(relation_id),
    UNIQUE INDEX(post_id, tag_id)
    );

    Not only is the artificial key entirely redundant given the column constraints, but the number of post-tag relations are now limited by the system-size of an integer. Instead one should do:

    CREATE TABLE posts_tags (
    post_id int UNSIGNED NOT NULL,
    tag_id int UNSIGNED NOT NULL,
    PRIMARY KEY(post_id, tag_id)
    );
  6. Learn Your Indices

    Often your choice of indices will make or break your database. For those who haven't progressed this far in their database studies, an index is a sort of hash. If we issue the query SELECT * FROM users WHERE last_name = 'Goldstein' and last_name has no index then your DBMS must scan every row of the table and compare it to the string 'Goldstein.' An index is usually a B-tree (though there are other options) which speeds up this comparison considerably.

    You should probably create indices for any field on which you are selecting, grouping, ordering, or joining. Obviously each index requires space proportional to the number of rows in your table, so too many indices winds up taking more memory. You also incur a performance hit on write operations, since every write now requires that the corresponding index be updated. There is a balance point which you can uncover by profiling your code. This varies from system to system and implementation to implementation.

  7. SQL is Not C

    C is the canonical procedural programming language and the greatest pitfall for a programmer looking to show off his database-fu is that he fails to realize that SQL is not procedural (nor is it functional or object-oriented, for that matter). Rather than thinking in terms of data and operations on data one must think of sets of data and relationships among those sets. This usually crops up with the improper use of a subquery:

    SELECT a.id,
    (SELECT MAX(created)
    FROM posts
    WHERE author_id = a.id)
    AS latest_post
    FROM authors a

    Since this subquery is correlated, i.e., references a table in the outer query, one should convert the subquery to a join.

    SELECT a.id, MAX(p.created) AS latest_post
    FROM authors a
    INNER JOIN posts p
    ON (a.id = p.author_id)
    GROUP BY a.id
  8. Understand your engines

    MySQL has two primary storange engines: MyISAM and InnoDB. Each has its own performance characteristics and considerations. In the broadest sense MyISAM is good for read-heavy data and InnoDB is good for write-heavy data, though there are cases where the opposite is true. The biggest gotcha is how the two differ with respect to the COUNT function.

    MyISAM keeps an internal cache of table meta-data like the number of rows. This means that, generally, COUNT(*) incurs no additional cost for a well-structured query. InnoDB, however, has no such cache. For a concrete example, let's say we're trying to paginate a query. If you have a query SELECT * FROM users LIMIT 5,10, let's say, running SELECT COUNT(*) FROM users LIMIT 5,10 is essentially free with MyISAM but takes the same amount of time as the first query with InnoDB. MySQL has a SQL_CALC_FOUND_ROWS option which tells InnoDB to calculate the number of rows as it runs the query, which can then be retreived by executing SELECT FOUND_ROWS(). This is very MySQL-specific, but can be necessary in certain situations, particularly if you use InnoDB for its other features (e.g., row-level locking, stored procedures, etc.).

  9. MySQL specific shortcuts

    MySQL provides many extentions to SQL which help performance in many common use scenarios. Among these are INSERT … SELECT, INSERT … ON DUPLICATE KEY UPDATE, and REPLACE.

    I rarely hesitate to use the above since they are so convenient and provide real performance benefits in many situations. MySQL has other keywords which are more dangerous, however, and should be used sparingly. These include INSERT DELAYED, which tells MySQL that it is not important to insert the data immediately (say, e.g., in a logging situation). The problem with this is that under high load situations the insert might be delayed indefinitely, causing the insert queue to baloon. You can also give MySQL index hints about which indices to use. MySQL gets it right most of the time and when it doesn't it is usually because of a bad scheme or poorly written query.

  10. And one for the road…

    Last, but not least, read Peter Zaitsev's MySQL Performance Blog if you're into the nitty-gritty of MySQL performance. He covers many of the finer aspects of database administration and performance.