Thursday 23 February 2012

GIDApp No. 2

For my second Android™ project/application, which will remain unnamed for the moment, I first need to gather some historical data from more than a couple of local (Malaysian) web sites. Some of this publicly available data goes back as far as 1985! It is insane to even contemplate fetching all this data by hand, so I had to come up with some kind of software to help me do that easily and quickly.

Previously, I would simply use PHP's fsockopen to quickly grab a web document or two, or to masquerade as a browser, but this time fsockopen was simply not going to cut it. I needed something a lot easier to set up, and something that could cope with whatever robot traps these kinds of sites usually have.

Curl and Wget

When amateurs like me want to develop software that will act as a web robot or crawler, the obvious choices are of course Wget and Curl. I have had limited experience with Wget, mostly from setting up cron jobs on my web servers, but I have never had any with Curl. After quickly doing some research on both, I concluded that the one better suited to my needs today is Curl.

It took me nearly 3 weeks, but today I have completed my "web robot" that successfully crawls all the necessary web sites, grabs any document I want, extracts just the information I need and puts it all, very nicely, into a MySQL database!

My custom web crawler, powered by PHP and Curl, is able to connect to a web site, manage cookies, send referrer data, request compressed web pages, navigate itself around a web site to get to the best parts, fetch the document containing the data I want, parse it, extract just the data I need, verify that it is correct, and save it all to the database! And it does all this at a rate of 1.5 minutes for one month's worth of data from one web site.
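
To give a rough idea of what cookie handling, referrer data and compressed pages involve, here is a minimal sketch in Python. It is not the PHP and Curl code my crawler actually uses. The function names, the User-Agent string and the example.com addresses are all placeholders assumed purely for illustration.

```python
import gzip
import http.cookiejar
import urllib.request


def build_opener():
    """Return an opener that keeps cookies between requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def make_request(url, referer=None):
    """Build a request that asks for a gzip-compressed page and sends a referrer."""
    headers = {
        # Placeholder browser-like identity (an assumption, not a real product).
        "User-Agent": "Mozilla/5.0 (compatible; example-crawler)",
        "Accept-Encoding": "gzip",
    }
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def decode_body(raw, content_encoding, charset="utf-8"):
    """Decompress the body if the server gzipped it, then decode it to text."""
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset, errors="replace")


def fetch(opener, url, referer=None):
    """Fetch one page; cookies set by the site are stored in the opener's jar."""
    with opener.open(make_request(url, referer), timeout=30) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding", ""))
```

Typical use would be to call build_opener() once and then call fetch() for each page in turn, passing the previous page's address as the referer, e.g. fetch(opener, "http://example.com/archive", referer="http://example.com/"). Parsing, verifying and saving to the database are left out of this sketch.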

Considering that I have over 20 years of data to fetch, and that too from more than one web site, it is not bad at all, if you ask me! :)
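
For a rough sense of scale, the 1.5 minutes per month figure can be turned into a total with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up, and the 27-year figure assumes a site whose data runs all the way from 1985 to the start of 2012, which is true of only some of the sites.

```python
def crawl_minutes(years, sites=1, minutes_per_month=1.5):
    """Estimate total crawl time in minutes at a fixed rate per month of data."""
    return years * 12 * sites * minutes_per_month


# 20 years of data from one site.
print(crawl_minutes(20) / 60, "hours")

# Assumed worst case: one site with data from 1985 up to early 2012.
print(crawl_minutes(27) / 60, "hours")
```

That works out to 6 hours for 20 years of data from one site, and just over 8 hours for 27 years.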

At this rate, GIDApp No. 2 should be ready in 3 months. :)

Friday 3 February 2012

Buying Cotton Swabs with GIDCompare

A couple of days ago, I was at the local pharmacy and decided I also needed to get myself some cotton buds. Cotton buds are so cheap that I am sure many of us don't even know the price we usually pay for one. I know I didn't. Like regular people, I'd usually just walk into a store, grab the first one that catches my eye or whatever is available on the shelf, pay, leave and not think about it until I run out again.

Since I have GIDCompare installed on my Android™ phone now though, nothing, not even cheap cotton swabs, is going to escape my scrutiny. :)

I must reveal right away that the results below are somewhat shocking and that I am quite disturbed by the price variances I found.

As you will see, if you buy a certain brand of cotton buds at a certain store in Malaysia and if you just happen to choose the wrong packaging, you can pay up to 196% more for the same product!

This particular brand of cotton swab comes in 3 different sizes/packaging. Let us compare the first 2 first:

i. 5x20 sticks @ RM2.90

ii. 2x120 sticks @ RM5.88

GIDCompare i vs. ii

GIDCompare Report i vs. ii

Using the 'move' button in GIDCompare, we'll move Package ii up into Item A and fill in the details of our 3rd package as Item B. Here is the third package, sized and priced in yet another way:

iii. 5x100 sticks @ RM4.90

GIDCompare ii vs. iii

GIDCompare Report ii vs. iii

All 3 packages compared above contain the exact same product, but it can cost a whopping 196% more per stick if you pick/buy the wrong one!
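
For anyone who wants to check the arithmetic by hand, the unit prices can be worked out with a short Python sketch. This is just my own illustration using the three prices listed above, not how GIDCompare itself is implemented, and the function names are made up.

```python
def unit_price(packs, sticks_per_pack, price_rm):
    """Price of a single stick in RM."""
    return price_rm / (packs * sticks_per_pack)


def percent_more(dearer, cheaper):
    """How much more, in percent, the dearer unit price is than the cheaper one."""
    return (dearer / cheaper - 1) * 100


# (packs, sticks per pack, price in RM), taken from the three packages above.
packages = {
    "i": (5, 20, 2.90),
    "ii": (2, 120, 5.88),
    "iii": (5, 100, 4.90),
}

unit = {name: unit_price(*spec) for name, spec in packages.items()}

for name, price in unit.items():
    print(f"Package {name}: {price * 100:.2f} sen per stick")

print(f"i vs. ii:   {percent_more(unit['i'], unit['ii']):.0f}% more")
print(f"ii vs. iii: {percent_more(unit['ii'], unit['iii']):.0f}% more")
print(f"i vs. iii:  {percent_more(unit['i'], unit['iii']):.0f}% more")
```

Package i comes to 2.90 sen per stick, package ii to 2.45 sen and package iii to 0.98 sen, so the 196% figure is the gap between the smallest pack (i) and the largest (iii).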

From now on, I am going to use GIDCompare to compare unit cost prices for everything, and I mean EVERYTHING! :)