Tutorial: Develop a local Amazon price tracking application
In this article, we describe how to develop a simple Amazon price tracking application that runs locally. Because it serves single-user only, its design and implementation is very simple. In subsequent articles, we will describe how to move the application to the Web and make it available to multiple users.
Let's first describe the problem we try to solve. If you find a good product on Amazon that you want to buy, sometimes you may want to wait until its price drops further. Rather than checking the price manually every day, you need an automated script that checks the price at your desired interval and then notify you when it drops. You may also want to track multiple products at a time. The application that we describe in the following does exactly that.
First we assume that your local machine is connected to the Internet all the time. This is required when you need to run cron jobs once or a few times a day to retrieve up-to-date prices of the products you want to track.
Second, we need to decide how to store the products' price data. Even though currently it is a trivial application used by yourself, we still suggest that you use a database-driven approach as a bit forethought will go a long way. The simple SQLite database serves this purpose very well.
Next we need to design the tables that store the price data. There are only two tables needed for this trivial application.
One is the
products table that contains the
id,
tracking_id,
name,
url and
status fields.
The numeric id field is the primary key
and increments automatically whenever we insert a new record.
The string tracking_id field is actually
ASIN,
the Amazon Standard Identification Number that tracks all products
sold on Amazon.
For books with ISBN, the ISBN is the same as the ASIN. We don't name the field
ASIN because later on we may extend the application to track
products sold at other web sites which may use different product tracking
system. The url field contains the URL to the product and the
status field can be 1 (active) or 0 (inactive). We only track
active products and can turn products active or inactive at will.
The other table is the prices table. It contains the
id,
ts(time stamp)
and price fields.
The id and ts fields constitute the primary keys and
id is a foreign key that references the same-named id field in the
products table.
You can download the sample SQL file for creating tables to track Amazon product price changes and also the the sample data file for testing SQL statements.
Because this is an application that runs locally and we have complete control of the local box, we can choose whatever implementation language that we feel comfortable with. In this article, we use Perl and two major Perl modules.
One is Perl DBI that is the standard database interface for Perl. You also need to install the database dependent drivers for Perl. In this case, it is the DBD::SQLite module for the SQLite database.
The other Perl module is WWW::Mechanize which makes it very easy to automate various Web based tasks such as retrieving Amazon product URLs.
Now we are ready to write the functions that query or update the tables we have just designed.
-
get_active_product_info: This function retrieves the active products and return them in a reference to a hash. The key of the hash is the product ID and its value is the reference to another hash that contains properties of the product such as name, URL and tracking ID.The following code listing shows how to use Perl DBI to query the SQLite table for the desired information.
$dbh = DBI->connect("dbi:SQLite:dbname=/tmp/price.db","","");
$sql = "select * from products where status = 1";
$sth = $dbh->prepare($sql);
$sth->execute();
$results = $sth->fetchall_hashref('id');Suppose you have the product ID stored in the
$idvariable, then you can access its URL through the$resultsvariable as follows:$url = $results->{$id}->{url}; -
get_product_info_from_url: This function accepts a URL as a parameter and retrieves the URL from Amazon. It saves the result to a temporary file and then parses the file to read the product's name, price and ASIN.The following code listing shows how to retrieve the URL and save it to a temporary file.
$agentstr = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12';
$agent = WWW::Mechanize->new(autocheck => 1, agent => $agentstr);
$agent->get($url);
$agent->save_content($tmpfile);You don't need to use any fancy parser to parse the HTML file. Line by line searching through regular expressions works fine. The following code listing shows how to extract the price, product ASIN and name from the HTML file and store them to the
%producthash variable.if (/class="priceLarge">\$([.0-9]+)/) {
$product{price} = $1;
}
if (/name="ASIN"\s+value="([^"]+)"/) {
$product{id} = $1;
}
if (/<meta\s+name="description"
\s+content=\"[^:]+:\s*([^"]+)/) {
$product{name} = $1;
} -
init_product_from_url: This function accepts a URL, extracts the product information from the URL and then inserts it into theproductstable and also updates thepricestable if the price is available. -
get_last_price: This function accepts a product ID and then finds its latest non-zero price stored in thepricestable. If the product is not found, then 0 is returned. This price is used for comparison with the current price of the product. If the latter is lower, then notification email should be sent. -
update_price_info: This function accepts a product ID and its price and then store it to thepricestable with the current timestamp. This is usually called from a cron job that runs periodically to record the price history. -
insert_product_info: This function accepts a reference to a hash that contains a product's information and stores it to theproductstable. Upon success, it returns the product's ID which should always be greater than 0. -
send_email: This function accepts a reference to a hash that contains a product's name, prior price and current price and sends notification email. This function should be called only when current price is lower than prior price. For simplicity, we just use the externalmailprogram to send the email.
To use the same script for adding new product URL, updating product prices and
etc, we depend on the Perl Getopt::Long module to parse various
command-line parameters. For example, we can run the script to add a product
URL for price tracking:
local_watcher.pl -a http://www.amazon.com/exec/obidos/ASIN/0596515170/yuonwesidewil-20 -d /tmp/price.db
Please note that the script can accept an alternative location for the database file. We can also use the same script to check the latest prices and send email if any price has dropped:
local_watcher.pl -d /tmp/price.db -u -m yuonlamp@yahoo.com
To add the task to your cron job, you can run the following command to edit your crontab entries:
crontab -e
It will open an editor window and you can add entries like the following:
30 8 * * * /home/www/local/bin/local_watcher.pl -u -m yuonlamp@yahoo.com -d /tmp/price.db
The above line specifies that at 8:30am everyday, the script should check the
prices for the products stored in /tmp/price.db and send email to
yuonlamp@yahoo.com if any price drops.
You can download the sample Perl script to track Amazon product price changes and adapt it to your local environment.
So far we have just described how to develop a simple application to track Amazon product price changes. We can extend it easily to track changes in other web sites. In subsequent articles, we will describe how to make it a useful Web service for your site's visitors. Please stay tuned. :-)
You may also check the Amazon products recommendation list that we find to be of good value.