Webmaster’s Blog

Generating Meaningful Download Stats from Amazon S3

Thursday, June 13, 2013

One of the reasons for moving our podcast files to Amazon S3 from Adobe Business Catalyst was the ability to get download statistics. Business Catalyst provides these only for files set up as “Media Downloads”, but as mentioned in an earlier post, there are bugs which prevent files served in this way from playing correctly using the HTML5

You can configure Amazon S3 to write log files for each “bucket” you create (an S3 bucket is a collection of files in a kind of mini file system). These log files are quite verbose. They record all access to files in the bucket, including the writing of the log files themselves. This creates the somewhat silly situation where writing a log file generates more activity to be logged, which requires writing another log file, and so on. As a result, log files proliferate even when little or no downloading is going on.

There is an S3 API for retrieving files from an S3 bucket, and this can be used to retrieve log files whose names match a particular prefix. Log file names start with the date in the format “2013-06-10”, so a prefix can be used to select the log files for a particular year, month or day. Luckily, there are Ruby libraries (gems) that make it easy to interface with the S3 API.
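As a rough illustration of the prefix idea, the sketch below builds the date prefixes for a range of days. It does not talk to S3 at all; the method names are hypothetical, and it assumes the log file names begin with the date exactly as described above. Each prefix would then be handed to whichever S3 gem is in use.

```ruby
require 'date'

# Hypothetical helper: one "YYYY-MM-DD" prefix per day in the inclusive range.
def daily_log_prefixes(first_day, last_day)
  (first_day..last_day).map { |day| day.strftime('%Y-%m-%d') }
end

# Hypothetical helper: a shorter prefix that matches a whole month of logs.
def monthly_log_prefix(year, month)
  format('%04d-%02d', year, month)
end

p daily_log_prefixes(Date.new(2013, 6, 10), Date.new(2013, 6, 12))
# => ["2013-06-10", "2013-06-11", "2013-06-12"]
puts monthly_log_prefix(2013, 6)
# => 2013-06
```

A shorter prefix means fewer listing requests but more files per request, so a monthly prefix is probably the better choice for long date ranges.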

Once a log file is downloaded, it needs to be filtered for events of interest. For the podcast files on this website, these are downloads (GETs) of podcast files (those whose names match a particular prefix) made by bona fide users (and not from the Amazon S3 console).
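The filtering step could look something like the following sketch. It is not the script described in this post. It assumes the standard space-separated S3 server access log layout, that object downloads are logged with the operation REST.GET.OBJECT, and that requests from the S3 console carry a user agent containing "S3Console". The "podcasts/" prefix, the ".mp3" names and the sample lines are made up for illustration.

```ruby
# Leading fields of an S3 server access log line, up to the user agent.
LOG_LINE = /
  \A\S+\s+                      # bucket owner
  \S+\s+                        # bucket name
  \[(?<time>[^\]]+)\]\s+        # timestamp
  (?<ip>\S+)\s+                 # remote IP address
  \S+\s+                        # requester
  \S+\s+                        # request id
  (?<operation>\S+)\s+          # operation
  (?<key>\S+)\s+                # object key
  "(?<request>[^"]*)"\s+        # request line
  (?<status>\S+)\s+             # HTTP status
  \S+\s+\S+\s+\S+\s+\S+\s+\S+\s+ # error code, bytes sent, object size, two timings
  "(?<referrer>[^"]*)"\s+       # referrer
  "(?<user_agent>[^"]*)"        # user agent
/x

# True when the line is a GET of a podcast file that did not come from the console.
def podcast_download?(line, key_prefix)
  m = LOG_LINE.match(line)
  return false unless m

  m[:operation] == 'REST.GET.OBJECT' &&
    m[:key].start_with?(key_prefix) &&
    !m[:user_agent].include?('S3Console') # assumed console marker
end

# Made-up sample lines: a listener, the console, and a log file being written.
sample_lines = [
  'owner123 example-bucket [10/Jun/2013:09:15:02 +0000] 203.0.113.7 - REQ1 REST.GET.OBJECT podcasts/ep1-SatishKumar.mp3 "GET /podcasts/ep1-SatishKumar.mp3 HTTP/1.1" 200 - 1024 1024 35 30 "-" "Mozilla/5.0" -',
  'owner123 example-bucket [10/Jun/2013:09:16:40 +0000] 198.51.100.4 owner123 REQ2 REST.GET.OBJECT podcasts/ep1-SatishKumar.mp3 "GET /podcasts/ep1-SatishKumar.mp3 HTTP/1.1" 200 - 1024 1024 35 30 "-" "S3Console/0.4" -',
  'owner123 example-bucket [10/Jun/2013:09:20:01 +0000] 192.0.2.1 - REQ3 REST.PUT.OBJECT 2013-06-10-09-20-01-ABCDEF "PUT /2013-06-10-09-20-01-ABCDEF HTTP/1.1" 200 - - 512 20 10 "-" "-" -'
]

downloads = sample_lines.select { |line| podcast_download?(line, 'podcasts/') }
puts downloads.size
# => 1
```

A stricter filter might also check the HTTP status, so that failed requests are not counted as downloads.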

The author has written a couple of Ruby scripts.

The first one downloads log files for a specified date range.

The second one analyses the collection of log files and provides some meaningful statistics.
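To give a feel for the analysis step, here is a minimal sketch of a weekly summary. It is an illustration only and not the actual script: the record layout, the method names and the sample data are hypothetical, weeks are assumed to be ISO weeks starting on Monday, and every matching log entry is counted as one download.

```ruby
require 'date'

# Hypothetical pool of invented names handed out in order of first appearance.
NAMES = %w[Kristopher Katie Cherilyn].freeze

# Gives each IP address a stable invented name, falling back to "ListenerN".
def pseudonym_for(ip, assigned)
  assigned[ip] ||= NAMES[assigned.size] || "Listener#{assigned.size + 1}"
end

# Builds report lines from records of the form { date:, ip:, episode: }.
def weekly_report(downloads)
  names = {}
  lines = []
  by_week = downloads.group_by { |d| [d[:date].cwyear, d[:date].cweek] }
  by_week.sort_by { |week_key, _| week_key }.each do |(year, week), in_week|
    monday = Date.commercial(year, week, 1) # Monday of that ISO week
    lines << "WEEK #{week} starting #{monday.strftime('%A %d %B %Y')}"
    by_episode = in_week.group_by { |d| d[:episode] }
    by_episode.sort_by { |episode, _| episode }.each do |episode, hits|
      who = hits.map { |d| pseudonym_for(d[:ip], names) }.uniq
      lines << "#{episode}, downloads: #{hits.size}, (#{who.join(' ')})"
    end
    people = in_week.map { |d| d[:ip] }.uniq.size
    lines << "#{people} #{people == 1 ? 'person' : 'people'}. #{in_week.size} downloads."
  end
  lines
end

# Made-up sample records.
sample = [
  { date: Date.new(2013, 4, 30), ip: '203.0.113.7',  episode: 'ep1-SatishKumar' },
  { date: Date.new(2013, 5, 1),  ip: '203.0.113.7',  episode: 'ep2-MichaelFunder' },
  { date: Date.new(2013, 5, 7),  ip: '198.51.100.4', episode: 'ep1-SatishKumar' },
  { date: Date.new(2013, 5, 8),  ip: '192.0.2.9',    episode: 'ep1-SatishKumar' }
]

puts weekly_report(sample)
```

Keeping the IP-to-name mapping in one hash means the same listener gets the same name in every week of the report.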

The following is a report produced for the approximately six weeks leading up to 12th June 2013. Invented names are shown in place of IP addresses. This turns out to be much more useful data than what we were getting in the past from Business Catalyst.

WEEK 18 starting Monday 29 April 2013

ep1-SatishKumar, downloads: 1, (Kristopher)
ep2-MichaelFunder, downloads: 1, (Kristopher)
ep3-FocusedChild, downloads: 1, (Kristopher)
ep4-TheSoulOfEducation, downloads: 1, (Kristopher)
1 person. 4 downloads.

WEEK 19 starting Monday 06 May 2013

ep1-SatishKumar, downloads: 2, (Katie Cherilyn)
ep2-MichaelFunder, downloads: 2, (Katie Cherilyn)
ep3-FocusedChild, downloads: 2, (Katie Cherilyn)
ep4-TheSoulOfEducation, downloads: 2, (Katie Cherilyn)
2 people. 8 downloads.

WEEK 21 starting Monday 20 May 2013

ep1-SatishKumar, downloads: 2, (Clarisa James)
ep2-MichaelFunder, downloads: 3, (Clarisa Kala James)
ep3-FocusedChild, downloads: 3, (Clarisa Kala James)
ep4-TheSoulOfEducation, downloads: 3, (Clarisa Kala James)
3 people. 11 downloads.

WEEK 22 starting Monday 27 May 2013

ep1-SatishKumar, downloads: 6, (Grace Andera Napoleon Raphael Martin Stasia)
ep2-MichaelFunder, downloads: 5, (Grace Andera Napoleon Martin Mark)
ep3-FocusedChild, downloads: 5, (Grace Andera Napoleon Martin Stasia)
ep4-TheSoulOfEducation, downloads: 5, (Grace Andera Napoleon Martin Dwayne)
8 people. 21 downloads.

WEEK 24 starting Monday 10 June 2013

ep1-SatishKumar, downloads: 1, (Raphael)
1 person. 1 downloads.

