Investigation of Ann's AppleTV Network Capture File

by Lou Arminio


1. What is the MAC address of Ann's AppleTV?

We need a utility that can not only extract TCP flows from a pcap file but also report detailed information about the flows it contains. The pcaputil script was developed to list pcap contents, including verbose details such as the MAC addresses of the source and destination systems, and to extract TCP streams.

$ ./pcaputil --read evidence03.pcap --verbose --lookup

Pcap Index: evidence03.pcap
Capture Start Time: Mon Dec 28 04:08:01 2009 GMT 
INDEX (SRC MAC ADDR)         SOURCE.PORT     (DST MAC ADDR)    DESTINATION.PORT TIMEVALUE
[  1] (002500fe07c4)   192.168.1.10.49163 -> (002369ad577b)     8.18.65.67.http  0.403956
[  2] (002500fe07c4)   192.168.1.10.49164 -> (002369ad577b) 66.235.132.121.http  1.818230
[  3] (002500fe07c4)   192.168.1.10.49165 -> (002369ad577b)     8.18.65.32.http  15.874471
[  4] (002500fe07c4)   192.168.1.10.49166 -> (002369ad577b) 66.235.132.121.http  16.724160
[  5] (002500fe07c4)   192.168.1.10.49167 -> (002369ad577b)     8.18.65.58.http  16.835388
[  6] (002500fe07c4)   192.168.1.10.49168 -> (002369ad577b)     8.18.65.67.http  34.998462
[  7] (002500fe07c4)   192.168.1.10.49169 -> (002369ad577b) 66.235.132.121.http  35.190638
[  8] (002500fe07c4)   192.168.1.10.49170 -> (002369ad577b)     8.18.65.82.http  35.373253
[  9] (002500fe07c4)   192.168.1.10.49171 -> (002369ad577b)     8.18.65.27.http  55.8039
[ 10] (002500fe07c4)   192.168.1.10.49172 -> (002369ad577b) 66.235.132.121.http  55.919446
[ 11] (002500fe07c4)   192.168.1.10.49173 -> (002369ad577b)     8.18.65.27.http  71.753636
[ 12] (002500fe07c4)   192.168.1.10.49174 -> (002369ad577b) 66.235.132.121.http  71.899640
[ 13] (002500fe07c4)   192.168.1.10.49175 -> (002369ad577b) 66.235.132.121.http  80.35208
[ 14] (002500fe07c4)   192.168.1.10.49176 -> (002369ad577b)     8.18.65.67.http  87.917563
[ 15] (002500fe07c4)   192.168.1.10.49177 -> (002369ad577b)     8.18.65.10.http  88.320913
[ 16] (002500fe07c4)   192.168.1.10.49178 -> (002369ad577b) 66.235.132.121.http  100.857134
[ 17] (002500fe07c4)   192.168.1.10.49179 -> (002369ad577b)     8.18.65.88.http  106.366941
[ 18] (002500fe07c4)   192.168.1.10.49180 -> (002369ad577b) 66.235.132.121.http  106.523458
[ 19] (002500fe07c4)   192.168.1.10.49181 -> (002369ad577b)     8.18.65.89.http  140.655199
[ 20] (002500fe07c4)   192.168.1.10.49182 -> (002369ad577b) 66.235.132.121.http  140.811397

Every flow originates from 192.168.1.10 with source MAC 002500fe07c4, so the MAC address of Ann's AppleTV is 00:25:00:fe:07:c4
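
pcaputil prints MAC addresses as bare 12-digit hex strings, while the answer above uses the usual colon-separated notation. A minimal Python sketch of that conversion follows; the function name format_mac is hypothetical and is not part of pcaputil.

```python
def format_mac(raw):
    """Convert a bare 12-hex-digit MAC (as listed by pcaputil) to colon notation."""
    digits = raw.strip().lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError("not a 12-digit hex MAC address: %r" % raw)
    # Split into six two-character octets and join them with colons.
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    print(format_mac("002500fe07c4"))  # 00:25:00:fe:07:c4
```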


2. What User-Agent string did Ann's AppleTV use in HTTP requests?

First, let's break the TCP flows out from the PCAP file using pcaputil.

$ ./pcaputil --read evidence03.pcap --dir TCPFlows/ --prefix ann --dump 1-20 

Now let's look at the first one, an HTTP session between Ann's AppleTV and the server. The httpparse tool was developed to parse such a file, separating the request and response parts and allowing responses to be broken out for further examination. For now, let's just view the summary of the first flow:

$ ./httpparse --read TCPFlows/ann-01 

[1] Request:
  GET /WebObjects/MZStore.woa/wa/viewGrouping?id=39 HTTP/1.1
  UserAgent: AppleTV/2.4
  Cookie: s_vi=[CS]v1|259C176A85010C29-6000010D80115D7F[CE]
  Host: ax.itunes.apple.com

[1] Reply:
  HTTP/1.1 200 OK
  Content-Encoding: gzip
  Content-Type: text/xml
  Content-Length: 16551

Ann's AppleTV uses "AppleTV/2.4" as its User-Agent.
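
httpparse labels the header "UserAgent" in its summary; on the wire the HTTP header field is named User-Agent. As a rough illustration of how a request head can be split into its request line and headers, here is a Python sketch. The function name parse_request_head is hypothetical, and the sample bytes are reconstructed from the summary above rather than copied from the capture.

```python
def parse_request_head(raw):
    """Split a raw HTTP request into (request_line, headers).

    Header names are lower-cased so lookups are case-insensitive.
    """
    # The head ends at the first blank line (CRLF CRLF).
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # ignore malformed lines that have no colon
            headers[name.strip().lower()] = value.strip()
    return lines[0], headers


if __name__ == "__main__":
    # Sample reconstructed from the httpparse summary (an assumption).
    sample = (b"GET /WebObjects/MZStore.woa/wa/viewGrouping?id=39 HTTP/1.1\r\n"
              b"User-Agent: AppleTV/2.4\r\n"
              b"Host: ax.itunes.apple.com\r\n"
              b"\r\n")
    request_line, headers = parse_request_head(sample)
    print(headers["user-agent"])  # AppleTV/2.4
```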


3. What were Ann's first four search terms on the AppleTV (all incremental searches count)?

Using httpparse to examine the flows, we see GET requests to ax.search.itunes.apple.com starting in file ann-03. We see by the q= variable embedded in the URL that the incremental searches were "h", "ha", "hac", and "hack".

$ ./httpparse --read TCPFlows/ann-03 | grep GET

  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=h HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=ha HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=hac HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=hack HTTP/1.1

Search terms: "h" "ha" "hac" "hack"
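
Rather than reading the q= values by eye, they can be pulled out of the GET lines programmatically. The Python sketch below uses the standard urllib.parse module; the function name search_terms is hypothetical, and it assumes input lines in the "GET <target> HTTP/1.1" form that httpparse prints.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms(request_lines):
    """Return the q= value from each GET request line that carries one."""
    terms = []
    for line in request_lines:
        parts = line.split()
        # Expect "GET <target> HTTP/1.1"; skip anything else.
        if len(parts) != 3 or parts[0] != "GET":
            continue
        query = parse_qs(urlsplit(parts[1]).query)
        if "q" in query:
            terms.append(query["q"][0])
    return terms


if __name__ == "__main__":
    base = "/WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q="
    lines = ["  GET %s%s HTTP/1.1" % (base, t) for t in ("h", "ha", "hac", "hack")]
    print(search_terms(lines))  # ['h', 'ha', 'hac', 'hack']
```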


4. What was the title of the first movie Ann clicked on?

We'll need to examine the results returned to Ann from iTunes in order to determine what she saw and clicked on. The httpparse script can parse HTTP sessions and extract the contents returned. The script was designed to operate on a single file at a time. Since we have many TCP flows representing HTTP traffic, we'll process them sequentially in a loop, generating MD5 hashes for the extracted files as we go.

$ for f in TCPFlows/*; do ./httpparse -r "$f" --dir Extracts --all --md5 ; done

Using directory Extracts for output
Dumping file Extracts/ann-01-01.xml
Writing md5 hash to Extracts/ann-01-01.xml.md5

Using directory Extracts for output
Dumping file Extracts/ann-02-01.gif
Writing md5 hash to Extracts/ann-02-01.gif.md5

Using directory Extracts for output
Dumping file Extracts/ann-03-01.xml
Writing md5 hash to Extracts/ann-03-01.xml.md5
Dumping file Extracts/ann-03-02.xml
Writing md5 hash to Extracts/ann-03-02.xml.md5
Dumping file Extracts/ann-03-03.xml
Writing md5 hash to Extracts/ann-03-03.xml.md5
Dumping file Extracts/ann-03-04.xml
Writing md5 hash to Extracts/ann-03-04.xml.md5
<lots of output removed>

The Extracts directory now contains all the contents returned to Ann's Apple TV. To see the results returned after Ann finished typing her search term "hack", we'll have to examine Extracts/ann-03-04.xml, which was the last return value received.

The file is an XML PLIST file. While it is somewhat readable as is, it would be nice to make it more so. The plistparse script was developed to do just this. By default the script cleans up the PLIST XML file while preserving its structure, and it accepts regular expressions in order to display the contents of the file more selectively.

Examining the PLIST returned to Ann after her search, we see a rather large list of results. The results include both movie titles and URLs for each movie. We'll have to find the HTTP flow resulting from Ann clicking on a URL, and come back to this list in order to find the title of the movie Ann selected.

The next couple of flows, ann-04 and ann-05, appear to be related to the search, returning thumbnails and other images. The flow containing the link Ann clicked on is ann-06. The first GET request in this flow is to view a movie with id=333441649.

$ ./httpparse -r TCPFlows/ann-06 | grep GET
  GET /WebObjects/MZStore.woa/wa/viewMovie?id=333441649&s=143441 HTTP/1.1
  GET /WebObjects/MZStore.woa/wa/relatedItemsShelf?ct-id=3&id=333441649&storeFrontId=143441&mt=6 HTTP/1.1

We can now examine the ann-03-04.xml search result and find the name of this movie. We'll use some filters to reduce the amount of data returned by plistparse.

$ ./plistparse -r Extracts/ann-03-04.xml -x "title$" -x 333441649 | less

<partial output>
  Key: title
  String_Value: Hackers
  Key: url
  String_Value: http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewMovie?id=333441649&s=143441

Ann selected the movie "Hackers"


5. What was the full URL to the movie trailer (defined by "preview-url")?

The response to Ann's click on the URL was saved in the PLIST file Extracts/ann-06-01.xml. Searching for "preview-url" in this file, we find the same URL repeated several times.

$ ./plistparse -r Extracts/ann-06-01.xml -x preview-url
  Key: preview-url
  String_Value: http://a227.v.phobos.apple.com/us/r1000/008/Video/62/bd/1b/mzm.plqacyqb..640x278.h264lc.d2.p.m4v
<repeat lines omitted>

Full URL was http://a227.v.phobos.apple.com/us/r1000/008/Video/62/bd/1b/mzm.plqacyqb..640x278.h264lc.d2.p.m4v


6. What was the title of the second movie Ann clicked on?

The next viewMovie GET request is in flow ann-14 and was for a movie with id=283963264. Doing a primitive search through all our PLIST XML files, we find a couple of candidates for the search list containing this movie.

$ grep -l 283963264 Extracts/*  
Extracts/ann-09-01.xml
Extracts/ann-09-02.xml
Extracts/ann-09-03.xml
Extracts/ann-11-02.xml
Extracts/ann-11-03.xml
Extracts/ann-14-01.xml

The best guess for the search result list containing this movie is ann-09-03.xml.

$ ./plistparse -r Extracts/ann-09-03.xml -x "title$" -x 283963264 
  Key: title
  String_Value: Sneakers
  Key: url
  String_Value: http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewMovie?id=283963264&s=143441

Title of the second movie was "Sneakers".


7. What was the price to buy it (defined by "price-display")?

To get the price returned for the movie we'll look at the PLIST returned when the movie was selected (ann-14-01.xml).

$ ./plistparse -r Extracts/ann-14-01.xml -x "^price-display"
  Key: price-display
  String_Value: $9.99
  Key: price-display
  String_Value: $9.99

The price of the movie was $9.99.


8. What was the last full term Ann searched for?

The last flow to ax.search.itunes.apple.com is contained in TCPFlows/ann-19.

$ ./httpparse -r TCPFlows/ann-19  | grep GET
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=iknowyourewa HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=iknowyourewat HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=iknowyourewatc HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=iknowyourewatch HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=iknowyourewatchi HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=iknowyourewatchin HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=iknowyourewatching HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=iknowyourewatchingm HTTP/1.1
  GET /WebObjects/MZSearch.woa/wa/incrementalSearch?media=movie&q=iknowyourewatchingme HTTP/1.1

Ann's last search term was "iknowyourewatchingme".

New Tools Developed For The Contest

Several tools were written to automate the process of analyzing the pcap file provided as evidence. They are described below. All tools were written in Perl 5.10 and developed on a Fedora 11 platform.

Note: If any of the linked scripts below do not display in your browser, you may need to right-click and save the file in order to see the script. All scripts were created on Fedora and use Unix-style line endings. If you are using Windows, use Wordpad rather than Notepad to view the files properly.

pcaputil

pcaputil reads pcap files and either lists or extracts TCP streams. It is loosely based on pcapcat, written by Kristinn Gudjonsson for Puzzle 1. I used Kristinn's utility as part of my solution for Puzzle 2, but for Puzzle 3 I wanted more functionality than it delivered. pcaputil can list more details of the TCP sessions it finds, including MAC addresses, packet timestamps, and host and service names. It also differs from pcapcat in that it lacks two of pcapcat's features: applying a tcpdump-style filter and listing every packet found.

pcaputil allows the user to extract multiple TCP streams in a single invocation. The indexes of the streams to extract are given on the command line, either as a comma-separated list or as a range of numbers separated by a dash (-). The destination directory of the extracted streams may be specified, as well as the filename prefix. Index numbers are appended to the filename prefix to identify which stream each file represents.
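
To make the index syntax concrete, here is a Python sketch of how a specification such as "1-3,7,10-12" could be expanded and turned into output filenames. This is not pcaputil's code: the function names parse_indexes and stream_filename are hypothetical, support for mixing lists and ranges in one argument is an assumption, and the two-digit zero padding is inferred only from the filenames seen earlier (ann-01 through ann-20).

```python
def parse_indexes(spec):
    """Expand a spec like "1-3,7,10-12" into a list of stream indexes."""
    indexes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError("descending range: %r" % part)
            indexes.extend(range(start, end + 1))  # ranges are inclusive
        else:
            indexes.append(int(part))
    return indexes


def stream_filename(prefix, index):
    """Build an output name such as "ann-01" (padding width is an assumption)."""
    return "%s-%02d" % (prefix, index)


if __name__ == "__main__":
    print(parse_indexes("1-3,7,10-12"))  # [1, 2, 3, 7, 10, 11, 12]
    print(stream_filename("ann", 1))     # ann-01
```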

httpparse

httpparse reads a TCP stream representing an HTTP session between a client and a host. Without any options, it prints summary information on the transactions. The summary includes client GET requests and client headers such as UserAgent, Host, and Cookie. Reply data includes the initial HTTP status, Content-Encoding, Content-Type, and Content-Length.

Each HTTP stream may contain several client and server interactions. The responses from the server vary, but are typically HTML/XML to be rendered by the browser, or images in one of several formats. httpparse extracts the contents of server replies, performing any necessary decoding of base64 encoding and gzip compression, and saves the results to a file whose type matches the MIME type indicated in the reply headers. MD5 hashes are optionally computed on the resulting files and stored alongside the extracted files. One or more server responses can be extracted from a stream in a single invocation of httpparse by listing the index numbers of the replies on the command line. An option to extract all replies without listing them is also available.
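
The decode-then-hash step described above can be sketched in a few lines of Python using the standard gzip and hashlib modules. This only illustrates the idea and is not httpparse's implementation; the function names decode_body and md5_hex are hypothetical, and only the gzip case is handled.

```python
import gzip
import hashlib


def decode_body(body, content_encoding=""):
    """Undo the Content-Encoding of a reply body (gzip only in this sketch)."""
    if content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def md5_hex(data):
    """Return the MD5 digest of the decoded data as a hex string."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    # Made-up payload standing in for an extracted reply body.
    original = b"<plist version=\"1.0\"></plist>"
    wire = gzip.compress(original)
    decoded = decode_body(wire, "gzip")
    print(decoded == original)
    print(md5_hex(decoded))
```

Hashing the decoded bytes rather than the bytes on the wire means the stored hash can be checked against the extracted file later.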

plistparse

plistparse reads XML-formatted Apple PLIST files and helps the user read them and find specific data within them. The heart of plistparse is the PlistParse.pm module, which implements a finite state machine that recognizes the grammar of Apple PLIST files in XML format. The grammar is defined in the file PlistParse.yp, which is compiled using the Perl Parse::Yapp module. Parse::Yapp can be obtained from CPAN, but it is not needed to run plistparse: PlistParse.pm is self-contained and does not require Parse::Yapp to be installed. In order to run plistparse, PlistParse.pm must either be installed as a local Perl module or kept in the same directory as plistparse.

Without options, plistparse simply echoes the contents of a PLIST XML file, but with XML tags stripped and replaced with less verbose labels. The result is the original data contained in the PLIST but in a (hopefully) more human-readable format. To further facilitate finding data within PLIST files, plistparse accepts Perl-style regular expressions, either directly on the command line or contained in a file, to limit the information it displays.
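
For readers without Perl at hand, the same kind of filtered view can be approximated with Python's standard plistlib module. The sketch below is not a port of plistparse. The function names walk and filter_plist are hypothetical, the rule that a pair is shown when any pattern matches its key or its value is an assumption about the -x option, and the sample PLIST is a made-up miniature rather than data from the capture.

```python
import plistlib
import re


def walk(node, key=None):
    """Yield (key, value) for every string value anywhere in the PLIST."""
    if isinstance(node, dict):
        for child_key, child in node.items():
            yield from walk(child, child_key)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child, key)
    elif isinstance(node, str):
        yield key, node


def filter_plist(xml_bytes, patterns):
    """Return pairs whose key or value matches any pattern (all if none given)."""
    compiled = [re.compile(p) for p in patterns]
    pairs = []
    for key, value in walk(plistlib.loads(xml_bytes)):
        label = key or ""
        if not compiled or any(rx.search(label) or rx.search(value) for rx in compiled):
            pairs.append((label, value))
    return pairs


if __name__ == "__main__":
    # Made-up miniature PLIST; the "items" and "kind" keys are illustrative only.
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0"><dict><key>items</key><array><dict>
<key>title</key><string>Hackers</string>
<key>url</key><string>http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/viewMovie?id=333441649&amp;s=143441</string>
<key>kind</key><string>movie</string>
</dict></array></dict></plist>"""
    for key, value in filter_plist(sample, ["title$", "333441649"]):
        print("  Key: %s" % key)
        print("  String_Value: %s" % value)
```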

Modules built with the Parse::Yapp module, including PlistParse.pm, contain code copyrighted by Francois Desarmenien and released under the GNU GPL.