Packet capture in Python


I’m home sick with a cold today and got bored. I wanted to play with packet capture in Python, and the documentation for pcapy is a little sparse. I therefore wrote this simple little sample script:

    #!/usr/bin/python
    
    # A simple example of how to use pcapy. This needs to be run as root.
    
    import datetime
    import gflags
    import pcapy
    import sys
    
    FLAGS = gflags.FLAGS
    gflags.DEFINE_string('i', 'eth1',
                         'The name of the interface to monitor')
    
    def main(argv):
      # Parse flags
      try:
        argv = FLAGS(argv)
      except gflags.FlagsError, e:
        # Report the problem and the flags help, then bail out
        print '%s\n%s' % (e, FLAGS)
        sys.exit(1)
    
      print 'Opening %s' % FLAGS.i
    
      # Arguments here are:
      #   device
      #   snaplen (maximum number of bytes to capture _per_packet_)
      #   promiscuous mode (1 for true)
      #   timeout (in milliseconds)
      cap = pcapy.open_live(FLAGS.i, 100, 1, 0)
    
      # Read packets -- header contains information about the data from pcap,
      # payload is the actual packet as a string
      (header, payload) = cap.next()
      while header:
        print ('%s: captured %d bytes, truncated to %d bytes'
               %(datetime.datetime.now(), header.getlen(), header.getcaplen()))
    
        (header, payload) = cap.next()
    
    if __name__ == "__main__":
      main(sys.argv)
    

That outputs something like this:

    2008-11-25 10:09:53.308310: captured 98 bytes, truncated to 98 bytes
    2008-11-25 10:09:53.308336: captured 66 bytes, truncated to 66 bytes
    2008-11-25 10:09:53.315028: captured 66 bytes, truncated to 66 bytes
    2008-11-25 10:09:53.316520: captured 130 bytes, truncated to 100 bytes
    2008-11-25 10:09:53.317030: captured 450 bytes, truncated to 100 bytes
    2008-11-25 10:09:53.324414: captured 124 bytes, truncated to 100 bytes
    2008-11-25 10:09:53.327770: captured 114 bytes, truncated to 100 bytes
    2008-11-25 10:09:53.328001: captured 210 bytes, truncated to 100 bytes
    

Next step, decode me some headers!
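
As a hint of that next step, here’s a minimal sketch of pulling the Ethernet header apart with the struct module (my own illustration, not anything pcapy provides):

    import struct
    
    def decode_ethernet(payload):
      # The first 14 bytes are destination MAC, source MAC and ethertype
      (dst, src, ethertype) = struct.unpack('!6s6sH', payload[:14])
      dst = ':'.join(['%02x' % ord(c) for c in dst])
      src = ':'.join(['%02x' % ord(c) for c in src])
      return (dst, src, ethertype)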


Finding lock deadlocks in Python


I refactored some code today, and in the process managed to create a lock deadlock for myself. In the end it turned out that an exception was being thrown while a lock was held, and adding a try / finally resolved the real underlying problem. However, along the way I ended up writing this little helper that I am sure will be useful in the future.

    import gflags
    import thread
    import threading
    import traceback
    import logging
    
    ...
    
    FLAGS = gflags.FLAGS
    gflags.DEFINE_boolean('dumplocks', False,
                          'If true, dumps information about lock activity')
    ...
    
    class LockHelper(object):
      """A wrapper which makes it easier to see what locks are doing."""
    
      def __init__(self):
        # Allocate the lock per instance, not at class level, so each
        # LockHelper guards its own critical section
        self.lock = thread.allocate_lock()
    
      def acquire(self):
        if FLAGS.dumplocks:
          logging.info('%s acquiring lock' % threading.currentThread().getName())
          for s in traceback.extract_stack():
            logging.info('  Trace %s:%s [%s] %s' % s)
        self.lock.acquire()
    
      def release(self):
        if FLAGS.dumplocks:
          logging.info('%s releasing lock' % threading.currentThread().getName())
          for s in traceback.extract_stack():
            logging.info('  Trace %s:%s [%s] %s' % s)
        self.lock.release()
    

Now I can just use this helper in place of thread.allocate_lock() when I want to see what is happening with locking. It saved me a lot of staring at random code today.
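
Usage is a drop-in replacement. For example (a sketch; do_something_critical() stands in for whatever the lock guards):

    lock = LockHelper()
    
    lock.acquire()
    try:
      # do_something_critical() is a stand in for the guarded work
      do_something_critical()
    finally:
      lock.release()

Note the try / finally, which is the same pattern that fixed my original deadlock.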


paramiko exec_command timeout


I have a paramiko program which sshes to a large number of machines, and sometimes it hits a machine where Channel.exec_command() doesn’t return. I know this is a problem with the remote machine, because the same thing happens when I try to ssh to the machine from the command line. However, I don’t have any way of determining which machines are broken beforehand.

Paramiko doesn’t support a timeout for exec_command(), so I am looking for a generic way of running a function call with a timeout. I can see sample code which does this using threads, but that’s pretty ugly. I can’t use SIGALRM because I am not running on the main thread.
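
For reference, the thread based version looks something like this (a sketch; run_with_timeout and everything in it is my naming):

    import threading
    
    def run_with_timeout(func, args, timeout):
      # Run func(*args) on a worker thread, and give up waiting after
      # timeout seconds. Note the abandoned daemon thread keeps running.
      result = []
    
      def worker():
        result.append(func(*args))
    
      t = threading.Thread(target=worker)
      t.setDaemon(True)
      t.start()
      t.join(timeout)
    
      if t.isAlive():
        raise Exception('Timed out after %s seconds' % timeout)
      return result[0]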

Can anyone think of a better way of doing this?


Weird paramiko problem


I had a strange paramiko problem the other day. Sometimes executing a command through a channel (via the exec_command() call) would result in an exit code being returned, but no stdout or stderr. This was for a command I was absolutely sure always returned output, and it wasn’t consistent: I’d run batches of commands and about 10% of them would fail, but not always on the same machine and not always at the same time. I spent ages looking at my code, and at the code for the command running at the other end of the channel.

Then it occurred to me that this seemed a lot like a race condition. I started looking at the code for the paramiko Channel class, and ended up deciding that the answer was to check that the eof_received member variable was true before trying to close the channel.
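
In code, the check is something like this (a sketch, where chan is the channel that just ran exec_command()):

    import time
    
    # Wait for the remote end to signal EOF before closing the channel;
    # chan is the paramiko Channel the command ran on
    while not chan.eof_received:
      time.sleep(0.1)
    chan.close()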

It turns out this just works. I’ve had my code running commands for a couple of days now and have seen no more instances of the “no output, but did exit” error. So, there you go. It’s a shame that member variable doesn’t have accessors and isn’t documented though. I guess that makes my code a little more fragile than I would be happy with.


Executing a command with paramiko


I wanted to provide a simple example of how to execute a command with paramiko as well. This is quite similar to the scp example, but is nicer than executing the command in a shell because there is no need to parse shell output to work out when the command has finished executing.

    #!/usr/bin/python
    
    # A simple command example for Paramiko.
    # Args:
    #   1: hostname
    #   2: username
    #   3: command to run
    
    import getpass
    import paramiko
    import socket
    import sys
    
    # Socket connection to remote host
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((sys.argv[1], 22))
    
    # Build an SSH transport
    t = paramiko.Transport(sock)
    t.start_client()
    t.auth_password(sys.argv[2], getpass.getpass('Password: '))
    
    # Start a cmd channel
    cmd_channel = t.open_session()
    cmd_channel.exec_command(sys.argv[3])
    
    data = cmd_channel.recv(1024)
    while data:
      sys.stdout.write(data)
      data = cmd_channel.recv(1024)
    
    # Cleanup
    cmd_channel.close()
    t.close()
    sock.close()
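
If you also want the command’s exit code, paramiko’s Channel provides recv_exit_status(). A sketch of using it, placed just before the cleanup in the listing above:

    # Block until the remote command finishes, then report its exit code
    status = cmd_channel.recv_exit_status()
    print 'Command exited with status %d' % status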
    

Implementing SCP with paramiko


Regular readers will note that I’ve been interested in how scp works and in paramiko for the last couple of days. There are previous examples of how to do scp with paramiko out there, but the code isn’t all in one place; you have to read through a mail thread and work it out from there. I figured I might save someone some time (possibly me!) and post a complete example of scp with paramiko…

    #!/usr/bin/python
    
    # A simple scp example for Paramiko.
    # Args:
    #   1: hostname
    #   2: username
    #   3: local filename
    #   4: remote filename
    
    import getpass
    import os
    import paramiko
    import socket
    import sys
    
    # Socket connection to remote host
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((sys.argv[1], 22))
    
    # Build an SSH transport
    t = paramiko.Transport(sock)
    t.start_client()
    t.auth_password(sys.argv[2], getpass.getpass('Password: '))
    
    # Start a scp channel
    scp_channel = t.open_session()
    
    f = open(sys.argv[3], 'rb')
    stat = os.stat(sys.argv[3])
    
    # Ask the remote scp to receive a file into the target directory
    scp_channel.exec_command('scp -v -t %s\n'
                             % '/'.join(sys.argv[4].split('/')[:-1]))
    
    # File header: permissions, size in bytes, and the base filename
    scp_channel.send('C%s %d %s\n'
                     % (oct(stat.st_mode)[-4:],
                        stat.st_size,
                        sys.argv[4].split('/')[-1]))
    scp_channel.sendall(f.read())
    
    # A null byte signals that the transfer is complete
    scp_channel.send('\x00')
    
    # Cleanup
    f.close()
    scp_channel.close()
    t.close()
    sock.close()
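
You’d run this something like the following (hypothetical script name and values):

    ./scp.py remotehost mikal /tmp/local.txt /tmp/remote.txt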
    

SSL, X509, ASN.1 and certificate validity dates


I was curious about how SSL certificates store validity information (for example, when a certificate expires), so I ended up reading the X.509 specification (excitingly called “Internet X.509 Public Key Infrastructure Certificate and CRL Profile”), as well as the ASN.1 information for UTCTimes. This is all new to me, but I am sure lots of other people understand this.
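
As an example of what gets stored, validity dates are ASN.1 UTCTime strings like ‘081125100953Z’. A sketch of decoding one (my own illustration, not the TLSlite code):

    import datetime
    
    def parse_utctime(s):
      # Decode an ASN.1 UTCTime string such as '081125100953Z'. The X.509
      # profile says two digit years below 50 mean 20YY, and 50 or above
      # mean 19YY.
      year = int(s[0:2])
      if year < 50:
        year += 2000
      else:
        year += 1900
      return datetime.datetime(year, int(s[2:4]), int(s[4:6]),
                               int(s[6:8]), int(s[8:10]), int(s[10:12]))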

In the end it wasn’t too hard, and now I have hacked support for displaying certificate validity into Python’s TLSlite. The point of this post is mainly so I can find that documentation again if I need it, although I’ll put the TLSlite patch online as soon as I have had a chance to test it a little better.


Dealing with remote HTTP servers with buggy chunking implementations


HTTP 1.1 implements chunked transfer encoding as a way for servers to tell clients how much content remains for a given response, which makes it possible to send more than one response over a single HTTP connection. Unfortunately for me, the site I was trying to access has a buggy chunking implementation, and that causes the somewhat fragile Python urllib2 code to throw an exception:

    Traceback (most recent call last):
      File "./mythingie.py", line 55, in ?
        xml = remote.readlines()
      File "/usr/lib/python2.4/socket.py", line 382, in readlines
        line = self.readline()
      File "/usr/lib/python2.4/socket.py", line 332, in readline
        data = self._sock.recv(self._rbufsize)
      File "/usr/lib/python2.4/httplib.py", line 460, in read
        return self._read_chunked(amt)
      File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked
        chunk_left = int(line, 16)
    ValueError: invalid literal for int():
    

I muttered about this earlier today, including finding the bug tracking the problem in pythonistan. However, finding the will-not-fix bug wasn’t satisfying enough…

It turns out you can just have urllib2 lie to the server about what HTTP version it talks, and therefore turn off chunking. Here’s my sample code for how to do that:

    import httplib
    import urllib2
    
    class HTTP10Connection(httplib.HTTPConnection):
      """HTTP10Connection -- a HTTP connection which is forced to ask for HTTP
         1.0
      """
    
      _http_vsn_str = 'HTTP/1.0'
    
    class HTTP10Handler(urllib2.HTTPHandler):
      """HTTP10Handler -- don't use HTTP 1.1"""
    
      def http_open(self, req):
        return self.do_open(HTTP10Connection, req)
    
    # ...
    
      request = urllib2.Request(feed)
      request.add_header('User-Agent', 'mythingie')
      opener = urllib2.build_opener(HTTP10Handler())
    
      remote = opener.open(request)
      content = remote.readlines()
      remote.close()
    

I hereby declare myself Michael Still, bringer of the gross python hacks.


Universal Feedparser and XML namespaces


I’ve always found Python’s Universal Feedparser to be a bit hard to work with when using feeds with XML namespaces. Specifically, if you don’t care about the stuff in the namespaces then you’re fine, but if you want that data it gets a lot harder.

In the past I’ve had to do some gross hacks. For example, this gem is from the MythNetTV code:

      # Modify the XML to work around namespace handling bugs in FeedParser
      lines = []
      re_mediacontent = re.compile('(.*)<media:content([^>]*)/ *>(.*)')
    
      for line in xmllines:
        m = re_mediacontent.match(line)
        count = 1
        while m:
          line = ('%s<media:wannabe%d>%s</media:wannabe%d>%s'
                  % (m.group(1), count, m.group(2), count, m.group(3)))
          m = re_mediacontent.match(line)
          count = count + 1
    
        lines.append(line)
    
      # Parse the modified XML
      xml = ''.join(lines)
      parser = feedparser.parse(xml)
    

That’s horrible, but works. This time around, the problem is that I am having trouble getting at the gr:annotation tags in my Google Reader shared items feed. How annoying.

In the case of the Google Reader feed, the problem seems to be that the annotation is presented like this:

    <gr:annotation><content type="html">Awesome. Canberra has needed
    something better than buses between the towncenters for a while, and light rail
    seems like a great way to do it. I much prefer trains to buses, and catch a
    light rail service to work every day when I am in Mountain View.
    </content><author gr:user-id="09387883873401903052"
    gr:profile-id="114835605728492647856"><name>mikal</name>
    </author></gr:annotation>
    

Feedparser can only handle simple elements (not elements that contain other elements), so this gross hack is required to get the feed to parse correctly:

      simplify_re = re.compile('(.*)<gr:annotation>'
                               '<content type="html">(.*)</content>'
                               '<author .*><name>.*</name></author>'
                               '</gr:annotation>(.*)')
    
      new_lines = []
      for line in lines:
        m = simplify_re.match(line)
        if m:
          new_lines.append('%s<gr:annotation>%s</gr:annotation>%s'
                           %(m.group(1), m.group(2), m.group(3)))
        else:
          new_lines.append(line)
    
      d = feedparser.parse(''.join(new_lines))
    

Gross, and fragile, but working. This is cool, because it now means that I can apply more logic to the shared links that end up in my blather feed. I’m thinking of something along the lines of: only shared links with an annotation will end up in that feed, and the blather entry will include the annotation. Or something like that.


Domain name lookup helper for Python?


Hi. I have a list of the domain portion of URLs which looks a bit like this:

    Whois lookup for fycnds.digitalpoimt.com
    Whois lookup for wvgpzdea.digitalpoimt.com
    Whois lookup for zhnsht.digitalpoimt.com
    Whois lookup for frigo25.php5.cz
    Whois lookup for handrovina.php5.cz
    Whois lookup for blabota.php5.cz
    Whois lookup for pctuzing.php5.cz
    Whois lookup for viagraviagra.php5.cz
    Whois lookup for poiu.php5.cz
    Whois lookup for flasa.php5.cz
    Whois lookup for yoy4.digitalpoimt.com
    Whois lookup for hskly.digitalpoimt.com
    Whois lookup for 2i0wjwbc.digitalpoimt.com
    Whois lookup for harnhjc.digitalpoimt.com
    Whois lookup for gqru.digitalpoimt.com

I need some code which determines which portion of these hostnames is a whois-able domain name. My problem is that this doesn’t seem all that simple to do: some countries have a second layer of TLDs, and some do not.
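
To show why the naive approach breaks (a sketch, not a solution):

    def naive_domain(hostname):
      # Take the last two labels. This gets 'php5.cz' right, but for a
      # host like 'foo.example.com.au' it returns 'com.au', which is a
      # TLD pair rather than a whois-able domain.
      return '.'.join(hostname.split('.')[-2:])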

Does anyone know of a Python library, or failing that a simple algorithm, which will do this for me?

(For those left wondering, I am trying to do some analysis of the spam I get on this blog, and for that I want to know if the whois information for a domain that left a suspect comment indicates anything suspicious.)
