It seems that there is no way of killing a blocked thread in Python. The standard way of implementing thread death seems to be to implement an exit() method on the thread class, and then call that when you want the thread to die. However, if the thread's run() method is blocked in a system call when you call exit(), then the thread never dies. I can't find a way of killing these threads cleanly on Linux. Does anyone have any hints?
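For what it's worth, the closest thing to a clean answer I know of is to avoid blocking forever in the first place: have run() wait with a timeout and poll a flag between waits. A minimal sketch (the class and method names here are mine, not a standard API):

```python
import threading
import time

class StoppableWorker(threading.Thread):
    """A worker which polls an Event instead of blocking forever."""

    def __init__(self):
        threading.Thread.__init__(self)
        self._stop_event = threading.Event()
        self.iterations = 0

    def run(self):
        # Instead of one indefinitely blocking call, wait with a short
        # timeout so a stop request is noticed promptly.
        while not self._stop_event.is_set():
            self.iterations += 1
            self._stop_event.wait(0.01)

    def exit(self):
        self._stop_event.set()

worker = StoppableWorker()
worker.start()
time.sleep(0.05)
worker.exit()
worker.join(1.0)
print(worker.is_alive())
```

This obviously doesn't help when the blocking happens inside a library you don't control, which is exactly the case I'm complaining about.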
I’m home sick with a cold today and got bored. I wanted to play with packet capture in python, and the documentation for pcapy is a little sparse. I therefore wrote this simple little sample script:
#!/usr/bin/python

# A simple example of how to use pcapy. This needs to be run as root.

import datetime
import gflags
import pcapy
import sys

FLAGS = gflags.FLAGS
gflags.DEFINE_string('i', 'eth1', 'The name of the interface to monitor')

def main(argv):
  # Parse flags
  try:
    argv = FLAGS(argv)
  except gflags.FlagsError, e:
    print FLAGS
    sys.exit(1)

  print 'Opening %s' % FLAGS.i

  # Arguments here are:
  #   device
  #   snaplen (maximum number of bytes to capture _per packet_)
  #   promiscuous mode (1 for true)
  #   timeout (in milliseconds)
  cap = pcapy.open_live(FLAGS.i, 100, 1, 0)

  # Read packets -- header contains information about the data from pcap,
  # payload is the actual packet as a string
  (header, payload) = cap.next()
  while header:
    print ('%s: captured %d bytes, truncated to %d bytes'
           %(datetime.datetime.now(), header.getlen(), header.getcaplen()))
    (header, payload) = cap.next()

if __name__ == "__main__":
  main(sys.argv)
Which outputs something like this:
2008-11-25 10:09:53.308310: captured 98 bytes, truncated to 98 bytes
2008-11-25 10:09:53.308336: captured 66 bytes, truncated to 66 bytes
2008-11-25 10:09:53.315028: captured 66 bytes, truncated to 66 bytes
2008-11-25 10:09:53.316520: captured 130 bytes, truncated to 100 bytes
2008-11-25 10:09:53.317030: captured 450 bytes, truncated to 100 bytes
2008-11-25 10:09:53.324414: captured 124 bytes, truncated to 100 bytes
2008-11-25 10:09:53.327770: captured 114 bytes, truncated to 100 bytes
2008-11-25 10:09:53.328001: captured 210 bytes, truncated to 100 bytes
Next step, decode me some headers!
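As a taste of what that decoding looks like, here's a quick sketch which unpacks the Ethernet header at the front of a captured payload (the frame bytes below are made up for illustration):

```python
import struct

def decode_ethernet(payload):
    """Unpack the 14 byte Ethernet header at the front of a packet."""
    # Network byte order: 6 byte destination MAC, 6 byte source MAC,
    # then a 2 byte ethertype
    dst, src, ethertype = struct.unpack('!6s6sH', payload[:14])

    def mac(raw):
        return ':'.join('%02x' % b for b in bytearray(raw))

    return mac(dst), mac(src), ethertype

# A made-up frame: broadcast destination, arbitrary source, IPv4 (0x0800)
frame = b'\xff\xff\xff\xff\xff\xff\x00\x1a\x2b\x3c\x4d\x5e\x08\x00hello'
print(decode_ethernet(frame))
# -> ('ff:ff:ff:ff:ff:ff', '00:1a:2b:3c:4d:5e', 2048)
```

An ethertype of 2048 (0x0800) means the rest of the payload is an IPv4 packet, which is where the real decoding work starts.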
I re-factored some code today, and in the process managed to create a deadlock for myself. In the end it turned out that an exception was being thrown while a lock was held, and adding a try / finally resolved the real underlying problem. However, along the way I ended up writing this little helper that I am sure will be useful in the future.
import gflags
import logging
import thread
import threading
import traceback

...

FLAGS = gflags.FLAGS
gflags.DEFINE_boolean('dumplocks', False,
                      'If true, dumps information about lock activity')

...

class LockHelper(object):
  """A wrapper which makes it easier to see what locks are doing."""

  def __init__(self):
    # One lock per instance -- a class level lock would be shared by
    # every LockHelper in the process
    self.lock = thread.allocate_lock()

  def acquire(self):
    if FLAGS.dumplocks:
      logging.info('%s acquiring lock'
                   % threading.currentThread().getName())
      for s in traceback.extract_stack():
        logging.info('  Trace %s:%s [%s] %s' % s)
    self.lock.acquire()

  def release(self):
    if FLAGS.dumplocks:
      logging.info('%s releasing lock'
                   % threading.currentThread().getName())
      for s in traceback.extract_stack():
        logging.info('  Trace %s:%s [%s] %s' % s)
    self.lock.release()
Now I can just use this helper in place of thread.allocate_lock() when I want to see what is happening with locking. It saved me a lot of staring at random code today.
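For the record, the underlying fix was nothing fancier than the standard acquire / try / finally pattern, which guarantees the lock is released even when the guarded code throws. A self-contained sketch of why it matters:

```python
import threading

lock = threading.Lock()

def risky_update():
    lock.acquire()
    try:
        # Simulate the guarded code blowing up mid-update
        raise ValueError('something went wrong mid-update')
    finally:
        # The lock is released even though the exception propagates,
        # so other callers are not deadlocked
        lock.release()

try:
    risky_update()
except ValueError:
    pass

# Without the finally, this non-blocking acquire would fail and a
# blocking one would deadlock forever
acquired = lock.acquire(False)
print(acquired)
```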
I have a paramiko program which sshes to a large number of machines, and sometimes it hits a machine where Channel.exec_command() doesn’t return. I know this is a problem with the remote machine, because the same thing happens when I try to ssh to the machine from the command line. However, I don’t have any way of determining which machines are broken beforehand.
Paramiko doesn’t support a timeout for exec_command(), so I am looking for a generic way of running a function call with a timeout. I can see sample code which does this using threads, but that’s pretty ugly. I can’t use SIGALRM because I am not running on the main thread.
Can anyone think of a better way of doing this?
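For reference, the thread-based approach I mentioned looks something like this sketch (run_with_timeout is my own name, not a library call). The ugly part is that the worker thread can't actually be killed on timeout; it just gets abandoned as a daemon thread:

```python
import threading
import time

def run_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a worker thread, giving up after timeout seconds.

    Returns (finished, result). On timeout the worker is abandoned --
    it can't actually be killed, which is the ugly part.
    """
    result = []

    def worker():
        result.append(func(*args, **kwargs))

    t = threading.Thread(target=worker)
    t.daemon = True  # don't keep the process alive for a stuck call
    t.start()
    t.join(timeout)
    if t.is_alive():
        return (False, None)
    return (True, result[0])

print(run_with_timeout(lambda: 42, 1.0))
# -> (True, 42)

print(run_with_timeout(time.sleep, 0.1, 10))
# -> (False, None)
```

Note that this swallows any exception behaviour you might care about, and a long-running abandoned call still holds whatever resources it grabbed. That's why I'd like something better.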
I had a strange paramiko problem the other day. Sometimes executing a command through a channel (via the exec_command() call) would result in an exit code being returned, but no stdout or stderr. This was for a command I was absolutely sure always returns output, and it wasn’t consistent — I’d run batches of commands and about 10% of them would fail, but not always on the same machine and not always at the same time. I spent ages looking at my code, and the code for the command running at the other end of the channel.
Then it occurred to me that this seemed a lot like a race condition. I started looking at the code for the paramiko Channel class, and ended up deciding that the answer was to check that the eof_received member variable was true before trying to close the channel.
It turns out this just works. I’ve had my code running commands for a couple of days now and have had no more instances of the “no output, but did exit” error. So, there you go. It’s a shame that member variable doesn’t have accessors and isn’t documented though. I guess that makes my code a little more fragile than I would be happy with.
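For what it's worth, the check I ended up with looks roughly like this sketch. The FakeChannel class here is just a stand-in so the example is self-contained; in real code you'd pass the paramiko Channel itself:

```python
import time

def wait_for_eof(channel, timeout=5.0, interval=0.1):
    """Poll the channel's eof_received member before closing it.

    eof_received is an undocumented paramiko member, so this is
    fragile by design.
    """
    deadline = time.time() + timeout
    while not channel.eof_received and time.time() < deadline:
        time.sleep(interval)
    return channel.eof_received

# A stand-in for a paramiko Channel, just to show the shape of the call
class FakeChannel(object):
    eof_received = True

print(wait_for_eof(FakeChannel()))
# -> True
```

Only once wait_for_eof() returns true do I go ahead and close the channel.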
I wanted to provide a simple example of how to execute a command with paramiko as well. This is quite similar to the scp example, but is nicer than executing a command in a shell because there isn’t any requirement to do parsing to determine when the command has finished executing.
#!/usr/bin/python

# A simple command example for Paramiko.
# Args:
#   1: hostname
#   2: username
#   3: command to run

import getpass
import paramiko
import socket
import sys

# Socket connection to remote host
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((sys.argv[1], 22))

# Build a SSH transport
t = paramiko.Transport(sock)
t.start_client()
t.auth_password(sys.argv[2], getpass.getpass('Password: '))

# Start a cmd channel
cmd_channel = t.open_session()
cmd_channel.exec_command(sys.argv[3])

data = cmd_channel.recv(1024)
while data:
  sys.stdout.write(data)
  data = cmd_channel.recv(1024)

# Cleanup
cmd_channel.close()
t.close()
sock.close()
Regular readers will note that I’ve been interested in how scp works and paramiko for the last couple of days. There are previous examples of how to do scp with paramiko out there, but the code isn’t all in one place; you have to read through the mail thread and work it out from there. I figured I might save someone some time (possibly me!) and note a complete example of scp with paramiko…
#!/usr/bin/python

# A simple scp example for Paramiko.
# Args:
#   1: hostname
#   2: username
#   3: local filename
#   4: remote filename

import getpass
import os
import paramiko
import socket
import sys

# Socket connection to remote host
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((sys.argv[1], 22))

# Build a SSH transport
t = paramiko.Transport(sock)
t.start_client()
t.auth_password(sys.argv[2], getpass.getpass('Password: '))

# Start a scp channel
scp_channel = t.open_session()
f = file(sys.argv[3], 'rb')
scp_channel.exec_command('scp -v -t %s\n'
                         % '/'.join(sys.argv[4].split('/')[:-1]))
scp_channel.send('C%s %d %s\n'
                 %(oct(os.stat(sys.argv[3]).st_mode)[-4:],
                   os.stat(sys.argv[3]).st_size,
                   sys.argv[4].split('/')[-1]))
scp_channel.sendall(f.read())

# Cleanup
f.close()
scp_channel.close()
t.close()
sock.close()
I was curious about how SSL certificates store validity information (for example, when a certificate expires), so I ended up reading the X.509 specification (excitingly called “Internet X.509 Public Key Infrastructure Certificate and CRL Profile”), as well as the ASN.1 information for UTCTimes. This is all new to me, but I am sure lots of other people understand this.
In the end it wasn’t too hard, and now I have hacked support for displaying certificate validity into Python’s TLSlite. The point of this post is mainly so I can find that documentation again if I need it, although I’ll put the TLSlite patch online as soon as I have had a chance to test it a little better.
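For my own future reference, the UTCTime part turned out to be straightforward. Validity times are strings like '081125100953Z' (YYMMDDHHMMSSZ), and the profile says two-digit years below 50 mean 20YY while 50 and above mean 19YY. A little sketch of how I understand the decoding (parse_utctime is my own name for it):

```python
import datetime

def parse_utctime(value):
    """Parse an ASN.1 UTCTime such as '081125100953Z' (YYMMDDHHMMSSZ).

    Per the X.509 profile, two-digit years 00-49 mean 20YY and
    50-99 mean 19YY.
    """
    year = int(value[0:2])
    year += 2000 if year < 50 else 1900
    return datetime.datetime(year, int(value[2:4]), int(value[4:6]),
                             int(value[6:8]), int(value[8:10]),
                             int(value[10:12]))

print(parse_utctime('081125100953Z'))
# -> 2008-11-25 10:09:53
```

The trailing Z means the time is UTC; certificates with dates in 2050 or later are meant to use GeneralizedTime (four-digit years) instead, which this sketch doesn't handle.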
HTTP 1.1 implements chunking as a way for servers to tell clients how much content is left for a given request, which enables more than one piece of content to be sent over a single HTTP connection. Unfortunately for me, the site I was trying to access has a buggy chunking implementation, and that causes the somewhat fragile Python urllib2 code to throw an exception:
Traceback (most recent call last):
  File "./mythingie.py", line 55, in ?
    xml = remote.readlines()
  File "/usr/lib/python2.4/socket.py", line 382, in readlines
    line = self.readline()
  File "/usr/lib/python2.4/socket.py", line 332, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib/python2.4/httplib.py", line 460, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.4/httplib.py", line 499, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int():
I muttered about this earlier today, including finding the bug tracking the problem in pythonistan. However, finding the won’t-fix bug wasn’t satisfying enough…
It turns out you can just have urllib2 lie to the server about what HTTP version it talks, and therefore turn off chunking. Here’s my sample code for how to do that:
import httplib
import urllib2

class HTTP10Connection(httplib.HTTPConnection):
  """HTTP10Connection -- a HTTP connection which is forced to ask for
  HTTP 1.0
  """
  _http_vsn_str = 'HTTP/1.0'

class HTTP10Handler(urllib2.HTTPHandler):
  """HTTP10Handler -- don't use HTTP 1.1"""

  def http_open(self, req):
    return self.do_open(HTTP10Connection, req)

# ...

request = urllib2.Request(feed)
request.add_header('User-Agent', 'mythingie')
opener = urllib2.build_opener(HTTP10Handler())
remote = opener.open(request)
content = remote.readlines()
remote.close()
I hereby declare myself Michael Still, bringer of the gross python hacks.
I’ve always found python’s Universal Feedparser to be a bit hard to work with when using feeds with XML namespaces. Specifically, if you don’t care about the stuff in the namespaces then you’re fine, but if you want that data it gets a lot harder.
In the past I’ve had to do some gross hacks. For example this gem is from the MythNetTV code:
# Modify the XML to work around namespace handling bugs in FeedParser
lines = []
re_mediacontent = re.compile('(.*)<media:content([^>]*)/ *>(.*)')
for line in xmllines:
  m = re_mediacontent.match(line)
  count = 1
  while m:
    line = ('%s<media:wannabe%d>%s</media:wannabe%d>%s'
            %(m.group(1), count, m.group(2), count, m.group(3)))
    m = re_mediacontent.match(line)
    count = count + 1
  lines.append(line)

# Parse the modified XML
xml = ''.join(lines)
parser = feedparser.parse(xml)
Which is horrible, but works. This time around the problem is that I am having trouble getting to the gr:annotation tags in my Google reader shared items feed. How annoying.
In the case of the Google reader feed, the problem seems to be that the annotation is presented like this:
<gr:annotation><content type="html">Awesome. Canberra has needed something better than buses between the towncenters for a while, and light rail seems like a great way to do it. I much prefer trains to buses, and catch a light rail service to work every day when I am in Mountain View. </content><author gr:user-id="09387883873401903052" gr:profile-id="114835605728492647856"><name>mikal</name> </author></gr:annotation>
Feedparser can only handle simple elements (not elements that contain other elements). Therefore, this gross hack is required to get this to parse correctly:
simplify_re = re.compile('(.*)<gr:annotation>'
                         '<content type="html">(.*)</content>'
                         '<author .*><name>.*</name></author>'
                         '</gr:annotation>(.*)')

new_lines = []
for line in lines:
  m = simplify_re.match(line)
  if m:
    new_lines.append('%s<gr:annotation>%s</gr:annotation>%s'
                     %(m.group(1), m.group(2), m.group(3)))
  else:
    new_lines.append(line)

d = feedparser.parse(''.join(new_lines))
Gross, and fragile, but working. This is cool, because it now means that I can apply more logic in the shared links that end up in my blather feed. I’m thinking of something along the lines of only shared links with an annotation will end up in that feed, and the blather entry will include the annotation. Or something like that.