Ongoing PandaLex development

It's been a long time since the last release of PandaLex. This is because I have been focussing more on Panda. Apart from that, I have been putting a fair bit of effort into getting the parser engine right.

The parser engine is nearly there. I have a test set of 2,347 PDFs, of which PandaLex currently parses 2,320 correctly.

I’ll let you know when the next release is available…

PandaLex 0.1 Development Release

Here is the source code for PandaLex so far. I have released this exactly as it appears in the Panda code, because this is a starting point for people to have a look at it and comment on it. The following points should be noted:

  • It is not perfect
  • It is not a complete implementation of the PDF specification, version 1.3, yet
  • The next piece of development work for PandaLex is to get the hooks in place so that people can start using the parser for useful work. I am thinking this will take the form of a series of callback functions that can be defined by the user, but I am open to suggestions…

    Source (signed)
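To make the callback idea above a little more concrete, here is a rough sketch of what user-defined hooks could look like. It is written in Python purely for illustration and is not PandaLex code: the class `CallbackParser`, the method `on`, and the event names `object_start` and `object_end` are all made up for this example, and the "parser" is just a toy scanner that looks for `N G obj` and `endobj` markers in raw bytes (so it would be fooled by those words appearing inside streams or strings).

```python
import re
from collections import defaultdict

# Hypothetical event names, invented for this sketch; PandaLex defines no such API.
OBJECT_START = "object_start"
OBJECT_END = "object_end"

# Toy tokenizer: either an object header ("12 0 obj") or an "endobj" keyword.
_TOKEN = re.compile(rb"(\d+)\s+(\d+)\s+obj\b|\bendobj\b")


class CallbackParser:
    """Illustrative callback dispatcher, not the real PandaLex engine."""

    def __init__(self):
        # Maps an event name to the list of user callbacks registered for it.
        self._callbacks = defaultdict(list)

    def on(self, event, func):
        """Register a user-defined callback for the given event."""
        self._callbacks[event].append(func)

    def _fire(self, event, *args):
        # Call every callback registered for this event, in registration order.
        for func in self._callbacks[event]:
            func(*args)

    def parse(self, data):
        """Scan raw PDF bytes and fire callbacks as markers are found."""
        for match in _TOKEN.finditer(data):
            if match.group(1) is not None:
                # Object header: pass object number and generation number.
                self._fire(OBJECT_START, int(match.group(1)), int(match.group(2)))
            else:
                self._fire(OBJECT_END)
```

A user would create a `CallbackParser`, register functions with `on(OBJECT_START, ...)` and `on(OBJECT_END, ...)`, and then call `parse` on the bytes of a document. Whether the real hooks end up looking anything like this is still open.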

  • Why PandaLex?

    I have been thinking for the last day or so, and it occurs to me that the PDF parsing functionality in Panda is more generally useful than what is needed for Panda itself. For instance, the parser could also be used for PDF viewers, PDF modification (what Panda needs it for), or anything else you can think of.

    A good, simple, and probably fairly common use of PandaLex would be a command-line program that counts the number of pages in a PDF document. This could be useful for checking whether a document is damaged, for example.

    Therefore, welcome to PandaLex’s page… This is where development work for PandaLex will occur.
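As a rough illustration of the page-counting use mentioned above, here is a small sketch in Python. It does not use PandaLex at all: it is a naive heuristic that counts `/Type /Page` dictionary entries in the raw bytes, which is roughly workable for simple PDF 1.3 era files but will miscount documents with incremental updates, unused page objects, or pages stored in compressed object streams. A real parser would walk the page tree instead. The function names `count_pages` and `count_pages_in_file` are made up for this example.

```python
import re

# Matches "/Type /Page" (whitespace optional) but not "/Type /Pages",
# which is the page tree node rather than an individual page.
PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def count_pages(pdf_bytes):
    """Heuristically count page objects in raw PDF bytes."""
    return len(PAGE_OBJECT.findall(pdf_bytes))


def count_pages_in_file(path):
    """Read a file and count its pages; raise ValueError if it is not a PDF."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Every PDF file starts with a "%PDF-" header line.
    if not data.startswith(b"%PDF-"):
        raise ValueError("missing %PDF- header: " + path)
    return count_pages(data)
```

From a command-line wrapper, one would call `count_pages_in_file(sys.argv[1])` and print the result. A count of zero, or an exception, would be a hint that the document is damaged or not a plain PDF.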