Thinking about IDL-style descriptions of document formats

I’ve been mulling over IDL-style definitions of document formats in the background for the last few days. Specifically, I’m interested in ways of expressing the structure of a document outside of code, and then generating code to process documents that match that description. Sort of like lex and yacc, but more flexible and not tied to one language. This would mean that when you wanted to process a document in your chosen language, you wouldn’t have to deal with things like SWIG to wrap a parser written in some other language. You’d just generate the native code and go for it.
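To make the idea a bit more concrete, here is a rough sketch in Python of what I have in mind. Everything in it is made up for illustration: the header layout, the field names, and the make_parser helper are all hypothetical. It also cheats by building the parser at runtime from the description, where a real tool would presumably emit source code in whatever language you asked for.

```python
import struct

# Hypothetical description of a tiny binary document header.
# Each entry is (field name, struct format code). This layout is invented
# for illustration and does not correspond to any real format.
HEADER_SPEC = [
    ("magic", "4s"),         # 4-byte identifier
    ("version", "H"),        # unsigned 16-bit integer
    ("record_count", "I"),   # unsigned 32-bit integer
]


def make_parser(spec, byte_order="<"):
    """Build a parse function from a declarative list of fields."""
    fmt = byte_order + "".join(code for _, code in spec)
    names = [name for name, _ in spec]
    size = struct.calcsize(fmt)

    def parse(data):
        # Only the leading `size` bytes belong to this structure.
        values = struct.unpack(fmt, data[:size])
        return dict(zip(names, values))

    return parse


parse_header = make_parser(HEADER_SPEC)

# Build a sample header with the same layout and parse it back.
sample = struct.pack("<4sHI", b"DOC1", 2, 10)
header = parse_header(sample)
```

The point is that HEADER_SPEC is just data. In principle the same description could drive a generator for C, Java, or anything else, which is the part I actually care about.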

Obviously these ideas aren’t new. DCE RPC’s IDL works like this, as do Google’s Protocol Buffers. However, I want something more generic. Has anyone seen something like this?