File formats
From OpenSourceGov
Using free and open source software matters, but data formats matter just as much. Data almost always outlives the applications that read it. In the 1980s, the BBC partnered with Acorn Computers to create the BBC Domesday Project. Unlike the original Domesday Book, which was stored on the open but inflexible format called paper, the digital Domesday Project was written in BCPL and viewable only on a special LaserDisc player. Once that hardware became obsolete, the data became very hard to read, while the paper original remains legible after more than 900 years.
Hardware ought to output data in as open a format as is practical, and over an open protocol. Devices used by government should ideally use open interchange formats and an open protocol to get the data off the device. See Hardware Standards.
Similarly, even if the software in use is not open source, it should ideally have a standard interface: databases should speak SQL, and pretty much everything should work with TCP/IP and SSL. Web apps should produce good-quality, accessible HTML, CSS and JavaScript. Of course, open standards and file formats are only one part of the equation, and government should ideally be using free and open source software as well.
How to pick data formats
- The format should be widely adopted and relatively easy to understand.
- Plain-text formats ought to be preferred to binary formats unless a binary format is absolutely necessary.
- Try to ensure that there aren't patent or IP restrictions on the format. The W3C, for instance, has a strict Patent Policy to prevent vendors from using the standards process to push their proprietary technologies; it is an ideal model of how to prevent such problems.
- Formats that are advanced by blatant and gross abuse of the standards process ought to be avoided - see NoOOXML.
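As a rough illustration of the plain-text criterion above, the following Python sketch checks whether a file's bytes decode cleanly as text. The function name `looks_like_plain_text`, the default of UTF-8 and the NUL-byte test are all assumptions made for this example, not an established standard. It is only a heuristic: a binary file can occasionally pass it.

```python
def looks_like_plain_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Heuristic check that `data` is plain text in the given encoding.

    Assumption for this sketch: text files contain no NUL bytes and
    decode without errors. ASCII is a subset of UTF-8, so plain ASCII
    files pass with the default encoding.
    """
    if b"\x00" in data:
        # NUL bytes are a common sign of a binary format.
        return False
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def file_looks_like_plain_text(path: str, sample_size: int = 65536) -> bool:
    """Apply the heuristic to the first `sample_size` bytes of a file."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    # Note: a multi-byte character cut off at the sample boundary would
    # be reported as non-text; read the whole file if that matters.
    return looks_like_plain_text(sample)
```

A check like this could be run over an archive directory to flag files that may need a closer look before long-term storage.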
Good choices
- Plain text - ASCII and Unicode
- HTML/XHTML + CSS + JavaScript
- To include data in HTML, use Microformats, GRDDL and RDFa.
- TeX/LaTeX/XeTeX, DVI
- XML, XSL, RELAX NG
- Resource Description Framework (RDF) - RDF/XML, Notation3, Turtle
- Scalable Vector Graphics (SVG)
- Open Document Format (ODF)
- Rich Text Format (RTF)
- Portable Document Format (PDF)
- Ogg Vorbis and Theora (OGG)
- Free Lossless Audio Codec (FLAC)
- JPEG, PNG
- Digital Negative (DNG)
Archiving
- E-mail is a plain-text format. To preserve e-mails, it's best to keep them that way: prefer the Berkeley mailbox format (mbox) or Maildir over Microsoft's Personal Storage Table (PST) format.
- If you've got mail in Microsoft's proprietary PST format, this document may help you to convert it.
- Mozilla's Thunderbird client uses mbox, as does Evolution.
- To search Maildir, have a look at Mairix.
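Because mbox and Maildir are both open formats, moving mail between them needs no proprietary tools. The sketch below uses the `mailbox` module from Python's standard library. The function name `mbox_to_maildir` and the paths in the usage lines are hypothetical, and the sketch assumes the source file is a well-formed mbox.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a Maildir directory.

    Returns the number of messages copied. The source mbox is only
    read, never modified.
    """
    # create=False: raise an error instead of silently creating an
    # empty mbox if the path is wrong.
    source = mailbox.mbox(mbox_path, create=False)
    # create=True: build the Maildir layout (cur/, new/, tmp/) if needed.
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            dest.add(message)
            copied += 1
        dest.flush()
    finally:
        source.close()
        dest.close()
    return copied
```

It might be used like this, with placeholder paths:

```python
count = mbox_to_maildir("archive.mbox", "archive-maildir")
print(f"Copied {count} messages")
```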

