"Working" file formats, those used in the course of collecting and working with project data, are not always ideal for re-use or long-term preservation, and may not meet the requirements of data archives or repositories or satisfy the expectations of research funders.
In the absence of specific directives from funders or repositories, we offer the following general guidelines for selecting file formats for preservation and reuse. The University's scholarly repository, Research Online, has some specific guidelines for materials to be deposited there.
Open, non-proprietary formats are far more likely to remain usable even if the software that created them is unavailable or no longer works. Formats whose documentation is complete and freely available are also more likely to be preserved over the long term. If the program that created a file is the only way to read or access its data, the file is probably in a proprietary, non-open format. As a general rule, plain text formats, such as comma- or tab-delimited files, are open formats and are typically better for re-use and long-term preservation.
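As an illustration of exporting tabular data to a plain text format, the following minimal Python sketch writes a few records to comma-delimited text and reads them back using only the standard library. The sample records, column names, and helper function names are hypothetical and stand in for real project data; a real export would also need to consider character encoding and documentation of the columns.

```python
import csv
import io

# Hypothetical sample records standing in for project data.
records = [
    {"site": "A", "date": "2021-03-01", "temperature_c": "12.5"},
    {"site": "B", "date": "2021-03-01", "temperature_c": "9.8"},
]


def to_csv_text(rows):
    """Serialize a list of dicts to comma-delimited text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse comma-delimited text back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = to_csv_text(records)
restored = from_csv_text(csv_text)

# The data survives the round trip without the original software.
print(restored == records)
```

Because the result is plain text, any text editor or programming language can read it, which is the property that makes such formats suitable for re-use.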
Files saved in formats that compress their contents are often smaller, but the compression often permanently removes data from the file. Formats that discard data in this way are "lossy," while formats that allow the original information to be restored in full when the file is uncompressed are "lossless."
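One way to confirm that a compression method is lossless is to compress some data, decompress it, and compare checksums of the original and the restored bytes. The sketch below does this with gzip, a lossless method available in the Python standard library. The sample data and the helper name sha256_hex are assumptions made for illustration.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample data: a header row followed by many repeated rows.
original = ("site,date,temperature_c\n" + "A,2021-03-01,12.5\n" * 1000).encode("utf-8")

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# Lossless compression restores exactly the same bytes, so the checksums match.
is_lossless = sha256_hex(restored) == sha256_hex(original)
print(is_lossless, len(compressed) < len(original))
```

A lossy format, such as JPEG for images, would not pass this kind of check, because the data discarded during compression cannot be recovered.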
If the encryption key, passphrase, or password to an encrypted file is lost, there may be no way to retrieve the data from the file later, rendering it unusable to others.
Uncompiled source code is more readily re-usable by others and has a far greater likelihood of remaining usable over time, since it can be recompiled on different architectures and platforms.