
Research Data Management: File Management

Information for researchers on data management.

File management

File organization and naming conventions are often unique to the lab or researcher and can be highly personalized; the important thing is to be consistent and to write them down. Spending a little time on file management strategies early in the project planning process can save lots of time (and headaches) later. After determining conventions for file naming and organization, document and share them with collaborators, faculty advisors, student assistants, or anyone else who may need access to the data. Lab groups should establish a convention for the lab and save it to a shared space so that everyone can follow the same conventions.

Best practices

  • Use file naming conventions
    • Make file names unique, including the most important identifying information of the project. File names should be short, so do not try to include all of these, but elements of a good file name may include:
      • project name, acronym, or research data name
      • study title
      • location information
      • researcher initials
      • date, formatted consistently (e.g., YYYYMMDD or YYYY-MM-DD; see ISO 8601)
      • version number
  • Use leading zeros when incorporating numbers to enable sorting (a sequence of 1-100 should be numbered 001-100).
  • Use underscores to separate elements; avoid special characters, spaces, and periods other than the one before the file extension.
  • File names should be short enough to be readable while still conveying enough pertinent information. Most common file systems limit a single file name to 255 characters, and some systems (for example, Windows by default) also limit the total path length, including folder names.
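The naming rules above can be checked mechanically. The Python sketch below is one possible checker, assuming a convention of letters, digits, and hyphens within elements, underscores between elements, and a single period before the extension; the function name `check_file_name` and the exact pattern are illustrative, not a standard. It also shows why leading zeros matter: text sorting compares character by character, so unpadded numbers sort out of numeric order.

```python
import re

# Illustrative rules (an assumption; adapt to your lab's documented convention):
# letters, digits, and hyphens inside an element, underscores between elements,
# and exactly one period, placed before the file extension.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9-]+(?:_[A-Za-z0-9-]+)*\.[A-Za-z0-9]+$")
MAX_NAME_LENGTH = 255  # a common per-file-name limit on many file systems


def check_file_name(name: str) -> list[str]:
    """Return the list of rule violations in `name` (empty if it passes)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    if " " in name:
        problems.append("contains spaces")
    if name.count(".") != 1:
        problems.append("needs exactly one period, before the extension")
    if not NAME_PATTERN.match(name):
        problems.append("does not follow the element_element.ext pattern")
    return problems


# Leading zeros keep plain text sorting in numeric order.
unpadded = sorted(f"sample_{i}.csv" for i in (1, 2, 10))
padded = sorted(f"sample_{i:03d}.csv" for i in (1, 2, 10))
print(unpadded)  # ['sample_1.csv', 'sample_10.csv', 'sample_2.csv']
print(padded)    # ['sample_001.csv', 'sample_002.csv', 'sample_010.csv']
```

For example, `check_file_name("DryValleySoil_ICPOES_20101115_JDS.dat")` returns an empty list, while `check_file_name("final data.v2.dat")` reports the space, the extra period, and the pattern mismatch.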

For example: DryValleySoil_ICPOES_20101115_JDS.dat
DryValleySoil is the project name, ICPOES is the instrument from which the data originated, 20101115 is the date of the sample run on the instrument, and JDS are the initials of Jane Doe Scientist.
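A short helper can assemble names in the same element order as the example above, so the convention is applied the same way every time. This is a sketch: `build_file_name` and `parse_file_name` are hypothetical helpers, and the four-element order (project, instrument, date, initials) is taken from the example rather than from any standard.

```python
from datetime import date


def build_file_name(project: str, instrument: str, run_date: date,
                    initials: str, extension: str) -> str:
    """Join the name elements with underscores, formatting the date as YYYYMMDD."""
    elements = [project, instrument, run_date.strftime("%Y%m%d"), initials]
    for element in elements:
        # Underscores separate elements, so no element may contain one;
        # spaces and periods are excluded by the convention as well.
        if any(char in element for char in "_ ."):
            raise ValueError(f"element {element!r} contains '_', ' ', or '.'")
    return "_".join(elements) + "." + extension


def parse_file_name(name: str) -> dict[str, str]:
    """Split a name built by build_file_name back into its elements."""
    stem, extension = name.rsplit(".", 1)
    project, instrument, run_date, initials = stem.split("_")
    return {
        "project": project,
        "instrument": instrument,
        "date": run_date,
        "initials": initials,
        "extension": extension,
    }


name = build_file_name("DryValleySoil", "ICPOES", date(2010, 11, 15), "JDS", "dat")
print(name)  # DryValleySoil_ICPOES_20101115_JDS.dat
print(parse_file_name(name)["date"])  # 20101115
```

Because every element is separated by a single underscore, the name can be split back into its parts, which is one practical payoff of keeping the convention strict.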

Keep track of versions (version control)

It is important to keep track of versions when working with data. There are many benefits, the most important being the ability to revert to an earlier version instead of starting from scratch or, worse, having to regenerate the data. There are three basic ways to keep track of versions.

Some tools, like electronic lab notebooks, cloud storage applications, or more specialized project repositories such as GitHub or the Open Science Framework (OSF), can assign version numbers, and relying on them is one option for managing versions of data. The other options are a file naming scheme and version control software. Some best practices for working with versions include:

  • Save an untouched copy of the raw data, and leave it that way. Always work on something other than the "safe" untouched copy (it is always possible to go back to the "safe" data and make a new copy in order to start from scratch).
  • Avoid ambiguous labels such as 'revision', 'final', or 'final2'. Instead, use a consistent version-numbering scheme in the file name (like v001, v002 or v1_0, v1_1, v2_0).
  • Use a directory structure naming convention that includes version information.
  • Use tools that automatically assign version numbers to manage the data, but test them first to make sure that you can revert to earlier versions and that the tool functions as expected.
  • If appropriate, use version control software such as Git or Subversion (SVN); Git repositories are often hosted on services such as GitHub. A coding project is likely to be a good candidate for version control.
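The raw-copy and version-numbering practices above can be combined in a small script. The sketch below assumes a hypothetical layout with a raw file kept untouched and a separate working folder for numbered versions named like `soil_clean_v001.csv`; the function names and the three-digit `_vNNN` scheme are illustrative choices, not a required format.

```python
import re
import shutil
import stat
from pathlib import Path

# Assumed naming scheme: <stem>_v<three digits><suffix>, e.g. soil_clean_v002.csv
VERSION_RE = re.compile(r"_v(\d{3})$")


def protect_raw(raw_file: Path) -> None:
    """Mark the untouched raw file read-only so it is not edited by accident."""
    raw_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def next_version_path(folder: Path, stem: str, suffix: str) -> Path:
    """Return the path for the next unused version number in `folder`."""
    highest = 0
    for existing in folder.glob(f"{stem}_v*{suffix}"):
        match = VERSION_RE.search(existing.stem)
        # Count only names of the exact form <stem>_vNNN<suffix>.
        if match and existing.stem[: match.start()] == stem:
            highest = max(highest, int(match.group(1)))
    return folder / f"{stem}_v{highest + 1:03d}{suffix}"


def save_new_version(source: Path, folder: Path, stem: str) -> Path:
    """Copy `source` into `folder` under the next version number."""
    folder.mkdir(parents=True, exist_ok=True)
    target = next_version_path(folder, stem, source.suffix)
    # copyfile copies contents only (not permission bits), so the new
    # working version stays writable even if the raw file is read-only.
    shutil.copyfile(source, target)
    return target
```

Calling `save_new_version(Path("raw/soil.csv"), Path("working"), "soil_clean")` twice would create `working/soil_clean_v001.csv` and then `working/soil_clean_v002.csv`, leaving the raw file unchanged. Existing versions are never overwritten, because the next number is always one past the highest found.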

Document and use directory structure naming conventions

Do not rely on the directory structure to provide critical information about file contents. Top-level folder names should include the project title, a unique identifier, and the date (year), but the files themselves should be well described independent of the directory structure. Consider writing a brief description of the contents of major folders along with an overview of the directory structure. This can be a text document or readme file stored in a top-level folder or shared space. Aim for enough detail that someone else could understand the contents and organization of your files in your absence.
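Setting up the top-level folder and its readme at the start of a project makes it easier to keep both current. This Python sketch creates a folder named from the project title, an identifier, and the year, plus a plain-text `README.txt` describing each subfolder; the function name, the identifier, and the subfolder names are hypothetical placeholders.

```python
from pathlib import Path


def create_project_folder(root: Path, project: str, identifier: str, year: int,
                          subfolders: dict[str, str]) -> Path:
    """Create <project>_<identifier>_<year>/ with described subfolders and a readme."""
    top = root / f"{project}_{identifier}_{year}"
    top.mkdir(parents=True, exist_ok=True)
    lines = [
        f"Project: {project}",
        f"Identifier: {identifier}",
        f"Year: {year}",
        "",
        "Folder overview:",
    ]
    for name, description in subfolders.items():
        (top / name).mkdir(exist_ok=True)
        lines.append(f"  {name}/  {description}")
    # The readme sits in the top-level folder so it travels with the data.
    (top / "README.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return top
```

For example, `create_project_folder(Path("."), "DryValleySoil", "DVS01", 2010, {"raw": "untouched instrument output", "working": "numbered working versions", "docs": "protocols and naming conventions"})` would create `DryValleySoil_DVS01_2010/` with three subfolders and a readme listing them, where `DVS01` stands in for whatever identifier your project actually uses.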

Data Services Librarian

Jim Kelly
He/him/his
Contact:
O’Shaughnessy-Frey Library | LIB 115
651-962-5012