Data @ Reed

Tabular Data Best Practices

Spreadsheets are widely used in research for data entry and storage. Creating a well-formed spreadsheet makes your data shareable and reusable later. Below are best practices to consider.

Create Good Variable Names

  • Make them short but meaningful
  • Do not use spaces, instead use underscores
  • Do no use special characters & , % # ; * : ( ) ! @ $ ^ ~ ’ { } [ ] ? < > - + /

Enter Dates as Text

  • Excel automatically reformats dates, which can then display differently on Windows and Macs
  • Use the form YYYYMMDD (ISO 8601)
    • Excel will treat it as a text field; sorting alphabetically will also sort chronologically

Use Consistent Codes for

  • Subject/participant identifiers
  • Categorical variables
  • Missing values

Put One Item per Cell

  • Break up multiple measurements into multiple cells
  • Do not include units in the cell, put these in the column names or, better yet, a data dictionary
  • Do not use fonts, italics, bold, or cell colors to represent data
  • No empty cells, use a standard code to indicate missing data

Make It a Rectangle

  • Put variables in columns and experimental unit/observation/measurement in rows
  • Only use one row for variable names
  • Only one rectangle per spreadsheet

Use Version Control to Track Changes

  • Use a version control system like Git
  • Upload files to Google Drive or the Open Science Framework for automatic version tracking
  • At a minimum, use a suffix in the file name to represent the version number
    • Example: ProjectName_Instrument_Condition_YYYYMMD_v01.txt
    • Never call a file “final

Getting Help

  • When searching the Internet for help include your Excel version and operating system