Why Cite Data
It is as important to cite data as it is to cite journal articles, books, and other sources. In addition to acknowledging the intellectual output of the researchers who created or collected the data, citation also allows datasets to be identified, accessed, reused, replicated for verification, credited for recognition, and tracked to measure impact.
What to Include in a Data Citation
Some style guides provide examples and recommendations for data citation, but not all do. If your style guide does not, adapt its guidelines for books using the elements below. In addition, some data repositories provide citation examples that you can adapt to align with your style guide.
Minimum elements needed for a data citation:
- Publisher/Distributor (often the repository hosting the data)
- Identifier (DOI or persistent URL)
Additional elements to include, if known, that improve the citation:
- Version or edition
- Resource Type (such as dataset, codebook, or computer file)
- Access Date
- Universal Numerical Fingerprint
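The two lists above can be used as a checklist before a citation is written. The Python sketch below is illustrative only: the class name, field names, and the `missing_minimum` helper are assumptions made for this example and do not come from any style guide or repository tool. It covers only the elements named in the lists above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitationElements:
    """Hypothetical container for the citation elements listed above."""
    # Minimum elements
    publisher: Optional[str] = None      # Publisher/Distributor
    identifier: Optional[str] = None     # DOI or persistent URL
    # Additional elements, included if known
    version: Optional[str] = None        # Version or edition
    resource_type: Optional[str] = None  # e.g. dataset, codebook, computer file
    access_date: Optional[str] = None
    unf: Optional[str] = None            # Universal Numerical Fingerprint


# Names of the fields treated as minimum elements in this sketch.
MINIMUM_FIELDS = ("publisher", "identifier")


def missing_minimum(elements: DataCitationElements) -> List[str]:
    """Return the names of minimum elements that are empty or not set."""
    return [name for name in MINIMUM_FIELDS if not getattr(elements, name)]
```

Calling `missing_minimum(DataCitationElements(publisher="ICPSR"))` returns `["identifier"]`, a reminder to look up the DOI or persistent URL before citing.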
Data Citation: APA (6th edition)
For a complete description of data citation guidelines, refer to pp. 210-211 of the Publication Manual of the American Psychological Association, 6th Edition (Ref Desk BF76.7 .P83 2010).
If the dataset has a DOI, use the form:
Author Last Name, First Initial. (Year). Title (Version) [Description of form]. Location: DOI.
United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. (2011). Treatment Episode Data Set -- Discharges (TEDS-D) -- Concatenated, 2006 to 2011 (Version v2) [Data set]. ICPSR - Inter-university Consortium for Political and Social Research. https://doi.org/10.3886/icpsr30122.v2
If the dataset does not have a DOI, use one of the following forms:
Author Last Name, First Initial. (Year). Title (Version) [Description of form]. Retrieved from URL
Author Last Name, First Initial. (Year). Title (Version) [Description of form]. Location: Name of producer.
Pew Hispanic Center. (2004). Changing channels and crisscrossing cultures: A survey of Latinos on the news media [Data file and code book]. Retrieved from http://pewhispanic.org/datasets
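The APA forms above can be assembled mechanically once the elements are known. The following Python function is a minimal sketch, not an official APA tool: the function name and parameters are hypothetical, and the DOI branch follows the shape of the worked example above (publisher, then the DOI as an https://doi.org/ link). Italics are not handled, so apply them by hand where your style guide requires them.

```python
from typing import Optional


def format_apa_data_citation(
    author: str,
    year: int,
    title: str,
    description: str,
    version: Optional[str] = None,
    publisher: Optional[str] = None,
    doi: Optional[str] = None,
    url: Optional[str] = None,
    location: Optional[str] = None,
    producer: Optional[str] = None,
) -> str:
    """Build an APA (6th ed.) style data citation from its elements."""
    # Drop any trailing period on the author so initials do not yield "..".
    head = f"{author.rstrip('.')}. ({year}). {title}"
    if version:
        head += f" (Version {version})"
    head += f" [{description}]."

    if doi:
        # Assumption: publisher (if given) precedes the DOI link,
        # mirroring the worked example rather than the template line.
        middle = f" {publisher}." if publisher else ""
        return f"{head}{middle} https://doi.org/{doi}"
    if url:
        return f"{head} Retrieved from {url}"
    if location and producer:
        return f"{head} {location}: {producer}."
    raise ValueError("Provide a DOI, a URL, or a location and producer.")


# Usage: reproduces the Pew Hispanic Center example above.
citation = format_apa_data_citation(
    author="Pew Hispanic Center",
    year=2004,
    title=("Changing channels and crisscrossing cultures: "
           "A survey of Latinos on the news media"),
    description="Data file and code book",
    url="http://pewhispanic.org/datasets",
)
print(citation)
```

The function checks for a DOI first because a DOI is the more stable identifier; the URL and producer forms are used only when no DOI is supplied.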
Data Citation: Chicago (16th edition) (author-date)
The Chicago Manual of Style, 16th edition (Ref Desk Z253.U69 2010) does not discuss how to cite data. The example below uses standard data citation elements and the guidelines for a book.
Author Last Name, First Name. Date. Title. Version. Location: Publisher. Distributor. Identifier
Smith, Tom W., Peter V. Marsden, and Michael Hout. 2011. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center. Distributed by Ann Arbor, MI: Inter-university Consortium for Political and Social Research. doi:10.3886/ICPSR31521.v1
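The author-date pattern above is a fixed sequence of elements separated by periods, so it can be sketched as a short Python function. The function name and parameter names are hypothetical and chosen for this example; the element order is taken from the template line above, not from the Chicago Manual itself.

```python
def format_chicago_data_citation(
    authors: str,
    date: int,
    title: str,
    version: str,
    location: str,
    publisher: str,
    distributor: str,
    identifier: str,
) -> str:
    """Build a Chicago (author-date) style data citation from its elements."""
    sentences = [
        authors,
        str(date),
        title,
        version,
        f"{location}: {publisher}",
        distributor,
    ]
    # End each element with exactly one period, then append the identifier.
    body = " ".join(s.rstrip(".") + "." for s in sentences)
    return f"{body} {identifier}"


# Usage: reproduces the General Social Survey example above.
citation = format_chicago_data_citation(
    authors="Smith, Tom W., Peter V. Marsden, and Michael Hout",
    date=2011,
    title="General Social Survey, 1972-2010 Cumulative File",
    version="ICPSR31521-v1",
    location="Chicago, IL",
    publisher="National Opinion Research Center",
    distributor=("Distributed by Ann Arbor, MI: Inter-university "
                 "Consortium for Political and Social Research"),
    identifier="doi:10.3886/ICPSR31521.v1",
)
print(citation)
```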
Data Citation: MLA (7th edition)
Neither the 7th nor the 8th edition of the MLA Handbook (Ref Desk LB2369 .G53 2016) discusses how to cite data. The example below uses standard data citation elements and the guidelines for a book.
Author Last Name, First Name. Title. Version. Location: Publisher [producer]. Distributor [distributor], Date of publication. Medium of publication. Date accessed. DOI.
Milberger, Sharon. Evaluation of Violence Against Women With Physical Disabilities in Michigan, 2000-2001. ICPSR version. Detroit: Wayne State U [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2002. Web. 19 May 2011. doi:10.3886/ICPSR03414
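The MLA pattern above differs from the others mainly in labeling the producer and distributor and in adding the medium and access date. A minimal Python sketch follows; the function name and parameter names are hypothetical, and the element order is taken from the template line above rather than from the MLA Handbook.

```python
def format_mla_data_citation(
    author: str,
    title: str,
    version: str,
    producer_location: str,
    producer: str,
    distributor_location: str,
    distributor: str,
    year: int,
    access_date: str,
    identifier: str,
    medium: str = "Web",
) -> str:
    """Build an MLA (7th ed.) style data citation from its elements."""
    return (
        f"{author}. {title}. {version}. "
        f"{producer_location}: {producer} [producer]. "
        f"{distributor_location}: {distributor} [distributor], {year}. "
        f"{medium}. {access_date}. {identifier}"
    )


# Usage: reproduces the Milberger example above.
citation = format_mla_data_citation(
    author="Milberger, Sharon",
    title=("Evaluation of Violence Against Women With Physical "
           "Disabilities in Michigan, 2000-2001"),
    version="ICPSR version",
    producer_location="Detroit",
    producer="Wayne State U",
    distributor_location="Ann Arbor, MI",
    distributor="Inter-university Consortium for Political and Social Research",
    year=2002,
    access_date="19 May 2011",
    identifier="doi:10.3886/ICPSR03414",
)
print(citation)
```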