Tab-separated values

Filename extension	.tsv, .tab
Internet media type	text/tab-separated-values
Uniform Type Identifier (UTI)	public.tab-separated-values-text
UTI conformation	public.delimited-values-text
Developed by	University of Minnesota Internet Gopher Team; Internet Assigned Numbers Authority
Initial release	c. June 1993
Type of format	Delimiter-separated values format
Container for	database information organized as field-separated lists
Standard	IANA MIME type

Tab-separated values (TSV) is a simple, text-based file format for storing tabular data.^[3] Records are separated by newlines, and values within a record are separated by tab characters. The TSV format is thus a delimiter-separated values format, similar to comma-separated values.

Because TSV is widely supported, it is often used to exchange tabular data between different computer programs. For example, a TSV file might be used to transfer information from a database to a spreadsheet.

Example

The head of the Iris flower data set can be stored as a TSV using the following plain text (note that the HTML rendering may convert tabs to spaces):

Sepal length	Sepal width	Petal length	Petal width	Species
5.1	3.5	1.4	0.2	I. setosa
4.9	3.0	1.4	0.2	I. setosa
4.7	3.2	1.3	0.2	I. setosa
4.6	3.1	1.5	0.2	I. setosa
5.0	3.6	1.4	0.2	I. setosa

The TSV plain text above corresponds to the following tabular data:

Sepal length	Sepal width	Petal length	Petal width	Species
5.1	3.5	1.4	0.2	I. setosa
4.9	3.0	1.4	0.2	I. setosa
4.7	3.2	1.3	0.2	I. setosa
4.6	3.1	1.5	0.2	I. setosa
5.0	3.6	1.4	0.2	I. setosa
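
The correspondence between the plain text and the table can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `parse_tsv` is hypothetical, the first line is assumed to be a header, and records are assumed to end with a line feed.

```python
def parse_tsv(text):
    """Split TSV text into a header list and a list of row lists.

    Assumes the first line is a header and records end with "\n".
    """
    lines = [line for line in text.split("\n") if line]  # skip empty lines
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]
    return header, rows


# First rows of the example above, with tabs written as "\t".
sample = (
    "Sepal length\tSepal width\tPetal length\tPetal width\tSpecies\n"
    "5.1\t3.5\t1.4\t0.2\tI. setosa\n"
    "4.9\t3.0\t1.4\t0.2\tI. setosa\n"
)

header, rows = parse_tsv(sample)
# Pair each value with its column name.
records = [dict(zip(header, row)) for row in rows]
```

All values are read as strings; converting the measurements to numbers is left to the consuming program.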

Character escaping

The IANA media type standard for TSV keeps the format simple by disallowing tabs within fields.^[4]
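
A writer that follows this rule strictly could reject offending values instead of escaping them. The sketch below is one possible approach, not part of the standard: the function name `format_record` is hypothetical, and the check for line breaks is an added assumption based on records being separated by newlines.

```python
def format_record(fields):
    """Join string fields with tabs, refusing values that would break the format."""
    for field in fields:
        # Tabs are disallowed within fields; line breaks are also rejected
        # here (an assumption) because they would split the record.
        if "\t" in field or "\n" in field or "\r" in field:
            raise ValueError(f"field contains a tab or line break: {field!r}")
    return "\t".join(fields)
```

For example, `format_record(["5.1", "I. setosa"])` returns the two values joined by a single tab, while a value containing a tab raises `ValueError`.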

Since values in the TSV format cannot contain literal tab or newline characters, a convention is necessary for lossless conversion of text values that contain them. A common convention is to replace such characters, and the backslash itself, with the following escape sequences:^[5]^[6]


Escape sequence	Meaning
`\n`	line feed
`\t`	tab
`\r`	carriage return
`\\`	backslash
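
The escapes in the table can be sketched as a pair of Python functions. The names `escape_field` and `unescape_field` are hypothetical, and leaving unrecognized sequences (such as `\x`) unchanged is an assumption; tools that follow this convention may treat them differently.

```python
import re

# Raw character -> escape sequence, as listed in the table above.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
# Escape sequence -> raw character.
_UNESCAPES = {seq: ch for ch, seq in _ESCAPES.items()}


def escape_field(value):
    """Replace backslash, tab, line feed and carriage return with escapes."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(value):
    """Reverse escape_field.

    Scans left to right for a backslash followed by one character.
    Unknown sequences are left as they are (an assumption).
    """
    return re.sub(
        r"\\(.)",
        lambda m: _UNESCAPES.get(m.group(0), m.group(0)),
        value,
    )
```

Decoding in a single left-to-right pass matters: an escaped backslash followed by the letter `t` must decode to a backslash and a `t`, not to a tab, which chained string replacements can get wrong.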

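Another common convention borrows the quoting rule of CSV from RFC 4180 and encloses values containing tabs or newlines in double quotes. This can lead to ambiguities, because a reader cannot tell from the file alone whether a double quote is quoting syntax or literal data.^[7]^[8]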
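
The ambiguity can be illustrated with Python's standard `csv` module, which supports both interpretations. In this sketch the input line is invented for illustration; the same bytes yield two different records depending on whether double quotes are treated as quoting syntax.

```python
import csv

# One line of text: a quoted region containing a tab, then a tab, then "c".
line = '"a\tb"\tc'

# Interpretation 1: double quotes enclose a value (CSV-style quoting).
quoted = next(csv.reader([line], delimiter="\t"))

# Interpretation 2: double quotes are ordinary data characters.
literal = next(csv.reader([line], delimiter="\t", quoting=csv.QUOTE_NONE))

print(quoted)   # two fields: 'a<TAB>b' and 'c'
print(literal)  # three fields: '"a', 'b"' and 'c'
```

A producer and a consumer therefore need to agree on the convention in advance, since the file itself does not record which one was used.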

Line endings

Records are usually separated by a line feed (LF), as on Unix platforms, or by a carriage return followed by a line feed (CRLF), as on Microsoft platforms. Some programs may expect the latter. The de facto specification^[9] states that records are separated by an EOL but does not say which newline sequence that is.
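
A reader that does not know which newline a file uses can accept both. The following Python sketch shows one way to do so; the function name `split_records` is hypothetical, and treating a final newline as a terminator instead of the start of an empty record is an assumption.

```python
def split_records(text):
    """Split TSV text into records, accepting LF or CRLF line endings."""
    records = text.split("\n")
    # A trailing newline terminates the last record; it does not start
    # a new, empty one (assumption).
    if records and records[-1] == "":
        records.pop()
    # Remove the carriage return that CRLF leaves at the end of a record.
    return [r[:-1] if r.endswith("\r") else r for r in records]
```

With this approach, `"a\tb\r\nc\td\r\n"` and `"a\tb\nc\td"` produce the same two records.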

References

^ U of Edin. Research Data Support Team. "Choose the best file formats". University of Edinburgh. § Formats we recommend. Retrieved 23 May 2023.
^ "tabSeparatedText". Apple Developer Documentation: Uniform Type Identifiers. Apple Inc. Retrieved 23 May 2023.
^ "How To Use Tab Separated Value (TSV) files". International Monetary Fund. Retrieved 1 February 2023.
^ Lindner 1993.
^ Dusek, Jason (6 May 2014). "Linear TSV: simple, line-oriented, tabular data". Data Protocols - Open Knowledge Foundation (v1.0β ed.).
^ Dolan, Stephen (1 November 2018). "jq Manual". jq. Retrieved 23 May 2023.
^ Miller, Rob (22 September 2015). Text Processing with Ruby: Extract Value from the Data That Surrounds You. Pragmatic Bookshelf. p. 94. ISBN 978-1-68050-492-7.
^ Giuseppini, Gabriele; Burnett, Mark (10 February 2005). Microsoft Log Parser Toolkit: A Complete Toolkit for Microsoft's Undocumented Log Analysis Tool. Elsevier. p. 311. ISBN 978-0-08-048939-1.
^ "IANA: text/tab-separated-values".

Sources

"TSV — Tab-Separated Values" (11 February 2021 ed.). Library of Congress. fdd000533. Retrieved 23 May 2023.
Lindner, Paul (June 1993). "text/tab-separated-values" [Definition of tab-separated-values (tsv)]. Assigned Media Types Registry. IANA. Minnesota: University of Minnesota Internet Gopher Team. Retrieved 23 May 2023.
"How To Use Tab Separated Value (TSV) Files". Archived from the original on 12 January 2007.