CSV

This class provides a complete interface to CSV files and data. It offers tools to enable you to read and write to and from Strings or IO objects, as needed.

Reading

From a File

A Line at a Time

CSV.foreach("path/to/file.csv") do |row|
  # use row here...
end

All at Once

arr_of_arrs = CSV.read("path/to/file.csv")

From a String

A Line at a Time

CSV.parse("CSV,data,String") do |row|
  # use row here...
end

All at Once

arr_of_arrs = CSV.parse("CSV,data,String")

Writing

To a File

CSV.open("path/to/file.csv", "wb") do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end

To a String

csv_string = CSV.generate do |csv|
  csv << ["row", "of", "CSV", "data"]
  csv << ["another", "row"]
  # ...
end

Convert a Single Line

csv_string = ["CSV", "data"].to_csv   # to CSV
csv_array  = "CSV,String".parse_csv   # from CSV

Shortcut Interface

CSV             { |csv_out| csv_out << %w{my data here} }  # to $stdout
CSV(csv = "")   { |csv_str| csv_str << %w{my data here} }  # to a String
CSV($stderr)    { |csv_err| csv_err << %w{my data here} }  # to $stderr
CSV($stdin)     { |csv_in|  csv_in.each { |row| p row } }  # from $stdin

Advanced Usage

Wrap an IO Object

csv = CSV.new(io, options)
# ... read (with gets() or each()) from and write (with <<) to csv here ...

CSV and Character Encodings (M17n or Multilingualization)

This new CSV parser is m17n savvy. The parser works in the Encoding of the IO or String object being read from or written to. Your data is never transcoded (unless you ask Ruby to transcode it for you) and will literally be parsed in the Encoding it is in. Thus CSV will return Arrays or Rows of Strings in the Encoding of your data. This is accomplished by transcoding the parser itself into your Encoding.

Some transcoding must take place, of course, to accomplish this multiencoding support. For example, :col_sep, :row_sep, and :quote_char must be transcoded to match your data. Hopefully this makes the entire process feel transparent, since CSV's defaults should just magically work for your data. However, you can set these values manually in the target Encoding to avoid the translation.

It's also important to note that while all of CSV's core parser is now Encoding agnostic, some features are not. For example, the built-in converters will try to transcode data to UTF-8 before making conversions. Again, you can provide custom converters that are aware of your Encodings to avoid this translation. It's just too hard for me to support native conversions in all of Ruby's Encodings.

Anyway, the practical side of this is simple: make sure IO and String objects passed into CSV have the proper Encoding set and everything should just work. CSV methods that allow you to open IO objects (CSV::foreach(), ::open, ::read, and ::readlines) do allow you to specify the Encoding.
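As a rough illustration of the point above, the sketch below parses a String that is already in ISO-8859-1 and then reads the same bytes back from a file while asking Ruby to transcode them. The sample data, the temporary file, and the variable names are assumptions made for this example only.

```ruby
require "csv"
require "tempfile"

# Sample data (an assumption for this sketch), transcoded to ISO-8859-1.
latin1 = "café,naïve\n".encode("ISO-8859-1")

# Parsing a String: fields come back in the String's own Encoding.
latin1_rows = CSV.parse(latin1)
puts latin1_rows.first.first.encoding

# Reading a file: the :encoding option names the external Encoding and,
# optionally, an internal Encoding that Ruby should transcode to.
utf8_rows = nil
Tempfile.create("latin1_csv") do |file|
  file.binmode
  file.write(latin1)
  file.flush
  utf8_rows = CSV.read(file.path, encoding: "ISO-8859-1:UTF-8")
end
puts utf8_rows.first.first.encoding
```

The first call leaves the data untranscoded, while the second relies on Ruby's IO layer to do the transcoding before CSV sees the data.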

One minor exception comes when generating CSV into a String with an Encoding that is not ASCII compatible. There's no existing data for CSV to use to prepare itself and thus you will probably need to manually specify the desired Encoding for most of those cases. It will try to guess using the fields in a row of output though, when using ::generate_line or Array#to_csv().

I try to point out any other Encoding issues in the documentation of methods as they come up.

This has been tested to the best of my ability with all non-“dummy” Encodings Ruby ships with. However, it is brave new code and may have some bugs. Please feel free to report any issues you find with it.

Methods

A

add_row

C

convert,
converters

E

each

F

filter,
force_quotes?,
foreach

G

generate,
generate_line,
gets

H

header_convert,
header_converters,
header_row?,
headers

I

inspect,
instance

O

open

P

parse,
parse_line,
puts

R

read,
read,
readline,
readlines,
readlines,
return_headers?,
rewind

S

shift,
skip_blanks?

T

table

U

unconverted_fields?

W

write_headers?

Included Modules

Enumerable

Constants

VERSION	=	"2.4.8".freeze
	The version of the installed library.
FieldInfo	=	Struct.new(:index, :line, :header)
	A FieldInfo Struct contains details about a field's position in the data source it was read from. CSV will pass this Struct to some blocks that make decisions based on field structure. See CSV.convert_fields() for an example.
	`index` The zero-based index of the field in its row.
	`line` The line of the data source this row is from.
	`header` The header for the column, when available.
DateMatcher	=	/ \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} | \d{4}-\d{2}-\d{2} )\z /x
	A Regexp used to find and convert some common Date formats.
DateTimeMatcher	=	/ \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} | \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} )\z /x
	A Regexp used to find and convert some common DateTime formats.
ConverterEncoding	=	Encoding.find("UTF-8")
	The encoding used by all converters.
Converters	=	{
  integer:   lambda { |f| Integer(f.encode(ConverterEncoding)) rescue f },
  float:     lambda { |f| Float(f.encode(ConverterEncoding)) rescue f },
  numeric:   [:integer, :float],
  date:      lambda { |f|
    begin
      e = f.encode(ConverterEncoding)
      e =~ DateMatcher ? Date.parse(e) : f
    rescue  # encoding conversion or date parse errors
      f
    end
  },
  date_time: lambda { |f|
    begin
      e = f.encode(ConverterEncoding)
      e =~ DateTimeMatcher ? DateTime.parse(e) : f
    rescue  # encoding conversion or date parse errors
      f
    end
  },
  all:       [:date_time, :numeric]
}
	This Hash holds the built-in converters of CSV that can be accessed by name. You can select Converters with #convert or through the `options` Hash passed to ::new.
	`:integer` Converts any field Integer() accepts.
	`:float` Converts any field Float() accepts.
	`:numeric` A combination of `:integer` and `:float`.
	`:date` Converts any field Date.parse accepts.
	`:date_time` Converts any field DateTime.parse accepts.
	`:all` All built-in converters. A combination of `:date_time` and `:numeric`.
	All built-in converters transcode field data to UTF-8 before attempting a conversion. If your data cannot be transcoded to UTF-8 the conversion will fail and the field will remain unchanged.
	This Hash is intentionally left unfrozen and users should feel free to add values to it that can be accessed by all CSV objects.
	To add a combo field, the value should be an Array of names. Combo fields can be nested with other combo fields.
HeaderConverters	=	{
  downcase: lambda { |h| h.encode(ConverterEncoding).downcase },
  symbol:   lambda { |h|
    h.encode(ConverterEncoding).downcase.gsub(/\s+/, "_").gsub(/\W+/, "").to_sym
  }
}
	This Hash holds the built-in header converters of CSV that can be accessed by name. You can select HeaderConverters with #header_convert or through the `options` Hash passed to ::new.
	`:downcase` Calls downcase() on the header String.
	`:symbol` The header String is downcased, spaces are replaced with underscores, non-word characters are dropped, and finally to_sym() is called.
	All built-in header converters transcode header data to UTF-8 before attempting a conversion. If your data cannot be transcoded to UTF-8 the conversion will fail and the header will remain unchanged.
	This Hash is intentionally left unfrozen and users should feel free to add values to it that can be accessed by all CSV objects.
	To add a combo field, the value should be an Array of names. Combo fields can be nested with other combo fields.
DEFAULT_OPTIONS	=	{
  col_sep:            ",",
  row_sep:            :auto,
  quote_char:         '"',
  field_size_limit:   nil,
  converters:         nil,
  unconverted_fields: nil,
  headers:            false,
  return_headers:     false,
  header_converters:  nil,
  skip_blanks:        false,
  force_quotes:       false,
  skip_lines:         nil
}.freeze
	The options used when no overrides are given by calling code. They are:
	`:col_sep` `","`
	`:row_sep` `:auto`
	`:quote_char` `'"'`
	`:field_size_limit` `nil`
	`:converters` `nil`
	`:unconverted_fields` `nil`
	`:headers` `false`
	`:return_headers` `false`
	`:header_converters` `nil`
	`:skip_blanks` `false`
	`:force_quotes` `false`
	`:skip_lines` `nil`
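
The constants above are easiest to understand in use. The following is a minimal sketch, not a definitive recipe: the sample data is made up, and the `:boolean` converter name and the `qty_only` lambda are hypothetical names introduced only for this example. Options are passed as keywords, which is an assumption about the Ruby version in use.

```ruby
require "csv"

# Built-in converters are selected by name; :numeric combines :integer and :float.
numbers = CSV.parse("1,2.5,abc\n", converters: :numeric)
p numbers

# Converters is unfrozen, so a named converter can be added for all CSV objects.
# ":boolean" is a hypothetical name for this sketch, not a built-in converter.
CSV::Converters[:boolean] = lambda do |field|
  { "true" => true, "false" => false }.fetch(field, field)
end
flags = CSV.parse("true,no,false\n", converters: [:boolean])
p flags

# Header converters apply to the header row only.
table = CSV.parse("First Name,Qty\nAda,3\n",
                  headers: true, header_converters: :symbol)
p table.headers

# A converter that takes two arguments also receives a FieldInfo Struct,
# so it can decide per column (here: convert only the "Qty" column).
qty_only = lambda do |field, info|
  info.header == "Qty" ? Integer(field) : field
end
typed = CSV.parse("Name,Qty\nAda,3\n", headers: true, converters: [qty_only])
p typed[0]["Qty"]

# Any DEFAULT_OPTIONS entry can be overridden per call.
semicolons = CSV.parse("a;b\n\nc;d\n", col_sep: ";", skip_blanks: true)
p semicolons
```

Because Converters is shared, adding an entry affects every CSV object in the process; passing a lambda directly in `:converters`, as with `qty_only`, keeps the change local to one call.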

Attributes

[R]	col_sep	The encoded `:col_sep` used in parsing and writing. See ::new for details.
[R]	encoding	The Encoding CSV is parsing or writing in. This will be the Encoding you receive parsed data in and/or the Encoding data will be written in.
[R]	field_size_limit	The limit for field size, if any. See ::new for details.
[R]	lineno	The line number of the last row read from this file. Fields with nested line-end characters will not affect this count.
[R]	quote_char	The encoded `:quote_char` used in parsing and writing. See ::new for details.
[R]	row_sep	The encoded `:row_sep` used in parsing and writing. See ::new for details.
[R]	skip_lines	The regex marking a line as a comment. See ::new for details.
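
A short sketch of reading some of these attributes follows. The sample data is an assumption chosen so that one field contains an embedded line ending, which shows that lineno counts rows rather than physical lines.

```ruby
require "csv"

# Two rows; the first row's second field spans two physical lines.
csv = CSV.new("a;\"two\nlines\"\nb;c\n", col_sep: ";")

puts csv.col_sep      # the separator this object was configured with
puts csv.quote_char   # the default quote character

first_row = csv.shift
after_first = csv.lineno   # one row read so far

second_row = csv.shift
after_second = csv.lineno  # two rows read, although the data has three lines

p first_row, second_row, after_first, after_second
```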

Class Public methods

filter( options = Hash.new ) { |row| ... }
filter( input, options = Hash.new ) { |row| ... }
filter( input, output, options = Hash.new ) { |row| ... }

This method is a convenience for building Unix-like filters for CSV data. Each row is yielded to the provided block which can alter it as needed. After the block returns, the row is appended to output altered or not.

The input and output arguments can be anything ::new accepts (generally String or IO objects). If not given, they default to ARGF and $stdout.

The options parameter is also filtered down to ::new after some clever key parsing. Any key beginning with :in_ or :input_ will have that leading identifier stripped and will only be used in the options Hash for the input object. Keys starting with :out_ or :output_ affect only output. All other keys are assigned to both objects.

The :output_row_sep option defaults to $INPUT_RECORD_SEPARATOR ($/).
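
The key prefixes described above can be sketched as follows. This example assumes String input and output rather than ARGF and $stdout, uses made-up data, and passes options as keywords.

```ruby
require "csv"

input  = "a;b\n1;2\n"   # semicolon-separated sample data (an assumption)
output = +""            # a mutable String that receives the filtered rows

# :in_col_sep applies only to the input, :out_col_sep only to the output.
CSV.filter(input, output, in_col_sep: ";", out_col_sep: "\t") do |row|
  row.map!(&:upcase)    # alter the row in place; it is appended after the block
end

print output
```

The block's return value is not used; the row object itself is written, which is why the sketch mutates it with `map!`.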

Class CSV < Object
