4. Csv

The csv(3m) module parses the popular comma-separated values (CSV) format exported by applications such as spreadsheets. The parser handles quoted elements and escaped quotes, and preserves carriage returns and newlines within elements. The separator character is specified by the caller.

Example 1. Reading /etc/passwd
The following example uses csv(3m) to parse the /etc/passwd file and print the username and uid of each user.


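  #include <stdlib.h>
  #include <stdio.h>
  #include <mba/csv.h>
  
  int
  main(void)
  {
      FILE *in;
      unsigned char buf[1024];   /* storage for the strings of one row */
      unsigned char *row[10];    /* pointers into buf, at most 10 elements */
      int n;
  
      in = fopen("/etc/passwd", "r");
      if (in == NULL) {
          perror("/etc/passwd");
          return EXIT_FAILURE;
      }
  
      /* elements are separated by ':'; CSV_TRIM strips surrounding whitespace */
      while ((n = csv_row_fread(in, buf, sizeof buf, row, 10, ':', CSV_TRIM)) > 0) {
          /* element 0 is the username, element 2 is the numeric uid */
          printf("%s %s\n", (char *)row[0], (char *)row[2]);
      }
      if (n == -1)
          perror("csv_row_fread");
      fclose(in);
  
      return EXIT_SUCCESS;
  }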
  

Note that escaping a quote requires that the entire element be quoted as well. An element whose value is "foo", quotes included, must therefore be written as """foo""": the outermost quotes delimit the element, and each remaining pair of quotes is an escape quote followed by the literal quote. This is consistent with how popular spreadsheets behave; input that deviates from this convention generates an error.

4.1. Csv functions

The csv_row_parse function
Synopsis

  #include <mba/csv.h>

  int csv_row_parse(const tchar *src, size_t sn, tchar *buf, size_t bn, tchar *row[], int rn, int sep, int flags);
Description
The csv_row_parse function parses at most sn bytes of the line of text at src. Each data element is stored as a zero-terminated string in the memory at buf, using no more than bn bytes in total, and a pointer to it is placed in the array row, for at most rn elements. The sep parameter specifies the separator (e.g. a comma ',') and must not be a quote, carriage return, or newline. Comma ',', tab '\t', colon ':', and pipe '|' are common separators.

The flags parameter can be zero or any combination of CSV_TRIM and CSV_QUOTES. If CSV_TRIM is specified, strings are trimmed of leading and trailing whitespace (an unquoted carriage return immediately before a newline is always trimmed, whether or not CSV_TRIM is given). If CSV_QUOTES is specified, quotes are interpreted. Both flags should be specified when parsing conventional CSV files.

The csv_row_parse function is actually a macro for either csv_row_parse_str or csv_row_parse_wcs. The two functions have the same prototype, except that csv_row_parse_wcs accepts wchar_t parameters whereas csv_row_parse_str accepts unsigned char parameters.
Returns
The csv_row_parse function returns the number of bytes of src parsed, or -1 if an error occurred, in which case errno is set appropriately.

The csv_row_fread function
Synopsis

  #include <mba/csv.h>

  int csv_row_fread(FILE *in, unsigned char *buf, size_t bn, unsigned char *row[], int numcols, int sep, int flags);
Description
The csv_row_fread function reads a line of text from the stream in, processes it as csv_row_parse does, and places pointers to zero-terminated strings, allocated from no more than bn bytes of the memory at buf, into the array row for at most numcols data elements.
Returns
The csv_row_fread function returns the number of bytes read from the stream in, or -1 if an error occurred, in which case errno is set appropriately.


Copyright 2003 Michael B. Allen <mba2000 ioplex.com>