0 comment

Numbering lines with Unix

notetoselfHave you ever had a csv file and wanted to import it into a database? And you would like to add a leading ID column numbered from 0, separated by, let’s say a colon? Here’s a hint: use the Unix pr (for print) utility:

pr -tn, -N0 test.csv | sed -e 's/^[ \t]*//' > new.csv

My test.csv contains a list of all world manufacturer ids (WMI) for car VINs (vehicle identification number). the first few rows look like:

AFA,Ford South Africa
AAV,Volkswagen South Africa

Please note that column headers are added later on. Now the output looks like this:

0,AFA,Ford South Africa
1,AAV,Volkswagen South Africa

Now for the curious: what does the command line do?
First for the pr part:

  • -t means: omit headers (remember: normally pr is used to print paginated content …)
  • -n, means: number lines. Use colon as a separator
  • -N0 means: start with 0

So much for that part. The pr utility normally numbers lines within a given column width (standard is 5 chars). This results in leading whitespace. We don’t want that, so the sed command removes spaces and tabs at the beginning of the line.
Enough Unix magic for now. Happy hacking!

Update: Detlef Kreuz just mentioned on Twitter, that this task could also be accomplished with awk:

awk -e '{print NR-1 "," $0;}' test.csv > new.csv

Here awk executes the commands inside the curly braces for every line of input. Each line will first print the line number minus 1, followed by a colon and the complete line. $0 is an internal awk variable containing the complete currect line, while $1, $2 … contain the split up fields (where to split is determined by FS, the field separator, which defaults to a space). Thanks Detlef!

Leave a Reply

Required fields are marked *.