2008-09-25

XML indenting with sed(1)

Some years ago I stumbled upon SedSokoban, the Sokoban game implemented as a sed script. I found it pretty amusing, and it got me interested in the more arcane uses of sed. As an exercise, I set out to write a sed script that indents XML.

Recently I found that script again, and I thought it would make a nice first post here.

xmlindent.sed looks like this (several of the blanks in it are literal tab characters, which is what the indentation is made of):

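# Append input lines, joined by a space, until the pattern space holds a '>'.
:a
/>/!N;s/\n/ /;ta
# Turn tabs into spaces, strip leading spaces and squeeze runs of spaces.
s/	/ /g;s/^ *//;s/  */ /g
# Comments are dropped: read up to the closing -->, then delete through it
# (\n in the replacement is a GNU sed extension).
/^<!--/{
:e
/-->/!N;s/\n//;te
s/-->/\n/;D;
}
# Between tags the hold space holds the current indent, a run of tabs.
# H;x;s/\n// puts it in front of the pattern space; s/>.*$/>/ keeps one tag.
# <?...?> and <!...> declarations: print, indent unchanged.
/^<[?!][^>]*>/{
H;x;s/\n//;s/>.*$/>/;p;bb
}
# Closing tag: remove one tab, then print.
/^<\/[^>]*>/{
H;x;s/\n//;s/>.*$/>/;s/^	//;p;bb
}
# Empty-element tag such as <br/>: print, indent unchanged.
/^<[^>]*\/>/{
H;x;s/\n//;s/>.*$/>/;p;bb
}
# Start tag: print, then add one tab for its children.
/^<[^>]*[^\/]>/{
H;x;s/\n//;s/>.*$/>/;p;s/^/	/;bb
}
# No '<' in the pattern space yet: go back to :a.
/</!ba
# Text before the next tag: print it at the current indent,
# put the indent back in the hold space and strip the text.
{
H;x;s/\n//;s/ *<.*$//;p;s/[^	].*$//;x;s/^[^<]*//;ba
}
# After a tag: cut the line back to its indent, swap it into the hold
# space, remove the printed tag and start over.
:b
{
s/[^	].*$//;x;s/^<[^>]*>//;ba
}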
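
The least obvious part of the script is how it tracks indentation. Between tags, the hold space contains nothing but the current indent, a run of tabs. For each tag, `H;x;s/\n//` glues that indent in front of the pattern space, `p` prints it, and the line is then cut back to its leading tabs and swapped into the hold space again; a start tag gets one extra tab first, and a closing tag loses one before printing. The sketch below isolates that trick under a few assumptions: the input already has one tag or text run per line (so there is no line joining or comment handling), `indent_lines` is a hypothetical helper name, it indents with two spaces instead of tabs to keep the whitespace visible, and it was written with GNU sed in mind.

```shell
#!/bin/sh
# indent_lines is a hypothetical helper. It expects XML that already has one
# tag, or one run of text, per line and re-indents it by nesting depth,
# keeping the current indent in the hold space the way xmlindent.sed does.
indent_lines() {
  sed -n '
    # Drop any existing indentation and skip blank lines.
    s/^[[:blank:]]*//
    /^$/d
    # A closing tag removes one level (two spaces) from the stored indent
    # before the line is printed.
    /^<\//{
      x
      s/^  //
      x
    }
    # Hold becomes indent, newline, line; swap it into the pattern space,
    # drop the newline and print indent followed by line.
    H
    x
    s/\n//
    p
    # Closing tags, <?...?>, <!...> and empty-element tags keep the indent;
    # any other tag opens a level, so prefix two more spaces.
    /^ *<\//bk
    /^ *<[?!]/bk
    /\/>$/bk
    /^ *</s/^/  /
    :k
    # Keep only the leading spaces and swap them back into the hold space.
    s/[^ ].*$//
    x
  '
}

# Prints:
# <config>
#   <server name="a">
#     <port number="80"/>
#   </server>
# </config>
printf '%s\n' '<config>' '<server name="a">' '<port number="80"/>' \
  '</server>' '</config>' | indent_lines
```

Keeping the indent as literal text in the hold space means no counter is needed: going one level deeper or shallower is just adding or removing a prefix.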

Unfortunately the script chokes on some XML inputs (a literal > in text content or inside an attribute value can be enough to throw it off), but I could use it to pretty-print most of the common XML files I came across (configuration files, XML-based network protocol messages, etc.).
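
The first two lines of the script are a familiar sed loop for joining lines: as long as the pattern space has no `>`, `N` appends the next input line, `s/\n/ /` replaces the embedded newline with a space, and `ta` jumps back to `:a`. That is how a start tag spread over several lines reaches the rest of the script as one line. It also explains one way the script can hang: once the pattern space holds a `>` but no `<` (for example, text like `a > b` on its own line), `N` is never reached and `/</!ba` keeps sending the same text back to `:a`. Below is a minimal sketch of the loop on its own, assuming GNU sed; `join_tags` is a hypothetical helper name, and the final `s/  */ /g` mirrors the space squeezing the script does right after its loop.

```shell
#!/bin/sh
# join_tags is a hypothetical helper name. It runs the loop from the first two
# lines of xmlindent.sed: while the pattern space contains no '>', N appends the
# next input line and s/\n/ / turns the embedded newline into a space; ta jumps
# back to :a after each successful join. The last expression squeezes runs of
# spaces, as the full script does right after its loop.
# (If input ends before a '>' shows up, GNU sed's N prints what it has and exits.)
join_tags() {
  sed -e ':a' -e '/>/!N;s/\n/ /;ta' -e 's/  */ /g'
}

# A start tag spread over three lines comes out as one line:
# <server name="a" port="80">
printf '%s\n' '<server' '  name="a"' '  port="80">' | join_tags
```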