XmlDocument + StringWriter = EVIL
ok, you can proly mark this one up for me just being lazy/dumb. but, after months of nagging problems w/ string encodings for XSL-transformed results, it finally dawned on me how stoopid i've been.
XmlDocument + StringWriter = EVIL
cuz it's all about the encoding, folks.
since i do mostly web apps, i do lots of XSLT work in C#. this usually goes just great, but occasionally i end up w/ goofy encoding problems. for example, sometimes MSIE will refuse to render results as HTML and will instead just belch the XML onto the client window. sometimes, even though i *know* i indicate UTF-8 in my XSL documents, the result displayed in the browser shows UTF-16. it really gets bad when i start putting together XML pipelines mixing plain XML w/ transformed docs. sometimes i just pull my hair out.
and it's all because i'm lazy/dumb. cuz StringWriter has no business being involved in an XML pipeline. we all know that right? and we all know why, right? do we?
i did. but i forgot.
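see, strings are stored internally as UTF-16 (Unicode) in C#. that's cool. makes sense. but not when you want to ship around the string results of an XML pipeline. that's when you usually want to stick w/ UTF-8. but StringWriter don't play dat: its Encoding property always reports UTF-16, so whatever writes XML into it stamps encoding="utf-16" on the XML declaration, no matter what your XSL asks for.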
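if you want to see it for yourself, here's a minimal sketch (the class and method names are made up for illustration, and it uses a plain XmlWriter instead of a transform to keep things short). it writes one empty element into a StringWriter and hands back whatever came out:

```csharp
using System;
using System.IO;
using System.Xml;

public static class StringWriterEncodingDemo
{
    // Writes a one-element XML document through an XmlWriter that targets
    // a StringWriter, then returns the text (XML declaration included).
    public static string WriteViaStringWriter()
    {
        StringWriter sw = new StringWriter();
        using (XmlWriter xw = XmlWriter.Create(sw))
        {
            xw.WriteStartElement("root");
            xw.WriteEndElement();
        }
        return sw.ToString();
    }

    public static void Main()
    {
        // The encoding a StringWriter advertises to anything writing into it.
        Console.WriteLine(new StringWriter().Encoding.WebName);

        // The declaration at the front of this output names that same encoding.
        Console.WriteLine(WriteViaStringWriter());
    }
}
```

the thing to look at is the encoding attribute in the declaration: it comes from the StringWriter, not from anything you asked for.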
so i just stopped using StringWriter to hold output from XML/XSL work. instead i use MemoryStream and decode the bytes w/ the encoding i actually want. here are some examples:
first, the wrong/dumb/Old-Mike way:
    private string Transform(XmlDocument xmldoc, XmlDocument xsldoc, XsltArgumentList args)
    {
        XPathNavigator xdNav = xmldoc.CreateNavigator();
        XslTransform tr = new XslTransform();
        tr.Load(xsldoc);

        // StringWriter advertises UTF-16, so the transform writes
        // encoding="utf-16" into the XML declaration.
        StringWriter sw = new StringWriter();
        tr.Transform(xdNav, args, sw);
        return sw.ToString();
    }
the above code will always return XML that declares itself as UTF-16, even when the stylesheet says encoding="utf-8". bummer.
now the proper/sane/New-Mike way:
    private string Transform(XmlDocument xmldoc, XmlDocument xsldoc, XsltArgumentList args)
    {
        XPathNavigator xpn = xmldoc.CreateNavigator();
        XslTransform tr = new XslTransform();
        tr.Load(xsldoc);

        // writing to a stream lets the stylesheet's output encoding decide
        // what lands in the XML declaration.
        System.IO.MemoryStream ms = new System.IO.MemoryStream();
        tr.Transform(xpn, args, ms);

        // decode the bytes as UTF-8 to get a string back.
        System.Text.Encoding enc = System.Text.Encoding.UTF8;
        return enc.GetString(ms.ToArray());
    }
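this will come back declared as UTF-8 every time, as long as the stylesheet's xsl:output asks for UTF-8. much better. one thing to watch: the bytes in the MemoryStream may start w/ a UTF-8 byte-order mark, and GetString will keep it as an invisible first character of the string.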
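for a side-by-side comparison, here's a rough sketch of both routes. big caveats: it uses XslCompiledTransform (the newer replacement for XslTransform) rather than the class in my snippets, the stylesheet, input and all the names are invented for the demo, and the StreamReader decode is just one way to read the bytes back, picked because it skips a leading byte-order mark if there is one.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class TransformEncodingDemo
{
    // Hypothetical stylesheet for illustration: explicitly asks for UTF-8 output.
    const string Xslt =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='xml' encoding='utf-8'/>" +
        "<xsl:template match='/'><out><xsl:value-of select='/in'/></out></xsl:template>" +
        "</xsl:stylesheet>";

    // Hypothetical input document.
    const string Input = "<in>hi</in>";

    static XslCompiledTransform LoadTransform()
    {
        XslCompiledTransform tr = new XslCompiledTransform();
        using (XmlReader r = XmlReader.Create(new StringReader(Xslt)))
        {
            tr.Load(r);
        }
        return tr;
    }

    // Route 1: transform into a StringWriter.
    public static string ViaStringWriter()
    {
        StringWriter sw = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(Input)))
        {
            LoadTransform().Transform(input, new XsltArgumentList(), sw);
        }
        return sw.ToString();
    }

    // Route 2: transform into a MemoryStream, then decode the bytes as UTF-8.
    public static string ViaMemoryStream()
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (XmlReader input = XmlReader.Create(new StringReader(Input)))
            {
                LoadTransform().Transform(input, new XsltArgumentList(), ms);
            }

            // Rewind, then let StreamReader decode; it skips a UTF-8
            // byte-order mark at the start of the stream if one is present.
            ms.Position = 0;
            using (StreamReader reader = new StreamReader(ms, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(ViaStringWriter());
        Console.WriteLine(ViaMemoryStream());
    }
}
```

same stylesheet, same input, and the only difference is what the transform writes into. compare the encoding attribute in the two declarations that get printed.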