Archive for September, 2008

Static constructors

kick it on DotNetKicks.com

Recently on Stackoverflow.com there was an interesting question asking what “hidden” or not-well-known C# features people would like to share. The first thing that came to my mind was static constructors (just “ctors” henceforth). For example, this is legal C#:

public class Foo
{
	static Foo()
	{ }

	public Foo()
	{ }
}

public static class Bar
{
	static Bar()
	{ }
}

Well gee, that’s neat, but what good is it? Here’s what MSDN has to say about that:

A static constructor is used to initialize any static data, or to perform a particular action that needs performed once only. It is called automatically before the first instance is created or any static members are referenced.

Essentially, static ctors are commonly best used to set static members of the class (whether the class itself is static or not). With a static class, the only operations that execute when the class is created is the assignment of static field values, and the static ctor.

Here’s an example of how I use this C# feature to set the connection string for a DataContext wrapper named FeedBurner, which is the name of a database server:

public static class FeedBurner
{
    private static string connectionString;  

    static FeedBurner()
    {
        SettingManager.OnChange = () =>
        {
            FeedBurner.connectionString =
                    SettingManager.Setting
						("ConnectionString");
        };
    }  

    public static FeedBurnerDataContext DataContext
    {
        get
        {
            return new FeedBurnerDataContext
			    (FeedBurner.connectionString);
        }
    }

SettingManager is a class that watches files with configuration settings and fires off OnChange() whenever these files change. In this example, if the connection string to the FeedBurner database is changed, the entire class will automatically update the connection string it uses to talk with the FeedBurner database, without having to restart the application.

Beware, though. If an Exception occurs in the static ctor, a TypeInitializationException is thrown. From MSDN:

When a class initializer fails to initialize a type, a TypeInitializationException is created and passed a reference to the exception thrown by the type’s class initializer. The InnerException property of TypeInitializationException holds the underlying exception.

And when this occurs, the Visual Studio debugger does not drill down into the static ctor for you to debug it. Instead, if one gets a TypeInitialzationExceptiom, he or she should programmatically force the debugger to break, to examine the local variables and other run-time data.

public class Foo
{
    static Foo()
    {
#if DEBUG
        try
        {
#endif
            // Do something
#if DEBUG
        }
        catch (Exception ex)
        {
            System.Diagnostics.Debugger.Break();
        }
#endif
    }
}

Pretty ugly, huh?

Generally, static ctors should be avoided. They’re usually more trouble than their worth, and there’s usually a better way to accomplish a task than using a static ctor. But they’re a nifty feature! Here are the notes MSDN offers on them:

A static constructor does not take access modifiers or have parameters.

A static constructor is called automatically to initialize the class before the first instance is created or any static members are referenced.

A static constructor cannot be called directly.

The user has no control on when the static constructor is executed in the program.

A typical use of static constructors is when the class is using a log file and the constructor is used to write entries to this file.

Static constructors are also useful when creating wrapper classes for unmanaged code, when the constructor can call the LoadLibrary method.

If a static constructor throws an exception, the runtime will not invoke it a second time, and the type will remain uninitialized for the lifetime of the application domain in which your program is running.

Happy coding!

kick it on DotNetKicks.com

Leave a Comment

Hexadecimal value 0x is an invalid character

kick it on DotNetKicks.com

Ever get a System.Xml.XmlException that says:

“Hexadecimal value 0x[whatever] is an invalid character”

…when trying to load a XML document using one of the .NET XML API objects like XmlReader, XmlDocument, or XDocument? Was “0x[whatever]” by chance one of these characters?

0×00
0×01
0×02
0×03
0×04
0×05
0×06
0×07
0×08
0×0B
0×0C
0×0E
0×0F
0×10
0×11
0×12
0×13
0×14
0×15
0×1A
0×1B
0×1C
0×1D
0×1E
0×1F
0×16
0×17
0×18
0×19
0×7F

The problem that causes these XmlExceptions is that the data being read or loaded contains characters that are illegal according to the XML specifications. Almost always, these characters are in the ASCII control character range (think whacky characters like null, bell, backspace, etc). These aren’t characters that have any business being in XML data; they’re illegal characters that should be removed, usually having found their way into the data from file format conversions, like when someone tries to create an XML file from Excel data, or export their data to XML from a format that may be stored as binary.

The decimal range for ASCII control characters is 0 – 31, and 127. Or, in hex, 0×00 – 0×1F. (The control character 0×7F is not disallowed, but its use is “discouraged” to avoid compatibility issues.) If any character in the string or stream that contains the XML data contains one of these control characters, an XmlException will be thrown by whatever System.Xml or System.Xml.Linq class (e.g. XmlReader, XmlDocument, XDocument) is trying to load the XML data. In fact, if XML data contains the character ‘\b’ (bell), your motherboard will actually make the bell sound before the XmlException is thrown.

There are a few exceptions though: the formatting characters ‘\n’, ‘\r’, and ‘\t’ are not illegal in XML, per the 1.0 and 1.1 specifications, and therefore do not cause this XmlException. Thus, if you’re encountering XML data that is causing an XmlException because the data “contains invalid characters”, the feeds you’re processing need to be sanitized of illegal XML characters per the XML 1.0 specification (which is what System.Xml conforms to—not XML 1.1) should be removed. The methods below will accomplish this:

/// <summary>
/// Remove illegal XML characters from a string.
/// </summary>
public string SanitizeXmlString(string xml)
{
	if (xml == null)
	{
		throw new ArgumentNullException("xml");
	}

	StringBuilder buffer = new StringBuilder(xml.Length);

	foreach (char c in xml)
	{
		if (IsLegalXmlChar(c))
		{
			buffer.Append(c);
		}
	}

	return buffer.ToString();
}

/// <summary>
/// Whether a given character is allowed by XML 1.0.
/// </summary>
public bool IsLegalXmlChar(int character)
{
	return
	(
		 character == 0x9 /* == '\t' == 9   */          ||
		 character == 0xA /* == '\n' == 10  */          ||
		 character == 0xD /* == '\r' == 13  */          ||
		(character >= 0x20    && character <= 0xD7FF  ) ||
		(character >= 0xE000  && character <= 0xFFFD  ) ||
		(character >= 0x10000 && character <= 0x10FFFF)
	);
}

Useful as these methods are, don’t go off pasting them into your code anywhere. Create a class instead. Here’s why: let’s say you use the routine to sanitize a string in one section of code. Then another section of code uses that same string that has been sanitized. How does the other section positively know that the string doesn’t contain any control characters anymore, without checking? It doesn’t. Who knows where that string has been (if it’s been sanitized) before it gets to a different routine, further down the processing pipeline. Program defensive and agnostically. If the sanitized string isn’t a string and is instead a different type that represents sanitized strings, you can guarantee that the string doesn’t contain illegal characters.

Now, if the strings that need to be sanitized are being retrieved from a Stream, via a TextReader, for example, we can create a custom StreamReader class that will skip over illegal characters. Let’s say that you’re retrieving XML like so:

string xml;

using (WebClient downloader = new WebClient())
{
	using (TextReader reader =
		new StreamReader(downloader.OpenRead(uri)))
	{
		xml = reader.ReadToEnd();
	}
}

// Do something with xml...

You could use the sanitizing methods above like this:

string xml;

using (WebClient downloader = new WebClient())
{
	using (TextReader reader =
		new StreamReader(downloader.OpenRead(uri)))
	{
		xml = reader.ReadToEnd();
	}
}

// Sanitize the XML

xml = SanitizeXmlString(xml);

// Do something with xml...

But creating a class that inherits from StreamReader and avoiding the costly string-building operation performed by SanitizeXmlString() is much more efficient. The class will have to override a couple methods when it’s finished, but when it is, a Stream could be consumed and sanitized like this instead:

string xml;

using (WebClient downloader = new WebClient())
{
	using(XmlSanitizingStream reader =
		new XmlSanitizingStream(downloader.OpenRead(uri)))
	{
		xml = reader.ReadToEnd()
	}
}

// xml contains no illegal characters

The declaration for this XmlSanitizingStream, with IsLegalXmlChar() that we’ll need, looks like:

public class XmlSanitizingStream : StreamReader
{
	// Pass 'true' to automatically detect encoding using BOMs.
	// BOMs: http://en.wikipedia.org/wiki/Byte-order_mark

	public XmlSanitizingStream(Stream streamToSanitize)
		: base(streamToSanitize, true)
	{ }

	/// <summary>
	/// Whether a given character is allowed by XML 1.0.
	/// </summary>
	public static bool IsLegalXmlChar(int character)
	{
		return
		(
			 character == 0x9 /* == '\t' == 9   */          ||
			 character == 0xA /* == '\n' == 10  */          ||
			 character == 0xD /* == '\r' == 13  */          ||
			(character >= 0x20    && character <= 0xD7FF  ) ||
			(character >= 0xE000  && character <= 0xFFFD  ) ||
			(character >= 0x10000 && character <= 0x10FFFF)
		);
	}

	// ...

To get this XmlSanitizingStream working correctly, we’ll first need to override two methods integral to the StreamReader: Peek(), and Read(). The Read method should only return legal XML characters, and Peek() should skip past a character if it’s not legal.

	private const int EOF = -1;

	public override int Read()
	{
		// Read each char, skipping ones XML has prohibited

		int nextCharacter;

		do
		{
			// Read a character

			if ((nextCharacter = base.Read()) == EOF)
			{
				// If the char denotes end of file, stop
				break;
			}
		}

		// Skip char if it's illegal, and try the next

		while (!XmlSanitizingStream.
		        IsLegalXmlChar(nextCharacter));

		return nextCharacter;
	}

	public override int Peek()
	{
		// Return next legal XML char w/o reading it 

		int nextCharacter;

		do
		{
			// See what the next character is
			nextCharacter = base.Peek();
		}
		while
		(
			// If it's illegal, skip over
			// and try the next.

			!XmlSanitizingStream
			.IsLegalXmlChar(nextCharacter) &&
			(nextCharacter = base.Read()) != EOF
		);

		return nextCharacter;

	}

Next, we’ll need to override the other Read* methods (Read, ReadToEnd, ReadLine, ReadBlock). These all use Peek() and Read() to derive their returns. If they are not overridden, calling them on XmlSanitizingStream will invoke them on the underlying base StreamReader. That StreamReader will then use its Peek() and Read() methods, not the XmlSanitizingStream’s, resulting in unsanitized characters making their way through.

To make life easy and avoid writing these other Read* methods from scratch, we can disassemble the TextReader class using Reflector, and copy its versions of the other Read* methods, without having to change more than a few lines of code related to ArgumentExceptions.

The complete version of XmlSanitizingStream can be downloaded here. Rename the file extension to “.cs” from “.doc” after downloading.

kick it on DotNetKicks.com

Comments (16)