Saturday, July 21, 2007

StreamReader: annoying design decisions in the .NET framework

You would be surprised at the simple things people forget to include in APIs, and how annoying those omissions make the APIs to use.

Here's an example: StreamReader. Sensibly, StreamReader buffers input from the stream for performance reasons, so while you may only call ReadBlock() for 14 characters, StreamReader has already read 50 characters ahead.
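
To make the read-ahead concrete, here's a rough Python model. ReadAheadReader, read_block() and the 50-byte chunk size are all made up for illustration. This is a toy, not StreamReader's actual implementation, and it assumes ASCII text so one byte equals one character.

```python
import io


class ReadAheadReader:
    """Hypothetical read-ahead reader: a toy model, not .NET's StreamReader."""

    def __init__(self, base, chunk_size=50):
        self.base = base              # seekable byte stream (plays the role of BaseStream)
        self.chunk_size = chunk_size  # bytes pulled from the base stream per refill
        self.buffer = ""              # characters read ahead but not yet returned

    def read_block(self, count):
        # Refill from the base stream in whole chunks until count can be satisfied.
        while len(self.buffer) < count:
            chunk = self.base.read(self.chunk_size)
            if not chunk:
                break
            self.buffer += chunk.decode("ascii")
        result, self.buffer = self.buffer[:count], self.buffer[count:]
        return result


base = io.BytesIO(b"abcdefghijklmnopqrstuvwxyz" * 4)  # 104 bytes
reader = ReadAheadReader(base)
data = reader.read_block(14)
print(data)         # abcdefghijklmn
print(base.tell())  # 50: the base stream is already 36 bytes past what you asked for
```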

That's great and all, except they forgot to *support their own features*.

The most obvious problem is seeking: StreamReader itself has no buffer-aware seek method, so you have to dig into the underlying BaseStream to make the magic happen. Except that StreamReader doesn't check the base stream's current position before it reads from its buffer.

So when you seek on BaseStream and then do a read, StreamReader happily (and very blindly) reads from its now-stale buffer, thus fucking your shit up.
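
Here's the same kind of toy model (again hypothetical Python, not the real StreamReader) showing the stale-buffer problem. The reader never consults the base stream's position, so a seek on the base stream is simply ignored until the buffer runs dry.

```python
import io


class ReadAheadReader:
    """Hypothetical toy model of a read-ahead reader (not .NET's StreamReader)."""

    def __init__(self, base, chunk_size=50):
        self.base = base
        self.chunk_size = chunk_size
        self.buffer = ""

    def read_block(self, count):
        # Never looks at base.tell(): whatever is buffered is trusted blindly.
        while len(self.buffer) < count:
            chunk = self.base.read(self.chunk_size)
            if not chunk:
                break
            self.buffer += chunk.decode("ascii")
        result, self.buffer = self.buffer[:count], self.buffer[count:]
        return result


base = io.BytesIO(b"abcdefghijklmnopqrstuvwxyz" * 4)
reader = ReadAheadReader(base)
reader.read_block(14)      # buffer now holds characters 14..49
base.seek(0)               # "rewind" through the base stream
after_seek = reader.read_block(5)
print(after_seek)          # opqrs, not abcde: served from the stale buffer
```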

Apparently someone, somewhere, was vaguely aware of this and provided "DiscardBufferedData()": one call to it and the buffer is emptied, so the next read from StreamReader pulls fresh data from the stream instead of the old buffered data.

Except DiscardBufferedData() doesn't update the underlying stream's position. Oops!

So while you think you've read 540 bytes and the stream should be at 540, it's actually at 560, because StreamReader buffered an extra 20 bytes for performance.
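
The toy model again, now with a hypothetical discard_buffered_data() that behaves the way DiscardBufferedData() is described above: it empties the buffer but leaves the base stream alone. The 80-byte chunk size is an assumption picked so the numbers line up with the 540/560 example; it is not StreamReader's real buffer size.

```python
import io


class ReadAheadReader:
    """Hypothetical toy read-ahead reader (not .NET's StreamReader)."""

    def __init__(self, base, chunk_size=80):
        self.base = base
        self.chunk_size = chunk_size
        self.buffer = ""

    def read_block(self, count):
        while len(self.buffer) < count:
            chunk = self.base.read(self.chunk_size)
            if not chunk:
                break
            self.buffer += chunk.decode("ascii")
        result, self.buffer = self.buffer[:count], self.buffer[count:]
        return result

    def discard_buffered_data(self):
        # The buffer is emptied, but the base stream is NOT moved back
        # by the amount that was thrown away.
        self.buffer = ""


payload = bytes(ord("a") + i % 26 for i in range(1000))  # 1000 ASCII bytes, a..z repeating
base = io.BytesIO(payload)
reader = ReadAheadReader(base)
reader.read_block(540)
after_read = base.tell()      # seven 80-byte chunks, 20 bytes more than requested
reader.discard_buffered_data()
after_discard = base.tell()   # nothing was rewound
next_text = reader.read_block(10)
print(after_read, after_discard)                      # 560 560
print(next_text == payload[560:570].decode("ascii"))  # True: bytes 540..559 were skipped
```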

No big deal, though, right? We'll just see how much data was in the StreamReader buffer, and seek backwards that amount.

Or we would, if DiscardBufferedData() had been designed properly and returned the size of the buffer it discarded.

No biggie, we'll just check the BufferSize property. Except the developers decided that wasn't worth exposing as a class member. Shoot.

Of course, this isn't an insurmountable problem: you just have to keep tabs, manually, on where the stream position *SHOULD BE*, and then do your seeking calculations with those numbers in mind.
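
A sketch of that workaround, still using a hypothetical Python model rather than the real .NET classes. TrackedReader counts the bytes it has actually handed out, and its seek() re-syncs by seeking the base stream and then discarding the buffer, the same BaseStream-seek-plus-DiscardBufferedData() pairing described above. It counts encoded bytes rather than characters so it keeps working when one character takes several UTF-8 bytes, and it assumes the data is valid UTF-8.

```python
import codecs
import io


class ReadAheadReader:
    """Hypothetical toy read-ahead text reader (not .NET's StreamReader)."""

    def __init__(self, base, chunk_size=80):
        self.base = base
        self.chunk_size = chunk_size
        # An incremental decoder copes with a UTF-8 character split across chunks.
        self.decoder = codecs.getincrementaldecoder("utf-8")()
        self.buffer = ""

    def read_block(self, count):
        while len(self.buffer) < count:
            chunk = self.base.read(self.chunk_size)
            if not chunk:
                self.buffer += self.decoder.decode(b"", final=True)
                break
            self.buffer += self.decoder.decode(chunk)
        result, self.buffer = self.buffer[:count], self.buffer[count:]
        return result

    def discard_buffered_data(self):
        # Empties the buffer (and any half-decoded bytes) but leaves the base stream alone.
        self.buffer = ""
        self.decoder.reset()


class TrackedReader:
    """Hypothetical wrapper that keeps tabs on where the stream SHOULD be."""

    def __init__(self, reader):
        self.reader = reader
        self.position = reader.base.tell()

    def read_block(self, count):
        text = self.reader.read_block(count)
        # Count encoded bytes, not characters: one UTF-8 character is 1 to 4 bytes.
        self.position += len(text.encode("utf-8"))
        return text

    def seek(self, position):
        # Move the base stream, then drop the buffer that no longer matches it.
        self.reader.base.seek(position)
        self.reader.discard_buffered_data()
        self.position = position


payload = ("héllo wörld " * 100).encode("utf-8")  # 12 characters, 14 bytes per repeat
tracked = TrackedReader(ReadAheadReader(io.BytesIO(payload)))
first = tracked.read_block(540)
logical, actual = tracked.position, tracked.reader.base.tell()
print(logical, actual)          # 630 640: where we should be vs. where the stream is
tracked.seek(tracked.position)  # re-sync the base stream to the tracked position
rest = tracked.read_block(10)
print(rest)                     # héllo wörl
```

The two printed positions differ by the 10 bytes still sitting in the buffer; seeking to the tracked position and discarding makes the next read start exactly where the caller left off.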

But being able to work around it isn't the problem: the fact that you *HAVE* to work around it is the big annoyance here. It's a "WTF PLZ" that shouldn't even be in the framework, much less have survived three RTM releases of .NET.

There was a time when I would have submitted this as a feature request, but every time I visit Connect I see a whole bunch of feedback issues with the same reply from "Microsoft": "we can't fix this in time for X,Y,Z, but we'll keep an eye on it for future releases (not really)!"

I think I'll actually do it this time, though.

I'm a bit curious to see if or how they'll dodge it.


Anonymous said...

Did any easier workaround ever surface? I have the same problem, except I don't care about lines. However, I DO care about getting characters (possibly UTF-8 encoded) instead of bytes, so I wanted to use StreamReader.

Dan M.

Radical Ed said...

Not as far as I know. But I've been immersed in the Ruby world lately, so something might have come up that I'm unaware of.