Matt's Mind

Thursday, August 05, 2004

Unsigned integer types

OK, so I've been programming in Java since the 1.0 days and have never needed to worry about the lack of unsigned types. Java sneakily works around the issue, where it matters, by using the "next size up". For example, InputStream.read (), which reads a single byte from a stream, returns an int rather than a byte. I used to wonder why until I realised the result is an unsigned byte (0 to 255), and using byte would break for values > 127. (The int also leaves room for the end-of-stream marker, -1.) Another example is the java.util.zip.CRC32 class, whose getValue () returns a long (64 bits) even though a CRC-32 checksum only needs an int's worth (32 bits).
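To see the "next size up" trick in action, here's a small sketch. The class and method names are made up for illustration; only InputStream.read () and CRC32.getValue () come from the standard library. The CRC of "123456789" is the well-known CRC-32 check value 0xCBF43926, which is above Integer.MAX_VALUE, so it would come out negative if getValue () returned an int.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class: shows why read() returns int and getValue() returns long.
public class NextSizeUp {
    // Sums the bytes of an array as unsigned values by reading them back through
    // an InputStream. read() yields 0..255 per byte, then -1 once the stream is
    // exhausted, so a byte return type couldn't tell 0xFF apart from end-of-stream.
    static int sumUnsigned(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int sum = 0;
        int b;
        while ((b = in.read()) != -1) {
            sum += b;
        }
        return sum;
    }

    // The CRC is an unsigned 32-bit value; returning it as a long keeps it non-negative.
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xFF, 0x01 };
        System.out.println(sumUnsigned(data));   // 256

        long crc = crcOf("123456789".getBytes(StandardCharsets.US_ASCII));
        System.out.println(Long.toHexString(crc)); // cbf43926
        System.out.println((int) crc < 0);         // true: as an int it would be negative
    }
}
```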

A lot of the time this trick works, but a real problem occurs when you want to promote, say, an "unsigned" byte to an unsigned int: that is, a byte that you are treating as unsigned (e.g. one read from an input stream into a byte array). It's only a bit pattern after all, and most of the bitwise operators don't care about the sign. If you just do something like:
byte [] bytes = new byte [10];
input.read (bytes);
int uint = bytes [0];
Then you'll find that, for negative values, Java helpfully sign-extends the byte on assignment: the top bit is copied into all 24 new high bits. So if bytes [0] had the hex value FF (binary 11111111), uint ends up as hex FFFFFFFF (thirty-two 1 bits), which is -1 rather than 255. This can cause subtle problems, especially since things still work for values less than 128.
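A quick way to watch the sign extension happen is to widen a few bytes and print the resulting 32-bit pattern. This is a minimal sketch; the SignExtension class and widen method are hypothetical names, and widen does nothing more than the plain assignment from the snippet above.

```java
// Hypothetical demo class: prints what a plain byte-to-int assignment produces.
public class SignExtension {
    // Same widening as `int uint = bytes [0];`
    static int widen(byte b) {
        int result = b;
        return result;
    }

    public static void main(String[] args) {
        byte[] samples = { (byte) 0x7F, (byte) 0x80, (byte) 0xFF };
        for (byte b : samples) {
            int w = widen(b);
            // toHexString shows the full 32-bit pattern of the widened int
            System.out.println(Integer.toHexString(w) + " = " + w);
        }
        // 7f = 127
        // ffffff80 = -128
        // ffffffff = -1
    }
}
```

Only 0x7F survives intact; anything with the top bit set picks up 24 extra 1 bits.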

The way to work around this? The code below promotes an "unsigned" byte to an "unsigned" int (there may be a more elegant way to do this, but I know this works):
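public static int promote (byte value)
{
    // value is sign-extended to int before the &, so this tests bit 7 (the byte's sign bit)
    if ((value & 0x80) != 0)
    {
        // create int without sign extension and then re-add high bit
        int uintValue = (value & 0x7F);
        uintValue |= 0x80;

        return uintValue;
    } else
    {
        // 0..127: sign extension only adds zero bits, so the value is already correct
        return value;
    }
}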
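On the "more elegant way" front: a common idiom is to mask with 0xFF. The byte is still sign-extended to int first, but the AND then clears everything above the low 8 bits, leaving 0 to 255. The same trick works one size up for treating an int as unsigned, provided the mask is a long literal (0xFFFFFFFFL; without the L it's just the int -1 and the AND does nothing). The class name below is made up. Note that Byte.toUnsignedInt and Integer.toUnsignedLong only arrived in Java 8, well after this post, so they're shown as a later alternative rather than something available at the time of writing.

```java
// Hypothetical helper class: one-expression versions of promote.
public class UnsignedPromote {
    // Byte -> int: sign-extend, then keep only the low 8 bits (0..255).
    public static int promote(byte value) {
        return value & 0xFF;
    }

    // Int -> long: the mask must be a long literal, or the AND is a no-op.
    public static long promote(int value) {
        return value & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        System.out.println(promote((byte) 0xFF)); // 255
        System.out.println(promote((byte) 0x7F)); // 127
        System.out.println(promote(-1));          // 4294967295

        // Java 8 and later provide these as library methods:
        System.out.println(Byte.toUnsignedInt((byte) 0xFF)); // 255
        System.out.println(Integer.toUnsignedLong(-1));      // 4294967295
    }
}
```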

Even though this is a PITA and a trap for new players, I can understand why the Java designers might have decided to leave unsigned types out. For a start, you immediately increase complexity a lot: you double the number of integer primitives and need to define how to handle things like safely assigning a signed value to an unsigned one. The current types entirely enclose the smaller types (a long can hold any int, an int can hold any short, etc.), but signed and unsigned types of the same size overlap without either enclosing the other: an int can hold -1 but not 4,294,967,295, and an unsigned 32-bit type is the reverse. Also, the JVM encodes the operand type into its bytecode instructions (iadd, ladd and so on), and since opcodes are a single byte there can be at most 256 of them.
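To make the "overlap but don't enclose" point concrete, here's a sketch that checks a few values against both 32-bit ranges, using a long to hold the test values since it encloses both. The Overlap class and its fits* helpers are hypothetical. The last line uses Integer.compareUnsigned, which is Java 8 and later, to show the same bit pattern ordering differently depending on how you read it.

```java
// Hypothetical demo class: signed vs unsigned 32-bit ranges.
public class Overlap {
    // Would v fit in a signed 32-bit int?
    static boolean fitsSigned32(long v) {
        return v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE;
    }

    // Would v fit in an unsigned 32-bit type (0 .. 2^32 - 1)?
    static boolean fitsUnsigned32(long v) {
        return v >= 0 && v <= 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long[] samples = { -1L, 0L, Integer.MAX_VALUE, 0xFFFFFFFFL };
        for (long v : samples) {
            System.out.println(v + " signed=" + fitsSigned32(v)
                    + " unsigned=" + fitsUnsigned32(v));
        }
        // -1 signed=true unsigned=false
        // 0 signed=true unsigned=true
        // 2147483647 signed=true unsigned=true
        // 4294967295 signed=false unsigned=true

        // Same bits, different order: as unsigned, -1 (0xFFFFFFFF) is larger than 1.
        System.out.println((-1 < 1) + " " + (Integer.compareUnsigned(-1, 1) > 0)); // true true
    }
}
```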
