Creating numeric types from byte arrays in Clojure

While writing my Clojure-based Minecraft map reader and NBT library, carrit, I’ve needed to do some messing around with binary files. For this library I’ve made the decision to implement what I can with Clojure’s own functions, dropping down into Java interop where I haven’t found a native Clojure equivalent. As a Minecraft map is divided into region files that are each relatively small (~3MB for larger ones), I’ve been slurping them one by one into a single contiguous byte array, then decompressing each individual Minecraft sector of that into separate byte arrays that are read and interpreted by my NBT code.

I’ve been having some issues with Clojure’s interpretation of bytes when trying to convert a section of a byte array into a numeric value. For example, given an array [0xFF 0xFF 0xFF 0xFF], combining the four bytes with bit-shift-left and bit-or results in -1, as expected for a signed value.

However, given an array [0x00 0xFF 0xFF 0xFF], which should result in 16777215, we instead get -1 as well.
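
To make the two cases concrete, here is a minimal sketch of the kind of naive conversion that shows this behaviour. The helper name naive-bytes->int and the two vars are hypothetical, chosen for illustration, and this is not necessarily the exact code carrit used:

```clojure
;; Hypothetical helper, named here for illustration only.
;; Combines four bytes, most significant first, with no masking.
(defn naive-bytes->int
  [^bytes arr]
  (bit-or (bit-shift-left (aget arr 0) 24)
          (bit-shift-left (aget arr 1) 16)
          (bit-shift-left (aget arr 2) 8)
          (aget arr 3)))

;; unchecked-byte is needed because 0xFF is outside the signed byte range.
(def all-ff  (byte-array (map unchecked-byte [0xFF 0xFF 0xFF 0xFF])))
(def zero-ff (byte-array (map unchecked-byte [0x00 0xFF 0xFF 0xFF])))

(naive-bytes->int all-ff)   ; => -1
(naive-bytes->int zero-ff)  ; => -1, not 16777215
```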

Fortunately, I’m not the only one who’s encountered this; it has come up in other threads as well.

Some things to note are that bit-shift-left performs a widening conversion on the byte to a long, preserving the sign, before shifting left, and that bit-or performs a similar widening conversion. So what was happening in the above example was:

  1. Retrieval of the first byte, 0x00. This is interpreted as 0. bit-shift-left automatically performs a widening conversion, preserving the sign, and shifts this left by 24 bits, resulting in 0 as expected.
  2. Retrieval of the second byte, 0xFF. This is interpreted as -1. bit-shift-left automatically performs a widening conversion, preserving the sign, and shifts this left by 16 bits, resulting in a long with a value of -65536.
  3. Retrieval of the third byte, 0xFF. This is interpreted as -1. bit-shift-left automatically performs a widening conversion, preserving the sign, and shifts this left by 8 bits, resulting in a long with a value of -256.
  4. Retrieval of the fourth byte, 0xFF. This is interpreted as -1.
  5. Before the bit-or, 0xFF is interpreted as -1 and widened.
  6. (bit-or a b c d) performs OR on a and b, and again between the result and the next argument, repeating this for successive arguments. In this case, it performs bit-or on 0, -65536, -256 and -1, which of course results in -1.
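
The steps above can be traced at the REPL. This is only a sketch of the intermediate values, with the var name ff chosen for illustration; the hex strings show how sign extension fills the upper bits with ones:

```clojure
;; The byte 0xFF, which Clojure reads as the signed value -1.
(def ff (unchecked-byte 0xFF))

(bit-shift-left (byte 0) 24)   ; => 0
(bit-shift-left ff 16)         ; => -65536
(bit-shift-left ff 8)          ; => -256

;; Widened to a long, the shifted values carry ones in all upper bits.
(Long/toHexString (bit-shift-left ff 16))  ; => "ffffffffffff0000"
(Long/toHexString (bit-shift-left ff 8))   ; => "ffffffffffffff00"

;; OR-ing everything together therefore sets every bit.
(bit-or 0 -65536 -256 -1)      ; => -1
```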

Certainly fun when you don’t know what’s going on. One way around this is to manually perform the widening conversions yourself, then bit-and each value with 0xFF so that the sign-extended upper bits are cleared before shifting.
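
A minimal sketch of that approach, assuming a hypothetical helper named bytes->int (the name and shape are for illustration and may differ from what carrit actually does):

```clojure
;; Hypothetical helper: widen each byte explicitly, then mask with 0xFF
;; so only the low 8 bits survive before shifting into place.
(defn bytes->int
  [^bytes arr]
  (bit-or (bit-shift-left (bit-and (long (aget arr 0)) 0xFF) 24)
          (bit-shift-left (bit-and (long (aget arr 1)) 0xFF) 16)
          (bit-shift-left (bit-and (long (aget arr 2)) 0xFF) 8)
          (bit-and (long (aget arr 3)) 0xFF)))

(bytes->int (byte-array (map unchecked-byte [0x00 0xFF 0xFF 0xFF])))
;; => 16777215

;; The result is a long, so four 0xFF bytes now read as an unsigned value.
(bytes->int (byte-array (map unchecked-byte [0xFF 0xFF 0xFF 0xFF])))
;; => 4294967295

;; Narrowing with unchecked-int recovers the signed 32-bit reading.
(unchecked-int 4294967295)
;; => -1
```

Because the masked result is built as a long, the all-0xFF case no longer comes out as -1 unless it is narrowed back to a 32-bit int, which is worth keeping in mind when the field being read is meant to be signed.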

A little convoluted and can do with some tidying up, but seems to do the job for now.

Edit: Fixed formatting of results, clojure brush doesn’t like raw numbers.
