Copyright 2002 Michael B. Allen <mballen@erols.com>

encdec.h

Encdec

These functions may be used to encode and decode C objects such as integers, floats, doubles, times, and internationalized strings to and from a wide variety of binary formats as they might appear in portable file formats or network messages. These encodings include 16, 32, and 64 bit big and little endian integers, big and little endian IEEE754 float and double values, 8 time encodings, and the wide range of string encodings supported by libiconv. The functions are all designed for in-situ decoding and encoding of complex formats.

The Encdec Java Class

See the src/Encdec.java file for equivalent methods in Java. Formats generated by these two implementations are compatible with two exceptions: 64 bit times encoded using Java will be truncated to the 1 second resolution of the C time_t type, and the Java methods do not provide string encoding/decoding methods because Java supports a wide variety of encodings natively. The "UTF-8" encoding is good for transferring strings between Java and C. Note that the encoding identifier used with the String constructor and String.getBytes() method may need to be specified as "UTF8" without the hyphen. In fact many of the identifiers are different, so it will be necessary to look up the correct identifier in the Java i18n documentation.

The FLD macro

unsigned int FLD(i, m);
The FLD macro is used to decode bit-fields. It returns an integer representing the value occupying the bits selected by mask m, shifted down so that the lowest set bit of the mask becomes bit 0. If, for example, the input is 0xCBA98765 and the mask is 0x00FFFF00, a value of 0xA987 will be returned. With basic register optimizations this is equivalent to the expression (0xCBA98765 >> 8) & 0xFFFF. Masks can be complex: the mask 0x7F080 is equivalent to (i >> 7) & 0xFE1.

Integer functions

These functions should be used to encode and decode 16, 32, and 64 bit integers.
size_t enc_uint16be(uint16_t s, unsigned char *dst);
Encode a 16 bit integer in big endian order into the memory at dst and return the number of bytes written which is always 2.
size_t enc_uint32be(uint32_t i, unsigned char *dst);
Encode a 32 bit integer in big endian order into the memory at dst and return the number of bytes written which is always 4.
size_t enc_uint64be(uint64_t l, unsigned char *dst);
Encode a 64 bit integer in big endian order into the memory at dst and return the number of bytes written which is always 8.
size_t enc_uint16le(uint16_t s, unsigned char *dst);
Encode a 16 bit integer in little endian order into the memory at dst and return the number of bytes written which is always 2.
size_t enc_uint32le(uint32_t i, unsigned char *dst);
Encode a 32 bit integer in little endian order into the memory at dst and return the number of bytes written which is always 4.
size_t enc_uint64le(uint64_t l, unsigned char *dst);
Encode a 64 bit integer in little endian order into the memory at dst and return the number of bytes written which is always 8.
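Because each encoder returns the number of bytes it wrote, the return value can be added to a cursor to lay out consecutive fields. The following is a minimal sketch of that idiom. The names sketch_enc_uint16be, sketch_enc_uint32le, and sketch_write_record are hypothetical stand-ins written only so the example compiles on its own; they are not the library's implementation, and the record layout is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: write s most significant byte first; returns 2. */
size_t sketch_enc_uint16be(uint16_t s, unsigned char *dst)
{
    assert(dst != NULL);
    dst[0] = (unsigned char)(s >> 8);
    dst[1] = (unsigned char)(s & 0xFF);
    return 2;
}

/* Hypothetical stand-in: write i least significant byte first; returns 4. */
size_t sketch_enc_uint32le(uint32_t i, unsigned char *dst)
{
    assert(dst != NULL);
    dst[0] = (unsigned char)(i & 0xFF);
    dst[1] = (unsigned char)((i >> 8) & 0xFF);
    dst[2] = (unsigned char)((i >> 16) & 0xFF);
    dst[3] = (unsigned char)((i >> 24) & 0xFF);
    return 4;
}

/* Lay out a made-up record: a 16 bit big endian type tag followed by a
 * 32 bit little endian length. buf must have room for 6 bytes.
 * Returns the total number of bytes written. */
size_t sketch_write_record(unsigned char *buf, uint16_t type, uint32_t len)
{
    unsigned char *p = buf;

    p += sketch_enc_uint16be(type, p);  /* cursor advances by 2 */
    p += sketch_enc_uint32le(len, p);   /* cursor advances by 4 */
    return (size_t)(p - buf);
}
```

With the real library the same pattern would presumably read p += enc_uint16be(type, p); and so on, since the documented return values are the byte counts.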
uint16_t dec_uint16be(const unsigned char *src);
Return a 16 bit integer decoded in big endian byte order from 2 bytes of src memory.
uint32_t dec_uint32be(const unsigned char *src);
Return a 32 bit integer decoded in big endian byte order from 4 bytes of src memory.
uint64_t dec_uint64be(const unsigned char *src);
Return a 64 bit integer decoded in big endian byte order from 8 bytes of src memory.
uint16_t dec_uint16le(const unsigned char *src);
Return a 16 bit integer decoded in little endian byte order from 2 bytes of src memory.
uint32_t dec_uint32le(const unsigned char *src);
Return a 32 bit integer decoded in little endian byte order from 4 bytes of src memory.
uint64_t dec_uint64le(const unsigned char *src);
Return a 64 bit integer decoded in little endian byte order from 8 bytes of src memory.
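The integer decoders pair naturally with bit-field extraction of the kind the FLD macro performs. The sketch below decodes a big endian word and then pulls a field out of it, reusing the worked values from the FLD description. Both sketch_dec_uint32be and SKETCH_FLD are hypothetical names defined here for illustration; the real FLD macro may well be implemented differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32 bit big endian decoder. */
uint32_t sketch_dec_uint32be(const unsigned char *src)
{
    assert(src != NULL);
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

/* Hypothetical bit-field helper: keep the bits selected by mask m and shift
 * them down so the lowest set bit of m lands at bit 0. The expression
 * (m & (~m + 1)) isolates that lowest set bit, and dividing by it is the
 * same as shifting right by its position. m must be non-zero. */
#define SKETCH_FLD(i, m) \
    ((uint32_t)(((uint32_t)(i) & (uint32_t)(m)) / \
                ((uint32_t)(m) & (~(uint32_t)(m) + 1u))))

/* Decode a big endian word from src and extract the field under mask. */
uint32_t sketch_read_field(const unsigned char *src, uint32_t mask)
{
    return SKETCH_FLD(sketch_dec_uint32be(src), mask);
}
```

Given the bytes CB A9 87 65, sketch_read_field with mask 0x00FFFF00 yields 0xA987, matching the example in the FLD description.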
Time functions

These functions may be used to encode and decode a wide variety of low-resolution time encodings.
size_t enc_time(const time_t *timep, unsigned char *dst, int enc);
Encode the time_t object pointed to by timep into the memory at dst encoded in enc format. The following constants are valid enc parameters.
Identifier            Units        Epoch Bits Endianness Use case
-----------------------------------------------------------------
TIME_1970_SEC_32BE    Seconds      1970  32   big        time_t
TIME_1970_SEC_32LE    Seconds      1970  32   little     time_t
TIME_1904_SEC_32BE    Seconds      1904  32   big        MS
TIME_1904_SEC_32LE    Seconds      1904  32   little     MS
TIME_1601_NANOS_64BE  Nanoseconds  1601  64   big        MS
TIME_1601_NANOS_64LE  Nanoseconds  1601  64   little     MS
TIME_1970_MILLIS_64BE Milliseconds 1970  64   big        Java
TIME_1970_MILLIS_64LE Milliseconds 1970  64   little     Java
time_t dec_time(const unsigned char *src, int enc);
Decode and return a time_t object encoded as enc in src. The constants listed in the enc_time description are valid enc parameters for this function as well.
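The formats in the table differ from time_t in epoch and units as well as in width and byte order. The following sketch shows only the epoch and unit arithmetic for two of them, the 1904 seconds and 1970 milliseconds formats; byte order would be handled separately by the integer functions. All names are hypothetical, and this is an illustration of the arithmetic rather than the library's code. It assumes a non-negative time_t that fits the target width.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Seconds between 1904-01-01 and 1970-01-01:
 * 66 years, 17 of them leap years, so 24107 days of 86400 seconds. */
#define SKETCH_SECS_1904_TO_1970 INT64_C(2082844800)

/* time_t to unsigned 32 bit seconds since 1904. */
uint32_t sketch_time_to_1904_sec(time_t t)
{
    return (uint32_t)((int64_t)t + SKETCH_SECS_1904_TO_1970);
}

/* Unsigned 32 bit seconds since 1904 back to time_t. */
time_t sketch_time_from_1904_sec(uint32_t v)
{
    return (time_t)((int64_t)v - SKETCH_SECS_1904_TO_1970);
}

/* time_t to 64 bit milliseconds since 1970. */
uint64_t sketch_time_to_1970_millis(time_t t)
{
    return (uint64_t)t * 1000u;
}

/* 64 bit milliseconds since 1970 back to time_t.
 * The sub-second part is discarded because time_t counts whole seconds. */
time_t sketch_time_from_1970_millis(uint64_t ms)
{
    return (time_t)(ms / 1000u);
}
```

The last function illustrates why a millisecond timestamp loses precision when it passes through time_t: 5999 milliseconds comes back as 5 seconds.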
Floating point numbers
size_t enc_floatle(const float f, unsigned char *dst);
Encode a 32 bit real number f into dst in little endian IEEE754 format and return the number of bytes encoded which is always 4.
size_t enc_doublele(const double d, unsigned char *dst);
Encode a 64 bit real number d into dst in little endian IEEE754 format and return the number of bytes encoded which is always 8.
size_t enc_floatbe(const float f, unsigned char *dst);
Encode a 32 bit real number f into dst in big endian IEEE754 format and return the number of bytes encoded which is always 4.
size_t enc_doublebe(const double d, unsigned char *dst);
Encode a 64 bit real number d into dst in big endian IEEE754 format and return the number of bytes encoded which is always 8.
float dec_floatle(const unsigned char *src);
Return a 32 bit real number decoded in little endian IEEE754 format from 4 bytes of src memory.
double dec_doublele(const unsigned char *src);
Return a 64 bit real number decoded in little endian IEEE754 format from 8 bytes of src memory.
float dec_floatbe(const unsigned char *src);
Return a 32 bit real number decoded in big endian IEEE754 format from 4 bytes of src memory.
double dec_doublebe(const unsigned char *src);
Return a 64 bit real number decoded in big endian IEEE754 format from 8 bytes of src memory.
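One common way to produce these encodings is to copy the float's bit pattern into an integer of the same width and then emit the bytes in the desired order. The sketch below does this for the 32 bit big endian case. It assumes the host float is already a 32 bit IEEE754 value, which is true on most current platforms but is not guaranteed by C; the sketch_ names are hypothetical and the library may take a different approach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: store f as 4 big endian IEEE754 bytes; returns 4.
 * Assumes the host float is a 32 bit IEEE754 value. */
size_t sketch_enc_floatbe(float f, unsigned char *dst)
{
    uint32_t bits;

    assert(dst != NULL && sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);     /* copy the bit pattern, no cast */
    dst[0] = (unsigned char)(bits >> 24);
    dst[1] = (unsigned char)((bits >> 16) & 0xFF);
    dst[2] = (unsigned char)((bits >> 8) & 0xFF);
    dst[3] = (unsigned char)(bits & 0xFF);
    return 4;
}

/* Hypothetical stand-in: rebuild a float from 4 big endian bytes. */
float sketch_dec_floatbe(const unsigned char *src)
{
    uint32_t bits;
    float f;

    assert(src != NULL);
    bits = ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under the stated assumption, 1.0f encodes as the bytes 3F 80 00 00. The little endian variants would emit the same four bytes in reverse order.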
String functions
int enc_mbsncpy(const char *src, size_t sn, char **dst, size_t dn, int cn, const char *tocode);
The enc_mbsncpy function encodes the multi-byte string at src into dst using the tocode encoding identifier. The tocode parameter can be one of the standard encoding identifiers such as "UTF-8", "KOI8-R", "ISO-8859-2", etc. See the libiconv documentation for a complete list:

http://www.gnu.org/software/libiconv/

Specifically, the enc_mbsncpy function:

  • does not read more than sn bytes of src,
  • does not write to more than dn bytes of dst,
  • does not convert more than cn characters,
  • does not convert characters after a '\0' encountered in src,
  • advances dst by the number of bytes encoded into dst,
  • and returns the number of characters converted.
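The limits listed above can be easier to follow in code. The sketch below is a deliberately simplified, hypothetical stand-in and not the library function: it accepts only 7 bit ASCII input and always emits UTF-16LE, where the real function delegates conversion to libiconv according to tocode. It exists only to show how the sn, dn, and cn limits, the '\0' stop, the advancing of dst, and the return value fit together. Whether the library also encodes the terminator is not stated above, so this sketch stops before it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same limit parameters as described above.
 * Converts ASCII bytes to 2 byte UTF-16LE code units and nothing else. */
int sketch_enc_ascii_utf16le(const char *src, size_t sn,
                             char **dst, size_t dn, int cn)
{
    size_t nread = 0;       /* bytes consumed from src */
    size_t nwritten = 0;    /* bytes produced into *dst */
    int chars = 0;          /* characters converted so far */

    assert(src != NULL && dst != NULL && *dst != NULL);
    while (nread < sn && chars < cn && src[nread] != '\0') {
        if (nwritten + 2 > dn)
            break;                      /* next character would not fit */
        (*dst)[nwritten++] = src[nread++];  /* low byte */
        (*dst)[nwritten++] = 0;             /* high byte is 0 for ASCII */
        chars++;
    }
    *dst += nwritten;                   /* advance the caller's cursor */
    return chars;
}
```

Because dst is advanced past the encoded bytes, a caller can follow the string with further fields using the same cursor.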
int enc_mbscpy(const char *src, char **dst, const char *tocode);
The enc_mbscpy function encodes the multi-byte string at src into dst using the tocode encoding identifier. The conversion stops when a '\0' character is encountered in src. This function is equivalent to enc_mbsncpy(src, INT_MAX, dst, INT_MAX, INT_MAX, tocode). See enc_mbsncpy for details.
size_t dec_mbsncpy(char **src, size_t sn, char *dst, size_t dn, int cn, const char *fromcode);
The dec_mbsncpy function decodes the string at src encoded as fromcode to the memory at dst as a locale dependent string (possibly UTF-8). The fromcode parameter can be one of the standard encoding identifiers such as "UTF-8", "KOI8-R", "ISO-8859-2", etc. See the libiconv documentation for a complete list:

http://www.gnu.org/software/libiconv/

More specifically, the dec_mbsncpy function:

  • does not read more than sn bytes of src,
  • does not write to more than dn bytes of dst,
  • does not convert more than cn characters,
  • does not convert characters after a '\0' encountered in src,
  • advances src by the number of bytes decoded,
  • and returns the number of bytes written to dst. If no '\0' terminator is encountered in src, one is artificially written to dst but not counted in the return value.
Additionally, if dst is NULL this function:
  • does not write to dst,
  • does not advance the src pointer,
  • and returns the exact number of bytes required to hold the decoded multi-byte string had dst not been NULL (e.g. for sizing a malloc call). This includes the '\0' terminator regardless of whether one was encountered in src.
size_t dec_mbscpy(char **src, char *dst, const char *fromcode);
The dec_mbscpy function decodes the string at src encoded as fromcode to the memory at dst as a locale dependent string (possibly UTF-8). The conversion stops when the character '\0' is encountered in src. This function is equivalent to dec_mbsncpy(src, INT_MAX, dst, INT_MAX, INT_MAX, fromcode). See dec_mbsncpy for details.
char *dec_mbsndup(char **src, size_t sn, size_t dn, int wn, const char *fromcode);
The dec_mbsndup function decodes the string at src encoded as fromcode and returns a locale dependent string (possibly UTF-8) stored in memory allocated with malloc(3). This memory should be freed with free(3) when it will no longer be referenced. This function just calls dec_mbsncpy(src, sn, NULL, dn, wn, fromcode) to measure the result, allocates the precise amount of memory, decodes the string into it with dec_mbsncpy(src, sn, dst, dn, wn, fromcode), and returns the new string.
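The measure, allocate, decode sequence described above can be sketched as follows. To keep the example self-contained, the decoder here is a hypothetical stand-in whose "conversion" is a plain byte copy with no character limit and no fromcode; only the two-pass shape is the point. The exact return value and src advance of the stand-in in the non-NULL case are simplifying assumptions and may not match dec_mbsncpy in every detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in decoder (identity conversion).
 * With dst == NULL: returns the bytes needed, terminator included, and
 * leaves *src alone. Otherwise: copies, terminates dst, advances *src by
 * the bytes consumed, and returns the bytes copied without the terminator.
 * dn must be at least 1 so the terminator always fits. */
size_t sketch_dec_copy(char **src, size_t sn, char *dst, size_t dn)
{
    const char *s;
    size_t n = 0;

    assert(src != NULL && *src != NULL && dn >= 1);
    s = *src;
    while (n < sn && n + 1 < dn && s[n] != '\0')
        n++;                            /* payload bytes that fit */
    if (dst == NULL)
        return n + 1;                   /* measuring pass only */
    memcpy(dst, s, n);
    dst[n] = '\0';
    *src += n;
    return n;
}

/* Two-pass duplicate: measure, allocate exactly, then decode.
 * The caller releases the result with free(). */
char *sketch_dec_dup(char **src, size_t sn, size_t dn)
{
    size_t need = sketch_dec_copy(src, sn, NULL, dn);   /* pass 1 */
    char *dst = malloc(need);

    if (dst == NULL)
        return NULL;
    sketch_dec_copy(src, sn, dst, dn);                  /* pass 2 */
    return dst;
}
```

The first pass must not advance src, otherwise the second pass would start in the wrong place; this is why the NULL dst case leaves the pointer untouched.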
char *dec_mbsdup(char **src, const char *fromcode);
The dec_mbsdup function decodes the string at src encoded as fromcode and returns a locale dependent string (possibly UTF-8) stored in memory allocated with malloc(3). This memory should be freed with free(3) when it will no longer be referenced. This function is equivalent to dec_mbsndup(src, -1, -1, -1, fromcode). See dec_mbsndup for details.