Bit-Twiddling in C

The C programming language has a number of features which are specially designed for performing bit manipulation. This is a brief review of some of the most important ones.

Octal and Hexadecimal

It is possible to specify numbers in decimal, octal, or hexadecimal.

Decimal
This is what you've been doing ever since you started programming in C. Examples: 1, 73, -437 all have the values you expect.
Octal
An octal number is specified with a leading 0. Examples: 010 = 10 (base 8) = 8 (base 10); 073 = 73 (base 8) = 59 (base 10). Of course, the digits 8 and 9 can't be used in an octal number.
Hexadecimal
A hexadecimal number is specified with a leading 0x. Examples: 0x5 = 5 (base 16) = 5 (base 10); 0xa57 = a57 (base 16) = 2,647 (base 10). The digits from 10 to 15 are represented by a through f, as expected.

Mask Operators

Operators exist to perform the basic bitwise unary and binary operations.

Bitwise OR
The ``|'' (shift-\ on most keyboards; not 1, l, or !) is used to perform a bitwise OR. Examples:
	  0xa3 | 0x04 = 0xa7
	  0x7d | 0x58 = 0x7d
	   0xa |  0xc =  0xe
	  
Bitwise AND
The ``&'' operator performs a bitwise AND. Examples:
	  0xa3 & 0x04 =  0x0
	  0x7d & 0x58 = 0x58
	   0xa &  0xc =  0x8
	  
Bitwise XOR
The ``^'' operator performs a bitwise exclusive-or. Examples:
	  0xa3 ^ 0x04 = 0xa7
	  0x7d ^ 0x58 = 0x25
	   0xa ^  0xc =  0x6
	  
Bitwise NOT
The ``~'' operator performs a bitwise inverse. Example:
	  ~0x00a3 = 0xff5c
	  
(assuming a 16-bit variable)

Shifting

There are two shifting operators, which perform bit-shifts.

Right-shift
The ``>>'' operator performs a right-shift by the specified number of bits. Example:
	  0x1a3 >> 3 = 0x034
	  
(the value was right-shifted by three bits)
Left-shift
The ``<<'' operator performs a left-shift by the specified number of bits. Example:
	  0x1a3 << 2 = 0x068c
	  
(the value was left-shifted by two bits)

Macros

Macros are especially useful for making code involving bit-twiddling comprehensible. Macros are textually substituted into your code by the preprocessor before it is compiled; for instance, if you

#define SIGNIFICAND 0x007fffff
#define PHANTOM     0x00800000
you can use
mantissa = (ieeeval & SIGNIFICAND) | PHANTOM;
to extract the significand from an IEEE-format floating point number, and put the phantom bit in.

Making a macro all upper-case is a very common convention, and helps avoid confusion with variables (which are typically lower case).

Specified-Width Variables

One of the weak points of the C language has always been that the width of the standard integer types is not specified; there are some weak requirements (a long can't be shorter than a short, for instance), but no way to specify that you actually want a 32-bit integer vs. a 16-bit or 64-bit integer.

As of 1999, the C standard specifies that a compliant implementation will include a file called <stdint.h> which defines precise-width types. In general, an unsigned integral value of specified width will take the form uintN_t, where N is the width in bits, so you can declare an unsigned 32-bit wide integer by saying

uint32_t myvar;

A signed integer of specified width will take the form intN_t, where N is the width in bits (note the lack of the leading u in this one).

You should #include <stdint.h> and use these types.

WARNING: Signed integers

Think about what happens if the following code appears in your program:

int32_t myvar; /* note myvar is signed */
myvar = 0xf0000000;
myvar = myvar >> 4;

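You might expect that after this operation myvar will contain 0x0f000000; on most compilers it won't. Because myvar is signed and its top bit is set, it holds a negative value, and the result of right-shifting a negative value is implementation-defined in C. Nearly every compiler performs an arithmetic shift, which sign-extends the value to keep it negative. So myvar will contain 0xff000000.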

You'll find you're much better off if you declare all the variables you'll be manipulating with these operators as unsigned. The C standard defines a right shift of an unsigned value as division by a power of two, so 0's are always shifted in from the left. If you do have to right-shift a signed value, explicitly mask off any bits you want to shift in from the left as 0's.

argc and argv

When a C program is executed, it is passed two parameters, called argc and argv. argc is an integer, specifying how many parameters the program was called with; argv is an array of strings containing the parameters.

The first parameter, argv[0], is the name of the application itself; it is included in the count held in argc.

It seems easiest to me to show how to use these by an example. So here is a little program that just prints out its parameters:


#include <stdio.h>
int main(int argc, char *argv[])
{
    int i;
	
    for (i = 0; i < argc; i++)
        printf("%2d: %s\n", i, argv[i]);

    return(0);
}

I ran it on my machine with the following results:


viper:11% ./test a b c d
 0: ./test
 1: a
 2: b
 3: c
 4: d

strtoul

A very valuable function for converting strings to unsigned integers is strtoul. You can get information about it by typing the command


man strtoul


Last modified: Mon Jan 23 11:05:17 MST 2006