The complete USB 2.0 specification is at usb_20.pdf. It's a 5 MB download and over 600 pages long, so you probably don't want to read it...
Darn -- there used to be an excellent description of USB at http://mes.loyola.edu/faculty/phs/usb[1234567].html, but recently I've been getting timeouts trying to get to it... anyway, the following description is distilled from it.
Before USB, the interfaces to PC peripherals were a hodge-podge, with little rhyme and less reason.
The dominant low-speed interface was the serial port. It was originally designed as a communication mechanism between teletypes and modems, and was very good for that purpose. Decades later, as a connection mechanism between computers and terminals, or computers and modems, it was overly complex, supported a lot of capabilities that absolutely nobody used, and was difficult to configure.
The most common (though hardly dominant) high-speed interface was the so-called parallel port, originally designed as a printer interface by Centronics. It was poorly designed, without proper handshaking (unlike the EIA-232C serial standard, which was well designed for its intended purpose but had outlived its usefulness). The Centronics connector was large and expensive, so IBM chose to use the same connector for the parallel port as for the serial port (though with the opposite gender). Hardware hackers found ways to use it as a bidirectional interface (literally using razor blades and soldering irons on the motherboard); bidirectional extensions were eventually added to the specification. Attempts were made to extend it for use as a general I/O bus, without much acceptance.
There were also some specialized interfaces, such as those for the keyboard and mouse.
The intent of the USB interface is to provide a single interface suitable for the vast majority of I/O devices, especially devices external to the PC such as printers, scanners, keyboards, and mice. The oft-stated goal of the USB Implementers Forum is that plugging a new device into a PC should be as easy as plugging a telephone into the wall, so the standard supports hot-plugging and device identification.
First, and most importantly, USB isn't a bus in spite of its name. It's a hierarchical star-topology network. This means that the network is tree-structured; the internal nodes are all called ``hubs'' and the leaves are the I/O devices. The total number of devices (including hubs and I/O devices) that can be connected with USB is 127: addresses are 7 bits, and address 0 is reserved as the default address of a device that has just been attached and hasn't been assigned one yet. So far as I've been able to find, there is no built-in limit to the branching factor of a hub (so you could have 126 I/O devices connected directly to it), but commercially available hubs typically support between 4 and 8 ports. The root hub in the PC is a special case that deserves mention; it typically has 2 ports on desktop or deskside systems and 1 on small notebooks. Each level in the device tree is called a ``tier;'' the maximum number of tiers is seven, counting the root hub's tier, which allows at most five hubs between the root hub and any device.
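Here's a minimal Python sketch of those two limits, to make the arithmetic concrete: it walks a device tree, counts every hub and device that needs an address, and flags any hub that sits more than five deep. `Node`, `check_topology`, and `chain` are made-up names for this illustration; the sketch assumes the root hub doesn't use up one of the 127 addresses, and it deliberately ignores port counts.

```python
from dataclasses import dataclass, field

MAX_ADDRESSES = 127     # 7-bit address space, minus address 0 (the default address)
MAX_HUBS_IN_CHAIN = 5   # non-root hubs allowed between the root hub and any device


@dataclass
class Node:
    """A hub or I/O device in a hypothetical USB device tree."""
    name: str
    is_hub: bool = False
    children: list = field(default_factory=list)


def check_topology(root: Node) -> list:
    """Return a list of problems; an empty list means the tree looks legal.

    `root` is the root hub, which lives in the host; this sketch assumes it
    is not counted against the address limit.
    """
    problems = []
    count = 0

    def walk(node: Node, hubs_above: int) -> None:
        nonlocal count
        for child in node.children:
            count += 1  # every hub and device below the root needs an address
            if child.is_hub:
                depth = hubs_above + 1
                if depth > MAX_HUBS_IN_CHAIN:
                    problems.append(f"{child.name}: hub number {depth} in a chain")
                walk(child, depth)
            elif child.children:
                problems.append(f"{child.name}: only hubs can have children")

    walk(root, 0)
    if count > MAX_ADDRESSES:
        problems.append(f"{count} nodes need addresses, but only {MAX_ADDRESSES} exist")
    return problems


def chain(hubs: int) -> Node:
    """Build: root hub -> `hubs` hubs in a row -> one device."""
    root = Node("root", is_hub=True)
    current = root
    for i in range(hubs):
        hub = Node(f"hub{i + 1}", is_hub=True)
        current.children.append(hub)
        current = hub
    current.children.append(Node("device"))
    return root


print(check_topology(chain(5)))  # []
print(check_topology(chain(6)))  # ['hub6: hub number 6 in a chain']
```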
A device will normally have several ``endpoints,'' suitable for different types of transfers. What an endpoint means is really up to the device and its driver; every device must have at least one bidirectional control endpoint (endpoint 0), which is used for the device setup information. It will normally also have input and output endpoints for data. When a packet arrives at the device, its destination endpoint number is part of its address; the device firmware uses this internally to determine how to handle the packet.
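As a rough illustration of that last point, here's a toy Python model of firmware dispatching packets by endpoint number. Everything in it (`ToyDevice`, the handlers, the way control data is stored) is hypothetical; real firmware works with hardware endpoint buffers rather than Python callbacks, and the sketch ignores the fact that an IN endpoint and an OUT endpoint can share the same number.

```python
from typing import Callable, Dict, Optional

# A handler takes a packet's payload and returns reply data (or None).
Handler = Callable[[bytes], Optional[bytes]]


class ToyDevice:
    """Hypothetical firmware model: route packets by their endpoint number."""

    def __init__(self) -> None:
        self.setup_data: Optional[bytes] = None
        # Endpoint 0 is the control endpoint every device must have.
        self.endpoints: Dict[int, Handler] = {0: self.handle_control}

    def add_endpoint(self, number: int, handler: Handler) -> None:
        # Endpoint numbers are 4 bits wide; 0 is already taken by control.
        if not 1 <= number <= 15:
            raise ValueError(f"invalid data endpoint number {number}")
        self.endpoints[number] = handler

    def handle_control(self, payload: bytes) -> Optional[bytes]:
        # Placeholder: just remember the setup information.
        self.setup_data = payload
        return b""

    def receive(self, endpoint: int, payload: bytes) -> Optional[bytes]:
        handler = self.endpoints.get(endpoint)
        if handler is None:
            return None  # no such endpoint; this sketch just ignores the packet
        return handler(payload)


dev = ToyDevice()
dev.add_endpoint(1, lambda data: data.upper())  # a made-up "echo in caps" endpoint
print(dev.receive(1, b"hello"))   # b'HELLO'
print(dev.receive(0, b"config"))  # b''
```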
The USB cable has four conductors: power, ground, signal+, and signal- (called VBUS, GND, D+, and D- in the specification). Hot-plugging is supported by making the power and ground contacts in the connector physically longer than the signal contacts; they make contact first, and the device is required to be in a stable state before the signal wires connect. A root hub or self-powered hub must be able to supply 500 mA at 5 volts on each downstream port to power devices; a device that needs more power than that has to have its own power supply. There's a lot more detail on just how much power can be supplied by different types of hubs, and required by devices, under various conditions.
USB connectors are designed with polarized ends (Type A and Type B connectors), so that it is impossible to plug two upstream ends or two downstream ends together. The intent is that hubs are equipped with female Type A connectors on their downstream ports; a device can have either a male Type A connector or a female Type B connector on its upstream end.
Unfortunately, the market has produced both devices that expect a cable to connect them to the hub (so they have a Type B connector) and devices that expect to plug into the hub directly (so they have a male Type A connector). This has resulted in two types of cables: some use the intended Type A/Type B ends, and some have a male Type A on one end and a female Type A on the other. The latter are USB extension cords; they actually violate the standard, and the ones I've seen don't carry the trademarked USB logo.
Data is sent on the bus differentially, by driving the signal+ and signal- lines to opposite voltages. Reversing the voltages on them is a 0; leaving them unchanged is a 1 (this is called NRZI, or ``Non Return to Zero Inverted,'' encoding). Because a long string of 1's produces no transitions at all, the receiver could lose synchronization, so a technique called bit-stuffing is used: after six 1's in a row, the transmitter inserts a 0 into the bit stream, and the receiver removes it. (Note that this is different from an interface like a serial port, which requires the lines to go to a known state at the beginning and end of every character.)
Here's an example. The data string is 11001111111011; since it contains seven 1's in a row, the transmitter inserts a 0 after the sixth 1, so 110011111101011 goes out on the wire, and the receiver strips the extra 0 off again.
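The transmitter's and receiver's halves of this scheme are short enough to sketch in Python. Bits are represented as strings of '0' and '1' characters and line states as 0 or 1; which physical state counts as 1, and the assumption that the line starts in state 1, are choices made for this sketch. Stuffing is applied before NRZI encoding, so a stuffed 0 always produces a transition on the wire.

```python
def bit_stuff(bits: str) -> str:
    """Transmitter side: insert a 0 after every run of six 1's."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 6:
            out.append("0")  # stuffed bit forces a transition on the wire
            run = 0
    return "".join(out)


def bit_unstuff(bits: str) -> str:
    """Receiver side: drop the 0 that follows every run of six 1's."""
    out, run, expect_stuffed = [], 0, False
    for b in bits:
        if expect_stuffed:
            if b != "0":
                raise ValueError("bit-stuffing violation: seven 1's in a row")
            expect_stuffed, run = False, 0
            continue
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 6:
            expect_stuffed = True
    return "".join(out)


def nrzi_encode(bits: str, level: int = 1) -> list:
    """A 0 flips the line state, a 1 leaves it alone; returns one state per bit."""
    states = []
    for b in bits:
        if b == "0":
            level ^= 1
        states.append(level)
    return states


def nrzi_decode(states: list, level: int = 1) -> str:
    """Inverse of nrzi_encode: no change means 1, a change means 0."""
    bits = []
    for s in states:
        bits.append("1" if s == level else "0")
        level = s
    return "".join(bits)


data = "11001111111011"
wire_bits = bit_stuff(data)
print(wire_bits)  # 110011111101011
states = nrzi_encode(wire_bits)
assert bit_unstuff(nrzi_decode(states)) == data
```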
USB packets come in several types, each with its own format (I should mention that the example bit stream above is not one of the packets described here):
Start of Frame (SOF) packets provide a heartbeat for the bus: the host issues one every millisecond. They contain a packet ID (0101), an 11-bit frame number, and a 5-bit CRC error check.
Setup, In, and Out packets (the ``token'' packets) are all used to set up data transfers between the host and a device; in addition to the packet ID, they all have a 7-bit address, a 4-bit endpoint number (a device can have several ``endpoints;'' this would be something like the different buttons on a game controller), and a 5-bit CRC error check.
Data packets contain a packet ID, up to 1023 bytes of data (1024 for high-speed devices), and a 16-bit CRC error check. There are actually two types of data packet, called DATA0 and DATA1. If you need to transmit more than 1023 bytes, you do it by sending alternating DATA0 and DATA1 packets; this is so a loss of synchronization can be detected (if you get two DATA0 packets in a row from somebody, you know something is wrong).
Finally, there are handshake packets: ACK (the data arrived correctly), NAK (the device can't send or accept data right now), and STALL (the endpoint is halted or the request isn't supported).
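The packet ID byte and the 5-bit CRC on token and SOF packets can be computed in a few lines of Python. This is a sketch: the PID values are the standard USB 2.0 ones, and the CRC uses the generator polynomial x^5 + x^2 + 1 with the register preset to all ones, the input taken least-significant bit first, and the result inverted. It leaves out how the finished CRC bits are ordered inside the packet, so check the specification before relying on it for real hardware. `pid_byte`, `crc5`, `token_crc`, and `sof_crc` are illustrative names.

```python
# 4-bit packet IDs (standard USB 2.0 values; PING, SPLIT, etc. omitted).
PIDS = {
    "OUT": 0b0001, "IN": 0b1001, "SOF": 0b0101, "SETUP": 0b1101,
    "DATA0": 0b0011, "DATA1": 0b1011,
    "ACK": 0b0010, "NAK": 0b1010, "STALL": 0b1110,
}


def pid_byte(name: str) -> int:
    """PID byte: 4-bit PID in the low nibble, its complement in the high nibble."""
    pid = PIDS[name]
    return pid | ((~pid & 0xF) << 4)


def crc5(value: int, nbits: int) -> int:
    """CRC5 over the low `nbits` of `value`, least-significant bit first.

    Generator x^5 + x^2 + 1, register preset to all ones, remainder inverted.
    """
    crc = 0b11111
    for i in range(nbits):
        feedback = ((crc >> 4) & 1) ^ ((value >> i) & 1)
        crc = (crc << 1) & 0b11111
        if feedback:
            crc ^= 0b00101  # the x^2 and 1 terms of the polynomial
    return crc ^ 0b11111


def token_crc(addr: int, endp: int) -> int:
    """CRC5 for a SETUP/IN/OUT token: 7-bit address first, then 4-bit endpoint."""
    if not (0 <= addr < 128 and 0 <= endp < 16):
        raise ValueError("address is 7 bits, endpoint is 4 bits")
    return crc5(addr | (endp << 7), 11)


def sof_crc(frame: int) -> int:
    """CRC5 for a Start of Frame packet's 11-bit frame number."""
    return crc5(frame & 0x7FF, 11)


print(hex(pid_byte("SOF")))       # 0xa5
print(hex(token_crc(0x15, 0xE)))  # 0x17
```

The check nibble in the PID byte lets a receiver reject a corrupted PID without needing a CRC over it.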
Transactions take the form of several packets sent back and forth between the host and a device. In general, the host sends a packet to the device requesting that the transfer take place, and the device responds with the requested data. There are four basic types of transfer (control, bulk, interrupt, and isochronous); a basic exchange of data works like this:
The host sends an IN packet to the device endpoint; the device responds with a DATA packet; the host sends an ACK. On a control endpoint, the maximum amount of data that can be sent in one transaction is 8 bytes for a low-speed device or 64 bytes for a full-speed or high-speed device.
Alternatively, the host may send an OUT packet followed by a DATA packet (subject to the same size limit); the device responds with an ACK.
If necessary, a transfer larger than 64 bytes can be performed: a packet that uses the full 64 bytes signals that more data follows, and additional transactions are used until all the data is across. A shorter packet ends the transfer; if the data is an exact multiple of 64 bytes, an empty (zero-length) packet is sent to mark the end.
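The packet-splitting rule, together with the DATA0/DATA1 alternation described earlier, can be sketched in Python. `split_transfer` and `reassemble` are hypothetical helpers. Starting the toggle at DATA0 is an assumption (the actual starting value depends on the endpoint's history), and a real receiver still ACKs a packet with a repeated toggle before discarding it, since the repeat usually means the sender never saw the earlier ACK.

```python
def split_transfer(data: bytes, max_packet: int = 64, toggle: int = 0) -> list:
    """Sender side: cut `data` into (PID, payload) pairs.

    Packets alternate DATA0/DATA1, and the transfer always ends with a short
    packet, which is zero-length if the data is an exact multiple of max_packet.
    """
    packets = []
    for offset in range(0, len(data), max_packet):
        packets.append((f"DATA{toggle}", data[offset:offset + max_packet]))
        toggle ^= 1
    if len(data) % max_packet == 0:
        packets.append((f"DATA{toggle}", b""))  # zero-length end-of-transfer marker
    return packets


def reassemble(packets: list, max_packet: int = 64, toggle: int = 0) -> bytes:
    """Receiver side: rebuild the data, skipping packets with the wrong toggle."""
    out = bytearray()
    for pid, payload in packets:
        if pid != f"DATA{toggle}":
            continue  # repeated toggle: treat it as a duplicate and discard it
        out += payload
        toggle ^= 1
        if len(payload) < max_packet:
            return bytes(out)  # a short packet ends the transfer
    raise ValueError("transfer never ended with a short packet")


packets = split_transfer(bytes(128))
print([(pid, len(p)) for pid, p in packets])
# [('DATA0', 64), ('DATA1', 64), ('DATA0', 0)]
```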
One problem with USB is that it is a strictly host-peripheral protocol. There is no provision for connecting devices directly to each other, yet there is demand for things like plugging a camera directly into a printer.
USB On-The-Go is an extension to USB to provide this capability. USB On-The-Go allows a pair of devices to dynamically negotiate who is the host.