Beyond3D has posted the second part of their Hardware Geometry Processing article, focusing on the Programmable Vertex Unit. It's a fairly in-depth look at how many of the features work at the hardware level.
The issue of instruction length also points at a potential issue that developers need to take into account: don't change the Vertex Shader program unless you really have to, because uploading a new vertex program comes at the expense of using bandwidth. Toggling between two Vertex Shader programs is thus definitely not a good idea for high throughput. Future Vertex Shader implementations might move to a loading (caching) mechanism for instructions, but to make this possible the instruction size has to be kept as small as possible and the instruction re-use has to be substantial (e.g. processing of the same instruction on a large set of different vertices before moving to the next instruction).

More information @ Beyond3D
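
As a rough illustration of the "don't toggle vertex programs" advice quoted above, here is a minimal sketch in Python. It is not taken from the Beyond3D article: the draw list, the shader names, and both helper functions are hypothetical, and it simply assumes that every change of the bound vertex program costs one upload.

```python
def count_shader_switches(draws):
    """Count how often the bound vertex program changes over a draw list.

    Each draw is a (shader_name, mesh_name) pair. The first bind counts
    as a switch too, since the program has to be uploaded at least once.
    """
    switches = 0
    current = None
    for shader, _mesh in draws:
        if shader != current:
            switches += 1
            current = shader
    return switches


def batch_by_shader(draws):
    """Group draws that share a vertex program next to each other.

    sorted() is stable, so draws keep their original relative order
    within each shader group.
    """
    return sorted(draws, key=lambda draw: draw[0])


# Hypothetical frame that alternates between two vertex programs.
draws = [
    ("skin", "hero"),
    ("static", "wall"),
    ("skin", "enemy"),
    ("static", "floor"),
]

unsorted_switches = count_shader_switches(draws)
batched_switches = count_shader_switches(batch_by_shader(draws))
print(unsorted_switches, batched_switches)
```

In this toy frame, grouping the draws by vertex program reduces the number of program uploads from one per draw to one per program. A real renderer would also have to weigh other state changes and draw-order constraints, which this sketch ignores.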