venti – archival storage server
Venti is a block storage server intended for archival data. In
a Venti server, the SHA1 hash of a block's contents acts as the
block identifier for read and write operations. This approach
enforces a write-once policy, preventing accidental or malicious
destruction of data. In addition, duplicate copies of a block
are coalesced, reducing the consumption of storage and simplifying
the implementation of clients.
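To make the content-addressing idea concrete, here is a minimal C sketch of a toy in-memory block store that uses only the C standard library. The names sha1, store_write and store_read, and the limits MaxBlocks and MaxBlockSize, are hypothetical and are not part of the venti(2) library. A block's score is its SHA1 hash. Writing identical contents twice therefore yields the same score, and the second write stores nothing new. Because the score is derived from the contents, a stored block can never be replaced with different data under the same score.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ScoreSize = 20, MaxBlocks = 16, MaxBlockSize = 256 };

static uint32_t rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Process one 64-byte SHA1 block into the running state h. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16
		     | (uint32_t)p[4*i+2] << 8 | p[4*i+3];
	for (; i < 80; i++)
		w[i] = rol(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
	a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
		else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
		t = rol(a, 5) + f + e + k + w[i];
		e = d; d = c; c = rol(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Compute the score (SHA1 hash) of n bytes of data. */
void sha1(const unsigned char *data, size_t n, unsigned char score[ScoreSize])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	unsigned char last[64];
	uint64_t bits = (uint64_t)n * 8;
	size_t i, rem;

	for (i = 0; i + 64 <= n; i += 64)
		sha1_block(h, data + i);
	rem = n - i;
	memset(last, 0, sizeof last);
	if (rem > 0)
		memcpy(last, data + i, rem);
	last[rem] = 0x80;
	if (rem >= 56) {	/* no room for the length: pad into one more block */
		sha1_block(h, last);
		memset(last, 0, sizeof last);
	}
	for (i = 0; i < 8; i++)
		last[63 - i] = (unsigned char)(bits >> (8 * i));
	sha1_block(h, last);
	for (i = 0; i < ScoreSize; i++)
		score[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Toy store: blocks are found only by their score. */
struct store {
	int n;
	unsigned char score[MaxBlocks][ScoreSize];
	unsigned char data[MaxBlocks][MaxBlockSize];
	size_t len[MaxBlocks];
};

/* Store a block and return its score; duplicates are coalesced. 0 on success, -1 if full or too big. */
int store_write(struct store *s, const unsigned char *data, size_t n, unsigned char score[ScoreSize])
{
	int i;

	assert(s != NULL);
	if (n > MaxBlockSize)
		return -1;
	sha1(data, n, score);
	for (i = 0; i < s->n; i++)
		if (memcmp(s->score[i], score, ScoreSize) == 0)
			return 0;	/* already present: nothing new is stored */
	if (s->n == MaxBlocks)
		return -1;
	memcpy(s->score[s->n], score, ScoreSize);
	if (n > 0)
		memcpy(s->data[s->n], data, n);
	s->len[s->n++] = n;
	return 0;
}

/* Copy the block with the given score into buf; return its length, or -1. */
long store_read(const struct store *s, const unsigned char score[ScoreSize], unsigned char *buf, size_t max)
{
	int i;

	for (i = 0; i < s->n; i++)
		if (memcmp(s->score[i], score, ScoreSize) == 0) {
			if (s->len[i] > max)
				return -1;
			memcpy(buf, s->data[i], s->len[i]);
			return (long)s->len[i];
		}
	return -1;
}
```

Writing "abc" twice with this sketch leaves a single stored block. Its score is a9993e364706816aba3e25717850c26c9cd0d89d, the SHA1 hash of "abc". The linear scan is only for brevity; a real server would index scores.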
This manual page documents the basic concepts of block storage using Venti as well as the Venti network protocol.
Venti(2) describes a C library interface for accessing Venti servers and manipulating Venti data structures.
Venti(8) describes the programs used to run a Venti server.
The SHA1 hash that identifies a block is called its score. Scores
may carry an optional prefix of the form label:, typically used to
describe the format of the data. For example, vac(1) uses the prefix vac:.
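Below is a minimal sketch of splitting such a string into its label and the 20-byte score. It assumes the score is written as 40 hexadecimal digits after the last colon. parse_score and hexval are hypothetical helpers for illustration only; consult venti(2) for the library's own score-handling routines.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { ScoreSize = 20 };

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
	c = tolower((unsigned char)c);
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Hypothetical: parse "label:<40 hex digits>" or just "<40 hex digits>".
 * The label (possibly empty) goes to label; returns 0, or -1 if malformed.
 */
int parse_score(const char *s, char *label, size_t labelsz, unsigned char score[ScoreSize])
{
	const char *colon = strrchr(s, ':');
	const char *hex = s;
	size_t i;

	assert(labelsz > 0);
	label[0] = '\0';
	if (colon != NULL) {
		size_t n = (size_t)(colon - s);
		if (n + 1 > labelsz)
			return -1;
		memcpy(label, s, n);
		label[n] = '\0';
		hex = colon + 1;
	}
	if (strlen(hex) != 2 * ScoreSize)
		return -1;
	for (i = 0; i < ScoreSize; i++) {
		int hi = hexval(hex[2*i]), lo = hexval(hex[2*i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		score[i] = (unsigned char)(hi << 4 | lo);
	}
	return 0;
}
```

The label is returned separately because it describes the data's format. The score itself is always just the 20 hash bytes.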
Files and Directories
Scores passed between programs conventionally refer to VtRoot blocks, which contain descriptive information as well as the score of a directory block containing a small number of directory entries.
Conventionally, programs do not mix data and directory entries
in the same file. Instead, they keep two separate files, one with
directory entries and one with metadata referencing those entries
by position. Keeping this parallel representation is a minor annoyance
but makes it possible for general programs like
venti/copy (see venti(1)) to traverse the block tree without knowing
the specific details of any particular program's data.
Data blocks are stored with trailing zero bytes removed. When truncating pointer blocks (VtDataType+n and VtDirType+n blocks), trailing zero scores are removed instead of trailing zero bytes.
Because of the truncation convention, any file consisting entirely
of zero bytes, no matter what its length, will be represented
by the zero score, the score of the empty block. The data blocks
contain all zeros and are thus truncated to the empty block; the
pointer blocks then contain only zero scores and are also truncated
to the empty block, and so on up the hash tree.
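The two truncation rules can be sketched in C as follows. truncate_data and truncate_pointers are hypothetical names. The sketch assumes 20-byte scores and a pointer-block length that is a whole number of scores. The key detail is that the zero score is the SHA1 hash of the empty block (da39a3ee5e6b4b0d3255bfef95601890afd80709), not twenty zero bytes. A pointer block must therefore be truncated by comparing whole entries against that score, not by stripping zero bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ScoreSize = 20 };

/* The zero score: SHA1 of the zero-length block. */
static const unsigned char zeroscore[ScoreSize] = {
	0xda, 0x39, 0xa3, 0xee, 0x5e, 0x6b, 0x4b, 0x0d, 0x32, 0x55,
	0xbf, 0xef, 0x95, 0x60, 0x18, 0x90, 0xaf, 0xd8, 0x07, 0x09
};

/* Data block: drop trailing zero bytes; returns the truncated length. */
size_t truncate_data(const unsigned char *buf, size_t n)
{
	while (n > 0 && buf[n - 1] == 0)
		n--;
	return n;
}

/* Pointer block: drop trailing entries equal to the zero score. */
size_t truncate_pointers(const unsigned char *buf, size_t n)
{
	assert(n % ScoreSize == 0);	/* assumed: whole scores only */
	while (n > 0 && memcmp(buf + n - ScoreSize, zeroscore, ScoreSize) == 0)
		n -= ScoreSize;
	return n;
}
```

An all-zero data block truncates to length 0, and its score is the zero score. A pointer block filled with such scores then truncates to length 0 as well. A pointer block of raw zero bytes, by contrast, is left at full length by truncate_pointers, because those entries are not the zero score.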
After the initial version exchange, the client transmits requests (T-messages) to the server, which subsequently returns replies (R-messages) to the client. The combined act of transmitting (receiving) a request of a particular type, and receiving (transmitting) its reply is called a transaction of that type.
Each message consists of a sequence of bytes. Two-byte fields hold unsigned integers represented in big-endian order (most significant byte first). Data items of variable length are represented by a one-byte field specifying a count, n, followed by n bytes of data. Text strings are represented similarly, using a two-byte count with the text itself stored as a UTF-encoded sequence of Unicode characters (see utf(6)). Text strings are not NUL-terminated: n counts the bytes of UTF data, which include no final zero byte. The NUL character is illegal in text strings in the Venti protocol. The maximum string length in Venti is 1024 bytes.
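A minimal C sketch of encoders for these three field kinds follows. The packer struct and the put1, put2, putdata and putstring functions are hypothetical and are not the venti(2) API. The sketch rejects data longer than 255 bytes and strings longer than 1024 bytes. Because it takes C strings, embedded NUL characters are excluded automatically, but it does not validate UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MaxString = 1024 };

/* Hypothetical output cursor; err becomes 1 on any overflow or bad field. */
struct packer {
	unsigned char *p;
	size_t len, cap;
	int err;
};

static void put(struct packer *pk, const void *src, size_t n)
{
	if (pk->err || pk->len + n > pk->cap) {
		pk->err = 1;
		return;
	}
	if (n > 0)
		memcpy(pk->p + pk->len, src, n);
	pk->len += n;
}

void put1(struct packer *pk, unsigned v)
{
	unsigned char b = (unsigned char)v;
	put(pk, &b, 1);
}

/* Two-byte unsigned integer, most significant byte first. */
void put2(struct packer *pk, unsigned v)
{
	unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)(v & 0xff) };
	put(pk, b, 2);
}

/* Variable-length data: one-byte count n, then n bytes. */
void putdata(struct packer *pk, const unsigned char *d, size_t n)
{
	if (n > 255) {
		pk->err = 1;
		return;
	}
	put1(pk, (unsigned)n);
	put(pk, d, n);
}

/* Text string: two-byte count, then the bytes, with no trailing NUL. */
void putstring(struct packer *pk, const char *s)
{
	size_t n = strlen(s);	/* strlen stops at NUL, so none is embedded */

	if (n > MaxString) {
		pk->err = 1;
		return;
	}
	put2(pk, (unsigned)n);
	put(pk, s, n);
}
```

Under this sketch, the string "hi" encodes as the four bytes 00 02 68 69. Only the two-byte count precedes it, and no terminator follows.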
Each Venti message begins with a two-byte size field specifying the length in bytes of the message, not including the length field itself. The next byte is the message type, one of the constants in the enumeration in the include file <venti.h>. The next byte is an identifying tag, used to match responses to requests. The remaining bytes are parameters of different sizes. In the message descriptions, the number of bytes in a field is given in brackets after the field name. The notation parameter[n] where n is not a constant represents a variable-length parameter: n[1] followed by n bytes of data forming the parameter. The notation string[s] (using a literal s character) is shorthand for s[2] followed by s bytes of UTF-8 text. A parameter written without a bracketed size as the last field in the message represents a variable-length field that comprises all remaining bytes in the message.
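The header layout can be decoded with a few lines of C. vthdr and vthdr_parse are hypothetical names. The sketch treats everything after the tag as raw parameter bytes, which also covers a final field that runs to the end of the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded view of one message. */
struct vthdr {
	unsigned size;			/* bytes following the size field */
	unsigned type;
	unsigned tag;
	const unsigned char *params;	/* the remaining size - 2 bytes */
	size_t nparams;
};

/*
 * Decode the message at the start of buf (n bytes available).
 * Returns the bytes it occupies (2 + size), 0 if more input is
 * needed, or -1 if the size cannot even hold type and tag.
 */
long vthdr_parse(const unsigned char *buf, size_t n, struct vthdr *h)
{
	assert(buf != NULL && h != NULL);
	if (n < 2)
		return 0;
	h->size = (unsigned)buf[0] << 8 | buf[1];	/* big-endian */
	if (h->size < 2)
		return -1;
	if (n < 2 + (size_t)h->size)
		return 0;
	h->type = buf[2];
	h->tag = buf[3];
	h->params = buf + 4;
	h->nparams = h->size - 2;
	return 2 + (long)h->size;
}
```

The size field does not count itself, so a message occupies 2 + size bytes on the wire. Any bytes beyond that point belong to the next message.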
All Venti RPC messages are prefixed with a field size giving
the length of the message that follows (not including the size
field itself). The message bodies are:
The type of an R-message will either be one greater than the type of the corresponding T-message or Rerror, indicating that the request failed. In the latter case, the error field contains a string describing the reason for failure.
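This rule, combined with the tag matching described earlier, reduces to a small check. classify_reply and its result codes are hypothetical. The value of VtRerror used here is an assumption; take the real constant from <venti.h>.

```c
#include <assert.h>

/* Assumed value; use the definition in <venti.h>. */
enum { VtRerror = 1 };

enum { ReplyOK, ReplyError, ReplyNotOurs, ReplyBadType };

/* Classify a reply against the request (type and tag) that was sent. */
int classify_reply(unsigned ttype, unsigned ttag, unsigned rtype, unsigned rtag)
{
	if (rtag != ttag)
		return ReplyNotOurs;	/* answers some other outstanding request */
	if (rtype == VtRerror)
		return ReplyError;	/* the error string explains the failure */
	if (rtype == ttype + 1)
		return ReplyOK;
	return ReplyBadType;	/* protocol violation */
}
```

Checking the tag first lets a client with several outstanding requests route each reply before it interprets the type.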
Venti connections must begin with a hello transaction. The VtThello message contains the protocol version that the client has chosen to use. The fields strength, crypto, and codec could be used to add authentication, encryption, and compression to the Venti session but are currently ignored. The rcrypto and rcodec fields in the VtRhello response are similarly ignored. The uid and sid fields are intended to identify the client and server but, given the lack of authentication, should be treated only as advisory. The initial hello should be the only hello transaction during the session.
The ping message has no effect and is used mainly for debugging. Servers should respond immediately to pings.
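Assuming a ping carries no parameters beyond the tag, it is the smallest possible message: a size of 2 (type and tag) followed by those two bytes. The sketch below also assumes the VtTping and VtRping values shown, which should be checked against <venti.h>. pack_empty, pack_ping and answer_ping are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; verify against <venti.h>. */
enum { VtTping = 2, VtRping = 3 };

/* Build a parameterless message: size[2] type[1] tag[1]. Returns length or 0. */
static size_t pack_empty(unsigned char *buf, size_t cap, unsigned type, unsigned tag)
{
	const unsigned size = 2;	/* type + tag; the size field is not counted */

	if (cap < 2 + size)
		return 0;
	buf[0] = (unsigned char)(size >> 8);
	buf[1] = (unsigned char)(size & 0xff);
	buf[2] = (unsigned char)type;
	buf[3] = (unsigned char)tag;
	return 2 + size;
}

size_t pack_ping(unsigned char *buf, size_t cap, unsigned tag)
{
	return pack_empty(buf, cap, VtTping, tag);
}

/* Server side: reply at once with VtRping and the request's tag. */
size_t answer_ping(const unsigned char *req, size_t n, unsigned char *out, size_t cap)
{
	if (n < 4 || req[0] != 0 || req[1] != 2 || req[2] != VtTping)
		return 0;
	return pack_empty(out, cap, VtRping, req[3]);
}
```

Under these assumptions, a ping with tag 9 is the four bytes 00 02 02 09.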
The read message requests a block with the given score and type. Use vttodisktype (see venti(2)) to convert a block type enumeration value (VtDataType, etc.) to the type used on disk and in the protocol, and vtfromdisktype to convert back. The count field specifies the maximum expected size of the block. The data in the reply is the block's contents.
The write message writes a new block of the given type with contents data to the server. The response includes the score to use to read the block, which should be the SHA1 hash of data.
The Venti server may buffer written blocks in memory, waiting until after responding to the write message before writing them to permanent storage. The server will delay the response to a sync message until after all blocks in earlier write messages have been written to permanent storage.
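Below is a toy model of this buffering contract. It uses hypothetical names (struct server, flush_all, handle_write, handle_sync), and a counter stands in for permanent storage. Writes are acknowledged as soon as they are buffered, and a sync returns only after every earlier write has been flushed. When a real server flushes on its own is not specified here; this sketch flushes only when its buffer fills or on sync.

```c
#include <assert.h>

enum { MaxPending = 64 };

struct server {
	int pending[MaxPending];	/* writes acknowledged but not yet durable */
	int npending;
	int ndurable;			/* writes that have reached permanent storage */
};

/* Stand-in for writing every buffered block to disk. */
static void flush_all(struct server *s)
{
	s->ndurable += s->npending;
	s->npending = 0;
}

/* Buffer the block and acknowledge at once; flush first if the buffer is full. */
void handle_write(struct server *s, int block)
{
	if (s->npending == MaxPending)
		flush_all(s);
	s->pending[s->npending++] = block;
	/* the write reply may be sent now, before the block is durable */
}

/* Delay the sync reply until all earlier writes are durable. */
void handle_sync(struct server *s)
{
	flush_all(s);
	assert(s->npending == 0);
	/* only now is the sync reply sent */
}
```

A client that needs durability therefore follows its writes with a sync and waits for that reply.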
The goodbye message ends a session. There is no VtRgoodbye: upon
receiving the VtTgoodbye message, the server terminates the connection.
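The hello and goodbye rules amount to a small per-connection state machine. The states, message kinds and actions below are hypothetical names for illustration, not constants from <venti.h>. Answering a request that arrives before hello with an error reply is this sketch's own choice; the text only requires that connections begin with a hello.

```c
#include <assert.h>

enum state { WantHello, Active, Closed };	/* per-connection state */
enum kind { Hello, Goodbye, Other };		/* coarse message kinds */
enum action { Reply, ReplyError, Hangup };	/* what the server does next */

enum action session_step(enum state *st, enum kind k)
{
	switch (*st) {
	case WantHello:
		if (k == Hello) {
			*st = Active;
			return Reply;
		}
		if (k == Goodbye) {
			*st = Closed;
			return Hangup;
		}
		return ReplyError;	/* the session must begin with hello */
	case Active:
		if (k == Hello)
			return ReplyError;	/* only one hello per session */
		if (k == Goodbye) {
			*st = Closed;
			return Hangup;	/* no VtRgoodbye is sent */
		}
		return Reply;
	case Closed:
	default:
		return Hangup;
	}
}
```

Goodbye leads straight to Hangup rather than Reply, reflecting the absence of a VtRgoodbye message.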
venti(1), venti(2), venti(8)
Sean Quinlan and Sean Dorward, "Venti: a new approach to archival storage", Usenix Conference on File and Storage Technologies, 2002.