We have designed and implemented the Google File System
(GFS) to meet the rapidly growing demands of Google’s
data processing needs. GFS shares many of the same goals
as previous distributed file systems such as performance,
scalability, reliability, and availability. However, its design
has been driven by key observations of our application workloads
and technological environment, both current and anticipated,
that reflect a marked departure from some earlier
file system design assumptions. We have reexamined traditional
choices and explored radically different points in the
design space.
First, component failures are the norm rather than the
exception. The file system consists of hundreds or even
thousands of storage machines built from inexpensive commodity
parts and is accessed by a comparable number of
client machines. The quantity and quality of the components
virtually guarantee that some are not functional at
any given time and some will not recover from their current
failures. We have seen problems caused by application
bugs, operating system bugs, human errors, and the failures
of disks, memory, connectors, networking, and power supplies.
Therefore, constant monitoring, error detection, fault
tolerance, and automatic recovery must be integral to the
system.
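As a minimal sketch of what such automatic recovery could look like, consider a master that tracks heartbeats from each chunkserver and queues re-replication when one falls silent. The type names, timeout, and server IDs below are illustrative assumptions for this post, not GFS's actual mechanism:

    package main

    import (
        "fmt"
        "time"
    )

    // heartbeatMonitor is a hypothetical sketch: the master records the last
    // heartbeat time for each chunkserver and treats a server as failed after
    // a timeout, so the chunks it held can be re-replicated elsewhere.
    type heartbeatMonitor struct {
        lastSeen map[string]time.Time // chunkserver ID -> last heartbeat
        timeout  time.Duration
    }

    func (m *heartbeatMonitor) recordHeartbeat(server string) {
        m.lastSeen[server] = time.Now()
    }

    // sweep returns the servers considered failed; a real system would then
    // re-replicate their chunks from the surviving replicas.
    func (m *heartbeatMonitor) sweep() []string {
        var dead []string
        now := time.Now()
        for server, t := range m.lastSeen {
            if now.Sub(t) > m.timeout {
                dead = append(dead, server)
            }
        }
        return dead
    }

    func main() {
        m := &heartbeatMonitor{lastSeen: map[string]time.Time{}, timeout: 50 * time.Millisecond}
        m.recordHeartbeat("chunkserver-1")
        m.recordHeartbeat("chunkserver-2")
        time.Sleep(100 * time.Millisecond)
        m.recordHeartbeat("chunkserver-2") // chunkserver-1 has gone silent
        fmt.Println("re-replicate chunks from:", m.sweep())
    }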
Second, files are huge by traditional standards. Multi-GB
files are common. Each file typically contains many application
objects such as web documents. When we are regularly
working with fast growing data sets of many TBs comprising
billions of objects, it is unwieldy to manage billions of approximately
KB-sized files even when the file system could
support it. As a result, design assumptions and parameters
such as I/O operation and block sizes have to be revisited.
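To make the scale argument concrete, here is a back-of-envelope comparison; the 10 TB dataset size and the 1 KB and 1 GB file sizes are illustrative assumptions, not figures from the paper:

    package main

    import "fmt"

    func main() {
        const dataset = 10e12 // an illustrative 10 TB dataset

        // Stored as ~1 KB files, the namespace holds billions of entries.
        smallFiles := dataset / 1e3
        // Stored as multi-GB files, it holds only thousands.
        largeFiles := dataset / 1e9

        fmt.Printf("1 KB files: %.0e entries\n", smallFiles) // ~1e10
        fmt.Printf("1 GB files: %.0e entries\n", largeFiles) // ~1e4
    }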
Third, most files are mutated by appending new data
rather than overwriting existing data. Random writes within
a file are practically non-existent. Once written, the files
are only read, and often only sequentially. A variety of
data share these characteristics. Some may constitute large
repositories that data analysis programs scan through. Some
may be data streams continuously generated by running applications.
Some may be archival data. Some may be intermediate
results produced on one machine and processed
on another, whether simultaneously or later in time. Given
this access pattern on huge files, appending becomes the focus
of performance optimization and atomicity guarantees,
while caching data blocks in the client loses its appeal.
Fourth, co-designing the applications and the file system
API benefits the overall system by increasing our flexibility.
For example, we have relaxed GFS’s consistency model to
vastly simplify the file system without imposing an onerous
burden on the applications. We have also introduced an
atomic append operation so that multiple clients can append
concurrently to a file without extra synchronization between
them. These will be discussed in more detail later in the
paper.
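As a rough sketch of how such an append-centric interface might be used, the toy RecordAppend below is a hypothetical stand-in for a GFS-style client call, not the actual client library: several clients append to one file concurrently without coordinating among themselves, and the file system rather than the callers chooses the offset at which each record lands.

    package main

    import (
        "fmt"
        "sync"
    )

    // appendOnlyFile stands in for a GFS-style file handle: record appends are
    // atomic, and the file system picks the offset for each record. In GFS the
    // primary replica serializes concurrent appends; the mutex here only models
    // that guarantee for the sketch.
    type appendOnlyFile struct {
        mu   sync.Mutex
        data []byte
    }

    // RecordAppend atomically appends a record and returns the offset chosen for it.
    func (f *appendOnlyFile) RecordAppend(record []byte) (offset int64) {
        f.mu.Lock()
        defer f.mu.Unlock()
        offset = int64(len(f.data))
        f.data = append(f.data, record...)
        return offset
    }

    func main() {
        f := &appendOnlyFile{}
        var wg sync.WaitGroup

        // Multiple clients append concurrently with no synchronization among themselves.
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                rec := []byte(fmt.Sprintf("record-from-client-%d;", id))
                off := f.RecordAppend(rec)
                fmt.Printf("client %d appended at offset %d\n", id, off)
            }(i)
        }
        wg.Wait()
        fmt.Printf("file length: %d bytes\n", len(f.data))
    }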
Multiple GFS clusters are currently deployed for different
purposes. The largest ones have over 1000 storage nodes,
over 300 TB of disk storage, and are heavily accessed by
hundreds of clients on distinct machines on a continuous
basis.