DOI: 10.1147/rd.524.0329
Glamor: An architecture for file system federation
by U. Lanjewar, M. Naik, and R. Tewari
Most enterprises store their data in file systems distributed across independent file servers. With network bandwidths no longer being a limiting factor in information exchange, enterprises are moving toward geographically distributed operations, as well as multisite collaboration in which joint product development is becoming increasingly common. This requires data sharing in a uniform, secure, and consistent manner across the enterprise with reasonably good performance.
Currently, data and file sharing is mostly ad hoc. Often, users rely on manual data access through mechanisms such as File Transfer Protocol (FTP) or a distributed file system protocol such as Network File System (NFS) [1] or Common Internet File System (CIFS) [2]. Data and file sharing within an office location could be facilitated by a cluster file system such as the IBM General Parallel File System* (GPFS*) [3]. However, cluster file systems are designed for high performance and strong consistency and can be expensive or difficult to deploy. Moreover, cluster file systems do not support managing files across a collection of independent, heterogeneous file systems and file servers. Recently, data grids using storage resource broker middleware have enabled some of the features of Glamor, our framework for building a federated file system. Some large enterprises use globally distributed file systems such as the Andrew File System (AFS*) [4, 5] or its successor, the DCE Distributed File System (DCE/DFS*) [6]. (DCE/DFS is the remote file access protocol used with the distributed computing environment.) Whereas AFS and DCE/DFS were designed for global distribution and shared access, they have not been widely adopted for simple enterprise-wide file sharing, partly because of the complexity and cost of setting up and managing proprietary systems.
Consequently, the solution is not to build yet another globally distributed file system but to federate a set of multivendor, independent file servers so that they appear as one. Because the transition of data to new systems is expensive, it is desirable that the data remain where it currently resides, in existing file systems or on a variety of single file servers and network-attached storage (NAS) appliances. The file system clients should be able to obtain a unified view of all the data and navigate it like a local file system without the need for any special client-side software.
Although performance is not the primary goal of a federation, data that is frequently accessed and primarily read-only can be replicated closer to the clients for faster access. Other features such as consistent and secure access are also desirable. Finally, for commercial viability, users should be able to access the data using standard client/server file access protocols such as NFS or CIFS.
As a first step toward meeting these goals, Glamor provides a common enterprise-wide namespace across a set of independent file servers. In order to handle a range of file systems, from a single file server appliance to a large cluster file system, Glamor provides flexible data management, in terms of both how much data can be managed as a logical unit and when and how that unit can be defined. For easy administration, Glamor relies on a central server to coordinate namespace operations and trigger data management events. Many characteristics of a federation are maintained because each server can operate independently and even in isolation, as might happen in the case of a network partition. To remain easy to deploy, Glamor relies on standard file access protocols and clients and can operate with existing file systems.
The current implementation of Glamor runs on the Linux** operating system and leverages the standard but optional client redirection features of the NFS version 4 (NFSv4) protocol [7] and the automounting (autofs) support for NFS version 3 (NFSv3) clients. The redirection and replication capability of the CIFS-based distributed file system (MS Dfs, or Microsoft distributed file system) would also allow Glamor to federate Microsoft Windows** operating system-based file servers, but the use of CIFS as a file access protocol is not described in this paper. Using our test infrastructure, we demonstrate how Glamor provides a common namespace and performs data replication and I/O (input/output) load-balancing. We discuss the modified Andrew benchmark results and the performance overheads of client redirection at the client.
In this paper, we highlight three main contributions of Glamor. First, we demonstrate that a viable federation of file systems can be built using standard file access protocols. Second, we describe the design of the Glamor namespace and how our implementation integrates with the existing NFS namespace. Finally, we demonstrate how Glamor enhances data mobility and location independence by replicating and migrating data without disrupting file system applications.
Data and file sharing have long been achieved using traditional file transfer mechanisms such as FTP as well as distributed file access protocols such as NFS and CIFS. While the former are mostly ad hoc, the latter tend to be “chatty” (i.e., multiple network roundtrips are required to complete each command), having been designed for LAN (local-area network) environments where clients and servers are typically located in close proximity. Cluster file systems such as GPFS [3] and Lustre** [8] also facilitate data sharing and are designed for high performance and strong consistency, but they are neither inexpensive nor easy to deploy and administer. Other file system architectures such as AFS and DCE/DFS have attempted to solve the WAN (wide-area network) file-sharing problem through a distributed architecture that provides a shared namespace by uniting disparate file servers at remote locations into a single logical file system. However, these technologies do not rely on widely deployed standards and have not been widely adopted for enterprise-wide file sharing.
Recently, a new market has emerged to primarily serve the file access needs of enterprises where users are expected to interact across a number of locations. Wide-area file services (WAFS) are rapidly gaining acceptance, with leading storage and networking vendors integrating WAFS solutions into new product offerings [9, 10].
One of the commonly used mechanisms for sharing files over the network is the NFS protocol, originally developed by Sun Microsystems in 1984. Using the protocol, a file server can export file systems, or directory trees, in order to allow clients to access these systems. Clients can then access the remote data by issuing mount commands to the specific servers. The process of mounting NFS shares (e.g., exported directories) at the client can be automated by using an automounter service that provides access to remote file servers on demand when users first traverse a path name that leads to that server. The automounter initiates a mount on demand without requiring an explicit mount command to be issued by the user. An automounter also presents the client with a uniform namespace to all of the resources. In Linux, this service is provided by a combination of a user-space daemon and an in-kernel helper file system called autofs. Remote file systems can be identified by automounter “maps” that can be specified in various ways ranging from plain text files to an external server that supports Lightweight Directory Access Protocol (LDAP).
Introduced in 2000, NFSv4 is a distributed file system protocol similar in design to previous versions with additional support for high-performance data sharing over a WAN and enhancements for integrity and security. Unlike earlier versions, NFSv4 presents the client with a single seamless view of all exported file systems. A client can traverse the server namespace without regard to the structure of the file systems on the server. To improve availability, NFSv4 has provisions to support referrals, file system migration, and replication. When a file system is migrated to a new server, a client is notified of the change by means of a special error code and informed of new locations through a special attribute. The client may then access the file system on the new server with no disruption to applications. This special attribute, fs_locations, consists of a list of server host names and paths corresponding to the alternate locations for a file system. If a client finds a file system unresponsive or poorly performing, it may choose to access the same data from another location. NFSv4 also facilitates the combining of namespaces by allowing a server to refer the client to another server from a directory in its exported tree.
As we have mentioned, Glamor is a framework for building a wide-area federated file system. The goal of this system is to provide distributed file access across enterprise-wide or Internet-scale networks. In other words, Glamor can be viewed as an architecture for file system virtualization. From the point of view of the client, the boundaries defined by the individual file servers are made indistinct, and instead clients see a unified file-based view of all the file storage within the enterprise. This simplifies management and enables better resource utilization and scalability. File servers need not be over-provisioned, as capacity can be shared across servers. Similarly, new servers can be added for additional capacity and bandwidth without being restricted by the limits of individual file systems. The basic building blocks of a virtualization architecture are a common namespace and “smart” data management that uses replication to bring data closer to where the clients are located, thereby providing faster access to users.
Traditionally, storage management in the UNIX** and Windows operating systems is performed at the file system level. File systems are usually associated with storage partitions (representing either physical or logical volumes) and manage all the storage space in that partition. With larger and more complex block storage systems, this management can be coarse-grained and limited in flexibility. To avoid these limitations, file system subsets called filesets have been used in the context of distributed file systems. In Glamor, we define a fileset to be any directory tree in a file system. A fileset is the basic unit of data management and can be as small as a single directory or as large as the entire file system. Data management in Glamor is very flexible. Glamor provides mechanisms for defining how related data can be grouped into filesets, where the data should reside, and when it should be moved or replicated on the basis of performance or availability requirements. Glamor also initiates and manages the replication of filesets.
Filesets in Glamor may be physically located on different file servers and are combined to create a common namespace by mounting them just like traditional file systems. All servers export the same namespace to clients. This makes the client view uniform and simple. The clients do not have to mount file systems from different servers and are not aware of which server hosts which part of the tree.
Multiple approaches exist for building a common namespace across a set of independent file servers. The simplest technique involves the setup of a specialized router as a front end through which all the client requests are routed to the desired file server. This solution works well in small LAN-connected installations but cannot scale to a larger network. Another approach involves the use of specialized servers that can redirect client requests to the desired server and act as a client proxy. This solution is scalable but is not suited for off-the-shelf (i.e., standard, commercially available) file servers. As a key design choice for Glamor, we do not require a new proprietary client, proprietary networking hardware, or specialized servers. Thus, we rely on the standard clients for widely used network file access protocols, namely NFS in the UNIX environment and CIFS in the Windows environment. In an NFSv4 environment, Glamor leverages the client redirection feature to provide a common namespace. For NFSv3 clients, Glamor uses an automounter with LDAP. As there are no standard server-to-server protocols available for performing data replication, Glamor relies on internal data transfer protocols. While this paper does not focus on CIFS, the redirection features of CIFS could be similarly leveraged by Glamor in the future.
The purpose of a unified namespace is to make all managed data available to all clients via the same path in a common file system-like environment. This should be achieved with minimal client-side configuration so that updates to the common namespace do not require configuration changes at the client. The Glamor namespace is constructed by logically mounting filesets that are physically located on different file servers. The namespace typically starts at a ROOT fileset and traverses through the other filesets mounted at specific paths in the ROOT fileset. Clients can optionally decide to use a starting point other than the ROOT fileset. The Glamor namespace contains two types of directories: regular directories and junction points. (A junction point is a directory in a parent fileset to which another fileset may be logically attached.) Glamor provides three main functions to manage a common namespace. First, the namespace information is stored in a shared, persistent repository. This includes the filesets and their physical locations along with the directory path where the junction point is to be created. Second, an interface is provided for a file server to query the repository to find out the locations of a fileset. Third, an interface is provided to instruct the file server to create a junction point at a given directory within a fileset where the target fileset is logically attached.
The Glamor subsystem stores the namespace information in an LDAP server-based repository and relies on LDAP for querying the namespace information. Glamor relies on the underlying file server to support an interface to create a junction point and trigger a referral when the client accesses the directory. The namespace schema defines the model used for populating, modifying, and querying the namespace repository.
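The three repository functions described above can be sketched in a few lines. This is a minimal illustration, not the Glamor implementation: a pair of Python dictionaries stands in for the LDAP-backed repository, and the class and method names (NamespaceRepository, add_location, create_junction, lookup) are hypothetical. In particular, the real third function instructs a file server to create the junction, whereas this sketch only records the attachment. The server and path names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceRepository:
    """In-memory stand-in for the LDAP-backed namespace repository (assumption)."""

    # fileset name -> list of (server, path) locations
    locations: dict = field(default_factory=dict)
    # junction path in the common namespace -> fileset name
    junctions: dict = field(default_factory=dict)

    def add_location(self, fileset, server, path):
        """Function 1: persist where a fileset physically lives."""
        self.locations.setdefault(fileset, []).append((server, path))

    def create_junction(self, junction_path, fileset):
        """Function 3 (simplified): attach a fileset at a directory."""
        if fileset not in self.locations:
            raise KeyError(f"unknown fileset: {fileset}")
        self.junctions[junction_path] = fileset

    def lookup(self, junction_path):
        """Function 2: return the locations of the fileset at a junction."""
        return list(self.locations[self.junctions[junction_path]])


repo = NamespaceRepository()
repo.add_location("projects", "foo", "/local/myprojects")
repo.create_junction("/mycell/projects", "projects")
print(repo.lookup("/mycell/projects"))
```

Keeping the junction-to-fileset mapping separate from the fileset-to-location mapping is what lets a fileset gain or lose locations without touching any junction.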
In the following subsection, we continue to describe how a common federation-wide namespace is created, how it relates to other traditional namespaces, and how it is implemented.
The primary organizational unit of Glamor is a cell. All Glamor objects such as filesets, fileset locations, and the root namespace are associated with a cell. Cells are independent and non-interacting so that a single organization (e.g., corporation) can create cells in a way that best meets its needs. Note that the cell is a logical construct, and Glamor (see the section on fileset replication) allows multiple cells to be administered by a single host. All file servers within a cell implement a unified and consistent namespace.
In addition to maintaining all the information necessary to manage filesets, a Glamor cell provides a range of other services, including 1) a single point of administration for all data management facilities, 2) a fundamental boundary for security controls, and 3) automation services, such as replica updates, to facilitate maintenance of filesets.
To better understand how the Glamor namespace is created, we first view the namespace in different contexts:
- Local server namespace: This is the view of the namespace at a server that corresponds to the locally mounted physical file systems associated with the attached storage devices (e.g., /dev/hd3 is mounted at /var).
- NFS namespace: This is the view of the namespace visible at the NFS client. In NFSv3, the server exports multiple directories that are individually mounted by the client. In NFSv4, the server creates a single tree rooted at an NFS pseudo-root by combining different exported paths that become the NFSv4 pseudo-file system namespace.
- Cell namespace: Glamor extends the NFS namespace concept by combining the filesets exported by all of the participating servers to form a shared cell namespace. The cell namespace is exported by all of the participating servers and is managed by a central administrating authority. A client connecting to any server in the cell is able to traverse the same namespace. Filesets can be mounted anywhere and multiple times in the cell namespace.
The cell namespace of Glamor is created by mounting filesets physically located on any of the server file systems. Figure 1 shows two servers, foo and bar, within a cell called mycell. The local fileset /local/myprojects of server foo is mounted within the cell namespace at /mycell/projects. Similarly, the local fileset /local/mybin of server bar is mounted at /mycell/bin. Both servers would simply export /mycell to the client. We now discuss how a Glamor mount relates to existing file systems and how it integrates with the NFS namespace.
Figure 1
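The mapping in this example can be made concrete with a short sketch. It is only an illustration of the idea, assuming a longest-prefix match on junction paths; the function name resolve and the file path used in the usage lines are hypothetical, and in Glamor the mapping is resolved through referrals or autofs rather than by a single function.

```python
# Junctions from the example: cell path -> (server, local path of the fileset).
JUNCTIONS = {
    "/mycell/projects": ("foo", "/local/myprojects"),
    "/mycell/bin": ("bar", "/local/mybin"),
}


def resolve(cell_path):
    """Map a cell-namespace path to (server, local path), or None.

    Picks the longest junction that is a whole-component prefix of cell_path.
    """
    best = None
    for junction in JUNCTIONS:
        if cell_path == junction or cell_path.startswith(junction + "/"):
            if best is None or len(junction) > len(best):
                best = junction
    if best is None:
        return None  # the path lies in the root fileset, under no junction
    server, local_root = JUNCTIONS[best]
    return server, local_root + cell_path[len(best):]


print(resolve("/mycell/projects/glamor/README"))
print(resolve("/mycell/bin"))
print(resolve("/mycell"))
```

The first call yields server foo with the local path /local/myprojects/glamor/README, the second yields server bar with /local/mybin, and the third returns None because /mycell itself belongs to the root of the cell namespace.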
Root fileset
The root of the cell namespace plays a role analogous to that of the standard NFSv4 pseudo-file system root by providing junction points for other filesets. Thus, it contains no data, but only junctions on which other filesets can be mounted. In the previous example, the root namespace contains the junction points /mycell/projects and /mycell/bin on which the filesets containing data are mounted. The root of the cell namespace can be viewed as a fileset that contains only directories and that is replicated across all the servers. The Glamor root fileset is shared by all the servers. The Glamor admin (i.e., administration) service is responsible for modifying and updating the root namespace in response to any namespace mount and unmount operations.
Thus far, we have discussed at a conceptual level how the Glamor namespace is created. In this section, we discuss how a Glamor mount is implemented using an existing file system. Glamor mount operations differ from file system mounts in two ways. First, Glamor mounts are persistent across system reboots, as the mount relationship between the filesets is stored in the LDAP-based namespace repository. This makes sense in a distributed, multiserver system such as Glamor for which the impact of individual system failures should be minimal. Second, Glamor filesets can be mounted more than once, in multiple places within the namespace. Because Glamor junction points have low performance overhead, there is no performance penalty for allowing multiple mounts.
Like file systems, filesets must be mounted on an existing directory in an existing fileset before they are visible to a client. For NFSv3 clients, filesets are combined through the namespace information stored in the namespace repository and are traversed through the use of autofs by employing LDAP queries to resolve the locations. In the NFSv4 context, the result of mounting a fileset is the creation of a special object called a junction. (A junction is a special directory in the physical file system with a marker that indicates that a different, possibly remote, fileset is mounted there.) How a junction marker is created depends on the file system and the implementation. The junction may be a directory with an extended attribute, a directory containing a special file, or indicated as an export option when the fileset is exported. In these cases, the junction marker should possess enough information to allow a server to determine the location (i.e., the server name and path for the NFSv4 fs_locations attribute) of the target fileset.
Client view
The client in Glamor is the standard NFS client. We require that if the client protocol is NFSv4, the client should have the ability to handle referrals as follows. First, when the client traverses a directory that happens to be a junction, the server returns an NFS4ERR_MOVED error to initiate redirection. Next, the client follows with a GETATTR fs_locations request; the server returns locations by querying the namespace repository using LDAP with the junction information as the key to perform a lookup. If the fileset is replicated, the server returns all the locations to the client as an ordered list. The client then mounts the fileset from the first server in the list and continues its traversal. If the server for the first location fails, the client tries the next location in the list. If the client protocol is NFSv3, the client defines the LDAP server from which it will generate, on-the-fly, the autofs map entries. When the client first accesses a directory that is managed by autofs, it will contact the LDAP server to obtain the namespace information in the form of local directories or remote mounts.
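The NFSv4 client behavior described above, trying the returned locations in order and moving on when a server fails, can be sketched as follows. This is a simulation under stated assumptions rather than NFS client code: follow_referral, ServerDown, and fake_mount are hypothetical names, the mount is a caller-supplied stand-in, and the replica path /replicas/myprojects is invented for the example.

```python
class ServerDown(Exception):
    """Raised by the stand-in mount function when a server is unreachable."""


def follow_referral(locations, mount):
    """Try each (server, path) in order and return the first that mounts.

    `locations` plays the role of the ordered list returned for the
    fs_locations attribute; `mount` stands in for the real NFS mount.
    """
    for server, path in locations:
        try:
            mount(server, path)
            return server, path
        except ServerDown:
            continue  # fall through to the next location in the list
    raise ServerDown("no location for this fileset is reachable")


# Example: the fileset is replicated on foo and bar, and foo is down.
down = {"foo"}


def fake_mount(server, path):
    if server in down:
        raise ServerDown(server)


chosen = follow_referral(
    [("foo", "/local/myprojects"), ("bar", "/replicas/myprojects")],
    fake_mount,
)
print(chosen)
```

Because the list is ordered by the server, the order itself is the policy lever: the client needs no knowledge of sites or load to end up at a sensible location.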
Glamor stores all namespace-related information in a shared repository with an LDAP server front end and a persistent database as the back end (Figure 2). The namespace information is described by an LDAP schema that defines the various objects used to construct and manage the namespace. These objects include sites, filesets, fileset locations, fileset attributes, and junctions.
Figure 2
A site represents the geographical locality of clients and servers in close proximity, for example, across a LAN. Machines within one site are assumed to be co-located in terms of network connectivity. Because enumerating all clients and servers within a site is not feasible, they are grouped by DNS (Domain Name System) domains. Sites are connected to other sites by a TCP/IP (Transmission Control Protocol/Internet Protocol) network. Each pair of sites has an administrator-assigned value indicating the performance or cost of the interconnection link between those sites. A low interconnection cost indicates a high-bandwidth link. If cost is left unspecified, it is assumed to be infinite or a high value. Glamor uses this information to determine the order in which data replications can be performed from one location to the other and to redirect clients to a file server that is closest to the client.
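As an illustration of how such costs could drive the choice of file server, the sketch below orders candidate locations by the cost of the link from the client's site. The site names, cost values, and function names are hypothetical, and treating a location in the client's own site as cost zero is an assumption of this sketch; only the rule that an unspecified cost counts as infinite comes from the description above.

```python
import math

# Administrator-assigned link costs between pairs of sites (made-up values).
LINK_COST = {
    frozenset({"site-a", "site-b"}): 10,
    frozenset({"site-a", "site-c"}): 40,
}


def cost(site_x, site_y):
    """Link cost between two sites; unspecified pairs count as infinite."""
    if site_x == site_y:
        return 0  # assumption: a location in the client's own site is free
    return LINK_COST.get(frozenset({site_x, site_y}), math.inf)


def order_locations(client_site, locations):
    """Sort (site, server) pairs from the cheapest link to the costliest."""
    return sorted(locations, key=lambda loc: cost(client_site, loc[0]))


candidates = [
    ("site-d", "qux"),
    ("site-c", "baz"),
    ("site-b", "bar"),
    ("site-a", "foo"),
]
print(order_locations("site-a", candidates))
```

For a client in site-a, the result lists foo first, then bar, then baz, and finally qux, whose link cost was never specified.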
Data management in Glamor happens at the granularity of a fileset. As defined earlier in this paper, a fileset is a directory tree within a file system. A fileset is a lighter-weight object (i.e., one with lower performance and memory overhead) than a file system. Generally, filesets are created to represent a semantic storage relationship. For example, every user could have their home directory in a unique fileset, regardless of size. As filesets are not tied to system devices, they can be easily created, destroyed, and moved from one file system to another. The analogous file system operations, on the other hand, have higher performance and memory overhead. Because filesets have low performance overhead, there is little penalty for creating a large number of filesets. Whereas servers generally have a small number of file systems, the number of filesets a single file system can hold is limited only by the relative size of the fileset and the file system.
Fileset locations
An instance of the fileset container abstraction is called a fileset location. The data for a fileset is physically stored in one or more fileset locations. In the simplest case, a single location exists for a fileset, and thus, the distinction between a fileset and its location is reduced. However, filesets can have multiple (identical) locations. In this case, because all locations are identical, they can be used interchangeably. Each fileset and fileset location is identified by a universally unique identifier (uuid) value.
Note that filesets, not fileset locations, are mounted in the cell namespace. It is the responsibility of the Glamor servers and NFS clients to determine the best location to use to access data for a fileset with multiple locations. Readers familiar with AFS and DCE/DFS will notice similarities between Glamor filesets and locations, AFS filesets and volumes, and DCE/DFS filesets and sites.
Locations are usually spread across multiple servers and can be used for 1) improving throughput by load-balancing across servers, 2) reducing latency by redirecting clients to closer servers, and 3) obtaining better reliability by redirecting clients to a different location when a failure or regional disaster occurs.
Fileset attributes
Read–write filesets are the most common. They generally have only a single location (unless created in a cluster file system environment) and can be read and written by multiple clients just like a normal file system. Read-only filesets are created by replicating another fileset at a particular point in time. For example, a replica can be a daily backup of the original fileset. In this case, the replica-update mechanisms can include taking a snapshot of the source fileset at midnight and updating all the replicas.
Fileset locations in Glamor are equivalent but not identical to each other. Locations may be either dependent or independent. A dependent location is one whose contents are based on the contents of another Glamor location. An independent location, on the other hand, does not have this content dependency. Typically, independent locations represent source read–write locations, whereas dependent locations are replicas and read-only locations created by a Glamor-initiated replication. Dependent locations may represent consistent point-in-time copies of independent locations. Glamor cannot, however, inhibit subsequent changes to a dependent location or guarantee point-in-time consistency of a dependent location if the independent location has changed.
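The relationship between filesets, locations, and the dependent/independent distinction can be pictured with a small data model. This is a sketch only: the class and field names (Fileset, FilesetLocation, ident, depends_on) are hypothetical and are not taken from the Glamor LDAP schema, and the server names and paths are illustrative.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FilesetLocation:
    server: str
    path: str
    # None for an independent location; otherwise the uuid of the
    # location whose contents this one is based on.
    depends_on: Optional[uuid.UUID] = None
    ident: uuid.UUID = field(default_factory=uuid.uuid4)

    @property
    def dependent(self):
        return self.depends_on is not None


@dataclass
class Fileset:
    name: str
    read_only: bool
    locations: list = field(default_factory=list)
    ident: uuid.UUID = field(default_factory=uuid.uuid4)


# A read-write fileset with one independent location ...
source = FilesetLocation("foo", "/local/myprojects")
projects = Fileset("projects", read_only=False, locations=[source])

# ... and a read-only replica fileset whose location depends on it.
replica = FilesetLocation("bar", "/replicas/myprojects", depends_on=source.ident)
backup = Fileset("projects-backup", read_only=True, locations=[replica])

print(source.dependent, replica.dependent)
```

Recording the dependency on the location rather than on the fileset mirrors the text: it is a location whose contents are based on another location.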
The filesets in Glamor can be replicated and attached to the cell namespace. The replica fileset is attached at a different point in the cell namespace from the original fileset. Each replica fileset can have multiple fileset locations that can be placed on multiple servers. In our example, the fileset of server foo is replicated and the replica fileset is attached to the cell namespace at /mycell/backup/projects. The original read–write fileset was attached at /mycell/projects. While the original fileset was located at server foo, the replica fileset has multiple locations, one of them being at server bar.
One of the most common data management operations is replica fileset update. The data transfer between the source fileset and the replica fileset locations occurs using an out-of-band server-to-server protocol. The Glamor infrastructure allows different protocols to be easily used, and the default protocol is a differential compression scheme similar to rsync [11], which is an open-source utility that provides fast incremental file transfer. The rsync-like protocol transmits only fileset differences between servers. The protocol selection process will fall back to a simple copy if the rsync protocol is not available.
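The two ideas in this paragraph, protocol fallback and sending only differences, can be sketched as below. Both functions are hypothetical illustrations. How Glamor decides that a protocol is available is not described here, so the sketch assumes each server simply advertises a set of protocol names. The difference computation uses whole-file hash comparison, which is much coarser than the block-level matching rsync performs, and it ignores files deleted at the source.

```python
import hashlib


def select_protocol(source_protocols, target_protocols,
                    preference=("rsync", "copy")):
    """Return the first preferred protocol that both servers support."""
    common = set(source_protocols) & set(target_protocols)
    for name in preference:
        if name in common:
            return name
    raise RuntimeError("no common transfer protocol")


def digest(data):
    return hashlib.sha256(data).hexdigest()


def files_to_send(source, replica):
    """Names of files that are missing from or different in the replica.

    Both arguments map file names to contents (bytes).
    """
    return sorted(
        name
        for name, data in source.items()
        if name not in replica or digest(replica[name]) != digest(data)
    )


print(select_protocol({"rsync", "copy"}, {"copy"}))
print(files_to_send(
    {"a.c": b"v2", "b.c": b"same", "c.c": b"new"},
    {"a.c": b"v1", "b.c": b"same"},
))
```

Here the first call falls back to copy because the target does not offer rsync, and the second reports a.c and c.c as the only files that need to be transferred.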
When a fileset is replicated, the replica should represent a point-in-time copy or snapshot. This ensures that the data in the new or updated replica is consistent with the state of the source at some point in time. Without point-in-time copies, the contents of the resulting fileset may not represent the contents of the source at any single time but rather a mixture of the contents of the source at different times. Whether this is acceptable depends on the use of the contents of the fileset.
Glamor does not implement snapshot support but can utilize that support when it is provided by the underlying file system. For example, if a fileset exists within a file system that provides snapshot functionality, when a fileset snapshot is requested, Glamor can create a snapshot of the entire file system, which is sufficient (but not necessary) to create a snapshot of the fileset location. This snapshot can then be used to populate or update a replica location. When the fileset snapshot is no longer required, it can be destroyed, at which point Glamor will release the file system snapshot.
Replication in Glamor is useful for both load-balancing and failure handling. For load-balancing, the server returns different server locations (the fs_locations attribute in NFSv4 or multiple junction points in the autofs map) for the directory that is the root of a replica fileset. The subset of the available locations can be returned on the basis of the geographical or network location of the client, the load on the server, or a combination of other factors. To enforce load-balancing, Glamor can dynamically steer a client to different replica locations. Multiple replica locations are also useful to mask (i.e., hide) failures. When the client detects that the server providing a replica is unresponsive or poorly performing, it can connect to another server from the list of locations for that replica. Failover behaves in a manner somewhat similar to migration, except that when a server has failed, no state can be recovered.
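One simple way a server could steer successive clients to different replica locations is to rotate the location list it returns. The sketch below uses round-robin rotation purely as the simplest stand-in policy; the text above names client location and server load as the actual inputs, and the class name ReferralBalancer and the server names are hypothetical.

```python
class ReferralBalancer:
    """Hands out the same locations with a different first choice each time."""

    def __init__(self, locations):
        if not locations:
            raise ValueError("at least one location is required")
        self._locations = list(locations)
        self._next = 0

    def referral(self):
        """Return the location list rotated so the head advances by one."""
        k = self._next % len(self._locations)
        self._next += 1
        return self._locations[k:] + self._locations[:k]


balancer = ReferralBalancer(["foo", "bar", "baz"])
heads = [balancer.referral()[0] for _ in range(4)]
print(heads)
```

Every referral still contains all locations, so a client that finds its first choice unresponsive can fail over to the next entry, which keeps load-balancing and failure masking in one mechanism.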
In this section, we evaluate the performance of Glamor using different workloads, with an emphasis on measuring the performance overheads and scalability of filesets, as well as evaluating server load-balancing through client redirection (NFSv4) or autofs maps (NFSv3). We compare the results of Glamor with those obtained using the standard NFS. We present two cases: a macro-benchmark to determine the performance of standard UNIX utilities such as ls and tar, as well as a modified Andrew benchmark to show the overall performance on common file system operations.
We ran our experiments using a Linux NFS client and NFS servers in the following configurations: In the standard configuration, we used a setup with unmodified Linux NFS clients and servers. The results from these tests give us baseline numbers to evaluate the overheads and advantages of Glamor. In the Glamor configuration, we used a standard Linux client interacting with a Glamor-enabled namespace. For NFSv3, the client obtains namespace data from the LDAP database of Glamor using autofs maps; for NFSv4, the client uses the referral data returned from the LDAP database to traverse the namespace.
All experiments were conducted on identical IBM eServer xSeries* 330 computers, each with a 1.266-GHz Intel Pentium** III CPU, 2 GB of RAM, and a 36-GB 7,200-rpm hard disk. All of the computers ran the Linux 2.6.21 kernel.
We used custom scripts and programs to create the file system hierarchy and populate the files, creating 1,500 directories and 60,000 files, totaling approximately 1 GB of data. All macro-benchmarks were run using this workload. We also used OProfile [12], a system-wide profiler for Linux, to identify performance bottlenecks; the details of this evaluation are not presented in this paper.
Performance overheads in traversing the namespace with filesets are dominated by the response time to an LDAP query from autofs for an NFSv3 client and an fs_locations attribute request for an NFSv4 client. We evaluate the overhead of traversing filesets residing on multiple servers and the effect of client redirection on the application response time. To measure this, we perform a simple recursive listing using the UNIX command ls -lR on a tree rooted at Server A that is mounted by an NFS client. At several points in the directory tree, fileset junctions were created, each referring to another server, Server B. As the client traverses the namespace, an LDAP request is serviced upon hitting each fileset root. On following each referral, the client performs the equivalent of an NFS mount operation. On completion, the client would have as many mounts as the number of junctions that were followed. Figure 3 shows the results with NFSv3 and NFSv4. For NFSv3, the results [Figure 3(a)] show that the performance overhead of Glamor (using autofs with LDAP) is significant when compared to base NFS. However, most of the total time to run the ls -lR test is dominated by the LDAP request handling. The traversal time also increases as the number of junctions increases. For NFSv4 [Figure 3(b)], Glamor-based traversals have a 10% overhead compared to base NFS. For example, consider that at 14 junctions, the value associated with base NFS is 23.07 seconds, and for Glamor, it is 25.43 seconds.
Figure 3
Once a client has traversed the namespace and resolved all junctions by creating the required in-memory state, this state is cached, so later namespace traversals do not incur these overheads. This is also shown in Figure 3 (red line), where the same experiment has been repeated after the client has already traversed the namespace once. For the NFSv4 case, the traversal time with Glamor and caching is the same as that for base NFS.
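The effect of this caching can be illustrated with a toy model. The sketch below is not Glamor, autofs, or NFS client code: the names `ReferralCache`, `lookup_fn`, and `traverse` are hypothetical, the dictionary merely stands in for the client's in-memory mount state, and `lookup_fn` stands in for the LDAP query or fs_locations request.

```python
from typing import Callable, Dict, List


class ReferralCache:
    """Toy model of a client that resolves each junction once and caches it."""

    def __init__(self, lookup_fn: Callable[[str], str]) -> None:
        self._lookup_fn = lookup_fn          # hypothetical stand-in for the LDAP / fs_locations query
        self._resolved: Dict[str, str] = {}  # junction path -> resolved server location
        self.lookups = 0                     # number of expensive lookups performed so far

    def resolve(self, junction: str) -> str:
        """Return the location for a junction, querying only on first use."""
        if junction not in self._resolved:
            self.lookups += 1
            self._resolved[junction] = self._lookup_fn(junction)
        return self._resolved[junction]


def traverse(cache: ReferralCache, junctions: List[str]) -> List[str]:
    """Resolve every junction met during one pass over the namespace."""
    return [cache.resolve(j) for j in junctions]


if __name__ == "__main__":
    junctions = [f"/glamor/fileset{i}" for i in range(14)]  # illustrative paths
    cache = ReferralCache(lambda j: "serverB:" + j)
    traverse(cache, junctions)  # first pass pays for one lookup per junction
    traverse(cache, junctions)  # second pass is served from the cache
    print(cache.lookups)
```

In this model the second pass performs no lookups at all, which mirrors why the repeated traversal in Figure 3 approaches the base NFS time.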
To demonstrate the benefits of I/O load-balancing through replication, we ran experiments in which a client accesses parts of the same file system through multiple filesets in parallel. We compare the time a client takes to archive the entire fileset when reading from a single server with the time for the same operation when it is spread across multiple servers. For this demonstration, we created a simple parallel version of GNU** tar that runs multiple processes in parallel, each of which archives a portion of the directory. (The GNU project involves free software for UNIX-style operating systems.) The results are shown in Figure 4. The response time decreases as the number of servers increases because the load is spread across multiple servers. For NFSv3 [Figure 4(a)], the response time for a parallel tar operation decreases from 20 seconds to 4 seconds as the number of servers is increased from one to six. For NFSv4 [Figure 4(b)], it decreases from roughly 12 seconds to 10 seconds over the same range.
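The idea of archiving portions of a directory concurrently can be sketched as follows. This is not the authors' parallel tar: it is a minimal illustration that uses Python's standard `tarfile` module and a thread pool rather than separate processes, and the function names `archive_subtree` and `parallel_tar` are hypothetical. It assumes each top-level subdirectory is an independent unit of work (for example, a subtree reachable through its own fileset).

```python
import os
import tarfile
from concurrent.futures import ThreadPoolExecutor
from typing import List


def archive_subtree(src_dir: str, out_path: str) -> str:
    """Archive one directory subtree into its own uncompressed tar file."""
    with tarfile.open(out_path, "w") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir.rstrip(os.sep)))
    return out_path


def parallel_tar(root: str, out_dir: str, workers: int = 4) -> List[str]:
    """Archive each top-level subdirectory of root concurrently.

    Files that sit directly in root are not archived in this sketch, and
    out_dir should lie outside root.
    """
    os.makedirs(out_dir, exist_ok=True)
    subdirs = sorted(
        entry.path
        for entry in os.scandir(root)
        if entry.is_dir(follow_symlinks=False)
    )
    outputs = [os.path.join(out_dir, os.path.basename(d) + ".tar") for d in subdirs]
    # Threads suffice here because the work is dominated by file I/O.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(archive_subtree, subdirs, outputs))
```

When the subdirectories are served by different servers, the concurrent readers pull data from all of them at once, which is the effect the experiment measures.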
Figure 4
In this section, we evaluate the performance of Glamor under an emulated software development workload, using a modified version of the Andrew benchmark [13]. We show that Glamor has acceptable performance in the presence of junctions in the namespace under a sequential workload with no data sharing. We also show that if the workload can take advantage of multiple replica locations by running operations in different filesets in parallel, Glamor performs better than a single-server environment.
The benchmark works with an existing fileset (i.e., source tree). Phase I (mkdir) of the benchmark creates directories in the file system being tested; phase II (copy) copies the files from the fileset into the directories created; phase III (stat) recursively lists all the directories; phase IV (grep) scans each copied file; and phase V (make) performs a compilation of the source tree. We created a variant of the Andrew benchmark to demonstrate the effectiveness of data access from multiple replica locations. As input, we used the Glamor source code, which contains 253 files and approximately 6 MB of data. In addition, we created Glamor junction points in the directory tree during the mkdir phase so that subsequent operations were spread over two servers through client redirection. Replica locations were created based on the parallelism observed when using make -j, where the -j option spawns a number of processes. We compared this scenario with a single server with no replica locations. In the make phase, we ran multiple jobs simultaneously (with make -j [number of servers]) to allow the compilation to take advantage of replica locations.
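The five phases described above can be sketched as a small timing harness. This is an illustrative approximation, not the modified Andrew benchmark used in the paper: the names `timed` and `run_phases` are hypothetical, the grep phase simply reads every byte of each copied file, no junctions are created, and the make phase runs only if a `make` binary and a top-level `Makefile` are present.

```python
import os
import shutil
import subprocess
import time
from typing import Callable, Dict


def timed(fn: Callable[[], None]) -> float:
    """Return the wall-clock seconds taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_phases(src: str, dst: str, jobs: int = 1) -> Dict[str, float]:
    """Time each phase on a copy of src made under dst (dst must lie outside src)."""
    rel_dirs, rel_files = [], []
    for d, _, files in os.walk(src):
        rel_dirs.append(os.path.relpath(d, src))
        rel_files += [os.path.relpath(os.path.join(d, f), src) for f in files]

    def mkdir() -> None:  # phase I: recreate the directory skeleton
        for rel in rel_dirs:
            os.makedirs(os.path.normpath(os.path.join(dst, rel)), exist_ok=True)

    def copy() -> None:  # phase II: copy every file into the skeleton
        for rel in rel_files:
            shutil.copyfile(os.path.join(src, rel), os.path.join(dst, rel))

    def stat() -> None:  # phase III: walk and stat every entry, as ls -lR would
        for d, subdirs, files in os.walk(dst):
            for name in subdirs + files:
                os.lstat(os.path.join(d, name))

    def grep() -> None:  # phase IV: scan the contents of every copied file
        for rel in rel_files:
            with open(os.path.join(dst, rel), "rb") as fh:
                while fh.read(1 << 16):
                    pass

    def make() -> None:  # phase V: parallel build with the requested job count
        subprocess.run(["make", f"-j{jobs}"], cwd=dst, check=True)

    ordered = [("mkdir", mkdir), ("copy", copy), ("stat", stat), ("grep", grep)]
    results = {name: timed(fn) for name, fn in ordered}
    if shutil.which("make") and os.path.isfile(os.path.join(dst, "Makefile")):
        results["make"] = timed(make)
    return results
```

Running the harness once against a destination on a single server and once against a destination whose subdirectories are junctions to a second server, with `jobs` set to the number of servers, would give a comparison of the same shape as the one reported here.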
Table 1 shows the results of running the benchmark using a single client with a standard single-server system and with two servers, for both NFSv3 and NFSv4. Because the first four phases employ a sequential metadata workload, there is no advantage to using multiple replica locations. In fact, the response times are slightly worse because of client performance overheads on following referrals. The benefits are clear in phase V (make), where operations can be run in multiple filesets in parallel and the total response time with Glamor decreases slightly (~6%).
Table 1 Modified Andrew benchmark execution time (in seconds) for Glamor with NFSv3 and NFSv4.
| Protocol | No. of servers | mkdir | copy | ls -lR | grep | make -j | Total time |
| NFSv3 | 1 | 1.094 | 4.39 | 0.318 | 11.178 | 20.437 | 37.417 |
| NFSv3 | 2 | 1.261 | 3.638 | 0.263 | 11.168 | 19.447 | 35.771 |
| NFSv4 | 1 | 1.266 | 6.365 | 0.562 | 13.008 | 18.168 | 39.369 |
| NFSv4 | 2 | 1.044 | 5.321 | 0.564 | 14.315 | 15.988 | 37.232 |
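The relative change between the one-server and two-server rows can be computed directly from the values printed in Table 1. The short sketch below only does arithmetic on those printed values; it adds no measurements, and the names `TABLE_1`, `percent_change`, and `changes` are introduced here for illustration.

```python
from typing import Dict, Tuple

COLUMNS = ("mkdir", "copy", "ls -lR", "grep", "make -j", "total")

# Execution times in seconds, keyed by (protocol, number of servers), as printed in Table 1.
TABLE_1: Dict[Tuple[str, int], Tuple[float, ...]] = {
    ("NFSv3", 1): (1.094, 4.39, 0.318, 11.178, 20.437, 37.417),
    ("NFSv3", 2): (1.261, 3.638, 0.263, 11.168, 19.447, 35.771),
    ("NFSv4", 1): (1.266, 6.365, 0.562, 13.008, 18.168, 39.369),
    ("NFSv4", 2): (1.044, 5.321, 0.564, 14.315, 15.988, 37.232),
}


def percent_change(before: float, after: float) -> float:
    """Signed change from before to after, in percent (negative means faster)."""
    return 100.0 * (after - before) / before


def changes(protocol: str) -> Dict[str, float]:
    """Per-column percent change when going from one server to two."""
    one, two = TABLE_1[(protocol, 1)], TABLE_1[(protocol, 2)]
    return {col: round(percent_change(a, b), 1) for col, a, b in zip(COLUMNS, one, two)}


if __name__ == "__main__":
    for proto in ("NFSv3", "NFSv4"):
        print(proto, changes(proto))
```

On the printed values, the total time falls by about 4.4% for NFSv3 and about 5.4% for NFSv4, while the make phase alone falls by about 4.8% and 12.0%, respectively.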
A number of previous research prototypes and commercial systems have explored some aspects and features of Glamor. Most notably, AFS [4], which is a globally distributed file system, introduced a number of concepts that we refine or reuse in Glamor. AFS introduced the concept of a cell as an administrative domain and support for a global namespace. AFS also introduced the volume abstraction for data management [14]. AFS has extensive client-side file caching for improving performance and supports cache consistency through callbacks. It allows read-only replication that is useful for improving performance. The successor to AFS [4] was the DCE/DFS [6] file system, which had most of the features of AFS but also integrated with the OSF DCE platform. DCE/DFS provided better load-balancing and synchronization features, along with “location transparency” (in which a client need not know the physical location of a file) across domains within an enterprise for easy administration. There were other AFS-related file systems such as Coda [15] that dealt with replication for better scalability while focusing on disconnected operations.
Whereas AFS was primarily designed for geographically distributed clients and servers, NFS [1], which has become a widely deployed distributed file system, was designed for LANs and uses a simple client/server model with a stateless server. Glamor relies on NFSv4 [7] for its operation. Unlike earlier NFS versions, the new NFSv4 protocol integrates file locking, Windows operating system-style share semantics, stronger security, compound operations, and client delegations. It also adds many new features that are leveraged by Glamor. These include volatile filehandles, client redirection, replication and migration support, and a pseudo-namespace. Similar to NFS, CIFS [2] provides remote file access, primarily for Windows operating system-based clients. The Windows DFS [16] protocol extends CIFS to provide a common namespace using server directories and, like NFSv4, can redirect clients to other CIFS servers.
Recently, there has been some work on leveraging the features of NFSv4 to provide global naming and replication support. In Reference [17], the focus is on providing a global namespace and read-write replica synchronization. Other related efforts focus on improving performance by using parallel data access [18, 19]. Numerous IETF (Internet Engineering Task Force) draft proposals highlight the design considerations for NFSv4 naming [20], discuss the issues related to replication and migration [21], and provide implementation guidelines for NFSv4 referral support. In addition to the distributed file system work, there have been a number of commercial cluster file systems [3, 22, 23]. These are all geared for high-performance solutions using high-speed network connections and tightly coupled servers. (Here, the term “tightly coupled” refers to servers that operate in close cooperation and not in a loose federation as in the Glamor framework.)
A large assortment of research prototypes has grouped servers together for a common file service. The Silicon Graphics XFS** file system [24] decentralized the storage services through the use of a set of cooperating servers in a local-area environment. In contrast, the OceanStore [25] project is an archival system, aimed at storing huge collections of data using world-wide replica groups with security and consistency guarantees. Another effort that focuses on security and Byzantine faults is Farsite [26], in which a loosely coupled collection of untrusted, insecure servers is grouped together to establish a virtual file server that is secure and reliable. Similarly, Archipelago [27] couples islands (see footnote 3) of data to provide scalable Internet services.
In this paper, we demonstrated a commercially viable architecture of a federated file system middleware layer that provides a common namespace and relies only on off-the-shelf client and protocol implementations. Further, we demonstrated how Glamor enhances data mobility and location independence by replicating filesets without disrupting the client and client applications. We described the Glamor implementation that runs on Linux and leverages the LDAP protocol. Using our testbed infrastructure, we demonstrated how Glamor provides a common namespace and performs replication and load-balancing. We provided details on the performance overheads of client redirection both at the server and at the client, and we discussed Andrew benchmark results.
We envision that once the Glamor infrastructure and the mobility of data are established, we can then automatically create filesets and place them at different locations and perform automatic load-balancing with respect to the server and network conditions. As part of an ongoing effort, the Glamor infrastructure on Linux is being proposed as a federated file system standard at the IETF standards body.
*Trademark, service mark, or registered trademark of International Business Machines Corporation in the United States, other countries, or both.
**Trademark, service mark, or registered trademark of Linus Torvalds, Microsoft Corporation, Sun Microsystems, The Open Group, Intel Corporation, Free Software Foundation, or Silicon Graphics, Inc., in the United States, other countries, or both.
1. In this paper, the term closer refers to network proximity as measured by the ping time.
2. Out-of-band implies the use of a separate external protocol that is not defined by Glamor.
3. Here, the term islands refers to self-contained and load-balanced file servers.
Received September 17, 2007; accepted for publication October 11, 2007; Published online June 3, 2008.