[Cluster][File] List of Cluster File Systems

Parallel File Systems

* PVFS (Parallel Virtual File System)

– http://www.parl.clemson.edu/pvfs/

– Clemson University

PVFS provides the following features:

Compatibility with existing binaries

Ease of installation

User-controlled striping of files across nodes

Multiple interfaces, including an MPI-IO interface via ROMIO

Utilizes commodity network and storage hardware

PVFS provides four important capabilities in one package:

a consistent file name space across the machine

transparent access for existing utilities

physical distribution of data across multiple disks in multiple cluster nodes

high-performance user space access for applications
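The user-controlled striping listed above can be pictured as a round-robin mapping from a logical file offset to an I/O node and an offset within that node's local file. The following is only a minimal sketch of that idea, not PVFS code: the function name `locate` and the parameters `stripe_size`, `num_nodes`, and `base_node` are hypothetical names chosen for illustration.

```python
def locate(offset, stripe_size=65536, num_nodes=4, base_node=0):
    """Map a logical file offset to (node, local_offset) under round-robin striping.

    All parameter names and defaults are illustrative assumptions,
    not the actual PVFS interface.
    """
    stripe_index = offset // stripe_size              # which stripe unit holds the byte
    node = (base_node + stripe_index) % num_nodes     # stripe units rotate across nodes
    local_stripe = stripe_index // num_nodes          # stripe units already stored on that node
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return node, local_offset


if __name__ == "__main__":
    for off in (0, 65536, 4 * 65536 + 10):
        print(off, locate(off))
```

With four nodes, consecutive stripe units land on nodes 0, 1, 2, 3 and then wrap around, so a large sequential read is spread over all disks.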


* PVFS2

– http://www.pvfs.org/pvfs2/

– Clemson University

PVFS2 provides the following features:

Ease of installation

User-controlled striping of files across nodes, and a well defined interface for defining new distribution schemes

Multiple interfaces, including an MPI-IO interface via ROMIO

Utilizes commodity network and storage hardware

Very modular design

Native support for several popular networking technologies like Myrinet, Infiniband, and TCP/IP

Support for user-defined access patterns

Support for heterogeneous clusters

Distributed metadata


* Lustre

– http://www.lustre.org/

– Cluster File Systems, Inc. (CFS)

Features:

Unparalleled scalability: object-based storage architecture that scales to tens of thousands of clients and petabytes of data, a file system without limits.

Proven performance: dramatic increase in throughput and I/O by intelligent serialization and separation of metadata operations from data manipulation.

Open Source, open standards: developed and maintained as Open Source software with an open networking protocol and POSIX file system semantics, ensuring broad support for industry-standard platforms and heterogeneous networking environments.

Innovative distributed lock manager: intent-based optimizations prevent bottlenecks and increase overall data throughput.

Cost effective: support for industry-standard platforms and heterogeneous networking environments significantly reduces deployment and support costs.

High availability: designed to support transparent failover in all server components.


* Vesta

– http://portal.acm.org/citation.cfm?id=233558&dl=ACM&coll=portal

– IBM T. J. Watson Research Center

all data accesses are done directly to the I/O node that contains the requested data

different access patterns are shown to achieve similar performance

good scalability with increased resources


* PIOUS

– http://www.mathcs.emory.edu/pious/

– Emory University

a fully functional parallel file system on top of PVM

unique features, including:

two-dimensional file objects and logical file views,

coordinated file access with guaranteed consistency semantics,

data declustering for scalable performance,

transaction support and user-selectable fault tolerance modes,

extended file maintenance primitives for managing declustered files, and

C and Fortran language bindings.


* ParFiSys

– http://laurel.datsi.fi.upm.es/~gp/parfisys.html

– Universidad Politécnica de Madrid

provides parallel services transparently, hiding the physical data distribution across the system

simplifies file system access


* Galley

– http://www.cs.dartmouth.edu/~dfk/nils//galley.html

– Dartmouth College


Support for many common access patterns

Complete control over parallelism

New three-dimensional file structure

Support for libraries

Asynchronous I/O

Low memory and computational overhead

Designed to be portable


* PPFS

– http://www.crpc.rice.edu/newsletters/win97/work_PPFS.html


allows the application to advertise access patterns, control caching and prefetching, and even control data placement

extensible and portable, making possible a wide range of experiments on a broad variety of platforms and configurations


* DCFS

– http://www.csis.hku.hk/cluster2003/presentation/technical/3C-3.pdf

DCFS is almost the same as PVFS, with one notable difference. PVFS manages all metadata on a single server called MGR, whereas DCFS distributes the metadata across several servers.

Metadata accesses can therefore be handled in parallel, which can improve performance.
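One common way to spread metadata over several servers is to hash the file path and use the result to pick a server. The source does not say how DCFS assigns metadata to servers, so the sketch below is purely an assumption for illustration; the function name `metadata_server_for` and the server names are hypothetical.

```python
import hashlib


def metadata_server_for(path, servers):
    """Pick the metadata server responsible for `path` by hashing the path.

    Hash-based placement is an assumed policy for illustration only;
    it is not taken from the DCFS design.
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    single = ["mgr"]                      # one central manager, as in PVFS
    several = ["meta0", "meta1", "meta2"]  # distributed metadata, as in DCFS
    for p in ("/data/a.dat", "/data/b.dat", "/home/user/c.txt"):
        print(p, metadata_server_for(p, single), metadata_server_for(p, several))
```

With a single-element server list every lookup goes to the same machine, which mirrors the central MGR case; with several servers, lookups for different paths can proceed in parallel.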

* Armada

– http://www.cs.dartmouth.edu/~dfk/armada/

– Dartmouth College

an I/O framework that allows data-intensive applications to efficiently access geographically distributed data sets

not a fully-featured I/O system

it lacks support for data management, security, and fault tolerance

* GPFS (General Parallel File System)

– http://www-1.ibm.com/servers/eserver/clusters/software/gpfs.html


High-performance parallel, scalable file system for Linux cluster environments

Shared-disk file system where every cluster node can have concurrent read/write access to a file

High availability through automatic recovery from node and disk failures

simplifies multinode administration

the recreation of consistent structures for rapid recovery after node failures

“a backup or mirroring application to run concurrently with the online system.”!!

* Sun PFS

– http://www.sun.com/servers/white-papers/psfwhitepaper.pdf?rendition=pdf


provides high-performance file I/O to multiprocess applications running in distributed-memory, cluster-based environments


Distributed File Systems


* NFS (Network File System)

– http://www.uwsg.iu.edu/usail/network/nfs/overview.html

– Sun Microsystems.

remote access to shared file systems across networks

export or mount directories to other machines

a client/server architecture

* xFS: Serverless Network File Service

– http://now.cs.berkeley.edu/Xfs/xfs.html

– UC Berkeley

xFS is a distributed file system that has no central server for managing metadata and similar state; it provides high performance, scalability, and reliability purely through cooperation among the clients (peers).

xFS distributes both control processing and data, and it adopts cooperative caching that uses the memory of each client, which gives it better performance than conventional distributed file systems such as NFS.

In short, the basic principles of xFS are as follows.

A. Dynamically distributes control processing across the system at per-file granularity

B. Distributes its data storage across storage server disks

C. Eliminates central server caching by using cooperative caching instead

xFS was designed and implemented on top of four underlying technologies.

A. RAID: Redundant Arrays of Inexpensive Disks

B. LFS: Log-structured File System

C. Zebra: combines RAID and LFS so that the strengths of each reinforce the other

D. Multiprocessor Cache Consistency: the concept used to keep consistent the caches that exist to exploit locality and reduce network communication load, providing a uniform view of storage across the system
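The RAID idea that Zebra and xFS build on can be illustrated with XOR parity: a log segment is cut into fragments for several storage servers, one extra parity fragment is stored, and any single lost fragment can be rebuilt from the rest. This is a minimal sketch of that general technique, not xFS or Zebra code; the function names and the fragment layout are assumptions made for the example.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_with_parity(segment, num_data):
    """Split a log segment into num_data equal fragments plus one parity fragment."""
    frag_len = -(-len(segment) // num_data)           # ceiling division
    padded = segment.ljust(frag_len * num_data, b"\0")  # zero-pad the last fragment
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(num_data)]
    parity = reduce(xor_bytes, frags)
    return frags, parity


def rebuild(frags, parity, lost):
    """Recompute fragment `lost` from the parity and the surviving fragments."""
    survivors = [f for i, f in enumerate(frags) if i != lost]
    return reduce(xor_bytes, survivors, parity)


if __name__ == "__main__":
    frags, parity = stripe_with_parity(b"log-structured segment", 3)
    print(rebuild(frags, parity, 1) == frags[1])
```

Writing whole log segments at once is what makes this cheap in a log-structured design: the parity is computed over data that is already in memory, so no read-modify-write of old stripes is needed.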

* GFS (Global File System)

– http://www.redhat.com/software/rha/gfs/

– Red Hat (Open Source)

high IO throughput

no single-point-of-failure

file system and volume resizing while the system remains online, which increases system availability

fast, scalable, high throughput access to a single shared file system

Scalable to hundreds of servers

Quota system for cluster-wide storage capacity management.

Direct IO support allows databases to achieve high performance without traditional file system overheads.

Dynamic multi-pathing to route around switch or HBA failures in the storage area network.

Dynamic capacity growth while the file system remains on-line and available.

Can serve as a scalable alternative to NFS.


* CXFS

– http://www.sgi.com/products/storage/cxfs.html


shared filesystem for storage area networks

Delivers on the promise of SANs

Instant, multi-OS, no copy data sharing

Time-tested, proven solution

Scalability and performance ensure investment protection

Solid, standard data integrity

Architected to scale up to 18 million terabytes

Guaranteed application bandwidth priorities with GRIO v2

High availability with data access failover

Most robust storage resource management available

True LAN-Free backup and restore

Complete SAN/NAS gateway


* Coda

– http://www.coda.cs.cmu.edu/ljpaper/lj.html

– Carnegie Mellon University

disconnected operation for mobile clients

        reintegration of data from disconnected clients

        bandwidth adaptation

Failure Resilience

        read/write replication servers

        resolution of server/server conflicts

        handles network failures which partition the servers

        handles disconnection of clients

Performance and scalability

        client side persistent caching of files, directories and attributes for high performance

        write back caching


Security

        Kerberos-like authentication

        access control lists (ACLs)

Well defined semantics of sharing

* InterMezzo

– http://www.inter-mezzo.org/

– Carnegie Mellon University

distributed file system with a focus on high availability

filtering file system to generate a modification log file which is suitable for replay on other hosts

InterSync is a scalable client-server system to synchronize InterMezzo file systems. It avoids the need to scan file systems and can benefit from proxies and high-performance web servers
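The modification-log approach described above (record changes as they happen, then replay them on another host) can be sketched in a few lines. This is an illustration of the general idea only, not InterMezzo's log format or API: the operation names, the tuple layout, and the use of a dictionary as a stand-in for a file system are all assumptions.

```python
def record(log, op, path, data=None):
    """Append one modification to the log instead of rescanning the file system."""
    log.append((op, path, data))


def replay(log, replica):
    """Apply logged modifications, in order, to a replica (a dict standing in for a host)."""
    for op, path, data in log:
        if op == "write":
            replica[path] = data
        elif op == "unlink":
            replica.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


if __name__ == "__main__":
    log = []
    record(log, "write", "/etc/motd", b"hello")
    record(log, "write", "/tmp/scratch", b"x")
    record(log, "unlink", "/tmp/scratch")
    print(replay(log, {}))
```

Because only the log is shipped, the cost of synchronization depends on how much changed rather than on the size of the file system.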

* Alternate File Sharing Systems : AFS, RFS

– http://www.uwsg.iu.edu/usail/network/nfs/afss.html

AFS scales better than NFS

excellent performance on wide-area configuration and security based on kerberos mutual authentication


Unlike NFS which provides a generic file system, RFS provides an exact copy of a Unix file system.

RFS provides access to files and directories without the user having to know where the resource is located. A name server is used to register resource names, so the client machines do not need to know where the resources are.

RFS allows users to mount special directories so that they can share devices (e.g. tape drives) residing on other machines.

RFS is a stateful protocol; the server maintains state information of local resources. The server can detect client crashes, so that cache consistency is guaranteed.

For the most part, security threats associated with NFS are also associated with RFS


* DFS (Distributed File System)

– http://www.microsoft.com/windows2000/techinfo/howitworks/fileandprint/dfsnew.asp

– Microsoft

uniting files on different computers into a single name space

a single, hierarchical view of multiple file servers and file server shares on your network

* Frangipani

– http://citeseer.ist.psu.edu/thekkath97frangipani.html


provides all its users with coherent, shared access to the same set of files, yet is scalable to provide more storage space, higher performance, and load balancing as the user community grows. It remains available in spite of component failures. It requires little human administration, and administration does not become more complex as more components are added to a growing installation

* Google FS (GFS)

– http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf


a scalable distributed file system for large distributed data-intensive applications

provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.

* MojaveFS

– http://chaos2.org/whoami/pub/icdcs03.pdf

a distributed file system that uses transactions to facilitate reliable concurrent programming

provides a global uniform namespace which allows for mobile computing

Transactions are supported through a journalling mechanism, and replication provides fault tolerance


* DiFFS

– http://www.hpl.hp.com/techreports/2001/HPL-2001-19.pdf

– HP

a distributed file system designed for storage area networks

robust against failures and unfavorable access patterns

high scalability by a partitioning approach to sharing storage resources

independent of the physical file system(s) used for the placement of data; multiple file systems can co-exist in a DiFFS system.


Clunix, a supercomputing company / Managing Director (CTO) / Certified Information Systems Auditor / operator of the Syszone blog
