[Cluster][File] A List of Cluster File Systems
Parallel file systems
* PVFS (Parallel Virtual File System)
– http://www.parl.clemson.edu/pvfs/
– Clemson University
PVFS provides the following features:
Compatibility with existing binaries
Ease of installation
User-controlled striping of files across nodes
Multiple interfaces, including a MPI-IO interface via ROMIO
Utilizes commodity network and storage hardware
PVFS provides four important capabilities in one package:
a consistent file name space across the machine
transparent access for existing utilities
physical distribution of data across multiple disks in multiple cluster nodes
high-performance user space access for applications
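As a rough illustration of what striping a file across nodes means, the sketch below maps a logical byte offset to an I/O node using plain round-robin placement. The function name, the 64 KiB stripe size, and the node count are assumptions chosen for the example; this is not PVFS's actual interface.

```python
def locate_stripe(offset, stripe_size=65536, num_nodes=4, base_node=0):
    """Map a logical file offset to (node index, offset in that node's local file).

    Round-robin striping: stripe unit k lives on node (base_node + k) % num_nodes.
    All parameter names and defaults are hypothetical, picked for illustration.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    unit = offset // stripe_size            # which stripe unit holds this byte
    node = (base_node + unit) % num_nodes   # node that stores that unit
    local_unit = unit // num_nodes          # earlier units already on that node
    local_offset = local_unit * stripe_size + offset % stripe_size
    return node, local_offset


if __name__ == "__main__":
    for off in (0, 65536, 4 * 65536 + 10):
        print(off, locate_stripe(off))
```

With more nodes, consecutive stripe units land on more disks, which is where the aggregate bandwidth of a striped file comes from.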
* PVFS2
– http://www.pvfs.org/pvfs2/
– Clemson University
PVFS2 provides the following features:
Ease of installation
User-controlled striping of files across nodes, and a well defined interface for defining new distribution schemes
Multiple interfaces, including a MPI-IO interface via ROMIO
Utilizes commodity network and storage hardware
Very modular design
Native support for several popular networking technologies like Myrinet, Infiniband, and TCP/IP
Support for user-defined access patterns
Support for heterogeneous clusters
Distributed metadata
* LUSTRE
– http://www.lustre.org/
– CFS inc.
Features:
Unparalleled scalability: object-based storage architecture that scales to tens of thousands of
clients and petabytes of data – a file system without limits.
Proven performance: dramatic increase in throughput and I/O by intelligent serialization and
separation of metadata operations from data manipulation.
Open Source, open standards: developed and maintained as Open Source software with an open
networking protocol and POSIX file system semantics, ensuring broad support for industry-standard
platforms and heterogeneous networking environments.
Innovative distributed lock manager: intent-based optimizations prevent bottlenecks and increase
overall data throughput.
Cost effective: support for industry-standard platforms and heterogeneous networking
environments significantly reduces deployment and support costs.
High availability: designed to support transparent failover in all server components.
* VESTA
– http://portal.acm.org/citation.cfm?id=233558&dl=ACM&coll=portal
– IBM T. J. Watson Research Center
all data accesses are done directly to the I/O node that contains the requested data
different access patterns are shown to achieve similar performance
good scalability with increased resources
* PIOUS
– http://www.mathcs.emory.edu/pious/
– Emory University
a fully functional parallel file system on top of PVM
unique features, including:
two-dimensional file objects and logical file views,
coordinated file access with guaranteed consistency semantics,
data declustering for scalable performance,
transaction support and user-selectable fault tolerance modes,
extended file maintenance primitives for managing declustered files, and
C and Fortran language bindings.
* PARFISYS
– http://laurel.datsi.fi.upm.es/~gp/parfisys.html
– Universidad Politecnica de Madrid
provides parallel services transparently
hiding the physical data distribution across the system
simplify file system access
* GALLEY
– http://www.cs.dartmouth.edu/~dfk/nils//galley.html
– Dartmouth College
Features
Support for many common access patterns
Complete control over parallelism
New three-dimensional file structure
Support for libraries
Asynchronous I/O
Low memory and computational overhead
Designed to be portable
* PPFS
– http://www.crpc.rice.edu/newsletters/win97/work_PPFS.html
– UIUC
allowing the application to advertise access patterns, control caching and prefetching, and
even control data placement.
extensible and portable, making possible a wide range of experiments on a broad variety of
platforms and configurations
* DCFS
– http://www.csis.hku.hk/cluster2003/presentation/technical/3C-3.pdf
DCFS is almost the same as PVFS, with a few differences. In PVFS, a single server called MGR
manages all metadata, whereas DCFS distributes the metadata across multiple servers.
Metadata accesses can therefore be handled in parallel, which can improve performance.
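One simple way to spread metadata over several servers is to hash each path to a server, so that lookups for different files go to different machines. The sketch below shows that idea only; the hashing policy, the function name, and the server names are assumptions for illustration, and the source does not say how DCFS actually assigns metadata to servers.

```python
import hashlib


def metadata_server_for(path, servers):
    """Pick a metadata server for a path by hashing it (illustrative policy only)."""
    if not servers:
        raise ValueError("need at least one metadata server")
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    servers = ["mds0", "mds1", "mds2"]  # hypothetical server names
    for p in ("/home/a/data.bin", "/home/b/run.log", "/tmp/x"):
        print(p, "->", metadata_server_for(p, servers))
```

With a single-manager design, every lookup goes to the same machine; with a mapping like this, lookups for unrelated files can proceed on different servers at the same time.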
* Armada
– http://www.cs.dartmouth.edu/~dfk/armada/
– Dartmouth College
an I/O framework that allows data-intensive applications to efficiently access
geographically distributed data sets
not a fully-featured I/O system
it lacks support for data management, security, and fault tolerance
* GPFS (General Parallel File System)
– http://www-1.ibm.com/servers/eserver/clusters/software/gpfs.html
– IBM
High-performance parallel, scalable file system for Linux cluster environments
Shared-disk file system where every cluster node can have concurrent read/write access to a file
High availability through automatic recovery from node and disk failures
simplify multinode administration
the recreation of consistent structures for rapid recovery after node failures
“a backup or mirroring application to run concurrently with the online system.”!!
* Sun PFS
– http://www.sun.com/servers/white-papers/psfwhitepaper.pdf?rendition=pdf
– SUN
provides high-performance file I/O to multiprocess applications running in distributed-memory,
cluster-based environments
===============================================================================================
Distributed file systems
* NFS
– http://www.uwsg.iu.edu/usail/network/nfs/overview.html
– Sun Microsystems.
remote access to shared file systems across networks
export or mount directories to other machines
a client/server architecture
* xFS: Serverless Network File Service
– http://now.cs.berkeley.edu/Xfs/xfs.html
– UC Berkeley
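xFS is a distributed file system that provides high performance, scalability, and reliability
without a central server for managing metadata and the like, relying only on cooperation among the clients (peers).
xFS distributes both control processing and data, and introduces cooperative caching that uses each
client's memory, giving better performance than existing distributed file systems (e.g., NFS).
In short, the basic principles of xFS are as follows.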
A. Dynamically distributes control processing across the system on a per-file granularity
B. Distributes its data storage across storage server disks
C. Eliminates central server caching by using cooperative caching instead
xFS was designed and implemented on top of four main base technologies.
A. RAID: Redundant Arrays of Inexpensive Disks
B. LFS: Log-structured File System
C. Zebra: combines RAID and LFS so that the strengths of each reinforce the other
D. Multiprocessor Cache Consistency: the concept used to keep the caches consistent, where caching
is used for locality and to reduce network communication load, providing a uniform view of storage across the system
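The RAID and Zebra ideas listed above can be illustrated with a small parity example: a log segment is split into fragments that would go to different storage servers, plus one XOR parity fragment, so any single lost fragment can be rebuilt from the rest. This is a minimal sketch of RAID-style parity under assumed names (xor_blocks, make_stripe, rebuild); it is not xFS or Zebra code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("blocks must have equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(log_segment, num_data):
    """Split a log segment into num_data fragments plus one parity fragment."""
    frag_len = -(-len(log_segment) // num_data)  # ceiling division
    padded = log_segment.ljust(frag_len * num_data, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(num_data)]
    return fragments, xor_blocks(fragments)


def rebuild(fragments, parity, lost_index):
    """Recover the fragment at lost_index from the survivors and the parity."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return xor_blocks(survivors + [parity])


if __name__ == "__main__":
    frags, parity = make_stripe(b"hello world!", 3)
    print(frags, rebuild(frags, parity, 1))
```

Because the writes are batched into a log first, a whole stripe and its parity can be written in one pass, which avoids the read-modify-write cost that small updates incur on a parity array.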
* GFS (Global File System)
– http://www.redhat.com/software/rha/gfs/
– Red Hat (Open Source)
high IO throughput
no single-point-of-failure
file system and volume resizing can be done while the system remains on-line, increasing system availability
fast, scalable, high throughput access to a single shared file system
Scalable to hundreds of servers
Quota system for cluster-wide storage capacity management.
Direct IO support allows databases to achieve high performance without traditional file system overheads.
Dynamic multi-pathing to route around switch or HBA failures in the storage area network.
Dynamic capacity growth while the file system remains on-line and available.
Can serve as a scalable alternative to NFS.
* CXFS
– http://www.sgi.com/products/storage/cxfs.html
– SGI
shared filesystem for storage area networks
Delivers on the promise of SANs
Instant, multi-OS, no copy data sharing
Time-tested, proven solution
Scalability and performance ensure investment protection
Solid, standard data integrity
Architected to scale up to 18 million terabytes
Guaranteed application bandwidth priorities with GRIO v2
High availability with data access failover
Most robust storage resource management available
True LAN-Free backup and restore
Complete SAN/NAS gateway
* CODA
– http://www.coda.cs.cmu.edu/ljpaper/lj.html
– Carnegie Mellon University
disconnected operation for mobile clients
reintegration of data from disconnected clients
bandwidth adaptation
Failure Resilience
read/write replication servers
resolution of server/server conflicts
handling of network failures that partition the servers
handling of client disconnection
Performance and scalability
client side persistent caching of files, directories and attributes for high performance
write back caching
Security
kerberos like authentication
access control lists (ACLs)
Well defined semantics of sharing
* InterMezzo
– http://www.inter-mezzo.org/
– Carnegie Mellon University
distributed file system with a focus on high availability
filtering file system to generate a modification log file which is suitable for replay on other hosts
InterSync is a scalable client server system to synchronize InterMezzo file systems.
It avoids the need to scan file systems and can benefit from proxies and highly performing webservers
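The idea of a modification log that is replayed on other hosts can be sketched in a few lines: each change is appended to a log, and a replica applies the entries in order instead of scanning the whole file system for differences. The record layout, the operation names, and the use of a dict as a stand-in for a file system are all assumptions for illustration; this is not InterMezzo's log format.

```python
def record(log, op, path, data=None):
    """Append one modification record to the log (hypothetical record layout)."""
    if op not in ("write", "unlink"):
        raise ValueError("unknown operation: %r" % (op,))
    log.append((op, path, data))


def replay(log, replica):
    """Apply logged modifications, in order, to a replica (a dict of path -> bytes)."""
    for op, path, data in log:
        if op == "write":
            replica[path] = data
        elif op == "unlink":
            replica.pop(path, None)
    return replica


if __name__ == "__main__":
    log = []
    record(log, "write", "/a.txt", b"v1")
    record(log, "write", "/a.txt", b"v2")
    record(log, "write", "/b.txt", b"tmp")
    record(log, "unlink", "/b.txt")
    print(replay(log, {}))
```

The cost of synchronizing then depends on the number of changes since the last replay, not on the size of the file system.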
* Alternate File Sharing Systems : AFS, RFS
– http://www.uwsg.iu.edu/usail/network/nfs/afss.html
AFS scales better than NFS
excellent performance on wide-area configuration and security based on kerberos mutual authentication
RFS
Unlike NFS which provides a generic file system, RFS provides an exact copy of a Unix file system.
RFS provides access to files and directories without the user having to know where the resource is located. A name-server is used to register resource names, so the client machines don’t need to know where the resources are.
RFS allows users to mount special directories so that they can share devices (e.g. tape drives) residing on other machines.
RFS is a stateful protocol; the server maintains state information of local resources. The server can detect client crashes, so that cache consistency is guaranteed.
For the most part, security threats associated with NFS are also associated with RFS
* DFS
– http://www.microsoft.com/windows2000/techinfo/howitworks/fileandprint/dfsnew.asp
– Microsoft
uniting files on different computers into a single name space
a single, hierarchical view of multiple file servers and file server shares on your network
* Frangipani
– http://citeseer.ist.psu.edu/thekkath97frangipani.html
– DEC
provides all its users with coherent,
shared access to the same set of files, yet is scalable to provide
more storage space, higher performance, and load balancing as the
user community grows. It remains available in spite of component
failures. It requires little human administration, and administration
does not become more complex as more components are added to
a growing installation
* Google FS (GFS)
– http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf
– GOOGLE
a scalable distributed file system for large distributed data-intensive applications
provides fault tolerance while running on inexpensive commodity hardware, and it delivers
high aggregate performance to a large number of clients.
* MojaveFS
– http://chaos2.org/whoami/pub/icdcs03.pdf
a distributed file system that uses transactions to facilitate reliable concurrent programming
provides a global uniform namespace which allows for mobile computing
Transactions are supported through a journalling mechanism, and replication provides fault tolerance
* DiFFS
– http://www.hpl.hp.com/techreports/2001/HPL-2001-19.pdf
– HP
a distributed file system designed for storage area networks
robust against failures and unfavorable access patterns
high scalability by a partitioning approach to sharing
storage resources
independent of the physical file system(s) used for the placement of data;
multiple file systems can co-exist in a DiFFS system.