The Target testbed has two types of storage: one where the data itself is stored in files, and another where the metadata for these files is stored on separate metadata servers based on SSDs (Fusion-IO). Key requirements for the metadata servers include good performance, high availability and redundancy.
The GPFS-based data storage offers some crucial capabilities to the Target testbed. Among them are:
- Proven and effective scalability beyond the petabyte scale
- Separation of hardware and software with the possibility of extensions and upgrades with minimal disturbance of normal operations
- An integrated life cycle management system and a combination of different data storage types, including tape storage
- Fast I/O for large data files and datasets ensured by the parallel file system
- Dynamic access patterns that ensure smooth operation of applications with diverse requirements, ranging from a high number of I/O operations per second (many small files) to high streaming bandwidth (a few very large files)
The storage facilities of Target recently reached an impressive 10 petabytes. Based on IBM's GPFS file system, Target storage is easily scalable and cost-effective.
The hardware infrastructure is hosted by the CIT data center on the University of Groningen campus. The Oracle database is distributed across a number of database servers. Data stored on the Target testbed must be assigned a type and an importance (the importance may change dynamically over time). Based on these, and on the automated life cycle management policies provided by the data owner, an optimum storage pool is defined. The Target testbed has five data storage pools, each comprising storage facilities that are similar in size, performance, energy consumption and cost.
- Tier A consists of relatively inexpensive, fast-access, SAS-based hard disks in a mirrored setup; mainly used for database files to minimize failure probability
- Tier B consists of large-capacity, streaming, SATA-based hard disks; mainly used for larger files with sequential access patterns
- Tier C consists of medium-capacity, high I/O per second, Fibre Channel-based hard disks; focuses mainly on high performance for random file access patterns
- Tier D consists of large-capacity LTO4 and LTO5 tape library storage (HSM, ILM, archive purposes); mainly used for rare access patterns
- Tier E consists of large-capacity, streaming, SATA-based hard disks; mainly used for larger files with sequential access patterns
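The mapping from a file's type and access pattern to a storage pool can be illustrated with a minimal Python sketch. The tier descriptions follow the list above, but the `FileProfile` labels, the `choose_tier` function and the order of the rules are hypothetical; the actual selection on the testbed is driven by the life cycle management policies of the data owner, which are not given here.

```python
from dataclasses import dataclass

# Tier descriptions summarised from the list above.
TIERS = {
    "A": "mirrored SAS disks, fast access (database files)",
    "B": "large-capacity SATA disks (large files, sequential access)",
    "C": "Fibre Channel disks, high I/O per second (random access)",
    "D": "LTO4/LTO5 tape library (rarely accessed data)",
    "E": "large-capacity SATA disks (large files, sequential access)",
}


@dataclass
class FileProfile:
    """Hypothetical description of a stored item; labels are assumptions."""
    kind: str    # "database" or "file"
    access: str  # "random", "sequential" or "rare"


def choose_tier(profile: FileProfile) -> str:
    """Pick a tier letter using illustrative rules, not the real policies."""
    if profile.kind == "database":
        return "A"  # mirrored setup minimizes failure probability
    if profile.access == "rare":
        return "D"  # tape for rarely accessed data
    if profile.access == "random":
        return "C"  # high I/O per second for random access
    # Tiers B and E share a description; the text does not say how they
    # are told apart, so this sketch simply returns B.
    return "B"


if __name__ == "__main__":
    example = FileProfile(kind="file", access="sequential")
    tier = choose_tier(example)
    print(f"Tier {tier}: {TIERS[tier]}")
```

Because the importance of data may change over time, a function like this would have to be re-evaluated periodically, and a changed result would trigger a migration between pools.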
The Target infrastructure uses four different types of interconnections.
The compute nodes and the data storage pools are connected with a ~80 Gbit backbone network core.
The storage interconnections link the storage components to the storage servers. Within Tier A, storage components are directly connected to the servers through SAS connections, while Tiers B, C, D and E use an 8 Gbit Fibre Channel connection. The cluster interconnects link together the storage, database and application servers through a redundant 10 Gbit TCP/IP-based Ethernet network providing high bandwidth and low latency. The same type of network is used to connect the different databases into a clustered solution. Finally, the management interconnect, used to manage all components, is a 1 Gbit Ethernet network.
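To get a feel for these link speeds, the following Python sketch estimates a lower bound on the time needed to move one terabyte over each interconnect. It assumes that the quoted figures are nominal rates in Gbit/s and ignores protocol overhead, redundancy and contention, so real transfers will be slower; the dictionary keys and the function name are illustrative only.

```python
# Nominal link speeds quoted in the text, assumed here to mean Gbit/s.
# The backbone figure is approximate (~80 Gbit).
LINKS_GBIT = {
    "backbone core": 80,
    "cluster interconnect": 10,
    "Fibre Channel storage link": 8,
    "management network": 1,
}


def transfer_seconds(size_bytes: int, link_gbit: float) -> float:
    """Lower bound on transfer time at the nominal line rate (no overhead)."""
    return size_bytes * 8 / (link_gbit * 10**9)


if __name__ == "__main__":
    one_terabyte = 10**12  # decimal terabyte, in bytes
    for name, gbit in LINKS_GBIT.items():
        seconds = transfer_seconds(one_terabyte, gbit)
        print(f"{name}: {seconds:.0f} s per TB")
```

At the nominal rate, one terabyte takes 800 seconds over the 10 Gbit cluster interconnect and 8000 seconds over the 1 Gbit management network, which is one reason bulk data stays off the management interconnect.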
The storage facilities of the Target testbed are located at the CIT data center.
Last modified: April 10, 2012 17:30