
Sunday 30 March 2014

Storage

It came to my attention that the storage technologies and protocols in use nowadays are not always presented in an understandable way. So I took some time to gather material on the subject and wrote up a summary of what I found; for the original documents, please see the references at the bottom.


SANs are primarily used to make storage devices, such as disk arrays, tape libraries, and optical jukeboxes, accessible to servers so that the devices appear to the operating system like locally attached devices. A SAN does not provide file abstraction, only block-level operations. However, file systems built on top of SANs do provide file-level access, and are known as SAN file systems or shared-disk file systems.

NAS vs SAN
The primary difference between NAS and SAN solutions is the type of access protocol. NAS protocols such as NFS and CIFS provide shared file-level access to storage resources. The management of the file system resides with the NAS device. SAN protocols such as iSCSI and Fibre Channel provide block-level access to storage resources. Block-level devices are accessed by servers via the SAN, and the servers manage the file system.
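To make the file-level vs block-level distinction concrete, here is a small Python sketch (the class and method names are purely illustrative, not any real API): a NAS-style share hands out whole files by path and keeps the file system to itself, while a SAN-style LUN only answers reads of numbered blocks and leaves it to the client to know where files live.

```python
BLOCK_SIZE = 4  # tiny blocks to keep the demo readable

class NasShare:
    """File-level access: the device owns the file system."""
    def __init__(self):
        self.files = {"/docs/note.txt": b"hello world"}

    def read_file(self, path):
        # The client asks for a path; the NAS resolves it internally.
        return self.files[path]

class SanLun:
    """Block-level access: the device only knows numbered blocks."""
    def __init__(self, data):
        self.data = data

    def read_block(self, n):
        return self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE]

nas = NasShare()
san = SanLun(b"hello world!")

print(nas.read_file("/docs/note.txt"))
# With the SAN, the *client* must know which blocks hold the file:
print(san.read_block(0) + san.read_block(1) + san.read_block(2))
```

In other words, the file-system knowledge sits on the NAS side of the wire in the first case and on the server side in the second, which is exactly why SAN servers need their own (possibly clustered) file system on top.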

Benefits of NAS
  • NAS devices typically leverage existing IP networks for connectivity, enabling companies to reduce the price of entry for access to shared storage.
  • The RAID and clustering capabilities inherent to modern enterprise NAS devices offer greatly improved availability when compared with traditional direct attached storage.
  • Because NAS devices control the file system, they offer increased flexibility when using advanced storage functionality such as snapshots.
  • With 10GbE connectivity, NAS devices can offer performance on par with many currently installed Fibre Channel SANs.

DAS is optimized for single, isolated processors and low initial cost.
A second hard disk is today most advisably connected as an external unit, sometimes known as a "DAS" or direct attached storage drive. DAS external hard disks connect via a USB, FireWire or eSATA interface (see the hardware section), with USB being the most common.

On servers and high-end PC workstations (such as those used for high-end video editing), at least two hard disks are often linked together using a technology called RAID. This stands for "redundant array of independent disks" (or sometimes "redundant array of inexpensive drives"), and stores the data in each user volume on multiple physical drives.

SAN is optimized for performance and scalability. Some of the major potential 
benefits include support for high-speed Fibre Channel media which is optimized 
for storage traffic, managing multiple disk and tape devices as a shared pool 
with a single point of control, specialized backup facilities that can reduce 
server and LAN utilization, and wide industry support.

NAS is optimized for ease-of-management and file sharing using lower-cost 
Ethernet-based networks. Installation is relatively quick, and storage capacity is 
automatically assigned to users on demand.

NAS gateways are optimized to provide NAS benefits with more flexibility 
in selecting the disk storage than offered by a conventional NAS device. 
Gateways can also protect and enhance the value of installed disk systems.

Despite their differences, SAN and NAS are not mutually exclusive, and may be combined in multi-protocol or unified storage arrays, offering both file-level protocols (NAS) and block-level protocols (SAN) from the same system. The best of both worlds!

Fibre Channel Protocol (FCP) is a transport protocol (similar to TCP used in IP networks) that predominantly transports SCSI commands over Fibre Channel networks.[1][2]

Fibre Channel, or FC, is a high-speed network technology (commonly running at 2-, 4-, 8- and 16-gigabit per second rates) primarily used to connect computer data storage.[1][2] 



RAID
[Image: RAID 0 and RAID 1]
[Image: RAID 5 and RAID 10]

Many possible RAID configurations are available. The first is called "RAID 0". This divides or "stripes" the data in a storage volume across two or more disks, with alternate chunks of each file written to different disks. This improves overall read/write performance without sacrificing capacity. So, for example (as shown above), two 1TB drives may be linked to form a 2TB array. Because this virtual volume is faster than either of its component disks, RAID 0 is commonly used on video editing workstations.
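The striping idea can be sketched in a few lines of Python. This is a toy model, not how a real controller lays out sectors: it just deals fixed-size chunks round-robin across the member "disks", which is the essence of RAID 0.

```python
def stripe(data, disks, chunk=2):
    """RAID 0 sketch: deal fixed-size chunks round-robin across the disks."""
    out = [bytearray() for _ in range(disks)]
    for i in range(0, len(data), chunk):
        out[(i // chunk) % disks].extend(data[i:i + chunk])
    return out

disk_a, disk_b = stripe(b"ABCDEFGH", disks=2)
print(disk_a, disk_b)  # each disk ends up holding alternate chunks
```

Because consecutive chunks land on different spindles, reads and writes of a large file can proceed on both disks at once, which is where the speed-up comes from; but note that losing either disk loses half of every file.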
In contrast to RAID 0, "RAID 1" is primarily intended to protect data against hardware failure. Here data is duplicated or "mirrored" across two or more disks. The data redundancy so created means that if one physical drive fails there is still a complete copy of its contents on another drive. However, this does mean that drive capacity is sacrificed. For example (as shown above), a 1TB RAID 1 volume requires two 1TB disks. While write performance is not improved by using RAID 1, read performance can improve as multiple files can be accessed simultaneously from different physical drives.
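Mirroring is even simpler to sketch than striping. In this toy Python model every "disk" gets a full copy of the data, so after simulating a drive failure a complete copy still survives:

```python
def mirror_write(data, disks):
    """RAID 1 sketch: every disk receives a full copy of the data."""
    return [bytes(data) for _ in range(disks)]

copies = mirror_write(b"important", 2)
copies[0] = None                       # simulate a failed drive
survivor = next(c for c in copies if c is not None)
print(survivor)                        # the data is still intact
```

This also shows the capacity cost directly: a two-way mirror stores every byte twice, so usable capacity is half the raw capacity.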

If more than two drives are used, several other configurations become possible. For example, using three or more drives, "RAID 5" strikes a balance between speed and redundancy by striping data across two drives but also writing "parity" data to a third. Parity data maintains a record of the differences between the blocks of data on the other drives, in turn permitting file restoration in the event of a drive failure. (A great explanation of parity and RAID 5 in detail can be found in this video.) For mission-critical applications, "RAID 10" stripes and mirrors data across four or more drives to provide the gold standard in performance and redundancy. You can find a more detailed explanation of RAID 0, 1, 5 and 10 on TheGeekStuff.com.
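The "parity records the differences" idea is just bytewise XOR, and it is worth seeing why that lets you rebuild a lost drive. In this minimal Python sketch of one three-drive RAID 5 stripe, the parity block is the XOR of the two data blocks; XOR-ing the parity with either survivor reproduces the missing block, because XOR is its own inverse:

```python
def xor_blocks(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# One stripe on a 3-drive RAID 5: two data blocks plus one parity block.
d0, d1 = b"ABCD", b"WXYZ"
parity = xor_blocks(d0, d1)

# The drive holding d1 fails; rebuild its block from the survivors:
rebuilt = xor_blocks(d0, parity)
print(rebuilt == d1)  # True
```

Real RAID 5 rotates which drive holds the parity from stripe to stripe so no single disk becomes a parity bottleneck, but the arithmetic per stripe is exactly this.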

Many modern personal computer motherboards permit two SATA hard disk drives to be set up in a RAID configuration. However, for users who do not require the extra speed provided by RAID 0, RAID 5 or RAID 10, there are relatively few benefits to be gained. Not least, it needs to be remembered that any hardware setup featuring more than one internal hard disk -- whether or not in a RAID configuration -- at best provides marginal improvements in data security and integrity. This is simply because it provides no more tolerance to the theft of the base unit, nor to power surges or computer power supply failures (which can simply fry two or more hard drives at once rather than just one). A summary of RAID can also be found in my Explaining RAID video.

Logical unit number

In computer storage, a logical unit number, or LUN, is a number used to identify a logical unit, which is a device addressed by the SCSI protocol or protocols which encapsulate SCSI, such as Fibre Channel or iSCSI. A LUN may be used with any device which supports read/write operations, such as a tape drive, but is most often used to refer to a logical disk as created on a SAN. Though not technically correct, the term "LUN" is often also used to refer to the logical disk itself.
There is no 1:1 relationship between physical disk drives and LUNs. When provisioning storage, the administrator uses management software to create LUNs. They can create, for example, more than one LUN from one physical drive, which would then appear as two or more discrete drives to the user. Or they may create a number of LUNs that span several separate disks that form a RAID array; but, again, these will appear as discrete drives to users.
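A hedged sketch of what that provisioning step looks like in practice: the class and method names below are invented for illustration (real arrays each have their own management interfaces), but the idea is the same everywhere — LUNs are carved out of a pool of raw capacity, and each one is presented to hosts as if it were a discrete drive.

```python
class DiskPool:
    """Toy model: carve LUNs out of a pool of raw capacity (in GB)."""
    def __init__(self, capacity_gb):
        self.free_gb = capacity_gb
        self.luns = {}            # lun_id -> size_gb

    def create_lun(self, lun_id, size_gb):
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity")
        self.free_gb -= size_gb
        self.luns[lun_id] = size_gb

pool = DiskPool(1000)             # e.g. one 1 TB physical drive
pool.create_lun(0, 400)           # appears to the host as a 400 GB drive
pool.create_lun(1, 600)           # ...and a second, 600 GB drive
print(pool.luns, pool.free_gb)
```

The same model covers the opposite case from the text: the pool could just as well represent the combined capacity of a whole RAID group spanning several disks, and the host still only ever sees the LUNs.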
LUNs can be shared between several servers; for example, between an active server and a failover server. But problems can arise if a number of servers access the same LUN at the same time. There needs to be a method of ensuring data integrity because blocks are subject to change by the activities of those servers. For this, you need something like a clustered volume manager, clustered file system, clustered application or a network file system using NFS or CIFS.
In a SAN fabric, LUN storage is essential to the configuration of the environment and its performance. A LUN is a unique identifier given to separate devices, or logical units, so they can be accessed by a SCSI, iSCSI or Fibre Channel protocol. LUNs are key to disk array configuration because disks are typically defined in sets of RAID groups to protect against failure; however, those RAID groups can't be presented to the host. By assigning LUNs, all or a portion of a RAID group's capacity can be presented to the host as individual, mountable volumes.
From the computer's perspective, the SCSI LUN is only a part of the full SCSI address; the full device address is made up of the controller (bus) ID, the target ID and the LUN.
In the Unix family of operating systems, these IDs are often combined into a single "name". For example, /dev/dsk/c1t2d3s4 would refer to controller 1, target 2, disk 3, slice 4. Presently Solaris, HP-UX, NCR, and others continue to use the "cXtXdXsX" nomenclature, while AIX has abandoned it in favor of more familiar names.


cXtXdXsX nomenclature in Unix
  • c-part: controller ID identifying the host bus adapter,
  • t-part: target ID identifying the SCSI target on that bus,
  • d-part: disk ID identifying a LUN on that target,
  • s-part: slice ID identifying a specific slice on that disk.

The cable and the host adapter form the SCSI bus, and this operates independently of the rest of the computer. Each device on the bus is given a unique address by the SCSI BIOS, ranging from 0 to 7 for an 8-bit bus or 0 to 15 for a 16-bit bus. Devices that request I/O processes are called initiators. Targets are devices that perform operations requested by initiators.

References: