We have migrated to a high-availability, multi-location iSCSI SAN (a brief technical explanation appears below for those who want it). This improved architecture directly enhances our service performance in two important ways:
Storage consolidation
Our once-disparate storage resources can now be moved from servers scattered around our network to central locations (that is, our data centers). This makes the allocation of storage more efficient: for example, a server can be allocated a new disk volume without any change to hardware or cabling. Our clients benefit from this flexibility because the likelihood of network interruptions is further decreased.
Disaster recovery
Storage resources can be easily mirrored from one data center to a second site in real time. This second mirror site serves as a hot standby in the event of a prolonged outage. In particular, our iSCSI SAN architecture allows entire disk arrays to be migrated across a WAN with minimal configuration changes, in effect making storage "routable" in the same manner as network traffic. This means that, should a disaster strike on our end, clients’ data is properly safeguarded and we can recover within minutes, if not within a “heartbeat”.
Off-Site Replication
Most of the improvements in our infrastructure described above were made with one clear objective: to enable the automatic replication of client data to our secondary hot site. Don’t forget: Abaxio provides a viable offsite backup solution for its clients, most of whom have less than 2 TB. But we ourselves have petabytes of data to protect, so our solutions need to be extremely creative and well planned.
Designing and implementing the infrastructure described above required years of research and planning, and it seems to have paid off. We can now go beyond a periodic off-site backup to a high-availability hot mirror site that ensures minimal data loss and enables immediate recovery from any disaster or system outage. Our system uses patented replication and failover technology that continuously captures byte-level changes as they happen and replicates those changes to one or more target servers at a secondary location. In the event of a disaster, we can recover from our target servers in minutes, if not seconds. Testing has also shown that our software delivers better protection than many hardware-based solutions, and it cost us hundreds of thousands of dollars less to implement. We are able to scale organically as our data requirements grow.
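For readers who like to see the idea in code, here is a minimal Python sketch of byte-level change capture and replay. It is an illustration only, not our replication product: the names WriteOp, ReplicatedVolume and apply_ops are hypothetical, and both volumes live in memory rather than on servers at two sites.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WriteOp:
    """One captured change: the bytes written and where they landed."""
    offset: int
    data: bytes


class ReplicatedVolume:
    """Hypothetical source volume that records every write as it happens."""

    def __init__(self, size: int) -> None:
        self.content = bytearray(size)
        self.pending: List[WriteOp] = []  # changes not yet sent to the target

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.content):
            raise ValueError("write outside volume bounds")
        self.content[offset:offset + len(data)] = data
        # Capture only the changed bytes, not the whole file or volume.
        self.pending.append(WriteOp(offset, bytes(data)))

    def drain(self) -> List[WriteOp]:
        """Hand over the queued changes in the order they occurred."""
        ops, self.pending = self.pending, []
        return ops


def apply_ops(target: bytearray, ops: List[WriteOp]) -> int:
    """Replay captured changes on the target; return the bytes transferred."""
    sent = 0
    for op in ops:
        target[op.offset:op.offset + len(op.data)] = op.data
        sent += len(op.data)
    return sent


# Two small writes on a 1 KB source volume, replayed on the target.
source = ReplicatedVolume(1024)
target = bytearray(1024)
source.write(10, b"invoice")
source.write(500, b"ledger")
transferred = apply_ops(target, source.drain())
```

In this toy run only the 13 changed bytes cross the "wire", yet the target ends up identical to the source. A real system would also have to deal with ordering across connections, acknowledgements and network failures, none of which is modeled here.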
Full-Server Failover
The Full-Server Failover feature of our system applies the source server’s OS configuration, applications, and data to the target server. Because applications do not have to be pre-installed and maintained on the target server, full-server failover is easy to configure. Full-Server Failover supports one-to-one connections of 32-bit or 64-bit servers and can fail over across a LAN or WAN to dissimilar hardware.
Failback/Restore
In the event of a failure, we can restore data from the target back to the original source or to an alternate location. Through a UI, we can easily restore data from the replicated disk back to the production disk once the failure is corrected. This greatly reduces the time to recover and restore, as it is not necessary to go offsite for tapes and then restore them one at a time. It also ensures that we recover to the point at which the failure occurred, not to the time of the last backup; restoring from a backup can mean a day or more of lost data and productivity. Unlike other solutions we’ve seen, which make users remember which files came from which location, our restore process automatically reverses the direction of the original replication job.
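The idea of reversing a replication job can be sketched in a few lines of Python. This is a hypothetical model rather than our restore tool: the ReplicationJob class, its fields and the server names are all made up for illustration.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ReplicationJob:
    """Hypothetical description of a replication job."""
    source: str
    target: str
    paths: Tuple[str, ...]

    def reverse_direction(self) -> "ReplicationJob":
        # Swap the endpoints; the protected paths stay exactly the same,
        # so nobody has to remember which files came from where.
        return replace(self, source=self.target, target=self.source)


job = ReplicationJob("prod-server", "standby-server", ("/data/clients",))
restore_job = job.reverse_direction()
```

Because the restore job is derived from the original job, the list of protected paths never has to be re-entered by hand.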
Block Checksum Re-Mirror
Should a disconnect occur between the source and target, our system can perform a block-checksum re-mirror instead of a complete mirror of the entire replication set. This re-mirror replicates only the data that differs between the source and target, which takes much less time and far fewer resources. It ensures that the target is synchronized with the source again, and it greatly simplifies backup, recovery, and replication management.
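As a rough illustration of the principle, the Python sketch below compares per-block checksums and copies only the blocks that differ. It is a simplified model, not our product's algorithm: the 4 KB block size, the choice of SHA-256 and the function names are all assumptions, and both volumes are held in memory.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_checksums(volume: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Return one SHA-256 digest per fixed-size block of the volume."""
    return [
        hashlib.sha256(volume[i:i + block_size]).digest()
        for i in range(0, len(volume), block_size)
    ]


def remirror(source: bytes, target: bytearray,
             block_size: int = BLOCK_SIZE) -> List[int]:
    """Copy only the blocks whose checksums differ; return their indexes."""
    if len(source) != len(target):
        raise ValueError("source and target must be the same size")
    # In a real deployment each side would compute its own checksums and
    # only the digests would cross the WAN; here both are computed locally.
    source_sums = block_checksums(source, block_size)
    target_sums = block_checksums(bytes(target), block_size)
    copied = []
    for index, (src, tgt) in enumerate(zip(source_sums, target_sums)):
        if src != tgt:
            start = index * block_size
            target[start:start + block_size] = source[start:start + block_size]
            copied.append(index)
    return copied


# Eight-block volume; blocks 2 and 5 change while the link is down.
source = bytearray(8 * BLOCK_SIZE)
target = bytearray(source)
source[2 * BLOCK_SIZE] = 0xFF
source[5 * BLOCK_SIZE + 100] = 0x01
copied_blocks = remirror(bytes(source), target)
```

In this example only two of the eight blocks are re-sent, which is where the savings in time and bandwidth come from.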
iSCSI Explained:
So what is iSCSI, you may ask? For the wire-heads amongst us, in short: instead of fragmenting and encapsulating actual data files into packets, as with NAS, iSCSI encapsulates much smaller SCSI command blocks. This yields data transmission speeds close to those of direct SCSI connections, and relieves the network of bandwidth-depleting file-level storage traffic.
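To give a sense of how small a SCSI command block is, here is a Python sketch that builds a standard READ(10) command descriptor block (CDB). The helper name build_read10_cdb is ours, chosen for illustration. The sketch stops at the CDB itself: the iSCSI header that wraps it, the login phase and the TCP/IP transport are all left out.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code


def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 10-byte READ(10) command descriptor block."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("READ(10) carries a 32-bit logical block address")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("READ(10) carries a 16-bit transfer length")
    # Big-endian layout: opcode, flags, 32-bit LBA, group number,
    # 16-bit transfer length, control byte.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)


# Ask for 8 blocks starting at logical block 2048.
cdb = build_read10_cdb(lba=2048, num_blocks=8)
```

The whole request fits in ten bytes and addresses raw blocks rather than named files, which is what distinguishes block-level I/O from the file-level traffic of NAS.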
In addition, since block-level I/O is transferred over IP, high-performance data storage is no longer held captive to LAN and MAN (Metropolitan Area Network) environments, but is now applicable in WAN environments as well. Without the distance restrictions associated with other storage solutions, iSCSI presents an effective way for us to accomplish high-performance offsite disaster recovery (see above). iSCSI is also a great way to consolidate or “pool” data, making management more effective. This results in higher storage utilization, which helps to keep IT costs down.