Answers to: Best practices for geo-redundancy? http://linuxexchange.org/questions/768/best-practices-for-geo-redundancy <p>We run a Lighttpd/Perl/MySQL web service on an Ubuntu VPS and want to add redundancy so that we stay up if our datacenter has issues.</p> <p>We're interested in thoughts and comments on our proposed solution:</p> <ul> <li><p>We're looking at using GlusterFS to mirror the web roots and config files for our apps, and MySQL replication in multi-master mode to mirror the database. Both would run over the WAN/public Internet between the two datacenters with IPsec transport-mode encryption.</p></li> <li><p>We'd use dual A records (an IP at each datacenter) to host the sites. This would provide round-robin while things were working, and would fail over within 4 seconds (worst case; most browsers release a DNS pinning in 1000ms) to the other server, should connectivity be lost.</p></li> <li><p>GlusterFS and MySQL replication would both "self heal" and update the other server automatically once connectivity was restored, so there is no need to update an out-of-sync server after failover. Both servers can run in "live mode" with both A records live all the time, so no DNS propagation has to take place for a failover to happen.</p></li> <li><p>In the event of software failure or a need to take one server offline for maintenance (rather than connectivity failure), we could simply pull one server's IP offline using the VPS control panel, or firewall it temporarily with iptables on the server itself.</p></li> <li><p>As well as the automatic failover we'd experience with a datacenter outage, we could also automatically initiate a failover in the event of software failure on one server using automatic monitoring: if one server isn't returning the content we expect to see, we would get an alert, and the monitoring software would automatically pull the offending server offline using the VPS host's API, causing requests to fail over to
the other.</p></li> </ul> <p>I'm interested to know whether anyone has tried anything similar, and in any thoughts, comments or suggestions on the above strategy.</p> Fri, 23 Jul 2010 15:30:32 -0400 Answer by rfelsburg http://linuxexchange.org/questions/768/best-practices-for-geo-redundancy/1129 <p>I would recommend looking into Linux-HA, Heartbeat and Pacemaker.</p> <p>I use them for high-availability failover between servers.</p> rfelsburg Fri, 23 Jul 2010 15:30:32 -0400 Answer by Jeffery 1 http://linuxexchange.org/questions/768/best-practices-for-geo-redundancy/838 <p>As a follow-up to some of these "why bother" / "paranoia" comments:</p> <p>We have experienced four outages in the last month within our current datacenter (naming no names, but they are a huge organization in Dallas), which is why we're looking to add redundancy.</p> <p>Each time our services go down we lose a significant amount of money, as our advertising partners (Google, etc.) are bringing visitors for which we pay, but who can't see our site to convert to sales. While some advertising partners such as AdWords can be "paused" in a short space of time during an outage, not all of them can. Since we work with small margins at high volume, any extended outage can be very expensive.</p> <p>Although the datacenter claims 100% uptime, things can still happen, as we saw last month.</p> <p>The amount we lost due to the outages so far last month was far LESS than the cost of a redundant setup in another facility would have been.</p> <p>-J</p> Jeffery 1 Fri, 11 Jun 2010 13:58:42 -0400 Answer by nanodiamond http://linuxexchange.org/questions/768/best-practices-for-geo-redundancy/817 <p>Paranoia => greater cost than benefit obtained.
Sounds like a Fed job.</p> nanodiamond Wed, 09 Jun 2010 18:34:23 -0400 Answer by jeremy http://linuxexchange.org/questions/768/best-practices-for-geo-redundancy/803 <p>Possibly not the answer you're looking for, but have you done a cost-benefit analysis to ascertain the true impact of an outage and then compared that with the costs of the kind of redundancy you're talking about? Have you taken into account the complexity the above setup adds and ensured it won't cause more downtime than the issue you're trying to avoid? True real-time, live, geographically redundant infrastructure can be extremely complex and in many cases is overkill. A good backup schedule and a properly planned disaster recovery plan can often cut a datacenter outage (by failing over to your backup datacenter) to just a couple of minutes, while avoiding a lot of the complexity. FWIW, the fact that you're currently running on a VPS was the first indication that what you're looking to do is almost certainly overkill, but I could be wrong.</p> <p>--jeremy</p> jeremy Tue, 08 Jun 2010 18:45:44 -0400
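<p>The cost-benefit comparison jeremy suggests can be roughed out in a few lines. The following is only a sketch: the function names, the <code>residual_fraction</code> parameter, and every figure in the usage example are hypothetical placeholders, not numbers from this thread. It estimates the yearly loss from outages and checks whether the loss a redundant setup would avoid exceeds what that setup costs per year.</p>

```python
def expected_outage_cost(outages_per_year, hours_per_outage, loss_per_hour):
    """Expected yearly loss from outages, in the same currency as loss_per_hour."""
    return outages_per_year * hours_per_outage * loss_per_hour


def redundancy_pays_off(outage_cost, redundancy_cost, residual_fraction=0.0):
    """Return True if the avoided loss exceeds the yearly cost of redundancy.

    residual_fraction is the share of the outage loss that remains even with
    the redundant setup (for example, downtime caused by the added complexity).
    """
    avoided_loss = outage_cost * (1 - residual_fraction)
    return avoided_loss > redundancy_cost


if __name__ == "__main__":
    # All figures below are made-up placeholders; substitute measured values.
    yearly_loss = expected_outage_cost(
        outages_per_year=12, hours_per_outage=2, loss_per_hour=500
    )
    print(yearly_loss)
    print(redundancy_pays_off(yearly_loss, redundancy_cost=6000, residual_fraction=0.25))
```

<p>Setting <code>residual_fraction</code> above zero is one way to account for the point that a complex geo-redundant setup can introduce downtime of its own.</p>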