March 27th, 2014 | Published in Status
Runbox has seen tremendous growth in its user base over the past months following the NSA revelations in the press. As a consequence, in January we started executing our plans to acquire and install powerful new virtualization servers and storage units.
Moving to the new servers
After substantial preparation of our server infrastructure, we started moving data to the new ZFS-based storage servers this week. The new storage servers are substantially faster and more reliable than the current ones and add a lot more capacity. The migration is moving forward steadily.
We are also deploying new IMAP servers as an intermediate step towards completely replacing our application server infrastructure. The IMAP servers we are currently deploying will improve IMAP performance while we finish installing the new physical application servers that will replace all of our current IMAP, POP, and web servers.
Some bumps in the road…
Some of our POP users started experiencing connection problems after being moved to the new storage servers. These users have now been moved back to the old storage servers until we resolve these problems. Update 13:00 CET 27.03.2014: This has probably been solved and we are waiting for feedback from everyone who was previously affected.
Additionally, the interaction between the new storage, the old storage, and the new IMAP servers did not work exactly as predicted, so we rolled back the changes on Wednesday. We had done extensive testing over a long period of time before we deployed this solution, though the test setup differed in some respects (NIC, OS versions). We have now done further testing and will attempt deployment again shortly.
What we’re doing to resolve the problems
We have reviewed the process thus far in detail and uncovered the likely cause of the problems between the new and old servers. We are making the required system changes to ensure a smooth transition next time.
We would like to apologize to those of you who have recently experienced IMAP and POP connection problems with Runbox. We assure you that we, along with our team of system administrators, will work to resolve these problems over the next few days so that we can provide fast and reliable services to everyone who cares about online privacy, security, and sustainable services.
We have gathered and analyzed data from the previous attempt at deploying the new servers and will make another attempt on Wednesday morning (02.04.2014) CET, this time using a new set of virtualized servers. We will test new combinations of hardware and software between 8 and 10 AM CET until we have found the configuration that performs best. Meanwhile, we have adjusted the configuration of the current IMAP servers to allow more concurrent connections and to stop the connection errors some of our customers have seen throughout the day.
IMAP should now generally operate normally. Between 9 and 11 AM CET, while we carry out configuration work on the new IMAP servers, some users may experience intermittent connection problems. This work will ensure that the new servers perform as reliably as possible once their configuration is complete.
The new IMAP servers have performed perfectly during our test phase while emulating a large number of users, but something causes them to slow down when communicating with the new ZFS-based storage units. We are working systematically to eliminate the causes and are excited about offering this superior storage technology to all our customers.
After several days of testing, we have narrowed down the problem to the new ZFS-based storage units, not the IMAP servers as was indicated earlier. There are two main issues we are looking at, and we expect to have a permanently deployed solution after a couple more days of work.
We plan to do the work outside of European and US business hours to avoid service disruptions for as many customers as possible. We are also looking at contingency plans in case this does not turn out as expected.
If you experience connection errors with Runbox IMAP, please contact Support, as the symptoms can vary from account to account. We can then take steps to improve the situation for your account specifically.
We have confirmed that the problem with the new ZFS storage was related to deadlocks in certain NFS threads in its operating system. A patch for this error was recently released, and after applying this upgrade the server has been operating perfectly for a full working day.
We therefore believe the problem to be resolved. We will continue to monitor its performance closely over the next few days.
The plan is then to continue moving user accounts to the new ZFS storage and our new IMAP servers, which is likely to improve IMAP performance for all our customers.