Incident Report for Today's Deploy
Today at 9am NZT we took down the realm for the deployment of the new account system. This migration was expected to take around 4 hours. The first thing that went wrong was that the migration took longer to run than it did on our test hardware. This extended the downtime for an extra hour past the point that we had budgeted for.
After the realm was brought back up around 2PM NZT, we found that many players were getting disconnected frequently. This was caused by crashes in one of the backend master servers that caused online account session information to be lost. We spent around 15 minutes trying to investigate the causes of these crashes but were unable to immediately come up with any solutions so we decided to roll back the patch.
Unfortunately in this case, what would normally take a very short amount of time to roll back took a very long time due to the extensive database migrations that had occurred during deployment. The databases are very large and restoring the backup took quite some time. The realm was brought back and the game restored at 3PM NZT. The restore of the website databases took even longer and resulted in extended website downtime as well (the website was not available until 4:30PM NZT).
After investigation we have discovered that the crashes were caused by a very simple flaw. The constant that represents the length of an account name used in the account session was still accidentally using an old value, before we added the discriminator. If a player logged in with an account name longer than 27 characters then it would result in an exception being thrown when trying to copy the account name into the account session. This on its own should not have resulted in the master crashing, but this occurred in an area of the code base that was designed to be exception free, which resulted in the entire process crashing.
The bug itself is already fixed, and we have also changed the code to be more resistant to exceptions occurring. However, we have decided to delay the redeploy of the patch until Monday NZT. It is clear that we need to do another round of QA on this deployment to make sure that we have found all corner cases before we can be confident in deploying it again. This is not the level of service you should expect from Grinding Gear Games and we are very sorry for the extended downtime. |
|
Transparency like this is what all game companies should strive for.
Good on you for not wasting time in deciding to roll everything back when you couldn't devise a solution immediately. |
|
Thanks for being honest with us
|
|
YEP classic
|
|
NT
|
|
Just keep up the good work. No one is mad about this, I guess. At least not me.
|
|
Thanks for sharing what happened.
It will convert your forum titles into decorative square badges that use the space next to your forum posts more economically so that you can show off an unlimited number of them at any one time. - GGG, 2018 (https://www.pathofexile.com/forum/view-thread/3573673)
|
|
Appreciate the update. Mystery Box for the poor please. j/k.
Love you guys. No joke. This game is the best ever created. Keep it up, bugs happen. I cannot fathom the amount of work that goes into this. Wishing you exiles sanity. |
|
Take your time, get it right and we will be patient. We're on the precipice of a huge and great change and I'm sure it's way more complex than most of us realise. Try to not focus on the lunatics and know that the majority of the playerbase is understanding and really appreciates this level of transparency
|
|
This level of transparency is respectable and commendable! Really appreciate the thorough explanation.
Keep up the great work! |
|