Sunday, May 10, 2009

2 buildbot maintenance windows this week

In the coming week, there will be 2 buildbot maintenance windows, each of which could close the tree for 2-4 hours. These windows allow for build system and buildbot changes that are disruptive from a tree green-ness point of view.

After these, the new hardware and resources available to our build infrastructure should be ready for use. As usual, work on these will be tracked in bugzilla, and if you encounter any problems, or have any good reason why this should be postponed, let it be known.

9AM PST, Monday, May 11th, 2009 - bug 492298
9AM PST, Thursday, May 14th, 2009 - bug 492297

[Edited: Fixed incorrect times due to backwards EDT -> PST conversion]
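For anyone double-checking the times above, here is a minimal sketch of the conversion, assuming the windows were first written down in Eastern time. The noon Eastern input and the helper name `eastern_to_pacific` are illustrative only and not from the original schedule. It uses the fixed daylight-time offsets in effect on both coasts in May (UTC-4 and UTC-7), so it is not a general-purpose converter.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets valid for May 2009, when both coasts are on
# daylight time. A general tool would use a real timezone database instead.
EASTERN = timezone(timedelta(hours=-4), "EDT")
PACIFIC = timezone(timedelta(hours=-7), "PDT")


def eastern_to_pacific(naive_eastern: datetime) -> datetime:
    """Treat a naive datetime as Eastern time and convert it to Pacific."""
    return naive_eastern.replace(tzinfo=EASTERN).astimezone(PACIFIC)


# Hypothetical input: noon Eastern on the day of the first window.
window = eastern_to_pacific(datetime(2009, 5, 11, 12, 0))
print(window.isoformat())

# The "backwards" mistake is adding the 3-hour difference instead of
# subtracting it, which lands 6 hours away from the correct local time.
wrong = datetime(2009, 5, 11, 12, 0) + timedelta(hours=3)
print(wrong.hour - window.hour)
```

Going from Eastern to Pacific always moves the clock earlier, which is a quick sanity check on any converted time.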

Friday, May 1, 2009

Hardware move complete

It took a little longer than initially planned for, but it's finally complete. All our hardware is now sitting in our brand new space, with room to grow.

The hardware upgrade also went really well, so our overall build capacity has been greatly increased today.

The builders are just starting to pick up builds again, so I expect it will take a little bit more time for the trees to go green again.

That's all folks!

Tuesday, April 28, 2009

Downtime: Moving Hardware - 2009/04/30

On Thursday, April 30th, 2009, we will be moving all our gear to a different location, so there will be downtime for all Mozilla Messaging services spread throughout the day.

Internet facing services should be down for 1-2 hours, and these include:

  • www.mozillamessaging.com
  • planet.mozillamessaging.com
  • SpreadThunderbird
  • All other *.mozillamessaging.com sites

Build services will be down for a longer period of time, and it might require closing the tree for a few additional hours while it gets itself back to its usual green.

As a positive side effect of this move, we'll have more room to expand our capacity in the future. The move also includes a planned series of hardware upgrades that will happen at the same time. It's a perfect occasion, since we have to power everything down anyway.

When completed, our build infrastructure will have close to 4x more computing resources at its disposal, yummy!

More information will be posted to this blog as the move progresses.

Also, you can track progress on this issue by watching bug 490578.

As always, we try to minimize outwardly visible downtime, but this time around, it can't be completely avoided.

[Update: Re-scheduled to April 30th]

Monday, April 27, 2009

Warning: MPT Colo Network issues

[Mon Apr 27 08:45:37 PDT 2009]

The primary Mozilla Colo in San Jose has experienced networking issues. While this was happening, there was some spurious build bustage, as various *.mozilla.org services would sometimes time out.

[Update: Mon Apr 27 10:13:59 PDT 2009: All is back to normal]
[Reported: Mon Apr 27 08:45:37 PDT 2009]

Saturday, April 18, 2009

Resolved - Intermittent Network Issues - 2009/04/18

Around 8:30 EST this morning, our main firewall started experiencing some problems, and as a result, network connectivity is degraded. I am seeing highly variable packet drop rates, sometimes exceeding 80%.

This means that currently, pretty much all *.mozillamessaging.com and *.spreadthunderbird.com sites will be slow at best, and might display hangs and time-outs.

Apologies all around, and I'll post an update as soon as this situation is resolved.

[UPDATE: 17:05 EDT - Issue resolved]
[UPDATE: 13:15 EDT - It's happening again]
[UPDATE: 11:20 EDT - Things are looking normal again]

Thursday, February 19, 2009

Downtime: OS X Builders

Around 2-3 PM PDT today, the OS X builders will be temporarily disabled so they can benefit from a RAM upgrade. This means the Tinderbox trees will not see OS X builds for that duration. Also, as the builders get terminated, they might introduce spurious burning to the trees; please ignore it.

The tracking bug is 474794.

Wednesday, January 21, 2009

Rolling Build Infrastructure Outage

In light of the various bustages/instabilities from last week, I will be performing a few sweeping system-wide changes to the build infrastructure at Mozilla Messaging.

What this means is that today, every single build host will be at least stopped and started, and possibly rebooted too.

This should not induce bustage in the Trees, but it does mean that for a certain time window, various builders will be offline, so they won't be picking up new code pushes. For this reason, it was judged safer to hold the Trees closed while this is happening. This applies to the Thunderbird, Thunderbird3.0 and Sunbird trees.

The work and the progress of the outage will be tracked in bug 474600. If you notice problems with builders during that period and you think they're related to this work, add a comment there.

Also, we'll try to keep tree closure to a minimum.