Tuesday, February 8, 2011

Outage: 2011-02-08 00:40-02:15 PST

*.mozillamessaging.com experienced a network outage that took us down from approximately 00:40 PST until 02:15 PST.

The root cause has not yet been determined, but services are now operating normally.

Tuesday, August 4, 2009

Buildbot downtime : Aug 5th 07:00-08:00 PDT

On August 5th, between 07:00 and 08:00 PDT, the Thunderbird buildbot master server will be briefly restarted to close bug 501272 - [buildbot] Increase RAM of buildbot master to 1GB. Our buildbot master has been running very low on memory and has started to swap more than is acceptable. The downtime should be fairly short, and once back, our buildbot master should resume business as usual, except happier.

Sunday, May 10, 2009

2 buildbot maintenance windows this week

In the coming week, there will be 2 buildbot maintenance windows that could close the tree for 2-4 hours each. The reason for these is to allow for disruptive (from a tree green-ness point of view) build system and buildbot changes.

After these, the new hardware and resources available to our build infrastructure should be ready for use. As usual, work on these will be tracked in bugzilla, and if you encounter any problems, or have any good reason why this should be postponed, let it be known.

9AM PST, Monday, May 11th, 2009 - bug 492298
9AM PST, Thursday, May 14th, 2009 - bug 492297

[Edited: Fixed incorrect times due to backwards EDT -> PST conversion]

Friday, May 1, 2009

Hardware move complete

It took a little longer than initially planned for, but it's finally complete. All our hardware is now sitting in our brand new space, with room to grow.

The hardware upgrade also went really well, so our overall build capacity has been greatly increased today.

The builders are just starting to pick up builds again, so I expect it will take a little bit more time for the trees to go green again.

That's all folks!

Tuesday, April 28, 2009

Downtime: Moving Hardware - 2009/04/30

On Thursday, April 30th, 2009, we will be moving all our gear to a different location, so there will be downtime of all Mozilla Messaging services spread throughout the day.

Internet facing services should be down for 1-2 hours, and these include:

  • www.mozillamessaging.com
  • planet.mozillamessaging.com
  • SpreadThunderbird
  • All other *.mozillamessaging.com sites

Build services will be down for a longer period of time, and it might require closing the tree for a few additional hours while it gets itself back to its usual green.

As a positive side-effect of this move, we'll have more room to expand our capacity in the future. Plus, this includes a planned series of hardware upgrades that will be happening at the same time, a perfect occasion, since we have to power everything down anyways.

When completed, our build infrastructure will have close to 4x more computing resources at its disposal, yummy!

More information will be posted to this blog, as the move progresses.

Also, you can track progress on this issue by watching bug 490578.

As usual, we always try and minimize outwardly visible downtime, but this time around, it can't be completely avoided.

[Update: Re-scheduled to April 30th]

Monday, April 27, 2009

Warning : MPT Colo Network issues

[Mon Apr 27 08:45:37 PDT 2009]

The primary Mozilla Colo in San Jose has experienced networking issues. While this was happening, there was some spurious build bustage, as various *.mozilla.org services would sometimes time out.

[Update: Mon Apr 27 10:13:59 PDT 2009: All is back to normal]
[Reported: Mon Apr 27 08:45:37 PDT 2009]

Saturday, April 18, 2009

Resolved - Intermittent Network Issues - 2009/04/18

Starting at around 8:30 EST this morning, our main firewall started experiencing problems, and as a result, network connectivity is degraded. I am seeing highly variable packet drop rates, sometimes exceeding 80%.

This means that currently, pretty much all *.mozillamessaging.com and *.spreadthunderbird.com sites will be slow at best, and might display hangs and time-outs.

Apologies all around, and I'll post an update as soon as this situation is resolved.

[UPDATE: 17:05 EDT - Issue resolved]
[UPDATE: 13:15 EDT - It's happening again]
[UPDATE: 11:20 EDT - Things are looking normal again]