[MUD-Dev] ADMIN: So, what has been happening?
J C Lawrence
claw at kanga.nu
Wed Jun 18 22:09:07 CEST 2003
Writing as list owner:
You may have noticed a recent lack of list mail. Some explanation is
in order. Hopefully most of you have caught the very hurried notes at
http://www.kanga.nu/.
The very brief story:
A drive on Kanga.Nu died right before I flew out of state, and
another drive on the firewall sitting in front of the system that
holds the Kanga.Nu backups died the day after I flew out.
I've been busy working out a recovery. We're pretty close to being
recovered with the exception of the list archives, but there are
many bits left.
The long story calendared:
28 May 2003 -- Berkeley has a brownout. The UPS at the ISP that
hosts Kanga.Nu literally melts down. It seems that this is not
unusual for large APC UPSes, and in fact is preferable to their
competition which tend to catch fire instead (search the NANOG
archives for discussion on this event and UPS failure modes in
general -- it is educational reading).
29 May 2003 -- Having previously thought it was a dead router, I get
worried and call my ISP. They tell me about the melted UPS and
schedule time the next day to work with a tech on site over the
phone to find out why Kanga.Nu is not back on the 'net.
30 May 2003 16:00hrs -- I start working with Bill Woodcock on why
Kanga.Nu won't boot.
30 May 2003 17:45hrs -- We determine that a drive has died, that it
can't be resurrected, and remove it from the mount table and bring
the system, haltingly, partially, back to life. The drive that died
held all the list archives and most of the scripts that Kanga.Nu
uses for the mail system.
30 May 2003 17:45 - 18:30hrs -- I rapidly patch, band-aid, and duct
tape the systems on Kanga.Nu to work as best they can without the
dead drive.
30 May 2003 19:00hrs -- I leave to catch a plane to Connecticut.
2 June 2003 -- I start work on a consulting gig with Pfizer. I also
notice that I can't get into my home network...
4 June 2003 -- I persuade a friend to go over to my house and I walk
him thru trying to find out why the home gateway/firewall system
isn't running. It turns out that there was a brownout in San Jose
on Saturday (last day of May, the day after I left), which despite a
UPS and surge protector killed the drive in the gateway/firewall.
Problems: The backups for Kanga.Nu are on a system behind that
firewall and I can't get to them. Additionally the firewall at
Pfizer is dropping SSH sessions after roughly 15 minutes.
5 June 2003 -- I finally arrange permanent housing for myself in
Connecticut and buy a car (which turns out to be a time consuming
pain to register, but there lies another story).
11 June 2003 -- After wiring the house I'm in with CAT 5e,
installing network jacks etc and installing a D-Link home firewall
NAT box, I now finally have a decently working 'net connection.
Much typing ensues.
13 June 2003 -- Jon Leonard not only graciously lends me a spare
system, but installs Linux/Debian on it and goes over to my house
and hooks it up to my home network. This is a life saver. Great
thanks go to Jon for this.
15 June 2003 -- I get the firewall on Jon's borrowed system fully
configured, and regain access to the systems on the home network.
16 June 2003 -- The backups of the list archives etc finally finish
copying to Kanga.Nu (DSL with 129K upstream). Missing are the list
archives for the last few quarters which are on other systems and
rather fiddly to restore.
17 June 2003 -- Due to side effects of the various other changes
going on I've had to upgrade to Mailman 2.1 and upgrade the mail
system underneath it (which was in the plans, just not quite yet). Tuesday
is spent checking details and corners to make sure the upgrades are
done and now operate correctly.
18 June 2003 -- I start bringing the mailing lists back online, and
verifying that they operate correctly. This message is in fact one
of the tests for that...
Status:
Kanga.Nu *should* be functional as of now. The major known exception
is that the list archives for this year haven't been restored yet.
They will be, there are just many other things to be done first.
Reconstruction is taking time, but is being worked on daily. In
this line I'd particularly like to thank Bill Woodcock and Jon
Leonard who have spent and are spending large chunks of their free time to
help get things back up and running properly. Much kudos should go
their way.
Please accept my apologies for the lack of information and response
over the last week. 'Net connectivity in Groton is not wonderful,
and my available time to do much more than triage and simple
rescue/reconstruction work has been minimal. I'd originally planned
and booked my time in Groton for standard moderation loads, not
system rescues.
Lost mail:
I have roughly 20 messages that were in the moderation queue when
Kanga.Nu went down in the brownout in Berkeley. I strongly suspect
that some mail was lost during all the various problems and
downtime.
I will be re-importing the old held mail into the mail system over
the next few days and moderating it as per normal. Please bear with
me on this. If you do not see your message on the list after a few
days, please resubmit it to the list.
Future:
There are a bunch of new features in Mailman 2.1. Don't be
surprised if things change a bit over the next days as I look into
how the new features can best be adapted to MUD-Dev's needs.
Thanks for your patience. Things have been busy.
ObNote: My disaster recovery plans before this never included failures
where I wasn't and couldn't be physically present _and_ didn't have
reliable 'net access. It's been a learning experience.
--
J C Lawrence
---------(*) Satan, oscillate my metallic sonatas.
claw at kanga.nu He lived as a devil, eh?
http://www.kanga.nu/~claw/ Evil is a name of a foeman, as I live.
_______________________________________________
MUD-Dev mailing list
MUD-Dev at kanga.nu
https://www.kanga.nu/lists/listinfo/mud-dev