[Op1st-dc-alerts] [mghpcc-alerts] Update re: Brief cooling system problem in the MGHPCC Computer Room at 3:03PM June 15, 2023
mghpcc-alerts
mghpcc-alerts at mghpcc.org
Fri Jun 16 22:10:52 EDT 2023
The replacement part that the Schneider technician provided was incorrect, and they will not be able to do any further work until after the holiday weekend.
The facility continues to operate with manual settings where needed to work around the faulty controller. It can run in this mode, and staff will be either on site or close by, prepared to respond if problems occur. However, we may not be able to react as quickly as the automated controls to low-probability, high-impact events such as a utility power failure.
======prior alerts========
***June 16, 2023 2:45PM***
Subject:
Update re: Brief cooling system problem in the MGHPCC Computer Room at 3:03PM June 15, 2023
Diagnosis:
The source of yesterday’s cooling system malfunction was a Building Management System (BMS) controller that incorrectly turned a valve, shutting off part of the chilled water supply to the computer room.
Follow up:
After the controller malfunctioned for a second time at 10PM yesterday evening (6/15), we took several actions:
Restored the valve position within a few minutes, minimizing impact.
Disabled the control signal from the BMS controller to the valve.
Kept people on site who can operate the valve manually in the (unlikely) event that it needs to be operated.
Brought in Schneider technicians to analyze the controller.
Inspected relevant sections of the controller software to look for possible errors and/or problematic inputs.
Next Step:
The Schneider technicians have recommended replacing the CPU board in the controller. We will be doing a short test to verify that this can be done non-disruptively, followed by the actual replacement after the new board arrives (ETA 4:30PM today).
***June 15, 2023***
At 3:03PM this afternoon (June 15), a cooling system malfunction affected the flow of chilled water to the MGHPCC Computer Room. The malfunction was corrected at approximately 3:08PM, bringing water flow back to normal.
We will send an update after determining root cause.
Apologies for any inconvenience and/or equipment alarms that this may have caused.